Creating RDF documents as one of the core functionalities #4518

Open
rajaram5 opened this issue Feb 11, 2022 · 15 comments
Labels
  • export (Exporting a project to some format. Use the format-specific sub-label if available)
  • RDF (import/export from/to RDF files)
  • Type: Feature Request (Identifies requests for new features or enhancements. These involve proposing new improvements.)

Comments

@rajaram5
Contributor

Hello,
I am a software developer/data architect. I am involved in many research projects where we carry out data uplifting activities, which include converting source data into RDF. We use OpenRefine with the RDF extension to create RDF documents in our projects. I wonder whether the OpenRefine team plans to support creating RDF documents as one of the core functionalities of OpenRefine.
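For readers unfamiliar with what "converting source data into RDF" involves, here is a minimal sketch in Python, not taken from the RDF extension. It turns one tabular row into N-Triples lines. The `http://example.org/` namespace, the `record/` and `property/` URI patterns, and the function names are all assumptions made for illustration; a real mapping would use vocabulary terms chosen by the project.

```python
from typing import Dict, List
from urllib.parse import quote

# Hypothetical base namespace, used only for this illustration.
BASE = "http://example.org/"


def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples requires escaping in string literals."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(row: Dict[str, str], key_column: str) -> List[str]:
    """Turn one row into N-Triples lines, one triple per non-empty cell.

    The value in key_column identifies the subject; every other column
    becomes a predicate with a plain string literal as its object.
    """
    subject = f"<{BASE}record/{quote(str(row[key_column]), safe='')}>"
    triples = []
    for column, value in row.items():
        if column == key_column or value in (None, ""):
            continue  # skip the key itself and blank cells
        predicate = f"<{BASE}property/{quote(column, safe='')}>"
        triples.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return triples


if __name__ == "__main__":
    rows = [{"id": "p1", "name": "Ada Lovelace", "born": "1815"}]
    for row in rows:
        for triple in row_to_ntriples(row, "id"):
            print(triple)
```

The RDF extension lets users define this kind of column-to-predicate mapping graphically (the "RDF skeleton") rather than in code.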

@rajaram5 rajaram5 added Type: Feature Request Identifies requests for new features or enhancements. These involve proposing new improvements. Status: Pending Review Indicates that the issue or pull request is awaiting review by project maintainers or collaborators labels Feb 11, 2022
@fpompermaier
Contributor

I use the JSON produced by OpenRefine to generate RDF using Apache Flink (maybe this will become unnecessary once OpenRefine uses Spark). However, it would be nice if the mappings could also be exported to a standard format like RML (https://rml.io/specs/rml/)

@andrawaag

I currently use OntoRefine to generate RDF. Having the same functionality in OpenRefine would be awesome.

@andrawaag

In the past I worked with @rajaram5 on making data FAIR using OpenRefine with an extension. I agree with Rajaram that making this functionality core to OpenRefine would be a major enhancement.

@ostephens
Sponsor Member

Would this be served by including the RDF extension as part of the OpenRefine distribution or is this a request for different functionality to the current extension?

@rajaram5
Contributor Author

To start with, it would be a good idea to include the RDF extension as part of the OpenRefine distribution, with some UI fixes to the extension's RDF skeleton.

@wetneb wetneb added the RDF import/export from/to RDF files label Feb 11, 2022
@wetneb
Sponsor Member

wetneb commented Feb 11, 2022

I also agree that the interface introduced by OntoRefine to generate RDF is nice and I would be happy to see an open source equivalent of this, for instance in the RDF extension.

Rather than ingesting many features in the core tool, I would be interested to encourage a vibrant ecosystem of extensions, by:

  • making it easier to install extensions (graphically, inside the tool itself. @lozanaross has proposals for what the UI would look like)
  • making it easier to develop extensions (better documentation for developers, Maven artifacts of the core tool deployed on Maven Central)
  • having clear policies about the stability of extension points with respect to versioning, and improving the user experience when trying to install incompatible extensions

I think it is important to talk about the social factors here. Deciding to include an extension by default in OpenRefine requires more coordination with the developers of those extensions so that they are ready on time for a release (they often need to adapt things as we upgrade our dependencies and improve our APIs). So, for users, including more extensions by default might mean less frequent releases.

One could also consider including the RDF extension in our repository and maintaining it like the core (as we do for the Wikidata extension), but this extension is already maintained by @stkenny, who does a great job at it. Why would I chase him off and give myself more work? I am actually inclined to move more extensions out into external git repositories and maintain them separately, as a way to make maintaining OpenRefine more manageable. I hope that with a better UX around extension installation, this is manageable for users.

In the project to integrate Wikimedia Commons with OpenRefine, I have tried to follow this principle by setting up a separate repository for some features, developed as an extension: https://github.com/OpenRefine/CommonsExtension

I hope my stance is not too frustrating for users. I wish I could leverage this enthusiasm for OpenRefine and RDF, motivating lots of contributors to help @stkenny to improve the RDF extension. ;)

@wetneb
Sponsor Member

wetneb commented Feb 11, 2022

One more thought: maybe one symbolic step would be to host the repository of the RDF extension within the OpenRefine organization, inviting its contributors to our organization. That would not change much, but it would perhaps make the repository more visible and give recognition to the project and its contributors.

If you like the idea, feel free to propose it on https://groups.google.com/g/openrefine-dev

@ostephens
Sponsor Member

I tend to agree with your view, @wetneb. If we could deliver good "extension management" that users control themselves, the question of what is 'core' and what is not would matter much less, and that would be a better approach than just bringing all useful extensions under central maintenance.

My only point against this is that adding the RDF extension to the distribution would be pretty easy to do. I completely accept your points about maintenance and responsibility, but if @stkenny were willing to maintain it within the OR repo instead of separately, we would basically be in the same situation as now with respect to maintenance, with a raised profile for the RDF extension that could make it easier to get more developers involved.

@wetneb
Sponsor Member

wetneb commented Feb 11, 2022

Unfortunately, even if @stkenny agrees to maintain the extension as he is doing now, it does change quite a lot of things:

Say I want to introduce a breaking change in some API (for instance the change of JSON serialization from org.json to Jackson, to take a dramatic example). I cannot introduce it in OpenRefine without also updating the RDF extension at the same time, otherwise the CI build for this repository will fail. So either I do it (tada! I am now maintaining that extension too!), or I stick to my current commitments and do not touch the RDF extension. In that case, what should I do? Leave the PR open and ask @stkenny to do the upgrade in the RDF extension in the same PR as soon as possible? That means putting a lot of pressure on him or a lot of frustration on me.

Many of our changes are time-sensitive: we need to fix an important vulnerability (say log4shell), we need to unblock the work of a contractor (say the Debian packaging), we need to release features from a funded project, and so on.

Similarly, if @stkenny wants to release a new version of the RDF extension, he can also be blocked by our agenda in the rest of the tool.

In the current situation, @stkenny can decide to update the RDF extension at his own pace instead.

@thadguidry
Member

Nothing more to add to @wetneb's comments. He's spot on with the direction we should take for extension authors, our users, and our maintainers.

@andrawaag

andrawaag commented Feb 11, 2022

As much as I agree with the "devops" points discussed so far, there is another angle I would like to add to the discussion. From a user perspective, keeping track of version compatibility between OpenRefine and its extensions can be complicated at times. Each extension will need to be updated with each new release of OpenRefine.

There is also the recent development where OpenRefine (and other packages) are bundled in portals such as, for example, the Jupyter Notebook server of the Wikimedia Foundation [1]. This is huge since it allows easier access. However, extensions that are not bundled are out of reach for users there, unless of course enabling and maintaining plugins is part of the UX and does not require admin access to, for example, PAWS. To my knowledge that isn't currently the case.

[1] https://hub.paws.wmcloud.org/

@thadguidry
Member

thadguidry commented Feb 11, 2022

@andrawaag
If we developed a feature...

  1. Auto install over the internet
    An Extensions manager could install extensions over the internet, after a dialog prompt asks users whether to allow this.
    Do you foresee issues if the Extensions manager UX showed which extensions are compatible (regular text, with a checkbox available to install them) and which are not (greyed-out text, with no checkbox to select and install them)?

  2. Manual install from a local path
    Do you foresee issues if a local folder path for extensions could be specified, and OpenRefine checked that path upon startup as it does now, then showed the Extensions manager with warnings about incompatible extensions that could not be loaded?
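The compatibility check behind such an Extensions manager could be sketched roughly as follows. This is only an illustration in Python: OpenRefine does not currently expose this metadata, and the `Extension` fields, the version-range convention (minimum inclusive, maximum exclusive), and the example names are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version such as '3.5.2' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class Extension:
    # Hypothetical metadata an extension might declare about itself.
    name: str
    min_version: str  # lowest supported host version (inclusive)
    max_version: str  # first host version that is no longer supported (exclusive)


def is_compatible(ext: Extension, host_version: str) -> bool:
    """Return True when the host version falls inside the declared range."""
    host = parse_version(host_version)
    return parse_version(ext.min_version) <= host < parse_version(ext.max_version)


def partition(extensions: List[Extension], host_version: str) -> Tuple[List[str], List[str]]:
    """Split extension names into (installable, greyed_out) for display."""
    installable = [e.name for e in extensions if is_compatible(e, host_version)]
    greyed_out = [e.name for e in extensions if not is_compatible(e, host_version)]
    return installable, greyed_out


if __name__ == "__main__":
    catalogue = [
        Extension("example-rdf", "3.5", "4.0"),     # hypothetical entry
        Extension("example-legacy", "3.0", "3.5"),  # hypothetical entry
    ]
    installable, greyed_out = partition(catalogue, "3.5.2")
    print("installable:", installable)
    print("greyed out:", greyed_out)
```

The same check would serve both options above: filtering a remote catalogue before offering checkboxes, and producing warnings for extensions found in a local folder at startup.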

@stkenny
Contributor

stkenny commented Feb 12, 2022

Unfortunately, I think that having a vibrant ecosystem of extensions while also keeping the ability to rapidly introduce breaking API changes may well be mutually exclusive. At a minimum, extension developers will need some notice of what needs to be changed.

Other than the Jackson change, the RDF extension hasn't really changed much since it was forked from Fadi Maali's repository. The work has really just been about keeping it working, so it could do with some new features/design. Having it in the OpenRefine organisation would likely be beneficial, as it would at least give people a single authoritative source. The number of forks and outdated links has caused confusion in the past. See for example stkenny/grefine-rdf-extension#48
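One common way to give that kind of notice is a deprecation period, where the old entry point keeps working for a release but warns callers about the upcoming change. The sketch below shows the general technique in Python; it is not how OpenRefine handles API changes, and `old_serialize` and `new_serialize` are hypothetical names invented for this illustration.

```python
import json
import warnings


def new_serialize(record: dict) -> str:
    """Replacement API (hypothetical): serialize a record with stable key order."""
    return json.dumps(record, sort_keys=True)


def old_serialize(record: dict) -> str:
    """Old API (hypothetical), kept for one release so callers have time to migrate."""
    warnings.warn(
        "old_serialize is deprecated and will be removed in the next major "
        "release; use new_serialize instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return new_serialize(record)


if __name__ == "__main__":
    warnings.simplefilter("always")  # make the warning visible in this demo
    print(old_serialize({"b": 1, "a": 2}))
```

The trade-off is the one raised in this thread: keeping the old entry point alive costs the core maintainers time, in exchange for extension authors being able to upgrade at their own pace.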

@wetneb
Sponsor Member

wetneb commented Feb 12, 2022

> Unfortunately I think having a vibrant ecosystem of extensions, whilst also keeping the ability to rapidly introduce breaking API changes may well be mutually exclusive. At a minimum the extension developers will need some notice of what needs to be changed.

I have tried to provide migration guidelines in the past and am very keen to know how it can be done better.
It would really help me to have concrete feedback on this!

Also, making it easier to install extensions and detect their incompatibilities with OpenRefine is likely to require some breaking changes on its own (such as: migrating out of Butterfly to something else), so that is a bit of a dilemma for me. Here too, input welcome!

> Having it in the OpenRefine organisation would likely be beneficial as it would at least give a single authoritative source for people.

Okay, given the other comments in this thread I think we have a consensus for that, so let us do it :) I will create the repo and invite you.

@AtesComp
Contributor

I created RDF Transform to combine the RDF extension with features of the OntoRefine version and to extend it with more robust processing. Please review it to see whether it addresses the issues raised here.

See: RDF Transform

@tfmorris tfmorris added export Exporting a project to some format. Use the format-specific sub-label if available and removed Status: Pending Review Indicates that the issue or pull request is awaiting review by project maintainers or collaborators labels Oct 6, 2022
9 participants