Define components, standards, requirements for tool discovery #31
Relevant short blog post from the Software Sustainability Institute: https://software.ac.uk/blog/2021-05-20-what-are-formats-tools-and-techniques-harvesting-metadata-software-repositories

I also recommend this paper on FAIR research software: https://content.iospress.com/articles/data-science/ds190026
The main premises I envision for software metadata harvesting are:
- The role of the harvester is to collect software metadata in one common vocabulary (…)
- The role of the tool store is to hold and make available for querying the (…)
- As for implementations, I'd like to aim for simplicity. I started a codemeta (…)
- The tool store can probably also be kept quite simple. Just loading all triples (…)
- Now there's one major aspect which I skipped over. I want to make a clear (…)
- Codemeta is more focussed on describing the source code, so already in 2018 I (…)
- To accommodate software as a service, I imagine that we also list service (…)
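A minimal harvester along these premises could look like the following sketch. It is purely illustrative: the use of plain `urllib`, and the dict-based store keyed by identifier, are assumptions for the example, not the actual implementation.

```python
# Minimal codemeta harvester sketch (illustrative assumptions only:
# plain urllib fetching and a dict-based store keyed by identifier).
import json
import urllib.request


def fetch_codemeta(url: str) -> dict:
    """Download and parse a codemeta.json (JSON-LD) document."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def build_tool_store(records: list[dict]) -> dict:
    """Aggregate harvested codemeta records into a simple store,
    keyed by identifier (falling back to name)."""
    store = {}
    for record in records:
        key = record.get("identifier") or record.get("name")
        if key:
            store[key] = record
    return store
```

A real harvester would of course also need to validate records against the codemeta vocabulary and handle fetch failures; the aggregated JSON-LD could then be loaded into a triple store for querying.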
Had a quick call with @menzowindhouwer about this, to discuss possible alignment with the FAIR Datasets track. He wants to expand the already established OAI Harvest Manager with further options to deal with non-OAI-PMH and non-XML based metadata (of which codemeta would be one). Such functionality will be needed anyway for FAIR Datasets. I expressed some concerns regarding complexity when extending that harvest manager to do too much, although it looks well designed and fairly extensible.

We decided to continue on both tracks: I'll implement the simple harvester because it will be easy and fast (and we need results quickly here), while Menzo will continue with the harvest manager because it will be needed in other scopes (FAIR Datasets) anyway. The harvester script I propose may also serve as inspiration/example/proof-of-concept for further development of the OAI Harvest Manager. In the end we can always decide to replace the simpler solution with the more complex one if the latter proves more fruitful.

We'll eventually need further convergence regarding the tool store aspect as well, possibly using the same solution for both tools and data.
Software metadata is often encoded in READMEs. If there is no more formal schema available, we can extract metadata from a README and convert it to codemeta. An existing tool is already available that does precisely this: https://github.com/KnowledgeCaptureAndDiscovery/somef |
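As an illustration of the underlying idea (not of somef's actual, far more sophisticated approach), a naive regex-based extraction of codemeta fields from a README might look like this; the fields and patterns chosen are illustrative assumptions:

```python
# Naive README-to-codemeta sketch (illustrative only; somef uses
# much more advanced extraction than these regexes).
import re


def readme_to_codemeta(readme: str) -> dict:
    """Extract a few codemeta fields from a markdown README."""
    codemeta = {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
    }
    # Assume the first top-level heading is the software name
    heading = re.search(r"^#\s+(.+)$", readme, re.MULTILINE)
    if heading:
        codemeta["name"] = heading.group(1).strip()
    # Assume a "License" section whose first token names the license
    license_match = re.search(r"(?im)^##?\s*licen[cs]e\s*\n+(\S+)", readme)
    if license_match:
        codemeta["license"] = license_match.group(1)
    return codemeta
```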
As mentioned earlier, the current codemeta standard does not feature everything we need for a more service-oriented approach, as it focusses on describing the software source (…)
There's a slight contradiction between:

(…)

and

(…)

A way to solve this is to make the (…)

The question, of course, is whether we can ask this of software developers. We can mitigate this by offering a separate web service, a Software Metadata Extractor or CodeMeta Generator, where developers enter the URL of a repository, magic happens, and a (…)

A final problem is that of synchronising metadata: for example, if developers change the (…)
Those are very good points, yes. I was aware there was a bit of a contradiction and that the requirements might need some tweaking as the tool discovery task progresses. I was also a bit on the fence about how hard the requirement should be. The ownership argument you put forward is a good one, and for CLARIAH software it would be a fair demand to make; if we want to add some CLARIAH-specific vocabulary it might even be inevitable. But for possible external software, and for some flexibility, it helps if the harvester can do the conversion in cases where it wasn't already provided. It also helps prevent the sync issue you describe later.
Yes, the current harvester+conversion implementation I'm working on actually provides that function as well (without the webservice part though). The whole thing should remain simple enough.
I think part of the job of the harvester+conversion is to do some basic validation, so blatant out-of-sync errors are reported. But the syncing issue indeed remains: if users provide an explicit (…)
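Such a basic validation step could be sketched as follows. The version-vs-git-tag comparison is just one illustrative example of a "blatant out-of-sync" check, not the actual validation logic:

```python
# Illustrative out-of-sync check: compare the version declared in a
# codemeta record with the latest git tag of the repository.
def check_version_sync(codemeta: dict, latest_git_tag: str) -> list[str]:
    """Return human-readable warnings for obvious sync errors."""
    warnings = []
    declared = str(codemeta.get("version", "")).lstrip("v")
    tagged = latest_git_tag.lstrip("v")
    if declared and tagged and declared != tagged:
        warnings.append(
            f"version mismatch: codemeta says {declared}, "
            f"latest git tag is {tagged}"
        )
    return warnings
```

The harvester could emit such warnings alongside the harvested records, so maintainers are alerted without the harvest itself failing.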
We need to clearly define the software components, service components and data components for tool discovery, along with the standards we adopt and requirements we want to set for all CLARIAH participants.
All these will be formulated here as part of the Shared Development Roadmap v2: https://github.com/CLARIAH/clariah-plus/blob/main/shared-development-roadmap/epics/fair-tool-discovery.md
It contains an initial proposal, which was already discussed and positively received by the technical committee, but further details remain to be filled in. A workflow schema also needs to be added.
Further discussion can take place in this thread.