Proposal to add readme to schema.org #247
Comments
This feels like a subset of |
I agree this is a type of documentation, but I do find this property very useful by itself. Software repos often have more than one documentation, e.g., a readme, and a Here I fully agree on the overlap in meaning between |
Force11 Codemeta Task Force suggests to use |
@mfenner what do you mean by "readme" should be in the description? A readme file often contains installation instructions and usage examples, which go beyond a description of what a software component does. |
We felt that |
PR implementing change is #260 |
@mfenner @tmorrell, @mfenner, is this decision based on any statistical analysis of common practice? There are plenty of repositories where the readme goes beyond a description of a repository (including license, citation, installation instructions, requirements...). For example, on GitHub there are short descriptions of a repository, 2-3 sentences provided by the authors and usually shown in the top-right of the repo. Then you may have a longer description (which is part of the readme file, I agree), and then you may have separate documentation (e.g. in readthedocs). Treating "readmes" as "descriptions" conflates too many elements about the software under a single property. @tmorrell this issue is about adding readme to schema.org, not about deleting properties from codemeta. When creating a standard, deleting properties is usually very bad practice, because people who have started using codemeta will suddenly have incompatible representations. For example, let's say I have annotated hundreds of repositories, adding "readme" in my current implementation by using codemeta. Now the codemeta version suddenly introduces a breaking change. If there is agreement that a property should no longer be part of the standard, it is usually kept with a deprecated tag. That way older versions will still work. Otherwise, people will not be keen to adopt a standard when a new version may break their hard work. Finally, I answered in this thread in response to an open call for feedback. I feel like this feedback has not been taken into account. It looks like whatever the codemeta TF decided is what is put in the PR. If that's the process this community will follow, then why ask for feedback from the community? |
@dgarijo thank you for your feedback. Let me first respond to the last sentence in your feedback. We spent a lot of time pushing a major update of the codemeta schema, including talking to many people. Your feedback is very valuable, but this does not necessarily mean we follow your feedback. In this case we spent a lot of time discussing this issue in a call yesterday, and also have feedback in the issue comments (from @arfon) that wants this to go in another direction. We are aware of the risk of removing properties, but the major driver is closer alignment with schema.org. |
@mfenner Codemeta is already aligned with schema.org. As far as I understand, the effort is to push properties from codemeta into schema.org, right? Removing properties already affects part of my work; that's why I fear for codemeta adoption if removing properties is going to become common practice. |
I've closed my PR so there can be more discussion. I still think moving everyone to the standard schema.org property |
ok, let's say that there is a repo, e.g.: https://github.com/tensorflow/tensorflow They do have a readme: https://github.com/tensorflow/tensorflow/blob/master/README.md and they do have API documentation: https://www.tensorflow.org/api_docs/ They also have tutorials: https://www.tensorflow.org/tutorials/. And more resources (all these are linked in the readme) Would the |
Since we all agree that readme is a type of documentation, it seems reasonable that someone writing a codemeta file would place a link to a README there. I think we all agree as well that there are 'different kinds of documentation', and perhaps there could be a clear use case to be made for a Meanwhile, I think placing readme in an array of Footnote, but it's not 100% clear to me how schema.org/documentation is typed, but if these are proper URIs, then you could have: ...
"documentation": ["https://www.tensorflow.org/tutorials/", "https://github.com/tensorflow/tensorflow/blob/master/README.md"] And then use these URIs as
So you can effectively already nest readme under documentation and provide quite a lot of additional context to distinguish it from other documentation without moving it outside of the documentation, right? (I wasn't at the meeting either, I'm just sharing my own thoughts here in the spirit of the discussion). |
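As a rough illustration of the nesting described above, here is a minimal sketch in Python that builds such a record with typed documentation entries. The `Readme` type name is hypothetical (no such class is defined in schema.org or CodeMeta as far as this thread establishes), and the `@context` is deliberately left out because which vocabulary should define `documentation` is exactly what is being discussed.

```python
import json

# Hypothetical type label for a readme; it is an assumption made for this
# sketch, not an existing schema.org or CodeMeta class.
README_TYPE = "Readme"

# A metadata fragment (no "@context" on purpose) where each documentation
# entry is a node with its own "@id" and "@type" instead of a bare URL.
metadata = {
    "@type": "SoftwareSourceCode",
    "name": "tensorflow",
    "documentation": [
        {
            "@id": "https://www.tensorflow.org/tutorials/",
            "@type": "CreativeWork",
        },
        {
            "@id": "https://github.com/tensorflow/tensorflow/blob/master/README.md",
            "@type": ["CreativeWork", README_TYPE],
        },
    ],
}

# Serialize to the JSON text that would end up in a metadata file.
serialized = json.dumps(metadata, indent=2)
print(serialized)
```

The readme stays inside `documentation`, and the extra type is the only thing that distinguishes it from the tutorials link.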
@cboettig, For example, look at https://schema.org/Property. It inherits two properties: disambiguatingDescription and description. One is a subproperty of the other. It doesn't mean that a disambiguatingDescription is not a description. However, they are separated because conflating everything under description may not be as useful. Having different types for documentation in an array as you suggest actually solves the problem too. This is similar to the proposal that @moranegg was suggesting, but having the types with classes is cleaner than having role-like properties (in my opinion), because it avoids reification. I can live with this solution :) But we would have to check if the range of documentation is compatible with a list of elements. The disadvantage of using classes and types is that it's a little harder to consume (before, I only had to ask for "readme"; now I have to iterate over the "documentation" and find which of them is of type readme); and that in reality we are placing the label somewhere else (instead of properties they become classes). Plus we would have to formally define the types as subclasses of "CreativeWork" |
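To make the consumption cost mentioned above concrete, the following is a small sketch of what a consumer would have to do once readme is a type rather than a property. The `Readme` type name and the sample record are assumptions for illustration only; the helper accepts both single values and arrays, since JSON-LD allows either.

```python
def as_list(value):
    """Normalize a JSON-LD value that may be missing, single, or an array."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_readmes(metadata, readme_type="Readme"):
    """Return the identifiers of documentation entries typed as a readme.

    `readme_type` is a hypothetical type name, not an existing class.
    Bare URL strings carry no type information and are therefore skipped.
    """
    urls = []
    for entry in as_list(metadata.get("documentation")):
        if not isinstance(entry, dict):
            continue
        if readme_type in as_list(entry.get("@type")):
            identifier = entry.get("@id") or entry.get("url")
            if identifier:
                urls.append(identifier)
    return urls


# Sample record, invented for this sketch.
sample = {
    "documentation": [
        "https://www.tensorflow.org/api_docs/",
        {"@id": "https://www.tensorflow.org/tutorials/", "@type": "CreativeWork"},
        {
            "@id": "https://github.com/tensorflow/tensorflow/blob/master/README.md",
            "@type": ["CreativeWork", "Readme"],
        },
    ]
}

readmes = find_readmes(sample)
print(readmes)
```

With a dedicated property the same lookup would be a single key access, which is the trade-off being weighed here.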
I believe the JSON-LD spec allows basically anything to be array-valued, and it should produce sensible RDF triples. Not sure that schema.org adds any restrictions to that. I certainly acknowledge the difficulties in consuming the data, but that is the blessing and the curse of adopting the linked data model. Any link can be expanded to information about that link. (even if I think whether we need additional refinements of CreativeWork to describe different types of documentation ought to be use-case driven. For the moment, I think a working definition of README could be defined as a filename regex, which may be more robust than assuming every metadata provider explicitly annotates it as such. |
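A filename-based working definition along these lines could be sketched as follows. The exact pattern is an assumption chosen for illustration (the name `readme`, case-insensitive, with an optional single extension); the thread does not specify one.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed pattern: "readme" in any case, optionally followed by one extension
# such as .md, .txt or .rst. This is one possible choice, not an agreed rule.
README_PATTERN = re.compile(r"^readme(\.[a-z0-9]+)?$", re.IGNORECASE)


def looks_like_readme(url):
    """Guess whether a documentation URL points at a README file by its name."""
    filename = PurePosixPath(urlparse(url).path).name
    return bool(README_PATTERN.match(filename))


candidates = [
    "https://www.tensorflow.org/tutorials/",
    "https://github.com/tensorflow/tensorflow/blob/master/README.md",
]
detected = [url for url in candidates if looks_like_readme(url)]
print(detected)
```

This works on plain URL arrays without requiring any extra annotation from the metadata provider, at the cost of missing readmes with unconventional names.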
Answering per paragraph:
|
see the new discussion to come to a consensus: #335 |
This issue is part of the SCIWG CodeMeta task force aiming to add all additional properties into schema.org.
Text proposal for adding the readme property in schema.org:
In modern software engineering, having a readme file in the root of the source code project is good practice.
It is usually the place for the software description, and many guides exist on how to write a good README.
The metadata describing the software can gain a relevant property with a link to the readme file, which is already possible with the CodeMeta vocabulary.
We suggest using a PID for this property, which can be a URI or a SWHID (linking to an archived copy of the readme):
https://archive.softwareheritage.org/swh:1:cnt:ed2eedca46c719144d2485d1a2c3d25c21b2bcd3;origin=https://doi.org/10.5201/ipol.2018.236;visit=swh:1:snp:e0674ffb865529b05511808d1ee7ba5d72346009;anchor=swh:1:rev:fad7a0486bb7a7cfdbb1c28e28a64f2d3f5e0df9;path=/mlheIPOL/README.txt/
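For readers unfamiliar with the identifier in the link above, here is a minimal sketch that splits a SWHID into its core part and its `key=value` qualifiers. The function name and the returned dictionary layout are choices made for this example, and the parser is intentionally simplified (it does not percent-decode qualifier values).

```python
import re

# Core identifier: scheme, version 1, object type, 40 hex digits.
CORE_PATTERN = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})$")


def parse_swhid(swhid):
    """Split a SWHID into object type, object id, and qualifiers."""
    core, *qualifier_parts = swhid.split(";")
    match = CORE_PATTERN.match(core)
    if match is None:
        raise ValueError(f"not a valid core SWHID: {core!r}")
    qualifiers = {}
    for part in qualifier_parts:
        key, _, value = part.partition("=")
        qualifiers[key] = value
    return {
        "object_type": match.group(1),
        "object_id": match.group(2),
        "qualifiers": qualifiers,
    }


# The identifier from the archive link above, without the host prefix.
example = (
    "swh:1:cnt:ed2eedca46c719144d2485d1a2c3d25c21b2bcd3"
    ";origin=https://doi.org/10.5201/ipol.2018.236"
    ";visit=swh:1:snp:e0674ffb865529b05511808d1ee7ba5d72346009"
    ";anchor=swh:1:rev:fad7a0486bb7a7cfdbb1c28e28a64f2d3f5e0df9"
    ";path=/mlheIPOL/README.txt/"
)
parsed = parse_swhid(example)
print(parsed["object_type"], parsed["qualifiers"]["path"])
```

The `cnt` object type identifies the archived file content itself, while the qualifiers record where and in which revision that content was found.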
A proposal for the property's description:
inspired by https://www.wikidata.org/wiki/Q539662