Adopt & release the schema-aligned v2 codemeta.jsonld context file? #142

Closed
cboettig opened this issue May 13, 2017 · 7 comments

@cboettig
Member

As has been discussed in #134 and related issues, I propose we adopt and release a new version of the codemeta.jsonld context file that is more closely aligned with Schema.org.

To recap, the main reasons for the switch are:

A. to better align with, and be more interoperable with, related efforts (e.g. paper.json, DataCite's schema-based version, etc.),
B. to permit a basic codemeta representation entirely within the schema.org vocabulary,
C. to facilitate tool development, particularly in cases where schema.org terms are already in use (e.g. #91).

The new context file maintains most concepts from the original, but chooses more existing schema.org properties to express them instead of inventing new terms. Type choices are also more closely aligned to schema.org types, avoiding the use of the equivalent xsd types. To help review these changes, I've created an annotated version of the current v1 context file.

I've also posted the properties and types on the codemeta.github.io terms page (analogous to the schema.org pages for Types, though the webpage doesn't have unique endpoints for the new codemeta terms yet). These tables are probably the easiest way to review the properties taken from schema.org and the new properties introduced by codemeta: http://codemeta.github.io/terms/

Note that in addition to re-mapping some old codemeta terms to their schema equivalents, I've listed additional terms in the schema.org namespace that may be relevant to codemeta. See #139. Please weigh in on whether or not we want any of these new terms.

@cboettig
Member Author

cboettig commented May 17, 2017

Thinking more about how to review the proposed changes, I think it's better to break them out, as we may want some changes but not others. Here's my revised summary of all the proposed changes.

Additionally, I've crosswalked codemeta-v1 into v2, and I illustrate how we can (approximately) transform any JSON-LD in codemeta-v1 context into the v2 context using Expansion and Compaction.
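
As a rough illustration of the expand-then-compact idea, here is a minimal Python sketch. It is not the JSON-LD algorithm itself (a real workflow would use a JSON-LD processor); it only handles flat documents with simple term-to-IRI contexts. The two context dicts are tiny hand-written excerpts, and the IRIs assigned to the v1 terms are assumptions for illustration, not copied from the actual v1 context file.

```python
SCHEMA = "http://schema.org/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical excerpts of the two contexts (term -> IRI). The IRIs chosen
# for the v1 terms are assumptions for illustration only.
V1_CONTEXT = {
    "tags": SCHEMA + "keywords",
    "downloadLink": SCHEMA + "downloadUrl",
    "title": DCTERMS + "title",
}
V2_CONTEXT = {
    "keywords": SCHEMA + "keywords",
    "downloadUrl": SCHEMA + "downloadUrl",
    "name": SCHEMA + "name",
}


def expand(doc, context):
    """Replace each known term with its full IRI; drop unknown terms."""
    return {context[key]: value for key, value in doc.items() if key in context}


def compact(expanded, context):
    """Replace each IRI with the context's term for it, if one exists."""
    iri_to_term = {iri: term for term, iri in context.items()}
    return {iri_to_term.get(iri, iri): value for iri, value in expanded.items()}


v1_doc = {
    "title": "mypkg",
    "tags": ["metadata", "software"],
    "downloadLink": "https://example.org/mypkg.tar.gz",
}

# Expand against the v1 context, then compact against the v2 context.
v2_doc = compact(expand(v1_doc, V1_CONTEXT), V2_CONTEXT)
print(v2_doc)
```

Under these assumed mappings, tags and downloadLink come out as keywords and downloadUrl because both contexts point at the same IRI. The dcterms title has no v2 term with the same IRI, so it stays as a full IRI, which is why the transformation is only approximate.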

Minor changes

  • Accept?

In several cases we have invented a new term when schema.org provides a functionally equivalent one. In some cases our term already declares itself identical to the schema term, in some cases we have used a dcterms property identical to a schema property. I think the following terms could be easily re-aligned:

| codemeta | schema | comments |
| --- | --- | --- |
| licenseId | license | Schema uses URL, suggest we do so too |
| controlledTerms | applicationCategory | |
| tags | keywords | Seems "keywords" should suffice? |
| downloadLink | downloadUrl | Not sure why we've used something different |
| person | Person | types (classes) are upper case in schema |
| organization | Organization | ditto |
| Code | SoftwareSourceCode | standardize, though our notion includes SoftwareApplication terms |
| identifer | identifier | codemeta currently uses dcterms version |
| title | name | title comes from dcterms |
| packageSystem | provider | see #140 |

Dependencies

  • Accept?

This one is slightly more subtle, since I think we would adopt the schema term but change the type to something more expressive:

| codemeta | schema | comments |
| --- | --- | --- |
| depends | softwareRequirements | the schema term is a list of URLs or Text |

If we re-type this as SoftwareApplication, it can do everything we currently put under depends (e.g. name, version, provider).
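
For illustration, a dependency expressed as a SoftwareApplication node might look like the sketch below. The package names, version string, and provider are made up, and dependency_names is a hypothetical helper; only the property names (softwareRequirements, name, version, provider) come from the proposal.

```python
import json

# Hypothetical record; all values are invented for illustration.
record = {
    "@type": "SoftwareSourceCode",
    "name": "mypkg",
    "softwareRequirements": [
        {
            "@type": "SoftwareApplication",
            "name": "somedep",
            "version": ">= 1.0",
            "provider": {"@type": "Organization", "name": "Example Registry"},
        }
    ],
}


def dependency_names(rec):
    """List the names of all declared software requirements."""
    return [dep["name"] for dep in rec.get("softwareRequirements", [])]


print(json.dumps(record, indent=2))
print(dependency_names(record))
```

A plain URL or Text value, as schema.org defines the property, could not carry the version and provider alongside the name in this way.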

Agent/role structure

  • Accept?

See #135. This is perhaps the most useful change. The current proposal is a bit semantically troubled, but this also makes it very difficult to use json-ld tools (expansion & compaction). More immediately, it's rather hard to represent in a tabular crosswalk. I propose we adopt the relevant agent (Person/Organization) roles already defined by schema.org for now, and extend codemeta with additional roles when we find an essential role is missing (e.g. codemeta:maintainer).

Use Schema.org Types

  • Accept?

We also re-type a lot of schema.org terms using XSD types, which seems needlessly complicated; e.g. we use xsd:string instead of schema.org:Text.

When to declare Type?

  • Accept?

A more subtle issue is when we should declare a Type on a node. I propose that when our context refers to an existing schema term and we intend the same Type it already has in Schema.org, we do not explicitly type it, e.g. we should state:

"author": "schema:author"

rather than

"author": {"@id": "schema:author", "@type": "schema:Person" }

but when introducing a new term, we should explicitly type it:

"readme": {"@id": "codemeta:readme", "@type": "schema:URL" }

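The typing rule above can be sketched as a small helper that builds context entries. The function name context_entry and its parameters are hypothetical; the two resulting entries match the author and readme examples given above.

```python
def context_entry(term, prefix, new_type=None):
    """Build a context definition for a term.

    Reused schema.org terms are mapped to a plain IRI string (no @type),
    while newly introduced terms get an explicit @type.
    """
    iri = f"{prefix}:{term}"
    if new_type is None:
        return iri
    return {"@id": iri, "@type": new_type}


context = {
    # Existing schema.org term: keep whatever type schema.org gives it.
    "author": context_entry("author", "schema"),
    # New codemeta term: declare the type explicitly.
    "readme": context_entry("readme", "codemeta", new_type="schema:URL"),
}
print(context)
```
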
Using schema.org properties not already associated with the original schema.org type

  • Accept?

For instance, schema:programmingLanguage doesn't include version. Likewise, we adopt quite a few terms from http://schema.org/SoftwareApplication as metadata for SoftwareSourceCode objects, e.g. schema:softwareRequirements. It appears this is accepted practice.

Drop redundant / confusing terms

  • Accept

These terms don't crosswalk to any existing schema, and I'm still not sure what they mean.

| codemeta | comments |
| --- | --- |
| isAutomatedBuild | I think this is the same as having continuous integration, but maybe it means automatically building binaries? See #136 |
| uploadedBy | Uploaded where? A package (and/or package metadata) is often in many places at once: GitHub, CRAN, Zenodo, a metadata record passed to DataCite. I think this means "maintainer" most often, but I'm really not sure. |
| Relationship and associated properties | see #91 |

Inherit additional terms from schema.org

  • Accept?

The Schema.org SoftwareSourceCode and SoftwareApplication types include a collection of terms not found in the original concept table, at least some of which do crosswalk to certain schemas (#139):

  • releaseNotes
  • runtimePlatform
  • fileSize
  • applicationSubCategory
  • supportingData
  • softwareVersion
  • sponsor
  • sameAs
  • encoding
  • processorRequirements
  • memoryRequirements
  • fileFormat
  • storageRequirements
  • permissions
  • targetProduct

Still need a proposal for...

softwarePaperCitationIdentifiers, see #144

@gothub
Contributor

gothub commented May 18, 2017

If using schema.org types vs our own codemeta types will facilitate wider use, then I think we should do it.

Also, if translating codemeta documents from an author’s schema to a consumer's schema (JSON-LD expand/compact, from Manu Sporny’s video) is a goal then the author and consumer need to be using the same set of 'universal' types, as the video states. In this context, using the already established 'universal' types (from the widely used schema.org) makes sense.

The remainder of my comments are under the same section titles as used above. If these issues can be addressed or deemed to be not important, then I say proceed with the v2 changes.

Minor changes

The type schema:name has a less precise meaning than dc:title. This is the same for v1 packageSystem vs v2 provider.

Dependencies

In v1, depends and suggests had the same sub-types. In v2, softwareRequirements and suggests now have different sub-types from each other, so the meaning of suggests is now less clear. Also, shouldn’t suggests have @type: SoftwareApplication, as softwareRequirements does?

Agent/role structure

I think we still need isRightsholder and mustBeCited or their equivalents, but I'm not sure how to incorporate these with v2. In addition, if we are ok with potentially having to create new revisions of codemeta as we see the need for additional roles, then this change is fine with me.

Use schema.org types

The XSD types provided the potential to validate values. The schema.org types only loosely provide this. So is validation important to us?

When to declare Type?

Yep, this is a reasonable approach

Using schema.org properties not already associated with the original schema.org type

Not sure that we could prevent/control this even if we wanted to. The positive point about this is that it will assist us in showing where our schema and especially schema.org schema is lacking. The additional types that authors use should prompt requests to schema.org to update their schema hierarchy to accommodate the new usage (position in hierarchy) of these types.

Drop redundant / confusing terms

Well, if we aren’t 100% certain of what these terms mean or how they should be used, then we should drop them.

Inherit additional terms from schema.org

For SoftwareSourceCode and SoftwareApplication, allowing sub-properties of these types should be allowed. However, should we allow additional schema.org types elsewhere in codemeta documents then, e.g. at the top level? For example, what if someone wants to include a WebPage or audience or comment at the top level. Will this dilute the meaning of codemeta documents?

@cboettig
Member Author

@gothub Thanks, nicely put explanation and very helpful feedback. A few replies to the detailed issues:

  • Thanks, I meant for softwareRequirements and suggests to both be typed to schema:SoftwareApplication. Perhaps suggests should be renamed to something more consistent, e.g. softwareSuggestions or suggestedRequirements?

  • In what sense is schema:Integer a looser type than xsd:integer? Even if we use JSON Schema for validation, it looks like JSON Schema defines its own vocabulary for data types that is different from XSD (and seems to lack date/time types). Note that JSON-LD is very careful about type casting, so if we define things with specific XSD types, we lose a degree of compatibility with data that is untyped or that uses the schema type. (E.g. the compaction algorithm will always leave the full "age": { "@value": 4, "@type": "xsd:integer" } from expansion instead of simplifying as expected to "age": 4.)
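
The type-casting behaviour can be sketched roughly as follows. This is a simplified stand-in for JSON-LD value compaction (it ignores language tags and term selection), and compact_value is a hypothetical helper, not a function from any JSON-LD library.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def compact_value(value_obj, term_type=None):
    """Simplified value compaction.

    A value object collapses to its bare @value only when it is untyped,
    or when its @type matches the type the context declares for the term.
    Otherwise the full value object is kept.
    """
    value_type = value_obj.get("@type")
    if value_type is None or value_type == term_type:
        return value_obj["@value"]
    return value_obj


typed = {"@value": 4, "@type": XSD_INTEGER}

# The context does not declare a type for the term: the object stays verbose.
print(compact_value(typed))
# The context declares the same type for the term: the value simplifies.
print(compact_value(typed, term_type=XSD_INTEGER))
# Untyped data simplifies as well.
print(compact_value({"@value": 4}))
```

The mismatch case is the compatibility cost: data typed one way and a context typed another way still round-trips, but the compacted form no longer looks like plain JSON.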

Agent/role

  • I should have highlighted in my summary: note that schema:CreativeWork already has copyrightHolder as a property, so we aren't dropping that, just remapping it as we do with the rest of agent. Also, I think the use of schema:author as a role (as opposed to schema:contributor) means precisely "mustBeCited". Note this is precisely the distinction given to the aut role in R codes, for instance. Only aut roles are listed as cited authors if you call citation(<packagename>).
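
To make the author/contributor distinction concrete, here is a small sketch of how a consumer might pick out the people to cite. The record and the cited_authors helper are hypothetical, and reading schema:author as "must be cited" is the interpretation suggested in this thread, not something schema.org itself mandates.

```python
def as_list(value):
    """JSON-LD allows a single node or a list of nodes; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def cited_authors(record):
    """Names of agents in the author role; contributors are not included."""
    return [
        " ".join(part for part in (a.get("givenName"), a.get("familyName")) if part)
        for a in as_list(record.get("author"))
    ]


# Invented example record using the schema.org agent roles.
record = {
    "@type": "SoftwareSourceCode",
    "name": "mypkg",
    "author": [{"@type": "Person", "givenName": "Ada", "familyName": "Example"}],
    "contributor": {"@type": "Person", "givenName": "Bo", "familyName": "Sample"},
    "copyrightHolder": {"@type": "Organization", "name": "Example Lab"},
}
print(cited_authors(record))
```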

types from other properties

  • Right, we can't prevent this, and schema.org seems fine with it too. Maybe programmingLanguage will get a version property in schema.org one day.

Additional types from Schema.org types

Right, for this reason I've only proposed adding those terms specific to SoftwareApplication or SoftwareSourceCode. My thinking is that if one found a JSON-LD schema:SoftwareApplication annotation in the wild, it would be nice if we could reasonably expect it to be already valid in the CodeMeta context, even if it used a term like runtimePlatform. I didn't add terms like comment or audience, so these would have to be explicitly namespaced (schema:comment, etc.) for someone to use them. Of course nothing prevents a user from adding this data with additional namespaces; that's true regardless of any decision we make. Unrecognized properties should not break any application consuming JSON-LD but should just be ignored, just like you can add additional fields to an R package DESCRIPTION or a .gemspec etc. without breaking anything.
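
A rough sketch of that behaviour, assuming a much-simplified expansion step: terms defined in the context are mapped to IRIs, prefixed names such as schema:comment are resolved through the prefix, and anything else is silently dropped. The TERMS and PREFIXES tables are a hypothetical excerpt, not the real codemeta context.

```python
# Hypothetical excerpt of a context: bare terms and one namespace prefix.
PREFIXES = {"schema": "http://schema.org/"}
TERMS = {
    "name": "http://schema.org/name",
    "runtimePlatform": "http://schema.org/runtimePlatform",
}


def expand_key(key):
    """Map a property name to a full IRI, or None if it is unrecognized."""
    if key in TERMS:
        return TERMS[key]
    prefix, sep, suffix = key.partition(":")
    if sep and suffix.startswith("//"):
        return key  # already an absolute IRI such as http://...
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + suffix
    return None


def expand(doc):
    """Keep only properties that resolve to an IRI; ignore the rest."""
    expanded = {}
    for key, value in doc.items():
        iri = expand_key(key)
        if iri is not None:
            expanded[iri] = value
    return expanded


doc = {
    "name": "mypkg",
    "runtimePlatform": "R 3.4",
    "schema:comment": "explicitly namespaced, so it survives",
    "audience": "bare term not in the context, so it is ignored",
}
print(expand(doc))
```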


Okay, I think that hits everything?

@gothub
Contributor

gothub commented May 19, 2017

@cboettig

  • Renaming suggests to softwareSuggestions sounds good to me.
  • My point regarding types is that XSD types are more rigorously defined (W3C recommendation) than schema.org types (community consensus?). Also, XSD provides for derived types (e.g. "positiveInteger"), but maybe this is irrelevant, as enforcing any type declaration is dependent on the software reading a document that references XSD datatypes. You have a really good point about compatibility, which is what v2 seems to be largely about.

You have addressed all of my concerns, thanks.

cboettig added a commit that referenced this issue May 19, 2017
@cboettig
Member Author

Great, thanks. Good, let's use softwareSuggestions (it's a noun like most other properties and parallels softwareRequirements nicely). I've amended the version 2 candidate accordingly.

Ah, right, that's a good point about XSD's precise type definitions. Yeah, I think it's still worth sticking with schema for compatibility.

@cboettig
Member Author

@mbjones Can you generate a new DOI to point to the v2.0 tag, https://github.com/codemeta/codemeta/releases/tag/2.0 ?

Can you let me know the DOI and I'll post it in the release notes when I create the 'release' for the tag?

Thanks!

@cboettig
Member Author

Closing stale issue, DOI was created in June: https://doi.org/10.5063/schema/codemeta-2.0
