Support for schema.org JSON-LD as a variation? #75

Closed
danbri opened this issue Dec 1, 2016 · 26 comments

@danbri
commented Dec 1, 2016

There is a DCAT-based structure in schema.org for representing datasets, see http://schema.org/Dataset and nearby. We are using it at Google for dataset search, see https://developers.google.com/search/docs/data-types/datasets - would it be possible/appropriate to include support for this within your extension somehow, or is it better to have a separate addon for this?

The syntax could be RDFa, Microdata or (probably best) JSON-LD, but should ideally be embedded in the main HTML page for each dataset. See http://developers.google.com/structured-data/testing-tool/ for Google's related Structured Data Testing Tool, which would show whether Google finds the markup.
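As a rough illustration of what "embedded in the main HTML page" could look like, here is a minimal sketch that builds a schema.org Dataset description and wraps it in a JSON-LD script tag. The dataset values, the example.org URLs and the `render_jsonld_script` helper are all hypothetical placeholders, not part of any existing extension.

```python
import json

# Hypothetical dataset description; every value below is a placeholder.
dataset = {
    "@context": "http://schema.org",
    "@type": "Dataset",
    "name": "Example dataset",
    "description": "A placeholder description of the dataset.",
    "url": "https://example.org/dataset/example",
    "distribution": [
        {
            "@type": "DataDownload",
            "contentUrl": "https://example.org/dataset/example.csv",
        }
    ],
}


def render_jsonld_script(data):
    """Wrap a JSON-LD document in a script tag for embedding in HTML."""
    # Escape "</" so a value can never close the script element early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n%s\n</script>' % body


snippet = render_jsonld_script(dataset)
print(snippet)
```

The resulting snippet can be pasted into a page and checked with the testing tool mentioned above.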

@amercader

Member

commented Dec 1, 2016

I think this is a brilliant idea, and certainly several people have expressed interest in schema.org integration. We definitely want to go for JSON-LD, or whatever can be embedded separately in the headers or elsewhere, so that it does not affect custom templates in other extensions.

A couple of questions (sorry if these sound silly):

As for where the plugin should live, in this extension or elsewhere, I don't really mind. Depending on the implementation, it is probably going to use some of the existing helpers to map standard CKAN fields to dcat/schema.org, so having it here would be a good place to start.

@danbri

Author

commented Dec 2, 2016

Thanks for the positive response! :) Schema.org's dataset structure is very similar to DCAT, but instead of using multiple Semantic Web vocabularies we re-used existing schema.org terms. If someone has a mainstream DCAT description of a dataset, it ought to be relatively straightforward to re-present it as a Schema.org Dataset. However there are various DCAT extensions / application profiles, DCAT-AP, -Geo, -Stat etc, which will need more discussion. I'm creating a W3C Community Group as a place for that interop work.

As far as Google goes, for the Science Search effort around https://developers.google.com/search/docs/data-types/datasets we'll try to handle at least some native DCAT, at least in file formats which Google already parses (RDFa, Microdata, JSON-LD) and (as you anticipate) when embedded within HTML. There are some techniques (hacks?) that use Javascript to re-format things Google doesn't understand into something that is easier for Google to consume, via injecting into the page DOM. This is interesting for exploration and prototyping but is probably not the best general approach. Here is a quick example: http://danbri.org/2016/dcat2sdo/testme.html

Regarding where plugin lives, there are some other experiments with schema.org addons but my feeling is that it would be best for publishers if we collaborated on a common approach, so they install once and get all the variants with minimal hassle. But we could revise that if you decide it is too much bloat or complexity.

To get started with adding JSON-LD schema.org into the HTML pages, could you share a few pointers or examples here for those unfamiliar with the codebase and software architecture?

@rossjones

Contributor

commented Dec 2, 2016

@amercader

Member

commented Dec 6, 2016

@danbri sorry for the late reply

Perhaps you can help clarify the benefits of adding support for Schema.org representations. There seems to be an assumption among data publishers that adding it will help dataset pages to be found more easily on the main Google search results. Can you confirm this is the case? How does that relate to the Science Search effort that you mention?

Just to be clear, I think this feature is worthwhile regardless of whether it helps datasets fare better in Google search results, but obviously it has a much greater appeal for users if that is true.

Let's keep schema.org support in this extension for now, in a separate plugin, so users just have to enable ckan.plugins = dcat schemaorg ... to have them both active. If the schemaorg plugin gets enough traction we may consider renaming the extension to something more generic than dcat, or moving it to a separate extension.

In terms of implementation, here's a rough spec in case someone has time to give it a go:

  • New SchemaOrgProfile class inheriting from ckanext.dcat.profiles.RDFProfile that implements a graph_from_dataset() method. This will be very similar to the DCAT one but simplified and adapted to the schema.org terms.

  • New schemaorg_dataset_show action mirroring the dcat_dataset_show one but calling the serializer with the profile defined in the previous point and format json-ld.

  • A helper function that calls the action function above

  • As @rossjones rightly says, adding a call to this helper function on the read_base.html template (if the plugin is active).
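The shape of this spec can be sketched in plain Python. This is only an outline under loud assumptions: `RDFProfileStandIn` is a stand-in written for this example and is NOT the real `ckanext.dcat.profiles.RDFProfile` (which writes to an RDFLib graph), the `(dataset_dict, dataset_ref)` signature is assumed, and the action and helper here are ordinary functions rather than registered CKAN actions or template helpers.

```python
import json

SCHEMA = "http://schema.org/"


class RDFProfileStandIn(object):
    """Stand-in base class for illustration only (not the real RDFProfile)."""

    def __init__(self):
        # The real profile adds triples to an RDFLib graph; a list is enough here.
        self.triples = []

    def add(self, subject, predicate, value):
        self.triples.append((subject, predicate, value))


class SchemaOrgProfile(RDFProfileStandIn):
    """Maps a few CKAN dataset fields to schema.org terms."""

    def graph_from_dataset(self, dataset_dict, dataset_ref):
        self.add(dataset_ref, "@type", SCHEMA + "Dataset")
        for ckan_key, term in (("title", "name"),
                               ("notes", "description"),
                               ("url", "url")):
            value = dataset_dict.get(ckan_key)
            if value:  # skip missing or empty fields
                self.add(dataset_ref, SCHEMA + term, value)


def schemaorg_dataset_show(dataset_dict, dataset_ref):
    """Hypothetical action: serialise one dataset using the profile above."""
    profile = SchemaOrgProfile()
    profile.graph_from_dataset(dataset_dict, dataset_ref)
    doc = {"@id": dataset_ref}
    for _subject, predicate, value in profile.triples:
        doc[predicate] = value
    return json.dumps(doc, sort_keys=True)


def structured_data(dataset_dict, dataset_ref):
    """Hypothetical template helper: simply delegates to the action."""
    return schemaorg_dataset_show(dataset_dict, dataset_ref)
```

A template would then only need to call the helper and print its result inside a JSON-LD script tag, keeping the mapping logic out of the template.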

@danbri

Author

commented Dec 6, 2016

It's unusual for Google to talk much about search feature plans in advance, but in this case I can say with confidence "we are still figuring out the details!", and that the shape of actual real-world data will be a critical part of that. That is why we put up the documentation as early as possible. If all goes according to plan, we will indeed make it substantially easier for people to find datasets via Google; whether that is via the main UI or a dedicated interface (or both) is yet to be determined. Dataset search has various special challenges, which is why we need to be non-committal on the details at this stage, and why we hope publishers will engage with the effort even if it's in its early stages...

BTW somewhat related if vague - a schema.org position paper from a few years ago at the W3C workshop we held in London, https://www.w3.org/2013/04/odw/odw13_submission_53.pdf

I like the roadmap you outline above, it provides for some sensible modularity while still allowing these complementary efforts to share a common software distribution.

@amercader

Member

commented Dec 7, 2016

@danbri thanks for the answer, makes perfect sense.

@danbri

Author

commented Dec 14, 2017

Any thoughts on how we might progress this?

@metaodi

Member

commented Dec 14, 2017

I'd like to work on this, sounds like an interesting challenge for the upcoming holidays. I would use @amercader 's spec from above.

@danbri

Author

commented Dec 14, 2017

@metaodi fantastic :) if you can tag this issue #75 on issues / pull requests we can follow along here. Or feel free to give me a shout if you have any questions - here or danbri@google.com

@Acasovan


commented Dec 14, 2017

Hey @danbri over the last year, @amercader did lots of work on this. The documentation for the DCAT extension for CKAN is at https://github.com/ckan/ckanext-dcat.

Documentation:
Install instructions (https://github.com/ckan/ckanext-dcat#installation)
Credits (https://github.com/ckan/ckanext-dcat#acknowledgements)
Dataset field mapping (https://github.com/ckan/ckanext-dcat#rdf-dcat-to-ckan-dataset-mapping)
Customization instructions (https://github.com/ckan/ckanext-dcat#writing-custom-profiles)

For the Government of Canada @wardi @TkTech have done great work to apply this extension, and map the rest of the fields to our metadata schema (https://github.com/open-data/ckanext-canada/blob/master/ckanext/canada/dcat.py)

@danbri

Author

commented Jan 3, 2018

Happy new year, all! What happens next? :)

@Acasovan


commented Jan 4, 2018

Hey! Met with Natasha a couple weeks ago. I’m going to finish the documentation, we’ll work to add into core, and then I’m going to do some work here in Canada to help support other govs at the provincial, territorial, and municipal level to also do the markup.

@metaodi

Member

commented Jan 4, 2018

I'm still planning to implement it, but my family kept/keeps me busy during the holidays 😉 I'm starting next week with this

metaodi added a commit to opendata-swiss/ckanext-dcat that referenced this issue Jan 16, 2018

@danbri

Author

commented Jan 17, 2018

How do these efforts (@Acasovan, @metaodi 's) compare? Are they complementary? Anything that could be shared between them?

@metaodi

Member

commented Jan 17, 2018

@danbri my understanding is that our efforts are complementary.

I'm finishing up the mapping this week, so that we can

  • output any dataset as schema.org Dataset JSON-LD
  • choose which profile to use when requesting a dataset (i.e. being able to provide both DCAT-AP and schema.org on the same instance)
  • add the needed markup to the template so that the structured data is in the HTML

Moving this to core is IMHO something for later.

Side note: while I'm implementing the mapping in this extension, I'm adapting this for DCAT-AP Switzerland in ckanext-switzerland to make sure the code here can easily be extended by existing DCAT-AP profiles (e.g. to add multilingual fields).

metaodi added a commit to opendata-swiss/ckanext-dcat that referenced this issue Jan 18, 2018

[ckan#75] Add missing dataset fields
The SchemaOrg schema provides a number of methods that could be
overridden by subclasses if they need a slightly different mapping to
schema.org (e.g. from one of the DCAT AP standards).

With those methods in place it should be relatively easy to implement a
schema.org mapping for different standards.
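The idea of overridable mapping methods can be sketched as follows. The class and method names are hypothetical and chosen for this example only; they do not mirror the actual code in the commit. The subclass assumes a multilingual title stored as a dict keyed by language code.

```python
class SchemaOrgMapping(object):
    """Base mapping: one small method per schema.org property."""

    def name(self, dataset):
        return dataset.get("title")

    def description(self, dataset):
        return dataset.get("notes")

    def to_dict(self, dataset):
        doc = {"@type": "Dataset"}
        for key, value in (("name", self.name(dataset)),
                           ("description", self.description(dataset))):
            if value:  # leave out properties with no value
                doc[key] = value
        return doc


class MultilingualMapping(SchemaOrgMapping):
    """Variant for schemas where the title is a dict keyed by language."""

    def __init__(self, language):
        self.language = language

    def name(self, dataset):
        # Only this method changes; the rest of the mapping is inherited.
        title = dataset.get("title") or {}
        return title.get(self.language)
```

Because each property has its own method, a profile for another standard only overrides the pieces that differ.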

metaodi added a commit to opendata-swiss/ckanext-dcat that referenced this issue Jan 18, 2018

@danbri

Author

commented Jan 18, 2018

Thanks @metaodi. @Acasovan - how does that relate to your efforts? Your notes above seem more about the DCAT aspect than Schema.org; did you address both?

metaodi added a commit to opendata-swiss/ckanext-dcat that referenced this issue Jan 19, 2018

metaodi added a commit to opendata-swiss/ckanext-dcat that referenced this issue Jan 19, 2018

metaodi added a commit to opendata-swiss/ckanext-dcat that referenced this issue Jan 19, 2018

@metaodi

Member

commented Jan 23, 2018

@amercader you wrote

adding a call to this helper function on the read_base.html template (if the plugin is active).

How would you implement that? Check in the template if the helper is defined and then call it (i.e. add a new template helper function in a new plugin)? Or rather provide another template directory in the new plugin containing read_base.html with the call?

@metaodi

Member

commented Jan 24, 2018

I finally managed to finish the implementation including the addition of the structured data to the frontend. @amercader please review 😉

metaodi added a commit to opendata-swiss/ckanext-dcat that referenced this issue Jan 31, 2018

metaodi added a commit to opendata-swiss/ckanext-dcat that referenced this issue Jan 31, 2018

metaodi added a commit to opendata-swiss/ckanext-dcat that referenced this issue Jan 31, 2018

metaodi added a commit to opendata-swiss/ckanext-dcat that referenced this issue Jan 31, 2018

@danbri

Author

commented Feb 7, 2018

@Acasovan - do you have any more thoughts on how these efforts relate to each other?

amercader added a commit that referenced this issue Feb 16, 2018

@metaodi

Member

commented Feb 16, 2018

#108 has been merged and the new release 0.0.7 of ckanext-dcat contains the implementation of the schema.org profile and the structured data.

@danbri

Author

commented Feb 16, 2018

Great, is there a test server running this code publicly online anywhere?

@amercader

Member

commented Feb 16, 2018

@danbri as @metaodi mentioned, I've merged his PR that adds support for displaying schema.org-based structured data in dataset pages (in the form of a JSON-LD snippet). I've released a new version of this extension and deployed it on the demo site, so you should see the snippet if you check the source code, e.g.:

https://demo.ckan.org/dataset/newcastle-city-council-payments-over-500

The mapping is very thorough and covers many fields (there's an example in the README and it is defined here). It is compatible with the other RDF-based representations provided by this extension so maintainers can choose to provide one or both at the same time (like in demo.ckan.org).

I'm not sure about @Acasovan's plans regarding schema.org and whether there are two complementary efforts, but one big benefit of this implementation is that as of now users can add it to any existing CKAN instance (from 2.4 upwards). Anything that would go into core would only be available on the next release, as new features such as this one are unlikely to be backported.

As it is, this implementation cannot be ported to CKAN core as it relies heavily on this extension's profiles and the underlying RDFlib. IMO if schema.org was to be added to CKAN core it would be a more lightweight implementation based on a direct mapping of the CKAN metadata to the structured data equivalent.
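A lightweight direct mapping of that kind might look roughly like the sketch below, which needs no RDF library at all. It is an assumption about what such code could look like, not anything that exists in CKAN core; `ckan_to_schemaorg` is a hypothetical name, and the choice of CKAN fields (`title`, `notes`, `tags`, `license_url`, `metadata_modified`) is illustrative.

```python
def ckan_to_schemaorg(pkg):
    """Map a CKAN package dict straight to a schema.org Dataset dict."""
    doc = {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "name": pkg.get("title"),
        "description": pkg.get("notes"),
        # CKAN tags are dicts with a "name" key; schema.org wants plain keywords.
        "keywords": [tag["name"] for tag in pkg.get("tags", [])],
        "license": pkg.get("license_url"),
        "dateModified": pkg.get("metadata_modified"),
    }
    # Drop empty values so the output only states what is actually known.
    return {key: value for key, value in doc.items() if value}
```

The trade-off is that this would bypass the profile mechanism, so it could not reuse custom DCAT profiles the way the extension-based implementation does.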

In any case for now this is a great addition to this extension so thanks again @metaodi.

@amercader

Member

commented Feb 23, 2018

@danbri did you have a chance to have a look at the deployment of this?

This is now implemented as far as this extension is concerned so I'm going to close the issue but we can keep the discussion going.

@amercader amercader closed this Feb 23, 2018

@danbri

Author

commented Feb 23, 2018

Haven't looked yet, but plan to. I moved to California this week so am a little distracted by practicalities like opening bank accounts.

@danbri

Author

commented May 8, 2018

@metaodi @amercader - looking at the example - and this is really great work - a quick suggestion.

Can you use "encodingFormat" instead of "fileType" (which doesn't exist)? We have just converged "fileFormat" and "encodingFormat", since they meant basically the same thing. The new preferred term will be encodingFormat, and Google's tools will get updated to match.
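For output that has already been generated, the rename could be applied as a small post-processing step. This is a hedged sketch: `prefer_encoding_format` is a hypothetical helper written for this example, and the actual fix in the extension would more likely change the mapping itself.

```python
OLD_KEYS = ("fileType", "fileFormat")


def prefer_encoding_format(node):
    """Recursively rename fileType/fileFormat keys to encodingFormat."""
    if isinstance(node, list):
        return [prefer_encoding_format(item) for item in node]
    if isinstance(node, dict):
        renamed = {}
        for key, value in node.items():
            if key in OLD_KEYS and "encodingFormat" in node:
                # An explicit encodingFormat already exists; drop the old key.
                continue
            new_key = "encodingFormat" if key in OLD_KEYS else key
            # If both old keys are present, the one seen last wins.
            renamed[new_key] = prefer_encoding_format(value)
        return renamed
    return node


doc = {
    "@type": "Dataset",
    "distribution": [{"@type": "DataDownload", "fileType": "CSV"}],
}
fixed = prefer_encoding_format(doc)
```

After the call, the distribution entry carries `encodingFormat` and no longer contains `fileType`.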

@metaodi

Member

commented May 8, 2018

@danbri yes sure, I'll try to get a PR ready soon
