Spike: Make datasets discoverable by Google #2654
I was checking whether there has been any progress on this. When I test our DKAN site with Google's Structured Data Testing Tool, the main issue seems to be that:
Would it be possible, as a really simple solution, to include type definitions in catalog.json, or would that violate anything in DCAT-AP?
@jensr, great find! Can you tell whether it's discovering the catalog.json file, though? It seemed to me that the metadata needed to be either inline semantic markup or defined inside a <script> tag to be discoverable.
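One way to check what a crawler can actually see on a dataset page is to look for inline JSON-LD blocks in its HTML, as opposed to metadata sitting in a separate file such as catalog.json. Below is a minimal sketch using only Python's standard library; `JsonLdExtractor` and `extract_jsonld` are hypothetical names, and the sketch ignores microdata and RDFa, which Google's structured data tooling also reads.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects parsed bodies of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attribute values can be None.
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Only text inside a JSON-LD script is kept; other scripts are ignored.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append(json.loads("".join(self._buffer)))


def extract_jsonld(html: str) -> list:
    """Return every JSON-LD object embedded inline in an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


page = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Dataset", "name": "Example"}'
    "</script></head><body><script>var x = 1;</script></body></html>"
)
print(extract_jsonld(page))
```

An empty list for a dataset page would suggest that nothing inline is available for the crawler to pick up.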
I don't think it is discovering it. I hope this helps; I'm happy to try other tests if necessary. I've also tested the individual dataset JSON pages with the validation tool. Again, it picks up the structured content, but because the type is missing, my guess is that the datasets will not get indexed for Dataset Search (from what I've read, they must have the RDF type "Dataset" to be included).
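The missing-type problem can be checked mechanically. Here is a minimal sketch, assuming each dataset's metadata is available as a JSON-LD object (a Python dict): it tests for the schema.org context and the `Dataset` type, adds them when absent, and lists which of `name` and `description` are missing (Google's dataset documentation lists both as required). The function names are hypothetical, and only plain string `@context` values are handled.

```python
import copy

SCHEMA_ORG_CONTEXTS = {"http://schema.org", "https://schema.org"}


def is_schema_org_dataset(obj: dict) -> bool:
    """True if a JSON-LD object uses the schema.org context and declares type Dataset."""
    context = obj.get("@context")
    types = obj.get("@type", [])
    if isinstance(types, str):
        types = [types]
    # Context arrays/objects are not handled in this sketch, only plain strings.
    has_context = isinstance(context, str) and context.rstrip("/") in SCHEMA_ORG_CONTEXTS
    return has_context and "Dataset" in types


def with_dataset_type(obj: dict) -> dict:
    """Return a copy of obj with the schema.org context and Dataset type set (overwriting any existing values)."""
    fixed = copy.deepcopy(obj)
    fixed["@context"] = "https://schema.org"
    fixed["@type"] = "Dataset"
    return fixed


def missing_required(obj: dict) -> list:
    """Return which of name/description are absent or empty."""
    return [key for key in ("name", "description") if not obj.get(key)]


record = {"name": "Example dataset", "description": "Weekly prices by region."}
fixed = with_dataset_type(record)
print(is_schema_org_dataset(record), is_schema_org_dataset(fixed))
print(missing_required({"name": "Untitled"}))
```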
It would be great if DKAN made datasets discoverable through Google Dataset Search! I would love to help you with that (disclaimer: I work on the Dataset Search team). I think there might be some misunderstanding about how googlebot ingests the metadata. Instead of a single dedicated JSON file with metadata for the whole catalog, the crawlers expect metadata embedded in the HTML of the landing page for each individual dataset. This metadata can be in Schema.org or DCAT.

Here's an example of a dataset from Kaggle with a Schema.org annotation: https://www.kaggle.com/matheusfreitag/gas-prices-in-brazil. If you look at the page source, you will find the schema.org annotation inside the <script type="application/ld+json"> tag. To see how this metadata is parsed, you can use our Structured Data Testing Tool: https://search.google.com/structured-data/testing-tool?utm_campaign=devsite&utm_medium=jsonld&utm_source=dataset#url=https%3A%2F%2Fwww.kaggle.com%2Fmatheusfreitag%2Fgas-prices-in-brazil You can read more about the different metadata fields here: https://developers.google.com/search/docs/data-types/dataset.

In terms of implementation, I would recommend starting small: exposing just the
Thanks @chrisgorgo, that is great info. I'm thinking we could add
Sounds like a plan!
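To make the landing-page approach discussed above concrete, here is a sketch that turns a DKAN dataset record into schema.org JSON-LD and wraps it in a `<script type="application/ld+json">` element for the dataset's page. The input field names (`title`, `description`, `keyword`, `modified`, `identifier`, `distribution[].downloadURL`, `distribution[].mediaType`) are assumed to follow the Project Open Data schema used by DKAN's data.json; the crosswalk and the function names are hypothetical and deliberately partial, not DKAN's actual implementation.

```python
import json


def to_schema_org(record: dict) -> dict:
    """Map a Project Open Data style record to a schema.org Dataset (hypothetical, partial crosswalk)."""
    dataset = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": record["title"],
        "description": record["description"],
    }
    # Optional fields are copied only when present and non-empty.
    optional = {"keyword": "keywords", "modified": "dateModified", "identifier": "identifier"}
    for source_key, target_key in optional.items():
        if record.get(source_key):
            dataset[target_key] = record[source_key]
    downloads = []
    for dist in record.get("distribution", []):
        if not dist.get("downloadURL"):
            continue  # distributions without a download URL are skipped in this sketch
        download = {"@type": "DataDownload", "contentUrl": dist["downloadURL"]}
        if dist.get("mediaType"):
            download["encodingFormat"] = dist["mediaType"]
        downloads.append(download)
    if downloads:
        dataset["distribution"] = downloads
    return dataset


def jsonld_script_tag(dataset: dict) -> str:
    """Serialize a dataset as a JSON-LD <script> element for the landing page."""
    body = json.dumps(dataset, indent=2, ensure_ascii=False)
    # Escape "</" so a value containing "</script>" cannot end the element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


record = {
    "title": "Example dataset",
    "description": "Weekly prices by region.",
    "keyword": ["prices", "regions"],
    "distribution": [
        {"downloadURL": "https://example.org/data.csv", "mediaType": "text/csv"}
    ],
}
print(jsonld_script_tag(to_schema_org(record)))
```

Escaping `</` as `<\/` keeps the output valid JSON (JSON permits the `\/` escape) while preventing a field value from closing the script element early.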
Any progress on adding schema.org markup to DKAN? Many government data repositories would benefit from it!
Hi there, same as @chrisgorgo: just checking back on whether any progress is being made on this. It would be hugely beneficial for our DKAN repo to have this implemented.
To be done in DKAN 2 via GetDKAN/dkan2#307.
While this would have been a good thing to do already, Google's announcement of a Dataset Search based on schema.org's Dataset definition makes supporting this data standard much more urgent. It also means that the schema.org vocabulary is likely to grow in importance and dominance in the open data space.
There are two ways to approach this: