Request authority Wikidata be supported #15

Open
elrayle opened this issue Sep 14, 2018 · 11 comments

Comments

@elrayle
Member

commented Sep 14, 2018

@elrayle elrayle self-assigned this Sep 14, 2018

@elrayle

Member Author

commented Sep 14, 2018

This request will not be acted upon immediately. Research is needed to determine whether we can cache the data, whether we want to, and whether direct access is a possibility. There are also open questions about which data from Wikidata we want to query and fetch.

@eichmann

Member

commented Nov 19, 2018

I've refreshed the existing triplestore with current data as of 11/13, so yes, caching is possible.

@zimeon zimeon added in progress and removed to do labels Jan 18, 2019

@sfolsom

Collaborator

commented Jan 18, 2019

The conclusion of a conversation with @elrayle and others during the Cornell meeting today is that we need subauthorities based on the spreadsheet for Wikidata column A: https://docs.google.com/spreadsheets/d/1rPvEoP9iYNkxJ0eWC8gXe3ci7e6mhW0da59xkGhadi0/edit#gid=965082913 (The spreadsheet isn't ready for action yet.)

@eslao


commented Feb 8, 2019

@elrayle @eichmann I've put together some notes in the hopes that we can revisit this request, with an eye toward first implementing a "generic" lookup for any type of Wikidata entity: https://docs.google.com/document/d/1zHfziWP2I9lfNnrrVVX2Xyvim5cXwtTRU8GddKCbJy4/edit#

We've split the Wikidata tab in the spreadsheet into two tabs -- one for the "generic" lookup, and another for potential subauthority-esque lookups: https://docs.google.com/spreadsheets/d/1rPvEoP9iYNkxJ0eWC8gXe3ci7e6mhW0da59xkGhadi0/edit#gid=965082913

Please let me know if this is enough to move forward, and/or if a call (with me and/or @mmcgee and/or @sfolsom) would be helpful. Thanks!

@zimeon

Member

commented Feb 8, 2019

Suggest trying to implement the generic Wikidata query outlined in the spreadsheet as a direct SPARQL query against Wikidata. Is that realistic? Is it fast enough?

@sfolsom

Collaborator

commented Feb 8, 2019

If there is concern about indexing all of Wikidata for reasons of scale and keeping the data current, we may want to look into the API @eslao documented in the Google doc above. The wbsearchentities module returns item numbers, labels, and description strings for a given label string: https://www.wikidata.org/w/api.php?action=help&modules=wbsearchentities
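
For reference, a minimal Python sketch of what a lookup through that module might look like. It only builds the request URL and reduces a response to the three fields mentioned above; the function names, the sample response, and the choice of `type=item` are assumptions for illustration, not anything QA implements.

```python
import json
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(term, language="en", limit=10):
    """Build a wbsearchentities request URL for a label string (hypothetical helper)."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "item",      # assumption: restrict the search to items, not properties
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def extract_hits(response_text):
    """Reduce a wbsearchentities JSON response to (id, label, description) tuples."""
    payload = json.loads(response_text)
    return [
        (hit.get("id"), hit.get("label"), hit.get("description"))
        for hit in payload.get("search", [])
    ]


# Hand-written sample in the shape of a wbsearchentities response, so the
# sketch runs without network access.
SAMPLE_RESPONSE = json.dumps(
    {"search": [{"id": "Q42", "label": "Douglas Adams", "description": "English writer"}]}
)

if __name__ == "__main__":
    print(build_search_url("Douglas Adams"))
    print(extract_hits(SAMPLE_RESPONSE))
```

Because the search runs on Wikidata's side, this route would avoid maintaining a local index, at the cost of depending on the live API for every lookup.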

@elrayle elrayle added this to Next Step in qa_server in Authority Requests Feb 15, 2019

@zimeon

Member

commented Feb 15, 2019

@eslao


commented Feb 26, 2019

@zimeon Thanks for finding that. Would a direct SPARQL query on Wikidata, or the API that @sfolsom mentions, enable us to create a lookup that meets the specs that @mmcgee and I have laid out? As I said before, I'd be happy to set up a call to weigh our options, if there are trade-offs between the possible approaches.

@elrayle

Member Author

commented Mar 1, 2019

@eslao QA has not yet worked with SPARQL queries. I plan to use Wikidata as a first step in getting QA to play nicely with SPARQL.

QA is approaching a major release to add context. I plan to start this work once that is complete.

One thing that can speed up this work is an example SPARQL query in the structure desired to get the data you want. If someone has time to do that, it will make the QA integration easier. The query can be tested at... https://query.wikidata.org/
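
As a rough illustration of the kind of thing that would help, here is a Python sketch that wraps an example query for the public endpoint behind https://query.wikidata.org/ and flattens the standard SPARQL JSON result format. The query itself, the helper names, and the sample response are assumptions for illustration only; the real query should come from whoever defines the lookup.

```python
import json
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative exact-label query (an assumption, not the agreed lookup).
# The rdfs:, wikibase: and bd: prefixes are predeclared by the query service.
EXAMPLE_QUERY = """\
SELECT ?item ?itemLabel ?itemDescription WHERE {
  ?item rdfs:label "Douglas Adams"@en .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
"""


def build_request_url(query):
    """Build a GET URL that asks the endpoint for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def parse_bindings(response_text):
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    payload = json.loads(response_text)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]


# Hand-written sample in the SPARQL JSON results shape, so this runs offline.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["item", "itemLabel"]},
    "results": {"bindings": [{
        "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q42"},
        "itemLabel": {"type": "literal", "value": "Douglas Adams"},
    }]},
})

if __name__ == "__main__":
    print(build_request_url(EXAMPLE_QUERY))
    print(parse_bindings(SAMPLE_RESPONSE))
```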

@eslao


commented Mar 8, 2019

@elrayle I've started by putting the beginnings of a query here: https://github.com/LD4P/Wikidata/blob/master/generic%20entity/wikidata_generic_entity.rq (@mmcgee and I will be adding more of these in the same folder)

The query I started with is looking for an exact string match; I started to experiment with a (contains()) filter (as many labels and aliases in Wikidata include definite articles and whatnot) -- this seems to time out, but I'll keep playing with it. We may need to compare performance between a few different queries. Should I be exploring regex options for non-exact string matches, or does QA do anything to support fuzzy matching that might apply here?
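
To make the comparison concrete, here is a hedged Python sketch that generates the two query shapes described above: an exact match on labels and aliases, and a substring match using `CONTAINS()`. These are not the contents of the linked .rq file; the function names and the exact query text are assumptions for illustration.

```python
def _escape(text):
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def exact_match_query(term, language="en", limit=20):
    """Exact string match against labels and aliases (hypothetical query shape)."""
    literal = f'"{_escape(term)}"@{language}'
    return f"""SELECT DISTINCT ?item WHERE {{
  {{ ?item rdfs:label {literal} . }}
  UNION
  {{ ?item skos:altLabel {literal} . }}
}}
LIMIT {limit}"""


def contains_query(term, language="en", limit=20):
    """Case-insensitive substring match on labels (hypothetical query shape)."""
    needle = _escape(term.lower())
    return f"""SELECT DISTINCT ?item WHERE {{
  ?item rdfs:label ?label .
  FILTER(LANG(?label) = "{language}")
  FILTER(CONTAINS(LCASE(STR(?label)), "{needle}"))
}}
LIMIT {limit}"""


if __name__ == "__main__":
    print(exact_match_query("The Hobbit"))
    print(contains_query("Hobbit"))
```

The exact-match form names a specific literal, which a triplestore can typically look up directly. The `CONTAINS()` form has to bind every label and test each one, which is a likely reason for the timeouts.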

@zimeon zimeon moved this from Next Step in qa_server to Next Step for Authority Owner in Authority Requests Mar 8, 2019

@zimeon zimeon added this to In Progress in ld4p2-cornell Mar 29, 2019

@eichmann

Member

commented May 30, 2019

Here's a list of class instance counts from a just-completed triplestore load:

6130 owl:Class ;
19579 owl:DatatypeProperty .
46941 owl:ObjectProperty .
6130 owl:Restriction ;
68362885 schema:Article ;
58935966 schema:Dataset ;
312 schema:Dataset,
7423600 wikibase:GlobecoordinateValue ;
26226739 wikibase:Item .
30495072 wikibase:Item ;
6130 wikibase:Property,
3996256 wikibase:QuantityValue ;
51598093 wikibase:Reference ;
89 wikibase:Reference,
1209842 wikibase:Statement ;
716414823 wikibase:Statement,
5805164 wikibase:TimeValue ;
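
Several classes appear on more than one row, differing only in the trailing `;`, `,` or `.`. A small Python sketch for collapsing the list into one total per class follows. It assumes, without confirmation from the load itself, that the trailing punctuation is only a statement terminator and that rows for the same class can simply be summed; the function name is hypothetical.

```python
from collections import Counter

# The counts exactly as listed above.
RAW_COUNTS = """\
6130 owl:Class ;
19579 owl:DatatypeProperty .
46941 owl:ObjectProperty .
6130 owl:Restriction ;
68362885 schema:Article ;
58935966 schema:Dataset ;
312 schema:Dataset,
7423600 wikibase:GlobecoordinateValue ;
26226739 wikibase:Item .
30495072 wikibase:Item ;
6130 wikibase:Property,
3996256 wikibase:QuantityValue ;
51598093 wikibase:Reference ;
89 wikibase:Reference,
1209842 wikibase:Statement ;
716414823 wikibase:Statement,
5805164 wikibase:TimeValue ;
"""


def totals_by_class(raw):
    """Sum counts per class name, ignoring trailing ';', ',' and '.' (assumed terminators)."""
    totals = Counter()
    for line in raw.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed rows
        totals[parts[1].rstrip(";,.")] += int(parts[0])
    return totals


if __name__ == "__main__":
    for name, count in sorted(totals_by_class(RAW_COUNTS).items()):
        print(f"{count:>12} {name}")
```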
