Write task description for PlaziBot on Wikidata #553

Open
Daniel-Mietchen opened this issue Nov 2, 2017 · 7 comments

Comments

@Daniel-Mietchen
Owner

E.g., for a given taxon mined from the literature into Plazi, search for it on Wikidata and add the Plazi treatment ID to the Wikidata item about the taxon.
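The "search it on Wikidata" step could be sketched as a SPARQL query against the Wikidata Query Service. This is a minimal sketch, assuming an exact string match on taxon name (P225) is good enough to find candidates; it also returns any Plazi ID (P1992) already present, so the bot can skip items that are done. The function names are hypothetical, and homonyms or spelling variants are not handled.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service (wdt: prefix is predefined there).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def taxon_lookup_query(taxon_name: str) -> str:
    """Build a SPARQL query for items whose taxon name (P225) matches exactly,
    together with any Plazi ID (P1992) they already carry."""
    # Escape backslashes and double quotes so the name is a valid SPARQL literal.
    literal = taxon_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item ?plaziId WHERE {\n"
        f'  ?item wdt:P225 "{literal}" .\n'
        "  OPTIONAL { ?item wdt:P1992 ?plaziId . }\n"
        "}"
    )


def taxon_lookup_url(taxon_name: str) -> str:
    """URL for a GET request to the query service that returns JSON results."""
    params = {"query": taxon_lookup_query(taxon_name), "format": "json"}
    return f"{WDQS_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(taxon_lookup_query("Loxosceles griffinae"))
```

An empty result means the taxon item would have to be created first; a hit with an empty `?plaziId` means only the Plazi ID needs adding.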

@Daniel-Mietchen
Owner Author

Pinging @punkish @myrmoteras.

@Daniel-Mietchen
Owner Author

Example: From
http://tb.plazi.org/GgServer/html/03FB8670D547674BFF48FF575098FB09 ,
I just got

  • the paper's DOI ($1). I fed it into Source MD, which returned the paper's metadata in a format QuickStatements can ingest, and then used QuickStatements to create the Wikidata item (Q42561394) about the paper
  • the treatment ID ($2), to be added to the Wikidata item (Q42562568) about the species. Since that item did not exist yet, I first created it manually, using the typical default parameters for taxon items:
    • English label: $3 (in this case, "Loxosceles griffinae")
    • English description: "$4 of $5" (in this case, "species of spider")
    • P31:Q16521 (instance of: taxon)
    • P225:$3 (taxon name; in this case, "Loxosceles griffinae")
    • P105:$6 (taxon rank; in this case, species, i.e. Q7432)
    • P171:$7 (parent taxon; in this case, Loxosceles, i.e. Q1950866)
    • P1343:$8 (described by source, i.e. the item about the paper; in this case, Q42561394)

Once this was done, I could add the Plazi ID.
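The manual item creation above maps fairly directly onto QuickStatements (V1, tab-separated) commands. The sketch below is hypothetical: it fills the $3 to $8 placeholders from the list and optionally appends the Plazi ID ($2) as P1992. It only builds the command text; submitting it is left to QuickStatements itself, and no duplicate check is done here.

```python
from typing import Optional


def quickstatements_for_new_taxon(
    name: str,          # $3, e.g. "Loxosceles griffinae"
    rank_word: str,     # $4, e.g. "species"
    group: str,         # $5, e.g. "spider"
    rank_qid: str,      # $6, e.g. "Q7432" (species)
    parent_qid: str,    # $7, e.g. "Q1950866" (Loxosceles)
    source_qid: str,    # $8, item about the describing paper
    plazi_id: Optional[str] = None,  # $2, treatment ID
) -> str:
    """Return QuickStatements V1 commands (one per line, tab-separated)."""
    lines = [
        "CREATE",
        f'LAST\tLen\t"{name}"',                    # English label
        f'LAST\tDen\t"{rank_word} of {group}"',    # English description
        "LAST\tP31\tQ16521",                       # instance of: taxon
        f'LAST\tP225\t"{name}"',                   # taxon name
        f"LAST\tP105\t{rank_qid}",                 # taxon rank
        f"LAST\tP171\t{parent_qid}",               # parent taxon
        f"LAST\tP1343\t{source_qid}",              # described by source
    ]
    if plazi_id:
        lines.append(f'LAST\tP1992\t"{plazi_id}"')  # Plazi ID
    return "\n".join(lines)


if __name__ == "__main__":
    print(quickstatements_for_new_taxon(
        "Loxosceles griffinae", "species", "spider",
        "Q7432", "Q1950866", "Q42561394"))
```

Emitting the Plazi ID in the same batch would remove the separate "add the Plazi ID" step, at the cost of not being able to review the new item before it is linked.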

@myrmoteras

My way to do it is to add the new taxon:
http://treatment.plazi.org/id/03FB8670-D547-6748-FF48-FAFA563AF808

  • search Wikidata for Loxosceles haddadi to check whether this taxon already exists; if not,
  • create a new item:
    • language: en
    • Label: Loxosceles haddadi
    • Description: species of spider
    • instance of: species
    • parent taxon: Loxosceles (recluse spider)
    • taxon name: Loxosceles haddadi
    • taxon rank: species
    • Plazi-ID: 03FB8670D5476748FF48FAFA563AF808

This is why I stopped doing it. I can't do anything about the Zootaxa DOI issue. Ideally, Wikidata would have access to http://tb.plazi.org/GgServer/rdf/03FB8670D5476748FF48FAFA563AF808 or, for nomenclatural issues, http://tb.plazi.org/GgServer/lodRdf/03FB8670D5476748FF48FAFA563AF808, and with that Wikidata would have a lot more data.
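The treatment URL writes the identifier with hyphens, while the Plazi-ID value on Wikidata drops them. A small normalization step, plus the RDF URL patterns quoted above, might look like this. It is a sketch that assumes every Plazi ID is 32 hexadecimal characters (true for all IDs in this thread, but not checked against any Plazi specification); the helper names are hypothetical.

```python
import re

# URL patterns copied from the links in this thread.
GG_RDF_URL = "http://tb.plazi.org/GgServer/rdf/{}"
GG_LOD_RDF_URL = "http://tb.plazi.org/GgServer/lodRdf/{}"

# Assumption: a Plazi ID is 32 hex characters once hyphens are removed.
_PLAZI_ID = re.compile(r"^[0-9A-F]{32}$")


def plazi_id_from_url(url: str) -> str:
    """Extract the Wikidata-style Plazi ID from a treatment or GgServer URL."""
    last = url.rstrip("/").rsplit("/", 1)[-1]
    plazi_id = last.replace("-", "").upper()
    if not _PLAZI_ID.match(plazi_id):
        raise ValueError(f"not a Plazi treatment identifier: {last!r}")
    return plazi_id


def rdf_urls(plazi_id: str) -> dict:
    """RDF endpoints for a treatment: full RDF and the nomenclatural LOD RDF."""
    return {
        "rdf": GG_RDF_URL.format(plazi_id),
        "lodRdf": GG_LOD_RDF_URL.format(plazi_id),
    }


if __name__ == "__main__":
    pid = plazi_id_from_url(
        "http://treatment.plazi.org/id/03FB8670-D547-6748-FF48-FAFA563AF808")
    print(pid, rdf_urls(pid)["lodRdf"])
```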

@myrmoteras

Didn't you and Guido go through this and create a set of QuickStatements already?

Technically, the issue is to find out whether the new species (taxon) is already in Wikidata, whether the parent taxon exists, and whether the paper is already in Wikidata, and, for all positives, to get the respective property.
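The three existence checks could drive a simple plan of edits. In this hypothetical sketch the lookups themselves are left open (e.g. a P225 query for taxa, a DOI/P356 query for papers); the function only decides what to create and in which order. Paper and parent come first because a new taxon item needs their QIDs for P1343 and P171.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LookupResult:
    """QIDs found on Wikidata, or None where nothing matched (hypothetical)."""
    taxon_qid: Optional[str]
    parent_qid: Optional[str]
    paper_qid: Optional[str]


def plan_edits(found: LookupResult, plazi_id: str) -> List[str]:
    """Return human-readable steps; positives are reused, negatives created."""
    steps: List[str] = []
    if found.paper_qid is None:
        steps.append("create paper item (e.g. from DOI metadata)")
    if found.parent_qid is None:
        steps.append("resolve or create parent taxon item")
    if found.taxon_qid is None:
        steps.append("create taxon item (P31, P225, P105, P171, P1343)")
        steps.append(f"add P1992 {plazi_id} to new taxon item")
    else:
        steps.append(f"add P1992 {plazi_id} to {found.taxon_qid}")
    return steps


if __name__ == "__main__":
    for step in plan_edits(LookupResult(None, "Q1950866", None),
                           "03FB8670D5476748FF48FAFA563AF808"):
        print(step)
```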

Do you know Cristina Sarasua? I met her a couple of times at open data events here in Switzerland, and she mentioned that she might try to get a student to work on getting TreatmentBank data into Wikidata. She just started a position in a Citizen Science group at the University of Zurich.

@myrmoteras

I guess the other issue has been the social side, which we cannot control, of getting this "botified".

Glad you are picking this up again. I think this is something where we could supply a lot of data to Wikidata, and at the same time Wikidata, maybe together with GBIF, could have a real impact on how we deal with taxonomic names.

@Daniel-Mietchen
Owner Author

Yes, I went through this workflow with Guido. It hasn't reached a stage that would allow complete automation, though, and we now have tools like WikidataIntegrator (WDI) and the WDI-based Fatameh, which allows OAuth-based Wikidata editing in a way that scales technically.

In terms of social scalability, I think it will still take some time to get all million-scale datasets in. For instance, we have around 9 million scholarly papers in Wikidata by now, but no concrete plans for doing the same for the estimated 100 million or more other scholarly papers that have a DOI.

So the task would be to work with what is already in Wikidata in terms of species, papers and authors, and to bring these things together before importing new content at scale.

For an overview of the usage of P1992 (Plazi ID), see https://tools.wmflabs.org/sqid/#/view?id=P1992&lang=en .

Yes, I met Cristina Sarasua at WikidataCon last weekend, and she mentioned that she's in contact with you.

@myrmoteras

I don't think we are (yet) the right partner to talk to about large scale. But we are good at creating seeds, figuring out how to get something to work, preparing the semantics, etc.

A good starting point might be to extend the workflows we currently have in place in a way that minimizes the risk of adding items that are already in Wikidata. These are the journals we process on the day they are published: Zootaxa, EJT, Zoosystema, RSZ, and all the Pensoft journals. If we can do this for the new species, we could then expand to redescriptions in the same articles, add more detail, or add more journals.

There is a (slim?) chance that we will get funding to expand Plazi activities quite a bit by broadening the range of journals we process. We should know this in the next couple of weeks.
