suggest to index Plazi's treatment bank #30

Closed
jhpoelen opened this issue Sep 7, 2020 · 12 comments

@jhpoelen

jhpoelen commented Sep 7, 2020

hi @dimus et al.

Are you still maintaining globalnames.org ?

If so, I was hoping you can consider the following:

Plazi https://plazi.org keeps an extensive list of taxonomic literature and associated taxonomic names.

Plazi exports these taxonomic literature <> name links as DwC-A and registers them with GBIF.

You can find their publications at https://www.gbif.org/occurrence/search?dataset_key=6384b520-7e9f-4874-a414-76c2e9b01d74&type_status=TYPE .

As a user (GloBI), I would like to be able to use the Global Names resolvers to find taxonomic treatments in Plazi.

The taxonomic treatments can be located by linking the TaxonId fields that are available in the Plazi publications.

For example, when I look up Rhinolophus denti, I expect to find a Plazi name with id 885887A2FFC88A21F8B1FA48FB92DD65.taxon (also see https://www.gbif.org/occurrence/2597533915). This identifier can then be translated into a link to the related taxonomic treatment via http://treatment.plazi.org/id/885887A2FFC88A21F8B1FA48FB92DD65 .
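The translation from identifier to treatment link can be sketched in a few lines of Python. This is only an illustration based on the single example above: the function name `treatment_url` is hypothetical, and it assumes that every Plazi taxon id is a treatment id followed by `.taxon`.

```python
# Base URL taken from the example link above.
TREATMENT_BASE_URL = "http://treatment.plazi.org/id/"


def treatment_url(taxon_id: str) -> str:
    """Turn a Plazi taxon id such as '<ID>.taxon' into a treatment link.

    Assumption: the treatment id is the taxon id without its '.taxon' suffix.
    """
    suffix = ".taxon"
    if taxon_id.endswith(suffix):
        taxon_id = taxon_id[: -len(suffix)]
    return TREATMENT_BASE_URL + taxon_id


print(treatment_url("885887A2FFC88A21F8B1FA48FB92DD65.taxon"))
# prints http://treatment.plazi.org/id/885887A2FFC88A21F8B1FA48FB92DD65
```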

See also attached screenshots.

Screenshot from 2020-09-07 14-18-39
Screenshot from 2020-09-07 14-18-23

fyi @myrmoteras

@dimus
Member

dimus commented Sep 8, 2020

@jhpoelen good point, noted; I've put it on the back burner. We ran into some trouble with the Scala version of the resolver and are moving it to Go at the moment. I will add Plazi once resolution is migrated to the new code (a few months from now).

@jhpoelen
Author

@dimus Thanks for considering indexing the Plazi taxonomic names associated with the many treatments they index.

Is there a way I can run my own resolver locally with my own indexed names? What would it take to replicate your setup? I'd actually prefer running tools in a controlled environment anyway, because of the variability of web services (e.g., limited network bandwidth, single point of access).

@dimus
Member

dimus commented Sep 11, 2020

> Is there a way I can run my own resolver locally with my own indexed names? What would it take to replicate your setup? I'd actually prefer running tools in a controlled environment anyway, because of the variability of web services (e.g., limited network bandwidth, single point of access).

I do want to make the resolver "localizable". The new one is located at https://github.com/gnames/gnames/. The idea I have is to include a docker-compose file that sets up all components locally. Sadly, I do not have a decent harvester yet, so that leaves a choice of either building the DB from a dump or populating the DB with a script.

The project is not finished yet, but I hope to have it running by October. It would be awesome if you could help test localization when I get to the point of a 'real' release!

@jhpoelen
Author

jhpoelen commented Sep 11, 2020

@dimus excellent! I'd be happy to help test your tool, especially if the features align with those of Nomer:

  1. easy to install (in my opinion, docker is not easy for most and usually just hides overly complex infrastructures)
  2. able to index, cache and version existing taxon lists (e.g., index Plazi names from taxon lists published in DwC-A)
  3. able to batch export / stream link results
  4. command-line interface
  5. able to run on a single laptop
  6. designed for offline (no internet) workflows
  7. separate commands to do common name operations: find, parse, resolve

Is this in line with the features you are planning to support?

dimus modified the milestones: iter 18, iter 19 Jan 25, 2021
dimus removed this from the iter 19 milestone Feb 17, 2021
@dimus
Member

dimus commented Mar 19, 2021

Hm, looks like there is no aggregated file with all PLAZI data at GBIF. @jhpoelen, any ideas if it exists?

@myrmoteras

@dimus This is the summary and access to all the datasets in GBIF: https://www.gbif.org/publisher/7ce8aef0-9e92-11dc-8738-b8a03c50a862/metrics
You can also get it from here: http://tb.plazi.org/GgServer/dioStats/stats?outputFields=doc.articleUuid+doc.doi+doc.gbifId&groupingFields=doc.articleUuid+doc.doi+doc.gbifId&limit=100&FP-doc.gbifId=1-&format=HTML
Just remove limit=100 and you get all the 31.115 articles/datasets on GBIF.
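For scripted use, the `limit` parameter can be dropped programmatically rather than by hand. The sketch below uses only the Python standard library and the URL quoted above; the helper name `drop_query_param` is hypothetical, and the code only rewrites the URL without contacting the Plazi server.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

STATS_URL = (
    "http://tb.plazi.org/GgServer/dioStats/stats"
    "?outputFields=doc.articleUuid+doc.doi+doc.gbifId"
    "&groupingFields=doc.articleUuid+doc.doi+doc.gbifId"
    "&limit=100&FP-doc.gbifId=1-&format=HTML"
)


def drop_query_param(url: str, name: str) -> str:
    """Return the URL with every occurrence of one query parameter removed."""
    parts = urlsplit(url)
    # Keep all parameters except the one to drop, preserving their order.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(drop_query_param(STATS_URL, "limit"))
```

The other parameters are kept in their original order, so the resulting URL differs from the quoted one only by the missing `limit=100`.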

@dimus
Member

dimus commented Mar 19, 2021

@myrmoteras thank you very much, let me try, I just sent an email to you asking about this, so you do not need to answer it :)

@myrmoteras

Are you aware of the stats in Plazi? http://plazi.org/api-tools/statistics/
There are two that allow you to query the article and treatment stats.
For example, you could get the verbatim taxon names of the treatments, the links to the articles and the treatments (e.g. http://tb.plazi.org/GgServer/srsStats/stats?outputFields=doc.uuid+doc.articleUuid+tax.name&groupingFields=doc.uuid+doc.articleUuid+tax.name&limit=100&format=HTML), and much more.

@dimus
Member

dimus commented Mar 19, 2021

@myrmoteras, I found this link at http://plazi.org/api-tools/statistics/:

http://tb.plazi.org/GgServer/xml.rss.xml

with items like:

```xml
<item>
<title>Caccothryptus arakawae Matsumoto 2021, sp. nov.</title>
<description>Caccothryptus arakawae Matsumoto 2021, sp. nov. (pages 171-171) in Matsumoto, Keita 2021, Six new species of the genus Caccothryptus from the Himalayas (Coleoptera: Limnichidae), European Journal of Taxonomy 739, pages 168-184</description>
<link>http://tb.plazi.org/GgServer/xml/F035D85E9906FFF68106FBE4FB925511</link>
<pubDate>2021-03-19T09:23:16-02:00</pubDate>
<guid isPermaLink="false">F035D85E9906FFF68106FBE4FB925511.xml</guid>
</item>
```

I can probably use this file. It currently has 418765 items, which sounds about right.
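As a rough sketch of how such items could be turned into name records, the Python below parses a trimmed copy of the item shown above with the standard library. The function name `parse_item` and the output field names are hypothetical, and it assumes that the `guid` is always the treatment UUID followed by `.xml`, as in this one example. It is not the harvester that was actually written.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed item quoted above (description and pubDate omitted).
SAMPLE_ITEM = """
<item>
<title>Caccothryptus arakawae Matsumoto 2021, sp. nov.</title>
<link>http://tb.plazi.org/GgServer/xml/F035D85E9906FFF68106FBE4FB925511</link>
<guid isPermaLink="false">F035D85E9906FFF68106FBE4FB925511.xml</guid>
</item>
"""


def parse_item(item_xml: str) -> dict:
    """Extract the name string, link, and treatment UUID from one <item>."""
    item = ET.fromstring(item_xml.strip())
    guid = item.findtext("guid", default="")
    # Assumption: the guid is '<UUID>.xml'; strip the extension if present.
    uuid = guid[: -len(".xml")] if guid.endswith(".xml") else guid
    return {
        "name": item.findtext("title", default=""),
        "link": item.findtext("link", default=""),
        "treatment_uuid": uuid,
    }


record = parse_item(SAMPLE_ITEM)
print(record["treatment_uuid"])
# prints F035D85E9906FFF68106FBE4FB925511
```

For the full feed, `ET.iterparse` would likely be a better fit than loading everything at once, given the number of items.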

dimus added a commit that referenced this issue Mar 19, 2021
@jhpoelen
Author

For nomer, I've been using https://github.com/plazi/treatments-rdf . Versioned and all!

@dimus
Member

dimus commented Mar 19, 2021

Thanks @jhpoelen! I already imported PLAZI (data source #194), with less metadata, to https://resolver.globalnames.org/data_sources/194, and will transfer these data to https://verifier.globalnames.org in the evening.

I am going to close this ticket and make a new one about using your approach, so we have more information about names.

@dimus
Member

dimus commented Mar 19, 2021

@myrmoteras, @jhpoelen the first version of harvesting will propagate to https://verifier.globalnames.org in a couple of hours.

I also made #47 as a reminder to enhance the harvest in the future, following @jhpoelen's suggestion.
