globi link #4998

Closed
dustymc opened this issue Sep 1, 2022 · 30 comments

Comments

@dustymc
Contributor

dustymc commented Sep 1, 2022

@campmlc here's an example of the URL I check for globi links:

https://api.globalbioticinteractions.org/exists?accordingTo=http://arctos.database.museum/guid/MSB:Para:30008

And here's the response

Screen Shot 2022-09-01 at 12 35 32 PM

I only create links if that responds with a 200 status code - that one is telling me that there's nothing to link to.
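
A minimal sketch of that check, for readers who want to reproduce it from a terminal. Only the `/exists` endpoint comes from this thread; the helper names `globi_status` and `should_link` are hypothetical, and this is not the actual Arctos code.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; only the /exists endpoint is taken from the thread.

# Print the HTTP status code GloBI returns for a record URL ($1).
globi_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' \
    "https://api.globalbioticinteractions.org/exists?accordingTo=$1"
}

# Succeed only when the status code ($1) is exactly 200.
should_link() {
  [ "$1" = "200" ]
}
```

For example, `should_link "$(globi_status "http://arctos.database.museum/guid/MSB:Para:30008")" && echo "create link"` prints nothing while GloBI answers with anything other than 200.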

dustymc added this to the Needs Discussion milestone Sep 1, 2022
@campmlc

campmlc commented Sep 1, 2022

So this is a GloBI issue, or something wrong on our end? If the former, can we let him know?

@dustymc
Contributor Author

dustymc commented Sep 1, 2022

if there's really something there then globi is giving an incorrect response.

@Jegelewicz
Member

We need Jorrit in our Github as a collaborator...

@Jegelewicz
Member

Created an issue at GloBI

@jhpoelen

jhpoelen commented Sep 1, 2022

Thanks for sharing!

Just checked and it seems that MSB Para dropped off the GloBI map somehow.

Screenshot from 2022-09-01 17-29-31

Looking into it, thanks for being patient.

@jhpoelen

jhpoelen commented Sep 1, 2022

@dustymc @Jegelewicz I got the attached msb-para.zip via

globalbioticinteractions/msb-para http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para cd513050869ee454c31f288433ddb2861c4b4acf3b7484ed126531922d26e2a8 2022-08-27T02:03:37.946Z

and, the resource is pretty empty . . .

Should I update the vertnet endpoint for msb-para ?

$ unzip -l msb-para.zip 
Archive:  msb-para.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    12508  2022-08-21 17:50   eml.xml
      151  2022-08-21 17:50   multimedia.txt
     8762  2022-08-21 17:50   meta.xml
     1452  2022-08-21 17:50   occurrence.txt
---------                     -------
    22873                     4 files

msb-para.zip

it appears the content changed somewhere before Aug 27 2022:

globalbioticinteractions/msb-para	http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para	92a6087278fdbc480be31112aaaeedc789bbd98bc53f0f1dd2286e4b7343d602	2022-08-20T15:24:26.419Z	
globalbioticinteractions/msb-para	http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para	92a6087278fdbc480be31112aaaeedc789bbd98bc53f0f1dd2286e4b7343d602	2022-08-20T15:24:55.834Z	
globalbioticinteractions/msb-para	http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para	92a6087278fdbc480be31112aaaeedc789bbd98bc53f0f1dd2286e4b7343d602	2022-08-20T16:32:18.359Z	
globalbioticinteractions/msb-para	http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para	cd513050869ee454c31f288433ddb2861c4b4acf3b7484ed126531922d26e2a8	2022-08-27T01:13:38.382Z	
globalbioticinteractions/msb-para	http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para	cd513050869ee454c31f288433ddb2861c4b4acf3b7484ed126531922d26e2a8	2022-08-27T01:13:39.375Z	
globalbioticinteractions/msb-para	http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para	cd513050869ee454c31f288433ddb2861c4b4acf3b7484ed126531922d26e2a8	2022-08-27T02:03:37.946Z
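
One way to spot such a change in a log like the one above is sketched below. It assumes the log is tab-separated with the URL in column 2, the content hash in column 3, and the timestamp in column 4; `hash_changes` is a hypothetical helper, not part of elton.

```shell
#!/usr/bin/env bash
# Read a provenance log on stdin and print one line per content change:
# timestamp, previous hash, new hash (tab-separated).
# Assumes columns: namespace, URL, sha256, timestamp.
hash_changes() {
  awk -F '\t' '
    ($2 in last) && last[$2] != $3 { print $4 "\t" last[$2] "\t" $3 }
    { last[$2] = $3 }
  '
}
```

Fed the six rows above, this would report a single change, at the first row carrying the `cd5130...` hash.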

@jhpoelen

jhpoelen commented Sep 1, 2022

the previous version (< 2022-08-27) was quite a lot bigger:

$ unzip -l msb-para-92a6.zip 
Archive:  msb-para-92a6.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    12508  2022-07-22 17:50   eml.xml
   240142  2022-07-22 17:50   multimedia.txt
     8762  2022-07-22 17:50   meta.xml
 79077850  2022-07-22 17:50   occurrence.txt
---------                     -------
 79339262                     4 files

msb-para-92a6.zip

Please advise.
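
A size comparison like the one above could be automated roughly as follows. This is only a sketch: `member_size` and `shrank_sharply` are hypothetical helpers, and the 1% threshold is an arbitrary assumption rather than anything GloBI or VertNet uses.

```shell
#!/usr/bin/env bash
# Print the uncompressed size of one archive member ($1),
# given `unzip -l` output on stdin.
member_size() {
  awk -v name="$1" '$NF == name { print $1 }'
}

# Succeed when the current size ($2) is below 1% of the previous size ($1).
# The 1% threshold is an arbitrary assumption for illustration.
shrank_sharply() {
  [ $(( $2 * 100 )) -lt "$1" ]
}
```

Usage would look like `unzip -l msb-para.zip | member_size occurrence.txt`, followed by `shrank_sharply 79077850 1452 && echo "suspiciously small"` with the two sizes listed above.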

@jhpoelen

jhpoelen commented Sep 1, 2022

Where did your MSB Para data go?

@Jegelewicz
Member

@dustymc maybe this IS part of the thing you and Dave are working on?

@dustymc
Contributor Author

dustymc commented Sep 1, 2022

Yep, sounds like it.

Nothing's vaporized lately!

arctosprod@arctos>> select count(*) from ipt_cache.occurrence;
  count  
---------
 4477218

@jhpoelen

jhpoelen commented Sep 1, 2022

@dustymc now I am curious what you and Dave are working on. Super-compression?

@jhpoelen

jhpoelen commented Sep 1, 2022

Looks like the vertnet endpoint has changed since elton (GloBI bot) checked last.

Now, the content looks different:

$ curl "http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para" | sha256sum
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14.0M    0 14.0M    0     0  6186k      0 --:--:--  0:00:02 --:--:-- 6186k
6da7a9d5b5604bf870b609e9f0f7ebe360c387d10c9db6c5863df38e81f081e4  -

and more like before (14M vs a few kB)

Looks like I just caught vertnet in a funny moment. Running manual update to confirm.
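
The same hash check can be wrapped so that it compares the live content against the last recorded hash. A minimal sketch, assuming GNU `sha256sum` is available; `content_hash` and `content_changed` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hash whatever arrives on stdin and print only the hex digest.
content_hash() {
  sha256sum | awk '{ print $1 }'
}

# Succeed when the digest of stdin differs from the last recorded digest ($1).
content_changed() {
  [ "$(content_hash)" != "$1" ]
}
```

For instance, `curl --silent "http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para" | content_changed cd513050869ee454c31f288433ddb2861c4b4acf3b7484ed126531922d26e2a8` succeeds once the endpoint serves something other than the near-empty archive.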

@dustymc
Contributor Author

dustymc commented Sep 1, 2022

Yep, perfect zero-byte compression, you just don't know how to use it.

(Or the script that packages up DWC data sometimes gets confused and then offs itself, resulting in Dave automagically publishing empty files - one of those.....)

@jhpoelen

jhpoelen commented Sep 1, 2022

@dustymc good to know - I'll sign up for that zero-byte compression course I always wanted to take 😉 . I can't imagine confused scripts . . . aren't machines perfect?

@dustymc
Contributor Author

dustymc commented Sep 1, 2022

Obviously not, perfect machines would imply imperfect programmers and we know that can't be the case!

@jhpoelen

jhpoelen commented Sep 1, 2022

Looks like the update pulled in the non-compressed resource:

globalbioticinteractions/msb-para	http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para	92a6087278fdbc480be31112aaaeedc789bbd98bc53f0f1dd2286e4b7343d602	2022-08-20T15:24:55.834Z	
globalbioticinteractions/msb-para	http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para	92a6087278fdbc480be31112aaaeedc789bbd98bc53f0f1dd2286e4b7343d602	2022-08-20T16:32:18.359Z	
globalbioticinteractions/msb-para	http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para	cd513050869ee454c31f288433ddb2861c4b4acf3b7484ed126531922d26e2a8	2022-08-27T01:13:38.382Z	
globalbioticinteractions/msb-para	http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para	cd513050869ee454c31f288433ddb2861c4b4acf3b7484ed126531922d26e2a8	2022-08-27T01:13:39.375Z	
globalbioticinteractions/msb-para	http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para	cd513050869ee454c31f288433ddb2861c4b4acf3b7484ed126531922d26e2a8	2022-08-27T02:03:37.946Z	
globalbioticinteractions/msb-para	http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para	6da7a9d5b5604bf870b609e9f0f7ebe360c387d10c9db6c5863df38e81f081e4	2022-09-01T23:08:22.186Z	
globalbioticinteractions/msb-para	http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para	6da7a9d5b5604bf870b609e9f0f7ebe360c387d10c9db6c5863df38e81f081e4	2022-09-01T23:08:35.983Z	
globalbioticinteractions/msb-para	http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para	6da7a9d5b5604bf870b609e9f0f7ebe360c387d10c9db6c5863df38e81f081e4	2022-09-01T23:23:47.346Z	

please allow for some time for GloBI to propagate the changes.

@ebraker
Contributor

ebraker commented Sep 2, 2022

I'm assuming this fix will propagate to all parasite collection records? I'm recording a demo for a lightning talk and really want the GloBI link to appear in a DMNS:Para record (specifically this one: https://arctos.database.museum/guid/DMNS:Para:49).

@jhpoelen

jhpoelen commented Sep 2, 2022

For the time being, I've reverted the GloBI version to a < 2022-08-27 version. See attached screenshot.

Please confirm that MSB Para and related records are linked as expected.

Seems like @dbloom and @dustymc owe me a coffee and a cookie for publicizing their zero-byte compression experiments.

Screenshot from 2022-09-02 12-56-25

@jhpoelen

jhpoelen commented Sep 2, 2022

@ebraker I was able to see the GloBI link appear for http://arctos.database.museum/guid/MSB:Para:6170 .

Screenshot from 2022-09-02 13-04-55

@dustymc is there any kind of caching that happens on the Arctos side? If so, please flush the 404s from GloBI if possible.

@jhpoelen

jhpoelen commented Sep 2, 2022

@ebraker re: https://arctos.database.museum/guid/DMNS:Para:49 - I was able to see the records getting linked in a recent review of Arctos collections exposed by Vertnet:

e.g., http://arctos.database.museum/guid/DMNS:Mamm:11096 -> http://arctos.database.museum/guid/DMNS:Para:49

I am still trying to figure out why the current Arctos page doesn't show these expected results.

$ curl --silent "https://depot.globalbioticinteractions.org/reviews/globalbioticinteractions/vertnet/indexed-interactions.tsv.gz" | gunzip | grep -P "DMNS:Para:49\t"
https://en.wiktionary.org/wiki/support	http://arctos.database.museum/guid/DMNS:Mamm:11096?seid=282499	DMNS:Mamm:11096	Mamm	45	DMNS		Tamias quadrivittatus	Animalia | Chordata | Mammalia | Rodentia | Sciuridae | Tamias | Tamias quadrivittatus	kingdom | phylum | class | order | family | genus | species				female	http://purl.obolibrary.org/obo/RO_0002445	hasParasite	http://arctos.database.museum/guid/DMNS:Para:49?seid=4274393	DMNS:Para:49	Para	127	DMNS	Hoplopleura arboricola			Animalia | Arthropoda | Insecta | Psocodea | Hoplopleuridae | Hoplopleura | Hoplopleura arboricola	kingdom | phylum | class | order | family | genus | species				adult; larva; nymph		female; male		PreservedSpecimen	2007-07-13T00:00:00Z	38.0222	-105.6799		Sangre de Cristo Mountains, 2 miles northeast of Crestone, canyon of North Crestone Creek		http://arctos.database.museum/guid/DMNS:Mamm:11096	http://arctos.database.museum/guid/DMNS:Mamm:11096	globalbioticinteractions/vertnet	DMNS Mammal Collection (Arctos) - Version 34.64	http://ipt.vertnet.org:8080/ipt/archive.do?r=dmns_mamm	2022-08-13T11:08:30.666Z	f324019ad23212691cdb91d958bf8cf266bb93dd97210f1a153bc070a996c0a0	0.12.4
https://en.wiktionary.org/wiki/support	http://arctos.database.museum/guid/DMNS:Para:49?seid=4274393	DMNS:Para:49	Para	127	DMNS		Hoplopleura arboricolaAnimalia | Arthropoda | Insecta | Psocodea | Hoplopleuridae | Hoplopleura | Hoplopleura arboricola	kingdom | phylum | class | order | family | genus | species		adult; larva; nymph		female; male	http://purl.obolibrary.org/obo/RO_0002444	parasiteOf	http://arctos.database.museum/guid/DMNS:Mamm:11096?seid=282499DMNS:Mamm:11096	Mamm	45	DMNS		Tamias quadrivittatus			Animalia | Chordata | Mammalia | Rodentia | Sciuridae | Tamias | Tamias quadrivittatus	kingdom | phylum | class | order | family | genus | species				female		PreservedSpecimen	2007-07-13T00:00:00Z	38.0222	-105.6799	Sangre de Cristo Mountains, 2 miles northeast of Crestone, canyon of North Crestone Creek		http://arctos.database.museum/guid/DMNS:Para:49	http://arctos.database.museum/guid/DMNS:Para:49	globalbioticinteractions/vertnet	DMNS Parasite Collection (Arctos) - Version 34.63	http://ipt.vertnet.org:8080/ipt/archive.do?r=dmn	2022-08-13T11:08:30.666Z	f324019ad23212691cdb91d958bf8cf266bb93dd97210f1a153bc070a996c0a0	0.12.4
https://en.wiktionary.org/wiki/support	http://arctos.database.museum/guid/DMNS:Para:49?seid=4274393	DMNS:Para:49	Para	127	DMNS		Hoplopleura arboricolaAnimalia | Arthropoda | Insecta | Psocodea | Hoplopleuridae | Hoplopleura | Hoplopleura arboricola	kingdom | phylum | class | order | family | genus | species		adult; larva; nymph		female; male	http://purl.obolibrary.org/obo/RO_0002454	hasHost							Tamias quadrivittatus	PreservedSpecimen	2007-07-13T00:00:00Z	38.0222	-105.6799		Sangre de Cristo Mountains, 2 miles northeast of Crestone, canyon of North Crestone Creek	http://arctos.database.museum/guid/DMNS:Para:49	http://arctos.database.museum/guid/DMNS:Para:49	globalbioticinteractions/vertnet	DMNS Parasite Collection (Arctos) - Version 34.63	http://ipt.vertnet.org:8080/ipt/archive.do?r=dmn	2022-08-13T11:08:30.666Z	f324019ad23212691cdb91d958bf8cf266bb93dd97210f1a153bc070a996c0a0	0.12.4

@jhpoelen

jhpoelen commented Sep 2, 2022

re: DMNS:Para -

@dustymc - I see the following para datasets on the vertnet rss feed:

$ cat 17da4ef08733e4be6431053b4b0b90b77d6f7cc5fccc74e73b2b149df0aecbd9 | grep _para
        <link>http://ipt.vertnet.org:8080/ipt/resource?r=dmns_para_ggbn</link>
        <ipt:eml>http://ipt.vertnet.org:8080/ipt/eml.do?r=dmns_para_ggbn</ipt:eml>
        <ipt:dwca>http://ipt.vertnet.org:8080/ipt/archive.do?r=dmns_para_ggbn</ipt:dwca>
        <guid isPermaLink="false">http://ipt.vertnet.org:8080/ipt/resource?id=dmns_para_ggbn/v1.23</guid>
        <link>http://ipt.vertnet.org:8080/ipt/resource?r=msb_para_ggbn</link>
        <ipt:eml>http://ipt.vertnet.org:8080/ipt/eml.do?r=msb_para_ggbn</ipt:eml>
        <ipt:dwca>http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para_ggbn</ipt:dwca>
        <guid isPermaLink="false">http://ipt.vertnet.org:8080/ipt/resource?id=msb_para_ggbn/v1.23</guid>
        <link>http://ipt.vertnet.org:8080/ipt/resource?r=msb_para</link>
        <ipt:eml>http://ipt.vertnet.org:8080/ipt/eml.do?r=msb_para</ipt:eml>
        <link>http://ipt.vertnet.org:8080/ipt/resource?r=hwml_para</link>
        <ipt:eml>http://ipt.vertnet.org:8080/ipt/eml.do?r=hwml_para</ipt:eml>
        <link>http://ipt.vertnet.org:8080/ipt/resource?r=owu_para</link>
        <ipt:eml>http://ipt.vertnet.org:8080/ipt/eml.do?r=owu_para</ipt:eml>

where 17da4ef08733e4be6431053b4b0b90b77d6f7cc5fccc74e73b2b149df0aecbd9 is the rss feed retrieved on 2022-08-27T08:01:11.633Z from http://ipt.vertnet.org:8080/ipt/rss.do .

By default, I configured GloBI to ignore the ggbn datasets. And, for DMNS, a ggbn endpoint exists, but not a "normal" one. Can you help explain what is going on?
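
The "ignore ggbn" rule can be illustrated with a small filter over the feed. This is a sketch only, not GloBI's actual configuration: it assumes each `<ipt:dwca>` element sits on one line, as in the excerpt above, and `non_ggbn_archives` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Read an IPT rss feed on stdin and print the Darwin Core archive URLs,
# skipping any whose resource name contains "_ggbn".
non_ggbn_archives() {
  grep -o '<ipt:dwca>[^<]*</ipt:dwca>' \
    | sed -e 's/<ipt:dwca>//' -e 's/<\/ipt:dwca>//' \
    | grep -v '_ggbn'
}
```

Run against the grep output above, this prints nothing for DMNS, because the only DMNS parasite archive listed there is the `dmns_para_ggbn` one.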

@jhpoelen

jhpoelen commented Sep 2, 2022

for some reason, http://ipt.vertnet.org:8080/ipt/resource?id=dmns_para_ggbn links to

Screenshot from 2022-09-02 13-48-14

and http://ipt.vertnet.org:8080/ipt/resource?id=dmns_para (no ggbn) links to a landing page that also mentions GGBN

Screenshot from 2022-09-02 13-49-19

and confusingly, the data as DwC "download" link for http://ipt.vertnet.org:8080/ipt/resource?id=dmns_para#anchor-downloads is
http://ipt.vertnet.org:8080/ipt/archive.do?r=dmns_para_ggbn&v=1.23

@ebraker did you ever see the linking work for DMNS? Was there any recent work done on the digital management of DMNS?

Thanks for being patient as I am trying to figure out what is going on.

@jhpoelen

jhpoelen commented Sep 2, 2022

In tracking the provenance, I notice that the DMNS Para collection is accessible via http://ipt.vertnet.org:8080/ipt/archive.do?r=dmn instead of the expected http://ipt.vertnet.org:8080/ipt/archive.do?r=dmns_para .

@ebraker
Contributor

ebraker commented Sep 2, 2022

@jhpoelen Thanks for looking into this! Indeed, I thought I saw the links working early last week, but I could have been playing around with a DMNS specimen that linked to an MSB parasite (and not a DMNS parasite). @acdoll - any recent changes to IPT metadata at DMNS?

@jhpoelen

jhpoelen commented Sep 7, 2022

@ebraker I dug around a bunch to look for the linked DMNS specimens. And, they are showing up in the early GloBI processing stages. So, most likely, the links are somehow lost in transit in later GloBI processing stages.

Here's an example of such records, extracted via a 2022-09 elton interactions globalbioticinteractions/vertnet

https://en.wiktionary.org/wiki/support http://arctos.database.museum/guid/DMNS:Mamm:11922?seid=463326 DMNS:Mamm:11922 Mamm 45 DMNS Callospermophilus lateralis Animalia | Chordata | Mammalia | Rodentia | Sciuridae | Callospermophilus | Callospermophilus lateralis kingdom | phylum | class | order | family | genus | species male http://purl.obolibrary.org/obo/RO_0002445 hasParasite http://arctos.database.museum/guid/DMNS:Para:933?seid=3736551 DMNS:Para:933 Para 127 DMNSSiphonaptera Animalia | Arthropoda | Insecta | Siphonaptera kingdom | phylum | class | order PreservedSpecimen 2009-05-15T00:00:00Z 40.6708 -105.6025 Mummy Range; FR268 off FR139 (Crown Point Road) http://arctos.database.museum/guid/DMNS:Mamm:11922 http://arctos.database.museum/guid/DMNS:Mamm:11922 globalbioticinteractions/vertnet DMNS Mammal Collection (Arctos) - Version 34.66 http://ipt.vertnet.org:8080/ipt/archive.do?r=dmns_mamm 2022-09-03T15:37:41.720Z f324019ad23212691cdb91d958bf8cf266bb93dd97210f1a153bc070a996c0a0 0.12.4

which tells me that GloBI is able to successfully link http://arctos.database.museum/guid/DMNS:Mamm:11922 with its counterpart http://arctos.database.museum/guid/DMNS:Para:933 .

So, as far as I can tell, the Arctos side is working perfectly! I'll have to dig into the linking issue a little more, and hope to report back.

Thanks for being patient. . . I find linking stuff pretty tricky, especially when dealing with cross-platform, cross-institutional references . . . clearly, more work is needed to simplify link mechanisms.

PS @dustymc currently, I am discovering links between occurrenceIds/ collection codes (e.g., MSB:Para) and their "official" data feed (e.g., http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para) by either manually curating a list (see https://github.com/globalbioticinteractions/msb-para/blob/72ff2d6e9a30df60a3a843858f9e7a099f21bf5d/rss.xml) or by re-using the vertnet rss feed. Did you ever consider explicitly declaring dependencies of MSB:Para on other integration endpoints (e.g., MSB:Mamm, GenBank)? If so, I might be able to re-use that list of dependencies as the equivalent of data import statements (e.g., "MSB:Para references records from MSB:Mamm, and MSB:Mamm records may be found at http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_mamm ").
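
To make the "import statement" idea concrete, such a declaration could be as simple as a two-column table mapping collection codes to archive endpoints. The sketch below is purely hypothetical: no such table exists in Arctos or GloBI as far as this thread shows, and `collection_endpoints` and `endpoint_for` are made-up names. The two URLs are the ones mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical dependency table: collection code, tab, archive URL.
collection_endpoints() {
  printf '%s\t%s\n' \
    'MSB:Para' 'http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_para' \
    'MSB:Mamm' 'http://ipt.vertnet.org:8080/ipt/archive.do?r=msb_mamm'
}

# Print the archive URL declared for a collection code ($1), if any.
endpoint_for() {
  collection_endpoints | awk -F '\t' -v code="$1" '$1 == code { print $2 }'
}
```

With that in place, `endpoint_for MSB:Mamm` resolves a referenced collection to its feed without consulting the vertnet rss feed.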

@jhpoelen

jhpoelen commented Sep 7, 2022

@ebraker similarly, I found records linking UCM:Para to UCM:Mamm (see example http://arctos.database.museum/guid/UCM:Mamm:20797 hasParasite http://arctos.database.museum/guid/UCM:Para:10 below). Please do note however, that the UCM:Para collection is somehow not included on the vertnet rss feed, so GloBI was unable to associate the detailed taxonomic information about the parasite. Did you publish the UCM:Para collection publicly?

https://en.wiktionary.org/wiki/support http://arctos.database.museum/guid/UCM:Mamm:20797?seid=2623787 UCM:Mamm:20797 Mamm 95 UCM Sorex cinereus Animalia | Chordata | Mammalia | Soricomorpha | Soricidae | Sorex | Sorex cinereus kingdom | phylum | class | order | family | genus | species unknown http://purl.obolibrary.org/obo/RO_0002445 hasParasite http://arctos.database.museum/guid/UCM:Para:10 http://arctos.database.museum/guid/UCM:Para:10 PreservedSpecimen 2012-09-10T00:00:00Z 37.84595333 -108.0318 San Juan Mountains, Lizardhead Wilderness, along north side of West Dolores River, just east of Navajo Lake ('Navajo Lake Site') http://arctos.database.museum/guid/UCM:Mamm:20797 http://arctos.database.museum/guid/UCM:Mamm:20797 globalbioticinteractions/vertnet UCM Mammal Collection (Arctos) - Version 17.70 http://ipt.vertnet.org:8080/ipt/archive.do?r=ucm_mammals 2022-09-03T15:37:41.720Z f324019ad23212691cdb91d958bf8cf266bb93dd97210f1a153bc070a996c0a0 0.12.4

@ebraker
Contributor

ebraker commented Sep 7, 2022

@jhpoelen Thanks for looking into this! We set up the UCM:Para collection fairly recently, so I still need to coordinate with Dave to have it published to the VertNet IPT. On the to-do list!

@jhpoelen

jhpoelen commented Sep 7, 2022

Hey y'all -

After some anxious moments yesterday (where did all the links go?), I did some checking, and found that the recently (just today) updated GloBI index has the expected Arctos links re-established.

See e.g., globalbioticinteractions/globalbioticinteractions#818 globalbioticinteractions/globalbioticinteractions#817 .

And, the original issue, https://api.globalbioticinteractions.org/exists?accordingTo=http://arctos.database.museum/guid/MSB:Para:30008 returning 404 is no longer happening. Instead the query returns the expected 200 "OK" (see screenshot)

Screenshot from 2022-09-07 12-09-57

So, as far as I can tell, the Arctos datasets exposed via Vertnet have been re-indexed by GloBI and no longer include the zero-byte compression experiments by @dustymc and @dbloom .

That was fun! To me this exercise once again shows that: 1. data integration is a continuous activity, 2. humans are needed to take care of these data integration processes by reporting unexpected behaviors and/or improving data integration methods and 3. Arctos has an active community of contributors that care not only for their own data, but are interested to care for the health of the systems that re-use the Arctos data.

So thanks!

Suggest to close this issue, unless there are remaining concerns.

@Jegelewicz
Member

Yep - I think we can close!
