add common names to resolver result #40

Closed
jhpoelen opened this issue Jul 24, 2015 · 7 comments


@jhpoelen

As discussed with @dimus:

In addition to taxon hierarchies, I suggest including available common names for resolved taxa. This would help me immensely in making the search features in http://globalbioticinteractions.org friendlier for humans.

Currently the resolver returns something like:

...
data_source_id: 4,
data_source_title: "NCBI",
gni_uuid: "16f235a0-e4a3-529c-9b83-bd15fe722110",
name_string: "Homo sapiens",
canonical_form: "Homo sapiens",
classification_path: "|Eukaryota|Opisthokonta|Metazoa|Eumetazoa|Bilateria|Coelomata|Deuterostomia|Chordata|Craniata|Vertebrata|Gnathostomata|Teleostomi|Euteleostomi|Sarcopterygii|Tetrapoda|Amniota|Mammalia|Theria|Eutheria|Euarchontoglires|Primates|Haplorrhini|Simiiformes|Catarrhini|Hominoidea|Hominidae|Homininae|Homo|Homo sapiens",
classification_path_ranks: "|superkingdom||kingdom|||||phylum|subphylum||superclass||||||class|||superorder|order|suborder|infraorder|parvorder|superfamily|family|subfamily|genus|species",
classification_path_ids: "131567|2759|33154|33208|6072|33213|33316|33511|7711|89593|7742|7776|117570|117571|8287|32523|32524|40674|32525|9347|314146|9443|376913|314293|9526|314295|9604|207598|9605|9606",
...

Suggested result (including common names and a taxon id), something like:

...
data_source_id: 4,
data_source_title: "NCBI",
gni_uuid: "16f235a0-e4a3-529c-9b83-bd15fe722110",
name_string: "Homo sapiens",
canonical_form: "Homo sapiens",
common_names: "human @en|Mensch @de|mens @nl",
classification_path: "|Eukaryota|Opisthokonta|Metazoa|Eumetazoa|Bilateria|Coelomata|Deuterostomia|Chordata|Craniata|Vertebrata|Gnathostomata|Teleostomi|Euteleostomi|Sarcopterygii|Tetrapoda|Amniota|Mammalia|Theria|Eutheria|Euarchontoglires|Primates|Haplorrhini|Simiiformes|Catarrhini|Hominoidea|Hominidae|Homininae|Homo|Homo sapiens",
classification_path_ranks: "|superkingdom||kingdom|||||phylum|subphylum||superclass||||||class|||superorder|order|suborder|infraorder|parvorder|superfamily|family|subfamily|genus|species",
classification_path_ids: "131567|2759|33154|33208|6072|33213|33316|33511|7711|89593|7742|7776|117570|117571|8287|32523|32524|40674|32525|9347|314146|9443|376913|314293|9526|314295|9604|207598|9605|9606",
taxon_id: "9606",
...
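
A minimal sketch of how a client might split the proposed common_names value into (name, language) pairs. The `name @lang|name @lang` layout is taken from the suggested result above; the function name `parse_common_names` is hypothetical, and the format the resolver actually returns may differ.

```python
from typing import List, Optional, Tuple


def parse_common_names(value: str) -> List[Tuple[str, Optional[str]]]:
    """Split 'human @en|Mensch @de' into [('human', 'en'), ('Mensch', 'de')]."""
    pairs = []
    for entry in value.split("|"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate leading, trailing, or doubled separators
        # Split on the last ' @' so an '@' earlier in the name survives.
        name, sep, lang = entry.rpartition(" @")
        if sep:
            pairs.append((name.strip(), lang.strip() or None))
        else:
            # No language tag present on this entry.
            pairs.append((entry, None))
    return pairs


print(parse_common_names("human @en|Mensch @de|mens @nl"))
# [('human', 'en'), ('Mensch', 'de'), ('mens', 'nl')]
```

Entries without a language tag are kept with a language of None rather than dropped, so the caller can decide how to treat them.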

@dimus dimus closed this as completed in df0cbea Aug 12, 2015
@dimus
Member

dimus commented Aug 12, 2015

Adding the parameter with_vernaculars=true will add common-name information to the output.
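
For illustration, a request using the new parameter might be built as below. Only `with_vernaculars=true` comes from this thread; the base URL, the `names` parameter, and the `|` separator are assumptions about the resolver's HTTP interface and should be checked against its documentation. The sketch only builds the URL and makes no network call.

```python
from urllib.parse import urlencode

# Assumed endpoint; it is not stated anywhere in this thread.
RESOLVER_URL = "http://resolver.globalnames.org/name_resolvers.json"


def build_resolver_url(names, with_vernaculars=True):
    """Return a GET URL asking the resolver for the given names."""
    # Assumption: multiple names are passed in one pipe-separated parameter.
    params = {"names": "|".join(names)}
    if with_vernaculars:
        params["with_vernaculars"] = "true"
    return RESOLVER_URL + "?" + urlencode(params)


print(build_resolver_url(["Homo sapiens"]))
```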

jhpoelen pushed a commit to globalbioticinteractions/globalbioticinteractions that referenced this issue Aug 12, 2015
@jhpoelen
Author

Thanks for adding the vernacular names, @dimus. Some observations:

  1. For frogs (Anura), GBIF seems to include many languages but not English.
  2. WoRMS doesn't seem to have any vernaculars.
  3. NCBI has vernaculars but doesn't set the language (it seems to be English by default).
  4. ITIS has some vernaculars, and they seem to be Spanish only. The language label is not the two-letter code I am used to (e.g. "es"); instead it is spelled out, as in "bagre boca chica @spanish".

You can find some specific examples in globalbioticinteractions/globalbioticinteractions@5d6ab97 .

Are these results expected?
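
One way a client could cope with the mixed language labels described above is to normalize them before use. This is a sketch only: the mapping table is a small hand-written assumption (not something the resolver provides), and the function name `normalize_language` is hypothetical. A missing label is reported as unknown rather than assumed to be English, since the English default for NCBI is only a guess in this thread.

```python
from typing import Optional

# Hypothetical, hand-maintained mapping from spelled-out language names
# to two-letter codes; extend it as new labels show up.
LANGUAGE_NAMES = {
    "english": "en",
    "spanish": "es",
    "german": "de",
    "dutch": "nl",
    "french": "fr",
}


def normalize_language(label: Optional[str]) -> Optional[str]:
    """Map labels such as '@spanish' or 'ES' to 'es'; None if unknown."""
    if label is None:
        return None
    cleaned = label.strip().lstrip("@").lower()
    if not cleaned:
        return None
    if cleaned in LANGUAGE_NAMES:
        return LANGUAGE_NAMES[cleaned]
    # Already looks like a two-letter code: keep it as-is (lowercased).
    if len(cleaned) == 2 and cleaned.isalpha():
        return cleaned
    return None


print(normalize_language("@spanish"))  # es
```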

jhpoelen pushed a commit to globalbioticinteractions/globalbioticinteractions that referenced this issue Aug 22, 2015
@jhpoelen
Author

After running a couple of batches in production with globalbioticinteractions, I noticed that name resolving against globalnames started causing internal server errors and gateway timeouts after the introduction of the vernacular names. I've disabled the feature for now and hope to re-enable it once we understand how to fix it.

Here's an example from the logs:

2015-08-20 12:31:51,812 [main] ERROR org.eol.globi.tool.LinkerGlobalNames - batch #1117 problem matching terms: [4475460
|Zilora ferruginea|4474176|Xerocomus communis|4474947|Xylota segnis|4475203|Zaraea fasciata|4474447|Xylaria filiformis|4
475477|Zodion|4474960|Xylota sylvarum|4475216|Zaraea lonicerae|4474973|Xylota tarda|4475480|Iberis|4475225|Zelleromyces 
stephensii|4475238|Zenillia libatrix|4474978|Xylota xanthocnema|4475490|Zodion cinereum|4474479|Xylaria guepinii|4475503
|Zoellneria eucalypti|4474473|Xylaria friesii|4475241|Archiearis notha|4474987|Xylotachina diluta|4474484|Xylaria hypoxy
lon|4474992|Xyphosia miliaria|4475512|Zoellneria rosarum|4475142|Carabus (Megodontus) violaceus|4475654|Zygorhizidium me
losirae|4474369|Xyela julii|4475139|Zaira cinerea|4475651|Aulacoseira italica subsp. subarctica|4475662|Kirchneriella ob
esa|4475147|Carabus (Morphocarabus) monilis|4475659|Zygorhizidium parvum|4475157|Pterostichus (Platysma) niger|4474641|X
yleborus dryographus|4474386|Xyela longula|4475667|Kirchneriella|4475164|Zalerion arboricola|4474911|Xylohypha ortmansia
e|4474395|Xylaplothrips fuliginosus|4475430|Galerucella|4474658|Xylechinus pilosus|4475181|Zalerion maritima|4474671|Xyl
etinus longitarsis|4474920|Xylohypha pinicola|4474420|Xylaria carpophila|4475445|Zignoëlla morthieri|4474934|Xylophaga p
raestans|4475703|Zygospermella striata|4474929|Xylophaga dorsalis|4475698|Zygospermella insignis|4475455|Zignoëlla slapt
onensis|4474681|Xylobolus frustulatus|4474426|Xylaria|4475194|Zaraea aenea|4475450|Zignoëlla rhytidodes|4475078|Zabrus t
enebrioides|4475585|Zwackhiomyces dispersus|4474307|Xerula radicata|4474829|Xylohypha ferruginosa|4474824|Xylocoris (Xyl
ocoris) formicetorum|4475350|Zeugophora turneri|4475095|Zacladus exiguus|4475607|Zwackhiomyces sphinctrinoides|4474320|X
estobium rufovillosum|4475602|Zwackhiomyces lacustris|4475100|Zacladus geranii|4474334|Xestophanes potentillae|4474591|X
yleborinus saxesenii|4475610|Leptogium turgidum|1632955|Xylota|4475111|Phillyrea latifolia|4475619|Clauzadea metzleri|44
74604|Xyleborus dispar|4474351|Xiphydria prolongata|4474858|Xylohypha nigrescens|4475114|Zaghouania phillyreae|4475371|Z
euzera pyrina|4475639|Zygogloea gemellipara|9397|Halictus|4475634|Zygiobia carpini|4474364|Xyela curva|4475644|Zygophial
a jamaicensis|4475269|Zeugophora flavicollis|4475264|Zenobiana prismatica|4475521|Zoopage thamnospira|4475535|Zoophthora
 anglica|4475530|Zoophagus insidians|816095|Phillyrea latifolia|4475544|Zoophthora radicans|4474790|Xylocleptes bispinus
|4475558|Zoothamnion arbuscula|4474273|Xerula caussei|4475043|Pisum sativum var. sativum|4474798|Cryptolestes ferrugineu
s|4474795|Xylocoris (Proxylocoris) galactinus|4475563|Zopfia rhizophila|4475572|Zopfiella erostrata|4474806|Xylocoris (X
ylocoris) cursitans|4474545|Xylaria oxyacanthae|4474803|Bitoma crenata|4474809|Rhizophagus|4474298|Xerula pudens]
org.eol.globi.service.PropertyEnricherException: Failed to query
        at org.eol.globi.service.GlobalNamesService.findTermsForNames(GlobalNamesService.java:74)
        at org.eol.globi.tool.LinkerGlobalNames.handleBatch(LinkerGlobalNames.java:66)
        at org.eol.globi.tool.LinkerGlobalNames.link(LinkerGlobalNames.java:45)
        at org.eol.globi.tool.Normalizer.linkTaxa(Normalizer.java:127)
        at org.eol.globi.tool.Normalizer.run(Normalizer.java:98)
        at org.eol.globi.tool.Normalizer.main(Normalizer.java:57)
Caused by: org.apache.http.client.HttpResponseException: Internal Server Error
        at org.apache.http.impl.client.BasicResponseHandler.handleResponse(BasicResponseHandler.java:67)
        at org.apache.http.impl.client.BasicResponseHandler.handleResponse(BasicResponseHandler.java:52)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:218)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:160)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:136)
        at org.eol.globi.service.GlobalNamesService.queryForNames(GlobalNamesService.java:99)
        at org.eol.globi.service.GlobalNamesService.findTermsForNames(GlobalNamesService.java:71)
        ... 5 more

another one:

2015-08-20 12:30:51,543 [main] ERROR org.eol.globi.tool.LinkerGlobalNames - batch #1115 problem matching terms: [4471364
|Valsella amphoraria|4471621|Velutina plicatilis|4472135|Verticillium|4472128|Lecanora albescens|4471618|Hydrozoa|447238
6|Vibrissea guernisacii|4472140|Verticillium albo-atrum|4471373|Valsella clypeata|4471624|Styela coriacea|4471627|Veluti
na velutina|4471382|Valsella polyspora|4471632|Venturia carpophila|4471888|Venturia maculiformis|347923|Scilla|4472154|V
erticillium catenulatum|4471387|Valsella salicis|4471396|Vankya ornithogali|4470375|Bellevalia|4472167|Verticillium dahl
iae|4471905|Venturia minuta|4470127|Ustilago maydis|4471919|Venturia populina|4471914|Venturia palustris|4471147|Valsa i
ntermedia|4471156|Valsa laurocerasi|1052452|Salix matsudana|4471422|Vararia gallica|4472185|Verticillium insectorum|4470
534|Valsa ambiens|4472070|Veronaea botryosa|4470529|Valsa abrupta|4471555|Velutarina rufo-olivacea|4472079|Veronaea cari
cis|4472084|Veronaea carlinae|4472089|Veronaea parvispora|4471322|Valsaria insitiva|4470305|Ustilago tritici|4471086|Val
sa cypri|4471855|Venturia macularis|4472111|Verpa conica|4471081|Valsa ceuthospora|4472116|Verrucaria conturmatula|44703
20|Scilla sardensis|4470323|Ustilago vaillantii|4472371|Vespula (Vespula) austriaca|4472381|Vibrissea flavovirens|447212
1|Verrucaria latericola|4470330|Muscari botryoides|4272191|Valsa|4471355|Valsella adhaerens|4470470|Valdensia heterodoxa
|4472007|Venturia saliciperda|4472263|Physarum compressum|4472268|Physarum leucopus|4470217|Elytrigia juncea|4472277|Ste
monitis axifera|4471260|Valsaria anserina|4471516|Vasates pedicularis|4470493|Gaultheria|4471519|Acer saccharinum|447229
3|Vesiculomyces citrinus|4471527|Vasates retiolatus|4472288|Verticillium|4471265|Valsaria cincta|4471522|Vasates quadrip
edes|4471779|Venturia crataegi|4470504|Valsa abietis|4471784|Venturia ditricha|4471540|Vasates rigidus|4471793|Venturia 
fraxini|4472061|Venturiocistella ulicicola|4471550|Velutarina juniperi|4471806|Venturia geranii|4472056|Venturiocistella
 heterotricha|4471545|Vascellum pratense|4471941|Venturia pyrina|4471936|Venturia potentillae|4470414|Ustilentyloma bref
eldii|4470927|Valsa auerswaldii|4472200|Verticillium nubilum|4472213|Verticillium psalliotae|4471190|Valsa sordida|44711
85|Valsa pini|4471441|Climbing plants|4470431|Animalia|4470424|Ustilentyloma fluitans|4470936|Valsa ceratosperma|4471448
|Vararia ochroleuca|4471705|Venturia cerasi|4472218|Verticillium rexianum|4470438|Utricularia australis|4471974|Venturia
 rumicis|4472230|Ceratiomyxa fruticulosa|4472225|Arcyria nutans|4470441|Crustacea|4471209|Populus balsamifera|4470448|Ut
ricularia minor|4470194|Ustilago serpens|4471738|Venturia chlorospora|4470459|Diaptomus]
org.eol.globi.service.PropertyEnricherException: Failed to query
        at org.eol.globi.service.GlobalNamesService.findTermsForNames(GlobalNamesService.java:74)
        at org.eol.globi.tool.LinkerGlobalNames.handleBatch(LinkerGlobalNames.java:66)
        at org.eol.globi.tool.LinkerGlobalNames.link(LinkerGlobalNames.java:45)
        at org.eol.globi.tool.Normalizer.linkTaxa(Normalizer.java:127)
        at org.eol.globi.tool.Normalizer.run(Normalizer.java:98)
        at org.eol.globi.tool.Normalizer.main(Normalizer.java:57)
Caused by: org.apache.http.client.HttpResponseException: Gateway Time-out
        at org.apache.http.impl.client.BasicResponseHandler.handleResponse(BasicResponseHandler.java:67)
        at org.apache.http.impl.client.BasicResponseHandler.handleResponse(BasicResponseHandler.java:52)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:218)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:160)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:136)
        at org.eol.globi.service.GlobalNamesService.queryForNames(GlobalNamesService.java:99)
        at org.eol.globi.service.GlobalNamesService.findTermsForNames(GlobalNamesService.java:71)
        ... 5 more

@jhpoelen
Author

@dimus, I suggest re-opening this issue given the behavior reported above.

@dimus
Member

dimus commented Sep 23, 2015

@jhpoelen -- do these examples consistently break the resolver?

@jhpoelen
Author

yep.

@dimus dimus reopened this Sep 23, 2015
@jhpoelen
Author

@dimus I was able to reproduce and fix the issue on my end. It turned out to be a character encoding issue in the HTTP POST request that GloBI sends to the resolver. Thanks again for adding the vernacular names to the resolver results.
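
The thread does not record what exactly the encoding bug was or how it was fixed. As an illustration of this class of problem only, the sketch below form-encodes a name containing a non-ASCII character (Zignoëlla appears in the first failing batch above) under two charsets. The `data` field name, the helper `encode_form_body`, and UTF-8 as the charset the server expects are all assumptions.

```python
from urllib.parse import urlencode


def encode_form_body(names, charset="utf-8"):
    """Build an application/x-www-form-urlencoded body for a POST."""
    # 'data' is a hypothetical field name used only for this example.
    body = urlencode({"data": "\n".join(names)}, encoding=charset)
    headers = {
        "Content-Type": f"application/x-www-form-urlencoded; charset={charset}"
    }
    return body, headers


names = ["Zignoëlla morthieri"]
utf8_body, _ = encode_form_body(names, "utf-8")
latin1_body, _ = encode_form_body(names, "latin-1")

print(utf8_body)    # data=Zigno%C3%ABlla+morthieri
print(latin1_body)  # data=Zigno%EBlla+morthieri
```

The two bodies differ only in how "ë" is encoded. A server that decodes the second form as UTF-8 would meet an invalid byte sequence, which is one plausible way for a request to fail on some batches and not others.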

jhpoelen pushed a commit to globalbioticinteractions/globalbioticinteractions that referenced this issue Nov 23, 2015