Get taxon match categories #39
Comments
The data extraction module now supports this metric. Sample report output: {
"8137b32e-f762-11e1-a439-00145eb45e9a": {
"NUMBER_OF_RECORDS": 756426,
"BASISOFRECORDS": {
"UNKNOWN": 6,
"OBSERVATION": 11660,
"PRESERVED_SPECIMEN": 744760
},
"TAXON_MATCHES": {
"TAXON_MATCH_HIGHERRANK": 63987,
"TAXON_MATCH_FUZZY": 17272,
"TAXON_MATCH_COMPLETE": 607443,
"TAXON_NOT_PROVIDED": 67724
}
}
} (I took the liberty of making the constants uppercase for the sake of consistency.)
@niconoe I think we're missing |
test data is written to cartodb. Only |
Well, I just had a quick look and it seems the code supports it, but most records that trigger this issue at GBIF have no scientificName at all:
http://www.gbif.org/occurrence/search?ISSUE=TAXON_MATCH_NONE
As I blindly implemented Peter's algorithm above, I think we will return TAXON_NOT_PROVIDED for those (unlike the GBIF services, this algorithm puts each row in a single category. Is that desirable?). Also, the data extractor (so far) doesn't return a TAXON_MATCH_* counter at all when no record falls into that category. Should Peter's algorithm be changed? Would you like the report to contain TAXON_MATCH_NONE: 0 (instead of nothing)? Thanks!
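To make the "single category per row" behaviour concrete, here is a rough Python sketch. It is not Peter's algorithm (which isn't reproduced in this part of the thread) and not the extractor's actual code: the function names, the `issues` field on a record, and the precedence order among the match issues are all assumptions for illustration. The only behaviours taken from the discussion are that a row without a scientificName ends up as TAXON_NOT_PROVIDED, that every row lands in exactly one category, and that categories without records are simply absent from the counts.

```python
from collections import Counter

# Hypothetical precedence: the first issue found on a record wins.
# The real ordering in the extractor may differ.
ISSUE_PRECEDENCE = (
    "TAXON_MATCH_NONE",
    "TAXON_MATCH_HIGHERRANK",
    "TAXON_MATCH_FUZZY",
)


def taxon_match_category(record):
    """Return exactly one taxon match category for a record (a dict)."""
    name = (record.get("scientificName") or "").strip()
    if not name:
        # No name supplied: classified before looking at any match issue,
        # so a nameless row flagged TAXON_MATCH_NONE still lands here.
        return "TAXON_NOT_PROVIDED"
    issues = set(record.get("issues", ()))  # "issues" is an assumed field name
    for issue in ISSUE_PRECEDENCE:
        if issue in issues:
            return issue
    return "TAXON_MATCH_COMPLETE"


def count_taxon_matches(records):
    """Count records per category; categories with no records are omitted."""
    return dict(Counter(taxon_match_category(r) for r in records))
```

Because each row gets one category, the counters always add up to the number of records, which a per-issue count (where a row can carry several issues) would not guarantee.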
OK, no, then everything is fine. The aggregator fills in zeros for tags that are not found, so there is no need to add that to the extractor.
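For reference, the zero-filling described here could look roughly like the Python sketch below. This is only an illustration of the idea, not the aggregator's actual code: `EXPECTED_TAXON_TAGS` and `fill_missing_tags` are hypothetical names, and the tag list is assumed from the categories mentioned in this thread. The input counts are the TAXON_MATCHES values from the sample report above.

```python
# Assumed list of tags the aggregator knows about (taken from this thread).
EXPECTED_TAXON_TAGS = (
    "TAXON_MATCH_COMPLETE",
    "TAXON_MATCH_FUZZY",
    "TAXON_MATCH_HIGHERRANK",
    "TAXON_MATCH_NONE",
    "TAXON_NOT_PROVIDED",
)


def fill_missing_tags(counts, expected=EXPECTED_TAXON_TAGS):
    """Return a copy of counts with a 0 entry for every expected tag not present."""
    filled = {tag: 0 for tag in expected}
    filled.update(counts)
    return filled


# TAXON_MATCHES block from the sample report (no TAXON_MATCH_NONE key).
sample = {
    "TAXON_MATCH_HIGHERRANK": 63987,
    "TAXON_MATCH_FUZZY": 17272,
    "TAXON_MATCH_COMPLETE": 607443,
    "TAXON_NOT_PROVIDED": 67724,
}

filled = fill_missing_tags(sample)
print(filled["TAXON_MATCH_NONE"])  # 0
print(sum(filled.values()))        # 756426, equal to NUMBER_OF_RECORDS
```

Keeping the defaulting on the aggregator side means the extractor's report stays sparse, and a new tag only has to be added to one list.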
Description
For a given dataset, I want to know how many records provide a taxon. I also want to know how many of those match the GBIF taxonomy and if there are any issues.
Outcome
Terms we need
Process