Get taxon match categories #39

peterdesmet · 2015-01-26T13:02:01Z

Description

For a given dataset, I want to know how many records provide a taxon. I also want to know how many of those match the GBIF taxonomy and if there are any issues.

Outcome

dataset_key
taxon_not_provided
taxon_match_none
taxon_match_higherrank
taxon_match_fuzzy
taxon_match_complete

Terms we need

scientificName
genus
issues

Process

IF scientificName = "" OR genus = ""
    /* If scientificName is empty, GBIF builds a name from genus, specificEpithet, etc.; see
       https://github.com/gbif/occurrence/blob/master/occurrence-processor/src/main/java/org/gbif/occurrence/processor/interpreting/TaxonomyInterpreter.java#L34
       So if scientificName is empty, we only need to check genus (no need to check the other atomized fields).
       Note: TAXON_MATCH_NONE is also applied to empty taxa (unless the record was indexed before that
       issue was introduced). */
    THEN category = "taxon_not_provided"
ELSEIF issues CONTAINS (TAXON_MATCH_NONE)
    THEN category = "taxon_match_none"
ELSEIF issues CONTAINS (TAXON_MATCH_HIGHERRANK)
    THEN category = "taxon_match_higherrank"
ELSEIF issues CONTAINS (TAXON_MATCH_FUZZY)
    THEN category = "taxon_match_fuzzy"
ELSE category = "taxon_match_complete"
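
The branches above could be sketched in Python roughly as follows. This is only an illustration, not the extraction module's actual code: the function name `taxon_match_category` is hypothetical, `issues` is assumed to be a collection of GBIF issue names (not a single string), and `None` or whitespace-only values are assumed to count as empty. The first branch is taken literally as written (either field empty).

```python
def taxon_match_category(scientific_name, genus, issues):
    """Return exactly one category per record, testing the branches in order."""

    def is_empty(value):
        # Assumption: None and whitespace-only strings count as "not provided".
        return value is None or value.strip() == ""

    # First branch, as written in the pseudocode: either field empty.
    if is_empty(scientific_name) or is_empty(genus):
        return "taxon_not_provided"
    # `issues` is assumed to be a set or list of issue names.
    if "TAXON_MATCH_NONE" in issues:
        return "taxon_match_none"
    if "TAXON_MATCH_HIGHERRANK" in issues:
        return "taxon_match_higherrank"
    if "TAXON_MATCH_FUZZY" in issues:
        return "taxon_match_fuzzy"
    return "taxon_match_complete"
```

Because the branches are tested in order, a record with empty name fields is counted as taxon_not_provided even when it also carries TAXON_MATCH_NONE, and each record ends up in a single category.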

niconoe · 2015-01-28T12:42:58Z

The data extraction module now supports this metric. Sample report output:

{
  "8137b32e-f762-11e1-a439-00145eb45e9a": {
    "NUMBER_OF_RECORDS": 756426,
    "BASISOFRECORDS": {
      "UNKNOWN": 6,
      "OBSERVATION": 11660,
      "PRESERVED_SPECIMEN": 744760
    },
    "TAXON_MATCHES": {
      "TAXON_MATCH_HIGHERRANK": 63987,
      "TAXON_MATCH_FUZZY": 17272,
      "TAXON_MATCH_COMPLETE": 607443,
      "TAXON_NOT_PROVIDED": 67724
    }
  }
}

(I took the liberty of making the constants uppercase for the sake of consistency.)

bartaelterman · 2015-01-29T12:05:41Z

@niconoe I think we're missing TAXON_MATCH_NONE, for records that provide a taxon that could not be matched. At least, there is a taxon_match_none column in the CartoDB table created by @peterdesmet.

bartaelterman · 2015-01-29T12:28:36Z

Test data has been written to CartoDB. Only taxon_match_none is still missing: all values for that column are set to 0.

niconoe · 2015-01-29T12:51:50Z

Well, I just had a quick look and it seems the code supports it, but most records that trigger this issue at GBIF have no scientificName at all: http://www.gbif.org/occurrence/search?ISSUE=TAXON_MATCH_NONE

As I blindly implemented Peter's algorithm above, I think we return TAXON_NOT_PROVIDED for those (unlike the GBIF services, this algorithm puts each row in a single category. Is that desirable?). Also, the data extractor (so far) doesn't return a TAXON_MATCH_* counter at all if no record falls in that category.

Should Peter's algorithm be changed? Would you like the report to contain TAXON_MATCH_NONE: 0 (instead of nothing)?

Thx!

bartaelterman · 2015-01-29T12:58:38Z

OK, then everything is fine. The aggregator fills in zeros for tags that are not found, so there is no need to add that to the extractor.
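
For illustration, zero-filling of this kind might look like the sketch below. It is not the aggregator's actual code: the names `TAXON_MATCH_CATEGORIES` and `fill_missing_counts` are hypothetical, and the category list is assumed to be the five outcomes from the issue description in uppercase.

```python
# Assumed list of expected tags: the five outcome categories, uppercased.
TAXON_MATCH_CATEGORIES = (
    "TAXON_NOT_PROVIDED",
    "TAXON_MATCH_NONE",
    "TAXON_MATCH_HIGHERRANK",
    "TAXON_MATCH_FUZZY",
    "TAXON_MATCH_COMPLETE",
)


def fill_missing_counts(taxon_matches):
    """Return a count for every expected category, using 0 where the extractor reported none."""
    return {
        category: taxon_matches.get(category, 0)
        for category in TAXON_MATCH_CATEGORIES
    }


# The TAXON_MATCHES block from the sample report, which has no TAXON_MATCH_NONE entry.
sample = {
    "TAXON_MATCH_HIGHERRANK": 63987,
    "TAXON_MATCH_FUZZY": 17272,
    "TAXON_MATCH_COMPLETE": 607443,
    "TAXON_NOT_PROVIDED": 67724,
}
filled = fill_missing_counts(sample)
```

With this approach the extractor can keep omitting empty counters, and the CartoDB column still receives an explicit 0.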

peterdesmet changed the title ~~Get identification categories~~ Get taxon match categories Jan 26, 2015

peterdesmet added this to the Taxon match milestone Jan 26, 2015

peterdesmet added the backend label Jan 26, 2015

peterdesmet assigned niconoe Jan 26, 2015

peterdesmet mentioned this issue Jan 26, 2015

List of all issues GBIF provides #26

Closed

42 tasks

niconoe closed this as completed Feb 2, 2015
