Create date quality categories #24

Open
5 tasks
peterdesmet opened this issue Jan 16, 2015 · 2 comments

Comments

@peterdesmet
Member

Description

For a given dataset, I want to know how many records have dates. I also want to know how many of those are useful, have issues, and maybe what their precision is. I envision this as a bar chart, where the records are grouped in categories based on the quality of the dates.

Categories (in order of increasing data quality)

  • Date not provided
  • Date with major issues
  • Date with minor issues
  • Valuable date (all in ISO8601)
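Because the bar chart should show the categories in this fixed order of increasing quality, one option is to model them as an ordered enumeration and count records per category. The sketch below makes some assumptions. `DateQuality`, `LABELS` and `bar_chart_counts` are hypothetical names, and each record is assumed to have already been assigned a category.

```python
from collections import Counter
from enum import IntEnum


class DateQuality(IntEnum):
    """Date quality categories, ordered by increasing data quality."""
    NOT_PROVIDED = 0
    MAJOR_ISSUES = 1
    MINOR_ISSUES = 2
    VALUABLE = 3


# Display labels for the bar chart (hypothetical mapping).
LABELS = {
    DateQuality.NOT_PROVIDED: "Date not provided",
    DateQuality.MAJOR_ISSUES: "Date with major issues",
    DateQuality.MINOR_ISSUES: "Date with minor issues",
    DateQuality.VALUABLE: "Valuable date (all in ISO8601)",
}


def bar_chart_counts(categories):
    """Count records per category and return (label, count) pairs in chart order.

    Categories without records keep a count of 0, so every bar is always drawn.
    """
    counts = Counter(categories)
    return [(LABELS[quality], counts.get(quality, 0)) for quality in sorted(DateQuality)]
```

Using an `IntEnum` means the order of the bars follows from the integer values, so it cannot drift from the order of the category list.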

Questions

Terms we need

  • eventDate
  • issue
  • eventDate from verbatim.txt
  • verbatimEventDate
  • year
  • month
  • day

Process

IF eventDate != "" AND issue DOES NOT CONTAIN (
        RECORDED_DATE_MISMATCH
    )
    THEN category = "Valuable date (all in ISO8601)" /* Well, MM-DD-YYYY dates still end up in here */
ELSEIF issue CONTAINS (
        RECORDED_DATE_MISMATCH /* The only issue that keeps eventDate populated */
        )
        OR verbatim.txt.eventDate != "" /* Since GBIF empties eventDate (see #27) in occurrence.txt,
            we'd have to look in verbatim.txt :( */
        OR verbatimEventDate != ""
        OR year != ""
        OR (year != "" AND month != "")
        OR (year != "" AND month != "" AND day != "")
    /* A date was provided */
    THEN category = "Date provided, but not interpreted by GBIF"
ELSE
    category = "Date not provided"
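The process above can be sketched in Python, with the ELSEIF branch read as a chain of OR conditions. This is an illustration under a few assumptions, not a tested implementation. Each record is a dict of term values. The GBIF `issue` column is assumed to separate flags with `;`. The key `verbatim_eventDate` is a hypothetical name for the eventDate taken from verbatim.txt.

```python
VALUABLE = "Valuable date (all in ISO8601)"
NOT_INTERPRETED = "Date provided, but not interpreted by GBIF"
NOT_PROVIDED = "Date not provided"


def parse_issues(issue_field):
    """Split the issue column into a set of flags (assumes ';' as separator)."""
    return {flag.strip() for flag in issue_field.split(";") if flag.strip()}


def date_category(record):
    """Assign one record to a date category, following the process above.

    `record` maps term names to strings; missing or None values count as empty.
    The key 'verbatim_eventDate' (hypothetical) holds eventDate from verbatim.txt.
    """

    def get(term):
        return (record.get(term) or "").strip()

    mismatch = "RECORDED_DATE_MISMATCH" in parse_issues(get("issue"))

    if get("eventDate") and not mismatch:
        return VALUABLE
    # The month and day conditions in the process all require a year,
    # so a non-empty year already covers them.
    date_provided = any(
        get(term) for term in ("verbatim_eventDate", "verbatimEventDate", "year")
    )
    if mismatch or date_provided:
        return NOT_INTERPRETED
    return NOT_PROVIDED
```

For example, a record with an empty eventDate but `year` set to "2014" ends up in "Date provided, but not interpreted by GBIF", while a record with no date terms at all ends up in "Date not provided".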
@peterdesmet peterdesmet added this to the Term metrics milestone Jan 16, 2015
@peterdesmet
Member Author

The (pretty useless) process if we only use GBIF issues:

IF issue CONTAINS ANY OF (
        RECORDED_DATE_INVALID,
        RECORDED_DATE_MISMATCH,
        RECORDED_DATE_UNLIKELY
    )
    THEN category = "Date with issues"
ELSEIF eventDate != ""
    THEN category = "Valuable date (all in ISO8601)"
ELSE
    category = "Date not provided" /* This is just incorrect! See issue #27 */
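Here is a Python sketch of this issues-only process. It shows why the last branch fails. The function only sees the interpreted eventDate and the issue flags. A record whose date GBIF blanked without raising an issue (#27) therefore looks exactly like a record that never had a date. `issues_only_category` is a hypothetical name, and the `;` separator in the issue column is an assumption.

```python
DATE_ISSUES = {
    "RECORDED_DATE_INVALID",
    "RECORDED_DATE_MISMATCH",
    "RECORDED_DATE_UNLIKELY",
}


def issues_only_category(event_date, issue_field):
    """Categorize using only the interpreted eventDate and GBIF issue flags."""
    issues = {flag.strip() for flag in issue_field.split(";") if flag.strip()}
    if issues & DATE_ISSUES:
        return "Date with issues"
    if event_date.strip():
        return "Valuable date (all in ISO8601)"
    # Wrong whenever GBIF blanked eventDate without raising an issue (#27):
    # the original value only survives in verbatim.txt, which is never consulted.
    return "Date not provided"
```

Calling `issues_only_category("", "")` returns "Date not provided" no matter what verbatim.txt contains for that record.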

@peterdesmet
Member Author

@bartaelterman, @niconoe, I need your feedback on this issue:

  1. We need to look in verbatim.txt to get a useful eventDate, as GBIF overwrites it without warning in occurrence.txt (see #27, "eventDate can be set blank with no issue thrown"; we still need to confirm with them that no field in occurrence.txt keeps the original eventDate). If so, how challenging is it to loop over that file too?
  2. Do we use the Canadensys Narwhal processor to provide high-quality categories instead of the current basic ones?
  3. Or do we not tackle this issue in this POC?
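For question 1, here is one possible way to loop over verbatim.txt as well, sketched in Python. It reads only gbifID and eventDate from verbatim.txt into a dict, then streams occurrence.txt and attaches the original value to each row. This rests on several assumptions. Both files are taken to be tab-delimited and unquoted, with a header row and a shared `gbifID` column. The function names and the `verbatim_eventDate` key are hypothetical.

```python
import csv
import io

# Assumption: GBIF download files are tab-delimited, unquoted, with a header
# row, and both occurrence.txt and verbatim.txt carry a gbifID column.
DIALECT = {"delimiter": "\t", "quoting": csv.QUOTE_NONE}


def read_verbatim_event_dates(verbatim_file):
    """Map gbifID -> eventDate as originally published (from verbatim.txt)."""
    reader = csv.DictReader(verbatim_file, **DIALECT)
    return {row["gbifID"]: row.get("eventDate") or "" for row in reader}


def iter_with_verbatim_date(occurrence_file, verbatim_file):
    """Yield occurrence.txt rows with a hypothetical 'verbatim_eventDate' key added.

    Only gbifID and eventDate from verbatim.txt are kept in memory, so the
    extra cost is one pass over that file plus one dict entry per record.
    """
    verbatim_dates = read_verbatim_event_dates(verbatim_file)
    for row in csv.DictReader(occurrence_file, **DIALECT):
        row["verbatim_eventDate"] = verbatim_dates.get(row["gbifID"], "")
        yield row


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the two files; real files would be opened
    # with open(path, newline="", encoding="utf-8").
    occurrence = io.StringIO("gbifID\teventDate\tissue\n1\t\t\n")
    verbatim = io.StringIO("gbifID\teventDate\n1\t1 May 2014\n")
    for row in iter_with_verbatim_date(occurrence, verbatim):
        print(row["gbifID"], repr(row["eventDate"]), repr(row["verbatim_eventDate"]))
```

Keying on gbifID avoids assuming that both files list records in the same order. The trade-off is that one dict entry per record is held in memory.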

@peterdesmet peterdesmet modified the milestone: Coordinate quality categories Jan 19, 2015
@peterdesmet peterdesmet added this to the Beyond POC milestone Jan 19, 2015
@peterdesmet peterdesmet changed the title "Date quality categories" to "Create date quality categories" Jan 19, 2015
@peterdesmet peterdesmet modified the milestone: Beyond POC Feb 27, 2015