Data quality checks gap analysis #13
Comments
Hi @djtfmartin - Is the GBIF Fact or Measurement transform where they do their data quality checking, or is there another transform missing from our diagram (https://confluence.csiro.au/display/ALASD/Process+overview+and+issues)?
Thanks @RobinaSanderson. The data quality checks are in various transforms (Location, Temporal, Taxonomy).
Thanks, I see... Does it make sense to treat all the data quality checks as this one issue, or to split them into separate issues for each transform, if that is where they happen? Or do we need this task to get an overview of all the data quality work, and then individual tasks for the transforms we have to update or add to, once we know what they are?
I am wondering whether we should put a lot of effort in, given that some time ago GBIF, ALA and iDigBio undertook to implement the TDWG TG2 Core Tests. That was, admittedly, under the previous administration (John and Donald at least). And BTW, I have a table somewhere with all the tests from various agencies that I could find. I will see if I can find it. This was my starting point for the TG2 work.
Thanks @RobinaSanderson, I think separate tasks. We have tasks for Location #22 and Taxonomy #26, but we need a separate task for Temporal (EventProcessor in biocache-store). We also need tasks for the functionality handled in TypeStatusProcessor, BasisOfRecordProcessor and the other processors.
Hi @djtfmartin, I've created the following gap analysis issues: I will take a look at the link you gave above for further processes later. Sorry, I've got another piece of work to finish today.
This and #125 are related activities.
Assertion codes not used (set) in biocache-store and biocache-service
Assertion codes only used in biocache-service
Things potentially missing from pipelines.
Everything in the comment above (21 Apr) has been addressed.
Need to identify the gaps between ALA's current suite of data quality tests and GBIF's.
The follow-up to this work would be to implement any missing tests in pipelines, ideally supported in GBIF's core implementation.
A list of ALA's tests can be derived from here:
ALA Assertion codes
The GBIF pipeline equivalent is here:
GBIF Occurrence issues
and
GBIF Name usage issues
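
As a rough illustration of the comparison described above, here is a minimal Python sketch that treats the gap analysis as a set comparison between two lists of code names. The function names, the `equivalents` mapping and the `CHECK_*` names are all hypothetical placeholders, not real ALA assertion codes or GBIF issues; the real lists would come from the sources linked above, and matching by name alone will likely miss tests that are equivalent but named differently, which is what the manual mapping is for.

```python
def normalise(code: str) -> str:
    """Upper-case a code name and unify separators so names compare loosely."""
    return code.strip().upper().replace("-", "_").replace(" ", "_")


def gap_analysis(ala_codes, gbif_issues, equivalents=None):
    """Compare two lists of test names.

    equivalents is an optional, manually curated mapping from an ALA code
    name to the GBIF issue name believed to cover the same check
    (hypothetical; it has to be built by someone who knows both suites).
    """
    ala = {normalise(c) for c in ala_codes}
    gbif = {normalise(i) for i in gbif_issues}
    mapped = {normalise(a): normalise(g) for a, g in (equivalents or {}).items()}

    # An ALA code counts as covered if its own name, or its mapped
    # equivalent, appears in the GBIF list.
    covered = {c for c in ala if mapped.get(c, c) in gbif}
    gbif_used = {mapped.get(c, c) for c in covered}

    return {
        "ala_only": sorted(ala - covered),      # candidates to add to pipelines
        "gbif_only": sorted(gbif - gbif_used),  # GBIF checks with no ALA counterpart
        "shared": sorted(covered),              # ALA checks already covered
    }


if __name__ == "__main__":
    # Placeholder names only; substitute the real code lists.
    report = gap_analysis(
        ala_codes=["CHECK_A", "check-b", "CHECK_C"],
        gbif_issues=["CHECK_A", "CHECK_B_RENAMED", "CHECK_D"],
        equivalents={"CHECK_B": "CHECK_B_RENAMED"},
    )
    for bucket, names in report.items():
        print(bucket, names)
```

The `ala_only` bucket is the one that matters for this issue: each entry would be a candidate for a follow-up task against the relevant transform.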