Include Scrutinizer and ScrutinizerDate in COLDP exports #3464
Comments
@dhobern, we already have 2 ways to see these stats in TW. In the Filter Nomenclature, you can search by updated date range and by the person who made the change. You can add the family name with descendants to the search parameters and get the stats for a single family. We also have a specialized task called Project Activity, where you can track, for each user of the project, how many records were created/updated (not just in taxonomy, but in any TW model), how much time was spent, the number of records edited per hour, etc. Please try it and let us know if this satisfies your needs.
Thanks @proceps - yes, I realise I can do this, but I really want to start automating various metrics for datasets in ChecklistBank, and especially those that are components of COL, so I'd like to have the necessary information exposed there. Adding these two fields to the export felt like it should be a small/quick matter.
@dhobern We have a "Verifier" role that would add more explicit semantics than the Housekeeping created/updated. If we created the ability to batch-add this role/metadata via the TaxonName filter (and see its individual use in the radial annotator), would that be a better way to record this data? I'm hesitant to overload the semantics of housekeeping updated/created by linking them to things like Scrutinizer.
Hmmm - I fixed around 2000 names over the weekend (combinations included) and would like to be able to develop a traffic-light-based map for the whole insect order to understand how recently each name was touched by someone - without seriously slowing myself down. The Created/Updated housekeeping elements are really useful for anyone using TW to clean up dirty data, and I don't have time to edit yet another radial link each time I'm working my way through tidying a name. So I'm not sure the Verified flag would fit what I need (although I guess it would be nice to have such a flag I could use at the level of subgenera and up so I could easily mark sections that I believe are currently complete). The scrutinizer and scrutinizerDate fields in COLDP are really mainly used as a modifiedBy/modifiedAt pair, so what you already have seems a perfect fit to me.
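The traffic-light idea can be sketched as a function that buckets a last-modified date by its age. This is a minimal sketch, assuming dates arrive as ISO strings (for example from the `scrutinizerDate` column requested in this issue); the function name `recency_colour` and the one-year and five-year cut-offs are hypothetical placeholders, not anything proposed in the thread.

```python
from datetime import date
from typing import Optional

# Hypothetical thresholds in days; the thread does not specify any.
GREEN_MAX_AGE_DAYS = 365
AMBER_MAX_AGE_DAYS = 5 * 365


def recency_colour(modified: Optional[str], today: date) -> str:
    """Map an ISO 'YYYY-MM-DD[...]' modification date to a traffic-light colour."""
    if not modified:
        return "red"  # no timestamp exported: treat as never touched
    # Keep only the date part so full timestamps like '2024-05-01T10:00:00' also work.
    age_days = (today - date.fromisoformat(modified[:10])).days
    if age_days <= GREEN_MAX_AGE_DAYS:
        return "green"
    if age_days <= AMBER_MAX_AGE_DAYS:
        return "amber"
    return "red"
```

For a map, each name's colour could then be aggregated per family (for instance as the share of green names), but how to summarize is a separate design choice.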
@dhobern Curious - would an internal report meet your needs, or does ChecklistBank have some capabilities to do this? I.e., it seems this is going to be of use to others as well. If you have a sample plot (napkin sketch) and/or table, please share. We can make a generalized report that loops over valid family names pretty easily.
Thanks @mjy - I couldn't generate my report internally in TaxonWorks - I need to integrate with the datasets that COL uses to replace some of the worst sections in GLI. Right now, this would be external to ChecklistBank, but COL wants to increase the internal metrics for all datasets there, so I would expect some components to migrate into the CLB functionality. I suppose my general thought is that this is useful contextual information for many analyses and presentations of the exported data, and it seems that it should be part of what a user gets when they download from TW.
I would suggest COL needs to implement Housekeeping concepts. Too often something like this gets done, and it gets sloppy. This seems particularly important when we are trying to give attribution to people for what they actually did (we can't promise every updater is a scrutinizer), and when we try to record proper metadata provenance trails. I don't mind adding this, but it's going to have big stars all over it. We literally just introduced the Georeferencer role to deal with this exact problem (we were attributing Georeferencer to people who added data, not people who did the Georeferencing).
Thanks @mjy - let me discuss with Markus - maybe we can add optional timestamp elements to all COLDP tables and use those instead. |
Adding modified/modifiedBy to ColDP for all records makes sense to me and is already part of the database model anyhow. I am having more problems with the scrutinizer property, which has existed since the start of COL. There hardly seem to be sources out there that track this concept, and most often you find the housekeeping modifiedBy being used for it instead.
ColDP doesn't mind extra columns, so you could already include 2 new columns (I would probably call them
@mdoering @dhobern Sounding better, now we have to do one better and leapfrog the oldness. We need to be able to include ORCiD or Wikidata or other global identifiers as pointers to the Person/People in question. I believe TDWG is moving forward with something like
It is standard, but I rarely see any use for the
I do like created from a time perspective; lag/latency means a lot. We have found it useful in filtering results as well when we are trying to track down provenance-related issues. In my experience, classic ontology-related responses include both the ID and a human-readable label, even if redundant. If we just pass an ORCiD, then you're going to have to look up names if you add any functionality on top of the dataset; if you're not planning to do any of that, then one field should be fine, I think.
I will look up ORCIDs anyway, and I also have to find a way to manage users similar to how we already track local CLB users. I would actually love to only ever see ORCIDs or other resolvable identifiers instead of usernames like
Perfect. We'll send ORCiDs or names in that field then. |
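Since the field may carry either an ORCiD or a plain name, a consumer such as a ChecklistBank pipeline would need to tell the two apart. This is a minimal sketch, assuming ORCiDs appear either bare (`0000-0002-1825-0097`) or as an `https://orcid.org/` URL; the thread does not say which form TaxonWorks emits, and `classify_person` is a hypothetical name. The check character follows the ISO 7064 MOD 11-2 scheme that ORCID iDs use, so a mistyped iD falls back to being treated as a name.

```python
import re
from typing import Tuple

# Bare ORCID iD, optionally prefixed by the orcid.org URL (both forms are an assumption).
_ORCID_RE = re.compile(r"^(?:https?://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def classify_person(value: str) -> Tuple[str, str]:
    """Return ("orcid", iD) for a valid ORCID iD, otherwise ("name", value)."""
    value = value.strip()
    match = _ORCID_RE.match(value)
    if match:
        orcid = match.group(1)
        digits = orcid.replace("-", "")
        # Only accept the iD if its final character matches the computed checksum.
        if orcid_check_digit(digits[:15]) == digits[15]:
            return ("orcid", orcid)
    return ("name", value)
```

Anything classified as `"orcid"` can then be resolved to a display name, which covers the "ID plus human-readable label" point without needing a second column.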
Great - thank you both so much |
@mjy - Thanks. When will we see this change in the site? I just created a small COLDP export for a genus and it seems not yet to be included. |
In general, once the most recent CHANGELOG lists these changes under a release number, they will be live. We're hoping to have it live this week or early next.
Thanks @mjy |
Feature or enhancement
I need to start deriving metrics for progress in cleaning up the Global Lepidoptera Index. One of these relates to how many taxon records have been modified for each family each year. I would like to be able to assess this quickly as part of processing the COLDP exports, but there is no timestamp information in the export. The best way to do this would be to include your Updated By and Updated At values in the COLDP Taxon fields scrutinizer and scrutinizerDate, which are currently blank.
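Once the two fields are populated, the per-family, per-year count described above could be computed roughly like this. This is a minimal sketch, assuming a tab-separated COLDP Taxon (or NameUsage) file whose header has unprefixed `family` and `scrutinizerDate` columns, with ISO dates; the function name is hypothetical, and a real export may need its headers mapped first (for example if they carry a `col:` prefix).

```python
import csv
from collections import Counter
from typing import Iterable


def modifications_per_family_year(lines: Iterable[str]) -> Counter:
    """Count taxon records per (family, year) using the scrutinizerDate column.

    Accepts any iterable of TSV lines, e.g. a file opened with newline="".
    Rows without a usable four-digit year are skipped.
    """
    counts: Counter = Counter()
    # COLDP files are plain TSV, so disable quote handling.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        stamp = (row.get("scrutinizerDate") or "").strip()
        if len(stamp) < 4 or not stamp[:4].isdigit():
            continue  # no timestamp exported for this record
        family = (row.get("family") or "").strip() or "(no family)"
        counts[(family, int(stamp[:4]))] += 1
    return counts
```

Usage might look like `with open("Taxon.tsv", newline="", encoding="utf-8") as fh: counts = modifications_per_family_year(fh)`, where the filename is illustrative. Records without a date are skipped rather than counted, so the totals reflect only timestamped records.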
Location
Catalogue of Life (CoLDP) exports task
Screenshot, napkin sketch of interface, or conceptual description
No response
Your role
Data curator / biodiversity informatician