-
Notifications
You must be signed in to change notification settings - Fork 0
As a data manager, I want to see alerts when the data does not pass validation checks so that I can fix validation problems quickly. #72
Comments
Addressed in Princeton-CDH/pemm-data#6 |
@elambrinaki This one is tricky to test. Don't feel limited by the checklist I've provided below. Currently test messages are going to the
|
@kmcelwee To check the format of |
@elambrinaki I proposed the change to 20 on a branch, but just waiting on approval from @rlskoeser. Rebecca, for something small like this, should I merge w/o your approval? Or is it always safe to just wait? |
@kmcelwee I apologize, I didn't see a review request. For trivial changes, I'm not even sure they need a PR. Changing a config value like this without changing any code logic doesn't require review. |
When goodtables is fed an improperly formatted JSON, it exits as if everything passed, creates the output file saying it has 0 errors, and just outputs a warning in its output json: "tables": [],
"warnings": [
"Data Package \"datapackage.json\" has a loading error \"Unable to parse JSON at \"datapackage.json\". Expecting ',' delimiter: line 596 column 21 (char 23272)\""
], Option 1
This would stop the validation from executing. But no messages would be sent to Slack unless it was a pull request, but Evgeniia will probably only do direct commits. The errors are output to the shell, and while they're viewable, you'd have to know where to look. Option 2 But I even went into goodtables's doumentation and couldn't find where or how they list their warnings. It's going to be something messy like: if error_json.get('warnings', None):
if any([warning.startswith('Data Package "datapackage.json" has a loading error') for warning in error_json['warnings']]):
slack_message = '\n'.join(error_json['warnings']) @rlskoeser thoughts? |
Wow, their warning output really isn't that great, is it. And why is that a warning and not an outright error? @kmcelwee could you revise your python script to detect nonstandard json output (absence of any validation information, I maybe) and then check the datapackage.json validity and send a slack alert with the json parsing error? (Probably any case where we don't get validation output should be reported as an error / failure) |
@kmcelwee Thank you for finding my error! |
@kmcelwee Thank you very much for setting up the goodtables reports! With their help, I corrected several errors on the Manuscript and Canonical Story sheets. For these two sheets, I confirm the For the Story Instance with 765 errors, having 20 errors shown in the report is not enough. Is it possible to increase the number of errors shown for a specific sheet only? If so, I would like to view 1,000 errors on the Story Instance sheet and 20 errors on other sheets. Alternatively, is it possible to change the order in which the errors are reported from A clarification regarding a big number of duplicate rows on the Story Instance sheet: for the mss that are currently being cataloged, I added identical rows with ms name and formulas, but without unique information (ID, folios, etc.) -- this will be done by catalogers. So duplicate rows are expected for the current stage of work. |
@elambrinaki I noticed this when I was starting to setup the validation. Two options that have occurred to me so far:
@kmcelwee have you thought of any other ways to handle this? FWIW, I prefer option 1. |
@rlskoeser @kmcelwee Is increasing the number of reported errors to 1000 a possible solution? |
@elambrinaki oh, I didn't comment on that part of your request but Kevin and I discussed it on Slack! Sorry. I don't think the number of errors reported in the build should be capped. Kevin's going to work on making that config optional and then turn it off for you, so you'll see all the errors. (FWIW, I don't want him spending additional dev time on sorting the errors.) |
@elambrinaki @rlskoeser I've just addressed ERROR_MAX in my latest commits to Princeton-CDH/pemm-data#12 ✅ |
@rlskoeser @kmcelwee Perfect! Thank you very much! |
dev notes
The text was updated successfully, but these errors were encountered: