Data inconsistency [Jurisdiction reporting quality checks] #149

Closed
Nosferican opened this issue Apr 5, 2020 · 9 comments
Labels: Data quality; stale (an issue has had at least 15 days of inactivity)

@Nosferican

I found inconsistencies such as the following. Compare these two commit comments:
(1) COVID19Tracking/covid-tracking-data@abef18f#r38300508
and
(2) COVID19Tracking/covid-tracking-data@3339930#r38300525.

Which one is correct? Should I assume the latter is the correct one?
There are two versions for IA that were checked at the same time (3/31 14:42; dateChecked 2020-03-31T18:42:00Z in the table) but have different values:

| state | positiveScore | negativeScore | negativeRegularScore | commercialScore | dateModified         | dateChecked          | SHA1                                     |
|-------|---------------|---------------|----------------------|-----------------|----------------------|----------------------|------------------------------------------|
| IA    | 1             | 1             | 1                    | 1               | 2020-03-31T04:00:00Z | 2020-03-31T18:42:00Z | abef18fdd94afaeaabe40c988a04a75c3ee8a59e |
| IA    | 1             | 1             | 0                    | 0               | 2020-03-31T04:00:00Z | 2020-03-31T18:42:00Z | 333993065f7f9391f36ce05bd4f728ce496d9dcc |

I would like to be able to resolve such inconsistencies when they show up.
If possible, could you explain what caused the incorrect data, so that I can account for or watch out for these issues?
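
For reference, a conflict like the one in the table can be flagged programmatically. The following is only a minimal sketch, assuming each backup record has already been loaded into a dict keyed by the column names shown above; `find_conflicts` and `SCORE_FIELDS` are hypothetical names, not part of the project's code or API.

```python
from collections import defaultdict

# Score columns compared between snapshots (names taken from the table above).
SCORE_FIELDS = (
    "positiveScore",
    "negativeScore",
    "negativeRegularScore",
    "commercialScore",
)


def find_conflicts(records):
    """Return {(state, dateChecked): {scores: [SHA1, ...]}} for keys whose scores disagree."""
    seen = defaultdict(dict)
    for rec in records:
        key = (rec["state"], rec["dateChecked"])
        scores = tuple(rec[field] for field in SCORE_FIELDS)
        # Remember which commits reported this exact combination of scores.
        seen[key].setdefault(scores, []).append(rec["SHA1"])
    # A key is inconsistent when more than one distinct score tuple was seen.
    return {key: variants for key, variants in seen.items() if len(variants) > 1}


# The two IA rows from the table (SHA1 shortened for readability).
records = [
    {"state": "IA", "positiveScore": 1, "negativeScore": 1,
     "negativeRegularScore": 1, "commercialScore": 1,
     "dateChecked": "2020-03-31T18:42:00Z", "SHA1": "abef18f"},
    {"state": "IA", "positiveScore": 1, "negativeScore": 1,
     "negativeRegularScore": 0, "commercialScore": 0,
     "dateChecked": "2020-03-31T18:42:00Z", "SHA1": "3339930"},
]

conflicts = find_conflicts(records)
for (state, checked), variants in conflicts.items():
    print(state, checked, variants)
```

This only reports that two snapshots disagree for the same state and dateChecked; it does not say which one is right.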

@julia326

julia326 commented Apr 5, 2020

Hi @Nosferican, the commits here reflect what our API was serving at the time of each commit; later commits reflect a later and hopefully more correct snapshot. This repository is updated roughly every 6 hours with a new snapshot. For the most current data, you can use https://covidtracking.com/api.

@Nosferican
Author

Nosferican commented Apr 5, 2020

Aye. What I don't understand is how two records can have the same dateChecked but different data.

  • Was that an issue with the HTML parser that was corrected afterwards?
  • Is that due to a bug in how the dateChecked is attached to the record?

I am mostly trying to understand what caused the API to send contradictory information. For most data that wouldn't be an issue, but the state reporting quality checks are a bit different, since the API doesn't expose their historical records and they are only accessible through the backups.

For example, if the issue was a bug in the HTML parser or something similar, then I would treat the state quality as incorrect for all observations that were affected. I could update the values based on the commit date, but that assumes the dateChecked in the backup was working properly. Otherwise, the values would not be corrected for those dates.

I am happy to open the issue on the particular repository where an action is required, for example to update the notes about which dates might be affected by the potential bug.

I can update my code to UPSERT the records, but wanted to understand what the issue is to make sure that it is a sensible solution.
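
A minimal sketch of that UPSERT, assuming the backups are loaded into a local SQLite table keyed by (state, dateChecked) and that later commits should win. The table name `grades`, the function `upsert_grades`, and the `commit_seq` column (an integer giving the order of the backup commits) are all hypothetical; the sequence numbers in the example rows are made up and do not reflect the actual order of the two commits.

```python
import sqlite3


def upsert_grades(conn, rows):
    """Insert rows, overwriting an existing (state, date_checked) only with a later commit."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS grades (
            state TEXT NOT NULL,
            date_checked TEXT NOT NULL,
            negative_regular_score INTEGER,
            commercial_score INTEGER,
            commit_seq INTEGER NOT NULL,  -- hypothetical: position of the backup commit
            PRIMARY KEY (state, date_checked)
        )
        """
    )
    # ON CONFLICT ... DO UPDATE requires SQLite 3.24 or newer.
    conn.executemany(
        """
        INSERT INTO grades VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(state, date_checked) DO UPDATE SET
            negative_regular_score = excluded.negative_regular_score,
            commercial_score = excluded.commercial_score,
            commit_seq = excluded.commit_seq
        WHERE excluded.commit_seq > grades.commit_seq
        """,
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
rows = [
    # (state, date_checked, negative_regular_score, commercial_score, commit_seq)
    ("IA", "2020-03-31T18:42:00Z", 1, 1, 1),  # commit_seq values are illustrative only
    ("IA", "2020-03-31T18:42:00Z", 0, 0, 2),
]
upsert_grades(conn, rows)
result = conn.execute("SELECT * FROM grades").fetchall()
print(result)
```

Whether "latest commit wins" is the right rule depends on what caused the discrepancy, which is the open question here.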

@julia326 julia326 transferred this issue from COVID19Tracking/covid-tracking-data Apr 5, 2020
@Nosferican
Author

Bump.

@muamichali muamichali added this to Needs Investigation in Data Issues Apr 29, 2020
@stale

stale bot commented May 14, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions!

@stale stale bot added the stale An issue has had at least 15 days of inactivity label May 14, 2020
@Nosferican
Author

Might be relevant as the grades are being recovered for the new API schema.

@stale stale bot removed the stale An issue has had at least 15 days of inactivity label May 14, 2020
@stale

stale bot commented May 29, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions!

@stale stale bot added the stale An issue has had at least 15 days of inactivity label May 29, 2020
@stale

stale bot commented Jun 8, 2020

This issue has been closed because it was stale for 15 days, and there was no further activity on it for 10 days. You can feel free to re-open it if the issue is important, and label it as "not stale."

@stale stale bot closed this as completed Jun 8, 2020
Data Issues automation moved this from Historical Data Issues - Needs Investigation to Done Jun 8, 2020
@muamichali
Contributor

Hi @Nosferican,
The stale bot has closed this issue, but if it is something that you still need resolved, please reopen it and we will look at it.
