This repository has been archived by the owner on Dec 13, 2022. It is now read-only.

BC and Alberta weekend new case numbers #44

Closed
PVillette opened this issue Oct 6, 2020 · 5 comments

Comments

@PVillette

Both BC and Alberta new case numbers don't show weekend data: Saturday and Sunday numbers are merged with Monday's, leading to 0s on weekends and hyper-inflated Monday counts. Both provinces do, however, report weekend breakdowns, and the data are available via the BC CDC and the Alberta government. Is there any appetite to correct the BC and Alberta weekend numbers?

@ishaberry
Member

Hi There,

Thanks for your message. At the moment we are not planning on making this update. The numbers reported by BC CDC and Alberta Gov't use internal report dates, so we are not exactly sure on which days specifically the cases are added (these dates do not align with our variable definition of public report dates). We are in the process of setting up some data linkages; if these are successful, we hope to include this information.

We realize that Mondays are hyper-inflated and note that in our dashboard for Mondays. We also recommend that individuals using this data look at trends and 7-day averages as opposed to single-day counts to better understand current trajectories.
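
As a rough illustration of the 7-day average recommendation, here is a minimal Python sketch. The function name `trailing_mean` and both series are hypothetical and made up for illustration; they are not taken from the dataset. It compares a flat series in which weekend cases are lumped into Monday with the same cases spread over the days they occurred.

```python
def trailing_mean(counts, window=7):
    """Mean over each full trailing window; None until `window` days exist."""
    out = []
    for i in range(len(counts)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(counts[i + 1 - window : i + 1]) / window)
    return out


# Hypothetical two weeks, Monday to Sunday. Weekend cases are reported
# on the following Monday, so Saturday and Sunday show 0.
lumped = [300, 100, 100, 100, 100, 0, 0, 300, 100, 100, 100, 100, 0, 0]

# The same underlying cases placed on the days they occurred.
spread = [100] * 14

print(trailing_mean(lumped)[6:])
print(trailing_mean(spread)[6:])
```

In this flat example every 7-day window contains exactly one Monday, so the two averages agree. When case counts are rising or falling, windows that end on a weekend can still differ, because the lumped series attributes weekend cases to a later day.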

Hope that helps!

@jeanpaulrsoucy
Member

Using the dates from the BCCDC CSV would result in distorted real-time data as well.

Take this Monday's dataset: https://github.com/jeanpaulrsoucy/covid-19-canada-gov-data/blob/master/bc/case-data/BCCDC_COVID19_Dashboard_Case_Details_2020-10-05_23-02.csv

BC reported 358 cases on Monday. But only 11 of them have the date of 2020-10-05 (Monday), with the rest having dates earlier than this (mainly the weekend). This is not unique to Mondays - this happens every single day. I believe there have even been days when every single case added is given a date earlier than the current date.

Of course, this distortion is corrected over time as cases are "backfilled"...but using the dates from the BCCDC CSV (internal reporting dates) will result in constant real-time distortion of the most recent dates, mainly the current date.
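
The backfilling effect can be checked by comparing two consecutive daily snapshots of the official line list. The sketch below is only illustrative: it assumes the CSV has one row per case and a date column named `Reported_Date` (an assumption about the BCCDC file layout), and the tiny inline snapshots are invented.

```python
import csv
import io
from collections import Counter


def counts_by_date(csv_text, date_col="Reported_Date"):
    """Count rows (cases) per internal reporting date."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[date_col] for row in reader)


def change_by_internal_date(old_csv, new_csv):
    """Per-date change in case counts between two snapshots.

    Positive values are cases added to that date; negative values are
    cases removed or moved away from it.
    """
    old = counts_by_date(old_csv)
    new = counts_by_date(new_csv)
    dates = sorted(set(old) | set(new))
    return {d: new[d] - old[d] for d in dates if new[d] != old[d]}


# Invented snapshots: yesterday's file and today's file (a Monday).
OLD = "Reported_Date,HA\n2020-10-02,A\n2020-10-02,B\n"
NEW = (
    "Reported_Date,HA\n"
    "2020-10-02,A\n2020-10-02,B\n"
    "2020-10-03,A\n2020-10-03,B\n"
    "2020-10-04,A\n"
    "2020-10-05,B\n"
)

delta = change_by_internal_date(OLD, NEW)
print(delta)
print("publicly reported today:", sum(delta.values()))
```

Here four cases are added in the new snapshot, but only one carries the current date; the other three are assigned to earlier days.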

The date a case was reported to the public is the only date variable that is consistent across all provinces.

Our ongoing data linkage project should allow us to provide an alternate date column for cases in certain provinces (ON, BC, AB). Stay tuned for updates on this.

I have also considered including the provincial CSVs as alternate datasets in the repository. It may be possible to further integrate this into our API by providing an option to use our dataset (report date) or the official date (internal report date) when returning the time series. This would give the best of both worlds and is technologically feasible.

Let me know if you have any further questions.

@gauss256

gauss256 commented Oct 7, 2020

Suggestion: Have a column for crowd-sourced numbers. All it would take is one person to enter the weekend numbers reported on Monday. The repo is a great resource and it would be helpful not to have these weekend discontinuities. Even with 7-day averaging they distort the curves.

@jeanpaulrsoucy
Member

@gauss256 I don't think that would work with our process, unfortunately. All of our time series are completely re-generated daily from cases.csv and mortality.csv (using update_data.R: https://github.com/ishaberry/Covid19Canada/blob/master/scripts/update_data.R). BC frequently edits old cases - removing/moving them - which we take into account by comparing old and new datasets. These changes are then reflected in our updated time series. At the moment, we have no way to feed crowd-sourced data into the private Google Sheets that are used to generate cases.csv and mortality.csv each day. The good news is that our data linkage script (prototype for ON here: https://github.com/ishaberry/ON_data_link) should mainly solve this problem, and be robust to changes in historical data, since it can be periodically re-run on the entire dataset. We just need to adapt it for BC, which is on my to-do list.

Perhaps a stop-gap alternative would be writing a script that pulls in the official BC dataset each day and formats it like our dataset - creating a drop-in alternative for our BC dataset.
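
A minimal sketch of what such a stop-gap script might look like, in Python. Everything specific here is an assumption made for illustration: the `Reported_Date` input column, the output column names (`province`, `date_report`, `cases`, `cumulative_cases`), and the DD-MM-YYYY output date format. Downloading the file is left out; the function takes the CSV text directly.

```python
import csv
import io
from collections import Counter
from datetime import date, timedelta


def line_list_to_timeseries(csv_text, province="BC", date_col="Reported_Date"):
    """Turn a one-row-per-case CSV into a gap-free daily time series."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(date.fromisoformat(row[date_col]) for row in reader)
    if not counts:
        return []

    rows = []
    cumulative = 0
    day, last = min(counts), max(counts)
    while day <= last:
        cumulative += counts[day]  # a Counter returns 0 for days with no cases
        rows.append(
            {
                "province": province,
                "date_report": day.strftime("%d-%m-%Y"),  # assumed output format
                "cases": counts[day],
                "cumulative_cases": cumulative,
            }
        )
        day += timedelta(days=1)
    return rows


# Invented sample input with a one-day gap on 2020-10-04.
SAMPLE = "Reported_Date,HA\n2020-10-03,A\n2020-10-03,B\n2020-10-05,A\n"

for row in line_list_to_timeseries(SAMPLE):
    print(row)
```

Because the series is rebuilt from the full file on each run, later corrections to old cases are picked up automatically. The most recent dates will still be incomplete until the province backfills them.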

@gauss256

gauss256 commented Oct 7, 2020

> Perhaps a stop-gap alternative would be writing a script that pulls in the official BC dataset each day and formats it like our dataset - creating a drop-in alternative for our BC dataset.

That's approximately what I'm doing now.

Anyway, thanks for all your good work on this. Looking forward to data linkage for BC.
