-
Notifications
You must be signed in to change notification settings - Fork 481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Applab Datasets] Add covid19 live dataset script #35377
Conversation
end | ||
end | ||
return records, columns | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: Some of this looks duplicated from states. Does it make sense to factor any of it out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to keep them separate for clarity. The data are similar, but not the same, so I think it would end up being fairly confusing if we tried to re-use the code.
@@ -6,7 +6,8 @@ class DatasetsController < ApplicationController | |||
before_action :initialize_firebase | |||
authorize_resource class: false | |||
|
|||
LIVE_DATASETS = ['Daily Weather', 'Top 200 USA', 'Top 200 Worldwide', 'Viral 50 USA', 'Viral 50 Worldwide'] | |||
LIVE_DATASETS = ['Daily Weather', 'Top 200 USA', 'Top 200 Worldwide', 'Viral 50 USA', 'Viral 50 Worldwide', | |||
'COVID-19 Cases per US State', 'COVID-19 Cases per Country'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: Should these all be on their own line? (sorry for the style comment... don't know what our linter's rules are for ruby)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This passes the linter, so I think it's fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding these!
|
||
def get_us_covid19_data | ||
response = Net::HTTP.get_response(URI(US_DATA_URL)) | ||
return unless response.code == '200' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we log to honeybadger if this fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(you don't need to do any of this now, but we may want to log failures like this for all live datasets at some point)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good point we definitely should do some error handling here. https://codedotorg.atlassian.net/browse/STAR-1161 to track
bin/cron/applab_datasets/covid19
Outdated
response = Net::HTTP.get_response(URI(US_DATA_URL)) | ||
return unless response.code == '200' | ||
response.body.force_encoding('utf-8') | ||
csv = CSV.new(response.body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[this breaks the convention for past live dataset scripts, but might be useful to call out anyways]
we can parse this CSV to a CSV::Table
, which means we can reference row values by their column name, rather than their index: https://ruby-doc.org/stdlib-2.6.1/libdoc/csv/rdoc/CSV.html#class-CSV-label-CSV+with+headers
data = CSV.parse(response.body, headers: true)
data.first.field(:date)
=> '2020-01-21'
bin/cron/applab_datasets/covid19
Outdated
if daily_data[formatted_date].nil? | ||
daily_data[formatted_date] = [] | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ruby shortcut: daily_data[formatted_date] ||= []
Script for COVID-19 live datasets. One is 90 days of daily data for 55 US states/territories from source. The other is 26 weeks of data for 188 countries from source. These time-windows were chosen to stay under the 5000 row limit for applab data tables.
Links
Testing story
Reviewer Checklist: