Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Applab Datasets] Add covid19 live dataset script #35377

Merged
merged 2 commits into from
Jun 18, 2020
Merged

Conversation

ajpal
Copy link
Contributor

@ajpal ajpal commented Jun 17, 2020

Script for COVID-19 live datasets. One is 90 days of daily data for 55 US states/territories from source. The other is 26 weeks of data for 188 countries from source. These time-windows were chosen to stay under the 5000 row limit for applab data tables.

image
image

Links

Testing story

Reviewer Checklist:

  • Tests provide adequate coverage
  • Code is well-commented
  • New features are translatable or updates will not break translations
  • Relevant documentation has been added or updated
  • User impact is well-understood and desirable
  • Pull Request is labeled appropriately
  • Follow-up work items (including potential tech debt) are tracked and linked

@ajpal ajpal requested a review from sureshc June 17, 2020 21:47
end
end
return records, columns
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: Some of this looks duplicated from states. Does it make sense to factor any of it out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to keep them separate for clarity. The data are similar, but not the same, so I think it would end up being fairly confusing if we tried to re-use the code.

@@ -6,7 +6,8 @@ class DatasetsController < ApplicationController
before_action :initialize_firebase
authorize_resource class: false

LIVE_DATASETS = ['Daily Weather', 'Top 200 USA', 'Top 200 Worldwide', 'Viral 50 USA', 'Viral 50 Worldwide']
LIVE_DATASETS = ['Daily Weather', 'Top 200 USA', 'Top 200 Worldwide', 'Viral 50 USA', 'Viral 50 Worldwide',
'COVID-19 Cases per US State', 'COVID-19 Cases per Country']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: Should these all be on their own line? (sorry for the style comment... don't know what our linter's rules are for ruby)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This passes the linter, so I think it's fine

Copy link
Contributor

@jmkulwik jmkulwik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding these!


def get_us_covid19_data
response = Net::HTTP.get_response(URI(US_DATA_URL))
return unless response.code == '200'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we log to honeybadger if this fails?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(you don't need to do any of this now, but we may want to log failures like this for all live datasets at some point)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good point we definitely should do some error handling here. https://codedotorg.atlassian.net/browse/STAR-1161 to track

response = Net::HTTP.get_response(URI(US_DATA_URL))
return unless response.code == '200'
response.body.force_encoding('utf-8')
csv = CSV.new(response.body)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[this breaks the convention for past live dataset scripts, but might be useful to call out anyways]

we can parse this CSV to a CSV::Table, which means we can reference row values by their column name, rather than their index: https://ruby-doc.org/stdlib-2.6.1/libdoc/csv/rdoc/CSV.html#class-CSV-label-CSV+with+headers

data = CSV.parse(response.body, headers: true)
data.first.field(:date)
  => '2020-01-21'

Comment on lines 35 to 37
if daily_data[formatted_date].nil?
daily_data[formatted_date] = []
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ruby shortcut: daily_data[formatted_date] ||= []

@ajpal ajpal merged commit 94360e9 into staging Jun 18, 2020
@ajpal ajpal deleted the jun16-covid-data branch June 18, 2020 01:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants