Add query string support for /v1/jh/daily-reports endpoint #42

Merged: 2 commits, merged on Apr 5, 2020

andreagrandi (Owner)

This PR adds the following query string parameters to the /v1/jh/daily-reports endpoint:

  • last_update_from: a date parameter (e.g. 2020-03-21), used to select all JHDailyReport records ...WHERE last_update >= '2020-03-21'
  • last_update_to: a date parameter (e.g. 2020-04-02), used to select all JHDailyReport records ...WHERE last_update <= '2020-04-02'
  • country: a str parameter (e.g. US), used to select all JHDailyReport records ...WHERE country_region = 'US'
  • province: a str parameter (e.g. Los Angeles, CA), used to select all JHDailyReport records ...WHERE province_state = 'Los Angeles, CA'
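Each parameter above maps to one optional WHERE clause, combined with AND. A minimal sketch of that mapping using the standard-library sqlite3 module, assuming the province filter targets province_state (the field name mentioned later in this thread). The table name jh_daily_report, the confirmed column, and the sample rows are hypothetical; the project's handler builds its query with query.filter(...) calls instead, so this only illustrates the semantics.

```python
import sqlite3
from datetime import date
from typing import List, Optional, Tuple


def build_daily_reports_query(
    last_update_from: Optional[date] = None,
    last_update_to: Optional[date] = None,
    country: Optional[str] = None,
    province: Optional[str] = None,
) -> Tuple[str, List[str]]:
    """Build a parameterized SELECT for the optional filters.

    Each WHERE clause is added only when its parameter is supplied, and
    values are bound with ? placeholders rather than string formatting.
    """
    clauses: List[str] = []
    params: List[str] = []
    if last_update_from:
        clauses.append("last_update >= ?")
        params.append(last_update_from.isoformat())
    if last_update_to:
        clauses.append("last_update <= ?")
        params.append(last_update_to.isoformat())
    if country:
        clauses.append("country_region = ?")
        params.append(country)
    if province:
        clauses.append("province_state = ?")
        params.append(province)
    sql = "SELECT * FROM jh_daily_report"  # hypothetical table name
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def demo() -> list:
    """Run the query against a tiny in-memory table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jh_daily_report "
        "(last_update TEXT, country_region TEXT, province_state TEXT, confirmed INTEGER)"
    )
    conn.executemany(
        "INSERT INTO jh_daily_report VALUES (?, ?, ?, ?)",
        [
            ("2020-03-20", "US", "Los Angeles, CA", 1),
            ("2020-03-25", "US", "Los Angeles, CA", 2),
            ("2020-03-25", "Italy", "", 3),
            ("2020-04-03", "US", "Los Angeles, CA", 4),
        ],
    )
    sql, params = build_daily_reports_query(
        date(2020, 3, 21), date(2020, 4, 2), "US", "Los Angeles, CA"
    )
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

Only the 2020-03-25 Los Angeles row survives all four filters. One thing to keep in mind: if last_update also stores a time of day, an inclusive upper bound on a bare date (<= '2020-04-02') excludes rows from later on that same day.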

Query example: /v1/jh/daily-reports?last_update_from=2020-03-21&last_update_to=2020-04-02&country=US&province=Los%20Angeles%2C%20CA
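On the server side these parameters arrive as strings: a literal space isn't valid in a URL, so a client sends the province as Los%20Angeles%2C%20CA (or with + for spaces), and the dates still need parsing. A standard-library sketch of that decoding step; the function name and the returned dict shape are hypothetical, and a web framework would normally do this decoding and validation for you.

```python
from datetime import date
from typing import Dict, Union
from urllib.parse import parse_qs, urlsplit

DATE_PARAMS = ("last_update_from", "last_update_to")  # parsed as YYYY-MM-DD
STR_PARAMS = ("country", "province")  # kept as decoded strings


def parse_daily_reports_query(url: str) -> Dict[str, Union[date, str]]:
    """Extract the supported parameters from a request URL.

    parse_qs percent-decodes values (so %20 becomes a space and %2C a comma);
    unknown parameters are ignored, and only the first value of a repeated
    parameter is used. date.fromisoformat raises ValueError on bad dates.
    """
    raw = parse_qs(urlsplit(url).query)
    parsed: Dict[str, Union[date, str]] = {}
    for name in DATE_PARAMS:
        if name in raw:
            parsed[name] = date.fromisoformat(raw[name][0])
    for name in STR_PARAMS:
        if name in raw:
            parsed[name] = raw[name][0]
    return parsed


if __name__ == "__main__":
    example = (
        "/v1/jh/daily-reports?last_update_from=2020-03-21&last_update_to=2020-04-02"
        "&country=US&province=Los%20Angeles%2C%20CA"
    )
    print(parse_daily_reports_query(example))
```

Rejecting a malformed date with an error, rather than silently ignoring it, avoids returning an unfiltered result set when a client mistypes a parameter.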

andreagrandi added the enhancement (New feature or request) and Ready to be reviewed (When a pull request is ready to be reviewed) labels on Apr 4, 2020
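```python
# Excerpt from the endpoint handler under review; each filter is
# applied only when its query string parameter was supplied.
if last_update_to:
    query = query.filter(models.JHDailyReport.last_update <= last_update_to)
if country:
    query = query.filter(models.JHDailyReport.country_region == country)
```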
Collaborator

This parameter could be interpreted in two ways:

  • Give me data for this country as a whole
  • Give me data for regions within this country

I noticed when I was writing the import script that some data has become more granular over time, so it would be possible for a response to return data covering a mix of geographical areas.

In the case of the US, you would get county-level data (see CSSEGISandData/COVID-19#1250).

I also noticed that "UK" originally meant the UK, but this later changed to a country/region of "United Kingdom", which has an entry for the UK itself but also includes rows for overseas territories. See for example https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_daily_reports/03-30-2020.csv

This seems like it would be quite difficult to consume and we should consider doing some more data cleaning in the import. I can't think of any case where it would be useful to query the UK and get data for Bermuda.
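The two readings of country listed above can be served by separate filters. A minimal sketch, assuming (hypothetically) that a row with an empty province_state holds the country-level figure and rows with a province are sub-regions or territories; the Row type and all numbers are made up for illustration.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    country_region: str
    province_state: str  # empty string when the row covers the whole country
    confirmed: int


# Made-up sample rows, loosely shaped like a JH daily report file.
SAMPLE: List[Row] = [
    Row("United Kingdom", "", 100),
    Row("United Kingdom", "Bermuda", 5),
    Row("United Kingdom", "Channel Islands", 7),
    Row("Italy", "", 200),
]


def country_as_a_whole(rows: List[Row], country: str) -> List[Row]:
    """Reading 1: only country-level rows (no province/state)."""
    return [r for r in rows if r.country_region == country and not r.province_state]


def regions_within(rows: List[Row], country: str) -> List[Row]:
    """Reading 2: only sub-national rows (states, counties, territories)."""
    return [r for r in rows if r.country_region == country and r.province_state]


if __name__ == "__main__":
    print([r.confirmed for r in country_as_a_whole(SAMPLE, "United Kingdom")])
    print([r.province_state for r in regions_within(SAMPLE, "United Kingdom")])
```

A plain country_region == "United Kingdom" filter returns all three UK rows, Bermuda included. The first reading breaks down when a country only has county-level rows (as with the US): it returns nothing, and a national total would have to be aggregated during import, which is the kind of cleaning suggested here.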

We could try to map these to standard country codes, e.g. ISO 3166. JHU have published a mapping we can use here: CSSEGISandData/COVID-19#1791. I don't think this will be perfect, because we have data points like the Channel Islands, which technically cover multiple ISO 3166 country codes, but it will be better than relying on identifiers that change over time.
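A sketch of what such a mapping could look like at import time. The ISO_LOOKUP table is a small hand-written, hypothetical excerpt standing in for the JHU table referenced above (its actual format isn't assumed here), and the "UK" key is an assumed spelling from older daily files. Values are tuples because an entry like the Channel Islands spans more than one ISO 3166-1 alpha-2 code (Guernsey GG, Jersey JE).

```python
from typing import Dict, Optional, Tuple

# Hypothetical excerpt: (country_region, province_state) -> ISO 3166-1 alpha-2 codes.
ISO_LOOKUP: Dict[Tuple[str, str], Tuple[str, ...]] = {
    ("UK", ""): ("GB",),              # assumed spelling in older daily files
    ("United Kingdom", ""): ("GB",),  # later spelling of the same country
    ("United Kingdom", "Bermuda"): ("BM",),
    ("United Kingdom", "Channel Islands"): ("GG", "JE"),
    ("US", ""): ("US",),
    ("Italy", ""): ("IT",),
}


def iso_codes(country_region: str, province_state: Optional[str]) -> Tuple[str, ...]:
    """Return the ISO codes for a JH row, or an empty tuple if it is unmapped."""
    key = (country_region.strip(), (province_state or "").strip())
    return ISO_LOOKUP.get(key, ())


if __name__ == "__main__":
    for row in [("UK", ""), ("United Kingdom", None), ("United Kingdom", "Channel Islands")]:
        print(row, "->", iso_codes(*row))
```

Both spellings of the UK resolve to the same code, so a filter on the code stays stable even when the source identifiers change. Returning an empty tuple for unmapped rows (for example US counties) lets the import script flag them instead of guessing.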

andreagrandi (Owner, Author)

Should I leave the parameter name consistent with the field name?
Looking at the data, "country_region" has values like US, Italy, France, etc., while "province_state" has names of US states (New York, California, Texas, etc.).

Unfortunately JH CSSE is proving to be a very poor source of data, but there isn't much we can do about it. I hope we can support country-specific data sources as soon as possible.

I know that the UK has released something similar to the Italian "Protezione Civile" data. I think we should concentrate our efforts on those data sources, because JH CSSE is only good for a very general "picture" of the various countries.

Collaborator

Perhaps for this PR, we could leave it consistent with the field name, and then we can create another issue/PR to link to ISO or WHO country codes. That way we can add in extra fields/filters that are more reliable later on.

I think JH CSSE are doing their best (that table they released is definitely better than nothing), but a lot of the problems come from individual countries' reporting, and they are passing those problems along. This has definitely made me appreciate the value of a strict, well-managed schema, though, and I think our API can help with that; it's just time-consuming and fiddly.

Regarding the UK data, ONS publish a dataset of registered deaths we could use. That data should be reliable, but I guess it will lag behind reality a bit.

Public Health England publish case information, which I haven't looked at yet, but I assume it's where the JH CSSE counts for the UK are coming from.

Besides active cases, recoveries and deaths we could also look at leading indicators like proactive testing, provision of PPE/ventilators, or public transport usage, but I don't know how much of that is available as open data in the UK.

Should we create issues for some of this?

Collaborator

Btw, I thought this was an interesting article about some of the problems with official counts and which metrics are reliable/predictive: https://pandemic.substack.com/p/the-2-coronavirus-metrics-you-should

andreagrandi (Owner, Author)

I've read the article. It starts off right but misses the point entirely: the most useful metric at the moment is the number of new infections divided by the number of new tests (it's epidemiologists saying this, not my own conclusion).

It would be pointless to read "today there are +3000 new positives" without knowing how many tests were done. The Italian Protezione Civile data includes these numbers, for example.
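To make the arithmetic concrete: the same +3000 new positives reads very differently depending on how many tests were run. A tiny sketch with made-up test counts; the function names are illustrative, and daily_new is only needed if a source publishes cumulative totals rather than daily figures.

```python
from typing import List


def positivity_rate(new_positives: int, new_tests: int) -> float:
    """Share of new tests that came back positive (new positives / new tests)."""
    if new_tests <= 0:
        raise ValueError("new_tests must be positive")
    return new_positives / new_tests


def daily_new(cumulative: List[int]) -> List[int]:
    """Turn a series of cumulative totals into day-over-day increments."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


if __name__ == "__main__":
    # Hypothetical figures: the same +3000 positives on two different days.
    for tests in (10_000, 50_000):
        print(f"+3000 positives out of {tests} tests -> {positivity_rate(3000, tests):.0%}")
```

This prints 30% for the first day and 6% for the second: identical headline numbers, very different situations.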

P.S. Going back to the PR, what should we do here? I'm not sure if (or what) I should fix 🤔

Thanks!

Collaborator

Ah, sorry. Let's merge it as is for now and do some work on country codes later.

andreagrandi (Owner, Author)

Thanks. I will update the README a little after merging this, to add a couple of examples and update the status of the project. Also, the live API isn't linked anywhere yet, but I need to double-check something before making the link public.

andreagrandi merged commit f67874c into master on Apr 5, 2020
andreagrandi deleted the add-daily-reports-query-string branch on April 5, 2020 at 10:31

2 participants