Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make endpoints fetch all dates by default #234

Merged
merged 17 commits into from
Jan 18, 2024
Merged

Conversation

nmdefries
Copy link
Contributor

@nmdefries nmdefries commented Jan 5, 2024

Partially addresses #212.

Fetch all dates (time values) by default, using "*". Also support user-provided "*" with addition of helper function to convert "*" into a long epirange.

Given confusing naming of date params for pub_covid_hosp_facility, where collection_weeks actually requires YYYYMMDD format, let users provide weeks (with a warning).

Note: pub_wiki and pvt_twitter aren't updated here. They both take either days or epiweeks (mutually exclusive), so I'm proposing an interface change first in #236.

rename wildcard conversion type to "week"
`pub_covid_hosp_facility` says and implies it takes weeks for the arg `collection_weeks`, but it actually only takes days (corresponding to weeks)
and will error if given weeks. Given the misleading name and behavior,
and differing behavior compared to other endpoints with weekly data,
allow users to supply weeks -- these will be converted to dates.
refactor reformat_epirange; add date_to_epiweek helper fn

better collection_weeks message
@nmdefries nmdefries marked this pull request as ready for review January 10, 2024 16:55
Copy link

pullflow-com bot commented Jan 10, 2024

nmdefries (GitHub user) ▸ 🔘 Marked ready for review

Copy link

pullflow-com bot commented Jan 10, 2024

nmdefries (GitHub user) ▸ 🔍 Review requested from dshemetov, dsweber2 and brookslogan

2 similar comments
Copy link

pullflow-com bot commented Jan 10, 2024

nmdefries (GitHub user) ▸ 🔍 Review requested from dshemetov, dsweber2 and brookslogan

Copy link

pullflow-com bot commented Jan 10, 2024

nmdefries (GitHub user) ▸ 🔍 Review requested from dshemetov, dsweber2 and brookslogan

Copy link
Contributor

@dshemetov dshemetov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High level skim, looks good to me. Might be good to add a changelog line now so you don't have to later.

@melange396
Copy link
Contributor

be careful with this; two months from now, if just 2 county-level daily signals that started January 2020 are requested simultaneously, they could exceed the row limit: 2*3295*(4*365+2*30) > 1e7. it looks like the user will be notified if the results are truncated, but it could still be a waste of time and cycles.

@dshemetov
Copy link
Contributor

Good point. Maybe add a warning about county=* queries?

@nmdefries
Copy link
Contributor Author

Thank for the information, george. Good to know we have a truncation warning already set up.

I believe only covidcast supports requesting all regions via the "*" wildcard (I'm can't check the pvt_ endpoints). pvt_twitter only reports down to the state level as the smallest region. pub_wiki doesn't take regions at all, but also doesn't take a wildcard for article names. So for these two endpoints, hitting the row limit seems hard or impossible.

Ideally, these endpoints would support "*" for geos, but that will involve a lot more work. I'll keep the row limit in mind if/when we get to that!

@nmdefries nmdefries merged commit d39bd25 into dev Jan 18, 2024
@nmdefries nmdefries deleted the ndefries/endpt-defaults branch January 18, 2024 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants