Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fread instead of arrow to get data in get_forecaster_predictions_alt #652

Merged
merged 6 commits into from
Jul 24, 2023

Conversation

nmdefries
Copy link
Contributor

Description

Some submitted predictions have a column order that doesn't agree with the arrow schema specified in get_forecaster_predictions_alt. Due to a change in behavior in newer versions of arrow, the column order in input files and in the arrow schema must now be the same (previously were reordered). We pinned arrow to a working version as a patch, but that version is now quite old.

Incidentally, arrow is not providing much additional benefit, so we aren't tied to it. Since we want to prioritize speed, data.table::fread is a good alternative. data.table also offers opportunities for future optimization. Since fread can read from URL (downloads to a temp file), we don't need to do our own download step, which simplifies the process somewhat.

Fixes

Closes #609

Copy link
Collaborator

@dshemetov dshemetov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally and works fine with the stated edge cases (forecasters that change their column order from date to date), e.g.:

evalcast:::get_forecaster_predictions_alt("COVIDhub_CDC-ensemble", forecast_dates = a
    s.Date(c("2021-11-08", "2021-11-15"))) %>% tibble

returns consistent and correct prediction cards.

@dshemetov
Copy link
Collaborator

Consider bumping the patch version on evalcast btw and adding a note in the CHANGELOG? Non-blocking.

@nmdefries
Copy link
Contributor Author

@dshemetov Do you have merge ability? This is ready

@dshemetov dshemetov merged commit fa4bc21 into main Jul 24, 2023
4 checks passed
@dshemetov dshemetov deleted the ndefries/use-fread-instead-of-arrow branch July 24, 2023 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle both standard and non-standard column orders of CovidHub forecast files in get_covidhub_predictions_alt
2 participants