Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dropping rows with NA's in relevant columns only #105

Merged
merged 16 commits into from
Mar 14, 2024
Merged

Conversation

dsweber2
Copy link
Contributor

@dsweber2 dsweber2 commented Feb 29, 2024

The reason our forecasters were underperforming was that when merging the chng and hhs dataset, we were removing any rows with NA's. This fixes things to only remove NA's inside a forecaster, and only for columns actually used in forecasting. For an example of the problematic data, compare

hhs_archive_alone <- tar_read("hhs_archive_data_2022") %>%
          select(geo_value, time_value, value, issue) %>%
          rename("hhs" := value) %>%
          rename(version = issue) %>%
          as_epi_archive(
            geo_type = "state",
            time_type = "day",
            compactify = TRUE
          )
max(hhs_archive_alone$as_of(as.Date("2021-11-29"))$time_value)

which is the last hhs data, while

chng_archive_alone <- tar_read("chng_archive_data_2022") %>%
          select(geo_value, time_value, value, issue) %>%
          rename("chng" := value) %>%
          rename(version = issue) %>%
          as_epi_archive(
            geo_type = "state",
            time_type = "day",
            compactify = TRUE
          )
max(chng_archive_alone$as_of(as.Date("2021-11-29"))$time_value)

which is the last chng data

chng_archive_alone$as_of(as.Date("2021-11-29")) %>% arrange(time_value) %>% tail
both <- tar_read(joined_archive_data_2022)
max(both$as_of(as.Date("2021-11-29"))$time_value)

joined matches chng:

both <- tar_read(joined_archive_data_2022)
max(both$as_of(as.Date("2021-11-29"))$time_value)

closes #93

@dsweber2
Copy link
Contributor Author

I'm going to run just the data targets in production to make sure this does the right thing too.

Copy link
Contributor

@dshemetov dshemetov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me at a high level skim.

R/data_validation.R Show resolved Hide resolved
@dsweber2
Copy link
Contributor Author

so after renv::install("targets@1.3.2", "tarchetypes@0.7.9"); renv::snapshot(), this does seem to be working.

I also added the necessary changes to get DEBUG_MODE=true to actually get browser() working. Doesn't seem to be a way around having to run tar_make(name, callr_function=NULL).

R/forecaster_flatline.R Outdated Show resolved Hide resolved
@dsweber2
Copy link
Contributor Author

dsweber2 commented Mar 8, 2024

We definitely need to figure out a better way to deal with dates for slide_forecaster, though I'm not sure what exactly.

@dsweber2 dsweber2 merged commit 03f30e9 into main Mar 14, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Compare smoothed scaling with legacy forecasters
2 participants