Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test adding ARI to a New Model Run #245

Open
wants to merge 31 commits into
base: master
Choose a base branch
from
Open

Conversation

Damonamajor
Copy link
Contributor

@Damonamajor Damonamajor commented Jun 5, 2024

Attached is a markdown file, as well as changes to the model pipeline to allow a test of the Affordability Risk Index. Mostly for Dan, but let me know where it should be housed. It shouldn't be merged, but seemed like it would be easiest to review / look at in this folder.

dvc.yaml Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick (blocking): Most of the changes here just move stuff around. Let's revert the changes to this file, since they don't actually impact the outcome.

params.yaml Outdated
Comment on lines 448 to 449
run_id: "2024-03-17-stupefied-maya"
run_id: "2024-06-02-test-damon"
Copy link
Member

@dfsnow dfsnow Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): The export.run_id key is really only used for changing the outputs of the export pipeline stage. If you want to specify a run_id, use a parameter in the YAML header of your Quarto doc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): It's fine to edit this when messing around, but we shouldn't include changes to this file with the merge request. See my earlier structure note for a workaround.

Copy link
Member

@dfsnow dfsnow Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): Before I review the content of this report, let's get the structure worked out. Let's make a new root-level directory (analyses) where one-off analyses related to the model can live. Then we'd have:

  • reports/ for things that are generated with every run
  • analyses/ for things that are created one time

In order to make the analysis reproducible, you need to load data from sources accessible to others. That would mean:

  • For files in input/, push to the S3 DVC cache, then use read_parquet() to load the file you pushed directly from S3
  • For files in output/, finish a run and push it to S3. Then load the files directly from S3, again using read_parquet()

Note that you can re-use the reports/_setup.qmd file here, as it will load the input data from DVC (assuming you pushed) and most relevant output data for a given run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should all be fixed now. I pushed to DVC, let me know if there are any other steps that should be taken.

@Damonamajor Damonamajor requested a review from dfsnow June 11, 2024 15:48
@Damonamajor
Copy link
Contributor Author

  • For input data pull from the input file from S3 using the dvc hash
  • That would get us deterministically the same set of input data
  • For the output, using the github actions workflow we should be able to run the full model pipeline with the new featuure in S3 in the model bucket.
  • Pull from that RunID the one that is generated by the pipeline.
  • That would give us input and output data including the performance stats.

Comment on lines +6 to +13
library(sf)
library(dplyr)
library(ggplot2)
library(noctua)
library(arrow)
library(kableExtra)
library(spdep)
library(leaflet)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Alphabetically order your dependencies.

title: "Testing the Impact of the Affordability Risk Index on the Model"
format: html
---
```{r, include = FALSE}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Run styler::style_file() on this and lintr::lint(). There are some minor formatting issues in here.

analyses/ARI-model-test.qmd Show resolved Hide resolved
con <- dbConnect(noctua::athena())
noctua::noctua_options(cache = 10)

input_data <- read_parquet("input/assessment_data.parquet") %>%
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (blocking): This line is the main point of difference between this and the other reports. The other reports look at the current data, while this report should look at a snapshot of data that includes the variable of interest. In other words, this should ingest input data from DVC and output data from S3 so that we have a fixed, deterministic report for each feature.

analyses/ARI-model-test.qmd Outdated Show resolved Hide resolved
analyses/ARI-model-test.qmd Outdated Show resolved Hide resolved
analyses/ARI-model-test.qmd Show resolved Hide resolved
analyses/ARI-model-test.qmd Outdated Show resolved Hide resolved
analyses/ARI-model-test.qmd Outdated Show resolved Hide resolved
x <- round(x, 2)
print(paste("Correlation =", x, "when removing 0s"))
```
## Creating a map to see if there are spatial disparities in the inter-tract SHAP values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Use consistent title case.

Damonamajor and others added 6 commits June 24, 2024 10:37
Co-authored-by: Dan Snow <31494343+dfsnow@users.noreply.github.com>
Co-authored-by: Dan Snow <31494343+dfsnow@users.noreply.github.com>
Co-authored-by: Dan Snow <31494343+dfsnow@users.noreply.github.com>
Co-authored-by: Dan Snow <31494343+dfsnow@users.noreply.github.com>
Co-authored-by: Dan Snow <31494343+dfsnow@users.noreply.github.com>
Co-authored-by: Dan Snow <31494343+dfsnow@users.noreply.github.com>
@Damonamajor Damonamajor changed the title Testing of New Model Run Test adding ARI to a New Model Jun 25, 2024
@Damonamajor Damonamajor changed the title Test adding ARI to a New Model Test adding ARI to a New Model Run Jun 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants