Test adding ARI to a New Model Run #245
base: master
Conversation
dvc.yaml
Outdated
nitpick (blocking): Most of the changes here just move stuff around. Let's revert the changes to this file, since they don't actually impact the outcome.
params.yaml
Outdated
```diff
- run_id: "2024-03-17-stupefied-maya"
+ run_id: "2024-06-02-test-damon"
```
issue (blocking): The `export.run_id` key is really only used for changing the outputs of the `export` pipeline stage. If you want to specify a `run_id`, use a parameter in the YAML header of your Quarto doc.
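As a sketch of what that could look like (the `run_id` parameter name and value here are illustrative, not taken from the repo), a Quarto YAML header can declare the run as a document parameter and chunks can read it via `params`:

````qmd
---
title: "Testing the Impact of the Affordability Risk Index on the Model"
format: html
params:
  run_id: "2024-06-02-test-damon"
---

```{r}
# Read the parameter inside any chunk instead of hardcoding the run
run_id <- params$run_id
```
````

Rendering with `quarto render doc.qmd -P run_id:some-other-run` would then override the default without touching `params.yaml`.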
pipeline/00-ingest.R
Outdated
issue (blocking): It's fine to edit this when messing around, but we shouldn't include changes to this file with the merge request. See my earlier structure note for a workaround.
ARI-model-test.qmd
Outdated
issue (blocking): Before I review the content of this report, let's get the structure worked out. Let's make a new root-level directory (`analyses/`) where one-off analyses related to the model can live. Then we'd have:

- `reports/` for things that are generated with every run
- `analyses/` for things that are created one time
In order to make the analysis reproducible, you need to load data from sources accessible to others. That would mean:

- For files in `input/`, push to the S3 DVC cache, then use `read_parquet()` to load the file you pushed directly from S3
- For files in `output/`, finish a run and push it to S3, then load the files directly from S3, again using `read_parquet()`
Note that you can re-use the `reports/_setup.qmd` file here, as it will load the input data from DVC (assuming you pushed) and most relevant output data for a given run.
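A minimal sketch of the second pattern, assuming `arrow` is attached and using a placeholder bucket and key (the real S3 paths from the run output would go here):

```r
library(arrow)

# Hypothetical S3 URI; substitute the actual bucket/prefix for the pushed run
assessment_data <- read_parquet(
  "s3://example-bucket/output/2024-06-02-test-damon/assessment_data.parquet"
)
```

Because the path points at an immutable, completed run rather than the working tree, anyone re-rendering the report pulls exactly the same snapshot.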
These should all be fixed now. I pushed to DVC, let me know if there are any other steps that should be taken.
```r
library(sf)
library(dplyr)
library(ggplot2)
library(noctua)
library(arrow)
library(kableExtra)
library(spdep)
library(leaflet)
```
nitpick: Alphabetically order your dependencies.
````qmd
title: "Testing the Impact of the Affordability Risk Index on the Model"
format: html
---
```{r, include = FALSE}
````
nitpick: Run `styler::style_file()` and `lintr::lint()` on this. There are some minor formatting issues in here.
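For reference, both can be run from an R console against the doc (filename taken from this PR):

```r
# Restyle the Quarto doc in place, then list any remaining lints
styler::style_file("ARI-model-test.qmd")
lintr::lint("ARI-model-test.qmd")
```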
```r
con <- dbConnect(noctua::athena())
noctua::noctua_options(cache = 10)
```

```r
input_data <- read_parquet("input/assessment_data.parquet") %>%
```
issue (blocking): This line is the main point of difference between this and the other reports. The other reports look at the current data, while this report should look at a snapshot of data that includes the variable of interest. In other words, this should ingest input data from DVC and output data from S3 so that we have a fixed, deterministic report for each feature.
````qmd
x <- round(x, 2)
print(paste("Correlation =", x, "when removing 0s"))
```

## Creating a map to see if there are spatial disparities in the inter-tract SHAP values
````
nitpick: Use consistent title case.
Co-authored-by: Dan Snow <31494343+dfsnow@users.noreply.github.com>
…vm into Test_New_Model_Run
Attached is a markdown file, along with changes to the model pipeline, to allow a test of the Affordability Risk Index. This is mostly for Dan, but let me know where it should be housed. It shouldn't be merged, but it seemed easiest to review in this folder.