Adjust to use local data files rather than downloading from WQP #9

lindsayplatt · 2025-02-20T19:41:01Z

We won't have internet access at the National Monitoring Conference, so this PR changes the pipeline to use local files (including related comment and target adjustments as needed). We will download the zip for the course from this branch.

ehinman

This looks good, Lindsay! A couple of minor comments to consider, but I think it is suitable as-is. At the very least, we should mention what the vroom warning message in the targets pipeline means.

I had another thought I wanted to share: I noticed that the plot pngs show the site ID, not the location name. This makes it a little bit harder to connect the pngs to the map, which shows location name. Should we join the site data to the WQ dataset so that we can grab the site name and use that in the plots? It would involve editing either refine_wqp_data or plot_timeseries to include the site metadata. It may be a small enough point that we don't need to make this change, but came to my mind nonetheless.
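To make the join idea concrete, here is a minimal sketch of what it could look like. The two tibbles are hypothetical stand-ins for the WQ dataset and the site metadata, and the column names MonitoringLocationIdentifier, MonitoringLocationName, and ResultMeasureValue are assumed to follow the usual WQP naming. Nothing here is taken from refine_wqp_data or plot_timeseries.

```r
library(dplyr)

# Hypothetical stand-in for the WQ dataset (one row per result)
wq_data <- tibble(
  MonitoringLocationIdentifier = c("SITE-01", "SITE-01", "SITE-02"),
  ResultMeasureValue = c(7.1, 7.4, 6.8)
)

# Hypothetical stand-in for the site metadata (one row per site)
site_data <- tibble(
  MonitoringLocationIdentifier = c("SITE-01", "SITE-02"),
  MonitoringLocationName = c("Example Creek", "Example River")
)

# Attach the site name to every result row; keep only the columns needed
# from the site table so the WQ dataset does not gain unrelated metadata
wq_named <- wq_data %>%
  left_join(
    select(site_data, MonitoringLocationIdentifier, MonitoringLocationName),
    by = "MonitoringLocationIdentifier"
  )

# The plotting step could then use the name instead of the ID for titles
plot_titles <- unique(wq_named$MonitoringLocationName)
```

Note that a join like this would make the plotting targets depend on the site metadata target as well, which changes the dependency graph.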

ehinman · 2025-02-21T15:05:22Z

01_fetch.R

-  format = "file"
+  tar_target(
+    p1_dataset,
+    read_csv(p1_dataset_csv, col_types = cols())


For my own education, what does col_types = cols() do, as opposed to leaving it NULL?

It silences the message about what column types it chose by default and makes for a cleaner console :)
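For anyone else wondering about the same thing, a small sketch of the difference. The temporary file and its two columns are made up for illustration. The types are still guessed when col_types = cols() is supplied; only the column specification message is suppressed.

```r
library(readr)

# Hypothetical two-column file written to a temp path for illustration
tmp <- tempfile(fileext = ".csv")
writeLines(c("site_id,value", "A1,0.5", "B2,1.2"), tmp)

# col_types left as NULL: readr guesses the types and reports its choices
dat_default <- read_csv(tmp)

# Empty cols(): types are still guessed, but the message is not shown
dat_quiet <- read_csv(tmp, col_types = cols())

# Alternative in readr 2.0 and later that also hides the message
dat_quiet2 <- read_csv(tmp, show_col_types = FALSE)
```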

ehinman · 2025-02-21T15:31:04Z

01_fetch.R

+  tar_target(
+    p1_dataset,
+    read_csv(p1_dataset_csv, col_types = cols())
+  ),


I get the following warning from this target:
Warning message:
One or more parsing issues, call problems() on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)

I followed the suggestion by running test <- vroom::vroom(p1_dataset_csv) and then vroom::problems(test), and all of the warning messages have to do with columns that are almost all NAs but are sparsely populated with something (e.g. "Not Detected", "in", "0.5", etc.). In some cases (and I did not realize this!), read_csv will actually convert non-logical values to NA! For example, there are a few numeric values in the column ActivityDepthHeightMeasure.MeasureValue in the csv, but those get converted to NA in the pipeline. Similarly, MeasureQualifierCode has a handful of "J" values, but these are converted to NA because the automatic column-type detection thinks the column should be logical.

My convention has been to read in all columns as character and then convert the applicable columns to numeric. Should we do that here? Or not worry about it? If we'd rather not get into it, it might make sense to use suppressWarnings(), but I'm open to ideas.
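A minimal sketch of the read-as-character convention, assuming a made-up file with two sparse columns (depth and qualifier are hypothetical names, not the WQP columns mentioned above). Reading everything as character skips type guessing entirely, so sparse values survive, and the numeric columns are converted explicitly afterwards.

```r
library(readr)

# Hypothetical file: 1500 rows with empty sparse columns, then one populated row
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("site_id,depth,qualifier", rep("A1,,", 1500), "A1,0.5,J"),
  tmp
)

# Read every column as character so no value is dropped by type guessing
dat <- read_csv(tmp, col_types = cols(.default = col_character()))

# Convert only the columns that are known to be numeric
dat$depth <- as.numeric(dat$depth)
```

The trade-off is that every numeric column has to be listed and converted by hand.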

Thanks for investigating this! I had not noticed it. The fastest way to handle this in the current situation is to add guess_max = 5000, which increases the number of rows read_csv() looks at before deciding the column type. I don't really want to spend time explaining the different argument options, so I added a wrapper function to do this. Commit coming soon!
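A rough sketch of what such a wrapper could look like. The function name load_csv and its argument names are hypothetical; the function actually committed may differ. The demo file is also made up: its sparse values first appear after row 1500, so whether they are seen during type guessing depends on guess_max.

```r
library(readr)

# Hypothetical helper: hides the column-spec message and inspects more rows
# before choosing column types
load_csv <- function(in_file, guess_max = 5000) {
  read_csv(in_file, col_types = cols(), guess_max = guess_max)
}

# Hypothetical file: sparse columns are empty until the last row
tmp <- tempfile(fileext = ".csv")
writeLines(
  c("site_id,depth,qualifier", rep("A1,,", 1500), "A1,0.5,J"),
  tmp
)

# With guess_max = 5000 all 1501 rows are inspected, so the populated
# values in the last row inform the guessed types
dat <- load_csv(tmp)
```

In the pipeline, a target could then call the helper in place of read_csv(), for example load_csv(p1_dataset_csv). Note that this only helps while guess_max is at least as large as the row at which a sparse column first has a value.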

lindsayplatt · 2025-02-21T16:33:23Z

I definitely like the idea of adding the site name to the plot headers, but that change would require redoing some of the screenshots used in the PPT, because it would add another dependency for some of the targets on the tar_visnetwork() chart. For this particular workshop, I care more about the pipeline structure concepts matching the slides than about helping someone match sites from the map to the pngs more quickly (some people may not even open these files).

Let's track that as an issue for a future enhancement. I do think it is worth fixing; it just feels like too much effort for now.

ehinman · 2025-02-21T16:40:53Z

Very good. Works great. Feel free to merge.

adjust to use local data files rather than downloading from WQP

2edc2c2

lindsayplatt requested a review from ehinman February 20, 2025 19:41

ehinman approved these changes Feb 21, 2025


tiny custom CSV load function to handle vroom warning about columns

3f8e925

lindsayplatt mentioned this pull request Feb 21, 2025

Change plot titles to use site names, not ids #10

Open

lindsayplatt merged commit ffe8e08 into CUAHSI:zip_for_nmc Feb 21, 2025
