Skip to content

Conversation

@lindsayplatt
Copy link
Collaborator

We won't have internet for the National Monitoring Conference, so this PR contains changes to use local files (incl other comments/target adjustments as needed). We will download the zip for the course from this branch.

@lindsayplatt lindsayplatt requested a review from ehinman February 20, 2025 19:41
Copy link
Collaborator

@ehinman ehinman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good Lindsay! A couple minor comments to consider, but I think suitable as-is. At the very least, we should mention what the vroom warning message is in the targets pipeline.

I had another thought I wanted to share: I noticed that the plot pngs show the site ID, not the location name. This makes it a little bit harder to connect the pngs to the map, which shows location name. Should we join the site data to the WQ dataset so that we can grab the site name and use that in the plots? It would involve editing either refine_wqp_data or plot_timeseries to include the site metadata. It may be a small enough point that we don't need to make this change, but came to my mind nonetheless.

01_fetch.R Outdated
format = "file"
tar_target(
p1_dataset,
read_csv(p1_dataset_csv, col_types = cols())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own education, what does col_types = cols() do, as opposed to leaving it NULL?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It silences the message about what column types it chose by default and makes for a cleaner console :)

tar_target(
p1_dataset,
read_csv(p1_dataset_csv, col_types = cols())
),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the following warning from this target:
Warning message: One or more parsing issues, call problems() on your data frame for details, e.g.: dat <- vroom(...) problems(dat)

I followed the suggestions doing test <- vroom::vroom(p1_dataset_csv) and then vroom::problems(test), and all of the warning messages have to do with columns that are almost all NAs but are sparsely populated with something (e.g. "Not Detected", "in", "0.5", etc.). In some cases (and I did not realize this!), read_csv will actually convert non-logical values to NA! For example, there are a few numeric values in the column ActivityDepthHeightMeasure.MeasureValue in the csv, but those get converted to NA in the pipeline. Similarly, MeasureQualifierCode has a handful of "J" values, but these are converted to NA since the auto-column type- detection thinks the column should be logical.

My convention has kind of been to read in all columns as character and then convert to numeric the applicable columns. Should we do that here? Or not worry about it? I think if we'd rather not get into it, it might make sense to suppressWarnings(), but open to ideas.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for investigating this! I had not noticed it. The fastest way to handle this in the current situation is to add guess_max = 5000, which increases the number of rows read_csv() looks at before deciding the column type. I don't really want to spend time explaining the different argument options, so I added a wrapper function to do this. Commit coming soon!

@lindsayplatt
Copy link
Collaborator Author

I definitely like the idea of adding the site name to the plot headers but changing that would require redoing some of the screenshots used in the PPT because we would be adding another dependency for some of the targets on the tar_visnetwork() chart. For this particular workshop, I care a bit more about the pipelining structure concepts and those matching up than someone more quickly matching sites from the map to the pngs (some people may not even open these files).

Let's make that an enhancement for the future as an issue because I do think it is good to fix, just feel like it is too much effort for now.

@ehinman
Copy link
Collaborator

ehinman commented Feb 21, 2025

Very good. Works great. Feel free to merge.

@lindsayplatt lindsayplatt merged commit ffe8e08 into CUAHSI:zip_for_nmc Feb 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants