Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing readGWLdata to work with new data #10

Merged
merged 36 commits into from
Dec 2, 2017
Merged

Fixing readGWLdata to work with new data #10

merged 36 commits into from
Dec 2, 2017

Conversation

steffilazerte
Copy link
Collaborator

No description provided.

 - downloads well data from BC gov website
 - match formatting to previous version
- Alphebatize
- Add tidyr
- Start directly referring to imports explicity (i.e. dplyr::mutate)
- Import "%>%" for entire package
- Alphebatize
- Add tidyr
- Start directly referring to imports explicity (i.e. dplyr::mutate)
- Import "%>%" for entire package
- Add rmarkdown to Suggests for vignettes
- Update contents to use get_gwl()
- Add details for `which` argument
@steffilazerte
Copy link
Collaborator Author

This pull does the following:

  • adds a new function get_gwl to download well data (including documentation)
  • adds tests to ensure the working of downstream functions
  • updates the vignette to use get_gwl
  • removes most global variables (but see comments on issue Avoid global variables #1)

@steffilazerte
Copy link
Collaborator Author

Also forgot to mention: The only thing missing from the downloaded/formatted data is two additional metadata fields: EMS_ID and Station_Name. Right now they're filled with NA, and I'm not sure where to get the information from without web-scraping the well details (for example, but even this one doesn't have an EMS ID...).

@ateucher
Copy link
Contributor

Thanks @steffilazerte! I'll try and have a look today or tomorrow.

Copy link
Contributor

@ateucher ateucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @steffilazerte - this looks really great! The tests and the working vignette are fantastic!

Just a couple of small points which are open for discussion (comments inline).

The only other thing is that I generally try to avoid pipes in package functions because it makes debugging really difficult - I know the package already had pipes in it, just wondering what you think?

R/readGWLdata.R Outdated

well_avg <- utils::read.csv(text = data[[2]], stringsAsFactors = FALSE)
well_avg <- well_avg[, names(well_avg)[names(well_avg) != "year"]]
well_avg <- tidyr::spread(well_avg, "type", "Value")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know they were in the original, but I'm not sure we want the historical mean/max/min attached to every data frame... what about a fourth which option ("hist_avg" or something) that allows downloading just that on its own, and leave it up to the user to join them if they need to?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about an addition argument, hist_avg = FALSE? I've already coded it in so this way we can just wrap the code that downloads and joins in if(hist_avg) {...}. We can make the argument FALSE by default to minimize unnecessary downloading?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great idea!

by = "dummydate")

################################
# Need station name/location meta information!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there is a source for well metadata coming in the next couple of weeks, if you're able to hold off then we can add that then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, no hurry. We could still push to master, if you think people may want to use it without the meta data. Or we can wait if you think it could get confusing.

gwlZypTest <- function(dataframe, wells=NULL, byID, col, method="both") {
#' @export

gwlZypTest <- function(dataframe, wells = NULL, byID, col, method = "both") {

Copy link
Contributor

@ateucher ateucher Dec 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check the existence of columns named in byID and col. I will open an issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#12 - probably should look across the board to make sure it's done consistently (for later)

dplyr::group_by(.data$EMS_ID, .data$Well_Num,
Year = lubridate::year(.data$Date),
Month = lubridate::month(.data$Date)) %>%
dplyr::mutate(Date = dplyr::case_when(length(.data$Well_Num) < 5 ~
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

R/readGWLdata.R Outdated
if (!inherits(path, "textConnection") && !file.exists(path)) {
stop(paste0("The file ", path, "does not exist."))
}
if(!(which %in% c("all", "recent", "daily"))) stop("type must be either 'all', 'recent', or 'daily'.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about using which = c("all", "recent", "daily") in the argument list, and then which <- match.arg(which) to check?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit it never occurred to me, but yes, that would totally work. The only small thing I don't like is that the error message is:

Error in match.arg(which) : 
  'arg' should be one of “all”, “recent”, “daily”

Which, to an R newbie, isn't really a clear error message.

But either way, 'type' should be 'which' in the original!

Copy link
Contributor

@ateucher ateucher Dec 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope you don't mind, but this inspired me to write a function arg_match that does a better job of this. I implemented it for get_gwl in 65acc50

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely don't mind, that's a great function! I like it much better than the original.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll make use of it in a lot of other places!

R/readGWLdata.R Outdated
httr::stop_for_status(gwl_data)
gwl_data <- httr::content(gwl_data, as = "text", encoding = "UTF-8")

gwl_avg <- httr::GET(paste0(url, "minMaxMean.csv"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See question below for whether we should always get the minMaxMean...


#' Retrive and format groundwater data from BC Government GWL site
#'
#' Go to <http://www.env.gov.bc.ca/wsd/data_searches/obswell/map/> to find your
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice documentation

DESCRIPTION Outdated
ggplot2 (>= 2.0.0),
ggmap (>= 2.6.1),
httr (>= 1.3.1),
lubridate (>= 1.5.0),
rgdal (>= 1.1-3),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking we should move rgdal and sp to Suggests as they're only used for the one function (utm_dd). Then at the top of the body of utm_dd use:

if (!requireNamespace("rgdal") || !requireNamespace("sp")) {
  stop("You need the sp and rgdal packages installed to use this function")
}

Thoughts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely, especially as rgdal and sp aren't lightweight packages

@steffilazerte
Copy link
Collaborator Author

Commit 225c64e was odd. In order to keep rgdal and sp as suggests and to avoid a check error, I had to manually delete their 'import' lines from the NAMESPACE. But if I reran devtools::document() those import lines would be added back in... possibly a bug in document()?

R/utm_dd.R Outdated
utm <- SpatialPoints(d[2:3], proj4string=CRS(paste0("+proj=utm +datum=", d[4], " +zone=", d[1])))
sp <- spTransform(utm, CRS("+proj=longlat"))
utm <- sp::SpatialPoints(d[2:3],
proj4string = CRS(paste0("+proj=utm +datum=", d[4], " +zone=", d[1])))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need sp::CRS here and sp::coordinates below

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I missed that. I also missed the @imports that called rgdal and sp, which is why I had trouble with the NAMESPACE. Fixed now in fee8dc9.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ateucher ateucher merged commit 9f1be71 into master Dec 2, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants