Avoid downloading files that exist with byTileAOP? #61

Closed
mbjoseph opened this issue Oct 20, 2019 · 3 comments

Comments

@mbjoseph

Is your feature request related to a problem? Please describe.

Currently, byTileAOP downloads files even when they already exist locally. It might be nice to add an overwrite argument that defaults to the current behavior (overwrite = TRUE) but lets users skip downloads for files that already exist by setting overwrite = FALSE.

Describe the solution you'd like

I'd propose adding an overwrite argument to byTileAOP.

Describe alternatives you've considered

Currently, it's up to the users to avoid redundant downloads. My use case is trying to get canopy height model rasters at locations of ground plots. But, in some cases, multiple plots fall within the same tile, resulting in redundant downloads. As a user, I could do something like:

  1. Start by downloading one tile for the first ground plot location
  2. Read the polygon of the CHM from the metadata and remove any plots that are within the boundary of the CHM I just downloaded
  3. Iterate to the next ground plot and go back to step 1
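
A lighter-weight variant of the steps above is to work out which tiles are needed before downloading anything. The sketch below assumes AOP mosaic tiles are 1 km x 1 km and aligned to a 1000 m UTM grid, so each coordinate can be snapped to its tile's lower-left corner. `tile_corner` and `unique_tiles` are hypothetical helper names, not part of neonUtilities.

```r
# Assumption: AOP tiles are 1 km x 1 km on a 1000 m UTM grid.
# tile_corner() and unique_tiles() are hypothetical helpers for illustration.

# Snap a coordinate to the lower-left corner of the tile that contains it
tile_corner <- function(x, tile_size = 1000) {
  floor(x / tile_size) * tile_size
}

# Return one row per distinct tile covering the input coordinates
unique_tiles <- function(easting, northing, tile_size = 1000) {
  tiles <- data.frame(
    easting = tile_corner(easting, tile_size),
    northing = tile_corner(northing, tile_size)
  )
  # Drop coordinates that are missing (NA or NaN)
  keep <- !is.na(tiles$easting) & !is.na(tiles$northing)
  tiles <- unique(tiles[keep, , drop = FALSE])
  rownames(tiles) <- NULL
  tiles
}
```

Calling `unique_tiles(plot_coords$easting, plot_coords$northing)` would then give one coordinate pair per tile, so a download loop runs once per tile rather than once per plot.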

Additional context

Here's an example of what I'm doing -- essentially I'm computing one coordinate for each ground plot, then I want to get the CHM data for each one, which results in redundant downloads:

library(tidyverse)
library(neonUtilities)
library(geoNEON)

# Get NEON CHM data for SJER, over the ground plots
# (adapted from https://www.neonscience.org/tree-heights-veg-structure-chm)
veglist <- loadByProduct(dpID="DP1.10098.001", site="SJER", package="basic")
vegmap <- def.calc.geo.os(veglist$vst_mappingandtagging, 
                          "vst_mappingandtagging")

# Compute one coordinate per plot
plot_coords <- vegmap %>%
  group_by(plotID) %>%
  summarize(easting = mean(adjEasting, na.rm = TRUE), 
            northing = mean(adjNorthing, na.rm = TRUE))

# For each plot, download the CHM data (results in redundant downloads)
for (i in seq_len(nrow(plot_coords))) {
  byTileAOP(dpID="DP3.30015.001", 
            site="SJER", 
            year="2017", 
            easting=plot_coords$easting[i], 
            northing=plot_coords$northing[i],
            savepath="data", 
            check.size = FALSE)
}

Maybe there's a better way to handle the possibility of redundant downloads in this case, but this seems like a potentially general solution for similar use cases.

@cklunch
Collaborator

cklunch commented Oct 20, 2019

@mbjoseph Looking at your code, if I'm understanding right, I think there's an easy solution, and I just need to improve the documentation. byTileAOP can take vector inputs for easting and northing. It will assume the vectors line up (i.e., easting[1] and northing[1] are a desired pair of coordinates), which should hold for your data. So you can just skip the for loop, and the function will come up with the set of tiles that contain all the input coordinates.

But let me know if I've misunderstood!
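
As a rough sketch of the suggestion above, the loop from the original post could collapse into a single call that passes the whole easting and northing columns. The wrapper `download_plot_tiles` and its `downloader` argument are hypothetical; `downloader` defaults to `neonUtilities::byTileAOP` and is there only so the call can be tried without network access. The arguments passed through are the ones used in the original post.

```r
# Hypothetical wrapper: one vectorized call instead of a per-plot loop.
# `downloader` is an illustrative hook; by default it is byTileAOP.
download_plot_tiles <- function(plot_coords,
                                downloader = neonUtilities::byTileAOP) {
  # Skip plots whose mean coordinate is missing (NA or NaN)
  ok <- !is.na(plot_coords$easting) & !is.na(plot_coords$northing)

  # easting[i] and northing[i] are treated as a coordinate pair
  downloader(dpID = "DP3.30015.001",
             site = "SJER",
             year = "2017",
             easting = plot_coords$easting[ok],
             northing = plot_coords$northing[ok],
             savepath = "data",
             check.size = FALSE)
}
```

With the `plot_coords` data frame from the original post, `download_plot_tiles(plot_coords)` would hand all plot coordinates to byTileAOP at once, leaving it to work out the set of tiles.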

@mbjoseph
Author

mbjoseph commented Oct 20, 2019 via email

@cklunch
Collaborator

cklunch commented Oct 20, 2019

No problem, glad to hear this solves this issue!

cklunch closed this as completed Oct 20, 2019