
wf_request_batch() with transfer=FALSE #103

Closed
Rafnuss opened this issue Sep 27, 2022 · 12 comments

Rafnuss commented Sep 27, 2022

Is it possible to have a batch request without transfer?

The documentation for wf_request() and wf_request_batch() reads:

Stage a data request, and optionally download the data to disk. Alternatively you can only stage requests, logging the request URLs to submit download queries later on using wf_transfer.

But as I understand it, wf_request_batch() doesn't have an option to just stage the request. Is this correct? Or am I missing something?
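
For a single request, the staged workflow from the quoted docs looks something like this (a minimal sketch; the exact wf_transfer() arguments may differ by version):

# stage the request only, keeping the returned object
req <- wf_request(request, transfer = FALSE)
# some time later, download the staged result
wf_transfer(req)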

eliocamp (Collaborator) commented Oct 3, 2022

Yes, you're correct. Right now wf_request_batch() submits up to workers requests at a time and downloads each one when it finishes. How would you envision staging-only requests working?

Rafnuss (Author) commented Oct 3, 2022

My ideal code would look something like:

requests <- wf_request_batch(request_list, transfer = FALSE)
# some time later...
wf_transfer(requests)

instead of

requests <- list()
for (i_req in seq_along(request_list)) {
  requests[[i_req]] <- wf_request(request_list[[i_req]], transfer = FALSE)
}
# some time later...
for (i_req in seq_along(requests)) {
  wf_transfer(requests[[i_req]])
}

But maybe this is quite a specific need that nobody else shares...

eliocamp (Collaborator) commented Oct 3, 2022

So you're submitting all the requests at once and then downloading them when they are done? The added value of wf_request_batch() is the built-in queue, which ensures that you only send a maximum number of requests at a time so they are not queued on the server end.

For your use case I'd use req <- lapply(request_list, wf_request, transfer = FALSE) and then lapply(req, wf_transfer).
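
Spelled out (a minimal sketch, assuming request_list is a list of valid request lists):

# stage all requests without downloading
req <- lapply(request_list, wf_request, transfer = FALSE)
# some time later, download them all
lapply(req, wf_transfer)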

@khufkens what do you think?

khufkens (Member) commented Oct 3, 2022

Correct, assuming that you don't exceed the maximum number of allowed parallel requests.

The recent work of @eliocamp explicitly addresses the latter, monitoring the queue to download and submit new requests as slots free up. So as long as you colour within the lines, the proposed fix above should work.

Rafnuss (Author) commented Oct 3, 2022

Ok, thanks for your answers. lapply() is indeed a much more concise version of my suggestion.

In my case I could have 100+ requests to make (of very small files). This is only done once in the overall process of my code, so my thinking would be to make all the requests at once, wait a couple of hours, and then download them all.
I've been using wf_request_batch() for cases with a few requests (<30), but I thought it would be nice to have the R console free to do other things while waiting in the cases with more requests. What do you think?

khufkens (Member) commented Oct 3, 2022

Just submit it as a job! Either in a separate terminal (if you are not using an IDE) or using the jobs interface in RStudio. I mostly let jobs like this run in the background in RStudio; on an HPC they run as a proper job in the HPC queue.

But yes, best to download everything in one pass if you don't need dynamic access

khufkens (Member) commented Oct 3, 2022

For reference:

https://solutions.rstudio.com/r/jobs/

Rafnuss (Author) commented Oct 3, 2022

Ok, yes, sounds like a good plan. I'm not super familiar with jobs. But for my case, would you use the job_name argument in wf_request(., transfer = TRUE, job_name = "test"), or write a script with wf_request_batch() and start it with rstudioapi::jobRunScript()?

khufkens (Member) commented Oct 3, 2022

That's effectively the same thing.

I often call things from within RStudio itself, as I usually lump in some pre-/post-processing.

eliocamp (Collaborator) commented Oct 3, 2022

Bear in mind that lapply(list, wf_request, job_name = "test") won't really work, as it will create 100+ jobs with the same name. I think a better alternative for your use case might be to write a small script with lapply() and wf_request() and then run the script as a job, or even run it in a different R session from the console.
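
A minimal sketch of that approach (the script name download_requests.R and the way request_list is stored are assumptions for illustration):

# download_requests.R -- stage everything, then download
# assumes credentials were already set with wf_set_key()
library(ecmwfr)
request_list <- readRDS("request_list.rds")  # hypothetical: however you build your requests
req <- lapply(request_list, wf_request, transfer = FALSE)
lapply(req, wf_transfer)

# then, from the RStudio console, run the script as a background job
rstudioapi::jobRunScript("download_requests.R")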

Rafnuss (Author) commented Oct 3, 2022

Sounds good. Maybe I'll keep wf_request_batch() in my function (the standard case should be 10-30 requests) and then call this function as a job with https://github.com/lindeloev/job/ in case there are more requests. Thanks for your help!
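
With the job package, that could look roughly like this (a sketch; my_download_function() is a hypothetical wrapper around wf_request_batch()):

# run the batch download as a background RStudio job, keeping the console free
job::job({
  my_download_function(request_list)  # hypothetical wrapper around wf_request_batch()
})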

khufkens (Member) commented Oct 3, 2022

Ok, I'll close this now.

By the way, @Rafnuss, nice work on the pressure-based geolocation.

@khufkens khufkens closed this as completed Oct 3, 2022
Rafnuss added a commit to Rafnuss/GeoPressureR that referenced this issue Oct 3, 2022
Rafnuss added a commit to Rafnuss/GeoPressureManual that referenced this issue Oct 3, 2022