
wf_request_batch() with transfer=FALSE #103

Closed
Rafnuss opened this issue Sep 27, 2022 · 12 comments

Rafnuss commented Sep 27, 2022

Is it possible to have a batch request without transfer?

The documentation for wf_request() and wf_request_batch() reads:

Stage a data request, and optionally download the data to disk. Alternatively you can only stage requests, logging the request URLs to submit download queries later on using wf_transfer.

But as I understand it, wf_request_batch() doesn't have an option to just stage the request. Is this correct? Or am I missing something?
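
For a single request, the staged workflow from the quoted docs looks something like this (a minimal sketch; the exact wf_transfer() arguments may differ by version):

# stage the request only, keeping the returned object
req <- wf_request(request, transfer = FALSE)
# some time later, download the staged result
wf_transfer(req)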

eliocamp (Collaborator) commented Oct 3, 2022

Yes, you're correct. Right now wf_request_batch() submits up to workers requests at a time and downloads each one when it finishes. How would you envision staging-only requests working?

Rafnuss (Author) commented Oct 3, 2022

My ideal code would look something like:

requests <- wf_request_batch(request_list, transfer = FALSE)
# some time later...
wf_transfer(requests)

instead of

requests <- list()
for (i_req in seq_along(request_list)) {
  requests[[i_req]] <- wf_request(request_list[[i_req]], transfer = FALSE)
}
# some time later...
for (i_req in seq_along(requests)) {
  wf_transfer(requests[[i_req]])
}

But maybe this is quite a specific need that nobody else shares...

eliocamp (Collaborator) commented Oct 3, 2022

So you're submitting all the requests at once and then downloading them when they are done? The added value of wf_request_batch() is the built-in queue, which ensures that you only send a maximum number of requests at a time so they are not queued on the server end.

For your use case I'd use req <- lapply(request_list, wf_request, transfer = FALSE) and then lapply(req, wf_transfer).
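
Spelled out (a minimal sketch, assuming request_list is a list of valid request lists):

# stage all requests without downloading
req <- lapply(request_list, wf_request, transfer = FALSE)
# some time later, download them all
lapply(req, wf_transfer)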

@khufkens what do you think?

khufkens (Member) commented Oct 3, 2022

Correct, assuming that you don't exceed the maximum number of allowed parallel requests.

The recent work of @eliocamp explicitly addresses the latter, monitoring the queue to download and submit new requests as slots free up. So as long as you colour within the lines, the proposed fix above should work.

Rafnuss (Author) commented Oct 3, 2022

Ok, thanks for your answers. lapply() is indeed a much more concise version of my suggestion.

In my case I could have 100+ requests to make (of very small files). This is only done once in the overall process of my code, so my thinking would be to make all the requests at once, wait a couple of hours, and then download them all.
I've been using wf_request_batch() for cases with a few requests (<30), but I thought it would be nice to have the R console free to do other things while waiting in the cases with more requests. What do you think?

khufkens (Member) commented Oct 3, 2022

Just submit it as a job! Either in a separate terminal (if you are not using an IDE) or using the jobs interface in RStudio. I mostly let jobs like this run in the background in RStudio; on an HPC they run as a proper job in the HPC queue.

But yes, best to download everything in one pass if you don't need dynamic access

khufkens (Member) commented Oct 3, 2022

For reference:

https://solutions.rstudio.com/r/jobs/

Rafnuss (Author) commented Oct 3, 2022

Ok, yes, sounds like a good plan. I'm not super familiar with jobs. But for my case, would you use the job_name argument in wf_request(., transfer = TRUE, job_name = "test"), or write a script with wf_request_batch() and start it with rstudioapi::jobRunScript()?

khufkens (Member) commented Oct 3, 2022

That's effectively the same thing.

I often call things from within RStudio itself, as I usually lump in some pre-/post-processing.

eliocamp (Collaborator) commented Oct 3, 2022

Bear in mind that lapply(list, wf_request, job_name = "test") won't really work, as it will create 100+ jobs with the same name. I think a better alternative for your use case might be to write a small script with lapply() and wf_request() and then run the script as a job, or even run it in a different R session from the console.
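
A minimal sketch of that approach (the script name download_requests.R and the way request_list is stored are assumptions for illustration):

# download_requests.R -- stage everything, then download
# assumes credentials were already set with wf_set_key()
library(ecmwfr)
request_list <- readRDS("request_list.rds")  # hypothetical: however you build your requests
req <- lapply(request_list, wf_request, transfer = FALSE)
lapply(req, wf_transfer)

# then, from the RStudio console, run the script as a background job
rstudioapi::jobRunScript("download_requests.R")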

Rafnuss (Author) commented Oct 3, 2022

Sounds good. Maybe I'll keep wf_request_batch() in my function (the standard case should be 10-30 requests) and then call this function as a job with https://github.com/lindeloev/job/ in case there are more requests. Thanks for your help!
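
With the job package, that could look roughly like this (a sketch; my_download_function() is a hypothetical wrapper around wf_request_batch()):

# run the batch download as a background RStudio job, keeping the console free
job::job({
  my_download_function(request_list)  # hypothetical wrapper around wf_request_batch()
})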

khufkens (Member) commented Oct 3, 2022

Ok, I'll close this now.

By the way, @Rafnuss, nice work on the pressure-based geolocation.

@khufkens khufkens closed this as completed Oct 3, 2022
Rafnuss added a commit to Rafnuss/GeoPressureR that referenced this issue Oct 3, 2022
Rafnuss added a commit to Rafnuss/GeoPressureManual that referenced this issue Oct 3, 2022