curl: support parallel transfers #3804

Closed
wants to merge 2 commits

Conversation

@bagder
Member

bagder commented Apr 24, 2019

This is done by making sure each individual transfer is first added to a linked list, so that the transfers can then be performed serially or, at will, in parallel.
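
For orientation, here is a minimal, self-contained sketch of that idea using libcurl's public multi API: each URL gets its own easy handle, all handles are driven concurrently, and completed transfers are reaped with curl_multi_info_read. This is an illustration only (not the PR's tool_operate.c code) and it leaves out the tool's own linked list of per-transfer state, the progress meter and the --parallel-max limiting.

/* Minimal sketch: drive several transfers in parallel with the multi API. */
#include <curl/curl.h>

int main(int argc, char **argv)
{
  CURLM *multi;
  int still_running = 0;
  int i;

  curl_global_init(CURL_GLOBAL_DEFAULT);
  multi = curl_multi_init();

  /* one easy handle per URL given on the command line */
  for(i = 1; i < argc; i++) {
    CURL *easy = curl_easy_init();
    curl_easy_setopt(easy, CURLOPT_URL, argv[i]);
    curl_multi_add_handle(multi, easy);
  }

  /* drive all transfers concurrently until everything is done */
  do {
    int numfds;
    CURLMcode mc = curl_multi_perform(multi, &still_running);
    if(!mc && still_running)
      mc = curl_multi_wait(multi, NULL, 0, 1000, &numfds);
    if(mc)
      break;
  } while(still_running);

  /* reap completed transfers */
  for(;;) {
    int msgs_left;
    CURLMsg *msg = curl_multi_info_read(multi, &msgs_left);
    if(!msg)
      break;
    if(msg->msg == CURLMSG_DONE) {
      CURL *easy = msg->easy_handle;
      curl_multi_remove_handle(multi, easy);
      curl_easy_cleanup(easy);
    }
  }

  curl_multi_cleanup(multi);
  curl_global_cleanup();
  return 0;
}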

Status

  • creates a unique curl handle for each transfer and adds it to a linked list
  • iterates over each transfer in the list and runs them one by one
  • existing code and test cases adjusted
  • minor behavior change documented
  • ability to run transfers in parallel
  • implement a basic progress meter for multiple concurrent transfers
  • documented the new parallel option (-Z / --parallel)
  • supports a maximum of N transfers done in parallel (--parallel-max); see the usage example after this list
  • make sure --retry still works for serial transfers
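
Illustrative usage of the two options above (the URLs are placeholders):

  curl --parallel --parallel-max 10 -O https://example.com/one -O https://example.com/two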

Still TODO post first merge

  • make --retry work for parallel
  • make sure it handles torture tests fine
  • test cases for parallel downloads
  • work out how to report success/failures for individual transfers
@bagder bagder added the cmdline tool label Apr 24, 2019
@bagder

Member Author

bagder commented Apr 24, 2019

The codacy warnings are just silly.

@MarcelRaad

Member

MarcelRaad commented Apr 24, 2019

The codacy warnings are just silly.

You can turn that one off via Code patterns -> Flawfinder.

@bagder

Member Author

bagder commented Apr 24, 2019

I can? I can't find anywhere to do that!

@MarcelRaad

Member

MarcelRaad commented Apr 24, 2019

After "Login with GitHub", "Code patterns" appears at the menu on the left.

@bagder bagder force-pushed the bagder/parallel-transfers branch from b6eb167 to 3b045f4 Apr 25, 2019
@bagder bagder force-pushed the bagder/parallel-transfers branch from 80e712b to 5466d5d Apr 26, 2019
@jay

Member

jay commented Apr 26, 2019

I would just use --parallel and nix the short option; it seems like this will almost always be used in scripts. I don't see anyone at the command line running curl -Z <fifty urls>.

confusing:
ckmopsuvwxz
CKMOPSUVWXZ
@jay

Member

jay commented Apr 27, 2019

I had one unreproducible hang while I was playing around with this. I'm not sure what caused it, but during the investigation I found an off-by-one bug, so I'm going to attribute it to that unless it happens again.

> curld --parallel -o first http://mirrors.rit.edu/ubuntu-releases/16.04/ubuntu-16.04.2-server-amd64.template
DL% UL%  Dled  Uled  Xfers  Live   Qd Total     Current  Left    Speed
--  --      0     0     1     1     0 --:--:-- --:--:-- --:--:-- -9223

That resource no longer exists, so I expect the tool to terminate almost immediately. The first time it didn't, but every time after that it did (regardless of the fix).

@bagder

Member Author

bagder commented Apr 27, 2019

thanks for the fixes @jay!

@bagder bagder force-pushed the bagder/parallel-transfers branch from 7d84654 to 79de0ba Apr 28, 2019
@jumoog


jumoog commented Apr 29, 2019

I saw your YouTube video... you could show an ETA if you requested the file size of all URLs first.

@bagder

Member Author

bagder commented Apr 29, 2019

No, I can't.

First, for lots of the transfers libcurl can do, the size simply is not known beforehand, no matter what we do.
Second, for those where it is, if we are to do 10,000 transfers it is not feasible to "get the size" of those 10K first before any transfers start, partly because of how libcurl works and partly because of how those protocols work.

@bagder bagder force-pushed the bagder/parallel-transfers branch from 9735583 to 1084610 Apr 30, 2019
@bagder bagder force-pushed the bagder/parallel-transfers branch from 1084610 to bce4060 May 11, 2019
@bagder bagder changed the title [WIP] curl: support parallel transfers curl: support parallel transfers May 11, 2019
@bagder

Member Author

bagder commented May 11, 2019

I've now moved a few of the remaining action items to be done after the first merge, so that we can develop those things more in parallel and cooperatively. As long as the non-parallel use cases still work exactly like before and remain stable, it shouldn't be a problem to proceed like this.

@jay

Member

jay commented May 14, 2019

My --write-outs are being overwritten by the progress meter. Simple example:

> curld --parallel --write-out "test %{http_code}" -o foo http://httpbin.org/get

DL% UL%  Dled  Uled  Xfers  Live   Qd Total     Current  Left    Speed
--  --    204     0     1     0     0 --:--:-- --:--:-- --:--:--  3290

In parallel_transfers, after a transfer finishes, post_transfer is called for that transfer, which does the write-out. However, after that the progress meter is called again, which overwrites the line.

curl/src/tool_operate.c

Lines 2011 to 2040 in 80f044f

    if(!mcode) {
      int rc;
      CURLMsg *msg;
      bool removed = FALSE;
      do {
        msg = curl_multi_info_read(multi, &rc);
        if(msg) {
          bool retry;
          struct per_transfer *ended;
          CURL *easy = msg->easy_handle;
          result = msg->data.result;
          curl_easy_getinfo(easy, CURLINFO_PRIVATE, (void *)&ended);
          curl_multi_remove_handle(multi, easy);
          result = post_transfer(global, share, ended, result, &retry);
          if(retry)
            continue;
          progress_finalize(ended); /* before it goes away */
          all_added--; /* one fewer added */
          removed = TRUE;
          (void)del_transfer(ended);
        }
      } while(msg);
      if(removed)
        /* one or more transfers completed, add more! */
        (void)add_parallel_transfers(global, multi);
    }
  }
  (void)progress_meter(global, &start, TRUE);

"DL% UL%  Dled  Uled  Xfers  Live   Qd Total     Current  Left    Speed\n"
"test 200"  <---  result = post_transfer(global, share, ended, result, &retry); 
"\r--  --    204     0     1     0     0 --:--:-- --:--:-- --:--:--  3290\n"  <---  (void)progress_meter(global, &start, TRUE); 
@bagder

Member Author

bagder commented May 14, 2019

Any suggestion on what to do with --write-out in a parallel scenario?

@jay

Member

jay commented May 14, 2019

Any suggestion on what to do with --write-out in a parallel scenario?

I'm not sure. Delay until finished?
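
For illustration, a rough sketch of the "delay until finished" idea: buffer each finished transfer's --write-out output and print it only after the final progress meter line has been drawn. The helper names (defer_writeout, flush_writeouts) and the queue are hypothetical, not curl's code.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical queue of finished transfers' --write-out output */
struct deferred {
  char *text;
  struct deferred *next;
};
static struct deferred *def_head, *def_tail;

/* would be called from post_transfer() instead of printing right away */
static void defer_writeout(const char *text)
{
  struct deferred *d = calloc(1, sizeof(*d));
  size_t len = strlen(text) + 1;
  if(!d)
    return;
  d->text = malloc(len);
  if(!d->text) {
    free(d);
    return;
  }
  memcpy(d->text, text, len);
  if(def_tail)
    def_tail->next = d;
  else
    def_head = d;
  def_tail = d;
}

/* would be called once, after the final progress meter line is drawn */
static void flush_writeouts(void)
{
  while(def_head) {
    struct deferred *d = def_head;
    def_head = d->next;
    fputs(d->text, stdout);
    free(d->text);
    free(d);
  }
  def_tail = NULL;
}

int main(void)
{
  defer_writeout("test 200\n");
  defer_writeout("test 404\n");
  /* ...final progress meter output would be printed here... */
  flush_writeouts();
  return 0;
}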

@bagder

Member Author

bagder commented May 14, 2019

I suppose it goes into the whole "how to show what happened for each transfer" discussion too. If you do 20 transfers, how are those --write-out outputs going to make sense at the end...

@jay

Member

jay commented May 14, 2019

I suppose it goes into the whole "how to show what happened for each transfer" discussion too. If you do 20 transfers, how are those --write-out outputs going to make sense at the end...

If they're done in the order they were given, they could make some sense. Or there could be a transfer id, like "%{transfer_id}".

@bagder bagder force-pushed the bagder/parallel-transfers branch 2 times, most recently from 834a55a to 4c082ef May 22, 2019
@bagder bagder force-pushed the bagder/parallel-transfers branch from 4c082ef to 81956d4 Jun 11, 2019
@bagder bagder force-pushed the bagder/parallel-transfers branch from 81956d4 to 85264f9 Jun 25, 2019
@bagder bagder force-pushed the bagder/parallel-transfers branch from 85264f9 to fa129a6 Jul 15, 2019
This is done by making sure each individual transfer is first added to a
linked list as then they can be performed serially, or at will, in
parallel.
@bagder bagder force-pushed the bagder/parallel-transfers branch from fa129a6 to e281e99 Jul 19, 2019
@bagder

Member Author

bagder commented Jul 19, 2019

I intend to merge this shortly.

@bagder bagder closed this in b889408 Jul 20, 2019
@bagder bagder deleted the bagder/parallel-transfers branch Jul 20, 2019
@MarcelRaad

Member

MarcelRaad commented Jul 22, 2019

This broke the autobuilds. Unfortunately, it's not easy to see why in this huge diff:
https://curl.haxx.se/dev/log.cgi?id=20190722042000-16028#prob1

src/tool_operate.c:2290:11: error: implicit declaration of function 'easysrc_cleanup'; did you mean 'glob_cleanup'? [-Werror=implicit-function-declaration]
jay added a commit to jay/curl that referenced this pull request Jul 22, 2019
easysrc_cleanup is only defined when CURL_DISABLE_LIBCURL_OPTION is not
defined, and prior to this change would be called regardless.

Bug: curl#3804 (comment)
Reported-by: Marcel Raad

Closes #xxxx
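
The fix (#4142) presumably amounts to guarding the call with the existing preprocessor symbol; a minimal sketch of that pattern, assuming the call site in src/tool_operate.c:

#ifndef CURL_DISABLE_LIBCURL_OPTION
  easysrc_cleanup(); /* only exists when the --libcurl option is compiled in */
#endif
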
@jay

Member

jay commented Jul 22, 2019

This broke the autobuilds.

#4142

jay added a commit that referenced this pull request Jul 23, 2019
easysrc_cleanup is only defined when CURL_DISABLE_LIBCURL_OPTION is not
defined, and prior to this change would be called regardless.

Bug: #3804 (comment)
Reported-by: Marcel Raad

Closes #4142
caraitto added a commit to caraitto/curl that referenced this pull request Jul 23, 2019
This is done by making sure each individual transfer is first added to a
linked list as then they can be performed serially, or at will, in
parallel.

Closes curl#3804
caraitto added a commit to caraitto/curl that referenced this pull request Jul 23, 2019
easysrc_cleanup is only defined when CURL_DISABLE_LIBCURL_OPTION is not
defined, and prior to this change would be called regardless.

Bug: curl#3804 (comment)
Reported-by: Marcel Raad

Closes curl#4142
@lock lock bot locked as resolved and limited conversation to collaborators Oct 20, 2019