New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Downloading a long sequence of URLs results in high CPU usage and slowness #429

Closed
greafhe opened this Issue Sep 11, 2015 · 9 comments

Comments

Projects
None yet
5 participants
@greafhe

greafhe commented Sep 11, 2015

Here is what I did
In one terminal run: python -m SimpleHTTPServer 8888
In another terminal run: curl "localhost:8888/[1-1000000]" > /dev/null
In the beginning the speed was over 2000 URLs/second. By the time it reached the 50000th URL, the speed was under 300 URLs/second, and curl's CPU usage was about twice as high as it was at the start.
Restarting curl (without restarting the server) results again in high speed in the beginning, then gradually dropping. Same if I use a different range, e.g. [100000-1000000].

@greafhe

This comment has been minimized.

greafhe commented Sep 11, 2015

curl --version
curl 7.35.0 (x86_64-pc-linux-gnu) libcurl/7.35.0 OpenSSL/1.0.1f zlib/1.2.8 libidn/1.28 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP

(from the Ubuntu 14.04 repository)

@iboukris

This comment has been minimized.

Contributor

iboukris commented Sep 12, 2015

Interesting test. I've run it and got similar results with curl 7.45 (fedora 21).
There seem to be a noticeable decrease of the speed and increase of CPU usage over time.

On the other hand, similar test with libcurl doesn't show this behaviour.
The test below shows rather constant speed (and about twice as fast).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <curl/curl.h>

int main(int argc, char *argv[])
{
  CURLcode ret;
  CURL *hnd;
  int i;
  char base_url[] = "http://localhost:8888/";

  hnd = curl_easy_init();

  char *url = (char *) malloc(sizeof(base_url) + 7);
  strcpy(url, base_url);

  for(i = 0; i < 1000000; i++) {
    sprintf(url + sizeof(base_url) - 1, "%i", i);
    curl_easy_setopt(hnd, CURLOPT_URL, url);
    ret = curl_easy_perform(hnd);
  }

  curl_easy_cleanup(hnd);

  return ret;
}

I can't tell why the tool behaves that way and I'm not sure I've considered all the aspects.

@jay

This comment has been minimized.

Member

jay commented Sep 12, 2015

 80.83%  curl  libcurl.so.4.3.0    [.] Curl_slist_append_nodup
 16.00%  curl  [kernel.kallsyms]   [k] 0xffffffff8114f406
  0.23%  curl  libcurl.so.4.3.0    [.] dprintf_formatf
  0.15%  curl  libc-2.19.so        [.] _IO_vfscanf
  0.10%  curl  libc-2.19.so        [.] malloc
  0.10%  curl  libcurl.so.4.3.0    [.] Curl_setopt
  0.09%  curl  curl                [.] operate_do

perf report output from Ubuntu 14 x64 running curl built from master fad9604 2015-09-11.

The Curl_slist_append_nodup calls trace back to the curl tool's libcurl code generation. Therefore it is an issue exclusive to the curl tool. By default the code generation is enabled at build time and for some reason it's always generated at runtime if it was enabled at build time. So for every URL the tool is generating the source. And to generate the source it goes line by line. The lines are stored in a linked list with no tail pointer, and so for each line it has to traverse the list to get to the end to add another line. This conclusion is just based on my read of the code, I did not go more in depth for proof that's what's eating. It's the type of thing that is not noticeable with a URL or two but a million, yes, it's noticeable.

If this is the case I think the fix would be not to do code generation unless it's requested. Though I wonder if this is one of those things that's a whole lot easier to say than do.

Until this is resolved you could build curl with configure option --disable-libcurl-option and it should perform better.

@bagder

This comment has been minimized.

Member

bagder commented Sep 12, 2015

Nice catch @jay, and indeed a silly flaw. We should of course only waste that time and memory if the user actually asked for code to get generated...

@bagder

This comment has been minimized.

Member

bagder commented Sep 12, 2015

Oh, and we should probably also consider using a linked list to which we can append strings at no extra cost even when the list gets very long so that this use case still would work fast and nice even when enabled.

@iboukris

This comment has been minimized.

Contributor

iboukris commented Sep 12, 2015

Nice indeed (and thanks for the perf tip)!
I can confirm that after I disable libcurl-option the speed is steady (and faster).

@bagder bagder added the cmdline tool label Sep 13, 2015

@bagder

This comment has been minimized.

Member

bagder commented Sep 16, 2015

Anyone volunteering to work on improving this piece of code?

@gnawhleinad

This comment has been minimized.

Contributor

gnawhleinad commented Sep 20, 2015

Oh, and we should probably also consider using a linked list to which we can append strings at no extra cost even when the list gets very long so that this use case still would work fast and nice even when enabled.

We should still pursue this proposal! Should I create a new issue for this?

@bagder

This comment has been minimized.

Member

bagder commented Sep 20, 2015

We should still pursue this proposal! Should I create a new issue for this?

Yes, it would be good and yeah a separate issue/pull-request makes it more convenient.

jay added a commit that referenced this issue Sep 21, 2015

tool_operate: Don't call easysrc cleanup unless --libcurl
- Review of 4d95491.

The author changed it so easysrc only initializes when --libcurl but did
not do the same for the call to easysrc cleanup.

Ref: #429

gnawhleinad added a commit to gnawhleinad/curl that referenced this issue Sep 22, 2015

tool: remove redundant libcurl check
The easysrc generation is run only when --libcurl is initialized.

Ref: curl#429

bagder added a commit that referenced this issue Sep 22, 2015

tool: remove redundant libcurl check
The easysrc generation is run only when --libcurl is initialized.

Ref: #429

Closes #448

gnawhleinad added a commit to gnawhleinad/curl that referenced this issue Sep 25, 2015

tool: Generate easysrc with last cache linked-list
Using a last cache linked-list improves the performance of easysrc generation.

Bug: curl#444
Ref: curl#429

bagder added a commit that referenced this issue Oct 17, 2015

tool: Generate easysrc with last cache linked-list
Using a last cache linked-list improves the performance of easysrc
generation.

Bug: #444
Ref: #429

Closes #452

jgsogo added a commit to jgsogo/curl that referenced this issue Oct 19, 2015

tool: generate easysrc only on --libcurl
Code should only be generated when --libcurl is used.

Bug: curl#429
Reported-by: @greafhe, Jay Satiro

Closes curl#429
Closes curl#442

jgsogo added a commit to jgsogo/curl that referenced this issue Oct 19, 2015

tool_operate: Don't call easysrc cleanup unless --libcurl
- Review of 4d95491.

The author changed it so easysrc only initializes when --libcurl but did
not do the same for the call to easysrc cleanup.

Ref: curl#429

jgsogo added a commit to jgsogo/curl that referenced this issue Oct 19, 2015

tool: remove redundant libcurl check
The easysrc generation is run only when --libcurl is initialized.

Ref: curl#429

Closes curl#448

@lock lock bot locked as resolved and limited conversation to collaborators May 7, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.