Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Large uploads cause quadratic string concatenation #2471

Closed
hniksic opened this issue Apr 27, 2017 · 2 comments
Closed

Large uploads cause quadratic string concatenation #2471

hniksic opened this issue Apr 27, 2017 · 2 comments

Comments

@hniksic
Copy link

hniksic commented Apr 27, 2017

In our application that uses wfastcgi.py with IIS, we noticed intense slowdowns with uploads of large attachments. Although the server is idle and has sufficient free memory, it is unable to process e.g. a ~100MB uploaded file. While investigating this, we discovered that the process gets stuck in in wfastcgi.py before ever reaching our code.

The IIS FastCGI module splits attachment data into ~10KB chunks and sends them one by one to the fastcgi handler (implemented in wfastcgi.py). wfastcgi.py concatenates the chunks into a string object, using regular python string addition. For a 100MB file this requires creation of 10 thousand string objects of sizes increasing linearly from 10KB to 100MB, totaling in 465GB data to be copied to create them. For a 2GB file, the process takes too long to wait for it to finish.

Using += to concatenate a large list of strings is a known antipattern in Python and is fortunately easy to avoid by using BytesIO.

We have fixed the performance of wfastcgi.py uploads by applying a patch like the one in #2476. Since handle_response.__enter__ actually creates a BytesIO object out of the resulting string, using a BytesIO all along comes even more natural.

A more complete patch would probably make the equivalent change for read_fastcgi_data.

@zooba
Copy link
Member

zooba commented Apr 27, 2017

Thanks for the analysis! This is likely the cause of problems we've seen in the past as well.

I agree with the proposed fix: res['wsgi.input'] and res['wsgi.data'] should be BytesIO instances by default, and if needed we should change wsgi.data to plain bytes in handle_response.__enter__.

@zooba zooba added this to the 3.0 milestone Apr 27, 2017
@zooba
Copy link
Member

zooba commented Apr 28, 2017

This should be fixed with #2476, though we'll hold off publishing a new version of it for a little while in case we find any more problems.

If you're encountering this, I recommend downloading wfastcgi.py directly for now and replacing your existing one (either the file or by referencing the newer one from your web.config).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants