Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#1847] Add middleware that cleans up the pylons response string. #2262

Merged
merged 10 commits into from Apr 1, 2015

Conversation

joetsoi
Copy link
Contributor

@joetsoi joetsoi commented Feb 4, 2015

(fixes #1847)
Once the response string has been served, this middleware replaces the response string with a dummy one, so the original response can be garbage collected. This is lifted from https://code.google.com/p/modwsgi/wiki/RegisteringCleanupCode This middleware is can be switched on by setting 'ckan.use_pylons_response_cleanup_middleware = true' in your development.ini

So I did a comparison of with and without this enabled. Everyone likes graphs right?
memory_usage
Notably, the memory usage still grows with this enabled, but is a bit more stable. I also did some examining of the memory and this memory has been returned to python's free lists but has not been released by the python process. see (http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm). So we'll still use up as much memory as there are threads

So the other way around it is to do something like use gunicorn + gevent, although this isn't a fair direct comparison as this is only using one worker process, but I mainly suggest this as you can set worker processes to restart after serving a maximum number of requests (see http://docs.gunicorn.org/en/develop/configure.html#max-requests), which would release any memory that the process has been allocated but is not using.

Anyway, this isn't critical or anything as it does just fix an edge case for a very particular workload that most people will not come across. (still needs tests)

Once the response string has been served, this middleware replaces the
response string with a dummy one, so the original response can be
garbage collected. This is lifted from

https://code.google.com/p/modwsgi/wiki/RegisteringCleanupCode

This middleware is can be switched on by setting
'ckan.use_pylons_response_cleanup_middleware = True' in your
development.ini
@joetsoi joetsoi added WIP and removed WIP labels Feb 4, 2015
Just test that the homepage runs fine with the middleware actiavted
@joetsoi joetsoi force-pushed the 1847-pylons-clean-middleware branch from b4c9676 to 68d384c Compare February 5, 2015 16:24
@joetsoi joetsoi force-pushed the 1847-pylons-clean-middleware branch from efb0d51 to a7d5599 Compare February 5, 2015 17:07
def __init__(self, iterable, callback, environ):
self.__iterable = iterable
self.__callback = callback
self.__environ = environ
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ew. private members with name munging

@amercader amercader added this to the CKAN 2.4 milestone Feb 10, 2015
@joetsoi
Copy link
Contributor Author

joetsoi commented Feb 10, 2015

@aliceh75, might be nice if you could test this, as it's basically your fix without the timer, you might also want to look into using gunicorn + gevent that I suggested to utilize the process restart after serving a number of requests.

@aliceh75
Copy link
Contributor

Hi,

I can confirm this stabilizes the memory. The test I was just running (continuous loop fetching 10,000 rows with 4 workers) saw the memory go from 800M to 1.8G and back quite rapidly. With the middleware enabled it is far more stable (It's been oscillating between 793M and 823M for a while now, most of the time staying around 810M).

This is a great improvement :-) Thank you for doing this.

@aliceh75
Copy link
Contributor

mod_wsgi can also restart after a number of requests btw.

@amercader
Copy link
Member

🎉 woo, nice. Great work @joetsoi

@wardi
Copy link
Contributor

wardi commented Mar 30, 2015

@joetsoi I've made a couple small changes: https://github.com/wardi/ckan/tree/1847-pylons-clean-middleware would you take a look and cherry-pick or merge into your branch?

@wardi
Copy link
Contributor

wardi commented Mar 31, 2015

@joetsoi this link makes it easier to see the two commits https://github.com/wardi/ckan/commits/1847-pylons-clean-middleware Would you like a PR against your PR? :-)

@joetsoi
Copy link
Contributor Author

joetsoi commented Apr 1, 2015

@wardi, we should just all give each other edit permissions to each other's ckan forks, that'd be distributed version control in action :p

I've fixed up the tests/changelog and made it switch on by default, no point having it if it's not going to be used.

@wardi
Copy link
Contributor

wardi commented Apr 1, 2015

@joetsoi sounds good. is this good to go?

@joetsoi
Copy link
Contributor Author

joetsoi commented Apr 1, 2015

@wardi yeah it's all ready, it's just coveralls moaning.

@wardi wardi merged commit 9de6f5f into ckan:master Apr 1, 2015
@joetsoi joetsoi deleted the 1847-pylons-clean-middleware branch April 1, 2015 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Large memory leak in datastore API
4 participants