Memory leak #337

Closed
kolserdav opened this issue Apr 27, 2023 · 4 comments

kolserdav commented Apr 27, 2023

I am using a simple Django server for my application: https://github.com/kolserdav/ana/tree/1f9d445926f9b565010667dbd0daa21fea6a1080/packages/translate

Django urls.py file: https://github.com/kolserdav/ana/blob/1f9d445926f9b565010667dbd0daa21fea6a1080/packages/translate/translate/urls.py#L1-L27

Django translate handler file: https://github.com/kolserdav/ana/blob/1f9d445926f9b565010667dbd0daa21fea6a1080/packages/translate/translate/api/translate.py#L1-L16

My Translate class: https://github.com/kolserdav/ana/blob/1f9d445926f9b565010667dbd0daa21fea6a1080/packages/translate/translate/core/translate.py#L1-L41

After I start the server, I repeat the same request with curl:

curl -X POST -d '{"q": "test", "source":"en", "target":"ru"}' -H 'Content-Type: application/json' http://127.0.0.1:8000/translate

In another window, I open top with a filter by name python:

top | grep python

After a certain number of repeated requests (how many depends on server resources), the "python" process consumes a significant amount of memory. This memory is never freed, even after the requests stop; the consumption stays at the same level until the process is restarted:

  77899 kol       20   0 7468592   3.0g 174108 S   1.3  19.0   0:37.09 python
  77899 kol       20   0 7697984   3.1g 174108 S  45.5  20.1   0:38.46 python
  77899 kol       20   0 8418944   3.7g 174108 S  60.8  23.8   0:40.29 python
  77899 kol       20   0 7730640   3.4g 174108 S  38.5  21.9   0:41.45 python
  77899 kol       20   0 8091136   3.6g 174108 S  62.5  23.3   0:43.33 python
  77899 kol       20   0 8091136   3.6g 174108 S   3.3  23.3   0:43.43 python
  77899 kol       20   0 8091136   3.6g 174108 S   3.0  23.3   0:43.52 python
  77899 kol       20   0 8091136   3.6g 174108 S   3.3  23.3   0:43.62 python
  77899 kol       20   0 8091136   3.6g 174108 S   1.3  23.3   0:43.66 python
  77899 kol       20   0 8091136   3.6g 174108 S   1.7  23.3   0:43.71 python
  77899 kol       20   0 8091136   3.6g 174108 S   2.3  23.3   0:43.78 python
  77899 kol       20   0 8091136   3.6g 174108 S   3.0  23.3   0:43.87 python
  77899 kol       20   0 8091136   3.6g 174108 S   2.3  23.3   0:43.97 python
  77899 kol       20   0 8091136   3.6g 174108 S   2.0  23.3   0:44.03 python
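
For anyone who wants to log the resident memory of the server process instead of watching top, here is a minimal sketch. It assumes Linux (it reads /proc/<pid>/status), and the function names `parse_vm_rss_kib`, `read_rss_kib` and `watch` are made up for this example; they are not part of argostranslate or Django.

```python
import re
import time
from typing import Optional

# Matches a line such as "VmRSS:      174108 kB" in /proc/<pid>/status.
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB", re.MULTILINE)


def parse_vm_rss_kib(status_text: str) -> Optional[int]:
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    match = _VMRSS.search(status_text)
    return int(match.group(1)) if match else None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the resident set size of a process (Linux only, uses /proc)."""
    try:
        with open(f"/proc/{pid}/status", encoding="utf-8", errors="replace") as fh:
            return parse_vm_rss_kib(fh.read())
    except OSError:
        # No such process, no /proc on this platform, or no permission.
        return None


def watch(pid: int, samples: int = 10, interval_s: float = 1.0) -> None:
    """Print one timestamped RSS sample per interval."""
    for _ in range(samples):
        print(f"{time.strftime('%H:%M:%S')} pid={pid} rss_kib={read_rss_kib(pid)}")
        time.sleep(interval_s)
```

Calling `watch(77899)` with the PID of the server process prints one sample per second. If the value keeps its peak after the requests stop, that matches the behaviour in the top output above.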

If you continue to make requests, the process soon crashes with status 247.

I will be grateful for any help.

@kolserdav

I was able to figure out that at this check

if self.translator is None:

self.translator is always None on every translation request. This means that every translation request creates a new PackageTranslation instance, which can leak memory if ctranslate2 keeps process-bound (global) data.
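
One way to avoid creating a new object on every request is to build it once per process and reuse it. The sketch below only illustrates that pattern. `FakeTranslator`, `get_translator` and `handle_request` are hypothetical names for this example, and `FakeTranslator` is a stand-in that counts its own instances; it is not the argostranslate or ctranslate2 API.

```python
from functools import lru_cache


class FakeTranslator:
    """Hypothetical stand-in for an expensive, natively backed translator."""

    instances = 0  # counts how many times the expensive object was built

    def __init__(self, source: str, target: str) -> None:
        FakeTranslator.instances += 1
        self.source = source
        self.target = target

    def translate(self, text: str) -> str:
        # Placeholder output; a real translator would run the model here.
        return f"[{self.source}->{self.target}] {text}"


@lru_cache(maxsize=None)
def get_translator(source: str, target: str) -> FakeTranslator:
    """Build the translator on first use, then reuse it for the same pair."""
    return FakeTranslator(source, target)


def handle_request(q: str, source: str, target: str) -> str:
    """What a request handler would call instead of constructing a new object."""
    return get_translator(source, target).translate(q)
```

With this in place, repeated calls to `handle_request("test", "en", "ru")` leave `FakeTranslator.instances` at 1. Note that `lru_cache` does not guarantee a single construction if several threads make the first call at the same moment, so a threaded server may want a lock around the first call.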

@kolserdav

The Translate class is here: https://github.com/kolserdav/ana/blob/1f9d445926f9b565010667dbd0daa21fea6a1080/packages/translate/translate/core/translate.py#L1-L41 (the earlier link was a mistake)


kolserdav commented May 3, 2023

I was still not able to completely eliminate the memory leak in argostranslate. A small part of the leak was due to Django, and switching to Bottle avoided it.

However, if you rely on tokenization from argostranslate, a small memory leak remains: https://github.com/kolserdav/ana/blob/dd43b5a1741ca68eb7d898b8ea6e835c051b6d10/packages/bottle/main.py#L51-L65

For now I have settled on self-made tokenization and this method of translation: https://github.com/kolserdav/ana/blob/6d3329403a38c67ff5f77336be937f9b4b58e2d7/packages/bottle/core/translate.py#L45-L94

I don't know how it will behave in the future, but in my tests so far this method completely eliminates the memory leaks.

To observe the leak, I ran similar query loops in multiple windows:

for i in $(seq 1 10000) ; do curl -X POST -d '{"q": "test", "source":"en", "target":"fi"}' -H 'Content-Type: application/json' http://127.0.0.1:8000/translate ; done
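
The same loop can be driven from Python, which may be handier when the request count or payload needs to vary. This is a rough sketch using only the standard library. The URL and payload are copied from the curl command above, while `build_request` and `run` are names invented for this example.

```python
import json
import urllib.request

URL = "http://127.0.0.1:8000/translate"  # same endpoint as the curl loop


def build_request(q: str, source: str, target: str, url: str = URL) -> urllib.request.Request:
    """Build the same JSON POST request that the curl command sends."""
    body = json.dumps({"q": q, "source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run(count: int) -> int:
    """Send `count` sequential requests and return how many answered with 200."""
    ok = 0
    for _ in range(count):
        with urllib.request.urlopen(build_request("test", "en", "fi"), timeout=30) as resp:
            resp.read()  # drain the body so the connection is closed cleanly
            if resp.status == 200:
                ok += 1
    return ok
```

Running `run(10000)` in several terminals at once approximates the multi-window setup described above. `urlopen` raises an exception if the server is unreachable or returns an error status, so a crash of the server process shows up immediately.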


Nick- commented Mar 27, 2024

Please re-open this. It is still an issue.
