sr_poll - memory leak problem #47

Open
petersilva opened this issue Jun 2, 2018 · 13 comments
Labels
bug Something isn't working work-around a work-around is provided, mitigating the issue.

Comments

@petersilva
Contributor

sr_poll process memory keeps growing with time.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12419 sarra 20 0 19.1g 14g 2268 S 0 31.1 614:07.82 sr_poll
27294 sarra 20 0 6420m 6.1g 2364 S 4 13.0 240:45.76 sr_poll
12415 sarra 20 0 155m 57m 1940 S 0 0.1 39:35.04 sr_poll

was: https://sourceforge.net/p/metpx/bugs/33/

@petersilva
Contributor Author

Working around it with daily cron jobs that restart the process.

@petersilva
Contributor Author

How about I try to use http://www.lshift.net/blog/2008/11/14/tracing-python-memory-leaks/ (if I can get it to work) to figure this out? No guarantees, and I will be learning how to set it up along the way.

@petersilva
Contributor Author

Does sr_poll.py ever run on its own? I've been having some issues running it on its own.
For one, the config doc says "filename" is an option, but I found that if it's not set, nothing seems to get posted to the broker. And if VIP and interface aren't set, it doesn't seem to post either.

And I'm now seeing this error:

2016-06-21 20:07:32,055 [ERROR] Type: <class 'AttributeError'>, Value: 'sr_poll' object has no attribute 'info', ...
I've been putting statements in to see what's what.
As far as I know, I could be doing something wrong. But this error seems to prevent me from getting past the first entry in the list of files to be transferred. I can see how this would look like a leak if the array is never stepped through and cleared.

@petersilva
Contributor Author

And it looks like I was using an older copy of Sarra on my machine; the ERROR above was fixed long ago.

But could the memory leak be because the internal array of files is never decreased in size as long as nothing is deleted on the site being polled?
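A minimal sketch of the distinction behind that question, assuming the poller remembers previously seen files in a dictionary keyed by filename with `(size, mtime)` values (the function name `find_new` and that shape are hypothetical, not sr_poll's actual data structure). A record that is only ever added to grows with every file ever seen; one rebuilt from each new listing stays bounded by what is currently on the server.

```python
def find_new(previous, listing):
    """Return (new_or_changed_files, updated_state).

    previous: dict mapping filename -> (size, mtime) from the last poll.
    listing:  dict of the same shape for the current remote listing.

    The returned state is built only from `listing`, so files that have
    disappeared from the server are dropped instead of accumulating.
    """
    new_files = [name for name, attrs in listing.items()
                 if previous.get(name) != attrs]
    return new_files, dict(listing)


if __name__ == "__main__":
    state = {}
    new, state = find_new(state, {"a.txt": (10, 1), "b.txt": (20, 1)})
    print(new)            # both files are new on the first pass
    new, state = find_new(state, {"b.txt": (20, 1), "c.txt": (5, 2)})
    print(new, sorted(state))
```

If nothing is ever deleted upstream, even the bounded version grows with the remote directory, which is expected behaviour rather than a leak.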

@petersilva
Contributor Author

Next week, 2.17.03a5 will be installed on pxatx. Will check whether the problem is still there.

@petersilva
Contributor Author

It was installed, and the problem is still there. On the other hand, the problem is certainly not
in sr_poll itself, because the mirroring case for HPC hall-A-B has no memory problem at all,
so it is definitely somewhere in the NWS plugin.

Don't have a good approach to this.
Some ideas:

https://mflerackers.wordpress.com/2012/04/12/fixing-and-avoiding-memory-leaks-in-python/
http://pclib.github.io/safari/program/python-cookbook/Text/ch08s23.html

Thinking that the self.parent = parent pattern is likely a good candidate for weakref,
but how to measure? It's kind of strange to just apply changes blindly.
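A sketch of what that change could look like, using hypothetical `Poller` and `Plugin` classes as stand-ins for sr_poll and the NWS plugin: the plugin holds a `weakref.ref` to its parent instead of a strong reference, so the two objects no longer form a reference cycle and the parent is freed as soon as nothing else refers to it. CPython's cyclic garbage collector can usually reclaim a plain parent/child cycle on its own, so this is worth measuring before and after rather than assuming it is the fix.

```python
import weakref


class Plugin:
    """Hypothetical plugin that needs to call back into its parent."""

    def __init__(self, parent):
        # Weak reference instead of `self.parent = parent`, so the plugin
        # does not keep the parent alive.
        self._parent_ref = weakref.ref(parent)

    @property
    def parent(self):
        parent = self._parent_ref()  # returns None once the parent is gone
        if parent is None:
            raise RuntimeError("parent has been garbage-collected")
        return parent


class Poller:
    """Hypothetical stand-in for the sr_poll object."""

    def __init__(self):
        self.name = "poll"
        self.plugin = Plugin(self)


if __name__ == "__main__":
    p = Poller()
    print(p.plugin.parent.name)
```

Existing code that reads `self.parent` keeps working because `parent` is exposed as a property.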

@petersilva
Contributor Author

@petersilva
Contributor Author

from Jun:

I’m not sure of your conclusion.
Here’s my reason:

  • Other polls like NDE have the memory leak problem too, but not as seriously as NWS (e.g. currently NDE = 0.5% and NWS = 3.5%).
  • NDE has far fewer files than NWS.
  • The policy poll doesn't keep a saved file list like NWS does (e.g. -rw-rw-rw- 1 sarra px 27882 Jun 11 15:38 ls_2canada_bulletins_normal_priority).

Is it possible that sr_poll doesn't properly close the file descriptor of the ls_ file?

Very good point. Don't know. There is still a bit of logic inside sr_poll (the list comparison) that could easily be the source.
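On the file-descriptor question: a listing file that is opened but never explicitly closed stays open until the garbage collector happens to reclaim it. A sketch of saving and reloading such a listing with `with` blocks, which close the file even if an exception is raised. The one-filename-per-line format and the names `save_listing`/`load_listing`/`ls_example` are assumptions for illustration, not sr_poll's actual file format or functions.

```python
import os
import tempfile


def save_listing(path, names):
    """Write one filename per line; `with` closes the file even on error."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        for name in sorted(names):
            f.write(name + "\n")
    # Atomic replace, so a crash mid-write never leaves a truncated listing.
    os.replace(tmp, path)


def load_listing(path):
    """Return the saved names as a set (empty if no listing exists yet)."""
    try:
        with open(path, encoding="utf-8") as f:
            return {line.rstrip("\n") for line in f if line.strip()}
    except FileNotFoundError:
        return set()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ls_example")  # hypothetical listing name
        save_listing(path, {"b.txt", "a.txt"})
        print(sorted(load_listing(path)))
```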

@petersilva petersilva added the bug Something isn't working label Jun 2, 2018
@petersilva petersilva added the work-around a work-around is provided, mitigating the issue. label Jun 9, 2018
@petersilva
Contributor Author

With the latest version, hb_memory checks the process's memory usage and, once it exceeds a threshold, restarts the process to avoid crashing the system or taking too much memory.
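The threshold-and-restart idea can be sketched with the standard `resource` module (Unix only). This is not hb_memory's actual code; `peak_rss_bytes`, `should_restart` and the 2 GiB figure are hypothetical. Note that `ru_maxrss` reports the peak resident set size (kilobytes on Linux, bytes on macOS), and a peak never goes down, which is one reason restarting is the practical remedy.

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes; macOS reports bytes.
    return rss if sys.platform == "darwin" else rss * 1024


def should_restart(threshold_bytes):
    """True once peak memory use has crossed the threshold."""
    return peak_rss_bytes() > threshold_bytes


if __name__ == "__main__":
    # In a hypothetical main loop, after each poll, one could re-exec the
    # process in place when the threshold is crossed, e.g.:
    #   if should_restart(2 * 1024**3):
    #       os.execv(sys.executable, [sys.executable] + sys.argv)
    print(peak_rss_bytes(), should_restart(2 * 1024**3))
```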

@DXOS3

DXOS3 commented Mar 19, 2021

Is this hb_memory threshold still maintained in recent versions? My team and I are attempting to use Sarracenia and are currently trying to get it working through our network to retrieve data from dd.weather.gc.ca using the AMQP service. When we use the Windows version, installed in the default AppData directory, we have trouble downloading any files (even with one of the example configurations). I know there is a broker connection issue here (we still have to figure out the connection through our enterprise network), but Python instances keep building up and eventually crash the computer unless stopped in time. I'm not sure whether this is intended behaviour, or even related to sr_poll directly (please let me know if there is a more relevant issue). If it's too late, this happens:
[image]

@petersilva
Contributor Author

the purpose of sr_poll is to create announcements (MQP messages) for files from servers that do not produce announcements already, so other sarracenia components can copy the files through the network. Any time a file is written to dd.weather.gc.ca, an announcement is already created, so you don't need sr_poll to download any files from there. You should be able to get by with sr_subscribe.

This issue you are raising is different from the sr_poll one... probably better to create a new issue. We can work through it anyways... How did you install Sarracenia?

@petersilva
Contributor Author

For the process management aspect (stop and status not working), I found a bug.
If you are using packages, then here: http://hpfx.collab.science.gc.ca/~pas037/Sarracenia_Releases/Sarracenia_2.21.03p1.exe
if you are using git, then just pull main or v2_stable branches.

@DXOS3

DXOS3 commented Mar 22, 2021

I originally installed Sarracenia through the pre-built Windows env installer, version 2.20.05, into the default AppData user directory. I've also been wondering whether our anti-virus data exchange layer (from McAfee), or not having certain data ports open, is causing our difficulties in receiving files.
