
ResourceWarning: unclosed socket error #106

Closed
slhowardESR opened this issue Jul 25, 2022 · 5 comments

@slhowardESR

Hi JP,

I am doing some testing in preparation for the large run, and sometimes I get this problem, which kills the program.

```
sys:1: ResourceWarning: unclosed socket <zmq.Socket(zmq.PUSH) at 0x19391b4c280>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
sys:1: ResourceWarning: unclosed socket <zmq.Socket(zmq.PUSH) at 0x19391b4c400>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
sys:1: ResourceWarning: unclosed socket <zmq.Socket(zmq.PUSH) at 0x195403fc640>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
sys:1: ResourceWarning: unclosed socket <zmq.Socket(zmq.PUSH) at 0x19604735dc0>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Traceback (most recent call last):

  File D:\Jupyter\sliderulework\Spyder_SR\SR_by_RGT_5files_print_status.py:108 in <module>
    main()

  File D:\Jupyter\sliderulework\Spyder_SR\SR_by_RGT_5files_print_status.py:81 in main
    gdf = icesat2.atl06p(parmsyp, version=args.release,

  File d:\jupyter\sliderule-python\sliderule\icesat2.py:881 in atl06p
    return __parallelize(callback, __atl06, parm, resources, asset)

  File d:\jupyter\sliderule-python\sliderule\icesat2.py:597 in __parallelize
    result, resource = future.result()

  File ~\anaconda3\envs\sliderule\lib\concurrent\futures\_base.py:437 in result
    return self.__get_result()

  File ~\anaconda3\envs\sliderule\lib\concurrent\futures\_base.py:389 in __get_result
    raise self._exception

  File ~\anaconda3\envs\sliderule\lib\concurrent\futures\thread.py:57 in run
    result = self.fn(*self.args, **self.kwargs)

  File d:\jupyter\sliderule-python\sliderule\icesat2.py:459 in __atl06
    rsps = sliderule.source("atl06", rqst, stream=True)

  File d:\jupyter\sliderule-python\sliderule\sliderule.py:425 in source
    __clrserv(serv, stream)

  File d:\jupyter\sliderule-python\sliderule\sliderule.py:184 in __clrserv
    server_table[serv]["pending"] -= 1

KeyError: 'http://34.212.131.26'
```

I am not sure what is causing this. I can send you my code. I am basically trying to do SR-YAPC processing for individual RGTs in regions 10 and 12, and I am running the two regions as separate processes at the same time. Sometimes it works and sometimes it crashes.

Let me know if you need more info.

@jpswinski
Member

jpswinski commented Jul 25, 2022

@slhowardESR I've not seen this before. I wonder if the client is spawning too many threads and making too many concurrent connections. Can you send me the code you are using? If you want, you can send it to me via Slack so the code is kept private. I will run things on my side and see if I can recreate and diagnose the problem.

jpswinski added a commit that referenced this issue Jul 29, 2022
…concurrent threads to be removed, only the first wins and the others just pass
@jpswinski
Member

@slhowardESR after some investigation, there appear to be a number of different things happening in your processing runs:

  • The `ResourceWarning: unclosed socket <zmq...` is likely not a problem. It is a warning that the underlying code is not closing a socket it is no longer using. In this case it is coming from ZeroMQ, most likely via your Spyder environment.

  • The `KeyError` crash is due to a bug in the SlideRule Python client that is triggered by a race condition in the code. I fixed the bug and will be creating a new release of the code today. You can do a `git pull` on main or wait for the conda update. (A minimal illustration of the race appears after this list.)

  • The underlying problem, though, is that server nodes are going down because they run out of memory while processing your requests. From my testing, it looks like there are three granules (maybe more) in your test runs that take so much memory to process that the backend nodes handling them go down. There is a short term fix for this that will work most of the time, but the real long term fix is to rework the backend server code to be more memory efficient.

    • The short term fix is to add the line `sliderule.set_max_pending(1)` right after your call to `icesat2.init...`; you will also have to add `from sliderule import sliderule` to your imports at the top (see the sketch after this list). This call makes each backend server node process only one granule at a time; the default is three. Since YAPC takes almost 100% of the CPU, processing one granule at a time did not seem to affect the overall processing time that much.
    • The long term fix is more involved and so I have created another GitHub Issue to track its progress: Processing runs on full granules consume all resources (in some cases) sliderule#117
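
For readers hitting the same `KeyError`, here is a minimal, generic illustration of the race described in the second bullet. This is only a sketch, not the actual sliderule.py code; the names merely mirror the traceback above: several worker threads share a table of servers, one thread removes a failed server's entry, and another thread that still holds the key crashes when it tries to decrement the pending count.

```python
import threading

# Generic sketch of the race -- not the real client code; names mirror the traceback.
server_table = {"http://x.x.x.x": {"pending": 2}}
table_lock = threading.Lock()

def clrserv_unsafe(serv):
    # If another thread has already deleted server_table[serv],
    # this line raises KeyError, as in the traceback above.
    server_table[serv]["pending"] -= 1

def clrserv_guarded(serv):
    # One way to get "only the first wins and the others just pass":
    # lock the table and tolerate a missing entry.
    with table_lock:
        entry = server_table.get(serv)
        if entry is not None:
            entry["pending"] -= 1
```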
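
And a sketch of where the short term fix goes in a typical script. Only `sliderule.set_max_pending(1)` and the extra import are the fix itself; the server URL and request parameters below are placeholders standing in for whatever the existing workflow already uses.

```python
from sliderule import sliderule, icesat2

# Placeholder URL -- keep whatever you already pass to icesat2.init
icesat2.init("icesat2sliderule.org")

# Short term fix: each backend node works on one granule at a time (default is three)
sliderule.set_max_pending(1)

# Placeholder ATL06-SR parameters; keep your existing YAPC settings here
parms = {"srt": icesat2.SRT_LAND, "cnf": icesat2.CNF_SURFACE_HIGH}
gdf = icesat2.atl06p(parms)
```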

@jpswinski
Member

[snapshot attachment: susan_run_3_v_1]

@jpswinski
Member

The above snapshot shows a run with max pending set to 3, and then again with max pending set to 1. When set to 1, no servers bounce, though available memory still dips pretty low in places.

@jpswinski
Member

The temporary fix of setting the max pending to 1 seems to have worked well. All future development on this issue will be tracked under SlideRuleEarth/sliderule#117.
