HA went into a failure loop after running out of file descriptors #144802

Open
akx opened this issue May 13, 2025 · 4 comments
@akx
Contributor

akx commented May 13, 2025

The problem

Last night, the better half complained that the lights in the living room didn't work, and I noticed the NUC hosting our HA was spinning its fans like no tomorrow.

Looking at journalctl logs for HA, it had apparently run out of file descriptors, and absolutely couldn't deal:

May 12 19:09:57 saga hass[1303189]: 2025-05-12 19:09:57.800 INFO (MainThread) [homeassistant.components.number] Setting up wiz.number
May 12 19:09:57 saga hass[1303189]: 2025-05-12 19:09:57.802 INFO (MainThread) [homeassistant.components.sensor] Setting up wiz.sensor
May 12 19:09:57 saga hass[1303189]: 2025-05-12 19:09:57.802 INFO (MainThread) [homeassistant.components.switch] Setting up wiz.switch
(nb.: the above are likely unrelated...)
May 12 19:13:26 saga hass[1303189]: 2025-05-12 19:13:26.578 ERROR (MainThread) [homeassistant] Error doing job: socket.accept() out of system resource (None)
May 12 19:13:26 saga hass[1303189]: Traceback (most recent call last):
May 12 19:13:26 saga hass[1303189]:   File "/home/homeassistant/.local/share/uv/python/cpython-3.13.1-linux-x86_64-gnu/lib/python3.13/asyncio/selector_events.py", line 178, in _accept_connection
May 12 19:13:26 saga hass[1303189]:   File "/home/homeassistant/.local/share/uv/python/cpython-3.13.1-linux-x86_64-gnu/lib/python3.13/socket.py", line 295, in accept
May 12 19:13:26 saga hass[1303189]: OSError: [Errno 24] Too many open files
May 12 19:13:26 saga hass[1303189]: 2025-05-12 19:13:26.579 ERROR (MainThread) [homeassistant] Error doing job: socket.accept() out of system resource (None)
May 12 19:13:26 saga hass[1303189]: Traceback (most recent call last):
May 12 19:13:26 saga hass[1303189]:   File "/home/homeassistant/.local/share/uv/python/cpython-3.13.1-linux-x86_64-gnu/lib/python3.13/asyncio/selector_events.py", line 178, in _accept_connection
May 12 19:13:26 saga hass[1303189]:   File "/home/homeassistant/.local/share/uv/python/cpython-3.13.1-linux-x86_64-gnu/lib/python3.13/socket.py", line 295, in accept
May 12 19:13:26 saga hass[1303189]: OSError: [Errno 24] Too many open files
May 12 19:13:26 saga hass[1303189]: 2025-05-12 19:13:26.580 ERROR (MainThread) [homeassistant] Error doing job: socket.accept() out of system resource (None)
May 12 19:13:26 saga hass[1303189]: Traceback (most recent call last):
May 12 19:13:26 saga hass[1303189]:   File "/home/homeassistant/.local/share/uv/python/cpython-3.13.1-linux-x86_64-gnu/lib/python3.13/asyncio/selector_events.py", line 178, in _accept_connection
May 12 19:13:26 saga hass[1303189]:   File "/home/homeassistant/.local/share/uv/python/cpython-3.13.1-linux-x86_64-gnu/lib/python3.13/socket.py", line 295, in accept
https://github.com/home-assistant/core/issues/58711
...

This repeated ad infinitum, up to 4,000 times per second (!!!), until I ran systemctl restart homeassistant-ha.

Running out of fds is likely an unrecoverable error, so it could be a good idea for HA to crash and get restarted if it notices this?
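To illustrate the "crash and get restarted" idea, here is a minimal sketch of an asyncio exception handler that exits the process when it sees EMFILE or ENFILE. This is not how Home Assistant handles it; the names make_exception_handler, is_fd_exhaustion and FD_EXHAUSTED_EXIT_CODE are hypothetical, and it assumes a service manager (systemd in this setup) is configured to restart the process on a non-zero exit.

```python
import asyncio
import errno
import os
import sys

# Hypothetical exit code; any non-zero value would do for a service
# manager that restarts on failure.
FD_EXHAUSTED_EXIT_CODE = 75


def is_fd_exhaustion(exc):
    """Return True if exc is an OSError caused by running out of descriptors."""
    return isinstance(exc, OSError) and exc.errno in (errno.EMFILE, errno.ENFILE)


def make_exception_handler(on_fatal=os._exit):
    """Build an asyncio exception handler that bails out on fd exhaustion."""

    def handler(loop, context):
        if is_fd_exhaustion(context.get("exception")):
            # Write straight to stderr instead of going through logging
            # handlers, which may themselves need a descriptor.
            sys.stderr.write("Out of file descriptors, exiting for restart\n")
            on_fatal(FD_EXHAUSTED_EXIT_CODE)
            return
        # Everything else keeps the default asyncio behaviour.
        loop.default_exception_handler(context)

    return handler
```

It would be installed with loop.set_exception_handler(make_exception_handler()). Exiting via os._exit skips cleanup on purpose, since an orderly shutdown may itself need descriptors that are no longer available.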

FWIW, I hadn't noticed this before 2025.4.4, but that doesn't necessarily mean it's a regression compared to the previous version, 2025.4.0.

core-2025.4.4 running on Python 3.13.1 (via uv) on Linux x86_64.

What version of Home Assistant Core has the issue?

core-2025.4.4

What was the last working version of Home Assistant Core?

core-2025.4

What type of installation are you running?

Home Assistant Core

Related issues/prior art

@joaopms

joaopms commented May 13, 2025

I'm experiencing the same issue on 2025.5.1. Logs, even in debug mode (set in configuration.yaml), don't have anything that stands out.

In safe mode, the number of open file descriptors slowly but surely goes up, in something of a repeating pattern:

$ lsof -p <home assistant pid>
COMMAND   PID USER   FD      TYPE             DEVICE SIZE/OFF     NODE NAME
(...)
python3 57504 root  692r  a_inode               0,15        0     1075 inotify
python3 57504 root  693u  a_inode               0,15        0     1075 [eventpoll:694,696]
python3 57504 root  694r     FIFO               0,14      0t0 28339185 pipe
python3 57504 root  695w     FIFO               0,14      0t0 28339185 pipe
python3 57504 root  696r  a_inode               0,15        0     1075 inotify
python3 57504 root  697r      CHR                1,9      0t0       10 /dev/urandom
python3 57504 root  698r     FIFO               0,14      0t0 28349860 pipe
python3 57504 root  699w     FIFO               0,14      0t0 28349860 pipe
python3 57504 root  700r  a_inode               0,15        0     1075 inotify
python3 57504 root  701u  a_inode               0,15        0     1075 [eventpoll:703,705]
python3 57504 root  703r     FIFO               0,14      0t0 28359723 pipe
python3 57504 root  704w     FIFO               0,14      0t0 28359723 pipe
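For tracking a pattern like the one above over time, a rough Linux-only sketch that counts a process's open descriptors by category from /proc/<pid>/fd may be easier to compare than raw lsof output. The helper names classify_fd_target and summarize_fds are made up for this example, and the categories are a simplification of what lsof reports.

```python
import os
from collections import Counter


def classify_fd_target(target):
    """Collapse a /proc/<pid>/fd symlink target into a coarse category."""
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("anon_inode:"):
        # e.g. "anon_inode:inotify" or "anon_inode:[eventpoll]"
        return target.split(":", 1)[1].strip("[]")
    return "file"


def summarize_fds(pid="self"):
    """Count the open descriptors of a process by category (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # the descriptor was closed while we were iterating
        counts[classify_fd_target(target)] += 1
    return counts
```

Calling summarize_fds(57504) every few minutes and diffing the counters should show which category is growing; in the listing above that would be the inotify, eventpoll and pipe entries.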

Granted, my file descriptor limit is set to 1024 and I could just raise it, but that would probably only reduce the frequency of the issue.
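For reference, the limit in question can be inspected and raised from inside the process with the standard resource module. This is only a sketch of that workaround (Unix only; fd_limits and raise_soft_fd_limit are hypothetical helper names), and as noted it would merely postpone the failure while the leak is still there.

```python
import resource


def fd_limits():
    """Return the (soft, hard) limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_fd_limit(target):
    """Raise the soft limit towards target, never above the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = fd_limits()
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft  # already unlimited or high enough, nothing to raise
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target
```

For a service started by systemd, the same limit can instead be set with the LimitNOFILE= directive in the unit file.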

Relevant system information
Version: core-2025.5.1
Installation type: Home Assistant Core
Development: false
Supervisor: false
Docker: false
User: root
Virtual environment: true
Python version: 3.13.3
Operating system family: Linux
Operating system version: 6.8.12-8-pve
CPU architecture: x86_64
Timezone: Europe/Lisbon
Configuration directory: /root/.homeassistant

@joaopms

joaopms commented May 14, 2025

Okay, I figured it out! Turns out the problem is with the Octoprint integration, as seen in #144457 (comment).

Furthermore, I think the attached PR is a good band-aid to prevent bigger issues when leaks like this happen.

@akx
Contributor Author

akx commented May 14, 2025

Ah, yeah, I have Octoprint enabled too, so that could well be it. Good find!

EDIT: this just recurred today – disabled the Octoprint integration now...

@bdraco
Member

bdraco commented May 15, 2025

Opened aio-libs/aiohttp#10847 to potentially make this safer against leak mistakes. Not sure if we will do that but good to track.
