possible deadlock when using multiprocessing #17

Open
yuvalshi0 opened this issue Sep 13, 2023 · 2 comments

@yuvalshi0

Heyo,
We are using the Coralogix handler, version 2.0.5.
We recently implemented a feature that uses Python's multiprocessing; the feature spins up a Pool of processes. In production we do not terminate and restart the pool, but in our tests we do that a lot.

After implementing the feature, we saw several cases where our processes would hang intermittently. I was lucky enough to reproduce the bug locally and saw one process hanging. Using py-spy, I dumped the stacks of the processes and found the following:

The main process (waiting for the pool to die):

> py-spy dump --pid 330362

Thread 330362 (idle): "MainThread"
    poll (multiprocessing/popen_fork.py:27)
    wait (multiprocessing/popen_fork.py:43)
    join (multiprocessing/process.py:149)
    _terminate_pool (multiprocessing/pool.py:732)
    __call__ (multiprocessing/util.py:224)
    terminate (multiprocessing/pool.py:657)
    close (parallel.py:168)

The hanging process:

> py-spy dump --pid 331186

Thread 331186 (idle): "MainThread"
    send_request (coralogix/http.py:45)
    _send_bulk (coralogix/manager.py:239)
    flush (coralogix/manager.py:278)
    _handler (coralogix/manager.py:350)
    handler (coralogix/manager.py:341)
    ident (threading.py:1154)
    _shutdown (threading.py:1540)

The process seems to hang in the Coralogix HTTP code. Looking at the source, the specific line it hangs on appears to be cls._mutex.acquire(), so the hanging process looks deadlocked with the main process. For now we have disabled Coralogix in our CI.
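One way a hang like this could arise is sketched below. It assumes the mutex is a plain threading.Lock that another thread happens to hold at the moment the process forks, which is not confirmed anywhere in this thread. The names lock, hold_lock and lock_is_stuck_after_fork are hypothetical and only stand in for the SDK's internals. The sketch uses only the standard library and needs a platform with os.fork.

```python
import os
import threading

# Hypothetical stand-in for a module-level mutex like the one mentioned above.
lock = threading.Lock()


def hold_lock(started, release):
    """Background thread: take the lock and keep it until told to let go."""
    with lock:
        started.set()
        release.wait()


def lock_is_stuck_after_fork():
    """Fork while another thread holds the lock; report whether the child
    failed to acquire it."""
    started = threading.Event()
    release = threading.Event()
    holder = threading.Thread(target=hold_lock, args=(started, release))
    holder.start()
    started.wait()  # the lock is now held by the background thread

    pid = os.fork()
    if pid == 0:
        # Child: the lock was copied in its "held" state, but the thread
        # that held it does not exist here, so nobody will ever release it.
        got_it = lock.acquire(timeout=1.0)
        os._exit(0 if got_it else 1)

    # Parent: collect the child's exit code, then clean up the helper thread.
    _, status = os.waitpid(pid, 0)
    release.set()
    holder.join()
    return os.WEXITSTATUS(status) == 1


if __name__ == "__main__":
    print("child could not acquire the lock:", lock_is_stuck_after_fork())
```

Without the timeout, the child's acquire() would block forever, which would look much like the second py-spy dump above.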

@daidokoro

@yuvalshi0, thanks for raising this issue.

I'm having some difficulty replicating the issue in testing. Would you be able to provide code snippet(s) showing how the SDK is being used?

@yuvalshi0

yuvalshi0 commented Sep 26, 2023

Heyo @daidokoro,
Here is a minimal reproducible example:

from multiprocessing import Pool
import logging
from coralogix.handlers import CoralogixLogger

CORALOGIX_PRIVATE_KEY = "<PRIVATE_KEY_HERE>"

handler = CoralogixLogger(
    private_key=CORALOGIX_PRIVATE_KEY,
    app_name="dabug",
    subsystem="Subsystem",
)
logger = logging.Logger("dabug")
logger.addHandler(handler)

def some_func(i):
    logger.info(f"i is {i}")
    print(i)


def test_logger_issue():
    with Pool() as pool:
        pool.map(some_func, range(1000))

To run:

pytest <filename>

This causes pytest to hang forever.

Note that it might take a few runs to actually happen. I used pytest-repeat to run the test in a loop until the deadlock happens:

pytest <filename> --count=1000

Let me know if you need any more help.
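A possible stopgap for the tests, sketched under the assumption that the hang comes from state inherited through fork (not verified against this SDK): start the pool workers with the "spawn" start method, so each worker begins in a fresh interpreter instead of a copy of the parent's memory. The helper name get_spawn_context is hypothetical; multiprocessing.get_context and the context's Pool are standard library.

```python
import multiprocessing as mp


def get_spawn_context():
    """Return a multiprocessing context whose workers are started with
    'spawn' rather than 'fork'."""
    return mp.get_context("spawn")


# Usage in the test above (assumption: some_func stays importable at module
# level, because spawn pickles the function by reference):
#
#     def test_logger_issue():
#         with get_spawn_context().Pool() as pool:
#             pool.map(some_func, range(1000))
```

With spawn, each worker re-imports the test module, so the handler and logger are created again in every worker, and worker startup is slower than with fork.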
