Can't Catch Error in Dramatiq worker and crashing the MainThread #227

Closed
7 tasks done
jrusso1020 opened this issue Sep 16, 2019 · 5 comments

Comments

@jrusso1020
Contributor

@jrusso1020 jrusso1020 commented Sep 16, 2019

Issues

GitHub issues are for bugs. If you have questions, please ask them on the discussion board.

Checklist

  • Does your title concisely summarize the problem?
  • Did you include a minimal, reproducible example?
  • What OS are you using?
  • What version of Dramatiq are you using?
  • What did you do?
  • What did you expect would happen?
  • What happened?

What OS are you using?

macOS 10.14.5

What version of Dramatiq are you using?

1.6.1

What did you do?

While running a Dramatiq actor that queries an Elasticsearch instance on AWS, a critical error occurs and crashes the Dramatiq MainThread. I tried catching the error in the worker as well as around the code I believe is crashing, but the exception is never caught, so I can't tell what exactly is happening. I have checked that the environment variables are set correctly.

The code works when I query a local Elasticsearch instance on my computer, but it fails when querying an Elasticsearch cluster on AWS. I am able to run the same code in an interactive Python session, though; it only fails in the Dramatiq worker.

import logging

import dramatiq

# Assumed setup: the original snippet used `logger` without defining it.
logger = logging.getLogger(__name__)


@dramatiq.actor
def worker():
    import os
    from elasticsearch_dsl import Search
    from elasticsearch import Elasticsearch, RequestsHttpConnection
    from requests_aws4auth import AWS4Auth
    index = os.getenv('ELASTIC_SEARCH_INDEX')
    host = os.getenv('ELASTIC_SEARCH_HOST')
    # Sign requests for the AWS Elasticsearch service ('es').
    awsauth = AWS4Auth(os.getenv('AWS_ACCESS_KEY'), os.getenv('AWS_SECRET_KEY'), os.getenv('AWS_REGION'), 'es')
    es = Elasticsearch(
        hosts=[{'host': host, 'port': 443}],
        http_auth=awsauth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection
    )
    logger.info('got here')
    try:
        # The worker process dies inside es.info(); the except clause below never runs.
        logger.info("es.info {}".format(es.info()))
    except Exception as err:
        logger.info(err)

What did you expect would happen?

I would expect to be able to catch the exception, or at least have Dramatiq surface it, so that I can see what is happening and debug it. Seeing which exception or error occurs would help me work out what is going wrong when accessing the AWS Elasticsearch cluster.

What happened?

I received the following critical log message, after which the MainThread shut down, killing all Dramatiq workers:

13:25:32 worker.1   | 2019-09-16 13:25:32,309 [MainThread  ] [CRITI]  Worker with PID 34475 exited unexpectedly (code -11). Shutting down...
@Bogdanp
Owner

@Bogdanp Bogdanp commented Sep 17, 2019

code -11

That's a segmentation fault. Either your Python interpreter has a bug or some C module that you or the code you're using depends on does. I'm afraid there's not a ton I can do about this.

Things I would try:

  • upgrade the Python interpreter
  • audit dependencies and look for which C modules might be causing this
  • use strace or attach gdb to the Python process to try to determine where the error occurs and, possibly, report it on the Python bug tracker.
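
To illustrate why the try/except in the actor never fires, here is a minimal sketch (not Dramatiq code) that reproduces the symptom on a POSIX system. A child process sends itself SIGSEGV inside a try block; the parent then sees a negative return code, just like the "code -11" in the log. The names CHILD_SOURCE, run_crashing_child and describe_exit_code are made up for this example.

```python
import signal
import subprocess
import sys

# Source of a child program that kills itself with SIGSEGV inside try/except.
# A signal is not a Python exception, so the except clause is never reached.
CHILD_SOURCE = """
import os, signal
try:
    os.kill(os.getpid(), signal.SIGSEGV)
except Exception as err:
    print("caught", err)
"""


def run_crashing_child():
    """Run the child program and return the CompletedProcess."""
    return subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        capture_output=True,
        text=True,
    )


def describe_exit_code(code):
    """Translate a subprocess return code into a readable description.

    On POSIX, a negative return code -N means the process was
    terminated by signal N.
    """
    if code < 0:
        return signal.Signals(-code).name
    return "exit status {}".format(code)


if __name__ == "__main__":
    result = run_crashing_child()
    print(result.returncode, describe_exit_code(result.returncode))
```

Since the process is terminated by the operating system, nothing inside it gets a chance to log the error; only the parent can report what happened.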

@Bogdanp Bogdanp closed this Sep 17, 2019
@jrusso1020
Contributor Author

@jrusso1020 jrusso1020 commented Sep 17, 2019

@Bogdanp thank you, that is useful.

@jrusso1020
Contributor Author

@jrusso1020 jrusso1020 commented Sep 17, 2019

@Bogdanp in case you're curious, this is related to a known issue on macOS.

Issue 30385 describes it as a long-standing problem with Python applications that fork on macOS and then end up calling certain system frameworks that, under the covers, use the system libdispatch, which is not fork-safe.

https://bugs.python.org/issue30385
https://bugs.python.org/issue13829

@Bogdanp
Owner

@Bogdanp Bogdanp commented Sep 17, 2019

Interesting! Thanks. I develop dramatiq on a mac, but I've never run into this.

Since forking is the problem, it sounds like you might be able to work around it by modifying dramatiq/cli.py to set the multiprocessing "start method" to spawn, e.g.:

multiprocessing.set_start_method("spawn")

I plan to eventually let users either outright specify this as a flag or offer some kind of hook so they can run arbitrary code before the worker processes are created (enabling them to call that function w/o needing to modify dramatiq), but it'll be a while before I get to it.
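
For readers unfamiliar with start methods, the following is a minimal standalone sketch using only the standard library; it is not Dramatiq's code. It uses multiprocessing.get_context("spawn"), which selects spawn for one context instead of changing the process-wide default the way set_start_method does. The helper names spawn_context and start_worker are hypothetical.

```python
import multiprocessing


def spawn_context():
    """Return a multiprocessing context that starts children with "spawn".

    A spawned child is a fresh interpreter rather than a forked copy of
    the parent, so it does not inherit the parent's in-memory state.
    """
    return multiprocessing.get_context("spawn")


def start_worker(target, *args):
    """Start `target(*args)` in a spawned child process and return it.

    With "spawn", `target` and `args` must be picklable, and the calling
    script should be guarded by `if __name__ == "__main__":`.
    """
    process = spawn_context().Process(target=target, args=args)
    process.start()
    return process
```

The trade-off is startup cost: each spawned worker re-imports the application's modules, whereas a forked worker starts from a copy of the parent. From Python 3.8 onward, spawn is the default start method on macOS.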

@jrusso1020
Contributor Author

@jrusso1020 jrusso1020 commented Sep 17, 2019

If I get some free time this week, I can try to get a PR up that allows users to pass a --spawn flag to use the spawn start method, if that's something worthwhile.

I use Flask-dramatiq though, so I think a change might also be needed there for me to pass such a flag.

Just for posterity's sake, here's the stack trace from the segmentation fault, which occurs in urllib (reached via requests from the elasticsearch library):

Current thread 0x0000700007af2000 (most recent call first):
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 2586 in proxy_bypass_macosx_sysconf
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 2610 in proxy_bypass
  File "/Users/james/.local/share/virtualenvs/SAFE_core-_EMBPm_e/lib/python3.7/site-packages/requests/utils.py", line 745 in should_bypass_proxies
  File "/Users/james/.local/share/virtualenvs/SAFE_core-_EMBPm_e/lib/python3.7/site-packages/requests/utils.py", line 761 in get_environ_proxies
  File "/Users/james/.local/share/virtualenvs/SAFE_core-_EMBPm_e/lib/python3.7/site-packages/requests/sessions.py", line 700 in merge_environment_settings
  File "/Users/james/.local/share/virtualenvs/SAFE_core-_EMBPm_e/lib/python3.7/site-packages/elasticsearch/connection/http_requests.py", line 119 in perform_request
  File "/Users/james/.local/share/virtualenvs/SAFE_core-_EMBPm_e/lib/python3.7/site-packages/elasticsearch/transport.py", line 350 in perform_request
  File "/Users/james/.local/share/virtualenvs/SAFE_core-_EMBPm_e/lib/python3.7/site-packages/elasticsearch/client/__init__.py", line 259 in info
  File "/Users/james/.local/share/virtualenvs/SAFE_core-_EMBPm_e/lib/python3.7/site-packages/elasticsearch/client/utils.py", line 84 in _wrapped
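
The trace above has the format that Python's built-in faulthandler module prints when a process receives a fatal signal. The thread does not say how it was captured, so the following is only a minimal sketch of one way to obtain such a trace; enable_crash_traces is a hypothetical helper name and the log path is up to the caller.

```python
import faulthandler


def enable_crash_traces(log_path):
    """Dump Python tracebacks to `log_path` on fatal signals such as SIGSEGV.

    The returned file object must stay open for as long as the handler
    is enabled, because faulthandler writes to its file descriptor.
    """
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

The same handler can be switched on without code changes by starting the interpreter with `python -X faulthandler` or by setting the `PYTHONFAULTHANDLER` environment variable, in which case the trace goes to stderr.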

jrusso1020 added a commit to jrusso1020/dramatiq that referenced this issue Sep 17, 2019
Add a `--spawn` flag to the cli that will allow a user to spawn
processes instead of fork processes if they are on an OS that
defaults to forking processes such as unix based machines.

This started from a conversation on this issue Bogdanp#227
which showed the problem on unix(namely macOS) systems caused by
forking processes.
Bogdanp added a commit that referenced this issue Sep 18, 2019
Add a `--use-spawn` flag to the cli that will allow a user to spawn
processes instead of fork processes if they are on an OS that
defaults to forking processes such as unix based machines.

This started from a conversation on this[1] issue which showed the
problem on unix(namely macOS) systems caused by forking processes.

[1]: #227
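
As a rough illustration of what such a flag involves, here is a minimal sketch that is explicitly not Dramatiq's actual cli.py. It assumes a hypothetical argparse-based launcher with a `--use-spawn` option that decides which multiprocessing context to use before any worker process is created. The names parse_arguments and choose_context are invented for this example.

```python
import argparse
import multiprocessing


def parse_arguments(argv=None):
    """Parse the command line of a toy worker launcher (illustrative only)."""
    parser = argparse.ArgumentParser(description="toy worker launcher")
    parser.add_argument(
        "--use-spawn",
        action="store_true",
        help="start worker processes with 'spawn' instead of the platform default",
    )
    return parser.parse_args(argv)


def choose_context(args):
    """Return the multiprocessing context selected by the parsed arguments."""
    if args.use_spawn:
        return multiprocessing.get_context("spawn")
    # Without the flag, fall back to the platform's default start method.
    return multiprocessing.get_context()
```

For example, `choose_context(parse_arguments(["--use-spawn"]))` yields a context whose `Process` class starts children with spawn.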