Update GPU docker & fix race condition with multiple workers #436

Merged: 6 commits, Sep 29, 2020

tholor (Member) commented Sep 25, 2020

The GPU Dockerfile was outdated.

tholor (Member, Author) commented Sep 25, 2020

I'm still getting an error when running run_docker_gpu.sh with the prepopulated Elasticsearch image (deepset/elasticsearch-game-of-thrones):

[2020-09-25 16:53:50 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2020-09-25 16:53:50 +0000] [1] [INFO] Listening at: http://0.0.0.0:8000 (1)
[2020-09-25 16:53:50 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2020-09-25 16:53:50 +0000] [8] [INFO] Booting worker with pid: 8
[2020-09-25 16:53:50 +0000] [9] [INFO] Booting worker with pid: 9
09/25/2020 16:53:52 - INFO - elasticsearch -   HEAD http://localhost:9200/document [status:200 request:0.100s]
09/25/2020 16:53:52 - INFO - elasticsearch -   HEAD http://localhost:9200/document [status:200 request:0.100s]
09/25/2020 16:53:52 - WARNING - elasticsearch -   PUT http://localhost:9200/label [status:400 request:0.091s]
[2020-09-25 16:53:52 +0000] [8] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/gunicorn/arbiter.py", line 583, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.7/dist-packages/uvicorn/workers.py", line 61, in init_process
    super(UvicornWorker, self).init_process()
  File "/usr/local/lib/python3.7/dist-packages/gunicorn/workers/base.py", line 119, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python3.7/dist-packages/gunicorn/workers/base.py", line 144, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.7/dist-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python3.7/dist-packages/gunicorn/app/wsgiapp.py", line 49, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python3.7/dist-packages/gunicorn/app/wsgiapp.py", line 39, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.7/dist-packages/gunicorn/util.py", line 358, in import_app
    mod = importlib.import_module(module)
  File "/usr/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/user/rest_api/application.py", line 10, in <module>
    from rest_api.controller.router import router as api_router
  File "/home/user/rest_api/controller/router.py", line 3, in <module>
    from rest_api.controller import file_upload
  File "/home/user/rest_api/controller/file_upload.py", line 38, in <module>
    faq_question_field=FAQ_QUESTION_FIELD_NAME,
  File "/home/user/haystack/document_store/elasticsearch.py", line 100, in __init__
    self._create_label_index(label_index)
  File "/home/user/haystack/document_store/elasticsearch.py", line 150, in _create_label_index
    self.client.indices.create(index=index_name, body=mapping)
  File "/usr/local/lib/python3.7/dist-packages/elasticsearch/client/utils.py", line 152, in _wrapped
    return func(*args, params=params, headers=headers, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/elasticsearch/client/indices.py", line 120, in create
    "PUT", _make_path(index), params=params, headers=headers, body=body
  File "/usr/local/lib/python3.7/dist-packages/elasticsearch/transport.py", line 392, in perform_request
    raise e
  File "/usr/local/lib/python3.7/dist-packages/elasticsearch/transport.py", line 365, in perform_request
    timeout=timeout,
  File "/usr/local/lib/python3.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 269, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python3.7/dist-packages/elasticsearch/connection/base.py", line 301, in _raise_error
    status_code, error_message, additional_info
elasticsearch.exceptions.RequestError: RequestError(400, 'resource_already_exists_exception', 'index [label/TXhOImR7Q_OvU9hDyXP4xg] already exists')
[2020-09-25 16:53:52 +0000] [8] [INFO] Worker exiting (pid: 8)
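In the log, both workers (pids 8 and 9) send a HEAD request for the `document` index at the same moment. A PUT for `label` then fails with 400 `resource_already_exists_exception`, and worker 8 exits. The check-then-create interleaving behind this can be reproduced with the hypothetical sketch below, which uses no Elasticsearch at all. Threads stand in for gunicorn's worker processes, an in-memory `FakeIndices` stands in for the index API, and a barrier forces both workers past the existence check before either one creates the index.

```python
import threading


class IndexAlreadyExistsError(Exception):
    """Hypothetical stand-in for the 400 resource_already_exists_exception."""


class FakeIndices:
    """Hypothetical in-memory index registry shared by all 'workers'."""

    def __init__(self):
        self._names = set()
        self._lock = threading.Lock()

    def exists(self, index):
        with self._lock:
            return index in self._names

    def create(self, index):
        with self._lock:
            if index in self._names:
                raise IndexAlreadyExistsError(f"index [{index}] already exists")
            self._names.add(index)


def naive_worker_startup(indices, barrier, failures):
    # Check-then-create: each worker decides on its own that the index is missing.
    if not indices.exists("label"):
        barrier.wait()  # force both workers past the check before either creates
        try:
            indices.create("label")
        except IndexAlreadyExistsError as exc:
            failures.append(exc)  # in the real app this exception kills the worker


def run_two_workers():
    indices = FakeIndices()
    barrier = threading.Barrier(2)
    failures = []
    threads = [
        threading.Thread(target=naive_worker_startup, args=(indices, barrier, failures))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures


failures = run_two_workers()
print(len(failures), failures[0])  # prints: 1 index [label] already exists
```

Exactly one of the two workers loses the race, which matches a single worker crashing in the log.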

tholor (Member, Author) commented Sep 29, 2020

It turned out this is not an issue with the GPU images but a general race condition: when multiple workers start at the same time, they all try to create the same index. I investigated gunicorn's --preload flag, which loads the application once before forking the workers and would therefore circumvent this. However, it only works in the CPU, single-process case. With CUDA we get:

  File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 185, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
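For reference, gunicorn's `--preload` flag corresponds to the `preload_app` setting in a Python config file. Below is a minimal config sketch that keeps it off. It assumes a file named `gunicorn.conf.py`, and the bind address, worker count, and worker class are taken from the startup log above for illustration; this is not the project's actual configuration.

```python
# gunicorn.conf.py: illustrative sketch, values taken from the startup log above
bind = "0.0.0.0:8000"
workers = 2
worker_class = "uvicorn.workers.UvicornWorker"

# preload_app = True (the --preload flag) imports the app once in the master
# process and then forks the workers. That avoids the index race, but CUDA state
# initialized during that import cannot be reused in the forked children, which
# leads to "Cannot re-initialize CUDA in forked subprocess".
preload_app = False
```

With preloading off, each worker imports the application (and creates its indices) on its own, so the index creation itself has to tolerate concurrent workers.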

With multiprocessing, the application hangs after calling the reader.
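Since preloading is not an option here, one common way to make index creation safe with several workers is to check whether the index exists, try to create it, and, if creation fails, check again and re-raise only when the index is still missing. The sketch below illustrates that pattern under assumptions: `RacingIndices` and `create_index_if_missing` are hypothetical names, not Haystack's actual code. With the real elasticsearch-py client, the exception to catch would be `elasticsearch.exceptions.RequestError`, as seen in the traceback above.

```python
class IndexAlreadyExistsError(Exception):
    """Hypothetical stand-in for elasticsearch.exceptions.RequestError (400)."""


class RacingIndices:
    """Hypothetical in-memory fake. If `race` is True, another worker creates the
    index right after our first exists() check, reproducing the failing interleaving."""

    def __init__(self, race=True):
        self.names = set()
        self._race_pending = race

    def exists(self, index):
        present = index in self.names
        if self._race_pending:
            self._race_pending = False
            self.names.add(index)  # the other worker wins the race here
        return present

    def create(self, index):
        if index in self.names:
            raise IndexAlreadyExistsError(f"index [{index}] already exists")
        self.names.add(index)


def create_index_if_missing(indices, name):
    """Create `name` unless it exists; return True only if this call created it."""
    if indices.exists(name):
        return False
    try:
        indices.create(name)
        return True
    except IndexAlreadyExistsError:
        # Another worker created the index between our check and our create call.
        # Re-raise only if the index is still missing, i.e. the error had another cause.
        if not indices.exists(name):
            raise
        return False


racy = RacingIndices(race=True)
print(create_index_if_missing(racy, "label"), "label" in racy.names)  # prints: False True
calm = RacingIndices(race=False)
print(create_index_if_missing(calm, "label"), "label" in calm.names)  # prints: True True
```

In both cases the worker starts up normally and the `label` index exists afterwards. Only the worker that actually created the index reports `True`.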

tholor (Member, Author) commented Sep 29, 2020

All images are now updated and pushed to Docker Hub.

tholor changed the title from "Update GPU docker" to "Update GPU docker & fix race condition" on Sep 29, 2020
tholor changed the title from "Update GPU docker & fix race condition" to "Update GPU docker & fix race condition with multiple workers" on Sep 29, 2020
tholor merged commit a92ca04 into master on Sep 29, 2020
julian-risch deleted the update_docker branch on Nov 15, 2021