
can only join a child process #2094

Closed
chengcheng-pei opened this issue Jul 8, 2020 · 1 comment · Fixed by #2413

@chengcheng-pei
Contributor

Describe the bug

When deleting the XGBoost REST model server, I got the following errors:

xgboost-rest-xgboost-rest-0-classifier-c965c6dfb-qbhx9 classifier Error in atexit._run_exitfuncs:
xgboost-rest-xgboost-rest-0-classifier-c965c6dfb-qbhx9 classifier Traceback (most recent call last):
xgboost-rest-xgboost-rest-0-classifier-c965c6dfb-qbhx9 classifier   File "/opt/conda/lib/python3.7/multiprocessing/util.py", line 322, in _exit_function
xgboost-rest-xgboost-rest-0-classifier-c965c6dfb-qbhx9 classifier     p.join()
xgboost-rest-xgboost-rest-0-classifier-c965c6dfb-qbhx9 classifier   File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 138, in join
xgboost-rest-xgboost-rest-0-classifier-c965c6dfb-qbhx9 classifier     assert self._parent_pid == os.getpid(), 'can only join a child process'
xgboost-rest-xgboost-rest-0-classifier-c965c6dfb-qbhx9 classifier AssertionError: can only join a child process
[... the same traceback is repeated 6 more times ...]
xgboost-rest-xgboost-rest-0-classifier-c965c6dfb-qbhx9 classifier [2020-07-08 23:16:33 +0000] [1] [INFO] Shutting down: Master

Deployment file:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: xgboost-rest
  namespace: team-xxxx
  labels:
    seldon.io/controller-id: ''
spec:
  name: xgboost-rest
  predictors:
  - annotations:
      project_name: seldon-benchmark
    componentSpecs:
    - spec:
        containers:
        - name: classifier
          image: seldonio/xgboostserver_rest:0.4
          imagePullPolicy: Always
          resources:
            requests:
              memory: 1Gi
              cpu: 4
            limits:
              memory: 4Gi
              cpu: 6
          env:
          - name: SELDON_LOG_LEVEL
            value: DEBUG
          - name: GUNICORN_MAX_REQUESTS
            value: "10000"
          - name: GUNICORN_MAX_REQUESTS_JITTER
            value: "1000"
          - name: GUNICORN_WORKERS
            value: "7"
          - name: KMP_AFFINITY
            value: "granularity=fine,verbose,compact,1,0"
          - name: KMP_BLOCKTIME
            value: "1"
          - name: OMP_NUM_THREADS
            value: "8"
        terminationGracePeriodSeconds: 2
    graph:
      children: []
      implementation: XGBOOST_SERVER
      modelUri: gs://ml_models_test/v0002
      serviceAccountName: seldon-core-manager-key
      name: classifier
      endpoint:
        type: REST
    name: xgboost-rest
    replicas: 1
    svcOrchSpec:
      resources:
        requests:
          cpu: 2
          memory: 2Gi
      env:
      - name: SELDON_LOG_LEVEL
        value: DEBUG

Environment

  • GKE
  • k8s client 1.18
  • k8s server 1.15
  • seldon-core: 1.2.1

@chengcheng-pei added the bug and triage labels Jul 8, 2020
@ukclivecox added the priority/p1 label and removed the triage label Jul 9, 2020
@ukclivecox added this to To do in 1.3 via automation Jul 9, 2020
@ukclivecox added this to the 1.3 milestone Jul 9, 2020
@adriangonz moved this from To do to In progress in 1.3 Sep 11, 2020
@adriangonz
Contributor

Hey @chengcheng-pei, thanks for reporting this issue.

The Python server runs Gunicorn from a process that also spawns other child processes, and apparently that causes problems like this one. There is more info in this thread: benoitc/gunicorn#1391
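For context, here is a minimal sketch of that failure mode. It assumes a Linux host where `os.fork()` and the `fork` start method are available, and the function names are hypothetical (not Seldon Core or Gunicorn code). A process starts a `multiprocessing.Process` and is then forked, the way a pre-fork server like Gunicorn creates its workers. The fork inherits a `Process` object whose `_parent_pid` is not its own PID, so `join()` fails the same assertion shown in the traceback. Multiprocessing's exit hook (`_exit_function` in the traceback) performs that same `join()` when such a process exits.

```python
import multiprocessing
import os
import time

# Assumption: Linux, where the "fork" start method is available.
ctx = multiprocessing.get_context("fork")


def idle_child():
    # Hypothetical stand-in for a helper process spawned by the server's main process.
    time.sleep(0.5)


def join_from_forked_worker():
    """Start a child, fork the current process, then try to join the child from the fork.

    Returns the forked process's exit code: 1 if join() raised the
    "can only join a child process" assertion, 0 if it succeeded, 2 otherwise.
    """
    child = ctx.Process(target=idle_child)
    child.start()

    pid = os.fork()  # Roughly what a pre-fork server does when it spawns a worker.
    if pid == 0:
        # Forked "worker": it inherited the `child` object but is not its parent.
        code = 2
        try:
            child.join()
            code = 0
        except AssertionError as exc:
            if str(exc) == "can only join a child process":
                code = 1
        finally:
            # os._exit skips atexit hooks, so the fork never runs the parent's code.
            os._exit(code)

    _, status = os.waitpid(pid, 0)
    child.join()  # The real parent can join normally.
    return os.WEXITSTATUS(status)
```

On Linux, `join_from_forked_worker()` should return 1. The log above contains the traceback 7 times and the deployment sets `GUNICORN_WORKERS` to `"7"`, which fits one failure per worker, although that mapping is an inference rather than something the log states.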

I'll look into the workarounds mentioned there. Alternatively, we could move the main Gunicorn process into a child process of its own.
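A minimal sketch of why the second option could help, assuming the same Linux/`fork` setup; `helper`, `run_server`, and the two `layout_*` functions are hypothetical stand-ins, not Seldon Core or Gunicorn code. In CPython, a process started through `multiprocessing` begins with an empty set of tracked children, so workers forked from it inherit nothing for the exit hook to join. Forking directly from the process that owns the helper hands every worker a stale child instead.

```python
import multiprocessing
import os
import time

# Assumption: Linux, where the "fork" start method is available.
ctx = multiprocessing.get_context("fork")


def helper():
    # Hypothetical side process owned by whichever process starts it.
    time.sleep(0.5)


def fork_worker_and_count_inherited_children():
    """Fork a worker (as a pre-fork server would) and return how many
    multiprocessing children the worker believes it owns; these are the
    processes multiprocessing's exit hook would later try to join."""
    pid = os.fork()
    if pid == 0:
        count = 255
        try:
            count = len(multiprocessing.active_children())
        finally:
            os._exit(count)  # Skip atexit hooks in the forked worker.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


def layout_in_same_process():
    # The process that owns the helper also forks the worker.
    helper_proc = ctx.Process(target=helper)
    helper_proc.start()
    inherited = fork_worker_and_count_inherited_children()
    helper_proc.join()
    return inherited


def run_server(results):
    # Hypothetical stand-in for "start Gunicorn here": runs in its own child process.
    results.put(fork_worker_and_count_inherited_children())


def layout_in_child_process():
    # The helper and the server are siblings; only the server process forks workers.
    helper_proc = ctx.Process(target=helper)
    helper_proc.start()
    results = ctx.Queue()
    server_proc = ctx.Process(target=run_server, args=(results,))
    server_proc.start()
    inherited = results.get(timeout=10)
    server_proc.join()
    helper_proc.join()
    return inherited
```

On Linux, `layout_in_same_process()` should report 1 inherited child and `layout_in_child_process()` should report 0. This only illustrates the idea; it is not necessarily how the fix was implemented.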

@adriangonz moved this from In progress to In Review in 1.3 Sep 11, 2020
1.3 automation moved this from In Review to Done Sep 17, 2020