Model server throws error when multiprocessing start method is already set #3406

Open
sivanantha321 opened this issue Feb 3, 2024 · 3 comments · May be fixed by #3407
sivanantha321 commented Feb 3, 2024

/kind bug

I’m trying to start a KServe model server with 2 workers like this:

import kserve

# `model` is a kserve.Model instance defined elsewhere
model.load()
kserve.ModelServer(workers=2).start([model])

But I get this error in the log:

bin /app/venv/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cpu.so
CUDA SETUP: Loading binary /app/venv/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
{"asctime": "2024-02-01 16:51:03,775", "name": "torch.distributed.nn.jit.instantiator", "levelname": "INFO", "message": "Created a temporary directory at /tmp/tmp7h8azrb7"}
{"asctime": "2024-02-01 16:51:03,775", "name": "torch.distributed.nn.jit.instantiator", "levelname": "INFO", "message": "Writing /tmp/tmp7h8azrb7/_remote_module_non_scriptable.py"}
/app/venv/lib/python3.8/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
{"asctime": "2024-02-01 16:52:11,342", "name": "root", "levelname": "INFO", "message": "PATH_TO_MODEL is set to ==>/mnt/models/"}
{"asctime": "2024-02-01 16:52:11,342", "name": "root", "levelname": "INFO", "message": "Registering model: llama-2-7b-chat-hf"}
{"asctime": "2024-02-01 16:52:11,343", "name": "root", "levelname": "INFO", "message": "Setting max asyncio worker threads as 8"}
{"asctime": "2024-02-01 16:52:11,344", "name": "root", "levelname": "INFO", "message": "Starting uvicorn with 2 workers"}
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:48<00:48, 48.85s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [01:06<00:00, 30.56s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [01:06<00:00, 33.31s/it]
{"asctime": "2024-02-01 16:52:11,368", "name": "root", "levelname": "INFO", "message": "Starting gRPC server on [::]:8081"}
{"asctime": "2024-02-01 16:52:11,369", "name": "root", "levelname": "ERROR", "message": "uncaught exception", "exc_info": "Traceback (most recent call last):\n  File \"serve.py\", line 56, in <module>\n    kserve.ModelServer(workers=NUM_WORKERS).start([model])\n  File \"/app/venv/lib/python3.8/site-packages/kserve/model_server.py\", line 167, in start\n    asyncio.run(servers_task())\n  File \"/usr/lib/python3.8/asyncio/runners.py\", line 43, in run\n    return loop.run_until_complete(main)\n  File \"/usr/lib/python3.8/asyncio/base_events.py\", line 608, in run_until_complete\n    return future.result()\n  File \"/app/venv/lib/python3.8/site-packages/kserve/model_server.py\", line 165, in servers_task\n    await asyncio.gather(*servers)\n  File \"/app/venv/lib/python3.8/site-packages/kserve/model_server.py\", line 154, in serve\n    multiprocessing.set_start_method('fork')\n  File \"/usr/lib/python3.8/multiprocessing/context.py\", line 243, in set_start_method\n    raise RuntimeError('context has already been set')\nRuntimeError: context has already been set"}
Exception ignored in: <function Server.__del__ at 0x7f3b818b5ca0>
Traceback (most recent call last):
  File "/app/venv/lib/python3.8/site-packages/grpc/aio/_server.py", line 170, in __del__
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/common.pyx.pxi", line 118, in grpc._cython.cygrpc.schedule_coro_threadsafe
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/common.pyx.pxi", line 110, in grpc._cython.cygrpc.schedule_coro_threadsafe
  File "/usr/lib/python3.8/asyncio/base_events.py", line 425, in create_task
  File "/usr/lib/python3.8/asyncio/base_events.py", line 504, in _check_closed
RuntimeError: Event loop is closed

I would think KServe should handle multiprocessing gracefully, since it allows passing multiple workers to the kserve.ModelServer().start() function, but maybe I'm wrong?
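One possible guard (a sketch only, not necessarily what PR #3407 does; the helper name is hypothetical) is to set the start method only when no context has been fixed yet:

```python
import multiprocessing

def set_start_method_if_unset(method: str = "fork") -> None:
    # Hypothetical helper: set the start method only when no context
    # has been fixed yet, avoiding "context has already been set".
    if multiprocessing.get_start_method(allow_none=True) is None:
        multiprocessing.set_start_method(method)
```

Alternatively, `multiprocessing.set_start_method(method, force=True)` overrides an existing context, though that can surprise earlier code that relied on the previous setting.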

Slack Message

What steps did you take and what happened:
[A clear and concise description of what the bug is.]

What did you expect to happen:

What's the InferenceService yaml:
[To help us debug please run kubectl get isvc $name -n $namespace -oyaml and paste the output]

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]

Environment:

  • Istio Version:
  • Knative Version:
  • KServe Version: 0.10.0 - 0.12.0-rc1
  • Kubeflow version:
  • Cloud Environment:[k8s_istio/istio_dex/gcp_basic_auth/gcp_iap/aws/aws_cognito/ibm]
  • Minikube/Kind version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):
@serdarildercaglar

I am running into the same issue. Has there been any progress?
@sivanantha321

@sivanantha321
Member Author

Currently it is on hold; we need to do more tests. Could you check whether the changes in PR #3407 work for you?

@serdarildercaglar

> Currently it is on hold. We need to do more tests. Is it possible for you to try if the changes in the PR #3407 works for you?

#3407 (comment)
