
TorchServe crashes in production with `WorkerThread - IllegalStateException error` #3087

Open
MaelitoP opened this issue Apr 15, 2024 · 4 comments


MaelitoP commented Apr 15, 2024

Hello,

I have a problem with TorchServe in production. When I start the server, everything works perfectly: I can use the model and make requests against it. But after around three days the server crashes, and I don't understand why.

Looking at the monitoring metrics, ts_inference_requests_total is roughly 1 req/s, and PredictionTime is between 40 and 600 ms.

My attempts to reproduce the error locally were unsuccessful 😞

Server error logs (TorchServe)

Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,559 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] INBOUND HEADERS: streamId=7663 headers=GrpcHttp2RequestHeaders[:path: /org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions padding=0 endStream=false
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,561 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] INBOUND DATA: streamId=7663 padding=0 endStream=true length=186 bytes=xxxxxxxxxxx...
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,561 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] OUTBOUND PING: ack=false bytes=1234
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,562 [DEBUG] W-9007-distilroberta_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd PREDICT repeats 1 to backend at: 1713066279562
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,563 [INFO ] W-9007-distilroberta_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1713066279563
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,564 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] INBOUND PING: ack=true bytes=1234
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,564 [INFO ] W-9007-distilroberta_1.0-stdout MODEL_LOG - Backend received inference at: 1713066279
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,598 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] INBOUND RST_STREAM: streamId=7663 errorCode=0
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,599 [WARN ] grpc-default-executor-371 org.pytorch.serve.grpcimpl.InferenceImpl - grpc client call already cancelled
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,599 [INFO ] grpc-default-executor-371 ACCESS_LOG - /0.0.0.0:18528 "gRPC org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions HTTP/2.0" 1 38
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,600 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] OUTBOUND RST_STREAM: streamId=7663 errorCode=8
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [INFO ] W-9007-distilroberta_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]PredictionTime.Milliseconds:85.84|#ModelName:distilroberta,Level:Model|#type:GAUGE|#hostname:xxxxxxxx,1713066279,xxxxxxxx-xxxxx-xxxx-xxxx-xxxxxxx, pattern=[METRICS]
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [WARN ] W-9007-distilroberta_1.0 org.pytorch.serve.job.GRPCJob - grpc client call already cancelled, not able to send this response for requestId: xxxxxxxx-xxxxx-xxxx-xxxx-xxxxxxx
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [ERROR] W-9007-distilroberta_1.0 org.pytorch.serve.wlm.WorkerThread - IllegalStateException error
Apr 14 03:44:39 sentiment4_app[941]: java.lang.IllegalStateException: Stream was terminated by error, no further calls are allowed
Apr 14 03:44:39 sentiment4_app[941]: #011at com.google.common.base.Preconditions.checkState(Preconditions.java:502) ~[model-server.jar:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:374) ~[model-server.jar:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at org.pytorch.serve.job.GRPCJob.response(GRPCJob.java:130) ~[model-server.jar:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at org.pytorch.serve.wlm.BatchAggregator.sendResponse(BatchAggregator.java:103) ~[model-server.jar:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:238) [model-server.jar:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
Apr 14 03:44:39 sentiment4_app[941]: #011at java.lang.Thread.run(Thread.java:840) [?:?]
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [WARN ] W-9007-distilroberta_1.0 org.pytorch.serve.job.GRPCJob - grpc client call already cancelled, not able to send this response for requestId: xxxxxxxx-xxxxx-xxxx-xxxx-xxxxxxx
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [INFO ] W-9007-distilroberta_1.0 ACCESS_LOG - /0.0.0.0:18528 "gRPC org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions HTTP/2.0" 13 89
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,651 [INFO ] W-9007-distilroberta_1.0-stdout MODEL_LOG - Frontend disconnected.
Apr 14 03:44:39 sentiment4_app[941]: 2024-04-14T03:44:39,650 [INFO ] epollEventLoopGroup-5-11 org.pytorch.serve.wlm.WorkerThread - 9007 Worker disconnected. WORKER_MODEL_LOADED
Apr 14 03:44:40 sentiment4_app[941]: 2024-04-14T03:44:40,903 [INFO ] W-9007-distilroberta_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9007-distilroberta_1.0-stdout
Apr 14 03:44:40 sentiment4_app[941]: 2024-04-14T03:44:40,903 [INFO ] W-9007-distilroberta_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9007-distilroberta_1.0-stderr
Apr 14 03:44:41 sentiment4_app[941]: 2024-04-14T03:44:41,184 [DEBUG] grpc-default-worker-ELG-3-7 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0xaecf14e8, L:/0.0.0.0:7070 - R:/0.0.0.0:7782] INBOUND HEADERS: streamId=43339 headers=GrpcHttp2RequestHeaders[:path: /org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions, padding=0 endStream=false
Apr 14 03:44:41 sentiment4_app[941]: 2024-04-14T03:44:41,184 [DEBUG] grpc-default-worker-ELG-3-7 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0xaecf14e8, L:/0.0.0.0:7070 - R:/0.0.0.0:7782] INBOUND DATA: streamId=43339 padding=0 endStream=true length=625 bytes=xxxxxxxxxxxx...
Apr 14 03:44:41 sentiment4_app[941]: 2024-04-14T03:44:41,184 [DEBUG] grpc-default-worker-ELG-3-7 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0xaecf14e8, L:/0.0.0.0:7070 - R:/0.0.0.0:7782] OUTBOUND PING: ack=false bytes=1234
Apr 14 03:44:41 sentiment4_app[941]: 2024-04-14T03:44:41,185 [DEBUG] grpc-default-worker-ELG-3-7 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0xaecf14e8, L:/0.0.0.0:7070 - R:/0.0.0.0:7782] INBOUND PING: ack=true bytes=1234
Apr 14 03:44:41 sentiment4_app[941]: 2024-04-14T03:44:41,850 [DEBUG] grpc-default-worker-ELG-3-5 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x36776bf7, L:/0.0.0.0:7070 - R:/0.0.0.0:18528] INBOUND HEADERS: streamId=7665 headers=GrpcHttp2RequestHeaders[:path: /org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions, :method: POST, :scheme: http, te: trailers, content-type: application/grpc, user-agent: grpc-go/1.23.1, grpc-timeout: 120000m

Client error logs (Go client)

Apr 14 03:44:39 augmenter_app[924]: 
  time="2024-04-14T03:44:39Z" 
  level=error 
  msg="An error occurred during augmentation of an article using Sentiment v4" 
  articleId="xxxxx" 
  error="sentiment4c : prediction failed: rpc error: code = Internal desc = Model \"distilroberta\" has no worker to serve inference request. Please use scale workers API to add workers. If this is a sequence inference, please check if it is closed, or expired; or exceeds maxSequenceJobQueueSize\nInternalServerException.()"

Dockerfile

FROM python:3.12.0-slim-bookworm as build-stage

RUN pip install torch-model-archiver==0.9.0

COPY models models
COPY deployments deployments

# Create model for english
RUN mkdir model-store \
    && torch-model-archiver --model-name "distilroberta" --export-path "model-store" --version 1.0  \
    --serialized-file models/distilRoBERTa/pytorch_distil_roberta_sentiment_v4.pt  \
    --extra-files "./models/distilRoBERTa/vocab.json,./models/distilRoBERTa/merges.txt"  \
    --handler "deployments/distilRoBERTa_torchserve_handler.py" --force

# Create model for french, italian, german and spanish
RUN torch-model-archiver --model-name "multilingual_distilbert" --export-path "model-store" --version 1.0  \
    --serialized-file models/multilingual_distilBERT/pytorch_multi_distil_bert_sentiment_v4.pt  \
    --extra-files "./models/multilingual_distilBERT/vocab.txt"  \
    --handler "deployments/multilingual_distilBERT_torchserve_handler.py" --force

FROM pytorch/torchserve:latest as run-stage

COPY --from=build-stage model-store model-store
COPY deployments deployments

RUN pip install transformers==4.37.2

CMD ["torchserve", "--start", "--model-store", "model-store", "--models", "distilroberta=distilroberta.mar multilingual_distilbert=multilingual_distilbert.mar", "--ts-config", "deployments/config.properties"]

Configuration (config.properties)

[DEFAULT]
grpc_inference_port=7070
grpc_management_port=7071
metrics_address=http://127.0.0.1:8082
enable_metrics_api=true
metrics_config=deployments/metrics.yaml
metrics_mode=prometheus
vmargs=-Dlog4j.configurationFile=file:deployments/log4j2.xml
async_logging=true
models={"distilroberta": {"1.0": {"defaultVersion": true,"marName": "distilroberta.mar","minWorkers": 3,"maxWorkers": 6,"batchSize": 1}}}

Thank you so much for your time.


lxning (Collaborator) commented Apr 15, 2024

@MaelitoP The log shows that the gRPC client closed the connection, so the response could not be sent back and an error was thrown.

Could you please check your gRPC client side?
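
For context, this is what client cancellation looks like from a gRPC Java server's point of view. The sketch below is generic illustration code with placeholder names (it is not TorchServe's actual handler): the observer passed to a unary handler is a ServerCallStreamObserver, which exposes isCancelled() and setOnCancelHandler(...), and once the client resets the stream (the INBOUND RST_STREAM followed by OUTBOUND RST_STREAM errorCode=8, i.e. CANCEL, in the log) any later write to it can no longer be delivered.

import io.grpc.stub.ServerCallStreamObserver;
import io.grpc.stub.StreamObserver;

// Generic illustration only: EchoHandler and the String payloads are placeholders,
// not TorchServe classes.
public class EchoHandler {

    public void predict(String request, StreamObserver<String> responseObserver) {
        // In gRPC Java, the observer handed to a unary handler can be cast to
        // ServerCallStreamObserver to observe client-side cancellation.
        ServerCallStreamObserver<String> observer =
                (ServerCallStreamObserver<String>) responseObserver;

        // Fires when the client cancels the call (deadline exceeded, disconnect, ...).
        observer.setOnCancelHandler(
                () -> System.err.println("client cancelled, dropping response"));

        String reply = runInference(request); // stand-in for the actual model call

        // Guard before writing: once the call has been cancelled or terminated, a
        // write may throw, which is what surfaces in the worker log above as
        // IllegalStateException.
        if (observer.isCancelled()) {
            return;
        }
        observer.onNext(reply);
        observer.onCompleted();
    }

    private String runInference(String request) {
        return "ok"; // placeholder
    }
}

If the model only finishes after the client has gone away, the isCancelled() check is what keeps the server from writing into a dead stream.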


MaelitoP commented Apr 15, 2024

@lxning I'm checking, but I don't see anything at the moment. When I look at the client logs, I can see that the server responds with an error:

Apr 14 03:44:39 augmenter_app[924]: 
  time="2024-04-14T03:44:39Z" 
  level=error 
  msg="An error occurred during augmentation of an article using Sentiment v4" 
  articleId="xxxxx" 
  error="sentiment4c : prediction failed: rpc error: code = Internal desc = Model \"distilroberta\" has no worker to serve inference request. Please use scale workers API to add workers. If this is a sequence inference, please check if it is closed, or expired; or exceeds maxSequenceJobQueueSize\nInternalServerException.()"

As far as I know, if the connection is lost, we get an error like code = Unavailable desc = transport is closing instead.

But why does the server crash when a client closes the connection? This behaviour seems strange to me, although maybe it's expected; I don't know.


MaelitoP commented Apr 16, 2024

Hello,
To check whether the client was simply hitting a timeout, I forced one to see what error message it produces. I got this:

error="sentiment4c : prediction failed: rpc error: 
code = DeadlineExceeded 
desc = context deadline exceeded"

During this timeout test, the TorchServe instance crashed with the following error:

Apr 16 07:05:19 sentiment4_app[710]: 2024-04-16T07:05:19,850 [WARN ] grpc-default-executor-545 org.pytorch.serve.grpcimpl.InferenceImpl - grpc client call already cancelled
Apr 16 07:05:19 sentiment4_app[710]: 2024-04-16T07:05:19,850 [INFO ] grpc-default-executor-545 ACCESS_LOG - /0.0.0.0:62774 "gRPC org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions HTTP/2.0" 1 313
Apr 16 07:05:19 sentiment4_app[710]: 2024-04-16T07:05:19,850 [DEBUG] grpc-default-worker-ELG-3-8 io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler - [id: 0x62653f08, L:/0.0.0.0:7070 - R:/0.0.0.0:62774] OUTBOUND RST_STREAM: streamId=84103 errorCode=8
Apr 16 07:05:19 sentiment4_app[710]: 2024-04-16T07:05:19,903 [INFO ] W-9010-distilroberta_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]PredictionTime.Milliseconds:318.27|#ModelName:distilroberta,Level:Model|#type:GAUGE|#hostname:xxxxxx,xxxxxxxx,94a54730-e1f3-4fe1-ab30-63c26e2fb6c9, pattern=[METRICS]
Apr 16 07:05:19 sentiment4_app[710]: 2024-04-16T07:05:19,903 [WARN ] W-9010-distilroberta_1.0 org.pytorch.serve.job.GRPCJob - grpc client call already cancelled, not able to send this response for requestId: 94a54730-e1f3-4fe1-ab30-63c26e2fb6c9
Apr 16 07:05:19 sentiment4_app[710]: 2024-04-16T07:05:19,903 [ERROR] W-9010-distilroberta_1.0 org.pytorch.serve.wlm.WorkerThread - IllegalStateException error
Apr 16 07:05:19 sentiment4_app[710]: java.lang.IllegalStateException: Stream was terminated by error, no further calls are allowed
Apr 16 07:05:19 sentiment4_app[710]: #011at com.google.common.base.Preconditions.checkState(Preconditions.java:502) ~[model-server.jar:?]
Apr 16 07:05:19 sentiment4_app[710]: #011at io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onNext(ServerCalls.java:374) ~[model-server.jar:?]
Apr 16 07:05:19 sentiment4_app[710]: #011at org.pytorch.serve.job.GRPCJob.response(GRPCJob.java:130) ~[model-server.jar:?]
Apr 16 07:05:19 sentiment4_app[710]: #011at org.pytorch.serve.wlm.BatchAggregator.sendResponse(BatchAggregator.java:103) ~[model-server.jar:?]
Apr 16 07:05:19 sentiment4_app[710]: #011at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:238) [model-server.jar:?]
Apr 16 07:05:19 sentiment4_app[710]: #011at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
Apr 16 07:05:19 sentiment4_app[710]: #011at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
Apr 16 07:05:19 sentiment4_app[710]: #011at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
Apr 16 07:05:19 sentiment4_app[710]: #011at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
Apr 16 07:05:19 sentiment4_app[710]: #011at java.lang.Thread.run(Thread.java:840) [?:?]
Apr 16 07:05:19 sentiment4_app[710]: 2024-04-16T07:05:19,903 [WARN ] W-9010-distilroberta_1.0 org.pytorch.serve.job.GRPCJob - grpc client call already cancelled, not able to send this response for requestId: 94a54730-e1f3-4fe1-ab30-63c26e2fb6c9
Apr 16 07:05:19 sentiment4_app[710]: 2024-04-16T07:05:19,903 [INFO ] W-9010-distilroberta_1.0 ACCESS_LOG - /0.0.0.0:62774 "gRPC org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions HTTP/2.0" 13 366
Apr 16 07:05:19 sentiment4_app[710]: 2024-04-16T07:05:19,904 [INFO ] W-9010-distilroberta_1.0-stdout MODEL_LOG - Frontend disconnected.
Apr 16 07:05:19 sentiment4_app[710]: 2024-04-16T07:05:19,904 [INFO ] epollEventLoopGroup-5-4 org.pytorch.serve.wlm.WorkerThread - 9010 Worker disconnected. WORKER_MODEL_LOADED

What can I do on the TorchServe side to avoid a crash when a client disconnects?


MaelitoP commented Apr 16, 2024

Are you sure this isn't a bug on the TorchServe side? If we take the time to read the logs, we can see that the error happens here (GRPCJob.java, line 130).

We call responseObserver.onNext(reply);, but responseObserver has already been cancelled.

You opened a PR some time ago, @lxning, to fix this problem (#2420).
But if we check master, we can see that the logic was moved into a function named cancelHandler(ServerCallStreamObserver<PredictionResponse> responseObserver), and we no longer return early.

This regression was introduced in #2513. Am I right, or am I just misunderstanding the logic?
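
To make the comparison concrete, here is the rough shape of the two variants described above. This is an illustrative sketch with placeholder names and logging, not the actual GRPCJob/InferenceImpl source:

import io.grpc.stub.ServerCallStreamObserver;

// Illustrative sketch only; not TorchServe source.
public class CancelGuardSketch {

    // Variant A: check for cancellation at the send site and return early
    // (the early-return behaviour described for the earlier fix).
    static <T> void sendWithEarlyReturn(ServerCallStreamObserver<T> observer, T reply) {
        if (observer.isCancelled()) {
            System.err.println("grpc client call already cancelled, dropping response");
            return; // never touch the dead stream again
        }
        observer.onNext(reply);
        observer.onCompleted();
    }

    // Variant B: only register a cancel callback up front and still write later
    // (the shape being questioned here).
    static <T> void sendWithCallbackOnly(ServerCallStreamObserver<T> observer, T reply) {
        observer.setOnCancelHandler(
                () -> System.err.println("grpc client call already cancelled"));
        // ... inference runs; meanwhile the client may cancel (deadline, disconnect) ...
        observer.onNext(reply);   // a write that races with cancellation/termination can
        observer.onCompleted();   // no longer be delivered; in the logs above it surfaces
                                  // as the IllegalStateException from the worker thread
    }
}

If that reading of the current code is right, the callback in Variant B logs the cancellation but nothing stops the worker from still writing to a call that is already gone, which would match the stack trace above (GRPCJob.response -> ServerCallStreamObserverImpl.onNext -> checkState).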

MaelitoP reopened this Apr 17, 2024