Performance with cortex local server is 6X slower than when model is run from jupyter notebook #1774
@lminer can you please try it out on a notebook with the same GPU used with cortex? T4 GPUs are considerably slower than an RTX 2080 Ti.
@miguelvr I already have. That number is for the RTX 2080 Ti. 6X is the difference between running the model in the server on my local machine and running the model in a jupyter notebook.
@lminer Another possibility is insufficient system memory (assuming the memories on the GPUs are equivalent). It might be worth trying on e.g.
@deliahu probably best to forget about the AWS cluster for now. This is an issue on my local box, which has 90 GB of RAM, 2 GPUs, and a Threadripper 1950. If I run inference on the model in a jupyter notebook, it is more than 6X faster than if I spin up a local cortex server and run inference through the server. When I run the model through the server, I only see ~10 seconds during which the GPU is actually being used. The rest of the time both the CPU and GPU are idle.
@lminer It would be worth profiling where the time is going. An easy first step is to add a few log statements within your predictor's predict method. Also, since local mode has been removed going forward, and because local does not have exactly the same architecture as running in the cluster, it would probably be best to check on the cluster.
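The profiling suggested above can be sketched roughly as follows. This is not Cortex's actual predictor API, just an illustrative stand-in: the stage names (`preprocess`, `run_inference`, `postprocess`) are hypothetical placeholders for whatever the real `predict` method does.

```python
import time

# Hypothetical sketch: timestamp each stage of a predict() call to see
# where the wall-clock time goes. The stage methods are stubs so the
# sketch runs standalone; a real predictor would decode the payload,
# call the model, and encode the response in their place.
class TimedPredictor:
    def predict(self, payload):
        t0 = time.perf_counter()
        batch = self.preprocess(payload)      # decode/reshape the input
        t1 = time.perf_counter()
        result = self.run_inference(batch)    # call into the model
        t2 = time.perf_counter()
        response = self.postprocess(result)   # encode the output
        t3 = time.perf_counter()
        print(f"preprocess:  {t1 - t0:.3f}s")
        print(f"inference:   {t2 - t1:.3f}s")
        print(f"postprocess: {t3 - t2:.3f}s")
        return response

    # Stub stages (identity functions) so the sketch is runnable.
    def preprocess(self, payload):
        return payload

    def run_inference(self, batch):
        return batch

    def postprocess(self, result):
        return result
```

If most of the time lands outside the inference stage, that points at payload handling rather than the model itself.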
@deliahu I did some logging and it looks like the slowdown is more like 10X. 124 seconds is spent in the predict call. If I look at GPU utilization during the 124 seconds, it appears as if the GPU is only being used for ~10 seconds. I'm wondering if this issue relates to #1740. I'm passing 40 MB inputs to tensorflow-serving and back, and maybe it's throttling on this for some reason. Incidentally, the issue is the same on the cluster.
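One quick way to sanity-check the large-payload hypothesis is to time the serialization work alone: encoding and re-parsing a multi-megabyte JSON body is pure CPU work during which the GPU would sit idle. A scaled-down sketch (the array size here is illustrative, not the actual 40 MB payload):

```python
import json
import random
import time

# Generate a large list of floats and measure how long it takes to
# round-trip it through JSON, as a proxy for the per-request
# serialization cost of a big tensor payload.
values = [random.random() for _ in range(500_000)]

t0 = time.perf_counter()
body = json.dumps(values)
t1 = time.perf_counter()
parsed = json.loads(body)
t2 = time.perf_counter()

print(f"payload size: {len(body) / 1e6:.1f} MB")
print(f"encode: {t1 - t0:.3f}s, decode: {t2 - t1:.3f}s")
```

If the encode/decode times are a meaningful fraction of the observed latency, the payload path (rather than the model) is the bottleneck.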
@lminer yes that could be it, although 124 seconds still seems high to me for this kind of issue. Maybe it has something to do with how the tensor is created/encoded before sending it off to TF Serving? Although you did mention that the CPU is idle too... One alternative you could look into is using Cortex's Python predictor type instead of the TensorFlow type (then there would not be an extra hop). How are you importing your model and running inference in your notebook, and would that be easily transferable to the PythonPredictor? In the Python predictor, you would pass in the path to your model in the API's config field, and download/load it on initialization.
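The suggested Python predictor approach can be sketched as below. This is a hedged outline, not Cortex's exact interface: the `config` key name and the loader are stand-ins (a real predictor would use something like `tf.saved_model.load()` in place of the stub).

```python
# Rough sketch of a Python-predictor-style class: the model path arrives
# through the API's config, the model is loaded once at initialization,
# and predict() runs inference in-process, avoiding the extra network
# hop to TensorFlow Serving.
class PythonPredictor:
    def __init__(self, config):
        model_path = config["model_path"]  # hypothetical config key
        self.model = self._load_model(model_path)

    def predict(self, payload):
        # In-process inference: no request/response serialization hop.
        return self.model(payload)

    def _load_model(self, path):
        # Stand-in loader so the sketch runs without TensorFlow; a real
        # implementation would deserialize the saved model from `path`.
        return lambda x: {"loaded_from": path, "n_inputs": len(x)}
```

Because the model lives in the same process as the request handler, the 40 MB tensor never has to be re-encoded for a second service.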
@lminer I'm glad to hear you got it working! I'd like to keep this issue open for now, since I have one more theory I'd like to try out (it still seems like it takes too long if it were only a matter of networking). When you were passing in the data, how was the payload structured?
@deliahu Unfortunately I can't send along the model. The data object I was sending was as follows:
@lminer just to confirm, was the data a plain Python list?
Yeah, that's correct. The same thing holds with a numpy array as well.
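The list-versus-array question matters because a float sent as JSON text costs several times more bytes (and CPU time) than the same float in a raw binary buffer. A small stdlib-only sketch of the size difference (the array length is illustrative):

```python
import json
from array import array

# Compare the size of a tensor encoded as a JSON list of floats versus
# a raw float32 buffer: each float becomes roughly a dozen characters
# of text, versus exactly 4 bytes of binary.
values = [0.123456789] * 100_000

json_bytes = json.dumps(values).encode()
raw_bytes = array("f", values).tobytes()  # float32 buffer

print(f"JSON:   {len(json_bytes):,} bytes")
print(f"binary: {len(raw_bytes):,} bytes")
print(f"ratio:  {len(json_bytes) / len(raw_bytes):.1f}x")
```

The text encoding also has to be parsed back into floats server-side, so the ratio understates the CPU cost of the JSON path.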
Closing issue because cortex local is no longer supported. |
I'm finding that my tensorflow model is ~6X slower when run from the local server than when it is run from a jupyter notebook. I've checked nvtop while the model is running, and it appears that the GPU is being used, although only for a very brief portion of the overall time. I've also tried running the model in BentoML; in that case it's also slower, but only 3X. Speeds are comparable when I run from AWS, although I'm using a T4 in that case rather than the RTX 2080 Ti that I use locally. Any suggestions on how I might diagnose the cause of the slowdown? Here are my config files: