
Unload Models on GPU on Run #93

Open

JRegimbal opened this issue Aug 9, 2021 · 8 comments

@JRegimbal (Collaborator)

This is a possible approach to #85. To avoid constantly high memory use, models should free their allocated GPU memory after preprocessing. Since preprocessors currently don't run in parallel, this should avoid OOM problems. This is currently necessary for the following models:

  • Object Detection
  • Semantic Segmentation

The latency added by loading/unloading the models on request should be measured as well.
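
As a rough sketch of what "free the allocated GPU memory after preprocessing" could look like for a PyTorch-based preprocessor (the `build_model`/`run_preprocessor` names are hypothetical, not actual project code):

```python
import torch

def run_preprocessor(build_model, batch, device="cuda"):
    """Load a model onto the GPU, run it once, then release the GPU memory."""
    model = build_model().to(device)      # load weights onto the GPU on demand
    model.eval()
    with torch.no_grad():
        output = model(batch.to(device))  # single (non-parallel) preprocessing run
    output = output.cpu()                 # keep only a CPU copy of the result
    model.to("cpu")                       # move the weights off the GPU
    del model
    torch.cuda.empty_cache()              # release PyTorch's cached GPU blocks
    return output
```

The `del` plus `torch.cuda.empty_cache()` step matters because PyTorch's caching allocator otherwise keeps the freed blocks reserved, which is what shows up as used memory in `nvidia-smi`.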

@JRegimbal (Collaborator, Author)

@rohanakut please add anything missing and assign this to whoever is best suited for it.

@gp1702 (Contributor) commented Aug 12, 2021

For now, using smaller models seems to be the only option for reducing the memory requirements, and as mentioned previously, this would come at the cost of accuracy.

For example, the performance table on the following page compares the accuracy of the (smaller) MobileNet models with other, larger models: https://github.com/CSAILVision/semantic-segmentation-pytorch.

@JRegimbal (Collaborator, Author)

So you're saying that unloading after a run won't free enough memory? Most of these models won't be running concurrently, at least for now.

@gp1702 (Contributor) commented Aug 13, 2021

Unloading these models will free up GPU memory, but it will increase the latency of generating outputs for future queries. Testing the latency would make sense once the models are integrated into the general pipeline of the project; the standalone latency of these models has already been tested and is reported at the link in my previous message.
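
Once that integration point exists, a minimal timing sketch (assuming the preprocessors are ordinary PyTorch modules; the function name is illustrative) could wrap the whole load-run-unload cycle so the added latency is captured per request:

```python
import time
import torch

def timed_request(model, sample, device="cuda"):
    """Wall-clock time of one request, including the load/unload overhead."""
    start = time.perf_counter()
    model.to(device)                      # "load": move weights onto the GPU
    with torch.no_grad():
        _ = model(sample.to(device))
    torch.cuda.synchronize()              # make sure the GPU work has actually finished
    model.to("cpu")                       # "unload": move weights back to host RAM
    torch.cuda.empty_cache()
    return time.perf_counter() - start
```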

@JRegimbal (Collaborator, Author)

The semseg and object detection models are already integrated into the pipeline. What models are currently blocking?

@jeffbl (Member) commented Aug 16, 2021

Assigning also to @SiddharthRaoA based on the tech-arch meeting today, since he has experience doing this. Again, this probably falls in the "bandaid" category, since in production, with simultaneous requests from different users, models will probably have to be loaded all the time. But this could be a long-term solution for test servers and local testing, so it is still useful even if it cannot be used in production.

@SiddharthRaoA (Contributor)

I tried to reduce the memory consumption of the chart processor by loading the models only when they are needed, and unloading them after processing.

(For context, the chart pipeline has 5 models: 1 chart-type classifier and 4 type-specific models. The line chart category has 2 type-specific models, while the bar and pie chart categories have 1 each, so line charts require the heaviest processing.)

The chart-type classifier is always kept loaded, while the type-specific models are loaded only when required. This ensures that the type-specific models occupy GPU memory only when the request contains a chart of that category. In the idle state, only the chart classifier is on the GPU. I tried the following two options for this:

  1. The type-specific models reside on disk as .pt or .pkl files and are loaded from these files when needed. They don't take up any RAM or GPU memory when not being used. However, loading models from these files is quite slow, which increases the processing time.
  2. The type-specific models reside in RAM when not being used and are simply moved from RAM to the GPU when needed. This is much faster than loading from .pth or .pkl files, but also consumes more RAM. There also seems to be a memory leak issue with this option, where the RAM usage keeps building up until everything is exhausted. I haven't found a fix for this yet, but we'll need to find one soon if we're going ahead with this option.

Note: Both these methods clear the GPU cache after a model is done using it, to prevent PyTorch from holding on to the memory even after use.
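
A rough sketch of the two options, assuming the type-specific models are ordinary PyTorch modules saved as whole-model checkpoints (paths and function names are hypothetical):

```python
import torch

# Option 1: the model lives on disk as a checkpoint and is loaded straight to the GPU per request.
def run_from_disk(checkpoint_path, sample, device="cuda"):
    model = torch.load(checkpoint_path, map_location=device)  # slow: disk -> GPU every time
    model.eval()
    with torch.no_grad():
        out = model(sample.to(device)).cpu()
    del model
    torch.cuda.empty_cache()              # release the cached GPU blocks after use
    return out

# Option 2: the model stays resident in CPU RAM and is shuttled to the GPU per request.
def run_from_ram(model, sample, device="cuda"):
    model.to(device)                      # fast: host RAM -> GPU copy
    with torch.no_grad():
        out = model(sample.to(device)).cpu()
    model.to("cpu")                       # return the weights to host memory
    torch.cuda.empty_cache()
    return out
```

For option 2, one common source of steadily growing RAM is holding references to per-request outputs or intermediate tensors across requests, so checking that results are dropped after each call may help when chasing the leak described above.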

Below are the processing time and memory usage for both options, tested on the line chart category (the category requiring the most processing time). These results were obtained on Unicorn, and the models are bound to be much faster on Bach.

| Mode | Response time | RAM usage | Idle GPU usage | Peak GPU usage (less than a few ms) |
|------|---------------|-----------|----------------|-------------------------------------|
| 1    | 8-9 s         | 4 GB      | 1.5 GB         | 3.2 GB                              |
| 2    | 6 s           | ---       | 1.5 GB         | 3.2 GB                              |

@jeffbl (Member) commented Apr 21, 2022

Assigning to @rianadutta for consideration.
