Fix memory leak of DLManagedTensor #1361

Merged: 2 commits into InternLM:main on Mar 28, 2024

Conversation

ispobock (Contributor)

Motivation

#1334

The following test script, run under valgrind, can help reproduce and debug this issue:

import asyncio
from lmdeploy import GenerationConfig, TurbomindEngineConfig, pipeline

engine_config = TurbomindEngineConfig(tp=1)
engine = pipeline(model_path='/workdir/llama2_13b_chat',
                  model_name='llama2',
                  backend_config=engine_config)

async def infer(prompt, session_id):
    async for output in engine.generate(messages=prompt,
                                        session_id=session_id,
                                        stream_response=True,
                                        do_preprocess=False):
        print(output.response)


prompts = ['hi']*10

for i, prompt in enumerate(prompts):
    asyncio.run(infer(prompt, i))

Run it under valgrind:

valgrind --leak-check=full python test_lmdeploy.py > log.txt 2>&1

Valgrind reports several "definitely lost" blocks at bind.cpp:125 and bind.cpp:308, both related to DLManagedTensor.
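
For context, this is how that kind of leak typically arises with DLPack. A minimal, hypothetical sketch of the leaky pattern (the function name is illustrative, not the actual bind.cpp code):

#include <dlpack/dlpack.h>
#include <memory>

// Leaky pattern: ownership escapes the unique_ptr via release(), but no
// deleter is installed, so the consumer has no way to free the struct.
// Valgrind reports such blocks as "definitely lost".
DLManagedTensor* export_tensor_leaky() {
    auto managed = std::make_unique<DLManagedTensor>();
    managed->deleter = nullptr;  // nothing will ever free this struct
    return managed.release();
}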

After this change:

  • Rerunning the valgrind test shows no memory leak of DLManagedTensor.
  • Testing the api_server with 5000 prompts, memory usage only grows from 3110M to 3330M.

Modification

  • Remove the unique pointer and add a deleter for DLManagedTensor (see the sketch below).
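
A minimal sketch of the pattern this bullet describes, assuming the struct only needs to free itself (names are illustrative, not the PR's exact code):

#include <dlpack/dlpack.h>

// Fixed pattern: allocate with plain new and install a deleter that the
// consumer (e.g. the Python capsule destructor) calls when it is done,
// which frees the struct and closes the leak.
DLManagedTensor* export_tensor_fixed() {
    auto* managed = new DLManagedTensor{};
    managed->deleter = [](DLManagedTensor* self) { delete self; };
    return managed;
}

With this contract, whoever consumes the tensor is responsible for invoking the deleter exactly once, which is why unique_ptr ownership on the producer side is no longer needed.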

ispobock (Contributor, Author)

With 20000 prompts, memory usage stabilizes at 3515M.

zhyncs (Contributor) commented Mar 28, 2024

LGTM

ispobock (Contributor, Author)

@AllentDan @grimoire could you help review?

zhyncs (Contributor) commented Mar 28, 2024

This fix resolves the memory leak in the API server, verified with the commands below. Could you take a look? Thanks. @lvhan028 @grimoire @AllentDan

python3 -m lmdeploy serve api_server /workdir/llama2_13b_chat

python3 benchmark/profile_restful_api.py --server_addr 127.0.0.1:23333 --tokenizer_path /workdir/llama2_13b_chat --dataset /workdir/ShareGPT_V3_unfiltered_cleaned_split.json --concurrency 128 --num_prompts 50000

grimoire requested a review from AllentDan on Mar 28, 2024 03:56
lvhan028 merged commit 971d81c into InternLM:main on Mar 28, 2024
9 checks passed
ispobock mentioned this pull request on Mar 28, 2024