Fix memory leak of DLManagedTensor #1361

Merged: 2 commits into InternLM:main on Mar 28, 2024

Conversation

ispobock (Contributor)

Motivation

#1334

The following test script, run under valgrind, can help reproduce and debug this issue:

import asyncio
from lmdeploy import GenerationConfig, TurbomindEngineConfig, pipeline

engine_config = TurbomindEngineConfig(tp=1)
engine = pipeline(model_path='/workdir/llama2_13b_chat',
                  model_name='llama2',
                  backend_config=engine_config)

async def infer(prompt, session_id):
    async for output in engine.generate(messages=prompt,
                                        session_id=session_id,
                                        stream_response=True,
                                        do_preprocess=False):
        print(output.response)


prompts = ['hi']*10

for i, prompt in enumerate(prompts):
    asyncio.run(infer(prompt, i))

Run it under valgrind:

valgrind --leak-check=full python test_lmdeploy.py > log.txt 2>&1

Valgrind reports several "definitely lost" blocks at bind.cpp:125 and bind.cpp:308, both related to DLManagedTensor.
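
For context, this is how that kind of leak typically arises with DLPack. A minimal, hypothetical sketch of the leaky pattern (the function name is illustrative, not the actual bind.cpp code):

#include <dlpack/dlpack.h>
#include <memory>

// Leaky pattern: ownership escapes the unique_ptr via release(), but no
// deleter is installed, so the consumer has no way to free the struct.
// Valgrind reports such blocks as "definitely lost".
DLManagedTensor* export_tensor_leaky() {
    auto managed = std::make_unique<DLManagedTensor>();
    managed->deleter = nullptr;  // nothing will ever free this struct
    return managed.release();
}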

After this change:

  • Rerunning the valgrind test shows no memory leak of DLManagedTensor.
  • Testing the api_server with 5000 prompts, memory usage only grows from 3110M to 3330M.

Modification

  • Remove the unique pointer and add a deleter for DLManagedTensor (see the sketch below).
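
A minimal sketch of the pattern this bullet describes, assuming the struct only needs to free itself (names are illustrative, not the PR's exact code):

#include <dlpack/dlpack.h>

// Fixed pattern: allocate with plain new and install a deleter that the
// consumer (e.g. the Python capsule destructor) calls when it is done,
// which frees the struct and closes the leak.
DLManagedTensor* export_tensor_fixed() {
    auto* managed = new DLManagedTensor{};
    managed->deleter = [](DLManagedTensor* self) { delete self; };
    return managed;
}

With this contract, whoever consumes the tensor is responsible for invoking the deleter exactly once, which is why unique_ptr ownership on the producer side is no longer needed.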

ispobock (Contributor, Author)

With 20000 prompts, memory usage stabilizes at 3515M.

zhyncs (Contributor) commented Mar 28, 2024

LGTM

ispobock (Contributor, Author)

@AllentDan @grimoire could you help review?

zhyncs (Contributor) commented Mar 28, 2024

This fix resolves the memory leak in the API server, verified with the commands below. Could you take a look? Thanks. @lvhan028 @grimoire @AllentDan

python3 -m lmdeploy serve api_server /workdir/llama2_13b_chat

python3 benchmark/profile_restful_api.py --server_addr 127.0.0.1:23333 --tokenizer_path /workdir/llama2_13b_chat --dataset /workdir/ShareGPT_V3_unfiltered_cleaned_split.json --concurrency 128 --num_prompts 50000

grimoire requested a review from AllentDan on Mar 28, 2024 03:56
lvhan028 merged commit 971d81c into InternLM:main on Mar 28, 2024
9 checks passed
ispobock mentioned this pull request on Mar 28, 2024