fix multinomial sampling #1228

Merged: 2 commits merged into InternLM:main on Mar 3, 2024
Conversation

@grimoire (Collaborator) commented Mar 2, 2024

No description provided.
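(Since the PR carries no description, the following is only a generic illustration of how multinomial sampling with top-k/top-p filtering is commonly written in PyTorch engines. Function name, defaults, and logic are assumptions for context, not the actual change in this PR.)

```python
import torch

def sample_multinomial(logits: torch.Tensor,
                       top_k: int = 40,
                       top_p: float = 0.8,
                       temperature: float = 0.6) -> torch.Tensor:
    """Sample one token id per row from [batch, vocab] logits (illustrative sketch)."""
    logits = logits / temperature
    if top_k > 0:
        # Mask everything below the k-th largest logit.
        kth = torch.topk(logits, top_k, dim=-1).values[..., -1, None]
        logits = logits.masked_fill(logits < kth, float('-inf'))
    probs = torch.softmax(logits, dim=-1)
    if top_p < 1.0:
        # Nucleus filtering: drop the tail of the sorted cumulative distribution.
        sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
        cum = torch.cumsum(sorted_probs, dim=-1)
        sorted_probs[cum - sorted_probs > top_p] = 0.0
        probs = torch.zeros_like(probs).scatter_(-1, sorted_idx, sorted_probs)
        probs = probs / probs.sum(dim=-1, keepdim=True)
    return torch.multinomial(probs, num_samples=1)
```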

@lvhan028 (Collaborator) commented Mar 2, 2024

pipeline test result:

falcon-7b tp=2 failed

2024-03-02 15:45:58,440 - lmdeploy - ERROR - rank[0] failed with error: CUDA out of memory. Tried to allocate 1.73 GiB. GPU 0 has a total capacty of 79.21 GiB of which 1.47 GiB is free. Process 140034 has 1016.00 MiB memory in use. Process 140106 has 76.75 GiB memory in use. Of the allocated memory 74.76 GiB is allocated by PyTorch, and 424.71 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
internlm-chat-7b, tp=2, repetition_penalty=1.002 failed
internlm-chat-20b, tp=2, repetition_penalty=1.002 failed
baichuan2/Baichuan2-7B-Chat, tp=2, repetition_penalty=1.002 failed
baichuan2/Baichuan2-13B-Chat, tp=2, repetition_penalty=1.002 failed
chatglm2-6b, tp=2, repetition_penalty=1.002 failed
chatglm3-6b, tp=2, repetition_penalty=1.002 failed
gemma, tp=2, repetition_penalty=1.002 failed

File "/workspace/lmdeploy/lmdeploy/pytorch/engine/logits_process.py", line 49, in _process_repetition_penalty
    scores.scatter_(1, input_ids, score)
RuntimeError: scatter(): Expected self.dtype to be equal to src.dtype
2024-03-02 15:56:07,688 - lmdeploy - ERROR - Engine main loop stopped.
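
For context, the scatter dtype error above is the kind that appears when the penalized scores get upcast (for example, float16 logits multiplied by a float32 penalty tensor) before being written back. A minimal sketch of a repetition-penalty step that casts back to the logits dtype; the function name, shapes, and penalty handling here are assumptions, not the actual patch:

```python
import torch

def apply_repetition_penalty(scores: torch.Tensor,
                             input_ids: torch.LongTensor,
                             penalty: torch.Tensor) -> torch.Tensor:
    """Penalize logits of tokens that already appeared in input_ids.

    scores:    [batch, vocab] logits, possibly float16
    input_ids: [batch, seq_len] previously seen token ids
    penalty:   [batch, 1] penalty factors, often kept in float32
    """
    score = torch.gather(scores, 1, input_ids)
    # Multiplying float16 logits by a float32 penalty upcasts the result to
    # float32; scatter_ then rejects the dtype mismatch seen in the log above.
    score = torch.where(score < 0, score * penalty, score / penalty)
    scores.scatter_(1, input_ids, score.to(scores.dtype))
    return scores
```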

Sharing the pipeline test script:

from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig
import multiprocessing

models = [
    ('llama2', '/workspace/models-140/llama2/huggingface/llama-2-7b-chat/'),
    ('llama2', '/workspace/models-140/llama2/huggingface/llama-2-13b-chat/'),
    ('internlm2-chat-7b', '/workspace/models-140/InternLM/internlm2-chat-7b'),
    ('internlm2-chat-20b', '/workspace/models-140/InternLM/internlm2-chat-20b'),
    ('internlm-chat-7b', '/workspace/models-140/InternLM/internlm-chat-7b'),
    ('internlm-chat-20b', '/workspace/models-140/InternLM/internlm-chat-20b'),
    # ('qwen-7b', '/workspace/models-140/Qwen/Qwen-7B-Chat/'), # not supported yet
    # ('qwen-14b', '/workspace/models-140/Qwen/Qwen-14B-Chat/'), # not supported yet
    ('qwen1.5', '/workspace/models-140/Qwen/Qwen1.5-7B-Chat/'),
    # ('baichuan', '/workspace/models-140/baichuan/Baichuan-13B-Chat/'), # transformers version too high
    ('baichuan2', '/workspace/models-140/baichuan2/Baichuan2-7B-Chat/'),
    ('baichuan2', '/workspace/models-140/baichuan2/Baichuan2-13B-Chat/'),
    ('codellama', '/workspace/models-140/codellama/CodeLlama-7b-Instruct-hf/'),
    ('chatglm2', '/workspace/models-140/chatglm2-6b/'),
    ('chatglm3', '/workspace/models-140/chatglm3-6b/'),
    ('falcon', '/workspace/models-142/models/falcon-7b-instruct/'),
    ('yi', '/workspace/models-140/Yi/Yi-34B-Chat/'),
    ('mistral', '/workspace/models-140/mistralai/models--mistralai--Mistral-7B-Instruct-v0.1/snapshots/9ab9e76e2b09f9f29ea2d56aa5bd139e4445c59e'),
    ('deepseek', '/workspace/models-140/deepseek/deepseek-coder-1.3b-instruct'),
    ('mixtral', '/workspace/models-140/mistralai/Mixtral-8x7B-Instruct-v0.1/'),
    ('gemma', '/workspace/models-140/Gemma/gemma-7b-it')
]


def test_pipeline(model_path, prompts, **kwargs):
    print(f'-- start to test model: {model_path}')
    try:
        if kwargs:
            print(f'kwargs: {kwargs}')
            backend_config = PytorchEngineConfig()
            gen_config = GenerationConfig()
            for k, v in kwargs.items():
                if hasattr(backend_config, k):
                    setattr(backend_config, k, v)
                if hasattr(gen_config, k):
                    setattr(gen_config, k, v)
            print(backend_config)
        else:
            print('empty kwargs')
            backend_config = PytorchEngineConfig()
            gen_config = None
        pipe = pipeline(model_path, backend_config=backend_config, log_level='INFO')
        response = pipe(prompts, gen_config=gen_config)
        print(response)
        print('-- test succeeded')
    except Exception as e:
        print(f'-- test model failed with {e}')
        raise RuntimeError('build pipe failed') from e


if __name__ == '__main__':
    # pytorch engine default parameters
    for model_name, model_path in models:
        args = (model_path, ["Hi, pls intro yourself", "Shanghai is"], )
        if model_name == 'mixtral':
            # at least 2 GPUs are required
            continue
        proc = multiprocessing.Process(target=test_pipeline, args=args)
        proc.start()
        proc.join()

    # pytorch engine tp
    for _, model_path in models:
        args = (model_path, ["Hi, pls intro yourself", "Shanghai is"], )
        proc = multiprocessing.Process(target=test_pipeline, args=args, kwargs=dict(tp=2))
        proc.start()
        proc.join()

    # generate config
    for _, model_path in models:
        args = (model_path, ["Hi, pls intro yourself", "Shanghai is"], )
        proc = multiprocessing.Process(
            target=test_pipeline, 
            args=args, 
            kwargs=dict(tp=2,
                        top_k=40,
                        top_p=0.8,
                        temperature=0.6,
                        repetition_penalty=1.002)
            )
        proc.start()
        proc.join()

@grimoire (Collaborator, Author) commented Mar 3, 2024

The falcon tp error will be fixed in another PR.

@lvhan028 merged commit 79ac87b into InternLM:main on Mar 3, 2024
5 checks passed