
Support gemma model in pytorch engine #1184

Merged · 32 commits into InternLM:main on Feb 29, 2024

Conversation

grimoire (Collaborator) commented Feb 22, 2024

@grimoire grimoire added the enhancement New feature or request label Feb 22, 2024
@grimoire grimoire linked an issue Feb 22, 2024 that may be closed by this pull request
Review threads on README.md and lmdeploy/cli/cli.py were marked outdated and resolved.
zhyncs (Contributor) commented Feb 22, 2024

I've tested it with profile_restful_api.py and profile_throughput.py for google/gemma-2b-it and it works well.

zhyncs (Contributor) commented Feb 22, 2024

Hi @grimoire, have you compared the output results with those of transformers?

grimoire (Collaborator, Author) commented Feb 22, 2024

> Hi @grimoire, have you compared the output results with those of transformers?

I have aligned GemmaDecoderLayer with my debug tools. The patched module can be aligned with the transformers module (without the patched RMS norm). Since the only difference between LlamaModel and GemmaModel is hidden_states = hidden_states * (self.config.hidden_size**0.5), I think that should be enough.

The patched RMS norm looks OK in the chat API; I guess the mismatch in the norm is caused by the scaling of hidden_states.
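
For reference, a minimal sketch of the two places where Gemma diverges from Llama, assuming the standard transformers semantics (this is not the lmdeploy patch itself):

import torch

def gemma_scale_embeddings(inputs_embeds: torch.Tensor, hidden_size: int) -> torch.Tensor:
    # Gemma multiplies the input embeddings by sqrt(hidden_size) before the
    # first decoder layer; Llama feeds the embeddings through unscaled.
    return inputs_embeds * (hidden_size**0.5)

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6,
             gemma_style: bool = False) -> torch.Tensor:
    variance = x.float().pow(2).mean(-1, keepdim=True)
    normed = x.float() * torch.rsqrt(variance + eps)
    # Gemma's norm scales by (1 + weight); Llama's scales by weight directly.
    scale = (1.0 + weight.float()) if gemma_style else weight.float()
    return (normed * scale).to(x.dtype)

With the embeddings pre-scaled, a residual divergence observed at the norm output would just be this scaling propagating through, which matches the guess above.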

zhyncs (Contributor) commented Feb 23, 2024

Hi @grimoire, please resolve the conflicts.

@lvhan028 lvhan028 self-requested a review February 28, 2024 06:55
lvhan028 (Collaborator) commented

@zhulinJulia24 may perform regression tests for pytorch engine

lvhan028 (Collaborator) commented

UT failed:

>           assert q.stride() == out_q.stride()
E           AttributeError: 'NoneType' object has no attribute 'stride'

Is it this PR's issue or our runner's issue?
@RunningLeon may check the runner.

RunningLeon (Collaborator) commented

Tested OK with gemma and llama2 (a minimal pipeline sketch follows the list):

  • chat cli
  • pipeline
  • api_server
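
A minimal sketch of the pipeline path with gemma, mirroring the public API used in the repro script later in this thread (the checkpoint name and prompt are illustrative):

from lmdeploy import pipeline, PytorchEngineConfig

# Build a pipeline backed by the pytorch engine, as exercised in this PR.
pipe = pipeline('google/gemma-2b-it',
                backend_config=PytorchEngineConfig(session_len=2048))
print(pipe(['Hi, please introduce yourself']))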

RunningLeon (Collaborator) left a review comment:

LGTM

zhulinJulia24 (Collaborator) commented

Error when running the Baichuan2-7B-Chat model; it works on the main branch.

Script:

from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig

backend_config = PytorchEngineConfig(session_len=2048)
gen_config = GenerationConfig(top_p=0.8,
                              top_k=40,
                              temperature=0.8,
                              max_new_tokens=1024)
pipe = pipeline('/nvme/qa_test_models/Baichuan2-7B-Chat',
                backend_config=backend_config)
prompts = [[{
    'role': 'user',
    'content': 'Hi, pls intro yourself'
}], [{
    'role': 'user',
    'content': 'Shanghai is'
}]]
response = pipe(prompts, gen_config=gen_config)
print(response)

Error callstack:

Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers
pip install xformers.
/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Exception in thread Thread-1 (loop):
Traceback (most recent call last):
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/threading.py", line 946, in run
    self._target(*self._args, **self._kwargs)
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 874, in loop
    step_tokens: Dict[int, InferOutput] = self.step(
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 668, in step
    output = self._model_forward(inputs,
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 641, in _model_forward
    return __forward(inputs)
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 583, in __forward
    return self.model_agent.forward(inputs,
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 522, in forward
    output = model_forward(
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 335, in model_forward
    output = patched_model.patched_forward(
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/models/patch.py", line 239, in __call__
    output = self._model(*args, **kwargs)
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhulin1/.cache/huggingface/modules/transformers_modules/Baichuan2-7B-Chat/modeling_baichuan.py", line 686, in forward
    outputs = self.model(
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/models/baichuan.py", line 470, in forward
    return self._continuous_batching_forward_7b(
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/models/baichuan.py", line 349, in _continuous_batching_forward_7b
    layer_outputs = decoder_layer(
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhulin1/.cache/huggingface/modules/transformers_modules/Baichuan2-7B-Chat/modeling_baichuan.py", line 273, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/models/baichuan.py", line 87, in forward
    return self._contiguous_batching_forward(
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/models/baichuan.py", line 142, in _contiguous_batching_forward
    query_states, key_states, value_states = _rotary_emb_fn(
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/models/baichuan.py", line 131, in _rotary_emb_fn
    query_states, key_states = apply_rotary_pos_emb(
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/models/llama.py", line 43, in apply_rotary_pos_emb
    cos = cos.unsqueeze(unsqueeze_dim)
TypeError: unsqueeze(): argument 'dim' (position 1) must be int, not Tensor
2024-02-29 09:04:07,433 - lmdeploy - ERROR - Engine main loop stopped.

zhulinJulia24 (Collaborator) commented Feb 29, 2024

The gemma model needs transformers>=4.38.1. Should we put a notice in the README?

With transformers == 4.37.1, the error is:

Traceback (most recent call last):
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1117, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 813, in __getitem__
    raise KeyError(key)
KeyError: 'gemma'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zhulin1/reproduceNew.py", line 8, in <module>
    pipe = pipeline('/nvme/qa_test_models/gemma-7b-it',
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/api.py", line 62, in pipeline
    return AsyncEngine(model_path,
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 75, in __init__
    self._build_pytorch(model_path=model_path,
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/serve/async_engine.py", line 203, in _build_pytorch
    self.engine = Engine(model_path=model_path,
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/engine/engine.py", line 133, in __init__
    self.model_agent = AutoModelAgent.from_pretrained(
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 427, in from_pretrained
    return build_model_agent(pretrained_model_name_or_path,
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/engine/model_agent.py", line 1023, in build_model_agent
    model_config = ModelConfig.from_pretrained(
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/lmdeploy/pytorch/config.py", line 72, in from_pretrained
    hf_config = AutoConfig.from_pretrained(
  File "/home/zhulin1/miniconda3/envs/lmdeployv23/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1119, in from_pretrained
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `gemma` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

grimoire (Collaborator, Author) commented

> The gemma model needs transformers>=4.38.0. Should we put a notice in the README?

The requirements have been updated to transformers>=4.33.0,<=4.38.1.
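
For illustration, a hypothetical runtime guard that would surface the version requirement early (not necessarily what lmdeploy does; its actual enforcement lives in the requirements pins):

# Hypothetical guard, for illustration only.
import transformers
from packaging import version

if version.parse(transformers.__version__) < version.parse('4.38.0'):
    raise RuntimeError(
        'Loading gemma checkpoints requires transformers>=4.38.0; '
        f'found {transformers.__version__}.')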

grimoire (Collaborator, Author) commented

@zhulinJulia24

> Error when running the Baichuan2-7B-Chat model; it works on the main branch.

Fixed.
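
For context, the TypeError in the traceback above reduces to passing a Tensor where torch.Tensor.unsqueeze expects a plain Python int. A minimal repro of the error class (hypothetical, not the actual lmdeploy call site):

import torch

cos = torch.randn(4, 8)
dim = torch.tensor(1)           # a 0-d Tensor, not an int
# cos.unsqueeze(dim)            # TypeError: unsqueeze(): argument 'dim' (position 1) must be int, not Tensor
cos = cos.unsqueeze(int(dim))   # fine once the dim is a plain int
print(cos.shape)                # torch.Size([4, 1, 8])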

zhulinJulia24 (Collaborator) left a review comment:

LGTM

Covered:
- llama-2-7b-chat
- internlm-chat-7b
- internlm-chat-20b
- internlm2-chat-7b
- internlm2-chat-20b
- Baichuan2-7B-Chat
- Baichuan2-13B-Chat
- chatglm2-6b
- falcon-7b
- Yi-6B-Chat
- internlm2-1_8b
- internlm2-20b
- Qwen1.5-7B-Chat
- Mistral-7B-Instruct-v0.1
- Mixtral-8x7B-Instruct-v0.1
- gemma-7b-it
- deepseek-moe-16b-chat
Tools: hf chat, RESTful API, and pipeline basic cases.

@lvhan028 lvhan028 changed the title Torch Gemma Support gemma model in pytorch engine Feb 29, 2024
@lvhan028 lvhan028 merged commit 456055f into InternLM:main Feb 29, 2024
4 checks passed
Labels: enhancement (New feature or request)
Projects: none yet
Development

Successfully merging this pull request may close these issues.

[Feature] support google gemma
5 participants