
convert_checkpoint.py failed with Llama 3.1 8B Instruct #2105

@ShuaiShao93

Description


System Info

Debian 11

Who can help?

@byshiue @juney-nvidia

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

$ git clone https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct

$ python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir ./Meta-Llama-3.1-8B-Instruct --output_dir ./tllm_checkpoint_1gpu_bf16 --dtype bfloat16
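
For context, the Llama 3.1 checkpoints ship a rope_scaling block in config.json that older converters do not recognize. A minimal sketch to inspect it (assuming the clone above completed and config.json is present):

```python
import json

# Read the HF config that convert_checkpoint.py parses.
with open("./Meta-Llama-3.1-8B-Instruct/config.json") as f:
    cfg = json.load(f)

# Llama 3.1 introduced a "llama3" rope-scaling scheme whose dict is
# keyed by "rope_type" rather than the older "type" key.
print(cfg.get("rope_scaling"))
```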

Expected behavior

convert_checkpoint.py should work with Llama 3.1.

Actual behavior

[TensorRT-LLM] TensorRT-LLM version: 0.11.0
0.11.0
Traceback (most recent call last):
  File "/home/ss/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 461, in <module>
    main()
  File "/home/ss/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 453, in main
    convert_and_save_hf(args)
  File "/home/ss/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 378, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/home/ss/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 402, in execute
    f(args, rank)
  File "/home/ss/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 367, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 328, in from_hugging_face
    model = LLaMAForCausalLM(config)
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 361, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 267, in __init__
    transformer = LLaMAModel(config)
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 211, in __init__
    self.layers = DecoderLayerList(LLaMADecoderLayer, config)
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 289, in __init__
    super().__init__([cls(config, idx) for idx in self.layer_list])
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 289, in <listcomp>
    super().__init__([cls(config, idx) for idx in self.layer_list])
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 51, in __init__
    self.attention = Attention(
  File "/opt/conda/lib/python3.10/site-packages/tensorrt_llm/layers/attention.py", line 347, in __init__
    assert rotary_embedding_scaling["type"] in ["linear", "dynamic"]
KeyError: 'type'
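
The assert at attention.py:347 never gets to compare values: the Llama 3.1 scaling dict has no "type" key at all (Hugging Face uses "rope_type" for the new "llama3" scheme), so the subscript itself raises KeyError. A minimal sketch of the failure mode, with the dict values assumed from the published Llama 3.1 checkpoint rather than taken from this issue:

```python
# rope_scaling as shipped in the Llama 3.1 config (values assumed
# from the published checkpoint):
rotary_embedding_scaling = {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",  # renamed; older configs used "type"
}

# The 0.11.0 check fails on the lookup itself, before the membership
# test ever runs, because there is no "type" key:
assert rotary_embedding_scaling["type"] in ["linear", "dynamic"]
# KeyError: 'type'
# Even under the renamed key, "llama3" is not "linear" or "dynamic",
# so 0.11.0 could not convert this checkpoint either way.
```

This lines up with the "not a bug" label below: 0.11.0 predates the Llama 3.1 rope-scaling scheme, so converting this checkpoint presumably requires a TensorRT-LLM version that recognizes "rope_type": "llama3".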

Additional notes

N/A

Metadata

Labels

not a bug: Some known limitation, but not a bug.
