llama3-70b int8+kv8 convert checkpoint failed on v0.10.0 branch #1814

Open

NaNAGISaSA opened this issue Jun 20, 2024 · 2 comments

Labels: bug (Something isn't working), waiting for feedback
NaNAGISaSA commented Jun 20, 2024

System Info

  • CPU architecture: x86_64
  • GPU properties
    • GPU name: NVIDIA A100
    • GPU memory size: 40G
  • Libraries
    • TensorRT-LLM branch or tag: v0.10.0
    • Container used: yes, make -C docker release_build on v0.10.0 branch
  • NVIDIA driver version: 525.89.02
  • OS: Ubuntu 22.04

Who can help?

@Tracin @nv-guomingz

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

model_name=llama3_70b
hf_model_dir=/some-path/Meta-Llama-3-70B-Instruct
convert_model_dir=/some-path
trt_engine_dir=/some-path
dtype=bfloat16
tp_size=2 # tp_size=4 and tp_size=8 produce the same error

python3 examples/llama/convert_checkpoint.py --model_dir ${hf_model_dir} \
    --tp_size ${tp_size} \
    --workers ${tp_size} \
    --use_weight_only \
    --weight_only_precision int8 \
    --int8_kv_cache \
    --dtype ${dtype} \
    --output_dir ${convert_model_dir}/${dtype}/${tp_size}-gpu/

Expected behavior

The checkpoint conversion succeeds.

Actual behavior

[TensorRT-LLM] TensorRT-LLM version: 0.10.0
0.10.0
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [00:23<00:00, 1.27it/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
Traceback (most recent call last):
  File "/workspace/volume/wangchao2/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 464, in <module>
    main()
  File "/workspace/volume/wangchao2/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 456, in main
    convert_and_save_hf(args)
  File "/workspace/volume/wangchao2/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 360, in convert_and_save_hf
    LLaMAForCausalLM.quantize(args.model_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 414, in quantize
    convert.quantize(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1387, in quantize
    act_range, llama_qkv_para, llama_smoother = smooth_quant(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1160, in smooth_quant
    tokenizer = AutoTokenizer.from_pretrained(model_dir,
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 883, in from_pretrained
    return tokenizer_class_py.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2089, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2311, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama.py", line 169, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/tokenization_llama.py", line 196, in get_spm_processor
    tokenizer.Load(self.vocab_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
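
The failure appears to come from the tokenizer rather than the quantization itself: the traceback ends in the slow, SentencePiece-based LlamaTokenizer, but the Meta-Llama-3 checkpoints ship only a fast tokenizer (tokenizer.json) and no SentencePiece tokenizer.model, so vocab_file ends up as None and sentencepiece rejects it. A minimal standalone check of that hypothesis (the path is the placeholder from the repro above):

# Standalone check, independent of TensorRT-LLM. Assumes the root cause is
# the slow tokenizer being forced on a checkpoint that only ships tokenizer.json.
from transformers import AutoTokenizer

model_dir = "/some-path/Meta-Llama-3-70B-Instruct"  # placeholder path from the repro

# The fast tokenizer loads fine from tokenizer.json.
tok = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
print(type(tok).__name__)  # PreTrainedTokenizerFast

# Forcing the slow tokenizer reproduces the error above: with no
# tokenizer.model present, vocab_file is None and sentencepiece raises
# "TypeError: not a string".
AutoTokenizer.from_pretrained(model_dir, use_fast=False)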

Additional notes

I also tested llama3-8b (changing hf_model_dir to Meta-Llama-3-8B-Instruct), and that conversion succeeds:

[TensorRT-LLM] TensorRT-LLM version: 0.10.0
0.10.0
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00, 1.36it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
calibrating model: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 512/512 [01:25<00:00, 6.00it/s]
Weights loaded. Total time: 00:00:41
Weights loaded. Total time: 00:00:36
Total time of converting checkpoints: 00:03:31
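
If the analysis above is right, a possible workaround would be to stop forcing the slow tokenizer in the calibration path. An untested sketch; it assumes the AutoTokenizer.from_pretrained call at convert.py line 1160 in the traceback passes use_fast=False:

# Hypothetical patch sketch for smooth_quant() in
# tensorrt_llm/models/llama/convert.py (line 1160 in the traceback above).
# Untested; use_fast=True is the only intended change, so keep whatever
# other keyword arguments the original call passes.
from transformers import AutoTokenizer

model_dir = "/some-path/Meta-Llama-3-70B-Instruct"  # placeholder, as in the repro
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)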

NaNAGISaSA added the bug (Something isn't working) label on Jun 20, 2024
hijkzzz self-assigned this on Jun 23, 2024
hijkzzz (Collaborator) commented Jun 23, 2024

Could you try the latest version, TensorRT-LLM 0.11+? See the tutorial: https://nvidia.github.io/TensorRT-LLM/installation/linux.html
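
Per that page, the wheel can typically be installed with pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com, though the exact command may change between releases, so follow the linked instructions.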

Yoh-Z commented Jun 28, 2024

> Could you try the latest version, TensorRT-LLM 0.11+? See the tutorial: https://nvidia.github.io/TensorRT-LLM/installation/linux.html

Which commit corresponds to version 0.11.0?
