fix export turbomind weight bug #1140

zhyncs · 2024-02-07T08:48:54Z

Motivation

fix export Turbomind weight bug

While developing Medusa parameter conversion, we referred to the implementation of export_weight and found a bug in how parameters are converted for bf16. The fix was proposed by baozhiwei, a new graduate in my group.

Modification

use tensor.to rather than tensor.view

Checklist

Pre-commit or other linting tools are used to fix the potential lint issues.
The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
The documentation has been modified accordingly, like docstring or example tutorials.

zhyncs · 2024-02-07T08:49:29Z

Hi @lvhan028 @grimoire @AllentDan, could you help review this PR? Thanks.

irexyc · 2024-02-07T10:00:08Z

This part is fine. NumPy has no bfloat16 type, so if you want to save the weights as a bin file, you can only save them through a view.
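A minimal sketch of this point, assuming PyTorch and NumPy are installed. The helpers `save_raw` and `load_raw_bf16` are hypothetical names used for illustration and are not lmdeploy's actual functions. The bfloat16 tensor is reinterpreted as int16 so that NumPy can write the same 16-bit payload to disk, and the file is read back and reinterpreted as bfloat16.

```python
import os
import tempfile

import numpy as np
import torch


def save_raw(param: torch.Tensor, path: str) -> None:
    """Hypothetical helper: dump a tensor's raw bytes to a .bin file."""
    if param.dtype == torch.bfloat16:
        # NumPy cannot represent bfloat16, so reinterpret the same 16-bit
        # payload as int16. No values are converted; only the dtype tag changes.
        param = param.view(torch.int16)
    param.contiguous().cpu().numpy().tofile(path)


def load_raw_bf16(path: str, shape) -> torch.Tensor:
    """Read the file back and reinterpret the 16-bit payload as bfloat16."""
    raw = np.fromfile(path, dtype=np.int16)
    return torch.from_numpy(raw).view(torch.bfloat16).reshape(shape)


# All of these values are exactly representable in bfloat16.
weights = torch.tensor([[1.0, -2.5], [0.15625, 3.0]], dtype=torch.bfloat16)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weights.bin")
    save_raw(weights, path)
    file_size = os.path.getsize(path)  # 4 elements * 2 bytes each
    restored = load_raw_bf16(path, weights.shape)
```

The round trip only works if the reader interprets the file as bfloat16. A reader that treats the same bytes as fp16 would see different values.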

RunningLeon · 2024-02-07T10:01:37Z

@zhyncs FYI, https://pytorch.org/docs/stable/generated/torch.Tensor.view.html#torch-tensor-view

zhyncs · 2024-02-07T10:07:02Z

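This part is fine. NumPy has no bfloat16 type, so if you want to save the weights as a bin file, you can only save them through a view.

tensor.view only reinterprets the existing data (its shape or dtype) without converting any values, so it cannot do type conversion from bf16 to half.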
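The difference between reinterpreting bits and converting values can be sketched with the standard library alone. The function names below are hypothetical, and the bf16 encoding here truncates a float32 to its upper 16 bits, which is a simplification (PyTorch rounds instead); for a value like 1.0 the result is the same.

```python
import struct


def float_to_bf16_bits(x: float) -> int:
    """Encode x as bfloat16 by keeping the upper 16 bits of its float32 form."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16


def bf16_bits_to_float(bits: int) -> float:
    """Decode bfloat16 bits by padding them back to a float32."""
    (x,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return x


def reinterpret_as_fp16(bits: int) -> float:
    """Analogue of tensor.view(torch.half): same 16 bits, read as fp16."""
    (x,) = struct.unpack("<e", struct.pack("<H", bits))
    return x


def convert_to_fp16_bits(bits: int) -> int:
    """Analogue of tensor.to(torch.half): keep the value, re-encode as fp16."""
    (out,) = struct.unpack("<H", struct.pack("<e", bf16_bits_to_float(bits)))
    return out


bf16_one = float_to_bf16_bits(1.0)          # bit pattern 0x3F80
viewed = reinterpret_as_fp16(bf16_one)      # 1.875, not 1.0
converted = convert_to_fp16_bits(bf16_one)  # bit pattern 0x3C00, i.e. fp16 1.0
```

So a file holding bf16 bits is only correct for a loader that expects bf16. Read as fp16, the value 1.0 comes back as 1.875.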

irexyc · 2024-02-07T10:08:42Z

lmdeploy/lmdeploy/turbomind/deploy/target_model/base.py

Lines 189 to 195 in c48d32d

    
    if self.to_file:
        if torch.is_floating_point(param):
            torch_type = _weight_dtype_map(self.cfg.weight_type,
                                           torch.float16)
            param = param.to(torch_type)
        tprint(name, param.shape)
        _tofile(param, osp.join(self.out_dir, name))

Take a look at line 191: the intent here is to save bfloat16 in the first place.

zhyncs · 2024-02-07T10:50:38Z


It seems that we had some misunderstandings.

The base model lmsys/vicuna-13b-v1.3 uses half, while FasterDecoding/medusa-vicuna-13b-v1.3 uses bfloat16. When we load the Medusa weights, the llama weight_type field is fp16, so the bf16 instantiation below is not invoked:

lmdeploy/src/turbomind/utils/memory_utils.cu

Lines 588 to 593 in 2831dc2

    #ifdef ENABLE_BF16
    template int loadWeightFromBin(__nv_bfloat16*            ptr,
                                   std::vector<size_t>       shape,
                                   std::string               filename,
                                   FtCudaDataType            model_file_type,
                                   std::vector<ConcateSlice> slices);

That is why the Medusa weights are wrong. To solve this, we need to either add a medusa_weight_type field to the ini, or just convert and save the Medusa weights as half directly.
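The first option could look roughly like the sketch below. It assumes an ini layout with a `[llama]` section. The `medusa_weight_type` field is the one proposed in this comment and does not exist in the current config, and `resolve_weight_types` is a hypothetical helper. When the field is absent, the Medusa weights fall back to the base `weight_type`.

```python
import configparser

# Hypothetical ini content; only weight_type is taken from the discussion,
# medusa_weight_type is the proposed (not yet existing) field.
INI_WITH_FIELD = """
[llama]
weight_type = fp16
medusa_weight_type = bf16
"""

INI_WITHOUT_FIELD = """
[llama]
weight_type = fp16
"""


def resolve_weight_types(ini_text: str):
    """Return (base_type, medusa_type), defaulting medusa_type to base_type."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["llama"]
    base = section.get("weight_type", fallback="fp16")
    medusa = section.get("medusa_weight_type", fallback=base)
    return base, medusa


with_field = resolve_weight_types(INI_WITH_FIELD)
without_field = resolve_weight_types(INI_WITHOUT_FIELD)
```

With such a field, the loader could pick the bf16 path for the Medusa heads while the base model keeps using fp16. The second option avoids any config change by converting the values to half before they are written.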

fix export turbomind weight bug

c48d32d

RunningLeon requested review from irexyc and RunningLeon February 7, 2024 09:57

RunningLeon added and then removed the bug label Feb 7, 2024

zhyncs closed this Feb 7, 2024

zhyncs deleted the patch-3 branch February 20, 2024 08:16

zhyncs mentioned this pull request Jun 6, 2024

[Bug] Why does prefix caching change the generated content #1719

