
The format of the inference result after model merging is not consistent with the base model inference format #297

@Double-bear opened this issue Apr 29, 2024 · 1 comment
Hello, I am merging two Llama 3 8B models. The base model is a new model I obtained by doing SFT on the Llama 3 8B base model. During SFT I changed the inference format; the output format of this new model is:

<|SYSTEM|>{system_messages}<|USER|>{user_content}<|ASSISTANT|><|CONTENT|>{assistant_message}

The prompt is:

<|SYSTEM|>{system_messages}<|USER|>{user_content}<|ASSISTANT|>

and the generation is:

<|CONTENT|>{assistant_message}

Here <|SYSTEM|>, <|USER|>, <|ASSISTANT|>, and <|CONTENT|> replace reserved extra tokens in the Llama 3 vocabulary. I then merged this new model with the Llama 3 8B Instruct model; the YAML file is as follows (a tokenizer sanity check is sketched after the merge command below):

slices:
  - sources:
      - model: /code/xx/LLM_mine/model/fc_model/llama3_hf/8b_stage2_final
        layer_range: [0, 32]
      - model: /code/xx/LLM_mine/model/Llama3/Meta-Llama-3-8B-Instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: /code/xx/LLM_mine/model/fc_model/llama3_hf/8b_stage2_final
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
  embed_slerp: true  
dtype: float16

The merge command:

mergekit-yaml /code/xx/LLM_mine/reference/merge.yaml /code/xx/LLM_mine/model/merge/llama3-instruction-fc_sft-2 --allow-crimes --copy-tokenizer
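(A minimal sanity check, assuming the tokenizer files are saved alongside the weights: each custom marker should encode to exactly one token id in the SFT model's tokenizer, and the same should hold for the tokenizer copied into the merge output directory.)

from transformers import AutoTokenizer

# Sketch only: point this at the SFT model, then at the merge output directory
tok = AutoTokenizer.from_pretrained("/code/xx/LLM_mine/model/fc_model/llama3_hf/8b_stage2_final")
for marker in ["<|SYSTEM|>", "<|USER|>", "<|ASSISTANT|>", "<|CONTENT|>"]:
    ids = tok.encode(marker, add_special_tokens=False)
    print(marker, ids)  # each marker should map to a single token id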

But when I run inference with the merged model, the output format changes: instead of starting with <|CONTENT|>, the output starts with the plain word content. Is there any good way to avoid this?
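(One way to narrow this down, sketched below with placeholder system/user text: decode the merged model's raw output ids without skipping special tokens, to see whether <|CONTENT|> is actually being generated but hidden by the decoding step, or not generated at all.)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged = "/code/xx/LLM_mine/model/merge/llama3-instruction-fc_sft-2"
tokenizer = AutoTokenizer.from_pretrained(merged)
model = AutoModelForCausalLM.from_pretrained(merged, torch_dtype=torch.float16, device_map="auto")

# Placeholder content; the prompt format is the one described above
prompt = "<|SYSTEM|>You are a helpful assistant.<|USER|>Hello!<|ASSISTANT|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
new_ids = out[0, inputs["input_ids"].shape[1]:]

print(new_ids.tolist())                                      # raw generated token ids
print(tokenizer.decode(new_ids, skip_special_tokens=False))  # keeps <|CONTENT|> if it was generated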

Also, may I ask why the merged model is smaller than the original model? It feels like some files were lost.

merged model: [screenshot of the merged model's files]
base model: [screenshot of the base model's files]

Did I do something wrong?

@cg123 (Collaborator) commented May 9, 2024

You may want to play with the interpolation factor for lm_head. Try something along these lines:

slices:
  - sources:
      - model: /code/xx/LLM_mine/model/fc_model/llama3_hf/8b_stage2_final
        layer_range: [0, 32]
      - model: /code/xx/LLM_mine/model/Llama3/Meta-Llama-3-8B-Instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: /code/xx/LLM_mine/model/fc_model/llama3_hf/8b_stage2_final
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - filter: lm_head
      value: 0 # use lm_head from 8b_stage2_final
    - value: 0.5
  embed_slerp: true  
dtype: float16

The later layers are also pretty involved in what output format a model uses, so I'd encourage playing with the self_attn and mlp values as well.

As for the size change - the four-shard model is probably saved in fp32. Your config specifies fp16 as the output dtype so it'll be about half the size. Most LLMs are stored in fp16 or bf16 so I wouldn't worry about loss of precision. If you want to keep full 32 bit precision, though, you can change it to dtype: float32.
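(A quick way to confirm this on disk, as a rough sketch; the shard file name below is illustrative, pick any shard from each model directory and compare the dtypes.)

from safetensors import safe_open

# Open one weight shard and print the dtype of a stored tensor
with safe_open("model-00001-of-00004.safetensors", framework="pt") as f:
    key = next(iter(f.keys()))
    print(key, f.get_tensor(key).dtype)  # e.g. torch.float32 for the 4-shard original vs. torch.float16 for the merge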

Hope this helps!
