
The format of the inference result after model merging is not consistent with the base model inference format #297

@Double-bear opened this issue Apr 29, 2024 · 1 comment
Hello, I am merging two Llama 3 8B models. The base model is a new model I obtained by doing SFT on the Llama 3 8B base model. During SFT I changed the inference format; the output format of this new model is:

<|SYSTEM|>{system_messages}<|USER|>{user_content}<|ASSISTANT|><|CONTENT|>{assistant_message}

The prompt is:

<|SYSTEM|>{system_messages}<|USER|>{user_content}<|ASSISTANT|>

and the generation is:

<|CONTENT|>{assistant_message}

Here <|SYSTEM|>, <|USER|>, <|ASSISTANT|>, and <|CONTENT|> replace reserved extra tokens in the Llama 3 vocabulary. I then merged this new model with the Llama 3 8B Instruct model; the YAML file is as follows (a tokenizer sanity check is sketched after the merge command below):

slices:
  - sources:
      - model: /code/xx/LLM_mine/model/fc_model/llama3_hf/8b_stage2_final
        layer_range: [0, 32]
      - model: /code/xx/LLM_mine/model/Llama3/Meta-Llama-3-8B-Instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: /code/xx/LLM_mine/model/fc_model/llama3_hf/8b_stage2_final
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
  embed_slerp: true  
dtype: float16

The merge command:

mergekit-yaml /code/xx/LLM_mine/reference/merge.yaml /code/xx/LLM_mine/model/merge/llama3-instruction-fc_sft-2 --allow-crimes --copy-tokenizer
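(A minimal sanity check, assuming the tokenizer files are saved alongside the weights: each custom marker should encode to exactly one token id in the SFT model's tokenizer, and the same should hold for the tokenizer copied into the merge output directory.)

from transformers import AutoTokenizer

# Sketch only: point this at the SFT model, then at the merge output directory
tok = AutoTokenizer.from_pretrained("/code/xx/LLM_mine/model/fc_model/llama3_hf/8b_stage2_final")
for marker in ["<|SYSTEM|>", "<|USER|>", "<|ASSISTANT|>", "<|CONTENT|>"]:
    ids = tok.encode(marker, add_special_tokens=False)
    print(marker, ids)  # each marker should map to a single token id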

But when I run inference with the merged model, the output format changes: instead of starting with <|CONTENT|>, the output starts with the plain word content. Is there any good way to avoid this?
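(One way to narrow this down, sketched below with placeholder system/user text: decode the merged model's raw output ids without skipping special tokens, to see whether <|CONTENT|> is actually being generated but hidden by the decoding step, or not generated at all.)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged = "/code/xx/LLM_mine/model/merge/llama3-instruction-fc_sft-2"
tokenizer = AutoTokenizer.from_pretrained(merged)
model = AutoModelForCausalLM.from_pretrained(merged, torch_dtype=torch.float16, device_map="auto")

# Placeholder content; the prompt format is the one described above
prompt = "<|SYSTEM|>You are a helpful assistant.<|USER|>Hello!<|ASSISTANT|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
new_ids = out[0, inputs["input_ids"].shape[1]:]

print(new_ids.tolist())                                      # raw generated token ids
print(tokenizer.decode(new_ids, skip_special_tokens=False))  # keeps <|CONTENT|> if it was generated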

Also, may I ask why the merged model is smaller than the original model? It feels like some files were lost.

merged model: [screenshot of the merged model's files]
base model: [screenshot of the base model's files]

Did I do something wrong?

@cg123 (Collaborator) commented May 9, 2024

You may want to play with the interpolation factor for lm_head. Try something along these lines:

slices:
  - sources:
      - model: /code/xx/LLM_mine/model/fc_model/llama3_hf/8b_stage2_final
        layer_range: [0, 32]
      - model: /code/xx/LLM_mine/model/Llama3/Meta-Llama-3-8B-Instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: /code/xx/LLM_mine/model/fc_model/llama3_hf/8b_stage2_final
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - filter: lm_head
      value: 0 # use lm_head from 8b_stage2_final
    - value: 0.5
  embed_slerp: true  
dtype: float16

The later layers are also pretty involved in what output format a model uses, so I'd encourage playing with the self_attn and mlp values as well.

As for the size change - the four-shard model is probably saved in fp32. Your config specifies fp16 as the output dtype so it'll be about half the size. Most LLMs are stored in fp16 or bf16 so I wouldn't worry about loss of precision. If you want to keep full 32 bit precision, though, you can change it to dtype: float32.
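(A quick way to confirm this on disk, as a rough sketch; the shard file name below is illustrative, pick any shard from each model directory and compare the dtypes.)

from safetensors import safe_open

# Open one weight shard and print the dtype of a stored tensor
with safe_open("model-00001-of-00004.safetensors", framework="pt") as f:
    key = next(iter(f.keys()))
    print(key, f.get_tensor(key).dtype)  # e.g. torch.float32 for the 4-shard original vs. torch.float16 for the merge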

Hope this helps!
