Hello, I am merging two Llama 3 8B models. The base model is a new model I obtained by running SFT on the Llama 3 8B base model. During SFT I changed the prompt and output format of this new model.
The prompt is:
and the generation is:
Here <|SYSTEM|>, <|USER|>, <|ASSISTANT|>, and <|CONTENT|> replace the extra_token entries in the Llama 3 vocabulary. I then merged this new model with the Llama 3 8B Instruct model. The YAML file is as follows:
But when I run inference with the merged model, the output format changes: the output should start with <|CONTENT|>, but it becomes the plain text content instead. Is there a good way to avoid this?
Also, why is the merged model smaller than the original model? It looks as if some files are missing.
merged model:
base model:
Am I doing something wrong?
The later layers are also pretty involved in what output format a model uses, so I'd encourage playing with the self_attn and mlp values as well.
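To get a feel for what a per-filter value list does to the later layers, here is a rough sketch. It assumes the config uses slerp-style `t` values given as a list for `self_attn` and `mlp`, and that such a list is spread linearly across the layers; the helper `layer_values` and the example list are hypothetical, not taken from the config in this issue or from mergekit's source.

```python
def layer_values(gradient, num_layers):
    """Linearly interpolate a short list of anchor values across all layers.

    Assumption: this mirrors how a list-valued merge parameter is spread over
    the layer stack; it is an illustration, not mergekit's own code.
    """
    if len(gradient) == 1 or num_layers == 1:
        return [float(gradient[0])] * num_layers
    values = []
    for layer in range(num_layers):
        # Map the layer index onto the anchor list, then blend the two
        # neighbouring anchors.
        pos = layer / (num_layers - 1) * (len(gradient) - 1)
        lo = min(int(pos), len(gradient) - 2)
        frac = pos - lo
        values.append(gradient[lo] * (1 - frac) + gradient[lo + 1] * frac)
    return values


# Hypothetical self_attn list: with slerp, t=0 keeps the base_model weights and
# t=1 takes the other model, so this one leans on the base model near the end.
self_attn_t = [0.5, 0.5, 0.2, 0.0]
per_layer = layer_values(self_attn_t, 32)  # Llama 3 8B has 32 layers

for idx in (0, 16, 31):
    print(f"layer {idx}: t = {per_layer[idx]:.3f}")
```

Pulling the last entries of both the `self_attn` and `mlp` lists toward the SFT model is one way to try keeping its output format.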
As for the size change - the four-shard model is probably saved in fp32. Your config specifies fp16 as the output dtype so it'll be about half the size. Most LLMs are stored in fp16 or bf16 so I wouldn't worry about loss of precision. If you want to keep full 32 bit precision, though, you can change it to dtype: float32.
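A quick back-of-the-envelope check of the "about half the size" point, assuming a round 8 billion parameters (the exact count for Llama 3 8B is slightly higher) and ignoring file metadata; `checkpoint_size_gib` is just an illustrative helper.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def checkpoint_size_gib(num_params, dtype):
    """Approximate on-disk size of the weights in GiB for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


NUM_PARAMS = 8_000_000_000  # assumption: rounded parameter count for an 8B model

for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{checkpoint_size_gib(NUM_PARAMS, dtype):.1f} GiB")
```

So a checkpoint saved in fp16 taking roughly half the space of an fp32 one is expected and does not mean tensors are missing.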