
Errors when running target model #39

@Laulava-Velho

I have performed transfer learning using the base model Llama-3.2-1B-Instruct.

Variant 1. When running the resulting Llama-3.2-1B-transmla model for inference, I get errors like:
Head size 576 is not supported by FlashAttention. Supported head sizes are: [32, 64, 96, 128, 160, 192, 224, 256].
Head size 576 is not supported by PagedAttention. Supported head sizes are: [32, 64, 80, 96, 112, 120, 128, 192, 256].
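
For context, this is roughly how I load the converted model (a minimal sketch using vLLM's offline LLM API; the actual invocation may differ):

```python
# Minimal reproduction sketch (assumed invocation; paths as in the converter command below).
from vllm import LLM, SamplingParams

llm = LLM(
    model="output/Llama-3.2-1B-transmla",  # converted checkpoint
    trust_remote_code=True,                # custom LlamaMLA / deepseek_v3 code
    dtype="bfloat16",
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```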

Command used to create Llama-3.2-1B-transmla:

```bash
python transmla/converter.py \
    --model-path models/Llama-3.2-1B-Instruct/ \
    --save-path output/Llama-3.2-1B-transmla \
    --dtype bf16 \
    --cal-dataset wikitext2 \
    --cal-nsamples 128 \
    --cal-max-seqlen 256 \
    --cal-batch-size 8 \
    --ppl-eval-batch-size 8 \
    --freqfold auto \
    --collapse auto \
    --qk-mqa-dim 64 \
    --q-lora-rank 512 \
    --kv-lora-rank 512
```
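
If I understand vLLM's MLA path correctly, the head size it checks is kv_lora_rank + qk_rope_head_dim, which would explain the 576 above; this is my assumption, not something stated in the error:

```python
# Assumed relation: vLLM MLA head size = kv_lora_rank + qk_rope_head_dim (qk-mqa-dim).
SUPPORTED_FA = {32, 64, 96, 128, 160, 192, 224, 256}

def mla_head_size(kv_lora_rank: int, qk_mqa_dim: int) -> int:
    return kv_lora_rank + qk_mqa_dim

print(mla_head_size(512, 64))                  # 576 -> not supported (Variant 1)
print(mla_head_size(192, 64))                  # 256 -> supported (Variant 2 parameters)
print(mla_head_size(192, 64) in SUPPORTED_FA)  # True
```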

My Python environment satisfies the project requirements:
vllm==0.8.4
transformers==4.52.4
datasets==4.2.0
accelerate==1.3.0
datatrove==0.6.0
tensorboardX==2.6.4

Variant 2. When I change the conversion parameters to qk-mqa-dim=64, q-lora-rank=192, kv-lora-rank=192
(to bring the head size down to the supported 256, per the arithmetic sketched above), I face another error:
ModuleNotFoundError: No module named 'transformers_modules.Llama-3'
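
The truncated module name makes me suspect the dots in the output folder name: with trust_remote_code, transformers builds a dynamic module named after the checkpoint directory, and "Llama-3.2-1B-transmla" appears to be cut at the first dot. A possible workaround (my assumption, not verified) is to copy the checkpoint to a dot-free directory before loading:

```python
# Hypothetical workaround sketch: load the converted model from a directory whose
# name contains no dots, so the transformers_modules package name stays valid.
import shutil
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "output/Llama-3.2-1B-transmla"   # original save path (contains dots)
dst = "output/Llama-3_2-1B-transmla"   # dot-free copy (hypothetical name)
shutil.copytree(src, dst, dirs_exist_ok=True)

tok = AutoTokenizer.from_pretrained(dst, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(dst, trust_remote_code=True)
```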

I also noted the following differences in config.json. Original model:

architectures: LlamaForCausalLM
model_type: llama
num_key_value_heads: 8

Converted model:

architectures: LlamaMLAForCausalLM
model_type: deepseek_v3
num_key_value_heads: 32
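
For completeness, here is a small sketch of how these differences can be listed (a hypothetical helper, not part of the repo):

```python
# Hypothetical helper: print config.json keys whose values differ between
# the original and the converted checkpoints.
import json

def load_cfg(path):
    with open(f"{path}/config.json") as f:
        return json.load(f)

orig = load_cfg("models/Llama-3.2-1B-Instruct")
conv = load_cfg("output/Llama-3.2-1B-transmla")

for key in sorted(set(orig) | set(conv)):
    if orig.get(key) != conv.get(key):
        print(f"{key}: {orig.get(key)!r} -> {conv.get(key)!r}")
```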

Variant 3. When I keep qk-mqa-dim=64, q-lora-rank=192, kv-lora-rank=192 and additionally pass --deepseek-style,
I face yet another error:
ValueError: Following weights were not initialized from checkpoint.
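
To help narrow this down, here is a small diagnostic sketch I can run to list what the checkpoint actually contains (assuming the converter writes safetensors shards; the output path may need adjusting for the deepseek-style run):

```python
# Hypothetical diagnostic: enumerate tensor names stored in the converted
# checkpoint, to compare against the weights vLLM reports as uninitialized.
import glob
import os
from safetensors import safe_open

ckpt_dir = "output/Llama-3.2-1B-transmla"  # adjust to the deepseek-style output dir
names = set()
for shard in glob.glob(os.path.join(ckpt_dir, "*.safetensors")):
    with safe_open(shard, framework="pt", device="cpu") as f:
        names.update(f.keys())

print(f"{len(names)} tensors in checkpoint")
for name in sorted(names)[:20]:  # first few names for inspection
    print(name)
```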

Any help or comments would be appreciated.
