
Merging fails with RuntimeError: weight required but not present in model #284

Open · w601sxs opened this issue Apr 22, 2024 · 7 comments

w601sxs commented Apr 22, 2024

I'm trying to merge some embedding models with this config file. The architectures are similar, but I think it is erroring out on some layer names. I'd love some suggestions on how to change the YAML to make it work.

YAML config:

models:
  - model: mixedbread-ai/mxbai-embed-large-v1
  - model: BAAI/bge-large-en-v1.5
    parameters:
      density: [0, 0.25, 0.5, 0.75, 1]
      weight: [0, 0.25, 0.5, 0.75, 1]
  - model: avsolatorio/GIST-large-Embedding-v0
    parameters:
      density: [0, 0.25, 0.5, 0.75, 1]
      weight: [0, 0.25, 0.5, 0.75, 1]
  - model: WhereIsAI/UAE-Large-V1
    parameters:
      density: [0, 0.25, 0.5, 0.75, 1]
      weight: [0, 0.25, 0.5, 0.75, 1]
merge_method: dare_ties
base_model: mixedbread-ai/mxbai-embed-large-v1
parameters:
  int8_mask: true
dtype: bfloat16

Error

RuntimeError: Tensor bert.encoder.layer.23.output.LayerNorm.weight required but not present in model WhereIsAI/UAE-Large-V1

CLI used

!mergekit-yaml merge.yaml ./output --cuda

w601sxs commented Apr 22, 2024

Maybe an example of how to frankenmerge with passthrough?
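
For context, a passthrough merge in mergekit stacks layer slices from one or more models into a new model rather than averaging weights. A minimal sketch using two of the BERT-large embedding models from the config above (both are 24-layer models; the layer ranges here are illustrative, not a tested recipe):

merge_method: passthrough
slices:
- sources:
  - model: mixedbread-ai/mxbai-embed-large-v1
    layer_range: [0, 16]
- sources:
  - model: BAAI/bge-large-en-v1.5
    layer_range: [8, 24]
dtype: bfloat16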

metric-space self-assigned this Apr 24, 2024

metric-space (Contributor) commented

Hey there, thank you for the detailed issue. This is definitely a bug.

For now, a quick fix to get things working on your end is to open mergekit/_data/architectures/bert.json and replace all instances of bert. with an empty string. That should hopefully get you going with your current config.

That said, we will be putting out a proper fix soon.
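
For concreteness, assuming a from-source checkout where mergekit/_data/architectures/bert.json is editable in place, the replacement described above could be done with a one-liner like this (a sketch of the workaround, not an official fix; with BSD sed on macOS, pass an empty string after -i):

sed -i 's/bert\.//g' mergekit/_data/architectures/bert.json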


w601sxs commented Apr 25, 2024

I'll try that in a local branch and wait for the fix! Thanks.


cg123 commented Apr 29, 2024

Thanks for the bug report! PR #295 should fix this issue. If you run into any further trouble please let me know - the BERT support is quite fresh and I appreciate knowing where it fails.


yaof20 commented Apr 29, 2024

Hi Charles! Thanks for the great work!

I am encountering a similar issue.

I am using the phi-1 and phi-1.5 models; the config YAML file is as follows.

dtype: float16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 8]
    model: microsoft/phi-1
- sources:
  - layer_range: [4, 12]
    model: microsoft/phi-1
- sources:
  - layer_range: [8, 16]
    model: microsoft/phi-1
- sources:
  - layer_range: [12, 20]
    model: microsoft/phi-1
- sources:
  - layer_range: [16, 24]
    model: microsoft/phi-1
- sources:
  - layer_range: [20, 28]
    model: microsoft/phi-1
- sources:
  - layer_range: [24, 32]
    model: microsoft/phi-1

Both phi-1 and phi-1.5 give me the error below. (I also tried TinyLlama; it gave me the same issue.)

RuntimeError: Tensor model.layers.31.mlp.fc2.weight required but not present in model microsoft/phi-1_5

In addition, how can I run the same YAML config for the phi-3 model, whose architecture is not currently included in the package?

Thanks!
@cg123


cg123 commented Apr 30, 2024

@yaof20 This is because microsoft/phi-1 only has 24 layers, but you're telling mergekit to look for 32 in total. If you adjust your config to use only layers 0-24, it should work properly.

As for Phi-3 - I'll add support for it in the next couple of days!
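
For illustration, the same overlapping-slice pattern capped at phi-1's 24 layers might look like the sketch below (the slice boundaries are illustrative, not a recommendation):

dtype: float16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 8]
    model: microsoft/phi-1
- sources:
  - layer_range: [4, 12]
    model: microsoft/phi-1
- sources:
  - layer_range: [8, 16]
    model: microsoft/phi-1
- sources:
  - layer_range: [12, 20]
    model: microsoft/phi-1
- sources:
  - layer_range: [16, 24]
    model: microsoft/phi-1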


yaof20 commented Apr 30, 2024

Thanks for the reply!
