Activation/representation based merging #199

Closed · wants to merge 42 commits

Conversation

@shamanez (Member) commented Mar 18, 2024

What is this?

This PR introduces a way to merge two models via their activations and hidden states on a tiny sample of data.
The activations and hidden states are used to form correlation matrices, from which permutation and inverse-permutation matrices are generated for the weights of each model; the permuted weights are then combined.

This PR consists of three main scripts:

  1. The first generates the activations/hidden states for each space.
  2. The second generates a permutation and inverse-permutation pair for each space.
  3. The third, based on each space and its connected weights, applies the permutation and/or inverse permutation to each weight and then combines the weights (a rough sketch of this idea follows the list).
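
As a rough, hypothetical sketch of the idea behind steps 2 and 3 (this is not the PR's implementation; it assumes per-space activations are collected as (num_samples, hidden_dim) matrices and uses scipy's linear_sum_assignment for feature matching):

import numpy as np
from scipy.optimize import linear_sum_assignment


def match_features(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Permutation matrix P aligning model B's features to model A's, chosen to
    maximize the cross-correlation between feature activations (hypothetical helper)."""
    n = acts_a.shape[1]
    # cross-correlation block between A's and B's features: (hidden_a, hidden_b)
    corr = np.corrcoef(acts_a.T, acts_b.T)[:n, n:]
    _, col_ind = linear_sum_assignment(corr, maximize=True)
    perm = np.zeros_like(corr)
    perm[np.arange(len(col_ind)), col_ind] = 1.0
    return perm


def merge_weights(w_a, w_b, perm_out, perm_in):
    """Permute model B's weight into model A's feature order on its output and
    input spaces, then average the aligned weights."""
    # the inverse of a permutation matrix is its transpose
    w_b_aligned = perm_out @ w_b @ perm_in.T
    return 0.5 * (w_a + w_b_aligned)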

Assumptions

The models to be merged must share the same architecture and have an equal block/layer count.
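
A quick way to sanity-check this assumption before merging (an illustrative snippet, not part of the PR; the permuted-model path is the one used in the testing steps below):

from transformers import AutoConfig

cfg_a = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
cfg_b = AutoConfig.from_pretrained("/home/ubuntu/data/permuter/random2")
# both the architecture family and the layer count must match
assert cfg_a.model_type == cfg_b.model_type, "architectures differ"
assert cfg_a.num_hidden_layers == cfg_b.num_hidden_layers, "layer counts differ"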

Testing

To test this, we need the mergekit/scripts/random_permuter.py script from the rope-alignment branch.

(The final inference script, test_by_gen.py, appears after the bash steps below.)

# Clone the rope-alignment branch and set up a virtual environment inside the clone
git clone --branch rope-alignment https://github.com/arcee-ai/mergekit.git permuter
python3 -m venv permuter
cd permuter && source bin/activate
pip install -e .
huggingface-cli login
# Create a randomly permuted copy of Llama-2-7b-chat-hf to merge back against
python mergekit/scripts/permute_random.py meta-llama/Llama-2-7b-chat-hf --permute-head-dims --out-path random2
# Copy the tokenizer files from the HF cache into the permuted model directory
cp $HF_HOME/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/f5db02db724555f92da89c216ac04704f23d4590/{tokenizer*,special_tokens_map.json} random2
deactivate
cd -

# Clone the wip-zipit branch (this PR) into a separate environment
git clone --branch wip-zipit https://github.com/arcee-ai/mergekit.git mergekit
python3 -m venv mergekit
cd mergekit && source bin/activate
pip install -e .
# Step 1: extract activations/hidden states for both models on a small sample
mkdir delete_dump_output/
python mergekit/scripts/ABM/extract_activations.py meta-llama/Llama-2-7b-chat-hf -o ./delete_dump_output -d arcee-ai/pmc-test-perplexity -s 8 -c text -u test --device cpu
python mergekit/scripts/ABM/extract_activations.py /home/ubuntu/data/permuter/random2 -o ./delete_dump_output -d arcee-ai/pmc-test-perplexity -s 8 -c text -u test --device cpu
# Step 2: derive permutation and inverse-permutation matrices for each space
mkdir delete_m_v_out
python mergekit/scripts/ABM/extract_permutation_matrices.py ./delete_dump_output/meta-llama_Llama-2-7b-chat-hf_features.bin ./delete_dump_output/_home_ubuntu_data_permuter_random2_features.bin --model_path meta-llama/Llama-2-7b-chat-hf --out_path ./delete_m_v_out
# Step 3: apply the permutations to the weights and merge the two models
mkdir new_model/
python mergekit/scripts/activations_based_merge.py meta-llama/Llama-2-7b-chat-hf /home/ubuntu/data/permuter/random2 delete_m_v_out -o new_model
# Sanity-check the merged model with a quick generation (script below)
python test_by_gen.py new_model
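
For intuition about what the extract_activations.py step is doing, here is a rough, hypothetical sketch (not the ABM script's actual implementation) of capturing per-layer hidden states with forward hooks on a tiny sample:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
model.eval()

captured = {}  # layer name -> list of hidden-state tensors


def make_hook(name):
    def hook(module, inputs, output):
        # decoder layers return a tuple whose first element is the hidden states
        hidden = output[0] if isinstance(output, tuple) else output
        captured.setdefault(name, []).append(hidden.detach().cpu())

    return hook


for i, layer in enumerate(model.model.layers):
    layer.register_forward_hook(make_hook(f"layer_{i}"))

with torch.no_grad():
    batch = tok(["A tiny sample sentence."], return_tensors="pt")
    model(**batch)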

(test_by_gen.py)

import sys

import torch
from transformers import pipeline

model = sys.argv[1] 

pipe = pipeline(
    "text-generation", model=model, torch_dtype=torch.bfloat16, device_map="auto"
)

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a helpful chatbot who pretends to be Richard Feynman",
    },
    {"role": "user", "content": "Could you tell me about the challenger disaster ?"},
]
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = pipe(
    prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95
)
print(outputs[0]["generated_text"])

If all goes well, you should see output along the lines of the following:
(screenshot of sample generated output, 2024-07-06, omitted)

Things that didn't make it into the final PR

On-the-fly handling of models with grouped-query attention. This hasn't been tested enough for this release, but will be in the near future. For now, users will have to run this script first:

@shamanez shamanez changed the base branch from main to wip-git-rebasin March 18, 2024 18:48
@metric-space metric-space changed the title Wip zipit Representation based alignment and merge Apr 29, 2024
@metric-space metric-space changed the title Representation based alignment and merge Activation/representation based merge Jul 6, 2024
@metric-space metric-space changed the title Activation/representation based merge Activation/representation based merging Jul 8, 2024
@metric-space metric-space changed the base branch from wip-git-rebasin to main July 9, 2024 03:32
@metric-space metric-space changed the base branch from main to wip-git-rebasin July 9, 2024 03:43
@metric-space (Contributor) commented:

Misc script

This script compares two models for differences in their weights:

import logging

import click
import numpy as np
import torch
import torch.nn.functional as F

from mergekit.architecture import get_architecture_info
from mergekit.common import ModelReference
from mergekit.io.tasks import LazyTensorLoader

logging.basicConfig(level=logging.INFO)

# set seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)


def cosine_similarity_diff(matrix1, matrix2):
    """Cosine similarity between the two weight tensors, flattened to vectors."""
    vec1 = matrix1.flatten().unsqueeze(0).double()
    vec2 = matrix2.flatten().unsqueeze(0).double()
    similarity = F.cosine_similarity(vec1, vec2).item()
    return similarity


def frobenius_norm(matrix):
    """Frobenius norm of a single weight tensor; the difference between the two
    models' norms should be close to 0 most of the time."""
    return torch.norm(matrix, p="fro")


@click.command("mergekit-compare-weights")
@click.argument("model-1-path", type=str)
@click.argument("model-2-path", type=str)
def main(
    model_1_path: str,
    model_2_path: str,
):
    model_1 = ModelReference.model_validate(model_1_path)
    model_2 = ModelReference.model_validate(model_2_path)

    model_1_config = model_1.config()
    model_2_config = model_2.config()

    model_1_arch_info = get_architecture_info(model_1_config)
    model_2_arch_info = get_architecture_info(model_2_config)

    tensor_index_1 = model_1.tensor_index()
    tensor_index_2 = model_2.tensor_index()


    loader_instance_1 = LazyTensorLoader(tensor_index_1)
    loader_instance_2 = LazyTensorLoader(tensor_index_2)

    for weight_info in model_1_arch_info.all_weights(model_1_config):
        weight_name = weight_info.name

        tensor_1 = loader_instance_1.get_tensor(weight_name)
        logging.info(f"{weight_name} shape: {tensor_1.shape}")
        tensor_2 = loader_instance_2.get_tensor(weight_name)

        if tensor_1.shape != tensor_2.shape:
            logging.warning(f"Shape mismatch for weight {weight_name}")
            continue

        # cosine similarity between the flattened tensors, and difference of Frobenius norms
        cosine_similarity = cosine_similarity_diff(tensor_1, tensor_2)
        frobenius_norm_1 = frobenius_norm(tensor_1)
        frobenius_norm_2 = frobenius_norm(tensor_2)

        logging.info(
            f"Weight {weight_name} cosine similarity:\t{cosine_similarity},"
            f"\tfrobenius norm diff: {frobenius_norm_1 - frobenius_norm_2}"
        )


if __name__ == "__main__":
    main()

@metric-space metric-space marked this pull request as ready for review July 10, 2024 03:20