# Model Merging through Tensor Arithmetic

Model merging combines the weights of several fine-tuned models, usually derived from a common pre-trained base, into a single new model without further training. Done well, the merged model inherits capabilities from each of its components, for example broader knowledge or several specialized skills at once. Our library simplifies model merging by treating it as arithmetic on dictionaries that map parameter names to tensors, which keeps it accessible and flexible.

## Key Features:
1. **StateDict Abstraction**: Converts `nn.Module` objects into dictionaries of named tensors, so models can be manipulated with basic arithmetic operations.
2. **@dict_map Decorator**: Simplifies development by applying an operation layer by layer across mergeable modules.
3. **User-Friendly Interface**: Offers a functional approach with a focus on extensibility.

Benefits of our approach:
- Ease of use: Intuitive dictionary-based operations
- Flexibility: Supports various merging strategies
- Extensibility: Easily implement custom merging techniques
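The dictionary view is what makes these features work: once a model is a mapping from parameter names to tensors, any per-tensor operation can be lifted to whole models. As a minimal, dependency-free sketch of that idea (not mergecraft's implementation), the hypothetical decorator `dict_map_sketch` below applies a function written for a single tensor to every key of one or more state dicts. Tensors are flattened into plain Python lists so the example runs anywhere.

```python
from functools import wraps


def dict_map_sketch(fn):
    """Hypothetical stand-in for a @dict_map-style decorator.

    Lifts `fn`, written for the tensors of a single layer, into a function
    that takes whole state dicts and applies `fn` key by key.
    """
    @wraps(fn)
    def wrapper(*state_dicts, **kwargs):
        first = state_dicts[0]
        # Every input must describe the same architecture (same keys).
        for sd in state_dicts[1:]:
            if sd.keys() != first.keys():
                raise KeyError("state dicts have different keys")
        return {key: fn(*(sd[key] for sd in state_dicts), **kwargs) for key in first}
    return wrapper


@dict_map_sketch
def lerp(a, b, t=0.5):
    # Linear interpolation of two flattened tensors (lists of floats).
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
tuned = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
print(lerp(base, tuned, t=0.25))
# {'layer.weight': [1.5, 2.5], 'layer.bias': [0.25]}
```

The per-layer function stays trivial to write and test, while the decorator takes care of iterating over keys and checking that the models share a structure.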

### Getting Started

This Jupyter Notebook includes pre-defined variables to help you experiment with model merging. We'll use a pre-trained GPT-2 model as an example to demonstrate how to convert a model into a StateDict for merging operations.

Example use case: You could merge a general language model with a domain-specific model to create a new model that combines broad language understanding with specialized knowledge.

Let's begin by loading the necessary components and converting a model to a StateDict:

```python
from transformers import pipeline, set_seed

gpt2 = pipeline('text-generation', model='openai-community/gpt2', device='cpu', framework='pt')

# Load the library
import mergecraft
from mergecraft import StateDict

# Convert the model to a state dict
base = StateDict.from_model(gpt2.model)
isinstance(base, dict)
```

```text
True
```

## Understanding the StateDict

After converting our model to a `StateDict`, we can explore its structure and contents. `StateDict` is a subclass of `OrderedDict`, so it behaves like a regular Python dictionary and keeps its keys in the order of the model's parameters.

Key characteristics of `base`:
- Type: `mergecraft.StateDict`
- Purpose: Maps parameter names to their corresponding tensors
- Usage: Can be handled like a common dictionary

Example contents:
- `'transformer.wte.weight'`: Tensor representing word token embedding weights
- `'transformer.wpe.weight'`: Tensor representing position embedding weights

Let's examine the structure of our `base` StateDict:

```python
# Display the first 5 keys in the StateDict
print("First 5 keys in the StateDict:")
print(list(base.keys())[:5])

# Show the shape of a specific weight tensor
print("\nShape of 'transformer.wte.weight' tensor:")
print(base['transformer.wte.weight'].shape)
```

```text
First 5 keys in the StateDict:
['transformer.wte.weight', 'transformer.wpe.weight', 'transformer.h.0.ln_1.weight', 'transformer.h.0.ln_1.bias', 'transformer.h.0.attn.c_attn.weight']

Shape of 'transformer.wte.weight' tensor:
torch.Size([50257, 768])
```


## Loading Specialized Models

To demonstrate the power of model merging, we'll load two specialized models alongside our base GPT-2 model. These models have been fine-tuned on specific domains:

1. Australian Legal Model: Specialized in Australian legal language and concepts
2. Recipe Model: Fine-tuned for generating cooking recipes and food-related text

By merging these models with our base model, we can potentially create a new model that combines general language understanding with specialized knowledge in law and cooking.

Let's load these models and convert them to StateDicts:

```python
# Load the Australian legal model
legal = StateDict.from_hf('umarbutler/open-australian-legal-gpt2', 'text-generation')

# Load the recipe model
recipe = StateDict.from_hf('mrm8488/gpt2-finetuned-recipes-cooking', 'text-generation')

# Print the number of entries (parameter tensors) in each model
print(f"Base model tensors: {len(base)}")
print(f"Legal model tensors: {len(legal)}")
print(f"Recipe model tensors: {len(recipe)}")

# Verify that all models have the same structure
print("\nDo all models have the same keys?")
print(set(base.keys()) == set(legal.keys()) == set(recipe.keys()))
```

```text
Base model tensors: 148
Legal model tensors: 148
Recipe model tensors: 148

Do all models have the same keys?
True
```
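`len()` on a StateDict counts its entries, that is, parameter tensors, not individual parameters. The number of trainable values is the sum over all entries of the product of each tensor's dimensions. The sketch below shows the difference on three GPT-2 entries: the token-embedding shape printed earlier, the 1024 × 768 position embeddings, and a 768-element layer-norm scale. It uses plain shape tuples, so it runs without PyTorch.

```python
from math import prod

# Shapes of a few GPT-2 entries (a subset of the 148, for illustration)
shapes = {
    "transformer.wte.weight": (50257, 768),  # token embeddings
    "transformer.wpe.weight": (1024, 768),   # position embeddings
    "transformer.h.0.ln_1.weight": (768,),   # first layer-norm scale
}

num_tensors = len(shapes)  # what len(state_dict) reports
num_params = sum(prod(shape) for shape in shapes.values())

print(num_tensors)        # 3
print(f"{num_params:,}")  # 39,384,576
```

With real tensors, the same total is `sum(t.numel() for t in base.values())`.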


## Combining Models with Tensor Operations

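Now that we have our base model and two specialized models loaded as StateDicts, we can combine them using tensor arithmetic. Each operation is applied key by key and element-wise, and the result keeps all 148 keys. The main operations are:

- Addition: Sum the weights of two models (divide the result by 2 to average them)
- Subtraction: Compute the difference between two models, such as a task vector (fine-tuned minus base); subtracting a scaled task vector from a model can remove the corresponding behavior
- Scalar multiplication/division: Amplify or reduce the influence of a model or task vector
- Tensor multiplication/division: Element-wise interactions between two models; these are rarely used for merging, and division yields `inf` or `NaN` wherever the divisor is zero

Let's demonstrate these operations. The cell below only prints the number of entries in each result, confirming that every operation preserves the model's structure: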

```python
# Addition: element-wise sum of the recipe and legal weights
# (divide by 2 to obtain their average)
combined = recipe + legal
print("Combined model tensors:", len(combined))

# Subtraction: base - recipe is the negated recipe task vector,
# a weight difference rather than a usable model on its own
base_minus_recipe = base - recipe
print("Base minus recipe tensors:", len(base_minus_recipe))

# Scalar multiplication: weighted interpolation (70% base, 30% legal)
interpolated = base * 0.7 + legal * 0.3
print("Interpolated model tensors:", len(interpolated))

# Tensor division: element-wise ratio of base and legal weights
# (inf/NaN wherever a legal weight is exactly zero)
complex_interaction = base / legal
print("Complex interaction tensors:", len(complex_interaction))

# Chaining multiple operations
custom_blend = base - (legal - recipe) * 0.5
print("Custom blend tensors:", len(custom_blend))
```

```text
Combined model tensors: 148
Base minus recipe tensors: 148
Interpolated model tensors: 148
Complex interaction tensors: 148
Custom blend tensors: 148
```
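The element-wise behavior of these operations can be sketched with a small dictionary subclass that overloads Python's arithmetic operators. `ToyStateDict` is hypothetical, not mergecraft's class, and it stores flattened tensors as lists of floats. The last lines show task negation as described by Ilharco et al.: subtracting a scaled task vector from the base, rather than subtracting a whole fine-tuned model.

```python
import operator
from collections import OrderedDict


class ToyStateDict(OrderedDict):
    """Hypothetical sketch: parameter name -> flattened tensor (list of floats)."""

    def _apply(self, other, op):
        if isinstance(other, (int, float)):
            # Scalar: broadcast to every element of every tensor.
            return ToyStateDict((k, [op(x, other) for x in v]) for k, v in self.items())
        if self.keys() != other.keys():
            raise KeyError("operands must have the same keys")
        # Another state dict: combine matching tensors element by element.
        return ToyStateDict(
            (k, [op(x, y) for x, y in zip(v, other[k])]) for k, v in self.items()
        )

    def __add__(self, other):
        return self._apply(other, operator.add)

    def __sub__(self, other):
        return self._apply(other, operator.sub)

    def __mul__(self, other):
        return self._apply(other, operator.mul)

    __rmul__ = __mul__  # allow 0.5 * sd as well as sd * 0.5

    def __truediv__(self, other):
        # Plain floats raise ZeroDivisionError; PyTorch tensors give inf/NaN instead.
        return self._apply(other, operator.truediv)


base = ToyStateDict({"w": [1.0, 2.0]})
tuned = ToyStateDict({"w": [1.5, 1.0]})

task_vector = tuned - base          # weight difference: [0.5, -1.0]
negated = base - task_vector * 0.5  # step away from the fine-tuned behavior
print(dict(negated))                # {'w': [0.75, 2.5]}
```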


## Merging Models with Task Vector Editing

We'll now explore a more sophisticated merging technique: Task Vector Editing, also known as task arithmetic (Ilharco et al.). This framework aims to combine specialized knowledge from multiple fine-tuned models while preserving the general language understanding of the base model.

The process involves three main steps:
1. Compute task vectors: Calculate the weight differences between fine-tuned models and the base model
2. Create a mean vector: Average the task vectors
3. Merge: Add the scaled mean vector to the base model
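In symbols, with θ_base the base weights, θ_t the weights fine-tuned on task t, T the number of tasks, and λ the scaling factor (`LAMBDA` in the next cell), the three steps are:

```latex
\tau_t = \theta_t - \theta_{\text{base}}, \qquad
\bar{\tau} = \frac{1}{T} \sum_{t=1}^{T} \tau_t, \qquad
\theta_{\text{merged}} = \theta_{\text{base}} + \lambda \, \bar{\tau}
```

With T = 2, this is exactly what the next cell computes as `base + (legal_delta + recipe_delta) * LAMBDA / 2`.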

In our example, we'll create a multi-task model that combines knowledge about Australian law and cooking recipes.

Key components:
- Base model: General language understanding
- Legal model: Specialized in Australian law
- Recipe model: Specialized in cooking recipes

Let's implement the Task Vector Editing approach:

```python
# Step 1: Compute task vectors
legal_delta = legal - base    # Task vector for Australian law
recipe_delta = recipe - base  # Task vector for cooking recipes

# Steps 2 & 3: Average the task vectors, scale them, and add them to the base model
LAMBDA = 0.5  # Scaling factor for fine-tuned knowledge
mean_vector = (legal_delta + recipe_delta) * LAMBDA / 2
multitask = base + mean_vector

# Optional: compare the average parameter magnitude of each component
import numpy as np

def component_contribution(model):
    # Mean over tensors of each tensor's mean absolute value
    return np.mean([np.abs(param.numpy(force=True)).mean() for param in model.values()])

print("Average absolute value of parameters:")
print(f"\tBase model: {component_contribution(base):.6f}")
print(f"\tLegal delta: {component_contribution(legal_delta):.6f}")
print(f"\tRecipe delta: {component_contribution(recipe_delta):.6f}")
print(f"\tMultitask model: {component_contribution(multitask):.6f}")
```

```text
Average absolute value of parameters:
	Base model: 0.131947
	Legal delta: 0.004745
	Recipe delta: 0.007023
	Multitask model: 0.131827
```


## Converting the Merged StateDict Back to a Model

After merging the models, we need to convert our `StateDict` back into a usable model or pipeline. The `to_model` method of `StateDict` makes this straightforward. Called on the merged `StateDict`, it just needs:

1. The name or path of the original model, whose architecture receives the merged weights
2. Optional parameters like the device and framework

Let's convert our multitask model and test it with some prompts to see how it combines knowledge from both legal and culinary domains.

```python
from transformers import set_seed

# Convert the merged StateDict to a Hugging Face pipeline
multitask_pipe = multitask.to_model('openai-community/gpt2', device='cpu', framework='pt')
```
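Loading merged weights back into an architecture only works if the dictionary still matches that architecture key for key and shape for shape; PyTorch's `Module.load_state_dict` reports missing or unexpected keys (with `strict=True`) and shape mismatches. A quick pre-flight check can catch a merge that accidentally drops, renames, or reshapes an entry. `check_compatible` below is a hypothetical helper written against plain dictionaries of shapes, so it runs without PyTorch; with real tensors you would pass `{k: tuple(v.shape) for k, v in sd.items()}`.

```python
def check_compatible(merged_shapes, template_shapes):
    """Hypothetical helper: compare a merged state dict against a template.

    Both arguments map parameter names to shape tuples. Returns a sorted list
    of problems; an empty list means the merged weights fit the template.
    """
    problems = []
    for key in template_shapes.keys() - merged_shapes.keys():
        problems.append(f"missing key: {key}")
    for key in merged_shapes.keys() - template_shapes.keys():
        problems.append(f"unexpected key: {key}")
    for key in template_shapes.keys() & merged_shapes.keys():
        if merged_shapes[key] != template_shapes[key]:
            problems.append(
                f"shape mismatch for {key}: {merged_shapes[key]} vs {template_shapes[key]}"
            )
    return sorted(problems)


template = {"transformer.wte.weight": (50257, 768), "transformer.ln_f.bias": (768,)}
merged_ok = dict(template)
merged_bad = {"transformer.wte.weight": (50257, 512)}

print(check_compatible(merged_ok, template))  # []
print(check_compatible(merged_bad, template))
# ['missing key: transformer.ln_f.bias',
#  'shape mismatch for transformer.wte.weight: (50257, 512) vs (50257, 768)']
```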

### Comparing Base GPT-2 and Multitask Model Outputs

To illustrate the effects of our model merging, let's compare outputs from the original GPT-2 model and our multitask model. We'll use prompts that touch on general conversation, cooking, and Australian law to showcase the combined knowledge.

```python
prompts = [
    'Good morning! My name is Elizabeth and for breakfast I had',
    'How to make the perfect Omelette. Ingredients:',
    'Section 51 of the Australian Constitution provides that',
]

set_seed(1789)  # For reproducibility

for prompt in prompts:
    print(f"Prompt: {prompt}")
    print('GPT-2:', gpt2(prompt, max_length=50, num_return_sequences=1, pad_token_id=50256)[0]['generated_text'][len(prompt):])
    print('Multitask:', multitask_pipe(prompt, max_length=50, num_return_sequences=1, pad_token_id=50256)[0]['generated_text'][len(prompt):])
    print('=============================\n')
```

```text
Prompt: Good morning! My name is Elizabeth and for breakfast I had
GPT-2:  to do a bunch of things…I got this shirt, a new phone and a new hat. Then, this guy is on the phone, and I'm like…what? Okay.
Multitask:  the good fortune of meeting you on your first street and at the end of a small party with an excellently dressed woman of your choice who had a lovely and sweet face: of good

Prompt: How to make the perfect Omelette. Ingredients:
GPT-2:  2 cups flour (I have not been able to find a comparable gluten free recipe for this filling), salt, pepper (1 cup gluten free flour), 1/2 tsp baking powder (optional)
Multitask: 
1 cup white sugar - 1 egg for each 1 oz of cheese
1 cup milk
1-3 egg white
1 cup white flour
1 medium onion
1-1 cup flour

Prompt: Section 51 of the Australian Constitution provides that
GPT-2:  "all persons shall exercise free and adequate rights and protection, against the state, the judiciary, civil, administrative or other courts, in the exercise of their rights 
```

## Analysis of Results

Observing the outputs, we can see:

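1. General Conversation: Both models produce fluent continuations of the breakfast prompt, although neither says what Elizabeth actually ate.

2. Cooking: The multitask model switches to a one-ingredient-per-line list, a format it plausibly inherited from the recipe model, whereas GPT-2 writes its ingredients inline. Neither list suits an omelette: GPT-2's contains no eggs, and the multitask list adds sugar and flour.

3. Australian Law: The captured output stops before the multitask completion, so this prompt cannot be compared here. GPT-2's continuation is generic and inaccurate: Section 51 actually lists the subjects on which the Commonwealth Parliament may make laws, which is a useful reference point for judging the multitask answer.

These single sampled generations suggest that the merged model picks up traits of both fine-tuned models while staying fluent; measuring domain accuracy would require more samples and a proper evaluation.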

### Extensibility of the Merging Library

A key feature of this library is its simplicity in combining models and its extensibility. While it implements well-known merging methods, users can easily extend it with new merging techniques. This flexibility lets researchers and practitioners experiment with novel approaches to model merging, potentially leading to even more powerful and specialized language models.

# Using Implemented Merging Methods

While the library allows for custom merging techniques, it also provides several ready-made merging methods for convenience and reproducibility. These methods are based on popular research in the field of model merging.

### Available Merging Methods

For a comprehensive list and detailed explanations of all implemented merging methods, please refer to the `README.md` file in the library's repository. Some of the methods might include:

- TIES (TrIm, Elect Sign & Merge)
- SLERP (Spherical Linear Interpolation)
- DARE (Drop And REscale)
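For intuition, here are minimal sketches of two of these methods on flattened weight vectors (plain Python lists). They follow the published descriptions of SLERP and DARE, not necessarily mergecraft's code, and the function names `lerp`, `slerp`, and `dare` are hypothetical. SLERP interpolates along the arc between two weight vectors instead of the straight line between them. DARE randomly drops a fraction `p` of a task vector's entries and rescales the survivors by `1 / (1 - p)`, so each entry keeps its expected value; the result is then added back to the base weights.

```python
import math
import random


def lerp(a, b, t):
    # Straight-line interpolation, used as a fallback by slerp.
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a, b, t, threshold=1 - 1e-6):
    """Spherical linear interpolation between two flattened weight vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return lerp(a, b, t)  # the angle is undefined for a zero vector
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    if abs(cos_omega) > threshold:
        # Nearly (anti-)parallel vectors: the arc is ill-defined, use a straight line.
        return lerp(a, b, t)
    omega = math.acos(cos_omega)
    w_a = math.sin((1 - t) * omega) / math.sin(omega)
    w_b = math.sin(t * omega) / math.sin(omega)
    return [w_a * x + w_b * y for x, y in zip(a, b)]


def dare(task_vector, p, seed=0):
    """Drop And REscale: zero each entry with probability p, rescale survivors by 1/(1-p)."""
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else x * scale for x in task_vector]


print(lerp([1.0, 0.0], [0.0, 1.0], 0.5))                          # [0.5, 0.5]
print([round(x, 4) for x in slerp([1.0, 0.0], [0.0, 1.0], 0.5)])  # [0.7071, 0.7071]
```

The two calls illustrate the design choice behind SLERP: linear interpolation between two unit vectors shrinks the result (norm ≈ 0.71), while spherical interpolation stays on the unit circle.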

### Streamlined Model Merging with TIES

The `mergecraft` library provides a straightforward interface for merging models using various methods, including TIES (TrIm, Elect Sign & Merge). Let's see how easily we can merge multiple models using this method.

```python
%%time
from mergecraft import ties

# Define the models to be merged
models = ['openai-community/gpt2', 'umarbutler/open-australian-legal-gpt2', 'mrm8488/gpt2-finetuned-recipes-cooking']

# Merge the models using TIES
multitask_ties = ties(models, task='text-generation', k=0.2)
```

```text
CPU times: total: 1min 23s
Wall time: 20.9 s
```


The `ties` function handles loading the models, computing the task vectors, and merging them according to the TIES algorithm. The `k=0.2` parameter keeps the top 20% of each task vector's entries, ranked by magnitude, and sets the remaining entries to zero (the "trim" step of TIES).
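To see what happens inside such a call, here is a minimal sketch of the three TIES steps as described in the TIES-Merging paper (trim, elect sign, disjoint merge), operating on weights flattened into Python lists. It is an illustration, not mergecraft's implementation: `trim`, `sign`, `ties_sketch`, and the scaling factor `lam` are hypothetical names, and `k` plays the same role as in the call above (the fraction of entries kept per task vector).

```python
def trim(task_vector, k):
    """Keep the fraction k of entries with the largest magnitude; zero the rest."""
    n_keep = max(1, round(k * len(task_vector)))
    ranked = sorted(range(len(task_vector)), key=lambda i: abs(task_vector[i]), reverse=True)
    keep = set(ranked[:n_keep])
    return [x if i in keep else 0.0 for i, x in enumerate(task_vector)]


def sign(x):
    return (x > 0) - (x < 0)


def ties_sketch(base, finetuned, k=0.2, lam=1.0):
    """Hypothetical sketch of TIES merging on flattened weight lists."""
    # Task vectors (fine-tuned minus base), then the trim step.
    trimmed = [trim([f - b for f, b in zip(ft, base)], k) for ft in finetuned]
    merged = []
    for i, b in enumerate(base):
        values = [tv[i] for tv in trimmed]
        # Elect sign: the direction with the larger total mass across tasks.
        elected = sign(sum(values))
        # Disjoint merge: average only the non-zero values that agree with it.
        agreeing = [v for v in values if v != 0.0 and sign(v) == elected]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(b + lam * delta)
    return merged


base = [1.0, 1.0, 1.0, 1.0]
ft_a = [1.75, 1.125, 0.75, 1.0]   # task vector [0.75, 0.125, -0.25, 0.0]
ft_b = [0.5, 1.0625, 0.25, 1.03125]  # task vector [-0.5, 0.0625, -0.75, 0.03125]
print(ties_sketch(base, [ft_a, ft_b], k=0.5))  # [1.75, 1.0, 0.5, 1.0]
```

On the first weight the two task vectors disagree in sign. Plain averaging would move it only to 1.125, whereas TIES keeps the dominant positive direction and uses 1.75; small entries trimmed away in both tasks leave the base weight untouched.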

### Key Advantages of mergecraft:
1. **Simplicity**: Merging complex models is reduced to a single function call.
2. **Flexibility**: Works seamlessly with Hugging Face models.

Now, let's test our TIES-merged model with some prompts:

```python
prompts = [
    'Good morning! My name is Elizabeth and for breakfast I had',
    'How to make the perfect Omelette. Ingredients:',
    'Section 51 of the Australian Constitution provides that',
]

set_seed(1789)  # For reproducibility

for prompt in prompts:
    print(f"Prompt: {prompt}")
    print('TIES-Multitask:', multitask_ties(prompt, max_length=50, num_return_sequences=1, pad_token_id=50256)[0]['generated_text'][len(prompt):])
    print('=============================\n')
```

```text
Prompt: Good morning! My name is Elizabeth and for breakfast I had
TIES-Multitask:  to do a little bit of a search and find out what a great morning the morning is in my home and this is a great morning so don't miss the opportunity to have a new day

Prompt: How to make the perfect Omelette. Ingredients:
TIES-Multitask: 
Preheat your omelette with the omelette rolling pin in a small amount of oil to 360 degrees
If your omelette has a removable base
Use a metal handle to

Prompt: Section 51 of the Australian Constitution provides that
TIES-Multitask: 
Non-Constitutionally binding international acts of international law (including the international family trust act) must be made before the event of the event occurs and those who do not follow this rule can not expect its
```



# Conclusion and Next Steps

Congratulations! You've now explored the powerful and user-friendly interface of mergecraft. Let's recap what we've learned and look ahead to what's next.

### Key Takeaways

1. **Simplicity**: Mergecraft provides an intuitive interface for complex model merging operations.
2. **Flexibility**: The library supports various merging paradigms, from simple arithmetic to sophisticated methods like TIES.
3. **Compatibility**: Seamless integration with Hugging Face models makes it easy to work with a wide range of pre-trained models.
4. **Effectiveness**: The sample generations suggest that merged models can pick up traits from several fine-tuned models at once; a systematic evaluation is needed to confirm real gains in each domain.

### Explore Further

Now that you're familiar with the basics, we encourage you to:

1. **Experiment with Other Methods**: Try out the various implemented merging paradigms available in mergecraft. Each method has its unique strengths and may be suited for different scenarios.

2. **Compare Results**: Test different merging methods on the same set of models and compare their outputs. This can provide insights into which methods work best for your specific use case.

3. **Vary Parameters**: Experiment with different hyperparameters (like the `k` value in TIES) to see how they affect the merged model's performance.

4. **Apply to Your Projects**: Consider how model merging could benefit your own NLP projects or research.

### Looking Ahead: Extending Mergecraft

Are you intrigued by the possibilities and want to push the boundaries further? In the next part of this tutorial, we'll dive deeper into the inner workings of mergecraft. You'll learn:

- The core principles behind the library's design
- How to implement your own custom merging methods
- Tips for optimizing and fine-tuning your merging strategies

By understanding the library's architecture, you'll be well-equipped to extend its functionality and potentially contribute novel merging methods to the field of NLP.

As a final experiment, we average the base and recipe weights with equal weight and sample completions of a carbonara prompt (an excerpt of the output is shown):

```python
# Equal-weight average of the base and recipe models
iso = (base + recipe) / 2
iso_pipe = iso.to_model('openai-community/gpt2', task='text-generation')

res = iso_pipe('In order to make a great carbonara you\'ll need to ', num_return_sequences=20)

for gen in res:
    print(gen['generated_text'])
    print('=============================\n')
```

```text
In order to make a great carbonara you'll need to iced tea leaves - you'll need a hot water bath to use them - but at least one of them is watertight as well
You can also use just one or two at a

In order to make a great carbonara you'll need to ive a small flat pan
For example
Spray it with a little water to keep it fresh
Heat the oil in one medium-sized pan
When the oil is hot


In order to make a great carbonara you'll need to iced water a minimum of 10 - 15 seconds and also in a minimum of 10 minutes at the cold water temperature
But if you're not going to iced you can also iced

In order to make a great carbonara you'll need to perse this:

Fry in salted water for 15 minutes
Stirring frequently until the edges of the torte are cooked to cover the liquid
Remove from the heat

In order to make a great carbonara you'll need to iced your water and put in some good old fashioned sugar and vinegar and put it in the oven at 45 min to 90 minute intervals so the mixture gets thick and bub
```