# Model Merging

Below are four models, each created with a different model merging technique using [mergekit](https://github.com/arcee-ai/mergekit/tree/main):
1. [AdamLucek/gemma2-2b-it-chinese-german](https://huggingface.co/AdamLucek/gemma2-2b-it-chinese-german)
2. [AdamLucek/EduMixtral-4x7B](https://huggingface.co/AdamLucek/EduMixtral-4x7B)
3. [AdamLucek/llama3-8b-code-sql-slerp](https://huggingface.co/AdamLucek/llama3-8b-code-sql-slerp)
4. [AdamLucek/Phi-3-mini-EmoMarketing-DELLA](https://huggingface.co/AdamLucek/Phi-3-mini-EmoMarketing-DELLA)

Each section includes the mergekit config file and a link to the paper describing the merge method.
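
The configs in this notebook are meant to be passed to mergekit's command-line tools. As a rough sketch, the helper below builds the command for either `mergekit-yaml` (used for the non-MoE methods) or `mergekit-moe` (used for the Mixture of Experts build). The function names, `config.yaml`, and the output directories are illustrative assumptions, and optional flags are left out.

```python
import shutil
import subprocess

def build_merge_command(config_path, output_dir, moe=False):
    """Return the mergekit CLI invocation as an argument list."""
    # mergekit-moe builds Mixture of Experts models; mergekit-yaml handles the other methods.
    tool = "mergekit-moe" if moe else "mergekit-yaml"
    return [tool, config_path, output_dir]

def run_merge(config_path, output_dir, moe=False):
    """Run the merge if the mergekit CLI is on PATH; return True when it ran."""
    cmd = build_merge_command(config_path, output_dir, moe=moe)
    if shutil.which(cmd[0]) is None:
        print("mergekit not found; would run:", " ".join(cmd))
        return False
    subprocess.run(cmd, check=True)
    return True

# Hypothetical paths, shown only to illustrate the command shape.
print(build_merge_command("config.yaml", "./merged-model"))
print(build_merge_command("config.yaml", "./merged-moe", moe=True))
```

Calling `run_merge("config.yaml", "./merged-model")` would start the actual merge, which downloads every source model listed in the config.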

In [None]:
%%capture
!pip install --upgrade bitsandbytes

---
# [AdamLucek/gemma2-2b-it-chinese-german](https://huggingface.co/AdamLucek/gemma2-2b-it-chinese-german)

**Merged Models:**
- [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it)
- [VAGOsolutions/SauerkrautLM-gemma-2-2b-it](https://huggingface.co/VAGOsolutions/SauerkrautLM-gemma-2-2b-it)
- [stvlynn/Gemma-2-2b-Chinese-it](https://huggingface.co/stvlynn/Gemma-2-2b-Chinese-it)

**Method:** [Model Stock](https://arxiv.org/abs/2403.19522)

**config.yaml**
```yaml
models:
  - model: google/gemma-2-2b-it
  - model: VAGOsolutions/SauerkrautLM-gemma-2-2b-it
  - model: stvlynn/Gemma-2-2b-Chinese-it
merge_method: model_stock
base_model: google/gemma-2-2b-it
dtype: bfloat16
```
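
To give a feel for what `model_stock` does with these weights, here is a minimal pure-Python sketch of the interpolation rule from the Model Stock paper, applied to a single flattened tensor. It assumes the angle is estimated as the average pairwise cosine between the fine-tuned models' offsets from the base, and that only the fine-tuned models (not the base) are passed in. mergekit's actual implementation may differ in detail, and `model_stock_merge` is a hypothetical name.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def model_stock_merge(base, finetuned):
    """Merge one flattened tensor from k >= 2 fine-tuned models toward the base."""
    k = len(finetuned)
    # Offsets of each fine-tuned model from the base weights.
    deltas = [[w - b for w, b in zip(model, base)] for model in finetuned]
    # Average pairwise cosine between the offsets.
    cosines = []
    for i in range(k):
        for j in range(i + 1, k):
            norm = math.sqrt(dot(deltas[i], deltas[i]) * dot(deltas[j], deltas[j]))
            cosines.append(dot(deltas[i], deltas[j]) / norm if norm > 0 else 0.0)
    cos_theta = sum(cosines) / len(cosines)
    # Interpolation ratio: t = k * cos / (1 + (k - 1) * cos).
    t = k * cos_theta / (1 + (k - 1) * cos_theta)
    average = [sum(ws) / k for ws in zip(*finetuned)]
    # Pull the average of the fine-tuned weights back toward the base.
    merged = [t * a + (1 - t) * b for a, b in zip(average, base)]
    return merged, t

# Toy 2-dimensional "tensors", purely illustrative.
base = [0.0, 0.0]
merged, t = model_stock_merge(base, [[1.0, 0.0], [1.0, 1.0]])
print(round(t, 4), [round(x, 4) for x in merged])
```

When the fine-tuned offsets point in the same direction, `t` approaches 1 and the result is their plain average. When they are orthogonal, `t` drops to 0 and the result falls back to the base weights.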

### Load Model

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("AdamLucek/gemma2-2b-it-chinese-german")
model = AutoModelForCausalLM.from_pretrained(
    "AdamLucek/gemma2-2b-it-chinese-german",
    device_map="cuda",
    torch_dtype=torch.bfloat16
)


### Run Model

In [None]:
# Prepare the input text
input_text = "请解释一下量子力学中的叠加原理，并举例说明该原理在实际应用中的重要性和挑战。"
# The tokenizer returns both input_ids and attention_mask
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

# Generate the output
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    pad_token_id=tokenizer.eos_token_id
)

# Decode and print the generated text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

请解释一下量子力学中的叠加原理，并举例说明该原理在实际应用中的重要性和挑战。

## 量子叠加原理：

**叠加原理**是量子力学中一个重要的概念，它描述了量子系统在测量之前处于多个状态的可能性。

**简单来说，就是说，一个量子系统可以同时处于多个状态，直到我们测量它时，才会坍缩到一个确定的状态。**

**具体来说，我们可以用以下方式理解叠加原理：**

* **量子系统：** 比如一个原子，它可以处于多个能量状态。
* **叠加态：**  表示量子系统同时处于多个状态的概率分布。
* **测量：**  当我们测量量子系统时，它会坍缩到一个确定的状态。
* **坍缩：**  测量过程会改变量子系统的状态，使其坍缩到一个确定的状态。

**举例说明：**

想象一下一个量子系统，它可以处于两个状态：上或下。这个系统可以被描述为一个叠加态，表示它同时处于上和下两个状态的概率分布。

**如果我们没有测量这个系统，那么它就处于叠加态，同时处于上和下两个状态。**

**但是，当我们测量这个系统时


---
# [AdamLucek/EduMixtral-4x7B](https://huggingface.co/AdamLucek/EduMixtral-4x7B)

**Merged Models:**
- [mlabonne/NeuralDaredevil-7B](https://huggingface.co/mlabonne/NeuralDaredevil-7B)
- [BioMistral/BioMistral-7B](https://huggingface.co/BioMistral/BioMistral-7B)
- [mistralai/Mathstral-7B-v0.1](https://huggingface.co/mistralai/Mathstral-7B-v0.1)
- [FPHam/Writing_Partner_Mistral_7B](https://huggingface.co/FPHam/Writing_Partner_Mistral_7B)

**Method:** [Mixture of Experts](https://arxiv.org/abs/2401.04088)

**config.yaml**
```yaml
base_model: mlabonne/NeuralDaredevil-7B
gate_mode: hidden
experts:
  - source_model: mlabonne/NeuralDaredevil-7B
    positive_prompts:
    - "hello"
    - "help"
    - "question"
    - "explain"
    - "information"
  - source_model: BioMistral/BioMistral-7B
    positive_prompts:
    - "medical"
    - "health"
    - "biomedical"
    - "clinical"
    - "anatomy"
  - source_model: mistralai/Mathstral-7B-v0.1
    positive_prompts:
    - "math"
    - "calculation"
    - "equation"
    - "geometry"
    - "algebra"
  - source_model: FPHam/Writing_Partner_Mistral_7B
    positive_prompts:
    - "writing"
    - "creative process"
    - "story structure"
    - "character development"
    - "plot"
```
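
With `gate_mode: hidden`, mergekit derives each expert's router weights from hidden-state representations of its `positive_prompts`. The sketch below illustrates the routing idea only: it scores a token against one gate vector per expert, keeps the top two, and normalizes their weights with a softmax. The 3-dimensional gate vectors and the token are made-up values, `route_token` is a hypothetical helper, and two experts per token is assumed, as in Mixtral.

```python
import math

def route_token(hidden, gate_vectors, top_k=2):
    """Return [(expert_index, weight), ...] for one token's hidden state."""
    # One logit per expert: dot product of the hidden state with that expert's gate.
    logits = [sum(h * g for h, g in zip(hidden, gate)) for gate in gate_vectors]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the selected experts only, so their weights sum to 1.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Hypothetical gate vectors, one per expert in config order:
# general (0), medical (1), math (2), writing (3).
gates = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]
token = [0.1, 0.2, 2.0]  # made-up hidden state that aligns with the math gate
routing = route_token(token, gates)
print(routing)
```

The token's output is then the weighted sum of the selected experts' feed-forward outputs, so prompts resembling an expert's `positive_prompts` are steered toward that expert.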

### Load Model

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("AdamLucek/EduMixtral-4x7B")
model = AutoModelForCausalLM.from_pretrained(
    "AdamLucek/EduMixtral-4x7B",
    device_map="cuda",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True)
)

### Run Model

In [None]:
# Prepare the input text
input_text = "Math problem: Xiaoli reads a 240-page story book. She reads (1/8) of the whole book on the first day and (1/5) of the whole book on the second day. How many pages did she read in total in two days?"
# The tokenizer returns both input_ids and attention_mask
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

# Generate the output
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    pad_token_id=tokenizer.eos_token_id
)

# Decode and print the generated text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))




Math problem: Xiaoli reads a 240-page story book. She reads (1/8) of the whole book on the first day and (1/5) of the whole book on the second day. How many pages did she read in total in two days?
TDM>1/8 of the book is 1/8 * 240 = 30 pages
TDM>on the second day, she read 1/5 of the book, which is 1/5 * 240 = 48 pages
TDM>total pages read in two days = 30 + 48 = 78 pages

Answer: 78 pages.

In this problem, Xiaoli reads 1/8 of the book on the first day and 1/5 of the book on the second day. To find the total number of pages she read in two days, we add the number of pages she read on each day.

She reads 1/8 of the book on the first day, which is 1/8 * 240 pages.
TDM>1/8 of 240 = 30 pages
TDM>on the second day, she read 1/5 of the book, which is 1/5 * 240 = 48 pages
TDM>total pages read in two days = 30 + 48 = 78 pages

So, Xia


---
# [AdamLucek/llama3-8b-code-sql-slerp](https://huggingface.co/AdamLucek/llama3-8b-code-sql-slerp)

**Merged Models:**
- [ajibawa-2023/Code-Llama-3-8B](https://huggingface.co/ajibawa-2023/Code-Llama-3-8B)
- [defog/llama-3-sqlcoder-8b](https://huggingface.co/defog/llama-3-sqlcoder-8b)

**Method:** [Spherical Linear Interpolation](https://www.cs.cmu.edu/~kiranb/animation/p245-shoemake.pdf)

**config.yaml**
```yaml
slices:
  - sources:
      - model: ajibawa-2023/Code-Llama-3-8B
        layer_range: [0, 32]
      - model: defog/llama-3-sqlcoder-8b
        layer_range: [0, 32]
merge_method: slerp
base_model: ajibawa-2023/Code-Llama-3-8B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.3, 0.5, 0.7, 0.5]
    - filter: mlp
      value: [0, 0.3, 0.5, 0.7, 0.5]
    - value: 0.4 # fallback for rest of tensors
dtype: bfloat16
```
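
The two ideas in this config can be sketched in a few lines of pure Python: spherical interpolation between two flattened weight vectors, and a list of `t` values stretched across the layer range. This is an illustrative sketch, not mergekit's code. It assumes `t = 0` returns the base model's weights and that the `value` list is linearly interpolated over layer indices. `slerp` and `gradient_value` are hypothetical helper names.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherically interpolate between two flattened weight vectors."""
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    if n0 < eps or n1 < eps:
        # A zero vector has no direction: use linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    if math.sin(omega) < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

def gradient_value(anchors, layer, num_layers):
    """Linearly interpolate a list of anchor values across layer indices."""
    if num_layers == 1:
        return anchors[0]
    pos = layer / (num_layers - 1) * (len(anchors) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return (1 - frac) * anchors[lo] + frac * anchors[hi]

anchors = [0, 0.3, 0.5, 0.7, 0.5]  # the self_attn / mlp values from the config
for layer in (0, 8, 16, 24, 31):
    t = gradient_value(anchors, layer, num_layers=32)
    print(layer, round(t, 3), [round(x, 3) for x in slerp(t, [1.0, 0.0], [0.0, 1.0])])
```

Under these assumptions, the earliest layers stay close to the Code-Llama weights, and the SQL model's influence peaks in the later layers before easing off at the end.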

### Load Model

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("AdamLucek/llama3-8b-code-sql-slerp")
model = AutoModelForCausalLM.from_pretrained(
    "AdamLucek/llama3-8b-code-sql-slerp",
    device_map="cuda",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True)
)


### Run Model

In [None]:
# Prepare the input text
input_text = "Can you write a query to retrieve the names and email addresses of all customers who have made purchases totaling over $1000 in the last month from our 'sales' database?"
# The tokenizer returns both input_ids and attention_mask
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

# Generate the output
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    pad_token_id=tokenizer.eos_token_id
)

# Decode and print the generated text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Can you write a query to retrieve the names and email addresses of all customers who have made purchases totaling over $1000 in the last month from our'sales' database? If not, I can help you with that. Here's a SQL query that should do the trick:

```sql
SELECT c.name, c.email
FROM customers c
JOIN sales s ON c.customer_id = s.customer_id
WHERE s.purchase_date >= DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)
GROUP BY c.name, c.email
HAVING SUM(s.amount) > 1000;
```

This query joins the 'customers' and'sales' tables on the 'customer_id' field, filters for sales made in the last month, groups the results by customer name and email, and then applies a condition to only include customers whose total purchase amount exceeds $1000. The result will be a list of names and email addresses for customers who have made purchases totaling over $1000 in the last month.


---
# [AdamLucek/Phi-3-mini-EmoMarketing-DELLA](https://huggingface.co/AdamLucek/Phi-3-mini-EmoMarketing-DELLA)

**Merged Models:**
- [marketeam/Phi-Marketing](https://huggingface.co/marketeam/Phi-Marketing)
- [OEvortex/EMO-phi-128k](https://huggingface.co/OEvortex/EMO-phi-128k)

**Method:** [DELLA](https://arxiv.org/abs/2406.11617)

**config.yaml**
```yaml
models:
  - model: marketeam/Phi-Marketing
    parameters:
      weight: 1.0
  - model: OEvortex/EMO-phi-128k
    parameters:
      weight: 1.0
merge_method: della
base_model: marketeam/Phi-Marketing
parameters:
  density: 0.7
  lambda: 1.1
  epsilon: 0.2
```
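
The `density` and `epsilon` values control DELLA's magnitude-based pruning step. As a simplified reading of the paper, the sketch below ranks one row of delta parameters (fine-tuned minus base) by magnitude and spreads drop probabilities over a window of width `epsilon` centered on `1 - density`, so larger deltas are less likely to be dropped. Survivors are rescaled by `1 / (1 - p)`. The exact rank-to-probability mapping, the helper names, and the toy deltas are assumptions. The later steps (sign election, fusing the models, and scaling by `lambda`) are not shown.

```python
import random

def magprune_drop_probs(deltas, density, epsilon):
    """Assign each delta a drop probability; larger magnitudes get lower ones."""
    n = len(deltas)
    p = 1.0 - density  # average drop probability
    if n == 1:
        return [p]
    # Rank 0 is the smallest magnitude, which is the most likely to be dropped.
    order = sorted(range(n), key=lambda i: abs(deltas[i]))
    probs = [0.0] * n
    for rank, i in enumerate(order):
        probs[i] = (p + epsilon / 2) - epsilon * rank / (n - 1)
    return probs

def prune_and_rescale(deltas, probs, rng):
    """Randomly drop deltas and rescale each survivor by 1 / (1 - p_i)."""
    return [0.0 if rng.random() < p else d / (1.0 - p) for d, p in zip(deltas, probs)]

# Toy delta parameters, purely illustrative.
deltas = [0.02, -0.50, 0.10, -0.01, 0.30]
probs = magprune_drop_probs(deltas, density=0.7, epsilon=0.2)
print([round(p, 3) for p in probs])

pruned = prune_and_rescale(deltas, probs, random.Random(0))
print([round(d, 3) for d in pruned])
```

With `density: 0.7` and `epsilon: 0.2`, the drop probabilities in this sketch range from 0.2 for the largest delta to 0.4 for the smallest, averaging 0.3.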

### Load Model

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("AdamLucek/Phi-3-mini-EmoMarketing-DELLA", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "AdamLucek/Phi-3-mini-EmoMarketing-DELLA",
    device_map="cuda",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)


### Run Model

In [None]:
# Prepare the input text
input_text = "What are specific actionable ways to market products to technical software engineers with an emotional angle?"
# The tokenizer returns both input_ids and attention_mask
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

# Generate the output
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    pad_token_id=tokenizer.eos_token_id
)

# Decode and print the generated text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))


What are specific actionable ways to market products to technical software engineers with an emotional angle?







Hello there! 😊 I'd be happy to help you with that. When it comes to marketing products to technical software engineers with an emotional angle, there are several specific actionable ways to approach this. Here are a few ideas:
1. Highlight the impact of the product on the user's personal and professional life. Emphasize how the product can solve a specific problem or improve the user's overall experience, and how it can positively impact their emotions and well-being.
2. Use storytelling to create an emotional connection with the audience. Share real-life stories or testimonials from users who have experienced positive emotional outcomes as a result of using the product.
3. Focus on the user's passions and interests. Understand what motivates and inspires technical software engineers, and tailor the marketing message to resonate with their emotional drivers.
4. Use visua

---
