# Merging fine-tuned models
After fine-tuning, the trained adapter must be merged with the base model to produce a standalone model that others can simply download and try.

In [1]:
base_model = "google/gemma-2b-it"
new_model = "haesleinhuepf/gemma-2b-it-bia-proof-of-concept"

For the merging step, we reload the base model and load the fine-tuned adapter on top of it.

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel
import torch
from trl import setup_chat_format
# Reload tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(base_model)

base_model_reload = AutoModelForCausalLM.from_pretrained(
        base_model,
        return_dict=True,
        low_cpu_mem_usage=True,
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True,
)

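# Add the ChatML special tokens and chat template to model and tokenizer
base_model_reload, tokenizer = setup_chat_format(base_model_reload, tokenizer)

# Load the fine-tuned adapter on top of the base model
model = PeftModel.from_pretrained(base_model_reload, new_model + "_temp")

# Fold the adapter weights into the base weights and drop the PEFT wrappers
merged_model = model.merge_and_unload()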

`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.
Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use
`config.hidden_activation` if you want to override this behaviour.
See https://github.com/huggingface/transformers/pull/29402 for more details.
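
To illustrate what merging does to a single weight matrix, here is a minimal sketch in plain Python. It assumes the adapter is a LoRA adapter with the common scaling factor `alpha / rank`; the notebook does not state the adapter type, so this is an assumption. The helper names `matmul` and `merge_lora` and the toy numbers are hypothetical and not part of PEFT.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a).

    Assumes a LoRA-style adapter; hypothetical helper for illustration only.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update, same shape as weight
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy sizes: a 2x3 base weight and a rank-1 adapter
weight = [[1.0, 0.0, 2.0],
          [0.0, 1.0, 0.0]]
lora_a = [[1.0, 2.0, 3.0]]  # shape (rank, in_features)
lora_b = [[0.5], [-1.0]]    # shape (out_features, rank)

merged = merge_lora(weight, lora_a, lora_b, alpha=2.0, rank=1)
print(merged)
```

Under this assumption, the merged matrix gives the same outputs as the base weight plus the adapter path, so the separate adapter matrices are no longer needed at inference time.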



## Test model
After merging, we can test the model.

In [3]:
messages = [{"role": "user", "content": """
Write Python code to load the image ../11a_prompt_engineering/data/blobs.tif,
segment the nuclei in it and
show the result
"""}]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
pipe = pipeline(
    "text-generation",
    model=merged_model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipe(prompt, max_new_tokens=120, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])



<|im_start|>user

Write Python code to load the image ../11a_prompt_engineering/data/blobs.tif,
segment the nuclei in it and
show the result
<|im_end|>
<|im_start|>assistant
The code imports the necessary libraries, loads the image, segments it using the Voronoi-Otsu algorithm, and displays the resulting image.

```python

import pyclesperanto_prototype as cle
from skimage.io import imread

image = imread("../11a_prompt_engineering/data/blobs.tif")

cle.voronoi_otsu_segmentation(image)

cle.imshow(image)

```
This code imports the necessary libraries, loads the image, segments it using the Voronoi-Otsu algorithm, and displays the resulting image.
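
The printed text contains the prompt, the chat markers and the reply. To inspect only the generated code, the reply can be post-processed. The following is a minimal sketch using the standard library; the helper names `extract_reply` and `extract_code_blocks` and the shortened sample text are hypothetical. It assumes the ChatML markers shown in the output above.

```python
import re

FENCE = "`" * 3  # three backticks, written this way so the example nests cleanly
ASSISTANT_TAG = "<|im_start|>assistant"


def extract_reply(generated_text):
    """Return the text after the last assistant tag (the whole text if absent)."""
    return generated_text.rsplit(ASSISTANT_TAG, 1)[-1].strip()


def extract_code_blocks(reply):
    """Return the contents of all fenced code blocks in the reply."""
    pattern = re.compile(FENCE + r"[a-zA-Z]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)
    return [block.strip() for block in pattern.findall(reply)]


# Shortened, made-up sample in the same layout as the pipeline output
sample = (
    "<|im_start|>user\nShow the image\n<|im_end|>\n"
    "<|im_start|>assistant\nHere is the code.\n"
    + FENCE + "python\nprint('hello')\n" + FENCE + "\nDone."
)

reply = extract_reply(sample)
code_blocks = extract_code_blocks(reply)
print(code_blocks)
```

In the notebook, `extract_reply(outputs[0]["generated_text"])` would be the corresponding call. Alternatively, the text-generation pipeline accepts `return_full_text=False`, which should return only the newly generated part.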


In [4]:
merged_model.save_pretrained(new_model)
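
Before sharing the folder, it may help to check that it contains everything needed to reload the model. The sketch below uses a hypothetical helper, `missing_model_files`. It assumes that the model was saved as `config.json` plus `*.safetensors` or `*.bin` weight files, and that the tokenizer was saved to the same folder, for example with `tokenizer.save_pretrained(new_model)`. The cell above saves only the model, and since `setup_chat_format` changed the tokenizer, saving it alongside the model is likely worthwhile.

```python
from pathlib import Path


def missing_model_files(model_dir):
    """List expected entries that are absent from a saved model directory.

    Hypothetical helper; the expected file names are assumptions about
    what save_pretrained typically writes.
    """
    folder = Path(model_dir)
    missing = []
    if not (folder / "config.json").is_file():
        missing.append("config.json")
    # Weights may be stored as safetensors (possibly sharded) or as .bin files
    has_weights = any(folder.glob("*.safetensors")) or any(folder.glob("*.bin"))
    if not has_weights:
        missing.append("weight file (*.safetensors or *.bin)")
    if not (folder / "tokenizer_config.json").is_file():
        missing.append("tokenizer_config.json")
    return missing


# Same folder name as new_model in the notebook
print(missing_model_files("haesleinhuepf/gemma-2b-it-bia-proof-of-concept"))
```

An empty list means all three kinds of files were found.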

In [7]:
#merged_model.push_to_hub(new_model, use_temp_dir=False)