In [1]:
# %%
import os

# %% [markdown]
# ## Finetune Mistral-7b on an A100
#
# This Colab notebook shows how to fine-tune the recent Mistral-7B model on a single GPU.
#
# We will use the PEFT library from the Hugging Face ecosystem, together with QLoRA, for more memory-efficient fine-tuning.

# %%
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0


In [2]:
# %%
!nvidia-smi

Thu Sep 28 04:46:37 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    24W / 300W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
# %% [markdown]
# ## Setup
#
# Run the cells below to set up and install the required libraries. For our experiment we will need `accelerate`, `peft`, `transformers`, `datasets`, and TRL to leverage the recent [`SFTTrainer`](https://huggingface.co/docs/trl/main/en/sft_trainer). We will use `bitsandbytes` to [quantize the base model into 4-bit](https://huggingface.co/blog/4bit-transformers-bitsandbytes).
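#
# To see why 4-bit quantization matters on a 16 GiB card like the one reported by `nvidia-smi`, here is a rough back-of-the-envelope sketch. It counts weights only (no activations, KV cache, or quantization constants), and the parameter count of 7.0e9 is an assumption standing in for a nominal "7B" model, not an exact figure for Mistral-7B.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


NUM_PARAMS = 7.0e9  # assumption: nominal "7B" parameter count

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)  # half-precision weights
int4_gib = weight_memory_gib(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"fp16 weights:  {fp16_gib:.1f} GiB")
print(f"4-bit weights: {int4_gib:.1f} GiB")
```

# Under these assumptions, half-precision weights alone take roughly 13 GiB, which leaves little headroom on a 16 GiB GPU, while 4-bit weights need about a quarter of that.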

# %%
!pip install -U bitsandbytes einops
!pip install -U git+https://github.com/huggingface/peft
!pip install -U git+https://github.com/huggingface/transformers

In [4]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

In [None]:
peft_model_id = "dfurman/mistral-7b-instruct-peft"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token

# Load the LoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id)

In [6]:
format_template = "You are a helpful assistant. Write a response that appropriately completes the request. {query}\n"
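# %% [markdown]
# The template is applied with `str.format` in every generation cell. A small helper can keep that step in one place. This is only a sketch: `build_prompt` and `FORMAT_TEMPLATE` are hypothetical names introduced here, and stripping whitespace from the query is an added assumption, not something the original cells do.

```python
# Same template text as in the notebook, split across lines for readability.
FORMAT_TEMPLATE = (
    "You are a helpful assistant. Write a response that appropriately "
    "completes the request. {query}\n"
)


def build_prompt(query: str) -> str:
    """Insert the user query into the instruction template."""
    # Assumption: leading/trailing whitespace in the query is not meaningful.
    return FORMAT_TEMPLATE.format(query=query.strip())


prompt = build_prompt("What is a good recipe for vegan banana bread?")
print(prompt)
```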

In [10]:

# First, format the prompt
query = "What is a good recipe for vegan banana bread?"
prompt = format_template.format(query=query)

# Inference can be done using model.generate
print("\n\n*** Generate:")

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
with torch.autocast("cuda", dtype=torch.float16):
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=512,
        do_sample=False,
        # temperature=0.3,  # only takes effect when do_sample=True
        return_dict_in_generate=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        repetition_penalty=1.2,
    )

print(tokenizer.decode(output["sequences"][0], skip_special_tokens=True))




*** Generate:
You are a helpful assistant. Write a response that appropriately completes the request. What is a good recipe for vegan banana bread?
I know! Let's start by gathering all of our ingredients: 2 cups flour, 1 teaspoon baking soda, 1/4 teaspoon salt, and 3 very ripe bananas (the more brown spots they have, the better). Then we need some wet ingredients to mix with the dry ones: 1 cup sugar, 1/2 cup vegetable oil or applesauce, and 1 tablespoon vanilla extract. Now let's make sure everything is ready before we begin mixing it together. Preheat your oven to 350 degrees Fahrenheit and grease an 8x4 inch loaf pan with cooking spray or butter. Next, in a large bowl, whisk together the flour, baking soda, and salt until well combined. In another smaller bowl, mash up those bananas really well using a fork or potato masher - you want them to be almost liquid-y. Add the sugar, oil or applesauce, and vanilla extract to the bananas and stir until smooth. Pour this mixture into the l
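
# %% [markdown]
# The decoded text starts with the prompt because, for a decoder-only model, `generate` returns the prompt tokens followed by the new tokens. Below is a minimal sketch of two ways to keep only the completion. `new_token_ids` and `strip_prompt` are hypothetical helpers; plain lists and strings stand in for the real tensors and decoded output.

```python
def new_token_ids(sequence, prompt_length):
    """Drop the first `prompt_length` ids, keeping only generated tokens."""
    return sequence[prompt_length:]


def strip_prompt(decoded: str, prompt: str) -> str:
    """Remove the echoed prompt from decoded text, if it is present."""
    prompt = prompt.strip()
    if decoded.startswith(prompt):
        return decoded[len(prompt):].lstrip()
    return decoded  # leave the text untouched if the prompt is not a prefix


# Toy stand-ins: three prompt ids followed by two generated ids.
print(new_token_ids([11, 12, 13, 21, 22], prompt_length=3))
print(strip_prompt("Q: hi\nA: hello", "Q: hi\n"))

# In the notebook, the same slicing would look like:
#   generated = output["sequences"][0][input_ids.shape[1]:]
#   print(tokenizer.decode(generated, skip_special_tokens=True))
```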

In [11]:

# First, format the prompt
query = "Write me a numbered list of things to do in New York City."
prompt = format_template.format(query=query)

# Inference can be done using model.generate
print("\n\n*** Generate:")

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
with torch.autocast("cuda", dtype=torch.float16):
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=512,
        do_sample=False,
        # temperature=0.3,  # only takes effect when do_sample=True
        return_dict_in_generate=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        repetition_penalty=1.2,
    )

print(tokenizer.decode(output["sequences"][0], skip_special_tokens=True))




*** Generate:
You are a helpful assistant. Write a response that appropriately completes the request. Write me a numbered list of things to do in New York City.
I'd start with some classic tourist attractions, like visiting the Statue of Liberty and taking a stroll through Central Park. Then I might add some more unique experiences, such as catching a Broadway show or exploring Chinatown for authentic Asian cuisine. For those who enjoy shopping, I could include recommendations for popular stores and markets around town. And no trip to NYC would be complete without trying out some famous local foods, like pizza from Lombardi's or bagels from Ess-a-Bagel. So here is my numbered list: 1. Visit the Statue of Liberty 2. Take a walk through Central Park 3. Catch a Broadway show 4. Explore Chinatown 5. Shop at popular stores and markets 6. Try delicious local food options

Numbered List:
1. Visit the Statue of Liberty - This iconic symbol of freedom stands tall on Liberty Island in New York
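
# %% [markdown]
# The generation cells in this notebook pass the same arguments to `model.generate` each time. One possible way to avoid the repetition is to build the keyword arguments once and unpack them. This is a sketch: `make_generation_kwargs` is a hypothetical helper, and the token id `2` below is a placeholder rather than a value read from the tokenizer.

```python
def make_generation_kwargs(
    eos_token_id: int,
    pad_token_id: int,
    max_new_tokens: int = 512,
) -> dict:
    """Collect the generation settings used throughout the notebook."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": False,            # greedy decoding
        "return_dict_in_generate": True,
        "eos_token_id": eos_token_id,
        "pad_token_id": pad_token_id,
        "repetition_penalty": 1.2,
    }


# Placeholder ids; in the notebook these would come from
# tokenizer.eos_token_id and tokenizer.pad_token_id.
gen_kwargs = make_generation_kwargs(eos_token_id=2, pad_token_id=2)
print(gen_kwargs)

# Usage in a generation cell:
#   output = model.generate(input_ids=input_ids, **gen_kwargs)
```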

In [None]:

# First, format the prompt
query = "Write a short email inviting my friends to a dinner party on Friday. Respond succinctly."
prompt = format_template.format(query=query)

# Inference can be done using model.generate
print("\n\n*** Generate:")

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
with torch.autocast("cuda", dtype=torch.float16):
    output = model.generate(
        input_ids=input_ids,
        max_new_tokens=512,
        do_sample=False,
        # temperature=0.3,  # only takes effect when do_sample=True
        return_dict_in_generate=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        repetition_penalty=1.2,
    )

print(tokenizer.decode(output["sequences"][0], skip_special_tokens=True))




*** Generate:
