## Fine-tuning Mistral 7B with AutoTrain

### Set Up the Runtime
Fine-tuning Mistral 7B requires a GPU instance. Follow the directions below:

- Go to `Runtime` (located in the top menu bar).
- Select `Change Runtime Type`.
- Choose `T4 GPU` (or a comparable option).
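
Once the runtime has been switched, it can help to confirm that a GPU is actually attached before installing anything. The sketch below is one way to do that using only the standard library; it assumes the `nvidia-smi` tool is on the `PATH` (as it normally is on GPU runtimes), and the helper name `gpu_summary` is made up for this example.

```python
import shutil
import subprocess


def gpu_summary():
    """Return one 'name, memory' string per visible GPU, or [] if none is found."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools, so no GPU runtime is attached
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        return []  # the tool exists but could not talk to a GPU
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = gpu_summary()
print(gpus if gpus else "No GPU detected: change the runtime type before continuing.")
```

An empty list usually means the runtime type was not changed or the session needs to be restarted.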

## Step 1: Set Up the Environment

In [2]:
!pip install pandas autotrain-advanced -q

In [3]:
!autotrain setup --update-torch

INFO     | 2025-03-31 14:06:12 | autotrain.cli.run_setup:run:43 - Installing latest xformers
INFO     | 2025-03-31 14:06:13 | autotrain.cli.run_setup:run:45 - Successfully installed latest xformers
INFO     | 2025-03-31 14:06:13 | autotrain.cli.run_setup:run:51 - Installing latest PyTorch
INFO     | 2025-03-31 14:06:16 | autotrain.cli.run_setup:run:53 - Successfully installed latest PyTorch


In [None]:
!pip install triton
!pip install --upgrade bitsandbytes

## Step 2: Connect to Hugging Face for Model Upload

### Logging in to Hugging Face
To upload the model so that it can be used for inference, you need to log in to the Hugging Face Hub.

### Getting a Hugging Face token
Steps:

1. Navigate to this URL: https://huggingface.co/settings/tokens
2. Create a write `token` and copy it to your clipboard
3. Run the code below and enter your `token`

In [None]:
from huggingface_hub import notebook_login
notebook_login()

## Step 3: Upload your dataset

Add your dataset to the root directory of the Colab session under the name `train.csv`. The AutoTrain command looks for your data there under that name.

#### Don't have a dataset and want to try fine-tuning on an example dataset?
If you don't have a dataset, save any small example CSV with `text` and `rejected_text` columns as `train.csv`. The cell below loads the file and displays a preview.

In [20]:
import pandas as pd
df = pd.read_csv("train.csv")
df

| Unnamed: 0 | text | rejected_text |
|---|---|---|
| 0 | system_instruction: System-instruction: You ar... | Suggest medications for a patient with high bl... |
| 1 | system_instruction: You are an endocrinology s... | Provide treatment options for type 2 diabetes. |
| 2 | system_instruction: You are a pediatric immuni... | Explain vaccination protocols for children. |
| 3 | system_instruction: You are a financial analys... | Analyze the stock market trends for the last 6... |
| 4 | system_instruction: You are a certified financ... | Explain strategies for retirement planning. |
| ... | ... | ... |
| 106 | system_instruction: You are a predictive analy... | You are a predictive analytics assistant for f... |
| 107 | system_instruction: You are a healthcare assis... | You are a healthcare assistant. Provide treatm... |
| 108 | system_instruction: You are an emergency respo... | You are an emergency response coordinator. Ass... |
| 109 | system_instruction: You are a robotics managem... | You are a robotics management assistant. Coord... |


In [None]:
df['text'][15]
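
Before training, it may be worth checking that the CSV has the two columns the training command refers to (`text` and `rejected_text`) and that no cell is empty. The following is a minimal sketch using only the standard library. The helper names, the file name `train_example.csv`, and the two placeholder rows are all invented for illustration and are not part of the original dataset.

```python
import csv

REQUIRED_COLUMNS = ("text", "rejected_text")  # column names used in this notebook


def write_example_dataset(path):
    """Write a tiny, made-up preference dataset (placeholder rows only)."""
    rows = [
        {"text": "A clear, specific answer.", "rejected_text": "A vague answer."},
        {"text": "A polite, complete reply.", "rejected_text": "A curt reply."},
    ]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=REQUIRED_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


def validate_dataset(path):
    """Return the number of data rows; raise ValueError on missing columns or empty cells."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        count = 0
        for row_no, row in enumerate(reader, start=1):
            for col in REQUIRED_COLUMNS:
                if not (row[col] or "").strip():
                    raise ValueError(f"empty '{col}' in data row {row_no}")
            count += 1
    return count


write_example_dataset("train_example.csv")
print(validate_dataset("train_example.csv"))
```

Pointing `validate_dataset` at your real `train.csv` performs the same check on your own data.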

## Step 4: Overview of AutoTrain Command

#### Short Overview of What the Command Flags Do

- `!autotrain`: The leading `!` runs a shell command from a notebook cell (Jupyter or Colab); `autotrain` is the AutoTrain command-line tool.
- `llm`: Specifies that you’re working on a language model task.
- `--train`: Initiates the training process.
- `--project-name`: Sets the name of the project (and output directory).
- `--model mistralai/Mistral-7B-Instruct-v0.3`: Specifies the base model hosted on Hugging Face (here, "Mistral-7B-Instruct-v0.3" under the "mistralai" namespace).
- `--data-path .`: Points to the directory containing your dataset. With `.` AutoTrain will look for your dataset (e.g. `train.csv`) in the current directory.
- `--peft`: Enables Parameter-Efficient Fine-Tuning (PEFT) using techniques like LoRA.
- `--lora-r`, `--lora-alpha`, `--lora-dropout`: Configure LoRA parameters (e.g., rank, scaling factor, and dropout rate).
- `--quantization int4`: Loads the base model with 4-bit quantization, which reduces GPU memory use during training at a small cost in precision.
- `--lr 1e-4`: Sets the learning rate for training to 0.0001.
- `--batch-size 4`: Sets the training batch size to 4.
- `--epochs 10`: Instructs AutoTrain to iterate over your dataset 10 times (10 training epochs).
- `--scheduler cosine`: Uses a cosine learning-rate schedule.
- `--mixed-precision bf16`: Uses BF16 mixed-precision training for improved performance and memory efficiency.
- `--trainer reward`: Specifies that the Reward Trainer should be used (ideal for training reward models).
- `--text-column text`, `--rejected-text-column rejected_text`: Name the dataset columns that hold the chosen text and the rejected text for the Reward Trainer.
- `--target-modules q_proj,v_proj`: Limits fine-tuning to specific target modules (in this case, the query and value projection layers).
- `--push-to-hub`: Automatically uploads the fine-tuned model to the Hugging Face Hub after training.
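
The flags above can also be assembled programmatically, which makes it easier to change one value without retyping the whole command. The sketch below only builds and prints the command string; it does not run AutoTrain. The values are copied from the training command in this notebook, and the helper name `build_command` is made up for this example.

```python
import shlex

# Flags that take no value.
switches = ["train", "peft", "push-to-hub"]

# Flags that take a value (copied from the training command in this notebook).
options = {
    "project-name": "mistral-prompttune",
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "data-path": ".",
    "text-column": "text",
    "rejected-text-column": "rejected_text",
    "lora-r": 16,
    "lora-alpha": 32,
    "lora-dropout": 0.1,
    "quantization": "int4",
    "lr": "1e-4",
    "batch-size": 4,
    "epochs": 10,
    "trainer": "reward",
    "scheduler": "cosine",
    "mixed-precision": "bf16",
    "target-modules": "q_proj,v_proj",
}


def build_command(switches, options):
    """Return the argument list for `autotrain llm` with the given flags."""
    args = ["autotrain", "llm"]
    args += [f"--{name}" for name in switches]
    for name, value in options.items():
        args += [f"--{name}", str(value)]
    return args


command = shlex.join(build_command(switches, options))
print(command)
```

The printed string can be pasted after a `!` in a notebook cell, adding `--username` (and `--token` if needed) for the upload.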

---

### Steps Needed Before Running

1. **Project Name:** After `--project-name`, replace `mistral-prompttune` with the name you'd like to assign to your project.
2. **Hub Username:** After `--username`, replace `VidyutCx` with your own Hugging Face username. The model repository is named from the username and the project name, and it is created automatically if it doesn't exist.
3. **Dataset Location:** Ensure that your dataset (e.g. `train.csv`) is in the root directory, since `--data-path .` directs AutoTrain to look there.
4. **LoRA Parameters:** Adjust the LoRA parameters if needed. Here, `--target-modules q_proj,v_proj` indicates which parts of the model to fine-tune.
5. **Mixed Precision & Trainer:** Confirm you want to use BF16 mixed-precision training along with the Reward Trainer. BF16 generally needs an Ampere-or-newer GPU; the T4 suggested above is older, so `--mixed-precision fp16` may be the safer choice there.
6. **Authentication:** If required, include your Hugging Face credentials using `--username` and `--token`.
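
To get a feel for what the LoRA settings in item 4 mean, the arithmetic below estimates how many parameters are trained when rank-16 adapters are attached to `q_proj` and `v_proj`. A LoRA adapter on a layer with input size `d_in` and output size `d_out` adds `r * (d_in + d_out)` parameters. The layer shapes used here (hidden size 4096, key/value projection size 1024, 32 layers) are assumptions about the Mistral 7B architecture and are not stated in this notebook, so treat the result as an estimate.

```python
# Assumed Mistral 7B shapes (not taken from this notebook).
HIDDEN = 4096   # model hidden size; q_proj maps HIDDEN -> HIDDEN
KV_DIM = 1024   # v_proj maps HIDDEN -> KV_DIM (8 key/value heads x 128 dims)
LAYERS = 32     # number of transformer blocks


def lora_params(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


r, alpha = 16, 32  # --lora-r and --lora-alpha from the training command

per_layer = lora_params(HIDDEN, HIDDEN, r) + lora_params(HIDDEN, KV_DIM, r)
total = per_layer * LAYERS
scaling = alpha / r  # factor applied to the adapter output

print(f"trainable LoRA parameters: {total:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the adapters hold roughly 6.8 million parameters, a small fraction of the 7B base model, which is why PEFT fits on a single GPU.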

---

In [None]:
!autotrain llm --train \
    --project-name mistral-prompttune \
    --model mistralai/Mistral-7B-Instruct-v0.3 \
    --data-path . \
    --peft \
    --text-column text \
    --rejected-text-column rejected_text \
    --lora-r 16 \
    --lora-alpha 32 \
    --lora-dropout 0.1 \
    --quantization int4 \
    --lr 1e-4 \
    --batch-size 4 \
    --epochs 10 \
    --trainer reward \
    --scheduler cosine \
    --mixed-precision bf16 \
    --target-modules q_proj,v_proj \
    --push-to-hub \
    --username VidyutCx

In [None]:
!autotrain llm --help
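
As a rough guide to what the training command above implies, the sketch below estimates the number of optimizer steps and evaluates a plain cosine learning-rate schedule. It rests on several simplifying assumptions that may not match AutoTrain's defaults: all 110 rows shown in the dataset preview are used for training, a single GPU is used, there is no gradient accumulation, and there is no warmup. The function `cosine_lr` is a textbook cosine decay, not AutoTrain's own implementation.

```python
import math


def cosine_lr(step, total_steps, base_lr):
    """Cosine decay from base_lr at step 0 to 0 at total_steps (no warmup)."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * step / total_steps))


rows = 110        # dataset preview shows indices 0..109
batch_size = 4    # --batch-size
epochs = 10       # --epochs
base_lr = 1e-4    # --lr

steps_per_epoch = math.ceil(rows / batch_size)
total_steps = steps_per_epoch * epochs
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")

# Learning rate at the start, middle, and end of training.
for step in (0, total_steps // 2, total_steps):
    print(f"step {step:>3}: lr = {cosine_lr(step, total_steps, base_lr):.2e}")
```

The actual step count reported in the training logs will differ if AutoTrain holds out a validation split or accumulates gradients.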

## Step 5: Completed 🎉
After the command above completes, your model is uploaded to Hugging Face.

#### Learn more about AutoTrain (optional)
To see which command-line flags are available, run the command below.

In [None]:
!autotrain llm -h

## Step 6: Inference Engine

In [None]:
!pip install -q peft accelerate bitsandbytes safetensors

In [None]:
import torch
import transformers
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repository that holds the LoRA adapter weights
adapters_name = "ashishpatel26/mistral-7b-mj-finetuned"
# Base model the adapter is applied to (sharded copy of "mistralai/Mistral-7B-Instruct-v0.1")
model_name = "bn22/Mistral-7B-Instruct-v0.1-sharded"

device = "cuda"  # the device the inputs are moved to

In [None]:
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
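
The reason for loading in 4-bit becomes clearer with some rough arithmetic on the memory needed just to hold the weights. The sketch below assumes about 7.24 billion parameters for Mistral 7B (an approximate figure, not taken from this notebook) and ignores quantization constants, activations, the KV cache, and the adapter weights, so real usage will be higher.

```python
N_PARAMS = 7.24e9  # approximate Mistral 7B parameter count (assumption)


def weight_gib(n_params, bits_per_param):
    """Memory for the weights alone, in GiB, at the given precision."""
    return n_params * bits_per_param / 8 / 2**30


for bits in (16, 4):
    print(f"{bits}-bit weights: ~{weight_gib(N_PARAMS, bits):.1f} GiB")
```

At 16 bits the weights alone come close to the 16 GB of a T4, while at 4 bits they leave room for activations and generation.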

In [None]:
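# 4-bit loading is configured through bnb_config; passing load_in_4bit=True
# alongside quantization_config is rejected by recent transformers versions.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
    device_map='auto'
)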

## Step 7: Load the PEFT Adapter from the Uploaded Model

In [None]:
model = PeftModel.from_pretrained(model, adapters_name)

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.bos_token_id = 1

stop_token_ids = [0]

print(f"Successfully loaded the model {model_name} into memory")

In [None]:
text = "[INST] generate a midjourney prompt for A person walks in the rain [/INST]"

encoded = tokenizer(text, return_tensors="pt", add_special_tokens=False)
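# The quantized model is already placed on the GPU by device_map='auto',
# so only the input tensors need to be moved to the same device.
model_input = encoded.to(device)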
generated_ids = model.generate(**model_input, max_new_tokens=200, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
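
The prompt above follows the `[INST] ... [/INST]` instruction format, and the decoded output repeats the prompt before the model's reply. The helpers below are a small sketch for building such a prompt and for stripping it from the decoded text. The function names are made up for this example, the `</s>` end-of-sequence marker is an assumption about the tokenizer, and the sample decoded string is a placeholder rather than real model output.

```python
def build_inst_prompt(user_message):
    """Wrap a user message in the [INST] ... [/INST] markers used above."""
    return f"[INST] {user_message.strip()} [/INST]"


def extract_reply(decoded, eos_token="</s>"):
    """Return the text after the first [/INST], without the end-of-sequence marker."""
    reply = decoded.split("[/INST]", 1)[-1]
    return reply.replace(eos_token, "").strip()


prompt = build_inst_prompt("generate a midjourney prompt for A person walks in the rain")
print(prompt)

# Placeholder string standing in for tokenizer.batch_decode(...)[0].
sample_decoded = "<s>[INST] hi [/INST] hello there</s>"
print(extract_reply(sample_decoded))
```

In the notebook, `extract_reply(decoded[0])` would return only the generated continuation.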