## Fine-tuning Mistral 7B with AutoTrain

### Set Up the Runtime
Fine-tuning Mistral 7B requires a GPU instance. Follow the directions below:

- Go to `Runtime` (located in the top menu bar).
- Select `Change Runtime Type`.
- Choose `T4 GPU` (or a comparable option).
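
Once the runtime has been switched, it can help to confirm that a GPU is actually visible before installing anything. The sketch below is an illustration, not part of the original notebook: it assumes the NVIDIA driver ships the `nvidia-smi` tool (as Colab GPU runtimes do), and `gpu_summary` is a hypothetical helper name.

```python
import shutil
import subprocess
from typing import Optional


def gpu_summary() -> Optional[str]:
    """Return the GPU list reported by `nvidia-smi -L`, or None if no GPU is visible."""
    # Without the driver tool on PATH there is no NVIDIA GPU attached to this runtime.
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


summary = gpu_summary()
print(summary if summary else "No GPU detected: check the runtime type.")
```

If the message about a missing GPU appears, repeat the runtime steps above before continuing.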

## Step 1: Set Up the Environment

In [2]:
!pip install pandas autotrain-advanced -q

In [3]:
!autotrain setup --update-torch

INFO     | 2025-03-31 14:06:12 | autotrain.cli.run_setup:run:43 - Installing latest xformers
INFO     | 2025-03-31 14:06:13 | autotrain.cli.run_setup:run:45 - Successfully installed latest xformers
INFO     | 2025-03-31 14:06:13 | autotrain.cli.run_setup:run:51 - Installing latest PyTorch
INFO     | 2025-03-31 14:06:16 | autotrain.cli.run_setup:run:53 - Successfully installed latest PyTorch


In [24]:
!pip install triton
!pip install --upgrade bitsandbytes

Collecting bitsandbytes
  Downloading bitsandbytes-0.45.4-py3-none-manylinux_2_24_x86_64.whl.metadata (5.0 kB)
Downloading bitsandbytes-0.45.4-py3-none-manylinux_2_24_x86_64.whl (76.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 76.0/76.0 MB 7.1 MB/s eta 0:00:00
Installing collected packages: bitsandbytes
  Attempting uninstall: bitsandbytes
    Found existing installation: bitsandbytes 0.45.0
    Uninstalling bitsandbytes-0.45.0:
      Successfully uninstalled bitsandbytes-0.45.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
autotrain-advanced 0.8.36 requires bitsandbytes==0.45.0; sys_platform == "linux", but you have bitsandbytes 0.45.4 which is incompatible.
Successfully installed bitsandbytes-0.45.4
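
The pip output above reports that `autotrain-advanced 0.8.36` pins `bitsandbytes==0.45.0`, while the upgrade installed 0.45.4. A minimal sketch for checking which version ended up installed is shown below; it uses only the standard library, and `installed_version` and `matches_pin` are hypothetical helper names introduced for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package: str, pinned: str) -> bool:
    """True only when the installed version equals the pinned version string."""
    return installed_version(package) == pinned


# The pin below is the one named in the pip error message above.
print("bitsandbytes installed:", installed_version("bitsandbytes"))
print("matches the 0.45.0 pin:", matches_pin("bitsandbytes", "0.45.0"))
```

Whether the mismatch matters in practice depends on the AutoTrain release in use; the check only makes the installed state explicit.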


## Step 2: Connect to Hugging Face for Model Upload

### Logging in to Hugging Face
To upload the fine-tuned model so it can be used for inference, you need to log in to the Hugging Face Hub.

### Getting a Hugging Face token
Steps:

1. Navigate to this URL: https://huggingface.co/settings/tokens
2. Create a token with write access and copy it to your clipboard
3. Run the code below and enter your token

In [4]:
from huggingface_hub import notebook_login
notebook_login()
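
When the notebook runs without a browser widget (for example as a script), a non-interactive login is an alternative to `notebook_login()`. The sketch below assumes the token is stored in an environment variable named `HF_TOKEN`; `token_from_env` and `login_with_token` are hypothetical helper names, and `huggingface_hub.login` is imported lazily so the helper can be defined even before the library is installed.

```python
import os
from typing import Optional


def token_from_env(var_name: str = "HF_TOKEN") -> Optional[str]:
    """Read a Hugging Face token from the environment, or return None if unset or blank."""
    token = os.environ.get(var_name, "").strip()
    return token or None


def login_with_token(token: str) -> None:
    """Log in to the Hugging Face Hub with an explicit token."""
    # Imported here so that defining the helper does not require huggingface_hub.
    from huggingface_hub import login

    login(token=token)


token = token_from_env()
print("Token found in HF_TOKEN" if token else "HF_TOKEN is not set; use notebook_login() instead")
# login_with_token(token)  # uncomment once a token is available
```

Keeping the token in an environment variable avoids pasting it into a notebook cell that might later be shared.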


## Step 3: Upload your dataset

Add your dataset to the root directory of the Colab runtime and name it `train.csv`. The AutoTrain command looks for the file there under that name.
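
As an illustration of the expected file layout, the sketch below writes a tiny preference dataset with the two columns that the training command later refers to (`text` for the preferred sample, `rejected_text` for the rejected one). The rows are made-up placeholders, `write_preference_csv` is a hypothetical helper, and the file is written to a temporary directory under the assumed name `train_example.csv` so it cannot overwrite a real `train.csv`.

```python
import csv
import tempfile
from pathlib import Path

FIELDNAMES = ["text", "rejected_text"]


def write_preference_csv(path, pairs):
    """Write (preferred, rejected) text pairs to a CSV with the expected header."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        for preferred, rejected in pairs:
            writer.writerow({"text": preferred, "rejected_text": rejected})
    return path


# Placeholder rows for illustration only; real data should follow the same two-column shape.
example_pairs = [
    ("system_instruction: You are a travel planner. Ask for dates first.", "Plan a trip."),
    ("system_instruction: You are a code reviewer. List issues by severity.", "Review this code."),
]

out_path = write_preference_csv(Path(tempfile.gettempdir()) / "train_example.csv", example_pairs)

# Read the file back to confirm the header and the row count.
with out_path.open(newline="", encoding="utf-8") as handle:
    rows = list(csv.DictReader(handle))
print(len(rows), "rows with columns", list(rows[0].keys()))
```

Copying a file shaped like this to `train.csv` in the root directory is enough for the command to find it.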

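#### Don't have a dataset and want to try fine-tuning on an example dataset?
If you don't have a dataset, create or download an example one and save it as `train.csv`. The cell below loads the file and previews it so you can confirm that it contains the `text` and `rejected_text` columns used by the training command.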

In [20]:
import pandas as pd
df = pd.read_csv("train.csv")
df

| Unnamed: 0 | text | rejected_text |
|---|---|---|
| 0 | system_instruction: System-instruction: You ar... | Suggest medications for a patient with high bl... |
| 1 | system_instruction: You are an endocrinology s... | Provide treatment options for type 2 diabetes. |
| 2 | system_instruction: You are a pediatric immuni... | Explain vaccination protocols for children. |
| 3 | system_instruction: You are a financial analys... | Analyze the stock market trends for the last 6... |
| 4 | system_instruction: You are a certified financ... | Explain strategies for retirement planning. |
| ... | ... | ... |
| 106 | system_instruction: You are a predictive analy... | You are a predictive analytics assistant for f... |
| 107 | system_instruction: You are a healthcare assis... | You are a healthcare assistant. Provide treatm... |
| 108 | system_instruction: You are an emergency respo... | You are an emergency response coordinator. Ass... |
| 109 | system_instruction: You are a robotics managem... | You are a robotics management assistant. Coord... |


In [21]:
df['text'][15]

'role: You are an experienced high school math educator specializing in algebra. Your task is to create a detailed lesson plan that introduces quadratic equations; explains their properties; methods of solving; and practical applications. \n state_tracking: ["Step 1: Identify the core concepts related to quadratic equations (standard form; vertex form; roots; discriminant)."; "Step 2: Define clear learning objectives and outcomes."; "Step 3: Design engaging instructional activities and real-world examples."; "Step 4: Include formative assessment methods to gauge understanding."; "Step 5: Provide a summary and list supplementary reference materials."] \n few_shot_examples: [{input: "How do you introduce quadratic equations in simple terms?"; internal_thought: "Link the concept with everyday situations and visual aids."; output: "Begin by explaining that quadratic equations have the form ax² + bx + c = 0 and illustrate their use in modeling real-world scenarios like projectile motion or 

## Step 4: Overview of AutoTrain Command

#### Short Overview of What the Command Flags Do

- `!autotrain`: The leading `!` runs a shell command from a notebook cell (Jupyter or Colab); `autotrain` is the AutoTrain command-line tool.
- `llm`: Specifies that you’re working on a language model task.
- `--train`: Initiates the training process.
- `--project-name`: Sets the name of the project (and output directory).
- `--model mistralai/Mistral-7B-Instruct-v0.3`: Specifies the base model hosted on Hugging Face (here, `Mistral-7B-Instruct-v0.3` under the `mistralai` namespace).
- `--data-path .`: Points to the directory containing your dataset. With `.`, AutoTrain looks for your dataset (e.g. `train.csv`) in the current directory.
- `--peft`: Enables Parameter-Efficient Fine-Tuning (PEFT) using techniques like LoRA.
- `--lora-r`, `--lora-alpha`, `--lora-dropout`: Configure LoRA parameters (e.g., rank, scaling factor, and dropout rate).
- `--quantization int4`: Loads the base model in 4-bit precision, which cuts GPU memory use during fine-tuning at the cost of a small loss in precision.
- `--lr 2e-4`: Sets the learning rate for training to 0.0002.
- `--batch-size 12`: Sets the training batch size to 12.
- `--epochs 3`: Instructs AutoTrain to iterate over your dataset 3 times (3 training epochs).
- `--mixed-precision bf16`: Uses BF16 mixed-precision training to reduce memory use and speed up training on GPUs that support it.
- `--trainer reward`: Specifies that the Reward Trainer should be used, which trains a reward model from pairs of preferred (`text`) and rejected (`rejected_text`) samples.
- `--target-modules q_proj,v_proj`: Limits fine-tuning to specific target modules (in this case, the query and value projection layers).
- `--push-to-hub`: Automatically uploads the fine-tuned model to the Hugging Face Hub after training.
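
To make the LoRA flags concrete, the sketch below estimates how many adapter parameters `--lora-r 16` adds when only `q_proj` and `v_proj` are targeted. The layer shapes are assumptions taken from the published Mistral 7B configuration (hidden size 4096, 32 layers, 8 key/value heads of dimension 128), not from this notebook, and the count ignores any classification head the Reward Trainer may add.

```python
def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds two matrices per layer: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


# Assumed Mistral 7B shapes (from the public model config, not from this notebook).
HIDDEN_SIZE = 4096
KV_DIM = 8 * 128        # 8 key/value heads x head dimension 128
NUM_LAYERS = 32

RANK = 16               # --lora-r
ALPHA = 32              # --lora-alpha

per_layer = (
    lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)   # q_proj: 4096 -> 4096
    + lora_params(HIDDEN_SIZE, KV_DIM, RANK)      # v_proj: 4096 -> 1024
)
total = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the LoRA update

print(f"adapter parameters: {total:,}")   # 6,815,744
print(f"LoRA scaling (alpha / r): {scaling}")  # 2.0
```

Under these assumptions the adapters hold roughly 6.8 million trainable parameters, a small fraction of the roughly 7 billion in the base model.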

---

### Steps Needed Before Running

1. **Project Name:** After `--project-name`, replace `my-project-name` with the name you'd like to assign to your project.
2. **Username:** After `--username`, replace `USERNAME` with your Hugging Face username so that `--push-to-hub` uploads the model to your account.
3. **Dataset Location:** Ensure that your dataset (e.g. `train.csv`) is in the root directory since `--data-path .` directs AutoTrain to look there.
4. **LoRA Parameters:** Adjust the LoRA parameters if needed. Here, `--target-modules q_proj,v_proj` indicates which parts of the model to fine-tune.
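5. **Mixed Precision & Trainer:** Confirm that BF16 mixed-precision training and the Reward Trainer fit your setup. BF16 needs hardware support (NVIDIA Ampere or newer); on a T4, `--mixed-precision fp16` is likely the safer choice.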
6. **Authentication:** If required, include your Hugging Face credentials using `--username` and `--token`.
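
The checklist above can also be applied programmatically. The sketch below assembles the command line from a dictionary so each value is edited in one place; it only uses flags that appear in this notebook, and `build_autotrain_command` is a hypothetical helper, not part of AutoTrain. The values `my-project-name` and `USERNAME` are placeholders.

```python
import shlex


def build_autotrain_command(options: dict, switches: list) -> str:
    """Join value flags and boolean switches into one shell-safe command string."""
    parts = ["autotrain", "llm", "--train"]
    for flag, value in options.items():
        parts += [f"--{flag}", str(value)]
    parts += [f"--{flag}" for flag in switches]
    return shlex.join(parts)


options = {
    "project-name": "my-project-name",   # placeholder: choose your own
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "data-path": ".",
    "lora-r": 16,
    "lora-alpha": 32,
    "lora-dropout": 0.1,
    "quantization": "int4",
    "lr": "2e-4",
    "batch-size": 12,
    "epochs": 3,
    "mixed-precision": "bf16",
    "trainer": "reward",
    "target-modules": "q_proj,v_proj",
    "username": "USERNAME",              # placeholder: your Hugging Face username
}
switches = ["peft", "push-to-hub"]

command = build_autotrain_command(options, switches)
print(command)
```

In a notebook, the printed string can be pasted after a leading `!`; `shlex.join` quotes any value that contains spaces or shell metacharacters.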

---

### Example Command

```bash
!autotrain llm --train \
    --project-name my-project-name \
    --model mistralai/Mistral-7B-Instruct-v0.3 \
    --data-path . \
    --peft \
    --lora-r 16 \
    --lora-alpha 32 \
    --lora-dropout 0.1 \
    --quantization int4 \
    --lr 2e-4 \
    --batch-size 12 \
    --epochs 3 \
    --mixed-precision bf16 \
    --trainer reward \
    --target-modules q_proj,v_proj \
    --push-to-hub \
    --username USERNAME
```

In [None]:
!autotrain llm --train \
    --project-name mistral-prompttune \
    --model mistralai/Mistral-7B-Instruct-v0.3 \
    --data-path . \
    --peft \
    --text-column text \
    --rejected-text-column rejected_text \
    --lora-r 16 \
    --lora-alpha 32 \
    --lora-dropout 0.1 \
    --quantization int4 \
    --lr 1e-4 \
    --batch-size 4 \
    --epochs 10 \
    --trainer reward \
    --scheduler cosine \
    --mixed-precision bf16 \
    --target-modules q_proj,v_proj \
    --push-to-hub \
    --username VidyutCx

INFO     | 2025-03-31 14:56:29 | autotrain.cli.run_llm:run:136 - Running LLM
Saving the dataset (1/1 shards): 100% 111/111 [00:00<00:00, 20713.07 examples/s]
Saving the dataset (1/1 shards): 100% 111/111 [00:00<00:00, 20962.08 examples/s]
INFO     | 2025-03-31 14:56:29 | autotrain.backends.local:create:20 - Starting local training...
INFO     | 2025-03-31 14:56:29 | autotrain.commands:launch_command:514 - ['accelerate', 'launch', '--num_machines', '1', '--num_processes', '1', '--mixed_precision', 'bf16', '-m', 'autotrain.trainers.clm', '--tra

In [14]:
!autotrain llm --help

usage: autotrain <command> [<args>] llm [-h] [--train] [--deploy] [--inference]
                                        [--backend BACKEND] [--model MODEL]
                                        [--project-name PROJECT_NAME] [--data-path DATA_PATH]
                                        [--train-split TRAIN_SPLIT] [--valid-split VALID_SPLIT]
                                        [--add-eos-token] [--model-max-length MODEL_MAX_LENGTH]
                                        [--padding PADDING] [--trainer TRAINER]
                                        [--use-flash-attention-2] [--log LOG]
                                        [--disable-gradient-checkpointing]
                                        [--logging-steps LOGGING_STEPS]
                                        [--eval-strategy EVAL_STRATEGY]
                                        [--save-total-limit SAVE_TOTAL_LIMIT]
                                        [--auto-find-batch-size]
                                      

## Step 5: Completed 🎉
After the command above completes, your model will be uploaded to the Hugging Face Hub.

#### Learn more about AutoTrain (optional)
To see which command-line flags are available, run the help command:

In [None]:
!autotrain llm -h

## Step 6: Inference Engine

In [None]:
!pip install -q peft accelerate bitsandbytes safetensors

In [None]:
import torch
import transformers
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# LoRA adapter repository on the Hugging Face Hub (swap in the repository you pushed during training)
adapters_name = "ashishpatel26/mistral-7b-mj-finetuned"
# Base model the adapter is applied to (alternative: "mistralai/Mistral-7B-Instruct-v0.1")
model_name = "bn22/Mistral-7B-Instruct-v0.1-sharded"

device = "cuda"  # the device to load the model onto

In [None]:
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
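
To see why 4-bit loading matters on a 16 GB card such as the T4, the sketch below compares the approximate memory needed for the weights alone in BF16 and in 4-bit. The parameter count of about 7.25 billion is an assumption for Mistral 7B, and the estimate ignores quantization constants, activations, and the KV cache, so real usage is higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GiB (1 GiB = 1024**3 bytes)."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 7.25e9  # assumed approximate parameter count for Mistral 7B

bf16_gib = weight_memory_gib(NUM_PARAMS, 16)
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)

print(f"BF16 weights : {bf16_gib:.1f} GiB")
print(f"4-bit weights: {nf4_gib:.1f} GiB")
```

Under these assumptions the BF16 weights alone come close to filling a 16 GB GPU, while the 4-bit weights take roughly a quarter of that, leaving room for the adapter and generation buffers.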

In [None]:
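model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    # 4-bit loading is requested through bnb_config; passing load_in_4bit=True
    # here as well conflicts with quantization_config in recent transformers releases
    quantization_config=bnb_config,
    device_map='auto'
)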

Loading checkpoint shards:   0%|          | 0/11 [00:00<?, ?it/s]

## Step 7: Load the PEFT Adapter from the Uploaded Model

In [None]:
model = PeftModel.from_pretrained(model, adapters_name)

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.bos_token_id = 1

stop_token_ids = [0]

print(f"Successfully loaded the model {model_name} into memory")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Successfully loaded the model bn22/Mistral-7B-Instruct-v0.1-sharded into memory


In [None]:
text = "[INST] generate a midjourney prompt for A person walks in the rain [/INST]"

# Tokenize the prompt and move the tensors to the GPU, where device_map='auto' placed the model
encoded = tokenizer(text, return_tensors="pt", add_special_tokens=False)
model_input = encoded.to(device)
# Sample up to 200 new tokens and decode them back to text
generated_ids = model.generate(**model_input, max_new_tokens=200, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


[INST] generate a midjourney prompt for A person walks in the rain [/INST] "As you wander through the pouring rain, you can't help but wonder what the world would be like if things were different. What if the rain was a symbol of the turmoil in your life, and the sunshine promised a brighter future? What if you suddenly found yourself lost in a small town where time stood still, and the people were trapped in a time loop? As you struggle to find your way back to reality, you discover a mysterious stranger who seems to hold the key to unlocking the secrets of the town and your own past."</s>
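
The decoded output above still contains the prompt and the closing `</s>` token. A minimal sketch for wrapping an instruction in the `[INST] ... [/INST]` format and then trimming the decoded string down to the generated text is shown below; `build_inst_prompt` and `extract_completion` are hypothetical helpers, and the sample string stands in for real model output.

```python
def build_inst_prompt(instruction: str) -> str:
    """Wrap an instruction in the [INST] ... [/INST] markers used in the cell above."""
    return f"[INST] {instruction.strip()} [/INST]"


def extract_completion(decoded: str, prompt: str, eos: str = "</s>") -> str:
    """Remove the echoed prompt and the trailing end-of-sequence token."""
    text = decoded.strip()
    if text.startswith(prompt):
        text = text[len(prompt):]
    text = text.strip()
    if text.endswith(eos):
        text = text[: -len(eos)]
    return text.strip()


prompt = build_inst_prompt("generate a midjourney prompt for A person walks in the rain")

# Stand-in for decoded[0]; a real run would pass the string returned by batch_decode.
sample_decoded = prompt + ' "A lone figure under a glowing umbrella."</s>'
print(extract_completion(sample_decoded, prompt))
```

Passing `skip_special_tokens=True` to `tokenizer.batch_decode` is another way to drop `</s>`, though the prompt text would still need to be trimmed.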
