Breakdown of Each Library

1️⃣ accelerate – Hugging Face's library for running training on multiple GPUs, TPUs, and distributed setups with minimal code changes.

It is a must-have if you use FSDP or DeepSpeed.

2️⃣ peft (Parameter-Efficient Fine-Tuning) –

Used for low-memory tuning methods such as LoRA, QLoRA, and adapters.
It lets you do lightweight, efficient tuning instead of full fine-tuning.

3️⃣ bitsandbytes –

Supports 8-bit and 4-bit quantization.
It saves a lot of VRAM during QLoRA fine-tuning.

4️⃣ git+https://github.com/huggingface/transformers –

Installs the latest GitHub version of Hugging Face transformers.
This is needed when you want new models or features that are not yet in the PyPI release.

5️⃣ trl (Transformer Reinforcement Learning) –

Used for RLHF (Reinforcement Learning from Human Feedback).
If you want to build ChatGPT-like models, `trl` is the library to use. It also provides the `SFTTrainer` used later in this notebook.

6️⃣ py7zr –

Used to extract compressed files in the 7z format.
Useful when a dataset from Hugging Face or another source ships as a 7z archive.

7️⃣ auto-gptq –

Used for GPTQ-based quantization.
It speeds up inference and improves VRAM efficiency.

8️⃣ optimum –

Hugging Face's library that provides hardware-specific optimizations for backends such as ONNX Runtime, TensorRT, Habana Gaudi, and AWS Neuron.

Useful for accelerated inference and optimized training.

🔥 Summary

If you are fine-tuning on low-VRAM GPUs (24 GB or less), the bitsandbytes + peft (QLoRA) combination is a good choice.

If you are training on a multi-GPU/TPU cluster, accelerate is the key library, and optimum helps on specialized hardware.

If you want to fine-tune with RLHF (as in ChatGPT), the TRL package is the one to use.
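After the install cell has run, it can be handy to confirm which versions actually landed in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for this example, and the package list simply mirrors the names passed to pip in this notebook.

```python
from importlib import metadata

# Distribution names as passed to pip in the install cell.
PACKAGES = ["accelerate", "peft", "bitsandbytes", "transformers",
            "trl", "py7zr", "auto-gptq", "optimum"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name:15s} {version or 'not installed'}")
```

A `not installed` entry usually means the install cell failed for that package or the runtime was restarted without reinstalling.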

In [None]:
!pip install -q accelerate peft bitsandbytes git+https://github.com/huggingface/transformers trl py7zr auto-gptq optimum


 Breakdown
1️⃣ from huggingface_hub import notebook_login

This imports the notebook_login function, which is used for authentication inside Jupyter Notebooks or Google Colab.

2️⃣ notebook_login()

This prompts you to enter your Hugging Face access token.
You can create one on the Access Tokens page of your Hugging Face account settings.

🔥 Why is this important?

If you are downloading a private model or dataset, authentication is required.

If you want to upload your fine-tuned model back to Hugging Face, you need to log in first.

💡 Alternative for Script-Based Login

If you are running a script (not in a notebook), use:

from huggingface_hub import login

login(token="your_huggingface_token")
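Hard-coding a token in a script is easy to leak through version control, so one common variation is to read it from an environment variable instead. The sketch below assumes the token is stored in a variable named `HF_TOKEN`; the helper names `read_hf_token` and `login_from_env` are hypothetical and only `login(token=...)` comes from `huggingface_hub`.

```python
import os


def read_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token stored in the given environment variable."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token


def login_from_env(env_var: str = "HF_TOKEN") -> None:
    """Log in to the Hugging Face Hub with a token taken from the environment."""
    # Imported lazily so read_hf_token works even without huggingface_hub installed.
    from huggingface_hub import login

    login(token=read_hf_token(env_var))
```

In a script you would call `login_from_env()` once at startup, before any download or upload.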


In [None]:
from huggingface_hub import notebook_login
notebook_login()


### Breakdown of Each Import

1️⃣ import torch

- PyTorch is the core deep-learning library used for training models.
- It helps in tensor operations, GPU acceleration, and model training.

2️⃣ from datasets import load_dataset, Dataset

- `load_dataset`: Loads a dataset from the Hugging Face Hub or from local files.
- `Dataset`: Builds a dataset from in-memory Python objects (such as a dictionary of lists or a pandas DataFrame).

3️⃣ from peft import LoraConfig, AutoPeftModelForCausalLM, prepare_model_for_kbit_training, get_peft_model

- `LoraConfig`: Configuration for LoRA (Low-Rank Adaptation), which trains small low-rank matrices instead of the full weights and so makes fine-tuning more memory-efficient.

- `AutoPeftModelForCausalLM`: Loads a saved PEFT adapter together with its base causal language model.

- `prepare_model_for_kbit_training`: Prepares a quantized (8-bit/4-bit) model for training, for example by freezing the base weights and applying stability tweaks before adapters are added.

- `get_peft_model`: Wraps a base model with the adapters described by a PEFT config (here LoRA), so that only the adapter weights are trained.

4️⃣ from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig, TrainingArguments

- `AutoModelForCausalLM`: Loads a pre-trained causal language model (such as LLaMA or Mistral).

- `AutoTokenizer`: Loads the tokenizer that matches a model and converts text into token IDs.

- `GPTQConfig`: Configures GPTQ, a post-training quantization method, when loading or quantizing a model.

- `TrainingArguments`: Defines training settings such as epochs, batch size, optimizer, and learning rate.

5️⃣ from trl import SFTTrainer

- `SFTTrainer`: Trainer from the trl library for Supervised Fine-Tuning (SFT).

- It integrates with PEFT, which simplifies LoRA-based fine-tuning within the Hugging Face ecosystem.

6️⃣ import os

- Used for handling file paths and system settings, like saving models, loading datasets, etc.

🔥 What is this setup used for?

✅ Fine-tuning large language models (LLMs) efficiently using LoRA and QLoRA.

✅ Using Hugging Face datasets and models.

✅ Training a model with low-bit precision (4-bit/8-bit) for better memory efficiency.

In [None]:
import torch
from datasets import load_dataset, Dataset
from peft import LoraConfig, AutoPeftModelForCausalLM, prepare_model_for_kbit_training, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig, TrainingArguments
from trl import SFTTrainer
import os

#### Understanding the Code: Loading and Preparing the Dataset for Fine-Tuning
- This code loads a dataset, processes it, and converts it into a format suitable for fine-tuning an LLM (such as Mistral or LLaMA-2) on text summarization.
- Let's break it down step by step.

________________________________________________________________________________
1️⃣ Loading the Dataset

- data = load_dataset("samsum", split="train")

🔹 load_dataset("samsum", split="train") loads the Samsum dataset, which contains dialogues and their summaries.

🔹 split="train" ensures that we load only the training set.

-----------------------------------------
✅ Samsum Dataset Overview

- `dialogue`: A conversation between people.

- `summary`: A short summary of that conversation.
-----------------------------------------
🔍 Example from the dataset:

`dialogue	summary`

- Alice: Hey, how are you? Bob: I'm good, you?	Alice and Bob greet each other.
-------------------------------------------------------------------------------
2️⃣ Converting Dataset to Pandas DataFrame

- data_df = data.to_pandas()

🔹 This converts the dataset into a Pandas DataFrame for easier processing.
-------------------------------------------------------------------------------
3️⃣ Formatting Data for LLM Fine-Tuning

- `data_df["text"] = data_df[["dialogue", "summary"]].apply(lambda x: "###Human: Summarize this following dialogue: " + x["dialogue"] + "\n###Assistant: " + x["summary"], axis=1)`


🔹 Purpose: It formats each row into a `###Human:` / `###Assistant:` prompt-response template for fine-tuning LLaMA or Mistral. This is a plain-text instruction template, not the ChatML format.

💡 How It Works:

- It takes the dialogue and summary columns.

- It transforms them into a prompt-response format for LLM training.

🔍 `Example Output:`

###Human: Summarize this following dialogue:  

Alice: Hey, how are you?

Bob: I'm good, you?  

###Assistant: Alice and Bob greet each other.

🔹 This format mimics human-AI interactions, making it suitable for instruction-tuned models like Mistral or LLaMA-2-Chat.
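The formatting step can also be written as a small named function, which makes the template easier to test and reuse at inference time. This is a minimal sketch: the prefix strings are copied from the notebook, while the names `build_text` and `add_text` and the sample row are made up for illustration.

```python
PROMPT_PREFIX = "###Human: Summarize this following dialogue: "
RESPONSE_PREFIX = "\n###Assistant: "


def build_text(dialogue: str, summary: str) -> str:
    """Join one dialogue and its summary into a single training string."""
    return PROMPT_PREFIX + dialogue + RESPONSE_PREFIX + summary


def add_text(row: dict) -> dict:
    """Return the new 'text' column for one row with 'dialogue' and 'summary' keys."""
    return {"text": build_text(row["dialogue"], row["summary"])}


# Illustrative row, not taken from the dataset.
example = {
    "dialogue": "Alice: Hey, how are you?\nBob: I'm good, you?",
    "summary": "Alice and Bob greet each other.",
}
print(add_text(example)["text"])
```

With a Hugging Face `Dataset`, calling `data.map(add_text)` should add the same `text` column without the round trip through pandas.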
_______________________________________________________________________________
4️⃣ Checking the First Example

- print(data_df.iloc[0])

🔹 data_df.iloc[0] prints the first row of the dataset after formatting.
________________________________________________________________________________
5️⃣ Converting Back to Hugging Face Dataset

- data = Dataset.from_pandas(data_df)
_______________________________________________________________________________
🔹 Why? Since fine-tuning with 🤗 Transformers & PEFT requires a Hugging Face dataset, we convert it back after processing.
________________________________________________________________________________
🚀 Summary

✅ Loads the Samsum dataset (dialogue → summary).

✅ Formats it into a prompt-response structure for LLM fine-tuning.

✅ Converts it back into a Hugging Face Dataset for training.


In [None]:
data = load_dataset("samsum", split="train")

README.md:   0%|          | 0.00/7.04k [00:00<?, ?B/s]

samsum.py:   0%|          | 0.00/3.36k [00:00<?, ?B/s]

The repository for samsum contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/samsum.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y


corpus.7z:   0%|          | 0.00/2.94M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/14732 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/819 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/818 [00:00<?, ? examples/s]

In [None]:
data_df = data.to_pandas()

In [None]:
data_df

Unnamed: 0,id,dialogue,summary
0,13818513,Amanda: I baked cookies. Do you want some?\r\...,Amanda baked cookies and will bring Jerry some...
1,13728867,Olivia: Who are you voting for in this electio...,Olivia and Olivier are voting for liberals in ...
2,13681000,"Tim: Hi, what's up?\r\nKim: Bad mood tbh, I wa...",Kim may try the pomodoro technique recommended...
3,13730747,"Edward: Rachel, I think I'm in ove with Bella....",Edward thinks he is in love with Bella. Rachel...
4,13728094,Sam: hey overheard rick say something\r\nSam:...,"Sam is confused, because he overheard Rick com..."
...,...,...,...
14727,13863028,Romeo: You are on my ‘People you may know’ lis...,Romeo is trying to get Greta to add him to her...
14728,13828570,Theresa: <file_photo>\r\nTheresa: <file_photo>...,Theresa is at work. She gets free food and fre...
14729,13819050,John: Every day some bad news. Japan will hunt...,Japan is going to hunt whales again. Island an...
14730,13828395,Jennifer: Dear Celia! How are you doing?\r\nJe...,Celia couldn't make it to the afternoon with t...


In [None]:
data_df.columns

Index(['id', 'dialogue', 'summary'], dtype='object')

In [None]:
data_df["summary"][0]

'Amanda baked cookies and will bring Jerry some tomorrow.'

In [None]:
data_df["summary"][1]

'Olivia and Olivier are voting for liberals in this election. '

In [None]:
data_df["summary"][2]

'Kim may try the pomodoro technique recommended by Tim to get more stuff done.'

In [None]:
data_df["summary"][3]

'Edward thinks he is in love with Bella. Rachel wants Edward to open his door. Rachel is outside. '

### Adding the Text Column

In [None]:
data_df["text"] = data_df[["dialogue", "summary"]].apply(lambda x: "###Human: Summarize this following dialogue: " + x["dialogue"] + "\n###Assistant: " +x["summary"], axis=1)

In [None]:
data_df

Unnamed: 0,id,dialogue,summary,text
0,13818513,Amanda: I baked cookies. Do you want some?\r\...,Amanda baked cookies and will bring Jerry some...,###Human: Summarize this following dialogue: A...
1,13728867,Olivia: Who are you voting for in this electio...,Olivia and Olivier are voting for liberals in ...,###Human: Summarize this following dialogue: O...
2,13681000,"Tim: Hi, what's up?\r\nKim: Bad mood tbh, I wa...",Kim may try the pomodoro technique recommended...,###Human: Summarize this following dialogue: T...
3,13730747,"Edward: Rachel, I think I'm in ove with Bella....",Edward thinks he is in love with Bella. Rachel...,###Human: Summarize this following dialogue: E...
4,13728094,Sam: hey overheard rick say something\r\nSam:...,"Sam is confused, because he overheard Rick com...",###Human: Summarize this following dialogue: S...
...,...,...,...,...
14727,13863028,Romeo: You are on my ‘People you may know’ lis...,Romeo is trying to get Greta to add him to her...,###Human: Summarize this following dialogue: R...
14728,13828570,Theresa: <file_photo>\r\nTheresa: <file_photo>...,Theresa is at work. She gets free food and fre...,###Human: Summarize this following dialogue: T...
14729,13819050,John: Every day some bad news. Japan will hunt...,Japan is going to hunt whales again. Island an...,###Human: Summarize this following dialogue: J...
14730,13828395,Jennifer: Dear Celia! How are you doing?\r\nJe...,Celia couldn't make it to the afternoon with t...,###Human: Summarize this following dialogue: J...


In [None]:
data_df["text"]

Unnamed: 0,text
0,###Human: Summarize this following dialogue: A...
1,###Human: Summarize this following dialogue: O...
2,###Human: Summarize this following dialogue: T...
3,###Human: Summarize this following dialogue: E...
4,###Human: Summarize this following dialogue: S...
...,...
14727,###Human: Summarize this following dialogue: R...
14728,###Human: Summarize this following dialogue: T...
14729,###Human: Summarize this following dialogue: J...
14730,###Human: Summarize this following dialogue: J...


In [None]:
data_df["dialogue"][0]

"Amanda: I baked  cookies. Do you want some?\r\nJerry: Sure!\r\nAmanda: I'll bring you tomorrow :-)"

In [None]:
data_df["summary"][0]

'Amanda baked cookies and will bring Jerry some tomorrow.'

In [None]:
data_df["text"][0]

"###Human: Summarize this following dialogue: Amanda: I baked  cookies. Do you want some?\r\nJerry: Sure!\r\nAmanda: I'll bring you tomorrow :-)\n###Assistant: Amanda baked cookies and will bring Jerry some tomorrow."


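The first `text` entry, shown with its line breaks rendered:

###Human: Summarize this following dialogue: Amanda: I baked  cookies. Do you want some?

Jerry: Sure!

Amanda: I'll bring you tomorrow :-)

###Assistant: Amanda baked cookies and will bring Jerry some tomorrow.


### Text Column
- The `text` column holds the full training string: the prompt with the dialogue, followed by the summary.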

In [None]:
print(data_df.iloc[0])

id                                                   13818513
dialogue    Amanda: I baked  cookies. Do you want some?\r\...
summary     Amanda baked cookies and will bring Jerry some...
text        ###Human: Summarize this following dialogue: A...
Name: 0, dtype: object


In [None]:
data_df.iloc[0]["text"]  # access by column label; positional [3] on a Series is deprecated in recent pandas



"###Human: Summarize this following dialogue: Amanda: I baked  cookies. Do you want some?\r\nJerry: Sure!\r\nAmanda: I'll bring you tomorrow :-)\n###Assistant: Amanda baked cookies and will bring Jerry some tomorrow."

In [None]:
data = Dataset.from_pandas(data_df)

In [None]:
data

Dataset({
    features: ['id', 'dialogue', 'summary', 'text'],
    num_rows: 14732
})

In [None]:
tokenizer = AutoTokenizer.from_pretrained("TheBloke/Mistral-7B-Instruct-v0.1-GPTQ")

tokenizer_config.json:   0%|          | 0.00/1.46k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/72.0 [00:00<?, ?B/s]

In [None]:
tokenizer.eos_token

'</s>'

In [None]:
tokenizer.eos_token_id

2

In [None]:
tokenizer.pad_token  # by default, the Mistral tokenizer has no pad token

In [None]:
tokenizer.pad_token = tokenizer.eos_token
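Reusing the EOS token as the pad token means shorter sequences in a batch are filled with token ID 2, while the attention mask tells the model to ignore those positions. The toy sketch below shows the idea in plain Python; the function `pad_batch` and the token IDs in the example are hypothetical, and the real tokenizer does this internally (possibly padding on the other side, depending on its `padding_side` setting).

```python
EOS_TOKEN_ID = 2  # value of tokenizer.eos_token_id shown above
PAD_TOKEN_ID = EOS_TOKEN_ID  # same effect as pad_token = eos_token


def pad_batch(sequences, pad_id=PAD_TOKEN_ID):
    """Right-pad token-ID lists to equal length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Hypothetical token IDs, for illustration only.
ids, mask = pad_batch([[1, 733, 16289], [1, 733]])
print(ids)   # [[1, 733, 16289], [1, 733, 2]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```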

In [None]:
# `disable_exllama` is deprecated (removal announced for version 4.37); `use_exllama=False` is the replacement
quantization_config_loading = GPTQConfig(bits=4, use_exllama=False, tokenizer=tokenizer)



In [None]:
model = AutoModelForCausalLM.from_pretrained(
                          "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ",
                          quantization_config=quantization_config_loading,
                          device_map="auto")

config.json:   0%|          | 0.00/963 [00:00<?, ?B/s]



model.safetensors:   0%|          | 0.00/4.16G [00:00<?, ?B/s]

`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
Some weights of the model checkpoint at TheBloke/Mistral-7B-Instruct-v0.1-GPTQ were not used when initializing MistralForCausalLM: ['model.layers.0.mlp.down_proj.bias', 'model.layers.0.mlp.gate_proj.bias', 'model.layers.0.mlp.up_proj.bias', 'model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.o_proj.bias', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.1.mlp.down_proj.bias', 'model.layers.1.mlp.gate_proj.bias', 'model.layers.1.mlp.up_proj.bias', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.1.self_attn.o_proj.bias', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.10.mlp.down_proj.bias', 'model.layers.10.mlp.gate_proj.bias', 'model.layers.10.mlp.up_proj.bias', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.10.self_attn.o_proj.bias', 'model.layers.10.self_attn.q_pr

generation_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

In [None]:
print(model)

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (k_proj): QuantLinear()
          (o_proj): QuantLinear()
          (q_proj): QuantLinear()
          (v_proj): QuantLinear()
        )
        (mlp): MistralMLP(
          (act_fn): SiLU()
          (down_proj): QuantLinear()
          (gate_proj): QuantLinear()
          (up_proj): QuantLinear()
        )
        (input_layernorm): MistralRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): MistralRMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): MistralRMSNorm((4096,), eps=1e-05)
    (rotary_emb): MistralRotaryEmbedding()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)


In [None]:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig # Importing BitsAndBytesConfig


In [None]:
# Load a 4-bit quantized model
quantization_config = BitsAndBytesConfig(
    load_in_4bit = True,    # Enable 4-bit quantization
    bnb_4bit_compute_dtype = torch.float16,  # Use fp16 for computation
    bnb_4bit_use_double_quant = True,  # Use double quantization for memory efficiency
)

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    quantization_config=quantization_config,
    device_map = "auto" # Automatically assigns layers to available GPUs
    # token="YOUR_HUGGINGFACE_TOKEN" # Add your token here
)

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/4.54G [00:00<?, ?B/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.94G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

In [None]:
model.config.use_cache=False

In [None]:
model.config.pretraining_tp=1

In [None]:
model.gradient_checkpointing_enable()

In [None]:
model = prepare_model_for_kbit_training(model)

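✅ r=16 sets the rank of the LoRA update matrices (higher = more trainable parameters and more expressive power).

✅ lora_alpha=16 scales the LoRA update; the effective scaling factor is lora_alpha / r.

✅ lora_dropout=0.05 applies dropout inside the LoRA layers to reduce overfitting (useful for small datasets).

✅ target_modules=["q_proj", "v_proj"] adds LoRA only to the attention query and value projections, which keeps the number of trainable parameters small.

✅ This kind of setup works well for fine-tuning LLaMA, Mistral, or Falcon on low-VRAM GPUs.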
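To get a feel for what these settings cost, the number of trainable LoRA parameters can be worked out by hand. The sketch below assumes the standard LoRA layout (a matrix A of shape r × in_features and a matrix B of shape out_features × r per targeted layer) and takes the layer shapes from the `Linear4bit` entries in the Mistral model printed in this notebook; the helper name `lora_param_count` is made up.

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Parameters added by one LoRA pair: A is (r x in), B is (out x r)."""
    return r * in_features + out_features * r


R = 16
LORA_ALPHA = 16
NUM_LAYERS = 32  # decoder layers in the printed model

# (in_features, out_features) of the targeted projections in one layer
TARGETS = {"q_proj": (4096, 4096), "v_proj": (4096, 1024)}

per_layer = sum(lora_param_count(i, o, R) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS

print(per_layer)       # 212992
print(total)           # 6815744
print(LORA_ALPHA / R)  # 1.0
```

About 6.8 million trainable parameters is a small fraction of a 7B model. At 4 bytes per parameter that is roughly 27.3 MB, which appears consistent with the size of the `adapter_model.safetensors` file uploaded later in this notebook.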



In [None]:
print(model)

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): MistralRMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): Mist

In [None]:
# ["q_proj", "v_proj", "k_proj"] → also adapts the key projection (more expressive)
# ["q_proj", "v_proj", "o_proj"] → also adapts the attention output projection (named o_proj in Mistral)
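Target names have to match the module names of the actual model, and the printed Mistral model lists `q_proj`, `k_proj`, `v_proj`, and `o_proj` in its attention block. The sketch below checks a candidate list against a handful of module names, assuming that list entries are matched against the last component of each module name (an assumption about the matching rule, not PEFT's own code). The helper `matching_modules` is hypothetical; with a real model the names would come from `model.named_modules()`.

```python
def matching_modules(target_modules, module_names):
    """Return, per target, the module names whose last component equals it."""
    return {
        target: [n for n in module_names if n == target or n.endswith("." + target)]
        for target in target_modules
    }


# A few module names from the first decoder layer, following the printed model.
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
]

found = matching_modules(["q_proj", "v_proj", "out_proj"], names)
missing = [t for t, hits in found.items() if not hits]
print(missing)  # ['out_proj']
```

A name that appears in `missing` would not attach an adapter to any layer of this model, so it is worth running a check like this before training.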


### LORA

In [None]:
# All arguments below are LoRA hyperparameters
peft_config = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.05, bias="none",
    task_type="CAUSAL_LM", target_modules=["q_proj", "v_proj"],
)

In [None]:
model = get_peft_model(model, peft_config)

In [None]:
print(model)

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): MistralForCausalLM(
      (model): MistralModel(
        (embed_tokens): Embedding(32000, 4096)
        (layers): ModuleList(
          (0-31): 32 x MistralDecoderLayer(
            (self_attn): MistralAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
              )
              (k_pro

### Training


In [None]:
training_arguments = TrainingArguments(
        output_dir="mistral-finetuned-samsum",  # Directory where model checkpoints will be saved
        per_device_train_batch_size=8,  # Number of samples per batch for each device (GPU/TPU)
        gradient_accumulation_steps=1,  # Number of batches whose gradients are accumulated before each optimizer step
        optim="paged_adamw_32bit",  # Paged AdamW optimizer with 32-bit optimizer states
        learning_rate=2e-4,  # Initial learning rate for the optimizer
        lr_scheduler_type="cosine",  # Learning rate scheduler; cosine decay
        save_strategy="epoch",  # Save model checkpoints at the end of each epoch
        logging_steps=100,  # Log training metrics every 100 steps
        num_train_epochs=1,  # Number of full passes over the dataset (ignored when max_steps is positive)
        max_steps=250,  # Total number of training steps; a positive value overrides num_train_epochs
        fp16=True,  # Enable 16-bit floating point (mixed precision) training for faster computation
        # push_to_hub=True,  # Upload model checkpoints to the Hugging Face Model Hub
        report_to="none"  # Disable logging to external monitoring tools like WandB or TensorBoard
)
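Because `max_steps` caps the run, it helps to work out how much of the dataset these settings actually cover. The sketch below is plain arithmetic under the assumption of a single GPU; the helper name `training_budget` is made up, and the steps-per-epoch figure is approximate since it ignores details such as dropping the last partial batch.

```python
import math


def training_budget(num_rows, per_device_batch, grad_accum, max_steps, num_devices=1):
    """Summarize how much data a step-limited run actually sees."""
    effective_batch = per_device_batch * grad_accum * num_devices
    samples_seen = effective_batch * max_steps
    steps_per_epoch = math.ceil(num_rows / effective_batch)
    return {
        "effective_batch": effective_batch,
        "samples_seen": samples_seen,
        "steps_per_epoch": steps_per_epoch,
        "epochs_covered": samples_seen / num_rows,
    }


# Values from this notebook: 14732 rows, batch size 8, no accumulation, 250 steps.
budget = training_budget(num_rows=14732, per_device_batch=8, grad_accum=1, max_steps=250)
print(budget["effective_batch"])           # 8
print(budget["samples_seen"])              # 2000
print(budget["steps_per_epoch"])           # 1842
print(round(budget["epochs_covered"], 3))  # 0.136
```

So 250 steps cover roughly 14% of one epoch. Note that a per-device batch size of 1 with 8 accumulation steps gives the same effective batch size of 8 while holding fewer activations in memory at once, a common adjustment when GPU memory is tight.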


In [None]:
# Create the SFTTrainer
trainer = SFTTrainer(
        model=model,
        train_dataset=data,
        peft_config=peft_config,
        args=training_arguments,
        # tokenizer=tokenizer,
        #  model.tokenizer = tokenizer,
    )
trainer.model.config.tokenizer = tokenizer

Converting train dataset to ChatML:   0%|          | 0/14732 [00:00<?, ? examples/s]

Applying chat template to train dataset:   0%|          | 0/14732 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/14732 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/14732 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


In [None]:
trainer.train()



OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU 0 has a total capacity of 14.74 GiB of which 36.12 MiB is free. Process 11730 has 14.70 GiB memory in use. Of the allocated memory 13.74 GiB is allocated by PyTorch, and 848.54 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

In [None]:
! cp -r /content/mistral-finetuned-samsum /content/drive/MyDrive/

In [None]:
trainer.push_to_hub()

adapter_model.safetensors:   0%|          | 0.00/27.3M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.62k [00:00<?, ?B/s]

Upload 3 LFS files:   0%|          | 0/3 [00:00<?, ?it/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/Mohan-DS-1321/mistral-finetuned-samsum/commit/82ed4be69859269c6156600f9e958301b9f17eac', commit_message='End of training', commit_description='', oid='82ed4be69859269c6156600f9e958301b9f17eac', pr_url=None, repo_url=RepoUrl('https://huggingface.co/Mohan-DS-1321/mistral-finetuned-samsum', endpoint='https://huggingface.co', repo_type='model', repo_id='Mohan-DS-1321/mistral-finetuned-samsum'), pr_revision=None, pr_num=None)

In [None]:
from peft import AutoPeftModelForCausalLM
from transformers import GenerationConfig
from transformers import AutoTokenizer
import torch


tokenizer = AutoTokenizer.from_pretrained("/content/mistral-finetuned-samsum")

- ###Human: Summarize this following dialogue: Sunny: I'm at the railway station in Chennai Karthik: No problems so far? Sunny: no, everything's going smoothly Karthik: good. lets meet there soon!


In [None]:
inputs = tokenizer("""
###Human: Summarize this following dialogue: Sunny: I'm at the railway station in Chennai Karthik: No problems so far? Sunny: no, everything's going smoothly Karthik: good. lets meet there soon!
###Assistant: """, return_tensors="pt").to("cuda")


In [None]:
model = AutoPeftModelForCausalLM.from_pretrained(
    "/content/mistral-finetuned-samsum",
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map="cuda")

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

OutOfMemoryError: CUDA out of memory. Tried to allocate 250.00 MiB. GPU 0 has a total capacity of 14.74 GiB of which 48.12 MiB is free. Process 11730 has 14.69 GiB memory in use. Of the allocated memory 13.63 GiB is allocated by PyTorch, and 948.95 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

In [None]:
generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,
    temperature=0.1,
    max_new_tokens=25,
    pad_token_id=tokenizer.eos_token_id
)
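One detail of this configuration is worth spelling out: with `top_k=1`, only the single most likely token survives filtering, so even with `do_sample=True` the output behaves like greedy decoding and the temperature has no visible effect. The toy sampler below illustrates this in pure Python; it is a simplified sketch with made-up logits, not the transformers implementation, and `sample_top_k` is a hypothetical helper.

```python
import math
import random


def sample_top_k(logits, top_k, temperature, rng):
    """Sample a token index after top-k filtering and temperature scaling."""
    # Keep only the indices of the top_k largest logits.
    kept = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in kept]
    # Softmax weights over the kept logits (shifted by the max for numerical stability).
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(kept, weights=weights, k=1)[0]


logits = [1.2, 3.4, 0.5, 3.1]  # made-up scores for a 4-token vocabulary
rng = random.Random(0)

picks = {sample_top_k(logits, top_k=1, temperature=0.1, rng=rng) for _ in range(100)}
print(picks)  # {1}
```

Raising `top_k` (and the temperature) is what would introduce actual variety between runs.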

In [None]:
import time
st_time = time.time()
outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
print(time.time()-st_time)



OutOfMemoryError: CUDA out of memory. Tried to allocate 224.00 MiB. GPU 0 has a total capacity of 14.74 GiB of which 44.12 MiB is free. Process 11730 has 14.70 GiB memory in use. Of the allocated memory 13.75 GiB is allocated by PyTorch, and 833.82 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)