# Supervised Fine-Tuning

- Once the base (foundation) model has been trained on the text-completion task, the next stage is `Post-Training`.
- This is the stage in which the model learns to follow instructions and hold a conversation.
- Post-training consists of two steps:
    - Supervised Fine-Tuning (SFT): training the model on curated, instruction-based data.
    - Preference Fine-Tuning (PFT), e.g. RLHF, DPO, PPO: training the model to align better with human preferences.
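
As a rough illustration of what curated, instruction-based data looks like when it reaches a trainer, the sketch below joins one prompt/response pair into a single training string and records where the response begins, so that the loss can be restricted to the assistant's answer. The `### Human:` / `### Assistant:` template and the name `format_example` are assumptions made for this example; the `training.py` script used in this notebook may format its data differently, and real trainers mask by token rather than by character.

```python
# Hypothetical prompt template for illustration; not taken from training.py.
PROMPT_TEMPLATE = "### Human: {prompt}\n### Assistant: "


def format_example(prompt: str, response: str) -> tuple[str, int]:
    """Return the full training text and the character index where the response starts."""
    prefix = PROMPT_TEMPLATE.format(prompt=prompt)
    return prefix + response, len(prefix)


text, response_start = format_example(
    "What is supervised fine-tuning?",
    "It is training on labeled prompt/response pairs.",
)

# Prompt part: usually excluded from the loss.
print(text[:response_start])
# Response part: the supervised target the model learns to produce.
print(text[response_start:])
```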

There are three common approaches to SFT:
- Full fine-tuning
- LoRA
- QLoRA
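
To make the difference between these approaches concrete, here is a small arithmetic sketch comparing the number of trainable parameters for one weight matrix under full fine-tuning and under LoRA. LoRA freezes the original `d_out x d_in` matrix and trains two low-rank factors, `B` (`d_out x r`) and `A` (`r x d_in`). QLoRA trains the same adapters but keeps the frozen base weights in 4-bit precision. The layer width and rank below are illustrative assumptions, not values read from this notebook's configuration.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for LoRA: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


d = 2560   # illustrative layer width (assumption)
r = 16     # illustrative LoRA rank (assumption)

full = full_params(d, d)
lora = lora_params(d, d, r)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (r={r}):     {lora:,} trainable parameters")
print(f"ratio:            {100 * lora / full:.2f}%")
```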

In this notebook, QLoRA is used to perform SFT on the phi-2 model with the OpenAssistant dataset.
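
One reason to choose QLoRA here is memory. The back-of-the-envelope sketch below estimates the storage needed for the model weights alone at different precisions, taking Phi-2 to have roughly 2.7 billion parameters. It deliberately ignores quantization constants, activations, gradients, and optimizer state, so the figures are lower bounds rather than measured GPU usage.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PHI2_PARAMS = 2.7e9  # approximate parameter count of Phi-2

for bits in (32, 16, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(PHI2_PARAMS, bits):.2f} GB")
```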

---

## Install Dependencies

In [1]:
!pip install -q torch transformers accelerate bitsandbytes datasets peft trl pytorch-lightning tensorboard einops


## Model Training
Train the Phi-2 base model on the [OpenAssistant](https://huggingface.co/datasets/OpenAssistant/oasst1?row=0) dataset using QLoRA.

In [2]:
!python training.py --max_epochs 1 --no_validation

2025-03-07 07:08:43.033975: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1741331323.295344     797 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1741331323.366067     797 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-07 07:08:43.936817: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Setting batch size to 1 to save memory
Set gradient accumulation to 16
Set dataloader workers to 0 (main process only
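
The log above reports a batch size of 1 with gradient accumulation set to 16, which gives an effective batch size of 16. The toy sketch below shows why this works: for a mean loss, summing the gradients of 16 single-example micro-batches, each scaled by 1/16, gives the same gradient as one batch of 16. The one-parameter linear model and the data are made up for illustration and have nothing to do with how `training.py` is implemented.

```python
def grad_mse(w: float, xs: list[float], ys: list[float]) -> float:
    """Gradient of the mean squared error for the model y_hat = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


# Toy data: 16 examples of y = 3x (illustrative only).
xs = [float(i) for i in range(1, 17)]
ys = [3.0 * x for x in xs]
w = 0.5
accum_steps = 16

# Gradient computed on the full batch of 16 examples at once.
full_batch_grad = grad_mse(w, xs, ys)

# Same gradient built from 16 micro-batches of size 1.
accumulated = 0.0
for x, y in zip(xs, ys):
    accumulated += grad_mse(w, [x], [y]) / accum_steps

print(full_batch_grad, accumulated)
```

Only one example needs to be held in memory per forward and backward pass, at the cost of 16 passes per optimizer step.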

## Inference

In [3]:
!python inference.py --model_path ./checkpoints/last.ckpt --base_model microsoft/phi-2 --example_prompts

2025-03-07 07:18:23.647778: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1741331903.690773    3292 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1741331903.703082    3292 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-07 07:18:23.749847: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loading tokenizer from microsoft/phi-2...
Loading model from ./checkpoints/last.ckpt...
Expected files not found. Ava

---

## Train QLoRA Adapters

In [4]:
!python training.py --max_epochs 1 --save_adapters_only

2025-03-07 07:26:12.208350: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1741332372.235178    5282 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1741332372.245135    5282 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-07 07:26:12.282816: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

Found existing checkpoint: checkpoints/last.ckpt
Resume training from checkpoint? (y/n): n
Setting batch size to 1 t

## Inference with QLoRA Adapters

In [5]:
!python inference.py --model_path ./adapters --base_model microsoft/phi-2 --use_qlora --example_prompts

2025-03-07 07:35:17.051504: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1741332917.076949    7608 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1741332917.086063    7608 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-07 07:35:17.134696: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Loading tokenizer from microsoft/phi-2...
Loading base model microsoft/phi-2...
Using 4-bit quantization (QLoRA)
`low
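
When inference runs with adapters, each adapted layer combines the frozen base weight `W` with the low-rank update `B @ A`, scaled by `alpha / r`. The pure-Python sketch below shows that computation on tiny hand-written matrices. In QLoRA, `W` would additionally be stored in 4-bit form and dequantized before the multiplication; that step is omitted here. All names and values are illustrative assumptions, not code from `inference.py`.

```python
def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha: float, r: int) -> list[float]:
    """Compute W x + (alpha / r) * B (A x) for one adapted linear layer."""
    base = matvec(W, x)               # frozen base projection
    update = matvec(B, matvec(A, x))  # low-rank path: project down, then up
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2 x 2)
A = [[1.0, 1.0]]              # LoRA down-projection (r x d_in), r = 1
B = [[0.5], [0.0]]            # LoRA up-projection (d_out x r)
x = [2.0, 3.0]

y = lora_forward(W, A, B, x, alpha=2.0, r=1)
print(y)

# With B set to zeros (the usual LoRA initialization), the layer reduces to the base model.
y_base = lora_forward(W, A, [[0.0], [0.0]], x, alpha=2.0, r=1)
print(y_base)
```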