<a href="https://colab.research.google.com/github/NewCodeLearner/LLM-finetunning/blob/main/2_Fine_tuning_DeepSeek_R1(Reasoning_Model).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# About DeepSeek R1

**DeepSeek-R1-Zero**
DeepSeek-R1-Zero is the first open-source model trained solely with large-scale reinforcement learning (RL), without supervised fine-tuning (SFT) as a preliminary step. This approach lets the model explore chain-of-thought (CoT) reasoning on its own, solve complex problems, and iteratively refine its outputs. However, it suffers from repetitive reasoning steps, poor readability, and language mixing, all of which hurt clarity and usability.

**DeepSeek-R1**
DeepSeek-R1 was introduced to overcome the limitations of DeepSeek-R1-Zero by incorporating cold-start data before reinforcement learning, providing a strong foundation for reasoning and non-reasoning tasks.

This multi-stage training lets the model reach performance comparable to OpenAI-o1 on math, code, and reasoning benchmarks while producing more readable and coherent output.




# 1. Setting up

For this project, we use Google Colab as our cloud IDE because it provides free access to GPUs. To get started, launch a new Colab notebook and add your Hugging Face token and Weights & Biases API key as secrets (named `HF_TOKEN` and `WANDB_KEY` in the code that follows).

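After setting up the secrets, install the `unsloth` Python package. Unsloth is an open-source framework designed to make fine-tuning large language models (LLMs) about 2x faster and more memory-efficient. The second `pip` command reinstalls the latest Unsloth code from GitHub without reinstalling its dependencies.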

In [1]:
%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

Log in to the Hugging Face Hub using the API token retrieved from Colab Secrets.

In [2]:
from huggingface_hub import login
from google.colab import userdata


hf_token = userdata.get('HF_TOKEN')
login(hf_token)
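
`google.colab.userdata` is only available inside a Colab runtime. If the notebook is run elsewhere, a fallback to environment variables can be sketched as follows. `get_secret` is a hypothetical helper introduced here for illustration, not part of Colab or `huggingface_hub`, and it assumes the same secret names are exported as environment variables.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from Colab Secrets, falling back to an environment variable.

    Hypothetical helper: assumes the secret is stored under the same name
    in Colab Secrets or in the process environment.
    """
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
        value = userdata.get(name)
    except Exception:
        # Not running in Colab, or the secret is missing / access was not granted.
        value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} not found in Colab Secrets or the environment")
    return value
```

With this in place, `hf_token = get_secret('HF_TOKEN')` would behave the same in Colab and on a local machine where `HF_TOKEN` is exported.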

Log in to Weights & Biases (wandb) using your API key and create a new project to track the experiments and fine-tuning progress.

In [3]:
import wandb

wb_token = userdata.get('WANDB_KEY')

wandb.login(key=wb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical COT Dataset',
    job_type="training",
    anonymous="allow"
)

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mgirish-barhate[0m ([33mgirish-barhate-student[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


# 2. Loading the model and tokenizer

For this project, we load the Unsloth version of DeepSeek-R1-Distill-Llama-8B. We load the model with 4-bit quantization so that its weights fit comfortably in the memory of a free Colab GPU.
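
To see why 4-bit loading matters on a GPU with roughly 15 GB of memory, here is a back-of-the-envelope estimate of weight memory alone. It is a rough sketch that assumes about 8 billion parameters (from the "8B" in the model name) and ignores activations, the KV cache, and quantization overhead, so real usage will be somewhat higher. `weight_memory_gb` is an illustrative helper, not an Unsloth function.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 8e9  # assumption: roughly 8 billion parameters

fp16_gb = weight_memory_gb(n_params, 16)  # half-precision weights
int4_gb = weight_memory_gb(n_params, 4)   # 4-bit quantized weights

print(f"16-bit weights: ~{fp16_gb:.0f} GB")
print(f"4-bit weights:  ~{int4_gb:.0f} GB")
```

Under these assumptions, the 16-bit weights alone would not fit on the GPU, while the 4-bit weights leave room for activations and the additional state needed for fine-tuning.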

In [4]:
from unsloth import FastLanguageModel

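max_seq_length = 2048  # maximum sequence length (in tokens) the model will handle
dtype = None           # None lets Unsloth auto-detect the compute dtype
load_in_4bit = True    # load the weights with 4-bit quantization to save GPU memory

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = hf_token,
)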

==((====))==  Unsloth 2025.1.8: Fast Llama patching. Transformers: 4.47.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
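
The banner reports `Bfloat16 = FALSE` for the Tesla T4, whose compute capability is 7.5, so with `dtype = None` the model cannot run in bfloat16 on this GPU. The rule of thumb behind that can be sketched as below, assuming native bfloat16 support starts at compute capability 8.0 (Ampere). `pick_dtype` is a hypothetical helper for illustration only and is not Unsloth's actual selection logic.

```python
def pick_dtype(major: int, minor: int = 0) -> str:
    """Pick a half-precision dtype name from a GPU's compute capability.

    Assumption: bfloat16 is natively supported from compute capability 8.0 onward.
    """
    return "bfloat16" if (major, minor) >= (8, 0) else "float16"


print(pick_dtype(7, 5))  # Tesla T4 (compute capability 7.5)
print(pick_dtype(8, 0))  # an Ampere-class GPU such as the A100
```

On a real machine, the `(major, minor)` pair can be obtained from `torch.cuda.get_device_capability()`.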



