<a href="https://colab.research.google.com/github/Kamran-Ali-Shah/Kamran-Ali-Shah/blob/main/Fine_tune_DeepSeek.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Fine-tuning DeepSeek-R1-Distill-Llama-8B with Unsloth, then serving it locally with Ollama
# Inspired by https://unsloth.ai/blog/deepseek-r1

In [1]:
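# Install Unsloth and its dependencies from PyPI
!pip install unsloth
# Then replace the PyPI release with the latest GitHub version (dependencies are already installed above)
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git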

Collecting unsloth
  Downloading unsloth-2025.1.8-py3-none-any.whl.metadata (53 kB)
Collecting unsloth_zoo>=2025.1.4 (from unsloth)
  Downloading unsloth_zoo-2025.1.5-py3-none-any.whl.metadata (16 kB)
Collecting xformers>=0.0.27.post2 (from unsloth)
  Downloading xformers-0.0.29.post2-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (1.0 kB)
Collecting bitsandbytes (from unsloth)
  Downloading bitsandbytes-0.45.1-py3-none-manylinux_2_24_x86_64.whl.metadata (5.8 kB)
Collecting tyro (from unsloth)
  Downloading tyro-0.9.13-py3-none-any.whl.metadata (9.4 kB)
Collecting datasets>=2.16.0 (from unsloth)
  Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Collecting trl!=0.9.0,!=0.9.1,!=0.9.2,!=0.9.3,>=0.7.9 (from unsloth)
  Downloading trl-0.14.0-

Found existing installation: unsloth 2025.1.8
Uninstalling unsloth-2025.1.8:
  Successfully uninstalled unsloth-2025.1.8
Collecting git+https://github.com/unslothai/unsloth.git
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-req-build-3joh3jcd
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-req-build-3joh3jcd
  Resolved https://github.com/unslothai/unsloth.git to commit 038e6d4c8d40207a87297ab3aaf787c19b1006d1
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: unsloth
  Building wheel for unsloth (pyproject.toml) ... done
  Created wheel for unsloth: filename=unsloth-2025.1.8-py3-none-any.whl size=174982 sha256=153b37c904fac19b99ba34393eef56209cf43d55f24646bee87a58fc417e9753
  Stored in directory: /tmp/pip-ephem-wheel-cache-w8rvzs09/wheels/d1/17/

In [2]:
from unsloth import FastLanguageModel
import torch

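model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit",  # pre-quantized 4-bit checkpoint
    max_seq_length = 2048,  # maximum sequence length used for training
    dtype = None,           # None = auto-detect (float16 on a T4, which lacks bfloat16 support)
    load_in_4bit = True,    # keep weights in 4-bit so the model fits in GPU memory
)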

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.1.8: Fast Llama patching. Transformers: 4.47.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

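Why 4-bit loading matters on this GPU: a back-of-the-envelope estimate of weight memory at different precisions. This sketch assumes roughly 8.03 billion parameters (the Llama-3.1-8B base this distilled model is built on) and ignores activations, the KV cache, quantization constants, and any layers left unquantized, so treat the numbers as rough lower bounds.

```python
# Rough weight-only memory estimate at different precisions.
# Assumption: ~8.03e9 parameters (Llama-3.1-8B class model).
N_PARAMS = 8.03e9
GPU_MEMORY_GB = 14.741  # "Max memory" reported for the Tesla T4 in the log above


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for label, bits in [("fp16/bf16", 16), ("int8", 8), ("4-bit", 4)]:
    gb = weight_memory_gb(N_PARAMS, bits)
    fits = "fits" if gb < GPU_MEMORY_GB else "does not fit"
    print(f"{label:>9}: {gb:5.2f} GB of weights ({fits} in ~{GPU_MEMORY_GB} GB)")
```

At 16 bits the weights alone exceed the T4's memory, while 4-bit leaves room for activations and optimizer state during LoRA training.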

In [3]:
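model = FastLanguageModel.get_peft_model(
    model,
    r = 4,                     # LoRA rank: a small rank means few trainable parameters
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only; no adapters on MLP layers
    lora_alpha = 16,           # LoRA scaling numerator (updates are scaled by alpha / r)
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",  # Unsloth's memory-saving gradient checkpointing
    random_state = 42,
    use_rslora = False,        # rank-stabilized LoRA disabled
    loftq_config = None,       # no LoftQ initialization
)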

Not an error, but Unsloth cannot patch MLP layers with our manual autograd engine since either LoRA adapters
are not enabled or a bias term (like in Qwen) is used.
Unsloth 2025.1.8 patched 32 layers with 32 QKV layers, 32 O layers and 0 MLP layers.
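The trainable-parameter count reported when training starts (3,407,872) follows directly from this LoRA configuration. The sketch below assumes the standard Llama-3.1-8B dimensions (hidden size 4096, 8 key/value heads of dimension 128 under grouped-query attention, 32 decoder layers); each adapted `Linear(in, out)` gets two low-rank matrices with `r * (in + out)` parameters in total.

```python
# Count LoRA parameters for the adapters configured above.
# Assumptions (Llama-3.1-8B architecture): hidden size 4096, 8 key/value heads,
# head dim 128, 32 decoder layers.
HIDDEN = 4096
HEAD_DIM = 128
N_KV_HEADS = 8
N_LAYERS = 32
KV_DIM = N_KV_HEADS * HEAD_DIM  # 1024: k_proj and v_proj are narrower under grouped-query attention

# (in_features, out_features) of each targeted projection
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(r: int, shapes: dict, n_layers: int) -> int:
    # A is (r x in) and B is (out x r), so each adapter adds r * (in + out) parameters
    per_layer = sum(r * (fin + fout) for fin, fout in shapes.values())
    return per_layer * n_layers


r, lora_alpha = 4, 16
print(lora_params(r, SHAPES, N_LAYERS))  # 3407872
print("scale (alpha / r):", lora_alpha / r)  # 4.0 (with use_rslora=True it would be alpha / sqrt(r) = 8.0)
```

Only about 0.04% of the model's ~8 billion weights are trained, which is why the run fits on a single T4.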


<a name="Data"></a>
### Data Prep

In [4]:
from datasets import load_dataset
dataset = load_dataset("open-r1/OpenThoughts-114k-math", split = "train")
print(dataset.column_names)

['source', 'problem', 'solution', 'messages', 'system', 'conversations', 'generated_token_count', 'correct']


In [5]:
dataset[0]

{'source': 'olympiads',
 'problem': 'Let \\( a, b, c \\) be positive real numbers. Prove that\n\n$$\n\\frac{1}{a(1+b)}+\\frac{1}{b(1+c)}+\\frac{1}{c(1+a)} \\geq \\frac{3}{1+abc},\n$$\n\nand that equality occurs if and only if \\( a = b = c = 1 \\).',
 'solution': "1. Consider the given inequality:\n\n\\[\n\\frac{1}{a(1+b)}+ \\frac{1}{b(1+c)}+ \\frac{1}{c(1+a)} \\geq \\frac{3}{1 + abc}\n\\]\n\nTo simplify, we add \\( \\frac{3}{1 + abc} \\) to both sides. The new inequality becomes:\n\n\\[\n\\frac{1}{a(1+b)} + \\frac{1}{b(1+c)} + \\frac{1}{c(1+a)} + \\frac{3}{1 + abc} \\geq \\frac{6}{1 + abc}\n\\]\n\n2. Let's analyze each term with an added \\( \\frac{1}{1 + abc} \\):\n\n\\[\n\\frac{1}{a(1+b)} + \\frac{1}{1 + abc}, \\quad \\frac{1}{b(1+c)} + \\frac{1}{1 + abc}, \\quad \\frac{1}{c(1+a)} + \\frac{1}{1 + abc}\n\\]\n\nWe can rewrite them as follows:\n\n\\[\n\\begin{aligned}\n\\frac{1}{a(1+b)} + \\frac{1}{1 + abc} &= \\frac{1}{1 + abc} \\left( \\frac{1 + abc}{a(1+b)} + 1 \\right), \\\\\n\\fra

In [7]:
# from unsloth import to_sharegpt

# dataset = to_sharegpt(
#     dataset,
#     merged_prompt = "{instruction}[[\nYour input is:\n{input}]]",
#     output_column_name = "output",
#     conversation_extension = 3, # Select more to handle longer conversations
# )

In [11]:
from unsloth import standardize_sharegpt
# Convert ShareGPT-style {"from", "value"} turns into {"role", "content"} messages
dataset = standardize_sharegpt(dataset)

In [12]:
dataset[0]['conversations']

[{'content': 'Return your final response within \\boxed{}. Let \\( a, b, c \\) be positive real numbers. Prove that\n\n$$\n\\frac{1}{a(1+b)}+\\frac{1}{b(1+c)}+\\frac{1}{c(1+a)} \\geq \\frac{3}{1+abc},\n$$\n\nand that equality occurs if and only if \\( a = b = c = 1 \\).',
  'role': 'user'},
 {'content': '<|begin_of_thought|>\n\nOkay, so I need to prove this inequality: \\(\\frac{1}{a(1+b)}+\\frac{1}{b(1+c)}+\\frac{1}{c(1+a)} \\geq \\frac{3}{1+abc}\\), where \\(a, b, c\\) are positive real numbers, and equality holds if and only if \\(a = b = c = 1\\). Hmm, let me try to think through this step by step.\n\nFirst, I remember that inequalities often involve techniques like AM-GM, Cauchy-Schwarz, or other classical inequalities. Maybe I can start by looking at each term on the left side and see if I can manipulate them to relate to the right side.\n\nThe left side has three terms: \\(\\frac{1}{a(1+b)}\\), \\(\\frac{1}{b(1+c)}\\), and \\(\\frac{1}{c(1+a)}\\). Each denominator has a variable
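Conceptually, `standardize_sharegpt` rewrites ShareGPT-style turns (`{"from": ..., "value": ...}`) into the `role`/`content` form shown above. The sketch below is a simplified, hypothetical re-implementation, not Unsloth's code: `standardize_turns` and `ROLE_MAP` are illustrative names, and the role names (`human`, `gpt`, `system`) are assumed from the common ShareGPT convention.

```python
# Hypothetical, simplified ShareGPT -> role/content standardization.
ROLE_MAP = {
    "human": "user",
    "user": "user",
    "gpt": "assistant",
    "assistant": "assistant",
    "system": "system",
}


def standardize_turns(turns):
    """Convert [{'from': ..., 'value': ...}] turns to [{'role': ..., 'content': ...}]."""
    out = []
    for turn in turns:
        role = ROLE_MAP[turn["from"]]  # raises KeyError on an unknown role instead of guessing
        out.append({"role": role, "content": turn["value"]})
    return out


example = [
    {"from": "human", "value": "What is 2 + 2?"},
    {"from": "gpt", "value": "4"},
]
print(standardize_turns(example))
```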

### Customizable Chat Templates

In [13]:
chat_template = """Below are some instructions that describe some tasks. Write responses that appropriately complete each request.

### Instruction:
{INPUT}

### Response:
{OUTPUT}"""

from unsloth import apply_chat_template
dataset = apply_chat_template(
    dataset,
    tokenizer = tokenizer,
    chat_template = chat_template,
    # default_system_message = "You are a helpful assistant", << [OPTIONAL]
)

Unsloth: We automatically added an EOS token to stop endless generations.
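For a single user/assistant exchange, the template above amounts to filling `{INPUT}` with the user turn, `{OUTPUT}` with the assistant turn, and appending the EOS token, which yields the `text` field the trainer reads. This is a hypothetical sketch (`render_pair` is not part of Unsloth, whose implementation is more general and also handles multi-turn conversations and an optional system message); the EOS string `<｜end▁of▁sentence｜>` is the one that appears in the generated Modelfile later in this notebook.

```python
CHAT_TEMPLATE = """Below are some instructions that describe some tasks. Write responses that appropriately complete each request.

### Instruction:
{INPUT}

### Response:
{OUTPUT}"""

EOS = "<｜end▁of▁sentence｜>"  # DeepSeek-R1-Distill EOS token, as it appears in the Modelfile


def render_pair(conversation, template=CHAT_TEMPLATE, eos=EOS):
    """Hypothetical helper: render one user/assistant exchange into a training string."""
    user = next(t["content"] for t in conversation if t["role"] == "user")
    assistant = next(t["content"] for t in conversation if t["role"] == "assistant")
    # Split on the placeholders instead of str.replace, so braces inside the
    # message text can never be mistaken for a placeholder.
    head, tail = template.split("{INPUT}")
    middle, end = tail.split("{OUTPUT}")
    return head + user + middle + assistant + end + eos


print(render_pair([{"role": "user", "content": "Q"}, {"role": "assistant", "content": "A"}]))
```

Appending EOS is what teaches the model to stop; without it, the fine-tuned model tends to keep generating past the end of its answer.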

<a name="Train"></a>
### Train the model

In [14]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Keep this in sync with the max_seq_length passed to from_pretrained above
max_seq_length = 2048

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,  # short demo run; set num_train_epochs instead for a full pass
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",  # set to "wandb" (or another integration) to log metrics
    ),
)
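With these arguments, each optimizer step sees `per_device_train_batch_size × gradient_accumulation_steps` = 8 sequences on the single GPU, so 60 steps cover only 480 of the 89,120 examples even though the banner reports one epoch: this is a short demonstration run. The sketch also reproduces the shape of a linear schedule with warmup, written to mirror the usual Hugging Face formula (ramp up over `warmup_steps`, then decay linearly to 0 at `max_steps`); it is an illustration, not the trainer's own code.

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1
max_steps = 60
warmup_steps = 5
learning_rate = 2e-4
num_examples = 89_120

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
examples_seen = effective_batch * max_steps
print(effective_batch, examples_seen, f"{examples_seen / num_examples:.2%}")  # 8 480 0.54%


def linear_lr(step: int) -> float:
    """Learning rate at optimizer step `step` (0-based) for linear warmup then linear decay."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate * max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))


print([round(linear_lr(s), 6) for s in (0, 1, 5, 30, 59)])
```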

In [15]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 89,120 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 3,407,872


| Step | Training Loss |
|-----:|--------------:|
| 1 | 0.9861 |
| 2 | 0.8801 |
| 3 | 1.1193 |
| 4 | 0.8785 |
| 5 | 0.8439 |
| 6 | 1.1502 |
| 7 | 0.9191 |
| 8 | 0.8862 |
| 9 | 0.8463 |
| 10 | 0.8692 |


<a name="Ollama"></a>
### Ollama

In [16]:
!curl -fsSL https://ollama.com/install.sh | sh

>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [17]:
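# Merge the LoRA weights into the base model and export to GGUF (q8_0 by default, as the log shows)
model.save_pretrained_gguf("model", tokenizer)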

Unsloth: ##### The current model auto adds a BOS token.
Unsloth: ##### Your chat template has a BOS token. We shall remove it temporarily.
Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which might take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 6.0G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 4.74 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


 53%|█████▎    | 17/32 [00:00<00:00, 26.57it/s]
We will save to Disk and not RAM now.
100%|██████████| 32/32 [02:01<00:00,  3.81s/it]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving model/pytorch_model-00001-of-00004.bin...
Unsloth: Saving model/pytorch_model-00002-of-00004.bin...
Unsloth: Saving model/pytorch_model-00003-of-00004.bin...
Unsloth: Saving model/pytorch_model-00004-of-00004.bin...
Done.


Unsloth: Converting llama model. Can use fast conversion = False.


==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits might take 3 minutes.
\        /    [2] Converting GGUF 16bits to ['q8_0'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: CMAKE detected. Finalizing some steps for installation.
Unsloth: [1] Converting model at model into q8_0 GGUF format.
The output location will be /content/model/unsloth.Q8_0.gguf
This might take 3 minutes...
INFO:hf-to-gguf:Loading model: model
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:rope_freqs.weight,           torch.float32 --> F32, shape = {64}
INFO:hf-to-gguf:gguf: loading model weight map from 'pytorch_model.bin.index.json'
INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00001-of-00004.bin

Unsloth: ##### The current model auto adds a BOS token.
Unsloth: ##### We removed it in GGUF's chat template for you.


Unsloth: Conversion completed! Output location: /content/model/unsloth.Q8_0.gguf
Unsloth: Saved Ollama Modelfile to model/Modelfile
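The default export here is `q8_0`. In llama.cpp's Q8_0 format, weights are stored in blocks of 32 signed 8-bit values plus one 16-bit scale per block, i.e. 34 bytes per 32 weights (8.5 bits per weight). A rough size estimate for the exported file, again assuming about 8.03 billion parameters and ignoring metadata and the few small tensors kept at other precisions:

```python
BLOCK_SIZE = 32           # weights per Q8_0 block
BYTES_PER_BLOCK = 2 + 32  # one fp16 scale + 32 int8 quantized values


def q8_0_bits_per_weight() -> float:
    return BYTES_PER_BLOCK * 8 / BLOCK_SIZE


def q8_0_size_gb(n_params: float) -> float:
    # Assumes every weight is stored as Q8_0 (real files keep e.g. norm weights in F32)
    return n_params * q8_0_bits_per_weight() / 8 / 1e9


print(q8_0_bits_per_weight())          # 8.5
print(round(q8_0_size_gb(8.03e9), 2))  # rough file size in GB
```

So the Q8_0 file is roughly twice the size of the 4-bit training checkpoint; smaller quantizations trade file size for some output quality.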


In [18]:
import subprocess
import time

# Start the Ollama server in the background and give it a few seconds to come up
subprocess.Popen(["ollama", "serve"])
time.sleep(3)
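A fixed `time.sleep(3)` can be too short on a slow machine. One alternative is to poll the server until it answers. This sketch assumes Ollama's default address `http://127.0.0.1:11434` (as printed by the installer above) and that any HTTP response from that URL means the server is up; `wait_for_ollama` is a hypothetical helper, not part of the `ollama` package.

```python
import time
import urllib.error
import urllib.request


def wait_for_ollama(url: str = "http://127.0.0.1:11434",
                    timeout: float = 30.0,
                    interval: float = 0.5) -> bool:
    """Poll `url` until the server answers or `timeout` seconds elapse.

    Returns True once any HTTP response arrives, False if none does.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True  # a successful response: the server is listening
        except urllib.error.HTTPError:
            return True  # an HTTP error status still means the server answered
        except OSError:
            time.sleep(interval)  # connection refused or timed out: not up yet
    return False
```

Call `wait_for_ollama()` right after `subprocess.Popen(["ollama", "serve"])` in place of the fixed sleep, and raise an error if it returns False.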

In [19]:
print(tokenizer._ollama_modelfile)

FROM {__FILE_LOCATION__}

TEMPLATE """Below are some instructions that describe some tasks. Write responses that appropriately complete each request.{{ if .Prompt }}

### Instruction:
{{ .Prompt }}{{ end }}

### Response:
{{ .Response }}<｜end▁of▁sentence｜>"""

PARAMETER stop "<|python_tag|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<｜▁pad▁｜>"
PARAMETER stop "<｜Assistant｜>"
PARAMETER stop "<|finetune_right_pad_id|>"
PARAMETER stop "<|eom_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<think>"
PARAMETER stop "</think>"
PARAMETER stop "<｜User｜>"
PARAMETER stop "<｜end▁of▁sentence｜>"
PARAMETER stop "<|reserved_special_token_"
PARAMETER temperature 1.5
PARAMETER min_p 0.1
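The Modelfile sets `temperature 1.5` and `min_p 0.1`. Min-p sampling keeps only tokens whose probability is at least `min_p` times that of the most likely token, which prunes the long tail that a high temperature would otherwise amplify. The sketch below is illustrative only: `min_p_filter` is a hypothetical helper that applies the min-p cut to the unscaled distribution and then temperature to the surviving tokens; the exact order and details of Ollama's (llama.cpp's) sampler chain may differ.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(logits, min_p=0.1, temperature=1.5):
    """Return {token_index: probability} after min-p filtering and temperature scaling."""
    probs = softmax(logits)
    threshold = min_p * max(probs)  # cut-off relative to the top token
    kept = [i for i, p in enumerate(probs) if p >= threshold]
    # Temperature-scale the surviving logits and renormalize over them only
    scaled = softmax([logits[i] / temperature for i in kept])
    return dict(zip(kept, scaled))


print(min_p_filter([4.0, 3.0, 1.0, -2.0]))  # only tokens 0 and 1 survive the 10% cut
```

Raising the temperature flattens the distribution among the survivors, while `min_p` stops low-probability tokens from ever being sampled.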


We now create an `Ollama` model called `deepseek_finetuned_model` using the auto-generated `Modelfile`.

In [20]:
!ollama create deepseek_finetuned_model -f ./model/Modelfile

gathering model components

In [22]:
!pip install ollama

Collecting ollama
  Downloading ollama-0.4.7-py3-none-any.whl.metadata (4.7 kB)
Downloading ollama-0.4.7-py3-none-any.whl (13 kB)
Installing collected packages: ollama
Successfully installed ollama-0.4.7


In [23]:
import ollama

response = ollama.chat(
    model="deepseek_finetuned_model",
    messages=[{"role": "user", "content": "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,"}],
)

print(response.message.content)

To continue the Fibonacci sequence \(1, 1, 2, 3, 5, 8,\ldots\), we follow the rule where each subsequent number is the sum of the two preceding numbers. 

Starting from 13 and continuing this pattern:

1. Next after 8: \(13 + 21 = 34\)
2. Next after 21: \(21 + 34 = 55\)
3. Next after 34: \(34 + 55 = 89\)
4. Next after 55: \(55 + 89 = 144\)

Thus, the extended sequence is:

\(1,\ 1,\ 2,\ 3,\ 5,\ 8,\ 13,\ 21,\ 34,\ 55,\ 89,\ \boxed{144}\).
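The final sequence in this response is correct, but the worked steps are mislabeled: the line "Next after 8" computes 13 + 21 = 34, whereas the term after 8 is 5 + 8 = 13. When evaluating a fine-tuned model, comparing against a plain reference computation is a cheap check; `fibonacci` is a hypothetical helper written for that purpose.

```python
def fibonacci(n: int, a: int = 1, b: int = 1) -> list[int]:
    """Return the first n terms of the Fibonacci sequence starting with a, b."""
    terms = []
    for _ in range(n):
        terms.append(a)
        a, b = b, a + b
    return terms


model_answer = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]  # final list from the response above
print(fibonacci(12) == model_answer)  # True
```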

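When a prompt contains LaTeX, Python escape sequences can silently corrupt it. In an ordinary string literal, `\f` (as in `\frac`) becomes a form-feed character, while sequences such as `\(` or `\g` are invalid escapes that are kept as-is but trigger a warning in recent Python versions. A raw string (`r"..."`) keeps every backslash:

```python
plain = "\frac{1}{a}"  # \f is parsed as a form feed (ASCII 12)
raw = r"\frac{1}{a}"   # backslash kept verbatim

print(repr(plain))  # '\x0crac{1}{a}'
print(repr(raw))    # '\\frac{1}{a}'
print(len(plain), len(raw))  # 10 11
```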

In [25]:
from IPython.display import Markdown
import ollama

# Use a raw string so LaTeX escapes such as \f in \frac are not interpreted by Python
prompt = r"Let \( a, b, c \) be positive real numbers. Prove that $$ \frac{1}{a(1+b)}+\frac{1}{b(1+c)}+\frac{1}{c(1+a)} \geq \frac{3}{1+abc}, $$ and that equality occurs if and only if \( a = b = c = 1 \)"

response = ollama.chat(
    model="deepseek_finetuned_model",
    messages=[{"role": "user", "content": prompt}],
)

# Render the model's Markdown/LaTeX answer in the notebook
Markdown(response.message.content)

**Step 1: Homogeneous Variables**
Given that all variables are positive real numbers, we can use substitution to make the problem simpler. Let us assume $a = b = c$ and then see what condition must be met.

**Step 2: Simplify the Inequality**
Substituting $a = b = c = k$, the inequality simplifies as follows:
\[
\frac{1}{k(1 + k)} + \frac{1}{k(1 + k)} + \frac{1}{k(1 + k)} \geq \frac{3}{1 + k^3}
\]

**Step 3: Combine Fractions**
Combining the left-hand side fractions:
\[
\frac{3}{k(1 + k)}
\]
So, the inequality now is:
\[
\frac{3}{k(1 + k)} \geq \frac{3}{1 + k^3}
\]

**Step 4: Cross-Multiplication and Simplification**
Multiplying both sides by $(k(1 + k))(1 + k^3)$, we get:
\[
3(1 + k^3) \geq 3k(1 + k)
\]
Dividing both sides by $3$ (since $3 > 0$):
\[
1 + k^3 \geq k(1 + k)
\]
Simplifying the right-hand side:
\[
k + k^2
\]
Thus, we have:
\[
k^3 - k^2 - k + 1 \geq 0
\]

**Step 5: Factorization of Polynomial**
We factor the left-hand side polynomial by grouping terms:
\[
(k^3 - k^2) - (k - 1) = k^2(k - 1) - (k - 1)
\]
Factor out $(k - 1)$:
\[
(k - 1)(k^2 - 1) \geq 0
\]
Further factoring:
\[
(k - 1)(k + 1)^2 \geq 0
\]

**Step 6: Analyzing the Factorization**
Given that $(k + 1)^2$ is always non-negative for any real $k$, we analyze the sign of each factor.

- The term $(k - 1)$ changes sign at $k = 1$.
- Since $(k + 1)^2 \geq 0$ and $\text{sign}((k - 1)) = \text{sign}(1 - k)$, we consider two cases:
    - When $k < 1$, $\text{sign}((1 - k))$ is positive.
    - When $k > 1$, $\text{sign}((1 - k))$ is negative.

**Case 1: \(k \leq 0\)**
Given that $k$ is a positive real number, this case does not apply.

**Case 2: \(k = 1\)**
Here, we have:
\[
(k - 1)(k + 1)^2 = (1 - 1)(1 + 1)^2 = 0
\]
So, the inequality holds as equality.

**Case 3: \(k > 1\)**
For $k > 1$, since $\text{sign}((1 - k))$ is negative, we have:
\[
(k - 1) \leq 0
\]
Thus, the product $(k - 1)(k + 1)^2 \geq 0$ holds.

**Step 7: Conclusion**
From all cases above, since $(k - 1)(k + 1)^2 \geq 0$ for any $k > 0$, our inequality holds:
\[
\frac{1}{k(1 + k)} + \frac{1}{k(1 + k)} + \frac{1}{k(1 + k)} \geq \frac{3}{1 + k^3}
\]
Equality occurs when $k = 1$, thus proving that the initial inequality:
$$
\frac{1}{a(1 + b)} + \frac{1}{b(1 + c)} + \frac{1}{c(1 + a)} \geq \frac{3}{1 + abc}
$$
is indeed valid and equality holds only when $a = b = c = 1$.
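
The response above is the model's raw output, and two parts of it are worth checking. It only treats the symmetric case $a = b = c = k$, which does not establish the inequality for all positive $a, b, c$, and its Step 5 factorization is incorrect: $k^3 - k^2 - k + 1 = (k - 1)(k^2 - 1) = (k - 1)^2(k + 1)$, not $(k - 1)(k + 1)^2$. The corrected form is non-negative for every $k > 0$, so the symmetric case itself does hold. The sketch below checks both factorizations by expanding coefficient lists, then spot-checks the original three-variable inequality at random positive points (a numerical sanity check, not a proof; `poly_mul` and `lhs_minus_rhs` are illustrative helpers).

```python
import random


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, highest degree first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


target = [1, -1, -1, 1]                                  # k^3 - k^2 - k + 1
correct = poly_mul(poly_mul([1, -1], [1, -1]), [1, 1])   # (k - 1)^2 (k + 1)
claimed = poly_mul([1, -1], poly_mul([1, 1], [1, 1]))    # (k - 1)(k + 1)^2, as in the response
print(correct == target, claimed == target)  # True False


def lhs_minus_rhs(a, b, c):
    """Left side minus right side of the original inequality."""
    lhs = 1 / (a * (1 + b)) + 1 / (b * (1 + c)) + 1 / (c * (1 + a))
    return lhs - 3 / (1 + a * b * c)


rng = random.Random(0)
samples = [(rng.uniform(0.01, 10), rng.uniform(0.01, 10), rng.uniform(0.01, 10))
           for _ in range(10_000)]
print(min(lhs_minus_rhs(*s) for s in samples) >= -1e-12)  # no counterexample among the samples
print(lhs_minus_rhs(1, 1, 1))  # 0.0 at the equality point a = b = c = 1
```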