In [1]:

# Inspired by https://unsloth.ai/blog/deepseek-r1


### Set up the environment

In [2]:
!pip install unsloth

!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

Collecting unsloth
  Downloading unsloth-2025.1.6-py3-none-any.whl.metadata (53 kB)
Collecting unsloth_zoo>=2025.1.4 (from unsloth)
  Downloading unsloth_zoo-2025.1.5-py3-none-any.whl.metadata (16 kB)
Collecting xformers>=0.0.27.post2 (from unsloth)
  Downloading xformers-0.0.29.post1-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (1.0 kB)
Collecting bitsandbytes (from unsloth)
  Downloading bitsandbytes-0.45.1-py3-none-manylinux_2_24_x86_64.whl.metadata (5.8 kB)
Collecting tyro (from unsloth)
  Downloading tyro-0.9.13-py3-none-any.whl.metadata (9.4 kB)
Collecting datasets>=2.16.0 (from unsloth)
  Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Collecting trl!=0.9.0,!=0.9.1,!=0.9.2,!=0.9.3,>=0.7.9 (from unsloth)
  Downloading trl-0.13.0-

Found existing installation: unsloth 2025.1.6
Uninstalling unsloth-2025.1.6:
  Successfully uninstalled unsloth-2025.1.6
Collecting git+https://github.com/unslothai/unsloth.git
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-req-build-k83xfz4i
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-req-build-k83xfz4i
  Resolved https://github.com/unslothai/unsloth.git to commit bdf0cd6033595be4e7ed23d0d002bb176d343152
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: unsloth
  Building wheel for unsloth (pyproject.toml) ... done
  Created wheel for unsloth: filename=unsloth-2025.1.7-py3-none-any.whl size=174896 sha256=67e430998fde146dd0fc5246e6648fdcd091f8d3bee10aec99c742b4ff311ffd
  Stored in directory: /tmp/pip-ephem-wheel-cache-2bynwr2x/wheels/d1/17/

In [3]:
from unsloth import FastLanguageModel
import torch

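model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit",
    max_seq_length = 2048,  # maximum context length used for training
    dtype = None,           # None lets Unsloth auto-detect (float16 on a T4)
    load_in_4bit = True,    # load 4-bit quantized weights to reduce GPU memory
)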

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.1.7: Fast Llama patching. Transformers: 4.47.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 7.5. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.96G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/231 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/52.9k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

In [4]:
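model = FastLanguageModel.get_peft_model(
    model,
    r = 4,  # LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only
    lora_alpha = 16,  # LoRA scaling factor
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",  # Unsloth's memory-saving checkpointing
    random_state = 42,
    use_rslora = False,  # rank-stabilized LoRA disabled
    loftq_config = None,
)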

Not an error, but Unsloth cannot patch MLP layers with our manual autograd engine since either LoRA adapters
are not enabled or a bias term (like in Qwen) is used.
Unsloth 2025.1.7 patched 32 layers with 32 QKV layers, 32 O layers and 0 MLP layers.
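
The patch summary above reports 32 layers with adapters on the attention projections only. As a rough check on what `r = 4` costs, the sketch below counts LoRA parameters per projection as `r * (in_features + out_features)`. The layer shapes are an assumption taken from the standard Llama-3.1-8B configuration (hidden size 4096, 8 key/value heads of dimension 128); they are not printed anywhere in this notebook.

```python
# Assumed shapes for Llama-3.1-8B attention projections: (in_features, out_features).
HIDDEN = 4096
KV_DIM = 8 * 128  # 8 key/value heads x head dimension 128 (grouped-query attention)
PROJECTION_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}
NUM_LAYERS = 32  # from the Unsloth patch summary
RANK = 4         # the `r` passed to get_peft_model


def lora_param_count(shapes, rank, num_layers):
    """Each adapted projection adds two matrices: (in x r) and (r x out)."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


total_lora_params = lora_param_count(PROJECTION_SHAPES, RANK, NUM_LAYERS)
print(f"{total_lora_params:,}")  # 3,407,872
```

Under these assumptions the total agrees with the trainable-parameter count that the trainer banner reports later in the notebook (3,407,872).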



### Data Preparation: alpaca-gpt4

In [5]:
from datasets import load_dataset
dataset = load_dataset("vicgalle/alpaca-gpt4", split = "train")
print(dataset.column_names)

README.md:   0%|          | 0.00/3.38k [00:00<?, ?B/s]

(…)-00000-of-00001-6ef3991c06080e14.parquet:   0%|          | 0.00/48.4M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/52002 [00:00<?, ? examples/s]

['instruction', 'input', 'output', 'text']


In [6]:
dataset[0]

{'instruction': 'Give three tips for staying healthy.',
 'input': '',
 'output': '1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.\n\n2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.\n\n3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.',
 'text': 'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nGive three tips for 

In [7]:
from unsloth import to_sharegpt

dataset = to_sharegpt(
    dataset,
    merged_prompt = "{instruction}[[\nYour input is:\n{input}]]",
    output_column_name = "output",
    conversation_extension = 3, # Select more to handle longer conversations
)

Merging columns:   0%|          | 0/52002 [00:00<?, ? examples/s]

Converting to ShareGPT:   0%|          | 0/52002 [00:00<?, ? examples/s]

Flattening the indices:   0%|          | 0/52002 [00:00<?, ? examples/s]

Flattening the indices:   0%|          | 0/52002 [00:00<?, ? examples/s]

Flattening the indices:   0%|          | 0/52002 [00:00<?, ? examples/s]

Extending conversations:   0%|          | 0/52002 [00:00<?, ? examples/s]
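
In `merged_prompt`, the `[[...]]` brackets mark an optional section that is dropped when the column inside it is empty, which is why the first example (whose `input` is `''`) ends up as just the instruction. The sketch below approximates that behaviour in plain Python. `render_merged_prompt` is a hypothetical helper written for illustration; it is not Unsloth's implementation and may differ from it in edge cases.

```python
import re

FIELD = re.compile(r"\{(\w+)\}")


def render_merged_prompt(template: str, row: dict) -> str:
    """Fill {column} placeholders; drop any [[...]] block whose columns are empty."""
    # re.split with one capturing group puts optional blocks at odd indices.
    parts = re.split(r"\[\[(.*?)\]\]", template, flags=re.DOTALL)
    rendered = []
    for index, part in enumerate(parts):
        is_optional = index % 2 == 1
        names = FIELD.findall(part)
        if is_optional and any(not row.get(name) for name in names):
            continue  # skip the whole optional block
        rendered.append(FIELD.sub(lambda m: str(row[m.group(1)]), part))
    return "".join(rendered)


template = "{instruction}[[\nYour input is:\n{input}]]"
without_input = render_merged_prompt(
    template, {"instruction": "Give three tips for staying healthy.", "input": ""}
)
with_input = render_merged_prompt(
    template, {"instruction": "Summarize the text.", "input": "Some text."}
)
print(repr(without_input))
print(repr(with_input))
```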

In [8]:
from unsloth import standardize_sharegpt
dataset = standardize_sharegpt(dataset)

Standardizing format:   0%|          | 0/52002 [00:00<?, ? examples/s]

In [9]:
dataset[0]['conversations']

[{'content': 'Give three tips for staying healthy.', 'role': 'user'},
 {'content': '1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.\n\n2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.\n\n3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.',
  'role': 'assistant'},
 {'content': 'Describe what a monotheistic religion is.', 'role': 'user'},
 {'content': 'A monotheistic religion is a type of relig

### Customizable Chat Templates

In [10]:
chat_template = """Below are some instructions that describe some tasks. Write responses that appropriately complete each request.

### Instruction:
{INPUT}

### Response:
{OUTPUT}"""

from unsloth import apply_chat_template
dataset = apply_chat_template(
    dataset,
    tokenizer = tokenizer,
    chat_template = chat_template,
    # default_system_message = "You are a helpful assistant", << [OPTIONAL]
)

Unsloth: We automatically added an EOS token to stop endless generations.


Map:   0%|          | 0/52002 [00:00<?, ? examples/s]
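
To make the effect of the template concrete, here is a minimal sketch of how a multi-turn conversation could be flattened into one training string. The layout (preamble once, then one Instruction/Response pair per turn, each response followed by the EOS token) is inferred from the Modelfile printed later in this notebook; `render_conversation` is a hypothetical helper, and the actual `text` column produced by `apply_chat_template` may differ in whitespace or special tokens.

```python
PREAMBLE = (
    "Below are some instructions that describe some tasks. "
    "Write responses that appropriately complete each request."
)
TURN = "\n\n### Instruction:\n{INPUT}\n\n### Response:\n{OUTPUT}"
EOS = "<｜end▁of▁sentence｜>"  # EOS token shown in the generated Modelfile


def render_conversation(conversations):
    """Pair up user/assistant messages and append them after the preamble."""
    text = PREAMBLE
    users = conversations[0::2]
    assistants = conversations[1::2]
    for user, assistant in zip(users, assistants):
        text += TURN.format(INPUT=user["content"], OUTPUT=assistant["content"]) + EOS
    return text


# Shortened stand-ins for the first two turns of dataset[0]['conversations'].
sample = [
    {"role": "user", "content": "Give three tips for staying healthy."},
    {"role": "assistant", "content": "1. Eat a balanced and nutritious diet."},
    {"role": "user", "content": "Describe what a monotheistic religion is."},
    {"role": "assistant", "content": "A monotheistic religion worships one god."},
]
rendered = render_conversation(sample)
print(rendered)
```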

<a name="Train"></a>
### Train the model

In [12]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Set to "wandb" (or similar) to enable experiment tracking
    ),
)

Map (num_proc=2):   0%|          | 0/52002 [00:00<?, ? examples/s]
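
With `warmup_steps = 5`, `max_steps = 60` and `lr_scheduler_type = "linear"`, the learning rate ramps up from zero over the first five steps and then decays linearly back to zero. The sketch below reproduces that shape in plain Python; it mirrors the usual linear-warmup formula and is meant as an illustration, not as the scheduler code the trainer actually runs.

```python
PEAK_LR = 2e-4
WARMUP_STEPS = 5
TOTAL_STEPS = 60


def lr_at(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate after `step` optimizer updates under linear warmup + linear decay."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


for step in (0, 1, 5, 30, 60):
    print(step, f"{lr_at(step):.2e}")
```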

In [13]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 52,002 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 3,407,872


Step,Training Loss
1,1.8937
2,1.9251
3,1.8209
4,2.015
5,2.0907
6,1.789
7,1.7602
8,1.6723
9,1.6972
10,1.5224
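
The banner figures follow directly from the training arguments. A quick calculation shows how little of the dataset a 60-step run touches:

```python
per_device_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1
max_steps = 60
num_examples = 52_002

# One optimizer step consumes this many examples.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus
examples_seen = effective_batch_size * max_steps
fraction_of_dataset = examples_seen / num_examples

print(effective_batch_size)          # 8
print(examples_seen)                 # 480
print(f"{fraction_of_dataset:.2%}")  # 0.92%
```

So this run is a short demonstration covering under 1% of the data, far less than a full epoch.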



### Use Ollama

In [32]:
!curl -fsSL https://ollama.com/install.sh | sh

>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
############################################################################################# 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


In [15]:
model.save_pretrained_gguf("model", tokenizer)

Unsloth: ##### The current model auto adds a BOS token.
Unsloth: ##### Your chat template has a BOS token. We shall remove it temporarily.
Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which might take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 6.0G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 5.29 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


 47%|████▋     | 15/32 [00:00<00:00, 22.27it/s]
We will save to Disk and not RAM now.
100%|██████████| 32/32 [01:17<00:00,  2.42s/it]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving model/pytorch_model-00001-of-00004.bin...
Unsloth: Saving model/pytorch_model-00002-of-00004.bin...
Unsloth: Saving model/pytorch_model-00003-of-00004.bin...
Unsloth: Saving model/pytorch_model-00004-of-00004.bin...
Done.


Unsloth: Converting llama model. Can use fast conversion = False.


==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits might take 3 minutes.
\        /    [2] Converting GGUF 16bits to ['q8_0'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: CMAKE detected. Finalizing some steps for installation.
Unsloth: [1] Converting model at model into q8_0 GGUF format.
The output location will be /content/model/unsloth.Q8_0.gguf
This might take 3 minutes...
INFO:hf-to-gguf:Loading model: model
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:rope_freqs.weight,           torch.float32 --> F32, shape = {64}
INFO:hf-to-gguf:gguf: loading model weight map from 'pytorch_model.bin.index.json'
INFO:hf-to-gguf:gguf: loading model part 'pytorch_model-00001-of-00004.bin

Unsloth: ##### The current model auto adds a BOS token.
Unsloth: ##### We removed it in GGUF's chat template for you.


Unsloth: Conversion completed! Output location: /content/model/unsloth.Q8_0.gguf
Unsloth: Saved Ollama Modelfile to model/Modelfile


In [37]:
import subprocess
import time

# Start the Ollama server in the background, then give it a moment to come up.
subprocess.Popen(["ollama", "serve"])
time.sleep(3)
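
A fixed three-second sleep usually works, but it can be too short on a slow machine. One alternative is to poll the API address reported by the installer (`127.0.0.1:11434`) until it responds. The helper below is a hypothetical sketch using only the standard library; it assumes the server answers a plain GET on its root URL with HTTP 200 once it is ready.

```python
import time
import urllib.request


def wait_for_server(url="http://127.0.0.1:11434", timeout=30.0, interval=0.5):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=1.0) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, HTTP error, or timeout: the server is not ready yet.
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_server()` right after `subprocess.Popen(["ollama", "serve"])` would then replace the sleep, and a `False` return value signals that the server never came up.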

In [30]:
print(tokenizer._ollama_modelfile)

FROM {__FILE_LOCATION__}

TEMPLATE """Below are some instructions that describe some tasks. Write responses that appropriately complete each request.{{ if .Prompt }}

### Instruction:
{{ .Prompt }}{{ end }}

### Response:
{{ .Response }}<｜end▁of▁sentence｜>"""

PARAMETER stop "<|python_tag|>"
PARAMETER stop "<｜User｜>"
PARAMETER stop "<|eom_id|>"
PARAMETER stop "<|finetune_right_pad_id|>"
PARAMETER stop "<｜Assistant｜>"
PARAMETER stop "<think>"
PARAMETER stop "</think>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<｜▁pad▁｜>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<｜end▁of▁sentence｜>"
PARAMETER stop "<|reserved_special_token_"
PARAMETER temperature 1.5
PARAMETER min_p 0.1


In [38]:
!ollama create deepseek_finetuned_model -f ./model/Modelfile

gathering model components

In [39]:
!pip install ollama




### Use the fine-tuned model

In [41]:
from IPython.display import Markdown
import ollama

response = ollama.chat(model="deepseek_finetuned_model",
                       messages=[{"role": "user",
                                  "content": "How to configure django project?"},
                      ])

Markdown(response.message.content)

Configuring a Django project involves setting up your project and applications, creating superuser (admin), creating database models and much more steps. Let's follow these steps:

1. **Set Up Django Project**:
   - Install Python, pip, and virtualenv.
   - Create a new virtual environment: `virtualenv myproject` (you can replace myproject with any name you prefer).
   - Activate the virtual environment: `source myproject/bin/activate`
   - Clone or create your project: `django-admin startproject myapp`.
   - Navigate into your project directory: `cd myapp`.

2. **Set Up Your Project**:
   - Open up Django admin interface: `python manage.py admin`.
   - Create superuser: `python manage.py createsuperuser`

3. **Create Applications**:
   - Add app in the settings: `INSTALLED_APPS = [ 
      ...,
      'myapp',
   ]` (replace myapp with your project name)

4. **Run Migrations**:
   - Make changes to database models, then run migrations: `python manage.py migrate`
   - To make it easy to create superuser and run database migrations, you can use:

5. **Create Models**:
   - Create model definitions in `myapp/models.py`. For example, if I want a blog, I could write:

```
class BlogPost(models.Model):
    author = models.CharField(max_length=20)
    title = models.CharField(max_length=60)
    content = models.TextField()
    posted_at = models.DateTimeField(auto_now=True)

class Comment(models.Model:
    post = models.ForeignKey('myapp.BlogPost', related_name='comments')
    name = models.CharField(max_length=30)
    comment_text = models.TextField()

```

6. **Migrate Models to Database**:
   - After creating your model, you'll need to run `python manage.py migrate` or `python manage.py makemigrations myapp` and then run the migrations.

7. **Run Server**:
   - Start up a server: `python manage.py runserver`
   - Your application will be accessible at `http://localhost:8000`

These are some of the steps you need to take to set up a Django project, but it's always good to check out the official documentation for more details and updates.

In [53]:
from IPython.display import Markdown
import ollama

response = ollama.chat(model="deepseek_finetuned_model",
                       messages=[{"role": "user",
                                  "content": "Can you solve the equation 2x+3=7?"},
                      ])

Markdown(response.message.content)

Sure, I can help solve the equation 2x + 3 = 7.

To solve for x, we'll need to isolate it on one side of the equation. 

Step 1: Subtract 3 from both sides of the equation to eliminate the constant term:

2x + 3 - 3 = 7 - 3  
This simplifies to:
2x = 4

Step 2: Now divide both sides by 2 to solve for x:

(2x) / 2 = 4 / 2  
This simplifies to:
x = 2

In [54]:
from IPython.display import Markdown
import ollama

response = ollama.chat(model="deepseek_finetuned_model",
                       messages=[{"role": "user",
                                  "content": "What is a Markov chain and how is it used in probabilistic models?"},
                      ])

Markdown(response.message.content)

A Markov chain is a mathematical model of a system that changes over time, where the next state only depends on its current state, not on the sequence of events preceding it. This type of model is widely used in many fields such as finance, economics, and engineering for predicting future outcomes based on past data.

In probabilistic models, Markov chains are used to model uncertainty by assigning probabilities to transitions between states. The chain consists of a set of states and transition probabilities between them. For example, the state could be the weather condition or economic indicators, and the transition probabilities describe the likelihood of moving from one state to another.

Markov chains can be applied in scenarios where there is sequential dependence on random variables that can be modeled as a system with a finite number of possible states. They are useful for forecasting, estimating probabilities, and simulating complex systems in the absence of perfect information.
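
The three cells above repeat the same call with a different prompt. A small wrapper keeps that in one place. `ask` is a hypothetical helper; the `chat_fn` parameter is an addition for illustration so the function can be exercised without a running Ollama server, and it defaults to `ollama.chat`.

```python
def ask(prompt, model="deepseek_finetuned_model", chat_fn=None):
    """Send a single user message and return the assistant's reply text."""
    if chat_fn is None:
        import ollama  # imported lazily so the helper can be tested without it
        chat_fn = ollama.chat
    response = chat_fn(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # Subscript access works on both dict responses and ollama's response objects.
    return response["message"]["content"]
```

In the notebook this could be used as `Markdown(ask("Can you solve the equation 2x+3=7?"))`.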