<a href="https://colab.research.google.com/github/ccoyso/Tapia/blob/main/Copia_de_deepseekv3_finetuning_ipynb_(1).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# DeepSeek R1 Model Fine-Tuning (LoRA) with GPT-4 Dataset [Unsloth and Ollama]

## **Unsloth**

Unsloth is an open-source library for efficient fine-tuning of large language models. It combines Low-Rank Adaptation (LoRA) with other memory and speed optimizations so that large models, such as Llama derivatives, can be fine-tuned with modest computational resources.

The key features of Unsloth used in this notebook are:

1. **Efficient Fine-Tuning**: Instead of updating all of the model's weights, it uses methods like LoRA, which freeze the original weights and train a small set of added parameters, making fine-tuning resource-efficient.

2. **Simplified Workflow**: Loading models, configuring parameters, and training are streamlined, allowing developers to focus on specific customizations rather than managing complex infrastructure.

3. **Integration with Local Runtimes**: After fine-tuning, the model can be exported to tools like Ollama for local deployment.

## **DeepSeek-R1**

DeepSeek-R1 is an open-weight reasoning language model released by DeepSeek in January 2025. It is trained to write out a step-by-step chain of thought (wrapped in `<think>` tags) before giving its final answer. Alongside the full model, DeepSeek released smaller distilled versions built on Llama and Qwen base models; this notebook fine-tunes one of them, DeepSeek-R1-Distill-Llama-8B.

![](https://bsmedia.business-standard.com/_media/bs/img/article/2025-01/28/full/1738047877-0145.JPG?im=FeatureCrop,size=(826,465))

In [None]:
%%capture
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git@nightly git+https://github.com/unslothai/unsloth-zoo.git

In [None]:
!pip install unsloth_zoo



## Load the Model
Using Unsloth, you load the DeepSeek-R1 Distilled Llama-8B model, an 8-billion-parameter Llama model distilled from DeepSeek-R1, here in a pre-quantized 4-bit variant that reduces GPU memory use.
Along with the model, you also load its tokeniser. The tokeniser breaks down input text into smaller units (tokens) that the model can process.

## Why It Matters
Loading the model and tokenizer is the foundation for fine-tuning since they define how text inputs are processed and predictions are generated.

In [None]:
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B-unsloth-bnb-4bit",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.6.1: Fast Llama patching. Transformers: 4.52.4.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.3.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.30. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.96G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/236 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/53.0k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

## Setup Parameter-Efficient Fine-Tuning (PEFT)
The **FastLanguageModel.get_peft_model** function modifies a pre-trained language model to use PEFT techniques, which allow fine-tuning of the model using fewer resources and parameters.

The method introduces additional tunable parameters (like LoRA matrices) to specific layers of the model while freezing most of the original model weights.
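As a rough illustration of what "additional tunable parameters" means, here is a minimal NumPy sketch of a single LoRA-adapted linear layer. It is not Unsloth's or PEFT's implementation: the layer sizes and the helper name `lora_forward` are made up for the example. The frozen weight `W` is left untouched, and only the two small matrices `A` and `B` would be trained.

```python
import numpy as np

rng = np.random.default_rng(42)
d_in, d_out, r, alpha = 64, 64, 4, 16   # toy sizes, chosen only for illustration

W = rng.normal(size=(d_out, d_in))          # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))  # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init: adapter starts as a no-op


def lora_forward(x, W, A, B, alpha, r):
    """Base projection plus the scaled low-rank update (alpha / r) * B @ A."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T


x = rng.normal(size=(2, d_in))

# With B initialised to zero, the adapted layer matches the frozen layer exactly.
print(np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W.T))

# The adapter adds far fewer parameters than the layer it wraps.
print("frozen:", W.size, "trainable:", A.size + B.size)
```

Because the update is just `B @ A` scaled by `alpha / r`, it can later be merged into `W` as a single matrix, which is what happens when the model is exported.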

***In this project, we use LoRA to fine-tune the DeepSeek R1 LLM.***

First, it takes the existing model. Then, it applies a technique called PEFT (Parameter-Efficient Fine-Tuning).  Think of it as adding small, adjustable "knobs" instead of changing the whole engine.

r=4 and lora_alpha=16 set the rank of the adapter matrices (how many "knobs" there are) and the scaling applied to their output (how sensitive they are).  target_modules specifies where these knobs are attached: the attention projections that produce queries, keys, values, and outputs.  lora_dropout=0 means no dropout is applied to the adapters, so no "knobs" are randomly turned off during training.

bias="none" means no extra adjustments to the model's biases. use_gradient_checkpointing="unsloth" is a memory-saving trick for training. random_state=42 ensures we get the same results if we run this again.  use_rslora=False and loftq_config=None are more advanced settings that are turned off here.
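To make the effect of r=4 concrete, the sketch below counts the LoRA parameters added to the four attention projections. It assumes the standard Llama-3.1-8B shapes (hidden size 4096, 32 layers, 8 key/value heads of dimension 128); these numbers are not stated in the notebook and are an assumption of the example. Each adapted projection with input size `d_in` and output size `d_out` gains `r * (d_in + d_out)` trainable parameters.

```python
# Assumed Llama-3.1-8B dimensions (not taken from the notebook itself).
hidden_size = 4096
num_layers = 32
num_kv_heads = 8
head_dim = 128
kv_dim = num_kv_heads * head_dim  # grouped-query attention: K and V are narrower than Q

r = 4  # LoRA rank used in this notebook

# (d_in, d_out) for each target module.
projection_shapes = {
    "q_proj": (hidden_size, hidden_size),
    "k_proj": (hidden_size, kv_dim),
    "v_proj": (hidden_size, kv_dim),
    "o_proj": (hidden_size, hidden_size),
}

# A is (r x d_in) and B is (d_out x r), so each module adds r * (d_in + d_out) parameters.
per_layer = sum(r * (d_in + d_out) for d_in, d_out in projection_shapes.values())
total_trainable = per_layer * num_layers

print(f"{per_layer:,} per layer, {total_trainable:,} in total")
```

Under these assumptions the total is 3,407,872, which agrees with the "Trainable parameters" figure Unsloth prints when training starts.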

In [29]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 4,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 42,
    use_rslora = False,
    loftq_config = None,
)

Unsloth: Already have LoRA adapters! We shall skip this step.


## Load Dataset

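The vicgalle/alpaca-gpt4 dataset is a collection of about 52,000 instruction-following examples (52,002 rows in the train split) for fine-tuning large language models (LLMs). The responses were generated with GPT-4 from the Alpaca prompts. It is published on Hugging Face under the user name vicgalle.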

In [30]:
from datasets import load_dataset
dataset = load_dataset("vicgalle/alpaca-gpt4", split = "train")
print(dataset.column_names)

['instruction', 'input', 'output', 'text']


In [31]:
dataset[0]

{'instruction': 'Give three tips for staying healthy.',
 'input': '',
 'output': '1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.\n\n2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.\n\n3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.',
 'text': 'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nGive three tips for 

Let's format the dataset in a way suitable for conversational AI training using the ShareGPT format, which is designed for multi-turn conversations.

In [32]:
from unsloth import to_sharegpt

dataset = to_sharegpt(
    dataset,
    merged_prompt = "{instruction}[[\nYour input is:\n{input}]]",
    output_column_name = "output",
    conversation_extension = 3, # Select more to handle longer conversations
)
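The `merged_prompt` string above uses `{column}` placeholders, and the `[[...]]` brackets mark an optional section that is dropped when its column is empty. The pure-Python sketch below imitates that behaviour to show what the user turn ends up looking like. `render_merged_prompt` is a hypothetical helper written for this illustration; it is not how Unsloth implements `to_sharegpt`.

```python
import re

FIELD = re.compile(r"\{(\w+)\}")
# Matches either an optional [[...]] section or a bare {field}.
TOKEN = re.compile(r"\[\[(.*?)\]\]|\{(\w+)\}", re.DOTALL)


def render_merged_prompt(template, row):
    """Fill {column} fields; drop any [[...]] section whose columns are empty."""
    def replace(match):
        section, field = match.group(1), match.group(2)
        if section is not None:
            needed = FIELD.findall(section)
            if not all(row.get(name) for name in needed):
                return ""  # optional section with missing data is removed entirely
            return FIELD.sub(lambda f: str(row[f.group(1)]), section)
        return str(row.get(field, ""))

    return TOKEN.sub(replace, template)


template = "{instruction}[[\nYour input is:\n{input}]]"

no_input = {"instruction": "Give three tips for staying healthy.", "input": ""}
with_input = {"instruction": "Summarise the text.", "input": "Cats sleep a lot."}

print(repr(render_merged_prompt(template, no_input)))
print(repr(render_merged_prompt(template, with_input)))
```

For the first row of the dataset, whose `input` is empty, this yields just the instruction text, matching the user message shown in the converted conversation.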

In [33]:
from unsloth import standardize_sharegpt
dataset = standardize_sharegpt(dataset)

In [34]:
dataset[0]['conversations']

[{'content': 'Give three tips for staying healthy.', 'role': 'user'},
 {'content': '1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.\n\n2. Engage in regular physical activity: Exercise is crucial for maintaining strong bones, muscles, and cardiovascular health. Aim for at least 150 minutes of moderate aerobic exercise or 75 minutes of vigorous exercise each week.\n\n3. Get enough sleep: Getting enough quality sleep is crucial for physical and mental well-being. It helps to regulate mood, improve cognitive function, and supports healthy growth and immune function. Aim for 7-9 hours of sleep each night.',
  'role': 'assistant'},
 {'content': 'Describe what a monotheistic religion is.', 'role': 'user'},
 {'content': 'A monotheistic religion is a type of relig

## Customizable Chat Templates

This code takes a dataset, formats it into a chat-friendly format, and then applies a template so that the AI can understand the instructions and responses correctly.

In [35]:
chat_template = """Below are some instructions that describe some tasks. Write responses that appropriately complete each request.

### Instruction:
{INPUT}

### Response:
{OUTPUT}"""

from unsloth import apply_chat_template
dataset = apply_chat_template(
    dataset,
    tokenizer = tokenizer,
    chat_template = chat_template,
    # default_system_message = "You are a helpful assistant", << [OPTIONAL]
)

Unsloth: We automatically added an EOS token to stop endless generations.
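To see what a single training example looks like after templating, the following sketch fills the `{INPUT}` and `{OUTPUT}` slots by hand and appends an end-of-sequence marker. It only handles one turn, uses a placeholder string instead of the tokenizer's real EOS token, and `render_example` is a hypothetical helper; Unsloth's `apply_chat_template` does more than this (for example, it handles multi-turn conversations).

```python
chat_template = """Below are some instructions that describe some tasks. Write responses that appropriately complete each request.

### Instruction:
{INPUT}

### Response:
{OUTPUT}"""

EOS_TOKEN = "<EOS>"  # placeholder; the real value comes from tokenizer.eos_token


def render_example(user_text, assistant_text, eos_token=EOS_TOKEN):
    """Fill one instruction/response pair into the template and append EOS."""
    # Split on the markers so that text inside the data is never re-interpreted.
    head, rest = chat_template.split("{INPUT}")
    middle, tail = rest.split("{OUTPUT}")
    return head + user_text + middle + assistant_text + tail + eos_token


example = render_example(
    "Give three tips for staying healthy.",
    "1. Eat a balanced diet. 2. Exercise regularly. 3. Get enough sleep.",
)
print(example)
```

The trailing EOS marker is what teaches the model to stop after its response instead of continuing with another "### Instruction:" block.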


## Train the model

Let's define and configure a fine-tuning trainer using the **SFTTrainer** class from the **trl** library and **TrainingArguments** from transformers.

SFTTrainer is built specifically for Supervised Fine-Tuning.  This means the model will learn from the dataset of instructions and responses you prepared earlier.

It takes the model (now wrapped with LoRA adapters), the tokenizer (for breaking down text), and the dataset. dataset_text_field="text" tells it which column holds the formatted text.  max_seq_length=2048 limits the length of text sequences the model processes at once. dataset_num_proc=2 uses two processes to prepare the data, and packing=False disables sequence packing, so short examples are not concatenated into one sequence.

Then, it configures the training process with TrainingArguments:

- per_device_train_batch_size=2 means each training device (like a GPU) will process 2 examples at a time.
- gradient_accumulation_steps=4 combines the gradients from 4 batches to simulate a larger batch size.
- warmup_steps=5 gradually increases the learning rate at the beginning.
- max_steps=20 limits the total training steps.
- learning_rate=2e-4 sets the learning rate.
- fp16 and bf16 control the precision of calculations (bfloat16 if the GPU supports it, otherwise half-precision, for faster training).
- logging_steps=1 logs training progress every step.
- optim="adamw_8bit" specifies the optimizer.
- weight_decay=0.01 is a regularization technique.
- lr_scheduler_type="linear" sets how the learning rate changes over time.
- seed=3407 ensures reproducibility.
- output_dir="outputs" specifies where to save the trained model.
- report_to="none" disables reporting to services like WandB (Weights & Biases).
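The claim that gradient accumulation "simulates a larger batch size" can be checked on a toy problem. The NumPy sketch below uses a small linear-regression loss (an assumption made purely for illustration, unrelated to the language model) and shows that averaging the gradients of 4 micro-batches of 2 examples gives the same gradient as one batch of 8.

```python
import numpy as np

rng = np.random.default_rng(3407)
X = rng.normal(size=(8, 3))   # 8 toy examples with 3 features
y = rng.normal(size=8)
w = rng.normal(size=3)        # current weights


def mse_grad(w, X, y):
    """Gradient of the mean squared error with respect to w."""
    residual = X @ w - y
    return 2 * X.T @ residual / len(y)


# One large batch of 8 examples.
full_batch_grad = mse_grad(w, X, y)

# Four micro-batches of 2 examples, accumulated before a single optimizer step.
micro_batch_size, accumulation_steps = 2, 4
accumulated = np.zeros_like(w)
for step in range(accumulation_steps):
    rows = slice(step * micro_batch_size, (step + 1) * micro_batch_size)
    accumulated += mse_grad(w, X[rows], y[rows]) / accumulation_steps

print(np.allclose(accumulated, full_batch_grad))
```

The memory cost stays that of a 2-example batch, while the optimizer sees the gradient of an 8-example batch.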


In [36]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 20,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

The trainer orchestrates the fine-tuning process by combining the model, tokenizer, dataset, and hyperparameters. It ensures efficient training, taking advantage of mixed precision (fp16 or bf16) and memory-efficient optimizations (e.g., AdamW with 8-bit precision). Additionally, it handles sequence preprocessing, gradient accumulation, and learning rate scheduling.
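The effect of warmup_steps=5, max_steps=20 and lr_scheduler_type="linear" can be sketched in a few lines. The function below follows the usual linear warmup then linear decay rule; `linear_schedule_lr` is a hypothetical helper for illustration, and the exact step indexing inside the trainer may differ by one.

```python
def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=5, max_steps=20):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max_steps - step
    return base_lr * max(0.0, remaining / (max_steps - warmup_steps))


for step in (0, 1, 5, 10, 20):
    print(step, linear_schedule_lr(step))

# Effective batch size and how much of the dataset 20 steps actually cover.
effective_batch = 2 * 4 * 1            # per-device batch x accumulation steps x GPUs
examples_seen = effective_batch * 20   # max_steps = 20
print(effective_batch, examples_seen, f"{examples_seen / 52_002:.2%} of the dataset")
```

With these settings the run touches only 160 of the 52,002 examples, so it is a quick demonstration rather than a full fine-tune.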

In [37]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 52,002 | Num Epochs = 1 | Total steps = 20
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 3,407,872/4,632,088,576 (0.07% trained)


| Step | Training Loss |
|------|---------------|
| 1 | 0.0038 |
| 2 | 0.0038 |
| 3 | 0.0033 |
| 4 | 0.0043 |
| 5 | 0.0048 |
| 6 | 0.0023 |
| 7 | 0.0027 |
| 8 | 0.0038 |
| 9 | 0.0027 |
| 10 | 0.0038 |


## Ollama

Ollama is a tool that lets you run large language models (LLMs) directly on your own computer (laptop or desktop).


In [38]:
!curl -fsSL https://ollama.com/install.sh | sh

>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.


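Let's save the model in GGUF format, which Ollama can load. Unsloth merges the LoRA weights into the base model, converts the result with llama.cpp, and writes an Ollama Modelfile alongside it.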

In [39]:
model.save_pretrained_gguf("model", tokenizer)

Unsloth: ##### The current model auto adds a BOS token.
Unsloth: ##### Your chat template has a BOS token. We shall remove it temporarily.


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 4.08 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 32/32 [03:35<00:00,  6.73s/it]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving model/pytorch_model-00001-of-00004.bin...
Unsloth: Saving model/pytorch_model-00002-of-00004.bin...
Unsloth: Saving model/pytorch_model-00003-of-00004.bin...
Unsloth: Saving model/pytorch_model-00004-of-00004.bin...
Done.
==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits might take 3 minutes.
\        /    [2] Converting GGUF 16bits to ['q8_0'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: [1] Converting model at model into q8_0 GGUF format.
The output location will be /content/model/unsloth.Q8_0.gguf
This might take 3 minutes...
INFO:hf-to-gguf:Loading model: model
INFO:hf-to-gguf:Model architecture: LlamaForCausalLM
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-g

Unsloth: ##### The current model auto adds a BOS token.
Unsloth: ##### We removed it in GGUF's chat template for you.


Unsloth: Conversion completed! Output location: /content/model/unsloth.Q8_0.gguf
Unsloth: Saved Ollama Modelfile to model/Modelfile


## Start the Ollama Server

In [41]:
import subprocess
subprocess.Popen(["ollama", "serve"])
import time
time.sleep(3)
print(tokenizer._ollama_modelfile)

FROM {__FILE_LOCATION__}

TEMPLATE """Below are some instructions that describe some tasks. Write responses that appropriately complete each request.{{ if .Prompt }}

### Instruction:
{{ .Prompt }}{{ end }}

### Response:
{{ .Response }}<｜end▁of▁sentence｜>"""

PARAMETER stop "<|python_tag|>"
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|eom_id|>"
PARAMETER stop "<think>"
PARAMETER stop "<｜▁pad▁｜>"
PARAMETER stop "</think>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|finetune_right_pad_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<｜Assistant｜>"
PARAMETER stop "<｜User｜>"
PARAMETER stop "<｜end▁of▁sentence｜>"
PARAMETER stop "<|reserved_special_token_"
PARAMETER temperature 1.5
PARAMETER min_p 0.1


The command below registers the fine-tuned model with Ollama under the name deepseek_finetuned_model. The Ollama server must already be running for it to succeed. Once registered:

- The model can be run locally using Ollama.
- It becomes accessible for further tasks, such as querying, evaluating, or deploying in specific applications.
- Ollama ensures the model is formatted and stored correctly for efficient usage.

In [53]:
!ollama create deepseek_finetuned_model -f ./model/Modelfile

Error: ollama server not responding - could not connect to ollama server, run 'ollama serve' to start it
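The error above typically means the server process had not finished starting (or had stopped) when the command ran. Rather than sleeping for a fixed time, one option is to poll the server until it answers. The sketch below assumes the default address 127.0.0.1:11434 reported by the installer; `ollama_is_up` and `wait_until` are hypothetical helpers written for this example and are not part of the Ollama client.

```python
import time
import urllib.error
import urllib.request


def ollama_is_up(base_url="http://127.0.0.1:11434", timeout=2.0):
    """Return True if an HTTP GET to the server's root URL succeeds."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timed out, or otherwise unreachable


def wait_until(probe, timeout=60.0, interval=1.0):
    """Call probe() repeatedly until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False
```

A typical use would be to start `ollama serve` with `subprocess.Popen`, call `wait_until(ollama_is_up)`, and only run `ollama create` once it returns True.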


In [54]:
!pip install ollama



In [60]:
import subprocess
import time
import ollama # Import ollama here as well

# Start the ollama server as a background process
process = subprocess.Popen(["ollama", "serve"])

# Wait for a longer duration to give the server time to start
# This is a heuristic and might need adjustment based on the environment
print("Waiting for Ollama server to start...")
time.sleep(30) # Increased sleep time significantly

# Optional: Add a check to see if the process is still running
if process.poll() is not None:
    print("Ollama server process exited prematurely.")
else:
    print("Ollama server process is running.")

# Now attempt to use the ollama client
try:
    # You can optionally add a loop here to retry the connection a few times
    # before giving up, which would be more robust.
    response = ollama.chat(model="deepseek_finetuned_model",
                messages=[{ "role": "user", "content": "Continue the Fibonacci sequence: 1, 1, 2, 3, 5, 8,"
                },
                          ])

    print(response.message.content)
except ConnectionError as e:  # built-in exception; the ollama package does not export its own ConnectionError
    print(f"Failed to connect to Ollama after waiting: {e}")
    print("Please ensure Ollama is properly installed and the server is running.")

Waiting for Ollama server to start...
Ollama server process is running.
Alright, so I need to continue the Fibonacci sequence starting from 1, 1, 2, 3, 5, 8. Hmm, let me remember how the Fibonacci sequence works. It's where each number is the sum of the two preceding ones, right? So after 8, what comes next?

Let me break it down. The last two numbers before 8 are 5 and 8. Adding those together gives 13. Okay, so that should be the next number. Then, the next number after 13 would be 8 + 13, which is 21. After that, it would be 13 + 21, which equals 34. Following that, I add 21 and 34 to get 55. And then, adding 34 and 55 gives me 89. So putting all those together, the sequence continues as: 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89.

Wait a minute, let me double-check my additions to make sure I didn't make any mistakes. Let's go step by step:

- Starting with 1 and 1.
- Next is 2 (1 + 1).
- Then 3 (1 + 2).
- After that, 5 (2 + 3).
- Then 8 (3 + 5).
- Adding those last two, 5 and 8 gives 1

1, 1, 2, 3, 5, 8, 13, 21

In [61]:
from IPython.display import Markdown
import ollama

response = ollama.chat(model="deepseek_finetuned_model",
                       messages=[{"role": "user",
                                  "content": "How to add chart to a document?"},
                      ])

Markdown(response.message.content)

To add a chart to a document, you should first open your document and click on the location where you want to insert the chart. Then select 'Insert' from the top menu bar and choose 'Chart' from the dropdown list. A window will pop up where you can customize the chart's settings like title, data, and styling before inserting it into the document.


To add a chart to a document, follow these steps:

- Insert a Table: Start by inserting a table into the document. You can do this using the 'Table' tool in most word processors.
- Insert Data: Add data into the table. Ensure that your data is properly formatted and organised before adding the chart.
- Choose a Chart Type: Select the type of chart you want to create from the available options (e.g., bar chart, pie chart, line graph, etc.).
- Edit the Chart Data: Add the necessary data points and formatting to the chart using the chart editor that appears once the chart is selected.
- Format the Table and Chart Together: Make sure the table and chart work well together by adjusting alignment, spacing, and other design elements as needed.

For more detailed instructions, you may want to consult a guide or use a tool such as Microsoft Word's chart features.