<a href="https://colab.research.google.com/github/WaleeSassi/Heart-Attack-EDA/blob/main/Fine_Tuning_DeepSeek_LLM_Adapting_Open_Source_AI_for_Your%C2%A0Needs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1. Check GPU Availability
Make sure Google Colab is using a GPU.

In [1]:
import torch
torch.cuda.is_available()


True

If this returns True, you're good to go! If not, go to Runtime > Change runtime type > GPU.

# 2. Install Required Libraries
Run this command to install torch, transformers, datasets, accelerate, peft, and bitsandbytes.

In [2]:
!pip install -U torch transformers datasets accelerate peft bitsandbytes

Collecting transformers
  Downloading transformers-4.50.0-py3-none-any.whl.metadata (39 kB)
Collecting datasets
  Downloading datasets-3.4.1-py3-none-any.whl.metadata (19 kB)
Collecting peft
  Downloading peft-0.15.0-py3-none-any.whl.metadata (13 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.45.3-py3-none-manylinux_2_24_x86_64.whl.metadata (5.0 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas

# 3. Load DeepSeek LLM from Hugging Face

Load the model in 4-bit precision and attach LoRA (Low-Rank Adaptation) adapters for memory-efficient fine-tuning.

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "deepseek-ai/deepseek-llm-7b-chat"

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    # bnb_4bit_compute_dtype=torch.float16  # Use float16 for faster computation
)

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)


# Apply LoRA for memory-efficient fine-tuning
lora_config = LoraConfig(
    r=8,  # Low-rank adaptation size
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # Apply LoRA to attention layers
    lora_dropout=0.05,
    bias="none"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

print("✅ DeepSeek LLM Loaded with LoRA and 4-bit Precision!")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


trainable params: 3,932,160 || all params: 6,914,297,856 || trainable%: 0.0569
✅ DeepSeek LLM Loaded with LoRA and 4-bit Precision!
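
The trainable-parameter count printed above can be checked by hand. LoRA adds two small matrices to each wrapped layer, A (r × in_features) and B (out_features × r). The sketch below assumes model dimensions that are not stated in this notebook (30 decoder layers, hidden size 4096, square `q_proj` and `v_proj` projections); with those assumptions the arithmetic reproduces the reported figures.

```python
def lora_param_count(r, in_features, out_features):
    """Parameters LoRA adds to one linear layer: A (r x in) plus B (out x r)."""
    return r * (in_features + out_features)

# Assumed DeepSeek 7B shape (not given in the notebook itself)
num_layers = 30
hidden_size = 4096

rank = 8                               # r in LoraConfig
target_modules = ["q_proj", "v_proj"]  # target_modules in LoraConfig

per_layer = lora_param_count(rank, hidden_size, hidden_size)
trainable = num_layers * len(target_modules) * per_layer
all_params = 6_914_297_856             # "all params" printed by print_trainable_parameters()

print(f"trainable params: {trainable:,}")                  # 3,932,160
print(f"trainable%: {100 * trainable / all_params:.4f}")   # 0.0569
```

Doubling `r` or adding more projection names to `target_modules` scales the trainable count linearly, which is a quick way to estimate the cost of a different adapter configuration before loading the model.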


# 4. Load and Preprocess the Dataset




In [3]:
import pandas as pd
import json

# Load dataset
df = pd.read_csv("/content/cleaned_saudisaudi(in).csv")

# Convert to structured chat format
chat_data = []
for _, row in df.iterrows():
    chat_data.append({
        "messages": [
            {"role": "system", "content": "أنت مساعد ذكاء اصطناعي مفيد."},  # Arabic system message
            {"role": "user", "content": row["input"]},  # Ensure input supports Arabic
            {"role": "assistant", "content": row["output"]}
        ]
    })

# Save properly formatted JSONL
with open("train_data_chat.jsonl", "w", encoding="utf-8") as f:
    for entry in chat_data:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")

# Load dataset correctly
from datasets import load_dataset

dataset = load_dataset("json", data_files="train_data_chat.jsonl", split="train")

# Split into train and test
dataset = dataset.train_test_split(test_size=0.3)
print(dataset)


Generating train split: 0 examples [00:00, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['messages'],
        num_rows: 3810
    })
    test: Dataset({
        features: ['messages'],
        num_rows: 1633
    })
})
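
The conversion loop above assumes every CSV row has both an `input` and an `output` value. pandas reads empty cells as NaN, and `json.dumps` would write those as a bare `NaN`, which is not valid JSON. A minimal sketch of a guard that skips incomplete rows follows; the helper names and the two sample rows are hypothetical and only illustrate the idea.

```python
import json
import math

SYSTEM_PROMPT = "أنت مساعد ذكاء اصطناعي مفيد."  # same system message as in the cell above

def is_blank(value):
    """True for None, NaN (how pandas marks empty cells) or whitespace-only text."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return not str(value).strip()

def to_chat_record(user_text, assistant_text, system_text=SYSTEM_PROMPT):
    """Return one chat-format record, or None when either side is missing."""
    if is_blank(user_text) or is_blank(assistant_text):
        return None
    return {
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": str(user_text).strip()},
            {"role": "assistant", "content": str(assistant_text).strip()},
        ]
    }

# Hypothetical rows: the second one has no answer and is dropped
rows = [("مرحبا", "أهلا!"), ("سؤال بدون جواب", float("nan"))]
records = [rec for rec in (to_chat_record(u, a) for u, a in rows) if rec is not None]

print(len(records))  # 1
print(json.dumps(records[0], ensure_ascii=False))
```

In the notebook's loop, this would amount to calling `to_chat_record(row["input"], row["output"])` and appending only the non-`None` results.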


### Tokenize Dataset

In [4]:
print(dataset["train"][0])  # Check the first example

{'messages': [{'role': 'system', 'content': 'أنت مساعد ذكاء اصطناعي مفيد.'}, {'role': 'user', 'content': 'مرحبا، أبي أعرف عن الشمس كم حرارتها؟'}, {'role': 'assistant', 'content': 'مرحبا! الشمس حرارتها 15 مليون درجة بوسطها، سطحها 6000 درجة، شيء يحير العقل!'}]}


In [15]:
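def tokenize_function(examples):
    # Use the tokenizer's chat template to format each conversation as one string
    formatted_text = tokenizer.apply_chat_template(examples["messages"], tokenize=False)

    # Tokenize the formatted text, padding or truncating every example to 512 tokens
    tokenized_output = tokenizer(formatted_text, truncation=True, padding="max_length", max_length=512)

    # Labels mirror input_ids for causal language modeling; padded positions
    # are set to -100 so the loss ignores them
    tokenized_output["labels"] = [
        [token if mask == 1 else -100 for token, mask in zip(ids, attention)]
        for ids, attention in zip(tokenized_output["input_ids"], tokenized_output["attention_mask"])
    ]

    return tokenized_output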

# Apply tokenization to the dataset
tokenized_datasets = dataset.map(tokenize_function, batched=True)

Map:   0%|          | 0/3810 [00:00<?, ? examples/s]

Map:   0%|          | 0/1633 [00:00<?, ? examples/s]

In [16]:
print("Tokenized dataset columns:", tokenized_datasets.column_names)

Tokenized dataset columns: {'train': ['messages', 'input_ids', 'attention_mask', 'labels'], 'test': ['messages', 'input_ids', 'attention_mask', 'labels']}


# 5. Set Training Parameters

In [17]:
from transformers import TrainingArguments

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=3e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    weight_decay=0.01,
    logging_dir="./logs",
    fp16=True,
    report_to="none",  # Disable WandB and other logging integrations
)

print("✅ WandB Disabled!")


✅ WandB Disabled!
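
With a per-device batch size of 1 and 8 gradient accumulation steps, each optimizer update sees 8 examples. The sketch below works through that arithmetic for the 3810-example train split; the single-GPU assumption is mine, and how the Trainer treats the final partial accumulation cycle can vary between versions.

```python
import math

per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 1            # assumption: a single Colab GPU
num_train_examples = 3810  # train split size printed earlier

# Examples that contribute to one optimizer update
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Forward/backward passes in one epoch
micro_batches = math.ceil(num_train_examples / (per_device_train_batch_size * num_devices))

# Complete accumulation cycles in one epoch, and what is left over
full_updates, leftover = divmod(micro_batches, gradient_accumulation_steps)

print(effective_batch)  # 8
print(micro_batches)    # 3810
print(full_updates)     # 476
print(leftover)         # 2
```

Accumulation keeps peak memory at the cost of a batch of 1 while giving gradient estimates comparable to a batch of 8, which matters on a GPU with about 15 GiB.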


# Get Sample Data

To speed up training, a smaller subset of each split can be used. The cell below keeps the full splits.

In [18]:
small_train_dataset = tokenized_datasets["train"]
small_test_dataset = tokenized_datasets["test"]
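
The variable names suggest a reduced sample, but the assignments above pass the full splits through. A minimal sketch of actually taking a subset, assuming the split is a Hugging Face `Dataset` (which provides `shuffle` and `select`); the helper name and the sizes in the usage comment are hypothetical.

```python
def take_subset(split, max_examples, seed=42):
    """Shuffle a split reproducibly and keep at most max_examples rows."""
    n = min(max_examples, len(split))
    return split.shuffle(seed=seed).select(range(n))

# Possible usage in this notebook (sizes are illustrative only):
# small_train_dataset = take_subset(tokenized_datasets["train"], 500)
# small_test_dataset = take_subset(tokenized_datasets["test"], 150)
```

Fixing the seed keeps the subset stable across reruns, so loss curves from different experiments remain comparable.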

# 5. Initialize Trainer and Train

Set up the Trainer and start fine-tuning.

In [9]:
pip install bert-score

Collecting bert-score
  Downloading bert_score-0.3.13-py3-none-any.whl.metadata (15 kB)
Downloading bert_score-0.3.13-py3-none-any.whl (61 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.1/61.1 kB 4.4 MB/s eta 0:00:00
Installing collected packages: bert-score
Successfully installed bert-score-0.3.13


In [19]:
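import numpy as np
from bert_score import score as bert_score

def compute_metrics(eval_pred):
    predictions, labels = eval_pred

    # The Trainer passes raw logits of shape (batch, seq_len, vocab); reduce them to token ids
    if predictions.ndim == 3:
        predictions = predictions.argmax(axis=-1)

    # A causal LM predicts token t+1 at position t, so shift to align predictions with labels
    predictions, labels = predictions[:, :-1], labels[:, 1:]

    # Positions labelled -100 are ignored by the loss; replace them with the pad token before decoding
    pad_id = tokenizer.pad_token_id
    ignored = labels == -100
    predictions = np.where(ignored, pad_id, predictions)
    labels = np.where(ignored, pad_id, labels)

    # Decode predictions and labels into text
    pred_texts = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    label_texts = tokenizer.batch_decode(labels, skip_special_tokens=True)

    # Compute BERTScore ("ar" selects the Arabic setting)
    P, R, F1 = bert_score(pred_texts, label_texts, lang="ar")

    # Perplexity is left out here: eval_loss is not yet in trainer.state.log_history
    # when this function runs, so compute np.exp(eval_loss) from the evaluation results instead
    return {
        "bertscore_precision": P.mean().item(),
        "bertscore_recall": R.mean().item(),
        "bertscore_f1": F1.mean().item(),
    }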
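
Perplexity is the exponential of the mean cross-entropy loss. A minimal sketch of deriving it from a metrics dictionary, such as the one `trainer.evaluate()` returns with an `eval_loss` entry; the helper names and the example loss value are hypothetical.

```python
import math

def perplexity_from_loss(mean_cross_entropy):
    """Perplexity is exp of the mean per-token cross-entropy (natural log)."""
    return math.exp(mean_cross_entropy)

def add_perplexity(metrics, loss_key="eval_loss"):
    """Return a copy of the metrics dict with a perplexity entry when a loss is present."""
    out = dict(metrics)
    if loss_key in out:
        out["perplexity"] = perplexity_from_loss(out[loss_key])
    return out

example_metrics = {"eval_loss": 2.0}  # hypothetical value, not a result from this notebook
print(add_perplexity(example_metrics))  # perplexity is about 7.39
```

After training, something like `add_perplexity(trainer.evaluate())` would attach the value without relying on the order in which the Trainer writes its logs.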

In [20]:
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,  # The Trainer drops the unused raw "messages" column by default
    eval_dataset=small_test_dataset,
    compute_metrics=compute_metrics,  # Custom evaluation metrics
)

print("🚀 Trainer Initialized!")


No label_names provided for model class `PeftModel`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


🚀 Trainer Initialized!


# 6. Fine-Tune DeepSeek LLM

In [21]:
print("🚀 Starting Fine-Tuning...")
trainer.train()

🚀 Starting Fine-Tuning...


Epoch,Training Loss,Validation Loss


OutOfMemoryError: CUDA out of memory. Tried to allocate 1.56 GiB. GPU 0 has a total capacity of 14.74 GiB of which 694.12 MiB is free. Process 24816 has 14.06 GiB memory in use. Of the allocated memory 13.34 GiB is allocated by PyTorch, and 600.71 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
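
Some back-of-the-envelope arithmetic helps read this error. The sketch below estimates weight memory from the parameter count printed earlier; the vocabulary size (102400) and hidden size (4096) are assumed values for this model, not taken from the notebook.

```python
GIB = 1024 ** 3

def tensor_gib(num_elements, bytes_per_element):
    """Memory footprint in GiB for a tensor with the given element count and width."""
    return num_elements * bytes_per_element / GIB

total_params = 6_914_297_856   # "all params" reported by print_trainable_parameters()
vocab_size = 102_400           # assumption
hidden_size = 4_096            # assumption

print(f"weights at 16-bit: {tensor_gib(total_params, 2):.2f} GiB")    # 12.88 GiB
print(f"weights at 4-bit:  {tensor_gib(total_params, 0.5):.2f} GiB")  # 3.22 GiB
print(f"vocab x hidden in float32: {tensor_gib(vocab_size * hidden_size, 4):.2f} GiB")  # 1.56 GiB
```

Under these assumptions, 4-bit weights alone would need only about 3 GiB, far less than the 13.34 GiB PyTorch reports as allocated. That gap may point to earlier copies of the model still held from re-run cells, or to tensors kept at higher precision. The 1.56 GiB request also matches the size of one vocabulary-by-hidden float32 matrix, which could be a coincidence.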

# 7. Inference

In [22]:
trainer.model.save_pretrained("./Deepseek-Arabic")
tokenizer.save_pretrained("./Deepseek-Arabic")

('./Deepseek-Arabic/tokenizer_config.json',
 './Deepseek-Arabic/special_tokens_map.json',
 './Deepseek-Arabic/tokenizer.json')

In [23]:
model_name = "./Deepseek-Arabic"  # Replace this with the correct path if needed
tokenizer = AutoTokenizer.from_pretrained(model_name)

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    # bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=True,
)

# Load the saved model directory in 4-bit precision
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map={"": 0}
)


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

OutOfMemoryError: CUDA out of memory. Tried to allocate 800.00 MiB. GPU 0 has a total capacity of 14.74 GiB of which 700.12 MiB is free. Process 24816 has 14.05 GiB memory in use. Of the allocated memory 13.33 GiB is allocated by PyTorch, and 602.21 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
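
The error reports about 14 GiB still in use by the process, which suggests the model and trainer from the earlier cells are still resident when the second copy is loaded. A minimal sketch of releasing them first follows; the helper name is hypothetical, and it only frees memory if no other references to the objects remain (notebook output history can hold some).

```python
import gc

def release_gpu_memory(namespace, names=("trainer", "model")):
    """Drop named references, collect garbage, then ask PyTorch to free cached CUDA blocks.

    Returns the bytes PyTorch still has allocated, or None when CUDA is unavailable.
    """
    for name in names:
        namespace.pop(name, None)
    gc.collect()
    try:
        import torch  # imported lazily so the helper also runs where torch is absent
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    torch.cuda.empty_cache()
    return torch.cuda.memory_allocated()

# In a notebook cell, before reloading the model:
# remaining = release_gpu_memory(globals())
```

If the returned figure is still close to the GPU's capacity, restarting the runtime and running only the inference cells is the more reliable option.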

In [None]:
system_prompt = """
    يا هلا، أنا مساعدك الطبي الافتراضي. شغلتي أساعدك تحجز موعد عند الدكتور بطريقة سهلة ومريحة.
    بتكلم معك بأسلوب رايق وواضح، عشان نرتب كل شيء بسرعة.

    أول شيء، بشوف وش تحتاج بالضبط. هذي الخطوات اللي بنمشي عليها:

    1. استقبل طلبك إذا كنت تبي تحجز موعد طبي.
    2. أسألك عن نوع الدكتور اللي تبيه، يعني مثلاً دكتور أطفال، عظام، ولا جلدية؟
    3. أسألك وين تبي العيادة تكون، قريبة من بيتك ولا مستشفى كبير؟
      - مثلاً: "تبي عيادة جنبك ولا مستشفى فيه كل التخصصات؟"
      - بعدين أعطيك قايمة بالدكاترة المتوفرين حسب التخصص والمكان.
    4. أسألك أي دكتور تفضل منهم، وأقولك ليش هالدكتور ممكن يكون مناسب.
    5. أشوف معك أي يوم يناسبك للموعد.
    6. أتأكد من المواعيد المتوفرة، ولو ما فيه أعطيك خيارات ثانية، وأقولك إن كل شيء مرن.
    7. لما نحدد الموعد، أطلب منك بياناتك: الاسم كامل، تاريخ الميلاد، رقم الجوال، والإيميل.
    8. أأكد لك الحجز بطريقة حلوة، مثلاً: "يallah تم الحجز! الله يعطيك الصحة والعافية."
    """

# Create the conversation history (messages list)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "أريد زيارة طبيب أطفال"},  # User question: "I want to see a pediatrician"
]
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)
print(result)