# Urgency Detector

transformers → For working with DeepSeek models.
datasets → For handling urgency_data.csv efficiently.
peft → For LoRA fine-tuning (saves memory).
accelerate → Optimizes training performance.
bitsandbytes → Enables 8-bit and 4-bit quantization to reduce RAM usage.
torch → PyTorch for model training.

In [3]:
!pip install torch torch_xla torchvision -f https://storage.googleapis.com/libtpu-releases/index.html

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch_xla
import torch_xla.core.xla_model as xm

# Load a small distilled DeepSeek model (1.5B parameters)
MODEL_NAME = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # Can be replaced with a smaller model if needed

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Load model and move it to the XLA device (a TPU in this runtime)
device = xm.xla_device()
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=6
).to(device)

model.config.use_cache = False

# Verify model is loaded
print(f"Model loaded on {device}")

Looking in links: https://storage.googleapis.com/libtpu-releases/index.html


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Some weights of Qwen2ForSequenceClassification were not initialized from the model checkpoint at deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Model loaded on xla:0


# Test Tokenization on Sample Data
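Before fine-tuning, we verify that the tokenizer works properly. The check is folded into the initial model test below, which tokenizes each sample conversation before running the model.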

# Initial Model Testing on Small Dataset
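Before fine-tuning, we check how the base DeepSeek model classifies urgency without any training. As the loading warning above notes, the classification head (`score.weight`) is newly initialized, so these predictions are essentially arbitrary and serve only as a smoke test of the pipeline.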

1. Prepare a Few Sample Inputs for Testing
We will create a small dataset with a few labeled examples:

In [4]:
sample_conversations = [
    {
        "conversation": (
            "Caller: I need you to send money immediately, I am in big trouble!\n"
            "Receiver: What happened? Why do you need money so urgently?\n"
            "Caller: I got into a legal issue, and I need bail money right now!"
        ),
        "label": "Legal/Authority Urgency"
    },
    {
        "conversation": (
            "Caller: Hey, I just saw this new phone, and it’s on a flash sale for 80% off! You have to buy it now!\n"
            "Receiver: That’s a crazy deal! How long is the sale?\n"
            "Caller: It’s only for the next 5 minutes! Hurry up!"
        ),
        "label": "Social/Peer Pressure Urgency"
    },
    {
        "conversation": (
            "Caller: The bank just called. They said there’s an issue with your account, and you need to verify your details now.\n"
            "Receiver: Wait, is this a scam? My bank never calls like this.\n"
            "Caller: No, this is serious! If you don’t confirm, your account will be frozen!"
        ),
        "label": "Financial Urgency"
    }
]

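# Keep the labels in alphabetical order so that list indices match the IDs
# that LabelEncoder assigns to the same classes later on.
labels = ["Emotional Urgency", "Financial Urgency", "Legal/Authority Urgency", "No Urgency", "Romantic Urgency", "Social/Peer Pressure Urgency"]
label_to_id = {label: i for i, label in enumerate(labels)}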


2. Tokenize & Run Model on Small Sample
Now, we tokenize these conversations and see what the model predicts:

In [5]:
import torch
import torch.nn.functional as F

tokenizer.pad_token = tokenizer.eos_token
# Convert text to tokens
for conv in sample_conversations:
    inputs = tokenizer(
        conv["conversation"],
        truncation=True,
        padding=True,
        return_tensors="pt"
    ).to(device)

    # Get model predictions
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        prediction = torch.argmax(logits, dim=-1).item()  # Get predicted class

    # Print results
    print(f"Conversation: {conv['conversation'][:100]}...")  # Print first 100 chars
    print(f"Actual Label: {conv['label']}")
    print(f"Predicted Label: {labels[prediction]}")
    print("=" * 50)


Conversation: Caller: I need you to send money immediately, I am in big trouble!
Receiver: What happened? Why do y...
Actual Label: Legal/Authority Urgency
Predicted Label: Financial Urgency
Conversation: Caller: Hey, I just saw this new phone, and it’s on a flash sale for 80% off! You have to buy it now...
Actual Label: Social/Peer Pressure Urgency
Predicted Label: Financial Urgency
Conversation: Caller: The bank just called. They said there’s an issue with your account, and you need to verify y...
Actual Label: Financial Urgency
Predicted Label: Financial Urgency
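
The printed results can be tallied in a few lines. This is a minimal sketch that copies the actual and predicted labels by hand from the output above. The variable names are illustrative and not part of the notebook.

```python
from collections import Counter

# Labels copied by hand from the printed output above.
actual = [
    "Legal/Authority Urgency",
    "Social/Peer Pressure Urgency",
    "Financial Urgency",
]
predicted = ["Financial Urgency"] * 3

# Number and fraction of samples where the prediction matches the label.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# How often each class was predicted. A single dominant class suggests
# the model is not yet discriminating between inputs.
prediction_counts = Counter(predicted)

print(f"Accuracy: {correct}/{len(actual)} = {accuracy:.2f}")
print("Predicted class counts:", dict(prediction_counts))
```

One correct answer out of three, with every sample assigned the same class, is what we would expect from a classification head that has not been trained yet.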


# Step 3: Preparing the Dataset for Fine-Tuning
Now, we’ll prepare the main dataset (urgency_data.csv) for fine-tuning by:

1. Loading and inspecting the dataset.
2. Converting text and labels into model-compatible format.
3. Splitting the dataset into train and validation sets.
4. Ensuring efficient memory usage.



# 3.1 Load & Inspect the Dataset

In [6]:
import pandas as pd

# Load dataset
df = pd.read_csv("./urgency_data.csv")

# Display some samples
print(df.head())


                                        conversation              label
0  Caller: Hello, I'm so sorry to call you with t...  Emotional Urgency
1  Caller: Hi, I'm so sorry to inform you, but yo...  Emotional Urgency
2  Caller: This is terrible news, but I'm afraid ...  Emotional Urgency
3  Caller: I'm so sorry to call you with this, bu...  Emotional Urgency
4  Caller: Hello, I'm calling about your cousin. ...  Emotional Urgency


# 3.2 Encode Labels into Numerical Format
Since the model requires numerical targets, we map each label to an integer. `LabelEncoder` assigns IDs in alphabetical order of the class names.

In [7]:
from sklearn.preprocessing import LabelEncoder

# Initialize LabelEncoder
label_encoder = LabelEncoder()
df["label_encoded"] = label_encoder.fit_transform(df["label"])

# Store label mapping for future decoding
label_mapping = dict(zip(label_encoder.classes_, label_encoder.transform(label_encoder.classes_)))

print("Label Mapping:", label_mapping)


Label Mapping: {'Emotional Urgency': 0, 'Financial Urgency': 1, 'Legal/Authority Urgency': 2, 'No Urgency': 3, 'Romantic Urgency': 4, 'Social/Peer Pressure Urgency': 5}
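
Because the IDs follow the sorted order of the class names, any list used to decode predictions (for example, one indexed by the `argmax` of the logits) has to follow that same order. The sketch below rebuilds the mapping with plain Python, without scikit-learn. The names `label2id` and `id2label` are illustrative, chosen to mirror the optional model-config fields of the same names.

```python
# Class names, deliberately listed out of order.
class_names = [
    "Social/Peer Pressure Urgency",
    "No Urgency",
    "Emotional Urgency",
    "Romantic Urgency",
    "Financial Urgency",
    "Legal/Authority Urgency",
]

# Sorting the names reproduces the ID assignment shown in the label mapping.
label2id = {name: i for i, name in enumerate(sorted(class_names))}

# Inverse mapping, used to turn a predicted class ID back into a name.
id2label = {i: name for name, i in label2id.items()}

print(label2id)
print(id2label[4])
```

If we decode model outputs through `id2label`, the predicted names stay consistent with the integers used as training targets.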


# 3.3 Train-Validation Split

In [8]:
from sklearn.model_selection import train_test_split

train_texts, val_texts, train_labels, val_labels = train_test_split(
    df["conversation"].tolist(),
    df["label_encoded"].tolist(),
    test_size=0.2,
    random_state=42
)

print(f"Training Samples: {len(train_texts)}, Validation Samples: {len(val_texts)}")


Training Samples: 450, Validation Samples: 113
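
The split above is purely random, so nothing guarantees that each of the six classes is represented proportionally in the 113 validation rows. `train_test_split` accepts a `stratify` argument for this purpose. The sketch below shows the idea in plain Python so it runs without scikit-learn. The function name `stratified_split_indices` and the toy labels are hypothetical, and this is not how scikit-learn implements it internally.

```python
import random
from collections import defaultdict

def stratified_split_indices(labels, val_fraction=0.2, seed=42):
    """Split indices so each class contributes about val_fraction to validation."""
    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # At least one validation example per class (a simplification).
        n_val = max(1, round(len(indices) * val_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)

# Toy labels (hypothetical): 10 examples of class 0 and 5 of class 1.
toy_labels = [0] * 10 + [1] * 5
train_idx, val_idx = stratified_split_indices(toy_labels)

print("train:", len(train_idx), "val:", len(val_idx))
print("val classes:", sorted(toy_labels[i] for i in val_idx))
```

In the notebook itself, the equivalent change would be passing `stratify=df["label_encoded"]` to `train_test_split`.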


# Step 4: Tokenization & Memory Optimization
This step ensures efficient processing by:
1. Tokenizing conversations using the pre-trained model’s tokenizer.
2. Truncating or padding sequences to a fixed length.
3. Using PyTorch Dataset for efficient batch processing.
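
To make the truncation and padding step concrete, here is a minimal sketch of what happens to a single sequence of token IDs. It is a simplified stand-in, not the tokenizer's actual implementation: the helper `pad_or_truncate`, the padding ID `0`, and the token IDs are all hypothetical. Note also that `padding=True` pads to the longest sequence in the call rather than to a fixed length. The sketch uses a fixed length for simplicity.

```python
def pad_or_truncate(token_ids, max_length, pad_id, padding_side="left"):
    """Return (input_ids, attention_mask), each with exactly max_length entries."""
    ids = list(token_ids)[:max_length]  # Truncate from the right
    n_pad = max_length - len(ids)
    if padding_side == "left":
        input_ids = [pad_id] * n_pad + ids
        attention_mask = [0] * n_pad + [1] * len(ids)
    else:
        input_ids = ids + [pad_id] * n_pad
        attention_mask = [1] * len(ids) + [0] * n_pad
    return input_ids, attention_mask

PAD_ID = 0  # Hypothetical padding ID, for illustration only

# A short sequence gets padded; a long one gets truncated.
short_ids, short_mask = pad_or_truncate([11, 12, 13], max_length=5, pad_id=PAD_ID)
long_ids, long_mask = pad_or_truncate([11, 12, 13, 14, 15, 16, 17], max_length=5, pad_id=PAD_ID)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

With left padding, the padding IDs come first, which is one reason the first IDs of a short sequence can all be the same value. The attention mask tells the model which positions to ignore.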

In [9]:
# Reuse the tokenizer loaded earlier

# Define a padding token if it's missing
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Use EOS token as padding

# Always tell the model which ID is padding. The sequence-classification head
# needs it to handle batches, even when the tokenizer already had a pad token.
model.config.pad_token_id = tokenizer.pad_token_id

# Tokenize training and validation data
train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=512)
val_encodings = tokenizer(val_texts, truncation=True, padding=True, max_length=512)

# First 20 token IDs (all padding IDs when a short sequence is padded on the left)
print("Sample Tokenized Input:", train_encodings["input_ids"][0][:20])

Sample Tokenized Input: [151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643, 151643]


# 4.2 Create PyTorch Dataset for Efficient Batching

Since the Hugging Face `Trainer` consumes PyTorch datasets, we wrap the tokenized data in a `torch.utils.data.Dataset` whose items include a `labels` field.

In [10]:
import torch

class UrgencyDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# Convert tokenized data into PyTorch Datasets
train_dataset = UrgencyDataset(train_encodings, train_labels)
val_dataset = UrgencyDataset(val_encodings, val_labels)

print("Dataset Created. Training Samples:", len(train_dataset), "| Validation Samples:", len(val_dataset))


Dataset Created. Training Samples: 450 | Validation Samples: 113


# 4.3 Set Up DataLoaders for Optimized Training

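We use DataLoaders to handle batching and shuffling. Note that the `Trainer` used in Step 6 builds its own DataLoaders from the datasets, so these loaders are only needed for a custom training loop.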

In [11]:
from torch.utils.data import DataLoader

# Set batch size (adjust based on RAM availability)
BATCH_SIZE = 4

# Create DataLoaders
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, pin_memory=True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, pin_memory=True)

print("Dataloaders Ready! Batch Size:", BATCH_SIZE)


Dataloaders Ready! Batch Size: 4


# Step 5: System Resource Check Before Fine-Tuning

We’ll check RAM, disk, CPU, and accelerator (TPU or GPU) status to avoid crashes during training.

In [None]:
import tensorflow as tf
import torch
import psutil
import os
import gc

# Check RAM usage
ram = psutil.virtual_memory()
print("==== Memory Usage ====")
print(f"Total RAM: {ram.total / 1e9:.2f} GB")
print(f"Available RAM: {ram.available / 1e9:.2f} GB")
print(f"Used RAM: {ram.used / 1e9:.2f} GB ({ram.percent}%)\n")

# Check disk usage
disk = psutil.disk_usage('/')
print("==== Disk Usage ====")
print(f"Total Disk Space: {disk.total / 1e9:.2f} GB")
print(f"Available Disk Space: {disk.free / 1e9:.2f} GB")
print(f"Used Disk Space: {disk.used / 1e9:.2f} GB ({disk.percent}%)\n")

# Check CPU usage
print("==== CPU Usage ====")
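cpu_usage = psutil.cpu_percent(interval=1, percpu=True)  # Sample each core over 1 second
overall_cpu = sum(cpu_usage) / len(cpu_usage)            # Average of the per-core readings
print(f"Overall CPU Usage: {overall_cpu:.1f}%")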
print("Per-core CPU Usage:")
for i, usage in enumerate(cpu_usage):
    print(f"  Core {i}: {usage}%")

# Check CPU temperature (if available)
try:
    temps = psutil.sensors_temperatures()
    if "coretemp" in temps:
        print("\nCPU Temperatures:")
        for i, temp in enumerate(temps["coretemp"]):
            print(f"  Core {i}: {temp.current}°C")
except AttributeError:
    print("\nCPU Temperature monitoring not supported on this system.")

# Check TPU availability (use a separate loop variable so the global
# `device` holding the XLA device is not overwritten)
tpu_available = False
for tf_device in tf.config.list_logical_devices():
    if tf_device.device_type == 'TPU':
        tpu_available = True
        break

print("\n==== Accelerator Check ====")
if tpu_available:
    print("TPU is available")
    try:
        import torch_xla.core.xla_model as xm
        print(f"TPU Device: {xm.xla_device()}")
        print(f"Total TPU Cores: {xm.xrt_world_size()}")
    except ImportError:
        print("TPU support is unavailable.")

# Check if GPU is available
elif torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    gpu_props = torch.cuda.get_device_properties(0)
    print(f"Total GPU Memory: {gpu_props.total_memory / 1e9:.2f} GB")
    print(f"Current GPU Memory Usage: {torch.cuda.memory_allocated(0) / 1e9:.2f} GB")
    print(f"Reserved GPU Memory (held by PyTorch's caching allocator): {torch.cuda.memory_reserved(0) / 1e9:.2f} GB")
else:
    print("No GPU found. Running on CPU!")

# Get process memory usage
pid = os.getpid()
proc = psutil.Process(pid)
print("\n==== Process Memory Usage ====")
print(f"Python Process Memory Usage: {proc.memory_info().rss / 1e9:.2f} GB")

# Run garbage collection
gc.collect()
torch.cuda.empty_cache()
print("\nCache cleared to free memory!")


==== Memory Usage ====
Total RAM: 359.23 GB
Available RAM: 350.29 GB
Used RAM: 6.52 GB (2.5%)

==== Disk Usage ====
Total Disk Space: 241.95 GB
Available Disk Space: 219.82 GB
Used Disk Space: 22.11 GB (9.1%)

==== CPU Usage ====


# Step 6: Fine-Tuning DeepSeek Model
# 6.1 Set Up Training Arguments
We'll define our training parameters, including batch size, learning rate, and epochs.

In [30]:
from transformers import TrainingArguments

model.gradient_checkpointing_enable()  # Saves memory at the cost of extra compute

training_args = TrainingArguments(
    output_dir="./deepseek-finetuned",   # Save checkpoints here
    eval_strategy="epoch",               # Evaluate after each epoch (named `evaluation_strategy` in older transformers releases)
    save_strategy="epoch",               # Save a checkpoint after each epoch
    learning_rate=5e-5,                  # Common starting point for fine-tuning
    per_device_train_batch_size=4,       # Reduce if memory issue
    per_device_eval_batch_size=4,
    num_train_epochs=3,                  # We can adjust this later
    weight_decay=0.01,                   # Regularization
    logging_dir="./logs",                # Log directory
    logging_steps=10,                    # Log every 10 optimizer steps
    bf16=True,                           # bfloat16 mixed precision
    push_to_hub=False,                   # Disable model upload
    gradient_accumulation_steps=16,      # Effective batch size: 4 x 16 = 64 per device
    optim="adamw_torch",
)
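
It helps to work out what these settings imply for a dataset of 450 training samples. The arithmetic below is a rough sketch that assumes a single training process (`num_devices = 1` is an assumption, since a multi-core TPU run would multiply the per-device batch size). The exact number of optimizer updates per epoch depends on how the library treats the final partial accumulation window, so both bounds are shown.

```python
import math

num_train_samples = 450      # From the split output
per_device_batch_size = 4    # per_device_train_batch_size
grad_accum_steps = 16        # gradient_accumulation_steps
num_epochs = 3               # num_train_epochs
num_devices = 1              # Assumption: a single training process

# Samples contributing to each optimizer update.
effective_batch = per_device_batch_size * grad_accum_steps * num_devices

# Forward/backward passes per epoch.
batches_per_epoch = math.ceil(num_train_samples / (per_device_batch_size * num_devices))

# Lower and upper bound on optimizer updates per epoch, depending on
# whether the trailing partial accumulation window counts as an update.
updates_low = max(batches_per_epoch // grad_accum_steps, 1)
updates_high = math.ceil(batches_per_epoch / grad_accum_steps)

print("Effective batch size:", effective_batch)
print("Batches per epoch:", batches_per_epoch)
print("Total optimizer updates:", updates_low * num_epochs, "to", updates_high * num_epochs)
```

With only about 21 to 24 optimizer updates in the whole run, `logging_steps=10` yields just two training-loss entries. A smaller `logging_steps` or fewer accumulation steps would give a more detailed loss curve.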




# 6.2 Initialize Trainer
We'll use Hugging Face’s `Trainer` to fine-tune our model.

In [31]:
from transformers import Trainer

print("TPU Available:", xm.xla_device())
device = xm.xla_device()
print(f"Using device: {device}")

# Check if PyTorch sees TPU
print("Torch sees TPU:", torch.device("xla"))
print("TPU device name:", xm.xla_device())

# Try a simple tensor operation
x = torch.tensor([1.0, 2.0, 3.0], device=xm.xla_device())
print("Tensor on TPU:", x)

print("PyTorch version:", torch.__version__)
print("TorchXLA version:", torch_xla.__version__)

# os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True" # Enable CUDA Memory Expansion

trainer = Trainer(
    model=model.to(device),                 # Our DeepSeek model
    args=training_args,
    train_dataset=train_dataset, # Processed training data
    eval_dataset=val_dataset,    # Processed validation data
    processing_class=tokenizer,
)

# from torch_xla.utils.checkpoint import checkpoint
# model.gradient_checkpointing_enable()
# model.checkpoint_func = lambda func, *args: checkpoint(func, *args, use_reentrant=True)


TPU Available: xla:0
Using device: xla:0
Torch sees TPU: xla
TPU device name: xla:0
Tensor on TPU: tensor([1., 2., 3.], device='xla:0')
PyTorch version: 2.5.1+cpu
TorchXLA version: 2.5.1


In [None]:
!pip install torch_xla
print(torch.__version__)
print(torch.backends.mps.is_available())  # For macOS
print(torch.cuda.is_available())  # For GPUs

print(hasattr(torch, "xla"))  # Should return True if XLA is installed

2.6.0+cu124
False
False
False
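
The two cells above report different PyTorch builds (`2.5.1+cpu` and `2.6.0+cu124`), which suggests the runtime changed between executions. Since `torch_xla` releases are generally paired with the PyTorch release of the same major.minor version, a quick comparison of the two version strings can catch a mismatch early. The helpers below are a hypothetical sketch that works on plain strings. In the notebook, the inputs would be `torch.__version__` and `torch_xla.__version__`.

```python
def major_minor(version):
    """Extract (major, minor) from strings such as '2.5.1+cpu'."""
    base = version.split("+")[0]  # Drop the local build tag
    major, minor = base.split(".")[:2]
    return int(major), int(minor)

def versions_compatible(torch_version, xla_version):
    """True when both packages share the same major.minor release."""
    return major_minor(torch_version) == major_minor(xla_version)

# Pair reported by the trainer-setup cell.
print(versions_compatible("2.5.1+cpu", "2.5.1"))

# PyTorch build reported by the later cell, checked against the same
# torch_xla release (an assumption: that cell did not print it).
print(versions_compatible("2.6.0+cu124", "2.5.1"))
```

A `False` result here would be a reason to reinstall a matching pair of packages and restart the runtime before calling `trainer.train()`.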


# 6.3 Start Fine-Tuning! 🚀
Now, we train the model.

In [32]:
trainer.train()

# After training, save the model and tokenizer so they can be reloaded later.

model.save_pretrained("./deepseek-finetuned")
tokenizer.save_pretrained("./deepseek-finetuned")

AttributeError: module 'torch' has no attribute 'xla'