
# Refactored: Transfer Learning — Qwen 2.5 + Hugging Face Transformers + CUDA utils

This top section upgrades the notebook to use **Qwen 2.5** models via `transformers` and integrates CUDA management utilities from `utils.py` (preferred at parent folder).  
It does **not** delete your original notebook content — it adds a modernized, ready-to-run section for transfer learning and model loading. Change `model_id` as needed for model size.


In [5]:
import torch
import gc
import os
import sys
import subprocess
import platform

# Set environment variable to help with memory fragmentation
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'

# Environment & version checks
# Import CUDA utils from parent folder (preferred), fallback to local
from pathlib import Path

print(f"Python: {sys.version}")
try:
    import torch, transformers
    print("PyTorch:", torch.__version__)
    print("Transformers:", transformers.__version__)
except Exception as e:
    print("You likely need to install torch/transformers:", e)
    
# Try parent directory first (ideal location)
parent_dir = str(Path.cwd().parent)
if parent_dir not in sys.path:
    sys.path.insert(0, parent_dir)

try:
    import utils  # expected at ../utils.py
except Exception:
    # Fallback: current working directory
    curr_dir = str(Path.cwd())
    if curr_dir not in sys.path:
        sys.path.insert(0, curr_dir)
    import utils  # tries ./utils.py

from utils import *

print("Loaded utils from:", utils.__file__)
# Set memory env & show current device
utils.setup_memory_environment(expandable_segments=True)
device = utils.get_device()
print("Selected device:", device)


Python: 3.12.11 (main, Jul 23 2025, 00:34:44) [Clang 20.1.4 ]
PyTorch: 2.8.0+cu129
Transformers: 4.56.0
Loaded utils from: /mnt/nfs/workspace/courses/PyTorch/Building-Transformer-Models-with-PyTorch-2.0/utils.py
Memory environment configured
Selected device: cuda


In [6]:

# -- Model selection and load (Qwen 2.5) --
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # change to desired Qwen 2.5 variant
gen_kwargs = {"max_new_tokens": 256, "temperature": 0.7, "top_p": 0.9, "do_sample": True}

# clear GPU before load
if hasattr(utils, "clear_gpu_memory"):
    try:
        utils.clear_gpu_memory(aggressive=False)
    except Exception as e:
        print("clear_gpu_memory failed:", e)

use_bfloat16 = False
try:
    if device == "cuda" and torch.cuda.is_available():
        use_bfloat16 = torch.cuda.is_bf16_supported()
except Exception:
    pass

dtype = torch.bfloat16 if use_bfloat16 else (torch.float16 if device == "cuda" else None)

print("Loading tokenizer and model:", model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=(dtype if dtype is not None else None),  # Changed from torch_dtype to dtype
    device_map=("auto" if device in ("cuda", "mps") else None)
)
model.eval()
print("Model loaded. Device:", model.device)
if hasattr(utils, "print_cuda_memory"):
    try:
        utils.print_cuda_memory(verbose=True)
    except Exception as e:
        print("print_cuda_memory failed:", e)


GPU memory cleared
Loading tokenizer and model: Qwen/Qwen2.5-0.5B-Instruct
Model loaded. Device: cuda:0
=== GPU Memory Usage ===
Allocated: 0.92 GB
Reserved:  1.86 GB
Free:      14.56 GB
Total:     15.48 GB


In [None]:

# -- Simple transfer learning / fine-tuning scaffold note --
# This notebook is named Transfer_Learning.ipynb, so below are recommended starting points.
# Use the existing dataset handling cells in the original notebook — these refactor cells only
# ensure modern model loading and CUDA helpers are available.
#
# Typical steps:
# 1. Prepare dataset -> map to chat prompts using tokenizer.apply_chat_template (if instruct-style)
# 2. Create tokenized dataset with labels (language modelling)
# 3. Use DataCollatorForLanguageModeling or custom collator
# 4. Use Trainer or accelerate + custom training loop; consider PEFT/LoRA for efficiency
#
# See the earlier created chatbot_MOD.ipynb for a full example of training args and LOra scaffolding.


We will build the real news vs fake news detection engine. We want to demonstrate how this pipeline can be adapted to your organization's specific needs. Instead of using a pre-built dataset, we will download a dataset from Kaggle and utilize it in our fine-tuning process. This approach will help illustrate how the pipeline can be tailored to work with custom datasets in real-world applications.
Here's an outline of the fine-tuning process
1. Import required libraries and packages

2. Load the dataset. Download the data from kaggle and save it on your drive.
3. Load pre-trained BERT tokenizer:


4. Prepare the dataset: 


  * Tokenize the text using the BERT tokenizer
  * Create attention masks
 * Split the dataset into training and validation sets
  * Create a custom PyTorch dataset class (TextClassificationDataset)
  * Instantiate the custom dataset for both training and validation sets
  * Create PyTorch DataLoader
  
4. Load a pre-trained BERT model for sequence classification using the Hugging Face Transformers library
5. Setup Accelarator environment
6. Fine-tune the model:

7. Evaluate the model:
  *Calculate  metrics, such as F1 score, recall, and precision
8. Inference:

  * Create a function to perform inference on new text input
 * Tokenize the input text and convert it to the required format
 * Perform inference using the fine-tuned model
 * Interpret the model's output and return the predicted class

# 1. Import required libraries and packages
 

In [7]:
import torch
import gc
import os

# Now import and run your code
import pandas as pd
from sklearn.model_selection import train_test_split
from accelerate import Accelerator
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler
from torch.optim import AdamW
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModelForSequenceClassification, get_scheduler

# Your model loading and training code here...

**Note:** MPS=> Apple's Metal Performance Shaders (MPS) is a framework that provides highly optimized, low-level GPU-accelerated functions for deep learning, image processing, and other compute-intensive tasks.

In [8]:
def get_device():
  device="cpu"
  if torch.cuda.is_available():
    device="cuda"
  elif  torch.backends.mps.is_available():
    device='mps'
  else:
    device="cpu"
  return device


device = get_device()
print(device) 

cuda


# 2. Load Data
1. Reading data from two CSV files: True.csv (real news) and Fake.csv (fake news)
2. Cleaning and preprocessing the data in each CSV file
3. Concatenating both dataframes into a single dataframe
4. The resulting dataframe contains two columns: 'text' for the news content and 'label' for its corresponding category (real or fake)

In [9]:
real=pd.read_csv('true.csv')
fake=pd.read_csv('fake.csv')


In [10]:
real = real.drop(['title','subject','date'], axis=1)
real['label']=1.0
fake = fake.drop(['title','subject','date'], axis=1)
fake['label']=0.0
dataframe=pd.concat([real, fake], axis=0, ignore_index=True)


In [11]:
df = dataframe.sample(frac=0.1).reset_index(drop=True)
print(df.head(20))
print(len(df[df['label']==1.0]))
print(len(df[df['label']==0.0]))

                                                 text  label
0   (Reuters) - President Donald Trump has cemente...    1.0
1   Republicans have revived an old and overused p...    0.0
2   Remember when experts came out after Hillary c...    0.0
3   If anyone wonders how the  Reverend  Al Sharpt...    0.0
4   Just two months shy of the one-year anniversar...    0.0
5   JOHANNESBURG (Reuters) - South Africa s Nation...    1.0
6   You can often figure a lot out about a person ...    0.0
7   WASHINGTON (Reuters) - U.S. House Democratic L...    1.0
8   WASHINGTON (Reuters) - U.S. House Armed Servic...    1.0
9   Besty Devos is Trump s conservative choice for...    0.0
10  WASHINGTON (Reuters) - President Donald Trump’...    1.0
11  Our country is spinning out of control. Obama ...    0.0
12  CLEVELAND (Reuters) - U.S. House Speaker Paul ...    1.0
13  CHICAGO (Reuters) - Illinois’ long-running bud...    1.0
14  It won t be long before the progressives start...    0.0
15  Donald Trump can bra

#3.  Load Tokenizer:
1. We are using the `bert-base-uncased` tokenizer. We also need to use the corresponding model

In [12]:

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# 4. Prepare Data
The data preparation process for BERT-based uncased models involves tokenizing the text, mapping tokens to `input_ids`, creating attention masks `attention_mask`, , and preparing the labels tensor `labels`. Each element of Dataset Class should be dictionary of following structure.

```
{'input_ids': torch.Tensor(),'attention_mask':torch.Tensor(), 'labels': torch.Tensor()  }
```
1. Tokenization: The text input should be tokenized into subwords using BERT's WordPiece tokenizer. This tokenizer converts the text into a format that BERT can understand.

2. `input_ids`: Each token from the tokenized text needs to be mapped to an ID using BERT's vocabulary. The resulting input IDs should be in the form of a tensor or array, usually of shape (batch_size, max_sequence_length).
3. `attention_mask`: The attention mask is used to differentiate between the actual tokens and padding tokens. It has the same shape as the input IDs tensor, i.e., (batch_size, max_sequence_length). The mask has 1s for actual tokens and 0s for padding tokens.
4. `labels`: The labels tensor contains the true class or value for each example in the dataset. It usually has a shape of (batch_size,). For classification tasks, these labels are one-hot-encoded labels

In [13]:
# this is just creating list of tuples. Each tupe has (text, label)
data=list(zip(df['text'].tolist(), df['label'].tolist()))

# This function takes list of Texts, and Labels as Parameter
# This function return input_ids, attention_mask, and labels_out
def tokenize_and_encode(texts, labels):
    input_ids, attention_masks, labels_out = [], [], []
    for text, label in zip(texts, labels):
        encoded = tokenizer.encode_plus(text, max_length=512, padding='max_length', truncation=True)
        input_ids.append(encoded['input_ids'])
        attention_masks.append(encoded['attention_mask'])
        labels_out.append(label)
    return torch.tensor(input_ids), torch.tensor(attention_masks), torch.tensor(labels_out)

# seprate the tuples
# generate two lists: a) containing texts, b) containing labels
texts, labels = zip(*data)

# train, validation split
train_texts, val_texts, train_labels, val_labels = train_test_split(texts, labels, test_size=0.2)

# tokenization
train_input_ids, train_attention_masks, train_labels = tokenize_and_encode(train_texts, train_labels)
val_input_ids, val_attention_masks, val_labels = tokenize_and_encode(val_texts, val_labels)




**It's always good to review the data**
1. input_ids
  * `0` token value means padded token
2. attention_mask
  * `1`: corresponding token is real token
  * `0`: corresponding token is padded token

In [14]:
print('train_input_ids ',train_input_ids[0].shape ,train_input_ids[0], '\n'
      'train_attention_masks ', train_attention_masks[0] ,train_attention_masks[0], '\n'
      'train_labels', train_labels[0])

train_input_ids  torch.Size([512]) tensor([101, 102,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,  

### TextClassificationDataset
1. For tunning `bert-based-uncased`: each item of Dataset must be of type dictionary with at following  keys:
  * input_ids
  * attention_mask
  * labels
2. Thus,  `__getitem__`  should return dictionary of following structure:
```
{
            'input_ids': self.input_ids[idx],
            'attention_mask': self.attention_masks[idx],
            'labels': self.one_hot_labels[idx]
        }
```
3. one_hot_encode method: A static method that takes in targets (labels) and num_classes as arguments. It converts the given targets into one-hot encoded tensors. The method first converts the targets to long tensors and then initializes a zero tensor of shape (number of samples, num_classes). The scatter_ function is used to place 1.0 in the appropriate position for each sample's label, resulting in a one-hot encoded tensor.

In [15]:
class TextClassificationDataset(torch.utils.data.Dataset):
    def __init__(self, input_ids, attention_masks, labels, num_classes=2):
        self.input_ids = input_ids
        self.attention_masks = attention_masks
        self.labels = labels
        self.num_classes = num_classes
        self.one_hot_labels = self.one_hot_encode(labels, num_classes)

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, idx):
        return {
            'input_ids': self.input_ids[idx],
            'attention_mask': self.attention_masks[idx],
            'labels': self.one_hot_labels[idx]
        }


    @staticmethod
    def one_hot_encode(targets, num_classes):
        targets = targets.long()
        one_hot_targets = torch.zeros(targets.size(0), num_classes)
        one_hot_targets.scatter_(1, targets.unsqueeze(1), 1.0)
        return one_hot_targets
        

train_dataset = TextClassificationDataset(train_input_ids, train_attention_masks, train_labels)
val_dataset = TextClassificationDataset(val_input_ids, val_attention_masks, val_labels)


### DataLoader
*italicized text*

In [16]:
train_dataloader = DataLoader(train_dataset, batch_size=2, shuffle=True)
eval_dataloader = DataLoader(val_dataset, batch_size=2)

In [17]:
print(len(train_dataset))
len((val_dataset))

3592


898

1.Revisiting dimension requirements for Transformers in Pytorch from Chapter 3: The encoder expects data with dimensions (seq_len, batch_size). However, Hugging Face's bert-based-uncased model requires data with dimensions (batch_size, seq_len). As a result, the output from the train_dataloader has dimensions of (batch_size, seq_len).

In [18]:
item=next(iter(train_dataloader))
item_ids,item_mask,item_labels=item['input_ids'],item['attention_mask'],item['labels']
print ('item_ids, ',item_ids.shape, '\n',
       'item_mask, ',item_mask.shape, '\n',
       'item_labels, ',item_labels.shape, '\n',)

item_ids,  torch.Size([2, 512]) 
 item_mask,  torch.Size([2, 512]) 
 item_labels,  torch.Size([2, 2]) 



In [19]:
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = AdamW(model.parameters(), lr=5e-5)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


# 5. Prepare Accelaerator
What is Accelerator?
 1. It provides an easy-to-use API for training deep learning models on various hardware accelerators, such as GPUs, TPUs, and Apple's Metal Performance Shaders (MPS).
  * In our example, during training, we donot specifically select 'mps' device. THe accelerator automatically detects it and use 'mps' for training
 2. The Accelerator library is particularly useful for distributed training and mixed-precision training.

In [20]:
import torch
import gc
from utils import *

clear_gpu_memory()

# Now try to prepare the model
accelerator = Accelerator()
model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader
)

GPU memory cleared


# 5. Fine Tune The Model
1. `lr_scheduler` in the provided code is an instance of a learning rate scheduler, which is responsible for adjusting the learning rate during the training process. The learning rate scheduler helps improve the training process by dynamically adjusting the learning rate based on the number of training steps. In this code, the learning rate starts with the initial value set in the optimizer and decreases linearly to 0 as the training progresses.
2. Some benefit of lr_scheduler over optimizer alone are
  * Faster convergence 
  * Avoid Overshooting: When using a fixed learning rate, the optimizer might overshoot the optimal solution, especially in the later stages of training. By decreasing the learning rate over time, the model can make smaller updates and fine-tune its weights
  
3. `progress_bar` is just utility to show the progress of training
4. These are standard approach for fine tunning:
```
 }
        outputs = model(**batch)
        loss = outputs.loss
        accelerator.backward(loss)
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)
```
  * each batch should be dictionary of structure {input_ids:torch.Tensor(), attention_mask: torch.Tensor(), labels: torch.Tensor()
  * the dimension of input_ids=(batch_size, seq_len); attention_mask= (batch_size, seq_len); and labels=(batch_size,)
  * You can notice that during training, we are not explicitly converting `tensor` into device; accelerator is automatically identifying the `device` and converting `tensor` into the appropriate format
1. After each epoch, we are also printing the evaluation metrics over the evaluation dataset

In [21]:
from utils import *
from sklearn.metrics import accuracy_score, f1_score, recall_score, precision_score
from tqdm import tqdm
# Direct import from parent folder
import importlib.util
import sys
from pathlib import Path

# import utils directly
from utils import *
from accelerate import Accelerator

# Setup memory environment
setup_memory_environment()
clear_gpu_memory()

# Get device
device = get_device()
print(f"Using device: {device}")

# Print memory info
print_cuda_memory(verbose=True)

# Calculate optimal batch size
sample_batch = next(iter(train_dataloader))
optimal_batch_size = calculate_optimal_batch_size_simple(model, sample_batch)
#optimal_batch_size = get_optimal_batch_size()

# Recreate dataloaders with optimal batch size
train_dataloader = DataLoader(train_dataset, batch_size=optimal_batch_size, shuffle=True)
eval_dataloader = DataLoader(val_dataset, batch_size=optimal_batch_size)

print(f"Using batch size: {optimal_batch_size}")

# Set accumulation steps
target_effective_batch_size = 8
accumulation_steps = max(1, target_effective_batch_size // optimal_batch_size)
print(f"Using gradient accumulation steps: {accumulation_steps}")

num_epochs = 3
accumulation_steps = 4  # Effective batch size = 2 * 4 = 8
num_training_steps = num_epochs * len(train_dataloader) // accumulation_steps
lr_scheduler = get_scheduler(
    "linear",
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps
)

accelerator = Accelerator()

# Let accelerator prepare everything
model, optimizer, train_dataloader, eval_dataloader = accelerator.prepare(
    model, optimizer, train_dataloader, eval_dataloader
)

# Training loop with memory monitoring
for epoch in range(num_epochs):
    print(f"\n=== Epoch {epoch+1}/{num_epochs} ===")
    print_cuda_memory()
    
    # Training phase...
    # Training phase
    model.train()
    optimizer.zero_grad()

    bar_length = get_optimal_bar_length()
    
    train_progress_bar = tqdm(
        train_dataloader,
        desc=f"Epoch {epoch+1}/{num_epochs} - Training",
        unit="batch",
        ncols=bar_length,
        bar_format='{l_bar}{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}]'
    )
    
   # train_progress_bar = tqdm(train_dataloader, desc=f"Epoch {epoch+1}/{num_epochs} ")
    total_train_loss = 0
    
    for i, batch in enumerate(train_progress_bar):
        outputs = model(**batch)
        loss = outputs.loss / accumulation_steps  # Scale loss
        accelerator.backward(loss)
        total_train_loss += loss.item() * accumulation_steps
        
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            lr_scheduler.step()
            optimizer.zero_grad()
            
            # Update progress bar with current loss
            avg_loss = total_train_loss / (i + 1)
            train_progress_bar.set_postfix({
                'loss': f'{avg_loss:.4f}',
                'lr': f'{lr_scheduler.get_last_lr()[0]:.2e}'
            })
    
    # Evaluation phase
    model.eval()
    preds = []
    out_label_ids = []

    bar_length = get_optimal_bar_length()
    
    eval_progress_bar = tqdm(
        eval_dataloader, 
        desc=f"Epoch {epoch+1}/{num_epochs} - Evaluating",
        unit="batch",
        ncols=bar_length,
        bar_format='{l_bar}{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}]'
    )
    
    for batch in eval_progress_bar:
        with torch.no_grad():
            inputs = {k: v.to(device) for k, v in batch.items()}
            outputs = model(**inputs)
            logits = outputs.logits

        batch_preds = torch.argmax(logits.detach().cpu(), dim=1).numpy()
        batch_labels = torch.argmax(inputs["labels"].detach().cpu(), dim=1).numpy()
        
        preds.extend(batch_preds)
        out_label_ids.extend(batch_labels)
        
        # Update progress bar with current batch stats
        if len(preds) > 0:
            current_accuracy = accuracy_score(out_label_ids, preds)
            eval_progress_bar.set_postfix({'acc': f'{current_accuracy:.3f}'})
    
    # Calculate final metrics
    accuracy = accuracy_score(out_label_ids, preds)
    f1 = f1_score(out_label_ids, preds, average='weighted')
    recall = recall_score(out_label_ids, preds, average='weighted')
    precision = precision_score(out_label_ids, preds, average='weighted')
    
    # Update the training progress bar with final metrics
    train_progress_bar.set_postfix({
        'loss': f'{avg_loss:.4f}',
        'acc': f'{accuracy:.3f}',
        'f1': f'{f1:.3f}',
        'recall': f'{recall:.3f}',
        'precision': f'{precision:.3f}'
    })
    
    # Close progress bars
    train_progress_bar.close()
    eval_progress_bar.close()
    
    # Optional: Print summary at the end of each epoch
    print(f"\nEpoch {epoch+1}/{num_epochs} Summary:")
    print(f"Train Loss: {avg_loss:.4f} | Accuracy: {accuracy:.4f} | F1: {f1:.4f} | Recall: {recall:.4f} | Precision: {precision:.4f}")
    
    # Memory check during training
    if i % 10 == 0:
        print_cuda_memory()
    
    # Clean up after epoch
    clear_gpu_memory()

    

Memory environment configured
GPU memory cleared
Using device: cuda
=== GPU Memory Usage ===
Allocated: 0.41 GB
Reserved:  0.41 GB
Free:      15.07 GB
Total:     15.48 GB
GPU memory cleared
Available memory: 11.97GB
Using batch size: 29
Using batch size: 29
Using gradient accumulation steps: 1

=== Epoch 1/3 ===
GPU Memory: 0.41GB used, 15.07GB free, 15.48GB total


Epoch 1/3 - Training: 100%|█| 124/124 [01:43<00:00
Epoch 1/3 - Evaluating: 100%|█| 31/31 [00:08<00:00



Epoch 1/3 Summary:
Train Loss: 0.3644 | Accuracy: 0.9933 | F1: 0.9933 | Recall: 0.9933 | Precision: 0.9934
GPU memory cleared

=== Epoch 2/3 ===
GPU Memory: 1.24GB used, 14.24GB free, 15.48GB total


Epoch 2/3 - Training: 100%|█| 124/124 [01:43<00:00
Epoch 2/3 - Evaluating: 100%|█| 31/31 [00:08<00:00



Epoch 2/3 Summary:
Train Loss: 0.0249 | Accuracy: 0.9978 | F1: 0.9978 | Recall: 0.9978 | Precision: 0.9978
GPU memory cleared

=== Epoch 3/3 ===
GPU Memory: 1.24GB used, 14.24GB free, 15.48GB total


Epoch 3/3 - Training: 100%|█| 124/124 [01:43<00:00
Epoch 3/3 - Evaluating: 100%|█| 31/31 [00:08<00:00


Epoch 3/3 Summary:
Train Loss: 0.0102 | Accuracy: 0.9978 | F1: 0.9978 | Recall: 0.9978 | Precision: 0.9978
GPU memory cleared





# 6. Inference Pipeline
1. `tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
`: You need use the same tokenizer that was use for fine-tunning
2. `logits.detach().cpu()`
  * `detach is done to prevent  unintentional back-propogation
  * `.cpu` is done so that the output is compatible with scikit-learn libraries for further computation

In [22]:
from transformers import BertTokenizer
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def inference(text, model,  label, device='cuda'):
    # Load the tokenizer

    # Tokenize the input text
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True)
    # Move input tensors to the specified device (default: 'cpu')
    inputs = {k: v.to(device) for k, v in inputs.items()}

    # Set the model to evaluation mode and perform inference
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits

    # Get the index of the predicted label
    pred_label_idx = torch.argmax(logits.detach().cpu(), dim=1).item()

    print(f"Predicted label index: {pred_label_idx}, actual label {label}")
    return pred_label_idx


In [23]:
#https://abcnews.go.com/US/tornado-confirmed-delaware-powerful-storm-moves-east/story?id=98293454
text='\
WASHINGTON (ABC) A confirmed tornado was located near Bridgeville in Sussex County, Delaware, shortly after 6 p.m. ET Saturday, moving east at 50 mph, according to the National Weather Service. Downed trees and wires were reported in the area.\
'
inference(text, model, 1.0)
text="this is definately junk text I am typing"
inference(text, model, 0.0)

Predicted label index: 1, actual label 1.0
Predicted label index: 0, actual label 0.0


0

In [24]:
clear_gpu_memory()

GPU memory cleared
