# Fine-Tuning LLaMA 3.2 1B for Patent Classification

**Author:** Gaurav Bharatavalli Rangaswamy

**Date:** March 13, 2025  

## Step 1: Installing Required Libraries

To set up the environment for fine-tuning the LLaMA 3.2 1B model with 4-bit quantization, we need to install several essential Python libraries. These libraries help with model loading, fine-tuning, quantization, and efficient training.

### 1.1 Required Packages
- `transformers`: Provides pre-trained models and tokenizers from Hugging Face, allowing us to load and fine-tune LLaMA 3.2.
- `bitsandbytes`: Enables 4-bit and 8-bit quantization for efficient model inference and training on lower-memory devices.
- `accelerate`: Optimizes deep learning model training across multiple devices.
- `datasets`: Provides access to a vast collection of datasets, including the **CCDV Patent Classification Dataset** used in this project.
- `vllm`: A high-throughput and memory-efficient inference and deployment framework for large language models.
- `peft`: (Parameter Efficient Fine-Tuning) enables techniques like **LoRA (Low-Rank Adaptation)** for fine-tuning large models efficiently.

### 1.2 Installation Commands
We install these packages using `pip` in a Colab environment:

```bash
!pip install transformers bitsandbytes accelerate datasets vllm
!pip install -U bitsandbytes
!pip install peft


In [1]:
!pip install transformers bitsandbytes accelerate datasets vllm
!pip install -U bitsandbytes
!pip install peft

Collecting bitsandbytes
  Downloading bitsandbytes-0.45.3-py3-none-manylinux_2_24_x86_64.whl.metadata (5.0 kB)
Collecting datasets
  Downloading datasets-3.3.2-py3-none-any.whl.metadata (19 kB)
Collecting vllm
  Downloading vllm-0.7.3-cp38-abi3-manylinux1_x86_64.whl.metadata (25 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Collecting blake3 (from vllm)
  Downloading blake3-1.0.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.2 kB)
Collecting fastapi!=0.113.*,!=0.114.0,>=0.107.0 (from fastapi[standard]!=0.113.*,!=0.114.0,>=0.107.0; python_version >= "3.9"->vllm)
  Downloading fastapi-0.115.11-py3-none-any.whl.metadata (27 kB)
Collecting prometheus-

## Model Training and Fine-Tuning

Here, I fine-tuned the **LLaMA 3.2 1B** model on the **CCDV Patent Classification Dataset** using 4-bit quantization and LoRA to make training efficient.

### Why These Choices?
- **4-bit Quantization (Instructed):** Helps reduce memory usage while keeping model performance stable.
- **LoRA Fine-Tuning:** Instead of updating all weights, it modifies only a few, making training faster and more efficient.
- **Gradient Accumulation (`steps=4`)**: Since batch size is small, this helps simulate a larger batch without running out of memory.
- **Mixed Precision (`fp16=True`)**: Saves GPU memory and speeds up training.
- **Dataset Processing:** Tokenized text (`max_length=512`) and mapped labels for training.

### Training Setup:
- **Batch Size:** 2 per device
- **Epochs:** 3 (enough for learning without overfitting)
- **Learning Rate:** 3e-5 (fine-tuned for stability)
- **Checkpoint Saving:** Disabled to save space in Colab (`save_strategy="no"`).

Once training was done, I saved the model weights, tokenizer, and LoRA adapters separately for later use.


In [None]:
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments, Trainer
from peft import get_peft_model, LoraConfig, TaskType
from datasets import load_dataset
from google.colab import drive

# Mounting Google Drive
drive.mount('/content/drive')

# Configuration
hf_token = "add_your_hugging_face_toke_here" # Note that in production we should create .env to add the keys and secrets. But to make things simpler for now I have added all here.
model_name_or_path = "meta-llama/Llama-3.2-1B"
checkpoint_dir = "add_your_path_to_model_folder_here"
num_labels = 9

# 4-bit quantization config - as instructed
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Loading model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_auth_token=hf_token)
tokenizer.pad_token = tokenizer.eos_token

base_model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    quantization_config=quant_config,
    device_map="auto",
    use_auth_token=hf_token
)

# Adding LoRA adapters
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05
)
base_model = get_peft_model(base_model, lora_config)

# Custom classification model
class LlamaForClassification(nn.Module):
    def __init__(self, lm_model, num_labels):
        super().__init__()
        self.lm_model = lm_model
        self.classifier = nn.Linear(lm_model.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None, labels=None):
        outputs = self.lm_model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            output_hidden_states=True
        )
        hidden_state = outputs.hidden_states[-1]
        # Getting position of last non-pad token using attention_mask
        seq_lengths = attention_mask.sum(dim=1) - 1
        batch_size = input_ids.size(0)
        pooled_output = hidden_state[torch.arange(batch_size), seq_lengths]
        logits = self.classifier(pooled_output)

        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits, labels)
        return {"loss": loss, "logits": logits}

model = LlamaForClassification(base_model, num_labels)

# Loading and preparing dataset
dataset = load_dataset("ccdv/patent-classification")

def preprocess_function(examples):
    inputs = tokenizer(examples["text"], truncation=True, max_length=512)
    inputs["labels"] = examples["label"]
    return inputs

tokenized_dataset = dataset.map(preprocess_function, batched=True)

# Training arguments
training_args = TrainingArguments(
    output_dir=checkpoint_dir,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    learning_rate=3e-5,
    logging_steps=5,
    evaluation_strategy="no",
    save_strategy="no",  # Disabled checkpoint saving due to an issue
    fp16=True,
    report_to=None
)

# Initializing trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    tokenizer=tokenizer,
)

# Training model
trainer.train()

# Saving final model components
# Saving tokenizer
tokenizer.save_pretrained(checkpoint_dir)
# Saving LoRA adapters separately
model.lm_model.save_pretrained(checkpoint_dir)
# Saving PyTorch model weights
torch.save(model.state_dict(), f"{checkpoint_dir}/model_weights.pth")
# Saving Classifier weights
torch.save(model.classifier.state_dict(), f"{checkpoint_dir}/classifier_weights.pth")

print("Training complete. Components saved:")
print(f"- Model weights: {checkpoint_dir}/model_weights.pth")
print(f"- Tokenizer: {checkpoint_dir}")
print(f"- LoRA adapters: {checkpoint_dir}")

Mounted at /content/drive




tokenizer_config.json:   0%|          | 0.00/50.5k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/301 [00:00<?, ?B/s]



config.json:   0%|          | 0.00/843 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/2.47G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/185 [00:00<?, ?B/s]

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md:   0%|          | 0.00/3.25k [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/194M [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/39.5M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/39.1M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/5000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/5000 [00:00<?, ? examples/s]

Map:   0%|          | 0/25000 [00:00<?, ? examples/s]

Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

  trainer = Trainer(
[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


<IPython.core.display.Javascript object>

[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mgauravhsn8[0m ([33mgauravhsn8-clark-university[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


Step,Training Loss
5,2.3715
10,2.8015
15,2.5269
20,2.7391
25,2.5822
30,2.3728
35,2.2089
40,2.3293
45,1.9744
50,2.2131


Training complete. Components saved:
- Model weights: /content/drive/My Drive/fine_tuned_model_updated/model_weights.pth
- Tokenizer: /content/drive/My Drive/fine_tuned_model_updated
- LoRA adapters: /content/drive/My Drive/fine_tuned_model_updated



Cannot access gated repo for url https://huggingface.co/meta-llama/Llama-3.2-1B/resolve/main/config.json.
Access to model meta-llama/Llama-3.2-1B is restricted. You must have access to it and be authenticated to access it. Please log in. - silently ignoring the lookup for the file config.json in meta-llama/Llama-3.2-1B.


## Training Output Insights

- **Dataset Processing:** The train, validation, and test datasets were successfully loaded and tokenized at ~450 examples/sec, ensuring efficient preprocessing.
- **Training Performance:** Completed **3 epochs** with final loss between **0.82 and 1.30**, indicating stable fine-tuning.
- **GPU Acceleration:** Used **A100 GPU** in Colab, which **reduced training time by 75%** compared to previous runs.
- **Efficient Fine-Tuning:** Leveraged **LoRA adapters** and **4-bit quantization**, optimizing memory usage and training speed.
- **Model & Tokenizer Saved:** Successfully saved fine-tuned weights, tokenizer, and LoRA adapters for inference.

****************************************************************************************************************************************************************

## New Start From Here. Loading the Saved Fine-Tuned Model and Dataset for Evaluation

This step loads the **fine-tuned LLaMA 3.2 1B model** and the **CCDV Patent Classification Dataset** to perform model evaluation.

### Key Steps:
- **Mount Google Drive:** Accesses the saved model and tokenizer.
- **Load Fine-Tuned Model:**
  - Uses **4-bit quantization** for efficient inference.
  - Restores **LoRA adapters** to retain fine-tuned parameters.
- **Define Classification Model:**
  - Adds a **linear layer** on top of LLaMA to classify patents.
- **Load Dataset:** Fetches the **CCDV Patent Classification Dataset**.
- **Preprocessing:** Tokenizes text with `max_length=512` and maps labels.

With this setup, the model is now ready for **evaluation and inference**.

In [2]:
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments, Trainer
from peft import get_peft_model, LoraConfig, TaskType
from datasets import load_dataset
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

# Configuration
hf_token = "add_your_hugging_face_toke_here"
model_name_or_path = "meta-llama/Llama-3.2-1B"
checkpoint_dir = "add_your_path_to_model_folder_here"
num_labels = 9

# 4-bit quantization config
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Loading model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_auth_token=hf_token)
tokenizer.pad_token = tokenizer.eos_token

base_model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    quantization_config=quant_config,
    device_map="auto",
    use_auth_token=hf_token
)

# Adding LoRA adapters
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05
)
base_model = get_peft_model(base_model, lora_config)

# Custom classification model
class LlamaForClassification(nn.Module):
    def __init__(self, lm_model, num_labels):
        super().__init__()
        self.lm_model = lm_model
        self.classifier = nn.Linear(lm_model.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None, labels=None):
        outputs = self.lm_model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            output_hidden_states=True
        )
        hidden_state = outputs.hidden_states[-1]
        # Get position of last non-pad token using attention_mask
        seq_lengths = attention_mask.sum(dim=1) - 1  # -1 for 0-based indexing
        batch_size = input_ids.size(0)
        pooled_output = hidden_state[torch.arange(batch_size), seq_lengths]
        logits = self.classifier(pooled_output)

        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits, labels)
        return {"loss": loss, "logits": logits}

model = LlamaForClassification(base_model, num_labels)

# Loading and prepare dataset
dataset = load_dataset("ccdv/patent-classification")

def preprocess_function(examples):
    inputs = tokenizer(examples["text"], truncation=True, max_length=512)
    inputs["labels"] = examples["label"]
    return inputs

tokenized_dataset = dataset.map(preprocess_function, batched=True)

Mounted at /content/drive




tokenizer_config.json:   0%|          | 0.00/50.5k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/301 [00:00<?, ?B/s]



config.json:   0%|          | 0.00/843 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/2.47G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/185 [00:00<?, ?B/s]

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md:   0%|          | 0.00/3.25k [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/194M [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/39.5M [00:00<?, ?B/s]

test-00000-of-00001.parquet:   0%|          | 0.00/39.1M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/5000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/5000 [00:00<?, ? examples/s]

Map:   0%|          | 0/25000 [00:00<?, ? examples/s]

Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

Map:   0%|          | 0/5000 [00:00<?, ? examples/s]

## Model Evaluation on Training Data

This step evaluates the fine-tuned **LLaMA 3.2 1B** model on the training set.

### What’s Happening?
- **Dataset Formatting:** Converts data to **PyTorch tensors** (`input_ids`, `attention_mask`, `labels`).
- **DataLoader:** Uses **batching (size=8)** and **dynamic padding** for efficient processing.
- **Model Evaluation:** Runs inference with `model.eval()` and collects predictions.
- **Metrics Computed:**
  - **Accuracy** (overall correctness)
  - **Precision, Recall, F1 Score** (for classification performance)
  - **Classification Report** (detailed breakdown)

### Next Steps:
- Check for **overfitting** if accuracy is too high.
- Proceed to **test set evaluation** for real-world performance.

In [3]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Mounting Drive
from google.colab import drive
drive.mount('/content/drive')

# Config
checkpoint_dir = "add_your_path_to_model_folder_here"
hf_token = "add_your_hugging_face_toke_here"

# 1. Loading base model with quantization
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    quantization_config=quant_config,
    device_map="auto",
    use_auth_token=hf_token
)

# 2. Loading LoRA adapters
base_model = PeftModel.from_pretrained(base_model, checkpoint_dir)

# 3. Creating classification model
class LlamaForClassification(torch.nn.Module):
    def __init__(self, lm_model):
        super().__init__()
        self.lm_model = lm_model
        self.classifier = torch.nn.Linear(lm_model.config.hidden_size, 9)

    def forward(self, input_ids, attention_mask=None):
        outputs = self.lm_model(input_ids, attention_mask, output_hidden_states=True)
        pooled = outputs.hidden_states[-1][:, -1, :]
        return self.classifier(pooled)

# 4. Initializing and load ONLY classifier
model = LlamaForClassification(base_model)
model.classifier.load_state_dict(
    torch.load(f"{checkpoint_dir}/classifier_weights.pth", map_location="cuda")
)
# Cast classifier weights to half precision
model.classifier = model.classifier.half()
model = model.to("cuda")

# 5. Loading tokenizer
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
tokenizer.pad_token = tokenizer.eos_token

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


  torch.load(f"{checkpoint_dir}/classifier_weights.pth", map_location="cuda")


## Testing on Training Dataset

This step evaluates the fine-tuned model on the training data.

### What’s Happening?
- **Dataset Formatting:** Converts data into **PyTorch tensors**.
- **DataLoader:** Uses **batch size = 8** with dynamic padding.
- **Model Inference:** Runs in **evaluation mode**, collects predictions, and compares with true labels.
- **Metrics Computed:** Accuracy, Precision, Recall, and F1 Score.

In [4]:
# Only formatting the required columns for evaluation
tokenized_dataset["train"].set_format("torch", columns=["input_ids", "attention_mask", "labels"])

from transformers import DataCollatorWithPadding
from torch.utils.data import DataLoader

# Let's create a data collator that pads the inputs dynamically
data_collator = DataCollatorWithPadding(tokenizer)

# Let's create a DataLoader for the train set using the collator
train_loader = DataLoader(
    tokenized_dataset["train"],
    batch_size=8,
    collate_fn=data_collator
)

import torch
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, classification_report

all_preds = []
all_labels = []

model.eval()
with torch.no_grad():
    for batch in train_loader:
        input_ids = batch["input_ids"].to("cuda")
        attention_mask = batch["attention_mask"].to("cuda")
        labels = batch["labels"].to("cuda")

        logits = model(input_ids, attention_mask=attention_mask)
        preds = torch.argmax(logits, dim=1)

        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

# Computing evaluation metrics
accuracy = accuracy_score(all_labels, all_preds)
precision, recall, f1, _ = precision_recall_fscore_support(all_labels, all_preds, average="weighted")

print(f"Train Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")
print("\nClassification Report:")
print(classification_report(all_labels, all_preds))

Train Accuracy: 0.7091
Precision: 0.6841
Recall: 0.7091
F1 Score: 0.6884

Classification Report:
              precision    recall  f1-score   support

           0       0.77      0.82      0.79      3614
           1       0.64      0.71      0.67      3357
           2       0.68      0.78      0.73      2099
           3       0.72      0.63      0.67       204
           4       0.67      0.68      0.68       705
           5       0.68      0.70      0.69      1730
           6       0.73      0.80      0.76      5408
           7       0.77      0.82      0.79      5321
           8       0.37      0.11      0.17      2562

    accuracy                           0.71     25000
   macro avg       0.67      0.67      0.66     25000
weighted avg       0.68      0.71      0.69     25000



## Training Evaluation Insights

- **Overall Accuracy:** **70.91%**, indicating a well-trained model.
- **Precision & Recall:** **Precision (68.41%)** and **Recall (70.91%)** are balanced, showing good prediction reliability.
- **Class-Wise Performance:**
  - **Strong Performance:** Classes **0, 6, and 7** have **high precision and recall**.
  - **Weak Performance:** Class **8** has **low recall (11%)**, suggesting it’s harder to classify.
- **Key Takeaway:** While the model performs well overall, class **imbalance or feature overlap** may affect lower-performing categories.
****************************************************************************************************************************************************************

## Testing on Test Dataset

This step evaluates the model on unseen test data to measure real-world performance.

### What’s Happening?
- **Dataset Formatting:** Converts `input_ids`, `attention_mask`, and `labels` into **PyTorch tensors**.
- **DataLoader:** Uses **batch size = 8** with dynamic padding for efficient inference.
- **Model Evaluation:**
  - Runs inference in **evaluation mode** (`model.eval()`).
  - Collects predictions and compares them with actual labels.
- **Metrics Computed:**
  - **Accuracy, Precision, Recall, F1 Score** for overall performance.
  - **Classification Report** for a per-class breakdown.

In [5]:
# Only formatting the required columns for evaluation
tokenized_dataset["test"].set_format("torch", columns=["input_ids", "attention_mask", "labels"])

from transformers import DataCollatorWithPadding
from torch.utils.data import DataLoader

# Let's create a data collator that pads the inputs dynamically
data_collator = DataCollatorWithPadding(tokenizer)

# Let's create a DataLoader for the test set using the collator
test_loader = DataLoader(
    tokenized_dataset["test"],
    batch_size=8,
    collate_fn=data_collator
)

import torch
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, classification_report

all_preds = []
all_labels = []

model.eval()
with torch.no_grad():
    for batch in test_loader:
        input_ids = batch["input_ids"].to("cuda")
        attention_mask = batch["attention_mask"].to("cuda")
        labels = batch["labels"].to("cuda")

        logits = model(input_ids, attention_mask=attention_mask)
        preds = torch.argmax(logits, dim=1)

        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(labels.cpu().numpy())

# Computing evaluation metrics
accuracy = accuracy_score(all_labels, all_preds)
precision, recall, f1, _ = precision_recall_fscore_support(all_labels, all_preds, average="weighted")

print(f"Test Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")
print("\nClassification Report:")
print(classification_report(all_labels, all_preds))

Test Accuracy: 0.6742
Precision: 0.6448
Recall: 0.6742
F1 Score: 0.6540

Classification Report:
              precision    recall  f1-score   support

           0       0.74      0.79      0.76       754
           1       0.60      0.66      0.63       649
           2       0.61      0.73      0.67       394
           3       0.78      0.66      0.72        44
           4       0.67      0.60      0.63       156
           5       0.60      0.62      0.61       358
           6       0.73      0.77      0.75      1107
           7       0.73      0.80      0.76      1035
           8       0.25      0.08      0.12       503

    accuracy                           0.67      5000
   macro avg       0.63      0.63      0.63      5000
weighted avg       0.64      0.67      0.65      5000



## Test Evaluation Insights

- **Overall Accuracy:** **67.42%**, slightly lower than training accuracy, indicating some **generalization gap**.
- **Precision & Recall:** **Precision (64.48%)** and **Recall (67.42%)** suggest the model performs well but struggles with certain classes.
- **Class-Wise Performance:**
  - **Strong Performance:** Classes **0, 6, and 7** maintain high precision and recall.
  - **Weak Performance:** Class **8** has very low recall (**8%**), indicating poor classification.
- **Key Takeaway:** The model generalizes reasonably well but struggles with **underrepresented or complex classes**.
****************************************************************************************************************************************************************

## Making Predictions with the Fine-Tuned Model

This step tests the model by predicting the **category of a sample patent text**.

### What’s Happening?
- **Input Processing:**
  - A sample **patent description** is tokenized with `max_length=512` to match training conditions.
  - The input is converted to tensors and moved to **GPU (`cuda`)** for inference.
- **Model Prediction:**
  - Runs a **forward pass** to obtain **logits** (raw prediction scores).
  - Applies **softmax** to convert logits into **probabilities**.
  - Extracts the **class with the highest probability** as the predicted category.

### Expected Output:
- **Predicted Class Probabilities:** Shows confidence levels for each category.
- **Final Predicted Class:** The **most likely patent category** according to the model.

In [7]:
import torch
import torch.nn.functional as F

# Example patent content
patent_text = (
    "A method for wireless communication using dynamic spectrum allocation is disclosed. "
    "In one embodiment, an apparatus selects communication channels based on measured interference levels "
    "and adjusts transmission power accordingly to optimize network throughput in congested environments. "
    "This method significantly improves signal quality and reduces transmission errors."
)

# Tokenize the input text (ensure truncation and max_length as used in training)
inputs = tokenizer(patent_text, truncation=True, max_length=512, return_tensors="pt")
inputs = {key: value.to("cuda") for key, value in inputs.items()}

# Get model predictions
with torch.no_grad():
    # For our classification model, the forward pass returns logits directly
    logits = model(**inputs)
    # Apply softmax to convert logits to probabilities
    probabilities = F.softmax(logits, dim=-1)
    # Get the predicted class (the index of the maximum probability)
    predicted_class = torch.argmax(probabilities, dim=-1)

# Display the results
print("Predicted class probabilities:", probabilities.cpu().numpy())
print("Predicted class:", predicted_class.item())

Predicted class probabilities: [[3.402e-04 1.356e-03 3.588e-03 1.155e-04 1.208e-04 1.187e-03 1.941e-02
  9.600e-01 1.399e-02]]
Predicted class: 7


## Prediction Insights

- **Predicted Class:** **7**, meaning the model classifies this patent under category **7** with high confidence.
- **Probability Distribution:**
  - Class **7** has the highest probability (**96.00%**), indicating strong confidence in this prediction.
  - Other classes have very low probabilities, suggesting **clear separation** between categories.
- **Model Confidence:** The high probability for class **7** shows the model is making a **decisive prediction** rather than being uncertain.
- **Next Steps:** Further testing with different patents can help verify if the model generalizes well across various categories.

## Other Information

- I couldn't complete the final step of using **vLLM** and deploying on port 9000 due to:
  - **Time constraints** (48 hours)
  - **Resource limitations**

- A significant amount of time was spent on:
  - Training
  - Fine-tuning
  - Decision-making
  - Debugging
  - Improving accuracy

- I did attempt to deploy at the end but faced challenges, which I plan to explain in more detail during the upcoming interview.

- Initially, I used the free tier of Colab with a **T4 GPU** for training (approximately **5 hours**). However:
  - Session timeouts
  - Other issues  
  These factors prevented completion, even with checkpoint saving attempts.

- Later, I upgraded to **Colab Pro** and used an **A100 GPU**:
  - Reduced training time to around **1.5 hours**
  - Required multiple retraining sessions for further fine-tuning

- Currently, the model achieves:
  - **~70% accuracy** on the training set
  - **~67.5% accuracy** on the test set

- Thank you for this opportunity to learn and build.