<a href="https://colab.research.google.com/github/LayanAlrashoud/ClassifyAndSummary/blob/main/Copy_of_classification1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 1. Introduction
In this project, we fine-tuned AraBERT (`aubmindlab/bert-base-arabertv02`), an Arabic BERT model, to classify Arabic articles into six categories. Before fine-tuning, we preprocessed the text with normalization, stopword removal, and stemming. We then evaluated the model with accuracy, precision, recall, and F1-score.

## 2. Setup
### 2.1 Mount Google Drive
We begin by mounting Google Drive to access the dataset stored in the drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### 2.2 Define Folder Path
Define the path to the folder containing the dataset in Google Drive.

In [None]:
data_folder = '/content/drive/MyDrive/Data'

## 3. Data Preparation
### 3.1 Import Necessary Libraries
Import libraries for handling data, tokenization, and NLP preprocessing, including:
- `os` for file system operations and `re` for regex-based text normalization.
- `torch` for building the input tensors.
- `transformers` for loading the pre-trained model and tokenizer.
- `nltk` for the ISRI Arabic stemmer.

In [None]:
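import os
import re
import torch
from transformers import BertTokenizer
from nltk import download
from nltk.stem.isri import ISRIStemmer

# Download NLTK's Punkt tokenizer models (the ISRI stemmer itself needs no downloaded data)
download('punkt')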



[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

### 3.2 Load Tokenizer and Stemmer
Load the pre-trained BERT tokenizer and Arabic stemmer.

In [None]:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('aubmindlab/bert-base-arabertv02')

stemmer = ISRIStemmer()



### 3.3 Define Arabic Stopwords
Define a list of Arabic stopwords that will be removed during preprocessing.


In [None]:
# Define the stopwords list
arabic_stopwords = [
    'نيسان', 'هما', 'آه', 'بس', 'أخبر', 'لا', 'مه', 'كل', 'بَسْ', 'إنما', 'ترك', 'لعل', 'إلّا', 'قد', 'ألف',
    'أنى', 'بك', 'وَيْ', 'لستما', 'خبَّر', 'ت', 'ذينك', 'أولالك', 'حيَّ', 'نا', 'هَجْ', 'ما', 'سرا', 'قلما',
    'وما', 'اتخذ', 'عسى', 'يورو', 'عجبا', 'اللائي', 'إياها', 'جنيه', 'كلما', 'جيم', 'فاء', 'بهن', 'أولاء',
    'سين', 'حادي', 'خمسين', 'ض', 'بماذا', 'حزيران', 'شباط', 'مع', 'غير', 'وا', 'كليكما', 'بات', 'ثلاثين',
    'تِه', 'هَاتانِ', 'اثنان', 'ه', 'ثمة', 'ست', 'فو', 'لم', 'مائة', 'يفعلان', 'التي', 'زود', 'نعم', 'شمال',
    'أسكن', 'ة', 'سحقا', 'ثان', 'ذين', 'تين', 'من', 'ا', 'رزق', 'أنتم', 'هؤلاء', 'تارة', 'عل', 'فمن', 'ثامن',
    'كذلك', 'ثمانين', 'أوت', 'رُبَّ', 'إيه', 'لي', 'لبيك', 'ثاء', 'أمسى', 'زعم', 'دولار', 'ثلاثمائة', 'ذلك',
    'هذان', 'ذات', 'بخ', 'لكنما', 'هنا', 'هناك', 'ح', 'لمّا', 'جميع', 'بعض', 'ستمائة', 'هَيْهات', 'لكي', 'خمسون',
    'جويلية', 'لات', 'ريال', 'هذه', 'خ', 'لهما', 'عاد', 'لست', 'آمينَ', 'وإن', 'كأيّن', 'كلا', 'أقبل', 'إحدى',
    'هل', 'اربعون', 'بئس', 'فيما', 'حجا', 'ثلاثون', 'كأنما', 'ثمانمئة', 'اثنا', 'الألى', 'أجمع', 'صبرا', 'كان',
    'انقلب', 'مادام', 'لسن', 'تِي', 'عشر', 'نحو', 'فلان', 'بؤسا', 'ذهب', 'بطآن', 'أحد', 'عاشر', 'درى', 'الآن',
    'لعمر', 'سوف', 'ّأيّان', 'ولكن', 'أول', 'أطعم', 'ألا', 'صاد', 'اللتيا', 'ء', 'مكانَك', 'كيت', 'صبر', 'ثلاثة',
    'جوان', 'ياء', 'عن', 'عند', 'أينما', 'جانفي', 'لما', 'لئن', 'ذَيْنِ', 'فضلا', 'د', 'ءَ', 'كما', 'حيث',
    'سادس', 'تانِ', 'سبعمئة', 'أصلا', 'ميم', 'مافتئ', 'أفريل', 'أوشك', 'أبدا', 'كيفما', 'إياه', 'إذن', 'ومن',
    'أمام', 'واو', 'يوان', 'ع', 'تسعمائة', 'صهْ', 'مايو', 'لسنا', 'نوفمبر', 'ظاء', 'شَتَّانَ', 'مازال', 'خمسمئة',
    'ين', 'إياكن', 'كى', 'أو', 'فإن', 'ن', 'ذانِ', 'ذِه', 'مكانكنّ', 'تلكم', 'أرى', 'ديسمبر', 'شبه', 'كثيرا',
    'ثمنمئة', 'تعلَّم', 'غدا', 'غين', 'هَاتِه', 'يناير', 'كلاهما', 'نيف', 'جلل', 'ليسا', 'إياهن', 'اللتين',
    'إليكم', 'دونك', 'كأنّ', 'عشرين', 'أيّ', 'ذلكم', 'أي', 'أربعاء', 'سابع', 'أل', 'إيهٍ', 'حتى', 'سبت',
    'حبيب', 'خاء', 'هلّا', 'عامة', 'أيضا', 'كسا', 'أى', 'جمعة', 'هاتان', 'ب', 'لوما', 'اللتان', 'أغسطس',
    'باء', 'إذما', 'وإذ', 'ص', 'عليه', 'تعسا', 'إمّا', 'ريث', 'قطّ', 'لو', 'أنت', 'ليست', 'ما برح', 'حين',
    'ف', 'ضحوة', 'وراءَك', 'عما', 'كن', 'إلَيْكَ', 'لكنَّ', 'خلافا', 'عدا', 'لهن', 'بل', 'هيا', 'ارتدّ', 'أين',
    'كرب', 'تسعة', 'نحن', 'تسعمئة', 'فيه', 'لن', 'أُفٍّ', 'إن', 'تفعلين', 'علق', 'هكذا', 'حدَث', 'هَذِه',
    'هيت', 'كي', 'ك', 'صباح', 'وجد', 'حمٌ', 'كذا', 'أنتما', 'أنتِ', 'ستون', 'ستين', 'تلقاء', 'إياك', 'تموز',
    'أهلا', 'حسب', 'إذ', 'عشرون', 'طَق', 'كانون', 'لكما', 'علم', 'اللذين', 'ثاني', 'ذواتا', 'أمد', 'رابع',
    'س', 'لدن', 'شتان', 'عليك', 'كأين', 'أيلول', 'سبعمائة', 'فرادى', 'بغتة', 'قام', 'ؤ', 'أنًّ', 'بين', 'إنا',
    'هاته', 'م', 'ضاد', 'تسعين', 'حاي', 'وهو', 'عَدَسْ', 'إليكنّ', 'طاق', 'مذ', 'بكم', 'همزة', 'ثم', 'بعدا',
    'إنه', 'والذين', 'فبراير', 'سبعون', 'أيار', 'هنالك', 'آهٍ', 'منذ', 'آها', 'أبٌ', 'راح', 'أولئك', 'بلى',
    'تبدّل', 'تسع', 'سبتمبر', 'لا سيما', 'ليرة', 'كلَّا', 'سبعة', 'ذيت', 'حرى', 'له', 'ثمانية', 'سبحان', 'مئة',
    'اثني', 'هاكَ', 'كاد', 'أمامك', 'استحال', 'أعطى', 'هاء', 'خال', 'جير', 'أبريل', 'ذا', 'شيكل', 'قبل',
    'كِخ', 'الذين', 'بمن', 'غ', 'تفعلون', 'ثالث', 'كم', 'مما', 'أربعمائة', 'ئ', 'تانِك', 'وإذا', 'ش', 'تلكما',
    'آذار', 'لكيلا', 'هيّا', 'كيف', 'غالبا', 'لكم', 'إلى', 'خميس', 'هَذِي', 'ته', 'أما', 'في', 'كأي', 'إليكَ',
    'هللة', 'خاصة', 'أخذ', 'ثلاثمئة', 'ذِي', 'خلا', 'إذا', 'خلف', 'صار', 'ما أفعله', 'يونيو', 'ولو', 'شين',
    'ذي', 'آنفا', 'بنا', 'ثماني', 'لستم', 'تاء', 'بيد', 'إليك', 'ذلكما', 'كلتا', 'هاك', 'آ', 'مكانكما',
    'آناء', 'أوّهْ', 'ظ', 'ماي', 'أنشأ', 'سمعا', 'اللاتي', 'نبَّا', 'لستن', 'أكثر', 'أن', 'بهما', 'أفٍّ',
    'تجاه', 'اللذان', 'كاف', 'هَذَيْنِ', 'سنتيم', 'بما', 'ط', 'هبّ', 'آض', 'لها', 'أقل', 'ولا', 'لاسيما',
    'لعلَّ', 'حمدا', 'عيانا', 'صهٍ', 'مارس', 'نون', 'قاف', 'مئتان', 'خمس', 'أخٌ', 'هَذانِ', 'فلا', 'وهب',
    'مرّة', 'ى', 'فيم', 'ليت', 'خمسة', 'نَخْ', 'خامس', 'ستة', 'ذواتي', 'ثمَّ', 'أصبح', 'منه', 'الذي', 'إنَّ',
    'ذانك', 'حَذارِ', 'أ', 'سبع', 'هَاتِي', 'هو', 'لولا', 'الألاء', 'ليستا', 'أربع', 'لنا', 'هذي', 'رجع',
    'درهم', 'على', 'إما', 'شتانَ', 'تحوّل', 'حاء', 'أجل', 'آهاً', 'ج', 'كلّما', 'ممن', 'اربعين', 'تينك',
    'إليكما', 'م', 'إذاً',"اذا", 'سرعان', 'سقى', 'تخذ', 'أبو', 'أمامكَ', 'هي', 'إيانا', 'هَؤلاء', 'بسّ', 'ذال',
    'يفعلون', 'عدَّ', 'آهِ', 'ما انفك', 'عين', 'و', 'قاطبة', 'أنّى', 'أربعة', 'راء', 'دون', 'هاتي', 'ها',
    'منها', 'ثمّ', 'أنتن', 'واهاً', 'بها', 'سوى', 'ر', 'ثلاثاء', 'طالما', 'ابتدأ', 'يوليو', 'مليم', 'رويدك',
    'أيها', 'هلم', 'إياهم', 'أمّا', 'هاهنا', 'ذ', 'هيهات', 'هَاتَيْنِ', 'غداة', 'اللواتي', 'لدى', 'ق',
    'ساء', 'ثمانون', 'ألفى', 'دينار', 'بكن', 'بَلْهَ', 'أعلم', 'تفعلان', 'أخو', 'صراحة', 'بكما', 'أنا',
    'إياكما', 'تَيْنِ', 'هلا', 'أنبأ', 'واحد', 'دال', 'كأن', 'هاتين', 'تسعون', 'مساء', 'مهما', 'زاي', 'ليسوا',
    'إياهما', 'يمين', 'اثنين', 'عوض', 'ظنَّ', 'حيثما', 'ذاك', 'أيا', 'علًّ', 'رأى', 'لام', 'طفق', 'بهم',
    'ليس', 'كليهما', 'ستمئة', 'أمس', 'ظلّ', 'كأيّ', 'حمو', 'آي', 'أم', 'تاسع', 'صدقا', 'آب', 'انبرى',
    'هذين', 'فيها', 'أيّان', 'ذه', 'متى', 'والذي', 'تي', 'هن', 'عشرة', 'طرا', 'حاشا', 'إياي', 'فلس', 'ورد',
    'فيفري', 'أكتوبر', 'حار', 'أربعمئة', 'سبعين', 'مكانكم', 'مثل', 'قرش', 'تحت', 'به', 'لكن', 'غادر', 'ي',
    'بعد', 'لهم', 'إياكم', 'إليكن', 'تلك', 'ز', 'ل', 'إى', 'نَّ', 'أف', 'طاء', 'هم', 'هَذا', 'ثلاث', 'ذلكن',
    'إزاء', 'ذو', 'حبذا', 'ثمان', 'نفس', 'ثمّة', 'معاذ', 'حقا', 'لك', 'تشرين', 'دواليك', 'اخلولق', 'ذوا',
    'بضع', 'فوق', 'فإذا', 'شرع', 'ث', 'إي', 'ذان', 'أوه', 'إلا', 'بي', 'أفعل به', 'يا', 'خمسمائة', 'وُشْكَانَ',
    'جعل', 'بخٍ', 'أضحى', 'هذا'
]

### 3.4 Load Data from Folders
Define a function to load texts and their corresponding labels from dataset folders.

In [None]:
# Function to load texts and labels from folders
def load_texts_from_folders(data_folder):
    texts = []
    labels = []
    categories = ['articlesEconomy', 'articlesLocal', 'articlesInternational', 'articlesSports', 'articlesReligion', 'articlesCulture']

    for category in categories:
        folder_path = os.path.join(data_folder, category)
        for filename in os.listdir(folder_path):
            file_path = os.path.join(folder_path, filename)
            with open(file_path, 'r', encoding='utf-8') as file:
                text = file.read()
                texts.append(text)
                labels.append(category)

    return texts, labels
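The loader assumes one subfolder per category under `data_folder`, with one UTF-8 text file per article, and uses the folder name as the label. The sketch below builds a hypothetical two-category corpus in a temporary directory to show that layout. `load_labeled_texts` is an illustrative variant, not the notebook's own function: it takes the category list as a parameter, sorts filenames for a reproducible order, and skips subfolders (for example a stray `.ipynb_checkpoints`), which `open()` would otherwise fail on.

```python
import os
import tempfile

def load_labeled_texts(data_folder, categories):
    """Read every file in <data_folder>/<category>/ and label it with the folder name."""
    texts, labels = [], []
    for category in categories:
        folder_path = os.path.join(data_folder, category)
        for filename in sorted(os.listdir(folder_path)):  # sorted: same order on every run
            file_path = os.path.join(folder_path, filename)
            if not os.path.isfile(file_path):  # skip subfolders such as .ipynb_checkpoints
                continue
            with open(file_path, 'r', encoding='utf-8') as f:
                texts.append(f.read())
            labels.append(category)
    return texts, labels

# Hypothetical two-category corpus built in a temporary directory.
samples = {
    'articlesEconomy': ['ارتفاع أسعار النفط', 'سوق الأسهم'],
    'articlesSports': ['مباراة كرة القدم'],
}
with tempfile.TemporaryDirectory() as root:
    for category, docs in samples.items():
        os.makedirs(os.path.join(root, category, '.ipynb_checkpoints'))  # a stray subfolder
        for i, doc in enumerate(docs):
            with open(os.path.join(root, category, f'{i}.txt'), 'w', encoding='utf-8') as f:
                f.write(doc)
    demo_texts, demo_labels = load_labeled_texts(root, list(samples))

print(demo_labels)
```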

### 3.5 Remove Stopwords
Define a function to remove stopwords from Arabic text.

In [None]:
# Function to remove stopwords
def remove_stopwords_arabic(text):
    words = text.split()
    filtered_words = [word for word in words if word not in arabic_stopwords]
    return ' '.join(filtered_words)
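Two details of `remove_stopwords_arabic` are worth noting. A membership test against a Python list scans the whole list for every word, while a set lookup is constant time on average. And because the text is split on whitespace, multi-word entries in `arabic_stopwords` such as 'لا سيما', 'ما برح', and 'ما انفك' can never equal a single token. The sketch below, using a hypothetical five-entry stopword subset and an illustrative function name, handles single words with a set and multi-word entries as token sequences.

```python
# Hypothetical subset of the stopword list, including two multi-word entries.
STOPWORDS = ['في', 'من', 'على', 'لا سيما', 'ما برح']

SINGLE_WORDS = {w for w in STOPWORDS if ' ' not in w}        # set: O(1) average lookups
PHRASES = [tuple(w.split()) for w in STOPWORDS if ' ' in w]  # multi-word entries as token tuples

def remove_stopwords_with_phrases(text):
    words = text.split()
    kept, i = [], 0
    while i < len(words):
        for phrase in PHRASES:
            if tuple(words[i:i + len(phrase)]) == phrase:  # whole multi-word stopword matched
                i += len(phrase)
                break
        else:  # no phrase starts here: fall back to the single-word set
            if words[i] not in SINGLE_WORDS:
                kept.append(words[i])
            i += 1
    return ' '.join(kept)

sentence = 'ذهب الطالب إلى المدرسة لا سيما في الصباح'
# Word-by-word filtering against the list, as in remove_stopwords_arabic: the phrase survives.
per_word = ' '.join(w for w in sentence.split() if w not in STOPWORDS)
with_phrases = remove_stopwords_with_phrases(sentence)
print(per_word)
print(with_phrases)
```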

### 3.6 Normalize Arabic Text
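Define a function to normalize Arabic text. It first joins the honorific phrase "صلى الله عليه وسلم" ("peace be upon him") with underscores so that it survives as a single token, then removes diacritics and digits, maps the hamza forms (إ أ آ ء ؤ ئ) to a bare alif (ا), removes punctuation, and collapses repeated whitespace.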

In [None]:
# Function to normalize text
def normalize_arabic_text(text):
    text = text.replace("صلى الله عليه وسلم", "صلى_الله_عليه_وسلم")
    text = re.sub(r'[\u0617-\u061A\u064B-\u0652]', '', text)  # Remove diacritics
    text = re.sub(r'[0-9٠-٩]+', '', text)  # Remove numbers
    text = re.sub(r'[إأآءؤئ]', 'ا', text)  # Normalize Hamza
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
    text = re.sub(r'\s+', ' ', text).strip()  # Remove extra spaces
    return text
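In `preprocess_for_bert`, `normalize_arabic_text` runs before `remove_stopwords_arabic`, but many entries in `arabic_stopwords` keep their hamza or diacritics (for example 'أنت', 'إلى', 'إنَّ'). After normalization the text contains 'انت', 'الى', and 'ان' instead, so those entries no longer match. The sketch below repeats the regex steps in a standalone `normalize` function and uses a hypothetical three-entry subset of the list to show the mismatch and one possible fix: normalizing the stopwords the same way as the text.

```python
import re

def normalize(text):
    """Same regex steps as normalize_arabic_text, without the honorific special case."""
    text = re.sub(r'[\u0617-\u061A\u064B-\u0652]', '', text)  # diacritics
    text = re.sub(r'[0-9٠-٩]+', '', text)                    # Western and Arabic-Indic digits
    text = re.sub(r'[إأآءؤئ]', 'ا', text)                     # hamza forms -> bare alif
    text = re.sub(r'[^\w\s]', '', text)                       # punctuation
    return re.sub(r'\s+', ' ', text).strip()                  # collapse whitespace

# Hypothetical subset of arabic_stopwords, spelled with hamza/diacritics as in the list.
RAW_STOPWORDS = ['أنت', 'إلى', 'إنَّ']
NORMALIZED_STOPWORDS = {normalize(w) for w in RAW_STOPWORDS}

tokens = normalize('إنَّ الطالبَ ذهب إلى المدرسة.').split()
kept_raw = [t for t in tokens if t not in RAW_STOPWORDS]         # nothing is removed
kept_norm = [t for t in tokens if t not in NORMALIZED_STOPWORDS]  # stopwords are removed
print(kept_raw)
print(kept_norm)
```

In the notebook itself, the equivalent would be building `{normalize_arabic_text(w) for w in arabic_stopwords}` once and checking each word against that set.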

### 3.7 Apply Stemming
Apply stemming to the normalized text using the `ISRIStemmer`.

In [None]:
# Function to apply stemming
def apply_stemming(text):
    words = text.split()
    stemmed_words = [stemmer.stem(word) for word in words]
    return ' '.join(stemmed_words)
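`apply_stemming` calls the stemmer once per word occurrence, even though the same words repeat across thousands of articles. A minimal sketch of memoizing a word-level stemmer with `functools.lru_cache` follows. `toy_stem` is a hypothetical stand-in for `ISRIStemmer().stem` (it only strips a leading 'ال') so the example runs without NLTK, and `make_cached_stem` and `stem_text` are illustrative names.

```python
from functools import lru_cache

def make_cached_stem(stem_fn, maxsize=None):
    """Wrap a word-level stemmer so each distinct word is stemmed only once."""
    return lru_cache(maxsize=maxsize)(stem_fn)

def stem_text(text, stem):
    return ' '.join(stem(word) for word in text.split())

# Hypothetical toy stemmer that records its calls; it strips a leading definite article.
calls = []
def toy_stem(word):
    calls.append(word)
    return word[2:] if word.startswith('ال') and len(word) > 3 else word

cached = make_cached_stem(toy_stem)
result = stem_text('الكتاب الكتاب الكتاب المدرسة', cached)
print(result, len(calls))  # the repeated word reaches toy_stem only once
```

With the notebook's objects this would be `cached_stem = make_cached_stem(stemmer.stem)` followed by `stem_text(text, cached_stem)`. Caching assumes the stemmer always returns the same stem for the same word, which should hold for a rule-based stemmer such as ISRI.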

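### 3.8 Preprocess Text for BERT
Define a function that prepares each text for BERT. It normalizes the text, removes stopwords, and applies stemming, then tokenizes the result with the AraBERT tokenizer into 512-token input IDs and attention masks, adding `[CLS]` and `[SEP]`, truncating longer texts, and padding shorter ones.

### 3.9 Load and Preprocess Data
Load the dataset and preprocess it for BERT. The cell below defines the preprocessing function described in Section 3.8 and then applies it to every text.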

In [None]:
# Function to preprocess the text for BERT
def preprocess_for_bert(texts):
    input_ids = []
    attention_masks = []

    for text in texts:
        # Normalize, remove stopwords, and apply stemming
        text = normalize_arabic_text(text)
        text = remove_stopwords_arabic(text)
        text = apply_stemming(text)

        # Tokenize using BERT tokenizer
        encoding = tokenizer.encode_plus(
            text,
            add_special_tokens=True,  # Add '[CLS]' and '[SEP]'
            max_length=512,
            padding='max_length',  # Pad to max_length
            return_attention_mask=True,  # Generate attention mask
            return_tensors='pt',  # Return PyTorch tensors
            truncation=True
        )

        input_ids.append(encoding['input_ids'])
        attention_masks.append(encoding['attention_mask'])

    return torch.cat(input_ids, dim=0), torch.cat(attention_masks, dim=0)

# Load texts and labels
texts, labels = load_texts_from_folders(data_folder)

# Preprocess texts for BERT
input_ids, attention_masks = preprocess_for_bert(texts)
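For a single text, `encode_plus` with `add_special_tokens=True`, `truncation=True`, `max_length=512`, and `padding='max_length'` yields `[CLS] tokens [SEP]` followed by padding, plus an attention mask of 1s over real tokens and 0s over padding. The plain-Python sketch below illustrates that layout on already-tokenized IDs. `encode_fixed_length` is a hypothetical helper, and the IDs 2 and 3 for `[CLS]` and `[SEP]` are placeholders (the real values are `tokenizer.cls_token_id` and `tokenizer.sep_token_id`). The padding ID 0 matches `padding_idx=0` in the model printout in Section 4.3.

```python
def encode_fixed_length(token_ids, max_length, cls_id, sep_id, pad_id=0):
    """Lay out one sequence as [CLS] tokens [SEP] [PAD]... with a matching attention mask."""
    body = token_ids[: max_length - 2]         # truncate from the right, leaving room for [CLS] and [SEP]
    ids = [cls_id] + body + [sep_id]
    mask = [1] * len(ids)                      # 1 = real token the model should attend to
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad  # 0 = padding, ignored by attention

short_ids, short_mask = encode_fixed_length([11, 12, 13], max_length=8, cls_id=2, sep_id=3)
long_ids, long_mask = encode_fixed_length(list(range(100, 110)), max_length=6, cls_id=2, sep_id=3)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Passing the whole list of preprocessed texts at once, as in `tokenizer(texts, max_length=512, padding='max_length', truncation=True, return_tensors='pt')`, should return the same `input_ids` and `attention_mask` tensors in one call instead of a Python loop.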

### 3.10 Split Data into Training and Validation Sets
Split the preprocessed data into training (80%) and validation (20%) sets. Input IDs, attention masks, and labels are split in one call so that every example keeps its own mask and label.

In [None]:
from sklearn.model_selection import train_test_split

# One call keeps inputs, masks, and labels aligned; two separate calls without a shared
# random_state would shuffle differently and pair inputs with the wrong masks.
train_inputs, val_inputs, train_masks, val_masks, train_labels, val_labels = train_test_split(
    input_ids, attention_masks, labels,
    test_size=0.2,
    random_state=42,  # arbitrary fixed seed for a reproducible split
)
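The split must apply one shuffle to the inputs, masks, and labels together. Two independent `train_test_split` calls without a shared `random_state` draw different permutations, so `train_masks[i]` would no longer belong to `train_inputs[i]`. The stdlib sketch below illustrates the shared-permutation idea. `aligned_split` is a hypothetical helper, and its rounding of the test share up mirrors what scikit-learn does for a fractional `test_size`.

```python
import math
import random

def aligned_split(*arrays, test_size=0.2, seed=42):
    """Split equal-length sequences with ONE shared shuffle.

    Returns [a_train, a_test, b_train, b_test, ...], the same order as train_test_split.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)   # one permutation, reused for every array
    n_test = math.ceil(n * test_size)  # round the test share up
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    out = []
    for a in arrays:
        out.append([a[i] for i in train_idx])
        out.append([a[i] for i in test_idx])
    return out

# Toy data whose names encode their row index, so any misalignment would be visible.
toy_docs = [f'doc{i}' for i in range(10)]
toy_masks = [f'mask{i}' for i in range(10)]
toy_labels = [i % 3 for i in range(10)]

tr_docs, te_docs, tr_masks, te_masks, tr_labels, te_labels = aligned_split(toy_docs, toy_masks, toy_labels)
print(list(zip(te_docs, te_masks, te_labels)))
```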


## 4. Model Training
### 4.1 Encode Labels and Create TensorDatasets
Encode the labels and create TensorDataset objects for both the training and validation sets.

In [None]:
import torch
from torch.utils.data import TensorDataset
from sklearn.preprocessing import LabelEncoder

# Initialize label encoder
label_encoder = LabelEncoder()

# Fit and transform the labels to integers
train_labels_encoded = label_encoder.fit_transform(train_labels)
val_labels_encoded = label_encoder.transform(val_labels)

# Convert encoded labels to tensors
train_labels_tensor = torch.tensor(train_labels_encoded)
val_labels_tensor = torch.tensor(val_labels_encoded)

# Create TensorDatasets
train_data = TensorDataset(train_inputs, train_masks, train_labels_tensor)
val_data = TensorDataset(val_inputs, val_masks, val_labels_tensor)
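`LabelEncoder` stores the sorted unique labels in `classes_`, and each label's integer ID is its position in that list. The plain-Python sketch below reproduces that mapping for the six folder names used in `load_texts_from_folders`, which is how the numeric classes 0 to 5 in the evaluation report can be read back as categories.

```python
# Plain-Python equivalent of LabelEncoder's mapping: sorted unique labels -> position.
categories = ['articlesEconomy', 'articlesLocal', 'articlesInternational',
              'articlesSports', 'articlesReligion', 'articlesCulture']
classes = sorted(set(categories))
label_to_id = {name: i for i, name in enumerate(classes)}

for name, i in label_to_id.items():
    print(i, name)
```

In the notebook, `label_encoder.classes_` holds this list, `label_encoder.inverse_transform` maps predicted IDs back to names, and `classification_report(..., target_names=label_encoder.classes_)` prints category names instead of numbers.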


### 4.2 Create DataLoader
Set up DataLoader for training and validation sets.

In [None]:
from torch.utils.data import DataLoader, RandomSampler, SequentialSampler

batch_size = 16  # Adjust this based on your hardware capabilities

# Create the DataLoader for the training set
train_dataloader = DataLoader(train_data, sampler=RandomSampler(train_data), batch_size=batch_size)

# Create the DataLoader for the validation set
validation_dataloader = DataLoader(val_data, sampler=SequentialSampler(val_data), batch_size=batch_size)
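Since `DataLoader` keeps the last partial batch by default (`drop_last=False`), one epoch takes ceil(N / batch_size) steps. The short calculation below works backward from the 913 steps per epoch in the training log to the range of training-set sizes consistent with it, and computes the number of validation batches for the 3,652 validation examples in the report.

```python
import math

batch_size = 16
train_steps = 913  # steps per epoch, read from the tqdm bars in the training log

# ceil(N / batch_size) == train_steps exactly when (train_steps - 1) * batch_size < N <= train_steps * batch_size
n_min = (train_steps - 1) * batch_size + 1
n_max = train_steps * batch_size
assert all(math.ceil(n / batch_size) == train_steps for n in range(n_min, n_max + 1))

val_steps = math.ceil(3652 / batch_size)  # 3,652 = total support in the validation report
print(n_min, n_max, val_steps)
```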


### 4.3 Load Pretrained BERT Model
Load the pre-trained Arabic BERT model for classification.

In [None]:
from transformers import BertForSequenceClassification
from torch.optim import AdamW  # transformers.AdamW is deprecated in favor of torch.optim.AdamW

model = BertForSequenceClassification.from_pretrained(
    "aubmindlab/bert-base-arabertv02",  # The pretrained Arabic BERT model
    num_labels=6,  # Number of classes in your classification task
    output_attentions=False,
    output_hidden_states=False
)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at aubmindlab/bert-base-arabertv02 and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(64000, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

### 4.4 Define Optimizer
Define an AdamW optimizer over all model parameters with a learning rate of 2e-5 and an epsilon of 1e-8.

In [None]:
optimizer = AdamW(model.parameters(), lr=2e-5, eps=1e-8)



### 4.5 Train the Model
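Set up the training loop and fine-tune the model on the training data for three epochs. Because `labels` are passed to the model, it computes the cross-entropy loss internally and returns it as `outputs.loss`.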

In [None]:
from tqdm import tqdm

epochs = 3  # Number of training epochs

# Training loop
for epoch in range(epochs):
    model.train()
    total_loss = 0

    for step, batch in tqdm(enumerate(train_dataloader), total=len(train_dataloader)):
        batch_inputs = batch[0].to(device)
        batch_masks = batch[1].to(device)
        batch_labels = batch[2].to(device)

        model.zero_grad()  # Reset gradients

        # Forward pass
        outputs = model(batch_inputs, token_type_ids=None, attention_mask=batch_masks, labels=batch_labels)
        loss = outputs.loss
        total_loss += loss.item()

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

    avg_train_loss = total_loss / len(train_dataloader)
    print(f"Epoch {epoch+1}, Training loss: {avg_train_loss}")


100%|██████████| 913/913 [04:40<00:00,  3.25it/s]


Epoch 1, Training loss: 0.4975082416980969


100%|██████████| 913/913 [04:38<00:00,  3.27it/s]


Epoch 2, Training loss: 0.23125725354681043


100%|██████████| 913/913 [04:38<00:00,  3.27it/s]

Epoch 3, Training loss: 0.16367082478405107





## 5. Model Evaluation
### 5.1 Evaluate the Model on Validation Set
Use the validation set to evaluate the performance of the trained model.

In [None]:
import numpy as np
from sklearn.metrics import classification_report

model.eval()
predictions, true_labels = [], []

# Iterate over the validation data
for batch in validation_dataloader:
    batch_inputs = batch[0].to(device)
    batch_masks = batch[1].to(device)
    batch_labels = batch[2].to(device)

    # Get model predictions without calculating gradients
    with torch.no_grad():
        outputs = model(batch_inputs, token_type_ids=None, attention_mask=batch_masks)

    logits = outputs.logits
    # Append the predictions and true labels for each batch
    predictions.append(logits.argmax(dim=-1).cpu().numpy())
    true_labels.append(batch_labels.cpu().numpy())

# Flatten the predictions and true labels
predictions_flat = np.concatenate(predictions, axis=0)
true_labels_flat = np.concatenate(true_labels, axis=0)

# Print classification report
print(classification_report(true_labels_flat, predictions_flat))


              precision    recall  f1-score   support

           0       0.93      0.85      0.89       515
           1       0.85      0.92      0.88       646
           2       0.98      0.90      0.94       313
           3       0.84      0.84      0.84       663
           4       0.98      1.00      0.99       700
           5       0.99      0.99      0.99       815

    accuracy                           0.92      3652
   macro avg       0.93      0.92      0.92      3652
weighted avg       0.93      0.92      0.92      3652



### 5.2 Save the Fine-Tuned Model
Save the fine-tuned BERT model and tokenizer for future use.

In [None]:
model.save_pretrained('./fine-tuned-model')
tokenizer.save_pretrained('./fine-tuned-model')

('./fine-tuned-model/tokenizer_config.json',
 './fine-tuned-model/special_tokens_map.json',
 './fine-tuned-model/vocab.txt',
 './fine-tuned-model/added_tokens.json')

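## Analysis of Text Classification
The classification model achieved an overall accuracy of 92% on the 3,652 validation articles, indicating strong performance in categorizing Arabic text into six categories. Because `LabelEncoder` assigns integer IDs in alphabetical order of the category names, classes 0 to 5 correspond to `articlesCulture`, `articlesEconomy`, `articlesInternational`, `articlesLocal`, `articlesReligion`, and `articlesSports`. Here’s a breakdown of the key metrics: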

**Precision**: Ranges from 84% to 99%. Class 3 has the lowest precision at 84%, with Class 1 close behind at 85%, so a share of the articles predicted as these classes actually belong to other categories. Class 5 has the highest precision at 99%.

**Recall**: Class 3 has the lowest recall at 84%, meaning the model missed about 16% of the articles in this category. Class 4 reached perfect recall (100%) and Class 5 near-perfect recall (99%).

**F1-Score**: The F1-scores range from 0.84 (Class 3) to 0.99 (Classes 4 and 5), showing that the model balances precision and recall well for most classes.

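**Support**: Every class has at least 313 validation examples (Class 2), so each per-class score is based on several hundred articles. The classes are not evenly sized, however: Class 5 has 815 examples, about 2.6 times as many as Class 2.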
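The last two rows of the report can be reproduced from the per-class rows: the macro average is the unweighted mean over classes, and the weighted average weights each class by its support. The check below uses the rounded F1-scores from the table, so it only matches the report to two decimals.

```python
# Per-class rows from the classification report (rounded to two decimals).
f1 = [0.89, 0.88, 0.94, 0.84, 0.99, 0.99]
support = [515, 646, 313, 663, 700, 815]

macro_f1 = sum(f1) / len(f1)                                          # every class counts equally
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / sum(support)  # larger classes count more
print(round(macro_f1, 2), round(weighted_f1, 2), sum(support))
```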

#### Observations:
- Class 3 performed the weakest, with the lowest precision, recall, and F1-score (0.84 each). With 663 validation examples it is not a minority class, so overlapping content with other categories seems a more plausible cause than data imbalance.
- Class 4 had perfect recall and Class 5 near-perfect recall (0.99), and both reached an F1-score of 0.99, indicating very accurate classification for these categories.
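Whether Class 3's errors come from overlap with particular categories can be checked with a confusion matrix, where row *t*, column *p* counts examples of true class *t* predicted as class *p*. The stdlib sketch below builds one and lists the classes a given class is most often mistaken for. `confusion_counts` and `top_confusions` are hypothetical helpers, and the labels are toy values, not the notebook's results.

```python
from collections import Counter

def confusion_counts(true, pred, n_classes):
    """Rows = true class, columns = predicted class."""
    counts = Counter(zip(true, pred))
    return [[counts[(t, p)] for p in range(n_classes)] for t in range(n_classes)]

def top_confusions(matrix, cls, k=2):
    """(count, predicted_class) pairs for the classes `cls` is most often mistaken for."""
    row = [(n, p) for p, n in enumerate(matrix[cls]) if p != cls and n > 0]
    return sorted(row, reverse=True)[:k]

# Hypothetical toy labels, NOT the notebook's validation results.
toy_true = [3, 3, 3, 3, 1, 1, 0]
toy_pred = [3, 1, 1, 0, 1, 3, 0]
m = confusion_counts(toy_true, toy_pred, n_classes=4)
print(m[3], top_confusions(m, 3))
```

With the notebook's arrays this would be `confusion_counts(true_labels_flat.tolist(), predictions_flat.tolist(), 6)`; `sklearn.metrics.confusion_matrix` computes the same matrix.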
#### Suggestions:
- Data augmentation or feature engineering could help improve performance on Class 3.
- Adjusting class weights in the loss function might further balance precision and recall across all categories.
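A minimal sketch of the class-weight suggestion, assuming the common "balanced" heuristic (the same formula scikit-learn uses for `class_weight='balanced'`): each class gets n_samples / (n_classes * count), so rarer classes weigh more in the loss. `balanced_class_weights` is a hypothetical helper, and the labels are toy values.

```python
from collections import Counter

def balanced_class_weights(labels, n_classes):
    """weight_c = n_samples / (n_classes * count_c); rare classes get weights above 1."""
    counts = Counter(labels)
    missing = [c for c in range(n_classes) if counts[c] == 0]
    if missing:
        raise ValueError(f"no examples for classes {missing}")
    n = len(labels)
    return [n / (n_classes * counts[c]) for c in range(n_classes)]

toy_labels = [0] * 6 + [1] * 2 + [2] * 4  # hypothetical encoded labels
weights = balanced_class_weights(toy_labels, n_classes=3)
print([round(w, 3) for w in weights])
```

To use such weights in the training loop, one option is to compute them from `train_labels_encoded.tolist()`, build `loss_fn = torch.nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float, device=device))`, and use `loss_fn(outputs.logits, batch_labels)` in place of `outputs.loss`. Given the validation supports (313 to 815 per class), the imbalance is moderate, so the effect may be small.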