<a href="https://colab.research.google.com/github/MoncefBahja/Deep-Learning-for-Arabic-Sentiment-Analysis/blob/main/Fine_Tuning_BERT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Fine-Tuning BERT for Sentiment Analysis** 🚀

Created by [moncef bahja](https://github.com/MoncefBahja) 🥰

---
Hey there! 👋 Today, we’re unleashing the power of **BERT**—specifically, the **[google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)** model—to tackle sentiment analysis like a pro! 💡  

### What’s the Task?  
We’re working with the **[Arabic 100k Reviews](https://www.kaggle.com/datasets/abedkhooli/arabic-100k-reviews)** dataset—a compilation of 100k (well, 99,999) Arabic reviews from hotels, books, movies, products, and airlines. Each review falls into one of three sentiment categories:  
1️⃣ **Positive** 🌟  
2️⃣ **Negative** 😠  
3️⃣ **Mixed** 🤔  

#### 📂 **About the Dataset**:  
- **Content**: Each review includes a sentiment label (`Positive`, `Negative`, or `Mixed`) and the review text, **separated by tabs** (TSV format).  
- **Processing**: Reviews were cleaned of diacritics and non-Arabic characters, ensuring high-quality data.  
- **Balanced Classes**: The classes are balanced and there are no duplicate reviews, making it a robust dataset for classification tasks.  
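
To make the cleaning step above concrete, here is a minimal sketch of what "removing diacritics and non-Arabic characters" can look like. This is not the dataset author's actual pipeline: the helper names `strip_diacritics` and `keep_arabic_only` and the exact Unicode ranges are assumptions chosen for illustration.

```python
import re

# Assumption: "diacritics" means the common Arabic tashkeel marks
# (U+064B..U+0652) plus the superscript alef (U+0670).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

# Assumption: "Arabic characters" means the basic Arabic letter range;
# whitespace is kept so that words stay separated.
_NON_ARABIC = re.compile("[^\u0621-\u064A\\s]")


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritic marks, leaving the base letters."""
    return _DIACRITICS.sub("", text)


def keep_arabic_only(text: str) -> str:
    """Drop diacritics and non-Arabic characters, then collapse spaces."""
    text = strip_diacritics(text)
    text = _NON_ARABIC.sub(" ", text)
    return " ".join(text.split())


# A fully vowelled Arabic word followed by Latin text and digits,
# written with escapes so every code point is explicit.
sample = "\u0645\u064E\u0631\u0652\u062D\u064E\u0628\u064B\u0627 hello 123"
cleaned = keep_arabic_only(sample)
print(cleaned)
```

Only the five base letters of the Arabic word survive; the vowel marks, the Latin word, and the digits are dropped.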

### Why Fine-Tune BERT?  
BERT has a deep contextual understanding of text, but by fine-tuning it with our dataset, we’ll train it to recognize sentiment in Arabic reviews with precision.  

### What to Expect?  
1️⃣ **Load and explore the dataset** to understand what we’re working with.  
2️⃣ **Prepare BERT for sentiment analysis**.  
3️⃣ **Train and evaluate** our custom sentiment classifier.  

Let’s turn these reviews into actionable insights! 🎉  


---

## **1- Testing Results Before Fine-Tuning** 🛠️

Before fine-tuning the BERT model on our sentiment analysis dataset, we’ll test the pre-trained version as-is. This helps us:

* Understand the baseline performance of the model.
* Compare results after we fine-tune the model.


### Step 1: Install Required Libraries
To fine-tune BERT, we need the Hugging Face transformers library, which provides pre-trained models, tokenizers, and utilities for NLP tasks. Install it using pip.

In [None]:
!pip install transformers





---


### Step 2: Load BERT Tokenizer and Model


In [None]:
from transformers import BertTokenizer, TFBertForSequenceClassification

model_name = "google-bert/bert-base-uncased"

tokenizer = BertTokenizer.from_pretrained(model_name)
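# num_labels=3 gives the classification head one output per sentiment class
# (the default head has only two outputs)
model = TFBertForSequenceClassification.from_pretrained(model_name, num_labels=3)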

All PyTorch model weights were used when initializing TFBertForSequenceClassification.

Some weights or buffers of the TF 2.0 model TFBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


We’re using the `google-bert/bert-base-uncased` model, designed for English tasks. Here’s what we’re loading:

* `BertTokenizer`: Converts input text into token IDs understandable by the BERT model.
* `TFBertForSequenceClassification`: A BERT model tailored for classification tasks (e.g., sentiment analysis).

Key Parameters:
* `from_pretrained`: Downloads the pre-trained weights of the specified model.



---

### Step 3: Mount Google Drive
We’re working with a dataset stored in Google Drive. Use this code to mount the drive and access the file system.

In [4]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive




---

### Step 4: Load the Dataset
Let’s load the **Arabic 100k Reviews dataset**, a TSV file with two columns:

* `label`: Sentiment category (*Positive, Negative, or Mixed*).
* `text`: Review text in Arabic.
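
As a quick illustration of the two-column layout, the sketch below parses a tiny in-memory TSV with Python's `csv` module. The three rows are made-up placeholders, not real reviews, and `sample_tsv` is a hypothetical stand-in for the real file.

```python
import csv
import io
from collections import Counter

# Hypothetical three-row sample in the same layout as the real file:
# a header line, then one "label<TAB>text" pair per line.
sample_tsv = (
    "label\ttext\n"
    "Positive\tplaceholder review one\n"
    "Negative\tplaceholder review two\n"
    "Mixed\tplaceholder review three\n"
)

# QUOTE_NONE treats quote characters inside reviews as ordinary text
reader = csv.DictReader(io.StringIO(sample_tsv), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = list(reader)

# Count how many reviews fall into each sentiment class
label_counts = Counter(row["label"] for row in rows)
print(label_counts)
```

`pd.read_csv(..., sep='\t')` does the same job in one call and returns a DataFrame with the `label` and `text` columns.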

In [None]:
file_path = "/content/drive/MyDrive/Colab Notebooks/05-Fine-Tuning-BERT/ar_reviews_100k.tsv"

In [5]:
import pandas as pd

df = pd.read_csv(file_path, sep='\t')
df


In [None]:
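# Assumption: the cell that created `random_5_rows` is not shown in this notebook,
# so we draw five random rows here (the original rows may have been different).
random_5_rows = df.sample(n=5)

# Copy the five rows to avoid modifying the original dataset
five_rows_copy = random_5_rows.copy()

# Map class indices to labels. The classification head is still untrained here,
# so this index order is a convention for the baseline test only.
custom_mapping = {
    0: "Negative",
    1: "Mixed",
    2: "Positive"
}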



---


### Step 7: Test the BERT Model on the Dataset
We’re feeding each row to the BERT model to predict sentiments. Here’s what happens step-by-step:

Key Steps:
1. Tokenization: Converts the text into tokens using the `tokenizer`.
2. Prediction: Passes tokenized inputs through the model to get raw outputs (`logits`).
3. Classification: Maps logits to class indices using `argmax` from **NumPy**.
4. Custom Mapping: Converts class indices into human-readable labels.

In [None]:
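# Test the pre-trained BERT model on the five sampled reviews
import numpy as np

# Collect one predicted label per review
predicted_sentiments_list = []

# Iterate over the review text (not the label column)
for review in five_rows_copy['text']:
  # Tokenize into TensorFlow tensors, truncating at BERT's 512-token limit
  inputs = tokenizer(review, return_tensors="tf", truncation=True, max_length=512)

  # Forward pass; the output holds one raw score (logit) per class
  outputs = model(**inputs)
  logits = outputs.logits.numpy()

  # Index of the highest-scoring class for this single review
  predicted_class = int(np.argmax(logits, axis=-1)[0])

  predicted_sentiment = custom_mapping[predicted_class]

  # Append to sentiments list
  predicted_sentiments_list.append(predicted_sentiment)

print(predicted_sentiments_list)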

Functions and Parameters:
* `tokenizer()`: Tokenizes input text.
* `logits`: Raw scores from the model for each sentiment class.
* `np.argmax(logits)`: Finds the class with the highest score.
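
To show how logits turn into a label, here is a small pure-Python sketch of the softmax and argmax steps. The logits are hypothetical numbers, not real model output, and `classify` is an illustrative helper rather than part of `transformers`.

```python
import math

# Same label order as the custom mapping used above
custom_mapping = {0: "Negative", 1: "Mixed", 2: "Positive"}


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return custom_mapping[best], probs[best]


# Hypothetical logits for one review (not real model output)
label, prob = classify([0.2, -1.3, 1.7])
print(label, round(prob, 3))
```

Softmax does not change which class wins, so `argmax` on the raw logits gives the same label; the probabilities are only useful when you also want a confidence score.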



---


### Step 8: Compare Predicted Sentiments with Original Labels
Finally, let’s add a new column to the copied DataFrame (**five_rows_copy**) to compare the original labels with the BERT model’s predictions.

* `Bert_Sentiments`: Column containing predicted sentiments.

In [None]:
# Comparing Predicted Sentiments with Original Labels

five_rows_copy['Bert_Sentiments'] = predicted_sentiments_list
five_rows_copy



---


## **Limitations: BERT Model and Arabic Data ?!**🤔
The pre-trained BERT model we’re using (`google-bert/bert-base-uncased`) was trained **primarily on English text**, so it is unlikely to perform well on Arabic data. 🧐 BERT’s architecture is powerful, but this checkpoint was never fine-tuned for Arabic tasks like sentiment analysis, and its classification layer is newly initialized (see the loading message in Step 2). As a result, the baseline predictions on Arabic reviews **are not reliable or accurate.**


**Why Fine-Tune the Model?**

By fine-tuning the model using the Arabic 100k Reviews dataset, we can teach the model to better understand the nuances of sentiment in Arabic text. Fine-tuning allows the model to specialize in recognizing the sentiment of Arabic reviews, improving its performance and reliability for this specific task. 🚀



---


## **2- Testing Results After Fine-Tuning 🛠️**
In this part, we fine-tune our BERT model on the Arabic 100k Reviews dataset and then test it on new data! This will help us:

* Evaluate how well the fine-tuned model performs on Arabic sentiment analysis.
* Compare its performance against the baseline (pre-trained model) and see the improvement.

We’ll run the fine-tuned model on a set of reviews and analyze the results to see how much better it handles the Arabic sentiment classification task! 💡

### Step 1: Preparing Dataset for Training




#### **A- Sampling Dataset (if necessary)**

In [None]:
df.shape


* Since our dataset consists of **99,999 rows**, training on the entire dataset could **take a long time** ⏳.

* Additionally, there's always a risk of facing errors during lengthy training, which can be frustrating.
* To avoid this, **we’ll sample 30,000 rows** to speed up the process and lessen potential issues.

In [None]:
df = df.sample(n=30000, random_state=42)

print(f"Total Number of Rows and Columns After Sampling: {df.shape}")

* `random_state=42`: It ensures that the sampling process is reproducible, meaning **we'll get the same 30,000 rows every time we run the code**. This is important to ensure consistent results during training and evaluation.
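
The effect of a fixed seed can be sketched without pandas. The example below uses Python's `random` module as a stand-in for the sampler behind `df.sample`, so the selected positions will differ from what pandas picks; `sample_rows` is a hypothetical helper.

```python
import random


def sample_rows(n_rows, n_sample, seed):
    """Pick n_sample distinct row positions out of n_rows, driven by a seed."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_rows), n_sample))


first = sample_rows(99_999, 30_000, seed=42)
second = sample_rows(99_999, 30_000, seed=42)
other = sample_rows(99_999, 30_000, seed=7)

print(first == second)  # same seed, same rows
print(first == other)   # different seed, different rows
```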



---


#### **B- Visualizing Our Dataset** (Optional)

Before we dive into fine-tuning our model, it's essential to understand the distribution of review lengths (**the text column**) in our dataset. Since transformer models like BERT have a limit on the number of tokens (**512 Tokens**) they can process, we need to make sure that our data fits within this limit.

---

**Why Visualize Review Lengths?**
- **Understand the dataset better**: Visualizing the text lengths helps us see whether most reviews (texts) are short or long and decide how to handle them.

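- **Decide the right `maxlen` for the model**: BERT has a token limit (512 tokens). If your input exceeds 512 tokens, it will either throw an error (if truncation isn’t set) or be truncated to the first 512 tokens. By visualizing the dataset, we can choose a reasonable `maxlen` for most cases. Note that the histogram below counts whitespace-separated words, and a single word is often split into several tokens, so token counts will be higher than word counts.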

In [None]:
df.text.apply(lambda x: len(x.split())).plot(kind='hist')

---
**What Does This Output Mean?! 🤔**

Looking at the histogram, you can observe that:

1.    **Most reviews are short:** A majority of reviews in the dataset fall within a small word count range (likely less than 100 words).


2.  **Few reviews are very long (outliers):** There’s a tail at the higher end of the histogram, indicating that a small number of reviews have a large number of words.


---
**Next Steps ?! 🤨**

Depending on your analysis, you may want to:

1. **Remove reviews with too few words** that do not contribute much context.

2. **Truncate or split long reviews** to ensure they fit within the model's tokenization limits (e.g., BERT has a token limit of 512 tokens per input).

3. **Set the `maxlen` parameter**: It specifies the maximum number of tokens that BERT will process for each review. This is crucial because BERT has a fixed limit on the number of tokens it can handle.


---
**How to Decide the Right `maxlen` ?!** 🔍

* **Use shorter `maxlen`:** If your text is generally short (e.g., tweets or short reviews), using a smaller `maxlen` (like 100 or 150) could work.

* **Use longer `maxlen`:** If your text is generally long (e.g., long articles or detailed reviews), you might need to go up to 512 **(the max token limit of BERT)**.

* **Test and Experiment:** If you're unsure, you can experiment with different values of `maxlen`. Start with something reasonable like **200** and test the performance of the model. You can adjust it based on results.

* **Memory & Speed:** Shorter `maxlen` (e.g., **200**) leads to faster processing and lower memory usage. Longer `maxlen` (e.g., **512**) requires more computational resources.

* **Dataset Analysis:** Visualize the token distribution (as we did earlier) to pick a reasonable `maxlen` that covers **most of your data** without truncating too much.
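
One way to put the last point into numbers is to check what share of reviews fits under each candidate `maxlen`. The lengths below are hypothetical; in the notebook they could come from `df.text.apply(lambda x: len(x.split()))`, keeping in mind that word counts undercount tokens.

```python
def coverage(lengths, maxlen):
    """Fraction of reviews whose length is at most maxlen."""
    return sum(1 for n in lengths if n <= maxlen) / len(lengths)


# Hypothetical word counts for ten reviews (not taken from the dataset)
lengths = [12, 25, 31, 40, 58, 77, 96, 150, 210, 480]

for candidate in (50, 100, 200, 512):
    print(candidate, coverage(lengths, candidate))
```

A reasonable rule of thumb is to pick the smallest `maxlen` whose coverage is high enough for your task, since every extra position costs memory and time.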



---
**What If We Choose a Different `maxlen` ?!**🤔

* **Too Short (e.g., `maxlen=50`):** If we use a very short limit, like 50, we risk losing important context from longer reviews. This could negatively impact the model's accuracy, especially for reviews that need more words to express sentiment.

* **Too Long (e.g., `maxlen=512`):** If we set `maxlen` too high (such as 512), the model might spend more time processing longer sequences that aren't needed for most of the reviews. This can slow down training and inference without improving performance.


---
**🔴 Final Decision on `maxlen` ?!**


- Setting `maxlen=200` is a reasonable choice.



**Why 200 Tokens?**


If we set `maxlen=200`, we’re limiting the input to **200 tokens**, even though BERT can handle up to **512 tokens**. This is useful when you:

1. **Want to reduce memory usage or training time.**

2. **Know that longer inputs aren't critical for your task.**





---


#### **C- Converting the dataset into a format that BERT (or other ML models) can understand.**

* **Text Data**: BERT and ML models don’t directly understand raw text. We need to tokenize it into numerical representations (tokens) that the model can process.

* **Labels**: Similarly, models require numerical values for the target labels (e.g., sentiment categories like "Mixed," "Positive," and "Negative") instead of text.

------------------------
**How Can We Do that ❓❗**
1. **Numerically Encode the Labels:** Use a library like **scikit-learn's `LabelEncoder`** to map text-based labels (e.g., "Mixed") to numerical values (e.g., 0, 1, ...).


2. **Split the Data:** Divide the dataset into training and validation sets to evaluate the model during training.

3. **Tokenize the Text:** Use BERT’s tokenizer to break sentences down into tokens and convert them into numerical IDs.





In [None]:
import numpy as np

for label in np.unique(df['label']):
  print(label)

* `np.unique(df['label'])`: extracts all unique values in the `label` column of the DataFrame.


In [None]:
from sklearn.preprocessing import LabelEncoder

LE = LabelEncoder()

df['label'] = LE.fit_transform(df['label'])
df.head()

* `LabelEncoder()`: This scikit-learn class converts categorical labels into numeric ones.

* `LE.fit_transform(df['label'])`: It maps the text labels to numeric values. For example:
  * "Mixed" → 0
  * "Negative" → 1
  * "Positive" → 2

* The transformed labels are then stored back in the **`df['label']` column**.
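
The mapping above follows from the fact that `LabelEncoder` assigns integers to the distinct labels in sorted order. A minimal pure-Python sketch of that idea (the helper `fit_label_mapping` is hypothetical, not part of scikit-learn):

```python
def fit_label_mapping(labels):
    """Map each distinct label to an integer, in sorted order."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


labels = ["Positive", "Mixed", "Negative", "Positive", "Mixed"]
mapping = fit_label_mapping(labels)
encoded = [mapping[label] for label in labels]

print(mapping)
print(encoded)
```

With the real encoder, `LE.classes_` shows the same ordering and `LE.inverse_transform` converts the numbers back into text labels.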

In [None]:
# Check after converting labels into numerics
for label in np.unique(df['label']):
  print(label)

### **Step 2: Splitting the Data for Training and Validation:**
To train and evaluate our model, we need to split the dataset into:

* **Training Set**: Used to teach the model.
* **Validation Set**: Used to evaluate the model's performance.



In [None]:
from sklearn.model_selection import train_test_split

train, val = train_test_split(df, test_size=0.2, random_state=42)

* `train_test_split()`: Splits the data set into training set and validation set.

* `test_size=0.2`: Specifies that 20% of the dataset will be used as the validation set.
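
To see what the split does to the row counts, here is a simplified sketch that shuffles row positions and cuts off 20% for validation. It is not scikit-learn's implementation, and `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n_rows, test_size, seed):
    """Shuffle row positions and cut them into train and validation parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_val = round(n_rows * test_size)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(30_000, test_size=0.2, seed=42)
print(len(train_idx), len(val_idx))
```

With 30,000 rows and `test_size=0.2`, this gives 24,000 training rows and 6,000 validation rows. If you want both parts to keep the same class proportions, `train_test_split` also accepts a `stratify` argument (for example `stratify=df['label']`).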



In [None]:
# Check the new shapes
print(train.shape)
print(val.shape)

In [None]:
# Let's check our sets
train.head()

#### **Resetting the Indexes 🔄 :**
When you split the data, the original row indices from the dataframe are retained. Resetting the index ensures that the new subsets start with sequential indices.



In [None]:
# Let's reset the index values
train.reset_index(drop=True, inplace=True)
val.reset_index(drop=True, inplace=True)

train.head()

* `drop=True`: Drops the old index column instead of adding it back as a new column.

* `inplace=True`: Updates the dataframe directly instead of creating a copy.


In [None]:
train.head()

In [None]:
val.head()

___
### Step 3: Converting Text and Labels into Arrays

Before feeding the data into the ML model, we need to convert it into **arrays**.
This conversion is necessary because ML models typically work with arrays rather than DataFrame objects.

In [None]:
# Convert text and label columns of the training data into NumPy arrays
x_train = train['text'].to_numpy()
y_train = train['label'].to_numpy()

# Apply for the validation data also
x_test = val['text'].to_numpy()
y_test = val['label'].to_numpy()

* `.to_numpy()`: Converts the text and label data (stored in pandas DataFrames) into NumPy arrays. This conversion is necessary for feeding the data into ML models, which **typically work with arrays rather than DataFrame objects**.


In [None]:
print(f"Text column of the Train Data:\n {x_train}")
print(f"\nLabel column of the Train Data:\n {y_train}")
print(type(x_train))
print(type(y_train))

In [None]:
print(f"Text column of the Validation (Test) Data:\n {x_test}")
print(f"\nLabel column of the Validation (Test) Data:\n {y_test}")

___
### Step 4: Training the model

Now, we're ready to feed the data into the model.

#### **A - Install Required Libraries**
We're going to use the **`ktrain` library** for training our model.

* `ktrain`: This is a high-level wrapper for building and deploying machine learning models. It simplifies working with models like **BERT** for tasks such as classification, text processing, and fine-tuning. **`ktrain` is built on top of TensorFlow** (and other libraries), so you need it for easier interaction with BERT and other models.


* `tensorflow`: This is a deep learning framework that provides the foundation for training and running models. **`ktrain` relies on TensorFlow** to actually execute model training, prediction, and evaluation. Without TensorFlow installed, ktrain won't be able to perform its tasks because it's built on top of it.

In [None]:
!pip install ktrain==0.32.3

In [None]:
!pip install tensorflow==2.15.0

#### **B - Creating a Transformer Object**

The `ktrain.text.Transformer` class allows us to define and preprocess the data for a specific model.


In [None]:
from ktrain import text

# Specify the model name, categories and maxlen for the text transformer object.
model_name = 'bert-base-uncased'
class_names = ['Mixed', 'Negative', 'Positive']

text_transformer = text.Transformer(model_name=model_name, class_names=class_names, maxlen=200)

* `text.Transformer` signature: `Transformer(model_name, maxlen: int = 128, class_names: Any = [], classes: Any = [], batch_size: Any | None = None, use_with_learner: bool = True)`

* `model_name='bert-base-uncased'`: Refers to the pre-trained BERT model that will be fine-tuned.

* `class_names=['Mixed', 'Negative', 'Positive']`: Defines the sentiment classes for our task.

* `maxlen=200`: Sets the maximum sequence length for the text data. Reviews longer than this will be truncated, and shorter reviews will be padded.
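
The truncate-and-pad behaviour can be sketched in a few lines. This is a simplified illustration, not ktrain's or the tokenizer's actual code: real tokenizers also keep the special tokens in place and produce an attention mask. The token IDs and the `pad_id=0` default are assumptions for the example.

```python
def pad_or_truncate(token_ids, maxlen, pad_id=0):
    """Force a list of token IDs to exactly maxlen entries."""
    clipped = token_ids[:maxlen]                  # drop anything past maxlen
    padding = [pad_id] * (maxlen - len(clipped))  # fill the remainder
    return clipped + padding


short_review = [101, 7592, 102]    # hypothetical token IDs
long_review = list(range(1, 301))  # 300 hypothetical token IDs

print(len(pad_or_truncate(short_review, maxlen=200)))
print(len(pad_or_truncate(long_review, maxlen=200)))
```

Both calls return sequences of length 200: the short review gains 197 padding IDs, and the long one loses its last 100 tokens.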


#### **C - Building and Preprocessing the Data**
We need to preprocess the data before feeding it into the BERT model.




In [None]:
import ktrain

model = text_transformer.get_classifier()
train_data = text_transformer.preprocess_train(x_train, y_train)
val_data = text_transformer.preprocess_test(x_test, y_test)

learner = ktrain.get_learner(model=model, train_data=train_data, val_data=val_data, batch_size=16)

* `ktrain.get_learner()`: Returns a Learner instance that can be used to tune and train Keras models.

* **`get_learner`** signature: `get_learner(model: Any, train_data: Any | None = None, val_data: Any | None = None, batch_size: int = U.DEFAULT_BS, eval_batch_size: int = U.DEFAULT_BS, workers: int = 1, use_multiprocessing: bool = False) -> Any`



* `get_classifier()`: Returns the BERT model ready for fine-tuning.

* `text_transformer.preprocess_train(x_train, y_train)`: This method processes the training data (x_train and y_train), tokenizes the text (converts it into tokens that the model can understand), and encodes the labels.

* `text_transformer.preprocess_test(x_test, y_test)`: Similarly, this method processes the validation data (x_test and y_test), preparing it in the same format as the training data for evaluation.

#### **D - Training The Model**

Let’s fine-tune BERT using the **one-cycle learning rate policy**, which raises the learning rate toward a peak and then lowers it again over the course of training, often giving faster convergence.


In [None]:
import time

start = time.time()
learner.fit_onecycle(lr=2e-5, epochs=3)
end = time.time()

print(f"Total Training Time: {round(end - start)} seconds")

* `fit_onecycle(lr=2e-5, epochs=3)`: Trains the model for 3 epochs using a learning rate of 2e-5 (0.00002).

* `2e-5`: The learning rate used for training. This value is small, as fine-tuning pre-trained models generally requires smaller learning rates to avoid damaging the model's pre-trained knowledge.

* `epochs=3`: The number of epochs, i.e., how many times the entire training dataset will be processed by the model.
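
For intuition about the one-cycle idea, here is a simplified triangular schedule: the learning rate climbs linearly to `max_lr` during the first half of training and falls back during the second half. This is a sketch under assumptions, not ktrain's exact implementation; the function `triangular_lr` and the `start_factor=0.1` floor are hypothetical choices.

```python
def triangular_lr(step, total_steps, max_lr, start_factor=0.1):
    """Learning rate at a given step of a simple triangular schedule."""
    min_lr = max_lr * start_factor
    half = total_steps / 2
    if step <= half:
        progress = step / half                  # 0 -> 1 while warming up
    else:
        progress = (total_steps - step) / half  # 1 -> 0 while cooling down
    return min_lr + (max_lr - min_lr) * progress


# 24,000 training rows / batch size 16 = 1,500 steps per epoch, 3 epochs
total_steps = 1500 * 3
for step in (0, total_steps // 2, total_steps):
    print(step, triangular_lr(step, total_steps, max_lr=2e-5))
```

The peak value corresponds to the `lr=2e-5` argument passed to `fit_onecycle`; the rate at the start and end of training is lower than that peak.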




---
### Step 5: Validating the Model

In [None]:
learner.validate(class_names=class_names)

* `learner.validate(class_names=class_names)`: evaluates the model’s performance on the validation dataset and prints out metrics such as accuracy, precision, recall, and F1-score for each of the sentiment categories *(Mixed, Negative, Positive)*.
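
To clarify what the per-class metrics mean, the sketch below computes precision, recall, and F1 by hand. The labels and predictions are hypothetical, not results from this notebook, and `per_class_metrics` is an illustrative helper rather than a ktrain function.

```python
def per_class_metrics(y_true, y_pred, class_names):
    """Compute precision, recall and F1 for each class."""
    report = {}
    for name in class_names:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == name and p == name)
        fp = sum(1 for t, p in pairs if t != name and p == name)
        fn = sum(1 for t, p in pairs if t == name and p != name)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    return report


class_names = ["Mixed", "Negative", "Positive"]
# Hypothetical labels and predictions, not results from this notebook
y_true = ["Positive", "Positive", "Negative", "Mixed", "Mixed", "Negative"]
y_pred = ["Positive", "Mixed", "Negative", "Mixed", "Positive", "Negative"]

report = per_class_metrics(y_true, y_pred, class_names)
for name in class_names:
    print(name, report[name])
```

Precision asks "of the reviews predicted as this class, how many were right?", while recall asks "of the reviews that truly belong to this class, how many did we find?". F1 is the harmonic mean of the two.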



---
### Step 6: Saving the Fine-Tuned Model


In [None]:
predictor = ktrain.get_predictor(model=learner.model, preproc=text_transformer)

# Saving the fine-tuned model
predictor.save("/content/drive/MyDrive/Colab Notebooks/05-Fine-Tuning-BERT/Finetuned_Arabic_Sentiment_BERT")

* `ktrain.get_predictor(model=learner.model, preproc=text_transformer)`: Creates a predictor object using the trained model (`learner.model`) and the pre-processing pipeline (`preproc=text_transformer`). This allows you to easily use the model for predictions.

* `.save()`: Saves the predictor object, including the model and pre-processing steps, to the specified directory. This makes it easy to reload the model later.




---
### Step 7: Loading the Saved Model


In [None]:
from ktrain import load_predictor

# Load the saved predictor
predictor = load_predictor('/content/drive/MyDrive/Colab Notebooks/05-Fine-Tuning-BERT/Finetuned_Arabic_Sentiment_BERT')

### Step 8: Testing the Model with examples

In [None]:
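# English glosses (approximate):
#   1. "It's all really wonderful, may God bless you"
#   2. "Fear God in how you treat us, enough price hikes, salaries are at rock bottom"
example = ["كله رائع بجد ربنا يكرمك", "اتقوا الله فينا بكفي رفع اسعار الرواتب بالحضيض"]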

predictions = predictor.predict(example)
predictions

**Key Takeaway:**
* These examples show that the model effectively handles the complexities of sentiment analysis in Arabic, distinguishing positive remarks from more nuanced or mixed emotions. This demonstrates the power of fine-tuning BERT for specific tasks and datasets. 🎉

---
### Step 9: Testing the Model on Our Dataset

Now, let's test the fine-tuned model on our dataset to evaluate its performance.

In [None]:
random_5_rows

Let's copy these random five rows (To Avoid Modifying the Original DataSet)

In [None]:
random_5_rows_copy = random_5_rows.copy()

In [None]:
# Create an empty list for the predicted sentiments
predicted_sentiments = []

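# `review` avoids shadowing the `text` module imported from ktrain
for review in random_5_rows_copy['text']:
  # Predict the sentiment label for a single review
  prediction = predictor.predict(review)

  predicted_sentiments.append(prediction)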

print(predicted_sentiments)

### Step 10: Compare the Predicted Sentiments with the Original


In [None]:
random_5_rows_copy['Predicted_Sentiments'] = predicted_sentiments
random_5_rows_copy

### **Final Observations** 🌟  

Upon review, the model's predictions often seem **more accurate** than the dataset labels.  

For example:  
- **Example 26002**: Labeled **Positive**, but the model correctly identified it as **Mixed** due to both positive and negative remarks.  

- **Example 19864**: Labeled **Positive**, yet the model’s prediction of **Mixed** better reflects the review's tone, which included significant complaints.

---

### **Conclusion**:  
The model captures nuances better than the dataset labels, showing its strength in understanding complex sentiments. This highlights the power of fine-tuned BERT for sentiment analysis! 🚀


---

### **About the Author** 👨‍💻  

This fine-tuning project was created by **Khaled Soudy**. Check out more of my work on [GitHub](https://github.com/khaledsoudy-1)! 😊  

---