<a href="https://colab.research.google.com/github/AlexKalll/Unsupervised-Machine-Learning/blob/main/Task_3_Sentiment_analysis_to_consumer_feedback_on_sustainable_products.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task-3: Sentiment analysis to consumer feedback on sustainable products
- Sentiment analysis for sustainable product design with a fine-tuned BERT model

[**Click here to view the project specifications**](https://docs.google.com/presentation/d/1wbR8axNdw4NjJMoSTXLECKzepvHMfJ1KTMbHZ9HQiBw/edit?slide=id.p#slide=id.p)

#### Contents
1. [Introduction](#scrollTo=iks43xQQx8Er&line=3&uniqifier=1)
2. [Setup and Dataset Loading: NLTK's twitter_samples Corpus](#scrollTo=YVbhh450xphN&line=5&uniqifier=1)
3. [Dataset Exploration and Preprocessing for BERT](#scrollTo=VUB-MxxdLmG4&line=1&uniqifier=1)
4. [Initial BERT Model Training and Evaluation](#scrollTo=P-ZEjCetOJLd&line=3&uniqifier=1)
5. [Initial Sentiment Analysis with Custom Sentences](#scrollTo=ur9lBWpyPrAI&line=2&uniqifier=1)
6. [Synthetic Data Generation for Sustainability Sentiments](#scrollTo=YGwz07AjR6gC&line=21&uniqifier=1)
7. [Fine-Tuning BERT on Synthetic Sustainability Data](#scrollTo=JvRhPcZlX9bi&line=3&uniqifier=1)
8. [Re-analysis of Custom Sentences with Fine-Tuned Model and Comparison](#scrollTo=LeoJUOJsaOpU&line=6&uniqifier=1)
9. [Conclusion and Insights](#scrollTo=Hz1x-rhkbHFN&line=18&uniqifier=1)

### 1\. Introduction

This project focuses on leveraging natural language processing (NLP) through sentiment analysis to enhance sustainable product design. The objective is to analyze consumer feedback to understand preferences, improve product messaging, and identify sustainable design opportunities, addressing challenges like higher upfront costs and behavioral trade-offs of eco-friendly products.

The project involves several steps:

1.  Loading and exploring NLTK's `twitter_samples` corpus (positive and negative tweets).
2.  Preprocessing the data for BERT sentiment analysis.
3.  Training a BERT model (`bert-base-uncased`) to classify tweet sentiments.
4.  Generating synthetic data, comprising 500 positive, 500 negative, and 500 neutral eco-tweets to reflect sustainability-focused consumer feedback.
5.  Fine-tuning the BERT model on this synthetic dataset to improve its sensitivity to sustainability-related sentiments.
6.  Re-analyzing custom sentences with the fine-tuned model to assess prediction improvements, and comparing the results to draw insights.

The task is implemented with NLTK, Hugging Face `transformers`, PyTorch, and scikit-learn.


### 2\. Setup and Dataset Loading: NLTK's `twitter_samples` Corpus

This section handles the initial setup, including installing necessary libraries and downloading the `twitter_samples` corpus from NLTK. This corpus provides a readily available dataset of positive and negative tweets, serving as a baseline for our initial BERT model training.

In [2]:
# ignore any warning texts
import warnings
warnings.filterwarnings("ignore")

In [3]:
# Import necessary libraries
import pandas as pd
import numpy as np
import nltk
import torch
import random
from torch.utils.data import Dataset, DataLoader
from transformers import BertTokenizer, BertForSequenceClassification, get_linear_schedule_with_warmup
from torch.optim import AdamW
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, confusion_matrix
from tqdm import tqdm

In [4]:
# set random seeds for reproducibility
def set_seed(seed_value=42):
  random.seed(seed_value)
  np.random.seed(seed_value)
  torch.manual_seed(seed_value)
  torch.cuda.manual_seed_all(seed_value)

set_seed(42)

In [5]:
# download the NLTK data
nltk.download('twitter_samples')
nltk.download('punkt')  # tokenizer models, in case they are not already installed

[nltk_data] Downloading package twitter_samples to /root/nltk_data...
[nltk_data]   Unzipping corpora/twitter_samples.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

### 3\. Dataset Exploration and Preprocessing for BERT

In this step, we load the positive and negative tweets from the `twitter_samples` corpus. We then combine them and preprocess the data specifically for BERT. This involves tokenization using `BertTokenizer`, which handles converting text into a format understandable by BERT (input IDs, attention masks, and token type IDs). We also encode the labels (positive, negative) numerically.

To prepare the data for BERT, we define a custom `TweetDataset` class. This class takes raw text and labels, and internally uses the `BertTokenizer` to tokenize and encode each text into numerical inputs (input IDs, attention masks) that BERT expects. `input_ids` are the token IDs, `attention_mask` indicates which tokens are actual words versus padding, and `token_type_ids` distinguish between different segments if processing a pair of sentences (not needed here). Finally, `DataLoader`s are created to efficiently batch and load data during training and validation.
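
To make the roles of `input_ids` and `attention_mask` concrete, here is a toy sketch of right-padding in plain Python. It is not the tokenizer's implementation: `pad_and_mask` is a hypothetical helper, and the example IDs are illustrative (101 and 102 are the `[CLS]` and `[SEP]` IDs in `bert-base-uncased`, and 0 is its padding ID).

```python
def pad_and_mask(token_ids, max_len, pad_id=0):
    """Right-pad a list of token IDs to max_len and build the matching attention mask."""
    if len(token_ids) > max_len:
        # The real tokenizer would truncate here; this toy version just refuses.
        raise ValueError("sequence is longer than max_len")
    n_pad = max_len - len(token_ids)
    input_ids = list(token_ids) + [pad_id] * n_pad
    attention_mask = [1] * len(token_ids) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask


# [CLS] + two word-piece IDs + [SEP], padded to a length of 8
example_ids = [101, 2017, 2024, 102]
input_ids, attention_mask = pad_and_mask(example_ids, max_len=8)
print(input_ids)       # [101, 2017, 2024, 102, 0, 0, 0, 0]
print(attention_mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```

The model attends only to positions where the mask is 1, so the padding does not influence the prediction.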


In [6]:
# Load positive and negative tweets from NLTK's twitter_samples
from nltk.corpus import twitter_samples

positive_tweets = twitter_samples.strings('positive_tweets.json')
negative_tweets = twitter_samples.strings('negative_tweets.json')

In [7]:
print("Number of positive tweets:", len(positive_tweets))
print("Number of negative tweets:", len(negative_tweets))

Number of positive tweets: 5000
Number of negative tweets: 5000


In [8]:
positive_tweets = positive_tweets[:700]
negative_tweets = negative_tweets[:700]

In [9]:
print("Number of positive tweets:", len(positive_tweets))
print("Number of negative tweets:", len(negative_tweets))

Number of positive tweets: 700
Number of negative tweets: 700


In [10]:
# Create labels: 1 for positive, 0 for negative
positive_labels = [1] * len(positive_tweets)
negative_labels = [0] * len(negative_tweets)

In [11]:
# Combine tweets and labels
tweets = positive_tweets + negative_tweets
labels = positive_labels + negative_labels

In [12]:
# create a Pandas DataFrame
df_nltk = pd.DataFrame({'tweet': tweets, 'sentiment': labels})
print("First 5 rows:")
display(df_nltk.head())
print("\nLast 5 rows:")
display(df_nltk.tail())

First 5 rows:


| | tweet | sentiment |
|---|---|---|
| 0 | #FollowFriday @France_Inte @PKuchly57 @Milipol... | 1 |
| 1 | @Lamb2ja Hey James! How odd :/ Please call our... | 1 |
| 2 | @DespiteOfficial we had a listen last night :)... | 1 |
| 3 | @97sides CONGRATS :) | 1 |
| 4 | yeaaaah yippppy!!! my accnt verified rqst has... | 1 |



Last 5 rows:


| | tweet | sentiment |
|---|---|---|
| 1395 | @crosseyesmiley I didn't see you :( | 0 |
| 1396 | Tommy and Georgia are so cute they actually hu... | 0 |
| 1397 | @21oclock :((( bout to instant transmission | 0 |
| 1398 | @daiIysolos zayn malik please :(( | 0 |
| 1399 | @angelhairhes i dont know what to dm you :(( | 0 |


In [13]:
print(f"Total tweets loaded: {len(df_nltk)}")
print("Sentiment distribution:")
print(df_nltk['sentiment'].value_counts())
print("\nSample positive tweet:", df_nltk[df_nltk['sentiment'] == 1].iloc[0]['tweet'])
print("Sample negative tweet:", df_nltk[df_nltk['sentiment'] == 0].iloc[0]['tweet'])

Total tweets loaded: 1400
Sentiment distribution:
sentiment
1    700
0    700
Name: count, dtype: int64

Sample positive tweet: #FollowFriday @France_Inte @PKuchly57 @Milipol_Paris for being top engaged members in my community this week :)
Sample negative tweet: hopeless for tmr :(


In [14]:
# initialize BERT tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')  # "uncased": input text is lowercased before tokenization


In [15]:
# find the longest tweet, measured in BERT tokens (including [CLS] and [SEP])
max_len_tweet = max([len(tokenizer.encode(tweet, add_special_tokens=True)) for tweet in df_nltk['tweet'].to_list()])
print(f"Maximum token length found in the dataset: {max_len_tweet}")

Maximum token length found in the dataset: 70


In [16]:
# custom dataset class for BERT
class TweetDataset(Dataset):
  def __init__(self, texts, labels, tokenizer, max_len):
    self.texts = texts
    self.labels = labels
    self.tokenizer = tokenizer
    self.max_len = max_len

  def __len__(self):
    return len(self.texts)

  def __getitem__(self, index):
    text = str(self.texts[index])
    label = self.labels[index]

    encoding = self.tokenizer.encode_plus(
        text,
        add_special_tokens = True,  # add the [CLS] and [SEP] tokens to the text
        max_length = self.max_len,  # pad/truncate to max_len
        return_token_type_ids=False, # not needed for single sentence classification
        padding = 'max_length',  # padding to max_len
        truncation = True,     # truncate if longer than max_len
        return_attention_mask = True,  # return attention mask
        return_tensors = 'pt',  # return PyTorch tensors
    )

    return {
        'text': text,
        'input_ids': encoding['input_ids'].flatten(),
        'attention_mask': encoding['attention_mask'].flatten(),
        'labels': torch.tensor(label, dtype=torch.long)
    }



In [17]:
max_len = 128  # ample headroom: tweets are short, and the longest one here is 70 tokens

# Split the data into training and validation sets (80/20). Stratifying keeps the
# proportion of positive and negative samples the same in both sets.
train_texts, val_texts, train_labels, val_labels = train_test_split(
    df_nltk['tweet'].to_numpy(),
    df_nltk['sentiment'].to_numpy(),
    test_size=0.2,
    random_state=42,
    stratify=df_nltk['sentiment'].to_numpy()
)

# create dataset instances
train_dataset = TweetDataset(train_texts, train_labels, tokenizer, max_len)
val_dataset = TweetDataset(val_texts, val_labels, tokenizer, max_len)

# create data loaders to batch the data, for efficient training
batch_size = 16
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

print(f"Training samples: {len(train_dataset)}")
print(f"Validation samples: {len(val_dataset)}")

Training samples: 1120
Validation samples: 280
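
As a quick sanity check, the split sizes above and the number of batches per epoch can be reproduced with plain arithmetic. This is only a sketch of the bookkeeping, using the values from this notebook (1400 tweets, a 20% validation share, batch size 16).

```python
import math

total_tweets = 1400          # 700 positive + 700 negative
n_val = total_tweets // 5    # test_size=0.2, i.e. one fifth of the data
n_train = total_tweets - n_val
batch_size = 16

# A DataLoader with the default drop_last=False yields ceil(n / batch_size) batches.
train_batches = math.ceil(n_train / batch_size)
val_batches = math.ceil(n_val / batch_size)
last_val_batch = n_val - (val_batches - 1) * batch_size

print(n_train, n_val)              # 1120 280
print(train_batches, val_batches)  # 70 18
print(last_val_batch)              # 8
```

With a stratified split of a balanced dataset, the validation set should contain 140 tweets of each class.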


In [18]:
train_dataset[0]

{'text': '@jenxmish @wittykrushnic you are the only thing that i need :(',
 'input_ids': tensor([  101,  1030, 15419,  2595, 15630,  2232,  1030, 25591, 21638, 20668,
          8713,  2017,  2024,  1996,  2069,  2518,  2008,  1045,  2342,  1024,
          1006,   102,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
             0,     0,     0,     0,     0,

### 4\. Initial BERT Model Training and Evaluation

This section loads a pre-trained `bert-base-uncased` model for sequence classification and sets up the training loop. We train the model on the `twitter_samples` dataset to get a baseline for general sentiment analysis. We use the AdamW optimizer and a linear learning-rate scheduler, both common choices for fine-tuning BERT.

The `BertForSequenceClassification` model is chosen because our task is to classify sequences (tweets) into predefined categories (sentiments). The training loop iterates over epochs, performing forward passes, calculating loss, backpropagating gradients, and updating model weights. Evaluation on a validation set after each epoch provides insights into the model's generalization performance.
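
To illustrate what a linear learning-rate schedule does, the sketch below computes the learning rate at a few steps in plain Python. It mirrors the usual shape of a linear schedule with optional warm-up and is not the library's code; `linear_lr` is a hypothetical helper, and the numbers assume the settings used in this notebook (base rate 2e-5, 5 epochs, 70 batches per epoch, no warm-up).

```python
def linear_lr(step, base_lr, total_steps, warmup_steps=0):
    """Learning rate at `step`: linear warm-up, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


BASE_LR = 2e-5
TOTAL_STEPS = 70 * 5  # batches per epoch * epochs

for step in (0, 175, 349, 350):
    print(step, linear_lr(step, BASE_LR, TOTAL_STEPS))
```

The rate starts at the base value, is halved at the midpoint (step 175), and reaches zero after the final step.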


In [19]:
# initialize BERT with num_labels=2, since this stage only has positive and negative sentiments

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [20]:
# Use a GPU when one is available (Colab provides one); it speeds up training significantly.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

In [21]:
# Define optimizer and scheduler
EPOCHS = 5
LEARNING_RATE = 2e-5

optimizer = AdamW(model.parameters(), lr=LEARNING_RATE)
total_steps = len(train_loader) * EPOCHS
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0, # No warm-up steps
    num_training_steps=total_steps
)

# This function defines a single training pass over the data.
# It performs forward pass, computes loss, backpropagates, and updates weights.
def train_epoch(model, data_loader, optimizer, device, scheduler):
    model.train()
    losses = []
    correct_predictions = 0

    for batch in tqdm(data_loader, desc="Training"):
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)

        # Forward pass: model outputs logits and loss
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        logits = outputs.logits
        predictions = torch.argmax(logits, dim=1) # Get predicted class (0 or 1)

        correct_predictions += torch.sum(predictions == labels) # Count correct predictions
        losses.append(loss.item()) # Store loss

        loss.backward() # Backpropagate error
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step() # Update learning rate
        optimizer.zero_grad()

    return np.mean(losses), correct_predictions.double() / len(data_loader.dataset)

In [22]:
# This function evaluates the model's performance on a given data loader (e.g., validation set).
# It calculates loss, accuracy, precision, recall, and F1-score without updating weights.
def eval_model(model, data_loader, device):
    model.eval()
    losses = []
    correct_predictions = 0
    all_predictions = []
    all_labels = []

    with torch.no_grad():
        for batch in tqdm(data_loader, desc="Evaluating"):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)

            outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss
            logits = outputs.logits
            predictions = torch.argmax(logits, dim=1)

            correct_predictions += torch.sum(predictions == labels)
            losses.append(loss.item())

            all_predictions.extend(predictions.cpu().numpy()) # Store predictions
            all_labels.extend(labels.cpu().numpy()) # Store true labels

    avg_loss = np.mean(losses)
    accuracy = correct_predictions.double() / len(data_loader.dataset)
    # precision_recall_fscore_support computes precision, recall, and F1.
    # average='binary' scores the positive class (label 1); zero_division=0 returns 0 when a denominator is zero.
    precision, recall, f1, _ = precision_recall_fscore_support(all_labels, all_predictions, average='binary', zero_division=0)

    return avg_loss, accuracy, precision, recall, f1

In [23]:
print(f"Training BERT model on NLTK twitter_samples for {EPOCHS} epochs...")
history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': [], 'val_precision': [], 'val_recall': [], 'val_f1': []}


for epoch in range(EPOCHS):
    print(f"\nEpoch {epoch + 1}/{EPOCHS}")
    train_loss, train_acc = train_epoch(model, train_loader, optimizer, device, scheduler)
    val_loss, val_acc, val_precision, val_recall, val_f1 = eval_model(model, val_loader, device)

    print(f"Train loss: {train_loss:.4f}, Train accuracy: {train_acc:.4f}")
    print(f"Val loss: {val_loss:.4f}, Val accuracy: {val_acc:.4f}, Precision: {val_precision:.4f}, Recall: {val_recall:.4f}, F1-Score: {val_f1:.4f}")

    # Store results in history dictionary
    history['train_loss'].append(train_loss)
    history['train_acc'].append(train_acc.item())
    history['val_loss'].append(val_loss)
    history['val_acc'].append(val_acc.item())
    history['val_precision'].append(val_precision)
    history['val_recall'].append(val_recall)
    history['val_f1'].append(val_f1)

Training BERT model on NLTK twitter_samples for 5 epochs...

Epoch 1/5


Training: 100%|██████████| 70/70 [00:22<00:00,  3.12it/s]
Evaluating: 100%|██████████| 18/18 [00:01<00:00,  9.49it/s]


Train loss: 0.1653, Train accuracy: 0.9536
Val loss: 0.0125, Val accuracy: 0.9964, Precision: 1.0000, Recall: 0.9929, F1-Score: 0.9964

Epoch 2/5


Training: 100%|██████████| 70/70 [00:22<00:00,  3.17it/s]
Evaluating: 100%|██████████| 18/18 [00:02<00:00,  8.41it/s]


Train loss: 0.0066, Train accuracy: 0.9982
Val loss: 0.0006, Val accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-Score: 1.0000

Epoch 3/5


Training: 100%|██████████| 70/70 [00:22<00:00,  3.08it/s]
Evaluating: 100%|██████████| 18/18 [00:02<00:00,  8.67it/s]


Train loss: 0.0070, Train accuracy: 0.9991
Val loss: 0.0004, Val accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-Score: 1.0000

Epoch 4/5


Training: 100%|██████████| 70/70 [00:22<00:00,  3.05it/s]
Evaluating: 100%|██████████| 18/18 [00:02<00:00,  8.18it/s]


Train loss: 0.0006, Train accuracy: 1.0000
Val loss: 0.0003, Val accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-Score: 1.0000

Epoch 5/5


Training: 100%|██████████| 70/70 [00:22<00:00,  3.05it/s]
Evaluating: 100%|██████████| 18/18 [00:02<00:00,  8.38it/s]

Train loss: 0.0005, Train accuracy: 1.0000
Val loss: 0.0003, Val accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-Score: 1.0000
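
The epoch 1 validation metrics can be reproduced from confusion-matrix counts. The counts below are an inference, not something the notebook prints: they assume 140 validation tweets per class and exactly one positive tweet misclassified as negative, which matches the reported numbers.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Assumed counts: 139 true positives, 1 false negative, 0 false positives, 140 true negatives
acc, prec, rec, f1 = binary_metrics(tp=139, fp=0, fn=1, tn=140)
print(f"Accuracy: {acc:.4f}, Precision: {prec:.4f}, Recall: {rec:.4f}, F1: {f1:.4f}")
# Accuracy: 0.9964, Precision: 1.0000, Recall: 0.9929, F1: 0.9964
```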





### 5\. Initial Sentiment Analysis with Custom Sentences

Before fine-tuning on synthetic sustainability data, we'll test our current model with custom sentences related to eco-friendly products. This will give us a baseline understanding of how well the general BERT model interprets sustainability-specific sentiments and highlight the need for specialized fine-tuning.

This step involves creating a prediction function that takes a text string, the trained model, tokenizer, and device as input. It tokenizes the input text, passes it through the model, and converts the model's raw output (logits) into probability scores and a final sentiment prediction (Positive or Negative).
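
The logits-to-probability step can be sketched without any deep learning library. The logits below are made up for illustration; in the notebook they come from the model's output.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


SENTIMENT_MAP = {0: "Negative", 1: "Positive"}

logits = [-0.8, 0.9]  # hypothetical scores for [Negative, Positive]
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability

print(SENTIMENT_MAP[pred], f"{probs[pred]:.4f}")
```

The reported probability is the softmax value of the predicted class, so for two classes it is always at least 0.5.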


In [24]:
def predict_sentiment(text, model, tokenizer, device, max_len=128):
    model.eval()
    # Encode the input text using the BERT tokenizer
    encoding = tokenizer.encode_plus(
        text,
        add_special_tokens=True,
        max_length=max_len,
        return_token_type_ids=False,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt',
    )

    input_ids = encoding['input_ids'].to(device)
    attention_mask = encoding['attention_mask'].to(device)

    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        logits = outputs.logits
        probabilities = torch.softmax(logits, dim=1)
        _, prediction = torch.max(probabilities, dim=1)

    sentiment_map = {0: 'Negative', 1: 'Positive'}
    return sentiment_map[prediction.item()], probabilities.flatten()[prediction.item()].item() # Return sentiment and its probability


In [25]:
# Test custom sentences (as per task description)
custom_sentences = [
    "This degradable plastics took more than 2 years to degrade",
    "This plastic bottle is degrading really fast",
    "Eco-friendly products are usually very expensive.",
    "I love how this sustainable packaging reduces waste.",
    "The carbon footprint of this product is too high.",
    "This product is amazing and totally recyclable!"
]

print("\n--- Initial Sentiment Predictions on Custom Sentences ---")
for sentence in custom_sentences:
    sentiment, prob = predict_sentiment(sentence, model, tokenizer, device)
    print(f"Sentence: '{sentence}' -> Predicted Sentiment: {sentiment} (Probability: {prob:.4f})")


--- Initial Sentiment Predictions on Custom Sentences ---
Sentence: 'This degradable plastics took more than 2 years to degrade' -> Predicted Sentiment: Positive (Probability: 0.8428)
Sentence: 'This plastic bottle is degrading really fast' -> Predicted Sentiment: Positive (Probability: 0.9720)
Sentence: 'Eco-friendly products are usually very expensive.' -> Predicted Sentiment: Positive (Probability: 0.8742)
Sentence: 'I love how this sustainable packaging reduces waste.' -> Predicted Sentiment: Positive (Probability: 0.9524)
Sentence: 'The carbon footprint of this product is too high.' -> Predicted Sentiment: Positive (Probability: 0.8171)
Sentence: 'This product is amazing and totally recyclable!' -> Predicted Sentiment: Positive (Probability: 0.9843)


#### Key Observations:
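- The model **predicts Positive for all six sentences**, including clearly critical ones such as "The carbon footprint of this product is too high." Likely causes:
  - The baseline was trained only on general tweets, not on eco-specific feedback.
  - Words like *degradable* or *eco-friendly* tend to read as positive in generic text, but their sentiment depends on context.
  - The near-perfect validation scores suggest the model may rely on surface cues in the tweets (the samples shown above contain `:)` or `:(`), which the custom sentences lack.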
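
One way to probe these predictions is to check how often a surface cue such as an emoticon appears in the training tweets compared with the custom sentences. The sketch below uses only the few complete tweets printed earlier in this notebook, so it is an illustration rather than evidence; the regular expression and the `emoticon_rate` helper are assumptions for this example, and a real check would run over the full `df_nltk['tweet']` column.

```python
import re

# Simple pattern for emoticons such as :) :( :(( ;) :-D (an assumption, not exhaustive)
EMOTICON = re.compile(r"[:;]-?[()DPp]+")

training_samples = [
    "@97sides CONGRATS :)",
    "#FollowFriday @France_Inte @PKuchly57 @Milipol_Paris for being top engaged members in my community this week :)",
    "hopeless for tmr :(",
    "@crosseyesmiley I didn't see you :(",
    "@21oclock :((( bout to instant transmission",
    "@daiIysolos zayn malik please :((",
    "@angelhairhes i dont know what to dm you :((",
]

custom_sentences = [
    "This degradable plastics took more than 2 years to degrade",
    "This plastic bottle is degrading really fast",
    "Eco-friendly products are usually very expensive.",
    "I love how this sustainable packaging reduces waste.",
    "The carbon footprint of this product is too high.",
    "This product is amazing and totally recyclable!",
]


def emoticon_rate(texts):
    """Fraction of texts that contain at least one emoticon."""
    hits = sum(1 for text in texts if EMOTICON.search(text))
    return hits / len(texts)


print(f"Training samples with an emoticon: {emoticon_rate(training_samples):.0%}")
print(f"Custom sentences with an emoticon: {emoticon_rate(custom_sentences):.0%}")
```

If the same gap holds on the full dataset, the baseline may have learned the emoticons rather than the wording, which would explain weak transfer to emoticon-free product feedback.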

### 6\. Synthetic Data Generation for Sustainability Sentiments

To make the BERT model more sensitive to sustainability-related language, we generate a synthetic dataset of 500 positive, 500 negative, and 500 neutral eco-tweets. This step is crucial for fine-tuning, as it provides the model with specific examples of how sustainability concepts are expressed with different sentiments.

The `generate_eco_tweets` function fills predefined templates with terms drawn at random from lists of relevant vocabulary (e.g., `eco_products`, `sustainable_features`, `negative_impact`). Because the choices are random, the 500 sentences per class are varied but not guaranteed to be unique. This method gives controlled, domain-specific data that covers positive, negative, and neutral sentiments about sustainable products.
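
Template filling of this kind can leave a placeholder untouched when no word list matches it, for example a template that mentions `[eco_feature]` in a branch that defines no such list. The sketch below shows the same `str.replace` approach on a reduced vocabulary, plus a small check for leftovers; `fill` and `leftover_placeholders` are hypothetical helpers, not part of the notebook.

```python
import random
import re

PLACEHOLDER = re.compile(r"\[[a-z_]+\]")


def fill(template, slots, rng):
    """Replace every [name] that has an entry in `slots`; leave the rest untouched."""
    for name, options in slots.items():
        template = template.replace(f"[{name}]", rng.choice(options))
    return template


def leftover_placeholders(text):
    """Return any [placeholder] tokens that were not replaced."""
    return PLACEHOLDER.findall(text)


rng = random.Random(42)
slots = {
    "product": ["shampoo", "sneakers"],
    "eco_product": ["bamboo toothbrush"],
}

ok = fill("I love how this [eco_product] helps the environment!", slots, rng)
bad = fill("Finally, a [product] that's both effective and [eco_feature]!", slots, rng)

print(leftover_placeholders(ok))   # []
print(leftover_placeholders(bad))  # ['[eco_feature]']
```

Running such a check over the generated tweets before training is a cheap way to catch literal bracket tokens in the data.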


In [26]:
# Function to generate synthetic eco-tweets
def generate_eco_tweets(num_tweets, sentiment_type):
    tweets = []

    for _ in tqdm(range(num_tweets), desc=f"Generating {sentiment_type} tweets"):
      if sentiment_type == 'positive':
          templates = [
              "I love how this [eco_product] helps the environment!",
              "This [sustainable_feature] makes [product] truly amazing.",
              "So happy with my new [eco_product] – reducing waste and living green!",
              "Finally, a [product] that's both effective and [eco_feature]!",
              "This [eco_product] is a game-changer for [environmental_benefit].",
              "Feeling good about supporting brands with [sustainable_practice] like this [product].",
              "Highly recommend this [eco_product] for its [positive_attribute] and environmental impact.",
              "The [sustainable_material] of this [product] is fantastic!",
              "Proud to use this [product] with its [positive_environmental_impact]!",
              "This [product] sets a new standard for [eco_friendly_design]."
          ]
          eco_products = ["compostable packaging", "reusable water bottle", "bamboo toothbrush", "solar charger", "electric vehicle", "biodegradable soap", "upcycled furniture", "recycled clothing", "organic cotton t-shirt", "refillable cleaning product"]
          sustainable_features = ["zero-waste design", "energy-efficient motor", "plant-based ingredients", "carbon-neutral manufacturing", "fair trade sourcing", "durable and long-lasting", "responsibly sourced materials", "closed-loop system", "water-saving technology", "minimalist design"]
          products = ["shampoo", "clothing", "phone case", "detergent", "sneakers", "bag", "car", "computer", "kitchenware", "toy"]
          environmental_benefits = ["reducing plastic pollution", "saving energy", "conserving water", "minimizing carbon emissions", "protecting ecosystems"]
          positive_attributes = ["durability", "innovation", "design", "effectiveness", "eco-consciousness"]
          sustainable_materials = ["bamboo", "recycled plastic", "hemp", "organic cotton", "cork", "recycled glass", "mushroom leather"]
          positive_environmental_impact = ["low carbon footprint", "no harmful chemicals", "supports biodiversity", "saves natural resources", "reduces landfill waste"]
          eco_friendly_design = ["eco-friendly design", "sustainable production", "circular economy principles"]

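          # NOTE: "[eco_feature]" and "[sustainable_practice]" are used in the templates above but have
          # no replacement list in this branch, so they remain as literal text in the generated tweet.
          template = random.choice(templates)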
          tweet = template.replace("[eco_product]", random.choice(eco_products)) \
                          .replace("[sustainable_feature]", random.choice(sustainable_features)) \
                          .replace("[product]", random.choice(products)) \
                          .replace("[environmental_benefit]", random.choice(environmental_benefits)) \
                          .replace("[positive_attribute]", random.choice(positive_attributes)) \
                          .replace("[sustainable_material]", random.choice(sustainable_materials)) \
                          .replace("[positive_environmental_impact]", random.choice(positive_environmental_impact)) \
                          .replace("[eco_friendly_design]", random.choice(eco_friendly_design))
          tweets.append(tweet)

      # for negative eco-tweets
      elif sentiment_type == 'negative':
          templates = [
              "This [eco_product] claims to be green but it's not working as expected.",
              "Disappointed with the [sustainable_feature] of this [product] – it's still [negative_impact].",
              "Why is this [eco_product] so [negative_attribute]? Not worth the high price.",
              "The [environmental_claim] of this [product] feels like greenwashing.",
              "This so-called [eco_product] broke after a week. What a waste!",
              "I'm concerned about the [negative_impact] of this [product] despite its eco claims.",
              "Another [product] with [sustainable_material] that doesn't hold up.",
              "The [eco_friendly_design] of this [product] led to poor performance.",
              "Too many [harmful_chemicals] even in supposedly [eco_product].",
              "This [product] is not as [eco_feature] as they claim."
          ]
          eco_products = ["biodegradable plastic", "rechargeable battery", "compostable cutlery", "recycled paper", "organic cotton shirt", "bamboo straw", "solar panel", "electric car", "eco-friendly paint", "water-saving showerhead"]
          sustainable_features = ["packaging", "material", "durability", "environmental claims", "production process"]
          products = ["bag", "toy", "utensil", "box", "t-shirt", "cup", "device", "vehicle", "can", "fixture"]
          negative_impact = ["still polluting", "not lasting", "too expensive", "hard to dispose of", "not truly sustainable"]
          negative_attribute = ["flimsy", "expensive", "ineffective", "ugly", "short-lived"]
          environmental_claim = ["eco-friendly label", "sustainable sourcing", "biodegradable claim", "compostable claim", "recyclable packaging"]
          sustainable_material = ["recycled plastic", "plant-based material", "organic cotton", "bamboo"]
          eco_friendly_design = ["eco-friendly design", "sustainable production", "circular economy principles"]
          harmful_chemicals = ["hidden chemicals", "toxic dyes", "microplastics"]
          eco_feature = ["green", "sustainable", "ethical"]

          template = random.choice(templates)
          tweet = template.replace("[eco_product]", random.choice(eco_products)) \
                          .replace("[sustainable_feature]", random.choice(sustainable_features)) \
                          .replace("[product]", random.choice(products)) \
                          .replace("[negative_impact]", random.choice(negative_impact)) \
                          .replace("[negative_attribute]", random.choice(negative_attribute)) \
                          .replace("[environmental_claim]", random.choice(environmental_claim)) \
                          .replace("[sustainable_material]", random.choice(sustainable_material)) \
                          .replace("[eco_friendly_design]", random.choice(eco_friendly_design)) \
                          .replace("[harmful_chemicals]", random.choice(harmful_chemicals)) \
                          .replace("[eco_feature]", random.choice(eco_feature))
          tweets.append(tweet)

      # for neutral eco-tweets
      elif sentiment_type == 'neutral':
          templates = [
              "This [product] uses [sustainable_material].",
              "The [eco_product] is made by [company_name].",
              "Information about the [eco_product] manufacturing process is available online.",
              "Reviewing the specifications for this [product] with [eco_feature].",
              "Considering a [eco_product] for general use.",
              "The [sustainable_feature] of this [product] is listed as a key characteristic.",
              "This [product] aims for [environmental_goal].",
              "Details about [sustainable_material] in [product] are provided.",
              "Looking into the [environmental_impact] of [product].",
              "A discussion about [eco_friendly_aspect] of [product] is ongoing."
          ]
          products = ["phone", "desk", "shirt", "bag", "chair", "bottle", "device", "car", "table", "container"]
          sustainable_material = ["recycled plastic", "bamboo", "organic cotton", "renewable resources", "responsibly sourced wood"]
          eco_product = ["eco-friendly alternative", "sustainable option", "green product"]
          company_name = ["EcoCorp", "GreenTech", "Sustainable Innovations", "PurePlanet", "TerraGoods"]
          eco_feature = ["biodegradable components", "recyclable parts", "low energy consumption"]
          environmental_goal = ["waste reduction", "carbon neutrality", "water conservation"]
          environmental_impact = ["carbon footprint", "material sourcing", "disposal methods"]
          eco_friendly_aspect = ["sustainable packaging", "recycled content", "energy efficiency"]

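          # NOTE: "[sustainable_feature]" is used in a template above but has no replacement list
          # in this branch, so it remains as literal text in the generated tweet.
          template = random.choice(templates)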
          tweet = template.replace("[product]", random.choice(products)) \
                          .replace("[sustainable_material]", random.choice(sustainable_material)) \
                          .replace("[eco_product]", random.choice(eco_product)) \
                          .replace("[company_name]", random.choice(company_name)) \
                          .replace("[eco_feature]", random.choice(eco_feature)) \
                          .replace("[environmental_goal]", random.choice(environmental_goal)) \
                          .replace("[environmental_impact]", random.choice(environmental_impact)) \
                          .replace("[eco_friendly_aspect]", random.choice(eco_friendly_aspect))
          tweets.append(tweet)
    return tweets

In [27]:
# Generate synthetic data
num_synthetic_tweets = 500
positive_eco_tweets = generate_eco_tweets(num_synthetic_tweets, 'positive')
negative_eco_tweets = generate_eco_tweets(num_synthetic_tweets, 'negative')
neutral_eco_tweets = generate_eco_tweets(num_synthetic_tweets, 'neutral')

Generating positive tweets: 100%|██████████| 500/500 [00:00<00:00, 130484.82it/s]
Generating negative tweets: 100%|██████████| 500/500 [00:00<00:00, 117244.48it/s]
Generating neutral tweets: 100%|██████████| 500/500 [00:00<00:00, 139931.41it/s]


In [28]:
# Create a DataFrame for synthetic data
# Assign numerical labels: 0 for Negative, 1 for Positive, 2 for Neutral

print(f"Length of positive_eco_tweets: {len(positive_eco_tweets)}")
print(f"Length of negative_eco_tweets: {len(negative_eco_tweets)}")
print(f"Length of neutral_eco_tweets: {len(neutral_eco_tweets)}")

synthetic_df = pd.DataFrame({
    'tweet': positive_eco_tweets + negative_eco_tweets + neutral_eco_tweets,
    'sentiment': [1] * num_synthetic_tweets + [0] * num_synthetic_tweets + [2] * num_synthetic_tweets
})

print(f"Total synthetic tweets generated: {len(synthetic_df)}")
print("\nSynthetic sentiment distribution:")
print(synthetic_df['sentiment'].value_counts())
print("\nSample synthetic positive tweet:", synthetic_df[synthetic_df['sentiment'] == 1].iloc[0]['tweet'])
print("Sample synthetic negative tweet:", synthetic_df[synthetic_df['sentiment'] == 0].iloc[0]['tweet'])
print("Sample synthetic neutral tweet:", synthetic_df[synthetic_df['sentiment'] == 2].iloc[0]['tweet'])

Length of positive_eco_tweets: 500
Length of negative_eco_tweets: 500
Length of neutral_eco_tweets: 500
Total synthetic tweets generated: 1500

Synthetic sentiment distribution:
sentiment
1    500
0    500
2    500
Name: count, dtype: int64

Sample synthetic positive tweet: This fair trade sourcing makes detergent truly amazing.
Sample synthetic negative tweet: I'm concerned about the hard to dispose of of this device despite its eco claims.
Sample synthetic neutral tweet: Details about bamboo in chair are provided.


In [29]:
# Prepare synthetic data for BERT fine-tuning
train_synthetic_texts, val_synthetic_texts, train_synthetic_labels, val_synthetic_labels = train_test_split(
    synthetic_df['tweet'].to_numpy(),
    synthetic_df['sentiment'].to_numpy(),
    test_size=0.2,
    random_state=42,
    stratify=synthetic_df['sentiment'].to_numpy()
)

# Create Dataset and DataLoader instances for the synthetic data.
train_synthetic_dataset = TweetDataset(train_synthetic_texts, train_synthetic_labels, tokenizer, max_len)
val_synthetic_dataset = TweetDataset(val_synthetic_texts, val_synthetic_labels, tokenizer, max_len)

train_synthetic_loader = DataLoader(train_synthetic_dataset, batch_size=batch_size, shuffle=True)
val_synthetic_loader = DataLoader(val_synthetic_dataset, batch_size=batch_size, shuffle=False)

print(f"\nSynthetic training samples: {len(train_synthetic_dataset)}")
print(f"Synthetic validation samples: {len(val_synthetic_dataset)}")


Synthetic training samples: 1200
Synthetic validation samples: 300
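
Because the tweets are drawn from a finite set of templates and word lists, some sentences may be generated more than once, and exact duplicates can then land on both sides of the split. A quick check like the one below can quantify that before the validation scores are interpreted. It is only a sketch on toy data: `split_overlap` is a hypothetical helper that does not exist in the notebook, and in practice it would be called with `train_synthetic_texts` and `val_synthetic_texts`.

```python
def split_overlap(train_texts, val_texts):
    """Fraction of validation texts that also appear verbatim in the training texts."""
    train_set = set(train_texts)
    if len(val_texts) == 0:
        return 0.0
    shared = sum(1 for text in val_texts if text in train_set)
    return shared / len(val_texts)

# Toy data standing in for the real split (assumption, not notebook output)
toy_train = [
    "I love this bamboo chair.",
    "The bottle is overpriced.",
    "Details about the bag are provided.",
]
toy_val = ["I love this bamboo chair.", "The recycled phone is great."]

print(f"Validation texts also seen in training: {split_overlap(toy_train, toy_val):.0%}")
```

A high overlap would suggest that validation accuracy mostly measures memorization of templates rather than generalization.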


### 7\. Fine-Tuning BERT on Synthetic Sustainability Data

Now we fine-tune BERT on the synthetically generated dataset. The code below loads a fresh `bert-base-uncased` checkpoint with a new 3-label classification head rather than continuing from the model trained in Section 4, so the weights learned on `twitter_samples` are not carried over. This step adapts the model to the specific language and nuances of sustainability-related consumer feedback, aiming to improve its accuracy in classifying such sentiments. The model now predicts one of 3 labels (negative, positive, neutral).

Since we are adding a 'neutral' class, the `num_labels` parameter for `BertForSequenceClassification` needs to be changed from 2 to 3. The training process remains similar to the initial training, but it's applied to the new synthetic dataset. The evaluation function is also adapted to handle multi-class classification by changing the `average` parameter for precision, recall, and f1-score calculation to `weighted`.
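
To make the `weighted` setting concrete, here is a minimal pure-Python sketch of what it computes: per-class precision, recall, and F1, averaged with each class weighted by its number of true samples. The helper name `weighted_prf` and the toy labels are illustrative assumptions; the notebook itself relies on scikit-learn's `precision_recall_fscore_support`.

```python
from collections import Counter

def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 over all classes seen in the labels."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    precision = recall = f1 = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        p_c = tp / (tp + fp) if (tp + fp) else 0.0  # mirrors zero_division=0
        r_c = tp / (tp + fn) if (tp + fn) else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = support[c] / total
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1

# Toy labels: 0 = Negative, 1 = Positive, 2 = Neutral
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
precision, recall, f1 = weighted_prf(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

One consequence of this weighting is that support-weighted recall is mathematically equal to overall accuracy, so the two columns always agree in the evaluation logs.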


In [30]:
# Re-load model for fine-tuning with 3 labels (Positive, Negative, Neutral)

fine_tuned_model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)
fine_tuned_model = fine_tuned_model.to(device)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [31]:
epochs = 5
fine_tune_learning_rate = 2e-5

fine_tune_optimizer = AdamW(fine_tuned_model.parameters(), lr=fine_tune_learning_rate)
fine_tune_total_steps = len(train_synthetic_loader) * epochs
fine_tune_scheduler = get_linear_schedule_with_warmup(
    fine_tune_optimizer,
    num_warmup_steps=0,
    num_training_steps=fine_tune_total_steps
)

# Function to train one epoch (re-using the previous function, as it's general enough)
def train_epoch(model, data_loader, optimizer, device, scheduler):
    model.train()
    losses = []
    correct_predictions = 0

    for batch in tqdm(data_loader, desc="Training"):
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)

        # Forward pass: model outputs logits and loss
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        logits = outputs.logits
        predictions = torch.argmax(logits, dim=1)

        correct_predictions += torch.sum(predictions == labels)
        losses.append(loss.item()) # Store loss

        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

    return np.mean(losses), correct_predictions.double() / len(data_loader.dataset)
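
As a rough illustration of what the scheduler configured above does with `num_warmup_steps=0`: the learning rate starts at `fine_tune_learning_rate` and decays linearly to zero over `fine_tune_total_steps`. The sketch below reproduces that shape in plain Python. `linear_lr` is a hypothetical helper, and the step count assumes 75 batches per epoch, matching the progress bars in the training log.

```python
def linear_lr(step, base_lr, total_steps, warmup_steps=0):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

base_lr = 2e-5
total_steps = 75 * 5  # assumed: 75 batches per epoch, 5 epochs

# Learning rate at the start of each epoch and at the very end
for step in (0, 75, 150, 225, 300, 375):
    print(f"step {step:3d}: lr = {linear_lr(step, base_lr, total_steps):.2e}")
```

With no warmup, the very first update already uses the full learning rate.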

In [32]:
# Function to evaluate the model (re-using the previous function, but with average='weighted' for multiclass)
def eval_model_multiclass(model, data_loader, device):
    model.eval()
    losses = []
    correct_predictions = 0
    all_predictions = []
    all_labels = []

    with torch.no_grad():
        for batch in tqdm(data_loader, desc="Evaluating"):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)

            outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss
            logits = outputs.logits
            predictions = torch.argmax(logits, dim=1)

            correct_predictions += torch.sum(predictions == labels)
            losses.append(loss.item())

            all_predictions.extend(predictions.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())

    avg_loss = np.mean(losses)
    accuracy = correct_predictions.double() / len(data_loader.dataset)
    # Use 'weighted' average for precision, recall, f1 for multiclass classification
    precision, recall, f1, _ = precision_recall_fscore_support(all_labels, all_predictions, average='weighted', zero_division=0)

    return avg_loss, accuracy, precision, recall, f1


In [33]:
print(f"Fine-tuning BERT model on synthetic eco-tweets for {epochs} epochs...")
fine_tune_history = {'train_loss': [], 'train_acc': [], 'val_loss': [], 'val_acc': [], 'val_precision': [], 'val_recall': [], 'val_f1': []}

# Fine-tuning loop
for epoch in range(epochs):
    print(f"\nFine-tune Epoch {epoch + 1}/{epochs}")
    train_loss, train_acc = train_epoch(fine_tuned_model, train_synthetic_loader, fine_tune_optimizer, device, fine_tune_scheduler)
    val_loss, val_acc, val_precision, val_recall, val_f1 = eval_model_multiclass(fine_tuned_model, val_synthetic_loader, device)

    print(f"Fine-tune Train loss: {train_loss:.4f}, Train accuracy: {train_acc:.4f}")
    print(f"Fine-tune Val loss: {val_loss:.4f}, Val accuracy: {val_acc:.4f}, Precision: {val_precision:.4f}, Recall: {val_recall:.4f}, F1-Score: {val_f1:.4f}")

    # Store results in history dictionary
    fine_tune_history['train_loss'].append(train_loss)
    fine_tune_history['train_acc'].append(train_acc.item())
    fine_tune_history['val_loss'].append(val_loss)
    fine_tune_history['val_acc'].append(val_acc.item())
    fine_tune_history['val_precision'].append(val_precision)
    fine_tune_history['val_recall'].append(val_recall)
    fine_tune_history['val_f1'].append(val_f1)


Fine-tuning BERT model on synthetic eco-tweets for 5 epochs...

Fine-tune Epoch 1/5


Training: 100%|██████████| 75/75 [00:24<00:00,  3.10it/s]
Evaluating: 100%|██████████| 19/19 [00:02<00:00,  8.70it/s]


Fine-tune Train loss: 0.3745, Train accuracy: 0.8967
Fine-tune Val loss: 0.0075, Val accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-Score: 1.0000

Fine-tune Epoch 2/5


Training: 100%|██████████| 75/75 [00:24<00:00,  3.04it/s]
Evaluating: 100%|██████████| 19/19 [00:02<00:00,  7.42it/s]


Fine-tune Train loss: 0.0052, Train accuracy: 1.0000
Fine-tune Val loss: 0.0023, Val accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-Score: 1.0000

Fine-tune Epoch 3/5


Training: 100%|██████████| 75/75 [00:24<00:00,  3.07it/s]
Evaluating: 100%|██████████| 19/19 [00:02<00:00,  8.57it/s]


Fine-tune Train loss: 0.0027, Train accuracy: 1.0000
Fine-tune Val loss: 0.0015, Val accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-Score: 1.0000

Fine-tune Epoch 4/5


Training: 100%|██████████| 75/75 [00:24<00:00,  3.06it/s]
Evaluating: 100%|██████████| 19/19 [00:02<00:00,  8.58it/s]


Fine-tune Train loss: 0.0019, Train accuracy: 1.0000
Fine-tune Val loss: 0.0012, Val accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-Score: 1.0000

Fine-tune Epoch 5/5


Training: 100%|██████████| 75/75 [00:24<00:00,  3.08it/s]
Evaluating: 100%|██████████| 19/19 [00:02<00:00,  7.96it/s]

Fine-tune Train loss: 0.0017, Train accuracy: 1.0000
Fine-tune Val loss: 0.0011, Val accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-Score: 1.0000





### 8\. Re-analysis of Custom Sentences with Fine-Tuned Model and Comparison

Finally, we re-evaluate the same custom sentences using the fine-tuned BERT model. This allows us to directly compare the predictions from the general BERT model and the sustainability-tuned BERT model, observing any improvements in sentiment classification, especially for the nuanced eco-friendly product feedback.

The prediction function is updated to reflect the three sentiment classes (Negative, Positive, Neutral). By comparing the output of this section with the "Initial Sentiment Analysis with Custom Sentences" section (Section 5), we can assess the impact of fine-tuning on sustainability-specific sentiment recognition.
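
Stripped of BERT, the core of that prediction step is a softmax over three logits, followed by an argmax and a label lookup. Here is a minimal plain-Python sketch. The logit values are made up for illustration, and `classify_logits` is a hypothetical helper rather than a function from the notebook. Unlike the notebook's function, it also returns the full probability distribution, which can help when inspecting borderline predictions.

```python
import math

# Same label order as the synthetic dataset
SENTIMENT_MAP = {0: "Negative", 1: "Positive", 2: "Neutral"}

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_logits(logits):
    """Return (label, probability of that label, probabilities for all classes)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SENTIMENT_MAP[best], probs[best], probs

# Made-up logits for a borderline sentence (assumption, not model output)
label, prob, probs = classify_logits([0.9, 0.2, -1.0])
print(label, round(prob, 4), [round(p, 4) for p in probs])
```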


In [34]:
def predict_sentiment_fine_tuned(text, model, tokenizer, device, max_len=128):
    model.eval()
    encoding = tokenizer.encode_plus(
        text,
        add_special_tokens=True,
        max_length=max_len,
        return_token_type_ids=False,
        padding='max_length',
        truncation=True,
        return_attention_mask=True,
        return_tensors='pt',
    )

    input_ids = encoding['input_ids'].to(device)
    attention_mask = encoding['attention_mask'].to(device)

    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        logits = outputs.logits
        probabilities = torch.softmax(logits, dim=1) # Get probabilities for all 3 classes
        _, prediction = torch.max(probabilities, dim=1) # Get the predicted class, taking the max among these

    sentiment_map_fine_tuned = {0: 'Negative', 1: 'Positive', 2: 'Neutral'}
    return sentiment_map_fine_tuned[prediction.item()], probabilities.flatten()[prediction.item()].item()

In [36]:
print("\n--- Re-analysis with Fine-tuned Model on Custom Sentences ---")
for sentence in custom_sentences:
    sentiment, prob = predict_sentiment_fine_tuned(sentence, fine_tuned_model, tokenizer, device)
    print(f"Sentence: '{sentence}' -> Predicted Sentiment: {sentiment} (Probability: {prob:.4f})")


--- Re-analysis with Fine-tuned Model on Custom Sentences ---
Sentence: 'This degradable plastics took more than 2 years to degrade' -> Predicted Sentiment: Negative (Probability: 0.9966)
Sentence: 'This plastic bottle is degrading really fast' -> Predicted Sentiment: Negative (Probability: 0.6514)
Sentence: 'Eco-friendly products are usually very expensive.' -> Predicted Sentiment: Negative (Probability: 0.9974)
Sentence: 'I love how this sustainable packaging reduces waste.' -> Predicted Sentiment: Positive (Probability: 0.9987)
Sentence: 'The carbon footprint of this product is too high.' -> Predicted Sentiment: Negative (Probability: 0.9987)
Sentence: 'This product is amazing and totally recyclable!' -> Predicted Sentiment: Positive (Probability: 0.9988)


### Fine-Tuning Results Analysis:

**✅ Dramatic Improvements in Negative Sentiment Detection:**
- *This degradable plastics took more than 2 years to degrade*  
  - **Before:** Positive (0.84) → **After:** Negative (0.99)  
  - **Why:** Model now recognizes "took more than 2 years" as a negative delay.

- *Eco-friendly products are usually very expensive.*  
  - **Before:** Positive (0.87) → **After:** Negative (0.99)  
  - **Why:** "Very expensive" is now correctly weighted as criticism.

- *The carbon footprint of this product is too high.*  
  - **Before:** Positive (0.81) → **After:** Negative (0.99)  
  - **Why:** "Too high" is now flagged as environmentally negative.

**⚠️ Potential Overcorrection?**
- *This plastic bottle is degrading really fast*  
  - **Before:** Positive (0.97) → **After:** Negative (0.65)  
  - **Oddity:** Should likely remain Positive. Suggests synthetic data may over-emphasize "degrading" as negative without context.

**Consistent Positives:**
- *I love sustainable packaging* and *amazing recyclable product*  
  - Remain strongly Positive (0.99+) – correctly unaffected by fine-tuning.

#### Key Insights:
1. **Domain-Specific Nuances Learned:**  
   - Fine-tuning helped the model grasp sustainability-specific critiques (slow degradation, high cost, carbon footprint).

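2. **Neutral Class Never Predicted:**  
   - The fine-tuned model has a Neutral label, yet none of the six custom sentences was assigned to it. Sentences like *"Eco-friendly products are usually very expensive."* might be better served by Neutral than by a forced Negative.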

3. **Next Steps:**  
   - Adjust synthetic data to better distinguish:  
     - *Fast degradation* = Positive  
     - *Slow degradation* = Negative  
   - Broaden the Neutral templates so that factual statements (e.g., about price) are not forced into Negative.
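
One way the fast-versus-slow degradation adjustment might look, sketched in the same template style as `generate_eco_tweets`: the speed phrase, not the verb "degrade", decides the label. All templates, word lists, and the `generate_degradation_tweets` name below are hypothetical and would need review before being merged into the real generator.

```python
import random

# Hypothetical templates: the speed phrase carries the sentiment
FAST_TEMPLATES = [
    "This [product] degrades [fast_phrase], which is great.",
    "Happy that the [product] breaks down [fast_phrase].",
]
SLOW_TEMPLATES = [
    "This [product] took [slow_phrase] to degrade.",
    "Disappointed that the [product] needs [slow_phrase] to break down.",
]
PRODUCTS = ["bottle", "bag", "container"]
FAST_PHRASES = ["really fast", "within weeks", "quickly"]
SLOW_PHRASES = ["more than 2 years", "far too long", "several years"]

def generate_degradation_tweets(n, seed=None):
    """Return n (tweet, label) pairs; label 1 = Positive (fast), 0 = Negative (slow)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        positive = i % 2 == 0  # alternate labels to keep the classes balanced
        template = rng.choice(FAST_TEMPLATES if positive else SLOW_TEMPLATES)
        tweet = (template
                 .replace("[product]", rng.choice(PRODUCTS))
                 .replace("[fast_phrase]", rng.choice(FAST_PHRASES))
                 .replace("[slow_phrase]", rng.choice(SLOW_PHRASES)))
        rows.append((tweet, 1 if positive else 0))
    return rows

samples = generate_degradation_tweets(4, seed=0)
for tweet, label in samples:
    print(label, tweet)
```

The resulting pairs could be appended to `synthetic_df` using the same 0/1/2 label encoding before re-running the fine-tuning loop.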

#### Fine-Tuning Verdict:  
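**Success** for critical sustainability sentiment, but needs slight calibration for borderline cases. The perfect validation scores should be read with caution, since the validation tweets come from the same templates as the training tweets.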

### 9. Conclusion and Insights

**Impact of Fine-tuning:**
- ✅ **Critical Improvements**  
  - The model now correctly classifies sustainability pain points:  
    - Slow degradation ("took 2 years") → Negative (from Positive)  
    - High cost ("very expensive") → Negative (from Positive)  
    - Environmental harm ("high carbon footprint") → Negative (from Positive)  
  - The Neutral class was never predicted for the custom sentences; better neutral training examples would help with ambiguous statements (e.g., "Eco-products cost more").

**Value for Product Design:**  
- 🎯 **Actionable Insights**  
  - **Pain Points to Address**:  
    - Speed of degradation (consumers expect faster results)  
    - Cost barriers (highlight long-term savings)  
    - Transparency in environmental claims (avoid greenwashing)  
  - **Positive Drivers to Leverage**:  
    - Fast degradation ("degrades quickly!" → marketable benefit)  
    - Waste reduction ("reduces packaging waste" → emotional appeal)  
    - Recyclability ("totally recyclable" → clear sustainability badge)  

**Limitations and Future Work:**  
- 🔧 **Areas for Improvement**:  
  1. **Data Quality**:  
     - Replace synthetic tweets with real consumer reviews for richer nuance.  
     - Example: Scrape Twitter/X for #SustainableProducts hashtags.  
  2. **Model Precision**:  
     - Add **aspect-based sentiment** (e.g., separate "price" vs. "durability" sentiment).  
     - Use **Few-Shot Learning** with GPT-4 to generate better synthetic examples.  
  3. **Deployment**:  
     - Build a real-time dashboard monitoring social media for emerging sustainability trends.  

**Key Takeaway**:  
Fine-tuning BERT with domain-specific data *works*: on the custom sentences, the sustainability-tuned model caught criticisms that the general Twitter-trained model had labeled Positive. Further refinements will unlock even deeper insights for eco-design.  

# Author
By: Kaletsidik Ayalew  
Date: July 14, 2025