# Phase 6: Analysis and Visualization

**Objective**: Interpret the results from Phases 3–5 to quantify the effects of class imbalance, compare the mitigation techniques (SMOTE, Random Undersampling, NearMiss, and Weighted Loss), and visualize the findings to show where each technique helps 3-class (Negative, Neutral, Positive) sentiment classification on the Bangla Sentiment Dataset.

### Step 1: Analyze Imbalance Effects

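- **Objective**: Identify performance gaps in the baseline models, focusing on the minority class (Positive). The cells below use Positive-class F1 (`F1_Positive`) and `ROC_AUC` from the comparative results; Positive-class recall is not examined here.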

```python
import logging
import os

import pandas as pd

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Ensure the output folder for analysis artifacts exists
os.makedirs("analysis", exist_ok=True)

# Load the comparative results produced by the evaluation phase
results_path = "evaluation/comparative_results.csv"
try:
    results_df = pd.read_csv(results_path)
except FileNotFoundError:
    logging.error(f"CSV file not found. Make sure '{results_path}' exists.")
    raise  # re-raise so the cell stops with a visible traceback instead of calling exit()

# Split rows into baseline and mitigated runs
baseline_df = results_df[results_df['Type'] == 'baseline']
mitigated_df = results_df[results_df['Type'] == 'mitigated']
```

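The two filters above silently return empty frames if `Type` is spelled differently (for example `Baseline`) or a column has been renamed, and the later cells would then summarize empty tables. A minimal guard is sketched below; `validate_results` is a hypothetical helper, not part of the original pipeline, and it assumes only the column names and `Type` values that the cells in this step already use.

```python
import pandas as pd

# Columns and Type values relied on by the cells in this step (assumed, not read from a schema)
REQUIRED_COLUMNS = {"Type", "Model", "Mitigation", "F1_Positive", "ROC_AUC"}
EXPECTED_TYPES = {"baseline", "mitigated"}


def validate_results(df: pd.DataFrame) -> None:
    """Raise ValueError if the results table cannot support the analysis in this step."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    # Every expected run type should have at least one row
    absent = EXPECTED_TYPES - set(df["Type"].dropna().unique())
    if absent:
        raise ValueError(f"No rows with Type in {sorted(absent)}")
```

Calling `validate_results(results_df)` right after `read_csv` turns a silent empty summary into an explicit error.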
```python
# Summarize Positive-class F1 and ROC_AUC across the baseline models
positive_summary = baseline_df[['F1_Positive', 'ROC_AUC']].describe()
logging.info("Baseline Positive class summary:\n" + str(positive_summary))
```

```text
2025-06-26 18:26:01,924 - INFO - Baseline Positive class summary:
       F1_Positive   ROC_AUC
count     4.000000  4.000000
mean      0.508565  0.744059
std       0.022895  0.007840
min       0.478571  0.734816
25%       0.497341  0.741040
50%       0.512668  0.743732
75%       0.523892  0.746751
max       0.530351  0.753958
```

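The summary reports `ROC_AUC` next to `F1_Positive`, but this step does not say whether that AUC is specific to the Positive class or averaged over all three classes (for example, a macro one-vs-rest average). If per-sample predicted probabilities are available, a Positive-vs-rest AUC can be computed directly with the Mann-Whitney rank formulation of ROC-AUC. The sketch below assumes a list of gold labels and a list of predicted Positive-class probabilities; `one_vs_rest_auc` is a hypothetical helper, not part of the pipeline.

```python
import pandas as pd


def one_vs_rest_auc(y_true, scores, positive_label="Positive"):
    """ROC-AUC of one class against all others via the Mann-Whitney rank statistic.

    y_true: gold class labels; scores: predicted probability of `positive_label`.
    Tied scores get average ranks, so each positive/negative tie counts as 0.5.
    """
    is_pos = pd.Series(y_true).reset_index(drop=True) == positive_label
    ranks = pd.Series(scores).reset_index(drop=True).rank(method="average")
    n_pos = int(is_pos.sum())
    n_neg = len(is_pos) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("Need at least one positive and one negative example")
    # Sum of positive ranks minus its minimum possible value, normalized by pair count
    rank_sum_pos = ranks[is_pos].sum()
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Illustrative toy input, not project data
labels = ["Positive", "Negative", "Positive", "Neutral"]
p_positive = [0.9, 0.1, 0.4, 0.6]
print(one_vs_rest_auc(labels, p_positive))
```

In the toy input, three of the four Positive/non-Positive pairs are ranked correctly, giving an AUC of 0.75.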

```python
# Mean and standard deviation of baseline Positive F1 (used in the written analysis)
baseline_f1_mean = baseline_df['F1_Positive'].mean()
baseline_f1_std = baseline_df['F1_Positive'].std()

# Mean Positive F1 per mitigated model. In this results file the technique is encoded
# in the Model name (e.g. "SVM_smote"), and Mitigation only holds the value "tuned".
mitigated_positive = (
    mitigated_df.groupby(['Model', 'Mitigation'])['F1_Positive']
    .mean()
    .unstack()
    .round(3)
)
logging.info("Mitigated Positive class F1 (mean):\n" + str(mitigated_positive))
```

```text
2025-06-26 18:26:55,333 - INFO - Mitigated Positive class F1 (mean):
Mitigation                       tuned
Model                                 
LogisticRegression_nearmiss      0.500
LogisticRegression_smote         0.504
LogisticRegression_undersampled  0.524
LogisticRegression_weighted      0.480
NaiveBayes_nearmiss              0.528
NaiveBayes_smote                 0.475
NaiveBayes_undersampled          0.523
NaiveBayes_weighted              0.475
RandomForest_nearmiss            0.513
RandomForest_smote               0.484
RandomForest_undersampled        0.544
RandomForest_weighted            0.484
SVM_nearmiss                     0.531
SVM_smote                        0.521
SVM_undersampled                 0.540
SVM_weighted                     0.519
```

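In the output above, the `Mitigation` column holds only `tuned`, while the technique itself is the suffix of `Model` (e.g. `SVM_undersampled`). Splitting on the last underscore gives a model × technique table that is easier to compare. The sketch below rebuilds that table from the logged values and the baseline mean from the earlier summary; it assumes every mitigated model name follows the `<Model>_<technique>` pattern seen in the log.

```python
import pandas as pd

# Mean Positive-class F1 per mitigated model, copied from the log above
logged_f1 = {
    "LogisticRegression_nearmiss": 0.500, "LogisticRegression_smote": 0.504,
    "LogisticRegression_undersampled": 0.524, "LogisticRegression_weighted": 0.480,
    "NaiveBayes_nearmiss": 0.528, "NaiveBayes_smote": 0.475,
    "NaiveBayes_undersampled": 0.523, "NaiveBayes_weighted": 0.475,
    "RandomForest_nearmiss": 0.513, "RandomForest_smote": 0.484,
    "RandomForest_undersampled": 0.544, "RandomForest_weighted": 0.484,
    "SVM_nearmiss": 0.531, "SVM_smote": 0.521,
    "SVM_undersampled": 0.540, "SVM_weighted": 0.519,
}
BASELINE_MEAN_F1 = 0.508565  # "mean" row of the baseline F1_Positive summary

logged_df = pd.Series(logged_f1, name="F1_Positive").rename_axis("Model").reset_index()
# "SVM_undersampled" -> BaseModel "SVM", Technique "undersampled"
logged_df[["BaseModel", "Technique"]] = logged_df["Model"].str.rsplit("_", n=1, expand=True)

# Rows: base models; columns: techniques
f1_table = logged_df.pivot(index="BaseModel", columns="Technique", values="F1_Positive")
technique_mean = f1_table.mean().sort_values(ascending=False)

print(f1_table)
print((technique_mean - BASELINE_MEAN_F1).round(3))  # difference from the baseline mean
```

With these values, undersampling has the highest mean Positive F1 (about 0.533 against a baseline mean of about 0.509), NearMiss is slightly above the baseline mean, and SMOTE and Weighted Loss fall below it. Each mean covers only four models, so small differences should be read with caution.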

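```python
# Derive the technique from the Model suffix (e.g. "SVM_undersampled" -> "undersampled")
technique = mitigated_df['Model'].str.rsplit('_', n=1).str[-1].rename('Technique')
technique_f1 = (
    mitigated_df.groupby(technique)['F1_Positive'].mean().sort_values(ascending=False)
)

# Write the analysis; conclusions are computed from the data rather than hardcoded
analysis_path = "analysis/imbalance_analysis.txt"
with open(analysis_path, "w", encoding='utf-8') as f:
    f.write("📊 Imbalance Effects Analysis\n")
    f.write("=================================\n")
    f.write("Baseline Positive class F1:\n")
    f.write(f"  - Mean F1_Positive: {baseline_f1_mean:.3f}\n")
    f.write(f"  - Std Dev F1_Positive: {baseline_f1_std:.3f}\n\n")
    f.write("Mean Positive F1 by technique (difference from baseline mean):\n")
    for name, score in technique_f1.items():
        f.write(f"  - {name}: {score:.3f} ({score - baseline_f1_mean:+.3f})\n")
    f.write("\nPer-model Positive F1 (mean):\n\n")
    f.write(mitigated_positive.to_string())
    f.write("\n\nRecommendations:\n")
    f.write(f"- Highest mean Positive F1 in this run: {technique_f1.index[0]}.\n")
    f.write("- Consider additional metrics like precision/recall per class for deeper insight.\n")

logging.info(f"Imbalance analysis saved: {analysis_path}")
```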

```text
2025-06-26 18:27:29,326 - INFO - Imbalance analysis saved: analysis/imbalance_analysis.txt
```

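The step objective (Positive recall) and the last recommendation both call for per-class precision and recall, which need per-sample predictions rather than the aggregated `comparative_results.csv`. Assuming lists of gold and predicted labels are available for a model, the hypothetical helper `per_class_metrics` below derives them from a confusion matrix built with `pandas.crosstab`. scikit-learn's `classification_report` reports the same quantities if that library is already in use.

```python
import pandas as pd

CLASSES = ["Negative", "Neutral", "Positive"]


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Precision, recall, F1 and support for each class from gold and predicted labels."""
    cm = pd.crosstab(pd.Series(y_true, name="true"), pd.Series(y_pred, name="pred"))
    # Ensure every class is both a row and a column, even if it is never predicted
    cm = cm.reindex(index=classes, columns=classes, fill_value=0)
    tp = pd.Series([cm.loc[c, c] for c in classes], index=classes, dtype=float)
    predicted = cm.sum(axis=0).astype(float)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1).astype(float)     # row sums: support of each class
    # 0/0 gives NaN; report 0 in that case, as is common practice
    precision = (tp / predicted).fillna(0.0)
    recall = (tp / actual).fillna(0.0)
    f1 = (2 * precision * recall / (precision + recall)).fillna(0.0)
    return pd.DataFrame(
        {"precision": precision, "recall": recall, "f1": f1, "support": actual.astype(int)}
    )


# Illustrative toy labels, not project data
y_true = ["Positive", "Positive", "Neutral", "Negative", "Negative", "Positive"]
y_pred = ["Positive", "Neutral", "Neutral", "Negative", "Positive", "Negative"]
print(per_class_metrics(y_true, y_pred).round(3))
```

In the toy labels only one of the three Positive examples is predicted as Positive, so Positive recall is 1/3 even though Positive precision is 0.5; this is the kind of gap that an F1-only summary can hide.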