## BINARY CLASSIFICATION

In [1]:
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
import pandas as pd
import numpy as np

In [2]:
# Load dataset

text_cleaned = pd.read_csv("../../assets/Cleaned_Tweets.csv")
text_cleaned.head()

Unnamed: 0,tweet,emotion,product_filled,cleaned_tweets,text_length
0,.@wesley83 i have a 3g iphone. after 3 hrs twe...,negative,iphone,3g iphone 3 hrs tweeting rise_austin dead need...,12
1,@jessedee know about @fludapp ? awesome ipad/i...,positive,ipad or iphone app,know awesome ipadiphone app youll likely appre...,14
2,@swonderlin can not wait for #ipad 2 also. the...,positive,ipad,wait ipad 2 also sale sxsw,6
3,@sxsw i hope this year's festival isn't as cra...,negative,ipad or iphone app,hope years festival isnt crashy years iphone a...,9
4,@sxtxstate great stuff on fri #sxsw: marissa m...,positive,google,great stuff fri sxsw marissa mayer google tim ...,15


In [5]:
binary_df = text_cleaned[text_cleaned["emotion"].isin(['positive', 'negative'])]
binary_df


Unnamed: 0,tweet,emotion,product_filled,cleaned_tweets,text_length
0,.@wesley83 i have a 3g iphone. after 3 hrs twe...,negative,iphone,3g iphone 3 hrs tweeting rise_austin dead need...,12
1,@jessedee know about @fludapp ? awesome ipad/i...,positive,ipad or iphone app,know awesome ipadiphone app youll likely appre...,14
2,@swonderlin can not wait for #ipad 2 also. the...,positive,ipad,wait ipad 2 also sale sxsw,6
3,@sxsw i hope this year's festival isn't as cra...,negative,ipad or iphone app,hope years festival isnt crashy years iphone a...,9
4,@sxtxstate great stuff on fri #sxsw: marissa m...,positive,google,great stuff fri sxsw marissa mayer google tim ...,15
...,...,...,...,...,...
9076,@mention your pr guy just convinced me to swit...,positive,iphone,pr guy convinced switch back iphone great sxsw...,10
9078,&quot;papyrus...sort of like the ipad&quot; - ...,positive,ipad,quotpapyrussort like ipadquot nice lol sxsw la...,7
9079,diller says google tv &quot;might be run over ...,negative,other google product or service,diller says google tv quotmight run playstatio...,13
9084,i've always used camera+ for my iphone b/c it ...,positive,ipad or iphone app,ive always used camera iphone bc image stabili...,17


In [9]:
X_binary = binary_df["cleaned_tweets"]
y_binary = binary_df["emotion"]

y_binary = y_binary.map({'negative': 0, 'positive': 1})


Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_binary, y_binary, test_size=0.2, random_state=42, stratify=y_binary
)

In [10]:


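# Carve a validation set out of the training split (stratified, 80/20).
# Xb_temp / yb_temp is the portion actually used for fitting below.
Xb_temp, Xb_val, yb_temp, yb_val = train_test_split(
    Xb_train, yb_train, test_size=0.2, random_state=42, stratify=yb_train
)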
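The two stratified splits above can be sanity-checked on toy data. The sketch below is only an illustration: the label counts (160 negatives, 840 positives) are hypothetical and are not the tweet data. It shows that `stratify` keeps the positive share roughly constant across the fit, validation, and test portions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels for illustration only: 160 negatives (0), 840 positives (1).
y = np.array([0] * 160 + [1] * 840)
X = np.arange(len(y)).reshape(-1, 1)  # dummy features, one column

# First split: hold out 20% as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Second split: hold out 20% of the training portion for validation.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42, stratify=y_train
)

# Report size and positive share of each portion.
for name, labels in [("fit", y_fit), ("val", y_val), ("test", y_test)]:
    print(f"{name}: n={len(labels)}, positive share={labels.mean():.3f}")
```

Because the validation set is carved out of the training split, the test set stays untouched until the final evaluation.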



In [11]:
# Define pipeline
sentiment_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1,2))),
    ("clf", LogisticRegression(max_iter=1000))
])

# Fit pipeline
sentiment_pipeline.fit(Xb_temp, yb_temp)

# Predictions
y_pred = sentiment_pipeline.predict(Xb_val)

# Evaluation
print("Classification Report:\n", classification_report(yb_val, y_pred))
print("Confusion Matrix:\n", confusion_matrix(yb_val, y_pred))

Classification Report:
               precision    recall  f1-score   support

           0       0.86      0.07      0.12        91
           1       0.85      1.00      0.92       477

    accuracy                           0.85       568
   macro avg       0.85      0.53      0.52       568
weighted avg       0.85      0.85      0.79       568

Confusion Matrix:
 [[  6  85]
 [  1 476]]



SUMMARY:
Recall for class 0 (negative) on the validation set is only 0.07: the model finds 6 of the 91 negative tweets and labels the other 85 as positive. The 0.85 accuracy is misleading, because predicting positive for every tweet would already score about 0.84 (477/568).

This is common with imbalanced datasets. For this positive vs. negative task, the likely causes are:

- The class distribution is heavily skewed: positives outnumber negatives by roughly 5 to 1 (477 vs. 91 in the validation set).
- No balancing technique has been applied (e.g., class weights or oversampling).
- With so few negative examples, the model mostly learns features that correlate with positives.
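To put a number on the skew, here is a minimal sketch that computes the imbalance ratio and the weights that scikit-learn's `class_weight="balanced"` heuristic would assign. It uses the validation support counts printed above (91 and 477) as a stand-in. In practice the weights would be derived from the training labels, which should have nearly the same ratio because the splits are stratified.

```python
# Class counts from the validation support column above:
# 91 negatives (label 0) and 477 positives (label 1).
counts = {0: 91, 1: 477}

n_samples = sum(counts.values())
n_classes = len(counts)

# Heuristic documented for class_weight="balanced" in scikit-learn:
#   weight_c = n_samples / (n_classes * count_c)
balanced_weights = {
    label: n_samples / (n_classes * count) for label, count in counts.items()
}

imbalance_ratio = counts[1] / counts[0]
print(f"positives per negative: {imbalance_ratio:.2f}")
for label, weight in balanced_weights.items():
    print(f"class {label}: weight {weight:.3f}")
```

Under this heuristic, each negative tweet would count a little over three times as much as under uniform weighting, which is one alternative to oversampling.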

In [13]:
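# Define pipeline. imblearn's Pipeline applies SMOTE only during fit,
# so the validation data is never resampled at predict time.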

from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline as ImbPipeline

smote_pipeline = ImbPipeline([
    ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1,2))),
    ("smote", SMOTE(random_state=42)),
    ("clf", LogisticRegression(max_iter=1000, class_weight=None))  # class_weight not needed since we use SMOTE
])

# Fit pipeline
smote_pipeline.fit(Xb_temp, yb_temp)

# Predictions
y_pred = smote_pipeline.predict(Xb_val)

# Evaluation
print("Classification Report:\n", classification_report(yb_val, y_pred))
print("Confusion Matrix:\n", confusion_matrix(yb_val, y_pred))

Classification Report:
               precision    recall  f1-score   support

           0       0.54      0.53      0.53        91
           1       0.91      0.91      0.91       477

    accuracy                           0.85       568
   macro avg       0.72      0.72      0.72       568
weighted avg       0.85      0.85      0.85       568

Confusion Matrix:
 [[ 48  43]
 [ 41 436]]


SMOTE helped the model pick up more of the minority class (label 0, negative) than the baseline did.

Change in validation performance relative to the previous run:

- Recall for class 0 rose from 0.07 to 0.53: the model now catches 48 of the 91 negative tweets instead of 6.
- Precision for class 0 dropped from 0.86 to 0.54. This is expected: the model now predicts class 0 far more often (89 times instead of 7), so more positive tweets are mislabeled as negative (41 instead of 1).
- Class 1 remains strong at 0.91 precision and 0.91 recall, although its recall fell from 1.00.
- Overall accuracy stayed at 85%, but the model is now far more balanced across the two classes (macro-average recall 0.72 vs. 0.53).
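The figures quoted above can be re-derived directly from the two printed confusion matrices. The sketch below assumes scikit-learn's layout (rows are true labels, columns are predicted labels). The helper name `per_class_metrics` is made up for this illustration.

```python
def per_class_metrics(cm):
    """Return {label: (precision, recall)} for a square confusion matrix.

    Rows are true labels and columns are predicted labels.
    """
    n = len(cm)
    metrics = {}
    for k in range(n):
        true_pos = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        metrics[k] = (precision, recall)
    return metrics


# Confusion matrices copied from the two validation runs above.
baseline_cm = [[6, 85], [1, 476]]
smote_cm = [[48, 43], [41, 436]]

for name, cm in [("baseline", baseline_cm), ("SMOTE", smote_cm)]:
    for label, (precision, recall) in per_class_metrics(cm).items():
        print(f"{name} class {label}: precision={precision:.2f} recall={recall:.2f}")
```

Reading the matrices this way makes the trade-off concrete: the second row of the SMOTE matrix shows the 41 positive tweets that are now misclassified as negative.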

In [14]:
param_grid = {
    "tfidf__max_features": [3000, 5000, 7000],
    "tfidf__ngram_range": [(1,1), (1,2)],
    "clf__C": [0.01, 0.1, 1, 10],
    "clf__solver": ["liblinear", "lbfgs"],
    "clf__penalty": ["l2"]  # liblinear & lbfgs both work with L2
}

# Grid search
grid_search = GridSearchCV(
    smote_pipeline,
    param_grid,
    cv=5,
    scoring="f1_macro",
    n_jobs=-1,
    verbose=2
)

# Fit grid search
grid_search.fit(Xb_train, yb_train)

# Best parameters
print("Best Parameters:", grid_search.best_params_)

# Evaluate on test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(Xb_test)

print("\nClassification Report:\n", classification_report(yb_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(yb_test, y_pred))

Fitting 5 folds for each of 48 candidates, totalling 240 fits
Best Parameters: {'clf__C': 10, 'clf__penalty': 'l2', 'clf__solver': 'liblinear', 'tfidf__max_features': 7000, 'tfidf__ngram_range': (1, 1)}

Classification Report:
               precision    recall  f1-score   support

           0       0.63      0.58      0.61       114
           1       0.92      0.94      0.93       596

    accuracy                           0.88       710
   macro avg       0.78      0.76      0.77       710
weighted avg       0.87      0.88      0.88       710


Confusion Matrix:
 [[ 66  48]
 [ 38 558]]
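The "48 candidates, totalling 240 fits" line in the log follows from the size of the parameter grid. As a quick check, this sketch enumerates the same grid with the standard library only. It mirrors how an exhaustive grid search counts combinations, but it is not scikit-learn's own code.

```python
from itertools import product

# Same grid as in the cell above.
param_grid = {
    "tfidf__max_features": [3000, 5000, 7000],
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.01, 0.1, 1, 10],
    "clf__solver": ["liblinear", "lbfgs"],
    "clf__penalty": ["l2"],
}

# Every combination of one value per parameter is one candidate.
candidates = list(product(*param_grid.values()))

n_folds = 5  # cv=5 in the GridSearchCV call
total_cv_fits = len(candidates) * n_folds

print(f"{len(candidates)} candidates x {n_folds} folds = {total_cv_fits} fits")
```

Each extra value in any list multiplies the total, so the grid grows quickly. With the default `refit=True`, the best candidate is also refit once on the full training split after the search.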


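Key takeaways from the test-set results:

- Recall for class 0 (minority) is 0.58 on the test set, versus 0.53 for the untuned SMOTE pipeline on the validation set, so the tuned model appears to catch more minority samples. The two figures come from different splits, and the tuned model was refit on the full training split, so treat the gain as indicative rather than exact.
- Recall for class 1 (majority) stayed strong at 0.94.
- Overall accuracy went up to 88%.
- Macro-average recall, which matters more than accuracy for imbalanced data, also improved (0.72 to 0.76).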