## Task 7

#### In your opinion, can we do better? I propose that your team try all the ML models you know and give us the one with the best possible precision.

```python
# =========================
# Common Setup (run once)
# - Loads NLTK twitter_samples
# - Fixed split: train = [:4000], test = [4000:] for each class
# - Minimal tweet cleaning
# - Produces: train_x_clean, test_x_clean, train_y, test_y
# =========================
import re, string, numpy as np
import nltk
from nltk.corpus import twitter_samples, stopwords
from sklearn.metrics import precision_score, classification_report, confusion_matrix
# Reproducibility
RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)

# If not cached locally, keep these lines enabled
nltk.download('twitter_samples')
nltk.download('stopwords')

# 1) Load dataset
pos = twitter_samples.strings('positive_tweets.json')
neg = twitter_samples.strings('negative_tweets.json')

# 2) Fixed split per requirement
train_pos, test_pos = pos[:4000], pos[4000:]
train_neg, test_neg = neg[:4000], neg[4000:]

train_x = train_pos + train_neg
test_x  = test_pos + test_neg
train_y = np.append(np.ones(len(train_pos)), np.zeros(len(train_neg)))  # 1=Positive, 0=Negative
test_y  = np.append(np.ones(len(test_pos)),  np.zeros(len(test_neg)))

# 3) Minimal, robust cleaning for short tweets
STOP = set(stopwords.words('english'))
def clean_tweet(t: str) -> str:
    t = re.sub(r'http\S+', '', t)   # remove URLs
    t = re.sub(r'@\w+', '', t)      # remove @mentions
    t = t.lower()
    t = t.translate(str.maketrans('', '', string.punctuation))  # also strips emoticons such as :) and :(
    toks = [w for w in t.split() if w not in STOP]
    return " ".join(toks)

# 4) Preprocess once and reuse
train_x_clean = [clean_tweet(t) for t in train_x]
test_x_clean  = [clean_tweet(t) for t in test_x]

print(f"Train size: {len(train_x_clean)} | Test size: {len(test_x_clean)}")

def evaluate_model(name, y_true, y_pred):
    """Prints precision for Positive=1 and Negative=0, plus full report and confusion matrix."""
    print(f"\n=== {name} ===")
    print("Precision (Positive=1):", precision_score(y_true, y_pred, pos_label=1))
    print("Precision (Negative=0):", precision_score(y_true, y_pred, pos_label=0))
    print("\nClassification report:")
    print(classification_report(y_true, y_pred, target_names=["Negative", "Positive"]))
    print("Confusion matrix [[TN, FP],[FN, TP]]:")
    print(confusion_matrix(y_true, y_pred))
```




```text
Train size: 8000 | Test size: 2000
```
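One detail of `clean_tweet` worth checking: `string.punctuation` contains `:`, `(` and `)`, so emoticons such as `:)` and `:(`, which are common in this corpus, are deleted entirely before vectorizing. The sketch below is a hypothetical variant, `clean_tweet_keep_emoticons`, that pulls emoticons out with a simple regex (an assumption that only covers common forms like `:)`, `:-(`, `;D`) before stripping punctuation. It was not run on this dataset, so its effect on precision is unmeasured.

```python
import re
import string

# Assumed emoticon pattern: eyes [:;=], optional nose [-'], mouth [()DPp]
EMOTICON_RE = re.compile(r"[:;=][\-']?[()DPp]")

def clean_tweet_keep_emoticons(t, stop=frozenset()):
    """Hypothetical variant of clean_tweet that keeps emoticons as tokens."""
    t = re.sub(r"http\S+", "", t)        # remove URLs
    t = re.sub(r"@\w+", "", t)           # remove @mentions
    emoticons = EMOTICON_RE.findall(t)   # capture emoticons before punctuation is stripped
    t = EMOTICON_RE.sub(" ", t)          # drop them from the text itself
    t = t.lower()
    t = t.translate(str.maketrans("", "", string.punctuation))
    toks = [w for w in t.split() if w not in stop]
    return " ".join(toks + emoticons)    # emoticons appended; order is irrelevant for bag-of-words

print(clean_tweet_keep_emoticons("Great day :) http://x.co @bob"))  # great day :)
```

In the notebook it would be called as `clean_tweet_keep_emoticons(t, STOP)`. Note that `TfidfVectorizer`'s default `token_pattern` only keeps runs of two or more word characters, so it would still discard `:)`; pairing this cleaner with `TfidfVectorizer(token_pattern=r"\S+")` (whitespace tokens) keeps the emoticons as features.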


##### Here are the ML models our team knows that may give better precision

1. Linear SVM (LinearSVC, TF-IDF)

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

tfidf = TfidfVectorizer(ngram_range=(1,2), min_df=2)
X_tr = tfidf.fit_transform(train_x_clean)
X_te = tfidf.transform(test_x_clean)

svm_lin = LinearSVC(C=1.0, random_state=RANDOM_STATE)
svm_lin.fit(X_tr, train_y)

y_pred = svm_lin.predict(X_te)
evaluate_model("LinearSVC (TF-IDF)", test_y, y_pred)
```



```text
=== LinearSVC (TF-IDF) ===
Precision (Positive=1): 0.7551020408163265
Precision (Negative=0): 0.7221702525724977

Classification report:
              precision    recall  f1-score   support

    Negative       0.72      0.77      0.75      1000
    Positive       0.76      0.70      0.73      1000

    accuracy                           0.74      2000
   macro avg       0.74      0.74      0.74      2000
weighted avg       0.74      0.74      0.74      2000

Confusion matrix [[TN, FP],[FN, TP]]:
[[772 228]
 [297 703]]
```
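`evaluate_model` prints the confusion matrix in sklearn's `[[TN, FP], [FN, TP]]` layout, and both precision values can be read straight off it: precision for a class is the number of correct predictions of that class divided by all predictions of that class. A small sketch using the LinearSVC matrix above; the helper name `precision_from_confusion` is illustrative, not an sklearn function.

```python
def precision_from_confusion(cm, positive=1):
    """Precision from a 2x2 matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    if positive == 1:
        return tp / (tp + fp)  # of all tweets predicted Positive, share that are Positive
    return tn / (tn + fn)      # of all tweets predicted Negative, share that are Negative

cm_linear_svc = [[772, 228], [297, 703]]
print(precision_from_confusion(cm_linear_svc, positive=1))  # 703 / 931, about 0.7551
print(precision_from_confusion(cm_linear_svc, positive=0))  # 772 / 1069, about 0.7222
```

The same reading applies to every model below: raising Positive precision means shrinking FP (the top-right cell), which usually also shrinks TP.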


2. RBF SVM (SVC with RBF kernel, TF-IDF)

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

tfidf = TfidfVectorizer(ngram_range=(1,2), min_df=2)
X_tr = tfidf.fit_transform(train_x_clean)
X_te = tfidf.transform(test_x_clean)

svm_rbf = SVC(kernel="rbf", C=2.0, gamma="scale", random_state=RANDOM_STATE)
svm_rbf.fit(X_tr, train_y)

y_pred = svm_rbf.predict(X_te)
evaluate_model("SVC RBF (TF-IDF)", test_y, y_pred)
```



```text
=== SVC RBF (TF-IDF) ===
Precision (Positive=1): 0.7694013303769401
Precision (Negative=0): 0.7213114754098361

Classification report:
              precision    recall  f1-score   support

    Negative       0.72      0.79      0.76      1000
    Positive       0.77      0.69      0.73      1000

    accuracy                           0.74      2000
   macro avg       0.75      0.74      0.74      2000
weighted avg       0.75      0.74      0.74      2000

Confusion matrix [[TN, FP],[FN, TP]]:
[[792 208]
 [306 694]]
```


3. Multinomial Naive Bayes (TF-IDF)

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

tfidf = TfidfVectorizer(ngram_range=(1,2), min_df=2)
X_tr = tfidf.fit_transform(train_x_clean)
X_te = tfidf.transform(test_x_clean)

mnb = MultinomialNB(alpha=0.5)  # try 0.3–1.0 for tuning
mnb.fit(X_tr, train_y)

y_pred = mnb.predict(X_te)
evaluate_model("MultinomialNB (TF-IDF)", test_y, y_pred)
```



```text
=== MultinomialNB (TF-IDF) ===
Precision (Positive=1): 0.7788018433179723
Precision (Negative=0): 0.7137809187279152

Classification report:
              precision    recall  f1-score   support

    Negative       0.71      0.81      0.76      1000
    Positive       0.78      0.68      0.72      1000

    accuracy                           0.74      2000
   macro avg       0.75      0.74      0.74      2000
weighted avg       0.75      0.74      0.74      2000

Confusion matrix [[TN, FP],[FN, TP]]:
[[808 192]
 [324 676]]
```
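The cell above suggests trying `alpha` between 0.3 and 1.0. Choosing it by looking at test-set scores would leak test information into model selection, so a cleaner option is cross-validation on the training set only. A sketch using sklearn's `Pipeline` and `GridSearchCV` with `scoring="precision"` (Positive class); the function name `tune_mnb_alpha`, the alpha grid and the 5-fold split are illustrative choices, not settings from the original run.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

def tune_mnb_alpha(texts, labels, alphas=(0.3, 0.5, 0.7, 1.0), n_splits=5, seed=42):
    """Pick MultinomialNB's alpha by cross-validated Positive-class precision."""
    # Vectorizer inside the pipeline: each CV fold fits its own TF-IDF vocabulary
    pipe = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
        ("mnb", MultinomialNB()),
    ])
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    search = GridSearchCV(pipe, {"mnb__alpha": list(alphas)}, scoring="precision", cv=cv)
    search.fit(texts, labels)  # refit=True (default) retrains the best pipeline on all data
    return search.best_params_["mnb__alpha"], search.best_score_, search.best_estimator_
```

Usage: `best_alpha, cv_precision, model = tune_mnb_alpha(train_x_clean, train_y)`, then evaluate `model.predict(test_x_clean)` once with `evaluate_model`.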


4. Bernoulli Naive Bayes (binary bag-of-words counts)

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

bin_vec = CountVectorizer(binary=True, ngram_range=(1,2), min_df=2)
X_tr = bin_vec.fit_transform(train_x_clean)
X_te = bin_vec.transform(test_x_clean)

bnb = BernoulliNB(alpha=0.5)
bnb.fit(X_tr, train_y)

y_pred = bnb.predict(X_te)
evaluate_model("BernoulliNB (Binary BoW)", test_y, y_pred)
```



```text
=== BernoulliNB (Binary BoW) ===
Precision (Positive=1): 0.799265605875153
Precision (Negative=0): 0.7066779374471682

Classification report:
              precision    recall  f1-score   support

    Negative       0.71      0.84      0.77      1000
    Positive       0.80      0.65      0.72      1000

    accuracy                           0.74      2000
   macro avg       0.75      0.74      0.74      2000
weighted avg       0.75      0.74      0.74      2000

Confusion matrix [[TN, FP],[FN, TP]]:
[[836 164]
 [347 653]]
```
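`CountVectorizer(binary=True)` records whether a term occurs in a tweet, not how often, which matches BernoulliNB's presence/absence model (BernoulliNB also scores the absence of each vocabulary term, and its default `binarize=0.0` would binarize counts anyway). A pure-Python sketch of the difference, on unigrams only (the vectorizer above also adds bigrams); `bag_of_words` is an illustrative helper, not part of sklearn.

```python
from collections import Counter

def bag_of_words(text, vocab, binary=False):
    """Count vector over `vocab`; with binary=True every nonzero count becomes 1."""
    counts = Counter(text.split())
    return [min(counts[w], 1) if binary else counts[w] for w in vocab]

vocab = ["happy", "sad", "day"]
print(bag_of_words("happy happy day", vocab))               # [2, 0, 1]
print(bag_of_words("happy happy day", vocab, binary=True))  # [1, 0, 1]
```

For tweets this rarely changes much, since most words appear at most once in a short text.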


5. XGBoost (TF-IDF) (requires `pip install xgboost`)


```python
from sklearn.feature_extraction.text import TfidfVectorizer
from xgboost import XGBClassifier

tfidf = TfidfVectorizer(ngram_range=(1,2), min_df=2)
X_tr = tfidf.fit_transform(train_x_clean)
X_te = tfidf.transform(test_x_clean)

xgb = XGBClassifier(
    random_state=RANDOM_STATE,
    eval_metric="logloss",
    n_estimators=300,
    max_depth=5,
    learning_rate=0.1,
    n_jobs=-1
)
xgb.fit(X_tr, train_y)

y_pred = xgb.predict(X_te)
evaluate_model("XGBoost (TF-IDF)", test_y, y_pred)
```



```text
=== XGBoost (TF-IDF) ===
Precision (Positive=1): 0.7958579881656804
Precision (Negative=0): 0.6510574018126888

Classification report:
              precision    recall  f1-score   support

    Negative       0.65      0.86      0.74      1000
    Positive       0.80      0.54      0.64      1000

    accuracy                           0.70      2000
   macro avg       0.72      0.70      0.69      2000
weighted avg       0.72      0.70      0.69      2000

Confusion matrix [[TN, FP],[FN, TP]]:
[[862 138]
 [462 538]]
```


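#### Conclusion
All five models shown here reach about the same accuracy (0.74, except XGBoost at 0.70), so none is clearly better overall; they differ mainly in how they trade precision against recall. For Positive-class precision, BernoulliNB on binary bag-of-words is the best of the models tried (0.799), with XGBoost close behind (0.796) but with far lower Positive recall (0.54 vs. 0.65). For Negative-class precision, LinearSVC is marginally ahead (0.722 vs. 0.721 for the RBF SVM).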
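Since the task asks for the best precision, the reported numbers can be ranked per class. A sketch that copies the precision values from the outputs above (rounded to four decimals; nothing is re-run) and sorts them; the winner depends on which class's precision is the target.

```python
# Precision values copied from the outputs above (Positive = 1, Negative = 0)
results = {
    "LinearSVC (TF-IDF)":       {"pos": 0.7551, "neg": 0.7222},
    "SVC RBF (TF-IDF)":         {"pos": 0.7694, "neg": 0.7213},
    "MultinomialNB (TF-IDF)":   {"pos": 0.7788, "neg": 0.7138},
    "BernoulliNB (Binary BoW)": {"pos": 0.7993, "neg": 0.7067},
    "XGBoost (TF-IDF)":         {"pos": 0.7959, "neg": 0.6511},
}

def rank_by(metric):
    """Model names sorted from highest to lowest value of `metric`."""
    return sorted(results, key=lambda name: results[name][metric], reverse=True)

print(rank_by("pos")[0])  # BernoulliNB (Binary BoW)
print(rank_by("neg")[0])  # LinearSVC (TF-IDF)
```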

## Task 8
#### It is 2025, so use a virtual assistant such as ChatGPT (or, better, call an LLM through its API) as the benchmark, and find a way to run the course's test set through it to determine the sentiment. What is your conclusion?

*(Values depend on the actual run; they are typically high because the dataset is clean and balanced.)*

### 1. ChatGPT API (LLM-based Classifier)
- **Setup**: For each tweet, a prompt was sent to GPT-4o-mini with the instruction:  
*“Classify the sentiment of this tweet as Positive or Negative”*.
- **No training** required; the model predicts labels directly.
- **Results (40 tweets: 20 positive + 20 negative)**:


```python
import os
import nltk
from nltk.corpus import twitter_samples
from openai import OpenAI
from sklearn.metrics import classification_report

nltk.download('twitter_samples')
all_positive_tweets = twitter_samples.strings('positive_tweets.json')
all_negative_tweets = twitter_samples.strings('negative_tweets.json')

# Test subset: 20 positive + 20 negative tweets from the held-out range [4000:]
test_pos = all_positive_tweets[4000:4020]
test_neg = all_negative_tweets[4000:4020]
test_x = test_pos + test_neg
test_y = [1]*len(test_pos) + [0]*len(test_neg)   # 1 = Positive, 0 = Negative

# Read the key from the OPENAI_API_KEY environment variable; never hard-code secrets in a notebook
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def classify_with_chatgpt(sentence):
    prompt = f"Classify the sentiment of this tweet as Positive or Negative:\n\nTweet: {sentence}"
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0
    )
    label = response.choices[0].message.content.strip().lower()
    return 1 if "positive" in label else 0   # any reply without "positive" counts as Negative

y_pred = [classify_with_chatgpt(s) for s in test_x]   # one API call per tweet (40 calls)

print(classification_report(test_y, y_pred, target_names=["Negative", "Positive"]))
```




```text
              precision    recall  f1-score   support

    Negative       0.90      0.95      0.93        20
    Positive       0.95      0.90      0.92        20

    accuracy                           0.93        40
   macro avg       0.93      0.93      0.92        40
weighted avg       0.93      0.93      0.92        40
```



*(Values vary by run; generally comparable to Logistic Regression, sometimes slightly better on informal or tricky tweets.)*
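`classify_with_chatgpt` returns 1 whenever the reply contains "positive" and 0 otherwise, so an off-format reply is silently scored as one class or the other: a refusal or "Mixed" becomes Negative, while "Not positive" becomes Positive. A stricter sketch, assuming the prompt is changed to ask for exactly one word: it accepts only a leading "positive" or "negative" and returns `None` for anything else, so unclear replies can be counted and inspected instead. The prompt wording and `parse_sentiment` are illustrative, not what was run above.

```python
import re

# Hypothetical stricter prompt (not the one used in the run above)
PROMPT_TEMPLATE = (
    "Classify the sentiment of this tweet. "
    "Answer with exactly one word, Positive or Negative.\n\nTweet: {tweet}"
)

def parse_sentiment(reply):
    """Return 1 for Positive, 0 for Negative, None if the reply is off-format."""
    words = re.findall(r"[a-z]+", reply.lower())  # skips punctuation and markdown such as ** or .
    if words[:1] == ["positive"]:
        return 1
    if words[:1] == ["negative"]:
        return 0
    return None

print(parse_sentiment("**Negative**"))  # 0
print(parse_sentiment("Not positive"))  # None
```

In the notebook the prompt would be built with `PROMPT_TEMPLATE.format(tweet=sentence)`, and `None` results reported separately from the classification report.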

### Key Observations
- **Accuracy**: Both models perform well on this small, balanced test set.
- **Efficiency**:
  - Logistic Regression is **instantaneous** and free after training.
  - ChatGPT API requires **40 separate API calls** (one per tweet; slower, and each call costs tokens).
- **Generalization**:
  - Logistic Regression relies on word frequencies; it struggles with slang, emojis, and sarcasm.
  - ChatGPT leverages pre-training on massive corpora, so it often handles informal text better.
- **Reproducibility**:
  - Logistic Regression is fully deterministic with a fixed random seed.
  - ChatGPT output may vary between runs; setting `temperature=0` reduces, but does not fully eliminate, this randomness.
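With only 40 tweets, the ChatGPT accuracy carries a wide margin of error. From the report, 19 of 20 Negative and 18 of 20 Positive tweets were classified correctly (37/40 = 0.925, shown as 0.93). A Wilson score interval, a standard approximate 95% confidence interval for a proportion, makes that uncertainty concrete; the second call uses the LinearSVC run above (772 + 703 = 1475 correct out of 2000) for scale. The two runs use different test sets (a 40-tweet subset vs. all 2000 tweets), so this is a rough comparison only.

```python
import math

def wilson_interval(correct, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion such as accuracy."""
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

print(wilson_interval(37, 40))      # roughly (0.80, 0.97): ChatGPT, 40 tweets
print(wilson_interval(1475, 2000))  # roughly (0.72, 0.76): LinearSVC, 2000 tweets
```

Even the lower end for the 40-tweet run sits above the classical models' accuracy, but the interval is about four times wider than the 2000-tweet one.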

### Conclusion
Logistic Regression provides a **strong, fast, reproducible baseline**.  
The ChatGPT API delivers **comparable or better accuracy**, especially on nuanced tweets (measured here on only 40 tweets),  
but at the cost of **time and API usage**.
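Both costs noted above (one paid API call per tweet, and run-to-run variation) can be reduced by caching replies: each tweet is sent once, and later runs reuse the stored label. A minimal sketch assuming a local JSON file; `classify_with_cache` and `llm_labels.json` are hypothetical names, and `classify` can be the notebook's `classify_with_chatgpt`.

```python
import json
import os

def classify_with_cache(tweets, classify, cache_path="llm_labels.json"):
    """Call `classify` only for tweets not already stored in the JSON cache."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            cache = json.load(f)
    labels = []
    for tweet in tweets:
        if tweet not in cache:
            cache[tweet] = classify(tweet)  # one API call per uncached tweet
        labels.append(cache[tweet])
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(cache, f, ensure_ascii=False)  # persist so reruns reuse the same labels
    return labels
```

Usage: `y_pred = classify_with_cache(test_x, classify_with_chatgpt)`. A rerun then reports exactly the same labels, and only new tweets cost API calls.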
