# 🧠 Emotion Detection from Text using Machine Learning

### 📌 Project Overview
This project detects human emotions (like joy, sadness, anger, fear, love, and surprise) based on textual input using a supervised machine learning approach.

---

### 📂 Dataset Source
- **Name:** DAIR-AI Emotion Dataset  
- **Link:** [https://huggingface.co/datasets/dair-ai/emotion](https://huggingface.co/datasets/dair-ai/emotion)  
- **Size:** ~16,000 real-world English sentences (the `train` split loaded below), each labeled with one of 6 emotions.

---

### ⚙️ What This Project Does
- Cleans raw text using `neattext`
- Converts text into bag-of-words count features using `CountVectorizer`
- Trains a **Multinomial Naive Bayes classifier**
- Evaluates with precision, recall, and F1-score
- Predicts emotion for any custom user input
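
To make the classifier step in the list above more concrete, here is a minimal pure-Python sketch of multinomial Naive Bayes with Laplace smoothing. It is not the scikit-learn implementation used in this project. The function names (`train_multinomial_nb`, `predict_multinomial_nb`) and the toy documents are hypothetical and exist only for illustration.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class priors and per-class word counts from tokenized documents."""
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "priors": {c: n / len(labels) for c, n in class_doc_counts.items()},
        "word_counts": word_counts,
        "totals": {c: sum(word_counts[c].values()) for c in class_doc_counts},
        "vocab": vocab,
        "alpha": alpha,
    }

def predict_multinomial_nb(model, tokens):
    """Return the class with the highest log posterior for one document."""
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    scores = {}
    for c, prior in model["priors"].items():
        score = math.log(prior)
        denom = model["totals"][c] + alpha * vocab_size
        for tok in tokens:
            if tok not in model["vocab"]:
                continue  # words never seen in training are skipped
            # Laplace-smoothed estimate of P(word | class)
            score += math.log((model["word_counts"][c][tok] + alpha) / denom)
        scores[c] = score
    return max(scores, key=scores.get)

# Toy data, invented for illustration only
toy_docs = [
    ["feeling", "happy", "excited"],
    ["happy", "day"],
    ["feeling", "lonely", "hopeless"],
    ["hopeless", "day"],
]
toy_labels = ["joy", "joy", "sadness", "sadness"]
toy_model = train_multinomial_nb(toy_docs, toy_labels)

print(predict_multinomial_nb(toy_model, ["feeling", "happy"]))
print(predict_multinomial_nb(toy_model, ["feeling", "hopeless"]))
```

The smoothing term `alpha` keeps a word that never appeared with a class from driving that class's probability to zero.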

---

### ✅ Final Accuracy Achieved
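- **Accuracy:** ~78% (0.7803 on a 3,200-sample held-out test set)
- **Strong recall** on **joy** and **sadness** (0.94 each)
- **Weak recall** on **love** (0.35) and **surprise** (0.09), the two smallest classes; more data or refinement would likely help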


In [24]:
!pip install datasets




In [72]:
from datasets import load_dataset
import pandas as pd

# Load the 'train' split of the dataset
dataset = load_dataset("dair-ai/emotion", split='train')

# Convert to Pandas DataFrame
df = dataset.to_pandas()

# Map integer labels to emotion names
label_map = {
    0: "sadness",
    1: "joy",
    2: "love",
    3: "anger",
    4: "fear",
    5: "surprise"
}
df['emotion'] = df['label'].map(label_map)

# Keep only required columns
df = df[['text', 'emotion']]
df.head()


| | text | emotion |
|---|---|---|
| 0 | i didnt feel humiliated | sadness |
| 1 | i can go from feeling so hopeless to so damned... | sadness |
| 2 | im grabbing a minute to post i feel greedy wrong | anger |
| 3 | i am ever feeling nostalgic about the fireplac... | love |
| 4 | i am feeling grouchy | anger |


In [28]:
# Install neattext if not installed
!pip install neattext

import neattext.functions as nfx

# Remove user handles, hashtags, punctuation, and stopwords
df['clean_text'] = df['text'].apply(nfx.remove_userhandles)
df['clean_text'] = df['clean_text'].apply(nfx.remove_hashtags)
df['clean_text'] = df['clean_text'].apply(nfx.remove_punctuations)
df['clean_text'] = df['clean_text'].apply(nfx.remove_stopwords)

df[['text', 'clean_text', 'emotion']].head()




| | text | clean_text | emotion |
|---|---|---|---|
| 0 | i didnt feel humiliated | didnt feel humiliated | sadness |
| 1 | i can go from feeling so hopeless to so damned... | feeling hopeless damned hopeful cares awake | sadness |
| 2 | im grabbing a minute to post i feel greedy wrong | im grabbing minute post feel greedy wrong | anger |
| 3 | i am ever feeling nostalgic about the fireplac... | feeling nostalgic fireplace know property | love |
| 4 | i am feeling grouchy | feeling grouchy | anger |
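
The four cleaning steps above can be approximated with the standard library alone, which may help show what each step removes. This is a rough sketch, not how `neattext` is implemented: the regular expressions, the `clean_text` helper, and the tiny `TOY_STOPWORDS` set are all assumptions made for illustration.

```python
import re
import string

# Hypothetical, deliberately tiny stopword list; neattext ships a much larger one.
TOY_STOPWORDS = {"i", "am", "a", "to", "the", "so", "about", "can", "from"}

def clean_text(text):
    """Approximate the cleaning order used above: handles, hashtags, punctuation, stopwords."""
    text = re.sub(r"@\w+", " ", text)  # drop @user handles
    text = re.sub(r"#\w+", " ", text)  # drop #hashtags
    text = text.translate(str.maketrans("", "", string.punctuation))  # strip punctuation
    tokens = [t for t in text.split() if t.lower() not in TOY_STOPWORDS]
    return " ".join(tokens)

print(clean_text("i am feeling grouchy"))
print(clean_text("@sam I can't wait!!! #excited"))
```

Handles and hashtags are removed before punctuation on purpose: stripping `@` and `#` first would leave the handle or tag text behind as an ordinary word.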


In [56]:
from sklearn.feature_extraction.text import CountVectorizer

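# Vectorize the clean text into bag-of-words counts
# (note: fitted on all rows, so the vocabulary also includes words from the later test split)
cv = CountVectorizer()
X = cv.fit_transform(df['clean_text'])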

# Labels
y = df['emotion']
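
The cell above builds raw word counts. As a comparison, the pure-Python sketch below shows what a count matrix looks like on a toy corpus and how TF-IDF would reweight it. The IDF formula is the smoothed form documented for scikit-learn's `TfidfVectorizer` defaults, `ln((1 + n) / (1 + df)) + 1`; the final L2 row normalization that scikit-learn also applies is left out for brevity. The toy corpus and variable names are assumptions for illustration.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only
toy_corpus = [
    "feeling grouchy",
    "feeling hopeless",
    "feeling nostalgic fireplace",
]

tokenized = [doc.split() for doc in toy_corpus]
vocab = sorted({tok for doc in tokenized for tok in doc})

# Bag-of-words: one row per document, one column per vocabulary word.
count_rows = [[Counter(doc)[word] for word in vocab] for doc in tokenized]

# Smoothed inverse document frequency: words found in fewer documents get larger weights.
n_docs = len(tokenized)
doc_freq = {word: sum(word in doc for doc in tokenized) for word in vocab}
idf = {word: math.log((1 + n_docs) / (1 + doc_freq[word])) + 1 for word in vocab}

# TF-IDF (unnormalized): raw count scaled by the word's IDF weight.
tfidf_rows = [[count * idf[word] for count, word in zip(row, vocab)] for row in count_rows]

print(vocab)
print(count_rows[0])
print({word: round(weight, 3) for word, weight in idf.items()})
```

In this toy corpus "feeling" appears in every document, so it receives the lowest weight, while words unique to one document are weighted up.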


In [70]:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Naive Bayes model
model = MultinomialNB()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluation
acc = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

# Print metrics
print("Accuracy:", acc)
print("\nClassification Report:\n", report)

# Summary of insights
print("\nWhat the Report Tells Us:")
print(f"Accuracy: {round(acc * 100, 2)}% on {len(y_test)} test samples")
print("Joy & Sadness: High recall (model is strong)")
print("Love & Surprise: Low recall (needs more samples or refinement)")


Accuracy: 0.7803125

Classification Report:
               precision    recall  f1-score   support

       anger       0.88      0.67      0.76       427
        fear       0.82      0.63      0.71       397
         joy       0.75      0.94      0.83      1021
        love       0.88      0.35      0.51       296
     sadness       0.77      0.94      0.85       946
    surprise       0.77      0.09      0.16       113

    accuracy                           0.78      3200
   macro avg       0.81      0.60      0.64      3200
weighted avg       0.79      0.78      0.76      3200


What the Report Tells Us:
Accuracy: 78.03% on 3200 test samples
Joy & Sadness: High recall (model is strong)
Love & Surprise: Low recall (needs more samples or refinement)
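
The gap between the macro average (0.60 recall) and the weighted average (0.78 recall) in the report above follows directly from the class imbalance. The short sketch below recomputes both from the per-class recall and support values printed in the report. Because the inputs are already rounded to two decimals, the results are approximate.

```python
# Per-class (recall, support) copied from the classification report above.
report_rows = {
    "anger":    (0.67, 427),
    "fear":     (0.63, 397),
    "joy":      (0.94, 1021),
    "love":     (0.35, 296),
    "sadness":  (0.94, 946),
    "surprise": (0.09, 113),
}

total_support = sum(support for _, support in report_rows.values())

# Macro average: every class counts equally, so weak minority classes pull it down.
macro_recall = sum(recall for recall, _ in report_rows.values()) / len(report_rows)

# Weighted average: each class counts in proportion to its number of test samples.
weighted_recall = sum(r * s for r, s in report_rows.values()) / total_support

print(total_support)
print(round(macro_recall, 2))
print(round(weighted_recall, 2))
```

Support-weighted recall is mathematically the same quantity as accuracy, which is why it lands on 0.78. The macro figure is the better indicator of how the model treats small classes such as surprise.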


In [60]:
import joblib

joblib.dump(model, "emotion_classifier_model.pkl")
joblib.dump(cv, "vectorizer.pkl")

print("Model and vectorizer saved successfully.")


Model and vectorizer saved successfully.
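
Saving the model and the vectorizer as two files works, but the two must always be loaded as a matching pair. One possible alternative, sketched here on invented toy data, is to wrap both in a scikit-learn `Pipeline` and save a single artifact. The file name `emotion_pipeline.pkl` and the toy texts are hypothetical; this is not what the notebook above does.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data, invented for illustration only
toy_texts = ["feel happy great", "so happy today", "feel sad lonely", "sad and crying"]
toy_labels = ["joy", "joy", "sadness", "sadness"]

# The pipeline fits the vectorizer and the classifier together on the same data.
toy_pipeline = make_pipeline(CountVectorizer(), MultinomialNB())
toy_pipeline.fit(toy_texts, toy_labels)

# Save and reload one artifact instead of two (written to a temporary directory here).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "emotion_pipeline.pkl")
    joblib.dump(toy_pipeline, path)
    restored = joblib.load(path)

# The restored pipeline accepts raw strings directly.
print(restored.predict(["happy"])[0])
```

A pipeline fitted only on the training split also keeps the vectorizer's vocabulary from being learned on test rows.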


In [75]:
# Load model and vectorizer
model = joblib.load("emotion_classifier_model.pkl")
vectorizer = joblib.load("vectorizer.pkl")

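# Predict function: apply the same cleaning steps that were used on the training text
def predict_emotion(text):
    clean = nfx.remove_userhandles(text)
    clean = nfx.remove_hashtags(clean)
    clean = nfx.remove_punctuations(clean)
    clean = nfx.remove_stopwords(clean)
    vect_text = vectorizer.transform([clean])
    return model.predict(vect_text)[0]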

# Take user input
user_input = input("Enter your text: ")
predicted_emotion = predict_emotion(user_input)

print(f"Predicted Emotion: {predicted_emotion}")


Enter your text:  I'm really anxious about tomorrow.


Predicted Emotion: fear
