#Emotion Detection (ML)

✅ Features of the System:

1)Input: Text

2)Output: Emotion label (joy, anger, sadness, or fear)

3)Techniques: ML (Logistic Regression / SVM), TF-IDF, NLP preprocessing
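To make the TF-IDF technique listed above concrete, here is a minimal pure-Python sketch of the weighting that scikit-learn's `TfidfVectorizer` applies with its default settings (smoothed IDF, L2-normalised rows). The function name `tfidf_vectors` and the two sample sentences are illustrative assumptions, and tokenisation is simplified to whitespace splitting.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (illustrative sketch)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: counts[t] * idf[t] for t in counts}
        # L2-normalise so every non-empty document has unit length
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

vectors = tfidf_vectors(["feeling sad", "feeling happy"])
print(vectors[0])
```

In this toy corpus "feeling" appears in both documents and "sad" in only one, so "sad" receives the larger weight. That is the property that lets a linear classifier focus on emotion-bearing words.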





##🔹 Cell 1: Install & Import Required Libraries



In [1]:
!pip install kagglehub
import kagglehub
import pandas as pd
import numpy as np
import re
import nltk
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

nltk.download('punkt')
nltk.download('stopwords')




[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

##🔹 Cell 2: Download Dataset and Preprocess

In [2]:
# Download dataset
path = kagglehub.dataset_download("praveengovi/emotions-dataset-for-nlp")

# Read the semicolon-separated training file (one "text;emotion" pair per line)
df = pd.read_csv(path + '/train.txt', sep=';', names=["text", "emotion"])
# Keep only the four target emotions
df = df[df['emotion'].isin(['joy', 'anger', 'sadness', 'fear'])]
df.head()


| index | text | emotion |
|---|---|---|
| 0 | i didnt feel humiliated | sadness |
| 1 | i can go from feeling so hopeless to so damned... | sadness |
| 2 | im grabbing a minute to post i feel greedy wrong | anger |
| 4 | i am feeling grouchy | anger |
| 5 | ive been feeling a little burdened lately wasn... | sadness |
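The loading and filtering step can be illustrated without downloading anything. The sketch below uses Python's `csv` module on a few in-memory lines in the same `text;emotion` layout. The first two rows are taken from the output above, while the third row and its `love` label are made up to show a row being filtered out.

```python
import csv
import io

TARGET_EMOTIONS = {"joy", "anger", "sadness", "fear"}

sample = io.StringIO(
    "i didnt feel humiliated;sadness\n"
    "i am feeling grouchy;anger\n"
    "i feel so fond of this place;love\n"  # hypothetical row, not from the dataset
)

# Each line splits into exactly two fields: the text and its label
rows = [
    {"text": text, "emotion": emotion}
    for text, emotion in csv.reader(sample, delimiter=";")
]
# Keep only rows whose label is one of the four target emotions
kept = [row for row in rows if row["emotion"] in TARGET_EMOTIONS]

for row in kept:
    print(row["emotion"], "|", row["text"])
```

Filtering keeps the original row index in pandas, which is why index 3 is missing from the output above: that row carried a label outside the four targets.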


##🔹 Cell 3: Clean Text Data

In [3]:
import nltk
import pandas as pd
import re
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download 'punkt_tab' resource
nltk.download('punkt_tab')

stop_words = set(stopwords.words('english'))


def clean_text(text):
    text = text.lower()
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    tokens = word_tokenize(text)
    filtered = [w for w in tokens if w not in stop_words]
    return " ".join(filtered)


df['clean_text'] = df['text'].apply(clean_text)
df.head()

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


| index | text | emotion | clean_text |
|---|---|---|---|
| 0 | i didnt feel humiliated | sadness | didnt feel humiliated |
| 1 | i can go from feeling so hopeless to so damned... | sadness | go feeling hopeless damned hopeful around some... |
| 2 | im grabbing a minute to post i feel greedy wrong | anger | im grabbing minute post feel greedy wrong |
| 4 | i am feeling grouchy | anger | feeling grouchy |
| 5 | ive been feeling a little burdened lately wasn... | sadness | ive feeling little burdened lately wasnt sure |
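To see what each step of `clean_text` contributes without downloading NLTK data, here is a minimal stdlib-only sketch. It assumes a tiny hand-picked stopword set (`STOP_WORDS`, hypothetical and far smaller than NLTK's list) and uses `str.split` in place of `word_tokenize`, so results on real text may differ.

```python
import re

# Hypothetical, tiny stand-in for NLTK's English stopword list
STOP_WORDS = {"i", "am", "a", "the", "and", "so", "to", "very", "don't"}

def clean_text_sketch(text, stop_words=STOP_WORDS):
    text = text.lower()                   # 1. normalise case
    text = re.sub(r"[^a-z\s]", "", text)  # 2. drop digits and punctuation
    tokens = text.split()                 # 3. whitespace tokenisation
    # 4. remove stopwords and rebuild the string
    return " ".join(w for w in tokens if w not in stop_words)

print(clean_text_sketch("I am feeling very sad and lost today."))
print(clean_text_sketch("I don't feel sad!"))
```

The second call shows a side effect of stripping punctuation before filtering: "don't" becomes "dont", which no longer matches the stopword entry. Apostrophe-free forms such as `didnt`, `im`, and `ive` in the output above likewise pass through unfiltered.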


##🔹 Cell 4: Train ML Model

In [4]:
# Split
X_train, X_test, y_train, y_test = train_test_split(df['clean_text'], df['emotion'], test_size=0.2, random_state=42)

# Vectorize
vectorizer = TfidfVectorizer(max_features=5000)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Train Model
model = LogisticRegression()
model.fit(X_train_vec, y_train)

# Evaluate
y_pred = model.predict(X_test_vec)
print(classification_report(y_test, y_pred))



              precision    recall  f1-score   support

       anger       0.92      0.82      0.87       425
        fear       0.92      0.79      0.85       384
         joy       0.91      0.97      0.94      1095
     sadness       0.91      0.93      0.92       921

    accuracy                           0.91      2825
   macro avg       0.92      0.88      0.90      2825
weighted avg       0.91      0.91      0.91      2825
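The summary rows of the report can be checked by hand. The sketch below copies the rounded per-class precision, recall, and support values from the report and recomputes F1 and two of the averages. Because it starts from rounded inputs, it is a sanity check rather than an exact reproduction of scikit-learn's internal numbers.

```python
# (precision, recall, support) per class, copied from the report above
report = {
    "anger": (0.92, 0.82, 425),
    "fear": (0.92, 0.79, 384),
    "joy": (0.91, 0.97, 1095),
    "sadness": (0.91, 0.93, 921),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

total = sum(support for _, _, support in report.values())
# Macro average: every class counts equally
macro_recall = sum(r for _, r, _ in report.values()) / len(report)
# Weighted average: every class counts in proportion to its support
weighted_recall = sum(r * s for _, r, s in report.values()) / total

for label, (p, r, _) in report.items():
    print(f"{label:8s} f1 = {f1(p, r):.2f}")
print(f"support total   = {total}")
print(f"macro recall    = {macro_recall:.2f}")
print(f"weighted recall = {weighted_recall:.2f}")
```

Support-weighted recall is the fraction of all test samples classified correctly, so it matches the accuracy row. Macro recall is lower because the two smallest classes, anger and fear, also have the lowest recall.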



##🔹 Cell 5: Predict Function

In [5]:
def predict_emotion(text):
    cleaned = clean_text(text)
    vec = vectorizer.transform([cleaned])
    pred = model.predict(vec)[0]
    return pred

# Example
predict_emotion("I am feeling very sad and lost today.")


'sadness'

✅ How Audio Input Works:

🔊 Audio (.wav; .mp3 must first be converted to WAV, e.g. with pydub)

→ 🧠 Speech Recognition

→ ✍️ Transcribed Text

→ 📊 Emotion Classification (ML model)

→ 😃 Predicted Emotion
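The pipeline above is a composition of two steps, which can be sketched with stand-in components. Everything in this block (`classify_audio`, `fake_transcribe`, `fake_classify`, the file names) is hypothetical and exists only to show the control flow; the notebook's real functions are defined in the cells that follow.

```python
from typing import Callable, Optional

def classify_audio(
    file_path: str,
    transcribe: Callable[[str], Optional[str]],
    classify: Callable[[str], str],
) -> Optional[str]:
    """Run speech-to-text, then text classification; None means no transcript."""
    text = transcribe(file_path)
    if text is None:
        return None
    return classify(text)

# Stand-in components so the sketch runs without audio files or a network call
def fake_transcribe(file_path: str) -> Optional[str]:
    transcripts = {"sad.wav": "i feel so sad and lost"}  # hypothetical file name
    return transcripts.get(file_path)

def fake_classify(text: str) -> str:
    return "sadness" if "sad" in text.split() else "joy"

print(classify_audio("sad.wav", fake_transcribe, fake_classify))
print(classify_audio("missing.wav", fake_transcribe, fake_classify))
```

Only the transcript reaches the classifier, so tone of voice plays no part in the prediction. Returning `None` on failure, rather than an error string, keeps a failed transcription from being confused with real speech.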

##Adding Audio Feature

In [6]:
!pip install speechrecognition pydub


Collecting speechrecognition
  Downloading speechrecognition-3.14.3-py3-none-any.whl.metadata (30 kB)
Collecting pydub
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading speechrecognition-3.14.3-py3-none-any.whl (32.9 MB)
Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Installing collected packages: pydub, speechrecognition
Successfully installed pydub-0.25.1 speechrecognition-3.14.3


In [7]:
import speech_recognition as sr
from pydub import AudioSegment
from pydub.playback import play

def transcribe_audio(file_path):
    recognizer = sr.Recognizer()
    audio_file = sr.AudioFile(file_path)
    with audio_file as source:
        audio = recognizer.record(source)
    try:
        text = recognizer.recognize_google(audio)
        return text
    except sr.UnknownValueError:
        return "Could not understand audio"
    except sr.RequestError:
        return "Speech recognition service is unavailable"


In [11]:
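# Sentinel messages returned by transcribe_audio on failure
TRANSCRIPTION_ERRORS = {"Could not understand audio", "Speech recognition service is unavailable"}

def predict_emotion_from_audio(file_path):
    text = transcribe_audio(file_path)
    print("Transcribed text:", text)
    # Compare exactly: a substring test would also reject real speech such as "I could not sleep"
    if text in TRANSCRIPTION_ERRORS:
        return "Error in audio transcription"
    return predict_emotion(text)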


In [12]:
# Upload your audio file (WAV format recommended)
from google.colab import files
uploaded = files.upload()

# Use the filename (e.g., 'sad_voice.wav')
# predict_emotion_from_audio is already defined in the previous cell
predict_emotion_from_audio("/content/OAF_bite_sad.wav")

Saving OAF_bite_sad.wav to OAF_bite_sad (3).wav
Transcribed text: say the word bite


'sadness'

✅ Already Done:

 ~Text-based emotion detection

 ~Audio-based emotion detection (speech is transcribed to text, then passed to the text classifier)

🔜 Next Step: Image Input (Facial Expression Analysis)

We’ll:

1)Load facial emotion dataset or use a pre-trained model

2)Process webcam or image input

3)Predict emotion using a CNN

##✅ Install & Import Required Libraries

In [13]:
# Install necessary packages
!pip install opencv-python-headless --quiet


In [14]:
# Import libraries
import cv2
import numpy as np
from keras.models import load_model
import matplotlib.pyplot as plt


In [15]:
# Step 1: Install TensorFlow
!pip install -q tensorflow

# Step 2: Import necessary libraries
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

# Step 3: Sample Data
texts = ["I am happy", "I am sad", "I am angry", "I am scared"]
labels = [0, 1, 2, 3]  # 0: joy, 1: sadness, 2: anger, 3: fear

# Step 4: Tokenization
tokenizer = Tokenizer(num_words=1000, oov_token="<OOV>")
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
padded_sequences = pad_sequences(sequences, maxlen=10)

# Step 5: Convert labels to array
labels = np.array(labels)

# Step 6: Build LSTM model
# Named lstm_model so this cell does not replace the Logistic Regression
# `model` that predict_emotion() relies on
lstm_model = Sequential([
    Embedding(input_dim=1000, output_dim=16),
    LSTM(32),
    Dense(4, activation='softmax')
])

# Step 7: Compile & Train
lstm_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
lstm_model.fit(padded_sequences, labels, epochs=50, verbose=1)

# Step 8: Save model
lstm_model.save("/content/emotion_model.h5")


Epoch 1/50




[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 3s/step - accuracy: 0.5000 - loss: 1.3864
Epoch 2/50
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 52ms/step - accuracy: 0.5000 - loss: 1.3856
Epoch 3/50
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 55ms/step - accuracy: 0.5000 - loss: 1.3849
Epoch 4/50
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 50ms/step - accuracy: 0.5000 - loss: 1.3841
Epoch 5/50
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 52ms/step - accuracy: 0.5000 - loss: 1.3833
Epoch 6/50
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 59ms/step - accuracy: 0.5000 - loss: 1.3825
Epoch 7/50
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 51ms/step - accuracy: 0.7500 - loss: 1.3817
Epoch 8/50
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 52ms/step - accuracy: 1.0000 - loss: 1.3809
Epoch 9/50
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 51ms
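The tokenization and padding steps in the cell above can be mimicked in plain Python to see what the LSTM actually receives. This is a simplified stand-in, not Keras's implementation: it assumes the default behaviour of `Tokenizer` and `pad_sequences` (lowercasing, index 1 for the OOV token, remaining indices by descending word frequency, zero-padding on the left) and skips the punctuation filtering that the real `Tokenizer` performs. The helper names `fit_word_index` and `to_padded_sequence` are made up for this sketch.

```python
from collections import Counter

def fit_word_index(texts, oov_token="<OOV>"):
    """Assign index 1 to the OOV token, then 2, 3, ... by descending word frequency."""
    counts = Counter(word for text in texts for word in text.lower().split())
    word_index = {oov_token: 1}
    for word, _ in counts.most_common():
        word_index[word] = len(word_index) + 1
    return word_index

def to_padded_sequence(text, word_index, maxlen=10, oov_token="<OOV>"):
    """Map words to indices and left-pad with zeros to a fixed length."""
    oov = word_index[oov_token]
    seq = [word_index.get(word, oov) for word in text.lower().split()]
    seq = seq[-maxlen:]  # if too long, keep the last maxlen tokens
    return [0] * (maxlen - len(seq)) + seq

texts = ["I am happy", "I am sad", "I am angry", "I am scared"]
word_index = fit_word_index(texts)
print(word_index)
print(to_padded_sequence("I am sad", word_index))
print(to_padded_sequence("I am furious", word_index))  # unseen word maps to the OOV index
```

With only four training sentences, the single distinguishing word in each sequence is all the network can learn from, so any emotion word outside this vocabulary collapses to the same OOV index.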



##✅ Install Emoji Library

In [18]:
!pip install emoji

Collecting emoji
  Downloading emoji-2.14.1-py3-none-any.whl.metadata (5.7 kB)
Downloading emoji-2.14.1-py3-none-any.whl (590 kB)
Installing collected packages: emoji
Successfully installed emoji-2.14.1


In [19]:
import emoji

def get_emoji_description(emoji_char):
    return emoji.demojize(emoji_char).strip(":").replace("_", " ")  # e.g. ":loudly_crying_face:" -> "loudly crying face"
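For comparison, a similar description can be obtained from the standard library alone. The sketch below uses `unicodedata.name`, which returns the official Unicode character name. This is an alternative technique, not what the `emoji` package does, and the helper name `describe_emoji` is hypothetical. It only handles emojis that consist of a single code point.

```python
import unicodedata

def describe_emoji(char, default="unknown"):
    """Return the lowercase Unicode name of a single-code-point emoji."""
    if len(char) != 1:
        return default  # sequences such as flags span several code points
    return unicodedata.name(char, default).lower()

print(describe_emoji("😭"))
print(describe_emoji("😂"))
```

The Unicode names can differ in wording from the aliases the `emoji` package uses, so the two approaches are not interchangeable as dictionary keys.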

In [20]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "cardiffnlp/twitter-roberta-base-emoji"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

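def predict_emoji_emotion(emoji_char):  # renamed from `emoji` to avoid shadowing the emoji module
    inputs = tokenizer(emoji_char, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(probs).item()
    return predicted_class  # class index; map this to emotion labels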


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/1.29k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/150 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/499M [00:00<?, ?B/s]

In [1]:
!pip install transformers torch emoji




##🧠 Load Pre-trained Emoji Model (Cardiff NLP)

In [2]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import numpy as np

# Load model and tokenizer
model_name = "cardiffnlp/twitter-roberta-base-emoji"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Load label mapping
labels = [
    "😂", "😍", "😭", "😊", "😒", "💕", "👌", "😘", "😁", "😩",
    "🔥", "🙏", "😏", "😉", "🙌", "😔", "💪", "😷", "👏", "😃",
    # The model was trained on Twitter emojis; entry i must be the emoji for class index i,
    # so verify the order of this list against the label mapping on the model card
]




In [10]:
emoji_to_emotion = {
    "😂": "joy",
    "😍": "love",
    "😭": "sadness",
    "😊": "happiness",
    "😒": "disapproval",
    "💕": "affection",
    "👌": "approval",
    "😘": "affection",
    "😁": "cheerful",
    "😩": "tired",
    "🔥": "excitement",
    "🙏": "gratitude",
    "😏": "smug",
    "😉": "playful",
    "🙌": "celebration",
    "😔": "disappointment",
    "💪": "strength",
    "😷": "sick",
    "👏": "praise",
    "😃": "happy",
    "😡":"angry"
}
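Because predictions are looked up by position in `labels` and then by key in `emoji_to_emotion`, a small consistency check can catch mismatches early. The sketch below is a hypothetical helper (`check_label_mapping` is not part of any library). In the notebook, `num_classes` could plausibly be taken from `model.config.num_labels`, but that wiring is an assumption and is not shown here.

```python
def check_label_mapping(labels, emoji_to_emotion, num_classes):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    if len(labels) != num_classes:
        problems.append(f"expected {num_classes} labels, got {len(labels)}")
    if len(set(labels)) != len(labels):
        problems.append("labels contains duplicates")
    # Emojis the model can output but that have no emotion name
    unmapped = [e for e in labels if e not in emoji_to_emotion]
    if unmapped:
        problems.append(f"no emotion for: {unmapped}")
    # Emotion entries whose emoji the model can never output
    unreachable = [e for e in emoji_to_emotion if e not in labels]
    if unreachable:
        problems.append(f"never predicted: {unreachable}")
    return problems

# Shortened, hypothetical inputs for illustration
demo_labels = ["😂", "😍", "😭"]
demo_mapping = {"😂": "joy", "😍": "love", "😭": "sadness", "😡": "angry"}
for problem in check_label_mapping(demo_labels, demo_mapping, num_classes=3):
    print(problem)
```

Run against the notebook's own data, a check like this would flag "😡", which has an emotion entry but does not appear in `labels`.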


##🧪 Predict Emoji Emotion

In [11]:
def predict_emoji_emotion(emoji_input):
    inputs = tokenizer(emoji_input, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
        probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
        pred_class = torch.argmax(probs).item()
        predicted_emoji = labels[pred_class]
        emotion_name = emoji_to_emotion.get(predicted_emoji, "unknown")
        return emotion_name, probs[0][pred_class].item()
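The function above does three things after the forward pass: softmax, argmax, and two lookups. The following sketch reproduces that logic with plain Python lists so each step can be inspected without loading a model. The logits, the three-entry label list, and the helper name `top_prediction` are all made up for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_prediction(logits, labels, emoji_to_emotion):
    """Pick the most probable class and translate it to an emotion name."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # argmax
    emotion = emoji_to_emotion.get(labels[best], "unknown")
    return emotion, probs[best]

# Hypothetical logits for a three-class toy example
emotion, confidence = top_prediction(
    [0.2, 1.5, 0.3], ["😂", "😭", "😊"], {"😂": "joy", "😭": "sadness"}
)
print(emotion, round(confidence, 2))
```

The returned confidence is the probability of the top class only, so a low value means the probability mass is spread across several classes rather than that the prediction is necessarily wrong.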


In [13]:
emojis_to_test = ["😭"]

for emoji_char in emojis_to_test:
    emotion, confidence = predict_emoji_emotion(emoji_char)  # returns (emotion name, probability)
    print(f"Input Emoji: {emoji_char} → Predicted Emotion Emoji: {emotion} (Confidence: {confidence:.2f})")


Input Emoji: 😭 → Predicted Emotion Emoji: sadness (Confidence: 0.37)
