# GROUP 7 DSFT09 HYBRID PHASE 4
## Mirriam Mumbua

## Multi-Label Emotion Classification Using NLP: Analyzing Emotional Tone in Social Media and Text Data

## Project overview
### Problem Statement
### This project aims to build an emotion classifier using the GoEmotions dataset, which consists of human-labeled Reddit comments annotated for 27 emotion categories. The goal is to develop a model that accurately assigns one or more of these emotions to a piece of text. Such a model can be applied to social media comments, reviews, or customer feedback to detect emotional tone.

### Dataset
### The GoEmotions dataset was sourced from Reddit, a popular social media platform where users post comments on various topics. It consists of over 58,000 Reddit comments that were manually annotated by human labelers into 27 distinct emotion categories (such as joy, anger, sadness, and curiosity), plus a neutral label. The dataset was created by Google Research to advance Natural Language Processing (NLP) research. The comments were collected from publicly available Reddit posts, which gives a wide variety of topics and emotional expressions. Each comment was labeled with one or more emotions, making this a multi-label classification problem.

## Objective
### Develop a machine learning model capable of:

### Multi-label emotion detection: Predicting one or more emotion categories for each Reddit comment from the 27 possible emotion classes.
### Handling noisy and real-world text: Effectively preprocessing the text (e.g., dealing with slang, abbreviations, and varied sentence structures in Reddit comments) to ensure accurate predictions.
### Accurate classification: Maximizing the model's performance on key metrics for multi-label classification (such as F1-score, precision, and recall) across all 27 emotion categories.
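The metrics named above can be computed per label and then averaged. The following is a minimal sketch in plain Python, assuming each comment's true and predicted labels are held as sets of integer ids; the function name `multilabel_f1` and the toy data are illustrative and not part of the project code.

```python
from typing import Dict, List, Set


def multilabel_f1(y_true: List[Set[int]], y_pred: List[Set[int]],
                  num_labels: int) -> Dict[str, float]:
    """Compute micro- and macro-averaged F1 for multi-label predictions."""
    tp = [0] * num_labels  # true positives per label
    fp = [0] * num_labels  # false positives per label
    fn = [0] * num_labels  # false negatives per label

    for true_set, pred_set in zip(y_true, y_pred):
        for label in range(num_labels):
            in_true = label in true_set
            in_pred = label in pred_set
            if in_true and in_pred:
                tp[label] += 1
            elif in_pred:
                fp[label] += 1
            elif in_true:
                fn[label] += 1

    def f1(t: int, p: int, n: int) -> float:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the label never occurs
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    # Macro: average the per-label F1 scores (every label counts equally)
    macro = sum(f1(tp[i], fp[i], fn[i]) for i in range(num_labels)) / num_labels
    # Micro: pool the counts over all labels first (frequent labels dominate)
    micro = f1(sum(tp), sum(fp), sum(fn))
    return {"micro_f1": micro, "macro_f1": macro}


# Toy example with 3 labels (hypothetical data)
y_true = [{0}, {1, 2}, {2}]
y_pred = [{0}, {1}, {1, 2}]
scores = multilabel_f1(y_true, y_pred, num_labels=3)
print(scores)
```

Macro-averaging treats rare emotions the same as common ones, so it is usually the more demanding number when the label distribution is skewed.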

## Expected Outcome
### The final model will:

### Take a Reddit comment as input.
### Output one or more emotion labels (from the 27 possible emotions) that best represent the emotional tone of the comment.
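One common way to turn per-emotion scores into "one or more" labels is to apply a threshold. The sketch below assumes the model returns an independent probability per emotion; the function name `predict_labels`, the 0.5 threshold, the fallback to the top-scoring emotion, and the probabilities are all illustrative assumptions rather than the project's final design.

```python
from typing import Dict, List


def predict_labels(probabilities: Dict[str, float],
                   threshold: float = 0.5) -> List[str]:
    """Return every emotion whose probability reaches the threshold.

    If no emotion reaches it, fall back to the single most probable one
    so that each comment receives at least one label.
    """
    selected = [name for name, p in probabilities.items() if p >= threshold]
    if not selected:
        selected = [max(probabilities, key=probabilities.get)]
    # Most confident emotion first
    return sorted(selected, key=lambda name: -probabilities[name])


# Hypothetical model output for a single comment
example = {"gratitude": 0.91, "joy": 0.62, "anger": 0.03}
print(predict_labels(example))
```

The threshold can be tuned on the validation split, since a lower value trades precision for recall.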

## Applications
### This emotion detection model could be applied to:

### Social media monitoring: To understand public sentiment on platforms like Reddit, Twitter, and Facebook.
### Customer feedback analysis: For detecting emotional tone in product reviews or customer support conversations.
### Mental health monitoring: To detect signs of distress or mental health issues in text-based communications on forums or in private messages.



## Business Understanding

## Data Understanding

## Importing Necessary Libraries

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import nltk
from nltk.corpus import stopwords, wordnet
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from langdetect import detect
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from textblob import TextBlob

In [15]:
# Load the training data (tab-separated, no header row)
train_data = pd.read_csv('full_dataset/train.tsv', sep='\t', header=None, names=['text', 'label', 'id'])

# Load the validation data
dev_data = pd.read_csv('full_dataset/dev.tsv', sep='\t', header=None, names=['text', 'label', 'id'])

# Load the test data
test_data = pd.read_csv('full_dataset/test.tsv', sep='\t', header=None, names=['text', 'label', 'id'])

# Preview each split; in a notebook cell only the last expression (test_data) is displayed
train_data.head()
dev_data.head()
test_data.head()


| | text | label | id |
|---|---|---|---|
| 0 | I’m really sorry about your situation :( Altho... | 25 | eecwqtt |
| 1 | It's wonderful because it's awful. At not with. | 0 | ed5f85d |
| 2 | Kings fan here, good luck to you guys! Will be... | 13 | een27c3 |
| 3 | I didn't know that, thank you for teaching me ... | 15 | eelgwd1 |
| 4 | They got bored from haunting earth for thousan... | 27 | eem5uti |
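Because a comment can carry more than one emotion, the `label` column needs to be turned into a fixed-length target before modeling. The sketch below assumes that multi-label rows store their ids as a comma-separated string (for example `"2,5"`), while single-label rows may be read as plain integers. It uses 28 slots because the preview shows ids up to 27. The helper names `parse_labels` and `to_multi_hot` are hypothetical.

```python
from typing import List

NUM_LABELS = 28  # ids 0..27, as seen in the preview above


def parse_labels(raw) -> List[int]:
    """Turn a raw label cell (an int such as 25 or a string such as "2,5") into a list of ids."""
    return [int(part) for part in str(raw).split(",") if part.strip()]


def to_multi_hot(label_ids: List[int], num_labels: int = NUM_LABELS) -> List[int]:
    """Encode a list of label ids as a 0/1 vector with one slot per label."""
    vector = [0] * num_labels
    for label_id in label_ids:
        vector[label_id] = 1
    return vector


# Assumed example cells: one single-label, one multi-label
single = parse_labels(25)
multi = parse_labels("2,5")
print(single, multi)
print(to_multi_hot(multi))
```

With pandas, the same helpers could be applied through `train_data['label'].apply(parse_labels)`.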


In [28]:
# Map each integer label id to its emotion name (emotions.txt lists one name per line)
label_to_emotion = {}
with open('full_dataset/emotions.txt', 'r') as f:
    for idx, line in enumerate(f):
        label_to_emotion[idx] = line.strip()
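Once the mapping is built, label ids can be translated back into readable emotion names. The sketch below uses a small stand-in dictionary so that it runs without the dataset files; the five names shown are assumed from the published GoEmotions label order, and the real mapping should come from `emotions.txt`. The helper `decode_labels` is hypothetical.

```python
from typing import Dict, List

# Stand-in for the mapping loaded from emotions.txt (names assumed, subset only)
label_to_emotion: Dict[int, str] = {
    0: "admiration",
    13: "excitement",
    15: "gratitude",
    25: "sadness",
    27: "neutral",
}


def decode_labels(raw, mapping: Dict[int, str]) -> List[str]:
    """Translate a raw label cell (int or comma-separated string) into emotion names."""
    ids = [int(part) for part in str(raw).split(",") if part.strip()]
    # Keep unknown ids visible instead of failing, so gaps in the mapping are easy to spot
    return [mapping.get(i, f"unknown({i})") for i in ids]


print(decode_labels(25, label_to_emotion))
print(decode_labels("15,27", label_to_emotion))
```

Decoding a few rows this way is a quick sanity check that the label file and the data splits line up.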