# Emotion Classification in Text Samples

## Overview
This project builds machine learning models that classify emotions in text samples. The pipeline cleans the raw text, converts it to TF-IDF features, and trains Naive Bayes and Support Vector Machine (SVM) classifiers on those features.

## Objective
The objective of this assessment is to build and evaluate emotion classifiers for text, covering four stages: preprocessing, feature extraction, model training, and evaluation.

### 1. Preprocessing

#### Code

In [24]:
import pandas as pd
import re
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Load the dataset
data = pd.read_csv('C:\\Users\\ZAIN NIZAR YOUSAF\\Downloads\\nlp_dataset.csv')

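# Build the stopword set once; calling stopwords.words() for every token is slow.
# The NLTK stopwords corpus and tokenizer data must be downloaded beforehand.
stop_words = set(stopwords.words('english'))

# Sample text cleaning function
def preprocess_text(text):
    text = re.sub(r'[^a-zA-Z\s]', '', text)  # Keep only letters and whitespace
    text = text.lower()  # Convert to lowercase
    tokens = word_tokenize(text)  # Tokenize
    tokens = [word for word in tokens if word not in stop_words]  # Remove stopwords
    return ' '.join(tokens)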

# Apply preprocessing
data['cleaned_text'] = data['Comment'].apply(preprocess_text)
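
To see what each cleaning step does to a single comment, here is a minimal, dependency-free sketch. It mirrors the regex, lowercasing, and stopword steps above, but it splits on whitespace instead of calling `word_tokenize`, and `DEMO_STOPWORDS` is a tiny hypothetical list standing in for NLTK's English stopwords.

```python
import re

# Hypothetical, deliberately tiny stopword list; NLTK's English list is much longer.
DEMO_STOPWORDS = {"i", "am", "so", "the", "a", "to", "of", "and"}


def demo_preprocess(text):
    """Apply the same cleaning steps in order, using whitespace tokenization."""
    text = re.sub(r"[^a-zA-Z\s]", "", text)  # drop punctuation and digits
    text = text.lower()
    tokens = text.split()  # simplification: word_tokenize is not used here
    return " ".join(t for t in tokens if t not in DEMO_STOPWORDS)


print(demo_preprocess("I am SO happy, 100% of the time!"))
```

Because the regex runs first, contractions lose their apostrophe (for example `don't` becomes `dont`), so they no longer match stopword entries that contain one.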


### 2. Feature Extraction (2 marks)
For feature extraction, we will use **TfidfVectorizer**. This method transforms the text data into numerical features by calculating the Term Frequency-Inverse Document Frequency (TF-IDF) for each word, which reflects how important a word is to a document relative to the entire corpus.
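
As a rough illustration of the weighting, the sketch below computes TF-IDF by hand for a toy corpus. It assumes the smoothed formula that scikit-learn documents as the `TfidfVectorizer` default, `idf = ln((1 + n) / (1 + df)) + 1` followed by L2 normalization of each row. The function name `tfidf` and the corpus are hypothetical, and whitespace splitting stands in for the vectorizer's own tokenizer.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Raw term count times smoothed inverse document frequency.
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        # L2-normalize the row; fall back to 1.0 for an empty document.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({term: w / norm for term, w in weights.items()})
    return rows


corpus = ["feel happy today", "feel sad today", "feel angry"]
for row in tfidf(corpus):
    print({term: round(w, 3) for term, w in row.items()})
```

In this toy corpus `feel` occurs in every document, so it receives the lowest weight in each row, while a term found in only one document, such as `happy`, receives the highest.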

#### Code

In [27]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Initialize TfidfVectorizer
vectorizer = TfidfVectorizer()

# Fit and transform the cleaned text data
X = vectorizer.fit_transform(data['cleaned_text'])
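# NOTE: 'Comment' is the input text, not the emotion label. Using it as the
# target makes almost every class unique; replace it with the label column.
y = data['Comment']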


### 3. Model Development 
We will train the following machine learning models:

1. **Naive Bayes**
2. **Support Vector Machine (SVM)**

These models are commonly used for text classification tasks and will help us evaluate the effectiveness of our feature extraction process.

#### Code

In [28]:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Naive Bayes model
nb_model = MultinomialNB()
nb_model.fit(X_train, y_train)

# Train SVM model
svm_model = SVC()
svm_model.fit(X_train, y_train)


### 4. Model Comparison (2 marks)
We will evaluate both models using metrics such as accuracy and F1-score. These metrics will help us understand how well each model performs in classifying emotions in the text samples.
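
To make the two metrics concrete, the sketch below computes accuracy and the support-weighted F1-score by hand on hypothetical toy labels. The helper names `accuracy` and `weighted_f1` are illustrative; the weighting follows the usual definition of a weighted average, where each class's F1 is multiplied by its share of the true labels.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator >= n >= 1
        total += f1 * n / len(y_true)
    return total


# Hypothetical toy labels: the classifier over-predicts "joy".
y_true = ["joy", "joy", "joy", "fear", "anger", "anger"]
y_pred = ["joy", "joy", "joy", "joy", "anger", "joy"]

print(f"accuracy:    {accuracy(y_true, y_pred):.3f}")
print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")
```

The two numbers differ here because F1 penalizes the class that is never predicted (`fear`), which accuracy alone does not reveal.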

#### Code

In [29]:
from sklearn.metrics import accuracy_score, f1_score

# Predictions
nb_pred = nb_model.predict(X_test)
svm_pred = svm_model.predict(X_test)

# Evaluation metrics
nb_accuracy = accuracy_score(y_test, nb_pred)
svm_accuracy = accuracy_score(y_test, svm_pred)

nb_f1 = f1_score(y_test, nb_pred, average='weighted')
svm_f1 = f1_score(y_test, svm_pred, average='weighted')

print(f'Naive Bayes Accuracy: {nb_accuracy}, F1 Score: {nb_f1}')
print(f'SVM Accuracy: {svm_accuracy}, F1 Score: {svm_f1}')


Naive Bayes Accuracy: 0.0, F1 Score: 0.0
SVM Accuracy: 0.0008417508417508417, F1 Score: 0.0008417508417508417
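
Scores this close to zero usually point at the targets rather than at the classifiers. In the feature extraction cell, `y` is taken from the `Comment` column, so nearly every target is a unique string and a test-set label has almost never been seen during training. A quick check, sketched here with a hypothetical helper `label_summary` and made-up toy data, is to compare the number of distinct labels with the number of rows.

```python
from collections import Counter


def label_summary(labels):
    """Return (row count, distinct label count, three most common labels)."""
    counts = Counter(labels)
    return len(labels), len(counts), counts.most_common(3)


# Hypothetical toy data: text used as the target versus a real label column.
texts_as_labels = ["i feel great", "this scares me", "so angry now", "i feel calm"]
emotion_labels = ["joy", "fear", "anger", "joy"]

print(label_summary(texts_as_labels))
print(label_summary(emotion_labels))
```

When the distinct count is close to the row count, the column is almost certainly not a class label. With a pandas Series, `nunique()` gives the same distinct count.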
