
# EXPLORATORY DATA ANALYSIS: EMOTION DATASET
* This notebook analyzes the Emotion dataset provided for each candidate, one candidate at a time.
The code was implemented in Colab notebooks; the GitHub links to them are attached in the Notion document.
* The focus is on interpreting the outputs of the code as visual representations and extracting useful insights to support the decision-making process.
* The code is written in Python using the Pandas, Matplotlib and Seaborn libraries.

**BASIC DATASET ANALYSIS**

The Pandas functions .info() and .head() are used to get an idea of the kind of data present in the given CSV files, its range, and the data types used.

In [None]:
# Import necessary libraries
from google.colab import files
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Read the uploaded files into Pandas DataFrames
emotion_df = pd.read_csv('/content/emotion (10).csv')
gaze_df = pd.read_csv('/content/gaze (10).csv')
metadata_df = pd.read_csv('/content/metadata (10).csv')

# Summarize and preview the emotion and gaze DataFrames to ensure they've loaded correctly
# (.info() prints its report directly and returns None, so it is not wrapped in print())
print("Emotion Dataset Information:")
emotion_df.info()
print("\n")
print("Emotion Data:")
print(emotion_df.head())

print("\n")
print("\n")

print("Gaze Dataset Information:")
gaze_df.info()
print("\n")
print("Gaze Data: ")
print(gaze_df.head())

**DATASET STATISTICAL SUMMARY**


In [None]:
# Disable scientific notation in Pandas
pd.set_option('display.float_format', lambda x: '%.6f' % x)

# Now re-run the .describe() method to see the numbers in decimal format
emotion_summary = emotion_df[['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral']].describe()
print(emotion_summary)


**DOMINANT EMOTIONS**

In [None]:
# Plot the distribution of dominant emotions
plt.figure(figsize=(10, 6))
sns.countplot(data=emotion_df, x='dominant_emotion', palette='viridis')
plt.title('Distribution of Dominant Emotions')
plt.xlabel('Dominant Emotion')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

# Analyze dominant emotions
dominant_emotion_counts = emotion_df['dominant_emotion'].value_counts()
plt.figure(figsize=(8, 6))
dominant_emotion_counts.plot(kind='pie', autopct='%1.1f%%')
plt.title('Distribution of Dominant Emotions')
plt.ylabel('')
plt.tight_layout()
plt.show()
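
Before reading too much into the counts above, it may be worth checking how the `dominant_emotion` label relates to the seven score columns. The sketch below assumes (this is not confirmed by the dataset documentation) that the label is simply the emotion with the highest score in each row. The helper `dominant_mismatches` and the `sample` frame are hypothetical and exist only for illustration.

```python
import pandas as pd

EMOTION_COLUMNS = ['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral']

def dominant_mismatches(df, score_columns=EMOTION_COLUMNS, label_column='dominant_emotion'):
    """Return the rows whose label differs from the highest-scoring emotion column."""
    # idxmax(axis=1) gives, for each row, the name of the column holding the largest score
    top_scoring = df[score_columns].idxmax(axis=1)
    return df[top_scoring != df[label_column]]

# Tiny made-up frame for illustration only (not taken from the emotion CSV)
sample = pd.DataFrame({
    'angry': [0.1, 0.7], 'disgust': [0.0, 0.0], 'fear': [0.0, 0.1],
    'happy': [0.8, 0.1], 'sad': [0.0, 0.0], 'surprise': [0.0, 0.0],
    'neutral': [0.1, 0.1],
    'dominant_emotion': ['happy', 'neutral'],
})

# The second row is labelled 'neutral' although 'angry' has the highest score
mismatches = dominant_mismatches(sample)
print(f"Rows where the label is not the top-scoring emotion: {len(mismatches)}")
```

On the real data, `dominant_mismatches(emotion_df)` returning an empty DataFrame would support the assumption; any returned rows would deserve a closer look.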

**EVOLUTION OF EMOTIONS ACROSS DIFFERENT TIME FRAMES**

In [None]:
import math

# Step 1: Split the data into 3 parts: Initial, Middle, Ending
total_rows = len(emotion_df)
split_size = math.ceil(total_rows / 3)

# Create labels for intervals
emotion_df['time_interval'] = ['Initial'] * split_size + ['Middle'] * split_size + ['Ending'] * (total_rows - 2 * split_size)

# Step 2: Group by time interval and dominant emotion
grouped_df = emotion_df.groupby(['time_interval', 'dominant_emotion']).size().reset_index(name='count')

# Step 3: Plot the distribution of dominant emotions across the three intervals
plt.figure(figsize=(18, 6))

# Initial
plt.subplot(1, 3, 1)
sns.barplot(data=grouped_df[grouped_df['time_interval'] == 'Initial'], x='dominant_emotion', y='count', palette='viridis')
plt.title('Initial Time Interval')
plt.xlabel('Dominant Emotion')
plt.ylabel('Count')

# Middle
plt.subplot(1, 3, 2)
sns.barplot(data=grouped_df[grouped_df['time_interval'] == 'Middle'], x='dominant_emotion', y='count', palette='viridis')
plt.title('Middle Time Interval')
plt.xlabel('Dominant Emotion')
plt.ylabel('Count')

# Ending
plt.subplot(1, 3, 3)
sns.barplot(data=grouped_df[grouped_df['time_interval'] == 'Ending'], x='dominant_emotion', y='count', palette='viridis')
plt.title('Ending Time Interval')
plt.xlabel('Dominant Emotion')
plt.ylabel('Count')

plt.tight_layout()
plt.show()
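
The split above uses `math.ceil`, so the last interval can end up noticeably shorter than the other two (10 rows become 4, 4 and 2; 4 rows become 2, 2 and 0). As a possible alternative, the sketch below spreads the remainder over the leading intervals so that segment sizes differ by at most one row. The helper name `label_thirds` is hypothetical and not part of the original notebook.

```python
def label_thirds(n_rows, labels=("Initial", "Middle", "Ending")):
    """Label n_rows consecutive rows with near-equal consecutive segments."""
    base, extra = divmod(n_rows, len(labels))
    result = []
    for i, label in enumerate(labels):
        # The first `extra` segments receive one additional row each
        size = base + (1 if i < extra else 0)
        result.extend([label] * size)
    return result

# 10 rows are divided into segments of 4, 3 and 3 rows
interval_labels = label_thirds(10)
print({label: interval_labels.count(label) for label in ("Initial", "Middle", "Ending")})
```

In the notebook this could be applied as `emotion_df['time_interval'] = label_thirds(len(emotion_df))`, assuming the rows are already in chronological order.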


**AVERAGE INTENSITY OF EMOTIONS**

In [None]:
# Calculate average intensity for each emotion
emotion_averages = emotion_df[['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral']].mean().sort_values(ascending=False)

# Plot average emotion intensities
plt.figure(figsize=(10, 6))
sns.barplot(x=emotion_averages.index, y=emotion_averages.values)
plt.title('Average Intensity of Emotions')
plt.xlabel('Emotion')
plt.ylabel('Average Intensity')
plt.xticks(rotation=45)
plt.show()

print("Average Emotion Intensities:")
print(emotion_averages)

# Calculate the total duration of the interview
total_duration = metadata_df['elapsed_time'].max() - metadata_df['elapsed_time'].min()
print(f"\nTotal interview duration: {total_duration:.2f} seconds")

**GAZE ANALYSIS**

In [None]:
# Plot the distribution of gaze
plt.figure(figsize=(8, 6))
sns.countplot(x='gaze', data=gaze_df)
plt.title('Distribution of Gaze')
plt.xlabel('Gaze (1 = Direct, 0 = Not Direct)')
plt.ylabel('Count')
plt.tight_layout()
plt.show()

# Plot the distribution of eye offset
plt.figure(figsize=(10, 6))
sns.histplot(gaze_df['eye_offset'], bins=20, kde=True)
plt.title('Distribution of Eye Offset')
plt.xlabel('Eye Offset')
plt.ylabel('Frequency')
plt.tight_layout()
plt.show()

# Plot eye offset over time
plt.figure(figsize=(12, 6))
plt.plot(gaze_df['image_seq'], gaze_df['eye_offset'], marker='o')
plt.title('Eye Offset Over Time')
plt.xlabel('Frame Sequence')
plt.ylabel('Eye Offset')
plt.tight_layout()
plt.show()

**ANALYSIS OF EYE OFFSET**

In [None]:
# Create a scatter plot of eye offset over time
plt.figure(figsize=(10, 6))
plt.scatter(gaze_df['image_seq'], gaze_df['eye_offset'])
plt.title('Eye Offset Over Time')
plt.xlabel('Frame Sequence')
plt.ylabel('Eye Offset')
plt.tight_layout()
plt.show()

# Calculate percentage of frames with direct gaze (gaze == 1)
gaze_percentage = (gaze_df['gaze'].sum() / len(gaze_df)) * 100
print(f"Percentage of frames with direct gaze: {gaze_percentage:.2f}%")

# Calculate average eye offset
avg_eye_offset = gaze_df['eye_offset'].mean()
print(f"Average eye offset: {avg_eye_offset:.2f}")

# Correlation between gaze and eye offset
correlation = gaze_df['gaze'].corr(gaze_df['eye_offset'])
print(f"Correlation between gaze and eye offset: {correlation:.2f}")
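
A note on interpreting the last figure: `Series.corr` uses the Pearson method by default, and when one variable is binary (as `gaze` appears to be, with 1 = direct and 0 = not direct) the Pearson coefficient coincides with the point-biserial correlation, which is driven by the gap between the mean eye offset of the two gaze groups. The sketch below checks that equivalence in plain Python. The function names and the sample values are made up for illustration and are not taken from the gaze CSV.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def point_biserial_r(flags, values):
    """Point-biserial correlation between a 0/1 flag and a continuous value."""
    ones = [v for f, v in zip(flags, values) if f == 1]
    zeros = [v for f, v in zip(flags, values) if f == 0]
    n = len(values)
    mean_all = sum(values) / n
    # Population standard deviation (divides by n, not n - 1)
    std_all = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    # Difference between the group means, scaled by the group proportions
    mean_gap = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_gap / std_all * math.sqrt(len(ones) * len(zeros) / n ** 2)

# Made-up values for illustration only
sample_gaze = [1, 1, 0, 0, 1, 0]
sample_offset = [2.0, 3.0, 8.0, 9.0, 4.0, 7.0]

r = pearson_r(sample_gaze, sample_offset)
r_pb = point_biserial_r(sample_gaze, sample_offset)
print(f"Pearson r: {r:.4f}, point-biserial r: {r_pb:.4f}")
```

The two values agree, so a strongly negative coefficient on the real data would indicate that frames with direct gaze tend to have a smaller eye offset than frames without it.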