# Exploratory Data Analysis

In this notebook, we will perform exploratory data analysis (EDA) on the spam dataset to understand its structure, visualize the data, and derive insights that will help in building the LSTM spam detector.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
data = pd.read_csv('../data/spam_dataset.csv')

# Display the first few rows of the dataset
data.head()

In [None]:
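# Count missing values per column and show only the columns that have any
# (an empty Series means the dataset has no missing values)
missing_values = data.isnull().sum()
missing_values[missing_values > 0]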
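If the check above reports missing values, one simple option is to drop the affected rows before training, because a message without a label (or a label without text) cannot be used. The sketch below runs on a small toy DataFrame rather than the real file. The `message` column name is a hypothetical stand-in for the dataset's text column, and dropping instead of imputing is an assumption, not something the dataset requires.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the spam dataset; 'message' is a hypothetical text column name
data = pd.DataFrame({
    'label': ['ham', 'spam', 'ham', None],
    'message': ['See you at 5', 'WIN a free prize now', np.nan, 'Call me later'],
})

# Count missing values per column, keeping only columns that have any
missing_values = data.isnull().sum()
print(missing_values[missing_values > 0])

# Drop rows missing either the label or the text, then renumber the index
clean = data.dropna(subset=['label', 'message']).reset_index(drop=True)
print(len(clean))
```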

In [None]:
# Visualize the distribution of spam and ham messages
plt.figure(figsize=(8, 6))
sns.countplot(x='label', data=data)
plt.title('Distribution of Spam and Ham Messages')
plt.xlabel('Label')
plt.ylabel('Count')
plt.show()
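The countplot shows absolute counts. Expressing them as proportions makes any class imbalance explicit, which matters for the LSTM: on an imbalanced dataset, a model that always predicts the majority class already scores an accuracy equal to that class's share. This is a minimal sketch on toy labels; the `ham`/`spam` values are assumed from the plot title.

```python
import pandas as pd

# Toy labels standing in for data['label'] (assumed values: 'ham' / 'spam')
data = pd.DataFrame({'label': ['ham'] * 8 + ['spam'] * 2})

# Fraction of messages in each class
class_share = data['label'].value_counts(normalize=True)
print(class_share)

# Accuracy obtained by always predicting the most frequent class
majority_baseline = class_share.max()
print(f"Majority-class baseline accuracy: {majority_baseline:.2f}")
```

Any trained model should clearly beat this baseline; if the classes are very unbalanced, metrics such as precision and recall on the spam class are more informative than accuracy.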

In [None]:
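# Summary statistics for all columns; for text columns this reports
# count, unique, top (most frequent value) and freq (count of the top value)
data.describe(include='all')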
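For a text column, `describe` does not report message length, yet length is useful for an LSTM because inputs are usually padded or truncated to a fixed sequence length. The sketch below counts whitespace-separated words per message, compares the lengths by class, and takes the 95th percentile (rounded up) as a candidate maximum sequence length. The `message` column name, the toy rows, and the 95th-percentile heuristic are all assumptions for illustration.

```python
import math

import pandas as pd

# Toy rows; 'message' is a hypothetical name for the dataset's text column
data = pd.DataFrame({
    'label': ['ham', 'ham', 'spam', 'spam'],
    'message': ['ok see you', 'on my way home now',
                'WIN cash now claim your prize today', 'free entry text WIN to 8000'],
})

# Number of whitespace-separated tokens in each message
data['n_words'] = data['message'].str.split().str.len()

# Per-class length statistics (count, mean, std, min, quartiles, max)
length_stats = data.groupby('label')['n_words'].describe()
print(length_stats)

# Candidate max sequence length: 95th percentile of word counts, rounded up
max_len = math.ceil(data['n_words'].quantile(0.95))
print(max_len)
```

Choosing a percentile rather than the maximum keeps a few unusually long messages from inflating the padded length for every example.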

## Conclusion

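This notebook provided an overview of the spam dataset: its first rows, missing values, summary statistics, and the distribution of spam and ham messages. Before the LSTM model can be trained, the data still needs further preparation, typically cleaning and tokenizing the text and encoding the labels as numbers.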
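As a minimal, framework-agnostic sketch of preparing the data for an LSTM, the cell below maps the labels to 0/1 targets and converts each message into a fixed-length array of integer token ids. The `message` column, the helpers `build_vocab` and `encode_and_pad`, and the reserved ids (0 for padding, 1 for unknown tokens) are hypothetical choices for illustration; a real pipeline might use a library tokenizer instead.

```python
import numpy as np
import pandas as pd

# Toy rows; 'message' is a hypothetical name for the dataset's text column
data = pd.DataFrame({
    'label': ['ham', 'spam', 'ham'],
    'message': ['See you soon', 'WIN a free prize', 'see you at home tonight'],
})

# Binary targets: spam = 1, ham = 0
y = (data['label'] == 'spam').astype(int).to_numpy()

PAD, UNK = 0, 1  # reserved ids for padding and out-of-vocabulary tokens

def build_vocab(texts):
    """Hypothetical helper: assign an integer id to each lowercase token."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab

def encode_and_pad(texts, vocab, max_len):
    """Hypothetical helper: map tokens to ids, truncate, and pad at the end."""
    out = np.full((len(texts), max_len), PAD, dtype=np.int64)
    for i, text in enumerate(texts):
        ids = [vocab.get(tok, UNK) for tok in text.lower().split()][:max_len]
        out[i, :len(ids)] = ids
    return out

vocab = build_vocab(data['message'])
X = encode_and_pad(data['message'], vocab, max_len=4)
print(X.shape, y)
```

In practice, the vocabulary should be built from the training split only, so that words seen only in the test split map to the unknown id instead of leaking into the model's vocabulary.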