# Exploratory Data Analysis for Recommendation System

This notebook performs exploratory data analysis (EDA) on the dataset behind the recommendation system. It covers data cleaning, feature exploration, and initial insights.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset
data = pd.read_csv('../data/dataset.csv')

# Display the first few rows of the dataset
data.head()

In [2]:
# Check for missing values
missing_values = data.isnull().sum()
missing_values[missing_values > 0]

In [3]:
# Visualize the distribution of user-item interactions
plt.figure(figsize=(10, 6))
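# Pass the column by keyword: in seaborn 0.12+ the first positional argument is `data`, not `x`
sns.countplot(x='item_id', data=data)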
plt.title('Distribution of User-Item Interactions')
plt.xlabel('Item ID')
plt.ylabel('Count')
plt.xticks(rotation=90)
plt.show()

In [4]:
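# Explore the correlation between numeric features (if applicable)
# Non-numeric columns are dropped first; corr() raises on them in pandas 2.0+
correlation_matrix = data.select_dtypes(include='number').corr()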
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()

## Initial Insights

- The dataset contains X users and Y items.
- The dataset has Z missing values that need to be addressed.
- The distribution of user-item interactions shows that certain items are more popular than others.
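
The placeholders X, Y, and Z above could be filled in with a few pandas calls. The following is a minimal sketch, not part of the original notebook: the helper `summarize_interactions`, the `user_id` and `rating` column names, and the toy data are assumptions for illustration (only `item_id` appears in the cells above).

```python
import pandas as pd


def summarize_interactions(df, user_col="user_id", item_col="item_id"):
    """Return headline counts for a user-item interaction table.

    `user_col` is an assumed column name; adjust it to match the dataset.
    """
    item_counts = df[item_col].value_counts()
    return {
        "n_users": df[user_col].nunique(),
        "n_items": df[item_col].nunique(),
        # Total number of missing cells across all columns
        "n_missing": int(df.isnull().sum().sum()),
        # Share of all interactions that belong to the single most popular item
        "top_item_share": float(item_counts.iloc[0] / item_counts.sum()),
    }


# Toy data for illustration only; pass the loaded `data` frame instead
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "item_id": ["a", "b", "a", "a", "c"],
    "rating": [5.0, None, 4.0, 3.0, None],
})
summary = summarize_interactions(toy)
print(summary)
```

A `top_item_share` well above `1 / n_items` is one rough sign that interactions are concentrated on a few popular items.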

Further analysis will be conducted to refine the features and prepare the data for modeling.