# 📓 **Exploratory Data Analysis (EDA) & Visualization**

✅ Objectives:

Visualize your viewing trends over time.  
Analyze genre preferences.  
Understand duration patterns and rating trends.  

## 📌**Initial Setup**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set style for visuals
sns.set_theme(style="darkgrid")

# Load enriched dataset
df = pd.read_csv('../data/processed/enriched_netflix_history.csv')

# Inspect data
df.head()
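
The cells below rely on several columns being present in the enriched file. As a minimal sketch (the column list is inferred from the code in this notebook, and `missing_columns` is a hypothetical helper, not part of pandas), a quick check right after loading can catch a mismatched file early:

```python
import pandas as pd

# Columns the later cells reference (inferred from this notebook's code)
REQUIRED_COLUMNS = [
    "Date Watched", "listed_in", "type", "duration", "rating", "release_year",
]

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `frame`."""
    return [col for col in required if col not in frame.columns]

# Toy frame standing in for the real CSV (hypothetical data)
toy = pd.DataFrame({
    "Date Watched": ["2023-01-05"],
    "type": ["Movie"],
    "rating": ["PG-13"],
})

missing = missing_columns(toy)
print(missing)
```

On the real data, `missing_columns(df)` should return an empty list before the analysis continues.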


## 📅**Analysis of Viewing Trends Over Time**

### 📊*Monthly Viewing Frequency*

In [None]:
# Convert dates to datetime (df was already loaded in the setup cell)
df['Date Watched'] = pd.to_datetime(df['Date Watched'])

# Count views by month ('M' = month-end; pandas >= 2.2 prefers the alias 'ME')
monthly_views = df.resample('M', on='Date Watched').size()

# Plot
plt.figure(figsize=(12,6))
monthly_views.plot(kind='bar', color='skyblue')

plt.title('Monthly Netflix Viewing Frequency')
plt.xlabel('Month')
plt.ylabel('Number of Views')
plt.tight_layout()
plt.show()
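
A bar plot over a datetime index tends to label every bar with a full timestamp, which can crowd the x-axis. The sketch below shows one possible alternative on toy data (the titles and dates are made up): it groups by calendar month using periods, fills months with no views with zero, and builds short `YYYY-MM` labels.

```python
import pandas as pd

# Toy viewing history (hypothetical titles and dates, for illustration only)
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Date Watched": pd.to_datetime(
        ["2023-01-05", "2023-01-20", "2023-03-02", "2023-03-15"]
    ),
})

# Count views per calendar month
monthly = toy["Date Watched"].dt.to_period("M").value_counts().sort_index()

# Add months with no views so gaps appear as zero-height bars
full_range = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
monthly = monthly.reindex(full_range, fill_value=0)

# Replace the period index with compact string labels
monthly.index = monthly.index.strftime("%Y-%m")
print(monthly)
```

The resulting series can be passed to `.plot(kind='bar')` in the same way as `monthly_views`.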


### *Trends by Genre*

In [None]:
# Split genres and count occurrences
genre_counts = df['listed_in'].str.split(', ', expand=True).stack().value_counts()

# Plot top 10 genres
genre_counts.head(10).plot(kind='barh', figsize=(10,6))
plt.title('Top 10 Genres Watched')
plt.xlabel('Number of Views')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()
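
Splitting on the exact string `', '` assumes every genre list is spaced consistently and has no missing values. As a hedged variant, shown here on made-up rows, splitting on the comma alone and stripping whitespace afterwards is more tolerant of irregular entries:

```python
import pandas as pd

# Toy genre column (hypothetical values, including a missing entry and stray spaces)
toy = pd.DataFrame({
    "listed_in": ["Dramas, Comedies", "Comedies", None, "Dramas,  Thrillers "],
})

# One row per genre: drop missing values, split on commas, trim whitespace
genres = toy["listed_in"].dropna().str.split(",").explode().str.strip()

# Count how often each genre appears
genre_counts = genres.value_counts()
print(genre_counts)
```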


## 📝**Duration & Rating Analysis**

### ⏱️*Duration Analysis*

In [None]:
# Clean duration (for Movies only)
movies = df[df['type'] == 'Movie'].copy()

# Convert durations such as '90 min' to numeric minutes (float, so missing values stay NaN)
movies['duration_minutes'] = movies['duration'].str.replace(' min', '', regex=False).astype(float)

# Visualize distribution
plt.figure(figsize=(10,5))
sns.histplot(movies['duration_minutes'], bins=20, kde=True)
plt.title('Distribution of Movie Durations Watched')
plt.xlabel('Duration (minutes)')
plt.ylabel('Count')
plt.show()
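
The conversion above raises an error if any movie's duration is not of the form `'<number> min'`. The following sketch, on toy data and assuming that durations look like `'90 min'` for movies, extracts the digits with a regular expression and coerces anything unparseable to `NaN` instead:

```python
import pandas as pd

# Toy rows (hypothetical); the TV show and the missing value are edge cases
toy = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie"],
    "duration": ["90 min", "125 min", "2 Seasons", None],
})

movies = toy[toy["type"] == "Movie"].copy()

# Pull out the number that precedes 'min'; rows without a match become NaN
minutes = movies["duration"].str.extract(r"(\d+)\s*min", expand=False)
movies["duration_minutes"] = pd.to_numeric(minutes, errors="coerce")

# Summary statistics to read alongside the histogram
summary = movies["duration_minutes"].describe()
print(summary)
```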


### ★★☆☆☆*Ratings Analysis*

In [None]:
# Rating distribution
rating_counts = df['rating'].value_counts()

# Plot ratings distribution
plt.figure(figsize=(10,6))
sns.barplot(x=rating_counts.index, y=rating_counts.values)
plt.xticks(rotation=45)
plt.title('Content Ratings Distribution')
plt.xlabel('Rating')
plt.ylabel('Count')
plt.show()
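
Raw counts can be hard to compare across datasets of different sizes. As a small sketch on made-up ratings (the `'Unknown'` label for missing values is an assumption chosen here for illustration), `value_counts(normalize=True)` gives each rating's share of all views:

```python
import pandas as pd

# Toy ratings column (hypothetical values, one missing)
toy = pd.DataFrame({"rating": ["TV-MA", "TV-MA", "PG-13", None]})

# Label missing ratings explicitly so they are counted, then convert to percentages
rating_share = (
    toy["rating"]
    .fillna("Unknown")
    .value_counts(normalize=True)
    .mul(100)
    .round(1)
)
print(rating_share)
```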


## 📈 **Time-based Watching Patterns**

### 📆**Days of the Week**

In [None]:
# Derive the weekday name for each viewing date
df['Day_of_Week'] = df['Date Watched'].dt.day_name()

# Count views per weekday, ordered Monday through Sunday
weekly_counts = df['Day_of_Week'].value_counts().reindex(
    ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
)

# Plot weekly viewing habits
plt.figure(figsize=(10,6))
sns.barplot(x=weekly_counts.index, y=weekly_counts.values)
plt.title("Netflix Watching Frequency by Day of Week")
plt.xlabel("Day of Week")
plt.ylabel("Number of Views")
plt.show()
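
When a weekday never appears in the history, `reindex` leaves a `NaN` for that day. A minimal sketch on toy dates (hypothetical data) shows how `fill_value=0` keeps every weekday in the result with an explicit zero:

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Toy dates: two Mondays and one Saturday (2023-01-01 was a Sunday)
toy = pd.DataFrame({
    "Date Watched": pd.to_datetime(["2023-01-02", "2023-01-09", "2023-01-07"]),
})

# Count per weekday in calendar order; days with no views become 0
weekly = (
    toy["Date Watched"]
    .dt.day_name()
    .value_counts()
    .reindex(WEEKDAYS, fill_value=0)
)
print(weekly)
```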


### 📅**Yearly Trends**

In [None]:
# Extract year from date
df['Year'] = df['Date Watched'].dt.year

yearly_views = df['Year'].value_counts().sort_index()

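# Plot yearly viewing frequency
yearly_views.plot(kind='line', marker='o', figsize=(8, 5))
plt.title('Netflix Viewing Trends by Year')
plt.xlabel('Year')
plt.ylabel('Number of Views')
plt.xticks(yearly_views.index)  # show whole years rather than fractional ticks
plt.tight_layout()
plt.show()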


## 🔍**Correlation Analysis**

In [None]:
# Compare each title's release year with the year it was watched
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x='release_year', y='Year')
plt.title('Relation between Content Release Year and Watching Year')
plt.xlabel('Content Release Year')
plt.ylabel('Year Watched')
plt.tight_layout()
plt.show()
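
A scatter plot shows the relationship visually but does not quantify it. The sketch below, on made-up years, computes a Pearson correlation coefficient and a derived `content_age` column (a hypothetical name introduced here for the gap between release and viewing). It is one possible way to put a number on the pattern, not the only one.

```python
import pandas as pd

# Toy data (hypothetical): release year of each title and the year it was watched
toy = pd.DataFrame({
    "release_year": [2015, 2018, 2020, 2021],
    "Year": [2019, 2020, 2021, 2022],
})

# How old each title was when it was watched
toy["content_age"] = toy["Year"] - toy["release_year"]
median_age = toy["content_age"].median()

# Pearson correlation between release year and viewing year
pearson_r = toy["release_year"].corr(toy["Year"])

print(median_age)
print(round(pearson_r, 3))
```

A coefficient near 1 would suggest that newer releases tend to be watched in later years, while the median age indicates how recent the watched content typically is.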


## 🎯**Insights & Summary**

"The most popular genre was XXX, followed by XXX."  
"The majority of content watched was rated TV-MA, showing a preference for XXX content."  
"A significant spike in viewing activity occurred during weekXXX and XXX months."