# Mini Project: Analyzing Netflix Watch History

## Goals:
1. Load and inspect the data
2. Clean the data
3. Derive some insights
4. Save the results to a CSV file

### Dataset:
I created my own small Netflix watch log for this project.

### Environment:
- WSL (Ubuntu distribution)
- Python
- JupyterLab
- Git


In [1]:
import pandas as pd

In [2]:
# Load the data

df = pd.read_csv("netflix_log.csv")
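
The CSV file isn't included with the notebook. As a hedged sketch, the five rows that `df.head()` shows can stand in for it through `io.StringIO`. Passing `parse_dates=["Date"]` to `pd.read_csv` makes pandas read the `Date` column as datetimes instead of plain text, which would allow date-based grouping later. Whether the real file needs any other options is an assumption.

```python
import io

import pandas as pd

# Stand-in for netflix_log.csv: only the five rows visible in df.head()
csv_text = """Title,Date,Duration (min),Genre
The Office,2024-03-01,22,Comedy
Breaking Bad,2024-03-02,47,Drama
The Office,2024-03-02,21,Comedy
Stranger Things,2024-03-03,50,Sci-Fi
The Crown,2024-03-04,58,Drama
"""

# parse_dates converts the Date column to datetime64 while reading
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(sample.dtypes)
```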

In [3]:
# Inspect the first five rows

df.head()

|   | Title | Date | Duration (min) | Genre |
|---|-------|------|----------------|-------|
| 0 | The Office | 2024-03-01 | 22 | Comedy |
| 1 | Breaking Bad | 2024-03-02 | 47 | Drama |
| 2 | The Office | 2024-03-02 | 21 | Comedy |
| 3 | Stranger Things | 2024-03-03 | 50 | Sci-Fi |
| 4 | The Crown | 2024-03-04 | 58 | Drama |
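
`head()` shows only the first five rows. To see how many rows the log has and how pandas typed each column, `shape` and `dtypes` are a common next step; `info()` prints the types together with non-null counts. A self-contained sketch, using the visible rows as stand-in data:

```python
import pandas as pd

# Stand-in data: the five rows visible in df.head() above
sample = pd.DataFrame({
    "Title": ["The Office", "Breaking Bad", "The Office", "Stranger Things", "The Crown"],
    "Date": ["2024-03-01", "2024-03-02", "2024-03-02", "2024-03-03", "2024-03-04"],
    "Duration (min)": [22, 47, 21, 50, 58],
    "Genre": ["Comedy", "Drama", "Comedy", "Sci-Fi", "Drama"],
})

print(sample.shape)   # (rows, columns) -> (5, 4)
print(sample.dtypes)  # Date is text here, not a datetime, unless parsed
sample.info()         # column types plus non-null counts
```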


In [4]:
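# Clean the data: check for missing values, then
# rename "Duration (min)" to "Duration_min" so the column is easier to reference
# (the counts print before the rename, so they still show the old name)

print(df.isnull().sum())
df = df.rename(columns={"Duration (min)": "Duration_min"})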

Title             0
Date              0
Duration (min)    0
Genre             0
dtype: int64
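
The log has no missing values, so nothing needs dropping here. If a longer log did have gaps, a cleaning step might look like the sketch below. The helper name `clean_log`, the choice to drop rows missing `Title` or `Duration_min`, and the removal of exact duplicates are all assumptions, not part of the original notebook; whether a repeated row is an error or a genuine rewatch is a judgment call. The helper returns a new DataFrame instead of using `inplace=True`, which keeps the step easy to chain and rerun.

```python
import numpy as np
import pandas as pd


def clean_log(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning helper: rename, drop incomplete rows, drop exact duplicates."""
    return (
        raw.rename(columns={"Duration (min)": "Duration_min"})
        # Rows without a title or duration can't be analyzed, so drop them (an assumption)
        .dropna(subset=["Title", "Duration_min"])
        # Remove rows that exactly repeat an earlier row
        .drop_duplicates()
        .reset_index(drop=True)
    )


# Small illustrative input with one missing duration and one duplicated row
raw = pd.DataFrame({
    "Title": ["The Office", "The Office", "Breaking Bad", "The Crown"],
    "Duration (min)": [22, 22, np.nan, 58],
    "Genre": ["Comedy", "Comedy", "Drama", "Drama"],
})

cleaned = clean_log(raw)
print(cleaned)
```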


### Basic Analysis
1. Which genre did the user watch the most?
2. How many minutes were spent watching *The Office*?
3. What was the average watch time per show?

In [6]:
df.head()

|   | Title | Date | Duration_min | Genre |
|---|-------|------|--------------|-------|
| 0 | The Office | 2024-03-01 | 22 | Comedy |
| 1 | Breaking Bad | 2024-03-02 | 47 | Drama |
| 2 | The Office | 2024-03-02 | 21 | Comedy |
| 3 | Stranger Things | 2024-03-03 | 50 | Sci-Fi |
| 4 | The Crown | 2024-03-04 | 58 | Drama |


In [13]:
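# 1. Number of log entries per genre
print(df["Genre"].value_counts())
# 2. Total minutes spent on The Office
print(df.loc[df["Title"] == "The Office", "Duration_min"].sum())
# 3. Average minutes per entry for each show
print(df.groupby("Title")["Duration_min"].mean())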

Genre
Comedy    3
Drama     2
Sci-Fi    1
Name: count, dtype: int64
68
Title
Breaking Bad       47.000000
Stranger Things    50.000000
The Crown          58.000000
The Office         22.666667
Name: Duration_min, dtype: float64
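
`value_counts()` counts log entries, not minutes, so "watched the most" depends on the measure. The sketch below compares both per genre. Its DataFrame rebuilds the log from the outputs above: `head()` shows five rows, and the counts and totals imply the sixth is a third *The Office* entry of 68 - 22 - 21 = 25 minutes (Comedy). `Date` is omitted because that row's date isn't shown and isn't needed here.

```python
import pandas as pd

# Log reconstructed from the outputs above (Date omitted; the 25-minute row is inferred)
log = pd.DataFrame({
    "Title": ["The Office", "Breaking Bad", "The Office",
              "Stranger Things", "The Crown", "The Office"],
    "Duration_min": [22, 47, 21, 50, 58, 25],
    "Genre": ["Comedy", "Drama", "Comedy", "Sci-Fi", "Drama", "Comedy"],
})

# Entries per genre vs. total minutes per genre
by_genre = log.groupby("Genre")["Duration_min"].agg(["count", "sum"])
print(by_genre.sort_values("sum", ascending=False))
```

By entry count Comedy leads, while by total minutes Drama does, because its two entries are much longer.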


### Summary

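- Comedy appears most often in the log (3 of 6 entries). Measured by total minutes, Drama leads: *Breaking Bad* (47) plus *The Crown* (58) gives 105 minutes, versus 68 for Comedy.
- The user watched *The Office* for 68 minutes in total, across 3 entries.
- *The Crown* has the highest average watch time (58 minutes), but it was watched only once, so this says nothing conclusive about popularity; the dataset is tiny and made up.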
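
One way to make the last point concrete is to show how many entries sit behind each average: for a title watched once, the "average" is just that single session. A sketch using the same stand-in log reconstructed from the outputs above (the 25-minute *The Office* entry is inferred from the totals, and `Date` is omitted):

```python
import pandas as pd

# Stand-in log reconstructed from the outputs above (Date omitted)
log = pd.DataFrame({
    "Title": ["The Office", "Breaking Bad", "The Office",
              "Stranger Things", "The Crown", "The Office"],
    "Duration_min": [22, 47, 21, 50, 58, 25],
})

# Number of entries behind each title's average watch time
per_title = log.groupby("Title")["Duration_min"].agg(["count", "mean"]).round(2)
print(per_title.sort_values("mean", ascending=False))
```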

In [14]:
# Save the average watch time per show to a CSV file

average_watch = df.groupby("Title")["Duration_min"].mean()
average_watch.to_csv("average_watch_time.csv")
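
`to_csv` writes the `Title` index as the first column of the file. When reading it back, `index_col=0` restores that column as the index; without it, pandas treats the column as ordinary data (and if the index had no name, it would come back as `Unnamed: 0`). A sketch of a round-trip check, using a temporary directory and the averages printed above as stand-in data:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Averages per title, matching the notebook output above
average_watch = pd.Series(
    [47.0, 50.0, 58.0, 68 / 3],
    index=pd.Index(["Breaking Bad", "Stranger Things", "The Crown", "The Office"], name="Title"),
    name="Duration_min",
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "average_watch_time.csv"
    average_watch.to_csv(path)

    # index_col=0 turns the Title column back into the index
    restored = pd.read_csv(path, index_col=0)["Duration_min"]

print(restored)
```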