---

*I’ve spent countless hours watching Netflix... so you don’t have to.*

## **Why Do We Keep Clicking "Next Episode"?**

**What makes a show so addictive that we just can’t stop watching?**  
For engineers and content strategists, this question drives ongoing research into viewer behavior and engagement design.

From gripping cliffhangers to smart pacing, some shows seem built to pull us in—until it’s 2 AM and we’re whispering: **just one more episode**.

**What keeps us coming back, episode after episode, even when we know we should stop?**

This project explores that question using data from over **8,800 Netflix titles**, uncovering the patterns behind binge-worthy content.

### What’s inside the notebook:
- What makes a show easy to binge—like pacing, episode length, and release format  
- A simple machine learning model that predicts a show’s “Binge Score”  
- Which genres use cliffhangers most effectively  
- Tips for creators and platforms to design content that keeps viewers engaged  
- How binge habits vary across different countries

---


===============================
# 🔹 IMPORT LIBRARIES
===============================

In [1]:
## SUPPRESS WARNINGS FOR CLEANER OUTPUT
import warnings
warnings.filterwarnings('ignore')

## CORE LIBRARIES
import numpy as np  # for numerical operations
import pandas as pd  # for data handling and analysis

## VISUALIZATION LIBRARIES
import matplotlib.pyplot as plt #for plots
import seaborn as sns #for statistical visualisation

## INTERACTIVE PLOTTING WITH PLOTLY
import plotly.express as px #high level interface for interactive plots
import plotly.graph_objects as go #low level interface for custom visualization
from plotly.subplots import make_subplots #for subplots layouts
import plotly.offline as pyo #offline mode for plotly in notebook
pyo.init_notebook_mode(connected=True) #initialize plotly for jupyter notebook

## NLP AND TEXT PREPROCESSING LIBRARIES
import re #regular expressions for text cleaning
import nltk #natural language toolkit
from textblob import TextBlob #for basic sentiment and text analysis
from wordcloud import WordCloud #for generating word clouds

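# Reinstall scikit-learn cleanly. --force-reinstall also reinstalls its dependencies
# (numpy, scipy), so a kernel restart may be needed for already-imported packages.
!pip install --upgrade --force-reinstall scikit-learn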

## FEATURE ENGINEERING AND PREPROCESSING
from sklearn.preprocessing import LabelEncoder, StandardScaler #for scaling and label encoding
from sklearn.feature_extraction.text import TfidfVectorizer #for text vectorization
from sklearn.decomposition import PCA  #dimensionality reduction
from sklearn.cluster import KMeans #for clustering

## MACHINE LEARNING MODELS
from sklearn.ensemble import RandomForestRegressor #Regression model

## MODEL TRAINING AND EVALUATION
from sklearn.model_selection import train_test_split #for splitting data
from sklearn.metrics import mean_squared_error, r2_score #for model evaluation

## DOWNLOAD REQUIRED NLTK DATASETS
nltk.download('stopwords', quiet=True) #stopword list
nltk.download('vader_lexicon', quiet=True) #lexicon for sentiment analysis
nltk.download('punkt', quiet=True) #tokenizer
from nltk.sentiment import SentimentIntensityAnalyzer #for VADER sentiment analyzer

## VISUALIZATION SETTINGS FOR CONSISTENT STYLING
sns.set_style('whitegrid') #white background with gray grid lines
plt.rcParams['figure.figsize'] = (8, 6) #default figure size
plt.rcParams['legend.fontsize'] = 12 #fontsize of legend
plt.rcParams['figure.titlesize'] = 16 #fontsize of figure title
plt.rcParams['figure.titleweight'] = 'bold' #bold font weight for figure title
plt.rcParams['font.family'] = 'sans-serif' #default font family
plt.rcParams['text.color'] = 'black' #default text color
plt.rcParams['axes.labelcolor'] = 'black' #default x and y label color
pd.set_option('display.max_columns', None) #display all columns in dataframe
pd.set_option('display.max_rows', None) #display all rows in dataframe

# Netflix-inspired brand colors (refined for visual clarity)
NETFLIX_PRIMARY   = '#E50914'  # Signature red
NETFLIX_DARK      = '#1A1A1A'  # Deep black for backgrounds
NETFLIX_LIGHT     = '#F4F4F4'  # Soft white for contrast
NETFLIX_ACCENT    = '#B20710'  # Rich red accent
NETFLIX_MUTED     = '#4A4A4A'  # Neutral gray for labels and borders

# Custom color palette for visualizations
netflix_palette = [
    NETFLIX_PRIMARY,
    NETFLIX_ACCENT,
    NETFLIX_DARK,
    NETFLIX_MUTED,
    '#8B0000'  # Optional deep red for variation
]

# Apply palette to Seaborn
sns.set_palette(netflix_palette)

print("Libraries loaded Successfully")


Libraries loaded Successfully


===============================
# LOADING AND EXPLORING DATASET
===============================

In [2]:
# Loading Netflix dataset from CSV
df = pd.read_csv('/netflix_titles.csv')

# Get the number of rows and columns
n_rows, n_cols = df.shape

print(f'The dataset has {n_rows} rows and {n_cols} columns.')

# Count the movies and TV shows in the dataset
movies = df[df['type'] == 'Movie'].shape[0]
tv_shows = df[df['type'] == 'TV Show'].shape[0]

print(f'The dataset has {movies} movies and {tv_shows} TV shows.')

The dataset has 8807 rows and 12 columns.
The dataset has 6131 movies and 2676 TV shows.
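
The same counts can be collected in one pass. The sketch below is a minimal illustration on a tiny stand-in DataFrame, not the real file. `summarize_types` is a hypothetical helper name; only the `type` column and its `Movie` / `TV Show` values come from the cell above.

```python
import pandas as pd


def summarize_types(frame: pd.DataFrame) -> dict:
    """Return row and column counts plus a tally per content type (hypothetical helper)."""
    n_rows, n_cols = frame.shape
    # value_counts() tallies each distinct value in the 'type' column
    type_counts = frame['type'].value_counts().to_dict()
    return {'rows': n_rows, 'columns': n_cols, 'type_counts': type_counts}


# Tiny stand-in data; the notebook itself reads netflix_titles.csv instead.
sample = pd.DataFrame({
    'type': ['Movie', 'TV Show', 'Movie'],
    'title': ['A', 'B', 'C'],
})

summary = summarize_types(sample)
print(summary)
```

Calling `summarize_types(df)` on the loaded dataset should reproduce the row, column, movie, and TV show counts printed above.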


# =========================
# **THE BINGE HYPOTHESIS**
# =========================

**Before we begin analyzing the data, let’s define what typically makes a show “bingeable”:**

1. **Cliffhanger Structure**  
   - Episodes often end with *unresolved tension* or open questions, *prompting viewers to continue watching*.

2. **Emotional Feedback Loop**  
   - Fast-paced storytelling and frequent *emotional payoffs* create a sense of momentum and reward.

3. **Cultural Relevance**  
   - Shows that are *widely discussed* or *trending* tend to attract viewers who want to stay in the loop.

4. **Cognitive Ease**  
   - Content that’s easy to follow and *doesn’t require deep concentration* is more likely to be consumed in long stretches.

5. **Viewer Commitment Effect**  
   - The more time a viewer invests in a series, the harder it becomes to disengage—creating a *self-reinforcing cycle*.

**Now, let’s examine whether the data supports these patterns.**
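
Testing a hypothesis against a catalog means mapping it to a measurable proxy first. As one hedged illustration (not necessarily the approach this notebook takes), the "Viewer Commitment Effect" could be approximated by a show's season count. The sketch assumes a `duration` column holding text such as `3 Seasons` for TV shows; that column name and format are assumptions, and `parse_seasons` is a hypothetical helper.

```python
import re
from typing import Optional

import pandas as pd


def parse_seasons(duration: object) -> Optional[int]:
    """Extract the season count from text like '3 Seasons'; return None otherwise."""
    if not isinstance(duration, str):
        return None
    match = re.fullmatch(r'\s*(\d+)\s+Seasons?\s*', duration)
    return int(match.group(1)) if match else None


# Stand-in rows; the 'duration' column and its format are assumptions.
sample = pd.DataFrame({
    'type': ['TV Show', 'TV Show', 'Movie'],
    'duration': ['1 Season', '4 Seasons', '90 min'],
})

# Movies carry a runtime rather than a season count, so they map to a missing value.
sample['seasons'] = sample['duration'].map(parse_seasons)
print(sample)
```

Season count only measures how much content exists, not how much of it people watched, so it is a rough stand-in at best.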

---
