# IMDb Top 250 Series Exploratory Data Analysis
Some questions we are interested in answering:
- What are the top TV Series by their Mean Episode Rating?
  - And by their Median Episode Rating?
- What are the top TV Seasons by their Mean Episode Rating?
  - And by their Median Episode Rating?
- What are the most loved TV Series endings (highest-rated last episode)?
- What are the most loved TV Series openings (highest-rated first episode)?
- Which TV Series keep improving continuously (steepest positive slope in their ratings trend)?
- What is the global distribution of episode ratings?
  - What are the global mean and median?
  - Which range lies above that average, making an episode a "good episode"?
  - Which series has the highest number of "good episodes"?
- Which series have the largest gap between their Mean Episode Rating and their TV Series Rating?
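
The "keeps improving" question depends on how a ratings trend is turned into a single number. One plausible reading, sketched below, is the slope of a least-squares line fitted to a series' episode ratings in airing order. The `ratings` values are made up for illustration, and the choice of `numpy.polyfit` is an assumption, not necessarily the estimator used in this notebook.

```python
import numpy as np

# Hypothetical episode ratings for one series, in airing order
ratings = np.array([8.0, 8.2, 8.4, 8.6, 8.8])

# Episode position 0, 1, 2, ... serves as the x-axis
x = np.arange(len(ratings))

# Degree-1 fit returns coefficients highest power first: (slope, intercept)
slope, intercept = np.polyfit(x, ratings, deg=1)

print(round(slope, 3), round(intercept, 3))
```

For this toy series the fitted slope is about 0.2 rating points per episode. A positive slope means ratings tend to rise over the run, and a negative slope means they tend to fall.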

## Prepare Data

In [1]:
import pandas as pd
import plotly.express as px

In [2]:
global_ratings = pd.read_csv('../input/top-250-imdb-series-episode-ratings/imdb_top_250_series_global_ratings.csv')
global_ratings = global_ratings.drop('Code.1', axis=1)
global_ratings.head()

Unnamed: 0,Code,Title,Rating,Rating Count
0,tt5491994,Planet Earth II,9.4,142844
1,tt0903747,Breaking Bad,9.4,1817275
2,tt0795176,Planet Earth,9.4,208191
3,tt0185906,Band of Brothers,9.4,460468
4,tt7366338,Chernobyl,9.3,729461


In [3]:
episode_ratings = pd.read_csv(
    '../input/top-250-imdb-series-episode-ratings/imdb_top_250_series_episode_ratings.csv',
    index_col=0
)
episode_ratings.head()

Unnamed: 0,Season,Episode,Rating,Code,Title
0,1,1,9.4,tt5491994,Planet Earth II
1,1,2,9.1,tt5491994,Planet Earth II
2,1,3,8.9,tt5491994,Planet Earth II
3,1,4,8.8,tt5491994,Planet Earth II
4,1,5,8.6,tt5491994,Planet Earth II


In [4]:
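# Rows are assumed to be in IMDb rank order already; derive the length instead of hard-coding 250
global_ratings['IMDb Rank'] = range(1, len(global_ratings) + 1)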
global_ratings.head()

Unnamed: 0,Code,Title,Rating,Rating Count,IMDb Rank
0,tt5491994,Planet Earth II,9.4,142844,1
1,tt0903747,Breaking Bad,9.4,1817275,2
2,tt0795176,Planet Earth,9.4,208191,3
3,tt0185906,Band of Brothers,9.4,460468,4
4,tt7366338,Chernobyl,9.3,729461,5


In [5]:
data = episode_ratings[['Season', 'Episode', 'Rating', 'Code']].rename({'Rating': 'Episode Rating'}, axis=1).join(
    global_ratings.set_index('Code'), 
    on='Code'
).rename({'Rating': 'Global Series Rating'}, axis=1)

data.head()

Unnamed: 0,Season,Episode,Episode Rating,Code,Title,Global Series Rating,Rating Count,IMDb Rank
0,1,1,9.4,tt5491994,Planet Earth II,9.4,142844,1
1,1,2,9.1,tt5491994,Planet Earth II,9.4,142844,1
2,1,3,8.9,tt5491994,Planet Earth II,9.4,142844,1
3,1,4,8.8,tt5491994,Planet Earth II,9.4,142844,1
4,1,5,8.6,tt5491994,Planet Earth II,9.4,142844,1

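
The join above attaches series-level columns to every episode row through `Code`. The sketch below shows the same many-to-one pattern on toy frames, using `merge` with `validate='many_to_one'` so that pandas raises an error if a code appears more than once on the series side. The frames `episodes` and `series` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical episode-level table: several rows per series code
episodes = pd.DataFrame({
    'Code': ['tt1', 'tt1', 'tt2'],
    'Episode Rating': [9.4, 9.1, 8.7],
})

# Hypothetical series-level table: exactly one row per code
series = pd.DataFrame({
    'Code': ['tt1', 'tt2'],
    'Title': ['Show A', 'Show B'],
    'Global Series Rating': [9.4, 8.9],
})

# Left merge keeps every episode; validate checks 'Code' is unique in `series`
merged = episodes.merge(series, on='Code', how='left', validate='many_to_one')

# Episodes whose code had no match would show up with a missing Title
missing = int(merged['Title'].isna().sum())

print(merged)
print('episodes without a series match:', missing)
```

Counting missing titles after the merge is a cheap way to confirm that every episode found its series before any aggregation is run.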

In [6]:
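# Strip thousands separators; casting to str first also tolerates values already parsed as numbers
data['Rating Count'] = data['Rating Count'].astype(str).str.replace(',', '', regex=False).astype(int)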

## What are the top TV Series by their Mean Episode Rating?

In [7]:
def make_clickable(val):
    # target _blank to open new window
    return '<a target="_blank" href="{}">{}</a>'.format(val, val)

# Select the numeric columns before aggregating, then rank by mean episode rating
view = data.groupby(['Code', 'Title'])[
    ['Episode Rating', 'Global Series Rating', 'IMDb Rank', 'Rating Count']
].mean().sort_values(
    'Episode Rating',
    ascending=False
)

view = view.reset_index()

view['url'] = view['Code'].apply(lambda x: 'https://www.imdb.com/title/'+x)

view[['IMDb Rank', 'Rating Count']] = view[['IMDb Rank', 'Rating Count']].astype(int)
view = view.drop('Code', axis=1)
view = view.rename({'Episode Rating': 'Mean Episode Rating'}, axis=1)
view.head(15).style.format({'url': make_clickable})

Unnamed: 0,Title,Mean Episode Rating,Global Series Rating,IMDb Rank,Rating Count,url
0,Chernobyl,9.54,9.3,5,729461,https://www.imdb.com/title/tt7366338
1,Aspirants,9.28,8.6,123,295762,https://www.imdb.com/title/tt14392248
2,Attack on Titan,9.14023,8.9,26,360876,https://www.imdb.com/title/tt2560140
3,The Beatles: Get Back,9.133333,8.9,36,21612,https://www.imdb.com/title/tt9735318
4,The Last Dance,9.1,9.0,17,118827,https://www.imdb.com/title/tt8420184
5,Ramayan,9.065385,8.5,156,19898,https://www.imdb.com/title/tt0268093
6,Gullak,9.053333,8.6,99,16725,https://www.imdb.com/title/tt10530900
7,Arcane,9.044444,8.9,24,193782,https://www.imdb.com/title/tt11126994
8,Scam 1992: The Harshad Mehta Story,9.04,9.0,23,142151,https://www.imdb.com/title/tt12392504
9,Band of Brothers,9.04,9.4,4,460468,https://www.imdb.com/title/tt0185906


In [8]:
fig = px.bar(
    view.head(15).sort_values('Mean Episode Rating'),
    y='Title',
    x=['Global Series Rating', 'Mean Episode Rating'],
    barmode='group',
    hover_data=['IMDb Rank', 'Rating Count', 'url'],
    title='IMDb Rating vs Mean Episode Rating'
)
fig.update_xaxes(
    range=[8, 10],
    title='Rating'
)
fig.show()

## What are the top TV Series by their **Median** Episode Rating?
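
Ranking by median can differ from ranking by mean because the median is less sensitive to a single unusually low or high episode. A minimal sketch on made-up ratings (the `toy` frame and show names are hypothetical):

```python
import pandas as pd

# Hypothetical ratings: Show A has one poorly received episode, Show B is steady
toy = pd.DataFrame({
    'Title': ['Show A'] * 5 + ['Show B'] * 5,
    'Episode Rating': [9.0, 9.1, 9.0, 9.2, 5.0,
                       8.8, 8.9, 8.8, 8.9, 8.8],
})

# Compute both statistics per series in one pass
summary = toy.groupby('Title')['Episode Rating'].agg(['mean', 'median'])

print(summary)
print('top by mean:', summary['mean'].idxmax())
print('top by median:', summary['median'].idxmax())
```

In this toy data the single 5.0 episode drags Show A's mean below Show B's, while Show A still has the higher median.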

In [9]:
def make_clickable(val):
    # target _blank to open new window
    return '<a target="_blank" href="{}">{}</a>'.format(val, val)

# Select the numeric columns before aggregating, then rank by median episode rating
view = data.groupby(['Code', 'Title'])[
    ['Episode Rating', 'Global Series Rating', 'IMDb Rank', 'Rating Count']
].median().sort_values(
    'Episode Rating',
    ascending=False
)

view = view.reset_index()

view['url'] = view['Code'].apply(lambda x: 'https://www.imdb.com/title/'+x)

view[['IMDb Rank', 'Rating Count']] = view[['IMDb Rank', 'Rating Count']].astype(int)
view = view.drop('Code', axis=1)
view = view.rename({'Episode Rating': 'Median Episode Rating'}, axis=1)
view.head(15).style.format({'url': make_clickable})

Unnamed: 0,Title,Median Episode Rating,Global Series Rating,IMDb Rank,Rating Count,url
0,Chernobyl,9.5,9.3,5,729461,https://www.imdb.com/title/tt7366338
1,Aspirants,9.3,8.6,123,295762,https://www.imdb.com/title/tt14392248
2,Attack on Titan,9.2,8.9,26,360876,https://www.imdb.com/title/tt2560140
3,Arcane,9.2,8.9,24,193782,https://www.imdb.com/title/tt11126994
4,Ramayan,9.1,8.5,156,19898,https://www.imdb.com/title/tt0268093
5,The Last Dance,9.05,9.0,17,118827,https://www.imdb.com/title/tt8420184
6,Band of Brothers,9.05,9.4,4,460468,https://www.imdb.com/title/tt0185906
7,The Beatles: Get Back,9.0,8.9,36,21612,https://www.imdb.com/title/tt9735318
8,TVF Pitchers,9.0,8.8,50,67038,https://www.imdb.com/title/tt4742876
9,Cosmos: A Spacetime Odyssey,9.0,9.2,9,121155,https://www.imdb.com/title/tt2395695


In [10]:
fig = px.bar(
    view.head(15).sort_values('Median Episode Rating'),
    y='Title',
    x=['Global Series Rating', 'Median Episode Rating'],
    barmode='group',
    hover_data=['IMDb Rank', 'Rating Count', 'url'],
    title='IMDb Rating vs Median Episode Rating'
)
fig.update_xaxes(
    range=[8, 10],
    title='Rating'
)
fig.show()