# **Analyzing and Predicting K-Drama Trends based on Top 250 Performing Series**

In this data science project, we will analyze the top 250 performing K-drama series to identify trends and patterns in relation to their ratings. We will use various data analysis techniques to extract insights from the data, such as identifying the most popular genres, tags, casts, air times, networks, and content ratings, as well as the factors that contribute to a series' success. Based on this analysis, we will develop a predictive model to forecast the popularity of future K-drama series. The goal of this project is to provide valuable insights for the K-drama industry and help content creators and broadcasters to better understand and target their audience.

**The questions we seek to answer in the Exploratory Data Analysis portion are:**
1. Which networks garnered the highest and lowest average ratings?
2. Which production companies achieved the highest and lowest average ratings?
3. Which genres are the most popular? least popular?
4. Which actors and/or actresses rank high? rank low?
5. Which directors rank high? rank low?
6. Is there a correlation between episode duration and ratings? 
7. Is there a connection between episode number and ratings?

# **Setup**
Next cell imports all Python libraries needed for the project.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import numpy as np

# **Import all datasets**
Each import will be showing its first 5 rows of data as a preview.

In [2]:
kdrama_data = pd.read_csv("../data/kdrama.csv")
kdrama_data.head()

Unnamed: 0,Name,Aired Date,Year of release,Original Network,Aired On,Number of Episodes,Duration,Content Rating,Rating,Synopsis,Genre,Tags,Director,Screenwriter,Cast,Production companies,Rank
0,Move to Heaven,"May 14, 2021",2021,Netflix,Friday,10,52 min.,18+ Restricted (violence & profanity),9.2,Geu Roo is a young autistic man. He works for ...,"Life, Drama, Family","Autism, Uncle-Nephew Relationship, Death, Sava...",Kim Sung Ho,Yoon Ji Ryun,"Lee Je Hoon, Tang Jun Sang, Hong Seung Hee, Ju...","Page One Film, Number Three Pictures",#1
1,Flower of Evil,"Jul 29, 2020 - Sep 23, 2020",2020,tvN,"Wednesday, Thursday",16,1 hr. 10 min.,15+ - Teens 15 or older,9.1,Although Baek Hee Sung is hiding a dark secret...,"Thriller, Romance, Crime, Melodrama","Married Couple, Deception, Suspense, Family Se...","Kim Chul Gyu, Yoon Jong Ho",Yoo Jung Hee,"Lee Joon Gi, Moon Chae Won, Jang Hee Jin, Seo ...",Monster Union,#2
2,Hospital Playlist,"Mar 12, 2020 - May 28, 2020",2020,"Netflix, tvN",Thursday,12,1 hr. 30 min.,15+ - Teens 15 or older,9.1,The stories of people going through their days...,"Friendship, Romance, Life, Medical","Strong Friendship, Multiple Mains, Best Friend...",Shin Won Ho,Lee Woo Jung,"Jo Jung Suk, Yoo Yeon Seok, Jung Kyung Ho, Kim...","Egg Is Coming, CJ ENM",#3
3,Hospital Playlist 2,"Jun 17, 2021 - Sep 16, 2021",2021,"Netflix, tvN",Thursday,12,1 hr. 40 min.,15+ - Teens 15 or older,9.1,Everyday is extraordinary for five doctors and...,"Friendship, Romance, Life, Medical","Workplace, Strong Friendship, Best Friends, Mu...",Shin Won Ho,Lee Woo Jung,"Jo Jung Suk, Yoo Yeon Seok, Jung Kyung Ho, Kim...","Egg Is Coming, CJ ENM",#4
4,My Mister,"Mar 21, 2018 - May 17, 2018",2018,tvN,"Wednesday, Thursday",16,1 hr. 17 min.,15+ - Teens 15 or older,9.1,Park Dong Hoon is a middle-aged engineer who i...,"Psychological, Life, Drama, Family","Age Gap, Nice Male Lead, Strong Female Lead, H...","Kim Won Suk, Kim Sang Woo",Park Hae Young,"Lee Sun Kyun, IU, Park Ho San, Song Sae Byuk, ...",Chorokbaem Media,#5


# **Data Preprocessing**
Transforming data and checking for missing values in the dataset.

In [3]:
# Check if there are any missing values

kdrama_nan_count = kdrama_data.isna().sum()
kdrama_nan_count

Name                    0
Aired Date              0
Year of release         0
Original Network        0
Aired On                0
Number of Episodes      0
Duration                0
Content Rating          5
Rating                  0
Synopsis                0
Genre                   0
Tags                    0
Director                1
Screenwriter            1
Cast                    0
Production companies    2
Rank                    0
dtype: int64

In [4]:
kdrama_data[kdrama_data.isnull().any(axis=1)]

Unnamed: 0,Name,Aired Date,Year of release,Original Network,Aired On,Number of Episodes,Duration,Content Rating,Rating,Synopsis,Genre,Tags,Director,Screenwriter,Cast,Production companies,Rank
140,One Dollar Lawyer,"Sep 23, 2022 - Nov 11, 2022",2022,SBS,"Friday, Saturday",12,1 hr. 10 min.,,8.4,Cheon Ji Hun is a lawyer with unusual flair. H...,"Comedy, Law, Drama","Lawyer Male Lead, Former Prosecutor Male Lead,...",Kim Jae Hyun,"Choi Soo Jin, Choi Chang Hwan","Namkoong Min, Kim Ji Eun, Choi Dae Hoon, Lee D...",Studio S,#141
146,Duel,"Jun 3, 2017 - Jul 23, 2017",2017,OCN,"Saturday, Sunday",16,1 hr. 3 min.,,8.4,"Jang Deuk Cheon, a hardened detective cop whos...","Thriller, Mystery, Sci-Fi","Amnesia, Human Experimentation, Kidnapping, Tr...",Lee Jong Jae,Kim Yoon Joo,"Jung Jae Young, Kim Jung Eun, Yang Se Jong, Se...",Chorokbaem Media,#147
171,Player,"Sep 29, 2018 - Nov 11, 2018",2018,OCN,"Saturday, Sunday",14,1 hr. 5 min.,,8.4,A police redemption team consisting of a swind...,"Action, Thriller, Mystery, Comedy","Strong Female Lead, Corruption, Hidden Identit...",Go Jae Hyun,Shin Jae Hyung,"Song Seung Heon, Krystal Jung, Lee Si Eon, Tae...",iWill Media,#172
190,The Mysterious Class,"Nov 12, 2021 - Dec 31, 2021",2021,YouTube,Friday,8,20 min.,,8.3,"""There are 21 students in our class."" ""What ar...","Mystery, Horror, Youth, Supernatural","Student, Ghost, Investigation, High School, Sc...",Ha Han Me,Han Song-yi,"Choi Hyun Suk, Park Ji Hoon, Yoshi, Kim Jun Ky...",YG Entertainment,#191
192,"It's Okay, That's Friendship","Mar 5, 2021",2021,YouTube,Friday,1,20 min.,G - All Ages,8.3,The 12 members of TREASURE take on acting for ...,"Comedy, Life","Idol Male Lead, Multiple Mains, Transfer Stude...",,,"Choi Hyun Suk, Park Ji Hoon, Kim Jun Kyu, Yosh...",YG Entertainment,#193
218,Angry Mom,"Mar 18, 2015 - May 7, 2015",2015,MBC,"Wednesday, Thursday",16,1 hr. 10 min.,15+ - Teens 15 or older,8.3,"The protagonist, Jo Kang Ja was once legendary...","Comedy, Drama, Melodrama","Independent Female Lead, Mother-Daughter Relat...",Ashbun,Kim Ban Di,"Kim Hee Sun, Kim Yoo Jung, Ji Hyun Woo, Kim Ji...",,#219
230,Liar Game,"Oct 20, 2014 - Nov 25, 2014",2014,tvN,"Monday, Tuesday",12,1 hr. 3 min.,,8.3,Various contestants take part in a game show w...,"Thriller, Mystery, Psychological, Drama","Naive Female Lead, Debt, Game Show, Swindler M...",Kim Hong Seon,Ryu Yong Jae,"Lee Sang Yoon, Kim So Eun, Shin Sung Rok, Cha ...","Apollo Pictures, Fantagio",#231
240,Coffee Prince,"Jul 2, 2007 - Aug 28, 2007",2007,MBC,"Monday, Tuesday",17,60 min.,15+ - Teens 15 or older,8.3,Choi Han Gyul is the grandson of chairwoman Ba...,"Food, Comedy, Romance, Drama","Cross-Dressing, Hidden Identity, Boss-Employee...",Lee Yoon Jung,"Jang Hyun Joo, Lee Jung Ah","Gong Yoo, Yoon Eun Hye, Lee Sun Kyun, Chae Jun...",,#241


In [5]:
# Apply One Hot Encoding to Content Rating to transform it from categorical to a binary vector representation.

kdrama_data = pd.get_dummies(kdrama_data, prefix=["Content Rating"], columns=["Content Rating"])
kdrama_data.head()

Unnamed: 0,Name,Aired Date,Year of release,Original Network,Aired On,Number of Episodes,Duration,Rating,Synopsis,Genre,Tags,Director,Screenwriter,Cast,Production companies,Rank,Content Rating_13+ - Teens 13 or older,Content Rating_15+ - Teens 15 or older,Content Rating_18+ Restricted (violence & profanity),Content Rating_G - All Ages
0,Move to Heaven,"May 14, 2021",2021,Netflix,Friday,10,52 min.,9.2,Geu Roo is a young autistic man. He works for ...,"Life, Drama, Family","Autism, Uncle-Nephew Relationship, Death, Sava...",Kim Sung Ho,Yoon Ji Ryun,"Lee Je Hoon, Tang Jun Sang, Hong Seung Hee, Ju...","Page One Film, Number Three Pictures",#1,0,0,1,0
1,Flower of Evil,"Jul 29, 2020 - Sep 23, 2020",2020,tvN,"Wednesday, Thursday",16,1 hr. 10 min.,9.1,Although Baek Hee Sung is hiding a dark secret...,"Thriller, Romance, Crime, Melodrama","Married Couple, Deception, Suspense, Family Se...","Kim Chul Gyu, Yoon Jong Ho",Yoo Jung Hee,"Lee Joon Gi, Moon Chae Won, Jang Hee Jin, Seo ...",Monster Union,#2,0,1,0,0
2,Hospital Playlist,"Mar 12, 2020 - May 28, 2020",2020,"Netflix, tvN",Thursday,12,1 hr. 30 min.,9.1,The stories of people going through their days...,"Friendship, Romance, Life, Medical","Strong Friendship, Multiple Mains, Best Friend...",Shin Won Ho,Lee Woo Jung,"Jo Jung Suk, Yoo Yeon Seok, Jung Kyung Ho, Kim...","Egg Is Coming, CJ ENM",#3,0,1,0,0
3,Hospital Playlist 2,"Jun 17, 2021 - Sep 16, 2021",2021,"Netflix, tvN",Thursday,12,1 hr. 40 min.,9.1,Everyday is extraordinary for five doctors and...,"Friendship, Romance, Life, Medical","Workplace, Strong Friendship, Best Friends, Mu...",Shin Won Ho,Lee Woo Jung,"Jo Jung Suk, Yoo Yeon Seok, Jung Kyung Ho, Kim...","Egg Is Coming, CJ ENM",#4,0,1,0,0
4,My Mister,"Mar 21, 2018 - May 17, 2018",2018,tvN,"Wednesday, Thursday",16,1 hr. 17 min.,9.1,Park Dong Hoon is a middle-aged engineer who i...,"Psychological, Life, Drama, Family","Age Gap, Nice Male Lead, Strong Female Lead, H...","Kim Won Suk, Kim Sang Woo",Park Hae Young,"Lee Sun Kyun, IU, Park Ho San, Song Sae Byuk, ...",Chorokbaem Media,#5,0,1,0,0


In [6]:
# Apply One Hot Encoding to genres to transform it from categorical to a binary vector representation.

# TODO: Remove trailing whitespaces from list values

kdrama_data["Genre"] = kdrama_data["Genre"].apply(lambda value: value.split(","))
kdrama_data = pd.get_dummies(kdrama_data["Genre"].apply(pd.Series) \
                                                 .stack()) \
                                                 .sum(level=0)
kdrama_data.head()

  kdrama_data = pd.get_dummies(kdrama_data["Genre"].apply(pd.Series) \


Unnamed: 0,Business,Comedy,Crime,Crime.1,Drama,Drama.1,Family,Family.1,Fantasy,Fantasy.1,...,Historical,Horror,Law,Life,Military,Music,Mystery,Psychological,Romance,Thriller
0,0,0,0,0,1,0,0,1,0,0,...,0,0,0,1,0,0,0,0,0,0
1,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,1,0,0,1,0,0,...,0,0,0,0,0,0,0,1,0,0


In [7]:
# Rename columns to a better format which is without spaces

kdrama_data.columns = kdrama_data.columns.str.replace(' ', '_')
kdrama_data.columns

Index(['__Business', '__Comedy', '__Crime', '__Crime_', '__Drama', '__Drama_',
       '__Family', '__Family_', '__Fantasy', '__Fantasy_', '__Historical',
       '__Horror', '__Law', '__Life', '__Life_', '__Medical', '__Medical_',
       '__Melodrama', '__Melodrama_', '__Military', '__Mystery',
       '__Political_', '__Psychological', '__Romance', '__Romance_',
       '__School', '__Sci-Fi', '__Sci-Fi_', '__Sports', '__Sports_',
       '__Supernatural', '__Supernatural_', '__Thriller', '__Youth',
       '__Youth_', '_Comedy', '_Crime', '_Drama', '_Drama\r', '_Drama_',
       '_Fantasy', '_Fantasy_', '_Historical', '_Horror', '_Law', '_Life',
       '_Medical', '_Melodrama', '_Melodrama_', '_Military', '_Mystery',
       '_Political', '_Psychological', '_Romance', '_Sci-Fi', '_Sitcom',
       '_Sports', '_Supernatural', '_Thriller', '_Youth', 'Action',
       'Adventure', 'Business', 'Comedy', 'Drama', 'Food', 'Friendship',
       'Historical', 'Horror', 'Law', 'Life', 'Military', 'Musi

In [8]:
# Rename Content Rating dummies

kdrama_data.rename(columns = {'Content_Rating_13+_-_Teens_13_or_older':'Content_Rating_13+', 
                            'Content_Rating_15+_-_Teens_15_or_older':'Content_Rating_15+',
                            'Content_Rating_18+_Restricted_(violence_&_profanity)':'Content_Rating_18+',
                            'Content_Rating_G_-_All_Ages': 'Content_Rating_G'}, inplace = True)
kdrama_data.columns

Index(['__Business', '__Comedy', '__Crime', '__Crime_', '__Drama', '__Drama_',
       '__Family', '__Family_', '__Fantasy', '__Fantasy_', '__Historical',
       '__Horror', '__Law', '__Life', '__Life_', '__Medical', '__Medical_',
       '__Melodrama', '__Melodrama_', '__Military', '__Mystery',
       '__Political_', '__Psychological', '__Romance', '__Romance_',
       '__School', '__Sci-Fi', '__Sci-Fi_', '__Sports', '__Sports_',
       '__Supernatural', '__Supernatural_', '__Thriller', '__Youth',
       '__Youth_', '_Comedy', '_Crime', '_Drama', '_Drama\r', '_Drama_',
       '_Fantasy', '_Fantasy_', '_Historical', '_Horror', '_Law', '_Life',
       '_Medical', '_Melodrama', '_Melodrama_', '_Military', '_Mystery',
       '_Political', '_Psychological', '_Romance', '_Sci-Fi', '_Sitcom',
       '_Sports', '_Supernatural', '_Thriller', '_Youth', 'Action',
       'Adventure', 'Business', 'Comedy', 'Drama', 'Food', 'Friendship',
       'Historical', 'Horror', 'Law', 'Life', 'Military', 'Musi

In [9]:
# Transform duration to minutes 

kdrama_data["Duration"] = kdrama_data["Duration"].str.replace('hr.', 'h')
kdrama_data["Duration"] = kdrama_data["Duration"].str.replace('min.', 'm')
kdrama_data["Duration"] = kdrama_data["Duration"].str.replace(' ', '')
kdrama_data["Duration_mins"] = pd.to_timedelta(kdrama_data["Duration"])
kdrama_data["Duration_mins"] = kdrama_data["Duration_mins"].dt.total_seconds().div(60)
kdrama_data.head()

KeyError: 'Duration'

In [None]:
# Drop columns that are irrelevant to analysis and model training

kdrama_data = kdrama_data.drop(['Aired_Date', 'Duration', 'Synopsis'], axis=1)
kdrama_data.head()

Unnamed: 0,Name,Year_of_release,Original_Network,Aired_On,Number_of_Episodes,Rating,Genre,Tags,Director,Screenwriter,Cast,Production_companies,Rank,Content_Rating_13+,Content_Rating_15+,Content_Rating_18+,Content_Rating_G,Duration_mins
0,Move to Heaven,2021,Netflix,Friday,10,9.2,"Life, Drama, Family","Autism, Uncle-Nephew Relationship, Death, Sava...",Kim Sung Ho,Yoon Ji Ryun,"Lee Je Hoon, Tang Jun Sang, Hong Seung Hee, Ju...","Page One Film, Number Three Pictures",#1,0,0,1,0,52.0
1,Flower of Evil,2020,tvN,"Wednesday, Thursday",16,9.1,"Thriller, Romance, Crime, Melodrama","Married Couple, Deception, Suspense, Family Se...","Kim Chul Gyu, Yoon Jong Ho",Yoo Jung Hee,"Lee Joon Gi, Moon Chae Won, Jang Hee Jin, Seo ...",Monster Union,#2,0,1,0,0,70.0
2,Hospital Playlist,2020,"Netflix, tvN",Thursday,12,9.1,"Friendship, Romance, Life, Medical","Strong Friendship, Multiple Mains, Best Friend...",Shin Won Ho,Lee Woo Jung,"Jo Jung Suk, Yoo Yeon Seok, Jung Kyung Ho, Kim...","Egg Is Coming, CJ ENM",#3,0,1,0,0,90.0
3,Hospital Playlist 2,2021,"Netflix, tvN",Thursday,12,9.1,"Friendship, Romance, Life, Medical","Workplace, Strong Friendship, Best Friends, Mu...",Shin Won Ho,Lee Woo Jung,"Jo Jung Suk, Yoo Yeon Seok, Jung Kyung Ho, Kim...","Egg Is Coming, CJ ENM",#4,0,1,0,0,100.0
4,My Mister,2018,tvN,"Wednesday, Thursday",16,9.1,"Psychological, Life, Drama, Family","Age Gap, Nice Male Lead, Strong Female Lead, H...","Kim Won Suk, Kim Sang Woo",Park Hae Young,"Lee Sun Kyun, IU, Park Ho San, Song Sae Byuk, ...",Chorokbaem Media,#5,0,1,0,0,77.0
