# Top 100 Korean Dramas

top100_kdrama.csv contains the top 100 Korean dramas as rated by users of the MyDramaList website.  
The data was extracted with the BeautifulSoup library.  

**BeautifulSoup: the easiest library for extracting and processing specific data from the structure of web documents (used for web crawling)**

# Variable descriptions

**Name**: Korean drama name  
**Year of release**: Release year of the drama  
**Aired Date**: Air dates, given as (start) - (end)  
**Aired On**: Day(s) of the week the drama aired  
**Number of Episode**: Number of episodes  
**Network**: Network(s) the drama aired on  
**Duration**: Approximate length of one episode  
**Content Rating**: Content rating for the appropriate audience  
**Synopsis**: Short summary of the drama  
**Cast**: Actors and actresses in the drama  
**Genre**: Genres the drama is listed under  
**Tags**: Tags the drama is listed under  
**Rank**: Ranking on the website  
**Rating**: User rating on the website, out of ten  

# Importing libraries

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns 

import plotly.express as px
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objects as go
init_notebook_mode(connected=True)

import warnings
warnings.filterwarnings("ignore")

# Loading the data

In [2]:
kdrama = pd.read_csv('top100_kdrama.csv')

In [3]:
kdrama.head()

Unnamed: 0,Name,Year of release,Aired Date,Aired On,Number of Episode,Network,Duration,Content Rating,Synopsis,Cast,Genre,Tags,Rank,Rating
0,Move to Heaven,2021,"May 14, 2021",Friday,10,"Netflix, Netflix, Netflix, Netflix",52 min.,18+ Restricted (violence & profanity),Geu Roo is a young autistic man. He works for ...,"Lee Je Hoon, Tang Jun Sang, Hong Seung Hee, Ju...","Life, Drama, Family","Autism, Father-Son Relationship, Uncle-Nephew ...",#1,9.2
1,Hospital Playlist,2020,"Mar 12, 2020 - May 28, 2020",Thursday,12,"tvN, Netflix, Netflix, Netflix, Netflix",1 hr. 30 min.,15+ - Teens 15 or older,The stories of people going through their days...,"Jo Jung Suk, Yoo Yeon Seok, Jung Kyung Ho, Kim...","Friendship, Romance, Life, Medical","Strong Friendship, Doctor, Multiple Mains, Slo...",#2,9.1
2,Flower of Evil,2020,"Jul 29, 2020 - Sep 23, 2020","Wednesday, Thursday",16,tvN,1 hr. 10 min.,15+ - Teens 15 or older,Although Baek Hee Sung is hiding a dark secret...,"Lee Joon Gi, Moon Chae Won, Jang Hee Jin, Seo ...","Thriller, Romance, Crime, Melodrama","Married Couple, Deception, Suspense, Family Se...",#3,9.1
3,My Mister,2018,"Mar 21, 2018 - May 17, 2018","Wednesday, Thursday",16,tvN,1 hr. 17 min.,15+ - Teens 15 or older,Park Dong Hoon is a middle-aged engineer who i...,"Lee Sun Kyun, IU, Park Ho San, Song Sae Byuk, ...","Business, Psychological, Life, Drama, Family","Nice Male Lead, Strong Female Lead, Hardship, ...",#4,9.1
4,Prison Playbook,2017,"Nov 22, 2017 - Jan 18, 2018","Wednesday, Thursday",16,"tvN, Netflix, Netflix, Netflix, Netflix",1 hr. 32 min.,15+ - Teens 15 or older,"Kim Je Hyuk, a famous baseball player, is arre...","Park Hae Soo, Jung Kyung Ho, Krystal, Im Hwa Y...","Comedy, Life, Drama","Prison, Bromance, Wrongfully Accused, Life Les...",#5,9.1


In [4]:
kdrama.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Name               100 non-null    object 
 1   Year of release    100 non-null    int64  
 2   Aired Date         100 non-null    object 
 3   Aired On           100 non-null    object 
 4   Number of Episode  100 non-null    int64  
 5   Network            100 non-null    object 
 6   Duration           100 non-null    object 
 7   Content Rating     100 non-null    object 
 8   Synopsis           100 non-null    object 
 9   Cast               100 non-null    object 
 10  Genre              100 non-null    object 
 11  Tags               100 non-null    object 
 12  Rank               100 non-null    object 
 13  Rating             100 non-null    float64
dtypes: float64(1), int64(2), object(11)
memory usage: 11.1+ KB
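
`info()` reports `Rank` and `Aired Date` as `object` (string) columns, so they cannot be sorted or compared numerically as they stand. The sketch below is one possible way to parse them, assuming every value follows the formats visible in `head()` (`#1`, `May 14, 2021`, `Mar 12, 2020 - May 28, 2020`). The helper names `parse_rank` and `parse_aired_date` are hypothetical and not part of the original notebook, and `%b` relies on an English locale for month abbreviations.

```python
from datetime import datetime

def parse_rank(rank):
    """Turn a rank label such as '#12' into the integer 12."""
    return int(rank.lstrip('#'))

def parse_aired_date(aired):
    """Split 'Mar 12, 2020 - May 28, 2020' into (start, end) dates.

    A single date such as 'May 14, 2021' is treated as start == end.
    """
    parts = [p.strip() for p in aired.split(' - ')]
    dates = [datetime.strptime(p, '%b %d, %Y').date() for p in parts]
    return dates[0], dates[-1]

print(parse_rank('#5'))
print(parse_aired_date('Mar 12, 2020 - May 28, 2020'))
```

On the DataFrame these could be applied with something like `kdrama['Rank'].apply(parse_rank)`, provided no row deviates from the assumed formats.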


In [5]:
kdrama.isna().sum()

Name                 0
Year of release      0
Aired Date           0
Aired On             0
Number of Episode    0
Network              0
Duration             0
Content Rating       0
Synopsis             0
Cast                 0
Genre                0
Tags                 0
Rank                 0
Rating               0
dtype: int64

In [7]:
kdrama.columns

Index(['Name', 'Year of release', 'Aired Date', 'Aired On',
       'Number of Episode', 'Network', 'Duration', 'Content Rating',
       'Synopsis', 'Cast', 'Genre', 'Tags', 'Rank', 'Rating'],
      dtype='object')

# Released year and aired on

In [8]:
# Count dramas per release year once and reuse the result
year_counts = kdrama.groupby('Year of release').size().reset_index().rename(columns = {0:'Count'})

fig = px.bar(data_frame = year_counts,
              x = 'Year of release',
              y = 'Count')

# One colour per bar (instead of a hard-coded 12); highlight a few years by position from the end
colors = ['DarkSalmon'] * len(year_counts)
colors[-2] = 'DarkSeaGreen'
colors[-3], colors[-5] = 'MediumOrchid', 'MediumOrchid'

fig.update_traces(marker_color = colors)

fig.update_layout(title = {'text':'Korean Drama released by year',
                           'font_size':20},
                  font = dict(family = "Droid Sans Mono, monospace",
                              size = 15,
                              color = 'black'))

fig.show()


In [9]:
fig = px.bar(data_frame = kdrama.groupby(['Year of release','Aired On']).size().reset_index().rename(columns = {0:'Count'}),
             x = 'Year of release',
             y = 'Count',
             color = 'Aired On',
             barmode = 'stack',
             color_discrete_sequence=px.colors.qualitative.Pastel)

fig.update_layout(title = {'text':'Korean Drama released by Year and Aired On',
                           'y' : 0.95,
                           'x' : 0.45,
                           'xanchor' : 'center',
                           'yanchor' : 'top',
                           'font_family': 'Gravitas One, monospace',
                           'font_color' :'black',
                           'font_size': 20},
                  legend_title = 'Aired On (day of week)',
                  font = dict(family = 'Courier New, monospace',
                              size = 15,
                              color = 'midnightblue'
                  ))

fig.show()


# Number of episodes

In [10]:
# rename_axis + reset_index(name=...) yields the columns 'Num Ep' and 'Count' on both pandas 1.x and 2.x
num_episode = kdrama['Number of Episode'].value_counts().rename_axis('Num Ep').reset_index(name = 'Count')

# num_episode['Num Ep'] = num_episode['Num Ep'].apply(lambda s: f"#ep {s}")

fig = px.pie(data_frame = num_episode,
             values = 'Count', names = 'Num Ep',
             color_discrete_sequence = px.colors.qualitative.Safe)

fig.update_traces(textposition = 'inside',
                  textinfo = 'label+percent',
                  pull = [0.05] * num_episode['Num Ep'].nunique(),
                  insidetextorientation='horizontal')

fig.update_layout(title = 'Distribution of Number of Episodes among Top 100',
                  legend_title = 'Number of Episode',
                  uniformtext_minsize = 13,
                  uniformtext_mode = 'hide',
                  font = dict(family = 'Courier New, monospace',
                              size = 15,
                              color = 'black')
                  )

fig.show()

In [11]:
fig = px.bar(data_frame = num_episode,
             x = 'Num Ep', y = 'Count',
             title = 'Number of Episode Distribution')

fig.update_layout(xaxis_title = 'Number of Episode')

fig.update_xaxes(type='category')

fig.show()

# Network

In [12]:
kdrama['Network'].value_counts()

tvN                                                       18
SBS                                                       18
tvN,  Netflix,  Netflix,  Netflix,  Netflix               14
MBC                                                       10
KBS2                                                      10
jTBC                                                       9
OCN                                                        7
Netflix,  Netflix,  Netflix,  Netflix                      4
jTBC,  Netflix,  Netflix,  Netflix,  Netflix               3
jTBC,  Viki                                                2
Netflix                                                    1
OCN,  Netflix,  Netflix,  Netflix,  Netflix                1
SBS,  Netflix,  Netflix,  Netflix,  Netflix                1
Daum Kakao TV,  Netflix,  Netflix,  Netflix,  Netflix      1
KBS2,  Netflix,  Netflix,  Netflix,  Netflix               1
Name: Network, dtype: int64

In [13]:
# This is not the most efficient way to clean this feature, but with so few
# unique values it is probably the quickest and easiest to understand.
# Note the trailing space in each literal: the comparisons are exact matches.

def unique_network(networks):
    if networks == 'Netflix,  Netflix,  Netflix,  Netflix ':
        return 'Netflix'
    elif networks == 'tvN,  Netflix,  Netflix,  Netflix,  Netflix ':
        return 'Netflix, tvN'
    elif networks == 'OCN,  Netflix,  Netflix,  Netflix,  Netflix ':
        return 'Netflix, OCN'
    elif networks == 'jTBC,  Netflix,  Netflix,  Netflix,  Netflix ':
        return 'Netflix, jTBC'
    elif networks == 'KBS2,  Netflix,  Netflix,  Netflix,  Netflix ':
        return 'Netflix, KBS2'
    elif networks == 'SBS,  Netflix,  Netflix,  Netflix,  Netflix ':
        return 'Netflix, SBS'
    elif networks == 'Daum Kakao TV,  Netflix,  Netflix,  Netflix,  Netflix ':
        return 'Netflix, Daum Kakao TV'
    else:
        return networks
    
kdrama['Network'] = kdrama['Network'].apply(unique_network)
kdrama['Network'].value_counts()

tvN                       18
SBS                       18
Netflix, tvN              14
MBC                       10
KBS2                      10
jTBC                       9
OCN                        7
Netflix                    4
Netflix, jTBC              3
jTBC,  Viki                2
Netflix                    1
Netflix, KBS2              1
Netflix, OCN               1
Netflix, SBS               1
Netflix, Daum Kakao TV     1
Name: Network, dtype: int64

In [14]:
# Counting individual networks
from collections import Counter

network_list = []
for networks in kdrama['Network'].to_list():
    for network in networks.strip().split(", "):
        # Strip each name so leftover spaces do not create separate keys
        network_list.append(network.strip())
        
network_df = pd.DataFrame.from_dict(Counter(network_list),orient='index').rename(columns={0:'Count'})
network_df.sort_values(by='Count',ascending = False,inplace = True)
network_df

Unnamed: 0,Count
tvN,32
Netflix,26
SBS,19
jTBC,14
KBS2,11
MBC,10
OCN,8
Viki,2
Daum Kakao TV,1


In [15]:
fig = px.bar(data_frame = network_df,
             x = network_df.index,
             y = 'Count')

fig.update_layout(title = "Distribution of Korean Drama on different Networks",
                  xaxis_title = 'Network')

fig.show()

In [16]:
fig = px.pie(data_frame = network_df,
             values = 'Count',
             names = network_df.index,
             color_discrete_sequence = px.colors.qualitative.Prism)

fig.update_traces(textposition ='inside',
                  textinfo = 'label+percent',
                  pull = [0.05] * len(network_df.index.to_list()),
                  insidetextorientation='horizontal')

fig.update_layout(paper_bgcolor = 'white',
                  title = 'Network Distribution',
                  legend_title = 'Network',
                  uniformtext_minsize=18,
                  uniformtext_mode='hide',
                  font = dict(
                      family = "Courier New, monospace",
                      size = 18,
                      color = 'black'
                  ))

fig.show()
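
The `unique_network` mapping in this section lists every repeated string by hand. As an alternative, the duplicates can be collapsed generically by splitting on commas and keeping each name once. This is only a sketch: `normalize_networks` is a hypothetical helper, not part of the original notebook, and it keeps the first-seen order (`tvN, Netflix`) rather than the `Netflix, tvN` order used above. The per-network counts are unaffected by that ordering.

```python
def normalize_networks(raw):
    """Collapse repeated network names, keeping first-seen order."""
    seen = []
    for name in raw.split(','):
        name = name.strip()  # tolerate single, double, or trailing spaces
        if name and name not in seen:
            seen.append(name)
    return ', '.join(seen)

print(normalize_networks('tvN,  Netflix,  Netflix,  Netflix,  Netflix '))
print(normalize_networks('Netflix,  Netflix,  Netflix,  Netflix '))
```

Because the helper strips whitespace itself, it does not depend on the exact spacing of the raw strings.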


# Duration of each episode

In [17]:
# rename_axis + reset_index(name=...) yields the columns 'Duration' and 'Count' on both pandas 1.x and 2.x
duration_df = kdrama['Duration'].value_counts().rename_axis('Duration').reset_index(name = 'Count')

fig = px.bar(data_frame = duration_df.head(10),
             x = 'Duration',
             y = 'Count')

fig.update_layout(title = 'Kdrama Duration Distribution')

fig.show()
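
The chart above treats each `Duration` string as its own category. To compare episode lengths numerically, the strings can be converted to minutes first. The sketch below assumes the two formats seen in the data preview (`52 min.` and `1 hr. 30 min.`); `duration_to_minutes` is a hypothetical helper, not part of the original notebook.

```python
import re

def duration_to_minutes(duration):
    """Convert strings like '1 hr. 30 min.' or '52 min.' to total minutes."""
    hours = re.search(r'(\d+)\s*hr', duration)
    minutes = re.search(r'(\d+)\s*min', duration)
    total = 0
    if hours:
        total += int(hours.group(1)) * 60
    if minutes:
        total += int(minutes.group(1))
    return total

for text in ['52 min.', '1 hr. 30 min.', '1 hr. 10 min.']:
    print(text, '->', duration_to_minutes(text))
```

A string that matches neither pattern returns 0, so unexpected formats are worth checking for before plotting.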

# Content Rating

In [18]:
fig = px.bar(data_frame = kdrama['Content Rating'].value_counts().rename_axis('Con. Rate').reset_index(name = 'Count'),
             x = 'Con. Rate',
             y = 'Count')

fig.show()


In [19]:
# Content Rating by year
fig = px.bar(data_frame = kdrama.groupby(['Year of release','Content Rating']).size().reset_index().rename(columns = {0:'Count'}),
             x = 'Year of release',
             y = 'Count',
             color = 'Content Rating',
             barmode = 'stack',
             color_discrete_sequence=px.colors.qualitative.Pastel_r)

fig.update_layout(title = {'text':'Korean Drama released by Year and Content Rating',
                           'y' : 0.95,
                           'x' : 0.45,
                           'xanchor' : 'center',
                           'yanchor' : 'top',
                           'font_family': 'Gravitas One, monospace',
                           'font_color' :'black',
                           'font_size': 20},
                  legend_title = 'Content Rating',
                  font = dict(family = 'Courier New, monospace',
                              size = 15,
                              color = 'midnightblue'
                  ))

fig.show()

# Genres and Tags

In [20]:
# Individual Genre
kdrama['Genre'] = kdrama['Genre'].str.strip()

genre_list = list()
for genres in kdrama['Genre'].to_list():
    genres = genres.split(",  ")
    for gen in genres:
        genre_list.append(gen)
        
genre_df = pd.DataFrame.from_dict(Counter(genre_list), orient = 'index').rename(columns = {0:'Count'})
genre_df.sort_values(by='Count',ascending = False, inplace = True)
genre_df.head()

Unnamed: 0,Count
Drama,71
Romance,53
Comedy,42
Thriller,32
Mystery,30
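
The genre, tag, and cast cells all follow the same pattern: split a comma-separated string, then count the pieces with `Counter`. The sketch below wraps that pattern in one function and splits on a comma followed by any amount of whitespace, so it does not matter whether a column uses one space or two after the comma. `count_items` is a hypothetical helper and the three sample rows are made up in the style of the `Genre` column.

```python
import re
from collections import Counter

def count_items(rows):
    """Count individual items across comma-separated strings."""
    counts = Counter()
    for row in rows:
        # Split on a comma plus optional whitespace, then drop empty pieces
        items = [item.strip() for item in re.split(r',\s*', row.strip())]
        counts.update(item for item in items if item)
    return counts

sample = [
    'Life, Drama, Family',
    'Friendship,  Romance,  Life,  Medical ',
    'Comedy, Life, Drama',
]
counts = count_items(sample)
print(counts.most_common(2))
```

The resulting `Counter` can be passed to `pd.DataFrame.from_dict(..., orient='index')` exactly as in the cells above.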


In [21]:
fig = px.bar(data_frame = genre_df,
             x = genre_df.index,
             y = 'Count')

fig.update_layout(title = 'Genre Distribution',
                  xaxis_title = 'Genre')

fig.show()

In [22]:
# Individual tags
tags_list = list()

for tags in kdrama['Tags'].to_list():
    tags = tags.split(", ")
    for tag in tags:
        tags_list.append(tag)

tags_df = pd.DataFrame.from_dict(Counter(tags_list), orient = 'index').rename(columns = {0:'Count'})
tags_df.sort_values(by='Count',ascending = False, inplace = True)
tags_df.head()

Unnamed: 0,Count
Strong Female Lead,41
Smart Female Lead,23
Smart Male Lead,20
Murder,17
Bromance,17


In [23]:
fig = px.bar(data_frame = tags_df.head(10),
             x = tags_df.iloc[:10].index,
             y = 'Count')

fig.update_layout(title = 'Top 10 Tags Distribution',
                  xaxis_title = 'Tags')

fig.show()

# Drama rating

In [24]:
# Max and Min Rating in the Top 100 Korean Drama
kdrama['Rating'].max(), kdrama['Rating'].min()

(9.2, 8.5)

In [25]:
# Best Drama Rating
kdrama[kdrama['Rating'] == kdrama['Rating'].max()]

Unnamed: 0,Name,Year of release,Aired Date,Aired On,Number of Episode,Network,Duration,Content Rating,Synopsis,Cast,Genre,Tags,Rank,Rating
0,Move to Heaven,2021,"May 14, 2021",Friday,10,Netflix,52 min.,18+ Restricted (violence & profanity),Geu Roo is a young autistic man. He works for ...,"Lee Je Hoon, Tang Jun Sang, Hong Seung Hee, Ju...","Life, Drama, Family","Autism, Father-Son Relationship, Uncle-Nephew ...",#1,9.2


In [26]:
kdrama[kdrama['Rating'] == kdrama['Rating'].min()].head()

Unnamed: 0,Name,Year of release,Aired Date,Aired On,Number of Episode,Network,Duration,Content Rating,Synopsis,Cast,Genre,Tags,Rank,Rating
79,Missing,2020,"Aug 29, 2020 - Oct 11, 2020","Saturday, Sunday",12,OCN,1 hr. 10 min.,15+ - Teens 15 or older,"A village holds spirits of missing, deceased p...","Go Soo, Heo Joon Ho, Ahn So Hee, Ha Joon, Seo ...","Suspense, Thriller, Mystery, Crime, Fantasy","Con Artist, Ghost-seeing Male Lead, Spirit, Mi...",#80,8.5
80,Kairos,2020,"Oct 26, 2020 - Dec 22, 2020","Monday, Tuesday",16,MBC,1 hr. 10 min.,15+ - Teens 15 or older,Living a precarious life as a part-timer at a ...,"Shin Sung Rok, Lee Se Young, Ahn Bo Hyun, Nam ...","Thriller, Drama, Sci-Fi, Fantasy","Time Altering, Past And Present, Hardworking M...",#81,8.5
81,Once Again,2020,"Mar 28, 2020 - Sep 13, 2020","Saturday, Sunday",100,KBS2,35 min.,15+ - Teens 15 or older,"""Once Again"" is the story of the eventful Song...","Chun Ho Jin, Cha Hwa Yun, Lee Jung Eun, Oh Dae...","Business, Comedy, Romance, Life, Drama, F...","Divorce, Family Relationship, Multiple Couples...",#82,8.5
82,The World of the Married,2020,"Mar 27, 2020 - May 16, 2020","Friday, Saturday",16,jTBC,1 hr. 20 min.,18+ Restricted (violence & profanity),Everything seems perfect in the life of the su...,"Kim Hee Ae, Park Hae Joon, Han So Hee, Park Su...","Romance, Drama, Family, Melodrama","Infidelity, Extramarital Affair, Betrayal, Mak...",#83,8.5
83,Extracurricular,2020,"Apr 29, 2020",Wednesday,10,Netflix,60 min.,18+ Restricted (violence & profanity),"""Extracurricular” is centered around four high...","Kim Dong Hee, Park Joo Hyun, Jung Da Bin, Nam ...","Psychological, Crime, Life, School, Youth,...","Prostitution, Morally Ambiguous Lead, Illegal ...",#84,8.5


# Actors and actresses in the drama

In [27]:
cast_list = list()

for casts in kdrama['Cast'].to_list():
    casts = casts.split(", ")
    for a in casts:
        cast_list.append(a)
        
cast_df = pd.DataFrame.from_dict(Counter(cast_list),orient = 'index').rename(columns = {0:'Appearance'})
cast_df.sort_values(by='Appearance',ascending = False,inplace = True)
cast_df.head()

Unnamed: 0,Appearance
Song Joong Ki,5
Lee Joon Hyuk,5
Kim Ji Won,5
Jung Kyung Ho,4
Bae Doo Na,4


# Korean drama recommendation system

In [28]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

features = ['Duration','Synopsis','Cast','Genre','Tags']
kdrama['Number of Episode'] = kdrama['Number of Episode'].astype(str)

# kdrama['combined_features'] = kdrama['Duration'] + " " + kdrama['Synopsis'] + " " + kdrama['Cast'] + " " + kdrama['Genre'] + " " + kdrama['Tags'] + " " + kdrama['Number of Episode'] + " " + kdrama['Content Rating']

kdrama['combined_features'] = kdrama['Synopsis'] + " " + kdrama['Genre'] + " " + kdrama['Tags']

cv = CountVectorizer()
count_matrix = cv.fit_transform(kdrama['combined_features'])
cosine_sim = cosine_similarity(count_matrix)
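
To make the step above concrete, here is a plain-Python sketch of what a bag-of-words count plus cosine similarity computes. It is not scikit-learn's implementation: the tokenizer only approximates `CountVectorizer`'s defaults (lowercasing, tokens of two or more word characters), and the three short texts are invented for illustration.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count tokens of two or more word characters."""
    return Counter(re.findall(r'\b\w\w+\b', text.lower()))

def cosine(a, b):
    """Cosine similarity between two token-count dictionaries."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up feature strings standing in for 'combined_features'
medical_a = bag_of_words('doctor friendship hospital life')
medical_b = bag_of_words('hospital doctor romance life')
sports = bag_of_words('prison baseball comedy')

print(round(cosine(medical_a, medical_b), 3))  # shares 3 of 4 tokens
print(round(cosine(medical_a, sports), 3))     # shares no tokens
```

Texts that share more tokens score closer to 1, and texts with no tokens in common score 0, which is the ordering the recommender relies on.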

In [29]:
# Function for drama recommendation
def kdrama_recommendation(mov, sim_num = 5):

    user_choice = mov
    
    try:
        # First title containing the query (case-insensitive, literal match)
        ref_index = kdrama[kdrama['Name'].str.contains(user_choice, case = False, regex = False)].index[0]

        similar_movies = list(enumerate(cosine_sim[ref_index]))

        # Sort by similarity, skip the first entry (the drama itself) and keep sim_num results
        sorted_similar_movies = sorted(similar_movies, key = lambda x: x[1], reverse = True)[1:sim_num + 1]

        print('\nRecommended K Drama for [{}]'.format(user_choice))
        print('-'*(24 + len(user_choice)))

        for similar_movie_id, s_score in sorted_similar_movies:
            similar_movie_title = kdrama['Name'].iloc[similar_movie_id]
            print('{:40} -> {:.3f}'.format(similar_movie_title, s_score))

    except IndexError:
        print("\n[{}] is not in our database!".format(user_choice))
        print("We couldn't recommend anything...Sorry...")

In [30]:
# Search for dramas whose name contains the keyword (case-sensitive)
def kdrama_available(key):
    
    keyword = key
    
    print("Drama with keyword: [{}]".format(keyword))
    
    for i, mov in enumerate(kdrama[kdrama['Name'].str.contains(keyword, regex = False)]['Name'].to_list()):
        print("{}) {} ".format(i+1,mov))

In [31]:
kdrama_available('It')

Drama with keyword: [It]
1) It's Okay to Not Be Okay 
2) It's Okay, That's Love 
3) Itaewon Class 


In [32]:
kdrama_recommendation("It's Okay to Not Be Okay")


Recommended K Drama for [It's Okay to Not Be Okay]
------------------------------------------------
My Father is Strange                     -> 0.369
Vincenzo                                 -> 0.355
Search                                   -> 0.341
Mother                                   -> 0.337
I Hear Your Voice                        -> 0.331


In [33]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

tfdif_vector = TfidfVectorizer(stop_words = 'english')

tfidf_matrix = tfdif_vector.fit_transform(kdrama['Synopsis'])

sim_matrix = linear_kernel(tfidf_matrix, tfidf_matrix)

indicies = pd.Series(kdrama.index, index = kdrama['Name']).drop_duplicates()
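
`linear_kernel` is just a dot product, yet it can stand in for cosine similarity here because TF-IDF rows are scaled to unit length, so the denominator of the cosine formula is 1. The plain-Python sketch below illustrates this. It uses the smoothed weighting `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization, which is my understanding of `TfidfVectorizer`'s defaults, with a simple whitespace tokenizer and no stop-word removal, so the numbers will not match scikit-learn exactly. The three texts are made up.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one L2-normalized {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * b[t] for t, w in a.items() if t in b)

docs = [
    'ghost detective solves murder',
    'detective hunts murder suspect',
    'chef opens restaurant',
]
vectors = tfidf_vectors(docs)
print(round(dot(vectors[0], vectors[0]), 3))  # a unit vector dotted with itself
print(dot(vectors[0], vectors[1]) > dot(vectors[0], vectors[2]))
```

Since every vector has length 1, the dot product of a document with itself is 1 and the dot product between two documents is already their cosine similarity.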

In [34]:
def content_based_recommender(title, sim_matrix = sim_matrix):
    idx = indicies[title]
    
    # Pair every drama index with its similarity to the requested title
    sim_scores = list(enumerate(sim_matrix[idx]))
    
    sim_scores = sorted(sim_scores, key = lambda x : x[1], reverse = True)
    
    # Skip the first entry (the drama itself) and keep the next ten
    sim_scores = sim_scores[1:11]
    
    drama_score = [score for _, score in sim_scores]
    
    kdrama_indices = [i[0] for i in sim_scores]
    
    kdrama_name = kdrama['Name'].iloc[kdrama_indices]
    
    print('\nRecommended KDrama for [{}]'.format(title))
    print('-'*(24 + len(title)))
    for score,name in zip(drama_score,kdrama_name):
        print("{:30} -> {:.3f}".format(name,score))

In [35]:
content_based_recommender("It's Okay to Not Be Okay")


Recommended KDrama for [It's Okay to Not Be Okay]
------------------------------------------------
While You Were Sleeping        -> 0.140
Youth of May                   -> 0.067
Healer                         -> 0.064
Stranger 2                     -> 0.059
Children of Nobody             -> 0.049
The Fiery Priest               -> 0.042
Mr. Queen                      -> 0.041
Kill Me, Heal Me               -> 0.040
Dear My Friends                -> 0.038
Strangers from Hell            -> 0.036
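
Both recommenders above drop the first sorted entry on the assumption that it is the query drama itself. That holds as long as no other drama ties with it at the top score. A slightly more defensive variant filters the query out by index before sorting. This is a sketch with made-up scores, and `top_n_similar` is a hypothetical helper, not part of the original notebook.

```python
def top_n_similar(scores, ref_index, n = 5):
    """Return the n highest (index, score) pairs, excluding ref_index itself."""
    candidates = [(i, s) for i, s in enumerate(scores) if i != ref_index]
    # sorted() is stable, so tied scores keep their original index order
    ranked = sorted(candidates, key = lambda pair: pair[1], reverse = True)
    return ranked[:n]

# Index 2 is the query; index 0 happens to tie with it at 1.0
scores = [1.0, 0.2, 1.0, 0.5]
print(top_n_similar(scores, ref_index = 2, n = 2))
```

With slicing alone, the tie at index 0 could take the first position and the query would then appear in its own recommendations.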


In [36]:
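# NLTK resources for tokenizing, part-of-speech tagging and lemmatizing.
# Recent NLTK releases may additionally ask for 'punkt_tab' and
# 'averaged_perceptron_tagger_eng'; download those too if a LookupError names them.
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('wordnet')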

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\hyeri\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\hyeri\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger.zip.
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\hyeri\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\wordnet.zip.


True

In [37]:
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

from nltk.corpus import stopwords
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

VERB_CODES = {'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'}

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\hyeri\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.


In [38]:
def preprocess_sentences(text):
    text = text.lower()
    temp_sent = []
    words = nltk.word_tokenize(text)
    tags = nltk.pos_tag(words)
    for i, word in enumerate(words):
        # Lemmatize verbs as verbs; everything else uses the default (noun) setting
        if tags[i][1] in VERB_CODES:
            lemmatized = lemmatizer.lemmatize(word, 'v')
        else:
            lemmatized = lemmatizer.lemmatize(word)
        # Keep purely alphabetic tokens that are not stop words
        if lemmatized not in stop_words and lemmatized.isalpha():
            temp_sent.append(lemmatized)

    # The isalpha() filter has already dropped contraction pieces such as
    # "n't" or "'s", so no apostrophe expansion is needed afterwards.
    return ' '.join(temp_sent)

kdrama_copy = kdrama.copy()
kdrama_copy['synopsis_processed'] = kdrama_copy['Synopsis'].apply(preprocess_sentences)
kdrama_copy['synopsis_processed'].head()

0    geu roo young autistic man work father busines...
1    story people go day seemingly ordinary actuall...
2    although baek hee sung hide dark secret surrou...
3    park dong hoon engineer marry attorney kang yo...
4    kim je hyuk famous baseball player arrest use ...
Name: synopsis_processed, dtype: object

In [39]:
tfdifvec = TfidfVectorizer()
tfdif_drama_processed = tfdifvec.fit_transform(kdrama_copy['synopsis_processed'])

co_sin_drama = cosine_similarity(tfdif_drama_processed,tfdif_drama_processed)

In [40]:
# Storing the drama names, indexed by row position
indices = pd.Series(kdrama_copy['Name'])
  
def recommendations(title, cosine_sim = co_sin_drama):
    # Row position of the requested title (raises IndexError if it is absent)
    index = indices[indices == title].index[0]
    similarity_scores = pd.Series(cosine_sim[index]).sort_values(ascending = False)
    # Skip the first entry (the drama itself) and keep the next ten
    top_10_dramas = similarity_scores.iloc[1:11].index
    for i in top_10_dramas:
        print(kdrama_copy['Name'].iloc[i])

In [41]:
recommendations("It's Okay to Not Be Okay")

While You Were Sleeping
The Fiery Priest
Kill Me, Heal Me
Children of Nobody
Stranger 2
Youth of May
Healer
Dear My Friends
The World of the Married
My Love from the Star
