#### While cleaning the data, it occurred to me that it might be interesting to turn each movie genre into its own column and fill it with binary values (1 if the title belongs to that genre, 0 otherwise). This makes it possible to analyze user behavior based on watching history later on. I could do the same with director and cast, but decided against it because I could already hear my laptop fan whirring, begging me to stop this project. I promise I will replace you one day, my beloved Vaio 2013, but that day is not today.

In [11]:
import requests
import pandas as pd
import time
import datetime
import numpy as np
import matplotlib

In [12]:
df = pd.read_csv('netflix_titles_rev.csv')

In [13]:
df.head(5)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in.1,listed_in.2,listed_in.3,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020.0,PG-13,90 min,Documentaries,,,As her father nears the end of his life/ filmm...
1,s2,TV Show,Blood & Water,,Ama Qamata/ Khosi Ngema/ Gail Mabalane/ Thaban...,South Africa,2021-09-24,2021.0,TV-MA,2 Seasons,International TV Shows,TV Dramas,TV Mysteries,After crossing paths at a party/ a Cape Town t...
2,s3,TV Show,Ganglands,Julien Leclercq,Sami Bouajila/ Tracy Gotoas/ Samuel Jouy/ Nabi...,,2021-09-24,2021.0,TV-MA,1 Season,Crime TV Shows,International TV Shows,TV Action & Adventure,To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021.0,TV-MA,1 Season,Docuseries,Reality TV,,Feuds/ flirtations and toilet talk go down amo...
4,s5,TV Show,Kota Factory,,Mayur More/ Jitendra Kumar/ Ranjan Raj/ Alam K...,India,2021-09-24,2021.0,TV-MA,2 Seasons,International TV Shows,Romantic TV Shows,TV Comedies,In a city of coaching centers known to train I...


In [14]:
df[['listed_in.1', 'listed_in.2', 'listed_in.3']] = df[['listed_in.1', 'listed_in.2', 'listed_in.3']].fillna('NA')

In [15]:
df.head(5)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in.1,listed_in.2,listed_in.3,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020.0,PG-13,90 min,Documentaries,,,As her father nears the end of his life/ filmm...
1,s2,TV Show,Blood & Water,,Ama Qamata/ Khosi Ngema/ Gail Mabalane/ Thaban...,South Africa,2021-09-24,2021.0,TV-MA,2 Seasons,International TV Shows,TV Dramas,TV Mysteries,After crossing paths at a party/ a Cape Town t...
2,s3,TV Show,Ganglands,Julien Leclercq,Sami Bouajila/ Tracy Gotoas/ Samuel Jouy/ Nabi...,,2021-09-24,2021.0,TV-MA,1 Season,Crime TV Shows,International TV Shows,TV Action & Adventure,To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021.0,TV-MA,1 Season,Docuseries,Reality TV,,Feuds/ flirtations and toilet talk go down amo...
4,s5,TV Show,Kota Factory,,Mayur More/ Jitendra Kumar/ Ranjan Raj/ Alam K...,India,2021-09-24,2021.0,TV-MA,2 Seasons,International TV Shows,Romantic TV Shows,TV Comedies,In a city of coaching centers known to train I...


In [16]:
# select only the 'listed_in.1', 'listed_in.2', and 'listed_in.3' genre columns
df_genres = df.loc[:, ['listed_in.1', 'listed_in.2', 'listed_in.3']]

# get a list of unique genres across all three columns
unique_genres = pd.unique(df_genres.values.ravel())

# collect one dictionary of binary values per row
rows = []
# iterate over the genre columns only, so values in other columns can never match a genre
for index, row in df_genres.iterrows():
    # set each unique genre to 1 if this title lists it, otherwise 0
    rows.append({genre: int(genre in row.values) for genre in unique_genres})

# build the binary dataframe in one step
# (DataFrame.append was removed in pandas 2.0, and appending row by row is slow)
df_binary = pd.DataFrame(rows, columns=unique_genres, index=df.index)

# combine the original dataframe with the binary dataframe
df_combined = pd.concat([df, df_binary], axis=1)
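
The row-by-row loop above is easy to follow but slow on a full catalogue. As a rough sketch of a vectorized alternative, the same binary columns can be built with `pd.get_dummies`. The tiny dataframe here is a made-up stand-in for the Netflix data (only the column names match). The sketch also assumes missing genres are still NaN rather than the `'NA'` string, so no `NA` column is produced.

```python
import pandas as pd

# Hypothetical three-row stand-in for the Netflix data; only the column names match.
df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "listed_in.1": ["Documentaries", "International TV Shows", "Docuseries"],
    "listed_in.2": [None, "TV Dramas", "Reality TV"],
    "listed_in.3": [None, "TV Mysteries", None],
})
genre_cols = ["listed_in.1", "listed_in.2", "listed_in.3"]

# Stack the three genre columns into one long Series indexed by (row, column),
# one-hot encode it, then collapse back to one row per title with max().
stacked = df[genre_cols].stack()
df_binary = pd.get_dummies(stacked, dtype=int).groupby(level=0).max()

# A title with no genre at all could drop out of the stacked Series;
# reindexing restores it as a row of zeros.
df_binary = df_binary.reindex(df.index, fill_value=0)

# Attach the binary genre columns to the original dataframe.
df_combined = pd.concat([df, df_binary], axis=1)
print(df_combined)
```

Unlike the loop, `get_dummies` orders the genre columns alphabetically, so the column order will differ from the output shown here.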

In [17]:
df_combined.head(5)

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,...,Science & Nature TV,Teen TV Shows,Cult Movies,TV Shows,Faith & Spirituality,LGBTQ Movies,Stand-Up Comedy,Movies,Stand-Up Comedy & Talk Shows,Classic & Cult TV
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,2021-09-25,2020.0,PG-13,90 min,...,0,0,0,0,0,0,0,0,0,0
1,s2,TV Show,Blood & Water,,Ama Qamata/ Khosi Ngema/ Gail Mabalane/ Thaban...,South Africa,2021-09-24,2021.0,TV-MA,2 Seasons,...,0,0,0,0,0,0,0,0,0,0
2,s3,TV Show,Ganglands,Julien Leclercq,Sami Bouajila/ Tracy Gotoas/ Samuel Jouy/ Nabi...,,2021-09-24,2021.0,TV-MA,1 Season,...,0,0,0,0,0,0,0,0,0,0
3,s4,TV Show,Jailbirds New Orleans,,,,2021-09-24,2021.0,TV-MA,1 Season,...,0,0,0,0,0,0,0,0,0,0
4,s5,TV Show,Kota Factory,,Mayur More/ Jitendra Kumar/ Ranjan Raj/ Alam K...,India,2021-09-24,2021.0,TV-MA,2 Seasons,...,0,0,0,0,0,0,0,0,0,0


In [18]:
df_combined.to_csv('netflix_title_binary.csv', index=False)