## Scraping for Streamlit App (continued)

In this notebook, I scrape additional information that I found I needed after testing a prototype of the Streamlit app.

### Streamlit App Prototype
I created a prototype app for the recommender system.  
The user searches for and inputs any number of their favorite manga titles via a multiselect box.  
These selected manga titles are then given the maximum rating of 10.  
Recommendations are generated through my machine learning model and displayed to the user.

### Genres
After trying out the Streamlit app prototype, I found that although the genres are shown on each title's MyAnimeList (MAL) page, it would be better to display them in the app's results as well.  
Having the genres in my data also lets the user filter the recommendations by their preferred genres.  
For example, even if the user inputs multiple Action titles, they can filter the recommendations to show only Mystery titles.  
Lastly, having the genres lets me add a filter that controls whether adult recommendations are shown.
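
The filtering idea can be sketched as follows. This is a minimal illustration, assuming the genres end up as one 0/1 column per genre (the layout built later in this notebook) and that 'Hentai' and 'Erotica' are the adult genres. The helper name `filter_recommendations` and the toy data are hypothetical and not part of the app.

```python
import pandas as pd

# Assumption: these two genres are the ones treated as adult content.
ADULT_GENRES = ['Hentai', 'Erotica']

def filter_recommendations(recs, wanted_genres=None, show_adult=False):
    """Hypothetical helper: keep rows with any wanted genre, hide adult rows unless requested."""
    out = recs
    if not show_adult:
        # drop rows flagged with at least one adult genre
        out = out[out[ADULT_GENRES].sum(axis=1) == 0]
    if wanted_genres:
        # keep rows flagged with at least one of the wanted genres
        out = out[out[wanted_genres].sum(axis=1) > 0]
    return out

# toy recommendations in the one-hot layout (values are made up)
recs = pd.DataFrame({
    'title':   ['A', 'B', 'C'],
    'Action':  [1, 0, 1],
    'Mystery': [0, 1, 1],
    'Hentai':  [0, 0, 1],
    'Erotica': [0, 0, 0],
})

mystery_only = filter_recommendations(recs, wanted_genres=['Mystery'])
print(mystery_only['title'].tolist())
```

Keeping the adult filter separate from the genre filter means the two choices stay independent: a user can ask for Mystery titles with or without adult content.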

### Alternate Titles
Another improvement suggested by the Streamlit app prototype is to include alternate titles for each manga.  
Manga are often known by names other than their English title, e.g. the Japanese title or the romanized title.  
Including these will help users quickly find and input the titles they want.

In [6]:
import requests
import pandas as pd
from tqdm import tqdm
from jikanpy import Jikan
import time

In [7]:
all_info_df = pd.read_csv('../data/all_info_final_2.csv')

In [8]:
all_info_df = all_info_df[['search title','result title','mal_id','url','image','synopsis']]

In [9]:
#the data likely contains duplicates, as I only have about 2.9K titles
all_info_df.shape

(3062, 6)

In [10]:
all_info_df = all_info_df.drop_duplicates(subset = 'result title')
all_info_df.shape

(2906, 6)

In [11]:
#take a copy so that new columns can be added without triggering a SettingWithCopyWarning
extra_features_df = all_info_df[['mal_id']].copy()
extra_features_df.shape

(2906, 1)

In [12]:
#create features in the data frame to store new information
features = ['title_english', 'title_japanese', 'title_synonyms',
            'genres_1', 'genres_2', 'genres_3', 'genres_4', 'genres_5',
            'explicit_genres_1', 'explicit_genres_2', 'explicit_genres_3', 'explicit_genres_4', 'explicit_genres_5',
            'demographics_1', 'demographics_2', 'demographics_3',
            'themes_1', 'themes_2', 'themes_3']
for feature in features:
    extra_features_df[feature] = 'nil'
extra_features_df.shape

(2906, 20)

In [106]:
for mal_id in tqdm(extra_features_df['mal_id'].unique().tolist()):
    try:
        res = requests.get(f'https://api.jikan.moe/v4/manga/{mal_id}')
        data = res.json()['data']

        #rows belonging to this manga
        mask = extra_features_df['mal_id'] == mal_id

        extra_features_df.loc[mask, 'title_english'] = data['title_english']
        extra_features_df.loc[mask, 'title_japanese'] = data['title_japanese']

        #store the synonyms as one string, each name prefixed with '/'; stays 'nil' if there are none
        if data['title_synonyms']:
            extra_features_df.loc[mask, 'title_synonyms'] = ''.join('/' + name for name in data['title_synonyms'])

        #one column per entry (genres_1, genres_2, ...); columns beyond the predefined ones are created as needed
        for field in ['genres', 'explicit_genres', 'demographics', 'themes']:
            for i, entry in enumerate(data[field], start=1):
                extra_features_df.loc[mask, f'{field}_{i}'] = entry['name']

        #save progress after every title
        extra_features_df.to_csv('extra_features_df.csv', index = False)

    except Exception:
        #skip titles whose request or parsing failed; their columns keep the default 'nil'
        continue

100%|██████████████████████████████████████████████████████████████████████████████| 2906/2906 [52:42<00:00,  1.09s/it]
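
The loop above skips any title whose request fails, so those rows keep 'nil' in every new column (rows such as Solo Leveling further down look like this, which would be consistent with a skipped request). A possible refinement, sketched here under assumptions, is to retry a failed request a few times with a growing pause. The function name `fetch_manga` and its `get`, `retries` and `pause` parameters are hypothetical, and the retry policy is only an illustration rather than anything prescribed by the Jikan API.

```python
import time
import requests

def fetch_manga(mal_id, get=requests.get, retries=3, pause=1.0):
    """Return the 'data' dict for one manga, or None if every attempt fails.

    `get`, `retries` and `pause` are illustrative parameters, not part of the notebook.
    """
    url = f'https://api.jikan.moe/v4/manga/{mal_id}'
    for attempt in range(retries):
        try:
            res = get(url, timeout=10)
            if res.status_code == 200:
                return res.json()['data']
            if res.status_code == 404:
                return None  # an unknown id will not start existing on a retry
        except requests.RequestException:
            pass  # network problem: fall through to the pause and try again
        time.sleep(pause * (attempt + 1))  # wait a little longer after each failure
    return None

# usage sketch (performs a real request, so it is left commented out):
# data = fetch_manga(3)
# if data is None:
#     print('could not fetch this title; consider retrying it later')
```

Returning `None` instead of swallowing the error lets the caller collect the failed ids and rerun only those, rather than discovering the gaps later in the data.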


In [13]:
extra_features_df = pd.read_csv('../data/extra_features_df.csv')
extra_features_df.head()

Unnamed: 0,mal_id,title_english,title_japanese,title_synonyms,genres_1,genres_2,genres_3,genres_4,genres_5,explicit_genres_1,...,demographics_1,demographics_2,demographics_3,themes_1,themes_2,themes_3,genres_6,genres_7,genres_8,themes_4
0,3,20th Century Boys,20世紀少年,/20 Seiki Shounen/Nijuu Seiki Shounen/Nijuusse...,Drama,Mystery,Sci-Fi,nil,nil,nil,...,Seinen,nil,nil,Historical,Psychological,nil,,,,
1,1101,,アクメツ,nil,Action,Drama,Suspense,nil,nil,nil,...,Shounen,nil,nil,Police,Psychological,nil,,,,
2,141583,,アヤシモン,nil,Action,Supernatural,nil,nil,nil,nil,...,Shounen,nil,nil,nil,nil,nil,,,,
3,135496,Dandadan,ダンダダン,nil,Action,Comedy,Sci-Fi,Supernatural,nil,nil,...,Shounen,nil,nil,nil,nil,nil,,,,
4,112318,Hell's Paradise: Jigokuraku,地獄楽,/Paradition/Heavenhell,Action,nil,nil,nil,nil,nil,...,Shounen,nil,nil,Historical,nil,nil,,,,


In [14]:
#check for and fill null values
extra_features_df['title_english'].isnull().sum()

1027

In [15]:
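#assign the result back instead of calling fillna(inplace=True) on a column selection
extra_features_df['title_english'] = extra_features_df['title_english'].fillna('nil')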

In [16]:
extra_features_df['title_english'].isnull().sum()

0

In [17]:
extra_features_df['title_japanese'].isnull().sum()

8

In [18]:
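#assign the result back instead of calling fillna(inplace=True) on a column selection
extra_features_df['title_japanese'] = extra_features_df['title_japanese'].fillna('nil')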

In [19]:
extra_features_df['title_japanese'].isnull().sum()

0

In [20]:
extra_features_df['title_synonyms'].isnull().sum()

0
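
The three title columns are checked and filled one at a time above. As a compact alternative, the same check-and-fill can be done on all of them at once. The sketch below uses a small made-up frame standing in for `extra_features_df`; only the column names come from this notebook.

```python
import pandas as pd

# toy frame standing in for extra_features_df (values are made up)
df = pd.DataFrame({
    'mal_id': [1, 2, 3],
    'title_english': ['Dandadan', None, None],
    'title_japanese': ['ダンダダン', 'アクメツ', None],
    'title_synonyms': ['nil', 'nil', 'nil'],
})

title_cols = ['title_english', 'title_japanese', 'title_synonyms']

# null count per column before filling
print(df[title_cols].isnull().sum())

# fill all title columns in one assignment
df[title_cols] = df[title_cols].fillna('nil')

# total number of nulls left in the title columns
print(df[title_cols].isnull().sum().sum())
```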

In [21]:
extra_features_df.to_csv('../data/extra_features_df_clean.csv', index = False)

In [23]:
extra_features_df = pd.read_csv('../data/extra_features_df_clean.csv')
extra_features_df.head()

Unnamed: 0,mal_id,title_english,title_japanese,title_synonyms,genres_1,genres_2,genres_3,genres_4,genres_5,explicit_genres_1,...,demographics_1,demographics_2,demographics_3,themes_1,themes_2,themes_3,genres_6,genres_7,genres_8,themes_4
0,3,20th Century Boys,20世紀少年,/20 Seiki Shounen/Nijuu Seiki Shounen/Nijuusse...,Drama,Mystery,Sci-Fi,nil,nil,nil,...,Seinen,nil,nil,Historical,Psychological,nil,,,,
1,1101,nil,アクメツ,nil,Action,Drama,Suspense,nil,nil,nil,...,Shounen,nil,nil,Police,Psychological,nil,,,,
2,141583,nil,アヤシモン,nil,Action,Supernatural,nil,nil,nil,nil,...,Shounen,nil,nil,nil,nil,nil,,,,
3,135496,Dandadan,ダンダダン,nil,Action,Comedy,Sci-Fi,Supernatural,nil,nil,...,Shounen,nil,nil,nil,nil,nil,,,,
4,112318,Hell's Paradise: Jigokuraku,地獄楽,/Paradition/Heavenhell,Action,nil,nil,nil,nil,nil,...,Shounen,nil,nil,Historical,nil,nil,,,,


In [24]:
all_info_df.columns.unique()

Index(['search title', 'result title', 'mal_id', 'url', 'image', 'synopsis'], dtype='object')

In [25]:
#merge new information with existing information
combined_all_info_df = pd.merge(left = all_info_df, right = extra_features_df, how = 'right', on = 'mal_id')
combined_all_info_df.head()

Unnamed: 0,search title,result title,mal_id,url,image,synopsis,title_english,title_japanese,title_synonyms,genres_1,...,demographics_1,demographics_2,demographics_3,themes_1,themes_2,themes_3,genres_6,genres_7,genres_8,themes_4
0,20th Century Boys,20th Century Boys,3,https://myanimelist.net/manga/3/20th_Century_Boys,https://cdn.myanimelist.net/images/manga/1/544...,"As the 20th century approaches its end, people...",20th Century Boys,20世紀少年,/20 Seiki Shounen/Nijuu Seiki Shounen/Nijuusse...,Drama,...,Seinen,nil,nil,Historical,Psychological,nil,,,,
1,Akumetsu,Akumetsu,1101,https://myanimelist.net/manga/1101/Akumetsu,https://cdn.myanimelist.net/images/manga/2/183...,"Due to an economic downturn plaguing Japan, Sh...",nil,アクメツ,nil,Action,...,Shounen,nil,nil,Police,Psychological,nil,,,,
2,Ayashimon,Ayashimon,141583,https://myanimelist.net/manga/141583/Ayashimon,https://cdn.myanimelist.net/images/manga/1/256...,"Years ago, the death of the infamous Enma Synd...",nil,アヤシモン,nil,Action,...,Shounen,nil,nil,nil,nil,nil,,,,
3,Dandadan,Dandadan,135496,https://myanimelist.net/manga/135496/Dandadan,https://cdn.myanimelist.net/images/manga/2/248...,"After being aggressively rejected, Momo Ayase ...",Dandadan,ダンダダン,nil,Action,...,Shounen,nil,nil,nil,nil,nil,,,,
4,Jigokuraku,Jigokuraku,112318,https://myanimelist.net/manga/112318/Jigokuraku,https://cdn.myanimelist.net/images/manga/2/208...,"Gabimaru the Hollow, a ninja of Iwagakure Vill...",Hell's Paradise: Jigokuraku,地獄楽,/Paradition/Heavenhell,Action,...,Shounen,nil,nil,Historical,nil,nil,,,,


In [26]:
combined_all_info_df.shape

(2906, 29)
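
Comparing the shape before and after the merge is one way to confirm that no rows were duplicated or lost. pandas can also check this during the merge itself: `validate='one_to_one'` raises an error if `mal_id` is repeated on either side, and `indicator=True` adds a `_merge` column showing where each row came from. The frames below are small made-up stand-ins, so this is a sketch of the check rather than a rerun of the cell above.

```python
import pandas as pd

# toy stand-ins for all_info_df and extra_features_df
left = pd.DataFrame({'mal_id': [3, 1101, 656],
                     'title': ['20th Century Boys', 'Akumetsu', 'Vagabond']})
right = pd.DataFrame({'mal_id': [3, 1101, 656],
                      'genres_1': ['Drama', 'Action', 'Action']})

# validate raises pandas.errors.MergeError if mal_id is not unique on both sides
merged = pd.merge(left, right, how='right', on='mal_id',
                  validate='one_to_one', indicator=True)

# rows found on only one side would show up here
unmatched = merged[merged['_merge'] != 'both']
print(len(merged), len(unmatched))
```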

In [27]:
#check that the 'result title's are exactly the same as what I searched via 'search title'
combined_all_info_df[combined_all_info_df['search title'] != combined_all_info_df['result title']]

Unnamed: 0,search title,result title,mal_id,url,image,synopsis,title_english,title_japanese,title_synonyms,genres_1,...,demographics_1,demographics_2,demographics_3,themes_1,themes_2,themes_3,genres_6,genres_7,genres_8,themes_4


In [28]:
#as 'search title's are the same as 'result title's, drop 'search title'
combined_all_info_df.drop(columns = 'search title', inplace = True)
combined_all_info_df.head()

Unnamed: 0,result title,mal_id,url,image,synopsis,title_english,title_japanese,title_synonyms,genres_1,genres_2,...,demographics_1,demographics_2,demographics_3,themes_1,themes_2,themes_3,genres_6,genres_7,genres_8,themes_4
0,20th Century Boys,3,https://myanimelist.net/manga/3/20th_Century_Boys,https://cdn.myanimelist.net/images/manga/1/544...,"As the 20th century approaches its end, people...",20th Century Boys,20世紀少年,/20 Seiki Shounen/Nijuu Seiki Shounen/Nijuusse...,Drama,Mystery,...,Seinen,nil,nil,Historical,Psychological,nil,,,,
1,Akumetsu,1101,https://myanimelist.net/manga/1101/Akumetsu,https://cdn.myanimelist.net/images/manga/2/183...,"Due to an economic downturn plaguing Japan, Sh...",nil,アクメツ,nil,Action,Drama,...,Shounen,nil,nil,Police,Psychological,nil,,,,
2,Ayashimon,141583,https://myanimelist.net/manga/141583/Ayashimon,https://cdn.myanimelist.net/images/manga/1/256...,"Years ago, the death of the infamous Enma Synd...",nil,アヤシモン,nil,Action,Supernatural,...,Shounen,nil,nil,nil,nil,nil,,,,
3,Dandadan,135496,https://myanimelist.net/manga/135496/Dandadan,https://cdn.myanimelist.net/images/manga/2/248...,"After being aggressively rejected, Momo Ayase ...",Dandadan,ダンダダン,nil,Action,Comedy,...,Shounen,nil,nil,nil,nil,nil,,,,
4,Jigokuraku,112318,https://myanimelist.net/manga/112318/Jigokuraku,https://cdn.myanimelist.net/images/manga/2/208...,"Gabimaru the Hollow, a ninja of Iwagakure Vill...",Hell's Paradise: Jigokuraku,地獄楽,/Paradition/Heavenhell,Action,nil,...,Shounen,nil,nil,Historical,nil,nil,,,,


In [29]:
#rename 'result title' to just 'title'
combined_all_info_df.rename(columns = {'result title':'title'}, inplace = True)
combined_all_info_df.head()

Unnamed: 0,title,mal_id,url,image,synopsis,title_english,title_japanese,title_synonyms,genres_1,genres_2,...,demographics_1,demographics_2,demographics_3,themes_1,themes_2,themes_3,genres_6,genres_7,genres_8,themes_4
0,20th Century Boys,3,https://myanimelist.net/manga/3/20th_Century_Boys,https://cdn.myanimelist.net/images/manga/1/544...,"As the 20th century approaches its end, people...",20th Century Boys,20世紀少年,/20 Seiki Shounen/Nijuu Seiki Shounen/Nijuusse...,Drama,Mystery,...,Seinen,nil,nil,Historical,Psychological,nil,,,,
1,Akumetsu,1101,https://myanimelist.net/manga/1101/Akumetsu,https://cdn.myanimelist.net/images/manga/2/183...,"Due to an economic downturn plaguing Japan, Sh...",nil,アクメツ,nil,Action,Drama,...,Shounen,nil,nil,Police,Psychological,nil,,,,
2,Ayashimon,141583,https://myanimelist.net/manga/141583/Ayashimon,https://cdn.myanimelist.net/images/manga/1/256...,"Years ago, the death of the infamous Enma Synd...",nil,アヤシモン,nil,Action,Supernatural,...,Shounen,nil,nil,nil,nil,nil,,,,
3,Dandadan,135496,https://myanimelist.net/manga/135496/Dandadan,https://cdn.myanimelist.net/images/manga/2/248...,"After being aggressively rejected, Momo Ayase ...",Dandadan,ダンダダン,nil,Action,Comedy,...,Shounen,nil,nil,nil,nil,nil,,,,
4,Jigokuraku,112318,https://myanimelist.net/manga/112318/Jigokuraku,https://cdn.myanimelist.net/images/manga/2/208...,"Gabimaru the Hollow, a ninja of Iwagakure Vill...",Hell's Paradise: Jigokuraku,地獄楽,/Paradition/Heavenhell,Action,nil,...,Shounen,nil,nil,Historical,nil,nil,,,,


In [30]:
combined_all_info_df.columns

Index(['title', 'mal_id', 'url', 'image', 'synopsis', 'title_english',
       'title_japanese', 'title_synonyms', 'genres_1', 'genres_2', 'genres_3',
       'genres_4', 'genres_5', 'explicit_genres_1', 'explicit_genres_2',
       'explicit_genres_3', 'explicit_genres_4', 'explicit_genres_5',
       'demographics_1', 'demographics_2', 'demographics_3', 'themes_1',
       'themes_2', 'themes_3', 'genres_6', 'genres_7', 'genres_8', 'themes_4'],
      dtype='object')

In [52]:
def combined_title(row):
    #start with the main title, then append each alternate title separated by '/'
    combined = str(row['title'])

    #add the English title only if it exists and differs from the main title
    if row['title_english'] != 'nil' and row['title_english'] != row['title']:
        combined += '/' + str(row['title_english'])

    if row['title_japanese'] != 'nil':
        combined += '/' + str(row['title_japanese'])

    #title_synonyms already starts with '/', so it is appended as-is
    if row['title_synonyms'] != 'nil':
        combined += str(row['title_synonyms'])

    return combined

In [53]:
combined_all_info_df['combined_title'] = combined_all_info_df.apply(combined_title, axis=1)
combined_all_info_df.head(10)

Unnamed: 0,title,mal_id,url,image,synopsis,title_english,title_japanese,title_synonyms,genres_1,genres_2,...,demographics_2,demographics_3,themes_1,themes_2,themes_3,genres_6,genres_7,genres_8,themes_4,combined_title
0,20th Century Boys,3,https://myanimelist.net/manga/3/20th_Century_Boys,https://cdn.myanimelist.net/images/manga/1/544...,"As the 20th century approaches its end, people...",20th Century Boys,20世紀少年,/20 Seiki Shounen/Nijuu Seiki Shounen/Nijuusse...,Drama,Mystery,...,nil,nil,Historical,Psychological,nil,,,,,20th Century Boys/20世紀少年/20 Seiki Shounen/Niju...
1,Akumetsu,1101,https://myanimelist.net/manga/1101/Akumetsu,https://cdn.myanimelist.net/images/manga/2/183...,"Due to an economic downturn plaguing Japan, Sh...",nil,アクメツ,nil,Action,Drama,...,nil,nil,Police,Psychological,nil,,,,,Akumetsu/アクメツ
2,Ayashimon,141583,https://myanimelist.net/manga/141583/Ayashimon,https://cdn.myanimelist.net/images/manga/1/256...,"Years ago, the death of the infamous Enma Synd...",nil,アヤシモン,nil,Action,Supernatural,...,nil,nil,nil,nil,nil,,,,,Ayashimon/アヤシモン
3,Dandadan,135496,https://myanimelist.net/manga/135496/Dandadan,https://cdn.myanimelist.net/images/manga/2/248...,"After being aggressively rejected, Momo Ayase ...",Dandadan,ダンダダン,nil,Action,Comedy,...,nil,nil,nil,nil,nil,,,,,Dandadan/ダンダダン
4,Jigokuraku,112318,https://myanimelist.net/manga/112318/Jigokuraku,https://cdn.myanimelist.net/images/manga/2/208...,"Gabimaru the Hollow, a ninja of Iwagakure Vill...",Hell's Paradise: Jigokuraku,地獄楽,/Paradition/Heavenhell,Action,nil,...,nil,nil,Historical,nil,nil,,,,,Jigokuraku/Hell's Paradise: Jigokuraku/地獄楽/Par...
5,Last Game,30315,https://myanimelist.net/manga/30315/Last_Game,https://cdn.myanimelist.net/images/manga/5/657...,"Nothing is beyond Naoto Yanagi, heir to the Ya...",nil,ラストゲーム,/Kimi to/Shiawase./Hidamari no Niwa/Wasureyuki...,Comedy,Romance,...,nil,nil,nil,nil,nil,,,,,Last Game/ラストゲーム/Kimi to/Shiawase./Hidamari no...
6,Shuumatsu no Harem,98752,https://myanimelist.net/manga/98752/Shuumatsu_...,https://cdn.myanimelist.net/images/manga/3/183...,"Diagnosed with multiple sclerosis, young resea...",World's End Harem,終末のハーレム,/Shuumatsu no Harem: After World/World's End H...,Sci-Fi,Ecchi,...,nil,nil,Harem,nil,nil,,,,,Shuumatsu no Harem/World's End Harem/終末のハーレム/S...
7,Solo Leveling,121496,https://myanimelist.net/manga/121496/Solo_Leve...,https://cdn.myanimelist.net/images/manga/3/222...,"Ten years ago, ""the Gate"" appeared and connect...",nil,nil,nil,nil,nil,...,nil,nil,nil,nil,nil,,,,,Solo Leveling
8,Vagabond,656,https://myanimelist.net/manga/656/Vagabond,https://cdn.myanimelist.net/images/manga/2/181...,"In 16th-century Japan, Shinmen Takezou is a wi...",Vagabond,バガボンド,nil,Action,Adventure,...,nil,nil,Historical,Samurai,nil,,,,,Vagabond/バガボンド
9,Wind Breaker,133081,https://myanimelist.net/manga/133081/Wind_Breaker,https://cdn.myanimelist.net/images/manga/1/244...,,nil,nil,nil,nil,nil,...,nil,nil,nil,nil,nil,,,,,Wind Breaker


In [54]:
combined_all_info_df.to_csv('../data/combined_all_info_df.csv', index = False)
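
With every known name stored in `combined_title`, a title lookup can match on any of them. The sketch below shows one way this might work, assuming a plain case-insensitive substring match is enough. The helper `search_titles` is hypothetical, and the three rows are values that the function above would produce for titles in this dataset.

```python
import pandas as pd

def search_titles(df, query):
    """Hypothetical helper: rows whose combined_title contains the query, ignoring case."""
    # regex=False treats the query as plain text, so characters such as '.' or '(' match literally;
    # na=False treats missing titles as non-matches
    mask = df['combined_title'].str.contains(query, case=False, regex=False, na=False)
    return df[mask]

titles = pd.DataFrame({
    'title': ['Jigokuraku', 'Akumetsu', 'Vagabond'],
    'combined_title': [
        "Jigokuraku/Hell's Paradise: Jigokuraku/地獄楽/Paradition/Heavenhell",
        'Akumetsu/アクメツ',
        'Vagabond/バガボンド',
    ],
})

# the English, Japanese and main titles all lead to the same row
print(search_titles(titles, "hell's paradise")['title'].tolist())
print(search_titles(titles, '地獄楽')['title'].tolist())
```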

In [55]:
#the genres are currently stored in separate columns ('genres_1', 'genres_2', etc.); define a function to combine them into one string
def combined_genres(row):
    output = ''
    for i in range(1,9):
        output = output + str(row[f'genres_{i}']) + ' '
    return output

In [56]:
combined_all_info_df['combined_genres'] = combined_all_info_df.apply(combined_genres, axis=1)
combined_all_info_df.head()

Unnamed: 0,title,mal_id,url,image,synopsis,title_english,title_japanese,title_synonyms,genres_1,genres_2,...,demographics_3,themes_1,themes_2,themes_3,genres_6,genres_7,genres_8,themes_4,combined_title,combined_genres
0,20th Century Boys,3,https://myanimelist.net/manga/3/20th_Century_Boys,https://cdn.myanimelist.net/images/manga/1/544...,"As the 20th century approaches its end, people...",20th Century Boys,20世紀少年,/20 Seiki Shounen/Nijuu Seiki Shounen/Nijuusse...,Drama,Mystery,...,nil,Historical,Psychological,nil,,,,,20th Century Boys/20世紀少年/20 Seiki Shounen/Niju...,Drama Mystery Sci-Fi nil nil nan nan nan
1,Akumetsu,1101,https://myanimelist.net/manga/1101/Akumetsu,https://cdn.myanimelist.net/images/manga/2/183...,"Due to an economic downturn plaguing Japan, Sh...",nil,アクメツ,nil,Action,Drama,...,nil,Police,Psychological,nil,,,,,Akumetsu/アクメツ,Action Drama Suspense nil nil nan nan nan
2,Ayashimon,141583,https://myanimelist.net/manga/141583/Ayashimon,https://cdn.myanimelist.net/images/manga/1/256...,"Years ago, the death of the infamous Enma Synd...",nil,アヤシモン,nil,Action,Supernatural,...,nil,nil,nil,nil,,,,,Ayashimon/アヤシモン,Action Supernatural nil nil nil nan nan nan
3,Dandadan,135496,https://myanimelist.net/manga/135496/Dandadan,https://cdn.myanimelist.net/images/manga/2/248...,"After being aggressively rejected, Momo Ayase ...",Dandadan,ダンダダン,nil,Action,Comedy,...,nil,nil,nil,nil,,,,,Dandadan/ダンダダン,Action Comedy Sci-Fi Supernatural nil nan nan ...
4,Jigokuraku,112318,https://myanimelist.net/manga/112318/Jigokuraku,https://cdn.myanimelist.net/images/manga/2/208...,"Gabimaru the Hollow, a ninja of Iwagakure Vill...",Hell's Paradise: Jigokuraku,地獄楽,/Paradition/Heavenhell,Action,nil,...,nil,Historical,nil,nil,,,,,Jigokuraku/Hell's Paradise: Jigokuraku/地獄楽/Par...,Action nil nil nil nil nan nan nan
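
As the output shows, the combined string also carries the 'nil' and 'nan' placeholders. If only the real genres are wanted, one option is to skip the placeholders while collecting. The helper `genre_list` below is a hypothetical variant and not the function used in this notebook; it returns a list rather than a space-separated string, which keeps multi-word genres such as 'Slice of Life' intact.

```python
import pandas as pd

def genre_list(row, n_cols=8):
    """Hypothetical helper: collect the real genres of one row, skipping 'nil' and missing values."""
    genres = []
    for i in range(1, n_cols + 1):
        value = row.get(f'genres_{i}')  # None if the column does not exist
        if pd.notna(value) and value != 'nil':
            genres.append(value)
    return genres

# one row in the same layout as the data above
row = pd.Series({
    'genres_1': 'Drama', 'genres_2': 'Mystery', 'genres_3': 'Sci-Fi',
    'genres_4': 'nil', 'genres_5': 'nil', 'genres_6': float('nan'),
})
print(genre_list(row))
```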


In [57]:
#create a list of all unique genres present in data
all_genres = []
for i in range(1,9):
    for genre in combined_all_info_df[f'genres_{i}'].unique().tolist():
        if genre not in all_genres:
            all_genres.append(genre)
all_genres

['Drama',
 'Action',
 'Comedy',
 'Sci-Fi',
 'nil',
 'Girls Love',
 'Avant Garde',
 'Horror',
 'Supernatural',
 'Fantasy',
 'Adventure',
 'Romance',
 'Gourmet',
 'Ecchi',
 'Sports',
 'Boys Love',
 'Mystery',
 'Slice of Life',
 'Hentai',
 'Suspense',
 'Erotica',
 nan]

**Observation:** 'Erotica' and 'Hentai' are adult categories, so I will place them under the adult content filter.

In [58]:
#create a new column for each genre
for genre in all_genres:
    combined_all_info_df[genre] = 0
combined_all_info_df.head()

Unnamed: 0,title,mal_id,url,image,synopsis,title_english,title_japanese,title_synonyms,genres_1,genres_2,...,Gourmet,Ecchi,Sports,Boys Love,Mystery,Slice of Life,Hentai,Suspense,Erotica,NaN
0,20th Century Boys,3,https://myanimelist.net/manga/3/20th_Century_Boys,https://cdn.myanimelist.net/images/manga/1/544...,"As the 20th century approaches its end, people...",20th Century Boys,20世紀少年,/20 Seiki Shounen/Nijuu Seiki Shounen/Nijuusse...,Drama,Mystery,...,0,0,0,0,0,0,0,0,0,0
1,Akumetsu,1101,https://myanimelist.net/manga/1101/Akumetsu,https://cdn.myanimelist.net/images/manga/2/183...,"Due to an economic downturn plaguing Japan, Sh...",nil,アクメツ,nil,Action,Drama,...,0,0,0,0,0,0,0,0,0,0
2,Ayashimon,141583,https://myanimelist.net/manga/141583/Ayashimon,https://cdn.myanimelist.net/images/manga/1/256...,"Years ago, the death of the infamous Enma Synd...",nil,アヤシモン,nil,Action,Supernatural,...,0,0,0,0,0,0,0,0,0,0
3,Dandadan,135496,https://myanimelist.net/manga/135496/Dandadan,https://cdn.myanimelist.net/images/manga/2/248...,"After being aggressively rejected, Momo Ayase ...",Dandadan,ダンダダン,nil,Action,Comedy,...,0,0,0,0,0,0,0,0,0,0
4,Jigokuraku,112318,https://myanimelist.net/manga/112318/Jigokuraku,https://cdn.myanimelist.net/images/manga/2/208...,"Gabimaru the Hollow, a ninja of Iwagakure Vill...",Hell's Paradise: Jigokuraku,地獄楽,/Paradition/Heavenhell,Action,nil,...,0,0,0,0,0,0,0,0,0,0


In [59]:
#set genre value to 1 if manga title has that genre, if not, it stays as 0
for ind in combined_all_info_df.index:
    for genre in all_genres:
        if str(genre) in str(combined_all_info_df['combined_genres'][ind]):
            combined_all_info_df[genre][ind] = 1
combined_all_info_df.head()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  combined_all_info_df[genre][ind] = 1


Unnamed: 0,title,mal_id,url,image,synopsis,title_english,title_japanese,title_synonyms,genres_1,genres_2,...,Gourmet,Ecchi,Sports,Boys Love,Mystery,Slice of Life,Hentai,Suspense,Erotica,NaN
0,20th Century Boys,3,https://myanimelist.net/manga/3/20th_Century_Boys,https://cdn.myanimelist.net/images/manga/1/544...,"As the 20th century approaches its end, people...",20th Century Boys,20世紀少年,/20 Seiki Shounen/Nijuu Seiki Shounen/Nijuusse...,Drama,Mystery,...,0,0,0,0,1,0,0,0,0,1
1,Akumetsu,1101,https://myanimelist.net/manga/1101/Akumetsu,https://cdn.myanimelist.net/images/manga/2/183...,"Due to an economic downturn plaguing Japan, Sh...",nil,アクメツ,nil,Action,Drama,...,0,0,0,0,0,0,0,1,0,1
2,Ayashimon,141583,https://myanimelist.net/manga/141583/Ayashimon,https://cdn.myanimelist.net/images/manga/1/256...,"Years ago, the death of the infamous Enma Synd...",nil,アヤシモン,nil,Action,Supernatural,...,0,0,0,0,0,0,0,0,0,1
3,Dandadan,135496,https://myanimelist.net/manga/135496/Dandadan,https://cdn.myanimelist.net/images/manga/2/248...,"After being aggressively rejected, Momo Ayase ...",Dandadan,ダンダダン,nil,Action,Comedy,...,0,0,0,0,0,0,0,0,0,1
4,Jigokuraku,112318,https://myanimelist.net/manga/112318/Jigokuraku,https://cdn.myanimelist.net/images/manga/2/208...,"Gabimaru the Hollow, a ninja of Iwagakure Vill...",Hell's Paradise: Jigokuraku,地獄楽,/Paradition/Heavenhell,Action,nil,...,0,0,0,0,0,0,0,0,0,1
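
The loop above marks a genre by checking whether its name appears as a substring of `combined_genres`, which is also why the 'nil' and NaN placeholders end up with indicator columns of their own. An alternative that matches genre names exactly and avoids row-by-row assignment is sketched below with `melt` and `pd.crosstab`. The data is a small made-up frame in the same layout, so this is an illustration and not a drop-in replacement for the cell above.

```python
import pandas as pd

# toy frame in the same layout as combined_all_info_df
df = pd.DataFrame({
    'title': ['20th Century Boys', 'Ayashimon'],
    'genres_1': ['Drama', 'Action'],
    'genres_2': ['Mystery', 'Supernatural'],
    'genres_3': ['Sci-Fi', 'nil'],
})
genre_cols = ['genres_1', 'genres_2', 'genres_3']

# reshape to one (row, genre) pair per line, then drop the placeholders
long = df.reset_index().melt(id_vars='index', value_vars=genre_cols, value_name='genre')
long = long[long['genre'].notna() & (long['genre'] != 'nil')]

# count each genre per row and cap at 1 to obtain 0/1 indicators
one_hot = pd.crosstab(long['index'], long['genre']).clip(upper=1)

# rows without any real genre would be missing here, so restore them as all zeros
one_hot = one_hot.reindex(df.index, fill_value=0)

result = df[['title']].join(one_hot)
print(result)
```

Because the placeholders are removed before the indicators are built, no 'nil' or NaN column has to be dropped afterwards.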


In [222]:
combined_all_info_df.head()

Unnamed: 0,title,mal_id,url,image,synopsis,title_english,title_japanese,title_synonyms,genres_1,genres_2,...,Gourmet,Ecchi,Sports,Boys Love,Mystery,Slice of Life,Hentai,Suspense,Erotica,NaN
0,20th Century Boys,3,https://myanimelist.net/manga/3/20th_Century_Boys,https://cdn.myanimelist.net/images/manga/1/544...,"As the 20th century approaches its end, people...",20th Century Boys,20世紀少年,/20 Seiki Shounen/Nijuu Seiki Shounen/Nijuusse...,Drama,Mystery,...,0,0,0,0,1,0,0,0,0,1
1,Akumetsu,1101,https://myanimelist.net/manga/1101/Akumetsu,https://cdn.myanimelist.net/images/manga/2/183...,"Due to an economic downturn plaguing Japan, Sh...",nil,アクメツ,nil,Action,Drama,...,0,0,0,0,0,0,0,1,0,1
2,Ayashimon,141583,https://myanimelist.net/manga/141583/Ayashimon,https://cdn.myanimelist.net/images/manga/1/256...,"Years ago, the death of the infamous Enma Synd...",nil,アヤシモン,nil,Action,Supernatural,...,0,0,0,0,0,0,0,0,0,1
3,Dandadan,135496,https://myanimelist.net/manga/135496/Dandadan,https://cdn.myanimelist.net/images/manga/2/248...,"After being aggressively rejected, Momo Ayase ...",Dandadan,ダンダダン,nil,Action,Comedy,...,0,0,0,0,0,0,0,0,0,1
4,Jigokuraku,112318,https://myanimelist.net/manga/112318/Jigokuraku,https://cdn.myanimelist.net/images/manga/2/208...,"Gabimaru the Hollow, a ninja of Iwagakure Vill...",Hell's Paradise: Jigokuraku,地獄楽,/Paradition/Heavenhell,Action,nil,...,0,0,0,0,0,0,0,0,0,1


In [60]:
#keep only the relevant columns
combined_all_info_df = combined_all_info_df[[
    'title', 'mal_id', 'url', 'image', 'synopsis',
    'demographics_1', 'demographics_2', 'demographics_3',
    'themes_1', 'themes_2', 'themes_3', 'combined_title',
    'Drama', 'Action', 'Comedy', 'Sci-Fi', 'Girls Love', 'Avant Garde', 'Horror',
    'Supernatural', 'Fantasy', 'Adventure', 'Romance', 'Gourmet', 'Ecchi', 'Sports',
    'Boys Love', 'Mystery', 'Slice of Life', 'Hentai', 'Suspense', 'Erotica',
]]

In [224]:
combined_all_info_df.head()

Unnamed: 0,title,mal_id,url,image,synopsis,demographics_1,demographics_2,demographics_3,themes_1,themes_2,...,Romance,Gourmet,Ecchi,Sports,Boys Love,Mystery,Slice of Life,Hentai,Suspense,Erotica
0,20th Century Boys,3,https://myanimelist.net/manga/3/20th_Century_Boys,https://cdn.myanimelist.net/images/manga/1/544...,"As the 20th century approaches its end, people...",Seinen,nil,nil,Historical,Psychological,...,0,0,0,0,0,1,0,0,0,0
1,Akumetsu,1101,https://myanimelist.net/manga/1101/Akumetsu,https://cdn.myanimelist.net/images/manga/2/183...,"Due to an economic downturn plaguing Japan, Sh...",Shounen,nil,nil,Police,Psychological,...,0,0,0,0,0,0,0,0,1,0
2,Ayashimon,141583,https://myanimelist.net/manga/141583/Ayashimon,https://cdn.myanimelist.net/images/manga/1/256...,"Years ago, the death of the infamous Enma Synd...",Shounen,nil,nil,nil,nil,...,0,0,0,0,0,0,0,0,0,0
3,Dandadan,135496,https://myanimelist.net/manga/135496/Dandadan,https://cdn.myanimelist.net/images/manga/2/248...,"After being aggressively rejected, Momo Ayase ...",Shounen,nil,nil,nil,nil,...,0,0,0,0,0,0,0,0,0,0
4,Jigokuraku,112318,https://myanimelist.net/manga/112318/Jigokuraku,https://cdn.myanimelist.net/images/manga/2/208...,"Gabimaru the Hollow, a ninja of Iwagakure Vill...",Shounen,nil,nil,Historical,nil,...,0,0,0,0,0,0,0,0,0,0


In [61]:
combined_all_info_df.to_csv('../data/combined_all_info_df_clean.csv', index = False)