# Anime Classifier with Naïve Bayes
## Goal
Classify animes using their synopses into genres.
## Process
* Fetch anime data using an API
* Parse this data into class instances to facilitate processing
* Create a dataframe using our data, adding, for each word of the vocabulary, its number of occurences in each synopsis
* Implement the Multinomial Naïve Bayes algorithm -> calculate constants + classify
* Compute success rate for each genre
* Improve the classification system to improve the success rate

## Fetching Anime Data using an API
The API we will use is [Jikan](https://jikan.docs.apiary.io/), an unofficial API for the website [MyAnimeList](https://myanimelist.net/).
It does no require any authentication.

The only type of request we will use fetches the most popular animes of a specific genre, divided by pages of 99 animes ([doc](https://jikan.docs.apiary.io/#reference/0/genre)).


In [None]:
# gets the first page of the most popular animes of genre of id 8 (Drama).
# then converts it to a python dictionary
request.get("https://api.jikan.moe/v3/genre/anime/8/1").json()

In [1]:
import requests
import json
import pandas as pd
import time
import re

In [2]:
#get one anime of id 1
response = requests.get("https://api.jikan.moe/v3/anime/1").json()
#get the anime of anime of id 1 page 1 (99 animes?)
"https://api.jikan.moe/v3/genre/anime/1/1"

'https://api.jikan.moe/v3/genre/anime/1/1'

In [87]:
genres_name = {
    4: "Comedy",
    8: "Drama"
}
# genres gathered by requesting 500+ animes
genres = ['Romance',
 'Historical',
 'Space',
 'Cars',
 'Game',
 'Supernatural',
 'Fantasy',
 'Psychological',
 'Sci-Fi',
 'Seinen',
 'Parody',
 'Drama',
 'Action',
 'Josei',
 'Police',
 'Super Power',
 'Sports',
 'Military',
 'Demons',
 'Vampire',
 'Adventure',
 'Shoujo Ai',
 'Mecha',
 'Shounen',
 'Horror',
 'Kids',
 'Dementia',
 'Samurai',
 'Shounen Ai',
 'Slice of Life',
 'Comedy',
 'Magic',
 'Shoujo',
 'Mystery',
 'Music',
 'Thriller',
 'Martial Arts',
 'School',
 'Harem',
 'Ecchi']


In [5]:
response["anime"][0]

{'mal_id': 16498,
 'url': 'https://myanimelist.net/anime/16498/Shingeki_no_Kyojin',
 'title': 'Shingeki no Kyojin',
 'image_url': 'https://cdn.myanimelist.net/images/anime/10/47347.jpg',
 'synopsis': "Centuries ago, mankind was slaughtered to near extinction by monstrous humanoid creatures called titans, forcing humans to hide in fear behind enormous concentric walls. What makes these giants truly terrifying is that their taste for human flesh is not born out of hunger but what appears to be out of pleasure. To ensure their survival, the remnants of humanity began living within defensive barriers, resulting in one hundred years without a single titan encounter. However, that fragile calm is soon shattered when a colossal titan manages to breach the supposedly impregnable outer wall, reigniting the fight for survival against the man-eating abominations.\r\n\r\nAfter witnessing a horrific personal loss at the hands of the invading creatures, Eren Yeager dedicates his life to their eradic

In [109]:
# We get genres as a list of dictionaries, each dictionary having several keys.
# The name of the genre is the only one we are intersted in.
def simplify_genres(genres):
    clean_genres = []
    for genre in genres:
        clean_genres.append(genre["name"])
    return clean_genres

# takes in the response of the request to get animes of a genre, and returns a list of anime objects
def simplify(animes_of_genre_json):
    animes_json = animes_of_genre_json["anime"]
    animes_objs = []
    for anime_json in animes_json:
        animes_objs.append(Anime(anime_json))
    return animes_objs


# we will ignore the words that refer to genres or column names in our synopses to avoid conflicts
banned_words = set([genre.lower() for genre in genres] + ["title", "synopsis", "genres"]) 
# in order to exclude meaningless words, we only keep words of length 5 or more.
def is_valid(word):
    if word in banned_words or len(word) <= 5: 
        return False
    return True

# formats a string by removing ponctuation, 'written by mal rewrite', extra spaces, lowering it, making a list of valid words
def clean_synopsis(string):
    words = re.sub('\W', ' ', string).replace("written by mal rewrite", "").strip().lower().split()
    valid_words = [word for word in words if is_valid(word)]
    return valid_words

# We create our own structure to simplify the program and only work with the data we need
class Anime:
    
    def __init__(self, anime_from_json):
        self.title = anime_from_json["title"]
        self.synopsis = clean_synopsis(anime_from_json["synopsis"])
        self.genres = simplify_genres(anime_from_json["genres"])

        
# Returns a df from anime objects
def get_df(animes):
    titles = []
    synopses = []
    genres = []
    for anime in animes:
        titles.append(anime.title)
        synopses.append(anime.synopsis)
        genres.append(anime.genres)
    return pd.DataFrame(data = {"title": titles, "synopsis": synopses, "genres": genres})

In [112]:
anime_objs = simplify(response)
df = get_df(anime_objs)

In [58]:
# get request to get the first page of animes of a genre
def request_genre(genre_id, page=1):
    url = "https://api.jikan.moe/v3/genre/anime/" + str(genre_id) + "/" + str(page)
    return requests.get(url).json()

# remove duplicates
def remove_duplicates(animes_list):
    singleton = []
    anime_titles = set()
    for i, anime in enumerate(animes_list):
        if anime.title not in anime_titles:
            anime_titles.add(anime.title)
            singleton.append(anime)
    return singleton

# gets us all the anime of all specified genre (page 1) as a dataframe
def fetch_data(genres_dictionary, pages_per_genre=3):
    anime_groups = []
    for genre_id in genres_dictionary.keys():
        for page in range(1, pages_per_genre + 1):
            anime_groups.append(simplify(request_genre(genre_id, page)))
            time.sleep(4)    # as we request a lot of data each request, we wait a large amount of time to not flood the API
        
    animes = []    
    for group in anime_groups:
        for anime in group:
            animes.append(anime)
    animes = remove_duplicates(animes)
    return get_df(animes)      

In [113]:
animes = fetch_data(genres_name)
print(animes.shape)
animes.head()

(549, 3)


Unnamed: 0,title,synopsis,genres
0,Fullmetal Alchemist: Brotherhood,"[something, obtained, something, alchemy, equi...","[Action, Military, Adventure, Comedy, Drama, M..."
1,One Punch Man,"[seemingly, ordinary, unimpressive, saitama, r...","[Action, Sci-Fi, Comedy, Parody, Super Power, ..."
2,No Game No Life,"[surreal, follows, siblings, online, behind, l...","[Game, Adventure, Comedy, Supernatural, Ecchi,..."
3,Naruto,"[moments, naruto, uzumaki, kyuubi, tailed, att...","[Action, Adventure, Comedy, Super Power, Marti..."
4,Boku no Hero Academia,"[appearance, quirks, discovered, powers, stead...","[Action, Comedy, School, Shounen, Super Power]"


In [117]:
# create a column per genre, a set it to TRUE if the anime is of this genre, FALSE otherwise
def clean_genres(df, genres):
    for genre in genres:
        df[genre] = df["genres"].apply(lambda genre_list: genre in genre_list)

clean_genres(animes, genres)
animes.head()

Unnamed: 0,title,synopsis,genres,Romance,Historical,Space,Cars,Game,Supernatural,Fantasy,...,Comedy,Magic,Shoujo,Mystery,Music,Thriller,Martial Arts,School,Harem,Ecchi
0,Fullmetal Alchemist: Brotherhood,"[something, obtained, something, alchemy, equi...","[Action, Military, Adventure, Comedy, Drama, M...",False,False,False,False,False,False,True,...,True,True,False,False,False,False,False,False,False,False
1,One Punch Man,"[seemingly, ordinary, unimpressive, saitama, r...","[Action, Sci-Fi, Comedy, Parody, Super Power, ...",False,False,False,False,False,True,False,...,True,False,False,False,False,False,False,False,False,False
2,No Game No Life,"[surreal, follows, siblings, online, behind, l...","[Game, Adventure, Comedy, Supernatural, Ecchi,...",False,False,False,False,True,True,True,...,True,False,False,False,False,False,False,False,False,True
3,Naruto,"[moments, naruto, uzumaki, kyuubi, tailed, att...","[Action, Adventure, Comedy, Super Power, Marti...",False,False,False,False,False,False,False,...,True,False,False,False,False,False,True,False,False,False
4,Boku no Hero Academia,"[appearance, quirks, discovered, powers, stead...","[Action, Comedy, School, Shounen, Super Power]",False,False,False,False,False,False,False,...,True,False,False,False,False,False,False,True,False,False


## Training & Test Set

In [118]:
# Randomize the dataset
data_randomized = animes.sample(frac=1, random_state=1)

# Calculate index for split
training_test_index = round(len(data_randomized) * 0.8)

# Training/Test split
training_set = data_randomized[:training_test_index].reset_index(drop=True)
test_set = data_randomized[training_test_index:].reset_index(drop=True)

print("Training shape: " + str(training_set.shape))
print("Test shape: " + str(test_set.shape))

Training shape: (439, 43)
Test shape: (110, 43)


In [119]:
# create a list of all the words present in synopses
def get_vocabulary(series):
    vocab = []
    for synopsis in series:
        for word in synopsis:
            vocab.append(word)
    return list(set(vocab))

vocabulary = get_vocabulary(training_set["synopsis"])
len(vocabulary)

6876

### Adding word occurences count columns
For each word, we will add a column counting the occurences of this word in the synopsis of each anime

In [120]:
# Creates an initial dictionary associating words with a list of n 0, n beign the number of rows (animes)
word_counts_per_synopsis = {unique_word: [0] * len(training_set["synopsis"]) for unique_word in vocabulary}

# We then populate our dictionary with the number of occurences
for index, synopsis in enumerate(training_set["synopsis"]):
    for word in synopsis:
        word_counts_per_synopsis[word][index] += 1

# We finally convert it to a dataframe for concat purposes
word_counts = pd.DataFrame(word_counts_per_synopsis)
word_counts.head()

Unnamed: 0,illegal,masochist,initially,succeeding,scorned,princesses,wannabe,graceful,inhumane,abnormal,...,envelope,corrupted,suspected,asahina,membership,collectors,transgression,illusions,anthropomorphic,kishou
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [122]:
# We can now concatenate our title, synopsis and genres data with our new word occurences data
training_set_clean = pd.concat([training_set, word_counts], axis=1)
training_set_clean.head()

Unnamed: 0,title,synopsis,genres,Romance,Historical,Space,Cars,Game,Supernatural,Fantasy,...,envelope,corrupted,suspected,asahina,membership,collectors,transgression,illusions,anthropomorphic,kishou
0,Rosario to Vampire Capu2,"[tsukune, enrolled, youkai, academy, interesti...","[Comedy, Ecchi, Fantasy, Harem, Romance, Schoo...",True,False,False,False,False,False,True,...,0,0,0,0,0,0,0,0,0,0
1,Akame ga Kill!,"[covert, assassination, branch, revolutionary,...","[Action, Adventure, Drama, Fantasy, Shounen]",False,False,False,False,False,False,True,...,0,0,0,0,0,0,0,0,0,0
2,Kekkai Sensen,"[supersonic, monkeys, vampires, talking, fishm...","[Action, Comedy, Super Power, Supernatural, Va...",False,False,False,False,False,True,True,...,0,0,0,0,0,0,0,0,0,0
3,Made in Abyss,"[gaping, stretching, depths, filled, mysteriou...","[Sci-Fi, Adventure, Mystery, Drama, Fantasy]",False,False,False,False,False,False,True,...,0,0,0,0,0,0,0,0,0,0
4,Saenai Heroine no Sodatekata,"[tomoya, obsessed, collecting, novels, attachi...","[Harem, Comedy, Romance, Ecchi, School]",True,False,False,False,False,False,False,...,0,0,0,0,0,0,0,0,0,0


### Calculating Constants

In [123]:
# P(Genre) -> proba of anime of this genre among all animes
genre_p = {}
for genre in genres:
    genre_p[genre] = (training_set_clean[genre] == True).sum() / len(training_set_clean)

genre_not_p = {}
for genre in genres:
    genre_not_p[genre] = (training_set_clean[genre] == False).sum() / len(training_set_clean)

# Associates genres to the number of words in all synopses of this genre
genre_n_words = {}    
for genre in genres:
    rows_of_this_genre = training_set_clean[training_set_clean[genre] == True]
    genre_n_words[genre] = rows_of_this_genre["synopsis"].apply(len).sum()
    
# Associates genres to the number of words in all synopses not of this genre
total_words = training_set_clean["synopsis"].apply(len).sum()
not_genre_n_words = {}    
for genre in genres:
    not_genre_n_words[genre] = total_words - genre_n_words[genre]
    
alpha = 1
n_vocabulary = len(vocabulary)
genre_n_words
genre_p
genre_n_words

{'Romance': 9530,
 'Historical': 1311,
 'Space': 628,
 'Cars': 58,
 'Game': 490,
 'Supernatural': 5740,
 'Fantasy': 4848,
 'Psychological': 2312,
 'Sci-Fi': 4067,
 'Seinen': 2571,
 'Parody': 1056,
 'Drama': 12543,
 'Action': 7507,
 'Josei': 277,
 'Police': 214,
 'Super Power': 1799,
 'Sports': 1235,
 'Military': 1278,
 'Demons': 928,
 'Vampire': 569,
 'Adventure': 3316,
 'Shoujo Ai': 94,
 'Mecha': 1132,
 'Shounen': 5561,
 'Horror': 1087,
 'Kids': 203,
 'Dementia': 399,
 'Samurai': 369,
 'Shounen Ai': 69,
 'Slice of Life': 4660,
 'Comedy': 14052,
 'Magic': 1706,
 'Shoujo': 1663,
 'Mystery': 3127,
 'Music': 598,
 'Thriller': 961,
 'Martial Arts': 542,
 'School': 7116,
 'Harem': 2607,
 'Ecchi': 2391}

### Calculating Parameters

In [124]:
import time

task_start_time = time.perf_counter()

parameters = {genre: {unique_word:0 for unique_word in vocabulary} for genre in genres}
parameters_no = {genre: {unique_word:0 for unique_word in vocabulary} for genre in genres} # P(not_genre|syno)
for word in vocabulary:
    for genre in genres:
        
        # P(genre|synopsis)
        rows_of_this_genre = training_set_clean[training_set_clean[genre] == True]
        n_word_given_genre = rows_of_this_genre[word].sum()    # number of occurences of the treated word in synopses of this genre
        #print(n_word_given_genre)
        p_word_given_genre = (n_word_given_genre + alpha) / (genre_n_words[genre] + alpha*n_vocabulary)
        parameters[genre][word] = p_word_given_genre
        
        # P(not_genre|synopsis)
        rows_not_of_this_genre = training_set_clean[training_set_clean[genre] == False]
        n_word_given_not_genre = rows_not_of_this_genre[word].sum()
        p_word_given_not_genre = (n_word_given_not_genre + alpha) / (not_genre_n_words[genre] + alpha*n_vocabulary)
        parameters_no[genre][word] = p_word_given_not_genre
        
task_duration = time.perf_counter() - task_start_time
total_operations = n_vocabulary * len(genres)
duration_per_operation = task_duration / total_operations
print("Computing the parameters took " + str(round(task_duration, 2)) + " seconds for " + str(total_operations) + " operations") 
print(str(round(duration_per_operation * 1000, 2)) + " miliseconds per operation")
parameters["Action"]

Computing the parameters took 4847.57 seconds for 275040 operations


TypeError: str() argument 2 must be str, not int

In [214]:
import re

def sort_dictio(d, descending=True):
    return {k: v for k, v in sorted(d.items(), key=lambda item: item[1], reverse = descending)}

def classify(synopsis):    
    # synopsis is already cleaned: it's a list of valid words
    
    p_genre_given_synopsis = {genre: genre_p[genre] for genre in genres} # just a copy of genre_p
    p_not_genre_given_synopsis = {genre: genre_not_p[genre] for genre in genres} # just a copy of genre_p
    for word in synopsis:
        for genre in genres:
            if word in parameters[genre]:
                proba = parameters[genre][word]
                p_genre_given_synopsis[genre] *= proba
            if word in parameters_no[genre]:
                proba = parameters_no[genre][word]
                p_not_genre_given_synopsis[genre] *= proba
                
    return p_genre_given_synopsis, p_not_genre_given_synopsis
 
# Returns a dictionary associating each selected genre (P(genre|synopsis) > P(not_genre|synopsis)) to its confidence
# Confidence translates by how much (%) the proba of being in this genre was superior to not being in this genre given the synopsis 
def extract_best_guesses(classification):
    p, no_p = classification
    genres_classified = {}
    for k, v in p.items():
        if v > no_p[k]:        
            confidence = (v - no_p[k]) / (no_p[k]) * 100   
            genres_classified[k] = confidence
#     genres_selected_by_confidence = list(sort_dictio(genres_classified).keys())
    return sort_dictio(genres_classified)

# returns an array of predicted genres, ordered by confidence
# use it as df["predicted_genres"] = df["synopsis"].apply(get_perdicted_genres)
def get_predicted_genres(synopsis):
    classification = classify(synopsis)
    selected_genres_by_confidence = extract_best_guesses(classification)
    selected_genres_array = list(selected_genres_by_confidence.keys())
    return selected_genres_array

# creates a boolean column per genre as [genre]"_predicted" and populate them with True if the row's anime genre was predicted accurately
def predict_all(df):
    df["predicted_genres"] = df["synopsis"].apply(get_predicted_genres)
    for genre in genres:
        col_name = genre + "_predicted"
        df[col_name] = ((df["predicted_genres"].apply(lambda predicted_genres: genre in predicted_genres)) == (df[genre]))

def compute_accuracy(df):
    # treat an entire row
    # can define accuracy as the number of correct boolean values / len(genres)
    # most animes would have an accuracy of 38/40, 39/40, or even 100%
    # could add other version of accuracy

test_anime = test_set.iloc[19]
print(test_anime["title"])
print(test_anime["genres"])
synopsis = test_anime["synopsis"]
classification = classify(synopsis)
p_to_be_of_genre, p_not_to_be_of_genre = sort_dictio(classification[0]), sort_dictio(classification[1])
print(p_to_be_of_genre)
print(p_not_to_be_of_genre)
extract_best_guesses(classification)

Shingeki no Kyojin Season 3 Part 2
['Action', 'Drama', 'Fantasy', 'Military', 'Mystery', 'Shounen', 'Super Power']
{'Action': 3.389417508553057e-159, 'Shounen': 3.4339254506619817e-161, 'Drama': 5.88871962549744e-162, 'Fantasy': 1.0117146877444363e-162, 'Mystery': 2.865954672045926e-165, 'Super Power': 2.6086125708932476e-165, 'Military': 1.5272561964138218e-166, 'Comedy': 3.938597467320561e-167, 'Supernatural': 5.800247022615439e-168, 'Adventure': 2.2043552514339158e-169, 'Sci-Fi': 1.2725240130911976e-169, 'Romance': 1.4169358156149622e-171, 'Magic': 4.1885671706213995e-172, 'Mecha': 4.5771285642381584e-173, 'Harem': 1.166375122737486e-173, 'School': 9.357247209869944e-174, 'Slice of Life': 7.3307930463688345e-174, 'Psychological': 5.078725677471467e-174, 'Seinen': 1.7015243288252355e-174, 'Ecchi': 1.098478839639352e-174, 'Thriller': 7.396608581564624e-175, 'Demons': 6.426123235158355e-175, 'Vampire': 1.2837000419145246e-175, 'Historical': 2.9235100793072224e-176, 'Horror': 2.37986415

{'Action': 14047825507401.264,
 'Drama': 1075180978.8798532,
 'Shounen': 291946522.7245832,
 'Fantasy': 153873520.4302978,
 'Mystery': 10887.086511430001,
 'Super Power': 7381.023816002799,
 'Military': 230.13279432320832}

In [21]:
test_set.iloc[8][["title", "synopsis", "genres"]]

title                        Boku no Hero Academia 3rd Season
synopsis    [summer, arrives, students, academy, superhero...
genres         [Action, Comedy, School, Shounen, Super Power]
Name: 8, dtype: object

In [None]:
{'mal_id': 16498,
 'url': 'https://myanimelist.net/anime/16498/Shingeki_no_Kyojin',
 'title': 'Shingeki no Kyojin',
 'image_url': 'https://cdn.myanimelist.net/images/anime/10/47347.jpg',
 'synopsis': "Centuries ago, mankind was slaughtered to near extinction by monstrous humanoid creatures called titans, forcing humans to hide in fear behind enormous concentric walls. What makes these giants truly terrifying is that their taste for human flesh is not born out of hunger but what appears to be out of pleasure. To ensure their survival, the remnants of humanity began living within defensive barriers, resulting in one hundred years without a single titan encounter. However, that fragile calm is soon shattered when a colossal titan manages to breach the supposedly impregnable outer wall, reigniting the fight for survival against the man-eating abominations.\r\n\r\nAfter witnessing a horrific personal loss at the hands of the invading creatures, Eren Yeager dedicates his life to their eradication by enlisting into the Survey Corps, an elite military unit that combats the merciless humanoids outside the protection of the walls. Based on Hajime Isayama's award-winning manga, Shingeki no Kyojin follows Eren, along with his adopted sister Mikasa Ackerman and his childhood friend Armin Arlert, as they join the brutal war against the titans and race to discover a way of defeating them before the last walls are breached.\r\n\r\n[Written by MAL Rewrite]",
 'type': 'TV',
 'airing_start': '2013-04-06T16:58:00+00:00',
 'episodes': 25,
 'members': 1808210,
 'genres': [{'mal_id': 1,
   'type': 'anime',
   'name': 'Action',
   'url': 'https://myanimelist.net/anime/genre/1/Action'},
  {'mal_id': 38,
   'type': 'anime',
   'name': 'Military',
   'url': 'https://myanimelist.net/anime/genre/38/Military'},
  {'mal_id': 7,
   'type': 'anime',
   'name': 'Mystery',
   'url': 'https://myanimelist.net/anime/genre/7/Mystery'},
  {'mal_id': 31,
   'type': 'anime',
   'name': 'Super Power',
   'url': 'https://myanimelist.net/anime/genre/31/Super_Power'},
  {'mal_id': 8,
   'type': 'anime',
   'name': 'Drama',
   'url': 'https://myanimelist.net/anime/genre/8/Drama'},
  {'mal_id': 10,
   'type': 'anime',
   'name': 'Fantasy',
   'url': 'https://myanimelist.net/anime/genre/10/Fantasy'},
  {'mal_id': 27,
   'type': 'anime',
   'name': 'Shounen',
   'url': 'https://myanimelist.net/anime/genre/27/Shounen'}],
 'source': 'Manga',
 'producers': [{'mal_id': 858,
   'type': 'anime',
   'name': 'Wit Studio',
   'url': 'https://myanimelist.net/anime/producer/858/Wit_Studio'}],
 'score': 8.45,
 'licensors': ['Funimation'],
 'r18': False,
 'kids': False}

In [150]:
def save_parameters():
#     filename = "parameters_" + str(len(clean_training_data)) + "_animes_" + str(n_vocabulary)  + "_words_" + str(len(genres)) + "_genres.json"
    with open("parameters.json", 'w', encoding='utf-8') as f:
        json.dump(parameters, f, ensure_ascii=False, indent=4)
    with open("parameters_no.json", 'w', encoding='utf-8') as f:
        json.dump(parameters_no, f, ensure_ascii=False, indent=4)
        
def load_parameters(filename="parameters.json", filename_no="parameters_no.json"):
    params, params_no = None, None
    with open(filename, 'r') as f:
        params =  json.load(f)
    with open(filename_no, 'r') as n:
        params_no =  json.load(n)
    return params, params_no

In [177]:
training_set_clean["synopsis"].apply(len).describe()

count    439.000000
mean      51.635535
std       16.153776
min        0.000000
25%       42.000000
50%       53.000000
75%       62.000000
max      117.000000
Name: synopsis, dtype: float64

In [183]:
training_set_clean[training_set_clean["synopsis"].apply(len) == 0]

Unnamed: 0,title,synopsis,genres,Romance,Historical,Space,Cars,Game,Supernatural,Fantasy,...,envelope,corrupted,suspected,asahina,membership,collectors,transgression,illusions,anthropomorphic,kishou
48,Made in Abyss Movie 3: Fukaki Tamashii no Reimei,[],"[Sci-Fi, Adventure, Mystery, Drama, Fantasy]",False,False,False,False,False,False,True,...,0,0,0,0,0,0,0,0,0,0


In [None]:
t_d = pd.DataFrame(pd.Series([["a", "b", "c"], ["b", "c"]]))
t_d.columns = ["genres_predicted"]
genre = "a"
t_d[genre + "_predicted"] = t_d["genres_predicted"].isin([genre])
t_d

In [215]:
isolated_test_set = test_set.copy()
predict_all(isolated_test_set)
isolated_test_set.head()

Unnamed: 0,title,synopsis,genres,Romance,Historical,Space,Cars,Game,Supernatural,Fantasy,...,Comedy_predicted,Magic_predicted,Shoujo_predicted,Mystery_predicted,Music_predicted,Thriller_predicted,Martial Arts_predicted,School_predicted,Harem_predicted,Ecchi_predicted
0,Youkoso Jitsuryoku Shijou Shugi no Kyoushitsu ...,"[surface, ikusei, senior, utopia, students, un...","[Slice of Life, Psychological, Drama, School]",False,False,False,False,False,False,False,...,False,True,True,True,True,True,True,True,True,True
1,Kaichou wa Maid-sama!,"[female, student, council, president, especial...","[Comedy, Romance, School, Shoujo]",True,False,False,False,False,False,False,...,True,True,False,True,True,True,True,True,True,True
2,Sakura-sou no Pet na Kanojo,"[abandoned, kittens, conscience, second, sorat...","[Slice of Life, Comedy, Drama, Romance, School]",True,False,False,False,False,False,False,...,True,True,True,True,True,True,True,False,True,True
3,Ansatsu Kyoushitsu,"[mysterious, creature, permanent, crescent, st...","[Action, Comedy, School, Shounen]",False,False,False,False,False,False,False,...,True,True,True,True,True,True,True,False,True,True
4,Kaze ga Tsuyoku Fuiteiru,"[former, runner, sendai, kakeru, kurahara, cha...","[Comedy, Sports, Drama]",False,False,False,False,False,False,False,...,False,True,True,True,True,True,True,True,True,True


In [228]:
pd.set_option("display.max_rows", 101)
isolated_test_set.iloc[2]

title                                            Sakura-sou no Pet na Kanojo
synopsis                   [abandoned, kittens, conscience, second, sorat...
genres                       [Slice of Life, Comedy, Drama, Romance, School]
Romance                                                                 True
Historical                                                             False
Space                                                                  False
Cars                                                                   False
Game                                                                   False
Supernatural                                                           False
Fantasy                                                                False
Psychological                                                          False
Sci-Fi                                                                 False
Seinen                                                                 False