# Data Exploration and Analysis
This notebook explores the datasets used by the recommendation engine and prepares them for use in the engine itself.

## Import Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## Read in Data

In [4]:
animes_df = pd.read_csv('data/animes.csv')
users_df = pd.read_csv('data/profiles.csv')
reviews_df = pd.read_csv('data/reviews.csv')

print("Animes DF: {}\nUsers DF: {}\nReviews DF: {}".format(animes_df.shape, users_df.shape, reviews_df.shape))

Animes DF: (19311, 12)
Users DF: (81727, 5)
Reviews DF: (192112, 7)


### Animes DataFrame

In [5]:
animes_df.head(5)

,uid,title,synopsis,genre,aired,episodes,members,popularity,ranked,score,img_url,link
0,28891,Haikyuu!! Second Season,Following their participation at the Inter-Hig...,"['Comedy', 'Sports', 'Drama', 'School', 'Shoun...","Oct 4, 2015 to Mar 27, 2016",25.0,489888,141,25.0,8.82,https://cdn.myanimelist.net/images/anime/9/766...,https://myanimelist.net/anime/28891/Haikyuu_Se...
1,23273,Shigatsu wa Kimi no Uso,Music accompanies the path of the human metron...,"['Drama', 'Music', 'Romance', 'School', 'Shoun...","Oct 10, 2014 to Mar 20, 2015",22.0,995473,28,24.0,8.83,https://cdn.myanimelist.net/images/anime/3/671...,https://myanimelist.net/anime/23273/Shigatsu_w...
2,34599,Made in Abyss,The Abyss—a gaping chasm stretching down into ...,"['Sci-Fi', 'Adventure', 'Mystery', 'Drama', 'F...","Jul 7, 2017 to Sep 29, 2017",13.0,581663,98,23.0,8.83,https://cdn.myanimelist.net/images/anime/6/867...,https://myanimelist.net/anime/34599/Made_in_Abyss
3,5114,Fullmetal Alchemist: Brotherhood,"""In order for something to be obtained, someth...","['Action', 'Military', 'Adventure', 'Comedy', ...","Apr 5, 2009 to Jul 4, 2010",64.0,1615084,4,1.0,9.23,https://cdn.myanimelist.net/images/anime/1223/...,https://myanimelist.net/anime/5114/Fullmetal_A...
4,31758,Kizumonogatari III: Reiketsu-hen,After helping revive the legendary vampire Kis...,"['Action', 'Mystery', 'Supernatural', 'Vampire']","Jan 6, 2017",1.0,214621,502,22.0,8.83,https://cdn.myanimelist.net/images/anime/3/815...,https://myanimelist.net/anime/31758/Kizumonoga...


In [53]:
animes_df.genre.isnull().sum()

0
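A null count of zero only rules out missing values; it does not rule out rows whose genre string is an empty list. The sketch below runs on a made-up three-row frame (`toy` is a hypothetical stand-in for `animes_df`, not the real data) and counts the two cases separately.

```python
import pandas as pd

# Hypothetical stand-in for animes_df; the rows are invented for illustration.
toy = pd.DataFrame({"genre": ["['Comedy', 'Sports']", "[]", "['Drama']"]})

# Missing values (NaN/None) in the genre column.
n_null = int(toy["genre"].isnull().sum())

# Rows that are present but hold an empty list literal.
n_empty = int((toy["genre"].str.strip() == "[]").sum())

print("null: {}, empty list: {}".format(n_null, n_empty))
```

Running the same two counts on `animes_df.genre` would show whether empty lists are present before the genres are split.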

In [51]:
genres = []
for genre_set in animes_df.genre:
    values = genre_set.strip("[]").split(",")
    values = [w.strip()[1:-1] for w in values]
    genres.extend(values)

genres = set(genres)
print("The number of genres is {}.".format(len(genres)))
print(genres)

The number of genres is 44.
{'', 'Comedy', 'Fantasy', 'Game', 'Police', 'Action', 'Parody', 'Slice of Life', 'Adventure', 'Music', 'Sci-Fi', 'Yaoi', 'Shounen', 'Thriller', 'Cars', 'Kids', 'Josei', 'Yuri', 'Military', 'Super Power', 'Dementia', 'School', 'Samurai', 'Psychological', 'Mystery', 'Hentai', 'Romance', 'Space', 'Supernatural', 'Ecchi', 'Sports', 'Shoujo', 'Drama', 'Vampire', 'Demons', 'Martial Arts', 'Historical', 'Mecha', 'Seinen', 'Shoujo Ai', 'Horror', 'Harem', 'Shounen Ai', 'Magic'}


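Notice that the set contains an empty string. It most likely comes from rows whose genre list is empty (`[]`): stripping the brackets leaves an empty string, and `split(",")` returns it as a single empty element. Because sets are unordered, the empty string is not guaranteed to be the first element, so we remove it by value rather than by position.

In [52]:
# Drop the empty string by value; set order is arbitrary, so pop(0) is not reliable.
genres = [g for g in genres if g]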
print("The number of genres is {}.".format(len(genres)))
print(genres)

The number of genres is 43.
['Comedy', 'Fantasy', 'Game', 'Police', 'Action', 'Parody', 'Slice of Life', 'Adventure', 'Music', 'Sci-Fi', 'Yaoi', 'Shounen', 'Thriller', 'Cars', 'Kids', 'Josei', 'Yuri', 'Military', 'Super Power', 'Dementia', 'School', 'Samurai', 'Psychological', 'Mystery', 'Hentai', 'Romance', 'Space', 'Supernatural', 'Ecchi', 'Sports', 'Shoujo', 'Drama', 'Vampire', 'Demons', 'Martial Arts', 'Historical', 'Mecha', 'Seinen', 'Shoujo Ai', 'Horror', 'Harem', 'Shounen Ai', 'Magic']
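As an alternative to manual string splitting, the genre strings look like Python list literals, so they can be parsed with the standard library's `ast.literal_eval`. This is a sketch under that assumption, using made-up rows (`raw_rows` is hypothetical); it shows that an empty list then contributes nothing instead of an empty string.

```python
import ast

# Made-up genre strings in the same format as the `genre` column.
raw_rows = ["['Comedy', 'Sports']", "[]", "['Drama', 'Slice of Life']"]

# Manual splitting, as in the cell above: "[]" ends up as [''].
manual = set()
for row in raw_rows:
    manual.update(w.strip()[1:-1] for w in row.strip("[]").split(","))

# literal_eval parses each string as a Python list, so "[]" yields [].
parsed = set()
for row in raw_rows:
    parsed.update(ast.literal_eval(row))

print(sorted(manual))
print(sorted(parsed))
```

With this approach no cleanup step is needed afterwards, at the cost of raising an error on any row that is not a valid list literal.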


In [58]:
animes_df.genre[0].find("Sports")

12

In [59]:
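def split_genres(genre_str, genre):
    # Return 1 if `genre` appears in the row's genre string, else 0.
    # Matching the quoted name keeps 'Shounen' from also matching 'Shounen Ai'.
    try:
        return int("'{}'".format(genre) in genre_str)
    except TypeError:
        # Non-string values (e.g. NaN) carry no genres.
        return 0

# Add one 0/1 indicator column per genre.
for genre in genres:
    animes_df[genre] = animes_df['genre'].apply(split_genres, genre=genre)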
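The loop above builds one indicator column per genre. A minimal sketch of the same idea using parsed lists follows, assuming the genre strings are valid Python list literals; `toy` and its rows are made up for illustration. It also shows why a plain substring test is risky: "Shounen" is a substring of "Shounen Ai".

```python
import ast
import pandas as pd

# Hypothetical stand-in for animes_df.
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genre": ["['Action', 'Shounen']", "['Romance', 'Shounen Ai']", "[]"],
})

# A plain substring test flags row B as 'Shounen' even though it is 'Shounen Ai'.
substring_hits = toy["genre"].str.contains("Shounen", regex=False).astype(int).tolist()

# Parse each string into a list, then produce one row per (title, genre) pair.
exploded = toy["genre"].apply(ast.literal_eval).explode()

# One indicator column per genre, collapsed back to one row per title.
indicators = pd.get_dummies(exploded).groupby(level=0).max().astype(int)
encoded = toy.join(indicators)

print(substring_hits)
print(encoded[["title", "Shounen", "Shounen Ai"]])
```

Rows with an empty genre list end up with zeros in every indicator column, so they need no special handling.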

In [61]:
animes_df.iloc[59]

uid                                                            269
title                                                       Bleach
synopsis         Ichigo Kurosaki is an ordinary high schooler—u...
genre            ['Action', 'Adventure', 'Comedy', 'Super Power...
aired                                  Oct 5, 2004 to Mar 27, 2012
episodes                                                     366.0
members                                                    1002578
popularity                                                      25
ranked                                                       757.0
score                                                         7.87
img_url          https://cdn.myanimelist.net/images/anime/3/404...
link                      https://myanimelist.net/anime/269/Bleach
Comedy                                                           1
Fantasy                                                          0
Game                                                          