# Data Preparation

Dataset: <a href="https://www.kaggle.com/vishalmane10/anime-dataset-2022">Anime Dataset 2022</a><br>
Filename: Anime.csv<br>


<table>
  <tr>
    <th>Feature_Name</th>
    <th>Feature_Type</th>
  </tr><tr>
    <th></th>
    <th></th>
  </tr>
</table>

## Import Libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

%matplotlib inline

In [2]:
# Set Options for display
pd.options.display.max_rows = 100
pd.options.display.max_columns = 100
pd.options.display.float_format = '{:.2f}'.format

#Filter Warnings
import warnings
warnings.filterwarnings('ignore')

In [3]:
from scipy.stats import norm
from scipy import stats

________

## Load the Dataset
* Specify the Parameters (Filepath, Index Column)
* Check for Date-Time Columns to Parse Dates
* Check Encoding if file does not load correctly

In [4]:
df = pd.read_csv("./Anime.csv")
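
The checklist above corresponds to a handful of `read_csv` parameters. The following is a minimal sketch rather than the notebook's actual loading code: the helper `read_csv_with_fallback`, the list of candidate encodings, and the small Latin-1 sample file are all assumptions, introduced so the example runs without `Anime.csv`.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "latin-1"), **kwargs):
    """Hypothetical helper: try each encoding until the file decodes."""
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding, **kwargs)
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next encoding
    raise last_error


# Write a small Latin-1 encoded sample file (made-up rows, not taken from Anime.csv).
sample_text = "Rank,Name,Release_year\n1,Pokémon,1997\n2,Bleach,2004\n"
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as handle:
    handle.write(sample_text.encode("latin-1"))
    sample_path = handle.name

# index_col picks the index column; parse_dates would be passed the same way
# if the file contained date-time columns.
sample_df = read_csv_with_fallback(sample_path, index_col="Rank")
os.remove(sample_path)

print(sample_df)
```

Here the UTF-8 attempt fails on the accented character and the Latin-1 attempt succeeds. For a real file, the right encoding depends on how the file was produced.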

View the Dataset

In [5]:
df.head()

Unnamed: 0,Rank,Name,Japanese_name,Type,Episodes,Studio,Release_season,Tags,Rating,Release_year,End_year,Description,Content_Warning,Related_Mange,Related_anime,Voice_actors,staff
0,1,Demon Slayer: Kimetsu no Yaiba - Entertainment...,Kimetsu no Yaiba: Yuukaku-hen,TV,,ufotable,Fall,"Action, Adventure, Fantasy, Shounen, Demons, H...",4.6,2021.0,,'Tanjiro and his friends accompany the Hashira...,Explicit Violence,Demon Slayer: Kimetsu no Yaiba,"Demon Slayer: Kimetsu no Yaiba, Demon Slayer: ...","Inosuke Hashibira : Yoshitsugu Matsuoka, Nezuk...","Koyoharu Gotouge : Original Creator, Haruo Sot..."
1,2,Fruits Basket the Final Season,Fruits Basket the Final,TV,13.0,TMS Entertainment,Spring,"Drama, Fantasy, Romance, Shoujo, Animal Transf...",4.6,2021.0,,'The final arc of Fruits Basket.',"Emotional Abuse,, Mature Themes,, Physical Abu...","Fruits Basket, Fruits Basket Another","Fruits Basket 1st Season, Fruits Basket 2nd Se...","Akito Sohma : Maaya Sakamoto, Kyo Sohma : Yuum...","Natsuki Takaya : Original Creator, Yoshihide I..."
2,3,Mo Dao Zu Shi 3,The Founder of Diabolism 3,Web,12.0,B.C MAY PICTURES,,"Fantasy, Ancient China, Chinese Animation, Cul...",4.58,2021.0,,'The third season of Mo Dao Zu Shi.',,Grandmaster of Demonic Cultivation: Mo Dao Zu ...,"Mo Dao Zu Shi 2, Mo Dao Zu Shi Q","Lan Wangji, Wei Wuxian, Jiang Cheng, Jin Guang...","Mo Xiang Tong Xiu : Original Creator, Xiong Ke..."
3,4,Fullmetal Alchemist: Brotherhood,Hagane no Renkinjutsushi: Full Metal Alchemist,TV,64.0,Bones,Spring,"Action, Adventure, Drama, Fantasy, Mystery, Sh...",4.58,2009.0,2010.0,"""The foundation of alchemy is based on the law...","Animal Abuse,, Mature Themes,, Violence,, Dome...","Fullmetal Alchemist, Fullmetal Alchemist (Ligh...","Fullmetal Alchemist: Brotherhood Specials, Ful...","Alphonse Elric : Rie Kugimiya, Edward Elric : ...","Hiromu Arakawa : Original Creator, Yasuhiro Ir..."
4,5,Attack on Titan 3rd Season: Part II,Shingeki no Kyojin Season 3: Part II,TV,10.0,WIT Studio,Spring,"Action, Fantasy, Horror, Shounen, Dark Fantasy...",4.57,2019.0,,'The battle to retake Wall Maria begins now! W...,"Cannibalism,, Explicit Violence","Attack on Titan, Attack on Titan: End of the W...","Attack on Titan, Attack on Titan 2nd Season, A...","Armin Arlelt : Marina Inoue, Eren Jaeger : Yuu...","Hajime Isayama : Original Creator, Tetsurou Ar..."


Check the Shape

In [6]:
df.shape

(18495, 17)

## Ensure Columns / Features have Proper Labels

Remove any columns that have not been labelled properly or are of unknown feature type

In [7]:
df.columns

Index(['Rank', 'Name', 'Japanese_name', 'Type', 'Episodes', 'Studio',
       'Release_season', 'Tags', 'Rating', 'Release_year', 'End_year',
       'Description', 'Content_Warning', 'Related_Mange', 'Related_anime',
       'Voice_actors', 'staff'],
      dtype='object')
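
One way to act on this step is sketched below, assuming that an "improperly labelled" column is one pandas auto-named `Unnamed: N` or one whose label carries stray whitespace. The frame `demo` is made up for illustration; the Anime dataset itself shows no such columns in the output above.

```python
import pandas as pd

# Made-up frame with two label problems: an auto-named column and stray whitespace.
demo = pd.DataFrame({"Unnamed: 0": [0, 1], " Rank": [1, 2], "Name ": ["A", "B"]})

# Normalise the labels, then drop the columns pandas auto-named "Unnamed: N".
demo.columns = demo.columns.str.strip()
unlabelled = [col for col in demo.columns if col.startswith("Unnamed")]
cleaned = demo.drop(columns=unlabelled)

print(list(cleaned.columns))
```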

## Ensure Correct Format of Values

Use the table above as reference

In [8]:
df.dtypes

Rank                 int64
Name                object
Japanese_name       object
Type                object
Episodes           float64
Studio              object
Release_season      object
Tags                object
Rating             float64
Release_year       float64
End_year           float64
Description         object
Content_Warning     object
Related_Mange       object
Related_anime       object
Voice_actors        object
staff               object
dtype: object
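
`Episodes`, `Release_year` and `End_year` hold whole numbers but are stored as `float64`, which is what pandas falls back to when an integer column contains missing values. A possible alternative, shown here only as a sketch on made-up rows, is the nullable `Int64` dtype, which keeps whole numbers and still allows missing entries.

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the three whole-number columns.
demo = pd.DataFrame({
    "Episodes": [13.0, np.nan, 64.0],
    "Release_year": [2021.0, 2021.0, 2009.0],
    "End_year": [np.nan, np.nan, 2010.0],
})

# Nullable Int64 stores whole numbers and marks missing values as <NA>.
whole_number_cols = ["Episodes", "Release_year", "End_year"]
converted = demo.astype({col: "Int64" for col in whole_number_cols})

print(converted.dtypes)
```

The conversion only succeeds when every non-missing value is already a whole number, so it also acts as a light sanity check on the column.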

## Remove Duplicates

Check whether the index contains duplicated labels


In [9]:
df.index.duplicated().sum()

0

List any fully duplicated rows

In [10]:
df[df.duplicated()]

Unnamed: 0,Rank,Name,Japanese_name,Type,Episodes,Studio,Release_season,Tags,Rating,Release_year,End_year,Description,Content_Warning,Related_Mange,Related_anime,Voice_actors,staff


Drop the duplicated rows, if any

In [11]:
df.drop_duplicates(inplace=True)
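
The three checks in this section measure different things. The sketch below contrasts them on a made-up frame; the `subset=["Name"]` variant is an assumption about what might count as a duplicate and is not something the notebook itself uses.

```python
import pandas as pd

# Made-up frame: row 2 repeats row 1 exactly, row 3 repeats only the Name of row 0.
demo = pd.DataFrame({
    "Name": ["A", "B", "B", "A"],
    "Type": ["TV", "TV", "TV", "Movie"],
})

index_dupes = demo.index.duplicated().sum()           # repeated index labels
row_dupes = demo.duplicated().sum()                   # rows identical in every column
name_dupes = demo.duplicated(subset=["Name"]).sum()   # same Name, other columns may differ

# Drop exact duplicates and rebuild a clean 0..n-1 index.
deduped = demo.drop_duplicates().reset_index(drop=True)

print(index_dupes, row_dupes, name_dupes, len(deduped))
```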

## Handle Missing Data

In [12]:
df.head()

Unnamed: 0,Rank,Name,Japanese_name,Type,Episodes,Studio,Release_season,Tags,Rating,Release_year,End_year,Description,Content_Warning,Related_Mange,Related_anime,Voice_actors,staff
0,1,Demon Slayer: Kimetsu no Yaiba - Entertainment...,Kimetsu no Yaiba: Yuukaku-hen,TV,,ufotable,Fall,"Action, Adventure, Fantasy, Shounen, Demons, H...",4.6,2021.0,,'Tanjiro and his friends accompany the Hashira...,Explicit Violence,Demon Slayer: Kimetsu no Yaiba,"Demon Slayer: Kimetsu no Yaiba, Demon Slayer: ...","Inosuke Hashibira : Yoshitsugu Matsuoka, Nezuk...","Koyoharu Gotouge : Original Creator, Haruo Sot..."
1,2,Fruits Basket the Final Season,Fruits Basket the Final,TV,13.0,TMS Entertainment,Spring,"Drama, Fantasy, Romance, Shoujo, Animal Transf...",4.6,2021.0,,'The final arc of Fruits Basket.',"Emotional Abuse,, Mature Themes,, Physical Abu...","Fruits Basket, Fruits Basket Another","Fruits Basket 1st Season, Fruits Basket 2nd Se...","Akito Sohma : Maaya Sakamoto, Kyo Sohma : Yuum...","Natsuki Takaya : Original Creator, Yoshihide I..."
2,3,Mo Dao Zu Shi 3,The Founder of Diabolism 3,Web,12.0,B.C MAY PICTURES,,"Fantasy, Ancient China, Chinese Animation, Cul...",4.58,2021.0,,'The third season of Mo Dao Zu Shi.',,Grandmaster of Demonic Cultivation: Mo Dao Zu ...,"Mo Dao Zu Shi 2, Mo Dao Zu Shi Q","Lan Wangji, Wei Wuxian, Jiang Cheng, Jin Guang...","Mo Xiang Tong Xiu : Original Creator, Xiong Ke..."
3,4,Fullmetal Alchemist: Brotherhood,Hagane no Renkinjutsushi: Full Metal Alchemist,TV,64.0,Bones,Spring,"Action, Adventure, Drama, Fantasy, Mystery, Sh...",4.58,2009.0,2010.0,"""The foundation of alchemy is based on the law...","Animal Abuse,, Mature Themes,, Violence,, Dome...","Fullmetal Alchemist, Fullmetal Alchemist (Ligh...","Fullmetal Alchemist: Brotherhood Specials, Ful...","Alphonse Elric : Rie Kugimiya, Edward Elric : ...","Hiromu Arakawa : Original Creator, Yasuhiro Ir..."
4,5,Attack on Titan 3rd Season: Part II,Shingeki no Kyojin Season 3: Part II,TV,10.0,WIT Studio,Spring,"Action, Fantasy, Horror, Shounen, Dark Fantasy...",4.57,2019.0,,'The battle to retake Wall Maria begins now! W...,"Cannibalism,, Explicit Violence","Attack on Titan, Attack on Titan: End of the W...","Attack on Titan, Attack on Titan 2nd Season, A...","Armin Arlelt : Marina Inoue, Eren Jaeger : Yuu...","Hajime Isayama : Original Creator, Tetsurou Ar..."


In [13]:
# Count the missing values in each column, largest first
total = df.isnull().sum().sort_values(ascending=False)


In [14]:
# Fraction of missing values in each column (0 to 1, not a percentage), largest first
percent = (df.isnull().sum()/df.isnull().count()).sort_values(ascending=False)

In [15]:
missing_data = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])

missing_data.head(20)

Unnamed: 0,Total,Percent
Content_Warning,16655,0.9
End_year,15641,0.85
Release_season,14379,0.78
Related_Mange,10868,0.59
Japanese_name,10557,0.57
Episodes,8994,0.49
Related_anime,8432,0.46
Studio,6477,0.35
staff,5490,0.3
Voice_actors,3186,0.17
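
The same summary can be wrapped in a small reusable function. This is a sketch under assumptions: `missing_summary` is a hypothetical helper name, the rows are made up, and unlike the table above it reports `Percent` on a 0 to 100 scale.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Return per-column missing counts and percentages, largest first."""
    total = frame.isnull().sum()
    percent = frame.isnull().mean() * 100  # mean of booleans = fraction missing
    summary = pd.concat([total, percent], axis=1, keys=["Total", "Percent"])
    return summary.sort_values("Total", ascending=False)


# Made-up rows for illustration.
demo = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Episodes": [13.0, np.nan, np.nan, 10.0],
    "End_year": [np.nan, np.nan, np.nan, 2010.0],
})
summary = missing_summary(demo)

print(summary)
```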


In [16]:
df.fillna({'Content_Warning': 'N/A'}, inplace=True)

# End_year is numeric: fill with the number 0 (not the string '0') so the column stays float64
df.fillna({'End_year': 0}, inplace=True)

df.fillna({'Release_season': 'N/A'}, inplace=True)

df.fillna({'Related_Mange': 'N/A'}, inplace=True)

df.fillna({'Japanese_name': 'N/A'}, inplace=True)

# A missing episode count may mean the title is still ongoing; use 0 as a placeholder for now
df.fillna({'Episodes': 0}, inplace=True)

df.fillna({'Related_anime': 'N/A'}, inplace=True)

df.fillna({'Studio': 'N/A'}, inplace=True)

df.fillna({'staff': 'N/A'}, inplace=True)

df.fillna({'Voice_actors': 'N/A'}, inplace=True)

# No rating available: fill with the number 0
df.fillna({'Rating': 0}, inplace=True)

df.fillna({'Tags': 'N/A'}, inplace=True)

df.fillna({'Release_year': 0}, inplace=True)

df


Unnamed: 0,Rank,Name,Japanese_name,Type,Episodes,Studio,Release_season,Tags,Rating,Release_year,End_year,Description,Content_Warning,Related_Mange,Related_anime,Voice_actors,staff
0,1,Demon Slayer: Kimetsu no Yaiba - Entertainment...,Kimetsu no Yaiba: Yuukaku-hen,TV,0,ufotable,Fall,"Action, Adventure, Fantasy, Shounen, Demons, H...",4.60,2021.00,0,'Tanjiro and his friends accompany the Hashira...,Explicit Violence,Demon Slayer: Kimetsu no Yaiba,"Demon Slayer: Kimetsu no Yaiba, Demon Slayer: ...","Inosuke Hashibira : Yoshitsugu Matsuoka, Nezuk...","Koyoharu Gotouge : Original Creator, Haruo Sot..."
1,2,Fruits Basket the Final Season,Fruits Basket the Final,TV,13.00,TMS Entertainment,Spring,"Drama, Fantasy, Romance, Shoujo, Animal Transf...",4.60,2021.00,0,'The final arc of Fruits Basket.',"Emotional Abuse,, Mature Themes,, Physical Abu...","Fruits Basket, Fruits Basket Another","Fruits Basket 1st Season, Fruits Basket 2nd Se...","Akito Sohma : Maaya Sakamoto, Kyo Sohma : Yuum...","Natsuki Takaya : Original Creator, Yoshihide I..."
2,3,Mo Dao Zu Shi 3,The Founder of Diabolism 3,Web,12.00,B.C MAY PICTURES,,"Fantasy, Ancient China, Chinese Animation, Cul...",4.58,2021.00,0,'The third season of Mo Dao Zu Shi.',,Grandmaster of Demonic Cultivation: Mo Dao Zu ...,"Mo Dao Zu Shi 2, Mo Dao Zu Shi Q","Lan Wangji, Wei Wuxian, Jiang Cheng, Jin Guang...","Mo Xiang Tong Xiu : Original Creator, Xiong Ke..."
3,4,Fullmetal Alchemist: Brotherhood,Hagane no Renkinjutsushi: Full Metal Alchemist,TV,64.00,Bones,Spring,"Action, Adventure, Drama, Fantasy, Mystery, Sh...",4.58,2009.00,2010.00,"""The foundation of alchemy is based on the law...","Animal Abuse,, Mature Themes,, Violence,, Dome...","Fullmetal Alchemist, Fullmetal Alchemist (Ligh...","Fullmetal Alchemist: Brotherhood Specials, Ful...","Alphonse Elric : Rie Kugimiya, Edward Elric : ...","Hiromu Arakawa : Original Creator, Yasuhiro Ir..."
4,5,Attack on Titan 3rd Season: Part II,Shingeki no Kyojin Season 3: Part II,TV,10.00,WIT Studio,Spring,"Action, Fantasy, Horror, Shounen, Dark Fantasy...",4.57,2019.00,0,'The battle to retake Wall Maria begins now! W...,"Cannibalism,, Explicit Violence","Attack on Titan, Attack on Titan: End of the W...","Attack on Titan, Attack on Titan 2nd Season, A...","Armin Arlelt : Marina Inoue, Eren Jaeger : Yuu...","Hajime Isayama : Original Creator, Tetsurou Ar..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18490,18491,Qin Shi Mingyue: Canghai Hengliu Xiaomeng Spec...,,Web,2.00,Sparkly Key Animation Studio,,"Action, Ancient China, Chinese Animation, Hist...",0,2020.00,0,Special episodes of Qin Shi Mingyue: Canghai H...,,,Qin Shi Mingyue: Canghai Hengliu,,
18491,18492,Yi Tang Juchang: Sanguo Yanyi,,TV,108.00,,,Chinese Animation,0,2010.00,0,No synopsis yet - check back soon!,,,,,
18492,18493,Fenghuang Ji Xiang Yu Qingming Shanghe Tu,,TV,13.00,,,"Chinese Animation, Family Friendly, Short Epis...",0,2020.00,0,No synopsis yet - check back soon!,,,,,
18493,18494,Chengshi Jiyi Wo Men de Jieri,,TV,0,,,"Chinese Animation, Family Friendly, Short Epis...",0,2020.00,0,No synopsis yet - check back soon!,,,,,
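
Filling column by column can also be driven by dtype. The sketch below is one possible approach, not the notebook's method: the rows are made up and the placeholder `"Unknown"` is an assumption. Numeric columns receive the number 0, which keeps them `float64`; everything else receives the text placeholder.

```python
import numpy as np
import pandas as pd

# Made-up rows with gaps in both text and numeric columns.
demo = pd.DataFrame({
    "Studio": ["Bones", None, "ufotable"],
    "Episodes": [64.0, np.nan, np.nan],
    "Rating": [4.58, np.nan, 4.6],
})

numeric_cols = demo.select_dtypes(include="number").columns
text_cols = demo.select_dtypes(exclude="number").columns

filled = demo.copy()
filled[numeric_cols] = filled[numeric_cols].fillna(0)     # numeric 0 keeps float64
filled[text_cols] = filled[text_cols].fillna("Unknown")   # hypothetical placeholder

print(filled.dtypes)
```

A filled 0 is indistinguishable from a genuine zero, so averages over `Rating` or `Episodes` computed after this step will be pulled down by the placeholder values.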


______

## Save the Final Dataset as a CSV File

In [17]:
df.to_csv('./Anime_prepped.csv')

### Check if it loads correctly

In [20]:
df_check = pd.read_csv('./Anime_prepped.csv', index_col='Unnamed: 0')

In [22]:
df_check.tail()

Unnamed: 0,Rank,Name,Japanese_name,Type,Episodes,Studio,Release_season,Tags,Rating,Release_year,End_year,Description,Content_Warning,Related_Mange,Related_anime,Voice_actors,staff
18490,18491,Qin Shi Mingyue: Canghai Hengliu Xiaomeng Spec...,,Web,2.0,Sparkly Key Animation Studio,,"Action, Ancient China, Chinese Animation, Hist...",0.0,2020.0,0.0,Special episodes of Qin Shi Mingyue: Canghai H...,,,Qin Shi Mingyue: Canghai Hengliu,,
18491,18492,Yi Tang Juchang: Sanguo Yanyi,,TV,108.0,,,Chinese Animation,0.0,2010.0,0.0,No synopsis yet - check back soon!,,,,,
18492,18493,Fenghuang Ji Xiang Yu Qingming Shanghe Tu,,TV,13.0,,,"Chinese Animation, Family Friendly, Short Epis...",0.0,2020.0,0.0,No synopsis yet - check back soon!,,,,,
18493,18494,Chengshi Jiyi Wo Men de Jieri,,TV,0.0,,,"Chinese Animation, Family Friendly, Short Epis...",0.0,2020.0,0.0,No synopsis yet - check back soon!,,,,,
18494,18495,Heisei Inu Monogatari Bow: Genshi Inu Monogata...,,Movie,0.0,Nippon Animation,,"Comedy, Slice of Life, Dogs",0.0,1994.0,0.0,No synopsis yet - check back soon!,,,Heisei Inu Monogatari Bow,,
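
In the reloaded frame above, the text columns that were filled with `'N/A'` appear empty again. A likely cause is that `read_csv` treats the string `N/A` as a missing value by default. The sketch below reproduces this on a made-up two-row frame, using an in-memory buffer as a stand-in for `Anime_prepped.csv`, and shows that `keep_default_na=False` reads the placeholder back literally.

```python
import io

import pandas as pd

# Made-up frame containing the 'N/A' placeholder.
demo = pd.DataFrame({"Name": ["A", "B"], "Studio": ["Bones", "N/A"]})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
demo.to_csv(buffer, index=False)

# Default parsing converts 'N/A' back into a missing value.
buffer.seek(0)
default_read = pd.read_csv(buffer)

# keep_default_na=False disables the built-in list of missing-value markers.
buffer.seek(0)
literal_read = pd.read_csv(buffer, keep_default_na=False)

print(default_read["Studio"].isna().sum(), literal_read["Studio"].tolist())
```

Choosing a placeholder that is not in the default list of missing-value markers would be another way to make the round trip lossless.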
