# Scraped Data Analysis

## This notebook analyses data scraped from the [MyAnimeList](https://myanimelist.net/) website.

#### Compile and load our data into a dataframe

In [1]:
import scraper
import os
 
csv_dir = os.path.join(os.getcwd(), 'anime-dataset', 'csv')

data_scraper = scraper.Scraper(csv_dir)

# data_scraper.buildCSV() # Scrape the data and store it in the CSV files

df = data_scraper.compileDF()  # Compile our CSVs into a single dataframe containing the columns we want
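The `scraper` module itself is not shown in this notebook, so the following is only a minimal sketch of what a "compile the CSVs into one dataframe" step could look like. The function name `compile_csvs`, the file names, and the `columns` parameter are hypothetical and are not taken from the real `Scraper` class.

```python
import glob
import os
import tempfile

import pandas as pd


def compile_csvs(csv_dir, columns=None):
    """Hypothetical stand-in for a compile step: concatenate every CSV in csv_dir."""
    paths = sorted(glob.glob(os.path.join(csv_dir, "*.csv")))
    frames = [pd.read_csv(path) for path in paths]
    combined = pd.concat(frames, ignore_index=True)
    # Optionally keep only the columns of interest.
    return combined[columns] if columns is not None else combined


# Tiny demonstration with two throwaway CSV files (made-up data).
with tempfile.TemporaryDirectory() as tmp_dir:
    pd.DataFrame({"name_english": ["A"], "episodes": [12]}).to_csv(
        os.path.join(tmp_dir, "page_1.csv"), index=False)
    pd.DataFrame({"name_english": ["B"], "episodes": [24]}).to_csv(
        os.path.join(tmp_dir, "page_2.csv"), index=False)
    demo_df = compile_csvs(tmp_dir, columns=["name_english", "episodes"])

print(demo_df)
```

Writing the files with `index=False` matters here: a CSV saved with its index and read back with a plain `pd.read_csv` gains an extra `Unnamed: 0` column.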

In [17]:
df

| Unnamed: 0 | name_english | name_japanese | show_type | episodes | producers | licensors | studios | genres | episode_length | rating | description | score_and_scorers | episode_length_bins | episodes_bins |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Fullmetal Alchemist Brotherhood | 鋼の錬金術師 FULLMETAL ALCHEMIST | TV | 64 | Aniplex, Square Enix, Mainichi Broadcasting Sy... | Funimation, Aniplex of America | Bones | Action, Military, Adventure, Comedy, Drama, Ma... | 24 | R | "In order for something to be obtained, someth... | 9.22, 1238537 | <30 | 60 |
| 1 | Steins;Gate | STEINS;GATE | TV | 24 | Frontier Works, Media Factory, Movic, AT-X, Ka... | Funimation | White Fox | Thriller, Sci-Fi | 24 | PG-13 | The self-proclaimed mad scientist Rintarou Oka... | 9.12, 888306 | <30 | <30 |
| 2 | Gintama Season 4 | 銀魂° | TV | 51 | TV Tokyo, Aniplex, Dentsu | Funimation, Crunchyroll | Bandai Namco Pictures | Action, Comedy, Historical, Parody, Samurai, S... | 24 | PG-13 | Gintoki, Shinpachi, and Kagura return as the f... | 9.11, 127501 | <30 | 60 |
| 3 | Hunter x Hunter | HUNTER×HUNTER（ハンター×ハンター） | TV | 148 | VAP, Nippon Television Network, Shueisha | Viz Media | Madhouse | Action, Adventure, Fantasy, Shounen, Super Power | 23 | PG-13 | Hunter x Hunter is set in a world where Hunter... | 9.11, 834570 | <30 | 150 |
| 4 | Legend of the Galactic Heroes | 銀河英雄伝説 | OVA | 110 | Kitty Films, K-Factory | Sentai Filmworks | Artland, Magic Bus | Military, Sci-Fi, Space, Drama | 26 | R | The 150-year-long stalemate between the two in... | 9.10, 48184 | <30 | 120 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 45 | | エイリアン9 | OVA | 4 | Bandai Visual, Genco, MediaNet, AT-X, Nippon C... | Central Park Media | J.C.Staff | Sci-Fi, Horror, Psychological, School | 28 | R | Soon after entering middle school, Yuri Otani ... | 6.67, 11353 | <30 | <30 |
| 46 | Cat Planet Cuties | あそびにいくヨ! | TV | 12 | AIC, Lantis, Media Factory, Pony Canyon, Rakuo... | Funimation | AIC Plus+ | Comedy, Ecchi, Harem, Romance, Sci-Fi | 24 | R+ | Kio is just another boring, nice guy with a bo... | 6.67, 72526 | <30 | <30 |
| 47 | Astarotte's Toy EX | アスタロッテのおもちゃ! EX | OVA | 1 | add some | add some | Diomedea | Comedy, Demons, Ecchi, Fantasy, Romance, Seinen | 21 | R+ | The OVA Astaroette no Omocha is a three part s... | 6.67, 16046 | <30 | <30 |
| 48 | The Idiot, the Tests, and the Summoned Creatur... | バカとテストと召喚獣 問題 クリスマスについて答えなさい | ONA | 1 | add some | Funimation | Silver Link. | Comedy, Romance | 4 | PG-13 | The cast of Baka to Test all get together and ... | 6.67, 18673 | <30 | <30 |
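The `score_and_scorers` column in the output above packs two numbers into one string (for example `9.22, 1238537`). As a rough sketch, assuming every value follows that `"<score>, <scorers>"` pattern, it could be split into two numeric columns. The new column names `score` and `scorers` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Two values copied from the dataframe preview above.
sample = pd.DataFrame({"score_and_scorers": ["9.22, 1238537", "9.12, 888306"]})

# Split on the ", " separator into two temporary columns (0 and 1).
parts = sample["score_and_scorers"].str.split(", ", expand=True)

# Convert each part to a numeric dtype so it can be sorted or aggregated.
sample["score"] = parts[0].astype(float)
sample["scorers"] = parts[1].astype(int)

print(sample)
```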


### Now let's look at all of the unique values from each column of the dataframe

In [3]:
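unique_dic = {}
for column in df.columns[:-1]:  # every column except the last one
    # Map each column name to the array of distinct values it contains
    unique_dic[column] = df[column].unique()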
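The loop above can also be written as a dict comprehension. For columns that hold several comma-separated values per row (such as `genres`), `unique()` returns distinct combinations, not distinct individual entries. The sketch below uses a small made-up frame, not the scraped data, to show one possible way to get the individual values. Unlike the loop above, it does not skip the last column.

```python
import pandas as pd

# Small illustrative frame; the values are made up for the example.
sample = pd.DataFrame({
    "show_type": ["TV", "TV", "OVA"],
    "genres": ["Thriller, Sci-Fi", "Action, Comedy", "Sci-Fi, Drama"],
})

# One array of unique values per column, as in the loop above.
unique_values = {column: sample[column].unique() for column in sample.columns}

# For a multi-valued column: split each string into a list, put every
# list element on its own row with explode(), then take the unique values.
unique_genres = sorted(sample["genres"].str.split(", ").explode().unique())

print(unique_values["show_type"])
print(unique_genres)
```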

In [4]:
unique_dic['show_type']

array(['TV', 'OVA', 'Movie', 'ONA', 'Special', 'Music'], dtype=object)

In [5]:
unique_dic['episodes'] # Done

array([64, 24, 51, 148, 110, 10, 13, 22, 1, 12, 201, 7, 25, 14, 26, 75,
       74, 4, 43, 47, 27, 11, 39, 37, 101, 99, 23, 15, 3, 38, 5, 2, 109,
       44, 6, 16, 52, 48, 50, 72, 127, 9, 8, 331, 49, 31, 62, 130, 77,
       170, NaT, 63, 19, 36, 20, 53, 29, 114, 60, 59, 54, 151, 214, 46,
       58, 78, 40, 145, 21, 69, 108, 30, 35, 95, 164, 87, 28, 89, 45, 94,
       61, 56, 34, 55, 83, 178, 119, 103, 32, 526, 91, 84, 140, 88, 100,
       220, 73, 156, 80, 97, 104, 136, 143, 18, 167, 305, 42, 65, 96, 33,
       113, 200, 726, 300, 17, 139, 1471, 366, 155, 163, 161, 102, 1006,
       1565, 1428, 237, 373, 1818, 358, 175, 365, 120, 112, 195, 1787, 67,
       263, 85, 124, 510, 475, 41, 142, 86, 312, 431, 115, 135, 70, 162,
       132, 398, 125, 215, 147, 744, 68, 180, 283, 71, 154, 260, 199, 243,
       150, 122, 79, 240, 105, 425, 1306, 773, 76, 224, 131, 276, 172,
       128, 93, 92, 330, 137, 191, 203, 291, 193, 66, 192, 153, 500, 258,
       296, 146, 694, 182, 117], dtype=object)

In [6]:
unique_dic['episode_length'] # Done

array([24, 23, 26, 25, 130, 106, 110, 22, 125, 83, 122, 105, 135, 30, 90,
       119, 117, 162, 47, 28, 64, 108, 45, 114, 140, 95, 87, 88, 27, 161,
       48, 18, 112, 42, 111, 5, 99, 89, 14, 93, 31, 60, 2, 3, 43, 6, 15,
       1, 4, 76, 62, 29, 12, 75, 20, 13, 8, 10, 96, 9, 70, 118, 40, 54,
       72, 21, 16, 33, 11, 56, 109, 101, 78, 7, 128, 39, 51, 50, 92, 66,
       121, 132, 79, 38, 150, 85, 65, 44, 97, 74, 57, 100, 59, 46, 37, 91,
       32, 126, 77, 120, 115, 94, 102, 52, 103, 55, 136, 17, 98, 86, 82,
       80, 84, 69, 53, 19, 107, 71, 34, 67, 58, NaT, 63, 49, 35, 41, 73,
       61, 36, 127, 68, 116, 104, 81, 113, 139, 131, 123, 148, 147, 167,
       231, 134, 141, 156, 137, 124, 153, 152, 160, 163], dtype=object)
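Both arrays above have `dtype=object` and contain a `NaT` entry, which suggests that a missing value is mixed in with the integers. Before computing numeric statistics, one option is to count and drop the missing entries and then cast to an integer dtype. This is a minimal sketch on a hand-made series, assuming the non-missing values are all plain integers.

```python
import pandas as pd

# Hand-made stand-in for a column such as 'episodes': integers plus one NaT.
episodes = pd.Series([64, 24, pd.NaT, 148], dtype=object)

# isna() treats NaT as missing, even inside an object-dtype series.
missing_count = int(episodes.isna().sum())

# Drop the missing entries, then cast the remaining values to integers.
valid_episodes = episodes.dropna().astype("int64")

print(missing_count)
print(valid_episodes.describe())
```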

In [7]:
unique_dic['rating']

array(['R', 'PG-13', 'PG', 'R+', 'G', 'None'], dtype=object)
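The `rating` values include the string `'None'`, and the dataframe preview shows `add some` in `producers` and `licensors`, which appears to be placeholder text. As a hedged sketch, assuming both strings really stand for "no data", they could be converted to proper missing values so that pandas counts them as such. The `placeholders` list is an assumption and may need adjusting for the real data.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the values are made up for the example.
sample = pd.DataFrame({
    "rating": ["R", "None", "PG-13"],
    "producers": ["Aniplex", "add some", "add some"],
})

# Strings assumed to mean "no data" (an assumption, not confirmed by the source).
placeholders = ["None", "add some"]

# Replace every placeholder string with NaN across the whole frame.
cleaned = sample.replace(placeholders, np.nan)

# Count the missing values per column.
missing_per_column = cleaned.isna().sum()
print(missing_per_column)
```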