### Importing Necessary Libraries

In [1]:
import pandas as pd
import numpy as np

### Reading Data

In [2]:
file = 'MAL Anime Reviews 85k.csv'
reviews = pd.read_csv(file)
reviews.head()

Unnamed: 0,Anime Rank,Anime Title,Anime URL,Username,Review Date,Episodes Watched,Review Likes,Overall Rating,Story Rating,Animation Rating,Sound Rating,Character Rating,Enjoyment Rating,Review
0,1,Fullmetal Alchemist: Brotherhood,https://myanimelist.net/anime/5114/Fullmetal_A...,tazillo,"Jan 25, 2010",64 of 64 episodes seen,3464,10,10,9,9,10,10,"First of all, I have seen the original FMA and..."
1,1,Fullmetal Alchemist: Brotherhood,https://myanimelist.net/anime/5114/Fullmetal_A...,Archaeon,"Nov 15, 2010",64 of 64 episodes seen,1311,9,8,9,9,9,9,Adaptations have long been a thorn in the side...
2,1,Fullmetal Alchemist: Brotherhood,https://myanimelist.net/anime/5114/Fullmetal_A...,ChristopherKClaw,"Apr 7, 2015",64 of 64 episodes seen,1113,7,8,8,10,6,7,Fullmetal Alchemist: Brotherhood gets an immen...
3,1,Fullmetal Alchemist: Brotherhood,https://myanimelist.net/anime/5114/Fullmetal_A...,literaturenerd,"Apr 25, 2015",64 of 64 episodes seen,704,7,8,8,8,8,8,Overview:\nFMA Brotherhood is an anime that ne...
4,1,Fullmetal Alchemist: Brotherhood,https://myanimelist.net/anime/5114/Fullmetal_A...,M0nkeyD_Luffy,"Jul 29, 2015",64 of 64 episodes seen,446,5,2,7,7,8,5,Since I couldn't find any legitimate objective...


### Let us see what the overall composition of the dataset is in terms of positive/negative reviews:

In [3]:
score_filter = reviews['Overall Rating'] >= 6
print(len(reviews.loc[score_filter]))
print(len(reviews.loc[~score_filter]))

70455
15498


For simplicity, we classify any review with an Overall Rating of 6 or more as Positive and any review with an Overall Rating of 5 or less as Negative.  
The dataset is imbalanced (70455 positive reviews against 15498 negative ones), so we will downsample the positive reviews to match the number of negative reviews. This gives us 15498 reviews of each type (Positive/Negative) from which to build our training and test sets.
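The balancing arithmetic can be sketched on a toy series. The ratings below are made up for illustration and the variable names are hypothetical; only the threshold of 6 comes from the notebook.

```python
import pandas as pd

# Hypothetical ratings standing in for reviews['Overall Rating']
ratings = pd.Series([10, 9, 7, 6, 5, 3], name='Overall Rating')

is_positive = ratings >= 6
n_positive = int(is_positive.sum())       # reviews rated 6 or more
n_negative = int((~is_positive).sum())    # reviews rated 5 or less

# A balanced set can hold at most as many rows per class as the smaller class has
n_per_class = min(n_positive, n_negative)
print(n_positive, n_negative, n_per_class)
```

Applied to the counts printed above, the same arithmetic gives min(70455, 15498) = 15498 reviews per class, or 30996 reviews in total.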

### Creating a Sentiment column

In [4]:
reviews['Sentiment'] = reviews['Overall Rating'].apply(lambda rating: 'Positive' if rating >= 6 else 'Negative')
reviews.head()

Unnamed: 0,Anime Rank,Anime Title,Anime URL,Username,Review Date,Episodes Watched,Review Likes,Overall Rating,Story Rating,Animation Rating,Sound Rating,Character Rating,Enjoyment Rating,Review,Sentiment
0,1,Fullmetal Alchemist: Brotherhood,https://myanimelist.net/anime/5114/Fullmetal_A...,tazillo,"Jan 25, 2010",64 of 64 episodes seen,3464,10,10,9,9,10,10,"First of all, I have seen the original FMA and...",Positive
1,1,Fullmetal Alchemist: Brotherhood,https://myanimelist.net/anime/5114/Fullmetal_A...,Archaeon,"Nov 15, 2010",64 of 64 episodes seen,1311,9,8,9,9,9,9,Adaptations have long been a thorn in the side...,Positive
2,1,Fullmetal Alchemist: Brotherhood,https://myanimelist.net/anime/5114/Fullmetal_A...,ChristopherKClaw,"Apr 7, 2015",64 of 64 episodes seen,1113,7,8,8,10,6,7,Fullmetal Alchemist: Brotherhood gets an immen...,Positive
3,1,Fullmetal Alchemist: Brotherhood,https://myanimelist.net/anime/5114/Fullmetal_A...,literaturenerd,"Apr 25, 2015",64 of 64 episodes seen,704,7,8,8,8,8,8,Overview:\nFMA Brotherhood is an anime that ne...,Positive
4,1,Fullmetal Alchemist: Brotherhood,https://myanimelist.net/anime/5114/Fullmetal_A...,M0nkeyD_Luffy,"Jul 29, 2015",64 of 64 episodes seen,446,5,2,7,7,8,5,Since I couldn't find any legitimate objective...,Negative


What we have done is assign a sentiment to each review based on the Overall Rating column. This column will serve as the label our model learns to predict once we split the data into a training set and a test set.
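As an aside, the same labelling rule can be written without a per-row `lambda`. The snippet below is a small sketch on a toy dataframe (the values are made up) using `np.where`, which evaluates the condition for the whole column at once.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the reviews dataframe (hypothetical values)
toy = pd.DataFrame({'Overall Rating': [10, 6, 5, 2]})

# Same rule as above: 6 or more is Positive, 5 or less is Negative
toy['Sentiment'] = np.where(toy['Overall Rating'] >= 6, 'Positive', 'Negative')
print(toy)
```

Both approaches should produce the same column; the vectorized form mainly pays off on larger dataframes.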

### Creating a training set and test set from our dataframe

Let's use our score filter from earlier to get the positive and negative reviews and combine them into one, with the positives 'stacked on top' of the negatives.


In [5]:
positive = reviews.loc[score_filter]
negative = reviews.loc[~score_filter]
positive = positive[:len(negative)]
joined_reviews = pd.concat([positive, negative], axis=0)
joined_reviews

Unnamed: 0,Anime Rank,Anime Title,Anime URL,Username,Review Date,Episodes Watched,Review Likes,Overall Rating,Story Rating,Animation Rating,Sound Rating,Character Rating,Enjoyment Rating,Review,Sentiment
0,1,Fullmetal Alchemist: Brotherhood,https://myanimelist.net/anime/5114/Fullmetal_A...,tazillo,"Jan 25, 2010",64 of 64 episodes seen,3464,10,10,9,9,10,10,"First of all, I have seen the original FMA and...",Positive
1,1,Fullmetal Alchemist: Brotherhood,https://myanimelist.net/anime/5114/Fullmetal_A...,Archaeon,"Nov 15, 2010",64 of 64 episodes seen,1311,9,8,9,9,9,9,Adaptations have long been a thorn in the side...,Positive
2,1,Fullmetal Alchemist: Brotherhood,https://myanimelist.net/anime/5114/Fullmetal_A...,ChristopherKClaw,"Apr 7, 2015",64 of 64 episodes seen,1113,7,8,8,10,6,7,Fullmetal Alchemist: Brotherhood gets an immen...,Positive
3,1,Fullmetal Alchemist: Brotherhood,https://myanimelist.net/anime/5114/Fullmetal_A...,literaturenerd,"Apr 25, 2015",64 of 64 episodes seen,704,7,8,8,8,8,8,Overview:\nFMA Brotherhood is an anime that ne...,Positive
5,1,Fullmetal Alchemist: Brotherhood,https://myanimelist.net/anime/5114/Fullmetal_A...,ryuu_zer0,"Mar 2, 2010",64 of 64 episodes seen,241,9,10,9,8,10,10,"Now, this is a prime example of how to adapt a...",Positive
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
85876,99,Mo Dao Zu Shi,https://myanimelist.net/anime/37208/Mo_Dao_Zu_...,iluvlynx,"Oct 15, 2020",11 of 15 episodes seen,1,4,4,9,0,5,3,This review is spoiler-free.\nI read the novel...,Negative
85878,99,Mo Dao Zu Shi,https://myanimelist.net/anime/37208/Mo_Dao_Zu_...,susurro_void,"Sep 23, 2020",10 of 15 episodes seen,1,4,4,4,6,4,2,First things first: There are actually 23 epis...,Negative
85891,99,Mo Dao Zu Shi,https://myanimelist.net/anime/37208/Mo_Dao_Zu_...,BorisThePea,"Jun 28, 2020",15 of 15 episodes seen,0,5,5,10,8,5,3,"[Minimal spoiler review, and relatively short ...",Negative
85913,9,3-gatsu no Lion 2nd Season,https://myanimelist.net/anime/35180/3-gatsu_no...,Obama-Sama,"Jul 20, 2020",22 of 22 episodes seen,3,5,3,8,8,5,5,"In short, I got left Blue balled from this sea...",Negative


To summarize the above code cell, we separated the positive and negative reviews into their own dataframes. We then took the first 15498 positive reviews (as many as there are negative reviews) and joined that subset with the negative reviews to form a new, balanced dataframe.
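One thing to keep in mind is that slicing with `positive[:len(negative)]` keeps the positive reviews that come first in the file, so the positive half may over-represent whichever titles appear early. A possible alternative, sketched here on a toy dataframe with made-up values, is to draw the positive subset at random with `DataFrame.sample`. The seed value 42 is an arbitrary choice.

```python
import pandas as pd

# Toy stand-in for the reviews dataframe (hypothetical values)
toy = pd.DataFrame({
    'Anime Rank': [1, 1, 2, 2, 3, 3, 4, 4],
    'Overall Rating': [10, 9, 8, 4, 7, 6, 3, 9],
})
mask = toy['Overall Rating'] >= 6
positive = toy.loc[mask]
negative = toy.loc[~mask]

# Draw as many positives as there are negatives, at random instead of from the top
positive_sample = positive.sample(n=len(negative), random_state=42)
balanced = pd.concat([positive_sample, negative], axis=0)
print(len(positive_sample), len(negative), len(balanced))
```

Whether this changes the final scores is not something the notebook measures; it only removes the dependence on file order.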

In [6]:
shuffled_reviews = joined_reviews.reindex(np.random.permutation(joined_reviews.index))
shuffled_reviews

Unnamed: 0,Anime Rank,Anime Title,Anime URL,Username,Review Date,Episodes Watched,Review Likes,Overall Rating,Story Rating,Animation Rating,Sound Rating,Character Rating,Enjoyment Rating,Review,Sentiment
60793,4167,Cossette no Shouzou,https://myanimelist.net/anime/514/Cossette_no_...,IceAndCream,"Aug 8, 2010",1 of 3 episodes seen,11,5,6,10,8,7,1,This is the. Most. Disturbing. Anime. I. Have....,Negative
6045,119,Koukaku Kidoutai: Stand Alone Complex,https://myanimelist.net/anime/467/Koukaku_Kido...,ktulu007,"Dec 3, 2014",26 of 26 episodes seen,29,10,10,9,10,10,10,I've talked about Ghost in the Shell before. B...,Positive
8409,1287,D-Frag!,https://myanimelist.net/anime/20031/D-Frag/rev...,NullFreaks,"May 1, 2018",12 of 12 episodes seen,4,10,10,9,9,10,10,Short and simple: this anime is a MUST.\nTreme...,Positive
12707,1414,Itazura na Kiss,https://myanimelist.net/anime/3731/Itazura_na_...,Otaku1412,"Oct 1, 2008",25 of 25 episodes seen,17,8,8,7,7,8,8,"Itazura na Kiss, or &quot;Mischievous Kiss&quo...",Positive
15388,1518,Sukitte Ii na yo.,https://myanimelist.net/anime/14289/Sukitte_Ii...,jaccart,"Nov 7, 2015",13 of 13 episodes seen,4,6,7,9,9,6,5,My main problem with this anime is how damn fr...,Positive
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12241,1397,Deca-Dence,https://myanimelist.net/anime/40056/Deca-Dence...,Samawon,"Nov 3, 2020",12 of 12 episodes seen,1,8,9,8,6,8,8,Deca-Dence is the underrated gem of its season...,Positive
4092,1120,Toaru Hikuushi e no Tsuioku,https://myanimelist.net/anime/9000/Toaru_Hikuu...,Piesh00ter,"Oct 11, 2020",1 of 1 episodes seen,0,8,8,9,7,10,7,It was a great movie with fantastic character ...,Positive
16875,1579,Hal,https://myanimelist.net/anime/16528/Hal/review...,Oshiroibaba,"Apr 16, 2017",1 of 1 episodes seen,1,10,10,9,9,10,9,This is a great filler anime if you want to wa...,Positive
68608,49,Kimetsu no Yaiba,https://myanimelist.net/anime/38000/Kimetsu_no...,HanAsparux,"Aug 31, 2019",19 of 26 episodes seen,135,3,7,9,9,2,4,"I don't usually write any reviews for anime, c...",Negative


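Now our review set has been shuffled, and we are ready to split it into a training set and a test set. Because no random seed was set for the shuffle, the row order will differ from run to run.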
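If a repeatable shuffle is wanted, one option is `DataFrame.sample(frac=1, random_state=...)`, which returns every row in a random but reproducible order. This is a sketch on a toy dataframe; the seed value 0 is an arbitrary assumption.

```python
import pandas as pd

# Toy stand-in for joined_reviews (hypothetical values)
toy = pd.DataFrame({'Review': ['a', 'b', 'c', 'd', 'e']},
                   index=[10, 11, 12, 13, 14])

# frac=1 keeps all rows; random_state makes the order repeatable
shuffled = toy.sample(frac=1, random_state=0)
shuffled_again = toy.sample(frac=1, random_state=0)

print(shuffled.index.equals(shuffled_again.index))
```

The original index labels are preserved, just as they are with `reindex` and `np.random.permutation`.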

In [7]:
from sklearn.model_selection import train_test_split
# For future reference: install the package with `pip install scikit-learn` but import it as `sklearn`; `pip install sklearn` is deprecated

In [8]:
X, y = train_test_split(shuffled_reviews, test_size=0.2, train_size=0.8, random_state=42, shuffle=True)
# Note: here X holds the training rows and y holds the testing rows.
# Both are full dataframes, not the usual features (X) / labels (y) pair.
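The split above relies on shuffling to keep the classes roughly even in both parts. If an exact class ratio is preferred, `train_test_split` accepts a `stratify` argument. The sketch below uses a toy dataframe and the hypothetical names `train_df` and `test_df` to avoid confusion with the features/labels meaning that `X` and `y` usually carry.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy balanced dataset (hypothetical values)
toy = pd.DataFrame({
    'Review': [f'review {i}' for i in range(20)],
    'Sentiment': ['Positive'] * 10 + ['Negative'] * 10,
})

# stratify keeps the Positive/Negative ratio the same in both parts
train_df, test_df = train_test_split(
    toy, test_size=0.2, random_state=42, stratify=toy['Sentiment']
)
print(train_df['Sentiment'].value_counts().to_dict())
print(test_df['Sentiment'].value_counts().to_dict())
```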

In [9]:
X
# training data

Unnamed: 0,Anime Rank,Anime Title,Anime URL,Username,Review Date,Episodes Watched,Review Likes,Overall Rating,Story Rating,Animation Rating,Sound Rating,Character Rating,Enjoyment Rating,Review,Sentiment
4426,1131,Gakuen Alice,https://myanimelist.net/anime/74/Gakuen_Alice/...,tsukiichii,"Jun 3, 2012",26 of 26 episodes seen,2,7,7,6,8,8,7,Okay so this is my first anime review so sorry...,Positive
11554,1381,Tonari no Kaibutsu-kun,https://myanimelist.net/anime/14227/Tonari_no_...,xFangero,"Aug 3, 2015",10 of 13 episodes seen,11,8,9,9,9,7,1,it was a hella great anime until ((spoiler?)) ...,Positive
3500,1101,Dungeon ni Deai wo Motomeru no wa Machigatteir...,https://myanimelist.net/anime/28121/Dungeon_ni...,Cauthan,"Jun 27, 2015",13 of 13 episodes seen,11,6,6,7,7,6,6,This is a spoiler-free review adapted for this...,Positive
52164,3548,Isekai Maou to Shoukan Shoujo no Dorei Majutsu,https://myanimelist.net/anime/37210/Isekai_Mao...,FCalavera,"Jul 26, 2018",4 of 12 episodes seen,30,4,3,5,5,5,6,"Well this is trash, I think we can all agree o...",Negative
41601,287,Fruits Basket 1st Season,https://myanimelist.net/anime/38680/Fruits_Bas...,RTPDJRT1,"Aug 12, 2019",18 of 25 episodes seen,6,4,4,9,2,1,3,Pros:\nBeautiful Art\nCons:\nHorrible soundtra...,Negative
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
24406,1994,B: The Beginning,https://myanimelist.net/anime/32827/B__The_Beg...,Nid_Vicious,"Oct 3, 2019",12 of 12 episodes seen,1,5,3,9,3,4,4,"Before I start with the actual review, let me ...",Negative
1666,1051,Mousou Dairinin,https://myanimelist.net/anime/323/Mousou_Dairi...,Fallow,"Apr 21, 2008",13 of 13 episodes seen,116,9,7,9,7,9,8,Paranoia Agent\nNo. episodes - 13\nStory -\nIt...,Positive
13776,1456,Boruto: Naruto the Movie,https://myanimelist.net/anime/28755/Boruto__Na...,Izuka-kun,"Mar 4, 2017",1 of 1 episodes seen,1,8,7,9,9,7,8,Any fan of the Naruto series would say they ar...,Positive
42758,296,Toki wo Kakeru Shoujo,https://myanimelist.net/anime/2236/Toki_wo_Kak...,Swiftshooter13,"Oct 24, 2017",1 of 1 episodes seen,12,4,5,5,3,3,4,Although Einstein’s theory of general relativi...,Negative


In [10]:
y
# testing data

Unnamed: 0,Anime Rank,Anime Title,Anime URL,Username,Review Date,Episodes Watched,Review Likes,Overall Rating,Story Rating,Animation Rating,Sound Rating,Character Rating,Enjoyment Rating,Review,Sentiment
3705,1106,Ichigo Mashimaro,https://myanimelist.net/anime/488/Ichigo_Mashi...,Plucky-San,"Nov 3, 2016",12 of 12 episodes seen,4,8,8,9,7,7,8,"Just finished it last night!\nYeah, the overar...",Positive
13305,1436,Ouritsu Uchuugun: Honneamise no Tsubasa,https://myanimelist.net/anime/1034/Ouritsu_Uch...,MrHawky,"Jun 21, 2016",1 of 1 episodes seen,6,7,8,8,4,7,9,"Personally, I would like to fly some more into...",Positive
5515,1171,Ga-Rei: Zero,https://myanimelist.net/anime/4725/Ga-Rei__Zer...,instagramposter,"Aug 16, 2009",12 of 12 episodes seen,3,10,10,10,10,10,10,i personally think this anime has the most pow...,Positive
26291,2076,Fate/stay night,https://myanimelist.net/anime/356/Fate_stay_ni...,DanteMustDie8907,"Mar 28, 2015",24 of 24 episodes seen,33,4,3,2,2,2,4,"Fate/Stay Night - 4/10\nIt looks awfull, art i...",Negative
17020,158,Cowboy Bebop: Tengoku no Tobira,https://myanimelist.net/anime/5/Cowboy_Bebop__...,Lab_Mem_Num001,"Apr 7, 2020",1 of 1 episodes seen,1,8,7,9,10,8,9,Plot: B *Note: it is described as being in bet...,Positive
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15732,1533,Nanbaka 2,https://myanimelist.net/anime/34414/Nanbaka_2/...,orkuncan,"Mar 22, 2017",12 of 12 episodes seen,23,8,7,7,9,7,8,İts literally like 7.5 well i just voted 8 .\n...,Positive
9676,1328,Tsurune: Kazemai Koukou Kyuudoubu,https://myanimelist.net/anime/36653/Tsurune__K...,Foxstens,"Jan 28, 2019",13 of 13 episodes seen,1,8,8,8,9,6,7,I didn't plan on writing a review for this but...,Positive
10989,1363,Grisaia no Kajitsu,https://myanimelist.net/anime/17729/Grisaia_no...,SensouNoKami,"Jun 7, 2015",13 of 13 episodes seen,6,8,8,9,8,9,8,"In writing this review, i have played part of ...",Positive
2234,1068,Runway de Waratte,https://myanimelist.net/anime/40392/Runway_de_...,sweet_psycho,"May 26, 2020",12 of 12 episodes seen,2,8,8,9,9,7,8,Well I gave it a score of: 8\nI really liked a...,Positive


In [11]:
training_sentiments = list(X['Sentiment'])
testing_sentiments = list(y['Sentiment'])

# training_ovr_ratings = list(X['Overall Rating'])
# testing_ovr_ratings = list(y['Overall Rating'])

In [12]:
print(training_sentiments.count('Positive'))
print(training_sentiments.count('Negative'))

12412
12384


That is a fairly even split of positives to negatives. Let's check the testing set.

In [13]:
print(testing_sentiments.count('Positive'))
print(testing_sentiments.count('Negative'))

3086
3114


The testing set also contains a very similar split of positives to negatives.

Now let us collect the actual review texts and put them into their respective lists:

In [14]:
training_reviews = list(X['Review'])
testing_reviews = list(y['Review'])

In [15]:
from sklearn.feature_extraction.text import TfidfVectorizer
# TF-IDF gives less weight to words that occur in many documents (e.g. 'the', 'and', 'of')

In [16]:
vectorizer = TfidfVectorizer()
training_matrices = vectorizer.fit_transform(training_reviews)
testing_matrices = vectorizer.transform(testing_reviews)
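# fit_transform learns the vocabulary and IDF weights from the training reviews only;
# transform then reuses them for the testing reviews, so no test information leaks into training
# now that we have created the vector representations, we can train our models by fitting them on the training data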
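To see what the vectorizer does, here is a small sketch on a made-up three-sentence corpus. It checks that a word appearing in every document ('the') receives a lower IDF weight than a word appearing in only one ('boring'), and that words unseen during fitting are simply ignored at transform time.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini corpus
docs = [
    'the story was great',
    'the story was boring',
    'the animation was great',
]

vec = TfidfVectorizer()
matrix = vec.fit_transform(docs)   # one row per document, one column per word
vocab = vec.vocabulary_            # maps each word to its column index
idf = vec.idf_                     # learned inverse document frequency per column

print(matrix.shape)
print(idf[vocab['the']], idf[vocab['boring']])

# Words not seen during fit ('soundtrack') contribute nothing to the vector
new_matrix = vec.transform(['great soundtrack'])
print(new_matrix.shape)
```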

### Which classification model to use?

In [17]:
from sklearn.metrics import f1_score
# accuracy alone can hide how a model performs on each class, so we use the F1 score
# (the harmonic mean of precision and recall) and report it separately for each class
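As a quick illustration of the per-class F1 score, the sketch below uses six made-up labels and predictions. With `average=None`, `f1_score` returns one value per entry in `labels`, in the order given.

```python
from sklearn.metrics import f1_score

# Hypothetical true labels and predictions
y_true = ['Positive', 'Positive', 'Positive', 'Positive', 'Negative', 'Negative']
y_pred = ['Positive', 'Positive', 'Positive', 'Negative', 'Positive', 'Negative']

# Positive: 3 true positives, 1 false positive, 1 false negative
#   precision = 3/4, recall = 3/4, F1 = 0.75
# Negative: 1 true positive, 1 false positive, 1 false negative
#   precision = 1/2, recall = 1/2, F1 = 0.5
scores = f1_score(y_true, y_pred, average=None, labels=['Positive', 'Negative'])
print(scores)
```

Overall accuracy here is 4 out of 6, which on its own would not reveal that the Negative class is handled noticeably worse than the Positive one.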

### Naive Bayes

In [18]:
from sklearn.naive_bayes import MultinomialNB
nb_model = MultinomialNB()
nb_model.fit(training_matrices, training_sentiments)
f1_score(testing_sentiments, nb_model.predict(testing_matrices), average=None, labels=['Positive', 'Negative'])

array([0.87473391, 0.87843636])

### Logistic Regression

In [19]:
from sklearn.linear_model import LogisticRegression
lreg_model = LogisticRegression()
lreg_model.fit(training_matrices, training_sentiments)
f1_score(testing_sentiments, lreg_model.predict(testing_matrices), average=None, labels=['Positive', 'Negative'])

array([0.88845401, 0.89087428])

As we can see, both models work pretty well: Naive Bayes reaches an F1 score of about 0.87 to 0.88 on each class, and Logistic Regression reaches about 0.89. This is not bad for a basic classifier!
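To score new review text with a trained model, the vectorizer and the classifier can be bundled together. The following is a minimal sketch using scikit-learn's `make_pipeline` on a tiny made-up training set; the example sentences and the `max_iter=1000` setting are assumptions for illustration, not part of the notebook above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data
train_texts = [
    'great story and characters',
    'loved the animation',
    'boring and slow',
    'terrible pacing, awful ending',
]
train_labels = ['Positive', 'Positive', 'Negative', 'Negative']

# The pipeline vectorizes raw text and then classifies it in one call
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_texts, train_labels)

pred = clf.predict(['great animation', 'awful and boring'])
print(list(pred))
```

With a pipeline, the same fitted vocabulary is applied automatically at prediction time, so there is no separate `transform` step to remember.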