# Import and Load Data

In [1]:
import re
import string
import pandas as pd
from string import punctuation

from nltk.corpus import stopwords
from nltk import word_tokenize
from nltk.stem import WordNetLemmatizer

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [2]:
df = pd.read_csv("IMDB Dataset.csv")

In [3]:
pd.set_option("display.max_colwidth", None)  # None = never truncate the review text

In [4]:
df.head()

Unnamed: 0,review,sentiment
0,"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.<br /><br />The first thing that struck me about Oz was its brutality and unflinching scenes of violence, which set in right from the word GO. Trust me, this is not a show for the faint hearted or timid. This show pulls no punches with regards to drugs, sex or violence. Its is hardcore, in the classic use of the word.<br /><br />It is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary. It focuses mainly on Emerald City, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. Em City is home to many..Aryans, Muslims, gangstas, Latinos, Christians, Italians, Irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.<br /><br />I would say the main appeal of the show is due to the fact that it goes where other shows wouldn't dare. Forget pretty pictures painted for mainstream audiences, forget charm, forget romance...OZ doesn't mess around. The first episode I ever saw struck me as so nasty it was surreal, I couldn't say I was ready for it, but as I watched more, I developed a taste for Oz, and got accustomed to the high levels of graphic violence. Not just violence, but injustice (crooked guards who'll be sold out for a nickel, inmates who'll kill on order and get away with it, well mannered, middle class inmates being turned into prison bitches due to their lack of street skills or prison experience) Watching Oz, you may become comfortable with what is uncomfortable viewing....thats if you can get in touch with your darker side.",positive
1,"A wonderful little production. <br /><br />The filming technique is very unassuming- very old-time-BBC fashion and gives a comforting, and sometimes discomforting, sense of realism to the entire piece. <br /><br />The actors are extremely well chosen- Michael Sheen not only ""has got all the polari"" but he has all the voices down pat too! You can truly see the seamless editing guided by the references to Williams' diary entries, not only is it well worth the watching but it is a terrificly written and performed piece. A masterful production about one of the great master's of comedy and his life. <br /><br />The realism really comes home with the little things: the fantasy of the guard which, rather than use the traditional 'dream' techniques remains solid then disappears. It plays on our knowledge and our senses, particularly with the scenes concerning Orton and Halliwell and the sets (particularly of their flat with Halliwell's murals decorating every surface) are terribly well done.",positive
2,"I thought this was a wonderful way to spend time on a too hot summer weekend, sitting in the air conditioned theater and watching a light-hearted comedy. The plot is simplistic, but the dialogue is witty and the characters are likable (even the well bread suspected serial killer). While some may be disappointed when they realize this is not Match Point 2: Risk Addiction, I thought it was proof that Woody Allen is still fully in control of the style many of us have grown to love.<br /><br />This was the most I'd laughed at one of Woody's comedies in years (dare I say a decade?). While I've never been impressed with Scarlet Johanson, in this she managed to tone down her ""sexy"" image and jumped right into a average, but spirited young woman.<br /><br />This may not be the crown jewel of his career, but it was wittier than ""Devil Wears Prada"" and more interesting than ""Superman"" a great comedy to go see with friends.",positive
3,"Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fighting all the time.<br /><br />This movie is slower than a soap opera... and suddenly, Jake decides to become Rambo and kill the zombie.<br /><br />OK, first of all when you're going to make a film you must Decide if its a thriller or a drama! As a drama the movie is watchable. Parents are divorcing & arguing like in real life. And then we have Jake with his closet which totally ruins all the film! I expected to see a BOOGEYMAN similar movie, and instead i watched a drama with some meaningless thriller spots.<br /><br />3 out of 10 just for the well playing parents & descent dialogs. As for the shots with Jake: just ignore them.",negative
4,"Petter Mattei's ""Love in the Time of Money"" is a visually stunning film to watch. Mr. Mattei offers us a vivid portrait about human relations. This is a movie that seems to be telling us what money, power and success do to people in the different situations we encounter. <br /><br />This being a variation on the Arthur Schnitzler's play about the same theme, the director transfers the action to the present time New York where all these different characters meet and connect. Each one is connected in one way, or another to the next person, but no one seems to know the previous point of contact. Stylishly, the film has a sophisticated luxurious look. We are taken to see how these people live and the world they live in their own habitat.<br /><br />The only thing one gets out of all these souls in the picture is the different stages of loneliness each one inhabits. A big city is not exactly the best place in which human relations find sincere fulfillment, as one discerns is the case with most of the people we encounter.<br /><br />The acting is good under Mr. Mattei's direction. Steve Buscemi, Rosario Dawson, Carol Kane, Michael Imperioli, Adrian Grenier, and the rest of the talented cast, make these characters come alive.<br /><br />We wish Mr. Mattei good luck and await anxiously for his next work.",positive


# Preprocessing

In [5]:
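# Preprocessing Function

lemmatizer = WordNetLemmatizer()  # create once instead of twice per token

def preprocessing(sentence):
    # Drop HTML line breaks and any remaining tags. Replacing them with ""
    # (not " ") glues neighbouring words: "me.<br /><br />The" ends up as "methe".
    sentence = sentence.replace("<br /><br />", "")
    sentence = re.sub(r'<[^<]+?>', '', sentence)
    sentence = sentence.strip()
    sentence = sentence.lower()
    # Remove digits
    sentence = ''.join(c for c in sentence if not c.isdigit())

    # Remove punctuation (apostrophes too, so "you'll" -> "youll")
    for pun in string.punctuation:
        sentence = sentence.replace(pun, '')
    tokens = word_tokenize(sentence)

    # Lemmatize each token as a verb, then as a noun. Without POS tagging this
    # also mangles some words, e.g. "as" -> "a" and "us" -> "u".
    lem_word = [lemmatizer.lemmatize(lemmatizer.lemmatize(word, pos='v'), pos='n')
                for word in tokens]
    return ' '.join(lem_word)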

In [6]:
%%time
df['cleaned_reviews'] = df['review'].apply(preprocessing)

CPU times: user 57.4 s, sys: 43 ms, total: 57.5 s
Wall time: 57.7 s


In [7]:
# Encoding target

le = LabelEncoder()
df.sentiment = le.fit_transform(df.sentiment)
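`LabelEncoder` assigns integer codes in sorted order of the class names, so `negative` becomes 0 and `positive` becomes 1. That is how the classes `0` and `1` in the confusion matrices and classification reports below should be read. A small self-contained check on three hypothetical labels:

```python
from sklearn.preprocessing import LabelEncoder

# LabelEncoder sorts the distinct labels; classes_[i] is the string for code i.
le = LabelEncoder()
codes = le.fit_transform(["positive", "negative", "positive"])

print(le.classes_.tolist())                   # ['negative', 'positive']
print(codes.tolist())                         # [1, 0, 1]
print(le.inverse_transform([0, 1]).tolist())  # ['negative', 'positive']
```

`le.inverse_transform` can likewise turn model predictions back into the original strings.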

In [8]:
df.head()

Unnamed: 0,review,sentiment,cleaned_reviews
0,"One of the other reviewers has mentioned that after watching just 1 Oz episode you'll be hooked. They are right, as this is exactly what happened with me.<br /><br />The first thing that struck me about Oz was its brutality and unflinching scenes of violence, which set in right from the word GO. Trust me, this is not a show for the faint hearted or timid. This show pulls no punches with regards to drugs, sex or violence. Its is hardcore, in the classic use of the word.<br /><br />It is called OZ as that is the nickname given to the Oswald Maximum Security State Penitentary. It focuses mainly on Emerald City, an experimental section of the prison where all the cells have glass fronts and face inwards, so privacy is not high on the agenda. Em City is home to many..Aryans, Muslims, gangstas, Latinos, Christians, Italians, Irish and more....so scuffles, death stares, dodgy dealings and shady agreements are never far away.<br /><br />I would say the main appeal of the show is due to the fact that it goes where other shows wouldn't dare. Forget pretty pictures painted for mainstream audiences, forget charm, forget romance...OZ doesn't mess around. The first episode I ever saw struck me as so nasty it was surreal, I couldn't say I was ready for it, but as I watched more, I developed a taste for Oz, and got accustomed to the high levels of graphic violence. Not just violence, but injustice (crooked guards who'll be sold out for a nickel, inmates who'll kill on order and get away with it, well mannered, middle class inmates being turned into prison bitches due to their lack of street skills or prison experience) Watching Oz, you may become comfortable with what is uncomfortable viewing....thats if you can get in touch with your darker side.",1,one of the other reviewer have mention that after watch just oz episode youll be hook they be right a this be exactly what happen with methe first thing that strike me about oz be it brutality and unflinching scene of violence which set in right from the word go trust me this be not a show for the faint hearted or timid this show pull no punch with regard to drug sex or violence it be hardcore in the classic use of the wordit be call oz a that be the nickname give to the oswald maximum security state penitentary it focus mainly on emerald city an experimental section of the prison where all the cell have glass front and face inwards so privacy be not high on the agenda em city be home to manyaryans muslim gangsta latino christian italian irish and moreso scuffle death star dodgy deal and shady agreement be never far awayi would say the main appeal of the show be due to the fact that it go where other show wouldnt dare forget pretty picture paint for mainstream audience forget charm forget romanceoz doesnt mess around the first episode i ever saw strike me a so nasty it be surreal i couldnt say i be ready for it but a i watch more i develop a taste for oz and get accustom to the high level of graphic violence not just violence but injustice crook guard wholl be sell out for a nickel inmate wholl kill on order and get away with it well mannered middle class inmate be turn into prison bitch due to their lack of street skill or prison experience watch oz you may become comfortable with what be uncomfortable viewingthats if you can get in touch with your darker side
1,"A wonderful little production. <br /><br />The filming technique is very unassuming- very old-time-BBC fashion and gives a comforting, and sometimes discomforting, sense of realism to the entire piece. <br /><br />The actors are extremely well chosen- Michael Sheen not only ""has got all the polari"" but he has all the voices down pat too! You can truly see the seamless editing guided by the references to Williams' diary entries, not only is it well worth the watching but it is a terrificly written and performed piece. A masterful production about one of the great master's of comedy and his life. <br /><br />The realism really comes home with the little things: the fantasy of the guard which, rather than use the traditional 'dream' techniques remains solid then disappears. It plays on our knowledge and our senses, particularly with the scenes concerning Orton and Halliwell and the sets (particularly of their flat with Halliwell's murals decorating every surface) are terribly well done.",1,a wonderful little production the film technique be very unassuming very oldtimebbc fashion and give a comfort and sometimes discomforting sense of realism to the entire piece the actor be extremely well choose michael sheen not only have get all the polari but he have all the voice down pat too you can truly see the seamless edit guide by the reference to williams diary entry not only be it well worth the watch but it be a terrificly write and perform piece a masterful production about one of the great master of comedy and his life the realism really come home with the little thing the fantasy of the guard which rather than use the traditional dream technique remain solid then disappear it play on our knowledge and our sense particularly with the scene concern orton and halliwell and the set particularly of their flat with halliwells mural decorate every surface be terribly well do
2,"I thought this was a wonderful way to spend time on a too hot summer weekend, sitting in the air conditioned theater and watching a light-hearted comedy. The plot is simplistic, but the dialogue is witty and the characters are likable (even the well bread suspected serial killer). While some may be disappointed when they realize this is not Match Point 2: Risk Addiction, I thought it was proof that Woody Allen is still fully in control of the style many of us have grown to love.<br /><br />This was the most I'd laughed at one of Woody's comedies in years (dare I say a decade?). While I've never been impressed with Scarlet Johanson, in this she managed to tone down her ""sexy"" image and jumped right into a average, but spirited young woman.<br /><br />This may not be the crown jewel of his career, but it was wittier than ""Devil Wears Prada"" and more interesting than ""Superman"" a great comedy to go see with friends.",1,i think this be a wonderful way to spend time on a too hot summer weekend sit in the air condition theater and watch a lighthearted comedy the plot be simplistic but the dialogue be witty and the character be likable even the well bread suspect serial killer while some may be disappoint when they realize this be not match point risk addiction i think it be proof that woody allen be still fully in control of the style many of u have grow to lovethis be the most id laugh at one of woodys comedy in year dare i say a decade while ive never be impress with scarlet johanson in this she manage to tone down her sexy image and jump right into a average but spirit young womanthis may not be the crown jewel of his career but it be wittier than devil wear prada and more interest than superman a great comedy to go see with friend
3,"Basically there's a family where a little boy (Jake) thinks there's a zombie in his closet & his parents are fighting all the time.<br /><br />This movie is slower than a soap opera... and suddenly, Jake decides to become Rambo and kill the zombie.<br /><br />OK, first of all when you're going to make a film you must Decide if its a thriller or a drama! As a drama the movie is watchable. Parents are divorcing & arguing like in real life. And then we have Jake with his closet which totally ruins all the film! I expected to see a BOOGEYMAN similar movie, and instead i watched a drama with some meaningless thriller spots.<br /><br />3 out of 10 just for the well playing parents & descent dialogs. As for the shots with Jake: just ignore them.",0,basically there a family where a little boy jake think there a zombie in his closet his parent be fight all the timethis movie be slower than a soap opera and suddenly jake decide to become rambo and kill the zombieok first of all when youre go to make a film you must decide if it a thriller or a drama a a drama the movie be watchable parent be divorce argue like in real life and then we have jake with his closet which totally ruin all the film i expect to see a boogeyman similar movie and instead i watch a drama with some meaningless thriller spot out of just for the well play parent descent dialog a for the shot with jake just ignore them
4,"Petter Mattei's ""Love in the Time of Money"" is a visually stunning film to watch. Mr. Mattei offers us a vivid portrait about human relations. This is a movie that seems to be telling us what money, power and success do to people in the different situations we encounter. <br /><br />This being a variation on the Arthur Schnitzler's play about the same theme, the director transfers the action to the present time New York where all these different characters meet and connect. Each one is connected in one way, or another to the next person, but no one seems to know the previous point of contact. Stylishly, the film has a sophisticated luxurious look. We are taken to see how these people live and the world they live in their own habitat.<br /><br />The only thing one gets out of all these souls in the picture is the different stages of loneliness each one inhabits. A big city is not exactly the best place in which human relations find sincere fulfillment, as one discerns is the case with most of the people we encounter.<br /><br />The acting is good under Mr. Mattei's direction. Steve Buscemi, Rosario Dawson, Carol Kane, Michael Imperioli, Adrian Grenier, and the rest of the talented cast, make these characters come alive.<br /><br />We wish Mr. Mattei good luck and await anxiously for his next work.",1,petter matteis love in the time of money be a visually stun film to watch mr mattei offer u a vivid portrait about human relation this be a movie that seem to be tell u what money power and success do to people in the different situation we encounter this be a variation on the arthur schnitzlers play about the same theme the director transfer the action to the present time new york where all these different character meet and connect each one be connect in one way or another to the next person but no one seem to know the previous point of contact stylishly the film have a sophisticate luxurious look we be take to see how these people live and the world they live in their own habitatthe only thing one get out of all these soul in the picture be the different stag of loneliness each one inhabit a big city be not exactly the best place in which human relation find sincere fulfillment a one discern be the case with most of the people we encounterthe act be good under mr matteis direction steve buscemi rosario dawson carol kane michael imperioli adrian grenier and the rest of the talented cast make these character come alivewe wish mr mattei good luck and await anxiously for his next work
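In the cleaned reviews, words on either side of a `<br /><br />` or of punctuation such as `..` are glued together (`methe`, `wordit`, `manyaryans`, `romanceoz`), because `preprocessing` replaces both with an empty string. A minimal sketch of a space-preserving alternative, assuming a hypothetical helper `clean_text` that mirrors only the tag, digit and punctuation steps (no NLTK tokenization or lemmatization):

```python
import re
import string


def clean_text(sentence: str, sep: str) -> str:
    """Hypothetical helper: the tag/digit/punctuation steps of `preprocessing`,
    with `sep` used wherever a tag or punctuation character is removed."""
    sentence = sentence.replace("<br /><br />", sep)
    sentence = re.sub(r"<[^<]+?>", sep, sentence)
    sentence = sentence.strip().lower()
    sentence = "".join(c for c in sentence if not c.isdigit())
    # str.translate maps every punctuation character to `sep` in one pass
    sentence = sentence.translate({ord(p): sep for p in string.punctuation})
    # collapse the runs of spaces that sep=" " can introduce
    return " ".join(sentence.split())


raw = "what happened with me.<br /><br />The first thing, many..Aryans"
print(clean_text(raw, ""))   # what happened with methe first thing manyaryans
print(clean_text(raw, " "))  # what happened with me the first thing many aryans
```

The trade-off is that contractions become two tokens (`you'll` turns into `you ll` rather than `youll`); the model would need retraining to tell whether the change helps accuracy.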


In [9]:
X_train, X_test, y_train, y_test = train_test_split(df["cleaned_reviews"], df["sentiment"], random_state=42, test_size=0.2)
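This split is random rather than stratified; the test set ends up close to balanced (4961 negative vs 5039 positive reviews, per the classification reports below). Passing `stratify` makes both splits reproduce the full dataset's class proportions up to rounding. A toy illustration on hypothetical labels with a 60/40 split:

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical toy data: 6 documents of class 0, 4 of class 1.
docs = [f"doc {i}" for i in range(10)]
labels = [0] * 6 + [1] * 4

_, _, _, y_test_plain = train_test_split(docs, labels, test_size=0.5, random_state=42)
_, _, _, y_test_strat = train_test_split(
    docs, labels, test_size=0.5, random_state=42, stratify=labels
)

print(Counter(y_test_plain))  # proportions depend on the shuffle
print(Counter(y_test_strat))  # Counter({0: 3, 1: 2}), the same 60/40 as the full data
```

In the notebook this would mean adding `stratify=df["sentiment"]` to the `train_test_split` call.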

In [10]:
X_train.shape

(40000,)

In [11]:
y_train.shape

(40000,)

In [12]:
#stop = stopwords.words('english')
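Stop-word removal is left commented out. One reason that may be the safer choice for sentiment: common English stop-word lists contain negations (NLTK's list includes `no`, `nor` and `not`), so filtering them can flip the meaning of a review. The sketch uses scikit-learn's built-in `ENGLISH_STOP_WORDS` instead of the NLTK list so that it runs without downloading NLTK corpora:

```python
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# scikit-learn's built-in English list (a different list from NLTK's,
# but it also contains "not").
review = "it is not good"
kept = [w for w in review.split() if w not in ENGLISH_STOP_WORDS]
print(kept)  # ['good']: the negation is gone
```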

# Logistic Regression

In [13]:
%%time
lgmodel = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1,2))),
    ("lr", LogisticRegression())
])
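The `Pipeline` fits `TfidfVectorizer` on the training text only and, at prediction time, reuses the fitted vocabulary and IDF weights to *transform* new text, so no information from the test set leaks into the features. The sketch below shows that the pipeline is equivalent to doing the two steps by hand, on a hypothetical four-document corpus:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus standing in for X_train / y_train.
train_docs = ["great movie", "loved it", "terrible plot", "not good at all"]
train_labels = [1, 1, 0, 0]
new_docs = ["great plot", "not great"]

# 1) Pipeline version, as in the cell above.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("lr", LogisticRegression()),
])
pipe.fit(train_docs, train_labels)

# 2) Manual version: fit the vectorizer on training text only,
#    then transform (never refit) any new text.
vec = TfidfVectorizer(ngram_range=(1, 2))
clf = LogisticRegression().fit(vec.fit_transform(train_docs), train_labels)
manual_pred = clf.predict(vec.transform(new_docs))

print(np.array_equal(pipe.predict(new_docs), manual_pred))  # True
# 10 unigrams + 6 bigrams learned from the training text
print(len(pipe.named_steps["tfidf"].vocabulary_))           # 16
```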

## model training
lgmodel.fit(X_train, y_train)

CPU times: user 38.7 s, sys: 51.2 s, total: 1min 29s
Wall time: 19.6 s


In [14]:
y_pred = lgmodel.predict(X_test)

In [15]:
accuracy_score(y_test, y_pred)

0.8982

In [16]:
confusion_matrix(y_test, y_pred)

array([[4380,  581],
       [ 437, 4602]])

In [17]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.91      0.88      0.90      4961
           1       0.89      0.91      0.90      5039

    accuracy                           0.90     10000
   macro avg       0.90      0.90      0.90     10000
weighted avg       0.90      0.90      0.90     10000
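Every number in the report can be recomputed from the confusion matrix. In scikit-learn's convention rows are the true class and columns the predicted class, so precision divides the diagonal by column sums and recall divides it by row sums. A hand-check using the logistic-regression matrix printed above:

```python
import numpy as np

# Confusion matrix of the logistic-regression model (rows = true, cols = predicted).
cm = np.array([[4380, 581],
               [437, 4602]])

accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
precision = np.diag(cm) / cm.sum(axis=0)  # per class: correct / predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)     # per class: correct / truly that class
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4))   # 0.8982
print(precision.round(2))   # [0.91 0.89]
print(recall.round(2))      # [0.88 0.91]
print(f1.round(2))          # [0.9 0.9]
print(cm.sum(axis=1))       # support: [4961 5039]
```

The same arithmetic applied to the XGBoost matrix gives its accuracy of (4206 + 4450) / 10000 = 0.8656.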



In [18]:
X_validation = ["I enjoy watching it", "It's not good"]

In [19]:
y_pred = lgmodel.predict(X_validation)

In [20]:
y_pred

array([1, 0])
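`X_validation` is passed to the pipeline as raw text, while the model was trained on the output of `preprocessing`. The vectorizer lowercases and tokenizes on its own but does not lemmatize, so a word like `watching` (which the training text only ever contains as `watch`) yields n-grams that are probably absent from the fitted vocabulary and are silently ignored. The sketch compares the features of both forms; `cleaned` is hand-written to match what `preprocessing` appears to produce, judging by the cleaned reviews shown earlier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Same vectorizer settings as the pipelines in this notebook (unfitted is fine
# for inspecting tokenization).
analyzer = TfidfVectorizer(ngram_range=(1, 2)).build_analyzer()

raw = "I enjoy watching it"     # passed to predict() as-is
cleaned = "i enjoy watch it"    # assumed output of preprocessing(raw)

print(analyzer(raw))      # ['enjoy', 'watching', 'it', 'enjoy watching', 'watching it']
print(analyzer(cleaned))  # ['enjoy', 'watch', 'it', 'enjoy watch', 'watch it']
# n-grams the lemmatized training vocabulary is unlikely to contain
print(sorted(set(analyzer(raw)) - set(analyzer(cleaned))))
```

Calling `lgmodel.predict([preprocessing(s) for s in X_validation])` would keep inference consistent with training; its output may differ from the `array([1, 0])` shown above.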

# XGBoost

In [21]:
xgb = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1,2))),
    ("xgb", XGBClassifier())
])
xgb

In [22]:
xgb.fit(X_train, y_train)

In [23]:
y_pred = xgb.predict(X_test)

In [24]:
print(f"Accuracy score : {accuracy_score(y_test, y_pred)}\n")
print(f"Confusion Matrix :\n\n {confusion_matrix(y_test, y_pred)}\n")
print(f"Classification Report :\n\n{classification_report(y_test, y_pred)}")

Accuracy score : 0.8656

Confusion Matrix :

 [[4206  755]
 [ 589 4450]]

Classification Report :

              precision    recall  f1-score   support

           0       0.88      0.85      0.86      4961
           1       0.85      0.88      0.87      5039

    accuracy                           0.87     10000
   macro avg       0.87      0.87      0.87     10000
weighted avg       0.87      0.87      0.87     10000



In [25]:
X_validation = ["I enjoy watching it", "It's not good"]
y_pred = xgb.predict(X_validation)
y_pred

array([1, 1])

# Support Vector Machine

In [26]:
svc = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1,2))),
    ("svc", SVC())  # step name must match the "svc__" prefix used in the grid below
])

In [27]:
svc

In [None]:
svc.fit(X_train, y_train)

In [None]:
y_pred = svc.predict(X_test)

In [None]:
print(f"Accuracy score : {accuracy_score(y_test, y_pred)}\n")
print(f"Confusion Matrix :\n\n {confusion_matrix(y_test, y_pred)}\n")
print(f"Classification Report :\n\n{classification_report(y_test, y_pred)}")

In [None]:
grid = {
    "svc__kernel": ["linear", "poly", "rbf", "sigmoid"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": [0.01, 0.1, 1],
}

svc_search = GridSearchCV(
    svc,
    grid,
    cv=5,
    n_jobs=-1,
    verbose=1
)
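Grid-search keys take the form `<step name>__<parameter>`, so they only resolve if a pipeline step is actually named `svc`. The grid above crosses `gamma` with every kernel even though `SVC` ignores `gamma` for `kernel="linear"`, and scikit-learn's documentation notes that `SVC` fit time grows at least quadratically with the number of samples (40,000 here), suggesting `LinearSVC` or `SGDClassifier` for large datasets. The sketch below validates the keys and counts the fits for the original grid and for a split grid that only searches `gamma` where it is used; `split_grid` is an illustrative alternative, not part of the original notebook:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("svc", SVC()),  # step name "svc" -> parameter keys "svc__<param>"
])

# Grid as in the cell above: gamma is crossed with all four kernels.
full_grid = {
    "svc__kernel": ["linear", "poly", "rbf", "sigmoid"],
    "svc__C": [0.1, 1, 10],
    "svc__gamma": [0.01, 0.1, 1],
}

# Illustrative alternative: no gamma for the linear kernel.
split_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {
        "svc__kernel": ["poly", "rbf", "sigmoid"],
        "svc__C": [0.1, 1, 10],
        "svc__gamma": [0.01, 0.1, 1],
    },
]

valid_keys = set(pipe.get_params())
n_folds = 5  # cv=5 in the GridSearchCV call above
for name, grid in [("full", full_grid), ("split", split_grid)]:
    dicts = grid if isinstance(grid, list) else [grid]
    if not all(set(d) <= valid_keys for d in dicts):
        raise ValueError(f"{name}: grid contains a key that is not a pipeline parameter")
    n = len(ParameterGrid(grid))
    print(f"{name}: {n} candidates x {n_folds} folds = {n * n_folds} fits")
# full: 36 candidates x 5 folds = 180 fits
# split: 30 candidates x 5 folds = 150 fits
```

With the default `refit=True`, `GridSearchCV` adds one more fit of the best candidate on the whole training set.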