### Introduction
#### Sentiment analysis is the interpretation and classification of emotions (positive, negative, and neutral) in text data using text analysis techniques. It allows organizations to gauge public sentiment towards particular words or topics.
#### Support Vector Machine (SVM) is a relatively simple supervised machine learning algorithm used for classification and regression. It is used more often for classification, but it can also be useful for regression. In essence, an SVM finds a hyperplane that forms a boundary between the classes of data.
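To make the hyperplane idea concrete, here is a minimal sketch on toy 2-D data. The points, the linear kernel, and `C=1.0` are assumptions chosen only for illustration; they are not the settings used later in this notebook.

```python
import numpy as np
from sklearn import svm

# Toy 2-D data: two well-separated clusters (illustrative values only)
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0],
              [4.0, 4.0], [5.0, 5.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(X, y)

# With a linear kernel the boundary is the hyperplane w . x + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]

# A point is assigned to a class by the sign of w . x + b
new_points = np.array([[0.5, 0.5], [4.5, 4.5]])
scores = clf.decision_function(new_points)
predictions = clf.predict(new_points)
print(w, b, scores, predictions)
```

Points with a negative score fall on the class-0 side of the boundary, and points with a positive score fall on the class-1 side.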

In [1]:
#importing all required libraries
import warnings
warnings.filterwarnings('ignore')
import random
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.feature_extraction.text import CountVectorizer

#### I used an SVM model and achieved an accuracy of about 87% on the held-out test set. This is an excellent project for beginners.
#### To reduce the load on the system, I performed systematic sampling on the dataset.

## Accessing Dataset

In [2]:
reviews=pd.read_csv('/kaggle/input/amazon-customerreviews-polarity/train.csv',names=['ratings','title','review'])
reviews

Unnamed: 0,ratings,title,review
0,2,Stuning even for the non-gamer,This sound track was beautiful! It paints the ...
1,2,The best soundtrack ever to anything.,I'm reading a lot of reviews saying that this ...
2,2,Amazing!,This soundtrack is my favorite music of all ti...
3,2,Excellent Soundtrack,I truly like this soundtrack and I enjoy video...
4,2,"Remember, Pull Your Jaw Off The Floor After He...","If you've played the game, you know how divine..."
...,...,...,...
3599995,1,Don't do it!!,The high chair looks great when it first comes...
3599996,1,"Looks nice, low functionality",I have used this highchair for 2 kids now and ...
3599997,1,"compact, but hard to clean","We have a small house, and really wanted two o..."
3599998,1,what is it saying?,not sure what this book is supposed to be. It ...
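The cell above loads every row before any sampling takes place. If memory is a concern, one possible alternative is to skip rows while reading. The sketch below assumes an in-memory CSV built with `io.StringIO` and a hypothetical `step` of 3; it is not what this notebook does.

```python
import io
import pandas as pd

# Small in-memory stand-in for the real CSV file (no header row, three columns)
csv_text = '\n'.join(f'{1 + i % 2},title {i},review {i}' for i in range(10))

step = 3  # hypothetical sampling interval, chosen for illustration

# A callable passed to skiprows receives each row index and returns True to skip it
sampled = pd.read_csv(
    io.StringIO(csv_text),
    names=['ratings', 'title', 'review'],
    skiprows=lambda i: i % step != 0,
)
print(sampled)
```

Only rows 0, 3, 6, and 9 of the input are parsed, so the full table never has to be held in memory.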


## Assessing the data

In [3]:
reviews.groupby('ratings').count()

ratings,title,review
1,1799958,1800000
2,1799965,1800000


### Performing Systematic Sampling on the dataset

In [4]:
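def systematic_sampling(df, step):
    """Return every `step`-th row of `df`, starting from the first row."""
    # Positions 0, step, 2*step, ... up to the end of the frame
    indexes = np.arange(0, len(df), step=step)
    systematic_sample = df.iloc[indexes]
    return systematic_sample

# Keep 1 row in every 100 (3,600,000 rows -> 36,000 rows)
reviews_subset=systematic_sampling(reviews,100)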
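A minimal sketch of how this sampling behaves on a toy frame. The ten-row DataFrame and the step values are assumptions for illustration only. It also shows a known caveat: if the data has a periodic pattern that lines up with the step, the sample can become one-sided.

```python
import numpy as np
import pandas as pd

def systematic_sampling(df, step):
    # Same logic as the notebook's function: take positions 0, step, 2*step, ...
    indexes = np.arange(0, len(df), step=step)
    return df.iloc[indexes]

# Toy data with strictly alternating ratings (1, 2, 1, 2, ...)
toy = pd.DataFrame({
    'ratings': [1, 2] * 5,
    'review': [f'review {i}' for i in range(10)],
})

sample = systematic_sampling(toy, 3)
sampled_positions = sample.index.tolist()

# The same rows can be selected with a plain slice
same_as_slice = sample.equals(toy.iloc[::3])

# With step=2 the sample lines up with the alternation and keeps only one class
one_sided = systematic_sampling(toy, 2)['ratings'].unique().tolist()
print(sampled_positions, same_as_slice, one_sided)
```

Checking the class counts after sampling, as the next cell does, guards against this kind of imbalance.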

In [5]:
reviews_subset.groupby('ratings').count()

ratings,title,review
1,17782,17782
2,18217,18218


## Resetting Index

In [6]:
reviews_subset=reviews_subset.reset_index(drop=True)

In [7]:
reviews_subset

Unnamed: 0,ratings,title,review
0,2,Stuning even for the non-gamer,This sound track was beautiful! It paints the ...
1,2,textbook,Book shipped quickly and was in excellent cond...
2,1,Worthless and cheap,How is this thing awful? let me count the ways...
3,1,super wack,just like No-Limit Cash Money has no shame at ...
4,1,"Starts out with promise, but then goes downhill.",Too bad Adam Sandler could not have been given...
...,...,...,...
35995,2,Vampire Diaries Season 1,What should I say? This item is just the best ...
35996,1,Day one - two sets of batteries and train has ...,We received theThomas And Friends Wooden Railw...
35997,2,DTW- A must read for fantasy football diehards...,"If you're reading this review, then the book i..."
35998,2,unique pianist,"The ""Andante Spianato"" is the ultimate one! I ..."


## Creating a new sentiment column containing POSITIVE and NEGATIVE labels

In [8]:
# A rating of 2 marks a positive review; a rating of 1 marks a negative one
reviews_subset['sentiment']=np.where(reviews_subset['ratings']==2,'POSITIVE','NEGATIVE')

In [9]:
reviews_subset

Unnamed: 0,ratings,title,review,sentiment
0,2,Stuning even for the non-gamer,This sound track was beautiful! It paints the ...,POSITIVE
1,2,textbook,Book shipped quickly and was in excellent cond...,POSITIVE
2,1,Worthless and cheap,How is this thing awful? let me count the ways...,NEGATIVE
3,1,super wack,just like No-Limit Cash Money has no shame at ...,NEGATIVE
4,1,"Starts out with promise, but then goes downhill.",Too bad Adam Sandler could not have been given...,NEGATIVE
...,...,...,...,...
35995,2,Vampire Diaries Season 1,What should I say? This item is just the best ...,POSITIVE
35996,1,Day one - two sets of batteries and train has ...,We received theThomas And Friends Wooden Railw...,NEGATIVE
35997,2,DTW- A must read for fantasy football diehards...,"If you're reading this review, then the book i...",POSITIVE
35998,2,unique pianist,"The ""Andante Spianato"" is the ultimate one! I ...",POSITIVE


## Combining the title and review columns into a single review column

In [10]:
reviews_subset['review']=reviews_subset['title']+reviews_subset['review']

In [11]:
reviews_subset.head()

Unnamed: 0,ratings,title,review,sentiment
0,2,Stuning even for the non-gamer,Stuning even for the non-gamerThis sound track...,POSITIVE
1,2,textbook,textbookBook shipped quickly and was in excell...,POSITIVE
2,1,Worthless and cheap,Worthless and cheapHow is this thing awful? le...,NEGATIVE
3,1,super wack,super wackjust like No-Limit Cash Money has no...,NEGATIVE
4,1,"Starts out with promise, but then goes downhill.","Starts out with promise, but then goes downhil...",NEGATIVE
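Note that `+` joins the two columns with no separator, so the last word of the title fuses with the first word of the review (for example `non-gamerThis` in the output above), and a missing title turns the combined value into NaN. The sketch below shows one possible alternative on toy data; the DataFrame contents and the choice of a single space as separator are assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy frame with one missing title (illustrative values only)
toy = pd.DataFrame({
    'title': ['Great sound', None],
    'review': ['Loved every track.', 'Broke after a day.'],
})

# Plain "+" fuses the words together and propagates the missing title as NaN
joined_plain = toy['title'] + toy['review']

# Filling missing titles and adding a space keeps word boundaries and keeps every row
joined_spaced = (toy['title'].fillna('') + ' ' + toy['review']).str.strip()
print(joined_plain.tolist())
print(joined_spaced.tolist())
```

Keeping the word boundary matters for a bag-of-words model, because a fused token such as `soundLoved` would be counted as a separate vocabulary entry.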


## Dropping old title column

In [12]:
reviews_subset=reviews_subset.drop('title',axis='columns')

## Checking data for NaN values 

In [13]:
reviews_subset.isna().sum()

ratings      0
review       1
sentiment    0
dtype: int64

In [14]:
reviews_subset=reviews_subset.dropna()

## Splitting the dataset into train and test sets using train_test_split. This makes it possible to evaluate the model on data it has not seen during training

In [15]:
x_train,x_test,y_train,y_test=train_test_split(reviews_subset['review'],reviews_subset['sentiment'],test_size=0.25)
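The split above is random, so the exact score can vary from run to run. A minimal sketch of a reproducible, class-balanced variant follows. The toy data, the seed value 42, and the use of `stratify` are assumptions for illustration; the notebook itself uses neither option.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data: 20 reviews with balanced labels (illustrative only)
toy = pd.DataFrame({
    'review': [f'review {i}' for i in range(20)],
    'sentiment': ['POSITIVE', 'NEGATIVE'] * 10,
})

x_tr, x_te, y_tr, y_te = train_test_split(
    toy['review'], toy['sentiment'],
    test_size=0.25,
    random_state=42,            # arbitrary seed so the split is repeatable
    stratify=toy['sentiment'],  # keep the class ratio similar in both splits
)
print(len(x_tr), len(x_te))
print(y_te.value_counts())
```

With `test_size=0.25`, a quarter of the rows (5 of 20 here) are held out for testing.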

In [16]:
cv=CountVectorizer()
x_train_cv=cv.fit_transform(x_train.values)
x_test_cv=cv.transform(x_test.values)
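`CountVectorizer` turns each review into a vector of word counts. The vocabulary is learned from the training text only, which is why the test text goes through `transform` rather than `fit_transform`. A small sketch with made-up documents (the document strings and the name `cv_demo` are illustrative assumptions):

```python
from sklearn.feature_extraction.text import CountVectorizer

train_docs = ['good good product', 'bad product']
test_docs = ['good unseen word']

cv_demo = CountVectorizer()
train_counts = cv_demo.fit_transform(train_docs)  # learns the vocabulary, then counts
test_counts = cv_demo.transform(test_docs)        # reuses the learned vocabulary

# Columns follow the alphabetically sorted vocabulary
vocab = sorted(cv_demo.vocabulary_)
train_dense = train_counts.toarray().tolist()
test_dense = test_counts.toarray().tolist()
print(vocab)
print(train_dense)
print(test_dense)
```

Words that never appeared in the training text, such as `unseen` and `word` here, are simply ignored at test time.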

## Fitting the Model

In [17]:
models=svm.SVC(C=1000,kernel='sigmoid',gamma='auto')
models.fit(x_train_cv,y_train)

SVC(C=1000, gamma='auto', kernel='sigmoid')

In [18]:
models.score(x_test_cv,y_test)

0.8748888888888889

In [19]:
y_pred=models.predict(x_test_cv)
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

    NEGATIVE       0.88      0.87      0.87      4460
    POSITIVE       0.87      0.88      0.88      4540

    accuracy                           0.87      9000
   macro avg       0.87      0.87      0.87      9000
weighted avg       0.87      0.87      0.87      9000
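The precision and recall columns in the report can be derived from a confusion matrix. The sketch below works through that arithmetic on six made-up labels; the label lists are assumptions for illustration and are unrelated to the notebook's actual predictions.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = ['POSITIVE', 'POSITIVE', 'POSITIVE', 'NEGATIVE', 'NEGATIVE', 'NEGATIVE']
y_hat = ['POSITIVE', 'POSITIVE', 'NEGATIVE', 'NEGATIVE', 'NEGATIVE', 'POSITIVE']

# Rows are true labels, columns are predicted labels, in the order given by `labels`
tn, fp, fn, tp = confusion_matrix(
    y_true, y_hat, labels=['NEGATIVE', 'POSITIVE']
).ravel()

# Precision: of everything predicted POSITIVE, how much really was POSITIVE
precision_pos = tp / (tp + fp)
# Recall: of everything truly POSITIVE, how much was found
recall_pos = tp / (tp + fn)

# Cross-check against scikit-learn's own metric functions
sk_precision = precision_score(y_true, y_hat, pos_label='POSITIVE')
sk_recall = recall_score(y_true, y_hat, pos_label='POSITIVE')
print(precision_pos, recall_pos, sk_precision, sk_recall)
```

The f1-score in the report is the harmonic mean of these two numbers, and support is the number of true examples of each class.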



In [20]:
text=['i am happy','I hate this','This is good','Better luck next time']
_vector = cv.transform(text)
models.predict(_vector)

array(['POSITIVE', 'NEGATIVE', 'POSITIVE', 'NEGATIVE'], dtype=object)
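Predicting on new text requires applying the fitted vectorizer first, as the cell above does. One way to keep the two steps together is a scikit-learn `Pipeline`. The sketch below is a possible arrangement on four made-up training sentences; the training texts and the name `sentiment_pipeline` are assumptions, and only the SVC hyperparameters are copied from this notebook.

```python
from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Tiny made-up training set (illustrative only)
train_texts = [
    'great product, love it',
    'really good value',
    'terrible, waste of money',
    'awful quality, hate it',
]
train_labels = ['POSITIVE', 'POSITIVE', 'NEGATIVE', 'NEGATIVE']

# The pipeline vectorizes raw text and then classifies it in a single call
sentiment_pipeline = make_pipeline(
    CountVectorizer(),
    svm.SVC(C=1000, kernel='sigmoid', gamma='auto'),
)
sentiment_pipeline.fit(train_texts, train_labels)

preds = sentiment_pipeline.predict(['love this great value', 'hate this awful waste'])
print(preds)
```

Because the vectorizer lives inside the pipeline, raw strings can be passed straight to `predict` without a separate `transform` step.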