<h1><center><u>RESTAURANT REVIEWS</u></center></h1>

<h2><center>Libraries Used</center></h2>

In [1]:
library(tm)
library(SnowballC)
library(randomForest)
library(caTools)
library(rpart)
library(e1071)
library(MLmetrics)

"package 'tm' was built under R version 3.6.1"
Loading required package: NLP
randomForest 4.6-14
Type rfNews() to see new features/changes/bug fixes.
"package 'MLmetrics' was built under R version 3.6.1"
Attaching package: 'MLmetrics'

The following object is masked from 'package:base':

    Recall



<h2><center>Importing and Cleaning the Data</center></h2>

In [2]:
#importing the data
data = read.delim("Restaurant_Reviews.tsv",quote = "",stringsAsFactors = FALSE)

In [3]:
head(data)

Review,Liked
Wow... Loved this place.,1
Crust is not good.,0
Not tasty and the texture was just nasty.,0
Stopped by during the late May bank holiday off Rick Steve recommendation and loved it.,1
The selection on the menu was great and so were the prices.,1
Now I am getting angry and I want my damn pho.,0


In [4]:
str(data)

'data.frame':	1000 obs. of  2 variables:
 $ Review: chr  "Wow... Loved this place." "Crust is not good." "Not tasty and the texture was just nasty." "Stopped by during the late May bank holiday off Rick Steve recommendation and loved it." ...
 $ Liked : int  1 0 0 1 1 0 0 0 1 1 ...


In [5]:
# Cleaning the data
corpous = VCorpus(VectorSource(data$Review))
corpous = tm_map(corpous, content_transformer(tolower)) # convert all text to lowercase
corpous = tm_map(corpous, removeNumbers) # remove digits
corpous = tm_map(corpous, removePunctuation) # remove punctuation marks
corpous = tm_map(corpous, removeWords, stopwords()) # remove common English stop words (articles, pronouns, etc.)
corpous = tm_map(corpous, stemDocument) # reduce each word to its stem
corpous = tm_map(corpous, stripWhitespace) # collapse repeated whitespace
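
To make the cleaning steps concrete, here is a minimal base-R sketch that applies the same sequence (lowercase, drop digits, drop punctuation, drop stop words, collapse whitespace) to a single review without the tm package. The helper `clean_review` and the short `toy_stopwords` list are assumptions made for illustration only: the list returned by `stopwords()` is much longer, and stemming is left out because it needs SnowballC.

```r
# Hypothetical helper: mimics the tm pipeline above on ONE character string.
# toy_stopwords is a tiny illustrative list, not the list returned by tm's stopwords().
toy_stopwords <- c("this", "the", "is", "and", "was")

clean_review <- function(x, stop_words = toy_stopwords) {
  x <- tolower(x)                                  # lowercase
  x <- gsub("[[:digit:]]+", "", x)                 # remove numbers
  x <- gsub("[[:punct:]]+", "", x)                 # remove punctuation
  words <- strsplit(x, "[[:space:]]+")[[1]]        # split on whitespace
  words <- words[nzchar(words) & !(words %in% stop_words)]  # drop empties and stop words
  paste(words, collapse = " ")                     # rejoin with single spaces
}

cleaned <- clean_review("Wow... Loved this place.")
print(cleaned)
```

Because the stop-word list here is so short, words such as "not" survive; with a fuller list the result can differ, so the sketch only shows the order of the steps.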

In [6]:
dtm = DocumentTermMatrix(corpous) # build the sparse document-term matrix
dtm = removeSparseTerms(dtm, 0.999) # drop the sparsest terms (sparsity threshold 0.999)
dtm

<<DocumentTermMatrix (documents: 1000, terms: 691)>>
Non-/sparse entries: 4549/686451
Sparsity           : 99%
Maximal term length: 12
Weighting          : term frequency (tf)

<h2>After cleaning the data and removing the sparsest terms (sparsity threshold 0.999), 691 unique terms remain. The resulting matrix is 99% sparse overall.</h2>
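
The threshold passed to removeSparseTerms refers to per-term sparsity, that is, the share of documents in which a term does not occur. The base-R sketch below applies that kind of filter to a toy count matrix. The names `toy_dtm`, `max_sparsity`, and `keep_terms` are made up for this example, and the strict inequality is an assumption about how tm treats terms exactly at the threshold.

```r
# Toy document-term count matrix: 4 documents (rows) x 3 terms (columns).
toy_dtm <- matrix(
  c(1, 0, 0,
    2, 0, 1,
    1, 0, 0,
    0, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("good", "pho", "crust"))
)

# Per-term sparsity: fraction of documents in which the term count is zero.
term_sparsity <- colMeans(toy_dtm == 0)

# Keep only terms whose sparsity is below the threshold (assumed strict inequality).
max_sparsity <- 0.6
keep_terms <- names(term_sparsity)[term_sparsity < max_sparsity]
filtered <- toy_dtm[, keep_terms, drop = FALSE]

print(term_sparsity)
print(keep_terms)
```

With 1000 reviews, a threshold of 0.999 is very permissive: only terms that occur in almost no reviews are dropped.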

In [7]:
dataset = as.data.frame(as.matrix(dtm)) # convert the sparse matrix to a dense data frame
dataset$Liked = data$Liked

In [8]:
head(dataset)

absolut,acknowledg,actual,ago,almost,also,although,alway,amaz,ambianc,...,wow,wrap,wrong,year,yet,youd,your,yummi,zero,Liked
0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [9]:
dim(dataset)

In [10]:
# Encoding the target feature as factor
dataset$Liked = factor(dataset$Liked, levels = c(0, 1))

In [11]:
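# Splitting the dataset into the Training set and Test set
# Set a seed first so the random split is reproducible (the outputs shown below came from an unseeded run)
set.seed(123)
split = sample.split(dataset$Liked, SplitRatio = 0.8)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)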
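
sample.split from caTools is documented to preserve the relative ratio of the labels in both subsets. A rough base-R equivalent, shown here only as a sketch, samples 80% of the row indices within each class. The helper `stratified_flags` and the vector `toy_labels` are hypothetical and are not part of caTools.

```r
# Hypothetical helper: returns TRUE for rows assigned to the training set,
# sampling the requested fraction separately within each class.
stratified_flags <- function(labels, ratio = 0.8) {
  labels <- as.character(labels)
  flags <- rep(FALSE, length(labels))
  for (cls in unique(labels)) {
    idx <- which(labels == cls)
    n_train <- round(ratio * length(idx))
    # Sample positions within idx to avoid sample()'s special case for length-1 vectors.
    flags[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  flags
}

set.seed(123)  # fix the seed so the split is reproducible
toy_labels <- factor(rep(c(0, 1), times = c(50, 50)))
in_train <- stratified_flags(toy_labels)
print(table(label = toy_labels, in_train = in_train))
```

Splitting within each class keeps the training and test sets balanced in the same proportion as the full data, which matters when accuracy is the main metric.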

<h2>Classifier 1: Random Forest</h2>

In [12]:
set.seed(123)
# Fitting classifier to the Training set
classifier1 = randomForest(x = training_set[-692],y = training_set$Liked,ntree = 30)

In [13]:
# Predicting the Test set results
y_pred1 = predict(classifier1, newdata = test_set[-692])

# Making the Confusion Matrix
cm1 = table(test_set[, 692], y_pred1)
cm1

   y_pred1
     0  1
  0 77 23
  1 28 72

In [14]:
Accuracy(y_pred1,test_set[,692])

In [15]:
F1_Score(test_set[,692],y_pred1)
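
The F1 score can also be read directly off the confusion matrix. The sketch below rebuilds the Random Forest matrix from the counts printed above (rows are actual labels, columns are predictions) and computes precision, recall, and F1 for class 0. Treating 0 as the positive class is an assumption: to my understanding F1_Score falls back to the first factor level when `positive` is not given, which is worth checking against the MLmetrics documentation.

```r
# Confusion matrix counts copied from the Random Forest output above.
cm <- matrix(c(77, 23,
               28, 72),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

positive <- "0"  # assumed positive class (first factor level)

true_pos  <- cm[positive, positive]
precision <- true_pos / sum(cm[, positive])  # of reviews predicted 0, share that are truly 0
recall    <- true_pos / sum(cm[positive, ])  # of reviews that are truly 0, share predicted 0
f1        <- 2 * precision * recall / (precision + recall)

print(round(c(precision = precision, recall = recall, f1 = f1), 3))
```

Setting `positive <- "1"` in the sketch gives the F1 score for liked reviews instead, which is often the class of interest.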

<h2>Classifier 2: Decision Tree </h2>

In [16]:
# Fitting classifier to the Training set
classifier2 = rpart(formula = Liked~.,data = training_set) 

In [17]:
# Predicting the Test set results
y_pred2 = predict(classifier2, newdata = test_set[-692],type = "class")

# Making the Confusion Matrix
cm2 = table(test_set[,692],y_pred2)
cm2

   y_pred2
     0  1
  0 87 13
  1 39 61

In [18]:
Accuracy(y_pred2,test_set[,692])

In [19]:
F1_Score(test_set[,692],y_pred2)

<h2>Classifier 3: Naive Bayes</h2>

In [20]:
# Fitting classifier to the Training set
classifier3 = naiveBayes(x = training_set[-692],y = training_set$Liked)

In [21]:
# Predicting the Test set results
y_pred3 = predict(classifier3, newdata = test_set[-692])

In [22]:
# Making the Confusion Matrix
cm3 = table(test_set[, 692], y_pred3)
cm3

   y_pred3
     0  1
  0 12 88
  1 12 88

In [23]:
Accuracy(y_pred3,test_set[,692])

In [24]:
F1_Score(test_set[,692],y_pred3)

<h2>Performance Comparison (Test-Set Accuracy)</h2><br>
<h3><ul><li>Random Forest Classification: 74.5%</li><br>
    <li>Decision Tree Classification: 74%</li><br>
    <li>Naive Bayes Classification: 50%</li></ul></h3>
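
The percentages listed above follow directly from the three confusion matrices. As a quick check, this sketch re-enters the printed counts by hand (rows are actual labels, columns are predictions) and recomputes accuracy for each classifier. The list name `cms` is just a convenience for this example.

```r
# Confusion matrix counts copied from the outputs above (rows: actual, columns: predicted).
cms <- list(
  "Random Forest" = matrix(c(77, 23, 28, 72), nrow = 2, byrow = TRUE),
  "Decision Tree" = matrix(c(87, 13, 39, 61), nrow = 2, byrow = TRUE),
  "Naive Bayes"   = matrix(c(12, 88, 12, 88), nrow = 2, byrow = TRUE)
)

# Accuracy = correctly classified reviews (diagonal) / all test reviews, in percent.
accuracy_pct <- sapply(cms, function(cm) 100 * sum(diag(cm)) / sum(cm))
print(accuracy_pct)
```

The test set holds 100 reviews of each class, so the 50% reached by Naive Bayes is what a classifier that always predicts a single class would also score.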