# Exploratory Data Analysis of Google Reviews

In this notebook, you will find an Exploratory Data Analysis (EDA) of the Google Review data. The EDA involves pre-processing.

## Notes

In [1]:
# To detect a language
# for r in google_reviews['Review Text']:
#     r = str(r)
#     if original in r:
#         _, _, _, detected_language = cld2.detect(r, returnVectors=True)
#         print(detected_language)

## Loading Data

In [40]:
# IMPORT REQUIRED PACKAGES
import pandas as pd
from pandas import util
import sklearn
import os
import glob
import re

import nltk
import pycld2 as cld2

In [41]:
# Load venues_ams data
venues_ams = pd.read_csv("venues_ams.csv") # NOT USING?

# Define path and files    
path = "./GoogleReviews"
#all_files = os.path.join(path, "*.csv")
all_files = glob.glob(path + "/*.csv")

# Create dataframe containing all reviews
google_reviews = pd.concat((pd.read_csv(f) for f in all_files))

In [4]:
# Save google_reviews in csv file
google_reviews.to_csv("total_goodgle_reviews.csv")

## Data Cleaning

Removing nan, removing the translated review, changing the types of the data.
Not yeat removing punctuation!

In [42]:
def remove_nan(data, column_name):
    '''Returns only the data where values in column_name are not empty (NaN)'''
    
    data = data[data[column_name].notna()]
    
    return data


def clean_text_lang(data, column_name, sep):
    '''Returns only the (Translated by Google) English review not the (Original) review from column_name.
        Indicated by sep, specifies what separator to separate the review by.
        '''
    
    for row, col in data.iterrows():
        rev = col[4] # col[4] = Review Text
        if sep in rev:
            translation, separator, original = rev.rpartition(sep)
            data[column_name][row] = translation
        else:
            data[column_name][row] = rev
    
    return data

def clean_dtypes(data):
    ''' Changes the dtype of the data columns.'''    
    
    # Changing the column types 
    type_dict = {'Unnamed: 0': object,
                 'Name': 'str',
                 'Review Rate': 'str',
                 'Review Time': 'str', 
                 'Review Text': 'str',
                 }

    # Replacing all \n
    data.replace(r"\n", '', inplace = True)
    data.replace(r"Translated by Google", '', inplace = True)

    # Change data type for all columns
    data = data.astype(type_dict)
    
    return data

In [43]:
%%time
google_reviews = remove_nan(google_reviews, 'Review Text')

CPU times: user 152 ms, sys: 34.2 ms, total: 187 ms
Wall time: 201 ms


In [44]:
%%time
google_reviews = clean_text_lang(google_reviews, 'Review Text', "(Original)")

CPU times: user 6min 35s, sys: 4.21 s, total: 6min 39s
Wall time: 6min 45s


In [45]:
%%time
google_reviews = clean_dtypes(google_reviews)

CPU times: user 137 ms, sys: 31 ms, total: 168 ms
Wall time: 172 ms


In [48]:
# Preview data (original text removed, (Translated by Google) removes, types changed)
google_reviews[:10]

Unnamed: 0.1,Unnamed: 0,Name,Review Rate,Review Time,Review Text
0,0,Ellis,5 stars,3 years ago,"It was a bit quite when we went in, but don’t ..."
1,1,Ellis,5 stars,2 years ago,Nice cozy place which serves very tasty burger...
2,2,Ellis,5 stars,3 years ago,Really nice place. One of my favourite burger ...
3,3,Ellis,2 stars,3 years ago,The Service was quite good but the burgers we ...
4,4,Ellis,5 stars,2 years ago,I had a very nice experience! The staff were r...
5,5,Ellis,5 stars,4 years ago,Ellis Gourmet Burger - Today (15.03.2018) I w...
6,6,Ellis,3 stars,2 years ago,"The taste was okay. Unfortunately, when we got..."
7,7,Ellis,5 stars,3 years ago,The only disappointing thing about this place ...
8,8,Ellis,5 stars,2 years ago,Yesterday in the afternoon we had some burgers...
9,9,Ellis,5 stars,3 years ago,Really cosy. Has an actual fireplace. Great fo...


## Pre-processing

In [49]:
def clean_strings(data, column_name):
    ''' Lower textual data.'''
    #'''Remove punctuation.'''
    for row, col in copy_reviews.iterrows():
        rev = col[4]
        data[column_name][row]=rev.lower()
        #data[column_name][row]=re.sub(r'[^\w\s]', '', rev.lower())
    
    return data

In [32]:
# Make a copy of the reviews to test cleaning
copy_reviews = google_reviews.copy()
#copy_reviews = copy_reviews[0:50]

In [50]:
%%time
google_reviews = clean_strings(google_reviews, 'Review Text')

CPU times: user 6min 59s, sys: 5.39 s, total: 7min 4s
Wall time: 7min 30s


In [51]:
# Preview data (lowercase)
google_reviews[:10]

Unnamed: 0.1,Unnamed: 0,Name,Review Rate,Review Time,Review Text
0,0,Ellis,5 stars,3 years ago,"it was a bit quite when we went in, but don’t ..."
1,1,Ellis,5 stars,2 years ago,nice cozy place which serves very tasty burger...
2,2,Ellis,5 stars,3 years ago,really nice place. one of my favourite burger ...
3,3,Ellis,2 stars,3 years ago,the service was quite good but the burgers we ...
4,4,Ellis,5 stars,2 years ago,i had a very nice experience! the staff were r...
5,5,Ellis,5 stars,4 years ago,ellis gourmet burger - today (15.03.2018) i w...
6,6,Ellis,3 stars,2 years ago,"the taste was okay. unfortunately, when we got..."
7,7,Ellis,5 stars,3 years ago,the only disappointing thing about this place ...
8,8,Ellis,5 stars,2 years ago,yesterday in the afternoon we had some burgers...
9,9,Ellis,5 stars,3 years ago,really cosy. has an actual fireplace. great fo...


In [52]:
google_reviews.to_csv("cleaned_reviews.csv")

In [18]:
# Make a copy of the reviews to test cleaning
copy_reviews = google_reviews.copy()
#copy_reviews = copy_reviews[0:50]

## Trying out stuff

In [90]:
# Take only the reviews that contain Review Text (where Review Text is not NaN)
google_reviews = google_reviews[google_reviews['Review Text'].notna()]

In [91]:
# Take only the (translated) English reviews from the Review Text
for row, col in google_reviews.iterrows():
    rev = col[4] #col 4 = Review Text
    if "(Original)" in rev:
        translation, separator, original = rev.rpartition("(Original)")
        google_reviews['Review Text'][row] = translation
    else:
        google_reviews['Review Text'][row] = rev

In [107]:
# Preview
google_reviews[1000:1020]

Unnamed: 0.1,Unnamed: 0,Name,Review Rate,Review Time,Review Text
440,440,McDonald's,4 stars,3 years ago,Translated by Google its all OK\n\n
441,441,McDonald's,3 stars,3 years ago,Translated by Google The petet was cold\n\n
442,442,McDonald's,5 stars,a year ago,Translated by Google Very tasty\n\n
444,444,McDonald's,2 stars,3 years ago,Translated by Google Busy\n\n
445,445,McDonald's,5 stars,2 years ago,Translated by Google Good offers and promotion...
446,446,McDonald's,1 star,3 years ago,Translated by Google Bad\n\n
447,447,McDonald's,3 stars,2 years ago,Translated by Google Slow and inaccurate servi...
448,448,McDonald's,4 stars,3 years ago,Translated by Google Ole is barely normal\n\n
449,449,McDonald's,5 stars,3 years ago,Translated by Google Too long\n\n
450,450,McDonald's,3 stars,a year ago,Translated by Google Before\n\n
