## San Francisco Restaurant Reviews: Natural Language Processing Stage
### Darren Lyles

<p>In this notebook, we use basic natural language processing techniques to clean the restaurant reviews and add a new feature to the data set. Each review is lowercased, tokenized, stripped of stopwords and punctuation, and lemmatized, and the resulting list of tokens is stored in a new column, <code>tokenized_review_text</code>.</p>

In [1]:
import nltk
import pandas as pd
import numpy as np

In [2]:
df_restaurant_reviews = pd.read_csv('restaurant_reviews.csv')
df_restaurant_reviews.head()

Unnamed: 0.1,Unnamed: 0,name,cuisine,address,locality,region,hours,email,tel,fax,...,website,latitude,longitude,price,rating,review_url,review_title,review_text,review_rating,review_date
0,0,21st Amendment Brewery & Restaurant,"['Cafe', 'Pub Food', 'American', 'Burgers', 'P...",563 2nd St,San Francisco,CA,"{'monday': [['11:30', '23:59']], 'tuesday': [[...",new-pub@21st-amendment.com,(415) 369-0900,(415) 369-0909,...,http://www.21st-amendment.com,37.782448,-122.392576,2,4.0,https://www.tripadvisor.com/ShowUserReviews-g6...,Great drinks and food,They have great local craft beers and probably...,4,"Mar 28, 2016 12:00:00 AM"
1,1,21st Amendment Brewery & Restaurant,"['Cafe', 'Pub Food', 'American', 'Burgers', 'P...",563 2nd St,San Francisco,CA,"{'monday': [['11:30', '23:59']], 'tuesday': [[...",new-pub@21st-amendment.com,(415) 369-0900,(415) 369-0909,...,http://www.21st-amendment.com,37.782448,-122.392576,2,4.0,https://www.tripadvisor.com/ShowUserReviews-g6...,Good food &amp; beer,We went to the downtown SF location. The resta...,4,"Mar 27, 2016 12:00:00 AM"
2,2,21st Amendment Brewery & Restaurant,"['Cafe', 'Pub Food', 'American', 'Burgers', 'P...",563 2nd St,San Francisco,CA,"{'monday': [['11:30', '23:59']], 'tuesday': [[...",new-pub@21st-amendment.com,(415) 369-0900,(415) 369-0909,...,http://www.21st-amendment.com,37.782448,-122.392576,2,4.0,https://www.tripadvisor.com/ShowUserReviews-g6...,Pretty good beers,I just came to this place for drinks with an o...,4,"Mar 16, 2016 12:00:00 AM"
3,3,21st Amendment Brewery & Restaurant,"['Cafe', 'Pub Food', 'American', 'Burgers', 'P...",563 2nd St,San Francisco,CA,"{'monday': [['11:30', '23:59']], 'tuesday': [[...",new-pub@21st-amendment.com,(415) 369-0900,(415) 369-0909,...,http://www.21st-amendment.com,37.782448,-122.392576,2,4.0,https://www.tripadvisor.com/ShowUserReviews-g6...,Ridiculously overpriced (yes I live in SF),"Mediocre food (not bad, just mediocre, you can...",3,"Mar 8, 2016 12:00:00 AM"
4,4,21st Amendment Brewery & Restaurant,"['Cafe', 'Pub Food', 'American', 'Burgers', 'P...",563 2nd St,San Francisco,CA,"{'monday': [['11:30', '23:59']], 'tuesday': [[...",new-pub@21st-amendment.com,(415) 369-0900,(415) 369-0909,...,http://www.21st-amendment.com,37.782448,-122.392576,2,4.0,https://www.tripadvisor.com/ShowUserReviews-g6...,Team dinner,We headed out for our team dinner to this esta...,4,"Mar 1, 2016 12:00:00 AM"


<p>Below, I apply a sequence of preprocessing steps to each review in the dataset.
   First, I convert each review to lowercase, which lets the tokens match NLTK's lowercase stopword list;
   then I tokenize each review so that each word is an element of a list.
   Finally, I filter out stopwords, remove punctuation,
   and lemmatize each remaining word.
</p>
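
Before running these steps on the full column, it can help to trace them on a single sentence. The sketch below uses only the standard library. The regular-expression tokenizer `toy_tokenize` and the small `TOY_STOPWORDS` set are toy stand-ins assumed for illustration, not NLTK's `word_tokenize` or its English stopword list, and the sample sentence is made up. Lemmatization is left out because it depends on the WordNet data.

```python
import re
from string import punctuation

# Toy stand-ins (assumptions for illustration, not the NLTK resources)
TOY_STOPWORDS = {"they", "have", "and", "the", "is"}
PUNCT_CHARS = set(punctuation)


def toy_tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


review = "They have great local craft beers, and the food is good!"

lowered = review.lower()                                   # step 1: lowercase
tokens = toy_tokenize(lowered)                             # step 2: tokenize
no_stop = [t for t in tokens if t not in TOY_STOPWORDS]    # step 3: drop stopwords
no_punct = [t for t in no_stop if t not in PUNCT_CHARS]    # step 4: drop punctuation

print(no_punct)
# ['great', 'local', 'craft', 'beers', 'food', 'good']
```

Note that `beers` is still plural here; reducing it to `beer` is the job of the lemmatization step in the real pipeline.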

In [3]:
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from string import punctuation

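# Note: the NLTK tokenizer models, stopword list, and WordNet data must be
# installed beforehand with nltk.download(...).

# Convert every review to lowercase
df_restaurant_reviews['tokenized_review_text'] = df_restaurant_reviews['review_text'].str.lower()

# Tokenize each review into a list of words
df_restaurant_reviews['tokenized_review_text'] = \
    df_restaurant_reviews['tokenized_review_text'].apply(word_tokenize)

# Filter out English stopwords (a set gives fast membership tests)
sw = set(stopwords.words('english'))
df_restaurant_reviews['tokenized_review_text'] = \
    df_restaurant_reviews['tokenized_review_text'].apply(lambda x: [y for y in x if y not in sw])

# Filter out punctuation tokens. string.punctuation is a str, so a plain
# "y not in punctuation" is a substring test that lets tokens such as '...'
# through; checking character by character drops every all-punctuation token.
punct = set(punctuation)
df_restaurant_reviews['tokenized_review_text'] = \
    df_restaurant_reviews['tokenized_review_text'].apply(
        lambda x: [y for y in x if not all(ch in punct for ch in y)])

# Lemmatize each remaining word (with no POS tag given, WordNetLemmatizer
# treats every word as a noun, so verbs such as 'went' are left unchanged)
wordnet_lemmatizer = WordNetLemmatizer()
df_restaurant_reviews['tokenized_review_text'] = \
    df_restaurant_reviews['tokenized_review_text'].apply(lambda x: [wordnet_lemmatizer.lemmatize(w) for w in x])

df_restaurant_reviews.head()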

Unnamed: 0.1,Unnamed: 0,name,cuisine,address,locality,region,hours,email,tel,fax,...,latitude,longitude,price,rating,review_url,review_title,review_text,review_rating,review_date,tokenized_review_text
0,0,21st Amendment Brewery & Restaurant,"['Cafe', 'Pub Food', 'American', 'Burgers', 'P...",563 2nd St,San Francisco,CA,"{'monday': [['11:30', '23:59']], 'tuesday': [[...",new-pub@21st-amendment.com,(415) 369-0900,(415) 369-0909,...,37.782448,-122.392576,2,4.0,https://www.tripadvisor.com/ShowUserReviews-g6...,Great drinks and food,They have great local craft beers and probably...,4,"Mar 28, 2016 12:00:00 AM","[great, local, craft, beer, probably, one, bes..."
1,1,21st Amendment Brewery & Restaurant,"['Cafe', 'Pub Food', 'American', 'Burgers', 'P...",563 2nd St,San Francisco,CA,"{'monday': [['11:30', '23:59']], 'tuesday': [[...",new-pub@21st-amendment.com,(415) 369-0900,(415) 369-0909,...,37.782448,-122.392576,2,4.0,https://www.tripadvisor.com/ShowUserReviews-g6...,Good food &amp; beer,We went to the downtown SF location. The resta...,4,"Mar 27, 2016 12:00:00 AM","[went, downtown, sf, location, restaurant, rea..."
2,2,21st Amendment Brewery & Restaurant,"['Cafe', 'Pub Food', 'American', 'Burgers', 'P...",563 2nd St,San Francisco,CA,"{'monday': [['11:30', '23:59']], 'tuesday': [[...",new-pub@21st-amendment.com,(415) 369-0900,(415) 369-0909,...,37.782448,-122.392576,2,4.0,https://www.tripadvisor.com/ShowUserReviews-g6...,Pretty good beers,I just came to this place for drinks with an o...,4,"Mar 16, 2016 12:00:00 AM","[came, place, drink, old, colleague, beer, pre..."
3,3,21st Amendment Brewery & Restaurant,"['Cafe', 'Pub Food', 'American', 'Burgers', 'P...",563 2nd St,San Francisco,CA,"{'monday': [['11:30', '23:59']], 'tuesday': [[...",new-pub@21st-amendment.com,(415) 369-0900,(415) 369-0909,...,37.782448,-122.392576,2,4.0,https://www.tripadvisor.com/ShowUserReviews-g6...,Ridiculously overpriced (yes I live in SF),"Mediocre food (not bad, just mediocre, you can...",3,"Mar 8, 2016 12:00:00 AM","[mediocre, food, bad, mediocre, find, food, pr..."
4,4,21st Amendment Brewery & Restaurant,"['Cafe', 'Pub Food', 'American', 'Burgers', 'P...",563 2nd St,San Francisco,CA,"{'monday': [['11:30', '23:59']], 'tuesday': [[...",new-pub@21st-amendment.com,(415) 369-0900,(415) 369-0909,...,37.782448,-122.392576,2,4.0,https://www.tripadvisor.com/ShowUserReviews-g6...,Team dinner,We headed out for our team dinner to this esta...,4,"Mar 1, 2016 12:00:00 AM","[headed, team, dinner, establishment, lot, goo..."
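
A note on punctuation filtering: `string.punctuation` is a plain string, so the test `token not in punctuation` is a substring check. It removes single marks such as `!`, but multi-character tokens that a tokenizer may emit, such as an ellipsis or two-character quote tokens, are not substrings of it and pass through. The sketch below compares that check with a character-wise one. The helper `is_punctuation` and the sample token list are hypothetical, introduced only for illustration.

```python
from string import punctuation

PUNCT_CHARS = set(punctuation)


def is_punctuation(token):
    """Return True when the token is empty or consists only of punctuation characters."""
    return all(ch in PUNCT_CHARS for ch in token)


# Hypothetical tokens, including multi-character punctuation
tokens = ["great", "...", "beer", "!", "``", "food", "''"]

# Substring test against the punctuation string: only '!' is removed
substring_filtered = [t for t in tokens if t not in punctuation]

# Character-wise test: every all-punctuation token is removed
charwise_filtered = [t for t in tokens if not is_punctuation(t)]

print(substring_filtered)
print(charwise_filtered)
```

The character-wise version removes a superset of what the substring test removes, so it is the stricter of the two filters.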
