In [12]:
import pandas as pd 
import numpy as np
import prepare
# Remove the column width limit so the full review text is displayed
pd.set_option('display.max_colwidth', None)

# Rate My Review
## An Analysis of Hotel Reviews in Texas
#### Xavier Carter, September 2021

----

#### The Dataset
- Using Selenium, roughly 13,800 reviews were gathered from hotels across 4 major cities in Texas (Houston, Austin, Dallas, San Antonio).

#### Project Goals
- Analyze reviews to understand how the review text relates to the review rating.
- Build a machine learning model to predict what rating a review should get.

#### Executive Summary
- Executive Summary here

----

## Acquire
- Review information was gathered from TripAdvisor.com using Selenium (see acquire1.py and acquire2.py).
- To save time, the number of reviews collected per hotel was capped at 35, as some hotels had hundreds of reviews.

In [2]:
df = pd.read_csv('hotel_data.csv')

In [3]:
df.head(2)

Unnamed: 0,hotel_name,hotel_city,date_of_stay,review_rating,review
0,Drury Plaza Hotel San Antonio Riverwalk,San Antonio,September 2021,5,Joseph was so helpful and attentive! Awesome customer service. Made our trip more enjoyable! This will now be our go to hotel when we come to San Antonio. Everything about the hotel was nice and the staff was very friendly. Very pleased with the whole experience.
1,Drury Plaza Hotel San Antonio Riverwalk,San Antonio,September 2020,5,"We stayed one night at the Drury Plaza Riverwalk in mid-September. Sooo enjoyed our stay. Definitely our favorite hotel on the Riverwalk. We specifically stayed here for the rooms with the balconies overlooking the San Fernando Cathedral. I sat on that balcony all day long, reading and enjoying the view, even despite the day of rain! Love the separate bedroom! The afternoon happy hour could have easily sufficed for dinner had the allure of the Riverwalk restaurants not been there. The indoor pool/hot tub was nice, and the fitness center was perfectly equipped with great views while running the treadmill. The breakfast was hearty and very good quality...love that they have biscuits and gravy! Every employee we encountered was upbeat and kind and seemed to be interested in serving"


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13801 entries, 0 to 13800
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   hotel_name     13801 non-null  object
 1   hotel_city     13801 non-null  object
 2   date_of_stay   13801 non-null  object
 3   review_rating  13801 non-null  int64 
 4   review         13801 non-null  object
dtypes: int64(1), object(4)
memory usage: 539.2+ KB


In [5]:
df.isna().sum()

hotel_name       0
hotel_city       0
date_of_stay     0
review_rating    0
review           0
dtype: int64

In [6]:
df.describe()

Unnamed: 0,review_rating
count,13801.0
mean,3.622564
std,1.559053
min,1.0
25%,2.0
50%,4.0
75%,5.0
max,5.0


In [7]:
for i in df.columns:
    print(df[i].value_counts())
    print('---------------------------')

Fairmont Austin                                          70
La Cantera Resort & Spa                                  70
Hotel San Jose                                           35
Athens Hotel Suites                                      35
La Quinta by Wyndham Houston Willowbrook Vintage Park    35
                                                         ..
Best Value Inn of San Antonio/Kirby                       1
Hampton Inn & Suites Dallas I-30 Cockrell Hill            1
Ramada Limited Addison                                    1
Grand Inn                                                 1
GreenTree Hotel Houston Hobby                             1
Name: hotel_name, Length: 548, dtype: int64
---------------------------
Austin         4033
San Antonio    3633
Houston        3574
Dallas         2561
Name: hotel_city, dtype: int64
---------------------------
 August 2021      1356
 July 2021        1215
 June 2021         696
 February 2020     696
 May 2021          695
              

#### Acquire Findings 

#### To-Dos:
1. Reviews were capped at 35 per hotel and each review should be unique. Since value counts of 70 and 2 were seen, duplicates exist in the data and need to be removed.

2. Month and year can be split into their own separate columns.

3. There are no null or missing values.

4. Standardize the English text with NLP processing, using NLTK for standard cleaning.
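
The duplicate check in the first to-do item can be sketched as follows. This is a minimal illustration on a made-up toy frame, not the code in `prepare.py`; the rows and the `toy` and `deduped` names are assumptions for the example.

```python
import pandas as pd

# Toy frame standing in for the scraped data; the rows are invented for illustration.
toy = pd.DataFrame({
    "hotel_name": ["Hotel A", "Hotel A", "Hotel B", "Hotel A"],
    "review": ["Great stay", "Great stay", "Too noisy", "Friendly staff"],
})

# Count rows that exactly repeat an earlier row.
n_duplicates = int(toy.duplicated().sum())

# Keep the first occurrence of each row and renumber the index.
deduped = toy.drop_duplicates().reset_index(drop=True)

# After deduplication, no hotel should exceed the scrape cap (35 in this project).
reviews_per_hotel = deduped["hotel_name"].value_counts()
print(n_duplicates, len(deduped), reviews_per_hotel.max())
```

On the real data, comparing `value_counts()` on `hotel_name` before and after `drop_duplicates()` would show whether the counts of 70 fall back to the cap.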

## Prepare
- In preparation, we will:
     * Drop duplicates
     * Split month and year into separate columns
     * Drop the date of stay column
     * Prep review content (basic cleaning, tokenizing, lemmatizing, and removing stop words; common negative words are kept out of the stop word list because they add to negative sentiment)
     * Make columns for word and letter count
     * Create columns for negative, positive, and neutral sentiment
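
The date split and the count columns might look roughly like the sketch below. It assumes `date_of_stay` holds strings of the form "Month YYYY" (possibly with stray leading whitespace) and that the counts are taken on the cleaned text; the toy rows are invented, and the actual `prepare.prep_review_data` may differ.

```python
import pandas as pd

# Toy frame; the column names follow the notebook, the rows are made up.
toy = pd.DataFrame({
    "date_of_stay": [" August 2021", " February 2020"],
    "review_cleaned": [
        "clean room nice helpful staff",
        "not best someone trying eat better",
    ],
})

# Strip stray whitespace, then split "Month YYYY" into two parts.
parts = toy["date_of_stay"].str.strip().str.split(" ", n=1, expand=True)
toy["month_of_stay"] = parts[0]
toy["year_of_stay"] = parts[1].astype(int)

# The original column is no longer needed once it has been split.
toy = toy.drop(columns="date_of_stay")

# Character count and whitespace-delimited word count of the cleaned text.
toy["message_length"] = toy["review_cleaned"].str.len()
toy["word_count"] = toy["review_cleaned"].str.split().str.len()

print(toy)
```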

In [8]:
df = prepare.prep_review_data(df)
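
The internals of `prepare.prep_review_data` are not shown in this notebook. As a rough sketch of the stop word step only, the function below removes stop words while keeping negation words. The `STOP_WORDS` and `KEEP` sets are small hand-written stand-ins (an assumption, not NLTK's actual list), `clean_review` is a hypothetical name, and lemmatizing is left out.

```python
import re

# Small hand-written stop word list, standing in for a full list such as NLTK's.
STOP_WORDS = {"i", "my", "the", "was", "but", "a", "and", "not", "no", "to", "for", "in"}

# Negation words to keep in the text (hypothetical selection).
KEEP = {"not", "no"}


def clean_review(text, stop_words=STOP_WORDS, keep=KEEP):
    """Lowercase, tokenize, and drop stop words except the negations in `keep`."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    active = stop_words - keep  # stop words that are actually removed
    return " ".join(t for t in tokens if t not in active)


print(clean_review("The room was not clean and the staff was no help"))
# prints: room not clean staff no help
```

Keeping "not" and "no" preserves the difference between "clean" and "not clean", which matters once sentiment scores are computed on the cleaned text.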

In [10]:
df.sample(3)

Unnamed: 0,hotel_name,hotel_city,review_rating,review,month_of_stay,year_of_stay,review_cleaned,message_length,word_count,postive_sentiment,negative_sentiment,neatral_sentiment
1200,Hilton Austin,Austin,4,I enjoyed my trip but I would say for in the future they should consider putting more food options in the hotel because there was only meat and a salad as options. Not the best for someone trying to eat better or go vegan!,June,2021,enjoyed trip would say future consider putting food option hotel meat salad options. not best someone trying eat better go vegan !,130,22,0.233,0.121,0.646
12512,Super 8 by Wyndham Houston Hobby Airport South,Houston,5,"Very Clean Rooms and nice helpful staff. This Hotel is nice and quiet, staff is helpful, it’s close to shopping area and food. Very clean . Close to freeway. Love their free breakfast and free WiFi that actually works.",September,2018,"clean room nice helpful staff. hotel nice quiet , staff helpful , close shopping area food. clean . close freeway. love free breakfast free wifi actually work .",160,28,0.671,0.0,0.329
2446,Magnolia Hotel Dallas Downtown,Dallas,1,"The only good thing about this property was the valet service. 27 floors and only 2 of 7 elevators working. When I complained, the response was ""we're an old hotel."" The bar and restaurant are shut down. I'd have been better of at Motel 6!",June,2021,"good thing property valet service. 27 floor 2 7 elevator working. complained , response "" ' old hotel. "" bar restaurant shut down. ' better motel 6 !",149,28,0.246,0.109,0.645
