# Hotel Analysis (2017)

This analysis helps the reader understand how the data set is composed and, most importantly, what makes consumers give positive or negative reviews. The main factor considered is the score that users give to hotels in four locations: the United Kingdom, Amsterdam, Paris and Barcelona.

The original data was scraped from Booking.com, which owns the data. This analysis considers only reviews from 2017.

Methodology: The data transformations were run with Spark on a virtual machine, using the PySpark library.

Steps to follow:
1. Set up the environment, in this case PySpark
2. Load the data set
3. Understand the data through metadata analysis
4. Basic profiling
5. Transformations for the business questions:
    1. Which hotels have the best and worst scores, divided by location?
    2. Which location needs the most attention, and which are the worst hotels according to the goodness classification?
    3. What other factors, besides location and hotel, are taken into consideration when giving a bad review?


## 1. PySpark environment setup

In [534]:
import findspark
findspark.init()

from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession

sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

## 2. Loading CSV source and Spark data abstraction (DataFrame) setup

In [535]:
HotelsDF = spark.read \
                 .option("inferSchema", "true") \
                 .option("header", "true") \
                 .csv("Hotel_Reviews.csv")

## 3. Data set metadata analysis
### A. Display schema and size of the DataFrame

In [536]:
from IPython.display import display, Markdown

HotelsDF.printSchema()
display(Markdown("This DataFrame has **%d rows**." % HotelsDF.count()))

root
 |-- Hotel_Address: string (nullable = true)
 |-- Location: string (nullable = true)
 |-- Additional_Number_of_Scoring: integer (nullable = true)
 |-- Review_Date: string (nullable = true)
 |-- Average_Score: double (nullable = true)
 |-- Hotel_Name: string (nullable = true)
 |-- Reviewer_Nationality: string (nullable = true)
 |-- Negative_Review: string (nullable = true)
 |-- Review_Total_Negative_Word_Counts: integer (nullable = true)
 |-- Total_Number_of_Reviews: integer (nullable = true)
 |-- Positive_Review: string (nullable = true)
 |-- Review_Total_Positive_Word_Counts: integer (nullable = true)
 |-- Total_Number_of_Reviews_Reviewer_Has_Given: integer (nullable = true)
 |-- Reviewer_Score: double (nullable = true)
 |-- Tags: string (nullable = true)
 |-- days_since_review: string (nullable = true)



This DataFrame has **111022 rows**.

In [537]:
HotelsDF.cache() # optimization to make the processing faster
HotelsDF.sample(False, 0.1).take(2)

[Row(Hotel_Address=' s Gravesandestraat 55 Oost 1092 AA Amsterdam Netherlands', Location='Amsterdam Netherlands', Additional_Number_of_Scoring=194, Review_Date='7/17/17', Average_Score=7.7, Hotel_Name='Hotel Arena', Reviewer_Nationality=' United Kingdom ', Negative_Review=' Cleaner did not change our sheet and duvet everyday but just made bed They also didn t clean the floor and changed the body gel when we run out of it ', Review_Total_Negative_Word_Counts=33, Total_Number_of_Reviews=1403, Positive_Review=' The room is spacious and bright The hotel is located in a quiet and beautiful park ', Review_Total_Positive_Word_Counts=18, Total_Number_of_Reviews_Reviewer_Has_Given=6, Reviewer_Score=4.6, Tags="[' Leisure trip ', ' Group ', ' Duplex Twin Room ', ' Stayed 5 nights ', ' Submitted from a mobile device ']", days_since_review='17 days'),
 Row(Hotel_Address=' s Gravesandestraat 55 Oost 1092 AA Amsterdam Netherlands', Location='Amsterdam Netherlands', Additional_Number_of_Scoring=194, R
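The sample row shows that `Tags` is stored as the string form of a list and that `Review_Date` uses a short month/day/year format. As a minimal sketch in plain Python (outside Spark), the two values could be parsed as follows. The helper names `parse_tags` and `parse_review_date` are hypothetical, and the date format is an assumption inferred from the single value `'7/17/17'`.

```python
import ast
from datetime import datetime

# Values copied from the first sample row shown above.
raw_tags = ("[' Leisure trip ', ' Group ', ' Duplex Twin Room ', "
            "' Stayed 5 nights ', ' Submitted from a mobile device ']")
raw_date = "7/17/17"


def parse_tags(raw):
    """Turn the stringified list into a list of tags without padding spaces."""
    return [tag.strip() for tag in ast.literal_eval(raw)]


def parse_review_date(raw):
    """Parse a date assumed to be month/day/two-digit year."""
    return datetime.strptime(raw, "%m/%d/%y").date()


tags = parse_tags(raw_tags)
review_date = parse_review_date(raw_date)
print(tags)
print(review_date.year)
```

Parsing the date this way would make it possible to check that a row really belongs to 2017 before it enters the analysis.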

### B. Get one or multiple random samples from the data set

### C. Data entities, metrics and dimensions

I've identified the following elements:

* **Entities:** Hotel_Name (the main entity being measured), Reviewer_Score
* **Metrics:** Reviewer_Score, Average_Score, Review_Total_Negative_Word_Counts, Review_Total_Positive_Word_Counts
* **Dimensions:** Location, Reviewer_Nationality

### D. Column categorization

The following could be a potential column categorization:

* **Hotel related columns:** *Hotel_Name*, *Location*
* **Performance related columns:** *Reviewer_Nationality*, *Reviewer_Score*, *Review_Total_Negative_Word_Counts*, *Review_Total_Positive_Word_Counts*.



## 4. Column group basic profiling to better understand our data set
### A. Hotel related columns basic profiling

In [538]:
## 4.1. Import libraries for column profiling and transformations ##

from IPython.display import display, Markdown
# Only the functions used in this notebook are imported; sum, min and max are left out
# so that the Python built-ins of the same name are not shadowed.
from pyspark.sql.functions import avg, col, count, countDistinct, when

## 4.2. Since the data set has one review per row, it is important to understand how many hotels there are to analyze,
# as well as which ones are the most mentioned (most visited), in order to reduce bias in the review categorization ##

print ("Summary of columns Hotel Names: How many distinct hotels there are")

# show() returns None, so the result is displayed directly instead of being assigned
HotelsDF.select(countDistinct("Hotel_Name").alias("Hotel_Name")).show()

print ("Summary of columns Hotel Names: Top 20 Most Visited Hotels")

# Number of reviews per hotel, reused for both rankings below
Hotel_CountsDF = HotelsDF.groupBy("Hotel_Name") \
   .agg(count("Hotel_Name").alias("NumHotel_Name"))

Hotel_CountsDF.orderBy("NumHotel_Name", ascending=False).show(20, truncate=False)

print ("Summary of columns Hotel Names: 30 Least Visited Hotels")

Hotel_CountsDF.orderBy("NumHotel_Name", ascending=True).show(30)


## 4.3. After reviewing the least visited hotels, a filter is needed so the analysis can be separated into categories:
#   "Most visited hotels": more than 94 reviews
#   "Least visited hotels": more than 45 and fewer than 94 reviews

print("Most Visited Hotels & Rating")
# Average reviewer score and number of reviews per hotel and location
HotelsDF.groupBy("Hotel_Name", "location") \
    .agg(avg("Reviewer_Score"), count("Hotel_Name")) \
    .filter(col("count(Hotel_Name)") > 94) \
    .orderBy(col("count(Hotel_Name)").desc()) \
    .show(10, truncate=False)

print("Least Visited Hotels")
HotelsDF.groupBy("Hotel_Name") \
    .agg(count("Hotel_Name").alias("NumHotel_Name")) \
    .filter((col("NumHotel_Name") > 45) & (col("NumHotel_Name") < 94)) \
    .orderBy("NumHotel_Name", ascending=False) \
    .show()


## 4.4. As important as the hotels are, the location of each hotel is an important factor for the review analysis.

print ("Summary of column Location: Most Visited Locations")

# Number of reviews per location, most reviewed first
HotelsDF.groupBy("Location") \
   .agg(count("Location").alias("NumLocation_Name")) \
   .orderBy("NumLocation_Name", ascending=False) \
   .show()

Summary of columns Hotel Names: How many distinct hotels there are
+----------+
|Hotel_Name|
+----------+
|      1171|
+----------+

Summary of columns Hotel Names: Top 20 Most Visited Hotels
+-------------------------------------------------+-------------+
|Hotel_Name                                       |NumHotel_Name|
+-------------------------------------------------+-------------+
|Britannia International Hotel Canary Wharf       |1232         |
|Park Plaza Westminster Bridge London             |1182         |
|Strand Palace Hotel                              |1011         |
|Copthorne Tara Hotel London Kensington           |913          |
|DoubleTree by Hilton Hotel London Tower of London|869          |
|Intercontinental London The O2                   |768          |
|citizenM Tower of London                         |767          |
|Holiday Inn London Kensington                    |740          |
|Park Plaza London Riverbank                      |718          |
|Grand Royale Lo

In [539]:
### B. Performance related columns basic profiling ###

# For the performance related columns, the reviews are taken into consideration, since they reflect the perception
# of the users. The nationality of the reviewers is also considered, to analyze whether where people are from makes them more
# vocal when evaluating performance, as well as how many negative and positive words are used.

## 4.6. Beginning with a summary of the numerical variables: Reviewer score, Average score, Total Negative Words and Total Positive Words

## Summary of the review columns for categorization

print ("Summary of columns Review_Total_Negative_Word_Counts, Review_Total_Positive_Word_Counts, Reviewer_Score")
HotelsDF.select("Review_Total_Negative_Word_Counts", "Review_Total_Positive_Word_Counts", "Reviewer_Score","Average_Score").summary().show()


## Checking for nulls
print("Checking for nulls on columns Review_Total_Negative_Word_Counts, Review_Total_Positive_Word_Counts, Reviewer_Score")
null_check_columns = ["Review_Total_Negative_Word_Counts", "Review_Total_Positive_Word_Counts", "Reviewer_Score", "Average_Score"]
# count() ignores nulls, so each column counts only the rows where when() matched a null value
HotelsDF.select([count(when(col(c).isNull(), c)).alias(c) for c in null_check_columns]).show()

    
## 4.6 comment: After reviewing the summary, Reviewer_Score looks like the metric that can give the more interesting insights. Since the data set is from 2017,
# the average score given by Booking.com probably relies on historical data, so the actual reviewer score should better reflect the current perception of the consumer.

## 4.7. Nationalities
# As part of the analysis, basic profiling of nationalities is needed: how many distinct nationalities there are, as well as the number of reviews each has given,
# with a significance filter on the minimum number of reviews given

print ("Summary of columns Reviewer_Nationality: How many distinct Nationalities there are")

HotelsDF.select(countDistinct("Reviewer_Nationality").alias("Nationality")).show()


print ("Summary of column Reviewer_Nationality: Which nationalities are more vocal")

# Number of reviews per nationality, most reviews first
HotelsDF.groupBy("Reviewer_Nationality") \
   .agg(count("Reviewer_Nationality").alias("Num_mostvocal_Nationalities")) \
   .orderBy("Num_mostvocal_Nationalities", ascending=False) \
   .show()


print ("Summary of column Reviewer_Nationality: Which nationalities are less vocal")

# Nationalities with more than 80 and fewer than 200 reviews
HotelsDF.groupBy("Reviewer_Nationality") \
   .agg(count("Reviewer_Nationality").alias("Num_leastvocal_Nationalities")) \
   .filter((col("Num_leastvocal_Nationalities") > 80) & (col("Num_leastvocal_Nationalities") < 200)) \
   .orderBy("Num_leastvocal_Nationalities", ascending=False) \
   .show()

Summary of columns Review_Total_Negative_Word_Counts, Review_Total_Positive_Word_Counts, Reviewer_Score
+-------+---------------------------------+---------------------------------+------------------+-----------------+
|summary|Review_Total_Negative_Word_Counts|Review_Total_Positive_Word_Counts|    Reviewer_Score|    Average_Score|
+-------+---------------------------------+---------------------------------+------------------+-----------------+
|  count|                           111022|                           111022|            111022|           111022|
|   mean|               20.289384086037003|               18.853983895083857| 8.374934697626752|8.388800417936194|
| stddev|                 32.4170026242823|                24.25433818381829|1.6824347060653249|0.564340692822694|
|    min|                                0|                                0|               2.5|              6.4|
|    25%|                                2|                                5|              

## 5. Analysis for Business questions:

1. Which hotels have the best and worst scores, divided by location?
2. Which location needs the most attention, and which are the worst hotels according to the goodness classification?
3. What other factors, besides location and hotel, are taken into consideration when giving a good or bad review?


In [540]:
### 5.1 Which hotels have the best and worst scores, divided by location?
   #### To answer this question, a filter was created to keep the comparison fair: only hotels with more than
    ### 60 reviews are considered, which is an average of 5 reviews per month

In [541]:
# Average reviewer score and number of reviews per hotel, keeping hotels with more than 60 reviews
Reviews_HotelDF = HotelsDF.groupBy("Hotel_Name", "location") \
    .agg(avg("Reviewer_Score"), count("Hotel_Name")) \
    .filter(col("count(Hotel_Name)") > 60)

print("Best in class Hotels")
Reviews_HotelDF.orderBy(col("avg(Reviewer_Score)").desc()).show(10, truncate=False)

print("Least favorite Hotels")
Reviews_HotelDF.orderBy(col("avg(Reviewer_Score)").asc()).show(10, truncate=False)

Best in class Hotels
+------------------------------+---------------+-------------------+-----------------+
|Hotel_Name                    |location       |avg(Reviewer_Score)|count(Hotel_Name)|
+------------------------------+---------------+-------------------+-----------------+
|Hotel Casa Camper             |Barcelona Spain|9.74705882352942   |102              |
|H10 Casa Mimosa 4 Sup         |Barcelona Spain|9.63714285714286   |70               |
|Catalonia Magdalenes          |Barcelona Spain|9.611764705882356  |85               |
|Hotel Palace GL               |Barcelona Spain|9.601587301587303  |63               |
|Nolinski Paris                |Paris France   |9.585245901639349  |61               |
|H tel D Aubusson              |Paris France   |9.561904761904765  |63               |
|Intercontinental London The O2|United Kingdom |9.519401041666697  |768              |
|Rosewood London               |United Kingdom |9.50416666666667   |72               |
|Batty Langley s      
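The ranking logic used above (group by hotel and location, keep groups with enough reviews, sort by average score) can be checked on a handful of rows without Spark. The following is a minimal sketch in plain Python; the rows are made up for illustration, and `rank_hotels` and its `min_reviews` parameter are hypothetical names, not part of the notebook.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (hotel, location, reviewer score) rows, for illustration only.
reviews = [
    ("Hotel A", "Barcelona Spain", 9.6),
    ("Hotel A", "Barcelona Spain", 10.0),
    ("Hotel A", "Barcelona Spain", 9.2),
    ("Hotel B", "Paris France", 7.5),
    ("Hotel B", "Paris France", 8.3),
    ("Hotel B", "Paris France", 6.7),
    ("Hotel C", "United Kingdom", 10.0),  # single review, filtered out below
]


def rank_hotels(rows, min_reviews):
    """Average score per (hotel, location) for groups with more than min_reviews rows."""
    scores = defaultdict(list)
    for hotel, location, score in rows:
        scores[(hotel, location)].append(score)
    ranked = [
        (hotel, location, mean(values), len(values))
        for (hotel, location), values in scores.items()
        if len(values) > min_reviews
    ]
    # Highest average first, like orderBy(...desc()) in the cell above.
    return sorted(ranked, key=lambda row: row[2], reverse=True)


ranking = rank_hotels(reviews, min_reviews=2)
best, worst = ranking[0], ranking[-1]
print(best)
print(worst)
```

The single-review hotel would otherwise top the list with a perfect score, which is the kind of distortion the minimum-review filter is meant to avoid.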

In [542]:
### 5.2 Which location needs the most attention, and which are the worst hotels according to the goodness classification?
   ### Classification: 4 categories (Exceptional, Very Good, Below expectations and Alarming), based on the previous summary of the reviewer score.
    ## Taking into account the minimum and maximum, the classification is:


#   "Exceptional"           - scores of 9.6 and above
#   "Very Good"             - scores from 8.8 to 9.5
#   "Below expectations"    - scores from 7.5 to 8.7
#   "Alarming"              - scores below 7.5


# 5.2.1. Create categorizations based on the review rating

# round is not imported, so the Python built-in of the same name is not shadowed
from pyspark.sql.functions import count, countDistinct

# Total number of review rows in the data set
totalReviews = HotelsDF.count()

reviewCategorizationDF = HotelsDF\
   .where(col("days_since_review")!="NA")\
   .withColumn("PerformanceReview", when(col("Reviewer_Score")<7.5,"1.Alarming")\
                               .when((col("Reviewer_Score")>=7.5) & (col("Reviewer_Score")<=8.7),"2.Below expectations")\
                               .when((col("Reviewer_Score")>8.7) & (col("Reviewer_Score")<=9.5),"3.Very Good")\
                               .when((col("Reviewer_Score")>9.5) & (col("Reviewer_Score")<=10),"4.Exceptional"))

reviewCategorizationDF.cache()
## 5.2.2. Which location is the one that has to watch out the most:
print ("Which location is the one that has to watch out the most:")
# Grouping by Reviewer_Score puts a single score value in each group, so the
# "NumReviews" column holds the average of that score, not a count of reviews.
reviewCategorizationDF.select("PerformanceReview", "location", "Reviewer_Score") \
                     .groupBy("PerformanceReview", "location", "Reviewer_Score") \
                     .agg(avg("Reviewer_Score").alias("NumReviews")) \
                     .orderBy("PerformanceReview").show(20, truncate=False)

Which location is the one that has to watch out the most:
+-----------------+---------------------+--------------+------------------+
|PerformanceReview|location             |Reviewer_Score|NumReviews        |
+-----------------+---------------------+--------------+------------------+
|1.Alarming       |United Kingdom       |7.0           |7.0               |
|1.Alarming       |United Kingdom       |4.5           |4.5               |
|1.Alarming       |Amsterdam Netherlands|6.0           |6.0               |
|1.Alarming       |Amsterdam Netherlands|5.8           |5.799999999999969 |
|1.Alarming       |Paris France         |2.5           |2.5               |
|1.Alarming       |Amsterdam Netherlands|7.1           |7.100000000000042 |
|1.Alarming       |Amsterdam Netherlands|5.4           |5.400000000000018 |
|1.Alarming       |Barcelona Spain      |4.6           |4.600000000000012 |
|1.Alarming       |Paris France         |5.0           |5.0               |
|1.Alarming       |Barcelona S
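The four score bands used in the `when` chain can be written as a small plain-Python function, which makes the boundary values easy to check. This is a sketch mirroring the thresholds in the cell above; `classify_score` is a hypothetical helper, not part of the notebook.

```python
def classify_score(score):
    """Map a reviewer score to one of the four category labels used above."""
    if score < 7.5:
        return "1.Alarming"
    if score <= 8.7:
        return "2.Below expectations"
    if score <= 9.5:
        return "3.Very Good"
    # Unlike the Spark expression, scores above 10 are not mapped to null here;
    # this sketch assumes the input is already within the 0 to 10 range.
    return "4.Exceptional"


for value in (7.4, 7.5, 8.7, 8.8, 9.5, 9.6, 10.0):
    print(value, classify_score(value))
```

Because the scores carry one decimal, 8.7 and 9.5 are the last values of their bands, and 8.8 and 9.6 open the next ones.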

In [543]:
## 5.2.2. Which location is the one that has to watch out the most:
print ("Which location is the one that has to watch out the most: Amsterdam is the city with the most alarming reviews and the fewest exceptional reviews, while the United Kingdom has the best reviews, Exceptional and Very Good")

# Sum of reviewer scores per location (rows) and category (columns)
LocationDF = reviewCategorizationDF.groupBy("location").pivot("PerformanceReview").sum("Reviewer_Score").orderBy("location", ascending=False)
LocationDF.show()

Which location is the one that has to watch out the most: Amsterdam is the city with the most alarming reviews and the fewest exceptional reviews, while the United Kingdom has the best reviews, Exceptional and Very Good
+--------------------+------------------+--------------------+------------------+------------------+
|            location|        1.Alarming|2.Below expectations|       3.Very Good|     4.Exceptional|
+--------------------+------------------+--------------------+------------------+------------------+
|      United Kingdom|  91903.3000000059|  108846.80000000211|108733.79999999708|233793.60000003438|
|        Paris France| 18521.69999999991|  23545.099999999708|26707.600000000137|  55463.5999999981|
|     Barcelona Spain|16446.200000000004|   23439.29999999979|28379.200000000237|  62107.9999999977|
|Amsterdam Netherl...|19359.899999999896|  24910.299999999843|29960.000000000193| 57683.59999999774|
+--------------------+------------------+--------------------+--------------
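The pivot above sums the scores, so a location with many more reviews tends to show larger totals in every column. One way to compare locations of different sizes is to count the reviews in each category and divide by the location's total. The following is a minimal sketch in plain Python with made-up rows; `category_shares` is a hypothetical helper.

```python
from collections import Counter, defaultdict

# Hypothetical (location, category) pairs, for illustration only.
labelled = [
    ("United Kingdom", "1.Alarming"),
    ("United Kingdom", "4.Exceptional"),
    ("United Kingdom", "4.Exceptional"),
    ("United Kingdom", "3.Very Good"),
    ("Amsterdam Netherlands", "1.Alarming"),
    ("Amsterdam Netherlands", "4.Exceptional"),
]


def category_shares(pairs):
    """Return {location: {category: share of that location's reviews}}."""
    counts = defaultdict(Counter)
    for location, category in pairs:
        counts[location][category] += 1
    shares = {}
    for location, per_category in counts.items():
        total = sum(per_category.values())
        shares[location] = {cat: n / total for cat, n in per_category.items()}
    return shares


shares = category_shares(labelled)
for location, per_category in sorted(shares.items()):
    print(location, {cat: f"{share:.0%}" for cat, share in sorted(per_category.items())})
```

In this toy sample both locations have one alarming review, yet the share is 25% for one and 50% for the other, which a raw total would hide.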

In [544]:
## 5.2.3. Which hotels are the worst according to the classification:

# Rebuild the categorized DataFrame, this time without the days_since_review filter
reviewCategorizationDF = HotelsDF\
   .withColumn("PerformanceReview", when(col("Reviewer_Score")<7.5,"1.Alarming")\
                               .when((col("Reviewer_Score")>=7.5) & (col("Reviewer_Score")<=8.7),"2.Below expectations")\
                               .when((col("Reviewer_Score")>8.7) & (col("Reviewer_Score")<=9.5),"3.Very Good")\
                               .when((col("Reviewer_Score")>9.5) & (col("Reviewer_Score")<=10),"4.Exceptional"))

reviewCategorizationDF.cache()
print ("Which hotels have to watch out the most: top 100. Insight: Even though the UK holds the best average reviews, this analysis concludes that the UK holds 54% of the worst rated hotels. Main insight: the UK is the location to focus on.")
reviewCategorizationDF.select("PerformanceReview","hotel_name","location","Reviewer_Score")\
                     .groupBy("PerformanceReview","hotel_name","location","Reviewer_Score")\
                     .agg(avg("Reviewer_Score"))\
                     .orderBy("Reviewer_Score").distinct().show(20,truncate=False)

Which hotels have to watch out the most: top 100. Insight: Even though the UK holds the best average reviews, this analysis concludes that the UK holds 54% of the worst rated hotels. Main insight: the UK is the location to focus on.
+-----------------+------------------------------------------+---------------------+--------------+-------------------+
|PerformanceReview|hotel_name                                |location             |Reviewer_Score|avg(Reviewer_Score)|
+-----------------+------------------------------------------+---------------------+--------------+-------------------+
|1.Alarming       |Best Western Premier Kapital Op ra        |Paris France         |2.5           |2.5                |
|1.Alarming       |Hotel le Lapin Blanc                      |Paris France         |2.5           |2.5                |
|1.Alarming       |Novotel Amsterdam City                    |Amsterdam Netherlands|2.5           |2.5                |
|1.Alarming       |Marlin Waterloo    

In [545]:
## 5.2.4. Which hotels are the best according to the classification:

# Same categorization as in 5.2.3
reviewCategorizationDF = HotelsDF\
   .withColumn("PerformanceReview", when(col("Reviewer_Score")<7.5,"1.Alarming")\
                               .when((col("Reviewer_Score")>=7.5) & (col("Reviewer_Score")<=8.7),"2.Below expectations")\
                               .when((col("Reviewer_Score")>8.7) & (col("Reviewer_Score")<=9.5),"3.Very Good")\
                               .when((col("Reviewer_Score")>9.5) & (col("Reviewer_Score")<=10),"4.Exceptional"))

reviewCategorizationDF.cache()
print ("Which hotels have the best rating: the UK is among the best ones as well, followed by Paris France. Again, this confirms that zooming into the bigger picture allows a better analysis.")
reviewCategorizationDF.select("PerformanceReview","hotel_name","location","Reviewer_Score")\
                     .groupBy("PerformanceReview","hotel_name","location","Reviewer_Score")\
                     .agg(avg("Reviewer_Score"))\
                     .orderBy(col("PerformanceReview").desc()).show(20,truncate=False)

Which hotels have the best rating: the UK is among the best ones as well, followed by Paris France. Again, this confirms that zooming into the bigger picture allows a better analysis.
+-----------------+-------------------------------------+---------------+--------------+-------------------+
|PerformanceReview|hotel_name                           |location       |Reviewer_Score|avg(Reviewer_Score)|
+-----------------+-------------------------------------+---------------+--------------+-------------------+
|4.Exceptional    |Novotel London City South            |United Kingdom |10.0          |10.0               |
|4.Exceptional    |Catalonia Eixample 1864              |Barcelona Spain|9.6           |9.599999999999998  |
|4.Exceptional    |Ayre Hotel Rosell n                  |Barcelona Spain|9.6           |9.599999999999996  |
|4.Exceptional    |Le Marquis Eiffel                    |Paris France   |10.0          |10.0               |
|4.Exceptional 

In [546]:
### 5.3 What other factors are taken into consideration when giving a bad review?
        ## For this analysis the nationality is taken into consideration, to find out whether certain nationalities are more inclined to give a bad review than others.

In [547]:
### 5.3 What other factors are taken into consideration when giving a bad review?
## 5.3.1. First I took into consideration the previous basic profiling of "Most Vocal Nationalities", which shows that the UK, US, Australia and Ireland are the most vocal ones.

Most_Vocal_NationalitiesDF = HotelsDF.select("Reviewer_Nationality")\
   .groupBy("Reviewer_Nationality")\
   .agg(count("Reviewer_Nationality").alias("Num_mostvocal_Nationalities"))\
   .orderBy("Num_mostvocal_Nationalities",ascending=False)

Most_Vocal_NationalitiesDF.show(20,truncate=False)


+--------------------------+---------------------------+
|Reviewer_Nationality      |Num_mostvocal_Nationalities|
+--------------------------+---------------------------+
| United Kingdom           |57670                      |
| United States of America |8212                       |
| Australia                |4416                       |
| Ireland                  |3919                       |
| Netherlands              |1798                       |
| United Arab Emirates     |1698                       |
| Canada                   |1671                       |
| Germany                  |1505                       |
| France                   |1471                       |
| Switzerland              |1467                       |
| Saudi Arabia             |1340                       |
| Israel                   |1251                       |
| Belgium                  |1224                       |
| Spain                    |1040                       |
| South Africa             |102
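The nationality values in the sample rows and in the table above are padded with spaces (for example `' United Kingdom '`). That does not affect the counts as long as the padding is consistent, but it would matter when filtering on a literal such as `"United Kingdom"` or joining against another table. The following is a minimal plain-Python sketch of normalizing the values before counting; the input list and the `normalize` helper are hypothetical.

```python
from collections import Counter

# Hypothetical values shaped like the Reviewer_Nationality strings shown above.
raw_nationalities = [" United Kingdom ", "United Kingdom", " Ireland ", " United Kingdom "]


def normalize(name):
    """Drop leading and trailing spaces and collapse repeated inner spaces."""
    return " ".join(name.split())


counts = Counter(normalize(name) for name in raw_nationalities)
print(counts.most_common())
```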

In [548]:
### 5.3 What other factors are taken into consideration when giving a bad review?
## 5.3.2. Then, taking into consideration the type of performance review, I analyze the nationalities further, concluding that even though the UK, US and Australia
        ## are the most vocal, they are mostly leaving good reviews, since they are within the top 10. The main insight is that these nationalities will give a good review if they like the
        ### service provided to them. The recommendation would be to create a loyalty program for the most vocal nationalities to increase word of mouth for the best hotels, and to do a further
        ### sentiment analysis of what they like the most and promote it in an advertising campaign.
reviewCategorizationDF = HotelsDF\
   .withColumn("PerformanceReview", when(col("Reviewer_Score")<7.5,"1.Alarming")\
                               .when((col("Reviewer_Score")>=7.5) & (col("Reviewer_Score")<=8.7),"2.Below expectations")\
                               .when((col("Reviewer_Score")>8.7) & (col("Reviewer_Score")<=9.5),"3.Very Good")\
                               .when((col("Reviewer_Score")>9.5) & (col("Reviewer_Score")<=10),"4.Exceptional"))

reviewCategorizationDF.cache()
print ("Summary of Nationalities & Reviews")

# Highest reviewer score per nationality (rows) and category (columns)
reviewCategorizationDF.groupBy("Reviewer_Nationality") \
    .pivot("PerformanceReview") \
    .max("Reviewer_Score") \
    .orderBy(col("`1.Alarming`").desc(), col("`2.Below expectations`").desc(),
             col("`3.Very Good`").desc(), col("`4.Exceptional`").desc()) \
    .show(200, truncate=False)

Summary of Nationalities & Reviews
+--------------------------------------+----------+--------------------+-----------+-------------+
|Reviewer_Nationality                  |1.Alarming|2.Below expectations|3.Very Good|4.Exceptional|
+--------------------------------------+----------+--------------------+-----------+-------------+
| United Kingdom                       |7.1       |8.5                 |9.5        |10.0         |
| India                                |7.1       |8.5                 |9.5        |10.0         |
| Qatar                                |7.1       |8.5                 |9.5        |10.0         |
| Australia                            |7.1       |8.5                 |9.5        |10.0         |
| France                               |7.1       |8.5                 |9.5        |10.0         |
| Ireland                              |7.1       |8.5                 |9.5        |10.0         |
| Lebanon                              |7.1       |8.5                 |9.

In [549]:
## 6. Insights:

### 1. The United Kingdom is the key location to keep track of, since it is the most visited one and appears among both the best and the worst hotel reviews.
### 2. The location with the best reviews overall is Barcelona. A further analysis based on the content of the reviews is advised, to identify the best qualities of the location and replicate them in other chains within and outside the city.
### 3. It is recommended to run the same sentiment analysis on the hotels categorized as "Alarming", since they are well below the expected average. To increase the average rating and provide good service, the analysis can surface the words customers use most and reveal patterns.
### 4. Since the most vocal nationalities (UK, US, Australia and Ireland) give mostly good reviews, a loyalty program aimed at these nationalities is recommended to promote word of mouth about the hotels, as well as an online marketing campaign targeting them.
