![Kayak](https://seekvectorlogo.com/wp-content/uploads/2018/01/kayak-vector-logo.png)

# Plan your trip with Kayak 

## Company's description 📇

<a href="https://www.kayak.com" target="_blank">Kayak</a> is a travel search engine that helps users plan their next trip at the best price.

The company was founded in 2004 by Steve Hafner & Paul M. English. After a few rounds of fundraising, Kayak was acquired by <a href="https://www.bookingholdings.com/" target="_blank">Booking Holdings</a>, which now holds: 

* <a href="https://booking.com/" target="_blank">Booking.com</a>
* <a href="https://kayak.com/" target="_blank">Kayak</a>
* <a href="https://www.priceline.com/" target="_blank">Priceline</a>
* <a href="https://www.agoda.com/" target="_blank">Agoda</a>
* <a href="https://Rentalcars.com/" target="_blank">RentalCars</a>
* <a href="https://www.opentable.com/" target="_blank">OpenTable</a>

With over \$300 million in revenue a year, Kayak operates in almost every country and language to help its users book trips across the globe. 

## Project 🚧

The marketing team needs help on a new project. After doing some user research, the team discovered that **70% of their users who are planning a trip would like to have more information about the destination they are going to**. 

In addition, user research shows that **people tend to distrust the information they read if they don't know the brand** that produced the content. 

Therefore, the Kayak marketing team would like to create an application that recommends where people should plan their next holiday. The application should be based on real data about:

* Weather 
* Hotels in the area 

The application should then be able to recommend the best destinations and hotels based on the above variables at any given time. 

## Goals 🎯

As the project has just started, your team doesn't have any data that can be used to create this application. Therefore, your job will be to: 

* Scrape data from destinations 
* Get weather data from each destination 
* Get hotels' info about each destination
* Store all the information above in a data lake
* Extract, transform and load cleaned data from your data lake to a data warehouse

## Scope of this project 🖼️

The marketing team wants to focus first on the best cities to visit in France. According to <a href="https://one-week-in.com/35-cities-to-visit-in-france/" target="_blank">One Week In.com</a>, here are the top 35 cities to visit in France: 

```python 
["Mont Saint Michel",
"St Malo",
"Bayeux",
"Le Havre",
"Rouen",
"Paris",
"Amiens",
"Lille",
"Strasbourg",
"Chateau du Haut Koenigsbourg",
"Colmar",
"Eguisheim",
"Besancon",
"Dijon",
"Annecy",
"Grenoble",
"Lyon",
"Gorges du Verdon",
"Bormes les Mimosas",
"Cassis",
"Marseille",
"Aix en Provence",
"Avignon",
"Uzes",
"Nimes",
"Aigues Mortes",
"Saintes Maries de la mer",
"Collioure",
"Carcassonne",
"Ariege",
"Toulouse",
"Montauban",
"Biarritz",
"Bayonne",
"La Rochelle"]
```

Your team should focus **only on the above cities for your project**. 


## Helpers 🦮

To help you complete this project, here are a few tips:

### Get weather data with an API 

*   Use https://nominatim.org/ to get the GPS coordinates of all the cities (no subscription required). Documentation: https://nominatim.org/release-docs/develop/api/Search/

*   Use https://openweathermap.org/appid (you have to subscribe to get a free API key) and https://openweathermap.org/api/one-call-api to get weather information for the 35 cities and put it in a DataFrame.

*   Determine the list of cities where the weather will be the nicest within the next 7 days. For example, you can use the values of `daily.pop` and `daily.rain` to compute the expected volume of rain within the next 7 days. This is only an example: you may have a different opinion on what nice weather looks like 😎 Maybe the most important criterion for you is temperature or humidity, so feel free to change the rules!

*   Save all the results in a `.csv` file, you will use it later 😉 You can save all the information that seems important to you! Don't forget to save the names of the cities, and to create a column containing a unique identifier (id) for each city (this is important for the next steps of the project).

*   Use plotly to display the best destinations on a map.
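
The rain criterion suggested in the tips can be sketched as follows. This is only one possible reading of "expected volume of rain": each day's `rain` volume (in mm) is weighted by its `pop` probability. The function name `expected_rain_mm` and the sample values are assumptions made for illustration, and a missing `rain` key is treated as no rain.

```python
def expected_rain_mm(daily_forecasts):
    """Sum pop * rain over the forecast days.

    `pop` is the probability of precipitation (0 to 1) and `rain` the
    forecast volume in mm; `rain` is treated as 0 when the key is absent.
    """
    return sum(day.get("pop", 0) * day.get("rain", 0) for day in daily_forecasts)


# Hypothetical values for illustration only (not real API output)
sample_daily = [
    {"pop": 0.5, "rain": 4.0},
    {"pop": 0.0},
    {"pop": 0.25, "rain": 2.0},
]
print(expected_rain_mm(sample_daily))
```

A lower value means a drier forecast, so cities could be sorted in ascending order of this score.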

### Scrape Booking.com 

Since Booking Holdings doesn't have aggregated databases, it will be much faster to scrape data directly from booking.com.

You can scrape as much information as you want, but we suggest that you get at least:

*   Hotel name
*   URL of its booking.com page
*   Its coordinates: latitude and longitude
*   Score given by the website users
*   Text description of the hotel


### Create your data lake using S3 

Once you have built your dataset, store it in S3 as a CSV file. 
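
As a rough sketch of this step, the helper below wraps the `upload_file` method of a boto3 S3 client. The helper name `upload_csv_to_s3`, the bucket name and the object key are hypothetical. The client is passed in as an argument so the function can be tried without AWS credentials.

```python
def upload_csv_to_s3(s3_client, local_path, bucket, key):
    """Upload a local CSV file to s3://bucket/key and return that URI."""
    # upload_file(Filename, Bucket, Key) is a method of the boto3 S3 client
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"


# Usage (requires boto3 and configured AWS credentials; names are placeholders):
# import boto3
# upload_csv_to_s3(boto3.client("s3"), "src/list_hotels.csv", "my-kayak-bucket", "list_hotels.csv")
```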

### ETL 

Once your data is on S3, it will be easier for the data analysis team to work with clean data from a data warehouse. Therefore, create a SQL database using AWS RDS, extract your data from S3, and load it into your newly created database. 
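
The extract, transform and load steps can be illustrated with a small stand-in. The sketch below uses Python's built-in `sqlite3` in place of an RDS database, and a CSV string in place of the file downloaded from S3. With RDS, the connection would instead come from the driver of the chosen engine. The table layout and the function name `load_hotels` are assumptions.

```python
import csv
import io
import sqlite3


def load_hotels(csv_text, connection):
    """Load rows with a non-empty score into a `hotels` table; return the row count."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS hotels "
        "(hotel_name TEXT, hotel_city TEXT, hotel_ranking REAL)"
    )
    rows = [
        (r["hotel_name"], r["hotel_city"], float(r["hotel_ranking"]))
        for r in csv.DictReader(io.StringIO(csv_text))
        if r["hotel_ranking"]  # transform step: drop rows without a score
    ]
    connection.executemany("INSERT INTO hotels VALUES (?, ?, ?)", rows)
    connection.commit()
    return len(rows)


# Hypothetical CSV content standing in for the file extracted from S3
sample_csv = "hotel_name,hotel_city,hotel_ranking\nHotel A,Lyon,84\nHotel B,Lyon,\n"
conn = sqlite3.connect(":memory:")
print(load_hotels(sample_csv, conn))
```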

## Deliverable 📬

To complete this project, your team should deliver:

* A `.csv` file in an S3 bucket containing enriched information about weather and hotels for each French city

* A SQL database from which the same cleaned data can be retrieved

* Two maps: one showing the top 5 destinations and one showing the top 20 hotels in the area. You can use plotly or any other library to do so. It should look something like this: 

![Map](https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/Kayak_best_destination_project.png)

In [1]:
!pip install Scrapy -q

In [2]:
import pandas as pd

import requests
import pprint
from datetime import datetime

!pip install plotly -q
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
pio.renderers.default = "iframe_connected"


import os # => Standard library module for file system and environment access
import logging # => Standard library module used to set the log level

# Import scrapy and scrapy.crawler 
import scrapy 
from scrapy.crawler import CrawlerProcess


In [5]:
# Create list of the 35 cities
cities = ["Mont Saint Michel",
"St Malo",
"Bayeux",
"Le Havre",
"Rouen",
"Paris",
"Amiens",
"Lille",
"Strasbourg",
"Chateau du Haut Koenigsbourg",
"Colmar",
"Eguisheim",
"Besancon",
"Dijon",
"Annecy",
"Grenoble",
"Lyon",
"Gorges du Verdon",
"Bormes les Mimosas",
"Cassis",
"Marseille",
"Aix en Provence",
"Avignon",
"Uzes",
"Nimes",
"Aigues Mortes",
"Saintes Maries de la mer",
"Collioure",
"Carcassonne",
"Ariege",
"Toulouse",
"Montauban",
"Biarritz",
"Bayonne",
"La Rochelle"]

In [6]:
# Store cities in a dataframe
df = pd.DataFrame(cities, columns=['City'])
df

Unnamed: 0,City
0,Mont Saint Michel
1,St Malo
2,Bayeux
3,Le Havre
4,Rouen
5,Paris
6,Amiens
7,Lille
8,Strasbourg
9,Chateau du Haut Koenigsbourg


### Get weather data from each destination
### I. Get GPS coordinates of the top 35 cities from the nominatim.org API

In [7]:
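# Get info for one city, here Ariege. Nominatim does not allow mixing the free-form `q` with structured
# parameters such as `country`, so `countrycodes` is used instead; the User-Agent value is a placeholder.
get_one_city = requests.get('https://nominatim.openstreetmap.org/search', params={'q': 'Ariege', 'countrycodes': 'fr', 'format': 'json', 'limit': 1}, headers={'User-Agent': 'kayak-project-notebook'})
get_one_city.json()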

[{'place_id': 281653603,
  'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
  'osm_type': 'relation',
  'osm_id': 7439,
  'boundingbox': ['42.5732416', '43.3162514', '0.8267506', '2.1758135'],
  'lat': '42.9455368',
  'lon': '1.4065544156065486',
  'display_name': 'Ariège, Occitanie, France métropolitaine, France',
  'class': 'boundary',
  'type': 'administrative',
  'importance': 0.6009114788084189,
  'icon': 'https://nominatim.openstreetmap.org/ui/mapicons//poi_boundary_administrative.p.20.png'}]

In [8]:
# get Lat and Lon for Ariege
print(get_one_city.json()[0]['lat'])
print(get_one_city.json()[0]['lon'])

42.9455368
1.4065544156065486


In [9]:
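import time

list_lat = []
list_lon = []
# Nominatim's usage policy asks for an identifying User-Agent (this value is a placeholder)
headers = {'User-Agent': 'kayak-project-notebook'}
for city in cities:
    # Let requests URL-encode the query; `countrycodes` restricts the search to France
    # (structured parameters such as `country` cannot be combined with `q`)
    params = {'q': city, 'countrycodes': 'fr', 'format': 'json', 'limit': 1}
    info_city = requests.get('https://nominatim.openstreetmap.org/search', params=params, headers=headers)
    gps_data = info_city.json()
    if gps_data == []:
        list_lat.append(None)
        list_lon.append(None)
        print("Info for {} 'null' ".format(city))
    else:
        list_lat.append(gps_data[0]['lat'])
        list_lon.append(gps_data[0]['lon'])
        print("Info for {} ok".format(city))
    # Stay within the public server's limit of one request per second
    time.sleep(1)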

Info for Mont Saint Michel ok
Info for St Malo ok
Info for Bayeux ok
Info for Le Havre ok
Info for Rouen ok
Info for Paris ok
Info for Amiens ok
Info for Lille ok
Info for Strasbourg ok
Info for Chateau du Haut Koenigsbourg ok
Info for Colmar ok
Info for Eguisheim ok
Info for Besancon ok
Info for Dijon ok
Info for Annecy ok
Info for Grenoble ok
Info for Lyon ok
Info for Gorges du Verdon ok
Info for Bormes les Mimosas ok
Info for Cassis ok
Info for Marseille ok
Info for Aix en Provence ok
Info for Avignon ok
Info for Uzes ok
Info for Nimes ok
Info for Aigues Mortes ok
Info for Saintes Maries de la mer ok
Info for Collioure ok
Info for Carcassonne ok
Info for Ariege ok
Info for Toulouse ok
Info for Montauban ok
Info for Biarritz ok
Info for Bayonne ok
Info for La Rochelle ok
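
Nominatim returns coordinates as strings and an empty list when nothing matches, so both cases can be handled in a small helper. This is a sketch only: the function name `parse_coordinates` is hypothetical, and the sample values are copied from the Ariege response shown earlier.

```python
def parse_coordinates(results):
    """Return (lat, lon) as floats from a Nominatim JSON result list, or None if empty."""
    if not results:
        return None
    first = results[0]
    # Nominatim returns coordinates as strings, so convert them explicitly
    return float(first["lat"]), float(first["lon"])


# Values copied from the Ariege response shown earlier
sample = [{"lat": "42.9455368", "lon": "1.4065544156065486"}]
print(parse_coordinates(sample))
print(parse_coordinates([]))
```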


In [10]:
print("List Lat :", list_lat)
print()
print("Len list lat:", len(list_lat))
print()
print("List Long :", list_lon)
print()
print("Len list lon:", len(list_lon))

List Lat : ['48.6359541', '48.649518', '49.2764624', '49.4938975', '49.4404591', '48.8588897', '49.8941708', '50.6365654', '48.584614', '48.249489800000006', '48.0777517', '48.0447968', '47.2380222', '47.3215806', '45.8992348', '45.1875602', '45.7578137', '43.7496562', '43.1572172', '43.2140359', '43.2961743', '43.5298424', '43.9492493', '44.0121279', '43.8374249', '43.5658225', '43.4522771', '42.52505', '43.2130358', '42.9455368', '43.6044622', '44.0175835', '43.4832523', '43.4933379', '46.1591126']

Len list lat: 35

List Long : ['-1.511459954959514', '-2.0260409', '-0.7024738', '0.1079732', '1.0939658', '2.3200410217200766', '2.2956951', '3.0635282', '7.7507127', '7.34429620253195', '7.3579641', '7.3079618', '6.0243622', '5.0414701', '6.1288847', '5.7357819', '4.8320114', '6.3285616', '6.329253867921363', '5.5396318', '5.3699525', '5.4474738', '4.8059012', '4.4196718', '4.3600687', '4.1912837', '4.4287172', '3.0831554', '2.3491069', '1.4065544156065486', '1.4442469', '1.3549991', '-

In [11]:
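# Create columns in df for the coordinates of each city.
# pd.to_numeric converts the strings returned by the API to floats and
# avoids the chained assignment df['Latitude'][i] = ..., which pandas warns about.
df['Latitude'] = pd.to_numeric(list_lat, errors='coerce')
df['Longitude'] = pd.to_numeric(list_lon, errors='coerce')

display(df)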

Unnamed: 0,City,Latitude,Longitude
0,Mont Saint Michel,48.635954,-1.51146
1,St Malo,48.649518,-2.026041
2,Bayeux,49.276462,-0.702474
3,Le Havre,49.493898,0.107973
4,Rouen,49.440459,1.093966
5,Paris,48.85889,2.320041
6,Amiens,49.894171,2.295695
7,Lille,50.636565,3.063528
8,Strasbourg,48.584614,7.750713
9,Chateau du Haut Koenigsbourg,48.24949,7.344296


In [12]:
df['Latitude'] = df['Latitude'].astype(float)
df['Longitude'] = df['Longitude'].astype(float)

In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35 entries, 0 to 34
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   City       35 non-null     object 
 1   Latitude   35 non-null     float64
 2   Longitude  35 non-null     float64
dtypes: float64(2), object(1)
memory usage: 968.0+ bytes


In [14]:
# extract df in csv file
df.to_csv('src/cities_coordgps.csv',index=False)

### II. Get the weather of the top 35 cities from the openweathermap.org API

In [15]:
# For security and anonymization reasons, the key is not stored in the notebook.
# It is read from an environment variable (the variable name is an assumption).
API_key = os.environ["OPENWEATHER_API_KEY"]

In [16]:
paris_coord = [df.Latitude[5],df.Longitude[5]]
paris_coord

[48.8588897, 2.3200410217200766]

In [16]:
## test to get weather for one destination
onecity_test = requests.get(f"https://api.openweathermap.org/data/2.5/onecall?lat={df.Latitude[5]}&lon={df.Longitude[5]}&exclude=hourly,current,minutely&appid={API_key}&units=metric&lang=fr")
onecity_test = onecity_test.json()
onecity_test

{'lat': 48.8589,
 'lon': 2.32,
 'timezone': 'Europe/Paris',
 'timezone_offset': 3600,
 'daily': [{'dt': 1644753600,
   'sunrise': 1644735770,
   'sunset': 1644772034,
   'moonrise': 1644758760,
   'moonset': 1644730860,
   'moon_phase': 0.4,
   'temp': {'day': 8.52,
    'min': 2.23,
    'max': 11.55,
    'night': 11.08,
    'eve': 11.28,
    'morn': 2.23},
   'feels_like': {'day': 5.39, 'night': 9.89, 'eve': 10.11, 'morn': -1.07},
   'pressure': 1012,
   'humidity': 72,
   'dew_point': 3.77,
   'wind_speed': 7.99,
   'wind_deg': 189,
   'wind_gust': 18.5,
   'weather': [{'id': 801,
     'main': 'Clouds',
     'description': 'peu nuageux',
     'icon': '02d'}],
   'clouds': 20,
   'pop': 0,
   'uvi': 1.22},
  {'dt': 1644840000,
   'sunrise': 1644822068,
   'sunset': 1644858534,
   'moonrise': 1644849000,
   'moonset': 1644819780,
   'moon_phase': 0.43,
   'temp': {'day': 11.22,
    'min': 6.39,
    'max': 11.68,
    'night': 6.39,
    'eve': 8.56,
    'morn': 8.54},
   'feels_like': {'d

In [17]:
# Get the daily forecast for each city and keep the fields listed in keep_keys
df_weather = []
days = 8

for index in range(df.shape[0]):
    city = df.City[index]
    lat = df.Latitude[index]
    lon = df.Longitude[index]
    # The response already contains every forecast day, so request it once per city
    r = requests.get(f"https://api.openweathermap.org/data/2.5/onecall?lat={lat}&lon={lon}&exclude=hourly,current,minutely&appid={API_key}&units=metric").json()
    d = r['daily']
    for j in range(days):
        day = j
        date = datetime.fromtimestamp(d[j]['dt']).strftime('%d/%m/%Y')
        tem_day = d[j]['temp']['day']
        feelslike_day = d[j]['feels_like']['day']
        pressure = d[j]['pressure']
        humidity = d[j]['humidity']
        wind_speed = d[j]['wind_speed']
        weather_main = d[j]['weather'][0]['main']
        weather_desc = d[j]['weather'][0]['description']
        prob_rain = d[j]['pop']
        clouds = d[j]['clouds']
        uvi = d[j]['uvi']
        df_weather.append([city, lat, lon, day, date, tem_day, feelslike_day, pressure, humidity, wind_speed, weather_main, weather_desc, prob_rain, clouds, uvi])
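
Without a `tz` argument, `datetime.fromtimestamp` converts the `dt` field using the time zone of the machine running the notebook. One possible alternative, sketched below, uses the `timezone_offset` field present in the API response so that dates are expressed in the city's own time zone. The function name `local_date` is an assumption.

```python
from datetime import datetime, timedelta, timezone


def local_date(dt_utc, timezone_offset):
    """Format a Unix timestamp as dd/mm/YYYY in the location's own time zone."""
    tz = timezone(timedelta(seconds=timezone_offset))
    return datetime.fromtimestamp(dt_utc, tz=tz).strftime("%d/%m/%Y")


# `dt` and `timezone_offset` copied from the Paris test response shown earlier
print(local_date(1644753600, 3600))
```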

In [18]:
# Create a dataframe with df_weather
keep_keys = ["city", "Latitude", "Longitude", "day", "date", "tem_day", "feelslike_day", "pressure", "humidity", "wind_speed", "weather_main", "weather_desc", "prob_rain", "clouds", "uvi"]
df_weather_35 = pd.DataFrame(df_weather, columns=keep_keys)
df_weather_35

Unnamed: 0,city,Latitude,Longitude,day,date,tem_day,feelslike_day,pressure,humidity,wind_speed,weather_main,weather_desc,prob_rain,clouds,uvi
0,Mont Saint Michel,48.635954,-1.511460,0,22/03/2022,14.08,13.01,1026,56,7.73,Rain,light rain,0.22,2,3.41
1,Mont Saint Michel,48.635954,-1.511460,1,23/03/2022,13.40,12.00,1029,46,6.22,Clouds,broken clouds,0.00,55,3.42
2,Mont Saint Michel,48.635954,-1.511460,2,24/03/2022,16.13,15.21,1028,54,4.25,Clouds,few clouds,0.00,14,3.50
3,Mont Saint Michel,48.635954,-1.511460,3,25/03/2022,16.21,15.17,1027,49,5.35,Clear,clear sky,0.00,0,4.26
4,Mont Saint Michel,48.635954,-1.511460,4,26/03/2022,15.56,14.58,1027,54,6.39,Clear,clear sky,0.00,0,4.24
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
275,La Rochelle,46.159113,-1.152043,3,25/03/2022,14.31,13.23,1024,55,4.84,Clouds,overcast clouds,0.00,100,3.65
276,La Rochelle,46.159113,-1.152043,4,26/03/2022,15.36,14.57,1023,62,7.32,Clear,clear sky,0.00,7,4.80
277,La Rochelle,46.159113,-1.152043,5,27/03/2022,14.93,14.18,1024,65,5.34,Clear,clear sky,0.00,0,4.46
278,La Rochelle,46.159113,-1.152043,6,28/03/2022,16.06,15.26,1017,59,5.63,Clouds,few clouds,0.00,16,5.00


In [7]:
df_weather_35.to_csv('src/cities_weather.csv',index=False)

In [5]:
df_weather_35 = pd.read_csv('src/cities_weather.csv')
df_weather_35.head()

Unnamed: 0,city,Latitude,Longitude,day,date,tem_day,feelslike_day,pressure,humidity,wind_speed,weather_main,weather_desc,prob_rain,clouds,uvi
0,Mont Saint Michel,48.635954,-1.51146,0,22/03/2022,14.08,13.01,1026,56,7.73,Rain,light rain,0.22,2,3.41
1,Mont Saint Michel,48.635954,-1.51146,1,23/03/2022,13.4,12.0,1029,46,6.22,Clouds,broken clouds,0.0,55,3.42
2,Mont Saint Michel,48.635954,-1.51146,2,24/03/2022,16.13,15.21,1028,54,4.25,Clouds,few clouds,0.0,14,3.5
3,Mont Saint Michel,48.635954,-1.51146,3,25/03/2022,16.21,15.17,1027,49,5.35,Clear,clear sky,0.0,0,4.26
4,Mont Saint Michel,48.635954,-1.51146,4,26/03/2022,15.56,14.58,1027,54,6.39,Clear,clear sky,0.0,0,4.24


In [6]:
df_weather_35.weather_main.value_counts()

Clear     137
Clouds    108
Rain       35
Name: weather_main, dtype: int64

In [6]:
# Select best cities by weather_main and tem_day
top_cities = df_weather_35[df_weather_35["weather_main"]=="Clear"].sort_values("tem_day", ascending=False).head(5)
list_best_cities = top_cities['city'].unique()

In [7]:
print("According to weather_main = Clear and tem_day, the 5 best cities over the 8 forecast days are: ", list_best_cities)

According to weather_main = Clear and tem_day, the 5 best cities over the 8 forecast days are:  ['Lyon' 'Avignon' 'Eguisheim' 'Bayonne' 'Grenoble']
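
Taking the 5 warmest "Clear" rows can return the same city more than once, in which case fewer than five distinct cities are selected. A possible alternative is to aggregate per city first. The sketch below does this in plain Python on rows shaped like `df_weather_35.to_dict("records")`. The ranking rule (number of clear days, then mean `tem_day`) and the name `rank_cities` are assumptions, not the method used in this notebook.

```python
from collections import defaultdict


def rank_cities(rows, top_n=5):
    """Rank cities by number of 'Clear' days, then by mean daytime temperature."""
    clear_days = defaultdict(int)
    temps = defaultdict(list)
    for row in rows:
        temps[row["city"]].append(row["tem_day"])
        if row["weather_main"] == "Clear":
            clear_days[row["city"]] += 1
    ranked = sorted(
        temps,
        key=lambda c: (clear_days[c], sum(temps[c]) / len(temps[c])),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical rows for illustration (same keys as df_weather_35)
sample_rows = [
    {"city": "A", "tem_day": 15.0, "weather_main": "Clear"},
    {"city": "A", "tem_day": 17.0, "weather_main": "Clouds"},
    {"city": "B", "tem_day": 12.0, "weather_main": "Clear"},
    {"city": "B", "tem_day": 14.0, "weather_main": "Clear"},
    {"city": "C", "tem_day": 20.0, "weather_main": "Rain"},
]
print(rank_cities(sample_rows, top_n=2))
```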


In [112]:
# Map of weather of the 35 cities, according to temperature of the day and the main weather.
fig = px.scatter_mapbox(df_weather_35, 
                        lat = 'Latitude', 
                        lon = 'Longitude', 
                        size = 'tem_day', 
                        mapbox_style = 'carto-positron', 
                        color = 'weather_main', 
                        zoom=4.5)
fig.show(renderer='iframe_connected')

In [113]:
# Map of the 5 best cities with temperature of the day and main weather
fig = px.scatter_mapbox(top_cities, 
                        lat = 'Latitude', 
                        lon = 'Longitude', 
                        size = 'tem_day', 
                        mapbox_style = 'carto-positron', 
                        color = 'weather_main', 
                        zoom=4.5)
fig.show(renderer='iframe_connected')

### Get hotels' info about each destination
### III. Scraping booking.com

In [23]:
class BookingSpider(scrapy.Spider):

    #Name of the spider
    name = "booking" 
    
    # List of URLs by cities
    list_url = []
    list_cities = list_best_cities
    for i in range(0, len(list_cities)):
        list_url.append("https://www.booking.com/searchresults.fr.html?ss={}%2C%20france".format(list_cities[i]))
    
    # Starting URL
    start_urls = list_url
    
    # Parse function for request
    def parse(self, response):
        for hotel in response.css('div._fe1927d9e._0811a1b54._a8a1be610._022ee35ec.b9c27d6646.fb3c4512b4.fc21746a73'):
            hotel_url = hotel.css('a::attr(href)').get()

            # .get() returns None instead of raising when nothing matches,
            # so skip result cards without a link to a hotel page
            if hotel_url is None:
                print('No page found for : ', hotel)
                continue

            yield scrapy.Request(
                url = hotel_url,
                callback=self.hotel_page,
                meta={
                    'hotel_name': hotel.css('div.fde444d7ef._c445487e2::text').get(),
                    'hotel_url': hotel_url,
                    'hotel_describe': hotel.css('div._4abc4c3d5::text').get(),
                    'hotel_ranking': hotel.css('div._9c5f726ff.bd528f9ea6::text').get(),
                    'hotel_city': hotel.css('span.af1ddfc958.eba89149fb::text').get()
                }
            )
                           
            
    # Callback used after having hotel_url
    def hotel_page (self,response):
        
        hotel_gps_coord = response.css('a#hotel_header').attrib['data-atlas-latlng'].split(",")
        
        output ={
            'hotel_name':response.meta.get('hotel_name'),
            'hotel_url':response.meta.get('hotel_url'),
            'hotel_describe':response.meta.get('hotel_describe'),
            'hotel_ranking':response.meta.get('hotel_ranking'),
            'hotel_city':response.meta.get('hotel_city'),
            'hotel_lat':hotel_gps_coord[0],
            'hotel_lon':hotel_gps_coord[1]    
        }
        return output
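
The `hotel_page` callback assumes that the `data-atlas-latlng` attribute is present and holds a `lat,lon` string. A more defensive way to parse it might look like the sketch below. The helper name `parse_latlng` is hypothetical, and returning `(None, None)` for malformed input is a design choice made here, not part of the spider above.

```python
def parse_latlng(raw):
    """Split a 'lat,lon' string into two floats; return (None, None) if malformed."""
    if not raw:
        return None, None
    parts = raw.split(",")
    if len(parts) != 2:
        return None, None
    try:
        return float(parts[0]), float(parts[1])
    except ValueError:
        return None, None


# Sample coordinates written in the 'lat,lon' layout that split(",") assumes
print(parse_latlng("45.746083,4.837187"))
print(parse_latlng(None))
```

Inside the spider, the raw value could be read with `response.css('a#hotel_header').attrib.get('data-atlas-latlng')` so that a missing attribute yields `None` rather than a `KeyError`.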

In [24]:
# Name of the file where the results will be saved
filename = "list_hotel_cities.json"

# If file already exists, delete it before crawling (because Scrapy will 
# concatenate the last and new results otherwise)
if filename in os.listdir('src/'):
        os.remove('src/' + filename)

# Declare a new CrawlerProcess with some settings
## USER_AGENT => Simulates a browser on an OS
## LOG_LEVEL => Minimal Level of Log 
## FEEDS => Where the file will be stored 
process = CrawlerProcess(settings = {
    'USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
    'LOG_LEVEL': logging.INFO,
    'AUTOTHROTTLE_ENABLED': True,
    "FEEDS": {
        'src/' + filename : {"format": "json"},
    }
})

# Start the crawling using the spider you defined above
process.crawl(BookingSpider)
process.start()

2022-03-22 21:37:25 [scrapy.utils.log] INFO: Scrapy 2.6.1 started (bot: scrapybot)
2022-03-22 21:37:25 [scrapy.utils.log] INFO: Versions: lxml 4.8.0.0, libxml2 2.9.12, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 22.2.0, Python 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) - [GCC 9.4.0], pyOpenSSL 22.0.0 (OpenSSL 1.1.1l  24 Aug 2021), cryptography 36.0.1, Platform Linux-5.4.170+-x86_64-with-glibc2.31
2022-03-22 21:37:25 [scrapy.crawler] INFO: Overridden settings:
{'AUTOTHROTTLE_ENABLED': True,
 'LOG_LEVEL': 20,
 'USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
               'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 '
               'Safari/537.36'}
2022-03-22 21:37:25 [scrapy.extensions.telnet] INFO: Telnet Password: 90469dd1987b2f50
2022-03-22 21:37:26 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage'

In [114]:
df_hotels = pd.read_json('src/list_hotel_cities.json')
df_hotels

Unnamed: 0,hotel_name,hotel_url,hotel_describe,hotel_ranking,hotel_city,hotel_lat,hotel_lon
0,MEININGER Hotel Lyon Centre Berthelot,https://www.booking.com/hotel/fr/meininger-lyo...,"Installé à Lyon, le MEININGER Hotel Lyon Centr...",84,"7e arr., Lyon",45.746083,4.837187
1,ibis Lyon Gerland Musée des Confluences,https://www.booking.com/hotel/fr/ibis-lyon-ger...,"Installé dans le sud de Lyon, sur les rives du...",79,"7e arr., Lyon",45.733325,4.822880
2,19Sisley - Calme & Cosy - 3CH 8P Metro Parking x2,https://www.booking.com/hotel/fr/19sisley.fr.h...,Hébergement géré par un particulier,83,"3e arr., Lyon",45.750615,4.868686
3,La Résidence,https://www.booking.com/hotel/fr/laresidenlyon...,Situé dans une rue piétonne de la presqu'île d...,80,"2e arr., Lyon",45.755278,4.830482
4,La Casa Jungle Bed & Spa - Pentes de la Croix ...,https://www.booking.com/hotel/fr/la-casa-jungl...,"Doté d'une baignoire spa, l'établissement La C...",84,"1er arr., Lyon",45.771222,4.835430
...,...,...,...,...,...,...,...
120,Best Western Hotel du Pont Wilson,https://www.booking.com/hotel/fr/lyon-wilson.f...,"Installé dans le centre-ville de Lyon, le Best...",83,"3e arr., Lyon",45.758431,4.841479
121,Aparthotel Adagio Lyon Patio Confluence,https://www.booking.com/hotel/fr/quality-suite...,L’Aparthotel Adagio Lyon Patio Confluence prop...,86,"2e arr., Lyon",45.745253,4.822816
122,Greet Hotel Lyon Confluence,https://www.booking.com/hotel/fr/greet-hotel-l...,"Doté d'une terrasse, d'un restaurant et d'un b...",80,"2e arr., Lyon",45.748458,4.827753
123,Hotel des Savoies Lyon Perrache,https://www.booking.com/hotel/fr/hoteldessavoi...,Situé dans le quartier de la Presqu&#39;île de...,75,"2e arr., Lyon",45.749736,4.829850


In [115]:
df_hotels.hotel_city.value_counts()

Eguisheim                          25
Bayonne                            25
Grenoble                           17
Centre-ville d'Avignon, Avignon    15
Avignon                            10
2e arr., Lyon                       9
Grenoble City Centre, Grenoble      8
3e arr., Lyon                       7
7e arr., Lyon                       4
1er arr., Lyon                      3
6e arr., Lyon                       1
5e arr., Lyon                       1
Name: hotel_city, dtype: int64

In [116]:
# Cleaning dataset: keep only the city name, i.e. the text after the last ", "
# ("Centre-ville d'Avignon, Avignon" -> "Avignon", "2e arr., Lyon" -> "Lyon")
df_hotels["hotel_city"] = df_hotels["hotel_city"].str.split(", ").str[-1]

In [117]:
df_hotels.hotel_city.value_counts()

Lyon         25
Avignon      25
Eguisheim    25
Bayonne      25
Grenoble     25
Name: hotel_city, dtype: int64

In [118]:
df_hotels.to_csv('src/list_hotels.csv',index=False)

In [3]:
hotels = pd.read_csv('src/list_hotels.csv')
hotels

Unnamed: 0,hotel_name,hotel_url,hotel_describe,hotel_ranking,hotel_city,hotel_lat,hotel_lon
0,MEININGER Hotel Lyon Centre Berthelot,https://www.booking.com/hotel/fr/meininger-lyo...,"Installé à Lyon, le MEININGER Hotel Lyon Centr...",84,Lyon,45.746083,4.837187
1,ibis Lyon Gerland Musée des Confluences,https://www.booking.com/hotel/fr/ibis-lyon-ger...,"Installé dans le sud de Lyon, sur les rives du...",79,Lyon,45.733325,4.822880
2,19Sisley - Calme & Cosy - 3CH 8P Metro Parking x2,https://www.booking.com/hotel/fr/19sisley.fr.h...,Hébergement géré par un particulier,83,Lyon,45.750615,4.868686
3,La Résidence,https://www.booking.com/hotel/fr/laresidenlyon...,Situé dans une rue piétonne de la presqu'île d...,80,Lyon,45.755278,4.830482
4,La Casa Jungle Bed & Spa - Pentes de la Croix ...,https://www.booking.com/hotel/fr/la-casa-jungl...,"Doté d'une baignoire spa, l'établissement La C...",84,Lyon,45.771222,4.835430
...,...,...,...,...,...,...,...
120,Best Western Hotel du Pont Wilson,https://www.booking.com/hotel/fr/lyon-wilson.f...,"Installé dans le centre-ville de Lyon, le Best...",83,Lyon,45.758431,4.841479
121,Aparthotel Adagio Lyon Patio Confluence,https://www.booking.com/hotel/fr/quality-suite...,L’Aparthotel Adagio Lyon Patio Confluence prop...,86,Lyon,45.745253,4.822816
122,Greet Hotel Lyon Confluence,https://www.booking.com/hotel/fr/greet-hotel-l...,"Doté d'une terrasse, d'un restaurant et d'un b...",80,Lyon,45.748458,4.827753
123,Hotel des Savoies Lyon Perrache,https://www.booking.com/hotel/fr/hoteldessavoi...,Situé dans le quartier de la Presqu&#39;île de...,75,Lyon,45.749736,4.829850


In [4]:
print("Percentage of missing values : ")
display((hotels.isnull().sum()/hotels.shape[0]*100).sort_values(ascending=False))

Percentage of missing values : 


hotel_ranking     2.4
hotel_name        0.0
hotel_url         0.0
hotel_describe    0.0
hotel_city        0.0
hotel_lat         0.0
hotel_lon         0.0
dtype: float64

In [5]:
# Drop the hotels that have no score
hotels = hotels.dropna(subset=["hotel_ranking"])
hotels.shape

(122, 7)

In [6]:
print("Percentage of missing values : ")
display((hotels.isnull().sum()/hotels.shape[0]*100).sort_values(ascending=False))

Percentage of missing values : 


hotel_name        0.0
hotel_url         0.0
hotel_describe    0.0
hotel_ranking     0.0
hotel_city        0.0
hotel_lat         0.0
hotel_lon         0.0
dtype: float64

In [7]:
hotels.hotel_city.value_counts()

Lyon         25
Avignon      25
Grenoble     25
Eguisheim    24
Bayonne      23
Name: hotel_city, dtype: int64

In [8]:
fig = px.scatter_mapbox(hotels[hotels["hotel_city"] == 'Eguisheim'], 
                        lat = "hotel_lat", 
                        lon = "hotel_lon", 
                        color ="hotel_ranking",
                        hover_name="hotel_name", 
                        mapbox_style="carto-positron",
                        zoom=12)
fig.show(renderer='iframe_connected')

In [9]:
hotels[hotels["hotel_city"]=='Lyon'].sort_values('hotel_ranking', ascending=False)[0:5]

Unnamed: 0,hotel_name,hotel_url,hotel_describe,hotel_ranking,hotel_city,hotel_lat,hotel_lon
119,Appartement Lyon Centre Confluence 100 m2 Park...,https://www.booking.com/hotel/fr/appartement-l...,Hébergement géré par un particulier,92,Lyon,45.746386,4.824233
11,"Hotel De Verdun 1882, BW Signature Collection",https://www.booking.com/hotel/fr/de-verdun-lyo...,"L'Hotel De Verdun 1882, BW Signature Collectio...",91,Lyon,45.749588,4.829767
12,La Maison Debourg,https://www.booking.com/hotel/fr/la-maison-de-...,Hébergement géré par un particulier,90,Lyon,45.765057,4.828032
121,Aparthotel Adagio Lyon Patio Confluence,https://www.booking.com/hotel/fr/quality-suite...,L’Aparthotel Adagio Lyon Patio Confluence prop...,86,Lyon,45.745253,4.822816
113,DIFY Roi Lyon - Hotel de Ville,https://www.booking.com/hotel/fr/dify-roi-lyon...,"Situé au cœur de Lyon, à environ 2,1 km du mus...",85,Lyon,45.771474,4.833744


In [12]:
cities = ['Lyon','Avignon','Bayonne','Eguisheim','Grenoble']
for index, city in enumerate(cities):
    print(index)
    print(city)
    # Keep the 20 best-rated hotels of the city
    data = hotels[hotels["hotel_city"] == city].sort_values('hotel_ranking', ascending=False).head(20)
    display(data)
    fig = px.scatter_mapbox(data, 
                            lat="hotel_lat", 
                            lon="hotel_lon", 
                            color="hotel_ranking",
                            hover_name="hotel_name", 
                            mapbox_style="carto-positron",
                            zoom=12)
    fig.update_layout(title='Top 20 hotels in {}'.format(city))
    fig.show(renderer='iframe_connected')

0
Lyon


Unnamed: 0,hotel_name,hotel_url,hotel_describe,hotel_ranking,hotel_city,hotel_lat,hotel_lon
119,Appartement Lyon Centre Confluence 100 m2 Park...,https://www.booking.com/hotel/fr/appartement-l...,Hébergement géré par un particulier,92,Lyon,45.746386,4.824233
11,"Hotel De Verdun 1882, BW Signature Collection",https://www.booking.com/hotel/fr/de-verdun-lyo...,"L'Hotel De Verdun 1882, BW Signature Collectio...",91,Lyon,45.749588,4.829767
12,La Maison Debourg,https://www.booking.com/hotel/fr/la-maison-de-...,Hébergement géré par un particulier,90,Lyon,45.765057,4.828032
121,Aparthotel Adagio Lyon Patio Confluence,https://www.booking.com/hotel/fr/quality-suite...,L’Aparthotel Adagio Lyon Patio Confluence prop...,86,Lyon,45.745253,4.822816
113,DIFY Roi Lyon - Hotel de Ville,https://www.booking.com/hotel/fr/dify-roi-lyon...,"Situé au cœur de Lyon, à environ 2,1 km du mus...",85,Lyon,45.771474,4.833744
0,MEININGER Hotel Lyon Centre Berthelot,https://www.booking.com/hotel/fr/meininger-lyo...,"Installé à Lyon, le MEININGER Hotel Lyon Centr...",84,Lyon,45.746083,4.837187
4,La Casa Jungle Bed & Spa - Pentes de la Croix ...,https://www.booking.com/hotel/fr/la-casa-jungl...,"Doté d'une baignoire spa, l'établissement La C...",84,Lyon,45.771222,4.83543
2,19Sisley - Calme & Cosy - 3CH 8P Metro Parking x2,https://www.booking.com/hotel/fr/19sisley.fr.h...,Hébergement géré par un particulier,83,Lyon,45.750615,4.868686
120,Best Western Hotel du Pont Wilson,https://www.booking.com/hotel/fr/lyon-wilson.f...,"Installé dans le centre-ville de Lyon, le Best...",83,Lyon,45.758431,4.841479
6,Lagrange Aparthotel Lyon Lumière,https://www.booking.com/hotel/fr/lagrange-city...,Installé à mi-chemin entre les stations de mét...,82,Lyon,45.746459,4.868931


1
Avignon


Unnamed: 0,hotel_name,hotel_url,hotel_describe,hotel_ranking,hotel_city,hotel_lat,hotel_lon
105,Le Clos Saluces,https://www.booking.com/hotel/fr/le-clos-saluc...,Le Clos Saluces possède un jardin fleuri aména...,99,Avignon,43.950193,4.810586
90,Les Jardins de Baracane,https://www.booking.com/hotel/fr/les-jardins-d...,"Construit au XVIIème siècle, l'établissement L...",97,Avignon,43.944609,4.809282
107,La Maison Grivolas,https://www.booking.com/hotel/fr/la-maison-gri...,"Installé à Avignon, à 500 mètres du Palais des...",97,Avignon,43.951836,4.812205
91,KAROUBA.31,https://www.booking.com/hotel/fr/karouba-31.fr...,Hébergement géré par un particulier,94,Avignon,43.947607,4.807996
92,Mas Château Blanc Guest House,https://www.booking.com/hotel/fr/mas-chateau-b...,Le Mas Château Blanc propose des chambres d'hô...,94,Avignon,43.968044,4.845778
93,Maison d'hôtes L'îlot bambou,https://www.booking.com/hotel/fr/l-39-ilot-bam...,"Installée à Avignon, la Maison d'hôtes L'îlot ...",93,Avignon,43.95109,4.798019
112,Maison XIXe et Jardin en Intramuros,https://www.booking.com/hotel/fr/maison-xixe-e...,Hébergement géré par un particulier,91,Avignon,43.950344,4.811538
104,CLIMATISATION-Hypercentre-PARKING-COSY CARNOT-...,https://www.booking.com/hotel/fr/hypercentre-c...,Hébergement géré par un particulier,90,Avignon,43.949178,4.810831
13,Les petits poissons,https://www.booking.com/hotel/fr/les-petits-po...,Hébergement géré par un particulier,90,Avignon,43.936195,4.826001
102,Régina Boutique Hotel,https://www.booking.com/hotel/fr/hotelreginaav...,Le Régina Boutique Hotel est situé dans la rue...,87,Avignon,43.948166,4.805765


2
Bayonne


Unnamed: 0,hotel_name,hotel_url,hotel_describe,hotel_ranking,hotel_city,hotel_lat,hotel_lon
56,Péniche DJEBELLE,https://www.booking.com/hotel/fr/peniche-djebe...,"Située à Bayonne, à moins de 2,3 km de la cath...",96,Bayonne,43.496335,-1.473749
59,Parc 709 Bayonne,https://www.booking.com/hotel/fr/parc-709-bayo...,Hébergement géré par un particulier,94,Bayonne,43.495606,-1.481317
54,Appartement au coeur de Bayonne sur les remparts,https://www.booking.com/hotel/fr/5-rue-des-fau...,Hébergement géré par un particulier,94,Bayonne,43.489391,-1.479095
63,Villa la Renaissance,https://www.booking.com/hotel/fr/villa-la-rena...,La Villa la Renaissance propose des chambres à...,92,Bayonne,43.482321,-1.468164
41,Baionakoa Résidence,https://www.booking.com/hotel/fr/baionakoa-res...,"Situé à Bayonne, à 400 mètres de la cathédrale...",90,Bayonne,43.489828,-1.47693
45,Bayonne en plein cœur Centre Historique 2 cham...,https://www.booking.com/hotel/fr/bayonne-en-pl...,Hébergement géré par un particulier,90,Bayonne,43.48907,-1.475523
62,Hôtel Villa KOEGUI Bayonne,https://www.booking.com/hotel/fr/villa-koegui-...,"Doté d’un bar, d’un restaurant, d’une terrasse...",89,Bayonne,43.492371,-1.472588
52,Boutique Hôtel Un Appart en Ville,https://www.booking.com/hotel/fr/un-appart-en-...,Situé à 200 mètres de la cathédrale Sainte-Mar...,88,Bayonne,43.490923,-1.477441
60,Appart'Hôtel Bellevue,https://www.booking.com/hotel/fr/appart-39-bel...,"Situé dans un jardin, l'Appart'Hôtel Bellevue ...",87,Bayonne,43.505794,-1.456088
43,Hôtel des Basses Pyrénées - Bayonne,https://www.booking.com/hotel/fr/des-basses-py...,L'Hôtel des Basses Pyrénées - Bayonne est situ...,85,Bayonne,43.488411,-1.477143


3
Eguisheim


Unnamed: 0,hotel_name,hotel_url,hotel_describe,hotel_ranking,hotel_city,hotel_lat,hotel_lon
87,GITE LE COQ ROUGE,https://www.booking.com/hotel/fr/gite-le-coq-r...,Hébergement géré par un particulier,98,Eguisheim,48.041709,7.305693
85,La Grange de Madeleine,https://www.booking.com/hotel/fr/la-grange-de-...,La Grange de Madeleine est située à Eguisheim....,98,Eguisheim,48.041783,7.306547
86,Fleur de Vigne,https://www.booking.com/hotel/fr/fleur-de-vign...,Hébergement géré par un particulier,96,Eguisheim,48.046113,7.305163
83,Gîte au château fleuri,https://www.booking.com/hotel/fr/gite-au-chate...,Hébergement géré par un particulier,96,Eguisheim,48.042875,7.30672
74,Gîte Nature à Eguisheim***,https://www.booking.com/hotel/fr/gite-nature-a...,Hébergement géré par un particulier,95,Eguisheim,48.042707,7.304425
68,Les Epicuriens du Rempart,https://www.booking.com/hotel/fr/maison-du-rem...,Hébergement géré par un particulier,94,Eguisheim,48.041847,7.305535
65,Gite Le Petit Malsbach Eguisheim,https://www.booking.com/hotel/fr/gite-le-petit...,Hébergement géré par un particulier,94,Eguisheim,48.042241,7.312705
70,Les chambres du domaine,https://www.booking.com/hotel/fr/les-chambres-...,Hébergement géré par un particulier,93,Eguisheim,48.042963,7.307216
76,La Maison du Rempart,https://www.booking.com/hotel/fr/la-maison-du-...,Hébergement géré par un particulier,93,Eguisheim,48.041722,7.305604
69,Au pied des remparts à Eguisheim,https://www.booking.com/hotel/fr/elsass-design...,Doté d'une connexion Wi-Fi gratuite et offrant...,92,Eguisheim,48.043006,7.304316


4
Grenoble


Unnamed: 0,hotel_name,hotel_url,hotel_describe,hotel_ranking,hotel_city,hotel_lat,hotel_lon
25,Hôtel Victoria,https://www.booking.com/hotel/fr/victoria-gren...,"Situé à Grenoble, à 1,1 km du WTC Grenoble, l'...",88,Grenoble,45.187494,5.721472
39,Esprit Bistrot / Rent4night Grenoble,https://www.booking.com/hotel/fr/esprit-bistro...,Hébergement géré par un particulier,87,Grenoble,45.185432,5.743996
33,Le Grand Hôtel Grenoble,https://www.booking.com/hotel/fr/le-grand-gren...,Le Grand Hôtel Grenoble est un établissement 4...,87,Grenoble,45.190815,5.728548
20,Mon petit jardin de ville,https://www.booking.com/hotel/fr/mon-petit-jar...,Hébergement géré par un particulier,85,Grenoble,45.184922,5.707256
22,1924 Hôtel,https://www.booking.com/hotel/fr/royal-grenobl...,"Doté d’une connexion Wi-Fi gratuite, le 1924 H...",85,Grenoble,45.189285,5.71862
17,Le Hüb - Grenoble,https://www.booking.com/hotel/fr/le-hub-grenob...,Le Hüb - Grenoble vous accueille à 300 mètres ...,84,Grenoble,45.192873,5.711917
34,Hôtel de l'Europe Grenoble hyper-centre,https://www.booking.com/hotel/fr/de-l-europe-g...,"Situé au cœur de Grenoble, l'Hôtel de l'Europe...",84,Grenoble,45.190357,5.727331
40,Maison Barbillon Grenoble,https://www.booking.com/hotel/fr/maison-barbil...,La Maison Barbillon Grenoble vous accueille à ...,83,Grenoble,45.190793,5.716697
19,Tempologis Grenoble,https://www.booking.com/hotel/fr/tempologis-gr...,Le Tempologis Grenoble est situé à Grenoble et...,81,Grenoble,45.179273,5.734339
37,Hotel Lux,https://www.booking.com/hotel/fr/hotel-lux.fr....,Proposant des chambres avec une connexion Wi-F...,81,Grenoble,45.189522,5.716226
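
The per-city selection above can also be done in a single pass, without filtering the DataFrame once per city. This is a minimal sketch rather than the notebook's own method: it sorts by city and ranking, then keeps the first `n` rows of each group with `groupby(...).head(n)`. The helper name `top_hotels_per_city` and the `sample` rows (including the placeholder hotel names) are hypothetical.

```python
import pandas as pd


def top_hotels_per_city(hotels: pd.DataFrame, n: int = 20) -> pd.DataFrame:
    """Keep the n best-ranked hotels of every city, best first within each city."""
    ranked = hotels.sort_values(
        ["hotel_city", "hotel_ranking"], ascending=[True, False]
    )
    # head(n) on a groupby keeps the first n rows of each group, in sorted order
    return ranked.groupby("hotel_city", sort=False).head(n)


# Hypothetical stand-in rows with the same column names as the hotels DataFrame
sample = pd.DataFrame({
    "hotel_name": ["A", "B", "C", "D", "E"],
    "hotel_city": ["Lyon", "Lyon", "Lyon", "Avignon", "Avignon"],
    "hotel_ranking": [84, 92, 91, 97, 99],
})

top2 = top_hotels_per_city(sample, n=2)
print(top2)
```

The result is one DataFrame covering all cities, which could then be passed to a single `px.scatter_mapbox` call or split again by `hotel_city` for per-city maps.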
