![Kayak](https://seekvectorlogo.com/wp-content/uploads/2018/01/kayak-vector-logo.png)

# Plan your trip with Kayak 

## Company's description 📇

<a href="https://www.kayak.com" target="_blank">Kayak</a> is a travel search engine that helps users plan their next trip at the best price.

The company was founded in 2004 by Steve Hafner & Paul M. English. After a few rounds of fundraising, Kayak was acquired by <a href="https://www.bookingholdings.com/" target="_blank">Booking Holdings</a> which now holds: 

* <a href="https://booking.com/" target="_blank">Booking.com</a>
* <a href="https://kayak.com/" target="_blank">Kayak</a>
* <a href="https://www.priceline.com/" target="_blank">Priceline</a>
* <a href="https://www.agoda.com/" target="_blank">Agoda</a>
* <a href="https://Rentalcars.com/" target="_blank">RentalCars</a>
* <a href="https://www.opentable.com/" target="_blank">OpenTable</a>

With over \$300 million in revenue a year, Kayak operates in almost all countries and all languages to help its users book trips across the globe. 

## Project 🚧

The marketing team needs help on a new project. After doing some user research, the team discovered that **70% of their users who are planning a trip would like to have more information about the destination they are going to**. 

In addition, user research shows that **people tend to distrust the information they read if they don't know the brand** that produced the content. 

Therefore, the Kayak marketing team would like to create an application that recommends where people should plan their next holiday. The application should be based on real data about:

* Weather 
* Hotels in the area 

The application should then be able to recommend the best destinations and hotels based on the above variables at any given time. 

## Goals 🎯

As the project has just started, your team doesn't have any data that can be used to create this application. Therefore, your job will be to: 

* Scrape data from destinations 
* Get weather data from each destination 
* Get hotels' info about each destination
* Store all the information above in a data lake
* Extract, transform and load cleaned data from your data lake to a data warehouse

## Scope of this project 🖼️

The marketing team wants to focus first on the best cities to travel to in France. According to <a href="https://one-week-in.com/35-cities-to-visit-in-france/" target="_blank">One Week In.com</a>, here are the top 35 cities to visit in France: 

```python 
["Mont Saint Michel",
"St Malo",
"Bayeux",
"Le Havre",
"Rouen",
"Paris",
"Amiens",
"Lille",
"Strasbourg",
"Chateau du Haut Koenigsbourg",
"Colmar",
"Eguisheim",
"Besancon",
"Dijon",
"Annecy",
"Grenoble",
"Lyon",
"Gorges du Verdon",
"Bormes les Mimosas",
"Cassis",
"Marseille",
"Aix en Provence",
"Avignon",
"Uzes",
"Nimes",
"Aigues Mortes",
"Saintes Maries de la mer",
"Collioure",
"Carcassonne",
"Ariege",
"Toulouse",
"Montauban",
"Biarritz",
"Bayonne",
"La Rochelle"]
```

Your team should focus **only on the above cities for your project**. 


## Helpers 🦮

To help you achieve this project, here are a few tips:

### Get weather data with an API 

*   Use https://nominatim.org/ to get the GPS coordinates of all the cities (no subscription required). Documentation: https://nominatim.org/release-docs/develop/api/Search/

*   Use https://openweathermap.org/appid (you have to subscribe to get a free API key) and https://openweathermap.org/api/one-call-api to get weather information for the 35 cities, and put it in a DataFrame.

*   Determine the list of cities where the weather will be the nicest within the next 7 days. For example, you can use the values of `daily.pop` and `daily.rain` to compute the expected volume of rain within the next 7 days. This is only an example: you may have a different opinion on what nice weather looks like 😎 Maybe the most important criterion for you is temperature or humidity, so feel free to change the rules!

*   Save all the results in a `.csv` file; you will use it later 😉 You can save any information that seems important to you. Don't forget to save the names of the cities, and to create a column containing a unique identifier (id) for each city (this is important for the rest of the project).

*   Use plotly to display the best destinations on a map.
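
The rain criterion mentioned above could be turned into a single number per city. The following is a minimal sketch, assuming a One Call style response whose `daily` list carries `pop` (probability of precipitation, between 0 and 1) and an optional `rain` volume in mm. The function name `expected_rain_mm` and the sample payload are hypothetical.

```python
def expected_rain_mm(daily, days=7):
    """Sum pop * rain over the first `days` daily forecasts.

    Days without a `rain` key are treated as dry (0 mm).
    """
    total = 0.0
    for day in daily[:days]:
        total += day.get("pop", 0.0) * day.get("rain", 0.0)
    return total


# Hypothetical payload shaped like the `daily` field of a One Call response
sample_daily = [
    {"pop": 0.5, "rain": 4.0},
    {"pop": 0.0},
    {"pop": 1.0, "rain": 2.0},
]

print(expected_rain_mm(sample_daily))
```

A lower value suggests drier weather; sorting the cities by this score is one possible ranking rule among many.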

### Scrape Booking.com 

Since Booking Holdings doesn't have aggregated databases, it will be much faster to scrape data directly from booking.com.

You can scrape as much information as you want, but we suggest that you get at least:

*   Hotel name
*   URL of its booking.com page
*   Its coordinates: latitude and longitude
*   Score given by the website's users
*   Text description of the hotel


### Create your data lake using S3 

Once you have built your dataset, store it in S3 as a CSV file. 
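
As an illustration of the upload step, here is a minimal sketch, assuming the `boto3` S3 client and its `put_object` call. The helper names `upload_csv_text` and `make_s3_client`, the bucket name and the key are placeholders, not part of the project.

```python
def upload_csv_text(s3_client, csv_text, bucket, key):
    """Upload CSV text to S3 through any client exposing `put_object`."""
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=csv_text.encode("utf-8"),
        ContentType="text/csv",
    )
    return f"s3://{bucket}/{key}"


def make_s3_client():
    """Create a real client; requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    return boto3.client("s3")


# Usage (bucket name is a placeholder):
# uri = upload_csv_text(make_s3_client(), df.to_csv(index=False),
#                       "my-kayak-bucket", "weather_infos.csv")
```

Passing the client as a parameter keeps the function easy to exercise with a stub before touching a real bucket.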

### ETL 

Once your data is uploaded to S3, it will be easier for the next data analysis team to extract clean data directly from a data warehouse. Therefore, create a SQL database using AWS RDS, extract your data from S3 and store it in your newly created database. 
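
The load step can be sketched as follows. This is only an illustration: an in-memory SQLite database stands in for the RDS instance, the CSV text is assumed to have already been fetched from S3, and the function name `load_hotels`, the table layout and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3


def load_hotels(conn, csv_text):
    """Insert CSV rows (hotel_url, name, review) into a `hotels` table."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hotels (hotel_url TEXT, name TEXT, review REAL)"
    )
    conn.executemany(
        "INSERT INTO hotels (hotel_url, name, review) VALUES (?, ?, ?)",
        [
            # An empty review cell is stored as NULL
            (row["hotel_url"], row["name"], float(row["review"]) if row["review"] else None)
            for row in rows
        ],
    )
    conn.commit()
    return len(rows)


# Hypothetical sample standing in for the CSV stored in S3
sample_csv = (
    "hotel_url,name,review\n"
    "https://example.com/a,Hotel A,8.8\n"
    "https://example.com/b,Hotel B,\n"
)

conn = sqlite3.connect(":memory:")
print(load_hotels(conn, sample_csv))
print(conn.execute("SELECT name, review FROM hotels ORDER BY name").fetchall())
```

With a real RDS instance, the same idea applies with a connection to the chosen engine instead of SQLite.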

## Deliverable 📬

To complete this project, your team should deliver:

* A `.csv` file in an S3 bucket containing enriched information about weather and hotels for each French city

* A SQL database from which we can get the same cleaned data as in S3 

* Two maps showing the top 5 destinations and the top 20 hotels in the area. You can use plotly or any other library to do so. It should look something like this: 

![Map](https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/Kayak_best_destination_project.png)

## 1. Collecting weather data with an API

In [1]:
import os
import json
import pandas as pd
import requests
import plotly.express as px

In [2]:
# The search API has the following format:
# https://nominatim.openstreetmap.org/search?<params>

endpoint_geo = "https://nominatim.openstreetmap.org/search"

# API key for OpenWeatherMap
# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

# Endpoint:
# - Please, use the endpoint api.openweathermap.org for your API calls
# - Example of API call:
# api.openweathermap.org/data/2.5/weather?q=London,uk&APPID=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

endpoint_weather = "https://api.openweathermap.org/data/2.5/forecast"

list_cities = ["Mont Saint Michel",
"St Malo",
"Bayeux",
"Le Havre",
"Rouen",
"Paris",
"Amiens",
"Lille",
"Strasbourg",
"Chateau du Haut Koenigsbourg",
"Colmar",
"Eguisheim",
"Besancon",
"Dijon",
"Annecy",
"Grenoble",
"Lyon",
"Gorges du Verdon",
"Bormes les Mimosas",
"Cassis",
"Marseille",
"Aix en Provence",
"Avignon",
"Uzes",
"Nimes",
"Aigues Mortes",
"Saintes Maries de la mer",
"Collioure",
"Carcassonne",
"Ariege",
"Toulouse",
"Montauban",
"Biarritz",
"Bayonne",
"La Rochelle"]

In [3]:
options_geo = {
    'city': list_cities[0],
    'country': 'France',
    'format': 'json'
}
r = requests.get(url=endpoint_geo, params=options_geo)
# r.json()[0]

In [4]:
r.json()[0]

{'place_id': 247828266,
 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. http://osm.org/copyright',
 'osm_type': 'way',
 'osm_id': 211285890,
 'lat': '48.6359541',
 'lon': '-1.511459954959514',
 'class': 'place',
 'type': 'islet',
 'place_rank': 20,
 'importance': 0.45543655678157396,
 'addresstype': 'islet',
 'name': 'Mont Saint-Michel',
 'display_name': 'Mont Saint-Michel, Le Mont-Saint-Michel, Avranches, Manche, Normandie, France métropolitaine, 50170, France',
 'boundingbox': ['48.6349172', '48.6370310', '-1.5133292', '-1.5094796']}
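
Nominatim returns `lat` and `lon` as strings, and an empty list when nothing matches. A small helper can make both cases explicit. This is a sketch; the name `extract_coordinates` is hypothetical and the sample reuses values from the response above.

```python
def extract_coordinates(results):
    """Return (lat, lon) as floats from a Nominatim result list, or None if it is empty."""
    if not results:
        return None
    first = results[0]
    return float(first["lat"]), float(first["lon"])


# Trimmed copy of the response shown above
sample_results = [
    {"lat": "48.6359541", "lon": "-1.511459954959514", "name": "Mont Saint-Michel"}
]

print(extract_coordinates(sample_results))
print(extract_coordinates([]))
```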

In [5]:
options_weather = {
    'lat': r.json()[0]['lat'],
    'lon': r.json()[0]['lon'],
    'units': 'metric',
    'appid': 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
}

r_weather = requests.get(url=endpoint_weather, params=options_weather)
r_weather

<Response [200]>

In [6]:
r_weather.headers

{'Server': 'openresty', 'Date': 'Mon, 05 Aug 2024 13:19:30 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Content-Length': '15889', 'Connection': 'keep-alive', 'X-Cache-Key': '/data/2.5/forecast?lat=48.64&lon=-1.51&units=metric', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Methods': 'GET, POST'}

In [7]:
r_weather.json()['list'][-1]['main']

{'temp': 26.51,
 'feels_like': 26.51,
 'temp_min': 26.51,
 'temp_max': 26.51,
 'pressure': 1018,
 'sea_level': 1018,
 'grnd_level': 1013,
 'humidity': 39,
 'temp_kf': 0}

In [8]:
df = pd.DataFrame(columns = ['Ville', 'Latitude', 'Longitude', 'Temperature', 'Humidite'])

In [9]:
df.head()

Unnamed: 0,Ville,Latitude,Longitude,Temperature,Humidite


In [10]:
df.loc[0] = [list_cities[0],
             r.json()[0]['lat'],
             r.json()[0]['lon'],
             r_weather.json()['list'][-1]['main']['temp'],
             r_weather.json()['list'][-1]['main']['humidity']]

In [11]:
df.head()

Unnamed: 0,Ville,Latitude,Longitude,Temperature,Humidite
0,Mont Saint Michel,48.6359541,-1.511459954959514,26.51,39


In [12]:
import time

for i, city in enumerate(list_cities):
    options_geo = {
        'city': city,
        'country': 'France',
        'format': 'json'
    }
    r = requests.get(url=endpoint_geo, params=options_geo)
    print('Ville {} code retour {}'.format(city, r.status_code))
    # Skip the city if the geocoder fails or returns no match (empty list)
    if r.status_code == 200 and r.json():
        dict_city = r.json()[0]
        options_weather = {
            'lat': dict_city['lat'],
            'lon': dict_city['lon'],
            'units': 'metric', # To get degrees Celsius
            'appid': 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
        }
        r_weather = requests.get(url=endpoint_weather, params=options_weather)
        print('Meteo ville {} code retour {}'.format(city, r_weather.status_code))
        if r_weather.status_code == 200:
            meteo_city_in_5days = r_weather.json()['list'][-1]['main'] # Last element of the forecast list
            df.loc[i] = [city,
                         dict_city['lat'],
                         dict_city['lon'],
                         meteo_city_in_5days['temp'],
                         meteo_city_in_5days['humidity']]

    # Pause between requests to stay within one Nominatim request per second
    time.sleep(1)

Ville Mont Saint Michel code retour 200
Meteo ville Mont Saint Michel code retour 200
Ville St Malo code retour 200
Meteo ville St Malo code retour 200
Ville Bayeux code retour 200
Meteo ville Bayeux code retour 200
Ville Le Havre code retour 200
Meteo ville Le Havre code retour 200
Ville Rouen code retour 200
Meteo ville Rouen code retour 200
Ville Paris code retour 200
Meteo ville Paris code retour 200
Ville Amiens code retour 200
Meteo ville Amiens code retour 200
Ville Lille code retour 200
Meteo ville Lille code retour 200
Ville Strasbourg code retour 200
Meteo ville Strasbourg code retour 200
Ville Chateau du Haut Koenigsbourg code retour 200
Meteo ville Chateau du Haut Koenigsbourg code retour 200
Ville Colmar code retour 200
Meteo ville Colmar code retour 200
Ville Eguisheim code retour 200
Meteo ville Eguisheim code retour 200
Ville Besancon code retour 200
Meteo ville Besancon code retour 200
Ville Dijon code retour 200
Meteo ville Dijon code retour 200
Ville Annecy code reto

In [13]:
df

Unnamed: 0,Ville,Latitude,Longitude,Temperature,Humidite
0,Mont Saint Michel,48.6359541,-1.511459954959514,26.51,39
1,St Malo,48.649518,-2.0260409,24.01,53
2,Bayeux,49.2764624,-0.7024738,27.03,37
3,Le Havre,49.4938975,0.1079732,20.63,59
4,Rouen,49.4404591,1.0939658,27.57,42
5,Paris,48.8534951,2.3483915,29.7,30
6,Amiens,49.8941708,2.2956951,26.23,49
7,Lille,50.6365654,3.0635282,26.6,48
8,Strasbourg,48.584614,7.7507127,27.46,53
9,Chateau du Haut Koenigsbourg,48.2495226,7.3454923,25.36,54


In [14]:
# Convert the string columns to numeric values
for column in ['Latitude', 'Longitude', 'Temperature', 'Humidite']:
    df[column] = df[column].astype(float)

In [17]:
# Save the weather information to a CSV file
df.to_csv('weather_infos.csv', encoding='iso-8859-1')

### Map of temperatures across the cities

In [18]:
import plotly.express as px

fig_temp = px.scatter_mapbox(
        df,
        lat="Latitude",
        lon="Longitude",
        color="Temperature",
        mapbox_style="open-street-map",
        color_continuous_scale=px.colors.sequential.Bluered,
        zoom=4,
        width=800
    )
fig_temp.show()

In [19]:
fig_hum = px.scatter_mapbox(
        df,
        lat="Latitude",
        lon="Longitude",
        color="Humidite",
        mapbox_style="open-street-map",
        color_continuous_scale=px.colors.sequential.Blues,
        zoom=4,
        width=800
    )
fig_hum.show()

If we prefer the least humid cities, the top 5 would therefore be:
Carcassonne (18), Toulouse (22), Avignon (23), Montauban (24), Uzes (28)
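
The ranking above can be reproduced programmatically rather than read off the map. A minimal sketch, using a subset of the humidity values reported in this notebook; the function name `driest_cities` is hypothetical.

```python
# Humidity values (%) taken from the results above, for a subset of cities
humidity_by_city = {
    "Paris": 30,
    "Le Havre": 59,
    "Carcassonne": 18,
    "Toulouse": 22,
    "Avignon": 23,
    "Montauban": 24,
    "Uzes": 28,
}


def driest_cities(humidity, n=5):
    """Return the n (city, humidity) pairs with the lowest humidity."""
    return sorted(humidity.items(), key=lambda item: item[1])[:n]


for city, value in driest_cities(humidity_by_city):
    print(city, value)
```

On the full DataFrame, `df.nsmallest(5, "Humidite")` should give the equivalent selection.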

## 2. Scrape Booking.com

In [9]:
# To scrape Booking.com, build the search URL for each city

list_urls_cities = ['https://www.booking.com/searchresults.fr.html?ss=' + element.replace(' ','%20') for element in list_cities]
list_urls_cities

['https://www.booking.com/searchresults.fr.html?ss=Mont%20Saint%20Michel',
 'https://www.booking.com/searchresults.fr.html?ss=St%20Malo',
 'https://www.booking.com/searchresults.fr.html?ss=Bayeux',
 'https://www.booking.com/searchresults.fr.html?ss=Le%20Havre',
 'https://www.booking.com/searchresults.fr.html?ss=Rouen',
 'https://www.booking.com/searchresults.fr.html?ss=Paris',
 'https://www.booking.com/searchresults.fr.html?ss=Amiens',
 'https://www.booking.com/searchresults.fr.html?ss=Lille',
 'https://www.booking.com/searchresults.fr.html?ss=Strasbourg',
 'https://www.booking.com/searchresults.fr.html?ss=Chateau%20du%20Haut%20Koenigsbourg',
 'https://www.booking.com/searchresults.fr.html?ss=Colmar',
 'https://www.booking.com/searchresults.fr.html?ss=Eguisheim',
 'https://www.booking.com/searchresults.fr.html?ss=Besancon',
 'https://www.booking.com/searchresults.fr.html?ss=Dijon',
 'https://www.booking.com/searchresults.fr.html?ss=Annecy',
 'https://www.booking.com/searchresults.fr.ht
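
Replacing spaces by hand works for the names in this list, but other characters (accents, ampersands) would also need escaping. A sketch of an alternative using the standard library; the helper name `build_search_url` is hypothetical, and the `ss` parameter is the one used in the URLs above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://www.booking.com/searchresults.fr.html"


def build_search_url(city):
    """Build the search URL for a city, percent-encoding the query value."""
    # quote_via=quote encodes spaces as %20 instead of '+'
    return BASE_URL + "?" + urlencode({"ss": city}, quote_via=quote)


print(build_search_url("Mont Saint Michel"))
```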

In [6]:
# Start a Scrapy project
!scrapy startproject projet_booking

New Scrapy project 'projet_booking', using template directory 'C:\Users\pierr\anaconda3\Lib\site-packages\scrapy\templates\project', created in:
    C:\Users\pierr\OneDrive\Documents\Formation\Formation Jedha\Jedha_Training\Projets Portfolio\M03-Collecte et management des données\projet_booking

You can start your first spider with:
    cd projet_booking
    scrapy genspider example example.com


In [7]:
!pip install scrapy-rotating-proxies

Collecting scrapy-rotating-proxies
  Downloading scrapy_rotating_proxies-0.6.2-py2.py3-none-any.whl (15 kB)
Collecting typing (from scrapy-rotating-proxies)
  Downloading typing-3.7.4.3.tar.gz (78 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: typing
  Building wheel for typing (setup.py): started
  Building wheel for typing (setup.py): finished with status 'done'
  Created wheel for typing: filename=typing-3.7.4.3-py3-none-any.whl size=26325 sha256=901a9df537b47d8d5b346c1c68ab1dd1a7665bcf22e8b0f2e6e8aca4caea893d
  Stored in directory: c:\users\pierr\appdata\local\pip\cache\wheels\9d\67\2f

In [10]:
proxy = pd.read_csv("Free_Proxy_List.csv")
proxy.head()

Unnamed: 0,ip,port,anonymityLevel,asn,country,isp,latency,org,protocols,responseTime,speed,updated_at,upTime,upTimeSuccessCount,upTimeTryCount
0,38.91.106.204,40103,elite,,US,,73.0,,socks5,,1.0,2022-01-14T16:01:23.480Z,100,4616,4622
1,147.135.255.62,8242,elite,AS16276,FR,OVH SAS,144.0,OVH,http,61.0,286.0,2022-01-14T16:01:21.197Z,100,54,54
2,178.128.178.169,3128,transparent,,US,,2.0,,https,,,2022-01-14T16:01:21.307Z,100,4811,4811
3,43.249.224.170,84,transparent,AS18229,IN,Equinox Consulting PVT LTD,221.0,Pioneer Elabs Ltd.,http,46.0,,2022-01-14T16:01:20.509Z,100,73,73
4,62.109.31.192,20000,elite,AS29182,RU,Cjsc the First,181.0,TheFirst,socks5,35.0,1.0,2022-01-14T16:01:17.958Z,100,54,54


In [42]:
hotel_infos = pd.read_json("projet_booking/spiders/bookingscrap.json")
hotel_infos.head()

Unnamed: 0,hotel_url,name,coord,description,review,stars
0,https://www.booking.com/hotel/fr/hotel-saint-a...,Le Saint Aubert,"48.61293783,-1.51010513","Niché dans un écrin de verdure, à seulement 2 ...",72,0
1,https://www.booking.com/hotel/fr/la-vieille-au...,La Vieille Auberge,"48.63606300,-1.51145700",La Vieille Auberge vous accueille dans le vill...,74,0
2,https://www.booking.com/hotel/fr/d-aleth.fr.html,Hotel d'Aleth,"48.63593081,-2.02171236","Situé en face du port des Bas Sablons, l'Hotel...",78,0
3,https://www.booking.com/hotel/fr/hotel-gabriel...,Hotel Gabriel,"48.61538141,-1.51070997","L’Hotel Gabriel vous accueille à 1,6 km du Mon...",80,0
4,https://www.booking.com/hotel/fr/chambre-priva...,B&B de la roseraie,"49.29158754,-0.70337099","Situé à Bayeux, à seulement 1,8 km de la cathé...",88,3


In [43]:
hotel_infos['stars'] = hotel_infos['stars'].apply(lambda x: None if x==0 else x)
hotel_infos.head()

Unnamed: 0,hotel_url,name,coord,description,review,stars
0,https://www.booking.com/hotel/fr/hotel-saint-a...,Le Saint Aubert,"48.61293783,-1.51010513","Niché dans un écrin de verdure, à seulement 2 ...",72,
1,https://www.booking.com/hotel/fr/la-vieille-au...,La Vieille Auberge,"48.63606300,-1.51145700",La Vieille Auberge vous accueille dans le vill...,74,
2,https://www.booking.com/hotel/fr/d-aleth.fr.html,Hotel d'Aleth,"48.63593081,-2.02171236","Situé en face du port des Bas Sablons, l'Hotel...",78,
3,https://www.booking.com/hotel/fr/hotel-gabriel...,Hotel Gabriel,"48.61538141,-1.51070997","L’Hotel Gabriel vous accueille à 1,6 km du Mon...",80,
4,https://www.booking.com/hotel/fr/chambre-priva...,B&B de la roseraie,"49.29158754,-0.70337099","Situé à Bayeux, à seulement 1,8 km de la cathé...",88,3.0


In [44]:
hotel_infos['coord_lat'] = hotel_infos['coord'].apply(lambda x: x.split(',')[0])
hotel_infos['coord_lon'] = hotel_infos['coord'].apply(lambda x: x.split(',')[1])
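
The split above leaves both parts as strings. A sketch of a helper that converts them to floats in the same step; the name `split_coord` is hypothetical and the sample value comes from the first row of the table.

```python
def split_coord(coord):
    """Split a 'lat,lon' string into a (lat, lon) tuple of floats."""
    lat_text, lon_text = coord.split(",")
    return float(lat_text), float(lon_text)


print(split_coord("48.61293783,-1.51010513"))
```

Converting at this point would make a later cast of `coord_lat` and `coord_lon` unnecessary.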

In [45]:
hotel_infos.head()

Unnamed: 0,hotel_url,name,coord,description,review,stars,coord_lat,coord_lon
0,https://www.booking.com/hotel/fr/hotel-saint-a...,Le Saint Aubert,"48.61293783,-1.51010513","Niché dans un écrin de verdure, à seulement 2 ...",72,,48.61293783,-1.51010513
1,https://www.booking.com/hotel/fr/la-vieille-au...,La Vieille Auberge,"48.63606300,-1.51145700",La Vieille Auberge vous accueille dans le vill...,74,,48.636063,-1.511457
2,https://www.booking.com/hotel/fr/d-aleth.fr.html,Hotel d'Aleth,"48.63593081,-2.02171236","Situé en face du port des Bas Sablons, l'Hotel...",78,,48.63593081,-2.02171236
3,https://www.booking.com/hotel/fr/hotel-gabriel...,Hotel Gabriel,"48.61538141,-1.51070997","L’Hotel Gabriel vous accueille à 1,6 km du Mon...",80,,48.61538141,-1.51070997
4,https://www.booking.com/hotel/fr/chambre-priva...,B&B de la roseraie,"49.29158754,-0.70337099","Situé à Bayeux, à seulement 1,8 km de la cathé...",88,3.0,49.29158754,-0.70337099


In [46]:
hotel_infos = hotel_infos.drop('coord', axis=1)
hotel_infos.head()

Unnamed: 0,hotel_url,name,description,review,stars,coord_lat,coord_lon
0,https://www.booking.com/hotel/fr/hotel-saint-a...,Le Saint Aubert,"Niché dans un écrin de verdure, à seulement 2 ...",72,,48.61293783,-1.51010513
1,https://www.booking.com/hotel/fr/la-vieille-au...,La Vieille Auberge,La Vieille Auberge vous accueille dans le vill...,74,,48.636063,-1.511457
2,https://www.booking.com/hotel/fr/d-aleth.fr.html,Hotel d'Aleth,"Situé en face du port des Bas Sablons, l'Hotel...",78,,48.63593081,-2.02171236
3,https://www.booking.com/hotel/fr/hotel-gabriel...,Hotel Gabriel,"L’Hotel Gabriel vous accueille à 1,6 km du Mon...",80,,48.61538141,-1.51070997
4,https://www.booking.com/hotel/fr/chambre-priva...,B&B de la roseraie,"Situé à Bayeux, à seulement 1,8 km de la cathé...",88,3.0,49.29158754,-0.70337099


In [47]:
hotel_infos_sorted = hotel_infos.sort_values(by='review', ascending=False)
hotel_tops = hotel_infos_sorted.head(20)
hotel_tops

Unnamed: 0,hotel_url,name,description,review,stars,coord_lat,coord_lon
67,https://www.booking.com/hotel/fr/au-duplex-d-o...,Au Duplex d'Or Centre Historique,"Situé à Besançon, à 1,2 km de la gare de Besan...",99,3.0,47.23366436,6.0291489
231,https://www.booking.com/hotel/fr/harmony-3.fr....,Appartement situation exceptionnelle HARMONY III,"Récemment rénové, l'Appartement situation exce...",99,3.0,43.21181391,2.35220498
264,https://www.booking.com/hotel/fr/au-coeur-des-...,Au Cœur des Remparts,"Situé à Aigues-Mortes, à 24 km du parc des exp...",99,,43.565401,4.192973
110,https://www.booking.com/hotel/fr/studio-cosy-c...,Studio cosy centre-ville/mer,Le Studio cosy centre-ville/mer est situé dans...,98,3.0,49.4859312,0.1064053
76,https://www.booking.com/hotel/fr/chambre-pin-u...,Chambre Pin Up Wings,"Situé à Eguisheim, à 6,1 km de la Maison des T...",97,3.0,48.0424275,7.308131
208,https://www.booking.com/hotel/fr/le-bijou-des-...,Le bijou des carmes - Haut de gamme climatisé,"Situé à Toulouse, à 4,2 km du Zénith et à 7,1 ...",97,3.0,43.5948314,1.4463197
150,https://www.booking.com/hotel/fr/luxueux-appar...,Luxueux Appartement au 16 eme Arrondissement,Le Luxueux Appartement au 16 eme Arrondissemen...,97,4.0,48.8396971,2.2658719
280,https://www.booking.com/hotel/fr/lousoan.fr.html,Lousoan,"Doté d'une piscine extérieure, d'un jardin et ...",97,3.0,43.54026065,5.43485774
300,https://www.booking.com/hotel/fr/la-petite-bas...,La petite Bastide,Dotée d'une terrasse et offrant une vue sur le...,97,,43.214951,5.5422611
237,https://www.booking.com/hotel/fr/charmant-t2-p...,Charmant T2 proche arènes,"Situé à Nîmes, à 2,5 km du parc des exposition...",97,4.0,43.8299613,4.3550993


In [48]:
# Work on an explicit copy so the assignments below do not trigger SettingWithCopyWarning
hotel_tops = hotel_tops.copy()
hotel_tops['coord_lat'] = hotel_tops['coord_lat'].astype(float)
hotel_tops['coord_lon'] = hotel_tops['coord_lon'].astype(float)
hotel_tops.head()


Unnamed: 0,hotel_url,name,description,review,stars,coord_lat,coord_lon
67,https://www.booking.com/hotel/fr/au-duplex-d-o...,Au Duplex d'Or Centre Historique,"Situé à Besançon, à 1,2 km de la gare de Besan...",99,3.0,47.233664,6.029149
231,https://www.booking.com/hotel/fr/harmony-3.fr....,Appartement situation exceptionnelle HARMONY III,"Récemment rénové, l'Appartement situation exce...",99,3.0,43.211814,2.352205
264,https://www.booking.com/hotel/fr/au-coeur-des-...,Au Cœur des Remparts,"Situé à Aigues-Mortes, à 24 km du parc des exp...",99,,43.565401,4.192973
110,https://www.booking.com/hotel/fr/studio-cosy-c...,Studio cosy centre-ville/mer,Le Studio cosy centre-ville/mer est situé dans...,98,3.0,49.485931,0.106405
76,https://www.booking.com/hotel/fr/chambre-pin-u...,Chambre Pin Up Wings,"Situé à Eguisheim, à 6,1 km de la Maison des T...",97,3.0,48.042428,7.308131


In [49]:
hotel_tops.describe(include='all')

Unnamed: 0,hotel_url,name,description,review,stars,coord_lat,coord_lon
count,20,20,20,20.0,17.0,20.0,20.0
unique,20,20,20,5.0,,,
top,https://www.booking.com/hotel/fr/au-duplex-d-o...,Au Duplex d'Or Centre Historique,"Situé à Besançon, à 1,2 km de la gare de Besan...",96.0,,,
freq,1,1,1,8.0,,,
mean,,,,,3.294118,45.378839,4.189838
std,,,,,0.469668,2.609166,2.457525
min,,,,,3.0,42.908524,0.095051
25%,,,,,3.0,43.211826,2.167243
50%,,,,,3.0,43.688836,4.692453
75%,,,,,4.0,48.115241,6.10618


In [50]:
# Remove line breaks
# and replace troublesome special characters (such as the oe ligature)
hotel_infos = hotel_infos.replace(r'\n',' ', regex=True)
hotel_infos = hotel_infos.replace({chr(0x0153): 'oe',
                                   chr(0x2013): '-',
                                   chr(0x2019): '\'',
                                   chr(0x2022): '-'}, regex=True)

hotel_infos

Unnamed: 0,hotel_url,name,description,review,stars,coord_lat,coord_lon
0,https://www.booking.com/hotel/fr/hotel-saint-a...,Le Saint Aubert,"Niché dans un écrin de verdure, à seulement 2 ...",72,,48.61293783,-1.51010513
1,https://www.booking.com/hotel/fr/la-vieille-au...,La Vieille Auberge,La Vieille Auberge vous accueille dans le vill...,74,,48.63606300,-1.51145700
2,https://www.booking.com/hotel/fr/d-aleth.fr.html,Hotel d'Aleth,"Situé en face du port des Bas Sablons, l'Hotel...",78,,48.63593081,-2.02171236
3,https://www.booking.com/hotel/fr/hotel-gabriel...,Hotel Gabriel,"L'Hotel Gabriel vous accueille à 1,6 km du Mon...",80,,48.61538141,-1.51070997
4,https://www.booking.com/hotel/fr/chambre-priva...,B&B de la roseraie,"Situé à Bayeux, à seulement 1,8 km de la cathé...",88,3.0,49.29158754,-0.70337099
...,...,...,...,...,...,...,...
345,https://www.booking.com/hotel/fr/parc-709-bayo...,Parc 709 Bayonne,"Offrant une vue sur la rue calme, le Parc 709 ...",93,,43.49560600,-1.48131700
346,https://www.booking.com/hotel/fr/le-grand-larg...,Résidence Vacances Bleues Le Grand Large,Offrant une vue panoramique sur la plage de la...,82,,43.48065123,-1.56534523
347,https://www.booking.com/hotel/fr/3-pieces-lumi...,3 pièces lumineux avec terrasse au Port-Vieux,"Situé en plein centre de Biarritz, le 3 pièces...",10,4.0,43.48180370,-1.56508000
348,https://www.booking.com/hotel/fr/rivage-yourho...,Rivage YourHostHelper,"Situé à 500 mètres de la Grande Plage, à 600 m...",10,3.0,43.48913100,-1.55228730


In [3]:
hotel_infos['review'] = hotel_infos['review'].apply(
    lambda x: float(str(x).replace(',', '.').replace('"', '')) if pd.notnull(x) else x
)
hotel_infos.head()

Unnamed: 0.1,Unnamed: 0,hotel_url,name,description,review,stars,coord_lat,coord_lon
0,0,https://www.booking.com/hotel/fr/hotel-saint-a...,Le Saint Aubert,"Niché dans un écrin de verdure, à seulement 2 ...",7.2,,48.612938,-1.510105
1,1,https://www.booking.com/hotel/fr/la-vieille-au...,La Vieille Auberge,La Vieille Auberge vous accueille dans le vill...,7.4,,48.636063,-1.511457
2,2,https://www.booking.com/hotel/fr/d-aleth.fr.html,Hotel d'Aleth,"Situé en face du port des Bas Sablons, l'Hotel...",7.8,,48.635931,-2.021712
3,3,https://www.booking.com/hotel/fr/hotel-gabriel...,Hotel Gabriel,"L'Hotel Gabriel vous accueille à 1,6 km du Mon...",8.0,,48.615381,-1.51071
4,4,https://www.booking.com/hotel/fr/chambre-priva...,B&B de la roseraie,"Situé à Bayeux, à seulement 1,8 km de la cathé...",8.8,3.0,49.291588,-0.703371


In [9]:
hotel_infos = hotel_infos.drop(columns=['Unnamed: 0'])

In [10]:
hotel_infos.describe(include='all')

Unnamed: 0,hotel_url,name,description,review,stars,coord_lat,coord_lon
count,350,350,350,327.0,199.0,350.0,350.0
unique,350,350,350,,,,
top,https://www.booking.com/hotel/fr/hotel-saint-a...,Le Saint Aubert,"Niché dans un écrin de verdure, à seulement 2 ...",,,,
freq,1,1,1,,,,
mean,,,,8.71682,3.090452,45.839891,3.394217
std,,,,0.755344,0.287552,2.557849,2.903271
min,,,,5.9,3.0,42.520945,-2.025282
25%,,,,8.2,3.0,43.496426,1.35069
50%,,,,8.8,3.0,45.187202,4.357045
75%,,,,9.3,3.0,48.578103,5.711448


In [11]:
hotel_infos.head(1)

Unnamed: 0,hotel_url,name,description,review,stars,coord_lat,coord_lon
0,https://www.booking.com/hotel/fr/hotel-saint-a...,Le Saint Aubert,"Niché dans un écrin de verdure, à seulement 2 ...",7.2,,48.612938,-1.510105


In [12]:
hotel_infos.to_csv('hotel_infos.csv', encoding='utf-8')

In [48]:
fig_hotels = px.scatter_mapbox(
        hotel_tops,
        lat="coord_lat",
        lon="coord_lon",
        color="review",
        mapbox_style="open-street-map",
        color_continuous_scale=px.colors.sequential.Blues,
        zoom=4,
        width=800
    )
fig_hotels.show()