![Kayak](https://seekvectorlogo.com/wp-content/uploads/2018/01/kayak-vector-logo.png)

# Plan your trip with Kayak 

## Company's description 📇

<a href="https://www.kayak.com" target="_blank">Kayak</a> is a travel search engine that helps user plan their next trip at the best price.

The company was founded in 2004 by Steve Hafner & Paul M. English. After a few rounds of fundraising, Kayak was acquired by <a href="https://www.bookingholdings.com/" target="_blank">Booking Holdings</a> which now holds: 

* <a href="https://booking.com/" target="_blank">Booking.com</a>
* <a href="https://kayak.com/" target="_blank">Kayak</a>
* <a href="https://www.priceline.com/" target="_blank">Priceline</a>
* <a href="https://www.agoda.com/" target="_blank">Agoda</a>
* <a href="https://Rentalcars.com/" target="_blank">RentalCars</a>
* <a href="https://www.opentable.com/" target="_blank">OpenTable</a>

With over \$300 million revenue a year, Kayak operates in almost all countries and all languages to help their users book travels accros the globe. 

## Project 🚧

The marketing team needs help on a new project. After doing some user research, the team discovered that **70% of their users who are planning a trip would like to have more information about the destination they are going to**. 

In addition, user research shows that **people tend to be defiant about the information they are reading if they don't know the brand** which produced the content. 

Therefore, Kayak Marketing Team would like to create an application that will recommend where people should plan their next holidays. The application should be based on real data about:

* Weather 
* Hotels in the area 

The application should then be able to recommend the best destinations and hotels based on the above variables at any given time. 

## Goals 🎯

As the project has just started, your team doesn't have any data that can be used to create this application. Therefore, your job will be to: 

* Scrape data from destinations 
* Get weather data from each destination 
* Get hotels' info about each destination
* Store all the information above in a data lake
* Extract, transform and load cleaned data from your datalake to a data warehouse

## Scope of this project 🖼️

Marketing team wants to focus first on the best cities to travel to in France. According <a href="https://one-week-in.com/35-cities-to-visit-in-france/" target="_blank">One Week In.com</a> here are the top-35 cities to visit in France: 

```python 
["Mont Saint Michel",
"St Malo",
"Bayeux",
"Le Havre",
"Rouen",
"Paris",
"Amiens",
"Lille",
"Strasbourg",
"Chateau du Haut Koenigsbourg",
"Colmar",
"Eguisheim",
"Besancon",
"Dijon",
"Annecy",
"Grenoble",
"Lyon",
"Gorges du Verdon",
"Bormes les Mimosas",
"Cassis",
"Marseille",
"Aix en Provence",
"Avignon",
"Uzes",
"Nimes",
"Aigues Mortes",
"Saintes Maries de la mer",
"Collioure",
"Carcassonne",
"Ariege",
"Toulouse",
"Montauban",
"Biarritz",
"Bayonne",
"La Rochelle"]
```

Your team should focus **only on the above cities for your project**. 


## Helpers 🦮

To help you achieve this project, here are a few tips that should help you

### Get weather data with an API 

*   Use https://nominatim.org/ to get the gps coordinates of all the cities (no subscription required) Documentation : https://nominatim.org/release-docs/develop/api/Search/

*   Use https://openweathermap.org/appid (you have to subscribe to get a free apikey) and https://openweathermap.org/api/one-call-api to get some information about the weather for the 35 cities and put it in a DataFrame

*   Determine the list of cities where the weather will be the nicest within the next 7 days For example, you can use the values of daily.pop and daily.rain to compute the expected volume of rain within the next 7 days... But it's only an example, actually you can have different opinions on a what a nice weather would be like 😎 Maybe the most important criterion for you is the temperature or humidity, so feel free to change the rules !

*   Save all the results in a `.csv` file, you will use it later 😉 You can save all the informations that seem important to you ! Don't forget to save the name of the cities, and also to create a column containing a unique identifier (id) of each city (this is important for what's next in the project)

*   Use plotly to display the best destinations on a map

### Scrape Booking.com 

Since BookingHoldings doesn't have aggregated databases, it will be much faster to scrape data directly from booking.com 

You can scrap as many information asyou want, but we suggest that you get at least:

*   hotel name,
*   Url to its booking.com page,
*   Its coordinates: latitude and longitude
*   Score given by the website users
*   Text description of the hotel


### Create your data lake using S3 

Once you managed to build your dataset, you should store into S3 as a csv file. 

### ETL 

Once you uploaded your data onto S3, it will be better for the next data analysis team to extract clean data directly from a Data Warehouse. Therefore, create a SQL Database using AWS RDS, extract your data from S3 and store it in your newly created DB. 

## Deliverable 📬

To complete this project, your team should deliver:

* A `.csv` file in an S3 bucket containing enriched information about weather and hotels for each french city

* A SQL Database where we should be able to get the same cleaned data from S3 

* Two maps where you should have a Top-5 destinations and a Top-20 hotels in the area. You can use plotly or any other library to do so. It should look something like this: 

![Map](https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/Kayak_best_destination_project.png)

## 1 - Get weather data from each destination ##


In [1]:
from bs4 import BeautifulSoup
import requests as r
import pandas as pd
import json

In [2]:
#Variable for scraping

country = "France"

In [3]:
#Create a list for the 35 towns

destination = ["Mont Saint Michel",
"St Malo",
"Bayeux",
"Le Havre",
"Rouen",
"Paris",
"Amiens",
"Lille",
"Strasbourg",
"Chateau du Haut Koenigsbourg",
"Colmar",
"Eguisheim",
"Besancon",
"Dijon",
"Annecy",
"Grenoble",
"Lyon",
"Gorges du Verdon",
"Bormes les Mimosas",
"Cassis",
"Marseille",
"Aix en Provence",
"Avignon",
"Uzes",
"Nimes",
"Aigues Mortes",
"Saintes Maries de la mer",
"Collioure",
"Carcassonne",
"Ariege",
"Toulouse",
"Montauban",
"Biarritz",
"Bayonne",
"La Rochelle"]

In [4]:
#Test a GPS coordinates request for a destination

gps=r.get("https://nominatim.openstreetmap.org/search/Mont Saint Michel?format=json&addressdetails=1&limit=1&polygon_svg=0")

gps.text

'[{"place_id":151486647,"licence":"Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright","osm_type":"way","osm_id":211285890,"boundingbox":["48.6349172","48.637031","-1.5133292","-1.5094796"],"lat":"48.6359541","lon":"-1.511459954959514","display_name":"Mont Saint-Michel, Le Mont-Saint-Michel, Avranches, Manche, Normandie, France métropolitaine, 50170, France","class":"place","type":"islet","importance":0.755436556781574,"address":{"place":"Mont Saint-Michel","village":"Le Mont-Saint-Michel","municipality":"Avranches","county":"Manche","state":"Normandie","region":"France métropolitaine","postcode":"50170","country":"France","country_code":"fr"}}]'

In [5]:
gps.json()

[{'place_id': 151486647,
  'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
  'osm_type': 'way',
  'osm_id': 211285890,
  'boundingbox': ['48.6349172', '48.637031', '-1.5133292', '-1.5094796'],
  'lat': '48.6359541',
  'lon': '-1.511459954959514',
  'display_name': 'Mont Saint-Michel, Le Mont-Saint-Michel, Avranches, Manche, Normandie, France métropolitaine, 50170, France',
  'class': 'place',
  'type': 'islet',
  'importance': 0.755436556781574,
  'address': {'place': 'Mont Saint-Michel',
   'village': 'Le Mont-Saint-Michel',
   'municipality': 'Avranches',
   'county': 'Manche',
   'state': 'Normandie',
   'region': 'France métropolitaine',
   'postcode': '50170',
   'country': 'France',
   'country_code': 'fr'}}]

In [6]:
gps = pd.DataFrame(gps.json())
gps.head()

Unnamed: 0,place_id,licence,osm_type,osm_id,boundingbox,lat,lon,display_name,class,type,importance,address
0,151486647,"Data © OpenStreetMap contributors, ODbL 1.0. h...",way,211285890,"[48.6349172, 48.637031, -1.5133292, -1.5094796]",48.6359541,-1.511459954959514,"Mont Saint-Michel, Le Mont-Saint-Michel, Avran...",place,islet,0.755437,"{'place': 'Mont Saint-Michel', 'village': 'Le ..."


In [7]:
#Loop for all destinations to get coordinates

for i in destination:
    response = r.get(f'https://nominatim.openstreetmap.org/search/{i}/FR?format=json&addressdetails=1&limit=1&polygon_svg=1')
    gps = gps.append(response.json(), ignore_index=True)
gps.head()

  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gps.append(response.json(), ignore_index=True)
  gps = gp

Unnamed: 0,place_id,licence,osm_type,osm_id,boundingbox,lat,lon,display_name,class,type,importance,address,svg,icon
0,151486647,"Data © OpenStreetMap contributors, ODbL 1.0. h...",way,211285890,"[48.6349172, 48.637031, -1.5133292, -1.5094796]",48.6359541,-1.511459954959514,"Mont Saint-Michel, Le Mont-Saint-Michel, Avran...",place,islet,0.755437,"{'place': 'Mont Saint-Michel', 'village': 'Le ...",,
1,151486647,"Data © OpenStreetMap contributors, ODbL 1.0. h...",way,211285890,"[48.6349172, 48.637031, -1.5133292, -1.5094796]",48.6359541,-1.511459954959514,"Mont Saint-Michel, Le Mont-Saint-Michel, Avran...",place,islet,0.855437,"{'place': 'Mont Saint-Michel', 'village': 'Le ...",M -1.5133292 -48.636198 L -1.5133291 -48.63618...,
2,282098015,"Data © OpenStreetMap contributors, ODbL 1.0. h...",relation,905534,"[48.5979853, 48.6949736, -2.0765246, -1.9367259]",48.649518,-2.0260409,"Saint-Malo, Ille-et-Vilaine, Bretagne, France ...",boundary,administrative,0.776467,"{'town': 'Saint-Malo', 'municipality': 'Saint-...",M -2.0765246 -48.6752522 L -2.0764951 -48.6751...,https://nominatim.openstreetmap.org/ui/mapicon...
3,281962470,"Data © OpenStreetMap contributors, ODbL 1.0. h...",relation,145776,"[49.2608124, 49.2934736, -0.7275671, -0.6757378]",49.2764624,-0.7024738,"Bayeux, Calvados, Normandie, France métropolit...",boundary,administrative,0.7827,"{'town': 'Bayeux', 'municipality': 'Bayeux', '...",M -0.7275671 -49.2746757 L -0.7274923 -49.2741...,https://nominatim.openstreetmap.org/ui/mapicon...
4,282341149,"Data © OpenStreetMap contributors, ODbL 1.0. h...",relation,104492,"[49.4516697, 49.5401463, 0.0667992, 0.1955556]",49.4938975,0.1079732,"Le Havre, Seine-Maritime, Normandie, France mé...",boundary,administrative,0.922333,"{'city': 'Le Havre', 'municipality': 'Le Havre...",M 0.0667992 -49.5185472 L 0.0696538 -49.518014...,https://nominatim.openstreetmap.org/ui/mapicon...


In [8]:
# To be sure to get the 35 places after data frame transformation
gps.shape

(36, 14)

In [9]:
# Remove the first row

gps =gps.drop(0)
gps.head()

Unnamed: 0,place_id,licence,osm_type,osm_id,boundingbox,lat,lon,display_name,class,type,importance,address,svg,icon
1,151486647,"Data © OpenStreetMap contributors, ODbL 1.0. h...",way,211285890,"[48.6349172, 48.637031, -1.5133292, -1.5094796]",48.6359541,-1.511459954959514,"Mont Saint-Michel, Le Mont-Saint-Michel, Avran...",place,islet,0.855437,"{'place': 'Mont Saint-Michel', 'village': 'Le ...",M -1.5133292 -48.636198 L -1.5133291 -48.63618...,
2,282098015,"Data © OpenStreetMap contributors, ODbL 1.0. h...",relation,905534,"[48.5979853, 48.6949736, -2.0765246, -1.9367259]",48.649518,-2.0260409,"Saint-Malo, Ille-et-Vilaine, Bretagne, France ...",boundary,administrative,0.776467,"{'town': 'Saint-Malo', 'municipality': 'Saint-...",M -2.0765246 -48.6752522 L -2.0764951 -48.6751...,https://nominatim.openstreetmap.org/ui/mapicon...
3,281962470,"Data © OpenStreetMap contributors, ODbL 1.0. h...",relation,145776,"[49.2608124, 49.2934736, -0.7275671, -0.6757378]",49.2764624,-0.7024738,"Bayeux, Calvados, Normandie, France métropolit...",boundary,administrative,0.7827,"{'town': 'Bayeux', 'municipality': 'Bayeux', '...",M -0.7275671 -49.2746757 L -0.7274923 -49.2741...,https://nominatim.openstreetmap.org/ui/mapicon...
4,282341149,"Data © OpenStreetMap contributors, ODbL 1.0. h...",relation,104492,"[49.4516697, 49.5401463, 0.0667992, 0.1955556]",49.4938975,0.1079732,"Le Havre, Seine-Maritime, Normandie, France mé...",boundary,administrative,0.922333,"{'city': 'Le Havre', 'municipality': 'Le Havre...",M 0.0667992 -49.5185472 L 0.0696538 -49.518014...,https://nominatim.openstreetmap.org/ui/mapicon...
5,122848,"Data © OpenStreetMap contributors, ODbL 1.0. h...",node,26686587,"[49.2804591, 49.6004591, 0.9339658, 1.2539658]",49.4404591,1.0939658,"Rouen, Seine-Maritime, Normandie, France métro...",place,city,0.850073,"{'city': 'Rouen', 'municipality': 'Rouen', 'co...","cx=""1.0939658"" cy=""-49.4404591""",https://nominatim.openstreetmap.org/ui/mapicon...


In [10]:
#Adding of destination with lat and lon

gps['destination'] = destination
gps.head()

Unnamed: 0,place_id,licence,osm_type,osm_id,boundingbox,lat,lon,display_name,class,type,importance,address,svg,icon,destination
1,151486647,"Data © OpenStreetMap contributors, ODbL 1.0. h...",way,211285890,"[48.6349172, 48.637031, -1.5133292, -1.5094796]",48.6359541,-1.511459954959514,"Mont Saint-Michel, Le Mont-Saint-Michel, Avran...",place,islet,0.855437,"{'place': 'Mont Saint-Michel', 'village': 'Le ...",M -1.5133292 -48.636198 L -1.5133291 -48.63618...,,Mont Saint Michel
2,282098015,"Data © OpenStreetMap contributors, ODbL 1.0. h...",relation,905534,"[48.5979853, 48.6949736, -2.0765246, -1.9367259]",48.649518,-2.0260409,"Saint-Malo, Ille-et-Vilaine, Bretagne, France ...",boundary,administrative,0.776467,"{'town': 'Saint-Malo', 'municipality': 'Saint-...",M -2.0765246 -48.6752522 L -2.0764951 -48.6751...,https://nominatim.openstreetmap.org/ui/mapicon...,St Malo
3,281962470,"Data © OpenStreetMap contributors, ODbL 1.0. h...",relation,145776,"[49.2608124, 49.2934736, -0.7275671, -0.6757378]",49.2764624,-0.7024738,"Bayeux, Calvados, Normandie, France métropolit...",boundary,administrative,0.7827,"{'town': 'Bayeux', 'municipality': 'Bayeux', '...",M -0.7275671 -49.2746757 L -0.7274923 -49.2741...,https://nominatim.openstreetmap.org/ui/mapicon...,Bayeux
4,282341149,"Data © OpenStreetMap contributors, ODbL 1.0. h...",relation,104492,"[49.4516697, 49.5401463, 0.0667992, 0.1955556]",49.4938975,0.1079732,"Le Havre, Seine-Maritime, Normandie, France mé...",boundary,administrative,0.922333,"{'city': 'Le Havre', 'municipality': 'Le Havre...",M 0.0667992 -49.5185472 L 0.0696538 -49.518014...,https://nominatim.openstreetmap.org/ui/mapicon...,Le Havre
5,122848,"Data © OpenStreetMap contributors, ODbL 1.0. h...",node,26686587,"[49.2804591, 49.6004591, 0.9339658, 1.2539658]",49.4404591,1.0939658,"Rouen, Seine-Maritime, Normandie, France métro...",place,city,0.850073,"{'city': 'Rouen', 'municipality': 'Rouen', 'co...","cx=""1.0939658"" cy=""-49.4404591""",https://nominatim.openstreetmap.org/ui/mapicon...,Rouen


In [12]:
# Reset index

clean_gps = gps.loc[:,['destination','lat','lon']]
clean_gps = clean_gps.reset_index()
clean_gps.head()

Unnamed: 0,index,destination,lat,lon
0,1,Mont Saint Michel,48.6359541,-1.511459954959514
1,2,St Malo,48.649518,-2.0260409
2,3,Bayeux,49.2764624,-0.7024738
3,4,Le Havre,49.4938975,0.1079732
4,5,Rouen,49.4404591,1.0939658


In [13]:
# Recording of the file for cleaned data

clean_gps.to_csv('35_towns_gps.csv', index=False)

In [14]:
## Test to get weather for a destination

weathertest = r.get(f"https://api.openweathermap.org/data/2.5/onecall?lat=48.6359541&lon=-1.511459954959514&exclude=daily,minutely,hourly&appid=065e46c0a92a72de01782328806aa55a")

In [15]:
weathertest.json()

{'lat': 48.636,
 'lon': -1.5115,
 'timezone': 'Europe/Paris',
 'timezone_offset': 3600,
 'current': {'dt': 1647508083,
  'sunrise': 1647497739,
  'sunset': 1647540798,
  'temp': 282.54,
  'feels_like': 280.43,
  'pressure': 1033,
  'humidity': 71,
  'dew_point': 277.55,
  'uvi': 1.23,
  'clouds': 0,
  'visibility': 10000,
  'wind_speed': 3.95,
  'wind_deg': 9,
  'wind_gust': 6.52,
  'weather': [{'id': 800,
    'main': 'Clear',
    'description': 'clear sky',
    'icon': '01d'}]}}

In [16]:
# Isolate lat and lon for loop

lat=clean_gps.loc[:,"lat"]
lat

0             48.6359541
1              48.649518
2             49.2764624
3             49.4938975
4             49.4404591
5             48.8588897
6             49.8941708
7             50.6365654
8              48.584614
9     48.249489800000006
10            48.0777517
11            48.0447968
12            47.2380222
13            47.3215806
14            45.8992348
15            45.1875602
16            45.7578137
17            43.7496562
18            43.1572172
19            43.2140359
20            43.2961743
21            43.5298424
22            43.9492493
23            44.0121279
24            43.8374249
25            43.5658225
26            43.4522771
27              42.52505
28            43.2130358
29            42.9455368
30            43.6044622
31            44.0175835
32            43.4832523
33            43.4933379
34            46.1591126
Name: lat, dtype: object

In [17]:
lon=clean_gps.loc[:,"lon"]
lon

0     -1.511459954959514
1             -2.0260409
2             -0.7024738
3              0.1079732
4              1.0939658
5     2.3200410217200766
6              2.2956951
7              3.0635282
8              7.7507127
9       7.34429620253195
10             7.3579641
11             7.3079618
12             6.0243622
13             5.0414701
14             6.1288847
15             5.7357819
16             4.8320114
17             6.3285616
18     6.329253867921363
19             5.5396318
20             5.3699525
21             5.4474738
22             4.8059012
23             4.4196718
24             4.3600687
25             4.1912837
26             4.4287172
27             3.0831554
28             2.3491069
29    1.4065544156065486
30             1.4442469
31             1.3549991
32            -1.5592776
33             -1.475099
34            -1.1520434
Name: lon, dtype: object

In [18]:
#To loop on each destination to retrieve weather criterias 

weatherloop=[]

for i, j in zip(lat, lon):
        response = r.get(f"https://api.openweathermap.org/data/2.5/onecall?lat={i}&lon={j}&exclude=daily,minutely,hourly&appid=065e46c0a92a72de01782328806aa55a")
        weather_35 = response.json()
        
        weatherloop.append(weather_35)

In [18]:
weatherloop

[{'lat': 48.636,
  'lon': -1.5115,
  'timezone': 'Europe/Paris',
  'timezone_offset': 3600,
  'current': {'dt': 1646402819,
   'sunrise': 1646376144,
   'sunset': 1646416411,
   'temp': 284.31,
   'feels_like': 283.28,
   'pressure': 1021,
   'humidity': 69,
   'dew_point': 278.84,
   'uvi': 1.04,
   'clouds': 98,
   'visibility': 10000,
   'wind_speed': 7.04,
   'wind_deg': 332,
   'wind_gust': 8.31,
   'weather': [{'id': 804,
     'main': 'Clouds',
     'description': 'overcast clouds',
     'icon': '04d'}]}},
 {'lat': 48.6495,
  'lon': -2.026,
  'timezone': 'Europe/Paris',
  'timezone_offset': 3600,
  'current': {'dt': 1646402819,
   'sunrise': 1646376268,
   'sunset': 1646416534,
   'temp': 284,
   'feels_like': 282.99,
   'pressure': 1022,
   'humidity': 71,
   'dew_point': 278.96,
   'uvi': 1.18,
   'clouds': 75,
   'visibility': 10000,
   'wind_speed': 8.23,
   'wind_deg': 330,
   'weather': [{'id': 803,
     'main': 'Clouds',
     'description': 'broken clouds',
     'icon': '0

In [20]:
# Weather for current day

weatherdata = pd.DataFrame(weatherloop)
weatherdata.head()

Unnamed: 0,lat,lon,timezone,timezone_offset,current,alerts
0,48.636,-1.5115,Europe/Paris,3600,"{'dt': 1647508135, 'sunrise': 1647497739, 'sun...",
1,48.6495,-2.026,Europe/Paris,3600,"{'dt': 1647508135, 'sunrise': 1647497862, 'sun...",
2,49.2765,-0.7025,Europe/Paris,3600,"{'dt': 1647508135, 'sunrise': 1647497550, 'sun...",
3,49.4939,0.108,Europe/Paris,3600,"{'dt': 1647508135, 'sunrise': 1647497357, 'sun...",
4,49.4405,1.094,Europe/Paris,3600,"{'dt': 1647508135, 'sunrise': 1647497120, 'sun...",


In [21]:
# To split data on each column

weatherdata1 = weatherdata['current'].apply(pd.Series)
display(weatherdata1.head())
weatherdata2 = weatherdata1['weather'].apply(pd.Series)
display(weatherdata2.head())
weatherdata3 = weatherdata2[0].apply(pd.Series)
display(weatherdata3.head())
#weatherdata4 = weatherdata2['weather'].apply(pd.Series)
#weatherdata3.head()

Unnamed: 0,dt,sunrise,sunset,temp,feels_like,pressure,humidity,dew_point,uvi,clouds,visibility,wind_speed,wind_deg,wind_gust,weather,rain
0,1647508135,1647497739,1647540798,282.54,280.43,1033,71,277.55,1.23,0,10000,3.95,9,6.52,"[{'id': 800, 'main': 'Clear', 'description': '...",
1,1647508135,1647497862,1647540921,282.86,280.32,1033,71,277.86,1.21,0,10000,5.14,360,,"[{'id': 800, 'main': 'Clear', 'description': '...",
2,1647508135,1647497550,1647540599,280.29,278.1,1033,64,273.95,1.24,3,10000,3.19,356,4.43,"[{'id': 800, 'main': 'Clear', 'description': '...",
3,1647508135,1647497357,1647540403,281.66,278.62,1032,71,276.71,1.26,100,10000,5.66,350,,"[{'id': 804, 'main': 'Clouds', 'description': ...",
4,1647508135,1647497120,1647540166,280.63,277.53,1030,80,277.41,1.31,0,10000,5.14,330,,"[{'id': 800, 'main': 'Clear', 'description': '...",


Unnamed: 0,0
0,"{'id': 800, 'main': 'Clear', 'description': 'c..."
1,"{'id': 800, 'main': 'Clear', 'description': 'c..."
2,"{'id': 800, 'main': 'Clear', 'description': 'c..."
3,"{'id': 804, 'main': 'Clouds', 'description': '..."
4,"{'id': 800, 'main': 'Clear', 'description': 'c..."


Unnamed: 0,id,main,description,icon
0,800,Clear,clear sky,01d
1,800,Clear,clear sky,01d
2,800,Clear,clear sky,01d
3,804,Clouds,overcast clouds,04d
4,800,Clear,clear sky,01d


In [24]:
# Concatenate the three databases

global_weath = pd.concat([clean_gps, weatherdata1,  weatherdata2,  weatherdata3], axis=1)
global_weath.head()

Unnamed: 0,index,destination,lat,lon,dt,sunrise,sunset,temp,feels_like,pressure,...,wind_speed,wind_deg,wind_gust,weather,rain,0,id,main,description,icon
0,1,Mont Saint Michel,48.6359541,-1.511459954959514,1647508135,1647497739,1647540798,282.54,280.43,1033,...,3.95,9,6.52,"[{'id': 800, 'main': 'Clear', 'description': '...",,"{'id': 800, 'main': 'Clear', 'description': 'c...",800,Clear,clear sky,01d
1,2,St Malo,48.649518,-2.0260409,1647508135,1647497862,1647540921,282.86,280.32,1033,...,5.14,360,,"[{'id': 800, 'main': 'Clear', 'description': '...",,"{'id': 800, 'main': 'Clear', 'description': 'c...",800,Clear,clear sky,01d
2,3,Bayeux,49.2764624,-0.7024738,1647508135,1647497550,1647540599,280.29,278.1,1033,...,3.19,356,4.43,"[{'id': 800, 'main': 'Clear', 'description': '...",,"{'id': 800, 'main': 'Clear', 'description': 'c...",800,Clear,clear sky,01d
3,4,Le Havre,49.4938975,0.1079732,1647508135,1647497357,1647540403,281.66,278.62,1032,...,5.66,350,,"[{'id': 804, 'main': 'Clouds', 'description': ...",,"{'id': 804, 'main': 'Clouds', 'description': '...",804,Clouds,overcast clouds,04d
4,5,Rouen,49.4404591,1.0939658,1647508135,1647497120,1647540166,280.63,277.53,1030,...,5.14,330,,"[{'id': 800, 'main': 'Clear', 'description': '...",,"{'id': 800, 'main': 'Clear', 'description': 'c...",800,Clear,clear sky,01d


In [25]:
# Cleaning of dataset

clean_weat = global_weath.loc[:,['index','destination','lat','lon','dt','temp','pressure','description']]
clean_weat.head()

Unnamed: 0,index,destination,lat,lon,dt,temp,pressure,description
0,1,Mont Saint Michel,48.6359541,-1.511459954959514,1647508135,282.54,1033,clear sky
1,2,St Malo,48.649518,-2.0260409,1647508135,282.86,1033,clear sky
2,3,Bayeux,49.2764624,-0.7024738,1647508135,280.29,1033,clear sky
3,4,Le Havre,49.4938975,0.1079732,1647508135,281.66,1032,overcast clouds
4,5,Rouen,49.4404591,1.0939658,1647508135,280.63,1030,clear sky


In [26]:
clean_weat[["lat", "lon"]] = clean_weat[["lat", "lon"]].apply(pd.to_numeric)
clean_weat.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35 entries, 0 to 34
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   index        35 non-null     int64  
 1   destination  35 non-null     object 
 2   lat          35 non-null     float64
 3   lon          35 non-null     float64
 4   dt           35 non-null     int64  
 5   temp         35 non-null     float64
 6   pressure     35 non-null     int64  
 7   description  35 non-null     object 
dtypes: float64(3), int64(3), object(2)
memory usage: 2.3+ KB


In [27]:
# Map of meteo

! pip install plotly
import plotly.graph_objects as go
import plotly.io as pio
import plotly.express as px
pio.renderers.default = "iframe_connected"



In [28]:
fig = px.scatter_mapbox(clean_weat, lat='lat', lon='lon', size = 'temp', opacity=0.4, mapbox_style="carto-positron",color = "description", zoom=4.5)
fig.show(renderer = "iframe_connected")

## 2 - Get hotels' info about each destination ##


#### Herebelow two manner to scrap booking.com on differents URL. I'd like to keep them for the workshop ####


#### A - Scraping with name of hotel ####


In [35]:
import html.parser
import re

In [36]:
# Retrieving for the first destination

respbook = r.get("https://www.booking.com/hotel/fr/le-relais-du-roy.fr.html?aid=304142;label=gen173nr-1FCAEoggI46AdIM1gEaE2IAQGYAQ24AQfIAQ3YAQHoAQH4AQOIAgGoAgO4Aqep9Y4GwAIB0gIkZjU4YzRiZjEtNDIwNC00Yjg3LTg1NWUtNDBkNDVhMGZjNmRj2AIF4AIB;sid=836997d7e7a0eb5d392ae733c38b84b1;age=6;age=6;dest_id=73;dest_type=country;dist=0;group_adults=2;group_children=2;hapos=2;hpos=2;no_rooms=1;req_adults=2;req_age=6;req_age=6;req_children=2;room1=A%2CA%2C6%2C6;sb_price_type=total;sr_order=popularity;srepoch=1643488946;srpvid=1f2f9198f6d5018b;type=total;ucfs=1&#hotelTmpl")
            
respbook

<Response [200]>

In [37]:
respbook.text



In [39]:
#display the html code structure
soup = BeautifulSoup(respbook.text, 'html.parser')
print(soup.prettify())

<!DOCTYPE html>
<!--
You know you could be getting paid to poke around in our code?
We're hiring designers and developers to work in Amsterdam:
https://careers.booking.com/
-->
<!-- wdot-802 -->
<script nonce="VfALZyj0UAxip30" type="text/javascript">
 document.addEventListener('DOMContentLoaded', function () {
//c360 javascript tracker first iteration
//sends a track request to c360 http tracker
//in order to use it:
//1. inline the c360Tracker.js in the page you need to use it
//2. in your js file:
//
// var c360Tracker = B.require('c360Tracker');
// var event = {
// action_name:"accommodation_checkout_confirmation_viewed",
// action_version :"0.2.0",
// content : { "transaction_id" : 123434},
// user : { "BKNG_user_id": 123434}
// };
// c360Tracker.track(event);
B.define('c360Tracker', function () {
var enrichedContext = {};
var configuration = {
validateInput: false
};
var track = function (event) {
if (event == null) {
return "event object is null or empty";
} else {
//if (enriched

### --> Test from Hotel's page (without loop on each destination)

In [41]:
# Get name of hotel
hotel = soup.find("a","fn").get_text("   ",strip=True)
hotel

'Le Relais Du Roy'

In [42]:
# Get GPS coordinates

gps = soup.find("a", "jq_tooltip loc_block_link_underline_fix bui-link show_on_map_hp_link").attrs['data-atlas-latlng']
gps

'48.61626270,-1.51090577'

In [43]:
# Get URL

url = soup.find_all("link")[20].attrs['href']
url

'https://www.booking.com/hotel/fr/le-relais-du-roy.fr.html'

In [44]:
# Get score

score = soup.find("div", "_9c5f726ff").get_text()
score

'8,0'

In [45]:
# Get description

description = soup.find_all(id="property_description_content")[0].get_text("   ",strip=True)
description

"Vous pouvez bénéficier d'une réduction Genius dans l'établissement Le Relais Du Roy\xa0!   Connectez-vous   pour économiser.   Le Relais Du Roy est un hôtel 3 étoiles situé au bord du Couesnon, à seulement 1,5 km du Mont Saint-Michel et à 50 mètres de la navette gratuite.   Décorées dans un style classique, toutes les chambres comprennent une télévision par satellite à écran plat, un téléphone et un plateau de bienvenue. Elles comprennent également une salle de bains privative pourvue d’une baignoire ou d’une douche, d’articles de toilettes gratuits et d’un sèche-cheveux. Certaines chambres possèdent un balcon.   Un petit-déjeuner buffet est servi tous les matins.   Une épicerie se trouve à moins de 3 minutes à pied. La ville de Pontorson est quant à elle accessible à 7 km. Un espace de stationnement privé est disponible moyennant des frais supplémentaires."

In [46]:
# loop through all elements to extract the info of every destination
listhot = []
for hotel, gps, url, description, score in zip(hotel, gps, url, description, score):
    
    hotel = soup.find("a","fn").get_text("   ",strip=True)
    gps = soup.find("a", "jq_tooltip loc_block_link_underline_fix bui-link show_on_map_hp_link").attrs['data-atlas-latlng']
    url = soup.find_all("link")[20].attrs['href']
    description = soup.find_all(id="property_description_content")[0].get_text("   ",strip=True)
    score = soup.find("div", "_9c5f726ff").get_text()
    
    hotel_info = {
        'name_of_hotel': hotel,
        'GPS coordinates': gps,
        'link': url,
        'description': description,
        'score': score
    }
    listhot.append(hotel_info)
    

In [47]:
listhot = listhot[0]
listhot

{'name_of_hotel': 'Le Relais Du Roy',
 'GPS coordinates': '48.61626270,-1.51090577',
 'link': 'https://www.booking.com/hotel/fr/le-relais-du-roy.fr.html',
 'description': "Vous pouvez bénéficier d'une réduction Genius dans l'établissement Le Relais Du Roy\xa0!   Connectez-vous   pour économiser.   Le Relais Du Roy est un hôtel 3 étoiles situé au bord du Couesnon, à seulement 1,5 km du Mont Saint-Michel et à 50 mètres de la navette gratuite.   Décorées dans un style classique, toutes les chambres comprennent une télévision par satellite à écran plat, un téléphone et un plateau de bienvenue. Elles comprennent également une salle de bains privative pourvue d’une baignoire ou d’une douche, d’articles de toilettes gratuits et d’un sèche-cheveux. Certaines chambres possèdent un balcon.   Un petit-déjeuner buffet est servi tous les matins.   Une épicerie se trouve à moins de 3 minutes à pied. La ville de Pontorson est quant à elle accessible à 7 km. Un espace de stationnement privé est dispon

### --> Scrap from France page with loop

In [48]:
def booking(destination, country):
    
    
    # Import libraries and set headers
   
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.2 Safari/605.1.15"}
    
    # Initialize empty dataframe
    final_df = pd.DataFrame()
    
    # Loop for each city
    for city in destination:
        url = "https://www.booking.com/searchresults.fr.html?&order=price&ss={city}%2C%20{country}"\
                .format(city=city,
                        country=country)
        response = r.get(url, headers=headers)
        soup2 = BeautifulSoup(response.text, "html")
        
        # Get url links
        links = soup2.find_all("a","fb01724e5b")
        links_list = ["https://www.booking.com" + str(link.get("href").replace("\n", "")) for link in links]
        final_links = pd.DataFrame(links_list[:1], columns=["Link"])
        IDs = [i for i in range(1)]
        final_links["ID"] = IDs
        
        # Get house names
        names = soup2.find_all("div", class_="fde444d7ef _c445487e2")
        names_list = [name.get_text() for name in names]
        for i in range(len(names_list)):
            names_list[i] = names_list[i].replace("\n", "")
        houses = pd.DataFrame(names_list, columns=["House"]).head(1)
        houses["ID"] = IDs
        
         # Get description
        descrip = soup2.find_all("div", class_="_4abc4c3d5")
        descrip_list = [dess.get_text() for dess in descrip]
        for i in range(len(descrip_list)):
            descrip_list[i] = descrip_list[i].replace("\n", "")
        Description = pd.DataFrame(descrip_list, columns=["Description"]).head(1)
        Description["ID"] = IDs
        
          # Get score
        score = soup2.find_all("div", class_="_9c5f726ff bd528f9ea6")
        score_list = [sco.get_text() for sco in score]
        for i in range(len(score_list)):
            descrip_list[i] = score_list[i].replace("\n", "")
        scorebook = pd.DataFrame(score_list, columns=["Score"]).head(1)
        scorebook["ID"] = IDs
        
        # Merge the two dataframes
        df = final_links.merge(houses, on="ID").merge(Description, on="ID").merge(scorebook, on="ID")
        df["City"] = f"{city}"
        final_df = final_df.append(df).reset_index(drop=True)
        
        
        final_df.to_csv("booking.csv")
    
    return final_df
    


In [49]:
booking(destination, country)


The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.


The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.


The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.


The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.


The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.


The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.


The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.


The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.


The frame.append method is deprecated a

Unnamed: 0,Link,ID,House,Description,Score,City
0,https://www.booking.comhttps://www.booking.com...,0,Hôtel Vert,"Situé à 2 km du Mont-Saint-Michel, sur la côte...",81,Mont Saint Michel
1,https://www.booking.comhttps://www.booking.com...,0,Cap à l'Ouest by Cocoonr,Le Cap à l'Ouest by Cocoonr est situé à Saint-...,84,St Malo
2,https://www.booking.comhttps://www.booking.com...,0,Le Castel Guesthouse,La maison d'hôtes Le Castel Guesthouse est sit...,87,Bayeux
3,https://www.booking.comhttps://www.booking.com...,0,Hilton Garden Inn Le Havre Centre,Établissement Voyage Durable,88,Le Havre
4,https://www.booking.comhttps://www.booking.com...,0,"Radisson Blu Hotel, Rouen Centre","Situé à Rouen, le Radisson Blu Hotel, Rouen Ce...",89,Rouen
5,https://www.booking.comhttps://www.booking.com...,0,Lennon Hotel Paris,"Doté d'un bar, le Lennon Hotel Paris est situé...",90,Paris
6,https://www.booking.comhttps://www.booking.com...,0,Le Majestic Cathédrale,"Situé à Amiens, à 1,7 km du Zénith, l'établiss...",92,Amiens
7,https://www.booking.comhttps://www.booking.com...,0,Hotel Lille Europe,L'Hotel Lille Europe est un établissement 3 ét...,81,Lille
8,https://www.booking.comhttps://www.booking.com...,0,Comfort Hotel Strasbourg - Montagne Verte & Re...,"Situé près de l'Ill, le Comfort Hotel Strasbou...",83,Strasbourg
9,https://www.booking.comhttps://www.booking.com...,0,Les Chambres du Haut-Koenigsbourg,"Situé à Orschwiller, en Alsace, l'établissemen...",88,Chateau du Haut Koenigsbourg


In [50]:
findf = pd.read_csv("booking.csv")
findf

Unnamed: 0.1,Unnamed: 0,Link,ID,House,Description,Score,City
0,0,https://www.booking.comhttps://www.booking.com...,0,Hôtel Vert,"Situé à 2 km du Mont-Saint-Michel, sur la côte...",81,Mont Saint Michel
1,1,https://www.booking.comhttps://www.booking.com...,0,Cap à l'Ouest by Cocoonr,Le Cap à l'Ouest by Cocoonr est situé à Saint-...,84,St Malo
2,2,https://www.booking.comhttps://www.booking.com...,0,Le Castel Guesthouse,La maison d'hôtes Le Castel Guesthouse est sit...,87,Bayeux
3,3,https://www.booking.comhttps://www.booking.com...,0,Hilton Garden Inn Le Havre Centre,Établissement Voyage Durable,88,Le Havre
4,4,https://www.booking.comhttps://www.booking.com...,0,"Radisson Blu Hotel, Rouen Centre","Situé à Rouen, le Radisson Blu Hotel, Rouen Ce...",89,Rouen
5,5,https://www.booking.comhttps://www.booking.com...,0,Lennon Hotel Paris,"Doté d'un bar, le Lennon Hotel Paris est situé...",90,Paris
6,6,https://www.booking.comhttps://www.booking.com...,0,Le Majestic Cathédrale,"Situé à Amiens, à 1,7 km du Zénith, l'établiss...",92,Amiens
7,7,https://www.booking.comhttps://www.booking.com...,0,Hotel Lille Europe,L'Hotel Lille Europe est un établissement 3 ét...,81,Lille
8,8,https://www.booking.comhttps://www.booking.com...,0,Comfort Hotel Strasbourg - Montagne Verte & Re...,"Situé près de l'Ill, le Comfort Hotel Strasbou...",83,Strasbourg
9,9,https://www.booking.comhttps://www.booking.com...,0,Les Chambres du Haut-Koenigsbourg,"Situé à Orschwiller, en Alsace, l'établissemen...",88,Chateau du Haut Koenigsbourg


In [54]:
bookinglobal = pd.concat([clean_weat,findf], axis=1)
bookinglobal.head()

Unnamed: 0.1,index,destination,lat,lon,dt,temp,pressure,description,Unnamed: 0,Link,ID,House,Description,Score,City
0,1,Mont Saint Michel,48.635954,-1.51146,1647508135,282.54,1033,clear sky,0,https://www.booking.comhttps://www.booking.com...,0,Hôtel Vert,"Situé à 2 km du Mont-Saint-Michel, sur la côte...",81,Mont Saint Michel
1,2,St Malo,48.649518,-2.026041,1647508135,282.86,1033,clear sky,1,https://www.booking.comhttps://www.booking.com...,0,Cap à l'Ouest by Cocoonr,Le Cap à l'Ouest by Cocoonr est situé à Saint-...,84,St Malo
2,3,Bayeux,49.276462,-0.702474,1647508135,280.29,1033,clear sky,2,https://www.booking.comhttps://www.booking.com...,0,Le Castel Guesthouse,La maison d'hôtes Le Castel Guesthouse est sit...,87,Bayeux
3,4,Le Havre,49.493898,0.107973,1647508135,281.66,1032,overcast clouds,3,https://www.booking.comhttps://www.booking.com...,0,Hilton Garden Inn Le Havre Centre,Établissement Voyage Durable,88,Le Havre
4,5,Rouen,49.440459,1.093966,1647508135,280.63,1030,clear sky,4,https://www.booking.comhttps://www.booking.com...,0,"Radisson Blu Hotel, Rouen Centre","Situé à Rouen, le Radisson Blu Hotel, Rouen Ce...",89,Rouen


In [57]:
# Recording the final file

bookinglobal.to_csv('booking35.csv', index=False)

In [None]:
# ETL part (to be filled in)