<img src="https://companieslogo.com/img/orig/kayak_BIG-48662c15.png?t=1701236680&download=true" alt="kayak logo" />

# Data Collection 

### Import useful libraries

In [1]:
import pandas as pd
import json
import plotly.express as px

The marketing team wants to focus first on the best cities to visit in France.

## Store the list of best cities in a CSV file

In [11]:
!python scripts/0-destinations.py

Saving destinations into a csv file ...
...Done
All destinations are stored into : results\destination_names.csv


### Show results

In [13]:
display(pd.read_csv("results/destination_names.csv").head())

| Unnamed: 0 | destination |
|---|---|
| 0 | Mont Saint Michel |
| 1 | St Malo |
| 2 | Bayeux |
| 3 | Le Havre |
| 4 | Rouen |


## Get the coordinates of each city with an API

* Use **https://nominatim.org/** to get the GPS coordinates of each city
* Documentation: **https://nominatim.org/release-docs/develop/api/Search/**
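
The lookup described above can be sketched as follows. This is a minimal illustration and not the content of `scripts/1-Call_API_Nominatim.py`. It assumes the public endpoint `https://nominatim.openstreetmap.org/search` with the `q`, `format` and `limit` parameters from the Search documentation. The function names, the `", France"` suffix and the `User-Agent` value are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Tuple

# Public Nominatim Search endpoint (assumed; the original script may use another host).
NOMINATIM_SEARCH_URL = "https://nominatim.openstreetmap.org/search"


def build_search_url(city: str, country: str = "France") -> str:
    """Build a Search URL that asks for the single best match as JSON."""
    params = {"q": f"{city}, {country}", "format": "json", "limit": 1}
    return f"{NOMINATIM_SEARCH_URL}?{urllib.parse.urlencode(params)}"


def parse_coordinates(payload: list) -> Optional[Tuple[float, float]]:
    """Return (lat, lon) from a Search response, or None when nothing matched."""
    if not payload:
        return None
    # Nominatim returns "lat" and "lon" as strings, so convert them explicitly.
    return float(payload[0]["lat"]), float(payload[0]["lon"])


def geocode(city: str, user_agent: str = "kayak-demo-notebook") -> Optional[Tuple[float, float]]:
    """Query Nominatim for one city. This performs a network call."""
    request = urllib.request.Request(
        build_search_url(city),
        # The usage policy asks clients to identify themselves (hypothetical value).
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_coordinates(json.load(response))
```

When looping over many cities, pausing about one second between calls (for example with `time.sleep(1)`) keeps the requests within the public server's usage policy.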

In [14]:
!python scripts/1-Call_API_Nominatim.py

Saving destinations into a csv file ...
Done !
All coordinates are stored into : results\destination_coordinates.csv


### Show results

In [15]:
display(pd.read_csv("results/destination_coordinates.csv").head())

| Unnamed: 0 | destination | lat | lon |
|---|---|---|---|
| 0 | Mont Saint-Michel | 48.635954 | -1.51146 |
| 1 | Saint-Malo | 48.649518 | -2.026041 |
| 2 | Bayeux | 49.276462 | -0.702474 |
| 3 | Le Havre | 49.493898 | 0.107973 |
| 4 | Rouen | 49.440459 | 1.093966 |


## Get weather data with an API

* Read **https://openweathermap.org/appid** and **https://openweathermap.org/api/one-call-api**
* Get an API key
* Get weather data for each city and store it in a dataframe
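
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the content of `scripts/2-Call_API_OpenWeatherMap.py`. It assumes the One Call 3.0 endpoint with the `lat`, `lon`, `appid`, `units`, `lang` and `exclude` parameters. It also assumes that `temp_day` is the first day's daytime temperature and that `avg_temp` is the mean daytime temperature over the first seven forecast days. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from statistics import mean

# One Call 3.0 endpoint (assumed; the original script may target another version).
ONE_CALL_URL = "https://api.openweathermap.org/data/3.0/onecall"


def build_one_call_url(lat: float, lon: float, api_key: str) -> str:
    """Build a One Call URL returning daily forecasts in Celsius with French text."""
    params = {
        "lat": lat,
        "lon": lon,
        "appid": api_key,
        "units": "metric",
        "lang": "fr",
        "exclude": "current,minutely,hourly,alerts",
    }
    return f"{ONE_CALL_URL}?{urllib.parse.urlencode(params)}"


def summarize_daily(payload: dict, days: int = 7) -> dict:
    """Reduce a One Call response to the columns used in weather_data.csv."""
    daily = payload["daily"][:days]
    return {
        # Daytime temperature and description of the first forecast day.
        "temp_day": daily[0]["temp"]["day"],
        # Mean daytime temperature over the selected days (assumed definition).
        "avg_temp": mean(day["temp"]["day"] for day in daily),
        "description": daily[0]["weather"][0]["description"],
    }


def fetch_weather(lat: float, lon: float, api_key: str) -> dict:
    """Call the API for one location. This performs a network call."""
    with urllib.request.urlopen(build_one_call_url(lat, lon, api_key), timeout=10) as response:
        return summarize_daily(json.load(response))
```

Reading the API key from an environment variable rather than hard-coding it keeps it out of the notebook and the repository.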

In [17]:
!python scripts/2-Call_API_OpenWeatherMap.py

Saving weather data into a csv file ...
Done !
All weather data are stored into : results\weather_data.csv


### Show results

In [25]:
display(pd.read_csv("results/weather_data.csv").head())

| Unnamed: 0 | city | lat | lon | temp_day | avg_temp | description |
|---|---|---|---|---|---|---|
| 0 | Mont Saint-Michel | 48.635954 | -1.51146 | 10.14 | 9.51 | pluie modérée |
| 1 | Saint-Malo | 48.649518 | -2.026041 | 10.43 | 8.468571 | pluie modérée |
| 2 | Bayeux | 49.276462 | -0.702474 | 8.96 | 8.397143 | pluie modérée |
| 3 | Le Havre | 49.493898 | 0.107973 | 8.68 | 8.065714 | pluie modérée |
| 4 | Rouen | 49.440459 | 1.093966 | 9.59 | 9.188571 | pluie modérée |


In [24]:
display(pd.read_csv("results/weather_data.csv")['description'].value_counts())

description
pluie modérée     11
ciel dégagé       11
légère pluie      10
pluie et neige     1
couvert            1
nuageux            1
Name: count, dtype: int64

### Plot the best destinations on a map

The best destinations have a "ciel dégagé" (clear sky) description and an average temperature above 11°C.


In [20]:
# Load data
weather_data = pd.read_csv("results/weather_data.csv", index_col=0)
weather_data = weather_data.reset_index() 
display(weather_data.head())

# Keep only destinations with a clear sky ('ciel dégagé')
# and an average temperature above 11°C

filtered_data = weather_data[(weather_data['description'] == 'ciel dégagé') & (weather_data['avg_temp'] > 11)]

# Sort by avg_temp descending so the warmest destinations come first
best_destinations = filtered_data.sort_values(by='avg_temp', ascending=False)

best_destinations = best_destinations.reset_index(drop=True)
best_destinations['id'] = best_destinations.index
columns_order = ['id'] + [col for col in best_destinations.columns if col != 'id']
best_destinations = best_destinations[columns_order]

# Show the best destinations
display(best_destinations)

# Create a new dataframe with the best destinations
best_destinations_df = pd.DataFrame(best_destinations)

# Save the best destinations to a csv file
print("Saving best destinations by weather to csv file...")
best_destinations_df.to_csv("results/best_destinations.csv")

# Visualize the best destinations on a map
fig =  px.scatter_mapbox(
    best_destinations_df,
    lat="lat",
    lon="lon",
    hover_name="city", 
    hover_data={"temp_day": True, "lat": False, "lon": False},  
    color="temp_day", 
    size="temp_day", 
    color_continuous_scale="sunset",
    title="Best destinations with clear sky and average temperature > 11°C",
    mapbox_style="open-street-map",
)

fig.update_layout(
    mapbox_style="open-street-map",
    margin={"r":0,"t":0,"l":0,"b":0},
    mapbox_zoom=5
)

fig.show()

| Unnamed: 0 | city | lat | lon | temp_day | avg_temp | description |
|---|---|---|---|---|---|---|
| 0 | Mont Saint-Michel | 48.635954 | -1.51146 | 10.14 | 9.51 | pluie modérée |
| 1 | Saint-Malo | 48.649518 | -2.026041 | 10.43 | 8.468571 | pluie modérée |
| 2 | Bayeux | 49.276462 | -0.702474 | 8.96 | 8.397143 | pluie modérée |
| 3 | Le Havre | 49.493898 | 0.107973 | 8.68 | 8.065714 | pluie modérée |
| 4 | Rouen | 49.440459 | 1.093966 | 9.59 | 9.188571 | pluie modérée |


| Unnamed: 0 | id | city | lat | lon | temp_day | avg_temp | description |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Marseille | 43.296174 | 5.369953 | 11.17 | 12.648571 | ciel dégagé |
| 1 | 1 | Aix-en-Provence | 43.529842 | 5.447474 | 10.31 | 12.132857 | ciel dégagé |
| 2 | 2 | Bormes-les-Mimosas | 43.150697 | 6.341928 | 12.0 | 12.097143 | ciel dégagé |
| 3 | 3 | Cassis | 43.214036 | 5.539632 | 10.26 | 12.09 | ciel dégagé |
| 4 | 4 | Collioure | 42.52505 | 3.083155 | 14.51 | 11.931429 | ciel dégagé |
| 5 | 5 | Aigues-Mortes | 43.566152 | 4.19154 | 12.48 | 11.064286 | ciel dégagé |
| 6 | 6 | Nîmes | 43.837425 | 4.360069 | 12.07 | 11.027143 | ciel dégagé |


Saving best destinations by weather to csv file...


## Scrape Booking.com

Scrape data directly from the Booking.com website.

Collect as much information as possible for each hotel, such as:
* Hotel name
* URL of its Booking.com page
* Its coordinates: latitude and longitude
* Its user rating
* Its text description
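
The fields listed above can be modelled as one record per hotel. The sketch below is only an illustration of that structure and is not taken from `scripts/3-Scrapy_on_Booking.py`. The `HotelRecord` class, the `to_float` and `make_record` helpers, and the raw key names (in particular `url`) are hypothetical. It assumes scraped values arrive as strings and converts the numeric ones to floats.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class HotelRecord:
    """One scraped hotel (hypothetical structure mirroring the list above)."""
    name: str
    url: str
    latitude: Optional[float]
    longitude: Optional[float]
    score: Optional[float]
    description: str


def to_float(raw) -> Optional[float]:
    """Convert scraped text such as '9' or '8,5' to a float, or None if unusable."""
    if raw is None:
        return None
    try:
        # Accept a decimal comma, which is common on French-language pages.
        return float(str(raw).strip().replace(",", "."))
    except ValueError:
        return None


def make_record(raw: dict) -> HotelRecord:
    """Build a cleaned record from a dictionary of raw scraped strings."""
    return HotelRecord(
        name=(raw.get("name") or "").strip(),
        url=(raw.get("url") or "").strip(),
        latitude=to_float(raw.get("latitude")),
        longitude=to_float(raw.get("longitude")),
        score=to_float(raw.get("score")),
        description=(raw.get("description") or "").strip(),
    )


def records_to_json(records: list) -> str:
    """Serialize records to JSON while keeping accented characters readable."""
    return json.dumps([asdict(r) for r in records], indent=4, ensure_ascii=False)
```

Storing the score and coordinates as numbers rather than strings makes it possible to sort by rating or plot the hotels without further conversion.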

In [26]:
!python scripts/3-Scrapy_on_Booking.py

2025-02-27 20:47:21 [scrapy.utils.log] INFO: Scrapy 2.12.0 started (bot: scrapybot)
2025-02-27 20:47:21 [scrapy.utils.log] INFO: Versions: lxml 5.3.1.0, libxml2 2.11.7, cssselect 1.2.0, parsel 1.10.0, w3lib 2.1.2, Twisted 23.10.0, Python 3.12.7 | packaged by Anaconda, Inc. | (main, Oct  4 2024, 13:17:27) [MSC v.1929 64 bit (AMD64)], pyOpenSSL 24.2.1 (OpenSSL 3.3.2 3 Sep 2024), cryptography 43.0.3, Platform Windows-11-10.0.26100-SP0
2025-02-27 20:47:21 [scrapy.addons] INFO: Enabled addons:
[]
DEBUG:scrapy.utils.log:Using reactor: twisted.internet.selectreactor.Select

### Show results

In [30]:
with open("results/booking_data.json", "r", encoding="utf-8") as file :
    booking_data = json.load(file)

if booking_data: 
    print(json.dumps(booking_data[0], indent=4, ensure_ascii=False))

{
    "name": "Vivez le Bonheur - Plage - Port de plaisance - Vue mer",
    "score": "9",
    "description": "Set in Le Havre, just less than 1 km from Le Havre Beach, Vivez le Bonheur - Plage - Port de plaisance - Vue mer offers beachfront accommodation with free WiFi. The property has sea views and is 600 metres from Eglise St-Joseph and less than 1 km from Le Volcan. The property is non-smoking and is situated 1.1 km from Perret Model Appartment.\n\nThe apartment features 1 bedroom, a flat-screen TV, a fully equipped kitchen with a microwave and a fridge, a washing machine, and 1 bathroom with a walk-in shower. Towels and bed linen are featured in the apartment.\n\nFor guests with children, the apartment features outdoor play equipment.\n\nSaint-Michel's Church is 1.6 km from Vivez le Bonheur - Plage - Port de plaisance - Vue mer, while Norman Museum of Ethnography and Popular Arts is 25 km away.",
    "latitude": "49.488163827840836",
    "longitude": "0.09894180510142903",
    "ur