# Kayak

* See: https://app.jedha.co/course/project-plan-your-trip-with-kayak-ft/plan-your-trip-with-kayak-ft

Summary
* 70% of users planning a trip would like more information about their destination.
* People tend to distrust the information they read when they don't know the brand that produced the content => sourcing matters.
* Therefore, the Kayak Marketing Team would like to create an application that recommends where people should plan their next holiday.
* The application should be based on real data about:
    * Weather 
    * Hotels in the area 
* The application should then be able to recommend the best destinations and hotels based on the above variables at any given time. 


## Helpers

<!-- ### Get weather data with an API 

1. DONE - Use https://nominatim.org/ to get the gps coordinates of all the cities (no subscription required) Documentation : https://nominatim.org/release-docs/develop/api/Search/

1. DONE - Use https://openweathermap.org/appid and https://openweathermap.org/api/one-call-api to get some information about the weather for the 35 cities and put it in a DataFrame

1. DONE - Determine the list of cities where the weather will be the nicest within the next 7 days. For example, you can use the values of daily.pop and daily.rain to compute the expected volume of rain within the next 7 days. That is only an example: you may have a different opinion on what nice weather looks like 😎 Maybe the most important criterion for you is temperature or humidity, so feel free to change the rules!


1. DONE - Save all the results in a `.csv` file; you will use it later. You can save any information that seems important to you! Don't forget to save the names of the cities, and to create a column containing a unique identifier (id) for each city (this is important for what comes next in the project)

1. DONE - Use plotly to display the best destinations on a map -->


 



### Scrape Booking.com 

Since Booking Holdings doesn't have aggregated databases, it will be much faster to scrape data directly from booking.com.

You can scrape as much information as you want, but we suggest that you get at least:

*   Hotel name
*   URL of its booking.com page
*   Its coordinates: latitude and longitude
*   Score given by the website users
*   Text description of the hotel

Keep the top 20 hotels for each city.
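The fields listed above can be held in a small record type, and a top-20 cut can be made by score. This is only a sketch: the `Hotel` class, its field names, and the `top_hotels` helper are assumptions introduced for illustration, not names used by the scrapers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hotel:
    """One scraped hotel (hypothetical record; field names are illustrative)."""
    name: str
    url: str
    latitude: float
    longitude: float
    score: Optional[float]  # assumed: a listing may have no user score yet
    description: str


def top_hotels(hotels: List[Hotel], n: int = 20) -> List[Hotel]:
    """Return the n best-scored hotels; hotels without a score come last."""
    return sorted(
        hotels,
        key=lambda h: (h.score is None, -(h.score or 0.0)),
    )[:n]
```

Sorting on a `(missing, -score)` pair keeps unscored hotels at the end instead of failing on a `None` comparison.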




<!-- 
### Create your data lake using S3 

Once you have built your dataset, store it in S3 as a CSV file.

### ETL 

Once your data is uploaded to S3, it will be easier for the next data analysis team to extract clean data directly from a data warehouse. Therefore, create a SQL database using AWS RDS, extract your data from S3, and store it in your newly created DB.
 -->


In [65]:
# prelude
import time
import json
import pandas as pd

from pathlib import Path

import my_api_id as key

kOpenWeatherMapKey  = key.openweathermap
kGold               = 1.618
kWidth              = 12
kHeight             = kWidth/kGold
kWidthPx            = 1024
kHeightPx           = kWidthPx/kGold

# kCurrentDir         = Path(__file__).parent
kFilename           = "cities.csv"
kOneCityFile        = "one_city.json"
kHotelsListFile     = "hotels_list.json"
kHotelsAttributes   = "hotels_attributes.json"

# 1. Get the top n cities 

In [76]:
# Read only the first 3 cities while developing; drop nrows to load them all
df_cities = pd.read_csv(kFilename, nrows=3)
# The CSV's unnamed index column becomes the city id
df_cities.rename(columns={"Unnamed: 0": "id"}, inplace=True)
df_cities.head(10)


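|   | id | city | latitude | longitude | temp | rain |
|---|----|------|----------|-----------|------|------|
| 0 | 0 | Aix en Provence | 43.529842 | 5.447474 | 17.8 | 0.0 |
| 1 | 1 | Grenoble | 45.18756 | 5.735782 | 17.68 | 0.0 |
| 2 | 2 | Lyon | 45.757814 | 4.832011 | 17.15 | 2.05 |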
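The cell above simply reads the first rows of `cities.csv`. If the file is not already sorted, a "top n" selection could be made from the `temp` and `rain` columns. The sketch below assumes one possible rule (least rain first, then highest temperature); the `top_cities` helper and that rule are illustrative assumptions, not the notebook's method.

```python
import pandas as pd

# The three rows shown above, rebuilt in memory so the sketch runs on its own.
df_cities = pd.DataFrame(
    {
        "id": [0, 1, 2],
        "city": ["Aix en Provence", "Grenoble", "Lyon"],
        "latitude": [43.529842, 45.18756, 45.757814],
        "longitude": [5.447474, 5.735782, 4.832011],
        "temp": [17.8, 17.68, 17.15],
        "rain": [0.0, 0.0, 2.05],
    }
)


def top_cities(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Assumed rule: least expected rain first, ties broken by warmest."""
    ranked = df.sort_values(["rain", "temp"], ascending=[True, False])
    return ranked.head(n).reset_index(drop=True)


best = top_cities(df_cities, n=2)
print(best["city"].tolist())
```

Any other definition of nice weather only requires changing the sort keys.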


# 2. For each city generate a list of hotels

In [None]:
# For each city in the table
#   If it exists, delete the file one_city.json
#   Generate a one_city.json file containing the city name

#   If it exists, delete the file hotels_list.json
#   Invoke scraper8_hotels_per_city
#   Wait until hotels_list.json has been generated

#   If it exists, delete the file hotels_attributes.json
#   Invoke scraper7_attributes
#   Wait until hotels_attributes.json has been generated

#   
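The plan above could be wrapped in one function per city. The following is a minimal sketch, not the notebook's actual code: `run_scraper` and `scrape_city` are hypothetical helpers, and it assumes both scraper scripts read `one_city.json` from the current directory and write their JSON output there. It uses `subprocess.run` in place of the notebook's `!python` so that it also works outside Jupyter.

```python
import json
import subprocess
import sys
from pathlib import Path


def run_scraper(script: str, expected_output: Path) -> None:
    """Delete the stale output, run the scraper, then check the file exists."""
    expected_output.unlink(missing_ok=True)
    # subprocess.run blocks until the script exits; check=True raises on failure.
    subprocess.run([sys.executable, script], check=True)
    if not expected_output.exists():
        raise FileNotFoundError(f"{script} did not produce {expected_output}")


def scrape_city(city: str, workdir: Path = Path("."), runner=run_scraper) -> None:
    """Run the two scraping steps for one city (hypothetical helper)."""
    # Step 1: tell the scrapers which city to process.
    one_city = workdir / "one_city.json"
    one_city.unlink(missing_ok=True)
    one_city.write_text(
        json.dumps([{"city": city}], ensure_ascii=False), encoding="utf-8"
    )
    # Step 2: list the hotels of the city, then fetch their attributes.
    runner("scraper8_hotels_per_city.py", workdir / "hotels_list.json")
    runner("scraper7_attributes.py", workdir / "hotels_attributes.json")
```

The `runner` parameter exists only so the flow can be exercised without launching the real scrapers; in normal use `workdir` should stay the directory the scripts run in.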

In [50]:


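for index, row in df_cities.iterrows():

  # Remove any one_city.json left over from a previous run
  pathToOneCityFile = Path(kOneCityFile)
  pathToOneCityFile.unlink(missing_ok=True)

  # Write the city name as a one-element JSON list, e.g. [{"city": "Lyon"}]
  # The file is overwritten on each pass, so it ends up holding the last city
  with open(pathToOneCityFile, 'w', encoding='utf-8') as f:
    json.dump([{"city": row["city"]}], f, ensure_ascii=False)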



In [61]:
pathToHotelsListFile = Path(kHotelsListFile)
pathToHotelsListFile.unlink(missing_ok=True)

!python "scraper8_hotels_per_city.py"

print("DONE: Getting the names of the hotels for the current city")
print()


DONE: Getting the names of the hotels for the current city



In [62]:

# The file we are waiting for
expected_file = Path(kHotelsListFile)

while not expected_file.exists():
    time.sleep(1)  # Wait one second before checking again

# Resume the notebook once the JSON file has been created
print("The json file with the list of hotels of the current city is created")
print("Let's move forward")

The json file with the list of hotels names and URL is created
Let's move forward
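The polling loop above never ends if the scraper fails before writing its file. A minimal sketch of the same wait with a timeout follows; `wait_for_file` and its default values are assumptions introduced here, not part of the notebook.

```python
import time
from pathlib import Path


def wait_for_file(path: Path, timeout: float = 60.0, poll: float = 1.0) -> bool:
    """Poll until path exists; return False if timeout seconds pass first."""
    deadline = time.monotonic() + timeout
    while not path.exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True
```

A caller could then stop the notebook cleanly, for instance by raising an error when `wait_for_file(Path("hotels_list.json"))` returns `False`.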


In [67]:
pathToHotelsAttributes = Path(kHotelsAttributes)
pathToHotelsAttributes.unlink(missing_ok=True)

!python "scraper7_attributes.py"

print("DONE: Getting the attributes of the hotels of the current city")
print()


DONE: Getting the attributes of the hotels of the current city



In [66]:

# The file we are waiting for
expected_file = Path(kHotelsAttributes)

while not expected_file.exists():
    time.sleep(1)  # Wait one second before checking again

# Resume the notebook once the JSON file has been created
print("The json file with the attributes of the hotels of the current city is created")
print("Let's move forward")

The json file with the attributes of the hotels of the current city is created
Let's move forward


In [82]:
# At this point
# df_cities               contains the cities with their latitude, longitude, temp and rain forecast
# one_city.json           contains the name of the city
# hotels_list.json        contains the names and URLs of the hotels of the city
# hotels_attributes.json  contains, in the same order, the attributes of each hotel of the list

# Let's store all this data in a DataFrame

df_attributes = pd.read_json(kHotelsAttributes)
# df_attributes.head()

df_hotels = pd.read_json(kHotelsListFile)
# df_hotels.head()

# Both files list the hotels in the same order, so a column-wise concat aligns them
df = pd.concat([df_hotels, df_attributes], axis=1)
# df.head()

# Columns of df_cities: id, city, latitude, longitude, temp, rain
current_index = 2
city_row = df_cities.iloc[current_index]
# Access by column name: positional [] on a Series is deprecated in recent pandas
df["city"] = city_row["city"]
df["latitude_city"] = city_row["latitude"]
df["longitude_city"] = city_row["longitude"]
df["temp"] = city_row["temp"]
df["rain"] = city_row["rain"]
display(df)


|   | hotel | url | score | latitude | longitude | description | city | latitude_city | longitude_city | temp | rain |
|---|-------|-----|-------|----------|-----------|-------------|------|---------------|----------------|------|------|
| 0 | Les Cocons Heritage | https://www.booking.com/hotel/fr/les-cocons-he... | 7.8 | 45.756962 | 4.87282 | Situé à Lyon, à moins de 1,7 km de la gare de ... | Lyon | 45.757814 | 4.832011 | 17.15 | 2.05 |
| 1 | Citadines Presqu'île Lyon | https://www.booking.com/hotel/fr/citadines-apa... | 8.4 | 45.761587 | 4.832627 | Le Citadines Presqu'île Lyon est situé dans le... | Lyon | 45.757814 | 4.832011 | 17.15 | 2.05 |
| 2 | Campanile Lyon Centre - Berges du Rhône | https://www.booking.com/hotel/fr/bleumarinelyo... | 7.1 | 45.7567 | 4.841953 | Le Campanile Lyon Centre - Berges du Rhône pro... | Lyon | 45.757814 | 4.832011 | 17.15 | 2.05 |
| 3 | Boscolo Lyon Hotel & Spa | https://www.booking.com/hotel/fr/boscolo-exedr... | 8.4 | 45.760833 | 4.837589 | Le Boscolo Lyon Hotel & Spa occupe un bâtiment... | Lyon | 45.757814 | 4.832011 | 17.15 | 2.05 |
| 4 | Best Western Hotel du Pont Wilson | https://www.booking.com/hotel/fr/lyon-wilson.f... | 8.3 | 45.758431 | 4.841479 | Installé dans le centre-ville de Lyon, le Best... | Lyon | 45.757814 | 4.832011 | 17.15 | 2.05 |
| 5 | Hôtel du Helder | https://www.booking.com/hotel/fr/hotelduhelder... | 6.4 | 45.752111 | 4.840485 | Situé dans le centre de Lyon, l'Hôtel du Helde... | Lyon | 45.757814 | 4.832011 | 17.15 | 2.05 |
| 6 | TRIBE Lyon Croix Rousse | https://www.booking.com/hotel/fr/tribe-lyon-cr... | 8.7 | 45.78235 | 4.83378 | Situé à Lyon, à 1,9 km du musée des beaux-arts... | Lyon | 45.757814 | 4.832011 | 17.15 | 2.05 |
| 7 | Hôtel Charlemagne by Happyculture | https://www.booking.com/hotel/fr/hotelbestwest... | 8.0 | 45.746023 | 4.823894 | L’Hôtel Charlemagne by Happyculture se trouve ... | Lyon | 45.757814 | 4.832011 | 17.15 | 2.05 |
| 8 | Hôtel Chromatics & Restaurant Hill Club by Hap... | https://www.booking.com/hotel/fr/perrache.fr.html | 7.6 | 45.744684 | 4.825809 | L’Hôtel Chromatics & Restaurant Hill Club by H... | Lyon | 45.757814 | 4.832011 | 17.15 | 2.05 |
| 9 | Collège Hôtel | https://www.booking.com/hotel/fr/college.fr.html | 8.1 | 45.765711 | 4.827827 | Bénéficiant d’un emplacement central dans la v... | Lyon | 45.757814 | 4.832011 | 17.15 | 2.05 |
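The result above covers a single city (`current_index = 2`, Lyon). One way to extend it to every city is to build one such frame per city and stack them. This is a sketch under assumptions: `attach_city` and `combine_cities` are hypothetical helpers, and the column names follow the table above.

```python
from typing import List

import pandas as pd


def attach_city(df_hotels: pd.DataFrame, city_row: pd.Series) -> pd.DataFrame:
    """Return a copy of df_hotels with the city's data repeated on every row."""
    out = df_hotels.copy()
    out["city"] = city_row["city"]
    out["latitude_city"] = city_row["latitude"]
    out["longitude_city"] = city_row["longitude"]
    out["temp"] = city_row["temp"]
    out["rain"] = city_row["rain"]
    return out


def combine_cities(frames: List[pd.DataFrame]) -> pd.DataFrame:
    """Stack the per-city frames into one table with a fresh 0..n-1 index."""
    return pd.concat(frames, ignore_index=True)
```

In the per-city loop, each frame would be appended to a list and the combined table saved once at the end, for example with `to_csv` and `index=False`.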
