# Kayak

* Voir : https://app.jedha.co/course/project-plan-your-trip-with-kayak-ft/plan-your-trip-with-kayak-ft

Summary
* 70% of the users who are planning a trip would like to have more information about the destination they are going to. 
* People tend to be defiant about the information they are reading if they don't know the brand which produced the content => SOURCING 
* Therefore, Kayak Marketing Team would like to create an application that will recommend where people should plan their next holidays. 
* The application should be based on real data about:
    * Weather 
    * Hotels in the area 
* The application should then be able to recommend the best destinations and hotels based on the above variables at any given time. 


## Helpers

### Get weather data with an API 

1. DONE - Use https://nominatim.org/ to get the gps coordinates of all the cities (no subscription required) Documentation : https://nominatim.org/release-docs/develop/api/Search/

1. DONE - Use https://openweathermap.org/appid and https://openweathermap.org/api/one-call-api to get some information about the weather for the 35 cities and put it in a DataFrame

1. DONE - Determine the list of cities where the weather will be the nicest within the next 7 days For example, you can use the values of daily.pop and daily.rain to compute the expected volume of rain within the next 7 days... But it's only an example, actually you can have different opinions on a what a nice weather would be like 😎 Maybe the most important criterion for you is the temperature or humidity, so feel free to change the rules !


1. DONE - Save all the results in a `.csv` file, you will use it later 😉 You can save all the informations that seem important to you ! Don't forget to save the name of the cities, and also to create a column containing a unique identifier (id) of each city (this is important for what's next in the project)

1. DONE - Use plotly to display the best destinations on a map 



<!--
### Scrape Booking.com 

Since BookingHoldings doesn't have aggregated databases, it will be much faster to scrape data directly from booking.com 

You can scrap as many information asyou want, but we suggest that you get at least:

*   hotel name,
*   Url to its booking.com page,
*   Its coordinates: latitude and longitude
*   Score given by the website users
*   Text description of the hotel


### Create your data lake using S3 

Once you managed to build your dataset, you should store into S3 as a csv file. 

### ETL 

Once you uploaded your data onto S3, it will be better for the next data analysis team to extract clean data directly from a Data Warehouse. Therefore, create a SQL Database using AWS RDS, extract your data from S3 and store it in your newly created DB.  -->

# Architecture & process

<p align="center">
<img src="./assets/process1.png" alt="drawing" width="800"/>
<p>


In [11]:
# prelude
import pandas as pd
import requests
import plotly.express as px
from pathlib import Path

import my_api_id as key
k_OpenWeatherMapKey  = key.openweathermap

import include_kayak as k
k_Current_dir = Path.cwd()


In [12]:
ma_liste = [
  "Le Havre",
  "Rouen",
  "Paris",
]

# ma_liste = [
#   "Mont Saint Michel",
#   "St Malo",
#   "Bayeux",
#   "Le Havre",
#   "Rouen",
#   "Paris",
#   "Amiens",
#   "Lille",
#   "Strasbourg",
#   "Chateau du Haut Koenigsbourg",
#   "Colmar",
#   "Eguisheim",
#   "Besancon",
#   "Dijon",
#   "Annecy",
#   "Grenoble",
#   "Lyon",
#   "Gorges du Verdon",
#   "Bormes les Mimosas",
#   "Cassis",
#   "Marseille",
#   "Aix en Provence",
#   "Avignon",
#   "Uzes",
#   "Nimes",
#   "Aigues Mortes",
#   "Saintes Maries de la mer",
#   "Collioure",
#   "Carcassonne",
#   "Ariege",
#   "Toulouse",
#   "Montauban",
#   "Biarritz",
#   "Bayonne",
#   "La Rochelle"
# ]

# 1. Get GPS coordinates 

In [13]:
df_gps = pd.DataFrame (
  {
    "city"      : [],
    "latitude"  : [],
    "longitude" : [],
  }
)

url = "https://nominatim.openstreetmap.org/search"

for site in ma_liste:
  params = {
    "q" : site,
    "countrycodes" : "fr",
    "format":"json",
  }
  response = requests.get(url, params=params) 
  # print(response.json())
  
  list_tmp = [site, response.json()[0]["lat"], response.json()[0]["lon"]]
  df_gps.loc[len(df_gps)] = list_tmp

  # Alternative
  # new_row = pd.DataFrame (
  #   {
  #     "city"      : [site],
  #     "latitude"  : [response.json()[0]["lat"]],
  #     "longitude" : [response.json()[0]["lon"]],
  #   }
  # )
  # df_gps = pd.concat([df_gps, new_row], ignore_index=True)

display(df_gps)

Unnamed: 0,city,latitude,longitude
0,Le Havre,49.4938975,0.1079732
1,Rouen,49.4404591,1.0939658
2,Paris,48.8588897,2.3200410217200766


# 2. Get weather forecast 

In [1]:

# TODO : evaluate_xxx() functions can be much more sophisticated

def evaluate_proba_rain(df):
  rain = round(100 * df["pop"].mean(), 2)
  return rain

def evaluate_proba_temp(df):
  temp = round((df["main.temp"].mean()-273.0), 2)
  return temp


In [15]:
df_weather = pd.DataFrame (
  {
    "city"  : [],
    "temp"  : [],
    "rain"  : [],
  }
)

url_weather = "http://api.openweathermap.org/data/2.5/forecast"

for index, row in df_gps.iterrows():
  params = {
    "lat" : row['latitude'],
    "lon" : row['longitude'],
    "appid" : k_OpenWeatherMapKey,
  }

  response = requests.get(url_weather, params=params) 

  df_tmp = pd.json_normalize(response.json(), record_path=["list"])

  proba_rain = evaluate_proba_rain(df_tmp)
  proba_temp = evaluate_proba_temp(df_tmp)
  # TODO : others functions could be added here

  list_tmp = [row["city"], proba_temp, proba_rain]
  df_weather.loc[len(df_weather)] = list_tmp

display(df_weather)



Unnamed: 0,city,temp,rain
0,Le Havre,11.51,12.32
1,Rouen,12.16,9.72
2,Paris,14.8,10.3


# 3. Rank sites

In [16]:
# This is an example of implementation
# Duck typing principle
# One can write its own function (more sophisticated) later

# Optimal temp is 25
# On classe les villes 
# 15° => abs(15-25) = 10
# 30° => abs(30-25) = 5
# Ville 2 sera classée en premier

def RankTheCities(df):
  k_TempOptimalTemp = 25     # °C

  df["id"] = abs(df["temp"] - k_TempOptimalTemp)
  df.sort_values(by="id", inplace=True)
  
  df.reset_index(inplace=True, drop=True)
  df.drop(["id"], axis=1, inplace=True)

  return df

In [17]:
df = pd.merge(df_gps, df_weather, how="left", on=["city", "city"])
df = RankTheCities(df)
# df.head()

# 4. Save as .csv

In [18]:
# current_dir = Path.cwd()
df.to_csv(k_Current_dir/k.AssetsDir/k.Filename)

# 5. Plot cities

In [19]:
df = pd.read_csv(k_Current_dir/k.AssetsDir/k.Filename)
df.rename(columns={"Unnamed: 0": "id"}, inplace=True)
df.head()



Unnamed: 0,id,city,latitude,longitude,temp,rain
0,0,Paris,48.85889,2.320041,14.8,10.3
1,1,Rouen,49.440459,1.093966,12.16,9.72
2,2,Le Havre,49.493898,0.107973,11.51,12.32


In [20]:
# POur les couleurs voir : https://plotly.com/python/builtin-colorscales/
# TODO : calculer le centre du zoom ? 
# Voir center=dict... sur https://plotly.com/python/scattermapbox/ 

fig = px.scatter_mapbox(
  df, 
  lat="latitude", 
  lon="longitude", 
  color="temp", 
  mapbox_style="open-street-map", 
  # color_continuous_scale = 'YlOrBr', 
  # color_continuous_scale = px.colors.cyclical.IceFire,
  color_continuous_scale=px.colors.sequential.Rainbow,
  zoom = 4.5,
  height = k.HeightPx,
  width = k.WidthPx,
  size = "temp",
  text="city"
)
fig.update_layout(
  title = "Best destinations",
)
fig.show()