# Project: Planning my next holidays: get coordinates and temperatures, then plot ☀️

Let's use APIs and web-scraping to determine what would be the nicest places to visit for a last-minute holiday next week!

According to <a href="https://one-week-in.com/35-cities-to-visit-in-france/" target="_blank">this website</a>, here are the 35 best places to visit in France in 2020:

* Mont Saint Michel, St Malo, Bayeux, Le Havre, Rouen, Paris, Amiens,
* Lille, Strasbourg, Chateau du Haut Koenigsbourg, Colmar, Eguisheim,
* Besancon, Dijon, Annecy, Grenoble, Lyon, Gorges du Verdon,
* Bormes les Mimosas, Cassis, Marseille, Aix en Provence, Avignon,
* Uzes, Nimes, Aigues Mortes, Saintes Maries de la mer, Collioure,
* Carcassonne, Ariege, Toulouse, Montauban, Biarritz, Bayonne, La Rochelle


In [20]:
#!pip install requests
#!pip install pandas
#!pip install plotly==5.9.0

In [22]:

print(requests.__version__)

2.31.0


In [23]:
import requests
import pandas as pd
import plotly.express as px
import os
from datetime import date
import datetime

In [24]:
city_names = [
    "Mont Saint Michel", "St Malo", "Bayeux", "Le Havre", "Rouen", "Paris", "Amiens", "Lille", "Strasbourg", 
    "Chateau du Haut Koenigsbourg", "Colmar", "Eguisheim", "Besancon", "Dijon", "Annecy", "Grenoble", "Lyon",
    "Gorges du Verdon", "Bormes les Mimosas", "Cassis", "Marseille", "Aix en Provence", "Avignon", "Uzes",
    "Nimes", "Aigues Mortes", "Saintes Maries de la mer", "Collioure", "Carcassonne", "Foix",
    "Toulouse", "Montauban", "Biarritz", "Bayonne", "La Rochelle"
]

In [26]:
columns = ['city_id' , 'name', 'latitude', 'longitude', 'main_weather', 'expected_rain', 'day_temperature']
dataset = pd.DataFrame(columns=columns)

In [27]:
LINK_OPEN_STREET= 'https://nominatim.openstreetmap.org/search'

In [28]:
LINK_OPEN_WEATHER = "https://api.openweathermap.org/data/3.0/onecall"

In [39]:
API_KEY_OPENWEATHER = os.getenv('API_KEY_OPENWEATHER')

In [40]:
for i, city in enumerate(city_names):

    print("Making requests for {}".format(city))

    # 1. Use the Nominatim API to get the GPS coordinates of the city
    par = {
        "city": city,
        "country": "France",
        "format": "json"
    }
    r = requests.get(LINK_OPEN_STREET, params=par)
    r.raise_for_status()  # stop early on an HTTP error
    res = r.json()

    dataset.loc[i, 'city_id'] = i
    dataset.loc[i, 'name'] = city
    dataset.loc[i, 'latitude'] = res[0]['lat']
    dataset.loc[i, 'longitude'] = res[0]['lon']

    # 2. Use the OpenWeatherMap API to get the daily forecast for the next 7 days
    par = {
        "lat": dataset.loc[i, 'latitude'],
        "lon": dataset.loc[i, 'longitude'],
        "exclude": "current,minutely,hourly",
        "units": "metric",
        "appid": API_KEY_OPENWEATHER,
    }
    r = requests.get(LINK_OPEN_WEATHER, params=par)
    r.raise_for_status()
    res = r.json()

    # Expected volume of rain: daily volume weighted by the probability of precipitation
    # (days without a 'rain' entry contribute nothing)
    expected_rain = sum(d['pop'] * d.get('rain', 0) for d in res['daily'])

    # Average daytime temperature
    temperatures = pd.Series([d['temp']['day'] for d in res['daily']])
    mean_temperature = temperatures.mean()

    # Most frequent weather condition over the forecast
    weathers = pd.Series([d['weather'][0]['main'] for d in res['daily']])
    main_weather = weathers.mode()[0]

    dataset.loc[i, 'main_weather'] = main_weather
    dataset.loc[i, 'expected_rain'] = expected_rain
    dataset.loc[i, 'day_temperature'] = mean_temperature
    

Making requests for Mont Saint Michel
Making requests for St Malo
Making requests for Bayeux
Making requests for Le Havre
Making requests for Rouen
Making requests for Paris
Making requests for Amiens
Making requests for Lille
Making requests for Strasbourg
Making requests for Chateau du Haut Koenigsbourg
Making requests for Colmar
Making requests for Eguisheim
Making requests for Besancon
Making requests for Dijon
Making requests for Annecy
Making requests for Grenoble
Making requests for Lyon
Making requests for Gorges du Verdon
Making requests for Bormes les Mimosas
Making requests for Cassis
Making requests for Marseille
Making requests for Aix en Provence
Making requests for Avignon
Making requests for Uzes
Making requests for Nimes
Making requests for Aigues Mortes
Making requests for Saintes Maries de la mer
Making requests for Collioure
Making requests for Carcassonne
Making requests for Foix
Making requests for Toulouse
Making requests for Montauban
Making requests for Biarrit
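
The three aggregates computed in the loop above can be checked offline. The sketch below is a minimal stand-alone version using only the standard library; `sample_daily` is hypothetical data shaped like the `daily` list used above, and the helper names are illustrative rather than part of the notebook.

```python
from collections import Counter


def expected_rain(daily):
    """Sum, over the forecast days, of rain volume weighted by its probability."""
    return sum(day.get("pop", 0.0) * day.get("rain", 0.0) for day in daily)


def mean_day_temperature(daily):
    """Average of the daytime temperatures."""
    temps = [day["temp"]["day"] for day in daily]
    return sum(temps) / len(temps)


def main_weather(daily):
    """Most frequent weather label; ties go to the label seen first."""
    labels = [day["weather"][0]["main"] for day in daily]
    return Counter(labels).most_common(1)[0][0]


# Hypothetical three-day forecast (not real API output)
sample_daily = [
    {"pop": 0.5, "rain": 4.0, "temp": {"day": 10.0}, "weather": [{"main": "Rain"}]},
    {"pop": 0.25, "rain": 2.0, "temp": {"day": 14.0}, "weather": [{"main": "Rain"}]},
    {"pop": 0.0, "temp": {"day": 12.0}, "weather": [{"main": "Clouds"}]},
]

print(expected_rain(sample_daily))
print(mean_day_temperature(sample_daily))
print(main_weather(sample_daily))
```

Note that `pandas.Series.mode()[0]` breaks ties by sorted order, whereas this sketch keeps the first label encountered, so the two can differ when several labels are equally frequent.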

In [41]:
#3. Determine the list of cities where the weather will be the nicest within the next 7 days.
dataset.loc[:,'rank'] = dataset['expected_rain'].rank(method='min')
dataset.loc[:,'inverted_rank'] = dataset['expected_rain'].rank(method='min', ascending=False)

dataset = dataset.sort_values(by=['expected_rain', 'day_temperature'], ascending = [True, False]).reset_index(drop=True)
display(dataset)

|   | city_id | name | latitude | longitude | main_weather | expected_rain | day_temperature | rank | inverted_rank |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 28 | Carcassonne | 43.2130358 | 2.3491069 | Clouds | 3.3434 | 17.5675 | 1.0 | 35.0 |
| 1 | 27 | Collioure | 42.52505 | 3.0831554 | Clouds | 3.603 | 18.3725 | 2.0 | 34.0 |
| 2 | 8 | Strasbourg | 48.584614 | 7.7507127 | Rain | 6.1416 | 13.91625 | 3.0 | 33.0 |
| 3 | 30 | Toulouse | 43.6044622 | 1.4442469 | Clouds | 10.65 | 16.25625 | 4.0 | 32.0 |
| 4 | 26 | Saintes Maries de la mer | 43.4515922 | 4.4277202 | Clouds | 11.49 | 14.70125 | 5.0 | 31.0 |
| 5 | 25 | Aigues Mortes | 43.5661521 | 4.19154 | Clouds | 11.6828 | 15.00375 | 6.0 | 30.0 |
| 6 | 9 | Chateau du Haut Koenigsbourg | 48.2495226 | 7.3454923 | Rain | 13.2914 | 11.66375 | 7.0 | 29.0 |
| 7 | 10 | Colmar | 48.0777517 | 7.3579641 | Rain | 13.2975 | 14.435 | 8.0 | 28.0 |
| 8 | 11 | Eguisheim | 48.0447968 | 7.3079618 | Rain | 15.0694 | 14.1475 | 9.0 | 27.0 |
| 9 | 24 | Nimes | 43.8374249 | 4.3600687 | Rain | 16.7611 | 17.215 | 10.0 | 26.0 |
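
To see what the `rank` and `inverted_rank` columns contain, here is a small pure-Python sketch of the "min" ranking rule (each value gets 1 plus the number of values that come strictly before it, so tied values share the lowest rank). The function name and the sample values are hypothetical; the notebook itself relies on `Series.rank(method='min')`.

```python
def min_rank(values, ascending=True):
    """Rank each value as 1 + the number of values that come strictly before it."""
    if ascending:
        return [1 + sum(other < v for other in values) for v in values]
    return [1 + sum(other > v for other in values) for v in values]


rain = [3.0, 1.0, 3.0, 2.0]  # hypothetical expected_rain values with one tie

print(min_rank(rain))                   # rank: driest city gets 1
print(min_rank(rain, ascending=False))  # inverted_rank: driest city gets the largest value
```

Because tied values share a rank, filtering on `rank == 1` can return several cities, and `inverted_rank` gives the driest cities the largest markers when used as a marker size.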


In [42]:
print('Best places for a trip next week are : ')

for i,row in dataset.loc[dataset['rank']==1,:].iterrows():
    print("{} -- Mostly {} with temperature {} °C".format(row['name'], row['main_weather'], row['day_temperature']))

Best places for a trip next week are : 
Carcassonne -- Mostly Clouds with temperature 17.567500000000003 °C


In [43]:
# Cast with astype on the whole DataFrame: assigning through .loc[:, col]
# can keep the existing column dtype in recent pandas versions
dataset = dataset.astype({
    'latitude': 'float',
    'longitude': 'float',
    'expected_rain': 'float',
    'day_temperature': 'float',
    'rank': 'int',
    'inverted_rank': 'int',
})

In [44]:
#4. Save all the results in a .csv file, you will use it later
os.makedirs("res", exist_ok=True)
dataset.to_csv('res/cities.csv', index=False)

In [45]:
import plotly
print(plotly.__version__)

5.9.0


In [46]:
#5. Use plotly to display the best destinations on a map.
fig = px.scatter_mapbox(dataset, lat="latitude", lon="longitude", hover_name = 'name', zoom = 4,
                        hover_data = ['main_weather', 'expected_rain', 'day_temperature'], 
                        color = 'day_temperature', color_continuous_scale = 'Bluered', size = 'inverted_rank',
                        mapbox_style="carto-positron")
fig.show()

# Project: Scrape hotels for each city to plan my next holidays ☀️

Let's create a script that retrieves information about all the hotels in a given city from <a href="https://www.booking.com" target="_blank">www.booking.com</a> 🧙

**We strongly recommend that you use Scrapy, it will be much easier!**

You can scrape as much information as you want, but we suggest that you get at least:
* The hotel name, 
* The url to its booking.com page, 
* Its coordinates: latitude and longitude,
* The score given by the website users,
* The text description of the hotel.
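
The fields listed above can be gathered in a small record type before being written to JSON. This is only a sketch using the standard library: the `Hotel` class, the `parse_score` helper and the sample values (including the URL) are hypothetical and not taken from booking.com.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Hotel:
    name: str
    url: str
    latitude: float
    longitude: float
    score: Optional[float]  # some listings have no user score yet
    description: str


def parse_score(text):
    """Convert a score written with a decimal comma (e.g. "9,2") to a float."""
    text = text.strip()
    return float(text.replace(",", ".")) if text else None


# Hypothetical values, for illustration only
hotel = Hotel(
    name="Example Hotel",
    url="https://www.booking.com/hotel/fr/example.html",
    latitude=43.56,
    longitude=4.19,
    score=parse_score("9,2"),
    description="A made-up description.",
)

print(asdict(hotel))  # plain dict, ready for json.dumps
```

Keeping `score` optional makes the missing-score case explicit instead of relying on a `None` that appears only when a page element is absent.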

In [47]:
from bs4 import BeautifulSoup
import json

In [48]:
HEADERS = {
    "User-Agent" : "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:99.0) Gecko/20100101 Firefox/99.0"
}

In [50]:
def get_hotels_by_city(city):
    today = date.today()
    week = today + datetime.timedelta(days=7)
    start_day = today.strftime("%Y-%m-%d")
    end_day = week.strftime("%Y-%m-%d")
    URL=f"https://www.booking.com/searchresults.fr.html?ss={city.replace(' ', '+')}&lang=fr&src=city&checkin={start_day}&checkout={end_day}"
    print(URL)
    page = requests.get(URL, headers=HEADERS)
    soup = BeautifulSoup(page.content, 'html.parser')
    hotels= []
    items = soup.find_all("div", {"data-testid" : "property-card"})
    for item in items:
        score= item.find("div", {"class": "a3b8729ab1 d86cee9b25"})
        if score is not None:
            score = score.text.split("Avec une note de")[0]
            score = float(score.replace(",", "."))
        hotel = {
            "name": item.find_all("div", {"data-testid": "title"})[0].text.strip(),
            "url": item.find_all("a")[0]["href"],
            "score": score,
        }
        result_desc = requests.get(hotel["url"], headers=HEADERS)
        des_bs4 = BeautifulSoup(result_desc.content, 'html.parser')
        hotel["description"] = des_bs4.find("p", {"data-testid": "property-description"}).text.replace("\n", "")
        extrat_values = des_bs4.find_all("div", {"class": "a53cbfa6de ebbf62ced0"})
        hotel["extrat"] = list(map(lambda item: item.text, extrat_values))
        hotel["address"] = des_bs4.find("span", {"data-source": "top_link"}).text.replace("\n", "")
        coords = des_bs4.find("a",  {"id":"hotel_header"})["data-atlas-latlng"].split(",")
        hotel["latitude"] = float(coords[0])
        hotel["longitude"] = float(coords[1])
        hotels.append(hotel)
    return hotels

def process_extract_hotels_by_city(city):
    # Create the output folder if it does not exist
    os.makedirs("res", exist_ok=True)
    hotels = get_hotels_by_city(city)
    json_object = json.dumps(hotels, indent=2)
    with open(f"res/hotels_{city.replace(' ', '-')}.json", "w") as outfile:
        outfile.write(json_object)

def process_cities(cities):
    for city in cities:
        print(f"Processing to scrape {city}")
        process_extract_hotels_by_city(city)

In [51]:
process_cities(city_names)

Processing to scrape Mont Saint Michel
https://www.booking.com/searchresults.fr.html?ss=Mont+Saint+Michel&lang=fr&src=city&checkin=2024-03-29&checkout=2024-04-05
Processing to scrape St Malo
https://www.booking.com/searchresults.fr.html?ss=St+Malo&lang=fr&src=city&checkin=2024-03-29&checkout=2024-04-05
Processing to scrape Bayeux
https://www.booking.com/searchresults.fr.html?ss=Bayeux&lang=fr&src=city&checkin=2024-03-29&checkout=2024-04-05
Processing to scrape Le Havre
https://www.booking.com/searchresults.fr.html?ss=Le+Havre&lang=fr&src=city&checkin=2024-03-29&checkout=2024-04-05
Processing to scrape Rouen
https://www.booking.com/searchresults.fr.html?ss=Rouen&lang=fr&src=city&checkin=2024-03-29&checkout=2024-04-05
Processing to scrape Paris
https://www.booking.com/searchresults.fr.html?ss=Paris&lang=fr&src=city&checkin=2024-03-29&checkout=2024-04-05
Processing to scrape Amiens
https://www.booking.com/searchresults.fr.html?ss=Amiens&lang=fr&src=city&checkin=2024-03-29&checkout=2024-04
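
The search URLs printed above are built by string interpolation, which works for these city names but would not escape accents or other special characters. A hedged alternative is to let `urllib.parse.urlencode` build the query string; the function name `build_search_url` and the `nights` parameter are assumptions made for this sketch, while the query parameters are the ones already used in the notebook.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

SEARCH_URL = "https://www.booking.com/searchresults.fr.html"


def build_search_url(city, checkin, nights=7):
    """Build the search URL for a stay starting on `checkin` and lasting `nights` days."""
    checkout = checkin + timedelta(days=nights)
    query = {
        "ss": city,
        "lang": "fr",
        "src": "city",
        "checkin": checkin.isoformat(),
        "checkout": checkout.isoformat(),
    }
    # urlencode escapes spaces as '+' and percent-encodes non-ASCII characters
    return f"{SEARCH_URL}?{urlencode(query)}"


print(build_search_url("Mont Saint Michel", date(2024, 3, 29)))
```

Passing the check-in date as an argument, rather than calling `date.today()` inside the function, also makes the result reproducible.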

# Project: Planning my next holidays ☀️

Today, you'll clean and analyze the data that you collected about cities and hotels. In the end, you will be able to save it in an S3 bucket in your AWS account.

First, import all the libraries you need:

In [53]:
#!pip install boto3

In [56]:
import glob
import plotly.express as px
import boto3

In [57]:
print(boto3.__version__)

1.34.72


1. Read the `.csv` file that contains information about cities and weather

In [58]:
cities = pd.read_csv('res/cities.csv')
cities.head()

|   | city_id | name | latitude | longitude | main_weather | expected_rain | day_temperature | rank | inverted_rank |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 28 | Carcassonne | 43.213036 | 2.349107 | Clouds | 3.3434 | 17.5675 | 1.0 | 35.0 |
| 1 | 27 | Collioure | 42.52505 | 3.083155 | Clouds | 3.603 | 18.3725 | 2.0 | 34.0 |
| 2 | 8 | Strasbourg | 48.584614 | 7.750713 | Rain | 6.1416 | 13.91625 | 3.0 | 33.0 |
| 3 | 30 | Toulouse | 43.604462 | 1.444247 | Clouds | 10.65 | 16.25625 | 4.0 | 32.0 |
| 4 | 26 | Saintes Maries de la mer | 43.451592 | 4.42772 | Clouds | 11.49 | 14.70125 | 5.0 | 31.0 |


2. Read the `.json` files containing information about hotels and combine them into a single pandas DataFrame. You can keep as much information as you want, but make sure to include at least the following:

- name of the city
- id of the city (you will find it in the `.csv` file about cities)
- a unique identifier for each hotel (`hotel_id`)
- name of the hotel

👀 _If you store textual information, make sure that you clean it so that it is readable._
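
As one possible way to clean scraped text, the sketch below collapses every run of whitespace into a single space. The helper name `clean_text` and the sample string are hypothetical. Unlike removing `\n` outright, this approach avoids gluing together two words that were separated only by a line break.

```python
import re


def clean_text(text):
    """Collapse newlines, tabs and repeated spaces into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


raw = "  La Villa\nMazarin \t est  ici\n"  # made-up sample
print(clean_text(raw))
```

With pandas, the same function could be applied to a text column through `Series.map(clean_text)`, provided missing values are handled first.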

In [59]:
# Get all files in the `res` folder whose names start with `hotels_`
# and end with `.json`
hotel_files = glob.glob('res/hotels_*.json')

# Create a new DataFrame
hotels = pd.DataFrame(columns = ['city_id', 'city_name', 'hotel_id', 'name', 'url', 'latitude', 'longitude', 'score', 'description'])
print(hotel_files)
# Iterate over all JSON files
for f in hotel_files:
    city_name = f.split('_')[1].split('.')[0].replace("-"," ")
    city_id = cities.loc[cities['name'] == city_name,'city_id'].values[0]
    
    print("Processing {}".format(city_name))
    
    # Read json files and add hotel_id, city_id and city_name into DataFrames
    temp_dataset = pd.read_json(f)
    if temp_dataset.empty:
        print(f"file {f} is empty")
        continue
    temp_dataset = temp_dataset.reset_index().rename({'index': 'hotel_id'}, axis = 1)
    temp_dataset.loc[:,'city_id'] = city_id
    temp_dataset.loc[:,'city_name'] = city_name
    
    # Clean text fields
    temp_dataset.loc[:, 'name'] = temp_dataset['name'].str.replace('\n', '')
    temp_dataset.loc[:, 'url'] = temp_dataset['url'].str.replace('\n', '')
    temp_dataset.loc[:, 'description'] = temp_dataset['description'].str.replace('\n', '')
    
    # Append to hotels dataframe
    hotels = pd.concat([hotels, temp_dataset])

['res\\hotels_Aigues-Mortes.json', 'res\\hotels_Aix-en-Provence.json', 'res\\hotels_Amiens.json', 'res\\hotels_Annecy.json', 'res\\hotels_Avignon.json', 'res\\hotels_Bayeux.json', 'res\\hotels_Bayonne.json', 'res\\hotels_Besancon.json', 'res\\hotels_Biarritz.json', 'res\\hotels_Bormes-les-Mimosas.json', 'res\\hotels_Carcassonne.json', 'res\\hotels_Cassis.json', 'res\\hotels_Chateau-du-Haut-Koenigsbourg.json', 'res\\hotels_Collioure.json', 'res\\hotels_Colmar.json', 'res\\hotels_Dijon.json', 'res\\hotels_Eguisheim.json', 'res\\hotels_Foix.json', 'res\\hotels_Gorges-du-Verdon.json', 'res\\hotels_Grenoble.json', 'res\\hotels_La-Rochelle.json', 'res\\hotels_Le-Havre.json', 'res\\hotels_Lille.json', 'res\\hotels_Lyon.json', 'res\\hotels_Marseille.json', 'res\\hotels_Mont-Saint-Michel.json', 'res\\hotels_Montauban.json', 'res\\hotels_Nimes.json', 'res\\hotels_Paris.json', 'res\\hotels_Rouen.json', 'res\\hotels_Saintes-Maries-de-la-mer.json', 'res\\hotels_St-Malo.json', 'res\\hotels_Strasbour
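
The city name is recovered from each file name by splitting on `_` and `.`, which would break if the folder name itself contained an underscore. A slightly more defensive sketch, assuming the same `hotels_<City-Name>.json` naming scheme, is shown below; `city_from_path` is a hypothetical helper. `PureWindowsPath` is used because it accepts both `\` and `/` as separators on any operating system.

```python
from pathlib import PureWindowsPath

PREFIX = "hotels_"


def city_from_path(path):
    """Return the city name encoded in a path such as 'res/hotels_St-Malo.json'."""
    stem = PureWindowsPath(path).stem  # file name without folder or extension
    if not stem.startswith(PREFIX):
        raise ValueError(f"unexpected file name: {path}")
    # Spaces were replaced by hyphens when the file was written
    return stem[len(PREFIX):].replace("-", " ")


print(city_from_path(r"res\hotels_Aigues-Mortes.json"))
print(city_from_path("res/hotels_St-Malo.json"))
```

A city whose real name contains a hyphen would not round-trip with this scheme; none of the names in `city_names` does.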

In [60]:
# Convert columns to convenient types
# (astype on the whole DataFrame: assigning through .loc[:, col] can keep the old dtype)
hotels = hotels.astype({
    'city_id': 'int',
    'hotel_id': 'int',
    'latitude': 'float',
    'longitude': 'float',
    'score': 'float',
})

In [61]:
# Sanity check
hotels.head()

|   | city_id | city_name | hotel_id | name | url | latitude | longitude | score | description | extrat | address |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | Aigues Mortes | 0 | La Villa Mazarin | https://www.booking.com/hotel/fr/la-villa-maza... | 43.564987 | 4.191752 | 9.2 | La Villa Mazarin est construite dans un bâtime... | [] | 35 boulevard Gambetta, 30220 Aigues-Mortes, Fr... |
| 1 | 25 | Aigues Mortes | 1 | Boutique Hôtel des Remparts & Spa | https://www.booking.com/hotel/fr/les-remparts-... | 43.568036 | 4.190344 | 9.5 | Aménagé dans une ancienne base militaire datan... | [] | 6, Place Anatole France, 30220 Aigues-Mortes, ... |
| 2 | 25 | Aigues Mortes | 2 | L’olivier De Manel | https://www.booking.com/hotel/fr/olivier-de-ma... | 43.57367 | 4.183167 | 8.8 | Situé à 23 km de la salle omnisports Montpelli... | [] | 55 rue Antoine bedaride, 30220 Aigues-Mortes, ... |
| 3 | 25 | Aigues Mortes | 3 | Hôtel La Plage 5 étoiles La Grande Motte | https://www.booking.com/hotel/fr/hoteldelaplag... | 43.55605 | 4.098812 | 8.4 | Situé sur le front de mer, l'Hôtel La Plage 5 ... | [] | Allée Du Levant, 34280 La Grande Motte, France |
| 4 | 25 | Aigues Mortes | 4 | FLA112 | https://www.booking.com/hotel/fr/fla112.fr.htm... | 43.522869 | 4.136153 |  | FLA112 is located in Le Grau-du-Roi, 700 metre... | [Appartement entier, 34 m² superficie, Terrass... | 2 avenue Jean Lasserre, 30240 Le Grau-du-Roi, ... |


In [62]:
# Save hotels DataFrame into .csv file
hotels.to_csv('res/hotels.csv', index=False)

In [63]:
hotels_with_score = hotels.loc[hotels['score'].notnull(),:]

fig = px.scatter_mapbox(hotels_with_score, lat="latitude", lon="longitude", hover_name = 'name', zoom = 4,
                        hover_data = ['description'],
                        color = 'score', color_continuous_scale = 'thermal',
                        mapbox_style="carto-positron")
fig.show()

In [76]:
YOUR_ACCESS_KEY_ID = os.getenv("YOUR_ACCESS_KEY_ID")
YOUR_SECRET_ACCESS_KEY = os.getenv("YOUR_SECRET_ACCESS_KEY")
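
When checking that credentials were picked up from the environment, it is safer to display a masked form than the raw value, since notebook outputs are often saved and shared. The helper below is a hypothetical sketch (the name `mask_secret` and the `visible` parameter are not part of the notebook or of `boto3`).

```python
import os


def mask_secret(value, visible=4):
    """Return the value with all but its last `visible` characters replaced by '*'."""
    if not value:
        return "<not set>"
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Shows "<not set>" when the variable is missing, otherwise only the last characters
print(mask_secret(os.getenv("YOUR_ACCESS_KEY_ID")))
```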

5. Use `boto3` to save the DataFrames about cities and hotels into `.csv` files located in a new S3 bucket in your AWS account:

In [77]:
# Check that the credentials were loaded, without printing their values
assert YOUR_ACCESS_KEY_ID and YOUR_SECRET_ACCESS_KEY, "AWS credentials are not set"


In [79]:
BUCKET_NAME = "my-project-kayak-dustin"

configuration={
        'LocationConstraint': 'eu-central-1'
}

# Initiate a new session
session = boto3.Session(aws_access_key_id=YOUR_ACCESS_KEY_ID, 
                        aws_secret_access_key=YOUR_SECRET_ACCESS_KEY)

# Declare s3 object and create a new bucket
s3 = session.resource("s3")

bucket = s3.Bucket(BUCKET_NAME)
if bucket.creation_date:
    print(f"Bucket already created {bucket.creation_date}")
else:
    print("Creating the bucket")
    bucket = s3.create_bucket(Bucket=BUCKET_NAME, CreateBucketConfiguration=configuration)

# Export hotels to CSV file and upload it
hotels_csv = hotels.to_csv()
put_object = bucket.put_object(Key = "hotels.csv", Body = hotels_csv)
print("Finish to upload the file hotels.csv")

# Do the same for cities
cities_csv = cities.to_csv()
put_object = bucket.put_object(Key = "cities.csv", Body = cities_csv)
print("Finish to upload the file cities.csv")

Creating the bucket
Finish to upload the file hotels.csv
Finish to upload the file cities.csv
