# Delivery of Weather History using Web Crawl in Python

This notebook retrieves the hourly weather history for the day two days before the current date, for every listed district in two Vietnamese cities: Hanoi and Ho Chi Minh City. The data comes from the Visual Crossing Weather API: https://www.visualcrossing.com/.

The libraries below are what the notebook needs to get data from the API. requests handles the HTTP GET requests, datetime and timedelta compute the target date, os reads the API key from an environment variable, and pandas is used only to arrange and display the results as a DataFrame.

In [1]:
import requests
from datetime import datetime, timedelta
import pandas as pd
import os

The website allows creating a free account, which is sufficient for the scope of this project. Each account comes with a unique API key for accessing the data. To keep the key private, it is read from the WEATHER_API environment variable instead of being written into the notebook.

cities is a dictionary that maps each city to a list of its districts. Because the addresses on Visual Crossing Weather don't follow a consistent pattern, the districts were listed manually.

In [3]:
API_KEY = os.environ["WEATHER_API"]

# Define the cities and their districts
cities = {
    "Ho Chi Minh City": ["Quan 1", "Quan 2", "Quan 3", "Quan 4", "Quan 5", "Quan 6", "Quan 7", "Quan 8", "Quan 9", "Quan 10", "Quan 11", "Quan 12", "Quan Thu Duc", "Quan Tan Binh", "Quan Binh Tan", "Quan Binh Thanh", "Quan Tan Phu", "Quan Go Vap", "Quan Phu Nhuan", "Huyen Binh Chanh", "Huyen Hoc Mon", "Huyen Can Gio", "Huyen Cu Chi", "Huyen Nha Be"],
    "Hanoi": ["Quan Hoan Kiem", "Quan Dong Da", "Quan Ba Dinh", "Quan Hai Ba Trung", "Quan Hoang Mai", "Quan Thanh Xuan", "Quan Long Bien", "Quan Nam Tu Liem", "Quan Bac Tu Liem", "Quan Tay Ho", "Quan Cau Giay", "Quan Ha Dong", "Thi xa Son Tay", "Huyen Ba Vi", "Huyen Chuong My", "Huyen Phuc Tho", "Huyen Dan Phuong", "Huyen Dong Anh", "Huyen Gia Lam", "Huyen Hoai Duc", "Huyen Me Linh", "Huyen My Duc", "Huyen Phu Xuyen", "Huyen Quoc Oai", "Huyen Soc Son", "Huyen Thach That", "Huyen Thanh Oai", "Huyen Thuong Tin", "Huyen Ung Hoa", "Huyen Thanh Tri"]
}

# Calculate the date two days ago
two_days_ago = (datetime.now() - timedelta(days=2)).strftime("%Y-%m-%d")
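
datetime.now() uses the clock and time zone of whichever machine runs the notebook, so on a server set to UTC the computed date can differ by one day from the calendar date in Vietnam. A minimal sketch of a time-zone-aware variant is shown here. The helper name date_n_days_ago and the fixed UTC+7 offset are assumptions added for illustration and are not part of the original notebook.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Vietnam is modeled as a fixed UTC+7 offset with no daylight saving time
VIETNAM_TZ = timezone(timedelta(hours=7))

def date_n_days_ago(days, now=None):
    """Return the Vietnam-local date `days` days before `now` as YYYY-MM-DD.

    `now` should be a timezone-aware datetime; it defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Convert to Vietnam local time before subtracting whole days
    local_now = now.astimezone(VIETNAM_TZ)
    return (local_now - timedelta(days=days)).strftime("%Y-%m-%d")

two_days_ago = date_n_days_ago(2)
```

Passing an explicit `now` makes the helper easy to check against a known timestamp.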

get_weather_data is a function that gets the weather history for one district of one city. The weather elements are listed in the "elements" parameter so that only the important features are returned. The unit group is set to "metric" so that the values come back in metric units.

Sending one request per district can trigger error 429 (too many requests) or resemble DDoS traffic. The Multiple Location Timeline Weather API could be used to avoid this. However, due to time constraints, this option wasn't implemented or properly tested.
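
A different and simpler mitigation for error 429 is to retry with an increasing delay. The sketch below is not the Multiple Location Timeline Weather API and is not what the notebook does; the helper name fetch_with_backoff, the retry count, and the delays are all assumptions chosen for illustration. It accepts any zero-argument callable that returns an object with a status_code attribute, such as a requests response.

```python
import time

def fetch_with_backoff(fetch, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-429 response or the retries run out.

    The wait doubles after every 429: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    for attempt in range(max_retries + 1):
        response = fetch()
        if response.status_code != 429:
            return response
        # Only wait if another attempt is still allowed
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)
    # Every attempt was rate limited; hand the last 429 response back to the caller
    return response
```

With requests, a call could look like `fetch_with_backoff(lambda: requests.get(url, params=params, timeout=30))`.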

In [4]:
def get_weather_data(city, district):
    """Return the hourly weather JSON for one district two days ago, or None on error."""
    base_url = "https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline"
    
    # Construct the full address
    address = f"{district}, {city}, Vietnam"
    
    params = {
        "unitGroup": "metric",
        "key": API_KEY,
        "contentType": "json",
        "include": "hours",
        "elements": "datetime,temp,humidity,precip,windspeed,conditions,icon",
    }
    
    # The address and the date are part of the URL path; requests percent-encodes the spaces
    url = f"{base_url}/{address}/{two_days_ago}"
    # A timeout keeps one stalled request from blocking the whole collection loop
    response = requests.get(url, params=params, timeout=30)
    
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Error fetching data for {address}: {response.status_code}")
        return None
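
When a district returns an error, it can help to inspect the request URL without sending anything. The sketch below rebuilds the URL from get_weather_data using only the standard library. The helper name build_timeline_url and the placeholder key are assumptions for illustration, and the result is equivalent to the URL requests sends but not necessarily character-for-character identical, because the two libraries escape some characters differently.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://weather.visualcrossing.com/VisualCrossingWebServices/rest/services/timeline"

def build_timeline_url(city, district, date, api_key):
    """Build the timeline URL for one district and one date (YYYY-MM-DD)."""
    # The address goes into the URL path, so spaces and commas are percent-encoded
    address = f"{district}, {city}, Vietnam"
    params = {
        "unitGroup": "metric",
        "key": api_key,
        "contentType": "json",
        "include": "hours",
        "elements": "datetime,temp,humidity,precip,windspeed,conditions,icon",
    }
    return f"{BASE_URL}/{quote(address)}/{date}?{urlencode(params)}"

# "YOUR_KEY" is a placeholder, not a real API key
print(build_timeline_url("Hanoi", "Quan Tay Ho", "2024-08-17", "YOUR_KEY"))
```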

Each hourly record is then flattened into one row per district and hour, and the rows are loaded into a pandas DataFrame for easy viewing.
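
The flattening step can be tried offline on a small hand-written payload. In the sketch below, the helper name flatten_hours is an assumption for illustration, and the sample dictionary contains only the fields the notebook reads, with values copied from the output shown further down; a real API response may contain more fields.

```python
def flatten_hours(city, district, weather_data):
    """Turn one API response into a list of rows, one per hour."""
    day = weather_data["days"][0]  # a single date was requested, so one day is expected
    rows = []
    for hour in day["hours"]:
        rows.append({
            "City": city,
            "District": district,
            "Date": day["datetime"],
            "Time": hour["datetime"],
            "Temperature": hour["temp"],
            "Humidity": hour["humidity"],
            "Precipitation": hour["precip"],
            "WindSpeed": hour["windspeed"],
            "Conditions": hour["conditions"],
            "Icon": hour["icon"],
            "Exact Location": weather_data["resolvedAddress"],
        })
    return rows

# Hand-written sample with only the fields used above (not a full API response)
sample = {
    "resolvedAddress": "Quận 1, Hồ Chí Minh, Việt Nam",
    "days": [{
        "datetime": "2024-08-17",
        "hours": [
            {"datetime": "00:00:00", "temp": 29.0, "humidity": 74.46, "precip": 0.0,
             "windspeed": 7.6, "conditions": "Partially cloudy", "icon": "partly-cloudy-night"},
            {"datetime": "01:00:00", "temp": 27.0, "humidity": 88.83, "precip": 0.0,
             "windspeed": 5.4, "conditions": "Partially cloudy", "icon": "partly-cloudy-night"},
        ],
    }],
}

rows = flatten_hours("Ho Chi Minh City", "Quan 1", sample)
print(len(rows), rows[0]["Time"], rows[0]["Temperature"])
```

The returned list has the same shape as all_data, so it can be passed directly to pd.DataFrame.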

In [5]:
# Collect data for all cities and districts
all_data = []

for city, districts in cities.items():
    for district in districts:
        weather_data = get_weather_data(city, district)
        if weather_data:
            for hour_data in weather_data['days'][0]['hours']:
                all_data.append({
                    'City': city,
                    'District': district,
                    'Date': weather_data['days'][0]['datetime'],
                    'Time': hour_data['datetime'],
                    'Temperature': hour_data['temp'],
                    'Humidity': hour_data['humidity'],
                    'Precipitation': hour_data['precip'],
                    'WindSpeed': hour_data['windspeed'],
                    'Conditions': hour_data['conditions'],
                    'Icon': hour_data['icon'],
                    'Exact Location': weather_data['resolvedAddress']
                })

# Create pandas DataFrame
df = pd.DataFrame(all_data)

# Display the first few rows of the DataFrame
print(df.head())

               City District        Date      Time  Temperature  Humidity  \
0  Ho Chi Minh City   Quan 1  2024-08-17  00:00:00         29.0     74.46   
1  Ho Chi Minh City   Quan 1  2024-08-17  01:00:00         27.0     88.83   
2  Ho Chi Minh City   Quan 1  2024-08-17  02:00:00         27.0     88.83   
3  Ho Chi Minh City   Quan 1  2024-08-17  03:00:00         27.0     88.83   
4  Ho Chi Minh City   Quan 1  2024-08-17  04:00:00         28.0     78.91   

   Precipitation  WindSpeed        Conditions                 Icon  \
0            0.0        7.6  Partially cloudy  partly-cloudy-night   
1            0.0        5.4  Partially cloudy  partly-cloudy-night   
2            0.0        5.4          Overcast               cloudy   
3            0.0        7.6  Partially cloudy  partly-cloudy-night   
4            0.0        3.6  Partially cloudy  partly-cloudy-night   

                  Exact Location  
0  Quận 1, Hồ Chí Minh, Việt Nam  
1  Quận 1, Hồ Chí Minh, Việt Nam  
2  Quận 1, 

In [6]:
# Save to CSV (optional)
df.to_csv('weather_data.csv', index=False)
print("Data collection complete. Results saved to weather_data.csv")

Data collection complete. Results saved to weather_data.csv
