# Data Exploration with the REST Countries and OpenWeather APIs

## Project Objective:

The objective of this project is to collect current weather data for the capital of every country in the world and rank the capitals by weather factors: the hottest city, the coldest city, and so on.

To achieve this, we retrieve the list of capitals from the REST Countries API and the weather data for each capital from the OpenWeather API.

### Install dependencies:

In [1]:
! pip install requests pandas matplotlib seaborn



In [2]:
import requests
import json
import os
import time
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


### Retrieving capitals with the REST Countries API

In [3]:
rest_countries_url = "https://restcountries.com/v3.1/all?fields=name,capital"

In [4]:
def get_capitals():
    """
    Queries the REST Countries API to retrieve the list of capitals of every country in the world.
    Excludes countries without a capital.
    """
    try:
        response = requests.get(rest_countries_url, timeout=10)
        response.raise_for_status()
        countries = response.json()

        # Extract capitals
        capitals = []
        for country in countries:
            name = country.get('name', {}).get('common', 'Unknown Country')
            capital_list = country.get('capital', [])
            if capital_list:  # Check that the list of capitals is not empty
                capitals.append({"country": name, "city": capital_list[0]})

        print("\nTotal number of captured capitals:", len(capitals))
        return capitals

    except requests.exceptions.RequestException as e:
        print(f"Error when querying the REST Countries API: {e}")
        return []
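
The filtering step can be checked without a network call. The sketch below applies the same extraction logic to a small hand-written payload; `extract_capitals` and the sample records are illustrative assumptions (the records are only assumed to mirror the shape of the `?fields=name,capital` response), not part of the notebook.

```python
def extract_capitals(countries):
    """Return one {"country", "city"} record per country that lists a capital."""
    capitals = []
    for country in countries:
        name = country.get("name", {}).get("common", "Unknown Country")
        capital_list = country.get("capital", [])
        if capital_list:  # skip entries with an empty capital list
            capitals.append({"country": name, "city": capital_list[0]})
    return capitals


# Hand-written sample records (assumed response shape, not real API output)
sample_countries = [
    {"name": {"common": "Switzerland"}, "capital": ["Bern"]},
    {"name": {"common": "South Africa"}, "capital": ["Pretoria", "Bloemfontein", "Cape Town"]},
    {"name": {"common": "Antarctica"}, "capital": []},
]

print(extract_capitals(sample_countries))
```

Only the first entry of the capital list is kept, so a country with several capitals contributes a single city, and a record with no capital is dropped.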


In [5]:
get_capitals()


Total number of captured capitals: 246


[{'country': 'South Georgia', 'city': 'King Edward Point'},
 {'country': 'Grenada', 'city': "St. George's"},
 {'country': 'Switzerland', 'city': 'Bern'},
 {'country': 'Sierra Leone', 'city': 'Freetown'},
 {'country': 'Hungary', 'city': 'Budapest'},
 {'country': 'Taiwan', 'city': 'Taipei'},
 {'country': 'Wallis and Futuna', 'city': 'Mata-Utu'},
 {'country': 'Barbados', 'city': 'Bridgetown'},
 {'country': 'Pitcairn Islands', 'city': 'Adamstown'},
 {'country': 'Ivory Coast', 'city': 'Yamoussoukro'},
 {'country': 'Tunisia', 'city': 'Tunis'},
 {'country': 'Italy', 'city': 'Rome'},
 {'country': 'Benin', 'city': 'Porto-Novo'},
 {'country': 'Indonesia', 'city': 'Jakarta'},
 {'country': 'Cape Verde', 'city': 'Praia'},
 {'country': 'Saint Kitts and Nevis', 'city': 'Basseterre'},
 {'country': 'Laos', 'city': 'Vientiane'},
 {'country': 'Caribbean Netherlands', 'city': 'Kralendijk'},
 {'country': 'Uganda', 'city': 'Kampala'},
 {'country': 'Andorra', 'city': 'Andorra la Vella'},
 {'country': 'Burund

### Data retrieval:

In [6]:
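# OpenWeather API key, read from an environment variable rather than hardcoded in the notebook.
# The variable name OPENWEATHER_API_KEY is an assumption; export it before running this cell.
api_key = os.environ["OPENWEATHER_API_KEY"]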

# OpenWeather API URL
base_url = "https://api.openweathermap.org/data/2.5/weather"


In [7]:
# Function to call the API and retrieve weather data
def fetch_weather_data(api_key, city):
    try:
        params = {"appid": api_key, "lang": "en", "q": city}
        response = requests.get(base_url, params=params, timeout=10)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error calling the API for {city}: {e}")
        return None
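
Because the API key travels as the `appid` query parameter, the exception text raised by `requests` contains the full URL, key included. A minimal sketch of masking it before printing, assuming a hypothetical helper `redact_api_key` and a made-up key:

```python
import re


def redact_api_key(message):
    """Replace the value of the appid query parameter with a placeholder."""
    return re.sub(r"(appid=)[^&\s]+", r"\1<redacted>", message)


# Made-up key and city, used only for this illustration
error_text = (
    "404 Client Error: Not Found for url: "
    "https://api.openweathermap.org/data/2.5/weather"
    "?appid=0123456789abcdef&lang=en&q=Nowhere"
)
print(redact_api_key(error_text))
```

The helper could wrap the message in the `except` branch, for example `print(redact_api_key(str(e)))`, so that shared notebook output does not expose the key.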

In [8]:
# Retrieving data for all capitals
def fetch_weather_for_all_capitals(api_key, capitals):
    weather_data = {}
    for capital in capitals:
        city = capital["city"]
        country = capital["country"]
        print(f"Retrieving data for {city}, {country}...")
        
        data = fetch_weather_data(api_key, city)
        if data:
            weather_data[city] = data
        else:
            weather_data[city] = {"error": "Data not available"}
        
        # Pause to avoid exceeding API rate limits
        time.sleep(1)
    return weather_data
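
Since the dictionary is keyed by city name alone, two entries whose capitals share a name (Kingston is the capital of both Jamaica and Norfolk Island) would overwrite each other. One possible variation, sketched here with a stand-in fetch function so it runs offline, keys the results by a `(country, city)` tuple instead; `index_by_country_and_city` and `fake_fetch` are hypothetical names.

```python
def index_by_country_and_city(capitals, fetch):
    """Store each result under a (country, city) key so same-named capitals stay separate."""
    weather_data = {}
    for capital in capitals:
        key = (capital["country"], capital["city"])
        data = fetch(capital["city"])
        weather_data[key] = data if data else {"error": "Data not available"}
    return weather_data


def fake_fetch(city):
    """Stand-in for the real API call; returns a minimal payload."""
    return {"name": city, "cod": 200}


sample_capitals = [
    {"country": "Jamaica", "city": "Kingston"},
    {"country": "Norfolk Island", "city": "Kingston"},
]
result = index_by_country_and_city(sample_capitals, fake_fetch)
print(len(result))  # one entry per country
```

This only keeps the records apart. A query by city name alone would still resolve both entries to whichever "Kingston" the API picks, so the lookup itself would also need a country hint to be fully correct.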

In [12]:
capitals = get_capitals()
all_weather_data = fetch_weather_for_all_capitals(api_key, capitals)



Total number of captured capitals: 246
Retrieving data for King Edward Point, South Georgia...
Error calling the API for King Edward Point: 404 Client Error: Not Found for url: https://api.openweathermap.org/data/2.5/weather?appid=b022acb509eacae0875ded1afe41a527&lang=en&q=King+Edward+Point
Retrieving data for St. George's, Grenada...
Retrieving data for Bern, Switzerland...
Retrieving data for Freetown, Sierra Leone...
Retrieving data for Budapest, Hungary...
Retrieving data for Taipei, Taiwan...
Retrieving data for Mata-Utu, Wallis and Futuna...
Retrieving data for Bridgetown, Barbados...
Retrieving data for Adamstown, Pitcairn Islands...
Retrieving data for Yamoussoukro, Ivory Coast...
Retrieving data for Tunis, Tunisia...
Retrieving data for Rome, Italy...
Retrieving data for Porto-Novo, Benin...
Retrieving data for Jakarta, Indonesia...
Retrieving data for Praia, Cape Verde...
Retrieving data for Basseterre, Saint Kitts and Nevis...
Retrieving data for Vientiane, Laos...
Retrievi

In [13]:
# Call the function for Paris
city = "Paris"
print(f"Retrieving weather data for {city}...")
paris_weather_data = fetch_weather_data(api_key, city)

# Display all JSON data for Paris
if paris_weather_data:
    print(f"Complete weather data for {city}:")
    print(json.dumps(paris_weather_data, indent=4, ensure_ascii=False))
else:
    print(f"Weather data for {city} not available.")

Retrieving weather data for Paris...
Complete weather data for Paris:
{
    "coord": {
        "lon": 2.3488,
        "lat": 48.8534
    },
    "weather": [
        {
            "id": 800,
            "main": "Clear",
            "description": "clear sky",
            "icon": "01d"
        }
    ],
    "base": "stations",
    "main": {
        "temp": 291.46,
        "feels_like": 290.5,
        "temp_min": 290.58,
        "temp_max": 292.18,
        "pressure": 1006,
        "humidity": 44,
        "sea_level": 1006,
        "grnd_level": 996
    },
    "visibility": 10000,
    "wind": {
        "speed": 6.69,
        "deg": 120
    },
    "clouds": {
        "all": 0
    },
    "dt": 1741451119,
    "sys": {
        "type": 2,
        "id": 2012208,
        "country": "FR",
        "sunrise": 1741414714,
        "sunset": 1741455866
    },
    "timezone": 3600,
    "id": 2988507,
    "name": "Paris",
    "cod": 200
}


### Units of JSON Fields from the OpenWeather API

The fields and their associated units retrieved from the OpenWeather API are listed below:

#### Geographic Coordinates
- **`coord.lon`**: degrees (longitude)  
- **`coord.lat`**: degrees (latitude)

#### Weather Conditions
- **`weather.id`**: no unit (weather condition identifier)  
- **`weather.main`**: no unit (group of weather parameters)  
- **`weather.description`**: no unit (description of the weather condition)  
- **`weather.icon`**: no unit (weather icon identifier)

#### Main Data
- **`main.temp`**: Kelvin (temperature)  
- **`main.feels_like`**: Kelvin (feels like temperature)  
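- **`main.temp_min`**: Kelvin (minimum temperature currently observed, which can differ from `temp` in large urban areas)  
- **`main.temp_max`**: Kelvin (maximum temperature currently observed, which can differ from `temp` in large urban areas)  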
- **`main.pressure`**: hPa (atmospheric pressure at sea level)  
- **`main.humidity`**: % (humidity)  
- **`main.sea_level`**: hPa (atmospheric pressure at sea level)  
- **`main.grnd_level`**: hPa (atmospheric pressure at ground level)

#### Visibility and Wind
- **`visibility`**: meters (visibility)  
- **`wind.speed`**: meters/sec (wind speed)  
- **`wind.deg`**: degrees (wind direction)

#### Cloudiness
- **`clouds.all`**: % (cloudiness)

#### Date and Time
- **`dt`**: UNIX timestamp (date and time when data was calculated)

#### System Information
- **`sys.type`**: no unit (internal parameter)  
- **`sys.id`**: no unit (internal parameter)  
- **`sys.country`**: no unit (country code)  
- **`sys.sunrise`**: UNIX timestamp (sunrise time)  
- **`sys.sunset`**: UNIX timestamp (sunset time)

#### Time Zone
- **`timezone`**: seconds (time offset from UTC)

#### City Information
- **`id`**: no unit (city identifier)  
- **`name`**: no unit (city name)

#### Response Code
- **`cod`**: no unit (HTTP response code)
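
As a quick illustration of the units listed above, the sketch below converts two values from the Paris response into more familiar units. The helper names are assumptions made for this example.

```python
KELVIN_OFFSET = 273.15


def kelvin_to_celsius(kelvin):
    """Convert a temperature from Kelvin to degrees Celsius, rounded to 2 decimals."""
    return round(kelvin - KELVIN_OFFSET, 2)


def ms_to_kmh(speed_ms):
    """Convert a wind speed from meters/sec to km/h, rounded to 2 decimals."""
    return round(speed_ms * 3.6, 2)


# Values copied from the Paris response shown earlier
paris = {"main": {"temp": 291.46}, "wind": {"speed": 6.69}}

print(kelvin_to_celsius(paris["main"]["temp"]))  # 18.31
print(ms_to_kmh(paris["wind"]["speed"]))         # 24.08
```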

---

### Note

Default units can be changed via the **`units`** parameter in the API call:  
- Temperatures in Celsius, wind speed in meters/sec: **`units=metric`**  
- Temperatures in Fahrenheit, wind speed in miles per hour: **`units=imperial`**
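
A minimal sketch of how the `units` parameter fits into the request, using only the standard library to build the URL so that nothing is sent over the network. `build_weather_url` is a hypothetical helper and `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openweathermap.org/data/2.5/weather"
ALLOWED_UNITS = ("standard", "metric", "imperial")


def build_weather_url(city, api_key, units=None):
    """Build the request URL; omitting units leaves the API default (Kelvin)."""
    params = {"q": city, "appid": api_key, "lang": "en"}
    if units is not None:
        if units not in ALLOWED_UNITS:
            raise ValueError(f"units must be one of {ALLOWED_UNITS}")
        params["units"] = units
    return f"{BASE_URL}?{urlencode(params)}"


print(build_weather_url("Paris", "YOUR_API_KEY", units="metric"))
```

Requesting `units=metric` at the source would make a later Kelvin-to-Celsius conversion unnecessary; the notebook keeps the default and converts afterwards.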

### Converting semi-structured data (JSON) to structured data (DataFrame)

In [14]:
def convert_weather_data_to_dataframe(weather_data, capitals):
    # Create an empty list to store the structured data
    structured_data = []

    # Iterate over the weather data for each city
    for capital in capitals:
        city = capital["city"]
        country = capital["country"]
        data = weather_data.get(city, {})

        if "error" in data:
            print(f"Data not available for {city}, {country}.")
            continue

        # Extract the necessary information while handling missing keys
        structured_data.append({
            "country": country,
            "city": city,
            "id": data.get("id"),
            "lon": data.get("coord", {}).get("lon"),
            "lat": data.get("coord", {}).get("lat"),
            "base": data.get("base"),
            "main": data.get("weather", [{}])[0].get("main"),
            "description": data.get("weather", [{}])[0].get("description"),
            "temp": data.get("main", {}).get("temp"),
            "feels_like": data.get("main", {}).get("feels_like"),
            "temp_min": data.get("main", {}).get("temp_min"),
            "tem_max": data.get("main", {}).get("temp_max"),
            "pressure": data.get("main", {}).get("pressure"),
            "humidity": data.get("main", {}).get("humidity"),
            "sea_level": data.get("main", {}).get("sea_level"),
            "grnd_level": data.get("main", {}).get("grnd_level"),
            "visibility": data.get("visibility"),
            "speed": data.get("wind", {}).get("speed"),
            "deg": data.get("wind", {}).get("deg"),
            "clouds": data.get("clouds", {}).get("all"),
            "dt": data.get("dt"),
            "sunrise": data.get("sys", {}).get("sunrise"),
            "sunset": data.get("sys", {}).get("sunset"),
            "timezone": data.get("timezone"),
            "cod": data.get("cod")
        })

    # Convert the structured data into a DataFrame
    df = pd.DataFrame(structured_data)
    return df
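
The repeated `data.get(..., {}).get(...)` chains can be condensed with a small path-walking helper. The sketch below is one way to do it; `dig` is a hypothetical name, and the sample dictionary is a trimmed copy of the Paris response.

```python
def dig(data, *keys, default=None):
    """Follow a path of dict keys and list indexes; return default if any step is missing."""
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


sample = {
    "coord": {"lon": 2.3488, "lat": 48.8534},
    "weather": [{"main": "Clear", "description": "clear sky"}],
    "main": {"temp": 291.46},
}

print(dig(sample, "weather", 0, "main"))  # Clear
print(dig(sample, "sys", "sunrise"))      # None
```

Unlike `data.get("weather", [{}])[0]`, this also returns the default when the `weather` list is present but empty. `pandas.json_normalize` is another option for flattening nested records.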

In [15]:
# Convert the collected data into a DataFrame
df_weather = convert_weather_data_to_dataframe(all_weather_data, capitals)


Data not available for King Edward Point, South Georgia.
Data not available for St. Peter Port, Guernsey.
Data not available for Fakaofo, Tokelau.
Data not available for Papeetē, French Polynesia.
Data not available for Ngerulmud, Palau.
Data not available for Diego Garcia, British Indian Ocean Territory.


In [16]:
df_weather.head()

Unnamed: 0,country,city,id,lon,lat,base,main,description,temp,feels_like,...,grnd_level,visibility,speed,deg,clouds,dt,sunrise,sunset,timezone,cod
0,Grenada,St. George's,3579925,-61.7485,12.0564,stations,Clouds,few clouds,301.97,306.21,...,1010,10000,6.69,130,20,1741451051,1741429110,1741472231,-14400,200
1,Switzerland,Bern,2661552,7.4474,46.9481,stations,Clear,clear sky,288.79,287.35,...,938,10000,3.09,320,0,1741450891,1741413417,1741454716,3600,200
2,Sierra Leone,Freetown,2409306,-13.2299,8.484,stations,Clouds,scattered clouds,300.99,304.02,...,1002,10000,6.17,250,40,1741451053,1741417399,1741460658,0,200
3,Hungary,Budapest,3054643,19.0399,47.498,stations,Clear,clear sky,290.77,289.37,...,994,10000,1.54,90,0,1741450802,1741410660,1741451910,3600,200
4,Taiwan,Taipei,1668341,121.5319,25.0478,stations,Clouds,broken clouds,289.32,288.72,...,1003,10000,5.66,100,75,1741451008,1741471768,1741514384,28800,200


In [17]:
df_weather.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 25 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   country      240 non-null    object 
 1   city         240 non-null    object 
 2   id           240 non-null    int64  
 3   lon          240 non-null    float64
 4   lat          240 non-null    float64
 5   base         240 non-null    object 
 6   main         240 non-null    object 
 7   description  240 non-null    object 
 8   temp         240 non-null    float64
 9   feels_like   240 non-null    float64
 10  temp_min     240 non-null    float64
 11  tem_max      240 non-null    float64
 12  pressure     240 non-null    int64  
 13  humidity     240 non-null    int64  
 14  sea_level    240 non-null    int64  
 15  grnd_level   240 non-null    int64  
 16  visibility   240 non-null    int64  
 17  speed        240 non-null    float64
 18  deg          240 non-null    int64  
 19  clouds  

In [18]:
df_weather.describe()

Unnamed: 0,id,lon,lat,temp,feels_like,temp_min,tem_max,pressure,humidity,sea_level,grnd_level,visibility,speed,deg,clouds,dt,sunrise,sunset,timezone,cod
count,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0,240.0
mean,2499525.0,12.490642,18.854052,293.477875,293.471167,293.168625,293.815167,1012.5125,61.991667,1012.5125,971.058333,9610.041667,3.748833,157.383333,42.020833,1741451000.0,1741423000.0,1741466000.0,5193.75,200.0
std,1651845.0,73.888524,25.107127,9.277577,11.055642,9.439592,9.193141,5.753646,21.740769,5.753646,63.380276,1350.970235,2.292998,102.168552,38.06144,130.9346,25214.13,25543.26,17625.984126,0.0
min,53654.0,-176.1745,-49.35,265.06,260.45,265.06,265.06,989.0,8.0,989.0,701.0,3200.0,0.0,0.0,0.0,1741451000.0,1741388000.0,1741432000.0,-39600.0,200.0
25%,1156250.0,-53.32365,4.097775,288.0275,286.98,287.7625,288.36,1009.0,45.75,1009.0,960.0,10000.0,2.06,80.0,0.75,1741451000.0,1741407000.0,1741450000.0,-8100.0,200.0
50%,2390970.0,16.1748,17.997,295.3,295.33,295.25,295.73,1013.0,65.5,1013.0,1000.0,10000.0,3.515,130.0,34.5,1741451000.0,1741414000.0,1741456000.0,3600.0,200.0
75%,3536317.0,47.64665,40.23025,300.2075,301.885,300.1525,300.735,1016.25,78.0,1016.25,1009.0,10000.0,5.14,232.5,75.0,1741451000.0,1741432000.0,1741475000.0,14400.0,200.0
max,8224783.0,179.1942,78.2186,314.31,320.17,314.31,314.31,1029.0,100.0,1029.0,1024.0,10000.0,17.04,360.0,100.0,1741451000.0,1741542000.0,1741587000.0,46800.0,200.0




### Data analysis:

### Data preparation:

#### Identify and remove duplicates:

In [20]:
duplicates = df_weather[df_weather.duplicated()]

# Display duplicates (if any)
if not duplicates.empty:
    print("Removed duplicates:")
    print(duplicates)

# Remove duplicates from the main DataFrame
df_weather = df_weather.drop_duplicates()


#### Delete rows with a response code other than 200:

In [21]:
rows_to_delete = df_weather[df_weather["cod"] != 200]
print("Deleted rows:")
print(rows_to_delete)

# Delete rows with response code different from 200
df_weather = df_weather[df_weather["cod"] == 200]


Deleted rows:
Empty DataFrame
Columns: [country, city, id, lon, lat, base, main, description, temp, feels_like, temp_min, tem_max, pressure, humidity, sea_level, grnd_level, visibility, speed, deg, clouds, dt, sunrise, sunset, timezone, cod]
Index: []

[0 rows x 25 columns]


#### Add a local_datetime column:

In [22]:
df_weather["local_datetime"] = pd.to_datetime(df_weather["dt"], unit='s') + pd.to_timedelta(df_weather["timezone"], unit='s')

df_weather.head()

Unnamed: 0,country,city,id,lon,lat,base,main,description,temp,feels_like,...,visibility,speed,deg,clouds,dt,sunrise,sunset,timezone,cod,local_datetime
0,Grenada,St. George's,3579925,-61.7485,12.0564,stations,Clouds,few clouds,301.97,306.21,...,10000,6.69,130,20,1741451051,1741429110,1741472231,-14400,200,2025-03-08 12:24:11
1,Switzerland,Bern,2661552,7.4474,46.9481,stations,Clear,clear sky,288.79,287.35,...,10000,3.09,320,0,1741450891,1741413417,1741454716,3600,200,2025-03-08 17:21:31
2,Sierra Leone,Freetown,2409306,-13.2299,8.484,stations,Clouds,scattered clouds,300.99,304.02,...,10000,6.17,250,40,1741451053,1741417399,1741460658,0,200,2025-03-08 16:24:13
3,Hungary,Budapest,3054643,19.0399,47.498,stations,Clear,clear sky,290.77,289.37,...,10000,1.54,90,0,1741450802,1741410660,1741451910,3600,200,2025-03-08 17:20:02
4,Taiwan,Taipei,1668341,121.5319,25.0478,stations,Clouds,broken clouds,289.32,288.72,...,10000,5.66,100,75,1741451008,1741471768,1741514384,28800,200,2025-03-09 00:23:28
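
Adding the offset as a timedelta yields a naive datetime: the value shows local wall-clock time, but the offset itself is no longer attached. As a sketch of an alternative, the standard library can keep the offset on the result; `local_time` is a hypothetical helper, and the inputs are the `dt` and `timezone` values of the first row above.

```python
from datetime import datetime, timedelta, timezone


def local_time(dt_utc_seconds, offset_seconds):
    """Return a timezone-aware datetime for a UNIX timestamp and a UTC offset in seconds."""
    tz = timezone(timedelta(seconds=offset_seconds))
    return datetime.fromtimestamp(dt_utc_seconds, tz=tz)


stamp = local_time(1741451051, -14400)  # St. George's, Grenada
print(stamp.isoformat())  # 2025-03-08T12:24:11-04:00
```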


#### Remove unnecessary columns:

In [23]:
columns_to_drop = ["dt", "timezone","id", "base", "cod"]
df_weather = df_weather.drop(columns=columns_to_drop)

df_weather.head()

Unnamed: 0,country,city,lon,lat,main,description,temp,feels_like,temp_min,tem_max,...,humidity,sea_level,grnd_level,visibility,speed,deg,clouds,sunrise,sunset,local_datetime
0,Grenada,St. George's,-61.7485,12.0564,Clouds,few clouds,301.97,306.21,301.97,301.97,...,74,1015,1010,10000,6.69,130,20,1741429110,1741472231,2025-03-08 12:24:11
1,Switzerland,Bern,7.4474,46.9481,Clear,clear sky,288.79,287.35,288.42,289.57,...,36,1008,938,10000,3.09,320,0,1741413417,1741454716,2025-03-08 17:21:31
2,Sierra Leone,Freetown,-13.2299,8.484,Clouds,scattered clouds,300.99,304.02,300.99,300.99,...,74,1008,1002,10000,6.17,250,40,1741417399,1741460658,2025-03-08 16:24:13
3,Hungary,Budapest,19.0399,47.498,Clear,clear sky,290.77,289.37,290.26,292.44,...,30,1015,994,10000,1.54,90,0,1741410660,1741451910,2025-03-08 17:20:02
4,Taiwan,Taipei,121.5319,25.0478,Clouds,broken clouds,289.32,288.72,288.89,289.86,...,66,1023,1003,10000,5.66,100,75,1741471768,1741514384,2025-03-09 00:23:28


#### Converting UNIX timestamps to datetimes (UTC):

In [24]:
for col in ["sunrise", "sunset"]:
    df_weather[col] = pd.to_datetime(df_weather[col], unit='s')

df_weather.head()


Unnamed: 0,country,city,lon,lat,main,description,temp,feels_like,temp_min,tem_max,...,humidity,sea_level,grnd_level,visibility,speed,deg,clouds,sunrise,sunset,local_datetime
0,Grenada,St. George's,-61.7485,12.0564,Clouds,few clouds,301.97,306.21,301.97,301.97,...,74,1015,1010,10000,6.69,130,20,2025-03-08 10:18:30,2025-03-08 22:17:11,2025-03-08 12:24:11
1,Switzerland,Bern,7.4474,46.9481,Clear,clear sky,288.79,287.35,288.42,289.57,...,36,1008,938,10000,3.09,320,0,2025-03-08 05:56:57,2025-03-08 17:25:16,2025-03-08 17:21:31
2,Sierra Leone,Freetown,-13.2299,8.484,Clouds,scattered clouds,300.99,304.02,300.99,300.99,...,74,1008,1002,10000,6.17,250,40,2025-03-08 07:03:19,2025-03-08 19:04:18,2025-03-08 16:24:13
3,Hungary,Budapest,19.0399,47.498,Clear,clear sky,290.77,289.37,290.26,292.44,...,30,1015,994,10000,1.54,90,0,2025-03-08 05:11:00,2025-03-08 16:38:30,2025-03-08 17:20:02
4,Taiwan,Taipei,121.5319,25.0478,Clouds,broken clouds,289.32,288.72,288.89,289.86,...,66,1023,1003,10000,5.66,100,75,2025-03-08 22:09:28,2025-03-09 09:59:44,2025-03-09 00:23:28


#### Converting temperatures from Kelvin to °C:

In [25]:
temperature_columns = ["temp", "feels_like", "temp_min", "tem_max"]
for col in temperature_columns:
    df_weather[col] = df_weather[col] - 273.15
    
df_weather.head()

Unnamed: 0,country,city,lon,lat,main,description,temp,feels_like,temp_min,tem_max,...,humidity,sea_level,grnd_level,visibility,speed,deg,clouds,sunrise,sunset,local_datetime
0,Grenada,St. George's,-61.7485,12.0564,Clouds,few clouds,28.82,33.06,28.82,28.82,...,74,1015,1010,10000,6.69,130,20,2025-03-08 10:18:30,2025-03-08 22:17:11,2025-03-08 12:24:11
1,Switzerland,Bern,7.4474,46.9481,Clear,clear sky,15.64,14.2,15.27,16.42,...,36,1008,938,10000,3.09,320,0,2025-03-08 05:56:57,2025-03-08 17:25:16,2025-03-08 17:21:31
2,Sierra Leone,Freetown,-13.2299,8.484,Clouds,scattered clouds,27.84,30.87,27.84,27.84,...,74,1008,1002,10000,6.17,250,40,2025-03-08 07:03:19,2025-03-08 19:04:18,2025-03-08 16:24:13
3,Hungary,Budapest,19.0399,47.498,Clear,clear sky,17.62,16.22,17.11,19.29,...,30,1015,994,10000,1.54,90,0,2025-03-08 05:11:00,2025-03-08 16:38:30,2025-03-08 17:20:02
4,Taiwan,Taipei,121.5319,25.0478,Clouds,broken clouds,16.17,15.57,15.74,16.71,...,66,1023,1003,10000,5.66,100,75,2025-03-08 22:09:28,2025-03-09 09:59:44,2025-03-09 00:23:28


#### Add length of day (hours):

In [26]:
df_weather["daylight_duration"] = (df_weather["sunset"] - df_weather["sunrise"]).dt.total_seconds() / 3600

df_weather.head()

Unnamed: 0,country,city,lon,lat,main,description,temp,feels_like,temp_min,tem_max,...,sea_level,grnd_level,visibility,speed,deg,clouds,sunrise,sunset,local_datetime,daylight_duration
0,Grenada,St. George's,-61.7485,12.0564,Clouds,few clouds,28.82,33.06,28.82,28.82,...,1015,1010,10000,6.69,130,20,2025-03-08 10:18:30,2025-03-08 22:17:11,2025-03-08 12:24:11,11.978056
1,Switzerland,Bern,7.4474,46.9481,Clear,clear sky,15.64,14.2,15.27,16.42,...,1008,938,10000,3.09,320,0,2025-03-08 05:56:57,2025-03-08 17:25:16,2025-03-08 17:21:31,11.471944
2,Sierra Leone,Freetown,-13.2299,8.484,Clouds,scattered clouds,27.84,30.87,27.84,27.84,...,1008,1002,10000,6.17,250,40,2025-03-08 07:03:19,2025-03-08 19:04:18,2025-03-08 16:24:13,12.016389
3,Hungary,Budapest,19.0399,47.498,Clear,clear sky,17.62,16.22,17.11,19.29,...,1015,994,10000,1.54,90,0,2025-03-08 05:11:00,2025-03-08 16:38:30,2025-03-08 17:20:02,11.458333
4,Taiwan,Taipei,121.5319,25.0478,Clouds,broken clouds,16.17,15.57,15.74,16.71,...,1023,1003,10000,5.66,100,75,2025-03-08 22:09:28,2025-03-09 09:59:44,2025-03-09 00:23:28,11.837778
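
The same duration can be checked directly from the raw UNIX timestamps. The sketch below uses the sunrise and sunset values of the first row (St. George's); the helper names are assumptions for this example.

```python
def daylight_hours(sunrise, sunset):
    """Length of day in decimal hours from two UNIX timestamps."""
    return (sunset - sunrise) / 3600


def format_duration(seconds):
    """Render a number of seconds as hours, minutes, and seconds."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


sunrise, sunset = 1741429110, 1741472231
print(round(daylight_hours(sunrise, sunset), 6))  # 11.978056
print(format_duration(sunset - sunrise))          # 11h 58m 41s
```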


#### Calculation of temperature difference:

In [27]:
df_weather["temperature_difference"] = (
    df_weather["tem_max"] - df_weather["temp_min"]
)

df_weather.head()

Unnamed: 0,country,city,lon,lat,main,description,temp,feels_like,temp_min,tem_max,...,grnd_level,visibility,speed,deg,clouds,sunrise,sunset,local_datetime,daylight_duration,temperature_difference
0,Grenada,St. George's,-61.7485,12.0564,Clouds,few clouds,28.82,33.06,28.82,28.82,...,1010,10000,6.69,130,20,2025-03-08 10:18:30,2025-03-08 22:17:11,2025-03-08 12:24:11,11.978056,0.0
1,Switzerland,Bern,7.4474,46.9481,Clear,clear sky,15.64,14.2,15.27,16.42,...,938,10000,3.09,320,0,2025-03-08 05:56:57,2025-03-08 17:25:16,2025-03-08 17:21:31,11.471944,1.15
2,Sierra Leone,Freetown,-13.2299,8.484,Clouds,scattered clouds,27.84,30.87,27.84,27.84,...,1002,10000,6.17,250,40,2025-03-08 07:03:19,2025-03-08 19:04:18,2025-03-08 16:24:13,12.016389,0.0
3,Hungary,Budapest,19.0399,47.498,Clear,clear sky,17.62,16.22,17.11,19.29,...,994,10000,1.54,90,0,2025-03-08 05:11:00,2025-03-08 16:38:30,2025-03-08 17:20:02,11.458333,2.18
4,Taiwan,Taipei,121.5319,25.0478,Clouds,broken clouds,16.17,15.57,15.74,16.71,...,1003,10000,5.66,100,75,2025-03-08 22:09:28,2025-03-09 09:59:44,2025-03-09 00:23:28,11.837778,0.97


#### Calculation of the thermal comfort index (simplified):

In [28]:
# Calculate the thermal comfort index including the effect of wind speed
df_weather["thermal_comfort_index"] = (
    df_weather["temp"] -  # Ambient temperature
    (0.55 * (1 - (df_weather["humidity"] / 100)) * (df_weather["temp"] - 14.5)) -  # Humidity effect
    (0.2 * df_weather["speed"])  # Wind effect
)

# Display a preview of the results
df_weather[["temp", "humidity", "speed", "thermal_comfort_index"]].head()


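|   | temp | humidity | speed | thermal_comfort_index |
|---|------|----------|-------|-----------------------|
| 0 | 28.82 | 74 | 6.69 | 25.43424 |
| 1 | 15.64 | 36 | 3.09 | 14.62072 |
| 2 | 27.84 | 74 | 6.17 | 24.69838 |
| 3 | 17.62 | 30 | 1.54 | 16.1108 |
| 4 | 16.17 | 66 | 5.66 | 14.72571 |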
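
Written as a plain function, the index is `temp - 0.55 * (1 - humidity / 100) * (temp - 14.5) - 0.2 * speed`. The humidity term has the same form as Thom's discomfort index; the wind term with its 0.2 coefficient is this notebook's own simplification. The sketch below reproduces the first two rows of the preview; the function name is an assumption.

```python
def thermal_comfort_index(temp_c, humidity_pct, wind_ms):
    """Simplified comfort index: temperature reduced by a humidity term and a wind term."""
    humidity_term = 0.55 * (1 - humidity_pct / 100) * (temp_c - 14.5)
    wind_term = 0.2 * wind_ms
    return temp_c - humidity_term - wind_term


print(round(thermal_comfort_index(28.82, 74, 6.69), 5))  # 25.43424
print(round(thermal_comfort_index(15.64, 36, 3.09), 5))  # 14.62072
```

Note that the humidity term changes sign below 14.5 °C, so for cooler cities dry air raises the index instead of lowering it.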


#### Add a season column:

In [29]:
def get_season(date, lat):
    """
    Determines the season based on the date and latitude.
    """
    day_of_year = date.timetuple().tm_yday  # Day number of the year

    if lat > 0:  # Northern Hemisphere
        if 80 <= day_of_year < 172:  # March 21 - June 20
            return "Spring"
        elif 172 <= day_of_year < 264:  # June 21 - September 20
            return "Summer"
        elif 264 <= day_of_year < 355:  # September 21 - December 20
            return "Autumn"
        else:  # December 21 - March 20
            return "Winter"
    else:  # Southern Hemisphere
        if 80 <= day_of_year < 172:  # March 21 - June 20
            return "Autumn"
        elif 172 <= day_of_year < 264:  # June 21 - September 20
            return "Winter"
        elif 264 <= day_of_year < 355:  # September 21 - December 20
            return "Spring"
        else:  # December 21 - March 20
            return "Summer"

# Apply the function to create the "season" column
df_weather["season"] = df_weather.apply(
    lambda row: get_season(row["local_datetime"], row["lat"]), axis=1
)

# Check the results
print(df_weather[["local_datetime", "lat", "season"]].head())


       local_datetime      lat  season
0 2025-03-08 12:24:11  12.0564  Winter
1 2025-03-08 17:21:31  46.9481  Winter
2 2025-03-08 16:24:13   8.4840  Winter
3 2025-03-08 17:20:02  47.4980  Winter
4 2025-03-09 00:23:28  25.0478  Winter
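
The day-of-year thresholds match the commented dates only in non-leap years: in a leap year, day 80 falls on March 20. A sketch of a variant that compares `(month, day)` pairs instead, keeping the same boundaries and the same hemisphere split; `season_from_month_day` is a hypothetical name.

```python
from datetime import date

SEASONS_NORTH = ["Winter", "Spring", "Summer", "Autumn"]


def season_from_month_day(day, lat):
    """Same boundaries as get_season, compared as (month, day) pairs."""
    month_day = (day.month, day.day)
    if (3, 21) <= month_day < (6, 21):
        index = 1  # Spring in the north
    elif (6, 21) <= month_day < (9, 21):
        index = 2  # Summer in the north
    elif (9, 21) <= month_day < (12, 21):
        index = 3  # Autumn in the north
    else:
        index = 0  # Winter in the north
    if lat <= 0:  # same split as the original: only lat > 0 counts as northern
        index = (index + 2) % 4  # shift by half a year for the south
    return SEASONS_NORTH[index]


print(season_from_month_day(date(2025, 3, 8), 12.0564))   # Winter
print(season_from_month_day(date(2024, 3, 20), 46.9481))  # Winter
print(date(2024, 3, 20).timetuple().tm_yday)              # 80
```

The function only reads `.month` and `.day`, so it should also accept the pandas timestamps stored in `local_datetime`.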


#### Add temperature categories:

In [30]:
def categorize_temperature(temp):
    if temp < 0:
        return "Very Cold"
    elif temp < 10:
        return "Cold"
    elif temp < 25:
        return "Moderate"
    else:
        return "Hot"

df_weather["temperature_category"] = df_weather["temp"].apply(categorize_temperature)

df_weather.head()


Unnamed: 0,country,city,lon,lat,main,description,temp,feels_like,temp_min,tem_max,...,deg,clouds,sunrise,sunset,local_datetime,daylight_duration,temperature_difference,thermal_comfort_index,season,temperature_category
0,Grenada,St. George's,-61.7485,12.0564,Clouds,few clouds,28.82,33.06,28.82,28.82,...,130,20,2025-03-08 10:18:30,2025-03-08 22:17:11,2025-03-08 12:24:11,11.978056,0.0,25.43424,Winter,Hot
1,Switzerland,Bern,7.4474,46.9481,Clear,clear sky,15.64,14.2,15.27,16.42,...,320,0,2025-03-08 05:56:57,2025-03-08 17:25:16,2025-03-08 17:21:31,11.471944,1.15,14.62072,Winter,Moderate
2,Sierra Leone,Freetown,-13.2299,8.484,Clouds,scattered clouds,27.84,30.87,27.84,27.84,...,250,40,2025-03-08 07:03:19,2025-03-08 19:04:18,2025-03-08 16:24:13,12.016389,0.0,24.69838,Winter,Hot
3,Hungary,Budapest,19.0399,47.498,Clear,clear sky,17.62,16.22,17.11,19.29,...,90,0,2025-03-08 05:11:00,2025-03-08 16:38:30,2025-03-08 17:20:02,11.458333,2.18,16.1108,Winter,Moderate
4,Taiwan,Taipei,121.5319,25.0478,Clouds,broken clouds,16.17,15.57,15.74,16.71,...,100,75,2025-03-08 22:09:28,2025-03-09 09:59:44,2025-03-09 00:23:28,11.837778,0.97,14.72571,Winter,Moderate
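
The same thresholds can be expressed as data rather than as an `if`/`elif` chain, which makes them easier to adjust. A minimal sketch using `bisect` from the standard library; the names `THRESHOLDS`, `LABELS`, and `categorize` are assumptions for this example.

```python
from bisect import bisect_right

THRESHOLDS = [0, 10, 25]  # lower bounds (in °C) of Cold, Moderate, Hot
LABELS = ["Very Cold", "Cold", "Moderate", "Hot"]


def categorize(temp_c):
    """Return the label of the interval that contains temp_c."""
    return LABELS[bisect_right(THRESHOLDS, temp_c)]


for value in (-3, 0, 15.64, 28.82):
    print(value, categorize(value))
```

`bisect_right` places a value equal to a threshold in the upper interval, which matches the strict `<` comparisons of `categorize_temperature` (0 °C is "Cold", 25 °C is "Hot").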


#### Rename columns with descriptive names:

In [31]:
# Dictionary mapping the short API field names to descriptive column names
rename_columns = {
    "lon": "longitude",
    "lat": "latitude",
    "main": "weather_condition",
    "description": "weather_description",
    "temp": "temperature",
    "feels_like": "feels_like_temperature",
    "temp_min": "min_temperature",
    "tem_max": "max_temperature",
    "sea_level": "sea_level_pressure",
    "grnd_level": "ground_level_pressure",
    "speed": "wind_speed",
    "deg": "wind_direction",
    "clouds": "cloud_cover",
    "sunrise": "sunrise_time",
    "sunset": "sunset_time",
}

# Rename the DataFrame columns
df_weather.rename(columns=rename_columns, inplace=True)

df_weather.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 26 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   country                 240 non-null    object        
 1   city                    240 non-null    object        
 2   longitude               240 non-null    float64       
 3   latitude                240 non-null    float64       
 4   weather_condition       240 non-null    object        
 5   weather_description     240 non-null    object        
 6   temperature             240 non-null    float64       
 7   feels_like_temperature  240 non-null    float64       
 8   min_temperature         240 non-null    float64       
 9   max_temperature         240 non-null    float64       
 10  pressure                240 non-null    int64         
 11  humidity                240 non-null    int64         
 12  sea_level_pressure      240 non-null    int64     

#### Reorder the columns:

In [32]:
# List of columns in a logical order
column_order = [
    # Geographic information
    "country", "city", "longitude", "latitude",
    
    # General weather data
    "weather_condition", "weather_description",
    
    # Temperature-related data
    "temperature", "temperature_category", "feels_like_temperature", "min_temperature", "max_temperature",
    "temperature_difference", "thermal_comfort_index",
    
    # Pressure and humidity data
    "pressure", "sea_level_pressure", "ground_level_pressure", "humidity",
    
    # Visibility and wind data
    "visibility", "wind_speed", "wind_direction",
    
    # Cloudiness data
    "cloud_cover",
    
    # Temporal data
    "sunrise_time", "sunset_time", "daylight_duration", "local_datetime",
    
    # Derived information
    "season"
]

# Reorganize the columns
df_weather = df_weather[column_order]

df_weather.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 240 entries, 0 to 239
Data columns (total 26 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   country                 240 non-null    object        
 1   city                    240 non-null    object        
 2   longitude               240 non-null    float64       
 3   latitude                240 non-null    float64       
 4   weather_condition       240 non-null    object        
 5   weather_description     240 non-null    object        
 6   temperature             240 non-null    float64       
 7   temperature_category    240 non-null    object        
 8   feels_like_temperature  240 non-null    float64       
 9   min_temperature         240 non-null    float64       
 10  max_temperature         240 non-null    float64       
 11  temperature_difference  240 non-null    float64       
 12  thermal_comfort_index   240 non-null    float64   
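
Selecting columns with a list raises a `KeyError` for a misspelled name, but it silently drops any column left out of the list. A small check run before reordering can surface both cases; the sketch below works on plain lists of names, and `check_column_order` is a hypothetical helper.

```python
def check_column_order(existing, desired):
    """Return (missing, dropped): names only in desired, and names only in existing."""
    existing_set, desired_set = set(existing), set(desired)
    missing = sorted(desired_set - existing_set)
    dropped = sorted(existing_set - desired_set)
    return missing, dropped


# Small made-up example with a deliberate typo in the desired order
existing_columns = ["country", "city", "temperature", "humidity"]
desired_order = ["country", "city", "humidity", "temperatur"]

missing, dropped = check_column_order(existing_columns, desired_order)
print(missing)  # ['temperatur']
print(dropped)  # ['temperature']
```

In the notebook this could be called as `check_column_order(df_weather.columns, column_order)`; both lists should come back empty when the 26 names match.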