<h3>
  <span style="color:green; font-weight:bold; background-color:#f0f0f0; padding:6px; border-radius:4px; font-size:20px;">
    Fetching Hourly Weather Data from Meteostat
  </span>
</h3>

Imports the required libraries, defines the Central Park location and the 2018 time range, <br>
and obtains hourly weather observations from the Meteostat API.

In [2]:
from datetime import datetime
import os

import pandas as pd
from meteostat import Point, Hourly

In [3]:
# Central Park coordinates
cp = Point(40.7812, -73.9665)

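# 2018 time range; the end timestamp is included in the result, so the data
# also contain 2019-01-01 00:00 (8761 rows rather than 8760)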
start = datetime(2018, 1, 1)
end = datetime(2019, 1, 1)

# Fetch hourly weather data from Meteostat
data = Hourly(cp, start, end)
df_weather = data.fetch()

df_weather.head()

time,temp,dwpt,rhum,prcp,snow,wdir,wspd,wpgt,pres,tsun,coco
2018-01-01 00:00:00,-10.6,-20.0,46.0,,,320.0,18.4,,1026.7,,
2018-01-01 01:00:00,-9.4,-14.9,64.0,0.0,,330.0,5.4,,1026.7,,
2018-01-01 02:00:00,-10.0,-16.1,61.0,0.0,,320.0,7.6,,1027.0,,
2018-01-01 03:00:00,-10.6,-17.2,58.0,0.0,,330.0,5.4,,1026.8,,
2018-01-01 04:00:00,-10.6,-15.0,70.0,0.0,,320.0,7.6,,1027.1,,
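The time-range check further down reports a last timestamp of 2019-01-01 00:00 and 8761 rows, so the <code>end</code> passed to <code>Hourly</code> is included in the result. If strictly calendar-year 2018 is wanted, one option is to filter on a half-open interval after fetching. A minimal sketch, using a synthetic hourly frame in place of the Meteostat call (the helper <code>restrict_half_open</code> is a hypothetical name):

```python
import numpy as np
import pandas as pd


def restrict_half_open(df: pd.DataFrame, start, end) -> pd.DataFrame:
    """Keep rows whose DatetimeIndex lies in [start, end), i.e. drop the end timestamp."""
    mask = (df.index >= start) & (df.index < end)
    return df.loc[mask]


# Synthetic stand-in for df_weather: hourly rows from 2018-01-01 00:00 to 2019-01-01 00:00 inclusive
idx = pd.date_range("2018-01-01 00:00", "2019-01-01 00:00", freq="h", name="time")
demo = pd.DataFrame({"temp": np.zeros(len(idx))}, index=idx)

trimmed = restrict_half_open(demo, pd.Timestamp("2018-01-01"), pd.Timestamp("2019-01-01"))
print(len(demo), len(trimmed))  # 8761 8760
print(trimmed.index.max())      # 2018-12-31 23:00:00
```

In the notebook this could be applied as <code>restrict_half_open(df_weather, start, end)</code>, reusing the existing <code>start</code> and <code>end</code>; whether that final hour matters depends on how the data are merged later.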


<h3>
  <span style="color:green; font-weight:bold; background-color:#f0f0f0; padding:6px; border-radius:4px; font-size:20px;">
    Inspecting Missing Values and Time Coverage
  </span>
</h3>

Summarizes missing values for each weather variable, reports the first and last timestamps, <br>
and checks that consecutive rows are spaced exactly one hour apart.


In [4]:
# Missing values summary
print("Missing values (count):\n", df_weather.isna().sum(), "\n")
print("Missing values (fraction of rows):\n", df_weather.isna().mean(), "\n")

# Time coverage and sampling frequency
print("Time range:", df_weather.index.min(), "→", df_weather.index.max())
print("Number of rows:", len(df_weather))

freq_counts = df_weather.index.to_series().diff().value_counts()
print("\nTime step distribution (delta between rows):")
print(freq_counts)


Missing values (count):
 temp       0
dwpt       0
rhum       0
prcp     351
snow    8761
wdir       4
wspd       0
wpgt    8761
pres       8
tsun    8761
coco    2998
dtype: int64 

Missing values (fraction of rows):
 temp    0.000000
dwpt    0.000000
rhum    0.000000
prcp    0.040064
snow    1.000000
wdir    0.000457
wspd    0.000000
wpgt    1.000000
pres    0.000913
tsun    1.000000
coco    0.342198
dtype: float64 

Time range: 2018-01-01 00:00:00 → 2019-01-01 00:00:00
Number of rows: 8761

Time step distribution (delta between rows):
time
0 days 01:00:00    8760
Name: count, dtype: int64
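The time-step distribution confirms uniform one-hour spacing, but it does not name any hours that might be absent. Comparing the index against a complete hourly range lists them directly, and the missing-value counts and shares can be combined into one table. A minimal sketch on a small synthetic frame (the helpers <code>missing_summary</code> and <code>missing_hours</code> are hypothetical names):

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most-missing first."""
    counts = df.isna().sum()
    return (
        pd.DataFrame({"n_missing": counts, "pct_missing": 100 * counts / len(df)})
        .sort_values("n_missing", ascending=False)
    )


def missing_hours(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Hourly timestamps between the first and last index entry that are absent from the index."""
    expected = pd.date_range(index.min(), index.max(), freq="h")
    return expected.difference(index)


# Synthetic example: six hours with 02:00 removed, plus a partly missing precipitation column
idx = pd.date_range("2018-01-01", periods=6, freq="h").delete(2)
demo = pd.DataFrame(
    {"temp": [1.0, 2.0, 3.0, 4.0, 5.0], "prcp": [0.0, np.nan, 0.2, np.nan, 0.0]},
    index=idx,
)

print(missing_summary(demo))
print(missing_hours(demo.index))
```

Applied to <code>df_weather</code>, an empty result from <code>missing_hours</code> would be the explicit counterpart of the single one-hour delta reported above.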


<h3>
  <span style="color:green; font-weight:bold; background-color:#f0f0f0; padding:6px; border-radius:4px; font-size:20px;">
    Cleaning Columns and Creating Derived Weather Features
  </span>
</h3>

Resets the index to obtain a <code>datetime</code> column, keeps the original Meteostat variables, and adds <br>
analysis-friendly columns: temperature in °C (unchanged), pressure in inHg, wind speed in knots, and precipitation <br>
in inches. It also defines simple indicator flags for dry, rainy, and snow-like hours (precipitation at or below 0 °C).
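The conversion factors hard-coded in the next cell are reciprocals of standard definitions: 1 knot = 1.852 km/h and 1 inch = 25.4 mm exactly, and 1 inHg ≈ 33.8639 hPa (the conventional value for mercury at 0 °C). A short check, assuming these definitions, that the constants match and reproduce the first converted row:

```python
# Definitions the conversion factors are derived from
HPA_PER_INHG = 33.8638866667  # 1 inHg in hPa (conventional value, mercury at 0 °C)
KMH_PER_KNOT = 1.852          # 1 knot = 1 nautical mile per hour = 1.852 km/h (exact)
MM_PER_INCH = 25.4            # 1 inch = 25.4 mm (exact)

HPA_TO_INHG = 1 / HPA_PER_INHG
KMH_TO_KN = 1 / KMH_PER_KNOT
MM_TO_IN = 1 / MM_PER_INCH

print(f"{HPA_TO_INHG:.10f}")  # 0.0295299831
print(f"{KMH_TO_KN:.6f}")     # 0.539957
print(f"{MM_TO_IN:.7f}")      # 0.0393701

# First row of the fetched data: 1026.7 hPa and 18.4 km/h
print(round(1026.7 * HPA_TO_INHG, 4), round(18.4 * KMH_TO_KN, 4))  # 30.3184 9.9352
```

Deriving the factors from their definitions, rather than typing rounded reciprocals, keeps the units self-documenting; the differences from the notebook's constants are below the displayed precision.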


In [5]:
# Move index to a proper datetime column
df_weather_out = (
    df_weather
    .reset_index()
    .rename(columns={"time": "datetime"})
    .sort_values("datetime")
)

# Keep original Meteostat variables but also create analysis-friendly columns
df_weather_out["temp_C"] = df_weather_out["temp"]                # already in °C

# Convert pressure from hPa to inHg
df_weather_out["pressure_inHg"] = df_weather_out["pres"] * 0.0295299830714

# Convert wind speed from km/h to knots
df_weather_out["wind_speed_kn"] = df_weather_out["wspd"] * 0.539957

# Convert precipitation from mm to inches
df_weather_out["precip_in"] = df_weather_out["prcp"] * 0.0393701

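# Simple weather condition flags; "snow-like" = precipitation at or below 0 °C.
# Hours with missing prcp end up False or missing in these flags (e.g. row 0 below).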
df_weather_out["is_rain"] = (df_weather_out["prcp"] > 0) & (df_weather_out["temp_C"] > 0)
df_weather_out["is_snow_like"] = (df_weather_out["prcp"] > 0) & (df_weather_out["temp_C"] <= 0)
df_weather_out["is_dry"] = df_weather_out["prcp"] == 0

df_weather_out.head()


,datetime,temp,dwpt,rhum,prcp,snow,wdir,wspd,wpgt,pres,tsun,coco,temp_C,pressure_inHg,wind_speed_kn,precip_in,is_rain,is_snow_like,is_dry
0,2018-01-01 00:00:00,-10.6,-20.0,46.0,,,320.0,18.4,,1026.7,,,-10.6,30.318434,9.935209,,False,,
1,2018-01-01 01:00:00,-9.4,-14.9,64.0,0.0,,330.0,5.4,,1026.7,,,-9.4,30.318434,2.915768,0.0,False,False,True
2,2018-01-01 02:00:00,-10.0,-16.1,61.0,0.0,,320.0,7.6,,1027.0,,,-10.0,30.327293,4.103673,0.0,False,False,True
3,2018-01-01 03:00:00,-10.6,-17.2,58.0,0.0,,330.0,5.4,,1026.8,,,-10.6,30.321387,2.915768,0.0,False,False,True
4,2018-01-01 04:00:00,-10.6,-15.0,70.0,0.0,,320.0,7.6,,1027.1,,,-10.6,30.330246,4.103673,0.0,False,False,True
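In the output above, the first hour has no <code>prcp</code> value, yet <code>is_rain</code> is <code>False</code> while <code>is_snow_like</code> and <code>is_dry</code> are blank: comparisons against a missing value do not propagate consistently through <code>&amp;</code>. If the intent is that an hour with unknown precipitation should have unknown flags, one option is pandas' nullable <code>boolean</code> dtype with every flag masked where <code>prcp</code> is missing. A minimal sketch (the function <code>precip_flags</code> is a hypothetical helper; the 0 °C rain/snow split simply mirrors the rule in the cell above):

```python
import numpy as np
import pandas as pd


def precip_flags(prcp: pd.Series, temp_c: pd.Series) -> pd.DataFrame:
    """Rain / snow-like / dry flags that are <NA> wherever precipitation is missing."""
    known = prcp.notna()
    wet = prcp > 0
    raw = {
        "is_rain": wet & (temp_c > 0),
        "is_snow_like": wet & (temp_c <= 0),
        "is_dry": prcp == 0,
    }
    # Cast to nullable booleans, then blank out every flag where prcp is unknown
    return pd.DataFrame(
        {name: s.astype("boolean").where(known, pd.NA) for name, s in raw.items()}
    )


prcp = pd.Series([np.nan, 0.0, 1.2, 0.5])
temp = pd.Series([-10.6, -9.4, 3.0, -1.0])
flags = precip_flags(prcp, temp)
print(flags)
```

In the notebook this might replace the three flag assignments with <code>df_weather_out[["is_rain", "is_snow_like", "is_dry"]] = precip_flags(df_weather_out["prcp"], df_weather_out["temp_C"])</code>, so that downstream counts can exclude unknown hours explicitly.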


<h3>
  <span style="color:green; font-weight:bold; background-color:#f0f0f0; padding:6px; border-radius:4px; font-size:20px;">
    Saving Processed Meteostat Weather to CSV
  </span>
</h3>

Writes the processed hourly weather table to a CSV file under <code>../data/Weather_Data/processed/</code> (relative to the notebook), <br>
so it can be reused directly in later analysis and merged with the CitiBike trip data.


In [6]:
output_dir = "../data/Weather_Data/processed/"
os.makedirs(output_dir, exist_ok=True)

output_file = os.path.join(output_dir, "central_park_weather_2018_meteostat_hourly.csv")
df_weather_out.to_csv(output_file, index=False)

print(f"Rows saved: {len(df_weather_out)}")
print(f"Saved file: {output_file}")


Rows saved: 8761
Saved file: ../data/Weather_Data/processed/central_park_weather_2018_meteostat_hourly.csv
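CSV does not store dtypes, so <code>datetime</code> comes back as text unless it is parsed on reading. A minimal sketch of reloading the file and attaching hourly weather to trips by flooring each trip start to the hour. The trips table and its <code>starttime</code> column are hypothetical stand-ins, not the project's actual schema. Meteostat documents its hourly timestamps as UTC unless a <code>timezone</code> argument is passed to <code>Hourly</code>, so it is worth confirming that both tables use the same clock before joining; the sketch assumes they already do.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in for df_weather_out (two hourly rows)
weather = pd.DataFrame({
    "datetime": pd.to_datetime(["2018-01-01 00:00", "2018-01-01 01:00"]),
    "temp_C": [-10.6, -9.4],
    "precip_in": [float("nan"), 0.0],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "weather.csv")
    weather.to_csv(path, index=False)
    # Without parse_dates the column would be read back as plain strings
    loaded = pd.read_csv(path, parse_dates=["datetime"])

# Hypothetical trips table; "starttime" is an assumed column name
trips = pd.DataFrame({
    "trip_id": [1, 2, 3],
    "starttime": pd.to_datetime(["2018-01-01 00:15", "2018-01-01 00:59", "2018-01-01 01:30"]),
})

# Floor each trip start to its hour, then left-join the matching weather row
trips["hour"] = trips["starttime"].dt.floor("h")
merged = trips.merge(loaded, left_on="hour", right_on="datetime", how="left")
print(merged[["trip_id", "hour", "temp_C"]])
```

A left join keeps every trip even when its hour has no weather row, which makes any gaps visible as missing values instead of silently dropping trips.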
