# OpenMeteo API Exploration

Initial investigation of the OpenMeteo API to understand:
- Endpoint structure and parameters
- Response schema and metadata
- Error handling
- Data characteristics

Keeping abstraction in mind - what's OpenMeteo-specific vs generic?

**API Docs**: https://open-meteo.com/en/docs

In [2]:
import requests
from datetime import datetime, timezone
import json

## 1. Basic Request

Endpoint: `/v1/forecast`

Required params:
- `latitude`, `longitude` (WGS84 coordinates)

Optional:
- `hourly`, `daily`, `current` - weather variables to return
- `timezone` - defaults to GMT
- `forecast_days` - 1 to 16 (default 7)

In [3]:
BASE_URL = "https://api.open-meteo.com/v1/forecast"

params = {
    "latitude": 51.5074,
    "longitude": -0.1278,
    "hourly": "temperature_2m"
}

response = requests.get(BASE_URL, params=params)
print(f"Status: {response.status_code}")
print(f"URL: {response.url}")

Status: 200
URL: https://api.open-meteo.com/v1/forecast?latitude=51.5074&longitude=-0.1278&hourly=temperature_2m


In [None]:
data = response.json()
print(json.dumps(data, indent=2)[:500])

{
  "latitude": 51.5,
  "longitude": -0.120000124,
  "generationtime_ms": 0.05412101745605469,
  "utc_offset_seconds": 0,
  "timezone": "GMT",
  "timezone_abbreviation": "GMT",
  "elevation": 16.0,
  "hourly_units": {
    "time": "iso8601",
    "temperature_2m": "\u00b0C"
  },
  "hourly": {
    "time": [
      "2025-12-23T00:00",
      "2025-12-23T01:00",
      "2025-12-23T02:00",
      "2025-12-23T03:00",
      "2025-12-23T04:00",
      "2025-12-23T05:00",
      "2025-12-23T06:00",
      "2025-


## 2. Response Structure

Expected fields from docs:
- `latitude`, `longitude` - actual coords used (may differ due to grid snapping)
- `elevation` - from 90m digital elevation model
- `generationtime_ms` - API processing time
- `utc_offset_seconds` - timezone offset applied
- `timezone`, `timezone_abbreviation`
- `hourly` / `daily` / `current` - actual weather data
- `hourly_units` / `daily_units` - units for each variable

In [4]:
print("Top-level keys:", list(data.keys()))

Top-level keys: ['latitude', 'longitude', 'generationtime_ms', 'utc_offset_seconds', 'timezone', 'timezone_abbreviation', 'elevation', 'hourly_units', 'hourly']


In [7]:
# API-provided metadata
metadata_keys = ['latitude', 'longitude', 'elevation', 'generationtime_ms', 
                 'utc_offset_seconds', 'timezone', 'timezone_abbreviation']

print("API Metadata:")
for k in metadata_keys:
    if k in data:
        print(f"  {k}: {data[k]}")

API Metadata:
  latitude: 51.5
  longitude: -0.120000124
  elevation: 16.0
  generationtime_ms: 0.04780292510986328
  utc_offset_seconds: 0
  timezone: GMT
  timezone_abbreviation: GMT


In [9]:
# Hourly data structure
if 'hourly' in data:
    print("Hourly keys:", list(data['hourly'].keys()))
    print(f"Number of time points: {len(data['hourly']['time'])}")
    print(f"\nFirst 5 times: {data['hourly']['time'][:5]}")
    print(f"First 5 temps: {data['hourly']['temperature_2m'][:5]}")

if 'hourly_units' in data:
    print(f"\nUnits: {data['hourly_units']}")

Hourly keys: ['time', 'temperature_2m']
Number of time points: 168

First 5 times: ['2025-12-22T00:00', '2025-12-22T01:00', '2025-12-22T02:00', '2025-12-22T03:00', '2025-12-22T04:00']
First 5 temps: [10.3, 10.1, 10.0, 9.8, 9.5]

Units: {'time': 'iso8601', 'temperature_2m': '°C'}


## 3. Client-Side Metadata

What we can capture from the request itself (not API-specific):

In [10]:
ingestion_timestamp = datetime.now(timezone.utc)
response = requests.get(BASE_URL, params=params)

client_metadata = {
    "ingestion_timestamp": ingestion_timestamp.isoformat(),
    "request_url": response.url,
    "status_code": response.status_code,
    "elapsed_ms": response.elapsed.total_seconds() * 1000,
    "response_size_bytes": len(response.content),
}

print("Client-side metadata (generic to any API):")
for k, v in client_metadata.items():
    print(f"  {k}: {v}")

Client-side metadata (generic to any API):
  ingestion_timestamp: 2025-12-23T00:16:28.675704+00:00
  request_url: https://api.open-meteo.com/v1/forecast?latitude=51.5074&longitude=-0.1278&hourly=temperature_2m
  status_code: 200
  elapsed_ms: 133.715
  response_size_bytes: 4129


In [None]:
## 3b. Why Metadata Matters (Brief Context)

"""
The brief notes: "The state of the data at the time the API was called might be useful."

Weather forecasts are mutable: calling the API at 9am vs 3pm can give different forecasts
for the same target date. Without metadata, we lose the ability to:
- Audit: What forecast did we act on at time T?
- Debug: Why did our model make decision X?
- Analyse: How accurate was the 3-day forecast vs 1-day?
"""

## 3c. Combined Metadata View

# Captured per ingestion
ingestion_record = {
    # Client-side (generic)
    "run_id": "uuid-would-go-here",
    "ingestion_timestamp": ingestion_timestamp.isoformat(),
    "request_url": response.url,
    "elapsed_ms": response.elapsed.total_seconds() * 1000,
    "status_code": response.status_code,
    
    # API-provided (OpenMeteo-specific)
    # Note: `data` was parsed from the section 1 response, not the request re-issued in section 3
    "api_generation_time_ms": data['generationtime_ms'],
    "api_latitude": data['latitude'],
    "api_longitude": data['longitude'],
    "api_elevation": data['elevation'],
    "api_timezone": data['timezone'],
    "api_utc_offset_seconds": data['utc_offset_seconds'],
}

print("Complete metadata record:")
for k, v in ingestion_record.items():
    print(f"  {k}: {v}")

Complete metadata record:
  run_id: uuid-would-go-here
  ingestion_timestamp: 2025-12-23T00:16:03.380296+00:00
  request_url: https://api.open-meteo.com/v1/forecast?latitude=51.5074&longitude=-0.1278&hourly=temperature_2m
  elapsed_ms: 136.93099999999998
  status_code: 200
  api_generation_time_ms: 0.05412101745605469
  api_latitude: 51.5
  api_longitude: -0.120000124
  api_elevation: 16.0
  api_timezone: GMT
  api_utc_offset_seconds: 0
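
The `run_id` in the record is a placeholder. A minimal sketch of one way to fill it, assuming a random UUID per ingestion run is sufficient (the helper name `new_run_id` is hypothetical, not part of the notebook):

```python
import uuid


def new_run_id() -> str:
    """Return a random (version 4) UUID as a string, one per ingestion run."""
    return str(uuid.uuid4())


run_id = new_run_id()
print(f"run_id: {run_id}")
```

A random UUID needs no coordination between runs. If runs must be sortable by time, the `ingestion_timestamp` already captured in the record can serve that purpose.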


## 4. Coordinate Snapping

API docs note: *"WGS84 of the center of the weather grid-cell which was used to generate this forecast. This coordinate might be a few kilometres away from the requested coordinate."*

In [11]:
requested = {"lat": 51.5074, "lon": -0.1278}
returned = {"lat": data['latitude'], "lon": data['longitude']}

print(f"Requested: ({requested['lat']}, {requested['lon']})")
print(f"Returned:  ({returned['lat']}, {returned['lon']})")
print(f"Difference: ({abs(requested['lat'] - returned['lat']):.4f}, {abs(requested['lon'] - returned['lon']):.4f})")
print(f"Elevation used: {data.get('elevation')}m")

Requested: (51.5074, -0.1278)
Returned:  (51.5, -0.120000124)
Difference: (0.0074, 0.0078)
Elevation used: 16.0m
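
The difference in degrees is hard to compare against the "few kilometres" mentioned in the docs. A rough sketch that converts it to a distance with the haversine formula, assuming a spherical Earth with a mean radius of 6371 km (`haversine_km` is a hypothetical helper name):

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two WGS84 points (spherical approximation)."""
    earth_radius_km = 6371.0  # mean Earth radius, an approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Requested vs returned coordinates from the output above
offset_km = haversine_km(51.5074, -0.1278, 51.5, -0.120000124)
print(f"Grid-cell centre is {offset_km:.2f} km from the requested point")
```

For this request the offset works out to roughly 1 km. Storing it alongside the requested and returned coordinates would make unusually large snaps easy to spot.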


## 5. Error Handling

From docs: *"In case an error occurs... a JSON error object is returned with HTTP 400 status code"*

```json
{"error": true, "reason": "Cannot initialize WeatherVariable from invalid String value..."}
```

In [11]:
# Invalid variable name
bad_response = requests.get(BASE_URL, params={
    "latitude": 51.5,
    "longitude": -0.1,
    "hourly": "invalid_variable_name"
})
print(f"Status: {bad_response.status_code}")
print(f"Response: {json.dumps(bad_response.json(), indent=2)}")

Status: 400
Response: {
  "error": true,
  "reason": "Data corrupted at path ''. Cannot initialize SurfacePressureAndHeightVariable<VariableAndPreviousDay, ForecastPressureVariable, ForecastHeightVariable> from invalid String value invalid_variable_name."
}


In [12]:
# Missing required param (longitude)
missing_response = requests.get(BASE_URL, params={"latitude": 51.5})
print(f"Status: {missing_response.status_code}")
print(f"Response: {json.dumps(missing_response.json(), indent=2)}")

Status: 400
Response: {
  "error": true,
  "reason": "Parameter 'latitude' and 'longitude' must have the same number of elements"
}
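
Both failures share the same body shape, so a pipeline could check the parsed JSON itself as well as the HTTP status. A minimal sketch, assuming the error object always carries `error` and `reason` as observed above (`OpenMeteoError` and `raise_for_api_error` are hypothetical names):

```python
class OpenMeteoError(Exception):
    """Raised when a response body carries the documented error object."""


def raise_for_api_error(payload):
    """Return the payload unchanged, or raise if it is an error object.

    Multi-location responses are lists and are passed through untouched.
    """
    if isinstance(payload, dict) and payload.get("error"):
        raise OpenMeteoError(payload.get("reason", "unknown reason"))
    return payload


# Example payloads shaped like the responses above (no network call needed)
ok = raise_for_api_error({"latitude": 51.5, "hourly": {"time": []}})
try:
    raise_for_api_error({
        "error": True,
        "reason": "Parameter 'latitude' and 'longitude' must have the same number of elements",
    })
except OpenMeteoError as exc:
    print(f"API error: {exc}")
```

In a real request path this would be called on `response.json()`, for example `raise_for_api_error(bad_response.json())`.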


## 6. Different Intervals

Available intervals:
- `hourly` - 168 hours (7 days default)
- `daily` - aggregations (max, min, sum, etc.)
- `current` - single point-in-time
- `minutely_15` - 15-min intervals (limited regions)

In [14]:
# Hourly
hourly_resp = requests.get(BASE_URL, params={
    "latitude": 51.5, "longitude": -0.1,
    "hourly": "temperature_2m"
}).json()
print(f"Hourly: {len(hourly_resp['hourly']['time'])} data points")

Hourly: 168 data points


In [15]:
# Daily - note different variable names (temperature_2m_max vs temperature_2m)
daily_resp = requests.get(BASE_URL, params={
    "latitude": 51.5, "longitude": -0.1,
    "daily": "temperature_2m_max,temperature_2m_min,precipitation_sum",
    "timezone": "UTC"  # Required for daily
}).json()

print(f"Daily: {len(daily_resp['daily']['time'])} data points")
print(f"Daily keys: {list(daily_resp['daily'].keys())}")
print(f"\nSample: {list(zip(daily_resp['daily']['time'][:3], daily_resp['daily']['temperature_2m_max'][:3]))}")

Daily: 7 data points
Daily keys: ['time', 'temperature_2m_max', 'temperature_2m_min', 'precipitation_sum']

Sample: [('2025-12-22', 11.1), ('2025-12-23', 9.3), ('2025-12-24', 6.6)]


In [16]:
# Current conditions
current_resp = requests.get(BASE_URL, params={
    "latitude": 51.5, "longitude": -0.1,
    "current": "temperature_2m,precipitation,weather_code"
}).json()

print(f"Current keys: {list(current_resp['current'].keys())}")
print(f"Current data: {current_resp['current']}")

Current keys: ['time', 'interval', 'temperature_2m', 'precipitation', 'weather_code']
Current data: {'time': '2025-12-22T23:45', 'interval': 900, 'temperature_2m': 9.2, 'precipitation': 0.0, 'weather_code': 3}


## 7. Multiple Variables & Response Size

In [17]:
# Single variable
single = requests.get(BASE_URL, params={
    "latitude": 51.5, "longitude": -0.1,
    "hourly": "temperature_2m"
})
print(f"Single variable: {len(single.content):,} bytes")

# Multiple variables
multi = requests.get(BASE_URL, params={
    "latitude": 51.5, "longitude": -0.1,
    "hourly": "temperature_2m,precipitation,wind_speed_10m,relative_humidity_2m,cloud_cover"
})
print(f"Five variables: {len(multi.content):,} bytes")

# Extended forecast
extended = requests.get(BASE_URL, params={
    "latitude": 51.5, "longitude": -0.1,
    "hourly": "temperature_2m",
    "forecast_days": 16
})
extended_data = extended.json()
print(f"16-day forecast: {len(extended.content):,} bytes ({len(extended_data['hourly']['time'])} hours)")

Single variable: 4,139 bytes
Five variables: 7,006 bytes
16-day forecast: 9,105 bytes (384 hours)


## 8. Multiple Locations

The API accepts comma-separated coordinate lists. The response then becomes a JSON list with one object per location, instead of a single object.

In [18]:
multi_loc = requests.get(BASE_URL, params={
    "latitude": "51.5,40.7,35.7",  # London, NYC, Tokyo
    "longitude": "-0.1,-74.0,139.7",
    "hourly": "temperature_2m"
})

multi_data = multi_loc.json()
print(f"Type: {type(multi_data)}")
print(f"Number of locations: {len(multi_data) if isinstance(multi_data, list) else 1}")

if isinstance(multi_data, list):
    for i, loc in enumerate(multi_data):
        print(f"  Location {i}: ({loc['latitude']}, {loc['longitude']})")

Type: <class 'list'>
Number of locations: 3
  Location 0: (51.5, -0.10000014)
  Location 1: (40.710335, -73.99309)
  Location 2: (35.7, 139.6875)
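
Because the top-level type depends on the number of locations, downstream code can normalise it once and always iterate. A small sketch (`as_location_list` is a hypothetical helper; the sample payloads are trimmed to the coordinate fields):

```python
def as_location_list(payload):
    """Wrap a single-location response (dict) in a list; pass lists through."""
    if isinstance(payload, list):
        return payload
    return [payload]


single_loc = {"latitude": 51.5, "longitude": -0.10000014}
many_locs = [
    {"latitude": 51.5, "longitude": -0.10000014},
    {"latitude": 35.7, "longitude": 139.6875},
]

for loc in as_location_list(single_loc) + as_location_list(many_locs):
    print(f"({loc['latitude']}, {loc['longitude']})")
```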


## 9. Timezone Handling

In [19]:
# Default (GMT)
gmt_resp = requests.get(BASE_URL, params={
    "latitude": 51.5, "longitude": -0.1,
    "hourly": "temperature_2m"
}).json()
print(f"Default timezone: {gmt_resp['timezone']} (offset: {gmt_resp['utc_offset_seconds']}s)")
print(f"First time: {gmt_resp['hourly']['time'][0]}")

# Auto timezone
auto_resp = requests.get(BASE_URL, params={
    "latitude": 51.5, "longitude": -0.1,
    "hourly": "temperature_2m",
    "timezone": "auto"
}).json()
print(f"\nAuto timezone: {auto_resp['timezone']} (offset: {auto_resp['utc_offset_seconds']}s)")
print(f"First time: {auto_resp['hourly']['time'][0]}")

Default timezone: GMT (offset: 0s)
First time: 2025-12-22T00:00

Auto timezone: Europe/London (offset: 0s)
First time: 2025-12-22T00:00
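
The `time` strings carry no offset of their own, so storing them unambiguously means combining them with `utc_offset_seconds`. A sketch of that conversion, assuming the strings are local to the returned `timezone` and that one offset applies to the whole response (which may not hold across a daylight-saving transition); `to_utc` is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_time: str, utc_offset_seconds: int) -> datetime:
    """Convert a local timestamp string plus its offset to an aware UTC datetime."""
    naive_local = datetime.fromisoformat(local_time)  # e.g. "2025-12-22T00:00"
    utc_naive = naive_local - timedelta(seconds=utc_offset_seconds)
    return utc_naive.replace(tzinfo=timezone.utc)


# Offset 0 matches the Europe/London response above
print(to_utc("2025-12-22T00:00", 0).isoformat())
# Hypothetical location five hours behind UTC
print(to_utc("2025-12-22T00:00", -18000).isoformat())
```

Requesting `"timezone": "UTC"` (or relying on the GMT default) avoids the conversion entirely, at the cost of daily aggregates being aligned to UTC days rather than local days.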


## 10. Key Weather Variables

Useful hourly variables for a general weather pipeline:
- `temperature_2m` - Air temp at 2m (instant)
- `relative_humidity_2m` - Humidity (instant)
- `precipitation` - Rain + snow, preceding hour sum
- `wind_speed_10m`, `wind_direction_10m` - Wind at 10m
- `cloud_cover` - Total cloud cover %
- `weather_code` - WMO weather condition code

In [20]:
comprehensive = requests.get(BASE_URL, params={
    "latitude": 51.5, "longitude": -0.1,
    "hourly": "temperature_2m,relative_humidity_2m,precipitation,wind_speed_10m,wind_direction_10m,cloud_cover,weather_code",
    "timezone": "UTC"
}).json()

print("Variables returned:")
for var in comprehensive['hourly'].keys():
    if var != 'time':
        unit = comprehensive['hourly_units'].get(var, 'N/A')
        sample = comprehensive['hourly'][var][0]
        print(f"  {var}: {sample} {unit}")

Variables returned:
  temperature_2m: 10.3 °C
  relative_humidity_2m: 95 %
  precipitation: 0.3 mm
  wind_speed_10m: 12.6 km/h
  wind_direction_10m: 70 °
  cloud_cover: 100 %
  weather_code: 61 wmo code


### API-Specific Findings
- Grid snapping: Requested coords are snapped to model grid (51.5074 → 51.5, -0.1278 → -0.12)
- Elevation is returned (16.0m) - used for statistical downscaling
- `generationtime_ms` is very small (~0.05ms): it is server-side processing time, not network latency (client-side `elapsed_ms` was ~130ms for the same query)
- Multiple locations return a list, single location returns object - need to handle both
- Daily variables have different names than hourly (`temperature_2m_max` vs `temperature_2m`)
- `current` block (a parameter on the same `/v1/forecast` endpoint, not a separate endpoint) includes `interval: 900`, i.e. 900 seconds (15-min resolution)
- Error format is consistent: `{"error": true, "reason": "..."}`

### Generic Patterns
- Client metadata (timestamp, elapsed_ms, status_code) applies to any REST API
- Response structure: metadata + data + units is common pattern
- Coordinate snapping means storing both requested AND returned coords
- Error handling: check for `error` key in response, not just HTTP status

### Config Candidates
- `base_url` - endpoint
- `locations` - list of lat/lon pairs with names
- `hourly_variables` - which metrics to fetch
- `daily_variables` - which daily aggregations
- `forecast_days` - 1 to 16
- `timezone` - "UTC" or "auto"
- `interval` - determines valid variables and storage destination
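
One possible shape for these settings, sketched as dataclasses. The class names `Location` and `ForecastConfig` and the `to_params` method are assumptions for illustration, and `interval` is left out because its role is still open:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Location:
    name: str
    latitude: float
    longitude: float


@dataclass
class ForecastConfig:
    locations: List[Location]
    hourly_variables: List[str] = field(default_factory=list)
    daily_variables: List[str] = field(default_factory=list)
    forecast_days: int = 7
    timezone: str = "UTC"
    base_url: str = "https://api.open-meteo.com/v1/forecast"

    def to_params(self) -> Dict[str, object]:
        """Build query parameters for one request covering all locations."""
        params: Dict[str, object] = {
            "latitude": ",".join(str(loc.latitude) for loc in self.locations),
            "longitude": ",".join(str(loc.longitude) for loc in self.locations),
            "forecast_days": self.forecast_days,
            "timezone": self.timezone,
        }
        # Only send the variable lists that are actually configured
        if self.hourly_variables:
            params["hourly"] = ",".join(self.hourly_variables)
        if self.daily_variables:
            params["daily"] = ",".join(self.daily_variables)
        return params


config = ForecastConfig(
    locations=[Location("London", 51.5, -0.1), Location("Tokyo", 35.7, 139.7)],
    hourly_variables=["temperature_2m", "precipitation"],
)
print(config.to_params())
```

The resulting dict can be passed straight to `requests.get(config.base_url, params=config.to_params())`, matching the comma-separated form used in section 8.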

### Data Model Considerations
- Hourly: 168 rows per location per ingestion (7 days × 24 hours)
- Daily: 7 rows per location per ingestion
- Hourly and daily have different schemas - can't mix in same table:
  - Hourly: `temperature_2m` 
  - Daily: `temperature_2m_max`, `temperature_2m_min`
- Recommendation: separate storage paths per interval (`/hourly/`, `/daily/`)
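- Response size grows with variable count but not proportionally (~4KB for one variable → ~7KB for five, roughly 0.7KB per extra variable in this sample), since the `time` array is shared; 16 days of one variable is ~9KB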
- Need to flatten nested structure: `hourly.time[]` + `hourly.temperature_2m[]` → rows
- Store units alongside data or in separate metadata
- Partition by: date + location makes sense given data volume
- Forecast mutability: same target date will have multiple forecasts over time - `ingestion_timestamp` is essential for distinguishing them
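
The flattening step mentioned above could look roughly like this. It is a sketch, not a settled schema: the function name `flatten_hourly` and the choice of row fields are assumptions, and the sample payload reuses values from the outputs earlier in the notebook.

```python
from typing import Dict, List


def flatten_hourly(payload: Dict, ingestion_timestamp: str) -> List[Dict]:
    """Turn the column-oriented `hourly` block into one dict per timestamp."""
    hourly = payload["hourly"]
    variables = [name for name in hourly if name != "time"]
    rows = []
    for i, time_str in enumerate(hourly["time"]):
        row = {
            "ingestion_timestamp": ingestion_timestamp,  # distinguishes repeat forecasts
            "latitude": payload["latitude"],             # returned (snapped) coords
            "longitude": payload["longitude"],
            "time": time_str,
        }
        for name in variables:
            row[name] = hourly[name][i]
        rows.append(row)
    return rows


sample = {
    "latitude": 51.5,
    "longitude": -0.120000124,
    "hourly_units": {"time": "iso8601", "temperature_2m": "°C"},
    "hourly": {
        "time": ["2025-12-22T00:00", "2025-12-22T01:00"],
        "temperature_2m": [10.3, 10.1],
    },
}

rows = flatten_hourly(sample, "2025-12-23T00:16:28+00:00")
for row in rows:
    print(row)
```

Units are left in `hourly_units` here. They could be written once per ingestion as metadata rather than repeated on every row.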