# Weather Data Ingestion (ERA5)

**Project Phase:** 2. Data Integration and Preprocessing
**Goal:** Download historical weather data for Norway, Denmark, and Spain.

### Data Source
We use **ERA5 climate reanalysis** data from the **Copernicus Climate Change Service (CDS)**.
ERA5 provides hourly estimates of atmospheric variables and represents the current state-of-the-art in climate reanalysis.

### Requirements
* **Timeframe:** 10 years (2015-2024, inclusive)
* **Resolution:** Hourly
* **Format:** NetCDF (standard for multidimensional climate data)
* **License:** Copernicus License

In [None]:
import cdsapi
import os
from pathlib import Path
from dotenv import load_dotenv

# 1. Setup Paths
# We navigate up two levels to reach the project root
current_dir = Path(os.getcwd())
project_dir = current_dir.parent.parent
secrets_path = project_dir / 'config' / 'secrets.env'

# Define output directory for raw weather data
output_dir = project_dir / 'data' / 'raw' / 'weather'
output_dir.mkdir(parents=True, exist_ok=True)

print(f"Output directory: {output_dir}")

# 2. Load API Credentials
# We load the CDS_URL and CDS_KEY from the secrets.env file
if secrets_path.exists():
    load_dotenv(secrets_path)
    print("Credentials loaded from secrets.env")
else:
    print(f"Error: Secrets file not found at {secrets_path}")

# 3. Initialize CDS Client
try:
    cds_url = os.getenv('CDS_URL')
    cds_key = os.getenv('CDS_KEY')
    
    if not cds_url or not cds_key:
        raise ValueError("Missing CDS_URL or CDS_KEY in secrets file.")
        
    # Initialize the client with the explicit URL and key
    client = cdsapi.Client(url=cds_url, key=cds_key)
    print("CDS API Client initialized successfully.")

except Exception as e:
    print(f"Initialization failed: {e}")
    raise  # stop here; the download cells below need a working `client`

## Configuration

We define the parameters strictly according to the project proposal.

### Variables
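1.  **2m temperature**: Air temperature at 2 meters height (in kelvin).
2.  **Total precipitation**: Rain and snow accumulated over each hour (in meters of water equivalent).
3.  **10m u-component of wind**: Eastward wind component at 10 meters (m/s).
4.  **10m v-component of wind**: Northward wind component at 10 meters (m/s).
5.  **Surface solar radiation downwards**: Solar energy reaching the surface, accumulated over each hour (J/m²); used for solar power features.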

*Note: We download the U and V wind components so that wind speed and direction can be derived later.*
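
The derivation from the two components can be sketched as follows. This is a minimal illustration rather than the project's actual preprocessing code: the function name `wind_speed_and_direction` is hypothetical, and it assumes the meteorological convention in which direction is the bearing the wind blows *from*, in degrees clockwise from north.

```python
import math

def wind_speed_and_direction(u, v):
    """Convert eastward (u) and northward (v) wind components in m/s
    into speed (m/s) and meteorological direction (degrees).

    Hypothetical helper; direction is where the wind comes FROM,
    measured clockwise from north (0 = northerly, 90 = easterly).
    """
    speed = math.hypot(u, v)  # sqrt(u**2 + v**2)
    # Negating both components flips "blowing towards" into "coming from".
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction

speed, direction = wind_speed_and_direction(3.0, 4.0)
print(f"{speed:.1f} m/s from {direction:.1f} degrees")
```

For whole ERA5 arrays, the same formula can be applied with the NumPy equivalents `np.hypot`, `np.arctan2` and `np.degrees`.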

### Locations
We define bounding boxes (North, West, South, East) for:
* **DK1** (Denmark West)
* **ES** (Spain)
* **NO2** (Norway, covering the southern/southwestern region)
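
Because the order North, West, South, East is easy to mix up, a small sanity check can catch a swapped box before any request is sent. The sketch below is illustrative only: `validate_area` and `grid_shape` are hypothetical helpers, the check assumes boxes that do not cross the antimeridian, and the grid estimate assumes the 0.25° regular latitude/longitude grid on which ERA5 single-level data is normally delivered.

```python
def validate_area(area):
    """Check a [North, West, South, East] box and return it unchanged.

    Hypothetical helper; assumes the box does not cross the antimeridian.
    """
    north, west, south, east = area
    if not (-90 <= south < north <= 90):
        raise ValueError(f"Latitudes out of order or out of range: {area}")
    if not (-180 <= west < east <= 180):
        raise ValueError(f"Longitudes out of order or out of range: {area}")
    return area

def grid_shape(area, step=0.25):
    """Estimate (n_lat, n_lon) grid points inside the box, edges included."""
    north, west, south, east = validate_area(area)
    n_lat = round((north - south) / step) + 1
    n_lon = round((east - west) / step) + 1
    return n_lat, n_lon

# Same boxes as in the configuration cell below
areas = {
    'DK1': [58, 7, 54, 16],
    'ES':  [44, -10, 35, 5],
    'NO2': [62, 4, 57, 10],
}
for zone, area in areas.items():
    print(zone, grid_shape(area))
```

The estimate gives a rough idea of how the file size per month differs between zones: the Spanish box holds several times as many grid points as the Norwegian one.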

In [None]:
# Timeframe: 10 years, 2015-2024 (range() excludes the end value 2025)
YEARS = [str(year) for year in range(2015, 2025)]
MONTHS = [f"{month:02d}" for month in range(1, 13)]

# Variables required for forecasting models
VARIABLES = [
    '2m_temperature',
    'total_precipitation',
    '10m_u_component_of_wind',
    '10m_v_component_of_wind',
    'surface_solar_radiation_downwards'
]

# Geographical Bounding Boxes [North, West, South, East]
# These boxes cover the relevant bidding zones
AREAS = {
    'DK1': [58, 7, 54, 16],      # Denmark West
    'ES':  [44, -10, 35, 5],     # Spain
    'NO2': [62, 4, 57, 10]       # Norway South (NO2)
}

print("Job Configuration:")
print(f"  Years: {len(YEARS)} ({YEARS[0]}-{YEARS[-1]})")
print(f"  Variables: {len(VARIABLES)}")
print(f"  Zones: {list(AREAS.keys())}")

## Data Download

The following function downloads data in **monthly chunks**, one file per zone and month.
Chunking is necessary because the CDS API limits the size of a single request; asking for all 10 years of hourly data at once would most likely be rejected.
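
To get a feel for the size of the resulting queue, the chunks can be enumerated up front. This is a small sketch, not part of the download code itself: `build_jobs` is a hypothetical helper, and the zone, year, and month lists simply mirror the configuration above.

```python
from itertools import product

def build_jobs(zones, years, months):
    """Return one (zone, year, month) tuple per monthly request."""
    return list(product(zones, years, months))

zones = ["DK1", "ES", "NO2"]
years = [str(y) for y in range(2015, 2025)]
months = [f"{m:02d}" for m in range(1, 13)]

jobs = build_jobs(zones, years, months)
print(len(jobs))   # 3 zones x 10 years x 12 months = 360 requests
print(jobs[0])     # ('DK1', '2015', '01')
```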

**Logic:**
1.  Check if the file already exists (to avoid re-downloading).
2.  If not, send a request to the Copernicus API.
3.  Save the result as a `.nc` (NetCDF) file.
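
One caveat with an existence check is that an interrupted download can leave a partial file behind, which would then be skipped on the next run. A possible refinement, sketched here under assumptions, is to download to a temporary name and rename only on success. The helper `fetch_if_missing` and the `.part` suffix are hypothetical; `fetch` stands for any callable that writes to the path it is given.

```python
import os
from pathlib import Path
from typing import Callable

def fetch_if_missing(target: Path, fetch: Callable[[str], None]) -> bool:
    """Download to `target` unless it already exists.

    Returns True if a download happened, False if the file was skipped.
    The data is written to '<name>.part' first, so `target` only ever
    appears once the download has completed.
    """
    if target.exists():
        return False

    tmp = target.with_name(target.name + ".part")
    try:
        fetch(str(tmp))          # e.g. a wrapper around the API request
        os.replace(tmp, target)  # rename within the same directory
    except Exception:
        tmp.unlink(missing_ok=True)  # remove the partial file (Python 3.8+)
        raise
    return True
```

With the CDS client from the setup cell, the `fetch` argument could be something like `lambda path: client.retrieve(dataset, request, path)`, where `dataset` and `request` are placeholders for the dataset name and the request dictionary.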

In [None]:
def download_era5_month(year, month, zone_name, zone_coords):
    """
    Downloads one month of ERA5 data for a specific zone.
    """
    # Define filename
    file_name = f"era5_{zone_name}_{year}_{month}.nc"
    file_path = output_dir / file_name
    
    # Skip if file already exists
    if file_path.exists():
        print(f"Skipping {file_name} (Already exists)")
        return

    print(f"Requesting {file_name}...")
    
    try:
        # API Request
        client.retrieve(
            'reanalysis-era5-single-levels',
            {
                'product_type': 'reanalysis',
                'format': 'netcdf',
                'variable': VARIABLES,
                'year': year,
                'month': month,
                # All calendar days (01-31) and all 24 hours of the day
                'day': [f"{day:02d}" for day in range(1, 32)],
                'time': [f"{hour:02d}:00" for hour in range(24)],
                'area': zone_coords,
            },
            str(file_path)
        )
        print(f"Success: {file_name}")
        
    except Exception as e:
        print(f"Failed to download {file_name}: {e}")

# --- Execution Loop ---
print("Starting download queue...")
print("Note: This process may take time depending on the CDS queue.")

for zone, coords in AREAS.items():
    print(f"\nProcessing Zone: {zone}")
    print("-" * 30)
    
    for year in YEARS:
        for month in MONTHS:
            download_era5_month(year, month, zone, coords)

print("\nDownload queue finished. Check the log above for any failed months.")