# open-source-marginal-emissions

## weather_data_retrieval

The purpose of this notebook is to provide an interactive way to retrieve data from either:
* CDS API (Copernicus Data Store) for ERA5 data
* Open-Meteo API
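For the CDS route, the sketch below shows the kind of request a one-month, hourly ERA5 download might send. `build_era5_request` is a hypothetical helper, and the request keys (`product_type`, `data_format`, and `area` ordered north, west, south, east) are assumptions modelled on the "Show API request" snippet on the CDS dataset pages; check them against the dataset page before relying on them.

```python
import calendar


def build_era5_request(variables, area_nwse, year, month, data_format="grib"):
    """Build a hypothetical request dict for one month of hourly ERA5 single-level data.

    area_nwse is (north, west, south, east) in degrees. Key names are assumptions
    and should be compared with the CDS "Show API request" output.
    """
    north, west, south, east = area_nwse
    n_days = calendar.monthrange(year, month)[1]  # handles leap years
    return {
        "product_type": ["reanalysis"],
        "variable": list(variables),
        "year": [str(year)],
        "month": [f"{month:02d}"],
        "day": [f"{d:02d}" for d in range(1, n_days + 1)],
        "time": [f"{h:02d}:00" for h in range(24)],
        "area": [north, west, south, east],  # CDS expects N, W, S, E
        "data_format": data_format,
    }

# Usage (requires CDS credentials, normally read from ~/.cdsapirc):
# import cdsapi
# request = build_era5_request(["2m_temperature"], (60, -10, 50, 2), 2020, 1)
# cdsapi.Client().retrieve("reanalysis-era5-single-levels", request, "era5_2020-01.grib")
```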

## Code

Actually, I will provide the modules in this order: CDS_ERA5, file_management, session management, prompts, validation, cli, orchestrator, main, and then logging.


I don't actually have the config loader built yet.

### Libraries

In [163]:
from __future__ import annotations

import argparse
import re
import os
import sys
import time
import json
import math
import hashlib
import getpass
import calendar
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path
from typing import List, Tuple, Optional, Iterable

import cdsapi
import requests
from tqdm import tqdm

# Optional MPI
try:
    from mpi4py import MPI  # type: ignore
    MPI_AVAILABLE = True
except Exception:
    MPI_AVAILABLE = False

from concurrent.futures import ThreadPoolExecutor, as_completed

### Paths

In [164]:
root_dir = os.path.abspath(os.path.join(os.getcwd(), ".."))
data_dir = os.path.join(root_dir, "data")
raw_data_dir = os.path.join(data_dir, "raw")
input_dir = os.path.join(root_dir, "input")

### Inputs

In [165]:
default_save_dir = raw_data_dir
default_input_dir = input_dir

### Main Code

#### Backend Execution Functions

##### Downloading

##### Other Helpers

In [219]:
import os

def print_detailed_directory_contents(path='.'):
    """Print detailed directory contents with file types."""
    for item in os.listdir(path):
        item_path = os.path.join(path, item)
        if os.path.isfile(item_path):
            print(f"📄 {item} (file)")
        elif os.path.isdir(item_path):
            print(f"📁 {item}/ (directory)")
        else:
            print(f"❓ {item} (other)")

# Usage
print_detailed_directory_contents("../weather_data_retrieval")

📄 .DS_Store (file)
📁 io/ (directory)
📁 utils/ (directory)
📄 orchestrator.py (file)
📁 sources/ (directory)
📄 structure.txt (file)
📄 main.py (file)


In [220]:
def print_directory_contents(path='.'):
    """Print all files and directories in the given path."""
    for item in os.listdir(path):
        print(item)

# Usage
print_directory_contents()  # Current directory
print_directory_contents('../weather_data_retrieval')

.DS_Store
weather_data_retrieval.ipynb
.DS_Store
io
utils
orchestrator.py
sources
structure.txt
main.py


In [227]:
import os

def print_directory_tree(path='.', indent=0):
    """Print directory tree structure recursively."""
    for item in os.listdir(path):
        item_path = os.path.join(path, item)
        if os.path.isdir(item_path):
            print("  " * indent + f"📁 {item}/")
            print_directory_tree(item_path, indent + 1)
        else:
            print("  " * indent + f"📄 {item}")

print_directory_tree(path='..')

📁 weather_data_retrieval/
  📄 runner.py
  📄 .DS_Store
  📁 io/
    📄 config_loader.py
    📄 __init__.py
    📁 __pycache__/
      📄 cli.cpython-311.pyc
      📄 __init__.cpython-311.pyc
    📄 prompts.py
    📄 cli.py
  📄 __init__.py
  📁 utils/
    📄 logging.py
    📄 session_management.py
    📄 __init__.py
    📁 __pycache__/
      📄 logging.cpython-311.pyc
      📄 data_validation.cpython-311.pyc
      📄 session_management.cpython-311.pyc
      📄 __init__.cpython-311.pyc
    📄 data_validation.py
    📄 file_management.py
  📁 __pycache__/
    📄 main.cpython-311.pyc
    📄 __init__.cpython-311.pyc
  📁 sources/
    📄 __init__.py
    📄 cds_era5.py
    📄 open_meteo.py
  📄 main.py
  📄 __main__.py
📄 .DS_Store
📄 touch
📁 input/
  📄 download_request.json
📄 pyproject.toml
📁 tests/
📄 README.md
📄 .gitignore
📄 core_concepts_and_definitions copy.md
📁 .git/
  📄 config
  📁 objects/
    📁 92/
      📄 bb78e95fa33ddb3cf06ebef92676d4ef828630
      📄 26f88addadda5ba0214d1bd3beb246f8396048
    📁 50/
      📄 08ddfcf5
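The recursive tree above descends into `.git/` and `__pycache__/`, which buries the project layout. A possible variant, sketched here as the hypothetical `directory_tree_lines`, lists such directories without descending into them, skips dot-files by default, and returns the lines instead of printing so the output can be reused or tested.

```python
from pathlib import Path


def directory_tree_lines(path=".", skip=(".git", "__pycache__"), show_hidden=False, _indent=0):
    """Return the directory tree under `path` as a list of indented lines.

    Directories named in `skip` are listed but not descended into, and names
    starting with "." are omitted unless show_hidden is True.
    """
    lines = []
    # Directories first, then files, each group sorted by name
    for item in sorted(Path(path).iterdir(), key=lambda p: (not p.is_dir(), p.name)):
        if item.name.startswith(".") and not show_hidden:
            continue
        prefix = "  " * _indent
        if item.is_dir():
            lines.append(f"{prefix}📁 {item.name}/")
            if item.name not in skip:
                lines.extend(directory_tree_lines(item, skip, show_hidden, _indent + 1))
        else:
            lines.append(f"{prefix}📄 {item.name}")
    return lines

# Usage:
# print("\n".join(directory_tree_lines("..")))
```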

#### Program Utilities

##### Core I/O

##### Connections (API & Internet)

##### File Sizes and Download Times

##### Printing and Logging

#### User Facing Functions

Interactive, prompts, and other functions that the user will directly interact with.

##### Main Runners and Orchestrators

##### Prompting Functions

##### Backend - Execution

#### Downloading

##### Utilities: Warnings, Systems Checks, Information

##### Main Function

In [216]:
def main():
    """
    Orchestrates full data retrieval process.
    """
    print("=" * 60)
    print("Welcome to the Weather Data Retrieval Tool")
    print("=" * 60)
    print("\nThis tool will guide you through downloading weather data from:\n - The Copernicus Climate Data Store (CDS) using the CDS API\n - The Open-Meteo API (not yet implemented)")
    print("\nYou will be prompted for information such as:\n - API credentials and connection details, which dataset to access,\n   desired variables, time range, and geographic area.")
    print("\nThe tool will help you estimate download sizes and times for your selections,\nrun downloads in parallel, and manage existing files.")
    print("\nFor more details on the CDS and Open-Meteo datasets and APIs, see:\n - https://cds.climate.copernicus.eu/\n - https://open-meteo.com/")
    print("\nYou may type 'exit' at any time to quit.\n" + "-" * 60 + "\n")

    session = SessionState()

    # --- Prompt wizard (all interactive steps) ---
    completed = run_prompt_wizard(session)
    if not completed:
        print("Exiting.")
        return

    # === Estimation & Confirmation ===
    print("\nRunning speed test (quick heuristic)...")
    speed_mbps = internet_speedtest(test_urls=None, max_seconds=10)

    estimates = estimate_cds_download(
        variables=session.get("variables"),
        area=session.get("region_bounds"),
        start_date=session.get("start_date"),
        end_date=session.get("end_date"),
        observed_speed_mbps=speed_mbps
    )
    parallel_conf = session.get("parallel_settings")
    if parallel_conf and parallel_conf.get("enabled"):
        efficiency_factor = 0.75  # less than ideal scaling
        est_parallel_time = estimates["total_time_min"] / (parallel_conf["max_concurrent"] * efficiency_factor)
        estimates["total_time_min"] = est_parallel_time

    summary = build_download_summary(session, estimates, speed_mbps)
    cont = prompt_continue_confirmation(summary)
    if cont in ("__EXIT__", "__BACK__"):
        if cont == "__BACK__":
            # Going back restarts the whole wizard: main() creates a fresh
            # SessionState, so every answer (not only the parallel settings) is re-prompted.
            return main()
        return
    if not cont:
        print("Download cancelled. Exiting.")
        return

    # === Begin downloads ===
    print("\nStarting ERA5 data retrieval...\n")
    start = datetime.strptime(session.get("start_date"), "%Y-%m-%d")
    end = datetime.strptime(session.get("end_date"), "%Y-%m-%d")

    successful_downloads, failed_downloads, skipped_downloads = [], [], []
    coord_str = format_coordinates_nwse(session.get('region_bounds'))
    hash_str = generate_filename_hash(
        session.get('dataset_short_name'),
        session.get('variables'),
        session.get('region_bounds')
    )
    filename_base = f"{session.get('dataset_short_name')}_{coord_str}_{hash_str}"

    orchestrate_cds_downloads(
        session=session,
        start=start,
        end=end,
        filename_base=filename_base,
        successful_downloads=successful_downloads,
        failed_downloads=failed_downloads,
        skipped_downloads=skipped_downloads,
    )
    print("\nAll downloads processed (see lists above).")
    print(f"   Successful downloads   : {len(successful_downloads)}")
    print(f"   Skipped downloads      : {len(skipped_downloads)}")
    print(f"   Failed downloads       : {len(failed_downloads)}")

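`main()` scales the sequential time estimate by dividing it by `max_concurrent * 0.75`. With a single worker this makes the estimate 1.33 times longer than the sequential one. The sketch below, a hypothetical `estimate_download_minutes`, floors the speed-up at 1. It also assumes that the observed speed is in megabits per second and the size in megabytes, which is an assumption because `estimate_cds_download` is not shown here.

```python
def estimate_download_minutes(total_mb, speed_mbps, max_concurrent=1, efficiency=0.75):
    """Rough wall-clock estimate in minutes for downloading total_mb megabytes.

    speed_mbps is assumed to be megabits per second, so megabytes are multiplied
    by 8. The parallel speed-up is max_concurrent * efficiency, floored at 1 so
    one worker is never estimated to be slower than a sequential run.
    """
    if speed_mbps <= 0:
        raise ValueError("speed_mbps must be positive")
    sequential_min = total_mb * 8 / speed_mbps / 60
    speedup = max(1.0, max_concurrent * efficiency)
    return sequential_min / speedup
```

For example, 600 MB at 80 Mbit/s is one minute sequentially. Four workers at 75% efficiency give a speed-up of 3, so the estimate is about 20 seconds.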

### Running Code

In [217]:
if __name__ == "__main__":
    try:
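        # parse_known_args() tolerates the "--f=<kernel>.json" argument Jupyter passes to the kernel
        args, _unknown = parse_args().parse_known_args()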
        if args.requirements:
            # Batch mode
            sys.exit(
                run_batch_from_config(
                    args.requirements,
                    assumed_mbps=args.assumed_mbps,
                    log_file=args.log_file,
                    quiet=args.quiet,
                )
            )
        else:
            # Interactive mode
            print("=" * 60)
            print("Welcome to the Weather Data Retrieval Tool")
            print("=" * 60)
            print("\nThis tool supports:")
            print(" - Interactive wizard (no args)")
            print(" - Non-interactive batch (pass requirements JSON file)\n")
            sys.exit(run_interactive(no_speedtest=args.no_speedtest, assumed_mbps=args.assumed_mbps))

    except KeyboardInterrupt:
        print("\nUser interrupted. Exiting.")
        sys.exit(130)


usage: ipykernel_launcher.py [-h] [--log-file LOG_FILE] [requirements]
ipykernel_launcher.py: error: unrecognized arguments: --f=/Users/Daniel/Library/Jupyter/runtime/kernel-v3e72097a7c1bb7c2606fa9d415762bacb619347db.json


SystemExit: 2

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
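The error above has two causes. First, `parse_args()` rejects the `--f=...` argument that Jupyter passes to the kernel. Second, the usage line lists only `--log-file` and `requirements`, even though the cell also reads `assumed_mbps`, `quiet`, and `no_speedtest`. A sketch of a parser with those options follows, using `parse_known_args()` so unknown arguments are returned instead of causing an exit. `build_parser` and the flag spellings are assumptions inferred from the attribute names.

```python
import argparse


def build_parser():
    """Hypothetical parser covering the attributes read in the cell above."""
    parser = argparse.ArgumentParser(description="Weather Data Retrieval Tool")
    parser.add_argument("requirements", nargs="?", default=None,
                        help="Path to a requirements JSON file (batch mode)")
    parser.add_argument("--log-file", default=None)
    parser.add_argument("--assumed-mbps", type=float, default=None)
    parser.add_argument("--quiet", action="store_true")
    parser.add_argument("--no-speedtest", action="store_true")
    return parser


# Inside Jupyter, the kernel's own "--f=..." argument ends up in `unknown`
args, unknown = build_parser().parse_known_args(["--f=/path/to/kernel.json"])
```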


In [None]:
def prompt_config_source(session: SessionState) -> str:
    """
    Ask user if they want to load config from file or enter manually.

    Parameters
    ----------
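    session : SessionState
        Current session state.

    Returns
    -------
    str
        "1" (load from file), "2" (enter manually), or the "__EXIT__" / "__BACK__" sentinel.
    """
    print("\nConfiguration Source Options:\n" + '-'*30)
    print("  1. Load configuration from file (download_request.json)")
    print("  2. Enter configuration manually")
    while True:
        raw = read_input("Configuration Source : Enter 1 to load from file or 2 to enter manually : ")
        # Check the navigation sentinels before normalising: "__EXIT__".lower()
        # is "__exit__" and would never match.
        if raw in ("__EXIT__", "__BACK__"):
            return raw
        raw = raw.strip().lower()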
        if raw in ("1", "2"):
            print(f"You selected option {raw}.")
            session.set("config_source", raw)
            return raw

        print("Invalid choice. Please enter 1 or 2.")

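Option 1 loads `download_request.json`, but the config loader is not built yet. The sketch below shows one way it could work. `load_download_request` is hypothetical, and treating the keys that `main()` reads through `session.get(...)` as required fields is an assumption about the file's layout.

```python
import json
from pathlib import Path

# Keys read via session.get(...) in main(); treating them as required is an assumption
REQUIRED_KEYS = ("dataset_short_name", "variables", "region_bounds", "start_date", "end_date")


def load_download_request(path):
    """Load a download request JSON file and check that the required keys exist."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(config, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"{path}: missing required keys: {', '.join(missing)}")
    return config

# Usage (hypothetical): copy each value into the session
# for key, value in load_download_request("../input/download_request.json").items():
#     session.set(key, value)
```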

In [None]:
main()

Welcome to the Weather Data Retrieval Tool

This tool will guide you through downloading weather data from
 - The Copernicus Climate Data Store (CDS) using the CDS API
 - The Open-Meteo API (not yet implemented)

You will be prompted to provide information such as:
 - API credentials and connection details, which dataset to access,
   desired variables, time range, and geographic area.

The tool will assist you in estimating download sizes and times based on your selections,
 handling parallel downloads, and managing existing files.


For more details on the CDS or Open-Meteo datasets and APIs, please visit their websites:
 - https://cds.climate.copernicus.eu/ 
 - https://open-meteo.com/

You may type 'exit' at any time to quit.
------------------------------------------------------------


Available Data providers:
	1. Copernicus Climate Data Store (CDS)
	2. Open-Meteo
You selected: CDS

Available datasets:
------------------------------
	1. ERA5-Land
	2. ERA5-World
You selected: ERA5

In [None]:
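#=== SUMMARY ===
# Needs n_files, successful_downloads, skipped_downloads, failed_downloads,
# overall_elapsed, save_dir and filename_base from a completed download run;
# run on its own, this cell raises NameError.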
print("\n📊 Download Summary")
print(f"   Total months requested : {n_files}")
print(f"   Successful downloads   : {len(successful_downloads)}")
print(f"   Skipped downloads      : {len(skipped_downloads)}")
print(f"   Failed downloads       : {len(failed_downloads)}")
print(f"   Total time elapsed     : {format_duration(overall_elapsed)}")

if failed_downloads:
    print("   Failed months:", ", ".join([f"{y}-{m}" for y, m in failed_downloads]))

print(f"Files Downloaded to {save_dir}:")
for year, month in successful_downloads:
    print(f"   - {filename_base}_{year}-{month}.grib")


📊 Download Summary


NameError: name 'n_files' is not defined
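Both `main()` and the summary cell name files as `<dataset>_<coordinates>_<hash>_<year>-<month>.grib` using `format_coordinates_nwse` and `generate_filename_hash`, whose bodies are not shown. The functions below are hypothetical versions, not the project's implementation. Sorting the variables makes the hash independent of selection order, so a repeated request maps to the same filename, and an existing file can then be detected and skipped.

```python
import hashlib
import json


def format_coordinates_nwse(bounds):
    """Hypothetical: turn (N, W, S, E) bounds into a string like 'N60_W-10_S50_E2'."""
    north, west, south, east = bounds
    return f"N{north:g}_W{west:g}_S{south:g}_E{east:g}"


def generate_filename_hash(dataset, variables, bounds, length=8):
    """Hypothetical: short hash of the request parameters, independent of variable order."""
    payload = json.dumps(
        {"dataset": dataset, "variables": sorted(variables), "bounds": list(bounds)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:length]


def monthly_filename(dataset, variables, bounds, year, month, ext="grib"):
    """Compose '<dataset>_<coords>_<hash>_<YYYY>-<MM>.<ext>' (zero-padded month is an assumption)."""
    coords = format_coordinates_nwse(bounds)
    digest = generate_filename_hash(dataset, variables, bounds)
    return f"{dataset}_{coords}_{digest}_{year}-{month:02d}.{ext}"
```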

What data source would you like to use?
* Open-Meteo
* CDS API (ERA5 datasets)
    * If using the CDS API, which dataset would you like to use?
        * ERA5-Land only
        * ERA5-World only
        * ERA5-Land where available, otherwise ERA5-World
        * ERA5-World where available, otherwise ERA5-Land

Options:
* `open-meteo`
* `era5-land`
* `era5-world`
* `era5-land-then-era5-world`
* `era5-world-then-era5-land`
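The option keys above could map to an ordered list of CDS datasets to try. In this hypothetical mapping, `reanalysis-era5-land` and `reanalysis-era5-single-levels` are the CDS catalogue names for ERA5-Land and ERA5 single levels. How "where available" is decided (for example, ERA5-Land covering land only) is not specified, so the sketch only fixes the try order.

```python
# Hypothetical mapping from option keys to CDS datasets, in the order they would be tried
DATASET_ORDER = {
    "era5-land": ["reanalysis-era5-land"],
    "era5-world": ["reanalysis-era5-single-levels"],
    "era5-land-then-era5-world": ["reanalysis-era5-land", "reanalysis-era5-single-levels"],
    "era5-world-then-era5-land": ["reanalysis-era5-single-levels", "reanalysis-era5-land"],
}


def datasets_for_option(option):
    """Return the CDS datasets to try for an option key ([] for open-meteo, which is not a CDS source)."""
    key = option.strip().lower()
    if key == "open-meteo":
        return []
    if key not in DATASET_ORDER:
        raise ValueError(f"Unknown data source option: {option!r}")
    return list(DATASET_ORDER[key])
```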

Time Period:
Over what time period would you like to retrieve data?
* Start date (YYYY-MM-DD):
* End date (YYYY-MM-DD):
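The summary reports "Total months requested" and names each file by year and month, which suggests that the period is split into one request per month. The helpers below are a hypothetical sketch of that split. They parse dates with the same `"%Y-%m-%d"` format `main()` uses, then list every (year, month) pair in the inclusive range.

```python
from datetime import datetime


def parse_date_range(start_str, end_str):
    """Parse YYYY-MM-DD strings and check that start <= end."""
    start = datetime.strptime(start_str, "%Y-%m-%d").date()
    end = datetime.strptime(end_str, "%Y-%m-%d").date()
    if start > end:
        raise ValueError(f"Start date {start} is after end date {end}")
    return start, end


def months_in_range(start, end):
    """List the (year, month) pairs covered by [start, end], inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        # Roll over to January of the next year after December
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months
```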

How would you like to specify the geographical area over which to retrieve data?
* Provide bounding box coordinates
* Select bounding box on map
* Provide Country Name
* Provide City and Country Name 

Over what geographical area would you like to retrieve data?
* North latitude (degrees):
* South latitude (degrees):
* East longitude (degrees):
* West longitude (degrees):
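The four bounds are entered as north, south, east, west, but the CDS `area` field expects north, west, south, east. The hypothetical `validate_bounds` below checks the ranges and returns the reordered list. It deliberately rejects boxes that cross the antimeridian (west > east), which is a simplifying assumption rather than a CDS rule.

```python
def validate_bounds(north, south, east, west):
    """Check a lat/lon bounding box and return it as [N, W, S, E].

    Boxes crossing the antimeridian (west > east) are rejected in this sketch.
    """
    if not -90 <= south < north <= 90:
        raise ValueError("Latitudes must satisfy -90 <= south < north <= 90")
    if not -180 <= west < east <= 180:
        raise ValueError("Longitudes must satisfy -180 <= west < east <= 180")
    return [north, west, south, east]
```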

* What temporal resolution would you like the data at?
    * 2-Hourly (aggregates)
    * Hourly (native)
    * Half-hourly (interpolated)
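For the non-native temporal resolutions, here is a minimal plain-Python sketch on a list of hourly values. Two-hourly aggregates average consecutive pairs. However, accumulated fields such as precipitation or radiation in J m**-2 should be summed instead of averaged, so the hypothetical `to_two_hourly` takes a `how` argument. Half-hourly values are linearly interpolated midpoints, which is only meaningful for instantaneous fields such as temperature or wind.

```python
def to_two_hourly(values, how="mean"):
    """Aggregate consecutive pairs of hourly values; a trailing odd value is dropped."""
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]
    if how == "mean":  # instantaneous fields, e.g. temperature
        return [(a + b) / 2 for a, b in pairs]
    if how == "sum":  # accumulated fields, e.g. precipitation, radiation
        return [a + b for a, b in pairs]
    raise ValueError("how must be 'mean' or 'sum'")


def to_half_hourly(values):
    """Insert a linearly interpolated midpoint between each pair of hourly values."""
    out = []
    for a, b in zip(values, values[1:]):
        out.extend([a, (a + b) / 2])
    if values:
        out.append(values[-1])
    return out
```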

* What geographic resolution would you like the data at? (Distances are approximate, at the equator.)
    * 0.1° x 0.1° (approx. 11 km x 11 km; native grid for ERA5-Land only)
    * 0.25° x 0.25° (approx. 28 km x 28 km)
    * 0.5° x 0.5° (approx. 55 km x 55 km)
    * 1.0° x 1.0° (approx. 111 km x 111 km)
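The kilometre figures come from roughly 111.32 km per degree. North-south spacing stays close to this value, but east-west spacing shrinks with the cosine of latitude. The grid-point count also feeds into download-size estimates. Both calculations are sketched below as hypothetical helpers. The point count assumes grid points lie on multiples of the resolution and include both edges of the box.

```python
import math

KM_PER_DEGREE = 111.32  # approx. one degree of latitude (and of longitude at the equator)


def grid_cell_km(resolution_deg, latitude_deg=0.0):
    """Approximate (north-south, east-west) size in km of a regular lat/lon grid cell."""
    ns_km = resolution_deg * KM_PER_DEGREE
    ew_km = ns_km * math.cos(math.radians(latitude_deg))
    return ns_km, ew_km


def grid_point_count(north, south, east, west, resolution_deg):
    """Number of grid points in a box, counting both edges (an assumption about the grid)."""
    # round() absorbs floating-point error such as 0.3 / 0.1 = 2.9999999999999996
    n_lat = math.floor(round((north - south) / resolution_deg, 9)) + 1
    n_lon = math.floor(round((east - west) / resolution_deg, 9)) + 1
    return n_lat * n_lon
```

For example, a box from 50°N to 60°N and 10°W to 2°E at 0.25° holds 41 × 49 = 2009 points. At 60°N, a 0.25° cell is only about 14 km wide.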

| Name | CDS variable name | Units |
|------|-------------------|-------|
| Surface pressure | surface_pressure | Pa |
| Total cloud cover | total_cloud_cover | 0-1 |
| 10 metre U wind component | 10m_u_component_of_wind | m/s |
| 10 metre V wind component | 10m_v_component_of_wind | m/s |
| 2 metre temperature | 2m_temperature | K |
| Low cloud cover | low_cloud_cover | 0-1 |
| Medium cloud cover | medium_cloud_cover | 0-1 |
| High cloud cover | high_cloud_cover | 0-1 |
| Instantaneous large-scale surface precipitation fraction | instantaneous_large_scale_surface_precipitation_fraction | 0-1 |
| 100 metre U wind component | 100m_u_component_of_wind | m/s |
| 100 metre V wind component | 100m_v_component_of_wind | m/s |
| Surface solar radiation downwards | surface_solar_radiation_downwards | J m**-2 |
| Surface thermal radiation downwards | surface_thermal_radiation_downwards | J m**-2 |
| Surface net solar radiation | surface_net_solar_radiation | J m**-2 |
| Top net solar radiation | top_net_solar_radiation | J m**-2 |
| Top net thermal radiation | top_net_thermal_radiation | J m**-2 |
| Top net solar radiation, clear sky | top_net_solar_radiation_clear_sky | J m**-2 |
| Top net thermal radiation, clear sky | top_net_thermal_radiation_clear_sky | J m**-2 |
| Surface net solar radiation, clear sky | surface_net_solar_radiation_clear_sky | J m**-2 |
| Surface net thermal radiation, clear sky | surface_net_thermal_radiation_clear_sky | J m**-2 |
| TOA incident solar radiation | toa_incident_solar_radiation | J m**-2 |
| Total sky direct solar radiation at surface | total_sky_direct_solar_radiation_at_surface | J m**-2 |
| Clear-sky direct solar radiation at surface | clear_sky_direct_solar_radiation_at_surface | J m**-2 |
| Surface solar radiation downward clear-sky | surface_solar_radiation_downward_clear_sky | J m**-2 |
| Surface thermal radiation downward clear-sky | surface_thermal_radiation_downward_clear_sky | J m**-2 |
| Large-scale precipitation | large_scale_precipitation | m (water equivalent) |
| Convective precipitation | convective_precipitation | m (water equivalent) |
| Total precipitation | total_precipitation | m (water equivalent) |
| Total column water | total_column_water | kg m**-2 |
| Fraction of cloud cover | fraction_of_cloud_cover | 0-1 |
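Several of these variables need converting before use. Our reading of ECMWF's ERA5 documentation is as follows. Hourly single-level accumulations cover the hour ending at the time stamp, so dividing J m**-2 by 3600 gives a mean flux in W m**-2 over that hour. ERA5-Land instead accumulates from 00 UTC, so consecutive steps must be differenced first. Precipitation is reported as a depth in metres of water equivalent. The helpers below are hypothetical and intended for hourly ERA5 single-level data.

```python
import math

SECONDS_PER_HOUR = 3600


def hourly_energy_to_mean_power(j_per_m2):
    """Hourly accumulated energy (J m**-2) to mean flux over the hour (W m**-2)."""
    return j_per_m2 / SECONDS_PER_HOUR


def precip_m_to_mm(metres):
    """Precipitation depth in metres of water equivalent to mm (1 mm = 1 kg m**-2)."""
    return metres * 1000.0


def kelvin_to_celsius(kelvin):
    """Temperature in K to degrees Celsius."""
    return kelvin - 273.15


def wind_speed(u, v):
    """Horizontal wind speed (m/s) from U and V components."""
    return math.hypot(u, v)
```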