# Streamflow Data Download

Station metadata for streamflow is downloaded from https://wateroffice.ec.gc.ca/station_metadata/station_characteristics_e.html using the filters `Province = Alberta`, `Parameter Type = Flows`, and `Regulation = Natural`, and saved as `data/raw/station_metadata.csv`.

Stations in the metadata file whose records span the full study period (`START_YEAR` to `END_YEAR`) are downloaded from HYDAT below and saved to `combined_streamflow.csv`.

In [1]:
import pandas as pd
from src.data_ingestion import fetch_streamflow_batch
from src.config import RAW_DATA_DIR

METADATA_PATH = RAW_DATA_DIR / "station_metadata.csv"
OUTPUT_FILENAME = "combined_streamflow.csv"
START_YEAR = 1980
END_YEAR = 2022

# Filter stations
metadata = pd.read_csv(METADATA_PATH)
filtered_stations = metadata[
    (metadata['Year From'] <= START_YEAR) & 
    (metadata['Year To'] >= END_YEAR)
]["Station Number"].tolist()

# download data
df_streamflow = fetch_streamflow_batch(filtered_stations, START_YEAR, END_YEAR, OUTPUT_FILENAME)

Downloading data for 181 stations...
Data downloaded and saved to combined_streamflow.csv
15706 days of data saved for 181 stations


## Monthly Glacier Mass Balance Reconstruction Data
Data was produced by Christina Draeger for the years 1980 to 2022 and can be accessed via: https://www.dropbox.com/scl/fo/yat0rxeoztpwol29qput2/AEtDmgySFbMEr3B9YcwLmks/kp_dp_alphabias_monthly_NN?dl=0&rlkey=4t3uobuuo8ufn5selgr5afoo4&subfolder_nav_tracking=1

The files are saved under `data/raw/mass_balance`

## Downloading Glacier Areas
Spatial information for the glaciers is downloaded from the [Randolph Glacier Inventory (RGI) version 6](https://daacdata.apps.nsidc.org/pub/DATASETS/nsidc0770_rgi_v6/). Only the Western Canada and US region (`nsidc0770_02.rgi60.WesternCanadaUS.zip`) is required. The files are saved under `data/raw/RGI-western-canada`.

## Downloading Drainage Areas
Water basin polygons can be downloaded from https://collaboration.cmc.ec.gc.ca/cmc/hydrometrics/www/HydrometricNetworkBasinPolygons/gpkg/. The major drainage areas (MDA) selected are:
* (5) Nelson River
* (7) Great Slave Lake

These MDAs were selected due to their proximity to the Eastern Rockies in Alberta. The files are saved under `data/raw/drainage_areas/`.
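
Water Survey of Canada station numbers begin with the two-digit code of their major drainage area (for example, `05AA004` lies in MDA 05), so region membership can be checked from the station ID alone. The following is a minimal sketch, assuming station IDs are plain strings. The helper name `stations_in_mdas` and the example IDs are illustrative only; this is not the project's `filter_stations_by_mda`.

```python
def stations_in_mdas(station_ids, mda_codes):
    """Return the station IDs whose prefix matches one of the MDA codes."""
    prefixes = tuple(mda_codes)  # str.startswith accepts a tuple of prefixes
    return [s for s in station_ids if s.startswith(prefixes)]


# Illustrative IDs only; the last one belongs to a different drainage area.
example_ids = ["05AA004", "07JC001", "08NA002"]
kept = stations_in_mdas(example_ids, ["05", "07"])
print(kept)
```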

## Preprocessing Streamflow Data and Computing Basin Attributes
Streamflow data is filtered to remove any station that has less than 50% of daily data available in any given year. Stations outside the selected MDAs are also removed.
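
One possible reading of the completeness rule is sketched below. It assumes a wide DataFrame with a daily `DatetimeIndex`, one column per station, and missing days stored as `NaN` (rather than as absent rows). The helper name `drop_incomplete_stations` is hypothetical and is not the project's `filter_stations_by_annual_completeness`, whose internals may differ.

```python
import numpy as np
import pandas as pd


def drop_incomplete_stations(df, max_missing_pct=50.0):
    """Keep stations whose missing share stays at or below the threshold in every year."""
    # Percent of missing daily values per (calendar year, station)
    missing_pct = df.isna().groupby(df.index.year).mean() * 100.0
    # A station is kept only if every year passes the threshold
    keep = (missing_pct <= max_missing_pct).all(axis=0)
    return df.loc[:, keep]


# Small synthetic example: station B is missing most of 2001
idx = pd.date_range("2000-01-01", "2001-12-31", freq="D")
demo = pd.DataFrame({"A": 1.0, "B": 1.0}, index=idx)
demo.loc["2001-01-01":"2001-09-30", "B"] = np.nan
cleaned = drop_incomplete_stations(demo)
print(list(cleaned.columns))
```

Because a single bad year removes the station, this rule is strict; a variant that tolerates a few incomplete years would replace `.all(axis=0)` with a count of passing years.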

The drainage area and percent glaciation of each remaining station's basin are computed and saved to `data/processed/static_attributes.csv`. The monthly mass balance data is also used to compute the monthly change in glacier volume, in million cubic meters, for each glacierized basin. This data is saved in `data/processed/glacier_volume_change.csv`.
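
The unit arithmetic behind these two attributes can be sketched as follows. This assumes the mass balance is expressed in meters of water equivalent and areas in square kilometers, which the source does not state explicitly; under that assumption the product is already in million cubic meters, because 1 m × 1 km² = 10⁶ m³. The function names are hypothetical and the numbers are made up for illustration.

```python
def percent_glaciation(glacier_area_km2, basin_area_km2):
    """Share of the basin area covered by glacier ice, in percent."""
    return 100.0 * glacier_area_km2 / basin_area_km2


def volume_change_mcm(mass_balance_m_we, glacier_area_km2):
    """Water-equivalent volume change in million cubic meters.

    ASSUMPTION: mass balance in m w.e., area in km^2.
    1 m * 1 km^2 = 1e6 m^3, so no further scaling is needed.
    """
    return mass_balance_m_we * glacier_area_km2


# Made-up values for illustration only
pct = percent_glaciation(glacier_area_km2=12.0, basin_area_km2=400.0)
dv = volume_change_mcm(mass_balance_m_we=-0.25, glacier_area_km2=12.0)
print(pct, dv)
```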

In [1]:
from src.processing import filter_stations_by_annual_completeness, filter_stations_by_mda
from src.spatial_utils import process_spatial_attributes
from src.config import RAW_DATA_DIR, DRAINAGE_FILES, GLACIER_SHP_PATH, MASS_BALANCE_PATH
import pandas as pd

# 1. Load raw data
print("--- Loading Streamflow ---")
df_raw = pd.read_csv(RAW_DATA_DIR / "combined_streamflow.csv", index_col="Date", parse_dates=True)

# 2. Filter: Completeness (<50% missing per year)
df_clean = filter_stations_by_annual_completeness(df_raw, max_missing_pct=50.0)

# 3. Filter: Region (MDA 05 and 07 only)
df_final = filter_stations_by_mda(df_clean, mda_codes=["05", "07"])

print(f"\nFinal Dataset: {df_final.shape[1]} stations ready for analysis.")

# 4. Run Spatial Analysis using the filtered station list
print("\n--- Running Spatial Analysis ---")
static_stats, vol_changes = process_spatial_attributes(
    basin_gpkg_paths=DRAINAGE_FILES,
    glacier_shp_path=GLACIER_SHP_PATH,
    mass_balance_path=MASS_BALANCE_PATH,
    stations_list=df_final.columns.tolist()
)

--- Loading Streamflow ---
Filtering at 50.0% annual threshold:
 - Keeping 142 stations.
 - Dropping 39 stations due to incomplete years.
Filtering by MDA codes ['05', '07']:
 - Keeping 132 stations.
 - Dropping 10 stations (wrong region).

Final Dataset: 132 stations ready for analysis.

--- Running Spatial Analysis ---
⏳ Loading and merging basin files...
✅ Processing 132 basins.
⏳ Loading glaciers and reprojecting...
⏳ Calculating glacier-basin intersections...


KeyboardInterrupt: 

## Downloading ERA5 Climate Data
This project uses the Copernicus Climate Data Store (CDS) to download ERA5 precipitation and temperature data. Follow these steps to configure your environment.

#### Create a CDS Account

1. Visit the [Climate Data Store (CDS) registration page](https://cds.climate.copernicus.eu/#!/home).

2. Create an account and log in.

#### Accept the Terms of Use
Important: You must manually accept the "Terms of Use" for every dataset you wish to download, or the API will return an error.

1. Go to the ERA5 daily statistics page.

2. Click the "Download Data" tab.

3. Scroll to the bottom and click Accept Terms (look for a "License" section).

4. Repeat this for the ERA5 reanalysis single levels.

#### Get your API Key
1. Go to your User Profile.

2. Scroll down to the section labeled API Key.

3. You will see a block of text that looks like this:

```
url: https://cds.climate.copernicus.eu/api
key: <PERSONAL-ACCESS-TOKEN>
```
#### Create the Configuration File (`.cdsapirc`)
The `cdsapi` library looks for a hidden file named `.cdsapirc` in your home directory to authenticate.

**For Windows Users:**

1. Open your User folder (e.g., C:\Users\YourName).

2. Create a new text file named `.cdsapirc` (note the leading dot).

* Tip: If Windows doesn't let you create a file starting with a dot, name it `.cdsapirc.` (with a dot at the end) and it will save correctly.

3. Paste the `url` and `key` lines from the "Get your API Key" section into this file.

**For Mac/Linux Users:**

1. Open your terminal.

2. Run the following command: `nano ~/.cdsapirc`

3. Paste your credentials:
```
url: https://cds.climate.copernicus.eu/api
key: <your-personal-access-token>
```
4. Save and exit (`Ctrl+O`, `Enter`, `Ctrl+X`).
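
Before starting a long download, it can help to confirm that the file exists and contains both entries. The sketch below assumes the file consists of simple `name: value` lines, as in the examples above. The helper names `read_cdsapirc` and `has_credentials` are hypothetical, and the demo writes a throwaway file with placeholder values rather than touching your real credentials.

```python
import tempfile
from pathlib import Path


def read_cdsapirc(path=None):
    """Parse 'name: value' lines from a .cdsapirc file into a dict."""
    path = Path(path) if path is not None else Path.home() / ".cdsapirc"
    settings = {}
    for line in path.read_text().splitlines():
        if ":" in line:
            # Split on the first colon only, since URLs contain colons too
            name, value = line.split(":", 1)
            settings[name.strip()] = value.strip()
    return settings


def has_credentials(settings):
    """True if both 'url' and 'key' are present and non-empty."""
    return bool(settings.get("url")) and bool(settings.get("key"))


# Demo on a temporary file with placeholder values
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / ".cdsapirc"
    demo_file.write_text("url: https://example.invalid/api\nkey: placeholder-token\n")
    demo_settings = read_cdsapirc(demo_file)

print(has_credentials(demo_settings))
```

Calling `read_cdsapirc()` with no argument reads the file in your home directory and raises `FileNotFoundError` if it has not been created yet.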

In [None]:
from src.data_ingestion import download_era5_precipitation, download_era5_temperature

# Study parameters
STUDY_YEARS = range(1980, 2023)

# Run downloads
download_era5_precipitation(STUDY_YEARS)
download_era5_temperature(STUDY_YEARS)

⏳ Downloading Precip: 1980-01 ...


2026-01-19 21:34:10,071 INFO Request ID is 8c34f45a-0807-4fd6-bd7c-45c76d6b5ed2
2026-01-19 21:34:10,259 INFO status has been updated to accepted
2026-01-19 21:34:24,381 INFO status has been updated to running
2026-01-19 21:35:26,893 INFO status has been updated to successful


a8dabf6fae54983dd4ba5abd678a0ca2.nc:   0%|          | 0.00/302k [00:00<?, ?B/s]

✔ Skipping 1980-02 (already exists)
✔ Skipping 1980-03 (already exists)
✔ Skipping 1980-04 (already exists)
✔ Skipping 1980-05 (already exists)
✔ Skipping 1980-06 (already exists)
✔ Skipping 1980-07 (already exists)
✔ Skipping 1980-08 (already exists)
✔ Skipping 1980-09 (already exists)
✔ Skipping 1980-10 (already exists)
✔ Skipping 1980-11 (already exists)
✔ Skipping 1980-12 (already exists)
✔ Skipping 1981-01 (already exists)
✔ Skipping 1981-02 (already exists)
✔ Skipping 1981-03 (already exists)
✔ Skipping 1981-04 (already exists)
✔ Skipping 1981-05 (already exists)
✔ Skipping 1981-06 (already exists)
✔ Skipping 1981-07 (already exists)
✔ Skipping 1981-08 (already exists)
✔ Skipping 1981-09 (already exists)
✔ Skipping 1981-10 (already exists)
✔ Skipping 1981-11 (already exists)
✔ Skipping 1981-12 (already exists)
✔ Skipping 1982-01 (already exists)
✔ Skipping 1982-02 (already exists)
✔ Skipping 1982-03 (already exists)
✔ Skipping

## Preprocess Climate Data
Compute daily basin-averaged statistics for each climate variable.
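
A common way to average a gridded field over a basin is a weighted mean, where each grid cell's weight is the fraction of the basin it covers. The sketch below illustrates that idea on a tiny made-up grid; the helper name `basin_mean` is hypothetical, and the project's `process_era5_basin_data` may compute its weights differently.

```python
import numpy as np


def basin_mean(field, weights):
    """Weighted mean of a 2-D field; the weights need not sum to 1."""
    field = np.asarray(field, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float((field * weights).sum() / weights.sum())


# Made-up 2x2 grid: the basin overlaps only the top row, half in each cell
grid = np.array([[1.0, 2.0], [3.0, 4.0]])
w = np.array([[0.5, 0.5], [0.0, 0.0]])
avg = basin_mean(grid, w)
print(avg)
```

Since the weights depend only on geometry, they can be computed once per basin and reused for every time step and every variable.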

In [2]:
from src.climate import process_era5_basin_data
from src.config import DRAINAGE_FILES

# Assumes you already ran the filtering steps and have 'df_final'
# df_final contains the columns of the stations we want

precip_df, temp_df = process_era5_basin_data(
    basin_gpkg_list=DRAINAGE_FILES,
    stations_list=df_final.columns.tolist()
)

print(precip_df.head())

Step 1/4: Loading Basins...
   Original CRS: PROJCS["Canada_Albers_Equal_Area_Conic",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0],UNIT["Degree",0.0174532925199433]],PROJECTION["Albers_Conic_Equal_Area"],PARAMETER["latitude_of_center",40],PARAMETER["longitude_of_center",-96],PARAMETER["standard_parallel_1",50],PARAMETER["standard_parallel_2",70],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
   🔄 Reprojecting basins to EPSG:4326 (Lat/Lon)...
Step 2/4: Mapping Spatial Weights...
⏳ Computing spatial weights for 132 basins...


Mapping Grid: 100%|██████████| 132/132 [00:14<00:00,  8.87it/s]



Step 3/4: Processing Precipitation...
--- Processing 516 Precipitation Files ---


Precip Files: 100%|██████████| 516/516 [01:17<00:00,  6.69it/s]



Step 4/4: Processing Temperature...
--- Processing 43 Temperature Files (Hourly) ---


Temp Files:  12%|█▏        | 5/43 [03:56<29:57, 47.30s/it]
Can't read index file 'C:\\Users\\tbwil\\Documents\\School\\MSc Geophysics\\Thesis Project\\data\\raw\\era5\\temperature\\era5_temp_1985.grib.5b7b6.idx'
Traceback (most recent call last):
  File "C:\Users\tbwil\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\cfgrib\messages.py", line 551, in from_indexpath_or_filestream
    self = cls.from_indexpath(indexpath)
  File "C:\Users\tbwil\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\cfgrib\messages.py", line 430, in from_indexpath
    index = pickle.load(file)
EOFError: Ran out of input
Temp Files: 100%|██████████| 43/43 [41:22<00:00, 57.73s/it]


✅ Climate processing complete.
             05AA004   05AA008   05AA022   05AA027   05AA028   05AB005  \
datetime                                                                 
1980-01-01  0.053606  0.232041  0.289149  0.063420  0.325748  0.000021   
1980-01-02  0.538679  1.546677  1.204349  1.178653  1.196000  0.529095   
1980-01-03  0.656272  1.329921  1.280606  0.723043  1.460468  0.587772   
1980-01-04  0.034958  0.198072  0.176594  0.077786  0.189435  0.003662   
1980-01-05  5.758264  4.711472  5.823813  4.281838  6.383781  4.125207   

             05AB029   05AD003   05AD035   05AE005  ...   07JC001   07JD002  \
datetime                                            ...                       
1980-01-01  0.000505  0.294787  0.000000  0.001700  ...  0.256983  0.291562   
1980-01-02  0.348530  0.871475  0.159057  0.290331  ...  0.458887  0.281282   
1980-01-03  0.575559  1.534850  0.745324  0.288815  ...  3.655144  3.306906   
1980-01-04  0.001656  0.146171  0.000000  0.002382  .