# Forecast: exceedance - Loire
***

**Author**: Chus Casado<br>
**Date**: 27-01-2023<br>

**Introduction**:<br>


**Questions**:<br>


**Tasks to do**:<br>

**Interesting links**<br>
[Pythonic way to perform statistics across multiple variables with Xarray](https://towardsdatascience.com/pythonic-way-to-perform-statistics-across-multiple-variables-with-xarray-d0221c78e34a)

In [1]:
import os
path_root = os.getcwd()
import glob
import numpy as np
import pandas as pd
import xarray as xr
from datetime import datetime, timedelta

import warnings
warnings.filterwarnings("ignore")

os.chdir('../py/')
from notifications import *
os.chdir(path_root)

## 1 Data

In [2]:
catchment = 'Loire'

### 1.1 Stations 

In [3]:
# load selected points
stations = pd.read_csv(f'results/{catchment}/points_selected.csv', index_col='station_id')
print('no. stations:\t{0}'.format(stations.shape[0]))

no. stations:	27


### 1.2 Discharge forecast

#### List available data

In [6]:
path_forecast = 'E:/casadje/Documents/skill_assessment/data/CDS/forecast/'

models = ['COS', 'DWD', 'EUD', 'EUE']

In [7]:
# list files
fore_files = {model: [] for model in models}
for year in [2020, 2021, 2022]:
    for month in range(1, 13):    
        # list files
        for model in models:
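            # sort: glob returns files in arbitrary order, and the first/last file define the period below
            fore_files[model] += sorted(glob.glob(f'{path_forecast}{model}/{year}/{month:02d}/*.nc'))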

# count files and check if all are available
n_files = pd.Series(data=[len(fore_files[model]) for model in models], index=models)

# forecast dates over the period covered by all models
start, end = datetime(1900, 1, 1), datetime(2100, 1, 1)
for model in models:
    st, en = [datetime.strptime(fore_files[model][step][-13:-3], '%Y%m%d%H') for step in [0, -1]]
    start = max(st, start)
    end = min(en, end)
dates = pd.date_range(start, end, freq='12h')

# find missing files
if any(n_files != len(dates)):
    missing = {}
    for model in models:
        filedates = [datetime.strptime(file[-13:-3], '%Y%m%d%H') for file in fore_files[model]]    
        missing[model] = [date for date in dates if date not in filedates]
    print('missing files:', missing)

# trim files to the period where all models are available
for model in models:
    fore_files[model] = [file for file in fore_files[model] if start <= datetime.strptime(file[-13:-3], '%Y%m%d%H') <= end]
    print('{0}:\t{1} files'.format(model, len(fore_files[model])))

missing files: {'COS': [], 'DWD': [], 'EUD': [], 'EUE': []}
COS:	730 files
DWD:	730 files
EUD:	730 files
EUE:	730 files
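
The cell above takes the forecast time from the last ten characters of each file name before the `.nc` extension (`file[-13:-3]`). The following is a minimal sketch of that convention and of the search for missing forecasts, using only the standard library. The helper names `forecast_time` and `missing_forecasts` and the example file names are hypothetical; only the `YYYYMMDDHH.nc` suffix and the 12-hour step are taken from the notebook.

```python
from datetime import datetime, timedelta


def forecast_time(path):
    """Parse the forecast time from a file name ending in 'YYYYMMDDHH.nc'."""
    return datetime.strptime(path[-13:-3], '%Y%m%d%H')


def missing_forecasts(paths, step=timedelta(hours=12)):
    """Return the expected forecast times that have no file in `paths`."""
    found = {forecast_time(p) for p in paths}
    start, end = min(found), max(found)
    expected = []
    t = start
    while t <= end:
        expected.append(t)
        t += step
    return [t for t in expected if t not in found]


# hypothetical file names; only the 'YYYYMMDDHH.nc' suffix matters here
files = [
    'COS/2020/10/dis_2020101400.nc',
    'COS/2020/10/dis_2020101412.nc',
    'COS/2020/10/dis_2020101512.nc',
]
gaps = missing_forecasts(files)
print(gaps)  # the forecast of 2020-10-15 00 UTC is missing
```

Using `min` and `max` on the parsed times makes the sketch independent of the order in which the files were listed.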


## 2 Analysis

### 2.1 Forecast data: exceedance probability

This section iterates over the stations. For each station, it loads all the available forecasts and computes, for each meteorological forcing, the probability of exceeding the station's discharge threshold (column `rl5` of the stations table). Lead times are limited to the first 40 steps (10 days). The result is one NetCDF file per station containing the exceedance probability; these files are used later in the skill assessment.
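
As a rough illustration of the quantity computed in this section, the sketch below derives an exceedance probability from a small ensemble with plain NumPy. It is not the implementation of `compute_exceedance` (imported from `notifications` and not shown here). The function name `exceedance_probability`, the discharge values and the threshold are all made up for the example.

```python
import numpy as np


def exceedance_probability(discharge, threshold, member_axis=0):
    """Fraction of ensemble members whose discharge exceeds the threshold."""
    exceeds = np.asarray(discharge) > threshold  # boolean array, True where exceeded
    return exceeds.mean(axis=member_axis)        # mean of booleans = probability


# hypothetical forecast: 4 members x 3 lead times (m3/s)
discharge = np.array([
    [100., 250., 400.],
    [120., 310., 380.],
    [ 90., 290., 420.],
    [110., 305., 290.],
])

prob = exceedance_probability(discharge, threshold=300.)
print(prob)  # probabilities 0.0, 0.5 and 0.75 for the three lead times
```

For a deterministic model there is no member dimension, so the exceedance is simply 0 or 1 at each lead time, which is why the loop below only averages when `member` is present.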

In [8]:
n_stations = stations.shape[0]
path = '../data/exceedance/forecast/'

for i, stn in enumerate(stations.index):    
    
    file = f'{stn:04d}.nc'
    if file in os.listdir(path):
        print(f'File {file} already exists')
        continue
            
    dct = {}
    for model in models:
        
        print(f'Station {stn:>4d} ({i+1:>4d} of {n_stations:>4d}) - {model}', end='\r')
        
        # compute exceedance of discharge threshold
        exceedance = compute_exceedance(fore_files[model], str(stn), stations.loc[stn, 'rl5'])
        # limit to 10 days leadtime
        if len(exceedance.leadtime) > 40:
            exceedance = exceedance.isel(leadtime=slice(0, 40))
        # if probabilistic, compute probability of exceedance
        if 'member' in exceedance.dims:
            dct[model] = exceedance.mean('member')
        else:
            dct[model] = exceedance
    
    # convert exceedance data into a Dataset
    ds = xr.Dataset(dct, coords={'forecast': dct['EUE'].forecast, 'leadtime': dct['EUE'].leadtime})

    # convert the Dataset into a single DataArray in which 'model' is a dimension rather than one variable per model
    da_list = [ds[model].expand_dims('model', axis=0).assign_coords(model=[model]).rename('exceedance') for model in models]
    da = xr.merge(da_list)['exceedance']
    
    # export as NetCDF
    da.to_netcdf(f'{path}{file}')

Station 2770 (  27 of   27) - EUE