## Metrics Calculations Tutorial

This notebook provides examples of how to carry out data metric calculations and analysis using the post_processing Python library. Be sure to go through the [Quick Start](https://nhs-postprocessing.readthedocs.io/en/stable/QuickStart.html) section of the [documentation](https://nhs-postprocessing.readthedocs.io/en/stable/index.html) for instructions on how to access and import the library and its packages.

If you would like to open an editable, runnable version of the tutorial, click [here](https://mybinder.org/v2/gh/UchechukwuUdenze/NHS_PostProcessing/main?%2FHEAD=&urlpath=%2Fdoc%2Ftree%2Fdocs%2Fsource%2Fnotebooks%2Ftutorial-metrics.ipynb) to be directed to a Binder platform.

<mark>The library is still under active development, and empty sections will be completed in due time.</mark>

### Table of contents
- [Available Metrics](#available-metrics)
- [Single Data Metrics](#single-data-metrics)
- [Comparison Metrics](#comparison-metrics)

All files are available in the GitHub repository [here](https://github.com/UchechukwuUdenze/NHS_PostProcessing/tree/main/docs/source/notebooks).

### Requirements

The conda environment contains all the libraries associated with the post-processing library. After setting up the conda environment, you only need to import the data and metrics modules from postprocessinglib.evaluation.

In [1]:
### Remove and modify these later.
import sys
import pandas as pd
sys.path.append("../../../")

In [2]:
from postprocessinglib.evaluation import data, metrics

Let's use one of the data blocks from the data manipulation tutorial.

In [3]:
# passing a controlled csv file for testing
path_output = "MESH_output_streamflow_2.csv"
path_input = "Station_data.xlsx"

DATAFRAMES = data.generate_dataframes(csv_fpaths=path_output, warm_up=91)

Stations = pd.read_excel(io=path_input)

# Positions of the stations flagged with 'X' in the 'Properties' column
ignore = []
for i in range(0, len(Stations)):
    if Stations['Properties'][i] == 'X':
        ignore.append(i)

Stations = Stations.drop(Stations[Stations['Properties'] == 'X'].index)
Stations = Stations.set_index('Station Number')

# Drop the flagged stations from every dataframe, starting from the last one
# so that the positions of the remaining columns do not shift
for i in reversed(ignore):
    DATAFRAMES["DF_OBSERVED"] = DATAFRAMES["DF_OBSERVED"].drop(columns=DATAFRAMES["DF_OBSERVED"].columns[i])
    DATAFRAMES["DF_SIMULATED"] = DATAFRAMES["DF_SIMULATED"].drop(columns=DATAFRAMES["DF_SIMULATED"].columns[i])
    for key, dataframe in DATAFRAMES.items():
        if key not in ("DF_SIMULATED", "DF_OBSERVED"):
            # These dataframes are assumed to hold two columns per station
            DATAFRAMES[key] = dataframe.drop(columns=dataframe.columns[[2*i, 2*i+1]])

# for key, value in DATAFRAMES.items():
#     print(f"{key}:\n{value.head()}")

The start date for the Data is 1982-01-01


Now that we have our data, let's jump right in!

### Available Metrics

Because the library is in active development, features are regularly added and removed. As a rule of thumb, it is therefore a good idea to check what it can do at the time of use. We can do this as follows:

In [4]:
metrics.available_metrics()

['MSE - Mean Square Error',
 'RMSE - Roor Mean Square Error',
 'MAE - Mean Average Error',
 'NSE - Nash-Sutcliffe Efficiency ',
 'NegNSE - Nash-Sutcliffe Efficiency * -1',
 'LogNSE - Log of Nash-Sutcliffe Efficiency',
 'NegLogNSE - Log of Nash-Sutcliffe Efficiency * -1',
 'KGE - Kling-Gupta Efficiency',
 'NegKGE - Kling-Gupta Efficiency * -1',
 'KGE 2012 - Kling-Gupta Efficiency modified as of 2012',
 'BIAS- Prcentage Bias',
 'AbsBIAS - Absolute Value of the Percentage Bias',
 'TTP - Time to Peak',
 'TTCoM - Time to Centre of Mass',
 'SPOD - Spring Pulse ONset Delay',
 'FDC Slope - Slope of the Flow Duration Curve']

### Single Data Metrics
These metrics apply to only one dataset at a time, either the simulated or the observed data. They are less about comparison and more about describing the data: rather than measuring model performance, they reveal trends and behaviours at a particular station. The library currently has four of them:

- [Time to Peak](#time-to-peak)
- [Time to Centre of Mass](#time-to-centre-of-mass)
- [Spring Pulse Onset Delay](#spring-pulse-onset-delay)
- [Slope of the Flow Duration Curve](#flow-duration-curve-slope)

#### Time to Peak
This shows the average day of the year on which the highest streamflow occurs. An example is shown below:

In [5]:
# Time to peak for the simulated data
print(metrics.time_to_peak(df=DATAFRAMES['DF_SIMULATED']))

# Time to peak for the observed data
print(metrics.time_to_peak(df=DATAFRAMES['DF_OBSERVED']))

              ttp
Station          
Station 1   170.0
Station 2   177.0
Station 3   176.0
Station 4   168.0
Station 6   171.0
Station 8   175.0
Station 9   166.0
Station 10  156.0
Station 11  156.0
Station 12  170.0
Station 13  179.0
Station 14  162.0
Station 16  171.0
Station 17  171.0
Station 18  168.0
Station 20  175.0
Station 21  170.0
Station 22  190.0
Station 23  187.0
Station 24  184.0
Station 27  187.0
Station 28  174.0
Station 29  173.0
Station 30  214.0
Station 32  176.0
Station 33  184.0
Station 34  149.0
Station 35  148.0
Station 36  155.0
Station 37  186.0
Station 39  141.0
Station 40  143.0
Station 41  154.0
Station 42  171.0
Station 46  177.0
Station 47  170.0
Station 48  172.0
Station 52  178.0
Station 53  147.0
Station 54  155.0
              ttp
Station          
Station 1   157.0
Station 2   157.0
Station 3   158.0
Station 4   159.0
Station 6   160.0
Station 8   172.0
Station 9   175.0
Station 10  166.0
Station 11  165.0
Station 12  173.0
Station 13  189.0
Station 14

As you can see, at the first station the highest simulated streamflow usually occurs, on average over the years, around day 170, somewhere in the third week of June. At the second station it usually occurs around day 177, in the final week of June.
This lets you observe trends in the data at specific stations.
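A minimal plain-Python sketch of the idea follows, assuming the metric is the mean, over all years, of the day of year on which each year's maximum daily flow occurs. It illustrates that assumption and is not the library's implementation; `time_to_peak_sketch` and `doy_to_date` are hypothetical helpers. The second helper converts a day-of-year value into a calendar date (day 1 = 1 January of a non-leap year), which is how readings such as "day 170 falls in the third week of June" are obtained.

```python
from collections import defaultdict
from datetime import date, timedelta


def time_to_peak_sketch(flows):
    """Mean day of year (1 = 1 January) on which each year's peak flow occurs.

    `flows` maps datetime.date -> daily streamflow.
    Hypothetical helper, not part of postprocessinglib.
    """
    by_year = defaultdict(list)
    for day, q in flows.items():
        by_year[day.year].append((day, q))

    peak_days = []
    for records in by_year.values():
        # Date of the largest flow in this year (the first one wins on ties)
        peak_day, _ = max(records, key=lambda rec: rec[1])
        peak_days.append(peak_day.timetuple().tm_yday)
    return sum(peak_days) / len(peak_days)


def doy_to_date(doy, year=1982):
    """Calendar date for a day-of-year value (1982 is a non-leap year)."""
    return date(year, 1, 1) + timedelta(days=round(doy) - 1)


# Two synthetic years whose peaks fall on day 160 and day 180
flows = {}
for year, peak_doy in [(1982, 160), (1983, 180)]:
    for d in range(365):
        day = date(year, 1, 1) + timedelta(days=d)
        flows[day] = 100.0 if d + 1 == peak_doy else 10.0

print(time_to_peak_sketch(flows))  # 170.0
print(doy_to_date(170))            # 1982-06-19
```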

#### Time to Centre of Mass
This shows the average day of the year by which 50% of the year's total streamflow volume has passed. An example is shown below:

In [6]:
# Time to centre of mass for the simulated data
print(metrics.time_to_centre_of_mass(df=DATAFRAMES['DF_SIMULATED']))

# Time to centre of mass for the observed data
print(metrics.time_to_centre_of_mass(df=DATAFRAMES['DF_OBSERVED']))

            ttcom
Station          
Station 1   184.0
Station 2   166.0
Station 3   188.0
Station 4   182.0
Station 6   182.0
Station 8   183.0
Station 9   190.0
Station 10  183.0
Station 11  180.0
Station 12  158.0
Station 13  175.0
Station 14  185.0
Station 16  169.0
Station 17  181.0
Station 18  182.0
Station 20  181.0
Station 21  187.0
Station 22  175.0
Station 23  190.0
Station 24  190.0
Station 27  192.0
Station 28  187.0
Station 29  193.0
Station 30  205.0
Station 32  189.0
Station 33  190.0
Station 34  152.0
Station 35  147.0
Station 36  161.0
Station 37  189.0
Station 39  155.0
Station 40  160.0
Station 41  167.0
Station 42  169.0
Station 46  187.0
Station 47  171.0
Station 48  190.0
Station 52  187.0
Station 53  156.0
Station 54  183.0
            ttcom
Station          
Station 1     0.0
Station 2     0.0
Station 3     0.0
Station 4   178.0
Station 6     0.0
Station 8   178.0
Station 9     0.0
Station 10  194.0
Station 11  186.0
Station 12  172.0
Station 13    0.0
Station 14

As you can see, at the fourth station, on average over the years, 50% of the total annual streamflow volume has usually passed by day 178, in the final week of June, and at the twentieth station by day 179, right at the end of June.
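The half-volume reading above can be sketched in plain Python as follows, assuming the metric is the mean, over all years, of the first day of year on which the cumulative flow reaches half of that year's total. Some hydrology studies instead define centre-of-mass timing as the flow-weighted mean day of year, so the library may differ; `time_to_centre_of_mass_sketch` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date, timedelta


def time_to_centre_of_mass_sketch(flows):
    """Mean day of year by which half of each year's total flow volume has passed.

    `flows` maps datetime.date -> daily streamflow.
    Hypothetical helper, not part of postprocessinglib.
    """
    by_year = defaultdict(list)
    for day, q in sorted(flows.items()):  # chronological order
        by_year[day.year].append((day, q))

    half_days = []
    for records in by_year.values():
        total = sum(q for _, q in records)
        running = 0.0
        for day, q in records:
            running += q
            if running >= 0.5 * total:  # half of the annual volume reached
                half_days.append(day.timetuple().tm_yday)
                break
    return sum(half_days) / len(half_days)


# One synthetic year: 10 units per day, except a 1000-unit flood on day 200
flows = {}
for d in range(365):
    day = date(1982, 1, 1) + timedelta(days=d)
    flows[day] = 1000.0 if d + 1 == 200 else 10.0

print(time_to_centre_of_mass_sketch(flows))  # 200.0
```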

#### Spring Pulse Onset Delay
This estimates the day of the year on which snowmelt starts (the onset of the spring pulse). An example is shown below:

In [7]:
# Spring pulse onset for the simulated data
print(metrics.SpringPulseOnset(df=DATAFRAMES['DF_SIMULATED']))

# Spring pulse onset for the observed data
print(metrics.SpringPulseOnset(df=DATAFRAMES['DF_OBSERVED']))

             spod
Station          
Station 1   127.0
Station 2   116.0
Station 3   126.0
Station 4   119.0
Station 6   121.0
Station 8   124.0
Station 9   139.0
Station 10  126.0
Station 11  125.0
Station 12  297.0
Station 13  114.0
Station 14  134.0
Station 16  109.0
Station 17  123.0
Station 18  126.0
Station 20  132.0
Station 21  128.0
Station 22  124.0
Station 23  144.0
Station 24  146.0
Station 27  145.0
Station 28  122.0
Station 29  119.0
Station 30  199.0
Station 32  136.0
Station 33  120.0
Station 34  108.0
Station 35  107.0
Station 36  105.0
Station 37  116.0
Station 39  112.0
Station 40  108.0
Station 41  113.0
Station 42  107.0
Station 46  115.0
Station 47  126.0
Station 48  205.0
Station 52  135.0
Station 53  103.0
Station 54  110.0
             spod
Station          
Station 1   109.0
Station 2    73.4
Station 3    96.7
Station 4   114.0
Station 6   113.0
Station 8   114.0
Station 9   142.0
Station 10  136.0
Station 11  137.0
Station 12  297.0
Station 13  169.0
Station 14

This shows that at the first station, on average over the years, snowmelt is simulated to begin 127 days into the year, in the first week of May. At the third station, snowmelt is simulated to begin 126 days into the year, also in the first week of May.
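A minimal sketch for a single year of daily flows follows. It assumes a definition common in snowmelt-timing studies: the onset is the day on which the cumulative departure of daily flow from its mean is most negative, after which flows run mostly above the mean. The library may use a different analysis window or averaging, so `spring_pulse_onset_sketch` is a hypothetical illustration; a per-station value would average the result over years.

```python
def spring_pulse_onset_sketch(daily_flows):
    """Day of year (1-based) of the spring pulse onset for one year of flows.

    Assumed definition: the day on which the cumulative departure of flow
    from the mean is most negative. Hypothetical helper, not the library's.
    """
    mean_q = sum(daily_flows) / len(daily_flows)
    cumulative = 0.0
    lowest, onset = float("inf"), None
    for doy, q in enumerate(daily_flows, start=1):
        cumulative += q - mean_q  # running departure from the mean flow
        if cumulative < lowest:
            lowest, onset = cumulative, doy
    return onset


# Low winter flow, a 60-day snowmelt pulse starting on day 121, then low flow again
flows = [10.0] * 120 + [50.0] * 60 + [10.0] * 185
print(spring_pulse_onset_sketch(flows))  # 120: flows rise above the mean from day 121
```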

#### Flow Duration Curve Slope
This calculates the slope of the flow duration curve (FDC). An example is shown below:

In [8]:
# Slope of the flow duration curve for the simulated data
print(metrics.slope_fdc(df=DATAFRAMES['DF_SIMULATED']))

# You can also specify the percentiles between which the slope is calculated
print(metrics.slope_fdc(df=DATAFRAMES['DF_OBSERVED'], percentiles=(25, 77)))

            fdc_Slope
Station 1      3.1504
Station 2      2.3760
Station 3      2.1143
Station 4      4.2863
Station 5      3.7256
Station 6      2.7546
Station 7      8.0229
Station 8      7.0940
Station 9      1.5833
Station 10     4.0913
Station 11     6.9928
Station 12     2.3851
Station 13     5.7800
Station 14     2.7093
Station 15     2.6374
Station 16     2.3301
Station 17     5.6500
Station 18     5.6270
Station 19     1.1424
Station 20     1.3281
Station 21     1.5908
Station 22     6.1167
Station 23     0.7576
Station 24     0.7100
Station 25     6.3471
Station 26     1.0788
Station 27     4.4442
Station 28     3.1001
Station 29     5.1716
Station 30     1.2852
Station 31     5.2684
Station 32     4.5245
Station 33     3.5924
Station 34     3.9921
Station 35     1.7148
Station 36     5.9512
Station 37     1.3825
Station 38     0.6681
Station 39     5.2883
Station 40     1.1330
            fdc_Slope
Station 1      3.1056
Station 2      2.8474
Station 3      1.9309
Station 4 
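One common definition of the FDC slope is the slope of the log-transformed curve between two exceedance percentiles, $(\ln Q_{p_1} - \ln Q_{p_2}) / ((p_2 - p_1)/100)$, where $Q_p$ is the flow exceeded $p\%$ of the time; steeper slopes indicate a more variable (flashier) flow regime. The sketch below follows that definition with a default of `(33, 66)`, which is an assumption about the library's defaults. `slope_fdc_sketch` and `flow_at_exceedance` are hypothetical helpers, and zero flows would need special handling before taking logs.

```python
import math


def flow_at_exceedance(flows, pct):
    """Flow exceeded `pct` percent of the time, by linear interpolation."""
    ranked = sorted(flows, reverse=True)   # flow duration curve: highest to lowest
    pos = pct / 100 * (len(ranked) - 1)    # fractional rank position
    lo = math.floor(pos)
    hi = min(lo + 1, len(ranked) - 1)
    return ranked[lo] + (ranked[hi] - ranked[lo]) * (pos - lo)


def slope_fdc_sketch(flows, percentiles=(33, 66)):
    """Slope of the log-transformed FDC between two exceedance percentiles.

    Hypothetical helper; the (33, 66) default is an assumption.
    """
    p1, p2 = percentiles
    q1 = flow_at_exceedance(flows, p1)
    q2 = flow_at_exceedance(flows, p2)
    return (math.log(q1) - math.log(q2)) / ((p2 - p1) / 100)


flows = [float(v) for v in range(1, 102)]   # 101 synthetic daily flows
print(slope_fdc_sketch(flows))              # uses Q33 = 68 and Q66 = 35
print(slope_fdc_sketch(flows, percentiles=(25, 77)))
```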

### Comparison Metrics

These metrics compare the simulated data against the observed data, showing how accurately the model predicts streamflow. All the other metrics are comparison metrics. They are listed below:

- [Mean Square Error](#mean-square-error)
- [Root Mean Square Error](#root-mean-square-error)
- [Mean Average Error](#mean-average-error)
- [Nash-Sutcliffe Efficiency](#nash-sutcliffe-efficiency)
- [Kling-Gupta Efficiency](#kling-gupta-efficiency)
- [Percentage Bias](#percentage-bias)

#### Mean Square Error


In [9]:
# Mean square error for the data we were given
print(metrics.mse(observed=DATAFRAMES['DF_OBSERVED'], simulated=DATAFRAMES['DF_SIMULATED']))

                model1
Station 1     1304.000
Station 2      801.200
Station 3       17.440
Station 4     5539.000
Station 5     4951.000
Station 6    14250.000
Station 7       85.480
Station 8      577.500
Station 9     1838.000
Station 10      45.260
Station 11      87.610
Station 12    1898.000
Station 13     535.500
Station 14    4722.000
Station 15    6484.000
Station 16    4763.000
Station 17     621.800
Station 18     125.100
Station 19    1415.000
Station 20    2282.000
Station 21    3487.000
Station 22     798.200
Station 23   11150.000
Station 24    1876.000
Station 25    1287.000
Station 26   13730.000
Station 27      54.800
Station 28       8.375
Station 29      41.190
Station 30   17410.000
Station 31      19.850
Station 32      89.690
Station 33     166.600
Station 34     170.600
Station 35   20190.000
Station 36      36.270
Station 37   21630.000
Station 38   68600.000
Station 39    2450.000
Station 40  139600.000
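The mean square error is $\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}(S_t - O_t)^2$, with $S_t$ the simulated and $O_t$ the observed flow at time step $t$. It is in squared flow units, which is why stations with large flows show large values. A minimal sketch follows; it assumes that time steps with a missing value (None or NaN) in either series are skipped, which may not match how the library treats gaps. `mse_sketch` and `valid_pairs` are hypothetical helpers.

```python
import math


def valid_pairs(observed, simulated):
    """Keep only time steps where both series have a value (None/NaN dropped)."""
    return [
        (o, s)
        for o, s in zip(observed, simulated)
        if o is not None and s is not None and not math.isnan(o) and not math.isnan(s)
    ]


def mse_sketch(observed, simulated):
    """Mean square error: the average of (simulated - observed) ** 2."""
    pairs = valid_pairs(observed, simulated)
    return sum((s - o) ** 2 for o, s in pairs) / len(pairs)


# The last time step has no observation, so only three steps are used
print(mse_sketch([1.0, 2.0, 3.0, float("nan")], [2.0, 2.0, 5.0, 7.0]))  # 5/3
```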


#### Root Mean Square Error

In [10]:
# Root Mean square error for the data we were given
print(metrics.rmse(observed=DATAFRAMES['DF_OBSERVED'], simulated=DATAFRAMES['DF_SIMULATED']))

             model1
Station 1    36.110
Station 2    28.310
Station 3     4.177
Station 4    74.420
Station 5    70.360
Station 6   119.400
Station 7     9.246
Station 8    24.030
Station 9    42.870
Station 10    6.728
Station 11    9.360
Station 12   43.570
Station 13   23.140
Station 14   68.720
Station 15   80.530
Station 16   69.010
Station 17   24.940
Station 18   11.190
Station 19   37.610
Station 20   47.770
Station 21   59.050
Station 22   28.250
Station 23  105.600
Station 24   43.320
Station 25   35.870
Station 26  117.200
Station 27    7.403
Station 28    2.894
Station 29    6.418
Station 30  131.900
Station 31    4.456
Station 32    9.471
Station 33   12.910
Station 34   13.060
Station 35  142.100
Station 36    6.023
Station 37  147.100
Station 38  261.900
Station 39   49.500
Station 40  373.600
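The RMSE is the square root of the MSE, which brings the error back into the same units as the flows. The two tables agree on this relationship: for Station 1, $\sqrt{1304} \approx 36.11$. A minimal sketch with a hypothetical `rmse_sketch`:

```python
import math


def rmse_sketch(observed, simulated):
    """Root mean square error: square root of the mean squared difference."""
    squared = [(s - o) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))


print(rmse_sketch([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))  # sqrt(5/3), about 1.291
print(round(math.sqrt(1304.0), 2))                     # 36.11, matching Station 1
```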


#### Mean Average Error

In [11]:
# Mean Average error for the data we were given
print(metrics.mae(observed=DATAFRAMES['DF_OBSERVED'], simulated=DATAFRAMES['DF_SIMULATED']))

               model1
Station 1    210400.0
Station 2     29700.0
Station 3     15570.0
Station 4    492800.0
Station 5    450700.0
Station 6    941100.0
Station 7     50290.0
Station 8    228500.0
Station 9    467100.0
Station 10    58020.0
Station 11    50720.0
Station 12   463000.0
Station 13   134400.0
Station 14   355200.0
Station 15   415100.0
Station 16   584800.0
Station 17   151100.0
Station 18    51360.0
Station 19   241900.0
Station 20   290000.0
Station 21   346600.0
Station 22   156500.0
Station 23   468600.0
Station 24   423300.0
Station 25   163000.0
Station 26   874300.0
Station 27    27240.0
Station 28     8091.0
Station 29    14870.0
Station 30   943500.0
Station 31    19570.0
Station 32    36810.0
Station 33    15320.0
Station 34    63130.0
Station 35   994400.0
Station 36    20120.0
Station 37   994500.0
Station 38  2110000.0
Station 39   194800.0
Station 40  2793000.0
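The conventional mean absolute error is $\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}|S_t - O_t|$, in the same units as the flows. For any data, this MAE can never exceed the RMSE. The values above (for example 210400.0 for Station 1, against an RMSE of 36.11) therefore suggest that the library's `mae` returns a different quantity from the conventional mean absolute error, so it is worth checking the API documentation before interpreting these numbers. The sketch below computes the conventional definition with a hypothetical `mean_absolute_error_sketch` and checks the MAE ≤ RMSE property on toy data.

```python
import math


def mean_absolute_error_sketch(observed, simulated):
    """Conventional mean absolute error: the average of |simulated - observed|."""
    diffs = [abs(s - o) for o, s in zip(observed, simulated)]
    return sum(diffs) / len(diffs)


obs = [1.0, 2.0, 3.0, 10.0]
sim = [2.0, 2.0, 5.0, 4.0]
mae_value = mean_absolute_error_sketch(obs, sim)
rmse_value = math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))
print(mae_value, rmse_value)  # 2.25 and about 3.20: MAE never exceeds RMSE
```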


#### Nash-Sutcliffe Efficiency

In [12]:
# Nash-Sutcliffe Efficiency for the data we were given
print(metrics.nse(observed=DATAFRAMES['DF_OBSERVED'], simulated=DATAFRAMES['DF_SIMULATED']))

              model1
Station 1   0.515000
Station 2  -1.746000
Station 3  -2.021000
Station 4   0.616500
Station 5   0.658800
Station 6   0.658800
Station 7   0.366200
Station 8   0.663000
Station 9   0.331100
Station 10 -0.392200
Station 11 -2.092000
Station 12  0.463900
Station 13  0.703700
Station 14  0.690500
Station 15  0.579000
Station 16  0.514400
Station 17  0.401900
Station 18 -0.175000
Station 19  0.588000
Station 20  0.482900
Station 21  0.329100
Station 22 -0.931100
Station 23 -0.763800
Station 24 -1.619000
Station 25  0.186000
Station 26  0.396100
Station 27 -0.881000
Station 28 -0.016910
Station 29 -1.577000
Station 30  0.332200
Station 31  0.365600
Station 32 -0.009723
Station 33  0.215400
Station 34 -0.204100
Station 35  0.369400
Station 36 -0.252500
Station 37  0.191100
Station 38  0.162100
Station 39 -5.021000
Station 40 -0.110300
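The Nash-Sutcliffe Efficiency is $\mathrm{NSE} = 1 - \sum_t (S_t - O_t)^2 / \sum_t (O_t - \bar{O})^2$, where $\bar{O}$ is the mean observed flow. A value of 1 is a perfect match, 0 means the simulation is no better than using the observed mean at every time step, and negative values (such as -5.021 at Station 39 above) mean it is worse than that. The NegNSE metric listed earlier is this value multiplied by -1, presumably for use with optimisers that minimise their objective. A minimal sketch of the standard formula, with a hypothetical `nse_sketch` (missing-value handling is not shown):

```python
def nse_sketch(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 is perfect, 0 equals the observed-mean benchmark."""
    mean_obs = sum(observed) / len(observed)
    error = sum((s - o) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - error / variance


obs = [1.0, 2.0, 3.0, 4.0]
print(nse_sketch(obs, obs))         # 1.0: perfect simulation
print(nse_sketch(obs, [2.5] * 4))   # 0.0: no better than the observed mean
```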


##### Logarithmic Nash-Sutcliffe Efficiency

This is the Nash-Sutcliffe Efficiency computed on log-transformed flows.

In [13]:
# Nash-Sutcliffe Efficiency of the log-transformed flows for the data we were given
print(metrics.lognse(observed=DATAFRAMES['DF_OBSERVED'], simulated=DATAFRAMES['DF_SIMULATED']))

              model1
Station 1   -0.29320
Station 2   -0.19990
Station 3   -0.03280
Station 4    0.12760
Station 5    0.16830
Station 6   -0.16750
Station 7   -0.47860
Station 8   -1.66000
Station 9   -1.87100
Station 10   0.03629
Station 11 -24.23000
Station 12  -2.01400
Station 13  -1.09600
Station 14  -0.11540
Station 15   0.03147
Station 16  -0.76310
Station 17  -0.01762
Station 18  -1.82600
Station 19  -0.02795
Station 20   0.05579
Station 21   0.21040
Station 22  -1.47800
Station 23  -1.82200
Station 24  -7.55300
Station 25  -0.83130
Station 26  -0.99350
Station 27  -0.51630
Station 28  -0.06095
Station 29   0.03787
Station 30  -0.38420
Station 31  -2.06700
Station 32  -0.07810
Station 33   0.12600
Station 34   0.40760
Station 35   0.19540
Station 36  -9.34200
Station 37   0.36230
Station 38   0.22310
Station 39  -0.67950
Station 40   0.26580
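Taking logarithms before computing the NSE gives relative errors at low flows much more weight, so a station with a reasonable NSE but a strongly negative LogNSE (for example Station 8 above, 0.663 against -1.66) typically has poorly simulated low flows. A minimal sketch with a hypothetical `log_nse_sketch`; the `eps` offset that guards against `log(0)` is an assumption, and whether the library uses one is not stated here.

```python
import math


def log_nse_sketch(observed, simulated, eps=0.0):
    """NSE computed on log-transformed flows (emphasises low-flow performance).

    `eps` is a hypothetical offset to avoid log(0); the library may differ.
    """
    log_obs = [math.log(o + eps) for o in observed]
    log_sim = [math.log(s + eps) for s in simulated]
    mean_obs = sum(log_obs) / len(log_obs)
    error = sum((s - o) ** 2 for o, s in zip(log_obs, log_sim))
    variance = sum((o - mean_obs) ** 2 for o in log_obs)
    return 1 - error / variance


obs = [1.0, 10.0, 100.0, 1000.0]
sim = [2.0, 10.0, 100.0, 1000.0]   # only the lowest flow is wrong, by 1 unit
print(log_nse_sketch(obs, sim))    # about 0.98, although a plain NSE is about 0.999999
```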


#### Kling-Gupta Efficiency

In [14]:
# Kling-Gupta Efficiency for the data we were given
print(metrics.kge(observed=DATAFRAMES['DF_OBSERVED'], simulated=DATAFRAMES['DF_SIMULATED']))

             model1
Station 1   0.50170
Station 2  -0.11880
Station 3  -0.02808
Station 4   0.78110
Station 5   0.82350
Station 6   0.80610
Station 7   0.59930
Station 8   0.58330
Station 9   0.58140
Station 10  0.09846
Station 11 -0.20680
Station 12  0.61750
Station 13  0.71600
Station 14  0.76860
Station 15  0.70680
Station 16  0.69220
Station 17  0.51780
Station 18  0.36490
Station 19  0.75520
Station 20  0.69010
Station 21  0.64740
Station 22 -0.01263
Station 23  0.08817
Station 24  0.03350
Station 25  0.43390
Station 26  0.60640
Station 27  0.23640
Station 28  0.50080
Station 29  0.07385
Station 30  0.62840
Station 31  0.20520
Station 32  0.21400
Station 33  0.23620
Station 34  0.42820
Station 35  0.64150
Station 36 -0.16920
Station 37  0.61540
Station 38  0.60540
Station 39 -0.81910
Station 40  0.46120
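In its original 2009 form, the Kling-Gupta Efficiency combines three components: the linear correlation $r$, the variability ratio $\alpha = \sigma_s/\sigma_o$ and the bias ratio $\beta = \mu_s/\mu_o$, as $\mathrm{KGE} = 1 - \sqrt{(r-1)^2 + (\alpha-1)^2 + (\beta-1)^2}$. The sketch below assumes the library follows this standard form; `pearson_r` and `kge_sketch` are hypothetical helpers.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def kge_sketch(observed, simulated):
    """Kling-Gupta Efficiency (2009 form); 1 is a perfect fit."""
    r = pearson_r(observed, simulated)                                  # timing and shape
    alpha = statistics.pstdev(simulated) / statistics.pstdev(observed)  # variability ratio
    beta = statistics.fmean(simulated) / statistics.fmean(observed)     # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


obs = [1.0, 2.0, 3.0, 4.0, 5.0]
doubled = [2 * q for q in obs]      # correct timing, but twice the volume
print(kge_sketch(obs, obs))         # 1.0
print(kge_sketch(obs, doubled))     # 1 - sqrt(2), about -0.414
```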


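##### Modified Kling-Gupta Efficiency
This differs from the regular KGE in its variability term: instead of the ratio of the standard deviations of the simulated and observed flows, it uses the ratio of their coefficients of variation (CV = std/mean). The bias term (the ratio of the means) is the same in both.

In [15]:
# Modified (2012) Kling-Gupta Efficiency for the data we were given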
print(metrics.kge_2012(observed=DATAFRAMES['DF_OBSERVED'], simulated=DATAFRAMES['DF_SIMULATED']))

              model1
Station 1   0.556500
Station 2   0.059540
Station 3  -0.232300
Station 4   0.756200
Station 5   0.823000
Station 6   0.746500
Station 7   0.320900
Station 8   0.365600
Station 9   0.111200
Station 10  0.108600
Station 11 -0.175000
Station 12  0.207600
Station 13  0.570600
Station 14  0.694600
Station 15  0.764700
Station 16  0.539700
Station 17  0.652300
Station 18 -0.263600
Station 19  0.649100
Station 20  0.636600
Station 21  0.571400
Station 22  0.170600
Station 23 -0.084050
Station 24  0.009723
Station 25  0.208900
Station 26  0.414800
Station 27 -0.066640
Station 28  0.511100
Station 29  0.012960
Station 30  0.452700
Station 31  0.269300
Station 32 -0.085330
Station 33  0.001935
Station 34  0.242400
Station 35  0.525000
Station 36 -0.185300
Station 37  0.610100
Station 38  0.615200
Station 39 -0.080510
Station 40  0.393800
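A minimal sketch of the 2012 form, $\mathrm{KGE'} = 1 - \sqrt{(r-1)^2 + (\gamma-1)^2 + (\beta-1)^2}$ with $\gamma = (\sigma_s/\mu_s)/(\sigma_o/\mu_o)$, assuming the library follows this standard definition; `kge_2012_sketch` is a hypothetical helper. The toy example shows the practical difference: a simulation that is simply twice the observed flow is penalised only through $\beta$, whereas in the 2009 form the same scaling also inflates $\alpha$.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def kge_2012_sketch(observed, simulated):
    """Modified Kling-Gupta Efficiency: variability measured by the CV ratio."""
    r = pearson_r(observed, simulated)
    cv_obs = statistics.pstdev(observed) / statistics.fmean(observed)
    cv_sim = statistics.pstdev(simulated) / statistics.fmean(simulated)
    gamma = cv_sim / cv_obs                                          # variability ratio
    beta = statistics.fmean(simulated) / statistics.fmean(observed)  # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (gamma - 1) ** 2 + (beta - 1) ** 2)


obs = [1.0, 2.0, 3.0, 4.0, 5.0]
doubled = [2 * q for q in obs]
print(kge_2012_sketch(obs, doubled))  # about 0.0 (the 2009 form gives about -0.414)
```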


#### Percentage Bias

In [16]:
# Percentage Bias for the data we were given
print(metrics.bias(observed=DATAFRAMES['DF_OBSERVED'], simulated=DATAFRAMES['DF_SIMULATED']))

             model1
Station 1   34.7500
Station 2  -10.6600
Station 3   14.5900
Station 4   -9.7750
Station 5   -0.3038
Station 6    8.0200
Station 7   20.2900
Station 8   39.0200
Station 9   35.3200
Station 10  48.4300
Station 11  -1.6260
Station 12  33.7400
Station 13  24.5700
Station 14   6.6710
Station 15 -17.3900
Station 16  13.5300
Station 17 -11.5000
Station 18  38.0200
Station 19  12.2800
Station 20  17.6700
Station 21  15.9200
Station 22 -10.7300
Station 23   8.7560
Station 24  36.1700
Station 25  13.4700
Station 26  14.1300
Station 27  23.3200
Station 28 -18.7200
Station 29   4.6580
Station 30  14.9400
Station 31  56.1900
Station 32  53.9500
Station 33  57.1200
Station 34  23.1600
Station 35  10.0600
Station 36  60.6800
Station 37   2.5050
Station 38  -1.3850
Station 39 -56.7800
Station 40   5.4300
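A common definition of the percentage bias is $\mathrm{BIAS} = 100 \cdot \sum_t (S_t - O_t) / \sum_t O_t$, which is positive when the model overestimates the total flow volume. Some references use the opposite sign convention, so which one the library follows is an assumption here; the AbsBIAS column in the combined tables is the absolute value of BIAS. A minimal sketch with a hypothetical `percent_bias_sketch`:

```python
def percent_bias_sketch(observed, simulated):
    """Percentage bias; assumed convention: positive when the model overestimates."""
    return 100.0 * sum(s - o for o, s in zip(observed, simulated)) / sum(observed)


obs = [10.0, 20.0, 30.0]
sim = [12.0, 22.0, 32.0]
bias = percent_bias_sketch(obs, sim)
print(bias, abs(bias))  # 10.0 10.0, i.e. BIAS and AbsBIAS
```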


Now that we have seen the individual metrics, we can also calculate several metrics at once using **calculate_all_metrics**, or **calculate_metrics**, which takes a list of metric names. These are shown below:

In [17]:
metrices = ["MSE", "RMSE", "MAE", "NSE", "NegNSE"]
metrics.calculate_metrics(observed=DATAFRAMES['DF_OBSERVED'], simulated=DATAFRAMES['DF_SIMULATED'],
                          metrices=metrices)

| Station | MSE (model1) | RMSE (model1) | MAE (model1) | NSE (model1) | NEGNSE (model1) |
|---|---|---|---|---|---|
| Station 1 | 1304.0 | 36.11 | 210400.0 | 0.515 | -0.515 |
| Station 2 | 801.2 | 28.31 | 29700.0 | -1.746 | 1.746 |
| Station 3 | 17.44 | 4.177 | 15570.0 | -2.021 | 2.021 |
| Station 4 | 5539.0 | 74.42 | 492800.0 | 0.6165 | -0.6165 |
| Station 5 | 4951.0 | 70.36 | 450700.0 | 0.6588 | -0.6588 |
| Station 6 | 14250.0 | 119.4 | 941100.0 | 0.6588 | -0.6588 |
| Station 7 | 85.48 | 9.246 | 50290.0 | 0.3662 | -0.3662 |
| Station 8 | 577.5 | 24.03 | 228500.0 | 0.663 | -0.663 |
| Station 9 | 1838.0 | 42.87 | 467100.0 | 0.3311 | -0.3311 |
| Station 10 | 45.26 | 6.728 | 58020.0 | -0.3922 | 0.3922 |


These metrics can also be saved as text or CSV files by specifying the **format** parameter, while the **out** parameter sets the name of the saved file.

In [18]:
metrics.calculate_all_metrics(observed=DATAFRAMES['DF_OBSERVED'], simulated=DATAFRAMES['DF_SIMULATED'],
                              # format='txt', out='metrics'
                              )

| Station | MSE (model1) | RMSE (model1) | MAE (model1) | NSE (model1) | NegNSE (model1) | LogNSE (model1) | NegLogNSE (model1) | KGE (model1) | NegKGE (model1) | KGE 2012 (model1) | BIAS (model1) | AbsBIAS (model1) | TTP_obs (ttp) | TTCoM_obs (ttcom) | SPOD_obs (spod) | TTP_sim_model1 (ttp) | TTCoM_sim_model1 (ttcom) | SPOD_sim_model1 (spod) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Station 1 | 1304.0 | 36.11 | 210400.0 | 0.515 | -0.515 | -0.2932 | 0.2932 | 0.5017 | -0.5017 | 0.5565 | 34.75 | 34.75 | 157.0 | 0.0 | 109.0 | 170.0 | 184.0 | 127.0 |
| Station 2 | 801.2 | 28.31 | 29700.0 | -1.746 | 1.746 | -0.1999 | 0.1999 | -0.1188 | 0.1188 | 0.05954 | -10.66 | 10.66 | 157.0 | 0.0 | 73.4 | 177.0 | 166.0 | 116.0 |
| Station 3 | 17.44 | 4.177 | 15570.0 | -2.021 | 2.021 | -0.0328 | 0.0328 | -0.02808 | 0.02808 | -0.2323 | 14.59 | 14.59 | 158.0 | 0.0 | 96.7 | 176.0 | 188.0 | 126.0 |
| Station 4 | 5539.0 | 74.42 | 492800.0 | 0.6165 | -0.6165 | 0.1276 | -0.1276 | 0.7811 | -0.7811 | 0.7562 | -9.775 | 9.775 | 159.0 | 178.0 | 114.0 | 168.0 | 182.0 | 119.0 |
| Station 5 | 4951.0 | 70.36 | 450700.0 | 0.6588 | -0.6588 | 0.1683 | -0.1683 | 0.8235 | -0.8235 | 0.823 | -0.3038 | 0.3038 |  |  |  |  |  |  |
| Station 6 | 14250.0 | 119.4 | 941100.0 | 0.6588 | -0.6588 | -0.1675 | 0.1675 | 0.8061 | -0.8061 | 0.7465 | 8.02 | 8.02 | 160.0 | 0.0 | 113.0 | 171.0 | 182.0 | 121.0 |
| Station 7 | 85.48 | 9.246 | 50290.0 | 0.3662 | -0.3662 | -0.4786 | 0.4786 | 0.5993 | -0.5993 | 0.3209 | 20.29 | 20.29 |  |  |  |  |  |  |
| Station 8 | 577.5 | 24.03 | 228500.0 | 0.663 | -0.663 | -1.66 | 1.66 | 0.5833 | -0.5833 | 0.3656 | 39.02 | 39.02 | 172.0 | 178.0 | 114.0 | 175.0 | 183.0 | 124.0 |
| Station 9 | 1838.0 | 42.87 | 467100.0 | 0.3311 | -0.3311 | -1.871 | 1.871 | 0.5814 | -0.5814 | 0.1112 | 35.32 | 35.32 | 175.0 | 0.0 | 142.0 | 166.0 | 190.0 | 139.0 |
| Station 10 | 45.26 | 6.728 | 58020.0 | -0.3922 | 0.3922 | 0.03629 | -0.03629 | 0.09846 | -0.09846 | 0.1086 | 48.43 | 48.43 | 166.0 | 194.0 | 136.0 | 156.0 | 183.0 | 126.0 |


## <note>This section will not run in Binder!</note>

Below is a similar set of examples using multi-model simulation runs. The files are not available in the GitHub repository, so these cells cannot be run in the Binder environment. They are included only to demonstrate the functionality and to show you, the user, what to expect from a multi-model run.

In [19]:
import glob
from natsort import natsorted

folder = r'C:\Users\udenzeU\OneDrive - EC-EC\Fuad_Mesh_Dataset\CanRCM_runs'  # local folder with the model runs (not in the repository)
start_dates = [pd.to_datetime('1990-01-01'), pd.to_datetime('2026-01-01'), pd.to_datetime('2071-01-01')]
end_dates = [pd.to_datetime('2010-12-31'), pd.to_datetime('2055-12-31'), pd.to_datetime('2100-12-31')]

# Extract list of CSV files
csv_files = glob.glob(f"{folder}/**/MESH_output_streamflow.csv")
csv_files = natsorted(csv_files)

DATAFRAMES = data.generate_dataframes(csv_fpaths=csv_files)
for key, value in DATAFRAMES.items():
    print(f"{key}")

The start date for the Data is 1990-10-01
DF_1
DF_2
DF_3
DF_4
DF_5
DF_6
DF_OBSERVED
DF_SIMULATED_1
DF_SIMULATED_2
DF_SIMULATED_3
DF_SIMULATED_4
DF_SIMULATED_5
DF_SIMULATED_6
DF_MERGED


In [20]:
metrices = ["MSE", "RMSE", "MAE", "NSE", "NegNSE"]
metrics.calculate_metrics(observed=DATAFRAMES['DF_OBSERVED'],
                          simulated=[v for k, v in DATAFRAMES.items() if k.startswith("DF_SIMULATED_")],
                          metrices=metrices)

| Station | MSE (model1) | MSE (model2) | MSE (model3) | MSE (model4) | MSE (model5) | MSE (model6) | RMSE (model1) | RMSE (model2) | RMSE (model3) | RMSE (model4) | ... | NSE (model3) | NSE (model4) | NSE (model5) | NSE (model6) | NEGNSE (model1) | NEGNSE (model2) | NEGNSE (model3) | NEGNSE (model4) | NEGNSE (model5) | NEGNSE (model6) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Station 1 | 1466.0 | 1357.0 | 1439.0 | 1339.0 | 1521.0 | 1592.0 | 38.29 | 36.84 | 37.93 | 36.59 | ... | 0.1361 | 0.1962 | 0.08687 | 0.04393 | -0.1198 | -0.1849 | -0.1361 | -0.1962 | -0.08687 | -0.04393 |
| Station 2 | 241.3 | 182.4 | 188.6 | 224.6 | 187.7 | 268.4 | 15.53 | 13.51 | 13.73 | 14.99 | ... | -1.393 | -1.85 | -1.381 | -2.405 | 2.061 | 1.314 | 1.393 | 1.85 | 1.381 | 2.405 |
| Station 3 | 11.4 | 7.611 | 9.141 | 12.27 | 8.557 | 8.784 | 3.376 | 2.759 | 3.023 | 3.503 | ... | -2.824 | -4.135 | -2.58 | -2.675 | 3.768 | 2.184 | 2.824 | 4.135 | 2.58 | 2.675 |
| Station 4 | 7647.0 | 6473.0 | 7188.0 | 7440.0 | 8004.0 | 7948.0 | 87.45 | 80.45 | 84.78 | 86.25 | ... | 0.06821 | 0.03559 | -0.03754 | -0.03027 | -0.008772 | -0.1609 | -0.06821 | -0.03559 | 0.03754 | 0.03027 |
| Station 5 | 354.4 | 295.8 | 329.3 | 402.0 | 383.2 | 336.2 | 18.83 | 17.2 | 18.15 | 20.05 | ... | -0.216 | -0.4844 | -0.4153 | -0.2415 | 0.3088 | 0.09246 | 0.216 | 0.4844 | 0.4153 | 0.2415 |
| Station 6 | 7704.0 | 6519.0 | 7307.0 | 7579.0 | 8075.0 | 8066.0 | 87.77 | 80.74 | 85.48 | 87.06 | ... | 0.04059 | 0.004887 | -0.06019 | -0.05902 | 0.01154 | -0.144 | -0.04059 | -0.004887 | 0.06019 | 0.05902 |
| Station 7 | 7.258 | 4.349 | 4.81 | 5.38 | 7.577 | 4.222 | 2.694 | 2.085 | 2.193 | 2.319 | ... | -2.404 | -2.808 | -4.363 | -1.988 | 4.137 | 2.078 | 2.404 | 2.808 | 4.363 | 1.988 |
| Station 8 | 30940.0 | 22130.0 | 26680.0 | 26970.0 | 31060.0 | 31640.0 | 175.9 | 148.8 | 163.4 | 164.2 | ... | -0.1364 | -0.1487 | -0.3229 | -0.3473 | 0.3176 | -0.05741 | 0.1364 | 0.1487 | 0.3229 | 0.3473 |
| Station 9 | 115.8 | 112.7 | 114.6 | 110.9 | 134.0 | 119.5 | 10.76 | 10.61 | 10.7 | 10.53 | ... | 0.01479 | 0.0464 | -0.1524 | -0.02788 | -0.003947 | -0.03104 | -0.01479 | -0.0464 | 0.1524 | 0.02788 |
| Station 10 | 816.4 | 800.5 | 853.5 | 819.1 | 1075.0 | 857.7 | 28.57 | 28.29 | 29.21 | 28.62 | ... | 0.4155 | 0.439 | 0.2639 | 0.4126 | -0.4409 | -0.4518 | -0.4155 | -0.439 | -0.2639 | -0.4126 |


In [None]:
metrics.calculate_all_metrics(observed=DATAFRAMES['DF_OBSERVED'],
                              simulated=[v for k, v in DATAFRAMES.items() if k.startswith("DF_SIMULATED_")])