# Flow Threshold Metric
Freshes are defined by site-specific flow thresholds that are linked to ecological outcomes. This metric counts the days per water year on which flow exceeds the threshold, expressed relative to catchment inflows. Gauges were selected based on where reported ecological fresh thresholds were available. More information and threshold references are available in the technical report (Hydrological Analysis to Inform the 2020 Basin Plan Evaluation).


## Inputs:

[AWRA-L inflow data](https://data.gov.au/data/dataset/e65078cd-808d-4514-ab60-17e597b9a883/resource/7442a111-2894-4572-aa41-1f488bf06636)

[Gauges of interest for freshes](https://data.gov.au/data/dataset/7c44535b-4a6a-432d-acff-00ec578ce7b9/resource/076140d9-f70c-48b0-a310-e9eae0c99021)

[Modeled flows Baseline 845](https://data.gov.au/data/dataset/9e3d2d32-33e7-4270-a8af-c655d6eb7710/resource/64cc37eb-19a0-4a80-b99b-6ddb4da71e49)

## Outputs:

[Results](https://data.gov.au/data/dataset/hydrologic-indicator-results-for-the-basin-plan-evaluation-2020)

In [0]:
import pandas as pd
from datetime import datetime
import numpy as np
import scipy.stats
import warnings 
warnings.filterwarnings('ignore')

## Load Model data

Loading in the 845 Model Baseline. This scenario represents baseline conditions as specified in the Basin Plan (conditions as at 2009).

*Please note that while the 845 scenario was part of the information base used to develop the Basin Plan, run 871 has subsequently become the baseline scenario for legislative purposes.*

In [0]:
allsites_845_daily = \
    pd.read_csv('https://data.gov.au/data/dataset/9e3d2d32-33e7-4270-a8af-c655d6eb7710/resource/64cc37eb-19a0-4a80-b99b-6ddb4da71e49/download/modelledflows_modelrun845.csv', encoding='latin1'
                 )

In [0]:
def removeHeader(PandasDataframe):
    """Extract a clean dataframe from a model run CSV.

    Takes a pandas dataframe and removes the header information by looking
    for the end-of-header (EOH) row, renames the columns and adds a
    datetime 'date' column built from the day, month and year columns.
    """

    # find the end of header (EOH) row

    idx = \
        PandasDataframe.index[PandasDataframe[PandasDataframe.columns[0]]
                              == 'EOH'].tolist()

    # extract the data below the header (copy so the original is not modified)

    data = PandasDataframe[idx[0] + 1:].copy()

    # the column names sit on the row immediately above EOH

    data.columns = PandasDataframe.loc[idx[0] - 1].tolist()

    # build the date column for either of the two known date layouts

    if data.columns[0:3].tolist() == ['Dy', 'Mn', 'Year']:
        data['date'] = pd.to_datetime(data.Year.astype(int) * 10000
                + data.Mn.astype(int) * 100 + data.Dy.astype(int),
                format='%Y%m%d')

    if data.columns[0:3].tolist() == ['YYYY', 'MM', 'DD']:
        data['date'] = pd.to_datetime(data.YYYY.astype(int) * 10000
                + data.MM.astype(int) * 100 + data.DD.astype(int),
                format='%Y%m%d')

    return data
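The header-stripping idea can be tried on a toy frame before running it on the full model file. The sketch below is a minimal illustration only: the frame `raw`, the site name `SITE_A` and the metadata text are made up, and it assumes the layout described above (column names on the row just before `EOH`, data on the rows after it).

```python
import pandas as pd

# Toy stand-in for a model-run CSV: a metadata row, a column-name row,
# an 'EOH' marker row, then the daily data (all read in as strings).
raw = pd.DataFrame(
    [
        ["Run 845", None, None, None],
        ["Dy", "Mn", "Year", "SITE_A"],
        ["EOH", None, None, None],
        ["1", "7", "1895", "111"],
        ["2", "7", "1895", "147"],
    ]
)

# Locate the marker row, take names from the row above it and data from below.
eoh = raw.index[raw[0] == "EOH"][0]
clean = raw.loc[eoh + 1:].copy()
clean.columns = raw.loc[eoh - 1].tolist()

# Build a datetime column from the day, month and year parts.
parts = clean[["Year", "Mn", "Dy"]].astype(int)
parts.columns = ["year", "month", "day"]
clean["date"] = pd.to_datetime(parts)

print(clean)
```

Assembling the date from named `year`, `month` and `day` columns is an alternative to the integer arithmetic used in `removeHeader`; both should give the same dates.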

In [0]:
allsites_845 = removeHeader(allsites_845_daily)
allsites_845.head()

Unnamed: 0,Dy,Mn,Year,424202A,424201A,424002_,424001_,423204_,423201A,423203A,423202C,423003_,423001_,422394A,422310C,422329A,422355A,422395A,422353A,422316A,422345A,422350A,422333A,422308A,422325A,422213A,422203A,422407C,422401C,422404A,03M_EOS,03STGE,03L_JTW,03L_INF,422204A,422208A,422015A,422017_,422011_,422013A,...,MILDURA,WENTWTH,LOCK9US,LOCK8US,LVICIN,LOCK7US,LVICOUT,FLOWSA,LOCK6US,LOCK5US,LOCK4US,LOCK3DS,MORGAN,LOCK1US,MBRIDGE,WELLING,BARRAGE,CAWNOUT,WEIR32,WYCOT,BULPUNG,BURTUND,ROCHEST,APPIN,GSM-SHE,5CIFWBC,03LEOSC,03LEOSB,03LEOSN,9INWARR,9INLBON,9INMOON,9INBRIV,9INGWME,9INNAPI,9INMACA,9USUGIN,9ADUGIN,9TINBDB,date
292,1,7,1895,0,111,1128,586,0,0,0,0,0,0,21,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0.49000001,0,0,0,0,0,0,0,0,...,83870,83993,50479,50466,8000,50457,0,50227,30602,44144,44081,43832,43665,43636,43353,43298,120680,24,200,0,1174,196,71,35.1,2564,0,0,0,0,0,0,0,0.00788632,59.97433,158.1353,275.5231,8.593439,0,502.2341,1895-07-01
293,2,7,1895,0,147,584,935,0,0,0,0,0,0,21,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0.49000001,0,0,0,0,0,0,0,0,...,83855,83991,65663,56595,880,53526,1000,54084,30621,44539,44100,43751,43653,43630,43353,43298,94842,24,200,0,1062,196,70,25.1,2348,0,0,0,0,0,0,0,0.02719779,63.84311,160.0294,279.0137,11.62051,0,514.5339,1895-07-02
294,3,7,1895,0,97,377,735,0,0,0,0,0,0,21,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0.49000001,0,0,0,0,0,0,0,0,...,83721,83976,75129,66095,881,60076,1000,60842,30734,45728,44229,43781,43633,43615,43350,43298,60309,24,200,0,936,196,72,19.7,2064,0,0,0,0,0,0,0,0.04356699,85.45871,143.7059,282.5714,26.56293,0,538.3425,1895-07-03
295,4,7,1895,0,67,217,411,0,0,0,0,0,0,21,0,0,81,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0.49000001,0,0,0,0,0,0,0,0,...,82835,83861,76041,72551,882,66987,1000,67752,31051,47495,44632,43832,43625,43602,43342,43297,43365,24,200,0,849,196,70,16.0,1845,0,0,0,0,0,0,0,0.05501439,137.0225,280.289,283.8032,55.28242,0,756.4521,1895-07-04
296,5,7,1895,0,50,164,265,0,0,0,0,0,0,21,0,0,81,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0.49000001,0,0,0,0,0,0,0,0,...,78194,83192,75989,75167,883,71960,1000,72726,31629,49415,45438,43945,43653,43610,43333,43296,43364,24,200,0,769,196,71,14.3,1314,0,0,0,0,0,0,0,0.05902108,115.4843,512.0446,274.1736,50.86988,0,952.6314,1895-07-05


## Load catchment inflow data

Load daily inflow data for each catchment from the Australian Water Resources Assessment Landscape model (AWRA-L).

The data is loaded, cleaned, and grouped by water year.

This dataset is stored in https://data.gov.au/data/dataset/e65078cd-808d-4514-ab60-17e597b9a883/resource/7442a111-2894-4572-aa41-1f488bf06636

Inflows are given by:  
 $$ \text{Inflows} = \text{Runoff} \times \text{Surface Area} $$

where runoff from 1911 to 2018/19 was provided by the Bureau of Meteorology’s (BoM) AWRA Modelling Team from the Australian Water Resources Assessment Landscape model (AWRA-L) version 6.0.

Surface area was calculated from a shapefile of catchments (available [here](https://services8.arcgis.com/5xxEi7I2m6ml97fE/arcgis/rest/services/BASIN_PLAN_REGIONS/FeatureServer)).
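As a worked example of the inflow formula, the sketch below multiplies a runoff depth by a catchment area. The numbers are hypothetical, and the units (millimetres and square kilometres, giving megalitres) are an assumption for illustration; the notebook itself does not state the units of the published inflow file.

```python
# Hypothetical values for illustration only.
runoff_mm = 2.5        # daily runoff depth in millimetres (assumed unit)
area_km2 = 12_000.0    # catchment surface area in square kilometres (assumed unit)

# 1 mm of depth over 1 km^2 is 0.001 m * 1,000,000 m^2 = 1,000 m^3 = 1 ML,
# so under these units the product is already in megalitres.
inflow_ml = runoff_mm * area_km2
print(inflow_ml)
```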

In [0]:
RawBOMData = pd.read_csv('https://data.gov.au/data/dataset/e65078cd-808d-4514-ab60-17e597b9a883/resource/7442a111-2894-4572-aa41-1f488bf06636/download/catchmentinflows_modelledrunoffdata_awralv6.csv')
RawBOMData.head()

Unnamed: 0,Column1,Barwon-Darling,Border Rivers,Campaspe,Condamine-Balonne,Eastern Mt Lofty Ranges,Goulburn-Broken,Gwydir,Lachlan,Loddon,Lower Darling,Macquarie-Castlereagh,Moonie,Murray,Murrumbidgee,Namoi,Ovens,Paroo,Warrego,Wimmera-Avoca
0,1/01/1911,280.945743,889.116774,448.18278,464.969232,731.547092,5352.450565,842.971632,3708.877278,244.310367,361.576414,1989.390134,10.016151,32348.09683,12736.19431,1018.08705,7820.42964,197.437945,311.490847,318.456924
1,2/01/1911,245.939033,931.281696,354.872346,551.652614,561.807131,4426.229832,869.601585,2982.273356,217.188593,267.220044,1854.414908,7.661954,25171.03952,10177.40817,1024.492008,6337.477974,175.114516,365.595952,284.131808
2,3/01/1911,221.136955,996.579011,287.529469,656.670069,443.33388,3747.495391,1056.964622,2521.92074,197.458266,207.665376,1802.773964,6.639534,19788.21003,8278.185418,1030.521136,5224.703577,161.117734,380.395537,259.23294
3,4/01/1911,228.037716,906.414042,238.892884,565.38187,360.467282,3252.306652,1344.195169,2588.127451,183.177053,172.462025,1928.902206,5.369864,16204.2339,7446.809863,1009.258218,4470.30592,152.364875,332.566101,240.716578
4,5/01/1911,282.894836,780.040817,211.475708,494.168382,302.34578,3000.776435,1054.972176,3047.736346,185.881273,172.42676,2834.161997,4.33717,15689.32201,7848.837431,1056.232902,4109.854376,156.725326,300.506739,231.350913


In [0]:
def transformPipline(RawDataframe):
    """
  Single function to transform raw dataframe from blob into a pandas dataframe ready for analysis
  """

  # Turn Column1 into Date

    DailyRunoffDataframe = RawDataframe.rename({'Column1':'Date'}, axis =1 )

  # total up northern basin catchments

    NorthernBasinCatchments = [
        'Barwon-Darling',
        'Border Rivers',
        'Condamine-Balonne',
        'Gwydir',
        'Macquarie-Castlereagh',
        'Moonie',
        'Namoi',
        'Paroo',
        'Warrego',
        ]

    DailyRunoffDataframe['Northern Basin'] = \
        DailyRunoffDataframe.apply(lambda row: \
                                   row[NorthernBasinCatchments].sum(),
                                   axis=1)

  # total up southern basin catchments

    SouthernBasinCatchments = [
        'Campaspe',
        'Eastern Mt Lofty Ranges',
        'Goulburn-Broken',
        'Lachlan',
        'Loddon',
        'Lower Darling',
        'Murray',
        'Murrumbidgee',
        'Ovens',
        'Wimmera-Avoca',
        ]

    DailyRunoffDataframe['Southern Basin'] = \
        DailyRunoffDataframe.apply(lambda row: \
                                   row[SouthernBasinCatchments].sum(),
                                   axis=1)

  # total up all catchments

    AllCatchments = NorthernBasinCatchments + SouthernBasinCatchments

    DailyRunoffDataframe['Total MDB'] = \
        DailyRunoffDataframe.apply(lambda row: \
                                   row[AllCatchments].sum(), axis=1)

  # convert to a datetime data type

    DailyRunoffDataframe['Date'] = \
        pd.to_datetime(DailyRunoffDataframe['Date'], format='%d/%m/%Y')

  # drop Nulls

    DailyRunoffDataframe = DailyRunoffDataframe.dropna()

    return DailyRunoffDataframe
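The regional totals in `transformPipline` are built with a row-wise `apply`. A column-wise sum is a possible alternative that should give the same totals on numeric columns and is usually faster on long daily records. The sketch below uses a small made-up frame and a shortened catchment list purely for illustration.

```python
import pandas as pd

# Made-up daily inflows for three catchments.
daily = pd.DataFrame(
    {
        "Date": ["1/01/1911", "2/01/1911"],
        "Gwydir": [842.0, 869.0],
        "Namoi": [1018.0, 1024.0],
        "Ovens": [7820.0, 6337.0],
    }
)

# Shortened stand-in for the NorthernBasinCatchments list.
northern = ["Gwydir", "Namoi"]

# Sum across the selected columns for every row in one vectorised call.
daily["Northern Basin"] = daily[northern].sum(axis=1)
print(daily["Northern Basin"].tolist())
```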

In [0]:
def waterYear(date):
  '''Takes in date,
  changes year to water year
  returns water year'''
  if date.month <= 6:  # for months Jan to Jun move them to the previous water year
    waterYear = date.year - 1
  else:
    waterYear = date.year
    
  return int(waterYear)
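The same July-to-June rule can also be written without a row-by-row function. This is a minimal sketch on three made-up dates, not a replacement for `waterYear`; the variable names are hypothetical.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2011-06-30", "2011-07-01", "2012-01-15"]))

# July to December keep their calendar year; January to June belong to the
# water year that started the previous July, so one is subtracted.
water_year = dates.dt.year - (dates.dt.month <= 6).astype(int)
print(water_year.tolist())  # [2010, 2011, 2011]
```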

In [0]:
DailyRunoffDataframe = transformPipline(RawBOMData)

# apply water year function to populate the water year column

DailyRunoffDataframe['Water Year'] = \
    DailyRunoffDataframe.apply(lambda row: waterYear(row['Date']),
                               axis=1)

In [0]:
# summing annual inflow by water year

AnnualisedInflow = DailyRunoffDataframe.groupby('Water Year',
        as_index=False).sum()
AnnualisedInflow = \
    AnnualisedInflow.rename(columns={'Northern Basin': 'Overall North',
                            'Southern Basin': 'Overall South',
                            'Total MDB': 'Overall MDBA System'})

In [0]:
meltedinflows = pd.melt(AnnualisedInflow, id_vars=['Water Year'],
                        var_name='Catchment', value_name='inflow')
meltedinflows.head()

Unnamed: 0,Water Year,Catchment,inflow
0,1910,Barwon-Darling,126450.682659
1,1911,Barwon-Darling,227134.33317
2,1912,Barwon-Darling,164450.268088
3,1913,Barwon-Darling,81373.743705
4,1914,Barwon-Darling,64467.114776


## Load gauges of interest for flow thresholds
Loads the gauges of interest and their associated observed flow data.

Flow threshold gauge information can be found at: https://data.gov.au/data/dataset/7c44535b-4a6a-432d-acff-00ec578ce7b9/resource/076140d9-f70c-48b0-a310-e9eae0c99021

In [0]:
Data = pd.read_csv('https://data.gov.au/data/dataset/7c44535b-4a6a-432d-acff-00ec578ce7b9/resource/d05b82a7-301f-4b2b-b9bc-7a968487916a/download/observedflows_baseflowsandflowthresholds.csv')
Data.head()

Unnamed: 0.1,Unnamed: 0,Condamine-Balonne,Gwydir,Namoi,Overall North,Border Rivers,Campaspe,Goulburn-Broken,Lachlan,Loddon,Overall North.1,Macquarie-Castlereagh,Macquarie-Castlereagh.1,Macquarie-Castlereagh.2,Overall South,Overall South.1,Warrego,Warrego.1,Murrumbidgee
0,,Brenda on Culgoa,Yarraman Bridge,Bugilbone,Bourke,Mungindi,Rochester,McCoys Bridge,Booligal Weir,Kerang Weir,Weir 32,Marebone Break,Marebone Break,Warren Weir,Wentworth (lock 10),D/S Yarrawonga,Cunnamulla Weir,Wyandra,Narrandera
1,Date,422015,418004,419021,425003,416001,406202,405232,421005,407202,425012,421090,421088,421004,425010,409025,423202CO,423203A,410005
2,8/09/1953,,,,,,,,,337.61,,,,,,,,,
3,9/09/1953,,,,,,,,,422.55,,,,,,,,,
4,10/09/1953,,,,,,,,,653.59,,,,,,,,,


## Transform gauge data (model and gauge)

Organising dataframe to get it ready for analysis:
- Putting gauge numbers as column headings
- stripping header information and using this data to filter the gauge mapping dataframe to only the locations of interest

#### Model data transformation

In [0]:
DataFrame = Data.loc[2:].copy()
DataFrame.columns = list(map(str.strip,
                           Data.loc[1].astype(str).tolist()))
DataFrame['Date'] = DataFrame['Date'].apply(pd.to_datetime, format='%d/%m/%Y')

# 'Date' is kept as a regular column because the water year step below reads it

#combining sites 421090 and 421088 as they are represented as a single site in the model 
DataFrame['421090'] = pd.to_numeric(DataFrame['421090']) \
    + pd.to_numeric(DataFrame['421088'])
DataFrame = DataFrame.drop(['421088'], axis=1)
DataFrame.head()

Unnamed: 0,Date,422015,418004,419021,425003,416001,406202,405232,421005,407202,425012,421090,421004,425010,409025,423202CO,423203A,410005
2,1953-09-08,,,,,,,,,337.61,,,,,,,,
3,1953-09-09,,,,,,,,,422.55,,,,,,,,
4,1953-09-10,,,,,,,,,653.59,,,,,,,,
5,1953-09-11,,,,,,,,,800.35,,,,,,,,
6,1953-09-12,,,,,,,,,922.35,,,,,,,,


In [0]:
gauge_mapping = pd.read_csv("https://data.gov.au/data/dataset/7c44535b-4a6a-432d-acff-00ec578ce7b9/resource/265dd4f0-08e1-4485-be9e-0feac8deb5f0/download/gaugemapping.csv", usecols = [0,1,2,3,4])
gauge_mapping.head()

Unnamed: 0,catchment,gauge_name,node,gauge_number,fresh
0,Condamine-Balonne,CULGOA RIVER AT D/S COLLERINA (KENEBREE),422006_,422006,
1,Condamine-Balonne,BOKHARA RIVER AT BOKHARA (GOODWINS),422005_,422005,
2,Gwydir,MEHI RIVER NEAR COLLARENEBRI,6COLARG,418055,
3,Loddon,LODDON RIVER @ APPIN SOUTH,407205,407205,
4,Macquarie-Castlereagh,MACQUARIE RIVER AT CARINDA (BELLS BRIDGE),8FGCARR,421012,


In [0]:
CatchmentGaugeMapping = Data.loc[0:1]
CatchmentGaugeMapping.columns = Data.loc[1].str.strip().tolist()
CatchmentGaugeMapping = CatchmentGaugeMapping.drop('Date', axis=1)
CatchmentGaugeMapping = CatchmentGaugeMapping.transpose()
CatchmentGaugeMapping.columns = ['Name', 'Gauge']

CatchmentGaugeMapping = CatchmentGaugeMapping.reset_index()
CatchmentGaugeMapping = \
    CatchmentGaugeMapping.rename(columns={'index': 'gauge_number'})

CatchmentGaugeMapping = pd.merge(CatchmentGaugeMapping, gauge_mapping,
                                 on='gauge_number')
CatchmentGaugeMapping.head()

Unnamed: 0,gauge_number,Name,Gauge,catchment,gauge_name,node,fresh
0,422015,Brenda on Culgoa,422015,Condamine-Balonne,Brenda on Culgoa,422015A,1000.0
1,418004,Yarraman Bridge,418004,Gwydir,Yarraman Bridge,6YARMAN,540.0
2,419021,Bugilbone,419021,Namoi,Bugilbone,7BUGILB,1800.0
3,425003,Bourke,425003,Overall North,Bourke,9GSBOUR,6000.0
4,416001,Mungindi,416001,Border Rivers,Mungindi,5MUNGDG,4000.0


In [0]:
# Apply water year and merge catchment gauge mapping with the 845 model data to associate fresh value with the flow for each location
allsites_845['water year'] = allsites_845.apply(lambda row: \
        waterYear(row['date']), axis=1)

melted845 = pd.melt(allsites_845, id_vars=['Dy', 'Mn', 'Year', 'date',
                    'water year'], var_name='Node', value_name='Flow')

CatchmentGaugeMapping_noIndex = CatchmentGaugeMapping.reset_index()

freshmerge845 = pd.merge(melted845, CatchmentGaugeMapping_noIndex,
                         left_on='Node', right_on='node')


# no fresh threshold value available at this site

freshmerge845 = \
    freshmerge845[~freshmerge845.Node.str.contains('423203AC')]

# gauge merging has two gauges to the sole node at Marebone - dropped in 845 due to duplication

freshmerge845 = \
    freshmerge845[~freshmerge845.gauge_number.str.contains('421088')]

freshmerge845 = freshmerge845.drop(['node'], axis=1)
freshmerge845.Flow = pd.to_numeric(freshmerge845.Flow)
freshmerge845.fresh = pd.to_numeric(freshmerge845.fresh)

#### Gauge data transformation

In [0]:
# Adding water year and filtering the data to only include observed flow data after the cap on diversions was introduced (1994):
DataFrame['water year'] = DataFrame.apply(lambda row: \
        waterYear(row['Date']), axis=1)
DataFrame = DataFrame[DataFrame['water year'] >= 1994]

In [0]:
meltedDataFrame = pd.melt(DataFrame, id_vars=['Date', 'water year'
                             ], var_name='ID', value_name='Outflow')

obsfresh = pd.merge(meltedDataFrame, CatchmentGaugeMapping,
                     left_on='ID', right_on='gauge_number')

obsfresh = obsfresh[~obsfresh.ID.str.contains('423203A')]
obsfresh = obsfresh[~obsfresh.ID.str.contains('421004')]

obsfresh = obsfresh.reset_index()
obsfresh = obsfresh.drop(['index'], axis=1)

obsfresh.fresh = pd.to_numeric(obsfresh.fresh)

obsfresh.Outflow = pd.to_numeric(obsfresh.Outflow)

## Calculating days above the flow threshold and grouping by water year

#### Model data

In [0]:
freshmerge845['freshcount'] = freshmerge845.apply(lambda row: row['Flow'
        ] > row['fresh'], axis=1)

freshtrue845 = freshmerge845[freshmerge845['freshcount']]
countedfresh845 = freshtrue845.groupby(['Node', 'water year',
        'catchment', 'fresh'], as_index=False).count()[['freshcount',
        'water year', 'catchment', 'Node']]

# Pivot so that every node / water year combination is present (zero where no day exceeded the threshold),
# then unstack into a long table with the number of days above the fresh threshold in the 'Freshes' column
countedfreshpivot845 = countedfresh845.pivot_table(index='Node',
        columns='water year', values='freshcount', fill_value=0,
        aggfunc='sum').unstack().to_frame().rename(columns={0: 'Freshes'
        })

countedfreshpivot845 = countedfreshpivot845.reset_index(drop=False)

countedfreshpivot845.head()

#### Gauge data

In [0]:
obsfresh['freshcount'] = obsfresh.apply(lambda row: row['Outflow'] \
        > row['fresh'], axis=1)

freshtrue = obsfresh[obsfresh['freshcount']]
countedfresh = freshtrue.groupby([
    'ID',
    'water year',
    'catchment',
    'gauge_name',
    'node',
    'fresh',
    ], as_index=False).count()[[
    'freshcount',
    'water year',
    'ID',
    'catchment',
    'gauge_name',
    'node',
    ]]

# Pivot so that every gauge / water year combination is present (zero where no day exceeded the threshold),
# then unstack into a long table with the number of days above the fresh threshold in the 'Freshes' column
countedfreshpivot = countedfresh.pivot_table(index='ID',
        columns='water year', values='freshcount', fill_value=0,
        aggfunc='sum').unstack().to_frame().rename(columns={0: 'Freshes'
        })
countedfreshpivot = countedfreshpivot.reset_index(drop=False)

countedfreshpivot.head()
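The counting step can be illustrated on a few made-up rows. The sketch below is a simplified variant of the cells above, not the same code: it compares the columns directly and sums the resulting booleans, which keeps water years with zero days above the threshold without a pivot. The gauge ID `A` and the flow values are hypothetical.

```python
import pandas as pd

# Four made-up daily observations for one hypothetical gauge.
obs = pd.DataFrame(
    {
        "ID": ["A"] * 4,
        "water year": [2010, 2010, 2011, 2011],
        "Outflow": [500.0, 1500.0, 200.0, 300.0],
        "fresh": [1000.0] * 4,
    }
)

# True on days where flow exceeds the fresh threshold.
obs["above"] = obs["Outflow"] > obs["fresh"]

# Summing booleans counts the True days in each gauge / water year group.
days = (
    obs.groupby(["ID", "water year"], as_index=False)["above"]
    .sum()
    .rename(columns={"above": "Freshes"})
)
print(days)
```

Days with a missing flow value compare as `False`, so they are treated as not exceeding the threshold in both this sketch and the row-wise comparison used above.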

## Calculating the 'Fresh per ML' ratio

#### Gauge data

In [0]:
# Merging site metadata into the fresh dataframe
finalfresh = countedfreshpivot.merge(obsfresh.groupby(['water year',
        'ID', 'catchment', 'gauge_name', 'node'],
        as_index=False).first()[['water year', 'ID', 'catchment',
        'gauge_name', 'node']], how='inner', on=['water year', 'ID'])


In [0]:
# Taking the fresh dataframe and adding the inflows column, 
# calculating the 'fresh per ML' ratio using the inflow and fresh count columns:
finalfinalfresh = finalfresh.merge(meltedinflows, left_on=['water year'
                                   , 'catchment'],
                                   right_on=['Water Year', 'Catchment'])
finalfinalfresh['Ratio'] = finalfinalfresh.apply(lambda row: \
        row['Freshes'] / row['inflow'], axis=1)
finalfinalfresh.head()
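A small worked example of the ratio may help. The values below are made up; the column names follow the frames used above, and the `days` and `inflows` names are hypothetical.

```python
import pandas as pd

# Made-up day counts for one catchment over two water years.
days = pd.DataFrame(
    {
        "water year": [2010, 2011],
        "catchment": ["Loddon", "Loddon"],
        "Freshes": [30, 0],
    }
)

# Made-up annual inflow totals for the same catchment and years.
inflows = pd.DataFrame(
    {
        "Water Year": [2010, 2011],
        "Catchment": ["Loddon", "Loddon"],
        "inflow": [150000.0, 60000.0],
    }
)

# Attach each year's inflow to its day count, then divide.
merged = days.merge(
    inflows,
    left_on=["water year", "catchment"],
    right_on=["Water Year", "Catchment"],
)
merged["Ratio"] = merged["Freshes"] / merged["inflow"]
print(merged[["water year", "Freshes", "inflow", "Ratio"]])
```

Dividing the two columns directly gives the same result as the row-wise `apply` used in the cell above.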


#### Model data

In [0]:
finalfresh845 = \
    countedfreshpivot845.merge(freshmerge845.groupby(['water year',
                               'gauge_number', 'catchment', 'gauge_name'
                               , 'Node'],
                               as_index=False).first()[['water year',
                               'gauge_number', 'catchment', 'gauge_name'
                               , 'Node']], how='left', on=['water year'
                               , 'Node'])

finalfresh845 = finalfresh845[finalfresh845['water year'] >= 1911]

finalfresh845 = finalfresh845.merge(meltedinflows, left_on=['water year'
                                    , 'catchment'],
                                    right_on=['Water Year', 'Catchment'
                                    ])
# Calculating the 'fresh per ML' ratio using the inflow and fresh count columns:
finalfresh845['Ratio'] = finalfresh845.apply(lambda row: row['Freshes'] \
        / row['inflow'], axis=1)

finalfresh845 = finalfresh845.drop(['Water Year'], axis=1)

finalfresh845.head()

## Compare pre and post Basin Plan
Compare the pre and post Basin Plan 'fresh per ML' ratio using:
- Welch's t-test (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html)
- the KS two sample test (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html)
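Both tests listed above are available in `scipy.stats`. The sketch below runs them on synthetic samples (not notebook data) whose means are deliberately far apart, to show the call signatures and the sign convention of the t statistic.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
pre = rng.normal(loc=0.0, scale=1.0, size=50)    # synthetic "pre" values
post = rng.normal(loc=5.0, scale=1.0, size=50)   # synthetic "post" values, shifted up

# Two-sample Kolmogorov-Smirnov test: compares the whole distributions.
ks_stat, ks_p = scipy.stats.ks_2samp(pre, post)

# Welch's t-test: equal_var=False drops the equal-variance assumption.
t_stat, t_p = scipy.stats.ttest_ind(pre, post, equal_var=False)

print(ks_stat, ks_p)
print(t_stat, t_p)
```

Because `pre` is passed first, a negative t statistic means the mean of `pre` is lower than the mean of `post`.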

In [0]:
def siteloop(
    ResultsDataFrame,
    ID,
    Catchment,
    quiet=True,
    ):
    '''Takes in a dataframe with the fresh/inflow ratio and water year,
    filters the dataframe to pre and post Basin Plan periods,
    runs Welch's t-test and the KS two-sample test on both periods,
    returns the results dataframe'''
    pre = np.array(ResultsDataFrame[(ResultsDataFrame['water year']
                   < 2012) & (ResultsDataFrame['ID'] == ID)]['Ratio'])
    post = np.array(ResultsDataFrame[(ResultsDataFrame['water year']
                    >= 2012) & (ResultsDataFrame['ID'] == ID)]['Ratio'])

    (ksStat, KsP) = scipy.stats.ks_2samp(pre, post)
    (tStat, tP) = scipy.stats.ttest_ind(pre, post, equal_var=False)

    Outcome = Significant(KsP, tStat, tP, alpha)

    if not quiet:
        print (ID, scipy.stats.ks_2samp(pre, post))
        print (ID, scipy.stats.ttest_ind(pre, post, equal_var=False))

    StepDataFrame = pd.DataFrame({
        'Catchment': [Catchment],
        'ID': [ID],
        'Metric': ['Flow Thresholds'],
        'Source': ['Observed'],
        'Ks_2sampResult statistic': [ksStat],
        'Ks_2sampResult pvalue': [KsP],
        "Welch's t-test statistic": [tStat],
        "Welch's t-test pvalue": [tP],
        'Outcome': [Outcome],
        })

    return StepDataFrame


def Significant(
    Ksp,
    tStat,
    tP,
    alpha,
    ):
    '''Takes in results of statistical tests,
  compares the results of the two tests to an alpha value defined by the operator,
  returns the significance'''
    if Ksp < alpha and tStat < 0 and tP < alpha:
        outcome = 'Improved'
    elif tStat > 0 and Ksp < alpha and tP < alpha:
        outcome = 'Degraded'
    elif Ksp > alpha and tP > alpha:
        outcome = 'Maintained'
    elif Ksp < alpha and tP > alpha:
        outcome = 'Unsure - t-test failed'
    else:
        outcome = 'Unsure - ks-test failed'
    return outcome
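To make the decision rules easier to check, the sketch below restates the branching of `Significant` in a self-contained function and runs it on made-up test results. The name `classify_outcome` and the example values are hypothetical and for illustration only.

```python
def classify_outcome(ks_p, t_stat, t_p, alpha=0.1):
    """Restates the branch order of Significant for illustration."""
    both_significant = ks_p < alpha and t_p < alpha
    if both_significant and t_stat < 0:
        return "Improved"      # post-period mean is higher than pre-period mean
    if both_significant and t_stat > 0:
        return "Degraded"      # post-period mean is lower than pre-period mean
    if ks_p > alpha and t_p > alpha:
        return "Maintained"    # neither test detects a change
    if ks_p < alpha and t_p > alpha:
        return "Unsure - t-test failed"
    return "Unsure - ks-test failed"


examples = [
    (0.01, -2.5, 0.02),
    (0.01, 2.5, 0.02),
    (0.40, 0.3, 0.60),
    (0.05, -1.0, 0.30),
    (0.30, -2.0, 0.05),
]
for ks_p, t_stat, t_p in examples:
    print(ks_p, t_stat, t_p, "->", classify_outcome(ks_p, t_stat, t_p))
```

A directional outcome is only reported when both tests agree that a change occurred; the sign of the t statistic then decides the direction.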


## Selecting an Alpha
With two tests with \\(\alpha\\) set at 0.1 each, the probability of observing a false statistically significant result in both tests is 1% (0.1 × 0.1 = 0.01), assuming the two tests behave independently.  

Typically, methods for dealing with multiple tests call for adjusting alpha in some way. However, these methods are designed for statistical investigations looking for a single significant result, ‘a discovery’. That is not the case here, where two statistical tests are required to return concurrent significant results.  

Setting alpha to 0.1 in both tests, so that the chance of a false positive ‘Improved’ or ‘Degraded’ result is about 1%, is suitably rigorous and reasonable for the task at hand.
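The 1% figure comes from multiplying the two alphas, which holds when the two tests are independent. Since both tests here are applied to the same pair of samples, a quick simulation is one way to see what the joint false-positive rate looks like in practice. The sketch below is illustrative only: the sample sizes, the normal distribution and the number of trials are assumptions, not values taken from the analysis.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(1)
alpha = 0.1
trials = 2000                  # assumed number of simulated comparisons
ks_hits = t_hits = both_hits = 0

for _ in range(trials):
    # Both samples come from the same distribution, so any
    # "significant" result is a false positive.
    pre = rng.normal(size=18)  # hypothetical pre-period sample size
    post = rng.normal(size=7)  # hypothetical post-period sample size
    ks_p = scipy.stats.ks_2samp(pre, post).pvalue
    t_p = scipy.stats.ttest_ind(pre, post, equal_var=False).pvalue
    ks_hits += int(ks_p < alpha)
    t_hits += int(t_p < alpha)
    both_hits += int(ks_p < alpha and t_p < alpha)

ks_rate = ks_hits / trials
t_rate = t_hits / trials
both_rate = both_hits / trials
print("KS alone:", ks_rate)
print("t-test alone:", t_rate)
print("both tests:", both_rate)
```

The joint rate can never exceed either individual rate, so requiring both tests to agree is at least as strict as using either test alone at the same alpha.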

In [0]:
alpha = 0.1

StatsResults = pd.DataFrame(data=[], columns=[
    'Catchment',
    'ID',
    'Metric',
    'Source',
    'Ks_2sampResult statistic',
    'Ks_2sampResult pvalue',
    "Welch's t-test statistic",
    "Welch's t-test pvalue",
    'Outcome',
    ])

for ID in finalfinalfresh['ID'].unique():

    Catchment = finalfinalfresh[finalfinalfresh['ID']
                                == ID].Catchment.unique()[0]
    StepDataFrame = siteloop(finalfinalfresh, ID, Catchment)
    # DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
    StatsResults = pd.concat([StatsResults, StepDataFrame])

StatsResults

#### Stats: Model data
Pre Basin Plan period: (1911 to mid 2009)

Post Basin Plan period: (2012 to mid 2019)

In [0]:
def siteloop(
    ResultsDataFrame845,
    ResultsDataFrame,
    ID,
    Catchment,
    quiet=True,
    ):
    '''Takes in the model and observed dataframes with the fresh/inflow ratio and water year,
    takes the pre Basin Plan period from the model data and the post period from the observed data,
    runs Welch's t-test and the KS two-sample test on both periods,
    returns the results dataframe'''
    pre = np.array(ResultsDataFrame845[(ResultsDataFrame845['water year'
                   ] < 2012) & (ResultsDataFrame845['ID']
                   == ID)]['Ratio'])
    post = np.array(ResultsDataFrame[(ResultsDataFrame['water year']
                    >= 2012) & (ResultsDataFrame['ID'] == ID)]['Ratio'])

    (ksStat, KsP) = scipy.stats.ks_2samp(pre, post)
    (tStat, tP) = scipy.stats.ttest_ind(pre, post, equal_var=False)

    Outcome = Significant(KsP, tStat, tP, alpha)

    if not quiet:
        print (ID, scipy.stats.ks_2samp(pre, post))
        print (ID, scipy.stats.ttest_ind(pre, post, equal_var=False))

    StepDataFrame = pd.DataFrame({
        'Catchment': [Catchment],
        'ID': [ID],
        'Metric': ['Flow Thresholds'],
        'Source': ['Model'],
        'Ks_2sampResult statistic': [ksStat],
        'Ks_2sampResult pvalue': [KsP],
        "Welch's t-test statistic": [tStat],
        "Welch's t-test pvalue": [tP],
        'Outcome': [Outcome],
        })

    return StepDataFrame


def Significant(
    Ksp,
    tStat,
    tP,
    alpha,
    ):
    '''Takes in results of statistical tests,
  compares the results of the two tests to an alpha value defined by the operator,
  returns the significance'''
    if Ksp < alpha and tStat < 0 and tP < alpha:
        outcome = 'Improved'
    elif tStat > 0 and Ksp < alpha and tP < alpha:
        outcome = 'Degraded'
    elif Ksp > alpha and tP > alpha:
        outcome = 'Maintained'
    elif Ksp < alpha and tP > alpha:
        outcome = 'Unsure - t-test failed'
    else:
        outcome = 'Unsure - ks-test failed'
    return outcome


In [0]:

StatsResults845 = pd.DataFrame(data=[], columns=[
    'Catchment',
    'ID',
    'Metric',
    'Source',
    'Ks_2sampResult statistic',
    'Ks_2sampResult pvalue',
    "Welch's t-test statistic",
    "Welch's t-test pvalue",
    'Outcome',
    ])

finalfresh845['ID'] = finalfresh845['gauge_number']

for ID in finalfresh845['ID'].unique():

    Catchment = finalfresh845[finalfresh845['ID']
                              == ID].Catchment.unique()[0]
    StepDataFrame = siteloop(finalfresh845, finalfinalfresh, ID,
                             Catchment)
    # DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
    StatsResults845 = pd.concat([StatsResults845, StepDataFrame])

StatsResults845


In [0]:
frames = [StatsResults845, StatsResults]
  
finalThresholdsresult = pd.concat(frames)
finalThresholdsresult