# Transmission Ratio Metric

An overarching aim of the Basin Plan is to increase flow through the river system. The transmission ratio is the primary (first-order) metric for examining whether the overall inflow-to-outflow volume has changed since the Basin Plan was first implemented. An increased transmission ratio indicates that a greater proportion of inflows passed through the river. Changes in the transmission ratio can arise from a number of factors, such as changes to river operations or climate. A straightforward transmission calculation represents the amount of flow that reaches the end-of-system gauges for each catchment. Within this umbrella measurement, the metrics listed below track which parts of the flow regime may have changed.

This analysis is not suitable for determining or tracking river losses, as it does not distinguish between water taken for use, water used on the floodplain for watering wetlands and forests, and water lost as it moves down a river channel. The analysis also does not account for the impacts of trade.
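
As a minimal sketch of the idea, the ratio for one catchment is the annual end-of-system outflow divided by the annual catchment inflow. The column names and volumes below are made up for illustration and are not taken from the datasets used in this notebook.

```python
import pandas as pd

# Hypothetical annual volumes for a single catchment (illustrative numbers only)
annual = pd.DataFrame({
    "water_year": [2010, 2011, 2012],
    "inflow": [1000.0, 2500.0, 800.0],   # total catchment inflow
    "outflow": [150.0, 900.0, 80.0],     # flow at the end-of-system gauge
})

# Transmission ratio: share of the inflow that reaches the end-of-system gauge
annual["ratio"] = annual["outflow"] / annual["inflow"]
print(annual)
```

A higher ratio in a given water year means a larger share of that year's inflow passed the end-of-system gauge.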
  
## Inputs: 

[AWRA-L inflow data](https://data.gov.au/data/dataset/e65078cd-808d-4514-ab60-17e597b9a883/resource/7442a111-2894-4572-aa41-1f488bf06636)

[End of system flows](https://data.gov.au/data/dataset/7c44535b-4a6a-432d-acff-00ec578ce7b9/resource/076140d9-f70c-48b0-a310-e9eae0c99021)

[Modeled flows Baseline 845](https://data.gov.au/data/dataset/9e3d2d32-33e7-4270-a8af-c655d6eb7710/resource/64cc37eb-19a0-4a80-b99b-6ddb4da71e49)

## Outputs:

[Results](https://data.gov.au/data/dataset/hydrologic-indicator-results-for-the-basin-plan-evaluation-2020)

In [0]:
import pandas as pd
from datetime import datetime
import numpy as np
import scipy.stats
import itertools
import warnings 
warnings.filterwarnings('ignore')

## Load Model data

Loading in the 845 Model Baseline. This scenario represents baseline conditions as specified in the Basin Plan (conditions as at 2009).

*Please note while the 845 scenario was part of the information base used to develop the Basin Plan run 871 has subsequently become the baseline scenario for legislative purposes.*

In [0]:
allsites_845_daily = \
    pd.read_csv('https://data.gov.au/data/dataset/9e3d2d32-33e7-4270-a8af-c655d6eb7710/resource/64cc37eb-19a0-4a80-b99b-6ddb4da71e49/download/modelledflows_modelrun845.csv', encoding='latin1'
                 )

In [0]:
# defining function to clean model data and convert to pandas dataframe

def removeHeader(PandasDataframe):
    """ Extracts a clean dataframe from a model run CSV
    Takes a pandas dataframe and removes the header information by looking for EOH
    Renames the columns and adds a datetime 'date' column
    """

    # find the end of header (EOH) row

    idx = \
        PandasDataframe.index[PandasDataframe[PandasDataframe.columns[0]]
                              == 'EOH'].tolist()

    # extract the data below the header (copied so columns can be added safely)

    data = PandasDataframe[idx[0] + 1:].copy()

    # the column names are on the row immediately above EOH

    data.columns = PandasDataframe.loc[idx[0] - 1].tolist()

    # build the date column; the day/month/year columns are named
    # differently depending on the model run

    if data.columns[0:3].tolist() == ['Dy', 'Mn', 'Year']:
        data['date'] = pd.to_datetime(data.Year.astype(int) * 10000
                + data.Mn.astype(int) * 100 + data.Dy.astype(int),
                format='%Y%m%d')

    if data.columns[0:3].tolist() == ['YYYY', 'MM', 'DD']:
        data['date'] = pd.to_datetime(data.YYYY.astype(int) * 10000
                + data.MM.astype(int) * 100 + data.DD.astype(int),
                format='%Y%m%d')

    return data


In [0]:
# cleaning and converting model 845 to pandas dataframe

allsites_845 = removeHeader(allsites_845_daily)
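
The header-stripping step can be tried on a toy frame. The sketch below assumes the layout that `removeHeader` relies on (free-text header rows, a row of column names, an `EOH` marker, then data); the node name `NODE_A` and all values are hypothetical.

```python
import pandas as pd

# Toy stand-in for a model-run CSV: header text, column names, 'EOH', then data
raw = pd.DataFrame([
    ["Model run 845", None, None, None],
    ["Dy", "Mn", "Year", "NODE_A"],
    ["EOH", None, None, None],
    ["30", "6", "1895", "102"],
    ["1", "7", "1895", "109"],
])

eoh = raw.index[raw[0] == "EOH"][0]        # row label of the end-of-header marker
data = raw.loc[eoh + 1:].copy()            # rows below the marker
data.columns = raw.loc[eoh - 1].tolist()   # names sit on the row just above 'EOH'

# Assemble a datetime column from the day/month/year parts
parts = data[["Year", "Mn", "Dy"]].astype(int)
parts.columns = ["year", "month", "day"]
data["date"] = pd.to_datetime(parts)
print(data)
```

Here the date is assembled with the dataframe form of `pd.to_datetime` rather than the integer arithmetic used in the notebook; both are intended to give the same dates.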

## Load catchment inflow data

Load daily inflow data for each catchment from the Australian Landscape Water Balance (AWRA-L) model.

The data is loaded, cleaned, and grouped by water year.

This dataset is stored in https://data.gov.au/data/dataset/e65078cd-808d-4514-ab60-17e597b9a883/resource/7442a111-2894-4572-aa41-1f488bf06636

Inflows are given by:

$$ \text{Inflows} = \text{Runoff} \times \text{Surface Area} $$

where runoff for 1911 to 2018/19 was provided by the Bureau of Meteorology’s (BoM) AWRA Modelling Team from the Australian Water Resources Assessment Landscape model (AWRA-L), version 6.0.

Surface area was calculated from a shapefile of the catchments (available [here](https://services8.arcgis.com/5xxEi7I2m6ml97fE/arcgis/rest/services/BASIN_PLAN_REGIONS/FeatureServer)).
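
A worked example of the formula, as a sketch only. The units are an assumption (the text above does not state them): runoff as a depth in millimetres and area in square kilometres, in which case the product is in megalitres. The numbers are hypothetical.

```python
# Assumed units: runoff depth in mm, catchment area in km^2.
# 1 mm over 1 km^2 = 0.001 m x 1,000,000 m^2 = 1,000 m^3 = 1 ML.
runoff_mm = 0.25        # hypothetical daily runoff depth
area_km2 = 36_000.0     # hypothetical catchment surface area

inflow_ML = runoff_mm * area_km2
print(f"{inflow_ML:.0f} ML/day")
```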

In [0]:
RawBOMData = pd.read_csv('http://az3mdbastg001.blob.core.windows.net/mdba-landingzone-dataset/resources/c474dda0-0bf6-48a1-9f41-8a72c7e5015c/raw-data-from-bom.csv')
RawBOMData.head()

Unnamed: 0,Column1,Barwon-Darling,Border Rivers,Campaspe,Condamine-Balonne,Eastern Mt Lofty Ranges,Goulburn-Broken,Gwydir,Lachlan,Loddon,Lower Darling,Macquarie-Castlereagh,Moonie,Murray,Murrumbidgee,Namoi,Ovens,Paroo,Warrego,Wimmera-Avoca
0,1/01/1911,280.945743,889.116774,448.18278,464.969232,731.547092,5352.450565,842.971632,3708.877278,244.310367,361.576414,1989.390134,10.016151,32348.09683,12736.19431,1018.08705,7820.42964,197.437945,311.490847,318.456924
1,2/01/1911,245.939033,931.281696,354.872346,551.652614,561.807131,4426.229832,869.601585,2982.273356,217.188593,267.220044,1854.414908,7.661954,25171.03952,10177.40817,1024.492008,6337.477974,175.114516,365.595952,284.131808
2,3/01/1911,221.136955,996.579011,287.529469,656.670069,443.33388,3747.495391,1056.964622,2521.92074,197.458266,207.665376,1802.773964,6.639534,19788.21003,8278.185418,1030.521136,5224.703577,161.117734,380.395537,259.23294
3,4/01/1911,228.037716,906.414042,238.892884,565.38187,360.467282,3252.306652,1344.195169,2588.127451,183.177053,172.462025,1928.902206,5.369864,16204.2339,7446.809863,1009.258218,4470.30592,152.364875,332.566101,240.716578
4,5/01/1911,282.894836,780.040817,211.475708,494.168382,302.34578,3000.776435,1054.972176,3047.736346,185.881273,172.42676,2834.161997,4.33717,15689.32201,7848.837431,1056.232902,4109.854376,156.725326,300.506739,231.350913


In [0]:
def transformPipline(RawDataframe):
    """
  Single function to transform raw dataframe from blob into a pandas dataframe ready for analysis
  """

  # Turn Column1 into Date

    DailyRunoffDataframe = RawDataframe.rename({'Column1':'Date'}, axis =1 )

    # total up northern basin catchments

    NorthernBasinCatchments = [
        'Barwon-Darling',
        'Border Rivers',
        'Condamine-Balonne',
        'Gwydir',
        'Macquarie-Castlereagh',
        'Moonie',
        'Namoi',
        'Paroo',
        'Warrego',
        ]

    DailyRunoffDataframe['Northern Basin'] = \
        DailyRunoffDataframe.apply(lambda row: \
                                   row[NorthernBasinCatchments].sum(),
                                   axis=1)

  # total up southern basin catchments

    SouthernBasinCatchments = [
        'Campaspe',
        'Eastern Mt Lofty Ranges',
        'Goulburn-Broken',
        'Lachlan',
        'Loddon',
        'Lower Darling',
        'Murray',
        'Murrumbidgee',
        'Ovens',
        'Wimmera-Avoca',
        ]

    DailyRunoffDataframe['Southern Basin'] = \
        DailyRunoffDataframe.apply(lambda row: \
                                   row[SouthernBasinCatchments].sum(),
                                   axis=1)

  # total up all catchments

    AllCatchments = NorthernBasinCatchments + SouthernBasinCatchments

    DailyRunoffDataframe['Total MDB'] = \
        DailyRunoffDataframe.apply(lambda row: \
                                   row[AllCatchments].sum(), axis=1)

  # convert to a datetime data type

    DailyRunoffDataframe['Date'] = \
        pd.to_datetime(DailyRunoffDataframe['Date'], format='%d/%m/%Y')

  # drop Nulls

    DailyRunoffDataframe = DailyRunoffDataframe.dropna()

    return DailyRunoffDataframe
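
The basin totals in `transformPipline` are built with a row-wise `apply`. As a hedged aside, the same totals can be computed with a column-wise sum, which is usually faster on long daily series. The toy frame below uses a two-catchment subset of the northern list and made-up values.

```python
import pandas as pd

daily = pd.DataFrame({
    "Gwydir": [1.0, 2.0],
    "Namoi": [3.0, 4.0],
    "Murray": [10.0, 20.0],
})
northern = ["Gwydir", "Namoi"]   # subset of the notebook's list, for illustration

# Row-wise apply, in the style of transformPipline
via_apply = daily.apply(lambda row: row[northern].sum(), axis=1)

# Column-wise equivalent: one vectorised call over the selected columns
via_sum = daily[northern].sum(axis=1)

print(via_sum.tolist())
```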


In [0]:
def waterYear(date):
    '''Takes in a date,
    returns its water year (July to June),
    labelled by the calendar year in which the water year starts'''
    if date.month <= 6:  # Jan to Jun belong to the water year that started the previous July
        year = date.year - 1
    else:  # Jul to Dec start a new water year
        year = date.year
    return int(year)
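
A vectorised version of the same rule is sketched below for reference. It assumes the dates are already a datetime Series; the sample dates are illustrative.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2011-06-30", "2011-07-01", "2012-01-15"]))

# July-December keep the calendar year; January-June belong to the
# water year that started the previous July.
water_year = dates.dt.year - (dates.dt.month <= 6).astype(int)
print(water_year.tolist())
```

30 June 2011 falls in water year 2010, while 1 July 2011 and 15 January 2012 both fall in water year 2011.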

In [0]:
DailyRunoffDataframe = transformPipline(RawBOMData)

# apply water year function to populate the water year column

DailyRunoffDataframe['Water Year'] = \
    DailyRunoffDataframe.apply(lambda row: waterYear(row['Date']),
                               axis=1)

In [0]:
# summing annual inflow by water year

AnnualisedInflow = DailyRunoffDataframe.groupby('Water Year').sum()

AnnualisedInflow = \
    AnnualisedInflow.rename(columns={'Northern Basin': 'Overall North '
                            , 'Southern Basin': 'Overall South ',
                            'Total MDB': 'Overall MDBA System '})

## Load gauges of interest for Transmission Ratio 
Gets gauges of interest and their associated gauge data.

Transmission ratio gauge information can be found: https://data.gov.au/data/dataset/7c44535b-4a6a-432d-acff-00ec578ce7b9/resource/076140d9-f70c-48b0-a310-e9eae0c99021

In [0]:
EosData = pd.read_csv('https://data.gov.au/data/dataset/7c44535b-4a6a-432d-acff-00ec578ce7b9/resource/076140d9-f70c-48b0-a310-e9eae0c99021/download/observedflows_transmissionofflows.csv', header=None)
EosData.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21
0,,Border Rivers,Campaspe,Condamine-Balonne,Condamine-Balonne,Goulburn-Broken,Gwydir,Lachlan,Loddon,Macquarie-Castlereagh,Macquarie-Castlereagh,Moonie,Murrumbidgee,Namoi,Ovens,Paroo,Warrego,Warrego,Wimmera-Avoca,Overall North,Overall South,Overall MDBA System
1,,BARWON RIVER AT MUNGINDI,CAMPASPE RIVER @ ROCHESTER,CULGOA RIVER AT D/S COLLERINA (KENEBREE),BOKHARA RIVER AT BOKHARA (GOODWINS),GOULBURN RIVER @ McCOYS BRIDGE,MEHI RIVER NEAR COLLARENEBRI,LACHLAN RIVER AT BOOLIGAL,LODDON RIVER @ APPIN SOUTH,MACQUARIE RIVER AT CARINDA (BELLS BRIDGE),MARTHAGUY CREEK AT CARINDA,MOONIE RIVER AT GUNDABLOUIE,MURRUMBIDGEE RIVER AT DOWNSTREAM BALRANALD WEIR,NAMOI RIVER AT GOANGRA,OVENS RIVER @ WANGARATTA,PAROO RIVER AT WILLARA CROSSING,WARREGO RIVER AT FORDS BRIDGE (MAIN CHANNEL),WARREGO RIVER AT FORDS BRIDGE BYWASH,WIMMERA RIVER @ LOCHIEL RAILWAY BRIDGE,DARLING RIVER AT WILCANNIA MAIN CHANNEL,Euston,River Murray at Lock 1 Downstream
2,Date,416001,406202,422006,422005,405232,418055,412005,407205,421012,421011,417001,410130,419026,403200,424002,423001,423002,415246,425008,414203,A4260903
3,1/07/1970,,,0,,,,,,88.938,0,,,,,,,,,,,5633.1072
4,2/07/1970,,,0,,,,,,85.673,0,,,,,,,,,,,5811.1776


## Transform gauge data (model and gauge)

Organising dataframe to get it ready for analysis:
- Putting gauge numbers as column headings
- stripping header information and using this data to filter the gauge mapping dataframe to only the locations of interest

#### Gauge data transformation

In [0]:
EosDataFrame = EosData.loc[3:].copy()
EosDataFrame.columns = map(str.strip,
                           EosData.loc[2].astype(str).tolist())
EosDataFrame['Date'] = EosDataFrame['Date'].apply(pd.to_datetime, format='%d/%m/%Y')

# 'Date' stays as a regular column (not the index) because later cells refer to it by name
EosDataFrame.head()

Unnamed: 0,Date,416001,406202,422006,422005,405232,418055,412005,407205,421012,421011,417001,410130,419026,403200,424002,423001,423002,415246,425008,414203,A4260903
3,1970-07-01,,,0,,,,,,88.938,0,,,,,,,,,,,5633.1072
4,1970-07-02,,,0,,,,,,85.673,0,,,,,,,,,,,5811.1776
5,1970-07-03,,,0,,,,,,83.92,0,,,,,,,,,,,7255.6128
6,1970-07-04,,,0,,,,,,81.858,0,,,,,,,,,,,9035.5392
7,1970-07-05,,,0,,,,,,80.21,0,,,,,,,,,,,9923.3856


In [0]:
CatchmentGaugeMapping = EosData.loc[0:2]
CatchmentGaugeMapping.columns = EosData.loc[2].str.strip().tolist()
CatchmentGaugeMapping = CatchmentGaugeMapping.drop('Date', axis=1)
CatchmentGaugeMapping = CatchmentGaugeMapping.transpose()
CatchmentGaugeMapping.columns = ["catchment",'Name', 'Gauge']

CatchmentGaugeMapping.head()

| | catchment | Name | Gauge |
|---|---|---|---|
| 416001 | Border Rivers | BARWON RIVER AT MUNGINDI | 416001 |
| 406202 | Campaspe | CAMPASPE RIVER @ ROCHESTER | 406202 |
| 422006 | Condamine-Balonne | CULGOA RIVER AT D/S COLLERINA (KENEBREE) | 422006 |
| 422005 | Condamine-Balonne | BOKHARA RIVER AT BOKHARA (GOODWINS) | 422005 |
| 405232 | Goulburn-Broken | GOULBURN RIVER @ McCOYS BRIDGE | 405232 |


In [0]:
EosDataFrame['water year'] = EosDataFrame.apply(lambda row: \
        waterYear(row['Date']), axis=1)

In [0]:
# Melting/transposing EosDataFrame

meltedEosDataFrame = pd.melt(EosDataFrame, id_vars=['Date', 'water year'
                             ], var_name='ID', value_name='Outflow')

meltedEosDataFrame.head()

| | Date | water year | ID | Outflow |
|---|---|---|---|---|
| 0 | 1970-07-01 | 1970 | 416001 | |
| 1 | 1970-07-02 | 1970 | 416001 | |
| 2 | 1970-07-03 | 1970 | 416001 | |
| 3 | 1970-07-04 | 1970 | 416001 | |
| 4 | 1970-07-05 | 1970 | 416001 | |


In [0]:
def gaugetocatchment(gaugeID):
  '''Takes gauge number,
  returns catchment based on the CatchmentGaugeMapping dataframe'''
  
  return CatchmentGaugeMapping.loc[gaugeID]['catchment']


gaugetocatchment('405232')


In [0]:
# Applying catchment to melted EosDataFrame

meltedEosDataFrame['Catchment'] = meltedEosDataFrame.apply(lambda x: \
        gaugetocatchment(x['ID']), axis=1)

meltedEosDataFrame.head()

| | Date | water year | ID | Outflow | Catchment |
|---|---|---|---|---|---|
| 0 | 1970-07-01 | 1970 | 416001 | | Border Rivers |
| 1 | 1970-07-02 | 1970 | 416001 | | Border Rivers |
| 2 | 1970-07-03 | 1970 | 416001 | | Border Rivers |
| 3 | 1970-07-04 | 1970 | 416001 | | Border Rivers |
| 4 | 1970-07-05 | 1970 | 416001 | | Border Rivers |
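
The gauge-to-catchment lookup above is applied row by row. A possible alternative, sketched on toy frames that mimic the shape of `CatchmentGaugeMapping` and `meltedEosDataFrame`, is `Series.map`, which looks every ID up against the mapping's index in one call. The outflow values are made up.

```python
import pandas as pd

# Toy versions of the mapping (indexed by gauge ID) and the melted flow frame
mapping = pd.DataFrame(
    {"catchment": ["Border Rivers", "Campaspe"]},
    index=["416001", "406202"],
)
melted = pd.DataFrame({
    "ID": ["416001", "406202", "416001"],
    "Outflow": [0.0, 5.0, 2.0],
})

# Each ID is matched against the index of the 'catchment' Series
melted["Catchment"] = melted["ID"].map(mapping["catchment"])
print(melted)
```

Unlike the `.loc` lookup, `map` returns NaN for an ID that is missing from the mapping instead of raising a `KeyError`, so unmatched gauges would need to be checked for explicitly.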


In [0]:
# Annualising EoS data

meltedEosDataFrame['Outflow'] = meltedEosDataFrame['Outflow'
        ].astype('float64')

AnnualisedEosDataFrame = meltedEosDataFrame[['Outflow', 'water year',
        'Catchment']].groupby(['Catchment', 'water year'],
                              as_index=False).sum()[['Outflow',
        'water year', 'Catchment']]
AnnualisedEosDataFrame.head()

# Melting/transposing inflow

AnnualisedInflow['Water Year'] = AnnualisedInflow.index
meltedinflows = pd.melt(AnnualisedInflow, id_vars=['Water Year'],
                        var_name='Catchment', value_name='inflow')
meltedinflows.head()


| | Water Year | Catchment | inflow |
|---|---|---|---|
| 0 | 1910 | Barwon-Darling | 126450.682659 |
| 1 | 1911 | Barwon-Darling | 227134.33317 |
| 2 | 1912 | Barwon-Darling | 164450.268088 |
| 3 | 1913 | Barwon-Darling | 81373.743705 |
| 4 | 1914 | Barwon-Darling | 64467.114776 |
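
One behaviour worth keeping in mind when annualising with `groupby(...).sum()`: pandas skips missing values, so a group with no observations at all sums to 0 rather than NaN. The sketch below shows this on made-up data, together with the `min_count=1` option that keeps such groups as NaN. Whether this matters for the gauges used here depends on their data coverage after 1994, which is not checked in this sketch.

```python
import numpy as np
import pandas as pd

flows = pd.DataFrame({
    "Catchment": ["A", "A", "B", "B"],
    "water year": [2000, 2000, 2000, 2000],
    "Outflow": [1.5, 2.5, np.nan, np.nan],   # catchment B has no observations
})

grouped = flows.groupby(["Catchment", "water year"])["Outflow"]
default_sum = grouped.sum()               # all-NaN group becomes 0.0
guarded_sum = grouped.sum(min_count=1)    # all-NaN group stays NaN

print(default_sum)
print(guarded_sum)
```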


#### Model data transformation

In [0]:
gauge_mapping = pd.read_csv("https://data.gov.au/data/dataset/7c44535b-4a6a-432d-acff-00ec578ce7b9/resource/265dd4f0-08e1-4485-be9e-0feac8deb5f0/download/gaugemapping.csv", usecols = [0,1,2,3])
gauge_mapping

gauge_mapping.loc[gauge_mapping.catchment == 'Overall South',
                  'catchment'] = 'Overall South '

In [0]:
# Filtering to locations of interest

allsites_845['water year'] = allsites_845.apply(lambda row: \
        waterYear(row['date']), axis=1)

justnodes_845 = allsites_845.drop(['Dy', 'Mn', 'Year', 'date'], axis=1)

listofcol = justnodes_845.columns.tolist()

Gauges = CatchmentGaugeMapping.index.tolist()


def GaugeToNode(gauge):
  '''Takes in a gauge, 
  returns the matching node'''
  node = gauge_mapping[gauge_mapping['gauge_number'] == gauge]['node']
  return node

match = []
for gauge in Gauges:
    mapping = GaugeToNode(gauge)
    if len(mapping) > 0:

        match.append(mapping.tolist())

merged = list(itertools.chain.from_iterable(match))

joinmerged = list(set(listofcol) & set(merged))

justEOSnodes_845 = justnodes_845[joinmerged].copy()

justEOSnodes_845['Water Year'] = allsites_845['water year']
# 'Water Year' stays as a regular column because the melt step later uses it as an id variable

justEOSnodes_845.head()

Unnamed: 0,EUSTDS,8FGCARR,7GOANGR,423001_,LOCK1US,422006_,422005_,ROCHEST,6COLARG,5MUNGDG,9GSWILC,8CARMAR,11FGBLW,10BOOLG,GSM-McC,417001_,424002_,Water Year
292,86206,102,112,0,43636,0,0,71,39,0,2,14,1522,200,2564,0,1128,1895
293,8534,109,106,0,43630,0,0,70,41,0,1,11,1521,201,2348,0,584,1895
294,7031,116,77,0,43615,0,0,72,67,0,1,8,1519,163,2064,0,377,1895
295,8399,122,99,0,43602,0,0,70,118,0,2,6,1485,105,1845,0,217,1895
296,10700,127,215,0,43610,0,0,71,83,0,5,4,1530,97,1314,0,164,1895
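
The node selection above goes through a set intersection, and Python sets do not guarantee an order, so the column order of the result can differ between runs. A sketch of an order-preserving alternative follows. It assumes a mapping table with `gauge_number` and `node` columns as in `gauge_mapping`; the node names, the third gauge and all values are hypothetical.

```python
import pandas as pd

gauge_mapping = pd.DataFrame({
    "gauge_number": ["416001", "406202", "999999"],
    "node": ["NODE_A", "NODE_B", "NODE_C"],
})
gauges = ["416001", "406202"]   # gauges of interest
model = pd.DataFrame({"NODE_B": [71, 70], "NODE_A": [0, 1], "OTHER": [9, 9]})

# Nodes whose gauge is of interest, in the order they appear in the mapping
nodes = gauge_mapping.loc[gauge_mapping["gauge_number"].isin(gauges), "node"]

# Keep only the nodes that are actually columns of the model output
keep = [n for n in nodes if n in model.columns]
eos_nodes = model[keep]
print(eos_nodes)
```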


## Calculating the Transmission Ratio

#### Gauge data ratio

In [0]:
def Catchmenttogauge(CatchmentID):
    '''Takes in catchment,
    returns gauge number based on the CatchmentGaugeMapping dataframe'''
    
    return CatchmentGaugeMapping[CatchmentGaugeMapping['catchment']
                                 == CatchmentID].index.values[0]


Catchmenttogauge('Campaspe')

In [0]:
# Filtering the data to only include observed flow data after the cap on diversions was introduced (1994) and calculating the transmission ratio:

MergedResults = pd.merge(meltedinflows, AnnualisedEosDataFrame,
                         left_on=['Water Year', 'Catchment'],
                         right_on=['water year', 'Catchment'])

MergedResults = MergedResults.drop(['water year'], axis=1)

MergedResults = MergedResults[MergedResults['Water Year'] >= 1994]

MergedResults = MergedResults[['Outflow', 'inflow', 'Water Year',
                              'Catchment']].groupby(['Catchment',
        'Water Year'], as_index=False).sum()[['Outflow', 'inflow',
        'Water Year', 'Catchment']]

MergedResults['ID'] = MergedResults.apply(lambda x: \
        Catchmenttogauge(x['Catchment']), axis=1)

MergedResults['Ratio'] = MergedResults.apply(lambda row: row['Outflow'] \
        / row['inflow'], axis=1)

Results = MergedResults
observedPostDF = Results
MergedResults.head()


| | Outflow | inflow | Water Year | Catchment | ID | Ratio |
|---|---|---|---|---|---|---|
| 0 | 21232.998 | 591665.7 | 1994 | Border Rivers | 416001 | 0.035887 |
| 1 | 1558200.349 | 3716136.0 | 1995 | Border Rivers | 416001 | 0.419307 |
| 2 | 674860.196 | 1363526.0 | 1996 | Border Rivers | 416001 | 0.494938 |
| 3 | 49775.651 | 839890.3 | 1997 | Border Rivers | 416001 | 0.059264 |
| 4 | 1653292.883 | 3208394.0 | 1998 | Border Rivers | 416001 | 0.515302 |
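
Because the merge is an inner join on the catchment name, any mismatch in the names (for example a trailing space, as in `'Overall North '`) silently drops that catchment from the results. The sketch below illustrates the effect on made-up data and one possible guard, stripping whitespace from both sides before merging. It is an illustration, not the approach taken in this notebook, which renames the affected catchments explicitly.

```python
import pandas as pd

inflows = pd.DataFrame({
    "Water Year": [2000, 2000],
    "Catchment": ["Overall North ", "Campaspe"],   # note the trailing space
    "inflow": [100.0, 50.0],
})
outflows = pd.DataFrame({
    "water year": [2000, 2000],
    "Catchment": ["Overall North", "Campaspe"],
    "Outflow": [20.0, 10.0],
})

keys = dict(left_on=["Water Year", "Catchment"],
            right_on=["water year", "Catchment"])

raw_merge = pd.merge(inflows, outflows, **keys)      # loses 'Overall North'

# Normalise the names on both sides, then merge again
inflows["Catchment"] = inflows["Catchment"].str.strip()
outflows["Catchment"] = outflows["Catchment"].str.strip()
clean_merge = pd.merge(inflows, outflows, **keys)
clean_merge["Ratio"] = clean_merge["Outflow"] / clean_merge["inflow"]

print(len(raw_merge), len(clean_merge))
```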


#### Model data ratio

In [0]:
def nodetoCatchment(nodeID):
  '''Takes in the node,
  matches the node to the catchment,
  returns the catchment'''
  
  return gauge_mapping[gauge_mapping['node'] == nodeID]['catchment'
                                                       ].values[0]

In [0]:
# Cleaning 845 series, and merging the 845 series with available gauge results for both the pre and post basin plan period:
melt845 = pd.melt(justEOSnodes_845, id_vars=['Water Year'],
                  var_name='Node', value_name='Outflow')
melt845['Outflow'] = melt845['Outflow'].astype(int)
melt845 = melt845.groupby(['Node', 'Water Year'], as_index=False).sum()

melt845['Catchment'] = melt845.apply(lambda row: \
        nodetoCatchment(row['Node']), axis=1)

melt845 = melt845.groupby(['Water Year', 'Catchment'],
                          as_index=False).sum()

melt845 = melt845[melt845['Water Year'] > 1910]
melt845 = melt845.set_index('Water Year')

meltedEosDataFrameinflows = \
    meltedinflows.replace({'Overall North ': 'Overall North',
                          'Overall MDBA System ': 'Overall MDBA System'
                          })

modelmerge = pd.merge(melt845, meltedEosDataFrameinflows, how='left',
                      left_on=['Water Year', 'Catchment'],
                      right_on=['Water Year', 'Catchment'])

modelmerge['Ratio'] = modelmerge.apply(lambda row: row['Outflow'] \
        / row['inflow'], axis=1)


In [0]:
# For results display the node is renamed as the gauge location it represents

modelmerge = modelmerge.replace({'Overall North': 'Overall North ',
                                'Overall MDBA System': 'Overall MDBA System '
                                })

modelmerge['ID'] = modelmerge.apply(lambda x: \
                                    Catchmenttogauge(x['Catchment']),
                                    axis=1)
modelmerge.head()


| | Water Year | Catchment | Outflow | inflow | Ratio | ID |
|---|---|---|---|---|---|---|
| 0 | 1911 | Border Rivers | 65688 | 505802.4 | 0.129869 | 416001 |
| 1 | 1911 | Campaspe | 83955 | 142921.9 | 0.587419 | 406202 |
| 2 | 1911 | Condamine-Balonne | 50766 | 635247.3 | 0.079915 | 422006 |
| 3 | 1911 | Goulburn-Broken | 1419999 | 1865151.0 | 0.761332 | 405232 |
| 4 | 1911 | Gwydir | 25264 | 349458.5 | 0.072295 | 418055 |


## Compare pre and post Basin Plan
Compare the pre and post Basin Plan transmission ratio using:
- Welch's t-test (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html)
- the KS two sample test (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html)
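
A minimal sketch of the two tests on synthetic ratios follows. The sample sizes and distributions are arbitrary choices for illustration and do not represent any catchment.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic transmission ratios, for illustration only
pre = rng.normal(loc=0.30, scale=0.05, size=18)
post = rng.normal(loc=0.45, scale=0.05, size=7)

# Welch's t-test: compares means without assuming equal variances
t_stat, t_p = stats.ttest_ind(pre, post, equal_var=False)

# Two-sample KS test: compares the whole empirical distributions
ks_stat, ks_p = stats.ks_2samp(pre, post)

# With (pre, post) ordering, a negative t statistic means mean(pre) < mean(post)
print(f"Welch: t={t_stat:.2f}, p={t_p:.4f}")
print(f"KS:    D={ks_stat:.2f}, p={ks_p:.4f}")
```

The sign of the t statistic is what the notebook later uses to tell an increase in the ratio from a decrease.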

## Selecting an \\(\alpha\\)
With two tests, each with alpha set at 0.1, the probability of observing a false statistically significant result in both tests is 0.1 × 0.1 = 1%, treating the two tests as independent.

Typically, methods for dealing with multiple tests call for adjusting alpha in some way. However, these methods are designed for statistical investigations looking for a single significant result, ‘a discovery’. This is not the case in the application of two statistical tests looking for concurrent significant results.

Setting alpha to 0.1 in both tests so that the chance of a false positive ‘increased’ or ‘decreased’ result is 1% is suitably rigorous and decidedly reasonable for the task at hand.
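
The 1% figure is the product of the two alphas. Since both tests are run on the same pair of samples, one way to see how close the joint rate comes to that product is a small simulation in which the pre and post samples come from the same distribution, so every "significant" result is a false positive. The sample sizes, trial count and normal distribution below are assumptions made for this sketch.

```python
import numpy as np
from scipy import stats

alpha = 0.1
product_of_alphas = alpha * alpha   # joint rate if the tests were independent

rng = np.random.default_rng(1)
n_trials = 1000
t_hits = ks_hits = both_hits = 0

for _ in range(n_trials):
    # Same distribution for both periods: there is no real change to detect
    pre = rng.normal(size=18)
    post = rng.normal(size=7)
    t_sig = bool(stats.ttest_ind(pre, post, equal_var=False).pvalue < alpha)
    ks_sig = bool(stats.ks_2samp(pre, post).pvalue < alpha)
    t_hits += t_sig
    ks_hits += ks_sig
    both_hits += t_sig and ks_sig

print("product of alphas:", product_of_alphas)
print("t-test alone:     ", t_hits / n_trials)
print("KS test alone:    ", ks_hits / n_trials)
print("both tests:       ", both_hits / n_trials)
```

By construction, the joint rate can never exceed either single-test rate.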

#### Stats: observed data
Pre Basin Plan period: (1994 to 2012)

Post Basin Plan period: (2012 to mid 2019)

In [0]:
# Stats using an observed baseline - takes ratio column from the Results dataframe and splits pre/post BP for each catchment. Performs both Welch t-test and 2 sample ks test using the pre and post datasets. Produces a dataframe with results

def siteloop(ResultsDataFrame, Catchment, quiet=True):
  '''Takes in dataframe with transmission ratio and water year,
  filters the dataframe to pre and post Basin Plan periods,
  runs Welch's t-test and the KS two-sample test on both periods,
  returns the results dataframe'''
  pre = np.array(ResultsDataFrame[(ResultsDataFrame['Water Year']
                 < 2012) & (ResultsDataFrame['Catchment']
                 == Catchment)]['Ratio'])
  post = np.array(ResultsDataFrame[(ResultsDataFrame['Water Year']
                  >= 2012) & (ResultsDataFrame['Catchment']
                  == Catchment)]['Ratio'])

  (ksStat, KsP) = scipy.stats.ks_2samp(pre, post)
  (tStat, tP) = scipy.stats.ttest_ind(pre, post, equal_var=False)
  ID = ResultsDataFrame[ResultsDataFrame['Catchment']
                        == Catchment]['ID'].iloc[0]

  Outcome = Significant(KsP, tStat, tP, alpha)

  if not quiet:
      print (Catchment, scipy.stats.ks_2samp(pre, post))
      print (Catchment, scipy.stats.ttest_ind(pre, post,
             equal_var=False))
    
  StepDataFrame = pd.DataFrame({
    "Catchment":[Catchment], 
    "ID":[ID], 
    "Metric":["Transmission Ratio"], 
    "Source":["Observed"],
    "Ks_2sampResult statistic":[ksStat], 
    "Ks_2sampResult pvalue":[KsP], 
    "Welch’s t-test statistic":[tStat], 
    "Welch’s t-test pvalue":[tP], 
    "Outcome":[Outcome]
    }) 
  
  return StepDataFrame

  
def Significant (Ksp, tStat, tP, alpha):
  '''Takes in results of statistical tests,
  compares the results of the two tests to an alpha value defined by the operator,
  returns the significance'''
  if ((Ksp < alpha) and (tStat <0) and (tP < alpha)):
    outcome = "Improved" 
  elif (tStat >0 and Ksp <alpha and tP < alpha):
    outcome = "Degraded" 
  elif (Ksp >alpha and tP > alpha):
    outcome = "Maintained" 
  elif (Ksp <alpha and tP > alpha):
    outcome = "Unsure - t-test failed" 
  else:
    outcome = "Unsure - ks-test failed"
  return outcome
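
The decision rules can be exercised in isolation. The sketch below restates them in a stand-alone helper (`classify` is a hypothetical name, not part of this notebook) and feeds in statistics copied from rows of the results tables in this notebook, plus one made-up case for the final branch.

```python
def classify(ks_p, t_stat, t_p, alpha=0.1):
    """Stand-alone restatement of the Significant() decision rules."""
    if ks_p < alpha and t_p < alpha and t_stat < 0:
        return "Improved"      # both tests significant, ratio went up
    if ks_p < alpha and t_p < alpha and t_stat > 0:
        return "Degraded"      # both tests significant, ratio went down
    if ks_p > alpha and t_p > alpha:
        return "Maintained"    # neither test significant
    if ks_p < alpha and t_p > alpha:
        return "Unsure - t-test failed"
    return "Unsure - ks-test failed"


cases = {
    "Campaspe (observed)": (0.000611, -2.832064, 0.010607),
    "Condamine-Balonne (observed)": (0.054803, 3.258805, 0.003459),
    "Border Rivers (observed)": (0.476043, 1.129347, 0.276108),
    "Border Rivers (model)": (0.076935, 1.409355, 0.203128),
    "made-up case": (0.2, -2.0, 0.05),
}
for name, (ks_p, t_stat, t_p) in cases.items():
    print(name, "->", classify(ks_p, t_stat, t_p))
```

A negative t statistic maps to "Improved" because the tests are called with the pre period first, so a negative value means the post period has the higher mean ratio.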
  

In [0]:
alpha = 0.1

StatsResults = pd.DataFrame(data=[],columns = [
  "Catchment", 
  "ID", 
  "Metric", 
  "Source", 
  "Ks_2sampResult statistic", 
  "Ks_2sampResult pvalue", 
  "Welch’s t-test statistic", 
  "Welch’s t-test pvalue", 
  "Outcome"
  ])

for Catchment in Results["Catchment"].unique():

  StepDataFrame = siteloop(Results, Catchment) 
  # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
  StatsResults = pd.concat([StatsResults, StepDataFrame])
  
StatsResults

| | Catchment | ID | Metric | Source | Ks_2sampResult statistic | Ks_2sampResult pvalue | Welch’s t-test statistic | Welch’s t-test pvalue | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Border Rivers | 416001 | Transmission Ratio | Observed | 0.349206 | 0.476043 | 1.129347 | 0.276108 | Maintained |
| 0 | Campaspe | 406202 | Transmission Ratio | Observed | 0.833333 | 0.000611 | -2.832064 | 0.010607 | Improved |
| 0 | Condamine-Balonne | 422006 | Transmission Ratio | Observed | 0.555556 | 0.054803 | 3.258805 | 0.003459 | Degraded |
| 0 | Goulburn-Broken | 405232 | Transmission Ratio | Observed | 0.722222 | 0.00458 | -3.232322 | 0.006615 | Improved |
| 0 | Gwydir | 418055 | Transmission Ratio | Observed | 0.357143 | 0.447066 | -0.410818 | 0.689706 | Maintained |
| 0 | Lachlan | 412005 | Transmission Ratio | Observed | 0.444444 | 0.199883 | -1.030214 | 0.332676 | Maintained |
| 0 | Loddon | 407205 | Transmission Ratio | Observed | 0.777778 | 0.001734 | -3.186506 | 0.004114 | Improved |
| 0 | Macquarie-Castlereagh | 421012 | Transmission Ratio | Observed | 0.333333 | 0.536573 | -0.492039 | 0.635144 | Maintained |
| 0 | Moonie | 417001 | Transmission Ratio | Observed | 0.380952 | 0.366222 | 1.213756 | 0.254527 | Maintained |
| 0 | Murrumbidgee | 410130 | Transmission Ratio | Observed | 0.555556 | 0.054803 | -1.986319 | 0.083309 | Improved |


#### Stats: Model data
Pre Basin Plan period: (1911 to mid 2009)

Post Basin Plan period: (2012 to mid 2019)

In [0]:
def modelloop(
    modelmergeDataFrame,
    observedPostDF,
    Catchment,
    quiet=True,
    ):
  '''Takes in the modelled and observed transmission ratio dataframes,
  uses the modelled ratios as the pre Basin Plan period and the observed ratios as the post period,
  runs Welch's t-test and the KS two-sample test on both periods,
  returns the results dataframe'''
  modelpre = \
      np.array(modelmergeDataFrame[(modelmergeDataFrame['Water Year']
               < 2012) & (modelmergeDataFrame['Catchment']
               == Catchment)]['Ratio'])
  post = np.array(observedPostDF[(observedPostDF['Water Year']
                  >= 2012) & (observedPostDF['Catchment']
                  == Catchment)]['Ratio'])
  (mksStat, mKsP) = scipy.stats.ks_2samp(modelpre, post)
  (mtStat, mtP) = scipy.stats.ttest_ind(modelpre, post,
          equal_var=False)
  ID = modelmergeDataFrame[modelmergeDataFrame['Catchment']
                           == Catchment]['ID'].iloc[0]

  Outcome = Significant(mKsP, mtStat, mtP, alpha)

  if not quiet:
      print (Catchment, scipy.stats.ks_2samp(modelpre, post))
      print (Catchment, scipy.stats.ttest_ind(modelpre, post,
             equal_var=False))

      
  ModelStepDataFrame = pd.DataFrame({
    "Catchment":[Catchment], 
    "ID":[ID], 
    "Metric":["Transmission Ratio"], 
    "Source":["Model"],
    "Ks_2sampResult statistic":[mksStat], 
    "Ks_2sampResult pvalue":[mKsP], 
    "Welch’s t-test statistic":[mtStat], 
    "Welch’s t-test pvalue":[mtP], 
    "Outcome":[Outcome]
    }) 
  
  return ModelStepDataFrame

def Significant (mKsp, mtStat, mtP, alpha):
  '''Takes in results of statistical tests,
  compares the results of the two tests to an alpha value defined by the operator,
  returns the significance'''
  if ((mKsp < alpha) and (mtStat <0) and (mtP < alpha)):
    outcome = "Improved" 
  elif (mtStat >0 and mKsp <alpha and mtP < alpha):
    outcome = "Degraded" 
  elif (mKsp >alpha and mtP >alpha):
    outcome = "Maintained" 
  elif (mKsp <alpha and mtP > alpha):
    outcome = "Unsure - t-test failed"
  else:
    outcome = "Unsure - ks-test failed"
  return outcome


In [0]:
ModelStatsResults = pd.DataFrame(data=[],columns = [
  "Catchment", 
  "ID", 
  "Metric", 
  "Source", 
  "Ks_2sampResult statistic", 
  "Ks_2sampResult pvalue", 
  "Welch’s t-test statistic", 
  "Welch’s t-test pvalue", 
  "Outcome"
  ])


for Catchment in modelmerge["Catchment"].unique():
  
  ModelStepDataFrame = modelloop(modelmerge,observedPostDF, 
                                 Catchment) 
  # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
  ModelStatsResults = pd.concat([ModelStatsResults, ModelStepDataFrame])
  

ModelStatsResults

| | Catchment | ID | Metric | Source | Ks_2sampResult statistic | Ks_2sampResult pvalue | Welch’s t-test statistic | Welch’s t-test pvalue | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Border Rivers | 416001 | Transmission Ratio | Model | 0.469388 | 0.076935 | 1.409355 | 0.203128 | Unsure - t-test failed |
| 0 | Campaspe | 406202 | Transmission Ratio | Model | 0.489796 | 0.057601 | 0.221735 | 0.828413 | Unsure - t-test failed |
| 0 | Condamine-Balonne | 422006 | Transmission Ratio | Model | 0.5 | 0.049611 | 3.668652 | 0.004154 | Degraded |
| 0 | Goulburn-Broken | 405232 | Transmission Ratio | Model | 0.306122 | 0.492484 | -0.350225 | 0.73559 | Maintained |
| 0 | Gwydir | 418055 | Transmission Ratio | Model | 0.234694 | 0.810114 | 0.120538 | 0.907816 | Maintained |
| 0 | Lachlan | 412005 | Transmission Ratio | Model | 0.255102 | 0.721793 | 0.288777 | 0.781572 | Maintained |
| 0 | Macquarie-Castlereagh | 421012 | Transmission Ratio | Model | 0.479592 | 0.066673 | 0.883821 | 0.40652 | Unsure - t-test failed |
| 0 | Moonie | 417001 | Transmission Ratio | Model | 0.163265 | 0.989209 | -0.274864 | 0.792051 | Maintained |
| 0 | Murrumbidgee | 410130 | Transmission Ratio | Model | 0.367347 | 0.271241 | 1.173659 | 0.277297 | Maintained |
| 0 | Namoi | 419026 | Transmission Ratio | Model | 0.653061 | 0.003649 | 4.897865 | 0.000493 | Degraded |


In [0]:
frames = [ModelStatsResults, StatsResults]

finalresult = pd.concat(frames)
finalresult

| | Catchment | ID | Metric | Source | Ks_2sampResult statistic | Ks_2sampResult pvalue | Welch’s t-test statistic | Welch’s t-test pvalue | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Border Rivers | 416001 | Transmission Ratio | Model | 0.469388 | 0.076935 | 1.409355 | 0.203128 | Unsure - t-test failed |
| 0 | Campaspe | 406202 | Transmission Ratio | Model | 0.489796 | 0.057601 | 0.221735 | 0.828413 | Unsure - t-test failed |
| 0 | Condamine-Balonne | 422006 | Transmission Ratio | Model | 0.5 | 0.049611 | 3.668652 | 0.004154 | Degraded |
| 0 | Goulburn-Broken | 405232 | Transmission Ratio | Model | 0.306122 | 0.492484 | -0.350225 | 0.73559 | Maintained |
| 0 | Gwydir | 418055 | Transmission Ratio | Model | 0.234694 | 0.810114 | 0.120538 | 0.907816 | Maintained |
| 0 | Lachlan | 412005 | Transmission Ratio | Model | 0.255102 | 0.721793 | 0.288777 | 0.781572 | Maintained |
| 0 | Macquarie-Castlereagh | 421012 | Transmission Ratio | Model | 0.479592 | 0.066673 | 0.883821 | 0.40652 | Unsure - t-test failed |
| 0 | Moonie | 417001 | Transmission Ratio | Model | 0.163265 | 0.989209 | -0.274864 | 0.792051 | Maintained |
| 0 | Murrumbidgee | 410130 | Transmission Ratio | Model | 0.367347 | 0.271241 | 1.173659 | 0.277297 | Maintained |
| 0 | Namoi | 419026 | Transmission Ratio | Model | 0.653061 | 0.003649 | 4.897865 | 0.000493 | Degraded |
