# Check duration of rainfall measurements

The observation files report rainfall since the last observation, which may not be one minute earlier. An additional field, "Period over which precipitation since last (AWS) observation is measured in minutes", records the length of that period. This notebook checks the values in that period field to ensure that precipitation values are not duplicated. This gives us confidence that we can sum the precipitation values to get the accumulated precipitation before/after a daily peak gust value.
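As a minimal sketch of the kind of sum we want to be able to trust, the example below accumulates rainfall in a short window either side of a gust time. The data are made up for illustration, and the helper name `accumulated_rainfall`, the three-minute window and the choice to count the gust-time observation in the "before" total are all assumptions rather than part of the analysis that follows.

```python
import pandas as pd

# Hypothetical one-minute record; values are made up for illustration.
idx = pd.date_range("2020-01-01 00:00", periods=10, freq="min", tz="UTC")
obs = pd.DataFrame(
    {
        "rainfall": [0.0, 0.2, 0.0, 0.4, 0.2, 0.0, 0.2, 0.0, 0.0, 0.2],
        "rain_duration": [1.0] * 10,
    },
    index=idx,
)


def accumulated_rainfall(obs, gust_time, minutes=3):
    """Sum rainfall over `minutes` observations before and after `gust_time`.

    Assumption: the observation at `gust_time` reports rain that fell up to
    that time, so it is counted in the "before" total.
    """
    window = pd.Timedelta(minutes=minutes)
    one_min = pd.Timedelta(minutes=1)
    # Label-based slices on a DatetimeIndex include both end points.
    before = obs.loc[gust_time - window + one_min : gust_time, "rainfall"].sum()
    after = obs.loc[gust_time + one_min : gust_time + window, "rainfall"].sum()
    return before, after


gust_time = idx[4]
before, after = accumulated_rainfall(obs, gust_time)
print(before, after)
```

These sums are only meaningful if each rainfall value covers a period that does not overlap its neighbours, which is what the rest of this notebook checks.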

In [1]:
import pandas as pd
from datetime import datetime, timedelta
from stndata import ONEMINUTESTNNAMES, ONEMINUTEDTYPE, ONEMINUTENAMES

TZ = {
    "QLD": 10,
    "NSW": 10,
    "VIC": 10,
    "TAS": 10,
    "SA": 9.5,
    "NT": 9.5,
    "WA": 8,
    "ANT": 0,
}

In [3]:
def loadDataFile(filename, stnState):
    try:
        df = pd.read_csv(
            filename,
            sep=",",
            index_col=False,
            dtype=ONEMINUTEDTYPE,
            names=ONEMINUTENAMES,
            header=0,
            parse_dates={"datetimeLST": [7, 8, 9, 10, 11]},
            na_values=["####"],
            skipinitialspace=True,
        )
    except Exception as err:
        print(f"Cannot load data from {filename}: {err}")
        # Re-raise: df is undefined here, so continuing would fail with a NameError
        raise

    df["datetimeLST"] = pd.to_datetime(df.datetimeLST, format="%Y %m %d %H %M")
    df["datetime"] = df.datetimeLST - timedelta(hours=TZ[stnState])
    df["date"] = df.datetime.dt.date
    # First have to set the datetime as index, then localize to UTC:
    df.set_index("datetime", inplace=True)
    df.set_index(df.index.tz_localize(tz="UTC"), inplace=True)
    return df

In [4]:
filename = r"X:\georisk\HaRIA_B_Wind\data\raw\from_bom\2023\1-minute\HD01D_Data_066037_9999999910402980.txt"
df = loadDataFile(filename, "NSW")

In [16]:
df[(df.rain_duration>1) & (df.rainfall>0.0)][['rainfall', 'rainq', 'rain_duration']]

datetime,rainfall,rainq,rain_duration
1999-02-08 04:44:00+00:00,0.2,N,4.0
1999-02-28 22:07:00+00:00,0.2,N,4.0
1999-03-13 10:39:00+00:00,0.2,N,3.0
1999-05-01 00:00:00+00:00,0.2,N,2.0
1999-07-13 02:14:00+00:00,0.4,N,4.0
...,...,...,...
2013-03-01 02:50:00+00:00,0.2,Y,2.0
2013-03-01 04:00:00+00:00,0.2,Y,2.0
2013-03-01 04:30:00+00:00,0.4,Y,2.0
2017-07-12 03:27:00+00:00,0.2,Y,2.0


In [19]:
df[(df.rain_duration>1) & (df.rainfall>0.0)]['rainq'].value_counts()

Y    1671
N      25
S       1
Name: rainq, dtype: int64

In [23]:
# Select the multi-minute rainfall observations once, then cross-tabulate
multi = df[(df.rain_duration > 1) & (df.rainfall > 0.0)]
pd.crosstab(multi['rain_duration'], multi['rainq'])

rain_duration,rainq=N,rainq=S,rainq=Y
2.0,13,1,1633
3.0,1,0,2
4.0,3,0,7
5.0,1,0,0
6.0,0,0,5
7.0,0,0,3
8.0,0,0,1
9.0,0,0,1
10.0,0,0,1
11.0,1,0,1


So at Sydney Airport, there are 1697 observations where the rainfall period is longer than 1 minute and precipitation was recorded in that period. Of those, only 26 are flagged as not quality checked (`N`, 25) or suspect (`S`, 1). We can potentially use the quality flag to filter out suspect rainfall accumulations.
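A minimal sketch of such a filter is shown below on a small made-up table. The helper name `filter_quality` and the default of keeping only `Y` are assumptions. Whether to drop `N` as well as `S` is a judgment call that this notebook does not settle.

```python
import pandas as pd

# Hypothetical sample; values are made up for illustration.
sample = pd.DataFrame(
    {
        "rainfall": [0.2, 0.4, 0.2, 0.2],
        "rainq": ["Y", "N", "S", "Y"],
        "rain_duration": [2.0, 4.0, 2.0, 3.0],
    }
)


def filter_quality(obs, keep=("Y",)):
    """Keep only rows whose rainfall quality flag is in `keep`."""
    return obs[obs["rainq"].isin(keep)]


multi_minute = sample[(sample.rain_duration > 1) & (sample.rainfall > 0.0)]
checked = filter_quality(multi_minute)
dropped = len(multi_minute) - len(checked)
print(f"kept {len(checked)} rows, dropped {dropped}")
```

Passing `keep=("Y", "N")` would drop only the observations explicitly flagged as suspect.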

Next, we check the rainfall recorded in adjacent observations, within five minutes either side of each multi-minute accumulation.

In [17]:
idx = df[(df.rain_duration>1) & (df.rainfall>0.0)].index
for i in idx:
    startdt = i - timedelta(minutes=5)
    enddt = i + timedelta(minutes=5)
    print(df.loc[startdt:enddt][['rainfall', 'rainq', 'rain_duration']])

                           rainfall rainq  rain_duration
datetime                                                
1999-02-08 04:39:00+00:00       0.0     N            1.0
1999-02-08 04:40:00+00:00       0.2     N            1.0
1999-02-08 04:41:00+00:00       NaN   NaN            NaN
1999-02-08 04:42:00+00:00       NaN   NaN            NaN
1999-02-08 04:43:00+00:00       NaN   NaN            NaN
1999-02-08 04:44:00+00:00       0.2     N            4.0
1999-02-08 04:45:00+00:00       0.2     N            1.0
1999-02-08 04:46:00+00:00       0.0     N            1.0
1999-02-08 04:47:00+00:00       0.2     N            1.0
1999-02-08 04:48:00+00:00       0.0     N            1.0
1999-02-08 04:49:00+00:00       0.0     N            1.0
                           rainfall rainq  rain_duration
datetime                                                
1999-02-28 22:02:00+00:00       0.0     N            1.0
1999-02-28 22:03:00+00:00       0.0     N            1.0
1999-02-28 22:04:00+00:00      

This produces a lot of output, so it can be collapsed.
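As an alternative to reading the printed windows, the check could be summarised numerically. The sketch below sums, for each multi-minute accumulation, any rainfall already reported by earlier observations inside the same period. The sample reproduces the first window printed above. The helper name `overlapping_rain` is hypothetical, and so is the assumption that an accumulation of n minutes stamped at time t covers the observations from t - (n - 1) minutes through t.

```python
import numpy as np
import pandas as pd

# First window from the output above (Sydney Airport, 1999-02-08).
idx = pd.date_range("1999-02-08 04:39", periods=11, freq="min", tz="UTC")
sample = pd.DataFrame(
    {
        "rainfall": [0.0, 0.2, np.nan, np.nan, np.nan, 0.2, 0.2, 0.0, 0.2, 0.0, 0.0],
        "rain_duration": [1.0, 1.0, np.nan, np.nan, np.nan, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    },
    index=idx,
)


def overlapping_rain(obs):
    """Rainfall reported by earlier observations inside each multi-minute period."""
    multi = obs[(obs.rain_duration > 1) & (obs.rainfall > 0.0)]
    one_min = pd.Timedelta(minutes=1)
    result = {}
    for ts, row in multi.iterrows():
        # Assumed start of the accumulation period (see lead-in).
        start = ts - pd.Timedelta(minutes=float(row["rain_duration"]) - 1)
        # Earlier observations within the period; missing values sum to zero.
        result[ts] = obs.loc[start : ts - one_min, "rainfall"].sum()
    return pd.Series(result, dtype=float)


overlap = overlapping_rain(sample)
print(overlap)
```

A value of zero suggests that the multi-minute observation fills a gap of missing records rather than repeating rainfall that was already counted. A non-zero value would flag a period worth inspecting by hand.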