# Rain gauge checks for quality control flags
Covers the suspect-gauge checks QC1 to QC7.

## Table of contents
[QC1 Percentiles](#QC1---Percentiles)  
[QC2 K-largest](#QC2---K-largest)  
[QC3 Days of week](#QC3---Days-of-week)  
[QC4 Hours of day](#QC4---Hours-of-day)  
[QC5 Intermittency](#QC5---Intermittency)  
[QC6 Breakpoints](#QC6---Breakpoints)  
[QC7 Minimum value change](#QC7---Minimum-value-change)  

See '3.3 Suspect gauges' in Lewis et al. (2021)

In [1]:
import datetime
import polars as pl
import pandas as pd
import numpy as np

In [132]:
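def read_intense_metadata(data_path):
    """Read the 'key: value' header lines of an INTENSE gauge file into a dict."""
    metadata = {}

    with open(data_path, 'r') as f:
        for line in f:
            # Split on the first colon only, since values such as Windows paths contain colons
            key, val = line.strip().split(':', maxsplit=1)
            key = key.strip().lower().replace(' ', '_')
            metadata[key] = val.strip()
            # 'other' is the last header line; the data values follow it
            if key == 'other':
                break
    return metadata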

In [134]:
metadata = read_intense_metadata(data_path='../data/gauge_data/DE_02483.txt')
startdate = datetime.datetime.strptime(metadata['start_datetime'], '%Y%m%d%H')
enddate = datetime.datetime.strptime(metadata['end_datetime'], '%Y%m%d%H')

In [151]:
metadata

{'station_id': 'DE_02483',
 'country': 'Germany',
 'original_station_number': '02483',
 'original_station_name': 'NA',
 'path_to_original_data': 'B:/INTENSE data/Original data/Germany/hourly/precipitation/historical/stundenwerte_RR_02483_19951012_20141231_hist.zip,B:/INTENSE data/Original data/Germany/hourly/precipitation/recent/stundenwerte_RR_02483_akt.zip',
 'latitude': '51.1803',
 'longitude': '8.4891',
 'start_datetime': '2006010100',
 'end_datetime': '2010123123',
 'elevation': '839m',
 'number_of_records': '43824',
 'percent_missing_data': '0.00',
 'original_timestep': '1hr',
 'new_timestep': '1hr',
 'original_units': 'mm',
 'new_units': 'mm',
 'time_zone': 'CET',
 'daylight_saving_info': 'NA',
 'no_data_value': '-999',
 'resolution': '0.10',
 'other': ''}

In [154]:
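# Use double quotes inside the f-string: reusing the outer quote type is a syntax error before Python 3.12
precip_col = f'precip_{metadata["original_units"]}'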

In [155]:
def get_delta(d1, d2):
    delta = d2 - d1
    return delta

hourly_date_interval = []
delta = get_delta(startdate, enddate+datetime.timedelta(days=1))
for i in range(delta.days * 24):
    hourly_date_interval.append(startdate + datetime.timedelta(hours=i))
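The hourly timeline built above can also be derived directly from the number of whole hours between the two timestamps. The following is a minimal sketch using only the standard library; `hourly_range` is a hypothetical helper name, not part of the notebook or of `intenseqc`, and the start and end values are copied from the metadata shown earlier.

```python
import datetime


def hourly_range(start, end):
    """Return every hourly timestamp from `start` to `end`, both inclusive."""
    # Whole hours between the two timestamps, plus one to include the end hour
    n_hours = int((end - start).total_seconds() // 3600) + 1
    return [start + datetime.timedelta(hours=i) for i in range(n_hours)]


# Values taken from 'start_datetime' and 'end_datetime' in the metadata
start = datetime.datetime(2006, 1, 1, 0)
end = datetime.datetime(2010, 12, 31, 23)

timeline = hourly_range(start, end)
print(len(timeline))  # 43824, matching 'number_of_records'
```

Counting hours rather than days avoids adding an extra day to the end date, and the length can be checked against `number_of_records` before attaching the timestamps to the data.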


In [157]:
gauge_data = pl.read_csv('../data/gauge_data/DE_02483.txt', skip_rows=20)

assert len(gauge_data) == len(hourly_date_interval)

## Set time column
gauge_data = gauge_data.with_columns(time=pl.Series(hourly_date_interval))

## Rename
gauge_data = gauge_data.rename({'Other: ': precip_col})

## Reorder (to look nice)
gauge_data = gauge_data.select(['time', precip_col])

In [158]:
gauge_data.head()

time,precip_mm
datetime[μs],f64
2006-01-01 00:00:00,0.9
2006-01-01 01:00:00,0.3
2006-01-01 02:00:00,0.3
2006-01-01 03:00:00,0.0
2006-01-01 04:00:00,0.0


# QC1 - Percentiles 
[Back to Index](#Table-of-contents)

#### Differences from `intenseqc`: 
- 

In [207]:
perc95 = gauge_data.group_by_dynamic('time', every='1y').agg(pl.quantile("precip_mm", .95))
perc99 = gauge_data.group_by_dynamic('time', every='1y').agg(pl.quantile("precip_mm", .99))

In [208]:
list(perc95.filter(pl.col("precip_mm") == 0)['time'].dt.year())

[]

In [209]:
list(perc99.filter(pl.col("precip_mm") == 0)['time'].dt.year())

[]

In [210]:
perc50 = gauge_data.group_by_dynamic('time', every='1y').agg(pl.quantile("precip_mm", .50))
list(perc50.filter(pl.col("precip_mm") == 0)['time'].dt.year())

[2006, 2007, 2008, 2009, 2010]

# QC2 - K-largest
[Back to Index](#Table-of-contents)

#### Differences from `intenseqc`: 
- 

# QC3 - Days of week
[Back to Index](#Table-of-contents)

#### Differences from `intenseqc`: 
- 

# QC4 - Hours of day
[Back to Index](#Table-of-contents)

#### Differences from `intenseqc`: 
- 

# QC5 - Intermittency 
[Back to Index](#Table-of-contents)

#### Differences from `intenseqc`: 
- 

# QC6 - Breakpoints 
[Back to Index](#Table-of-contents)

#### Differences from `intenseqc`: 
- 

# QC7 - Minimum value change 
[Back to Index](#Table-of-contents)

#### Differences from `intenseqc`: 
- 