# Neighbour checks for quality control flags
Covers QC16-25

## Table of contents
[QC16 Daily neighbours (wet)](#QC16---Daily-neighbours-wet)  
[QC17 Hourly neighbours (wet)](#QC17---Hourly-neighbours-wet)  
[QC18 Daily neighbours (dry)](#QC18---Daily-neighbours-dry)  
[QC19 Hourly neighbours (dry)](#QC19---Hourly-neighbours-dry)  
[QC20 Monthly neighbours](#QC20---Monthly-neighbours)  
[QC21 Timing offset](#QC21---Timing-offset)  
[QC22 Pre-QC Affinity](#QC22---Pre-QC-Affinity)  
[QC23 Pre-QC Pearson](#QC23---Pre-QC-Pearson)  
[QC24 Daily factor](#QC24---Daily-factor)  
[QC25 Monthly factor](#QC25---Monthly-factor)  

See Section 3.3.4, 'Neighbouring gauge checks on large values', in Lewis et al. (2021).

In [47]:
import datetime
import glob

import polars as pl

import geopy.distance
import scipy.spatial

In [16]:
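def read_metadata(data_path):
    """Read the 'key: value' header lines of a gauge file into a dict."""
    metadata = {}

    with open(data_path, 'r') as f:
        # Iterating over the file stops cleanly at EOF if no 'Other' line is found
        for line in f:
            # Split on the first colon only, since values (e.g. paths) may contain colons
            key, val = line.strip().split(':', maxsplit=1)
            key = key.strip().lower().replace(' ', '_')
            metadata[key] = val.strip()
            # 'other' is the last header field to read
            if key == 'other':
                break
    return metadata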

In [18]:
## Step 1. Make or read summary metadata of stations

In [19]:
## Could work by checking if metadata already exists (or user can input)
all_gauge_data = glob.glob("../data/gauge_data/*.txt")
all_gauge_data

['../data/gauge_data/DE_06303.txt',
 '../data/gauge_data/DE_02718.txt',
 '../data/gauge_data/DE_04313.txt',
 '../data/gauge_data/DE_00390.txt',
 '../data/gauge_data/DE_02483.txt',
 '../data/gauge_data/DE_03215.txt',
 '../data/gauge_data/DE_00389.txt',
 '../data/gauge_data/DE_06264.txt',
 '../data/gauge_data/DE_01300.txt',
 '../data/gauge_data/DE_04488.txt',
 '../data/gauge_data/DE_00310.txt']

In [22]:
# One metadata dict per gauge file
all_station_metadata_list = [read_metadata(data_path=path) for path in all_gauge_data]


In [101]:
all_station_metadata = pl.from_dicts(all_station_metadata_list)
all_station_metadata = all_station_metadata.with_columns(
    pl.col("latitude").cast(pl.Float64),
    pl.col("longitude").cast(pl.Float64),
    (pl.col("start_datetime")+'00').str.strptime(pl.Datetime, "%Y%m%d%H%M"),
    (pl.col("end_datetime")+'00').str.strptime(pl.Datetime, "%Y%m%d%H%M"),
)
all_station_metadata.head()

| station_id | country | original_station_number | original_station_name | path_to_original_data | latitude | longitude | start_datetime | end_datetime | elevation | number_of_records | percent_missing_data | original_timestep | new_timestep | original_units | new_units | time_zone | daylight_saving_info | no_data_value | resolution | other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| str | str | str | str | str | f64 | f64 | datetime[μs] | datetime[μs] | str | str | str | str | str | str | str | str | str | str | str | str |
| "DE_06303" | "Germany" | "06303" | "NA" | "B:/INTENSE data/Original data/… | 51.2884 | 8.5907 | 2006-01-01 00:00:00 | 2010-12-31 23:00:00 | "588m" | "43824" | "0.00" | "1hr" | "1hr" | "mm" | "mm" | "CET" | "NA" | "-999" | "0.10" | "" |
| "DE_02718" | "Germany" | "02718" | "NA" | "B:/INTENSE data/Original data/… | 51.288 | 8.7928 | 2006-01-01 00:00:00 | 2010-12-31 23:00:00 | "458m" | "43824" | "0.00" | "1hr" | "1hr" | "mm" | "mm" | "CET" | "NA" | "-999" | "0.10" | "" |
| "DE_04313" | "Germany" | "04313" | "NA" | "B:/INTENSE data/Original data/… | 51.4966 | 8.4342 | 2006-01-01 00:00:00 | 2010-12-31 23:00:00 | "361m" | "43824" | "0.00" | "1hr" | "1hr" | "mm" | "mm" | "CET" | "NA" | "-999" | "0.10" | "" |
| "DE_00390" | "Germany" | "00390" | "NA" | "B:/INTENSE data/Original data/… | 50.9837 | 8.3679 | 2006-01-01 00:00:00 | 2010-12-31 23:00:00 | "610m" | "43824" | "0.00" | "1hr" | "1hr" | "mm" | "mm" | "CET" | "NA" | "-999" | "0.10" | "" |
| "DE_02483" | "Germany" | "02483" | "NA" | "B:/INTENSE data/Original data/… | 51.1803 | 8.4891 | 2006-01-01 00:00:00 | 2010-12-31 23:00:00 | "839m" | "43824" | "0.00" | "1hr" | "1hr" | "mm" | "mm" | "CET" | "NA" | "-999" | "0.10" | "" |


In [102]:
# Pairwise geodesic distances (km) between all stations
# ALTERNATIVE: do we want to avoid using geopy.distance and simply write a distance function?
station_coords = all_station_metadata[['latitude', 'longitude']].rows()
all_station_dist_mtx = scipy.spatial.distance.cdist(
    station_coords,
    station_coords,
    metric=lambda pnt1, pnt2: geopy.distance.geodesic(pnt1, pnt2).kilometers,
)
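
On the alternative raised in the comment above: a minimal sketch of a hand-written distance function, assuming a spherical Earth (the haversine formula) is accurate enough for choosing neighbours. `haversine_km` and `EARTH_RADIUS_KM` are hypothetical names, not part of this notebook or of `geopy`. Unlike `geopy.distance.geodesic`, which works on an ellipsoid, the spherical result can differ by up to roughly half a percent.

```python
import math

# Assumption: mean Earth radius in km, i.e. the Earth is treated as a sphere
EARTH_RADIUS_KM = 6371.0088


def haversine_km(pnt1, pnt2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, pnt1)
    lat2, lon2 = map(math.radians, pnt2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Coordinates of DE_06303 and DE_02718 from the metadata table
d = haversine_km((51.2884, 8.5907), (51.288, 8.7928))
print(f"{d:.2f} km")
```

For this pair the result should land close to the 14.1 km that `geopy` reports in the distance matrix. Because the function takes two `(latitude, longitude)` points, it should also be usable as the `metric` callable passed to `cdist`, which would remove the `geopy` dependency.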

In [103]:
print(all_station_dist_mtx.shape)
all_station_dist_mtx

(11, 11)


array([[ 0.        , 14.09894683, 25.59691371, 37.3130221 , 13.96385857,
        15.87407843, 32.40525435, 14.59959055, 30.54248194, 24.09764638,
        25.0005783 ],
       [14.09894683,  0.        , 34.08234805, 45.06119283, 24.36175299,
        14.45974999, 39.51940345, 17.2247529 , 44.56675361, 37.63594424,
        30.46503547],
       [25.59691371, 34.08234805,  0.        , 57.24912491, 35.39722566,
        41.38815216, 53.60203785, 17.56187361, 33.20205466, 33.63868242,
        48.42088382],
       [37.3130221 , 45.06119283, 57.24912491,  0.        , 23.46277837,
        31.70877799,  5.66497103, 51.79901752, 33.51654307, 26.45625271,
        15.01638048],
       [13.96385857, 24.36175299, 35.39722566, 23.46277837,  0.        ,
        15.71007358, 18.84434242, 28.34376957, 24.64263808, 15.92931114,
        13.13459419],
       [15.87407843, 14.45974999, 41.38815216, 31.70877799, 15.71007358,
         0.        , 26.05191884, 27.74211572, 40.03766665, 31.55486415,
        16.724

In [None]:
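def compute_overlap_days(start_1, end_1, start_2, end_2):
    """Return the number of whole days for which two record periods overlap (0 if none)."""
    ## TODO: add cast to datetime functionality/checks
    overlap_start = max(start_1, start_2)
    overlap_end = min(end_1, end_2)

    # timedelta.days drops any partial day; clamp to 0 when the periods do not overlap
    overlap_days = max(0, (overlap_end - overlap_start).days)

    return overlap_days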
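
One possible way to address the TODO above, sketched under assumptions: inputs are either `datetime.datetime` objects or strings in `YYYYMMDDHH` form (consistent with the `'00'` minutes appended when the metadata is parsed). `to_datetime` and `overlap_days_checked` are hypothetical helper names, not existing functions in this notebook.

```python
import datetime


def to_datetime(value, fmt="%Y%m%d%H"):
    """Return `value` as a datetime, parsing strings with `fmt` (an assumed format)."""
    if isinstance(value, datetime.datetime):
        return value
    if isinstance(value, str):
        return datetime.datetime.strptime(value, fmt)
    raise TypeError(f"Cannot interpret {value!r} as a datetime")


def overlap_days_checked(start_1, end_1, start_2, end_2):
    """Whole days of overlap between two periods, after coercing and validating inputs."""
    start_1, end_1, start_2, end_2 = (
        to_datetime(v) for v in (start_1, end_1, start_2, end_2)
    )
    if start_1 > end_1 or start_2 > end_2:
        raise ValueError("each start must not be later than its end")
    overlap = min(end_1, end_2) - max(start_1, start_2)
    # A negative timedelta means the periods do not overlap at all
    return max(0, overlap.days)


same_period = overlap_days_checked("2006010100", "2010123123", "2006010100", "2010123123")
disjoint = overlap_days_checked("2006010100", "2006123123", "2008010100", "2008123123")
```

Raising on unsupported types rather than guessing keeps silent mis-parsing out of the neighbour selection; whether strings should be accepted at all is a design choice left open here.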

In [111]:
start_1, end_1 = all_station_metadata['start_datetime'][0], all_station_metadata['end_datetime'][0]
start_2, end_2 = all_station_metadata['start_datetime'][1], all_station_metadata['end_datetime'][1]
overlap_days = compute_overlap_days(start_1, end_1, start_2, end_2)

In [113]:
print(overlap_days)

1825
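
With a distance matrix and an overlap measure in hand, the two can be combined to pick neighbours for a target gauge. The sketch below is illustrative only: `select_neighbours` is a hypothetical helper, and the thresholds (`max_dist_km`, `min_overlap_days`, `max_neighbours`) are assumed placeholder values, not settings taken from this notebook. The overlap values in the example are made up.

```python
def select_neighbours(dist_row, overlap_row, target_idx,
                      max_dist_km=50.0, min_overlap_days=3 * 365, max_neighbours=10):
    """Indices of the nearest stations that are close enough and overlap long enough.

    dist_row:    distances (km) from the target station to every station
    overlap_row: overlap (days) between the target station and every station
    """
    candidates = [
        (dist, idx)
        for idx, (dist, overlap) in enumerate(zip(dist_row, overlap_row))
        # Skip the target itself, then apply both thresholds
        if idx != target_idx and dist <= max_dist_km and overlap >= min_overlap_days
    ]
    candidates.sort()  # nearest first
    return [idx for _, idx in candidates[:max_neighbours]]


# First row of the distance matrix (rounded), i.e. distances from station 0
dist_row = [0.0, 14.10, 25.60, 37.31, 13.96, 15.87, 32.41, 14.60, 30.54, 24.10, 25.00]
# Hypothetical overlaps: all 1825 days except station 4, which overlaps only briefly
overlap_row = [1825] * 11
overlap_row[4] = 200

neighbours = select_neighbours(dist_row, overlap_row, target_idx=0, max_neighbours=3)
print(neighbours)
```

Station 4 is the closest gauge here but is dropped because its assumed overlap falls below the threshold, which shows why distance alone is not enough to define a neighbour.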


# QC16 - Daily neighbours (wet)
[Back to Index](#Table-of-contents)

#### Differences from `intense-qc`: 
- 

# QC17 - Hourly neighbours (wet) 
[Back to Index](#Table-of-contents)

#### Differences from `intense-qc`: 
- 

# QC18 - Daily neighbours (dry) 
[Back to Index](#Table-of-contents)

#### Differences from `intense-qc`: 
- 

# QC19 - Hourly neighbours (dry) 
[Back to Index](#Table-of-contents)

#### Differences from `intense-qc`: 
- 

# QC20 - Monthly neighbours
[Back to Index](#Table-of-contents)

#### Differences from `intense-qc`: 
- 

# QC21 - Timing offset 
[Back to Index](#Table-of-contents)

#### Differences from `intense-qc`: 
- 

# QC22 - Pre-QC Affinity  
[Back to Index](#Table-of-contents)

#### Differences from `intense-qc`: 
- 

# QC23 - Pre-QC Pearson
[Back to Index](#Table-of-contents)

#### Differences from `intense-qc`: 
- 

# QC24 - Daily factor  
[Back to Index](#Table-of-contents)

#### Differences from `intense-qc`: 
- 

# QC25 - Monthly factor
[Back to Index](#Table-of-contents)

#### Differences from `intense-qc`: 
- 