# Generate summed flag count tables

This notebook creates per-network QAQC flag count CSV files from the corresponding eraqc_counts_timestep files, which were generated as part of the final processing step for stations within the Historical Data Pipeline. These tables are then used to generate statistics for the QAQC success report.

This is carried out in two steps:

1. Generate the per-network QAQC flag count tables, at native and hourly timesteps

2. Generate one flag count table that sums all per-network tables, at native and hourly timesteps


Using the following functions:


- _pairwise_sum(): helper function that merges two input flag tables, used by network_sum_flag_counts() and total_sum_flag_counts().

- network_sum_flag_counts(): sums all station flag count tables for a given network, creating one flag count table for that network

- generate_station_tables(): runs network_sum_flag_counts() for every network

- total_sum_flag_counts(): sums all network flag count tables, creating one final flag count table 
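
To illustrate the kind of merge that `_pairwise_sum()` performs, here is a minimal sketch. It is not the implementation in `qaqc_success_report_functions`. The function name `pairwise_sum_sketch`, the `flag_value` key column, and the one-count-column-per-variable layout are all assumptions made for illustration.

```python
import pandas as pd


def pairwise_sum_sketch(left: pd.DataFrame, right: pd.DataFrame,
                        key: str = "flag_value") -> pd.DataFrame:
    """Add two flag count tables, aligning rows on `key` and columns by name.

    Hypothetical stand-in for _pairwise_sum(); the table layout is assumed.
    """
    left_idx = left.set_index(key)
    right_idx = right.set_index(key)
    # fill_value=0 treats a flag or variable present in only one table as zero
    total = left_idx.add(right_idx, fill_value=0)
    # Cells missing from both tables are still NaN after add(); count them as zero
    return total.fillna(0).astype(int).reset_index()


# Two toy tables with partially overlapping flags and variables
a = pd.DataFrame({"flag_value": [1, 2], "tas": [3, 4]})
b = pd.DataFrame({"flag_value": [2, 3], "tas": [10, 1], "pr": [5, 0]})
summed = pairwise_sum_sketch(a, b)
```

Aligning on the flag value rather than on row position means the two tables do not need to list the same flags, or the same variables, in the same order.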

## Step 0: Environment set-up

In [1]:
import time
import boto3
import numpy as np
import pandas as pd
import xarray as xr

from qaqc_success_report_functions import *

In [None]:
# # Set AWS credentials
# s3 = boto3.resource("s3")
# s3_cl = boto3.client("s3")  # for lower-level processes

# # Set relative paths to other folders and objects in repository.
# bucket_name = "wecc-historical-wx"
# stations_csv_path = f"s3://{bucket_name}/2_clean_wx/temp_clean_all_station_list.csv"
# qaqc_dir = "3_qaqc_wx"
# merge_dir = "4_merge_wx"

### The functions

## Step 1: Generate flag sum tables for every network

First, loop through every network, combining each of their station flag count tables into one table. The result is one flag count table at each timestep - native and hourly - for every network.

This will take around 1 hour to run for both timesteps. 
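
Conceptually, combining a network's station tables is a fold over the list of tables. The sketch below shows that idea on in-memory DataFrames only. It does not read from S3, and the name `sum_station_tables_sketch` and the `flag_value` key column are hypothetical, not taken from the project's helper module.

```python
from functools import reduce

import pandas as pd


def sum_station_tables_sketch(station_tables, key="flag_value"):
    """Fold a list of per-station flag count tables into one table.

    Hypothetical illustration; the real network_sum_flag_counts() may differ.
    """
    if not station_tables:
        # No stations: return an empty table that only has the key column
        return pd.DataFrame(columns=[key])
    indexed = [table.set_index(key) for table in station_tables]
    # Add tables one at a time; a flag or variable missing on one side counts as zero
    total = reduce(lambda acc, table: acc.add(table, fill_value=0), indexed)
    return total.fillna(0).astype(int).reset_index()


# Three toy station tables with different flags and variables
stations = [
    pd.DataFrame({"flag_value": [1, 2], "tas": [1, 1]}),
    pd.DataFrame({"flag_value": [2], "tas": [5]}),
    pd.DataFrame({"flag_value": [3], "pr": [7]}),
]
network_total = sum_station_tables_sketch(stations)
```

The same fold applies one level up, where the inputs are network tables rather than station tables.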

In [None]:
#### this is where the issue pops up! 
# something to do with merging the flag meanings table with the counts table
# potentially related to how I convert strings to integers? see red comments in the format helper functions

generate_station_tables('hourly')

# 22 minutes for 27 networks

In [None]:
generate_station_tables("native")

## Step 2: Generate total flag sum table

Now combine all the network flag count tables generated in step 1 into one final flag count table. This is done once for the native timestep and once for the hourly timestep.

Step 1 must be complete before moving on to this step.

In [None]:
test_total = total_sum_flag_counts('native')

In [None]:
test_total

In [None]:
total_sum_flag_counts('hourly')