# Historical Data Platform QA/QC Event Evaluation Procedure

| Event | Start Date | End Date | Location | Variables to Evaluate |
|-------|------------|----------|----------|-----------------------|
|Santa Ana Wind|2/16/1988|2/19/1988|Los Angeles and Orange Counties|wind speed, wind direction, air temperature, humidity|
|Winter Storm|12/20/1990|12/24/1990|WECC, Sacramento Valley, Oakland|air temperature, pressure, precipitation|
|Atmospheric river (AR)|1/16/2017|1/20/2017|CA, coastal WECC|precipitation, wind speed, wind direction|
|Mudslide|1/5/2018|1/9/2018|Santa Barbara County|precipitation, wind speed, wind direction, air temperature|
|"Heatwave1"|8/14/2020|8/15/2020|WECC|air temperature, wind speed, air pressure|
|"Heatwave2"|9/5/2020|9/8/2020|CA (coastal + S. CA), Los Angeles County|air temperature, wind speed, air pressure|
|"Heatwave3"|8/30/2022|9/9/2022|CA (coastal + S. CA)|air temperature, wind speed, air pressure|
|Offshore wind|1/15/2021|1/16/2021|Coastal CA|wind speed, wind direction|
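Keeping the event windows in a small lookup table makes it easy to subset any station to a named event and to catch date-entry mistakes (an end date before the start date). The sketch below is illustrative only: the event keys other than `santa_ana_wind` and the helper `event_window` are hypothetical names, not part of the platform's code, and the Heatwave3 end date is entered as 9/9/2022.

```python
import pandas as pd

# Hypothetical encoding of the event table; only 'santa_ana_wind' is a key used in this notebook
EVENTS = pd.DataFrame(
    [
        ("santa_ana_wind", "1988-02-16", "1988-02-19"),
        ("winter_storm", "1990-12-20", "1990-12-24"),
        ("atmospheric_river", "2017-01-16", "2017-01-20"),
        ("mudslide", "2018-01-05", "2018-01-09"),
        ("heatwave1", "2020-08-14", "2020-08-15"),
        ("heatwave2", "2020-09-05", "2020-09-08"),
        ("heatwave3", "2022-08-30", "2022-09-09"),
        ("offshore_wind", "2021-01-15", "2021-01-16"),
    ],
    columns=["event", "start", "end"],
)
EVENTS["start"] = pd.to_datetime(EVENTS["start"])
EVENTS["end"] = pd.to_datetime(EVENTS["end"])

# Sanity check: every event must end on or after its start date
bad = EVENTS[EVENTS["end"] < EVENTS["start"]]
assert bad.empty, f"events with end before start: {bad['event'].tolist()}"


def event_window(event):
    """Return the (start, end) timestamps for a named event."""
    match = EVENTS.loc[EVENTS["event"] == event]
    if match.empty:
        raise KeyError(f"unknown event: {event}")
    row = match.iloc[0]
    return row["start"], row["end"]
```

For example, `event_window("santa_ana_wind")` returns the 1988-02-16 and 1988-02-19 timestamps.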

**In evaluation**: Santa Ana Wind

**Steps**:
1. Identify stations within the event location
2. Retrieve the station file to evaluate
3. Look at the full timeseries plot (`flagged_timeseries_plot`) to see the general trend
4. Look at the climatology plot
5. Look at the event plot
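Step 4 compares the event against the station's climatology. How the notebook's climatology plot defines climatology is not shown here, so the following is a minimal sketch under an assumed definition: a per-calendar-month mean with a 5th to 95th percentile band, computed from a long-form DataFrame with a `time` column (such as the output of `ds.to_dataframe().reset_index()`). The function names are hypothetical.

```python
import numpy as np
import pandas as pd


def monthly_climatology(df, var, time_col="time"):
    """Per-calendar-month mean and 5th/95th percentiles of `var`."""
    month = pd.to_datetime(df[time_col]).dt.month
    grouped = df[var].groupby(month)
    clim = pd.DataFrame(
        {
            "mean": grouped.mean(),
            "p05": grouped.quantile(0.05),
            "p95": grouped.quantile(0.95),
        }
    )
    clim.index.name = "month"
    return clim


def outside_climatology(df, var, clim, time_col="time"):
    """Boolean mask: True where `var` falls outside its month's 5th-95th percentile band."""
    month = pd.to_datetime(df[time_col]).dt.month
    lower = month.map(clim["p05"])
    upper = month.map(clim["p95"])
    return (df[var] < lower) | (df[var] > upper)
```

Event values that fall outside the band are not necessarily errors (a Santa Ana event is expected to be unusual); the mask only highlights where to look in the event plot.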

In [None]:
# import libraries
import xarray as xr
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # needed for the GHCNh plot below
import os
import sys

from qaqc_eval_utils import (id_all_flags, known_issue_check, subset_eval_stns, latlon_to_mercator_cartopy, pull_nc_from_aws, return_ghcn_vars)
from qaqc_eval_plot import stn_visualize

sys.path.append(os.path.expanduser('../'))
from qaqc_plot import flagged_timeseries_plot
from QAQC_pipeline import qaqc_ds_to_df # not working at present

%matplotlib inline

### Step 1: Retrieve relevant station files and comparison data for evaluation
Read in the training list of QA/QC'd stations.

In [None]:
# read in stations
train_stns = pd.read_csv('../qaqc_training_station_list_events.csv')
train_stns.head()

For event evaluation, randomly sample a manageable number of stations per event using the `subset_eval_stns` function.

In [None]:
eval_stations = subset_eval_stns(
    event_to_eval = 'santa_ana_wind',
    stn_list = train_stns,
    subset = 4,
    return_stn_ids = True
)
eval_stations
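The internals of `subset_eval_stns` are not shown in this notebook. As a rough illustration of the sampling idea only, the sketch below assumes a hypothetical station list with an `era-id` column and an `event` label column; fixing `random_state` makes the sample reproducible, so reruns (and other reviewers) evaluate the same stations.

```python
import pandas as pd


def sample_event_stations(stn_list, event, n=4, event_col="event", seed=42):
    """Randomly sample up to `n` stations tagged with `event` (hypothetical column layout)."""
    candidates = stn_list[stn_list[event_col] == event]
    # Sample without replacement; cap n so events with few stations do not raise an error
    return candidates.sample(n=min(n, len(candidates)), random_state=seed)
```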

Next, visualize the first sampled station.

In [None]:
stn_visualize(
    stn_id = eval_stations['era-id'].values[0],
    stn_list = eval_stations,
    event_to_eval = 'santa_ana_wind'
)

# TODO: add county boundaries

### Step 2: Holistic / qualitative station evaluation

In [None]:
# pull station from AWS
ds = pull_nc_from_aws('ASOSAWOS_72383023187')
ds

In [None]:
id_all_flags(ds)

In [None]:
%%time 
df = ds.to_dataframe().reset_index() # takes about 4 min...
df.head(5)

# TODO: should we close ds to free memory once df has been created?

# Would prefer our tailored qaqc_ds_to_df function, but it currently errors out
# df = qaqc_ds_to_df(ds)
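Once `df` exists, a per-variable count of flagged values gives a quick text companion to `id_all_flags`. This is a sketch under an assumption not confirmed here: that each QA/QC'd variable has a companion flag column named `<var>_eraqc`, with NaN meaning "not flagged". The `flag_summary` helper is hypothetical.

```python
import pandas as pd


def flag_summary(df, flag_suffix="_eraqc"):
    """Count flagged rows and list distinct flag values per QA/QC flag column."""
    flag_cols = [c for c in df.columns if c.endswith(flag_suffix)]
    rows = []
    for col in flag_cols:
        flagged = df[col].dropna()  # assumption: NaN means the value was not flagged
        rows.append(
            {
                "variable": col[: -len(flag_suffix)],
                "n_flagged": len(flagged),
                "pct_flagged": 100 * len(flagged) / len(df) if len(df) else 0.0,
                "flag_values": sorted(flagged.unique().tolist()),
            }
        )
    return pd.DataFrame(rows, columns=["variable", "n_flagged", "pct_flagged", "flag_values"])
```

Usage: `flag_summary(df)`.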

In [None]:
# Santa Ana wind variables: air temperature, humidity, wind speed, wind direction
vars_to_check = ['tas', 'hurs', 'sfcWind', 'sfcWind_dir']
vars_to_eval = [var for var in vars_to_check if var in df.columns]  # keep only variables present at this station

stn_name = df.station.unique()[0]
network = stn_name.split('_')[0]  # e.g., 'ASOSAWOS' from 'ASOSAWOS_72383023187'
for var in vars_to_eval:
    known_issue_check(network=network, var=var, stn=stn_name)  # check for known issues first!
    flagged_timeseries_plot(df, var=var)

In [None]:
from qaqc_eval_plot import event_subset_plot

In [None]:
dd = event_subset_plot(df, event='santa_ana_wind')

In [None]:
vars_to_check = ['tas', 'hurs', 'sfcWind', 'sfcWind_dir']
vars_to_eval = [var for var in vars_to_check if var in df.columns]

for var in vars_to_eval:
    flagged_timeseries_plot(dd, var=var)
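The implementation of `event_subset_plot` is not shown here. A minimal sketch of the underlying subsetting step, assuming a `time` column and a hypothetical 3-day pad on each side so pre- and post-event behavior stays visible, could look like this (`subset_event` is an illustrative name):

```python
import pandas as pd


def subset_event(df, start, end, pad_days=3, time_col="time"):
    """Rows of `df` within [start - pad_days, end + pad_days], keeping the whole last day."""
    t = pd.to_datetime(df[time_col])
    lo = pd.Timestamp(start) - pd.Timedelta(days=pad_days)
    # Add one day to the end bound so every observation on the final event day is kept
    hi = pd.Timestamp(end) + pd.Timedelta(days=pad_days + 1)
    return df.loc[(t >= lo) & (t < hi)]
```

For the Santa Ana event: `subset_event(df, "1988-02-16", "1988-02-19")`.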

In [None]:
flagged_timeseries_plot(dd, var='tas')  # TODO: why is this plot not showing?

#### Append local GHCNh library path

In [None]:
ghcnh_lib_path = "/Users/hector/ERA_work/historical-obs-platform/test_platform/scripts/3_qaqc_data/qaqc_eval_notebooks/GHCNh"  # local path; update for your machine
sys.path.append(ghcnh_lib_path)
# from GHCNh.GHCNh_lib import GHCNh  # if GHCNh is within the current folder
from GHCNh_lib import GHCNh  # if GHCNh was appended to the path

In [None]:
%%time
ghcnh = GHCNh(stations_local=True)
ghcnh.select_wecc()
id = ghcnh.stations_df['id'].iloc[0]
ghcnh.read_data_from_url(id, save=True)
ghcnh.convert_df_to_gpd()
ghcnh.station_data.head(3)

In [None]:
lon = ghcnh.station_data.Longitude.mean()
lat = ghcnh.station_data.Latitude.mean()
print("{}, {:.5f}, {:.5f}".format(id, lon, lat))
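The mean GHCNh coordinates can serve as comparison data by pairing this station with the nearest station in the training list. This is a sketch only: it assumes the station list carries `longitude` and `latitude` columns in degrees (hypothetical names) and uses the haversine great-circle distance with a spherical Earth of radius 6371 km.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between points given in degrees (arrays broadcast)."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def nearest_station(lon, lat, stn_list, lon_col="longitude", lat_col="latitude"):
    """Return the row of `stn_list` closest to (lon, lat) and its distance in km."""
    d = haversine_km(lon, lat, stn_list[lon_col].to_numpy(), stn_list[lat_col].to_numpy())
    i = int(np.argmin(d))
    return stn_list.iloc[i], float(d[i])
```

Usage: `nearest_station(lon, lat, train_stns)`, provided the column names match.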

In [None]:
fig, ax = plt.subplots(figsize=(9, 3))

ghcnh.station_data.plot(ax=ax, x="time", y="temperature")
ghcnh.station_data.plot(ax=ax, x="time", y="dew_point_temperature")
ax.set_title("{}  ({:.3f}, {:.3f})".format(id, lon, lat));

In [None]:
# Initial test for identifying the event: large jumps in wind speed
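The jump test is only stated as an idea here. A minimal sketch, assuming wind speed in m/s in a `sfcWind` column and a hypothetical threshold of 10 m/s between consecutive valid observations, might look like the following. It is not the platform's QA/QC test. Comparisons are restricted to observations at most `max_gap` apart so that data gaps are not mistaken for jumps.

```python
import numpy as np
import pandas as pd


def flag_wind_jumps(df, var="sfcWind", threshold=10.0, time_col="time", max_gap="2h"):
    """Mark rows where `var` changes by more than `threshold` from the previous valid observation."""
    d = df.sort_values(time_col)
    valid = d.dropna(subset=[var])
    dv = valid[var].diff().abs()
    dt = valid[time_col].diff()
    # Only compare consecutive observations that are close in time
    jump = (dv > threshold) & (dt <= pd.Timedelta(max_gap))
    # Rows with missing values are never marked as jumps
    return jump.reindex(df.index, fill_value=False)
```

Applied to the event subset (for example `flag_wind_jumps(dd)`), the marked rows can be compared against the onset of the Santa Ana event.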

In [None]:
return_ghcn_vars(ghcnh.station_data, 'sfcWind').head(3)