# Winter 2022-2023 Wind Events at Kettle Ponds

Author: Daniel Hogan
Created: January 10, 2024

This notebook will start to address three main questions (with sub-focuses discussed below):
1) What events had the highest percentile of wind speeds?
2) What were the general storm characteristics? Were they related?
3) How much sublimation over the season came from these events?

### Imports


In [2]:
# general
import os
import datetime as dt
import json
# data 
import xarray as xr 
from sublimpy import utils, variables, tidy
import numpy as np
import pandas as pd
from act import discovery, plotting
# plotting
import matplotlib.pyplot as plt
import plotly.express as px 
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import cufflinks as cf
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
# helper tools
from scripts.get_sail_data import get_sail_data
from scripts.helper_funcs import create_windrose_df
import scripts.helper_funcs as hf
from metpy import calc, units
# make plotly work 
init_notebook_mode(connected=True)
cf.go_offline()

nctoolkit is using Climate Data Operators version 2.3.0


## 1. What events at Kettle Ponds had the highest percentile of wind speeds?
We will begin to address this by looking at daily average wind speeds from the SOS data to identify the days above the 90th percentile of wind speed during the main snow season, which we will define as December 1, 2022 to May 1, 2023. This may need to be broken down further to capture individual wind events, but we'll start with this. We'll compare the selected days across the 3-20 m wind speeds and from tower to tower to make sure the measurements are consistent at our location.

1) We'll first make some box plots of daily average wind speeds at each height and each tower to look at the total distribution over winter
2) We'll make a timeseries plot for each height bin and mark out the highest percentile of wind speeds for the year
3) We'll also make a correlation plot with tower on one axis and measurement height on the other.
4) We'll then filter to the days with the highest 10% of wind speeds over our period
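
As a rough sketch of step 4, the percentile filter could look like the block below. It uses a synthetic daily series rather than the SOS data, and the names `daily_spd`, `threshold`, and `windy_days` are placeholders that do not appear elsewhere in this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a daily mean wind speed series (m/s); NOT SOS data.
rng = np.random.default_rng(0)
days = pd.date_range("2022-12-01", "2023-05-01", freq="D")
daily_spd = pd.Series(
    rng.gamma(shape=2.0, scale=1.5, size=len(days)),
    index=days,
    name="spd_10m_c",
)

# Threshold at the 90th percentile of the daily means
threshold = daily_spd.quantile(0.90)

# Keep only the days strictly above the threshold (roughly the windiest 10%)
windy_days = daily_spd[daily_spd > threshold]

print(f"threshold = {threshold:.2f} m/s")
print(f"{len(windy_days)} of {len(daily_spd)} days selected")
```

The same pattern should apply to any of the `spd_*` columns once the daily averages are computed; which height and tower to threshold on is a choice the plots below are meant to inform.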

In [1]:
start_date = '20221129'
end_date = '20230507'

In [None]:
# Let's begin by downloading the SOS data and storing it in the /storage/ directory
output_dir = '/storage/dlhogan/synoptic_sublimation/sos_data/'
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
sos_5min_ds = utils.download_sos_data(
                        start_date=start_date,
                        end_date=end_date,
                        variable_names=variables.DEFAULT_VARIABLES,
                        local_download_dir=output_dir,
                        cache=True
                    )    

**NOTE**: No filtering is done here. Will update with filters in the future.

In [153]:
# only get the wind variables and convert to dataframe
sos_5min_wind_df = sos_5min_ds[hf.WIND_VARIABLES].to_dataframe()
# resample to daily, get the mean for spd_*, u_*, v_* and w_* and the median for dir_*
# make a dictionary of the aggregations by iterating through the wind variables
sos_daily_avg_dict = {}
for var in hf.WIND_VARIABLES:
    if 'dir_' in var:
        sos_daily_avg_dict[var] = 'median'
    else:
        sos_daily_avg_dict[var] = 'mean'
sos_daily_avg_df = sos_5min_wind_df.resample('1D').agg(sos_daily_avg_dict)
# reset index for sos_daily_avg_df
sos_daily_avg_df = sos_daily_avg_df.reset_index()

# resample to daily, get the max for spd_*, u_*, v_* and w_*
# make a dictionary of the aggregations by iterating through the wind variables
sos_daily_max_dict = {}
for var in hf.WIND_VARIABLES:
    if 'dir_' in var:
        continue
    else:
        sos_daily_max_dict[var] = 'max'
sos_daily_max_df = sos_5min_wind_df.resample('1D').agg(sos_daily_max_dict)

# find the wind direction during the max wind speed and add it to the daily max dataframe
idx = sos_5min_wind_df.filter(regex='spd_*').dropna().groupby(sos_5min_wind_df.filter(regex='spd_*').dropna().index.date).idxmax(skipna=True)
for dir_var in sos_5min_wind_df.filter(regex='dir_*').columns:
    # for the column, extract everything after the first underscore
    loc = dir_var.split('_', 1)[1]
    spd_loc = 'spd_' + loc
    # create a column with the max wind direction
    dates = sos_5min_wind_df.loc[idx[spd_loc].values, dir_var].index.date
    # fill a new dir_var column with nan
    sos_daily_max_df[dir_var] = np.nan
    sos_daily_max_df.loc[dates, dir_var] = sos_5min_wind_df.loc[idx[spd_loc].values, dir_var].values
# reset index for sos_daily_max_df
sos_daily_max_df = sos_daily_max_df.reset_index()
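
One caveat on the daily `dir_*` aggregation: a plain median or mean treats direction as a linear quantity, which can mislead on days when winds straddle north (for example, values near 350 and 10 degrees). A possible alternative, which is not what this notebook does, is a circular (vector) mean. A minimal sketch, with `circular_mean_deg` as a hypothetical helper:

```python
import numpy as np

def circular_mean_deg(directions_deg):
    """Circular mean of wind directions in degrees, ignoring NaNs.

    Hypothetical helper for illustration; not part of sublimpy or scripts.helper_funcs.
    """
    d = np.deg2rad(np.asarray(directions_deg, dtype=float))
    d = d[~np.isnan(d)]
    # average the unit vectors, then convert the resultant back to an angle
    mean_sin = np.sin(d).mean()
    mean_cos = np.cos(d).mean()
    return float(np.rad2deg(np.arctan2(mean_sin, mean_cos)) % 360.0)

# winds straddling north average to (approximately) north, not to 180 degrees
north_mean = circular_mean_deg([350.0, 10.0])
print(north_mean)
```

This weights every 5-minute sample equally regardless of wind speed; averaging the `u_*` and `v_*` components instead would give a speed-weighted direction.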

In [212]:
# for values greater than 25 m/s in the spd_* columns, fill with nan
for spd_var in sos_daily_avg_df.filter(regex='spd_').columns:
    dir_var = 'dir_' + spd_var.split('_', 1)[1]
    # build the masks before overwriting the speeds; otherwise the speeds are
    # already nan when the dir columns are masked and nothing gets removed
    avg_mask = sos_daily_avg_df[spd_var] > 25
    max_mask = sos_daily_max_df[spd_var] > 25
    sos_daily_avg_df.loc[avg_mask, spd_var] = np.nan
    sos_daily_max_df.loc[max_mask, spd_var] = np.nan
    # do the same for the dir vars
    sos_daily_avg_df.loc[avg_mask, dir_var] = np.nan
    sos_daily_max_df.loc[max_mask, dir_var] = np.nan

In [156]:
sos_daily_avg_tidy_df = tidy.get_tidy_dataset(sos_daily_avg_df, hf.WIND_VARIABLES)
sos_daily_max_tidy_df = tidy.get_tidy_dataset(sos_daily_max_df, hf.WIND_VARIABLES)
# filter to only spd variables
sos_daily_avg_tidy_df = sos_daily_avg_tidy_df[sos_daily_avg_tidy_df['variable'].str.contains('spd_')]
sos_daily_max_tidy_df = sos_daily_max_tidy_df[sos_daily_max_tidy_df['variable'].str.contains('spd_')]

Let's start to get an understanding of daily wind speeds by plotting the average and maximum wind speed on each day as a time series.

In [201]:
# create a color dictionary for the unique height values in the sos_tidy_df
color_values = ['spd_1m_uw','spd_1m_d','spd_1m_ue',  'spd_2m_c', 'spd_3m_uw','spd_3m_c', 'spd_3m_ue','spd_3m_d', 'spd_5m_c',
                             'spd_10m_uw','spd_10m_c','spd_10m_ue','spd_10m_d','spd_15m_c', 'spd_20m_c']
n_colors = len(color_values)                      
color_scale = px.colors.sample_colorscale("viridis_r", [n/(n_colors -1) for n in range(n_colors)])
color_dict = dict(zip(color_values,color_scale))
fig = go.Figure()
fig = make_subplots(rows=2, 
                    cols=1, 
                    shared_xaxes=True, 
                    vertical_spacing=0.04, 
                    subplot_titles=('Daily Average Wind Speed', 'Daily Max 5 min Wind Speed'))
for variable in sos_daily_avg_df.filter(regex='spd_*').columns:
    fig.add_trace(go.Scatter(
        x=sos_daily_avg_df['time'], 
        y=sos_daily_avg_df[variable],
        name=f"{variable}",
        marker_color=color_dict[variable],
        connectgaps=False
    ),
    row=1, col=1)
    fig.add_trace(go.Scatter(
        x=sos_daily_max_df['time'], 
        y=sos_daily_max_df[variable],
        name=f"{variable}",
        marker_color=color_dict[variable],
        connectgaps=False,
        showlegend=False
    ),
    row=2, col=1)
# add a horizontal line in the first plot at 5 m/s
fig.add_hline(y=5, row=1, col=1, line_dash="dash", line_color="black")
# add an annotation for the horizontal line
fig.add_annotation(xref="paper", yref="y", x=dt.date(2023,3,5), y=5.5,
            text="<b>5 m/s</b>",
            showarrow=False,
            # increase font size
            font=dict(
                size=16,
                color="black",
                ),
            row=1, col=1)
# update traces to not connect gaps
fig.update_traces(connectgaps=False)
# update layout
fig.update_layout(
    title='Daily Wind Speeds at Kettle Ponds',
    xaxis_title='Date',
    # set 1st yaxis title
    yaxis1=dict(
        title='Wind Speed (m/s)',
        range=[0, 12],
    ),
    # set 2nd yaxis title
    yaxis2=dict(
        title='Wind Speed (m/s)',
        range=[0, 25],
    ),
    legend_title_text='Wind Speeds',
    height=800,
    width=800,
    template='plotly_white'
)
fig

Wind speeds track each other closely across towers and heights, although the lower levels are not so clean. The maximum wind speeds of the winter occurred during the December 2022 wind event, but there were numerous other days when the daily average exceeded 5 m/s, many of them with maximum "gusts" over 10 m/s.

In [210]:
# Make a boxplot of the daily average wind speeds at each height and each tower
fig = px.box(sos_daily_avg_tidy_df, 
             x='variable', 
             y='value', 
             color='height',
             title='Daily Average Wind Speeds at Kettle Ponds',
             # show time in the hover
             hover_data=['time'],
             template='plotly_dark',
             height=500,
             width=1100,
             # widen the box
            boxmode='overlay',
            notched=True,
            points='all',
            category_orders={
                "variable":['spd_1m_uw','spd_1m_d','spd_1m_ue',  'spd_2m_c', 'spd_3m_uw','spd_3m_c', 'spd_3m_ue','spd_3m_d', 'spd_5m_c',
                             'spd_10m_uw','spd_10m_c','spd_10m_ue','spd_10m_d','spd_15m_c', 'spd_20m_c']
            }
            )
# add jitter
fig.update_traces(jitter=0.5, marker=dict(size=2))
# add labels to the x and y axis
fig.update_xaxes(title_text='Measurement Locations')
fig.update_yaxes(title_text='Wind Speed (m/s)')

fig

Wind speeds generally agree across locations, but we should look at correlations as well. The 10 m level looks like the best and most consistent across space and time. Using it will also keep us consistent with other measurements of wind speed.
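
A quick way to check those correlations would be a pairwise correlation matrix of the daily averages. The sketch below uses synthetic columns that only borrow the `spd_*` naming; with the real data, something like `sos_daily_avg_df.filter(regex='spd_').corr()` should give the equivalent table.

```python
import numpy as np
import pandas as pd

# Synthetic daily wind speeds sharing a common signal; NOT SOS data.
rng = np.random.default_rng(1)
days = pd.date_range("2022-12-01", periods=60, freq="D")
base = rng.gamma(2.0, 1.5, size=len(days))
wide = pd.DataFrame(
    {
        "spd_3m_c": 0.8 * base + rng.normal(0, 0.3, len(days)),
        "spd_10m_c": base + rng.normal(0, 0.2, len(days)),
        "spd_10m_d": base + rng.normal(0, 0.2, len(days)),
    },
    index=days,
)

# Pairwise Pearson correlation between measurement locations
# (pandas drops NaNs pairwise, so gaps in one sensor do not remove whole days)
corr = wide.corr(method="pearson")
print(corr.round(2))
```

The resulting matrix can be passed to `px.imshow` for a heatmap if a plot is more useful than the table.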


In [211]:
# Make a boxplot of the daily max (5-min) wind speeds at each height and each tower
fig = px.box(sos_daily_max_tidy_df, 
             x='variable', 
             y='value', 
             color='height',
             title='Daily Max (5-min) Wind Speeds at Kettle Ponds',
             # show time in the hover
             hover_data=['time'],
             template='plotly_dark',
             height=500,
             width=1100,
             # widen the box
            boxmode='overlay',
            notched=True,
            points='all',
            category_orders={
                "variable":['spd_1m_uw','spd_1m_d','spd_1m_ue',  'spd_2m_c', 'spd_3m_uw','spd_3m_c', 'spd_3m_ue','spd_3m_d', 'spd_5m_c',
                             'spd_10m_uw','spd_10m_c','spd_10m_ue','spd_10m_d','spd_15m_c', 'spd_20m_c']
            }
            )
# add jitter
fig.update_traces(jitter=0.5, marker=dict(size=2))
# add labels to the x and y axis
fig.update_xaxes(title_text='Measurement Locations')
fig.update_yaxes(title_text='Wind Speed (m/s)')

fig

Maxes are generally consistent at different levels, but the 10 m sensor on tower d has an outlier on the 22nd. It could be real, but likely is not, as it was higher than any wind speed measured at any other location.

### Similar plots for wind direction

In [221]:
# Make the same plot for wind direction variables
sos_daily_avg_dir_tidy_df = tidy.get_tidy_dataset(sos_daily_avg_df, hf.WIND_VARIABLES)
sos_daily_max_dir_tidy_df = tidy.get_tidy_dataset(sos_daily_max_df, hf.WIND_VARIABLES)
# filter to only dir variables
sos_daily_avg_dir_tidy_df = sos_daily_avg_dir_tidy_df[sos_daily_avg_dir_tidy_df['variable'].str.contains('dir_')]
sos_daily_max_dir_tidy_df = sos_daily_max_dir_tidy_df[sos_daily_max_dir_tidy_df['variable'].str.contains('dir_')]

In [224]:
# Make a boxplot of the daily median wind direction at each height and each tower
fig = px.box(sos_daily_avg_dir_tidy_df, 
             x='variable', 
             y='value', 
             color='height',
             title='Daily Median Wind Direction at Kettle Ponds',
             # show time in the hover
             hover_data=['time'],
             template='plotly_dark',
             height=500,
             width=1100,
             # widen the box
            boxmode='overlay',
            notched=True,
            points='all',
            category_orders={
                "variable":['dir_1m_uw','dir_1m_d','dir_1m_ue',  'dir_2m_c', 'dir_3m_uw','dir_3m_c', 'dir_3m_ue','dir_3m_d', 'dir_5m_c',
                             'dir_10m_uw','dir_10m_c','dir_10m_ue','dir_10m_d','dir_15m_c', 'dir_20m_c']
            }
            )
# add jitter
fig.update_traces(jitter=0.5, marker=dict(size=2))
# add labels to the x and y axis
fig.update_xaxes(title_text='Measurement Locations')
fig.update_yaxes(title_text='Wind direction (degrees)')

fig

We still have to verify that these plots make sense alongside the wind speeds (by making wind roses), but it seems generally sensible that winds came from everywhere except due north or east over winter, given the blocking from the mountains.


In [225]:
# Make a boxplot of the wind direction at the time of the daily max wind speed at each height and each tower
fig = px.box(sos_daily_max_dir_tidy_df, 
             x='variable', 
             y='value', 
             color='height',
             title='Daily Wind Direction During Daily Max Wind at Kettle Ponds',
             # show time in the hover
             hover_data=['time'],
             template='plotly_dark',
             height=500,
             width=1100,
             # widen the box
            boxmode='overlay',
            notched=True,
            points='all',
            category_orders={
                "variable":['dir_1m_uw','dir_1m_d','dir_1m_ue',  'dir_2m_c', 'dir_3m_uw','dir_3m_c', 'dir_3m_ue','dir_3m_d', 'dir_5m_c',
                             'dir_10m_uw','dir_10m_c','dir_10m_ue','dir_10m_d','dir_15m_c', 'dir_20m_c']
            }
            )
# add jitter
fig.update_traces(jitter=0.5, marker=dict(size=2))
# add labels to the x and y axis
fig.update_xaxes(title_text='Measurement Locations')
fig.update_yaxes(title_text='Wind Direction (deg)')

fig

This is interesting. At the time of the daily maximum wind speed, winds come almost solely from the NW during winter, with some southeasterly components as well. Need wind roses to verify.
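
For the wind rose check, the core step is counting observations per compass sector. The notebook imports `create_windrose_df` from `scripts.helper_funcs`, but its signature is not shown here, so the function below is a hypothetical stand-in that only sketches the binning.

```python
import numpy as np
import pandas as pd

def bin_wind_directions(directions_deg, n_sectors=16):
    """Count observations per compass sector, with the first sector centred on north.

    Hypothetical helper for illustration; not the notebook's create_windrose_df.
    """
    width = 360.0 / n_sectors
    d = np.asarray(directions_deg, dtype=float)
    d = d[~np.isnan(d)]
    # shift by half a sector so that e.g. 350 and 10 degrees both land in the north sector
    sector = np.floor(((d + width / 2.0) % 360.0) / width).astype(int)
    counts = np.bincount(sector, minlength=n_sectors)
    centers = np.arange(n_sectors) * width
    return pd.Series(counts, index=centers, name="count")

# made-up directions (degrees), including a missing value
example_dirs = [350.0, 10.0, 315.0, 310.0, 135.0, np.nan]
rose = bin_wind_directions(example_dirs, n_sectors=8)
print(rose)
```

The counts indexed by sector centre can then be drawn with `px.bar_polar`, optionally split by wind speed class to make a full rose.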

## 2. What were some of the storm characteristics at the surface? Are they related?
This section will likely produce another notebook to focus on upper-level dynamics, but we want to get an idea of what the storm was like. We'll look at correlations of:
- wind speed
- wind direction
- relative humidity
- 2m temperature

We'll also take a look at the SAIL radiosondes from those days to try to get a picture of what was happening at upper levels. Perhaps we'll make a mean radiosonde by binning the pressure columns and taking the mean? Have to figure that out.
We can also start to explore some of the Doppler lidar data.
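
One way the mean radiosonde idea could work is to cut each sounding into fixed pressure bins and average within each bin across launches. The sketch below uses two synthetic soundings; the column names `pres` and `tdry`, the 25 hPa bin width, and the 300-700 hPa range are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Two synthetic soundings with pressure (hPa) and temperature (deg C); NOT SAIL data.
rng = np.random.default_rng(2)
soundings = []
for launch in ["launch_a", "launch_b"]:
    pres = np.sort(rng.uniform(300, 700, size=200))[::-1]
    temp = -10.0 - 0.08 * (700 - pres) + rng.normal(0, 0.5, size=pres.size)
    soundings.append(pd.DataFrame({"launch": launch, "pres": pres, "tdry": temp}))
sondes = pd.concat(soundings, ignore_index=True)

# Assign every observation to a 25 hPa pressure bin
edges = np.arange(300, 701, 25)
sondes["pres_bin"] = pd.cut(sondes["pres"], bins=edges)

# Average temperature within each bin across all launches
mean_profile = sondes.groupby("pres_bin", observed=True)["tdry"].mean()

# Label each bin by its midpoint pressure for plotting
mean_profile.index = [interval.mid for interval in mean_profile.index]
print(mean_profile.round(1))
```

Binning in pressure avoids having to interpolate each sounding onto a common grid first, at the cost of vertical resolution set by the bin width.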

Eventually (not in this notebook), I want to get an understanding of what the precipitation timing was like and see if that matters. I would think that windy storms where snow falls first and then blows around could be the most important.

## 3. How much sublimation over the season came from these events?
We will address this question by calculating hourly sublimation totals from SOS and SAIL over the winter period (dates may need to be adjusted to match the period Eli used). Then, for each of the windy days identified above, we'll get the total sublimation from those specific days.
1) First, make a timeseries plot of cumulative sublimation over the year. Add shaded boxes that mark the days of each wind event.
2) Filter the hourly sublimation totals to just the days we want to include and sum the total.
3) Check how well the days with the most sublimation correspond with these windy days.
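
The bookkeeping for these steps could be sketched as follows. The hourly series is synthetic, the windy days are an arbitrary placeholder (every tenth day), and names such as `hourly_sub` and `windy_days` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic hourly sublimation (mm water equivalent); NOT measured values.
rng = np.random.default_rng(3)
hours = pd.date_range("2022-12-01", "2023-04-30 23:00", freq="60min")
hourly_sub = pd.Series(
    rng.gamma(1.0, 0.01, size=len(hours)), index=hours, name="sublimation_mm"
)

# Step 1: daily totals and the cumulative curve for the timeseries plot
daily_sub = hourly_sub.resample("1D").sum()
cumulative_sub = daily_sub.cumsum()

# Placeholder windy days: every tenth day, standing in for the percentile-filtered days
windy_days = daily_sub.index[::10]

# Step 2: sublimation on windy days as a fraction of the season total
windy_total = daily_sub.loc[windy_days].sum()
season_total = daily_sub.sum()
fraction = windy_total / season_total

# Step 3: overlap between the highest-sublimation days and the windy days
top_sub_days = daily_sub.nlargest(len(windy_days)).index
overlap = top_sub_days.intersection(windy_days)

print(f"windy days contribute {100 * fraction:.1f}% of seasonal sublimation")
print(f"{len(overlap)} of {len(windy_days)} windy days are also top sublimation days")
```

With the real data, the fraction only means something once the sonic filtering mentioned below has been applied.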

From ELI: 
Some of the sonics look fine. Some sonics mess up the estimates. Filtering is important.