# Science Plan

Objective: test the hypothesis that weather events can affect the bottom currents that control the bending and rise heights of hydrothermal plumes.

Mostly copied from the Old Sacker Science Plan notebook, however this uses the new bending data. I figured the old one might be good to keep around for reference but adding the new data to it would make it too cluttered. This notebook also does not include the Centerline Vertical Flow Rate Data and Ras Data at the moment.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from mpl_toolkits import mplot3d
import pandas as pd
from datetime import datetime
import hvplot.pandas
import holoviews as hv
from holoviews import dim, opts
hv.extension('bokeh')

## Step One - Find Some Data

* Check out the loaded zip file. It contains plume bending data (direction of bending and magnitude of bending), vertical velocity in the plume, a vent temperature data file, and some weather data.  Note these files contain data for two time frames - Oct 2010 and Oct to Dec 2011. 
    i. BendData*.txt - these two files are the basic bending data - three columns = {direction as angle from north, bending magnitude as angle from vertical, julian date}
    ii. Other files are .mat format so Dax or I may need to help with these. I’ll try to load more information on them soon.
* I also loaded two powerpoints of talks that came out of the pilot study.  Some of the material is irrelevant.
* I don’t have a handy tidal data file -- bottom current and pressure from tides -- but this should exist at least as model data
* At Ocean Networks Canada’s NEPTUNE observatory, they had current meters (ADCP) at a regional circulation mooring about 1 km N to NE of the COVIS site that collected data in Oct 2010 and in late 2011 to early 2012. 
* I did include the weather data I found. This came from the NOAA and National Weather Center’s records.  Feel free to do your own hunt for data!
* At some point, you might want the actual grids of COVIS data.  Right now most data is in Matlab’s .mat format and takes a bit of processing to get images, centerlines, and bending data.  But this will be useful to lengthen the data series (COVIS took data in Oct 2010 and from Oct 2011 to some time in late 2014 or early 2015).
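
For the .mat files mentioned above, a minimal sketch of a first look using `scipy.io.loadmat`. The file written here is a tiny stand-in, and the variable name `bend` and its three-column layout are hypothetical, not the actual COVIS structure.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Write a tiny stand-in .mat file so the sketch runs anywhere.
# The variable name 'bend' and its layout are hypothetical, not the COVIS format.
tmp_dir = tempfile.mkdtemp()
mat_path = os.path.join(tmp_dir, 'example.mat')
savemat(mat_path, {'bend': np.array([[10.0, 5.0, 273.5], [20.0, 7.5, 273.6]])})

# loadmat returns a dict; keys starting with '__' are file metadata.
contents = loadmat(mat_path)
variables = {k: v for k, v in contents.items() if not k.startswith('__')}
for name, value in variables.items():
    print(name, value.shape, value.dtype)
```

Note that `loadmat` does not read MATLAB v7.3 files, which are HDF5 containers; those would need something like `h5py` instead.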

**New bending data (angles_partialpts_2010.dat and angles_partialpts_2011.dat) have replaced the BendData files. As initially ingested, this bending data contains two angles:**

**1. inclination - which is the angle from the vertical (positive z-axis)**

**2. azimuth - which is the counterclockwise angle from the positive x-axis (standard for polar coordinate systems)**
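
Wind direction is reported as a compass bearing (clockwise from north), so comparing it with azimuth may need a change of convention. A minimal sketch, assuming the +x axis points east and +y points north; that orientation is an assumption and has not been confirmed for the COVIS grid.

```python
import numpy as np

def azimuth_to_bearing(azimuth_deg):
    """Convert a counterclockwise-from-+x angle to a clockwise-from-north bearing.

    Assumes +x points east and +y points north (not confirmed for COVIS).
    """
    return (90.0 - np.asarray(azimuth_deg, dtype=float)) % 360.0

# East, north, west, south in the math convention map to 90, 0, 270, 180.
print(azimuth_to_bearing([0, 90, 180, -90]))
```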

## Recomputed Plume Bending Data 

**2010**

In [None]:
!echo 'date,inclination,azimuth' | cat - ~/covis/diaz/CovisData/angles_partialpts_2010.dat > ~/covis/diaz/CovisData/temp.txt
!sed 's/ /,/g' ~/covis/diaz/CovisData/temp.txt > ~/covis/diaz/CovisData/angles_partialpts_2010.csv
!rm ~/covis/diaz/CovisData/temp.txt*
!head ~/covis/diaz/CovisData/angles_partialpts_2010.csv

In [None]:
path = '~/covis/diaz/CovisData/angles_partialpts_2010.csv'
df_rc2010 = pd.read_csv(path, sep=",")
df_rc2010['year'] = '2010'
df_rc2010['datetime'] = pd.to_datetime(df_rc2010.year, format='%Y') + pd.to_timedelta(df_rc2010.date - 1, unit='d')
df_rc2010['datetime'] = df_rc2010['datetime'].dt.round('1s')
df_rc2010 = df_rc2010.set_index('datetime')
df_rc2010.drop(['date', 'year'], axis=1,inplace=True)
df_rc2010 = df_rc2010.rename_axis(None)
df_rc2010['azimuth'] = df_rc2010['azimuth'].add(180)  # shift only the azimuth column by 180 degrees
df_rc2010 = df_rc2010.resample('h').mean()
df_rc2010.head(80)

**2011**

In [None]:
!echo 'date,inclination,azimuth' | cat - ~/covis/diaz/CovisData/angles_partialpts_2011.dat > ~/covis/diaz/CovisData/temp.txt
!sed 's/ /,/g' ~/covis/diaz/CovisData/temp.txt > ~/covis/diaz/CovisData/angles_partialpts_2011.csv
!rm ~/covis/diaz/CovisData/temp.txt*
!head ~/covis/diaz/CovisData/angles_partialpts_2011.csv

In [None]:
path = '~/covis/diaz/CovisData/angles_partialpts_2011.csv' 
df_rc2011 = pd.read_csv(path, sep=",")
df_rc2011['year'] = '2011'
df_rc2011['datetime'] = pd.to_datetime(df_rc2011.year, format='%Y') + pd.to_timedelta(df_rc2011.date - 1, unit='d')
df_rc2011['datetime'] = df_rc2011['datetime'].dt.round('1s')
df_rc2011 = df_rc2011.set_index('datetime')
df_rc2011.drop(['date', 'year'], axis=1,inplace=True)
df_rc2011 = df_rc2011.rename_axis(None)
df_rc2011['azimuth'] = df_rc2011['azimuth'].add(180)  # shift only the azimuth column by 180 degrees
df_rc2011 = df_rc2011.resample('h').mean()
df_rc2011.head(50)

## Weather Data

(weather_data_for_plotting*.txt files)

An explanation of the variable names can be found at https://www.ndbc.noaa.gov/measdes.shtml

13 variables (Year, Month, Day, Hour, Minute, Seconds, Julian day, wave height, wind direction, wind speed, wind gust speed, atmospheric pressure, air temperature)

separator = single blank space

Notes:
1. Time is given both as a vector (from original data file probably) and as a Julian day.
2. No information on the other units, but m/s is likely for speeds.  The rest should match the NOAA information.
3. I probably still have the original NOAA data files for these buoys. I'll see if I can match data files to the time periods; my directories seemed a little confused. 
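
Given the layout described above (13 whitespace-separated columns with the time stored as a vector), pandas can also parse such a file directly. A minimal sketch, using two made-up rows in place of a real `weather_data_for_plotting*.txt` file; the column names follow the ones used in this notebook.

```python
import io

import pandas as pd

columns = ['year', 'month', 'day', 'hour', 'minute', 'seconds', 'jday',
           'wv_height', 'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp']

# Two made-up rows standing in for a weather_data_for_plotting*.txt file.
sample = io.StringIO(
    "   2010   10   1   0   43   0   274.03   2.1   180   6.5   8.0   1015.2   12.4\n"
    "   2010   10   1   1   43   0   274.07   2.3   190   7.0   9.1   1014.9   12.1\n"
)

# sep=r'\s+' treats any run of whitespace as a single separator.
df = pd.read_csv(sample, sep=r'\s+', header=None, names=columns)

# to_datetime assembles timestamps from year/month/day/hour/minute columns.
df.index = pd.to_datetime(df[['year', 'month', 'day', 'hour', 'minute']])
df = df.drop(columns=['year', 'month', 'day', 'hour', 'minute', 'seconds'])
print(df)
```

This is an alternative to the `sed` pipeline used in the cells below, not a description of how the existing CSV files were produced.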

**2010 C46036**

In [None]:
!echo 'year,month,day,hour,minute,seconds,jday,wv_height,wnd_dir,wnd_spd,wnd_gspd,atm_prs,air_temp' | cat - ~/covis/diaz/CovisData/weather_data_for_plotting_2010_C46036.txt > ~/covis/diaz/CovisData/temp.txt
!sed 's/   /,/g' ~/covis/diaz/CovisData/temp.txt > ~/covis/diaz/CovisData/temp1.txt
!sed 's/^,//' <~/covis/diaz/CovisData/temp1.txt >~/covis/diaz/CovisData/weather_data_for_plotting_2010_C46036.csv
!rm ~/covis/diaz/CovisData/temp*

In [None]:
path = '~/covis/diaz/CovisData/weather_data_for_plotting_2010_C46036.csv' 
df_wd2010c46036 = pd.read_csv(path, sep=",")
df_wd2010c46036['year']= df_wd2010c46036['year'].astype(int).astype(str)
df_wd2010c46036['month']= df_wd2010c46036['month'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2010c46036['day']= df_wd2010c46036['day'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2010c46036['hour']= df_wd2010c46036['hour'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2010c46036['minute']=df_wd2010c46036['minute'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2010c46036['datetime'] = df_wd2010c46036['year'] + df_wd2010c46036['month'] + df_wd2010c46036['day'] +\
'T' + df_wd2010c46036['hour']+ ':' + df_wd2010c46036['minute']
df_wd2010c46036['datetime'] = pd.to_datetime(df_wd2010c46036['datetime'])
df_wd2010c46036 = df_wd2010c46036.set_index('datetime')
df_wd2010c46036 = df_wd2010c46036[['jday', 'wv_height', 'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp']]
df_wd2010c46036 = df_wd2010c46036.rename_axis(None)
df_wd2010c46036 = df_wd2010c46036.resample('h').mean()
df_wd2010c46036

**2011 C46036**

In [None]:
!echo 'year,month,day,hour,minute,seconds,jday,wv_height,wnd_dir,wnd_spd,wnd_gspd,atm_prs,air_temp' | cat - ~/covis/diaz/CovisData/weather_data_for_plotting_2011_C46036.txt > ~/covis/diaz/CovisData/temp.txt
!sed 's/   /,/g' ~/covis/diaz/CovisData/temp.txt > ~/covis/diaz/CovisData/temp1.txt
!sed 's/^,//' <~/covis/diaz/CovisData/temp1.txt >~/covis/diaz/CovisData/weather_data_for_plotting_2011_C46036.csv
!rm ~/covis/diaz/CovisData/temp*

In [None]:
path = '~/covis/diaz/CovisData/weather_data_for_plotting_2011_C46036.csv'
df_wd2011c46036 = pd.read_csv(path, sep=",")
df_wd2011c46036['year']= df_wd2011c46036['year'].astype(int).astype(str)
df_wd2011c46036['month']= df_wd2011c46036['month'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2011c46036['day']= df_wd2011c46036['day'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2011c46036['hour']= df_wd2011c46036['hour'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2011c46036['minute']=df_wd2011c46036['minute'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2011c46036['datetime'] = df_wd2011c46036['year'] + df_wd2011c46036['month'] + df_wd2011c46036['day'] +\
'T' + df_wd2011c46036['hour']+ ':' + df_wd2011c46036['minute']
df_wd2011c46036['datetime'] = pd.to_datetime(df_wd2011c46036['datetime'])
df_wd2011c46036 = df_wd2011c46036.set_index('datetime')
df_wd2011c46036 = df_wd2011c46036[['jday', 'wv_height', 'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp']]
df_wd2011c46036 = df_wd2011c46036.rename_axis(None)
df_wd2011c46036 = df_wd2011c46036.resample('h').mean()
df_wd2011c46036

**2010 Tillamook**

In [None]:
!head ~/covis/diaz/CovisData/weather_data_for_plotting_2010_Tillamook.txt

In [None]:
!rm ~/covis/diaz/CovisData/WeatherT2010.csv
!echo 'year,month,day,hour,minute,seconds,jday,wv_height,wnd_dir,wnd_spd,wnd_gspd,atm_prs,air_temp' | cat - ~/covis/diaz/CovisData/weather_data_for_plotting_2010_Tillamook.txt > ~/covis/diaz/CovisData/temp.txt
!sed 's/   /,/g' ~/covis/diaz/CovisData/temp.txt > ~/covis/diaz/CovisData/temp1.txt
!sed 's/,,,,/,/g' ~/covis/diaz/CovisData/temp1.txt > ~/covis/diaz/CovisData/temp2.txt
!sed 's/^,//' <~/covis/diaz/CovisData/temp2.txt >~/covis/diaz/CovisData/weather_data_for_plotting_2010_Tillamook.csv
!rm ~/covis/diaz/CovisData/temp*
!head ~/covis/diaz/CovisData/weather_data_for_plotting_2010_Tillamook.csv

In [None]:
path = '~/covis/diaz/CovisData/weather_data_for_plotting_2010_Tillamook.csv' 
df_wd2010Tillamook = pd.read_csv(path, sep=",")
df_wd2010Tillamook['year']= df_wd2010Tillamook['year'].astype(int).astype(str)
df_wd2010Tillamook['month']= df_wd2010Tillamook['month'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2010Tillamook['day']= df_wd2010Tillamook['day'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2010Tillamook['hour']= df_wd2010Tillamook['hour'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2010Tillamook['minute']=df_wd2010Tillamook['minute'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2010Tillamook['wv_height'] = df_wd2010Tillamook['wv_height'].astype('float64')
df_wd2010Tillamook['wnd_dir'] = df_wd2010Tillamook['wnd_dir'].astype('float64')
df_wd2010Tillamook['wnd_spd'] = df_wd2010Tillamook['wnd_spd'].astype('float64')
df_wd2010Tillamook['wnd_gspd'] = df_wd2010Tillamook['wnd_gspd'].astype('float64')
df_wd2010Tillamook['atm_prs'] = df_wd2010Tillamook['atm_prs'].astype('float64')
df_wd2010Tillamook['air_temp'] = df_wd2010Tillamook['air_temp'].astype('float64')
df_wd2010Tillamook['datetime'] = df_wd2010Tillamook['year'] + df_wd2010Tillamook['month'] + df_wd2010Tillamook['day'] +\
'T' + df_wd2010Tillamook['hour']+ ':' + df_wd2010Tillamook['minute']
df_wd2010Tillamook['datetime'] = pd.to_datetime(df_wd2010Tillamook['datetime'])
df_wd2010Tillamook = df_wd2010Tillamook.set_index('datetime')
df_wd2010Tillamook = df_wd2010Tillamook[['jday', 'wv_height', 'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp']]
df_wd2010Tillamook = df_wd2010Tillamook.rename_axis(None)
df_wd2010Tillamook = df_wd2010Tillamook.resample('h').mean()
df_wd2010Tillamook

**2011 Tillamook**

In [None]:
!head ~/covis/diaz/CovisData/weather_data_for_plotting_2011_Tillamook.txt

In [None]:
!rm ~/covis/diaz/CovisData/WeatherT2011.csv
!echo 'year,month,day,hour,minute,seconds,jday,wv_height,wnd_dir,wnd_spd,wnd_gspd,atm_prs,air_temp' | cat - ~/covis/diaz/CovisData/weather_data_for_plotting_2011_Tillamook.txt > ~/covis/diaz/CovisData/temp.txt
!sed 's/   /,/g' ~/covis/diaz/CovisData/temp.txt > ~/covis/diaz/CovisData/temp1.txt
!sed 's/,,,,/,/g' ~/covis/diaz/CovisData/temp1.txt > ~/covis/diaz/CovisData/temp2.txt
!sed 's/  /,/g' ~/covis/diaz/CovisData/temp2.txt > ~/covis/diaz/CovisData/temp3.txt
!sed 's/^,//' <~/covis/diaz/CovisData/temp3.txt >~/covis/diaz/CovisData/weather_data_for_plotting_2011_Tillamook.csv
!rm ~/covis/diaz/CovisData/temp*
!head ~/covis/diaz/CovisData/weather_data_for_plotting_2011_Tillamook.csv

In [None]:
path = '~/covis/diaz/CovisData/weather_data_for_plotting_2011_Tillamook.csv' 
df_wd2011Tillamook = pd.read_csv(path, sep=",", engine='python')
df_wd2011Tillamook= df_wd2011Tillamook.dropna()
df_wd2011Tillamook['year']= df_wd2011Tillamook['year'].astype(int).astype(str)
df_wd2011Tillamook['month']= df_wd2011Tillamook['month'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2011Tillamook['day']= df_wd2011Tillamook['day'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2011Tillamook['hour']= df_wd2011Tillamook['hour'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2011Tillamook['minute']=df_wd2011Tillamook['minute'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2011Tillamook['wv_height'] = df_wd2011Tillamook['wv_height'].astype('float64')
df_wd2011Tillamook['wnd_dir'] = df_wd2011Tillamook['wnd_dir'].astype('float64')
df_wd2011Tillamook['wnd_spd'] = df_wd2011Tillamook['wnd_spd'].astype('float64')
df_wd2011Tillamook['wnd_gspd'] = df_wd2011Tillamook['wnd_gspd'].astype('float64')
df_wd2011Tillamook['atm_prs'] = df_wd2011Tillamook['atm_prs'].astype('float64')
df_wd2011Tillamook['air_temp'] = df_wd2011Tillamook['air_temp'].astype('float64')
df_wd2011Tillamook['datetime'] = df_wd2011Tillamook['year'] + df_wd2011Tillamook['month'] + df_wd2011Tillamook['day'] +\
'T' + df_wd2011Tillamook['hour']+ ':' + df_wd2011Tillamook['minute']
df_wd2011Tillamook['datetime'] = pd.to_datetime(df_wd2011Tillamook['datetime'])
df_wd2011Tillamook = df_wd2011Tillamook.set_index('datetime')
df_wd2011Tillamook = df_wd2011Tillamook[['jday', 'wv_height', 'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp']]
df_wd2011Tillamook = df_wd2011Tillamook.rename_axis(None)
df_wd2011Tillamook = df_wd2011Tillamook.resample('h').mean()
df_wd2011Tillamook

## Step Two - Plot The Data
* What patterns do you see?  
* What else can you do with this data?
* Do the different data sets correlate?

**Weather Data**

**Weather Data 2010 C46036**

1. Wave Height

In [None]:
df_wd2010c46036.hvplot(x = 'index', y= ['wv_height'])

2. Wind Direction 

In [None]:
df_wd2010c46036.hvplot(x = 'index', y= ['wnd_dir'])

3. Wind Speed

In [None]:
df_wd2010c46036.hvplot(x = 'index', y= ['wnd_spd'])

4. Wind Gust Speed

In [None]:
df_wd2010c46036.hvplot(x = 'index', y= ['wnd_gspd'])

5. Atmospheric Pressure

In [None]:
df_wd2010c46036.hvplot(x = 'index', y= ['atm_prs'])

6. Air Temperature

In [None]:
df_wd2010c46036.hvplot(x = 'index', y= ['air_temp'])

7. All Variables

In [None]:
cols = ['wv_height', 'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp']
hv.Layout([df_wd2010c46036.hvplot(x='index', y=[c], width=350, height=300) for c in cols]).cols(2)

**Weather Data 2011 C46036:**

1. Wave Height

In [None]:
df_wd2011c46036.hvplot(x = 'index', y= ['wv_height'])

2. Wind Direction

In [None]:
df_wd2011c46036.hvplot(x = 'index', y= ['wnd_dir'])

3. Wind Speed

In [None]:
df_wd2011c46036.hvplot(x = 'index', y= ['wnd_spd'])

4. Wind Gust Speed

In [None]:
df_wd2011c46036.hvplot(x = 'index', y= ['wnd_gspd'])

5. Atmospheric Pressure

In [None]:
df_wd2011c46036.hvplot(x = 'index', y= ['atm_prs'])

6. Air Temperature

In [None]:
df_wd2011c46036.hvplot(x = 'index', y= ['air_temp'])

7. All Variables

In [None]:
cols = ['wv_height', 'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp']
hv.Layout([df_wd2011c46036.hvplot(x='index', y=[c], width=350, height=300) for c in cols]).cols(2)

**Weather Data 2010 Tillamook:**

1. Wind Direction

In [None]:
df_wd2010Tillamook.hvplot(x = 'index' , y= ['wnd_dir'])

2. Wind Speed

In [None]:
df_wd2010Tillamook.hvplot(x = 'index' , y= ['wnd_spd'])

3. Wind Gust Speed

In [None]:
df_wd2010Tillamook.hvplot(x = 'index', y= ['wnd_gspd'])

4. Air Temperature

In [None]:
df_wd2010Tillamook.hvplot(x = 'index', y= ['air_temp'])

5. All Variables

In [None]:
cols = ['wv_height', 'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp']
hv.Layout([df_wd2010Tillamook.hvplot(x='index', y=[c], width=350, height=300) for c in cols]).cols(2)

**Weather Data 2011 Tillamook:**

1. Wind Direction

In [None]:
df_wd2011Tillamook.hvplot(x = 'index', y= ['wnd_dir'])

2. Wind Speed

In [None]:
df_wd2011Tillamook.hvplot(x = 'index', y= ['wnd_spd'])

3. Wind Gust Speed

In [None]:
df_wd2011Tillamook.hvplot(x = 'index', y= ['wnd_gspd'])

4. Air Temperature

In [None]:
df_wd2011Tillamook.hvplot(x = 'index', y= ['air_temp'])

5. All Variables

In [None]:
cols = ['wv_height', 'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp']
hv.Layout([df_wd2011Tillamook.hvplot(x='index', y=[c], width=350, height=300) for c in cols]).cols(2)

## Step Three - Statistical Tests

The goal here is to use statistical tests to identify or confirm correlations and common patterns.  Correlation tests include the correlation coefficient (Pearson's r; its square, R², is the coefficient of determination) and cross-correlation.  The correlation coefficient tests whether variables shift value in sync.  Cross-correlation compares pairs of time series at different time lags and tests for consistent long-term patterns.  We will also look for periodicities.
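
To make the cross-correlation idea concrete, here is a minimal sketch that computes Pearson's r between one series and lagged copies of another. The helper `lagged_correlation` and the synthetic series are illustrative assumptions, not part of the existing analysis.

```python
import numpy as np
import pandas as pd

def lagged_correlation(x, y, max_lag):
    """Pearson r between x(t) and y(t + lag) for lag = -max_lag..max_lag samples."""
    lags = range(-max_lag, max_lag + 1)
    return pd.Series([x.corr(y.shift(-lag)) for lag in lags], index=list(lags))

# Synthetic hourly series (not COVIS data): y repeats x three hours later.
rng = np.random.default_rng(0)
hours = pd.date_range('2010-10-01', periods=200, freq='h')
x = pd.Series(rng.normal(size=hours.size), index=hours)
y = x.shift(3)

xcorr = lagged_correlation(x, y, max_lag=12)
print('lag with strongest correlation (hours):', xcorr.idxmax())
```

A peak at a positive lag means the second series follows the first, which is the pattern one would look for if weather leads plume bending.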

Issues:
* Both correlation coefficient estimation and cross-correlation computation assume that the values in the different datasets correspond to one another; here that correspondence is in time.  So we will need to do some interpolation to get our time series onto similar sampling times.
* Most estimators of periodicity assume regular or uniform spacing of the data. There are, however, methods for working with data with gaps.
* Periodicity estimates also change with the length of the data set.  So we will need to think about how to best compare the 2010 and 2011 data sets given the short 2010 record.
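
One of the gap-tolerant methods alluded to above is the Lomb-Scargle periodogram, available as `scipy.signal.lombscargle`. A minimal sketch on synthetic, irregularly sampled data; the 12.42 h period (roughly the M2 semidiurnal tide), the noise level, and the period grid are illustrative choices.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(1)

# Irregular sample times in hours over about 25 days (synthetic, not COVIS data).
t = np.sort(rng.uniform(0.0, 600.0, size=250))
true_period = 12.42  # hours; roughly the M2 semidiurnal tide
y = np.sin(2 * np.pi * t / true_period) + 0.3 * rng.normal(size=t.size)

# lombscargle expects angular frequencies (radians per unit time).
periods = np.linspace(6.0, 30.0, 2000)
ang_freqs = 2 * np.pi / periods
power = lombscargle(t, y - y.mean(), ang_freqs)

best_period = periods[np.argmax(power)]
print(f'strongest period: {best_period:.2f} h')
```

Because the width of a spectral peak depends on record length, running the same period grid on a 2011 record truncated to the 2010 duration is one way to compare the two years on equal footing.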

1. Extract timing information for the various data sets, especially start time, stop time, and time intervals.  We need both the typical (not necessarily average) time step as well as some information on the number and size of data gaps.  We will use this to determine how to proceed with the correlation steps.  In particular, it will be useful to know which data are sampled at similar or faster rates than the plume bending data.
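
The timing information described in this step can also be computed from a `DatetimeIndex` rather than entered by hand. A minimal sketch; the helper `timing_summary` and its `gap_factor` threshold are hypothetical choices for illustration.

```python
import pandas as pd

def timing_summary(index, gap_factor=3):
    """Start, stop, typical step, and gap count for a DatetimeIndex.

    A 'gap' is any step longer than gap_factor times the median step;
    gap_factor is an arbitrary illustrative choice.
    """
    steps = pd.Series(index).sort_values().diff().dropna()
    median_step = steps.median()
    gaps = steps[steps > gap_factor * median_step]
    return {
        'start time': index.min(),
        'end time': index.max(),
        'median step': median_step,
        'number of gaps': len(gaps),
        'longest gap': gaps.max() if len(gaps) else pd.Timedelta(0),
    }

# Synthetic 3-hourly series with one day removed to create a gap.
idx = pd.date_range('2010-10-01', periods=40, freq='3h')
idx = idx[(idx < '2010-10-03') | (idx >= '2010-10-04')]
summary = timing_summary(idx)
print(summary)
```

This would need to run on the raw timestamps, before `resample('h').mean()`, since resampling replaces the original sampling with a regular hourly index.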

In [None]:
# not yet altered
times = {'Recomputed Bending Data 2010':['2010-09-29 15:30:17','2010-10-25 00:08:03','3 hours, but highly variable'],
        'Recomputed Bending Data 2011':['2011-09-27 00:00:38','2011-12-30 21:00:40','3 hours, but highly variable'],
        'Weather Data 2010 C':['2010-09-30 23:43:00','2010-10-31 23:43:00','hourly'],
        'Weather Data 2011 C':['2011-09-25 23:43:00','2011-12-31 23:43:00','hourly'],
        'Weather Data 2010 T':['2010-09-29 00:07:00','2010-11-01 23:51:00','20 minutes'],
        'Weather Data 2011 T':['2010-10-01 00:10:00','2013-12-01 23:55:00','20 minutes']}
df_times = pd.DataFrame(times,index = ['start time','end time','time interval'])
df_times

*df_nanvalues shows the number of NaN values for each column in the weather data. Some NaN values (particularly for the C46036 data) are the result of resampling the data.*

In [None]:
weather_cols = ['wv_height', 'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp']
weather_frames = {'Weather Data 2010 C': df_wd2010c46036,
                  'Weather Data 2011 C': df_wd2011c46036,
                  'Weather Data 2010 T': df_wd2010Tillamook,
                  'Weather Data 2011 T': df_wd2011Tillamook}
# Count the NaN values in each weather column for every buoy and year
nanvalues = {name: df[weather_cols].isna().sum().tolist() for name, df in weather_frames.items()}
df_nanvalues = pd.DataFrame(nanvalues, index=['nan values ' + c for c in weather_cols])
df_nanvalues

2. Estimate some basic statistics for key data sets: means, standard deviations, obvious trends.  This will be useful in defining what extreme bending means if nothing else.
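
Two cautions apply to these statistics: angles such as azimuth and wind direction wrap around at 0/360, so an arithmetic mean can be misleading, and "extreme" bending needs an explicit threshold. A minimal sketch on synthetic numbers; the 90th-percentile cutoff is only one possible definition and is an assumption here.

```python
import pandas as pd
from scipy.stats import circmean

# Synthetic angles in degrees (not COVIS data), clustered around 0/360.
azimuth = pd.Series([350.0, 355.0, 5.0, 10.0, 0.0])
inclination = pd.Series([5.0, 7.0, 6.0, 30.0, 8.0, 6.5, 7.5, 5.5])

# The arithmetic mean of angles straddling 0/360 points the wrong way;
# the circular mean averages unit vectors instead.
arithmetic = azimuth.mean()
circular = circmean(azimuth, high=360, low=0)
print('arithmetic mean:', arithmetic)
print('circular mean:  ', circular)

# One possible definition of "extreme" bending: above the 90th percentile.
threshold = inclination.quantile(0.9)
extreme = inclination[inclination > threshold]
print('extreme inclinations:', extreme.tolist())
```

The same wrap-around concern applies to hourly means of wind direction and azimuth produced by `resample('h').mean()`.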

Recomputed Bending Data

In [None]:
df_rc2010.describe()

In [None]:
df_rc2011.describe()

Weather Data 

In [None]:
df_wd2010c46036.describe()

In [None]:
df_wd2011c46036.describe()

In [None]:
df_wd2010Tillamook.describe()

In [None]:
df_wd2011Tillamook.describe()

3. Identify which data sets are key. Which data sets are central to the goals or questions?  Which data sets seem to have common features, patterns, or trends?  This may require plotting data together in ways or combinations that have not yet seemed obvious.

4. Interpolate data onto a common time sampling scheme.  All the COVIS data should be at similar but not necessarily identical times: the vertical velocity data will be offset about 20-40 minutes later than the bending data and may have more gaps.  I don't remember about the weather data and each buoy may be different.  I forget what else there is that I gave you.  I think there is some relevant data that we have not yet pulled from its repository.

5. So you might think a little about what information you'd like to have but don't.

6. Once interpolated to a common sampling, you can estimate the correlation coefficient.  This expects a linear pattern so it might or might not pick up on all relationships that exist.  Also, some of the directions in the bending data are not correct.  So I need to get you an updated data file soon. But you can figure out how to do this with the existing data while I figure out the problems with the old data.
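
Steps 4 and 6 can be sketched together: interpolate an irregular series onto hourly times, then compare Pearson's r with Spearman's rank correlation, which does not require a linear relationship. Everything below is synthetic; the `wind_speed` function and the exponential dependence of bending on wind are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)

def wind_speed(hours):
    """Synthetic wind speed (m/s) with a 48 h cycle; purely illustrative."""
    return 10.0 + 8.0 * np.sin(2.0 * np.pi * hours / 48.0)

start = pd.Timestamp('2010-10-01')

# Regular hourly "weather" series.
hourly = pd.date_range(start, periods=240, freq='h')
wind = pd.Series(wind_speed(np.arange(240.0)), index=hourly, name='wnd_spd')

# Irregular "bending" samples that depend non-linearly on wind speed.
t_bend = np.sort(rng.uniform(0.0, 239.0, size=120))
bend = pd.Series(np.exp(wind_speed(t_bend) / 4.0) + rng.normal(scale=0.3, size=t_bend.size),
                 index=start + pd.to_timedelta(t_bend, unit='h'), name='inclination')

# Step 4: interpolate in time on the union of both axes, then keep the hourly points.
# limit_area='inside' avoids extrapolating beyond the first and last bending sample.
union = bend.index.union(hourly)
bend_hourly = bend.reindex(union).interpolate(method='time', limit_area='inside').reindex(hourly)

# Step 6: correlation on the common sampling.
both = pd.concat([wind, bend_hourly], axis=1).dropna()
pearson = both['wnd_spd'].corr(both['inclination'], method='pearson')
spearman = both['wnd_spd'].corr(both['inclination'], method='spearman')
print(f'Pearson r = {pearson:.2f}, Spearman rho = {spearman:.2f}')
```

Linear interpolation across long gaps invents data, so for the real series it may be safer to mask hourly points that sit far from any actual bending sample before correlating.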

## **2010**

## Recomputed Plume Bending Data and C46036 Weather Data

In [None]:
rc2010c = pd.merge(df_wd2010c46036, df_rc2010,how='inner', indicator=True, left_index=True, right_index=True, suffixes=('_B', '_G'))

In [None]:
df_covis_rc2010c = pd.DataFrame()
df_covis_rc2010c = rc2010c[rc2010c['_merge'] == 'both']
del df_covis_rc2010c['_merge']
df_covis_rc2010c = df_covis_rc2010c.dropna()
del df_covis_rc2010c['jday']
df_covis_rc2010c.head()

In [None]:
df_covis_rc2010c.describe()

In [None]:
weather = df_covis_rc2010c.hvplot.scatter(y='wnd_dir', color='wnd_spd',
                                     cmap='colorwheel', s= 'wnd_spd',
                                     scale = 2, height=200,
                                     title= 'Example Plots',
                                     ylabel= 'Wind Direction (deg)',
                                     ylim = (0, 400),hover_cols=['wv_height'])
bending = df_covis_rc2010c.hvplot.scatter(y='azimuth',color='inclination',
                                     cmap='viridis', s= 'wnd_spd',
                                     scale = 2, height=200,
                                     ylabel= 'Azimuth (deg)',
                                     xlabel = 'Time (h)',
                                     ylim = (-160, 150)).hist() 
(weather + bending).cols(1)

*Above we have two figures that allow for comparison over time of the various parameters we are interested in. The top figure uses the y axis to plot wind direction, and uses both color and dot size to communicate wind speed.*

*The second plot uses the y axis to plot azimuth and uses color to communicate inclination... the wind speed is still communicated using size.*

In [None]:
df_covis_rc2010c.hvplot.scatter(x = 'wnd_dir', y= 'azimuth')

In [None]:
df_covis_rc2010c.hvplot.scatter(x = 'wnd_spd', y= 'inclination')

In [None]:
df_covis_rc2010c.wnd_dir.hvplot.violin(by='index.day')

In [None]:
df_covis_rc2010c.wnd_spd.hvplot.violin(by='index.day')

In [None]:
df_covis_rc2010c.hvplot.hist(y='azimuth')

In [None]:
df_covis_rc2010c.hvplot.hist(y='inclination')

## Recomputed Plume Bending Data and Tillamook Weather Data

***Calling `dropna()` on this merged frame leaves no rows to show, so it is commented out below.***
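
An empty result from `dropna()` usually means every row has a NaN in at least one column. A minimal sketch of how to check that and keep usable rows, on a synthetic frame; which columns are actually empty in the Tillamook data is not established here, so the all-NaN `wv_height` column is a hypothetical example.

```python
import numpy as np
import pandas as pd

# Synthetic merged frame where one column is entirely missing (column choice is hypothetical).
idx = pd.date_range('2010-10-01', periods=4, freq='h')
merged = pd.DataFrame({'wv_height': [np.nan] * 4,
                       'wnd_spd': [5.0, np.nan, 7.0, 6.0],
                       'inclination': [10.0, 12.0, np.nan, 9.0]}, index=idx)

# Every row has at least one NaN, so a plain dropna() keeps nothing.
print('rows after dropna():', len(merged.dropna()))
# Fraction of missing values per column shows which column is responsible.
print(merged.isna().mean())

# Option 1: drop all-NaN columns first, then drop incomplete rows.
usable = merged.dropna(axis=1, how='all').dropna()
# Option 2: only require the columns that a given plot or test needs.
pair = merged.dropna(subset=['wnd_spd', 'inclination'])
print(usable)
```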

In [None]:
rc2010T = pd.merge(df_wd2010Tillamook, df_rc2010,how='inner', indicator=True, left_index=True, right_index=True, suffixes=('_B', '_G'))

In [None]:
df_covis_rc2010T = pd.DataFrame()
df_covis_rc2010T = rc2010T[rc2010T['_merge'] == 'both']
del df_covis_rc2010T['_merge']
#df_covis_rc2010T = df_covis_rc2010T.dropna()
del df_covis_rc2010T['jday']
df_covis_rc2010T.head()

In [None]:
df_covis_rc2010T.describe()

In [None]:
weather = df_covis_rc2010T.hvplot.scatter(y='wnd_dir', color='wnd_spd',
                                     cmap='colorwheel', s= 'wnd_spd',
                                     scale = 2, height=200,
                                     title= 'Example Plots',
                                     ylabel= 'Wind Direction (deg)',
                                     ylim = (0, 400),hover_cols=['wv_height'])
bending = df_covis_rc2010T.hvplot.scatter(y='azimuth',color='inclination',
                                     cmap='viridis', s= 'wnd_spd',
                                     scale = 2, height=200,
                                     ylabel= 'Azimuth (deg)',
                                     xlabel = 'Time (h)',
                                     ylim = (-160, 150)).hist() 
(weather + bending).cols(1)

In [None]:
df_covis_rc2010T.hvplot.scatter(x = 'wnd_dir', y= 'azimuth')

In [None]:
df_covis_rc2010T.hvplot.scatter(x = 'wnd_spd', y= 'inclination')

In [None]:
df_covis_rc2010T.wnd_dir.hvplot.violin(by='index.day')

In [None]:
df_covis_rc2010T.wnd_spd.hvplot.violin(by='index.day')

In [None]:
df_covis_rc2010T.hvplot.hist(y='azimuth')

In [None]:
df_covis_rc2010T.hvplot.hist(y='inclination')

## **2011**

## Recomputed Plume Bending Data and C46036 Weather Data

In [None]:
rc2011c = pd.merge(df_wd2011c46036, df_rc2011,how='inner', indicator=True, left_index=True, right_index=True, suffixes=('_B', '_G'))

In [None]:
df_covis_rc2011c = pd.DataFrame()
df_covis_rc2011c = rc2011c[rc2011c['_merge'] == 'both']
del df_covis_rc2011c['_merge']
df_covis_rc2011c = df_covis_rc2011c.dropna()
del df_covis_rc2011c['jday']
df_covis_rc2011c.head()

In [None]:
df_covis_rc2011c.describe()

In [None]:
weather = df_covis_rc2011c.hvplot.scatter(y='wnd_dir', color='wnd_spd',
                                     cmap='colorwheel', s= 'wnd_spd',
                                     scale = 2, height=200,
                                     title= 'Example Plots',
                                     ylabel= 'Wind Direction (deg)',
                                     ylim = (0, 400),hover_cols=['wv_height'])
bending = df_covis_rc2011c.hvplot.scatter(y='azimuth',color='inclination',
                                     cmap='viridis', s= 'wnd_spd',
                                     scale = 2, height=200,
                                     ylabel= 'Azimuth (deg)',
                                     xlabel = 'Time (h)',
                                     ylim = (-160, 150)).hist() 
(weather + bending).cols(1)

In [None]:
df_covis_rc2011c.hvplot.scatter(x = 'wnd_dir', y= 'azimuth')

In [None]:
df_covis_rc2011c.hvplot.scatter(x = 'wnd_spd', y= 'inclination')

In [None]:
df_covis_rc2011c.wnd_dir.hvplot.violin(by='index.day')

In [None]:
df_covis_rc2011c.wnd_spd.hvplot.violin(by='index.day')

In [None]:
df_covis_rc2011c.hvplot.hist(y='azimuth')

In [None]:
df_covis_rc2011c.hvplot.hist(y='inclination')

## Recomputed Plume Bending Data and Tillamook Weather Data

***Calling `dropna()` on this merged frame leaves no rows to show, so it is commented out below.***

In [None]:
rc2011T = pd.merge(df_wd2011Tillamook, df_rc2011,how='inner', indicator=True, left_index=True, right_index=True, suffixes=('_B', '_G'))

In [None]:
df_covis_rc2011T = pd.DataFrame()
df_covis_rc2011T = rc2011T[rc2011T['_merge'] == 'both']
del df_covis_rc2011T['_merge']
#df_covis_rc2011T = df_covis_rc2011T.dropna()
del df_covis_rc2011T['jday']
df_covis_rc2011T.head()

In [None]:
df_covis_rc2011T.describe()

In [None]:
weather = df_covis_rc2011T.hvplot.scatter(y='wnd_dir', color='wnd_spd',
                                     cmap='colorwheel', s= 'wnd_spd',
                                     scale = 2, height=200,
                                     title= 'Example Plots',
                                     ylabel= 'Wind Direction (deg)',
                                     ylim = (0, 400),hover_cols=['wv_height'])
bending = df_covis_rc2011T.hvplot.scatter(y='azimuth',color='inclination',
                                     cmap='viridis', s= 'wnd_spd',
                                     scale = 2, height=200,
                                     ylabel= 'Azimuth (deg)',
                                     xlabel = 'Time (h)',
                                     ylim = (-160, 150)).hist() 
(weather + bending).cols(1)

In [None]:
df_covis_rc2011T.hvplot.scatter(x = 'wnd_dir', y= 'azimuth')

In [None]:
df_covis_rc2011T.hvplot.scatter(x = 'wnd_spd', y= 'inclination')

In [None]:
df_covis_rc2011T.wnd_dir.hvplot.violin(by='index.day')

In [None]:
df_covis_rc2011T.wnd_spd.hvplot.violin(by='index.day')

In [None]:
df_covis_rc2011T.hvplot.hist(y='azimuth')