# Science Plan

Objective - test the hypothesis that weather events can affect the bottom currents that control the bending and rise heights of hydrothermal plumes.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from mpl_toolkits import mplot3d
import pandas as pd
from datetime import datetime
import hvplot.pandas
import holoviews as hv
from holoviews import dim, opts
hv.extension('bokeh')

## Step One - Find Some Data

* Check out the loaded zip file. It contains plume bending data (direction of bending and magnitude of bending), vertical velocity in the plume, a vent temperature data file, and some weather data.  Note these files contain data for two time frames - Oct 2010 and Oct to Dec 2011. 
    i. BendData*.txt - these two files are the basic bending data - three columns = {direction as angle from north, bending magnitude as angle from vertical, julian date}
    ii. Other files are .mat format so Dax or I may need to help with these. I’ll try to load more information on them soon.
* I also loaded two PowerPoint files of talks that came out of the pilot study.  Some of the material is irrelevant.
* I don’t have a handy tidal data file -- bottom current and pressure from tides -- but this should exist at least as model data
* At Ocean Networks Canada’s NEPTUNE observatory, they had current meters (ADCP) at a regional circulation mooring about 1 km N to NE of the COVIS site that collected data in Oct 2010 and in late 2011 to early 2012. 
* I did include the weather data I found. This came from the NOAA and National Weather Center’s records.  Feel free to do your own hunt for data!
* At some point, you might want the actual grids of COVIS data.  Right now most data is in Matlab’s .mat format and takes a bit of processing to get images, centerlines, and bending data.  But this will be useful to lengthen the data series (COVIS took data in Oct 2010 and from Oct 2011 to some time in late 2014 or early 2015).
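
The three-column layout of the BendData files can probably be read straight into pandas without an intermediate CSV. The following is a minimal sketch, not the notebook's method: it assumes the columns are separated by whitespace and that the Julian date is a fractional day of year (day 1.0 = 1 January 00:00), as the conversion used later in this notebook implies. The function name `read_bend_data` and the sample rows are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample rows in the BendData layout:
# direction (deg from north), bending (deg from vertical), julian day
sample = io.StringIO(
    "  45.0  12.5  272.646\n"
    "  50.0  10.0  272.771\n"
)

def read_bend_data(source, year):
    """Read a whitespace-delimited bending file and index it by datetime."""
    df = pd.read_csv(source, sep=r"\s+", header=None,
                     names=["direction", "deg_from_vert", "jday"])
    start = pd.Timestamp(year=year, month=1, day=1)
    # Day 1.0 is 1 January 00:00, so subtract one day before adding the offset
    times = start + pd.to_timedelta(df["jday"] - 1, unit="D")
    df.index = pd.DatetimeIndex(times.dt.round("1s").to_numpy())
    return df.drop(columns="jday")

bend = read_bend_data(sample, 2010)
print(bend)
```

Passing a file path instead of the `StringIO` object should work the same way, provided the real files follow the assumed layout.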



## Plume Bending Data

**2010**

In [None]:
!echo 'direction,deg_from_vert,jday' | cat - ~jovyan/data/covis_data/BendData2010Oct.txt > ~jovyan/data/covis_data/temp.txt
!sed 's/  /,/g' ~jovyan/data/covis_data/temp.txt > ~jovyan/data/covis_data/bend_data2010.csv

In [None]:
path = '/home/jovyan/data/covis_data/bend_data2010.csv'
df_bd2010 = pd.read_csv(path, sep=",")
df_bd2010['year'] = '2010'
df_bd2010['datetime'] = pd.to_datetime(df_bd2010.year, format='%Y') + pd.to_timedelta(df_bd2010.jday - 1, unit='d')
df_bd2010['datetime'] = df_bd2010['datetime'].dt.round('1s')
df_bd2010 = df_bd2010.set_index('datetime')
df_bd2010.drop(['jday', 'year'], axis=1,inplace=True)
df_bd2010 = df_bd2010.rename_axis(None)
df_bd2010

In [None]:
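# Unit-length plume vector. NumPy trig functions expect radians, so convert first.
r = df_bd2010['r_value'] = 1
# Azimuth: 'direction' is the angle from north (assumed clockwise, compass-style).
az = np.deg2rad(df_bd2010['direction'])
# Tilt: bending magnitude, measured from the vertical.
tilt = np.deg2rad(df_bd2010['deg_from_vert'])
df_bd2010['north (x)'] = r * np.sin(tilt) * np.cos(az)
df_bd2010['east (y)'] = r * np.sin(tilt) * np.sin(az)
df_bd2010['vertical (z)'] = r * np.cos(tilt)
df_bd2010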

**2011**

In [None]:
!echo 'direction,deg_from_vert,jday' | cat - ~jovyan/data/covis_data/BendData2011OctDec.txt > ~jovyan/data/covis_data/temp2.txt
!sed 's/  /,/g' ~jovyan/data/covis_data/temp2.txt > ~jovyan/data/covis_data/bend_data2011.csv

In [None]:
path = '/home/jovyan/data/covis_data/bend_data2011.csv'
df_bd2011 = pd.read_csv(path, sep=",")
df_bd2011['year'] = '2011'
df_bd2011['datetime'] = pd.to_datetime(df_bd2011.year, format='%Y') + pd.to_timedelta(df_bd2011.jday - 1, unit='d')
df_bd2011['datetime'] = df_bd2011['datetime'].dt.round('1s')
df_bd2011 = df_bd2011.set_index('datetime')
df_bd2011.drop(['jday', 'year'], axis=1,inplace=True)
df_bd2011 = df_bd2011.rename_axis(None)
df_bd2011

In [None]:
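# Unit-length plume vector. NumPy trig functions expect radians, so convert first.
r = df_bd2011['r_value'] = 1
# Azimuth: 'direction' is the angle from north (assumed clockwise, compass-style).
az = np.deg2rad(df_bd2011['direction'])
# Tilt: bending magnitude, measured from the vertical.
tilt = np.deg2rad(df_bd2011['deg_from_vert'])
df_bd2011['north (x)'] = r * np.sin(tilt) * np.cos(az)
df_bd2011['east (y)'] = r * np.sin(tilt) * np.sin(az)
df_bd2011['vertical (z)'] = r * np.cos(tilt)
df_bd2011.head(3)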

## Centerline Vertical Flow Rate 

7 variables (Year, Month, Day, Hour, Minute, Second, Flow rate)

separator = single blank space

Notes:
1. Time is given as a vector here.  This was the simplest conversion from the internal Matlab datenum. 
2. Flow rate is in m/s. Values should all be <1 m/s.
3. Several rows contain only the values "NaN" which is a Matlab code for Not a Number.  These represent missing data.
4. The data in this file is from a different set of files than the BendData, so the times will not match but will be offset by 20-40 minutes.  There will also be missing values (maybe even more than the NaN's represent), as the Doppler data can only be interpreted when the plume is near vertical.

Data is from COVIS data collected at Grotto Vent in the MEF.
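
Files in this "time vector plus value" layout can likely be parsed without the sed clean-up. The sketch below is only an illustration on made-up rows: it assumes whitespace-separated columns (the pattern `\s+` covers both single and repeated spaces) and relies on `pd.to_datetime` assembling timestamps from columns named year, month, day, hour, minute, and second.

```python
import io
import pandas as pd

# Hypothetical rows; a row of NaN marks missing data, as described in the notes
sample = io.StringIO(
    "2010 9 30 20 31 0 0.12\n"
    "NaN NaN NaN NaN NaN NaN NaN\n"
    "2010 10 1 2 30 0 0.08\n"
)
cols = ["year", "month", "day", "hour", "minute", "second", "flowrate"]
raw = pd.read_csv(sample, sep=r"\s+", header=None, names=cols).dropna()

# Build timestamps from the six time columns (rounded to whole numbers first)
time_cols = cols[:-1]
stamps = pd.to_datetime(raw[time_cols].round().astype(int))
flow = pd.Series(raw["flowrate"].to_numpy(),
                 index=pd.DatetimeIndex(stamps.to_numpy()),
                 name="flowrate")
print(flow)
```

A quick sanity check against note 2 is `(flow < 1).all()`, which should be true if every flow rate is below 1 m/s.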

In [None]:
!echo 'year month day hour minute second flowrate' | cat - ~jovyan/data/covis_data/centerline_vertical_flow_rate_2010.txt > ~jovyan/data/covis_data/temp.txt
!sed 's/ /,/g' ~jovyan/data/covis_data/temp.txt > ~jovyan/data/covis_data/temp1.txt
!sed 's/,,,,,,,,,,,,,/,/g' ~jovyan/data/covis_data/temp1.txt > ~jovyan/data/covis_data/temp2.txt
!sed 's/,,,/,/g' ~jovyan/data/covis_data/temp2.txt > ~jovyan/data/covis_data/temp3.txt
!sed 's/^,//' <~jovyan/data/covis_data/temp3.txt >~jovyan/data/covis_data/centerline_vertical_flow_rate_2010.csv
!rm ~jovyan/data/covis_data/temp*

In [None]:
path = '/home/jovyan/data/covis_data/centerline_vertical_flow_rate_2010.csv'
df_cvfr2010 = pd.read_csv(path, sep=",", engine='python')
df_cvfr2010= df_cvfr2010.dropna()
df_cvfr2010['year']= df_cvfr2010['year'].round(0).astype(int).astype(str)
df_cvfr2010['month']= df_cvfr2010['month'].round(0).astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_cvfr2010['day']= df_cvfr2010['day'].round(0).astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_cvfr2010['hour']= df_cvfr2010['hour'].round(0).astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_cvfr2010['minute']=df_cvfr2010['minute'].round(0).astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_cvfr2010['second']= df_cvfr2010['second'].round(0).astype(int).astype(str)
df_cvfr2010['datetime'] = df_cvfr2010['year'] + df_cvfr2010['month'] + df_cvfr2010['day'] +\
'T' + df_cvfr2010['hour']+ ':' + df_cvfr2010['minute']
df_cvfr2010['datetime'] = pd.to_datetime(df_cvfr2010['datetime'])
df_cvfr2010 = df_cvfr2010.set_index('datetime')
df_cvfr2010 = df_cvfr2010[['flowrate']]
df_cvfr2010 = df_cvfr2010.rename_axis(None)
df_cvfr2010

## Ras Data

7 variables (Year, Month, Day, Hour, Minute, Second, Temperature)

separator = single blank space

Notes:
1. Time is given as a vector here.  This was the simplest conversion from the internal Matlab datenum. 
2. Temperature is in degrees C. Temperature is averaged at 15-minute intervals starting at 15 minutes after the hour.
3. This data is from the RAS instrument's inflow temperature sensor.
4. The RAS is a water sampling instrument that takes samples when a user requests one or when the inflow temperature sensor exceeds a given value. The temperature sensor is under a "hood" that keeps the ambient ocean bottom waters from mixing with the discharging diffuse fluids.  

I don't expect these data to match up with the COVIS data but you never know.



In [None]:
!rm ~jovyan/data/covis_data/rasdata_2010.csv
!echo 'year,month,day,hour,minute,second,temp' | cat - ~jovyan/data/covis_data/rasdata_2010.txt > ~jovyan/data/covis_data/temp.txt
!sed 's/   /,/g' ~jovyan/data/covis_data/temp.txt > ~jovyan/data/covis_data/temp1.txt
!sed 's/^,//' <~jovyan/data/covis_data/temp1.txt >~jovyan/data/covis_data/rasdata_2010.csv
!rm ~jovyan/data/covis_data/temp*

In [None]:
path = '/home/jovyan/data/covis_data/rasdata_2010.csv'
df_ras2010 = pd.read_csv(path, sep=",")
df_ras2010['year']= df_ras2010['year'].astype(int).astype(str)
df_ras2010['month']= df_ras2010['month'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_ras2010['day']= df_ras2010['day'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_ras2010['hour']= df_ras2010['hour'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_ras2010['minute']=df_ras2010['minute'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_ras2010['datetime'] = df_ras2010['year'] + df_ras2010['month'] + df_ras2010['day'] +\
'T' + df_ras2010['hour']+ ':' + df_ras2010['minute']
df_ras2010['datetime'] = pd.to_datetime(df_ras2010['datetime'])
df_ras2010 = df_ras2010.set_index('datetime')
df_ras2010 = df_ras2010[['temp']]
df_ras2010 = df_ras2010.rename_axis(None)
df_ras2010

## Weather Data

(weather_data_for_plotting*.txt files)

 An explanation of the variable names can be found at https://www.ndbc.noaa.gov/measdes.shtml

13 variables (Year, Month, Day, Hour, Minute, Seconds, Julian day, wave height, wind direction, wind speed, wind gust speed, atmospheric pressure, air temperature)

separator = single blank space

Notes:
1. Time is given both as a vector (from original data file probably) and as a Julian day.
2. No information on other units, but m/s is likely for speeds.  The rest should match the NOAA information.  
3. I probably still have the original NOAA data files for these buoys. I'll see if I can match data files to the time periods; my directories seemed a little confused. 

**2010 C46036**

In [None]:
!rm ~jovyan/data/covis_data/weather_data_for_plotting_2010_C46036.csv
!echo 'year,month,day,hour,minute,seconds,jday,wv_height,wnd_dir,wnd_spd,wnd_gspd,atm_prs,air_temp' | cat - ~jovyan/data/covis_data/weather_data_for_plotting_2010_C46036.txt > ~jovyan/data/covis_data/temp.txt
!sed 's/   /,/g' ~jovyan/data/covis_data/temp.txt > ~jovyan/data/covis_data/temp1.txt
!sed 's/^,//' <~jovyan/data/covis_data/temp1.txt >~jovyan/data/covis_data/weather_data_for_plotting_2010_C46036.csv
!rm ~jovyan/data/covis_data/temp*

In [None]:
path = '/home/jovyan/data/covis_data/weather_data_for_plotting_2010_C46036.csv' 
df_wd2010c46036 = pd.read_csv(path, sep=",")
df_wd2010c46036['year']= df_wd2010c46036['year'].astype(int).astype(str)
df_wd2010c46036['month']= df_wd2010c46036['month'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2010c46036['day']= df_wd2010c46036['day'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2010c46036['hour']= df_wd2010c46036['hour'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2010c46036['minute']=df_wd2010c46036['minute'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2010c46036['datetime'] = df_wd2010c46036['year'] + df_wd2010c46036['month'] + df_wd2010c46036['day'] +\
'T' + df_wd2010c46036['hour']+ ':' + df_wd2010c46036['minute']
df_wd2010c46036['datetime'] = pd.to_datetime(df_wd2010c46036['datetime'])
df_wd2010c46036 = df_wd2010c46036.set_index('datetime')
df_wd2010c46036 = df_wd2010c46036[['jday', 'wv_height', 'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp']]
df_wd2010c46036 = df_wd2010c46036.rename_axis(None)
df_wd2010c46036

**2011 C46036**

In [None]:
!rm ~jovyan/data/covis_data/weather_data_for_plotting_2011_C46036.csv
!echo 'year,month,day,hour,minute,seconds,jday,wv_height,wnd_dir,wnd_spd,wnd_gspd,atm_prs,air_temp' | cat - ~jovyan/data/covis_data/weather_data_for_plotting_2011_C46036.txt > ~jovyan/data/covis_data/temp.txt
!sed 's/   /,/g' ~jovyan/data/covis_data/temp.txt > ~jovyan/data/covis_data/temp1.txt
!sed 's/^,//' <~jovyan/data/covis_data/temp1.txt >~jovyan/data/covis_data/weather_data_for_plotting_2011_C46036.csv
!rm ~jovyan/data/covis_data/temp*

In [None]:
path = '/home/jovyan/data/covis_data/weather_data_for_plotting_2011_C46036.csv'
df_wd2011c46036 = pd.read_csv(path, sep=",")
df_wd2011c46036['year']= df_wd2011c46036['year'].astype(int).astype(str)
df_wd2011c46036['month']= df_wd2011c46036['month'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2011c46036['day']= df_wd2011c46036['day'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2011c46036['hour']= df_wd2011c46036['hour'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2011c46036['minute']=df_wd2011c46036['minute'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2011c46036['datetime'] = df_wd2011c46036['year'] + df_wd2011c46036['month'] + df_wd2011c46036['day'] +\
'T' + df_wd2011c46036['hour']+ ':' + df_wd2011c46036['minute']
df_wd2011c46036['datetime'] = pd.to_datetime(df_wd2011c46036['datetime'])
df_wd2011c46036 = df_wd2011c46036.set_index('datetime')
df_wd2011c46036 = df_wd2011c46036[['jday', 'wv_height', 'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp']]
# The index name stays 'datetime' for this dataframe; the 2011 C46036 plots below use x='datetime'.
df_wd2011c46036

**2010 Tillamook**

In [None]:
!head ~jovyan/data/covis_data/weather_data_for_plotting_2010_Tillamook.txt

In [None]:
!rm ~jovyan/data/covis_data/weather_data_for_plotting_2010_Tillamook.csv
!echo 'year,month,day,hour,minute,seconds,jday,wv_height,wnd_dir,wnd_spd,wnd_gspd,atm_prs,air_temp' | cat - ~jovyan/data/covis_data/weather_data_for_plotting_2010_Tillamook.txt > ~jovyan/data/covis_data/temp.txt
!sed 's/   /,/g' ~jovyan/data/covis_data/temp.txt > ~jovyan/data/covis_data/temp1.txt
!sed 's/,,,,/,/g' ~jovyan/data/covis_data/temp1.txt > ~jovyan/data/covis_data/temp2.txt
!sed 's/^,//' <~jovyan/data/covis_data/temp2.txt >~jovyan/data/covis_data/weather_data_for_plotting_2010_Tillamook.csv
!rm ~jovyan/data/covis_data/temp*
!head ~jovyan/data/covis_data/weather_data_for_plotting_2010_Tillamook.csv

In [None]:
path = '/home/jovyan/data/covis_data/weather_data_for_plotting_2010_Tillamook.csv' 
df_wd2010Tillamook = pd.read_csv(path, sep=",")
df_wd2010Tillamook['year']= df_wd2010Tillamook['year'].astype(int).astype(str)
df_wd2010Tillamook['month']= df_wd2010Tillamook['month'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2010Tillamook['day']= df_wd2010Tillamook['day'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2010Tillamook['hour']= df_wd2010Tillamook['hour'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2010Tillamook['minute']=df_wd2010Tillamook['minute'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2010Tillamook['wv_height'] = df_wd2010Tillamook['wv_height'].astype('float64')
df_wd2010Tillamook['wnd_dir'] = df_wd2010Tillamook['wnd_dir'].astype('float64')
df_wd2010Tillamook['wnd_spd'] = df_wd2010Tillamook['wnd_spd'].astype('float64')
df_wd2010Tillamook['wnd_gspd'] = df_wd2010Tillamook['wnd_gspd'].astype('float64')
df_wd2010Tillamook['atm_prs'] = df_wd2010Tillamook['atm_prs'].astype('float64')
df_wd2010Tillamook['air_temp'] = df_wd2010Tillamook['air_temp'].astype('float64')
df_wd2010Tillamook['datetime'] = df_wd2010Tillamook['year'] + df_wd2010Tillamook['month'] + df_wd2010Tillamook['day'] +\
'T' + df_wd2010Tillamook['hour']+ ':' + df_wd2010Tillamook['minute']
df_wd2010Tillamook['datetime'] = pd.to_datetime(df_wd2010Tillamook['datetime'])
df_wd2010Tillamook = df_wd2010Tillamook.set_index('datetime')
df_wd2010Tillamook = df_wd2010Tillamook[['jday', 'wv_height', 'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp']]
df_wd2010Tillamook = df_wd2010Tillamook.rename_axis(None)
df_wd2010Tillamook

**2011 Tillamook**

In [None]:
!head ~jovyan/data/covis_data/weather_data_for_plotting_2011_Tillamook.txt

In [None]:
!rm ~jovyan/data/covis_data/weather_data_for_plotting_2011_Tillamook.csv
!echo 'year,month,day,hour,minute,seconds,jday,wv_height,wnd_dir,wnd_spd,wnd_gspd,atm_prs,air_temp' | cat - ~jovyan/data/covis_data/weather_data_for_plotting_2011_Tillamook.txt > ~jovyan/data/covis_data/temp.txt
!sed 's/   /,/g' ~jovyan/data/covis_data/temp.txt > ~jovyan/data/covis_data/temp1.txt
!sed 's/,,,,/,/g' ~jovyan/data/covis_data/temp1.txt > ~jovyan/data/covis_data/temp2.txt
!sed 's/  /,/g' ~jovyan/data/covis_data/temp2.txt > ~jovyan/data/covis_data/temp3.txt
!sed 's/^,//' <~jovyan/data/covis_data/temp3.txt >~jovyan/data/covis_data/weather_data_for_plotting_2011_Tillamook.csv
!rm ~jovyan/data/covis_data/temp*
!head ~jovyan/data/covis_data/weather_data_for_plotting_2011_Tillamook.csv

In [None]:
path = '/home/jovyan/data/covis_data/weather_data_for_plotting_2011_Tillamook.csv' 
df_wd2011Tillamook = pd.read_csv(path, sep=",", engine='python')
df_wd2011Tillamook= df_wd2011Tillamook.dropna()
df_wd2011Tillamook['year']= df_wd2011Tillamook['year'].astype(int).astype(str)
df_wd2011Tillamook['month']= df_wd2011Tillamook['month'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2011Tillamook['day']= df_wd2011Tillamook['day'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2011Tillamook['hour']= df_wd2011Tillamook['hour'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2011Tillamook['minute']=df_wd2011Tillamook['minute'].astype(int).astype(str).str.pad(width=2, side='left', fillchar='0')
df_wd2011Tillamook['wv_height'] = df_wd2011Tillamook['wv_height'].astype('float64')
df_wd2011Tillamook['wnd_dir'] = df_wd2011Tillamook['wnd_dir'].astype('float64')
df_wd2011Tillamook['wnd_spd'] = df_wd2011Tillamook['wnd_spd'].astype('float64')
df_wd2011Tillamook['wnd_gspd'] = df_wd2011Tillamook['wnd_gspd'].astype('float64')
df_wd2011Tillamook['atm_prs'] = df_wd2011Tillamook['atm_prs'].astype('float64')
df_wd2011Tillamook['air_temp'] = df_wd2011Tillamook['air_temp'].astype('float64')
df_wd2011Tillamook['datetime'] = df_wd2011Tillamook['year'] + df_wd2011Tillamook['month'] + df_wd2011Tillamook['day'] +\
'T' + df_wd2011Tillamook['hour']+ ':' + df_wd2011Tillamook['minute']
df_wd2011Tillamook['datetime'] = pd.to_datetime(df_wd2011Tillamook['datetime'])
df_wd2011Tillamook = df_wd2011Tillamook.set_index('datetime')
df_wd2011Tillamook = df_wd2011Tillamook[['jday', 'wv_height', 'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp']]
df_wd2011Tillamook = df_wd2011Tillamook.rename_axis(None)
df_wd2011Tillamook

## Step Two - Plot The Data
* What patterns do you see?  
* What else can you do with this data?
* Do the different data sets correlate?

**Plume Bending Data**

**Time series of direction and magnitude:**
For current data, the magnitude would be a speed.  For the plumes, what we have is an angle from the vertical.  So for the other plots, we will need to assume a distance over which that angle applies.

**2010 Direction**

In [None]:
df_bd2010.hvplot(x = 'index', y= 'direction') 

**2010 Degrees from Vertical**

In [None]:
df_bd2010.hvplot(x = 'index', y= 'deg_from_vert', color= 'darkorange')

**2010 Direction and Degrees from Vertical**

In [None]:
df_bd2010.hvplot(x = 'index', y= 'direction')  * df_bd2010.hvplot(x = 'index', y= 'deg_from_vert')

**2011 Direction**

In [None]:
df_bd2011.hvplot(x = 'index', y= 'direction') 

**2011 Degrees from Vertical**

In [None]:
df_bd2011.hvplot(x = 'index', y= 'deg_from_vert', color= 'darkorange') 

**2011 Direction and Degrees from Vertical**

In [None]:
df_bd2011.hvplot(x = 'index', y= 'direction') * df_bd2011.hvplot(x = 'index', y= 'deg_from_vert') 

**Time series of East and North components:**  
For current information (direction and speed), this is a straightforward conversion using the standard polar coordinate equations with the direction as the angle and the speed as the range.  Note, however, that all our directions are angles from North (+y axis direction), whereas Python's trigonometric functions follow the mathematical convention (angles in radians, measured counterclockwise from the +x axis), so you will need to apply a correction before using polar coordinates.  For plumes, you need the 3D polar (spherical) coordinates: two angles and a range. The direction is the azimuth, or angle in the horizontal plane, while the bending magnitude is the angle from vertical.  You will need to assume a “unit” height.
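
As a sketch of that conversion: the function below assumes the direction is measured clockwise from north (compass style), the bending is measured from the vertical, and the "unit" range is a length of 1 along the plume axis. The name `bend_to_components` and the input values are illustrative only.

```python
import numpy as np

def bend_to_components(direction_deg, tilt_deg, length=1.0):
    """Convert azimuth (deg clockwise from north) and tilt (deg from vertical)
    to east, north, and up components of a plume vector of the given length."""
    az = np.deg2rad(direction_deg)      # NumPy trig functions work in radians
    tilt = np.deg2rad(tilt_deg)
    horizontal = length * np.sin(tilt)  # projection onto the horizontal plane
    east = horizontal * np.sin(az)      # compass convention: east pairs with sin
    north = horizontal * np.cos(az)     # and north pairs with cos
    up = length * np.cos(tilt)
    return east, north, up

# A plume tilted 30 degrees toward north, then 30 degrees toward east
east, north, up = bend_to_components(np.array([0.0, 90.0]), np.array([30.0, 30.0]))
print(east, north, up)
```

If the unit is instead meant as a vertical rise of 1, the horizontal offset would be `np.tan(tilt)` rather than `np.sin(tilt)`; which reading is intended is a choice to confirm.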

**2010 North**

In [None]:
df_bd2010.hvplot(x = 'index', y= 'north (x)')

**2010 East**

In [None]:
df_bd2010.hvplot(x = 'index', y= 'east (y)', color= 'darkorange')

**2010 North and East**

In [None]:
df_bd2010.hvplot(x = 'index', y= 'north (x)') * df_bd2010.hvplot(x = 'index', y= 'east (y)')

**2011 North**

In [None]:
df_bd2011.hvplot(x = 'index', y= 'north (x)')

**2011 East**

In [None]:
df_bd2011.hvplot(x = 'index', y= 'east (y)', color= 'darkorange')

**2011 North and East**

In [None]:
df_bd2011.hvplot(x = 'index', y= 'north (x)') * df_bd2011.hvplot(x = 'index', y= 'east (y)')

**Centerline Vertical Flow Rate**

In [None]:
df_cvfr2010.hvplot(x = 'index', y= 'flowrate')

**Ras Data**

In [None]:
df_ras2010.hvplot(x = 'index', y= 'temp')

**Weather Data**

**Weather Data 2010 C46036:**

1. Wave Height

In [None]:
df_wd2010c46036.hvplot(x = 'index', y= ['wv_height'])

2. Wind Direction

In [None]:
df_wd2010c46036.hvplot(x = 'index', y= ['wnd_dir'])

3. Wind Speed

In [None]:
df_wd2010c46036.hvplot(x = 'index', y= ['wnd_spd'])

4. Wind Gust Speed

In [None]:
df_wd2010c46036.hvplot(x = 'index', y= ['wnd_gspd'])

5. Atmospheric Pressure

In [None]:
df_wd2010c46036.hvplot(x = 'index', y= ['atm_prs'])

6. Air Temperature

In [None]:
df_wd2010c46036.hvplot(x = 'index', y= ['air_temp'])

7. All Variables

In [None]:
df_wd2010c46036.hvplot(x='index', y=['wv_height',  'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp'], 
                width=350, height=300, subplots=True, shared_axes=False, title= 'wd2010c46036').cols(2)

**Weather Data 2011 C46036:**

1. Wave Height

In [None]:
df_wd2011c46036.hvplot(x = 'datetime', y= ['wv_height'])

2. Wind Direction

In [None]:
df_wd2011c46036.hvplot(x = 'datetime', y= ['wnd_dir'])

3. Wind Speed

In [None]:
df_wd2011c46036.hvplot(x = 'datetime', y= ['wnd_spd'])

4. Wind Gust Speed

In [None]:
df_wd2011c46036.hvplot(x = 'datetime', y= ['wnd_gspd'])

5. Atmospheric Pressure

In [None]:
df_wd2011c46036.hvplot(x = 'datetime', y= ['atm_prs'])

6. Air Temperature

In [None]:
df_wd2011c46036.hvplot(x = 'datetime', y= ['air_temp'])

7. All Variables

In [None]:
df_wd2011c46036.hvplot(x='datetime', y=['wv_height',  'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp'], 
                width=350, height=300, subplots=True, shared_axes=False, title= 'wd2011c46036').cols(2)

**Weather Data 2010 Tillamook:**
1. Wind Direction

In [None]:
df_wd2010Tillamook.hvplot(x = 'index' , y= ['wnd_dir'])

2. Wind Speed

In [None]:
df_wd2010Tillamook.hvplot(x = 'index' , y= ['wnd_spd'])

3. Wind Gust Speed

In [None]:
df_wd2010Tillamook.hvplot(x = 'index', y= ['wnd_gspd'])

4. Air Temperature

In [None]:
df_wd2010Tillamook.hvplot(x = 'index', y= ['air_temp'])

5. All Variables

In [None]:
df_wd2010Tillamook.hvplot(x='index', y=['wv_height',  'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp'], 
                width=350, height=300, subplots=True, shared_axes=False, title= 'wd2010Tillamook').cols(2)

**Weather Data 2011 Tillamook:**

1. Wind Direction

In [None]:
df_wd2011Tillamook.hvplot(x = 'index', y= ['wnd_dir'])

2. Wind Speed

In [None]:
df_wd2011Tillamook.hvplot(x = 'index', y= ['wnd_spd'])

3. Wind Gust Speed

In [None]:
df_wd2011Tillamook.hvplot(x = 'index', y= ['wnd_gspd'])

4. Air Temperature

In [None]:
df_wd2011Tillamook.hvplot(x = 'index', y= ['air_temp'])

5. All Variables

In [None]:
df_wd2011Tillamook.hvplot(x='index', y=['wv_height',  'wnd_dir', 'wnd_spd', 'wnd_gspd', 'atm_prs', 'air_temp'], 
                width=350, height=300, subplots=True, shared_axes=False, title= 'wd2011T').cols(2)

## Step Three - Statistical Tests

The goal here is to use statistical tests to identify or confirm correlations and common patterns.  Correlation tests include the correlation coefficient (Pearson's r, whose square is the R² often quoted for regressions) and cross-correlation. The correlation coefficient tests whether two variables change value in sync.  Cross-correlation compares a pair of time series over a range of time lags, so it can pick up a consistent pattern in which one series follows the other.  We will also look for periodicities.

Issues:
* Both correlation coefficient estimation and cross-correlation computation assume that the values in the different datasets correspond … here it would be in time.  So we will need to do some interpolation to get our time series onto similar sampling times.
* Most estimators of periodicity assume regular or uniform spacing of the data. There are, however, methods for working with data with gaps.
* Periodicity estimates also change with the length of the data set.  So we will need to think about how to best compare the 2010 and 2011 data sets given the short 2010 record.

1. Extract timing information for the various data sets, especially start time, stop time, and time intervals.  We need both the typical (not necessarily average) time step as well as some information on the number and size of data gaps.  We will use this to determine how to proceed with the correlation steps.  In particular, it will be useful to know which data are sampled at similar or faster rates than the plume bending data.


*df_times shows the start time, end time, and typical time interval for each data set. All values were read off the data by eye. As a result, I am unsure about the time intervals for the bending data, because the sampling times in those data sets are highly variable.* ***I am unsure whether there is a way to get them through Python.***

In [None]:
times = {'Bending Data 2010':['2010-09-29 15:30:17','2010-10-25 00:08:03','unsure, highly variable'],
        'Bending Data 2011':['2011-09-27 00:00:38','2011-12-30 21:00:40','3 hours but highly variable'],
        'Centerline Vertical Flow Rate':['2010-09-30 20:31:00','2010-10-25 21:25:00','6 hours'],
        'Ras Data':['2010-10-04 20:45:00','2010-10-27 08:15:00','45-60 minutes'],
        'Weather Data 2010 C':['2010-09-30 23:43:00','2010-10-31 23:43:00','hourly'],
        'Weather Data 2011 C':['2011-09-25 23:43:00','2011-12-31 23:43:00','hourly'],
        'Weather Data 2010 T':['2010-09-29 00:07:00','2010-11-01 23:51:00','20 minutes'],
        'Weather Data 2011 T':['2010-10-01 00:10:00','2013-12-01 23:55:00','20 minutes']}
df_times = pd.DataFrame(times,index = ['start time','end time','time interval'])
df_times
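
The start time, end time, and typical step can probably be computed from each dataframe's datetime index rather than read off by eye. A minimal sketch, using the median step as the "typical" one; the helper name `timing_summary` and the demo timestamps are hypothetical.

```python
import pandas as pd

def timing_summary(index):
    """Start, end, and typical (median) step of a datetime index."""
    index = pd.DatetimeIndex(index).sort_values()
    steps = index.to_series().diff().dropna()   # time between successive samples
    return pd.Series({
        "start time": index[0],
        "end time": index[-1],
        "median step": steps.median(),
        "shortest step": steps.min(),
        "longest step": steps.max(),
    })

# Hypothetical irregular sampling, standing in for something like df_bd2010.index
demo_index = pd.to_datetime([
    "2010-10-01 00:00", "2010-10-01 03:00", "2010-10-01 06:05",
    "2010-10-01 09:00", "2010-10-02 00:00",
])
summary = timing_summary(demo_index)
print(summary)
```

Calling the helper on each dataframe's index and collecting the results in one `pd.DataFrame` would give a computed counterpart to df_times.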

*df_nanvalues shows the number of NaN values for each column in the weather data. The other data sets were excluded due to there being no NaN values (you can also exclude the C46036 weather data for the same reason).* ***I am currently unsure how to find the size of each data gap.***

In [None]:
nanvalues = {'Weather Data 2010 C':[df_wd2010c46036['wv_height'].isna().sum(),df_wd2010c46036['wnd_dir'].isna().sum(),df_wd2010c46036['wnd_spd'].isna().sum(),df_wd2010c46036['wnd_gspd'].isna().sum(),df_wd2010c46036['atm_prs'].isna().sum(),df_wd2010c46036['air_temp'].isna().sum()],
        'Weather Data 2011 C':[df_wd2011c46036['wv_height'].isna().sum(),df_wd2011c46036['wnd_dir'].isna().sum(),df_wd2011c46036['wnd_spd'].isna().sum(),df_wd2011c46036['wnd_gspd'].isna().sum(),df_wd2011c46036['atm_prs'].isna().sum(),df_wd2011c46036['air_temp'].isna().sum()],
        'Weather Data 2010 T':[df_wd2010Tillamook['wv_height'].isna().sum(),df_wd2010Tillamook['wnd_dir'].isna().sum(),df_wd2010Tillamook['wnd_spd'].isna().sum(),df_wd2010Tillamook['wnd_gspd'].isna().sum(),df_wd2010Tillamook['atm_prs'].isna().sum(),df_wd2010Tillamook['air_temp'].isna().sum()],
        'Weather Data 2011 T':[df_wd2011Tillamook['wv_height'].isna().sum(),df_wd2011Tillamook['wnd_dir'].isna().sum(),df_wd2011Tillamook['wnd_spd'].isna().sum(),df_wd2011Tillamook['wnd_gspd'].isna().sum(),df_wd2011Tillamook['atm_prs'].isna().sum(),df_wd2011Tillamook['air_temp'].isna().sum()]}
df_nanvalues = pd.DataFrame(nanvalues,index = ['nan values wv_height','nan values wnd_dir','nan values wnd_spd','nan values wnd_gspd','nan values atm_prs','nan values air_temp'])
df_nanvalues
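
One possible way to find the size of each data gap is to compare successive timestamps with the typical step and list every interval that is clearly longer. The helper `find_gaps`, the tolerance factor of 1.5, and the demo timestamps are assumptions for illustration.

```python
import pandas as pd

def find_gaps(index, typical_step, tolerance=1.5):
    """List intervals between samples longer than tolerance * typical_step."""
    times = pd.DatetimeIndex(index).sort_values().to_series()
    steps = times.diff()                        # time since the previous sample
    is_gap = steps > tolerance * pd.Timedelta(typical_step)
    return pd.DataFrame({
        "gap start": times.shift(1)[is_gap],    # last sample before the gap
        "gap end": times[is_gap],               # first sample after the gap
        "gap length": steps[is_gap],
    }).reset_index(drop=True)

# Hypothetical hourly record with the 02:00 and 03:00 samples missing
demo_index = pd.to_datetime([
    "2010-10-01 00:00", "2010-10-01 01:00",
    "2010-10-01 04:00", "2010-10-01 05:00",
])
gaps = find_gaps(demo_index, typical_step="1h")
print(gaps)
```

For a column that keeps its timestamps but holds NaN values, the same helper can be applied to the index of the non-missing samples, for example `find_gaps(df['wnd_spd'].dropna().index, '1h')`.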

2. Estimate some basic statistics for key data sets: means, standard deviations, obvious trends.  This will be useful in defining what extreme bending means if nothing else.

*df_bdstats and df_wdstats show the means and standard deviations for all columns of data for the bending data and weather data respectively. The Centerline Vertical Flow Rate and Ras Data were excluded for the time being*

Bending Data

In [None]:
bdstats = {'Bending Data 2010':[df_bd2010['direction'].mean(),df_bd2010['deg_from_vert'].mean(),df_bd2010['direction'].std(),df_bd2010['deg_from_vert'].std()],
        'Bending Data 2011':[df_bd2011['direction'].mean(),df_bd2011['deg_from_vert'].mean(),df_bd2011['direction'].std(),df_bd2011['deg_from_vert'].std()]}
df_bdstats = pd.DataFrame(bdstats,index = ['mean direction','mean degrees from vertical','direction standard deviation','degrees from vertical standard deviation'])
df_bdstats

Weather Data

In [None]:
wdstats = {'Weather Data 2010 C':[df_wd2010c46036['wv_height'].mean(),df_wd2010c46036['wnd_dir'].mean(),df_wd2010c46036['wnd_spd'].mean(),df_wd2010c46036['wnd_gspd'].mean(),df_wd2010c46036['atm_prs'].mean(),df_wd2010c46036['air_temp'].mean(),
                                  df_wd2010c46036['wv_height'].std(),df_wd2010c46036['wnd_dir'].std(),df_wd2010c46036['wnd_spd'].std(),df_wd2010c46036['wnd_gspd'].std(),df_wd2010c46036['atm_prs'].std(),df_wd2010c46036['air_temp'].std()],
           'Weather Data 2011 C':[df_wd2011c46036['wv_height'].mean(),df_wd2011c46036['wnd_dir'].mean(),df_wd2011c46036['wnd_spd'].mean(),df_wd2011c46036['wnd_gspd'].mean(),df_wd2011c46036['atm_prs'].mean(),df_wd2011c46036['air_temp'].mean(),
                                  df_wd2011c46036['wv_height'].std(),df_wd2011c46036['wnd_dir'].std(),df_wd2011c46036['wnd_spd'].std(),df_wd2011c46036['wnd_gspd'].std(),df_wd2011c46036['atm_prs'].std(),df_wd2011c46036['air_temp'].std()],
           'Weather Data 2010 T':[np.nan,df_wd2010Tillamook['wnd_dir'].mean(),df_wd2010Tillamook['wnd_spd'].mean(),df_wd2010Tillamook['wnd_gspd'].mean(),np.nan,df_wd2010Tillamook['air_temp'].mean(),
                                  np.nan,df_wd2010Tillamook['wnd_dir'].std(),df_wd2010Tillamook['wnd_spd'].std(),df_wd2010Tillamook['wnd_gspd'].std(),np.nan,df_wd2010Tillamook['air_temp'].std()],
           'Weather Data 2011 T':[df_wd2011Tillamook['wv_height'].mean(),df_wd2011Tillamook['wnd_dir'].mean(),df_wd2011Tillamook['wnd_spd'].mean(),df_wd2011Tillamook['wnd_gspd'].mean(),df_wd2011Tillamook['atm_prs'].mean(),df_wd2011Tillamook['air_temp'].mean(),
                                  df_wd2011Tillamook['wv_height'].std(),df_wd2011Tillamook['wnd_dir'].std(),df_wd2011Tillamook['wnd_spd'].std(),df_wd2011Tillamook['wnd_gspd'].std(),df_wd2011Tillamook['atm_prs'].std(),df_wd2011Tillamook['air_temp'].std()]}
df_wdstats = pd.DataFrame(wdstats,index = ['mean wave height','mean wind direction','mean wind speed','mean wind gust speed','mean atmospheric pressure','mean air temperature', 'wave height standard deviation','wind direction standard deviation','wind speed standard deviation','wind gust speed standard deviation','atmospheric pressure standard deviation','air temperature standard deviation'])
df_wdstats
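
One caveat on the direction columns: bearings wrap around at 360 degrees, so an arithmetic mean can be misleading (the average of 350 and 20 comes out as 185, pointing the opposite way). A circular mean averages the unit vectors instead. The sketch below is an illustration with made-up values; `circular_mean_deg` is a hypothetical helper, not something already in this notebook.

```python
import numpy as np

def circular_mean_deg(angles_deg):
    """Mean direction in degrees within [0, 360), ignoring NaN values."""
    rad = np.deg2rad(np.asarray(angles_deg, dtype=float))
    rad = rad[~np.isnan(rad)]
    # Average the sine and cosine components, then convert back to an angle
    mean = np.arctan2(np.sin(rad).mean(), np.cos(rad).mean())
    return np.rad2deg(mean) % 360.0

directions = np.array([350.0, 20.0])
arithmetic = directions.mean()
circular = circular_mean_deg(directions)
print(arithmetic, circular)
```

The same idea would apply to the plume bending direction and to the wind direction columns.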

3. Identify which data sets are key. Which data sets are central to the goals or questions?  Which data sets seem to have common features, patterns, or trends?  This may require plotting data together in ways or combinations that have not yet seemed obvious.

4. Interpolate data onto a common time sampling scheme.  All the COVIS data should be at similar but not necessarily identical times: the vertical velocity data will be offset about 20-40 minutes later than the bending data and may have more gaps.  I don't remember about the weather data and each buoy may be different.  I forget what else there is that I gave you.  I think there is some relevant data that we have not yet pulled from its repository.

*As a test, result is a dataframe that combines df_bd2010 and df_wd2010c46036 on a single shared datetime index, and a graph was made to plot both data sets against that index.* ***The main issue at the moment is that the two separate dataframes have no datetimes in common. Wherever df_bd2010 has data, df_wd2010c46036 shows NaN values, and vice versa, so I am unsure how to use the combined dataframe to find a correlation coefficient (a quick test done previously gave a correlation coefficient of NaN).***

In [None]:
result = df_bd2010.join(df_wd2010c46036, how='outer')
result

In [None]:
fig = plt.figure(figsize=(20, 10))
ax1 = fig.add_subplot(111)
ax1.plot(result.index, result['direction'])
ax1.set_ylabel('direction')

ax2 = ax1.twinx()
ax2.plot(result.index, result['wv_height'], 'r-')
ax2.set_ylabel('wv_height', color='r')
for tl in ax2.get_yticklabels():
    tl.set_color('r')
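
One way around the non-matching timestamps is to interpolate one series onto the sample times of the other, so that every row has a value from both. A minimal sketch on made-up values, assuming linear interpolation in time is acceptable and that timestamps are unique; `interpolate_to` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def interpolate_to(series, target_index):
    """Linearly interpolate a time-indexed series onto target_index.
    Targets outside the series' time span are left as NaN."""
    series = series.dropna().sort_index()
    combined = series.index.union(target_index)
    filled = series.reindex(combined).interpolate(method="time", limit_area="inside")
    return filled.reindex(target_index)

# Hypothetical hourly wave heights and irregular bending sample times
wave = pd.Series([1.0, 2.0, 3.0], index=pd.to_datetime(
    ["2010-10-01 00:00", "2010-10-01 01:00", "2010-10-01 02:00"]))
bend_times = pd.to_datetime(
    ["2010-10-01 00:30", "2010-10-01 01:15", "2010-10-01 05:00"])

wave_at_bend = interpolate_to(wave, bend_times)
print(wave_at_bend)
```

The interpolated weather column can then be placed next to the bending columns and rows with NaN dropped before any correlation is computed. Straight interpolation is not appropriate for direction columns, since they wrap at 360 degrees; interpolating the east and north components separately is one alternative.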

5. So you might think a little about what information you'd like to have but don't.

6. Once interpolated to a common sampling, you can estimate the correlation coefficient.  This measures linear association, so it might or might not pick up on all relationships that exist.  Also, some of the directions in the bending data are not correct, so I need to get you an updated data file soon. But you can figure out how to do this with the existing data while I figure out the problems with the old data.
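
A sketch of the correlation step on synthetic data, assuming both variables already share the same timestamps. The numbers are generated, not measured, and the column names simply mirror those used in this notebook. Pairs where either value is NaN must be dropped first; a table with no rows where both columns are present is what yields a correlation of NaN.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
times = pd.date_range("2010-10-01", periods=200, freq=pd.Timedelta(hours=3))

# Synthetic stand-ins: bending depends linearly on wind speed, plus noise
wind = pd.Series(rng.normal(8.0, 2.0, len(times)), index=times)
bend = 2.0 * wind + rng.normal(0.0, 1.0, len(times))

paired = pd.DataFrame({"wnd_spd": wind, "deg_from_vert": bend}).dropna()
r = paired["wnd_spd"].corr(paired["deg_from_vert"])   # Pearson r by default
print(f"Pearson r = {r:.3f}, r squared = {r**2:.3f}, n = {len(paired)}")
```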

7. Cross-correlation is the next step after that.
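
A simple form of cross-correlation is to compute the correlation coefficient repeatedly while shifting one series against the other. The sketch below assumes both series sit on the same regular time grid; `lagged_correlation` is a hypothetical helper and the data are synthetic, with the response built to trail the driver by 5 samples.

```python
import numpy as np
import pandas as pd

def lagged_correlation(driver, response, max_lag):
    """Pearson r between driver and response for integer sample lags.
    A positive lag means the response follows the driver by that many samples."""
    lags = list(range(-max_lag, max_lag + 1))
    values = [driver.corr(response.shift(-lag)) for lag in lags]
    return pd.Series(values, index=lags, name="r")

rng = np.random.default_rng(1)
times = pd.date_range("2010-10-01", periods=300, freq=pd.Timedelta(hours=1))
driver = pd.Series(rng.normal(size=300), index=times)
response = driver.shift(5) + 0.1 * rng.normal(size=300)   # trails by 5 samples

ccf = lagged_correlation(driver, response, max_lag=12)
best_lag = ccf.idxmax()
print(best_lag, ccf.max())
```

With hourly sampling, a lag measured in samples converts directly to hours, which may help when asking how long after a storm the plume responds.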

8. Then we need to figure out about periodicities.  The Lomb-Scargle method is well suited to this data because it handles uneven sampling and gaps.  Python implementations exist in SciPy (scipy.signal.lombscargle) and Astropy (astropy.timeseries.LombScargle).
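
A minimal sketch of a Lomb-Scargle periodogram using `scipy.signal.lombscargle` on a synthetic, unevenly sampled signal. The 12.42-hour period (roughly the M2 tide), the noise level, and the range of trial periods are assumptions chosen for illustration. Note that this function expects angular frequencies.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(2)

# Synthetic record: 400 samples at random times over 30 days (time in hours)
t = np.sort(rng.uniform(0.0, 30 * 24.0, 400))
true_period = 12.42
y = 10.0 * np.sin(2 * np.pi * t / true_period) + rng.normal(0.0, 2.0, t.size)

periods = np.linspace(4.0, 48.0, 2000)   # trial periods in hours
angular_freqs = 2 * np.pi / periods      # convert periods to angular frequencies
power = lombscargle(t, y - y.mean(), angular_freqs)

best_period = periods[np.argmax(power)]
print(f"strongest period: {best_period:.2f} hours")
```

For the real data, `t` could be the elapsed hours since the first sample, for example `(df.index - df.index[0]).total_seconds() / 3600`.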

9. Another thing to think about is the definition of a weather event.  What is a storm? How will it appear in the data? What does that mean about what we expect to find in the bending data?
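
One possible working definition, offered only as a starting point: a storm is a run of consecutive samples in which some weather variable stays above a threshold. The sketch below applies that to made-up wind speeds; the threshold of 12, the minimum run length of 3 samples, and the helper name `find_events` are all assumptions to be replaced by whatever definition is settled on.

```python
import pandas as pd

def find_events(series, threshold, min_samples=3):
    """Group consecutive samples above threshold into events of at least min_samples."""
    above = series > threshold
    # A new run starts wherever the above/below state changes
    run_id = (above != above.shift()).cumsum()
    events = []
    for _, run in series[above].groupby(run_id[above]):
        if len(run) >= min_samples:
            events.append({"start": run.index[0], "end": run.index[-1],
                           "peak": run.max(), "samples": len(run)})
    return pd.DataFrame(events, columns=["start", "end", "peak", "samples"])

# Hypothetical hourly wind speeds: one sustained event and one brief spike
times = pd.date_range("2010-10-01", periods=10, freq=pd.Timedelta(hours=1))
wind = pd.Series([5, 6, 15, 16, 18, 7, 6, 14, 5, 5], index=times, dtype=float)

storms = find_events(wind, threshold=12.0, min_samples=3)
print(storms)
```

The event start and end times could then be used to select the matching windows of bending data and compare them with calm periods. A definition based on falling atmospheric pressure or on wave height would follow the same pattern.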