<a href="https://colab.research.google.com/github/alexanderknysh/adcpml/blob/main/adcpml.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A machine learning algorithm for near-shore ADCP data processing

This machine learning study aims to select a small set of typical and anomalous load cases (fewer than 20-30 in total) from detailed ADCP datasets collected at the Wood Island research site, Maine, USA. Two ADCPs measured water depth, significant wave height and period, and the north and east components of the current velocity profiles over a period of about half a month. More details on the research can be found in [Section 3.2 of this paper](https://github.com/alexanderknysh/adcpml/blob/main/paper.pdf).

## Data formatting
Before we start processing the ADCP datasets, let's first list the libraries needed for the analysis

In [None]:
# required libraries
import pandas as pd
import numpy  as np
import math   as m
import matplotlib.pyplot   as     plt
from collections           import Counter
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster       import KMeans
from sklearn               import metrics

and define a function that converts each velocity profile into the total fluid energy in the upper ocean layers:

In [None]:
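# function that formats velocity profile datasets:
# for each sample, drop NaN cells, keep the last `top` valid cells
# and return the sum of their squared velocities
def format_profile(data, top):
  output = np.empty(0)
  for i in range(data.shape[0]):
    row = data[i]
    row = row[np.logical_not(np.isnan(row))]      # drop cells without a valid reading
    row = row[-top:]                              # keep the last `top` valid cells
    output = np.append(output, np.dot(row, row))  # sum of squared velocities
  return output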
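To see what this formatting does to individual profiles, the following minimal sketch applies the same steps to a small made-up array. The values are illustrative, not measurements, and the helper name `profile_energy` is hypothetical. It assumes, as the function above does, that the trailing valid cells of each row are the ones of interest.

```python
import numpy as np

def profile_energy(data, top):
    """Sum of squared velocities over the last `top` valid cells of each row."""
    energies = []
    for row in data:
        valid = row[~np.isnan(row)]  # drop cells without a valid reading
        upper = valid[-top:]         # keep at most `top` trailing cells
        energies.append(np.dot(upper, upper))
    return np.array(energies)

# made-up profiles (m/s); NaN marks cells without a valid reading
demo = np.array([
    [0.1, 0.2, 0.3, np.nan],
    [0.5, np.nan, np.nan, np.nan],
])
energy = profile_energy(demo, top=2)
print(energy)
```

Each profile collapses to a single scalar, so a row with fewer than `top` valid cells still yields a value, just summed over fewer cells.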

Next, load the datasets from the [following GitHub repository](https://github.com/alexanderknysh/adcpml):

In [None]:
# load datasets common to both adcps: wave properties and water depth
adcpdata = pd.read_excel('https://github.com/alexanderknysh/adcpml/blob/main/data_adcp_cm.xlsx?raw=true')

# load velocity datasets: west and east adcp profiles
west_vx  = pd.read_excel('https://github.com/alexanderknysh/adcpml/blob/main/west_adcp_vx.xlsx?raw=true')
west_vy  = pd.read_excel('https://github.com/alexanderknysh/adcpml/blob/main/west_adcp_vy.xlsx?raw=true')
east_vx  = pd.read_excel('https://github.com/alexanderknysh/adcpml/blob/main/east_adcp_vx.xlsx?raw=true')
east_vy  = pd.read_excel('https://github.com/alexanderknysh/adcpml/blob/main/east_adcp_vy.xlsx?raw=true')

# other important data
samples = range(0, adcpdata.shape[0]) # range of field samples
alpha   = 13*m.pi/180                 # angle of the tidal ellipse major axis (13 degrees), in radians
cells   = 9                           # number of velocity measurement cells

In ocean engineering, tidally driven current velocities are usually expressed as projections onto the major and minor axes of a tidal ellipse. In this study, we are mostly interested in the major-axis projections, since they have the highest absolute current velocities. Both the west and east major-axis velocity profiles are converted to energy values that represent the total kinetic energy in the upper water layers (four meters deep). This also reduces the number of features we have to deal with later.

In [None]:
# project the velocity profiles on the major axis of the tidal ellipse
# save as numpy array to ease further formatting
# convert profiles to relative energy
west_major = west_vx * m.cos(alpha) - west_vy * m.sin(alpha)
east_major = east_vx * m.cos(alpha) - east_vy * m.sin(alpha)
west_major = west_major.to_numpy()
east_major = east_major.to_numpy()
adcpdata['WestEnergy'] = format_profile(west_major, cells)
adcpdata['EastEnergy'] = format_profile(east_major, cells)

# display the resulting dataset
display(adcpdata)

| Unnamed: 0 | Date | WaterDepth | WaveHeight | WavePeriod | WestEnergy | EastEnergy |
|---|---|---|---|---|---|---|
| 0 | 2019-05-16 14:15:00.000 | 9.475 | 0.235 | 11.075 | 0.174540 | 0.136834 |
| 1 | 2019-05-16 14:30:00.000 | 9.395 | 0.250 | 12.150 | 0.271423 | 0.177364 |
| 2 | 2019-05-16 14:44:59.990 | 9.289 | 0.240 | 11.370 | 0.249177 | 0.214879 |
| 3 | 2019-05-16 14:59:59.985 | 9.160 | 0.235 | 11.555 | 0.355531 | 0.250243 |
| 4 | 2019-05-16 15:14:59.980 | 9.007 | 0.210 | 11.640 | 0.471080 | 0.328769 |
| ... | ... | ... | ... | ... | ... | ... |
| 1157 | 2019-05-28 15:29:54.215 | 7.275 | 0.275 | 8.780 | 0.025099 | 0.019459 |
| 1158 | 2019-05-28 15:44:54.210 | 7.176 | 0.260 | 8.640 | 0.063899 | 0.029413 |
| 1159 | 2019-05-28 15:59:54.205 | 7.083 | 0.260 | 8.660 | 0.072142 | 0.073600 |
| 1160 | 2019-05-28 16:14:54.200 | 7.040 | 0.280 | 8.360 | 0.097749 | 0.094061 |

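The projection step can be illustrated on a few made-up velocity values. The sketch below reuses the major-axis formula from the projection cell above. The minor-axis expression and the helper name `project_on_ellipse_axes` are assumptions added here for illustration and are not part of the original analysis.

```python
import math
import numpy as np

alpha = 13 * math.pi / 180  # major-axis angle used in this study, in radians

def project_on_ellipse_axes(vx, vy, angle):
    """Rotate (vx, vy) by `angle`; the minor-axis formula is an assumption."""
    major = vx * math.cos(angle) - vy * math.sin(angle)  # same formula as the notebook
    minor = vx * math.sin(angle) + vy * math.cos(angle)  # assumed orthogonal component
    return major, minor

# made-up velocity components for three samples (m/s)
vx = np.array([0.30, -0.10, 0.05])
vy = np.array([0.05, 0.20, -0.40])
major, minor = project_on_ellipse_axes(vx, vy, alpha)

# a pure rotation preserves the squared speed of each sample
print(np.allclose(major**2 + minor**2, vx**2 + vy**2))
```

Because the two projections together form a rotation, keeping only the major component discards whatever energy lies along the minor axis; this is acceptable here only to the extent that the major projections dominate, as stated above.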

Let's now visualize the final dataset using Matplotlib: