# TUD-COASTAL instrument processing
This notebook guides you through the post-processing of ADV instrument data. If you have any questions, please contact [m.a.vanderlugt@tudelft.nl].

# 1. Read/store raw data

In [1]:
import os
import sys
from datetime import datetime
import numpy as np
import xarray as xr
sys.path.append(r'c:\checkouts\python\TUD-COASTAL\instrumentProcessing')
import puv
from vector import Vector
from KNMI_readers import read_knmi_uurgeg

### Data management
Before processing, we define the location of the raw measurement files, the start and stop time of the window we want to process, and the folder to write the netCDF output to. We also choose a name for the instrument, which is reused in the metadata later on. The code below is an example; adjust it to your own setup.

In [2]:
# location of raw data
dataFolder = r'c:\checkouts\python\TUD-COASTAL\instrumentProcessing\example_data\ADV\raw_phzd'
# name of the instantiated vector class
name = 'vec1'
# start time over which to read data (must be larger than first recorded time)
tstart = '2020-11-30 17:00:00'
# stop time over which to read data (must be smaller than last recorded time)
tstop = '2020-12-01 00:00:00'
# location of netcdfdata 
ncOutDir = r'c:\checkouts\python\TUD-COASTAL\instrumentProcessing\example_data\ADV\raw_netcdf'
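
The constraints on `tstart` and `tstop` noted in the comments above can be checked before any data is read. The following is a minimal sketch using only the standard library; `check_window` is a hypothetical helper and the record limits passed to it are made-up values, not taken from the example data.

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S'

def check_window(tstart, tstop, first_record, last_record):
    """Return True if tstart < tstop and both lie strictly inside the record."""
    t0, t1 = datetime.strptime(tstart, FMT), datetime.strptime(tstop, FMT)
    r0, r1 = datetime.strptime(first_record, FMT), datetime.strptime(last_record, FMT)
    return r0 < t0 < t1 < r1

# hypothetical first and last recorded times, for illustration only
ok = check_window('2020-11-30 17:00:00', '2020-12-01 00:00:00',
                  '2020-11-30 12:00:00', '2020-12-02 12:00:00')
```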

### Read the data
In this part we create a vec object, based on the Vector class. The class holds all sorts of functions and variables which we will use to process the raw data and transform it into the netcdf data used for further processing. 

In [4]:
# raw data to netcdf; create a vector object 'vec'
vec = Vector(name, dataFolder, tstart=tstart, tstop=tstop)

# reads the raw data from tstart to tstop and casts all data in a pandas DataFrame that is stored under vec.dfpuv.
# in case there is no data between tstart and tstop the DataFrame is not instantiated
vec.read_raw_data()

.dat file was read
.sen file was read


t,u,v,w,p,anl1,anl2,a1,a2,a3,snr1,snr2,snr3,cor1,cor2,cor3
2020-11-30 17:00:00.000000,-0.167,0.211,-0.020,11520.0,1615.0,0.0,156.0,157.0,154.0,45.1,45.6,44.7,99.0,99.0,99.0
2020-11-30 17:00:00.062500,-0.165,0.208,-0.026,11530.0,1605.0,0.0,159.0,154.0,160.0,46.4,44.3,47.3,99.0,99.0,98.0
2020-11-30 17:00:00.125000,-0.149,0.222,-0.003,11560.0,1597.0,0.0,153.0,154.0,157.0,43.9,44.3,46.0,99.0,99.0,99.0
2020-11-30 17:00:00.187500,-0.148,0.231,0.002,11550.0,1619.0,0.0,156.0,157.0,156.0,45.1,45.6,45.6,99.0,99.0,99.0
2020-11-30 17:00:00.250000,-0.137,0.233,0.003,11550.0,1607.0,0.0,155.0,155.0,161.0,44.7,44.7,47.7,99.0,99.0,98.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-11-30 23:59:59.687500,0.132,-0.242,0.015,9740.0,6800.0,0.0,159.0,163.0,165.0,46.4,48.2,49.4,99.0,99.0,99.0
2020-11-30 23:59:59.750000,0.118,-0.235,0.013,9730.0,6768.0,0.0,163.0,157.0,167.0,48.2,45.6,50.3,99.0,98.0,99.0
2020-11-30 23:59:59.812500,0.134,-0.222,-0.005,9710.0,6775.0,0.0,158.0,164.0,165.0,46.0,48.6,49.4,99.0,99.0,99.0
2020-11-30 23:59:59.875000,0.163,-0.208,-0.016,9740.0,6821.0,0.0,159.0,163.0,163.0,46.4,48.2,48.6,99.0,99.0,99.0


In [5]:
# break up the data into burst blocks
vec.cast_to_blocks_in_xarray(blockWidth=600)

# compute burst averages (make sure to read vector.py what is happening exactly!)
vec.compute_block_averages()

# all data is collected in an xarray Dataset ds. We extract this from the class instantiation and
# we can easily write it to netCDF
ds = vec.ds
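
To make the idea of burst blocks concrete, here is a minimal NumPy sketch of splitting a continuous record into blocks of 600 s and averaging each block. It is not the implementation in `vector.py`; the sampling frequency of 16 Hz is an assumption based on the 0.0625 s sample spacing in the table above, and the velocity record is synthetic.

```python
import numpy as np

sf = 16            # sampling frequency in Hz (assumed)
block_width = 600  # burst length in seconds
n_per_block = sf * block_width  # samples per burst

# synthetic one-hour record standing in for one velocity component
rng = np.random.default_rng(0)
u = rng.normal(0.0, 0.2, size=3600 * sf)

# drop the incomplete tail and reshape to (burst, sample-within-burst)
n_blocks = u.size // n_per_block
u_blocks = u[:n_blocks * n_per_block].reshape(n_blocks, n_per_block)

# burst averages: one value per block
u_mean = u_blocks.mean(axis=1)
```

The two axes of `u_blocks` play the role of the `t` and `N` dimensions used in the rest of this notebook.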

### Create metadata
This part creates the metadata. Adding information to the measurement helps you keep track of which measurement you are working on later in the post-processing. Filling it in is not strictly necessary, but it will reduce the paperwork later on. Choose your administrative battles wisely. The values below are an example; please replace them with your own data.

In [6]:
# add global attribute metadata
ds.attrs = {'Conventions': 'CF-1.6',
            'title': '{}'.format(vec.name),
            'instrument': '{}'.format('vec1'),
            'instrument serial number': '{}'.format(16725),
            'epsg': 28992,
            'x': 117196.6,
            'y': 559818.2,
            'time zone': 'UTC+2',
            'coordinate type': 'XYZ',
            'summary': 'December pilot field campaign',
            'contact person': 'Marlies van der Lugt',
            'emailadres': 'm.a.vanderlugt@tudelft.nl',
            'construction datetime': datetime.now().strftime("%d-%b-%Y (%H:%M:%S)"),
            'version': 'v1',
            'version comments': 'constructed with xarray'}

### Store the data
To reduce the file size we specify compression settings for every variable with a dictionary comprehension. We then save the dataset in netCDF format in the output folder defined earlier. The code checks whether this folder exists and creates it if it does not.

In [7]:
# specify compression for all the variables to reduce file size
comp = dict(zlib=True, complevel=5)
ds.encoding = {var: comp for var in ds.data_vars}

# save to netCDF; the encoding is passed explicitly so that the compression is applied
if not os.path.exists(ncOutDir):
    os.makedirs(ncOutDir)
ds.to_netcdf(os.path.join(ncOutDir, '{}_pilot_short.nc'.format(vec.name)), encoding=ds.encoding)

# 2. Quality control ADV
The raw data has been transformed into the netcdf format. Now, the quality of the data is checked. We define some quality control parameters, and detect outliers based on these parameters. The data will also be corrected for weather conditions and its position. First we start with defining the general settings and parameters for quality control. Please check all the parameters to fit your measurement data.

In [8]:
# if skipping the code above, one can immediately load the raw data from netcdf
ds = xr.open_dataset(r'c:\checkouts\python\TUD-COASTAL\instrumentProcessing\example_data\ADV\raw_netcdf\vec1_pilot_short.nc')

In [9]:
# height of instrument above bed
hi = 0.57 #m
# height of instruments pressure sensor above bed
hip = 0.27 #m
# bed level
zb = -1.13 #m NAP
# angle of x-pod of the vector head with respect to north (clockwise positive)
thet = 45
# density of water
rho = 1025 # kg/m3
# gravitational acceleration
g = 9.81 # m/s2
# parameters for the quality control:
QC = {
     'uLim':2.1, #maximum acceptable recorded u-velocity
     'vLim':2.1, #maximum acceptable recorded v-velocity
     'wLim':0.6, #maximum acceptable recorded w-velocity
     'corTreshold':70, #minimum correlation
     'maxFracNans': 0.02, #maximum fraction of rejected pings in the sample to proceed with processing based on interpolation
     'maxGap' : 4 #maximum amount of sequential rejected pings in the sample to proceed with processing based on interpolation
      }

### Add data parameters
At this point we will add some datacolumns to the existing dataset, based on the general settings defined above. 

In [10]:
# % add some data to the dataset
ds['zb'] = zb
ds['zb'].attrs = {'units': 'm+NAP', 'long_name': 'bed level'}

ds['zi'] = ds['zb'] + hi
ds['zi'].attrs = {'units': 'm+NAP', 'long_name': 'position probe'}

ds['zip'] = ds['zb'] + hip
ds['zip'].attrs = {'units': 'm+NAP', 'long_name': 'position pressure sensor'}

ds['rho'] = rho
ds['rho'].attrs = {'units': 'kg/m3', 'long_name': 'water density'}

ds['g'] = g
ds['g'].attrs = {'units': 'm/s2', 'long_name': 'gravitational acceleration'}

### Quality control threshold and outlier detection
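We now check whether the correlations and velocities fall within the limits defined above, and flag sudden jumps in velocity and pressure as outliers. The checks produce boolean masks in which True means that the sample is accepted. The rejected samples are removed from the dataset in a later step.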

In [11]:
# True where the beam correlation exceeds the threshold
mc1 = ds.cor1 > QC['corTreshold']
mc2 = ds.cor2 > QC['corTreshold']
mc3 = ds.cor3 > QC['corTreshold']

# True where the observation is within the velocity limits
mu1 = np.abs(ds.u) < QC['uLim']
mu2 = np.abs(ds.v) < QC['vLim']
mu3 = np.abs(ds.w) < QC['wLim']

# if du is larger than 3*std(u) we consider it an outlier and hence remove it
# (the first sample of each burst has no difference and takes its value from the velocity mask):
md1 = np.abs(ds.u.diff('N')) < 3 * ds.u.std(dim='N')
md1 = md1.combine_first(mu1)
md2 = np.abs(ds.v.diff('N')) < 3 * ds.v.std(dim='N')
md2 = md2.combine_first(mu2)
md3 = np.abs(ds.w.diff('N')) < 3 * ds.w.std(dim='N')
md3 = md3.combine_first(mu3)

ds['mc'] = np.logical_and(np.logical_and(mc1, mc2), mc3)
ds['mu'] = np.logical_and(np.logical_and(mu1, mu2), mu3)
ds['md'] = np.logical_and(np.logical_and(md1, md2), md3)
ds['mc'].attrs = {'units': '-', 'long_name': 'mask correlation'}
ds['mu'].attrs = {'units': '-', 'long_name': 'mask vel limit'}
ds['md'].attrs = {'units': '-', 'long_name': 'mask deviation'}

# if dp larger than 4*std(p) then we consider it outlier and hence remove:
mp = np.abs(ds.p.diff('N')) < 4 * ds.p.std(dim='N')
mp = xr.concat([mp.isel(N=0), mp], dim="N")

# add the mask variables as coordinates to the dataset
ds.coords['maskp'] = (('t', 'N'), mp.values)
ds.coords['maskv'] = (('t', 'N'), np.logical_and(np.logical_and(ds.mc.values, ds.mu.values), ds.md.values))
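
The difference-based outlier check can be illustrated on a single synthetic burst. This is a simplified NumPy sketch, not the xarray code above: `diff_mask` is a hypothetical helper, and accepting the first sample unconditionally is a choice made here for brevity.

```python
import numpy as np

def diff_mask(x, factor=3.0):
    """True where the jump from the previous sample is below factor * std(x)."""
    x = np.asarray(x, dtype=float)
    ok = np.abs(np.diff(x)) < factor * np.std(x)
    # the first sample has no predecessor; accept it by convention
    return np.concatenate(([True], ok))

rng = np.random.default_rng(1)
u = rng.normal(0.0, 0.05, size=1000)
u[500] = 2.0  # artificial spike

mask = diff_mask(u)
```

Note that a single spike rejects two samples: the spike itself (jump into the spike) and the sample right after it (jump back out).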

### Correct the data
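The next step is to correct the pressure for drift in air pressure during the deployment. This is done with hourly air pressure observations from a KNMI weather station, read with the `read_knmi_uurgeg` function imported earlier.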

In [12]:
# location of knmi file (to correct for air pressure drift during the experiment)
knmiFile = r'c:\checkouts\python\TUD-COASTAL\instrumentProcessing\example_data\KNMI_20201208_hourly.txt'
# number of the knmi station (to make sure the correction is done with the correct KNMI station)
stationNumber = 235

In [13]:
# correct for the air pressure fluctuations and drift in the instrument
# first we load the data and add it to the dataset
dfp = read_knmi_uurgeg(
    knmiFile,
    stationNumber)
dt = ((ds.t[1] - ds.t[0]) / np.timedelta64(1, 's')).values
pAir = dfp['P'].to_xarray().resample({'t': '{}S'.format(dt)}).interpolate('linear')
ds['pAir'] = pAir.sel(t=slice(ds.t.min(), ds.t.max()))

# we correct for drift in air pressure, nothing else
ds['dpAir'] = ds['pAir'] - ds['pAir'].isel(t=0)

# correct the pressure signal with dpAir and with drift in instrument pressure
ds['pc'] = ds['p'] - ds['dpAir']
ds['pc'].attrs = {'units': 'Pa + NAP', 'long_name': 'pressure', 'comments': 'drift in air pressure is corrected'}

ds['eta'] = ds['pc'] / rho / g + ds.zip
ds['eta'].attrs = {'units': 'm+NAP', 'long_name': 'hydrostatic water level'}
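
As a worked example of the hydrostatic conversion, the sketch below applies the same formula to one pressure value. It assumes that the pressure is in Pa relative to atmospheric pressure; the value 11520 is the first sample in the table shown earlier, and the other constants are the settings defined above.

```python
rho = 1025.0   # water density, kg/m3
g = 9.81       # gravitational acceleration, m/s2
zb = -1.13     # bed level, m NAP
hip = 0.27     # height of the pressure sensor above the bed, m

p = 11520.0              # pressure in Pa (assumed to be gauge pressure)
head = p / (rho * g)     # height of the water column above the sensor, m
eta = head + (zb + hip)  # hydrostatic water level, m NAP
```

With these numbers the water column above the sensor is a little over one metre, which puts the water level a few decimetres above NAP.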


In [14]:
# compute mean water level and water depth
ds['zsmean'] = ds.eta.mean(dim='N')
ds['zsmean'].attrs = {'units': 'm + NAP', 'long_name': 'water level',
                      'comments': 'burst averaged'}

ds['h'] = ds.zsmean - zb
ds['h'].attrs = {'units': 'm', 'long_name': 'water column height'}

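Now that the pressure is corrected, we rotate the velocities to the ENU (east, north, up) coordinate system. This step is only needed if the instrument recorded in XYZ coordinates; data already measured in ENU needs no rotation. After rotating, we mask all observations that fail one of the quality checks, including those taken while the estimated water level was less than 10 cm above the sensor.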


In [15]:
# #% rotate to ENU coordinates (this is only necessary if measurements were performed in XYZ coordinate system)
ufunc = lambda u,v: puv.rotate_velocities(u,v,thet-90)
ds['u'],ds['v'] = xr.apply_ufunc(ufunc,
                    ds['u'], ds['v'],
                    input_core_dims=[['N'], ['N']],
                    output_core_dims=[['N'],['N']],
                    vectorize=True)
ds['u'].attrs = {'units':'m/s','long_name':'velocity E'}
ds['v'].attrs = {'units':'m/s','long_name':'velocity N'}
ds['w'].attrs = {'units':'m/s','long_name':'velocity U'}

# remove pressure observations where the estimated water level is
# lower than the sensor height with margin of error of 10 cm
ds.coords['maskd'] = (('t', 'N'), zb+hi < (ds['eta'].values - 0.1))
ds[['u','v','w','p','pc','eta']] = ds[['u','v','w','p','pc','eta']].where(ds.maskp == True)
ds[['u','v','w','p','pc','eta']] = ds[['u','v','w','p','pc','eta']].where(ds.maskd == True)
ds[['u','v','w','p','pc','eta']] = ds[['u','v','w','p','pc','eta']].where(ds.maskv == True)
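
The rotation itself is a plain 2D rotation of the horizontal velocity vector. The sketch below shows one common form; `rotate_xy` is a hypothetical stand-in, and the sign convention of `puv.rotate_velocities` is not shown in this notebook, so it may rotate in the opposite direction.

```python
import numpy as np

def rotate_xy(u, v, angle_deg):
    """Rotate the vector (u, v) counter-clockwise by angle_deg degrees."""
    a = np.deg2rad(angle_deg)
    ur = np.cos(a) * u - np.sin(a) * v
    vr = np.sin(a) * u + np.cos(a) * v
    return ur, vr

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
ur, vr = rotate_xy(u, v, 90.0)
```

A rotation never changes the speed, so comparing `np.hypot(u, v)` before and after is a quick sanity check on any rotation routine.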

### Metadata update
The metadata will be updated after correcting the data. Version number goes up and extra information on the corrections made will be added. We can also omit the sen data, so this will be deleted from the dataset. Finally, the data will be compressed and saved to the directory for output we defined previously. 


In [16]:
# amending the metadata to add extra info
ds.attrs['version'] = 'v2'
ds.attrs['coordinate type'] = 'ENU'
ds.attrs['comment'] = 'Quality checked data: pressure reference level corrected for airpressure drift, ' + \
                 r'correlation and amplitude checks done and spikes were removed. ' + \
                 r'Velocities rotated to ENU coordinates based on heading and configuration in the field.'

# save to netCDF where we don't include the sen data any more because we have only used it for the quality check
ds = ds.drop_vars(['a1', 'a2', 'a3',
              'cor1', 'cor2', 'cor3',
              'snr1', 'snr2', 'snr3',
              'heading', 'pitch', 'roll',
              'voltage', 'pc'])

# specify compression for all the variables to reduce file size
comp = dict(zlib=True, complevel=5)
ds.encoding = {var: comp for var in ds.data_vars}
ncOutDir = r'c:\checkouts\python\TUD-COASTAL\instrumentProcessing\example_data\ADV\qc'
if not os.path.exists(ncOutDir):
    os.mkdir(ncOutDir)
ds.to_netcdf(os.path.join(ncOutDir, 'vec1_short.nc'), encoding=ds.encoding)

# 3. Compute wave statistics
This final part computes the wave statistics of the measurements. The output is a slimmed-down dataset with wave characteristics derived from the measured pressure and velocities. Burst-scale data is removed from the output because it is already stored in the earlier netCDF files, which keeps the file small and speeds up further processing. We start by defining the input parameters.

### Input parameters

In [17]:
# input specification
instrFile = r'c:\checkouts\python\TUD-COASTAL\instrumentProcessing\example_data\ADV\qc\vec1_short.nc'
 
# frequency resolution in fourier space
fresolution = 0.03125
#number of directional bins 
ntheta = 64

### Load data
We load the data we prepared in the previous step. Within each burst, gaps of NaN values are interpolated, and bursts with too many NaN values are dropped.

In [18]:
# load the raw data from netcdf
ds0 = xr.open_dataset(instrFile).load()

# interpolate nans
N = len(ds0.N)
for var in ['u', 'v', 'p', 'eta']:
    # interpolate the bursts where there is less than 5% nans
    data = ds0[var].where(
        np.isnan(ds0[var]).sum(dim='N') < 0.05 * len(ds0.N)
    ).dropna(dim='t', how='all')
    if len(data.t) != 0:
        ds0[var] = data.interpolate_na(
            dim='N',
            method='cubic',
            max_gap=8)

    # and fill the gaps more than 8 in length with the burst average
    ds0[var] = ds0[var].fillna(ds0[var].mean(dim='N'))

ds0 = ds0.dropna(dim='t')
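
The accept-or-reject logic per burst can be sketched with NumPy alone. This simplified version uses linear instead of cubic interpolation and has no maximum gap length, so it illustrates the idea rather than reproducing the cell above; `fill_burst` is a hypothetical helper.

```python
import numpy as np

def fill_burst(x, max_frac=0.05):
    """Linearly interpolate NaNs in one burst if few enough samples are missing.

    Returns the filled burst, or None when the NaN fraction is max_frac or more.
    """
    x = np.asarray(x, dtype=float)
    bad = np.isnan(x)
    if bad.sum() >= max_frac * x.size:
        return None
    idx = np.arange(x.size)
    filled = x.copy()
    filled[bad] = np.interp(idx[bad], idx[~bad], x[~bad])
    return filled

burst = np.linspace(0.0, 1.0, 101)
burst[[10, 11, 50]] = np.nan
filled = fill_burst(burst)                   # about 3 % missing: interpolated
rejected = fill_burst(np.full(100, np.nan))  # everything missing: rejected
```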

### New dataset
A new dataset is created for the wave analysis. It has additional frequency and direction coordinates for the spectral computations. All variables and attributes are copied from the ds0 dataset.

In [19]:
# make a new dataset that has an extra dimension to accommodate the frequency axis
ds = xr.Dataset(data_vars={},
          coords = {'t': ds0.t.values,
                    'N': ds0.N.values,
                    'f': np.arange(0, ds0.sf.values/2, fresolution),
                    'theta': np.arange(start=-np.pi,stop=np.pi,step=2*np.pi/ntheta)})
ds['f'].attrs = {'units': 'Hz'}
ds.attrs = ds0.attrs

# put all variables in this new dataset
for key in ds0.data_vars:
    ds[key] = ds0[key]


# extract sampling frequency as explicit variable
sf = ds.sf.values

# compute water depth
ds['h'] = ds['zsmean']-ds['zb']   
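
The frequency axis follows directly from the sampling frequency and the chosen resolution. The sketch below assumes a sampling frequency of 16 Hz, consistent with the 0.0625 s sample spacing in the raw data table. How `puv` segments the signal internally is not shown here; the segment length is only the general relation between resolution and record length.

```python
import numpy as np

sf = 16.0              # sampling frequency in Hz (assumed value)
fresolution = 0.03125  # frequency resolution in Hz

f = np.arange(0, sf / 2, fresolution)  # frequency axis up to (not including) Nyquist
segment_seconds = 1 / fresolution      # record length needed for this resolution
segment_samples = int(sf * segment_seconds)
```

With these settings the axis has 256 frequency bins, and each spectral estimate needs 32 s of data, or 512 samples.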

### Wave characteristics
The next step is calculating the wave characteristics. The spectrum is calculated from the pressure measurements; from this spectrum, the spectral density and peak frequency are derived. Using these, we calculate the main wave characteristics: significant wave height, peak period and the mean wave periods Tm01, Tm02 and Tmm10. A smoothed peak period is also calculated.

In [20]:
# compute wave spectra from pressure signal
# by applying the linear transfer function to translate the pressure signal to the water surface
ufunc = lambda x, h: puv.attenuation_corrected_wave_spectrum(
    'pressure',
    ds.sf.values, x, h,
    ds.zi.values,
    ds.zb.values,
    fresolution=fresolution)

# spectral density
fx, ds['vy'] = xr.apply_ufunc(ufunc,
                        ds['p'], ds['h'],
                        input_core_dims=[['N'], []],
                        output_core_dims=[['f'], ['f']],
                        vectorize=True) 
ds['vy'].attrs = {'units': 'm2/Hz', 'long_name': 'spectral density'}
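
The linear transfer function mentioned above follows from linear wave theory: the pressure fluctuation at height z above the bed is the surface fluctuation multiplied by Kp = cosh(kz) / cosh(kh). The following is a minimal sketch of that relation, not the code in `puv`; the function names, the water depth and the frequencies are illustrative assumptions.

```python
import numpy as np

def wavenumber(f, h, g=9.81, n_iter=50):
    """Solve (2*pi*f)**2 = g*k*tanh(k*h) for k with Newton iterations (f > 0)."""
    omega = 2 * np.pi * np.asarray(f, dtype=float)
    k = omega**2 / g  # deep-water first guess
    for _ in range(n_iter):
        t = np.tanh(k * h)
        func = g * k * t - omega**2
        dfunc = g * t + g * k * h * (1 - t**2)
        k = k - func / dfunc
    return k

def pressure_response(f, h, z_sensor):
    """Kp = cosh(k*z_sensor) / cosh(k*h), with z_sensor measured upward from the bed."""
    k = wavenumber(f, h)
    return np.cosh(k * z_sensor) / np.cosh(k * h)

f = np.array([0.05, 0.1, 0.2, 0.5])  # Hz
h = 1.5                               # water depth in m (illustrative)
Kp = pressure_response(f, h, z_sensor=0.27)
# surface elevation spectrum = pressure-head spectrum / Kp**2
```

Because Kp drops quickly with frequency, dividing by Kp**2 amplifies noise at high frequencies, which is why attenuation corrections are normally limited to a frequency band.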

In [21]:
#compute spectral statistics               

# peak frequency
ufunc = lambda vy: puv.get_peak_frequency(ds.f.values, vy)
ds['fp'] = xr.apply_ufunc(ufunc,
                        ds['vy'],
                        input_core_dims=[['f']],
                        output_core_dims=[[]], 
                        vectorize=True) 

# wave characteristics 
ufunc = lambda vy, fp: puv.compute_wave_params(ds.f.values, vy, fmin=0.5*fp, fmax=5)
ds['Hm0'], ds['Tp'], ds['Tm01'], ds['Tm02'], ds['Tmm10'], ds['Tps'] = xr.apply_ufunc(ufunc,
                        ds['vy'], ds['fp'],
                        input_core_dims=[['f'], []],
                        output_core_dims=[[], [], [], [], [], []],
                        vectorize=True) 
ds['Hm0'].attrs = {'units': 'm', 'long_name': 'significant wave height','computation':'computed between fmin=0.5fp and fmax=5'}
ds['Tp'].attrs = {'units': 's', 'long_name': 'peak wave period','computation':'computed between fmin=0.5fp and fmax=5'}
ds['Tm01'].attrs = {'units': 's', 'long_name': 'mean wave period','computation':'computed between fmin=0.5fp and fmax=5'}
ds['Tm02'].attrs = {'units': 's', 'long_name': 'mean wave period','computation':'computed between fmin=0.5fp and fmax=5'}
ds['Tmm10'].attrs = {'units': 's', 'long_name': 'mean wave period','computation':'computed between fmin=0.5fp and fmax=5'}
ds['Tps'].attrs = {'units': 's', 'long_name': 'peak wave period','computation':'computed between fmin=0.5fp and fmax=5', 'comment':'smoothed estimate from the discrete spectrum'}
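
The bulk parameters are all functions of the spectral moments m_n, the integral of f^n S(f) over frequency. The sketch below uses the standard definitions on a synthetic spectrum; `puv.compute_wave_params` may differ in details such as the integration limits and the smoothing of the peak period, so treat the helper names and the test spectrum as assumptions.

```python
import numpy as np

def moment(f, S, n):
    """n-th spectral moment: integral of f**n * S(f) df (trapezoidal rule)."""
    y = f**n * S
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(f))

def wave_params(f, S):
    """Standard bulk wave parameters from a 1D variance density spectrum."""
    m0, m1, m2, mm1 = (moment(f, S, n) for n in (0, 1, 2, -1))
    return {
        'Hm0': 4 * np.sqrt(m0),
        'Tm01': m0 / m1,
        'Tm02': np.sqrt(m0 / m2),
        'Tmm10': mm1 / m0,
        'Tp': 1 / f[np.argmax(S)],
    }

# synthetic narrow-banded spectrum peaked near 0.1 Hz (f = 0 is left out
# because the moment of order -1 is undefined there)
f = np.arange(0.03125, 1.0, 0.03125)
S = np.exp(-0.5 * ((f - 0.1) / 0.02)**2)
params = wave_params(f, S)
```

For any non-negative spectrum the mean periods are ordered as Tmm10 >= Tm01 >= Tm02, which is a useful check on computed results.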

### Current characteristics time domain
We also compute the burst-averaged current components and the current direction in the time domain.

In [22]:
# compute current magnitudes and direction all computed in the time domain
ds['u_mean'] = ds.u.mean(axis=1)
ds['u_mean'].attrs = {'units': 'm/s', 'long_name': 'current x-component', 'computation': 'burst averaged'}

ds['v_mean'] = ds.v.mean(axis=1)
ds['v_mean'].attrs = {'units': 'm/s', 'long_name': 'current y-component', 'computation': 'burst averaged'}
               
ds['cur_dir'] = np.arctan2(ds['v_mean'], ds['u_mean'])*180/np.pi
ds['cur_dir'].attrs = {'units': 'deg', 'long_name': 'current direction, cartesian convention'}
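
The cartesian convention means that the angle is measured counter-clockwise from the positive x-axis (east after rotation to ENU). The sketch below shows this for four unit vectors and adds a conversion to a nautical "going to" direction. That conversion is not part of this notebook and is included only for illustration.

```python
import numpy as np

u_mean = np.array([1.0, 0.0, -1.0, 0.0])   # east component, m/s
v_mean = np.array([0.0, 1.0, 0.0, -1.0])   # north component, m/s

# cartesian convention: angle counter-clockwise from east, in (-180, 180]
cur_dir = np.degrees(np.arctan2(v_mean, u_mean))

# nautical "going to" direction: clockwise from north, in [0, 360)
cur_dir_naut = (90.0 - cur_dir) % 360.0
```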

### 2D Wave characteristics from method of maximum entropy
We also calculate the directional wave spectra based on the phase coupling between pressure and velocities.

In [23]:
# directional wave spectra
ufunc = lambda p, u, v, h, fp: puv.wave_MEMpuv(p/1e4, u, v, h,
                    ds.zi.values,
                    ds.zb.values,
                    ds.sf.values,
                    fresolution=fresolution,
                    ntheta=ntheta,
                    fcorrmin=0.5*fp,
                    fcorrmax=5,
                    maxiter=20)
            
fx, vy, theta, ds['S'] = xr.apply_ufunc(ufunc,
                        ds['p'], ds['u'], ds['v'], ds['h'], ds['fp'],
                        input_core_dims=[['N'], ['N'], ['N'], [], []],
                        output_core_dims=[['f'], ['f'], ['theta'], ['f', 'theta']],
                        vectorize=True) 
ds['S'].attrs = {'units': 'm2/Hz/rad', 'long_name': 'directional variance density',
                 'computation': 'computed between fmin=0.5fp and fmax=5'}

In [24]:
# statistics from directional wave spectra
ufunc = lambda vy,S,fp: puv.compute_wave_params(ds.f.values, vy, fmin=0.5*fp, fmax=5, theta=ds.theta.values, S=S)
Hm0, Tp, Tm01, Tm02, Tmm10, Tps, ds['wavedirmean'],ds['dirspread'] = xr.apply_ufunc(ufunc,
                        ds['vy'], ds['S'], ds['fp'],
                        input_core_dims=[['f'], ['f', 'theta'], []],
                        output_core_dims=[[], [], [], [], [], [], [], []],
                        vectorize=True) 
ds['wavedirmean'].attrs = {'units': 'deg', 'long_name': 'mean wave direction', 'computation': 'computed between fmin=0.5fp and fmax=5'}
ds['dirspread'].attrs = {'units': 'deg', 'long_name': 'directional spreading', 'computation': 'computed between fmin=0.5fp and fmax=5'}
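
A mean wave direction can be obtained from a directional spectrum through its first directional Fourier moments. The sketch below shows that calculation on a synthetic spectrum. It is a generic illustration under its own conventions (radians, cartesian angles, uniform grids); `mean_direction` is a hypothetical helper and `puv.compute_wave_params` may use different definitions.

```python
import numpy as np

def mean_direction(f, theta, S):
    """Energy-weighted mean direction (rad) of S with shape (len(f), len(theta))."""
    df = f[1] - f[0]
    dtheta = theta[1] - theta[0]
    a1 = np.sum(S * np.cos(theta)[None, :]) * df * dtheta
    b1 = np.sum(S * np.sin(theta)[None, :]) * df * dtheta
    return np.arctan2(b1, a1)

ntheta = 64
theta = np.arange(-np.pi, np.pi, 2 * np.pi / ntheta)
f = np.arange(0.03125, 1.0, 0.03125)

# synthetic spectrum: Gaussian in frequency, cos^2 lobe centred on 45 degrees
theta0 = np.deg2rad(45.0)
D = np.cos(0.5 * (theta - theta0))**2
S = np.exp(-0.5 * ((f - 0.1) / 0.02)**2)[:, None] * D[None, :]

dir_mean = np.degrees(mean_direction(f, theta, S))
```

For this symmetric lobe the computed mean direction coincides with the lobe centre.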


### Output
The new data is saved to the output file. As stated previously, burst-scale data is stripped to save space and to avoid duplicate data. You end up with three files, all in netCDF format: the raw data, the corrected/filtered data and the wave characteristics data.

In [25]:
# write to file
ncOutFile = r'c:\checkouts\python\TUD-COASTAL\instrumentProcessing\example_data\ADV\tailored\vec1_pilot_tailored_short.nc'

# we strip all information on burst scale from the dataset to reduce size (and this info is already present in the raw_netcdf version of the data)
dsTailored = ds.drop_dims('N')
dsTailored.to_netcdf(ncOutFile)