# Washington Shelf Experiment Data

In this exercise we will study data from the Washington Shelf experiment. The data contains transmission signals embedded in background noise. We will:

* fit a distribution to the background noise
* fit a distribution to the transmission signal
* use the fitted distributions to set up a statistical test that predicts the presence of a transmission signal, and evaluate it on new samples
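As a rough sketch of what such a test could look like, assuming both RMS distributions end up modeled as skew-normals: compare the log-likelihood of a new RMS dB value under the transmission fit with its log-likelihood under the background fit, and declare a transmission when the difference exceeds a threshold. The helper names and the parameter tuples below are hypothetical placeholders, not fitted results.

```python
import numpy as np
from scipy import stats

def log_likelihood_ratio(rms_db, params_tr, params_bg):
    """log p(x | transmission) - log p(x | background) for skew-normal models.

    params_tr and params_bg are hypothetical (a, loc, scale) tuples.
    """
    x = np.asarray(rms_db, dtype=float)
    a_tr, loc_tr, scale_tr = params_tr
    a_bg, loc_bg, scale_bg = params_bg
    return (stats.skewnorm.logpdf(x, a_tr, loc=loc_tr, scale=scale_tr)
            - stats.skewnorm.logpdf(x, a_bg, loc=loc_bg, scale=scale_bg))

def detect_transmission(rms_db, params_tr, params_bg, threshold=0.0):
    """True where the log-likelihood ratio exceeds the threshold (0 picks the likelier model)."""
    return log_likelihood_ratio(rms_db, params_tr, params_bg) > threshold

# Hypothetical parameters for illustration only (not fitted to the experiment data)
params_tr = (2.0, -20.0, 4.0)
params_bg = (2.0, -40.0, 4.0)
print(detect_transmission([-45.0, -18.0], params_tr, params_bg))
```

Raising the threshold above 0 trades missed detections for fewer false alarms; how to choose it is part of the test design.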

## Reading Data

To construct "idealized" distributions for the transmission and the background, we extracted short 1 sec segments containing the transmission, together with 1 sec of background sound before each one. The sampling frequency is 48 kHz. The time series are stored in two `.csv` files, `df_tr.csv` and `df_bg.csv`, whose column names correspond to the signal timestamps.



In [None]:
import scipy.signal as signal
import matplotlib.pyplot as plt
import scipy.io as io
import numpy as np
import pandas as pd
import scipy as sp

In [None]:
fs = 48000

In [None]:
data_path = "/home/jovyan/shared-public/"

In [None]:
# reading transmission dataset
df_tr = pd.read_csv(data_path+"df_tr.csv")
# reading background dataset
df_bg = pd.read_csv(data_path+"df_bg.csv")

In [None]:
# visualize how the data is organized
df_tr.head()

In [None]:
df_tr.shape

Each column is a 1 sec time series (48000 samples) recorded at a given timestamp, and there are 500 observations.

In [None]:
# display one pair of transmission and background time series

# select one column index to plot
idx = 10

# time axis in seconds, one entry per sample (avoids float rounding in np.arange)
time = np.arange(len(df_tr)) / fs

plt.plot(time, df_tr.iloc[:, idx], label="transmission", alpha=0.5)
plt.plot(time, df_bg.iloc[:, idx], label="background", alpha=0.5)
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.legend()

In [None]:
Fs = fs  # same 48 kHz sampling rate, named after the Fs keyword of plt.specgram

In [None]:
_ = plt.specgram(df_tr.iloc[:,idx], Fs=Fs)
plt.title("Transmission")

In [None]:
_ = plt.specgram(df_bg.iloc[:,idx], Fs=Fs)
plt.title("Background")

## Filtering Data

Since the data is quite noisy and the transmission is sometimes not visible, we will use a band-pass filter to limit the frequency content to the (3450 Hz, 3550 Hz) range.
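To see what such a filter does, one can evaluate its frequency response. The sketch below assumes the same design as the `BandPass` function defined in the following cells (4th-order Butterworth, second-order sections) and uses `scipy.signal.sosfreqz` to compute the gain at a few frequencies inside and outside the band.

```python
import numpy as np
from scipy import signal

fs = 48000            # sampling frequency [Hz]
band = [3450, 3550]   # pass band [Hz]

# Same design as BandPass: 4th-order Butterworth band-pass as second-order sections
sos = signal.butter(4, band, btype='bandpass', fs=fs, output='sos')

# Evaluate the response at chosen frequencies (in Hz, because fs is passed)
freqs = np.array([3000.0, 3450.0, 3500.0, 3550.0, 4000.0])
_, h = signal.sosfreqz(sos, worN=freqs, fs=fs)
gain_db = 20 * np.log10(np.abs(h))

for f, g in zip(freqs, gain_db):
    print(f"{f:6.0f} Hz: {g:7.1f} dB")
```

A Butterworth design places the band edges at about -3 dB, the centre of the band near 0 dB, and frequencies a few hundred Hz away are attenuated by tens of dB, which is what removes most of the broadband noise.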

In [None]:
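def BandPass(inputSignal, bandLimits, freqSample):
    """4th-order Butterworth band-pass; bandLimits = [low, high] in Hz, freqSample in Hz."""
    # second-order sections ('sos') are numerically more robust than (b, a) coefficients
    sos = sp.signal.butter(4, bandLimits, 'bandpass', fs=freqSample, output='sos')
    # causal filtering along the time axis
    outputSignal = sp.signal.sosfilt(sos, inputSignal)

    return outputSignal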

In [None]:
flimits = [3450, 3550]
signal_tr_filt = BandPass( df_tr.iloc[:,idx], flimits, fs)
signal_bg_filt = BandPass( df_bg.iloc[:,idx], flimits, fs)

In [None]:
plt.plot(time, signal_tr_filt, alpha=0.5, label = "transmission")
plt.plot(time, signal_bg_filt, alpha=0.5, label = "background")
plt.legend()
plt.show()

**TODO:** Filter all the data

Hint: you can either use the pandas dataframe [`apply`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html) function or loop through the columns.
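One way to filter every column, sketched under the assumption that each column is one 1 sec recording as described above: `DataFrame.apply` calls a function on each column and assembles the results into a DataFrame with the same labels. Because `scipy.signal.sosfilt` also accepts an `axis` argument, filtering the whole 2-D array in one call is an equivalent alternative. The helper `band_pass` and the small random DataFrame are hypothetical stand-ins so the snippet runs on its own.

```python
import numpy as np
import pandas as pd
from scipy import signal

def band_pass(x, band, fs):
    """4th-order Butterworth band-pass (same design as BandPass above), along axis 0."""
    sos = signal.butter(4, band, btype='bandpass', fs=fs, output='sos')
    return signal.sosfilt(sos, x, axis=0)

fs = 48000
band = [3450, 3550]

# Stand-in for df_tr / df_bg: 3 columns of 1 sec of random noise (not real data)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((fs, 3)), columns=["t0", "t1", "t2"])

# Option 1: filter column by column; returning a Series keeps index and labels
df_filt = df.apply(lambda col: pd.Series(band_pass(col.to_numpy(), band, fs), index=col.index))

# Option 2: filter all columns at once along the time axis
df_filt2 = pd.DataFrame(band_pass(df.to_numpy(), band, fs), columns=df.columns)
```

Both options give the same numbers; the second avoids a Python-level loop over the 500 columns.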

## Computing RMS

Compute the Root Mean Square (RMS) of one 1 sec signal and convert it to dB.
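Written out for a filtered signal $x_n$ with $N$ samples ($N = 48000$ for 1 sec at 48 kHz), the quantity computed in the next cell is the RMS amplitude in decibels, relative to an amplitude of 1 in the data's units (no calibration reference is assumed):

$$
\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{n=0}^{N-1} x_n^2}, \qquad
\mathrm{RMS}_{\mathrm{dB}} = 20\log_{10}\mathrm{RMS} = 10\log_{10}\left(\frac{1}{N}\sum_{n=0}^{N-1} x_n^2\right).
$$

The factor is 20 rather than 10 because the RMS is an amplitude, and power scales with the square of amplitude.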

In [None]:
# RMS in dB for one example
def rms_db(x):
    """Root mean square of a signal, converted to dB (20 * log10 of the RMS amplitude)."""
    x = np.asarray(x, dtype=float)
    return 20 * np.log10(np.sqrt(np.mean(np.square(x))))

rms_tr = rms_db(signal_tr_filt)
print(rms_tr)

rms_bg = rms_db(signal_bg_filt)
print(rms_bg)

**TODO:** Compute RMS and convert to dB for each 1 sec signal.
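A possible vectorized approach, assuming the filtered signals are held in a 2-D array with one column per 1 sec recording (for example the filtered DataFrame converted with `.to_numpy()`): average the squared samples along axis 0, so each column yields one RMS dB value. The helper name and the random array are hypothetical stand-ins so the snippet runs by itself.

```python
import numpy as np

def rms_db_per_column(x):
    """RMS in dB of each column of a 2-D array (rows = samples, columns = recordings)."""
    x = np.asarray(x, dtype=float)
    return 20 * np.log10(np.sqrt(np.mean(np.square(x), axis=0)))

# Stand-in for filtered data: 48000 samples x 4 recordings with known noise levels
rng = np.random.default_rng(1)
signals = rng.normal(scale=[0.01, 0.1, 1.0, 10.0], size=(48000, 4))

rms_values = rms_db_per_column(signals)
print(np.round(rms_values, 1))
```

With noise standard deviations of 0.01, 0.1, 1 and 10, the printed values come out close to -40, -20, 0 and 20 dB. Applied to a 1-D signal, the same helper returns a single value.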

**TODO:** Plot the histograms of the RMS dB values for the transmission and background signals. What type of distribution do they follow?

## Fitting Distributions

We can fit a distribution separately to the transmission samples and to the background samples. You can use the skew-normal widget from the lecture, adjust its parameters, or try other models.

In [None]:
from scipy import stats

In [None]:
def plot_skewnorm_density_L(a, scale, loc):
    # X: global array of RMS dB values (transmission or background), defined before running the widget
    h = plt.hist(X, bins=100, density=True, alpha=0.5)

    # evaluate the density at the histogram bin edges (h[1])
    skewnorm_density = stats.skewnorm.pdf(h[1], a=a, scale=scale, loc=loc)

    # log-likelihood of the observations; logpdf stays finite where pdf underflows to 0
    L = np.sum(stats.skewnorm.logpdf(X, a=a, scale=scale, loc=loc))

    plt.plot(h[1], skewnorm_density)
    plt.title(f"Log-Likelihood {L:.10f}")

In [None]:
from ipywidgets import interact
import ipywidgets as widgets

**TODO:** Set the sample variable `X` used by the plotting function (transmission or background RMS dB values).

**TODO:** Select starting values for the widget.
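Instead of guessing, one option is to start the sliders from a maximum-likelihood fit: `scipy.stats.skewnorm.fit` returns the shape `a`, `loc` and `scale` that numerically maximize the log-likelihood, the same quantity shown in the widget title. The helper `skewnorm_start` is hypothetical, and the sample here is synthetic rather than the experiment data.

```python
import numpy as np
from scipy import stats

def skewnorm_start(sample):
    """Maximum-likelihood skew-normal parameters (shape, loc, scale) for a 1-D sample."""
    shape, loc, scale = stats.skewnorm.fit(np.asarray(sample, dtype=float))
    return shape, loc, scale

# Synthetic stand-in for the RMS dB values (not the experiment data)
demo = stats.skewnorm.rvs(4.0, loc=-40.0, scale=5.0, size=2000, random_state=0)
a_hat, loc_hat, scale_hat = skewnorm_start(demo)

# Log-likelihood at the fitted parameters, the quantity shown in the widget title
L_hat = np.sum(stats.skewnorm.logpdf(demo, a_hat, loc=loc_hat, scale=scale_hat))
print(f"shape={a_hat:.2f}, loc={loc_hat:.2f}, scale={scale_hat:.2f}, log-likelihood={L_hat:.2f}")
```

In the notebook, `shape, loc, scale = skewnorm_start(X)` would provide starting values for the sliders, which can then still be adjusted by hand.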

In [None]:
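# starting values for the sliders (uncomment and fill in)
# shape =   # skew-normal shape parameter a (a = 0 gives a normal distribution)
# loc =     # location, e.g. near the peak of the histogram
# scale =   # scale, must be positive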

In [None]:
shape_slider = widgets.FloatSlider(
    value=shape,
    min=shape-10,
    max=shape+10,
    step=0.01,
    description='Shape:',
    continuous_update=True,
    orientation='horizontal',
    readout=True,
    readout_format='.2f',
)

scale_slider = widgets.FloatSlider(
    value=scale,
    min=max(scale - 10, 0.01),  # keep the scale strictly positive
    max=scale+10,
    step=0.01,
    description='Scale:',
    continuous_update=True,
    orientation='horizontal',
    readout=True,
    readout_format='.2f',
)

loc_slider = widgets.FloatSlider(
    value=loc,
    min=loc-10,
    max=loc+10,
    step=0.01,
    description='Location:',
    continuous_update=True,
    orientation='horizontal',
    readout=True,
    readout_format='.2f',
)


**TODO:** Select parameters to fit the distribution. It does not need to be perfect!

In [None]:
out = interact(plot_skewnorm_density_L, a = shape_slider, scale = scale_slider, loc = loc_slider)