## **Noise Exploration Notebook Description**

This notebook illustrates the process we used to compare our analyte of interest (A) with a less noisy analyte (B) in order to determine the best way to reduce noise. Future work could benefit from exploring other approaches. The final preprocessing notebook includes three forms of noise reduction, along with a function for diagnostic plots that considers only analyte A.

# Package Imports

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy import signal
from scipy.signal import savgol_filter

# Read in the data

In [None]:
un_pred = pd.read_csv('/Users/saral/OneDrive - UBC/MDS/Capstone/Code + Data/Data/Raw Data Predictors/Unsuccessful.csv')
ecd_pred = pd.read_csv('/Users/saral/OneDrive - UBC/MDS/Capstone/Code + Data/Data/Raw Data Predictors/ECD.csv')
ecd_ts = pd.read_csv('/Users/saral/OneDrive - UBC/MDS/Capstone/Code + Data/Data/Time Series/ECDTS/ECD_TS.csv')
un_ts = pd.read_csv('/Users/saral/OneDrive - UBC/MDS/Capstone/Code + Data/Data/Time Series/unsuccesful time series/US_TS.csv')
B = pd.read_csv('/Users/saral/OneDrive - UBC/MDS/Capstone/Code + Data/Data/ECD_A2_all.csv')

# Plot the raw waveforms

In [None]:
loc = 69

In [None]:
ide = ecd_ts['TestId'][loc]

In [None]:
t = np.arange(0.2, 300.2, 0.2) -  ecd_pred[ecd_pred['TestID'] == ide]['SampleDetectTime'].item()
s = plt.plot(t, ecd_ts[ecd_ts['TestId'] == ide].iloc[0,1:], c = 'r')
plt.title('Raw Entire A Reading')
plt.xlabel('Time w.r.t. sample detect (s)')
plt.ylabel('Signal')

In [None]:
s = plt.plot(t, B[B['TestId'] == ide].iloc[:,2:].transpose())
plt.title('Raw Entire B Reading')
plt.xlabel('Time w.r.t. sample detect (s)')
plt.ylabel('Signal')

# Define the time window w.r.t. sample detect to ignore wet-up

In [None]:
sample_start = int(ecd_pred[ecd_pred['TestID'] == ide]['SampleDetectTime'].item()/0.2)

In [None]:
cal_start = int(-30/0.2)
cal_end = int(40/0.2)

In [None]:
B_cal = B[B['TestId'] == ide].iloc[:, sample_start + cal_start + 2:sample_start+cal_end+2]
A_cal = ecd_ts[ecd_ts['TestId'] == ide].iloc[:, sample_start + cal_start + 1 :sample_start+cal_end+1]
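The slicing above converts times in seconds into column positions by dividing by the 0.2 s sample spacing and then adding an offset for the non-signal columns. A minimal sketch of that arithmetic follows. The helper name `window_indices` and its parameters are hypothetical, and it uses `round` rather than `int`, which is a deliberate variation: truncation can land one sample low for some values (for example, `0.6 / 0.2` evaluates to slightly below 3 in floating point).

```python
SAMPLE_PERIOD_S = 0.2  # spacing between readings, taken from the 0.2 step used in this notebook


def window_indices(detect_time_s, start_s=-30.0, end_s=40.0, first_data_col=1):
    """Return (start, stop) column positions for a window around sample detect.

    first_data_col is the number of leading non-signal columns
    (assumed here: 1 for the A table, 2 for the B table).
    """
    detect_idx = round(detect_time_s / SAMPLE_PERIOD_S)
    start = detect_idx + round(start_s / SAMPLE_PERIOD_S) + first_data_col
    stop = detect_idx + round(end_s / SAMPLE_PERIOD_S) + first_data_col
    return start, stop


# Example: sample detected 100 s into the recording
start, stop = window_indices(100.0)
print(start, stop, stop - start)
```

The resulting pair can be passed straight to `.iloc[:, start:stop]`, so the window length is always `(end_s - start_s) / 0.2` samples regardless of the detect time.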

# Plot Waveforms without wet-up

In [None]:
x = np.arange(cal_start, cal_end)
t = plt.plot(x*0.2, B_cal.transpose())
plt.title('Raw B Reading without wet-up')
plt.xlabel('Time w.r.t. sample detect (s)')
plt.ylabel('Signal')

In [None]:
t = plt.plot(x*0.2, A_cal.transpose(), c = 'r')
plt.title('Raw A Reading without wet-up')
plt.xlabel('Time w.r.t. sample detect (s)')
plt.ylabel('Signal')

In [None]:
t = plt.plot(x*0.2, B_cal.transpose(), label = 'B')
t = plt.plot(x*0.2, A_cal.transpose(), c = 'r', label = 'A')
plt.legend()
plt.title("Raw readings with wet-up removed, overlaid")
plt.ylabel('Signal')
plt.xlabel('Time w.r.t. sample detect (s)')

# Try normalizing the signals and plot them (scale so shape is retained but all values are between 0 and 1)

In [None]:
# Min-max scale each signal to [0, 1]; leave a constant signal unchanged to avoid dividing by zero
A_min, A_max = A_cal.min(axis = 1).item(), A_cal.max(axis = 1).item()
B_min, B_max = B_cal.min(axis = 1).item(), B_cal.max(axis = 1).item()
A_norm = (A_cal - A_min) / (A_max - A_min) if A_max != A_min else A_cal
B_norm = (B_cal - B_min) / (B_max - B_min) if B_max != B_min else B_cal
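The min-max scaling above can also be wrapped in a small helper so that both analytes get the same treatment, including the guard for a constant signal. This is a sketch only: the function name `min_max_normalize` is hypothetical and the demo values are made up for illustration.

```python
import pandas as pd


def min_max_normalize(row):
    """Scale a single-row DataFrame (or a Series) to [0, 1].

    A constant signal is returned unchanged to avoid dividing by zero.
    """
    values = row.to_numpy(dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        return row
    return (row - lo) / (hi - lo)


# Made-up single-row example standing in for A_cal or B_cal
demo = pd.DataFrame([[2.0, 4.0, 6.0]])
normed = min_max_normalize(demo)
print(normed)
```

Because the shape of the waveform is preserved, the normalized A and B signals can be overlaid on one axis even when their raw amplitudes differ.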

In [None]:
A_norm

In [None]:
plt.plot(x, A_norm.transpose(), c = 'r', label = 'A')
plt.plot(x, B_norm.transpose(), c = 'b', label = 'B')
plt.title("Normalized readings with wet-up removed, overlayed")
plt.ylabel('normalized signal')
plt.xlabel('Index w.r.t. sample detect time')
plt.legend()

# Convolution with tsmoothie

In [None]:
from tsmoothie.smoother import ConvolutionSmoother

# apply the smoothing
smoother = ConvolutionSmoother(window_len=50, window_type='bartlett')
smoother.smooth(A_norm.transpose().iloc[:,0].to_numpy())

# generate intervals
low, up = smoother.get_intervals('sigma_interval', n_sigma=3)

# plot the smoothed timeseries with intervals
plt.figure(figsize=(11,6))
plt.plot(smoother.data[0], color='orange')
plt.plot(smoother.smooth_data[0], linewidth=3, color='blue')
plt.fill_between(range(len(smoother.data[0])), low[0], up[0], alpha=0.3)
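To make the idea behind convolution smoothing concrete, here is a rough NumPy-only sketch of smoothing with a normalized Bartlett (triangular) window. It is not tsmoothie's implementation, and its edge handling (reflect padding) is an assumption that may differ from the library's. The function name `bartlett_smooth` and the synthetic signal are hypothetical.

```python
import numpy as np


def bartlett_smooth(y, window_len=50):
    """Smooth a 1-D signal by convolving it with a normalized Bartlett window."""
    y = np.asarray(y, dtype=float)
    weights = np.bartlett(window_len)
    weights /= weights.sum()  # weights sum to 1, so the signal level is preserved
    # Reflect-pad both ends so the output has the same length as the input
    left = window_len // 2
    right = window_len - 1 - left
    padded = np.pad(y, (left, right), mode="reflect")
    return np.convolve(padded, weights, mode="valid")


# Synthetic stand-in for a noisy reading: slow decay plus Gaussian noise
rng = np.random.default_rng(0)
t = np.arange(350) * 0.2
noisy = np.exp(-t / 30) + 0.05 * rng.standard_normal(t.size)
smoothed = bartlett_smooth(noisy)
print(len(noisy), len(smoothed))
```

A wider window removes more high-frequency variation but also blurs genuine fast changes in the signal, so the window length of 50 samples (10 s at 0.2 s spacing) is a trade-off rather than a fixed rule.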

In [None]:
f, ps = signal.periodogram(smoother.smooth_data[0], 5)
f_g, ps_g = signal.periodogram(B_norm.transpose().iloc[:,0].to_numpy(), 5)

We use a periodogram to estimate the power spectral density (PSD). This shows how much of the signal's power occurs at each frequency and should indicate whether we have effectively reduced noise (assumed to be higher-frequency activity). The sampling frequency passed to `signal.periodogram` is 5 Hz because readings are taken every 0.2 s.
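One way to turn the visual PSD comparison into a number is to compute the fraction of total power that lies above some cutoff frequency, before and after smoothing. The sketch below does this on a synthetic signal. The 1 Hz cutoff, the moving-average smoother, and all variable names are assumptions chosen for illustration, not values used in this project.

```python
import numpy as np
from scipy import signal

fs = 5.0  # Hz, i.e. one reading every 0.2 s
t = np.arange(350) / fs

# Synthetic signal: slow oscillation plus white noise
rng = np.random.default_rng(0)
slow = np.sin(2 * np.pi * 0.05 * t)
noisy = slow + 0.3 * rng.standard_normal(t.size)

# Simple 10-sample moving average as a stand-in smoother
kernel = np.ones(10) / 10
smoothed = np.convolve(noisy, kernel, mode="same")

f, p_noisy = signal.periodogram(noisy, fs)
_, p_smooth = signal.periodogram(smoothed, fs)

# Share of total power above the (assumed) 1 Hz cutoff
high = f > 1.0
frac_noisy = p_noisy[high].sum() / p_noisy.sum()
frac_smooth = p_smooth[high].sum() / p_smooth.sum()
print(f"high-frequency share: noisy={frac_noisy:.3f}, smoothed={frac_smooth:.3f}")
```

A smaller high-frequency share after filtering suggests the smoother removed noise, under the stated assumption that noise lives mostly at higher frequencies.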

In [None]:
plt.plot(f_g, ps_g, label = 'B')
plt.plot(f, ps, label = 'filtered A')
plt.legend()
plt.title('Overlaid PSD Estimates')
plt.xlabel('Hz')
plt.ylabel('power')

In [None]:
plt.plot(np.arange(-30,40,0.2), smoother.data[0], color='orange', alpha = 0.3, label = 'A')
plt.plot(np.arange(-30,40,0.2), smoother.smooth_data[0], color='orange', label = 'smoothed A' )
plt.plot(np.arange(-30,40,0.2), B_norm.transpose().iloc[:,0].to_numpy(), label = "B")
plt.legend()