In [20]:
import numpy as np
import pandas as pd
import scipy
from scipy import signal
import plotly.graph_objects as go
import matplotlib.pyplot as plt

# Data Smoothing

Data smoothing reduces the noise in a dataset while preserving its underlying trend. To explain this, consider $f(x)=\sin(x)\cos(x)$ with $0\leq x \leq 4\pi$. We discretize $[0,4\pi]$ using 100 equally spaced points:

In [21]:
## Data
x = np.linspace(0, 4*np.pi, 100)
y = np.sin(x)*np.cos(x)

We add Gaussian noise with mean 0 and standard deviation 0.1 to the data using [*numpy.random.normal*](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.normal.html).

In [22]:
## Noisy Data
mu, sigma = 0, 0.1 # mean and standard deviation
noise = np.random.normal(mu, sigma, len(x))
y_noise = y + noise
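Because the noise above is drawn at random, each run of the notebook produces a different `y_noise`. A minimal sketch of a reproducible variant follows, assuming NumPy's `default_rng` generator; the seed value `0` and the names `rng`, `noise_seeded` and `y_noise_seeded` are illustrative choices, not part of the original notebook.

```python
import numpy as np

# Assumption: the seed 0 is arbitrary; any fixed integer gives repeatable noise.
rng = np.random.default_rng(0)

# Same grid and clean signal as in the cells above.
x = np.linspace(0, 4 * np.pi, 100)
y = np.sin(x) * np.cos(x)

mu, sigma = 0, 0.1  # mean and standard deviation
noise_seeded = rng.normal(mu, sigma, size=len(x))
y_noise_seeded = y + noise_seeded
```

Re-creating the generator with the same seed yields the same noise array, which makes the smoothing results easier to compare between runs.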

## Savitzky-Golay Filter

Without getting into the details, we simply apply the Savitzky–Golay filter here; interested readers can see [this link](https://en.wikipedia.org/wiki/Savitzky%E2%80%93Golay_filter#Treatment_of_first_and_last_points) for more details. We use **scipy.signal.savgol_filter** from the **scipy** package, with a window length of 23 points and a polynomial order of 4. Details on the usage of this function can be found [here](https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.signal.savgol_filter.html).
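For some intuition, the filter can be thought of as fitting a low-order polynomial by least squares to the points inside a sliding window and keeping the fitted value at the window's center. The sketch below illustrates this for interior points only, on equally spaced data. The helper `local_polyfit_smooth`, the seed, and the variable names are hypothetical and chosen for illustration; the edges, which `savgol_filter` treats separately, are skipped.

```python
import numpy as np
from scipy import signal

# Assumption: seeded noise so that the comparison is repeatable.
rng = np.random.default_rng(0)
x = np.linspace(0, 4 * np.pi, 100)
y_noise = np.sin(x) * np.cos(x) + rng.normal(0, 0.1, len(x))

window_length, polyorder = 23, 4
half = window_length // 2
offsets = np.arange(-half, half + 1)  # sample positions relative to the window center


def local_polyfit_smooth(values, i):
    """Hypothetical helper: fit a polynomial to the window centered at index i
    and return the fitted value at the center (offset 0)."""
    coeffs = np.polyfit(offsets, values[i - half:i + half + 1], polyorder)
    return np.polyval(coeffs, 0.0)


# Only indices with a complete window on both sides.
interior = np.arange(half, len(x) - half)
manual = np.array([local_polyfit_smooth(y_noise, i) for i in interior])
reference = signal.savgol_filter(y_noise, window_length, polyorder)[interior]

max_diff = np.max(np.abs(manual - reference))
print("largest interior difference:", max_diff)
```

On the interior points the two results should agree up to floating-point rounding.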

In [23]:
## Filtering noisy data
y_filtered = signal.savgol_filter(y_noise, window_length=23, polyorder=4)
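One way to check that the filter helps is to compare the root-mean-square error of the noisy and the filtered data against the clean signal. The following is a minimal sketch under the same settings as above; the `rmse` helper, the seed, and the result names are assumptions added for illustration.

```python
import numpy as np
from scipy import signal

# Assumption: seeded noise so that the numbers are repeatable between runs.
rng = np.random.default_rng(0)
x = np.linspace(0, 4 * np.pi, 100)
y = np.sin(x) * np.cos(x)
y_noise = y + rng.normal(0, 0.1, len(x))
y_filtered = signal.savgol_filter(y_noise, 23, 4)


def rmse(a, b):
    """Hypothetical helper: root-mean-square error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


rmse_noisy = rmse(y_noise, y)
rmse_filtered = rmse(y_filtered, y)
print(f"RMSE noisy:    {rmse_noisy:.4f}")
print(f"RMSE filtered: {rmse_filtered:.4f}")
```

A smaller error for the filtered data indicates that the smoothing removed more noise than it distorted the signal. This comparison is only possible here because the clean signal is known.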

We plot the original, noisy, and filtered data together:

In [24]:
fig = go.Figure()
# plotting the original data
fig.add_trace(go.Scatter(x=x, y=y, mode='lines+markers', marker=dict(size=2, color='DarkRed'), name='Original Data'))
# plotting the noisy data
fig.add_trace(go.Scatter(x=x, y=y_noise, mode='markers', marker=dict(size=5, color='royalblue', symbol='circle-open'),
                name='Noisy Data'))
# plotting the filtered data
fig.add_trace(go.Scatter(x=x, y=y_filtered, mode='lines+markers', marker=dict(size=2, color='DarkGreen',
              symbol='triangle-up'), name='Filtered Noisy Data'))