# Data Smoothing

Outlier detection is a broad topic in its own right, but on **time series** data a moving average can serve a similar purpose by removing *spikes*, measurement errors, or both. Even when the *spikes are accurate*, they may not reflect the underlying process and may instead stem from instrumentation issues, so it is common to smooth the data.
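
As a minimal sketch of the moving-average idea, the plain-Python function below averages each point with the points just before it. The function name `trailing_moving_average`, the window size, and the `readings` values are hypothetical and chosen only to show a single spike being flattened.

```python
def trailing_moving_average(values, window):
    """Average each point with up to `window - 1` preceding points."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)   # shorter window at the start of the series
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical readings with one spike at position 3
readings = [10, 11, 10, 50, 11, 10, 11]
smoothed = trailing_moving_average(readings, window=3)
print(smoothed)
```

The spike of 50 is reduced to roughly 23.7, but it also leaks into the two following points, which is the usual trade-off of a moving average.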

Data smoothing is closely related to missing data imputation, so some of the imputation techniques are relevant here as well. Smoothing can serve many purposes:

- data preparation;
- feature generation;
- prediction;
- visualization.

### Exponential Smoothing

Compared to the moving average, exponential smoothing is more sensitive to recency: it weights recent points more heavily than older ones. Within a given window, the most recent point receives the largest weight, and each earlier point receives an exponentially smaller one.
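
The weighting scheme can be illustrated with a short sketch. It assumes weights of the form (1 - alpha)^k for the point k steps in the past, normalized to sum to 1; the function name `exponential_weights` and the chosen values are hypothetical.

```python
def exponential_weights(alpha, n_points):
    """Normalized weights, ordered from the most recent point to the oldest."""
    raw = [(1 - alpha) ** k for k in range(n_points)]
    total = sum(raw)
    return [w / total for w in raw]


weights = exponential_weights(alpha=0.5, n_points=4)
print(weights)  # approximately [0.533, 0.267, 0.133, 0.067]
```

A moving average over the same four points would instead give each of them a weight of 0.25.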

### Importing libs

In [1]:
import pandas as pd

### Dataset Description
The AirPassengers dataset records monthly airline passenger totals, in thousands.

In [2]:
# reading the csv of AirPassengers
air = pd.read_csv("/Users/dellacorte/py-projects/data-science/time-series-pocket-reference/getting-time-series-datasets/datasets/AirPassengers.csv", parse_dates = True, header = None)
air.columns = ['Date', 'Passengers']

# converting object to datetime
air['Date'] = pd.to_datetime(air['Date'])

air

| | Date | Passengers |
|---|---|---|
| 0 | 1949-01-01 | 112 |
| 1 | 1949-02-01 | 118 |
| 2 | 1949-03-01 | 132 |
| 3 | 1949-04-01 | 129 |
| 4 | 1949-05-01 | 121 |
| ... | ... | ... |
| 139 | 1960-08-01 | 606 |
| 140 | 1960-09-01 | 508 |
| 141 | 1960-10-01 | 461 |
| 142 | 1960-11-01 | 390 |


In [3]:
air.dtypes

Date          datetime64[ns]
Passengers             int64
dtype: object

We can easily smooth the passenger values with exponential decay at several smoothing factors, using the **Pandas** method *ewm()*:

In [4]:
air['Smooth.1'] = air["Passengers"].ewm(alpha=0.1).mean()
air['Smooth.5'] = air["Passengers"].ewm(alpha=0.5).mean()
air['Smooth.9'] = air["Passengers"].ewm(alpha=0.9).mean()

air

| | Date | Passengers | Smooth.1 | Smooth.5 | Smooth.9 |
|---|---|---|---|---|---|
| 0 | 1949-01-01 | 112 | 112.000000 | 112.000000 | 112.000000 |
| 1 | 1949-02-01 | 118 | 115.157895 | 116.000000 | 117.454545 |
| 2 | 1949-03-01 | 132 | 121.372694 | 125.142857 | 130.558559 |
| 3 | 1949-04-01 | 129 | 123.590579 | 127.200000 | 129.155716 |
| 4 | 1949-05-01 | 121 | 122.957974 | 124.000000 | 121.815498 |
| ... | ... | ... | ... | ... | ... |
| 139 | 1960-08-01 | 606 | 468.874660 | 582.096411 | 606.665454 |
| 140 | 1960-09-01 | 508 | 472.787195 | 545.048205 | 517.866545 |
| 141 | 1960-10-01 | 461 | 471.608475 | 503.024103 | 466.686655 |
| 142 | 1960-11-01 | 390 | 463.447626 | 446.512051 | 397.668665 |


As we can see, the *alpha* parameter, also called the *smoothing factor*, controls how strongly each smoothed value is pulled toward the current observation versus retaining information from the existing average. The higher the value of *alpha*, the faster the smoothed value moves toward the current observation.
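
To make the role of *alpha* concrete, the sketch below recomputes the first few smoothed values by hand in plain Python. It assumes the weighting that Pandas applies by default (`adjust=True`), where each output is a weighted average of all points so far, with weights (1 - alpha)^k divided by their sum. The function name `ewm_mean_adjusted` is hypothetical; the inputs are the first five passenger counts shown earlier.

```python
def ewm_mean_adjusted(values, alpha):
    """Exponentially weighted mean with normalized weights (1 - alpha)**k."""
    decay = 1 - alpha
    numerator = 0.0    # running weighted sum of observations
    denominator = 0.0  # running sum of weights
    result = []
    for x in values:
        numerator = x + decay * numerator
        denominator = 1 + decay * denominator
        result.append(numerator / denominator)
    return result


passengers = [112, 118, 132, 129, 121]  # first five rows of the dataset
smooth_01 = ewm_mean_adjusted(passengers, alpha=0.1)
smooth_05 = ewm_mean_adjusted(passengers, alpha=0.5)
print([round(v, 6) for v in smooth_01])
print([round(v, 6) for v in smooth_05])
```

Under this assumption the results agree with the `Smooth.1` and `Smooth.5` columns for the same rows: with *alpha* = 0.1 the second value is (118 + 0.9 × 112) / (1 + 0.9) ≈ 115.157895, while with *alpha* = 0.5 it is (118 + 0.5 × 112) / 1.5 = 116.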