# Resampling

`Resampling` is a technique in time series analysis that changes the frequency of the observations. It is often used to convert data to a different frequency (e.g., from daily to monthly) so that patterns or trends become easier to see.

There are two types of resampling:
   * Upsampling
   * Downsampling

## Upsampling

Upsampling increases the frequency of the data. It is a disaggregation procedure: observations recorded at a coarser interval are broken down onto a finer one.

For example, monthly data can be broken down into days, daily data into hours, or hourly data into seconds. Upsampling increases the number of rows, by a factor that depends on the target frequency.
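To make the size increase concrete, here is a minimal sketch that upsamples five month-end values to daily frequency. The values are the ones shown in the sample output of this section; building the series by hand (instead of reading the CSV file) and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Five month-end observations, typed in by hand for illustration.
monthly = pd.Series(
    [200, 300, 250, 450, 325],
    index=pd.to_datetime(
        ["2021-01-31", "2021-02-28", "2021-03-31", "2021-04-30", "2021-05-31"]
    ),
    name="Sales",
)

# Upsample to daily frequency. asfreq() only reindexes onto the daily grid,
# so every newly created day holds NaN.
daily = monthly.resample("D").asfreq()

print(len(monthly))  # 5 rows before upsampling
print(len(daily))    # 121 rows after: every day from 2021-01-31 to 2021-05-31
```

Going from months to days multiplies the row count by roughly the number of days per month; a finer target frequency would grow the data further.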

In [1]:
# import the python pandas library 
import pandas as pd 

# read data using read_csv 
data = pd.read_csv("Detergent sales data.csv", header=0, 
                index_col=0, parse_dates=True) 

# If you need to convert a single column DataFrame to a Series, you can do so explicitly:
if data.shape[1] == 1:
    data = data.squeeze()

data.head()

Months
1/31/2021    200
2/28/2021    300
3/31/2021    250
4/30/2021    450
5/31/2021    325
Name: Sales, dtype: int64

In [4]:
# Ensure the index is a DatetimeIndex
data.index = pd.to_datetime(data.index, errors = 'coerce')

# Upsample from monthly to daily frequency. Each daily bin holds at most one
# original observation, so mean() returns that value, and NaN for every other day.
upsampled = data.resample('D').mean() 
upsampled

Months
2019-09-11    595.0
2019-09-12    610.0
2019-09-13    625.0
2019-09-14    640.0
2019-09-15    655.0
              ...  
2021-12-26      NaN
2021-12-27      NaN
2021-12-28      NaN
2021-12-29      NaN
2021-12-30    520.0
Name: Sales, Length: 842, dtype: float64

The upsampled series contains NaN for every day except the days that were present in the original dataset.
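This behaviour can be checked on a small example. The sketch below uses three hand-typed month-end values (an assumption for illustration, not the contents of the CSV file) and counts how many daily slots are filled after upsampling.

```python
import pandas as pd

# Three month-end observations, typed in by hand for illustration.
monthly = pd.Series(
    [200, 300, 250],
    index=pd.to_datetime(["2021-01-31", "2021-02-28", "2021-03-31"]),
    name="Sales",
)

# Same call as in the cell above: daily bins, mean of each bin.
upsampled = monthly.resample("D").mean()

n_filled = int(upsampled.notna().sum())   # days that carry an original value
n_missing = int(upsampled.isna().sum())   # days created by upsampling
print(n_filled, n_missing)                # 3 57

# Dropping the NaN rows gives back exactly the original dates and values.
recovered = upsampled.dropna()
print(recovered.index.equals(monthly.index))  # True
```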

Now, we can fill these NaN values using a technique called interpolation. pandas provides the `interpolate()` method for this purpose, available on both `Series` and `DataFrame`.

``Interpolation`` fills the NaN values using one of several methods, such as 'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline', 'barycentric', or 'polynomial'. We will choose 'linear' interpolation. This draws a straight line between consecutive available observations (in this case, the values on the last day of each month) and fills in values at the chosen frequency along that line.

In [6]:
# Fill the NaN values introduced by upsampling using linear interpolation
interpolated = upsampled.interpolate(method='linear') 

# Print the linearly interpolated values for February 2021
print(interpolated.loc['2021-02'])

Months
2021-02-01    310.000000
2021-02-02    309.629630
2021-02-03    309.259259
2021-02-04    308.888889
2021-02-05    308.518519
2021-02-06    308.148148
2021-02-07    307.777778
2021-02-08    307.407407
2021-02-09    307.037037
2021-02-10    306.666667
2021-02-11    306.296296
2021-02-12    305.925926
2021-02-13    305.555556
2021-02-14    305.185185
2021-02-15    304.814815
2021-02-16    304.444444
2021-02-17    304.074074
2021-02-18    303.703704
2021-02-19    303.333333
2021-02-20    302.962963
2021-02-21    302.592593
2021-02-22    302.222222
2021-02-23    301.851852
2021-02-24    301.481481
2021-02-25    301.111111
2021-02-26    300.740741
2021-02-27    300.370370
2021-02-28    300.000000
Name: Sales, dtype: float64
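The constant step between consecutive days in this output is what linear interpolation produces on a regular daily grid. A minimal sketch with two hypothetical observations four days apart makes the arithmetic easy to verify by hand: the gap of 40 is split into four equal steps of 10.

```python
import pandas as pd

# Two known observations four days apart (hypothetical values).
known = pd.Series(
    [100.0, 140.0],
    index=pd.to_datetime(["2021-01-01", "2021-01-05"]),
)

# Upsample to daily frequency, then fill the gaps along a straight line.
daily = known.resample("D").asfreq()
filled = daily.interpolate(method="linear")

print(filled.tolist())  # [100.0, 110.0, 120.0, 130.0, 140.0]
```

Note that `method='linear'` treats the values as equally spaced and ignores the index, which is harmless here because the daily index is regular.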


## Upsampling with a polynomial interpolation
Another common interpolation method is to connect the values with a polynomial or a spline. This produces a smooth curve instead of straight segments, which can look more realistic on many datasets. 

In [10]:
# Use interpolate() with method='polynomial' and order=2 (this method requires SciPy).
# This fills the remaining days using a degree-2 (quadratic) polynomial.
interpolated = upsampled.interpolate(method='polynomial', order=2) 

# Display the polynomial-interpolated values
interpolated

Months
2019-09-11    595.000000
2019-09-12    610.000000
2019-09-13    625.000000
2019-09-14    640.000000
2019-09-15    655.000000
                 ...    
2021-12-26    575.019038
2021-12-27    562.873470
2021-12-28    549.655108
2021-12-29    535.363951
2021-12-30    520.000000
Name: Sales, Length: 842, dtype: float64

## Downsampling

Downsampling decreases the frequency of the data. It is an aggregation procedure: observations recorded at a finer interval are summarized at a coarser one. For example, daily data can be summarized by month, hourly data by day, or per-second data by hour. Downsampling reduces the number of rows, by a factor that depends on the target frequency.

For example, suppose a car sales dataset records daily sales values for the first six months, and the task is to predict quarterly sales. Going from daily observations to quarterly values calls for downsampling. 
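Before turning to the car sales file, the idea can be sketched on a small synthetic series. The data below (the values 1 to 90 on consecutive days) and the variable names are assumptions for illustration only. Each month is summarized by its mean and by its total.

```python
import pandas as pd

# Hypothetical daily series: the values 1..90 on 2021-01-01 through 2021-03-31.
days = pd.date_range("2021-01-01", "2021-03-31", freq="D")
daily = pd.Series(range(1, len(days) + 1), index=days, name="Sales")

# "MS" labels each monthly bin by its first day. It is used here because
# this alias is spelled the same way across pandas versions.
monthly_mean = daily.resample("MS").mean()
monthly_total = daily.resample("MS").sum()

print(len(daily), len(monthly_mean))  # 90 3
print(monthly_mean.tolist())          # [16.0, 45.5, 75.0]
print(monthly_total.tolist())         # [496, 1274, 2325]
```

Whether to aggregate with a mean, a sum, or another statistic depends on what the downsampled value should represent; for sales figures, a total per period is often as relevant as an average.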

In [12]:
# import the python pandas library 
import pandas as pd 

# read the data using pandas read_csv() function. 
data = pd.read_csv("car-sales.csv", header=0, 
                index_col=0, parse_dates=True) 

# If you need to convert a single column DataFrame to a Series, you can do so explicitly:
if data.shape[1] == 1:
    data = data.squeeze()
    
# printing the first 6 rows of the dataset 
print(data.head(6)) 

Months
2021-01-01    210.000000
2021-01-02    209.666667
2021-01-03    209.333333
2021-01-04    209.000000
2021-01-05    208.666667
2021-01-06    208.333333
Name: Sales, dtype: float64


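We can use the quarterly resampling frequency 'Q' to aggregate the data by quarter. In pandas 2.2 and later, this alias is deprecated in favor of 'QE' (quarter end), so 'Q' may emit a warning there.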

In [13]:
# Use resample() to downsample from daily to quarterly frequency,
# using the mean sales of each quarter. 
downsampled = data.resample('Q').mean() 

# printing the downsampled data. 
print(downsampled) 

Months
2018-03-31    2088.000000
2018-06-30    2269.000000
2018-09-30    2452.000000
2018-12-31    2636.000000
2019-03-31    2818.000000
2019-06-30    2999.000000
2019-09-30    3182.000000
2019-12-31    3366.000000
2020-03-31    1357.000000
2020-06-30    1539.000000
2020-09-30    1722.000000
2020-12-31    1906.000000
2021-03-31     253.333333
2021-06-30     428.901099
2021-09-30     988.152174
2021-12-31    1144.810127
Name: Sales, dtype: float64
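As an alternative that does not depend on how the quarterly offset alias is spelled, the same kind of aggregation can be sketched by grouping on calendar-quarter periods. The synthetic daily data below is an assumption for illustration, not the contents of `car-sales.csv`.

```python
import pandas as pd

# Hypothetical daily series: the values 1..181 on 2021-01-01 through 2021-06-30.
days = pd.date_range("2021-01-01", "2021-06-30", freq="D")
daily = pd.Series(range(1, len(days) + 1), index=days, name="Sales")

# Map each timestamp to its calendar quarter, then average within each quarter.
quarterly = daily.groupby(daily.index.to_period("Q")).mean()

print(quarterly)
# The result has two rows, labelled 2021Q1 and 2021Q2,
# with means 45.5 (90 days) and 136.0 (91 days).
```

The result is indexed by quarter periods instead of quarter-end timestamps, which can be convenient when the label should refer to the whole quarter.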


Now, this downsampled data can be used for predicting quarterly sales.