# Notebook Instructions

1. If you are new to Jupyter notebooks, please go through this introductory manual <a href='https://quantra.quantinsti.com/quantra-notebook' target="_blank">here</a>.
1. Any changes made in this notebook will be lost after you close the browser window. **You can download the notebook to save your work on your PC.**
1. Before running this notebook on your local PC:<br>
i. You need to set up a Python environment and the relevant packages on your local PC. To do so, go through the section on "**Run Codes Locally on Your Machine**" in the course.<br>
ii. You need to **download the zip file available in the last unit** of this course. The zip file contains the data files and/or Python modules that might be required to run this notebook.

## Resampling Data

The data we get from the source is mostly at 1-minute resolution. However, many analyses need 15-minute, hourly, daily, or even weekly bars, and these can be derived from the higher-resolution data. In this notebook, we will resample the 1-minute data into 5-minute data.

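**Note - Meaningful OHLC bars can only be built by resampling from a higher frequency to a lower frequency (downsampling), not vice versa. Upsampling cannot recover the price movements that happened inside a bar.**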

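pandas will in fact let you upsample, but the new rows carry no information. The minimal sketch below uses a few hypothetical 5-minute closes (illustrative values, not AAPL data) to contrast downsampling with upsampling via `asfreq` and `ffill`.

```python
import pandas as pd

# Hypothetical 5-minute close prices (illustrative values, not real AAPL data)
idx = pd.date_range("2012-01-03 09:30", periods=3, freq="5min")
close_5min = pd.Series([100.0, 101.0, 102.0], index=idx, name="Close")

# Downsampling 5-minute -> 10-minute: each output value comes from real observations
close_10min = close_5min.resample("10min").last()

# Upsampling 5-minute -> 1-minute: pandas creates the new timestamps,
# but there is no data for them, so they are NaN
close_1min = close_5min.resample("1min").asfreq()

# Forward-filling only repeats the last known price; it does not recover
# what actually happened inside each 5-minute bar
close_1min_filled = close_1min.ffill()

print(close_10min)
print(close_1min.isna().sum())
```

The forward-filled series looks complete, but every filled value is a copy, not an observation, which is why the note above restricts OHLC resampling to the downsampling direction.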
The notebook is structured as follows:
1. [Import the Data](#import)
2. [Resample Data](#resample)
3. [Additional Parameters in Resample Function](#params)

## Import Libraries

In [1]:
# For data manipulation
import pandas as pd

<a id='import'></a>
## Import the Data

Import the file `AAPL_1_minute_2012_2021.csv` using the `read_csv` method of `pandas`. This file contains the 1-minute resolution OHLC (open, high, low, close) values for Apple stock.

This CSV file is available in the zip file of the unit 'Python Codes and Data' in the 'Course Summary' section.

In [2]:
# Price data of Apple stock
data = pd.read_csv('../data_modules/AAPL_1_minute_2012_2021.csv', index_col=0)

# Change index to datetime
data.index = pd.to_datetime(data.index)
data.head()

| | Open | High | Low | Close |
|---|---|---|---|---|
| 2012-01-03 09:30:00+00:00 | 14.617857 | 14.641071 | 14.614286 | 14.614286 |
| 2012-01-03 09:31:00+00:00 | 14.612143 | 14.625 | 14.607143 | 14.611071 |
| 2012-01-03 09:32:00+00:00 | 14.611071 | 14.630357 | 14.608214 | 14.623929 |
| 2012-01-03 09:33:00+00:00 | 14.626071 | 14.642857 | 14.623929 | 14.642857 |
| 2012-01-03 09:34:00+00:00 | 14.642857 | 14.642857 | 14.630714 | 14.635357 |

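As an alternative to converting the index in a separate step, `read_csv` can parse it while reading via `parse_dates=True`. This sketch assumes the CSV has the same layout as the output above (datetime in the first column); an in-memory string holding three of the rows shown stands in for the real file.

```python
import io

import pandas as pd

# A few rows copied from the output above, used as an in-memory stand-in for the CSV file
csv_text = """,Open,High,Low,Close
2012-01-03 09:30:00+00:00,14.617857,14.641071,14.614286,14.614286
2012-01-03 09:31:00+00:00,14.612143,14.625,14.607143,14.611071
2012-01-03 09:32:00+00:00,14.611071,14.630357,14.608214,14.623929
"""

# index_col=0 uses the first column as the index; parse_dates=True converts it to datetimes
sample = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(sample.index.dtype)
```

Either approach works; the important point is that `resample` requires a datetime-like index.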

<a id='resample'></a>
## Resample Data

The resampling process involves three major steps:

**Specify a resample mapping:** The resample mapping tells pandas which columns to aggregate and which aggregation function to apply to each one. Our price data consists of OHLC values.

For example, to convert five 1-minute candles into one 5-minute candle, we use the mapping below. The smallest of the five 1-minute lows becomes the new low, and the largest of the five 1-minute highs becomes the new high.

- `"Open": "first"`: the first open in the sub-period becomes the new open.
- `"High": "max"`: the largest high in the sub-period becomes the new high.
- `"Low": "min"`: the smallest low in the sub-period becomes the new low.
- `"Close": "last"`: the last close in the sub-period becomes the new close.

![title](https://d2a032ejo53cab.cloudfront.net/Course/Units/Mcq/UnitContent/KvIHEVL9/candleresample.png)
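To make the mapping concrete, the sketch below applies each rule by hand to the first five 1-minute candles shown earlier. The result (Open 14.617857, High 14.642857, Low 14.607143, Close 14.635357) should match the 09:30 row of the 5-minute output later in this notebook.

```python
import pandas as pd

# The first five 1-minute candles from the data shown earlier
idx = pd.date_range("2012-01-03 09:30", periods=5, freq="1min", tz="UTC")
candles = pd.DataFrame(
    {
        "Open": [14.617857, 14.612143, 14.611071, 14.626071, 14.642857],
        "High": [14.641071, 14.625, 14.630357, 14.642857, 14.642857],
        "Low": [14.614286, 14.607143, 14.608214, 14.623929, 14.630714],
        "Close": [14.614286, 14.611071, 14.623929, 14.642857, 14.635357],
    },
    index=idx,
)

# Build the 5-minute candle by hand, one rule per column
five_min_candle = {
    "Open": candles["Open"].iloc[0],     # first open
    "High": candles["High"].max(),       # highest high
    "Low": candles["Low"].min(),         # lowest low
    "Close": candles["Close"].iloc[-1],  # last close
}
print(five_min_candle)
```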

In [3]:
# Specify price mapping
price_mapping = {
    "Open": "first",
    "High": "max",
    "Low": "min",
    "Close": "last",
}

**Specify a resample interval:** the resolution of the final resampled data. Commonly used pandas offset aliases are: minute: `min`, hour: `H`, daily: `D`, weekly: `W`, monthly: `M` (pandas 2.2 and later prefer `h` for hourly and `ME` for month-end). Note that `m` is not the minute alias. Resampling creates a bin for every interval in the date range, including nights, weekends, and holidays when the market is closed. These empty bins are filled with NaN, so we use the `dropna` method to remove them.

To learn more about the parameters of the `resample` method, refer to the pandas documentation:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html
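The sketch below shows why `dropna` is needed. It uses hypothetical 1-minute bars (illustrative prices) on Friday 2012-01-06 and Monday 2012-01-09, resampled to daily bars: pandas creates rows for Saturday and Sunday too, and the `first`/`max`/`min`/`last` aggregations leave them as NaN.

```python
import pandas as pd

price_mapping = {"Open": "first", "High": "max", "Low": "min", "Close": "last"}

# Hypothetical 1-minute bars on a Friday and the following Monday (illustrative prices)
idx = pd.to_datetime([
    "2012-01-06 15:58", "2012-01-06 15:59",
    "2012-01-09 09:30", "2012-01-09 09:31",
])
bars = pd.DataFrame(
    {
        "Open": [10.0, 10.2, 10.5, 10.4],
        "High": [10.3, 10.4, 10.6, 10.7],
        "Low": [9.9, 10.1, 10.3, 10.4],
        "Close": [10.2, 10.3, 10.4, 10.6],
    },
    index=idx,
)

# Daily resampling creates a row for every calendar day in the range,
# so Saturday and Sunday appear with NaN in every column
daily_raw = bars.resample("D").agg(price_mapping)
print(daily_raw)

# Drop the empty (non-trading) days
daily = daily_raw.dropna()
print(daily)
```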

In [4]:
# Specify resample interval
interval_5min = "5min"

**Resample the data:** call `resample` with the interval, then pass the price mapping to the `agg` method, which applies each column's aggregation function within every 5-minute bin.

In [5]:
# Resample data
data_5min = data.resample(interval_5min).agg(price_mapping)

# Drop NaN values
data_5min.dropna(inplace=True)

# Display resampled data
data_5min.head()

| | Open | High | Low | Close |
|---|---|---|---|---|
| 2012-01-03 09:30:00+00:00 | 14.617857 | 14.642857 | 14.607143 | 14.635357 |
| 2012-01-03 09:35:00+00:00 | 14.635357 | 14.696429 | 14.632857 | 14.688929 |
| 2012-01-03 09:40:00+00:00 | 14.689286 | 14.702857 | 14.665357 | 14.673571 |
| 2012-01-03 09:45:00+00:00 | 14.672857 | 14.695357 | 14.668214 | 14.688571 |
| 2012-01-03 09:50:00+00:00 | 14.688571 | 14.696429 | 14.673214 | 14.678571 |

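When the source has only one price per timestamp (for example, closes or trade prices) instead of full OHLC columns, pandas' `ohlc()` method on a resampled Series builds the candles directly. The sketch below uses hypothetical 1-minute closes. For data that already has OHLC columns, the mapping above remains the right tool, because the maximum of the closes is generally not the true high.

```python
import pandas as pd

# Hypothetical 1-minute close prices (illustrative values)
idx = pd.date_range("2012-01-03 09:30", periods=10, freq="1min")
closes = pd.Series(
    [10.0, 10.4, 10.1, 10.3, 10.2, 10.5, 10.9, 10.6, 10.7, 10.8],
    index=idx,
    name="Close",
)

# ohlc() takes the first, max, min and last value of each 5-minute bin of a single series
candles = closes.resample("5min").ohlc()
print(candles)
```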

<a id='params'></a>
## Additional Parameters in Resample Function

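**`closed`:** specifies which edge of each bin is included in the interval; the other edge is excluded. For 5-minute bins, `closed='left'` (the default) groups 09:30 to 09:34 into one bin, that is [09:30, 09:35), while `closed='right'` groups 09:31 to 09:35, that is (09:30, 09:35]. In the cells below, `sum` is used only to make it easy to see which 1-minute bars land in each bin; summing prices does not produce meaningful OHLC values.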

In [6]:
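# Resample with closed='right'. sum() only shows which bars fall in each bin;
# min_count=1 makes bins with no bars NaN instead of 0, so dropna can remove them
data_closed_right = data.resample('5min', closed='right').sum(min_count=1)

# Resample with closed='left'
data_closed_left = data.resample('5min', closed='left').sum(min_count=1)

# Drop NaN values (bins with no trading)
data_closed_right.dropna(inplace=True)
data_closed_left.dropna(inplace=True)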

In [7]:
# Display resampled data with closed=right
data_closed_right.head()

| | Open | High | Low | Close |
|---|---|---|---|---|
| 2012-01-03 09:25:00+00:00 | 14.617857 | 14.641071 | 14.614286 | 14.614286 |
| 2012-01-03 09:30:00+00:00 | 73.1275 | 73.201071 | 73.102857 | 73.167143 |
| 2012-01-03 09:35:00+00:00 | 73.406429 | 73.473571 | 73.386786 | 73.446786 |
| 2012-01-03 09:40:00+00:00 | 73.430714 | 73.438929 | 73.377143 | 73.406429 |
| 2012-01-03 09:45:00+00:00 | 73.428929 | 73.463929 | 73.394286 | 73.446429 |


In [8]:
# Display resampled data with closed=left
data_closed_left.head()

| | Open | High | Low | Close |
|---|---|---|---|---|
| 2012-01-03 09:30:00+00:00 | 73.11 | 73.182143 | 73.084286 | 73.1275 |
| 2012-01-03 09:35:00+00:00 | 73.3525 | 73.430714 | 73.334286 | 73.403214 |
| 2012-01-03 09:40:00+00:00 | 73.447143 | 73.465714 | 73.394286 | 73.433929 |
| 2012-01-03 09:45:00+00:00 | 73.413214 | 73.443571 | 73.379643 | 73.423571 |
| 2012-01-03 09:50:00+00:00 | 73.440714 | 73.454643 | 73.407143 | 73.428214 |

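The bin membership can be checked without real prices. This sketch gives ten hypothetical 1-minute timestamps (09:30 to 09:39) a value of 1 each and counts how many land in each 5-minute bin. With `closed='right'` and the default left label, the 09:30 bar ends up alone in a bin labelled 09:25, just like the first row of the `closed=right` output above.

```python
import pandas as pd

# Ten hypothetical 1-minute timestamps from 09:30 to 09:39; each bar gets a value of 1
idx = pd.date_range("2012-01-03 09:30", periods=10, freq="1min")
ones = pd.Series(1, index=idx)

# count() shows how many 1-minute bars fall into each 5-minute bin
left_counts = ones.resample("5min", closed="left").count()
right_counts = ones.resample("5min", closed="right").count()

print(left_counts)   # bins [09:30, 09:35) and [09:35, 09:40)
print(right_counts)  # bins (09:25, 09:30], (09:30, 09:35] and (09:35, 09:40]
```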

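**`label`:** specifies which bin edge is used as the timestamp of each aggregated row. It does not change which bars are grouped together; only the index labels shift. For 5-minute bins, the default is `label='left'`.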

In [9]:
# Resample with label='right': each bin is stamped with its right (end) edge
data_label_right = data.resample('5min', label='right').agg(price_mapping)

# Resample with label='left': each bin is stamped with its left (start) edge
data_label_left = data.resample('5min', label='left').agg(price_mapping)

# Drop NaN values (bins with no trading)
data_label_right.dropna(inplace=True)
data_label_left.dropna(inplace=True)

In [10]:
# Display resampled data with label='right'
data_label_right.head()

| | Open | High | Low | Close |
|---|---|---|---|---|
| 2012-01-03 09:35:00+00:00 | 14.617857 | 14.642857 | 14.607143 | 14.635357 |
| 2012-01-03 09:40:00+00:00 | 14.635357 | 14.696429 | 14.632857 | 14.688929 |
| 2012-01-03 09:45:00+00:00 | 14.689286 | 14.702857 | 14.665357 | 14.673571 |
| 2012-01-03 09:50:00+00:00 | 14.672857 | 14.695357 | 14.668214 | 14.688571 |
| 2012-01-03 09:55:00+00:00 | 14.688571 | 14.696429 | 14.673214 | 14.678571 |


In [11]:
# Display resampled data with label='left'
data_label_left.head()

| | Open | High | Low | Close |
|---|---|---|---|---|
| 2012-01-03 09:30:00+00:00 | 14.617857 | 14.642857 | 14.607143 | 14.635357 |
| 2012-01-03 09:35:00+00:00 | 14.635357 | 14.696429 | 14.632857 | 14.688929 |
| 2012-01-03 09:40:00+00:00 | 14.689286 | 14.702857 | 14.665357 | 14.673571 |
| 2012-01-03 09:45:00+00:00 | 14.672857 | 14.695357 | 14.668214 | 14.688571 |
| 2012-01-03 09:50:00+00:00 | 14.688571 | 14.696429 | 14.673214 | 14.678571 |

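The two outputs above contain identical values and differ only in their timestamps. The sketch below confirms this on hypothetical data (the values 0 to 9 on ten 1-minute timestamps): the last value in each bin is the same for both labels, and every index entry is shifted by exactly one 5-minute interval.

```python
import pandas as pd

# Ten hypothetical 1-minute values (0.0 to 9.0) starting at 09:30
idx = pd.date_range("2012-01-03 09:30", periods=10, freq="1min")
closes = pd.Series(range(10), index=idx, dtype=float)

# Same bins ([09:30, 09:35) and [09:35, 09:40)), different timestamps on the output
start_labels = closes.resample("5min", label="left").last()
end_labels = closes.resample("5min", label="right").last()

# The aggregated values are identical; only the index differs by one interval
print((start_labels.values == end_labels.values).all())
print(end_labels.index - start_labels.index)
```

Labelling each bar by its end time (`label='right'`) is a common convention when a bar should only be "known" once it has closed, which can help avoid look-ahead when joining bars with other data.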

In [12]:
# Try it yourself (valid resample intervals include minute: 'min', hour: 'H', daily: 'D', weekly: 'W', monthly: 'M'; pandas 2.2+ prefers 'h' and 'ME')
data_day = data.resample(______).agg(price_mapping)
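If you plan to try several intervals, it can help to wrap the resample, aggregate, and drop steps in one function. `resample_ohlc` below is a hypothetical helper (not part of pandas or this course), shown on two hours of hypothetical 1-minute bars starting at 09:30 and resampled to 60-minute bars. Because bins are anchored at midnight by default, the first hourly bin starts at 09:00 and contains only 30 minutes of data.

```python
import numpy as np
import pandas as pd


def resample_ohlc(df: pd.DataFrame, rule: str) -> pd.DataFrame:
    """Hypothetical helper: resample OHLC bars to `rule` and drop empty bins."""
    mapping = {"Open": "first", "High": "max", "Low": "min", "Close": "last"}
    return df.resample(rule).agg(mapping).dropna()


# Two hours of hypothetical 1-minute bars starting at 09:30 (illustrative prices)
idx = pd.date_range("2012-01-03 09:30", periods=120, freq="1min")
price = pd.Series(np.linspace(100.0, 101.19, 120), index=idx)
bars = pd.DataFrame(
    {"Open": price, "High": price + 0.05, "Low": price - 0.05, "Close": price}
)

hourly = resample_ohlc(bars, "60min")
print(hourly)
```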

## Conclusion
In this notebook, you learned to resample data from a higher frequency to a lower frequency. Try doing the same yourself for different time intervals.