# Intraday Trading via Day Trading Techniques & Indicators
---

### Data collected via the AlphaVantage free API, using its extended intraday data.
> https://www.alphavantage.co/documentation/

---

# 01 - Feature Engineering

### Library Imports

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# Read in Combined Dataset

In [2]:
# Parse the 'time' column as datetimes and use it as the index in one step.
df = pd.read_csv(
    '../01_Data/extended_intraday_SPY_1min_combined.csv',
    parse_dates=['time'],
    index_col='time',
)
df.head()

time,open,high,low,close,volume
2019-10-28 04:01:00,292.170418,292.170418,292.170418,292.170418,581
2019-10-28 04:02:00,292.151078,292.151078,292.141408,292.141408,300
2019-10-28 04:05:00,292.238108,292.238108,292.238108,292.238108,300
2019-10-28 04:06:00,292.257448,292.257448,292.228438,292.228438,300
2019-10-28 04:07:00,292.286458,292.286458,292.286458,292.286458,100


In [3]:
df.shape

(411267, 5)

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 411267 entries, 2019-10-28 04:01:00 to 2021-10-15 20:00:00
Data columns (total 5 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   open    411267 non-null  float64
 1   high    411267 non-null  float64
 2   low     411267 non-null  float64
 3   close   411267 non-null  float64
 4   volume  411267 non-null  int64  
dtypes: float64(4), int64(1)
memory usage: 18.8 MB


In [5]:
df.describe()

,open,high,low,close,volume
count,411267.0,411267.0,411267.0,411267.0,411267.0
mean,351.810012,351.880121,351.740932,351.809786,101843.4
std,57.098483,57.074556,57.120553,57.098499,245337.2
min,213.565721,213.824423,213.394232,213.570219,0.0
25%,307.082194,307.141116,307.013921,307.078073,1943.0
50%,341.525,341.59,341.455966,341.525,27570.0
75%,407.774123,407.823797,407.704579,407.764188,108822.0
max,452.615596,453.064145,452.615596,452.615596,23934400.0


**The data appears to have been imported correctly, with no errors or null values.**

# Initial Feature Engineering
---
**Beyond price and volume, day traders tend to rely on a similar set of indicators. The most commonly used are:**
- Volume Weighted Average Price (VWAP)
- VWAP 1st, 2nd, and 3rd standard deviations
- 9 and 20 Exponential Moving Averages (EMA)
- Distance between the 9 and 20 EMA
- Support and Resistance levels
  - These are more difficult to calculate dynamically, and setting them is more of an art than a hard science.
    - Some simple examples are today's open, yesterday's close, yesterday's high and low, and today's high and low.
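
A few of the simple levels listed above can be derived directly from minute bars. The sketch below is illustrative only: the toy prices, the `bars` frame, and the column names (`today_open`, `today_high`, `today_low`, `prev_high`, `prev_low`, `prev_close`) are assumptions and are not part of this notebook's dataset. Intraday levels are computed cumulatively so that each row only uses information available up to that bar.

```python
import pandas as pd

# Toy 1-minute bars spanning two sessions (values are made up for illustration).
idx = pd.to_datetime([
    '2021-01-04 09:30', '2021-01-04 09:31', '2021-01-04 09:32',
    '2021-01-05 09:30', '2021-01-05 09:31', '2021-01-05 09:32',
])
bars = pd.DataFrame({
    'open':  [10.0, 10.2, 10.1, 10.5, 10.6, 10.4],
    'high':  [10.3, 10.4, 10.2, 10.7, 10.8, 10.5],
    'low':   [9.9, 10.1, 10.0, 10.4, 10.3, 10.2],
    'close': [10.2, 10.1, 10.1, 10.6, 10.4, 10.3],
}, index=idx)

day = bars.index.normalize()  # one label per calendar date

# Intraday levels: first open of the day, running high, running low.
bars['today_open'] = bars.groupby(day)['open'].transform('first')
bars['today_high'] = bars.groupby(day)['high'].cummax()
bars['today_low'] = bars.groupby(day)['low'].cummin()

# Previous-session levels: aggregate per day, shift by one day, map back to each bar.
daily = bars.groupby(day).agg(
    prev_high=('high', 'max'),
    prev_low=('low', 'min'),
    prev_close=('close', 'last'),
).shift(1)
for col in daily.columns:
    bars[col] = daily[col].reindex(day).to_numpy()

print(bars[['today_open', 'today_high', 'today_low', 'prev_high', 'prev_low', 'prev_close']])
```

The first session has no previous day, so its `prev_*` values are NaN. As with the other indicators, the distance of the close from each level could then be used as a feature.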

**For our moving averages, at this point, we believe the best way to capture the relationship between price and each moving average is the distance between the close price and that moving average.**

## Volume Weighted Average Price

In [6]:
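# Adapted from https://stackoverflow.com/questions/44854512/how-to-calculate-vwap-volume-weighted-average-price-using-groupby-and-apply
# All credit for this cell to piRSquared's answer, Option 1.
# VWAP here = cumulative sum of (close * volume) / cumulative sum of volume, restarted on each calendar date.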
df = df.assign(
    vwap=df.eval(
        'wgtd = close * volume', inplace=False
    ).groupby(df.index.date).cumsum().eval('wgtd / volume')
)
df['vwap_Distance'] = df['close'] - df['vwap']
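
To make the calculation in the cell above easier to follow, here is a minimal sketch of the same idea on toy data. The `toy` frame and its values are made up for illustration. Like the cell above, it uses the close as the price and restarts the cumulative sums on each calendar date.

```python
import pandas as pd

# Two bars on each of two dates (made-up values).
idx = pd.to_datetime([
    '2021-01-04 09:30', '2021-01-04 09:31',
    '2021-01-05 09:30', '2021-01-05 09:31',
])
toy = pd.DataFrame(
    {'close': [10.0, 12.0, 20.0, 30.0], 'volume': [100, 300, 50, 150]},
    index=idx,
)

day = toy.index.date  # grouping key: calendar date of each bar

# Running sums of price * volume and of volume, restarted every day.
cum_pv = (toy['close'] * toy['volume']).groupby(day).cumsum()
cum_vol = toy['volume'].groupby(day).cumsum()

toy['vwap'] = cum_pv / cum_vol
print(toy)
```

On the second bar of the first day, the VWAP is (10 * 100 + 12 * 300) / (100 + 300) = 11.5. If a day's cumulative volume is still zero at some bar, the division is 0 / 0 and yields NaN, which may matter here because the minimum volume in this dataset is 0.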

## Volume Weighted Average Price - Standard Deviations 1, 2, and 3

In [7]:
# Rolling 12-period standard deviation of the VWAP, computed once and reused for every band.
vwap_std = df['vwap'].rolling(12).std()

# Bands 1, 2, and 3 standard deviations above and then below the VWAP,
# each followed by the distance of the current close from that band.
for side, sign in (('above', 1), ('below', -1)):
    for k in (1, 2, 3):
        band = f'vwap_{k}std_{side}'
        df[band] = df['vwap'] + sign * k * vwap_std
        df[f'{band}_Distance'] = df['close'] - df[band]
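
One property of the cell above is worth illustrating: the VWAP restarts each day, but the rolling window does not, so the first bars of a session mix in VWAP values from the previous session. The sketch below contrasts the two behaviours on made-up data. The per-day variant is a hypothetical alternative, not what this notebook uses, and the window is shortened from 12 to 3 to keep the example small.

```python
import numpy as np
import pandas as pd

# Made-up VWAP values: last three bars of one day, first three of the next.
idx = pd.to_datetime([
    '2021-01-04 15:58', '2021-01-04 15:59', '2021-01-04 16:00',
    '2021-01-05 09:30', '2021-01-05 09:31', '2021-01-05 09:32',
])
vwap = pd.Series([10.0, 10.1, 10.2, 20.0, 20.1, 20.2], index=idx)
window = 3  # assumption for the toy example; the notebook uses 12

# As in the cell above: one rolling window over the whole series.
std_global = vwap.rolling(window).std()

# Hypothetical alternative: restart the rolling window on each calendar date.
std_by_day = vwap.groupby(vwap.index.date).transform(
    lambda s: s.rolling(window).std()
)

print(pd.DataFrame({'global': std_global, 'by_day': std_by_day}))
```

At the first bar of the second day, the global window spans both sessions and reports a large standard deviation, while the per-day version is NaN until the window fills. Which behaviour is preferable depends on how the bands are meant to be used.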

## 9 and 20 Exponential Moving Averages

In [8]:
df['9_EMA'] = df['close'].ewm(span=9).mean()
df['9_EMA_Distance'] = df['close'] - df['9_EMA']

df['20_EMA'] = df['close'].ewm(span=20).mean()
df['20_EMA_Distance'] = df['close'] - df['20_EMA']
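
For reference, `ewm(span=N)` uses a smoothing factor of alpha = 2 / (N + 1), and with the pandas default `adjust=True` each value is a weighted average of all closes so far, with weights (1 - alpha)^i for the close i bars back. The sketch below recomputes the 9 EMA by hand for the first three closes shown in `df.head()` and compares it with pandas. The helper `ema_adjusted` is a hypothetical name introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# First three close prices from the df.head() output above.
close = pd.Series([292.170418, 292.141408, 292.238108])
span = 9
alpha = 2 / (span + 1)  # 0.2 for a span of 9

def ema_adjusted(values, alpha):
    """Weighted average with weights (1 - alpha)**i, newest observation weighted 1."""
    out = []
    for t in range(len(values)):
        # Oldest observation gets the largest exponent, newest gets exponent 0.
        weights = (1 - alpha) ** np.arange(t, -1, -1)
        out.append(np.dot(weights, values[:t + 1]) / weights.sum())
    return np.array(out)

manual = ema_adjusted(close.to_numpy(), alpha)
pandas_ema = close.ewm(span=span).mean().to_numpy()

print(manual)
print(pandas_ema)
```

Because the data is not split by day before calling `ewm`, the averages in the cell above carry over from one session to the next.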

## Distance between 9 and 20 Exponential Moving Averages

In [9]:
df['EMA_Distance'] = df['9_EMA'] - df['20_EMA']

In [10]:
# Label the sign of the EMA spread: 9 EMA above, below, or exactly equal to the 20 EMA.
df['EMA_cross'] = np.select(
    [df['EMA_Distance'] > 0, df['EMA_Distance'] < 0],
    ['above', 'below'],
    default='crossing',
)

## Check Indicators

In [11]:
df.head()

time,open,high,low,close,volume,vwap,vwap_Distance,vwap_1std_above,vwap_1std_above_Distance,vwap_2std_above,...,vwap_2std_below,vwap_2std_below_Distance,vwap_3std_below,vwap_3std_below_Distance,9_EMA,9_EMA_Distance,20_EMA,20_EMA_Distance,EMA_Distance,EMA_cross
2019-10-28 04:01:00,292.170418,292.170418,292.170418,292.170418,581,292.170418,0.0,,,,...,,,,,292.170418,0.0,292.170418,0.0,0.0,crossing
2019-10-28 04:02:00,292.151078,292.151078,292.141408,292.141408,300,292.160539,-0.019132,,,,...,,,,,292.154301,-0.012893,292.155188,-0.01378,-0.000886,below
2019-10-28 04:05:00,292.238108,292.238108,292.238108,292.238108,300,292.180243,0.057865,,,,...,,,,,292.188648,0.04946,292.185636,0.052473,0.003013,above
2019-10-28 04:06:00,292.257448,292.257448,292.228438,292.228438,300,292.190006,0.038432,,,,...,,,,,292.202127,0.026311,292.197992,0.030446,0.004135,above
2019-10-28 04:07:00,292.286458,292.286458,292.286458,292.286458,100,292.196107,0.090352,,,,...,,,,,292.227214,0.059244,292.219391,0.067067,0.007823,above


In [12]:
df.tail()

time,open,high,low,close,volume,vwap,vwap_Distance,vwap_1std_above,vwap_1std_above_Distance,vwap_2std_above,...,vwap_2std_below,vwap_2std_below_Distance,vwap_3std_below,vwap_3std_below_Distance,9_EMA,9_EMA_Distance,20_EMA,20_EMA_Distance,EMA_Distance,EMA_cross
2021-10-15 19:54:00,445.92,445.94,445.92,445.94,1383,445.383581,0.556419,445.383581,0.556419,445.383581,...,445.383581,0.556419,445.383581,0.556419,445.904598,0.035402,445.899427,0.040573,0.00517,above
2021-10-15 19:56:00,445.88,445.88,445.88,445.88,152,445.383582,0.496418,445.383582,0.496418,445.383582,...,445.383582,0.496418,445.383582,0.496418,445.899678,-0.019678,445.897577,-0.017577,0.002101,above
2021-10-15 19:58:00,445.9,445.9,445.88,445.88,774,445.383588,0.496412,445.383588,0.496412,445.383588,...,445.383588,0.496412,445.383588,0.496412,445.895743,-0.015743,445.895903,-0.015903,-0.000161,below
2021-10-15 19:59:00,445.91,445.91,445.88,445.88,641,445.383593,0.496407,445.383593,0.496407,445.383593,...,445.383593,0.496407,445.383593,0.496407,445.892594,-0.012594,445.894388,-0.014388,-0.001794,below
2021-10-15 20:00:00,445.97,445.97,445.87,445.87,1078,445.383601,0.486399,445.383601,0.486399,445.383601,...,445.383601,0.486399,445.383601,0.486399,445.888075,-0.018075,445.892066,-0.022066,-0.003991,below


In [13]:
df.loc['2021-10-15 12:09:00']

open                            444.91
high                            445.12
low                             444.91
close                          445.115
volume                           83755
vwap                        444.966199
vwap_Distance                 0.148801
vwap_1std_above             444.966681
vwap_1std_above_Distance      0.148319
vwap_2std_above             444.967163
vwap_2std_above_Distance      0.147837
vwap_3std_above             444.967646
vwap_3std_above_Distance      0.147354
vwap_1std_below             444.965716
vwap_1std_below_Distance      0.149284
vwap_2std_below             444.965234
vwap_2std_below_Distance      0.149766
vwap_3std_below             444.964752
vwap_3std_below_Distance      0.150248
9_EMA                       444.960631
9_EMA_Distance                0.154369
20_EMA                      444.943254
20_EMA_Distance               0.171746
EMA_Distance                  0.017378
EMA_cross                        above
Name: 2021-10-15 12:09:00

In [14]:
df['volume'].describe()

count    4.112670e+05
mean     1.018434e+05
std      2.453372e+05
min      0.000000e+00
25%      1.943000e+03
50%      2.757000e+04
75%      1.088220e+05
max      2.393440e+07
Name: volume, dtype: float64

In [15]:
df.index.max()

Timestamp('2021-10-15 20:00:00')

# Create Target Columns

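**We want to build a model that can predict whether the price will go 'up', go 'down', or stay relatively 'flat' from one bar to the next. As computed below, the target is the percent change from the previous bar's close to the current bar's close, so forecasting the next interval requires shifting the target relative to the features. We also want to test a binary classification model, so we create a second target column with only the classes 'up' and 'down'.**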

In [16]:
# Define our target column: the % change of each bar's close relative to the previous bar's close.
df['target'] = ((df['close'] - df['close'].shift(1)) / df['close'].shift(1)) * 100

# Multi-class target: 'up' above +0.01%, 'down' below -0.01%, otherwise 'flat'.
# 'flat' covers all values in [-0.01, 0.01] and the first row, whose target is NaN.
df['target_multi_class'] = np.select(
    [df['target'] > 0.01, df['target'] < -0.01],
    ['up', 'down'],
    default='flat',
)

# Binary target: 'up' for strictly positive changes, 'down' for everything else.
# A catch-all is used instead of an explicit `<= 0` test so that every row gets a label,
# including the first row (NaN fails both `> 0` and `<= 0`).
# Unchanged closes go to 'down' on the belief that, on average, the market goes up,
# which leads to more balanced classes.
df['target_binary_class'] = np.where(df['target'] > 0, 'up', 'down')
df.head()

time,open,high,low,close,volume,vwap,vwap_Distance,vwap_1std_above,vwap_1std_above_Distance,vwap_2std_above,...,vwap_3std_below_Distance,9_EMA,9_EMA_Distance,20_EMA,20_EMA_Distance,EMA_Distance,EMA_cross,target,target_multi_class,target_binary_class
2019-10-28 04:01:00,292.170418,292.170418,292.170418,292.170418,581,292.170418,0.0,,,,...,,292.170418,0.0,292.170418,0.0,0.0,crossing,,flat,down
2019-10-28 04:02:00,292.151078,292.151078,292.141408,292.141408,300,292.160539,-0.019132,,,,...,,292.154301,-0.012893,292.155188,-0.01378,-0.000886,below,-0.009929,flat,down
2019-10-28 04:05:00,292.238108,292.238108,292.238108,292.238108,300,292.180243,0.057865,,,,...,,292.188648,0.04946,292.185636,0.052473,0.003013,above,0.033101,up,up
2019-10-28 04:06:00,292.257448,292.257448,292.228438,292.228438,300,292.190006,0.038432,,,,...,,292.202127,0.026311,292.197992,0.030446,0.004135,above,-0.003309,flat,down
2019-10-28 04:07:00,292.286458,292.286458,292.286458,292.286458,100,292.196107,0.090352,,,,...,,292.227214,0.059244,292.219391,0.067067,0.007823,above,0.019854,up,up
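
In the table above, each row's `target` describes the move into that bar, and the indicator columns on the same row already include that bar's close. If the intent is to forecast the following bar, one possible approach is to shift the target back by one row so that each row's features are paired with the next bar's move. The sketch below shows this on made-up prices. The names `next_ret_pct` and `label` are hypothetical, and the 0.01% threshold is reused from the cell above.

```python
import numpy as np
import pandas as pd

# Made-up close prices for five consecutive bars.
close = pd.Series([100.0, 100.05, 100.05, 99.95, 99.95])

# Percent change into each bar (same formula as the notebook's target).
ret_pct = (close - close.shift(1)) / close.shift(1) * 100

# Shift back one row: row t now holds the change from bar t to bar t + 1.
next_ret_pct = ret_pct.shift(-1)

# Same thresholds as the multi-class target: above +0.01% up, below -0.01% down.
label = np.select(
    [next_ret_pct > 0.01, next_ret_pct < -0.01],
    ['up', 'down'],
    default='flat',
)

labelled = pd.DataFrame({'close': close, 'next_ret_pct': next_ret_pct, 'label': label})
# The last row has no following bar, so its target is NaN; drop it before training.
labelled = labelled.dropna(subset=['next_ret_pct'])
print(labelled)
```

Note that in this dataset consecutive rows are not always one minute apart (for example 04:02 is followed by 04:05), and the first bar of each day follows the last bar of the previous day, so a row-based shift treats those gaps as a single step.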


# Save Feature Engineered Dataset

In [17]:
df.to_csv('../01_Data/extended_intraday_SPY_1min_featured.csv')