# Double Tap Feature Engineering

Hypothesis - consistent financial gains can be made by placing trades after two opposing 30minute candles create a new resistance level. The trade is placed in the direction of the confirming second candle which must close in the opposite direction of the previous 30 minute candle. 

In this notebook we will create our detailed dataframe that will form our training and test data for future modelling tasks. We will be creating new features from the base datasets that may correlate with failed or successful trades.

In [1301]:
#Import our base libraries for feature engineering
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt

In [1302]:
import warnings
warnings.filterwarnings('ignore')

In [1303]:
import seaborn as sns

In [1304]:
import random

In [1305]:
pd.set_option('display.max_columns', None)

In [1306]:
!ls

Double_Tap_Feature_Engineering.ipynb gj_30minC.csv
FX_GBPJPY, 30.csv                    gj_30minsupres.csv
[34mForex[m[m                                gj_4base.csv
[34mTelegraphFF[m[m                          gj_4hr.csv
gj_30base.csv                        gj_5min.csv
gj_30min.csv                         gpbjpy_dataframe_gen.ipynb
gj_30minA.csv                        tvexp_gj30min.csv
gj_30minB.csv                        tvexp_gj5min.csv


In [1307]:
#Read in our clean 30 minute candle dataset
df = pd.read_csv('gj_30base.csv', parse_dates=['time'])
df.head()

Unnamed: 0,time,open,high,low,close,S/R,SR,vwma,volume,sent_30
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024
3,2019-01-06 23:30:00,138.272,138.333,138.162,138.207,137.268,138.012,137.697588,9983,0.054335
4,2019-01-07 00:00:00,138.207,138.247,138.095,138.224,137.268,138.012,137.772113,10163,0.04955


The first thing we notice when the dataframe reads in is that the time is all one hour behind where it should be. for example where the time is given as 21:00 we know the actual time was 22:00. We need to shift all the hours forward (or back depending on how you view the flow of time) by 1 hour. 

In [1308]:
# from datetime import timedelta

In [1309]:
# df["date"] = df['time'] + timedelta(hours=1)

In [1310]:
# df.head()

In [1311]:
#We'll just re-organise the columns now as we have two columns showing conflicting date info. 
# date = df['date']
# df.drop(labels=['time', 'date'], axis=1, inplace=True)
# df.insert(0, 'date', date)
# df.head()

In [1312]:
#Let's rename the time column as date
df.rename(columns={"time":"date"}, inplace=True)

In [1313]:
df.head(2)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658


In [1314]:
df.shape 
#Nice! More than 18000 rows

(18753, 10)

In [1315]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18753 entries, 0 to 18752
Data columns (total 10 columns):
 #   Column   Non-Null Count  Dtype         
---  ------   --------------  -----         
 0   date     18753 non-null  datetime64[ns]
 1   open     18753 non-null  float64       
 2   high     18753 non-null  float64       
 3   low      18753 non-null  float64       
 4   close    18753 non-null  float64       
 5   S/R      18753 non-null  float64       
 6   SR       18753 non-null  float64       
 7   vwma     18753 non-null  float64       
 8   volume   18753 non-null  int64         
 9   sent_30  18753 non-null  float64       
dtypes: datetime64[ns](1), float64(8), int64(1)
memory usage: 1.4 MB


In [1316]:
#Let's have brief visualisation of the available 30minute data in classic candlestick
import plotly.graph_objs as go

In [1317]:
fig = go.Figure(data=[go.Candlestick(x=df['date'], 
                        open=df['open'],
                        high=df['high'], low=df['low'], 
                        close=df['close'])])



fig.update_layout(
    title={'text': "GBP/JPY 30Min",
           'y':0.9,
           'x':0.5,
           'xanchor': 'center',
           'yanchor': 'top'},
    xaxis_title="Date",
    yaxis_title="Value",
    font=dict(
        family="Courier New, monospace",
        size=18,
        color="#7f7f7f"
        
    )
)

fig.show()

In [1318]:
#We can pick out a smaller range from that plot by creating a msk between two dates.
temp = df['date'].between('2020-05-17 21:00:00', '2020-05-22 21:00:00')
dfx = df[temp]
dfx.head(2)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30
16896,2020-05-17 21:00:00,129.352,129.473,129.303,129.313,129.55,132.732,130.107847,408,-0.059065
16897,2020-05-17 21:30:00,129.313,129.392,129.298,129.392,129.55,132.732,130.10596,663,-0.063899


In [1319]:
fig = go.Figure(data=[go.Candlestick(x=dfx['date'], 
                        open=dfx['open'],
                        high=dfx['high'], low=dfx['low'], 
                        close=dfx['close'])])



fig.update_layout(
    title={'text': "GBP/JPY 30Min Zoomed In",
           'y':0.9,
           'x':0.5,
           'xanchor': 'center',
           'yanchor': 'top'},
    xaxis_title="Date",
    yaxis_title="Value",
    font=dict(
        family="Courier New, monospace",
        size=18,
        color="#7f7f7f"
        
    )
)

fig.show()

In [1320]:
df.head(3)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024


In [1321]:
df["direction"]=''
df.head(3)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,


Let's define direction, candle body size, upper wick and lower wick sizes and create columns to show

In [1322]:
#Create a new column that shows the candle direction based on open and close values
def direction(row):
    open_ = row[1]
    close_ = row[4]
    
    if open_ < close_:
        return  'long'
    elif open_ > close_:
        return 'short'
    else:
        return 'neutral'

market_direction = df.apply(direction, axis='columns').to_frame()
df=df.join(market_direction).drop('direction', axis=1)
df.head()

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,0
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long
3,2019-01-06 23:30:00,138.272,138.333,138.162,138.207,137.268,138.012,137.697588,9983,0.054335,short
4,2019-01-07 00:00:00,138.207,138.247,138.095,138.224,137.268,138.012,137.772113,10163,0.04955,long


In [1323]:
#Create a column for candle body size ad set it to the difference between open and close
df['body_size'] = abs(df['close']-df['open'])
df.head(3)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,0,body_size
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306


In [1324]:
#Rename our direction column to 'direction'
df.rename(columns={0: 'direction'}, inplace=True)

In [1325]:
df.head(3)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306


In [1326]:
df.direction.unique()

array(['short', 'long', 'neutral'], dtype=object)

In [1327]:
df[df['direction']=='neutral']
#Note that we have some candles that open neutral, these may need special attention at some point

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size
99,2019-01-08 23:30:00,138.535,138.579,138.459,138.535,137.268,138.891,138.293239,5431,-0.024939,neutral,0.0
232,2019-01-11 18:00:00,139.409,139.440,139.348,139.409,137.806,139.409,138.761073,5463,0.077637,neutral,0.0
432,2019-01-17 22:00:00,141.831,141.919,141.807,141.831,138.997,141.831,141.125988,3769,0.098235,neutral,0.0
594,2019-01-23 07:00:00,142.140,142.270,142.110,142.140,140.945,142.095,141.948276,8530,0.020207,neutral,0.0
986,2019-02-04 11:00:00,143.412,143.496,143.405,143.412,143.132,144.778,143.500401,5606,-0.001244,neutral,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...
16645,2020-05-08 15:30:00,132.436,132.514,132.403,132.436,131.076,131.822,132.012220,6915,0.048074,neutral,0.0
17086,2020-05-21 20:00:00,131.435,131.479,131.406,131.435,129.550,132.157,131.694853,3162,-0.003171,neutral,0.0
17132,2020-05-22 19:00:00,130.961,130.993,130.904,130.961,129.550,132.157,130.966364,3347,0.009890,neutral,0.0
17723,2020-06-10 02:30:00,137.136,137.164,137.060,137.136,136.389,138.698,137.159132,3972,0.038602,neutral,0.0


In [1328]:
#Create columns for bottom and top wicks
def top_wick(row):
    open_ = row[1]
    high = row[2]
    low = row[3]
    close = row[4]
    direction = row[5]
    
    if direction == 'long':
        return high - close
    else:
        return high - open_



In [1329]:
upper_wick = df.apply(top_wick, axis='columns').to_frame()

In [1330]:
df=df.join(upper_wick)

In [1331]:
df.rename(columns={0:"top_wick"}, inplace=True)

In [1332]:
df.head(2)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111


In [1333]:
def bottom_wick(row):
    open_ = row[1]
    high = row[2]
    low = row[3]
    close = row[4]
    direction = row[5]
    
    if direction == 'short':
        return abs(low - close)
    else:
        return abs(low - open_)
    

In [1334]:
df.iloc[0][7]

137.52138154324214

In [1335]:
lower_wick = df.apply(bottom_wick, axis='columns').to_frame()
df=df.join(lower_wick)
df.rename(columns={0:"bottom_wick"}, inplace=True)
df.head(3)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306,0.331,0.043


## The plan

We need to work through the following: \
1) state whether a trade at the close of a candle would have produce a winning trade based on a basket of condtions \
2) using .describe() let's see if we can categorize the body and wick sizes based on their values \
3) based on the above two pieces of information can we narrow the trading conditions even further?

In [1336]:
#We could need nump for this section
import numpy as np

In [1337]:
#Let's see what the statistics show on this
df.describe()

Unnamed: 0,open,high,low,close,S/R,SR,vwma,volume,sent_30,body_size,top_wick,bottom_wick
count,18753.0,18753.0,18753.0,18753.0,18753.0,18753.0,18753.0,18753.0,18753.0,18753.0,18753.0,18753.0
mean,138.216676,138.307516,138.124705,138.217078,137.201333,139.386774,138.217508,8043.306404,-5.2e-05,0.088279,0.09084,0.091971
std,5.377768,5.366317,5.389297,5.377502,5.63332,5.391384,5.365859,7333.709869,0.04422,0.107721,0.112437,0.109098
min,124.586,125.179,123.991,124.586,127.028,128.549,125.429072,1.0,-0.249129,0.0,0.0,0.0
25%,133.603,133.725,133.482,133.603,132.394,135.073,133.602838,3528.0,-0.02424,0.026,0.026,0.028
50%,138.498,138.61,138.379,138.498,137.806,139.384,138.474414,6160.0,-0.000278,0.057,0.059,0.061
75%,142.846,142.911,142.774,142.846,141.943,144.157,142.841892,10063.0,0.022024,0.114,0.115,0.119
max,148.777,148.874,148.508,148.777,147.67,148.252,148.255961,104072.0,0.357465,3.841,3.887,3.18


So to figure out whether a trade was a success or not we need to see that the value increased by 15 pips in the next 5 candles and doesn't at any point drop below our stop loss which would be placed at the wick at the other end of the candle to our trade point.

In line with our hypothesis we first need to check whether our confirmation candle has closed in the opposite direction to the previous candle?

## Notes on the latest trade signal function

The following function is a highly simplified version of what we want the complete trade_class to be based on. 

What this function does:
1. Checks to see if a candle has closed in the opposite direction to the previous candle. if not, returns 'no trade'
2. Checks to see if a candle with a short trade has high higher than the next candle. if yes returns 'win' if not returns 'loss'
3. Checks to see if a candle with a long trade has lower low than the next candle. if yes returns 'win' if not returns 'loss'

Of course this is not accurately going to tell you whether a candle would have been a good trade or not. 

1. We need to check if the future candles actually improve on the position of the signal candle. Just not getting stopped out is no evidence of success. 
2. We need to check over a numnber of future candles as a trade could last over a period of 3-5 candles, but ideally no more. 
3. Where we're checking against multiple future candles we need to check what the order of events is... i.e. if we get stopped out on candle 1 then it's game over. But if we accrue a huge winning position over 2 or 3 candles before retracing to a stop out we need to decide if we would've taken a win iwhen it presented itself?

In [1338]:
#We need to set a global counter outside the function that is changed evertime the function runs
x = 0 

#Note the very first return should be blank as we cannot know if the first candle is a trade or not as we don't
#have the previous candle information

def trade_class(row): #this function will be applied to each row in our dataframe. 
    global x #let's make our variable global so we can change it's value from inside this function
    
#     open_ = row[1]
#     high = row[2]
#     low = row[3]
#     close = row[4]
#     direction = row[5]
    
    a = df['high'] #series to iterate over for our short trade success check
    b = df['low'] #series to iterate over for our long trade success check
    
    long_short = "" #set an empty variable to take a long or short position
    
    if x == 0:
        x += 1
        return "unknown"
    
    
    #let's first check if the candle presents a valid trading signal?
    if df.loc[x].direction == df.loc[x-1].direction or df.loc[x].direction == 'neutral':
        x += 1
        return "no_trade"
    elif df.loc[x].direction == "neutral":
        x += 1
        return "no_trade"
    elif df.loc[x].direction == "short":
        long_short = "short"
    else:
        long_short = "long"
        

    short_count = 0 # we need to set a counter so we can stop the loop once we've successfully returned 1 val
    long_count = 0 # same as above
        
  
    #Now we find out if a short trade is successful or not based on what the next candle does
    for i, j in enumerate(a[x:-1]): #we need to set the list from x as that reflects the index position we're at
        if short_count == 0:
            #if the next candle as a high greater than our candle then we'll be stopped out
            if long_short == "short":
                if a[x+1] > j: #We check if the high of the next 30minute candle is highere than our signal candle
                    short_count += 1
                    x += 1
                    return "loss" #if the next candle is higher then we lose the trade on a stop out
                    
                else:
                    short_count += 1
                    x += 1
                    return "win"
                    
            else:
                continue
    
    
    #Now we find out if a long trade is successful or not based on what the next candle does
    for i, j in enumerate(b[x:-1]):
        if long_count == 0:
            #if the next candle as a high greater than our candle then we'll be stopped out
            if long_short == "long":
                if b[x+1] < j: #We check out if the low of the newxt 30minute candle is lower than our signal candle
                    long_count += 1
                    x += 1
                    return "loss" #if the next candle has a lower low then we lose the trade on a stop out
            
                else:
                    long_count += 1
                    x += 1
                    return "win"

In [1339]:
result = df.apply(trade_class, axis='columns').to_frame()
df=df.join(result)
df.rename(columns={0:"trade_class"}, inplace=True)
df.head(3)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306,0.331,0.043,no_trade


In [1340]:
df.trade_class.unique() #Hmm for some reason no losses are being returned!! 

array(['unknown', 'win', 'no_trade', 'loss'], dtype=object)

In [1341]:
#What's the resulting class split of trade_class feature
df.trade_class.value_counts()  #hmmm nice, 70% win rate of the trade signals found

no_trade    8864
win         6935
loss        2953
unknown        1
Name: trade_class, dtype: int64

In [1342]:
#Sanity check on the neutral trades, should open and close at the same price. 
df.loc[df['direction'] == "neutral"]


Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class
99,2019-01-08 23:30:00,138.535,138.579,138.459,138.535,137.268,138.891,138.293239,5431,-0.024939,neutral,0.0,0.044,0.076,no_trade
232,2019-01-11 18:00:00,139.409,139.440,139.348,139.409,137.806,139.409,138.761073,5463,0.077637,neutral,0.0,0.031,0.061,no_trade
432,2019-01-17 22:00:00,141.831,141.919,141.807,141.831,138.997,141.831,141.125988,3769,0.098235,neutral,0.0,0.088,0.024,no_trade
594,2019-01-23 07:00:00,142.140,142.270,142.110,142.140,140.945,142.095,141.948276,8530,0.020207,neutral,0.0,0.130,0.030,no_trade
986,2019-02-04 11:00:00,143.412,143.496,143.405,143.412,143.132,144.778,143.500401,5606,-0.001244,neutral,0.0,0.084,0.007,no_trade
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16645,2020-05-08 15:30:00,132.436,132.514,132.403,132.436,131.076,131.822,132.012220,6915,0.048074,neutral,0.0,0.078,0.033,no_trade
17086,2020-05-21 20:00:00,131.435,131.479,131.406,131.435,129.550,132.157,131.694853,3162,-0.003171,neutral,0.0,0.044,0.029,no_trade
17132,2020-05-22 19:00:00,130.961,130.993,130.904,130.961,129.550,132.157,130.966364,3347,0.009890,neutral,0.0,0.032,0.057,no_trade
17723,2020-06-10 02:30:00,137.136,137.164,137.060,137.136,136.389,138.698,137.159132,3972,0.038602,neutral,0.0,0.028,0.076,no_trade


In [1343]:
df.head(3)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306,0.331,0.043,no_trade


## Trade Class Feature function 2.0

So far we have a dataframe that tells us if we can take a trade or not, based on our hypothesis, and whether that trade was successful when we measure the very next candle in the period. 

Now we want to apply a trade class based on a more complex set of metrics. 

1. Using the next 3 candles post trade, do any of them achieve the following:
    a. 8 pip gain
    b. 15 pip gain
    c. retrace to our stop loss.
    
2. If more than one of the above outcomes occur, what order do they occur in.

The following orders will yield the following trade classes:
a, b, c = 'strong win
a, c, b = 'weak win'
b, c, a = 'strong win'
b, a, c = 'strong win'
c, a, b = 'loss'
c, b, a = 'loss'

3. If none of the above occur we will return the following trade class
no outcome = 'no outcome'

In this function we have re-used some code from the trade_class function but have attempted to condense and simplify wherever possible. This function will hopefully supersede the previous one. 

In [1344]:
df1 = df #create a new df variable so that we have a good mid notebook checkpoint. 

In [1345]:
x = 0  #our current row loc stored in a global variable

def trade_class_two(row):
    global x #let's make our variable global so we can change it's value from inside this function
    
    
    #We'll add a print report to show the progress we're making mid function execute
    if x%150 == 0: 
        print("processing row {}".format(x))
    
    
    #create variables for each of the rows we may need to iterate over
    high = df1['high'] 
    low = df1['low'] 
    close = df1['close']
    open_ = df1['open']
    
   
    
    long_short = "" #set an empty variable to take a long or short position 
    
    if x == 0: 
        x += 1
        return "unknown" #We can't know what happens at row 0 as there's no earlier data to inform our decision. 
    
    
    #let's first check if the candle presents a valid trading signal?
    if df1.loc[x].direction == df1.loc[x-1].direction:
        x += 1
        return "no_trade"
    elif df.loc[x].direction == "neutral":
        x += 1
        return "no_trade" 
    elif df1.loc[x].direction == "short":
        long_short = "short"
    else:
        long_short = "long"
        
    short_count = 0 # set our loop counter to 0 but this time we'll loop when value is less than 3
    long_count = 0 # same as above
    
    can_list = [] #create an empty container to store our trade sub-outcomes

    '''
    First we'll focus on the short trade signals. We'll rehash the code use in trade_class
    '''
    
    #Now we find out if a short trade is successful or not based on next 3 candle values
    for i, (j, k, m, n) in enumerate(zip(high[x+1:], low[x+1:], close[x+1:], open_[x+1:])): 
        
        if len(can_list) < 3 and long_short == "short": #check both conditions are true before proceeding. 
            if high[x] < j: #We check if the high of the next 30minute candle is higher than our signal candle high

                can_list.append('loss') #add sub_outcome 'loss' to our value_list container. 
                    
            elif close[x] - k > 0.08: #Determines whether more than 8 pip winning move below our low

                can_list.append('win')
                
            else:

                can_list.append('no_score') #applicable for a minor retrace or small winning position < 0.08
        
        elif len(can_list) < 3 and long_short == "long":
            if k < low[x]: #We check out if the low of the newxt 30minute candle is lower than our signal candle

                can_list.append('loss')#if so then add this to the temp_list
                    
            elif j - close[x] > 0.08: #Determine whether we've had a greater than 8 pip move above our high

                can_list.append('win')
                    
            else:

                can_list.append('no_score')
           
    
    '''
    In this code section we look at the newly populated can_list and see what outcomes we have at each index location.
    We will return a different final trade outcome dependent on the order of the sub-outomes in the list. 
    
    I feel this section could be subject to some improvement but for now the if-else statement will serve purpose. 
    '''
    if len(can_list)==3: #If our container has stored the next 3 candle outcomes, proceed. 
        if can_list[0] == 'loss':
            x+=1 #increment x by one
            return 'loss' #and return the trade outcome
        elif can_list[0] == 'win':
            x+=1
            return 'win'
        elif can_list[0] == 'no_score' and can_list[1] == 'loss':
            x+=1
            return 'loss'
        elif can_list[0] == 'no_score' and can_list[1] == 'win':
            x+=1
            return 'win'
        elif can_list[0] == 'no_score' and can_list[1] == 'no_score' and can_list[2] == 'loss':
            x+=1
            return 'loss'
        elif can_list[0] == 'no_score' and can_list[1] == 'no_score' and can_list[2] == 'win':
            x+=1
            return 'win'
        else:
            x+=1
            return 'no_score'
    

In [1346]:
outcomes = df1.apply(trade_class_two, axis='columns').to_frame()
df1=df1.join(outcomes)
df1.rename(columns={0:"trade_class_two"}, inplace=True)
df1.head(3)

processing row 0
processing row 150
processing row 300
processing row 450
processing row 600
processing row 750
processing row 900
processing row 1050
processing row 1200
processing row 1350
processing row 1500
processing row 1650
processing row 1800
processing row 1950
processing row 2100
processing row 2250
processing row 2400
processing row 2550
processing row 2700
processing row 2850
processing row 3000
processing row 3150
processing row 3300
processing row 3450
processing row 3600
processing row 3750
processing row 3900
processing row 4050
processing row 4200
processing row 4350
processing row 4500
processing row 4650
processing row 4800
processing row 4950
processing row 5100
processing row 5250
processing row 5400
processing row 5550
processing row 5700
processing row 5850
processing row 6000
processing row 6150
processing row 6300
processing row 6450
processing row 6600
processing row 6750
processing row 6900
processing row 7050
processing row 7200
processing row 7350
processin

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown,unknown
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win,win
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306,0.331,0.043,no_trade,no_trade


In [1347]:
df1.trade_class_two.value_counts() 
#So we have recorded 0 no_score outcomes which is good as it means we'll always be in short term trades.  

no_trade    8863
win         4550
loss        4291
no_score    1045
unknown        1
Name: trade_class_two, dtype: int64

In [1348]:
#We call up a section of the df to do a manual test on the results
df1.loc[136:140] 

#All looks good for this new column!! 
#Boom! That might be the best function you've written in your short coding career Jim. Give yourself a bun! 

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two
136,2019-01-09 18:00:00,138.382,138.4,138.28,138.318,137.268,138.891,138.442006,10160,-0.038144,short,0.064,0.018,0.102,loss,loss
137,2019-01-09 18:30:00,138.318,138.465,138.29,138.347,137.268,138.891,138.413848,7374,-0.035591,long,0.029,0.147,0.028,loss,loss
138,2019-01-09 19:00:00,138.347,138.517,138.235,138.337,137.268,138.891,138.387528,30148,-0.033293,short,0.01,0.17,0.112,loss,loss
139,2019-01-09 19:30:00,138.337,138.6,138.284,138.538,137.268,138.891,138.380822,16748,-0.025509,long,0.201,0.263,0.053,win,loss
140,2019-01-09 20:00:00,138.538,138.6,138.379,138.388,137.268,138.891,138.373328,9347,-0.023082,short,0.15,0.062,0.159,win,win


## Trading Periods

We'd like to split out the trading periods within our dataframe to see any trends regarding which periods may be more productive than others. We want to highlight the following periods:
DayofWeek(mon, tues...) \, Month(jan, feb...), Session(morning, afternoon).

In [1349]:
import datetime as dt

In [1350]:
df1.head(2) #our dataframe as it currently looks:

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown,unknown
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win,win


In [1351]:
#We want to add the day of the week as a feature...
df1.date[1].day_name() 

'Sunday'

In [1352]:
#Let's use lamda to give our df a column that gives us the day of the week!! 
df1["day_of_week"] = df1.apply(lambda row : row.date.day_name(), axis=1)

In [1353]:
df1.head(2)
#all good! 

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown,unknown,Sunday
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win,win,Sunday


In [1354]:
#In this subsection let's create a basket of features for the year, month, day of the month (as an int), time. 
df1["day"] = df1.apply(lambda row : row.date.day, axis=1)
df1["month"] = df1.apply(lambda row : row.date.month, axis=1)
df1["year"] = df1.apply(lambda row : row.date.year, axis=1)
df1["time_24h"] = df1.apply(lambda row : row.date.time(), axis=1)

In [1355]:
df1.head(3)
#all good we now have a basket of periods to call on. 

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown,unknown,Sunday,6,1,2019,22:00:00
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win,win,Sunday,6,1,2019,22:30:00
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306,0.331,0.043,no_trade,no_trade,Sunday,6,1,2019,23:00:00


In [1356]:
#There's another time period of interest, the trading session - morning, afternoon, evening, night

#Extracts the hour component from the datetime feature 'time'
hours = list(df1.date.dt.hour.values)
counter = 0 #we create this global variable that we can increment after each return of a new value

def session_picker(row):
    global counter
  
    
    '''
    This function looks at the hour and decides whether the session is in morning, afternoon, evening or night
    '''


    #We loop through the list of hours starting at our counter location and return a suitable session label. 
    for hour in hours[counter:]: 
        if hour >= 0 and hour < 7: 
            counter += 1 #increment the counter by 1 so we can move to the next index in hours list
            return 'night'
        elif hour >= 7 and hour < 12:
            counter += 1
            return 'morning'
        elif hour >= 12 and hour <= 17:
            counter += 1
            return 'afternoon'
        else:
            counter += 1
            return 'evening'

In [1357]:
sessions = df1.apply(session_picker, axis='columns').to_frame()
df1=df1.join(sessions)
df1.rename(columns={0:"session"}, inplace=True)

In [1358]:
df1.head(4)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown,unknown,Sunday,6,1,2019,22:00:00,evening
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win,win,Sunday,6,1,2019,22:30:00,evening
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306,0.331,0.043,no_trade,no_trade,Sunday,6,1,2019,23:00:00,evening
3,2019-01-06 23:30:00,138.272,138.333,138.162,138.207,137.268,138.012,137.697588,9983,0.054335,short,0.065,0.061,0.11,win,win,Sunday,6,1,2019,23:30:00,evening


Ok that's all we'll need from a time period perspective for now. Let's look at some more features that we can engineer out of the data frame we've constructed. Potential new features as follows:
1) candle body size / candle body size compared with rest of day
2) retrace values / retrace as a percentage of signal candle
3) high impact news released or due to be released
4) high time frame resistance
5) higher time frame momentum / sentiment
6) time taken for a win or a loss to confirm
7) magnitude of a win or a loss

## Ave Recent Volumes
Let's create a feature that holds the average volume from the last 5 candles.

In [1359]:
import statistics

In [1360]:
df2 = df1

In [1361]:
#ave vol. last 5 candles
x = 0 #Set our start point as we don't have any data for previous period. 
 
def volume_five(row):
    global x
    y = x-5 #set our start for calculating the average volume
    
    vol = df2.volume
    
    if x < 5:
        x+=1
        return 'unknown'
    
    container = []
    
    for i, j in enumerate(vol[y:]):
        if len(container) < 5:
            container.append(j)
    
    
    
    result = statistics.mean(container)
    
    x+=1
    
    return result
        
        

In [1362]:
vol_five = df2.apply(volume_five, axis='columns').to_frame()
df2=df2.join(vol_five)
df2.rename(columns={0:"vfive"}, inplace=True)

In [1363]:
df2.head(10)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown,unknown,Sunday,6,1,2019,22:00:00,evening,unknown
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win,win,Sunday,6,1,2019,22:30:00,evening,unknown
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306,0.331,0.043,no_trade,no_trade,Sunday,6,1,2019,23:00:00,evening,unknown
3,2019-01-06 23:30:00,138.272,138.333,138.162,138.207,137.268,138.012,137.697588,9983,0.054335,short,0.065,0.061,0.11,win,win,Sunday,6,1,2019,23:30:00,evening,unknown
4,2019-01-07 00:00:00,138.207,138.247,138.095,138.224,137.268,138.012,137.772113,10163,0.04955,long,0.017,0.04,0.112,win,loss,Monday,7,1,2019,00:00:00,night,unknown
5,2019-01-07 00:30:00,138.224,138.281,138.145,138.15,137.268,138.012,137.854514,6684,0.042286,short,0.074,0.057,0.079,win,loss,Monday,7,1,2019,00:30:00,night,7188
6,2019-01-07 01:00:00,138.15,138.232,138.097,138.218,137.268,138.012,137.944878,7148,0.037044,long,0.068,0.082,0.053,loss,loss,Monday,7,1,2019,01:00:00,night,8176.8
7,2019-01-07 01:30:00,138.218,138.282,138.024,138.087,137.268,138.012,138.033155,8658,0.028058,short,0.131,0.064,0.194,win,win,Monday,7,1,2019,01:30:00,night,9090.6
8,2019-01-07 02:00:00,138.087,138.105,137.961,138.048,137.268,138.012,138.106844,5759,0.01879,short,0.039,0.018,0.126,no_trade,no_trade,Monday,7,1,2019,02:00:00,night,8527.2
9,2019-01-07 02:30:00,138.048,138.074,137.927,137.93,137.268,138.012,138.121546,4309,0.007308,short,0.118,0.026,0.121,no_trade,no_trade,Monday,7,1,2019,02:30:00,night,7682.4


Now lets create a feature that holds the relative volume of the signal candle in comparison to the ave vol. 

In [1364]:
# We need to turn the vfive feature into an int, however we have the unknowns at the top of the df so skip these as 
#they will be dropped once our final dataframe is built. 
df2['vfive'][5:] = pd.to_numeric(df2['vfive'][5:])

In [1365]:
ave = df2["volume"][5:] / df2["vfive"][5:]
df2["rel_vol"] = ave #This will need to be rounded at some point when we normalize

In [1366]:
df2.loc[5:8]

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol
5,2019-01-07 00:30:00,138.224,138.281,138.145,138.15,137.268,138.012,137.854514,6684,0.042286,short,0.074,0.057,0.079,win,loss,Monday,7,1,2019,00:30:00,night,7188.0,0.929883
6,2019-01-07 01:00:00,138.15,138.232,138.097,138.218,137.268,138.012,137.944878,7148,0.037044,long,0.068,0.082,0.053,loss,loss,Monday,7,1,2019,01:00:00,night,8176.8,0.874181
7,2019-01-07 01:30:00,138.218,138.282,138.024,138.087,137.268,138.012,138.033155,8658,0.028058,short,0.131,0.064,0.194,win,win,Monday,7,1,2019,01:30:00,night,9090.6,0.952412
8,2019-01-07 02:00:00,138.087,138.105,137.961,138.048,137.268,138.012,138.106844,5759,0.01879,short,0.039,0.018,0.126,no_trade,no_trade,Monday,7,1,2019,02:00:00,night,8527.2,0.675368


In [1367]:
df2.columns

Index(['date', 'open', 'high', 'low', 'close', 'S/R', 'SR', 'vwma', 'volume',
       'sent_30', 'direction', 'body_size', 'top_wick', 'bottom_wick',
       'trade_class', 'trade_class_two', 'day_of_week', 'day', 'month', 'year',
       'time_24h', 'session', 'vfive', 'rel_vol'],
      dtype='object')

In [1368]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18753 entries, 0 to 18752
Data columns (total 24 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   date             18753 non-null  datetime64[ns]
 1   open             18753 non-null  float64       
 2   high             18753 non-null  float64       
 3   low              18753 non-null  float64       
 4   close            18753 non-null  float64       
 5   S/R              18753 non-null  float64       
 6   SR               18753 non-null  float64       
 7   vwma             18753 non-null  float64       
 8   volume           18753 non-null  int64         
 9   sent_30          18753 non-null  float64       
 10  direction        18753 non-null  object        
 11  body_size        18753 non-null  float64       
 12  top_wick         18753 non-null  float64       
 13  bottom_wick      18753 non-null  float64       
 14  trade_class      18753 non-null  objec

## Win and Loss Sizes!

Let's create a feature that gives us a win size and a loss size based on the next three candles. We can surely re-use some code from the trade_class_two function?

In [1369]:
df2.head(1)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown,unknown,Sunday,6,1,2019,22:00:00,evening,unknown,


In [1370]:
df2.trade_class_two.value_counts()

no_trade    8863
win         4550
loss        4291
no_score    1045
unknown        1
Name: trade_class_two, dtype: int64

In [1371]:
x = 0  #our current row loc stored in a global variable

def profit_loss(row):
    global x #let's make our variable global so we can change it's value from inside this function
    
    
    #We'll add a print report to show the progress we're making mid function execute
    if x%1000 == 0: 
        print("processing row {}".format(x))
    
    
    #create variables for each of the rows we may need to iterate over
    high = df2['high'] 
    low = df2['low'] 
    close = df2['close']
    open_ = df2['open']
    trade = df2['trade_class_two']
    
   
    
    long_short = "" #set an empty variable to take a long or short position 
    win_loss = "" #set an empty variable to take a win or loss outcom
    
    if x == 0: 
        x += 1
        return "unknown" #We can't know what happens at row 0 as there's no earlier data to inform our decision. 
    
    
    #let's first check if the candle presents a valid trading signal?
    if df2.loc[x].direction == df2.loc[x-1].direction:
        x += 1
        return "no_trade"
    elif df.loc[x].direction == "neutral":
        x += 1
        return "no_trade"
    elif df2.loc[x].direction == "short":
        long_short = "short"
    else:
        long_short = "long"
    
    #Then let's check if the candle returned a no_score, win or loss. 
    if df2.loc[x].trade_class_two == "no_score":
        x+=1
        return "no_score"
    elif df2.loc[x].trade_class_two == "win":
        win_loss = "win"
    else:
        win_loss = "loss"
        
        
    short_count = 0 # set our loop counter to 0 but this time we'll loop when value is less than 3
    long_count = 0 # same as above
    
    can_list = [] #create an empty container to store our trade sub-outcomes

    '''
    First we'll focus on the short trade signals. We'll rehash the code use in trade_class
    '''
    
    if win_loss == "loss" and long_short == "short":
        x+=1
        return abs(high[x]-close[x])#return the signal candle size as this will be our loss value 
        
    elif win_loss == "loss" and long_short == "long":
        x+=1
        return abs(close[x]-low[x])
            
    
    #Now we find out if a short trade is successful or not based on next 3 candle values
    for i, (j, k, m, n) in enumerate(zip(high[x+1:], low[x+1:], close[x+1:], open_[x+1:])): 
        if len(can_list) < 3:
            if win_loss == "win" and long_short == "short":
                
                can_list.append(abs(close[x] - k))
                
            elif win_loss == "win" and long_short == "long":
                
                can_list.append(abs(j - close[x]))
    
    x+=1       
    
    return max(can_list)
    

In [1372]:
result_size = df2.apply(profit_loss, axis='columns').to_frame()
df2=df2.join(result_size)
df2.rename(columns={0:"out_mag"}, inplace=True)

processing row 0
processing row 1000
processing row 2000
processing row 3000
processing row 4000
processing row 5000
processing row 6000
processing row 7000
processing row 8000
processing row 9000
processing row 10000
processing row 11000
processing row 12000
processing row 13000
processing row 14000
processing row 15000
processing row 16000
processing row 17000
processing row 18000


In [1373]:
df2.head()

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol,out_mag
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown,unknown,Sunday,6,1,2019,22:00:00,evening,unknown,,unknown
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win,win,Sunday,6,1,2019,22:30:00,evening,unknown,,0.367
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306,0.331,0.043,no_trade,no_trade,Sunday,6,1,2019,23:00:00,evening,unknown,,no_trade
3,2019-01-06 23:30:00,138.272,138.333,138.162,138.207,137.268,138.012,137.697588,9983,0.054335,short,0.065,0.061,0.11,win,win,Sunday,6,1,2019,23:30:00,evening,unknown,,0.112
4,2019-01-07 00:00:00,138.207,138.247,138.095,138.224,137.268,138.012,137.772113,10163,0.04955,long,0.017,0.04,0.112,win,loss,Monday,7,1,2019,00:00:00,night,unknown,,0.005


## Trade / No Trade
We have already completed this task of sorts by adding a no_trade class to the trade class features. However it might come in handy to have a feature that simply signals a trade or no trade based on our hypothesis. This may make it easier to split the data frame at a later stage. 

This function will re-use some of the code already utilised in the trade features. 

In [1374]:
x = 0

def trade_no_trade(row):
    global x
    
    #We have no previous information so we must return "unknown at x=0"
    if x == 0: 
        x += 1
        return "unknown"
    
    #let's check if we have a valid trading signal
    if df2.loc[x].direction == df2.loc[x-1].direction:
        x += 1
        return "no_trade"
    elif df.loc[x].direction == "neutral":
        x += 1
        return "no_trade"
    else:
        x+=1
        return "trade"
    

In [1375]:
signal = df2.apply(trade_no_trade, axis='columns').to_frame()
df2=df2.join(signal)
df2.rename(columns={0:"signal"}, inplace=True)

In [1376]:
df2.tail(10)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol,out_mag,signal
18743,2020-07-09 08:30:00,135.56,135.633,135.509,135.612,132.184,135.505,135.449953,5975,0.010271,long,0.052,0.073,0.051,no_trade,no_trade,Thursday,9,7,2020,08:30:00,morning,6290.2,0.94989,no_trade,no_trade
18744,2020-07-09 09:00:00,135.612,135.666,135.574,135.658,132.184,135.505,135.463788,5198,0.011563,long,0.046,0.054,0.038,no_trade,no_trade,Thursday,9,7,2020,09:00:00,morning,6470.2,0.803375,no_trade,no_trade
18745,2020-07-09 09:30:00,135.658,135.919,135.634,135.783,132.184,135.505,135.485882,5926,0.0155,long,0.125,0.261,0.024,no_trade,no_trade,Thursday,9,7,2020,09:30:00,morning,6695.0,0.885138,no_trade,no_trade
18746,2020-07-09 10:00:00,135.783,135.808,135.721,135.756,132.184,135.505,135.509014,4515,0.017308,short,0.027,0.025,0.062,win,win,Thursday,9,7,2020,10:00:00,morning,6419.6,0.703315,0.126,trade
18747,2020-07-09 10:30:00,135.756,135.783,135.662,135.677,132.184,135.505,135.529186,6836,0.01602,short,0.079,0.027,0.094,no_trade,no_trade,Thursday,9,7,2020,10:30:00,morning,5766.8,1.18541,no_trade,no_trade
18748,2020-07-09 11:00:00,135.677,135.75,135.63,135.662,132.184,135.505,135.550563,6192,0.014076,short,0.015,0.073,0.047,no_trade,no_trade,Thursday,9,7,2020,11:00:00,morning,5690.0,1.08822,no_trade,no_trade
18749,2020-07-09 11:30:00,135.662,135.76,135.643,135.737,132.184,135.505,135.57293,4757,0.014086,long,0.075,0.098,0.019,win,loss,Thursday,9,7,2020,11:30:00,morning,5733.4,0.8297,0.036,trade
18750,2020-07-09 12:00:00,135.737,135.777,135.653,135.689,132.184,135.505,135.588551,5204,0.012277,short,0.048,0.04,0.084,win,,Thursday,9,7,2020,12:00:00,afternoon,5645.2,0.921845,0.035,trade
18751,2020-07-09 12:30:00,135.689,135.762,135.633,135.727,132.184,135.505,135.608491,6891,0.011409,long,0.038,0.073,0.056,win,,Thursday,9,7,2020,12:30:00,afternoon,5500.8,1.25273,0.173,trade
18752,2020-07-09 13:00:00,135.727,135.852,135.675,135.848,132.184,135.505,135.628175,5410,0.013529,long,0.121,0.125,0.052,no_trade,,Thursday,9,7,2020,13:00:00,afternoon,5976.0,0.905288,no_trade,no_trade


In [1377]:
df2.trade_class_two.value_counts()

no_trade    8863
win         4550
loss        4291
no_score    1045
unknown        1
Name: trade_class_two, dtype: int64

In [1378]:
#The difference in values by 2 trades and 1 no_trade is due to the 3 none values at tail of trade_class_two
df2.signal.value_counts()

trade       9888
no_trade    8864
unknown        1
Name: signal, dtype: int64

In [1379]:
#The difference in values by 2 trades and 1 no_trade is due to the 3 none values at tail of trade_class_two
df2.loc[~(df2["trade_class_two"] == df2["signal"])].tail(3)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol,out_mag,signal
18750,2020-07-09 12:00:00,135.737,135.777,135.653,135.689,132.184,135.505,135.588551,5204,0.012277,short,0.048,0.04,0.084,win,,Thursday,9,7,2020,12:00:00,afternoon,5645.2,0.921845,0.035,trade
18751,2020-07-09 12:30:00,135.689,135.762,135.633,135.727,132.184,135.505,135.608491,6891,0.011409,long,0.038,0.073,0.056,win,,Thursday,9,7,2020,12:30:00,afternoon,5500.8,1.25273,0.173,trade
18752,2020-07-09 13:00:00,135.727,135.852,135.675,135.848,132.184,135.505,135.628175,5410,0.013529,long,0.121,0.125,0.052,no_trade,,Thursday,9,7,2020,13:00:00,afternoon,5976.0,0.905288,no_trade,no_trade


## Break of Previous Candle values

This feature will state whether we have closed within the previous candle body, outside the body but in the wick, or beyond the wick. This will likely become a categorical variable. 


Plan, 
1. On a long position check if our close was \
    a) below open of last candle (return "in_body") \
    b) above open of last candle, and below high of last candle, (return "in_wick") \
    c) above high of last candle (return "out_wick") \


2. On a short position check if our close was \
    a) above open of last candle \
    b) below open of last candle, and above low of last candle \
    c) below low of last candle

In [1380]:
df2.direction.value_counts() #check what these neutral values have been returning! Have we messed up?

short      9355
long       9318
neutral      80
Name: direction, dtype: int64

In [1381]:
#Set a counter to be incremented from within the function
x = 0
def break_check(row):
    
    #Make our global counter available from within the function
    global x 
    
    #We'll add a print report to show the progress we're making mid function execute
    if x%1000 == 0: 
        print("processing row {}".format(x))
    
    #We have no previous information so we must return "unknown at x=0"
    if x == 0: 
        x += 1
        return "unknown"
    
    
    #Set our local variables
    close = df2['close'][x]
    
    #previous candle variable
    op = df2['open'][x-1]
    hi = df2['high'][x-1]
    lo = df2['low'][x-1]
    
    #Set an empty variable to contain the direction of trade
    long_short = ""
    
    
    
    #Create a conditional to send a long or short signal to our long_short variable
    if df2.loc[x].direction == df2.loc[x-1].direction:
        x += 1
        return "no_trade"
    elif df2.loc[x].direction == 'neutral':
        x+=1
        return "no_trade"
    elif df2.loc[x].direction == "short":
        long_short = "short"
    else:
        long_short = "long"
        
    
    if long_short == "short":
        if close > op:
            x+=1
            return "in_body"
        elif close <= op and close >= lo:
            x+=1
            return "in_wick"
        elif close < lo :
            x+=1
            return "out_wick"
        
    elif long_short == "long":
        if close < op:
            x+=1
            return "in_body"
        elif close >= op and close <= hi:
            x+=1
            return "in_wick"
        elif close > hi:
            x+=1
            return "out_wick"
    
    
    
    
    
    

In [1382]:
closer = df2.apply(break_check, axis='columns').to_frame()
df2=df2.join(closer)
df2.rename(columns={0:"break_level"}, inplace=True)

processing row 0
processing row 1000
processing row 2000
processing row 3000
processing row 4000
processing row 5000
processing row 6000
processing row 7000
processing row 8000
processing row 9000
processing row 10000
processing row 11000
processing row 12000
processing row 13000
processing row 14000
processing row 15000
processing row 16000
processing row 17000
processing row 18000


In [1383]:
df2.head()

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol,out_mag,signal,break_level
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown,unknown,Sunday,6,1,2019,22:00:00,evening,unknown,,unknown,unknown,unknown
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win,win,Sunday,6,1,2019,22:30:00,evening,unknown,,0.367,trade,in_body
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306,0.331,0.043,no_trade,no_trade,Sunday,6,1,2019,23:00:00,evening,unknown,,no_trade,no_trade,no_trade
3,2019-01-06 23:30:00,138.272,138.333,138.162,138.207,137.268,138.012,137.697588,9983,0.054335,short,0.065,0.061,0.11,win,win,Sunday,6,1,2019,23:30:00,evening,unknown,,0.112,trade,in_body
4,2019-01-07 00:00:00,138.207,138.247,138.095,138.224,137.268,138.012,137.772113,10163,0.04955,long,0.017,0.04,0.112,win,loss,Monday,7,1,2019,00:00:00,night,unknown,,0.005,trade,in_body


In [1384]:
df2.break_level.value_counts()

no_trade    8864
in_body     5050
out_wick    2692
in_wick     2146
unknown        1
Name: break_level, dtype: int64

## Higher Time Frame Sentiment
Higher time frame sentiment could well play a part in the success or otherwise of a trade. We should build a feature that states whether a trade signal occurs within a period of bullish or bearish 1hr, 4hr and daily sentiment. For this base dataframe we'll stick to 4hr sentiment as a mid range. 


In [1385]:
!ls

Double_Tap_Feature_Engineering.ipynb gj_30minC.csv
FX_GBPJPY, 30.csv                    gj_30minsupres.csv
[34mForex[m[m                                gj_4base.csv
[34mTelegraphFF[m[m                          gj_4hr.csv
gj_30base.csv                        gj_5min.csv
gj_30min.csv                         gpbjpy_dataframe_gen.ipynb
gj_30minA.csv                        tvexp_gj30min.csv
gj_30minB.csv                        tvexp_gj5min.csv


In [1386]:
#First we'll read in the 4hr dataframe
fourh = pd.read_csv('gj_4base.csv', parse_dates=["time"])

In [1387]:
fourh 
#Note that the four hour start times change from even hours to odd hours, this must be something to do with clock 
#changes in the uk although the dates don't really correlate. 

Unnamed: 0,time,sent_4
0,2019-01-06 22:00:00,0.004876
1,2019-01-07 02:00:00,0.037737
2,2019-01-07 06:00:00,0.065515
3,2019-01-07 10:00:00,0.098051
4,2019-01-07 14:00:00,0.135130
...,...,...
2342,2020-07-08 21:00:00,0.085336
2343,2020-07-09 01:00:00,0.089286
2344,2020-07-09 05:00:00,0.093195
2345,2020-07-09 09:00:00,0.097192


In [1388]:
fourh.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2347 entries, 0 to 2346
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   time    2347 non-null   datetime64[ns]
 1   sent_4  2347 non-null   float64       
dtypes: datetime64[ns](1), float64(1)
memory usage: 36.8 KB


In [1389]:
#Let's check that these time changes are reflected equally in both dataframes
fourh.loc[fourh["time"]=="2020-06-21 21:00:00"] 

Unnamed: 0,time,sent_4
2264,2020-06-21 21:00:00,-0.220474


In [1390]:
df2.loc[df2["date"]=="2020-06-21 21:00:00"] 
#Ok so we can have some confidence that this time shift is consistent across both datasets. 

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol,out_mag,signal,break_level
18096,2020-06-21 21:00:00,131.861,131.907,131.758,131.897,132.394,138.698,132.218627,133,-0.012475,long,0.036,0.046,0.103,win,win,Sunday,21,6,2020,21:00:00,evening,4139.4,0.0321303,0.116,trade,in_body


In [1391]:
df2.shape

(18753, 27)

In [1392]:
fourh.shape

(2347, 2)

We need to figure out a way of merging the relevant rows from the sent_4 feature column to the appropriate rows in our thirty minute time frame dataset. This will mean copying the same sent_4 values 8 times across into the 8 corresponding thirty minute rows. 

It's worth noting that this is a high risk feature that may contain errors due to inaccuracies in the base data. Indeed the number of 4hr rows does not nicely divide into the number of rows in our master dataset... however we're talking about 3 four hour rows too many across 2346 so this should only be responsible for minimum error. 

In [1393]:
df2.loc[df2.day_of_week == "Sunday"].tail(2)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol,out_mag,signal,break_level
18580,2020-07-05 23:00:00,134.13,134.204,134.118,134.159,132.184,134.465,134.069715,1872,0.014189,long,0.029,0.074,0.012,no_trade,no_trade,Sunday,5,7,2020,23:00:00,evening,1509.4,1.24023,no_trade,no_trade,no_trade
18581,2020-07-05 23:30:00,134.159,134.24,134.121,134.203,132.184,134.465,134.076066,1526,0.015477,long,0.044,0.081,0.038,no_trade,no_trade,Sunday,5,7,2020,23:30:00,evening,1044.4,1.46113,no_trade,no_trade,no_trade


Plan: \
We'll merge our two dataframes on the matching datetimes. This will clearly leave numerous nan values but these will then be filled across the subsequent thirty minute rows. 

In [1421]:
#merge the dataframes
df3 = df2.merge(fourh, how="left", left_on="date", right_on="time")
df3.head(5) 

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol,out_mag,signal,break_level,time,sent_4
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown,unknown,Sunday,6,1,2019,22:00:00,evening,unknown,,unknown,unknown,unknown,2019-01-06 22:00:00,0.004876
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win,win,Sunday,6,1,2019,22:30:00,evening,unknown,,0.367,trade,in_body,NaT,
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306,0.331,0.043,no_trade,no_trade,Sunday,6,1,2019,23:00:00,evening,unknown,,no_trade,no_trade,no_trade,NaT,
3,2019-01-06 23:30:00,138.272,138.333,138.162,138.207,137.268,138.012,137.697588,9983,0.054335,short,0.065,0.061,0.11,win,win,Sunday,6,1,2019,23:30:00,evening,unknown,,0.112,trade,in_body,NaT,
4,2019-01-07 00:00:00,138.207,138.247,138.095,138.224,137.268,138.012,137.772113,10163,0.04955,long,0.017,0.04,0.112,win,loss,Monday,7,1,2019,00:00:00,night,unknown,,0.005,trade,in_body,NaT,


In [1422]:
df3.shape

(18753, 29)

In [1423]:
#Now we create a new feature which fills all the nan values based on the initial value
df3["sent_4h"] = df3[["sent_4"]].fillna(method='ffill')

In [1424]:
df3.tail()

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol,out_mag,signal,break_level,time,sent_4,sent_4h
18748,2020-07-09 11:00:00,135.677,135.75,135.63,135.662,132.184,135.505,135.550563,6192,0.014076,short,0.015,0.073,0.047,no_trade,no_trade,Thursday,9,7,2020,11:00:00,morning,5690.0,1.08822,no_trade,no_trade,no_trade,NaT,,0.097192
18749,2020-07-09 11:30:00,135.662,135.76,135.643,135.737,132.184,135.505,135.57293,4757,0.014086,long,0.075,0.098,0.019,win,loss,Thursday,9,7,2020,11:30:00,morning,5733.4,0.8297,0.036,trade,in_wick,NaT,,0.097192
18750,2020-07-09 12:00:00,135.737,135.777,135.653,135.689,132.184,135.505,135.588551,5204,0.012277,short,0.048,0.04,0.084,win,,Thursday,9,7,2020,12:00:00,afternoon,5645.2,0.921845,0.035,trade,in_body,NaT,,0.097192
18751,2020-07-09 12:30:00,135.689,135.762,135.633,135.727,132.184,135.505,135.608491,6891,0.011409,long,0.038,0.073,0.056,win,,Thursday,9,7,2020,12:30:00,afternoon,5500.8,1.25273,0.173,trade,in_body,NaT,,0.097192
18752,2020-07-09 13:00:00,135.727,135.852,135.675,135.848,132.184,135.505,135.628175,5410,0.013529,long,0.121,0.125,0.052,no_trade,,Thursday,9,7,2020,13:00:00,afternoon,5976.0,0.905288,no_trade,no_trade,no_trade,2020-07-09 13:00:00,0.097155,0.097155


In [1425]:
df3.drop(columns=["sent_4", "time"], inplace=True)

In [1426]:
df3.head(3)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol,out_mag,signal,break_level,sent_4h
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown,unknown,Sunday,6,1,2019,22:00:00,evening,unknown,,unknown,unknown,unknown,0.004876
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win,win,Sunday,6,1,2019,22:30:00,evening,unknown,,0.367,trade,in_body,0.004876
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306,0.331,0.043,no_trade,no_trade,Sunday,6,1,2019,23:00:00,evening,unknown,,no_trade,no_trade,no_trade,0.004876


In [1427]:
df3.direction.value_counts()

short      9355
long       9318
neutral      80
Name: direction, dtype: int64

# Volume Weighted Candle Metrics

We will now create 3 features that measure how the signal candle compares with the volume weighted moving average. \
1) A binary feature that states whether the signal candle crosses through the volume weighted moving average, or not. \
2) A positive candle feature value that states the distance to VWMA from the close of a candle moving toward VWMA \
3) A negative candle feature valiue that states the distance to VWMA from the close of candle moving away from VWMA \

In [1428]:
def crossing_vwma(row):
    """
    This function checks whether a candle crosses the vwma and returns True if so, and False if not
    """
    dir_ = row[10]
    open_ = row[1]
    close = row[4]
    vol = row[7]
    
    if dir_ == "short":
        if open_ >= vol and close <= vol:
            return 1
        else:
            return 0
    elif dir_ == "long":
        if open_ <= vol and close >= vol:
            return 1
        else:
            return 0
    else:
        return 0

In [1429]:
xvol = df3.apply(crossing_vwma, axis='columns').to_frame()
df3=df3.join(xvol)
df3.rename(columns={0:"x_vwma"}, inplace=True)

In [1430]:
df3.head()

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol,out_mag,signal,break_level,sent_4h,x_vwma
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown,unknown,Sunday,6,1,2019,22:00:00,evening,unknown,,unknown,unknown,unknown,0.004876,0
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win,win,Sunday,6,1,2019,22:30:00,evening,unknown,,0.367,trade,in_body,0.004876,0
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306,0.331,0.043,no_trade,no_trade,Sunday,6,1,2019,23:00:00,evening,unknown,,no_trade,no_trade,no_trade,0.004876,0
3,2019-01-06 23:30:00,138.272,138.333,138.162,138.207,137.268,138.012,137.697588,9983,0.054335,short,0.065,0.061,0.11,win,win,Sunday,6,1,2019,23:30:00,evening,unknown,,0.112,trade,in_body,0.004876,0
4,2019-01-07 00:00:00,138.207,138.247,138.095,138.224,137.268,138.012,137.772113,10163,0.04955,long,0.017,0.04,0.112,win,loss,Monday,7,1,2019,00:00:00,night,unknown,,0.005,trade,in_body,0.004876,0


In [1431]:
df3.x_vwma.value_counts() # That's 14% of candles that cross the vwma... seems on the outer edge of plausibility

0    16127
1     2626
Name: x_vwma, dtype: int64

In [1432]:
df3.loc[df3.x_vwma==1].sample(n=5) #Let's check a sample of x_vwma==1 to ensure the function is accurate

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol,out_mag,signal,break_level,sent_4h,x_vwma
11590,2019-12-10 09:00:00,142.923,143.048,142.735,142.754,142.633,143.058,142.818759,8429,0.00025,short,0.169,0.125,0.188,win,loss,Tuesday,10,12,2019,09:00:00,morning,4913.6,1.71544,0.04,trade,in_wick,-0.006887,1
9407,2019-10-07 20:30:00,131.926,131.926,131.785,131.814,132.652,131.932,131.862766,1922,0.023416,short,0.112,0.0,0.141,win,win,Monday,7,10,2019,20:30:00,evening,5333.6,0.360357,0.094,trade,out_wick,-0.002683,1
2340,2019-03-14 15:00:00,147.851,148.206,147.844,148.092,144.556,147.328,148.000052,11766,-0.051558,long,0.241,0.355,0.007,no_trade,no_trade,Thursday,14,3,2019,15:00:00,afternoon,13555.4,0.867994,no_trade,no_trade,no_trade,0.187902,1
7692,2019-08-19 03:00:00,129.18,129.26,129.151,129.259,127.07,129.306,129.185583,3909,-0.024454,long,0.079,0.08,0.029,no_trade,no_trade,Monday,19,8,2019,03:00:00,night,5716.4,0.683822,no_trade,no_trade,no_trade,0.228455,1
11897,2019-12-18 18:30:00,143.369,143.37,143.247,143.258,143.354,147.789,143.339906,2867,0.041536,short,0.111,0.001,0.122,no_trade,no_trade,Wednesday,18,12,2019,18:30:00,evening,6382.4,0.449204,no_trade,no_trade,no_trade,-0.249781,1


In [1433]:
#The above sample rows have been accurately calculated for x_vwma feature. We can move on. 

In [1434]:
def pos_can(row):
    """
    This function checks whether a signal candle is moving toward the vwma. If so it will return the distance from the
    closing value to the vwma
    """
    
    dir_ = row[10]
    open_ = row[1]
    close = row[4]
    vol = row[7]
    
    if dir_ == "short":
        if close > vol:
            return close - vol 
        else:
            return 0
    elif dir_ == "long":
        if close < vol:
            return vol - close
        else: 
            return 0
    else:
        return 0
    
    

In [1435]:
pos = df3.apply(pos_can, axis='columns').to_frame()
df3=df3.join(pos)
df3.rename(columns={0:"can_vwma"}, inplace=True)

In [1436]:
df3.head()

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol,out_mag,signal,break_level,sent_4h,x_vwma,can_vwma
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown,unknown,Sunday,6,1,2019,22:00:00,evening,unknown,,unknown,unknown,unknown,0.004876,0,0.410618
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win,win,Sunday,6,1,2019,22:30:00,evening,unknown,,0.367,trade,in_body,0.004876,0,0.0
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306,0.331,0.043,no_trade,no_trade,Sunday,6,1,2019,23:00:00,evening,unknown,,no_trade,no_trade,no_trade,0.004876,0,0.0
3,2019-01-06 23:30:00,138.272,138.333,138.162,138.207,137.268,138.012,137.697588,9983,0.054335,short,0.065,0.061,0.11,win,win,Sunday,6,1,2019,23:30:00,evening,unknown,,0.112,trade,in_body,0.004876,0,0.509412
4,2019-01-07 00:00:00,138.207,138.247,138.095,138.224,137.268,138.012,137.772113,10163,0.04955,long,0.017,0.04,0.112,win,loss,Monday,7,1,2019,00:00:00,night,unknown,,0.005,trade,in_body,0.004876,0,0.0


In [1437]:
def neg_can(row):
    """
    This function checks whether a signal candle is moving away from the vwma. If so it will return the distance 
    from the closing value to the vwma.
    """
    
    dir_ = row[10]
    open_ = row[1]
    close = row[4]
    vol = row[7]
    
    if dir_ == "short":
        if close < vol:
            return vol - close
        else:
            return 0
    elif dir_ == "long":
        if close > vol:
            return close - vol
        else: 
            return 0
    else:
        return 0

In [1438]:
neg = df3.apply(neg_can, axis='columns').to_frame()
df3=df3.join(neg)
df3.rename(columns={0:"neg_can"}, inplace=True)

In [1439]:
df3.rename(columns={"can_vwma":"pos_can"}, inplace=True)

In [1440]:
df3.head(3) 

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol,out_mag,signal,break_level,sent_4h,x_vwma,pos_can,neg_can
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown,unknown,Sunday,6,1,2019,22:00:00,evening,unknown,,unknown,unknown,unknown,0.004876,0,0.410618,0.0
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win,win,Sunday,6,1,2019,22:30:00,evening,unknown,,0.367,trade,in_body,0.004876,0,0.0,0.427525
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306,0.331,0.043,no_trade,no_trade,Sunday,6,1,2019,23:00:00,evening,unknown,,no_trade,no_trade,no_trade,0.004876,0,0.0,0.681112


# Detailed Trading Signal
After reviewing the out_mag we need to make a profit and loss feature and then we can also create a more detailed trade class feature. 

The profit and loss feature should be equal to the out_mag feature except negatives for losses should be recorded and positives for wins. 

The more detailed trade class feature should have the following classes: \
1) >8 pip gain == win \
2) >15 pip gain == strong win \
3) >8 pip loss == loss \
4) >15 pip loss == strong loss 

In [1441]:
x = 0  #our current row loc stored in a global variable

def net_result(row):
    global x #let's make our variable global so we can change it's value from inside this function
    
    
    #We'll add a print report to show the progress we're making mid function execute
    if x%1000 == 0: 
        print("processing row {}".format(x))
    
    
    #create variables for each of the rows we may need to iterate over
    high = df3['high'] 
    low = df3['low'] 
    close = df3['close']
    open_ = df3['open']
    trade = df3['trade_class_two']
    
   
    
    long_short = "" #set an empty variable to take a long or short position 
    win_loss = "" #set an empty variable to take a win or loss outcom
    
    if x == 0: 
        x += 1
        return 0 #We can't know what happens at row 0 as there's no earlier data to inform our decision. 
    
    
    #let's first check if the candle presents a valid trading signal?
    if df3.loc[x].direction == df3.loc[x-1].direction:
        x += 1
        return 0
    elif df3.loc[x].direction == "neutral":
        x += 1
        return 0
    elif df3.loc[x].direction == "short":
        long_short = "short"
    else:
        long_short = "long"
    
    #Then let's check if the candle returned a no_score, win or loss. 
    if df3.loc[x].trade_class_two == "no_score":
        x+=1
        return 0
    elif df3.loc[x].trade_class_two == "win":
        win_loss = "win"
    else:
        win_loss = "loss"
        
        
    short_count = 0 # set our loop counter to 0 but this time we'll loop when value is less than 3
    long_count = 0 # same as above
    
    can_list = [] #create an empty container to store our trade sub-outcomes

    '''
    First we'll focus on the short trade signals. We'll rehash the code use in trade_class
    '''
    
    if win_loss == "loss" and long_short == "short":
        x+=1
        return close[x]-high[x]#return the signal candle size + wick as this will be our loss value 
        
    elif win_loss == "loss" and long_short == "long":
        x+=1
        return low[x]-close[x]
            
    
    #Now we find out if a short trade is successful or not based on next 3 candle values
    for i, (j, k, m, n) in enumerate(zip(high[x+1:], low[x+1:], close[x+1:], open_[x+1:])): 
        if len(can_list) < 3:
            if win_loss == "win" and long_short == "short":
                
                can_list.append(abs(close[x] - k))
                
            elif win_loss == "win" and long_short == "long":
                
                can_list.append(abs(j - close[x]))
    
    x+=1       
    
    return max(can_list)
    

In [1442]:
pl = df3.apply(net_result, axis='columns').to_frame()
df3=df3.join(pl)
df3.rename(columns={0:"prof_loss"}, inplace=True)

processing row 0
processing row 1000
processing row 2000
processing row 3000
processing row 4000
processing row 5000
processing row 6000
processing row 7000
processing row 8000
processing row 9000
processing row 10000
processing row 11000
processing row 12000
processing row 13000
processing row 14000
processing row 15000
processing row 16000
processing row 17000
processing row 18000


In [1443]:
df3.head()

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol,out_mag,signal,break_level,sent_4h,x_vwma,pos_can,neg_can,prof_loss
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown,unknown,Sunday,6,1,2019,22:00:00,evening,unknown,,unknown,unknown,unknown,0.004876,0,0.410618,0.0,0.0
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win,win,Sunday,6,1,2019,22:30:00,evening,unknown,,0.367,trade,in_body,0.004876,0,0.0,0.427525,0.367
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306,0.331,0.043,no_trade,no_trade,Sunday,6,1,2019,23:00:00,evening,unknown,,no_trade,no_trade,no_trade,0.004876,0,0.0,0.681112,0.0
3,2019-01-06 23:30:00,138.272,138.333,138.162,138.207,137.268,138.012,137.697588,9983,0.054335,short,0.065,0.061,0.11,win,win,Sunday,6,1,2019,23:30:00,evening,unknown,,0.112,trade,in_body,0.004876,0,0.509412,0.0,0.112
4,2019-01-07 00:00:00,138.207,138.247,138.095,138.224,137.268,138.012,137.772113,10163,0.04955,long,0.017,0.04,0.112,win,loss,Monday,7,1,2019,00:00:00,night,unknown,,0.005,trade,in_body,0.004876,0,0.0,0.451887,-0.005


Now we have an accurate profit and loss class we can create a feature that lists our detailed trade class. 

In [1444]:
def det_trade(row):
    pro_lo = row[-1]
    
    if pro_lo >= 0.08 and pro_lo < 0.16:
        return "win"
    elif pro_lo >= 0.16:
        return "strong win"
    elif pro_lo < 0 and pro_lo > -0.16:
        return "loss"
    elif pro_lo <= -0.16:
        return "strong loss"
    else:
        return 0
        

In [1445]:
detailed = df3.apply(det_trade, axis='columns').to_frame()
df3=df3.join(detailed)
df3.rename(columns={0:"det_trade"}, inplace=True)

In [1446]:
df3.head(10)

Unnamed: 0,date,open,high,low,close,S/R,SR,vwma,volume,sent_30,direction,body_size,top_wick,bottom_wick,trade_class,trade_class_two,day_of_week,day,month,year,time_24h,session,vfive,rel_vol,out_mag,signal,break_level,sent_4h,x_vwma,pos_can,neg_can,prof_loss,det_trade
0,2019-01-06 22:00:00,138.134,138.134,137.919,137.932,137.268,138.012,137.521382,1740,0.07128,short,0.202,0.0,0.215,unknown,unknown,Sunday,6,1,2019,22:00:00,evening,unknown,,unknown,unknown,unknown,0.004876,0,0.410618,0.0,0.0,0
1,2019-01-06 22:30:00,137.932,138.043,137.912,137.966,137.268,138.012,137.538475,2579,0.060658,long,0.034,0.111,0.02,win,win,Sunday,6,1,2019,22:30:00,evening,unknown,,0.367,trade,in_body,0.004876,0,0.0,0.427525,0.367,strong win
2,2019-01-06 23:00:00,137.966,138.297,137.923,138.272,137.268,138.012,137.590888,11475,0.059024,long,0.306,0.331,0.043,no_trade,no_trade,Sunday,6,1,2019,23:00:00,evening,unknown,,no_trade,no_trade,no_trade,0.004876,0,0.0,0.681112,0.0,0
3,2019-01-06 23:30:00,138.272,138.333,138.162,138.207,137.268,138.012,137.697588,9983,0.054335,short,0.065,0.061,0.11,win,win,Sunday,6,1,2019,23:30:00,evening,unknown,,0.112,trade,in_body,0.004876,0,0.509412,0.0,0.112,win
4,2019-01-07 00:00:00,138.207,138.247,138.095,138.224,137.268,138.012,137.772113,10163,0.04955,long,0.017,0.04,0.112,win,loss,Monday,7,1,2019,00:00:00,night,unknown,,0.005,trade,in_body,0.004876,0,0.0,0.451887,-0.005,loss
5,2019-01-07 00:30:00,138.224,138.281,138.145,138.15,137.268,138.012,137.854514,6684,0.042286,short,0.074,0.057,0.079,win,loss,Monday,7,1,2019,00:30:00,night,7188,0.929883,0.014,trade,in_wick,0.004876,0,0.295486,0.0,-0.014,loss
6,2019-01-07 01:00:00,138.15,138.232,138.097,138.218,137.268,138.012,137.944878,7148,0.037044,long,0.068,0.082,0.053,loss,loss,Monday,7,1,2019,01:00:00,night,8176.8,0.874181,0.063,trade,in_body,0.004876,0,0.0,0.273122,-0.063,loss
7,2019-01-07 01:30:00,138.218,138.282,138.024,138.087,137.268,138.012,138.033155,8658,0.028058,short,0.131,0.064,0.194,win,win,Monday,7,1,2019,01:30:00,night,9090.6,0.952412,0.344,trade,out_wick,0.004876,0,0.053845,0.0,0.344,strong win
8,2019-01-07 02:00:00,138.087,138.105,137.961,138.048,137.268,138.012,138.106844,5759,0.01879,short,0.039,0.018,0.126,no_trade,no_trade,Monday,7,1,2019,02:00:00,night,8527.2,0.675368,no_trade,no_trade,no_trade,0.037737,0,0.0,0.058844,0.0,0
9,2019-01-07 02:30:00,138.048,138.074,137.927,137.93,137.268,138.012,138.121546,4309,0.007308,short,0.118,0.026,0.121,no_trade,no_trade,Monday,7,1,2019,02:30:00,night,7682.4,0.560892,no_trade,no_trade,no_trade,0.037737,0,0.0,0.191546,0.0,0


In [1447]:
#A quick look at the split of our detailed trade classes. 
df3.det_trade.value_counts()

0              9990
loss           3899
strong win     2743
win            1807
strong loss     314
Name: det_trade, dtype: int64

In [1448]:
df3.columns

Index(['date', 'open', 'high', 'low', 'close', 'S/R', 'SR', 'vwma', 'volume',
       'sent_30', 'direction', 'body_size', 'top_wick', 'bottom_wick',
       'trade_class', 'trade_class_two', 'day_of_week', 'day', 'month', 'year',
       'time_24h', 'session', 'vfive', 'rel_vol', 'out_mag', 'signal',
       'break_level', 'sent_4h', 'x_vwma', 'pos_can', 'neg_can', 'prof_loss',
       'det_trade'],
      dtype='object')

In [1449]:
df3.columns.value_counts().sum()
#31 features of which we're likely to use around 20-25 in a training set

33

In [1450]:
#We'll export this csv as a draft
df3.to_csv("gj_cleandraft.csv", index=False)