This is the second post in a series on how to create an AI (artificial intelligence) trading bot. If you want to revisit the first post, go [here]().
We try to make each post self-contained, so here is a recap of the problem we are solving. We are creating a smart trading bot using AI, which we will call `chambot`. The steps involved are shown in the flow diagram below.

![](../images/flow_diagram.png)


* We already discussed how to fetch data from Binance [here](https://chambox.github.io/1_AI_trading_bot_fetch_data/) (1.); this was the first post in the series.
* This post is about point 2., creating a target variable.

We will cast our problem into a trinomial setting, with the three states being the price goes up ($u$), stays constant ($c$) or goes down ($d$). Note that you will want to buy at low prices and sell at a high price. That is, buy at $d$, do nothing at $c$ and sell at $u$. 
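The buy/hold/sell rule above can be written down as a small lookup. This is only a sketch: the names `ACTIONS` and `action_for` are hypothetical and not part of chambot's code, and the list of predicted states is made up for illustration.

```python
# Hypothetical mapping from a predicted state to a trading action:
# buy when the price is low (d), do nothing when it is flat (c),
# sell when it is high (u).
ACTIONS = {"d": "buy", "c": "hold", "u": "sell"}


def action_for(state: str) -> str:
    """Return the action the bot would take for a predicted state."""
    return ACTIONS[state]


# Made-up sequence of predicted states, for illustration only.
predicted_states = ["d", "c", "c", "u"]
actions = [action_for(s) for s in predicted_states]
print(actions)
```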

Correctly predicting the states $u,c,d$ should, in principle, make for a very profitable bot. The aim of this post is to show you how to create the target variable $Y$, which takes the three states $u,c$ and $d$. Let's get started by loading all the necessary libraries.

### Libraries


In [1]:
# plotly 
import plotly.graph_objects as go 
import plotly.express as px
import plotly.offline as py_offline

# numpy and pandas 
import numpy as np 
import pandas as pd

# other libraries
from scipy.signal import find_peaks, peak_prominences
import ta.trend as td

### Get the data 
In blog post 1., we fetched and downloaded the Ethereum data. We used the symbol `ETHUSDT` (the Ethereum price with respect to the stablecoin USDT) at `15m` (15-minute) time steps. We will read that data in here; adjust the path below to wherever you saved the `ETHUSDT` data.

In [2]:
# get the symbol
symbol = 'ETHUSDT'
#get time steps 
time = '15m'

# read the raw data (edit this line with your own path)
path = f'{symbol}-{time}-data.csv'
raw_data = pd.read_csv(path)
raw_data.set_index('timestamp',inplace=True)

# visualise the first few rows and columns
print(raw_data.iloc[:,0:4].head().to_markdown())

| timestamp           |   open |   high |    low |   close |
|:--------------------|-------:|-------:|-------:|--------:|
| 2017-08-17 04:00:00 | 301.13 | 301.13 | 298    |  298    |
| 2017-08-17 04:15:00 | 298    | 300.8  | 298    |  299.39 |
| 2017-08-17 04:30:00 | 299.39 | 300.79 | 299.39 |  299.6  |
| 2017-08-17 04:45:00 | 299.6  | 302.57 | 299.6  |  301.61 |
| 2017-08-17 05:00:00 | 301.61 | 302.57 | 300.95 |  302.01 |


There are multiple price columns: open, high, low and close. We will use the closing price to compute $u,c$ and $d$. Instead of working with the raw closing prices, we will apply a very basic smoothing technique called the moving average. That is, the smoothed version of the current price is the average of the prices over the last $n$ time steps (here, 15-minute steps). We refer to $n$ as the period; we have found $n=7$ to be effective, so we will use it.

In [3]:
# get the closing price and compute the 7-period moving average (MA)
price = raw_data['close'].rolling(7).mean()
# drop the first 6 time steps, for which the MA cannot be
# computed (fewer than 7 observations are available)
index_to_drop = price.index[0:6]
price = price.drop(index_to_drop)
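To see why the leading values have to be dropped, here is a minimal sketch on a made-up series with a period of 3 instead of 7. The names `toy`, `smooth` and `smooth_clean` are purely illustrative.

```python
import pandas as pd

# Made-up prices, for illustration only.
toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# A 3-period moving average needs 3 observations, so the first
# 2 values cannot be computed and come back as NaN.
smooth = toy.rolling(3).mean()
print(smooth)

# Dropping the NaN values keeps only the time steps with a full window.
smooth_clean = smooth.dropna()
print(smooth_clean.tolist())
```

With a period of $n$, the first $n-1$ values are missing, which is why six rows are dropped for $n=7$. Using `dropna()` is an alternative to dropping by index, assuming the raw closing prices themselves contain no missing values.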

Let's then visualise the data for the first 500 time steps.

In [4]:
dt = price.head(500)
fig = go.Figure()
trace1 = go.Scatter(y=dt,
                    x=dt.index,
                    name='closing price')

trace2 = go.Scatter(y=(dt[dt == dt.max()]),
                    x=(dt[dt == dt.max()]).index,
                    name='Max price')
trace3 = go.Scatter(y=(dt[dt == dt.min()]),
                    x=(dt[dt == dt.min()]).index,
                    name='Min price')
data = [trace1,trace2,trace3]
# py_offline.plot(data, filename='basic-line', include_plotlyjs=False, output_type='div')
py_offline.iplot(data)

We see that in the first 500 steps, the difference between the minimum and the maximum price is approximately 55 dollars. The graph also marks the maximum and minimum values of the price. Over this time horizon (500 time steps), a smart bot that correctly predicted the minimum and the maximum price of Ethereum would have made a profit of about 55 dollars per Ethereum.

### Detrending the data 

Let's look at the data for a longer time horizon.

In [5]:
dt = price.head(10000)
fig = go.Figure()
trace1 = go.Scatter(y=dt,
                    x=dt.index,
                    name='closing price')
data = [trace1]
# py_offline.plot(data, filename='basic-line', include_plotlyjs=False, output_type='div')
py_offline.iplot(data)

Notice that the price of Ethereum starts trending upward after September 17. Our trading bot is not interested in trends; chambot is only interested in the volatility of the price. We want to buy at a low and sell at a high as frequently as possible, making a profit on each transaction (after deducting the transaction fees). The next chunk of code detrends the data using the DPO (Detrended Price Oscillator) indicator. In our example we use a DPO of period 20, i.e., each DPO-transformed value is computed from the prices of the past 20 time steps. As we saw with the moving average, this means that the leading values after the DPO transformation are missing (the first 19, as the output below shows).

In [6]:
price_dpo = td.DPOIndicator(price,20).dpo()
print(price_dpo.head(25).to_markdown())


| timestamp           |      dpo_20 |
|:--------------------|------------:|
| 2017-08-17 05:30:00 | nan         |
| 2017-08-17 05:45:00 | nan         |
| 2017-08-17 06:00:00 | nan         |
| 2017-08-17 06:15:00 | nan         |
| 2017-08-17 06:30:00 | nan         |
| 2017-08-17 06:45:00 | nan         |
| 2017-08-17 07:00:00 | nan         |
| 2017-08-17 07:15:00 | nan         |
| 2017-08-17 07:30:00 | nan         |
| 2017-08-17 07:45:00 | nan         |
| 2017-08-17 08:00:00 | nan         |
| 2017-08-17 08:15:00 | nan         |
| 2017-08-17 08:30:00 | nan         |
| 2017-08-17 08:45:00 | nan         |
| 2017-08-17 09:00:00 | nan         |
| 2017-08-17 09:15:00 | nan         |
| 2017-08-17 09:30:00 | nan         |
| 2017-08-17 09:45:00 | nan         |
| 2017-08-17 10:00:00 | nan         |
| 2017-08-17 10:15:00 |  -2.05914   |
| 2017-08-17 10:30:00 |  -1.77843   |
| 2017-08-17 10:45:00 |  -1.45386   |
| 2017-08-17 11:00:00 |  -0.964357  |
| 2017-08-17 11:15:00 |  -0.283071  |
| 2017-08-17

We now plot the DPO  transformed prices.

In [7]:
dt = price_dpo.iloc[0:100]
fig = go.Figure()
trace = go.Scatter(y=dt,x=dt.index,name='DPO price')
data = [trace]
# py_offline.plot(data, filename='basic-line', include_plotlyjs=False, output_type='div')
py_offline.iplot(data)
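To build some intuition for what the DPO does, here is a minimal sketch of the commonly quoted DPO formula in plain `pandas`: the price shifted back by $n/2+1$ steps minus the $n$-period moving average. We are assuming that `ta`'s `DPOIndicator` follows this definition; check the library documentation before relying on it. The helper `dpo_sketch` is hypothetical and is not used elsewhere in this post.

```python
import numpy as np
import pandas as pd


def dpo_sketch(series: pd.Series, window: int = 20) -> pd.Series:
    """Textbook DPO: shifted price minus the moving average.

    Assumption: this mirrors the usual definition of the indicator,
    not necessarily the exact implementation in the `ta` library.
    """
    shift = window // 2 + 1
    return series.shift(shift) - series.rolling(window).mean()


# A made-up series that is a pure upward trend (0, 1, 2, ...).
trend = pd.Series(np.arange(100, dtype=float))
detrended = dpo_sketch(trend, window=20)

# The trend is gone: every non-missing value is the same constant.
print(detrended.dropna().unique())
# The first window - 1 values are missing.
print(int(detrended.isna().sum()))
```

On a pure linear trend the result is flat, which is the behaviour we want: only the oscillations around the trend survive the transformation.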

Let's not lose track of our goal: we want to identify peaks ($u$), plains ($c$) and valleys ($d$).

### Compute peaks and valleys
Here we use `scipy`'s `find_peaks` function. Valleys are found by applying the same function to the negated series. We will then assign the value `1` to peaks and `2` to valleys.

In [8]:
# Compute peaks and valleys
peaks, _ = find_peaks(price_dpo)
valleys, _ = find_peaks(price_dpo*-1)


# Assign 2 to  valleys and call the new variable 'valley_status' 
valleys_val = pd.Series(np.repeat(2,len(valleys)))
valleys_val.index = price.index[valleys] 
valleys_val.name = 'valley_status'



# Assign 1 to peaks and call the new variable 'peak_status'
peaks_val = pd.Series(np.repeat(1,len(peaks)))
peaks_val.index = price.index[peaks] 
peaks_val.name = 'peak_status'
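The peak and valley detection is easier to follow on a tiny example. The sketch below uses a made-up array; `toy`, `toy_peaks` and `toy_valleys` are illustrative names only.

```python
import numpy as np
from scipy.signal import find_peaks

# Made-up signal, for illustration only.
toy = np.array([0.0, 2.0, 1.0, 3.0, 0.0, -1.0, 1.0])

# A peak is a sample that is strictly larger than both neighbours.
toy_peaks, _ = find_peaks(toy)

# Valleys are the peaks of the negated signal.
toy_valleys, _ = find_peaks(-toy)

print(toy_peaks)    # positions of the local maxima
print(toy_valleys)  # positions of the local minima
```

Note that `find_peaks` returns integer positions, not timestamps, which is why the code above looks them up in `price.index`. The first and last samples are never reported, since they have only one neighbour.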

### Get prominences

The prominence (or relative height) measures the height of a mountain or hill's summit relative to the lowest contour line encircling it but containing no higher summit within it. This tells us whether a $u$ is high enough to sell and a $d$ is low enough to buy.

In [9]:
# get the prominence of each peak and valley
# the prominence, also referred to as relative height,
# measures the height of a mountain or
# hill's summit relative to the lowest contour
# line encircling it but containing no higher summit within it.
prominences_peaks = peak_prominences(price_dpo, peaks)[0]
prominences_valleys = peak_prominences(price_dpo*-1, valleys)[0]

# assign original IDs to the prominence  valley values
p_valley_val = pd.Series(prominences_valleys)
p_valley_val.index = price.index[valleys] 
p_valley_val.name = 'value_prominence'


# assign original IDs to the prominence  peak values
p_peak_val = pd.Series(prominences_peaks)
p_peak_val.index = price.index[peaks] 
p_peak_val.name = 'peak_prominence'
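Here is a small sketch of how prominences could be used to keep only the peaks that stand out. The signal is made up, and the threshold `min_prominence` is a hypothetical value chosen for illustration; this post does not fix a threshold.

```python
import numpy as np
from scipy.signal import find_peaks, peak_prominences

# Made-up signal, for illustration only.
toy = np.array([0.0, 2.0, 1.0, 3.0, 0.0, -1.0, 1.0])

toy_peaks, _ = find_peaks(toy)
toy_prominences = peak_prominences(toy, toy_peaks)[0]
print(toy_prominences)

# Hypothetical threshold: keep only peaks that rise at least
# this much above their surroundings.
min_prominence = 2.0
strong_peaks = toy_peaks[toy_prominences >= min_prominence]
print(strong_peaks)
```

The peak at position 1 has height 2 but only rises 1 above the dip that separates it from the higher peak next to it, so its prominence is small. The peak at position 3 dominates the whole signal and gets a prominence of 3.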

Finally, we combine everything into a single dataset and create the final trinomial variable `state`, which takes the value 0 (neither a peak nor a valley, $c$), 1 (peak, $u$) or 2 (valley, $d$).

In [11]:
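# join the smoothed price, the DPO, the peak/valley flags and the prominences
df = pd.DataFrame(price).join(price_dpo).join(peaks_val).join(valleys_val).join(p_valley_val).join(p_peak_val)
# time steps that are neither a peak nor a valley have no flag: set them to 0
# (this also sets the missing DPO values at the start to 0)
df.fillna(0, inplace=True)
# a time step cannot be both a peak and a valley, so the sum is 0, 1 or 2
df['state'] = df.peak_status + df.valley_status
df.head()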

| timestamp           |   close |   dpo_20 |   peak_status |   valley_status |   value_prominence |   peak_prominence |   state |
|:--------------------|--------:|---------:|--------------:|----------------:|-------------------:|------------------:|--------:|
| 2017-08-17 05:30:00 | 300.871429 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2017-08-17 05:45:00 | 301.6      | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2017-08-17 06:00:00 | 302.06     | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2017-08-17 06:15:00 | 302.561429 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2017-08-17 06:30:00 | 302.847143 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
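The first rows are all zeros because no peak or valley occurs that early in the series. To see all three states at once, here is a minimal sketch that builds the same kind of `state` variable on a made-up series and maps the codes back to $u$, $c$ and $d$. All names here (`toy`, `toy_state`, `labels`) are illustrative, and the direct assignment by position is a shortcut, not the join-based approach used above.

```python
import pandas as pd
from scipy.signal import find_peaks

# Made-up detrended prices, for illustration only.
toy = pd.Series([0.0, 2.0, 1.0, 3.0, 0.0, -1.0, 1.0])

toy_peaks, _ = find_peaks(toy.to_numpy())
toy_valleys, _ = find_peaks(-toy.to_numpy())

# Start with 0 everywhere (c), then mark peaks as 1 (u)
# and valleys as 2 (d).
toy_state = pd.Series(0, index=toy.index, name="state")
toy_state.iloc[toy_peaks] = 1
toy_state.iloc[toy_valleys] = 2

# Map the numeric codes back to the three states.
labels = toy_state.map({0: "c", 1: "u", 2: "d"})
print(labels.tolist())
```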
