# Linear Regression For Dynamic Stop-Loss/Liquidation

This test is broken into the following parts:

- [Collecting data](#collecting-data)
- [Build the DataFrame](#build-the-dataframe)
- [Run linear regression on the data](#run-linear-regression-on-the-data)

The goal is to collect 288 5-minute candles (representing 24 hours) for each sample, with samples spanning the following ranges of price change:

- **Capitulation**  (< -8%)
- **Bear**          (-4% to -8%)
- **Slightly bear** (-1% to -4%)
- **Flat**          (-1% to 1%)
- **Slightly bull** (1% to 4%)
- **Bull**          (4% to 8%)
- **Parabolic**     (> 8%)
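
The ranges above can be expressed as a small lookup. The following is a minimal sketch and not part of the original notebook: the function name `classify_regime` is hypothetical, and assigning a value that falls exactly on a boundary to the milder category is an assumption.

```python
def classify_regime(pct_change: float) -> str:
    """Map a 24-hour percentage price change to one of the seven categories.

    Hypothetical helper: boundary values go to the milder category, which is
    an assumption (the ranges do not say where boundary values belong).
    """
    if pct_change < -8:
        return "Capitulation"
    if pct_change < -4:
        return "Bear"
    if pct_change < -1:
        return "Slightly Bear"
    if pct_change <= 1:
        return "Flat"
    if pct_change <= 4:
        return "Slightly Bull"
    if pct_change <= 8:
        return "Bull"
    return "Parabolic"
```

The returned strings match the column labels used for `main_df` later in the notebook, so the result could be used directly as a column key.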

Then we run a linear regression over each sample and take the mean of the resulting slopes for each category.
Using these means, we can set the stop-loss level dynamically, as a percentage that depends on which category the market is currently in.
For instance, if the bot sees parabolic price action and gets a signal to short, the trade is risky because the signal may have fired too soon. We would therefore use a much tighter stop-loss than if it had been signaled to long.

These are the data samples we will use. All are taken from Bitcoin price action in January and February 2022:

|   | flat                     | slightly bull          | bull                        | parabolic bull           | slightly bear            | bear                     | capitulation              |
|---|--------------------------|------------------------|-----------------------------|--------------------------|--------------------------|--------------------------|---------------------------|
|   | (-1 to 1%)               | (2 to 4%)              | (5 to 8%)                   | (>8%)                    | (-2 to -4%)              | (-5 to -8%)              | (< -8%)                   |
| 1 | 12/13 Feb 02:10 (-0.52%) | 11/12 Feb 19:15 (2%)   | 14/15 Feb 19:30 (5.33%)     | 27/28 Feb 23:05 (15.9%)  | 15/16 Jan 20:30 (-1.48%) | 09/10 Jan 14:25 (-4.16%) | 20/21 Jan 15:55 (-10.79%) |
| 2 | 08/09 Feb 22:10 (0.68%)  | 09/10 Feb 02:50 (2.2%) | 25/26 Jan 15:15 (5.07%)     | 24/25 Feb 05:45 (12.36%) | 16/17 Jan 16:30 (-2.47%) | 23/24 Jan 11:25 (-7.7%)  | 20/21 Jan 21:55 (-11.31%) |
| 3 | 12/13 Feb 01:50 (0.03%)  | 09/10 Feb 16:45 (3.5%) | 11/12 Jan 15:00 (5.86%)     | 24/25 Jan 17:20 (9.8%)   | 17/18 Jan 07:35 (-1.64%) | 21/22 Jan 21:25 (-8.11%) | 23/24 Feb 05:45 (-10.11%) |
| 4 | 05/06 Feb 06:50 (-0.3%)  | 06/07 Feb 07:45 (3.4%) | 31 Jan/01 Feb 09:15 (5.09%) | 04/05 Feb 07:05 (9.8%)   | 13/14 Jan 00:20 (-3.33%) | 26/27 Jan 19:20 (-6.06%) | 16/17 Feb 19:55 (-7.17%)  |
| 5 | 01/02 Feb 05:55 (-0.5%)  | 28/29 Jan 19:00 (2.8%) | 06/07 Feb 15:15 (5.00%)     | 04/05 Feb 07:05 (9.8%)   | 09/10 Jan 19:45 (-2.63%) | 26/27 Jan 06:15 (-4.62%) | 21/22 Jan 21:25 (-8.11%)  |


The first step will be to import all the relevant libraries and set any matplotlib parameters:

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import datetime
import time
import dateutil.parser
import requests
from typing import List, Dict

# from sklearn.linear_model import LinearRegression 
from scipy.stats import linregress, zscore

%matplotlib inline
plt.rcParams['figure.figsize'] = [17, 7]

# Collecting Data

Next, we build a function that collects our data from BitMEX using its REST API. The function is called once for every sample in every category, so the calling loop sleeps between requests to avoid hitting the rate limit.

In [2]:
def collect_data(from_time: List[int], to_time: List[int], contract: str = "ETHUSD"):
    """Fetch 5-minute candles from BitMEX between two times.

    from_time and to_time are [year, month, day, hour, minute] lists.
    Returns a dict with a 0-based candle index under 'timestamp' and the
    z-scored closing prices under 'close', or None if the request fails.
    """

    url = "https://www.bitmex.com/api/v1/trade/bucketed"
    candles = {'timestamp': [], 'close': []}

    # Set inputs
    data = dict()
    data['symbol'] = contract
    data['partial'] = True  # returns a candle if it is not finished yet
    data['binSize'] = "5m"
    data['count'] = 500   # how many candles we can return (500 max)
    data['reverse'] = True  # newest candles first

    # Format times
    year, month, day, hour, minute = from_time
    data["startTime"] = datetime.datetime(year=year, month=month, day=day, hour=hour, minute=minute)
    year, month, day, hour, minute = to_time
    data["endTime"] = datetime.datetime(year=year, month=month, day=day, hour=hour, minute=minute)

    try:
        response = requests.get(url, params=data)
    except Exception as e:
        print(f"Connection error while making GET request to {url}: {e}")
        return None

    if response.status_code == 200:
        raw_candles = response.json()
    else:
        print(f"Error while making GET request to {url}: {response.status_code}")
        print(response.headers)
        return None

    if raw_candles is not None:
        # Candles arrive newest first, so reverse them into chronological order
        for idx, c in enumerate(reversed(raw_candles)):
            candles['timestamp'].append(idx)
            candles["close"].append(c["close"])

        # Normalize the closes so slopes are comparable across price levels
        candles['close'] = zscore(candles['close'])

    return candles
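
`collect_data` z-scores the closing prices before returning them. As a sketch of what that normalization does, the pure-Python function below subtracts the mean and divides by the population standard deviation, which corresponds to the default (`ddof=0`) behavior of `scipy.stats.zscore`. The name `zscore_plain` and the sample prices are hypothetical and only for illustration.

```python
import statistics
from typing import List, Sequence


def zscore_plain(values: Sequence[float]) -> List[float]:
    """Return (value - mean) / population standard deviation for each value."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (ddof=0)
    return [(v - mean) / std for v in values]


# Made-up closing prices, not taken from the samples in this notebook
closes = [100.0, 102.0, 104.0]
print(zscore_plain(closes))
```

After this step every sample has mean 0 and standard deviation 1, so the regression slope no longer depends on the absolute price level. Note that this sketch raises `ZeroDivisionError` for a perfectly constant series.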

# Build the DataFrame

Construct an empty DataFrame with the column labels, ready for our data to be inserted.

In [3]:

labels = ["Capitulation", "Bear", "Slightly Bear", "Flat", "Slightly Bull", "Bull", "Parabolic"]
main_df = pd.DataFrame(columns=labels)

main_df

|   | Capitulation | Bear | Slightly Bear | Flat | Slightly Bull | Bull | Parabolic |
|---|--------------|------|---------------|------|---------------|------|-----------|


Set the variables for each dataset. Each list contains five tuples, one per sample. Each tuple holds a start time and an end time, and each of those is a list of the following integer values:
- year
- month
- day
- hour
- minute
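
Because every sample covers exactly 24 hours, the end time can be derived from the start time instead of being typed by hand. A minimal sketch, assuming a hypothetical helper named `day_window` that is not part of the original notebook:

```python
import datetime
from typing import List, Tuple


def day_window(start: List[int]) -> Tuple[List[int], List[int]]:
    """Return (start, end) where end is exactly 24 hours after start.

    Both values are [year, month, day, hour, minute] lists.
    """
    begin = datetime.datetime(*start)
    end = begin + datetime.timedelta(hours=24)
    return start, [end.year, end.month, end.day, end.hour, end.minute]


# A start time whose window crosses a month boundary
print(day_window([2022, 1, 31, 9, 15]))
# ([2022, 1, 31, 9, 15], [2022, 2, 1, 9, 15])
```

Deriving the end time this way avoids mistakes when a window crosses a month boundary.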

In [4]:
capitulation = [([2022,1,20,14,25],[2022,1,21,14,25]), ([2022,1,20,21,55],[2022,1,21,21,55]), ([2022,2,23,5,45],[2022,2,24,5,45]),
                ([2022,2,16,19,55],[2022,2,17,19,55]), ([2022,1,21,21,25],[2022,1,22,21,25])]

bear = [([2022,1,9,14,25],[2022,1,10,14,25]), ([2022,1,23,11,25],[2022,1,24,11,25]), ([2022,1,21,21,25],[2022,1,22,21,25]),
        ([2022,1,26,19,20],[2022,1,27,19,20]), ([2022,1,26,6,15],[2022,1,27,6,15])]

slightly_bear = [([2022,1,15,20,30],[2022,1,16,20,30]), ([2022,1,16,16,30],[2022,1,17,16,30]), ([2022,1,17,7,35],[2022,1,18,7,35]),
                ([2022,1,13,0,20],[2022,1,14,0,20]), ([2022,1,9,19,45],[2022,1,10,19,45])]

flat = [([2022,2,12,2,10],[2022,2,13,2,10]), ([2022,2,8,22,10],[2022,2,9,22,10]), ([2022,2,12,1,50],[2022,2,13,1,50]),
        ([2022,2,5,6,50],[2022,2,6,6,50]), ([2022,2,1,5,55],[2022,2,2,5,55])]

slightly_bull = [([2022,2,11,19,15],[2022,2,12,19,15]), ([2022,2,9,2,50],[2022,2,10,2,50]), ([2022,2,9,16,45],[2022,2,10,16,45]),
                ([2022,2,6,7,45],[2022,2,7,7,45]), ([2022,1,28,19,0],[2022,1,29,19,0])]

bull = [([2022,2,14,19,30],[2022,2,15,19,30]), ([2022,1,25,15,15],[2022,1,26,15,15]), ([2022,1,11,15,0],[2022,1,12,15,0]),
        ([2022,1,31,9,15],[2022,2,1,9,15]), ([2022,2,6,15,15],[2022,2,7,15,15])]

parabolic = [([2022,2,27,23,5],[2022,2,28,23,5]), ([2022,2,24,5,45],[2022,2,25,5,45]), ([2022,1,24,17,20],[2022,1,25,17,20]),
             ([2022,2,4,7,5],[2022,2,5,7,5]), ([2022,2,4,7,5],[2022,2,5,7,5])]

# All data for looping
all_data_inputs = [capitulation, bear, slightly_bear, flat, slightly_bull, bull, parabolic]

# Run linear regression on the data

We call `collect_data` for each of the time frames in our variables and run linear regression over the data points, storing the resulting slopes in a list to be inserted into `main_df`.

In [5]:
for data_set, label in zip(all_data_inputs, labels):
    results = []
    for d in data_set:
        while True:
            # Retry after a pause if the request failed (e.g. rate limited)
            collected_data = collect_data(d[0], d[1])
            if collected_data is None:
                time.sleep(4)
            else:
                break
        # Time
        x = np.array(collected_data['timestamp'])
        # Price
        y = np.array(collected_data['close'])
        # Linear Regression
        model = linregress(x, y)
        # Slope
        slope = model.slope
        results.append(slope)
        time.sleep(2)
    main_df[label] = results

main_df

|   | Capitulation | Bear      | Slightly Bear | Flat      | Slightly Bull | Bull     | Parabolic |
|---|--------------|-----------|---------------|-----------|---------------|----------|-----------|
| 0 | -0.01128     | -0.004578 | 0.002505      | -0.000781 | -0.003532     | 0.011255 | 0.010353  |
| 1 | -0.010691    | -0.006568 | -0.010846     | 0.010356  | 0.011157      | 0.008755 | 0.011043  |
| 2 | -0.009736    | -0.010737 | -0.009826     | -0.000995 | -0.002888     | 0.008515 | 0.004634  |
| 3 | -0.011207    | -0.001026 | -0.008805     | 0.001317  | 0.008065      | 0.011146 | 0.011212  |
| 4 | -0.010737    | -0.00575  | -0.010189     | 0.00586   | 0.007905      | 0.010627 | 0.011212  |
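
The `slope` attribute returned by `linregress` is the ordinary least-squares slope, that is, the covariance of x and y divided by the variance of x. The sketch below computes it in plain Python to make that explicit. `ols_slope` is a hypothetical name, and the sample values are made up for illustration rather than taken from the data above.

```python
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y against x: cov(x, y) / var(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x


# Illustrative values only: y rises by 0.5 for every step in x
x = [0, 1, 2, 3]
y = [1.0, 1.5, 2.0, 2.5]
print(ols_slope(x, y))  # 0.5
```

Since x is the candle index and y is the z-scored close, each slope in the table is measured in standard deviations of price per 5-minute candle.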


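Averaging the slopes in each column gives one value per category, which we can use to assign stop-loss percentages. The slopes are multiplied by 1000 first to make them easier to read.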

In [6]:
df = main_df.copy()
df = df * 1000
df = df.round(2)
print(df, "\n\n")

for column_name in labels:
    print(f"{column_name} => {round(df[column_name].mean(), 2)}")

   Capitulation   Bear  Slightly Bear   Flat  Slightly Bull   Bull  Parabolic
0        -11.28  -4.58           2.50  -0.78          -3.53  11.26      10.35
1        -10.69  -6.57         -10.85  10.36          11.16   8.76      11.04
2         -9.74 -10.74          -9.83  -1.00          -2.89   8.52       4.63
3        -11.21  -1.03          -8.81   1.32           8.06  11.15      11.21
4        -10.74  -5.75         -10.19   5.86           7.91  10.63      11.21 


Capitulation => -10.73
Bear => -5.73
Slightly Bear => -7.44
Flat => 3.15
Slightly Bull => 4.14
Bull => 10.06
Parabolic => 9.69
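
One way to turn these means into stop-loss levels, following the idea that a trade against the prevailing trend should get a tighter stop, is sketched below. This is an illustration under stated assumptions, not the notebook's method: the function name `stop_loss_pct`, the `base_pct` of 2% and the `tighten` factor of 0.5 are all hypothetical placeholders. Only the mean slopes are taken from the output above.

```python
# Mean slopes (x1000) copied from the output above
MEAN_SLOPES = {
    "Capitulation": -10.73,
    "Bear": -5.73,
    "Slightly Bear": -7.44,
    "Flat": 3.15,
    "Slightly Bull": 4.14,
    "Bull": 10.06,
    "Parabolic": 9.69,
}


def stop_loss_pct(category: str, side: str,
                  base_pct: float = 2.0, tighten: float = 0.5) -> float:
    """Return a stop-loss distance in percent for a trade signal.

    side is "long" or "short". base_pct and tighten are hypothetical
    values chosen only for illustration.
    """
    if side not in ("long", "short"):
        raise ValueError(f"Unknown side: {side}")
    slope = MEAN_SLOPES[category]
    # A short in a rising market or a long in a falling one runs against the trend
    against_trend = (side == "short" and slope > 0) or (side == "long" and slope < 0)
    return base_pct * tighten if against_trend else base_pct


print(stop_loss_pct("Parabolic", "short"))  # 1.0 (tighter, against the trend)
print(stop_loss_pct("Parabolic", "long"))   # 2.0
```

A fuller version might scale the stop by the magnitude of the mean slope rather than only its sign. That choice is left open here.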
