# Importing libraries

In [1]:
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import numpy as np
import pandas as pd

from useful_functions import *  # our own set of functions

# Importing data

In [2]:
lin_reg = pd.Series(pd.read_pickle('output/lin_reg.pkl'), name='Linear regression')
log_reg = pd.Series(pd.read_pickle('output/log_reg.pkl'), name='Logistic regression')
rf = pd.Series(pd.read_pickle('output/RandomForestModel.pkl'), name='Random Forest')

returns = pd.read_pickle('input/stock_returns.pkl')
df = pd.read_pickle('input/df.pkl')

# Simple strategies: benchmarks

To benchmark our strategies, we compute equity lines for two simple strategies:
- naive strategy,
- buy and hold.

## Naive strategy ($\hat{y}_{t+1}=y_t$)

We forecast the direction of the next day's stock price movement with the direction observed on the most recent day.

In [3]:
naive = pd.Series(df['Signal'].shift(1).dropna(), name='Naive strategy')
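To make the one-day shift concrete, here is a toy example on a hypothetical four-day `Signal` series. The dates and values are made up for illustration; the call mirrors the `shift(1).dropna()` line above.

```python
import pandas as pd

# Hypothetical four-day 'Signal' column (made-up values, not the project data)
signal = pd.Series(
    ["up", "down", "down", "up"],
    index=pd.date_range("2021-01-04", periods=4, freq="D"),
    name="Signal",
)

# Forecast for day t+1 is the direction observed on day t;
# the first day has no earlier observation, so its NaN is dropped.
naive = pd.Series(signal.shift(1).dropna(), name="Naive strategy")
print(naive)
```

The forecast series is one day shorter than the signal: the first day has nothing to copy from, and the last observed direction would forecast a day outside the sample.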

## Buy and hold

In this simple strategy we buy the stock on the first day and hold it until the end of the testing period, which is equivalent to an 'up' signal on every day.

In [4]:
bnh = pd.Series(np.repeat('up', len(df.index)), index=df.index, name='Buy and hold')

## Results EDA

In [5]:
results = pd.concat([lin_reg, log_reg, rf, naive, bnh, df['Signal']], axis=1).dropna()
results = results[['Buy and hold', 'Naive strategy', 'Signal', 'Linear regression', 'Random Forest', 'Logistic regression']]

In [6]:
fig = make_subplots(rows=2, cols=3, subplot_titles=results.columns, vertical_spacing=0.05,
                    specs=[[{'type': 'pie'}, {'type': 'pie'}, {'type': 'pie'}],
                           [{'type': 'pie'}, {'type': 'pie'}, {'type': 'pie'}]])

for i in range(len(results.columns)):
    vc = results[results.columns[i]].value_counts()
    labels = vc.index
    values = vc.values

    fig.add_trace(go.Pie(labels=labels, values=values, textinfo='percent+label', hole=0.3,
                         marker=dict(line=dict(color='#000000', width=1.5))),
                  row=(i // 3) + 1, col=(i % 3) + 1)

fig.update_layout(height=800, width=1000, title_text="SIGNAL FROM EACH STRATEGY", template="plotly_dark",)
fig.show()


Conclusion: our econometric strategies tend to overestimate stock performance (they produce fewer 'down' signals). The random forest approach preserves the class proportions, with the exception of the 'same' class.

# Performance

## Performance metrics

Because our goal was not a model that minimizes prediction error but one that performs best on the market, we did not use classification metrics such as accuracy, recall or precision. Instead, both for hyperparameter tuning and for describing the results we use the **Information ratio\*\***, a metric that summarizes model performance by combining the simpler metrics described below.

### Simple metrics

All metrics are calculated from either the simple returns or the equity line, defined as:

$X(t) = K \cdot \prod_{i=1}^{t} (1 + r_i)$ 

where: \
$X(t)$ - the portfolio value on day $t$, \
$K$ - the invested capital (here $K = 1$), \
$T$ - the number of trading days in the testing period, \
$r_t$ - the strategy return on day $t$. 
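A minimal pandas sketch of the equity-line formula, assuming the daily strategy returns are already available as a Series. `equity_line` is a hypothetical helper for illustration, not the project's `get_eqline` from `useful_functions`, whose internals (for example, how the 'up'/'down'/'same' signals are turned into returns) are not shown here.

```python
import pandas as pd


def equity_line(strategy_returns: pd.Series, capital: float = 1.0) -> pd.Series:
    """Portfolio value X(t) = K * prod_{i=1..t} (1 + r_i) for every day t."""
    return capital * (1.0 + strategy_returns).cumprod()


# Hypothetical daily strategy returns r_t (illustration only)
r = pd.Series([0.01, -0.02, 0.03])
x = equity_line(r)  # K = 1, as in the text
print(x.round(6).tolist())
```

`cumprod` evaluates the running product for every $t$ at once, so the whole equity line comes from a single vectorized call.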


**Annualized rate of return**

$ARC\% = \left(\left(\prod_{t=1}^{T} (1 + r_t)\right)^{\frac{252}{T}} - 1\right) \cdot 100\%$
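A minimal NumPy sketch of ARC%, assuming 252 trading days per year. It annualizes the cumulative gross return by raising it to the power $252/T$. The function name `arc_pct` is hypothetical and is not the `PerformanceMetrics.ARC` method used later.

```python
import numpy as np


def arc_pct(daily_returns, days_per_year: int = 252) -> float:
    """Annualized rate of return in percent, assuming 252 trading days per year."""
    r = np.asarray(daily_returns, dtype=float)
    growth = np.prod(1.0 + r)  # X(T) / K, the cumulative gross return
    return float((growth ** (days_per_year / len(r)) - 1.0) * 100.0)


# 10% total gain earned over half a year (126 trading days)
half_year = [0.10] + [0.0] * 125
print(arc_pct(half_year))
```

Over exactly 252 days the exponent equals 1 and ARC% is the plain cumulative return. Over 126 days a 10% gain compounds to roughly 21% per year.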


**Annualized standard deviation**

$ASD\% = \sqrt{\frac{252}{T} \sum_{t=1}^{T} (r_t - \bar{r})^2} \cdot 100\%$

where $\bar{r} = \frac{1}{T} \cdot \sum_{t=1}^{T} r_t$.
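A minimal NumPy sketch of ASD% that follows the formula term by term. Because the sum is divided by $T$ (not $T-1$), the result equals the population standard deviation (`np.std` with its default `ddof=0`) scaled by $\sqrt{252}$. Note that pandas `Series.std()` defaults to `ddof=1` and would give slightly larger values. `asd_pct` is a hypothetical name.

```python
import numpy as np


def asd_pct(daily_returns, days_per_year: int = 252) -> float:
    """Annualized standard deviation in percent: sqrt(252/T * sum (r_t - r_bar)^2)."""
    r = np.asarray(daily_returns, dtype=float)
    squared_dev = (r - r.mean()) ** 2
    return float(np.sqrt(days_per_year / len(r) * squared_dev.sum()) * 100.0)


r = np.array([0.01, -0.01, 0.02, -0.02])
# Same value as the population std (ddof=0) scaled by sqrt(252)
print(asd_pct(r), np.std(r) * np.sqrt(252) * 100.0)
```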


**Maximum drawdown**

$MDD\% = \max_{\tau \in (0, T)}[\max_{t \in (\tau, T)} \frac{X(\tau)-X(t)}{X(\tau)}] \cdot 100\%$
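Evaluating the double maximum literally costs $O(T^2)$. For a fixed trough $t$, the largest relative fall comes from the highest earlier value $X(\tau)$, so a running peak gives the same result in one pass. The sketch below assumes a strictly positive equity line; `mdd_pct` is a hypothetical name.

```python
import numpy as np


def mdd_pct(equity) -> float:
    """Maximum drawdown in percent of an equity line X(t) with positive values."""
    x = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(x)       # max of X(tau) over tau <= t
    drawdown = (running_peak - x) / running_peak  # relative fall from that peak
    return float(drawdown.max() * 100.0)


equity = [1.0, 1.2, 0.9, 1.1, 0.6, 1.3]
print(mdd_pct(equity))  # worst fall is from the 1.2 peak down to 0.6
```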


**Information ratio\***

A simplified version of the Sharpe ratio that omits the risk-free rate. It measures the trade-off between return and volatility.

$IR^{*} = \frac{ARC\%}{ASD\%}$
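As a quick consistency check, the sketch below plugs the buy-and-hold ARC% and ASD% reported in the results table into the IR\* formula. `ir_star` is a hypothetical helper and not the `PerformanceMetrics.IR1` method.

```python
def ir_star(arc_pct: float, asd_pct: float) -> float:
    """IR* = ARC% / ASD%, i.e. annualized return per unit of annualized volatility."""
    return arc_pct / asd_pct


# Buy-and-hold ARC% and ASD% as reported in the results table
print(round(ir_star(25.9154, 28.4784), 4))
```

The result rounds to 0.91, which matches the IR\* reported for buy and hold.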

**Information ratio\*\***

A modified version of IR\* that also accounts for the maximum drawdown.

$IR^{**} = \frac{ARC\% \cdot |ARC\%|}{ASD\% \cdot MDD\%}$
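A minimal sketch of IR\*\* (the hypothetical helper `ir_double_star`, not `PerformanceMetrics.IR2`), checked against the linear regression values in the results table. The second call illustrates why the numerator is $ARC\% \cdot |ARC\%|$ rather than $ARC\%^2$: the sign of ARC% is kept, so a losing strategy receives a negative score.

```python
def ir_double_star(arc_pct: float, asd_pct: float, mdd_pct: float) -> float:
    """IR** = ARC% * |ARC%| / (ASD% * MDD%)."""
    return arc_pct * abs(arc_pct) / (asd_pct * mdd_pct)


# Linear regression ARC%, ASD% and MDD% as reported in the results table
print(round(ir_double_star(28.6183, 26.6185, 33.7388), 4))

# A losing strategy (negative ARC%) keeps a negative score
print(ir_double_star(-10.0, 20.0, 40.0))
```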

First, let's look at our equity lines.

In [7]:
eqlines = pd.DataFrame(columns=results.columns.drop('Signal'), index=results.index)

for column in eqlines.columns:
    eqlines[column] = get_eqline(returns[results.index], results[column])

In [8]:
fig_equity_curve_strategy = go.Figure()

for column in eqlines.columns:
    fig_equity_curve_strategy.add_trace(
        go.Scatter(x=eqlines.index, y=eqlines[column], name=column),
    )

fig_equity_curve_strategy.update_layout(
    title={
        'text': "Equity Lines of our Strategies",
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title="Date",
    yaxis_title="Strategy Value",
    legend_title="Data",
    template="plotly_dark",
    height = 500,
    width = 1000
)

fig_equity_curve_strategy.show()

We can see that the linear and logistic regression strategies have very similar equity lines.

In [9]:
perf_metrics = pd.DataFrame(index=['ARC%', 'ASD%', 'MDD%', 'IR*', 'IR**'], columns=results.columns.drop('Signal'))

In [10]:
pm = PerformanceMetrics(None, None, None, None)  # one instance is reused for every metric

for column in perf_metrics.columns:
    # .loc[row, column] writes into perf_metrics itself; chained indexing
    # (perf_metrics[column]['ARC%'] = ...) does not under pandas copy-on-write
    perf_metrics.loc['ARC%', column] = pm.ARC(eqlines[column])
    perf_metrics.loc['ASD%', column] = pm.ASD(eqlines[column])
    perf_metrics.loc['MDD%', column] = pm.MaximumDrawdown(eqlines[column])
    perf_metrics.loc['IR*', column] = pm.IR1(eqlines[column])
    perf_metrics.loc['IR**', column] = pm.IR2(eqlines[column])

In [11]:
perf_metrics.style.apply(highlight_values_by_index, axis=1).format(precision=4).set_table_styles(TABLE_STYLES)

| Metric | Buy and hold | Naive strategy | Linear regression | Random Forest | Logistic regression |
|---|---|---|---|---|---|
| ARC% | 25.9154 | 10.6558 | 28.6183 | 23.4401 | 10.044 |
| ASD% | 28.4784 | 19.1953 | 26.6185 | 21.0332 | 21.5673 |
| MDD% | 38.7297 | 37.7155 | 33.7388 | 36.1604 | 42.1071 |
| IR\* | 0.91 | 0.5551 | 1.0751 | 1.1144 | 0.4657 |
| IR\*\* | 0.6089 | 0.1568 | 0.912 | 0.7224 | 0.1111 |


### Observations:
- The best value for each metric is highlighted in dark green.
- In terms of raw returns (ARC%), surprisingly, the linear regression strategy is the best, followed by the buy and hold and random forest strategies. The naive and logistic regression strategies have the lowest returns (about 10%), with logistic regression slightly below the naive strategy.
- Even though the linear regression strategy has the best returns, it also has the second highest risk (ASD%). The riskiest strategy is buy and hold and the least risky is the naive strategy.
- Another measure of risk is MDD%. It ranges from about 34% to 42%: it is lowest for linear regression (33.74%) and highest for logistic regression (42.11%), while the remaining strategies lie between 36% and 39%.
- IR\* is the highest for the random forest and linear regression strategies, both noticeably better than buy and hold. The naive and logistic regression strategies perform worse than buy and hold in terms of IR\*.
- **IR\*\***, our main metric, is the highest for linear regression, which surpasses all other strategies by a wide margin. The second best is the random forest strategy, which also has a noticeably higher IR\*\* than buy and hold. Again, the naive and logistic regression strategies perform worse than buy and hold in terms of IR\*\*.

### Conclusions:
To our surprise, linear regression performs the best of all strategies on most metrics (except for ASD% and IR\*). The second best strategy is random forest, which also outperforms buy and hold on most metrics (except for ARC%). The logistic regression and naive strategies are usually worse than the other strategies.