# Evaluating a Strategy using Bootstrap Metrics

Bootstrap metrics can help us to more thoroughly evaluate a trading strategy, as we will see below.

[In the last notebook](https://pybroker.com/en/latest/notebooks/2.%20Backtesting%20a%20Strategy.html), we wrote a trading strategy that we backtested. Here is the implementation again:

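```python
import pybroker
from pybroker import Strategy, StrategyConfig, YFinance

# Cache downloaded bar data so repeated runs do not refetch it.
pybroker.enable_data_source_cache('my_strategy')

def buy_low(ctx):
    # Skip if a long position is already open.
    if ctx.long_pos():
        return
    # Buy when the latest close drops below the previous bar's low.
    if len(ctx.low) >= 2 and ctx.close[-1] < ctx.low[-2]:
        ctx.buy_shares = ctx.calc_target_shares(0.25)  # about 25% of portfolio equity
        ctx.buy_limit_price = ctx.close[-1] - 0.01     # limit one cent below the close
        ctx.hold_bars = 3                              # exit after 3 bars

def short_high(ctx):
    # Skip if a short position is already open.
    if ctx.short_pos():
        return
    # Short 100 shares when the latest close rises above the previous bar's high.
    if len(ctx.high) >= 2 and ctx.close[-1] > ctx.high[-2]:
        ctx.sell_shares = 100
        ctx.hold_bars = 2                              # exit after 2 bars
```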
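The entry rules in ```buy_low``` and ```short_high``` can be checked outside PyBroker with plain Python lists. The sketch below uses hypothetical helpers (```buy_low_signal``` and ```short_high_signal``` are not part of PyBroker) that mirror the two conditions on made-up prices, where the last element of each list plays the role of the current bar.

```python
# Hypothetical helpers mirroring the entry conditions in buy_low and short_high.
# They are not part of PyBroker; they only restate the comparisons on plain lists.

def buy_low_signal(close, low):
    """True when the latest close is below the previous bar's low."""
    return len(low) >= 2 and close[-1] < low[-2]

def short_high_signal(close, high):
    """True when the latest close is above the previous bar's high."""
    return len(high) >= 2 and close[-1] > high[-2]

# Toy bars, oldest first (not real market data).
close = [100.0, 101.5, 98.2]
low = [99.0, 99.5, 97.8]
high = [101.0, 102.0, 99.0]

print(buy_low_signal(close, low))      # True: 98.2 < 99.5
print(short_high_signal(close, high))  # False: 98.2 is not above 102.0
```

With fewer than two bars, both helpers return ```False```, matching the ```len(...) >= 2``` guard in the strategy functions.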

And as before, we configure a new [Strategy](https://pybroker.com/en/latest/reference/pybroker.strategy.html#pybroker.strategy.Strategy) instance:

```python
config = StrategyConfig(initial_cash=500_000, bootstrap_sample_size=100)
strategy = Strategy(YFinance(), '3/1/2017', '3/1/2022', config)
```

This time, the ```Strategy``` is configured with a [bootstrap_sample_size](https://pybroker.com/en/latest/reference/pybroker.config.html#pybroker.config.StrategyConfig.bootstrap_sample_size) of ```100``` (the default is ```1_000```). Next, the ```Strategy``` is backtested again, but now with bootstrap metrics enabled by default:

```python
strategy.add_execution(buy_low, ['AAPL', 'MSFT'])
strategy.add_execution(short_high, ['TSLA'])
result = strategy.backtest()
result.metrics_df
```

```
Backtesting: 2017-03-01 00:00:00 to 2022-03-01 00:00:00
Loaded cached bar data.
Test split: 2017-03-01 05:00:00 to 2022-02-28 05:00:00
100% (1259 of 1259) |####################| Elapsed Time: 0:00:00 Time:  0:00:00
Calculating bootstrap metrics: sample_size=100, samples=10000...
Calculated bootstrap metrics: 0:00:02
Finished backtest: 0:00:05
```


| name | value |
|---|---|
| trade_count | 777.0 |
| initial_value | 500000.0 |
| end_value | 693111.87 |
| total_profit | 403511.08 |
| total_loss | -237770.88 |
| max_drawdown | -56721.6 |
| max_drawdown_pct | -7.908429 |
| win_rate | 0.525773 |
| loss_rate | 0.474227 |
| avg_profit | 1977.99549 |


```python
result.orders
```

| id | date | symbol | order_type | limit_price | fill_price | shares | pnl | pnl % |
|---|---|---|---|---|---|---|---|---|
| 1 | 2017-03-03 05:00:00 | MSFT | buy | 64.00 | 63.95 | 1952 | 0.00 | 0.000000 |
| 2 | 2017-03-08 05:00:00 | MSFT | sell | | 64.67 | 1952 | 1405.44 | 95.601009 |
| 3 | 2017-03-14 04:00:00 | MSFT | buy | 64.70 | 64.35 | 1937 | 0.00 | 0.000000 |
| 4 | 2017-03-15 04:00:00 | TSLA | sell | | 17.18 | 100 | 0.00 | 0.000000 |
| 5 | 2017-03-17 04:00:00 | MSFT | sell | | 64.96 | 1937 | 1181.57 | 94.788734 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 773 | 2022-02-18 05:00:00 | MSFT | buy | 290.72 | 290.08 | 610 | 0.00 | 0.000000 |
| 774 | 2022-02-18 05:00:00 | AAPL | buy | 168.87 | 168.36 | 1051 | 0.00 | 0.000000 |
| 775 | 2022-02-24 05:00:00 | MSFT | sell | | 283.34 | 610 | -4111.40 | -93.552747 |
| 776 | 2022-02-24 05:00:00 | AAPL | sell | | 157.43 | 1051 | -11487.43 | -98.648073 |


Looking at ```initial_value``` and ```end_value``` above, it appears that we have implemented a profitable trading strategy on our first attempt! But how can we be sure that these results are repeatable and not just a fluke? We can gain more confidence in our results by computing metrics using the bootstrap method.

The basic idea behind the bootstrap method is to repeatedly draw random samples, with replacement, from the backtest results and compute the metric on each sample. Aggregating the metric over thousands of random samples, for example by averaging it, yields a more robust and accurate estimate than the single value from one backtest, and the spread of the sampled values shows how much the metric could vary by chance.
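To make this concrete, here is a minimal sketch of the basic bootstrap in plain Python. It is not PyBroker's implementation: the helper name ```bootstrap_metric``` is hypothetical, the per-bar returns are randomly generated toy data, and ```sample_size``` and ```n_samples``` simply mirror the ```bootstrap_sample_size=100``` setting and the ```samples=10000``` value printed in the backtest log (a smaller sample count is used here to keep it fast).

```python
import random
import statistics

def bootstrap_metric(returns, metric, sample_size=100, n_samples=10_000, seed=0):
    """Hypothetical helper: compute `metric` on n_samples random samples.

    Each sample draws `sample_size` returns with replacement, so the same
    bar can appear several times in one sample.
    """
    rng = random.Random(seed)
    return [metric(rng.choices(returns, k=sample_size)) for _ in range(n_samples)]

# Toy per-bar returns (randomly generated, not taken from the backtest above).
rng = random.Random(42)
returns = [rng.gauss(0.0005, 0.01) for _ in range(1_259)]

# Use the mean per-bar return as the metric being bootstrapped.
estimates = bootstrap_metric(returns, statistics.fmean, sample_size=100, n_samples=1_000)
print(f"bootstrap estimate of the mean return: {statistics.fmean(estimates):.5f}")
print(f"spread (std dev) across samples: {statistics.stdev(estimates):.5f}")
```

The second number is what a single backtest cannot show: how much the metric moves when the same history is reshuffled.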

## Confidence Intervals

**PyBroker** uses the bootstrap method to compute confidence intervals for the [Profit Factor](https://pybroker.com/en/latest/reference/pybroker.eval.html#pybroker.eval.EvalMetrics.profit_factor) and [Sharpe Ratio](https://en.wikipedia.org/wiki/Sharpe_ratio):

```python
result.bootstrap.conf_intervals
```

| name | conf | lower | upper |
|---|---|---|---|
| Profit Factor | 97.5% | -0.767625 | 1.044637 |
| Profit Factor | 95% | -0.625429 | 0.884855 |
| Profit Factor | 90% | -0.468109 | 0.699156 |
| Sharpe Ratio | 97.5% | -0.15718 | 0.23962 |
| Sharpe Ratio | 95% | -0.126932 | 0.209973 |
| Sharpe Ratio | 90% | -0.090482 | 0.173737 |


Specifically, **PyBroker** used the [bias corrected and accelerated (BCa) bootstrap method](https://blogs.sas.com/content/iml/2017/07/12/bootstrap-bca-interval.html) to compute the confidence intervals above. The returns used for the bootstrap were sampled per bar rather than per trade to maximize the information captured by these metrics, since every bar contributes a return, not only the bars on which a trade closes.
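A textbook BCa interval can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not PyBroker's code: the Sharpe Ratio here is simply the mean per-bar return divided by its standard deviation (no risk-free rate, no annualization), each bootstrap sample has the same size as the data, the equity curve is random toy data, and the helper names are hypothetical. The bias correction ```z0``` adjusts for the bootstrap distribution being off-center relative to the original estimate, and the acceleration ```a```, estimated with a jackknife, adjusts for the statistic's spread changing with its value.

```python
import math
import random
import statistics
from statistics import NormalDist

def per_bar_returns(equity):
    """Convert an equity curve into simple per-bar returns."""
    return [cur / prev - 1.0 for prev, cur in zip(equity, equity[1:])]

def sharpe(returns):
    """Assumed per-bar Sharpe Ratio: mean / sample std dev (no risk-free rate)."""
    mean = statistics.fmean(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return mean / math.sqrt(var)

def bca_interval(data, stat, tail=0.025, n_boot=2_000, seed=0):
    """Hypothetical helper: textbook BCa bounds for stat(data) at a tail probability."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat(data)
    boots = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))

    # Bias correction z0: share of bootstrap estimates below the original estimate,
    # clamped away from 0 and 1 so the inverse normal CDF stays finite.
    prop = sum(b < theta_hat for b in boots) / n_boot
    prop = min(max(prop, 1 / (2 * n_boot)), 1 - 1 / (2 * n_boot))
    norm = NormalDist()
    z0 = norm.inv_cdf(prop)

    # Acceleration a: skewness of the jackknife (leave-one-out) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = statistics.fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den else 0.0

    def adjusted(p):
        # Map a nominal tail probability to the BCa-adjusted percentile.
        z = norm.inv_cdf(p)
        return norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def quantile(p):
        return boots[min(max(int(p * n_boot), 0), n_boot - 1)]

    return quantile(adjusted(tail)), quantile(adjusted(1 - tail))

# Toy equity curve built from random per-bar returns (not the backtest's data).
rng = random.Random(7)
equity = [500_000.0]
for _ in range(300):
    equity.append(equity[-1] * (1 + rng.gauss(0.0003, 0.01)))

returns = per_bar_returns(equity)
lower, upper = bca_interval(returns, sharpe)
print(f"Sharpe point estimate: {sharpe(returns):.4f}")
print(f"BCa bounds: [{lower:.4f}, {upper:.4f}]")
```

When ```z0``` and ```a``` are both zero, ```adjusted(p)``` returns ```p``` and the sketch reduces to the plain percentile bootstrap; BCa only shifts which percentiles are read off.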

We can see that the lower bounds of both the Profit Factor and the Sharpe Ratio are negative. This is not a good sign, and it shows that our strategy is not reliably profitable!

## Maximum Drawdown

Still, we continue by looking at bootstrap metrics for maximum drawdown: 

```python
result.bootstrap.drawdown_conf
```

| conf | amount | percent |
|---|---|---|
| 99.9% | -290731.85 | -35.110355 |
| 99% | -225715.88 | -28.387 |
| 95% | -175866.86 | -23.066928 |
| 90% | -149356.73 | -20.032371 |


For each confidence level, the table above shows the drawdown that is not expected to be exceeded with that probability, given both as a cash amount and as a percentage of portfolio equity. Like the Profit Factor and Sharpe Ratio, these confidence levels were computed using per-bar returns obtained from the backtest's out-of-sample results.
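The drawdown confidence levels can be illustrated the same way: resample per-bar returns with replacement into new return paths, compound each path into an equity curve, record its maximum drawdown, and read off the tail quantiles. This is a minimal sketch under assumptions: the helper names are hypothetical, each path has as many bars as the input, the returns are toy data, and PyBroker's actual procedure may differ in its details.

```python
import random

def max_drawdown_pct(returns):
    """Largest peak-to-trough decline, in percent, of the equity path built from returns."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, (equity - peak) / peak * 100.0)
    return worst

def drawdown_conf(returns, confs=(0.999, 0.99, 0.95, 0.90), n_samples=10_000, seed=0):
    """Hypothetical helper: drawdowns that resampled paths stay within with probability conf."""
    rng = random.Random(seed)
    n = len(returns)
    # Each path resamples per-bar returns with replacement, then is compounded.
    dds = sorted(max_drawdown_pct(rng.choices(returns, k=n)) for _ in range(n_samples))
    # Drawdowns are <= 0, so the worst (1 - conf) tail sits at the start of the sorted list.
    return {conf: dds[int((1.0 - conf) * n_samples)] for conf in confs}

# Toy per-bar returns (randomly generated, not the backtest's data).
rng = random.Random(3)
returns = [rng.gauss(0.0004, 0.01) for _ in range(500)]

print(f"max drawdown of the original path: {max_drawdown_pct(returns):.2f}%")
for conf, dd in drawdown_conf(returns, n_samples=2_000).items():
    print(f"{conf:.1%}: {dd:.2f}%")
```

Because a resampled path can string together more losing bars in a row than the original history did, the tail quantiles are usually deeper than the single drawdown observed in the backtest.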

The bootstrapped max drawdown of ```-35.1%``` at a ```99.9%``` confidence level is much worse than the ```-7.9%``` we saw in our original results!

Hopefully, this example gives you a sense of the importance of using randomized tests to analyze (and scrutinize) the performance of your trading strategy.

[The next notebook will go over how to use ranking and position sizing in your trading strategies](https://pybroker.com/en/latest/notebooks/4.%20Ranking%20and%20Position%20Sizing.html).