# Rapid Prototyping for Quantitative Investing with d6tflow

Quantitative investing research typically involves managing complex data dependencies and tuning many strategy parameters. d6tflow is an easy-to-use Python library for rapid prototyping and experiment management that helps you organize quantitative investing research workflows.

https://github.com/d6t/d6tflow

### Why standard backtest code is bad

The standard way of writing a backtest typically involves functions, data manually cached in pickle files, and parameters scattered all over the place. This is bad because:
* You have to manually track functions, parameters and files
* It doesn't scale well as you add complexity
* It is cumbersome to compare the output of different models/parameters
* It is difficult for others to read and audit
* It is costly to productionize

### Rapid prototyping with d6tflow

Instead of writing functions, quantitative investing code is better written as a set of tasks with dependencies between them. That is, your workflow should be a DAG (directed acyclic graph).

The benefits of doing this are:
* Easily define tasks with dependencies and parameters
* Intelligently run workflow with dependencies/parameters
* Easily compare results of different models/parameters
* Lightweight and quick to learn yet powerful
* Code scales well and is easy to audit
* Quick to productionize
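
To make the DAG idea concrete, here is a minimal sketch that uses only the Python standard library, not d6tflow itself. The task names mirror the example workflow later in this post; the `dependencies` dict is written by hand purely for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each task to the set of tasks it depends on.
# Names mirror the example workflow; the dict itself is illustrative.
dependencies = {
    "GetDataEcon": set(),
    "TradingSignals": {"GetDataEcon"},
    "GetDataPx": {"GetDataEcon"},
    "Backtest": {"TradingSignals", "GetDataPx"},
}

# static_order() yields every task only after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

Any valid order starts with `GetDataEcon` and ends with `Backtest`. A workflow library works out an order like this for you, so you only declare what each task requires.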

For more details see [4 Reasons Why Your Machine Learning Code is Probably Bad](https://github.com/d6t/d6t-python/blob/master/blogs/reasons-why-bad-ml-code.rst)

# Example quant trading backtest with d6tflow

This notebook walks through a stylized example of a typical quantitative investing backtest.

In 3 simple steps you will:
1. Define the backtest workflow: get macro data, generate trading signals, get pricing data and perform backtest
2. Define multiple strategies to backtest: change investment universe and backtest period
3. Run the backtests and compare pnl performance of the different strategies

The underlying notebook is [here on github](https://github.com/d6tdev/d6tflow-binder-interactive/blob/master/example-trading.ipynb) and you can [try it out on this interactive notebook](https://mybinder.org/v2/gh/d6tdev/d6tflow-binder-interactive/master?filepath=example-trading.ipynb).

### Step 1: Define backtest workflow

With d6tflow, instead of defining functions, you define tasks which have dependencies, parameters and input/output data.

In [17]:
import d6tflow
import pandas as pd
import numpy as np
import pandas_datareader as pddr
import datetime

#************************************************************
# define workflow
#************************************************************

# get economic data
class GetDataEcon(d6tflow.tasks.TaskPqPandas):
    date_start = d6tflow.DateParameter() # define backtest parameter
    date_end = d6tflow.DateParameter() # define backtest parameter

    def run(self):
        df_gdp = pddr.DataReader('CPGDPAI', 'fred', self.date_start, self.date_end)
        self.save(df_gdp) # save task output

# generate l/s signals
@d6tflow.requires(GetDataEcon) # define dependency
class TradingSignals(d6tflow.tasks.TaskPqPandas):
    lookback_period = d6tflow.IntParameter() # define strategy parameter

    def run(self):
        df_gdp = self.inputLoad() # load input data

        # generate l/s trading signals
        df_signal = (df_gdp['CPGDPAI'].diff(self.lookback_period)>0)
        df_signal = df_signal.to_frame(name='position')
        df_signal['position'] = np.where(df_signal['position'],1,-1)

        self.save(df_signal)

# get stock prices
@d6tflow.requires(GetDataEcon)
class GetDataPx(d6tflow.tasks.TaskPqPandas):
    symbols = d6tflow.ListParameter() # define universe

    def run(self):
        df = pddr.DataReader(self.symbols, 'yahoo', self.date_start, self.date_end)
        df_rtn = df['Adj Close'].pct_change()
        self.save(df_rtn)

# run backtest
@d6tflow.requires(TradingSignals,GetDataPx)
class Backtest(d6tflow.tasks.TaskPqPandas):
    persist = ['portfolio','pnl'] # save multiple outputs

    def run(self):
        df_signal = self.input()[0].load()
        df_rtn = self.input()[1].load()

        # combine signals and returns
        df_portfolio = pd.merge_asof(df_rtn, df_signal, left_index=True, right_index=True)

        # calc pnl
        df_pnl = df_portfolio[list(self.symbols)].multiply(df_portfolio['position'],axis=0)
        df_pnl = df_pnl.add_prefix('rtn_')

        self.save({'portfolio':df_portfolio,'pnl':df_pnl})
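
The `Backtest` task combines low-frequency signals with daily returns using `pd.merge_asof`. The sketch below shows that step in isolation on made-up data: the symbol `AAA`, the dates and the values are all hypothetical, and it assumes the default backward as-of join used in the task above.

```python
import pandas as pd

# Toy quarterly signal: +1 = long, -1 = short (made-up values).
df_signal = pd.DataFrame(
    {"position": [1, -1]},
    index=pd.to_datetime(["2019-01-01", "2019-04-01"]),
)

# Toy daily returns for one hypothetical symbol "AAA".
df_rtn = pd.DataFrame(
    {"AAA": [0.01, -0.02, 0.03]},
    index=pd.to_datetime(["2019-02-15", "2019-03-29", "2019-04-02"]),
)

# Backward as-of join: each return row picks up the latest signal
# dated on or before that row. Both indexes must be sorted.
df_portfolio = pd.merge_asof(df_rtn, df_signal, left_index=True, right_index=True)

# Strategy return = asset return times the position held on that date.
df_pnl = df_portfolio[["AAA"]].multiply(df_portfolio["position"], axis=0)
df_pnl = df_pnl.add_prefix("rtn_")
print(df_pnl)
```

The first two return rows inherit the January signal (+1) and the last row inherits the April signal (-1), so the 3% gain on the last date becomes a 3% loss for the strategy.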


In [18]:
# for demo purposes only: reset everything at every run
import shutil
shutil.rmtree(d6tflow.settings.dirpath, ignore_errors=True)

### Step 2: Define strategies

We will now define 3 strategies we want to backtest:  
1) base strategy  
2) change investment universe  
3) change time period  

Creating a new strategy is as easy as changing or adding a parameter; d6tflow will intelligently figure out how to run the backtest.

In [19]:
#************************************************************
# define different strategies to backtest
#************************************************************

strategy1 = dict(
    date_start=datetime.date(2018,1,1),
    date_end=datetime.date(2020,1,1),
    symbols = ['CAT','WMT'],
    lookback_period = 1
    )
strategy2 = strategy1.copy()
strategy2['symbols']=['MSFT','FB'] # run another universe
strategy3 = strategy1.copy()
strategy3['date_start']= datetime.date(2019,1,1) # run another time period
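
If you want more than a handful of variations, the same dict-copy pattern extends to a parameter grid. This is a sketch in plain Python: the `grid` values are hypothetical, and `base` repeats the values of `strategy1` so the snippet runs on its own.

```python
import datetime
import itertools

# Base strategy, same values as strategy1 above.
base = dict(
    date_start=datetime.date(2018, 1, 1),
    date_end=datetime.date(2020, 1, 1),
    symbols=['CAT', 'WMT'],
    lookback_period=1,
)

# Hypothetical values to sweep; choose whatever your research question needs.
grid = {
    'symbols': [['CAT', 'WMT'], ['MSFT', 'FB']],
    'lookback_period': [1, 2, 4],
}

# One parameter dict per combination, each a modified copy of the base.
strategies = [
    {**base, **dict(zip(grid, combo))}
    for combo in itertools.product(*grid.values())
]

print(len(strategies))  # 6: 2 universes x 3 lookback periods
```

Each dict in `strategies` can then be unpacked into the `Backtest` task in the same way as `strategy1`.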


### Step 3: Run backtest and compare strategy p&l

For each of the strategies, we want to run the backtest and see the strategy pnl.

d6tflow automatically executes all backtest dependencies. Before you execute the backtest, you can see exactly what will be executed. This not only makes it easy to see what is going to happen but also makes the code easy to audit in code reviews.

In [20]:
d6tflow.preview(Backtest(**strategy1))  # show which tasks will be run


 ===== Luigi Execution Preview ===== 


└─--[Backtest-{'date_start': '2018-01-01', 'date_end': '2020-01-01', 'lookback_period': '1', 'symbols': '["CAT", "WMT"]'} (PENDING)]
   |--[TradingSignals- (PENDING)]
   |  └─--[GetDataEcon- (PENDING)]
   └─--[GetDataPx- (PENDING)]
      └─--[GetDataEcon- (PENDING)]

 ===== Luigi Execution Preview ===== 



In [21]:
d6tflow.run(Backtest(**strategy1)) # run backtest including dependencies


===== Luigi Execution Summary =====

Scheduled 4 tasks of which:
* 4 ran successfully:
    - 1 Backtest(date_start=2018-01-01, date_end=2020-01-01, lookback_period=1, symbols=["CAT", "WMT"])
    - 1 GetDataEcon(date_start=2018-01-01, date_end=2020-01-01)
    - 1 GetDataPx(date_start=2018-01-01, date_end=2020-01-01, symbols=["CAT", "WMT"])
    - 1 TradingSignals(date_start=2018-01-01, date_end=2020-01-01, lookback_period=1)

This progress looks :) because there were no failed tasks or missing dependencies

===== Luigi Execution Summary =====



LuigiRunResult(status=<LuigiStatusCode.SUCCESS: (':)', 'there were no failed tasks or missing dependencies')>,worker=<luigi.worker.Worker object at 0x0000020FD06BFFC8>,scheduling_succeeded=True)

Having run the base strategy, we can now run the additional strategies. The beauty is that all previously computed data that can be reused is reused and does not have to be recomputed. For the strategy with the different universe, only 2 tasks have to be rerun: the price download and the backtest. For the strategy with the updated time period, d6tflow intelligently figures out that all tasks have to be rerun, because every task depends on the date range.

In [22]:
d6tflow.run(Backtest(**strategy2)) # change universe
d6tflow.run(Backtest(**strategy3)) # change time period


===== Luigi Execution Summary =====

Scheduled 4 tasks of which:
* 2 complete ones were encountered:
    - 1 GetDataEcon(date_start=2018-01-01, date_end=2020-01-01)
    - 1 TradingSignals(date_start=2018-01-01, date_end=2020-01-01, lookback_period=1)
* 2 ran successfully:
    - 1 Backtest(date_start=2018-01-01, date_end=2020-01-01, lookback_period=1, symbols=["MSFT", "FB"])
    - 1 GetDataPx(date_start=2018-01-01, date_end=2020-01-01, symbols=["MSFT", "FB"])

This progress looks :) because there were no failed tasks or missing dependencies

===== Luigi Execution Summary =====


===== Luigi Execution Summary =====

Scheduled 4 tasks of which:
* 4 ran successfully:
    - 1 Backtest(date_start=2019-01-01, date_end=2020-01-01, lookback_period=1, symbols=["CAT", "WMT"])
    - 1 GetDataEcon(date_start=2019-01-01, date_end=2020-01-01)
    - 1 GetDataPx(date_start=2019-01-01, date_end=2020-01-01, symbols=["CAT", "WMT"])
    - 1 TradingSignals(date_start=2019-01-01, date_end=2020-01-01, lookba

LuigiRunResult(status=<LuigiStatusCode.SUCCESS: (':)', 'there were no failed tasks or missing dependencies')>,worker=<luigi.worker.Worker object at 0x0000020FD073EC48>,scheduling_succeeded=True)
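
One way to picture this reuse is to identify each task output by the task name plus the parameter values it depends on. The sketch below is a simplified model, not d6tflow's or Luigi's actual implementation; `TASK_PARAMS` and `task_keys` are hypothetical names, and the parameter lists are read off the execution summaries above.

```python
import datetime

# Parameters each task depends on, directly or through upstream tasks.
TASK_PARAMS = {
    'GetDataEcon': ('date_start', 'date_end'),
    'TradingSignals': ('date_start', 'date_end', 'lookback_period'),
    'GetDataPx': ('date_start', 'date_end', 'symbols'),
    'Backtest': ('date_start', 'date_end', 'lookback_period', 'symbols'),
}

def task_keys(strategy):
    """Return one hashable (task, parameter values) key per task."""
    return {
        (task, tuple(str(strategy[p]) for p in params))
        for task, params in TASK_PARAMS.items()
    }

strategy1 = dict(date_start=datetime.date(2018, 1, 1),
                 date_end=datetime.date(2020, 1, 1),
                 symbols=['CAT', 'WMT'], lookback_period=1)
strategy2 = {**strategy1, 'symbols': ['MSFT', 'FB']}
strategy3 = {**strategy1, 'date_start': datetime.date(2019, 1, 1)}

completed = task_keys(strategy1)  # outputs that exist after the base run
needs = {}
for name, strategy in [('strategy2', strategy2), ('strategy3', strategy3)]:
    # A task must run only if no output with the same key exists yet.
    needs[name] = sorted(task for task, _ in task_keys(strategy) - completed)
    completed |= task_keys(strategy)
    print(name, 'needs', needs[name])
```

Under this model, `strategy2` needs only `Backtest` and `GetDataPx`, while `strategy3` needs all four tasks, which matches the two execution summaries.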

Now we can easily compare the results of the different strategies. We load the backtest output and compute strategy pnl.

In [23]:
for istrat, strategy in enumerate([strategy1,strategy2,strategy3]):
    df_pnl1 = Backtest(**strategy).output()['pnl'].load() # load task output
    print(f'pnl strategy #{istrat+1}:', df_pnl1.sum().sum().round(3))

pnl strategy #1: -0.029
pnl strategy #2: -0.16
pnl strategy #3: -0.449


# Bonus: More Rapid Prototyping

After the initial backtest, team members and other stakeholders typically have questions like "what if you did XYZ?". This normally means introducing a new parameter and/or updating the tasks. d6tflow makes it easy to accommodate such questions.

1) new parameters: simply add the parameter and run the backtest, d6tflow will intelligently figure out what to do
2) updating tasks: edit the code, reset the task and d6tflow will automatically recompute all the downstream dependencies

Let's say we want to change the trading signals task. You reset that task, and only 2 of the 5 steps shown in the preview need to be recomputed: the task itself and its 1 downstream dependency, in this case the backtest task.

In [24]:
TradingSignals(**strategy1).reset(confirm=False)
d6tflow.preview(Backtest(**strategy2))


 ===== Luigi Execution Preview ===== 


└─--[Backtest-{'date_start': '2018-01-01', 'date_end': '2020-01-01', 'lookback_period': '1', 'symbols': '["MSFT", "FB"]'} (PENDING)]
   |--[TradingSignals- (PENDING)]
   |  └─--[GetDataEcon- (COMPLETE)]
   └─--[GetDataPx- (COMPLETE)]
      └─--[GetDataEcon- (COMPLETE)]

 ===== Luigi Execution Preview ===== 
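
The preview above follows from a simple rule: resetting a task makes that task and everything downstream of it stale. Here is a minimal sketch of that rule in plain Python. The helper `needs_rerun` is hypothetical and is not part of d6tflow; the `requires` dict restates the example workflow's dependencies by hand.

```python
# Upstream dependencies of each task in the example workflow.
requires = {
    'GetDataEcon': [],
    'TradingSignals': ['GetDataEcon'],
    'GetDataPx': ['GetDataEcon'],
    'Backtest': ['TradingSignals', 'GetDataPx'],
}

def needs_rerun(reset_task, requires):
    """Return the reset task plus every task that depends on it, directly or indirectly."""
    stale = {reset_task}
    changed = True
    while changed:  # keep sweeping until no new stale task is found
        changed = False
        for task, upstream in requires.items():
            if task not in stale and stale.intersection(upstream):
                stale.add(task)
                changed = True
    return stale

print(sorted(needs_rerun('TradingSignals', requires)))  # ['Backtest', 'TradingSignals']
print(sorted(needs_rerun('GetDataEcon', requires)))     # all four tasks
```

Resetting `TradingSignals` leaves `GetDataEcon` and `GetDataPx` untouched, which is why they show as COMPLETE in the preview.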



# Next steps: Transition code to d6tflow

Updating your code to work with d6tflow is typically easy: just take your old functions and wrap them in a d6tflow workflow.

See https://d6tflow.readthedocs.io/en/latest/transition.html

# Reference

The underlying notebook is [here on github](https://github.com/d6tdev/d6tflow-binder-interactive/blob/master/example-trading.ipynb) and you can [try it out on this interactive notebook](https://mybinder.org/v2/gh/d6tdev/d6tflow-binder-interactive/master?filepath=example-trading.ipynb).

# Disclaimer

These materials, and any other information or data conveyed in connection with these materials, are intended for informational purposes only. Under no circumstances are these materials, or any information or data conveyed in connection with such report, to be considered an offer or solicitation of an offer to buy or sell any securities of any company. Nor may these materials, or any information or data conveyed in connection with such report, be relied on in any manner as legal, tax or investment advice. The information and data are not intended to be used as the primary basis of investment decisions and nothing contained herein or conveyed in connection therewith is, or is intended to be, predictive of the movement of the market prices of the securities of the applicable company or companies. The facts and opinions presented are those of the author only and not official opinions of any financial institution.
