# Executive Summary

This example shows how the digital twin abstract class is meant to be used end to end: what has to be implemented on the model side, and what the digital twin abstract class provides in return.

The directory for this model is "model2".

Everything here is what would go into a dt.py file that builds the digital twin framework.

# 1. Data Pipelines

# Model Specific

## Types

The types module holds the definitions of every type used in the system. Type liberally: any data set that exists in both raw and processed form should have a type for each form.
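
As a rough illustration of the raw/processed pairing, a types module might look like the sketch below. `Prices` and `BacktestData` are names used later in this example, but the field element types shown here are simplified assumptions (the real model stores pandas DataFrames), and `RawPriceRow`, `RawPrices`, and `ProcessedPrices` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, TypedDict


class RawPriceRow(TypedDict):
    """One row as it comes back from the database (hypothetical layout)."""
    timestep: int
    asset: str
    price: float


# Raw type: the untouched result of a pull.
RawPrices = List[RawPriceRow]

# Processed type: the same data pivoted to {timestep: {asset: price}}.
ProcessedPrices = Dict[int, Dict[str, float]]


@dataclass
class Prices:
    """State type used inside the simulation."""
    index_price: float
    basket_price: float


@dataclass
class BacktestData:
    """Container returned by the backtest data pull (simplified field types)."""
    pure_returns: List[float]
    prices_data: ProcessedPrices
    trades_data: List[dict]


# Building one instance by hand shows how the pieces fit together.
example = BacktestData(
    pure_returns=[0.01, -0.02],
    prices_data={0: {"index": 100.0, "basket": 100.0}},
    trades_data=[],
)
```

Giving the raw and processed forms separate names lets each function signature state which form it expects.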

## Data

Although there are data processing functions here, they are meant to be light; any heavy data processing should happen in the data infrastructure. The functions fall into the following categories, where N is a variable number of data pulls and the brackets show how many functions of each type there should be.

1. [1+] Data Connection Method: A method for connecting to the database. Input: None. Output: Connection.
2. [N] Raw Data Pulls: Pulls that directly hit the database tables held by the data infrastructure. Input: Connection. Output: Raw Data Type.
3. [N] Data Processing: Any light data processing, such as pivots, since storing a pivoted table would be space inefficient on the SQL side. Input: Raw Data Type. Output: Processed Data Type.
4. [1] Backtest Data Pull: A pull that connects to the database, runs all raw pulls and data processing functions, and returns a Backtest Data Type. Input: None. Output: Backtest Data Type.
5. [1] Input Data Computation: A processing function that takes the backtest data and returns the starting state, the historical data, the input data (what goes into the backtest), and the output data (what is used to validate the backtest). Input: Backtest Data Type. Output: Input Data Type.
6. [1] Format Inputs: A single function that takes the input data and formats it into data classes for use within cadCAD.
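
The six categories above can be sketched end to end as follows. This is a minimal illustration, not the contents of model2/data.py: the in-memory SQLite database, the `prices` table, the `connect`, `pull_raw_prices`, and `pivot_prices` names, and the choice of index price as input and basket price as validation output are all assumptions. Only `pull_backtest_data`, `create_input_data`, and `format_inputs` are names taken from this example.

```python
import sqlite3
from dataclasses import dataclass, replace
from typing import Any, Dict, List, Tuple

RawPrices = List[Tuple[int, str, float]]       # (timestep, asset, price)
ProcessedPrices = Dict[int, Dict[str, float]]  # {timestep: {asset: price}}


@dataclass
class Prices:
    index_price: float
    basket_price: float


@dataclass
class BacktestData:
    prices_data: ProcessedPrices


@dataclass
class InputData:
    starting_state: Any
    historical_data: ProcessedPrices
    input_data: ProcessedPrices
    output_data: ProcessedPrices


def connect() -> sqlite3.Connection:
    """1. Data connection (an in-memory stand-in for the real database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (timestep INTEGER, asset TEXT, price REAL)")
    rows = [(0, "index", 100.0), (0, "basket", 100.0),
            (1, "index", 110.0), (1, "basket", 108.0)]
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
    return conn


def pull_raw_prices(conn: sqlite3.Connection) -> RawPrices:
    """2. Raw data pull: reads the table directly, no reshaping."""
    query = "SELECT timestep, asset, price FROM prices ORDER BY timestep"
    return conn.execute(query).fetchall()


def pivot_prices(raw: RawPrices) -> ProcessedPrices:
    """3. Light processing: pivot long rows into one record per timestep."""
    out: ProcessedPrices = {}
    for timestep, asset, price in raw:
        out.setdefault(timestep, {})[asset] = price
    return out


def pull_backtest_data() -> BacktestData:
    """4. Backtest data pull: connect, pull, process, and bundle."""
    conn = connect()
    try:
        return BacktestData(prices_data=pivot_prices(pull_raw_prices(conn)))
    finally:
        conn.close()


def create_input_data(data: BacktestData) -> InputData:
    """5. Input data computation: split into starting state, inputs, outputs."""
    prices = data.prices_data
    return InputData(
        starting_state=prices[min(prices)],
        historical_data=prices,
        input_data={t: {"index": p["index"]} for t, p in prices.items()},
        output_data={t: {"basket": p["basket"]} for t, p in prices.items()},
    )


def format_inputs(data: InputData) -> InputData:
    """6. Format inputs: wrap the starting state in a data class."""
    first = data.starting_state
    state = {"prices": Prices(first["index"], first["basket"])}
    return replace(data, starting_state=state)


formatted = format_inputs(create_input_data(pull_backtest_data()))
```

Only `pull_backtest_data` touches the connection, so the remaining functions can be tested without a database.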

# Digital Twin Specific

1. Create a DataPipeline subclass that implements pull_historical_data, compute_input_data, and format_input_data using the corresponding functions defined above.
2. Implement load_data_initial in the DT class to specify how pulled data is saved. The format can be csv, pickle, etc.
3. Implement load_data_prior in the DT class to specify how previously saved data is loaded back.
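
The digital_twin module itself is not shown in this example. The sketch below is one guess at the interface the three items above imply, inferred only from how the classes are used here (the `name` and `data_pipeline` constructor arguments, and a `compute_input_data` method on the twin). The method bodies, the `ToyPipeline` and `ToyTwin` classes, and the error handling are assumptions, not the library's actual implementation.

```python
from abc import ABC, abstractmethod
from typing import Any


class DataPipeline(ABC):
    """Hypothetical shape of digital_twin.DataPipeline."""

    @abstractmethod
    def pull_historical_data(self) -> Any: ...

    @abstractmethod
    def compute_input_data(self, data: Any) -> Any: ...

    @abstractmethod
    def format_input_data(self, data: Any) -> Any: ...


class DigitalTwin(ABC):
    """Hypothetical shape of digital_twin.DigitalTwin."""

    def __init__(self, name: str, data_pipeline: DataPipeline) -> None:
        self.name = name
        self.data_pipeline = data_pipeline
        self.historical_data: Any = None
        self.input_data: Any = None

    @abstractmethod
    def load_data_initial(self) -> None: ...

    @abstractmethod
    def load_data_prior(self) -> None: ...

    def compute_input_data(self) -> None:
        # Assumed behavior: compute, then format, whatever data was loaded.
        if self.historical_data is None:
            raise RuntimeError("Load data before computing inputs.")
        computed = self.data_pipeline.compute_input_data(self.historical_data)
        self.input_data = self.data_pipeline.format_input_data(computed)


class ToyPipeline(DataPipeline):
    """Toy subclass standing in for a model-specific pipeline."""

    def pull_historical_data(self) -> Any:
        return [1.0, 2.0, 3.0]

    def compute_input_data(self, data: Any) -> Any:
        return {"starting_state": data[0], "input_data": data[1:]}

    def format_input_data(self, data: Any) -> Any:
        return data


class ToyTwin(DigitalTwin):
    """Toy twin that keeps data in memory instead of saving it to disk."""

    def load_data_initial(self) -> None:
        self.historical_data = self.data_pipeline.pull_historical_data()

    def load_data_prior(self) -> None:
        self.historical_data = [1.0, 2.0, 3.0]


twin = ToyTwin(name="Toy", data_pipeline=ToyPipeline())
twin.load_data_initial()
twin.compute_input_data()
```

Declaring the methods abstract means a subclass that forgets one of them fails at instantiation rather than partway through a run.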

In [1]:
import pandas as pd

import digital_twin
from model2.data import pull_backtest_data, create_input_data, format_inputs
from model2.types import BacktestData

class ArbitrageDataPipeline(digital_twin.DataPipeline):
    # Each method delegates to the matching function in model2.data.
    
    def pull_historical_data(self):
        return pull_backtest_data()
    
    def compute_input_data(self, data):
        return create_input_data(data)
    
    def format_input_data(self, data):
        return format_inputs(data)

class ArbitrageDigitalTwin(digital_twin.DigitalTwin):
    
    def load_data_initial(self):
        # Pull fresh data, then save each table to CSV for later reuse.
        self.historical_data = self.data_pipeline.pull_historical_data()
        
        self.historical_data.pure_returns.to_csv("pure_returns.csv")
        self.historical_data.prices_data.to_csv("prices_data.csv")
        self.historical_data.trades_data.to_csv("trades_data.csv")
    
    def load_data_prior(self):
        # Reload the CSV files written by load_data_initial.
        pure_returns = pd.read_csv("pure_returns.csv", index_col=0)
        prices_data = pd.read_csv("prices_data.csv", index_col=0)
        trades_data = pd.read_csv("trades_data.csv", index_col=0)
        
        self.historical_data = BacktestData(pure_returns=pure_returns,
                                            prices_data=prices_data,
                                            trades_data=trades_data)

In [2]:
test_data_pipeline = ArbitrageDataPipeline()
arb_dt = ArbitrageDigitalTwin(name="Test",
                              data_pipeline=test_data_pipeline)
arb_dt.load_data_initial()
arb_dt.compute_input_data()

# 2. Backtest Model

# Model Specific

## Partial State Update Blocks

- The psub.py file holds all the partial state update blocks. The backtesting blocks are kept distinct from the extrapolation blocks.
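
To make the backtest/extrapolation distinction concrete, here is a minimal sketch of what psub.py might contain. The function signatures and the policies/variables dictionary layout follow the usual cadCAD convention, but every function name, the `drift` parameter, and the `step` helper are hypothetical. `step` is a hand-rolled stand-in for the cadCAD engine so the block runs on its own, and it assumes `previous_state["timestep"]` holds the last completed timestep.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class Prices:
    index_price: float
    basket_price: float


def p_backtest_prices(params, substep, state_history, previous_state):
    """Backtest policy: read this step's prices from the historical input."""
    # Assumption: previous_state["timestep"] is the last completed timestep.
    t = previous_state["timestep"] + 1
    return {"new_prices": params["input_data"][t]}


def p_extrapolate_prices(params, substep, state_history, previous_state):
    """Extrapolation policy: generate prices with a toy constant drift."""
    last = previous_state["prices"]
    growth = 1.0 + params["drift"]
    grown = Prices(last.index_price * growth, last.basket_price * growth)
    return {"new_prices": grown}


def s_prices(params, substep, state_history, previous_state, policy_input):
    """State update shared by both modes: store what the policy produced."""
    return ("prices", policy_input["new_prices"])


# The two modes differ only in which policy feeds the shared state update.
backtest_psubs = [
    {"policies": {"prices": p_backtest_prices},
     "variables": {"prices": s_prices}},
]
extrapolation_psubs = [
    {"policies": {"prices": p_extrapolate_prices},
     "variables": {"prices": s_prices}},
]


def step(psubs, params, state):
    """Run one timestep by hand (simplified stand-in for the engine)."""
    current = dict(state)
    for block in psubs:
        signals: Dict[str, Any] = {}
        for policy in block["policies"].values():
            # Simplification: later policy outputs overwrite earlier ones.
            signals.update(policy(params, 1, [], current))
        updates: Dict[str, Any] = dict(
            update(params, 1, [], current, signals)
            for update in block["variables"].values()
        )
        current.update(updates)
    current["timestep"] = state["timestep"] + 1
    return current


state0 = {"timestep": 0, "prices": Prices(100.0, 100.0)}
history = {1: Prices(110.0, 108.0)}

after_backtest = step(backtest_psubs, {"input_data": history}, state0)
after_extrapolation = step(extrapolation_psubs, {"drift": 0.5}, state0)
```

Sharing the state update function between the two block lists keeps the state layout identical whether prices come from history or from a generator.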

## Run 

This file needs a few pieces of functionality:

1. load_config_backtest: Loads the configuration for backtesting.

In [3]:
from model2.run import load_config_backtest, run, post_processing

In [4]:
input_data = arb_dt.input_data.input_data
starting_state = arb_dt.input_data.starting_state

exp = load_config_backtest(monte_carlo_runs = 1,
            timesteps = 100,
            params = {"input_data": [input_data]},
            initial_state = starting_state)
raw = run(exp)
df = post_processing(raw)


                  ___________    ____
  ________ __ ___/ / ____/   |  / __ \
 / ___/ __` / __  / /   / /| | / / / /
/ /__/ /_/ / /_/ / /___/ ___ |/ /_/ /
\___/\__,_/\__,_/\____/_/  |_/_____/
by cadCAD

cadCAD Version: 0.4.28
Execution Mode: local_proc
Simulation Dimensions:
Entire Simulation: (Models, Unique Timesteps, Params, Total Runs, Sub-States) = (1, 100, 1, 1, 2)
     Simulation 0: (Timesteps, Params, Runs, Sub-States) = (100, 1, 1, 2)
Execution Method: local_simulations
Execution Mode: single_threaded
Total execution time: 0.06s


In [5]:
raw

Unnamed: 0,prices,trades,simulation,subset,run,substep,timestep
0,"Prices(index_price=100.0, basket_price=100.0)",,0,0,1,0,0
1,"Prices(index_price=110.87138625391076, basket_...",,0,0,1,1,1
2,"Prices(index_price=112.53772595362811, basket_...",,0,0,1,1,2
3,"Prices(index_price=117.2461197668698, basket_p...",,0,0,1,1,3
4,"Prices(index_price=131.4369055087102, basket_p...",,0,0,1,1,4
...,...,...,...,...,...,...,...
96,"Prices(index_price=297.2265210849316, basket_p...",,0,0,1,1,96
97,"Prices(index_price=302.78180557011564, basket_...",,0,0,1,1,97
98,"Prices(index_price=332.78724444458226, basket_...",,0,0,1,1,98
99,"Prices(index_price=345.38104647048516, basket_...",,0,0,1,1,99
