In [1]:
from src.pair_trade import PairTradeAnalysis
from src.pair_trading_dash_board import PairTradingDashBoard
from src.backtesting import BackTesting
from src.utils.plot_utils import plot_regression_results
from src.stock_regression_dash_board import StockRegressionDashBoard
from src.most_explain_dash_board import MostExplainDashBoard

#### **Note:** Because of GitHub's 25 MB file size limit, the full data set could not be uploaded. Instead, a text file in the "Data" folder provides a Dropbox link to the full data. To run the code, download the data from the Dropbox link and place it in the "Data" folder.

# Project 1: Pair Trading

## Part 1: How to Select Pairs? Steps Below

1. **Correlation Check**: Filter pairs within the same sector using a correlation threshold of 0.9.

2. **Cointegration Test**: Select pairs with strong relationships by conducting cointegration tests (significance level = 0.05).

3. **Static Hedge Ratio and Spread Estimation**: 
   - Apply OLS to estimate a static hedge ratio, then calculate the spread between the two cointegrated tickers.
   - To confirm mean reversion to zero:
     - Require ADF p-value < 0.05 (reject the unit-root null), indicating stationarity.
     - Require KPSS p-value > 0.05 with regression='c' (fail to reject the stationarity null), indicating stationarity around a constant mean (no trend).

4. **Dynamic Hedge Ratio and Spread Estimation**:
   - Apply the Kalman filter to estimate a time-varying hedge ratio, then calculate the dynamic spread between the two cointegrated tickers.
   - To confirm mean reversion to zero:
     - Require ADF p-value < 0.05 (reject the unit-root null), indicating stationarity.
     - Require KPSS p-value > 0.05 with regression='c' (fail to reject the stationarity null), indicating stationarity around a constant mean (no trend).
   - Add a **burn-in** period (65 trading days, roughly three months): the burn-in gives the Kalman filter time to converge on stable alpha and beta values, producing a more accurate and comparable spread estimate.

5. **Note**: Once we find a pair, we use the ticker with **higher volatility** as the dependent variable **(Y)**, as it tends to have larger fluctuations that may be mean-reverting around the expected value derived from the less volatile asset (X).
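
Steps 3 to 5 can be illustrated with a minimal NumPy sketch. It is not the code in `src.pair_trade`: the function names, the Kalman settings (`delta`, `obs_var`, the zero initial state), and the use of the one-step forecast error as the dynamic spread are all assumptions made for illustration. The cointegration, ADF, and KPSS tests are left out and would be run on the resulting spreads.

```python
import numpy as np

def pick_dependent(prices_a, prices_b):
    """Return (y, x): the series with the higher daily-return volatility becomes Y."""
    a, b = np.asarray(prices_a, float), np.asarray(prices_b, float)
    vol_a = np.std(np.diff(a) / a[:-1])
    vol_b = np.std(np.diff(b) / b[:-1])
    return (prices_a, prices_b) if vol_a >= vol_b else (prices_b, prices_a)

def ols_hedge_ratio(y, x):
    """Static fit y = alpha + beta * x; returns (alpha, beta, spread)."""
    design = np.column_stack([np.ones(len(x)), x])
    (alpha, beta), *_ = np.linalg.lstsq(design, y, rcond=None)
    return alpha, beta, y - (alpha + beta * x)

def kalman_hedge_ratio(y, x, delta=1e-4, obs_var=1.0, burn_in=65):
    """Random-walk state [alpha_t, beta_t]; drops the first `burn_in` rows."""
    n = len(y)
    state = np.zeros(2)                      # assumed initial guess for [alpha, beta]
    cov = np.eye(2)                          # assumed initial state covariance
    trans_cov = delta / (1.0 - delta) * np.eye(2)
    states = np.zeros((n, 2))
    spread = np.zeros(n)
    for t in range(n):
        cov = cov + trans_cov                # predict step: state follows a random walk
        h = np.array([1.0, x[t]])            # observation row: y_t = alpha_t + beta_t * x_t
        err = y[t] - h @ state               # one-step forecast error, used as the spread
        gain = cov @ h / (h @ cov @ h + obs_var)
        state = state + gain * err           # update step
        cov = cov - np.outer(gain, h) @ cov
        states[t] = state
        spread[t] = err
    return states[burn_in:], spread[burn_in:]

# Synthetic demo data (hypothetical): y = 2 + 1.5 * x + noise.
rng = np.random.default_rng(0)
x = 50 + np.cumsum(rng.normal(0, 0.5, 500))
y = 2 + 1.5 * x + rng.normal(0, 0.5, 500)

dep, indep = pick_dependent(x, y)
alpha, beta, ols_spread = ols_hedge_ratio(dep, indep)
states, kf_spread = kalman_hedge_ratio(dep, indep)
print("static beta:", round(float(beta), 3))
print("last Kalman [alpha, beta]:", states[-1].round(3))
```

Dropping the burn-in rows keeps the early, unconverged estimates out of the stationarity tests, so the static and dynamic spreads are compared over a period where both are meaningful.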


In [2]:
# http://127.0.0.1:3033/
def Project_1_DashBoard():
    # Project1 step 1/2: Pair Trading Strategy
    PairTradingDashBoard(PairTradeAnalysis()).run_app() 


In [3]:
'''
    See saved webpage PDF in Project_1_HighCorrelation.pdf
'''

# Note: this step may take several minutes to run (about 5 minutes in the run shown below); scroll to the bottom of the output box to see the progress bar.
Project_1_DashBoard() 

Analysis started


Pairs Trading Workflow: 100%|█████████████████| 100/100 [05:19<00:00,  3.20s/it]


Analysis completed successfully


## Part 2: Backtesting Selected Pairs

**Key Idea**  
- **Entry Condition (entry threshold)**:  
  - Enter a **long** position when the spread falls **below** its mean by a set number of standard deviations (i.e., below μ − entry_thres * σ).  
  - Enter a **short** position when the spread rises **above** its mean by a set number of standard deviations (i.e., above μ + entry_thres * σ).  

- **Exit Condition (exit threshold)**:  
  - **Exit** the position when the spread **reverts** toward its mean, i.e., moves back inside the band μ ± exit_thres * σ (with `exit_thres = 0`, this means crossing the mean).
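
The entry and exit rules can be sketched as a small state machine over a rolling z-score of the spread. This is an illustrative sketch rather than the `BackTesting` implementation: the function names are hypothetical, the rolling statistics use only past observations (an assumption), and the `duration_cap` and `min_hold_days` constraints are omitted for brevity.

```python
import numpy as np

def rolling_zscore(spread, window=130):
    """z_t = (spread_t - mean) / std, with mean and std taken over the previous `window` values."""
    spread = np.asarray(spread, dtype=float)
    z = np.full(len(spread), np.nan)
    for t in range(window, len(spread)):
        hist = spread[t - window:t]          # past values only, so no look-ahead
        sigma = hist.std()
        if sigma > 0:
            z[t] = (spread[t] - hist.mean()) / sigma
    return z

def positions_from_z(z, entry_thres=2.0, exit_thres=0.5):
    """+1 = long the spread, -1 = short the spread, 0 = flat."""
    pos = np.zeros(len(z), dtype=int)
    current = 0
    for t, zt in enumerate(z):
        if np.isnan(zt):
            current = 0                      # no signal while the window is warming up
        elif current == 0:
            if zt < -entry_thres:
                current = 1                  # spread far below its mean: go long
            elif zt > entry_thres:
                current = -1                 # spread far above its mean: go short
        elif abs(zt) <= exit_thres:
            current = 0                      # spread back inside the exit band: close
        pos[t] = current
    return pos

demo_z = np.array([np.nan, 0.0, -2.5, -1.0, -0.3, 0.1, 2.2, 1.5, 0.4])
print(positions_from_z(demo_z).tolist())
```

A position opened at a z-score of -2.5 is held while the z-score is still outside the exit band (-1.0) and closed once it moves back inside it (-0.3).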

---

#### Steps:

1. **Identify a Pair from Part 1 and Visualize:**  
   - (1) Plot price  
   - (2) Plot OLS spread  
   - (3) Plot Kalman spread  

2. **Generate Trading Signals/Positions:**  
   - For fixed inputs `entry_thres` and `exit_thres`, generate trading signals / trading positions (long/short/flat).  
   - Consider:
     - Trading duration (default: `duration_cap=30` days)  
     - Minimum holding days (default: `min_hold_days=3` days)  
     - Rolling window for mean and standard deviation calculation (default: 130 days)  

3. **Backtesting Returns Using the Signals:**  
   - Backtest the signals DataFrame to obtain strategy returns.
     - (1) Calculate daily returns for each ticker (percent change in price)  
     - (2) Calculate daily strategy returns based on trading signals/positions  
     - (3) Consider transaction cost (default: `transaction_cost = 0.0001` per trade)  

4. **Calculate Comprehensive Performance Metrics:**  
   - Define **eight** performance metrics to assess strategy effectiveness:
     - Annual Return (assuming 252 trading days)  
     - Annual Volatility  
     - Annualized Sharpe Ratio  
     - Annualized Sortino Ratio  
     - Gain-to-Pain Ratio  
     - Maximum Drawdown  
     - Drawdown-to-Volatility Ratio  
     - Profit-to-Drawdown Ratio (Calmar Ratio)  

5. **Bayesian Optimization for Optimized `entry_thres` and `exit_thres`:**  
   - **Define and Optimize Objective Function**  
     - Develop an objective function tailored to each metric, e.g., maximize Sharpe Ratio, minimize Max Drawdown, etc.  
   - **Bayesian Optimization**  
     - Find optimal `entry_thres` and `exit_thres` that maximize/minimize the metric.  
   - **Apply Optimized Thresholds**  
     - Apply these thresholds to evaluate strategy performance using the eight metrics.

6. **Sensitivity Check:**  
   - **6.1.** Set a fixed entry threshold (e.g., `entry_thres = 2`), vary `exit_thres`, and plot the Sharpe ratio.  
   - **6.2.** Set a fixed exit threshold (e.g., `exit_thres = 1`), vary `entry_thres`, and plot the Sharpe ratio.
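
Steps 3 and 4 can be sketched as follows. The sketch makes several assumptions that may differ from `BackTesting`: the two legs are weighted equally (long Y and short X for a long-spread position), the cost is charged once per position change, the risk-free rate is zero, and the metric formulas are common textbook definitions rather than the project's exact ones. All function names are hypothetical.

```python
import numpy as np

def strategy_returns(price_y, price_x, positions, transaction_cost=0.0001):
    """Daily strategy returns; the position chosen on day t earns the return from t to t+1."""
    price_y = np.asarray(price_y, float)
    price_x = np.asarray(price_x, float)
    positions = np.asarray(positions)
    ret_y = np.diff(price_y) / price_y[:-1]
    ret_x = np.diff(price_x) / price_x[:-1]
    gross = positions[:-1] * (ret_y - ret_x)                # equal-weight legs (assumption)
    trades = np.abs(np.diff(positions, prepend=0))[:-1]     # size of each position change
    return gross - transaction_cost * trades

def _ratio(num, den):
    """Division that returns NaN instead of raising when the denominator is not positive."""
    return num / den if den > 0 else float("nan")

def performance_metrics(returns, periods=252):
    returns = np.asarray(returns, float)
    equity = np.concatenate([[1.0], np.cumprod(1.0 + returns)])
    annual_return = equity[-1] ** (periods / len(returns)) - 1.0
    annual_vol = returns.std(ddof=1) * np.sqrt(periods)
    downside = np.sqrt(np.mean(np.minimum(returns, 0.0) ** 2)) * np.sqrt(periods)
    max_dd = float(np.max(1.0 - equity / np.maximum.accumulate(equity)))
    mean_annual = returns.mean() * periods
    return {
        "annual_return": annual_return,
        "annual_volatility": annual_vol,
        "sharpe_ratio": _ratio(mean_annual, annual_vol),
        "sortino_ratio": _ratio(mean_annual, downside),
        "gain_to_pain": _ratio(returns.sum(), -returns[returns < 0].sum()),
        "max_drawdown": max_dd,
        "drawdown_to_volatility": _ratio(max_dd, annual_vol),
        "calmar_ratio": _ratio(annual_return, max_dd),
    }

demo_returns = strategy_returns([100, 110, 121], [100, 100, 100], [1, 1, 0])
metrics = performance_metrics([0.01, -0.02, 0.03, -0.01])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Shifting the positions by one day relative to the returns keeps a signal from trading on the same bar that produced it.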


In [2]:
#http://127.0.0.1:3034
def Project_1_BackTest(ticker1='WVE', ticker2='XERS'):
    bt = BackTesting()
    bt.backtesting_dashboard(ticker1, ticker2)  # This would start the Dash server and display the webpage with plots and tables.


In [3]:
'''
    See saved webpage PDF in Project_1_BackTest.pdf
'''

Project_1_BackTest(ticker1='WVE', ticker2='XERS')


Optimizing for sharpe_ratio...
Optimizing for sortino_ratio...
Optimizing for max_drawdown...
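
The log above reports one optimization run per metric. As a rough illustration of what each run searches for, the sketch below swaps Bayesian optimization for a plain grid search over `entry_thres` and `exit_thres`; a Bayesian optimizer would replace the exhaustive loop with a surrogate-guided search over the same objective. The data is a simulated mean-reverting series, the z-score is normalized over the full sample (a look-ahead shortcut acceptable only for a demo), and all names are hypothetical.

```python
import itertools
import numpy as np

def sharpe_for_thresholds(z, spread_ret, entry_thres, exit_thres, periods=252):
    """Annualized Sharpe ratio of the z-score rule; spread_ret[t] is the change from t to t+1."""
    current, pnl = 0, []
    for zt, r in zip(z, spread_ret):
        if current == 0:
            if zt < -entry_thres:
                current = 1
            elif zt > entry_thres:
                current = -1
        elif abs(zt) <= exit_thres:
            current = 0
        pnl.append(current * r)
    pnl = np.asarray(pnl)
    sd = pnl.std(ddof=1)
    return 0.0 if sd == 0 else float(pnl.mean() / sd * np.sqrt(periods))

def grid_search(z, spread_ret, entries, exits):
    """Return the (entry, exit) pair with the highest Sharpe ratio, requiring exit < entry."""
    candidates = [(e, x) for e, x in itertools.product(entries, exits) if x < e]
    best = max(candidates, key=lambda p: sharpe_for_thresholds(z, spread_ret, *p))
    return best, sharpe_for_thresholds(z, spread_ret, *best)

# Simulated AR(1) spread (hypothetical data), normalized to a z-score.
rng = np.random.default_rng(1)
s = np.zeros(2000)
for t in range(1, len(s)):
    s[t] = 0.9 * s[t - 1] + rng.normal()
z = (s - s.mean()) / s.std()
spread_ret = np.diff(z)

entries = [1.0, 1.5, 2.0, 2.5]
exits = [0.0, 0.25, 0.5, 1.0]
best, best_sharpe = grid_search(z, spread_ret, entries, exits)

# Sensitivity check in the spirit of step 6.1: fix the entry threshold, vary the exit threshold.
sensitivity = [sharpe_for_thresholds(z, spread_ret, 2.0, x) for x in exits]
print("best (entry, exit):", best, "Sharpe:", round(best_sharpe, 2))
```

To target a different metric, only the objective function changes; the search loop stays the same.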



# Project 2: Multi-Variate Index



## Main Part: Index Performance Analysis (Regression-Based Insights)

1. **Index Selection**  
   - Provide a dropdown menu for the user to choose one index (S&P 500, Russell 2000, or Nasdaq 100), which will serve as the dependent variable **Y**.

2. **Stock Selection Based on Index**  
   - Allow the user to select up to ten stock tickers corresponding to the chosen index:
     - Dynamically update the available stock options based on the selected index.
     - Display a reminder of the remaining stock selections (maximum of 10).
     - If a selected stock has fewer than 252 trading days of data, prompt the user to select alternative stocks.

3. **Run Regression**  
   - Ensure both stock and index data are on a daily returns scale.
   - Display the regression tables, including coefficients (up to 10 beta values + 1 constant).
   - Plot the predicted index returns against the actual index returns.
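
The regression in step 3 amounts to ordinary least squares of daily index returns on daily stock returns plus a constant. Below is a minimal NumPy sketch on synthetic returns; the function name and the data are hypothetical, and the dashboard may well use a statistics library that also reports standard errors and p-values.

```python
import numpy as np

def regress_index_on_stocks(index_ret, stock_rets):
    """OLS of index returns on stock returns; returns (coefs, fitted, r_squared).

    coefs[0] is the constant, coefs[1:] are the betas, one per stock column.
    """
    y = np.asarray(index_ret, float)
    X = np.asarray(stock_rets, float)
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coefs
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coefs, fitted, 1.0 - ss_res / ss_tot

# Synthetic daily returns (hypothetical): three stocks and an index built from them.
rng = np.random.default_rng(7)
stock_rets = rng.normal(0, 0.01, size=(300, 3))
true_betas = np.array([0.5, 0.3, 0.2])
index_ret = 0.0002 + stock_rets @ true_betas + rng.normal(0, 0.0001, 300)

coefs, fitted, r2 = regress_index_on_stocks(index_ret, stock_rets)
print("constant:", round(float(coefs[0]), 5))
print("betas:", coefs[1:].round(3))
```

Plotting `fitted` against `index_ret` gives the predicted-versus-actual chart described above.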



In [6]:
#http://127.0.0.1:3035
def Project2():
    # Project2: Regression Analysis
    StockRegressionDashBoard().run_app()
    # MostExplainDashBoard().run_app()


In [7]:
'''
    See saved webpage PDF in Project_2_Regression.pdf
'''
Project2()


## Optional Part: Find Top 10 Stocks That Best Explain the Index

1. **Index Selection**  
   - Provide a dropdown menu for the user to select one index—S&P 500, Russell 2000, or Nasdaq 100—which will serve as the dependent variable **Y**.

2. **Time Selection**  
   - Allow the user to specify the start and end dates for the analysis.

3. **Method Selection**  
   - Include **three methods** for identifying the top 10 stocks that best explain the selected index. All three start from the same pre-filter:
     - **Shared Filter**: Calculate the rolling correlation between the index and each stock, then keep the 50 stocks with the highest correlations.

   - **Method 1: Stepwise Regression (Static Beta)**  
     - From the top 50 stocks after pre-filtering, iteratively add or remove stocks as predictors based on their statistical significance to optimize the model.

   - **Method 2: Lasso Regression (Static Beta)**  
     - From the top 50 stocks after pre-filtering, Lasso uses regularization to shrink less important coefficients to zero, effectively selecting a subset of stocks that best predict index performance.

   - **Method 3: Kalman Filter Regression (Dynamic Beta)**  
     - From the top 50 stocks after pre-filtering, the Kalman filter uses a dynamic model to estimate time-varying relationships between the index and the stocks, adapting the coefficients as new data becomes available.

4. **Results Dashboard**  
   - Provide a dashboard for users to select their options and view results.
     - If **Method 1** or **Method 2** is selected (yielding static betas), display a bar plot of each top 10 stock’s coefficient.
     - If **Method 3** is chosen (yielding dynamic betas), show a time-varying scatter plot for each beta.
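
The shared filter and Method 2 can be sketched in plain NumPy. This is a sketch under stated assumptions, not the `MostExplainDashBoard` code: the correlation is computed over the full sample rather than a rolling window, the Lasso is a basic cyclic coordinate descent on standardized columns, and the penalty `lam` is chosen by hand for the synthetic data. Function names are hypothetical.

```python
import numpy as np

def top_correlated(index_ret, stock_rets, k=50):
    """Column indices of the k stocks most correlated with the index, best first."""
    y = np.asarray(index_ret, float)
    X = np.asarray(stock_rets, float)
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(corr)[::-1][:k]

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 on standardized columns.

    Assumes no column of X is constant (its standard deviation must be non-zero).
    """
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    resid = y - y.mean()
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            resid = resid + Xs[:, j] * b[j]                  # take column j out of the fit
            rho = Xs[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)   # soft-threshold update
            resid = resid - Xs[:, j] * b[j]                  # put column j back
    return b

# Synthetic returns (hypothetical): only stocks 0 and 3 drive the index.
rng = np.random.default_rng(42)
X = rng.normal(0, 0.01, size=(400, 8))
y = 0.6 * X[:, 0] + 0.4 * X[:, 3] + rng.normal(0, 0.001, 400)

top = top_correlated(y, X, k=2)
b = lasso_cd(X, y, lam=0.001)
selected = np.flatnonzero(b)
print("most correlated:", top.tolist())
print("kept by Lasso:", selected.tolist())
```

Ranking the surviving stocks by the absolute value of their coefficients and keeping the first ten would give the top 10 list shown in the bar plot.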



In [8]:
#http://127.0.0.1:3036
def Project2():
    # StockRegressionDashBoard().run_app()
    MostExplainDashBoard().run_app()

In [9]:
'''
    See saved webpage PDF in Project_2_MostExplain.pdf
'''
Project2()
