# Assignment 5

Deadline: 11.06.2025 12:00 CEST

## Task

Develop an investment strategy for the Swiss equity market, backtest it using the provided datasets (`market_data.parquet`, `jkp_data.parquet`, `spi_index.csv`) and analyze its performance by benchmarking it against the SPI index. Work with the existing code infrastructure (`qpmwp-course`) and extend it by implementing any additional components needed for the strategy. Write a report that presents your methodology and the results.

### Coding (15 points)

- Selection:
  Implement selection item builder functions (via `SelectionItemBuilder`) to filter stocks based on specific criteria (e.g., exclude low-quality or high-volatility stocks).

- Optimization Data & Constraints:
  Implement functions to prepare optimization data (via `OptimizationItemBuilder`), including any econometric or machine learning-based predictions. These functions should also define optimization constraints (e.g., stock, sector, or factor exposure limits).

- Optimization Model:
  If you choose to create a custom optimization model, develop a class inheriting from `Optimization` (similar to `MeanVariance`, `LeastSquares`, or `BlackLitterman`). Your class should include the methods `set_objective` and `solve`, which define the objective function and solve the optimization problem, respectively.

- Machine Learning Prediction:
  Integrate a machine learning model to estimate inputs for the optimization, such as expected returns or risk. This could include regression, classification, or learning-to-rank models. I suggest using the provided `jkp_data` as features, but you may also create your own (e.g., technical indicators computed on the return or price series).

- Simulation:
  Backtest the strategy and simulate portfolio returns. Account for fixed costs (1% per annum) and variable (transaction) costs (0.2% per rebalancing).
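
The cost treatment in the simulation step can be sketched as follows. This is a minimal illustration, not part of `qpmwp-course`: the function name `net_returns` and all of its parameters are hypothetical, the fixed cost is assumed to accrue pro rata per period, and the variable cost is assumed to be a flat 0.2% charge per rebalancing unless a turnover series is supplied, in which case it is scaled by turnover.

```python
import numpy as np

def net_returns(gross, rebalanced, turnover=None,
                fixed_cost_pa=0.01, variable_cost=0.002, periods_per_year=252):
    """Subtract fixed and variable costs from periodic gross returns.

    gross      : periodic gross portfolio returns
    rebalanced : boolean flags, True in periods where the portfolio is rebalanced
    turnover   : optional traded fraction per period (assumption: if given,
                 the variable cost is charged on the traded fraction only)
    """
    gross = np.asarray(gross, dtype=float)
    rebalanced = np.asarray(rebalanced, dtype=bool)

    # Fixed costs: spread the annual rate evenly over all periods (simple pro rata).
    fixed = fixed_cost_pa / periods_per_year

    # Variable costs: flat fee per rebalancing, or fee times turnover if provided.
    if turnover is None:
        variable = np.where(rebalanced, variable_cost, 0.0)
    else:
        variable = np.where(rebalanced, variable_cost * np.asarray(turnover, dtype=float), 0.0)

    return gross - fixed - variable

# Three periods, rebalancing only in the first one.
net = net_returns([0.01, 0.0, -0.005], [True, False, False], periods_per_year=250)
print(net)
```

Whether the 0.2% applies per rebalancing event or per unit of turnover changes the result noticeably for low-turnover strategies, so the report should state which reading was used.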


### Report (15 points)

Generate an HTML report with the following sections:

- High-level strategy overview: Describe the investment strategy you developed.

- Detailed explanation of the backtesting steps: Offer a more comprehensive breakdown of the backtesting process, including a description of the models implemented (e.g., details of the machine learning method used).

- Backtesting results:
    
    - Charts: Include visual representations (e.g., cumulative performance charts, rolling 3-year returns, etc.).
    - Descriptive statistics: Present key statistics such as mean, standard deviation, drawdown, turnover, and Sharpe ratio (or any other relevant metric) for the full backtest period as well as for subperiods (e.g., the last 5 years, or during bull vs. bear market phases).
    - Compare your strategy against the SPI index.
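
The descriptive statistics listed above could be computed along these lines. This is a sketch under stated assumptions: the helper name `performance_stats` is hypothetical, returns are simple periodic returns, annualization uses 252 periods per year by default, and the Sharpe ratio uses a constant annual risk-free rate (zero by default).

```python
import numpy as np

def performance_stats(returns, periods_per_year=252, risk_free=0.0):
    """Annualized mean, volatility, Sharpe ratio and maximum drawdown."""
    r = np.asarray(returns, dtype=float)
    r = r[~np.isnan(r)]  # ignore missing observations

    # Wealth path starting at 1 so that an initial loss counts as a drawdown.
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    running_max = np.maximum.accumulate(wealth)
    drawdown = wealth / running_max - 1.0

    ann_mean = r.mean() * periods_per_year
    ann_vol = r.std(ddof=1) * np.sqrt(periods_per_year)
    sharpe = (ann_mean - risk_free) / ann_vol if ann_vol > 0 else np.nan

    return {
        'ann_mean': ann_mean,
        'ann_vol': ann_vol,
        'sharpe': sharpe,
        'max_drawdown': drawdown.min(),
    }

stats = performance_stats([0.1, -0.5, 0.2], periods_per_year=1)
print(stats)
```

Calling the same function on the strategy returns and on the SPI returns, first for the full period and then for each subperiod slice, gives a side-by-side comparison table for the report.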


In [2]:
# Standard library imports
import os
import sys
import types

# Third party imports
import numpy as np
import pandas as pd

# Load environment variables from .env file
from dotenv import load_dotenv
load_dotenv()

# Add the project source directory to the Python path
src_path = os.getenv('PROJECT_SOURCE_DIR')
#print(src_path)
if src_path is None:
    raise RuntimeError('PROJECT_SOURCE_DIR is not set; check the .env file')
sys.path.append(src_path)

# Local modules imports
from helper_functions import load_data_msci
from estimation.covariance import Covariance
from estimation.expected_return import ExpectedReturn
from optimization.optimization import MeanVariance
from backtesting.backtest_item_builder_classes import (
    SelectionItemBuilder,
    OptimizationItemBuilder,
)
from backtesting.backtest_item_builder_functions import (
    bibfn_selection_data_random,
    bibfn_return_series,
    bibfn_budget_constraint,
    bibfn_box_constraints,
)
from backtesting.portfolio import floating_weights
from backtesting.backtest_service import BacktestService
from backtesting.backtest import Backtest

In [None]:
## 1. Load Data
market_data = pd.read_parquet('../data/market_data.parquet')
jkp_data = pd.read_parquet('../data/jkp_data.parquet')
# Load spi_index.csv with correct column names and date parsing
spi_index = pd.read_csv(
    '../data/spi_index.csv',
    names=['date', 'spi'],
    header=0,
    parse_dates=['date'],
    dayfirst=True
)

# Check for date and ticker columns in market_data
#print(market_data.head())
#print(market_data.index.names)

# If date/id are in the index, reset them to columns
# (the stock identifier column is called 'id' in the rest of this notebook)
if 'date' not in market_data.columns or 'id' not in market_data.columns:
    market_data = market_data.reset_index()
    #print(market_data.columns)
    #print(market_data.head())

# Compute daily returns for each stock
market_data = market_data.sort_values(['id', 'date'])
market_data['return'] = market_data.groupby('id')['price'].pct_change()

# Inspect the result
#print(market_data[['date', 'id', 'price', 'return']].head(10))

# Merge market return (SPI) into market_data
market_data = pd.merge(
    market_data,
    spi_index.rename(columns={'spi': 'market_return'}),
    on='date',
    how='left'
)

print(market_data[['date', 'id', 'return', 'market_return']].head(10))

        date id    return  market_return
0 1999-05-06  1       NaN      -0.005188
1 1999-05-07  1 -0.013015      -0.001979
2 1999-05-10  1  0.000000       0.001216
3 1999-05-11  1  0.010989      -0.001575
4 1999-05-12  1  0.000000      -0.004866
5 1999-05-14  1  0.000000      -0.011553
6 1999-05-19  1  0.000000       0.002712
7 1999-05-21  1  0.000000      -0.006180
8 1999-05-26  1  0.000000       0.001580
9 1999-05-31  1  0.006522       0.015358


In [18]:
## 2. Compute Idiosyncratic Volatility
from statsmodels.regression.rolling import RollingOLS

def compute_idio_vol(df, window=252):
    """Add rolling-regression residuals and their rolling std (idiosyncratic volatility)."""
    df = df.copy()
    df['resid'] = np.nan
    for ticker in df['id'].unique():
        # Rows with a missing return or market return cannot enter the regression
        stock = df[df['id'] == ticker].dropna(subset=['return', 'market_return'])
        if len(stock) < window:
            continue
        # Rolling regression without intercept: return ~ market_return
        rols = RollingOLS(stock['return'], stock[['market_return']], window=window)
        res = rols.fit()
        # RollingRegressionResults has no `resids` attribute, so rebuild the
        # residual from the rolling beta: resid_t = return_t - beta_t * market_return_t
        beta = res.params['market_return']
        df.loc[stock.index, 'resid'] = stock['return'] - beta * stock['market_return']
    # Rolling std of residuals = idiosyncratic volatility
    df['idio_vol'] = df.groupby('id')['resid'].rolling(window).std().reset_index(0, drop=True)
    return df

market_data = compute_idio_vol(market_data)

# Inspect result
print(market_data[['date', 'id', 'return', 'market_return', 'idio_vol']].dropna().head(10))

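
An idiosyncratic volatility column like the one this cell aims to produce could feed the selection step by excluding the most volatile stocks at each rebalancing date. The sketch below is an assumption-laden illustration: the function name `select_low_idio_vol`, the `quantile` parameter, and the output name `keep` are hypothetical, and the input is assumed to have `date`, `id`, and `idio_vol` columns. How such a function is wired into `SelectionItemBuilder` depends on the signature expected by `qpmwp-course`, which is not shown here.

```python
import pandas as pd

def select_low_idio_vol(df, rebdate, quantile=0.8):
    """Flag stocks whose latest idiosyncratic volatility is not in the top tail.

    Returns a boolean Series indexed by stock id (True = keep the stock).
    """
    # Use only information available up to and including the rebalancing date.
    hist = df[df['date'] <= pd.Timestamp(rebdate)].dropna(subset=['idio_vol'])

    # Most recent idiosyncratic volatility per stock.
    latest = hist.sort_values('date').groupby('id')['idio_vol'].last()

    # Keep stocks at or below the chosen cross-sectional quantile.
    threshold = latest.quantile(quantile)
    return (latest <= threshold).rename('keep')

# Toy data: five stocks on one date, plus a later observation that must be ignored.
toy = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-31'] * 5 + ['2020-02-28']),
    'id': ['A', 'B', 'C', 'D', 'E', 'A'],
    'idio_vol': [0.01, 0.02, 0.03, 0.04, 0.05, 0.9],
})
print(select_low_idio_vol(toy, '2020-01-31'))
```

Filtering on `date <= rebdate` avoids look-ahead bias: the later, much higher value for stock `A` does not affect the selection made on 2020-01-31.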