<h1 style="text-align: center;">Portfolio Optimization Techniques</h1>

<h2 style="text-align: center;"> A Comparative Study</h2>

Investing in a portfolio is complex due to uncertainty, randomness, and biases affecting decision-making. The efficient market hypothesis suggests it's difficult to consistently outperform the market. Modern portfolio theory aims to balance risk and return, often through diversification. Diversification spreads risk across different assets but requires understanding correlations between them. Automating asset classification and clustering can help reduce biases and optimize diversification strategies for better investment decisions.

This notebook explores various portfolio optimization strategies by combining traditional and advanced machine learning techniques. The core methodologies include autoencoders for dimensionality reduction, KMeans clustering (with both Euclidean Distance & Dynamic Time Warping as a metric) for asset selection, and the application of classic portfolio optimization techniques such as Sharpe Ratio maximization and mean-variance optimization.

Throughout this notebook, we focus on understanding how these techniques can improve portfolio construction and achieve superior returns over a specified testing period, benchmarking the results against the MASI Index.

# Setting Up the Environment

In this section, we will import the necessary libraries and packages required for the implementation of various portfolio optimization strategies. These include traditional finance tools such as mean-variance optimization and Sharpe Ratio maximization, as well as machine learning techniques like autoencoders and KMeans clustering. The combination of these tools allows us to explore and compare different approaches for constructing and optimizing portfolios.

In [1]:
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
import random
from scipy.optimize import minimize
import plotly.graph_objects as go

# Import libraries for time series clustering and preprocessing
from tslearn.clustering import TimeSeriesKMeans
from sklearn.cluster import KMeans

# Standard scaling & ParameterGrid
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import ParameterGrid

# Import libraries for autoencoder
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

import warnings
# Ignore warnings to keep the output clean
warnings.simplefilter(action='ignore')

# Data Loading & Preprocessing

In this step, we load the historical moroccan stock price data and relevant company information. The stock price data will be used to compute returns and apply various portfolio optimization techniques, while the company information will assist in selecting assets based on criteria such as market capitalization. We set the `Date` as the index for the stock price data to facilitate time-series analysis.

In [2]:
# Load the stock price data, with 'Date' as the index column
stocks = pd.read_csv("data/stocks.csv", index_col='Date')

# Load the companies' information with the first column as the index
companies = pd.read_csv("data/companies_isin.csv", index_col=0)

In [3]:
# Convert the index of the stocks DataFrame to datetime format
stocks.index = pd.to_datetime(stocks.index, format='%d/%m/%Y')

# Remove companies whose instruments are not found in the stocks DataFrame columns
companies.drop(companies[~companies["Instrument"].isin(stocks.columns)].index, inplace=True)
companies.reset_index(inplace=True, drop=True)

In [4]:
# Display top 15 rows with missing values in the stocks DataFrame
(stocks.isnull().sum() * 100 / stocks.shape[1]).sort_values(ascending=False).head(15)

Diac Salaf         1660.000000
SAMIR              1660.000000
Aradei Capital     1277.333333
Mutandis            621.333333
Immr Invest         429.333333
Nexans Maroc         64.000000
Maghreb Oxygene      32.000000
Agma                 30.666667
Unimer               28.000000
Maghrebail           26.666667
Central.Danone       24.000000
Zellidja             22.666667
BALIMA               21.333333
Rebab Company        20.000000
Afric Indus          17.333333
dtype: float64

In this step, we select the **MASI index**, a market index, as our benchmark. This index will serve as a reference point to evaluate the performance of our various portfolio optimization strategies. By comparing our portfolios against this benchmark, we can assess how well each strategy performs relative to the broader market.

In [5]:
# Set 'MASI' column as the benchmark (likely a market index)
benchmark = stocks['MASI']

In [6]:
# Drop columns with more than 10% missing values, alongside the benchmark column
stocks.drop(columns=["SAMIR", "Diac Salaf", "Aradei Capital", "Mutandis", "Immr Invest", "MASI"], inplace=True)

# Remove the corresponding rows in the companies DataFrame
companies.drop(companies[companies["Instrument"].isin(["SAMIR", "Diac Salaf", "Aradei Capital", "Mutandis", "Immr Invest"])].index, inplace=True)
companies.reset_index(drop=True, inplace=True)

After cleaning the dataset by removing stocks with significant missing data, we address the remaining missing data in the stock prices. Missing values are filled using forward filling and backward filling methods, ensuring that any gaps in the time series data are handled appropriately. This step ensures consistency in the data before proceeding with further analysis and optimizations.

In [7]:
# Fill missing values by forward filling (propagating last valid observation forward)
stocks = stocks.fillna(method='ffill')

# Fill any remaining missing values by backward filling (propagating next valid observation backward)
stocks = stocks.fillna(method='bfill')

To build and evaluate our portfolio optimization strategies, we divide the stock price data into training and testing sets. The training set, comprising $80%$ of the data, will be used to develop and optimize the portfolios, while the remaining $20%$ will be reserved for testing and evaluating the performance of the portfolios against the benchmark. This ensures that the strategies are trained on historical data and validated on unseen data, allowing for an objective performance comparison.

In [8]:
# Get the number of assets (columns) in the stocks DataFrame
num_assets = stocks.shape[1]

# Split the data into 80% for training and 20% for testing
split = int(len(stocks) * 0.8)
train = stocks.iloc[:split]  # Training data
test = stocks.iloc[split:]   # Testing data
test_benchmark = benchmark.loc[test.index] # Benchmark data for testing

# Print the shapes of the training and testing sets
print(f"Shape of the training set: {train.shape}")
print(f"Shape of the testing set: {test.shape}")

Shape of the training set: (996, 69)
Shape of the testing set: (249, 69)


# Portfolio Utility Functions

In this section, we define several utility functions that will be used throughout the notebook for portfolio analysis and optimization. These functions help calculate key performance metrics such as cumulative returns and Sharpe Ratio, which are essential for evaluating and comparing the performance of various portfolio strategies. By abstracting these calculations into reusable functions, we streamline the optimization process and ensure consistency across different strategies.

In [9]:
def cumulative_returns(weights, data):
    """
    Parameters:
    - weights : A List or numpy array of weights for each asset in the portfolio.
    - data : A pandas DataFrame containing historical adjusted close prices of assets.

    Returns:
    - cumulative_returns : Cumulative returns of the portfolio over the period.
    """
    
    # Calculate daily returns for each asset as a percentage change
    daily_returns = data.pct_change()

    # Set the first row (which contains NaNs due to pct_change) to zero
    daily_returns.iloc[0] = 0
    
    # Apply the portfolio weights to each asset's daily returns
    weighted_returns = (weights * daily_returns).sum(axis=1)
    
    # Calculate the cumulative returns of the portfolio
    cumulative_returns = (weighted_returns + 1).cumprod() - 1
    
    return cumulative_returns

In [10]:
def sharpe_ratio(weights, returns, risk_free_rate=0.02):
    """
    Compute the Negative Sharpe Ratio of a portfolio with given weights.

    Parameters:
    - weights: A list or numpy array of weights for each asset in the portfolio.
    - returns: A pandas DataFrame containing the historical returns of each asset.
    - risk_free_rate: The annual risk-free rate, usually expressed as a decimal (e.g., 0.05 for 5%).

    Returns:
    - sharpe_ratio: The Sharpe Ratio of the portfolio.
    """
    
    # Calculate the weighted returns of the portfolio
    weighted_returns = (weights * returns).sum(axis=1)
    
    # Calculate the excess return of the portfolio (over the risk-free rate, annualized)
    excess_return = weighted_returns.mean() - (risk_free_rate / 252)  # Assuming 252 trading days in a year
    
    # Calculate the annualized volatility (standard deviation) of the portfolio returns
    volatility = weighted_returns.std() * np.sqrt(252)  # Assuming 252 trading days in a year
    
    # Calculate and return the Sharpe Ratio (negative for minimization purposes)
    sharpe_ratio = excess_return / volatility
    
    return -sharpe_ratio  # Negating for optimization (to maximize Sharpe Ratio)

In [11]:
def maximum_drawdown(returns):
    """
    Compute the maximum drawdown of a portfolio.

    Parameters:
    - returns: A pandas DataFrame containing the historical cumulative returns of the portfolio.

    Returns:
    - drawdown: The full drawdown series of the portfolio.
    """
    
    # Calculate the rolling maximum cumulative return at each point in time
    rolling_max = returns.cummax()
    
    # Calculate the drawdown as the percentage decline from the rolling maximum
    drawdown = (returns - rolling_max) / rolling_max

    # Set the first drawdown value to 0 (to avoid NaNs or misleading values)
    drawdown.iloc[0] = 0
    
    return drawdown

# Equal-Weighted portfolio

As a baseline strategy, we construct an equal-weighted portfolio, where each asset is assigned the same weight regardless of its characteristics or historical performance. This simple approach provides a benchmark for comparison against more advanced optimization techniques.

## Building the Equal-Weighted Portfolio

In [12]:
# Calculate equal weights for the portfolio, assigning the same weight to each asset
equal_weights = np.full(num_assets, 1 / num_assets)

# Calculate the cumulative returns of the portfolio using equal weights
portfolio_returns_equal_weighted = cumulative_returns(equal_weights, test)

print(f"The cumulative return for the equal weighted portfolio over the entire testing period : {portfolio_returns_equal_weighted[-1] * 100:.2f}%")

The cumulative return for the equal weighted portfolio over the entire testing period : 28.06%


## Backtesting the Equal-Weighted Portfolio

We will backtest the equal-weighted portfolio using historical stock data from 2021 to 2022. The backtesting process involves simulating the portfolio's performance over time to evaluate its effectiveness. The portfolio is rebalanced annually, ensuring that each asset maintains equal weighting throughout the testing period. This approach provides a benchmark to compare with more advanced strategies implemented later in the notebook.

In [13]:
# Calculate the cumulative returns for the benchmark (e.g., MASI) over the test period
benchmark_cumulative_returns = cumulative_returns([1], pd.DataFrame(benchmark[test.index]))

In [14]:
# Create traces
portfolio_trace = go.Scatter(x=portfolio_returns_equal_weighted.index, y=portfolio_returns_equal_weighted, mode='lines', name='Equal-Weighted Portfolio', line=dict(color='green'))
masi_trace = go.Scatter(x=benchmark_cumulative_returns.index, y=benchmark_cumulative_returns, mode='lines', name='MASI Index', line=dict(color='red', dash='dash'))

# Layout
layout = go.Layout(
    title='Equal-Weighted Portfolio Performance',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Cumulative Returns'),
    legend=dict(x=0, y=1, traceorder='normal', font=dict(size=12))
)

# Create figure
fig = go.Figure(data=[portfolio_trace, masi_trace], layout=layout)

# Show plot
fig.show()

The equal-weighted portfolio consistently outperforms the MASI Index, particularly from mid-2021 onwards, indicating that this simple strategy was effective during the period under consideration. The portfolio is rebalanced annually, ensuring that all assets maintain equal weighting, which contributes to its steady growth. The MASI Index, represented with a dashed line, exhibits more volatility and lower overall returns compared to the equal-weighted strategy.

In [15]:
# Create traces
portfolio_trace = go.Scatter(x=portfolio_returns_equal_weighted.index, y=maximum_drawdown(portfolio_returns_equal_weighted), mode='lines', name='Equal-Weighted Portfolio Drawdown', line=dict(color='blue'))
masi_trace = go.Scatter(x=benchmark_cumulative_returns.index, y=maximum_drawdown(benchmark_cumulative_returns), mode='lines', name='MASI Index Drawdown', line=dict(dash='dash', color='red'))

# Layout
layout = go.Layout(
    title='Equal-Weighted Portfolio Drawdown Comparison',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Maximum Drawdown'),
    legend=dict(x=0, y=1, traceorder='normal', font=dict(size=11))
)

# Create figure
fig = go.Figure(data=[portfolio_trace, masi_trace], layout=layout)

# Show plot
fig.show()

Drawdown measures the decline from the portfolio's peak value to its lowest point before a recovery. The equal-weighted portfolio demonstrates better resilience, with shallower drawdowns compared to the MASI Index. The MASI Index shows periods of more severe drawdowns, particularly in mid-2021, where it experiences a significant drop of nearly 50%. In contrast, the equal-weighted portfolio’s largest drawdown is around 30%, indicating that it was less volatile and more stable during downturns. This drawdown comparison highlights the risk reduction benefits of the equal-weighted strategy relative to the market index.

# Market-Cap Weighted Portfolio

In this section, we construct a market-cap weighted portfolio, where the weight of each stock is proportional to its market capitalization. This strategy reflects the relative size of each company in the market, with larger companies receiving higher allocations in the portfolio. We begin by extracting the market capitalization data, calculate the total market value, and then assign weights accordingly. Finally, we calculate the cumulative returns of the market-cap weighted portfolio over the testing period to evaluate its performance.

## Building the Market-Cap Weighted Portfolio

In [16]:
# Step 1: Extract market capitalization for each stock from the companies DataFrame
market_cap = companies.set_index("Instrument")["Share Capital"]

# Step 2: Calculate the total market capitalization across all stocks
total_market_cap = market_cap.sum()

# Step 3: Calculate the portfolio weights for each stock based on their proportion of total market capitalization
weights = market_cap / total_market_cap

# Step 4: Calculate the cumulative returns of the market cap-weighted portfolio
portfolio_returns_market_cap_weighted = cumulative_returns(weights, test)

print(f"The cumulative return for the market cap weighted portfolio over the entire testing period : {portfolio_returns_market_cap_weighted[-1] * 100 :.2f}%")

The cumulative return for the market cap weighted portfolio over the entire testing period : 8.86%


## Backtesting the Market-Cap Weighted Portfolio

In [17]:
# Create traces
marketcap_trace = go.Scatter(x=portfolio_returns_market_cap_weighted.index, y=portfolio_returns_market_cap_weighted, mode='lines', name='Market-Cap Portfolio', line=dict(color='green'))
masi_trace = go.Scatter(x=benchmark_cumulative_returns.index, y=benchmark_cumulative_returns, mode='lines', name='MASI Index', line=dict(dash='dash'))

# Layout
layout = go.Layout(
    title='Market-Cap Weighted Portfolio Performance',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Cumulative Returns'),
    legend=dict(x=0, y=1, traceorder='normal', font=dict(size=12))
)

# Create figure
fig = go.Figure(data=[marketcap_trace, masi_trace], layout=layout)

# Show plot
fig.show()

Unlike the equal-weighted portfolio, the market-cap weighted portfolio gives more weight to larger companies, which may influence its performance. In this case, the market-cap weighted portfolio underperforms the MASI Index, particularly from mid-2021 onwards. While it experiences a slight uptick in performance towards the end of the period, it still lags behind the benchmark index, which demonstrates stronger and more consistent growth. This suggests that, during the testing period, larger companies included in the portfolio did not perform as well relative to the overall market.

In [18]:
# Create traces
marketcap_trace = go.Scatter(x=portfolio_returns_market_cap_weighted.index, y=maximum_drawdown(portfolio_returns_market_cap_weighted), mode='lines', name='Market-Cap Portfolio Drawdown', line=dict(color='blue'))
masi_trace = go.Scatter(x=benchmark_cumulative_returns.index, y=maximum_drawdown(benchmark_cumulative_returns), mode='lines', name='MASI Index Drawdown', line=dict(dash='dash', color='red'))

# Layout
layout = go.Layout(
    title='Market-Cap Portfolio Drawdown Comparison',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Maximum Drawdown'),
    legend=dict(x=0, y=1, traceorder='normal', font=dict(size=11))
)

# Create figure
fig = go.Figure(data=[marketcap_trace, masi_trace], layout=layout)

# Show plot
fig.show()

The market-cap weighted portfolio exhibits significantly deeper drawdowns compared to the MASI Index, with a dramatic dip occurring in mid-2021, where the portfolio experienced a drawdown of over 100%, indicating a substantial loss. Although the MASI Index also experiences drawdowns, they are much less severe and recover more quickly than those of the market-cap portfolio. This suggests that the market-cap weighted strategy carried higher risk and was more vulnerable to large losses during this period, emphasizing the importance of managing downside risk in portfolio construction.

# Maximizing Sharpe Ratio

The Sharp ratio measures the risk-adjusted return of an investment. It is given by the formula:

$$
\text{Sharp Ratio} = \frac{R_p - R_f}{\sigma_p}
$$

where: 

- $R_p$ is the portfolio return
- $R_f$ is the risk-free rate
- $\sigma_p$ is the portfolio standard deviation
The Sharp ratio is used to compare the performance of different portfolios. A higher Sharp ratio indicates better risk-adjusted returns.

To build an optimal portfolio, the goal is to maximize the Sharpe Ratio. This is done by adjusting the portfolio's asset weights such that the overall return ($R_p$) increases relative to the portfolio’s risk ($\sigma_p$). In practice, this involves solving a constrained optimization problem, where the sum of the portfolio weights equals 1 (fully invested portfolio) and no asset weight exceeds predefined bounds. By optimizing the Sharpe Ratio, we seek to construct a portfolio that maximizes returns for each unit of risk, thus improving overall performance compared to other strategies that may not account for risk as effectively.

## Building a Sharp Ratio Optimized Portfolio

In this section, we aim to optimize the portfolio's asset weights to maximize the Sharpe Ratio, thereby achieving the best possible risk-adjusted returns. We start by making an initial guess of equal weights for all assets and then define the bounds for the weights, ensuring that no asset exceeds a certain allocation. Additionally, we impose a constraint that the total allocation across all assets must sum to 100%, ensuring a fully invested portfolio.

To optimize the portfolio, we use the SLSQP (Sequential Least Squares Programming) method, which minimizes the negative Sharpe Ratio—effectively maximizing it. The result of this optimization is a set of asset weights that generate the highest possible Sharpe Ratio, maximizing returns for each unit of risk taken.


In [19]:
# Define the initial guess for the weights (equal weights for each asset)
initial_weights = np.array([1 / num_assets] * num_assets)

# Define the bounds for the weights (each weight should be between 0.001 and 1)
bounds = [(0.001, 1)] * num_assets

# Define the constraint that the sum of all weights must equal 1 (full investment)
constraints = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1})

# Calculate the daily returns for the training data, removing any rows with missing values
train_returns = train.pct_change().dropna()

# Minimize the negative Sharpe Ratio (to effectively maximize the Sharpe Ratio)
result = minimize(sharpe_ratio, initial_weights, args=(train_returns),
                  bounds=bounds, method='SLSQP', constraints=constraints)

# Extract the optimized weights from the result of the minimization
optimized_weights = result.x

# Print the optimized Sharpe Ratio (converted back from the negative value)
print("Optimized Sharpe Ratio:", -result.fun)

Optimized Sharpe Ratio: 0.00927165701200428


In [20]:
# Calculate the portfolio return using the portfolio_return function
portfolio_returns_optimized = cumulative_returns(optimized_weights, test)

print(f"The cumulative return for the optimized portfolio over the entire testing period : {portfolio_returns_optimized[-1]*100:.2f}%")

The cumulative return for the optimized portfolio over the entire testing period : 21.72%


## Backtesting the Sharpe Ratio Optimized Portfolio

In [21]:
# Create traces
masi_trace = go.Scatter(x=benchmark_cumulative_returns.index, y=benchmark_cumulative_returns, mode='lines', name='MASI Index', line=dict(dash='dash', color='red'))
optimized_trace = go.Scatter(x=portfolio_returns_optimized.index, y=portfolio_returns_optimized, mode='lines', name='Optimized Portfolio', line=dict(color='green'))

# Layout
layout = go.Layout(
    title='Sharpe Ratio Optimized Portfolio Performance',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Cumulative Returns'),
    legend=dict(x=0, y=1, traceorder='normal', font=dict(size=12))
)

# Create figure
fig = go.Figure(data=[masi_trace, optimized_trace], layout=layout)

# Show plot
fig.show()

The optimized portfolio is constructed to maximize risk-adjusted returns, meaning it seeks to achieve the highest possible returns for a given level of risk. 

Throughout the testing period, the optimized portfolio shows a performance similar to the MASI Index but demonstrates higher stability in periods of market volatility. Particularly towards the end of 2021, the optimized portfolio begins to outperform the MASI Index. This suggests that the optimization successfully adjusted asset weights to

In [22]:
# Create traces
masi_trace = go.Scatter(x=benchmark_cumulative_returns.index, y=maximum_drawdown(benchmark_cumulative_returns), mode='lines', name='MASI Index Drawdown', line=dict(dash='dash', color='red'))
optimized_trace = go.Scatter(x=portfolio_returns_optimized.index, y=maximum_drawdown(portfolio_returns_optimized), mode='lines', name='Optimized Portfolio Drawdown', line=dict(color='purple'))

# Layout
layout = go.Layout(
    title='Sharpe Ratio Optimized Portfolio Drawdown Comparison',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Maximum Drawdown'),
    legend=dict(x=0, y=1, traceorder='normal', font=dict(size=11))
)

# Create figure
fig = go.Figure(data=[optimized_trace, masi_trace], layout=layout)

# Show plot
fig.show()

Interestingly, the Sharpe Ratio optimized portfolio exhibits larger drawdowns early in the testing period, particularly around March 2021. This indicates that the portfolio experienced a substantial loss during that time. However, after recovering from this significant drop, the optimized portfolio exhibits shallower drawdowns and stabilizes, closely tracking the MASI Index for the rest of the testing period.

This comparison highlights the optimized portfolio's ability to recover from early drawdowns and maintain stability, showing that while the portfolio was exposed to greater initial risk, it managed to mitigate further losses and perform on par with the market benchmark for the majority of the period.

# Mean-Variance Optimization (MVO)

Mean-Variance Optimization (MVO) is a portfolio optimization technique that seeks to maximize expected returns for a given level of risk. By considering both the expected returns and the covariance matrix of the assets, MVO identifies the optimal allocation of weights to minimize risk while achieving the desired level of return. This method is fundamental in modern portfolio theory and is widely used to balance the risk-return trade-off.

Logarithmic (log) returns are often preferred when calculating the expected returns and the covariance matrix of assets due to the following reasons:

+ **Normalization**: Log returns transform absolute price changes into relative changes, thereby normalizing the data. This makes it easier to compare returns across different assets or time periods, especially when assets have varying price scales.
   
+ **Additivity**: One of the key advantages of log returns is their additive property. The sum of individual log returns over multiple time periods is equivalent to the total log return for the combined period. This additivity simplifies the calculation of returns across time horizons.
  
+ **Statistical Properties**: Log returns often exhibit more desirable statistical properties compared to simple returns. They tend to have more symmetric distributions and are closer to normality, making them more suitable for use in mathematical models and statistical analyses, such as those employed in MVO. These properties lead to more robust and reliable results in optimization.

By using log returns in MVO, we ensure that the optimization process benefits from more stable and mathematically tractable inputs, which helps improve the accuracy and effectiveness of the portfolio construction.

## Building the Mean-Variance Optimized Portfolio

In this step, we calculate the log returns of the assets, which are essential inputs for the mean-variance optimization process. Log returns are preferred due to their normalization properties, additivity over time periods, and more favorable statistical characteristics. After calculating the log returns, we compute the expected return (mean) and the covariance matrix of the assets, which represent the relationships between asset returns. These statistics will be used to determine the optimal portfolio allocation under the mean-variance framework.

In [23]:
# Calculate the log returns for the training data
log_returns = np.log(train / train.shift(1))

# Drop any rows with missing values (due to shifting)
log_returns = log_returns.dropna()

# Compute the mean of the log returns (logarithmic expected returns)
log_mu = log_returns.mean()

# Compute the covariance matrix of the log returns (logarithmic covariance of returns)
log_cov = log_returns.cov()

The objective function in portfolio optimization, $F(w)$, combines different components to find an optimal allocation of assets. It can be mathematically represented as:

\begin{align*} 
\text{Objective Function} = \text{Portfolio Risk} + \text{Penalty}
\end{align*}

1. **Portfolio Risk**: The portfolio risk measures the uncertainty or volatility of the portfolio's returns. It is typically represented by the standard deviation of the portfolio returns. The portfolio risk can be computed using the following formula:

   \begin{align*}
   \text{Portfolio Risk} = \sqrt{w^T \cdot C_{log} \cdot w}
   \end{align*}

   Here, $w$ represents the vector of portfolio weights, and $C_{log}$ is the logarithmic covariance matrix of asset returns.

2. **Penalty**: The penalty term is included to enforce the desired target return in the optimization process. The penalty encourages the portfolio to approach the target return by penalizing deviations from it. Mathematically, the penalty term can be expressed as:

   \begin{align*}
   \text{Penalty} = P \cdot \left|\bar V - (1 + T) V\right|
   \end{align*}

   Here, **$\bar V$** represents the expected return of the portfolio, **$P$** is a parameter that controls the strength of the penalty, **$T$** is the desired target return, and **$V$** is the value of the portfolio.
   
   **Note**:
   
   \begin{align*}
   \bar V = w^T \cdot \mu_{log}\\ \text{ and } V = w^T \cdot \Gamma
   \end{align*}
   where $\mu_{log}$ denotes the logarithmic expected returns and $\Gamma$ is the return in the last period.
   

By summing the portfolio risk and the penalty term, the objective function aims to find the optimal weights that minimize the risk of the portfolio while considering the deviation from the target return.

The total formula of the objective function, combining the portfolio risk and the penalty term, is as follows:

\begin{align*}
F(w) = \sqrt{w^T \cdot C_{log} \cdot w} + P \cdot \left| \bar V - (1 + T) \cdot V \right|
     = \sqrt{w^T \cdot C_{log} \cdot w} + P \cdot \left|w^T \cdot (\mu_{log} - (1 + T) \cdot \Gamma)\right|
\end{align*}


It's important to note that the specific formulas and mathematical representations may vary depending on the optimization approach or methodology used. However, the key idea remains consistent: to balance risk and return while incorporating a penalty for deviations from the desired target return.

In [24]:
def objective(weights, log_mu, log_cov, target_return, penalty_weight):
    """
    Objective function to minimize for the portfolio optimization problem.

    Parameters:
    - weights: A numpy array of portfolio weights.
    - log_mu: A numpy array of mean log returns for each asset.
    - log_cov: A 2-dimensional numpy array of log return covariance matrix.
    - target_return: The desired target return for the portfolio.
    - penalty_weight: A penalty factor for deviating from the target return.

    Returns:
    - objective_value: The objective value to minimize.
    """

    # Calculate the expected portfolio return using the log returns mean and weights
    portfolio_return = log_mu @ weights
    
    # Calculate the portfolio risk (standard deviation) using the covariance matrix and weights
    portfolio_risk = np.sqrt(np.dot(weights.T, np.dot(log_cov, weights)))
    
    # Compute the portfolio value based on the last available log returns
    portfolio_value = np.dot(weights, log_returns.iloc[-1].values.T)
    
    # Calculate the penalty based on how far the portfolio return deviates from the target return
    penalty = penalty_weight * np.abs(portfolio_return - (1 + target_return) * portfolio_value)
    
    # Return the objective value, combining risk and penalty
    return portfolio_risk + penalty

In this cell, we define the key parameters that guide the portfolio optimization process. Each parameter plays a crucial role in ensuring the portfolio adheres to specific requirements and objectives:

1. **Constraints**:  
   The constraints parameter defines the conditions that the portfolio must satisfy during optimization. In this case, the primary constraint ensures that the sum of all weights assigned to assets equals 1. This constraint guarantees that the portfolio is fully invested, meaning all available resources are allocated across the assets without any leftover capital. The portfolio remains balanced by adhering to this constraint.

2. **Bounds**:  
   The bounds parameter specifies the allowable range for the portfolio weights of each asset. Here, the bounds are set between 0 and 1, which means that each asset can have a minimum allocation of 0% and a maximum allocation of 100%. This ensures the portfolio follows a long-only strategy, prohibiting short-selling (negative weights) and ensuring that the entire portfolio remains fully allocated.

3. **Target Return**:  
   The target return represents the desired level of expected return that the portfolio optimization aims to achieve. For this problem, the target return is set at 10% (0.1). The optimization algorithm will seek to find the best allocation of weights that produces this return while simultaneously minimizing portfolio risk, aligning with the principles of mean-variance optimization.

4. **Penalty Weight**:  
   The penalty weight is a critical parameter that determines the significance of the penalty term in the objective function. It controls how strongly the optimization prioritizes achieving the target return. A higher penalty weight forces the portfolio to stay closer to the desired return, potentially at the expense of higher risk. Conversely, a lower penalty weight allows for greater flexibility in balancing the risk-return trade-off, giving the optimizer more leeway in deviating slightly from the target return to minimize overall risk.

In [25]:
# Define constraint: The sum of the weights must be equal to 1 (fully invested portfolio)
constraints = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1})

# Define bounds: Each weight must be between 0 and 1 (no short selling, no over-leveraging)
bounds = tuple((0, 1) for i in range(len(log_returns.mean())))

# Define the target return for the portfolio optimization (e.g., aiming for 20% return)
target_return = 0.2

# Define the penalty weight to balance between minimizing risk and achieving the target return
penalty_weight = 0.05

In [26]:
# Optimize portfolio weights by minimizing the objective function (risk + penalty)
# Starting with equal weights, and considering the log returns mean, covariance, target return, and penalty weight
result = minimize(objective,
                  x0=np.ones(len(log_mu)) / len(log_mu), 
                  args=(log_mu, log_cov, target_return, penalty_weight), 
                  method='SLSQP',
                  bounds=bounds,
                  constraints=constraints
                  )

# Extract the optimized weights from the optimization result
optimized_weights = result.x

## Backtesting the Mean-Variance Optimized Portfolio

In [27]:
# Calculate the portfolio return using the portfolio_return function
mean_variance_portfolio = cumulative_returns(result.x, test)

print(f"The cumulative return for the optimized portfolio over the entire testing period : {mean_variance_portfolio[-1]*100:.2f}%")

The cumulative return for the optimized portfolio over the entire testing period : 17.59%


In [28]:
# Create traces
masi_trace = go.Scatter(x=benchmark_cumulative_returns.index, y=benchmark_cumulative_returns, mode='lines', name='MASI Index', line=dict(dash='dash', color='red'))
mean_variance_trace = go.Scatter(x=mean_variance_portfolio.index, y=mean_variance_portfolio, mode='lines', name='Mean-Variance Optimized Portfolio', line=dict(color='green'))

# Layout
layout = go.Layout(
    title='Mean-Variance Optimized Portfolio Performance',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Cumulative Returns'),
    legend=dict(x=0, y=1, traceorder='normal', font=dict(size=12))
)

# Create figure
fig = go.Figure(data=[masi_trace, mean_variance_trace], layout=layout)

# Show plot
fig.show()

The Mean-Variance Optimization (MVO) approach aims to maximize returns for a given level of risk by considering both expected returns and the covariance of asset returns.

During the initial months, the mean-variance optimized portfolio underperforms, showing a sharp decline in returns around May 2021, even dipping below -20%. The portfolio remains flat for a significant period after this drop, indicating that the optimization was unable to fully recover or provide substantial returns until late in the year.

Starting in November 2021, the optimized portfolio experiences a substantial increase, significantly outperforming the MASI Index. This suggests that, despite its earlier underperformance, the optimization eventually aligned the portfolio to capture a favorable upward trend in the market. However, the volatility and sharp drawdowns earlier in the period highlight the potential risk associated with this strategy. Ultimately, the MVO portfolio ends the period with higher cumulative returns than the MASI Index, but with higher volatility along the way.


In [29]:
# Create traces
masi_trace = go.Scatter(x=benchmark_cumulative_returns.index, y=maximum_drawdown(benchmark_cumulative_returns), mode='lines', name='MASI Index Drawdown', line=dict(dash='dash', color='red'))
mean_variance_trace = go.Scatter(x=mean_variance_portfolio.index, y=maximum_drawdown(mean_variance_portfolio), mode='lines', name='Mean-Variance Optimized Portfolio Drawdown', line=dict(color='purple'))

# Layout
layout = go.Layout(
    title='Mean-Variance Optimized Portfolio Drawdown Comparison',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Maximum Drawdown'),
    legend=dict(x=0, y=1, traceorder='normal', font=dict(size=11))
)

# Create figure
fig = go.Figure(data=[mean_variance_trace, masi_trace], layout=layout)

# Show plot
fig.show()

The Mean-Variance Optimized Portfolio experiences a severe drawdown around May 2021, with the drawdown reaching nearly 30%. This indicates that the portfolio suffered significant losses during this period and remained in a deep drawdown for several months, with minimal recovery until late in the year. In contrast, the MASI Index (in red) demonstrates a much smaller drawdown, consistently staying within a relatively tight range throughout the period.

The recovery of the Mean-Variance Optimized Portfolio is delayed until late 2021, where it begins to recover from the earlier drawdowns. Although it eventually improves and stabilizes by the end of the period, the prolonged drawdown highlights the higher volatility and risk associated with the mean-variance strategy during the first half of the testing period. This plot underscores the importance of risk management in mean-variance optimization, as the portfolio's steep early losses contrast with its eventual stabilization.

# K-Means Clustering Portfolio

Clustering methods can broadly be categorized into two types: hierarchical clustering and partitional clustering. 

- **Hierarchical clustering** either merges smaller clusters into larger ones or splits larger clusters into smaller subsets. This method is often used when the underlying structure of the data can be represented as a tree, and its results are typically visualized in a dendrogram.

- **Partitional clustering**, on the other hand, directly divides the data set into a set of disjoint clusters. The most commonly used method in this category is **K-means clustering**, which partitions the data by minimizing the variance within each cluster.

In this section, we apply K-means clustering to group assets based on their historical returns, aiming to construct a diversified portfolio from the resulting clusters. By clustering assets with similar return patterns, we can identify groups of stocks that exhibit comparable behaviors and select representative assets from each cluster to form the portfolio. This approach helps reduce risk by diversifying across different clusters, ensuring that the portfolio is not overly exposed to a single asset class or sector.

## Building the K-Means Clustering Portfolio

Before applying K-means clustering, it is important to standardize the asset returns data. Standardization ensures that all assets are on the same scale, which prevents the clustering algorithm from being biased towards assets with larger numerical values.

In [30]:
# Step 1: Standardize the data (returns), with assets as rows
train_returns = train.pct_change().dropna().T  # Transpose to have assets as rows and dates as columns

# Apply standardization to the returns data (z-score normalization)
scaler = StandardScaler()
scaled_returns = scaler.fit_transform(train_returns)

To construct an optimal portfolio from the clustered assets, we define an objective function that balances the number of clusters, the number of assets selected per cluster, and the portfolio's risk-return characteristics. The objective function aims to maximize the Sharpe Ratio by selecting the most appropriate assets from each cluster.

The process begins with KMeans clustering of the returns data. After clustering, we select a specified number of assets from each cluster, ensuring that the portfolio remains diversified. Finally, we optimize the weights of the selected assets to maximize the portfolio’s Sharpe Ratio, subject to constraints such as ensuring that all weights sum to 1 and that no individual asset weight exceeds predefined limits.

In [31]:
def objective_function(n_clusters, n_assets_per_cluster, returns, metric, risk_free_rate=0.02):
    """
    Objective function to maximize returns while balancing the number of clusters and assets per cluster.

    Parameters:
    - n_clusters: Number of clusters for KMeans.
    - n_assets_per_cluster: Number of assets to select from each cluster.
    - returns: Historical returns data of the assets.
    - risk_free_rate: Annual risk-free rate, default to 2%.
    - metric: The metric to use for clustering (e.g., 'euclidean', 'dtw', 'softdtw', 'correlation').

    Returns:
    - sharpe_ratio: Negative Sharpe ratio (to be minimized).
    """
    
    # Step 1: Perform KMeans clustering on the returns data
    kmeans = TimeSeriesKMeans(n_clusters=n_clusters, random_state=42, metric=metric)
    clusters = kmeans.fit_predict(returns)

    # Step 2: Select a fixed number of assets from each cluster
    selected_assets = []
    for cluster in range(n_clusters):
        # Get indices of assets in the current cluster
        cluster_assets = np.where(clusters == cluster)[0]
        
        # Select up to n_assets_per_cluster assets from the current cluster
        n_assets_in_cluster = min(n_assets_per_cluster, len(cluster_assets))
        selected_assets.extend(np.random.choice(cluster_assets, n_assets_in_cluster, replace=False))

    # Step 3: Subset the returns data to only the selected assets
    returns = returns[:, selected_assets].T  # Transpose to align with the optimization

    # Initialize weights equally across the selected assets
    initial_weights = np.full(len(selected_assets), 1 / len(selected_assets))
    
    # Define bounds and constraints for the optimization (weights must sum to 1)
    bounds = [(0.001, 1)] * len(selected_assets)
    constraints = {'type': 'eq', 'fun': lambda w: np.sum(w) - 1}
    
    # Optimize the Sharpe Ratio using the selected assets' returns
    result = minimize(sharpe_ratio, initial_weights, args=(returns.T, risk_free_rate),
                      bounds=bounds, method='SLSQP', constraints=constraints)
    
    # Return the negative Sharpe Ratio (for minimization purposes)
    return result.fun

In this section, we define a parameter grid and perform a grid search to identify the optimal combination of clusters and assets per cluster for the KMeans-based portfolio. By varying the number of clusters and the number of assets selected from each cluster, we aim to maximize the Sharpe Ratio.

The parameter grid includes a range of values for both the number of clusters and the number of assets per cluster. We evaluate each parameter combination by calculating the portfolio’s Sharpe Ratio. The goal is to identify the set of parameters that produces the highest risk-adjusted return. The best combination of parameters is determined based on the lowest negative Sharpe Ratio, which we will use to build the final portfolio.

In [32]:
# Define the parameter grid for the number of clusters and assets per cluster
param_grid = {
    'n_clusters': range(3, 10),  # Define a reasonable range for clusters
    'n_assets_per_cluster': range(3, 15)  # Define a range for assets per cluster
}

# Create a grid of all possible parameter combinations
grid = ParameterGrid(param_grid)

# Initialize variables to track the best Sharpe Ratio and corresponding parameters
best_sharpe_ratio = float('inf')
best_params = None

# Iterate through each combination of parameters in the grid
for params in grid:
    n_clusters = params['n_clusters']
    n_assets_per_cluster = params['n_assets_per_cluster']
    
    # Compute the negative Sharpe Ratio for the current parameter combination
    sharpe_ratio_value = objective_function(n_clusters, n_assets_per_cluster, scaled_returns, "euclidean")
    
    # Update the best Sharpe Ratio and parameters if a better combination is found
    if sharpe_ratio_value < best_sharpe_ratio:
        best_sharpe_ratio = sharpe_ratio_value
        best_params = params

# Print the best (negative) Sharpe Ratio and the corresponding optimal parameters
print(f"Best Sharpe Ratio: {-best_sharpe_ratio}")
print(f"Optimal Parameters: {best_params}")

Best Sharpe Ratio: -1.0381695126974114e-05
Optimal Parameters: {'n_assets_per_cluster': 3, 'n_clusters': 3}


Once the data has been standardized and hyperparameters have been defined, we proceed to apply KMeans clustering to group the assets based on their historical returns. Clustering helps us identify patterns in the data and group similar assets together, allowing for more informed asset selection in the portfolio construction process.

In [33]:
# Step 2: Apply KMeans Clustering to the scaled returns data
n_clusters = 3  # Set the number of clusters for KMeans
kmeans = TimeSeriesKMeans(n_clusters=n_clusters, metric="euclidean", random_state=42)

# Fit the KMeans model and predict cluster assignments for the assets
clusters = kmeans.fit_predict(scaled_returns)  # Transposed to cluster assets instead of time periods

In [34]:
# Create a DataFrame to associate each asset with its assigned cluster
cluster_df = pd.DataFrame({
    'Instrument': train_returns.index,  # Asset names
    'Cluster': clusters  # Cluster assignments
})

# Print the number of assets in each cluster
print(cluster_df.groupby('Cluster').count())

         Instrument
Cluster            
0                 1
1                67
2                 1


After clustering the assets, the next step is to select a specific number of assets from each cluster to include in the portfolio. By sampling assets from each cluster, we ensure that the portfolio remains diversified across different groups of assets with similar return patterns. 

In this step, we define how many assets to select from each cluster. If a cluster contains fewer assets than the specified number, we select all available assets. Otherwise, we randomly sample the required number of assets. This method allows us to build a portfolio that captures the diversity within each cluster while adhering to the desired portfolio size.

In [35]:
# Step 3: Select Assets from Each Cluster
n_assets_per_cluster = 4  # Number of assets to select from each cluster
selected_assets = []

for cluster in range(n_clusters):
    # Get the assets belonging to the current cluster
    cluster_assets = cluster_df[cluster_df['Cluster'] == cluster]['Instrument']
    
    # Get the number of assets in the current cluster
    n_assets_in_cluster = len(cluster_assets)
    
    # If the cluster has fewer assets than required, select all assets from the cluster
    if n_assets_per_cluster > n_assets_in_cluster:
        selected_assets.extend(cluster_assets.values)
    else:
        # Randomly select the specified number of assets from the cluster
        selected_assets.extend(cluster_assets.sample(n=n_assets_per_cluster, random_state=42).values)

print("Selected Assets:", selected_assets)

Selected Assets: ['Delta Holding', 'M2M Group', 'CIH', 'Agma', 'Auto Nejma', 'Alliances']


With the assets selected from each cluster, we now proceed to build and optimize the portfolio. We begin by calculating the daily returns for the selected assets and assign equal initial weights to each asset in the portfolio.

The next step is to optimize the portfolio to maximize the Sharpe Ratio, which measures the risk-adjusted returns of the portfolio. By applying constraints to ensure the portfolio remains fully invested (weights sum to 1) and setting bounds on the weights, we aim to find the optimal asset allocation that delivers the highest possible returns for the level of risk taken. This optimization process allows us to adjust the asset weights in a way that enhances the portfolio’s overall performance.

In [36]:
# Step 4: Create the Portfolio with Selected Assets
# Calculate daily returns for the selected assets and drop missing values
selected_returns = stocks[selected_assets].pct_change().dropna()

# Define initial equal weights for the selected assets
initial_weights = np.array([1 / len(selected_assets)] * len(selected_assets))

# Define bounds for the weights (each weight between 0.001 and 1) and a constraint that weights must sum to 1
bounds = [(0.001, 1)] * len(selected_assets)
constraints = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1})

# Optimize the portfolio for maximum Sharpe Ratio using the selected assets' returns
result = minimize(sharpe_ratio, initial_weights, args=(selected_returns, 0.02),
                  bounds=bounds, method='SLSQP', constraints=constraints)

# Extract the optimized portfolio weights
optimized_weights = result.x

# Print the optimized weights and the Sharpe Ratio of the optimized portfolio
print("Optimized Weights:", optimized_weights)
optimized_sharpe_ratio = -result.fun  # Convert back to positive since it was negated for optimization
print("Optimized Sharpe Ratio:", optimized_sharpe_ratio)

Optimized Weights: [0.04914818 0.3118435  0.02120566 0.61580265 0.001      0.001     ]
Optimized Sharpe Ratio: 0.003559630332449057


In [37]:
# Show the selected assets and their corresponding optimized weights
portfolio = pd.DataFrame({
    'Asset': selected_assets,  # List of selected assets
    'Weight': optimized_weights  # Corresponding optimized weights
})

# Print the portfolio composition (assets and their weights)
print("Portfolio Composition:\n", portfolio)

Portfolio Composition:
            Asset    Weight
0  Delta Holding  0.049148
1      M2M Group  0.311844
2            CIH  0.021206
3           Agma  0.615803
4     Auto Nejma  0.001000
5      Alliances  0.001000


## Backtesting the K-Means Clustering Portfolio

In [38]:
# Calculate the cumulative returns for the KMeans-optimized portfolio over the testing period
kmeans_portfolio_returns = cumulative_returns(optimized_weights, test[selected_assets])

print(f"The cumulative return for the KMeans Portfolio (Euclidean) over the entire testing period : {kmeans_portfolio_returns[-1] * 100:.2f}%")

The cumulative return for the KMeans Portfolio (Euclidean) over the entire testing period : 11.14%


In [39]:
# Create traces
masi_trace = go.Scatter(x=benchmark_cumulative_returns.index, y=benchmark_cumulative_returns, mode='lines', name='MASI Index', line=dict(dash='dash'))
kmeans_trace = go.Scatter(x=kmeans_portfolio_returns.index, y=kmeans_portfolio_returns, mode='lines', name='KMeans (Euclidean Metric) Portfolio', line=dict(color='green'))

# Layout
layout = go.Layout(
    title='KMeans (Euclidean Metric) Portfolio Performance',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Cumulative Returns'),
    legend=dict(x=0, y=1, traceorder='normal', font=dict(size=12))
)

# Create figure
fig = go.Figure(data=[kmeans_trace, masi_trace], layout=layout)

# Show plot
fig.show()

Throughout the testing period, the KMeans portfolio demonstrates fluctuating performance. Initially, it trails the MASI Index, showing minimal growth and even negative returns in the first few months. However, around mid-2021, the KMeans portfolio begins to gain traction, showing significant improvements in cumulative returns.

By late 2021, the KMeans portfolio overtakes the MASI Index, reflecting successful asset selection and weight optimization based on cluster characteristics. While the portfolio shows more volatility compared to the MASI Index, it ultimately yields a competitive performance. This indicates that the KMeans approach, particularly with the Euclidean metric, can effectively capture trends in asset returns and construct a portfolio that adapts to market dynamics.

In [40]:
# Create traces
kmeans_trace = go.Scatter(x=kmeans_portfolio_returns.index, y=maximum_drawdown(kmeans_portfolio_returns), mode='lines', name='KMeans (Euclidean Metric) Portfolio Drawdown', line=dict(color='orange'))
masi_trace = go.Scatter(x=benchmark_cumulative_returns.index, y=maximum_drawdown(benchmark_cumulative_returns), mode='lines', name='MASI Drawdown', line=dict(dash='dash', color='red'))

# Layout
layout = go.Layout(
    title='KMeans (Euclidean Metric) Portfolio Drawdown Comparison',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Maximum Drawdown'),
    legend=dict(x=0, y=1, traceorder='normal', font=dict(size=10))
)

# Create figure
fig = go.Figure(data=[kmeans_trace, masi_trace], layout=layout)

# Show plot
fig.show()

Throughout the testing period, the KMeans portfolio shows a more pronounced drawdown, particularly notable in May 2021, where it dips to nearly -2.5%. This indicates significant losses during that time, reflecting the challenges of asset selection and weight allocation inherent in the KMeans strategy. 

The MASI Index, while also experiencing drawdowns, maintains a relatively tighter range and demonstrates more stability compared to the KMeans portfolio. This suggests that the KMeans portfolio is more volatile, with larger fluctuations in value.

As the portfolio approaches the end of 2021, the drawdown levels for the KMeans portfolio stabilize, indicating an improved recovery from earlier losses. However, the higher drawdown levels throughout the testing period highlight the inherent risks associated with this clustering approach, suggesting that while KMeans can provide robust returns, it also exposes investors to greater volatility and potential losses during downturns.

# K-Means With Dynamic Time Warping (DTW) Portfolio

In this section, we extend our previous analysis by applying K-means clustering using the Dynamic Time Warping (DTW) distance metric. Unlike the Euclidean metric, DTW is particularly effective for time series data as it accounts for shifts and distortions in the temporal sequences of asset returns. This allows us to group assets based on their patterns of movement over time, rather than just their absolute return values.

By clustering assets with DTW, we aim to construct a portfolio that captures the underlying dynamics of asset prices more accurately. After performing the clustering, we will select representative assets from each cluster and optimize their weights to maximize the Sharpe Ratio, similar to our earlier approach. This methodology not only enhances diversification but also leverages the temporal relationships between assets, potentially improving overall portfolio performance.

To prepare the data for K-means clustering using Dynamic Time Warping, we first resample the training returns to a weekly frequency. By taking the mean of each week, we reduce noise and make the dataset more manageable, while still preserving the overall trends in asset returns.

In [41]:
# Resample the training returns to weekly frequency by taking the mean of each week
train_returns_resampled = train_returns.T.resample('W').mean().dropna()

# Standardize the weekly returns after resampling (z-score normalization)
scaled_returns = StandardScaler().fit_transform(train_returns_resampled.T)

In [42]:
# Define the parameter grid for the number of clusters and assets per cluster
param_grid = {
    'n_clusters': range(3, 10),  # Define a reasonable range for clusters
    'n_assets_per_cluster': range(3, 15)  # Define a range for assets per cluster
}

# Create a grid of all possible parameter combinations
grid = ParameterGrid(param_grid)

# Initialize variables to track the best Sharpe Ratio and corresponding parameters
best_sharpe_ratio = float('inf')  # We use infinity to ensure any found Sharpe Ratio will be smaller
best_params = None  # Placeholder for the best parameters

# Iterate through each combination of parameters in the grid
for params in grid:
    n_clusters = params['n_clusters']
    n_assets_per_cluster = params['n_assets_per_cluster']
    
    # Compute the negative Sharpe Ratio for the current parameter combination using Dynamic Time Warping (DTW) metric
    sharpe_ratio_value = objective_function(n_clusters, n_assets_per_cluster, scaled_returns, "dtw")
    
    # Update the best Sharpe Ratio and parameters if a better combination is found
    if sharpe_ratio_value < best_sharpe_ratio:
        best_sharpe_ratio = sharpe_ratio_value
        best_params = params

# Print the best Sharpe Ratio (converted back to positive) and the corresponding optimal parameters
print(f"Best Sharpe Ratio: {-best_sharpe_ratio}")
print(f"Optimal Parameters: {best_params}")

Best Sharpe Ratio: -1.4428154743835477e-05
Optimal Parameters: {'n_assets_per_cluster': 3, 'n_clusters': 4}


Once the data has been standardized and hyperparameters have been defined, we proceed to apply KMeans clustering using Dynamic Time Warping to group the assets based on their historical returns. Clustering with DTW allows us to capture the temporal relationships between assets, providing a more nuanced understanding of their return patterns.

In [43]:
# Step 2: Apply KMeans Clustering using the Dynamic Time Warping (DTW) metric
n_clusters = 4  # Set the number of clusters to use for KMeans
kmeans = TimeSeriesKMeans(n_clusters=n_clusters, metric="dtw", random_state=42)

# Fit the KMeans model to the scaled returns and assign each asset to a cluster
clusters = kmeans.fit_predict(scaled_returns)  # Transposed to cluster assets instead of time periods

In [44]:
# Create a DataFrame to associate each asset with its assigned cluster
cluster_df = pd.DataFrame({
    'Instrument': train_returns_resampled.columns,  # Asset names (columns of the resampled returns)
    'Cluster': clusters  # Cluster assignments from KMeans
})

# Print the count of assets in each cluster
print(cluster_df.groupby('Cluster').count())

         Instrument
Cluster            
0                 7
1                32
2                21
3                 9


In [45]:
# Step 3: Select Assets from Each Cluster
n_assets_per_cluster = 6  # Define the number of assets to select from each cluster
selected_assets = []

for cluster in range(n_clusters):
    # Get the assets belonging to the current cluster
    cluster_assets = cluster_df[cluster_df['Cluster'] == cluster]['Instrument']
    
    # Determine the number of assets available in the current cluster
    n_assets_in_cluster = len(cluster_assets)
    
    # If the cluster contains fewer assets than needed, select all assets from the cluster
    if n_assets_per_cluster > n_assets_in_cluster:
        selected_assets.extend(cluster_assets.values)
    else:
        # Randomly select the specified number of assets from the current cluster
        selected_assets.extend(cluster_assets.sample(n=n_assets_per_cluster, random_state=42).values)

# Print the final list of selected assets from all clusters
print("Selected Assets:", selected_assets)

Selected Assets: ['Addoha', 'Cartier Saada', 'Timar', 'INVOLYS', 'Risma', 'Med Paper', 'Ste Boissons', 'Jet Contractors', 'Saham Assurance', 'Lesieur Cristal', 'CMT', 'Colorado', 'AFMA', 'Rebab Company', 'Maroc Telecom', 'Afric Indus', 'COSUMAR', 'BCP', 'Sonasid', 'Delattre Lev', 'Res.Dar Saada', 'Alliances', 'STROC Indus', 'FENIE BROSSETTE']


In [46]:
# Step 4: Create the Portfolio with Selected Assets
# Calculate daily returns for the selected assets and drop missing values
selected_returns = stocks[selected_assets].pct_change().dropna()

# Define initial equal weights for the selected assets
initial_weights = np.array([1 / len(selected_assets)] * len(selected_assets))

# Define bounds for the weights (each weight between 0.001 and 1) and a constraint that the weights must sum to 1
bounds = [(0.001, 1)] * len(selected_assets)
constraints = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1})

# Optimize the portfolio for maximum Sharpe Ratio using the selected assets' returns
result = minimize(sharpe_ratio, initial_weights, args=(selected_returns, 0.02),
                  bounds=bounds, method='SLSQP', constraints=constraints)

# Extract the optimized portfolio weights from the optimization result
optimized_weights = result.x

# Print the optimized weights and the Sharpe Ratio of the optimized portfolio
print("Optimized Weights:", optimized_weights)
optimized_sharpe_ratio = -result.fun  # Convert back to positive since it was negated for optimization
print("Optimized Sharpe Ratio:", optimized_sharpe_ratio)

Optimized Weights: [0.001      0.05302953 0.001      0.06576776 0.001      0.03361686
 0.08175674 0.08231999 0.03664512 0.06612375 0.00501535 0.04300035
 0.22628596 0.001      0.001      0.07458222 0.001      0.001
 0.12366488 0.001      0.001      0.001      0.04758538 0.05060612]
Optimized Sharpe Ratio: 0.0035235493017717616


In [47]:
# Show the selected assets and their corresponding optimized weights
portfolio = pd.DataFrame({
    'Asset': selected_assets,  # List of selected assets
    'Weight': optimized_weights  # Corresponding optimized weights
})

# Print the portfolio composition (assets and their weights)
print("Portfolio Composition:\n", portfolio)

Portfolio Composition:
               Asset    Weight
0            Addoha  0.001000
1     Cartier Saada  0.053030
2             Timar  0.001000
3           INVOLYS  0.065768
4             Risma  0.001000
5         Med Paper  0.033617
6      Ste Boissons  0.081757
7   Jet Contractors  0.082320
8   Saham Assurance  0.036645
9   Lesieur Cristal  0.066124
10              CMT  0.005015
11         Colorado  0.043000
12             AFMA  0.226286
13    Rebab Company  0.001000
14    Maroc Telecom  0.001000
15      Afric Indus  0.074582
16          COSUMAR  0.001000
17              BCP  0.001000
18          Sonasid  0.123665
19     Delattre Lev  0.001000
20    Res.Dar Saada  0.001000
21        Alliances  0.001000
22      STROC Indus  0.047585
23  FENIE BROSSETTE  0.050606


## Backtesting the K-Means DTW Portfolio

In [48]:
# Calculate the cumulative returns for the KMeans-optimized portfolio over the testing period
kmeans_portfolio_returns = cumulative_returns(optimized_weights, test[selected_assets])

print(f"The cumulative return for the KMeans Portfolio (DTW Metric) over the entire testing period: {kmeans_portfolio_returns[-1] * 100:.2f}%")

The cumulative return for the KMeans Portfolio (DTW Metric) over the entire testing period: 45.00%


In [49]:
# Create traces
masi_trace = go.Scatter(x=benchmark_cumulative_returns.index, y=benchmark_cumulative_returns, mode='lines', name='MASI Index', line=dict(dash='dash'))
kmeans_trace = go.Scatter(x=kmeans_portfolio_returns.index, y=kmeans_portfolio_returns, mode='lines', name='KMeans (DTW) Portfolio', line=dict(color='green'))

# Layout
layout = go.Layout(
    title='K-Means (DTW) Portfolio Performance',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Cumulative Returns'),
    legend=dict(x=0, y=1, traceorder='normal', font=dict(size=12))
)

# Create figure
fig = go.Figure(data=[kmeans_trace, masi_trace], layout=layout)

# Show plot
fig.show()

During the initial months, the K-Means (DTW) Portfolio demonstrates a relatively slow start, exhibiting minimal growth. However, starting around mid-2021, it begins to show a marked improvement in returns, eventually surpassing the MASI Index. This suggests that the DTW clustering approach effectively identified and selected assets that performed well in the latter part of the testing period.

The portfolio's performance trajectory indicates that the DTW method may provide advantages over traditional metrics by focusing on the shape of the return patterns rather than their magnitude alone. As the K-Means (DTW) Portfolio continues to gain momentum towards the end of the period, it reflects a successful strategy for capturing market trends and achieving competitive returns compared to the MASI Index.

In [50]:
# Create traces
kmeans_trace = go.Scatter(x=kmeans_portfolio_returns.index, y=maximum_drawdown(kmeans_portfolio_returns), mode='lines', name='KMeans (DTW) Portfolio Drawdown', line=dict(color='orange'))
masi_trace = go.Scatter(x=benchmark_cumulative_returns.index, y=maximum_drawdown(benchmark_cumulative_returns), mode='lines', name='MASI Drawdown', line=dict(dash='dash', color='red'))

# Layout
layout = go.Layout(
    title='K-Means (DTW) Portfolio Drawdown Comparison',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Maximum Drawdown'),
    legend=dict(x=0, y=1, traceorder='normal', font=dict(size=11))
)

# Create figure
fig = go.Figure(data=[kmeans_trace, masi_trace], layout=layout)

# Show plot
fig.show()

In early 2021, the K-Means (DTW) Portfolio experiences significant drawdowns, reaching nearly -20% during the most volatile period. This indicates a challenging market environment for the portfolio, likely reflecting the complexities involved in clustering assets using DTW and the inherent risks of the selected investments. 

In contrast, the MASI Index exhibits a more stable performance, with a smaller drawdown throughout the testing period. This suggests that while the K-Means (DTW) Portfolio aimed for higher potential returns, it also exposed investors to greater volatility and risk.

However, after the initial volatility, the K-Means (DTW) Portfolio stabilizes, maintaining a relatively flat drawdown level, which indicates a recovery and more consistent performance in the latter part of the period. This plot highlights the importance of managing drawdowns when employing clustering techniques in portfolio construction, as the approach can lead to both substantial risks and opportunities for recovery.

# Autoencoder + KMeans Portfolio Building

In this section, we combine the strengths of autoencoders and KMeans clustering to construct a robust portfolio. Autoencoders, a type of neural network, are utilized to compress the high-dimensional asset return data into a lower-dimensional latent space. This compression allows us to capture the essential features and patterns in the data while reducing noise and computational complexity.

Once the data is compressed, we apply KMeans clustering to the latent representations. This approach enables us to group similar assets based on their return patterns, facilitating more informed asset selection. By identifying clusters of assets that exhibit similar behaviors, we aim to create a diversified portfolio that minimizes risk while maximizing potential returns.

After clustering, we will select representative assets from each cluster and optimize their weights using methods such as Sharpe Ratio maximization. This integrated approach leverages both machine learning and traditional portfolio theory, allowing us to harness the power of data-driven techniques to enhance portfolio performance.

## Building the Autoencoder + KMeans Portfolio

In [51]:
# Define the size of the input layer (equal to the number of assets/features)
input_dim = train_returns.shape[1]  # Number of assets

# Define the size of the latent space (compressed dimension)
latent_dim = 10  # You can adjust this based on experimentation

# Define the encoder part of the autoencoder
input_layer = Input(shape=(input_dim,))  # Input layer with size equal to the number of assets
encoder = Dense(64, activation='relu')(input_layer)  # First hidden layer with 64 units and ReLU activation
encoder = Dense(32, activation='relu')(encoder)  # Second hidden layer with 32 units and ReLU activation
latent_space = Dense(latent_dim, activation='relu')(encoder)  # Latent space with reduced dimensions

# Define the decoder part of the autoencoder
decoder = Dense(32, activation='relu')(latent_space)  # Decoder layer with 32 units and ReLU activation
decoder = Dense(64, activation='relu')(decoder)  # Another decoder layer with 64 units and ReLU activation
output_layer = Dense(input_dim, activation='sigmoid')(decoder)  # Output layer with sigmoid activation

# Combine encoder and decoder into a complete autoencoder model
autoencoder = Model(input_layer, output_layer)

# Compile the model with Adam optimizer and Mean Squared Error loss function
autoencoder.compile(optimizer='adam', loss='mse')

# Print the model summary to show the architecture
autoencoder.summary()

With the autoencoder architecture defined, we now train the model on the asset return data for both the input and output, aiming to reconstruct the input.

In [52]:
# Convert the train_returns DataFrame to a numpy array for training the autoencoder
X_train = train_returns.values

# Train the autoencoder on the training data (X_train) with 50 epochs and a batch size of 32
autoencoder.fit(X_train, X_train, epochs=50, batch_size=32, validation_split=0.2)

Epoch 1/50
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 118ms/step - loss: 0.2502 - val_loss: 0.2499
Epoch 2/50
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 23ms/step - loss: 0.2497 - val_loss: 0.2493
Epoch 3/50
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 21ms/step - loss: 0.2491 - val_loss: 0.2487
Epoch 4/50
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 22ms/step - loss: 0.2484 - val_loss: 0.2478
Epoch 5/50
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 22ms/step - loss: 0.2475 - val_loss: 0.2467
Epoch 6/50
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 22ms/step - loss: 0.2463 - val_loss: 0.2453
Epoch 7/50
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 22ms/step - loss: 0.2447 - val_loss: 0.2434
Epoch 8/50
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 22ms/step - loss: 0.2426 - val_loss: 0.2409
Epoch 9/50
[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m 

<keras.src.callbacks.history.History at 0x3339a71a0>

After training the autoencoder, we define an encoder model specifically to extract the latent features from the input data. This encoder model takes the original input layer and outputs the latent space, which contains the compressed representations of the asset returns.

We then generate the compressed data by predicting the latent features using the trained encoder model on the training data. These compressed representations will be used for the subsequent KMeans clustering, enabling us to capture the essential patterns in asset behavior.

In [53]:
# Define an encoder model to extract latent features
encoder_model = Model(inputs=input_layer, outputs=latent_space)

# Generate compressed representations (latent space) for clustering
compressed_data = encoder_model.predict(X_train)

[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 15ms/step


In [100]:
# Perform KMeans clustering on the compressed data (latent space from the autoencoder)
n_clusters = 4  # Set the number of clusters
kmeans = KMeans(n_clusters=n_clusters, random_state=42)
clusters = kmeans.fit_predict(compressed_data)  # Perform clustering on the latent space

# Create a DataFrame associating each asset with its assigned cluster
cluster_df = pd.DataFrame({
    'Instrument': train_returns.index,  # Asset names from the original data
    'Cluster': clusters  # Cluster assignments from KMeans
})

# Display the count of assets in each cluster
print(cluster_df.groupby('Cluster').count())

         Instrument
Cluster            
0                34
1                21
2                 4
3                10


In [121]:
# Step 3: Select Assets from Each Cluster
n_assets_per_cluster = 6  # Number of assets to select from each cluster
selected_assets = []

for cluster in range(n_clusters):
    # Get the assets belonging to the current cluster
    cluster_assets = cluster_df[cluster_df['Cluster'] == cluster]['Instrument']
    
    # Determine the number of assets available in the current cluster
    n_assets_in_cluster = len(cluster_assets)
    
    # If the cluster contains fewer assets than needed, select all assets from the cluster
    if n_assets_per_cluster > n_assets_in_cluster:
        selected_assets.extend(cluster_assets.values)
    else:
        # Randomly select the specified number of assets from the cluster
        selected_assets.extend(cluster_assets.sample(n=n_assets_per_cluster, random_state=42).values)

# Print the final list of selected assets from all clusters
print("Selected Assets:", selected_assets)

Selected Assets: ['FENIE BROSSETTE', 'LafargeHolcim', 'S2M', 'Risma', 'Colorado', 'Med Paper', 'AFMA', 'Stokvis Nord Afr', 'SALAFIN', 'Afric Indus', 'EQDOM', 'BCP', 'Addoha', 'Alliances', 'IBMaroc', 'Res.Dar Saada', 'Unimer', 'BALIMA', 'Rebab Company', 'Auto Nejma', 'Timar', 'Central.Danone']


Once a specified number of assets are selected from each cluster, we proceed to optimize the portfolio weights to maximize the Sharpe Ratio. By adjusting the asset allocations based on the latent representations and cluster characteristics, we aim to construct a portfolio that balances risk and return effectively.

In [122]:
# Step 4: Create the Portfolio with Selected Assets
# Calculate daily returns for the selected assets and drop missing values
selected_returns = stocks[selected_assets].pct_change().dropna()

# Define initial equal weights for the selected assets
initial_weights = np.array([1 / len(selected_assets)] * len(selected_assets))

# Define bounds for the weights (each weight between 0.001 and 1) and a constraint that the weights must sum to 1
bounds = [(0.001, 1)] * len(selected_assets)
constraints = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1})

# Optimize the portfolio for maximum Sharpe Ratio using the selected assets' returns
result = minimize(sharpe_ratio, initial_weights, args=(selected_returns, 0.02),
                  bounds=bounds, method='SLSQP', constraints=constraints)

# Extract the optimized portfolio weights from the optimization result
optimized_weights = result.x

# Print the optimized weights and the Sharpe Ratio of the optimized portfolio
print("Optimized Weights:", optimized_weights)
optimized_sharpe_ratio = -result.fun  # Convert back to positive since it was negated for optimization
print("Optimized Sharpe Ratio:", optimized_sharpe_ratio)

Optimized Weights: [0.07513009 0.001      0.001      0.06201394 0.0975145  0.05906878
 0.30324808 0.001      0.00580149 0.10025831 0.19785838 0.001
 0.001      0.001      0.001      0.001      0.001      0.08610643
 0.001      0.001      0.001      0.001     ]
Optimized Sharpe Ratio: 0.0026737551535977538


In [123]:
# Show the selected assets and their corresponding optimized weights in a DataFrame
portfolio = pd.DataFrame({
    'Asset': selected_assets,  # List of selected assets
    'Weight': optimized_weights  # Corresponding optimized weights
})

# Print the portfolio composition (assets and their weights)
print("Portfolio Composition:\n", portfolio)

Portfolio Composition:
                Asset    Weight
0    FENIE BROSSETTE  0.075130
1      LafargeHolcim  0.001000
2                S2M  0.001000
3              Risma  0.062014
4           Colorado  0.097514
5          Med Paper  0.059069
6               AFMA  0.303248
7   Stokvis Nord Afr  0.001000
8            SALAFIN  0.005801
9        Afric Indus  0.100258
10             EQDOM  0.197858
11               BCP  0.001000
12            Addoha  0.001000
13         Alliances  0.001000
14           IBMaroc  0.001000
15     Res.Dar Saada  0.001000
16            Unimer  0.001000
17            BALIMA  0.086106
18     Rebab Company  0.001000
19        Auto Nejma  0.001000
20             Timar  0.001000
21    Central.Danone  0.001000


## Backtesting the Autoencoder + KMeans Portfolio

In [124]:
# Calculate the cumulative returns for the autoencoder-optimized portfolio over the testing period
autoencoder_portfolio_returns = cumulative_returns(optimized_weights, test[selected_assets])

print(f"The cumulative return for the Autoencoder Portfolio over the entire testing period : {autoencoder_portfolio_returns[-1]*100:.2f}%")

The cumulative return for the Autoencoder Portfolio over the entire testing period : 40.22%


In [125]:
# Create traces
masi_trace = go.Scatter(x=benchmark_cumulative_returns.index, y=benchmark_cumulative_returns, mode='lines', name='MASI Index', line=dict(dash='dash'))
autoencoder_trace = go.Scatter(x=autoencoder_portfolio_returns.index, y=autoencoder_portfolio_returns, mode='lines', name='Autoencoder Clustered Portfolio', line=dict(color='green'))

# Layout
layout = go.Layout(
    title='Autoencoder Portfolio Performance',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Cumulative Returns'),
    legend=dict(x=0, y=1, traceorder='normal', font=dict(size=12))
)


# Create figure
fig = go.Figure(data=[autoencoder_trace, masi_trace], layout=layout)

# Show plot
fig.show()

Initially, the Autoencoder Clustered Portfolio exhibits modest growth, trailing the MASI Index during the early months. However, as the market dynamics evolve, the portfolio starts to gain momentum, showcasing a notable increase in cumulative returns, particularly from mid-2021 onwards.

By the end of the testing period, the Autoencoder Clustered Portfolio has surpassed the MASI Index, reflecting successful asset selection and weight optimization based on clustered features. This performance suggests that utilizing an autoencoder to inform clustering can lead to improved portfolio outcomes by effectively identifying assets that share similar return characteristics.

In [60]:
# Create traces
masi_trace = go.Scatter(x=benchmark_cumulative_returns.index, y=maximum_drawdown(benchmark_cumulative_returns), mode='lines', name='MASI Index Drawdown', line=dict(dash='dash', color='red'))
autoencoder_trace = go.Scatter(x=autoencoder_portfolio_returns.index, y=maximum_drawdown(autoencoder_portfolio_returns), mode='lines', name='Autoencoder Portfolio Drawdown', line=dict(color='green'))

# Layout
layout = go.Layout(
    title='Autoencoder Portfolio Drawdown Comparison',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Maximum Drawdown'),
    legend=dict(x=0, y=1, traceorder='normal', font=dict(size=11))
)

# Create figure
fig = go.Figure(data=[autoencoder_trace, masi_trace], layout=layout)

# Show plot
fig.show()

The Autoencoder Portfolio shows a maximum drawdown of approximately -10% during the early months of 2021, indicating a challenging market environment. This initial dip suggests that the portfolio struggled with asset selection during this period, reflecting the inherent volatility associated with new strategies.

In contrast, the MASI Index displays a more stable drawdown profile, with shallower fluctuations, indicating a less volatile investment environment. Notably, after the initial drawdown, the Autoencoder Portfolio stabilizes, maintaining a relatively flat drawdown level, which suggests improved resilience and recovery from earlier losses.

Overall, while the Autoencoder Portfolio experienced a significant drawdown initially, it demonstrates a capacity for stabilization and recovery, highlighting the importance of risk management and asset selection in enhancing portfolio performance.

# Conclusion

In this analysis, we explored various portfolio optimization strategies to assess their performance against the MASI Index. We began by implementing traditional methods, including the equal-weighted and market-cap weighted portfolios, and progressed to more advanced techniques such as Sharpe Ratio optimization and Mean-Variance optimization. 

Subsequently, we incorporated machine learning approaches by applying KMeans clustering with both Euclidean and Dynamic Time Warping (DTW) metrics, as well as leveraging autoencoders for dimensionality reduction prior to clustering. 

Throughout our analysis, the performance of the portfolios varied, with the autoencoder and KMeans approaches ultimately yielding competitive returns. The ability of these methods to capture patterns in asset behavior allowed for enhanced portfolio construction, demonstrating that integrating machine learning techniques can significantly improve investment strategies.

The following plot visually compares the cumulative returns of all the portfolio strategies employed, providing a comprehensive overview of their effectiveness in achieving risk-adjusted returns over the testing period :

In [126]:
# Create traces
marketcap_trace = go.Scatter(x=portfolio_returns_market_cap_weighted.index, y=portfolio_returns_market_cap_weighted, mode='lines', name='Market-Cap Portfolio')
masi_trace = go.Scatter(x=benchmark_cumulative_returns.index, y=benchmark_cumulative_returns, mode='lines', name='MASI Index', line=dict(dash='dash'))
portfolio_trace = go.Scatter(x=portfolio_returns_equal_weighted.index, y=portfolio_returns_equal_weighted, mode='lines', name='Equal-Weighted Portfolio')
optimized_trace = go.Scatter(x=portfolio_returns_optimized.index, y=portfolio_returns_optimized, mode='lines', name='Sharpe Ratio Optimized Portfolio')
autoencoder_trace = go.Scatter(x=autoencoder_portfolio_returns.index, y=autoencoder_portfolio_returns, mode='lines', name='Autoencoder Portfolio', line=dict(color='green'))
kmeans_trace = go.Scatter(x=kmeans_portfolio_returns.index, y=kmeans_portfolio_returns, mode='lines', name='KMeans (DTW) Portfolio', line=dict(color='orange'))
mean_variance_trace = go.Scatter(x=mean_variance_portfolio.index, y=mean_variance_portfolio, mode='lines', name='Mean-Variance Optimized Portfolio', line=dict(color='purple'))

# Layout
layout = go.Layout(
    title='Portfolio Strategies Comparison',
    xaxis=dict(title='Date'),
    yaxis=dict(title='Cumulative Returns'),
    legend=dict(x=0, y=1, traceorder='normal', font=dict(size=10))
)


# Create figure
fig = go.Figure(data=[autoencoder_trace, masi_trace, portfolio_trace, optimized_trace, kmeans_trace, mean_variance_trace, marketcap_trace], layout=layout)

# Show plot
fig.show()

+ The **Autoencoder Portfolio** emerges as a standout performer, demonstrating consistent growth throughout the period and outperforming all other strategies by a notable margin, particularly towards the end of 2021. This suggests that the combination of autoencoder and clustering techniques effectively captured advantageous patterns in asset returns.

+ The **KMeans (DTW) Portfolio** also performs well, reflecting a successful optimization strategy focused on maximizing risk-adjusted returns. It maintains a trajectory similar to that of the Autoencoder Portfolio, indicating the effectiveness of this traditional method.

+ In contrast, the **Mean-Variance Optimized Portfolio** and the **Market-Cap Portfolio** display more volatility and lesser cumulative returns, particularly the Market-Cap Portfolio, which struggles to keep pace with the others. 

+ The **Sharpe Ratio Optimized Portfolio** performs moderately, showcasing potential but not achieving the same level of success as the Autoencoder and Sharpe Ratio optimized approaches.

+ Finally, the **Equal-Weighted Portfolio** remains relatively flat, highlighting the limitations of this simple strategy in a dynamic market environment.


Overall, the comparison illustrates that advanced clustering techniques, such as those employed with autoencoders and KMeans, can enhance portfolio performance, providing investors with valuable insights into asset behavior and more effective strategies for maximizing returns.

# End :)