# **Project Description**

This notebook implements a quantitative finance pipeline that constructs a **Maximum Sharpe Ratio Portfolio** for each of four countries (UK, France, USA, Australia), using a **Hybrid Volatility Forecasting Model** applied to baskets of predominantly financial-sector stocks.

The core strategy is to move beyond historical risk assessment by combining traditional econometrics with modern deep learning to predict future asset risk (variance), which is then used as the primary input for portfolio optimization.

### **Key Methodology**

1.  **Hybrid Risk Forecasting:**
    * **GARCH(1,1):** Fitted to each asset's daily log returns to estimate the historical conditional variance (squared volatility). This GARCH variance serves as the **risk signal (target variable)** for the neural network; it is a model-based estimate rather than an observed quantity.
    * **BiLSTM Model:** A Bidirectional Long Short-Term Memory network is trained on 20-day windows of historical log returns to forecast the **next-day GARCH conditional variance**. This architecture is chosen for its ability to capture non-linear, time-dependent patterns in financial time series.

2.  **Portfolio Optimization:**
    * **Predictive Covariance Matrix:** The BiLSTM's variance forecasts are averaged over the test period, converted to standard deviations, and combined with the historical correlation matrix to construct a forward-looking, annualized covariance matrix.
    * **Maximum Sharpe Ratio:** The portfolio weights are optimized with the **SLSQP algorithm** by **minimizing the negative Sharpe Ratio**, which is equivalent to maximizing the Sharpe Ratio.
    * **Diversification Constraints:** Each asset weight is bounded between **1% and 40%** ($0.01 \le w_i \le 0.40$) and the weights must sum to 1, so that no single position dominates the allocation.
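
The predictive covariance step described above can be sketched as follows. This is a minimal illustration, assuming three assets with made-up daily variance forecasts and correlations; the helper name `build_annualized_covariance` is hypothetical and not part of the notebook. It applies $\Sigma = D R D \times 252$, where $D$ is the diagonal matrix of daily standard deviations and $R$ is the correlation matrix.

```python
import numpy as np

def build_annualized_covariance(daily_variances, correlation, periods_per_year=252):
    """Combine forecast daily variances with a correlation matrix (hypothetical helper)."""
    sigma = np.sqrt(np.asarray(daily_variances, dtype=float))  # daily standard deviations
    D = np.diag(sigma)
    # D @ R @ D gives the daily covariance; multiply by trading days to annualize.
    return D @ np.asarray(correlation, dtype=float) @ D * periods_per_year

# Illustrative inputs only (not taken from the notebook's run).
daily_var = [1.0e-4, 2.25e-4, 4.0e-4]  # daily sigma of 1%, 1.5%, 2%
corr = np.array([[1.0, 0.5, 0.2],
                 [0.5, 1.0, 0.3],
                 [0.2, 0.3, 1.0]])

cov = build_annualized_covariance(daily_var, corr)
annual_vol = np.sqrt(np.diag(cov))  # per-asset annualized volatility
```

Because the correlations are only rescaled by the forecast standard deviations, the diagonal of the result equals the annualized forecast variances, while the off-diagonal structure still comes entirely from historical data.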

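The weight bounds also imply a minimum basket size, which is worth checking before running the optimizer. With every weight in $[0.01, 0.40]$ and the weights summing to 1, a feasible allocation exists only if $n \cdot 0.01 \le 1 \le n \cdot 0.40$, which requires at least three assets. The sketch below assumes a hypothetical helper, `bounds_are_feasible`, that is not part of the notebook.

```python
def bounds_are_feasible(num_assets, min_weight=0.01, max_weight=0.40):
    """Return True if weights in [min_weight, max_weight] can sum to exactly 1."""
    if num_assets <= 0:
        return False
    # The smallest possible total is n * min_weight, the largest is n * max_weight.
    return num_assets * min_weight <= 1.0 <= num_assets * max_weight

# Basket sizes from 1 to 7 assets for which a fully invested portfolio exists.
feasible_counts = [n for n in range(1, 8) if bounds_are_feasible(n)]
```

Each country here starts with seven tickers, so the bounds are feasible. The notebook's own skip condition (`N_FEATURES_COUNTRY < 2`) would still let a two-asset basket through, in which case the weights could sum to at most 0.80 and SLSQP could not satisfy the constraints.
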
### **Conclusion**


The final comparison identifies the country portfolio with the highest predicted risk-adjusted return (Sharpe Ratio) under the hybrid model's volatility forecasts, as a starting point for capital-allocation decisions.

**WARNING**:

This notebook does not constitute financial advice.

In [1]:
pip install arch

Collecting arch
  Downloading arch-8.0.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (13 kB)
Downloading arch-8.0.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (981 kB)
Installing collected packages: arch
Successfully installed arch-8.0.0


In [7]:
import yfinance as yf
import pandas as pd
import numpy as np
import tensorflow as tf
from arch import arch_model
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Bidirectional, LSTM, Dropout, BatchNormalization
from scipy.optimize import minimize
import warnings


warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
tf.get_logger().setLevel('ERROR')

In [8]:


START_DATE = "2010-01-01"
END_DATE = "2024-12-31"
LOOKBACK = 20
FORECAST_HORIZON = 1
RISK_FREE_RATE = 0.02
ANNUALIZATION_FACTOR = 252


TICKERS_BY_COUNTRY = {


    "UK": ["HSBA.L", "LLOY.L", "BARC.L", "NWG.L", "STAN.L", "AV.L", "LGEN.L"],

    "France": ["BNP.PA", "GLE.PA", "ACA.PA", "CS.PA", "ML.PA", "SGO.PA", "CA.PA"],


    "USA": ["JPM", "BAC", "WFC", "GS", "MS", "AXP", "C"],


    "Australia": ["CBA.AX", "NAB.AX", "WBC.AX", "ANZ.AX", "MQG.AX", "SUN.AX", "AMP.AX"],
}

ALL_TICKERS = [ticker for country in TICKERS_BY_COUNTRY.values() for ticker in country]



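def calculate_garch_volatility(series):
    """Fit GARCH(1,1) to daily log returns and return the conditional variance."""
    returns = np.log(series / series.shift(1)).dropna()

    # Returns are scaled to percent for numerical stability during fitting;
    # rescale=False stops arch from applying its own scaling on top.
    am = arch_model(returns * 100, vol='Garch', p=1, q=1, rescale=False)
    try:
        res = am.fit(update_freq=0, disp='off')

        # Undo the percent scaling: variance scales with the square of the factor.
        conditional_variance = (res.conditional_volatility**2) / (100**2)
        return pd.Series(conditional_variance, index=returns.index)
    except Exception:
        # If the fit fails, return NaNs so the caller can drop this ticker.
        return pd.Series(np.nan, index=returns.index)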

def portfolio_volatility(weights, cov_matrix):

    port_variance = weights.T @ cov_matrix @ weights

    epsilon = 1e-10
    return np.sqrt(max(0, port_variance) + epsilon)

def portfolio_return(weights, expected_returns):

    return weights.T @ expected_returns

def negative_sharpe_ratio(weights, expected_returns, cov_matrix, risk_free_rate):

    port_return = portfolio_return(weights, expected_returns)
    port_volatility = portfolio_volatility(weights, cov_matrix)

    if port_volatility < 1e-8:
        return 1e9


    sharpe_ratio = (port_return - risk_free_rate) / port_volatility
    return -sharpe_ratio



def build_bilstm_model(lookback, n_features):

    input_tensor = Input(shape=(lookback, n_features), name='Input_Layer')

    x = Bidirectional(LSTM(units=128, return_sequences=True, name='BiLSTM_1',
                           kernel_regularizer=tf.keras.regularizers.l2(0.001)))(input_tensor)
    x = BatchNormalization(name='BatchNorm_1')(x)
    x = Dropout(0.3, name='Dropout_1')(x)

    x = Bidirectional(LSTM(units=64, return_sequences=False, name='BiLSTM_2',
                           kernel_regularizer=tf.keras.regularizers.l2(0.001)))(x)
    x = BatchNormalization(name='BatchNorm_2')(x)
    x = Dropout(0.3, name='Dropout_2')(x)

    output_tensor = Dense(units=n_features, activation='linear', name='Output_Dense')(x)

    model = Model(inputs=input_tensor, outputs=output_tensor)
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
    model.compile(optimizer=optimizer, loss='mse', metrics=['mae', 'mse'])

    return model



def preprocess_and_sequence(data, volatility_data):


    features = np.log(data / data.shift(1)).dropna()
    targets = volatility_data.loc[features.index]
    features = features.loc[targets.index]

    if features.empty or targets.empty:
        # Return a single None so the caller's `results is None` check works;
        # a tuple of Nones would fail on `results[0].shape`.
        return None

    N_FEATURES = features.shape[1]
    TICKERS = features.columns.tolist()


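    # Note: both scalers are fit on the full sample, including the test period,
    # which introduces look-ahead. Fit on the training rows only for a strict
    # out-of-sample evaluation.
    feature_scaler = MinMaxScaler(feature_range=(0, 1))
    scaled_features = feature_scaler.fit_transform(features)
    target_scaler = MinMaxScaler(feature_range=(0, 1))
    scaled_targets = target_scaler.fit_transform(targets)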


    def create_sequences(features_arr, targets_arr, lookback, horizon):
        X, Y = [], []
        for i in range(len(features_arr) - lookback - horizon + 1):
            X.append(features_arr[i:(i + lookback), :])
            Y.append(targets_arr[i + lookback + horizon - 1, :])
        return np.array(X), np.array(Y)

    X, Y = create_sequences(scaled_features, scaled_targets, LOOKBACK, FORECAST_HORIZON)
    train_size = int(len(X) * 0.8)
    X_train, X_test = X[:train_size], X[train_size:]
    Y_train, Y_test = Y[:train_size], Y[train_size:]


    test_index = features.index[train_size + LOOKBACK + FORECAST_HORIZON - 1:]

    return (X_train, Y_train, X_test, Y_test,
            features, target_scaler, test_index, TICKERS, N_FEATURES)



def optimize_portfolio(expected_returns, cov_matrix, tickers, risk_free_rate):

    num_assets = len(tickers)
    initial_weights = np.array([1/num_assets] * num_assets)


    MIN_WEIGHT = 0.01
    MAX_WEIGHT = 0.40


    bounds = tuple([(MIN_WEIGHT, MAX_WEIGHT)] * num_assets)

    constraints = ({'type': 'eq', 'fun': lambda weights: np.sum(weights) - 1})

    optimal_results = minimize(
        negative_sharpe_ratio,
        initial_weights,
        args=(expected_returns, cov_matrix, risk_free_rate),
        method='SLSQP',
        bounds=bounds,
        constraints=constraints
    )

    if optimal_results.success:
        optimal_weights = optimal_results.x
        final_return = portfolio_return(optimal_weights, expected_returns)
        final_volatility = portfolio_volatility(optimal_weights, cov_matrix)
        final_sharpe = (final_return - risk_free_rate) / final_volatility


        weights_df = pd.Series(optimal_weights, index=tickers)

        top_weights_str = weights_df.sort_values(ascending=False).head(5).to_string(float_format='%.2f')
    else:

        final_return, final_volatility, final_sharpe = np.nan, np.nan, np.nan
        top_weights_str = f"Optimization failed: {optimal_results.message}"

    return {
        'Return': final_return,
        'Volatility': final_volatility,
        'Sharpe': final_sharpe,
        'Weights': top_weights_str,
        'Status': optimal_results.message
    }


def main_analysis():
    print("--- STARTING MULTI-COUNTRY QUANTITATIVE ANALYSIS (DIVERSIFIED) ---\n")
    print(f"1. Downloading and processing data from {START_DATE} to {END_DATE}...")


    all_tickers_list = list(set(ALL_TICKERS))
    data = yf.download(all_tickers_list, start=START_DATE, end=END_DATE, progress=False)['Close']
    data = data.ffill().bfill()

    all_results = {}
    best_sharpe = -np.inf
    best_country = None

    for country, tickers in TICKERS_BY_COUNTRY.items():
        print(f"\n==================== PROCESSING: {country} ({len(tickers)} assets) ====================\n")


        country_data = data[[t for t in tickers if t in data.columns]].dropna(axis=1)
        country_tickers = country_data.columns.tolist()
        N_FEATURES_COUNTRY = len(country_tickers)

        if N_FEATURES_COUNTRY < 2:
            print(f"Skipping {country}: Not enough valid tickers ({N_FEATURES_COUNTRY}).")
            continue


        volatility_data = pd.DataFrame(index=country_data.index)
        for ticker in country_tickers:
            volatility_data[ticker] = calculate_garch_volatility(country_data[ticker])


        volatility_data = volatility_data.ffill().bfill().dropna(axis=1)
        country_tickers = volatility_data.columns.tolist()
        country_data = country_data[country_tickers]
        N_FEATURES_COUNTRY = len(country_tickers)

        if N_FEATURES_COUNTRY < 2:
            print(f"Skipping {country}: Not enough valid tickers after GARCH filtering ({N_FEATURES_COUNTRY}).")
            continue


        results = preprocess_and_sequence(country_data, volatility_data)
        if results is None or results[0].shape[0] == 0:
            print(f"Skipping {country}: Data sequencing failed.")
            continue

        X_train, Y_train, X_test, Y_test, features, target_scaler, test_index, TICKERS_LIST, N_FEATURES_FINAL = results

        print(f"Data prepared. Final Tickers: {N_FEATURES_FINAL}. Test set size: {X_test.shape[0]}.")


        tf.keras.backend.clear_session()
        bilstm_model = build_bilstm_model(LOOKBACK, N_FEATURES_FINAL)

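        # Note: the test set doubles as the validation set for early stopping,
        # so the test metrics reported below are optimistic.
        history = bilstm_model.fit(
            X_train, Y_train, epochs=30, batch_size=32,
            validation_data=(X_test, Y_test), verbose=0,
            callbacks=[tf.keras.callbacks.EarlyStopping(patience=5, monitor='val_loss', restore_best_weights=True)]
        )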
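        # evaluate() returns [loss, mae, mse]; the loss includes the L2 penalty,
        # so the RMSE is taken from the plain mse metric instead.
        _, _, mse = bilstm_model.evaluate(X_test, Y_test, verbose=0)
        rmse = np.sqrt(mse)
        # This RMSE is measured on the MinMax-scaled variance targets, not in raw variance units.
        print(f"-> BiLSTM RMSE (Volatility Forecast Accuracy): {rmse:.6f}")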


        Y_pred_scaled = bilstm_model.predict(X_test, verbose=0)
        Y_pred_denorm = target_scaler.inverse_transform(Y_pred_scaled)


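        # Floor the predicted variances at a small positive value so the square root below is well defined.
        Y_pred_denorm = np.maximum(Y_pred_denorm, 1e-9)

        # Despite the name, this holds the mean predicted daily *variance* per asset over the test period.
        predicted_volatilities_mean = np.mean(Y_pred_denorm, axis=0)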


        historical_returns = features.loc[test_index]
        expected_returns_annual = historical_returns.mean().values * ANNUALIZATION_FACTOR


        correlation_matrix = historical_returns.corr().values
        correlation_matrix[np.isnan(correlation_matrix)] = 0


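        # Convert mean daily variances to daily standard deviations.
        predicted_sigma = np.sqrt(predicted_volatilities_mean)
        D = np.diag(predicted_sigma)

        # Covariance = D @ R @ D, scaled from daily to annual by the number of trading days.
        predicted_covariance_matrix = D @ correlation_matrix @ D * ANNUALIZATION_FACTOR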


        metrics = optimize_portfolio(
            expected_returns_annual, predicted_covariance_matrix, TICKERS_LIST, RISK_FREE_RATE
        )

        metrics['RMSE'] = rmse
        metrics['Tickers'] = N_FEATURES_FINAL
        all_results[country] = metrics

        print(f"-> Optimization Status: {metrics['Status']}")
        print(f"-> Predicted Annual Return: {metrics['Return']:.2%}")
        print(f"-> Predicted Annual Volatility (Risk): {metrics['Volatility']:.2%}")
        print(f"-> Predicted Sharpe Ratio: {metrics['Sharpe']:.3f}")
        print(f"-> Top Weights (Min 1%, Max 40%):\n{metrics['Weights']}")
        print("----------------------------------------------------------")

        if metrics['Sharpe'] > best_sharpe and not np.isnan(metrics['Sharpe']):
            best_sharpe = metrics['Sharpe']
            best_country = country



    print("\n\n=============== FINAL COMPARISON OF BEST COUNTRY PORTFOLIOS (DIVERSIFIED) ===============")
    summary_df = pd.DataFrame.from_dict(all_results, orient='index')

    # Sort on the numeric Sharpe ratio first; sorting after formatting would compare strings.
    summary_df = summary_df[['Tickers', 'RMSE', 'Return', 'Volatility', 'Sharpe']]\
        .sort_values(by='Sharpe', ascending=False)

    summary_df['Return'] = (summary_df['Return'] * 100).map('{:.2f}%'.format)
    summary_df['Volatility'] = (summary_df['Volatility'] * 100).map('{:.2f}%'.format)
    summary_df['Sharpe'] = summary_df['Sharpe'].map('{:.3f}'.format)

    print(summary_df.to_markdown(numalign="left", stralign="left"))

    print(f"\nConclusion: The country whose portfolio leads to the best Sharpe Ratio is **{best_country}**.")
    print("=======================================================================================\n")


try:
    main_analysis()
except NameError:
    print("Please ensure all necessary libraries (yfinance, pandas, numpy, tensorflow, arch, scipy, sklearn) are installed and imported.")

--- STARTING MULTI-COUNTRY QUANTITATIVE ANALYSIS (DIVERSIFIED) ---

1. Downloading and processing data from 2010-01-01 to 2024-12-31...


Data prepared. Final Tickers: 7. Test set size: 771.
-> BiLSTM RMSE (Volatility Forecast Accuracy): 0.060695
-> Optimization Status: Optimization terminated successfully
-> Predicted Annual Return: 16.93%
-> Predicted Annual Volatility (Risk): 21.04%
-> Predicted Sharpe Ratio: 0.709
-> Top Weights (Min 1%, Max 40%):\nNWG.L    0.39
HSBA.L   0.38
STAN.L   0.19
AV.L     0.01
LGEN.L   0.01
----------------------------------------------------------


Data prepared. Final Tickers: 7. Test set size: 771.
-> BiLSTM RMSE (Volatility Forecast Accuracy): 0.042075
-> Optimization Status: Optimization terminated successfully
-> Predicted Annual Return: 11.82%
-> Predicted Annual Volatility (Risk): 23.90%
-> Predicted Sharpe Ratio: 0.411
-> Top Weights (Min 1%, Max 40%):\nCS.PA    0.40
SGO.PA   0.40
ACA.PA   0.16
CA.PA    0.01
BNP.PA   0.01
-----------------------