### Plan of Action
🧠 Step-by-Step Pipeline: Generalized Model + EA + XGBoost
1. Data Preparation
- Load OHLCV data for all selected tickers (e.g., AMZN, META, AVGO, ETFs)
- Normalize features per ticker (z-score or min-max scaling)
- Add metadata: ticker_id, sector, day_of_week, macro regime (optional)
- Create rolling windows for time series modeling (e.g., 10-day sequences)
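The normalization and windowing steps above can be sketched as follows. This is a minimal illustration, not the notebook's implementation: the function names, the `ticker_id` column, and the long (one row per ticker per day) layout are assumptions. Note that standardizing with full-sample statistics leaks future information, so in practice the mean and standard deviation would be estimated on the training span only.

```python
import numpy as np
import pandas as pd


def zscore_per_ticker(df, feature_cols, ticker_col="ticker_id"):
    """Standardize each feature within its own ticker group (hypothetical helper)."""
    out = df.copy()
    grouped = out.groupby(ticker_col)[feature_cols]
    # transform() returns group statistics aligned row by row with the original frame
    out[feature_cols] = (out[feature_cols] - grouped.transform("mean")) / grouped.transform("std")
    return out


def make_windows(values, window=10):
    """Stack overlapping windows into shape (n_windows, window, n_features)."""
    values = np.asarray(values, dtype=float)
    n_windows = len(values) - window + 1
    if n_windows <= 0:
        # Not enough rows for a single window
        return np.empty((0, window) + values.shape[1:])
    return np.stack([values[i:i + window] for i in range(n_windows)])
```

Windows should be built per ticker, so that no sequence straddles two different instruments.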
2. Feature Engineering
- Technical indicators: RSI, MACD, Bollinger Bands, ATR
- Candle features: range, body size, wick ratios
- Volume features: OBV, VWAP, volume spikes
- Lagged returns, volatility, momentum scores
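As a rough sketch of the candle and volume features listed above, the snippet below computes body and wick ratios and flags volume spikes. The column names follow the OHLCV layout used later in the notebook; the function name, the 20-day window, and the z-score threshold of 2 are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def candle_and_volume_features(df, vol_window=20, spike_z=2.0):
    """Add candle-shape ratios and a volume-spike flag (hypothetical helper)."""
    out = df.copy()
    # Bars with High == Low would divide by zero, so treat their range as missing
    candle_range = (out["High"] - out["Low"]).replace(0, np.nan)
    body_top = out[["Open", "Close"]].max(axis=1)
    body_bottom = out[["Open", "Close"]].min(axis=1)
    out["BodyRatio"] = (body_top - body_bottom) / candle_range
    out["UpperWickRatio"] = (out["High"] - body_top) / candle_range
    out["LowerWickRatio"] = (body_bottom - out["Low"]) / candle_range
    # Compare today's volume with the previous vol_window days (today excluded via shift)
    prior = out["Volume"].shift(1).rolling(vol_window)
    out["VolumeZ"] = (out["Volume"] - prior.mean()) / prior.std()
    out["VolumeSpike"] = out["VolumeZ"] > spike_z
    return out
```

The three ratios sum to 1 for every bar with a non-zero range, which makes them comparable across tickers with very different price levels.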
3. Labeling Strategy
- Define swing trade targets:
  - Binary: Will price rise >x% in next n days?
  - Multi-class: Uptrend / Downtrend / Sideways
  - Regression: Expected return over next n days
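The three target types can all be derived from one forward-return series. The sketch below assumes a single ticker's `Close` series; the function names, the 5-day horizon, and the 3% threshold are placeholders for the n and x above, not values taken from the notebook.

```python
import numpy as np
import pandas as pd


def forward_return(close, horizon=5):
    """Return from today's close to the close `horizon` rows ahead."""
    return close.shift(-horizon) / close - 1.0


def make_labels(close, horizon=5, threshold=0.03):
    """Build regression, binary and multi-class targets (hypothetical helper)."""
    fwd = forward_return(close, horizon)
    known = fwd.notna()  # the last `horizon` rows have no future price yet
    labels = pd.DataFrame({"fwd_return": fwd}, index=close.index)
    # Binary: 1.0 if the price rises by more than `threshold` within the horizon
    labels["binary"] = (fwd > threshold).astype(float).where(known)
    # Multi-class: symmetric band around zero separates the three regimes
    trend = np.select([fwd > threshold, fwd < -threshold],
                      ["Uptrend", "Downtrend"], default="Sideways")
    labels["trend"] = pd.Series(trend, index=close.index).where(known)
    return labels
```

Rows whose forward return is unknown are left as missing rather than labeled, so they can be dropped before training.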

🧬 4. Evolutionary Algorithm Optimization
- Purpose: Explore feature combinations, thresholds, and model hyperparameters
- Approach:
  - Use genetic algorithm (e.g., DEAP, PyGAD) to evolve:
    - Feature subsets
    - Thresholds for entry/exit
    - XGBoost hyperparameters (e.g., depth, learning rate)
  - Fitness function: Sharpe ratio, accuracy, precision, or custom KPI
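# Example fitness function (n, X, y and decode_params are assumed to be defined elsewhere)
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from xgboost import XGBClassifier

def fitness(individual):
    # First n genes: assumed to be a 0/1 mask over the columns of X
    mask = [bool(gene) for gene in individual[:n]]
    selected_features = X.columns[mask]
    # Remaining genes: encoded XGBoost hyperparameters
    xgb_params = decode_params(individual[n:])
    model = XGBClassifier(**xgb_params)
    # Time-ordered folds keep future rows out of each training split
    score = cross_val_score(model, X[selected_features], y, cv=TimeSeriesSplit(n_splits=5), scoring='accuracy')
    return score.mean(),  # trailing comma: DEAP expects a tuple of fitness values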
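To make the evolutionary loop itself concrete, here is a minimal hand-rolled genetic algorithm over bit masks, using only the standard library. It is not the DEAP or PyGAD API, and `toy_fitness` is a made-up stand-in for a cross-validated model score; a real run would plug in a function like the one above, returning a plain float instead of a tuple.

```python
import random


def evolve(fitness, n_genes, pop_size=30, generations=40, mutation_rate=0.05, seed=0):
    """Maximize `fitness` over fixed-length bit strings with a simple genetic algorithm."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]

    def tournament(scored, k=3):
        # Pick the fittest of k randomly drawn (score, individual) pairs
        return max(rng.sample(scored, k), key=lambda pair: pair[0])[1]

    best = max(population, key=fitness)
    best_score = fitness(best)
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        next_population = [best[:]]  # elitism: carry the best individual forward
        while len(next_population) < pop_size:
            mother, father = tournament(scored), tournament(scored)
            cut = rng.randint(1, n_genes - 1)  # one-point crossover
            child = mother[:cut] + father[cut:]
            # Bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)
        population = next_population
        candidate = max(population, key=fitness)
        if fitness(candidate) > best_score:
            best, best_score = candidate, fitness(candidate)
    return best, best_score


def toy_fitness(mask):
    # Hypothetical score: features 0, 2 and 5 help, every other selected feature hurts
    useful = {0, 2, 5}
    return sum(1.0 if i in useful else -0.5 for i, gene in enumerate(mask) if gene)


best_mask, best_score = evolve(toy_fitness, n_genes=8)
print(best_mask, best_score)
```

Because each fitness call in the real pipeline trains a model, caching scores per individual is usually worthwhile; the sketch recomputes them for brevity.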


🌲 5. XGBoost Refinement
- Train final model using best EA-selected features and hyperparameters
- Use XGBoost for:
  - Feature importance ranking
  - Fast inference
  - Robust performance on tabular data
from xgboost import XGBClassifier
model = XGBClassifier(**best_params)
model.fit(X_train[selected_features], y_train)



📊 6. Evaluation & Backtesting
- Use walk-forward validation or time-series cross-validation
- Evaluate:
  - Accuracy, precision, recall
  - Sharpe ratio, Sortino ratio
  - Win rate, average trade duration
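A sketch of the evaluation building blocks, under stated assumptions: a rolling (fixed-size) training window for walk-forward validation, daily returns annualized with 252 periods, and a Sortino ratio that uses downside deviation relative to zero excess return. The function names and defaults are illustrative, not part of the notebook.

```python
import numpy as np


def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train_idx, test_idx) pairs where each test block follows its training block."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train_idx = np.arange(start, start + train_size)
        test_idx = np.arange(start + train_size, start + train_size + test_size)
        yield train_idx, test_idx
        start += test_size  # roll the window forward by one test block


def sharpe_ratio(returns, periods_per_year=252, risk_free=0.0):
    """Annualized mean excess return divided by its sample standard deviation."""
    excess = np.asarray(returns, dtype=float) - risk_free / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)


def sortino_ratio(returns, periods_per_year=252, risk_free=0.0):
    """Like Sharpe, but only negative excess returns count as risk."""
    excess = np.asarray(returns, dtype=float) - risk_free / periods_per_year
    # Downside deviation; this is zero (and the ratio undefined) if no return is negative
    downside = np.sqrt(np.mean(np.minimum(excess, 0.0) ** 2))
    return np.sqrt(periods_per_year) * excess.mean() / downside
```

Each split trains only on rows that precede its test block, which is the property that ordinary shuffled k-fold cross-validation lacks for time series.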


# 0. Dependencies

In [15]:
import yfinance as yf
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
from ta.momentum import RSIIndicator, StochRSIIndicator
from ta.trend import MACD, SMAIndicator, EMAIndicator
from ta.volatility import BollingerBands, AverageTrueRange
from ta.volume import OnBalanceVolumeIndicator, ChaikinMoneyFlowIndicator

## 1. Data Preparation
### Load OHLCV data for all selected tickers (e.g., AMZN, META, AVGO, ETFs)

Core Equity Holdings (NN Group, 2025)
- "AMZN", "META", "AVGO", "LLY", "ETN", "CYBR", "LIN", "WM", "SLNO", "CYTK"

ETF Additions (for sector/macro exposure)
- "SPY", "TLT", "LQD", "VNQ", "XLV"


Description:
- Amazon.com Inc – Consumer Discretionary
- Meta Platforms Inc – Communication Services
- Broadcom Inc – Technology
- Eli Lilly & Co – Healthcare
- Eaton Corp PLC – Industrials
- CyberArk Software Ltd – Technology
- Linde PLC – Materials
- Waste Management Inc – Industrials
- Soleno Therapeutics Inc – Healthcare
- Cytokinetics Inc – Healthcare
- SPDR S&P 500 ETF (SPY) – Broad Market ETF
- iShares 20+ Year Treasury ETF (TLT) – Government Bonds
- iShares Investment Grade Corporate Bond ETF (LQD) – Corporate Bonds
- Vanguard Real Estate ETF (VNQ) – Real Estate
- Health Care Select Sector SPDR ETF (XLV) – Healthcare Sector ETF

Based on the Q2 2025 holdings of Nationale-Nederlanden Powszechne Towarzystwo Emerytalne S.A.:
https://www.sensamarket.com/institutional-investor/000201108125000008/compare/000201108125000005

In [18]:
tickers = ["AMZN", "META", "AVGO", "LLY", "ETN", "CYBR", "LIN", "WM", "SLNO", "CYTK", "SPY", "TLT", "LQD", "VNQ", "XLV"]
leadup_days = 70
start_date = (datetime.strptime("2022-01-01", "%Y-%m-%d") - timedelta(days=leadup_days)).strftime("%Y-%m-%d")
raw_data = yf.download(tickers, interval="1d", start=start_date, end="2025-01-01", group_by="ticker")

[*********************100%***********************]  15 of 15 completed


In [19]:
raw_data

| Date | CYBR Open | CYBR High | CYBR Low | CYBR Close | CYBR Volume | META Open | META High | META Low | META Close | META Volume | ... | LLY Open | LLY High | LLY Low | LLY Close | LLY Volume | XLV Open | XLV High | XLV Low | XLV Close | XLV Volume |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2021-10-25 | 184.580002 | 187.440002 | 181.850006 | 182.429993 | 342800 | 318.337508 | 327.540782 | 317.761075 | 326.676117 | 38409000 | ... | 235.278184 | 236.826447 | 231.296939 | 235.653229 | 2763100 | 123.470062 | 123.695018 | 122.514030 | 123.563797 | 10629700 |
| 2021-10-26 | 183.789993 | 184.630005 | 177.970001 | 179.089996 | 371100 | 326.248730 | 328.186764 | 307.703058 | 313.875000 | 65654000 | ... | 235.124321 | 239.374828 | 227.998464 | 238.913239 | 4533000 | 123.845012 | 124.529235 | 123.170158 | 124.173058 | 9271600 |
| 2021-10-27 | 180.339996 | 180.339996 | 175.520004 | 176.610001 | 398300 | 312.264968 | 317.293964 | 310.148014 | 310.307037 | 29971800 | ... | 239.076727 | 244.702407 | 238.797854 | 240.576904 | 3181300 | 124.369891 | 124.501112 | 123.151403 | 123.188889 | 11851400 |
| 2021-10-28 | 177.399994 | 180.199997 | 175.639999 | 177.639999 | 266200 | 311.072297 | 323.525524 | 306.222192 | 314.978241 | 50806800 | ... | 240.567301 | 246.904596 | 240.490367 | 243.634979 | 3386500 | 123.413857 | 124.697949 | 123.413857 | 124.219933 | 9282500 |
| 2021-10-29 | 177.190002 | 180.410004 | 177.190002 | 180.110001 | 195700 | 318.228198 | 324.002598 | 317.641817 | 321.587494 | 37059400 | ... | 244.048459 | 246.635309 | 242.115540 | 244.990875 | 3152000 | 124.060580 | 125.588378 | 123.582564 | 125.429039 | 15415100 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2024-12-24 | 319.000000 | 320.871002 | 316.989990 | 320.609985 | 145500 | 601.315372 | 606.573110 | 597.883447 | 606.333679 | 4726100 | ... | 786.664545 | 793.006614 | 782.419914 | 790.938965 | 1165400 | 136.813925 | 137.494888 | 136.231643 | 137.494888 | 3139300 |
| 2024-12-26 | 320.510010 | 322.809998 | 317.109985 | 322.459991 | 229600 | 604.069011 | 604.887107 | 597.544273 | 601.943970 | 6081400 | ... | 789.398190 | 799.129958 | 786.793774 | 789.418091 | 1274200 | 137.011293 | 137.889633 | 136.863249 | 137.771210 | 4720300 |
| 2024-12-27 | 320.000000 | 321.390015 | 312.915009 | 320.820007 | 271700 | 598.013139 | 600.447456 | 588.425548 | 598.412231 | 8084200 | ... | 785.302712 | 788.583078 | 774.586780 | 778.513306 | 2096500 | 137.011307 | 137.978465 | 136.547463 | 137.129730 | 5948000 |
| 2024-12-30 | 315.970001 | 321.000000 | 311.000000 | 319.600006 | 289600 | 587.377934 | 595.548850 | 584.215339 | 589.862122 | 7025900 | ... | 773.294589 | 775.123675 | 766.415751 | 769.238892 | 1719100 | 136.340191 | 136.340191 | 134.938795 | 135.442123 | 6890300 |


## Technical Indicators
Momentum 
- RSI
- StochRSI

Trend
- MACD
- SMA
- EMA

Volatility
- BB (Bollinger Bands)
- ATR (Average True Range)

Volume
- OBV (On-Balance Volume)
- CMF (Chaikin Money Flow)

In [21]:
def add_features(df):
    df = df.copy()
    
    # General price dynamics
    df["returns"] = df["Close"].pct_change()
    df["logReturns"] = np.log(df["Close"] / df["Close"].shift(1))
    df["Volatility"] = df["returns"].rolling(10).std()
    df['Range'] = df['High'] - df['Low']
    df['Body'] = abs(df['Close'] - df['Open'])
    df['Wick'] = df['Range'] - df['Body']

    # Momentum
    df['RSI'] = RSIIndicator(df['Close'], window=14).rsi()
    df['StochRSI'] = StochRSIIndicator(df['Close'], window=14).stochrsi()
    
    # Trend
    df['SMA_20'] = SMAIndicator(df['Close'], window=20).sma_indicator()
    df['EMA_20'] = EMAIndicator(df['Close'], window=20).ema_indicator()
    macd = MACD(df['Close'])
    df['MACD'] = macd.macd()
    df['MACD_Signal'] = macd.macd_signal()
    
    # Volatility
    bb = BollingerBands(df['Close'], window=20)
    df['BB_High'] = bb.bollinger_hband()
    df['BB_Low'] = bb.bollinger_lband()
    df['ATR'] = AverageTrueRange(df['High'], df['Low'], df['Close'], window=14).average_true_range()
    
    # Volume
    df['OBV'] = OnBalanceVolumeIndicator(df['Close'], df['Volume']).on_balance_volume()
    df['CMF'] = ChaikinMoneyFlowIndicator(df['High'], df['Low'], df['Close'], df['Volume'], window=20).chaikin_money_flow()
    
    return df
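Since `raw_data` has two-level columns (ticker, then price field), a per-ticker feature function has to be applied to one ticker's slice at a time. The sketch below shows one way to do that and to stack the results into a long panel with a `ticker_id` column. `build_panel` and its `start` parameter are hypothetical; the cutoff assumes the extra lead-up days exist only to warm up the rolling indicators.

```python
import pandas as pd


def build_panel(raw, feature_fn, start=None):
    """Apply `feature_fn` to each ticker's OHLCV frame and stack the results (hypothetical helper)."""
    frames = []
    for ticker in raw.columns.get_level_values(0).unique():
        # Selecting on the first column level yields a plain single-level OHLCV frame
        per_ticker = raw[ticker].dropna(how="all")
        features = feature_fn(per_ticker)
        if start is not None:
            features = features.loc[start:]  # drop the indicator warm-up rows
        frames.append(features.assign(ticker_id=ticker))
    # Long format: one row per (date, ticker)
    return pd.concat(frames).sort_index(kind="stable")
```

With the objects defined in this notebook, a call would look like `panel = build_panel(raw_data, add_features, start="2022-01-01")`.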