# TD3 Portfolio Management for the Dow Jones 30

This notebook shows how to build a dynamic portfolio allocation for the Dow Jones 30 with the FinRL framework and the TD3 algorithm (Twin Delayed DDPG). Every section is commented so that beginners can follow each step.


## Virtual environment (.venv)

1. **Create:** Run `python3 -m venv .venv` in the project folder (already set up for you).
2. **Activate (macOS/Linux):** `source .venv/bin/activate`
3. **Activate (Windows PowerShell):** `.venv\Scripts\Activate.ps1`
4. **Register the kernel (optional, for Jupyter):**
   ```bash
   python -m ipykernel install --user --name drl-td3-venv --display-name "Python (.venv TD3)"
   ```
5. **Install packages:** Run the `%pip install ...` cell inside the activated environment so that FinRL and its dependencies stay isolated in this project.

> Tip: If you work on several projects, always activate the matching `.venv` before starting the notebook.
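
To check from inside the notebook that the kernel really runs in the `.venv`, you can compare `sys.prefix` with `sys.base_prefix`: according to the Python `venv` documentation, the two differ only when the interpreter runs from a virtual environment. The helper name `in_virtualenv` is hypothetical; this is a minimal sketch, not part of FinRL.

```python
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points to the environment directory,
    # while sys.base_prefix still points to the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Running inside a virtual environment: {in_virtualenv()}")
```

If the second line prints `False`, the notebook is most likely using a global kernel instead of the registered `.venv` kernel.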


In [1]:
# Package installation (run on first start if needed)
# Note: the installation can take a few minutes.
# Caution: FinRL is installed directly from the GitHub master branch because macOS-compatible dependencies are maintained there.
%pip install -q "git+https://github.com/AI4Finance-Foundation/FinRL.git@master" ta


Note: you may need to restart the kernel to use updated packages.


In [None]:
# Base imports and directory setup
import warnings
warnings.filterwarnings("ignore")

from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from finrl import config
from finrl.config import INDICATORS


from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split
from finrl.meta.env_portfolio_allocation.env_portfolio import StockPortfolioEnv
from finrl.agents.stablebaselines3.models import DRLAgent
from finrl.plot import backtest_stats, backtest_plot

# Local directories for intermediate results
ROOT_DIR = Path.cwd()
DATA_SAVE_DIR = ROOT_DIR / "data"
MODEL_DIR = ROOT_DIR / "models_td3"
RESULTS_DIR = ROOT_DIR / "results"
for path in (DATA_SAVE_DIR, MODEL_DIR, RESULTS_DIR):
    path.mkdir(parents=True, exist_ok=True)



In [3]:
DOW_30_TICKER = [
    'MMM', 'AXP', 'AMGN', 'AAPL', 'BA', 'CAT', 'CVX', 'CSCO', 
    'KO', 'DIS', 'DOW', 'GS', 'HD', 'HON', 'IBM', 'INTC', 'JNJ', 
    'JPM', 'MCD', 'MRK', 'MSFT', 'NKE', 'PG', 'CRM', 'TRV', 
    'UNH', 'VZ', 'V', 'WMT', 'WBA' 
]

In [None]:
# Dow Jones configuration and training/trading periods
ticker_list = DOW_30_TICKER


train_start_date = "2014-01-01"
train_end_date = "2021-12-31"
trade_start_date = "2022-01-01"
trade_end_date = "2024-11-01"

initial_capital = 1_000_000
transaction_cost_pct = 0.001  # 10 basis points per trade
hmax = 100  # maximum number of shares per order
reward_scaling = 1e-4
num_stock_shares = 1000


## Step 1: Define data and market parameters

We specify which Dow Jones 30 stocks, which time periods, and which financial indicators the agent uses for training and for the subsequent trading phase.
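
To make the 10 basis points of `transaction_cost_pct` tangible, the following sketch computes the cost of a single rebalancing step. It assumes that costs are charged proportionally on the traded value (buys and sells), i.e. on the turnover between the old and the new portfolio weights. The helper `rebalancing_cost` is hypothetical and does not necessarily reflect how a particular FinRL environment books costs.

```python
import numpy as np


def rebalancing_cost(old_weights, new_weights, portfolio_value, cost_pct=0.001):
    """Hypothetical helper: proportional cost on the traded value of one rebalance."""
    old_w = np.asarray(old_weights, dtype=float)
    new_w = np.asarray(new_weights, dtype=float)
    # Turnover = sum of absolute weight changes (buys and sells both count)
    turnover = np.abs(new_w - old_w).sum()
    traded_value = turnover * portfolio_value
    return cost_pct * traded_value


# Shifting 10 % of a 1,000,000 portfolio from stock A to stock B:
# 100,000 sold + 100,000 bought = 200,000 traded, so the cost is about 200
cost = rebalancing_cost([0.5, 0.5], [0.4, 0.6], 1_000_000)
print(f"Cost: {cost:.2f}")
```

Under this assumption, a daily full reallocation would be expensive, which is one reason portfolio agents are often penalized for high turnover.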


In [7]:
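# Download daily OHLCV data for the full period (training + trading) from Yahoo Finance
df_raw = YahooDownloader(
    start_date=train_start_date,
    end_date=trade_end_date,
    ticker_list=ticker_list,
).fetch_data()
# Cache the raw data in the data directory so it does not have to be downloaded again
df_raw.to_csv(DATA_SAVE_DIR / "dow_30_data.csv")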

[*********************100%***********************]  1 of 1 completed

Shape of DataFrame:  (77772, 8)


In [21]:
df_raw



| | date | close | high | low | open | volume | tic | day |
|---|---|---|---|---|---|---|---|---|
| 0 | 2014-01-02 | 17.156706 | 17.277674 | 17.122277 | 17.235801 | 234684800 | AAPL | 3 |
| 1 | 2014-01-02 | 83.317406 | 83.598008 | 82.065482 | 82.281333 | 2528800 | AMGN | 3 |
| 2 | 2014-01-02 | 75.593262 | 76.970762 | 75.534106 | 76.818646 | 5112000 | AXP | 3 |
| 3 | 2014-01-02 | 116.807961 | 117.303672 | 115.816538 | 116.243874 | 3366700 | BA | 3 |
| 4 | 2014-01-02 | 66.045410 | 66.471648 | 65.648564 | 66.442256 | 4898000 | CAT | 3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 77767 | 2024-10-31 | 241.938232 | 246.365011 | 241.741490 | 244.476254 | 1318600 | TRV | 3 |
| 77768 | 2024-10-31 | 552.517334 | 556.383499 | 548.974181 | 548.974181 | 2490000 | UNH | 3 |
| 77769 | 2024-10-31 | 287.259705 | 293.691696 | 287.031749 | 289.479677 | 7950200 | V | 3 |
| 77770 | 2024-10-31 | 39.416229 | 39.902734 | 38.592913 | 38.602267 | 31077900 | VZ | 3 |


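## Step 2: Data retrieval & feature engineering

The prices were loaded from Yahoo Finance with FinRL's `YahooDownloader` in the cell above. We now use `FeatureEngineer` to add the technical indicators listed in `INDICATORS`. In this step FinRL also drops tickers whose price history does not cover the whole period (e.g. `DOW`, which has only been listed since 2019), which is why only 28 of the 30 stocks remain later on. We then compute, for every trading day, the covariance matrix of daily returns over the previous 252 trading days (roughly one year); `StockPortfolioEnv` uses these matrices as part of its state. Because the first 252 trading days are consumed by this lookback window, the prepared data starts on 2015-01-02 instead of 2014-01-01.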
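
The covariance loop in the next cell recomputes `pct_change()` and `cov()` separately for every day. As a hedged alternative, pandas can compute the same trailing-window covariances in one pass with `rolling(...).cov()`. The sketch below assumes a wide price table (dates as index, tickers as columns), uses synthetic prices, and checks that both approaches agree; `loop_covariances` and `rolling_covariances` are illustrative helpers, not FinRL functions.

```python
import numpy as np
import pandas as pd


def loop_covariances(prices: pd.DataFrame, lookback: int) -> list:
    """Mirrors the notebook loop: lookback + 1 prices -> lookback returns -> covariance."""
    covs = []
    for i in range(lookback, len(prices)):
        window = prices.iloc[i - lookback:i + 1]
        covs.append(window.pct_change().dropna().cov().to_numpy())
    return covs


def rolling_covariances(prices: pd.DataFrame, lookback: int) -> list:
    """Same covariances via pandas' rolling pairwise covariance."""
    returns = prices.pct_change()
    # Result has a (date, ticker) MultiIndex: one covariance matrix per date
    rolled = returns.rolling(lookback).cov()
    return [rolled.loc[date].to_numpy() for date in prices.index[lookback:]]


# Synthetic example: 3 tickers, 40 business days, 10-day lookback
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=40)
prices = pd.DataFrame(
    100 * np.cumprod(1 + rng.normal(0, 0.01, size=(40, 3)), axis=0),
    index=dates,
    columns=["AAA", "BBB", "CCC"],
)
loop_result = loop_covariances(prices, lookback=10)
rolling_result = rolling_covariances(prices, lookback=10)
print(len(loop_result), all(np.allclose(a, b) for a, b in zip(loop_result, rolling_result)))
```

Both versions use the same `lookback` daily returns per day and the sample covariance (`ddof=1`). The rolling variant avoids rebuilding a pivot table in every iteration, which matters for long histories.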


In [None]:
# Add technical indicators (VIX and turbulence index are disabled here)
fe = FeatureEngineer(
    use_technical_indicator=True,
    tech_indicator_list=INDICATORS,
    use_vix=False,
    use_turbulence=False,
    user_defined_feature=False,
)
df = fe.preprocess_data(df_raw)

# Sort by date and ticker; the index becomes a running day number (0, 1, 2, ...)
df = df.sort_values(["date", "tic"], ignore_index=True)
df.index = df.date.factorize()[0]

# Covariance matrix of daily returns over a rolling one-year window
cov_list = []
return_list = []
lookback = 252
for i in range(lookback, len(df.index.unique())):
    # .loc slicing is inclusive, so this window covers lookback + 1 trading days
    data_lookback = df.loc[i - lookback:i, :]
    price_lookback = data_lookback.pivot_table(index="date", columns="tic", values="close")
    return_lookback = price_lookback.pct_change().dropna()
    return_list.append(return_lookback)
    cov_list.append(return_lookback.cov().values)

# Attach the covariance matrix and the return window to every trading day
df_cov = pd.DataFrame({
    "date": df.date.unique()[lookback:],
    "cov_list": cov_list,
    "return_list": return_list,
})
df = df.merge(df_cov, on="date")
df = df.sort_values(["date", "tic"]).reset_index(drop=True)

df

Successfully added technical indicators


| | date | close | high | low | open | volume | tic | day | macd | boll_ub | boll_lb | rsi_30 | cci_30 | dx_30 | close_30_sma | close_60_sma | cov_list | return_list |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-02 | 24.237551 | 24.705320 | 23.798600 | 24.694235 | 212818400 | AAPL | 4 | -0.032313 | 25.802298 | 23.784588 | 49.712408 | -117.917607 | 30.589492 | 25.154541 | 24.265838 | [[0.00018564806669213102, 5.2475743997731304e-... | tic AAPL AMGN AXP ... |
| 1 | 2015-01-02 | 117.267029 | 119.247269 | 116.320918 | 117.465056 | 2605400 | AMGN | 4 | -0.218230 | 126.512109 | 114.258211 | 52.014938 | -98.288563 | 26.857916 | 120.554460 | 115.278998 | [[0.00018564806669213102, 5.2475743997731304e-... | tic AAPL AMGN AXP ... |
| 2 | 2015-01-02 | 79.466217 | 80.252170 | 78.714443 | 79.594362 | 2437500 | AXP | 4 | 0.635426 | 81.914903 | 76.426383 | 54.472435 | 40.554814 | 2.258455 | 78.813838 | 76.874452 | [[0.00018564806669213102, 5.2475743997731304e-... | tic AAPL AMGN AXP ... |
| 3 | 2015-01-02 | 113.657211 | 115.310248 | 112.905035 | 114.636798 | 4294200 | BA | 4 | 0.615906 | 118.411502 | 105.554005 | 53.311167 | 9.332513 | 4.733885 | 113.467716 | 110.866681 | [[0.00018564806669213102, 5.2475743997731304e-... | tic AAPL AMGN AXP ... |
| 4 | 2015-01-02 | 69.320366 | 69.690059 | 68.399922 | 69.237374 | 3767900 | CAT | 4 | -1.274687 | 74.301831 | 66.110407 | 41.395081 | -68.647295 | 27.802444 | 72.634997 | 73.335316 | [[0.00018564806669213102, 5.2475743997731304e-... | tic AAPL AMGN AXP ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 69295 | 2024-10-31 | 241.938232 | 246.365011 | 241.741490 | 244.476254 | 1318600 | TRV | 3 | 4.289557 | 265.811934 | 218.994148 | 55.519532 | 30.789545 | 8.858985 | 239.052298 | 230.332258 | [[0.00020348651246347978, 5.1802428174087016e-... | tic AAPL AMGN AXP ... |
| 69296 | 2024-10-31 | 552.517334 | 556.383499 | 548.974181 | 548.974181 | 2490000 | UNH | 3 | -5.253441 | 591.313304 | 534.094575 | 48.006190 | -78.300434 | 16.237826 | 564.678935 | 567.471445 | [[0.00020348651246347978, 5.1802428174087016e-... | tic AAPL AMGN AXP ... |
| 69297 | 2024-10-31 | 287.259705 | 293.691696 | 287.031749 | 289.479677 | 7950200 | V | 3 | 2.314701 | 290.740082 | 269.592246 | 57.834998 | 143.750873 | 43.150769 | 278.255231 | 275.096878 | [[0.00020348651246347978, 5.1802428174087016e-... | tic AAPL AMGN AXP ... |
| 69298 | 2024-10-31 | 39.416229 | 39.902734 | 38.592913 | 38.602267 | 31077900 | VZ | 3 | -0.360340 | 41.844809 | 38.215385 | 49.647300 | -100.877137 | 11.875918 | 40.412689 | 39.529393 | [[0.00020348651246347978, 5.1802428174087016e-... | tic AAPL AMGN AXP ... |


In [70]:
# Split the prepared data into the training and the trading (backtest) period
train = data_split(df, start=train_start_date, end=train_end_date)
trade = data_split(df, start=trade_start_date, end=trade_end_date)

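## Step 3: Define the trading environment

We create two `StockPortfolioEnv` instances, one for training and one for the later trading/backtesting period. Each environment receives the parameters defined above (`initial_amount`, `transaction_cost_pct`, `hmax`, `reward_scaling`) and the list of technical indicators. Its state consists of the covariance matrix of the stocks plus one row per technical indicator, and the agent's action is turned into one portfolio weight per stock.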
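
A simplified sketch of one environment step, assuming (as in FinRL's `StockPortfolioEnv`) that the raw actions are converted into portfolio weights with a softmax and that the portfolio value grows with the weighted asset returns. The function names are hypothetical and transaction costs are left out; `build_state` stacks the covariance matrix on top of one row per indicator, as described above.

```python
import numpy as np


def softmax_weights(actions: np.ndarray) -> np.ndarray:
    # Subtracting the max does not change the result but avoids overflow in exp
    z = np.exp(actions - np.max(actions))
    return z / z.sum()


def build_state(cov: np.ndarray, indicators: np.ndarray) -> np.ndarray:
    # cov: (n_stocks, n_stocks); indicators: (n_indicators, n_stocks)
    return np.vstack([cov, indicators])


def portfolio_step(value: float, actions: np.ndarray,
                   close_prev: np.ndarray, close_now: np.ndarray):
    weights = softmax_weights(actions)          # non-negative, sums to 1
    asset_returns = close_now / close_prev - 1.0
    portfolio_return = float(weights @ asset_returns)
    return value * (1.0 + portfolio_return), weights


# Two stocks, equal raw actions -> equal weights; +10 % and -10 % cancel out
new_value, weights = portfolio_step(
    1_000_000.0, np.zeros(2), np.array([100.0, 50.0]), np.array([110.0, 45.0])
)
print(weights, round(new_value, 2))
print(build_state(np.eye(2), np.ones((8, 2))).shape)
```

With 28 stocks and the 8 indicators in `INDICATORS`, the same stacking would give a state of shape (36, 28). The softmax rules out short positions, so the agent can only shift capital between long positions.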


In [86]:

# Number of stocks that remain after data cleaning
stock_dim = len(train.tic.unique())

print(f"state_space: {stock_dim} | Number of stocks: {stock_dim}")

# state_space = number of stocks; the environment adds one row per indicator
# to the covariance matrix when building the observation
env_kwargs = {
    "hmax": hmax,
    "initial_amount": initial_capital,
    "transaction_cost_pct": transaction_cost_pct,
    "state_space": stock_dim,
    "stock_dim": stock_dim,
    "tech_indicator_list": INDICATORS,
    "action_space": stock_dim,
    "reward_scaling": reward_scaling,
}

e_train_gym = StockPortfolioEnv(df=train, **env_kwargs)
e_trade_gym = StockPortfolioEnv(df=trade, **env_kwargs)

state_space: 28 | Number of stocks: 28


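## Step 4: Configure and train the agents

Besides TD3, the next cell also trains PPO and DDPG on the same training environment for comparison.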
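
TD3 extends DDPG with three changes: two critics whose minimum forms the learning target (clipped double Q-learning), clipped noise added to the target action (target policy smoothing), and actor updates that happen less often than critic updates (delayed policy updates). The NumPy sketch below shows only the critic target computation. The function names are hypothetical, and the defaults for the noise (0.2), the noise clip (0.5) and the discount factor (0.99) are assumptions that match common defaults, e.g. in Stable-Baselines3.

```python
import numpy as np


def smoothed_target_action(target_action, rng, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0):
    # Target policy smoothing: clipped Gaussian noise on the target actor's action
    noise = np.clip(rng.normal(0.0, noise_std, size=np.shape(target_action)),
                    -noise_clip, noise_clip)
    return np.clip(target_action + noise, low, high)


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    # Clipped double Q-learning: use the smaller of the two target critic values
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)


rng = np.random.default_rng(42)
a_next = smoothed_target_action(np.array([0.9, -0.3, 0.0]), rng)
y = td3_target(reward=1.0, done=0.0, q1_next=5.0, q2_next=3.0, gamma=0.9)
print(a_next, y)
```

In a full implementation, `q1_next` and `q2_next` would be the two target critics evaluated at the next state and `a_next`; both critics regress towards the same `y`, while the actor and the target networks are updated only every `policy_delay` critic updates (2 by default in Stable-Baselines3).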


In [87]:
# One DRLAgent wraps the training environment; all three models are trained on it
agent = DRLAgent(env=e_train_gym)

# PPO (on-policy) as a baseline
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.01,
    "learning_rate": 0.00025,
    "batch_size": 64,
}
model_ppo = agent.get_model("ppo", model_kwargs=PPO_PARAMS)
trained_ppo = agent.train_model(
    model=model_ppo,
    tb_log_name="ppo",
    total_timesteps=50_000,
)
trained_ppo.save(MODEL_DIR / "ppo_portfolio")

# DDPG (off-policy, single critic) as TD3's predecessor
DDPG_PARAMS = {
    "batch_size": 128,
    "buffer_size": 50_000,
    "learning_rate": 0.001,
}
model_ddpg = agent.get_model("ddpg", model_kwargs=DDPG_PARAMS)
trained_ddpg = agent.train_model(
    model=model_ddpg,
    tb_log_name="ddpg",
    total_timesteps=30_000,
)
trained_ddpg.save(MODEL_DIR / "ddpg_portfolio")

# TD3: twin critics, delayed policy updates and target policy smoothing
TD3_PARAMS = {
    "batch_size": 100,
    "buffer_size": 1_000_000,
    "learning_rate": 0.001,
}
model_td3 = agent.get_model("td3", model_kwargs=TD3_PARAMS)
trained_td3 = agent.train_model(
    model=model_td3,
    tb_log_name="td3",
    total_timesteps=30_000,
)
trained_td3.save(MODEL_DIR / "td3_portfolio")

{'n_steps': 2048, 'ent_coef': 0.01, 'learning_rate': 0.00025, 'batch_size': 64}
Using cpu device
Wrapping the env with a `Monitor` wrapper
Wrapping the env in a DummyVecEnv.
begin_total_asset:1000000
end_total_asset:2894929.9060965604
Sharpe:  0.9248047753576213
----------------------------------
| rollout/           |           |
|    ep_len_mean     | 1.76e+03  |
|    ep_rew_mean     | 2.95e+09  |
| time/              |           |
|    fps             | 687       |
|    iterations      | 1         |
|    time_elapsed    | 2         |
|    total_timesteps | 2048      |
| train/             |           |
|    reward          | 977040.56 |
|    reward_max      | 2900744.0 |
|    reward_mean     | 1582292.8 |
|    reward_min      | 917186.94 |
----------------------------------
begin_total_asset:1000000
end_total_asset:2850058.0084528155
Sharpe:  0.9083555306474409
---------------------------------------
| rollout/                |           |
|    ep_len_mean          | 1.76e+03  |
|  