**RUN INSTRUCTIONS**

*The team ran the code cell by cell, in order from top to bottom.*

**Notes:**

- *Most blocks take a long time to run (about 10 minutes each; Block 6 alone takes about 25 minutes).*

- *Blocks 8, 9, 10, and 11 save their results to CSV files.*

- **The resulting CSV files have been uploaded to GitHub.**

**Install the required libraries**

In [None]:
!pip install pandas numpy torch scikit-learn matplotlib
!pip install --extra-index-url https://fiinquant.github.io/fiinquantx/simple fiinquantx
!pip install --upgrade --extra-index-url https://fiinquant.github.io/fiinquantx/simple fiinquantx

**Block 1: Fetch historical and real-time data**

In [1]:
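# Block 1: Log in and fetch data for all HOSE tickers
import os
import pandas as pd
from FiinQuantX import FiinSession, BarDataUpdate
# --- Login ---
# Credentials are read from environment variables instead of being hard-coded.
# The variable names FIINQUANT_USERNAME / FIINQUANT_PASSWORD are illustrative.
username = os.environ["FIINQUANT_USERNAME"]
password = os.environ["FIINQUANT_PASSWORD"]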

client = FiinSession(
    username=username,
    password=password
).login()

# --- Get the ticker list for the exchange ---
tickers_hose  = list(client.TickerList(ticker="VNINDEX"))     # HOSE
print(f"Number of HOSE tickers: {len(tickers_hose)}")

# --- Fetch the full price history ---
event_history = client.Fetch_Trading_Data(
    realtime=False,
    tickers=tickers_hose,
    fields=['open','high','low','close','volume','bu','sd','fs','fn'],
    adjusted=True,
    by="1d",
    from_date="2023-01-01"   # fetch data from 2023 to the present
)

df_all = event_history.get_data()
print("Initial history:", df_all.head())

# --- Realtime callback ---
def onDataUpdate(data: BarDataUpdate):
    global df_all
    df_update = data.to_dataFrame()
    df_all = pd.concat([df_all, df_update])
    df_all = df_all.drop_duplicates()
    print("Realtime update:")
    print(df_update.head())

# --- Start the realtime stream to extend the history ---
event_realtime = client.Fetch_Trading_Data(
    realtime=True,
    tickers=tickers_hose,
    fields=['open','high','low','close','volume','bu','sd','fs','fn'],
    adjusted=True,
    by="1d",
    period=1,
    callback=onDataUpdate
)


Number of HOSE tickers: 413
Fetching data, it may take a while. Please wait...
Initial history:   ticker         timestamp      open      high       low     close     volume  \
0    AAA  2023-01-03 00:00  6539.643  6866.145  6539.643  6866.145  1543984.0   
1    AAA  2023-01-04 00:00  6866.145  7000.587  6827.733  6827.733  1302505.0   
2    AAA  2023-01-05 00:00  6866.145  6904.557  6808.527  6885.351   980473.0   
3    AAA  2023-01-06 00:00  6885.351  6990.984  6818.130  6856.542  1431699.0   
4    AAA  2023-01-09 00:00  6914.160  6962.175  6760.512  6789.321  1121385.0   

         bu        sd           fs           fn  
0  938600.0  504700.0   40579000.0  899404000.0  
1  462900.0  780600.0  151639000.0   36850000.0  
2  487200.0  473700.0  343911000.0  -59103000.0  
3  564300.0  828300.0  345999000.0 -294312000.0  
4  414000.0  631800.0  514557000.0 -483197000.0  
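
The realtime callback above appends each new bar with `pd.concat` and then calls `drop_duplicates()`, which only removes rows that are identical in every column. A minimal sketch, assuming one wants exactly one row per `(ticker, timestamp)` with the newest values winning; the helper name `merge_bars` and the toy frames are illustrative and not part of FiinQuantX:

```python
import pandas as pd

def merge_bars(history: pd.DataFrame, update: pd.DataFrame) -> pd.DataFrame:
    """Append `update` to `history`, keeping the latest row per (ticker, timestamp)."""
    combined = pd.concat([history, update], ignore_index=True)
    # keep="last" lets the realtime bar overwrite an older bar with the same key
    combined = combined.drop_duplicates(subset=["ticker", "timestamp"], keep="last")
    return combined.sort_values(["ticker", "timestamp"]).reset_index(drop=True)

history = pd.DataFrame({
    "ticker": ["AAA", "AAA"],
    "timestamp": ["2023-01-03 00:00", "2023-01-04 00:00"],
    "close": [6866.145, 6827.733],
})
# Hypothetical update for an existing daily bar: same key, new close
update = pd.DataFrame({
    "ticker": ["AAA"],
    "timestamp": ["2023-01-04 00:00"],
    "close": [6900.0],
})

merged = merge_bars(history, update)
print(merged)
```

With a plain `drop_duplicates()` both versions of the 2023-01-04 bar would be kept, because their `close` values differ.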


**Block 2: Fetch FA data and filter out invalid tickers**

In [2]:
# Block 2: Fetch quarterly FA data (HOSE only)

def fetch_fa_quarterly(ticker, latest_year=2025, n_periods=32):
    try:
        fi_list = client.FundamentalAnalysis().get_ratios(
            tickers=[ticker],
            TimeFilter="Quarterly",
            LatestYear=latest_year,
            NumberOfPeriod=n_periods,
            Consolidated=True
        )

        # Skip the ticker if no data is returned
        if not fi_list or not isinstance(fi_list, list):
            return pd.DataFrame()

        df = pd.DataFrame(fi_list)
        if df.empty:
            return pd.DataFrame()

        df["ticker"] = ticker
        if "ReportDate" in df.columns:
            df["ReportDate"] = pd.to_datetime(df["ReportDate"])
        else:
            # If ReportDate is missing, create a null column to avoid concat errors
            df["ReportDate"] = pd.NaT

        return df

    except Exception as e:
        print(f"⚠️ Error fetching FA for {ticker}: {e}")
        return pd.DataFrame()


# --- Filter the list: keep only tickers that have FA data ---
fa_list = []
valid_tickers = []

for t in tickers_hose:   # HOSE list from Block 1
    df_fa = fetch_fa_quarterly(t, latest_year=2025, n_periods=32)
    if not df_fa.empty:
        fa_list.append(df_fa)
        valid_tickers.append(t)

# --- Concatenate the DataFrames ---
if fa_list:
    fa_data = pd.concat(fa_list, ignore_index=True)
else:
    fa_data = pd.DataFrame()

print(f"Initial number of HOSE tickers: {len(tickers_hose)}")
print(f"Number of tickers with FA data: {len(valid_tickers)}")
print("FA Data sample:")
print(fa_data.head())


⚠️ Error fetching FA for FUETPVND: 'FUETPVND'
Initial number of HOSE tickers: 413
Number of tickers with FA data: 391
FA Data sample:
   organizationId ticker  year  quarter  \
0          894364    CCC  2023        4   
1          894364    CCC  2024        1   
2          894364    CCC  2024        2   
3          894364    CCC  2024        3   
4          894364    CCC  2024        4   

                                              ratios ReportDate  
0  {'SolvencyRatio': {'DebtToEquityRatio': 1.5102...        NaT  
1  {'SolvencyRatio': {'DebtToEquityRatio': 0.7722...        NaT  
2  {'SolvencyRatio': {'DebtToEquityRatio': 0.7357...        NaT  
3  {'SolvencyRatio': {'DebtToEquityRatio': 0.7914...        NaT  
4  {'SolvencyRatio': {'DebtToEquityRatio': 0.6437...        NaT  
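
The sample output shows that each `ratios` cell is a nested dict of the form `{section: {field: value}}`. A small sketch of how such a dict can be flattened into one row of selected fields. The numbers and the `ProfitabilityRatio` section name are made up for illustration (only `SolvencyRatio` and `DebtToEquityRatio` are visible in the output above), and `flatten_ratios` is a hypothetical helper:

```python
def flatten_ratios(ratios, fields):
    """Return {field: value} for each requested field, or None if absent."""
    flat = {f: None for f in fields}
    if not isinstance(ratios, dict):
        return flat
    for section in ratios.values():
        if not isinstance(section, dict):
            continue
        for f in fields:
            # keep the first section that provides the field
            if f in section and flat[f] is None:
                flat[f] = section[f]
    return flat

# Illustrative payload; values and the second section name are assumptions
sample = {
    "SolvencyRatio": {"DebtToEquityRatio": 0.75},
    "ProfitabilityRatio": {"ROE": 0.12, "ROA": 0.05},
}
row = flatten_ratios(sample, ["DebtToEquityRatio", "ROE", "PriceToBook"])
print(row)
```

Fields that appear in no section stay `None`, so they become `NaN` once the rows are collected into a DataFrame.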


**Block 3: Normalize FA and merge with price data**

In [22]:
# Block 3: Normalize FA + merge with prices (HOSE only, based on Block 2)

import pandas as pd

# --- FA ratios to extract ---
fa_fields = [
    "DebtToEquityRatio","EBITMargin","ROA","ROE","ROIC",
    "BasicEPS","PriceToBook","PriceToEarning",
    "NetRevenueGrowthYoY","GrossProfitGrowthYoY"
]

# --- Flatten the nested ratios dict ---
def explode_ratios(df, fa_fields):
    records = []
    for _, row in df.iterrows():
        d = {
            "ticker": row["ticker"],
            "fa_year": int(row["year"]),
            "fa_quarter": int(row["quarter"])
        }
        ratios = row.get("ratios", {})
        if isinstance(ratios, dict):   # guard: ratios may be missing or malformed
            for f in fa_fields:
                val = None
                for section in ratios.values():
                    if isinstance(section, dict) and f in section:
                        val = section[f]
                d[f] = val
        else:
            # if ratios is not a dict, set every field to None
            for f in fa_fields:
                d[f] = None
        records.append(d)
    return pd.DataFrame(records)

# --- Normalize FA ---
fa_clean = explode_ratios(fa_data, fa_fields)

# --- Normalize prices ---
df_price = df_all[df_all["ticker"].isin(valid_tickers)].copy()
df_price["timestamp"] = pd.to_datetime(df_price["timestamp"])
df_price = df_price.sort_values(["ticker","timestamp"])

# build the join key (fa_year, fa_quarter) = previous quarter
pi = df_price["timestamp"].dt.to_period("Q")
prev_pi = pi - 1
df_price["fa_year"] = prev_pi.dt.year.astype(int)
df_price["fa_quarter"] = prev_pi.dt.quarter.astype(int)

# --- FA: keep only the last record for each quarter
fa_clean = (
    fa_clean.sort_values(["ticker","fa_year","fa_quarter"])
            .drop_duplicates(subset=["ticker","fa_year","fa_quarter"], keep="last")
)

# --- Merge prices + FA ---
df_merged = df_price.merge(
    fa_clean,
    on=["ticker","fa_year","fa_quarter"],
    how="left"
)

# Forward-fill over time within each ticker to fill the gaps
df_merged = df_merged.sort_values(["ticker","timestamp"])
df_merged[fa_fields] = df_merged.groupby("ticker")[fa_fields].ffill()

print("Sample merged:")
print(df_merged.head())
print("Number of tickers merged successfully:", df_merged["ticker"].nunique())




Sample merged:
  ticker  timestamp      open      high       low     close     volume  \
0    AAA 2023-01-03  6539.643  6866.145  6539.643  6866.145  1543984.0   
1    AAA 2023-01-04  6866.145  7000.587  6827.733  6827.733  1302505.0   
2    AAA 2023-01-05  6866.145  6904.557  6808.527  6885.351   980473.0   
3    AAA 2023-01-06  6885.351  6990.984  6818.130  6856.542  1431699.0   
4    AAA 2023-01-09  6914.160  6962.175  6760.512  6789.321  1121385.0   

         bu        sd           fs  ...  DebtToEquityRatio  EBITMargin  \
0  938600.0  504700.0   40579000.0  ...           0.507521   -0.049731   
1  462900.0  780600.0  151639000.0  ...           0.507521   -0.049731   
2  487200.0  473700.0  343911000.0  ...           0.507521   -0.049731   
3  564300.0  828300.0  345999000.0  ...           0.507521   -0.049731   
4  414000.0  631800.0  514557000.0  ...           0.507521   -0.049731   

        ROA       ROE      ROIC    BasicEPS  PriceToBook  PriceToEarning  \
0  0.014669  0.0295
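
Block 3 joins each trading day to the fundamentals of the previous quarter, so a price row never sees figures for a quarter that has not ended yet. A minimal sketch of that key construction on three illustrative dates:

```python
import pandas as pd

dates = pd.to_datetime(["2023-01-03", "2023-04-03", "2024-02-15"])
prices = pd.DataFrame({"timestamp": dates})

# Quarter containing each trading day, shifted back by one quarter
prev_q = prices["timestamp"].dt.to_period("Q") - 1
prices["fa_year"] = prev_q.dt.year.astype(int)
prices["fa_quarter"] = prev_q.dt.quarter.astype(int)

print(prices)
```

Reporting lag is not modelled here: a quarter's figures are treated as available from the first trading day of the following quarter.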

**Delete the df_all variable, which is no longer needed, to reduce RAM usage**

**Block 4: Compute TA indicators with the FiinQuant library and join the data**

In [23]:
# Block 4: Compute TA indicators + Regime (on df_merged from Block 3)

import pandas as pd
import numpy as np

# --- Initialize the indicator helper ---
fi = client.FiinIndicator()

# --- Compute TA for each ticker ---
def add_ta_indicators(df):
    df = df.sort_values("timestamp").copy()
    df = df.reset_index(drop = True)
    # EMA
    df['ema_5']  = fi.ema(df['close'], window=5)
    df['ema_20'] = fi.ema(df['close'], window=20)
    df['ema_50'] = fi.ema(df['close'], window=50)

    # MACD
    df['macd']        = fi.macd(df['close'], window_fast=12, window_slow=26)
    df['macd_signal'] = fi.macd_signal(df['close'], window_fast=12, window_slow=26, window_sign=9)
    df['macd_diff']   = fi.macd_diff(df['close'], window_fast=12, window_slow=26, window_sign=9)

    # RSI
    df['rsi'] = fi.rsi(df['close'], window=14)

    # Bollinger Bands
    df['bollinger_hband'] = fi.bollinger_hband(df['close'], window=20, window_dev=2)
    df['bollinger_lband'] = fi.bollinger_lband(df['close'], window=20, window_dev=2)

    # ATR
    df['atr'] = fi.atr(df['high'], df['low'], df['close'], window=14)

    # OBV
    df['obv'] = fi.obv(df['close'], df['volume'])

    # VWAP
    df['vwap'] = fi.vwap(df['high'], df['low'], df['close'], df['volume'], window=14)

    # ----------------- Indicators for Regime -----------------
    df['ma9']  = df['close'].rolling(window=9).mean()
    df['ma21'] = df['close'].rolling(window=21).mean()
    df['adx']  = fi.adx(df['high'], df['low'], df['close'], window=14)
    df['ret_3d'] = df['close'].pct_change(periods=3) * 100

    # Regime: bull/bear/sideway
    cond_bull = (df['ma9'] > df['ma21']) & (df['adx'] > 20)
    cond_bear = ((df['ma9'] < df['ma21']) & (df['adx'] > 20)) | (df['ret_3d'] < -7)
    df['regime'] = np.where(cond_bull, 'bull', np.where(cond_bear, 'bear', 'sideway'))

    return df

# --- Apply to the whole df_merged ---
df_with_ta = df_merged.groupby("ticker", group_keys=False).apply(add_ta_indicators)

print("Sample with TA + Regime:")
print(df_with_ta[['ticker','timestamp','close','ma9','ma21','adx','ret_3d','regime']].head())
print("Shape after adding TA:", df_with_ta.shape)

Sample with TA + Regime:
  ticker  timestamp     close  ma9  ma21  adx    ret_3d   regime
0    AAA 2023-01-03  6866.145  NaN   NaN  NaN       NaN  sideway
1    AAA 2023-01-04  6827.733  NaN   NaN  NaN       NaN  sideway
2    AAA 2023-01-05  6885.351  NaN   NaN  NaN       NaN  sideway
3    AAA 2023-01-06  6856.542  NaN   NaN  NaN -0.139860  sideway
4    AAA 2023-01-09  6789.321  NaN   NaN  NaN -0.562588  sideway
Shape after adding TA: (263580, 40)


In [24]:
# --- Save Block 4 results ---
#df_with_ta.to_parquet("df_with_ta.parquet", index=False)
df_with_ta.to_csv("df_with_ta.csv", index=False)

# --- Thống kê phân phối regime ---
print("=== Regime distribution (overall) ===")
print(df_with_ta['regime'].value_counts(normalize=False))
print(df_with_ta['regime'].value_counts(normalize=True).mul(100).round(2))

print("\n=== Regime by ticker (top 5) ===")
print(df_with_ta.groupby("ticker")['regime'].value_counts().unstack(fill_value=0).head())

=== Regime distribution (overall) ===
regime
sideway    98723
bull       90301
bear       74556
Name: count, dtype: int64
regime
sideway    37.45
bull       34.26
bear       28.29
Name: proportion, dtype: float64

=== Regime by ticker (top 5) ===
regime  bear  bull  sideway
ticker                     
AAA      210   264      207
AAM      218   104      359
AAT      235   194      252
ABR      225   201      255
ABS      224   166      291
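
The regime rule in Block 4 can be exercised on a few hand-made rows. The sketch below reuses the same thresholds (ADX above 20, 3-day return below -7%) and shows that rows whose indicators are still `NaN` during the warm-up window fall through to `sideway`, which is why the first rows of the Block 4 sample carry that label. The input values are invented for illustration, and `label_regime` is a hypothetical helper:

```python
import numpy as np
import pandas as pd

def label_regime(df: pd.DataFrame) -> pd.Series:
    """Label each row bull / bear / sideway from ma9, ma21, adx and ret_3d."""
    bull = (df["ma9"] > df["ma21"]) & (df["adx"] > 20)
    bear = ((df["ma9"] < df["ma21"]) & (df["adx"] > 20)) | (df["ret_3d"] < -7)
    # Comparisons against NaN are False, so warm-up rows end up as "sideway"
    labels = np.where(bull, "bull", np.where(bear, "bear", "sideway"))
    return pd.Series(labels, index=df.index)

toy = pd.DataFrame({
    "ma9":    [np.nan, 105.0, 95.0, 101.0, 101.0],
    "ma21":   [np.nan, 100.0, 100.0, 100.0, 100.0],
    "adx":    [np.nan, 25.0, 30.0, 15.0, 15.0],
    "ret_3d": [np.nan, 1.0, -2.0, 0.5, -8.0],
})
toy["regime"] = label_regime(toy)
print(toy)
```

The last row is labelled `bear` even though `ma9 > ma21`, because the sharp 3-day drop alone satisfies the bear condition.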


**Delete the df_merged variable, which is no longer needed, to reduce RAM usage**

In [5]:
import gc
del df_merged
gc.collect()

1802

**Block 5: Normalize FA and TA data**

In [37]:
# Block 5 — Feature engineering & scaling (with regime as feature)

import numpy as np
import pandas as pd
import json
from datetime import datetime

# ---------- Config ----------
SAVE_SNAPSHOT = True
SNAPSHOT_PATH = f"df_features_{datetime.now().strftime('%Y%m%d_%H%M%S')}.parquet"

SAVE_GLOBAL_SCALER = False
GLOBAL_SCALER_PATH = "global_fa_scaler.json"

# ---------- Feature lists ----------
fa_features = [
    "DebtToEquityRatio","EBITMargin","ROA","ROE","ROIC",
    "BasicEPS","PriceToBook","PriceToEarning",
    "NetRevenueGrowthYoY","GrossProfitGrowthYoY"
]

ta_features = [
    "ema_5","ema_20","ema_50","macd","macd_signal","macd_diff",
    "rsi","bollinger_hband","bollinger_lband","atr","obv","vwap"
]

# ---------- FA: cross-section min-max per day ----------
def scale_fa_minmax(df):
    df_scaled = df.copy()
    for f in fa_features:
        vals = pd.to_numeric(df[f], errors="coerce")
        vmin, vmax = vals.min(), vals.max()
        if np.isfinite(vmin) and np.isfinite(vmax) and vmax > vmin:
            df_scaled[f] = (vals - vmin) / (vmax - vmin)
        else:
            df_scaled[f] = np.nan
    return df_scaled

df_scaled_fa = df_with_ta.groupby("timestamp", group_keys=False).apply(scale_fa_minmax)

# ---------- TA: rolling z-score per ticker ----------
def zscore_rolling(series, window=60):
    return (series - series.rolling(window).mean()) / series.rolling(window).std()

df_scaled = df_scaled_fa.groupby("ticker", group_keys=False).apply(
    lambda g: g.assign(**{f"{col}_z": zscore_rolling(g[col], 60) for col in ta_features})
)

# ---------- Encode regime (one-hot) ----------
# bull=1, bear=0, sideway=0 -> regime_bull
# bull=0, bear=1, sideway=0 -> regime_bear
# bull=0, bear=0, sideway=1 -> regime_sideway
regime_dummies = pd.get_dummies(df_scaled["regime"], prefix="regime")
df_scaled = pd.concat([df_scaled, regime_dummies], axis=1)

regime_features = list(regime_dummies.columns)  # ["regime_bear","regime_bull","regime_sideway"]

# ---------- Keep features ----------
keep_cols = ["ticker","timestamp"] + fa_features + [f"{col}_z" for col in ta_features] + regime_features

df_features = df_scaled[keep_cols].dropna().reset_index(drop=True)

# ---------- Save snapshot ----------
if SAVE_SNAPSHOT:
    try:
        df_features.to_parquet(SNAPSHOT_PATH, index=False)
        print(f"Saved features snapshot -> {SNAPSHOT_PATH}")
    except Exception as e:
        print("Warning: cannot save snapshot:", e)

# ---------- Optionally persist a global FA scaler ----------
if SAVE_GLOBAL_SCALER:
    global_scaler = {}
    for f in fa_features:
        vals = pd.to_numeric(df_with_ta[f], errors="coerce")
        vmin, vmax = float(vals.min()) if vals.notna().any() else None, float(vals.max()) if vals.notna().any() else None
        global_scaler[f] = {"vmin": vmin, "vmax": vmax}
    try:
        with open(GLOBAL_SCALER_PATH, "w", encoding="utf-8") as fh:
            json.dump(global_scaler, fh, indent=2, ensure_ascii=False)
        print(f"Saved global FA scaler -> {GLOBAL_SCALER_PATH}")
    except Exception as e:
        print("Warning: cannot save global scaler:", e)

# ---------- Diagnostics ----------
print("Sample features (head):")
print(df_features.head())
print("Shape after scaling & dropna:", df_features.shape)
try:
    orig_rows = len(df_with_ta)
    kept_rows = len(df_features)
    print(f"Original rows: {orig_rows}, Kept after dropna: {kept_rows} (kept {kept_rows/orig_rows:.2%})")
except Exception:
    pass

Saved features snapshot -> df_features_20250930_165456.parquet
Sample features (head):
  ticker  timestamp  DebtToEquityRatio  EBITMargin       ROA       ROE  \
0    AAA 2023-06-14           0.559115    0.978956  0.368731  0.415728   
1    AAT 2023-06-14           0.511748    0.980206  0.463952  0.482511   
2    ABS 2023-06-14           0.572974    0.977818  0.380350  0.424487   
3    ABT 2023-06-14           0.505407    0.980548  0.482215  0.495237   
4    ACC 2023-06-14           0.592866    0.980726  0.403270  0.446116   

       ROIC  BasicEPS  PriceToBook  PriceToEarning  ...  macd_diff_z  \
0  0.753300  0.158046     0.377640        0.537353  ...    -1.164222   
1  0.778758  0.145325     0.349144        0.521837  ...     2.231773   
2  0.757676  0.156000     0.357324        0.528204  ...     1.191728   
3  0.776716  0.211753     0.372187        0.522248  ...     1.217966   
4  0.758087  0.153041     0.415297        0.528052  ...    -0.111005   

      rsi_z  bollinger_hband_z  bol
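
Block 5 applies two different scalers: a cross-sectional min-max across all tickers on the same day for the FA fields, and a rolling z-score along each ticker's own history for the TA fields. A minimal sketch on toy data, with a window of 3 instead of 60 to keep it short; the column name `x` and the values are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "ticker":    ["A", "B", "C", "A", "B", "C", "A", "B", "C"],
    "timestamp": pd.to_datetime(["2024-01-01"] * 3 + ["2024-01-02"] * 3 + ["2024-01-03"] * 3),
    "x":         [1.0, 2.0, 3.0, 2.0, 4.0, 6.0, 6.0, 3.0, 0.0],
})

# Cross-sectional min-max: compare tickers with each other on the same day
day = df.groupby("timestamp")["x"]
day_min, day_max = day.transform("min"), day.transform("max")
df["x_minmax"] = (df["x"] - day_min) / (day_max - day_min)

# Rolling z-score: compare each ticker with its own recent history
def zscore_rolling(s: pd.Series, window: int) -> pd.Series:
    return (s - s.rolling(window).mean()) / s.rolling(window).std()

df["x_z"] = df.groupby("ticker")["x"].transform(lambda s: zscore_rolling(s, 3))
print(df)
```

The first `window - 1` rows of each ticker have no z-score, which is one reason the `dropna()` in Block 5 removes the earliest part of every ticker's history.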

**Delete the df_scaled and df_scaled_fa variables, which are no longer needed (df_with_ta is kept because Block 7.5 still reads it)**

In [12]:
del df_scaled, df_scaled_fa
gc.collect()


0

**Block 6: Dimensionality reduction with t-SNE and clustering with DBSCAN**

In [38]:
# Block 6: Dimensionality reduction & clustering (t-SNE + DBSCAN)

from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN

# --- Select the feature columns used for clustering ---
feature_cols = [
    "DebtToEquityRatio","EBITMargin","ROA","ROE","ROIC",
    "BasicEPS","PriceToBook","PriceToEarning",
    "NetRevenueGrowthYoY","GrossProfitGrowthYoY"
] + [c for c in df_features.columns if c.endswith("_z")]

# --- Add a month column for monthly snapshots ---
df_features["month"] = df_features["timestamp"].dt.to_period("M")

cluster_results = []

for (month, g) in df_features.groupby("month"):
    if len(g) < 10:   # skip months with too few rows
        continue

    X = g[feature_cols].values

    # --- t-SNE: reduce to 2D ---
    # scikit-learn requires perplexity < n_samples, so cap it for small groups
    tsne = TSNE(n_components=2, perplexity=min(30, len(g) - 1), learning_rate="auto", init="random", random_state=42)
    X_emb = tsne.fit_transform(X)

    # --- DBSCAN clustering ---
    db = DBSCAN(eps=0.5, min_samples=5).fit(X_emb)
    labels = db.labels_

    temp = g[["ticker","timestamp"]].copy()
    temp["cluster"] = labels
    temp["tsne_x"] = X_emb[:,0]
    temp["tsne_y"] = X_emb[:,1]
    temp["month"]  = str(month)

    cluster_results.append(temp)

df_clusters = pd.concat(cluster_results, ignore_index=True)

print("Cluster sample:")
print(df_clusters.head())
print("Number of clusters per month:")
print(df_clusters.groupby("month")["cluster"].nunique())


Cluster sample:
  ticker  timestamp  cluster     tsne_x     tsne_y    month
0    AAA 2023-06-14       -1  11.322961 -39.312492  2023-06
1    AAT 2023-06-14       -1 -52.104614  35.878654  2023-06
2    ABS 2023-06-14       -1 -53.525627  12.820326  2023-06
3    ABT 2023-06-14       -1  43.397667  56.436115  2023-06
4    ACC 2023-06-14       -1  68.141510   5.876695  2023-06
Number of clusters per month:
month
2023-06     22
2023-07     74
2023-08     77
2023-09     41
2023-10     69
2023-11     54
2023-12     80
2024-01     84
2024-02     32
2024-03     70
2024-04     45
2024-05     65
2024-06     84
2024-07     97
2024-08     68
2024-09     81
2024-10    112
2024-11     84
2024-12     67
2025-01     41
2025-02     71
2025-03     93
2025-04     50
2025-05     74
2025-06     96
2025-07    103
2025-08     83
2025-09     65
Name: cluster, dtype: int64
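
In the cluster sample above, all five listed points carry the label `-1` (DBSCAN noise), and the t-SNE coordinates span tens of units while `eps` is 0.5. One common way to sanity-check `eps` is to look at the distance from each point to its k-th nearest neighbour and compare its typical size with `eps`. Since scikit-learn counts the point itself toward `min_samples`, k = `min_samples - 1` matches the core-point rule. A NumPy-only sketch on synthetic 2-D points; the data and the helper `kth_neighbor_distance` are illustrative and not part of the notebook:

```python
import numpy as np

def kth_neighbor_distance(points: np.ndarray, k: int) -> np.ndarray:
    """Distance from each point to its k-th nearest neighbour (excluding itself)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # (n, n) pairwise distances
    dist.sort(axis=1)                          # column 0 is the point itself (0.0)
    return dist[:, k]

MIN_SAMPLES = 5
EPS = 0.5

rng = np.random.default_rng(42)
# Synthetic stand-in for a t-SNE embedding: two blobs on a scale of tens of units
emb = np.vstack([
    rng.normal(loc=(-40.0, 0.0), scale=5.0, size=(100, 2)),
    rng.normal(loc=(40.0, 0.0), scale=5.0, size=(100, 2)),
])

kdist = kth_neighbor_distance(emb, k=MIN_SAMPLES - 1)
print("median k-NN distance:", float(np.median(kdist)))
print("share of points that would be core points at eps:", float((kdist <= EPS).mean()))
```

If the median k-distance is far above `eps`, most points cannot be core points; the usual remedies are a larger `eps` or rescaling the embedding before clustering.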


**Block 7: Build tensors (cluster mapping) and masks (active stocks)**

In [39]:
# Block 7 — Tensors & Masks (updated, include regime one-hot)

import numpy as np
import pandas as pd
import os, gc, json

LOOKBACK = 64   # window size
DATA_DIR = "./tensors/"
os.makedirs(DATA_DIR, exist_ok=True)

# --- Feature columns: keep everything except the meta columns ---
feature_cols = [c for c in df_features.columns 
                if c not in ["ticker","timestamp","cluster","month"]]

tensor_index = []

for c_id, g in df_clusters.groupby("cluster"):
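    if c_id == -1:   # skip noise points
        continue

    tickers = sorted(g["ticker"].unique())
    g_feat = df_features[df_features["ticker"].isin(tickers)].copy()

    # Pivot: index = timestamp, columns = (feature, ticker)
    pivoted = g_feat.pivot(index="timestamp", columns="ticker", values=feature_cols)
    # pivot() orders columns as (feature, ticker); reorder them to (ticker, feature)
    # so that reshape(T, N, F) below puts each value in the right slot
    pivoted = pivoted.swaplevel(0, 1, axis=1).reindex(
        columns=pd.MultiIndex.from_product([tickers, feature_cols])
    )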

    # Mask
    mask_df = ~pivoted.isna()
    pivoted_filled = pivoted.ffill().bfill()

    T, N, F = len(pivoted_filled.index), len(tickers), len(feature_cols)
    X = pivoted_filled.values.reshape(T, N, F)
    M = mask_df.values.reshape(T, N, F).astype(np.int8)

    cluster_tensors, cluster_masks, cluster_dates = [], [], []
    for i in range(LOOKBACK, T):
        cluster_tensors.append(X[i-LOOKBACK:i])
        cluster_masks.append(M[i-LOOKBACK:i])
        cluster_dates.append(pivoted_filled.index[i])  # decision date: the row right after the window

    if cluster_tensors:
        X_arr = np.array(cluster_tensors, dtype=np.float16)  # float16 to save RAM
        M_arr = np.array(cluster_masks, dtype=np.int8)

        tensor_file = f"cluster_{c_id}_tensor.npy"
        mask_file   = f"cluster_{c_id}_mask.npy"
        np.save(os.path.join(DATA_DIR, tensor_file), X_arr)
        np.save(os.path.join(DATA_DIR, mask_file), M_arr)

        tensor_index.append({
            "cluster": int(c_id),
            "tickers": tickers,
            "dates": [str(d) for d in cluster_dates],   # day T
            "dates_shifted": [str(d+pd.Timedelta(days=2)) for d in cluster_dates],  # T + 2 calendar days (reward alignment; see Block 8)
            "tensor_file": tensor_file,
            "mask_file": mask_file,
            "n_features": F
        })

        print(f"Cluster {c_id}: tensor {X_arr.shape}, mask {M_arr.shape} saved.")

    del g_feat, pivoted, pivoted_filled, mask_df, X, M, cluster_tensors, cluster_masks
    gc.collect()

# Save metadata
with open(os.path.join(DATA_DIR, "tensor_index.json"), "w") as f:
    json.dump(tensor_index, f, indent=2)

print("✅ Done Block 7: tensors + masks saved for all clusters.")


Cluster 0: tensor (509, 64, 23, 25), mask (509, 64, 23, 25) saved.
Cluster 1: tensor (509, 64, 28, 25), mask (509, 64, 28, 25) saved.
Cluster 2: tensor (509, 64, 26, 25), mask (509, 64, 26, 25) saved.
Cluster 3: tensor (509, 64, 33, 25), mask (509, 64, 33, 25) saved.
Cluster 4: tensor (509, 64, 31, 25), mask (509, 64, 31, 25) saved.
Cluster 5: tensor (509, 64, 30, 25), mask (509, 64, 30, 25) saved.
Cluster 6: tensor (509, 64, 30, 25), mask (509, 64, 30, 25) saved.
Cluster 7: tensor (509, 64, 26, 25), mask (509, 64, 26, 25) saved.
Cluster 8: tensor (509, 64, 27, 25), mask (509, 64, 27, 25) saved.
Cluster 9: tensor (509, 64, 29, 25), mask (509, 64, 29, 25) saved.
Cluster 10: tensor (509, 64, 29, 25), mask (509, 64, 29, 25) saved.
Cluster 11: tensor (509, 64, 30, 25), mask (509, 64, 30, 25) saved.
Cluster 12: tensor (509, 64, 26, 25), mask (509, 64, 26, 25) saved.
Cluster 13: tensor (509, 64, 28, 25), mask (509, 64, 28, 25) saved.
Cluster 14: tensor (509, 64, 31, 25), mask (509, 64, 31, 2
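
Block 7 turns a long feature table into an array of shape `(T, N, F)` (dates, tickers, features) and then slices it into overlapping lookback windows. The sketch below shows one way to do this on toy data. `DataFrame.pivot` with a list of `values` returns columns ordered as `(feature, ticker)`, so the sketch reorders them to `(ticker, feature)` before reshaping. All names and values are illustrative:

```python
import numpy as np
import pandas as pd

tickers = ["A", "B"]
features = ["f1", "f2", "f3"]
dates = pd.date_range("2024-01-01", periods=5, freq="D")

# Long table: each value encodes its position as date*100 + ticker*10 + feature
rows = [
    {"timestamp": d, "ticker": t,
     **{f: di * 100 + ni * 10 + ki for ki, f in enumerate(features)}}
    for di, d in enumerate(dates)
    for ni, t in enumerate(tickers)
]
long_df = pd.DataFrame(rows)

wide = long_df.pivot(index="timestamp", columns="ticker", values=features)
# Reorder columns from (feature, ticker) to (ticker, feature)
wide = wide.swaplevel(0, 1, axis=1).reindex(
    columns=pd.MultiIndex.from_product([tickers, features])
)

T, N, F = len(wide), len(tickers), len(features)
X = wide.to_numpy().reshape(T, N, F)

LOOKBACK = 3
# Each window covers rows i-LOOKBACK .. i-1 and is paired with decision date i
windows = np.stack([X[i - LOOKBACK:i] for i in range(LOOKBACK, T)])
print(X[1, 1, 2], windows.shape)
```

Because every value encodes its own position, a wrong column order shows up immediately: `X[d, n, k]` should equal `d*100 + n*10 + k`.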

**Block 7.5: Prepare backtest data (drop columns that are no longer needed)**

In [55]:
# Block 7.5: Prepare backtest data for the reward (dynamic TP/SL + regime flip + fees)

import gc
import numpy as np
import pandas as pd

# --- Configuration parameters ---
COST_BPS = 30  # round-trip fee (0.3%)
ATR_MULT = 1.0 # ATR multiplier used to widen the SL
TP_SL_RULES = {
    "bull":    {"tp": 0.06,  "sl": -0.04},
    "sideway":{"tp": 0.035, "sl": -0.02},
    "bear":   {"tp": 0.02,  "sl": -0.015},
}

# --- Prepare the base price data ---
df_backtest = df_price[["ticker", "timestamp", "close", "high", "low"]].copy()
df_backtest["timestamp"] = pd.to_datetime(df_backtest["timestamp"])

# --- Merge regime + ATR from df_with_ta ---
df_ta = df_with_ta[["ticker","timestamp","regime","atr"]].copy()
df_ta["timestamp"] = pd.to_datetime(df_ta["timestamp"])
df_backtest = pd.merge(df_backtest, df_ta, on=["ticker","timestamp"], how="left")

# --- Compute the exit for each ticker ---
def compute_exit(df):
    df = df.sort_values("timestamp").reset_index(drop=True)

    exit_price, exit_date = [], []
    realized_return, exit_type = [], []
    tp_level_list, sl_level_list = [], []
    entry_regime_list, exit_regime_list = [], []
    horizon_list, fee_list = [], []

    for i in range(len(df)):
        entry_price = df.loc[i, "close"]
        entry_date  = df.loc[i, "timestamp"]
        entry_regime= df.loc[i, "regime"]
        atr_val     = df.loc[i, "atr"]

        if pd.isna(entry_price) or pd.isna(entry_regime):
            exit_price.append(np.nan); exit_date.append(pd.NaT)
            realized_return.append(np.nan); exit_type.append("no_data")
            tp_level_list.append(np.nan); sl_level_list.append(np.nan)
            entry_regime_list.append(entry_regime); exit_regime_list.append(np.nan)
            horizon_list.append(np.nan); fee_list.append(np.nan)
            continue

        # --- Set TP/SL by regime + ATR ---
        rule = TP_SL_RULES.get(entry_regime, {"tp":0.03,"sl":-0.02})
        tp_pct, sl_pct = rule["tp"], rule["sl"]

        # widen the SL using ATR (as a fraction of the entry price); skip if ATR is NaN
        if pd.notna(atr_val) and entry_price > 0:
            atr_pct = atr_val / entry_price
            sl_pct = min(sl_pct, -ATR_MULT * atr_pct)

        tp_level = entry_price * (1 + tp_pct)
        sl_level = entry_price * (1 + sl_pct)

        tp_level_list.append(tp_level)
        sl_level_list.append(sl_level)
        entry_regime_list.append(entry_regime)

        # --- Check T+1 and T+2 ---
        future = df.loc[i+1:i+2].copy()
        if future.empty:
            exit_price.append(np.nan); exit_date.append(pd.NaT)
            realized_return.append(np.nan); exit_type.append("censored")
            exit_regime_list.append(np.nan); horizon_list.append(np.nan); fee_list.append(np.nan)
            continue

        decided = False
        for j in future.index:
            px_high, px_low = future.loc[j, "high"], future.loc[j, "low"]
            reg_j   = future.loc[j, "regime"]
            date_j  = future.loc[j, "timestamp"]

            # 1. Regime flips to bear -> exit at that day's close
            if reg_j == "bear" and entry_regime != "bear":
                px, dt, etype = future.loc[j, "close"], date_j, "regime_flip"
                decided = True; break

            # 2. SL takes precedence
            if px_low <= sl_level:
                px, dt, etype = sl_level, date_j, "sl"
                decided = True; break

            # 3. Then TP
            if px_high >= tp_level:
                px, dt, etype = tp_level, date_j, "tp"
                decided = True; break

        # 4. No exit within 2 days -> exit at the T+2 close
        if not decided:
            if i+2 < len(df):
                px, dt, etype = df.loc[i+2, "close"], df.loc[i+2, "timestamp"], "time_limit"
            else:
                px, dt, etype = np.nan, pd.NaT, "censored"

        exit_price.append(px); exit_date.append(dt); exit_type.append(etype)
        exit_regime_list.append(reg_j if "reg_j" in locals() else np.nan)

        # --- Realized (log) return net of fees ---
        if pd.notna(px) and entry_price>0:
            gross_ret = np.log(px/entry_price)
            fee = COST_BPS/1e4   # round-trip fee
            realized_return.append(gross_ret - fee)
            fee_list.append(fee)
        else:
            realized_return.append(np.nan); fee_list.append(np.nan)

        horizon = (dt - entry_date).days if pd.notna(dt) else np.nan
        horizon_list.append(horizon)

    df["exit_price"] = exit_price
    df["exit_date"] = exit_date
    df["realized_return"] = realized_return
    df["exit_type"] = exit_type
    df["tp_level"] = tp_level_list
    df["sl_level"] = sl_level_list
    df["entry_regime"] = entry_regime_list
    df["exit_regime"] = exit_regime_list
    df["horizon_days"] = horizon_list
    df["fee"] = fee_list
    return df

# --- Apply to every ticker ---
df_backtest = df_backtest.groupby("ticker", group_keys=False).apply(compute_exit)

# --- One-hot regime at entry ---
df_regime_dummies = pd.get_dummies(df_backtest["entry_regime"], prefix="regime")
df_backtest = pd.concat([df_backtest, df_regime_dummies], axis=1)

print("✅ Done Block 7.5 (improved, ATR from df_with_ta): df_backtest is ready.")
print("Shape:", df_backtest.shape)
print("Sample:")
print(df_backtest.head(10))

✅ Done Block 7.5 (improved, ATR from df_with_ta): df_backtest is ready.
Shape: (263580, 20)
Sample:
  ticker  timestamp     close      high       low   regime  atr  exit_price  \
0    AAA 2023-01-03  6866.145  6866.145  6539.643  sideway  NaN  6885.35100   
1    AAA 2023-01-04  6827.733  7000.587  6827.733  sideway  NaN  6856.54200   
2    AAA 2023-01-05  6885.351  6904.557  6808.527  sideway  NaN  6789.32100   
3    AAA 2023-01-06  6856.542  6990.984  6818.130  sideway  NaN  6719.41116   
4    AAA 2023-01-09  6789.321  6962.175  6760.512  sideway  NaN  6904.55700   
5    AAA 2023-01-10  6789.321  6885.351  6693.291  sideway  NaN  6904.55700   
6    AAA 2023-01-11  6904.557  6962.175  6818.130  sideway  NaN  6837.33600   
7    AAA 2023-01-12  6904.557  6962.175  6866.145  sideway  NaN  6875.74800   
8    AAA 2023-01-13  6837.336  6990.984  6837.336  sideway  NaN  7076.64276   
9    AAA 2023-01-16  6875.748  6904.557  6818.130  sideway  NaN  7116.39918   

   exit_date  realized_retu
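
The exit rule in Block 7.5 can be traced by hand on a single trade. The pure-Python sketch below applies the same precedence (stop-loss before take-profit, then a time-limit exit at the T+2 close) and the same 30 bps round-trip fee on log returns. Regime flips and the ATR adjustment are left out to keep it short, and the function name `simulate_exit` is illustrative. The prices are the AAA rows for 2023-01-06 to 2023-01-10 from the sample output:

```python
import math

COST_BPS = 30  # round-trip fee in basis points, as in Block 7.5

def simulate_exit(entry, future_bars, tp_pct, sl_pct):
    """Return (exit_price, exit_type, net_log_return) for one long trade.

    future_bars: list of (high, low, close) tuples for T+1 and T+2.
    """
    tp_level = entry * (1 + tp_pct)
    sl_level = entry * (1 + sl_pct)
    # Default: no level is touched, exit at the last bar's close
    exit_price, exit_type = future_bars[-1][2], "time_limit"
    for high, low, _close in future_bars:
        if low <= sl_level:          # stop-loss is checked first
            exit_price, exit_type = sl_level, "sl"
            break
        if high >= tp_level:
            exit_price, exit_type = tp_level, "tp"
            break
    net = math.log(exit_price / entry) - COST_BPS / 1e4
    return exit_price, exit_type, net

entry_close = 6856.542                       # AAA close on 2023-01-06
bars = [(6962.175, 6760.512, 6789.321),      # 2023-01-09
        (6885.351, 6693.291, 6789.321)]      # 2023-01-10
price, kind, net = simulate_exit(entry_close, bars, tp_pct=0.035, sl_pct=-0.02)
print(kind, round(price, 5), round(net, 5))
```

With the `sideway` thresholds (TP 3.5%, SL -2%) the stop level is 6856.542 × 0.98 = 6719.41116. The low on 2023-01-10 falls below it, so the trade exits at the stop, which matches the `exit_price` shown for that row in the sample output.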

**Block 8: Train A3C per cluster**

In [56]:
# Block 8 — Train A3C multi-stock per-cluster (Final, regime-aware, clipped rewards)

import os, gc, json
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim

DATA_DIR  = "./tensors/"
MODEL_DIR = "./models/"
os.makedirs(MODEL_DIR, exist_ok=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# --- Hyperparams (tune as needed) ---
EPOCHS_DEFAULT = 3
LR_DEFAULT     = 1e-3
BATCH_SIZE_DEFAULT = 256
REWARD_CLIP = 0.1      # clip rewards to [-REWARD_CLIP, +REWARD_CLIP]
GRAD_CLIP = 1.0
BETA_ENTROPY = 0.01

# Load tensor metadata
with open(os.path.join(DATA_DIR, "tensor_index.json"), "r") as f:
    tensor_index = json.load(f)

# df_backtest must exist (produced in Block 7.5)
# It must contain: ticker, timestamp, realized_return and regime one-hot columns if available
if "df_backtest" not in globals():
    # try to load from CSV if user saved it
    fb = os.path.join("./backtest_ddpg/", "df_backtest.csv")
    if os.path.exists(fb):
        df_backtest = pd.read_csv(fb, parse_dates=["timestamp"])
    else:
        raise RuntimeError("df_backtest not found in workspace. Run Block 7.5 first.")

# Ensure timestamp dtype
df_backtest["timestamp"] = pd.to_datetime(df_backtest["timestamp"])

# --- Model ---
class A3CNet(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.actor = nn.Linear(hidden, 3)   # short, flat, long
        self.critic = nn.Linear(hidden, 1)
    def forward(self, x):
        # x: (B, T, F)
        out, _ = self.lstm(x)
        h = out[:, -1, :]   # use last timestep hidden
        return self.actor(h), self.critic(h).squeeze(-1)

# --- Loss (A3C-style with entropy) ---
def a3c_loss(logits, values, actions, rewards, beta=BETA_ENTROPY):
    # logits: (B, n_actions), values: (B,), actions: (B,), rewards: (B,)
    adv = rewards - values
    critic = adv.pow(2).mean()
    logp = torch.log_softmax(logits, dim=-1)
    selected_logp = logp.gather(1, actions.unsqueeze(1)).squeeze(1)
    actor_loss = -(selected_logp * adv.detach()).mean()
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * logp).sum(-1).mean()
    return actor_loss + 0.5 * critic - beta * entropy

# --- Utility: build maps for rewards and regime vectors ---
def build_lookup_maps(tickers, df_backtest_local):
    # rewards_map[(ticker, timestamp)] -> float realized_return
    # regime_map[(ticker, timestamp)] -> np.array([rb, rbu, rs]) (order matches one-hot columns if present)
    rewards_map = {}
    regime_map = {}
    has_regime_cols = all(c in df_backtest_local.columns for c in ["regime_bear","regime_bull","regime_sideway"])
    sub = df_backtest_local[df_backtest_local["ticker"].isin(tickers)].sort_values(["ticker","timestamp"])
    for _, row in sub.iterrows():
        tk = row["ticker"]
        ts = pd.to_datetime(row["timestamp"])
        r = row.get("realized_return", np.nan)
        if pd.notna(r):
            rewards_map[(tk, ts)] = float(r)
        else:
            # leave absent -> later impute zero
            pass
        if has_regime_cols:
            try:
                rv = np.array([
                    float(row.get("regime_bear", 0.0)),
                    float(row.get("regime_bull", 0.0)),
                    float(row.get("regime_sideway", 0.0))
                ], dtype=np.float32)
            except Exception:
                rv = np.array([0.0,0.0,0.0], dtype=np.float32)
            regime_map[(tk, ts)] = rv
    return rewards_map, regime_map

# --- Training per cluster ---
def process_cluster(meta, epochs=EPOCHS_DEFAULT, lr=LR_DEFAULT, batch_size=BATCH_SIZE_DEFAULT, reward_clip=REWARD_CLIP):
    c_id = int(meta["cluster"])
    tickers = meta["tickers"]
    dates = [pd.to_datetime(d) for d in meta["dates"]]            # dates are strings in metadata
    dates_shifted = [pd.to_datetime(d) for d in meta.get("dates_shifted", meta["dates"])]
    tensor_file = os.path.join(DATA_DIR, meta["tensor_file"])
    mask_file   = os.path.join(DATA_DIR, meta["mask_file"])

    if not os.path.exists(tensor_file) or not os.path.exists(mask_file):
        print(f"Cluster {c_id}: tensor/mask files missing, skip.")
        return

    X = np.load(tensor_file, mmap_mode="r")
    M = np.load(mask_file, mmap_mode="r")
    if X.size == 0:
        return
    # X is expected to be 4-D: (B, T, N, F) = (windows, timesteps, tickers, features)
    if X.ndim != 4:
        raise ValueError(f"Unexpected tensor shape for {tensor_file}: {X.shape}")

    B, T, N, F = X.shape
    print(f"[Cluster {c_id}] loaded tensor {tensor_file} -> (B,T,N,F)=({B},{T},{N},{F})")

    # clean
    X = X.astype(np.float32)
    X = np.nan_to_num(X, nan=0.0, posinf=1e6, neginf=-1e6)
    M = M.astype(np.float32)
    # build reward/regime lookups
    rewards_map, regime_map = build_lookup_maps(tickers, df_backtest)

    # default regime vector if not found
    default_regime = np.array([0.0,0.0,0.0], dtype=np.float32)

    # We'll augment features by regime one-hot (3 dims), repeated along time axis
    extra_dim = 3
    F_aug = F + extra_dim

    # --- model & optimizer ---
    model = A3CNet(n_features=F_aug, hidden=64).to(device)
    opt = optim.Adam(model.parameters(), lr=lr)

    # generator that yields mini-batches (flatten B*N -> total samples)
    total = B * N
    def iterator():
        for start in range(0, total, batch_size):
            end = min(total, start + batch_size)
            xb_list, mb_list, rb_list, idx_list = [], [], [], []
            for s in range(start, end):
                b, n = divmod(s, N)
                entry_date = dates[b]   # datetime
                tk = tickers[n]
                reward = rewards_map.get((tk, entry_date), 0.0)    # impute missing as 0.0
                # load time-series features for this sample
                x_sample = X[b, :, n, :]    # (T, F)
                m_sample = M[b, :, n, :]    # (T, F)
                # regime vector for (tk, entry_date)
                rvec = regime_map.get((tk, entry_date), default_regime)
                # expand regime across time axis -> (T, 3)
                rmat = np.repeat(rvec.reshape(1, -1), x_sample.shape[0], axis=0).astype(np.float32)
                # concat along feature axis
                x_aug = np.concatenate([x_sample, rmat], axis=1)   # (T, F_aug)
                m_aug = np.concatenate([m_sample, np.ones_like(rmat)], axis=1)  # regime assumed present -> mask 1
                xb_list.append(x_aug)
                mb_list.append(m_aug)
                # clip reward to avoid exploding targets
                rb_list.append(float(np.clip(reward, -reward_clip, reward_clip)))
                idx_list.append((b, n))
            yield np.stack(xb_list), np.stack(mb_list), np.array(rb_list, dtype=np.float32), idx_list

    # --- Training loop ---
    model.train()
    for ep in range(epochs):
        loss_ep = 0.0
        n_batches = 0
        for xb_batch, mb_batch, rb_batch, _ in iterator():
            n_batches += 1
            # convert to torch
            xb = torch.tensor(xb_batch, dtype=torch.float32, device=device)    # (Bbatch, T, F_aug)
            mb = torch.tensor(mb_batch, dtype=torch.float32, device=device)
            rb = torch.tensor(rb_batch, dtype=torch.float32, device=device)

            # apply mask to inputs (zero out padded / missing feature slots)
            xb = xb * mb

            logits, vals = model(xb)   # logits (B,3), vals (B,)
            # sanitize
            logits = torch.nan_to_num(logits, nan=0.0, posinf=1e6, neginf=-1e6)
            vals   = torch.nan_to_num(vals, nan=0.0, posinf=1e6, neginf=-1e6)

            # sample actions via distribution for on-policy update
            dist = torch.distributions.Categorical(logits=logits)
            acts = dist.sample()             # (B,)
            # map action to signed reward: long=2 -> +r, flat=1 -> 0, short=0 -> -r
            reward_tensor = torch.where(acts == 2, rb,
                                       torch.where(acts == 0, -rb, torch.zeros_like(rb)))

            loss = a3c_loss(logits, vals, acts, reward_tensor, beta=BETA_ENTROPY)

            opt.zero_grad()
            loss.backward()
            # gradient clipping
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=GRAD_CLIP)
            opt.step()

            loss_ep += float(loss.item())

            # free memory
            del xb, mb, rb, logits, vals, dist, acts, reward_tensor, loss
            gc.collect()
        avg_loss = loss_ep / max(1, n_batches)
        print(f"[A3C] Cluster {c_id} | Epoch {ep+1}/{epochs} | AvgLoss {avg_loss:.6f} | batches {n_batches}")
        gc.collect(); torch.cuda.empty_cache()

    # --- Save model checkpoint ---
    model_path = os.path.join(MODEL_DIR, f"a3c_cluster_{c_id}.pt")
    torch.save(model.state_dict(), model_path)
    print(f"  ✅ Saved A3C checkpoint: {model_path}")

    # cleanup
    del X, M, model, opt
    gc.collect(); torch.cuda.empty_cache()

# --- Run training for all clusters ---
for meta in tensor_index:
    try:
        process_cluster(meta, epochs=EPOCHS_DEFAULT, lr=LR_DEFAULT, batch_size=BATCH_SIZE_DEFAULT, reward_clip=REWARD_CLIP)
    except Exception as e:
        print(f"Error processing cluster {meta.get('cluster')}: {e}")
        gc.collect()

print("✅ Done Block 8: A3C models trained and saved.")

[Cluster 0] loaded tensor ./tensors/cluster_0_tensor.npy -> (B,T,N,F)=(509,64,23,25)
[A3C] Cluster 0 | Epoch 1/3 | AvgLoss -0.011300 | batches 46
[A3C] Cluster 0 | Epoch 2/3 | AvgLoss -0.011976 | batches 46
[A3C] Cluster 0 | Epoch 3/3 | AvgLoss -0.011060 | batches 46
  ✅ Saved A3C checkpoint: ./models/a3c_cluster_0.pt
[Cluster 1] loaded tensor ./tensors/cluster_1_tensor.npy -> (B,T,N,F)=(509,64,28,25)
[A3C] Cluster 1 | Epoch 1/3 | AvgLoss -0.012400 | batches 56
[A3C] Cluster 1 | Epoch 2/3 | AvgLoss -0.011375 | batches 56
[A3C] Cluster 1 | Epoch 3/3 | AvgLoss -0.011279 | batches 56
  ✅ Saved A3C checkpoint: ./models/a3c_cluster_1.pt
[Cluster 2] loaded tensor ./tensors/cluster_2_tensor.npy -> (B,T,N,F)=(509,64,26,25)
[A3C] Cluster 2 | Epoch 1/3 | AvgLoss -0.011559 | batches 52
[A3C] Cluster 2 | Epoch 2/3 | AvgLoss -0.011941 | batches 52
[A3C] Cluster 2 | Epoch 3/3 | AvgLoss -0.011691 | batches 52
  ✅ Saved A3C checkpoint: ./models/a3c_cluster_2.pt
[Cluster 3] loaded tensor ./tensors/clus

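The loss and the action-to-reward mapping used in the cell above can be checked by hand on a toy batch. The sketch below restates them in NumPy purely for illustration; `a3c_loss_np`, `signed_reward` and the toy inputs are assumptions and not part of the notebook, and the entropy coefficient 0.01 is taken from the default of the later Block 8 cell (the value of `BETA_ENTROPY` is set outside this excerpt).

```python
import numpy as np

def log_softmax(logits):
    # numerically stable log-softmax over the last axis
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def signed_reward(actions, returns):
    # action 2 (long) earns +r, action 0 (short) earns -r, action 1 (flat) earns 0
    return np.where(actions == 2, returns, np.where(actions == 0, -returns, 0.0))

def a3c_loss_np(logits, values, actions, rewards, beta=0.01):
    adv = rewards - values                          # one-step advantage
    critic = np.mean(adv ** 2)                      # value regression term
    logp = log_softmax(logits)
    selected = logp[np.arange(len(actions)), actions]
    actor = -np.mean(selected * adv)                # policy-gradient term (adv treated as a constant)
    entropy = -np.mean(np.sum(np.exp(logp) * logp, axis=-1))
    return actor + 0.5 * critic - beta * entropy

logits = np.zeros((3, 3))                           # uniform policy over short/flat/long
values = np.zeros(3)
actions = np.array([0, 1, 2])
returns = np.array([0.02, 0.02, 0.02])              # toy realized returns
rewards = signed_reward(actions, returns)           # [-0.02, 0.0, 0.02]
loss = a3c_loss_np(logits, values, actions, rewards, beta=0.01)
print(rewards, loss)
```

With a uniform policy the entropy term alone contributes -0.01 · ln 3 ≈ -0.011, which is close to the logged `AvgLoss` values. If the notebook's coefficient is indeed 0.01, that would be consistent with a policy that stays near uniform, although the logs by themselves do not confirm it.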
In [32]:
# Block 8 — Train A3C multi-stock per-cluster (Final Stable)

import os, gc, json
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim

DATA_DIR  = "./tensors/"
MODEL_DIR = "./models/"
os.makedirs(MODEL_DIR, exist_ok=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load tensor metadata
with open(os.path.join(DATA_DIR, "tensor_index.json"), "r") as f:
    tensor_index = json.load(f)

# --- Model definition ---
class A3CNet(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.actor = nn.Linear(hidden, 3)   # short, flat, long
        self.critic = nn.Linear(hidden, 1)
    def forward(self, x):
        out, _ = self.lstm(x)
        h = out[:, -1, :]
        return self.actor(h), self.critic(h)

# --- Loss ---
def a3c_loss(logits, values, actions, rewards, beta=0.01):
    adv = rewards - values.squeeze(-1)
    critic = adv.pow(2).mean()
    logp = torch.log_softmax(logits, dim=-1)
    actor = -(logp.gather(1, actions.unsqueeze(1)).squeeze(1) * adv.detach()).mean()
    entropy = -(torch.softmax(logits, dim=-1) * logp).sum(-1).mean()
    return actor + 0.5*critic - beta*entropy

# --- Training per cluster ---
def process_cluster(meta, epochs=3, lr=1e-3, batch_size=256):
    c_id, tickers, dates, dates_shifted = (
        meta["cluster"], meta["tickers"], meta["dates"], meta["dates_shifted"]
    )
    X = np.load(os.path.join(DATA_DIR, meta["tensor_file"]), mmap_mode="r")
    M = np.load(os.path.join(DATA_DIR, meta["mask_file"]), mmap_mode="r")
    if X.size == 0:
        return
    B, T, N, F = X.shape
    print(f"Cluster {c_id} | X={X.shape}")

    # --- Clean tensors ---
    X = X.astype(np.float32)
    X = np.nan_to_num(X, nan=0.0, posinf=1e6, neginf=-1e6)
    M = M.astype(np.float32)
    assert not np.isnan(X).any(), "NaN still in X"
    assert not np.isinf(X).any(), "Inf still in X"

    # --- Reward lookup from df_backtest ---
    rewards_map = {}
    for tk in tickers:
        sub = df_backtest[df_backtest["ticker"] == tk].sort_values("timestamp")
        for _, row in sub.iterrows():
            entry = row["timestamp"]
            r = row["realized_return"]
            if pd.notna(r):
                rewards_map[(tk, entry)] = float(r)

    # --- Model + optimizer ---
    model = A3CNet(F).to(device)
    opt = optim.Adam(model.parameters(), lr=lr)

    # Mini-batch generator
    total = B * N
    def iterator():
        for start in range(0, total, batch_size):
            end = min(total, start+batch_size)
            xb, mb, rb, idx = [], [], [], []
            for s in range(start, end):
                b, n = divmod(s, N)
                entry_date = pd.to_datetime(dates[b])
                tk = tickers[n]
                reward = rewards_map.get((tk, entry_date), 0.0)  # impute NaN->0
                xb.append(X[b, :, n, :])
                mb.append(M[b, :, n, :])
                rb.append(reward)
                idx.append((b, n))
            yield np.stack(xb), np.stack(mb), np.array(rb, dtype=np.float32), idx

    # --- Train ---
    for ep in range(epochs):
        loss_ep = 0
        for xb, mb, rb, _ in iterator():
            xb = torch.tensor(xb, dtype=torch.float32).to(device)
            mb = torch.tensor(mb, dtype=torch.float32).to(device)
            rb = torch.tensor(rb, dtype=torch.float32).to(device)

            xb = xb * mb  # apply mask

            logits, vals = model(xb)
            logits = torch.nan_to_num(logits, nan=0.0, posinf=1e6, neginf=-1e6)

            dist = torch.distributions.Categorical(logits=logits)
            act = dist.sample()
            # reward by action: long=2 -> +r, flat=1 -> 0, short=0 -> -r
            reward = torch.where(act==2, rb,
                        torch.where(act==0, -rb, torch.zeros_like(rb)))
            loss = a3c_loss(logits, vals, act, reward)

            opt.zero_grad()
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
            opt.step()
            loss_ep += loss.item()
        print(f"  Epoch {ep+1}/{epochs}, Loss={loss_ep:.4f}")
        gc.collect(); torch.cuda.empty_cache()

    # --- Save model ---
    model_path = os.path.join(MODEL_DIR, f"a3c_cluster_{c_id}.pt")
    torch.save(model.state_dict(), model_path)
    print(f"  ✅ Saved model checkpoint: {model_path}")

    del X, M, model, opt
    gc.collect(); torch.cuda.empty_cache()

# --- Run all clusters ---
for meta in tensor_index:
    process_cluster(meta)

print(f"✅ Done Block 8: models saved in {MODEL_DIR}")

Cluster 0 | X=(509, 64, 23, 25)
  Epoch 1/3, Loss=-0.6032
  Epoch 2/3, Loss=-0.4641
  Epoch 3/3, Loss=-0.5235
  ✅ Saved model checkpoint: ./models/a3c_cluster_0.pt
Cluster 1 | X=(509, 64, 28, 25)
  Epoch 1/3, Loss=-0.8764
  Epoch 2/3, Loss=-0.5717
  Epoch 3/3, Loss=-0.6274
  ✅ Saved model checkpoint: ./models/a3c_cluster_1.pt
Cluster 2 | X=(509, 64, 26, 25)
  Epoch 1/3, Loss=-0.6268
  Epoch 2/3, Loss=-0.5301
  Epoch 3/3, Loss=-0.5815
  ✅ Saved model checkpoint: ./models/a3c_cluster_2.pt
Cluster 3 | X=(509, 64, 33, 25)
  Epoch 1/3, Loss=-0.6216
  Epoch 2/3, Loss=-0.7012
  Epoch 3/3, Loss=-0.7094
  ✅ Saved model checkpoint: ./models/a3c_cluster_3.pt
Cluster 4 | X=(509, 64, 31, 25)
  Epoch 1/3, Loss=-0.7542
  Epoch 2/3, Loss=-0.6479
  Epoch 3/3, Loss=-0.6742
  ✅ Saved model checkpoint: ./models/a3c_cluster_4.pt
Cluster 5 | X=(509, 64, 30, 25)
  Epoch 1/3, Loss=-0.3925
  Epoch 2/3, Loss=-0.5933
  Epoch 3/3, Loss=-0.6354
  ✅ Saved model checkpoint: ./models/a3c_cluster_5.pt
Cluster 6 | X=(5

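The reward lookup in Block 8 walks `df_backtest` row by row with `iterrows`, which can be slow on a large frame. A minimal sketch of an equivalent vectorized construction follows; `build_rewards_map`, the tickers `TK1`/`TK2` and the toy frame are hypothetical, and only the column names `ticker`, `timestamp` and `realized_return` come from the notebook.

```python
import numpy as np
import pandas as pd

def build_rewards_map(df, tickers):
    # hypothetical helper: (ticker, timestamp) -> realized_return, skipping missing returns
    sub = df[df["ticker"].isin(tickers)].dropna(subset=["realized_return"])
    keys = zip(sub["ticker"], pd.to_datetime(sub["timestamp"]))
    return dict(zip(keys, sub["realized_return"].astype(float)))

toy = pd.DataFrame({
    "ticker": ["TK1", "TK1", "TK2"],                      # hypothetical tickers
    "timestamp": ["2024-01-02", "2024-01-03", "2024-01-02"],
    "realized_return": [0.01, np.nan, -0.02],
})
rewards_map = build_rewards_map(toy, ["TK1", "TK2"])

# missing entries fall back to 0.0, as in the notebook's iterator
missing = rewards_map.get(("TK1", pd.Timestamp("2024-01-03")), 0.0)
print(len(rewards_map), missing)
```

The keys are `pd.Timestamp` objects, so lookups with `pd.to_datetime(dates[b])` match as long as both sides carry the same timezone and time-of-day.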
**Block 9: Inference from the A3C model**

In [57]:
# Block 9: Inference from the A3C checkpoint (with actual regimes from df_backtest)

import os, gc, json, csv
import numpy as np
import pandas as pd
import torch
import torch.nn as nn

DATA_DIR = "./tensors/"
MODEL_DIR = "./models/"
SIG_DIR   = "./signals/"
os.makedirs(SIG_DIR, exist_ok=True)

SIG_FILE = os.path.join(SIG_DIR, "a3c_signals_infer.csv")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Reset signals file
if os.path.exists(SIG_FILE):
    os.remove(SIG_FILE)
with open(SIG_FILE, "w", newline="") as f:
    csv.writer(f).writerow(["date","ticker","signal"])

# Load metadata
with open(os.path.join(DATA_DIR, "tensor_index.json"), "r") as f:
    tensor_index = json.load(f)

# --- Model redefined (same as Block 8) ---
class A3CNet(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.actor = nn.Linear(hidden, 3)   # short, flat, long
        self.critic = nn.Linear(hidden, 1)
    def forward(self, x):
        out, _ = self.lstm(x)
        h = out[:, -1, :]
        return self.actor(h), self.critic(h)

# --- Build the regime map from df_backtest ---
# df_backtest must have the columns: ["ticker","timestamp","regime_bear","regime_bull","regime_sideway"]
regime_map = {
    (row.ticker, pd.to_datetime(row.timestamp)): 
        np.array([row.regime_bear, row.regime_bull, row.regime_sideway], dtype=np.float32)
    for _, row in df_backtest.iterrows()
}

# --- Inference function ---
def infer_cluster(meta, batch_size=256):
    c_id, tickers, dates, dates_shifted = meta["cluster"], meta["tickers"], meta["dates"], meta["dates_shifted"]
    X = np.load(os.path.join(DATA_DIR, meta["tensor_file"]), mmap_mode="r")
    M = np.load(os.path.join(DATA_DIR, meta["mask_file"]), mmap_mode="r")
    if X.size == 0:
        return
    B, T, N, F = X.shape
    print(f"[Inference] Cluster {c_id} | X={X.shape}")

    # Load model checkpoint
    model_path = os.path.join(MODEL_DIR, f"a3c_cluster_{c_id}.pt")
    if not os.path.exists(model_path):
        print(f"⚠️ Model checkpoint not found: {model_path}, skip")
        return
    model = A3CNet(F+3).to(device)  # +3 vì có regime one-hot
    model.load_state_dict(torch.load(model_path, map_location=device))
    model.eval()

    # Inference & save signals
    total = B * N
    with open(SIG_FILE, "a", newline="") as f:
        w = csv.writer(f)
        with torch.no_grad():
            for start in range(0, total, batch_size):
                end = min(total, start+batch_size)
                xb, mb, idx = [], [], []
                for s in range(start, end):
                    b, n = divmod(s, N)
                    tk = tickers[n]
                    dt = pd.to_datetime(dates[b])

                    # actual regime on date dt
                    regime_vec = regime_map.get((tk, dt), np.array([0,0,0], dtype=np.float32))

                    # append the regime vector to every timestep of the sequence
                    seq_x = X[b, :, n, :]            # shape (T, F)
                    seq_x = np.concatenate([seq_x, np.tile(regime_vec, (T,1))], axis=1)  # (T, F+3)

                    xb.append(seq_x)
                    mb.append(M[b, :, n, :])  # original mask unchanged (still shape (T, F))
                    idx.append((b, n))

                xb = torch.tensor(np.stack(xb), dtype=torch.float32).to(device)
                mb = torch.tensor(np.stack(mb), dtype=torch.float32).to(device)

                # (the regime columns are not masked because they are always available)
                xb = xb * torch.cat([mb, torch.ones(mb.shape[0], mb.shape[1], 3, device=device)], dim=2)

                logits, _ = model(xb)
                acts = torch.argmax(logits, dim=-1).cpu().numpy() - 1  # (-1,0,1)

                for k,(b,n) in enumerate(idx):
                    w.writerow([dates[b], tickers[n], int(acts[k])])

                del xb, mb, acts
                gc.collect(); torch.cuda.empty_cache()

    del X, M, model
    gc.collect(); torch.cuda.empty_cache()

# --- Run inference all clusters ---
for meta in tensor_index:
    infer_cluster(meta)

print(f"✅ Done Block 9: inference signals saved to {SIG_FILE} (with actual regimes).")

[Inference] Cluster 0 | X=(509, 64, 23, 25)
[Inference] Cluster 1 | X=(509, 64, 28, 25)
[Inference] Cluster 2 | X=(509, 64, 26, 25)
[Inference] Cluster 3 | X=(509, 64, 33, 25)
[Inference] Cluster 4 | X=(509, 64, 31, 25)
[Inference] Cluster 5 | X=(509, 64, 30, 25)
[Inference] Cluster 6 | X=(509, 64, 30, 25)
[Inference] Cluster 7 | X=(509, 64, 26, 25)
[Inference] Cluster 8 | X=(509, 64, 27, 25)
[Inference] Cluster 9 | X=(509, 64, 29, 25)
[Inference] Cluster 10 | X=(509, 64, 29, 25)
[Inference] Cluster 11 | X=(509, 64, 30, 25)
[Inference] Cluster 12 | X=(509, 64, 26, 25)
[Inference] Cluster 13 | X=(509, 64, 28, 25)
[Inference] Cluster 14 | X=(509, 64, 31, 25)
[Inference] Cluster 15 | X=(509, 64, 30, 25)
[Inference] Cluster 16 | X=(509, 64, 28, 25)
[Inference] Cluster 17 | X=(509, 64, 28, 25)
[Inference] Cluster 18 | X=(509, 64, 29, 25)
[Inference] Cluster 19 | X=(509, 64, 29, 25)
[Inference] Cluster 20 | X=(509, 64, 26, 25)
[Inference] Cluster 21 | X=(509, 64, 28, 25)
[Inference] Cluster 

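The regime-aware inference cell above widens each `(T, F)` window to `(T, F+3)` by repeating the regime vector along the time axis, and widens the mask with ones so the regime columns are never zeroed. The toy example below walks through that shape arithmetic; the sizes and values are made up for illustration.

```python
import numpy as np

T, F = 4, 2                                        # toy window length and feature count
seq_x = np.arange(T * F, dtype=np.float32).reshape(T, F)
mask = np.ones((T, F), dtype=np.float32)
mask[0, :] = 0.0                                   # pretend the first timestep is padding
regime_vec = np.array([0.0, 1.0, 0.0], dtype=np.float32)   # order: bear, bull, sideway

# repeat the regime vector on every timestep and append it as 3 extra features
x_aug = np.concatenate([seq_x, np.tile(regime_vec, (T, 1))], axis=1)       # (T, F+3)
# extend the mask with ones so the regime columns always pass through
m_aug = np.concatenate([mask, np.ones((T, 3), dtype=np.float32)], axis=1)  # (T, F+3)
x_in = x_aug * m_aug

print(x_in.shape)
print(x_in[0], x_in[1])
```

Because this cell builds `A3CNet(F+3)`, it can only load checkpoints trained with the regime-augmented input. A checkpoint saved by the variant that builds `A3CNet(F)` has a different LSTM input width, so `load_state_dict` would raise a size-mismatch error.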
In [33]:
# Block 9: Inference from the A3C checkpoint

import os, gc, json, csv
import numpy as np
import pandas as pd
import torch
import torch.nn as nn

DATA_DIR = "./tensors/"
MODEL_DIR = "./models/"
SIG_DIR   = "./signals/"
os.makedirs(SIG_DIR, exist_ok=True)

SIG_FILE = os.path.join(SIG_DIR, "a3c_signals_infer.csv")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Reset signals file
if os.path.exists(SIG_FILE):
    os.remove(SIG_FILE)
with open(SIG_FILE, "w", newline="") as f:
    csv.writer(f).writerow(["date","ticker","signal"])

# Load metadata
with open(os.path.join(DATA_DIR, "tensor_index.json"), "r") as f:
    tensor_index = json.load(f)

# --- Model redefined (same as Block 8) ---
class A3CNet(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.actor = nn.Linear(hidden, 3)   # short, flat, long
        self.critic = nn.Linear(hidden, 1)
    def forward(self, x):
        out, _ = self.lstm(x)
        h = out[:, -1, :]
        return self.actor(h), self.critic(h)

# --- Inference function ---
def infer_cluster(meta, batch_size=256):
    c_id, tickers, dates, dates_shifted = meta["cluster"], meta["tickers"], meta["dates"], meta["dates_shifted"]
    X = np.load(os.path.join(DATA_DIR, meta["tensor_file"]), mmap_mode="r")
    M = np.load(os.path.join(DATA_DIR, meta["mask_file"]), mmap_mode="r")
    if X.size == 0:
        return
    B, T, N, F = X.shape
    print(f"[Inference] Cluster {c_id} | X={X.shape}")

    # Load model checkpoint
    model_path = os.path.join(MODEL_DIR, f"a3c_cluster_{c_id}.pt")
    if not os.path.exists(model_path):
        print(f"⚠️ Model checkpoint not found: {model_path}, skip")
        return
    model = A3CNet(F).to(device)
    model.load_state_dict(torch.load(model_path, map_location=device))
    model.eval()

    # Inference & save signals
    total = B * N
    with open(SIG_FILE, "a", newline="") as f:
        w = csv.writer(f)
        with torch.no_grad():
            for start in range(0, total, batch_size):
                end = min(total, start+batch_size)
                xb, mb, idx = [], [], []
                for s in range(start, end):
                    b, n = divmod(s, N)
                    xb.append(X[b, :, n, :])
                    mb.append(M[b, :, n, :])
                    idx.append((b, n))
                xb = torch.tensor(np.stack(xb), dtype=torch.float32).to(device)
                mb = torch.tensor(np.stack(mb), dtype=torch.float32).to(device)

                # Apply the mask
                xb = xb * mb

                acts = torch.argmax(model(xb)[0], dim=-1).cpu().numpy() - 1  # (-1,0,1)
                for k,(b,n) in enumerate(idx):
                    # use dates[b] (last day of the window); the reward is computed at T+1 (dates_shifted)
                    w.writerow([dates[b], tickers[n], int(acts[k])])
                del xb, mb, acts
                gc.collect(); torch.cuda.empty_cache()

    del X, M, model
    gc.collect(); torch.cuda.empty_cache()

# --- Run inference all clusters ---
for meta in tensor_index:
    infer_cluster(meta)

print(f"✅ Done Block 9: inference signals saved to {SIG_FILE}")


[Inference] Cluster 0 | X=(509, 64, 23, 25)
[Inference] Cluster 1 | X=(509, 64, 28, 25)
[Inference] Cluster 2 | X=(509, 64, 26, 25)
[Inference] Cluster 3 | X=(509, 64, 33, 25)
[Inference] Cluster 4 | X=(509, 64, 31, 25)
[Inference] Cluster 5 | X=(509, 64, 30, 25)
[Inference] Cluster 6 | X=(509, 64, 30, 25)
[Inference] Cluster 7 | X=(509, 64, 26, 25)
[Inference] Cluster 8 | X=(509, 64, 27, 25)
[Inference] Cluster 9 | X=(509, 64, 29, 25)
[Inference] Cluster 10 | X=(509, 64, 29, 25)
[Inference] Cluster 11 | X=(509, 64, 30, 25)
[Inference] Cluster 12 | X=(509, 64, 26, 25)
[Inference] Cluster 13 | X=(509, 64, 28, 25)
[Inference] Cluster 14 | X=(509, 64, 31, 25)
[Inference] Cluster 15 | X=(509, 64, 30, 25)
[Inference] Cluster 16 | X=(509, 64, 28, 25)
[Inference] Cluster 17 | X=(509, 64, 28, 25)
[Inference] Cluster 18 | X=(509, 64, 29, 25)
[Inference] Cluster 19 | X=(509, 64, 29, 25)
[Inference] Cluster 20 | X=(509, 64, 26, 25)
[Inference] Cluster 21 | X=(509, 64, 28, 25)
[Inference] Cluster 

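Both inference cells flatten the `(window, ticker)` grid into one index, recover the pair with `divmod`, and turn the arg-max class `{0, 1, 2}` into a signal `{-1, 0, 1}`. A small sketch of that bookkeeping, with made-up logits, dates and tickers:

```python
import numpy as np

B, N = 2, 3                                     # toy: 2 windows, 3 tickers
dates = ["2024-01-02", "2024-01-03"]            # hypothetical window end dates
tickers = ["TK1", "TK2", "TK3"]                 # hypothetical tickers

# one row of logits per flattened (window, ticker) sample; columns = short, flat, long
logits = np.array([
    [2.0, 0.1, 0.3],
    [0.1, 2.0, 0.3],
    [0.1, 0.3, 2.0],
    [0.5, 0.4, 0.1],
    [0.0, 0.9, 0.2],
    [0.3, 0.2, 0.8],
])
signals = logits.argmax(axis=-1) - 1            # class index {0,1,2} -> signal {-1,0,1}

rows = []
for s in range(B * N):
    b, n = divmod(s, N)                         # s = b * N + n
    rows.append((dates[b], tickers[n], int(signals[s])))

for row in rows:
    print(row)
```

Inference takes the arg-max, so it is deterministic, whereas training samples actions from the categorical distribution.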
**Block 10: Train the cluster DDPG (long positions only)**

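Block 10 delays the A3C signals by `EXECUTION_LAG` rows, keeps only long signals, and weights the active names in a cluster equally. The toy frame below (hypothetical dates and tickers) shows what `shift`, the `> 0` mask and the equal-weight division produce.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=5, freq="D")    # hypothetical calendar
sig = pd.DataFrame(
    {"TK1": [1, 1, 0, -1, 1], "TK2": [0, 1, 1, 1, 0]},    # hypothetical signals
    index=idx, dtype="float32",
)

EXECUTION_LAG = 2
sig_lag = sig.shift(EXECUTION_LAG)           # a signal from row t becomes usable on row t+2
active = (sig_lag > 0).astype("float32")     # long-only: keep +1, drop 0 / -1 / NaN

# equal weight among active names; rows with no active name get all-zero weights
denom = active.sum(axis=1).replace(0, float("nan"))
weights = active.div(denom, axis=0).fillna(0.0)
print(weights)
```

`shift` moves by index rows, not calendar days, so the lag is measured in rows of the combined price and signal index.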
In [None]:
# Block 10 — Cluster DDPG + Execution Lag + Turnover Cost (Final, realistic backtest with extra logs)
import os, gc, json, csv, math, time
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from FiinQuantX import FiinSession

# ====== I/O paths ======
DATA_DIR   = "./tensors/"
SIG_DIR    = "./signals/"
OUTPUT_DIR = "./backtest_ddpg/"
MODEL_DIR  = "./models/"
os.makedirs(OUTPUT_DIR, exist_ok=True)
os.makedirs(MODEL_DIR, exist_ok=True)

# ====== Config ======
INIT_CAPITAL    = 10_000.0
BENCHMARK_TKR   = "VNINDEX"
EXECUTION_LAG   = 2           # T+2 execution/settlement (we align signals accordingly)
COST_BPS        = 20          # 0.20% per round-trip on turnover
STATE_LKBK      = 20
MIN_NAMES_PER_CLUSTER = 1
MAX_HOLD_TICKERS = 10
MAX_HOLD_DAYS = 15
SEED = 42

# DDPG hyper
EPOCHS       = 60
BATCH_SIZE   = 128
LR_ACTOR     = 1e-4
LR_CRITIC    = 1e-4
GAMMA        = 0.94
TAU          = 0.01
NOISE_STD    = 0.15
NOISE_DECAY  = 0.97
HIDDEN       = 128
BUFFER_MAX   = 100_000

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(SEED); np.random.seed(SEED)

# ====== Load artifacts ======
# A3C signals (inference)
sig_file = os.path.join(SIG_DIR, "a3c_signals_infer.csv")
if not os.path.exists(sig_file):
    raise FileNotFoundError(f"{sig_file} not found. Please run A3C inference earlier.")
signals = pd.read_csv(sig_file)
signals["date"] = pd.to_datetime(signals["date"])

# df_backtest must exist (from Block 7.5)
# Expect df_backtest in notebook namespace or saved CSV in OUTPUT_DIR
if 'df_backtest' not in globals():
    fb_path = os.path.join(OUTPUT_DIR, "df_backtest.csv")
    if os.path.exists(fb_path):
        df_backtest = pd.read_csv(fb_path, parse_dates=["timestamp"])
    else:
        raise FileNotFoundError("df_backtest not found in workspace and ./backtest_ddpg/df_backtest.csv missing. Produce df_backtest first.")

# Work copy
df_px = df_backtest.rename(columns={"timestamp":"date"}).copy()
df_px["date"] = pd.to_datetime(df_px["date"])

# realized mapping used for shaping / planned exits (index by (ticker,date))
realized = df_px[["ticker","date","exit_date","exit_price","realized_return",
                  "entry_regime","tp_level","sl_level","exit_type","horizon_days","fee"]].copy()
realized = realized.set_index(["ticker","date"]).sort_index()

# cluster metadata from tensors (tensor_index.json)
tidx_path = os.path.join(DATA_DIR, "tensor_index.json")
if not os.path.exists(tidx_path):
    raise FileNotFoundError(f"{tidx_path} not found. Provide tensor_index.json.")
with open(tidx_path, "r") as f:
    tensor_index = json.load(f)

ticker2cluster = {}
for meta in tensor_index:
    for tk in meta.get("tickers", []):
        # keep first mapping encountered
        ticker2cluster.setdefault(tk, meta.get("cluster"))
ticker2cluster = pd.Series(ticker2cluster)

# price wide and daily returns for realistic test
px_wide = df_px.pivot(index="date", columns="ticker", values="close").sort_index()
daily_ret = px_wide.pct_change().fillna(0.0)

# ====== train/test split (time-based) ======
TRAIN_START = pd.Timestamp("2023-01-01")
TRAIN_END   = pd.Timestamp("2024-12-31")
TEST_START  = pd.Timestamp("2025-01-01")
TEST_END    = daily_ret.index.max()

# ====== Prepare signals with execution lag (T+2) ======
sig_wide_raw = signals.pivot_table(index="date", columns="ticker", values="signal", aggfunc="last").sort_index()
idx_all = daily_ret.index.union(sig_wide_raw.index)
daily_ret = daily_ret.reindex(idx_all).fillna(0.0)
sig_wide = sig_wide_raw.reindex(idx_all).fillna(0.0)
sig_wide_lag = sig_wide.shift(EXECUTION_LAG)   # we will use this for building actives

# Keep tickers that exist in cluster mapping and price universe
tickers = sorted([t for t in daily_ret.columns if t in ticker2cluster.index])
daily_ret = daily_ret[tickers].astype("float32")
sig_wide_lag = sig_wide_lag[tickers].astype("float32")
cluster_of = ticker2cluster.loc[tickers]

clusters = sorted(cluster_of.unique().tolist())
cluster_members = {c: cluster_of[cluster_of==c].index.tolist() for c in clusters}
C = len(clusters)

# ====== Helpers to build cluster-level states using realized returns (for shaping during train) ======
def build_state_arrays_realized(ret_w, sig_lag, realized_idx, start, end, K=STATE_LKBK):
    """
    Build:
      - S_mat: (T', 2*C*K + 3) where we append (global) regime one-hot
      - R_mat: (T', C) cluster realized returns (mean realized_return among active tickers)
      - dates: dates2
      - ACTIVE_masks: dict[c] -> DataFrame active mask aligned to dates2
    """
    dates_all = ret_w.loc[start:end].index
    act_cols, ret_cols = [], []
    ACTIVE_masks = {}

    for c in clusters:
        tks = cluster_members[c]
        if not tks:
            act_c = pd.Series(0.0, index=dates_all, name=c)
            ret_c = pd.Series(0.0, index=dates_all, name=c)
            ACTIVE_masks[c] = pd.DataFrame(0.0, index=dates_all, columns=tks)
        else:
            S_c = sig_lag[tks].reindex(dates_all).fillna(0.0)
            active_mask = (S_c > 0).astype("float32")   # long-only
            ACTIVE_masks[c] = active_mask

            # lookup realized_return for each (tk, date)
            rr = pd.DataFrame(index=dates_all, columns=tks, dtype="float32")
            for tk in tks:
                vals = []
                for d in dates_all:
                    try:
                        vals.append(realized_idx.loc[(tk, d), "realized_return"])
                    except Exception:
                        vals.append(np.nan)
                rr[tk] = pd.Series(vals, index=dates_all, dtype="float32")

            denom = active_mask.sum(axis=1).replace(0, np.nan)
            w = active_mask.div(denom, axis=0).fillna(0.0)   # equal weight among active tickers
            ret_c = (rr.fillna(0.0) * w).sum(axis=1).astype("float32")
            act_c = active_mask.mean(axis=1).astype("float32")

        act_cols.append(act_c.rename(c))
        ret_cols.append(ret_c.rename(c))

    act_df = pd.concat(act_cols, axis=1).astype("float32")
    cret_df = pd.concat(ret_cols, axis=1).astype("float32")

    Tfull = len(dates_all)
    # build lookback stacks: (T, C, K)
    A_stack = np.zeros((Tfull, C, K), dtype="float32")
    R_stack = np.zeros((Tfull, C, K), dtype="float32")
    for k in range(K):
        A_stack[:, :, k] = act_df.shift(k).fillna(0.0).values
        R_stack[:, :, k] = cret_df.shift(k).fillna(0.0).values

    valid_idx = np.arange(Tfull) >= (K - 1)
    dates2 = dates_all[valid_idx]
    A3 = A_stack[valid_idx]   # (T', C, K)
    R3 = R_stack[valid_idx]   # (T', C, K)

    S_flat = np.concatenate([A3.reshape(len(dates2), -1), R3.reshape(len(dates2), -1)], axis=1).astype("float32")

    # global regime vector from realized_idx: share of tickers in each regime (bear, bull, sideway) per date
    regimes = []
    for d in dates2:
        try:
            sample = realized_idx.xs(d, level="date")
            if "entry_regime" in sample.columns:
                er = sample["entry_regime"]
                # fraction of tickers in each regime on date d, in the order bear, bull, sideway
                rvec = np.array([
                    (er == "bear").mean(),
                    (er == "bull").mean(),
                    (er == "sideway").mean()
                ], dtype="float32")
            else:
                rvec = np.array([0.0,0.0,0.0], dtype="float32")
        except Exception:
            rvec = np.array([0.0,0.0,0.0], dtype="float32")
        regimes.append(rvec)
    regimes = np.stack(regimes, axis=0)

    S_mat = np.concatenate([S_flat, regimes], axis=1)
    R_mat = cret_df.loc[dates2].values.astype("float32")
    ACTIVE_masks = {c: ACTIVE_masks[c].loc[dates2] for c in clusters}
    return S_mat, R_mat, dates2, ACTIVE_masks

# Build train/test arrays (for DDPG training)
S_train, R_train, d_train, ACTIVE_train = build_state_arrays_realized(daily_ret, sig_wide_lag, realized, TRAIN_START, TRAIN_END, K=STATE_LKBK)
S_test,  R_test,  d_test,  ACTIVE_test  = build_state_arrays_realized(daily_ret, sig_wide_lag, realized, TEST_START,  TEST_END,  K=STATE_LKBK)

# ====== DDPG actor/critic definitions ======
class Actor(nn.Module):
    def __init__(self, s_dim, a_dim, hidden=HIDDEN):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, a_dim)
        )
    def forward(self, s):
        logits = self.net(s)
        return torch.softmax(logits, dim=-1)

class Critic(nn.Module):
    def __init__(self, s_dim, a_dim, hidden=HIDDEN):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1)
        )
    def forward(self, s, a):
        return self.net(torch.cat([s,a], dim=-1))

class ReplayBuffer:
    def __init__(self, maxlen=BUFFER_MAX):
        self.maxlen = maxlen; self.buf=[]
    def push(self, s,a,r,s2):
        if len(self.buf) >= self.maxlen:
            self.buf.pop(0)
        self.buf.append((s,a,r,s2))
    def sample(self, bs):
        n = min(bs, len(self.buf))
        idx = np.random.choice(len(self.buf), n, replace=False)
        s,a,r,s2 = zip(*[self.buf[i] for i in idx])
        return (np.array(s, np.float32), np.array(a, np.float32),
                np.array(r, np.float32).reshape(-1,1), np.array(s2, np.float32))

def soft_update(src, tgt, tau):
    """Polyak averaging: tgt <- (1 - tau) * tgt + tau * src, applied parameter-wise in place."""
    with torch.no_grad():
        for p, tp in zip(src.parameters(), tgt.parameters()):
            tp.data.mul_(1-tau); tp.data.add_(tau*p.data)

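# Reward helper (cluster-level) used in training shaping
def port_reward_cluster(w_cluster, r_cluster, prev_w_cluster=None, cost_bps=COST_BPS, train_penalty=True):
    """Return the gross cluster-weighted return minus an optional turnover fee.

    Turnover is the L1 distance between the current and previous cluster weights,
    charged at cost_bps basis points (1 bp = 1e-4). No fee is charged on the
    first step, when prev_w_cluster is None.
    """
    gross = float(np.dot(w_cluster, r_cluster))
    turnover = 0.0
    if prev_w_cluster is not None:
        turnover = float(np.sum(np.abs(w_cluster - prev_w_cluster)))
    fee = (cost_bps/1e4) * turnover
    penalty = fee if train_penalty else 0.0
    return gross - penalty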

# ====== Instantiate models & training setup ======
s_dim = S_train.shape[1]; a_dim = C
actor = Actor(s_dim, a_dim).to(DEVICE)
critic = Critic(s_dim, a_dim).to(DEVICE)
t_actor = Actor(s_dim, a_dim).to(DEVICE); t_actor.load_state_dict(actor.state_dict())
t_critic = Critic(s_dim, a_dim).to(DEVICE); t_critic.load_state_dict(critic.state_dict())
optA = optim.Adam(actor.parameters(), lr=LR_ACTOR)
optC = optim.Adam(critic.parameters(), lr=LR_CRITIC)
mse = nn.MSELoss()
buf = ReplayBuffer(BUFFER_MAX)

# ====== Training loop (DDPG) ======
prev_w = None
noise_std = NOISE_STD
min_buffer = max(500, BATCH_SIZE*5)

for ep in range(EPOCHS):
    c_loss = a_loss = 0.0
    prev_w = None
    # iterate time steps
    for t in range(len(S_train)-1):
        s = torch.from_numpy(S_train[t]).to(DEVICE).unsqueeze(0)
        with torch.no_grad():
            w = actor(s).cpu().numpy()[0]
        # exploration on simplex via logits noise
        logits = np.log(np.clip(w, 1e-9, 1.0))
        logits = logits + np.random.normal(0, noise_std, size=a_dim)
        w_e = np.exp(logits); w_e = w_e / (w_e.sum() + 1e-12)
        r = port_reward_cluster(w_e, R_train[t], prev_w_cluster=prev_w, train_penalty=True)
        prev_w = w_e.copy()
        buf.push(S_train[t], w_e, r, S_train[t+1])

        # update networks
        if len(buf.buf) >= min_buffer:
            sb, ab, rb, s2b = buf.sample(BATCH_SIZE)
            sb = torch.from_numpy(sb).to(DEVICE)
            ab = torch.from_numpy(ab).to(DEVICE)
            rb = torch.from_numpy(rb).to(DEVICE)
            s2b = torch.from_numpy(s2b).to(DEVICE)

            # TD target from the target networks: y = r + GAMMA * Q'(s2, actor'(s2))
            with torch.no_grad():
                a2 = t_actor(s2b)
                q2 = t_critic(s2b, a2)
                y = rb + GAMMA * q2

            q = critic(sb, ab)
            lc = mse(q, y)
            optC.zero_grad(); lc.backward(); optC.step()

            ap = actor(sb)
            la = -critic(sb, ap).mean()
            optA.zero_grad(); la.backward(); optA.step()

            soft_update(actor, t_actor, TAU)
            soft_update(critic, t_critic, TAU)

            c_loss += float(lc.item()); a_loss += float(la.item())

    noise_std *= NOISE_DECAY
    print(f"[DDPG] Epoch {ep+1}/{EPOCHS} | Critic {c_loss:.4f} | Actor {a_loss:.4f} | noise {noise_std:.5f}")
    gc.collect(); torch.cuda.empty_cache()

# Save actor
torch.save(actor.state_dict(), os.path.join(MODEL_DIR, "ddpg_actor.pt"))

# ====== Backtest (realistic daily returns, dynamic top-10, SL/TP, cash buffer) ======
dates = pd.DatetimeIndex(d_test)   # aligned dates from state builder
dates = dates.sort_values()
capital = INIT_CAPITAL
nav = capital

# Outputs & logs
portfolio_value = pd.Series(index=dates, dtype="float64")
portfolio_value.iloc[0] = capital

# Trade log: detailed fields incl entry_alloc/exit_alloc
trade_log_cols = [
    "entry_date","exit_date","ticker","entry_price","exit_price",
    "entry_alloc","exit_alloc","entry_regime","exit_type",
    "tp_level","sl_level","realized_return","holding_days","fee"
]
trade_log = []

# cluster weights CSV writer
cw_path = os.path.join(OUTPUT_DIR, "cluster_weights_test.csv")
with open(cw_path, "w", newline="") as f:
    cw = csv.writer(f); cw.writerow(["date"] + [f"cluster_{c}" for c in clusters] + ["cash_buffer"])

# dynamic daily ticker weights (post-rebalance)
weights_by_ticker_path = os.path.join(OUTPUT_DIR, "weights_by_ticker.csv")
weights_df = pd.DataFrame(0.0, index=dates, columns=tickers, dtype="float32")

# holdings: dict[ticker] -> dict(entry_date, entry_price, planned_exit_date, planned_exit_price, entry_regime, tp, sl, entry_alloc)
holdings = {}

# previous day weights for turnover calc
prev_w_ticker = pd.Series(0.0, index=tickers, dtype="float32")

# Extra logs
regime_daily_list = []
holdings_lastday_rows = []

# ====== Backtest main loop (FULL with logging) ======
for i, dt in enumerate(dates):
    # prepare state for actor: use previous state to avoid leakage (action lag 1 step)
    s_idx = i-1 if i>0 else i
    s = torch.from_numpy(S_test[s_idx].astype("float32")).to(DEVICE).unsqueeze(0)
    with torch.no_grad():
        w_c = actor(s).cpu().numpy()[0].astype("float32")
    # clip & normalize
    w_c = np.clip(w_c, 0.0, 1.0)
    if w_c.sum() > 0:
        w_c = w_c / w_c.sum()
    else:
        w_c = np.ones_like(w_c) / len(w_c)

    # determine cash buffer from global regime at date dt (from df_px if present)
    cash_buffer = 0.15  # default sideway
    majority_regime = None
    try:
        todays = df_px[df_px["date"] == dt]
        if "entry_regime" in todays.columns:
            mode = todays["entry_regime"].mode()
            if len(mode) > 0:
                modev = mode.iloc[0]
                if modev == "bull": cash_buffer = 0.02
                elif modev == "bear": cash_buffer = 0.35
                else: cash_buffer = 0.15
                majority_regime = modev
    except Exception:
        majority_regime = None

    # save cluster weights (include cash_buffer)
    with open(cw_path, "a", newline="") as f:
        cw = csv.writer(f)
        cw.writerow([dt.strftime("%Y-%m-%d")] + [float(x) for x in w_c] + [float(cash_buffer)])

    # Map cluster weights down to candidate tickers and compute per-ticker score
    score = pd.Series(0.0, index=tickers, dtype="float32")
    if dt in sig_wide_lag.index:
        active_matrix_row = sig_wide_lag.loc[dt]
    else:
        active_matrix_row = pd.Series(0.0, index=tickers)

    for j, c in enumerate(clusters):
        members = cluster_members[c]
        if not members: continue
        try:
            act_row = active_matrix_row[members]
        except Exception:
            act_row = pd.Series(0.0, index=members)
        valid = [tk for tk in members if (tk in act_row.index and act_row[tk] > 0)]
        if len(valid) >= MIN_NAMES_PER_CLUSTER:
            per_tk = float(w_c[j] / len(valid))
            score.loc[valid] += per_tk

    # Now choose top-k but include held tickers
    candidates = score[score > 0].sort_values(ascending=False)
    held_now = list(holdings.keys())
    combined_candidates = pd.concat([candidates, pd.Series(0.0, index=held_now)]).groupby(level=0).first()
    topk = combined_candidates.sort_values(ascending=False).head(MAX_HOLD_TICKERS).index.tolist()

    target_scores = score.loc[topk].copy()
    target_scores = target_scores.fillna(0.0)
    total_score = target_scores.sum()
    investable = 1.0 - cash_buffer
    if total_score <= 1e-12:
        target_w_ticker = pd.Series(0.0, index=tickers, dtype="float32")
    else:
        target_w_ticker = pd.Series(0.0, index=tickers, dtype="float32")
        alloc = (target_scores / total_score) * investable
        target_w_ticker.loc[alloc.index] = alloc.values

    # Compute turnover and apply fee immediately (approx)
    turnover = float(np.sum(np.abs(target_w_ticker.values - prev_w_ticker.values)))
    fee = (COST_BPS / 1e4) * turnover
    nav = nav * (1.0 - fee)

    # Detect opens/closes
    new_open = [tk for tk in target_w_ticker.index if (prev_w_ticker.loc[tk] == 0.0 and target_w_ticker.loc[tk] > 0.0)]
    closed = [tk for tk in prev_w_ticker.index if (prev_w_ticker.loc[tk] > 0.0 and target_w_ticker.loc[tk] == 0.0)]

    # Forced exits (planned exit or max_hold_days)
    to_force_exit = []
    for tk, info in list(holdings.items()):
        planned_exit = info.get("planned_exit_date", pd.NaT)
        entry_dt = info["entry_date"]
        if pd.notna(planned_exit) and planned_exit <= dt:
            to_force_exit.append(tk); continue
        hold_days = (dt - entry_dt).days
        if isinstance(hold_days, (int, np.integer)) and hold_days >= MAX_HOLD_DAYS:
            to_force_exit.append(tk)

    # Execute forced exits first
    for tk in to_force_exit:
        info = holdings.pop(tk)
        entry_dt = info["entry_date"]
        entry_price = info.get("entry_price", np.nan)
        entry_alloc = info.get("entry_alloc", 0.0)
        try:
            rec = realized.loc[(tk, entry_dt)]
            exit_price = rec["exit_price"]; exit_date = rec["exit_date"]
            rret = rec["realized_return"]; etype = rec.get("exit_type", "forced")
            fee_tr = rec.get("fee", (COST_BPS/1e4)*abs(prev_w_ticker.loc[tk]))
        except Exception:
            exit_price = px_wide.loc[dt, tk] if (dt in px_wide.index and tk in px_wide.columns) else np.nan
            exit_date = dt
            rret = np.log(exit_price / entry_price) if pd.notna(exit_price) and entry_price>0 else np.nan
            etype = "forced"
            fee_tr = (COST_BPS/1e4) * abs(prev_w_ticker.loc[tk])
        holding_days = (exit_date - entry_dt).days if pd.notna(exit_date) else np.nan

        trade_log.append({
            "entry_date": entry_dt, "exit_date": exit_date, "ticker": tk,
            "entry_price": entry_price, "exit_price": exit_price,
            "entry_alloc": float(entry_alloc), "exit_alloc": float(prev_w_ticker.loc[tk]) if tk in prev_w_ticker.index else 0.0,
            "entry_regime": info.get("entry_regime", None),
            "exit_type": etype,
            "tp_level": info.get("tp_level", np.nan),
            "sl_level": info.get("sl_level", np.nan),
            "realized_return": rret, "holding_days": holding_days, "fee": fee_tr
        })
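        prev_w = prev_w_ticker.loc[tk]
        # NOTE: the daily mark-to-market step later in this loop has already compounded this
        # ticker's daily returns while it was held, so applying the full-trade return here
        # counts that P&L a second time.
        if not pd.isna(rret):
            nav = nav * (1.0 + prev_w * rret)
        prev_w_ticker.loc[tk] = 0.0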

    # Open new positions (record entry_alloc)
    for tk in new_open:
        if len(holdings) >= MAX_HOLD_TICKERS:
            continue
        entry_price = px_wide.loc[dt, tk] if (dt in px_wide.index and tk in px_wide.columns) else np.nan
        planned_exit_date = np.nan; planned_exit_price = np.nan; tp_level = np.nan; sl_level = np.nan; entry_regime = np.nan
        try:
            rec = realized.loc[(tk, dt)]
            planned_exit_date = rec["exit_date"]; planned_exit_price = rec["exit_price"]
            tp_level = rec.get("tp_level", np.nan); sl_level = rec.get("sl_level", np.nan)
            entry_regime = rec.get("entry_regime", np.nan)
        except Exception:
            pass

        holdings[tk] = {
            "entry_date": dt,
            "entry_price": entry_price,
            "planned_exit_date": planned_exit_date,
            "planned_exit_price": planned_exit_price,
            "entry_regime": entry_regime,
            "tp_level": tp_level,
            "sl_level": sl_level,
            "entry_alloc": float(target_w_ticker.loc[tk]) if tk in target_w_ticker.index else 0.0
        }

    # Rebalance: use target weights as current_weights
    current_weights = target_w_ticker.copy()
    if current_weights.sum() > 1.0:
        current_weights = current_weights / current_weights.sum()

    # compute day's portfolio return using daily_ret (mark-to-market)
    if dt in daily_ret.index:
        r_vec = daily_ret.loc[dt].reindex(current_weights.index).fillna(0.0).values
    else:
        r_vec = np.zeros(len(current_weights), dtype=float)
    port_daily_ret = float(np.dot(current_weights.values, r_vec))
    nav = nav * (1.0 + port_daily_ret)

    # store weights and update prev_w_ticker
    weights_df.loc[dt] = current_weights
    prev_w_ticker = current_weights.copy()

    # Planned exits today: finalize and remove holdings (record exit_alloc)
    for tk in list(holdings.keys()):
        info = holdings[tk]
        planned_exit = info.get("planned_exit_date", pd.NaT)
        if pd.notna(planned_exit) and planned_exit == dt:
            entry_dt = info["entry_date"]; entry_price = info["entry_price"]; entry_alloc = info.get("entry_alloc", 0.0)
            try:
                rec = realized.loc[(tk, entry_dt)]
                exit_price = rec["exit_price"]; exit_date = rec["exit_date"]; rret = rec["realized_return"]
                etype = rec.get("exit_type", "exit"); fee_tr = rec.get("fee", (COST_BPS/1e4)*abs(prev_w_ticker.loc[tk]))
            except Exception:
                exit_price = px_wide.loc[dt, tk] if (dt in px_wide.index and tk in px_wide.columns) else np.nan
                exit_date = dt
                rret = np.log(exit_price / entry_price) if pd.notna(exit_price) and entry_price>0 else np.nan
                etype = "exit"; fee_tr = (COST_BPS/1e4)*abs(prev_w_ticker.loc[tk])

            trade_log.append({
                "entry_date": entry_dt, "exit_date": exit_date, "ticker": tk,
                "entry_price": entry_price, "exit_price": exit_price,
                "entry_alloc": float(entry_alloc), "exit_alloc": float(prev_w_ticker.loc[tk]) if tk in prev_w_ticker.index else 0.0,
                "entry_regime": info.get("entry_regime", None), "exit_type": etype,
                "tp_level": info.get("tp_level", np.nan), "sl_level": info.get("sl_level", np.nan),
                "realized_return": rret, "holding_days": (exit_date - entry_dt).days if pd.notna(exit_date) else np.nan,
                "fee": fee_tr
            })
            prev_w_ticker.loc[tk] = 0.0
            holdings.pop(tk, None)

    # Save NAV + Logs daily
    portfolio_value.iloc[i] = nav
    regime_daily_list.append({
        "date": dt, "regime": majority_regime, "cash_buffer": cash_buffer, "nav": nav
    })
    # snapshot holdings for this day (after rebal/mark-to-market)
    for tk, info in holdings.items():
        holdings_lastday_rows.append({
            "snapshot_date": dt,
            "ticker": tk,
            "entry_date": info["entry_date"],
            "entry_price": info["entry_price"],
            "tp_level": info.get("tp_level", np.nan),
            "sl_level": info.get("sl_level", np.nan),
            "entry_regime": info.get("entry_regime", None),
            "weight": float(current_weights.loc[tk]) if tk in current_weights.index else 0.0,
            "nav": nav,
            "entry_alloc": info.get("entry_alloc", 0.0)
        })

    if (i % 50) == 0:
        gc.collect(); torch.cuda.empty_cache()

# ====== Save outputs ======
portfolio_value.to_frame("portfolio_value").to_csv(os.path.join(OUTPUT_DIR, "portfolio_value_test.csv"))
weights_df.to_csv(weights_by_ticker_path)
pd.DataFrame(trade_log).sort_values(["entry_date","ticker"]).to_csv(os.path.join(OUTPUT_DIR, "trade_log.csv"), index=False)
pd.DataFrame(regime_daily_list).to_csv(os.path.join(OUTPUT_DIR, "regime_daily.csv"), index=False)
pd.DataFrame(holdings_lastday_rows).to_csv(os.path.join(OUTPUT_DIR, "holdings_lastday.csv"), index=False)

# ==== Benchmark (FiinQuantX call unchanged) ====
print("Fetching VNINDEX for benchmark (buy & hold)...")
client = FiinSession(username="DSTC_18@fiinquant.vn", password="Fiinquant0606").login()
bench = client.Fetch_Trading_Data(
    realtime=False, tickers=BENCHMARK_TKR, fields=['close'],
    adjusted=True, by="1d", from_date=str(dates.min().date())
).get_data()
bench["date"] = pd.to_datetime(bench["timestamp"])
bench = bench.set_index("date")["close"].sort_index().reindex(dates).ffill().bfill()
bench_ret = bench.pct_change().fillna(0.0)
benchmark_value = (1 + bench_ret).cumprod() * INIT_CAPITAL
benchmark_value.to_frame("benchmark_value").to_csv(os.path.join(OUTPUT_DIR, "benchmark_value_test.csv"))

# Save actor for inference later
torch.save(actor.state_dict(), os.path.join(MODEL_DIR, "ddpg_actor.pt"))

print("✅ Done Block 10: DDPG train + realistic backtest saved to", OUTPUT_DIR)
print(" - portfolio:", os.path.join(OUTPUT_DIR, "portfolio_value_test.csv"))
print(" - weights:", weights_by_ticker_path)
print(" - trades:", os.path.join(OUTPUT_DIR, "trade_log.csv"))
print(" - regime daily:", os.path.join(OUTPUT_DIR, "regime_daily.csv"))
print(" - holdings snapshot:", os.path.join(OUTPUT_DIR, "holdings_lastday.csv"))

**The latest code is in the cell directly above**
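
The training reward and the backtest in Block 10 both charge a turnover fee of `COST_BPS` basis points on the L1 change in weights. The following is a minimal standalone sketch of that arithmetic, using plain Python lists instead of the notebook's NumPy arrays. The helper names `turnover_fee` and `net_reward` are illustrative assumptions and do not appear in the notebook.

```python
def turnover_fee(weights, prev_weights, cost_bps=20):
    """Fee as a fraction of NAV: (cost_bps / 1e4) * sum(|w - prev_w|)."""
    turnover = sum(abs(w - p) for w, p in zip(weights, prev_weights))
    return (cost_bps / 1e4) * turnover


def net_reward(weights, returns, prev_weights=None, cost_bps=20):
    """Gross weighted return minus the turnover fee (no fee on the first step)."""
    gross = sum(w * r for w, r in zip(weights, returns))
    if prev_weights is None:
        return gross
    return gross - turnover_fee(weights, prev_weights, cost_bps)


# Hypothetical numbers for three clusters (not taken from the notebook's data).
prev_w = [0.5, 0.3, 0.2]
new_w = [0.2, 0.3, 0.5]
rets = [0.01, 0.00, -0.02]

fee = turnover_fee(new_w, prev_w)         # turnover 0.6 at 20 bps -> about 0.0012
reward = net_reward(new_w, rets, prev_w)  # gross -0.008 minus fee -> about -0.0092
print(round(fee, 6), round(reward, 6))
```

With 20 bps, moving 60% of the portfolio costs about 0.12% of NAV, which is the same scale as a typical daily return, so the penalty noticeably discourages large reallocations.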

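Block 10 explores by adding Gaussian noise to the log of the actor's softmax output and renormalizing, which keeps the perturbed weights on the probability simplex. The sketch below reproduces that step with the standard library only. `perturb_on_simplex` is a hypothetical helper, and subtracting the maximum logit before exponentiating is an added numerical-stability step that the notebook does not use.

```python
import math
import random


def perturb_on_simplex(weights, noise_std, rng):
    """Add Gaussian noise in log space, then renormalize so the weights sum to 1."""
    # Clip to [1e-9, 1] before the log, as the notebook does, to avoid log(0).
    logits = [math.log(min(max(w, 1e-9), 1.0)) + rng.gauss(0.0, noise_std)
              for w in weights]
    # Assumption: shift by the max logit for numerical stability (not in the notebook).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(42)
base = [0.7, 0.2, 0.1, 0.0]  # hypothetical actor output over four clusters
noisy = perturb_on_simplex(base, noise_std=0.15, rng=rng)
unchanged = perturb_on_simplex(base, noise_std=0.0, rng=rng)
print(noisy, sum(noisy))
```

Because the noise is applied before the softmax-style renormalization, every perturbed weight stays strictly positive and the vector still sums to one, so no extra clipping is needed before computing the reward.
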
In [72]:
# Block 10 — Cluster DDPG + Execution Lag + Turnover Cost (Final, realistic backtest with extra logs)
import os, gc, json, csv, math, time
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from FiinQuantX import FiinSession

# ====== I/O paths ======
DATA_DIR   = "./tensors/"
SIG_DIR    = "./signals/"
OUTPUT_DIR = "./backtest_ddpg/"
MODEL_DIR  = "./models/"
os.makedirs(OUTPUT_DIR, exist_ok=True)
os.makedirs(MODEL_DIR, exist_ok=True)

# ====== Config ======
INIT_CAPITAL    = 10_000.0
BENCHMARK_TKR   = "VNINDEX"
EXECUTION_LAG   = 2
COST_BPS        = 20
STATE_LKBK      = 20
MIN_NAMES_PER_CLUSTER = 1
MAX_HOLD_TICKERS = 10
MAX_HOLD_DAYS = 15
SEED = 42

# DDPG hyper
EPOCHS       = 120
BATCH_SIZE   = 128
LR_ACTOR     = 1e-4
LR_CRITIC    = 1e-4
GAMMA        = 0.94
TAU          = 0.01
NOISE_STD    = 0.15
NOISE_DECAY  = 0.97
HIDDEN       = 128
BUFFER_MAX   = 100_000

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(SEED); np.random.seed(SEED)

# ====== Load artifacts ======
sig_file = os.path.join(SIG_DIR, "a3c_signals_infer.csv")
signals = pd.read_csv(sig_file)
signals["date"] = pd.to_datetime(signals["date"])

# df_backtest must exist (from Block 7.5)
df_px = df_backtest.rename(columns={"timestamp":"date"}).copy()
df_px["date"] = pd.to_datetime(df_px["date"])

realized = df_px[["ticker","date","exit_date","exit_price","realized_return",
                  "entry_regime","tp_level","sl_level","exit_type","horizon_days","fee"]].copy()
realized = realized.set_index(["ticker","date"]).sort_index()

with open(os.path.join(DATA_DIR, "tensor_index.json"), "r") as f:
    tensor_index = json.load(f)

ticker2cluster = {}
for meta in tensor_index:
    for tk in meta["tickers"]:
        ticker2cluster.setdefault(tk, meta["cluster"])
ticker2cluster = pd.Series(ticker2cluster)

px_wide = df_px.pivot(index="date", columns="ticker", values="close").sort_index()
daily_ret = px_wide.pct_change().fillna(0.0)

TRAIN_START = pd.Timestamp("2023-01-01")
TRAIN_END   = pd.Timestamp("2024-12-31")
TEST_START  = pd.Timestamp("2025-01-01")
TEST_END    = daily_ret.index.max()

sig_wide_raw = signals.pivot_table(index="date", columns="ticker", values="signal", aggfunc="last").sort_index()
idx_all = daily_ret.index.union(sig_wide_raw.index)
daily_ret = daily_ret.reindex(idx_all).fillna(0.0)
sig_wide = sig_wide_raw.reindex(idx_all).fillna(0.0)
sig_wide_lag = sig_wide.shift(EXECUTION_LAG)

tickers = sorted([t for t in daily_ret.columns if t in ticker2cluster.index])
daily_ret = daily_ret[tickers].astype("float32")
sig_wide_lag = sig_wide_lag[tickers].astype("float32")
cluster_of = ticker2cluster.loc[tickers]
clusters = sorted(cluster_of.unique().tolist())
cluster_members = {c: cluster_of[cluster_of==c].index.tolist() for c in clusters}
C = len(clusters)

# ====== Build states ======
def build_state_arrays_realized(ret_w, sig_lag, realized_idx, start, end, K=STATE_LKBK):
    dates_all = ret_w.loc[start:end].index
    act_cols, ret_cols, ACTIVE_masks = [], [], {}
    for c in clusters:
        tks = cluster_members[c]
        if not tks:
            act_c = pd.Series(0.0, index=dates_all, name=c)
            ret_c = pd.Series(0.0, index=dates_all, name=c)
            ACTIVE_masks[c] = pd.DataFrame(0.0, index=dates_all, columns=tks)
        else:
            S_c = sig_lag[tks].reindex(dates_all).fillna(0.0)
            active_mask = (S_c > 0).astype("float32")
            ACTIVE_masks[c] = active_mask
            rr = pd.DataFrame(index=dates_all, columns=tks, dtype="float32")
            for tk in tks:
                vals = []
                for d in dates_all:
                    try:
                        vals.append(realized_idx.loc[(tk, d), "realized_return"])
                    except Exception:
                        vals.append(np.nan)
                rr[tk] = pd.Series(vals, index=dates_all, dtype="float32")
            denom = active_mask.sum(axis=1).replace(0, np.nan)
            w = active_mask.div(denom, axis=0).fillna(0.0)
            ret_c = (rr.fillna(0.0) * w).sum(axis=1).astype("float32")
            act_c = active_mask.mean(axis=1).astype("float32")
        act_cols.append(act_c.rename(c)); ret_cols.append(ret_c.rename(c))

    act_df = pd.concat(act_cols, axis=1).astype("float32")
    cret_df = pd.concat(ret_cols, axis=1).astype("float32")
    Tfull = len(dates_all)
    A_stack = np.zeros((Tfull, C, K), dtype="float32")
    R_stack = np.zeros((Tfull, C, K), dtype="float32")
    for k in range(K):
        A_stack[:, :, k] = act_df.shift(k).fillna(0.0).values
        R_stack[:, :, k] = cret_df.shift(k).fillna(0.0).values
    valid_idx = np.arange(Tfull) >= (K - 1)
    dates2 = dates_all[valid_idx]
    A3, R3 = A_stack[valid_idx], R_stack[valid_idx]
    S_flat = np.concatenate([A3.reshape(len(dates2), -1), R3.reshape(len(dates2), -1)], axis=1).astype("float32")

    regimes = []
    for d in dates2:
        try:
            sample = realized_idx.xs(d, level="date")
            rvec = np.array([
                sample["entry_regime"].eq("bear").mean() if "entry_regime" in sample.columns else 0.0,
                sample["entry_regime"].eq("bull").mean() if "entry_regime" in sample.columns else 0.0,
                sample["entry_regime"].eq("sideway").mean() if "entry_regime" in sample.columns else 0.0
            ], dtype="float32")
        except Exception:
            rvec = np.array([0.0,0.0,0.0], dtype="float32")
        regimes.append(rvec)
    regimes = np.stack(regimes, axis=0)

    S_mat = np.concatenate([S_flat, regimes], axis=1)
    R_mat = cret_df.loc[dates2].values.astype("float32")
    ACTIVE_masks = {c: ACTIVE_masks[c].loc[dates2] for c in clusters}
    return S_mat, R_mat, dates2, ACTIVE_masks

S_train, R_train, d_train, ACTIVE_train = build_state_arrays_realized(daily_ret, sig_wide_lag, realized, TRAIN_START, TRAIN_END)
S_test,  R_test,  d_test,  ACTIVE_test  = build_state_arrays_realized(daily_ret, sig_wide_lag, realized, TEST_START, TEST_END)

# ====== DDPG Actor/Critic ======
class Actor(nn.Module):
    def __init__(self, s_dim, a_dim, hidden=HIDDEN):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, a_dim)
        )
    def forward(self, s): return torch.softmax(self.net(s), dim=-1)

class Critic(nn.Module):
    def __init__(self, s_dim, a_dim, hidden=HIDDEN):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1)
        )
    def forward(self, s,a): return self.net(torch.cat([s,a], dim=-1))

class ReplayBuffer:
    def __init__(self, maxlen=BUFFER_MAX): self.maxlen=maxlen; self.buf=[]
    def push(self,s,a,r,s2):
        if len(self.buf)>=self.maxlen: self.buf.pop(0)
        self.buf.append((s,a,r,s2))
    def sample(self,bs):
        n=min(bs,len(self.buf)); idx=np.random.choice(len(self.buf),n,replace=False)
        s,a,r,s2=zip(*[self.buf[i] for i in idx])
        return np.array(s,np.float32),np.array(a,np.float32),np.array(r,np.float32).reshape(-1,1),np.array(s2,np.float32)

def soft_update(src,tgt,tau):
    with torch.no_grad():
        for p,tp in zip(src.parameters(),tgt.parameters()):
            tp.data.mul_(1-tau); tp.data.add_(tau*p.data)

def port_reward_cluster(w_cluster,r_cluster,prev_w_cluster=None,cost_bps=COST_BPS,train_penalty=True):
    gross=float(np.dot(w_cluster,r_cluster))
    turnover=0.0
    if prev_w_cluster is not None:
        turnover=float(np.sum(np.abs(w_cluster-prev_w_cluster)))
    fee=(cost_bps/1e4)*turnover  # basis points -> fraction of NAV
    return gross - (fee if train_penalty else 0.0)

s_dim=S_train.shape[1]; a_dim=C
actor=Actor(s_dim,a_dim).to(DEVICE); critic=Critic(s_dim,a_dim).to(DEVICE)
t_actor=Actor(s_dim,a_dim).to(DEVICE); t_actor.load_state_dict(actor.state_dict())
t_critic=Critic(s_dim,a_dim).to(DEVICE); t_critic.load_state_dict(critic.state_dict())
optA=optim.Adam(actor.parameters(),lr=LR_ACTOR); optC=optim.Adam(critic.parameters(),lr=LR_CRITIC)
mse=nn.MSELoss(); buf=ReplayBuffer(BUFFER_MAX)

# ====== Training ======
noise_std=NOISE_STD; min_buffer=max(500,BATCH_SIZE*5)
for ep in range(EPOCHS):
    prev_w=None; c_loss=a_loss=0.0
    for t in range(len(S_train)-1):
        s=torch.from_numpy(S_train[t]).to(DEVICE).unsqueeze(0)
        s2=torch.from_numpy(S_train[t+1]).to(DEVICE).unsqueeze(0)
        with torch.no_grad(): w=actor(s).cpu().numpy()[0]
        logits=np.log(np.clip(w,1e-9,1.0))+np.random.normal(0,noise_std,size=a_dim)
        w_e=np.exp(logits); w_e=w_e/(w_e.sum()+1e-12)
        r=port_reward_cluster(w_e,R_train[t],prev_w_cluster=prev_w,train_penalty=True); prev_w=w_e.copy()
        buf.push(S_train[t],w_e,r,S_train[t+1])
        if len(buf.buf)>=min_buffer:
            sb,ab,rb,s2b=buf.sample(BATCH_SIZE)
            sb=torch.from_numpy(sb).to(DEVICE); ab=torch.from_numpy(ab).to(DEVICE)
            rb=torch.from_numpy(rb).to(DEVICE); s2b=torch.from_numpy(s2b).to(DEVICE)
            with torch.no_grad(): a2=t_actor(s2b); q2=t_critic(s2b,a2); y=rb+GAMMA*q2
            q=critic(sb,ab); lc=mse(q,y); optC.zero_grad(); lc.backward(); optC.step()
            ap=actor(sb); la=-critic(sb,ap).mean(); optA.zero_grad(); la.backward(); optA.step()
            soft_update(actor,t_actor,TAU); soft_update(critic,t_critic,TAU)
            c_loss+=float(lc.item()); a_loss+=float(la.item())
    noise_std*=NOISE_DECAY
    print(f"[DDPG] Epoch {ep+1}/{EPOCHS} | Critic {c_loss:.4f} | Actor {a_loss:.4f} | noise {noise_std:.5f}")
    gc.collect(); torch.cuda.empty_cache()

torch.save(actor.state_dict(), os.path.join(MODEL_DIR,"ddpg_actor.pt"))

# ====== Backtest (with extra logs) ======
dates=pd.DatetimeIndex(d_test).sort_values()
capital=INIT_CAPITAL; nav=capital
portfolio_value=pd.Series(index=dates,dtype="float64"); portfolio_value.iloc[0]=capital
trade_log=[] 
regime_daily_list=[] 
holdings_lastday_rows=[]
cw_path=os.path.join(OUTPUT_DIR,"cluster_weights_test.csv")
with open(cw_path,"w",newline="") as f:
    cw=csv.writer(f)
    cw.writerow(["date"] + [f"cluster_{c}" for c in clusters] + ["cash_buffer"])

weights_df=pd.DataFrame(0.0,index=dates,columns=tickers,dtype="float32")
holdings={}; prev_w_ticker=pd.Series(0.0,index=tickers,dtype="float32")

# ====== Backtest main loop (FULL with logging) ======
for i, dt in enumerate(dates):
    # prepare state for actor: use previous state to avoid leakage
    s_idx = i-1 if i>0 else i
    s = torch.from_numpy(S_test[s_idx].astype("float32")).to(DEVICE).unsqueeze(0)
    with torch.no_grad():
        w_c = actor(s).cpu().numpy()[0].astype("float32")
    w_c = np.clip(w_c, 0.0, 1.0)
    if w_c.sum() > 0:
        w_c = w_c / w_c.sum()
    else:
        w_c = np.ones_like(w_c) / len(w_c)

    # --- Determine regime majority & cash buffer (compute BEFORE writing cluster weights row) ---
    cash_buffer = 0.15  # default
    majority_regime = None
    try:
        todays = df_px[df_px["date"] == dt]
        if "entry_regime" in todays.columns:
            mode = todays["entry_regime"].mode()
            if len(mode) > 0:
                majority_regime = mode.iloc[0]
                if majority_regime == "bull":   cash_buffer = 0.02
                elif majority_regime == "bear": cash_buffer = 0.35
                else:                           cash_buffer = 0.15
    except Exception:
        majority_regime = None

    # save cluster weights (include cash_buffer)
    with open(cw_path, "a", newline="") as f:
        cw = csv.writer(f)
        cw.writerow([dt.strftime("%Y-%m-%d")] + [float(x) for x in w_c] + [float(cash_buffer)])

    # --- Map cluster weights down to tickers ---
    score = pd.Series(0.0, index=tickers, dtype="float32")
    # safe access to active row
    if dt in sig_wide_lag.index:
        active_matrix_row = sig_wide_lag.loc[dt]
    else:
        # empty series aligned to tickers
        active_matrix_row = pd.Series(0.0, index=tickers)

    for j, c in enumerate(clusters):
        members = cluster_members[c]
        if not members: continue
        try:
            act_row = active_matrix_row[members]
        except Exception:
            act_row = pd.Series(0.0, index=members)
        valid = [tk for tk in members if (tk in act_row.index and act_row[tk] > 0)]
        if len(valid) >= MIN_NAMES_PER_CLUSTER:
            per_tk = float(w_c[j] / len(valid))
            score.loc[valid] += per_tk

    # --- Select top-k tickers dynamic ---
    candidates = score[score > 0].sort_values(ascending=False)
    held_now = list(holdings.keys())
    combined_candidates = pd.concat([candidates, pd.Series(0.0, index=held_now)]).groupby(level=0).first()
    topk = combined_candidates.sort_values(ascending=False).head(MAX_HOLD_TICKERS).index.tolist()

    target_scores = score.loc[topk].fillna(0.0)
    total_score = target_scores.sum()
    investable = 1.0 - cash_buffer
    if total_score <= 1e-12:
        target_w_ticker = pd.Series(0.0, index=tickers, dtype="float32")
    else:
        alloc = (target_scores / total_score) * investable
        target_w_ticker = pd.Series(0.0, index=tickers, dtype="float32")
        target_w_ticker.loc[alloc.index] = alloc.values

    # --- Turnover cost (apply immediately as reduction of NAV) ---
    turnover = float(np.sum(np.abs(target_w_ticker.values - prev_w_ticker.values)))
    fee = (COST_BPS / 1e4) * turnover
    nav = nav * (1.0 - fee)

    # --- Detect new open / close ---
    new_open = [tk for tk in target_w_ticker.index if (prev_w_ticker.loc[tk] == 0.0 and target_w_ticker.loc[tk] > 0.0)]
    closed  = [tk for tk in prev_w_ticker.index if (prev_w_ticker.loc[tk] > 0.0 and target_w_ticker.loc[tk] == 0.0)]

    # --- Forced exits (planned exit or max_hold_days) ---
    to_force_exit = []
    for tk, info in list(holdings.items()):
        planned_exit = info.get("planned_exit_date", pd.NaT)
        entry_dt = info["entry_date"]
        if pd.notna(planned_exit) and planned_exit <= dt:
            to_force_exit.append(tk); continue
        hold_days = (dt - entry_dt).days
        if isinstance(hold_days, (int, np.integer)) and hold_days >= MAX_HOLD_DAYS:
            to_force_exit.append(tk)

    for tk in to_force_exit:
        info = holdings.pop(tk)
        entry_dt = info["entry_date"]
        try:
            rec = realized.loc[(tk, entry_dt)]
            exit_price = rec["exit_price"]; exit_date = rec["exit_date"]
            rret = rec["realized_return"]; etype = rec["exit_type"]; fee_tr = rec.get("fee", (COST_BPS/1e4)*abs(prev_w_ticker.loc[tk]))
        except Exception:
            exit_price = px_wide.loc[dt, tk] if (dt in px_wide.index and tk in px_wide.columns) else np.nan
            exit_date = dt
            rret = np.log(exit_price / info["entry_price"]) if pd.notna(exit_price) and info["entry_price"]>0 else np.nan
            etype = "forced"; fee_tr = (COST_BPS/1e4) * abs(prev_w_ticker.loc[tk])
        trade_log.append({
            "entry_date": entry_dt, "exit_date": exit_date, "ticker": tk,
            "entry_price": info["entry_price"], "exit_price": exit_price,
            "entry_regime": info.get("entry_regime", None),
            "exit_type": etype, "tp_level": info.get("tp_level", np.nan),
            "sl_level": info.get("sl_level", np.nan), "realized_return": rret,
            "holding_days": (exit_date - entry_dt).days if pd.notna(exit_date) else np.nan,
            "fee": fee_tr
        })
        prev_w = prev_w_ticker.loc[tk]
        if not pd.isna(rret):
            nav = nav * (1.0 + prev_w * rret)
        prev_w_ticker.loc[tk] = 0.0

    # --- Open new positions ---
    for tk in new_open:
        if len(holdings) >= MAX_HOLD_TICKERS: continue
        entry_price = px_wide.loc[dt, tk] if (dt in px_wide.index and tk in px_wide.columns) else np.nan
        planned_exit_date = np.nan; planned_exit_price = np.nan; tp_level = np.nan; sl_level = np.nan; entry_regime = np.nan
        try:
            rec = realized.loc[(tk, dt)]
            planned_exit_date = rec["exit_date"]; planned_exit_price = rec["exit_price"]
            tp_level = rec.get("tp_level", np.nan); sl_level = rec.get("sl_level", np.nan)
            entry_regime = rec.get("entry_regime", np.nan)
        except Exception: pass
        holdings[tk] = {
            "entry_date": dt, "entry_price": entry_price,
            "planned_exit_date": planned_exit_date, "planned_exit_price": planned_exit_price,
            "entry_regime": entry_regime, "tp_level": tp_level, "sl_level": sl_level
        }

    # --- Rebalance and daily PnL (mark-to-market using daily_ret) ---
    current_weights = target_w_ticker.copy()
    if current_weights.sum() > 1.0:
        current_weights = current_weights / current_weights.sum()
    # safe get r_vec
    if dt in daily_ret.index:
        r_vec = daily_ret.loc[dt].reindex(current_weights.index).fillna(0.0).values
    else:
        r_vec = np.zeros(len(current_weights), dtype=float)
    port_daily_ret = float(np.dot(current_weights.values, r_vec))
    nav = nav * (1.0 + port_daily_ret)

    weights_df.loc[dt] = current_weights
    prev_w_ticker = current_weights.copy()

    # --- Planned exits today (finalize trade log and remove holdings) ---
    for tk in list(holdings.keys()):
        info = holdings[tk]
        planned_exit = info.get("planned_exit_date", pd.NaT)
        if pd.notna(planned_exit) and planned_exit == dt:
            try:
                rec = realized.loc[(tk, info["entry_date"])]
                exit_price = rec["exit_price"]; exit_date = rec["exit_date"]; rret = rec["realized_return"]
                etype = rec.get("exit_type", "exit"); fee_tr = rec.get("fee", (COST_BPS/1e4)*abs(prev_w_ticker.loc[tk]))
            except Exception:
                exit_price = px_wide.loc[dt, tk] if (dt in px_wide.index and tk in px_wide.columns) else np.nan
                exit_date = dt
                rret = np.log(exit_price / info["entry_price"]) if pd.notna(exit_price) and info["entry_price"]>0 else np.nan
                etype = "exit"; fee_tr = (COST_BPS/1e4)*abs(prev_w_ticker.loc[tk])
            trade_log.append({
                "entry_date": info["entry_date"], "exit_date": exit_date, "ticker": tk,
                "entry_price": info["entry_price"], "exit_price": exit_price,
                "entry_regime": info.get("entry_regime", None), "exit_type": etype,
                "tp_level": info.get("tp_level", np.nan), "sl_level": info.get("sl_level", np.nan),
                "realized_return": rret, "holding_days": (exit_date - info["entry_date"]).days if pd.notna(exit_date) else np.nan,
                "fee": fee_tr
            })
            prev_w_ticker.loc[tk] = 0.0
            holdings.pop(tk, None)

    # --- Save NAV + Logs daily ---
    portfolio_value.iloc[i] = nav
    regime_daily_list.append({
        "date": dt, "regime": majority_regime, "cash_buffer": cash_buffer, "nav": nav
    })
    for tk, info in holdings.items():
        holdings_lastday_rows.append({
            "snapshot_date": dt, "ticker": tk,
            "entry_date": info["entry_date"], "entry_price": info["entry_price"],
            "tp_level": info.get("tp_level", np.nan), "sl_level": info.get("sl_level", np.nan),
            "entry_regime": info.get("entry_regime", None),
            "weight": float(current_weights.loc[tk]) if tk in current_weights.index else 0.0,
            "nav": nav
        })

    if (i % 50) == 0:
        gc.collect(); torch.cuda.empty_cache()

# ====== Save outputs ======
portfolio_value.to_frame("portfolio_value").to_csv(os.path.join(OUTPUT_DIR,"portfolio_value_test.csv"))
weights_df.to_csv(os.path.join(OUTPUT_DIR,"weights_by_ticker.csv"))
pd.DataFrame(trade_log).to_csv(os.path.join(OUTPUT_DIR,"trade_log.csv"),index=False)
pd.DataFrame(regime_daily_list).to_csv(os.path.join(OUTPUT_DIR,"regime_daily.csv"),index=False)
pd.DataFrame(holdings_lastday_rows).to_csv(os.path.join(OUTPUT_DIR,"holdings_lastday.csv"),index=False)

# ==== Benchmark ====
print("Fetching VNINDEX for benchmark...")
client=FiinSession(username="DSTC_18@fiinquant.vn",password="Fiinquant0606").login()
bench=client.Fetch_Trading_Data(realtime=False,tickers=BENCHMARK_TKR,fields=['close'],
    adjusted=True,by="1d",from_date=str(dates.min().date())).get_data()
bench["date"]=pd.to_datetime(bench["timestamp"])
bench=bench.set_index("date")["close"].sort_index().reindex(dates).ffill().bfill()
bench_ret=bench.pct_change().fillna(0.0)
benchmark_value=(1+bench_ret).cumprod()*INIT_CAPITAL
benchmark_value.to_frame("benchmark_value").to_csv(os.path.join(OUTPUT_DIR,"benchmark_value_test.csv"))

torch.save(actor.state_dict(), os.path.join(MODEL_DIR,"ddpg_actor.pt"))
print("✅ Done Block 10 with extra logs")

[DDPG] Epoch 1/120 | Critic 0.0000 | Actor 0.0000 | noise 0.14550
[DDPG] Epoch 2/120 | Critic 0.0010 | Actor -8.7141 | noise 0.14113
[DDPG] Epoch 3/120 | Critic 0.0056 | Actor -9.7634 | noise 0.13690
[DDPG] Epoch 4/120 | Critic 0.0040 | Actor -6.7523 | noise 0.13279
[DDPG] Epoch 5/120 | Critic 0.0009 | Actor -6.5903 | noise 0.12881
[DDPG] Epoch 6/120 | Critic 0.0003 | Actor -6.5519 | noise 0.12495
[DDPG] Epoch 7/120 | Critic 0.0005 | Actor -6.5082 | noise 0.12120
[DDPG] Epoch 8/120 | Critic 0.0007 | Actor -7.3679 | noise 0.11756
[DDPG] Epoch 9/120 | Critic 0.0005 | Actor -7.5439 | noise 0.11403
[DDPG] Epoch 10/120 | Critic 0.0004 | Actor -7.2145 | noise 0.11061
[DDPG] Epoch 11/120 | Critic 0.0003 | Actor -6.9752 | noise 0.10730
[DDPG] Epoch 12/120 | Critic 0.0004 | Actor -7.0616 | noise 0.10408
[DDPG] Epoch 13/120 | Critic 0.0004 | Actor -6.8767 | noise 0.10095
[DDPG] Epoch 14/120 | Critic 0.0003 | Actor -7.0262 | noise 0.09793
[DDPG] Epoch 15/120 | Critic 0.0003 | Actor -7.1496 | nois
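
The allocation and cost step of Block 10 can be checked in isolation on toy numbers. The sketch below is a minimal restatement under stated assumptions: `allocate_weights` and `turnover_fee` are illustrative names that do not exist in the notebook, the tickers are placeholders, and the 15 bps cost is an assumed value rather than the notebook's `COST_BPS` setting.

```python
import numpy as np
import pandas as pd

def allocate_weights(scores: pd.Series, tickers: list, cash_buffer: float) -> pd.Series:
    """Spread (1 - cash_buffer) across tickers in proportion to their scores."""
    weights = pd.Series(0.0, index=tickers, dtype="float64")
    scores = scores.fillna(0.0)
    total = scores.sum()
    if total > 1e-12:  # otherwise stay fully in cash
        weights.loc[scores.index] = (scores / total * (1.0 - cash_buffer)).values
    return weights

def turnover_fee(target: pd.Series, previous: pd.Series, cost_bps: float) -> float:
    """Fee as a fraction of NAV: cost per unit of traded weight times turnover."""
    prev = previous.reindex(target.index).fillna(0.0)
    turnover = float(np.abs(target - prev).sum())
    return cost_bps / 1e4 * turnover

tickers = ["AAA", "BBB", "CCC"]            # placeholder tickers
prev_w = pd.Series(0.0, index=tickers)     # start fully in cash
scores = pd.Series({"AAA": 3.0, "BBB": 1.0})
target_w = allocate_weights(scores, tickers, cash_buffer=0.2)
fee = turnover_fee(target_w, prev_w, cost_bps=15)  # 15 bps is an assumed cost
print(target_w.to_dict(), fee)
```

With a 20% cash buffer the weights sum to 0.8, and the fee scales linearly with the total absolute change in weights, which is how the loop above reduces NAV before marking to market.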

**BLOCK 10.5**

In [100]:
# Block 10.5 — Save enriched artifacts for Telegram reporting (full, fixed snapshot carry-over)
import os
import json
import math
from datetime import timedelta
import pandas as pd
import numpy as np

# -------- CONFIG & PATHS --------
OUTPUT_DIR = "./backtest_ddpg/"
os.makedirs(OUTPUT_DIR, exist_ok=True)

CONFIG_PATH = "config.json"
# default execution lag (days) if not specified in config
DEFAULT_EXECUTION_LAG = 2

# --------- Load config.json (optional) ----------
cfg = {}
if os.path.exists(CONFIG_PATH):
    try:
        with open(CONFIG_PATH, "r", encoding="utf-8") as f:
            cfg = json.load(f)
    except Exception as e:
        print("Warning: cannot parse config.json:", e)
EXECUTION_LAG = int(cfg.get("pipeline", {}).get("execution_lag", DEFAULT_EXECUTION_LAG))

print(f"[Info] OUTPUT_DIR = {OUTPUT_DIR}")
print(f"[Info] EXECUTION_LAG = {EXECUTION_LAG} days (settlement)")

# --------- Helper functions ----------
def safe_parse_dates(df, cols):
    for c in cols:
        if c in df.columns:
            df[c] = pd.to_datetime(df[c], errors="coerce")
    return df

def choose_column(df, candidates, default=None):
    """Return first existing column name from candidates or default."""
    for c in candidates:
        if c in df.columns:
            return c
    return default

def ensure_col(df, col, default_val=np.nan):
    if col not in df.columns:
        df[col] = default_val
    return df

def get_price_on_or_before(px_wide, price_map, ticker, dt):
    """
    Try price_map[(ticker, dt)] exact; else fallback to px_wide[ticker].loc[:dt].iloc[-1] if available.
    Returns np.nan if cannot find.
    """
    if pd.isna(dt):
        return np.nan
    try:
        # exact lookup first (if df_backtest-based price_map exists)
        return float(price_map.loc[(ticker, dt)])
    except Exception:
        pass
    try:
        if ticker in px_wide.columns:
            ser = px_wide[ticker].loc[:dt].dropna()
            if len(ser) > 0:
                return float(ser.iloc[-1])
    except Exception:
        pass
    return np.nan

# --------- Load trade_log (expected from Block 10) ----------
trade_log_path = os.path.join(OUTPUT_DIR, "trade_log.csv")
if not os.path.exists(trade_log_path):
    raise FileNotFoundError(f"trade_log.csv not found in {OUTPUT_DIR}. Please run Block 10 first.")

trade_log = pd.read_csv(trade_log_path)
# try to parse common date columns
trade_log = safe_parse_dates(trade_log, ["entry_date", "exit_date", "entry_dt", "exit_dt", "timestamp"])

# normalize column names: prefer 'entry_price', 'exit_price' etc.
entry_price_col = choose_column(trade_log, ["entry_price", "entry_trade_price", "entry_trade_px", "entry_px"])
exit_price_col = choose_column(trade_log, ["exit_price", "exit_trade_price", "exit_trade_px", "exit_px"])

# ensure we have columns named 'entry_price' and 'exit_price' for downstream code
if entry_price_col and entry_price_col != "entry_price":
    trade_log = trade_log.rename(columns={entry_price_col: "entry_price"})
elif "entry_price" not in trade_log.columns:
    trade_log["entry_price"] = np.nan

if exit_price_col and exit_price_col != "exit_price":
    trade_log = trade_log.rename(columns={exit_price_col: "exit_price"})
elif "exit_price" not in trade_log.columns:
    trade_log["exit_price"] = np.nan

# ensure entry_date/exit_date columns exist under these canonical names
if "entry_date" not in trade_log.columns:
    cand = choose_column(trade_log, ["entry_dt", "timestamp", "date"])
    if cand:
        trade_log = trade_log.rename(columns={cand: "entry_date"})
trade_log = safe_parse_dates(trade_log, ["entry_date"])

if "exit_date" not in trade_log.columns:
    trade_log["exit_date"] = pd.NaT
else:
    trade_log = safe_parse_dates(trade_log, ["exit_date"])

# fill missing ticker column check
if "ticker" not in trade_log.columns:
    raise KeyError("trade_log.csv must contain a 'ticker' column")

# canonical: entry_price, exit_price, entry_date (datetime), exit_date (datetime), ticker
trade_log = ensure_col(trade_log, "entry_price", np.nan)
trade_log = ensure_col(trade_log, "exit_price", np.nan)

# --------- Load df_backtest (prices) ----------
df_backtest_path = os.path.join(OUTPUT_DIR, "df_backtest.csv")
# Capture any in-memory df_backtest first: rebinding the name to None below would
# otherwise overwrite the notebook global and make the memory fallback unreachable.
_df_backtest_mem = globals().get("df_backtest", None)
df_backtest = None
if os.path.exists(df_backtest_path):
    try:
        df_backtest = pd.read_csv(df_backtest_path)
    except Exception as e:
        print("Warning: cannot read df_backtest.csv:", e)

# If df_backtest is not on disk, fall back to the DataFrame held in memory
if df_backtest is None and isinstance(_df_backtest_mem, pd.DataFrame):
    df_backtest = _df_backtest_mem.copy()
    # save a copy to OUTPUT_DIR for reproducibility
    try:
        df_backtest.to_csv(df_backtest_path, index=False)
        print(f"[Info] Saved df_backtest from memory to {df_backtest_path}")
    except Exception as e:
        print("Warning: cannot save df_backtest.csv:", e)

if df_backtest is None:
    # no price history available from disk or memory: stop here
    raise FileNotFoundError(f"df_backtest.csv not found in {OUTPUT_DIR} and variable df_backtest not present in memory. Block 10 must produce price history (df_backtest).")

# fix timestamp/date column name
if "timestamp" not in df_backtest.columns:
    if "date" in df_backtest.columns:
        df_backtest = df_backtest.rename(columns={"date": "timestamp"})
    elif "datetime" in df_backtest.columns:
        df_backtest = df_backtest.rename(columns={"datetime": "timestamp"})
    else:
        raise KeyError("df_backtest must contain a 'timestamp' or 'date' column")

df_backtest["timestamp"] = pd.to_datetime(df_backtest["timestamp"], errors="coerce")
if df_backtest["timestamp"].isna().all():
    raise ValueError("df_backtest timestamp column could not be parsed as datetimes")

# ensure we have close price col
if "close" not in df_backtest.columns:
    raise KeyError("df_backtest must contain 'close' column")

# create price structures
price_map = df_backtest.set_index(["ticker", "timestamp"])["close"]
px_wide = df_backtest.pivot(index="timestamp", columns="ticker", values="close").sort_index()

# --------- Load weights_by_ticker, portfolio value, regime_daily if present ----------
weights_path = os.path.join(OUTPUT_DIR, "weights_by_ticker.csv")
weights_df = pd.read_csv(weights_path, index_col=0, parse_dates=True) if os.path.exists(weights_path) else None

pv_path = os.path.join(OUTPUT_DIR, "portfolio_value_test.csv")
pv = None
if os.path.exists(pv_path):
    try:
        pv = pd.read_csv(pv_path, index_col=0, parse_dates=True).iloc[:, 0]
    except Exception:
        pv = None

regime_path = os.path.join(OUTPUT_DIR, "regime_daily.csv")
regime_df = pd.read_csv(regime_path, parse_dates=["date"]).set_index("date") if os.path.exists(regime_path) else None

# --------- Prepare trade_log canonical columns and settlement dates ----------
# Add status column (open/closed) based on exit_date presence
trade_log["status"] = np.where(pd.notna(trade_log["exit_date"]), "closed", "open")

# compute settlement dates using EXECUTION_LAG (if trade already contains 'settlement' columns, prefer them)
# prefer user-provided settle columns if exist:
entry_settle_col = choose_column(trade_log, ["entry_settlement_date", "entry_settle_date", "entry_settled"])
exit_settle_col  = choose_column(trade_log, ["exit_settlement_date", "exit_settle_date", "exit_settled"])

if entry_settle_col:
    trade_log["entry_settle_date"] = pd.to_datetime(trade_log[entry_settle_col], errors="coerce")
else:
    trade_log["entry_settle_date"] = trade_log["entry_date"] + pd.to_timedelta(EXECUTION_LAG, unit="D")

if exit_settle_col:
    trade_log["exit_settle_date"] = pd.to_datetime(trade_log[exit_settle_col], errors="coerce")
else:
    # if exit_date is NaT, keep exit_settle_date as NaT
    trade_log["exit_settle_date"] = pd.to_datetime(trade_log["exit_date"], errors="coerce") + pd.to_timedelta(EXECUTION_LAG, unit="D")
    trade_log.loc[trade_log["exit_date"].isna(), "exit_settle_date"] = pd.NaT

# --------- Choose date range to iterate (based on portfolio nav if possible) ----------
if pv is not None:
    start_date = pd.to_datetime(pv.index.min())
    end_date = pd.to_datetime(pv.index.max())
else:
    # fallback to union of df_backtest dates
    start_date = df_backtest["timestamp"].min()
    end_date = df_backtest["timestamp"].max()

dates = pd.date_range(start_date, end_date, freq="D")
print(f"[Info] Will generate snapshots for {len(dates)} days: {start_date.date()} -> {end_date.date()}")

# --------- Create output subfolders (optional) ----------
SNAP_DIR = OUTPUT_DIR  # same folder; user requested simple filenames
# --------- Main loop: for each day, create snapshots, signals, payload ----------
created_snapshots = 0
created_signals = 0
created_payloads = 0

# Prices are looked up per call via get_price_on_or_before
# (exact (ticker, date) match first, then the last price on or before the date)
for i, d in enumerate(dates):
    # use pd.Timestamp (normalize to midnight)
    d = pd.Timestamp(d).normalize()
    # --- determine open positions at EOD d using settlement dates ---
    # A position is considered "open at EOD d" if entry_settle_date <= d and (exit_settle_date is NA or exit_settle_date > d)
    mask_open = (trade_log["entry_settle_date"].notna()) & (trade_log["entry_settle_date"] <= d) & (
        (trade_log["exit_settle_date"].isna()) | (trade_log["exit_settle_date"] > d)
    )
    open_pos_df = trade_log.loc[mask_open].copy().reset_index(drop=True)

    # enrich open positions with last_price as of d (use get_price_on_or_before)
    enriched_rows = []
    for idx, row in open_pos_df.iterrows():
        tk = row["ticker"]
        entry_px = row.get("entry_price", np.nan)
        last_px = get_price_on_or_before(px_wide, price_map, tk, d)
        if not pd.isna(entry_px) and not pd.isna(last_px):
            unreal_pct = (last_px / float(entry_px) - 1.0) * 100.0
        else:
            unreal_pct = np.nan
        # position size pct from weights_df if exists
        pos_size = 0.0
        if weights_df is not None:
            try:
                if d in weights_df.index and tk in weights_df.columns:
                    pos_size = float(weights_df.loc[d, tk])
                else:
                    pos_size = 0.0
            except Exception:
                pos_size = 0.0
        enriched_rows.append({
            "snapshot_date": d,
            "ticker": tk,
            "entry_date": row.get("entry_date", pd.NaT),
            "entry_settle_date": row.get("entry_settle_date", pd.NaT),
            "entry_price": entry_px,
            "last_price": last_px,
            "current_unrealized_pct": unreal_pct,
            "tp_level": row.get("tp_level", np.nan),
            "sl_level": row.get("sl_level", np.nan),
            "entry_regime": row.get("entry_regime", None),
            "position_size_pct": pos_size
        })

    snapshot_df = pd.DataFrame(enriched_rows, columns=[
        "snapshot_date","ticker","entry_date","entry_settle_date","entry_price","last_price",
        "current_unrealized_pct","tp_level","sl_level","entry_regime","position_size_pct"
    ])
    # Save snapshot even if empty
    snap_fname = os.path.join(SNAP_DIR, f"positions_snapshot_{d.date()}.csv")
    snapshot_df.to_csv(snap_fname, index=False)
    created_snapshots += 1

    # --- signals_today: rows where entry_date == d or exit_date == d (these are trade timestamps, not settle) ---
    signals_mask = ( (trade_log["entry_date"].notna()) & (trade_log["entry_date"].dt.normalize() == d) ) | \
                   ( (trade_log["exit_date"].notna()) & (trade_log["exit_date"].dt.normalize() == d) )
    signals_df = trade_log.loc[signals_mask].copy().reset_index(drop=True)
    # add action column
    def _action(r):
        if pd.notna(r.get("entry_date")) and pd.to_datetime(r["entry_date"]).normalize() == d and (pd.isna(r.get("exit_date")) or pd.to_datetime(r.get("exit_date")).normalize() != d):
            return "BUY"
        if pd.notna(r.get("exit_date")) and pd.to_datetime(r["exit_date"]).normalize() == d:
            # If both entry and exit same day, treat exit as SELL
            return "SELL"
        return "BUY" if pd.notna(r.get("entry_date")) and pd.to_datetime(r["entry_date"]).normalize() == d else "SELL"
    if not signals_df.empty:
        signals_df["action"] = signals_df.apply(_action, axis=1)
    # Save signals file
    sig_fname = os.path.join(SNAP_DIR, f"signals_today_{d.date()}.csv")
    # select sensible columns to save
    out_sig_cols = []
    for c in ["ticker","action","entry_date","exit_date","entry_price","exit_price","tp_level","sl_level","exit_type","status"]:
        if c in signals_df.columns:
            out_sig_cols.append(c)
    if len(out_sig_cols)==0:
        # fallback to save everything
        signals_df.to_csv(sig_fname, index=False)
    else:
        signals_df[out_sig_cols].to_csv(sig_fname, index=False)
    created_signals += 1

    # --- telegram payload JSON for this date ---
    # regime and cash_buffer
    regime_val = None
    cash_buffer = None
    if regime_df is not None and d in regime_df.index:
        regime_row = regime_df.loc[d]
        # duplicate dates return a DataFrame; use the last row for that day
        if isinstance(regime_row, pd.DataFrame):
            regime_row = regime_row.iloc[-1]
        # accept either 'regime' or 'entry_regime' as the regime column
        regime_val = regime_row.get("regime", regime_row.get("entry_regime", None))
        cash_buffer = regime_row.get("cash_buffer", None)
    # nav at date
    nav_val = None
    if pv is not None:
        try:
            # take last NAV at or before d
            nav_val = float(pv.loc[:d].iloc[-1])
        except Exception:
            nav_val = None

    # Build payload dict
    payload = {
        "date": str(d.date()),
        "regime": regime_val if regime_val is not None else None,
        "cash_buffer": float(cash_buffer) if (cash_buffer is not None and not pd.isna(cash_buffer)) else None,
        "nav": float(nav_val) if (nav_val is not None and not pd.isna(nav_val)) else None,
        "positions": [],
        "signals": []
    }

    # positions: from snapshot_df
    for _, r in snapshot_df.iterrows():
        payload["positions"].append({
            "ticker": r["ticker"],
            "entry_date": str(r["entry_date"].date()) if pd.notna(r["entry_date"]) else None,
            "entry_settle_date": str(r["entry_settle_date"].date()) if pd.notna(r["entry_settle_date"]) else None,
            "entry_price": float(r["entry_price"]) if not pd.isna(r["entry_price"]) else None,
            "last_price": float(r["last_price"]) if not pd.isna(r["last_price"]) else None,
            "unrealized_pct": float(r["current_unrealized_pct"]) if not pd.isna(r["current_unrealized_pct"]) else None,
            "tp_level": float(r["tp_level"]) if not pd.isna(r["tp_level"]) else None,
            "sl_level": float(r["sl_level"]) if not pd.isna(r["sl_level"]) else None,
            "position_size_pct": float(r["position_size_pct"]) if not pd.isna(r["position_size_pct"]) else None,
            "entry_regime": r.get("entry_regime", None)
        })

    # signals: from signals_df
    for _, r in signals_df.iterrows():
        action = r.get("action", None) if "action" in r.index else None
        payload["signals"].append({
            "ticker": r.get("ticker", None),
            "action": action,
            "entry_date": str(r["entry_date"].date()) if pd.notna(r.get("entry_date")) else None,
            "exit_date": str(r["exit_date"].date()) if pd.notna(r.get("exit_date")) else None,
            "entry_price": float(r["entry_price"]) if not pd.isna(r.get("entry_price", np.nan)) else None,
            "exit_price": float(r["exit_price"]) if not pd.isna(r.get("exit_price", np.nan)) else None,
            "tp_level": float(r.get("tp_level")) if (r.get("tp_level") is not None and not pd.isna(r.get("tp_level"))) else None,
            "sl_level": float(r.get("sl_level")) if (r.get("sl_level") is not None and not pd.isna(r.get("sl_level"))) else None,
            "exit_type": r.get("exit_type", None),
            "status": r.get("status", None)
        })

    payload_fname = os.path.join(SNAP_DIR, f"telegram_payload_{d.date()}.json")
    with open(payload_fname, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False, indent=2, default=str)
    created_payloads += 1

    # progress print
    if (i + 1) % 50 == 0 or (i + 1) == len(dates):
        print(f"[Progress] Generated {i+1}/{len(dates)} days - last date: {d.date()}")

# --------- Enrich trade_log with last price as-of last_date and unrealized PnL where open ----------
# We'll compute 'last_price' (using end_date) and 'current_unrealized_pct' for open trades
last_date = pd.Timestamp(dates.max()).normalize()
trade_log_detailed = trade_log.copy()

# compute last price per trade row (use last available price on or before last_date)
_last_prices = []
_curr_unreal = []
for idx, r in trade_log_detailed.iterrows():
    tk = r["ticker"]
    entry_px = r.get("entry_price", np.nan)
    last_px = get_price_on_or_before(px_wide, price_map, tk, last_date)
    _last_prices.append(last_px)
    if pd.notna(entry_px) and pd.notna(last_px):
        _curr_unreal.append((last_px / float(entry_px) - 1.0) * 100.0)
    else:
        _curr_unreal.append(np.nan)

trade_log_detailed["last_price_asof"] = _last_prices
trade_log_detailed["current_unrealized_pct_asof_last_date"] = _curr_unreal

# Save trade_log_detailed
detailed_path = os.path.join(OUTPUT_DIR, "trade_log_detailed.csv")
trade_log_detailed.to_csv(detailed_path, index=False)
print(f"[Saved] trade_log_detailed -> {detailed_path}")

# Summary prints
print("=== Done Block 10.5 ===")
print(f"Snapshots created: {created_snapshots} (files positions_snapshot_YYYY-MM-DD.csv)")
print(f"Signals created:   {created_signals} (files signals_today_YYYY-MM-DD.csv)")
print(f"Payloads created:  {created_payloads} (files telegram_payload_YYYY-MM-DD.json)")
print(f"Trade log detailed: {detailed_path}")
print(f"Example snapshot file (first): {os.path.join(SNAP_DIR, f'positions_snapshot_{dates[0].date()}.csv')}")

[Info] OUTPUT_DIR = ./backtest_ddpg/
[Info] EXECUTION_LAG = 2 days (settlement)
[Info] Will generate snapshots for 237 days: 2025-02-05 -> 2025-09-29
[Progress] Generated 50/237 days - last date: 2025-03-26
[Progress] Generated 100/237 days - last date: 2025-05-15
[Progress] Generated 150/237 days - last date: 2025-07-04
[Progress] Generated 200/237 days - last date: 2025-08-23
[Progress] Generated 237/237 days - last date: 2025-09-29
[Saved] trade_log_detailed -> ./backtest_ddpg/trade_log_detailed.csv
=== Done Block 10.5 ===
Snapshots created: 237 (files positions_snapshot_YYYY-MM-DD.csv)
Signals created:   237 (files signals_today_YYYY-MM-DD.csv)
Payloads created:  237 (files telegram_payload_YYYY-MM-DD.json)
Trade log detailed: ./backtest_ddpg/trade_log_detailed.csv
Example snapshot file (first): ./backtest_ddpg/positions_snapshot_2025-02-05.csv
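
The rule Block 10.5 uses to decide which positions are open at the end of a day can be illustrated on a toy trade log. This is a sketch under assumptions: the tickers and dates are made up, and `open_positions` is a hypothetical helper, not a function from the notebook. It mirrors the mask above, where a position counts as open once its entry has settled and until its exit has settled.

```python
import pandas as pd

EXECUTION_LAG = 2  # calendar days, matching the default used in Block 10.5

trades = pd.DataFrame({
    "ticker": ["AAA", "BBB"],                                  # placeholder tickers
    "entry_date": pd.to_datetime(["2025-02-05", "2025-02-06"]),
    "exit_date": pd.to_datetime(["2025-02-10", None]),         # BBB is still open
})
lag = pd.to_timedelta(EXECUTION_LAG, unit="D")
trades["entry_settle_date"] = trades["entry_date"] + lag
trades["exit_settle_date"] = trades["exit_date"] + lag         # NaT stays NaT

def open_positions(trades: pd.DataFrame, day: str) -> list:
    """Tickers whose entry has settled on or before `day` and whose exit has not."""
    d = pd.Timestamp(day).normalize()
    entered = trades["entry_settle_date"] <= d
    not_exited = trades["exit_settle_date"].isna() | (trades["exit_settle_date"] > d)
    return trades.loc[entered & not_exited, "ticker"].tolist()

for day in ["2025-02-06", "2025-02-08", "2025-02-12"]:
    print(day, open_positions(trades, day))
```

Because the lag is added in calendar days, a trade made on a Friday settles on a Sunday in this scheme. If trading-day settlement is intended, a business-day offset such as `pandas.tseries.offsets.BDay` would be one alternative, though it ignores exchange holidays.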


**Block 11: Performance statistics and charts**
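
Two of the headline metrics this block reports, annualised Sharpe and maximum drawdown, can be sanity-checked on a tiny series before trusting the full report. The NAV path below is made up for illustration, and the Sharpe formula assumes a zero risk-free rate and 252 trading days per year, the same convention as the helper functions in this block.

```python
import numpy as np
import pandas as pd

nav = pd.Series(
    [100.0, 102.0, 101.0, 97.0, 99.0, 104.0],      # made-up NAV path
    index=pd.bdate_range("2025-01-01", periods=6),
)
ret = nav.pct_change().fillna(0.0)

# Annualised Sharpe: mean daily return over daily volatility, scaled by sqrt(252)
sharpe = ret.mean() / ret.std() * np.sqrt(252)

# Maximum drawdown: worst percentage drop from a running peak
drawdown = nav / nav.cummax() - 1.0
max_dd = drawdown.min()

print(f"Sharpe={sharpe:.2f}, MaxDD={max_dd:.2%}")
```

Here the worst drawdown runs from the peak of 102 to the trough of 97, so `max_dd` equals `97/102 - 1`.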

In [80]:
# Block 11: Performance & Stress Test (final, full stats + regime analysis)
import os
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from scipy import stats

# ---- Config / paths ----
OUTPUT_DIR = "./backtest_ddpg/"
os.makedirs(OUTPUT_DIR, exist_ok=True)
STATS_FILE = os.path.join(OUTPUT_DIR, "stats_test.csv")

# Stress events are configured here
STRESS_EVENTS = [
    {"name": "Trump áp thuế 46%", "start": "2025-03-26", "end": "2025-04-15"},
]

# ---- Helpers ----
def max_drawdown(series: pd.Series) -> float:
    peak = series.cummax()
    dd = (series / peak) - 1.0
    return float(dd.min())

def calmar_ratio(port_val: pd.Series) -> float:
    dd = max_drawdown(port_val)
    if dd == 0 or np.isclose(dd, 0):
        return np.nan
    total_ret = port_val.iloc[-1] / port_val.iloc[0] - 1.0
    ann_ret = (1 + total_ret) ** (252.0 / len(port_val)) - 1.0
    return float(ann_ret / abs(dd))

def information_ratio(port_ret: pd.Series, bench_ret: pd.Series) -> float:
    ex = port_ret - bench_ret
    if ex.std() == 0:
        return np.nan
    return float(ex.mean() / ex.std() * np.sqrt(252))

def compute_alpha_beta(port_ret: pd.Series, bench_ret: pd.Series):
    idx = port_ret.index.intersection(bench_ret.index)
    if len(idx) < 2:
        return np.nan, np.nan
    y = port_ret.loc[idx].values
    x = bench_ret.loc[idx].values
    if np.isclose(x.std(), 0):
        return np.nan, np.nan
    X = sm.add_constant(x)
    model = sm.OLS(y, X).fit()
    alpha, beta = float(model.params[0]), float(model.params[1])
    return alpha, beta

def stats_from_series(port_val: pd.Series, bench_val: pd.Series | None = None):
    port_ret = port_val.pct_change().fillna(0.0)
    statsd = {}
    statsd["Ngày bắt đầu"] = port_val.index.min().strftime("%Y-%m-%d")
    statsd["Ngày kết thúc"] = port_val.index.max().strftime("%Y-%m-%d")
    statsd["Giá trị cuối"] = float(port_val.iloc[-1])
    statsd["ROI (%)"] = float((port_val.iloc[-1] / port_val.iloc[0] - 1.0) * 100)
    ann_vol = float(port_ret.std() * np.sqrt(252) * 100)
    statsd["Biến động (năm, %)"] = ann_vol
    statsd["Sharpe"] = float((port_ret.mean() / port_ret.std() * np.sqrt(252)) if port_ret.std() > 0 else 0.0)
    neg_std = port_ret[port_ret < 0].std()
    statsd["Sortino"] = float((port_ret.mean() / neg_std * np.sqrt(252)) if (neg_std > 0) else 0.0)
    statsd["MaxDrawdown (%)"] = float(max_drawdown(port_val) * 100)
    statsd["Tỷ lệ phiên thắng (%)"] = float((port_ret > 0).mean() * 100)
    statsd["Số ngày"] = int(len(port_ret))
    statsd["Skewness"] = float(stats.skew(port_ret.dropna()))
    statsd["Kurtosis"] = float(stats.kurtosis(port_ret.dropna()))
    statsd["Calmar"] = float(calmar_ratio(port_val))

    if bench_val is not None and len(bench_val) > 1:
        bench_ret = bench_val.pct_change().fillna(0.0)
        statsd["Giá trị cuối (VNINDEX)"] = float(bench_val.iloc[-1])
        statsd["ROI VNINDEX (%)"] = float((bench_val.iloc[-1] / bench_val.iloc[0] - 1.0) * 100)
        statsd["Chênh lệch so với VNINDEX (pp)"] = statsd["ROI (%)"] - statsd["ROI VNINDEX (%)"]
        statsd["InformationRatio"] = float(information_ratio(port_ret, bench_ret))
        a,b = compute_alpha_beta(port_ret, bench_ret)
        statsd["Alpha (daily)"] = float(a) if not pd.isna(a) else np.nan
        statsd["Beta"] = float(b) if not pd.isna(b) else np.nan

    return statsd

def plot_equity(port_val, bench_val, title, path):
    """Plot the strategy equity curve, with the benchmark if provided."""
    plt.figure(figsize=(10,6))
    plt.plot(port_val, label="Strategy", linewidth=1.6)
    if bench_val is not None:
        plt.plot(bench_val, label="VNINDEX (Buy&Hold)", linewidth=1.2)
    plt.title(title); plt.legend(); plt.grid(True, alpha=0.3)
    plt.savefig(path, dpi=150); plt.close()

def plot_hist(port_ret, title, path, bench_ret=None):
    """Plot a histogram of daily returns (in percent), with the benchmark if provided."""
    plt.figure(figsize=(9,5))
    plt.hist(port_ret.dropna()*100, bins=50, alpha=0.6, label="Strategy")
    if bench_ret is not None:
        plt.hist(bench_ret.dropna()*100, bins=50, alpha=0.6, label="VNINDEX")
    plt.title(title); plt.xlabel("Daily Return (%)"); plt.ylabel("Frequency")
    plt.legend(); plt.grid(True, alpha=0.3)
    plt.savefig(path, dpi=150); plt.close()

# ---- Load results ----
pv_path = os.path.join(OUTPUT_DIR, "portfolio_value_test.csv")
bv_path = os.path.join(OUTPUT_DIR, "benchmark_value_test.csv")
weights_path = os.path.join(OUTPUT_DIR, "weights_by_ticker.csv")
regime_path = os.path.join(OUTPUT_DIR, "regime_daily.csv")
df_backtest_path = os.path.join(OUTPUT_DIR, "df_backtest.csv")

if not os.path.exists(pv_path):
    raise FileNotFoundError(f"{pv_path} not found. Run Block 10 first.")

port_val = pd.read_csv(pv_path, index_col=0, parse_dates=True).iloc[:,0].sort_index()
bench_val = None
if os.path.exists(bv_path):
    bench_val = pd.read_csv(bv_path, index_col=0, parse_dates=True).iloc[:,0].sort_index()
    bench_val = bench_val.reindex(port_val.index).ffill().bfill()

returns = port_val.pct_change().fillna(0.0)
bench_ret = bench_val.pct_change().fillna(0.0) if bench_val is not None else None

# ---- Compute & save stats ----
stats_test = stats_from_series(port_val, bench_val)
pd.DataFrame({"metric": list(stats_test.keys()), "value": list(stats_test.values())}).to_csv(STATS_FILE, index=False)

# ---- Full-period plots ----
plot_equity(port_val, bench_val, "Equity Curve (Test)", os.path.join(OUTPUT_DIR, "equity_test.png"))
plot_hist(returns, "Histogram lợi nhuận ngày (Test)", os.path.join(OUTPUT_DIR, "hist_test.png"), bench_ret)

# Drawdown
cum_ret = (1+returns).cumprod()
dd = cum_ret / cum_ret.cummax() - 1
plt.figure(figsize=(10,4)); plt.plot(dd, color="red")
plt.title("Drawdown toàn kỳ"); plt.grid(True, alpha=0.3)
plt.savefig(os.path.join(OUTPUT_DIR,"drawdown_test.png")); plt.close()

# Rolling Sharpe
window = 60
roll_sharpe = returns.rolling(window).mean() / returns.rolling(window).std() * np.sqrt(252)  # annualized, like the regime Sharpe below
plt.figure(figsize=(10,4)); plt.plot(roll_sharpe, label="Rolling Sharpe (60d)")
plt.axhline(0, color="grey", ls="--"); plt.legend(); plt.grid(True, alpha=0.3)
plt.title("Rolling Sharpe (toàn kỳ)")
plt.savefig(os.path.join(OUTPUT_DIR,"rolling_sharpe.png")); plt.close()

# Rolling Beta
if bench_val is not None:
    betas = []
    # Each window covers rows i..i+window-1 and is labelled with its last date.
    for i in range(len(returns) - window + 1):
        y = returns.iloc[i:i+window]
        x = bench_ret.iloc[i:i+window]
        if x.std() == 0:
            betas.append(np.nan)
            continue
        model = sm.OLS(y, sm.add_constant(x)).fit()
        betas.append(model.params.iloc[1])  # slope on the benchmark return
    roll_beta = pd.Series(betas, index=returns.index[window-1:])
    plt.figure(figsize=(10,4)); plt.plot(roll_beta, label="Rolling Beta (60d)")
    plt.axhline(1, color="grey", ls="--"); plt.legend(); plt.grid(True, alpha=0.3)
    plt.title("Rolling Beta (toàn kỳ)")
    plt.savefig(os.path.join(OUTPUT_DIR,"rolling_beta.png")); plt.close()

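# ---- Monthly heatmap ----
# "Return" is the sum of daily returns (an approximation of the compounded monthly return);
# "Trades" counts trading days in the month, not executed trades.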
monthly = returns.resample("M").agg(["sum","count", lambda x: (x>0).mean()])
monthly.columns = ["Return","Trades","WinRate"]
monthly.index.name = "Date"

pivot2 = monthly.reset_index()
pivot2["year"] = pivot2["Date"].dt.year
pivot2["month"] = pivot2["Date"].dt.month
heat = pivot2.pivot_table(values="Return", index="year", columns="month")
plt.figure(figsize=(12,6)); sns.heatmap(heat, annot=True, fmt=".2%", cmap="RdYlGn", center=0)
plt.title("Heatmap lợi nhuận hàng tháng"); plt.savefig(os.path.join(OUTPUT_DIR,"heatmap_monthly.png")); plt.close()
monthly.to_csv(os.path.join(OUTPUT_DIR,"monthly_stats.csv"))

# ---- Stress Test ----
stress_stats = {}
for ev in STRESS_EVENTS:
    start, end, name = pd.Timestamp(ev["start"]), pd.Timestamp(ev["end"]), ev["name"]
    sub_port = port_val.loc[start:end]
    sub_bench = bench_val.loc[start:end] if bench_val is not None else None
    if len(sub_port) > 1:
        stats_stress = stats_from_series(sub_port, sub_bench)
        stress_stats[name] = stats_stress
        plot_equity(sub_port, sub_bench, f"Equity ({name})", os.path.join(OUTPUT_DIR,f"equity_stress_{name}.png"))
        sub_ret = sub_port.pct_change().fillna(0.0)
        plot_hist(sub_ret, f"Histogram ({name})", os.path.join(OUTPUT_DIR,f"hist_stress_{name}.png"))
        cum_p = (1+sub_ret).cumprod(); dd_p = cum_p / cum_p.cummax() - 1
        plt.figure(figsize=(10,4)); plt.plot(dd_p, label="Chiến lược", color="red")
        if sub_bench is not None:
            cum_b = (1 + sub_bench.pct_change().fillna(0.0)).cumprod(); dd_b = cum_b / cum_b.cummax() - 1
            plt.plot(dd_b, label="VNINDEX", color="blue")
        plt.title(f"Drawdown ({name})"); plt.legend(); plt.grid(True, alpha=0.3)
        plt.savefig(os.path.join(OUTPUT_DIR,f"drawdown_stress_{name}.png")); plt.close()

# ---- Top holdings & performance ----
if os.path.exists(weights_path):
    try:
        weights = pd.read_csv(weights_path, index_col=0, parse_dates=True)
        mean_w = weights.mean().sort_values(ascending=False).head(10)
        plt.figure(figsize=(10,6)); mean_w.plot(kind="bar")
        plt.title("Top 10 cổ phiếu phân bổ vốn cao nhất"); plt.ylabel("Tỷ trọng")
        plt.savefig(os.path.join(OUTPUT_DIR,"top10_weights.png")); plt.close()
    except Exception as e:
        print("Warning: cannot plot top weights:", e)

if os.path.exists(df_backtest_path):
    try:
        df_backtest = pd.read_csv(df_backtest_path, parse_dates=["timestamp"])
        df_backtest = df_backtest.groupby(["timestamp","ticker"], as_index=False).agg({"close":"last"})
        px = df_backtest.pivot(index="timestamp", columns="ticker", values="close").sort_index()
        ret_wide = px.pct_change().fillna(0.0)
        cum_ret_tk = (1+ret_wide).cumprod().iloc[-1] - 1
        top_gain = cum_ret_tk.sort_values(ascending=False).head(10)
        top_loss = cum_ret_tk.sort_values().head(10)
        plt.figure(figsize=(10,6)); top_gain.plot(kind="bar",color="green"); plt.title("Top 10 cổ phiếu lãi nhiều nhất")
        plt.savefig(os.path.join(OUTPUT_DIR,"top10_gain.png")); plt.close()
        plt.figure(figsize=(10,6)); top_loss.plot(kind="bar",color="red"); plt.title("Top 10 cổ phiếu lỗ nhiều nhất")
        plt.savefig(os.path.join(OUTPUT_DIR,"top10_loss.png")); plt.close()
    except Exception as e:
        print("Warning: cannot compute per-stock performance:", e)

# ---- Turnover stats ----
trade_stats = {}
if os.path.exists(weights_path):
    try:
        wdf = pd.read_csv(weights_path, index_col=0, parse_dates=True)
        turnover_series = wdf.diff().abs().sum(axis=1)
        trade_stats["Turnover mean (daily)"] = float(turnover_series.mean())
        trade_stats["Turnover median (daily)"] = float(turnover_series.median())
        trade_stats["Turnover max (daily)"] = float(turnover_series.max())
        trade_stats["Turnover annualized"] = float(turnover_series.mean() * 252)
        pd.DataFrame({"metric": list(trade_stats.keys()), "value": list(trade_stats.values())}).to_csv(os.path.join(OUTPUT_DIR,"turnover_stats.csv"), index=False)
    except Exception as e:
        print("Warning: cannot compute turnover stats:", e)

# ---- Regime performance ----
if os.path.exists(regime_path):
    regime_df = pd.read_csv(regime_path, parse_dates=["date"])
    regime_df = regime_df.set_index("date").reindex(port_val.index).ffill()
    regime_df["daily_ret"] = returns
    perf_by_regime = regime_df.groupby("regime")["daily_ret"].agg(["mean","std","count"])
    perf_by_regime["Sharpe"] = perf_by_regime["mean"] / perf_by_regime["std"] * np.sqrt(252)
    perf_by_regime.to_csv(os.path.join(OUTPUT_DIR,"regime_performance.csv"))

# ---- Save combined stats ----
all_stats = {"Test": stats_test}
all_stats.update({f"Stress {k}":v for k,v in stress_stats.items()})
pd.DataFrame(all_stats).T.to_csv(STATS_FILE)

# ---- Print summary ----
print("📊 Kết quả Test:"); print(pd.Series(stats_test))
for k,v in stress_stats.items():
    print(f"\n📊 Stress Test ({k}):"); print(pd.Series(v))
if trade_stats: print("\n📊 Trade stats:"); print(pd.Series(trade_stats))
if os.path.exists(regime_path):
    print("\n📊 Regime performance:"); print(perf_by_regime)
print(f"\n✅ Block 11 hoàn tất. Stats saved: {STATS_FILE}, Charts & files under: {OUTPUT_DIR}")

📊 Kết quả Test:
Ngày bắt đầu                        2025-02-05
Ngày kết thúc                       2025-09-29
Giá trị cuối                      11690.494071
ROI (%)                              17.104017
Biến động (năm, %)                   28.069246
Sharpe                                1.008656
Sortino                               1.591649
MaxDrawdown (%)                     -19.096715
Tỷ lệ phiên thắng (%)                41.717791
Số ngày                                    163
Skewness                              0.511906
Kurtosis                              1.987763
Calmar                                 1.44777
Giá trị cuối (VNINDEX)            13125.920558
ROI VNINDEX (%)                      31.259206
Chênh lệch so với VNINDEX (pp)      -14.155188
InformationRatio                     -0.541155
Alpha (daily)                         0.000426
Beta                                  0.395968
dtype: object

📊 Stress Test (Trump áp thuế 46%):
Ngày bắt đầu                        2025-
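
Block 11 estimates the 60-day rolling beta by refitting an OLS regression for every window. As a cross-check, the slope of a one-regressor OLS with an intercept equals cov(x, y) / var(x), so the same series can be sketched with pandas rolling statistics. The function name `rolling_beta` and the synthetic data below are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

def rolling_beta(port_ret: pd.Series, bench_ret: pd.Series, window: int = 60) -> pd.Series:
    """Rolling OLS slope of port_ret on bench_ret, labelled at the last day of each window."""
    cov = port_ret.rolling(window).cov(bench_ret)
    var = bench_ret.rolling(window).var()
    # A flat benchmark window has zero variance; report NaN instead of dividing by zero.
    return cov / var.where(var > 0)

# Synthetic example (hypothetical data): portfolio = 0.4 * benchmark + noise
rng = np.random.default_rng(0)
idx = pd.bdate_range("2025-01-01", periods=200)
bench = pd.Series(rng.normal(0, 0.01, len(idx)), index=idx)
port = 0.4 * bench + pd.Series(rng.normal(0, 0.002, len(idx)), index=idx)

beta = rolling_beta(port, bench, window=60)
print(beta.dropna().tail())
```

Because each value is labelled with the last date of its window, the first `window - 1` entries are NaN and no beta is attributed to a date whose return it has not yet seen.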

**Block 12A: send signals to Telegram using historical data**

`Note`: although the data is historical, the team makes sure the replay never looks ahead into the future.
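
One way to make the no-look-ahead claim checkable is to build every report only from rows dated on or before the report date, and to assert that nothing later leaks in. The sketch below uses a hypothetical helper `history_until` and a synthetic NAV series; it illustrates the idea and is not necessarily the team's actual safeguard.

```python
import pandas as pd

def history_until(series: pd.Series, report_date) -> pd.Series:
    """Return only the observations dated on or before report_date."""
    report_date = pd.Timestamp(report_date)
    hist = series.sort_index().loc[:report_date]
    # Guard: fail loudly if any future observation slipped through.
    assert hist.empty or hist.index.max() <= report_date, "look-ahead detected"
    return hist

# Synthetic NAV (hypothetical values) on business days
nav = pd.Series(
    [10000.0, 10050.0, 9980.0, 10120.0, 10200.0],
    index=pd.bdate_range("2025-03-24", periods=5),
)

hist = history_until(nav, "2025-03-26")
print(hist)
```

The same slice, `pv.loc[:report_date]`, is what the report function in this block uses for both the NAV figure and the equity chart.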

In [103]:
# Block 12a: Telegram signal-sending function (offline replay, full version)

import os, json, requests, time
import pandas as pd
import matplotlib.pyplot as plt

OUTPUT_DIR = "./backtest_ddpg/"

# ==== Load config ====
with open("config.json","r") as f:
    cfg = json.load(f)

TG_TOKEN = cfg["telegram"]["bot_token"]
TG_CHAT_ID = cfg["telegram"]["chat_id"]
TG_THREAD_ID = cfg["telegram"]["message_thread_id"]

def send_daily_report(report_date: str | pd.Timestamp, sleep_sec:int=2):
    """
    Send a Telegram report for any single day of the backtest (offline replay).
    - Shows the current portfolio (positions with weight > 0 only)
    - Shows positions closed that day (PnL %)
    - Shows positions opened that day (TP, SL, weight)
    - Attaches the equity chart
    """
    report_date = pd.Timestamp(report_date)

    # --- NAV ---
    pv = pd.read_csv(os.path.join(OUTPUT_DIR,"portfolio_value_test.csv"), index_col=0, parse_dates=True).iloc[:,0]
    if report_date not in pv.index:
        print(f"⚠️ {report_date.date()} không có trong NAV index")
        return
    nav_today = pv.loc[:report_date].iloc[-1]

    # --- Regime ---
    regime_df = pd.read_csv(os.path.join(OUTPUT_DIR,"regime_daily.csv"), parse_dates=["date"]).set_index("date")
    if report_date not in regime_df.index:
        print(f"⚠️ {report_date.date()} không có trong regime_daily.csv")
        return
    row = regime_df.loc[report_date]
    regime_today = row["regime"]
    cash_today = row["cash_buffer"]

    # --- Portfolio snapshot ---
    snap_path = os.path.join(OUTPUT_DIR, f"positions_snapshot_{report_date.date()}.csv")
    positions_txt = ""
    if os.path.exists(snap_path):
        pos = pd.read_csv(snap_path, parse_dates=["snapshot_date","entry_date"])
        pos = pos[pos["position_size_pct"] > 0]  # keep only positions that still hold capital
        if len(pos)>0:
            for _,r in pos.iterrows():
                positions_txt += (
                    f"— {r['ticker']}: Mua {r['entry_price']:.2f} ngày {r['entry_date'].date()}, "
                    f"SL {r['sl_level']:.2f}, TP {r['tp_level']:.2f}, "
                    f"Giá hiện {r['last_price']:.2f}, "
                    f"Lãi/lỗ {r['current_unrealized_pct']:.2f}%, "
                    f"Tỷ trọng {r['position_size_pct']*100:.1f}%\n"
                )
        else:
            positions_txt = "— Không có vị thế nào đang mở\n"
    else:
        positions_txt = "— (Không tìm thấy snapshot)\n"

    # --- Signals for that day ---
    sig_path = os.path.join(OUTPUT_DIR, f"signals_today_{report_date.date()}.csv")
    signals_txt, closed_txt, opened_txt = "", "", ""
    if os.path.exists(sig_path):
        sigs = pd.read_csv(sig_path, parse_dates=["entry_date","exit_date"])
        if len(sigs)>0:
            # Opened positions
            opened = sigs[sigs["action"]=="BUY"].copy()
            if len(opened)>0:
                for _,r in opened.iterrows():
                    pos_size = r.get("position_size_pct", 0.0) * 100
                    opened_txt += (
                        f"— {r['ticker']}: Mua {r['entry_price']:.2f}, "
                        f"TP {r['tp_level']:.2f}, SL {r['sl_level']:.2f}, "
                        f"Tỷ trọng {pos_size:.1f}%\n"
                    )
            else:
                opened_txt = "— Không có lệnh mở nào\n"

            # Closed positions
            closed = sigs[sigs["action"]=="SELL"].copy()
            if len(closed)>0:
                for _,r in closed.iterrows():
                    # Default to NaN (not None) so the :.2f formatting below cannot raise
                    entry_price = r.get("entry_price", float("nan"))
                    exit_price  = r.get("exit_price", float("nan"))
                    pos_size    = r.get("position_size_pct", 0.0) * 100
                    if pd.notna(entry_price) and pd.notna(exit_price) and entry_price>0:
                        pnl_pct = (exit_price/entry_price - 1)*100
                    else:
                        pnl_pct = float("nan")
                    # Only label a loss when the PnL is actually negative (NaN stays neutral)
                    pnl_label = "Lãi" if pnl_pct>0 else ("Lỗ" if pnl_pct<0 else "Lãi/lỗ")
                    closed_txt += (
                        f"— {r['ticker']}: Mua {entry_price:.2f} → "
                        f"Bán {exit_price:.2f}, "
                        f"{pnl_label} {pnl_pct:.2f}%, "
                        f"Tỷ trọng {pos_size:.1f}%\n"
                    )
            else:
                closed_txt = "— Không có lệnh đóng nào\n"

            # Raw signals
            for _,r in sigs.iterrows():
                if r["action"]=="BUY":
                    signals_txt += (
                        f"🟢 MUA {r['ticker']} giá {r['entry_price']:.2f}, "
                        f"TP {r['tp_level']:.2f}, SL {r['sl_level']:.2f}\n"
                    )
                else:
                    exit_price = r["exit_price"] if pd.notna(r["exit_price"]) else 0.0
                    signals_txt += (
                        f"🔴 BÁN {r['ticker']} giá {exit_price:.2f}, "
                        f"loại {r.get('exit_type','NA')}\n"
                    )
        else:
            signals_txt = "— Không có tín hiệu giao dịch hôm nay\n"
    else:
        signals_txt = "— (Không tìm thấy file tín hiệu)\n"
        opened_txt = "— (Không tìm thấy file tín hiệu)\n"
        closed_txt = "— (Không tìm thấy file tín hiệu)\n"

    # --- Equity chart up to that day ---
    plt.figure(figsize=(8,5))
    plt.plot(pv.loc[:report_date], label="Chiến lược")
    plt.title(f"Equity đến {report_date.date()}")
    plt.grid(True, alpha=0.3); plt.legend()
    chart_path = os.path.join(OUTPUT_DIR, f"equity_until_{report_date.date()}.png")
    plt.savefig(chart_path, dpi=150); plt.close()

    # --- Compose message ---
    msg = f"""
📅 Ngày {report_date.date()}

💰 Giá trị tài sản: {nav_today:,.0f}
💵 Tiền mặt: {cash_today*100:.1f}%
📈 Chế độ thị trường: {regime_today}

📊 Danh mục hiện tại:
{positions_txt}

💡 Lệnh đóng hôm nay:
{closed_txt}

🟢 Lệnh mở hôm nay:
{opened_txt}

📌 Tín hiệu trong ngày:
{signals_txt}
""".strip()

    # --- Send text ---
    send_url = f"https://api.telegram.org/bot{TG_TOKEN}/sendMessage"
    resp = requests.post(send_url, data={
        "chat_id": TG_CHAT_ID,
        "message_thread_id": TG_THREAD_ID,
        "text": msg
    }, timeout=30)
    resp.raise_for_status()  # do not report success if Telegram rejected the message
    time.sleep(sleep_sec)

    # --- Send chart ---
    photo_url = f"https://api.telegram.org/bot{TG_TOKEN}/sendPhoto"
    with open(chart_path,"rb") as f:
        resp = requests.post(photo_url, data={
            "chat_id": TG_CHAT_ID,
            "message_thread_id": TG_THREAD_ID,
            "caption": f"Equity Curve đến {report_date.date()}"
        }, files={"photo":f}, timeout=60)
    resp.raise_for_status()
    time.sleep(sleep_sec)

    print(f"✅ Đã gửi báo cáo Telegram cho ngày {report_date.date()}")
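
Telegram's Bot API limits the `text` of a `sendMessage` call to 4096 characters, so a day with many open positions could make the report above too long to send in one request. A minimal sketch of splitting the message on line boundaries follows. `split_message` is a hypothetical helper, not part of the notebook, and each chunk would be posted with its own `sendMessage` call.

```python
def split_message(text: str, limit: int = 4096) -> list[str]:
    """Split text into chunks of at most `limit` characters, breaking at newlines when possible."""
    chunks, current = [], ""
    for line in text.splitlines(keepends=True):
        # Hard-split any single line that is itself longer than the limit.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        # Start a new chunk when adding this line would exceed the limit.
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks

# Example with a synthetic long report (hypothetical content)
demo = "".join(f"position line {i}\n" for i in range(1000))
parts = split_message(demo)
print(len(parts), max(len(p) for p in parts))
```

Python's `len` may not count emoji the same way Telegram does, so passing a lower `limit` (for example 4000) leaves some headroom.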

In [104]:
# Cell runner: send Telegram reports from 2025-03-26 to 2025-04-16

import pandas as pd

start = pd.Timestamp("2025-03-26")
end   = pd.Timestamp("2025-04-16")

for d in pd.date_range(start, end, freq="D"):
    try:
        send_daily_report(d, sleep_sec=10)  # wait 10 seconds after each message
    except Exception as e:
        print(f"⚠️ Lỗi khi gửi ngày {d.date()}: {e}")

✅ Đã gửi báo cáo Telegram cho ngày 2025-03-26
✅ Đã gửi báo cáo Telegram cho ngày 2025-03-27
✅ Đã gửi báo cáo Telegram cho ngày 2025-03-28
⚠️ 2025-03-29 không có trong NAV index
⚠️ 2025-03-30 không có trong NAV index
✅ Đã gửi báo cáo Telegram cho ngày 2025-03-31
✅ Đã gửi báo cáo Telegram cho ngày 2025-04-01
✅ Đã gửi báo cáo Telegram cho ngày 2025-04-02
✅ Đã gửi báo cáo Telegram cho ngày 2025-04-03
✅ Đã gửi báo cáo Telegram cho ngày 2025-04-04
⚠️ 2025-04-05 không có trong NAV index
⚠️ 2025-04-06 không có trong NAV index
⚠️ 2025-04-07 không có trong NAV index
✅ Đã gửi báo cáo Telegram cho ngày 2025-04-08
✅ Đã gửi báo cáo Telegram cho ngày 2025-04-09
✅ Đã gửi báo cáo Telegram cho ngày 2025-04-10
✅ Đã gửi báo cáo Telegram cho ngày 2025-04-11
⚠️ 2025-04-12 không có trong NAV index
⚠️ 2025-04-13 không có trong NAV index
✅ Đã gửi báo cáo Telegram cho ngày 2025-04-14
✅ Đã gửi báo cáo Telegram cho ngày 2025-04-15
✅ Đã gửi báo cáo Telegram cho ngày 2025-04-16
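
The log above shows that iterating over calendar days triggers a warning for every weekend and holiday. One alternative, sketched here under the assumption that the NAV index contains exactly the trading days, is to loop over the NAV index itself. The helper name `trading_days_between` and the sample dates are hypothetical.

```python
import pandas as pd

def trading_days_between(nav_index: pd.DatetimeIndex, start, end) -> pd.DatetimeIndex:
    """Return the dates in nav_index that fall within [start, end], sorted."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    idx = nav_index.sort_values()
    return idx[(idx >= start) & (idx <= end)]

# Synthetic NAV index (hypothetical): business days only, so weekends are absent
nav_index = pd.bdate_range("2025-03-24", "2025-04-04")

days = trading_days_between(nav_index, "2025-03-26", "2025-03-31")
for d in days:
    # In the notebook this is where send_daily_report(d, sleep_sec=10) would be called.
    print(d.date())
```

In the notebook, `nav_index` would come from the index of `portfolio_value_test.csv`, the same file `send_daily_report` reads.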
