**RUN INSTRUCTIONS**

*Run the code cell by cell, in order from top to bottom.*

**Notes:**

- *Most blocks take a long time to run (about 10 minutes each; block 6 alone takes about 25 minutes).*

- *Blocks 8, 9, 10, and 11 save their results to CSV files.*

- **The team has uploaded the resulting CSV files to GitHub.**

**Install the required libraries**

In [None]:
!pip install pandas numpy torch scikit-learn matplotlib
!pip install --extra-index-url https://fiinquant.github.io/fiinquantx/simple fiinquantx
!pip install --upgrade --extra-index-url https://fiinquant.github.io/fiinquantx/simple fiinquantx

**Block 1: Download historical and realtime data**

In [1]:
# Block 1: Login & fetch data for all HOSE tickers
import pandas as pd
from FiinQuantX import FiinSession, BarDataUpdate
# --- Login ---
username = "DSTC_18@fiinquant.vn"
password = "Fiinquant0606"

client = FiinSession(
    username=username,
    password=password
).login()

# --- Get the ticker list (HOSE only) ---
tickers_hose  = list(client.TickerList(ticker="VNINDEX"))     # HOSE
print(f"Số mã HOSE: {len(tickers_hose)}")

# --- Fetch the full history (can be heavy; fetch in batches if needed) ---
event_history = client.Fetch_Trading_Data(
    realtime=False,
    tickers=tickers_hose,
    fields=['open','high','low','close','volume','bu','sd','fs','fn'], 
    adjusted=True,
    by="1d",
    from_date="2023-01-01"   # backtest from 2023 to the present
)

df_all = event_history.get_data()
print("History ban đầu:", df_all.head())

# --- Callback realtime ---
def onDataUpdate(data: BarDataUpdate):
    global df_all
    df_update = data.to_dataFrame()
    df_all = pd.concat([df_all, df_update])
    df_all = df_all.drop_duplicates()
    print("Realtime update:")
    print(df_update.head())

# --- Start the realtime stream to extend the historical data ---
event_realtime = client.Fetch_Trading_Data(
    realtime=True,
    tickers=tickers_hose,
    fields=['open','high','low','close','volume','bu','sd','fs','fn'], 
    adjusted=True,
    by="1d",
    period=1,
    callback=onDataUpdate
)


Số mã HOSE: 415
Fetching data, it may take a while. Please wait...
History ban đầu:   ticker         timestamp      open      high       low     close     volume  \
0    AAA  2023-01-03 00:00  6539.643  6866.145  6539.643  6866.145  1543984.0   
1    AAA  2023-01-04 00:00  6866.145  7000.587  6827.733  6827.733  1302505.0   
2    AAA  2023-01-05 00:00  6866.145  6904.557  6808.527  6885.351   980473.0   
3    AAA  2023-01-06 00:00  6885.351  6990.984  6818.130  6856.542  1431699.0   
4    AAA  2023-01-09 00:00  6914.160  6962.175  6760.512  6789.321  1121385.0   

         bu        sd           fs           fn  
0  938600.0  504700.0   40579000.0  899404000.0  
1  462900.0  780600.0  151639000.0   36850000.0  
2  487200.0  473700.0  343911000.0  -59103000.0  
3  564300.0  828300.0  345999000.0 -294312000.0  
4  414000.0  631800.0  514557000.0 -483197000.0  
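
The comment in Block 1 notes that the full history can be heavy and may need to be fetched in batches. A minimal sketch of that idea follows. It assumes a hypothetical `fetch_fn` that takes a list of tickers and returns a DataFrame (in this notebook it would wrap the `client.Fetch_Trading_Data(...).get_data()` call); `chunked`, `fetch_in_batches`, and `fake_fetch` are illustrative names and are not part of FiinQuantX.

```python
import pandas as pd

def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fetch_in_batches(tickers, fetch_fn, batch_size=50):
    """Call `fetch_fn` once per batch and concatenate the results.

    `fetch_fn` is a hypothetical callable: list[str] -> pd.DataFrame.
    """
    frames = [fetch_fn(batch) for batch in chunked(tickers, batch_size)]
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)

# Toy stand-in for the real API call, used only to show the call pattern.
def fake_fetch(batch):
    return pd.DataFrame({"ticker": batch, "close": [1.0] * len(batch)})

demo = fetch_in_batches(["AAA", "BBB", "CCC"], fake_fetch, batch_size=2)
```

Smaller batches keep each request light, at the cost of more round trips; the batch size of 50 is an arbitrary placeholder.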


**Block 2: Fetch FA data and filter out invalid tickers**

In [2]:
# Block 2: Fetch quarterly FA data (HOSE only)

def fetch_fa_quarterly(ticker, latest_year=2025, n_periods=32):
    try:
        fi_list = client.FundamentalAnalysis().get_ratios(
            tickers=[ticker],
            TimeFilter="Quarterly",
            LatestYear=latest_year,
            NumberOfPeriod=n_periods,
            Consolidated=True
        )

        # Skip if there is no data
        if not fi_list or not isinstance(fi_list, list):
            return pd.DataFrame()

        df = pd.DataFrame(fi_list)
        if df.empty:
            return pd.DataFrame()

        df["ticker"] = ticker
        if "ReportDate" in df.columns:
            df["ReportDate"] = pd.to_datetime(df["ReportDate"])
        else:
            # If ReportDate is missing, create a null column to avoid concat errors
            df["ReportDate"] = pd.NaT

        return df

    except Exception as e:
        print(f"⚠️ Lỗi khi lấy FA cho {ticker}: {e}")
        return pd.DataFrame()


# --- Filter the list: keep only tickers that have FA data ---
fa_list = []
valid_tickers = []

for t in tickers_hose:   # iterate over the HOSE list from Block 1
    df_fa = fetch_fa_quarterly(t, latest_year=2025, n_periods=32)
    if not df_fa.empty:
        fa_list.append(df_fa)
        valid_tickers.append(t)

# --- Concatenate the DataFrames ---
if fa_list:
    fa_data = pd.concat(fa_list, ignore_index=True)
else:
    fa_data = pd.DataFrame()

print(f"Số mã HOSE ban đầu: {len(tickers_hose)}")
print(f"Số mã có dữ liệu FA: {len(valid_tickers)}")
print("FA Data sample:")
print(fa_data.head())


⚠️ Lỗi khi lấy FA cho FUETPVND: 'FUETPVND'
Số mã HOSE ban đầu: 415
Số mã có dữ liệu FA: 392
FA Data sample:
   organizationId ticker  year  quarter  \
0          894364    CCC  2023        4   
1          894364    CCC  2024        1   
2          894364    CCC  2024        2   
3          894364    CCC  2024        3   
4          894364    CCC  2024        4   

                                              ratios ReportDate  
0  {'SolvencyRatio': {'DebtToEquityRatio': 1.5102...        NaT  
1  {'SolvencyRatio': {'DebtToEquityRatio': 0.7722...        NaT  
2  {'SolvencyRatio': {'DebtToEquityRatio': 0.7357...        NaT  
3  {'SolvencyRatio': {'DebtToEquityRatio': 0.7914...        NaT  
4  {'SolvencyRatio': {'DebtToEquityRatio': 0.6437...        NaT  
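
The `ratios` column in the sample above holds one nested dict per row (sections such as `SolvencyRatio`, each containing individual metrics). The sketch below shows how such a dict can be flattened into a flat record of selected fields, which is what Block 3 does row by row. The section name `ProfitabilityRatio` and all values are made up for illustration; only `SolvencyRatio` and `DebtToEquityRatio` appear in the output above.

```python
def flatten_ratios(ratios, fields):
    """Return {field: value} by searching every section of a nested ratios dict."""
    flat = {f: None for f in fields}
    if not isinstance(ratios, dict):
        return flat  # malformed input: every field stays None
    for section in ratios.values():
        if isinstance(section, dict):
            for f in fields:
                if f in section:
                    flat[f] = section[f]
    return flat

# Toy record shaped like the `ratios` column (hypothetical values).
sample = {
    "SolvencyRatio": {"DebtToEquityRatio": 1.5},
    "ProfitabilityRatio": {"ROE": 0.12, "ROA": 0.05},
}
flat = flatten_ratios(sample, ["DebtToEquityRatio", "ROE", "PriceToBook"])
```

A field that is absent from every section stays `None`, which pandas later turns into `NaN`.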


**Block 3: Normalize FA and merge it with price data**

In [3]:
# Block 3: Normalize FA + merge with prices (HOSE only, based on Block 2)

import pandas as pd

# --- FA metrics to extract ---
fa_fields = [
    "DebtToEquityRatio","EBITMargin","ROA","ROE","ROIC",
    "BasicEPS","PriceToBook","PriceToEarning",
    "NetRevenueGrowthYoY","GrossProfitGrowthYoY"
]

# --- Function that flattens the nested ratios ---
def explode_ratios(df, fa_fields):
    records = []
    for _, row in df.iterrows():
        d = {
            "ticker": row["ticker"],
            "fa_year": int(row["year"]),
            "fa_quarter": int(row["quarter"])
        }
        ratios = row.get("ratios", {})
        if isinstance(ratios, dict):   # guard: ratios may not be a dict
            for f in fa_fields:
                val = None
                for section in ratios.values():
                    if isinstance(section, dict) and f in section:
                        val = section[f]
                d[f] = val
        else:
            # if ratios is not a dict, set every field to None (NaN in the DataFrame)
            for f in fa_fields:
                d[f] = None
        records.append(d)
    return pd.DataFrame(records)

# --- Normalize FA ---
fa_clean = explode_ratios(fa_data, fa_fields)

# --- Normalize prices ---
df_price = df_all[df_all["ticker"].isin(valid_tickers)].copy()
df_price["timestamp"] = pd.to_datetime(df_price["timestamp"])
df_price = df_price.sort_values(["ticker","timestamp"])

# build the key (fa_year, fa_quarter) = previous quarter
pi = df_price["timestamp"].dt.to_period("Q")
prev_pi = pi - 1
df_price["fa_year"] = prev_pi.dt.year.astype(int)
df_price["fa_quarter"] = prev_pi.dt.quarter.astype(int)

# --- FA handling: keep only the last record for each quarter
fa_clean = (
    fa_clean.sort_values(["ticker","fa_year","fa_quarter"])
            .drop_duplicates(subset=["ticker","fa_year","fa_quarter"], keep="last")
)

# --- Merge prices + FA ---
df_merged = df_price.merge(
    fa_clean,
    on=["ticker","fa_year","fa_quarter"],
    how="left"
)

# Forward-fill over time within each ticker to fill the gaps
df_merged = df_merged.sort_values(["ticker","timestamp"])
df_merged[fa_fields] = df_merged.groupby("ticker")[fa_fields].ffill()

print("Sample merged:")
print(df_merged.head())
print("Số mã merge thành công:", df_merged["ticker"].nunique())




Sample merged:
  ticker  timestamp      open      high       low     close     volume  \
0    AAA 2023-01-03  6539.643  6866.145  6539.643  6866.145  1543984.0   
1    AAA 2023-01-04  6866.145  7000.587  6827.733  6827.733  1302505.0   
2    AAA 2023-01-05  6866.145  6904.557  6808.527  6885.351   980473.0   
3    AAA 2023-01-06  6885.351  6990.984  6818.130  6856.542  1431699.0   
4    AAA 2023-01-09  6914.160  6962.175  6760.512  6789.321  1121385.0   

         bu        sd           fs  ...  DebtToEquityRatio  EBITMargin  \
0  938600.0  504700.0   40579000.0  ...           0.507521   -0.049731   
1  462900.0  780600.0  151639000.0  ...           0.507521   -0.049731   
2  487200.0  473700.0  343911000.0  ...           0.507521   -0.049731   
3  564300.0  828300.0  345999000.0  ...           0.507521   -0.049731   
4  414000.0  631800.0  514557000.0  ...           0.507521   -0.049731   

        ROA       ROE      ROIC    BasicEPS  PriceToBook  PriceToEarning  \
0  0.014669  0.0295
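
Block 3 joins each price row to the fundamentals of the previous quarter, so a trading day in Q1 2023 is matched with the Q4 2022 ratios. The sketch below isolates that key construction with pandas periods; `previous_quarter_key` is an illustrative helper name, not a function from the notebook.

```python
import pandas as pd

def previous_quarter_key(timestamps):
    """Map each date to the (year, quarter) of the quarter before it."""
    quarters = pd.to_datetime(pd.Series(timestamps)).dt.to_period("Q")
    prev = quarters - 1  # shift back by one quarter
    return list(zip(prev.dt.year.tolist(), prev.dt.quarter.tolist()))

keys = previous_quarter_key(["2023-01-03", "2023-04-03", "2023-12-29"])
# keys == [(2022, 4), (2023, 1), (2023, 3)]
```

Using the previous quarter is a simple way to avoid joining ratios from a quarter that has not ended yet; it does not account for the actual publication date of each report.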

**Delete df_all, which is no longer needed, to reduce RAM usage**

In [4]:
import gc
del df_all
gc.collect()


0

**Block 4: Compute TA indicators with the FiinQuant library and join the data**

In [5]:
# Block 4: Compute TA indicators (on df_merged from Block 3)

import pandas as pd

# --- Initialize the indicator object ---
fi = client.FiinIndicator()

# --- Compute TA indicators for each ticker ---
def add_ta_indicators(df):
    df = df.sort_values("timestamp").copy()

    # EMA
    df['ema_5']  = fi.ema(df['close'], window=5)
    df['ema_20'] = fi.ema(df['close'], window=20)
    df['ema_50'] = fi.ema(df['close'], window=50)

    # MACD
    df['macd']        = fi.macd(df['close'], window_fast=12, window_slow=26)
    df['macd_signal'] = fi.macd_signal(df['close'], window_fast=12, window_slow=26, window_sign=9)
    df['macd_diff']   = fi.macd_diff(df['close'], window_fast=12, window_slow=26, window_sign=9)

    # RSI
    df['rsi'] = fi.rsi(df['close'], window=14)

    # Bollinger Bands
    df['bollinger_hband'] = fi.bollinger_hband(df['close'], window=20, window_dev=2)
    df['bollinger_lband'] = fi.bollinger_lband(df['close'], window=20, window_dev=2)

    # ATR
    df['atr'] = fi.atr(df['high'], df['low'], df['close'], window=14)

    # OBV
    df['obv'] = fi.obv(df['close'], df['volume'])

    # VWAP
    df['vwap'] = fi.vwap(df['high'], df['low'], df['close'], df['volume'], window=14)

    return df

# --- Apply to the whole df_merged ---
df_with_ta = df_merged.groupby("ticker", group_keys=False).apply(add_ta_indicators)

print("Sample with TA:")
print(df_with_ta.head())
print("Shape sau khi thêm TA:", df_with_ta.shape)


Sample with TA:
  ticker  timestamp      open      high       low     close     volume  \
0    AAA 2023-01-03  6539.643  6866.145  6539.643  6866.145  1543984.0   
1    AAA 2023-01-04  6866.145  7000.587  6827.733  6827.733  1302505.0   
2    AAA 2023-01-05  6866.145  6904.557  6808.527  6885.351   980473.0   
3    AAA 2023-01-06  6885.351  6990.984  6818.130  6856.542  1431699.0   
4    AAA 2023-01-09  6914.160  6962.175  6760.512  6789.321  1121385.0   

         bu        sd           fs  ...  ema_50  macd  macd_signal  macd_diff  \
0  938600.0  504700.0   40579000.0  ...     NaN   NaN          NaN        NaN   
1  462900.0  780600.0  151639000.0  ...     NaN   NaN          NaN        NaN   
2  487200.0  473700.0  343911000.0  ...     NaN   NaN          NaN        NaN   
3  564300.0  828300.0  345999000.0  ...     NaN   NaN          NaN        NaN   
4  414000.0  631800.0  514557000.0  ...     NaN   NaN          NaN        NaN   

   rsi  bollinger_hband  bollinger_lband  atr       
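
The first rows of the sample above are `NaN` for most indicators because windowed indicators need a warm-up period before they produce a value. The sketch below reproduces that behaviour with plain pandas. It is an assumption-laden stand-in and not FiinQuant's implementation: the exact formulas and warm-up rules used by `fi.ema` and `fi.bollinger_hband` may differ.

```python
import pandas as pd

def ema(close, window):
    """Exponential moving average; the first window-1 values stay NaN."""
    return close.ewm(span=window, adjust=False, min_periods=window).mean()

def bollinger_bands(close, window=20, n_dev=2):
    """Upper and lower bands around a rolling mean."""
    mid = close.rolling(window).mean()
    std = close.rolling(window).std(ddof=0)
    return mid + n_dev * std, mid - n_dev * std

close = pd.Series([float(x) for x in range(1, 31)])  # toy prices 1..30
ema_5 = ema(close, 5)
upper, lower = bollinger_bands(close, window=20)
```

With these settings a 50-day EMA has no value for the first 49 rows of each ticker, which is why Block 5 later drops rows containing `NaN`.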

**Delete df_merged, which is no longer needed, to reduce RAM usage**

In [6]:
import gc
del df_merged
gc.collect()

0

**Block 5: Scale the FA and TA features**

In [7]:
# Block 5 — Feature engineering & scaling

import numpy as np

# --- FA & TA column lists ---
fa_features = [
    "DebtToEquityRatio","EBITMargin","ROA","ROE","ROIC",
    "BasicEPS","PriceToBook","PriceToEarning",
    "NetRevenueGrowthYoY","GrossProfitGrowthYoY"
]

ta_features = [
    "ema_5","ema_20","ema_50","macd","macd_signal","macd_diff",
    "rsi","bollinger_hband","bollinger_lband","atr","obv","vwap"
]

# --- Scale FA: cross-sectional min-max scaling per day ---
def scale_fa_minmax(df):
    df_scaled = df.copy()
    for f in fa_features:
        vals = df[f].astype(float)
        vmin, vmax = vals.min(), vals.max()
        if np.isfinite(vmin) and np.isfinite(vmax) and vmax > vmin:
            df_scaled[f] = (vals - vmin) / (vmax - vmin)
        else:
            df_scaled[f] = np.nan
    return df_scaled

df_scaled_fa = df_with_ta.groupby("timestamp", group_keys=False).apply(scale_fa_minmax)

# --- Scale TA: rolling z-score per ticker ---
def zscore_rolling(series, window=60):
    return (series - series.rolling(window).mean()) / series.rolling(window).std()

df_scaled = df_scaled_fa.groupby("ticker", group_keys=False).apply(
    lambda g: g.assign(**{f"{col}_z": zscore_rolling(g[col], 60) for col in ta_features})
)

# --- Drop the raw TA columns, keep the z-score versions ---
keep_cols = ["ticker","timestamp"] + fa_features + [f"{col}_z" for col in ta_features]
df_features = df_scaled[keep_cols].dropna().reset_index(drop=True)

print("Sample features:")
print(df_features.head())
print("Shape sau khi scaling & dropna:", df_features.shape)


Sample features:
  ticker  timestamp  DebtToEquityRatio  EBITMargin       ROA       ROE  \
0    AAA 2023-06-14           0.559115    0.978956  0.368731  0.415728   
1    AAA 2023-06-15           0.559115    0.978956  0.368731  0.415728   
2    AAA 2023-06-16           0.559115    0.978956  0.368731  0.415728   
3    AAA 2023-06-19           0.559115    0.978956  0.368731  0.415728   
4    AAA 2023-06-20           0.559115    0.978956  0.368731  0.415728   

     ROIC  BasicEPS  PriceToBook  PriceToEarning  ...  ema_50_z    macd_z  \
0  0.7533  0.158046      0.37764        0.537353  ...  1.799046 -0.008320   
1  0.7533  0.158046      0.37764        0.537353  ...  1.761009 -0.314808   
2  0.7533  0.158046      0.37764        0.537353  ...  1.701399 -0.891565   
3  0.7533  0.158046      0.37764        0.537353  ...  1.651219 -1.278916   
4  0.7533  0.158046      0.37764        0.537353  ...  1.609327 -1.509982   

   macd_signal_z  macd_diff_z     rsi_z  bollinger_hband_z  bollinger_lband
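
Block 5 uses two different scalers: FA metrics are min-max scaled across all tickers on the same day, while TA indicators are z-scored against each ticker's own rolling history. The toy sketch below shows both on small hand-made series. One deliberate difference from the notebook is labelled in the code: this version replaces a zero rolling standard deviation with `NaN` so that flat stretches cannot produce `inf`. That guard is an addition, not something the original cell does.

```python
import numpy as np
import pandas as pd

def minmax_cross_section(values):
    """Scale one day's values across tickers to [0, 1]; NaN if there is no spread."""
    vmin, vmax = values.min(), values.max()
    if np.isfinite(vmin) and np.isfinite(vmax) and vmax > vmin:
        return (values - vmin) / (vmax - vmin)
    return pd.Series(np.nan, index=values.index)

def zscore_rolling(series, window):
    """Rolling z-score of a single ticker's indicator."""
    mean = series.rolling(window).mean()
    # Assumption (not in the notebook): treat a zero std as missing.
    std = series.rolling(window).std().replace(0, np.nan)
    return (series - mean) / std

roe_today = pd.Series({"AAA": 0.10, "BBB": 0.30, "CCC": 0.20})  # made-up values
scaled = minmax_cross_section(roe_today)

rsi = pd.Series([50.0, 52.0, 54.0, 53.0, 55.0])  # made-up values
z = zscore_rolling(rsi, window=3)
```

Because the FA values are forward-filled within a quarter, the scaled FA columns change only when the cross-section changes, which matches the repeated values in the sample output.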

**Delete df_with_ta, df_scaled, and df_scaled_fa, which are no longer needed**

In [8]:
del df_with_ta, df_scaled, df_scaled_fa
gc.collect()


0

**Block 6: Dimensionality reduction with t-SNE and clustering with DBSCAN**

In [9]:
# Block 6 — Dimensionality reduction & Clustering (t-SNE + DBSCAN)

from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN

# --- Select the feature columns used for clustering ---
feature_cols = [
    "DebtToEquityRatio","EBITMargin","ROA","ROE","ROIC",
    "BasicEPS","PriceToBook","PriceToEarning",
    "NetRevenueGrowthYoY","GrossProfitGrowthYoY"
] + [c for c in df_features.columns if c.endswith("_z")]

# --- Add a month column for monthly snapshots ---
df_features["month"] = df_features["timestamp"].dt.to_period("M")

cluster_results = []

for (month, g) in df_features.groupby("month"):
    if len(g) < 10:   # skip months with too few rows (ticker-day observations)
        continue

    X = g[feature_cols].values

    # --- t-SNE: reduce to 2D ---
    tsne = TSNE(n_components=2, perplexity=30, learning_rate="auto", init="random", random_state=42)
    X_emb = tsne.fit_transform(X)

    # --- DBSCAN clustering ---
    db = DBSCAN(eps=0.5, min_samples=5).fit(X_emb)
    labels = db.labels_

    temp = g[["ticker","timestamp"]].copy()
    temp["cluster"] = labels
    temp["tsne_x"] = X_emb[:,0]
    temp["tsne_y"] = X_emb[:,1]
    temp["month"]  = str(month)

    cluster_results.append(temp)

df_clusters = pd.concat(cluster_results, ignore_index=True)

print("Cluster sample:")
print(df_clusters.head())
print("Số cụm mỗi tháng:")
print(df_clusters.groupby("month")["cluster"].nunique())


Cluster sample:
  ticker  timestamp  cluster     tsne_x     tsne_y    month
0    AAA 2023-06-14       -1  34.417667 -21.641058  2023-06
1    AAA 2023-06-15       -1  34.361568 -21.848602  2023-06
2    AAA 2023-06-16       -1  35.918636 -42.906532  2023-06
3    AAA 2023-06-19       -1  35.490566 -42.649422  2023-06
4    AAA 2023-06-20       -1  34.553066 -42.459969  2023-06
Số cụm mỗi tháng:
month
2023-06     16
2023-07     67
2023-08     78
2023-09     40
2023-10     83
2023-11     55
2023-12     83
2024-01     90
2024-02     31
2024-03     64
2024-04     43
2024-05     66
2024-06     94
2024-07     87
2024-08     65
2024-09     86
2024-10    104
2024-11     80
2024-12     74
2025-01     49
2025-02     68
2025-03     93
2025-04     52
2025-05     79
2025-06     95
2025-07    110
2025-08     74
Name: cluster, dtype: int64
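
DBSCAN assigns the label `-1` to noise points, so the `nunique()` counts above include `-1` as if it were a cluster. The sketch below, on a toy frame with the same `month` and `cluster` columns, counts only real clusters and reports the share of rows flagged as noise. `cluster_summary` is an illustrative helper, not part of the notebook.

```python
import pandas as pd

def cluster_summary(df):
    """Per month: number of real clusters (label != -1) and share of noise rows."""
    is_noise = df["cluster"] == -1
    n_clusters = df[~is_noise].groupby("month")["cluster"].nunique()
    noise_share = is_noise.groupby(df["month"]).mean()
    out = pd.DataFrame({"n_clusters": n_clusters, "noise_share": noise_share})
    # Months that contain only noise have no entry in n_clusters.
    out["n_clusters"] = out["n_clusters"].fillna(0).astype(int)
    return out

toy = pd.DataFrame({
    "month":   ["2023-06"] * 4 + ["2023-07"] * 3,
    "cluster": [-1, 0, 0, 1, -1, -1, 2],
})
summary = cluster_summary(toy)
```

A high noise share can indicate that `eps` is small relative to the scale of the t-SNE embedding, which is worth checking before relying on the cluster counts.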


**Block 7: Build tensors (cluster mapping) and masks (active stocks)**

In [None]:
# Block 7 — Tensors & Masks 

import numpy as np
import pandas as pd
import os, gc, json

LOOKBACK = 64   # window size
DATA_DIR = "./tensors/"
os.makedirs(DATA_DIR, exist_ok=True)

feature_cols = [c for c in df_features.columns if c not in ["ticker","timestamp","cluster","month"]]

tensor_index = []

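for c_id, g in df_clusters.groupby("cluster"):
    if c_id == -1:   # skip noise points
        continue

    tickers = sorted(g["ticker"].unique())
    g_feat = df_features[df_features["ticker"].isin(tickers)].copy()

    # Pivot: index = timestamp, columns = (feature, ticker)
    pivoted = g_feat.pivot(index="timestamp", columns="ticker", values=feature_cols)
    # pivot() returns feature-major columns; reorder them to ticker-major
    # (ticker, feature) so that reshape(T, N, F) below maps every value to the
    # correct ticker and feature instead of only relabelling the columns
    pivoted = pivoted.swaplevel(0, 1, axis=1).reindex(
        columns=pd.MultiIndex.from_product([tickers, feature_cols])
    )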

    # Mask
    mask_df = ~pivoted.isna()
    pivoted_filled = pivoted.ffill().bfill()

    T, N, F = len(pivoted_filled.index), len(tickers), len(feature_cols)
    X = pivoted_filled.values.reshape(T, N, F)
    M = mask_df.values.reshape(T, N, F).astype(np.int8)

    cluster_tensors, cluster_masks, cluster_dates = [], [], []
    for i in range(LOOKBACK, T):
        cluster_tensors.append(X[i-LOOKBACK:i])
        cluster_masks.append(M[i-LOOKBACK:i])
        cluster_dates.append(pivoted_filled.index[i])  # label date: first row after the window (rows i-LOOKBACK .. i-1)

    if cluster_tensors:
        X_arr = np.array(cluster_tensors, dtype=np.float16)  # float16 to save RAM
        M_arr = np.array(cluster_masks, dtype=np.int8)

        tensor_file = f"cluster_{c_id}_tensor.npy"
        mask_file   = f"cluster_{c_id}_mask.npy"
        np.save(os.path.join(DATA_DIR, tensor_file), X_arr)
        np.save(os.path.join(DATA_DIR, mask_file), M_arr)

        tensor_index.append({
            "cluster": int(c_id),
            "tickers": tickers,
            "dates": [str(d) for d in cluster_dates],   # date T
            "dates_shifted": [str(d+pd.Timedelta(days=1)) for d in cluster_dates],  # T + 1 calendar day (for the reward); may not be a trading day
            "tensor_file": tensor_file,
            "mask_file": mask_file
        })

        print(f"Cluster {c_id}: tensor {X_arr.shape}, mask {M_arr.shape} saved.")

    del g_feat, pivoted, pivoted_filled, mask_df, X, M, cluster_tensors, cluster_masks
    gc.collect()

# Save metadata
with open(os.path.join(DATA_DIR, "tensor_index.json"), "w") as f:
    json.dump(tensor_index, f, indent=2)

print("✅ Done Block 7: tensors + masks saved for all clusters.")

# Delete df_features to free RAM
del df_features
gc.collect()
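
Block 7 turns a long table into a `(T, N, F)` array and then cuts it into lookback windows. The step that is easy to get wrong is the column order: `DataFrame.pivot` with a list of `values` puts the feature name on the outer column level, so the columns must be reordered to ticker-major before reshaping. The sketch below shows the idea on a toy frame; `to_tensor` and `sliding_windows` are illustrative helpers and the numbers are made up.

```python
import numpy as np
import pandas as pd

def to_tensor(df, tickers, feature_cols):
    """Return an array of shape (T, N, F) plus the timestamp index."""
    wide = df.pivot(index="timestamp", columns="ticker", values=feature_cols)
    # pivot() yields (feature, ticker) columns; reorder to (ticker, feature)
    # so that reshape(T, N, F) assigns each value to the right slot.
    wide = wide.swaplevel(0, 1, axis=1).reindex(
        columns=pd.MultiIndex.from_product([tickers, feature_cols])
    )
    T, N, F = len(wide), len(tickers), len(feature_cols)
    return wide.to_numpy().reshape(T, N, F), wide.index

def sliding_windows(X, dates, lookback):
    """Window i covers rows [i - lookback, i) and is labelled with dates[i]."""
    idx = range(lookback, len(X))
    windows = np.stack([X[i - lookback:i] for i in idx])
    return windows, [dates[i] for i in idx]

toy = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2023-01-03"] * 2 + ["2023-01-04"] * 2 + ["2023-01-05"] * 2
    ),
    "ticker": ["AAA", "BBB"] * 3,
    "f1": [1.0, 3.0, 11.0, 13.0, 21.0, 23.0],
    "f2": [2.0, 4.0, 12.0, 14.0, 22.0, 24.0],
})
X, dates = to_tensor(toy, ["AAA", "BBB"], ["f1", "f2"])
W, labels = sliding_windows(X, dates, lookback=2)
```

Here `X[0, 1, 0]` is feature `f1` of `BBB` on the first day, and the single window is labelled with the third date because it contains only the two rows before it.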


**Block 7.5: Prepare the backtest data (drop columns that are no longer needed)**

In [11]:
# Block 7.5: Prepare backtest data for the actual reward
import gc

# Keep only the data needed to compute the reward (close price)
# df_price was created in Block 3 from df_all, filtered to tickers with FA data
df_backtest = df_price[["ticker", "timestamp", "close"]].copy()

# Cast timestamp to datetime for consistency
df_backtest["timestamp"] = pd.to_datetime(df_backtest["timestamp"])

print("✅ Done Block 7.5: df_backtest sẵn sàng cho reward.")
print("Kích thước df_backtest:", df_backtest.shape)
print("Tickers unique:", df_backtest["ticker"].nunique())

# Delete variables that are no longer needed to save RAM
del df_price
gc.collect()


✅ Done Block 7.5: df_backtest sẵn sàng cho reward.
Kích thước df_backtest: (256151, 3)
Tickers unique: 391


21

**Block 8: Train A3C for each cluster**

In [None]:
# Block 8 — A3C multi-stock per-cluster 

import os, gc, json, csv
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim

DATA_DIR = "./tensors/"
SIG_DIR  = "./signals/"
MODEL_DIR = "./models/"
os.makedirs(SIG_DIR, exist_ok=True)
os.makedirs(MODEL_DIR, exist_ok=True)

SIG_FILE = os.path.join(SIG_DIR, "a3c_signals.csv")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Reset signals file
if os.path.exists(SIG_FILE):
    os.remove(SIG_FILE)
with open(SIG_FILE, "w", newline="") as f:
    csv.writer(f).writerow(["date","ticker","signal"])

# Load metadata
with open(os.path.join(DATA_DIR, "tensor_index.json"), "r") as f:
    tensor_index = json.load(f)

# --- Model ---
class A3CNet(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.actor = nn.Linear(hidden, 3)   # short, flat, long
        self.critic = nn.Linear(hidden, 1)
    def forward(self, x):
        out, _ = self.lstm(x)
        h = out[:, -1, :]
        return self.actor(h), self.critic(h)

# --- Loss ---
def a3c_loss(logits, values, actions, rewards, beta=0.01):
    adv = rewards - values.squeeze(-1)
    critic = adv.pow(2).mean()
    logp = torch.log_softmax(logits, dim=-1)
    actor = -(logp.gather(1, actions.unsqueeze(1)).squeeze(1) * adv.detach()).mean()
    entropy = -(torch.softmax(logits, dim=-1) * logp).sum(-1).mean()
    return actor + 0.5*critic - beta*entropy

# --- Training & inference ---
def process_cluster(meta, epochs=3, lr=1e-3, batch_size=256):
    c_id, tickers, dates, dates_shifted = meta["cluster"], meta["tickers"], meta["dates"], meta["dates_shifted"]
    X = np.load(os.path.join(DATA_DIR, meta["tensor_file"]), mmap_mode="r")
    M = np.load(os.path.join(DATA_DIR, meta["mask_file"]), mmap_mode="r")
    if X.size == 0:
        return
    B, T, N, F = X.shape
    print(f"Cluster {c_id} | X={X.shape}")

    # --- Load prices to compute the reward ---
    px = []
    for tk in tickers:
        s = df_backtest[df_backtest["ticker"]==tk].set_index("timestamp")["close"]
        # dates_shifted is stored as strings in the JSON metadata; convert it back to
        # datetime so the labels match the DatetimeIndex, then fill non-trading days
        s = s.reindex(pd.to_datetime(dates_shifted)).ffill().bfill().values
        px.append(s)
    px = np.stack(px, axis=1)  # shape (B, N)
    r = np.zeros_like(px, dtype=np.float32)
    r[1:] = np.log(px[1:] / np.maximum(px[:-1], 1e-9))  # daily log-return

    # --- Model + optimizer ---
    model = A3CNet(F).to(device)
    opt = optim.Adam(model.parameters(), lr=lr)

    # Mini-batch generator
    total = B*N
    def iterator():
        for start in range(0, total, batch_size):
            end = min(total, start+batch_size)
            xb, mb, rb, idx = [], [], [], []
            for s in range(start,end):
                b, n = divmod(s, N)
                xb.append(X[b,:,n,:])
                mb.append(M[b,:,n,:])
                rb.append(r[b,n])
                idx.append((b,n))
            yield np.stack(xb), np.stack(mb), np.array(rb), idx

    # --- Train ---
    for ep in range(epochs):
        loss_ep = 0
        for xb, mb, rb, _ in iterator():
            xb = torch.tensor(xb, dtype=torch.float32).to(device)
            rb = torch.tensor(rb, dtype=torch.float32).to(device)
            # apply the mask to the input (zero out positions that were NaN before filling)
            xb = xb * torch.tensor(mb, dtype=torch.float32).to(device)

            logits, vals = model(xb)
            dist = torch.distributions.Categorical(logits=logits)
            act = dist.sample()
            # reward by action: long earns rb, short earns -rb, flat earns 0
            reward = torch.where(act==2, rb, torch.where(act==0, -rb, torch.zeros_like(rb)))
            loss = a3c_loss(logits, vals, act, reward)

            opt.zero_grad(); loss.backward(); opt.step()
            loss_ep += loss.item()
        print(f"  Epoch {ep+1}/{epochs}, Loss={loss_ep:.4f}")
        gc.collect(); torch.cuda.empty_cache()

    # --- Save model ---
    model_path = os.path.join(MODEL_DIR, f"a3c_cluster_{c_id}.pt")
    torch.save(model.state_dict(), model_path)
    print(f"  ✅ Saved model checkpoint: {model_path}")

    # --- Inference & save signals ---
    with open(SIG_FILE,"a",newline="") as f:
        w = csv.writer(f)
        with torch.no_grad():
            for xb, mb, _, idx in iterator():
                xb = torch.tensor(xb, dtype=torch.float32).to(device)
                xb = xb * torch.tensor(mb, dtype=torch.float32).to(device)
                acts = torch.argmax(model(xb)[0], dim=-1).cpu().numpy() - 1  # (-1,0,1)
                for k,(b,n) in enumerate(idx):
                    w.writerow([dates[b], tickers[n], int(acts[k])])
                del xb, acts
                gc.collect(); torch.cuda.empty_cache()

    del X, M, px, r, model, opt
    gc.collect(); torch.cuda.empty_cache()

# --- Run all clusters ---
for meta in tensor_index:
    process_cluster(meta)

print(f"✅ Done Block 8: signals saved to {SIG_FILE}, models in {MODEL_DIR}")
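
In Block 8 the actor picks one of three actions per sample (0 = short, 1 = flat, 2 = long), the reward is the signed log-return, and the saved signal is the action shifted to `-1, 0, 1`. The NumPy sketch below restates just that mapping, outside of PyTorch, so it can be checked by hand. `action_reward` and `to_signal` are illustrative helpers and the returns are made-up numbers.

```python
import numpy as np

def action_reward(actions, next_returns):
    """Reward per sample: +r for long (2), -r for short (0), 0 for flat (1)."""
    actions = np.asarray(actions)
    r = np.asarray(next_returns, dtype=np.float32)
    reward = np.where(actions == 2, r, np.where(actions == 0, -r, 0.0))
    return reward.astype(np.float32)

def to_signal(actions):
    """Map action indices {0, 1, 2} to trading signals {-1, 0, 1}."""
    return np.asarray(actions) - 1

acts = np.array([2, 0, 1, 2])
rets = np.array([0.01, 0.02, 0.03, -0.01])
rewards = action_reward(acts, rets)   # [0.01, -0.02, 0.0, -0.01]
signals = to_signal(acts)             # [1, -1, 0, 1]
```

Note that a flat position always earns zero here, so the policy is only pushed away from flat when it expects the sign of the next return to be predictable.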


**Block 9: Inference with the A3C model**

In [None]:
# Block 9 — Inference từ checkpoint A3C

import os, gc, json, csv
import numpy as np
import pandas as pd
import torch
import torch.nn as nn

DATA_DIR = "./tensors/"
MODEL_DIR = "./models/"
SIG_DIR   = "./signals/"
os.makedirs(SIG_DIR, exist_ok=True)

SIG_FILE = os.path.join(SIG_DIR, "a3c_signals_infer.csv")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Reset signals file
if os.path.exists(SIG_FILE):
    os.remove(SIG_FILE)
with open(SIG_FILE, "w", newline="") as f:
    csv.writer(f).writerow(["date","ticker","signal"])

# Load metadata
with open(os.path.join(DATA_DIR, "tensor_index.json"), "r") as f:
    tensor_index = json.load(f)

# --- Model redefined (same as Block 8) ---
class A3CNet(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, batch_first=True)
        self.actor = nn.Linear(hidden, 3)   # short, flat, long
        self.critic = nn.Linear(hidden, 1)
    def forward(self, x):
        out, _ = self.lstm(x)
        h = out[:, -1, :]
        return self.actor(h), self.critic(h)

# --- Inference function ---
def infer_cluster(meta, batch_size=256):
    c_id, tickers, dates, dates_shifted = meta["cluster"], meta["tickers"], meta["dates"], meta["dates_shifted"]
    X = np.load(os.path.join(DATA_DIR, meta["tensor_file"]), mmap_mode="r")
    M = np.load(os.path.join(DATA_DIR, meta["mask_file"]), mmap_mode="r")
    if X.size == 0:
        return
    B, T, N, F = X.shape
    print(f"[Inference] Cluster {c_id} | X={X.shape}")

    # Load model checkpoint
    model_path = os.path.join(MODEL_DIR, f"a3c_cluster_{c_id}.pt")
    if not os.path.exists(model_path):
        print(f"⚠️ Model checkpoint not found: {model_path}, skip")
        return
    model = A3CNet(F).to(device)
    model.load_state_dict(torch.load(model_path, map_location=device))
    model.eval()

    # Inference & save signals
    total = B * N
    with open(SIG_FILE, "a", newline="") as f:
        w = csv.writer(f)
        with torch.no_grad():
            for start in range(0, total, batch_size):
                end = min(total, start+batch_size)
                xb, mb, idx = [], [], []
                for s in range(start, end):
                    b, n = divmod(s, N)
                    xb.append(X[b, :, n, :])
                    mb.append(M[b, :, n, :])
                    idx.append((b, n))
                xb = torch.tensor(np.stack(xb), dtype=torch.float32).to(device)
                mb = torch.tensor(np.stack(mb), dtype=torch.float32).to(device)

                # Apply the mask
                xb = xb * mb

                acts = torch.argmax(model(xb)[0], dim=-1).cpu().numpy() - 1  # (-1,0,1)
                for k,(b,n) in enumerate(idx):
                    # write dates[b] (the date recorded for window b in Block 7); the reward is computed at T+1 (dates_shifted)
                    w.writerow([dates[b], tickers[n], int(acts[k])])
                del xb, mb, acts
                gc.collect(); torch.cuda.empty_cache()

    del X, M, model
    gc.collect(); torch.cuda.empty_cache()

# --- Run inference all clusters ---
for meta in tensor_index:
    infer_cluster(meta)

print(f"✅ Done Block 9: inference signals saved to {SIG_FILE}")
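
Blocks 8 and 9 iterate over all `(window, ticker)` pairs through one flat index and recover the pair with `divmod(s, N)`. The short sketch below isolates that batching scheme in pure Python so the ordering is easy to verify; `batch_indices` is an illustrative name.

```python
def batch_indices(n_windows, n_tickers, batch_size):
    """Yield lists of (window, ticker) pairs that cover every sample exactly once."""
    total = n_windows * n_tickers
    for start in range(0, total, batch_size):
        end = min(total, start + batch_size)
        # divmod(s, n_tickers) -> (window index b, ticker index n)
        yield [divmod(s, n_tickers) for s in range(start, end)]

batches = list(batch_indices(n_windows=2, n_tickers=3, batch_size=4))
# batches[0] == [(0, 0), (0, 1), (0, 2), (1, 0)]
# batches[1] == [(1, 1), (1, 2)]
```

The flat index runs through all tickers of one window before moving to the next window, so each batch mixes tickers from neighbouring dates rather than dates from one ticker.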


**Block 10: Train the cluster-level DDPG (long-only positions)**

In [None]:
# Block 10 — Cluster DDPG + Execution Lag + Turnover Cost 

import os, gc, json, csv
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from FiinQuantX import FiinSession

# ===== Paths =====
DATA_DIR   = "./tensors/"
SIG_DIR    = "./signals/"
OUTPUT_DIR = "./backtest_ddpg/"
os.makedirs(OUTPUT_DIR, exist_ok=True)

# ===== Hyper-params & config =====
INIT_CAPITAL   = 10_000
BENCHMARK_TKR  = "VNINDEX"
EXECUTION_LAG  = 1         # signal at T -> return at T+1
COST_BPS       = 30        # turnover cost: 0.30% = 30 bps
STATE_LKBK     = 5         # number of historical days used as the state
MIN_NAMES_PER_CLUSTER = 2  # minimum number of active tickers in a cluster
SEED = 42

# DDPG 
EPOCHS       = 8
BATCH_SIZE   = 64
LR_ACTOR     = 1e-4
LR_CRITIC    = 5e-4
GAMMA        = 0.99
TAU          = 5e-3
NOISE_STD    = 0.03   # light noise on the logits
HIDDEN       = 96

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(SEED); np.random.seed(SEED)

# ===== Load artifacts from the earlier blocks =====
# (1) A3C signals
signals = pd.read_csv(os.path.join(SIG_DIR, "a3c_signals_infer.csv"))
signals["date"] = pd.to_datetime(signals["date"])

# (2) Prices for computing returns
df_px = df_backtest.rename(columns={"timestamp": "date"}).copy()
df_px["date"] = pd.to_datetime(df_px["date"])
px_wide = df_px.pivot(index="date", columns="ticker", values="close").sort_index()
ret_wide = px_wide.pct_change().fillna(0.0)

# (3) Map ticker -> cluster (for compatibility)
with open(os.path.join(DATA_DIR, "tensor_index.json"), "r") as f:
    tensor_index = json.load(f)
ticker2cluster = {}
for meta in tensor_index:
    for tk in meta["tickers"]:
        # if a ticker appears in several clusters over time, keep the first one for simplicity
        ticker2cluster.setdefault(tk, meta["cluster"])
ticker2cluster = pd.Series(ticker2cluster)

# ===== Train/Test split =====
TRAIN_START = pd.Timestamp("2023-01-01")
TRAIN_END   = pd.Timestamp("2024-12-31")
TEST_START  = pd.Timestamp("2025-01-01")
TEST_END    = ret_wide.index.max()

# ===== Prepare signals with execution lag =====
sig_wide_raw = signals.pivot_table(index="date", columns="ticker", values="signal", aggfunc="last").sort_index()
idx_all = ret_wide.index.union(sig_wide_raw.index)
ret_wide = ret_wide.reindex(idx_all).fillna(0.0)
sig_wide = sig_wide_raw.reindex(idx_all).fillna(0.0)
sig_wide_lag = sig_wide.shift(EXECUTION_LAG)  # <- execution lag

# keep only tickers that are in the cluster map and have returns
tickers = [t for t in ret_wide.columns if t in ticker2cluster.index]
ret_wide = ret_wide[tickers].astype("float32")
sig_wide_lag = sig_wide_lag[tickers].astype("float32")
cluster_of = ticker2cluster.loc[tickers]

# ===== Cluster list and members =====
clusters = sorted(cluster_of.unique().tolist())
cluster_members = {c: cluster_of[cluster_of == c].index.tolist() for c in clusters}
C = len(clusters)

# ===== Helpers =====
def build_state_arrays(ret_w, sig_lag, start, end, K=STATE_LKBK):
    """
    Build the cluster-level state (long-only):
      - activity[c,t] = share of tickers in cluster c with signal>0 on day t
      - cluster_ret[c,t] = mean return of the active tickers (signal>0) in cluster c on day t
    Returns:
      S_mat: (T, 2*C*K)  -> [activity (K days), cluster_ret (K days)]
      R_mat: (T, C)      -> cluster return on day t (used to compute the reward)
      dates: the corresponding index
      ACTIVE_masks: dict[c] -> DataFrame with the active mask of cluster c over this period
    """
    R = ret_w.loc[start:end]
    S = sig_lag.loc[start:end]
    dates = R.index

    # compute the daily activity and cluster return, cluster by cluster
    act_cols, ret_cols = [], []
    ACTIVE_masks = {}

    for c in clusters:
        tks = cluster_members[c]
        if not tks:
            # create zero columns if the cluster is empty (rare)
            act_c = pd.Series(0.0, index=dates, name=c)
            ret_c = pd.Series(0.0, index=dates, name=c)
            ACTIVE_masks[c] = pd.DataFrame(0.0, index=dates, columns=tks)
        else:
            S_c = S[tks]
            R_c = R[tks]

            active_mask = (S_c > 0).astype("float32")  # long-only
            ACTIVE_masks[c] = active_mask

            denom = active_mask.sum(axis=1).replace(0, np.nan)
            w = active_mask.div(denom, axis=0)  # equal weight across active tickers
            w = w.fillna(0.0)

            act_c = (active_mask.mean(axis=1)).astype("float32")  # fraction of active tickers
            ret_c = ((R_c * w).sum(axis=1)).astype("float32")     # mean return of active tickers

        act_cols.append(act_c.rename(c))
        ret_cols.append(ret_c.rename(c))

    act_df = pd.concat(act_cols, axis=1).astype("float32")
    cret_df = pd.concat(ret_cols, axis=1).astype("float32")

    # build state = the K most recent days of activity and cluster_ret, concatenated
    def stack_lookback(df, K):
        mats = []
        for k in range(K):
            mats.append(df.shift(k).fillna(0.0))
        return np.concatenate([m.values[:, :, None] for m in mats], axis=2)  # (T, C, K)

    A3 = stack_lookback(act_df, K)    # (T,C,K)
    R3 = stack_lookback(cret_df, K)   # (T,C,K)

    # drop the first days that lack a full lookback window
    valid = np.arange(A3.shape[0]) >= (K - 1)
    A3 = A3[valid]   # (T',C,K)
    R3_look = R3[valid]
    dates2 = dates[valid]

    # Flatten state: (T', 2*C*K)
    S_mat = np.concatenate([A3.reshape(len(dates2), -1), R3_look.reshape(len(dates2), -1)], axis=1).astype("float32")
    # reward uses the cluster return without lookback: take cret_df on the matching dates
    R_mat = cret_df.loc[dates2].values.astype("float32")

    # align ACTIVE_masks to dates2
    ACTIVE_masks = {c: ACTIVE_masks[c].loc[dates2] for c in clusters}
    return S_mat, R_mat, dates2, ACTIVE_masks

# ===== Prepare train and test sets =====
S_train, R_train, d_train, ACTIVE_train = build_state_arrays(ret_wide, sig_wide_lag, TRAIN_START, TRAIN_END, K=STATE_LKBK)
S_test,  R_test,  d_test,  ACTIVE_test  = build_state_arrays(ret_wide, sig_wide_lag, TEST_START,  TEST_END,  K=STATE_LKBK)

# ===== DDPG (long-only simplex) =====
class Actor(nn.Module):
    def __init__(self, s_dim, a_dim, hidden=HIDDEN):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, a_dim)  # logits
        )
    def forward(self, s):
        return torch.softmax(self.net(s), dim=-1)  # long-only, sum=1

class Critic(nn.Module):
    def __init__(self, s_dim, a_dim, hidden=HIDDEN):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1)
        )
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

class Buffer:
    def __init__(self, maxlen=20000):
        self.maxlen = maxlen; self.buf=[]
    def push(self, s,a,r,s2):
        if len(self.buf)>=self.maxlen: self.buf.pop(0)
        self.buf.append((s,a,r,s2))
    def sample(self, bs):
        n=min(bs,len(self.buf))
        idx=np.random.choice(len(self.buf), n, replace=False)
        s,a,r,s2 = zip(*[self.buf[i] for i in idx])
        return (np.array(s, np.float32), np.array(a, np.float32),
                np.array(r, np.float32).reshape(-1,1), np.array(s2, np.float32))

def step_soft_update(src, tgt, tau):
    with torch.no_grad():
        for p, tp in zip(src.parameters(), tgt.parameters()):
            tp.data.mul_(1-tau); tp.data.add_(tau*p.data)

def port_reward_longonly(w_c, r_c, prev_w_c=None):
    gross = float(np.dot(w_c, r_c))
    fee = 0.0
    # fees are charged at ticker level in the final simulation, so training keeps the reward fee-free for stability
    return gross - fee

s_dim = S_train.shape[1]; a_dim = C
actor = Actor(s_dim, a_dim).to(device)
critic = Critic(s_dim, a_dim).to(device)
t_actor = Actor(s_dim, a_dim).to(device); t_actor.load_state_dict(actor.state_dict())
t_critic= Critic(s_dim, a_dim).to(device); t_critic.load_state_dict(critic.state_dict())
optA = optim.Adam(actor.parameters(), lr=LR_ACTOR)
optC = optim.Adam(critic.parameters(), lr=LR_CRITIC)
mse = nn.MSELoss()
buf = Buffer()

# ---- Train (RAM-friendly) ----
S_tr = S_train; R_tr = R_train
prev_w = None
for ep in range(EPOCHS):
    c_loss=a_loss=0.0
    prev_w = None
    for t in range(len(S_tr)-1):
        s  = torch.from_numpy(S_tr[t]).to(device).unsqueeze(0)
        s2 = torch.from_numpy(S_tr[t+1]).to(device).unsqueeze(0)
        with torch.no_grad():
            w = actor(s).cpu().numpy()[0]
        # exploration on the simplex: Gaussian noise in log space, then renormalize
        logits = np.log(w + 1e-9) + np.random.normal(0, NOISE_STD, size=a_dim)
        w_e = np.exp(logits); w_e = (w_e / w_e.sum()).astype("float32")

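        # S_tr[t] already contains the cluster return of day t, so reward the action with the
        # next day's return; this matches the one-step action lag used in the backtest
        r = port_reward_longonly(w_e, R_tr[t+1], prev_w)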
        prev_w = w_e.copy()
        buf.push(S_tr[t], w_e, r, S_tr[t+1])

        if len(buf.buf) >= BATCH_SIZE:
            sb, ab, rb, s2b = buf.sample(BATCH_SIZE)
            sb  = torch.from_numpy(sb).to(device)
            ab  = torch.from_numpy(ab).to(device)
            rb  = torch.from_numpy(rb).to(device)
            s2b = torch.from_numpy(s2b).to(device)

            with torch.no_grad():
                a2 = t_actor(s2b)
                q2 = t_critic(s2b, a2)
                y  = rb + GAMMA * q2

            q  = critic(sb, ab)
            lc = mse(q, y)
            optC.zero_grad(); lc.backward(); optC.step()

            ap = actor(sb)
            la = -critic(sb, ap).mean()
            optA.zero_grad(); la.backward(); optA.step()

            step_soft_update(actor, t_actor, TAU)
            step_soft_update(critic, t_critic, TAU)

            c_loss += float(lc.item()); a_loss += float(la.item())
    print(f"[DDPG] Epoch {ep+1}/{EPOCHS} | Critic {c_loss:.4f} | Actor {a_loss:.4f}")
    gc.collect(); torch.cuda.empty_cache()

# ===== Backtest (Test): long-only with turnover-based fees at ticker level =====
dates = d_test
capital = INIT_CAPITAL
portfolio_value = pd.Series(index=dates, dtype="float64")
portfolio_value.iloc[0] = capital

# stream cluster weights to disk to save RAM
cw_path = os.path.join(OUTPUT_DIR, "cluster_weights_test.csv")
with open(cw_path, "w", newline="") as f:
    cw = csv.writer(f); cw.writerow(["date"] + [f"cluster_{c}" for c in clusters])

prev_w_ticker = pd.Series(0.0, index=tickers, dtype="float32")

for i, dt in enumerate(dates):
    # use the previous day's state (action lagged by one step to avoid look-ahead leakage)
    s_dt = dates[i-1] if i>0 else dates[i]
    s = torch.from_numpy(S_test[i-1].astype("float32") if i>0 else S_test[i].astype("float32")).to(device).unsqueeze(0)
    with torch.no_grad():
        w_c = actor(s).cpu().numpy()[0].astype("float32")
    w_c = np.clip(w_c, 0, 1); ssum = w_c.sum(); w_c = w_c/ssum if ssum>0 else np.ones_like(w_c)/len(w_c)

    # allocate down to tickers: within each cluster, split equally across active tickers (signal>0)
    w_ticker = pd.Series(0.0, index=tickers, dtype="float32")
    for j, c in enumerate(clusters):
        members = cluster_members[c]
        if not members: continue
        act_row = ACTIVE_test[c].iloc[i]  # active mask on day dt
        valid = [tk for tk in members if (tk in act_row.index and act_row[tk] > 0)]
        if len(valid) >= MIN_NAMES_PER_CLUSTER:
            share = w_c[j] / len(valid)
            w_ticker.loc[valid] += share

    # if no cluster is active, fall back to equal weight across all tickers (ensures sum=1)
    ssum = float(w_ticker.sum())
    if ssum <= 1e-12:
        w_ticker[:] = 1.0 / len(w_ticker)
    else:
        w_ticker /= ssum

    # append cluster weights to the CSV (streamed)
    with open(cw_path, "a", newline="") as f:
        cw = csv.writer(f); cw.writerow([dt.strftime("%Y-%m-%d")] + [float(x) for x in w_c])

    # return on day dt and turnover fee
    r_vec = ret_wide.loc[dt, w_ticker.index].values.astype("float32")
    turnover = float(np.sum(np.abs(w_ticker.values - prev_w_ticker.values)))
    fee = (COST_BPS/1e4) * turnover

    r_net = float(np.dot(w_ticker.values, r_vec)) - fee
    capital = capital * (1.0 + r_net)
    portfolio_value.iloc[i] = capital

    prev_w_ticker = w_ticker
    if (i % 50) == 0:
        gc.collect(); torch.cuda.empty_cache()

# ===== Benchmark (buy & hold VNINDEX) =====
print("Fetching VNINDEX for benchmark (buy & hold)...")
client = FiinSession(username="DSTC_18@fiinquant.vn", password="Fiinquant0606").login()
bench = client.Fetch_Trading_Data(
    realtime=False, tickers=BENCHMARK_TKR, fields=['close'],
    adjusted=True, by="1d", from_date=str(dates.min().date())
).get_data()
bench["date"] = pd.to_datetime(bench["timestamp"])
bench = bench.set_index("date")["close"].sort_index().reindex(dates).ffill().bfill()
bench_ret = bench.pct_change().fillna(0.0)
benchmark_value = (1 + bench_ret).cumprod() * INIT_CAPITAL

# ===== Save outputs (compact) =====
portfolio_value.to_frame("portfolio_value").to_csv(os.path.join(OUTPUT_DIR, "portfolio_value_test.csv"))
benchmark_value.to_frame("benchmark_value").to_csv(os.path.join(OUTPUT_DIR, "benchmark_value_test.csv"))

print("✅ Done Block 10 (long-only, lag, turnover, cluster-DDPG, RAM-optimized).")
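
The backtest above charges a fee proportional to turnover, i.e. `COST_BPS/1e4` times the sum of absolute weight changes between consecutive days. The following is a minimal sketch of that calculation on toy weights. The helper name `turnover_fee` and the 15 bps cost are hypothetical and chosen for illustration only; the real `COST_BPS` is set in an earlier block.

```python
import numpy as np

def turnover_fee(w_new, w_prev, cost_bps):
    """Fee as a fraction of capital: cost_bps/1e4 times the L1 change in weights."""
    turnover = float(np.sum(np.abs(np.asarray(w_new) - np.asarray(w_prev))))
    return (cost_bps / 1e4) * turnover

# Hypothetical weights: half of the book moves into the third ticker
w_prev = np.array([0.5, 0.5, 0.0])
w_new = np.array([0.25, 0.25, 0.5])

fee = turnover_fee(w_new, w_prev, cost_bps=15)  # 15 bps is an assumed value
print(f"turnover fee = {fee:.6f}")
```

Because `prev_w_ticker` starts at all zeros, the first test day always has turnover 1.0 under this definition, so the initial purchase is charged one full unit of cost.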

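The training loop in Block 10 explores by adding Gaussian noise to the log of the actor's weights and renormalizing, which keeps every explored action on the long-only simplex. The sketch below isolates that step. The function name `explore_on_simplex` and the value of `noise_std` are assumptions for illustration, and the max-subtraction is an added numerical-stability step that the original loop does not perform.

```python
import numpy as np

def explore_on_simplex(w, noise_std, rng):
    """Perturb weights in log space, then renormalize (a softmax over noisy logits)."""
    logits = np.log(w + 1e-9) + rng.normal(0.0, noise_std, size=w.shape)
    w_e = np.exp(logits - logits.max())  # subtracting the max does not change the result
    return w_e / w_e.sum()

rng = np.random.default_rng(0)
w = np.array([0.7, 0.2, 0.1])          # example actor output (already sums to 1)
w_e = explore_on_simplex(w, noise_std=0.1, rng=rng)  # noise_std is an assumed value
print(w_e, w_e.sum())
```

Noise in log space scales each weight multiplicatively, so small weights stay small and no weight can become negative, unlike additive noise followed by clipping.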

**Block 11: Performance statistics and charts**

In [None]:
# Block 11 — Performance Stats & Visualization (Test + Special Period)
import os
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render without a GUI to save resources
import matplotlib.pyplot as plt

OUTPUT_DIR = "./backtest_ddpg/"
os.makedirs(OUTPUT_DIR, exist_ok=True)

STATS_FILE = os.path.join(OUTPUT_DIR, "stats_test.csv")

# ---------- Helpers ----------
def max_drawdown(series: pd.Series) -> float:
    # returns a negative number (maximum drawdown ratio), e.g. -0.25 = -25%
    peak = series.cummax()
    dd = (series / peak) - 1.0
    return float(dd.min())

def compute_stats(port_val: pd.Series, bench_val: pd.Series | None = None) -> dict:
    # derive daily returns from the equity curve for consistency
    port_ret = port_val.pct_change().fillna(0.0)

    stats = {
        "Start": port_val.index.min().strftime("%Y-%m-%d"),
        "End": port_val.index.max().strftime("%Y-%m-%d"),
        "Final Value": float(port_val.iloc[-1]),
        "ROI (%)": float((port_val.iloc[-1] / port_val.iloc[0] - 1.0) * 100.0),
        "Volatility (ann %)": float(port_ret.std() * np.sqrt(252) * 100.0) if port_ret.std() > 0 else 0.0,
        "Sharpe": float((port_ret.mean() / port_ret.std()) * np.sqrt(252)) if port_ret.std() > 0 else 0.0,
        "Sortino": float((port_ret.mean() / port_ret[port_ret < 0].std()) * np.sqrt(252)) if port_ret[port_ret < 0].std() > 0 else 0.0,
        "Max Drawdown (%)": float(max_drawdown(port_val) * 100.0),
        "Hit Rate (%)": float((port_ret > 0).mean() * 100.0),
        "Days": int(len(port_ret))
    }
    if bench_val is not None and len(bench_val) > 1:
        bench_ret = bench_val.pct_change().fillna(0.0)
        stats.update({
            "Benchmark Final": float(bench_val.iloc[-1]),
            "Benchmark ROI (%)": float((bench_val.iloc[-1] / bench_val.iloc[0] - 1.0) * 100.0),
            "Excess Return (pp)": float(stats["ROI (%)"] - ((bench_val.iloc[-1] / bench_val.iloc[0] - 1.0) * 100.0))
        })
    return stats

def plot_equity(port_val: pd.Series, bench_val: pd.Series | None, title: str, path: str):
    plt.figure(figsize=(10, 6))
    plt.plot(port_val.index, port_val.values, label="Portfolio", linewidth=1.6)
    if bench_val is not None:
        plt.plot(bench_val.index, bench_val.values, label="VNINDEX (Buy&Hold)", linewidth=1.2)
    plt.title(title)
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.savefig(path, dpi=150)
    plt.close()

def plot_hist(port_ret: pd.Series, title: str, path: str):
    plt.figure(figsize=(9, 5))
    plt.hist(port_ret.dropna(), bins=50, alpha=0.8, color="#1f77b4")
    plt.title(title)
    plt.xlabel("Daily Return")
    plt.ylabel("Frequency")
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.savefig(path, dpi=150)
    plt.close()

# ---------- Load results from Block 10 ----------
# Test equity (DDPG)
port_val = pd.read_csv(
    os.path.join(OUTPUT_DIR, "portfolio_value_test.csv"),
    index_col=0, parse_dates=True
).iloc[:, 0]
port_val = port_val.sort_index()

# Benchmark equity (Buy & Hold VNINDEX)
bench_val = pd.read_csv(
    os.path.join(OUTPUT_DIR, "benchmark_value_test.csv"),
    index_col=0, parse_dates=True
).iloc[:, 0]
bench_val = bench_val.sort_index().reindex(port_val.index).ffill().bfill()

# ---------- Stats (Test) ----------
stats_test = compute_stats(port_val, bench_val)

# Plot charts
plot_equity(
    port_val, bench_val,
    "Equity Curve (Test) — Portfolio vs VNINDEX",
    os.path.join(OUTPUT_DIR, "equity_test.png")
)
plot_hist(
    port_val.pct_change().fillna(0.0),
    "Daily Return Histogram (Test)",
    os.path.join(OUTPUT_DIR, "hist_test.png")
)

# ---------- Special Period Analysis ----------
special_start, special_end = pd.Timestamp("2025-03-26"), pd.Timestamp("2025-04-15")
sub_port = port_val.loc[special_start:special_end]
sub_bench = bench_val.loc[special_start:special_end]

stats_special = {}
if len(sub_port) > 1:
    stats_special = compute_stats(sub_port, sub_bench)
    plot_equity(
        sub_port, sub_bench,
        f"Equity Curve — Special Period ({special_start.date()} → {special_end.date()})",
        os.path.join(OUTPUT_DIR, "equity_special.png")
    )
    plot_hist(
        sub_port.pct_change().fillna(0.0),
        f"Daily Return Histogram — Special Period ({special_start.date()} → {special_end.date()})",
        os.path.join(OUTPUT_DIR, "hist_special.png")
    )

# ---------- Save all stats ----------
all_stats = {"test": stats_test}
if stats_special:
    all_stats["special"] = stats_special

pd.DataFrame(all_stats).T.to_csv(STATS_FILE)
print("📊 Test Stats:")
print(pd.Series(stats_test))
if stats_special:
    print("\n📊 Special Period Stats:")
    print(pd.Series(stats_special))

print(f"\n✅ Block 11 done. Stats saved to: {STATS_FILE}")
print(f"   Plots saved to: {OUTPUT_DIR}")
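
As a quick sanity check of the drawdown statistic used in Block 11, the sketch below applies the same peak-to-trough formula to a small hand-made equity curve. The equity values are invented for illustration and are not results from the backtest.

```python
import pandas as pd

def max_drawdown(series: pd.Series) -> float:
    """Largest relative drop from a running peak, returned as a negative number."""
    peak = series.cummax()
    return float((series / peak - 1.0).min())

# Toy equity curve: peaks at 120, falls to 90, then recovers
equity = pd.Series([100.0, 120.0, 90.0, 110.0, 130.0])
mdd = max_drawdown(equity)
print(f"max drawdown = {mdd:.2%}")  # 90 / 120 - 1 = -25%
```

The later recovery to 130 does not affect the result, since drawdown is measured against the running peak at each point in time.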
