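# Finlab x FinMind Disposal-Stock Event Study System

This notebook builds an automated, event-driven analysis pipeline for disposal stocks (處置股). It combines Finlab disposal announcements with FinMind price data and produces standardized datasets for backtesting and statistical analysis.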

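### Core Features
1. **Data integration**: connects to the Finlab (disposal information) and FinMind (price) APIs automatically.
2. **Smart leveling**: implements consecutive-disposal detection (Strict Overlap) to identify each stock's 1st, 2nd, ..., Nth disposal event automatically.
3. **Dynamic labels**: generates timeline labels `s+N` (disposal start and the days during it) and `e+N` (after the disposal ends), including the release day `e+0`.
4. **Dual output**:
   - **Wide Format (`disposal_df_wide.csv`)**: the signal table. It contains no prices and is meant for backtesting systems that generate trading signals.
   - **Long Format (`disposal_df_long.csv`)**: the analysis table. It contains full price, volume, and return data and is meant for statistical research and visualization.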
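
The leveling logic itself lives in `process_disposal_events`, which is not shown in this notebook. The sketch below is one possible reading of "Strict Overlap", not the actual implementation. It assumes an event continues the previous chain when its `event_start_date` falls on or before the previous event's `event_end_date`, and that the count restarts at 1 otherwise. The helper name `assign_disposal_levels` and the sample rows are made up for illustration.

```python
import pandas as pd

def assign_disposal_levels(events: pd.DataFrame) -> pd.DataFrame:
    """Number consecutive disposals per stock (hypothetical helper).

    Assumption: an event extends the current chain when it starts on or
    before the previous event's end date; otherwise a new chain starts.
    """
    events = events.sort_values(["Stock_id", "event_start_date"]).reset_index(drop=True)
    prev_end = events.groupby("Stock_id")["event_end_date"].shift()
    # A chain starts at each stock's first event or when there is no overlap.
    new_chain = prev_end.isna() | (events["event_start_date"] > prev_end)
    chain_id = new_chain.cumsum()
    events["disposal_level"] = events.groupby(chain_id).cumcount() + 1
    return events

# Made-up events: the second 1213 event starts on the day the first one ends.
events = pd.DataFrame({
    "Stock_id": ["1213", "1213", "1213", "9999"],
    "event_start_date": pd.to_datetime(["2020-07-21", "2020-08-03", "2020-09-10", "2020-07-21"]),
    "event_end_date": pd.to_datetime(["2020-08-03", "2020-08-14", "2020-09-23", "2020-08-03"]),
})
leveled = assign_disposal_levels(events)
print(leveled["disposal_level"].tolist())  # [1, 2, 1, 1]
```

Comparing only against the immediately preceding event keeps the sketch short. A stricter variant could compare against the running maximum end date of the chain.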

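### Workflow
- **Step 1**: Fetch disposal announcements.
- **Step 1.5**: Run preprocessing (leveling and filters).
- **Step 2**: Fetch prices around the disposal periods in parallel.
- **Step 3**: Run the Event Study conversion and write the Wide/Long CSV files.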
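
To make the `s+N` / `e+N` labels from the Step 3 conversion concrete, here is a minimal sketch of how they could be derived from a trading calendar. It rests on two assumptions that the notebook does not confirm: `e+0` is taken to be the first trading day after `event_end_date`, and `pd.bdate_range` stands in for the real trading calendar (it ignores Taiwan market holidays). The helper `label_event_window` is hypothetical.

```python
import pandas as pd

def label_event_window(trading_days, start, end):
    """Return t-labels indexed by trading day (hypothetical helper)."""
    days = pd.DatetimeIndex(trading_days).sort_values()
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    start_pos = int(days.searchsorted(start))                 # position of s+0
    # Assumption: e+0 is the first trading day after the end date.
    release_pos = int(days.searchsorted(end, side="right"))
    labels = []
    for pos in range(len(days)):
        if pos < release_pos:
            labels.append(f"s{pos - start_pos:+d}")      # before or during the disposal
        else:
            labels.append(f"e{pos - release_pos:+d}")    # release day onward
    return pd.Series(labels, index=days)

# Stand-in calendar: weekdays only, no market holidays.
calendar = pd.bdate_range("2020-03-23", "2020-04-16")
labels = label_event_window(calendar, start="2020-03-27", end="2020-04-13")
print(labels.head().tolist())  # ['s-4', 's-3', 's-2', 's-1', 's+0']
print(labels[pd.Timestamp("2020-04-14")])  # e+0
```

Counting positions in the trading calendar instead of calendar days means weekends and holidays never consume a label.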

In [32]:
# [Env Setup] Load required packages and settings
import pandas as pd
from tqdm import tqdm
from loguru import logger
import sys
import os
import gc
import matplotlib.pyplot as plt
import seaborn as sns

%load_ext autoreload
%autoreload 2
sys.path.append("/Users/xinc./Documents/GitHub/note")
sys.path.append(os.getcwd()) # Add the current directory so that utils can be imported

from module.get_info_FinMind import FinMindClient, FinMindConfig
from module.get_info_Finlab import FinlabClient
from module.plot_func import plot
from utils import batch_fetch_prices, run_event_study, process_disposal_events

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [33]:
# [Step 1] Fetch Finlab disposal-stock data
# If a token is required, pass it at initialization, e.g. FinlabClient(token="YOUR_TOKEN")
finlab_client = FinlabClient()
print("Fetching disposal information from Finlab...")

# Fetch the data (use a wide range to make sure the required period is covered)
finlab_disposal = finlab_client.get_data("disposal_information", start_date='2018-01-01')

# [Manual Filter] Filter by date manually (works around the Finlab API's date-filter limitation)
if not finlab_disposal.empty:
    finlab_disposal['date'] = pd.to_datetime(finlab_disposal['date'])
    finlab_disposal = finlab_disposal[finlab_disposal['date'] >= '2018-01-01']
    print(f"Fetched {len(finlab_disposal)} records from Finlab.")
    print(f"Data Range: {finlab_disposal['date'].min()} to {finlab_disposal['date'].max()}")
else:
    print("No data fetched from Finlab.")

Fetching disposal information from Finlab...
Fetched 3383 records from Finlab.
Data Range: 2018-01-04 00:00:00 to 2025-09-26 00:00:00


In [34]:
# [Step 1.5] Preprocessing and leveling
# This step normalizes the Finlab disposal data and tags each event as First/Second Disposal
if 'finlab_disposal' in locals() and not finlab_disposal.empty:
    print("Processing disposal events...")
    processed_disposal = process_disposal_events(finlab_disposal)
    
    # print(f"Processed Data Shape: {processed_disposal.shape}")
    # display(processed_disposal.head())
    
    # Optional: Save for inspection
    processed_disposal.to_csv('processed_disposal_events.csv', index=False, encoding='utf-8-sig')
else:
    print("Finlab data not available. Please run Step 1 first.")
    processed_disposal = pd.DataFrame()

Processing disposal events...
Columns before processing: ['Stock_id', 'date', '證券名稱', 'condition', '處置措施', '處置內容', 'event_start_date', 'event_end_date', 'interval', 'key_date']
Processed 3383 events.
Level Distribution:
disposal_level
1     2702
2      453
3      110
4       36
5       17
6        8
7        6
8        5
9        5
10       4
11       4
12       4
13       3
14       3
15       3
16       3
17       3
18       3
19       3
20       2
21       2
22       2
23       1
24       1
Name: count, dtype: int64


In [35]:
# [Step 2] Fetch prices in parallel (FinMind)
# Use the processed event table (processed_disposal) so that consecutive disposal windows are not missed
logger.remove()
logger.add(sys.stderr, level="WARNING")

# Initialize FinMind Client
fm_client = FinMindClient()

offset_days = 5

if 'processed_disposal' in locals() and not processed_disposal.empty:
    # Start fetching
    price_df = batch_fetch_prices(fm_client, processed_disposal, offset_days=offset_days, max_workers=10)

    if not price_df.empty:
        print(f"Fetched Price Data Shape: {price_df.shape}")
        display(price_df.head())
    else:
        print("No price data fetched.")
else:
    print("No processed disposal data found. Please run Step 1.5.")

gc.collect()

Using pre-processed columns 'event_start_date' and 'event_end_date'.
Starting batch fetch for 1317 stocks with 10 workers...


Fetching Prices: 100%|██████████| 1317/1317 [00:39<00:00, 33.57it/s]

Fetched total 37289 rows.
Fetched Price Data Shape: (37289, 8)





| | Date | Stock_id | Open | High | Low | Close | Volume | TradingAmount |
|---|---|---|---|---|---|---|---|---|
| 0 | 2021-07-02 | 30001 | 24.5 | 26.8 | 22.1 | 26.6 | 16000 | 409800 |
| 1 | 2021-07-05 | 30001 | 26.6 | 26.6 | 25.5 | 26.0 | 4000 | 104100 |
| 2 | 2021-07-06 | 30001 | 27.0 | 28.0 | 27.0 | 27.0 | 8000 | 218100 |
| 3 | 2021-07-07 | 30001 | 27.0 | 27.0 | 25.3 | 25.3 | 16000 | 421800 |
| 4 | 2021-07-08 | 30001 | 25.3 | 25.3 | 25.0 | 25.3 | 8000 | 201200 |


602
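
`batch_fetch_prices` is imported from `utils` and its body is not shown here. As a rough illustration of what a `max_workers`-style parallel fetch can look like, the sketch below uses the standard library's `ThreadPoolExecutor`. The functions `fetch_one` and `fetch_all` are hypothetical stand-ins: `fetch_one` returns dummy rows and makes no FinMind call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import pandas as pd

def fetch_one(stock_id, start, end):
    """Stand-in for a single-stock price request (hypothetical; returns dummy rows)."""
    return pd.DataFrame({"Date": pd.bdate_range(start, end), "Stock_id": stock_id})

def fetch_all(requests, max_workers=10):
    """Run fetch_one for each (stock_id, start, end) tuple on a thread pool."""
    frames = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_one, *req): req for req in requests}
        for future in as_completed(futures):
            try:
                frames.append(future.result())
            except Exception as exc:
                # One failed stock should not abort the whole batch.
                print(f"Fetch failed for {futures[future][0]}: {exc}")
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)

# Made-up requests: one five-day window and one three-day window.
requests = [("1213", "2020-07-13", "2020-07-17"), ("9999", "2020-07-13", "2020-07-15")]
prices = fetch_all(requests, max_workers=2)
print(prices.shape)  # (8, 2)
```

Threads suit this workload because each request spends most of its time waiting on the network. Rows arrive in completion order, so sort by `Stock_id` and `Date` afterwards if order matters.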

In [36]:
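# [Step 3] Run the Event Study analysis
# Uses processed_disposal, which already carries flags such as is_first_disposal
if 'price_df' in locals() and not price_df.empty:
    disposal_wide, disposal_long = run_event_study(price_df, processed_disposal, offset_days=offset_days)
else:
    # Step 2 produced no prices; fall through to the empty-result branch below
    disposal_wide, disposal_long = pd.DataFrame(), pd.DataFrame()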

if not disposal_wide.empty:
    print(f"Wide Format Shape: {disposal_wide.shape}")
    print(f"Long Format Shape: {disposal_long.shape}")
    
    print("\n[Wide Head]")
    display(disposal_wide.head())
    
    print("\n[Long Head]")
    display(disposal_long.head())
    
    # Save both files
    disposal_wide.to_csv('disposal_df_wide.csv', index=False, encoding='utf-8-sig')
    disposal_long.to_csv('disposal_df_long.csv', index=False, encoding='utf-8-sig')
    print("Saved 'disposal_df_wide.csv' and 'disposal_df_long.csv'.")
    
    # For compatibility, save wide as original name too
    disposal_wide.to_csv('disposal_df.csv', index=False, encoding='utf-8-sig')
else:
    print("Analysis returned empty DataFrame.")

Detected Disposal Levels: [np.int64(1), np.int64(2), np.int64(3), np.int64(4), np.int64(5), np.int64(6), np.int64(7), np.int64(8), np.int64(9), np.int64(10), np.int64(11), np.int64(12), np.int64(13), np.int64(14), np.int64(15), np.int64(16), np.int64(17), np.int64(18), np.int64(19), np.int64(20), np.int64(21), np.int64(22), np.int64(23)]
Converting to Wide Format...
Analysis completed. Wide shape: (36797, 186), Long shape: (43844, 48)
Wide Format Shape: (36797, 186)
Long Format Shape: (43844, 48)

[Wide Head]


| | Date | Stock_id | t_label_first | condition_first | interval_first | event_start_date_first | event_end_date_first | relative_day_first | gap_days_first | calendar_relative_day_first | ... | gap_days_level_22 | calendar_relative_day_level_22 | t_label_level_23 | condition_level_23 | interval_level_23 | event_start_date_level_23 | event_end_date_level_23 | relative_day_level_23 | gap_days_level_23 | calendar_relative_day_level_23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-03-23 | 00642U | s-4 | 監視業務督導會報決議 | 5.0 | 2020-03-27 | 2020-04-13 | -4.0 | 0.0 | -4.0 | ... | | | | | | NaT | NaT | | | |
| 1 | 2020-03-24 | 00642U | s-3 | 監視業務督導會報決議 | 5.0 | 2020-03-27 | 2020-04-13 | -3.0 | 0.0 | -3.0 | ... | | | | | | NaT | NaT | | | |
| 2 | 2020-03-25 | 00642U | s-2 | 監視業務督導會報決議 | 5.0 | 2020-03-27 | 2020-04-13 | -2.0 | 0.0 | -2.0 | ... | | | | | | NaT | NaT | | | |
| 3 | 2020-03-26 | 00642U | s-1 | 監視業務督導會報決議 | 5.0 | 2020-03-27 | 2020-04-13 | -1.0 | 0.0 | -1.0 | ... | | | | | | NaT | NaT | | | |
| 4 | 2020-03-27 | 00642U | s+0 | 監視業務督導會報決議 | 5.0 | 2020-03-27 | 2020-04-13 | 0.0 | 0.0 | 0.0 | ... | | | | | | NaT | NaT | | | |



[Long Head]


| | Date | Stock_id | Open | High | Low | Close | Volume | TradingAmount | trading_idx | prev_trade_date | ... | t_label_level_15 | t_label_level_16 | t_label_level_17 | t_label_level_18 | t_label_level_19 | t_label_level_20 | t_label_level_21 | t_label_level_22 | t_label_level_23 | daily_ret |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-03-23 | 00642U | 9.3 | 10.1 | 9.2 | 10.1 | 56754750 | 552693946 | 0 | NaT | ... | | | | | | | | | | 0.086022 |
| 12 | 2020-03-24 | 00642U | 10.32 | 10.57 | 10.17 | 10.36 | 42796200 | 444039684 | 1 | 2020-03-23 | ... | | | | | | | | | | 0.003876 |
| 24 | 2020-03-25 | 00642U | 10.55 | 10.59 | 10.44 | 10.46 | 39627340 | 416239309 | 2 | 2020-03-24 | ... | | | | | | | | | | -0.008531 |
| 36 | 2020-03-26 | 00642U | 10.44 | 10.46 | 10.11 | 10.33 | 25842235 | 264724135 | 3 | 2020-03-25 | ... | | | | | | | | | | -0.010536 |
| 48 | 2020-03-27 | 00642U | 10.24 | 10.24 | 10.1 | 10.17 | 20479766 | 207667697 | 4 | 2020-03-26 | ... | | | | | | | | | | -0.006836 |


Saved 'disposal_df_wide.csv' and 'disposal_df_long.csv'.
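
A note on `daily_ret` in the Long preview above: the five values shown match an open-to-close return (`Close / Open - 1`), not a close-to-close change. The check below re-enters the preview rows by hand to confirm this. It is only a consistency check on the five displayed rows; the formula inside `run_event_study` is not shown in this notebook.

```python
import numpy as np
import pandas as pd

# Values copied from the Long Format preview for 00642U.
preview = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-23", "2020-03-24", "2020-03-25", "2020-03-26", "2020-03-27"]),
    "Open": [9.3, 10.32, 10.55, 10.44, 10.24],
    "Close": [10.1, 10.36, 10.46, 10.33, 10.17],
    "daily_ret": [0.086022, 0.003876, -0.008531, -0.010536, -0.006836],
})

intraday = preview["Close"] / preview["Open"] - 1   # open-to-close return
close_to_close = preview["Close"].pct_change()      # change vs. previous close

# Tolerance of 1e-6 covers the six-decimal rounding of the displayed values.
matches_intraday = np.allclose(intraday, preview["daily_ret"], atol=1e-6, rtol=0)
matches_close = np.allclose(close_to_close[1:], preview["daily_ret"][1:], atol=1e-6, rtol=0)
print(matches_intraday, matches_close)  # True False
```

If close-to-close returns are needed for the statistics, they can be derived from `Close` grouped by `Stock_id`.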


In [37]:
# [Step 4] Final filter (common stocks only)
# Keep only IDs that are exactly 4 characters long and do not start with '00' (common stocks only)

def is_common_stock(stock_id):
    sid = str(stock_id)
    return len(sid) == 4 and not sid.startswith('00')

if 'disposal_wide' in locals() and not disposal_wide.empty:
    print("Filtering Final Output for Common Stocks Only...")
    
    # Filter Wide Format
    mask_wide = disposal_wide['Stock_id'].apply(is_common_stock)
    final_wide = disposal_wide[mask_wide].copy()
    
    # Filter Long Format
    mask_long = disposal_long['Stock_id'].apply(is_common_stock)
    final_long = disposal_long[mask_long].copy()
    
    print(f"Wide Format: {len(disposal_wide)} -> {len(final_wide)} rows")
    print(f"Long Format: {len(disposal_long)} -> {len(final_long)} rows")
    
    # Save filtered versions (overwrites the unfiltered files written in Step 3; disposal_df.csv stays unfiltered)
    final_wide.to_csv('disposal_df_wide.csv', index=False, encoding='utf-8-sig')
    final_long.to_csv('disposal_df_long.csv', index=False, encoding='utf-8-sig')
    print("Saved 'disposal_df_wide.csv' and 'disposal_df_long.csv'.")
    
    # Preview
    display(final_wide.head())
else:
    print("Output dataframes not found. Please run Step 3 first.")

Filtering Final Output for Common Stocks Only...
Wide Format: 36797 -> 34765 rows
Long Format: 43844 -> 40532 rows
Saved 'disposal_df_wide.csv' and 'disposal_df_long.csv'.


| | Date | Stock_id | t_label_first | condition_first | interval_first | event_start_date_first | event_end_date_first | relative_day_first | gap_days_first | calendar_relative_day_first | ... | gap_days_level_22 | calendar_relative_day_level_22 | t_label_level_23 | condition_level_23 | interval_level_23 | event_start_date_level_23 | event_end_date_level_23 | relative_day_level_23 | gap_days_level_23 | calendar_relative_day_level_23 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 890 | 2020-07-16 | 1213 | s-3 | 連續三次 | | 2020-07-21 | 2020-08-03 | -3.0 | 0.0 | -5.0 | ... | | | | | | NaT | NaT | | | |
| 891 | 2020-07-17 | 1213 | s-2 | 連續三次 | | 2020-07-21 | 2020-08-03 | -2.0 | 0.0 | -4.0 | ... | | | | | | NaT | NaT | | | |
| 892 | 2020-07-20 | 1213 | s-1 | 連續三次 | | 2020-07-21 | 2020-08-03 | -1.0 | 2.0 | -1.0 | ... | | | | | | NaT | NaT | | | |
| 893 | 2020-07-21 | 1213 | s+0 | 連續三次 | | 2020-07-21 | 2020-08-03 | 0.0 | 0.0 | 0.0 | ... | | | | | | NaT | NaT | | | |
| 894 | 2020-07-22 | 1213 | s+1 | 連續三次 | | 2020-07-21 | 2020-08-03 | 1.0 | 0.0 | 1.0 | ... | | | | | | NaT | NaT | | | |
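
The three day-count columns in this preview can be reproduced from the dates alone, which helps when reading them. The sketch below is one interpretation that matches the five rows shown for stock 1213; it is not taken from `run_event_study`. It assumes `relative_day` counts trading rows from the start date, `calendar_relative_day` counts calendar days, and `gap_days` is the number of calendar days skipped since the previous trading row, with 0 for a row that has no predecessor.

```python
import pandas as pd

# Dates and start date copied from the preview rows for stock 1213.
frame = pd.DataFrame({"Date": pd.to_datetime(
    ["2020-07-16", "2020-07-17", "2020-07-20", "2020-07-21", "2020-07-22"])})
start = pd.Timestamp("2020-07-21")

frame["prev_trade_date"] = frame["Date"].shift()
# Calendar days skipped since the previous row (a weekend gives 2); first row gets 0.
frame["gap_days"] = ((frame["Date"] - frame["prev_trade_date"]).dt.days - 1).fillna(0)
# Signed calendar-day distance from the disposal start.
frame["calendar_relative_day"] = (frame["Date"] - start).dt.days
# Signed trading-row distance from the disposal start.
start_idx = frame.index[frame["Date"] == start][0]
frame["relative_day"] = frame.index - start_idx

print(frame[["relative_day", "gap_days", "calendar_relative_day"]])
```

The row for 2020-07-20 shows why both counts are kept: it is one trading day before the start, and it follows a two-day weekend gap.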
