# Alpha101 Single-Stock Factor Study

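> Five representative stocks from different industries are selected, and each Alpha factor is visualized stock by stock: candlestick chart and volume on top, the corresponding factor values below.
>
> **Stock selection:**
> | Code | Name | Industry |
> |------|------|------|
> | 600519 | 贵州茅台 (Kweichow Moutai) | Food & Beverage |
> | 000333 | 美的集团 (Midea Group) | White Goods |
> | 601899 | 紫金矿业 (Zijin Mining) | Industrial Metals |
> | 002594 | 比亚迪 (BYD) | Electric Passenger Vehicles |
> | 600276 | 恒瑞医药 (Hengrui Pharmaceuticals) | Chemical Pharmaceuticals |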

In [1]:
import sys, os
sys.path.insert(0, os.path.abspath("../.."))

import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

from stockquant.data.database import Database
from stockquant.indicators import Alpha101Indicators
from stockquant.visualization.plot import PlotEngine

# ---------- Configuration ----------
STOCKS = {
    "600519": "贵州茅台",
    "000333": "美的集团",
    "601899": "紫金矿业",
    "002594": "比亚迪",
    "600276": "恒瑞医药",
}

# Restrict to roughly the most recent year of data so the charts stay readable
DATE_START = "2025-01-01"
DATE_END   = "2026-02-13"

# Alpha factors to study
ALPHA_IDS = [1, 6, 12, 33, 41, 55, 101]

# Plotting engine
plot_engine = PlotEngine(backend="plotly")

print(f"✅ Configuration complete: {len(STOCKS)} stocks, {len(ALPHA_IDS)} Alpha factors")

✅ Configuration complete: 5 stocks, 7 Alpha factors


## 1. Load Data & Compute Alpha Factors

In [2]:
db = Database(read_only=True)

stock_data: dict[str, pd.DataFrame] = {}
indicator = Alpha101Indicators(alphas=ALPHA_IDS)

for code, name in STOCKS.items():
    df = db.query(
        "SELECT * FROM daily_bars WHERE code = ? AND date >= ? AND date <= ? ORDER BY date",
        [code, DATE_START, DATE_END],
    )
    if df.empty:
        print(f"⚠️ {code} {name}: no data, skipping")
        continue

    df["date"] = pd.to_datetime(df["date"])
    df = df.set_index("date").sort_index()

    # Compute the Alpha factors
    df = indicator.compute(df)

    stock_data[code] = df
    alpha_cols = [c for c in df.columns if c.startswith("alpha")]
    print(f"✅ {code} {name}: {len(df)} bars, computed {len(alpha_cols)} factors")

db.close()
print(f"\nLoaded {len(stock_data)} stocks in total")

2026-02-20 09:25:30.872 | INFO     | stockquant.data.database:conn:46 | Connected to DuckDB: /workspaces/stockQuant/stockquant/data/db/stockquant.duckdb
2026-02-20 09:25:30.897 | INFO     | stockquant.data.database:conn:46 | Connected to DuckDB: /workspaces/stockQuant/stockquant/data/db/stockquant.duckdb
2026-02-20 09:25:30.922 | INFO     | stockquant.indicators.alpha101.alpha101:_auto_load_stock_info:348 | Loaded industry data from stock_info: 1 industry
2026-02-20 09:25:30.926 | INFO     | stockquant.indicators.alpha101.alpha101:_auto_load_stock_info:360 | Loaded float market-cap data from stock_info
2026-02-20 09:25:30.973 | INFO     | stockquant.data.database:conn:46 | Connected to DuckDB: /workspaces/stockQuant/stockquant/data/db/stockquant.duckdb
2026-02-20 09:25:30.979 | INFO

✅ 600519 贵州茅台: 273 bars, computed 7 factors
✅ 000333 美的集团: 273 bars, computed 7 factors
✅ 601899 紫金矿业: 273 bars, computed 7 factors

2026-02-20 09:25:31.057 | INFO     | stockquant.data.database:conn:46 | Connected to DuckDB: /workspaces/stockQuant/stockquant/data/db/stockquant.duckdb
2026-02-20 09:25:31.069 | INFO     | stockquant.indicators.alpha101.alpha101:_auto_load_stock_info:348 | Loaded industry data from stock_info: 1 industry
2026-02-20 09:25:31.070 | INFO     | stockquant.indicators.alpha101.alpha101:_auto_load_stock_info:360 | Loaded float market-cap data from stock_info
2026-02-20 09:25:31.116 | INFO     | stockquant.data.database:conn:46 | Connected to DuckDB: /workspaces/stockQuant/stockquant/data/db/stockquant.duckdb
2026-02-20 09:25:31.131 | INFO     | stockquant.indicators.alpha101.alpha101:_auto_load_stock_info:348 | Loaded industry data from stock_info: 1 industry
2026-02-20 09:25:31.135 | INFO     | stockquant.indicators.alpha101.alpha101:_auto_load_stock_info:360 | Loaded float market-cap data from stock_info

✅ 002594 比亚迪: 273 bars, computed 7 factors
✅ 600276 恒瑞医药: 273 bars, computed 7 factors

Loaded 5 stocks in total


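## 2. Visualization Helper

Each Alpha factor gets one figure with one block of panels per stock, containing:
- **Top**: candlestick chart + volume bars
- **Bottom**: Alpha factor curve (with a zero line)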

In [3]:
# Formula descriptions of the Alpha factors
ALPHA_DESC = {
    1:   "rank(Ts_ArgMax(SignedPower(cond, 2), 5)) − 0.5",
    6:   "−1 × correlation(open, volume, 10)",
    12:  "sign(delta(volume, 1)) × (−1 × delta(close, 1))",
    33:  "rank(−(1 − open / close))",
    41:  "(high × low)^0.5 − vwap",
    55:  "−corr(rank((close−ts_min(low,12))/(ts_max(high,12)−ts_min(low,12))), rank(volume), 6)",
    101: "(close − open) / ((high − low) + 0.001)",
}

# Color palette
COLORS = ["#e74c3c", "#3498db", "#2ecc71", "#f39c12", "#9b59b6"]

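def plot_alpha_group(alpha_id: int):
    """Plot candlestick, volume and factor values of one Alpha factor for every loaded stock.

    Three rows per stock: candlestick | volume | Alpha factor value.
    Candlestick and volume traces come from PlotEngine.build_kline_traces(), reusing the existing plotting logic.
    """
    col = f"alpha{alpha_id:03d}"
    n = len(stock_data)
    rows_per_stock = 3  # candlestick, volume, Alpha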

    fig = make_subplots(
        rows=n * rows_per_stock, cols=1,
        shared_xaxes=True,
        vertical_spacing=0.012,
        row_heights=[3, 1, 1.5] * n,
        subplot_titles=[
            item
            for code in stock_data
            for item in (f"{STOCKS[code]}({code}) Candlestick", "Volume", f"Alpha#{alpha_id}")
        ],
    )

    for i, (code, df) in enumerate(stock_data.items()):
        row_candle = i * rows_per_stock + 1
        row_volume = i * rows_per_stock + 2
        row_alpha  = i * rows_per_stock + 3
        color = COLORS[i % len(COLORS)]
        dates = df.index

        # ---- Candlestick + volume: reuse PlotEngine ----
        kline_df = df.reset_index().rename(columns={df.index.name or "index": "date"})
        kline_traces = plot_engine.build_kline_traces(kline_df, name=STOCKS[code])
        for trace, target in kline_traces:
            row = row_candle if target == "candle" else row_volume
            fig.add_trace(trace, row=row, col=1)

        # ---- Alpha factor ----
        alpha_vals = df[col] if col in df.columns else pd.Series(np.nan, index=dates)

        fig.add_trace(
            go.Scatter(
                x=dates, y=alpha_vals,
                mode="lines",
                line=dict(color=color, width=1.2),
                fill="tozeroy",
                fillcolor=f"rgba({int(color[1:3],16)},{int(color[3:5],16)},{int(color[5:7],16)},0.15)",
                name=f"Alpha#{alpha_id}",
                showlegend=(i == 0),
            ),
            row=row_alpha, col=1,
        )

        # Zero line
        fig.add_hline(y=0, line_dash="dot", line_color="gray",
                      line_width=0.8, row=row_alpha, col=1)

    desc = ALPHA_DESC.get(alpha_id, "")
    fig.update_layout(
        title=dict(
            text=f"Alpha#{alpha_id}  —  {desc}",
            font=dict(size=16),
        ),
        height=320 * n,
        width=1100,
        template="plotly_white",
        xaxis_rangeslider_visible=False,
        showlegend=True,
        margin=dict(l=60, r=30, t=60, b=30),
    )

    # Hide every rangeslider
    for ax_key in [k for k in fig.layout.to_plotly_json() if k.startswith("xaxis")]:
        fig.layout[ax_key]["rangeslider"] = dict(visible=False)

    fig.show()

print("✅ Defined plotting helper plot_alpha_group()")

✅ Defined plotting helper plot_alpha_group()


## 3. Factor Distribution Overview

First, a look at the basic statistics of each factor across the 5 stocks.

In [4]:
rows = []
for code, df in stock_data.items():
    for aid in ALPHA_IDS:
        col = f"alpha{aid:03d}"
        if col in df.columns:
            s = df[col].dropna()
            rows.append({
                "Stock": f"{STOCKS[code]}({code})",
                "Alpha": f"#{aid}",
                "Mean": f"{s.mean():.4f}",
                "Std": f"{s.std():.4f}",
                "Min": f"{s.min():.4f}",
                "Max": f"{s.max():.4f}",
                "Valid %": f"{len(s)/len(df)*100:.1f}%",
            })

summary = pd.DataFrame(rows)
summary.style.set_properties(**{"text-align": "right"})

| | Stock | Alpha | Mean | Std | Min | Max | Valid % |
|---|---|---|---|---|---|---|---|
| 0 | 贵州茅台(600519) | #1 | 0.5 | 0.0 | 0.5 | 0.5 | 98.9% |
| 1 | 贵州茅台(600519) | #6 | 0.0925 | 0.3692 | -0.7377 | 0.7637 | 98.5% |
| 2 | 贵州茅台(600519) | #12 | -9.902 | 90.238 | -572.98 | 346.11 | 99.6% |
| 3 | 贵州茅台(600519) | #33 | 1.0 | 0.0 | 1.0 | 1.0 | 100.0% |
| 4 | 贵州茅台(600519) | #41 | -138372.7326 | 5855.5977 | -154836.8393 | -124804.625 | 100.0% |
| 5 | 贵州茅台(600519) | #55 | | | | | 0.0% |
| 6 | 贵州茅台(600519) | #101 | -0.0822 | 0.5265 | -1.0 | 0.9508 | 100.0% |
| 7 | 美的集团(000333) | #1 | 0.5 | 0.0 | 0.5 | 0.5 | 99.3% |
| 8 | 美的集团(000333) | #6 | 0.0701 | 0.3573 | -0.78 | 0.8407 | 98.5% |
| 9 | 美的集团(000333) | #12 | -0.0763 | 3.6128 | -11.66 | 23.18 | 99.6% |

## 4. Factor-by-Factor Visualization

One figure per Alpha factor, with the 5 stocks stacked vertically: candlestick chart on top, factor values below.

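### Alpha #1: Volatility/Price Conditional Rank Factor
`rank(Ts_ArgMax(SignedPower(cond, 2), 5)) − 0.5`
- `cond` is the volatility (the 20-day standard deviation of returns in the original Alpha101 definition) when the return is negative and the close price otherwise; it is squared with its sign preserved, and the factor ranks the position of the maximum within a 5-day window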
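
The formula can be sketched in pandas as follows. This is a minimal illustration on synthetic prices, assuming `rank` means a cross-sectional percentile rank across stocks on each date; how `Alpha101Indicators` implements it is not shown in this notebook. Under that assumption, a panel holding a single stock always ranks 1.0, so the factor is a constant 0.5, which is one possible explanation for the constant 0.5 reported in section 3.

```python
import numpy as np
import pandas as pd


def alpha001(close: pd.DataFrame) -> pd.DataFrame:
    """Sketch of Alpha #1 on a (date x stock) panel of close prices."""
    returns = close.pct_change()
    vol20 = returns.rolling(20).std()
    # cond: 20-day volatility where the return is negative, close price otherwise
    cond = vol20.where(returns < 0, close)
    powered = np.sign(cond) * cond.abs() ** 2                # SignedPower(cond, 2)
    argmax5 = powered.rolling(5).apply(np.argmax, raw=True)  # Ts_ArgMax(., 5)
    # Cross-sectional percentile rank on each date, shifted by 0.5
    return argmax5.rank(axis=1, pct=True) - 0.5


# Synthetic random-walk prices for three hypothetical stocks.
rng = np.random.default_rng(0)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.02, size=(60, 3)), axis=0)),
    columns=["A", "B", "C"],
)
panel_values = alpha001(prices)          # three stocks ranked against each other
single_values = alpha001(prices[["A"]])  # one stock: the rank is always 1.0
print(single_values["A"].dropna().unique())
```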

In [5]:
plot_alpha_group(1)

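### Alpha #6: Open-Volume Negative Correlation Factor
`−1 × correlation(open, volume, 10)`
- The negated correlation between open and volume over a 10-day rolling window; the value is most positive when the open price and volume move in opposite directions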
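
A minimal pandas sketch of this factor, on synthetic data in which the open rises steadily while volume falls steadily. The function name `alpha006` is illustrative; the library's own implementation may differ in details such as minimum periods.

```python
import numpy as np
import pandas as pd


def alpha006(open_: pd.Series, volume: pd.Series, window: int = 10) -> pd.Series:
    """Negated rolling Pearson correlation between open and volume."""
    return -open_.rolling(window).corr(volume)


days = np.arange(15)
open_ = pd.Series(10.0 + 0.1 * days)      # open price rising linearly
volume = pd.Series(1000.0 - 10.0 * days)  # volume falling linearly
values = alpha006(open_, volume)
print(values.dropna().round(6).tolist())
```

Because the two series are perfectly anti-correlated, every full window yields a correlation of −1 and a factor value of +1 (up to floating-point error); the first nine rows are NaN because the window is not yet full.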

In [6]:
plot_alpha_group(6)

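### Alpha #12: Volume-Price Momentum Factor
`sign(delta(volume, 1)) × (−1 × delta(close, 1))`
- Positive when volume rises and price falls, negative when volume rises and price rises (the signs flip when volume falls); it captures short-term reversal signals from volume-price divergence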
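
The sign logic is easy to verify on a few hand-made bars. The sketch below uses hypothetical prices and volumes; `alpha012` is an illustrative name.

```python
import numpy as np
import pandas as pd


def alpha012(close: pd.Series, volume: pd.Series) -> pd.Series:
    """sign(delta(volume, 1)) * (-1 * delta(close, 1))"""
    return np.sign(volume.diff()) * (-close.diff())


close = pd.Series([10.0, 9.5, 10.5, 10.0])
volume = pd.Series([100.0, 150.0, 200.0, 120.0])
values = alpha012(close, volume)
# day 1: volume up, price down 0.5 -> +0.5
# day 2: volume up, price up 1.0   -> -1.0
# day 3: volume down, price down   -> -0.5
print(values.tolist())
```

Note that the output is in price units rather than normalized, so its magnitude scales with the share price. That is consistent with the much wider range reported for 贵州茅台 than for 美的集团 in section 3, and it means raw values are not directly comparable across stocks.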

In [7]:
plot_alpha_group(12)

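### Alpha #33: Open/Close Ratio Rank Factor
`rank(−(1 − open / close))`
- The expression equals `rank(open / close − 1)`: the higher the open relative to the close (the longer the bearish candle), the larger the factor value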
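
To make the sign of this factor concrete, the sketch below ranks three hypothetical stocks on one date: one bullish candle, one flat, one bearish. It assumes `rank` is a cross-sectional percentile rank across stocks, which is an assumption about the library rather than something this notebook confirms.

```python
import pandas as pd


def alpha033(open_: pd.DataFrame, close: pd.DataFrame) -> pd.DataFrame:
    """Sketch of Alpha #33 on (date x stock) panels."""
    raw = -(1 - open_ / close)          # algebraically open / close - 1
    return raw.rank(axis=1, pct=True)   # percentile rank across stocks


open_ = pd.DataFrame({"bull": [10.0], "flat": [10.0], "bear": [10.0]})
close = pd.DataFrame({"bull": [11.0], "flat": [10.0], "bear": [9.0]})
ranks = alpha033(open_, close)
single = alpha033(open_[["bull"]], close[["bull"]])
print(ranks.round(3).to_dict("records"))
print(single.to_dict("records"))
```

The bearish stock receives the highest rank and the bullish one the lowest. With only one stock in the panel the rank is always 1.0, which would match the constant 1.0 shown for #33 in section 3 if the factor is computed one stock at a time.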

In [8]:
plot_alpha_group(33)

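### Alpha #41: Geometric Mean Price Deviation Factor
`(high × low)^0.5 − vwap`
- The difference between the geometric mean of the high and low prices and the VWAP; the signal is pronounced when the volume-weighted price sits far from the middle of the day's range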
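
Since both terms are prices, the factor should normally stay within the day's price range in magnitude. The sketch below uses a hypothetical bar with made-up column names (`amount`, `volume_shares`) and assumes VWAP is derived as traded value divided by traded volume. It shows how a unit mismatch, for example volume stored in lots of 100 shares, inflates the VWAP a hundredfold. This is only one possible cause of the very large negative #41 values in section 3 and has not been checked against the database.

```python
import numpy as np
import pandas as pd


def alpha041(high: pd.Series, low: pd.Series, vwap: pd.Series) -> pd.Series:
    """Geometric mean of high and low minus VWAP."""
    return np.sqrt(high * low) - vwap


bars = pd.DataFrame({
    "high": [102.0],
    "low": [98.0],
    "amount": [1_000_000.0],      # traded value in currency units (hypothetical)
    "volume_shares": [10_000.0],  # traded volume in shares (hypothetical)
})
vwap_shares = bars["amount"] / bars["volume_shares"]        # 100.0 per share
vwap_lots = bars["amount"] / (bars["volume_shares"] / 100)  # 10000.0 if volume is in lots
factor_shares = alpha041(bars["high"], bars["low"], vwap_shares)
factor_lots = alpha041(bars["high"], bars["low"], vwap_lots)
print(factor_shares.iloc[0], factor_lots.iloc[0])
```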

In [9]:
plot_alpha_group(41)

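### Alpha #55: Price-Position and Volume Rank Negative Correlation Factor
`−corr(rank((close−ts_min(low,12))/(ts_max(high,12)−ts_min(low,12))), rank(volume), 6)`
- Measures the 6-day correlation between the rank of the close's position within the 12-day high-low range and the rank of volume, negated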
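
A pandas sketch of the formula on synthetic panels follows, again assuming both `rank` calls are cross-sectional percentile ranks. Under that assumption a single-stock panel makes both ranked series constant, so their correlation is undefined and every value is NaN. This is one plausible reason for the 0.0% valid rate of #55 in section 3, not a confirmed diagnosis.

```python
import numpy as np
import pandas as pd


def alpha055(high, low, close, volume) -> pd.DataFrame:
    """Sketch of Alpha #55 on (date x stock) DataFrames."""
    lo12 = low.rolling(12).min()
    hi12 = high.rolling(12).max()
    position = (close - lo12) / (hi12 - lo12)  # 0 = at the 12-day low, 1 = at the 12-day high
    rank_pos = position.rank(axis=1, pct=True)
    rank_vol = volume.rank(axis=1, pct=True)
    # Column-by-column 6-day rolling correlation, negated
    return -rank_pos.rolling(6).corr(rank_vol, pairwise=False)


# Synthetic data for three hypothetical stocks.
rng = np.random.default_rng(1)
close = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.02, size=(40, 3)), axis=0)),
    columns=list("ABC"),
)
high, low = close * 1.01, close * 0.99
volume = pd.DataFrame(
    rng.integers(1_000, 5_000, size=(40, 3)), columns=list("ABC")
).astype(float)

panel_values = alpha055(high, low, close, volume)
single_values = alpha055(high[["A"]], low[["A"]], close[["A"]], volume[["A"]])
print(int(single_values.notna().sum().sum()))
```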

In [10]:
plot_alpha_group(55)

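### Alpha #101: Intraday Body-to-Range Ratio Factor
`(close − open) / ((high − low) + 0.001)`
- The candle body as a fraction of the full high-low range; the longer the bullish candle and the shorter its upper and lower shadows, the closer the value is to 1 (and to −1 for a bearish candle)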
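
The four hypothetical candles below illustrate the range of the factor: a full bullish body, a doji, a full bearish body, and a bar with no price movement at all. The function name `alpha101` is illustrative.

```python
import pandas as pd


def alpha101(open_: pd.Series, high: pd.Series, low: pd.Series, close: pd.Series) -> pd.Series:
    """Candle body divided by the high-low range (plus 0.001 to avoid division by zero)."""
    return (close - open_) / ((high - low) + 0.001)


candles = pd.DataFrame(
    {
        "open":  [10.0, 10.0, 11.0, 10.0],
        "high":  [11.0, 10.5, 11.0, 10.0],
        "low":   [10.0,  9.5, 10.0, 10.0],
        "close": [11.0, 10.0, 10.0, 10.0],
    },
    index=["full bullish", "doji", "full bearish", "no movement"],
)
values = alpha101(candles["open"], candles["high"], candles["low"], candles["close"])
print(values.round(4).to_dict())
```

The `0.001` term keeps the last bar finite (it evaluates to 0 rather than 0/0) and also means the factor never quite reaches ±1, although for wide-range bars it rounds to ±1.0 at four decimals.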

In [11]:
plot_alpha_group(101)

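## 5. Factor Correlation Matrix

Check whether these factors are highly correlated within the same stock (high correlation implies redundant information).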

In [12]:
alpha_cols = [f"alpha{aid:03d}" for aid in ALPHA_IDS]

fig = make_subplots(
    rows=1, cols=len(stock_data),
    subplot_titles=[f"{STOCKS[c]}({c})" for c in stock_data],
    horizontal_spacing=0.04,
)

for idx, (code, df) in enumerate(stock_data.items()):
    corr = df[alpha_cols].corr()
    fig_heat = go.Heatmap(
        z=corr.values,
        x=[f"#{aid}" for aid in ALPHA_IDS],
        y=[f"#{aid}" for aid in ALPHA_IDS],
        colorscale="RdBu_r",
        zmin=-1, zmax=1,
        showscale=(idx == len(stock_data) - 1),
        text=corr.round(2).values,
        texttemplate="%{text}",
        textfont=dict(size=9),
    )
    fig.add_trace(fig_heat, row=1, col=idx + 1)

fig.update_layout(
    title="Alpha factor correlation matrix per stock",
    height=400,
    width=1100,
    template="plotly_white",
)
fig.show()
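
To complement the heatmaps, the redundancy check can also be expressed as a list of factor pairs whose absolute correlation exceeds a threshold. The helper `redundant_pairs` and the 0.8 cutoff are illustrative assumptions, and the demo data is synthetic. Pairs involving a constant or all-NaN column produce NaN correlations and are skipped.

```python
import numpy as np
import pandas as pd


def redundant_pairs(factors: pd.DataFrame, threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """List factor pairs with |Pearson correlation| >= threshold."""
    corr = factors.corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:          # upper triangle only, no self-pairs
            r = corr.loc[a, b]
            if pd.notna(r) and abs(r) >= threshold:
                pairs.append((a, b, round(float(r), 3)))
    return pairs


x = np.arange(10, dtype=float)
demo = pd.DataFrame({
    "f1": x,
    "f2": 2 * x + 1,                    # linear function of f1
    "f3": np.tile([1.0, -1.0], 5),      # alternating, nearly uncorrelated with f1
    "f4": np.full(10, 0.5),             # constant, correlation undefined
})
pairs = redundant_pairs(demo)
print(pairs)
```

In the notebook this could be called per stock, for example `redundant_pairs(df[alpha_cols])` inside the loop over `stock_data`.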