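Drawing on the Python code (tick-level microstructure data), the TradingView script (reversal signals), and the Coinglass liquidation map (macro-level potential energy) discussed above, we can distill a rigorous **"micro-level game factor mining methodology"**.

This methodology differs from traditional technical analysis built on price, volume, time, and space. It is closer to a combination of **market microstructure** and **behavioral finance**.

We can summarize it as the **"D-S-G-N" four-layer pyramid framework**: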

---

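### Layer 1: Data Reconstruction (D - Data Reconstruction)

**Core idea: do not look only at candlesticks (summaries); look at events and states.**

A candlestick is heavily compressed information: it discards who was buying, how they bought, and how much they spent. In this approach we have to "open the black box":

1. **Break the data into atomic units**:
   * **Tick Flow**: decompose the time series into trade-by-trade capital flow.
   * **Side Classification**: not just buy vs sell, but **Maker (resting orders, providing liquidity)** vs **Taker (aggressive orders, consuming liquidity)**.
   * **Semantic Tagging**: this is the most critical step, and it is what `_dynamic_bucketing` in the code does.
     * Large vs small orders (institutions vs retail)
     * Liquidation orders vs aggressive orders (forced stop-out vs deliberate attack)
     * Opening vs closing (new bets vs conceding and exiting)

> **Methodology takeaway**: the first step of factor mining is not finding a formula but **labeling** the raw data. If the data source carries no labels such as "large order", "liquidation", or "high leverage", every downstream model is garbage in, garbage out.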
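
The tagging step can be sketched as follows. This is a minimal illustration, not the `_dynamic_bucketing` implementation referenced above (which is not shown here). It assumes a time-indexed DataFrame of liquidation events with `side`, `price`, and `amount` columns; the function name `tag_liquidations`, the bucket names, and the default quantiles are hypothetical.

```python
import numpy as np
import pandas as pd

def tag_liquidations(events: pd.DataFrame, lookback: str = "24h",
                     q_med: float = 0.75, q_large: float = 0.90) -> pd.DataFrame:
    """Tag each event with the liquidated side and a size bucket.

    Thresholds are rolling quantiles of notional over `lookback`, shifted by
    one event so that an event is never compared against itself.
    """
    out = events.sort_index().copy()
    out["notional"] = out["price"] * out["amount"]
    # side == 'sell' means a long was liquidated; 'buy' means a short was.
    out["liq_side"] = np.where(out["side"] == "sell", "long", "short")

    roll = out["notional"].rolling(lookback)
    med_thr = roll.quantile(q_med).shift(1)
    large_thr = roll.quantile(q_large).shift(1)

    # Comparisons against a missing threshold are False, so the first
    # event falls through to the default bucket.
    is_large = (out["notional"] > large_thr).to_numpy()
    is_med = (out["notional"] > med_thr).to_numpy()
    out["bucket"] = np.select([is_large, is_med], ["large", "med"], default="small")
    return out
```

Using rolling quantiles rather than fixed sizes keeps the meaning of "large" tied to recent activity, so the same label remains comparable across quiet and volatile periods.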

---

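### Layer 2: Game Logic (S - Structural Logic)

**Core idea: market movement is a process of harvesting liquidity, and metaphors from physics and biology help describe it.**

From the Coinglass map and the `derived_features` in the Python code, we can identify three core models of what drives the market:

1. **Fuel model (Fuel / Breakout)**
   * **Logic**: liquidation volume = fuel. Price breaks a key level $\to$ stops and liquidations trigger $\to$ forced market buys $\to$ price is pushed further up.
   * **Factor**: `short_liq_volume` (short liquidation volume). A sharp spike in it marks a prime entry point for momentum strategies.
2. **Magnet model (Magnet / Mean Reversion)**
   * **Logic**: a large liquidation cluster = a magnet. Market makers (MM) and high-frequency algorithms tend to push price toward liquidity-dense zones, where they can turn over large size.
   * **Factor**: `Distance_to_Max_Liq` (distance to the largest liquidation cluster). The closer the price, the stronger the pull.
3. **Pressure model (Pressure / Divergence)**
   * **Logic**: divergence between large players and retail. If retail is buying frantically (small-order Count surges) but price cannot rise, large players are quietly distributing (large-order Sum is negative).
   * **Factor**: `Whale_Retail_Ratio` (whale-to-retail ratio), `Net_Burn_Volume`.

> **Methodology takeaway**: a good factor must have a **game-theoretic meaning** that can be stated in one sentence (for example, "the dominant player is baiting longs" or "shorts are being wiped out").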
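
As a rough illustration of how two of these factors could be computed, the sketch below assumes a liquidation map given as arrays of price levels and estimated liquidation notional per level, plus per-bar order statistics. The function names, inputs, and the `surge` multiplier are hypothetical; they are not taken from the code or from Coinglass.

```python
import numpy as np

def distance_to_max_liq(price: float, levels, liq_notional) -> float:
    """Signed relative distance from `price` to the densest liquidation level.

    Positive means the largest cluster sits above the current price.
    """
    levels = np.asarray(levels, dtype=float)
    liq_notional = np.asarray(liq_notional, dtype=float)
    target = levels[np.argmax(liq_notional)]
    return float((target - price) / price)

def whale_retail_divergence(large_net_sum: float, small_count: float,
                            small_count_baseline: float, surge: float = 2.0) -> bool:
    """True when small-order count surges while large-order net flow is negative."""
    return small_count >= surge * small_count_baseline and large_net_sum < 0
```

For example, with price 100 and clusters at 95, 105, and 110 holding 1, 5, and 2 units of notional, the densest level is 105 and the factor is about 0.05.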

---

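### Layer 3: Time-Series Features (G - Geometric & Statistical)

**Core idea: turn non-stationary "amounts" into stationary "signals".**

A raw amount (for example, "100 million USD liquidated") cannot be used directly, because the significance of 100 million USD changes as the coin price rises. It has to be transformed mathematically, which is exactly what `_cross_dimensional_mining` in the Python code does:

1. **Relativization**:
   * Do not use absolute values; use a **quantile** or a **Z-Score**.
   * *Mindset*: how rare is what is happening now, compared with the past 24 hours or 7 days?
2. **Morphology**:
   * Use statistical moments: **skewness** and **kurtosis**.
   * *Mindset*: is the market an evenly matched game (Skew $\approx$ 0) or a one-sided rout (Skew $\gg$ 0)?
3. **Decay and memory**:
   * The impact of a liquidation is short-lived. Factor calculations must include time decay or use a rolling window.

> **Methodology takeaway**: a factor should take the mathematical form $\text{Signal} = \frac{\text{Current} - \text{Baseline}}{\text{Volatility}}$ (that is, the form of a signal-to-noise ratio).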
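
The three transformations can be sketched with pandas rolling windows. This is an illustrative version under stated assumptions, not the `_cross_dimensional_mining` code: the function name `rolling_signal` and the defaults (96 bars, i.e. 24 hours of 15-minute bars, and a half-life of 8 bars) are hypothetical.

```python
import numpy as np
import pandas as pd

def rolling_signal(x: pd.Series, window: int = 96, min_periods: int = 20,
                   halflife: float = 8.0) -> pd.DataFrame:
    """Relativize, describe the shape of, and decay a raw per-bar amount."""
    # Baseline and volatility come from past bars only (shift by one bar),
    # so the current value is never part of its own reference window.
    hist = x.shift(1).rolling(window, min_periods=min_periods)
    baseline = hist.mean()
    vol = hist.std().replace(0.0, np.nan)  # flat history -> undefined z-score
    return pd.DataFrame({
        "zscore": (x - baseline) / vol,              # (Current - Baseline) / Volatility
        "skew": hist.skew(),                         # shape of the recent distribution
        "decayed": x.ewm(halflife=halflife).mean(),  # exponentially fading memory
    })
```

Shifting before rolling matters: if the current bar were included in its own baseline, a large spike would inflate both the mean and the standard deviation and understate how unusual it is.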

---

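### Layer 4: Reflexivity Verification (N - Narrative Verification)

**Core idea: a factor only works while market structure is unchanged, so beware of "crowding".**

This is the insight behind the `timeoutBars` and `Supertrend` confirmation logic in the TradingView script:

1. **Event + confirmation (Setup + Trigger)**:
   * A liquidation on its own is only a Setup (a state of readiness); it may mark a reversal or a continuation.
   * It must be paired with a Trigger, such as the "Supertrend flip" in the TV script.
   * *Formula*: **high-odds trade = extreme microstructure anomaly + macro trend confirmation**.
2. **Anti-overfitting**:
   * Use dynamic thresholds (rolling quantile) rather than fixed values (such as > 100 BTC).
   * Make sure the logic holds across different coins and different periods (bull and bear markets).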
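
The Setup + Trigger rule with a timeout can be expressed as a small state machine. The sketch below is an assumption about how such logic might look, since the TradingView script itself is not shown; `confirmed_entries` and its arguments are hypothetical names, with `timeout_bars` playing the role of `timeoutBars`.

```python
from typing import List, Sequence

def confirmed_entries(setup: Sequence[bool], trigger: Sequence[bool],
                      timeout_bars: int) -> List[int]:
    """Return bar indices where a trigger confirms a still-active setup.

    A setup (e.g. an extreme liquidation bar) arms the signal. A trigger
    (e.g. a trend-indicator flip) within `timeout_bars` bars confirms it;
    otherwise the setup expires.
    """
    entries: List[int] = []
    armed_at = None
    for i, (s, t) in enumerate(zip(setup, trigger)):
        if s:
            armed_at = i  # the newest setup restarts the timer
        if armed_at is not None:
            if i - armed_at > timeout_bars:
                armed_at = None  # expired without confirmation
            elif t:
                entries.append(i)
                armed_at = None  # at most one entry per setup
    return entries
```

With `timeout_bars=2`, a setup at bar 1 followed by a trigger at bar 3 produces an entry, while a setup at bar 6 followed by a trigger at bar 9 does not, because the setup has already expired.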

---

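### Summary: A "Hunter" Mindset

If this methodology is condensed into a checklist, ask yourself the following questions, in order, whenever you mine a new factor:

1. **Label**: can I classify **every unit of money** in the market? (Is it smart money, dumb money, or forced money?)
2. **Location**: at which price band is this money in the most pain? (Combine with the Coinglass map.)
3. **Anomaly**: is the current behavior a **statistical anomaly**? (Z-Score > 3?)
4. **Opponent**: who profits from this anomaly? (Longs liquidated $\to$ shorts profit $\to$ shorts buy to close $\to$ potential rebound.)

**In essence, this framework uses publicly available market transparency (on-chain data, tick-by-tick trades, order books) to capture the "irrational behavior" of human nature under extreme pressure.**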

In [6]:
import os
import re
import sys
import time
import zipfile
from datetime import datetime, timedelta
from enum import Enum
from io import BytesIO
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import talib as ta
from scipy.optimize import minimize
from scipy.stats import boxcox, kurtosis, mstats, skew, tukeylambda, yeojohnson, zscore
from sklearn.preprocessing import RobustScaler

class DataFrequency(Enum):
    """Data frequency enum."""
    MONTHLY = 'monthly'  # monthly data
    DAILY = 'daily'      # daily data


def _generate_date_range(start_date: str, end_date: str, read_frequency: DataFrequency = DataFrequency.MONTHLY) -> List[str]:
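    """
    Generate a list of date strings covering a date range.

    Parameters:
    start_date: start date
        - monthly format: 'YYYY-MM' (e.g. '2020-01') or 'YYYY-MM-DD' (truncated to 'YYYY-MM')
        - daily format: 'YYYY-MM-DD' (e.g. '2020-01-01')
    end_date: end date, same format as above
    read_frequency: data frequency (monthly or daily)

    Returns:
    List of date strings, inclusive of both endpoints
    """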
    if read_frequency == DataFrequency.MONTHLY:
        # Accept both 'YYYY-MM' and 'YYYY-MM-DD';
        # a 'YYYY-MM-DD' value is truncated to 'YYYY-MM'.
        new_start_date = start_date
        new_end_date = end_date
        if len(start_date) == 10:  # 'YYYY-MM-DD' format
            new_start_date = start_date[:7]
        if len(end_date) == 10:
            new_end_date = end_date[:7]
            
        start_dt = datetime.strptime(new_start_date, '%Y-%m')
        end_dt = datetime.strptime(new_end_date, '%Y-%m')
        
        date_list = []
        current_dt = start_dt
        while current_dt <= end_dt:
            date_list.append(current_dt.strftime('%Y-%m'))
            # advance to the next month
            if current_dt.month == 12:
                current_dt = current_dt.replace(year=current_dt.year + 1, month=1)
            else:
                current_dt = current_dt.replace(month=current_dt.month + 1)
        
        return date_list
    
    elif read_frequency == DataFrequency.DAILY:
        start_dt = datetime.strptime(start_date, '%Y-%m-%d')
        end_dt = datetime.strptime(end_date, '%Y-%m-%d')
        
        date_list = []
        current_dt = start_dt
        while current_dt <= end_dt:
            date_list.append(current_dt.strftime('%Y-%m-%d'))
            current_dt += timedelta(days=1)
        
        return date_list
    
    else:
        raise ValueError(f"Unsupported data frequency: {read_frequency}")

In [6]:
start_date = '2025-01-01'
end_date = '2025-11-01'
read_frequency = DataFrequency.MONTHLY
date_range_list = _generate_date_range(start_date=start_date, end_date=end_date, read_frequency=read_frequency)
date_range_list

['2025-01',
 '2025-02',
 '2025-03',
 '2025-04',
 '2025-05',
 '2025-06',
 '2025-07',
 '2025-08',
 '2025-09',
 '2025-10',
 '2025-11']
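
For comparison, the same inclusive label lists can be produced with pandas built-ins. This is an alternative sketch, not a replacement for `_generate_date_range`; the helper name `date_labels` is hypothetical.

```python
import pandas as pd

def date_labels(start: str, end: str, monthly: bool = True) -> list:
    """Inclusive list of 'YYYY-MM' (monthly) or 'YYYY-MM-DD' (daily) labels."""
    if monthly:
        # Monthly periods absorb any day component in the inputs.
        periods = pd.period_range(start=start, end=end, freq="M")
        return periods.strftime("%Y-%m").tolist()
    days = pd.date_range(start=start, end=end, freq="D")
    return days.strftime("%Y-%m-%d").tolist()
```

Note that in monthly mode an end date of `'2025-11-01'` selects the whole November file, so the loaded data can extend to the end of that month.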

Processing long/short data
/Users/aming/data/ETHUSDT

takerlongshortRatio
topLongShortPositionRatio
topLongShortAccountRatio

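Share of total long and short positions held by top traders, where top traders are users in the top 20% by margin balance.
Long position share = top traders' long positions / top traders' total positions
Short position share = top traders' short positions / top traders' total positions
Long/short position ratio = long position share / short position share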
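
As a quick worked check of the last formula, the sketch below recomputes the ratio from the two shares. The helper `long_short_ratio` is a hypothetical name; the sample values 0.7353 and 0.2647 are the ETHUSDT shares that appear in the first output row of this dataset, where the reported ratio is 2.7775.

```python
def long_short_ratio(long_share: float, short_share: float) -> float:
    """Ratio of the long share to the short share."""
    if short_share <= 0:
        raise ValueError("short_share must be positive")
    return long_share / short_share

ratio = long_short_ratio(0.7353, 0.2647)
# The recomputed value differs slightly from the reported 2.7775 because
# the published shares are rounded to four decimals.
gap = abs(ratio - 2.7775)
```

The small gap suggests the exchange computes the ratio from unrounded shares, so the reported `longShortRatio` column is preferable to a value rederived from the rounded shares.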

topLongShortPositionRatio

https://developers.binance.com/docs/zh-CN/derivatives/usds-margined-futures/market-data/rest-api/Top-Trader-Long-Short-Ratio

{
    "symbol": "BTCUSDT",
    "longShortRatio": "1.4342",  // top traders' long/short position ratio
    "longAccount": "0.5344",     // top traders' long position share
    "shortAccount": "0.4238",    // top traders' short position share
    "timestamp": "1583139600000"
}

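| Name | Type | Required | Description |
| --- | --- | --- | --- |
| symbol | STRING | YES | |
| period | ENUM | YES | "5m","15m","30m","1h","2h","4h","6h","12h","1d" |
| limit | LONG | NO | default 30, max 500 |
| startTime | LONG | NO | |
| endTime | LONG | NO | |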


In [45]:
start_date = '2025-10-01'
end_date = '2025-11-01'
date_range_list = _generate_date_range(start_date=start_date, end_date=end_date, read_frequency=DataFrequency.MONTHLY)
dir = '/Users/aming/data/ETHUSDT'
path = 'topLongShortPositionRatio'
df_list = []

for date_str in date_range_list:
    df = pd.read_csv(f'{dir}/{path}/{path}_{date_str}.csv')
    df_list.append(df)

df = pd.concat(df_list)
df['open_time'] = pd.to_datetime(df['open_time'], unit='ns')
# df.sort_values(by='open_time', ascending=True, inplace=True)
df.set_index('open_time', inplace=True)
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)
df
# df['buySellRatio'].plot()


open_time,symbol,longAccount,longShortRatio,shortAccount
2025-10-01 00:00:00,ETHUSDT,0.7353,2.7775,0.2647
2025-10-01 00:00:00,ETHUSDT,0.7353,2.7775,0.2647
2025-10-01 00:00:00,ETHUSDT,0.7353,2.7775,0.2647
2025-10-01 00:00:00,ETHUSDT,0.7353,2.7775,0.2647
2025-10-01 00:00:00,ETHUSDT,0.7353,2.7775,0.2647
...,...,...,...,...
2025-11-30 23:50:00,ETHUSDT,0.7451,2.9237,0.2549
2025-11-30 23:50:00,ETHUSDT,0.7451,2.9237,0.2549
2025-11-30 23:50:00,ETHUSDT,0.7451,2.9237,0.2549
2025-11-30 23:55:00,ETHUSDT,0.7450,2.9213,0.2550


topLongShortAccountRatio

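Share of top-trader accounts that are net long and net short, where top traders are users in the top 20% by margin balance. Each account is counted once.
Long account share = number of top traders holding long positions / number of top traders holding any position
Short account share = number of top traders holding short positions / number of top traders holding any position
Long/short account ratio = long account share / short account share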


https://developers.binance.com/docs/zh-CN/derivatives/usds-margined-futures/market-data/rest-api/Top-Long-Short-Account-Ratio

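| Name | Type | Required | Description |
| --- | --- | --- | --- |
| symbol | STRING | YES | |
| period | ENUM | YES | "5m","15m","30m","1h","2h","4h","6h","12h","1d" |
| limit | LONG | NO | default 30, max 500 |
| startTime | LONG | NO | |
| endTime | LONG | NO | |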

{
    "symbol": "BTCUSDT",
    "longShortRatio": "1.8105",  // top traders' long/short account ratio
    "longAccount": "0.6442",     // top traders' long account share
    "shortAccount": "0.3558",    // top traders' short account share
    "timestamp": "1583139600000"
}


In [46]:
start_date = '2025-10-01'
end_date = '2025-11-01'
date_range_list = _generate_date_range(start_date=start_date, end_date=end_date, read_frequency=DataFrequency.MONTHLY)
dir = '/Users/aming/data/ETHUSDT'
path = 'topLongShortAccountRatio'
df_list = []

for date_str in date_range_list:
    df = pd.read_csv(f'{dir}/{path}/{path}_{date_str}.csv')
    df_list.append(df)

df = pd.concat(df_list)
df['open_time'] = pd.to_datetime(df['open_time'], unit='ns')
# df.sort_values(by='open_time', ascending=True, inplace=True)
df.set_index('open_time', inplace=True)
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)
df
# df['buySellRatio'].plot()


open_time,symbol,longAccount,longShortRatio,shortAccount
2025-10-01 00:00:00,ETHUSDT,0.7126,2.4795,0.2874
2025-10-01 00:00:00,ETHUSDT,0.7126,2.4795,0.2874
2025-10-01 00:00:00,ETHUSDT,0.7126,2.4795,0.2874
2025-10-01 00:05:00,ETHUSDT,0.7122,2.4746,0.2878
2025-10-01 00:05:00,ETHUSDT,0.7122,2.4746,0.2878
...,...,...,...,...
2025-11-30 23:50:00,ETHUSDT,0.7028,2.3647,0.2972
2025-11-30 23:50:00,ETHUSDT,0.7028,2.3647,0.2972
2025-11-30 23:55:00,ETHUSDT,0.7037,2.3750,0.2963
2025-11-30 23:55:00,ETHUSDT,0.7037,2.3750,0.2963


{
    "buySellRatio": "1.5586",
    "buyVol": "387.3300",   // taker buy volume
    "sellVol": "248.5030",  // taker sell volume
    "timestamp": "1585614900000"
}

https://developers.binance.com/docs/zh-CN/derivatives/usds-margined-futures/market-data/rest-api/Taker-BuySell-Volume

takerlongshortRatio

In [15]:
start_date = '2025-10-01'
end_date = '2025-11-01'
date_range_list = _generate_date_range(start_date=start_date, end_date=end_date, read_frequency=DataFrequency.MONTHLY)
dir = '/Users/aming/data/ETHUSDT'
takerlongshortRatioPath = 'takerlongshortRatio'
df_list = []

for date_str in date_range_list:
    df = pd.read_csv(f'{dir}/{takerlongshortRatioPath}/{takerlongshortRatioPath}_{date_str}.csv')
    df_list.append(df)

df = pd.concat(df_list)
# df.sort_values(by='open_time', ascending=True, inplace=True)
df.set_index('open_time', inplace=True)
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)
df.head()
# df['buySellRatio'].plot()


open_time,buySellRatio,sellVol,buyVol
2025-10-01 00:00:00,1.5031,4230.999,6359.645
2025-10-01 00:00:00,1.5031,4230.999,6359.645
2025-10-01 00:00:00,1.5031,4230.999,6359.645
2025-10-01 00:05:00,0.8976,3388.384,3041.311
2025-10-01 00:05:00,0.8976,3388.384,3041.311
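
The `buySellRatio` column is unbounded above and asymmetric around 1. A bounded alternative is the normalized taker imbalance, sketched below on the two distinct rows shown above. The function name `add_taker_imbalance` and the column names `ratio_check` and `taker_imbalance` are hypothetical additions, not part of the dataset.

```python
import pandas as pd

def add_taker_imbalance(df: pd.DataFrame) -> pd.DataFrame:
    """Add a recomputed buy/sell ratio and a bounded imbalance in [-1, 1]."""
    out = df.copy()
    total = out["buyVol"] + out["sellVol"]
    out["ratio_check"] = out["buyVol"] / out["sellVol"]
    # Bars with no taker volume get NaN instead of a division by zero.
    out["taker_imbalance"] = (out["buyVol"] - out["sellVol"]) / total.where(total > 0)
    return out

sample = pd.DataFrame({
    "buySellRatio": [1.5031, 0.8976],
    "sellVol": [4230.999, 3388.384],
    "buyVol": [6359.645, 3041.311],
})
result = add_taker_imbalance(sample)
```

On these rows the recomputed ratio agrees with `buySellRatio` to four decimals, which is consistent with the ratio being `buyVol / sellVol`; the imbalance is positive for the first bar and negative for the second.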


globalLongShortAccountRatio

https://developers.binance.com/docs/zh-CN/derivatives/usds-margined-futures/market-data/rest-api/Long-Short-Ratio

{
    "symbol": "BTCUSDT",
    "longShortRatio": "0.1960",  // long/short account ratio
    "longAccount": "0.6622",     // long account share
    "shortAccount": "0.3378",    // short account share
    "timestamp": "1583139600000"
}


In [49]:
start_date = '2025-10-01'
end_date = '2025-11-01'
date_range_list = _generate_date_range(start_date=start_date, end_date=end_date, read_frequency=DataFrequency.MONTHLY)
dir = '/Users/aming/data/ETHUSDT'
path = 'globalLongShortAccountRatio'
df_list = []

for date_str in date_range_list:
    df = pd.read_csv(f'{dir}/{path}/{path}_{date_str}.csv')
    df_list.append(df)

df = pd.concat(df_list)
# df.sort_values(by='open_time', ascending=True, inplace=True)
df.set_index('open_time', inplace=True)
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)
df
# df.head()
# df['buySellRatio'].plot()


open_time,symbol,longAccount,longShortRatio,shortAccount
2025-10-01 00:00:00,ETHUSDT,0.6786,2.1114,0.3214
2025-10-01 00:00:00,ETHUSDT,0.6786,2.1114,0.3214
2025-10-01 00:00:00,ETHUSDT,0.6786,2.1114,0.3214
2025-10-01 00:00:00,ETHUSDT,0.6786,2.1114,0.3214
2025-10-01 00:00:00,ETHUSDT,0.6786,2.1114,0.3214
...,...,...,...,...
2025-11-30 23:50:00,ETHUSDT,0.6650,1.9851,0.3350
2025-11-30 23:55:00,ETHUSDT,0.6655,1.9895,0.3345
2025-11-30 23:55:00,ETHUSDT,0.6655,1.9895,0.3345
2025-11-30 23:55:00,ETHUSDT,0.6655,1.9895,0.3345


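Leverage data

liquidations

side: direction of the liquidation order.

sell: a long position was liquidated (the system sells to close it).

buy: a short position was liquidated (the system buys to close it).

price: fill price of the liquidation.

amount: liquidated quantity (in coins or contracts).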
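
These field semantics can be turned into per-bar long and short liquidation notional as sketched below. It assumes a time-indexed DataFrame with `side`, `price`, and `amount` columns and that `amount` is in coins, so that `price * amount` is a quote-currency notional. The function name `liquidation_bars` and the 15-minute default are hypothetical.

```python
import pandas as pd

def liquidation_bars(liq: pd.DataFrame, freq: str = "15min") -> pd.DataFrame:
    """Sum liquidation notional per time bar, split by the liquidated side."""
    liq = liq.copy()
    liq["notional"] = liq["price"] * liq["amount"]
    # 'sell' orders close longs, 'buy' orders close shorts.
    liq["liq_side"] = liq["side"].map({"sell": "long", "buy": "short"})
    bars = (
        liq.groupby([pd.Grouper(freq=freq), "liq_side"])["notional"]
        .sum()
        .unstack(fill_value=0.0)
    )
    # Guarantee both columns exist even if one side never appears.
    return bars.reindex(columns=["long", "short"], fill_value=0.0)
```

Only bars that contain at least one liquidation appear in the result; reindexing onto a full time grid would be a separate step if evenly spaced bars are needed.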


In [7]:
start_date = '2025-10-01'
end_date = '2025-11-01'
read_frequency = DataFrequency.DAILY
date_range_list = _generate_date_range(start_date=start_date, end_date=end_date, read_frequency=read_frequency)
dir = '/Users/aming/data/ETHUSDT'
channel_path = 'liquidations'
symbol = 'ETHUSDT'
liq_list = []

for date_str in date_range_list:
    df = pd.read_csv(f'{dir}/{channel_path}/binance-futures_{channel_path}_{date_str}_{symbol}.csv.gz')
    liq_list.append(df)

liq_df = pd.concat(liq_list)
# df.sort_values(by='open_time', ascending=True, inplace=True)
liq_df.rename(columns={'timestamp': 'open_time'}, inplace=True)
liq_df['open_time'] = pd.to_datetime(liq_df['open_time'], unit='us')
# df['funding_timestamp'] = pd.to_datetime(df['funding_timestamp'], unit='us')
liq_df.set_index('open_time', inplace=True)
# df.index = pd.to_datetime(df.index)
liq_df.sort_index(inplace=True)
liq_df.drop(columns=['id', 'exchange', 'local_timestamp', 'symbol'], inplace=True)
liq_df.head()

open_time,side,price,amount
2025-10-01 00:00:17.767,buy,4161.21,3.244
2025-10-01 00:01:12.560,buy,4162.28,0.738
2025-10-01 00:01:39.582,buy,4164.18,0.06
2025-10-01 00:03:18.173,buy,4165.67,1.722
2025-10-01 00:05:02.358,buy,4166.02,3.837


In [8]:
from tools import LiquidationFactorEngine as liq
liq_factor_engine = liq.LiquidationFactorEngine(resample_freq = '15m')

bucket_quantiles = [0.75, 0.90]
bucket_window_hours = [24, 48]
mining_windows = [24]
mining_quantiles = [0.90]

liq_factor_df = liq_factor_engine.process(
    liq_df,
    bucket_quantiles=bucket_quantiles,
    bucket_window_hours=bucket_window_hours,
    mining_windows=mining_windows,
    mining_quantiles=mining_quantiles,
)

liq_factor_df

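[*] Starting high-performance engine (Polars Core) | frequency: 15m
[-] Generating game-theoretic and ratio features...
[-] Running multi-dimensional mining (Windows=[24])...
[+] Processing complete. Number of output factors: 88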


open_time,sum_long_large_lb24,count_long_large_lb24,sum_long_small_lb24,count_long_small_lb24,sum_long_med_lb24,count_long_med_lb24,sum_short_large_lb24,count_short_large_lb24,sum_short_small_lb24,count_short_small_lb24,...,feat_brk_sum_long_small_lb48_w24_q90,feat_brk_count_long_small_lb48_w24_q90,feat_brk_sum_long_med_lb48_w24_q90,feat_brk_count_long_med_lb48_w24_q90,feat_brk_sum_short_large_lb48_w24_q90,feat_brk_count_short_large_lb48_w24_q90,feat_brk_sum_short_small_lb48_w24_q90,feat_brk_count_short_small_lb48_w24_q90,feat_brk_sum_short_med_lb48_w24_q90,feat_brk_count_short_med_lb48_w24_q90
2025-10-01 00:00:00,0.0,0,0.00000,0,103326.00000,1,0.0,0,0.00000,0,...,,,,,,,,,,
2025-10-01 00:15:00,0.0,0,0.00000,0,264325.59999,10,0.0,0,0.00000,0,...,0.000000,0.000000,2.558171,10.000000,0.0,0.0,0.000000e+00,0.000000e+00,0.000000,0.000000
2025-10-01 00:30:00,0.0,0,0.00000,0,26082.23683,12,0.0,0,0.00000,0,...,0.000000,0.000000,0.105075,1.318681,0.0,0.0,0.000000e+00,0.000000e+00,0.007906,0.185185
2025-10-01 00:45:00,0.0,0,0.00000,0,197.12784,1,0.0,0,0.00000,0,...,0.000000,0.000000,0.000849,0.086207,0.0,0.0,0.000000e+00,0.000000e+00,1.100628,2.200000
2025-10-01 01:00:00,0.0,0,0.00000,0,0.00000,0,0.0,0,10840.66922,6,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,1.084067e+13,6.000000e+09,0.424750,0.210526
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2025-11-01 22:30:00,0.0,0,1727.45853,2,0.00000,0,0.0,0,0.00000,0,...,0.178426,0.185185,0.000000,0.000000,0.0,0.0,0.000000e+00,0.000000e+00,0.000000,0.000000
2025-11-01 22:45:00,0.0,0,3.85887,1,0.00000,0,0.0,0,0.00000,0,...,0.000399,0.092593,0.000000,0.000000,0.0,0.0,0.000000e+00,0.000000e+00,0.000000,0.000000
2025-11-01 23:00:00,0.0,0,0.00000,0,0.00000,0,0.0,0,388.57300,1,...,0.000000,0.000000,0.000000,0.000000,0.0,0.0,1.139159e-01,2.127660e-01,0.000000,0.000000
2025-11-01 23:30:00,0.0,0,165.88239,1,0.00000,0,0.0,0,0.00000,0,...,0.017134,0.092593,0.000000,0.000000,0.0,0.0,0.000000e+00,0.000000e+00,0.000000,0.000000


In [9]:
liq_factor_df.describe()

,sum_long_large_lb24,count_long_large_lb24,sum_long_small_lb24,count_long_small_lb24,sum_long_med_lb24,count_long_med_lb24,sum_short_large_lb24,count_short_large_lb24,sum_short_small_lb24,count_short_small_lb24,...,feat_brk_sum_long_small_lb48_w24_q90,feat_brk_count_long_small_lb48_w24_q90,feat_brk_sum_long_med_lb48_w24_q90,feat_brk_count_long_med_lb48_w24_q90,feat_brk_sum_short_large_lb48_w24_q90,feat_brk_count_short_large_lb48_w24_q90,feat_brk_sum_short_small_lb48_w24_q90,feat_brk_count_short_small_lb48_w24_q90,feat_brk_sum_short_med_lb48_w24_q90,feat_brk_count_short_med_lb48_w24_q90
count,2987.0,2987.0,2987.0,2987.0,2987.0,2987.0,2987.0,2987.0,2987.0,2987.0,...,2986.0,2986.0,2986.0,2986.0,2986.0,2986.0,2986.0,2986.0,2986.0,2986.0
mean,136454.4,1.164714,7699.733608,8.48912,12576.249155,1.624372,121624.1,0.982591,6108.325126,6.484098,...,15217100.0,334896.7,14303300000.0,2344274.0,1110196000000.0,14065640.0,3630499000.0,2009378.0,24207000000.0,2344274.0
std,1628616.0,2.631216,13753.735476,13.599013,25342.815614,3.242207,1384918.0,2.209331,10293.202861,9.641984,...,831527700.0,18300170.0,496407600000.0,83843180.0,32126590000000.0,232537000.0,198386100000.0,109801000.0,821302700000.0,65951580.0
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,224.74456,1.0,0.0,0.0,0.0,0.0,192.52869,1.0,...,0.01754626,0.07142857,0.0,0.0,0.0,0.0,0.01715932,0.07042254,0.0,0.0
50%,0.0,0.0,2553.79584,4.0,0.0,0.0,0.0,0.0,2343.86097,3.0,...,0.1705781,0.225549,0.0,0.0,0.0,0.0,0.1889563,0.2275316,0.0,0.0
75%,62411.38,1.0,9098.555275,10.0,15322.16869,2.0,50875.93,1.0,7544.489325,8.0,...,0.5686821,0.625,0.5083826,0.5172414,0.3319217,0.5,0.5944059,0.5997696,0.5090206,0.5148031
max,79462750.0,56.0,211547.8098,231.0,499114.17867,67.0,63460500.0,48.0,120396.61924,125.0,...,45438250000.0,1000000000.0,20057850000000.0,4000000000.0,1687410000000000.0,9000000000.0,10840670000000.0,6000000000.0,41776120000000.0,3000000000.0
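
The `feat_brk_*` columns have medians below 1 but maxima in the billions and trillions. A value such as 10840670000000.0 equals a raw bar sum of 10840.66922 divided by roughly 1e-9, which would be consistent with a ratio whose baseline was zero plus a tiny epsilon. The engine's internals are not shown, so this is an inference. Under that assumption, one way to keep such ratios usable is to mark them as missing when the baseline is too small, as in the hypothetical `safe_ratio` helper below.

```python
import numpy as np
import pandas as pd

def safe_ratio(current: pd.Series, baseline: pd.Series,
               min_baseline: float = 1.0) -> pd.Series:
    """current / baseline, with NaN where the baseline is too small to trust."""
    valid = baseline >= min_baseline
    # where() turns untrusted baselines into NaN, so the division yields NaN
    # instead of an astronomically large number.
    return (current / baseline.where(valid)).astype(float)
```

The `min_baseline` floor is a hypothetical parameter and should be chosen per column: a notional sum and an event count need very different floors.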


derivative_ticker

In [51]:
start_date = '2025-10-01'
end_date = '2025-11-01'
read_frequency = DataFrequency.DAILY
date_range_list = _generate_date_range(start_date=start_date, end_date=end_date, read_frequency=read_frequency)
dir = '/Users/aming/data/ETHUSDT'
channel_path = 'derivative_ticker'
symbol = 'ETHUSDT'
df_list = []

for date_str in date_range_list:
    df = pd.read_csv(f'{dir}/{channel_path}/binance-futures_{channel_path}_{date_str}_{symbol}.csv.gz')
    df_list.append(df)

df = pd.concat(df_list)
# df.sort_values(by='open_time', ascending=True, inplace=True)
df.rename(columns={'timestamp': 'open_time'}, inplace=True)
df['open_time'] = pd.to_datetime(df['open_time'], unit='us')
df['funding_timestamp'] = pd.to_datetime(df['funding_timestamp'], unit='us')
df.set_index('open_time', inplace=True)
# df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)
df.drop(columns=['exchange', 'local_timestamp', 'predicted_funding_rate', 'open_interest', 'symbol'], inplace=True)
df.head()
# df['buySellRatio'].plot()


open_time,funding_timestamp,funding_rate,last_price,index_price,mark_price
2025-10-01 00:00:00.000,2025-10-01 00:00:00,-1.6e-05,,4145.183256,4142.98
2025-10-01 00:00:01.001,2025-10-01 08:00:00,-1.6e-05,,4145.183256,4142.74
2025-10-01 00:00:01.632,2025-10-01 08:00:00,-1.6e-05,4142.99,4145.183256,4142.74
2025-10-01 00:00:02.001,2025-10-01 08:00:00,-1.6e-05,4142.99,4145.226512,4142.96
2025-10-01 00:00:03.000,2025-10-01 08:00:00,-1.6e-05,4142.99,4145.49,4143.066357


Processing openInterest
{
    "openInterest": "10659.509",  // open interest (number of open contracts)
    "symbol": "BTCUSDT",          // trading pair
    "time": 1589437530011         // matching engine time
}
https://developers.binance.com/docs/zh-CN/derivatives/usds-margined-futures/market-data/rest-api/Open-Interest

In [34]:
start_date = '2025-10-01'
end_date = '2025-11-01'
read_frequency = DataFrequency.MONTHLY
date_range_list = _generate_date_range(start_date=start_date, end_date=end_date, read_frequency=read_frequency)
dir = '/Users/aming/data/ETHUSDT'
channel_path = 'openInterest'
symbol = 'ETHUSDT'
df_list = []

for date_str in date_range_list:
    df = pd.read_csv(f'{dir}/{channel_path}/{channel_path}_{date_str}.csv')
    df_list.append(df)

df = pd.concat(df_list)
df.set_index('open_time', inplace=True)
# df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)
# df.drop(columns=['exchange', 'local_timestamp', 'symbol'], inplace=True)
df.head()
# df['buySellRatio'].plot()


open_time,symbol,openInterest
2025-10-01 00:00:00.114,ETHUSDT,1827785.572
2025-10-01 00:00:05.408,ETHUSDT,1827799.596
2025-10-01 00:00:12.566,ETHUSDT,1827989.568
2025-10-01 00:00:20.574,ETHUSDT,1827917.962
2025-10-01 00:00:26.319,ETHUSDT,1827886.995
