财报因子v4
- 结合了因子增量计算模块(财报因子v3.3)和全量计算模块
  - `increment`参数为True时返回增量数据，否则返回全量数据
  - `start_date`参数为财报日期`m_timetag`的开始时间

In [1]:
import numpy as np
import pandas as pd
from cylib.apis.all_api import *
from cylib.qmtdata.cyxtdata import xtdata
import warnings
warnings.filterwarnings('ignore')
from datetime import datetime
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

# 以HS300为例

### 注意事项

> - 财报中有一些值是累计值，比如说公司的营业收入，每季度财报统计的是从财报年初期到本期累计的值，而不是单季度的营业收入。
> 
> - 在计算某些初期的值时，不一定是选择财报年的第一个季度(3月)，而是考虑公司实际公布的财报的季度，防止因为没有3月份的数据而错误地计算。
>
> - 对TTM求同比增长率时，不能使用fillna对null进行0值填充，因为TTM的null是值无效值
>
> - 股票代码和日期统一由小到大排序
>
> - pandas的1.4版本之后，Dataframe不再有append函数，因此`All_data.append(temp)`会报错，要使用：`pd.concat([All_data, temp], ignore_index=True)`

> 获取单日期的数据时，要注意一下事项：
>
> - 如果当天有数据，则返回一个三个列表的Dataframe(trade_date, ts_code, factor_name)，如果没有数据，则返回None
>
> - 如果是需要用之前数据进行计算的(比如：同比增长、环比增长等)，则需要将此范围日期内的数据加入计算。
> - **获取的是在当日发布的财报数据❗❗**

In [2]:
trade_date = '20240814' # 截止日期
hs300_list = list(xtdata.get_index_weight("000300.SH").keys())
# xtdata.download_financial_data(hs300_list, table_list=["Balance", "Income", "CashFlow", "PershareIndex", "Top10holder", "Holdernum", "Capital"]) # 下载数据
# financial_data = xtdata.get_financial_data(hs300_list, ["Balance", "Income", "CashFlow", "PershareIndex", "Top10holder", "Holdernum", "Capital"])

# 共有自定义参数
common_param = {
    "Stock_list": hs300_list,
    "LPNP_N": 8, # 线性纯化利润率的参数N，默认为8
    "CEGR_n": 2, # 资本支出增长率的参数n，默认为2
    "AROE_N": 8, # 规模调整ROE的参数N，默认为8
    "SOP_T": 6, # 标准化营业利润的参数T，默认为6
    "OCFA_N": 8, # 产业利用率提升的参数N，默认为8
}

# def DIY_FACTOR1_SCRIPT(trade_date, common_param):
#     NEW_DATA_DF = calc_factor()
#     # 你的因子增量计算代码
#     return NEW_DATA_DF # 返回一个df给数据库，有三列，分别是：trade_date、ts_code、factor_name

In [3]:
def MV_DF(Stock_list, Start_date, End_date):
    """
    获取某一段时间的总市值数据。
    由于有一些函数需要用到总市值数据，因此此函数作为全局函数使用。
    区别于获取当天总市值数据函数MV_Single_DF.
    
    由于后面市值的因子可能需要获取季度最后一天的数据，
    而在季度最后一天有可能不是交易日而导致没有数据，
    此时采用最近一天交易日的数据作为前值填充替代。
        
    注：
    1. 某个股票因为停牌而没有数据，采用最近一天交易日的数据作为替代
    2. 处理完成之后，值依旧为Null，说明这个股票在这个日期还没有上市，没有数据
    3. 目前晨乐数据库中的数据最早的日期为2014.01.02
    4. 原始市值数据的单位是万元，为了转换为元，要乘以10000
    5. 由于上市公司会不定期变动股本数量(不一定是每季度的最后一天变动)，
       而变动股本数量的日期可能不是交易日，因此我们还需要加入上市公司变动
       股本数量的日期Changed_dates。如果为了提高效率，不加也可以。
    """
    def Quarter_End_Dates(start_date, end_date):
        # 获取每个季度最后的一天
        start = pd.to_datetime(start_date)
        end = pd.to_datetime(end_date)
        quarter_end_dates = (
            pd.date_range(start=start, end=end, freq="Q")
            .to_period("Q")
            .to_timestamp("D")
            + pd.offsets.QuarterEnd()
        )
        return quarter_end_dates
    def Get_Capital_Changeable_Dates(Stock_list, Begin_date, End_date):
        """
        获取上市公司股本变化的日期
        """
        Financial_DF = xtdata.get_financial_data(Stock_list, ["Capital"])
        All_data = pd.DataFrame()
        for asset in Financial_DF.keys():
            temp = Financial_DF[asset]["Capital"][
                [
                    "m_timetag", 
                    "m_anntime"]
            ]
            temp = temp[(temp.m_timetag >= Begin_date) & (temp.m_timetag <= End_date)]
            All_data = pd.concat([All_data, temp], ignore_index=True)

        All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
        All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
        All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
        All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
        All_data.reset_index(drop=True, inplace=True)
        All_data.set_index("m_timetag", inplace=True)
        return All_data.index.unique()
    Changable_dates = Get_Capital_Changeable_Dates(Stock_list, Start_date, End_date)

    # Today_Date = datetime.today().strftime("%Y%m%d")  # 今天的日期
    MV = get_price(
        ts_code_list=Stock_list,
        feature_list=["total_mv"], # 总市值
        start_date=Start_date,
        trade_date=End_date,
        target_type="stock",
    )
    MV = MV.reset_index()
    MV_pivot = MV.pivot(index="trade_date", columns="ts_code", values="total_mv")
    Quarter_Dates = Quarter_End_Dates(Start_date, End_date)
    
    # 将交易日日期与季度最后一天日期合并，并排序
    Combined_date = pd.Index(sorted(set(MV_pivot.index) | set(Quarter_Dates) | set(Changable_dates)))
    
    # 创建一个新的Dataframe，与原来的市值Dataframe保持相同的日期和股票名称
    all_combinations = pd.MultiIndex.from_product(
        [Combined_date, Stock_list], 
        names=["trade_date", "ts_code"]
    )
    all_combinations_df = pd.DataFrame(index=all_combinations).reset_index()
    
    # 将新的Dataframe与市值Dataframe合并，采用左连接的方法
    price_whole = pd.merge(
        all_combinations_df, MV, 
        on=["trade_date", "ts_code"], 
        how="left"
    )
    
    # 为了方便缺失的数据显示出来，这里使用pivot，并采用前值填充的方法
    Combined_DF = price_whole.pivot(index="trade_date", columns="ts_code", values="total_mv")
    Combined_DF = Combined_DF.ffill()
    
    # 再将pivot转换成常用的形式，方便后续处理
    Combined_DF = Combined_DF.reset_index().melt(
        id_vars="trade_date", 
        var_name="ts_code", 
        value_name="total_mv"
    )
    Combined_DF["total_mv"] = Combined_DF["total_mv"] * 10000
    Combined_DF = Combined_DF.sort_values(["trade_date", "ts_code"]).reset_index(drop=True)
    return Combined_DF

def Prev_Quarter_end(date):
    """
    获取上一个季度的末尾日期，如果日期是当前季度的末尾日期，则直接返回该日期。
    """
    date = pd.to_datetime(date)
    Quarter_end = pd.Timestamp(date).to_period('Q').end_time.normalize()
    # 检查输入日期是否是当前季度的末尾日期
    if date == Quarter_end:
        return Quarter_end.to_pydatetime()
    else:
        # 计算上一个季度的末尾日期
        prev_quarter_end = Quarter_end - pd.offsets.QuarterEnd(1)
        return prev_quarter_end.normalize().to_pydatetime()

def Trade_Calendar(start_date, end_date):
    # 交易日历获取
    trading_dates = xtdata.get_trading_dates('SH', 
                                            start_time=start_date, 
                                            end_time=end_date, 
                                            count=-1)
    dates = [datetime.fromtimestamp(ts / 1000).strftime("%Y%m%d") for ts in trading_dates]
    return dates
# Trade_Dates = Trade_Calendar("20150101", "20200101")

### 账面杠杆(7.21)

(Market Leverage)

$$
账面杠杆=\frac{最近报告期总资产}{最近同期股东权益总计}
$$

表示资产总额是股东权益总额的多少倍，作者测试结果表明平均回报率与账面杠杆负相关；Barra中的账面杠杆=最近报告期的(非流动负债合计+优先股账面价值+普通股账面价值)/最近报告期的普通股账面价值

In [4]:
# Market leverage
def Market_Leverage(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Balance"])
    # Market leverage
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Balance"][
            [
                "m_timetag", 
                "m_anntime", 
                "total_equity", 
                "tot_liab_shrhldr_eqy"
            ]
        ]
        temp["ts_code"] = asset
        temp = temp[
            [
                "ts_code",
                "m_timetag",
                "m_anntime",
                "total_equity",
                "tot_liab_shrhldr_eqy"
            ]
        ]
        if increment == True:
            temp = temp[temp.m_anntime == trade_date]
        else: 
            temp = temp[(temp.m_timetag >= start_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)
    if All_data.empty: return None

    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])

    All_data["Market_Leverage"] = All_data["tot_liab_shrhldr_eqy"] / All_data["total_equity"]
    All_data["Market_Leverage"] = All_data["Market_Leverage"].replace([np.inf, -np.inf], np.nan)
    All_data = All_data[["m_timetag", "ts_code", "Market_Leverage"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [5]:
ML = Market_Leverage(trade_date, common_param, increment=False)
ML

Unnamed: 0,trade_date,ts_code,Market_Leverage
0,2015-03-31,000001.SZ,16.859602
1,2015-06-30,000001.SZ,17.036771
2,2015-09-30,000001.SZ,16.540194
3,2015-12-31,000001.SZ,15.524142
4,2016-03-31,000001.SZ,14.294233
...,...,...,...
10320,2023-06-30,688981.SH,1.528879
10321,2023-09-30,688981.SH,1.528725
10322,2023-12-31,688981.SH,1.549242
10323,2024-03-31,688981.SH,1.562552


### 单季度成本费用利润率同比增长率(7.13)

(YoY of CIR)

$$
\frac{最近报告期单季度成本费用利润率 - 上年同期单季度成本费用利润率}{abs(上年同期单季度成本费用利润率)}
$$
① 成长因子是基于股票过去一段时间的各项估值或盈利指标计算得到的增长量或增长率指标，用于衡量公司各个维度的成长性;

② 计算成长因子时，最常用的统计维度有:环比增长率(或环比增量)、同比增长率(或同比增量)，计算用的基础财务数据也有“最近12个月TTM”和“单季度”2类，经测试发现:“单季度同比”和“TTM环比”这2个维度下计算得到的因子表现通常相对更优;

③ 至于增长率和增量，表现相差不大，增长率更具可比性，增量更不容易受基础指标取值异常的影响。
成本费用利用率（Cost-to-Income Ratio，简称CIR）是一个用来衡量公司或者机构在运营过程中，为每单位营收所消耗的成本和费用的比率指标。它通常用于评估一个企业的运营效率和财务健康状况。

---

- 营业成本是指企业销售商品或提供劳务的成本。营业成本又分为主营业务成本和其他业务成本。他们是与主营业务收入和其他业务收入相对应的一组概念。
- 营业总成本是指企业在经营活动中所有的成本支出，包括营业成本、营业税金及附加、销售费用、管理费用、财务费用和资产减值损失。

- 成本费用利润率(CIR)
$$ \text{CIR} = \frac{\text{Operating Costs and Expenses}}{\text{Operating Income}} \times 100\% $$

其中，

- **Operating Costs and Expenses** 表示企业在运营过程中产生的所有成本和费用，包括但不限于人力成本、设备折旧、租金、利息支出等。
  
- **Operating Income** 是企业在正常运营活动中所获得的收入，通常指营业收入减去营业成本。

成本费用利用率的数值越低，说明企业在获取每单位收入时的成本和费用相对较低，运营效率较高。高成本费用利用率则可能表明企业的成本控制能力较弱，运营效率不佳。

在金融行业，成本费用利用率也被用作评估银行和金融机构的核心指标之一，以衡量其运营效率和盈利能力的关键指标之一。

> 注意是单季度的值，要对原始值做差分

In [6]:
# YoY of CIR
def CIR_YoY(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Income"])
    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    # 由于要计算同比增长，数据范围需要提前一年的时间
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=1)).strftime("%Y%m%d")
    
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Income"][
            [
                "m_timetag", 
                "m_anntime", 
                "revenue", 
                "total_operating_cost"]
        ]
        temp["ts_code"] = asset
        temp = temp[
            [
                "ts_code", 
                "m_timetag", 
                "m_anntime", 
                "revenue", 
                "total_operating_cost"]
        ]
        temp = temp[(temp.m_timetag >= Begin_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])

    def cal_group_dif(Group, Factor):
        def cal_diff(group, factor):
            res = group[factor].diff()
            res = res.fillna(group[factor]) # Fill NaN values with original values
            return res
        Group[f"{Factor}_diff"] = (
            Group.groupby("Year")
            .apply(lambda x: cal_diff(x, Factor))
            .reset_index(level=0, drop=True)
        ).values.reshape(-1)
        return Group
    All_data = (
        All_data.groupby("ts_code")
        .apply(lambda x: cal_group_dif(x, "revenue"))
        .reset_index(drop=True)
    )
    All_data = (
        All_data.groupby("ts_code")
        .apply(lambda x: cal_group_dif(x, "total_operating_cost"))
        .reset_index(drop=True)
    )

    All_data["CIR"] = All_data["total_operating_cost_diff"] / All_data["revenue_diff"]
    All_data["CIR"] = All_data["CIR"].replace([np.inf, -np.inf], np.nan)

    def Cal_Factor_YoY(Group, Factor):
        """
        计算同比增长。
        YoY = (本期值 - 上期值) / abs(上期值)
        如果上一期值和本期值为0，则增长率为0；
        如果上一期值为0，本期不为0，则增长率为NaN；
        其他情况正常计算。
        """
        DF = Group[Factor].fillna(0)
        DF_Shift = Group[Factor].fillna(0).shift(1)
        DIFF = DF - DF_Shift
        Group["CIR_YoY"] = np.where(DIFF == 0, 0, DIFF / abs(DF_Shift))
        Group["CIR_YoY"] = Group["CIR_YoY"].replace([np.inf, -np.inf], np.nan)
        return Group
    All_data = All_data.groupby(["ts_code", "Month"]).apply(lambda x: Cal_Factor_YoY(x, "CIR")).reset_index(drop=True)
    All_data = All_data.sort_values(by=['ts_code', 'm_timetag'])
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "CIR_YoY"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [7]:
CIR_YoY_DF = CIR_YoY("20230720", common_param, increment=True)
CIR_YoY_DF

### 线性纯化利润率(7.5)

(Linear Purified Net Profit)

财务数据中的信息可以分为对未来股价有预测能力的信息(信号)与对未来股价没有预测能力的信息(噪声)。数据的信比越大，其作为因子的预测能力就越强。净利润数据中有不少其它财务数据决定的部分就属于噪音项，找出它们对于净利润数据的贡献后，将其剔除，可以提升纯化后净利润数据的预测能力。

$$
NetProfit_i = a_i + \beta_{1, i}NonOperatingIncome_i + \beta_{2, i}CashPaidEmployees + \epsilon_i
$$
$$
i \in \{0, 1, 2, ..., N-1\},\ N=8 \  (default)
$$

以当天最近N个季度的净利润$NetProfit$作为Y、以对应季度的营业外收入$NonOperatingIncome$和支付给职工以及为职工支付的现金数据$CashPaidEmployees$作为X，进行OLS线性回归，回归前需对X和Y进行Z-Score标准化处理；回归得到的最近一个季度上的残差$\epsilon_i$即为LPNP在当天的因子值。

> 要对原始数据进行差分运算

In [8]:
def OSL_Regression(NetProfit, NonOperIncome, CashPaidEmployees, N):
    if len(NetProfit) != N or len(NonOperIncome) != N or len(CashPaidEmployees) != N:
        return None
    # Combine the input series into a DataFrame
    df = pd.DataFrame(
        {
            "NetProfit": NetProfit,
            "NonOperIncome": NonOperIncome,
            "CashPaidEmployees": CashPaidEmployees,
        }
    )
    # print(NetProfit)
    # print(NonOperIncome)
    # print(CashPaidEmployees)
    # Z-Score standardization
    df_standardized = ((df - df.mean()) / df.std()).fillna(0)

    # Prepare the data for regression
    X = sm.add_constant(df_standardized[["NonOperIncome", "CashPaidEmployees"]])
    y = df_standardized["NetProfit"]

    # Perform OLS regression
    model = sm.OLS(y, X).fit()

    # print(model.rsquared)
    # rsqure_list.append(model.rsquared)

    # Get the coefficients
    a = model.params["const"]
    beta1 = model.params["NonOperIncome"]
    beta2 = model.params["CashPaidEmployees"]

    # Calculate LPNP factor value
    latest_data = df_standardized.iloc[-1]  # Get the latest quarter data
    LPNP = (
        latest_data["NetProfit"]
        - a
        - beta1 * latest_data["NonOperIncome"]
        - beta2 * latest_data["CashPaidEmployees"]
    )
    # print("LPNP Factor Value:", LPNP)
    return LPNP

def Linear_Purified_NP(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    N = common_param["LPNP_N"]
    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    # 数据范围提前N个季度，即3*N个月
    Begin_date = (Prev_Quarter_date - pd.DateOffset(months=3*N)).strftime("%Y%m%d")
    
    # 由于资产负债表和现金流表中的m_anntime不同，因此去掉
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Income", "CashFlow"])
    Income = pd.DataFrame()
    CashFlow = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp_income = Financial_DF[asset]["Income"][
            [
                "m_timetag", 
                "m_anntime", 
                "net_profit_incl_min_int_inc", 
                "plus_non_oper_rev"]
        ]
        temp_cashflow = Financial_DF[asset]["CashFlow"][
            [
                "m_timetag", 
                "m_anntime", 
                "cash_pay_beh_empl"]
        ]
        temp_income["ts_code"] = asset
        temp_cashflow["ts_code"] = asset
        temp_income = temp_income[
            [
                "ts_code", 
                "m_timetag", 
                "m_anntime", 
                "net_profit_incl_min_int_inc", 
                "plus_non_oper_rev"]
        ]
        temp_cashflow = temp_cashflow[
            [
                "ts_code", 
                "m_timetag", 
                "m_anntime", 
                "cash_pay_beh_empl"]]
        temp_income = temp_income[
            (temp_income.m_timetag >= Begin_date) 
            & (temp_income.m_timetag <= trade_date)
        ]
        temp_cashflow = temp_cashflow[
            (temp_cashflow.m_timetag >= Begin_date)
            & (temp_cashflow.m_timetag <= trade_date)
        ]
        Income = pd.concat([Income, temp_income], ignore_index=True)
        CashFlow = pd.concat([CashFlow, temp_cashflow], ignore_index=True)

    keys = ["ts_code", "m_timetag"]
    All_data = pd.merge(Income, CashFlow, on=keys, how="left")
    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime_x"] = pd.to_datetime(All_data["m_anntime_x"]) # Income's column
    All_data["m_anntime_y"] = pd.to_datetime(All_data["m_anntime_y"]) # CashFlow's column
    # 由于Income和Cashflow表中的披露时间m_anntime不一样
    # 取同一个财报中，在这两个表中最早的日期作为实际的财报披露时间
    All_data['m_anntime'] = All_data[['m_anntime_x', 'm_anntime_y']].min(axis=1)
    All_data.fillna(0, inplace=True)

    def cal_group_dif(Group, Factor):
        def cal_diff(group, factor):
            res = group[factor].diff()
            res = res.fillna(group[factor])  # Fill NaN values with original values
            return res

        Group[f"{Factor}_diff"] = (
            Group.groupby("Year")
            .apply(lambda x: cal_diff(x, Factor))
            .reset_index(level=0, drop=True)
        ).values.reshape(-1)
        return Group

    All_data = All_data.groupby("ts_code").apply(
        lambda x: cal_group_dif(x, "net_profit_incl_min_int_inc")
    ).reset_index(drop=True)
    All_data = All_data.groupby("ts_code").apply(
        lambda x: cal_group_dif(x, "plus_non_oper_rev")
    ).reset_index(drop=True)
    All_data = All_data.groupby("ts_code").apply(
        lambda x: cal_group_dif(x, "cash_pay_beh_empl")
    ).reset_index(drop=True)

    def cal_factor_LPNP(group, factor_NP, factor_NOI, factor_CPE, N):
        NP_Rolling = group[factor_NP].rolling(N)
        NOI_Rolling = group[factor_NOI].rolling(N)
        CPE_Rolling = group[factor_CPE].rolling(N)
        group["LPNP"] = [
            OSL_Regression(a, b, c, N)
            for a, b, c in zip(NP_Rolling, NOI_Rolling, CPE_Rolling)
        ]
        return group

    All_data = All_data.groupby("ts_code").apply(
        lambda x: cal_factor_LPNP(
            x,
            "net_profit_incl_min_int_inc_diff",
            "plus_non_oper_rev_diff",
            "cash_pay_beh_empl_diff",
            N,
        )
    ).reset_index(drop=True)
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "LPNP"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [9]:
LP = Linear_Purified_NP(trade_date, common_param, increment=False)
LP

Unnamed: 0,trade_date,ts_code,LPNP
0,2015-03-31,000001.SZ,0.248498
1,2015-06-30,000001.SZ,1.149239
2,2015-09-30,000001.SZ,0.416773
3,2015-12-31,000001.SZ,-0.037270
4,2016-03-31,000001.SZ,0.625530
...,...,...,...
9988,2023-06-30,688981.SH,-1.540062
9989,2023-09-30,688981.SH,-1.120249
9990,2023-12-31,688981.SH,-1.109765
9991,2024-03-31,688981.SH,-0.074837


### 户均持股比例(6.30)

(Average sharehold ratio)

为股东的平均持股数占总股本的比例，一般来说，比例大的股票筹码集中度高，筹码锁定性强，浮筹少，上涨较容易。
$$
\frac{总股本/股东户数}{总股本} = \frac{1}{股东户数}
$$

In [10]:
# Average sharehold ratio
def Average_Sharehold_Ratio(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Holdernum"])
    """
    endDate = m_timetag
    declareDate = m_anntime
    """
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Holdernum"][
            ["endDate", "declareDate", "shareholder"]]
        temp["ts_code"] = asset
        temp = temp[
            [
                "ts_code",
                "endDate",
                "declareDate",
                "shareholder"
            ]
        ]
        if increment:
            temp = temp[temp.declareDate == trade_date]
        else: temp = temp[(temp.endDate >= start_date) & (temp.endDate <= trade_date)]
        All_data = All_data = pd.concat([All_data, temp], ignore_index=True)
    if All_data.empty: return None

    All_data["Year"] = All_data["endDate"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["endDate"].apply(lambda x: x[4:6]).astype(int)
    All_data["endDate"] = pd.to_datetime(All_data["endDate"])
    All_data["declareDate"] = pd.to_datetime(All_data["declareDate"])
    All_data.drop_duplicates(subset=["ts_code", "endDate", "declareDate"], inplace=True)
    
    All_data['Average_Sharehold_Ratio'] = (1 / All_data['shareholder'])
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["endDate", "ts_code", "Average_Sharehold_Ratio"]]
    All_data = All_data.rename(columns={"endDate": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [11]:
ASR = Average_Sharehold_Ratio(trade_date, common_param, increment=True)
ASR

Unnamed: 0,trade_date,ts_code,Average_Sharehold_Ratio
0,2024-08-09,000408.SZ,2.8e-05
1,2024-08-09,000876.SZ,5e-06
2,2024-08-10,000895.SZ,5e-06
3,2024-07-31,002460.SZ,inf
4,2024-06-30,002938.SZ,1.7e-05
5,2024-06-30,601138.SH,5e-06


### 单季度净利润同比增长率(6.26)

(YoY of Net Profit)

$$
\frac{最近报告期的单季度净利率-上年同期的单季度净利率}{abs(上年同期的单季度净利率)}
$$

> 要对原始数据进行差分运算

In [12]:
# YoY of NP
def Net_Profit_YoY(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    # 首先数据范围提前一年；由于要计算差分，确保开始的日期是第一季度末的日期
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=1, month=3)).strftime("%Y%m%d")

    Financial_DF = xtdata.get_financial_data(Stock_list, ["Income"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Income"][
            [
                "m_timetag", 
                "m_anntime", 
                "net_profit_incl_min_int_inc"]
        ]
        temp["ts_code"] = asset
        temp = temp[
            [
                "ts_code", 
                "m_timetag", 
                "m_anntime", 
                "net_profit_incl_min_int_inc"]
        ]
        temp = temp[(temp.m_timetag >= Begin_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])

    def cal_group_dif(Group, Factor):
        def cal_diff(group, factor):
            res = group[factor].diff()
            res = res.fillna(group[factor]) # Fill NaN values with original values
            return res
        Group[f"{Factor}_diff"] = (
            Group.groupby("Year")
            .apply(lambda x: cal_diff(x, Factor))
            .reset_index(level=0, drop=True)
        ).values.reshape(-1)
        return Group
    All_data = (
        All_data.groupby("ts_code")
        .apply(lambda x: cal_group_dif(x, "net_profit_incl_min_int_inc"))
        .reset_index(drop=True)
    )
    
    def Cal_Factor_YoY(Group, Factor):
        """
        计算同比增长。
        YoY = (本期值 - 上期值) / abs(上期值)
        如果上一期值和本期值为0，则增长率为0；
        如果上一期值为0，本期不为0，则增长率为NaN；
        其他情况正常计算。
        """
        DF = Group[Factor]
        DF_Shift = Group[Factor].shift(1)
        DIFF = DF - DF_Shift
        Group["Net_Profit_YoY"] = np.where(DIFF == 0, 0, DIFF / abs(DF_Shift))
        Group["Net_Profit_YoY"] = Group["Net_Profit_YoY"].replace([np.inf, -np.inf], np.nan)
        return Group
    All_data = All_data.groupby(["ts_code", "Month"]).apply(lambda x: Cal_Factor_YoY(x, "net_profit_incl_min_int_inc_diff")).reset_index(drop=True)
    All_data = All_data.sort_values(by=['ts_code', 'm_timetag'])
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "Net_Profit_YoY"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [13]:
NY = Net_Profit_YoY(trade_date, common_param, increment=False)
NY

Unnamed: 0,trade_date,ts_code,Net_Profit_YoY
0,2015-03-31,000001.SZ,0.113771
1,2015-06-30,000001.SZ,0.186927
2,2015-09-30,000001.SZ,0.094806
3,2015-12-31,000001.SZ,0.004138
4,2016-03-31,000001.SZ,0.081187
...,...,...,...
10221,2023-06-30,688981.SH,-0.587497
10222,2023-09-30,688981.SH,-0.784074
10223,2023-12-31,688981.SH,-0.581556
10224,2024-03-31,688981.SH,-0.680166


### 流动性经营资产变动(6.16)

(Changes in liquid operating asset)

流动性经营资产中的分项在会计核算时带有较大的主观性，其中的应收款项、存货等都容易涉及盈余操纵，可靠性较低；流动性经营资产的变动与企业未来盈利负相关，与股票未来收益也负相关，容易造成错误定价。

$$
\frac{最近报告期的流动性经营资产-上年同期值}{平均总资产}
$$
其中流动性经营资产 = 流动资产中剔除金融资产相关科目后的剩余部分 

≈ 应收账款 + 应收票据 + 预付款项 + 其他应收款 + 存货 + 待摊费用

平均总资产 = $\frac{期初总资产+期未总资产}{2} $

> Q: 期末总资产如何确定？是否是指最近报告期的总资产？

In [14]:
# Changes in liquid operating asset
def CLOA(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=1)).strftime("%Y%m%d")
        
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Balance"])
    All_data = pd.DataFrame()
    Liquid_Asset_Names = [
        "account_receivable",
        "bill_receivable",
        "advance_payment",
        "other_receivable",
        "inventories",
        "apportioned_cost",
    ]
    Total_Asset_Name = ["tot_liab_shrhldr_eqy"]
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Balance"][
            ["m_timetag", "m_anntime"] 
            + Liquid_Asset_Names 
            + Total_Asset_Name
        ]
        temp["ts_code"] = asset
        temp = temp[
            ["ts_code", "m_timetag", "m_anntime"]
            + Liquid_Asset_Names
            + Total_Asset_Name
        ]
        temp = temp[(temp.m_timetag >= Begin_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    All_data.fillna(0, inplace=True)
    All_data["liquid_asset"] = (
        All_data["account_receivable"]
        + All_data["bill_receivable"]
        + All_data["advance_payment"]
        + All_data["other_receivable"]
        + All_data["inventories"]
        + All_data["apportioned_cost"]
    )

    def Cal_Average(Group, Factor):
        Asset = Group[Factor]
        First_Asset = Asset.iloc[0]
        Group["Average_Total_Asset"] = (Asset + First_Asset) / 2
        return Group

    All_data = All_data.groupby(["ts_code", "Year"]).apply(
        lambda x: Cal_Average(x, "tot_liab_shrhldr_eqy")).reset_index(drop=True)
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data = All_data.sort_values(by=['ts_code', 'm_timetag'])
    if All_data.empty: return None
    All_data["CLOA"] = All_data["liquid_asset"] / All_data["Average_Total_Asset"]
    All_data["CLOA"] = All_data["CLOA"].replace([np.inf, -np.inf], np.nan)
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "CLOA"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [15]:
CLOA_DF = CLOA(trade_date, common_param, increment=False)
CLOA_DF

Unnamed: 0,trade_date,ts_code,CLOA
0,2015-03-31,000001.SZ,0.000000
1,2015-06-30,000001.SZ,0.000000
2,2015-09-30,000001.SZ,0.000000
3,2015-12-31,000001.SZ,0.000000
4,2016-03-31,000001.SZ,0.000000
...,...,...,...
10320,2023-06-30,688981.SH,0.069917
10321,2023-09-30,688981.SH,0.073046
10322,2023-12-31,688981.SH,0.074730
10323,2024-03-31,688981.SH,0.071576


### 企业年龄(6.6)

(IPO Age)

上市时间久的公司比上市时间短的公司能获得更高的投资收益。

$$ 
当前时刻与公司IPO日之间的月份数量
$$

> 与其他因子的输出格式一致，包含: trade_date, ts_code, factor_value
>
> 由于IPO年龄数据

In [16]:
def IPO_Age(trade_date, common_param, increment=True, start_date="20150101"):
    def Cal_Factor_IPO_Age(row):
        list_date = row['list_date']
        now_date = row['date']
        month_difference = (now_date.year - list_date.year) * 12 + now_date.month - list_date.month
        if month_difference < 0: return 0
        return month_difference
    Stock_list = common_param["Stock_list"]
    Stock_info = get_targets_info(target_type='stock')
    Trade_dates = Trade_Calendar(start_date, trade_date)
    Age_DF = Stock_info[Stock_info['ts_code'].isin(Stock_list)][['ts_code', 'list_date']]
    Age_DF['list_date'] = pd.to_datetime(Age_DF['list_date'])
    if increment:
        Age_DF['date'] = pd.to_datetime(trade_date)
        Age_DF['IPO_Age'] = Age_DF.apply(Cal_Factor_IPO_Age, axis=1)
        IPO_Age_DF = Age_DF
    else: 
        IPO_Age_DF = pd.DataFrame()
        for DATE in Trade_dates:
            Age_DF['date'] = pd.to_datetime(DATE)
            Age_DF['IPO_Age'] = Age_DF.apply(Cal_Factor_IPO_Age, axis=1)
            IPO_Age_DF = pd.concat([IPO_Age_DF, Age_DF], ignore_index=True)
    IPO_Age_DF.dropna(inplace=True)
    if IPO_Age_DF.empty: return None
    IPO_Age_DF = IPO_Age_DF[["date", "ts_code", "IPO_Age"]]
    IPO_Age_DF = IPO_Age_DF.rename(columns={"date": "trade_date"})
    IPO_Age_DF.reset_index(drop=True, inplace=True)
    return IPO_Age_DF

In [17]:
IPO = IPO_Age(trade_date, common_param, increment=True)
IPO

Unnamed: 0,trade_date,ts_code,IPO_Age
0,2024-08-14,000001.SZ,400
1,2024-08-14,000002.SZ,403
2,2024-08-14,000063.SZ,321
3,2024-08-14,000100.SZ,247
4,2024-08-14,000157.SZ,286
...,...,...,...
295,2024-08-14,688303.SH,37
296,2024-08-14,688363.SH,57
297,2024-08-14,688396.SH,54
298,2024-08-14,688599.SH,50


### 负债市值比(6.5)

(Debt-to-market ratio)

反映公司建立资产的资金来源中股本与债务比例，是公司财务杠杆的衡量指标。

$$
\frac{最近报告期负债合计}{总市值}
$$

In [18]:
def Debt2Market_Ratio(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Balance"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Balance"][["m_timetag", "m_anntime", "tot_liab"]]
        temp["ts_code"] = asset
        temp = temp[["ts_code", "m_timetag", "m_anntime", "tot_liab"]]
        if increment:
            temp = temp[temp.m_anntime == trade_date]
        else: temp = temp[(temp.m_timetag >= start_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    if All_data.empty: return None
    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    Exist_Stock_list = list(All_data["ts_code"].unique())
    # print(Exist_Stock_list)
    
    # 获取日期范围
    date_unique = All_data['m_timetag'].drop_duplicates()
    date_sorted = date_unique.sort_values()
    latest_date = date_sorted.max().strftime("%Y%m%d")
    # print(earliest_date, latest_date)

    Market_Value_DF = MV_DF(Exist_Stock_list, start_date, latest_date)
    All_data = pd.merge(All_data, 
                        Market_Value_DF, 
                        how='left', 
                        left_on=['ts_code', 'm_timetag'], 
                        right_on=['ts_code', 'trade_date'])
    All_data["Debt2Market_Ratio"] = All_data["tot_liab"] / All_data["total_mv"]
    All_data["Debt2Market_Ratio"] = All_data["Debt2Market_Ratio"].replace([np.inf, -np.inf], np.nan)
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["trade_date", "ts_code", "Debt2Market_Ratio"]]
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [19]:
DR = Debt2Market_Ratio("20230610", common_param, increment=False)
DR

Unnamed: 0,trade_date,ts_code,Debt2Market_Ratio
0,2015-03-31,000001.SZ,12.048721
1,2015-06-30,000001.SZ,11.630143
2,2015-09-30,000001.SZ,16.268864
3,2015-12-31,000001.SZ,13.672391
4,2016-03-31,000001.SZ,16.378879
...,...,...,...
8479,2022-03-31,688981.SH,0.194388
8480,2022-06-30,688981.SH,0.246878
8481,2022-09-30,688981.SH,0.329109
8482,2022-12-31,688981.SH,0.317641


### 金融负债变动(5.21)

(Change in Financial Liability)

金融负债主要包含一些计息债务，会计核算时简洁明确，可靠性高，盈余操纵空间很小；金融负债的变动与企业未来盈利正相关，与股票未来收益也正相关。

$$
\frac{最近报告期的金融负债-上年同期值}{平均总资产}
$$

金融负债 = 短期借款 + 交易性金融负债 + 应付票据 + 一年内到期的非流动负债 + 长期借款 + 应付债券

平均总资产 = $ \frac{初期初总资产 + 期未总资产}{2} $

In [20]:
def CFL(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=1)).strftime("%Y%m%d")
    
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Balance"])
    All_data = pd.DataFrame()
    Financial_Asset_Names = [
        "shortterm_loan",
        "tradable_fin_assets",
        "notes_payable",
        "non_current_liability_in_one_year",
        "long_term_loans",
        "bonds_payable",
    ]
    Total_Asset_Name = ["tot_liab_shrhldr_eqy"]
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Balance"][
            ["m_timetag", "m_anntime"]
            + Financial_Asset_Names 
            + Total_Asset_Name
        ]
        temp["ts_code"] = asset
        temp = temp[
            ["ts_code", "m_timetag", "m_anntime"]
            + Financial_Asset_Names
            + Total_Asset_Name
        ]
        temp = temp[(temp.m_timetag >= Begin_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    All_data.fillna(0, inplace=True)
    All_data["Financial_Liability"] = (
        All_data["shortterm_loan"]
        + All_data["tradable_fin_assets"]
        + All_data["notes_payable"]
        + All_data["non_current_liability_in_one_year"]
        + All_data["long_term_loans"]
        + All_data["bonds_payable"]
    )

    def Cal_Factor_Diff(Group, Factor):
        """
        计算同比差值。
        Diff = 本期值 - 上期值
        如果运算中出现了null值，则结果为null值
        """
        DF = Group[Factor]
        DF_Shift = Group[Factor].shift(1)
        Group[f"{Factor}_Diff"] = DF - DF_Shift
        return Group

    All_data = All_data.groupby(["ts_code", "Month"]).apply(
        lambda x: Cal_Factor_Diff(x, "Financial_Liability")
    ).reset_index(drop=True)
    All_data = All_data.sort_values(by=["ts_code", "m_timetag"])

    def Cal_Average(Group, Factor):
        Asset = Group[Factor]
        First_Asset = Asset.iloc[0]
        Group["Average_Total_Asset"] = (Asset + First_Asset) / 2
        return Group

    All_data = All_data.groupby(["ts_code", "Year"]).apply(
        lambda x: Cal_Average(x, "tot_liab_shrhldr_eqy")).reset_index(drop=True)
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    if All_data.empty: return None
    All_data = All_data.sort_values(by=['ts_code', 'm_timetag'])
    All_data["CFL"] = All_data["Financial_Liability_Diff"] / All_data["Average_Total_Asset"]
    All_data["CFL"] = All_data["CFL"].replace([np.inf, -np.inf], np.nan)
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "CFL"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [21]:
CFL_DF = CFL(trade_date, common_param, increment=True)
CFL_DF

Unnamed: 0,trade_date,ts_code,CFL
0,2024-06-30,000895.SZ,0.056895
1,2024-06-30,002938.SZ,-0.000102
2,2024-06-30,300999.SZ,-0.113865
3,2024-06-30,601138.SH,-0.054339


### 现金比率(5.19)

(Cash Ratio)

现金比率反映企业的即刻变现能力，或是企业立即偿还到期债务的能力
$$
\frac{最近报告期货币资金+最近报告期交易性金融资产}{最近同期流动负债合计}
$$

In [22]:
# Cash Ratio
def Cash_Ratio(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Balance"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Balance"][
            [
                "m_timetag", 
                "m_anntime", 
                "cash_equivalents", 
                "tradable_fin_assets", 
                "total_current_liability"
            ]
        ]
        temp["ts_code"] = asset
        temp = temp[
            [
                "ts_code",
                "m_timetag",
                "m_anntime",
                "cash_equivalents", 
                "tradable_fin_assets", 
                "total_current_liability"
            ]
        ]
        if increment == True:
            temp = temp[temp.m_anntime == trade_date]
        else: 
            temp = temp[(temp.m_timetag >= start_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    if All_data.empty: return None
    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    All_data.fillna(0, inplace=True)

    All_data["Cash_Ratio"] = (
        (
            All_data["cash_equivalents"] 
            + All_data["tradable_fin_assets"]
        )
        / All_data["total_current_liability"]
    )
    All_data["Cash_Ratio"] = All_data["Cash_Ratio"].replace([np.inf, -np.inf], np.nan)
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "Cash_Ratio"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [23]:
CR_DF = Cash_Ratio(trade_date, common_param, increment=False)
CR_DF

Unnamed: 0,trade_date,ts_code,Cash_Ratio
0,2015-03-31,000002.SZ,0.105390
1,2015-06-30,000002.SZ,0.117721
2,2015-09-30,000002.SZ,0.106283
3,2015-12-31,000002.SZ,0.126601
4,2016-03-31,000002.SZ,0.115666
...,...,...,...
8721,2023-06-30,688981.SH,1.206551
8722,2023-09-30,688981.SH,1.056388
8723,2023-12-31,688981.SH,1.002695
8724,2024-03-31,688981.SH,0.823191


### 毛利率增长减销售收入增长(5.15)

(Gross Profit growth Difference in sales revenue growth)

$$
最近单季度毛利润的同比增速-同期单季度营业收入的同比增速
$$

> 要对原始数据进行差分运算

In [24]:
# Gross margin growth Difference in sales revenue growth
def GPG_Minus_SRG(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    # 由于要计算差分，开始的日期是第一季度末的日期
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=1, month=3)).strftime("%Y%m%d")
    
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Income"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Income"][
            [
                "m_timetag", 
                "m_anntime", 
                "revenue", 
                "total_expense",
                "revenue_inc"
            ]
        ]
        temp["ts_code"] = asset
        temp = temp[
            [
                "ts_code",
                "m_timetag",
                "m_anntime",
                "revenue", 
                "total_expense",
                "revenue_inc"
            ]
        ]
        temp = temp[(temp.m_timetag >= Begin_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    
    # Gross Profit
    All_data['Gross_Profit'] = All_data['revenue'].fillna(0) - All_data['total_expense'].fillna(0)
    
    def cal_group_dif(Group, Factor):
        def cal_diff(group, factor):
            res = group[factor].diff()
            res = res.fillna(group[factor]) # Fill NaN values with original values
            return res
        Group[f"{Factor}_diff"] = (
            Group.groupby("Year")
            .apply(lambda x: cal_diff(x, Factor))
            .reset_index(level=0, drop=True)
        ).values.reshape(-1)
        return Group
    # 单季度毛利润
    All_data = (
        All_data.groupby("ts_code")
        .apply(lambda x: cal_group_dif(x, "Gross_Profit"))
        .reset_index(drop=True)
    )
    # 单季度营业收入
    All_data = (
        All_data.groupby("ts_code")
        .apply(lambda x: cal_group_dif(x, "revenue_inc"))
        .reset_index(drop=True)
    )

    def Cal_Factor_YoY(Group, Factor):
        """
        计算同比增长。
        YoY = (本期值 - 上期值) / abs(上期值)
        如果上一期值和本期值为0，则增长率为0；
        如果上一期值为0，本期不为0，则增长率为NaN；
        其他情况正常计算。
        """
        DF = Group[Factor].fillna(0)
        DF_Shift = Group[Factor].fillna(0).shift(1)
        DIFF = DF - DF_Shift
        Group[f"{Factor}_YoY"] = np.where(DIFF == 0, 0, DIFF / abs(DF_Shift))
        Group[f"{Factor}_YoY"] = Group[f"{Factor}_YoY"].replace([np.inf, -np.inf], np.nan)
        return Group

    All_data = All_data.groupby(["ts_code", "Month"]).apply(lambda x: Cal_Factor_YoY(x, "Gross_Profit_diff")).reset_index(drop=True)
    All_data = All_data.sort_values(by=['ts_code', 'm_timetag'])
    All_data = All_data.groupby(["ts_code", "Month"]).apply(lambda x: Cal_Factor_YoY(x, "revenue_inc_diff")).reset_index(drop=True)
    All_data = All_data.sort_values(by=['ts_code', 'm_timetag'])
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data['GPG_Minus_SRG'] = All_data["Gross_Profit_diff_YoY"] - All_data["revenue_inc_diff_YoY"]
    All_data = All_data[["m_timetag", "ts_code", "GPG_Minus_SRG"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [25]:
CMS_DF = GPG_Minus_SRG(trade_date, common_param, increment=True)
CMS_DF

Unnamed: 0,trade_date,ts_code,GPG_Minus_SRG
0,2024-06-30,000895.SZ,-0.013654
1,2024-06-30,002938.SZ,0.06552
2,2024-06-30,300999.SZ,0.449483
3,2024-06-30,601138.SH,-0.203409


### 单季度毛利率同比增长率(5.7)

(Gross Profit Growth)

$$
\frac{最近报告期单季度毛利率-上年同期单季度毛利率}{abs(上年同期单季度毛利率)}
$$

> 要对原始数据进行差分运算

In [26]:
# Gross profit growth 
def GPG(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    # 由于要计算差分，开始的日期是第一季度末的日期
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=1, month=3)).strftime("%Y%m%d")
    
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Income"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Income"][
            [
                "m_timetag", 
                "m_anntime", 
                "revenue", 
                "total_expense"
            ]
        ]
        temp["ts_code"] = asset
        temp = temp[
            [
                "ts_code",
                "m_timetag",
                "m_anntime",
                "revenue", 
                "total_expense"
            ]
        ]
        temp = temp[(temp.m_timetag >= Begin_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    
    # Gross Profit
    All_data['Gross_Profit'] = All_data['revenue'].fillna(0) - All_data['total_expense'].fillna(0)
    def cal_group_dif(Group, Factor):
        def cal_diff(group, factor):
            res = group[factor].diff()
            res = res.fillna(group[factor]) # Fill NaN values with original values
            return res
        Group[f"{Factor}_diff"] = (
            Group.groupby("Year")
            .apply(lambda x: cal_diff(x, Factor))
            .reset_index(level=0, drop=True)
        ).values.reshape(-1)
        return Group
    # 单季度毛利润
    All_data = (
        All_data.groupby("ts_code")
        .apply(lambda x: cal_group_dif(x, "Gross_Profit"))
    ).reset_index(drop=True)

    def Cal_Factor_YoY(Group, Factor):
        """
        计算同比增长。
        YoY = (本期值 - 上期值) / abs(上期值)
        如果上一期值和本期值为0，则增长率为0；
        如果上一期值为0，本期不为0，则增长率为NaN；
        其他情况正常计算。
        """
        DF = Group[Factor].fillna(0)
        DF_Shift = Group[Factor].fillna(0).shift(1)
        DIFF = DF - DF_Shift
        Group["GPG_YoY"] = np.where(DIFF == 0, 0, DIFF / abs(DF_Shift))
        Group["GPG_YoY"] = Group["GPG_YoY"].replace([np.inf, -np.inf], np.nan)
        return Group

    All_data = All_data.groupby(["ts_code", "Month"]).apply(lambda x: Cal_Factor_YoY(x, "Gross_Profit_diff")).reset_index(drop=True)
    All_data = All_data.sort_values(by=['ts_code', 'm_timetag'])
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "GPG_YoY"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [27]:
GPG_DF = GPG(trade_date, common_param, increment=True)
GPG_DF

Unnamed: 0,trade_date,ts_code,GPG_YoY
0,2024-06-30,000895.SZ,-0.110968
1,2024-06-30,002938.SZ,0.38827
2,2024-06-30,300999.SZ,0.354672
3,2024-06-30,601138.SH,0.25765


### 对数总市值(5.31)

(Logarithmic Market Value)

A股市值分布存在严重厚尾特征，取对数后，会使得因子分布更接近正态分布。

$$
In(当日收盘价\times 当日总股本)
$$

In [28]:
def Log_Market_Value(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    if increment:
        try:
            MV = get_price(
                ts_code_list=Stock_list,
                feature_list=["total_mv"], # 总市值
                start_date=trade_date,
                trade_date=trade_date,
                target_type="stock",
            )
        # 如果报错，说明说当天不是交易日，没有数据
        except: return None
        MV = MV.reset_index()
        MV["total_mv"] = MV["total_mv"] * 10000
    else:
        MV = MV_DF(Stock_list, start_date, trade_date)
    MV["Log_Market_Value"] = np.log(MV["total_mv"])
    MV.dropna(inplace=True)
    if MV.empty: return None
    MV = MV[["trade_date", "ts_code", "Log_Market_Value"]]
    MV.reset_index(drop=True, inplace=True)
    return MV

In [29]:
LMV = Log_Market_Value("20231024", common_param, increment=True)
LMV

### 销售收入增长存货增长(4.14)

(Revenue Growth Minus Inventory Growth)

$$
最近单季度营业收入的同比增速-同期单季度存货的同比增速
$$

> 要对原始数据进行差分运算

In [30]:
def RG_Minus_IG(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    # 由于要计算差分，开始的日期是第一季度末的日期
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=1, month=3)).strftime("%Y%m%d")

    Financial_DF = xtdata.get_financial_data(Stock_list, ["Balance", "Income"])
    Balance = pd.DataFrame()
    Income = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp_balance = Financial_DF[asset]['Balance'][
            [
                "m_timetag", 
                "m_anntime",
                "inventories"]
        ]
        temp_income = Financial_DF[asset]["Income"][
            [
                "m_timetag", 
                "m_anntime",
                "revenue_inc"]
        ]
        temp_balance["ts_code"] = asset
        temp_income["ts_code"] = asset
        temp_balance = temp_balance[
            [
                "ts_code",
                "m_timetag", 
                "m_anntime",
                "inventories"]
        ]
        temp_income = temp_income[
            [
                "ts_code",
                "m_timetag", 
                "m_anntime",
                "revenue_inc"]
        ]
        temp_balance = temp_balance[(temp_balance.m_timetag >= Begin_date) & (temp_balance.m_timetag <= trade_date)]
        temp_income = temp_income[(temp_income.m_timetag >= Begin_date) & (temp_income.m_timetag <= trade_date)]
        Balance = pd.concat([Balance, temp_balance], ignore_index=True)
        Income = pd.concat([Income, temp_income], ignore_index=True)

    keys = ["ts_code", "m_timetag"]
    All_data = pd.merge(Balance, Income, on=keys, how='left')
    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime_x"] = pd.to_datetime(All_data["m_anntime_x"]) # Balance's column
    All_data["m_anntime_y"] = pd.to_datetime(All_data["m_anntime_y"]) # Income's column
    # Balance和Income表中的披露时间m_anntime可能不一样
    # 取同一个财报中，在这两个表中最早的日期作为实际的财报披露时间
    All_data['m_anntime'] = All_data[['m_anntime_x', 'm_anntime_y']].min(axis=1)
    
    def cal_group_dif(Group, Factor):
        def cal_diff(group, factor):
            res = group[factor].diff()
            res = res.fillna(group[factor]) # Fill NaN values with original values
            return res
        Series = (
            Group.groupby("Year")
            .apply(lambda x: cal_diff(x, Factor))
            .reset_index(level=0, drop=True)
        ).values.reshape(-1)
        # print(Series)
        Group[f"{Factor}_diff"] = Series
        return Group
    # 单季度毛利润
    All_data = (
        All_data.groupby("ts_code")
        .apply(lambda x: cal_group_dif(x, "inventories"))
        .reset_index(drop=True)
    )
    All_data = (
        All_data.groupby("ts_code")
        .apply(lambda x: cal_group_dif(x, "revenue_inc"))
    ).reset_index(drop=True)
    
    def Cal_Factor_YoY(Group, Factor):
        DF = Group[Factor].fillna(0)
        DF_Shift = Group[Factor].fillna(0).shift(1)
        DIFF = DF - DF_Shift
        Group[f"{Factor}_YoY"] = np.where(DIFF == 0, 0, DIFF / abs(DF_Shift))
        Group[f"{Factor}_YoY"] = Group[f"{Factor}_YoY"].replace([np.inf, -np.inf], np.nan)
        return Group
    
    All_data = All_data.groupby(["ts_code", "Month"]).apply(lambda x: Cal_Factor_YoY(x, "inventories_diff")).reset_index(drop=True)
    All_data = All_data.sort_values(by=['ts_code', 'm_timetag'])
    All_data = All_data.groupby(["ts_code", "Month"]).apply(lambda x: Cal_Factor_YoY(x, "revenue_inc_diff")).reset_index(drop=True)
    All_data = All_data.sort_values(by=['ts_code', 'm_timetag'])
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data["RG_Minus_IG"] = All_data["revenue_inc_diff_YoY"] - All_data["inventories_diff_YoY"]
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "RG_Minus_IG"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [31]:
RMI_DF = RG_Minus_IG(trade_date, common_param, increment=True)
RMI_DF

Unnamed: 0,trade_date,ts_code,RG_Minus_IG
0,2024-06-30,000895.SZ,0.92648
1,2024-06-30,002938.SZ,-12.536521
2,2024-06-30,300999.SZ,-2.590268
3,2024-06-30,601138.SH,-4.219269


### 有形资本回报率环比增长率(4.12)

(Tangible Assets Return QoQ)

$$
\frac{最近报告期有形资本回报率(TTM)-上一报告期有形资本回报率(TTM)}{abs(上一报告期有形资本回报率(TTM))}
$$

有形资本（Tangible Capital）通常是指企业在其运营中实际拥有的有形资产和资本。计算有形资本的方法可以简单地通过将有形资产减去有形负债来实现。具体计算公式如下：

有形资本 = 有形资产 - 有形负债

有形资产（Tangible Assets）包括公司拥有的物质资产，如土地、建筑物、设备、存货等。而有形负债（Tangible Liabilities）指的是与这些有形资产相关的负债，如与建筑物、设备等有形资产相关的负债。

有形资本回报率（Return on Tangible Capital）是一个衡量公司有效利用其有形资本（即实际拥有的有形资产和资本）来创造利润的指标。有形资本回报率可以帮助评估公司在资本投资方面的效率和盈利能力。

有形资本回报率的计算方法如下：

有形资本回报率 = 净利润 / 有形资本

其中：
- 净利润（Net Income）是公司在一定时间内的净利润，通常可以在财务报表中找到。
- 有形资本（Tangible Capital）是公司实际拥有的有形资产减去有形负债的值。

有形资本回报率的计算结果显示公司在每单位有形资本上创造的收益。较高的有形资本回报率表明公司更有效地利用其有形资本来实现利润。

(拓展)

**有形净资产回报率是衡量企业盈利能力的一个重要指标，它反映了企业在剔除无形资产后，能够用有形资产创造的利润。这个指标在财务分析中具有重要的参考价值，尤其在评估企业资产运营效率方面。**

有形净资产回报率（Tangible Net Asset Return on Investment，简称TNA ROI）是指企业净利润与有形净资产之间的比率。它体现了企业利用有形净资产创造利润的能力。计算公式为：

有形净资产回报率 = $\frac{净利润}{有形净资产}$

其中，有形净资产是指企业的总资产减去无形资产和其他递延费用后的净值。具体计算公式为：

有形净资产 = 总资产 - 无形资产 - 其他递延费用

In [32]:
def TAR_QoQ(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    # 由于要计算差分，开始的日期是第一季度末的日期
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=1, month=3)).strftime("%Y%m%d")

    Financial_DF = xtdata.get_financial_data(Stock_list, ["Balance", "Income"])
    Balance = pd.DataFrame()
    Income = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp_balance = Financial_DF[asset]['Balance'][
            [
                "m_timetag", 
                "m_anntime",
                "intang_assets",
                "tot_liab_shrhldr_eqy"]
        ]
        temp_income = Financial_DF[asset]["Income"][
            [
                "m_timetag", 
                "m_anntime",
                "net_profit_incl_min_int_inc"]
        ]
        temp_balance["ts_code"] = asset
        temp_income["ts_code"] = asset
        temp_balance = temp_balance[
            [
                "ts_code",
                "m_timetag", 
                "m_anntime",
                "intang_assets",
                "tot_liab_shrhldr_eqy"]
        ]
        temp_income = temp_income[
            [
                "ts_code",
                "m_timetag", 
                "m_anntime",
                "net_profit_incl_min_int_inc"]
        ]
        temp_balance = temp_balance[(temp_balance.m_timetag >= Begin_date) & (temp_balance.m_timetag <= trade_date)]
        temp_income = temp_income[(temp_income.m_timetag >= Begin_date) & (temp_income.m_timetag <= trade_date)]
        Balance = pd.concat([Balance, temp_balance], ignore_index=True)
        Income = pd.concat([Income, temp_income], ignore_index=True)

    keys = ["ts_code", "m_timetag"]
    All_data = pd.merge(Balance, Income, on=keys, how='left')
    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime_x"] = pd.to_datetime(All_data["m_anntime_x"]) # Balance's column
    All_data["m_anntime_y"] = pd.to_datetime(All_data["m_anntime_y"]) # Income's column
    # Balance和Income表中的披露时间m_anntime可能不一样
    # 取同一个财报中，在这两个表中最早的日期作为实际的财报披露时间
    All_data['m_anntime'] = All_data[['m_anntime_x', 'm_anntime_y']].min(axis=1)
    
    # In "Tangible_Asset", there is no null value
    All_data["Tangible_Asset"] = All_data["tot_liab_shrhldr_eqy"].fillna(0) - All_data["intang_assets"].fillna(0)
    All_data["TAR"] = All_data["net_profit_incl_min_int_inc"] / All_data["Tangible_Asset"]
    All_data["TAR"] = All_data["TAR"].replace([np.inf, -np.inf], np.nan)
    def Cal_Factor_TAR_TTM(group, factor):
        group[f"{factor}_TTM"] = group[factor].rolling(4).sum()
        return group
    # Therefore, in "TAR_TTM", the null values appear in top
    All_data = All_data.groupby("ts_code").apply(lambda x: Cal_Factor_TAR_TTM(x, 'TAR')).reset_index(drop=True)
    def Cal_Factor_TAR_QoQ(group, factor):
        DF = group[factor]
        Shift_DF = group[factor].shift(1)
        group[f"{factor}_QoQ"] = (DF - Shift_DF) / np.abs(Shift_DF)
        group[f"{factor}_QoQ"] = group[f"{factor}_QoQ"].replace([np.inf, -np.inf], np.nan)
        return group
    All_data = All_data.groupby("ts_code").apply(lambda x: Cal_Factor_TAR_QoQ(x, 'TAR_TTM')).reset_index(drop=True)
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "TAR_TTM_QoQ"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [33]:
TAR_QoQ_DF = TAR_QoQ(trade_date, common_param, increment=False)
TAR_QoQ_DF

Unnamed: 0,trade_date,ts_code,TAR_TTM_QoQ
0,2015-03-31,000001.SZ,0.001344
1,2015-06-30,000001.SZ,-0.008912
2,2015-09-30,000001.SZ,-0.021306
3,2015-12-31,000001.SZ,-0.014872
4,2016-03-31,000001.SZ,-0.007710
...,...,...,...
9962,2023-06-30,688981.SH,-0.140383
9963,2023-09-30,688981.SH,-0.241878
9964,2023-12-31,688981.SH,-0.393615
9965,2024-03-31,688981.SH,-0.092595


### ROE增长减净资产增长(4.7)

(ROE Growth Minus Net Asset Growth)

经规模调整后的盈利因子，更能反映企业真正的内生增长性。

$$
ROE环比增速-净资产同比增速
$$

$ ROE环比增速=\frac{本期ROE(TTM)-上季度ROE(TTM)}{abs(季度ROE(TTM))}$

$净资产同比增速=\frac{最近报告期的归属母公司股东权益-上年同期值}{上年同期值} $

In [34]:
def ROEG_Minus_NAG(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=1)).strftime("%Y%m%d")
    
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Balance", "Income"])
    Balance = pd.DataFrame()
    Income = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp_balance = Financial_DF[asset]['Balance'][
            [
                "m_timetag", 
                "m_anntime",
                "total_equity"]
        ]
        temp_income = Financial_DF[asset]["Income"][
            [
                "m_timetag", 
                "m_anntime",
                "net_profit_incl_min_int_inc"]
        ]
        temp_balance["ts_code"] = asset
        temp_income["ts_code"] = asset
        temp_balance = temp_balance[
            [
                "ts_code",
                "m_timetag", 
                "m_anntime",
                "total_equity"]
        ]
        temp_income = temp_income[
            [
                "ts_code",
                "m_timetag", 
                "m_anntime",
                "net_profit_incl_min_int_inc"]
        ]
        temp_balance = temp_balance[(temp_balance.m_timetag >= Begin_date) & (temp_balance.m_timetag <= trade_date)]
        temp_income = temp_income[(temp_income.m_timetag >= Begin_date) & (temp_income.m_timetag <= trade_date)]
        Balance = pd.concat([Balance, temp_balance], ignore_index=True)
        Income = pd.concat([Income, temp_income], ignore_index=True)

    keys = ["ts_code", "m_timetag"]
    All_data = pd.merge(Balance, Income, on=keys, how='left')
    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime_x"] = pd.to_datetime(All_data["m_anntime_x"]) # Balance's column
    All_data["m_anntime_y"] = pd.to_datetime(All_data["m_anntime_y"]) # Income's column
    # Balance和Income表中的披露时间m_anntime可能不一样
    # 取同一个财报中，在这两个表中最早的日期作为实际的财报披露时间
    All_data['m_anntime'] = All_data[['m_anntime_x', 'm_anntime_y']].min(axis=1)
    
    All_data.fillna(0, inplace=True)
    All_data["ROE"] = All_data["net_profit_incl_min_int_inc"] / All_data["total_equity"]
    All_data["ROE"] = All_data["ROE"].replace([np.inf, -np.inf], np.nan)
    
    def Cal_Factor_ROE_TTM(group, factor):
        group[f"{factor}_TTM"] = group[factor].rolling(4).sum()
        return group
    # Therefore, in "ROE_TTM", the null values appear in top
    All_data = All_data.groupby("ts_code").apply(lambda x: Cal_Factor_ROE_TTM(x, 'ROE')).reset_index(drop=True)
    
    def Cal_Factor_ROE_QoQ(group, factor):
        DF = group[factor]
        Shift_DF = group[factor].shift(1)
        group[f"{factor}_QoQ"] = (DF - Shift_DF) / np.abs(Shift_DF)
        group[f"{factor}_QoQ"] = group[f"{factor}_QoQ"].replace([np.inf, -np.inf], np.nan)
        return group
    All_data = All_data.groupby("ts_code").apply(lambda x: Cal_Factor_ROE_QoQ(x, 'ROE_TTM')).reset_index(drop=True)
    
    def Cal_Factor_YoY(Group, Factor):
        """
        计算同比增长。
        YoY = (本期值 - 上期值) / abs(上期值)
        如果上一期值和本期值为0，则增长率为0；
        如果上一期值为0，本期不为0，则增长率为NaN；
        其他情况正常计算。
        """
        DF = Group[Factor].fillna(0)
        DF_Shift = Group[Factor].fillna(0).shift(1)
        DIFF = DF - DF_Shift
        Group[f"{Factor}_YoY"] = np.where(DIFF == 0, 0, DIFF / abs(DF_Shift))
        Group[f"{Factor}_YoY"] = Group[f"{Factor}_YoY"].replace([np.inf, -np.inf], np.nan)
        return Group

    All_data = All_data.groupby(["ts_code", "Month"]).apply(lambda x: Cal_Factor_YoY(x, "net_profit_incl_min_int_inc")).reset_index(drop=True)
    All_data = All_data.sort_values(by=['ts_code', 'm_timetag'])
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data["ROEG_Minus_NAG"] = All_data["ROE_TTM_QoQ"] - All_data["net_profit_incl_min_int_inc_YoY"]
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "ROEG_Minus_NAG"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [35]:
RMN = ROEG_Minus_NAG(trade_date, common_param, increment=False)
RMN

Unnamed: 0,trade_date,ts_code,ROEG_Minus_NAG
0,2015-03-31,000001.SZ,-0.118528
1,2015-06-30,000001.SZ,-0.166484
2,2015-09-30,000001.SZ,-0.158195
3,2015-12-31,000001.SZ,-0.145619
4,2016-03-31,000001.SZ,-0.105023
...,...,...,...
9845,2023-06-30,688981.SH,0.384813
9846,2023-09-30,688981.SH,0.371883
9847,2023-12-31,688981.SH,0.216088
9848,2024-03-31,688981.SH,0.589508


### 股权集中度(流通)(4.4)

(Concentration of circulating shares)

衡量公司的股权分布状态的主要指标，也是衡量公司稳定性强弱的重要指标，同时也是衡量公司结构的重要指标。

注：持股数据基本上每个月更新一次

$$
\frac{前三大流通股东持股数}{流通股股本}
$$

> `All_data.drop_duplicates(subset=["ts_code", "Year", "Month"])` 和 `All_data.drop_duplicates(subset=["ts_code", "endDate"])`结果不一样，什么原因？

In [36]:
def CCS(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Top10holder"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp_Top10holder = Financial_DF[asset]['Top10holder'][
            [
                "endDate", 
                "declareDate",
                "ratio",
                "rank"]
        ]
        temp_Top10holder["ts_code"] = asset
        temp_Top10holder = temp_Top10holder[
            [
                "ts_code",
                "endDate", 
                "declareDate",
                "ratio",
                "rank"]
        ]
        if increment:
            temp_Top10holder = temp_Top10holder[temp_Top10holder.declareDate == trade_date]
        else: temp_Top10holder = temp_Top10holder[(temp_Top10holder.endDate >= start_date) & (temp_Top10holder.endDate <= trade_date)]
        All_data = pd.concat([All_data, temp_Top10holder], ignore_index=True)

    if All_data.empty: return None
    All_data["Year"] = All_data["endDate"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["endDate"].apply(lambda x: x[4:6]).astype(int)
    All_data["endDate"] = pd.to_datetime(All_data["endDate"])
    All_data["declareDate"] = pd.to_datetime(All_data["declareDate"])
    # 由于QMT中数据的重复，这里去除掉重复数据
    All_data = All_data.drop_duplicates(subset=["ts_code", "endDate", "rank"])
    
    # Group by the combination of ts_code, endDate, declareDate, Year, and Month
    grouped = All_data.groupby(['ts_code', 'endDate', "declareDate", "Year", "Month"])

    # Get top 3 ratios for each group
    top_3_df = grouped.apply(lambda x: x.nlargest(3, 'ratio')['ratio'].sum()).reset_index(name="CCS")
    top_3_df["CCS"] = top_3_df["CCS"] / 3
    top_3_df.dropna(inplace=True)
    if top_3_df.empty: return None
    top_3_df = top_3_df[["endDate", "ts_code", "CCS"]]
    top_3_df = top_3_df.rename(columns={"endDate": "trade_date"})
    top_3_df.reset_index(drop=True, inplace=True)
    return top_3_df

In [37]:
CCS_DF = CCS(trade_date, common_param, increment=False)
CCS_DF

Unnamed: 0,trade_date,ts_code,CCS
0,2015-03-31,000001.SZ,19.660000
1,2015-04-30,000001.SZ,19.660000
2,2015-05-12,000001.SZ,19.313333
3,2015-06-30,000001.SZ,19.313333
4,2015-09-30,000001.SZ,19.553333
...,...,...,...
12929,2023-06-30,688981.SH,25.053333
12930,2023-09-30,688981.SH,25.000000
12931,2023-12-31,688981.SH,25.003333
12932,2024-03-31,688981.SH,24.996667


### 总资产市值比(4.2)

(Total Asset to Market Ratio)

常用的估值因子，用于衡量股票价格是否被高估或低估，与账面市值比类似。

$$
\frac{最近报告期总资产}{总市值}
$$

In [38]:
def Total_Asset_to_Market(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Balance"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Balance"][["m_timetag", "m_anntime", "tot_liab_shrhldr_eqy"]]
        temp["ts_code"] = asset
        temp = temp[["ts_code", "m_timetag", "m_anntime", "tot_liab_shrhldr_eqy"]]
        if increment:
            temp = temp[temp.m_anntime == trade_date]
        else: temp = temp[(temp.m_timetag >= start_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    if All_data.empty: return None
    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    Exist_Stock_list = list(All_data["ts_code"].unique())
    
    # 获取日期范围
    date_unique = All_data['m_timetag'].drop_duplicates()
    date_sorted = date_unique.sort_values()
    latest_date = date_sorted.max().strftime("%Y%m%d")

    Market_Value_DF = MV_DF(Exist_Stock_list, start_date, latest_date)
    All_data = pd.merge(All_data, 
                        Market_Value_DF, 
                        how='left', 
                        left_on=['ts_code', 'm_timetag'], 
                        right_on=['ts_code', 'trade_date'])
    All_data["Total_Asset_to_Market"] = All_data["tot_liab_shrhldr_eqy"] / All_data["total_mv"]
    All_data["Total_Asset_to_Market"] = All_data["Total_Asset_to_Market"].replace([np.inf, -np.inf], np.nan)
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["trade_date", "ts_code", "Total_Asset_to_Market"]]
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [39]:
TAM = Total_Asset_to_Market(trade_date, common_param, increment=False)
TAM

Unnamed: 0,trade_date,ts_code,Total_Asset_to_Market
0,2015-03-31,000001.SZ,12.808433
1,2015-06-30,000001.SZ,12.355360
2,2015-09-30,000001.SZ,17.315754
3,2015-12-31,000001.SZ,14.613747
4,2016-03-31,000001.SZ,17.610907
...,...,...,...
9975,2023-06-30,688981.SH,0.826806
9976,2023-09-30,688981.SH,0.825979
9977,2023-12-31,688981.SH,0.803329
9978,2024-03-31,688981.SH,0.984776


### 研发费用收入比(3.29)

(Research Expenses Ratio)

研发费用对营业收入占比较高的公司可以获得较高的超额收益。

$$
\frac{最近12个月研发费用(TTM)}{最近12个月营业收入(TTM)}\ (如果研发费用较为稀疏，可用管理费用代替)
$$

In [40]:
def Research_Expenses_Ratio(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=1)).strftime("%Y%m%d")

    Financial_DF = xtdata.get_financial_data(Stock_list, ["Income"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Income"][
            [
                "m_timetag", 
                "m_anntime", 
                "research_expenses",
                "revenue_inc"]
        ]
        temp["ts_code"] = asset
        temp = temp[
            [
                "ts_code", 
                "m_timetag", 
                "m_anntime", 
                "research_expenses",
                "revenue_inc"]
        ]
        temp = temp[(temp.m_timetag >= Begin_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    All_data.fillna(0, inplace=True)
    
    def Cal_Factor_TTM(group, factor):
        group[f"{factor}_TTM"] = group[factor].rolling(4).sum()
        return group
    # Therefore, in "TAR_TTM", the null values appear in top
    All_data = All_data.groupby("ts_code").apply(lambda x: Cal_Factor_TTM(x, 'research_expenses')).reset_index(drop=True)
    All_data = All_data.groupby("ts_code").apply(lambda x: Cal_Factor_TTM(x, 'revenue_inc')).reset_index(drop=True)
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    if All_data.empty: return None
    
    All_data["RER"] = All_data["research_expenses_TTM"] / All_data["revenue_inc_TTM"]
    All_data["RER"] = All_data["RER"].replace([np.inf, -np.inf], np.nan)
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "RER"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [41]:
RER = Research_Expenses_Ratio(trade_date, common_param, increment=False)
RER

Unnamed: 0,trade_date,ts_code,RER
0,2015-03-31,000001.SZ,0.000000
1,2015-06-30,000001.SZ,0.000000
2,2015-09-30,000001.SZ,0.000000
3,2015-12-31,000001.SZ,0.000000
4,2016-03-31,000001.SZ,0.000000
...,...,...,...
10312,2023-06-30,688981.SH,0.101463
10313,2023-09-30,688981.SH,0.106449
10314,2023-12-31,688981.SH,0.110932
10315,2024-03-31,688981.SH,0.110192


### 单季度销售成本率同比增长率(3.24)

(YoY of Sales-cost Ratio)

$$
\frac{最近报告期单季度销售成本率-上年同期单季度销售成本率}{上年同期单季度销售成本率}
$$

$$ 销售成本率＝销售成本÷销售收入净额×100% $$

> 要对原始数据进行差分运算

In [42]:
def Sales_Cost_Ratio_YoY(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    # 由于要计算差分，开始的日期是第一季度末的日期
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=1, month=3)).strftime("%Y%m%d")
    
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Income"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Income"][
            [
                "m_timetag", 
                "m_anntime",
                "sale_expense",
                "revenue"]
        ]
        temp["ts_code"] = asset
        temp = temp[
            [
                "ts_code", 
                "m_timetag", 
                "m_anntime", 
                "sale_expense",
                "revenue"]
        ]
        temp = temp[(temp.m_timetag >= Begin_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    All_data.fillna(0, inplace=True)

    def cal_group_dif(Group, Factor):
        def cal_diff(group, factor):
            res = group[factor].diff()
            res = res.fillna(group[factor]) # Fill NaN values with original values
            return res
        Group[f"{Factor}_diff"] = (
            Group.groupby("Year")
            .apply(lambda x: cal_diff(x, Factor))
            .reset_index(level=0, drop=True)
        ).values.reshape(-1)
        return Group
    # 单季度销售成本
    All_data = (
        All_data.groupby("ts_code")
        .apply(lambda x: cal_group_dif(x, "sale_expense"))
        .reset_index(drop=True)
    )
    # 单季度销售收入净额
    All_data = (
        All_data.groupby("ts_code")
        .apply(lambda x: cal_group_dif(x, "revenue"))
        .reset_index(drop=True)
    )
    All_data["Sales_Cost_Ratio"] = All_data["sale_expense_diff"] / All_data["revenue_diff"]
    All_data["Sales_Cost_Ratio"] = All_data["Sales_Cost_Ratio"].replace([np.inf, -np.inf], np.nan)
    def Cal_Factor_YoY(Group, Factor):
        """
        计算同比增长。
        YoY = (本期值 - 上期值) / abs(上期值)
        如果上一期值和本期值为0，则增长率为0；
        如果上一期值为0，本期不为0，则增长率为NaN；
        其他情况正常计算。
        """
        DF = Group[Factor].fillna(0)
        DF_Shift = Group[Factor].fillna(0).shift(1)
        DIFF = DF - DF_Shift
        Group[f"{Factor}_YoY"] = np.where(DIFF == 0, 0, DIFF / abs(DF_Shift))
        Group[f"{Factor}_YoY"] = Group[f"{Factor}_YoY"].replace([np.inf, -np.inf], np.nan)
        return Group

    All_data = All_data.groupby(["ts_code", "Month"]).apply(lambda x: Cal_Factor_YoY(x, "Sales_Cost_Ratio")).reset_index(drop=True)
    All_data = All_data.sort_values(by=['ts_code', 'm_timetag'])
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "Sales_Cost_Ratio_YoY"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [43]:
SCR = Sales_Cost_Ratio_YoY(trade_date, common_param, increment=False)
SCR

Unnamed: 0,trade_date,ts_code,Sales_Cost_Ratio_YoY
0,2015-03-31,000001.SZ,0.000000
1,2015-06-30,000001.SZ,0.000000
2,2015-09-30,000001.SZ,0.000000
3,2015-12-31,000001.SZ,0.000000
4,2016-03-31,000001.SZ,0.000000
...,...,...,...
10209,2023-06-30,688981.SH,0.189080
10210,2023-09-30,688981.SH,0.516813
10211,2023-12-31,688981.SH,0.157840
10212,2024-03-31,688981.SH,-0.061642


### 速动比率(3.17)

(Quick Ratio)

反映企业的短期偿债能力。

$$
\frac{最近报告期速动资产}{最近同期流动负债合计}
$$

其中：$速动资产=流动资产-存货-(应付账款+待摊费用)$

In [44]:
def Quick_Ratio(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Balance"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Balance"][
            [
                "m_timetag", 
                "m_anntime",
                "total_current_assets",
                "inventories",
                "accounts_payable",
                "apportioned_cost",
                "total_current_liability"]
        ]
        temp["ts_code"] = asset
        temp = temp[
            [
                "ts_code", 
                "m_timetag", 
                "m_anntime", 
                "total_current_assets",
                "inventories",
                "accounts_payable",
                "apportioned_cost",
                "total_current_liability"]
        ]
        if increment:
            temp = temp[temp.m_anntime == trade_date]
        else: temp = temp[(temp.m_timetag >= start_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    if All_data.empty: return None
    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    All_data.fillna(0, inplace=True)
    
    All_data["Quick_Asset"] = (
        All_data["total_current_assets"]
        - All_data["inventories"]
        - All_data["accounts_payable"]
        - All_data["apportioned_cost"]
    )
    
    All_data["Quick_Ratio"] = All_data["Quick_Asset"] / All_data["total_current_liability"]
    All_data["Quick_Ratio"] = All_data["Quick_Ratio"].replace([np.inf, -np.inf], np.nan)
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "Quick_Ratio"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [45]:
QR = Quick_Ratio(trade_date, common_param, increment=False)
QR

Unnamed: 0,trade_date,ts_code,Quick_Ratio
0,2015-03-31,000002.SZ,0.212581
1,2015-06-30,000002.SZ,0.245267
2,2015-09-30,000002.SZ,0.227356
3,2015-12-31,000002.SZ,0.207132
4,2016-03-31,000002.SZ,0.184785
...,...,...,...
8721,2023-06-30,688981.SH,1.664410
8722,2023-09-30,688981.SH,1.396603
8723,2023-12-31,688981.SH,1.373340
8724,2024-03-31,688981.SH,1.200630


### 资本支出增长率-n年(3.14)

(Capital Expenditure Growth Rate)

资本性支出的增长与股票未来收益呈反比，而且公司增加资本性支出的行为还会影响传统基本面特征(如市值、账面市值比)与股票收益的关系，降低风险暴露。

$$
\frac{最近12个月的资本性支出(TTM)-前置n年的同期值}{前置n年的同期值}
$$

$ 资本性支出=购建固定资产、无形资产和其他长期资产支付的现金-处置固定资产、无形资产和其他长期资产收回的现金 $

$ n=2年(2*12个月)、3年(3*12个月) $

In [46]:
def CEGR(trade_date, common_param, increment=True, start_date="20150101"):
    """
    Capital Expenditure Growth Rate
    """
    Stock_list = common_param["Stock_list"]
    n = common_param["CEGR_n"]

    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=n+1)).strftime("%Y%m%d")

    Financial_DF = xtdata.get_financial_data(Stock_list, ["CashFlow"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp_cashflow = Financial_DF[asset]["CashFlow"][
            [
                "m_timetag",
                "m_anntime",
                "cash_pay_acq_const_fiolta",
                "net_cash_recp_disp_fiolta",
            ]
        ]
        temp_cashflow["ts_code"] = asset
        temp_cashflow = temp_cashflow[
            [
                "ts_code",
                "m_timetag",
                "m_anntime",
                "cash_pay_acq_const_fiolta",
                "net_cash_recp_disp_fiolta",
            ]
        ]
        temp_cashflow = temp_cashflow[(temp_cashflow.m_timetag >= Begin_date) & (temp_cashflow.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp_cashflow], ignore_index=True)

    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    All_data.fillna(0, inplace=True)
    All_data["Capital_Expenditure"] = All_data["cash_pay_acq_const_fiolta"] - All_data["net_cash_recp_disp_fiolta"]
    
    def cal_factor_ttm(data, factor):
        """
        计算factor_ttm.
        factor_diff ：单季度factor数值
        factor_ttm ：滚动四个季度factor之和
        """
        def cal_group_dif(group, factor):
            res = group[factor].diff()
            res = res.fillna(group[factor]) # Fill NaN values with original values
            return res

        data[f"{factor}_diff"] = (
            data.groupby("Year")
            .apply(lambda x: cal_group_dif(x, factor))
            .reset_index(level=0, drop=True)
        ).values.reshape(-1)
        data[f"{factor}_TTM"] = data[f"{factor}_diff"].rolling(4).sum()
        return data

    All_data = All_data.groupby('ts_code').apply(
        lambda x: cal_factor_ttm(x, 'Capital_Expenditure')
    ).reset_index(drop=True)
    
    def Cal_Factor_YoY(Group, Factor, n):
        """
        计算同比增长。
        YoY = (本期值 - 上期值) / abs(上期值)
        如果上一期值和本期值为0，则增长率为0；
        如果上一期值为0，本期不为0，则增长率为NaN；
        其他情况正常计算。
        """
        DF = Group[Factor]
        DF_Shift = Group[Factor].shift(n)
        DIFF = DF - DF_Shift
        Group["CEGR"] = np.where(DIFF == 0, 0, DIFF / abs(DF_Shift))
        Group["CEGR"] = Group["CEGR"].replace([np.inf, -np.inf], np.nan)
        return Group
    All_data = All_data.groupby(["ts_code", "Month"]).apply(lambda x: Cal_Factor_YoY(x, "Capital_Expenditure_TTM", n)).reset_index(drop=True)
    All_data = All_data.sort_values(by=['ts_code', 'm_timetag'])
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "CEGR"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [47]:
CEGR_DF = CEGR(trade_date, common_param, increment=False)
CEGR_DF

Unnamed: 0,trade_date,ts_code,CEGR
0,2015-03-31,000001.SZ,1.926712
1,2015-06-30,000001.SZ,2.063278
2,2015-09-30,000001.SZ,1.941323
3,2015-12-31,000001.SZ,1.052265
4,2016-03-31,000001.SZ,-0.132576
...,...,...,...
9620,2023-06-30,688981.SH,0.420594
9621,2023-09-30,688981.SH,0.772116
9622,2023-12-31,688981.SH,1.007105
9623,2024-03-31,688981.SH,1.087922


### 流通股市值(3.13)

(Tradable Market Value)

流通股是指上市公司股份中可以在交易所流通的股份数量，通过统计流通股市值可以直观的体现公司目前流通规模的大小。

$$
当日收盘价 \times 当日自由流通股本
$$

In [48]:
# def Tradable_Market_Value(Date, common_param):
#     """
#     流通股市值
#     方法1：(总市值 / 总股本) * 流通股本
#     方法2：收盘价 * 流通股本
    
#     由于总市值已经得到，因此使用方法1计算
#     旧方法计算，只返回当天流通股总数有变化的股票的流通市值数据。
#     """
#     Stock_list = common_param["Stock_list"]
#     Financial_DF = xtdata.get_financial_data(Stock_list, ["Capital"])
#     # 首先获取公司流通股本变化的日期
#     All_data = pd.DataFrame()
#     for asset in Financial_DF.keys():
#         temp = Financial_DF[asset]["Capital"][
#             [
#                 "m_timetag", 
#                 "m_anntime", 
#                 "total_capital", 
#                 "circulating_capital"]
#         ]
#         temp["ts_code"] = asset
#         temp = temp[[
#             "ts_code", 
#             "m_timetag", 
#             "m_anntime", 
#             "total_capital", 
#             "circulating_capital"]
#         ]
#         temp = temp[temp.m_anntime == Date]
#         All_data = pd.concat([All_data, temp], ignore_index=True)

#     if All_data.empty: return None
#     All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
#     All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
#     All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
#     All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
#     Exist_Stock_list = list(All_data["ts_code"].unique())
#     # print(Exist_Stock_list)
    
#     # 获取日期范围
#     date_unique = All_data['m_timetag'].drop_duplicates()
#     date_sorted = date_unique.sort_values()
#     earliest_date = "20140101"
#     latest_date = date_sorted.max().strftime("%Y%m%d")
#     # print(earliest_date, latest_date)

#     Market_Value_DF = MV_DF(Exist_Stock_list, earliest_date, latest_date)
#     All_data = pd.merge(All_data, 
#                         Market_Value_DF, 
#                         how='left', 
#                         left_on=['ts_code', 'm_timetag'], 
#                         right_on=['ts_code', 'trade_date'])
#     All_data["Tradable_Market_Value"] = (All_data["total_mv"] / All_data["total_capital"]) * All_data["circulating_capital"]
#     All_data["Tradable_Market_Value"] = All_data["Tradable_Market_Value"].replace([np.inf, -np.inf], np.nan)
#     All_data = All_data[["m_timetag", "ts_code", "Tradable_Market_Value"]]
#     All_data = All_data.rename(columns={"m_timetag": "trade_date"})
#     All_data.reset_index(drop=True, inplace=True)
#     return All_data

def Tradable_Market_Value(trade_date, common_param, increment=True, start_date="20150101"):
    """
    流通股市值
    方法1：(总市值 / 总股本) * 流通股本
    方法2：收盘价 * 流通股本
    
    使用方法2计算。
    晨乐数据库中的数据开始时间为2014.01.02.
    新的计算方法，确保每天返回所有股票的流通市值数据。
    (因为即使流通股本数没有变化，但是每天的股票价格在变化。)
    """
    Stock_list = common_param["Stock_list"]
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Capital"])
    # 首先获取公司流通股本变化的日期和对应流通股本的值
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Capital"][
            [
                "m_timetag", 
                "m_anntime",
                "circulating_capital"]
        ]
        temp["ts_code"] = asset
        temp = temp[[
            "m_timetag", 
            "ts_code", 
            "m_anntime", 
            "circulating_capital"]
        ]
        temp = temp[temp.m_timetag <= trade_date]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    if All_data.empty: return None
    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    Changable_dates = All_data.set_index("trade_date").index.unique()
    
    # 获取每日收盘价
    MV = get_price(
        ts_code_list=Stock_list,
        feature_list=["close"], # 收盘价
        start_date=start_date,
        trade_date=trade_date,
        target_type="stock",
    )
    MV = MV.reset_index()
    MV_pivot = MV.pivot(index="trade_date", columns="ts_code", values="close")
    
    # 将交易日日期与股本变化日期合并，并排序
    Combined_date = pd.Index(sorted(set(MV_pivot.index) | set(Changable_dates)))
    
    # 创建一个新的Dataframe，包含交易日和股本变化日期，所有的股票代码
    all_combinations = pd.MultiIndex.from_product(
        [Combined_date, Stock_list], 
        names=["trade_date", "ts_code"]
    )
    all_combinations_df = pd.DataFrame(index=all_combinations).reset_index()

    # 将新的Dataframe与价格Dataframe合并，采用左连接的方法
    price_whole = pd.merge(
        all_combinations_df, MV, 
        on=["trade_date", "ts_code"], 
        how="left"
    )
    capital_whole = pd.merge(
        all_combinations_df, All_data, 
        on=["trade_date", "ts_code"], 
        how="left"
    )
    
    # 为了方便缺失的数据显示出来，这里使用pivot，并采用前值填充的方法
    price_DF = price_whole.pivot(index="trade_date", columns="ts_code", values="close")
    capital_DF = capital_whole.pivot(index="trade_date", columns="ts_code", values="circulating_capital")
    price_DF = price_DF.ffill()
    capital_DF = capital_DF.ffill()
    
    # 再将pivot转换成常用的形式，方便后续处理
    price_DF = price_DF.reset_index().melt(
        id_vars="trade_date", 
        var_name="ts_code", 
        value_name="close"
    )
    capital_DF = capital_DF.reset_index().melt(
        id_vars="trade_date", 
        var_name="ts_code", 
        value_name="circulating_capital"
    )
    price_DF = price_DF.sort_values(["trade_date", "ts_code"]).reset_index(drop=True)
    capital_DF = capital_DF.sort_values(["trade_date", "ts_code"]).reset_index(drop=True)
    combined_DF = pd.merge(price_DF, capital_DF, on=["trade_date", "ts_code"], how="left")
    if increment:
        combined_DF = combined_DF[combined_DF.trade_date == trade_date]
    else: combined_DF = combined_DF[(combined_DF.trade_date >= start_date) &(combined_DF.trade_date <= trade_date)]
    combined_DF.dropna(inplace=True)
    if combined_DF.empty: return None
    combined_DF["Tradable_Market_Value"] = combined_DF["close"] * combined_DF["circulating_capital"]
    combined_DF = combined_DF[[	"trade_date", "ts_code", "Tradable_Market_Value"]]
    combined_DF.reset_index(drop=True, inplace=True)
    return combined_DF

In [49]:
TMV = Tradable_Market_Value(trade_date, common_param, increment=False)
TMV

Unnamed: 0,trade_date,ts_code,Tradable_Market_Value
0,2015-01-05,000001.SZ,1.575841e+11
1,2015-01-05,000002.SZ,1.447266e+11
2,2015-01-05,000063.SZ,5.325053e+10
3,2015-01-05,000100.SZ,3.278399e+10
4,2015-01-05,000157.SZ,4.464089e+10
...,...,...,...
668394,2024-08-14,688303.SH,1.073939e+10
668395,2024-08-14,688363.SH,1.225138e+10
668396,2024-08-14,688396.SH,4.779220e+10
668397,2024-08-14,688599.SH,3.691845e+10


### 净经营资产变动(3.8)

(Change in Net Operating Assets)

净经营资产增长过快的企业未来盈利能力会有所下降，净经营资产变动与股票未来收益负相关。

$$
\frac{最近报告期的净经营资产-上年同期值}{最近报告期的总资产}
$$

$净经营资产=股东权益合计(含少数股东权益)+金融负债-金融资产=经营性资产-经营性负债$

金融负债："shortterm_loan", "tradable_fin_liab", "derivative_fin_liab", "notes_payable","accounts_payable", "long_term_loans", "bonds_payable"

金融资产："cash_equivalents", "loans_to_oth_banks", "tradable_fin_assets", "derivative_fin_assets", "bill_receivable", "account_receivable",  "other_receivable", "loans_and_adv_granted", "long_term_eqy_invest", "red_monetary_cap_for_sale", "agency_bus_assets"

[新金融工具｜终于有人将“金融负债的分类”讲明白了！](https://baijiahao.baidu.com/s?id=1771557694392273497)

In [50]:
def Change_in_NOA(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]

    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=1)).strftime("%Y%m%d")

    Financial_DF = xtdata.get_financial_data(Stock_list, ["Balance"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Balance"][
            [
                "m_timetag", 
                "m_anntime",
                "total_equity",
                "tot_liab_shrhldr_eqy",
                "shortterm_loan",
                "tradable_fin_liab",
                "derivative_fin_liab",
                "notes_payable",
                "accounts_payable",
                "long_term_loans",
                "bonds_payable",
                
                "cash_equivalents",
                "loans_to_oth_banks",
                "tradable_fin_assets",
                "derivative_fin_assets",
                "bill_receivable",
                "account_receivable", 
                "other_receivable",
                "loans_and_adv_granted",
                "long_term_eqy_invest",
                "red_monetary_cap_for_sale",
                "agency_bus_assets"]
        ]
        temp["ts_code"] = asset
        temp = temp[
            [
                "ts_code", 
                "m_timetag", 
                "m_anntime",
                "total_equity",
                "tot_liab_shrhldr_eqy",
                "shortterm_loan",
                "tradable_fin_liab",
                "derivative_fin_liab",
                "notes_payable",
                "accounts_payable",
                "long_term_loans",
                "bonds_payable",
                
                "cash_equivalents",
                "loans_to_oth_banks",
                "tradable_fin_assets",
                "derivative_fin_assets",
                "bill_receivable",
                "account_receivable", 
                "other_receivable",
                "loans_and_adv_granted",
                "long_term_eqy_invest",
                "red_monetary_cap_for_sale",
                "agency_bus_assets"]
        ]
        temp = temp[(temp.m_timetag >= Begin_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    
    All_data.fillna(0, inplace=True)
    
    All_data["NOA"] = (
        All_data["total_equity"]   
        + All_data["shortterm_loan"]
        + All_data["tradable_fin_liab"]
        + All_data["derivative_fin_liab"]
        + All_data["notes_payable"]
        + All_data["accounts_payable"]
        + All_data["long_term_loans"]
        + All_data["bonds_payable"]
        - All_data["cash_equivalents"]
        - All_data["loans_to_oth_banks"]
        - All_data["tradable_fin_assets"]
        - All_data["derivative_fin_assets"]
        - All_data["bill_receivable"]
        - All_data["account_receivable"]
        - All_data["other_receivable"]
        - All_data["red_monetary_cap_for_sale"]
        - All_data["loans_and_adv_granted"]
        - All_data["long_term_eqy_invest"]
        - All_data["red_monetary_cap_for_sale"]
        - All_data["agency_bus_assets"]
    )
    
    def Cal_Factor_Diff(Group, Factor):
        """
        计算同比差值。
        Diff = 本期值 - 上期值
        如果运算中出现了null值，则结果为null值
        """
        DF = Group[Factor].fillna(0)
        DF_Shift = Group[Factor].fillna(0).shift(1)
        Group[f"{Factor}_Diff"] = DF - DF_Shift
        return Group

    All_data = All_data.groupby(["ts_code", "Month"]).apply(lambda x: Cal_Factor_Diff(x, "NOA")).reset_index(drop=True)
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    if All_data.empty: return None
    All_data = All_data.sort_values(by=['ts_code', 'm_timetag'])
    All_data["CNOA"] = All_data["NOA_Diff"] / All_data["tot_liab_shrhldr_eqy"]
    All_data["CNOA"] = All_data["CNOA"].replace([np.inf, -np.inf], np.nan)
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "CNOA"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [51]:
CNOA = Change_in_NOA(trade_date, common_param, increment=False)
CNOA

Unnamed: 0,trade_date,ts_code,CNOA
0,2015-03-31,000001.SZ,0.067911
1,2015-06-30,000001.SZ,-0.089928
2,2015-09-30,000001.SZ,-0.079956
3,2015-12-31,000001.SZ,0.052970
4,2016-03-31,000001.SZ,0.126089
...,...,...,...
9989,2023-06-30,688981.SH,0.239484
9990,2023-09-30,688981.SH,0.197316
9991,2023-12-31,688981.SH,0.163332
9992,2024-03-31,688981.SH,0.158291


### 规模调整后的ROE(3.2)

(Adjusted ROE)

经规模调整后的盈利因子，更能反映企业真正的内生增长性。

连续8个季度的ROE(TTM)序列(因变量)对相同区间的调整后的总资产序列(自变量)进行 OLS 回归，取最新一期残差作为因子值，其中仅对总资产的年报进行调整，一季报、半年报、三季报沿用上年年报数据。

[规模调整的盈利因子-从盈余公积谈起](https://www.doc88.com/p-73847754586096.html)

In [52]:
def OSL_Regression_AROE(ROETTM, TotalAsset):
    if ROETTM.isnull().any() or TotalAsset.isnull().any():
        return None
    # Combine the input series into a DataFrame
    df = pd.DataFrame(
        {
            "ROETTM": ROETTM,
            "TotalAsset": TotalAsset,
        }
    )
    # Z-Score standardization
    df_standardized = ((df - df.mean()) / df.std()).fillna(0)

    # Prepare the data for regression
    X = sm.add_constant(df_standardized[["TotalAsset"]])
    y = df_standardized["ROETTM"]

    # Perform OLS regression
    model = sm.OLS(y, X).fit()

    # Get the coefficients
    a = model.params["const"]
    beta = model.params["TotalAsset"]

    # Calculate LPNP factor value
    latest_data = df_standardized.iloc[-1]  # Get the latest quarter data
    AR = (
        latest_data["ROETTM"]
        - a
        - beta * latest_data["TotalAsset"]
    )
    return AR

def Adjusted_ROE(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    N = common_param["AROE_N"]
    """
    Capacity Utilization Increase
    """

    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=2, months=3*N, month=12)).strftime("%Y%m%d")

    Financial_DF = xtdata.get_financial_data(Stock_list, ["Balance", "PershareIndex"])
    Balance = pd.DataFrame()
    PershareIndex = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp_pershareIndex = Financial_DF[asset]["PershareIndex"][
            [
                "m_timetag", 
                "m_anntime",
                "du_return_on_equity"]
        ]
        temp_balance = Financial_DF[asset]['Balance'][
            [
                "m_timetag", 
                "m_anntime",
                'tot_liab_shrhldr_eqy']
        ]
        temp_pershareIndex["ts_code"] = asset
        temp_balance["ts_code"] = asset
        temp_pershareIndex = temp_pershareIndex[
            [
                "ts_code",
                "m_timetag",
                "m_anntime",
                "du_return_on_equity"]
        ]
        temp_balance = temp_balance[
            [
                "ts_code",
                "m_timetag", 
                "m_anntime",
                'tot_liab_shrhldr_eqy']
        ]
        temp_pershareIndex = temp_pershareIndex[(temp_pershareIndex.m_timetag >= Begin_date) & (temp_pershareIndex.m_timetag <= trade_date)]
        temp_balance = temp_balance[(temp_balance.m_timetag >= Begin_date) & (temp_balance.m_timetag <= trade_date)]
        PershareIndex = pd.concat([PershareIndex, temp_pershareIndex], ignore_index=True)
        Balance = pd.concat([Balance, temp_balance], ignore_index=True)
    
    # PershareIndex中有大量重复行
    PershareIndex = PershareIndex.drop_duplicates(subset=["ts_code", "m_timetag"])

    keys = ["ts_code", "m_timetag"]
    All_data = pd.merge(Balance, PershareIndex, on=keys, how='left')
    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime_x"] = pd.to_datetime(All_data["m_anntime_x"]) # Balance's column
    All_data["m_anntime_y"] = pd.to_datetime(All_data["m_anntime_y"]) # PershareIndex's column
    # 由于Balance和PershareIndex表中的披露时间m_anntime不一样
    # 取同一个财报中，在这两个表中最早的日期作为实际的财报披露时间
    All_data['m_anntime'] = All_data[['m_anntime_x', 'm_anntime_y']].min(axis=1)
    All_data.fillna(0, inplace=True)

    def cal_factor_ttm(data, factor, N):
        """
        计算factor_ttm.
        factor_diff ：单季度factor数值
        factor_ttm ：滚动四个季度factor之和
        """

        def cal_group_dif(group, factor):
            res = group[factor].diff()
            res = res.fillna(group[factor]) # Fill NaN values with original values
            return res

        data[f"{factor}_diff"] = (
            data.groupby("Year")
            .apply(lambda x: cal_group_dif(x, factor))
            .reset_index(level=0, drop=True)
        ).values.reshape(-1)
        data[f"{factor}_TTM"] = data[f"{factor}_diff"].rolling(N).sum()
        return data

    All_data = All_data.groupby("ts_code").apply(lambda x: cal_factor_ttm(x, "du_return_on_equity", N)).reset_index(drop=True)
    All_data.loc[All_data["Month"] != 12, "tot_liab_shrhldr_eqy"] = None
    All_data["tot_liab_shrhldr_eqy"] = All_data.groupby("ts_code")["tot_liab_shrhldr_eqy"].ffill()
    def cal_factor_AROE(group, factor_ROE, factor_TA, N):
        ROE_Rolling = group[factor_ROE].rolling(N)
        TA_Rolling = group[factor_TA].rolling(N)
        group["AROE"] = [OSL_Regression_AROE(a, b) 
                         for a, b in 
                         zip(ROE_Rolling, TA_Rolling)]
        return group

    All_data = All_data.groupby("ts_code").apply(
        lambda x: cal_factor_AROE(
            x, 
            "du_return_on_equity_TTM", 
            "tot_liab_shrhldr_eqy",
            N)
        ).reset_index(drop=True)
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "AROE"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [53]:
AROE = Adjusted_ROE(trade_date, common_param, increment=False)
AROE

Unnamed: 0,trade_date,ts_code,AROE
0,2015-03-31,000001.SZ,0.080440
1,2015-06-30,000001.SZ,-0.045979
2,2015-09-30,000001.SZ,-0.953141
3,2015-12-31,000001.SZ,0.445252
4,2016-03-31,000001.SZ,-0.658618
...,...,...,...
9087,2023-06-30,688981.SH,-1.357792
9088,2023-09-30,688981.SH,-1.475658
9089,2023-12-31,688981.SH,-1.253228
9090,2024-03-31,688981.SH,-1.077273


### 每股净资产同比增长率(2.23)

(YoY of BVPS)

注：企业的财务报告报出后，如果出现以前年度损益调整事项，比如所得税汇算调整等，会改变去年报表年末数字，相关调整应该在报表附注中披露，分析的时候以最近的数字为准。

$$
\frac{最近报告期每股净资产(TTM)-上年同期}{上年同期值}
$$

> 是否应该使用差分运算

In [54]:
def BVPS_YoY(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]

    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=2)).strftime("%Y%m%d")

    Financial_DF = xtdata.get_financial_data(Stock_list, ["PershareIndex"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["PershareIndex"][
            [
                "m_timetag", 
                "m_anntime",
                "s_fa_bps"]
        ]
        temp["ts_code"] = asset
        temp = temp[
            [
                "ts_code", 
                "m_timetag", 
                "m_anntime", 
                "s_fa_bps"]
        ]
        temp = temp[(temp.m_timetag >= Begin_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    
    # 保留最新的每股净资产数据
    # Sort the DataFrame by m_timetag and m_anntime in descending order
    df_sorted = All_data.sort_values(by=['m_timetag', 'm_anntime'], ascending=[False, False])

    # Keep only the first row for each unique m_timetag
    newest_values_df = df_sorted.groupby(['m_timetag', 'ts_code']).first().reset_index()
    newest_values_df = newest_values_df.sort_values(by=['ts_code', 'm_timetag'])
    newest_values_df = newest_values_df[['ts_code', 'm_timetag', 'm_anntime', 's_fa_bps', 'Year', 'Month']]

    def cal_factor_ttm(data, factor):
        """
        计算factor_ttm.
        由于是计算净资产的TTM，不需要进行差分运算。
        """
        data[f"{factor}_TTM"] = data[f"{factor}"].rolling(4).sum()
        return data

    newest_values_df = newest_values_df.groupby('ts_code').apply(
        lambda x: cal_factor_ttm(x, 's_fa_bps')
    ).reset_index(drop=True)
    
    def Cal_Factor_YoY(Group, Factor):
        """
        计算同比增长。
        YoY = (本期值 - 上期值) / abs(上期值)
        如果上一期值和本期值为0，则增长率为0；
        如果上一期值为0，本期不为0，则增长率为NaN；
        其他情况正常计算。
        """
        DF = Group[Factor]
        DF_Shift = Group[Factor].shift(1)
        DIFF = DF - DF_Shift
        Group["BVPS_YoY"] = np.where(DIFF == 0, 0, DIFF / abs(DF_Shift))
        Group["BVPS_YoY"] = Group["BVPS_YoY"].replace([np.inf, -np.inf], np.nan)
        return Group

    newest_values_df = newest_values_df.groupby(["ts_code", "Month"]).apply(lambda x: Cal_Factor_YoY(x, "s_fa_bps_TTM")).reset_index(drop=True)
    newest_values_df = newest_values_df.sort_values(by=['ts_code', 'm_timetag'])
    if increment: 
        newest_values_df = newest_values_df[newest_values_df.m_anntime == trade_date]
    else: newest_values_df = newest_values_df[(newest_values_df.m_timetag >= start_date) & (newest_values_df.m_timetag <= trade_date)]
    newest_values_df.dropna(inplace=True)
    if newest_values_df.empty: return None
    newest_values_df = newest_values_df[["m_timetag", "ts_code", "BVPS_YoY"]]
    newest_values_df = newest_values_df.rename(columns={"m_timetag": "trade_date"})
    newest_values_df.reset_index(drop=True, inplace=True)
    return newest_values_df

In [55]:
BVPS = BVPS_YoY(trade_date, common_param, increment=False)
BVPS

Unnamed: 0,trade_date,ts_code,BVPS_YoY
0,2015-03-31,000001.SZ,-0.081941
1,2015-06-30,000001.SZ,-0.071000
2,2015-09-30,000001.SZ,-0.063120
3,2015-12-31,000001.SZ,-0.060742
4,2016-03-31,000001.SZ,0.035066
...,...,...,...
9629,2023-06-30,688981.SH,0.202570
9630,2023-09-30,688981.SH,0.157855
9631,2023-12-31,688981.SH,0.119191
9632,2024-03-31,688981.SH,0.089779


### 股东户数(2.29)

(Number of shareholders)

持有该股票的股东人数越多，股票更易流通;如果股票只集中在少数机构手中该股的流通性会变差，会提高该股票的交易难度。

In [56]:
def Num_Shareholders(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Holdernum"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp_Holdernum = Financial_DF[asset]['Holdernum'][
            [
                "endDate", 
                "declareDate",
                "shareholder"]
        ]
        temp_Holdernum["ts_code"] = asset
        temp_Holdernum = temp_Holdernum[
            [
                "ts_code",
                "endDate", 
                "declareDate",
                "shareholder"]
        ]
        if increment:
            temp_Holdernum = temp_Holdernum[temp_Holdernum.declareDate == trade_date]
        else: temp_Holdernum = temp_Holdernum[(temp_Holdernum.endDate >= start_date) & (temp_Holdernum.endDate <= trade_date)]
        All_data = pd.concat([All_data, temp_Holdernum], ignore_index=True)

    if All_data.empty: return None
    All_data["Year"] = All_data["endDate"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["endDate"].apply(lambda x: x[4:6]).astype(int)
    All_data["endDate"] = pd.to_datetime(All_data["endDate"])
    All_data["declareDate"] = pd.to_datetime(All_data["declareDate"])
    
    All_data.drop_duplicates(subset=["ts_code", "endDate", "declareDate"], inplace=True)
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["endDate", "ts_code", "shareholder"]]
    All_data = All_data.rename(columns={"endDate": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [57]:
NS = Num_Shareholders(trade_date, common_param, increment=False)
NS

Unnamed: 0,trade_date,ts_code,shareholder
0,2015-03-31,000001.SZ,313029.0
1,2015-06-30,000001.SZ,418294.0
2,2015-09-30,000001.SZ,358049.0
3,2015-12-31,000001.SZ,332918.0
4,2016-03-31,000001.SZ,341525.0
...,...,...,...
16062,2023-09-30,688981.SH,276303.0
16063,2023-12-31,688981.SH,244046.0
16064,2024-02-29,688981.SH,248000.0
16065,2024-03-31,688981.SH,246045.0


### 研发费用市值比(2.18)

(Research Cost to Market Value Ratio)

研发费用对总市值占比较高的公司可以获得较高的超额收益。

$$
\frac{最近12个月的研发费用(TTM)}{总市值}\ (如果研发费用较为稀疏，可用管理费用代替)
$$

In [58]:
def RC_to_MVR(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    """
    Research Cost to Market Value Ratio
    """

    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=1)).strftime("%Y%m%d")

    Financial_DF = xtdata.get_financial_data(Stock_list, ["Income"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp_income = Financial_DF[asset]["Income"][
            [
                "m_timetag", 
                "m_anntime",
                "research_expenses"]
        ]
        temp_income["ts_code"] = asset
        temp_income = temp_income[
            [
                "ts_code",
                "m_timetag", 
                "m_anntime",
                "research_expenses"]
        ]
        temp_income = temp_income[(temp_income.m_timetag >= Begin_date) & (temp_income.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp_income], ignore_index=True)

    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    All_data.fillna(0, inplace=True)
    
    def cal_factor_ttm(data, factor):
        """
        计算factor_ttm.
        factor_diff ：单季度factor数值
        factor_ttm ：滚动四个季度factor之和
        """

        def cal_group_dif(group, factor):
            res = group[factor].diff()
            res = res.fillna(group[factor]) # Fill NaN values with original values
            return res

        data[f"{factor}_diff"] = (
            data.groupby("Year")
            .apply(lambda x: cal_group_dif(x, factor))
            .reset_index(level=0, drop=True)
        ).values.reshape(-1)
        data[f"{factor}_TTM"] = data[f"{factor}_diff"].rolling(4).sum()
        return data
    All_data = All_data.groupby('ts_code').apply(lambda x: cal_factor_ttm(x, 'research_expenses')).reset_index(drop=True)
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    if All_data.empty: return None

    Exist_Stock_list = list(All_data["ts_code"].unique())
    
    # 获取日期范围
    date_unique = All_data['m_timetag'].drop_duplicates()
    date_sorted = date_unique.sort_values()
    latest_date = date_sorted.max().strftime("%Y%m%d")

    Market_Value_DF = MV_DF(Exist_Stock_list, start_date, latest_date)

    All_data = pd.merge(All_data, 
                        Market_Value_DF, 
                        how='left', 
                        left_on=['ts_code', 'm_timetag'], 
                        right_on=['ts_code', 'trade_date'])
    All_data["RC_to_MVR"] = All_data["research_expenses_TTM"] / All_data["total_mv"]
    All_data["RC_to_MVR"] = All_data["RC_to_MVR"].replace([np.inf, -np.inf], np.nan)
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "RC_to_MVR"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [59]:
RM_DF = RC_to_MVR(trade_date, common_param, increment=False)
RM_DF

Unnamed: 0,trade_date,ts_code,RC_to_MVR
0,2015-03-31,000001.SZ,0.000000
1,2015-06-30,000001.SZ,0.000000
2,2015-09-30,000001.SZ,0.000000
3,2015-12-31,000001.SZ,0.000000
4,2016-03-31,000001.SZ,0.000000
...,...,...,...
9975,2023-06-30,688981.SH,0.012639
9976,2023-09-30,688981.SH,0.012423
9977,2023-12-31,688981.SH,0.011847
9978,2024-03-31,688981.SH,0.014906


### 标准化营业利润(2.9)

(Standardized Operating Profit)

通过对常规财务数据进行标准化构造的另类成长因子，标准化后的因子与成长和盈利大类因子相关性较高，可以通过回归中心化处理剔除相关性。

$$
\frac{营业利润(TTM)的当期值-mean(过去T期的营业利润(TTM)}{std(过去T期的营业利润(TTM))}\ (默认 T=6 个季度)
$$

check是否在0附近

In [60]:
def SOP(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    T = common_param["SOP_T"]
    """
    Standardized Operating Profit
    """

    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    Begin_date = (Prev_Quarter_date - pd.DateOffset(months=3*T)).strftime("%Y%m%d")

    Financial_DF = xtdata.get_financial_data(Stock_list, ["Income"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp_income = Financial_DF[asset]["Income"][
            [
                "m_timetag", 
                "m_anntime",
                "oper_profit"]
        ]
        temp_income["ts_code"] = asset
        temp_income = temp_income[
            [
                "ts_code",
                "m_timetag", 
                "m_anntime",
                "oper_profit"]
        ]
        temp_income = temp_income[(temp_income.m_timetag >= Begin_date) & (temp_income.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp_income], ignore_index=True)

    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    
    def cal_factor_ttm(data, factor, T):
        """
        计算factor_ttm.
        factor_diff ：单季度factor数值
        factor_ttm ：滚动四个季度factor之和
        """

        def cal_group_dif(group, factor):
            res = group[factor].diff()
            res = res.fillna(group[factor]) # Fill NaN values with original values
            return res

        data[f"{factor}_diff"] = (
            data.groupby("Year")
            .apply(lambda x: cal_group_dif(x, factor))
            .reset_index(level=0, drop=True)
        ).values.reshape(-1)
        data[f"{factor}_TTM"] = data[f"{factor}_diff"].rolling(T).sum()
        data[f"{factor}_TTM_Mean"] = data[f"{factor}_diff"].rolling(T).mean()
        data[f"{factor}_TTM_Std"] = data[f"{factor}_diff"].rolling(T).std()
        return data

    # There are some NaN values in cash_cash_equ_end_period_TTM in the top
    All_data = All_data.groupby('ts_code').apply(lambda x: cal_factor_ttm(x, 'oper_profit', T)).reset_index(drop=True)
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data["SOP"] = (All_data["oper_profit_TTM"] - All_data["oper_profit_TTM_Mean"]) / All_data["oper_profit_TTM_Std"]
    All_data["SOP"] = All_data["SOP"].replace([np.inf, -np.inf], np.nan)
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "SOP"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [61]:
SOP_DF = SOP(trade_date, common_param, increment=False)
SOP_DF

Unnamed: 0,trade_date,ts_code,SOP
0,2015-03-31,000001.SZ,28.634868
1,2015-06-30,000001.SZ,41.012913
2,2015-09-30,000001.SZ,37.738891
3,2015-12-31,000001.SZ,30.694867
4,2016-03-31,000001.SZ,29.297339
...,...,...,...
10152,2023-06-30,688981.SH,15.994805
10153,2023-09-30,688981.SH,11.353229
10154,2023-12-31,688981.SH,11.771147
10155,2024-03-31,688981.SH,10.493596


### 异常资本投资(2.7)

(Abnormal Capital Investment)

异常资本投资为企业最近一年的资本性支出CE相对此前三年均值的变化。异常资本投资高的企业，未来收益会更低，而且对于那些有着更大投资自主权的企业(比如有着更高现金流量和更低负债率的企业)，这种投资效应更加显著。

$$
CI_{t-1}=\frac{CE_{t-1}}{(CE_{t-2} + CE_{t-3} + CE_{t-4})/3} - 1
$$

$CE = \frac{资本性支出}{营业收入}$


$资本性支出=购建固定资产、无形资产和其他长期资产支付的现金-处置固定资产、无形资产和其他长期资产收回的现金净额$

*购建固定资产、无形资产和其他长期资产支付的现金：该项目反映企业本期购买、建造固定资产、取得无形资产和其他长期资产（如投资性房地产）实际支付的现金，包括购买固定资产、无形资产等支付的价款及相关税费，以及用现金支付的应由在建工程和无形资产负担的职工薪酬。*

*处置固定资产、无形资产和其他长期资产收回的现金净额：处置固定资产、无形资产和其他长期资产收回的现金净额”项目，反映企业处置固定资产、无形资产和其他长期资产收回的现金，扣除所发生的现金支出后的净额。*

In [62]:
def ACI(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    """
    Standardized Operating Profit
    """

    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=4)).strftime("%Y%m%d")

    Financial_DF = xtdata.get_financial_data(Stock_list, ["CashFlow", "Income"])
    CashFlow = pd.DataFrame()
    Income = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp_cashflow = Financial_DF[asset]["CashFlow"][
            [
                "m_timetag",
                "m_anntime",
                "cash_pay_acq_const_fiolta",
                "net_cash_recp_disp_fiolta",
            ]
        ]
        temp_income = Financial_DF[asset]["Income"][
            ["m_timetag", "m_anntime", "revenue_inc"]
        ]
        temp_cashflow["ts_code"] = asset
        temp_income["ts_code"] = asset
        temp_cashflow = temp_cashflow[
            [
                "ts_code",
                "m_timetag",
                "m_anntime",
                "cash_pay_acq_const_fiolta",
                "net_cash_recp_disp_fiolta",
            ]
        ]
        temp_income = temp_income[["ts_code", "m_timetag", "m_anntime", "revenue_inc"]]
        temp_cashflow = temp_cashflow[(temp_cashflow.m_timetag >= Begin_date) & (temp_cashflow.m_timetag <= trade_date)]
        temp_income = temp_income[(temp_income.m_timetag >= Begin_date) & (temp_income.m_timetag <= trade_date)]
        CashFlow = pd.concat([CashFlow, temp_cashflow], ignore_index=True)
        Income = pd.concat([Income, temp_income], ignore_index=True)

    keys = ["ts_code", "m_timetag"]
    All_data = pd.merge(CashFlow, Income, on=keys, how="left")
    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime_x"] = pd.to_datetime(All_data["m_anntime_x"]) # Cashflow's column
    All_data["m_anntime_y"] = pd.to_datetime(All_data["m_anntime_y"]) # Income's column
    # 由于Cashflow和Income表中的披露时间m_anntime不一样
    # 取同一个财报中，在这两个表中最早的日期作为实际的财报披露时间
    All_data['m_anntime'] = All_data[['m_anntime_x', 'm_anntime_y']].min(axis=1)
    All_data.fillna(0, inplace=True)
    
    def cal_factor_ttm(data, factor):
        """
        计算factor_ttm.
        factor_diff ：单季度factor数值
        factor_ttm ：滚动四个季度factor之和
        """

        def cal_group_dif(group, factor):
            res = group[factor].diff()
            res = res.fillna(group[factor]) # Fill NaN values with original values
            return res

        data[f"{factor}_diff"] = (
            data.groupby("Year")
            .apply(lambda x: cal_group_dif(x, factor))
            .reset_index(level=0, drop=True)
        ).values.reshape(-1)
        data[f"{factor}_TTM"] = data[f"{factor}_diff"].rolling(4).sum()
        return data

    All_data = All_data.groupby('ts_code').apply(
        lambda x: cal_factor_ttm(x, 'cash_pay_acq_const_fiolta')).reset_index(drop=True)
    All_data = All_data.groupby('ts_code').apply(
        lambda x: cal_factor_ttm(x, 'net_cash_recp_disp_fiolta')).reset_index(drop=True)
    All_data = All_data.groupby('ts_code').apply(
        lambda x: cal_factor_ttm(x, 'revenue_inc')).reset_index(drop=True)
    All_data["Capital_Expense"] = All_data["cash_pay_acq_const_fiolta_TTM"] - All_data["net_cash_recp_disp_fiolta_TTM"]
    All_data["CE"] = (
        (
            All_data["Capital_Expense"] 
            / All_data["revenue_inc"]
        )
        .replace([np.inf, -np.inf], np.nan)
    )
    
    def Cal_Factor_CI(Group, Factor):
        DF = Group[Factor]
        DF_Shift1 = Group[Factor].shift(1)
        DF_Shift2 = Group[Factor].shift(2)
        DF_Shift3 = Group[Factor].shift(3)
        Group["ACI"] = 3 * DF / (DF_Shift1 + DF_Shift2 + DF_Shift3) - 1
        Group["ACI"] = Group["ACI"].replace([np.inf, -np.inf], np.nan)
        return Group

    All_data = All_data.groupby(["ts_code", "Month"]).apply(lambda x: Cal_Factor_CI(x, "CE")).reset_index(drop=True)
    All_data = All_data.sort_values(by=['ts_code', 'm_timetag'])
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "ACI"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [63]:
ACI_DF = ACI(trade_date, common_param, increment=False)
ACI_DF

Unnamed: 0,trade_date,ts_code,ACI
0,2015-03-31,000001.SZ,0.432034
1,2015-06-30,000001.SZ,0.385260
2,2015-09-30,000001.SZ,0.353477
3,2015-12-31,000001.SZ,-0.066713
4,2016-03-31,000001.SZ,-0.437821
...,...,...,...
9232,2023-06-30,688981.SH,0.548010
9233,2023-09-30,688981.SH,0.387993
9234,2023-12-31,688981.SH,0.214561
9235,2024-03-31,688981.SH,0.168745


### 账面市值比(2.4)

(Book-to-market)

为市净率的倒数；用于衡量股价高低和公司内在价值的估值因子，账面市值比高的股票，投资价值相对更高。

$$
\frac{最近报告期归属于母公司股东权益合计}{总市值}
$$

In [64]:
def Book_to_Market(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    """
    Research Cost to Market Value Ratio
    """
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Balance"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp_income = Financial_DF[asset]["Balance"][
            [
                "m_timetag", 
                "m_anntime",
                "total_equity"]
        ]
        temp_income["ts_code"] = asset
        temp_income = temp_income[
            [
                "ts_code",
                "m_timetag", 
                "m_anntime",
                "total_equity"]
        ]
        if increment:
            temp_income = temp_income[temp_income.m_anntime == trade_date]
        else: 
            temp_income = temp_income[(temp_income.m_timetag >= start_date) & (temp_income.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp_income], ignore_index=True)

    if All_data.empty: return None
    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    All_data.fillna(0, inplace=True)
    Exist_Stock_list = list(All_data["ts_code"].unique())
    # print(Exist_Stock_list)
    
    # 获取日期范围
    date_unique = All_data['m_timetag'].drop_duplicates()
    date_sorted = date_unique.sort_values()
    latest_date = date_sorted.max().strftime("%Y%m%d")
    # print(earliest_date, latest_date)

    Market_Value_DF = MV_DF(Exist_Stock_list, start_date, latest_date)
    All_data = pd.merge(All_data, 
                        Market_Value_DF, 
                        how='left', 
                        left_on=['ts_code', 'm_timetag'], 
                        right_on=['ts_code', 'trade_date'])
    All_data["Book_to_Market"] = All_data["total_equity"] / All_data["total_mv"]
    All_data["Book_to_Market"] = All_data["Book_to_Market"].replace([np.inf, -np.inf], np.nan)
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "Book_to_Market"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [65]:
BM = Book_to_Market(trade_date, common_param, increment=False)
BM

Unnamed: 0,trade_date,ts_code,Book_to_Market
0,2015-03-31,000001.SZ,0.759711
1,2015-06-30,000001.SZ,0.725217
2,2015-09-30,000001.SZ,1.046889
3,2015-12-31,000001.SZ,0.941356
4,2016-03-31,000001.SZ,1.232029
...,...,...,...
9975,2023-06-30,688981.SH,0.540792
9976,2023-09-30,688981.SH,0.540306
9977,2023-12-31,688981.SH,0.518530
9978,2024-03-31,688981.SH,0.630236


trade_date### 总市值(1.23)

(Market Value)

公司市值与股票收益负相关，存在小市值效应。

$$
当日收盘价 \times 当日总股本
$$

In [66]:
def MV_Single_DF(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    """
    获取当天的总市值数据
    
    注：
    1. 某个股票因为停牌而没有数据，使用前值填充的方法解决
    2. 处理完成之后，值依旧为Null，说明这个股票在这个日期还没有上市，没有数据
    3. 目前晨乐数据库中的数据最早的日期为2014.01.02
    4. 原始市值数据的单位是万元，为了转换为元，要乘以10000
    5. 由于上市公司会不定期变动股本数量(不一定是每季度的最后一天变动)，
       而变动股本数量的日期可能不是交易日，因此我们还需要加入上市公司变动
       股本数量的日期Changed_dates。如果为了提高效率，不加也可以。
    """
    Begin_date = trade_date if increment else start_date
    # Today_Date = datetime.today().strftime("%Y%m%d")  # 今天的日期
    try:
        MV = get_price(
            ts_code_list=Stock_list,
            feature_list=["total_mv"], # 总市值
            start_date=Begin_date,
            trade_date=trade_date,
            target_type="stock",
        )
        # 如果报错，说明当日不是交易日，没有数据
    except: return None
    MV = MV.sort_values(["trade_date", "ts_code"]).reset_index()
    MV["total_mv"] = MV["total_mv"] * 10000
    MV.dropna(inplace=True)
    if MV.empty: return None
    MV.reset_index(drop=True, inplace=True)
    return MV

In [67]:
MVS_DF = MV_Single_DF(trade_date, common_param, increment=False)
MVS_DF

Unnamed: 0,trade_date,ts_code,total_mv
0,2015-01-05,000001.SZ,1.830270e+11
1,2015-01-05,000002.SZ,1.642340e+11
2,2015-01-05,000063.SZ,6.534770e+10
3,2015-01-05,000100.SZ,3.809320e+10
4,2015-01-05,000157.SZ,5.494340e+10
...,...,...,...
600938,2024-08-14,688303.SH,4.169760e+10
600939,2024-08-14,688363.SH,2.977740e+10
600940,2024-08-14,688396.SH,4.779220e+10
600941,2024-08-14,688599.SH,3.691840e+10


### 产能利用率提升(1.21)

(Operation Cost on Fixed Assets)

产能利用率反映企业营运效率，刻画了公司将生产设备用于生产工作的利用程度。生产设备作为一种固定成本，不会像原材料等变动成本那样会随着产量而改变，所以对于设备等固定设施的利用率的高低一定程度上决定了最终实际平均成本的高低，利用率越高，摊销到单位产品上的平均实际成本就越低，企业的营运效率就越高。

以当天最近N个季度的营业总成本作为Y、以对应季度的固定资产作为X，进行 OLS 线性回归，回归前需对营业总成本和固定资产进行Z-Score 标准化处理;回归得到的最近一个季度上的残差$\epsilon$即为 OCFA 在当天的因子值:
$$
TotalOperationCost_i = \alpha_i + \beta_i \times FixedAssets_i + \epsilon_i
\\(i\in {0,1,2,…,N-1}，默认 N=8)
$$

[创新基本面因子： 捕捉产能利用率中的讯号](https://www.doc88.com/p-19229069924417.html)

In [68]:
def OSL_Regression_OCFA(TotalOperationCost, FixedAsset, N):
    if len(TotalOperationCost) != N or len(FixedAsset) != N:
        return None
    # Combine the input series into a DataFrame
    df = pd.DataFrame(
        {
            "TotalOperationCost": TotalOperationCost,
            "FixedAsset": FixedAsset,
        }
    )
    # Z-Score standardization
    df_standardized = ((df - df.mean()) / df.std()).fillna(0)

    # Prepare the data for regression
    X = sm.add_constant(df_standardized[["FixedAsset"]])
    y = df_standardized["TotalOperationCost"]

    # Perform OLS regression
    model = sm.OLS(y, X).fit()

    # Get the coefficients
    a = model.params["const"]
    beta = model.params["FixedAsset"]

    # Calculate LPNP factor value
    latest_data = df_standardized.iloc[-1]  # Get the latest quarter data
    CUI = (
        latest_data["TotalOperationCost"]
        - a
        - beta * latest_data["FixedAsset"]
    )
    return CUI

def OCFA(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    N = common_param["OCFA_N"]
    """
    Capacity Utilization Increase
    """

    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    Begin_date = (
        Prev_Quarter_date 
        - pd.DateOffset(months=3*N) 
        - pd.DateOffset(month=3)).strftime("%Y%m%d")
    # print(f"Begin date: {Begin_date}, End date: {End_date}")
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Income", "Balance"])
    Income = pd.DataFrame()
    Balance = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp_income = Financial_DF[asset]["Income"][
            [
                "m_timetag", 
                "m_anntime",
                "total_operating_cost"]
        ]
        temp_balance = Financial_DF[asset]['Balance'][
            [
                "m_timetag", 
                "m_anntime",
                'fix_assets']
        ]
        temp_income["ts_code"] = asset
        temp_balance["ts_code"] = asset
        temp_income = temp_income[
            [
                "ts_code",
                "m_timetag",
                "m_anntime",
                "total_operating_cost"]
        ]
        temp_balance = temp_balance[
            [
                "ts_code",
                "m_timetag", 
                "m_anntime",
                'fix_assets']
        ]
        temp_income = temp_income[(temp_income.m_timetag >= Begin_date) & (temp_income.m_timetag <= trade_date)]
        temp_balance = temp_balance[(temp_balance.m_timetag >= Begin_date) & (temp_balance.m_timetag <= trade_date)]
        Income = pd.concat([Income, temp_income], ignore_index=True)
        Balance = pd.concat([Balance, temp_balance], ignore_index=True)

    keys = ["ts_code", "m_timetag"]
    All_data = pd.merge(Balance, Income, on=keys, how='left')
    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4])
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6])
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime_x"] = pd.to_datetime(All_data["m_anntime_x"]) # Balance's column
    All_data["m_anntime_y"] = pd.to_datetime(All_data["m_anntime_y"]) # Income's column
    # Balance和Income表中的披露时间m_anntime可能不一样
    # 取同一个财报中，在这两个表中最早的日期作为实际的财报披露时间
    All_data['m_anntime'] = All_data[['m_anntime_x', 'm_anntime_y']].min(axis=1)
    
    def cal_group_dif(group, factor):
        # 计算单季度的值(使用于原始数据为累加值)
        res = group[factor].diff()
        res = res.fillna(group[factor]) # Fill NaN values with original values
        return res

    All_data["total_operating_cost_quarter"] = (
        All_data.groupby("Year")
        .apply(lambda x: cal_group_dif(x,'total_operating_cost'))
        .reset_index(level=0, drop=True)
    )
    All_data.fillna(0, inplace=True)
    
    def cal_factor_OCFA(group, factor_TOC, factor_FA, N):
        TOC_Rolling = group[factor_TOC].rolling(N)
        FA_Rolling = group[factor_FA].rolling(N)
        group["OCFA"] = [OSL_Regression_OCFA(a, b, N) 
                         for a, b in 
                         zip(TOC_Rolling, FA_Rolling)]
        return group

    All_data = All_data.groupby("ts_code").apply(
        lambda x: cal_factor_OCFA(
            x, 
            "total_operating_cost_quarter", 
            "fix_assets", 
            N)
        ).reset_index(drop=True)
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "OCFA"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [69]:
OCFA_DF = OCFA(trade_date, common_param, increment=False)
OCFA_DF

Unnamed: 0,trade_date,ts_code,OCFA
0,2015-03-31,000001.SZ,0.488886
1,2015-06-30,000001.SZ,1.676240
2,2015-09-30,000001.SZ,-0.083244
3,2015-12-31,000001.SZ,0.041024
4,2016-03-31,000001.SZ,0.225714
...,...,...,...
9752,2023-06-30,688981.SH,0.634459
9753,2023-09-30,688981.SH,0.358572
9754,2023-12-31,688981.SH,0.224501
9755,2024-03-31,688981.SH,-0.319718


### 现金与总资产比(1.15)

(Cash to Total Assets Ratio)

与现金流和总需求冲击相关性更高的高风险公司有更强的预防性现金储蓄动机，而这种预防性现金储蓄动机意味着预期股票收益与现金持有量之间存在正相关关系。

$$
\frac{最近12个月的现金及现金等价物(TTM)}{平均总资产}
$$

$ 平均总资产=\frac{期初总资产+期未总资产}{2}$

In [70]:
def Cash_to_Asset(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]

    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=1)).strftime("%Y%m%d")

    Financial_DF = xtdata.get_financial_data(Stock_list, ["CashFlow", "Income"])
    CashFlow = pd.DataFrame()
    Income = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp_cashflow = Financial_DF[asset]['CashFlow'][
            [
                "m_timetag", 
                "m_anntime",
                "net_incr_cash_cash_equ",
                "cash_cash_equ_beg_period",
                "cash_cash_equ_end_period"]
        ]
        temp_income = Financial_DF[asset]["Income"][
            [
                "m_timetag", 
                "m_anntime",
                "net_profit_incl_min_int_inc"]
        ]
        temp_cashflow["ts_code"] = asset
        temp_income["ts_code"] = asset
        temp_cashflow = temp_cashflow[
            [
                "ts_code",
                "m_timetag", 
                "m_anntime",
                "net_incr_cash_cash_equ",
                "cash_cash_equ_beg_period",
                "cash_cash_equ_end_period"]
        ]
        temp_income = temp_income[
            [
                "ts_code",
                "m_timetag", 
                "m_anntime",
                "net_profit_incl_min_int_inc"]
        ]
        temp_cashflow = temp_cashflow[(temp_cashflow.m_timetag >= Begin_date) & (temp_cashflow.m_timetag <= trade_date)]
        temp_income = temp_income[(temp_income.m_timetag >= Begin_date) & (temp_income.m_timetag <= trade_date)]
        CashFlow = pd.concat([CashFlow, temp_cashflow], ignore_index=True)
        Income = pd.concat([Income, temp_income], ignore_index=True)

    keys = ["ts_code", "m_timetag"]
    All_data = pd.merge(CashFlow, Income, on=keys, how='left')
    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime_x"] = pd.to_datetime(All_data["m_anntime_x"]) # CashFlow's column
    All_data["m_anntime_y"] = pd.to_datetime(All_data["m_anntime_y"]) # Income's column
    # CashFlow和Income表中的披露时间m_anntime不一样
    # 取同一个财报中，在这两个表中最早的日期作为实际的财报披露时间
    All_data['m_anntime'] = All_data[['m_anntime_x', 'm_anntime_y']].min(axis=1)
    
    def cal_factor_ttm(data, factor):
        """
        计算factor_ttm.
        factor_diff ：单季度factor数值
        factor_ttm ：滚动四个季度factor之和
        """

        def cal_group_dif(group, factor):
            res = group[factor].diff()
            res = res.fillna(group[factor]) # Fill NaN values with original values
            return res

        data[f"{factor}_diff"] = (
            data.groupby("Year")
            .apply(lambda x: cal_group_dif(x, factor))
            .reset_index(level=0, drop=True)
        ).values.reshape(-1)
        data[f"{factor}_TTM"] = data[f"{factor}_diff"].rolling(4).sum()
        return data

    # There are some NaN values in cash_cash_equ_end_period_TTM in the top
    All_data = All_data.groupby('ts_code').apply(lambda x: cal_factor_ttm(x, 'cash_cash_equ_end_period')).reset_index(drop=True)

    def Cal_Average(Group, Factor):
        Asset = Group[Factor]
        First_Asset = Asset.iloc[0]
        Group["Average_Total_Asset"] = (Asset + First_Asset) / 2
        return Group
    All_data = All_data.groupby(["ts_code", "Year"]).apply(lambda x: Cal_Average(x, "net_profit_incl_min_int_inc")).reset_index(drop=True)
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data["CA"] = All_data["cash_cash_equ_end_period_TTM"] / All_data["Average_Total_Asset"]
    All_data["CA"] = All_data["CA"].replace([np.inf, -np.inf], np.nan)
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "CA"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [71]:
CA = Cash_to_Asset(trade_date, common_param, increment=False)
CA

Unnamed: 0,trade_date,ts_code,CA
0,2015-03-31,000001.SZ,23.355481
1,2015-06-30,000001.SZ,39.438829
2,2015-09-30,000001.SZ,34.272498
3,2015-12-31,000001.SZ,19.010766
4,2016-03-31,000001.SZ,54.525140
...,...,...,...
10306,2023-06-30,688981.SH,19.607058
10307,2023-09-30,688981.SH,16.855083
10308,2023-12-31,688981.SH,13.725739
10309,2024-03-31,688981.SH,43.245758


### 流动比率(1.8)

(Liquidity Ratio)

反映企业的短期偿债能力。

$$
\frac{最近报告期流动资产合计}{最近同期流动负债合计}
$$

In [72]:
def Liquidity_Ratio(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    Financial_DF = xtdata.get_financial_data(Stock_list, ["Balance"])
    # Linear_Purified_NP
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]['Balance'][
            [
                "m_timetag", 
                "m_anntime", 
                'total_current_assets',
                'total_current_liability']
        ]
        temp["ts_code"] = asset
        temp = temp[
            [
                "ts_code",
                "m_timetag",
                "m_anntime",
                'total_current_assets',
                'total_current_liability'
            ]
        ]
        if increment == True:
            temp = temp[temp.m_anntime == trade_date]
        else: 
            temp = temp[(temp.m_timetag >= start_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    if All_data.empty: return None
    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    
    All_data["Liquidity_Ratio"] = (All_data["total_current_assets"] / All_data["total_current_liability"])
    All_data["Liquidity_Ratio"] = All_data["Liquidity_Ratio"].replace([np.inf, -np.inf], np.nan)
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data[["m_timetag", "ts_code", "Liquidity_Ratio"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [73]:
LR = Liquidity_Ratio(trade_date, common_param, increment=False)
LR

Unnamed: 0,trade_date,ts_code,Liquidity_Ratio
0,2015-03-31,000002.SZ,1.302974
1,2015-06-30,000002.SZ,1.292964
2,2015-09-30,000002.SZ,1.293264
3,2015-12-31,000002.SZ,1.302247
4,2016-03-31,000002.SZ,1.291051
...,...,...,...
8721,2023-06-30,688981.SH,2.033938
8722,2023-09-30,688981.SH,1.785040
8723,2023-12-31,688981.SH,1.835524
8724,2024-03-31,688981.SH,1.626023


### 单季度销售收入同比增长率(1.7)

(YoY of Revenue)

$$
\frac{最近单季度营业收入-上年同期值}{abs(上年同期值)}
$$

> 要对原始数据进行差分运算

In [74]:
def Revenue_YoY(trade_date, common_param, increment=True, start_date="20150101"):
    Stock_list = common_param["Stock_list"]
    if increment: Prev_Quarter_date = Prev_Quarter_end(trade_date)
    else: Prev_Quarter_date = Prev_Quarter_end(start_date)
    Begin_date = (Prev_Quarter_date - pd.DateOffset(years=1, month=3)).strftime("%Y%m%d")

    Financial_DF = xtdata.get_financial_data(Stock_list, ["Income"])
    All_data = pd.DataFrame()
    for asset in Financial_DF.keys():
        temp = Financial_DF[asset]["Income"][
            [
                "m_timetag", 
                "m_anntime",
                "revenue_inc"]
        ]
        temp["ts_code"] = asset
        temp = temp[
            [
                "ts_code", 
                "m_timetag", 
                "m_anntime", 
                "revenue_inc"]
        ]
        temp = temp[(temp.m_timetag >= Begin_date) & (temp.m_timetag <= trade_date)]
        All_data = pd.concat([All_data, temp], ignore_index=True)

    All_data["Year"] = All_data["m_timetag"].apply(lambda x: x[:4]).astype(int)
    All_data["Month"] = All_data["m_timetag"].apply(lambda x: x[4:6]).astype(int)
    All_data["m_timetag"] = pd.to_datetime(All_data["m_timetag"])
    All_data["m_anntime"] = pd.to_datetime(All_data["m_anntime"])
    All_data.fillna(0, inplace=True)
    
    def cal_group_dif(Group, Factor):
        def cal_diff(group, factor):
            res = group[factor].diff()
            res = res.fillna(group[factor]) # Fill NaN values with original values
            return res
        Group[f"{Factor}_diff"] = (
            Group.groupby("Year")
            .apply(lambda x: cal_diff(x, Factor))
            .reset_index(level=0, drop=True)
        ).values.reshape(-1)
        return Group
    # 单季度毛利润
    All_data = (
        All_data.groupby("ts_code")
        .apply(lambda x: cal_group_dif(x, "revenue_inc"))
    ).reset_index(drop=True)
    
    def Cal_Factor_YoY(Group, Factor):
        DF = Group[Factor].fillna(0)
        DF_Shift = Group[Factor].fillna(0).shift(1)
        DIFF = DF - DF_Shift
        Group["YoY_Revenue"] = np.where(DIFF == 0, 0, DIFF / abs(DF_Shift))
        Group["YoY_Revenue"] = Group["YoY_Revenue"].replace([np.inf, -np.inf], np.nan)
        return Group
    
    All_data = All_data.groupby(["ts_code", "Month"]).apply(lambda x: Cal_Factor_YoY(x, "revenue_inc_diff")).reset_index(drop=True)
    if increment: 
        All_data = All_data[All_data.m_anntime == trade_date]
    else: All_data = All_data[(All_data.m_timetag >= start_date) & (All_data.m_timetag <= trade_date)]
    All_data.dropna(inplace=True)
    if All_data.empty: return None
    All_data = All_data.sort_values(by=['ts_code', 'm_timetag'])
    All_data = All_data[["m_timetag", "ts_code", "YoY_Revenue"]]
    All_data = All_data.rename(columns={"m_timetag": "trade_date"})
    All_data.reset_index(drop=True, inplace=True)
    return All_data

In [75]:
Revenue_YoY_DF = Revenue_YoY("20210315", common_param, increment=True)
Revenue_YoY_DF

In [76]:
# trade_date = '20240814' # 每天的日期
# hs300_list = list(xtdata.get_index_weight("000300.SH").keys())
# # xtdata.download_financial_data(hs300_list, table_list=["Balance", "Income", "CashFlow", "PershareIndex", "Top10holder", "Holdernum", "Capital"]) # 下载数据
# # financial_data = xtdata.get_financial_data(hs300_list, ["Balance", "Income", "CashFlow", "PershareIndex", "Top10holder", "Holdernum", "Capital"])

# # 共有自定义参数
# common_param = {
#     "Stock_list": hs300_list,
#     "LPNP_N": 8, # 线性纯化利润率的参数N，默认为8
#     "CEGR_n": 2, # 资本支出增长率的参数n，默认为2
#     "AROE_N": 8, # 规模调整ROE的参数N，默认为8
#     "SOP_T": 6, # 标准化营业利润的参数T，默认为6
#     "OCFA_N": 8, # 产业利用率提升的参数N，默认为8
# }

# # def DIY_FACTOR1_SCRIPT(trade_date, common_param):
# #     NEW_DATA_DF = calc_factor()
# #     # 你的因子增量计算代码
# #     return NEW_DATA_DF # 返回一个df给数据库，有三列，分别是：trade_date、ts_code、factor_name

In [77]:
# 函数使用：fact_dict["factor_name"](trade_date, common_param)
fact_dict = {
    "Market_Leverage": Market_Leverage,
    "CIR_YoY": CIR_YoY,
    "LPNP": Linear_Purified_NP,
    "Average_Sharehold_Ratio": Average_Sharehold_Ratio,
    "Net_Profit_YoY": Net_Profit_YoY,
    "CLOA": CLOA,
    "IPO_Age": IPO_Age,
    "Debt2Market_Ratio": Debt2Market_Ratio,
    "CFL": CFL,
    "Cash_Ratio": Cash_Ratio,
    "GPG_Minus_SRG": GPG_Minus_SRG,
    "GPG_YoY": GPG,
    "Log_Market_Value": Log_Market_Value,
    "RG_Minus_IG": RG_Minus_IG,
    "TAR_TTM_QoQ": TAR_QoQ,
    "ROEG_Minus_NAG": ROEG_Minus_NAG,
    "CCS": CCS,
    "Total_Asset_to_Market": Total_Asset_to_Market,
    "RER": Research_Expenses_Ratio,
    "Sales_Cost_Ratio_YoY": Sales_Cost_Ratio_YoY,
    "Quick_Ratio": Quick_Ratio,
    "CEGR": CEGR,
    "Tradable_Market_Value": Tradable_Market_Value,
    "CNOA": Change_in_NOA,
    "AROE": Adjusted_ROE,
    "BVPS_YoY": BVPS_YoY,
    "shareholder": Num_Shareholders,
    "RC_to_MVR": RC_to_MVR,
    "SOP": SOP,
    "ACI": ACI,
    "Book_to_Market": Book_to_Market,
    "total_mv": MV_Single_DF,
    "OCFA": OCFA,
    "CA": Cash_to_Asset,
    "Liquidity_Ratio": Liquidity_Ratio,
    "YoY_Revenue": Revenue_YoY,
}

# 测试用例框架：

In [79]:
# 生成测试日期
def get_datelist(num):
    import pandas as pd
    import numpy as np

    start_date = pd.to_datetime("2020-01-01")
    end_date = pd.to_datetime("2023-12-31")
    max_start_date = end_date - pd.Timedelta(days=num - 1)
    random_start = start_date + pd.to_timedelta(
        np.random.randint(0, (max_start_date - start_date).days + 1), unit="D"
    )
    date_range = pd.date_range(start=random_start, periods=num)
    return date_range

## 全量计算测试，测试连续的20天

In [80]:
# 全量计算测试，测试连续的20天
start_date = "20150101"  # 设置全量计算时的开始时间

all_yes = True
for trade_date in get_datelist(20):
    # common_param = make_common_param()  # 在这里准备你需要的数据和参数
    for x in fact_dict:
        try:
            df = fact_dict[x](
                trade_date=trade_date.strftime("%Y%m%d"),
                common_param=common_param,
                increment=False,
                start_date=start_date,
            )
            if type(df) != pd.DataFrame:
                raise ValueError("返回值类型有误")
            if len(df) < 1:
                raise ValueError("返回值行数有误")
            if len(df.columns) != 3: 
                raise ValueError("返回值列数有误")
            if df.columns[2] != x:
                raise ValueError("返回值第三列名字有误")
            if len(df) != len(df.dropna()):
                raise ValueError("返回值中含有None")
            print(f"{x}测试通过")
        except Exception as e:
            print(f"因子{x}在{trade_date}出错:{e}")
            all_yes = False
if all_yes:
    print("全量计算测试全部通过！")
else:
    print("没有全部通过全量计算测试！")

Market_Leverage测试通过
CIR_YoY测试通过
LPNP测试通过
Average_Sharehold_Ratio测试通过
Net_Profit_YoY测试通过
CLOA测试通过
IPO_Age测试通过
Debt2Market_Ratio测试通过
CFL测试通过
Cash_Ratio测试通过
GPG_Minus_SRG测试通过
GPG_YoY测试通过
Log_Market_Value测试通过
RG_Minus_IG测试通过
TAR_TTM_QoQ测试通过
ROEG_Minus_NAG测试通过
CCS测试通过
Total_Asset_to_Market测试通过
RER测试通过
Sales_Cost_Ratio_YoY测试通过
Quick_Ratio测试通过
CEGR测试通过
Tradable_Market_Value测试通过
CNOA测试通过
AROE测试通过
BVPS_YoY测试通过
shareholder测试通过
RC_to_MVR测试通过
SOP测试通过
ACI测试通过
Book_to_Market测试通过
total_mv测试通过
OCFA测试通过
CA测试通过
Liquidity_Ratio测试通过
YoY_Revenue测试通过
Market_Leverage测试通过
CIR_YoY测试通过
LPNP测试通过
Average_Sharehold_Ratio测试通过
Net_Profit_YoY测试通过
CLOA测试通过
IPO_Age测试通过
Debt2Market_Ratio测试通过
CFL测试通过
Cash_Ratio测试通过
GPG_Minus_SRG测试通过
GPG_YoY测试通过
Log_Market_Value测试通过
RG_Minus_IG测试通过
TAR_TTM_QoQ测试通过
ROEG_Minus_NAG测试通过
CCS测试通过
Total_Asset_to_Market测试通过
RER测试通过
Sales_Cost_Ratio_YoY测试通过
Quick_Ratio测试通过
CEGR测试通过
Tradable_Market_Value测试通过
CNOA测试通过
AROE测试通过
BVPS_YoY测试通过
shareholder测试通过
RC_to_MVR测试通过
SOP测试通过
ACI测试通过
Book_to_Market

##  增量计算测试，测试连续的20天

In [81]:
# 增量计算测试，测试连续的20天
all_yes = True
for Trade_date in get_datelist(20):
    print(Trade_date, "正在测试")
    # common_param = make_common_param()  # 在这里准备你需要的数据和参数
    for x in fact_dict:
        try:
            df = fact_dict[x](
                trade_date=Trade_date.strftime("%Y%m%d"), 
                common_param=common_param, 
                increment=True
            )
            if type(df) == pd.DataFrame:
                if len(df) < 1:
                    raise ValueError("返回值行数有误")
                if len(df.columns) != 3:
                    raise ValueError("返回值列数有误")
                if df.columns[2] != x:
                    raise ValueError("返回值第三列名字有误")
                if len(df) != len(df.dropna()):
                    raise ValueError("返回值中含有None")
        except Exception as e:
            print(f"因子{x}在{Trade_date}出错:{e}")
            all_yes = False
if all_yes:
    print("增量计算测试全部通过！")
else:
    print("没有全部通过增量计算测试！")

2023-06-17 00:00:00 正在测试
2023-06-18 00:00:00 正在测试
2023-06-19 00:00:00 正在测试
2023-06-20 00:00:00 正在测试
2023-06-21 00:00:00 正在测试
2023-06-22 00:00:00 正在测试
2023-06-23 00:00:00 正在测试
2023-06-24 00:00:00 正在测试
2023-06-25 00:00:00 正在测试
2023-06-26 00:00:00 正在测试
2023-06-27 00:00:00 正在测试
2023-06-28 00:00:00 正在测试
2023-06-29 00:00:00 正在测试
2023-06-30 00:00:00 正在测试
2023-07-01 00:00:00 正在测试
2023-07-02 00:00:00 正在测试
2023-07-03 00:00:00 正在测试
2023-07-04 00:00:00 正在测试
2023-07-05 00:00:00 正在测试
2023-07-06 00:00:00 正在测试
增量计算测试全部通过！
