# Topic
## background
Institutional `herding` (抱团) generates high returns, e.g. in "white horse" stocks and blue-chip stocks.

**reason**
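- China's stock market has a large share of retail investors -> large pools of capital that herd together can move stock prices
- Chinese mutual funds follow similar stock-selection logic
    1. Brokerage research departments - visit companies, screen stocks, and publish research reports
    2. Fund managers - screen stocks based on research-report recommendations and build the fund's stock pool
    -> the range of stocks held by institutions narrows -> `herding`
- China's market is not strong-form efficient (paper?), so information obtained through personal relationships can generate excess returns
    - Private social networks are exclusive
    - Fund managers obtain exclusive information through private social networks
    - Meanwhile, fund managers ask research departments to cover specific stocks according to their own needs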
    
## Q
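For open-end funds operating from 2018 to 2023, what is the relationship between fund excess returns and the fund managers' personal alumni networks?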

# data

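## Data rules
- In operation throughout 2018-2023
- Open-end mixed (hybrid) funds and equity funds

### Exclusions
Before filtering: 30,161 funds

- In operation throughout 2018-2023
- Mixed + equity funds
- When one manager runs several funds, keep the best-performing one

Remaining: 218 funds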
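
The filtering rules above could be applied with pandas roughly as follows. This is only a sketch: the table and its column names (`Manager`, `Fund`, `Type`, `Return`) are hypothetical and do not come from the raw data files.

```python
import pandas as pd

# Hypothetical fund table; the column names are assumptions for illustration
funds = pd.DataFrame({
    "Manager": ["m1", "m1", "m2"],
    "Fund": ["F1", "F2", "F3"],
    "Type": ["mixed", "equity", "bond"],
    "Return": [0.10, 0.25, 0.05],
})

# Keep only mixed and equity funds
eligible = funds[funds["Type"].isin(["mixed", "equity"])]

# For each manager, keep the row with the highest return
best_per_manager = eligible.loc[eligible.groupby("Manager")["Return"].idxmax()]
print(best_per_manager)
```

Here manager `m1` keeps fund `F2`, and `m2` drops out because the only fund is a bond fund.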

## Variables

### personal info excel
Fund manager: FundManager

University: Uni

Degree: Degree

Major: Major

Business-school dummy: D_B

Fund name: Fund

### fund return excel
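RF: 10-year treasury bond yield (risk-free rate)

Rm: Hushen 300 (CSI 300) Index return

Fund name: Fund

Performance return: Return

Fund age (years in operation): year

Turnover rate: turnover

Fund size: size

Book-to-market: BM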

December book value of equity in the previous year

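Degree Centrality: the hypothesis is that the larger a fund manager's alumni network (i.e. the more other fund managers they are directly connected to), the richer their access to information and resources, which may have a positive effect on the performance of the funds they manage.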
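
As a small illustration of the measure, the sketch below computes degree centrality with networkx on a made-up six-manager network. The manager labels and edges are hypothetical.

```python
import networkx as nx

# Hypothetical toy network: A, B, C share one school; D and E share another; F has no alumni
toy_graph = nx.Graph()
toy_graph.add_nodes_from(["A", "B", "C", "D", "E", "F"])
toy_graph.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"), ("D", "E")])

# degree centrality = (number of direct neighbours) / (number of nodes - 1)
toy_centrality = nx.degree_centrality(toy_graph)
for manager, value in sorted(toy_centrality.items()):
    print(manager, round(value, 2))
```

With six nodes the denominator is 5, so a manager with two alumni links scores 0.4 and an isolated manager scores 0.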

# Model design - DID

## methodology
1. why use FF-3
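2. Network: compute a centrality measure with Python's networkx (the plan names closeness centrality; the code below uses degree centrality) and sort managers from high to low into 5 groups (Q1-Q5; the code below uses 3 groups)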

## parameters

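MKT: Rm - Rf (market excess return)

SMB: mean return of small-cap stocks minus mean return of large-cap stocks, split at the median market capitalisation

HML: return spread between the high and low groups, split at the 30% and 70% quantiles (the code below sorts on PB)

Network: sorted from high to low into 5 groups (Q1-Q5)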

## Baseline: FF-3

R_i - R_f = α + β_1 MKT + β_2 SMB + β_3 HML + ε

## Our model
R_i - R_f = α + β_1 MKT + β_2 SMB + β_3 HML + β_4 Network + ε

# Model regression

In [1]:
import pandas as pd
import numpy as np
import networkx as nx
import statsmodels.api as sm
import math

In [6]:
# fund_managers
fund_managers_df = pd.read_csv("https://raw.githubusercontent.com/Tami666/FINM8100/cc8ffb42b6152c5b7490448f614ccb8746adac44/info.csv",encoding="gbk")

# A-shares market data
df_market = pd.read_csv('https://raw.githubusercontent.com/Tami666/FINM8100/cc8ffb42b6152c5b7490448f614ccb8746adac44/market.csv')
df_market = df_market.replace(0, np.nan)  # treat 0 as a missing value
df_market = df_market.dropna()  # drop rows that contain any NaN

# fund return and age
df_fund = pd.read_csv('https://raw.githubusercontent.com/Tami666/FINM8100/cc8ffb42b6152c5b7490448f614ccb8746adac44/Fund.csv')

# rm and rf
df_rm_rf = pd.read_csv('C:/Users/tamiz/Desktop/rf_rm.csv')

# fund size
df_size = pd.read_csv('C:/Users/tamiz/Desktop/size.csv')

## variables construction

### size

In [3]:
df_size.columns

Index(['Manager', 'Fund', '1', '2', '3', '4', '5'], dtype='object')

In [4]:
# Average fund size per manager for each year, across all funds they manage
df_sizee = df_size.groupby('Manager')[[f'{i}' for i in range(1, 6)]].mean().reset_index()

# Reshape from wide format to long format
df_sizee= df_sizee.melt(id_vars='Manager', value_vars=[f'{i}' for i in range(1, 6)], var_name='Year', value_name='size')
df_sizee["Year"]=df_sizee["Year"].astype(int)
df_sizee

Unnamed: 0,Manager,Year,size
0,30000000000000013242,1,2487.3250
1,30000000000000013327,1,101.6400
2,30000000000000013376,1,1138.5100
3,30000000000000013382,1,106.1100
4,30000000000000013404,1,612.7700
...,...,...,...
1245,30383013,5,1588.3950
1246,30391589,5,482.3900
1247,30391651,5,8790.2250
1248,30415976,5,387.3100


### centrality_df

In [5]:
df_fund

Unnamed: 0,Manager,Fund,1,2,3,4,5,age
0,30000000000000020478,000011.OF,-0.149482,0.315315,0.334528,0.046428,-0.149874,5.668493
1,30382072,000017,-0.151667,0.310413,0.426149,-0.066471,-0.105338,6.643836
2,30000000000000020174,000021,-0.164809,0.248281,0.388907,0.110584,-0.208163,9.649315
3,30382395,000039,-0.255532,0.445372,0.565019,0.481965,-0.327494,5.783562
4,30380532,000041,-0.135238,0.190899,0.307836,-0.152266,-0.284404,6.605479
...,...,...,...,...,...,...,...,...
470,30380226,673120,-0.038499,0.176176,0.631939,0.086079,-0.164631,5.545205
471,30249178,690005,-0.264904,0.693425,0.640383,-0.039573,-0.222287,6.104110
472,30391589,690011,-0.161500,0.487532,0.714536,0.251484,-0.257409,5.019178
473,30381526,700003,-0.287863,0.698471,0.648602,0.767591,-0.180938,6.454795


In [6]:
# data: info.csv
# Create an empty graph
G = nx.Graph()

# Add fund managers as nodes
G.add_nodes_from(fund_managers_df['Manager'])

# Add an edge between managers who graduated from the same school
for idx1, manager1 in fund_managers_df.iterrows():
    for idx2, manager2 in fund_managers_df.iterrows():
        # Same school, different manager
        if manager1['Uni'] == manager2['Uni'] and manager1['Manager'] != manager2['Manager']:
            G.add_edge(manager1['Manager'], manager2['Manager'])

# Finally, compute the degree centrality of each node
degree_centrality = nx.degree_centrality(G)
degree_centrality = list(degree_centrality.items())

In [7]:
dc= pd.DataFrame(degree_centrality, columns=['Manager',"degree_centrality"])
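
The nested `iterrows` loop compares every pair of rows. A possible alternative, sketched here on a hypothetical sample with the same `Manager` and `Uni` columns, groups managers by school and links each pair within a group. It should yield the same edges, but that has not been checked against `info.csv`.

```python
from itertools import combinations

import networkx as nx
import pandas as pd

# Hypothetical sample; note the duplicated row for m4
sample_df = pd.DataFrame({
    "Manager": ["m1", "m2", "m3", "m4", "m4"],
    "Uni": ["PKU", "PKU", "THU", "PKU", "PKU"],
})

alumni_graph = nx.Graph()
alumni_graph.add_nodes_from(sample_df["Manager"])

# Connect every pair of distinct managers within the same school
for _, group in sample_df.groupby("Uni"):
    managers = group["Manager"].unique()
    alumni_graph.add_edges_from(combinations(managers, 2))

sample_centrality = nx.degree_centrality(alumni_graph)
print(sample_centrality)
```

Using `unique()` inside each group avoids self-loops when a manager appears in more than one row.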

### SMB

In [8]:
# Collect one SMB value per year
smb_rows = []
# Loop over each year's data
for i in range(1, 6):
    # Extract this year's returns and market capitalisation
    df_year = df_market[['Stock', f'Return_Y{i}', f'MarketCap_Y{i}']].copy()
    df_year.columns = ['Stock', 'Return', 'MarketCap']

    # Cast Return and MarketCap to float
    df_year['Return'] = df_year['Return'].astype(float)
    df_year['MarketCap'] = df_year['MarketCap'].astype(float)

    # Median market capitalisation
    median_market_cap = df_year['MarketCap'].median()

    # Split stocks into Small and Big at the median
    df_year['Size'] = np.where(df_year['MarketCap'] <= median_market_cap, 'Small', 'Big')

    # Equal-weighted mean return of each size group
    df_average_returns = df_year.groupby('Size')['Return'].mean()

    # SMB = mean Small return minus mean Big return
    smb = df_average_returns['Small'] - df_average_returns['Big']

    smb_rows.append({'Year': i, 'SMB': smb})

# DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows
df_smb = pd.DataFrame(smb_rows, dtype=float)
print(df_smb)

   Year           SMB
0   1.0 -6.525544e+00
1   2.0 -2.069882e+01
2   3.0 -8.221398e+11
3   4.0  7.204264e+00
4   5.0  5.098288e+00




### HML

In [9]:
def parse_number(x):
    """Convert accounting-style strings such as '(1,234.5)' to -1234.5; pass anything else to float()."""
    if isinstance(x, str) and x.startswith('('):
        return -float(x[1:-1].replace(',', ''))
    return float(x)

# Collect one HML value per year
hml_rows = []

# Loop over each year's data
for i in range(1, 6):
    # Extract this year's returns and price-to-book (PB) ratios
    df_year = df_market[['Stock', f'Return_Y{i}', f'PB_Y{i}']].copy()
    df_year.columns = ['Stock', 'Return', 'PB']

    # Values in parentheses are negative; thousands separators are stripped
    df_year['Return'] = df_year['Return'].apply(parse_number)
    df_year['PB'] = df_year['PB'].apply(parse_number)

    # 30% and 70% quantiles of PB
    pb_30_quantile = df_year['PB'].quantile(0.3)
    pb_70_quantile = df_year['PB'].quantile(0.7)

    # Split stocks into Low, Medium and High PB groups.
    # Note: bins are right-closed, so the stock with the minimum PB is left
    # unassigned unless include_lowest=True is passed.
    df_year['PB_Group'] = pd.cut(df_year['PB'], bins=[df_year['PB'].min(), pb_30_quantile, pb_70_quantile, df_year['PB'].max()], labels=['Low', 'Medium', 'High'])

    # Mean return of each PB group
    df_average_returns = df_year.groupby('PB_Group')['Return'].mean()

    # Computed here as High-PB minus Low-PB. The conventional HML is high
    # book-to-market (low PB) minus low book-to-market (high PB), i.e. the opposite sign.
    hml = df_average_returns['High'] - df_average_returns['Low']

    hml_rows.append({'Year': i, 'HML': hml})

# DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows
df_hml = pd.DataFrame(hml_rows, dtype=float)
print(df_hml)

   Year           HML
0   1.0 -2.304483e+00
1   2.0  1.367099e+01
2   3.0 -7.388570e+11
3   4.0 -4.666064e+00
4   5.0 -1.377054e+01




### Ri: average return of all funds managed by each fund manager

In [10]:
# Average annual return per manager across all funds they manage
df_manager = df_fund.groupby('Manager')[[f'{i}' for i in range(1, 6)]].mean().reset_index()

# Reshape from wide format to long format
df_manager= df_manager.melt(id_vars='Manager', value_vars=[f'{i}' for i in range(1, 6)], var_name='Year', value_name='Return')
df_manager

Unnamed: 0,Manager,Year,Return
0,30000000000000013242,1,-0.186805
1,30000000000000013327,1,-0.136924
2,30000000000000013376,1,-0.208102
3,30000000000000013382,1,-0.171601
4,30000000000000013404,1,0.048168
...,...,...,...
1245,30383013,5,-0.257668
1246,30391589,5,-0.257409
1247,30391651,5,0.019353
1248,30415976,5,-0.130458


### Grouping by degree centrality

In [11]:
ff= df_rm_rf.merge(df_hml[['Year', 'HML']], on=['Year'], how='inner')
ff= ff.merge(df_smb[['Year', 'SMB']], on=['Year'], how='inner')
ff['Year'] = ff['Year'].astype(int) 
df_manager['Year'] = df_manager['Year'].astype(int)
ff

Unnamed: 0,Year,MKT2,MKT,Rf,Rm,1yLPR,stock index,bond index,HML,SMB
0,5,0.025984,-0.16,0.02,-0.14,0.04,-0.22,0.03,-13.77054,5.098288
1,4,0.002532,-0.05,0.02,-0.03,0.04,-0.06,0.06,-4.666064,7.204264
2,3,0.027683,0.17,0.02,0.19,0.04,0.26,0.03,-738857000000.0,-822139800000.0
3,2,0.064373,0.25,0.03,0.28,0.04,0.38,0.05,13.67099,-20.69882
4,1,0.035357,-0.19,0.03,-0.16,0.04,-0.26,0.09,-2.304483,-6.525544


In [12]:
ff3= df_manager.merge(ff[['Year', 'MKT2', 'MKT', 'HML', 'SMB','Rf']], on=['Year'], how='inner')
ff3['excess_return'] = ff3.Return - ff3.Rf
ff3= ff3.merge(dc[['Manager', 'degree_centrality']], on=['Manager'], how='inner')
ff3

Unnamed: 0,Manager,Year,Return,MKT2,MKT,HML,SMB,Rf,excess_return,degree_centrality
0,30000000000000013242,1,-0.186805,0.035357,-0.19,-2.304483e+00,-6.525544e+00,0.03,-0.216805,0.0
1,30000000000000013242,2,0.350415,0.064373,0.25,1.367099e+01,-2.069882e+01,0.03,0.320415,0.0
2,30000000000000013242,3,0.399091,0.027683,0.17,-7.388570e+11,-8.221398e+11,0.02,0.379091,0.0
3,30000000000000013242,4,-0.046272,0.002532,-0.05,-4.666064e+00,7.204264e+00,0.02,-0.066272,0.0
4,30000000000000013242,5,-0.152575,0.025984,-0.16,-1.377054e+01,5.098288e+00,0.02,-0.172575,0.0
...,...,...,...,...,...,...,...,...,...,...
1245,30593893,1,-0.147237,0.035357,-0.19,-2.304483e+00,-6.525544e+00,0.03,-0.177237,0.0
1246,30593893,2,0.267458,0.064373,0.25,1.367099e+01,-2.069882e+01,0.03,0.237458,0.0
1247,30593893,3,0.390736,0.027683,0.17,-7.388570e+11,-8.221398e+11,0.02,0.370736,0.0
1248,30593893,4,0.180475,0.002532,-0.05,-4.666064e+00,7.204264e+00,0.02,0.160475,0.0


In [13]:
# Sort rows by degree_centrality (descending) and split them into 3 groups of roughly equal size
sorted_df = ff3.sort_values(by='degree_centrality', ascending=False)
group_size = len(sorted_df) // 3  # number of manager-year rows per group

df1 = sorted_df[:group_size]                 # highest centrality
df2 = sorted_df[group_size:2 * group_size]
df3 = sorted_df[2 * group_size:]             # lowest centrality
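
Slicing the sorted panel splits manager-year rows, so when many managers share the same centrality a manager's five rows can end up in different groups. One way to avoid this, sketched below on hypothetical values, is to assign the group once per manager with `pd.qcut` and then merge the label back into the panel.

```python
import pandas as pd

# Hypothetical manager-level centrality values (one row per manager)
manager_dc = pd.DataFrame({
    "Manager": ["m1", "m2", "m3", "m4", "m5", "m6"],
    "degree_centrality": [0.0, 0.0, 0.03, 0.05, 0.12, 0.14],
})

# Rank first so that ties (e.g. several managers at 0.0) do not produce duplicate bin edges
ranks = manager_dc["degree_centrality"].rank(method="first", ascending=False)

# Q1 = highest centrality, Q3 = lowest
manager_dc["group"] = pd.qcut(ranks, q=3, labels=["Q1", "Q2", "Q3"])
print(manager_dc)
```

The resulting `group` column can be merged into the panel on `Manager`, after which each group's rows are selected with a boolean filter.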

### average 'age'

In [14]:
# Average fund age per manager
reg = df_fund.groupby('Manager')[["age"]].mean().reset_index()
reg

Unnamed: 0,Manager,age
0,30000000000000013242,14.083562
1,30000000000000013327,5.797260
2,30000000000000013376,6.689041
3,30000000000000013382,5.797260
4,30000000000000013404,6.715068
...,...,...
245,30383013,8.560274
246,30391589,5.019178
247,30391651,5.016438
248,30415976,5.600000


## regression 1

In [15]:
# Create an empty DataFrame to store the model results
df_results = pd.DataFrame()

In [16]:
# Q1
X = df1[['MKT', 'MKT2', 'SMB', 'HML']]
Y = df1['excess_return']

# Add an intercept term
X = sm.add_constant(X)

# Fit the model
model1 = sm.OLS(Y, X).fit()

In [17]:
# Q2
X = df2[['MKT', 'MKT2', 'SMB', 'HML']]
Y = df2['excess_return']

# Add an intercept term
X = sm.add_constant(X)

# Fit the model
model2 = sm.OLS(Y, X).fit()

In [18]:
# Q3
X = df3[['MKT', 'MKT2', 'SMB', 'HML']]
Y = df3['excess_return']

# Add an intercept term
X = sm.add_constant(X)

# Fit the model
model3 = sm.OLS(Y, X).fit()

In [19]:
model1.summary()

0,1,2,3
Dep. Variable:,excess_return,R-squared:,0.731
Model:,OLS,Adj. R-squared:,0.728
Method:,Least Squares,F-statistic:,279.3
Date:,"Sun, 14 May 2023",Prob (F-statistic):,9.4e-116
Time:,23:43:26,Log-Likelihood:,209.94
No. Observations:,416,AIC:,-409.9
Df Residuals:,411,BIC:,-389.7
Df Model:,4,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,0.1758,0.024,7.308,0.000,0.129,0.223
MKT,1.2663,0.085,14.924,0.000,1.099,1.433
MKT2,-3.7217,0.765,-4.866,0.000,-5.225,-2.218
SMB,-0.0018,0.001,-1.657,0.098,-0.004,0.000
HML,0.0020,0.001,1.657,0.098,-0.000,0.004

0,1,2,3
Omnibus:,30.775,Durbin-Watson:,1.846
Prob(Omnibus):,0.0,Jarque-Bera (JB):,44.282
Skew:,0.544,Prob(JB):,2.42e-10
Kurtosis:,4.17,Cond. No.,52800000000000.0


In [20]:
model2.summary()

0,1,2,3
Dep. Variable:,excess_return,R-squared:,0.712
Model:,OLS,Adj. R-squared:,0.709
Method:,Least Squares,F-statistic:,254.3
Date:,"Sun, 14 May 2023",Prob (F-statistic):,1e-109
Time:,23:43:26,Log-Likelihood:,195.15
No. Observations:,416,AIC:,-380.3
Df Residuals:,411,BIC:,-360.1
Df Model:,4,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,0.1963,0.025,7.964,0.000,0.148,0.245
MKT,1.2627,0.088,14.366,0.000,1.090,1.435
MKT2,-4.5987,0.790,-5.819,0.000,-6.152,-3.045
SMB,-0.0025,0.001,-2.220,0.027,-0.005,-0.000
HML,0.0027,0.001,2.220,0.027,0.000,0.005

0,1,2,3
Omnibus:,76.474,Durbin-Watson:,1.8
Prob(Omnibus):,0.0,Jarque-Bera (JB):,175.959
Skew:,0.938,Prob(JB):,6.18e-39
Kurtosis:,5.575,Cond. No.,52000000000000.0


In [21]:
model3.summary()

0,1,2,3
Dep. Variable:,excess_return,R-squared:,0.601
Model:,OLS,Adj. R-squared:,0.598
Method:,Least Squares,F-statistic:,155.8
Date:,"Sun, 14 May 2023",Prob (F-statistic):,3.87e-81
Time:,23:43:26,Log-Likelihood:,109.26
No. Observations:,418,AIC:,-208.5
Df Residuals:,413,BIC:,-188.3
Df Model:,4,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,0.2005,0.031,6.566,0.000,0.140,0.261
MKT,1.1592,0.108,10.762,0.000,0.947,1.371
MKT2,-4.5939,0.973,-4.724,0.000,-6.506,-2.682
SMB,-0.0027,0.001,-2.030,0.043,-0.005,-8.67e-05
HML,0.0030,0.001,2.030,0.043,9.65e-05,0.006

0,1,2,3
Omnibus:,230.544,Durbin-Watson:,1.837
Prob(Omnibus):,0.0,Jarque-Bera (JB):,2837.195
Skew:,2.077,Prob(JB):,0.0
Kurtosis:,15.069,Cond. No.,52600000000000.0


In [43]:
# Overall
X = ff3[['MKT', 'MKT2', 'SMB', 'HML']]
Y = ff3['excess_return']

# Add an intercept term
X = sm.add_constant(X)

# Fit the model
model = sm.OLS(Y, X).fit()
model.summary()

0,1,2,3
Dep. Variable:,excess_return,R-squared:,0.678
Model:,OLS,Adj. R-squared:,0.677
Method:,Least Squares,F-statistic:,655.9
Date:,"Mon, 15 May 2023",Prob (F-statistic):,1.32e-304
Time:,00:03:31,Log-Likelihood:,497.72
No. Observations:,1250,AIC:,-985.4
Df Residuals:,1245,BIC:,-959.8
Df Model:,4,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,0.1909,0.015,12.471,0.000,0.161,0.221
MKT,1.2292,0.054,22.689,0.000,1.123,1.336
MKT2,-4.3039,0.488,-8.815,0.000,-5.262,-3.346
SMB,-0.0023,0.001,-3.409,0.001,-0.004,-0.001
HML,0.0026,0.001,3.409,0.001,0.001,0.004

0,1,2,3
Omnibus:,450.58,Durbin-Watson:,1.65
Prob(Omnibus):,0.0,Jarque-Bera (JB):,3836.528
Skew:,1.427,Prob(JB):,0.0
Kurtosis:,11.094,Cond. No.,52500000000000.0


In [40]:
coefficients = model.params
coefficients

const    0.190857
MKT      1.229247
MKT2    -4.303880
SMB     -0.002312
HML      0.002573
dtype: float64

## regression 2 

In [23]:
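# Y: alpha = excess return minus the part explained by the factors (the intercept is not subtracted)
ff4 = ff3.copy()
# Use label-based lookups; positional indexing such as coefficients[1] on a labelled Series is deprecated
ff4['alpha'] = ff3.excess_return - (ff3.MKT*coefficients['MKT'] + ff3.MKT2*coefficients['MKT2'] + ff3.SMB*coefficients['SMB'] + ff3.HML*coefficients['HML'])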
ff4

Unnamed: 0,Manager,Year,Return,MKT2,MKT,HML,SMB,Rf,excess_return,degree_centrality,alpha
0,30000000000000013242,1,-0.186805,0.035357,-0.19,-2.304483e+00,-6.525544e+00,0.03,-0.216805,0.0,0.159766
1,30000000000000013242,2,0.350415,0.064373,0.25,1.367099e+01,-2.069882e+01,0.03,0.320415,0.0,0.207134
2,30000000000000013242,3,0.399091,0.027683,0.17,-7.388570e+11,-8.221398e+11,0.02,0.379091,0.0,0.177557
3,30000000000000013242,4,-0.046272,0.002532,-0.05,-4.666064e+00,7.204264e+00,0.02,-0.066272,0.0,0.034747
4,30000000000000013242,5,-0.152575,0.025984,-0.16,-1.377054e+01,5.098288e+00,0.02,-0.172575,0.0,0.183149
...,...,...,...,...,...,...,...,...,...,...,...
1245,30593893,1,-0.147237,0.035357,-0.19,-2.304483e+00,-6.525544e+00,0.03,-0.177237,0.0,0.199334
1246,30593893,2,0.267458,0.064373,0.25,1.367099e+01,-2.069882e+01,0.03,0.237458,0.0,0.124176
1247,30593893,3,0.390736,0.027683,0.17,-7.388570e+11,-8.221398e+11,0.02,0.370736,0.0,0.169201
1248,30593893,4,0.180475,0.002532,-0.05,-4.666064e+00,7.204264e+00,0.02,0.160475,0.0,0.261493
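
The `alpha` column is the excess return minus each factor multiplied by its estimated coefficient. The same calculation can be written as a matrix product, shown here with hypothetical coefficients and two made-up rows.

```python
import pandas as pd

# Hypothetical coefficients and factor data, for illustration only
coefs = pd.Series({"const": 0.19, "MKT": 1.2, "MKT2": -4.0})
panel = pd.DataFrame({
    "excess_return": [0.30, -0.10],
    "MKT": [0.25, -0.05],
    "MKT2": [0.0625, 0.0025],
})

factor_cols = ["MKT", "MKT2"]
# Explained part = sum over factors of (factor value * coefficient); the intercept is left out
explained = panel[factor_cols] @ coefs[factor_cols]
panel["alpha"] = panel["excess_return"] - explained
print(panel)
```

For the first row the explained part is 0.25 * 1.2 + 0.0625 * (-4.0) = 0.05, so alpha is 0.25.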


In [24]:
df_no_duplicates = fund_managers_df.drop_duplicates(subset='Manager')

In [25]:
df_sizee

Unnamed: 0,Manager,Year,size
0,30000000000000013242,1,2487.3250
1,30000000000000013327,1,101.6400
2,30000000000000013376,1,1138.5100
3,30000000000000013382,1,106.1100
4,30000000000000013404,1,612.7700
...,...,...,...
1245,30383013,5,1588.3950
1246,30391589,5,482.3900
1247,30391651,5,8790.2250
1248,30415976,5,387.3100


In [26]:
# X: merge manager age, centrality, alpha, business-school dummy and fund size
ff5= reg.merge(ff4[['Manager', 'degree_centrality',"alpha","Year"]], on=['Manager'], how='inner')
ff6= ff5.merge(df_no_duplicates[['Manager', 'DB']], on=['Manager'], how='inner')
ff6= ff6.merge(df_sizee[['Manager', "size", "Year"]], on=['Manager',"Year"], how='inner')
ff6["age"] = ff6["age"]-(5-ff6["Year"]) # 'age' is measured at year 5; shift it back to each earlier year
ff6

Unnamed: 0,Manager,age,degree_centrality,alpha,Year,DB,size
0,30000000000000013242,10.083562,0.0,0.159766,1,1,2487.3250
1,30000000000000013242,11.083562,0.0,0.207134,2,1,1695.5450
2,30000000000000013242,12.083562,0.0,0.177557,3,1,2298.1050
3,30000000000000013242,13.083562,0.0,0.034747,4,1,1909.7650
4,30000000000000013242,14.083562,0.0,0.183149,5,1,1501.1500
...,...,...,...,...,...,...,...
1245,30593893,2.080137,0.0,0.199334,1,1,614.1925
1246,30593893,3.080137,0.0,0.124176,2,1,461.9575
1247,30593893,4.080137,0.0,0.169201,3,1,563.1550
1248,30593893,5.080137,0.0,0.261493,4,1,648.5725


In [28]:
ff7 = ff6.copy()
ff7["dc100"] = ff6["degree_centrality"]*100
ff7["lsize"] = np.log(ff6["size"])
ff7['lalpha']=np.log(ff7['alpha'])
ff7



Unnamed: 0,Manager,age,degree_centrality,alpha,Year,DB,size,dc100,lsize,lalpha
0,30000000000000013242,10.083562,0.0,0.159766,1,1,2487.3250,0.0,7.818963,-1.834043
1,30000000000000013242,11.083562,0.0,0.207134,2,1,1695.5450,0.0,7.435760,-1.574390
2,30000000000000013242,12.083562,0.0,0.177557,3,1,2298.1050,0.0,7.739840,-1.728465
3,30000000000000013242,13.083562,0.0,0.034747,4,1,1909.7650,0.0,7.554735,-3.359661
4,30000000000000013242,14.083562,0.0,0.183149,5,1,1501.1500,0.0,7.313987,-1.697457
...,...,...,...,...,...,...,...,...,...,...
1245,30593893,2.080137,0.0,0.199334,1,1,614.1925,0.0,6.420308,-1.612773
1246,30593893,3.080137,0.0,0.124176,2,1,461.9575,0.0,6.135473,-2.086054
1247,30593893,4.080137,0.0,0.169201,3,1,563.1550,0.0,6.333555,-1.776666
1248,30593893,5.080137,0.0,0.261493,4,1,648.5725,0.0,6.474774,-1.341347
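
Because `alpha` can be negative and `size` can be zero, `np.log` returns NaN or -inf for some rows and emits runtime warnings. A minimal sketch of one way to handle this, on hypothetical values, is to mask non-positive inputs before taking the log.

```python
import numpy as np
import pandas as pd

values = pd.Series([0.16, 0.0, -0.20, 1.75])  # hypothetical alpha-like values

# Take logs only where the input is strictly positive; other rows become NaN
safe_log = np.log(values.where(values > 0))
print(safe_log)
```

Rows that become NaN would then need to be dropped or handled explicitly before any regression that uses the logged variable.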


alpha ~ degree_centrality + age + size + DB


In [30]:
# model 2
X = ff7[['dc100',"DB","size","age"]]
Y = ff7['alpha']

# Add an intercept term
X = sm.add_constant(X)

# Fit the model; keep the fitted result so it can be reused, then display the summary
modelf = sm.OLS(Y, X).fit()
modelf.summary()

0,1,2,3
Dep. Variable:,alpha,R-squared:,0.009
Model:,OLS,Adj. R-squared:,0.006
Method:,Least Squares,F-statistic:,2.879
Date:,"Sun, 14 May 2023",Prob (F-statistic):,0.0217
Time:,23:44:36,Log-Likelihood:,503.48
No. Observations:,1250,AIC:,-997.0
Df Residuals:,1245,BIC:,-971.3
Df Model:,4,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,0.2160,0.014,15.121,0.000,0.188,0.244
dc100,0.0005,0.001,0.570,0.569,-0.001,0.002
DB,-0.0187,0.010,-1.795,0.073,-0.039,0.002
size,-3.796e-06,1.5e-06,-2.524,0.012,-6.75e-06,-8.45e-07
age,-0.0017,0.002,-0.773,0.439,-0.006,0.003

0,1,2,3
Omnibus:,442.891,Durbin-Watson:,1.652
Prob(Omnibus):,0.0,Jarque-Bera (JB):,3725.822
Skew:,1.401,Prob(JB):,0.0
Kurtosis:,10.98,Cond. No.,11700.0
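
The specification `alpha ~ degree_centrality + age + size + DB` can also be written directly with the statsmodels formula interface. The sketch below uses synthetic data with the same column names; the simulated coefficients are made up.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_obs = 400

# Synthetic data with the same column names as ff6 (illustrative only)
demo = pd.DataFrame({
    "degree_centrality": rng.uniform(0, 0.15, n_obs),
    "age": rng.uniform(1, 15, n_obs),
    "size": rng.uniform(100, 5000, n_obs),
    "DB": rng.integers(0, 2, n_obs),
})
demo["alpha"] = (0.2 + 0.5 * demo["degree_centrality"]
                 - 0.02 * demo["DB"] + rng.normal(0, 0.05, n_obs))

# The formula interface adds the intercept automatically
formula_fit = smf.ols("alpha ~ degree_centrality + age + size + DB", data=demo).fit()
print(formula_fit.params.round(4))
```

The formula version keeps the model statement next to the code and removes the separate `sm.add_constant` step.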


# Stat

In [39]:
# numeric_only=True skips the non-numeric Manager column explicitly
ff6.corr(numeric_only=True).round(4)


Unnamed: 0,age,degree_centrality,alpha,Year,DB,size
age,1.0,-0.0014,-0.0383,0.6671,0.1196,0.1385
degree_centrality,-0.0014,1.0,0.0178,-0.0,-0.1036,0.0521
alpha,-0.0383,0.0178,1.0,0.0,-0.0539,-0.073
Year,0.6671,-0.0,0.0,1.0,-0.0,0.1758
DB,0.1196,-0.1036,-0.0539,-0.0,1.0,-0.0246
size,0.1385,0.0521,-0.073,0.1758,-0.0246,1.0


In [35]:
ff6.describe()

Unnamed: 0,age,degree_centrality,alpha,Year,DB,size
count,1250.0,1250.0,1250.0,1250.0,1250.0,1250.0
mean,4.959345,0.057496,0.190848,3.0,0.728,1517.734873
std,2.120797,0.057045,0.162558,1.41478,0.445168,3085.179176
min,1.005479,0.0,-0.20452,1.0,0.0,0.0
25%,3.439726,0.003937,0.097964,2.0,0.0,203.3525
50%,4.813014,0.031496,0.172904,3.0,1.0,581.275
75%,6.185445,0.122047,0.267711,4.0,1.0,1563.96125
max,14.794521,0.145669,1.756578,5.0,1.0,36671.335


In [36]:
ff3.describe()

Unnamed: 0,Year,Return,MKT2,MKT,HML,SMB,Rf,excess_return,degree_centrality
count,1250.0,1250.0,1250.0,1250.0,1250.0,1250.0,1250.0,1250.0,1250.0
mean,3.0,0.111148,0.031186,0.004,-147771400000.0,-164428000000.0,0.024,0.087148,0.057496
std,1.41478,0.2862,0.019904,0.176434,295661100000.0,328987600000.0,0.004901,0.286549,0.057045
min,1.0,-0.456636,0.002532,-0.19,-738857000000.0,-822139800000.0,0.02,-0.476636,0.0
25%,2.0,-0.132026,0.025984,-0.16,-13.77054,-20.69882,0.02,-0.154591,0.003937
50%,3.0,0.065046,0.027683,-0.05,-4.666064,-6.525544,0.02,0.042274,0.031496
75%,4.0,0.312331,0.035357,0.17,-2.304483,5.098288,0.03,0.288421,0.122047
max,5.0,1.675559,0.064373,0.25,13.67099,7.204264,0.03,1.655559,0.145669
