## **Dividend Capture Strategy and Ex-Dividend Day Anomalies**

### Objective
This project models a **dividend capture strategy** to test the **Efficient Markets Hypothesis (EMH)**.
It examines whether investors can earn abnormal returns by buying stocks prior to their ex-dividend date
and selling them afterward.

The analysis replicates and extends the findings from Kalay (1982), using daily CRSP data and Fama-French
factors.

### Methodology Overview
1. Clean and preprocess CRSP dividend event data.  
2. Compute **prior-day prices**, **market capitalization**, and **dividend yields**.  
3. Estimate expected returns using the **CAPM** with pre-computed yearly betas.  
4. Calculate **abnormal returns**, **Sharpe ratios**, and **t-statistics**.  
5. Evaluate strategy performance for:  
   - Close-to-close returns  
   - Close-to-open (overnight) returns  
   - Returns after transaction costs  
6. Sort results by **Market Cap**, **Dividend Yield (D/P)**, and **Beta Deciles**.

### Key Insights
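- The apparent profitability of the dividend capture strategy disappears once transaction costs are included: mean net returns are negative in every decile.  
- Before costs, CAPM abnormal returns are statistically significant (t-statistic 47.81); the results are consistent with market efficiency only after costs.  
- Larger-cap and higher-beta stocks show narrower bid-ask spreads; larger-cap stocks also earn lower ex-dividend-day returns.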

### Tools
Python (pandas, numpy, statsmodels), Jupyter Notebook | Data: CRSP, Fama-French Factors (WRDS)


In [1]:
import numpy as np
import pandas as pd

# Load the CSV file into a DataFrame
dividend_df = pd.read_csv('CRSP_Dividends_HW3.csv', index_col='date', parse_dates=True)
dividend_df

date,PERMNO,SHRCD,PERMCO,DISTCD,DIVAMT,PRC,VOL,RET,BID,ASK,SHROUT,OPENPRC,RETX,vwretd
1986-03-10,10001,11,7953,1232,0.095,-6.2500,100.0,0.015200,,,985.0,,0.000000,0.004365
1986-06-09,10001,11,7953,1232,0.105,-6.1875,1050.0,0.016970,,,985.0,,0.000000,-0.019374
1986-09-08,10001,11,7953,1232,0.105,6.7500,1610.0,0.054615,6.375,6.75,985.0,,0.038462,-0.010801
1986-12-08,10001,11,7953,1232,0.105,6.5000,400.0,0.016154,6.500,7.00,991.0,,0.000000,-0.000944
1987-03-09,10001,11,7953,1232,0.105,6.1250,650.0,0.027629,5.875,6.25,991.0,,0.010309,-0.006488
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-05-27,93429,11,53447,1232,0.480,111.8500,491916.0,0.026782,111.620,111.70,106172.0,109.29,0.022395,0.024482
2022-08-30,93429,11,53447,1232,0.480,117.7800,825426.0,-0.017203,117.940,117.99,106189.0,119.83,-0.021192,-0.011447
2022-11-29,93429,11,53447,1232,0.500,123.2500,555365.0,-0.007379,123.250,123.27,106062.0,124.44,-0.011390,-0.000744
2023-02-27,93429,11,53447,1232,0.500,127.5800,577552.0,-0.012414,127.410,127.59,106082.0,129.64,-0.016270,0.003326


Question 1: Data cleaning

In [2]:
# Drop rows with zero volume: these are days on which no shares traded.
dividend_df = dividend_df[dividend_df['VOL'] != 0]
# Inspect the remaining volume distribution; 0.0 should no longer appear
dividend_df['VOL'].value_counts()

VOL
100.0        7316
200.0        6789
300.0        5525
500.0        5352
400.0        4853
             ... 
555365.0        1
1086317.0       1
839393.0        1
360050.0        1
3250802.0       1
Name: count, Length: 109521, dtype: int64

In [3]:
# Keep only dividends strictly greater than $0.01; this also drops zero and negative amounts.
dividend_df = dividend_df[dividend_df['DIVAMT'] > 0.01]
# DIVAMT now contains only values above 0.01
dividend_df['DIVAMT'].describe()

count    430826.000000
mean          0.232045
std           0.451217
min           0.010250
25%           0.090000
50%           0.165000
75%           0.300000
max          85.000000
Name: DIVAMT, dtype: float64

In [4]:
# CRSP reports a negative price when the value is a bid/ask average rather than a trade price;
# use the absolute value.
dividend_df['PRC'] = dividend_df['PRC'].abs()
# PRC now has no negative values
dividend_df['PRC'].describe()

count    430826.000000
mean         31.803662
std          48.606915
min           0.031250
25%          14.625000
50%          23.750000
75%          37.375000
max        4280.040040
Name: PRC, dtype: float64

In [5]:
# Remove stocks with a market capitalization below $50M. SHROUT is reported in thousands of shares.
dividend_df['MKT_CAP'] = dividend_df['PRC'] * dividend_df['SHROUT'] / 1000  # MKT_CAP in $ millions
dividend_df = dividend_df[dividend_df['MKT_CAP'] >= 50]
# MKT_CAP now has no values below 50
dividend_df['MKT_CAP'].describe()

count    3.330350e+05
mean     5.052210e+03
std      2.943542e+04
min      5.000000e+01
25%      1.537730e+02
50%      4.859085e+02
75%      1.981762e+03
max      2.813308e+06
Name: MKT_CAP, dtype: float64

In [6]:
# Summary statistics (including the mean) for price, dividend amount, and market cap of the cleaned dataset
dividend_df[['PRC', 'DIVAMT', 'MKT_CAP']].describe()

,PRC,DIVAMT,MKT_CAP
count,333035.0,333035.0,333035.0
mean,37.030485,0.255323,5052.21
std,53.972846,0.495639,29435.42
min,1.01,0.0108,50.0
25%,18.75,0.1,153.773
50%,28.0,0.195,485.9085
75%,42.46,0.325,1981.762
max,4280.04004,85.0,2813308.0


Question 2

In [7]:
# Back out the prior day's close from PRC and RETX.
# RETX excludes dividends, so PRC / (1 + RETX) recovers the previous closing price.
dividend_df['PriorClose'] = dividend_df['PRC'] / (1 + dividend_df['RETX'])
dividend_df.head()

date,PERMNO,SHRCD,PERMCO,DISTCD,DIVAMT,PRC,VOL,RET,BID,ASK,SHROUT,OPENPRC,RETX,vwretd,MKT_CAP,PriorClose
2010-05-12,10001,11,7953,1222,0.045,10.53,10100.0,-0.007974,10.53,10.55,6070.0,10.61,-0.012195,0.016275,63.9171,10.659999
2010-06-11,10001,11,7953,1222,0.045,11.79,10600.0,-0.002949,11.8,11.82,6071.0,11.89,-0.00674,0.006196,71.57709,11.870004
2010-07-13,10001,11,7953,1222,0.045,11.0,5100.0,0.004091,10.98,11.04,6080.0,10.98,0.0,0.01722,66.88,11.0
2010-08-11,10001,11,7953,1222,0.045,11.76,9000.0,0.000424,11.76,11.89,6080.0,11.85,-0.00339,-0.029241,71.5008,11.800002
2010-09-13,10001,11,7953,1222,0.045,11.1,5500.0,-0.004911,11.07,11.08,6073.0,11.23,-0.008929,0.012873,67.4103,11.200005
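
The choice of RETX over RET can be checked on the first three rows shown above. This is a small sketch, assuming the usual definitions RETX = P_t / P_{t-1} - 1 and RET = (P_t + D) / P_{t-1} - 1; the column names `PriorClose_from_RETX` and `PriorClose_from_RET` are made up for the illustration. If those definitions hold, both routes should give nearly the same prior close.

```python
import pandas as pd

# Values copied from the first three rows of the output above
events = pd.DataFrame({
    "PRC":    [10.53, 11.79, 11.00],
    "DIVAMT": [0.045, 0.045, 0.045],
    "RET":    [-0.007974, -0.002949, 0.004091],
    "RETX":   [-0.012195, -0.006740, 0.000000],
})

# RETX excludes the dividend: P_{t-1} = P_t / (1 + RETX)
events["PriorClose_from_RETX"] = events["PRC"] / (1 + events["RETX"])

# RET includes the dividend: P_{t-1} = (P_t + D) / (1 + RET)
events["PriorClose_from_RET"] = (events["PRC"] + events["DIVAMT"]) / (1 + events["RET"])

# The two reconstructions should differ only by rounding in the reported returns
events["diff"] = events["PriorClose_from_RETX"] - events["PriorClose_from_RET"]
print(events[["PriorClose_from_RETX", "PriorClose_from_RET", "diff"]])
```

Dividing PRC by (1 + RET) without adding the dividend back would understate the prior close by roughly the dividend amount, which is why the notebook uses RETX.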


In [8]:
# create column (Pt-1 – Pt) / D
dividend_df['PriceChangevsDividend'] = (dividend_df['PriorClose'] - dividend_df['PRC']) / dividend_df['DIVAMT']
dividend_df['PriceChangevsDividend'].describe() 

count    333035.000000
mean          0.635993
std           9.282540
min        -485.002094
25%          -0.799603
50%           0.707476
75%           2.205901
max         922.868789
Name: PriceChangevsDividend, dtype: float64

The (Pt-1 – Pt) / D ratio measures how much the stock price drops on the ex-dividend day relative to the dividend amount. In a frictionless, efficient market (no taxes or transaction costs), the price should fall by approximately the dividend amount when the stock goes ex-dividend, so the ratio should be close to 1 under the EMH.

Mean = 0.636
On average, the price drop on the ex-dividend day is only about 64% of the dividend amount. An investor who buys at the prior close and sells at the ex-dividend-day close therefore keeps, on average, about 36% of the dividend as a short-term gain before costs.
Range: from -485 to 923, which is extremely wide and is likely driven by outliers or by stocks with very small dividends. The median (0.707) is close to the mean.

The average ratio is well below 1, which at face value is evidence against market efficiency and suggests a potential profit from buying just before the ex-dividend date and selling on the ex-dividend day. The standard deviation of 9.28 shows that any single trade is far from riskless, so this is a statistical regularity rather than a true arbitrage.
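
Whether the average ratio is statistically different from 1 can be tested directly. This is a minimal sketch on synthetic, fat-tailed data rather than the CRSP sample; the helper names `t_stat_vs_one` and `winsorize` are hypothetical, as are the 1%/99% winsorization bounds. It computes a one-sample t-statistic against the EMH benchmark of 1, before and after limiting the influence of extreme values.

```python
import numpy as np
import pandas as pd

def t_stat_vs_one(ratio: pd.Series) -> float:
    """t-statistic for H0: mean(ratio) == 1, ignoring missing values."""
    r = ratio.dropna()
    return (r.mean() - 1.0) / (r.std(ddof=1) / np.sqrt(len(r)))

def winsorize(s: pd.Series, lower: float = 0.01, upper: float = 0.99) -> pd.Series:
    """Clip values outside the chosen quantiles (bounds are an assumption)."""
    lo = s.quantile(lower)
    hi = s.quantile(upper)
    return s.clip(lower=lo, upper=hi)

# Synthetic stand-in for PriceChangevsDividend: heavy tails around a made-up centre
rng = np.random.default_rng(0)
ratio = pd.Series(rng.standard_t(df=3, size=10_000) * 3 + 0.64)

print("raw t-stat vs 1:       ", round(t_stat_vs_one(ratio), 2))
print("winsorized t-stat vs 1:", round(t_stat_vs_one(winsorize(ratio)), 2))
```

On the real data, the same call would be `t_stat_vs_one(dividend_df['PriceChangevsDividend'])`.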

In [9]:
fama_french = pd.read_csv('FamaFrenchDaily_HW3.csv', index_col='date', parse_dates=True)
fama_french

date,Mkt-RF,SMB,HML,RF
1926-07-01,0.10,-0.25,-0.27,0.009
1926-07-02,0.45,-0.33,-0.06,0.009
1926-07-06,0.17,0.30,-0.39,0.009
1926-07-07,0.09,-0.58,0.02,0.009
1926-07-08,0.21,-0.38,0.19,0.009
...,...,...,...,...
2024-09-24,0.24,0.12,-0.58,0.020
2024-09-25,-0.29,-0.66,-0.70,0.020
2024-09-26,0.42,0.21,0.43,0.020
2024-09-27,-0.08,0.44,0.55,0.020


In [10]:
# combine the two dataframes on the date index
combined_df = pd.merge(dividend_df, fama_french, left_index=True, right_index=True, how='inner')
combined_df

date,PERMNO,SHRCD,PERMCO,DISTCD,DIVAMT,PRC,VOL,RET,BID,ASK,...,OPENPRC,RETX,vwretd,MKT_CAP,PriorClose,PriceChangevsDividend,Mkt-RF,SMB,HML,RF
2010-05-12,10001,11,7953,1222,0.045,10.53,10100.0,-0.007974,10.53,10.55,...,10.61,-0.012195,0.016275,63.91710,10.659999,2.888860,1.63,1.47,0.15,0.001
2010-06-11,10001,11,7953,1222,0.045,11.79,10600.0,-0.002949,11.80,11.82,...,11.89,-0.006740,0.006196,71.57709,11.870004,1.777863,0.62,0.95,0.05,0.001
2010-07-13,10001,11,7953,1222,0.045,11.00,5100.0,0.004091,10.98,11.04,...,10.98,0.000000,0.017220,66.88000,11.000000,0.000000,1.76,1.61,0.63,0.001
2010-08-11,10001,11,7953,1222,0.045,11.76,9000.0,0.000424,11.76,11.89,...,11.85,-0.003390,-0.029241,71.50080,11.800002,0.888933,-2.91,-1.08,-0.64,0.001
2010-09-13,10001,11,7953,1222,0.045,11.10,5500.0,-0.004911,11.07,11.08,...,11.23,-0.008929,0.012873,67.41030,11.200005,2.222330,1.30,1.13,0.23,0.001
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-05-27,93429,11,53447,1232,0.480,111.85,491916.0,0.026782,111.62,111.70,...,109.29,0.022395,0.024482,11875.33820,109.399987,-5.104193,2.58,0.33,-1.24,0.001
2022-08-30,93429,11,53447,1232,0.480,117.78,825426.0,-0.017203,117.94,117.99,...,119.83,-0.021192,-0.011447,12506.94042,120.330034,5.312571,-1.11,-0.24,-0.24,0.008
2022-11-29,93429,11,53447,1232,0.500,123.25,555365.0,-0.007379,123.25,123.27,...,124.44,-0.011390,-0.000744,13072.14150,124.669991,2.839982,-0.18,0.10,1.02,0.014
2023-02-27,93429,11,53447,1232,0.500,127.58,577552.0,-0.012414,127.41,127.59,...,129.64,-0.016270,0.003326,13533.94156,129.690057,4.220114,0.31,0.16,-0.28,0.018


In [11]:
# Annualized return from buying just before the ex-dividend date and selling on the ex-dividend day.
# The per-event mean is scaled by 252, i.e. as if one such trade were available every trading day.
annual_return = combined_df['RET'].mean()*252 
annual_std = combined_df['RET'].std() * np.sqrt(252)
# Excess return: RF in the Fama-French file is quoted in percent, hence the division by 100
combined_df['Excess_RET'] = combined_df['RET'] - combined_df['RF']/100
sharpe_ratio = combined_df['Excess_RET'].mean() / combined_df['Excess_RET'].std() * np.sqrt(252)
print(f"Annual Return: {annual_return*100:.2f}%")
print(f"Annual Standard Deviation: {annual_std*100:.2f}")
print(f"Sharpe Ratio: {sharpe_ratio:.2f}")

Annual Return: 65.43%
Annual Standard Deviation: 35.62
Sharpe Ratio: 1.71


Question 3

In [12]:
Yearly_Beta = pd.read_csv('Yearly_Betas_HW3.csv', index_col = 'DATE', parse_dates = ['DATE'])
Yearly_Beta

DATE,PERMNO,n,RET,b_mkt,alpha,ivol,tvol,R2,exret
1988-12-30,10001,35,-2.1132%,0.0730,0.0025,4.1438%,4.1674%,1.1284%,-2.8553%
1989-12-29,10001,47,3.7975%,0.0799,0.0118,5.4991%,5.5165%,0.6286%,3.0953%
1990-12-31,10001,59,0.1299%,0.0986,0.0081,5.0472%,5.0760%,1.1321%,-0.7169%
1991-12-31,10001,60,-0.6780%,-0.0132,0.0135,5.7039%,5.7044%,0.0155%,-1.0810%
1992-12-31,10001,60,-1.5130%,-0.0178,0.0164,6.4095%,6.4099%,0.0116%,-1.7180%
...,...,...,...,...,...,...,...,...,...
2018-12-31,93436,60,-5.0445%,0.6010,0.0156,11.7355%,11.8998%,2.7418%,0.8336%
2019-12-31,93436,60,26.7897%,0.6279,0.0114,12.0413%,12.2509%,3.3923%,25.0443%
2020-12-31,93436,60,24.3252%,2.0822,0.0339,16.7153%,19.3599%,25.4539%,14.8778%
2021-12-31,93436,60,-7.6855%,1.9930,0.0416,17.3051%,19.6319%,22.2996%,-13.9528%


In [13]:
# Extract the year from DATE
Yearly_Beta['Year'] = Yearly_Beta.index.year
# To avoid look-ahead bias, match each event to the beta estimated through the end of the previous year
combined_df['Year'] = combined_df.index.year
combined_df['Beta_Year'] = combined_df['Year']-1
# Merge the yearly betas into combined_df on PERMNO and the lagged year
combined_df = pd.merge(combined_df, Yearly_Beta, left_on=['PERMNO', 'Beta_Year'], right_on=['PERMNO', 'Year'], how = 'inner')
combined_df

,PERMNO,SHRCD,PERMCO,DISTCD,DIVAMT,PRC,VOL,RET_x,BID,ASK,...,Beta_Year,n,RET_y,b_mkt,alpha,ivol,tvol,R2,exret,Year_y
0,10001,11,7953,1222,0.045,10.53,10100.0,-0.007974,10.53,10.55,...,2009,60,16.2621%,0.2416,0.0184,9.0589%,9.1302%,1.5549%,15.5712%,2009
1,10001,11,7953,1222,0.045,11.79,10600.0,-0.002949,11.80,11.82,...,2009,60,16.2621%,0.2416,0.0184,9.0589%,9.1302%,1.5549%,15.5712%,2009
2,10001,11,7953,1222,0.045,11.00,5100.0,0.004091,10.98,11.04,...,2009,60,16.2621%,0.2416,0.0184,9.0589%,9.1302%,1.5549%,15.5712%,2009
3,10001,11,7953,1222,0.045,11.76,9000.0,0.000424,11.76,11.89,...,2009,60,16.2621%,0.2416,0.0184,9.0589%,9.1302%,1.5549%,15.5712%,2009
4,10001,11,7953,1222,0.045,11.10,5500.0,-0.004911,11.07,11.08,...,2009,60,16.2621%,0.2416,0.0184,9.0589%,9.1302%,1.5549%,15.5712%,2009
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
303580,93429,11,53447,1232,0.480,111.85,491916.0,0.026782,111.62,111.70,...,2021,60,1.1323%,0.6066,0.0036,6.5854%,7.1645%,15.5130%,-0.7685%,2021
303581,93429,11,53447,1232,0.480,117.78,825426.0,-0.017203,117.94,117.99,...,2021,60,1.1323%,0.6066,0.0036,6.5854%,7.1645%,15.5130%,-0.7685%,2021
303582,93429,11,53447,1232,0.500,123.25,555365.0,-0.007379,123.25,123.27,...,2021,60,1.1323%,0.6066,0.0036,6.5854%,7.1645%,15.5130%,-0.7685%,2021
303583,93429,11,53447,1232,0.500,127.58,577552.0,-0.012414,127.41,127.59,...,2022,60,-1.0801%,0.5459,-0.0012,6.4244%,7.1152%,18.4736%,2.1468%,2022
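
The lagged-year merge can be sketched on toy data. The frames below are illustrative: only the 2009 beta of 0.2416 for PERMNO 10001 comes from the output above, and the 2010 beta and the 2011 event are made up. Two choices in the sketch are assumptions rather than the notebook's method. First, selecting only the beta columns before merging avoids the `RET_x` / `RET_y` suffixes. Second, `validate="many_to_one"` makes pandas raise an error if a (PERMNO, year) pair appears more than once in the beta table.

```python
import pandas as pd

# Toy dividend events (the 2011 row is made up)
events = pd.DataFrame({
    "date": pd.to_datetime(["2010-05-12", "2010-09-13", "2011-03-10"]),
    "PERMNO": [10001, 10001, 10001],
    "RET": [-0.007974, -0.004911, 0.002000],
})

# Toy year-end betas (the 2010 value is made up)
betas = pd.DataFrame({
    "DATE": pd.to_datetime(["2009-12-31", "2010-12-31"]),
    "PERMNO": [10001, 10001],
    "b_mkt": [0.2416, 0.3000],
})

# An event in year Y uses the beta estimated through the end of year Y-1
betas["Beta_Year"] = betas["DATE"].dt.year
events["Beta_Year"] = events["date"].dt.year - 1

merged = events.merge(
    betas[["PERMNO", "Beta_Year", "b_mkt"]],  # keep only what is needed
    on=["PERMNO", "Beta_Year"],
    how="inner",
    validate="many_to_one",  # each (PERMNO, Beta_Year) must be unique in betas
)
print(merged)
```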


In [14]:
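#compute the expected return from the CAPM, assuming alpha = 0
#note: vwretd is a decimal return, while RF comes from the Fama-French file in percent
#(cell 11 divides RF by 100); here RF enters the formula unscaled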
combined_df['alpha'] = 0
combined_df['EXP_RET'] = combined_df['b_mkt'] * (combined_df['vwretd'] -combined_df['RF']) +combined_df['RF']
print(f"Expected return:{combined_df['EXP_RET'].mean()*100:.2f}%")

Expected return:0.05%
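
The CAPM expected return is E[R] = RF + beta * (Rm - RF), which requires every input in the same units. Below is a minimal sketch of the calculation with the risk-free rate converted from percent to a decimal first. It assumes RF is quoted in percent, as in the Fama-French daily file, and that the market return is a decimal, as with CRSP `vwretd`. The function name `capm_expected_return` is hypothetical, and the two rows reuse values shown in the outputs above.

```python
import pandas as pd

def capm_expected_return(beta, mkt_ret, rf_pct):
    """CAPM expected return with alpha = 0.

    beta    : market beta
    mkt_ret : market return as a decimal (e.g. 0.016275)
    rf_pct  : risk-free rate in percent (e.g. 0.001 means 0.001%)
    """
    rf = rf_pct / 100.0  # percent -> decimal so units match mkt_ret
    return rf + beta * (mkt_ret - rf)

toy = pd.DataFrame({
    "RET":    [-0.007974, 0.026782],
    "b_mkt":  [0.2416, 0.6066],
    "vwretd": [0.016275, 0.024482],
    "RF":     [0.001, 0.001],
})

toy["EXP_RET"] = capm_expected_return(toy["b_mkt"], toy["vwretd"], toy["RF"])
toy["AB_RET"] = toy["RET"] - toy["EXP_RET"]  # abnormal return per event
print(toy)
```

With a beta close to 1, the risk-free terms nearly cancel, so the unit choice matters most for stocks whose beta is far from 1.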


In [15]:
#Compute the abnormal return for each dividend event
combined_df['AB_RET'] = combined_df['RET_x']-combined_df['EXP_RET']
print(combined_df['AB_RET'].describe())

count    303537.000000
mean          0.002008
std           0.023145
min          -0.727306
25%          -0.010166
50%           0.000740
75%           0.012481
max           0.687038
Name: AB_RET, dtype: float64


In [16]:
#Compute the mean and standard deviation of the abnormal return (annualized) along with the average beta
annual_abreturn = combined_df['AB_RET'].mean()*252
annual_abstd = combined_df['AB_RET'].std()*np.sqrt(252)
ave_Beta = combined_df['b_mkt'].mean()
print(f"Annual abnormal return mean:{annual_abreturn*100:.2f}%")
print(f"Annual abnormal return std:{annual_abstd*100:.2f}")
print(f"Average Beta:{ave_Beta:.2f}")

Annual abnormal return mean:50.61%
Annual abnormal return std:36.74
Average Beta:0.97


In [17]:
#Compute the t-statistic and Sharpe ratio of the abnormal return.
t_stat = (combined_df['AB_RET'].mean()/combined_df['AB_RET'].std())*np.sqrt(len(combined_df))
Sharpe_Ratio = (combined_df['AB_RET'].mean()/combined_df['AB_RET'].std())*np.sqrt(252)
print(f"t_statistic:{t_stat:.2f}")
print(f"Sharpe_Ratio:{Sharpe_Ratio:.2f}")

t_statistic:47.81
Sharpe_Ratio:1.38


The abnormal return is statistically significant, given the t-statistic of 47.81.

Compared with the average annual return of 65.43% in Question 2, the annualized abnormal return of 50.61% accounts for the majority of the total return. Only about 15 percentage points are attributed to market exposure under the CAPM, so the high ex-dividend-day returns are largely not explained by systematic risk.

Question 4

In [18]:
#when buying at the closing price on the prior day and selling at the open price
combined_df['RET_OPEN']=((combined_df['OPENPRC']+combined_df['DIVAMT'])/combined_df['PriorClose'])-1
Annual_Return = combined_df['RET_OPEN'].mean()*252
Annual_Std = combined_df['RET_OPEN'].std()*np.sqrt(252)
T_stat = (combined_df['RET_OPEN'].mean()/combined_df['RET_OPEN'].std())*np.sqrt(len(combined_df))
Sharpe_R = ((combined_df['RET_OPEN'] - combined_df['RF']/100).mean()/(combined_df['RET_OPEN'] - combined_df['RF']/100).std())*np.sqrt(252)
print(f"Annual return mean:{Annual_Return:.2%}")
print(f"Annual return std:{Annual_Std*100:.2f}")
print(f"T_statistic:{T_stat:.2f}")
print(f"Sharpe_Ratio:{Sharpe_R:.2f}")


Annual return mean:38.34%
Annual return std:23.16
T_statistic:57.46
Sharpe_Ratio:1.55


These returns are statistically significant, as indicated by the t-statistic of 57.46.

The overnight strategy, which buys at the prior close and sells at the ex-dividend-day open, earns an annualized return of 38.34%, lower than the 65.43% earned by the full-day close-to-close strategy.

One possible explanation is that part of the return is earned during trading hours, when investors bear intraday systematic risk.
Overnight periods, when markets are closed, may reflect adjustments to news and limited liquidity, leading to lower overnight returns. This explanation is not tested here.
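
OPENPRC is missing for the early rows of the sample (the 1986 and 1987 rows in the first output show no open price), so a return built from it contains missing values. pandas skips those when computing the mean and standard deviation. A t-statistic scaled by the full row count therefore uses a larger n than the number of returns that entered the estimate. The sketch below, on made-up overnight returns, scales by the non-missing count instead; the helper name `event_return_stats` is hypothetical.

```python
import numpy as np
import pandas as pd

def event_return_stats(ret: pd.Series, periods_per_year: int = 252) -> dict:
    """Annualized mean/std and t-statistic using only non-missing returns."""
    r = ret.dropna()
    n = len(r)
    mean, std = r.mean(), r.std(ddof=1)
    return {
        "n": n,
        "annual_mean": mean * periods_per_year,
        "annual_std": std * np.sqrt(periods_per_year),
        "t_stat": mean / std * np.sqrt(n),
    }

# Made-up overnight returns with two missing open prices
overnight = pd.Series([0.002, np.nan, -0.001, 0.004, np.nan, 0.001])

stats = event_return_stats(overnight)
# Scaling by the full row count (including NaN rows) inflates the statistic
naive_t = overnight.mean() / overnight.std() * np.sqrt(len(overnight))

print(stats)
print("t-stat scaled by all rows:", naive_t)
```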

Question 5

In [19]:
#calculate transaction cost and net return after transaction cost
combined_df['spread'] = combined_df['ASK']-combined_df['BID']
combined_df['midprice'] = (combined_df['ASK']+combined_df['BID'])/2

#Valid condition: ASK>0, BID>0, ASK>=BID
condition = (combined_df['ASK']>0) & (combined_df['BID']>0) & (combined_df['ASK']>=combined_df['BID'])
combined_df['transac_cost'] = (combined_df['spread']/combined_df['midprice']).where(condition)

# Cap (winsorize) transaction costs at the 99.5th percentile; values above it are clipped, not dropped
combined_df['transac_cost'] = (combined_df['transac_cost']).clip(upper = combined_df['transac_cost'].quantile(0.995))
print(combined_df['transac_cost'].describe())
# Average relative quoted spread across all events with valid quotes
transac_cost = combined_df['transac_cost'].mean()

# Assume a cost of one full average relative spread on each leg (buy and sell).
# The same sample-average cost is deducted from every event.
combined_df['ret_after_cost'] = combined_df['RET_x'] - 2*transac_cost

#calculate annual return after transaction cost
ret_after_cost = combined_df['ret_after_cost'].mean()
annnet_return = combined_df['ret_after_cost'].mean()*252
print(f"Average transaction cost:{transac_cost*100:.2f}%")
print(f"Daily return after cost:{ret_after_cost*100:.2f}%")
print(f"Annual net return:{annnet_return*100:.2f}%")

count    175329.000000
mean          0.009818
std           0.014340
min           0.000000
25%           0.000541
50%           0.003284
75%           0.013582
max           0.083333
Name: transac_cost, dtype: float64
Average transaction cost:0.98%
Daily return after cost:-1.71%
Annual net return:-431.19%
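
A less conservative alternative is to charge each event its own quoted spread. It is sketched here under stated assumptions: the trader pays half the quoted spread when buying and half when selling, so one full relative spread per round trip, and the ex-dividend-day closing quotes stand in for the quotes on both legs. The three rows reuse RET, BID and ASK values from the output above; the column names `rel_spread` and `net_ret` are made up.

```python
import pandas as pd

quotes = pd.DataFrame({
    "RET": [-0.007974, -0.002949, 0.004091],
    "BID": [10.53, 11.80, 10.98],
    "ASK": [10.55, 11.82, 11.04],
})

# Use only sensible quotes: positive and not crossed
valid = (quotes["BID"] > 0) & (quotes["ASK"] > 0) & (quotes["ASK"] >= quotes["BID"])
mid = (quotes["ASK"] + quotes["BID"]) / 2

# Relative quoted spread per event; invalid quotes become NaN
quotes["rel_spread"] = ((quotes["ASK"] - quotes["BID"]) / mid).where(valid)

# Half a spread on the buy plus half a spread on the sell = one full spread
quotes["net_ret"] = quotes["RET"] - quotes["rel_spread"]
print(quotes)
```

Because the cost now varies by event, decile averages of `net_ret` would reflect differences in liquidity as well as differences in gross returns.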


Question 6

In [20]:
# Dividend Yield (D/P)
combined_df['DIV_YIELD'] = combined_df['DIVAMT'] / combined_df['PRC']

# Rank by three metrics
combined_df['MKT_decile']  = pd.qcut(combined_df['MKT_CAP'], 10, labels=False) + 1
combined_df['DY_decile']   = pd.qcut(combined_df['DIV_YIELD'], 10, labels=False) + 1
combined_df['BETA_decile'] = pd.qcut(combined_df['b_mkt'], 10, labels=False) + 1
combined_df.head(20)

,PERMNO,SHRCD,PERMCO,DISTCD,DIVAMT,PRC,VOL,RET_x,BID,ASK,...,AB_RET,RET_OPEN,spread,midprice,transac_cost,ret_after_cost,DIV_YIELD,MKT_decile,DY_decile,BETA_decile
0,10001,11,7953,1222,0.045,10.53,10100.0,-0.007974,10.53,10.55,...,-0.012664,-0.000469,0.02,10.54,0.001898,-0.02761,0.004274,1,3,1.0
1,10001,11,7953,1222,0.045,11.79,10600.0,-0.002949,11.8,11.82,...,-0.005204,0.005476,0.02,11.81,0.001693,-0.022585,0.003817,1,3,1.0
2,10001,11,7953,1222,0.045,11.0,5100.0,0.004091,10.98,11.04,...,-0.000828,0.002273,0.06,11.01,0.00545,-0.015545,0.004091,1,3,1.0
3,10001,11,7953,1222,0.045,11.76,9000.0,0.000424,11.76,11.89,...,0.00673,0.008051,0.13,11.825,0.010994,-0.019212,0.003827,1,3,1.0
4,10001,11,7953,1222,0.045,11.1,5500.0,-0.004911,11.07,11.08,...,-0.00878,0.006696,0.01,11.075,0.000903,-0.024547,0.004054,1,3,1.0
5,10001,11,7953,1222,0.045,11.18,17900.0,0.008536,11.17,11.18,...,0.005655,-0.002247,0.01,11.175,0.000895,-0.0111,0.004025,1,3,1.0
6,10001,11,7953,1222,0.045,9.79,546000.0,-0.083411,9.79,9.84,...,-0.085472,-0.071296,0.05,9.815,0.005094,-0.103047,0.004597,1,4,1.0
7,10001,11,7953,1222,0.045,10.48,30400.0,0.012019,10.45,10.48,...,0.011217,0.013942,0.03,10.465,0.002867,-0.007617,0.004294,1,3,1.0
8,10001,11,7953,1222,0.045,10.61,47600.0,0.010911,10.6,10.62,...,0.008187,0.011859,0.02,10.61,0.001885,-0.008725,0.004241,1,3,1.0
9,10001,11,7953,1222,0.045,10.85,14400.0,0.004147,10.83,10.85,...,0.001603,-0.007834,0.02,10.84,0.001845,-0.015489,0.004147,2,3,1.0


In [21]:
# Calculate the bid-ask spread
combined_df['BID_ASK_Spread'] = combined_df['ASK'] - combined_df['BID']
# results by dividend yield decile
res1 = combined_df.groupby('DY_decile').agg(
    mean_ret=('RET_x','mean'),
    mean_spread=('BID_ASK_Spread','mean'),
    mean_net=('ret_after_cost','mean')
).reset_index()
print(res1)

   DY_decile  mean_ret  mean_spread  mean_net
0          1  0.001858     0.332839 -0.017778
1          2  0.002377     0.259073 -0.017259
2          3  0.002480     0.257874 -0.017156
3          4  0.002781     0.257788 -0.016855
4          5  0.003119     0.261157 -0.016517
5          6  0.003238     0.264088 -0.016398
6          7  0.003406     0.260833 -0.016230
7          8  0.002917     0.259622 -0.016720
8          9  0.002712     0.259118 -0.016924
9         10  0.000365     0.218185 -0.019271


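Raw returns rise from the lowest D/P decile (0.19% per event) through decile 7 (0.34%) and then fall, with the highest-yield decile earning the least (0.04%).
Because the same average cost (2 × 0.98%) is deducted from every event, net returns are negative in every decile and simply track gross returns.
High-dividend-yield stocks therefore do not earn higher gross returns here, and in every decile the gross return is more than offset by the assumed bid–ask cost.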
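
The link between `mean_ret` and `mean_net` in the table can be verified by hand. Cell 19 subtracts twice the sample-average cost (mean 0.009818) from every event, so each decile's net return should equal its gross return minus the same constant. The sketch below checks this for three rows copied from the table; the column name `implied_net` is made up.

```python
import pandas as pd

mean_cost = 0.009818  # mean of transac_cost from the describe() output in cell 19

# Three rows copied from the D/P decile table above
res = pd.DataFrame({
    "DY_decile": [1, 7, 10],
    "mean_ret":  [0.001858, 0.003406, 0.000365],
    "mean_net":  [-0.017778, -0.016230, -0.019271],
})

# Net return implied by deducting the same constant from every event
res["implied_net"] = res["mean_ret"] - 2 * mean_cost
res["gap"] = res["implied_net"] - res["mean_net"]
print(res)
```

The gaps are zero up to rounding, so differences in `mean_net` across deciles come entirely from differences in gross returns, not from differences in spreads.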

In [22]:
# results by market cap decile
res2 = combined_df.groupby('MKT_decile').agg(
    mean_ret=('RET_x','mean'),
    mean_spread=('BID_ASK_Spread','mean'),
    mean_net=('ret_after_cost','mean')
).reset_index()
print(res2)

   MKT_decile  mean_ret  mean_spread  mean_net
0           1  0.004265     0.527135 -0.015371
1           2  0.003795     0.487864 -0.015841
2           3  0.003668     0.420546 -0.015969
3           4  0.002782     0.365092 -0.016855
4           5  0.002511     0.316720 -0.017125
5           6  0.002291     0.249718 -0.017345
6           7  0.001768     0.222169 -0.017869
7           8  0.001564     0.164885 -0.018072
8           9  0.001466     0.167786 -0.018170
9          10  0.001145     0.111353 -0.018492


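Small-cap stocks exhibit the highest raw returns (0.43% per event in decile 1 versus 0.11% in decile 10), but also much wider dollar bid–ask spreads (illiquidity).
As market cap increases, spreads narrow, yet net returns become more negative (from -1.54% to -1.85%), because gross returns fall while the deducted cost is the same average for every event.
The wider spreads in the small-cap deciles suggest that their higher paper returns would face the highest actual trading costs, so liquidity costs remain the main obstacle to the strategy.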

In [23]:
# results by CAPM beta decile
res3 = combined_df.groupby('BETA_decile').agg(
    mean_ret=('RET_x','mean'),
    mean_spread=('BID_ASK_Spread','mean'),
    mean_net=('ret_after_cost','mean')
).reset_index()
print(res3)

   BETA_decile  mean_ret  mean_spread  mean_net
0          1.0  0.002987     0.372825 -0.016650
1          2.0  0.002192     0.335721 -0.017445
2          3.0  0.002321     0.303871 -0.017315
3          4.0  0.002491     0.287754 -0.017145
4          5.0  0.002494     0.275077 -0.017142
5          6.0  0.002740     0.249033 -0.016896
6          7.0  0.002558     0.229956 -0.017078
7          8.0  0.002481     0.189330 -0.017155
8          9.0  0.002587     0.167128 -0.017049
9         10.0  0.002407     0.142735 -0.017229


Average returns are nearly flat across beta deciles, implying that systematic risk (beta) has little effect on ex-dividend-day returns.
Higher-beta stocks tend to have narrower spreads, but net returns are also essentially flat (roughly -1.7% in every decile) because the same average cost is deducted from every event.

There is no robust evidence that sorting the ex-dividend-day trading strategy by firm characteristics (D/P, size, or beta) can systematically improve profitability, since mean returns after costs are negative in every decile.
While some subgroups exhibit higher gross returns, transaction costs eliminate the apparent profit opportunity, reaffirming that bid-ask spreads prevent simple profit strategies around ex-dividend dates.

To make net returns less negative, the sorts point toward small-cap stocks (market-cap decile 1) and mid-range dividend yields (D/P deciles 5 to 7), which have the highest mean net returns in the tables above, although small caps also have the widest spreads.