#### In this project, I constructed a five-factor model (reversal, liquidity, size, value, and growth) and used a long/short equity strategy to hedge market exposure. A backtest on the Chinese stock market from 2010 to 2022 achieved an annualized excess return of 20.4% and a Sharpe ratio of 1.52.

#### Low-attention stocks are identified across the entire market every six months. As a group they may have relatively weak fundamentals, but they still include individual stocks with very high growth potential. The long and short positions of the portfolio are rebalanced monthly, and only low-attention stocks are eligible for the portfolio.
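The source does not state how the annualized return and the Sharpe ratio are computed. One common convention for a monthly rebalanced strategy is sketched below. It assumes a series of monthly excess returns (risk-free rate already removed) and 12 periods per year; the function names are hypothetical.

```python
import numpy as np
import pandas as pd

def annualized_return(monthly: pd.Series) -> float:
    """Geometric annualized return from a series of monthly returns."""
    total_growth = (1 + monthly).prod()
    return total_growth ** (12 / len(monthly)) - 1

def sharpe_ratio(monthly: pd.Series) -> float:
    """Annualized Sharpe ratio, assuming the inputs are already excess returns."""
    return monthly.mean() / monthly.std() * np.sqrt(12)
```

Scaling the monthly ratio by the square root of 12 assumes the monthly returns are roughly uncorrelated over time.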


### 1. The reversal factor (consisting of monthly reversal and price-volume correlation)

(1) Monthly reversal: the difference in mean daily return between the days with the top 20% and the days with the bottom 20% of stock price volatility over the past 20 trading days, where stock price volatility = (daily high - daily low) / previous close.

(2) Price-volume correlation: the Pearson correlation coefficient between the stock's turnover rate on day T and its price on day T+1, computed over the past 20 trading days.
 
The reversal factor picks out stocks that had relatively low returns over the past month but are expected to perform well in the following month because of the short-term reversal anomaly. Monthly reversal and price-volume correlation are each standardized cross-sectionally and then summed to give the reversal factor. Both are negatively correlated with subsequent stock returns, hence the minus signs.<br>

__The reversal factor = - (monthly reversal) - (price-volume correlation)__<br>
__Information Coefficient: 9%__
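As an illustration only, the two components could be computed for a single stock as sketched below. This is not the repository's code. It assumes a daily DataFrame `window` with hypothetical columns `high`, `low`, `close`, and `turnover`, holding the 20-day window plus one earlier row for the previous close, and it assumes that "price" in the correlation means the closing price.

```python
import pandas as pd

def monthly_reversal(window: pd.DataFrame) -> float:
    """Mean return on the top-20% volatility days minus the bottom-20% days."""
    prev_close = window["close"].shift(1)
    daily_ret = window["close"] / prev_close - 1
    volatility = (window["high"] - window["low"]) / prev_close
    days = pd.DataFrame({"ret": daily_ret, "vol": volatility}).dropna()
    k = max(1, round(0.2 * len(days)))  # number of days in each 20% tail
    ranked = days.sort_values("vol")
    return ranked["ret"].iloc[-k:].mean() - ranked["ret"].iloc[:k].mean()

def price_volume_corr(window: pd.DataFrame) -> float:
    """Pearson correlation of turnover on day T with the close on day T+1."""
    next_close = window["close"].shift(-1)
    return window["turnover"].corr(next_close)  # NaN pairs are dropped

def reversal_factor(mr: pd.Series, pv: pd.Series) -> pd.Series:
    """Combine the two components across stocks for one date."""
    z = lambda x: (x - x.mean()) / x.std()  # cross-sectional z-score
    return -z(mr) - z(pv)
```

Here `mr` and `pv` are Series indexed by stock code for one rebalancing date, so the standardization runs across stocks rather than through time.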

### 2. The liquidity factor (consisting of volatility, turnover rate, and price elasticity)

(1) Volatility: The standard deviation of (daily high/daily low) over the past 20 trading days

(2) Turnover rate: The standard deviation of stock turnover rate over the past 20 trading days

(3) Price elasticity (the Amihud illiquidity measure): The mean value of (daily return/daily trading volume) over the past 20 trading days.

Less liquid stocks tend to outperform more liquid ones because investors demand a higher expected return as compensation for higher trading costs and risk. All three components are standardized and then combined to give the liquidity factor. Volatility and turnover rate are negatively correlated with stock returns, hence the minus signs.<br>

__The liquidity factor = price elasticity - volatility - turnover rate__<br>
__Information Coefficient: 12%__
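A minimal sketch of the three components and their combination follows. The column names (`high`, `low`, `ret`, `volume`, `turnover`) and function names are hypothetical. Note one assumption: the description above divides the daily return by volume, whereas the textbook Amihud measure uses the absolute return, and the sketch follows the textbook form.

```python
import pandas as pd

def liquidity_components(window: pd.DataFrame) -> pd.Series:
    """Three liquidity inputs for one stock over the past 20 trading days."""
    volatility = (window["high"] / window["low"]).std()
    turnover_std = window["turnover"].std()
    # Assumption: absolute return, as in the textbook Amihud measure
    elasticity = (window["ret"].abs() / window["volume"]).mean()
    return pd.Series(
        {"volatility": volatility, "turnover": turnover_std, "elasticity": elasticity}
    )

def liquidity_factor(components: pd.DataFrame) -> pd.Series:
    """Combine standardized components; rows are stocks, columns are components."""
    z = (components - components.mean()) / components.std()
    return z["elasticity"] - z["volatility"] - z["turnover"]
```

Standardizing each column before combining keeps one component from dominating simply because of its units.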

### 3. The size factor

It is taken as the negative logarithm of market capitalization, because small-cap stocks tend to outperform large-cap stocks.<br>

__The size factor = - ln (market cap)__<br>
__Information Coefficient: 5.7%__
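For illustration, the transformation is a one-liner. The function name is hypothetical, and `market_cap` is assumed to be a Series of strictly positive values indexed by stock code.

```python
import numpy as np
import pandas as pd

def size_factor(market_cap: pd.Series) -> pd.Series:
    """Negative log market cap, so that smaller companies score higher."""
    return -np.log(market_cap)
```

Taking the logarithm compresses the heavy right tail of market capitalization, and the minus sign makes a high factor value correspond to a small company.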

### 4. The value factor

It is taken as the book-to-market ratio and is used to pick out undervalued stocks which are expected to increase in price.<br>

__The value factor = book-to-market ratio__<br>
__Information Coefficient: 7.5%__
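Each factor in this write-up is reported with an Information Coefficient (IC), but the source does not say how it is computed. A common convention, assumed here, is the cross-sectional Spearman rank correlation between factor values and next-period returns, averaged over all rebalancing dates. The function name and the data layout (dates as rows, stocks as columns) are hypothetical.

```python
import pandas as pd

def information_coefficient(factor: pd.DataFrame, fwd_ret: pd.DataFrame) -> pd.Series:
    """Per-date rank correlation between factor values and forward returns."""
    ics = {}
    for d in factor.index:
        pair = pd.concat([factor.loc[d], fwd_ret.loc[d]], axis=1, keys=["f", "r"]).dropna()
        # Pearson correlation of ranks is the Spearman correlation
        ics[d] = pair["f"].rank().corr(pair["r"].rank())
    return pd.Series(ics)
```

The mean of the returned series would then be the figure quoted as a percentage, under this assumed definition.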

### 5. The growth factor

All stocks are first assigned a value of 1, 2, or 3 based on their revenue growth in the cross-section. The same procedure is then repeated for the acceleration of revenue growth (taken as the change in revenue growth). The two values are added to give a final score ranging from 2 to 6, which reflects how fast the individual company is growing. The growth factor picks out stocks with high growth potential.

__The growth factor = Revenue growth and acceleration score__<br>
__Information Coefficient: 4.8%__
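The scoring can be sketched as follows. The source does not say how the three groups are formed, so equal-sized terciles are assumed; the function name and the two inputs (current and previous revenue growth per stock) are also hypothetical.

```python
import pandas as pd

def growth_score(rev_growth: pd.Series, rev_growth_prev: pd.Series) -> pd.Series:
    """Sum of two tercile ranks (1 to 3 each), giving a score from 2 to 6."""
    accel = rev_growth - rev_growth_prev  # acceleration of revenue growth
    level_rank = pd.qcut(rev_growth, 3, labels=[1, 2, 3]).astype(int)
    accel_rank = pd.qcut(accel, 3, labels=[1, 2, 3]).astype(int)
    return level_rank + accel_rank
```

This sketch assumes no missing values; `pd.qcut` also raises an error when many identical values produce duplicate bin edges, so real data may need ranking or de-duplication first.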

### It is worth noting that all the code written for calculating the five factors is contained in other Python files in this repository. It is not displayed here because of the large amount of time it takes to load and process the historical market data, which covers thousands of stocks and spans 12 years.

In [28]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm
import pickle
import alphalens

In [29]:
# loading all the processed factors and other relevant data

# open price of stocks at the start of every month from 2010.07 to 2022.07
price=pd.read_excel('开盘价.xlsx',skiprows=1,index_col=0)

# loading the five factors
with open('反转因子.pkl', 'rb') as file:
    reversal = pickle.load(file)

with open('流动因子.pkl', 'rb') as file:
    liquidity = pickle.load(file)

with open('规模因子.pkl', 'rb') as file:
    size = pickle.load(file)

with open('价值因子.pkl', 'rb') as file:
    value = pickle.load(file)

with open('qpt因子.pkl', 'rb') as file:
    growth = pickle.load(file)

# Lists of selected low-attention stocks, refreshed twice a year
with open('持仓.pkl', 'rb') as file:
    low_att_stocks = pickle.load(file)

# date index
start_date = '2010-07-01'
end_date = '2022-07-01'
date = pd.date_range(start=start_date, end=end_date, freq='MS')

In [27]:
# essential functions

# standardization
def stan(x):
    mean=x.mean()
    std=x.std()
    return (x-mean)/std

# calculating maximum drawdown
def max_drawdown(returns):
    cumulative_returns = (1 + returns).cumprod()
    peak = cumulative_returns.cummax()
    drawdown = (cumulative_returns / peak) - 1
    return drawdown.min()

# using the median absolute deviation (MAD) to identify outliers in the cross-section,
# and cap them at n MADs away from the median (e.g. n=3)
from scipy.stats import median_abs_deviation as mad
def madf(df,n):
    median=np.median(df)
    mad_value=mad(df)  # unscaled MAD (scipy default scale=1.0), not a standard deviation
    up=median+n*mad_value
    down=median-n*mad_value
    return df.clip(down,up)

# This function divides the stocks into quantiles based on the size of a certain factor, and then extracts
# the lists of stocks in the top and bottom quantiles. These would be respectively stocks in the long and 
# short positions every month during rebalancing.
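The body of this function is not included here. Purely as an illustration of what the comment describes, a minimal sketch under stated assumptions is given below: the name `top_bottom`, the default of five quantiles, and the input (one cross-sectional Series of factor values indexed by stock code) are all hypothetical and not taken from the repository.

```python
import pandas as pd

def top_bottom(factor: pd.Series, n_quantiles: int = 5):
    """Return (long_list, short_list): stocks in the top and bottom quantile."""
    clean = factor.dropna()
    # Rank first so that tied factor values cannot produce duplicate bin edges
    buckets = pd.qcut(clean.rank(method="first"), n_quantiles, labels=False)
    long_list = buckets[buckets == n_quantiles - 1].index.tolist()
    short_list = buckets[buckets == 0].index.tolist()
    return long_list, short_list
```

In a monthly loop, the factor values would typically be restricted to the current low-attention list and winsorized and standardized before being passed in.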





In [31]:
d

Timestamp('2010-07-01 00:00:00', freq='MS')

Index(['000001.SZ', '000002.SZ', '000004.SZ', '000006.SZ', '000008.SZ',
       '000009.SZ', '000010.SZ', '000011.SZ', '000012.SZ', '000014.SZ',
       ...
       '601898.SH', '601899.SH', '601918.SH', '601919.SH', '601939.SH',
       '601958.SH', '601988.SH', '601991.SH', '601998.SH', '601999.SH'],
      dtype='object', length=1417)


