# Idiosyncratic Market Value Factor: Explaining Market Value by Machine Learning Methods

## Idea
At any given point in time, market value explanation models assume that the market values of listed firms can be explained by their financial information and by market factors. Under this assumption, such a model yields an intrinsic market value for each listed firm at that time. However, there is always a residual term: the difference between the firm's current market value and its intrinsic value.

We call this residual the Idiosyncratic Market Value Factor (IMVF). The larger the factor, the further the current market value sits above the intrinsic value, and under mean reversion the stock price is then more likely to fall. In other words, this is a form of relative valuation, and a smaller IMVF indicates better expected performance of the stock.
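Formally, writing $\hat{m}_{it}$ for the log market value fitted by the model (the intrinsic value) and $m_{it}$ for the actual log market value, the factor is the model residual:

$$ IMVF_{it} = \epsilon_{it} = m_{it} - \hat{m}_{it} $$

Because both terms are in logs, $e^{IMVF_{it}} - 1$ is the proportional gap between current and intrinsic market value. For example, $IMVF_{it} = 0.2$ means the market value is about 22% above the fitted value, since $e^{0.2} \approx 1.221$.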

## Variable definitions
* $m_{it}$ is the log market value of stock $i$ at time $t$
* $IND_{it}$ is the set of industry dummy variables
* $b_{it}$ is the log net assets of stock $i$ at time $t$; firms with negative net assets are excluded
* $Ln(NI)^{+}_{it}$ and $Ln(NI)^{-}_{it}$ are the log of net income (in absolute value) of stock $i$ at time $t$, split by sign: the first applies to firms with positive net income, the second to firms with negative net income; $I_{<0}$ is an indicator equal to 1 when net income is negative
* $LEV_{it}$ is the firm's financial leverage (debt-to-asset ratio)
* $g_{it}$ is the year-over-year quarterly revenue growth rate
* $RD_{it}$ is the log R&D expense; missing values are set to 0
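The definitions above can be turned into regression inputs roughly as follows. This is a minimal sketch for a single cross-section, assuming the column names used in the merge step of this notebook (`totalmv`, `netasset`, `netprofit`, `lev`, `yoysales`, `rd`, `industry`). The helper `build_features` and its output column names are hypothetical, and mapping missing or zero R&D to a log value of 0 is an assumption about the missing-value convention.

```python
import numpy as np
import pandas as pd


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: build m, b, Ln(NI)+/-, I_<0, LEV, g, RD for one cross-section."""
    out = pd.DataFrame(index=df.index)
    out["industry"] = df["industry"]
    out["m"] = np.log(df["totalmv"])                          # log market value
    # Log net assets; non-positive net assets become NaN and are dropped at the end
    out["b"] = np.log(df["netasset"].where(df["netasset"] > 0))

    ni = df["netprofit"]
    abs_ni = ni.abs()
    # log|NI|, using 1.0 as a placeholder where NI == 0 so log() never sees 0
    log_abs_ni = np.log(abs_ni.where(abs_ni > 0, 1.0))
    out["ln_ni_pos"] = np.where(ni > 0, log_abs_ni, 0.0)     # Ln(NI)^+
    out["ln_ni_neg"] = np.where(ni < 0, log_abs_ni, 0.0)     # Ln(NI)^- (log of |NI|)
    out["ni_negative"] = (ni < 0).astype(float)              # I_<0

    out["lev"] = df["lev"]
    out["g"] = df["yoysales"]
    # Assumption: missing or zero R&D gets a log value of 0
    rd = df["rd"].fillna(0.0)
    out["RD"] = np.log(rd.where(rd > 0, 1.0))
    return out.dropna(subset=["b"])


toy = pd.DataFrame({
    "totalmv": [100.0, 50.0, 80.0],
    "netasset": [1000.0, -5.0, 400.0],   # the second firm has negative net assets
    "netprofit": [20.0, 3.0, -10.0],
    "lev": [50.0, 60.0, 70.0],
    "yoysales": [10.0, -5.0, 3.0],
    "rd": [np.nan, 2.0, 0.0],
    "industry": ["A", "B", "A"],
})
feats = build_features(toy)
print(feats[["b", "ln_ni_pos", "ln_ni_neg", "ni_negative", "RD"]])
```

Splitting $Ln(NI)$ by sign lets loss-making firms get their own slope instead of forcing a log of a negative number.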

## First Look: Linear Regression

At each date $t$, consider the following cross-sectional linear model:
$$ m_{it} = a_{0t}IND_{it}+a_{1t}b_{it}+a_{2t}Ln(NI)^{+}_{it}+a_{3t}I_{<0}Ln(NI)^{-}_{it}+a_{4t}LEV_{it}+a_{5t}g_{it}+a_{6t}RD_{it}+\epsilon_{it} $$
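Because the coefficients carry a $t$ subscript, the regression is fit separately on each date's cross-section, and the residual $\epsilon_{it}$ is the factor. Below is a minimal sketch on synthetic data using plain NumPy least squares (a swap for the `statsmodels` import used in this notebook). The industry dummies act as per-industry intercepts, so no separate constant is added. The data, column names and the function `cross_sectional_ols_residual` are hypothetical.

```python
import numpy as np
import pandas as pd


def cross_sectional_ols_residual(df: pd.DataFrame, features: list[str]) -> pd.Series:
    """Hypothetical helper: OLS of m on industry dummies + features for one date; returns residuals."""
    dummies = pd.get_dummies(df["industry"], dtype=float)     # IND_it, one column per industry
    X = np.column_stack([dummies.to_numpy(), df[features].to_numpy(dtype=float)])
    y = df["m"].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(y - X @ beta, index=df.index, name="imvf")


# Synthetic single-date cross-section
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "industry": rng.choice(["A", "B", "C"], size=n),
    "b": rng.normal(20, 1, n),
    "ln_ni_pos": rng.normal(18, 1, n),
    "lev": rng.uniform(10, 90, n),
})
df["m"] = 0.8 * df["b"] + 0.3 * df["ln_ni_pos"] - 0.01 * df["lev"] + rng.normal(0, 0.2, n)

feats = ["b", "ln_ni_pos", "lev"]
imvf = cross_sectional_ols_residual(df, feats)
print(imvf.groupby(df["industry"]).mean())   # ~0 within each industry
```

With full industry dummies, the residuals average to zero inside every industry, so the factor is an industry-relative valuation gap.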

## Machine Learning method: Random Forest
Consider the random forest model, which replaces the linear combination with a nonparametric fit on the same inputs:
$$ m_{it} = RF(IND_{it},b_{it},Ln(NI)^{+}_{it},Ln(NI)^{-}_{it},LEV_{it},g_{it},RD_{it})+\epsilon_{it} $$
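A sketch of the random forest version for one date's cross-section, using scikit-learn's `RandomForestRegressor`. One design choice here is ours, not necessarily the author's: the residual is taken from out-of-bag predictions (`oob_score=True`, `oob_prediction_`), because in-sample forest predictions tend to fit the training points closely and would shrink the residuals toward zero. Column names, hyperparameters and the synthetic data are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic single-date cross-section
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "industry": rng.choice(["A", "B", "C"], size=n),
    "b": rng.normal(20, 1, n),
    "ln_ni_pos": rng.normal(18, 1, n),
    "lev": rng.uniform(10, 90, n),
    "g": rng.normal(10, 20, n),
})
df["m"] = 0.8 * df["b"] + 0.3 * df["ln_ni_pos"] - 0.01 * df["lev"] + rng.normal(0, 0.2, n)

# Industry enters as one-hot columns alongside the numeric features
X = pd.concat([pd.get_dummies(df["industry"], dtype=float),
               df[["b", "ln_ni_pos", "lev", "g"]]], axis=1)
y = df["m"].to_numpy()

rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=5,
                           oob_score=True, random_state=0)
rf.fit(X, y)
# IMVF: actual minus out-of-bag fitted log market value
imvf = pd.Series(y - rf.oob_prediction_, index=df.index, name="imvf")
print(imvf.describe())
```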

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')  # silence library warnings in notebook output
```

```python
# Project-local helper modules (their code is not shown here)
from quarter2month import quarter2month
from Winsorize_Fillna_Neutralize import Winsorize_Fillna_Neutralize
import statsmodels.api as sm
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
```

## Constructing the Idiosyncratic Market Value Factor

### Load the raw data

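```python
# Input raw data
comp_frm = pd.read_csv('./data/industrycomp.csv')            # stock -> industry mapping
netasset_raw = pd.read_excel('./data/netasset.xlsx')         # quarterly net assets
netprofit_raw = pd.read_excel('./data/netprofit_q.xlsx')     # quarterly net profit
lev_raw = pd.read_excel('./data/debt2asset.xlsx')            # debt-to-asset ratio
yoysales_raw = pd.read_excel('./data/yoysales.xlsx')         # YoY revenue growth
rd_raw = pd.read_excel('./data/rd.xlsx')                     # R&D expense
value_frm = pd.read_csv('./data/value.csv')                  # market value (totalmv) by date and code
retf1_frm = pd.read_csv('./data/retf1.csv')
limit_status_frm = pd.read_csv('./data/limit_status_st.csv', encoding='gb18030')  # file contains Chinese text
# Unique dates at which market values are observed
tdate = list(value_frm['date'].drop_duplicates())
```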

### Convert quarterly data to monthly frequency

```python
# Map each quarterly series onto the monthly dates in tdate
netasset_frm = quarter2month(tdate, netasset_raw, 'netasset')
netprofit_frm = quarter2month(tdate, netprofit_raw, 'netprofit')
lev_frm = quarter2month(tdate, lev_raw, 'lev')
yoysales_frm = quarter2month(tdate, yoysales_raw, 'yoysales')
rd_frm = quarter2month(tdate, rd_raw, 'rd')
```
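`quarter2month` is a local helper whose code is not shown, so its exact behavior is unknown. One common way to do this conversion, sketched here as a hypothetical stand-in, is an as-of join: each monthly date takes the most recent quarterly value dated on or before it. The function `quarterly_to_monthly` and the long input layout (`code`, `report_date`, value column) are assumptions. In practice the join key should be the announcement date rather than the period-end date, to avoid look-ahead bias.

```python
import pandas as pd


def quarterly_to_monthly(tdate: list[int], raw: pd.DataFrame, name: str) -> pd.DataFrame:
    """Hypothetical stand-in: carry the latest quarterly value forward to each monthly date.

    raw is assumed to be long-format with columns ['code', 'report_date', name],
    with integer dates like 20070131 (the format used in tdate).
    """
    codes = raw["code"].unique()
    # (date, code) grid, already sorted by date as merge_asof requires
    grid = pd.MultiIndex.from_product([sorted(tdate), codes],
                                      names=["date", "code"]).to_frame(index=False)
    right = raw.sort_values("report_date")
    out = pd.merge_asof(grid, right, left_on="date", right_on="report_date",
                        by="code", direction="backward")
    return out.dropna(subset=[name])[["date", "code", name]].reset_index(drop=True)


raw = pd.DataFrame({
    "code": ["000001.SZ", "000001.SZ", "000002.SZ"],
    "report_date": [20061231, 20070331, 20061231],
    "netasset": [10.0, 12.0, 5.0],
})
monthly = quarterly_to_monthly([20070131, 20070228, 20070430], raw, "netasset")
print(monthly)
```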

### Merge data

```python
# Inner-join all inputs on (date, code); the industry table is keyed by code only
data = pd.merge(value_frm, netasset_frm, how='inner', on=['date', 'code'])
data = pd.merge(data, netprofit_frm, how='inner', on=['date', 'code'])
data = pd.merge(data, lev_frm, how='inner', on=['date', 'code'])
data = pd.merge(data, yoysales_frm, how='inner', on=['date', 'code'])
data = pd.merge(data, rd_frm, how='inner', on=['date', 'code'])
data = pd.merge(data, comp_frm, how='inner', on=['code'])
data = pd.merge(data, retf1_frm, how='inner', on=['date', 'code'])
data = pd.merge(data, limit_status_frm, how='inner', on=['date', 'code'])
# Keep only the columns used downstream
data = data[['date', 'code', 'totalmv', 'netasset', 'netprofit', 'lev',
             'yoysales', 'rd', 'industry', 'retf1', 'limit', 'status']]
```
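Every `how='inner'` merge keeps only the key pairs present in both inputs, so a stock missing from any single source drops out of the whole sample. A compact alternative, sketched under the assumption that all frames share the `date` and `code` keys (the industry table, keyed by `code` only, would be merged separately), chains the merges with `functools.reduce` and records how many rows survive each step so such losses are visible. The toy frames and the helper `merge_all` are hypothetical.

```python
from functools import reduce

import pandas as pd


def merge_all(frames: list[pd.DataFrame], keys: list[str]) -> tuple[pd.DataFrame, list[int]]:
    """Hypothetical helper: inner-merge frames left to right, logging row counts."""
    counts = [len(frames[0])]

    def step(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
        merged = pd.merge(left, right, how="inner", on=keys)
        counts.append(len(merged))
        return merged

    return reduce(step, frames[1:], frames[0]), counts


value = pd.DataFrame({"date": [1, 1, 1], "code": ["a", "b", "c"], "totalmv": [10.0, 20.0, 30.0]})
netasset = pd.DataFrame({"date": [1, 1], "code": ["a", "b"], "netasset": [5.0, 6.0]})
lev = pd.DataFrame({"date": [1, 1], "code": ["a", "c"], "lev": [40.0, 50.0]})

data, counts = merge_all([value, netasset, lev], keys=["date", "code"])
print(counts)  # [3, 2, 1]
```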

```python
data.head()
```

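Output. This preview also contains `logmv`, `lognetasset`, `netprofit_abs`, `lognetprofit` and `ni_negative`, which are not created by the cells shown above; judging from the values, they are the logs of `totalmv` and `netasset`, the absolute value of `netprofit` and its log, and a flag for negative net income, derived in cells not shown here.

| | date | code | totalmv | netasset | netprofit | lev | yoysales | rd | industry | retf1 | limit | status | logmv | lognetasset | netprofit_abs | lognetprofit | ni_negative |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 121 | 20070131 | 000002.SZ | 6690315.0 | 9948030000.0 | 1456485000.0 | 72.126 | 50.5017 | 0.0 | CI005023.WI | -0.037233 | 0.0 | 交易 | 15.716172 | 23.02064 | 1456485000.0 | 21.099292 | 0.0 |
| 450 | 20070131 | 000006.SZ | 302534.8 | 1236908000.0 | 177255800.0 | 55.7914 | -20.4864 | 0.0 | CI005023.WI | 0.081375 | 0.0 | 交易 | 12.619952 | 20.935881 | 177255800.0 | 18.993104 | 0.0 |
| 573 | 20070131 | 000007.SZ | 53824.92 | 63964260.0 | -25825370.0 | 88.7987 | 78.1794 | 0.0 | CI005023.WI | 0.29496 | 0.0 | 交易 | 10.893492 | 17.973835 | 25825370.0 | 17.066868 | 1.0 |
| 621 | 20070131 | 000008.SZ | 41245.8 | 72307980.0 | -627861.8 | 7.1045 | -74.1205 | 0.0 | CI005010.WI | 0.143159 | 0.0 | 交易 | 10.627304 | 18.096445 | 627861.8 | 13.350075 | 1.0 |
| 1036 | 20070131 | 000012.SZ | 1298777.0 | 2559077000.0 | 317367200.0 | 55.4746 | 28.0972 | 0.0 | CI005008.WI | 0.133759 | 0.0 | 交易 | 14.076934 | 21.662912 | 317367200.0 | 19.57557 | 0.0 |

In the `status` column, `交易` means "trading".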
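The Idea section predicts that stocks with a smaller IMVF should subsequently perform better. Assuming `retf1` holds each stock's next-period return (the name suggests this, but the source does not state it), one common check is the per-date rank information coefficient: the Spearman correlation between the factor and `retf1` across stocks, which should be negative on average if the hypothesis holds. A sketch on toy data; the `imvf` column and the function `rank_ic` are hypothetical.

```python
import pandas as pd


def rank_ic(df: pd.DataFrame, factor: str = "imvf", ret: str = "retf1") -> pd.Series:
    """Hypothetical helper: Spearman correlation of factor vs. return within each date."""
    ic = {
        date: g[factor].rank().corr(g[ret].rank())   # Pearson on ranks = Spearman
        for date, g in df.groupby("date")
    }
    return pd.Series(ic, name="rank_ic").sort_index()


toy = pd.DataFrame({
    "date": [20070131] * 4 + [20070228] * 4,
    "imvf": [0.3, -0.1, 0.2, -0.4, 0.1, 0.5, -0.2, 0.0],
    "retf1": [-0.02, 0.01, -0.01, 0.05, 0.03, -0.04, 0.02, 0.01],
})
ic = rank_ic(toy)
print(ic)         # -1.0 on 20070131, -0.4 on 20070228
print(ic.mean())  # -0.7
```

Ranking before correlating makes the check insensitive to outliers in either the factor or the returns.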
