See Appendix 2.1.x for details on how I chose the financial variables.

The selection mainly follows:


Bernard, Darren, Terrence Blackburne, and Jacob Thornock. 2020. “Information Flows among Rivals and Corporate Investment.” Journal of Financial Economics 136 (3): 760–79.

Yang, Chin-Sheng, Chih-Ping Wei, and Yu-Hsun Chiang. 2014. “Exploiting Technological Indicators for Effective Technology Merger and Acquisition (M&A) Predictions.” Decision Sciences 45 (1): 147–74.

The financial variable table:

| variable               | formula                               | definition                                                          |
|------------------------|---------------------------------------|---------------------------------------------------------------------|
|                        | **Bernard et al. (2020), Appendix A** |                                                                     |
| size_i                 | at                                    | Firm i’s total assets                                               |
| market-to-book ratio_i | (at+prcc_f*csho-ceq-txdb)/at          | Market-to-book assets ratio of firm i                               |
| leverage_i             | (dlc+dltt)/at                         | Book leverage of firm i                                             |
| roa_i                  | ib/at                                 | Return on assets of firm i                                          |
| sales growth_i         | (sale_{t}-sale_{t-1})/sale_{t-1}      | Sales growth of firm i                                              |
| ppe_i                  | ppent/at                              | Firm i’s net property, plant, and equipment, scaled by total assets |
| cash_i                 | ch                                    | Firm i’s cash                                                       |
|                        | **Yang et al. (2014), Table 2**       |                                                                     |
| sale_i                 | sale                                  | Firm i’s net sales                                                  |
| cash-to-asset ratio_i  | ch/at                                 | Cash to total assets ratio of firm i                                |
| cash-to-sales ratio_i  | ch/sale                               | Cash to sales ratio of firm i                                       |
| sales-to-asset ratio_i | sale/at                               | Net sales of firm i scaled by total assets                          |
| current ratio_i        | act/lct                               | Current assets of firm i scaled by its current liabilities          |
| asset growth_i         | (at_{t}-at_{t-1})/at_{t-1}            | Total asset growth of firm i                                        |
| gsi_i                  | cogs/invt                             | Cost of goods sold divided by inventory                             |
| de_i                   | (dlc+dltt)/ceq                        | Debt to common equity                                               |
| rd_i                   | rdip                                  | In-process R&D expense                                              |
| roe_i                  | ib/ceq                                | Return on equity                                                    |
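
As a quick arithmetic check of the formulas in the table, here is a minimal sketch that evaluates a few of them for one firm-year. The numbers are made up for illustration only and do not come from Compustat.

```python
import pandas as pd

# Hypothetical firm-year (illustrative values, in millions except prcc_f).
row = pd.Series({
    "at": 100.0, "prcc_f": 20.0, "csho": 5.0, "ceq": 40.0, "txdb": 10.0,
    "dlc": 5.0, "dltt": 15.0, "ib": 8.0,
})

# Market-to-book: (book assets + market equity - book equity - deferred taxes) / assets
m2b = (row["at"] + row["prcc_f"] * row["csho"] - row["ceq"] - row["txdb"]) / row["at"]

# Book leverage and debt-to-equity share the numerator (dlc + dltt).
lev = (row["dlc"] + row["dltt"]) / row["at"]
de = (row["dlc"] + row["dltt"]) / row["ceq"]

# Without parentheses, operator precedence gives dlc + (dltt / ceq) instead.
de_no_parens = row["dlc"] + row["dltt"] / row["ceq"]

roa = row["ib"] / row["at"]
roe = row["ib"] / row["ceq"]

print(m2b, lev, de, de_no_parens, roa, roe)
```

The parenthesized debt-to-equity numerator is the one that matches the `de` computation in `create_var`.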



Finally, the variables classified by type:

|                                | Bernard? | Yang? |
| ------------------------------ | -------- | ----- |
| **holding assets of a firm**       |          |       |
| size                           | 1        | 1     |
| market-to-book asset ratio     | 1        | 1     |
| PPE                            | 1        | 0     |
| cash                           | 1        | 0     |
| cash/asset                     | 0        | 1     |
| cash/sale                      | 0        | 1     |
| current ratio(asset/liability) | 0        | 1     |
| asset growth                   | 0        | 1     |
|                                |          |       |
| **leverage**                       |          |       |
| book leverage(debt/asset)      | 1        | 1     |
| debt/equity                    | 0        | 1     |
|                                |          |       |
| **Business Operation**             |          |       |
| ROA: return on asset           | 1        | 1     |
| sale                           | 0        | 1     |
| sale growth                    | 1        | 1     |
| sale/asset                     | 0        | 1     |
| cost of goods sold/inventory   | 0        | 1     |
| ROE                            | 0        | 1     |
|                                |          |       |
|                                |          |       |
| **R&D**                            |          |       |
| in process RD                  | 0        | 0     |
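
The grouping above can also be kept next to the code as a small lookup, which makes it easy to select one block of variables at a time. This is only a sketch: the dictionary name `var_groups` and the group keys are my own, and the column names are assumed to be the ones produced by `create_var` later in this notebook.

```python
# Hypothetical lookup from variable type to the column names used by create_var.
var_groups = {
    "assets": ["at", "m2b", "ppe", "ch", "cash2asset", "cash2sale", "cr", "d_at"],
    "leverage": ["lev", "de"],
    "operation": ["roa", "sale", "d_sale", "sale2asset", "gsi", "roe"],
    "rd": ["rdip"],
}

# Flatten to one list, e.g. to subset the final table by group.
all_vars = [col for cols in var_groups.values() for col in cols]
print(len(all_vars), "variables in", len(var_groups), "groups")
```

With a table such as `ratio_pd_w_raw`, a group could then be selected with `ratio_pd_w_raw[['gvkey', 'year'] + var_groups['leverage']]`.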

In [28]:
import numpy as np
import pandas as pd
import wrds
import pickle

In [30]:
tmp_data_path = '../MA_data/data/tmp'


s_year = 1997-1
e_year = 2020

In [31]:
def get_firm_annual_data(tmp_data_path, s_year, e_year, db):
    # Monetary variables among the selected columns are denominated in millions.
    pd_afr = db.raw_sql(sql=f'''
      select gvkey, datadate, at, ceq, csho, prcc_f, txdb, dlc, dltt, ib, sale, ch, ppent, re, act, lct, rdip, cogs, invt
      from comp.funda
      where extract(year from datadate) >= {s_year} AND extract(year from datadate) <= {e_year}
    ''', date_cols=['datadate'])
    # Cast gvkey to int and back to str. Note: this drops leading zeros (e.g. '001004' becomes '1004').
    pd_afr.gvkey = pd_afr.gvkey.astype(int).astype(str)
    pd_afr['year'] = pd.DatetimeIndex(pd_afr['datadate']).year
    pd_afr.to_pickle(f"./{tmp_data_path}/fin_raw_{s_year}_{e_year}.pickle")
    print("raw Compustat stored in:", f"./{tmp_data_path}/fin_raw_{s_year}_{e_year}.pickle")
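
Because the function stores `gvkey` without leading zeros, any other table that is later merged on `gvkey` needs the same treatment, otherwise keys such as `'001004'` and `'1004'` will not match. A minimal sketch, assuming a hypothetical second table `other` and a hypothetical helper name `normalize_gvkey`:

```python
import pandas as pd

def normalize_gvkey(s: pd.Series) -> pd.Series:
    """Drop leading zeros the same way as get_firm_annual_data: str -> int -> str."""
    return s.astype(int).astype(str)

# Hypothetical table keyed by zero-padded gvkey strings.
other = pd.DataFrame({"gvkey": ["001004", "012345"], "deal": ["a", "b"]})
other["gvkey"] = normalize_gvkey(other["gvkey"])
print(other)
```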

# Download Raw Compustat Fin vars

In [32]:
db = wrds.Connection()
db = wrds.Connection(wrds_username='dayuyang1999')

Enter your WRDS username [dalab5]:dayuyang1999
Enter your password:········
WRDS recommends setting up a .pgpass file.
Create .pgpass file now [y/n]?: y
Created .pgpass file successfully.
Loading library list...
Done
Loading library list...
Done


In [33]:
get_firm_annual_data(tmp_data_path, s_year, e_year, db)

raw Compustat stored in: ./../MA_data/data/tmp/fin_raw_1996_2020.pickle


In [34]:
fin_var = pd.read_pickle(f"{tmp_data_path}/fin_raw_{s_year}_{e_year}.pickle")

# Create MA Variables

In [35]:
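def get_lags(sub_pd):
    # Lags within the rows passed in (assumes one gvkey, sorted by year). Not called by create_var.
    sub_pd = sub_pd[['gvkey', 'year', 'sale', 'at']].copy()  # copy to avoid SettingWithCopyWarning
    sub_pd[['lag_year', 'lag_sale', 'lag_at']] = sub_pd[['year', 'sale', 'at']].shift(1)
    return sub_pd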


def create_var(df):
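    '''
    df: financial variables; must contain:
        - gvkey
        - datadate
        - the Compustat items used in the formulas below (at, sale, ch, ...)
    
    '''
    pd_afr = df.copy()  # work on a copy so the caller's DataFrame is not modified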
    #### pre
    # create year and sort
    pd_afr['year'] = pd_afr.datadate.dt.year 
    pd_afr = pd_afr.sort_values(['gvkey', 'year', 'datadate'], ascending=True)
    
    # keep one observation per firm-year: the record with the latest datadate
    pd_afr = pd_afr.groupby(['gvkey', 'year'], sort=False).tail(1)
    
    #### create 
    # keep at ,sale, cash, rdip
    ratio_pd = pd_afr[['gvkey', 'year', 'at', 'sale', 'ch', 'rdip']].copy()
    
    # market to book ratio
    ratio_pd['m2b'] = (pd_afr['at']+pd_afr['prcc_f']*pd_afr['csho']-pd_afr['ceq']-pd_afr['txdb'])/(pd_afr['at'])
    
    # leverage
    ratio_pd['lev'] = (pd_afr['dlc']+pd_afr['dltt'])/(pd_afr['at'])
    
    # return on asset
    ratio_pd['roa'] = pd_afr['ib']/(pd_afr['at'])

    # various ratios
    ratio_pd['ppe'] = pd_afr['ppent']/(pd_afr['at'])
    ratio_pd['cash2asset'] = pd_afr['ch']/(pd_afr['at']) 
    ratio_pd['cash2sale'] = pd_afr['ch']/(pd_afr['sale'])
    ratio_pd['sale2asset'] = pd_afr['sale']/(pd_afr['at'])
    
    # current ratio
    ratio_pd['cr'] = pd_afr['act']/(pd_afr['lct']) 
    
    # sales growth and asset growth, relative to the firm's previous available record
    growth_pd = pd_afr[['gvkey', 'year', 'sale', 'at']].copy()
    growth_pd[['lag_year', 'lag_sale', 'lag_at']] = growth_pd.groupby('gvkey', sort=False)[['year', 'sale', 'at']].shift(1)
    growth_pd['d_sale'] = (growth_pd['sale'] - growth_pd['lag_sale'])/growth_pd['lag_sale']
    growth_pd['d_at'] = (growth_pd['at'] - growth_pd['lag_at'])/growth_pd['lag_at']
    

    # gsi ratio
    ratio_pd['gsi'] = pd_afr['cogs']/pd_afr['invt']

    # debt-to-equity ratio
    ratio_pd['de'] = (pd_afr['dlc']+pd_afr['dltt'])/pd_afr['ceq']

    # roe
    ratio_pd['roe'] = pd_afr['ib']/pd_afr['ceq']


    #print('check df structure ok: ', growth_pd.head(5))
    
    ratio_pd = ratio_pd.merge(growth_pd[['gvkey', 'year', 'd_sale', 'd_at']], on=['gvkey', 'year'])
    
    
    print('check df created ok: \n', ratio_pd.head(1))
    
    print('\n variable lists of ratio pd: ', ratio_pd.columns)

    print(f"the output df contains {len(ratio_pd.columns)} number of variables:", ratio_pd.columns)
    
    return ratio_pd
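
One thing to keep in mind about the growth variables: `shift(1)` within a firm returns the previous available record, which is not necessarily the previous calendar year. `create_var` computes `lag_year` but does not use it. The sketch below shows one possible way to use it, on made-up toy data: growth is kept only when the lagged record is exactly one year earlier. Whether gaps should be masked like this is an assumption on my side, not something `create_var` does.

```python
import numpy as np
import pandas as pd

# Toy panel: firm "1" has no record for 2002.
toy = pd.DataFrame({
    "gvkey": ["1", "1", "1", "2", "2"],
    "year": [2000, 2001, 2003, 2000, 2001],
    "sale": [100.0, 110.0, 121.0, 50.0, 75.0],
}).sort_values(["gvkey", "year"])

# Lag within each firm, as in create_var.
lag = toy.groupby("gvkey", sort=False)[["year", "sale"]].shift(1)
toy["lag_year"] = lag["year"]
toy["lag_sale"] = lag["sale"]

# Keep growth only when the previous record is from the year before.
consecutive = toy["lag_year"] == toy["year"] - 1
toy["d_sale"] = ((toy["sale"] - toy["lag_sale"]) / toy["lag_sale"]).where(consecutive)
print(toy)
```

In the toy panel, the 2003 row of firm "1" gets a missing `d_sale` instead of a two-year growth rate.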
    

In [36]:
ratio_pd_w_raw = create_var(fin_var)

check df created ok: 
    gvkey  year       at     sale     ch  rdip       m2b       lev      roa  \
0  10000  1996  624.806  721.805  4.664   NaN  0.927419  0.423247  0.02346   

        ppe  cash2asset  cash2sale  sale2asset        cr       gsi        de  \
0  0.203133    0.007465   0.006462    1.155247  2.024097  2.053331  1.184056   

        roe  d_sale  d_at  
0  0.065631     NaN   NaN  

 variable lists of ratio pd:  Index(['gvkey', 'year', 'at', 'sale', 'ch', 'rdip', 'm2b', 'lev', 'roa', 'ppe',
       'cash2asset', 'cash2sale', 'sale2asset', 'cr', 'gsi', 'de', 'roe',
       'd_sale', 'd_at'],
      dtype='object')
the output df contains 19 number of variables: Index(['gvkey', 'year', 'at', 'sale', 'ch', 'rdip', 'm2b', 'lev', 'roa', 'ppe',
       'cash2asset', 'cash2sale', 'sale2asset', 'cr', 'gsi', 'de', 'roe',
       'd_sale', 'd_at'],
      dtype='object')
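
Several of the ratios divide by items that can be zero (for example `sale` in `cash2sale` or `invt` in `gsi`), in which case pandas returns `inf` rather than raising an error. A minimal sketch on toy data of how such values could be counted and turned into missing values; treating them as missing is an assumption, and the notebook itself stores the ratios as computed.

```python
import numpy as np
import pandas as pd

# Toy rows: a normal firm, a firm with zero sales, and a firm with zero cash and zero sales.
toy = pd.DataFrame({"ch": [5.0, 2.0, 0.0], "sale": [50.0, 0.0, 0.0]})
toy["cash2sale"] = toy["ch"] / toy["sale"]  # x/0 gives inf, 0/0 gives NaN

# Count infinite values, then replace them with NaN.
n_inf = int(np.isinf(toy["cash2sale"]).sum())
toy["cash2sale"] = toy["cash2sale"].replace([np.inf, -np.inf], np.nan)
print(n_inf, "infinite value(s) replaced")
print(toy)
```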


In [37]:
print(f"saving raw financial variable tables from {s_year} to {e_year}; table size: ", ratio_pd_w_raw.shape)
ratio_pd_w_raw.to_pickle(f'{tmp_data_path}/fv_raw_{s_year}_{e_year}.pickle')


saving raw financial variable tables from 1996 to 2020; table size:  (290959, 19)


In [38]:
ratio_pd_w_raw.columns

Index(['gvkey', 'year', 'at', 'sale', 'ch', 'rdip', 'm2b', 'lev', 'roa', 'ppe',
       'cash2asset', 'cash2sale', 'sale2asset', 'cr', 'gsi', 'de', 'roe',
       'd_sale', 'd_at'],
      dtype='object')

In [39]:
ratio_pd_w_raw.year.unique()

array([1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006,
       2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
       2018, 2019, 2020])
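
Besides the year coverage, it may be worth confirming that each (gvkey, year) pair appears only once. The sketch below reproduces the de-duplication step from `create_var` on made-up toy data (sort by `datadate`, keep the last record per firm-year) and then asserts uniqueness; the same `duplicated` check could be run on `ratio_pd_w_raw`.

```python
import pandas as pd

# Toy data: firm "1" has two records dated in 2000.
toy = pd.DataFrame({
    "gvkey": ["1", "1", "2"],
    "datadate": pd.to_datetime(["2000-06-30", "2000-12-31", "2000-12-31"]),
    "at": [10.0, 12.0, 30.0],
})
toy["year"] = toy["datadate"].dt.year

# Same logic as create_var: sort, then keep the latest record per firm-year.
toy = toy.sort_values(["gvkey", "year", "datadate"], ascending=True)
dedup = toy.groupby(["gvkey", "year"], sort=False).tail(1)

# Each firm-year should now be observed exactly once.
assert not dedup.duplicated(["gvkey", "year"]).any()
print(dedup)
```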