In [2]:
from imputation import core_utils, core_imputation_model_new
import numpy as np
from tqdm.notebook import tqdm

# Data Loading

`core_utils.get_data_panel` loads the data from the corresponding `data_path`. This is the feather file shared on Google Drive, since it is too large to host on GitHub. The method returns:
- the characteristic percentile ranks, as a numpy array of shape T×N×C, where T is the number of dates, N the number of stocks, and C the number of characteristics
- the raw characteristics
- the characteristic names
- the dates
- the returns
- the permnos
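To make the panel layout concrete, here is a minimal sketch of turning a raw T×N×C array into cross-sectional percentile ranks while leaving missing entries as NaN. The function name `percentile_rank_panel` and the scaling of ranks to (0, 1] are assumptions made for illustration; the package may centre or scale its ranks differently.

```python
import numpy as np

def percentile_rank_panel(raw):
    """Rank each characteristic across stocks within each date, ignoring NaNs.

    raw: array of shape (T, N, C). Returns an array of the same shape with
    ranks in (0, 1] and NaN wherever the input was missing.
    NOTE: hypothetical helper, not part of the imputation package.
    """
    T, N, C = raw.shape
    out = np.full(raw.shape, np.nan)
    for t in range(T):
        for c in range(C):
            col = raw[t, :, c]
            present = ~np.isnan(col)
            n_obs = present.sum()
            if n_obs == 0:
                continue
            # argsort of argsort gives 0-based ranks among the observed values
            ranks = col[present].argsort().argsort() + 1
            out[t, present, c] = ranks / n_obs
    return out

# Synthetic panel: 3 dates, 5 stocks, 2 characteristics, one missing entry
rng = np.random.default_rng(0)
raw = rng.normal(size=(3, 5, 2))
raw[0, 1, 0] = np.nan
ranked = percentile_rank_panel(raw)
```

Missing entries stay missing after ranking, which is what the imputation step later fills in.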

In [3]:
data_path = "data/raw_chars_returns_df_yearly_fb_monthly_avg_mergedizes.fthr"
percentile_rank_chars, raw_chars, chars, date_vals, returns, permnos = core_utils.get_data_panel(
    path=data_path, computstat_data_present_filter=True, start_date=19770000)

In [4]:
char_groupings = core_utils.CHAR_GROUPINGS

The two methods you will be interested in are:
- `core_imputation_model.fit_factors_and_loadings`
- `core_imputation_model.impute_chars`

The first generates the factors and loadings. 

The second runs the regressions to potentially combine time-series information with cross-sectional information and perform the imputation.
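The two steps can be pictured with a generic regularized factor model for a single date. This is a sketch under stated assumptions, not the package's implementation: `fit_loadings` and `impute_cross_section` are hypothetical names, the loadings are taken as top eigenvectors of a second-moment matrix built from co-observed entries, and `reg` is a ridge penalty chosen for illustration.

```python
import numpy as np

def fit_loadings(X, K):
    """Top-K eigenvectors of the second-moment matrix of co-observed entries.

    X: (N, C) cross-section with NaN for missing values. Returns (C, K).
    """
    obs = ~np.isnan(X)
    X0 = np.where(obs, X, 0.0)
    counts = obs.T.astype(float) @ obs.astype(float)  # co-observation counts
    second_moment = (X0.T @ X0) / np.maximum(counts, 1.0)
    _, eigvecs = np.linalg.eigh(second_moment)         # ascending eigenvalues
    return eigvecs[:, ::-1][:, :K]

def impute_cross_section(X, loadings, reg=0.01):
    """Ridge-regress each stock's observed characteristics on the loadings,
    then fill its missing characteristics with the fitted values."""
    N, C = X.shape
    K = loadings.shape[1]
    out = X.copy()
    for i in range(N):
        obs = ~np.isnan(X[i])
        if not obs.any():
            continue  # nothing to regress on; leave the row as NaN
        L_obs = loadings[obs]
        factors = np.linalg.solve(L_obs.T @ L_obs + reg * np.eye(K),
                                  L_obs.T @ X[i, obs])
        fitted = loadings @ factors
        out[i, ~obs] = fitted[~obs]
    return out

# Synthetic rank-2 cross-section with roughly 20% of entries masked
rng = np.random.default_rng(0)
full = rng.normal(size=(200, 2)) @ rng.normal(size=(6, 2)).T
mask = rng.random(full.shape) < 0.2
mask[:, 0] = False  # keep one characteristic observed for every stock
X = np.where(mask, np.nan, full)

loadings = fit_loadings(X, K=2)
imputed = impute_cross_section(X, loadings, reg=0.01)
```

Observed values are left untouched; only the NaN entries are replaced by the factor-model fit.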

The examples below correspond to local and global fits. The parameters are documented in the function definitions.

# Running Imputations

In this section we will run the imputation method described in the paper.

In [None]:
T, N, L = percentile_rank_chars.shape

## Local Fit

We first look at a local estimation. In this case we show how to estimate either the purely cross-sectional model or the cross-sectional model with backward time-series information.

Local estimation means we allow the loadings and factors in the cross-sectional model to vary over time, as well as the time-series regression coefficients.
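The difference between time-varying and fixed loadings can be sketched as follows. This is an illustrative assumption about what such a switch might do, not the package's code: `fit_loadings_panel` and its helpers are hypothetical, and the loadings are again taken as top eigenvectors of a second-moment matrix.

```python
import numpy as np

def second_moment(X):
    """Second-moment matrix of an (M, C) array, using co-observed entries only."""
    obs = ~np.isnan(X)
    X0 = np.where(obs, X, 0.0)
    counts = obs.T.astype(float) @ obs.astype(float)
    return (X0.T @ X0) / np.maximum(counts, 1.0)

def top_k_eigvecs(M, K):
    """Eigenvectors of a symmetric matrix for its K largest eigenvalues."""
    _, vecs = np.linalg.eigh(M)
    return vecs[:, ::-1][:, :K]

def fit_loadings_panel(panel, K, time_varying):
    """panel: (T, N, C). Returns (T, C, K) if time_varying, else (C, K)."""
    T, N, C = panel.shape
    if time_varying:
        # One set of loadings per date, each from that date's cross-section
        return np.stack([top_k_eigvecs(second_moment(panel[t]), K)
                         for t in range(T)])
    # One set of loadings from all dates pooled together
    return top_k_eigvecs(second_moment(panel.reshape(T * N, C)), K)

rng = np.random.default_rng(1)
panel = rng.normal(size=(4, 50, 5))
panel[rng.random(panel.shape) < 0.1] = np.nan

local_loadings = fit_loadings_panel(panel, K=2, time_varying=True)
pooled_loadings = fit_loadings_panel(panel, K=2, time_varying=False)
```

With time-varying loadings each date gets its own C×K matrix, whereas the pooled version shares a single matrix across all dates.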

In [5]:
imputation = core_imputation_model_2.run_imputation(
    percentile_rank_chars, 
    n_xs_factors=20,
    time_varying_loadings=True,
    xs_factor_reg=0.01,
    use_bw_ts_info=False, 
    use_fw_ts_info=False,
    include_ts_residuals=True,
    min_xs_obs=1, 
    xs_regr_n_iter=3
)

bw_xs_imputation = core_imputation_model_2.run_imputation(
    percentile_rank_chars, 
    n_xs_factors=20,
    time_varying_loadings=True,
    xs_factor_reg=0.01,
    use_bw_ts_info=True, 
    use_fw_ts_info=False,
    include_ts_residuals=True,
    min_xs_obs=1, 
    xs_regr_n_iter=3
)

In [None]:
gamma_ts, lmbda = fit_factors_and_loadings(
    char_panel=percentile_rank_chars, 
    min_chars=min_xs_obs, 
    K=n_xs_factors, 
    num_months_train=T,
    reg=xs_factor_reg,
    time_varying_lambdas=time_varying_loadings,
    n_iter=xs_regr_n_iter,
    eval_data=None,
    run_in_parallel=True
)

## Global Fit

We next look at a global estimation. In this case we show how to estimate the global model using forward and backward time-series information as well as the cross-sectional information.

In contrast to local estimation, global estimation holds the loadings in the cross-sectional model fixed over time (see `time_varying_lambdas=False` in the `fit_factors_and_loadings` call in this section).
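Backward and forward time-series information can be pictured as the most recent past value and the next future value observed for each stock and characteristic. The sketch below builds both and then combines a cross-sectional fit with the backward value through a plain OLS regression. All names (`last_observed`, `next_observed`, `combine_xs_and_ts`) are hypothetical, and the regression specification is an assumption for illustration rather than the package's exact model.

```python
import numpy as np

def last_observed(panel):
    """For each (t, n, c): latest value observed strictly before t, else NaN."""
    out = np.full(panel.shape, np.nan)
    carry = np.full(panel.shape[1:], np.nan)
    for t in range(panel.shape[0]):
        out[t] = carry
        carry = np.where(np.isnan(panel[t]), carry, panel[t])
    return out

def next_observed(panel):
    """For each (t, n, c): earliest value observed strictly after t, else NaN."""
    return last_observed(panel[::-1])[::-1]

def combine_xs_and_ts(target, xs_fit, bw):
    """OLS of observed target on [xs_fit, bw] for one characteristic and date.

    Missing targets with a backward value get the regression prediction;
    missing targets without one fall back to the cross-sectional fit.
    """
    can_fit = ~np.isnan(target) & ~np.isnan(bw)
    Z = np.column_stack([xs_fit, bw])
    beta, *_ = np.linalg.lstsq(Z[can_fit], target[can_fit], rcond=None)
    out = target.copy()
    missing = np.isnan(target)
    has_bw = missing & ~np.isnan(bw)
    out[has_bw] = Z[has_bw] @ beta
    no_bw = missing & np.isnan(bw)
    out[no_bw] = xs_fit[no_bw]
    return out, beta

# Time-series helpers on a tiny panel: one stock, one characteristic
tiny = np.array([1.0, np.nan, 3.0, np.nan]).reshape(4, 1, 1)
bw_vals = last_observed(tiny).ravel()
fw_vals = next_observed(tiny).ravel()

# Regression demo on synthetic data with known coefficients
rng = np.random.default_rng(2)
xs_fit = rng.normal(size=50)
bw = rng.normal(size=50)
truth = 0.3 * xs_fit + 0.7 * bw
target = truth.copy()
target[:5] = np.nan   # five stocks are missing the characteristic
bw[3:8] = np.nan      # some stocks have no past observation
combined, beta = combine_xs_and_ts(target, xs_fit, bw)
```

Adding the forward value would simply append `next_observed` as a third regressor, at the cost of using information from after the imputation date.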

In [None]:
imputation = core_imputation_model_2.run_imputation(
    percentile_rank_chars, 
    n_xs_factors=20,
    time_varying_loadings=False,
    xs_factor_reg=0.01,
    use_bw_ts_info=True, 
    use_fw_ts_info=True,
    include_ts_residuals=False,
    min_xs_obs=1, 
    xs_regr_n_iter=3
)


In [125]:
gamma_ts, lmbda = fit_factors_and_loadings(
    char_panel=percentile_rank_chars, 
    min_chars=min_xs_obs, 
    K=n_xs_factors, 
    num_months_train=T,
    reg=xs_factor_reg,
    time_varying_lambdas=False,
    n_iter=xs_regr_n_iter,
    eval_data=None,
    run_in_parallel=True
)