In [1]:
## Preamble: Package Loading
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
from mpl_toolkits.mplot3d import Axes3D
import pandas as pd
import itertools as iter
import math 
import sys
import os
import json

<h2> Panel Selection and Control: Monte Carlo Data Generating Process  </h2> 

<h3> 1.0 DGP Description </h3>

Here I describe how the data used in the Monte Carlo simulation are generated. 

<h3> 1.1 Error, Instrument, and Exogenous Variable Generation </h3>

Let $n_{tp} \equiv T$ be the total number of time periods, $n_{end} \equiv p_1$ the number of endogenous regressors included in the primary regression, $n_{exo} \equiv p_2$ the number of exogenous regressors included in the primary regression, and $n_{tinst} \equiv w$ and $n_{cinst} \equiv w_j$ the total number of available instruments and the number of instruments relevant to each cross section, respectively. Now let

$$
\begin{align*} 
\rho_{er} &= \begin{bmatrix} \rho_{er,1} & \rho_{er,2} & \cdots & \rho_{er,n_{end}} \end{bmatrix} \\
\rho_{inst} &= \begin{bmatrix} \rho_{inst,1} & \rho_{inst,2} & \cdots & \rho_{inst,n_{tinst}-1} \end{bmatrix}\\
\rho_{ex} &= \begin{bmatrix} \rho_{ex,1} & \rho_{ex,2} & \cdots & \rho_{ex,n_{ex}-1} \end{bmatrix}  
\end{align*}
$$

The covariance matrices for *each* cross section are then defined as follows:

$$
\begin{align*}
cv_{er} &= \begin{bmatrix} 
1 & \rho_{er,1} & \rho_{er,2} & \cdots & \rho_{er,n_{end}} \\
\rho_{er,1} & 1  & \rho_{er,1} &\cdots & \rho_{er,n_{end}-1} \\
\rho_{er,2} & \rho_{er,1} & 1 & \cdots & \rho_{er,n_{end}-2} \\
\vdots & &&\ddots&  \\
 \rho_{er,n_{end}} & \rho_{er,n_{end}-1} & \rho_{er,n_{end}-2} & \cdots &  1 
\end{bmatrix}
%
\hspace{1cm} 
%
%
cv_{ex} &= \begin{bmatrix} 
1 & \rho_{ex,1} & \rho_{ex,2} & \cdots & \rho_{ex,n_{ex}-1} \\
\rho_{ex,1} & 1  & \rho_{ex,1} &\cdots & \rho_{ex,n_{ex}-2} \\
\rho_{ex,2} & \rho_{ex,1} & 1 & \cdots & \rho_{ex,n_{ex}-3} \\
\vdots & &&\ddots&  \\
 \rho_{ex,n_{ex}-1} & \rho_{ex,n_{ex}-2} & \rho_{ex,n_{ex}-3} & \cdots &  1 
\end{bmatrix}
\end{align*} 
$$
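
The banded structure above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming unit variances and zero correlation beyond the supplied $\rho$ values; the helper name `banded_cov` and the example inputs are hypothetical and chosen only for demonstration. It relies on the same `np.eye(n, k=...)` idea that `psc_dgp` uses later in this notebook.

```python
import numpy as np

def banded_cov(rho, n):
    """Unit-variance covariance matrix whose k-th off-diagonals equal rho[k-1].

    Hypothetical helper for illustration; off-diagonals beyond len(rho) stay zero.
    """
    cov = np.eye(n)
    for k, r in enumerate(rho, start=1):
        # np.eye(n, k=k) has ones on the k-th superdiagonal, k=-k on the subdiagonal
        cov += r * (np.eye(n, k=k) + np.eye(n, k=-k))
    return cov

# Illustrative values only
cv_ex = banded_cov([0.5], 5)
cv_inst = banded_cov([0.5, 0.25], 4)
print(cv_inst)
```

Each row of the result is a shifted copy of `[1, rho_1, rho_2, ...]`, which is the pattern shown in the matrices above.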

Since every cross section uses some subset of the vector of instruments, the following is the full covariance matrix for all instruments.

$$
CV_{inst} = \begin{bmatrix} 
1 & \rho_{inst,1} & \rho_{inst,2} & \cdots & \rho_{inst,n_{tinst}-1} \\
\rho_{inst,1} & 1  & \rho_{inst,1} &\cdots & \rho_{inst,n_{tinst}-2} \\
\rho_{inst,2} & \rho_{inst,1} & 1 & \cdots & \rho_{inst,n_{tinst}-3} \\
\vdots & &&\ddots&  \\
 \rho_{inst,n_{tinst}-1} & \rho_{inst,n_{tinst}-2} & \rho_{inst,n_{tinst}-3} & \cdots &  1 
\end{bmatrix}
%
$$

As a result we can construct the covariance matrices for all cross sections,

$$
\begin{align*}
CV_{er} &= 
\begin{bmatrix}
cv_{er} & \mathbf{0}_{(n_{end}+1 \times n_{end}+1)} & \cdots & \mathbf{0}_{(n_{end}+1 \times n_{end}+1)}  \\
\mathbf{0}_{(n_{end}+1 \times n_{end}+1)} & cv_{er} & \cdots & \mathbf{0}_{(n_{end}+1 \times n_{end}+1)}  \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0}_{(n_{end}+1 \times n_{end}+1)} & \mathbf{0}_{(n_{end}+1 \times n_{end}+1)} & \cdots & cv_{er}
\end{bmatrix} 
%
\hspace{1cm}
%
CV_{ex}  = 
\begin{bmatrix}
cv_{ex} & \mathbf{0}_{(n_{ex} \times n_{ex})} & \cdots & \mathbf{0}_{(n_{ex} \times n_{ex})}  \\
\mathbf{0}_{(n_{ex} \times n_{ex})} & cv_{ex} & \cdots & \mathbf{0}_{(n_{ex} \times n_{ex})}  \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0}_{(n_{ex} \times n_{ex})} & \mathbf{0}_{(n_{ex} \times n_{ex})} & \cdots & cv_{ex}
\end{bmatrix} 
\end{align*}
$$


Now I generate error terms, instruments, and exogenous variables from multivariate normal distributions. First let

$$ 
\begin{align*} 
Z_{2jt} &= \begin{bmatrix} Z_{2jt,1} & Z_{2jt,2} & \cdots & Z_{2jt,n_{ex}} \end{bmatrix}' \\[10pt]  
W_t &= \begin{bmatrix} W_{t,1} & W_{t,2} & \cdots & W_{t,n_{tinst}} \end{bmatrix}' \\[10pt]
\tilde{V}_{jt} &= \begin{bmatrix} V_{jt,1} & V_{jt,2}& \cdots & V_{jt,n_{end}} & \varepsilon_{jt} \end{bmatrix}' 
\end{align*} 
$$

Then draw $ W_{t} \sim N(\mathbf{0}_{n_{tinst} \times 1}, CV_{inst})$ and

$$
\begin{bmatrix} Z_{21t}' & Z_{22t}' & \cdots & Z_{2n_{cs}t}' \end{bmatrix}' \sim N(\mathbf{0}_{n_{cs} \cdot n_{exo} \times 1}, CV_{ex})
\hspace{1cm} \text{ and } \hspace{1cm} 
\begin{bmatrix} \tilde{V}_{1t}' & \tilde{V}_{2t}' & \cdots & \tilde{V}_{n_{cs},t}' \end{bmatrix}' \sim N(\mathbf{0}_{n_{cs} \cdot (n_{end} +1) \times 1}, CV_{er})
$$
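
As a rough sketch of the draws described above, the snippet below samples the stacked exogenous regressors for all cross sections at once and then splits them by cross section. The dimensions, the seed, and the use of `np.random.default_rng` are assumptions made for illustration; the notebook's own function uses `np.random.seed` and `np.random.multivariate_normal` instead.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for the example
T, n_cs, n_exo = 30, 3, 2        # illustrative dimensions

# Per-cross-section covariance and its block diagonal expansion
cv_ex = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
CV_ex = np.kron(np.eye(n_cs), cv_ex)

# One row per time period, columns ordered [Z_21, Z_22, ..., Z_2ncs]
Z2 = rng.multivariate_normal(np.zeros(n_cs * n_exo), CV_ex, size=T)

# View as (time, cross section, regressor)
Z2_by_cs = Z2.reshape(T, n_cs, n_exo)
print(Z2.shape, Z2_by_cs.shape)
```

Because the covariance is block diagonal, draws for different cross sections are independent, while regressors within a cross section stay correlated.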

<h3> 1.2 Endogenous Variable Generation </h3>  

We now generate the endogenous variables from the above. In accordance with the material presented in the proposal, we consider two different structures. 

1. Non Panel: Parameter vectors are not shared across cross sections, meaning that the secondary equations for each cross section are separate.


2. Panel: Parameter vectors are shared across cross sections.

<h4> 1.2.1 Endogenous Variable Generation: Non Panel </h4>

Since the secondary equations are not of panel type, I proceed as follows:

* For each $j\in \{1,2,\cdots , n_{cs}\}$ and $d \in \{1,2,\cdots,n_{end} \}$, draw the coefficient vector $\alpha_{1jd}$ from the Cartesian product of $\{-1,1\}$ with itself $n_{exo}$ times. 


* For each $j\in \{1,2,\cdots , n_{cs}\}$, draw a set of integers $C_{j}$ from the $\mathcal{C}^{n_{tinst}}_{n_{cinst}}$ ways of choosing $n_{cinst}$ instruments from the $n_{tinst}$ total instruments. These instruments are included in every regression of $Z_{1j}$ on $Z_{2j}$ and $W$, i.e. these numbers define which columns of $W$ are included in $W_j$.


* For each $j\in \{1,2,\cdots , n_{cs}\}$ and $d \in \{1,2,\cdots,n_{end} \}$, draw the coefficient vector $\alpha_{2jd}$ from the Cartesian product of $\{-1,1\}$ with itself $n_{cinst}$ times. 


* Then for each $j\in \{1,2,\cdots , n_{cs}\}$ and $d \in \{1,2,\cdots,n_{end} \}$, generate the endogenous regressors $Z_{1jd}$ as follows

$$ Z_{1jd} =  \alpha_{0jd} + Z_{2jt}' \alpha_{1jd} + W_{jt}' \alpha_{2jd} + V_{jt,d} \hspace{1cm} \text{ where } \hspace{1cm} \alpha_{0jd} = 1/2+j/2 $$ 
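
The non-panel steps above can be sketched for a single cross section $j$ and a single endogenous regressor $d$. This is an illustration under simplifying assumptions: the regressors, instruments, and errors are drawn as independent standard normals rather than from the covariance structure of section 1.1, the coefficient vectors are drawn independently with `rng.choice`, and all variable names are hypothetical.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)            # arbitrary seed for the example
T, n_exo, n_tinst, n_cinst = 20, 2, 4, 2  # illustrative dimensions
j = 1                                     # cross section index (1-based)

# Stand-ins for Z_2j, W and V_{j,d}; independent draws for simplicity
Z2 = rng.standard_normal((T, n_exo))
W = rng.standard_normal((T, n_tinst))
V = rng.standard_normal(T)

# Pick one of the C(n_tinst, n_cinst) instrument subsets for cross section j
subsets = list(combinations(range(n_tinst), n_cinst))
C_j = subsets[rng.integers(len(subsets))]

# Coefficient vectors with entries in {-1, 1}
alpha1 = rng.choice([-1, 1], size=n_exo)
alpha2 = rng.choice([-1, 1], size=n_cinst)
alpha0 = 0.5 + j / 2

# Z_1jd = alpha_0jd + Z_2j' alpha_1jd + W_j' alpha_2jd + V_{j,d}
Z1 = alpha0 + Z2 @ alpha1 + W[:, list(C_j)] @ alpha2 + V
print(C_j, alpha1, alpha2, Z1[:3])
```

Only the columns of `W` listed in `C_j` enter the equation, which is what makes the instrument set specific to the cross section.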



<h4> 1.2.2 Endogenous Variable Generation: Panel </h4>

If the secondary equations are of panel type, I proceed as follows:

* For each $d \in \{1,2,\cdots,n_{end} \}$, draw the coefficient vector $\alpha_{1d}$ from the Cartesian product of $\{-1,1\}$ with itself $n_{exo}$ times. 


* For each $j\in \{1,2,\cdots , n_{cs}\}$, draw a set of integers $C_{j}$ from the $\mathcal{C}^{n_{tinst}}_{n_{cinst}}$ ways of choosing $n_{cinst}$ instruments from the $n_{tinst}$ total instruments. These instruments are included in every regression of $Z_{1j}$ on $Z_{2j}$ and $W$, i.e. these numbers define which columns of $W$ are included in $W_j$.


* For each $d \in \{1,2,\cdots,n_{end} \}$, draw the coefficient vector $\alpha_{2d}$ from the Cartesian product of $\{-1,1\}$ with itself $n_{tinst}$ times. 


* Then for each $j\in \{1,2,\cdots , n_{cs}\}$ and $d \in \{1,2,\cdots,n_{end} \}$, generate the endogenous regressors $Z_{1jd}$ as follows

$$ Z_{1jd} =  \alpha_{0jd} + Z_{2jt}' \alpha_{1d} + W_{jt}' \alpha_{2d} + V_{jt,d} \hspace{1cm} \text{ where } \hspace{1cm} \alpha_{0jd} = 1/2+j/2 $$ 
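
In the panel case the coefficient vectors are shared, and only the instrument selection differs across cross sections. One way to picture this, sketched below under the same simplifying assumptions as before (independent standard normal draws, hypothetical names, arbitrary dimensions), is a 0/1 mask per cross section applied to the common vector $\alpha_{2d}$, so that excluded instruments get a zero coefficient.

```python
import numpy as np

rng = np.random.default_rng(2)                    # arbitrary seed
T, n_cs, n_exo, n_tinst, n_cinst = 20, 3, 2, 5, 3  # illustrative dimensions

# Coefficient vectors shared by every cross section, entries in {-1, 1}
alpha1 = rng.choice([-1, 1], size=n_exo)
alpha2 = rng.choice([-1, 1], size=n_tinst)

W = rng.standard_normal((T, n_tinst))   # common instruments
masks = np.zeros((n_cs, n_tinst))       # masks[j, k] = 1 if instrument k is in C_j
Z1 = np.empty((T, n_cs))

for j in range(n_cs):
    # Instruments relevant to cross section j (drawn without replacement)
    C_j = rng.choice(n_tinst, size=n_cinst, replace=False)
    masks[j, C_j] = 1.0
    Z2_j = rng.standard_normal((T, n_exo))
    V_j = rng.standard_normal(T)
    alpha0 = 0.5 + (j + 1) / 2          # intercept uses the 1-based index
    # Masked shared coefficients: zero weight on instruments outside C_j
    Z1[:, j] = alpha0 + Z2_j @ alpha1 + W @ (masks[j] * alpha2) + V_j

print(masks * alpha2)
```

The printed matrix has the same shape of pattern as the coefficient tables reported later in this notebook: identical nonzero entries down each column, with zeros where an instrument is not relevant to a cross section.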


<h3> 1.3 Primary Regressand Generation </h3>

Having generated all primary regressors, I generate the regressand for the primary equation as follows:

* Draw the coefficient vector $\beta_1$ from the Cartesian product of $\{-1,1\}$ with itself $n_{exo} +n_{end}$ times. Then set


$$ Y_{jt} = [\; Z_{1jt}' \;\; Z_{2jt}' \;] \beta_1 + e_j + \varepsilon_{jt} \;\;\;\; \text{ where } \;\;\;\;  e_{j} = 1+j/2  $$
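
A short sketch of the primary equation for one cross section follows. The inputs are independent standard normal stand-ins with hypothetical names and arbitrary dimensions, and $\beta_1$ is set to an alternating $1,-1,\dots$ pattern, which is one element of the $\{-1,1\}$ product and matches the `pcoeff` vector printed later in this notebook. The fixed effect follows the formula $e_j = 1 + j/2$ as written above.

```python
import numpy as np

rng = np.random.default_rng(3)   # arbitrary seed
T, n_end, n_exo = 10, 2, 3       # illustrative dimensions
j = 2                            # cross section index (1-based)

# Stand-ins for the endogenous regressors, exogenous regressors and primary error
Z1 = rng.standard_normal((T, n_end))
Z2 = rng.standard_normal((T, n_exo))
eps = rng.standard_normal(T)

# Alternating coefficient vector 1, -1, 1, ... of length n_end + n_exo
beta1 = np.array([(-1) ** k for k in range(n_end + n_exo)])

# Fixed effect for cross section j, following the formula in the text
e_j = 1 + j / 2

# Y_jt = [Z_1jt' Z_2jt'] beta_1 + e_j + eps_jt
Y = np.hstack([Z1, Z2]) @ beta1 + e_j + eps
print(beta1, Y[:3])
```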



<h3> 2.0 Block Diagonal Matrix Function </h3>

To facilitate construction of the block diagonal matrices $CV_{er}$ and $CV_{ex}$ discussed above, I define the following function.

In [2]:
def blkdiag(mat,nb):
    """
INPUTS
mat     Square matrix that forms each block of the block diagonal matrix
nb      Number of diagonal blocks in the output matrix 

OUTPUT
v       Block diagonal matrix of dimension ( nb*mat.shape[0] x nb*mat.shape[0] )
    """
    # First block row: mat followed by zeros
    v = np.hstack((mat,np.zeros((mat.shape[0],(nb-1)*(mat.shape[1])))))
    # Registry matrix: vreg[j,i] == 1 marks where mat sits in block row j+1
    vreg = np.eye(nb-1)
    for j in np.arange(nb-1):
        # Initializing current block row with a leading block of zeros
        pv = np.zeros((mat.shape[0],mat.shape[0]))
        # Horizontally stacking either mat or zeros depending on vreg[j,i]
        for i in np.arange(nb-1):
            if vreg[j,i] == 1: # Stack mat onto pv
                pv = np.hstack((pv,mat))
            if vreg[j,i] == 0: # Stack zeros onto pv
                pv = np.hstack((pv,np.zeros((mat.shape[0],mat.shape[0]))))
        # Vertically stacking block rows on top of one another
        v = np.vstack((v,pv))
    return v

<h3> 2.1 Block Diagonal Matrix Function Demonstration </h3>

In [3]:
blkdiag(np.ones((3,3)),4)

array([[1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1.]])
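
For comparison, the same block diagonal layout can be obtained with a Kronecker product. The sketch below is an alternative construction, not the function used in this notebook; the name `blkdiag_kron` is hypothetical.

```python
import numpy as np

def blkdiag_kron(mat, nb):
    """Block diagonal matrix with nb copies of mat on the diagonal.

    np.kron(I_nb, mat) places mat wherever the identity has a one
    and a zero block everywhere else.
    """
    return np.kron(np.eye(nb), mat)

out = blkdiag_kron(np.ones((3, 3)), 4)
print(out.shape)
```

The loop-based `blkdiag` makes the stacking explicit, while the Kronecker form is shorter; for a square `mat` both describe the same matrix.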

<h3> 3.0 DGP Generation Code </h3>

What follows is the function used to generate all Monte Carlo simulation data sets. 

In [4]:
def psc_dgp(inpt_d): 
    '''
PURPOSE: 
Generate draws from the DGP detailed in Penner (2018), 'Panel Selection and Control'

INPUTS: 
inpt_d      Dictionary with the following keyword items
 r_seed      Integer random number generator seed
 nds         Number of generated data sets (int)
 ntp         Number of time periods (int)
 ncs         Number of cross sections (int)
 n_end       Number of endogenous variables in primary regression (int)
 n_exo       Number of exogenous variables in primary regression (int)
 t_inst      Number of total instruments available to all cross sections (int)
 c_inst      Number of valid instruments per cross section (int < t_inst )
 frc         Indicator for forcing control function to be other than correlation based
 sec_pan     Indicator for whether secondary equation is panel (=1) or not (=0)
 ex_vpro     List of exogenous regr covariances where cov(Z21,i , Z21,(i+j)) = ex_vpro[j-1] 
 inst_vpro   List of instrument covariances where cov(Wi , W(i+j)) = inst_vpro[j-1] 
 err_vpro    List of error covariances where cov(V1,i , V1,(i+j)) = err_vpro[j-1] 
 
OUTPUTS: 
data_sets                      List with the following elements (data sets are stored at i >= 1)
 data_sets[0][0]                  dictionary inpt_d  returned
 data_sets[0][1]['Derr_nms']      list of column names for each error term df
 data_sets[0][1]['Dins_nms']      list of column names for each instrument df
 data_sets[0][1]['Dlng_nms']      list of column names for each long df
 data_sets[0][1]['coeff']         list of lists of coefficient matrices used in secondary eqn 
 data_sets[0][1]['pcoeff']        list of n_exo+n_end coefficients used to generate primary regressand
 data_sets[0][1]['var_inst']      list of lists that, when converted to an array, is the instrument VCOV matrix
 data_sets[0][1]['V_ex']          list of lists that, when converted to an array, is the exog regs VCOV matrix
 data_sets[0][1]['V_err']         list of lists that, when converted to an array, is the error term VCOV matrix
 data_sets[i][0]['err_df']        ith error term data array in list form
 data_sets[i][0]['prim_df']       ith primary regression variables data array in list form
 data_sets[i][0]['inst_df']       ith instruments data array in list form    
    '''

    # Extracting all input variables from inpt_d dictionary
    r_seed = inpt_d['r_seed']
    nds = inpt_d['nds']
    ntp = inpt_d['ntp']
    ncs = inpt_d['ncs']
    n_end = inpt_d['n_end']
    n_exo = inpt_d['n_exo']
    t_inst = inpt_d['t_inst']
    c_inst = inpt_d['c_inst']
    frc = inpt_d['frc']
    sec_pan = inpt_d['sec_pan']
    ex_vpro = inpt_d['ex_vpro']
    inst_vpro = inpt_d['inst_vpro']
    err_vpro = inpt_d['err_vpro']
    np.random.seed([r_seed])

    ## Coefficients on instruments in secondary equation
    if sec_pan == 0:
        # Not panel so each cross section has a separate coeff vector of length c_inst
        icoeffs_reg = list(iter.product([-1,1],repeat = c_inst))
        # All permutation of ncs pairs of n_end coeff vectors on t_inst instruments  
        picfs = list(iter.permutations(range(0,len(icoeffs_reg)),n_end))
        # Rand choosing 1 (if panel) or ncs (if !panel) coeff for ex regress in secondary reg.
        icfs = [ picfs[i] for i in np.random.randint(len(picfs),size = ncs)]
    else: 
        # Is panel so coefficients have common vector of length t_inst
        icoeffs_reg = list(iter.product([-1,1],repeat = t_inst))
        # All permutation of ncs pairs of n_end coeff vectors on t_inst instruments 
        if len(icoeffs_reg) < 99:
            picfs = list(iter.permutations(range(0,len(icoeffs_reg)),n_end))
            # Rand choosing 1 (if panel) coeff for ex regress in secondary reg.
            icfs = [ picfs[i] for i in np.random.randint(len(picfs),size =1)]*ncs
        else:
            icfs = [tuple(np.random.randint(len(icoeffs_reg),size =n_end))]*ncs

    ## Assignment of the relevant instruments to each cross section.
    # Registry of instrument assignments
    insts_reg = list(iter.combinations(np.arange(1,t_inst+1),c_inst))
    # List of which instruments (col #'s) are relevant for each cross section 
    icr = [insts_reg[np.random.randint(len(insts_reg))] for i in range(ncs)]

    ## Coefficients on exogenous variables in secondary equations       
    # Collection of all Coefficients on Exogenous Variables in secondary eqns
    excoeffs_reg = list(iter.product([-1,1],repeat = n_exo))
    # All permutation of combinations of ncs coeff vectors on  for ex regressors 
    pxcfs = list(iter.permutations(range(0,len(excoeffs_reg)),n_end))
    # Rand choosing 1 (if panel) or ncs (if !panel) coeff for ex regress in secondary reg.
    if sec_pan == 0:
        # Not panel so rand choosing ncs collections of n_exo reg numbers from pxcfs
        xcfs = [ pxcfs[i] for i in np.random.randint(len(pxcfs),size = ncs)]
    else:
        # Is panel so need only 1 collection of n_exo reg numbers from pxcfs & duplicating
        xcfs = [ pxcfs[i] for i in np.random.randint(len(pxcfs),size = 1)]*ncs

    ## Generation of the coefficient matrix for secondary regression.             
    # Initializing Coefficient Matrix            
    coeff = np.zeros((n_end,ncs,t_inst+n_exo))
    for j in range(n_end):
        for i in range(ncs):
            for k in range(n_exo):
                coeff[j,i,k] = excoeffs_reg[xcfs[i][j]][k]
            if sec_pan == 0:
                for l in range(c_inst):
                    k = icr[i][l]
                    coeff[j,i,k+n_exo-1] = icoeffs_reg[icfs[i][j]][l]   
            else: 
                for l in icr[i]:
                    coeff[j,i,l+n_exo-1] = icoeffs_reg[icfs[i][j]][l-1] 
                    
    ## Generation primary regression coefficient vector                
    # Common Primary Coeff Vector
    pcoeff = np.array([1,-1]*int(np.ceil((2*(n_end + n_exo)+1)/2)))[:n_end+n_exo].reshape(n_end+n_exo,1)
    # Fixed Effect for each crossection
    fe = [ 1+x/2 for x in np.arange(0,ncs)]

    ## Joint Distribution of Exogenous regressors
    # Vector of Means (=0)
    mu_ex = np.zeros(n_exo)
    # Diagonal matrix of variances (=1)
    var_ex = np.eye(n_exo)
    # Variance Covariance Matrix Generation for EACH cross section
    for i in np.arange(len(ex_vpro)):
        var_ex = (var_ex + ex_vpro[i]*np.eye(n_exo,k=i+1)
                             + ex_vpro[i]*np.eye(n_exo,k=-(i+1))) 

    ## Joint Distribution of Instruments for all cross sections
    # Vector of Means (=0)
    mu_inst = np.zeros(t_inst)
    # Diagonal Matrix of Variances (=1)
    var_inst = np.eye(t_inst)
    # Variance Covariance Matrix Generation
    for i in np.arange(len(inst_vpro)):
        var_inst = (var_inst + inst_vpro[i]*np.eye(t_inst,k=i+1) 
                             + inst_vpro[i]*np.eye(t_inst,k=-(i+1)))

    ## Joint Distribution of Error Terms for EACH crossection
    # Vector of means
    mu_err = np.zeros(n_end+1)
    # Diagonal Matrix of Variances 
    var_err = np.eye(n_end+1)
    # Variance Covariance Matrix Generation
    if frc == 0 : 
        # Var Cov matrix for correlated errors ==> additive linear control functions 
        for i in np.arange(len(err_vpro)):
            var_err = (var_err + err_vpro[i]*np.eye(n_end+1,k=i+1) 
                                 + err_vpro[i]*np.eye(n_end+1,k=-(i+1)))
    else: 
        # Error not explicitly correlated ==> have to force control functions.
        var_err = np.eye(n_end+1)

    # Error term mean vector for ALL Crossections
    Mu_err = np.tile(mu_err,ncs)        
    # Error term variance covariance matrix for ALL Crossections
    V_err = blkdiag(var_err,ncs)

    # Exogenous regressor mean vector for ALL Crossections
    Mu_ex = np.tile(mu_ex,ncs) 
    # Exogenous regressor variance covariance matrix for ALL Crossections
    V_ex = blkdiag(var_ex,ncs)

    ## Variable Name Generation
    # exogenous variable name generation 
    #          [Z21,1 , Z21,2 ......, Z22,1 , Z22,2 , ..... ]        
    ex_nms = [''.join(['Z2',str(i),',',str(j)]) 
              for i in list(range(1,ncs+1)) 
              for j in list(range(1,n_exo+1))]
    # instruments names generation 
    #          [W1 , W2 , .... ]
    inst_nms = [''.join(['W',str(i)]) for i in list(range(1,t_inst+1))]
    # Error terms names generation
    #          [V1,1 , V1,2 , .... ,e1 , V2,1 , V2,2 , ...... e2 , ......]
    err_nm1 = ['e' if val == n_end+1 else 'V' for val in  list(range(1,n_end+2))*ncs]
    err_nm2 = [ str(i) for y in range(1,ncs+1) for i in iter.repeat(y,n_end+1)]
    err_nm3 = ['' if val == n_end+1 else ''.join([',',str(val)]) 
               for val in list(range(1,n_end+2))*ncs]
    err_nm  = [''.join([err_nm1[i],err_nm2[i],err_nm3[i]]) for i in range(len(err_nm1))]

    # Initializing the data sets list
    data_sets = []

    for k in range(nds):
        ## Variable Generation
        time = np.arange(1,ntp+1).reshape(ntp,1)
        # Exogenous Regressor Generation
        Ex = np.random.multivariate_normal(Mu_ex,V_ex,ntp)
        Ex = np.hstack((time,Ex))
        # Instruments Generation
        Inst = np.random.multivariate_normal(mu_inst,var_inst,ntp)
        Inst = np.hstack((time,Inst))
        # Error Terms Generation 
        Err = np.random.multivariate_normal(Mu_err,V_err,ntp)
        Err = np.hstack((time,Err))
        ## Data Frame Generation 
        Ex_df = pd.DataFrame(Ex,columns = ['t'] + ex_nms)
        Inst_df = pd.DataFrame(Inst, columns = ['t'] + inst_nms)
        Err_df = pd.DataFrame(Err,columns = ['t'] + err_nm)

        ## Generating Endogenous (primary) regressors
        for j in range(n_end):
            for i in range(ncs):
                # Regular expression for the relevant exogenous regressors
                ex_pat = ''.join(['^Z2',str(i+1),','])
                # Regular expression for the relevant error term. 
                err_pat = ''.join(['V',str(i+1),',',str(j+1)])
                # Extracting exog regressors and all instruments, converting to numpy array
                pe1 = pd.concat([Ex_df.filter(regex = ex_pat),Inst_df.iloc[:,1:]], axis = 1).values
                # Extracting error variable and converting to numpy array
                pe2 = Err_df.filter(regex = err_pat).values
                # Calculating the endogenous primary regressor
                pe = pe1.dot(coeff[j,i,:]).reshape(pe1.shape[0],1) + pe2
                # Constructing the appropriate name for the endo regressor
                end_nm = ''.join(['Z1',str(i+1),',',str(j+1)])
                if j == 0 and i == 0:
                    # Initializing the endog df with first calculated regressor
                    End_df = pd.DataFrame(pe,columns = [end_nm])
                else:
                    # Adding calculated endog regressor onto df
                    End_df[end_nm] = pe

        ## Generation of primary regressand
        for i in range(ncs):
            # Regular expression for the relevant endogenous regressors
            en_pat = ''.join(['^Z1',str(i+1),','])
            # Regular expression for the relevant exogenous regressors
            ex_pat = ''.join(['^Z2',str(i+1),','])
            # Name of appropriate primary error term
            er_nm = ''.join(['e',str(i+1)])
            # Extracting appropriate regressor for primary equation
            pr3 = pd.concat([End_df.filter(regex = en_pat),
                             Ex_df.filter(regex = ex_pat)], axis = 1).values
            # Extracting appropriate error term
            pr4 = Err_df[er_nm].values.reshape(Err_df.shape[0],1)
            # Generating primary regressand
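            if frc == 0:
                # No forcing, so no need to include control functions explicitly
                pr = fe[i] + pr3.dot(pcoeff)+pr4
            else:
                # Control functions would need to be included explicitly here
                # (in progress); fail loudly rather than leave pr undefined
                raise NotImplementedError('frc != 0 is not implemented yet')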
            # Constructing the appropriate name for the endo regressor
            pr_nm = ''.join(['Y',str(i+1)])
            if i == 0:
                # Initializing the regressand df 
                Pr_df = pd.DataFrame(pr,columns = [pr_nm])   
            else:
                # Adding generated regressand to df
                Pr_df[pr_nm] = pr

        ## Converting Data To Long Panel Type
        for i in range(ncs):
            # Initializing temporary df
            pL = None
            eL = None
            # Columns Names for endogeneous regressors
            Z1_nm = [ ''.join(['Z1',',',str(j)]) for j in range(1,n_end+1)]
            # Columns Names for exogenous regressors
            Z2_nm = [ ''.join(['Z2',',',str(j)]) for j in range(1,n_exo+1)]
            
            # Names to be extracted from Err_df
            erl_ex_nm = (['t'] + [ ''.join(['V',str(i+1),',',str(j)]) for j in range(1,n_end+1)]
                               + [''.join(['e',str(i+1)])])
            # Long panel variable names
            erl_nm = [''.join(['V',str(j)]) for j in range(1,n_end+1)] + ['e']
            # Extracting errors for cross section i
            eL = Err_df.loc[:,erl_ex_nm].copy()
            # Renaming to long panel names
            eL.columns = ['t'] + erl_nm
            # Adding the cross section variable
            eL['crs'] = str(i+1)
            # Reordering variables
            eL = eL[['crs','t']+erl_nm]
            # Adding regressand columns to pL
            pL = pd.DataFrame(Pr_df[''.join(['Y',str(i+1)])].values,columns = ['Y'])
            # Adding endog regressors to pL
            pL = pd.concat([pL,pd.DataFrame(
                             End_df.filter(regex = ''.join(['^Z1',str(i+1),','])).values
                             ,columns = Z1_nm)],axis = 1)
            # Adding exog regressors to pL
            pL = pd.concat([pL,pd.DataFrame(
                             Ex_df.filter(regex = ''.join(['^Z2',str(i+1),','])).values
                             ,columns = Z2_nm)],axis = 1)
            # Adding the crossection variable
            pL['crs'] = i+1
            # Adding the time component variable
            pL['t'] = pd.DataFrame(np.arange(1,ntp+1).reshape(ntp,1))
            if i == 0 :
                # Initializing Data_long
                Data_long = pL
                err_long = eL
            else:
                # Adding pL to the bottom of Data_long
                Data_long = pd.concat([Data_long,pL], axis = 0)
                err_long = pd.concat([err_long,eL],axis = 0)

        # Moving the last two columns ('crs' and 't') to the front of Data_long
        Data_long = Data_long[list(Data_long.columns)[-2:] + list(Data_long.columns)[:-2]]

        # Extract names in Df_s only once
        if k == 0:
            # Names for export data sets
            inpt_d['cin'] = 'crs'
            inpt_d['tin'] = 't'
            inpt_d['ex_nm'] = Z2_nm
            inpt_d['en_nm'] = Z1_nm
            inpt_d['dep_nm'] = 'Y'
            Dlng_nms = list(Data_long.columns)
            Dins_nms = list(Inst_df.columns)
            Derr_nms = list(err_long.columns)
            nms_cfs = {}
            nms_cfs['Dlng_nms'] = Dlng_nms
            nms_cfs['Dins_nms'] = Dins_nms
            nms_cfs['Derr_nms'] = Derr_nms
            nms_cfs['coeff'] = coeff.tolist()
            nms_cfs['pcoeff'] = pcoeff.T[0].tolist()
            nms_cfs['var_inst'] = var_inst.tolist()
            nms_cfs['V_ex'] = V_ex.tolist()
            nms_cfs['V_err'] = V_err.tolist() 
            data_sets.append([inpt_d , nms_cfs])

        # Adding constructed data sets to data_sets list
        c = {}
        c['err_df'] = np.array(err_long).tolist()
        c['prim_df'] = np.array(Data_long).tolist()
        c['inst_df'] = np.array(Inst_df).tolist()
        data_sets.append([c])
        
    return data_sets

<h3> DGP Inputs </h3>

In [16]:
%%time
# Construction of the input dictionary (see: ?psc_dgp for details)
inpt_d = {'r_seed':199,'nds':1000, 'ntp':30 , 'ncs': 7 , 'n_end': 3,
             'n_exo': 5 , 't_inst': 10, 'c_inst': 6,
             'frc':0, 'sec_pan': 1, 'ex_vpro': [0.5],
             'inst_vpro': [0.5 , 0.25], 'err_vpro':[0.8 , 0.36, - 0.1 ]}
# Function call
psc_data = psc_dgp(inpt_d)

CPU times: user 4min 38s, sys: 7.01 s, total: 4min 45s
Wall time: 1min 25s


<h3> DGP JSON encoding and saving </h3>

In [17]:
wkng_folder = os.getcwd()
data_folder = os.path.join(wkng_folder,'data')
# Create the data folder if it does not exist yet
os.makedirs(data_folder, exist_ok=True)
output_filename = 'pscdata_6_7_1.json'
output_file_full = os.path.join(data_folder,output_filename)

with open(output_file_full, 'w') as f_obj:
    json.dump(psc_data, f_obj)
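
Reading the file back is the mirror image of the dump above. The sketch below uses a tiny toy list that only imitates the nested layout returned by `psc_dgp` (metadata in element 0, one data set in element 1); the toy values, the temporary directory, and the file name `toy_pscdata.json` are all assumptions made so the example runs on its own.

```python
import json
import os
import tempfile

import pandas as pd

# Toy stand-in with the same nesting as psc_data: [[inpt_d, nms_cfs], [data set 1]]
toy = [[{'nds': 1}, {'Dins_nms': ['t', 'W1', 'W2']}],
       [{'inst_df': [[1.0, 0.1, -0.2], [2.0, 0.3, 0.4]]}]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy_pscdata.json')
    # Write, then read back, the nested list
    with open(path, 'w') as f_obj:
        json.dump(toy, f_obj)
    with open(path) as f_obj:
        loaded = json.load(f_obj)

# Rebuild a data frame from the stored list of rows and the stored column names
inst_df = pd.DataFrame(loaded[1][0]['inst_df'], columns=loaded[0][1]['Dins_nms'])
print(inst_df)
```

Storing arrays as nested lists is what makes the structure JSON serializable; the column names saved in the metadata element are needed to restore labelled data frames.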

<h3> Data set meta data dictionary </h3>

In [18]:
psc_data[0][1]['pcoeff']

[1, -1, 1, -1, 1, -1, 1, -1]

<h3> Coefficient vector for regression of $Z_{1j1}$ on $[Z_{2j} , W ]$ </h3>

In [19]:
n_exo = psc_data[0][0]['n_exo']
t_inst = psc_data[0][0]['t_inst']
ncs = psc_data[0][0]['ncs']
c1 = [''.join(['$\\alpha_{1j1,',str(i),'}$']) for i in range(1,n_exo+1)]
for i in range(1,t_inst+1):
    c1.append(''.join(['$\\alpha_{2j1,',str(i),'}$']))

c11 = [''.join(['j=',str(i)]) for i in range(1,ncs+1)]
    
pd.DataFrame(np.array(psc_data[0][1]['coeff'][0]),columns = c1, index = c11)

Unnamed: 0,"$\alpha_{1j1,1}$","$\alpha_{1j1,2}$","$\alpha_{1j1,3}$","$\alpha_{1j1,4}$","$\alpha_{1j1,5}$","$\alpha_{2j1,1}$","$\alpha_{2j1,2}$","$\alpha_{2j1,3}$","$\alpha_{2j1,4}$","$\alpha_{2j1,5}$","$\alpha_{2j1,6}$","$\alpha_{2j1,7}$","$\alpha_{2j1,8}$","$\alpha_{2j1,9}$","$\alpha_{2j1,10}$"
j=1,1.0,-1.0,1.0,1.0,1.0,-1.0,0.0,1.0,1.0,-1.0,0.0,-1.0,0.0,-1.0,0.0
j=2,1.0,-1.0,1.0,1.0,1.0,-1.0,1.0,0.0,1.0,0.0,0.0,-1.0,0.0,-1.0,1.0
j=3,1.0,-1.0,1.0,1.0,1.0,-1.0,0.0,1.0,0.0,-1.0,1.0,-1.0,0.0,0.0,1.0
j=4,1.0,-1.0,1.0,1.0,1.0,-1.0,0.0,0.0,1.0,0.0,0.0,-1.0,-1.0,-1.0,1.0
j=5,1.0,-1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,-1.0,1.0,-1.0,-1.0,-1.0,1.0
j=6,1.0,-1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,-1.0,-1.0,-1.0,0.0
j=7,1.0,-1.0,1.0,1.0,1.0,-1.0,1.0,0.0,0.0,-1.0,0.0,0.0,-1.0,-1.0,1.0


<h3> Coefficient vector for regression of $Z_{1j2}$ on $[Z_{2j} , W ]$ </h3>

In [20]:
c2 = [''.join(['$\\alpha_{1j2,',str(i),'}$']) for i in range(1,n_exo+1)]
for i in range(1,t_inst+1):
    c2.append(''.join(['$\\alpha_{2j2,',str(i),'}$']))

pd.DataFrame(np.array(psc_data[0][1]['coeff'][1]),columns = c2,index = c11)

Unnamed: 0,"$\alpha_{1j2,1}$","$\alpha_{1j2,2}$","$\alpha_{1j2,3}$","$\alpha_{1j2,4}$","$\alpha_{1j2,5}$","$\alpha_{2j2,1}$","$\alpha_{2j2,2}$","$\alpha_{2j2,3}$","$\alpha_{2j2,4}$","$\alpha_{2j2,5}$","$\alpha_{2j2,6}$","$\alpha_{2j2,7}$","$\alpha_{2j2,8}$","$\alpha_{2j2,9}$","$\alpha_{2j2,10}$"
j=1,-1.0,-1.0,1.0,-1.0,-1.0,1.0,0.0,1.0,1.0,1.0,0.0,1.0,0.0,-1.0,0.0
j=2,-1.0,-1.0,1.0,-1.0,-1.0,1.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,-1.0,1.0
j=3,-1.0,-1.0,1.0,-1.0,-1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,0.0,0.0,1.0
j=4,-1.0,-1.0,1.0,-1.0,-1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,-1.0,1.0
j=5,-1.0,-1.0,1.0,-1.0,-1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,-1.0,1.0
j=6,-1.0,-1.0,1.0,-1.0,-1.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0,-1.0,0.0
j=7,-1.0,-1.0,1.0,-1.0,-1.0,1.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,-1.0,1.0


<h3> Error term variance covariance matrix </h3>

In [21]:
pd.DataFrame(np.array(psc_data[0][1]['V_err']))

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,18,19,20,21,22,23,24,25,26,27
0,1.0,0.8,0.36,-0.1,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.8,1.0,0.8,0.36,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.36,0.8,1.0,0.8,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,-0.1,0.36,0.8,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,1.0,0.8,0.36,-0.1,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.8,1.0,0.8,0.36,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.36,0.8,1.0,0.8,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,-0.1,0.36,0.8,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.8,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.8,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


<h3> Instruments variance covariance matrix </h3>

In [22]:
pd.DataFrame(np.array(psc_data[0][1]['var_inst']))

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,1.0,0.5,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.5,1.0,0.5,0.25,0.0,0.0,0.0,0.0,0.0,0.0
2,0.25,0.5,1.0,0.5,0.25,0.0,0.0,0.0,0.0,0.0
3,0.0,0.25,0.5,1.0,0.5,0.25,0.0,0.0,0.0,0.0
4,0.0,0.0,0.25,0.5,1.0,0.5,0.25,0.0,0.0,0.0
5,0.0,0.0,0.0,0.25,0.5,1.0,0.5,0.25,0.0,0.0
6,0.0,0.0,0.0,0.0,0.25,0.5,1.0,0.5,0.25,0.0
7,0.0,0.0,0.0,0.0,0.0,0.25,0.5,1.0,0.5,0.25
8,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.5,1.0,0.5
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.5,1.0
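
`np.random.multivariate_normal` expects a symmetric positive semidefinite covariance matrix, so it can be worth checking banded matrices like the one above before sampling. The sketch below rebuilds the three per-block matrices from the `ex_vpro`, `inst_vpro`, and `err_vpro` values in the input dictionary and inspects their smallest eigenvalue. The helpers `banded_cov` and `is_psd` and the tolerance are hypothetical choices for illustration.

```python
import numpy as np

def banded_cov(rho, n):
    """Unit-variance matrix with rho[k-1] on the k-th off-diagonals (illustrative helper)."""
    cov = np.eye(n)
    for k, r in enumerate(rho, start=1):
        cov += r * (np.eye(n, k=k) + np.eye(n, k=-k))
    return cov

def is_psd(mat, tol=1e-10):
    """True if the smallest eigenvalue of the symmetric matrix mat is >= -tol."""
    return bool(np.linalg.eigvalsh(mat).min() >= -tol)

# Values taken from the input dictionary used in this notebook
checks = {
    'ex':   is_psd(banded_cov([0.5], 5)),
    'inst': is_psd(banded_cov([0.5, 0.25], 10)),
    'err':  is_psd(banded_cov([0.8, 0.36, -0.1], 4)),
}
print(checks)
```

Since a block diagonal matrix has the eigenvalues of its blocks, checking one block per matrix is enough for $CV_{er}$ and $CV_{ex}$.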


<h3> Exogenous regressors variance covariance </h3>

In [23]:
pd.DataFrame(np.array(psc_data[0][1]['V_ex']))

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,25,26,27,28,29,30,31,32,33,34
0,1.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.5,1.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.5,1.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.5,1.0,0.5,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.5,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,0.0,0.0,1.0,0.5,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.5,1.0,0.5,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.0,0.0,0.0,0.0,0.5,1.0,0.5,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,1.0,0.5,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


<h3> Long panel of primary regression variables </h3>

In [24]:
pd.DataFrame(np.array(psc_data[1][0]['prim_df']), columns = psc_data[0][1]['Dlng_nms'])

Unnamed: 0,crs,t,Y,"Z1,1","Z1,2","Z1,3","Z2,1","Z2,2","Z2,3","Z2,4","Z2,5"
0,1.0,1.0,-0.164165,-0.512439,-2.202741,-0.258346,-0.022826,-1.454673,-0.407353,0.566428,0.376526
1,1.0,2.0,3.937760,1.392265,0.521072,3.162920,-1.456039,-0.718717,0.127975,1.515899,3.540657
2,1.0,3.0,-7.759968,-5.372650,3.827823,0.672529,-0.256226,1.390624,0.776912,-1.035782,0.329270
3,1.0,4.0,-3.999583,-4.401955,2.480635,1.921170,-1.501450,-1.838082,0.120065,-0.816123,-1.586829
4,1.0,5.0,-9.670890,-0.289849,2.829142,-4.660386,0.085701,-0.904156,-0.562774,-0.775189,-0.149669
5,1.0,6.0,-3.579452,2.369516,2.128705,-4.390401,2.758192,0.848919,-0.010476,-0.457581,-2.162283
6,1.0,7.0,-4.367964,-2.122236,-0.900394,-3.094680,-0.301203,-2.092995,-0.552938,-0.588507,-2.235244
7,1.0,8.0,7.105436,1.766791,-1.058030,1.512129,-0.156095,1.255831,1.172862,0.878984,-0.630256
8,1.0,9.0,4.038604,0.829312,-7.729740,-1.159560,0.986409,-1.055829,-0.329977,0.485673,1.010046
9,1.0,10.0,-11.320540,-3.003408,9.077074,2.078871,-1.868040,-2.727573,-0.743259,-1.183419,-0.154607


<h3> Instruments data frame </h3>

In [25]:
pd.DataFrame(np.array(psc_data[1][0]['inst_df']), columns = psc_data[0][1]['Dins_nms'])

Unnamed: 0,t,W1,W2,W3,W4,W5,W6,W7,W8,W9,W10
0,1.0,0.605205,1.543611,0.214294,-0.809703,-0.271784,-1.34519,0.238141,-0.120604,1.147101,0.424202
1,2.0,1.575857,0.346206,0.080949,1.153969,1.04437,0.827144,0.199513,0.073282,-0.426958,-0.268021
2,3.0,0.301724,0.516205,1.198642,0.136219,2.697098,2.347897,0.774978,0.963853,1.367559,1.500667
3,4.0,-0.135455,-0.658351,-0.033491,-0.197652,-1.045303,-0.685588,0.422785,1.088755,0.562654,-0.031826
4,5.0,-1.50065,-0.57097,0.702371,1.198086,0.887042,1.588492,1.672423,1.739138,-0.436682,0.471256
5,6.0,-0.281071,0.938036,2.245216,0.644382,0.506365,-1.226878,-0.329669,-0.50575,-0.869855,-1.370422
6,7.0,-1.195927,-1.332841,-0.360229,-1.31082,-2.22793,-0.031466,1.294809,1.794366,0.400962,0.367887
7,8.0,-0.957192,-0.81636,-0.272083,0.05678,0.745042,1.377872,-0.830702,-0.447916,-0.50577,-1.779491
8,9.0,0.8586,-0.580743,-1.102091,-1.408718,0.160056,0.1376,-2.167706,-0.664217,-0.315888,-0.064515
9,10.0,1.067016,1.961137,1.532942,0.652726,0.088371,0.754161,1.807024,0.880663,1.346149,1.75689


<h3> Error term data frame </h3>

In [26]:
pd.DataFrame(np.array(psc_data[1][0]['err_df']), columns = psc_data[0][1]['Derr_nms'])

Unnamed: 0,crs,t,V1,V2,V3,e
0,1,1.0,-0.16581463483088216,-1.1589860014086462,-1.7367776618205755,-1.7615294568403332
1,1,2.0,-1.8970784085362569,-1.2067180266653301,-0.18182852705725028,0.31905878316088154
2,1,3.0,0.010298308858982221,-0.2623037803835718,-0.1339118589399639,0.2630886833132368
3,1,4.0,-2.419875787140163,-1.8301454633461345,-1.1040849958251535,-0.3521717223030937
4,1,5.0,-1.0703985886310527,-1.7473509076155842,-2.0635508031738543,-1.8389105987672065
5,1,6.0,-0.7732447928948968,-0.5286503007204844,-0.5617896790081498,-0.23576713391439874
6,1,7.0,-0.5943762656260728,-1.3643462793829049,-1.3143815445642926,-1.4593252465639477
7,1,8.0,0.423806881955599,-0.13004130301334355,-0.511882270542019,0.020181818578112597
8,1,9.0,-1.3327965473311016,-2.62949311924534,-2.938477135542809,-2.1242545399116466
9,1,10.0,0.3412377960213818,0.0847624777925029,-0.5589661476933472,-1.1738434527905106
