In [1]:
## Preamble: Package Loading
import numpy as np
import ipywidgets as ipw
from IPython.display import display
import matplotlib.pyplot as plt
from matplotlib import gridspec
import pandas as pd
import json
import kernel as kr
import psc_sumdisp as psd 

<h2> Summary </h2>

The following notebook contains the results of a Monte Carlo exercise conducted on the estimator detailed in 'psc.ipynb' and 'psc_proposal.pdf', using data sets generated by 'psc_dgp.ipynb' (see that notebook for details of the DGP). 

Important features of each of the following trials are presented here:

* In all data sets the endogenous variables $Z_1$ have been generated by secondary equations of panel fixed effects type, corresponding to sections 3.3 and 3.4 of 'psc_dgp.ipynb'. 


* All estimates have been generated with the knowledge that the secondary equations are panel type (i.e. the estimation of the secondary equations is properly specified). 


* No subset selection (lasso/SCAD) has been used to generate the following results; this will come later. 


* The number of data sets used for each component of each trial is 'nds = 1000'.

<h3> Variable Description Table </h3>

A number of variables are used below, here are their descriptions. Refer back to 'psc.ipynb' or 'psc_dgp.ipynb' for more details.

Variable Name | Description
--|--
k_H | Kernel number used for H function estimation
c_H | Plug-in bandwidth constant for H function estimation
k_mvd | Kernel number used for multivariate d>2 density estimation
c_mvd | Plug-in bandwidth constant for multivariate d>2 density estimation
k_uvd | Kernel number used for bivariate density estimation
c_uvd | Plug-in bandwidth used for bivariate density estimation
dep_nm | Variable name of the dependent variable
en_nm | Variable names of each endogenous variable
ex_nm | Variable names of each exogenous variable
in_nm | Variable names of instruments relevant to each cross section
err_vpro | Vector of covariances used to construct the error cov matrix
ex_vpro | Vector of covariances used to construct the exog variable cov matrix
inst_vpro | Vector of covariances used to construct the instrument cov matrix
frc | Indicator for whether the functional form of the control function is forced
input_filename | Filename of the data set used to generate the results
kwnsub | Indicator for whether the subset of instruments relevant to each cross section is known
n_end | Number of endogenous variables
n_exo | Number of exogenous variables
ncs | Number of cross sections
nds | Number of DGP data sets
ntp | Number of time periods
orcl | Indicator for whether residuals $V$ are observed (=1) or not
r_seed | Random number generator seed used to generate the data set
sec_pan | Indicator for whether the secondary eqn data is panel or not
c_inst | Number of instruments relevant to each cross section
t_inst | Total number of instruments
inc | List of instruments relevant to at least one cross section
tin | Variable name of the time period index
cin | Variable name of the cross section index

<h2>Trial Set 1: Varying the Number of Time Periods </h2> 

Here we examine the sampling distribution of $\hat{\beta}_1, \hat{\alpha}_{1}$ and $\hat{\alpha}_{2}$ as the number of time periods $T$ increases, i.e. for $T \in \{30,50,70\}$, while holding the following constant (among others shown below).

* Number of Cross Sections: 5


* Number of Endogenous Regressors: 2


* Number of Exogenous Regressors: 2


* Total Number of Instruments: 5


* Number of Instruments Relevant to Each Cross Section: 3


* Set of instruments relevant to each cross section is known

<h3> Trial Set 1: Data Loading and Organization </h3> 

The following cell extracts and organizes all relevant information from the results data sets whose file names are listed here. 

In [2]:
# Results data sets included in trial #1
inpt_filenames0 = ['pscout_6_12_1954.json' ,'pscout_6_12_1220.json' , 'pscout_6_12_1799.json']
# Legend labels
line_nms0 = ['n=30', 'n=50' ,'n=70']

res_out0 = [psd.psc_load(inpt_filenames0[i]) for i in range(len(inpt_filenames0))]
estin_dcts0 = [res_out0[i][0] for i in range(len(inpt_filenames0))]
dgp_sum_filenames0 = [ estin_dcts0[i]['input_filename'].replace('pscdata','pscsum')
                      for i in range(len(inpt_filenames0))]
dgp_dicts0 = [psd.pscsum_load(dgp_sum_filenames0[i]) 
             for i in range(len(dgp_sum_filenames0))]
dgpin_dcts0 =  [dgp_dicts0[i][0] for i in range(len(inpt_filenames0))]
merged_dcts0 = [{**estin_dcts0[i],**dgpin_dcts0[i]} for i in range(len(inpt_filenames0))]
true_bcoeffs0 = [dgp_dicts0[i][1] for i in range(len(inpt_filenames0))]
true_acoeffs0 = [dgp_dicts0[i][2] for i in range(len(inpt_filenames0))]
bcoeff0  = [res_out0[i][1] for i in range(len(inpt_filenames0))]
acoeff0  = [res_out0[i][3] for i in range(len(inpt_filenames0))]
btables0 = [res_out0[i][2] for i in range(len(inpt_filenames0))]
atables0 = [res_out0[i][4] for i in range(len(inpt_filenames0))]
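
The same loading pattern is repeated for every trial set below, so it could be wrapped in one helper. The following is only a sketch: the name `load_trial` and the stub loaders are hypothetical, and the tuple positions are assumed from the indexing used in the cell above (`[0]` to `[4]` for the results file, `[0]` to `[2]` for the DGP summary). The loaders are passed in as arguments so that the sketch runs without the project modules.

```python
def load_trial(filenames, load_results, load_dgp_summary):
    """Gather the objects used in one trial set into a dict of lists."""
    keys = ("merged", "true_b", "true_a", "bcoeff", "btable", "acoeff", "atable")
    out = {k: [] for k in keys}
    for name in filenames:
        res = load_results(name)
        est_in = res[0]
        # The DGP summary file name mirrors the data file name.
        sum_name = est_in["input_filename"].replace("pscdata", "pscsum")
        dgp = load_dgp_summary(sum_name)
        # DGP inputs take precedence over estimator inputs on shared keys.
        out["merged"].append({**est_in, **dgp[0]})
        out["true_b"].append(dgp[1])
        out["true_a"].append(dgp[2])
        out["bcoeff"].append(res[1])
        out["btable"].append(res[2])
        out["acoeff"].append(res[3])
        out["atable"].append(res[4])
    return out


# Stub loaders with made-up contents, standing in for psd.psc_load / psd.pscsum_load.
def fake_results(name):
    return ({"input_filename": "pscdata_demo.json", "kwnsub": 1}, "b", "bt", "a", "at")


def fake_summary(name):
    return ({"ntp": 30, "summary_file": name}, "true_b", "true_a")


trial = load_trial(["pscout_demo.json"], fake_results, fake_summary)
print(trial["merged"][0])
```

With the real modules the call would presumably look like `load_trial(inpt_filenames0, psd.psc_load, psd.pscsum_load)`, assuming both loaders take a single file name as they do above.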

<h3> Trial Set 1: Merged DGP and Estimator Function Input Dictionary Comparison </h3> 

Here I have merged the dictionary used to generate the underlying data set (its file name appears below) with the dictionary used to produce the estimates based on that data. 

The slider below selects which merged dictionary is summarized, indexed by the position of its file name in 'inpt_filenames0' above. 

In accordance with the trial description, the only differences that should exist are the number of time periods (ntp) and the file name of the data set used to generate the results. 
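
The claim that only a couple of keys differ can also be checked programmatically rather than by eye. The following is a minimal sketch, assuming the merged dictionaries hold plain JSON types (numbers, strings, lists) that can be compared with `!=`; the function name `differing_keys` and the example values are hypothetical.

```python
def differing_keys(dicts):
    """Return the sorted keys whose values are not identical across all dicts."""
    missing = object()  # sentinel for keys absent from some dictionary
    all_keys = set().union(*(d.keys() for d in dicts))
    changed = []
    for key in all_keys:
        values = [d.get(key, missing) for d in dicts]
        if any(v != values[0] for v in values[1:]):
            changed.append(key)
    return sorted(changed)


# Made-up inputs in the spirit of Trial Set 1.
a = {"ntp": 30, "ncs": 5, "input_filename": "data_a.json"}
b = {"ntp": 50, "ncs": 5, "input_filename": "data_b.json"}
changed = differing_keys([a, b])
print(changed)  # ['input_filename', 'ntp']
```

Applied to `merged_dcts0`, a result other than the keys named above would point to an unintended difference between the runs.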

In [3]:
psd.indict_dsp(merged_dcts0,1)

<h3> Trial Set 1: True Secondary Equation Coefficients Comparison </h3> 

Here I interactively display the coefficient vectors $\alpha_{1jd}$ used to generate each data set (one row per cross section and equation), selected by the position of its file name in 'inpt_filenames0' above. They should be identical across data sets. 

**Note:** 

1.) Since 'sec_pan = 1' above, the secondary equations are panel type, so all non-zero coefficients in a column should be identical. 

2.) A zero coefficient in the following matrix means that the instrument it multiplies is not relevant to that cross section. 

3.) In accordance with the description above they should be identical across results data sets.
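
Notes 1 and 2 can be verified mechanically on a coefficient matrix. The sketch below assumes the matrix can be converted to a NumPy array with one column per instrument; the function names and the example matrix are hypothetical and not taken from the actual DGP output.

```python
import numpy as np


def panel_type_columns(coeffs, tol=1e-12):
    """For each column, True if all of its non-zero entries share one value."""
    coeffs = np.asarray(coeffs, dtype=float)
    ok = []
    for col in coeffs.T:
        nonzero = col[np.abs(col) > tol]
        ok.append(nonzero.size == 0 or bool(np.allclose(nonzero, nonzero[0])))
    return np.array(ok)


def relevance_counts(coeffs, tol=1e-12):
    """Number of rows in which each instrument has a non-zero coefficient."""
    return (np.abs(np.asarray(coeffs, dtype=float)) > tol).sum(axis=0)


# Made-up matrix: rows are cross sections, columns are instruments.
example = np.array([[1.5, 0.0, 2.0],
                    [1.5, 0.7, 0.0],
                    [1.5, 0.7, 2.0]])
print(panel_type_columns(example))
print(relevance_counts(example))
```

The relevance counts are also what the comments on the coefficient estimates further down refer to when they compare variances across instruments.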


In [4]:
psd.indict_dsp(true_acoeffs0,2)

<h3> Trial Set 1: Secondary Function Coefficient Estimates </h3>

Here I interactively show the sampling distribution of the elements of $\hat{\alpha}_{dj}$.  

In [5]:
display(psd.cfs_dsp(acoeff0,atables0,2,5,line_nms0))

<h3> Trial Set 1: Comments on Secondary Function Coefficient Estimates </h3>
    
* The changes in the properties of the sampling distribution of each coefficient are in line with what we would expect from a consistent estimator: the sample variance and mean squared error decrease as the number of time periods increases ($ntp \rightarrow \infty$).  


* Another feature evident from the above is that the variance of each coefficient is inversely related to the number of cross sections to which the instrument it multiplies is relevant. For example, $\hat{\alpha}_{d1,1}$ and $\hat{\alpha}_{d1,2}$ have the smallest variance since their instruments are relevant to all cross sections, followed by $\hat{\alpha}_{d2,1}$ (relevant to 4 cross sections), then $\hat{\alpha}_{d2,4}$ and $\hat{\alpha}_{d2,5}$ (relevant to 3 cross sections), and lastly $\hat{\alpha}_{d2,3}$ (relevant to only 2 cross sections).
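
The pattern described in the first bullet can be reproduced in a toy setting. The sketch below is not the estimator from 'psc.ipynb'; it assumes a single-regressor linear model without intercept or fixed effects, estimated by OLS, and all names and parameter values are made up for illustration.

```python
import numpy as np


def simulate_slope_estimates(n_periods, n_datasets=1000, slope=2.0, seed=0):
    """Return one OLS slope estimate per simulated data set."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_datasets, n_periods))
    y = slope * x + rng.normal(size=(n_datasets, n_periods))
    # OLS slope without intercept, computed row by row.
    return (x * y).sum(axis=1) / (x * x).sum(axis=1)


true_slope = 2.0
summary = {}
for T in (30, 50, 70):
    est = simulate_slope_estimates(T, slope=true_slope)
    summary[T] = {
        "variance": est.var(),
        "mse": np.mean((est - true_slope) ** 2),
    }
    print(T, summary[T])
```

In this toy model both the sample variance and the MSE shrink as the number of periods grows, which is the same qualitative behavior reported above.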

<h3> Trial Set 1: True Primary Equations Coefficients Comparison </h3>

Here I interactively display the coefficient vector $\beta_1$ used to generate each data set, selected by the position of its file name in 'inpt_filenames0' above. They should be identical across data sets. 

In [6]:
psd.indict_dsp(true_bcoeffs0,1)

<h3> Trial Set 1: Primary Function Coefficient Estimates </h3>

Here I show the sampling distribution of the elements of $\hat{\beta}_1$.  

In [7]:
display(psd.cfs_dsp(bcoeff0,btables0,1,12,line_nms0))

<h3> Trial Set 1: Comments on Primary Function Coefficient Estimates </h3>

1.) The sampling distributions behave in the way that we would expect for a consistent estimator, meaning that the sample variance and mean squared error of all coefficients decrease as the number of time periods increases.  

2.) The sample variances of the coefficients multiplying the endogenous regressors are much larger than those of the coefficients multiplying the exogenous regressors. Given the DGP this makes sense: the exogenous regressors are not correlated with the error term $\varepsilon$, so their coefficients are identified without the need to estimate $V$.
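
For reference, the three summary measures discussed in these comments are linked by the identity MSE = variance + bias². The sketch below assumes the standard definitions (with the population form of the variance); the exact formulas used inside `psd.cfs_dsp` are not shown in this notebook, and the numbers here are made up.

```python
import numpy as np


def sampling_summary(estimates, true_value):
    """Bias, variance (ddof=0) and mean squared error of a set of estimates."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    variance = estimates.var()
    mse = np.mean((estimates - true_value) ** 2)
    return {"bias": bias, "variance": variance, "mse": mse}


stats = sampling_summary([0.9, 1.1, 1.3, 1.1], true_value=1.0)
print(stats)
# The decomposition holds up to floating point error.
print(np.isclose(stats["mse"], stats["variance"] + stats["bias"] ** 2))
```

This decomposition is why a coefficient can show a smaller variance and still a larger MSE when its bias grows, a pattern that comes up in the later trial sets.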

<h2> Trial Set 2: Varying the number of Cross Sections </h2>

Here we examine the sampling distribution of $\hat{\beta}_1, \hat{\beta}_2, \hat{\alpha}_{1}$ and $\hat{\alpha}_{2}$ as the number of cross sections $ncs$ (labeled 'ncr' in the plot legends) increases, i.e. for $ncs \in \{5,10,15,20\}$, while holding the following constant (among others shown below).

* Number of Time Periods: 50


* Number of Endogenous Regressors: 2


* Number of Exogenous Regressors: 2


* Total Number of Instruments: 5


* Number of Instruments Relevant to Each Cross Section: 3


* Set of instruments relevant to each cross section is known

<h3> Trial Set 2: Data Loading and Organization </h3> 


The following cell extracts and organizes all relevant information from the results data sets whose file names are listed here.

In [8]:
inpt_filenames1 = ['pscout_6_12_1220.json' ,'pscout_6_13_1914.json'
                   ,'pscout_6_13_1498.json','pscout_6_13_1227.json' ]
line_nms1 = ['ncr = 5','ncr = 10', 'ncr = 15', 'ncr = 20']

res_out1 = [psd.psc_load(inpt_filenames1[i]) for i in range(len(inpt_filenames1))]
estin_dcts1 = [res_out1[i][0] for i in range(len(inpt_filenames1))]
dgp_sum_filenames1 = [ estin_dcts1[i]['input_filename'].replace('pscdata','pscsum')
                      for i in range(len(inpt_filenames1))]
dgp_dicts1 = [psd.pscsum_load(dgp_sum_filenames1[i]) 
             for i in range(len(dgp_sum_filenames1))]
dgpin_dcts1 =  [dgp_dicts1[i][0] for i in range(len(inpt_filenames1))]
# Merge this trial set's estimator inputs with its own DGP inputs
merged_dcts1 = [{**estin_dcts1[i],**dgpin_dcts1[i]} for i in range(len(inpt_filenames1))]
true_bcoeffs1 = [dgp_dicts1[i][1] for i in range(len(inpt_filenames1))]
true_acoeffs1 = [dgp_dicts1[i][2] for i in range(len(inpt_filenames1))]
bcoeff1  = [res_out1[i][1] for i in range(len(inpt_filenames1))]
acoeff1  = [res_out1[i][3] for i in range(len(inpt_filenames1))]
btables1 = [res_out1[i][2] for i in range(len(inpt_filenames1))]
atables1 = [res_out1[i][4] for i in range(len(inpt_filenames1))]

<h3> Trial Set 2: DGP and Estimator Function Input Dictionary Comparison </h3> 

Here I have merged the dictionary used to generate the underlying data set (its file name appears below) with the dictionary used to produce the estimates based on that data. 

The slider below selects which merged dictionary is summarized, indexed by the position of its file name in 'inpt_filenames1' above. 

In accordance with the trial description, the only differences that should exist are the number of cross sections (ncs), the vector of instrument names relevant to each cross section (in_nm), and the file name of the data set used to generate the results.

In [9]:
psd.indict_dsp(merged_dcts1,1)

<h3> Trial Set 2: True Secondary Equation Coefficients Comparison </h3> 
 
Here I interactively display the coefficient vectors $\alpha_{dj}$ used to generate each data set (one row per cross section and equation), selected by the position of its file name in 'inpt_filenames1' above. Because the number of cross sections differs across data sets, only the rows they share should be identical. 

**Note:** 

1.) Since 'sec_pan = 1' above, the secondary equations are panel type, so all non-zero coefficients in a column should be identical. 

2.) A zero coefficient in the following matrix means that the instrument it multiplies is not relevant to that cross section. 

3.) In accordance with the description above, this df should have a number of rows corresponding to the number of cross sections in each data set.

4.) The intersection of the coefficient dfs across data sets should be identical, meaning the jth row of coefficients should be the same wherever it appears in the following.  
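
Note 4 can be checked directly if the coefficient tables are available as pandas DataFrames with matching row labels and columns. The following is a sketch under that assumption; the function name, the row labels and the values are hypothetical.

```python
import pandas as pd


def shared_rows_identical(frames):
    """True if every row label common to all frames has the same values in each."""
    common = frames[0].index
    for frame in frames[1:]:
        common = common.intersection(frame.index)
    reference = frames[0].loc[common]
    return all(frame.loc[common].equals(reference) for frame in frames[1:])


# Made-up tables: the larger data set adds one cross section.
small = pd.DataFrame({"inst1": [1.5, 1.5], "inst2": [0.0, 0.7]},
                     index=["crs1", "crs2"])
large = pd.DataFrame({"inst1": [1.5, 1.5, 1.5], "inst2": [0.0, 0.7, 0.7]},
                     index=["crs1", "crs2", "crs3"])
print(shared_rows_identical([small, large]))
```

Note that `DataFrame.equals` also requires matching dtypes, so tables loaded with different numeric types would need to be cast first.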

 

In [10]:
psd.indict_dsp(true_acoeffs1,2)

<h3> Trial Set 2: Secondary Function Coefficient Estimates </h3>

Here I interactively show the sampling distribution of the elements of $\hat{\alpha}_{dj}$. 

In [11]:
display(psd.cfs_dsp(acoeff1,atables1,2,8,line_nms1))

<h3> Trial Set 2: Comments on Secondary Function Coefficient Estimates </h3>
    
* The changes in the properties of the sampling distribution of each coefficient are in line with what we would expect from a consistent estimator as $ncs \rightarrow \infty$: the sample variance and mean squared error decrease as the number of cross sections increases.  


* Similar to Trial Set 1, another feature evident from the above is that the variance of each coefficient is inversely related to the number of cross sections to which the instrument it multiplies is relevant. As a result, the variance of each coefficient is weakly decreasing as $ncs$ increases.

<h3> Trial Set 2: True Primary Equations Coefficients Comparison </h3>

Here I interactively display the coefficient vector $\beta_1$ used to generate each data set, selected by the position of its file name in 'inpt_filenames1' above. They should be identical across data sets. 

In [12]:
psd.indict_dsp(true_bcoeffs1,1)

<h3> Trial Set 2: Primary Function Coefficient Estimates </h3>


Here I show the sampling distribution of the elements of $\hat{\beta}_1$.  

In [13]:
display(psd.cfs_dsp(bcoeff1,btables1,1,12,line_nms1))

<h3> Trial Set 2: Comments on Primary Function Coefficient Estimates </h3>

* The variance of each coefficient decreases as $ncs$ increases, the behavior one would expect from a consistent estimator as $ncs \rightarrow \infty$. 


* The bias of $\hat{\beta}_{1,1}$ and $\hat{\beta}_{1,2}$ curiously increases with $ncs$. I don't yet have a good explanation for this. 


* The bias of $\hat{\beta}_{2,1}$ and $\hat{\beta}_{2,2}$, even more curiously, jumps around somewhat as $ncs$ increases. I assume this is because the joint density of the variables in the primary equation changes as more cross sections are added, each with a different set of relevant instruments used to generate its endogenous regressors. 

<h2> Trial Set 3: Varying Whether the Instruments Relevant to Each Cross Section Are Known, t_inst = 5 </h2>

Here we examine the sampling distribution of $\hat{\beta}_1, \hat{\alpha}_{1}$ and $\hat{\alpha}_{2}$ as I vary whether the set of instruments relevant to each cross section is known. If the set is known, the endogenous regressors for each cross section are regressed only on the relevant instruments; if it is unknown, they are regressed on all instruments. This is done while holding the following constant (among others shown below).

* Number of Cross Sections: 5


* Number of Endogenous Regressors: 2


* Number of Exogenous Regressors: 2


* Total Number of Instruments: 5


* Number of Instruments Relevant to Each Cross Section: 3


* Number of time periods: 30
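
To give a feel for what the "unknown subset" case does to a secondary-equation regression, here is a toy simulation. It is only a sketch: it uses plain OLS on one cross section with no fixed effects, and the correlation between instruments (`rho`), the coefficient values and the number of replications are all assumptions made for illustration, not settings from 'psc_dgp.ipynb'.

```python
import numpy as np


def first_coef_variance(n_periods=30, n_datasets=2000, rho=0.8, seed=0):
    """Variance of the first coefficient with the relevant subset known vs unknown."""
    rng = np.random.default_rng(seed)
    n_inst, n_relevant = 5, 3
    # Equicorrelated instruments: ones on the diagonal, rho elsewhere.
    cov = np.full((n_inst, n_inst), rho) + (1.0 - rho) * np.eye(n_inst)
    chol = np.linalg.cholesky(cov)
    alpha = np.array([1.0, 1.0, 1.0, 0.0, 0.0])  # last two instruments irrelevant
    known, unknown = [], []
    for _ in range(n_datasets):
        w = rng.normal(size=(n_periods, n_inst)) @ chol.T
        z = w @ alpha + rng.normal(size=n_periods)
        # Known subset: regress on the three relevant instruments only.
        known.append(np.linalg.lstsq(w[:, :n_relevant], z, rcond=None)[0][0])
        # Unknown subset: regress on all five instruments.
        unknown.append(np.linalg.lstsq(w, z, rcond=None)[0][0])
    return np.var(known), np.var(unknown)


var_known, var_unknown = first_coef_variance()
print(var_known, var_unknown)
```

In this toy setting, including the irrelevant but correlated instruments inflates the variance of the coefficient on a relevant one, which is the textbook effect the comparison below is looking for.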

<h3> Trial Set 3: Data Loading and Organization </h3> 

The following cell extracts and organizes all relevant information from the results data sets whose file names are listed here.

In [14]:
inpt_filenames2 = ['pscout_6_12_1954.json' ,'pscout_6_19_1577.json']
line_nms2 = ['Known Sub','Unknown Sub']

res_out2 = [psd.psc_load(inpt_filenames2[i]) for i in range(len(inpt_filenames2))]
estin_dcts2 = [res_out2[i][0] for i in range(len(inpt_filenames2))]
dgp_sum_filenames2 = [ estin_dcts2[i]['input_filename'].replace('pscdata','pscsum')
                      for i in range(len(inpt_filenames2))]
dgp_dicts2 = [psd.pscsum_load(dgp_sum_filenames2[i]) 
             for i in range(len(dgp_sum_filenames2))]
dgpin_dcts2 =  [dgp_dicts2[i][0] for i in range(len(inpt_filenames2))]
merged_dcts2 = [{**estin_dcts2[i],**dgpin_dcts2[i]} for i in range(len(inpt_filenames2))]
true_bcoeffs2 = [dgp_dicts2[i][1] for i in range(len(inpt_filenames2))]
true_acoeffs2 = [dgp_dicts2[i][2] for i in range(len(inpt_filenames2))]
bcoeff2  = [res_out2[i][1] for i in range(len(inpt_filenames2))]
acoeff2  = [res_out2[i][3] for i in range(len(inpt_filenames2))]
btables2 = [res_out2[i][2] for i in range(len(inpt_filenames2))]
atables2 = [res_out2[i][4] for i in range(len(inpt_filenames2))]

<h3> Trial Set 3: DGP and Estimator Function Input Dictionary Comparison </h3> 

Here I have merged the dictionary used to generate the underlying data set (its file name appears below) with the dictionary used to produce the estimates based on that data. 

The slider below selects which merged dictionary is summarized, indexed by the position of its file name in 'inpt_filenames2' above. 

In accordance with the trial description, the only difference is the value of the indicator for whether the set of relevant instruments for each cross section is known (kwnsub).

In [15]:
psd.indict_dsp(merged_dcts2,1)

<h3> Trial Set 3: True Secondary Equation Coefficients Comparison </h3> 

Here I interactively display the coefficient vectors $\alpha_{1jd}$ used to generate each data set (one row per cross section and equation), selected by the position of its file name in 'inpt_filenames2' above. They should be identical across data sets. 

**Note:** 

1.) Since 'sec_pan = 1' above, the secondary equations are panel type, so all non-zero coefficients in a column should be identical. 

2.) A zero coefficient in the following matrix means that the instrument it multiplies is not relevant to that cross section. 

3.) In accordance with the description above they should be identical across results data sets.

In [16]:
psd.indict_dsp(true_acoeffs2,2)

<h3> Trial Set 3: Secondary Function Coefficient Estimates </h3>

Here I interactively show the sampling distribution of the elements of $\hat{\alpha}_{dj}$. 

In [17]:
display(psd.cfs_dsp(acoeff2,atables2,2,8,line_nms2))

<h3> Trial Set 3: Comments on Secondary Function Coefficient Estimates </h3>
    
* The differences between the sampling distributions of these estimators are what one would expect when, in case 2, you estimate a regression with a number of irrelevant regressors. In almost all cases the variance of each coefficient is substantially larger when the set of relevant regressors is unknown.


* In the one case where the above is not true, i.e. $\hat{\alpha}_{12,3}$ and $\hat{\alpha}_{22,3}$, the variance may be somewhat smaller but the bias is approximately 200 times larger.


* The reason why $\hat{\alpha}_{12,3}$ and $\hat{\alpha}_{22,3}$ behave differently may be that, as the section above shows, the instrument which both multiply is relevant to the fewest cross sections. 


* In all cases the sample mean squared error for each coefficient estimate is far larger in case 2.

<h3> Trial Set 3: True Primary Equations Coefficients Comparison </h3>

Here I interactively display the coefficient vector $\beta_1$ used to generate each data set, selected by the position of its file name in 'inpt_filenames2' above. They should be identical across data sets. 

In [18]:
psd.indict_dsp(true_bcoeffs2,1)

<h3> Trial Set 3: Primary Function Coefficient Estimates </h3>

Here I show the sampling distribution of the elements of $\hat{\beta}_1$.  

In [19]:
display(psd.cfs_dsp(bcoeff2,btables2,1,12,line_nms2))

<h3> Trial Set 3: Comments on Primary Function Coefficient Estimates </h3>

* For the first 3 coefficients, the bias and MSE of each estimator are larger in the case where the subset is unknown. 


* What is curious is that the variance of each estimated coefficient is smaller in case 2, which is perplexing. At the moment I imagine that the presence of irrelevant regressors in the secondary equation has induced a bias-variance trade-off, where unbiasedness is sacrificed for a reduction in the variance, but that doesn't explain what is happening with the last estimated coefficient. Perhaps the bias is an attenuation bias.     


<h2> Trial Set 4: Varying Whether the Instruments Relevant to Each Cross Section Are Known, t_inst = 10 </h2>


Here we examine the sampling distribution of $\hat{\beta}_1, \hat{\alpha}_{1}$ and $\hat{\alpha}_{2}$ as I vary whether the set of instruments relevant to each cross section is known, now for a secondary coefficient matrix that is sparser than in Trial Set 3. I do this by doubling the total number of instruments available while holding the number of instruments relevant to each cross section constant.  

* Number of Cross Sections: 5


* Number of Endogenous Regressors: 2


* Number of Exogenous Regressors: 2


* Total Number of Instruments: 10


* Number of Instruments Relevant to Each Cross Section: 3


* Number of time periods: 30
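
As a quick piece of arithmetic on what "sparser" means here: if every row of the secondary coefficient matrix has exactly 3 non-zero entries (an assumption taken from the settings listed for these two trial sets), the share of zeros per row follows from the total number of instruments. The function name below is hypothetical.

```python
def zero_share(total_instruments, relevant_per_cross_section):
    """Share of zero entries in each row of the secondary coefficient matrix."""
    return 1.0 - relevant_per_cross_section / total_instruments


share_trial3 = zero_share(5, 3)    # Trial Set 3: 2 of 5 entries are zero
share_trial4 = zero_share(10, 3)   # Trial Set 4: 7 of 10 entries are zero
print(round(share_trial3, 2), round(share_trial4, 2))
```

So under this assumption the unknown-subset regressions in this trial set carry 7 irrelevant instruments per cross section instead of 2.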


<h3> Trial Set 4: Data Loading and Organization </h3> 

The following cell extracts and organizes all relevant information from the results data sets whose file names are listed here.

In [20]:
inpt_filenames3 = ['pscout_6_19_1579.json' ,'pscout_6_19_1326.json']
line_nms3 = ['Known Sub','Unknown Sub']

res_out3 = [psd.psc_load(inpt_filenames3[i]) for i in range(len(inpt_filenames3))]
estin_dcts3 = [res_out3[i][0] for i in range(len(inpt_filenames3))]
dgp_sum_filenames3 = [ estin_dcts3[i]['input_filename'].replace('pscdata','pscsum')
                      for i in range(len(inpt_filenames3))]
dgp_dicts3 = [psd.pscsum_load(dgp_sum_filenames3[i]) 
             for i in range(len(dgp_sum_filenames3))]
dgpin_dcts3 =  [dgp_dicts3[i][0] for i in range(len(inpt_filenames3))]
merged_dcts3 = [{**estin_dcts3[i],**dgpin_dcts3[i]} for i in range(len(inpt_filenames3))]
true_bcoeffs3 = [dgp_dicts3[i][1] for i in range(len(inpt_filenames3))]
true_acoeffs3 = [dgp_dicts3[i][2] for i in range(len(inpt_filenames3))]
bcoeff3  = [res_out3[i][1] for i in range(len(inpt_filenames3))]
acoeff3  = [res_out3[i][3] for i in range(len(inpt_filenames3))]
btables3 = [res_out3[i][2] for i in range(len(inpt_filenames3))]
atables3 = [res_out3[i][4] for i in range(len(inpt_filenames3))]

<h3> Trial Set 4: DGP and Estimator Function Input Dictionary Comparison </h3> 

Here I have merged the dictionary used to generate the underlying data set (its file name appears below) with the dictionary used to produce the estimates based on that data. 

The slider below selects which merged dictionary is summarized, indexed by the position of its file name in 'inpt_filenames3' above. 

In accordance with the trial description, the only difference is the value of the indicator for whether the set of relevant instruments for each cross section is known (kwnsub).

In [21]:
psd.indict_dsp(merged_dcts3,1)

<h3> Trial Set 4: True Secondary Equation Coefficients Comparison </h3> 

Here I interactively display the coefficient vectors $\alpha_{1jd}$ used to generate each data set (one row per cross section and equation), selected by the position of its file name in 'inpt_filenames3' above. They should be identical across data sets. 

**Note:** 

1.) Since 'sec_pan = 1' above, the secondary equations are panel type, so all non-zero coefficients in a column should be identical. 

2.) A zero coefficient in the following matrix means that the instrument it multiplies is not relevant to that cross section. 

3.) In accordance with the description above they should be identical across results data sets.


4.) This coefficient matrix is much more sparse than the one shown in trial set 3

In [22]:
psd.indict_dsp(true_acoeffs3,2)

<h3> Trial Set 4: Secondary Function Coefficient Estimates </h3>

Here I interactively show the sampling distribution of the elements of $\hat{\alpha}_{dj}$. 

In [23]:
display(psd.cfs_dsp(acoeff3,atables3,2,8,line_nms3))

<h3> Trial Set 4: Comments on Secondary Function Coefficient Estimates </h3>
    
* For $\hat{\alpha}_{d1,1}$ and $\hat{\alpha}_{d1,2}$ we can see that the bias and variance of each coefficient are smaller in the case where the set is known.  


* For $\hat{\alpha}_{d2,1}$, $\hat{\alpha}_{d2,2}$, and $\hat{\alpha}_{d2,5}$ we can see a pronounced bias-variance trade-off. In fact, given the results shown in Trial Sets 1 and 2, we would expect the variance of these three coefficients to be the largest, since each is relevant to only one cross section. 


* A bias-variance trade-off is evident in each of the other coefficients: the bias of each estimated coefficient in case 2 is at least 100 times larger than in case 1, while the variance is at times smaller. 


<h3> Trial Set 4: True Primary Equations Coefficients Comparison </h3>

Here I interactively display the coefficient vector $\beta_1$ used to generate each data set, selected by the position of its file name in 'inpt_filenames3' above. They should be identical across data sets. 

In [24]:
psd.indict_dsp(true_bcoeffs3,1)

<h3> Trial Set 4: Primary Function Coefficient Estimates </h3>

Here I show the sampling distribution of the elements of $\hat{\beta}_1$. 

In [25]:
display(psd.cfs_dsp(bcoeff3,btables3,1,12,line_nms3))

<h3> Trial Set 4: Comments on Primary Function Coefficient Estimates </h3>

* For the first 2 coefficients, the bias and MSE of each estimator are larger in the case where the subset is unknown. 


* What is curious is that the variance of each estimated coefficient is smaller in case 2, which is perplexing. At the moment I imagine that the presence of irrelevant regressors in the secondary equation has induced a bias-variance trade-off, where unbiasedness is sacrificed for a reduction in the variance.


* Again, what is somewhat interesting is that the estimator remains, in some sense, insensitive to the presence of weak/irrelevant regressors in the secondary equation. 