Authors: Young Jun Lee and Daniel Wilhelm
This project provides the Stata command dgmtest, which implements the test for significance by Delgado and Manteiga (2001) and can be used to test for the presence of measurement error as described in Wilhelm (2018) and Lee and Wilhelm (2018).
Files contained in this package:
- The file
- The file dgmtest.sthlp contains the Stata help file for the dgmtest command.
- The files simul_DGM2001.do contain the code to replicate the simulations in Delgado and Manteiga (2001).
- The files simul_Wilhelm2018.do contain the code to replicate the simulations in Wilhelm (2018).
- The file example.do contains the simple simulation example shown below.
- Download the package.
- Change into the directory containing this package.
- Use the command dgmtest as described below.
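The steps above might look as follows in a Stata do-file. This is a minimal sketch: the directory path is a hypothetical placeholder, and it assumes the package files sit directly in that directory, so that Stata finds dgmtest through the current working directory.

```stata
// hypothetical location of the downloaded package; replace with your own path
cd "C:/path/to/dgmtest"

// check that Stata can locate the command from the current directory
which dgmtest

// open the help file shipped with the package
help dgmtest
```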
dgmtest tests the null hypothesis
H0: E[Y | X, W, Z] = E[Y | X, W]
against the alternative that the null does not hold, where
- Y is a scalar dependent variable
- X and W are vectors of explanatory variables
- Z is a vector of explanatory variables
The vector of explanatory variables, W, may contain elements that enter the conditional expectation in a linear, additively separable fashion. For example, decompose W=(W1,W2) where W1 enters nonseparably and W2 enters in a linear, additively separable fashion,
E[Y | X, W, Z] = f(X,W1,Z) + pi*W2
where f is some function and pi is a row vector of the same dimension as W2. When variables W2 are present, we apply the test in Delgado and Manteiga (2001) after replacing Y with (Y - pihat*W2), where pihat is the estimator of pi proposed by Robinson (1988).
dgmtest depvar expvar [if] [in] [, qz(integer) qw2(integer) teststat(string) kernel(string) bootdist(string) bw(real) bootnum(integer) ngrid(integer) qgrid(real)]
depvar is the outcome variable Y.
expvar is a list of variables containing all elements of X, W, and Z. The order of variables in the list should be: X, W, Z.
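To illustrate the required ordering, the following sketch assumes a dataset with hypothetical variables y (outcome), x (a scalar X), w1 (a nonseparable control W1), w2a and w2b (linear controls W2), and z1 and z2 (the elements of Z). It also assumes that, within W, the elements of W2 are listed after those of W1, which is consistent with the decomposition W=(W1,W2) above but should be checked against the help file.

```stata
// hypothetical variable names; the order is X, then W = (W1, W2), then Z
// qw2(2): the two controls w2a and w2b enter linearly
// qz(2):  Z has two elements, z1 and z2
dgmtest y x w1 w2a w2b z1 z2, qz(2) qw2(2)
```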
The options are as follows:
qz is the dimension of Z (default = 1).
qw2 is the dimension of W2 (default = 0).
teststat is the type of test statistic to be used: Cramér-von Mises (CvM, default) or Kolmogorov-Smirnov (KS).
kernel is the kernel function: biweight, epanechnikov (default), epan2, epan4, normal, rectangle, triangular.
bw is the bandwidth (default = n^(-1/(3q)), a rule of thumb, where n is the sample size and q the dimension of X1).
bootnum is the number of bootstrap samples for the computation of the test's critical value (default = 500).
bootdist is the distribution of the bootstrap multiplier variable: mammen (default), rademacher, uniform.
ngrid is the number of equally spaced grid points used to compute the supremum of the KS statistic, if that statistic is chosen via the option teststat. The default is 0, which means that the sample serves as the grid.
qgrid is a number between 0 and 1 that defines the min and max values of the grid in the previous option. The min value is the qgrid-quantile and the max value is the (1-qgrid)-quantile. The default is 0, in which case the grid ranges from the min to the max value in the sample.
If options are left unspecified, the command runs on the default settings.
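As an illustration of how the options combine, the sketch below requests the Kolmogorov-Smirnov statistic on a trimmed grid with a larger number of bootstrap samples. The variable names y, x, and z are hypothetical, the numerical values are arbitrary, and the exact spelling of the string arguments (for example KS) is assumed from the option list above rather than verified against the help file.

```stata
// KS statistic, with the supremum taken over 50 equally spaced grid points
// between the 5% and 95% sample quantiles; 1000 bootstrap samples with
// Rademacher multipliers
dgmtest y x z, teststat(KS) ngrid(50) qgrid(0.05) bootnum(1000) bootdist(rademacher)
```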
Testing for the presence of measurement error
Wilhelm (2018) shows that, under some conditions, the null hypothesis H0 is equivalent to the hypothesis of no measurement error in X. In this context, the variable Z must be excluded from the outcome equation. For example, it could be a second measurement or an instrumental variable. See Wilhelm (2018), Lee and Wilhelm (2018), and the examples below for more details.
Generate explanatory variables
set obs 200
// true regressor
generate Xstar = runiform()
// measurement error in X
generate etaX = runiform()
// mismeasured regressor
generate X1 = Xstar + 0.5*etaX
// additively linear control variable
generate X2 = runiform()
// measurement error in Z
generate etaZ = runiform()
// second measurement of true regressor
generate Z = Xstar + 0.5*etaZ
// regression error
generate epsilon = runiform()
Generate outcome variable
We generate an outcome in two different ways, in a regression with and without additively separable, linear controls:
// outcome equation without controls
generate Y1 = Xstar^2 + 0.2*Xstar + 0.5*epsilon
// outcome equation with controls
generate Y2 = Xstar^2 + 0.2*Xstar + 0.5*X2 + 0.5*epsilon
Perform the test of no measurement error
Perform the test using default options:
// perform the test of the hypothesis of no measurement error in X1
dgmtest Y1 X1 Z
dgmtest Y2 X1 X2 Z, qw2(1)
Perform the test, choosing the triangular kernel function:
// perform the test of the hypothesis of no measurement error in X1
dgmtest Y1 X1 Z, kernel(triangular)
dgmtest Y2 X1 X2 Z, qw2(1) kernel(triangular)
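For comparison, one might also run the test on data in which the null hypothesis holds. The following sketch is not part of the package's example.do; it reuses the design above but sets X1 equal to the true regressor, so that X1 contains no measurement error. The seed value is arbitrary, and whether the seed also governs the bootstrap draws inside dgmtest is an assumption.

```stata
clear
// arbitrary seed so that the generated data are reproducible
set seed 12345
set obs 200
// true regressor, observed without error
generate Xstar = runiform()
generate X1 = Xstar
// second measurement of the true regressor
generate Z = Xstar + 0.5*runiform()
// outcome equation without controls
generate Y1 = Xstar^2 + 0.2*Xstar + 0.5*runiform()
// the null hypothesis of no measurement error in X1 holds in this design
dgmtest Y1 X1 Z
```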