Authors: Young Jun Lee and Daniel Wilhelm
This project provides the Stata command dgmtest, which implements the test for significance by Delgado and Manteiga (2001) and can be used to test for the presence of measurement error as described in Wilhelm (2018) and Lee and Wilhelm (2018).
Files contained in this package:
- The file dgmtest.ado contains the dgmtest command.
- The file dgmtest.sthlp contains the Stata help file for the dgmtest command.
- The files example_DGM2001.ado and simul_DGM2001.do contain the code to replicate the simulations in Delgado and Manteiga (2001).
- The files example_Wilhelm2018.ado and simul_Wilhelm2018.do contain the code to replicate the simulations in Wilhelm (2018).
- The file example.do contains the simple simulation example shown below.
Installation:
- Download the package.
- Change into the directory containing this package.
- Use the command dgmtest as described below.
The command dgmtest tests the null hypothesis
H0: E[Y | X, W, Z] = E[Y | X, W]
against the alternative that the null does not hold, where
- Y is a scalar dependent variable
- X and W are vectors of explanatory variables
- Z is a vector of explanatory variables
The vector of explanatory variables, W, may contain elements that enter the conditional expectation in a linear, additively separable fashion. For example, decompose W=(W1,W2) where W1 enters nonseparably and W2 enters in a linear, additively separable fashion,
E[Y | X, W, Z] = f(X,W1,Z) + pi*W2
where f is some function and pi is a row vector of the same dimension as W2. When variables W2 are present, we apply the test in Delgado and Manteiga (2001) after replacing Y with (Y - pihat*W2), where pihat is the estimator of pi from Robinson (1988).
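To see why subtracting the linear part leaves the hypothesis unchanged, the step can be written out as follows. The symbol \tilde{Y} is notation introduced here for illustration only and does not appear in the package; in practice the unknown pi is replaced by pihat.

```latex
\begin{align*}
\tilde{Y} &:= Y - \pi W_2, \\
E[\tilde{Y} \mid X, W, Z] &= E[Y \mid X, W, Z] - \pi W_2 = f(X, W_1, Z), \\
H_0 &\iff E[\tilde{Y} \mid X, W, Z] = E[\tilde{Y} \mid X, W].
\end{align*}
```

The last line holds because pi*W2 is a function of W alone, so it can be subtracted from both sides of H0 without changing either conditioning set.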
Syntax:
dgmtest depvar expvar [if] [in] [, qz(integer) qw2(integer) teststat(string) kernel(string) bootdist(string) bw(real) bootnum(integer) ngrid(integer) qgrid(real)]
where
- depvar is the outcome variable Y.
- expvar is a list of variables containing all elements of X, W, and Z. The order of variables in the list should be: X, W, Z.
The options are as follows:
- qz is the dimension of Z (default = 1).
- qw2 is the dimension of W2 (default = 0).
- teststat is the type of test statistic to be used: Cramer-von Mises (CvM, default) or Kolmogorov-Smirnov (KS).
- kernel is the kernel function: biweight, epanechnikov (default), epan2, epan4, normal, rectangle, triangular.
- bw is the bandwidth (default = n^(-1/(3q)), a rule of thumb, where n is the sample size and q the dimension of X1).
- bootnum is the number of bootstrap samples for the computation of the test's critical value (default = 500).
- bootdist is the distribution of the bootstrap multiplier variable: mammen (default), rademacher, uniform.
- ngrid is the number of equally spaced grid points used to compute the supremum of the KS statistic, if that statistic is chosen via the option teststat. The default is 0, which means that the sample serves as the grid.
- qgrid is a number between 0 and 1 that defines the min and max values of the grid in the previous option. The min value is the qgrid-quantile and the max value is the (1-qgrid)-quantile. The default is 0, in which case the grid ranges from the min to the max value in the sample.
If options are left unspecified, the command runs on the default settings.
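As a rough sketch of how several options might be combined, the call below requests the KS statistic on a 50-point grid between the 5% and 95% sample quantiles, with 1000 bootstrap samples and Rademacher multipliers. The variable names y, x, and z, the seed, and the data-generating lines are hypothetical and only make the snippet self-contained. Option values are spelled as listed above; the exact capitalization the command accepts is an assumption. The snippet also assumes dgmtest.ado is in the current directory or on the adopath.

```stata
// Hypothetical illustrative data (not part of the package)
clear
set seed 12345
set obs 200
generate x = runiform()
generate z = x + 0.5*runiform()
generate y = x^2 + 0.5*runiform()

// KS statistic, 50 grid points between the 0.05- and 0.95-quantiles,
// 1000 bootstrap samples, Rademacher bootstrap multipliers
dgmtest y x z, teststat(KS) ngrid(50) qgrid(0.05) bootnum(1000) bootdist(rademacher)
```

Options that are not listed in the call (here qz, qw2, kernel, and bw) stay at their defaults.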
Wilhelm (2018) shows that, under some conditions, the null hypothesis H0 is equivalent to the hypothesis of no measurement error in X. In this context, the variable Z must be excluded from the outcome equation. For example, it could be a second measurement or an instrumental variable. See Wilhelm (2018), Lee and Wilhelm (2018), and the examples below for more details.
Examples:
Generate simulated data:
set obs 200
// true regressor
generate Xstar = runiform()
// measurement error in X
generate etaX = runiform()
// mismeasured regressor
generate X1 = Xstar + 0.5*etaX
// additively linear control variable
generate X2 = runiform()
// measurement error in Z
generate etaZ = runiform()
// second measurement of true regressor
generate Z = Xstar + 0.5*etaZ
// regression error
generate epsilon = runiform()
We generate an outcome in two different ways, in a regression with and without additively separable, linear controls:
// outcome equation without controls
generate Y1 = Xstar^2 + 0.2*Xstar + 0.5*epsilon
// outcome equation with controls
generate Y2 = Xstar^2 + 0.2*Xstar + 0.5*X2 + 0.5*epsilon
Perform the test using default options:
// perform the test of the hypothesis of no measurement error in X1
dgmtest Y1 X1 Z
dgmtest Y2 X1 X2 Z, qw2(1)
Perform the test, choosing the triangular kernel function:
// perform the test of the hypothesis of no measurement error in X1
dgmtest Y1 X1 Z, kernel(triangular)
dgmtest Y2 X1 X2 Z, qw2(1) kernel(triangular)
Wilhelm, D. (2018), "Testing for the Presence of Measurement Error", CeMMAP Working Paper CWP45/18