add White's test (heteroskedasticity) #206

PaulSoderlind · 2020-05-26T14:24:38Z

This PR adds White's (and Breusch-Pagan's) test for heteroskedasticity in a regression. It is one of the most commonly applied diagnostic tests (along with various tests for autocorrelation which already are in the package). The test gives the same result as bptest in R.

Closes issue #199

azev77 · 2020-05-27T02:04:43Z

Thanks this looks great! It might be nice to explicitly define BreuschPaganTest in case someone looking for it doesn't realize it's a special case of White Test.
Something that reuses code like:

BreuschPaganTest(X, e) = WhiteTest(X, e, :Linear)

PaulSoderlind · 2020-05-27T11:51:10Z

@azev77 Thanks for the feedback. One way of doing that is to

(1) add the function as you suggest, export it
(2) output a string from WhiteTest() that describes the test (eg. TestDescription = "Breusch-Pagan's test for heteroskedasticity (linear)" depending on the TestType
(3) which is used for testname(t::WhiteTest) = t.TestDescription

Sounds OK?

azev77 · 2020-05-27T15:56:22Z

Sounds good

PaulSoderlind · 2020-05-27T17:45:39Z

added Breusch-Pagan test and more precise testname(t::WhiteTest)

azev77 · 2020-05-27T18:18:20Z

Is population_param_of_interest() standard?
I like estimand().
I’m not recommending a modification, just discussing for the future...

PaulSoderlind · 2020-05-27T18:34:01Z

@azev77 Thanks. I just tried to follow what appears to be the default in this package.

Another thing: it would be nice to make this package and CovarianceMatrices.jl communicate better. The easiest way is to make something like gragusa/CovarianceMatrices.jl#49 happen. Please drop me a line in case you are interested in contributing.

wildart · 2020-05-28T23:17:13Z

src/white.jl

+
+    if TestType == :Linear
+        z  = X
+        TestDescription = string("Breusch-Pagan's test for heteroskedasticity (linear)")


There is no need to call string function here, string is any double-quoted text. Same below.

Indeed. These are left-overs from another attempt. The fix will be in the next push.

azev77 · 2020-06-02T20:10:15Z

@PaulSoderlind
check out a recent paper (Romano & Wolf, JE, 2017)
"Therefore, it is tempting to decide based on the data which route to take: OLS or WLS. Specifically,we suggest applying a test for conditional heteroskedasticity. Several such tests exists, the mostpopular ones being the tests of Breusch and Pagan (1979) and White (1980); also see Koenker(1981) and Koenker and Bassett (1982). If the null hypothesis of conditional homoskedasticity itnot rejected by such a test, use the OLS estimator; otherwise, use the WLS estimator. We call theresulting estimator theadaptive least squares(ALS) estimator. The motivation is as follows. Underconditional homoskedasticity, the ALS estimator will be equal to the WLS estimator with a smallprobability only (roughly equal to the nominal size of the test). Therefore, in this case, ALS isexpected to be more efficient than WLS in finite samples, though still less efficient than OLS."

azev77 · 2020-06-03T03:16:55Z

@PaulSoderlind
in your code you regress e2 on z=[X, X.^2]
I've seen some regress e2 on z=[y_hat, y_hat^2

Also, in theory these are just nested hypothesis tests, so in principal Wald/LR/LM can be used.

PaulSoderlind · 2020-06-03T08:45:32Z

@azev77 Yes, there are lots of versions of these tests. For instance, what I called the BP test is really the BP/Koenker (1981) test.

The code above can be used to perform some of the other versions applied in other packages: just specify X as desired to have

just some of the original regressors
the predicted values (y_hat) instead of the original regressors.

Maybe we should add this information to the documentation of the function?

As you point out, these are restricted versions, sometimes applied save dfs/increase power. I believe Stata has some interesting routines for this, and even through on a Bonferroni approach for multiple testing. I have Julia code for Bonferroni/Holm so that could be done here - at some point in time.

PaulSoderlind · 2020-06-05T13:12:38Z

@wildart I added a comment along the lines of the comment by @azev77.

Good to go?

wildart · 2020-06-06T08:14:06Z

You forgot to add the test file to test/runtests.jl

PaulSoderlind · 2020-06-07T08:45:38Z

@wildart I added white.jl to test/runtests.jl. I also added BreuschPaganTest to doc and test.

The test passes. I did notice, however, that Pkg.test(HypothesisTests) fails because of show.jl. This appears to be unrelated to my PR.

wildart

LGTM

wildart · 2020-06-07T13:33:36Z

Disregard nightly builds fails, especially if it doesn't related to your PR.

PaulSoderlind · 2020-06-19T20:49:02Z

It seems the feedback so far is positive. Anything else I can do to make this move forward?

wildart · 2020-06-19T21:55:15Z

@ararslan @mschauer Can we merge this PR?

nalimilan

Sorry, I'm late to the party, with small design comments.

src/white.jl

nalimilan · 2020-06-20T16:42:28Z

src/white.jl

+end
+
+"""
+    WhiteTest(X, e, TestType)


TestType should use lowercase. Also, maybe better make it a keyword argument

Suggested change

WhiteTest(X, e, TestType)

WhiteTest(X, e; type=:White)

lowercase is fine

keyword seems trickier: it seems to interfere with the dispatch, probably because the struct which is also called WhiteTest (as per the convention in this package). I'll send more information on this in a later email.

nalimilan · 2020-06-20T16:43:36Z

src/white.jl

+Compute White's (or Breusch-Pagan's) test for heteroskedasticity.
+
+`X` is a matrix of regressors and `e` is the vector of residuals from the original model.
+`TestType` is either `:Linear` for the Breusch-Pagan/Koenker test, `:LinearAndSquares` for


I'd use lowercase :linear and linearandsquares (or linear_and_squares), for consistency with e.g. pvalue's tail argument.

nalimilan · 2020-06-20T16:45:11Z

src/white.jl

+constant. In some applications, `X` is a subset of the regressors in the original model,
+or just the fitted values. This saves degrees of freedom and may give a more powerful test.
+The test statistic is T*R2 where R2 is from the regression of `e^2` on the terms
+mentioned above. Under the null hypothesis it is distributed as `Chisquare(dof)`


Better use the actual name of the type in Distributions.jl:

Suggested change

mentioned above. Under the null hypothesis it is distributed as `Chisquare(dof)`

mentioned above. Under the null hypothesis it is distributed as `Chisq(dof)`

nalimilan · 2020-06-20T16:45:51Z

src/white.jl

+    n == length(e) || throw(DimensionMismatch("inputs must have the same length"))
+
+    if TestType == :Linear
+        z  = X


Suggested change

z = X

z = X

nalimilan · 2020-06-20T16:49:30Z

src/white.jl

+        for i = 1:K, j = i:K
+            z[:,vv] = X[:,i].*X[:,j]             #eg. x1*x1, x1*x2, x2*x2
+            vv      = vv + 1


Better avoid unnecessary copies:

Suggested change

for i = 1:K, j = i:K

z[:,vv] = X[:,i].*X[:,j] #eg. x1*x1, x1*x2, x2*x2

vv = vv + 1

@views for i = 1:K, j = i:K

z[:, vv] .= X[:, i] .* X[:, j] #eg. x1*x1, x1*x2, x2*x2

vv += 1

OK, except that I dislike the X[:, j] thing. I would prefer no space

The only things that matters is consistency -- but I don't know which convention is used in this package, if any.

nalimilan · 2020-06-20T16:50:28Z

test/white.jl

+   bp_test = BreuschPaganTest(X, e,)
+   @test pvalue(bp_test) ≈ 0.1287 atol=atol


Suggested change

bp_test = BreuschPaganTest(X, e,)

@test pvalue(bp_test) ≈ 0.1287 atol=atol

bp_test = BreuschPaganTest(X, e,)

@test pvalue(bp_test) ≈ 0.1287 atol=atol

src/white.jl

nalimilan · 2020-06-23T07:38:03Z

Sorry, I don't understand the problem. You need to write z = foo(1.2, B=:xx) when using keyword arguments. z = foo(1.2, :xx) calls the default constructor, which you can override if needed using an inner constructor (see the manual). But I don't think that's the case here?

nalimilan · 2020-06-23T07:40:15Z

src/white.jl

+    dof::Int                 #degrees of freedom in test
+    LM::Float64              #test statistic, distributed as Chisq(dof)
+    TestDescription::String  #short description of the test


Suggested change

dof::Int #degrees of freedom in test

LM::Float64 #test statistic, distributed as Chisq(dof)

TestDescription::String #short description of the test

dof::Int # degrees of freedom in test

lm::Float64 # test statistic, distributed as Chisq(dof)

descr::String # short description of the test

Oh, and what does LM stand for?

I thought LM = Lagrange Multiplier test statistic?
(the only other context is LM = Linear Model)

yes, Lagrange Multiplier test statistic. Perhaps we could use LMstat or lmstat instead

I looked around in the lit: and it's LM everywhere. I'll stick to that.

Though better use lowercase lm, in Julia uppercase is used for types.

PaulSoderlind · 2020-06-23T11:08:39Z

@nalimilan I have made the changes you requested and make an update of the PR. And, yes, you are right about the keyword argument.

PaulSoderlind · 2020-06-23T16:12:50Z

@nalimilan and @azev77

I have some 2nd thoughts about one of latest design changes: X should really by an AbstractMatrix (not an AbstractArray) and we should check whether it actually includes a (non-zero) constant column. Motivation: the test really needs X to have a constant and at least one more column. If not, the results are not valid (so a silent failure).

To test the X matrix: (comments welcome):

constant_cols = mapslices(col->all(diff(col).==0),X,dims=1)   #loops over columns, test if constant
if !any(X[1,vec(constant_cols)] .!= 0)          #test if any constant column is non-zero
  throw(ArgumentError("one of the colums of X must be a non-zero constant"))
end

This clearly allocates memory, but just one column at a time.

PaulSoderlind · 2020-06-30T08:14:26Z

I have now implemented (almost all of) the suggestions by @nalimilan limilan. In addition, I have added checks on X having at least two columns and including a non-zero constant column (or else the test can silently fail). Tests for these checks have been added.

@wildart good to go?

nalimilan · 2020-07-01T16:07:42Z

src/white.jl

+end
+
+"""
+    WhiteTest(X, e; testtype = :White)


No need to repeat "test"?

Suggested change

WhiteTest(X, e; testtype = :White)

WhiteTest(X, e; type = :White)

nalimilan · 2020-07-01T16:08:50Z

src/white.jl

+
+Compute White's (or Breusch-Pagan's) test for heteroskedasticity.
+
+`X` is a matrix of regressors and `e` is the vector of residuals from the original model.


Could you say that observations are in rows and/or that variables are in columns? There are different conventions and it can be confusing.

nalimilan · 2020-07-01T16:08:59Z

src/white.jl

+  * [White's test on Wikipedia](https://en.wikipedia.org/wiki/White_test)
+"""
+function WhiteTest(X::AbstractMatrix{<:Real}, e::AbstractVector{<:Real}; testtype = :White)
+


Suggested change

nalimilan · 2020-07-01T16:13:05Z

src/white.jl

+    intercept_cols = mapslices(col->all(diff(col).==0) && (first(col) != 0),X,dims=1)
+    any(intercept_cols) || throw(ArgumentError("one of the colums of X must be a non-zero constant"))


This will avoid lots of allocations (I hope I got it right but you see the idea):

Suggested change

intercept_cols = mapslices(col->all(diff(col).==0) && (first(col) != 0),X,dims=1)

any(intercept_cols) || throw(ArgumentError("one of the colums of X must be a non-zero constant"))

intercept_col = any(eachcol(X)) do col

first(col) != 0 && all(==(first(col)), col)

end

intercept_col || throw(ArgumentError("one of the colums of X must be a non-zero constant"))

nalimilan · 2020-07-01T16:14:10Z

src/white.jl

+Compute Breusch-Pagan's test for heteroskedasticity.
+
+`X` is a matrix of regressors from the original model and `e` the vector of residuals.
+See `WhiteTest` for further details.


Suggested change

See `WhiteTest` for further details.

This is equivalent to `WhiteTest(X, e, testtype=:linear)`.

See [`WhiteTest`](@ref) for further details.

nalimilan · 2020-07-01T16:14:53Z

test/white.jl

+
+    @test_throws ArgumentError WhiteTest(rand(4,2), rand(4))
+
+    bp_test = BreuschPaganTest(X, e,)


Suggested change

bp_test = BreuschPaganTest(X, e,)

bp_test = BreuschPaganTest(X, e)

nalimilan · 2020-07-01T16:16:15Z

test/white.jl

+
+    bp_test = BreuschPaganTest(X, e,)
+    @test pvalue(bp_test) ≈ 0.1287 atol=atol
+


Can you add a test for show?

nalimilan · 2020-07-01T16:17:50Z

src/white.jl

+    dof::Int                 #degrees of freedom in test
+    LM::Float64              #test statistic, distributed as Chisq(dof)
+    TestDescription::String  #short description of the test


Though better use lowercase lm, in Julia uppercase is used for types.

nalimilan

Sorry for the delay. I just have a few suggestions.

PaulSoderlind · 2020-07-02T11:37:15Z

@nalimilan Thanks for the suggestions. I believe I implemented all of them. Yes, your intercept_col is a better idea than my old code. I changed slightly to
intercept_col = any(col -> first(col) != 0 && all(==(first(col)), col),eachcol(X))

However, I now notice travis test failures on Julia 1.0 (passes the 1.3 test). Because of eachcol?

nalimilan · 2020-07-03T07:21:14Z

However, I now notice travis test failures on Julia 1.0 (passes the 1.3 test). Because of eachcol?

Ah, yes, you'll need to add a dependency on Compat 2.2.0.

nalimilan · 2020-07-03T07:22:36Z

test/white.jl

    @test pvalue(w_test) ≈ 0.2774 atol=atol

    w_test = WhiteTest(X, e)
    @test pvalue(w_test) ≈ 0.3458 atol=atol
-
    show(IOBuffer(), w_test)


Sorry I should have been more specific: this should also check the contents of the IOBuffer using e.g. String(take!(buf)) == """...""".

@nalimilan Regarding the eachcol and Compat. I don't know whether the maintainers like adding another dep. Perhaps this old-fashioned approach could do the trick until support for < Julia 1.1 is dropped?

intercept_col = false for i = 1:K col = view(X,:,i) intercept_col = first(col) != 0 && all(==(first(col)), col) intercept_col && break end

@nalimilan Regarding testing the IOBuffer, I could do something like

buffer = IOBuffer() show(buffer, w_test) str = String(take!(buffer)) @test str == refstr

where refstr is a predfined string with the expected output.

But, I see a problem: the expected output contains point estimate: 5.612418635833716. That's a lot of digits, and strictly requiring that they all match is somewhat in conflict with the other tests that use isapprox. Is there a way around this (other than always rounding the printed results?)

You could use $value to interpolate the exact value in the string.

Regarding Compat, that's always a tradeoff, but I'd say it's a small and common enough dependency that it's OK. Adding version checks everywhere is precisely what Compat allows to avoid.

The latest push includes

a test for show. (I did't find anything similar in this package, so I am not entirely sure what the expectation is.)

a loop for testing if X has an intercept term. Not so pretty, but compatible with Julia 1.0.5 and actually pretty fast.

nalimilan

Thanks, just a few details and it should be ready.

nalimilan · 2020-07-06T14:16:00Z

test/white.jl

+    @test w_pval ≈ 0.3458 atol=atol
+
+    refstr = """
+White's (or Breusch-Pagan's) test for heteroskedasticity


You should be able to add a four-space indentation here without breaking the test.

nalimilan · 2020-07-06T14:17:25Z

src/white.jl

@@ -64,7 +64,13 @@ function WhiteTest(X::AbstractMatrix{<:Real}, e::AbstractVector{<:Real}; type =

    K >= 2 || throw(ArgumentError("X must have >= 2 columns"))

-    intercept_col = any(col -> first(col) != 0 && all(==(first(col)), col),eachcol(X))
+   # intercept_col = any(col -> first(col) != 0 && all(==(first(col)), col),eachcol(X))


Suggested change

# intercept_col = any(col -> first(col) != 0 && all(==(first(col)), col),eachcol(X))

nalimilan · 2020-07-06T14:20:02Z

src/white.jl

@@ -102,11 +108,11 @@ BreuschPaganTest(X, e) = WhiteTest(X, e, type = :linear)

 testname(t::WhiteTest) = "White's (or Breusch-Pagan's) test for heteroskedasticity"

-population_param_of_interest(t::WhiteTest) = ("T*R2", 0, t.lm)
+population_param_of_interest(t::WhiteTest) = ("T*R2", 0, round(t.lm,digits=4))


Currently we don't round value in other tests, so better don't round here either. It would probably make sense to apply some rounding when printing, but that should be applied to all tests at the same time, in another PR.

Thanks. I have skipped the rounding, except for testing pvalue() which has a built-in round(z,digits=4). I also fixed the indentation of that """...""" string.

nalimilan

Thanks!

add White's test (heteroskedasticity)

9b60be0

PaulSoderlind mentioned this pull request May 26, 2020

Test for heteroskedasticity (White's etc) #199

Closed

add Breusch-Pagan test

4d7269a

wildart reviewed May 28, 2020

View reviewed changes

drop string("...")

511ae0d

better description, added a reference

725eb26

PaulSoderlind added 3 commits June 7, 2020 10:40

added BreuschPaganTest to doc

6587dad

added BreuschPaganTest

b996a97

add white.jl

525d886

wildart approved these changes Jun 7, 2020

View reviewed changes

nalimilan reviewed Jun 20, 2020

View reviewed changes

nalimilan reviewed Jun 23, 2020

View reviewed changes

fixed design comments; now keyword

e7e46f8

added check of X and tests

fb1c6bf

nalimilan reviewed Jul 1, 2020

View reviewed changes

implemented suggestions by nalimilan

7a94201

nalimilan reviewed Jul 3, 2020

View reviewed changes

reverted to loop, added test for show

75f9de6

nalimilan reviewed Jul 6, 2020

View reviewed changes

less rounding, better indentation

333510c

nalimilan approved these changes Jul 6, 2020

View reviewed changes

mschauer merged commit b4e2775 into JuliaStats:master Jul 7, 2020

	mentioned above. Under the null hypothesis it is distributed as `Chisquare(dof)`
	mentioned above. Under the null hypothesis it is distributed as `Chisq(dof)`

		bp_test = BreuschPaganTest(X, e,)
		@test pvalue(bp_test) ≈ 0.1287 atol=atol

	WhiteTest(X, e; testtype = :White)
	WhiteTest(X, e; type = :White)


		Compute White's (or Breusch-Pagan's) test for heteroskedasticity.

		`X` is a matrix of regressors and `e` is the vector of residuals from the original model.

		intercept_cols = mapslices(col->all(diff(col).==0) && (first(col) != 0),X,dims=1)
		any(intercept_cols) \|\| throw(ArgumentError("one of the colums of X must be a non-zero constant"))

-See `WhiteTest` for further details.
+This is equivalent to `WhiteTest(X, e, testtype=:linear)`.
+See [`WhiteTest`](@ref) for further details.


		@test_throws ArgumentError WhiteTest(rand(4,2), rand(4))

		bp_test = BreuschPaganTest(X, e,)

	bp_test = BreuschPaganTest(X, e,)
	bp_test = BreuschPaganTest(X, e)

add White's test (heteroskedasticity) #206

add White's test (heteroskedasticity) #206

Conversation

PaulSoderlind commented May 26, 2020

azev77 commented May 27, 2020

PaulSoderlind commented May 27, 2020

azev77 commented May 27, 2020

PaulSoderlind commented May 27, 2020

azev77 commented May 27, 2020 • edited Loading

PaulSoderlind commented May 27, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

azev77 commented Jun 2, 2020

azev77 commented Jun 3, 2020 • edited Loading

PaulSoderlind commented Jun 3, 2020

PaulSoderlind commented Jun 5, 2020

wildart commented Jun 6, 2020

PaulSoderlind commented Jun 7, 2020

wildart left a comment

Choose a reason for hiding this comment

wildart commented Jun 7, 2020

PaulSoderlind commented Jun 19, 2020

wildart commented Jun 19, 2020

nalimilan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nalimilan commented Jun 23, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PaulSoderlind commented Jun 23, 2020

PaulSoderlind commented Jun 23, 2020

PaulSoderlind commented Jun 30, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nalimilan left a comment

Choose a reason for hiding this comment

PaulSoderlind commented Jul 2, 2020

nalimilan commented Jul 3, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nalimilan left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PaulSoderlind Jul 6, 2020 • edited Loading

Choose a reason for hiding this comment

nalimilan left a comment

Choose a reason for hiding this comment

azev77 commented May 27, 2020 •

edited

Loading

azev77 commented Jun 3, 2020 •

edited

Loading

nalimilan commented Jul 3, 2020 •

edited

Loading

PaulSoderlind Jul 6, 2020 •

edited

Loading