Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add White's test (heteroskedasticity) #206

Merged
merged 12 commits into from
Jul 7, 2020
Merged

add White's test (heteroskedasticity) #206

merged 12 commits into from
Jul 7, 2020

Conversation

PaulSoderlind
Copy link
Contributor

This PR adds White's (and Breusch-Pagan's) test for heteroskedasticity in a regression. It is one of the most commonly applied diagnostic tests (along with various tests for autocorrelation which already are in the package). The test gives the same result as bptest in R.

Closes issue #199

@azev77
Copy link

azev77 commented May 27, 2020

Thanks this looks great! It might be nice to explicitly define BreuschPaganTest in case someone looking for it doesn't realize it's a special case of White Test.
Something that reuses code like:

BreuschPaganTest(X, e) = WhiteTest(X, e, :Linear)

@PaulSoderlind
Copy link
Contributor Author

@azev77 Thanks for the feedback. One way of doing that is to

(1) add the function as you suggest, export it
(2) output a string from WhiteTest() that describes the test (eg. TestDescription = "Breusch-Pagan's test for heteroskedasticity (linear)" depending on the TestType
(3) which is used for testname(t::WhiteTest) = t.TestDescription

Sounds OK?

@azev77
Copy link

azev77 commented May 27, 2020

Sounds good

@PaulSoderlind
Copy link
Contributor Author

added Breusch-Pagan test and more precise testname(t::WhiteTest)

@azev77
Copy link

azev77 commented May 27, 2020

Is population_param_of_interest() standard?
I like estimand().
I’m not recommending a modification, just discussing for the future...

@PaulSoderlind
Copy link
Contributor Author

@azev77 Thanks. I just tried to follow what appears to be the default in this package.

Another thing: it would be nice to make this package and CovarianceMatrices.jl communicate better. The easiest way is to make something like gragusa/CovarianceMatrices.jl#49 happen. Please drop me a line in case you are interested in contributing.

src/white.jl Outdated

if TestType == :Linear
z = X
TestDescription = string("Breusch-Pagan's test for heteroskedasticity (linear)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to call string function here, string is any double-quoted text. Same below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. These are left-overs from another attempt. The fix will be in the next push.

@azev77
Copy link

azev77 commented Jun 2, 2020

@PaulSoderlind
check out a recent paper (Romano & Wolf, JE, 2017)
"Therefore, it is tempting to decide based on the data which route to take: OLS or WLS. Specifically,we suggest applying a test for conditional heteroskedasticity. Several such tests exists, the mostpopular ones being the tests of Breusch and Pagan (1979) and White (1980); also see Koenker(1981) and Koenker and Bassett (1982). If the null hypothesis of conditional homoskedasticity itnot rejected by such a test, use the OLS estimator; otherwise, use the WLS estimator. We call theresulting estimator theadaptive least squares(ALS) estimator. The motivation is as follows. Underconditional homoskedasticity, the ALS estimator will be equal to the WLS estimator with a smallprobability only (roughly equal to the nominal size of the test). Therefore, in this case, ALS isexpected to be more efficient than WLS in finite samples, though still less efficient than OLS."

@azev77
Copy link

azev77 commented Jun 3, 2020

@PaulSoderlind
in your code you regress e2 on z=[X, X.^2]
I've seen some regress e2 on z=[y_hat, y_hat^2

Also, in theory these are just nested hypothesis tests, so in principal Wald/LR/LM can be used.

@PaulSoderlind
Copy link
Contributor Author

@azev77 Yes, there are lots of versions of these tests. For instance, what I called the BP test is really the BP/Koenker (1981) test.

The code above can be used to perform some of the other versions applied in other packages: just specify X as desired to have

  1. just some of the original regressors
  2. the predicted values (y_hat) instead of the original regressors.

Maybe we should add this information to the documentation of the function?

As you point out, these are restricted versions, sometimes applied save dfs/increase power. I believe Stata has some interesting routines for this, and even through on a Bonferroni approach for multiple testing. I have Julia code for Bonferroni/Holm so that could be done here - at some point in time.

@PaulSoderlind
Copy link
Contributor Author

@wildart I added a comment along the lines of the comment by @azev77.

Good to go?

@wildart
Copy link
Contributor

wildart commented Jun 6, 2020

You forgot to add the test file to test/runtests.jl

@PaulSoderlind
Copy link
Contributor Author

@wildart I added white.jl to test/runtests.jl. I also added BreuschPaganTest to doc and test.

The test passes. I did notice, however, that Pkg.test(HypothesisTests) fails because of show.jl. This appears to be unrelated to my PR.

Copy link
Contributor

@wildart wildart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wildart
Copy link
Contributor

wildart commented Jun 7, 2020

Disregard nightly builds fails, especially if it doesn't related to your PR.

@PaulSoderlind
Copy link
Contributor Author

It seems the feedback so far is positive. Anything else I can do to make this move forward?

@wildart
Copy link
Contributor

wildart commented Jun 19, 2020

@ararslan @mschauer Can we merge this PR?

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm late to the party, with small design comments.

src/white.jl Show resolved Hide resolved
src/white.jl Outdated
end

"""
WhiteTest(X, e, TestType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestType should use lowercase. Also, maybe better make it a keyword argument

Suggested change
WhiteTest(X, e, TestType)
WhiteTest(X, e; type=:White)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lowercase is fine

keyword seems trickier: it seems to interfere with the dispatch, probably because the struct which is also called WhiteTest (as per the convention in this package). I'll send more information on this in a later email.

src/white.jl Outdated
Compute White's (or Breusch-Pagan's) test for heteroskedasticity.

`X` is a matrix of regressors and `e` is the vector of residuals from the original model.
`TestType` is either `:Linear` for the Breusch-Pagan/Koenker test, `:LinearAndSquares` for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use lowercase :linear and linearandsquares (or linear_and_squares), for consistency with e.g. pvalue's tail argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

src/white.jl Outdated
constant. In some applications, `X` is a subset of the regressors in the original model,
or just the fitted values. This saves degrees of freedom and may give a more powerful test.
The test statistic is T*R2 where R2 is from the regression of `e^2` on the terms
mentioned above. Under the null hypothesis it is distributed as `Chisquare(dof)`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better use the actual name of the type in Distributions.jl:

Suggested change
mentioned above. Under the null hypothesis it is distributed as `Chisquare(dof)`
mentioned above. Under the null hypothesis it is distributed as `Chisq(dof)`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

src/white.jl Outdated
n == length(e) || throw(DimensionMismatch("inputs must have the same length"))

if TestType == :Linear
z = X
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
z = X
z = X

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

src/white.jl Outdated
Comment on lines 73 to 75
for i = 1:K, j = i:K
z[:,vv] = X[:,i].*X[:,j] #eg. x1*x1, x1*x2, x2*x2
vv = vv + 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better avoid unnecessary copies:

Suggested change
for i = 1:K, j = i:K
z[:,vv] = X[:,i].*X[:,j] #eg. x1*x1, x1*x2, x2*x2
vv = vv + 1
@views for i = 1:K, j = i:K
z[:, vv] .= X[:, i] .* X[:, j] #eg. x1*x1, x1*x2, x2*x2
vv += 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, except that I dislike the X[:, j] thing. I would prefer no space

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only things that matters is consistency -- but I don't know which convention is used in this package, if any.

test/white.jl Outdated
Comment on lines 221 to 222
bp_test = BreuschPaganTest(X, e,)
@test pvalue(bp_test) ≈ 0.1287 atol=atol
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bp_test = BreuschPaganTest(X, e,)
@test pvalue(bp_test) 0.1287 atol=atol
bp_test = BreuschPaganTest(X, e,)
@test pvalue(bp_test) 0.1287 atol=atol

src/white.jl Show resolved Hide resolved
@nalimilan
Copy link
Member

Sorry, I don't understand the problem. You need to write z = foo(1.2, B=:xx) when using keyword arguments. z = foo(1.2, :xx) calls the default constructor, which you can override if needed using an inner constructor (see the manual). But I don't think that's the case here?

src/white.jl Outdated
Comment on lines 28 to 30
dof::Int #degrees of freedom in test
LM::Float64 #test statistic, distributed as Chisq(dof)
TestDescription::String #short description of the test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dof::Int #degrees of freedom in test
LM::Float64 #test statistic, distributed as Chisq(dof)
TestDescription::String #short description of the test
dof::Int # degrees of freedom in test
lm::Float64 # test statistic, distributed as Chisq(dof)
descr::String # short description of the test

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and what does LM stand for?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought LM = Lagrange Multiplier test statistic?
(the only other context is LM = Linear Model)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, Lagrange Multiplier test statistic. Perhaps we could use LMstat or lmstat instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked around in the lit: and it's LM everywhere. I'll stick to that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though better use lowercase lm, in Julia uppercase is used for types.

@PaulSoderlind
Copy link
Contributor Author

@nalimilan I have made the changes you requested and make an update of the PR. And, yes, you are right about the keyword argument.

@PaulSoderlind
Copy link
Contributor Author

@nalimilan and @azev77

I have some 2nd thoughts about one of latest design changes: X should really by an AbstractMatrix (not an AbstractArray) and we should check whether it actually includes a (non-zero) constant column. Motivation: the test really needs X to have a constant and at least one more column. If not, the results are not valid (so a silent failure).

To test the X matrix: (comments welcome):

constant_cols = mapslices(col->all(diff(col).==0),X,dims=1)   #loops over columns, test if constant
if !any(X[1,vec(constant_cols)] .!= 0)          #test if any constant column is non-zero
  throw(ArgumentError("one of the colums of X must be a non-zero constant"))
end

This clearly allocates memory, but just one column at a time.

@PaulSoderlind
Copy link
Contributor Author

I have now implemented (almost all of) the suggestions by @nalimilan limilan. In addition, I have added checks on X having at least two columns and including a non-zero constant column (or else the test can silently fail). Tests for these checks have been added.

@wildart good to go?

src/white.jl Outdated
end

"""
WhiteTest(X, e; testtype = :White)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to repeat "test"?

Suggested change
WhiteTest(X, e; testtype = :White)
WhiteTest(X, e; type = :White)


Compute White's (or Breusch-Pagan's) test for heteroskedasticity.

`X` is a matrix of regressors and `e` is the vector of residuals from the original model.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you say that observations are in rows and/or that variables are in columns? There are different conventions and it can be confusing.

src/white.jl Outdated
* [White's test on Wikipedia](https://en.wikipedia.org/wiki/White_test)
"""
function WhiteTest(X::AbstractMatrix{<:Real}, e::AbstractVector{<:Real}; testtype = :White)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

src/white.jl Outdated
Comment on lines 67 to 68
intercept_cols = mapslices(col->all(diff(col).==0) && (first(col) != 0),X,dims=1)
any(intercept_cols) || throw(ArgumentError("one of the colums of X must be a non-zero constant"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will avoid lots of allocations (I hope I got it right but you see the idea):

Suggested change
intercept_cols = mapslices(col->all(diff(col).==0) && (first(col) != 0),X,dims=1)
any(intercept_cols) || throw(ArgumentError("one of the colums of X must be a non-zero constant"))
intercept_col = any(eachcol(X)) do col
first(col) != 0 && all(==(first(col)), col)
end
intercept_col || throw(ArgumentError("one of the colums of X must be a non-zero constant"))

src/white.jl Outdated
Compute Breusch-Pagan's test for heteroskedasticity.

`X` is a matrix of regressors from the original model and `e` the vector of residuals.
See `WhiteTest` for further details.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
See `WhiteTest` for further details.
This is equivalent to `WhiteTest(X, e, testtype=:linear)`.
See [`WhiteTest`](@ref) for further details.

test/white.jl Outdated

@test_throws ArgumentError WhiteTest(rand(4,2), rand(4))

bp_test = BreuschPaganTest(X, e,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bp_test = BreuschPaganTest(X, e,)
bp_test = BreuschPaganTest(X, e)

test/white.jl Outdated

bp_test = BreuschPaganTest(X, e,)
@test pvalue(bp_test) ≈ 0.1287 atol=atol

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for show?

src/white.jl Outdated
Comment on lines 28 to 30
dof::Int #degrees of freedom in test
LM::Float64 #test statistic, distributed as Chisq(dof)
TestDescription::String #short description of the test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though better use lowercase lm, in Julia uppercase is used for types.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. I just have a few suggestions.

@PaulSoderlind
Copy link
Contributor Author

@nalimilan Thanks for the suggestions. I believe I implemented all of them. Yes, your intercept_col is a better idea than my old code. I changed slightly to
intercept_col = any(col -> first(col) != 0 && all(==(first(col)), col),eachcol(X))

However, I now notice travis test failures on Julia 1.0 (passes the 1.3 test). Because of eachcol?

@nalimilan
Copy link
Member

nalimilan commented Jul 3, 2020

However, I now notice travis test failures on Julia 1.0 (passes the 1.3 test). Because of eachcol?

Ah, yes, you'll need to add a dependency on Compat 2.2.0.

test/white.jl Outdated
@test pvalue(w_test) ≈ 0.2774 atol=atol

w_test = WhiteTest(X, e)
@test pvalue(w_test) ≈ 0.3458 atol=atol

show(IOBuffer(), w_test)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I should have been more specific: this should also check the contents of the IOBuffer using e.g. String(take!(buf)) == """...""".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nalimilan Regarding the eachcol and Compat. I don't know whether the maintainers like adding another dep. Perhaps this old-fashioned approach could do the trick until support for < Julia 1.1 is dropped?

intercept_col = false
for i = 1:K
    col = view(X,:,i)
    intercept_col = first(col) != 0 && all(==(first(col)), col)
    intercept_col && break
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nalimilan Regarding testing the IOBuffer, I could do something like

buffer = IOBuffer()
show(buffer, w_test)
str = String(take!(buffer))
@test str == refstr

where refstr is a predfined string with the expected output.

But, I see a problem: the expected output contains point estimate: 5.612418635833716. That's a lot of digits, and strictly requiring that they all match is somewhat in conflict with the other tests that use isapprox. Is there a way around this (other than always rounding the printed results?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use $value to interpolate the exact value in the string.

Regarding Compat, that's always a tradeoff, but I'd say it's a small and common enough dependency that it's OK. Adding version checks everywhere is precisely what Compat allows to avoid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest push includes

  1. a test for show. (I did't find anything similar in this package, so I am not entirely sure what the expectation is.)
  2. a loop for testing if X has an intercept term. Not so pretty, but compatible with Julia 1.0.5 and actually pretty fast.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, just a few details and it should be ready.

test/white.jl Outdated
@test w_pval ≈ 0.3458 atol=atol

refstr = """
White's (or Breusch-Pagan's) test for heteroskedasticity
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to add a four-space indentation here without breaking the test.

src/white.jl Outdated
@@ -64,7 +64,13 @@ function WhiteTest(X::AbstractMatrix{<:Real}, e::AbstractVector{<:Real}; type =

K >= 2 || throw(ArgumentError("X must have >= 2 columns"))

intercept_col = any(col -> first(col) != 0 && all(==(first(col)), col),eachcol(X))
# intercept_col = any(col -> first(col) != 0 && all(==(first(col)), col),eachcol(X))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# intercept_col = any(col -> first(col) != 0 && all(==(first(col)), col),eachcol(X))

src/white.jl Outdated
@@ -102,11 +108,11 @@ BreuschPaganTest(X, e) = WhiteTest(X, e, type = :linear)

testname(t::WhiteTest) = "White's (or Breusch-Pagan's) test for heteroskedasticity"

population_param_of_interest(t::WhiteTest) = ("T*R2", 0, t.lm)
population_param_of_interest(t::WhiteTest) = ("T*R2", 0, round(t.lm,digits=4))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we don't round value in other tests, so better don't round here either. It would probably make sense to apply some rounding when printing, but that should be applied to all tests at the same time, in another PR.

Copy link
Contributor Author

@PaulSoderlind PaulSoderlind Jul 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I have skipped the rounding, except for testing pvalue() which has a built-in round(z,digits=4). I also fixed the indentation of that """...""" string.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mschauer mschauer merged commit b4e2775 into JuliaStats:master Jul 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants