-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Durbin-Watson test #102
Conversation
src/durbin_watson.jl
Outdated
vector of residuals. Note that the Durbin-Watson test is not valid if `X` | ||
includes a lagged dependent variable. The test statistic is computed as | ||
```math | ||
DW = \frac{\sum_{t=2}^n (e_t - e_{t-1})^2}{\sum_{t=1}^n e_t^2} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to escape the backslashes
6e80b73
to
6b25daa
Compare
I have now added Pan's algorithm (Farebrother, 1980) for the computation of exact p-values in small samples. Tests are comparing output to R. @ararslan It seems like the p-value is computed three times when calling |
src/durbin_watson.jl
Outdated
function pan_algorithm(a::AbstractArray, x::Float64, m::Int, n::Int) | ||
|
||
try # catch case where ν is empty | ||
ν = find(a.>=x)[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
findfirst(ai -> ai >= x, a)
will be more efficient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great as it allows me to get rid of the try-catch construct.
src/durbin_watson.jl
Outdated
pin = pi / (2n) | ||
sum = 0.5 * (k + 1) | ||
sgn = k / n | ||
n2 = n + n -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2n-1
?
src/durbin_watson.jl
Outdated
catch | ||
sum = 1.0 | ||
end | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing a return
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems I don't need one here as it returns the last value automatically. I eliminated the try-catch construct anyways. I have now also added tests to check these corner cases.
src/durbin_watson.jl
Outdated
# p-vales based on Pan's algorithm (see Farebrother, 1980) | ||
|
||
# the following setup is, e.g, described in Durbin and Watson (1971) | ||
A = diagm(-1 * ones(x.n - 1), -1) + diagm(-1 * ones(x.n - 1), 1) +diagm( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can just use -ones(x.n - 1)
rather than multiplying by -1
src/durbin_watson.jl
Outdated
end | ||
|
||
pin = pi / (2n) | ||
sum = 0.5 * (k + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically it's preferred to avoid float literals in computations such as this, since float literals are always Float64
, which will promote smaller float types to Float64
when they're used together in a computation. So in this case it would be sum = (k + 1) / 2
(though also note that it doesn't really matter, since that will promote to Float64
anyway, but the general point still stands 😛). It might be worthwhile to check @code_warntype pan_algorithm(<some args>)
to see whether the function remains type-stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, makes sense. Never thought about that. I had checked for type instability in the algorithm with @code_warntype
before and everything looked ok. Rechecked after making the changes and it's still fine.
src/durbin_watson.jl
Outdated
|
||
export DurbinWatsonTest | ||
|
||
struct DurbinWatsonTest <: HypothesisTest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The package still supports Julia 0.5 but struct
only parses on 0.6 and later. So this should be immutable
until we drop 0.5 support.
Yeah, I could be wrong but I'm pretty sure it's the fault of the general |
eeef5a2
to
f63b98c
Compare
src/durbin_watson.jl
Outdated
|
||
ν = findfirst(ai -> ai >= x, a) | ||
if ν == 0 | ||
return sum = 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't need the sum =
, right?
f63b98c
to
a767e60
Compare
I have opened an issue about the double computation of the p-value in #105. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not intimately familiar with Durbin-Watson but this implementation looks good to me, plus it matches the results from R.
Thanks for the quick and very helpful review Alex! |
Any other comments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's really cool! I would have had a few stylistic comments.
export DurbinWatsonTest | ||
|
||
immutable DurbinWatsonTest <: HypothesisTest | ||
xmat::Array{Float64} # regressor matrix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's a matrix it should be Matrix{Float64}
to use a concrete type.
``` | ||
where `n` is the number of observations. | ||
|
||
By default, the choice of approach to compute p-values depends on the sample size (`p_compute=:ndep`). For small samples (n<100), Pan's algorithm (Farebrother, 1980) is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Break line at 92 chars.
](https://en.wikipedia.org/wiki/Durbin–Watson_statistic) | ||
""" | ||
function DurbinWatsonTest{T<:Real}(xmat::AbstractArray{T}, e::AbstractArray{T}; | ||
p_compute::Symbol = :ndep) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect indentation.
end | ||
|
||
if exact_problem_flag == 1 || (x.p_compute == :ndep && x.n > 100 | ||
) || x.p_compute == :approx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parenthesis should be on previous line.
p_compute::Symbol = :ndep) | ||
|
||
n = length(e) | ||
DW = sum(diff(e) .^2) / sum(e .^2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how much it can matter for performance, but you can use sum(abs2, diff(e))/sum(abs2, e)
to avoid allocating copies. (diff
itself allocates a vector, which could be avoided by using a small loop.)
|
||
""" | ||
function pan_algorithm(a::AbstractArray, x::Float64, m::Int, n::Int) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The convention used in Julia and in JuliaStats packages is not to add blank lines at the top nor at the end of functions.
Thanks Milan! I'll revisit this after the smoke has cleared from the documentation migration. |
Implementation of the Durbin-Watson test for serial correlation in the residuals of a regression model.
WIP as I still have to implement Pan's algorithm (Farebrother, 1980) for the computation of exact p-values in small samples.
Everything else is ready and I would be glad to get feedback.
One question: I tried to include a displayed latex equation in the docstring as described in the documentation (and I understand it will not render the Latex code in the REPL or Juno console) but it does not even display the Latex code correctly.
Thanks as always for your help!