Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Durbin-Watson test #102

Merged
merged 2 commits into from
Aug 22, 2017
Merged

Conversation

BenjaminBorn
Copy link
Contributor

Implementation of the Durbin-Watson test for serial correlation in the residuals of a regression model.

WIP as I still have to implement Pan's algorithm (Farebrother, 1980) for the computation of exact p-values in small samples.

Everything else is ready and I would be glad to get feedback.

One question: I tried to include a displayed latex equation in the docstring as described in the documentation (and I understand it will not render the Latex code in the REPL or Juno console) but it does not even display the Latex code correctly.

Thanks as always for your help!

vector of residuals. Note that the Durbin-Watson test is not valid if `X`
includes a lagged dependent variable. The test statistic is computed as
```math
DW = \frac{\sum_{t=2}^n (e_t - e_{t-1})^2}{\sum_{t=1}^n e_t^2}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to escape the backslashes

@BenjaminBorn BenjaminBorn changed the title WIP: Durbin-Watson test Add Durbin-Watson test Aug 13, 2017
@BenjaminBorn
Copy link
Contributor Author

I have now added Pan's algorithm (Farebrother, 1980) for the computation of exact p-values in small samples. Tests are comparing output to R.

@ararslan It seems like the p-value is computed three times when calling t = DurbinWatsonTest(X, resid; p_compute = :exact). You can see this when running the last example of the test file (line 193). The warning is printed three times. Is this a general problem of the package that we should investigate or did I do something stupid?

function pan_algorithm(a::AbstractArray, x::Float64, m::Int, n::Int)

try # catch case where ν is empty
ν = find(a.>=x)[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

findfirst(ai -> ai >= x, a)

will be more efficient

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great as it allows me to get rid of the try-catch construct.

pin = pi / (2n)
sum = 0.5 * (k + 1)
sgn = k / n
n2 = n + n -1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2n-1?

catch
sum = 1.0
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a return here?

Copy link
Contributor Author

@BenjaminBorn BenjaminBorn Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems I don't need one here as it returns the last value automatically. I eliminated the try-catch construct anyways. I have now also added tests to check these corner cases.

# p-vales based on Pan's algorithm (see Farebrother, 1980)

# the following setup is, e.g, described in Durbin and Watson (1971)
A = diagm(-1 * ones(x.n - 1), -1) + diagm(-1 * ones(x.n - 1), 1) +diagm(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can just use -ones(x.n - 1) rather than multiplying by -1

end

pin = pi / (2n)
sum = 0.5 * (k + 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically it's preferred to avoid float literals in computations such as this, since float literals are always Float64, which will promote smaller float types to Float64 when they're used together in a computation. So in this case it would be sum = (k + 1) / 2 (though also note that it doesn't really matter, since that will promote to Float64 anyway, but the general point still stands 😛). It might be worthwhile to check @code_warntype pan_algorithm(<some args>) to see whether the function remains type-stable.

Copy link
Contributor Author

@BenjaminBorn BenjaminBorn Aug 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, makes sense. Never thought about that. I had checked for type instability in the algorithm with @code_warntype before and everything looked ok. Rechecked after making the changes and it's still fine.


export DurbinWatsonTest

struct DurbinWatsonTest <: HypothesisTest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The package still supports Julia 0.5 but struct only parses on 0.6 and later. So this should be immutable until we drop 0.5 support.

@ararslan
Copy link
Member

Is this a general problem of the package that we should investigate

Yeah, I could be wrong but I'm pretty sure it's the fault of the general show method and not a fault of this particular test.


ν = findfirst(ai -> ai >= x, a)
if ν == 0
return sum = 1.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't need the sum =, right?

@BenjaminBorn
Copy link
Contributor Author

I have opened an issue about the double computation of the p-value in #105.

Copy link
Member

@ararslan ararslan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not intimately familiar with Durbin-Watson but this implementation looks good to me, plus it matches the results from R.

@BenjaminBorn
Copy link
Contributor Author

Thanks for the quick and very helpful review Alex!

@BenjaminBorn
Copy link
Contributor Author

Any other comments?

@ararslan ararslan merged commit cf392fc into JuliaStats:master Aug 22, 2017
@BenjaminBorn BenjaminBorn deleted the durbin_watson branch August 22, 2017 20:58
Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's really cool! I would have had a few stylistic comments.

export DurbinWatsonTest

immutable DurbinWatsonTest <: HypothesisTest
xmat::Array{Float64} # regressor matrix
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's a matrix it should be Matrix{Float64} to use a concrete type.

```
where `n` is the number of observations.

By default, the choice of approach to compute p-values depends on the sample size (`p_compute=:ndep`). For small samples (n<100), Pan's algorithm (Farebrother, 1980) is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Break line at 92 chars.

](https://en.wikipedia.org/wiki/Durbin–Watson_statistic)
"""
function DurbinWatsonTest{T<:Real}(xmat::AbstractArray{T}, e::AbstractArray{T};
p_compute::Symbol = :ndep)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect indentation.

end

if exact_problem_flag == 1 || (x.p_compute == :ndep && x.n > 100
) || x.p_compute == :approx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parenthesis should be on previous line.

p_compute::Symbol = :ndep)

n = length(e)
DW = sum(diff(e) .^2) / sum(e .^2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how much it can matter for performance, but you can use sum(abs2, diff(e))/sum(abs2, e) to avoid allocating copies. (diff itself allocates a vector, which could be avoided by using a small loop.)


"""
function pan_algorithm(a::AbstractArray, x::Float64, m::Int, n::Int)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention used in Julia and in JuliaStats packages is not to add blank lines at the top nor at the end of functions.

@BenjaminBorn
Copy link
Contributor Author

Thanks Milan! I'll revisit this after the smoke has cleared from the documentation migration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants