Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix total mean calculation in ANOVA #273

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/var_equality.jl
Expand Up @@ -60,7 +60,7 @@ end
function anova(scores::AbstractVector{<:Real}...)
Nᵢ = [length(g) for g in scores]
Z̄ᵢ = mean.(scores)
Z̄ = mean(Z̄ᵢ)
Z̄ = sum(Iterators.flatten(scores))/sum(Nᵢ)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, one could use

Suggested change
= sum(Iterators.flatten(scores))/sum(Nᵢ)
= dot(Z̄ᵢ, Nᵢ) / sum(Nᵢ)

In a quick benchmark this seemed to be similarly fast, and usually even marginally faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd probably need a very large dataset to see the difference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually a tiny dataset is sufficient. Of course, the difference is very small but it seemed to be consistent:

julia> using Statistics, LinearAlgebra, BenchmarkTools

julia> function f(scores::AbstractVector{<:Real}...)
           Nᵢ = [length(g) for g in scores]
           Z̄ᵢ = mean.(scores)
           Z̄ = sum(Iterators.flatten(scores)) / sum(Nᵢ)
           return Nᵢ, Z̄ᵢ, Z̄
       end
f (generic function with 1 method)

julia> function g(scores::AbstractVector{<:Real}...)
           Nᵢ = [length(g) for g in scores]
           Z̄ᵢ = mean.(scores)
           Z̄ = dot(Z̄ᵢ, Nᵢ) / sum(Nᵢ)
           return Nᵢ, Z̄ᵢ, Z̄
       end
g (generic function with 1 method)

julia> scores = map(n -> rand(n), (3, 9, 12));

julia> @btime f($(scores...));
  33.893 ns (1 allocation: 64 bytes)

julia> @btime g($(scores...));
  33.687 ns (1 allocation: 64 bytes)

julia> scores = map(n -> rand(n), (3, 9, 12, 134));

julia> @btime f($(scores...));
  33.944 ns (1 allocation: 64 bytes)

julia> @btime g($(scores...));
  33.681 ns (1 allocation: 64 bytes)

julia> scores = map(n -> rand(n), (3, 9, 12, 134, 12, 4134, 1231, 122, 12, 1, 23, 58));

julia> @btime f($(scores...));
  34.070 ns (1 allocation: 64 bytes)

julia> @btime g($(scores...));
  33.781 ns (1 allocation: 64 bytes)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like something else from Z evaluation dominates in the function. But nanoseconds 😏.

SStᵢ = Nᵢ .* (Z̄ᵢ .- Z̄).^2
SSeᵢ = sum.( (z .- z̄).^2 for (z, z̄) in zip(scores, Z̄ᵢ) )
(Nᵢ, SStᵢ, SSeᵢ)
Expand Down
3 changes: 2 additions & 1 deletion test/var_equality.jl
Expand Up @@ -31,7 +31,8 @@ using DelimitedFiles
t = OneWayANOVATest(groups...)
@test nobs(t) == [7, 8, 8, 6]
@test dof(t) == (3,25)
@test pvalue(t) ≈ 0.072 atol=1e-3
@test pvalue(t) ≈ 0.07276 atol=1e-6
@test HypothesisTests.teststatistic(t) ≈ 2.62311 atol=1e-6
@test occursin("reject h_0", sprint(show, t))

# http://www.real-statistics.com/one-way-analysis-of-variance-anova/homogeneity-variances/levenes-test/
Expand Down