Fix total mean calculation in ANOVA #273

Open
wants to merge 2 commits into
base: master

Contributor

@wildart wildart commented May 28, 2022

Fix for #242

@codecov-commenter
codecov-commenter commented May 28, 2022

Codecov Report

Merging #273 (f0ff08e) into master (be980f3) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #273   +/-   ##
=======================================
  Coverage   93.65%   93.65%           
=======================================
  Files          28       28           
  Lines        1717     1717           
=======================================
  Hits         1608     1608           
  Misses        109      109           
Impacted Files         Coverage Δ
src/var_equality.jl    98.24% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@@ -60,7 +60,7 @@ end
 function anova(scores::AbstractVector{<:Real}...)
     Nᵢ = [length(g) for g in scores]
     Z̄ᵢ = mean.(scores)
-    Z̄ = mean(Z̄ᵢ)
+    Z̄ = sum(Iterators.flatten(scores))/sum(Nᵢ)

Member

Alternatively, one could use

Suggested change
-    Z̄ = sum(Iterators.flatten(scores))/sum(Nᵢ)
+    Z̄ = dot(Z̄ᵢ, Nᵢ) / sum(Nᵢ)

In a quick benchmark this seemed to be similarly fast, and usually even marginally faster.
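
To make the three formulas concrete, here is a minimal sketch on made-up data (the two groups below are illustrative and do not come from the PR). It assumes only the `Statistics` and `LinearAlgebra` standard libraries. Because each group sum equals Nᵢ times that group's mean, the `dot` form and the flattened sum give the same total mean, while the plain mean of group means differs as soon as the group sizes are unequal.

```julia
using Statistics, LinearAlgebra

# Two groups of unequal size (illustrative data, not taken from the PR).
scores = ([1.0, 2.0, 3.0], [10.0])

Nᵢ = [length(g) for g in scores]   # group sizes: [3, 1]
Z̄ᵢ = collect(mean.(scores))        # group means: [2.0, 10.0]

# Formula before the PR: unweighted mean of the group means.
unweighted = mean(Z̄ᵢ)

# Formula in the PR: mean over all observations.
pooled = sum(Iterators.flatten(scores)) / sum(Nᵢ)

# Suggested alternative: group means weighted by group size.
weighted = dot(Z̄ᵢ, Nᵢ) / sum(Nᵢ)

println((unweighted, pooled, weighted))   # prints (6.0, 4.0, 4.0)
```

With equal group sizes all three values coincide, which is presumably why the original formula went unnoticed on balanced data.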

Contributor Author

You'd probably need a very large dataset to see the difference.

Member

Actually a tiny dataset is sufficient. Of course, the difference is very small but it seemed to be consistent:

julia> using Statistics, LinearAlgebra, BenchmarkTools

julia> function f(scores::AbstractVector{<:Real}...)
           Nᵢ = [length(g) for g in scores]
           Z̄ᵢ = mean.(scores)
           Z̄ = sum(Iterators.flatten(scores)) / sum(Nᵢ)
           return Nᵢ, Z̄ᵢ, Z̄
       end
f (generic function with 1 method)

julia> function g(scores::AbstractVector{<:Real}...)
           Nᵢ = [length(g) for g in scores]
           Z̄ᵢ = mean.(scores)
           Z̄ = dot(Z̄ᵢ, Nᵢ) / sum(Nᵢ)
           return Nᵢ, Z̄ᵢ, Z̄
       end
g (generic function with 1 method)

julia> scores = map(n -> rand(n), (3, 9, 12));

julia> @btime f($(scores...));
  33.893 ns (1 allocation: 64 bytes)

julia> @btime g($(scores...));
  33.687 ns (1 allocation: 64 bytes)

julia> scores = map(n -> rand(n), (3, 9, 12, 134));

julia> @btime f($(scores...));
  33.944 ns (1 allocation: 64 bytes)

julia> @btime g($(scores...));
  33.681 ns (1 allocation: 64 bytes)

julia> scores = map(n -> rand(n), (3, 9, 12, 134, 12, 4134, 1231, 122, 12, 1, 23, 58));

julia> @btime f($(scores...));
  34.070 ns (1 allocation: 64 bytes)

julia> @btime g($(scores...));
  33.781 ns (1 allocation: 64 bytes)

Contributor Author

It looks like something other than the Z̄ evaluation dominates the function's runtime. But nanoseconds 😏.

@nalimilan
Member

Could you also add a test that failed before the PR?
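
One possible shape for such a test is sketched below. The helpers `total_mean_old` and `total_mean_new` are hypothetical stand-ins for the single line changed in the diff; they are not functions provided by the package, and the data is made up. A real regression test would more likely run the package's own tests on unbalanced groups and compare against a reference value, which is not given in this thread.

```julia
using Statistics, Test

# Hypothetical stand-ins for the total-mean line before and after the PR.
# These names are assumptions for illustration, not part of the package.
total_mean_old(scores...) = mean(mean.(scores))
total_mean_new(scores...) = sum(Iterators.flatten(scores)) / sum(length, scores)

@testset "total mean with unequal group sizes" begin
    # Unbalanced groups: this is the case the old formula gets wrong.
    groups = ([1.0, 2.0, 3.0], [10.0])
    expected = mean(vcat(groups...))   # mean over all four observations

    @test total_mean_new(groups...) ≈ expected
    @test !(total_mean_old(groups...) ≈ expected)

    # Balanced groups: both formulas agree, so such data cannot expose the bug.
    balanced = ([1.0, 2.0], [3.0, 5.0])
    @test total_mean_old(balanced...) ≈ total_mean_new(balanced...)
end
```

The key design point is that the test data must have unequal group sizes; otherwise the old and new formulas return the same value and the test would have passed before the PR as well.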

4 participants