 ### Geometric mean

The _geometric mean_ $GM$ of nonnegative numbers $x_1, x_2, \dots, x_n$ is the $n$th root of their product. Specifically,
$$
   GM(x_1, x_2, \dots, x_n) = \left( \prod_{k=1}^n x_k \right)^{1/n}
$$

One property of the geometric mean is that $GM(x,x) = x$ for every nonnegative number $x$.

Another property of the geometric mean is that if any one of the numbers $x_1, x_2, \dots, x_n$ is zero, then the geometric mean of these numbers is zero.

Our first attempt at a Julia function for the geometric mean is little more than a direct translation of the definition.
The Julia function `prod` returns the product of the members of an array.

In [1]:
function geometricMean_0(a)
    prod(a)^(1/length(a))  
end

geometricMean_0 (generic function with 1 method)

A few simple tests, including one with a zero entry, show that our function works:

In [2]:
(geometricMean_0([4,45]), sqrt(4*45))

(13.416407864998739, 13.416407864998739)

In [3]:
(geometricMean_0([1,2,3,4]), (1*2*3*4)^(1/4))

(2.2133638394006434, 2.2133638394006434)

In [4]:
(geometricMean_0([0,1,2,3,4]), (0*1*2*3*4)^(1/5))

(0.0, 0.0)

But a bit more testing shows that the intermediate product can overflow, giving a wrong result even when the geometric mean itself is representable; for example:

In [5]:
geometricMean_0([1.0e155, 1.0e155])

Inf

This result violates the identity $GM(x,x) = x$. We really should do better. A typical way to fix this overflow
problem is to use the fact that the logarithm of a product of positive numbers is the sum of the logarithms. This
gives an alternative formula for the geometric mean
$$
   GM(x_1, x_2, \dots, x_n) = \exp \left( \frac{1}{n} \sum_{k=1}^n \ln(x_k) \right)
$$
A simple implementation of this method is

In [1]:
function geometricMean_1(a)
    exp(sum(map(log,a))/length(a))
end

geometricMean_1 (generic function with 1 method)

In Julia, `log` is the natural logarithm; some languages and most mathematics texts write `ln` instead. The Julia function `map` applies a function to each member of an array, and the Julia function `sum` adds the members of an array. Finally, `exp` is the natural exponential function.
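
As a possible variant (a sketch, not part of the original notebook), `sum` also accepts a function as its first argument, so `sum(log, a)` adds the logarithms without building the intermediate array that `map(log, a)` creates. The name `geometricMean_1b` is made up for this illustration.

```julia
# Hypothetical variant of geometricMean_1: sum(f, itr) applies f to each
# element while summing, so no temporary array of logarithms is allocated.
function geometricMean_1b(a)
    exp(sum(log, a) / length(a))
end

geometricMean_1b([1, 2, 3, 4])   # close to (1*2*3*4)^(1/4)
```

The result should agree with `geometricMean_1` up to rounding; whether the saved allocation matters in practice depends on the array size.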

In [3]:
(geometricMean_1([-4,45]), sqrt(4*45))

DomainError: DomainError with -4.0:
log will only return a complex result if called with a complex argument. Try log(Complex(x)).

In [8]:
(geometricMean_1([1,2,3,4]), (1*2*3*4)^(1/4))

(2.213363839400643, 2.2133638394006434)

We have resolved the overflow problem, but arguably our function isn't as accurate as it might be:

In [9]:
geometricMean_1([1.0e155, 1.0e155])

9.999999999999973e154
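
A rough way to quantify the loss, offered as a sketch: the logarithm of `1.0e155` is about 357, so a rounding error in the logarithm is an absolute error on the scale of `eps(357.0)`. Exponentiating turns an absolute error in the exponent into a relative error of about the same size in the result, and that can be many times larger than `eps(1.0)`. The code below measures the relative error in units of machine epsilon; the helper name `relerr_in_eps` is hypothetical.

```julia
# Hypothetical helper: relative error of `approx` against `exact`,
# expressed in multiples of machine epsilon for Float64.
relerr_in_eps(approx, exact) = abs(approx - exact) / abs(exact) / eps(Float64)

x = 1.0e155
gm = exp((log(x) + log(x)) / 2)   # mirrors the arithmetic of geometricMean_1([x, x])

relerr_in_eps(gm, x)              # relative error of the log-based result
eps(log(x)) / eps(1.0)            # spacing of floats near log(x), relative to eps(1.0)
```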

In [10]:
geometricMean_1([0,1,2,3])

0.0

To see what happens when one of the inputs is zero, we can work through the calculation one step at a time:

In [11]:
x = map(log,[0,1,2,3])

4-element Vector{Float64}:
 -Inf
   0.0
   0.6931471805599453
   1.0986122886681098

In [12]:
x = sum(x)

-Inf

In [13]:
x = x/4

-Inf

In [14]:
exp(x)

0.0

In Julia `log(0) = -Inf` and `exp(-Inf) = 0`, so the function returns zero when an input is zero, as it should.

For an array with many elements, computing a logarithm of every element is a bit spendy. Can we avoid overflow without using the
logarithm trick? Sure: we'll loop through the array members and, after each partial product, extract
the exponent and the significand of the partial product. We'll keep a running sum of the exponents as a 64-bit integer and carry only the significand forward.

In [15]:
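function geometricMean2(a::Array)
    e = Int64(0)      # running sum of the binary exponents
    s = one(a[1])     # running significand of the partial product
    for x in a
        s *= x
        e += exponent(s)      # move the power of two into e ...
        s = significand(s)    # ... and keep only the significand
    end
    n = length(a)
    # product = s * 2^e, so its nth root is 2^(e/n) * s^(1/n)
    2.0^(e/n) * s^(1/n)
end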
    

geometricMean2 (generic function with 1 method)
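
To make the bookkeeping concrete, here is a small sketch of what `exponent` and `significand` return for a nonzero floating-point number: `significand(x)` has magnitude in $[1, 2)$, `exponent(x)` is an integer, and together they reconstruct `x`. Because each partial product is renormalized this way, `s` stays bounded while `e` accumulates the powers of two.

```julia
x = 1.0e155

m = significand(x)   # floating-point number with 1 <= abs(m) < 2
k = exponent(x)      # integer power of two

# ldexp(m, k) computes m * 2^k and should reproduce x, since scaling by a
# power of two is exact as long as nothing overflows or underflows.
ldexp(m, k) == x
```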

Simple checks for overflow:

In [16]:
geometricMean2([1.0e155, 1.0e154,1.0e154])

2.154434690031827e154

In [17]:
geometricMean2([1.0e308, 1.0e308, 1.0e308])

1.0e308

In [18]:
geometricMean_1([0,1,2,3])

0.0
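
One caveat worth checking: as far as we can tell, Julia's `exponent` throws a `DomainError` when its argument is zero, so `geometricMean2` as written would fail on an array containing a zero instead of returning zero. A minimal sketch of a guard, under that assumption; the name `geometricMean2_zero` is hypothetical, and the conversion with `float` is an added choice so that integer arrays are handled in floating point.

```julia
# Hypothetical variant of geometricMean2 that returns zero as soon as a
# zero entry is seen, before exponent() is ever called on a zero product.
function geometricMean2_zero(a::Array)
    e = Int64(0)              # running sum of binary exponents
    s = one(float(a[1]))      # running significand, renormalized each step
    for x in a
        iszero(x) && return zero(s)
        s *= x
        e += exponent(s)
        s = significand(s)
    end
    n = length(a)
    2.0^(e/n) * s^(1/n)
end

geometricMean2_zero([0, 1, 2, 3])   # zero, matching the property stated earlier
```

The early return keeps the loop otherwise unchanged, at the cost of one extra comparison per element.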

In [19]:
import Pkg

In [20]:
Pkg.add("StatsBase")

   Resolving package versions...
  No Changes to `C:\Users\willisb\.julia\environments\v1.10\Project.toml`
  No Changes to `C:\Users\willisb\.julia\environments\v1.10\Manifest.toml`


Once we have downloaded and installed the StatsBase package, we don't need to do these steps again. All we need to do is load
the package, which provides the function `geomean` that we compare against below:

In [21]:
using StatsBase

In [22]:
L = rand(Float64,10^7);

In [23]:
@time x = geometricMean2(L)

  0.033825 seconds (1 allocation: 16 bytes)


0.3679772281653708

In [24]:
@time x = geometricMean2(L)

  0.034154 seconds (1 allocation: 16 bytes)


0.3679772281653708

In [25]:
@time y = geomean(L)

  0.164373 seconds (48.63 k allocations: 3.326 MiB, 46.59% compilation time)


0.3679772281653709

In [26]:
@time y = geomean(L)

  0.086302 seconds (1 allocation: 16 bytes)


0.3679772281653709
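
The first `geomean` timing above includes compilation, so only the repeated calls are comparable. As a sketch of a slightly more careful comparison using only Base Julia, one can call each function once to compile it and then take the minimum of several `@elapsed` measurements. The helper `min_time` and the functions `gm_log` and `gm_exponent` are hypothetical stand-ins written for this illustration; timings vary by machine, so none are shown.

```julia
# Stand-ins for the two approaches discussed in this section.
gm_log(a) = exp(sum(log, a) / length(a))

function gm_exponent(a)
    e = 0
    s = one(float(first(a)))
    for x in a
        s *= x
        e += exponent(s)
        s = significand(s)
    end
    n = length(a)
    2.0^(e/n) * s^(1/n)
end

# Hypothetical helper: run f(a) once to compile, then report the fastest
# of `reps` timed runs (in seconds).
function min_time(f, a; reps = 5)
    f(a)
    minimum(@elapsed(f(a)) for _ in 1:reps)
end

# Test data shifted away from zero, so no partial product can be zero.
data = 0.5 .+ rand(Float64, 10^6)
(min_time(gm_log, data), min_time(gm_exponent, data))
```

Taking the minimum rather than the mean reduces the influence of garbage collection and other background noise on the comparison.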