Moving averaging schemes (exponentially weighted, polynomial-decay). Running averages only are maintained; no history of values is stored.
from average import EWMA
# Create a scalar running average.
# beta=0.5 is the smoothing factor.
avg = EWMA(beta=0.5)
avg.update(1)
avg.update(2)
print(avg.get()) # Prints 1.6666666666666667.
The average is weighted toward the most recent value. That is, its value is 1 * 1/3 + 2 * 2/3
. The default value for beta
is 0.9, which is reasonable for many uses. Higher smoothing values increase the amount of weight put on less recent values in the average.
You can also create running averages shaped like NumPy arrays by supplying a shape and dtype (defaults to numpy.float64
). For example:
avg = EWMA((4, 4), np.float64)
# or, equivalently:
avg = EWMA.like(np.eye(4))
avg.update(np.eye(4))
print(avg.get()) # Prints a 4x4 identity matrix.
For example, you could use EWMA
to maintain a running average of HxWx3 video frames, if the frames are in NumPy array format or convertible to it.
With the polynomial decay parameter eta
set to the default value of 0, PDMA
acts as a simple average (averages equally over all previous values). A history of values is not kept.
from average import PDMA
avg = PDMA()
avg.update(1)
avg.update(2)
avg.update(3)
print(avg.get()) # Prints 2.0.
Higher values of eta
correspond to a polynomially decaying window of degree eta
stretching back over all previous values. The higher eta
is set, the more weight is placed on more recent values.
avg = PDMA(eta=1)
for i in range(1, 5):
avg.update(i)
print(avg.get()) # Prints 3.0.
In our example, setting eta
to 0 would instead have printed the simple average 2.5. eta
can be set arbitrarily high, but 0, 1, and 3 are probably reasonable values for many uses. Similarly to EWMA
, PDMA
running averages can be shaped like NumPy arrays and have a NumPy data type (not shown).
The formula for an exponentially weighted average with initialization bias correction is given in "Adam: A Method for Stochastic Optimization" by Kingma and Ba.
The formula for polynomial-decay averaging is given in section 4 of "Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes" by Shamir and Zhang.