Adjust eweights calculation to avoid precision issues #509

Merged
merged 10 commits into master from rf/normalize-eweights on Nov 19, 2021

Conversation

Member

@rofinn rofinn commented Aug 2, 2019

The modified function is equivalent to dividing all weights by the largest value.

julia> x = [
        0.3
        0.42857142857142855
        0.6122448979591837
        0.8746355685131197
        1.249479383590171
        1.7849705479859588
        2.549957925694227
        3.642797036706039
        5.203995766722913
        7.434279666747019
        ]
10-element Array{Float64,1}:
 0.3
 0.42857142857142855
 0.6122448979591837
 0.8746355685131197
 1.249479383590171
 1.7849705479859588
 2.549957925694227
 3.642797036706039
 5.203995766722913
 7.434279666747019

julia> x ./ last(x)
10-element Array{Float64,1}:
 0.04035360699999998
 0.057648009999999965
 0.08235429999999996
 0.11764899999999996
 0.16806999999999994
 0.24009999999999995
 0.34299999999999997
 0.49
 0.7
 1.0

julia> using StatsBase
[ Info: Recompiling stale cache file /Users/rory/.julia/compiled/v1.0/StatsBase/EZjIG.ji for StatsBase [2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91]

julia> eweights(1:10, 0.3)
10-element Weights{Float64,Float64,Array{Float64,1}}:
 0.04035360699999998
 0.05764800999999997
 0.08235429999999996
 0.11764899999999996
 0.16806999999999994
 0.24009999999999995
 0.3429999999999999
 0.48999999999999994
 0.7
 1.0
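For reference, the new scaled formulation boils down to something like the following (a minimal sketch with a hypothetical name, not the exact PR code, which also handles the `t`/`r` methods):

function eweights_scaled(n::Integer, λ::Real)
    n > 0 || throw(ArgumentError("cannot construct weights of length < 1"))
    0 < λ <= 1 || throw(ArgumentError("smoothing factor must be between 0 and 1"))
    # The largest weight is (1 - λ)^0 == 1.0, so all weights lie in (0, 1].
    [(1 - λ)^(n - i) for i in 1:n]
end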

@codecov

codecov bot commented Aug 2, 2019

Codecov Report

Merging #509 into master will increase coverage by 0.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #509      +/-   ##
==========================================
+ Coverage    90.1%   90.11%   +0.01%     
==========================================
  Files          21       21              
  Lines        2021     2024       +3     
==========================================
+ Hits         1821     1824       +3     
  Misses        200      200
Impacted Files Coverage Δ
src/weights.jl 84.69% <100%> (+0.25%) ⬆️


@rofinn rofinn requested a review from ararslan August 2, 2019 21:08
Member

@ararslan ararslan left a comment


Stylistic issues aside (I don't like spaces around ^ or explicit returns inside of do blocks), this seems fine, I guess?

@nickrobinson251

Why the double negation in (1 - λ) ^ (n - i) and not just λ^i?

Member Author

rofinn commented Aug 2, 2019

Cause those are different things? The goal was to minimize the potential impact on existing code. This approach simply scales the weights without changing the relative size, as noted above.

  • (1 - λ) ensures that λ holds the same meaning
  • (n - i) means we can still iterate forward through the elements.
julia> λ = 0.3
0.3

julia> [(1 - λ) ^ (10 - i) for i in 1:10]
10-element Array{Float64,1}:
 0.04035360699999998
 0.05764800999999997
 0.08235429999999996
 0.11764899999999996
 0.16806999999999994
 0.24009999999999995
 0.3429999999999999
 0.48999999999999994
 0.7
 1.0

julia> [λ ^ i for i in 1:10]
10-element Array{Float64,1}:
 0.3
 0.09
 0.026999999999999996
 0.0081
 0.0024299999999999994
 0.0007289999999999998
 0.00021869999999999995
 6.560999999999998e-5
 1.9682999999999994e-5
 5.9048999999999975e-6

@ararslan ararslan requested a review from nalimilan August 2, 2019 22:09

nickrobinson251 commented Aug 2, 2019

  • (1 - λ) ensures that λ holds the same meaning

I strongly dislike this meaning. λ^i is much easier to reason about... every previous observation is λ times as important. (I don't mind about the forward/reverse ordering, whatever makes sense for the most common use-case)

Member Author

rofinn commented Aug 2, 2019

every previous observation is λ times as important

FWIW, since λ should always be between 0 and 1, I think this is a fairly intuitive way to think about that parameter:

- `λ::Real`: a smoothing factor or rate parameter such that ``0 < λ ≤ 1``.
  As this value approaches 0, the resulting weights will be almost equal,
  while values closer to 1 will put greater weight on the tail elements of the vector.

I don't think it's a big deal, but it seems weird to me that you'd want to have a lower λ for a faster decay.
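A quick way to see that behavior, reusing the comprehension from above (approximate values in the comments):

λ = 0.05; [(1 - λ)^(10 - i) for i in 1:10]  # smallest weight ≈ 0.63: nearly equal
λ = 0.9;  [(1 - λ)^(10 - i) for i in 1:10]  # smallest weight ≈ 1.0e-9: tail-heavy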

@nalimilan
Member

How does this fix precision issues? Note that this is breaking, so we need a strong rationale to change the existing definition. Is the new one more standard?

Member Author

rofinn commented Aug 6, 2019

How does this fix precision issues?

By limiting the range of values to between 0.0 and 1.0, we minimize the absolute floating-point error. It also gives users a more consistent range of weights to work with, regardless of λ or n.

Note that this is breaking, so we need a strong rationale to change the existing definition. Is the new one more standard?

That's true. Unfortunately, I've managed to find at least 3-4 different formulations, and it seems like the preference is largely dependent on the application/field. I'd argue that compressing the weights between 0.0 and 1.0 is better for the reasons mentioned above. Also, most people won't notice the change because the relative strength of the weights hasn't changed, so all the existing weighted statistics methods will return the same result:

julia> wv1 = eweights(1:10, 0.3)
10-element Weights{Float64,Float64,Array{Float64,1}}:
 0.04035360699999998
 0.05764800999999997
 0.08235429999999996
 0.11764899999999996
 0.16806999999999994
 0.24009999999999995
 0.3429999999999999
 0.48999999999999994
 0.7
 1.0

julia> wv2 = Weights([
               0.3
               0.42857142857142855
               0.6122448979591837
               0.8746355685131197
               1.249479383590171
               1.7849705479859588
               2.549957925694227
               3.642797036706039
               5.203995766722913
               7.434279666747019
               ])
10-element Weights{Float64,Float64,Array{Float64,1}}:
 0.3
 0.42857142857142855
 0.6122448979591837
 0.8746355685131197
 1.249479383590171
 1.7849705479859588
 2.549957925694227
 3.642797036706039
 5.203995766722913
 7.434279666747019

julia> X = rand(10, 5)
10×5 Array{Float64,2}:
 0.613516   0.478427  0.278105   0.971781   0.898878
 0.501727   0.48274   0.398964   0.555583   0.309536
 0.279534   0.431118  0.15033    0.871819   0.211906
 0.0699309  0.96875   0.307077   0.0186145  0.537571
 0.139164   0.75598   0.576918   0.10567    0.820161
 0.994798   0.936011  0.256395   0.0565743  0.789837
 0.766728   0.284988  0.310023   0.240307   0.867523
 0.943843   0.448423  0.913451   0.887282   0.83511
 0.827083   0.689755  0.932156   0.912732   0.860018
 0.534114   0.733628  0.0328417  0.447575   0.257983

julia> mean(X, wv1; dims=1)
1×5 Array{Float64,2}:
 0.674775  0.642861  0.457071  0.549602  0.626412

julia> mean(X, wv2; dims=1)
1×5 Array{Float64,2}:
 0.674775  0.642861  0.457071  0.549602  0.626412

julia> cor(X, wv1)
5×5 Array{Float64,2}:
  1.0       -0.305422   0.518979   0.445195   0.568971
 -0.305422   1.0       -0.254187  -0.326943  -0.325702
  0.518979  -0.254187   1.0        0.677978   0.800088
  0.445195  -0.326943   0.677978   1.0        0.246341
  0.568971  -0.325702   0.800088   0.246341   1.0

julia> cor(X, wv2)
5×5 Array{Float64,2}:
  1.0       -0.305422   0.518979   0.445195   0.568971
 -0.305422   1.0       -0.254187  -0.326943  -0.325702
  0.518979  -0.254187   1.0        0.677978   0.800088
  0.445195  -0.326943   0.677978   1.0        0.246341
  0.568971  -0.325702   0.800088   0.246341   1.0

If we want to consider this a fully breaking change then I'm open to switching the meaning of λ.
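To put a number on the precision argument: under the old unscaled formulation (λ * (1 - λ)^(1 - i), which matches the values in the description above), the largest weight grows without bound as n increases, while the scaled form is always capped at 1.0. A minimal sketch:

λ, n = 0.3, 200
maximum(λ * (1 - λ)^(1 - i) for i in 1:n)  # ≈ 2.0e30 with the unscaled form
maximum((1 - λ)^(n - i) for i in 1:n)      # == 1.0 with the scaled form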

@nalimilan
Member

OK. I don't have a strong preference, but can you check what's the most common behavior and definition of λ in other implementations?

Member Author

rofinn commented Aug 8, 2019

From what I can tell, the current meaning of λ (sometimes called α) is more standard?

The parameter λ determines the rate at which "older" data enter into the calculation of the EWMA statistic. A value of λ=1 implies that only the most recent measurement influences the EWMA (degrades to Shewhart chart). Thus, a large value of λ (closer to 1) gives more weight to recent data and less weight to older data; a small value of λ (closer to 0) gives more weight to older data. The value of λ is usually set between 0.2 and 0.3 (Hunter) although this choice is somewhat arbitrary. Lucas and Saccucci (1990) give tables that help the user select λ.

https://www.itl.nist.gov/div898/handbook/pmc/section3/pmc324.htm

For any α between 0 and 1, the weights attached to the observations decrease exponentially as we go back in time, hence the name “exponential smoothing”. If α is small (i.e., close to 0), more weight is given to observations from the more distant past. If α is large (i.e., close to 1), more weight is given to the more recent observations.

https://otexts.com/fpp2/ses.html

A few other implementations:

@nalimilan
Member

OK, cool. What about the normalization to 0:1 that this PR introduces?

Member Author

rofinn commented Aug 9, 2019

I'm having a harder time finding a consistent pattern for that. Many algorithms seem to just have large numbers and others normalize the weights so that they sum to 1.0. I'm not sure which is the best approach, but I feel like having large floating point weights probably isn't a good idea if we can compress them. I could be convinced that having them sum to 1.0 is a good idea.

@nalimilan
Member

Mmm, hard to decide then. Is there disagreement among major implementations (R, NumPy...)? An alternative possibility is to have a keyword argument to normalize, but keep it false by default.

Member Author

rofinn commented Aug 12, 2019

The issue is that most other libraries just have an ewma-style method which combines the data with the weighting, rather than exposing independent weights. I feel like bounding the weights to 0:1 results in more understandable output and doesn't cost us anything, but maybe I'm biased by the fact that I think of eweights as being analytic/reliability weights. I'd be okay with using the keyword as a deprecation path though.

@nalimilan
Member

OK, as you prefer. Having a keyword argument sounds good, whatever the default we retain.

@Nosferican
Contributor

It is important to document how the sum of the weights is impacted by the normalization. If I'm not mistaken, the sum of the weights is not consistent and will depend on whether that weight type normalizes or not. For now, it should at least be mentioned in the docs.
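For reference, with the scaled form the sum of the weights is a finite geometric series rather than 1.0 or n, so it depends on both λ and n (a small check):

λ, n = 0.3, 10
sum((1 - λ)^(n - i) for i in 1:n)  # ≈ 3.2392, i.e. (1 - (1 - λ)^n) / λ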

Member Author

rofinn commented Aug 23, 2019

FWIW, the initial version of this, which I wrote for our internal code base, would produce weights that summed to 1.0. I think @ararslan was the one who thought that was wrong, so maybe I can find the discussion?

Member Author

rofinn commented Aug 23, 2019

Alright, I couldn't find any discussion, but the original code we had internally looked like this:

function ExponentialWeights(n::Integer, λ::Real)
    n > 0 || throw(ArgumentError("cannot construct weights of length < 1"))
    0 < λ <= 1 || throw(ArgumentError("smoothing factor must be between 0 and 1"))
    w0 = map(i -> λ * (1 - λ)^(1 - i), 1:n)
    s = sum(w0)
    ExponentialWeights(w0 / s)
end
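As a sanity check, that sum-to-1 version and this PR's max-to-1 scaling carry identical relative weights; only the overall scale differs:

λ = 0.3
w0 = map(i -> λ * (1 - λ)^(1 - i), 1:10)  # the unscaled weights
w1 = w0 / sum(w0)                         # sums to 1.0 (the internal version)
w2 = w0 / maximum(w0)                     # max of 1.0 (this PR's scaling)
w1 ./ w1[end] ≈ w2 ./ w2[end]             # true: identical relative weights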

@kleinschmidt kleinschmidt added the breaking A breaking change that needs a semver major release label Feb 16, 2020
@nalimilan
Member

Do you want to merge this in the end or not? In the perspective of moving features to Statistics (https://github.com/JuliaLang/Statistics.jl/issues/87) I'm checking which breaking changes we may want to do (as after that it will be too late).

Member Author

rofinn commented Oct 1, 2021

Sorry, I wasn't clear on whether further consensus was needed for this PR. Sure, I can fix this up for a breaking release. I think it's probably still worth having both the values and the smoothing factor stay between 0 and 1, given the links I posted above. That being said, we could do something like what pandas does and accept multiple parameters: either the normal smoothing parameter 0 < α <= 1 (the ratio to increase by at each step) or a growth rate 𝑟 > 1, which is equivalent to 𝑟^i normalized by 𝑟^n (i.e., α = 1 - 1/𝑟). Supporting both might be a more breaking change, but it would also provide a clearer deprecation path.
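A quick check of that equivalence (variable names hypothetical, following the description above):

λ, n = 0.3, 10
r = 1 / (1 - λ)                    # growth rate; α = 1 - 1/r recovers λ
[(1 - λ)^(n - i) for i in 1:n] ≈ [r^i / r^n for i in 1:n]  # true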

@nalimilan
Member

AFAICT there are two different issues here, right?

  • Whether to normalize between 0 and 1. There doesn't seem to be any opposition to this, though it can't be merged before a breaking release (it would still be good to have the PR ready for the day we want to make such a release).
  • What parameterization to use for λ. Changing this would be really breaking, so instead what you propose about adding multiple keyword arguments like pandas sounds better. But this is better done in another PR, which could be merged before this one as it wouldn't be breaking at all.

Member Author

rofinn commented Oct 4, 2021

Alright, I've revised this PR to be non-breaking by using a scaled::DepBool=nothing kwarg. I'm just gonna rebase with master and resolve the conflicts.
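The deprecation hook could look something like this (a rough sketch of the DepBool pattern; the exact helper signature, message, and interim default here are assumptions, not the PR's actual code):

const DepBool = Union{Bool, Nothing}

function depcheck(fname::Symbol, b::DepBool)
    if b === nothing
        # Warn and keep the old (unscaled) behavior as the interim default.
        Base.depwarn("`scaled` will default to `true` in a future release; " *
                     "pass `scaled` explicitly to silence this warning.", fname)
        false
    else
        b
    end
end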

src/weights.jl Outdated
- https://en.wikipedia.org/wiki/Exponential_smoothing
```
"""
function eweights(t::AbstractVector{<:Integer}, λ::Real, n::Integer; scaled::DepBool=nothing)
Member

Maybe call this scale? We already use this term for transformations (https://juliastats.org/StatsBase.jl/latest/transformations/).

src/weights.jl Outdated
eweights(t::AbstractVector{<:Integer}, λ::Real)
eweights(t::AbstractVector{T}, r::StepRange{T}, λ::Real) where T
eweights(n::Integer, λ::Real)
eweights(t::AbstractVector{<:Integer}, λ::Real, [n::Integer]; scaled=false)
Member

Why do we need to allow passing n separately here? Isn't it always equal to length(t)?

Member Author

@rofinn rofinn Oct 4, 2021

That doesn't always hold for cases where you're skipping elements (i.e., [1, 2, 4, 5], but you want n to be 5). The most common use case for us is passing a set of actual dates t and specifying our expected date range r. The previous code wouldn't handle missing entries in t correctly, which is why I added this and didn't document it on the first pass (used with eweights(t, r, λ)).
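For instance, a sketch of that use case in terms of the comprehension form (using the same indexin composition mentioned further down this thread):

λ, t, r = 0.3, [1, 2, 4, 5], 1:5
n = length(r)                    # 5, even though length(t) == 4
idx = something.(indexin(t, r))  # positions of the observed entries in r
[(1 - λ)^(n - i) for i in idx]   # weights only at the observed positions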

Member Author

rofinn commented Oct 4, 2021

NOTE: Nightly seems to be failing for an unrelated reason.

src/weights.jl Outdated
@@ -227,37 +231,55 @@ For each element `i` in `t` the weight value is computed as:
As this value approaches 0, the resulting weights will be almost equal,
while values closer to 1 will put greater weight on the tail elements of the vector.

# Keyword arguments

- `scale::Bool`: Whether are not to return the scaled
Member

Incomplete sentence. Also repeat the default value.

src/weights.jl Outdated
Comment on lines 214 to 215
If an integer `n` is provided, weights are generated for values from 1 to `n`
(equivalent to `t = 1:n`).
Member

Suggested change
If an integer `n` is provided, weights are generated for values from 1 to `n`
(equivalent to `t = 1:n`).
If an integer `n` is provided and `t` is omitted, weights are generated for values from 1 to `n`
(equivalent to `t = 1:n`).

Should also describe what's the default value for n when t is provided.

BTW, the order of t, n and λ is inconsistent across signatures, but I guess we can't do much about it without breaking backward compatibility?

Member Author

@rofinn rofinn Oct 21, 2021

Hmm, good point. I added the (t, λ, n) method more out of convenience, but if you think it would be more consistent I could merge it with the existing (t, λ) method which just calls extrema to get the default n value?
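i.e., something like this sketch of the default:

lo, hi = extrema(t)  # span of the observed times
n = hi - lo + 1      # default number of past observations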

Member Author

Okay, I believe the signatures are the same now, apart from the scaled kwarg.

Member

OK, thanks. The value of n when only t is provided still needs documentation though.

Member Author

Sure. FWIW, I'm starting to dislike the n and r convenience constructors. The range syntax is pretty terse already, and something.(indexin(t, r)) doesn't seem that verbose.

Member

Feel free to make a deprecation PR if you would prefer dropping this in the next breaking release. Though I'm not sure they really hurt, as long as they don't introduce inconsistencies or ambiguities.

Member Author

No, they don't hurt, and they are logical extensions of the primary method; it's just that this docstring feels a little verbose now :)

Member Author

rofinn commented Nov 8, 2021

FWIW, I think we can remove the breaking label for this PR. I'm pretty sure we're just introducing a deprecation now.

src/weights.jl Outdated
Comment on lines 214 to 218
The integer value of `n` represents the number of past observations to consider, which defaults to:

1. `maximum(t) - minimum(t) + 1` if only `t` is passed in and the elements are integers
2. `length(r)` if a superset range `r` is passed in
3. `n` is explicitly passed instead of `t` (equivalent to `t = 1:n`)
Member

This describes default values for both n and t, so I suggest changing the wording a bit. For example:

Suggested change
The integer value of `n` represents the number of past observations to consider, which defaults to:
1. `maximum(t) - minimum(t) + 1` if only `t` is passed in and the elements are integers
2. `length(r)` if a superset range `r` is passed in
3. `n` is explicitly passed instead of `t` (equivalent to `t = 1:n`)
The integer value `n` represents the number of past observations to consider.
`n` defaults to `maximum(t) - minimum(t) + 1` if only `t` is passed in
and the elements are integers, and to `length(r)` if a superset range `r`
is also passed in.
If `n` is explicitly passed instead of `t`, `t` defaults to `1:n`.

@nalimilan nalimilan removed the breaking A breaking change that needs a semver major release label Nov 8, 2021
Member

@nalimilan nalimilan left a comment


Thanks, let's merge this and then we can file an issue/PR making the breaking change for the next breaking release.

Member Author

rofinn commented Nov 18, 2021

Okay, if there aren't any other comments I can merge this tomorrow. Do we typically bump the release version in the PR here? I also appear to have merge permissions, but I don't know if I can register the change?

@nalimilan
Member

Feel free to bump the patch version in the PR. Registering it should also work.

@rofinn rofinn merged commit 1678fd1 into master Nov 19, 2021
@rofinn rofinn deleted the rf/normalize-eweights branch November 19, 2021 21:06