Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

findmin implementation #455

Closed
wants to merge 6 commits into from
Closed

Conversation

jverzani
Copy link
Member

@jverzani jverzani commented Jan 28, 2023

This is suggested in #454

It basically implements the findmin(f, domain) generic. (Well, not quite. With a function and iterable, it returns an index not a value. This returns the value)

@codecov
Copy link

codecov bot commented Jan 28, 2023

Codecov Report

Base: 83.23% // Head: 83.35% // Increases project coverage by +0.12% 🎉

Coverage data is based on head (d7b8633) compared to base (efeb466).
Patch coverage: 91.30% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #455      +/-   ##
==========================================
+ Coverage   83.23%   83.35%   +0.12%     
==========================================
  Files          22       22              
  Lines        2994     3040      +46     
==========================================
+ Hits         2492     2534      +42     
- Misses        502      506       +4     
Impacted Files Coverage Δ
src/common.jl 90.02% <91.30%> (+0.15%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@singularitti
Copy link
Contributor

singularitti commented Jan 28, 2023

May I propose a more unified API? We could also extend findmax. And what is more important, we could define a pair of functions islocalmin and islocalmax. And with the help of that, we could define

findall(islocalmin, poly)
findall(islocalmax, poly)
findfirst(islocalmin, poly)
findfirst(islocalmax, poly)
findlast(islocalmin, poly)
findlast(islocalmax, poly)
findnext(islocalmin, poly, i)
findnext(islocalmax, poly, i)
findprev(islocalmin, poly, i)
findprev(islocalmax, poly, i)

I am no expert in polynomials, but I could help to make a PR for such API extension.

@jverzani
Copy link
Member Author

Hmm, I'll have to mull this over, Certainly findmax should be added if findmin is. But putting the polynomial in the iterator position is not quite in line with the generic, as iteration over a polynomial in this package means iterate over the coefficients. Something like findall is like findall(predicate, iterator) so it would be better to have an iterator over local min, maybe? That would be easy enough, as that is just re-wrapping the first-derivative test code I started with..

@singularitti
Copy link
Contributor

Please give me a few minutes to try to propose a new API.

@jverzani
Copy link
Member Author

Sure

@singularitti
Copy link
Contributor

singularitti commented Jan 29, 2023

Ok, I found that findall, findnext, etc., are usually for arrays, and based on the reason you said above, we cannot use them. There is another way to do it: create a new iterable type (e.g., EachPoint) and iterate over it.

export iscriticalpoint, islocalmin, islocalmax, eachcriticalpoint, eachlocalmin, eachlocalmax

function iscriticalpoint end

function islocalmin end

function islocalmax end

struct EachPoint{F,T<:Polynomial}
    f::F
    poly::T
end

eachcriticalpoint(poly::Polynomial) = EachPoint(iscriticalpoint, poly)

eachlocalmin(poly::Polynomial) = EachPoint(islocalmin, poly)

eachlocalmax(poly::Polynomial) = EachPoint(islocalmax, poly)

function Base.iterate(iter::EachPoint{typeof(iscriticalpoint)})
    # If there are no critical points, return nothing
end
function Base.iterate(iter::EachPoint{typeof(iscriticalpoint)}, state)
    # First determine how many critical points there are
    if state > length(iter)
        return nothing
    else
        return x, state + 1  # `x` is the `state`th critical point
    end
end
function Base.iterate(iter::EachPoint{typeof(islocalmin)})
    # If there are no local minima, return nothing
end
function Base.iterate(iter::EachPoint{typeof(islocalmin)}, state)
    if state > length(iter)
        return nothing
    else
        return x, state + 1  # `x` is the `state`th local minimum
    end
end
function Base.iterate(iter::EachPoint{typeof(islocalmax)})
    # If there are no local maxima, return nothing
end
function Base.iterate(iter::EachPoint{typeof(islocalmax)}, state)
    if state > length(iter)
        return nothing
    else
        return x, state + 1  # `x` is the `state`th local maximum
    end
end

Base.length(iter::EachPoint) = 100  # Calculate how many critical points there are

I have not thought about it thoroughly, since I do not know how you calculate the critical points, local minima, and local maxima. I think you calculate all the critical points at once, then check if each one of them is a local minimum or maximum, is it? The Base.length for EachPoint{typeof(iscriticalpoint)} can be defined. But the lengths of EachPoint{typeof(islocalmin)} and EachPoint{typeof(islocalmax)} are not. They will be suitable if we calculate local minima/maxima one by one.

This idea is from IterTools.fieldvalues, and I have been using such API for a while.

@jverzani
Copy link
Member Author

An iterator over local minima might look like this:


# Iterator over local minima
struct LocalMinPoints{T}
    pts::Vector{T}
end

function LocalMinPoints(p::AbstractPolynomial{T}) where {T <: Real}

    p′ = derivative(p)

    
    critical_pts = sort(real.(filter(isreal, roots(p′))))
    n = length(critical_pts)
    n <= 1 && return LocalMinPoints(critical_pts)

    S = eltype(critical_pts)
    δ, ϵ = eps(S), Base.rtoldefault(S)

    r, rst... = critical_pts
    pts = [r]
    for i ∈ 1:(n-1)
        r,s = critical_pts[i], critical_pts[i+1]
        !isapprox(r, s; atol=δ, rtol=ϵ) && push!(pts, s)
        r = s
    end

    n, aₙ = degree(p′), p′[end]
    lsgn = ifelse((iseven(n) && aₙ < 0) || (isodd(n) && aₙ > 0), -one(S), one(S))

    lpts = similar(pts, 0)
    for i ∈ 1:(length(pts)-1)
        r,s = pts[i], pts[i+1]
        Δ = s/2 - r/2
        y = p′(r + Δ)
        rsgn = abs(y) <= δ + (r + Δ) * ϵ ? zero(S) : y < 0 ? -one(S) : one(S)
        lsgn * rsgn < 0 && push!(lpts, r)
        lsgn = rsgn
    end
    rsgn = aₙ > 0 ? one(S) : -one(S)
    lsgn * rsgn < 0 && push!(lpts, pts[end])

    LocalMinPoints(pts)
end

Base.iterate(l::LocalMinPoints) = iterate(l.pts)
Base.iterate(l::LocalMinPoints, state) = iterate(l.pts, state)
Base.length(l::LocalMinPoints) = length(l.pts)

The ugly part is the first derivative test along with worrying a bit about floating point issues. A LocalMax iterator would be easily defined using -p. (Maybe it would make sense to use the same type and different names for the constructor). But with this, you could implement some of what you want. e.g, findnext, etc.

I don't mind implementing these, but I'm a bit wary about exporting the constructor, as it isn't really a specialization of some generic function I know of.

@singularitti
Copy link
Contributor

singularitti commented Jan 29, 2023

Forgive me if I do not understand your code very well.
When you say "floating point issues", do you mean the inaccuracy of function roots?
And this part:

    r, rst... = critical_pts
    pts = [r]
    for i  1:(n-1)
        r,s = critical_pts[i], critical_pts[i+1]
        !isapprox(r, s; atol=δ, rtol=ϵ) && push!(pts, s)
        r = s
    end

Are you comparing two roots? And if they are too close to distinguish, just discard one of them?

And at this part, I am completely lost:

    lpts = similar(pts, 0)
    for i  1:(length(pts)-1)
        r,s = pts[i], pts[i+1]
        Δ = s/2 - r/2
        y = p′(r + Δ)
        rsgn = abs(y) <= δ + (r + Δ) * ϵ ? zero(S) : y < 0 ? -one(S) : one(S)
        lsgn * rsgn < 0 && push!(lpts, r)
        lsgn = rsgn
    end
    rsgn = aₙ > 0 ? one(S) : -one(S)
    lsgn * rsgn < 0 && push!(lpts, pts[end])

Are you testing each critical point in a small neighborhood with radius Δ? Can we just compute the second derivative of that polynomial at that critical point (as I did in my code) since all polynomials are continuous and smooth everywhere?

I do not think export the constructor is an issue here, as I wrote above, we could write a fake function eachcriticalpoint for EachPoint to imitate eachcol functions defined in Base.

@jverzani
Copy link
Member Author

Yeah, the issues with roots are many:

  • first, the second derivative is only determinative if f''(r) is not 0. But what does that mean? f''(r) is subject to calculation errors like f'''(r) * r * eps(), so testing for 0 needs a tolerance. Then if f''(r) is identified as 0, you can't remove it from consideration. This is why the first derivative test is better used. For that, you still need something to consider if f' is positive, negative, or zero. Again even if mathematically 0, the computed value can be as big as f''(r)*abs(r) * eps(), so there is a tolerance check.
  • But the bigger issue is likely multiplicities. Consider this simple case:
p = integrate(fromroots([pi, ℯ, pi]))

Great, critical points are pi and e, and since pi is double, it is not a local max. But taking roots gives:

julia> roots(derivative(p))
3-element Vector{ComplexF64}:
  2.718281828458972 + 0.0im
 3.1415926535898264 - 2.2207909901031392e-7im
 3.1415926535898264 + 2.2207909901031392e-7im

Oops, it won't be identified. We have the method multroot, which does get this right:

julia> Polynomials.Multroot.multroot(derivative(p))
(values = [2.7182818284590433, 3.141592653589794], multiplicities = [1, 2], κ = 56.26785197710239, ϵ = 1.3242378009143006e-16)

But that misses some other roots it should get when there aren't multiplicities, though perhaps that is best to identify the critical points (instead of checking nearby roots).

This is part of the reluctance to add this, as it may not yield correct answers in fairly simple cases.

@singularitti
Copy link
Contributor

singularitti commented Jan 29, 2023

Oh, I was thinking we may not want to get all the critical points in one go (because it takes time to do so), so we needed some kind of iterative steps to find them one by one (lazily). But is it true that we find them all at once? If so, there is no need for findall, findnext, findprev interface.

I like the current implementation very much. If I understand correctly, you use min_max to find all critical points in one go, then use their values to implement findmax, etc., right?

@singularitti
Copy link
Contributor

However, there is one potential performance issue: if we call findmin, findmax, etc., multiple times on the same polynomial, min_max will be called multiple times but only one of its results will be used. It could be better if we have a cache mechanism.

@jverzani
Copy link
Member Author

True, extrema returns both so one wouldn't necessarily call them multiple times, though no exported method returns both the values and the arguments where they happen for both (min_max's job). But is there a use case where one wouldn't just save the computed value?

@singularitti
Copy link
Contributor

But is there a use case where one wouldn't just save the computed value?

True. Maybe we could write something in the docs like: If you only want to calculate x, use argmin/argmax; to calculate minima and maxima, use extrema; to calculate both, use findmin/findmax, avoid using them multiple times.

@singularitti
Copy link
Contributor

singularitti commented Jan 29, 2023

Also, could we add two more methods like

minimum(poly, xs)
maximum(poly, xs)

which returns the smallest/largest result of calling the polynomial poly on each element of xs? Because extrema is just (minimum(poly, xs), maximum(poly, xs)). It seems wierd to have extrema but no minimum and maximum.

findmin(p::AbstractPolynomial{<:Real}, I=domain(p)) -> (f(x), x)
findmax(p::AbstractPolynomial{<:Real}, I=domain(p)) -> (f(x), x)
minimum(p::AbstractPolynomial{<:Real}, I=domain(p)) -> m
maximum(p::AbstractPolynomial{<:Real}, I=domain(p)) -> M
extrema(p::AbstractPolynomial{<:Real}, I=domain(p)) -> (m, M)
argmin(p::AbstractPolynomial{<:Real}, I=domain(p)) -> x
argmax(p::AbstractPolynomial{<:Real}, I=domain(p)) -> x

Then the API seems to be complete.

@singularitti
Copy link
Contributor

Also, a small suggestion to the naming of variables: In Julia's official doc, they use mn and mx as the minimum and maximum values:

extrema(f, itr; [init]) -> (mn, mx)

Could we use them also?

@singularitti
Copy link
Contributor

I have re-read the doc of extrema, minimum, and maximum, it seems that their API

extrema(f, itr) -> (mn, mx)
maximum(f, itr)
minimum(f, itr)

means calling f on an iterable itr, and comparing these results f.(itr), then find the maximum/minimum values. It is a little bit different from what findmax/argmax, etc., intend to do: they are evaluated on a domain.

Should we respect the intention of the API? I mean, should we define

function Base.maximum(poly::Polynomial, itr)
    ys = poly.(iter)
    return maximum(poly)
end

or keep the current implementation?

Sorry if I am being too picky.

@singularitti
Copy link
Contributor

OK, it seems that if I call maximum(f, itr), it will just fall back to Julia's default definition of maximum.

julia> maximum(p, 1:10)
4321

julia> @which maximum(p, 1:10)
maximum(f, a::AbstractArray; dims, kw...) in Base at reducedim.jl:995

julia> maximum(p, Tuple(1:10))
4321

julia> @which maximum(p, Tuple(1:10))
maximum(f, a; kw...) in Base at reduce.jl:698

@jverzani
Copy link
Member Author

Maybe if someone is going to redo these things, a critical_points function might be useful. I just broke that out, but below it can be generalized. With just that function, a user could get most of this functionality of this PR with just this:

using Polynomials

function critical_points(p, l=-Inf, r=Inf)
    q = Polynomials.ngcd(derivative(p), derivative(p,2)).v
    rs = real.(filter(isreal, roots(q)))
    pts = Iterators.flatten((l, r, rs))
    sort(collect(Iterators.filter(x -> l ≤ x ≤ r, pts)))
end

pts = critical_points(p)
findmin(p, pts) # (-13.016666666666666, 3)
argmin(p, pts)  # -0.9999999999999993

@jverzani
Copy link
Member Author

Thanks for all this. Still things to think about :)

@jverzani
Copy link
Member Author

Using a different approach with just a critical_points function

@jverzani jverzani closed this Jan 31, 2023
@jverzani jverzani deleted the issue_454 branch January 31, 2023 22:34
@jishnub
Copy link
Contributor

jishnub commented Feb 1, 2023

Just in case this is reconsidered, perhaps argmin might be more suitable than findmin, as it returns the value in the domain and not the index.

@jverzani
Copy link
Member Author

jverzani commented Feb 1, 2023

Yeah, this PR would have done argmin as well, but adding a feature to find the critical points seems like a better solution, then using the XXXmin/max(f, itr) interfaces just work for these tasks

While I have your attention, the code to find the critical points first reduces the polynomial with the numeric gcd to hopefully reduce the issue with multiple roots being identified as complex values. However, that code takes some real time to compile so the first use is really sluggish. I also added a small amount of pre-compile code to reduce that, but that in turn increases the loading time. It was around 300ms, it might now be about 500ms. I wonder if that is a big price to pay for something that isn't likely to be used by most. Any thoughts on that? I personally find it un-noticable, but it is a bit of a trade off.

@jishnub
Copy link
Contributor

jishnub commented Feb 5, 2023

Sorry I had missed this. I think adding precompile statements is a good idea, as load times should be resolved by that, now that invalidations have been dealt with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants