Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use trait RangeStepStyle for findfirst(isequal, ...), and adjust it for StepRange{Char,Int} #35279

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

mcabbott
Copy link
Contributor

@mcabbott mcabbott commented Mar 27, 2020

Fixes #35203, and closes #35278 which is a different solution. This findfirst method was added in #30778.

Edit, 2021: another PR #35278 special-cased some dates to avoid the fast method. New PR #40552 instead restricts it to only numbers. This PR uses an existing trait, which I believe correctly marks the distinction between regular and irregular dates, and marks unknown cases RangeStepIrregular, including many floating point ranges.

Earlier:

I'm not completely sure about things like

RangeStepStyle(::Type{<:StepRangeLen{<:Any,<:Any,<:Integer}}) = RangeStepRegular()

and how

Date(2019):Month(1):Date(2021) isa StepRange

fits with "The step between each element is constant" here. So someone who does understand should check that this isn't wrong.

Copy link
Sponsor Member

@timholy timholy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting issues here. "constant" is clearly in the eye of the beholder, and it depends on how things are being measured (a month is a constant step unless you are measuring time in days).

This seems correct, so despite a couple of comments below I'll give this my approval for merging. Parenthetically, for efficiency we should probably call divrem in that method rather than % followed by ÷.

base/array.jl Outdated
@@ -1798,6 +1801,9 @@ findfirst(p::Union{Fix2{typeof(isequal),T},Fix2{typeof(==),T}}, r::AbstractUnitR
first(r) <= p.x <= last(r) ? 1+Int(p.x - first(r)) : nothing

function findfirst(p::Union{Fix2{typeof(isequal),T},Fix2{typeof(==),T}}, r::StepRange{T,S}) where {T,S}
if RangeStepStyle(r) !== RangeStepRegular()
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth making this a trait-depot instead?

findfirst(p::Union{Fix2{typeof(isequal),T},Fix2{typeof(==),T}}, r::StepRange{T,S}) where {T,S} = _findfirst(p, RangeStepStyle(r), r)
_findfirst(p, ::RangeStepStyle, r) = invoke(findfirst, Tuple{Function,Any}, p, r)  # fallback
function _findfirst(p, ::RangeStepRegular, r)
    ...
end

That would make it possible to specialize at intermediate states.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks neater. Should I perhaps also change the simplest findfirst method to be findfirst_iter or something, so that it can be called without invoke?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if you like what I did here. I made:

_findfirst(p, ::RangeStepStyle, r) = _findfirst(p, r)

i.e. putting the trait before the variable it refers to, and breaking up the definition of the simplest findfirst(p, r) so that I can call its implementation without invoke.

r = 'a':'z'
r2 = 'α':2:'ω'
r3 = StepRangeLen('a',3,10)
@test Base.RangeStepStyle(r) === Base.RangeStepRegular()
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test that failed before you set the step for char ranges to be regular? If so it would be good to test that as well. (It's generally better to test outcomes than representations, though I agree there's room for testing both. But representations tend to be more fragile with respect to internal changes within Julia, and what we ultimately care about is the outcome.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On master char ranges get the fast findfirst method, but they won't once this checks the trait. But there's no explicit error or wrong result, it's just slower. Agree this isn't much of a test though!

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying here what I said at #35278 (comment):

I'm not sure the trait would really be appropriate here. The method will fail for any case in which conversion of first(r)-x to the step type isn't possible. AFAIK nothing requires that in the AbstractRange API. I'm not even sure types are required to implement % and div, which the method uses.

It seeds that a safer approach would be to:

  • restrict the optimized methods to Integer (as @JeffBezanson suggested initially)
  • add a special method for date and time types which does the right thing depending on the case (e.g. when the step is Day the efficient approach can be used)

@mcabbott
Copy link
Contributor Author

will fail for any case in which conversion of first(r)-x to the step type isn't possible.

Could this simply be a requirement for RangeStepRegular? Being able to unambiguously divide (say) time differences into the given steps seems to be precisely the concept that this trait is trying to capture.

@nalimilan
Copy link
Member

The reason for the introduction of the trait was to be able to hash ranges like arrays, so the only requirement was that the difference between two subsequent elements was always the same (according to isequal and hash). The optimized method in this PR requires more than that as it uses % and div.

Do we have examples of non-Integer types that would use it? Otherwise redefining the trait sounds overkill. And even if we want a trait, RangeStepRegular may not be the best name.

@mcabbott
Copy link
Contributor Author

Dates are the non-integer types in question. The trait seems to otherwise not be used for anything at all -- was this array hash code removed at some point?

I agree that if it didn't yet exist, then adding a trait to Base for this would indeed seem like overkill. But it does exist, and marks the border between well- and badly-behaved date ranges. The meaning is still that "difference between two subsequent elements was always the same", even if what methods ought to be implemented should be better documented.

That said, other solutions would be fine by me too.

@nalimilan
Copy link
Member

Yes, the hashing code was changed so the trait currently isn't used at all (it should probably have been removed at that time).

Sorry for missing that some time ranges are RangeStepRegular (that's not visible in the PR). I'm fine with reusing the trait for that then if that allows using the optimized method for these. Though please update the trait docstring to mention which functions have to be implemented, and add tests to check that findfirst works for these ranges (and not only for RangeStepIrregular ones). The new trait methods should also be tested. Thanks!

@mcabbott mcabbott changed the title Use trait RangeStepStyle for findfirst(isequal, ...), and adjust it for StepRange{Char,Int}` Use trait RangeStepStyle for findfirst(isequal, ...), and adjust it for StepRange{Char,Int} Apr 21, 2021
@rafaqz
Copy link
Contributor

rafaqz commented Apr 22, 2021

This looks good to me, thanks for getting it cleaned up @mcabbott

@mcabbott
Copy link
Contributor Author

Bump?

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay! Looks good, just a few minor comments.

base/array.jl Outdated Show resolved Hide resolved
base/array.jl Outdated
Comment on lines 1975 to 1979
_findfirst(p, RangeStepStyle(r), r)

_findfirst(p, ::RangeStepStyle, r) = _findfirst(p, r)

function _findfirst(p::Union{Fix2{typeof(isequal)},Fix2{typeof(==)}}, ::RangeStepRegular, r::StepRange{T,S}) where {T,S}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be simpler?

Suggested change
_findfirst(p, RangeStepStyle(r), r)
_findfirst(p, ::RangeStepStyle, r) = _findfirst(p, r)
function _findfirst(p::Union{Fix2{typeof(isequal)},Fix2{typeof(==)}}, ::RangeStepRegular, r::StepRange{T,S}) where {T,S}
RangeStepStyle(r) isa RangeStepRegular || return _findfirst(p, r)

Also maybe use a more explicit name for _findfirst, like _findfirst_general?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is shorter. I guess I was trying to follow how other traits are treated, but now I have no idea what examples I followed. Maybe there is no need to leave this open to later definition of more methods, isequal, ::StepRange is already quite narrow.

Will see if CI agrees...

@nalimilan nalimilan closed this Nov 28, 2021
@nalimilan nalimilan reopened this Nov 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Errors from findfirst(..., ::StepRange{Date,Year})
4 participants