Rework annotation ordering/optimisations #54289

tecosaur · 2024-04-28T04:50:19Z

There are some annoyances/discrepancies with annotation handling that I originally tried to fix in #53794, #53800 and #53801.

@LilithHafner was kind enough to have a look at these, and after a long chat we came to the conclusion that this isn't the right approach to take, and that the order in which annotations are applied should be given primacy over annotation ranges.

This PR supersedes the aforementioned PRs, and should make the way annotations are ordered more sensible.

See the commit messages for more details.

tecosaur · 2024-04-28T08:57:23Z

Ah, tests are failing because I missed a use of Base.annotatedstring_optimize! in StyledStrings' tests. I'll fix that and re-bump the stdlib shortly.

base/strings/annotated.jl

LilithHafner · 2024-04-29T13:55:04Z

Which parts of annotations are semantically meaningful/visible?

IIUC, the annotations

[(1:1, :a => 1), (2:2, :a => 1)] and [(1:2, :a => 1)] are semantically equivalent
[(1:1, :a => 1), (2:2, :a => 1)] and [(2:2, :a => 1), (1:1, :a => 1)] are semantically equivalent
[(1:1, :a => 1), (1:1, :a => 2)] and [(1:1, :a => 2), (1:1, :a => 1)] are semantically different
[(1:1, :a => 1), (1:1, :b => 2)] and [(1:1, :a => 2), (1:1, :b => 1)] are semantically different

This should be documented. Specifically, there should be a simple specification that describes what is and is not semantically visible, from which one can derive the above (or a different result).

Ideally everything returned by annotations(::AnnotatedString) would be semantically meaningful, that's an easy semantic to document. (and https://www.hyrumslaw.com/)

(I can't review the correctness of an optimization without knowing what "correct" means)
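One candidate specification that reproduces the four cases listed above can be sketched as follows. It assumes that, for a given character, a later annotation overrides an earlier one with the same label; that rule is an assumption made for illustration and is not confirmed anywhere in this thread. The names `effective` and `equivalent` are hypothetical, and the annotations are modelled as plain tuples rather than as a real AnnotatedString.

```julia
# Hypothetical model: annotations are (range, label => value) tuples, applied in order.
# Assumption: on each character, a later annotation with the same label wins.
function effective(annots, len::Int)
    view = [Dict{Symbol, Any}() for _ in 1:len]
    for (region, (label, value)) in annots
        for i in region
            view[i][label] = value
        end
    end
    return view
end

# Two annotation lists are "semantically equivalent" under this model when
# every character ends up with the same label => value mapping.
equivalent(a, b, len::Int) = effective(a, len) == effective(b, len)

equivalent([(1:1, :a => 1), (2:2, :a => 1)], [(1:2, :a => 1)], 2)                  # split vs merged range
equivalent([(1:1, :a => 1), (1:1, :a => 2)], [(1:1, :a => 2), (1:1, :a => 1)], 1)  # same label, swapped order
```

Under this model only the per-character result is visible, so the exact ranges returned by annotations(::AnnotatedString) would not all be semantically meaningful; a stricter specification would compare the lists themselves.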

tecosaur · 2024-04-29T13:57:59Z

Ah yep, we did also discuss a need for more info in the docstrings. I'll see about adding some of the elaboration needed.

LilithHafner

    for (region, annot) in str.annotations
        if region == fullregion
            push!(annotations, (firstindex(unannot):lastindex(unannot), annot))
        end
    end
    for offset in 0:len:(r-1)*len
        for (region, annot) in str.annotations
            if region != fullregion
                push!(annotations, (region .+ offset, annot))
            end
        end
    end

This is buggy in the new semantic because it re-orders annotations. Not necessarily blocking this PR, but an issue.
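The re-ordering can be seen in a small model of the two loops quoted above. This is a sketch on plain tuples, not the actual Base code: `repeat_annots_old` is a hypothetical name, `len` is the length of the original string, and `r` is the repeat count.

```julia
# Model of the quoted loops: full-region annotations are emitted first,
# then every other annotation once per repetition.
function repeat_annots_old(annots, len::Int, r::Int)
    fullregion = 1:len
    out = Tuple{UnitRange{Int}, Pair{Symbol, Int}}[]
    for (region, annot) in annots
        region == fullregion && push!(out, (1:len*r, annot))
    end
    for offset in 0:len:(r-1)*len
        for (region, annot) in annots
            region != fullregion && push!(out, (region .+ offset, annot))
        end
    end
    return out
end

# In the input, (1:3, :a => 2) is applied after (1:2, :a => 1).
annots = [(1:2, :a => 1), (1:3, :a => 2)]
repeated = repeat_annots_old(annots, 3, 2)
# In `repeated`, the full-region annotation now comes first.
```

If later annotations take precedence, characters 1 and 2 carry :a => 2 in the input but :a => 1 after repeating, because the full-region annotation has been moved to the front.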

LilithHafner · 2024-04-29T18:15:41Z

base/strings/annotated.jl

+    end
+    # Insert any extra entries in the appropriate position
+    for (offset, (i, entry)) in enumerate(extras)
+        insert!(annotations, i + offset, entry)


This is O(n^2), but we can revise this function for perf later.

tecosaur

Yep, and I don't think this should be in the "hot path" of any common use-cases 🤞.
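For reference, a linear-time alternative to repeated insert! could look roughly like the sketch below. It is not what the PR does. It assumes that `extras` holds (i, entry) pairs sorted by `i`, and that each entry should end up directly after the original element at index `i`, which is what the quoted loop produces for sorted input. The name `merge_extras` is hypothetical.

```julia
# Build the result in a single pass instead of shifting the tail of the
# vector on every insert!. Assumes `extras` is sorted by its index field.
function merge_extras(annotations::AbstractVector, extras)
    out = similar(annotations, 0)
    sizehint!(out, length(annotations) + length(extras))
    k = 1
    for j in 0:length(annotations)
        j > 0 && push!(out, annotations[j])
        # Emit every extra entry that belongs directly after original element j.
        while k <= length(extras) && first(extras[k]) == j
            push!(out, last(extras[k]))
            k += 1
        end
    end
    return out
end

# The quoted approach, for comparison (mutates a copy of the input).
function insert_extras(annotations::AbstractVector, extras)
    out = copy(annotations)
    for (offset, (i, entry)) in enumerate(extras)
        insert!(out, i + offset, entry)
    end
    return out
end
```

Both functions return the same vector for sorted `extras`; the first does O(n + m) work, while the second can shift up to n elements per insertion.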

base/strings/annotated.jl

LilithHafner · 2024-04-29T23:09:59Z

base/strings/annotated.jl

+            first(first(annot)) == 1 || continue
+            if last(annot) == last(last(io.annotations))
+                valid_run = true
+                for runlen in 1:i


This is also O(n^2) when it could be O(n), but perf can wait.

And it will be O(n) in non-pathological cases.

tecosaur

Yea, this is written with the non-pathological case in mind, and if you consider the case where few-annotation insertions are repeatedly made, individual insertions should be pretty much O(1), with a constant factor based on the average "checked annotation depth" / average number of annotations in each insertion.

base/strings/annotated.jl

LilithHafner

I think this should be merged, and follow-up PRs should address the performance of this PR's implementations and the now-invalid repeat function.

tecosaur · 2024-04-30T02:11:44Z

I've just adjusted repeat to be consistent with the new semantic,

fullregion = firstindex(str):lastindex(str)
if allequal(first, str.annotations) && first(first(str.annotations)) == fullregion
    newfullregion = firstindex(unannot):lastindex(unannot)
    for (_, annot) in str.annotations
        push!(annotations, (newfullregion, annot))
    end
else
    for offset in 0:len:(r-1)*len
        for (region, annot) in str.annotations
            push!(annotations, (region .+ offset, annot))
        end
    end
end

This change has been folded into 2656884453 * Remove strong ordering of annotation ranges
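The effect of the adjusted logic can be checked with a small model on plain tuples. This is a sketch rather than the Base implementation: `repeat_annots` is a hypothetical name, `len` and `r` stand for the original string length and the repeat count, and `all` with an emptiness check is swapped in for the `allequal(first, ...)` test so that the sketch does not depend on a recent Julia version.

```julia
# If every annotation covers the whole string, stretch each one over the
# repeated string; otherwise copy all annotations, in their original order,
# once per repetition.
function repeat_annots(annots, len::Int, r::Int)
    fullregion = 1:len
    out = Tuple{UnitRange{Int}, Pair{Symbol, Int}}[]
    if !isempty(annots) && all(a -> first(a) == fullregion, annots)
        for (_, annot) in annots
            push!(out, (1:len*r, annot))
        end
    else
        for offset in 0:len:(r-1)*len, (region, annot) in annots
            push!(out, (region .+ offset, annot))
        end
    end
    return out
end

mixed = repeat_annots([(1:2, :a => 1), (1:3, :a => 2)], 3, 2)
full  = repeat_annots([(1:3, :a => 1), (1:3, :b => 2)], 3, 2)
```

In the mixed case the relative order of the two annotations is kept within each repetition, so the full-region annotation is no longer moved ahead of the partial one.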

tecosaur · 2024-04-30T11:13:07Z

Now that #54308 has been merged, we need to resolve the merge conflict and adjust the test introduced there.

This is in preparation for changes to the way annotation ordering is handled in Base.

After a long chat with Lilith Hafner, we've come to the conclusion that the range-based ordering applied to annotations, while nice for making some otherwise O(n) code O(log n) and O(n^2) code O(n), is actually assuming too much about how annotations are used and interact with each other. Removing all assumptions about ordering, and giving annotation order primacy, seems like the most sensible thing to do, even if it makes a few bits of the code less algorithmically "nice". As a consequence, we also get rid of annotatedstring_optimize!. Specific producers/consumers of annotated text will know what assumptions can be made to compress/optimise the annotations used, and are thus best suited to do so themselves. The one exception to this is probably when writing to an AnnotatedIOBuffer; here, adding a specific optimisation to _insert_annotations! probably makes sense, and will be explored soon.

This is a rather important optimisation, since it prevents the annotation blow-up that can result from, say, writing to an AnnotatedIOBuffer char-by-char. Originally I was just going to pass the AnnotatedString produced when reading the AnnotatedIOBuffer through annotatedstring_optimize!, but now that's been removed, this seems like the best path forward (it's also actually a better approach than applying annotatedstring_optimize!, just hard to justify when that code already existed).
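A much-simplified sketch of the idea, on plain tuples: when a newly written region directly follows the last annotation and carries the same annotation, extend that entry instead of appending a new one. This is not the _insert_annotations! implementation, which checks runs of several annotations; `push_annotation!` is a hypothetical helper that only ever looks at the final entry.

```julia
# Append (region, annot), or grow the last entry when it has the same
# annotation and ends immediately before `region`. Only the last entry is
# touched, so the order of the existing annotations is unchanged.
function push_annotation!(annots::Vector, region::UnitRange{Int}, annot)
    if !isempty(annots)
        (lastregion, lastannot) = annots[end]
        if lastannot == annot && last(lastregion) + 1 == first(region)
            annots[end] = (first(lastregion):last(region), annot)
            return annots
        end
    end
    push!(annots, (region, annot))
    return annots
end

# Writing three characters one at a time with the same annotation yields a
# single entry; a different annotation then starts a new one.
annots = Tuple{UnitRange{Int}, Pair{Symbol, Symbol}}[]
for i in 1:3
    push_annotation!(annots, i:i, :face => :bold)
end
push_annotation!(annots, 4:4, :face => :italic)
```

Without the merge step the same writes would leave four single-character entries, which is the blow-up described above.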

It's important to specify the way that annotations relate to the characters of the underlying string and each other. Along the way, it's also worth explaining the behaviour of the internal functions _clear_annotations_in_region! and _insert_annotations!.

tecosaur · 2024-04-30T11:15:55Z

After CI, I think this should be good to merge if that also sounds good to @LilithHafner 🙂.

LilithHafner · 2024-04-30T12:14:02Z

1bed84f LGTM. I have not reviewed https://github.com/JuliaLang/julia/compare/1bed84f4ae6fe6fc9bb39612e274fd1630a51349..4c18472b3e56998f2784ecd5d9deea4cf5371850, but am open to suggestions for how I could do that without re-reviewing this PR from scratch.

tecosaur · 2024-04-30T12:21:25Z

The only change made here in the diff you link is

    @test chopprefix(sprint(show, str), "Base.") ==
-        "AnnotatedString{String}(\"some string\", [(1:4, :thing => 0x01), (1:11, :all => 0x03), (6:11, :other => 0x02)])"
+        "AnnotatedString{String}(\"some string\", [(1:4, :thing => 0x01), (6:11, :other => 0x02), (1:11, :all => 0x03)])"

i.e. correctly resolving the change introduced by #54308.

HTH.
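The two expected strings in that diff differ only in the order of the annotations. As a sketch, the old order matches sorting by range and the new order matches the order of the list itself. The sort key used here, (first, last) of each range, is an assumption chosen because it reproduces the removed line; it is not taken from the Base source.

```julia
# Annotations in the order shown on the added line of the diff.
annots = [(1:4, :thing => 0x01), (6:11, :other => 0x02), (1:11, :all => 0x03)]

# Assumed range-based ordering: sort by (start, stop) of each region.
# `sort` returns a new vector and leaves `annots` untouched.
range_sorted = sort(annots, by = a -> (first(a[1]), last(a[1])))
```

Here `range_sorted` places (1:11, :all => 0x03) second, as on the removed line, while `annots` keeps it last, as on the added line.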

LilithHafner · 2024-04-30T13:38:20Z

Okay, the diff in your comment also LGTM.

IanButterworth · 2024-05-01T14:26:54Z

@tecosaur I know you prefer not to squash, but for things that will get backported, not squashing introduces unnecessary stumbling blocks in that automated process that would be better to avoid.

tecosaur · 2024-05-01T14:44:16Z

Thanks for mentioning that Ian, I do like to preserve more granular information when it seems sensible, but I wasn't aware that complicated the backporting process. I'll keep that in mind with the few remaining PRs I'm hoping to get backported 🙂.

(As an aside, is the automated backporting doing something other than cherry picking? Or is the hassle having to supply -m?)

IanButterworth · 2024-05-01T14:48:21Z

Yeah the automatic script doesn't handle adding -m. If you wanted to help with adding that it's here https://github.com/KristofferC/Backporter/blob/master/backporter.jl

Improvements to the consistency and semantics of AnnotatedStrings, mainly around the ordering of annotations.

Backported PRs: - [x] #53665 - [x] #53976 - [x] #54005 - [x] #54010 - [x] #54069 - [x] #53750 - [x] #53984 - [x] #54102 - [x] #54070 - [x] #54013 - [x] #53941 - [x] #54137 - [x] #54129 - [x] #54153 - [x] #54143 - [x] #54151 - [x] #54213 - [x] #54222 - [x] #54233 - [x] #54255 - [x] #54259 - [x] #54251 - [x] #54276 - [x] #54248 - [x] #54308 - [x] #54302 - [x] #54243 - [x] #54350 - [x] #54331 - [x] #53509 - [x] #54335 - [x] #54239 - [x] #54288 - [x] #54067 - [x] #53715 - [x] #54289 - [x] #53815 - [x] #54130 - [x] #54428 - [x] #54332 - [x] #53826 - [x] #54465 - [x] #54514 - [x] #54499 - [x] #54210 - [x] #54359

Non-merged PRs with backport label: - [ ] #54471 - [ ] #54457 - [ ] #54323 - [ ] #54322 - [ ] #54191 - [ ] #53957 - [ ] #53882 - [ ] #53707 - [ ] #53452 - [ ] #53402 - [ ] #53286 - [ ] #52694 - [ ] #51479

tecosaur added the domain:strings ("Strings!") and backport 1.11 ("Change should be backported to release-1.11") labels Apr 28, 2024

tecosaur requested a review from LilithHafner April 28, 2024 04:50

tecosaur force-pushed the rework-annotation-optimisation branch from a1539b8 to b4a4dec Compare April 28, 2024 16:34

bvdmitri reviewed Apr 28, 2024

tecosaur force-pushed the rework-annotation-optimisation branch from 1d26197 to b4a4dec Compare April 28, 2024 17:37

tecosaur added the kind:don't squash ("Don't squash merge") label Apr 29, 2024

tecosaur mentioned this pull request Apr 29, 2024

Register StyledStrings.jl 1.0 in general JuliaLang/StyledStrings.jl#5

Closed

tecosaur force-pushed the rework-annotation-optimisation branch from b4a4dec to f869c2f Compare April 29, 2024 15:35

LilithHafner reviewed Apr 29, 2024

LilithHafner reviewed Apr 29, 2024

LilithHafner approved these changes Apr 29, 2024

tecosaur mentioned this pull request Apr 30, 2024

Implement eval-able AnnotatedString 2-arg show #54308

Merged

tecosaur force-pushed the rework-annotation-optimisation branch from f869c2f to 1bed84f Compare April 30, 2024 02:11

tecosaur added 4 commits April 30, 2024 19:13

Bump StyledStrings

a294d3d

This is in preparation for changes to the way annotation ordering is handled in Base.

tecosaur force-pushed the rework-annotation-optimisation branch from 1bed84f to 4c18472 Compare April 30, 2024 11:15

tecosaur merged commit fe554b7 into JuliaLang:master Apr 30, 2024
7 checks passed

tecosaur deleted the rework-annotation-optimisation branch April 30, 2024 15:14

tecosaur mentioned this pull request Apr 30, 2024

Backports for 1.11.0-beta2 #54112

Merged

59 tasks

tecosaur added a commit that referenced this pull request May 6, 2024

Rework annotation ordering/optimisations (#54289)

05e685e

Improvements to the consistency and semantics of AnnotatedStrings, mainly around the ordering of annotations.

KristofferC removed the backport 1.11 ("Change should be backported to release-1.11") label May 28, 2024
