The highlighting code should be optimised #5326

nad · 2021-04-17T09:03:58Z

Elsewhere I wrote the following:

[T]he code that finds the overlapping part could be slow if it is used incorrectly:

agda/src/full/Agda/Interaction/Highlighting/Precise.hs

Lines 451 to 468 in c880f86

-- | @splitAtC p f@ splits the compressed file @f@ into @(f1, f2)@,

-- where all the positions in @f1@ are @< p@, and all the positions

-- in @f2@ are @>= p@.

splitAtC :: Int -> CompressedFile ->

(CompressedFile, CompressedFile)

splitAtC p f = (CompressedFile f1, CompressedFile f2)

where

(f1, f2) = split $ ranges f

split [] = ([], [])

split (rx@(r,x) : f)

| p <= from r = ([], rx:f)

| to r <= p = (rx:f1, f2)

| otherwise = ([ (toP, x) ], (fromP, x) : f)

where (f1, f2) = split f

toP = Range { from = from r, to = p }

fromP = Range { from = p, to = to r }

However, performance should be OK if we always remove the first part. (If necessary one could presumably optimise the code by using a different data structure.)

The splitC function is called from generateAndPrintSyntaxInfo:

agda/src/full/Agda/Interaction/Highlighting/Generate.hs

Lines 163 to 164 in aea2c53

    
           (prevTokens, (curTokens, postTokens)) <- 
        
             second (splitAtC to) . splitAtC from <$> useTC stTokens

I added code that counted the length of prevTokens, and this indicates that the worst-case time complexity is at least quadratic in the size of the input file. For instance, consider the following code:

_ : Set₁
_ = Set

_ : Set₁
_ = Set

_ : Set₁
_ = Set

_ : Set₁
_ = Set

_ : Set₁
_ = Set

_ : Set₁
_ = Set

For this code we get the following list of lengths: [0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 0]. For "real" code the numbers can exceed 1000.

I suggest that we define CompressedFile using a data structure that allows us to implement splitC efficiently. One option might be to use a map with Ranges as keys, along with the function splitLookup.

The text was updated successfully, but these errors were encountered:

nad · 2021-04-17T10:04:04Z

Perhaps the new representation can also be used for File, in which case CompressedFile could be removed.

andreasabel · 2021-04-19T07:00:12Z

5 years ago Guilhem Moulin was working on this, but this wasn't merged. Maybe Guilhem still has the patches. They were on a separate repo.
But if you are fixing this now, @nad, there may be no need to look for the old solution.

I had also looked around for better representations. There are some libraries for interval trees on hackage, but I didn't find one back than that implemented what I wanted.

The new code can be considerably faster for files with very many mutual blocks.

nad · 2021-04-21T19:01:32Z

I've implemented some different data structures and run some benchmarks (make std-lib-test). Some observations and comments:

I made the changes and ran the benchmarks on top of (roughly) commit f86f704 plus the change in commit 92c6f64.
Times vary quite a lot from run to run.
Quite a bit of the time spent on computing and serialising highlighting data is not attributed to highlighting when -vprofile:7 is used. I turned off highlighting by editing the code, and limited benchmarking suggests that, before my changes, computation and serialisation of highlighting data accounts for about 4-11% of the time.
Given the lack of reliability of the benchmarks I am not entirely certain that the new implementation is faster than the previous one for the standard library.
The problem I mentioned in the OP does not seem to be very important for the standard library. However, I constructed a file with 8580 occurrences of the following piece of code:
```
_ : Set₁
_ = Set
```
That file took about four times longer to type-check without my changes.
I've left several data structures in the code. (In fact, they are all used.) One data structure is used for the first step, where highlighting is gathered, and another for the second step, where it is modified. Some of the operations used for the second step are only implemented for one data structure (RangeMap), but several data structures can be used for the second operation by changing the type synonym HighlightingInfoBuilder.
I did not find a Haskell implementation of an interval map type with only non-overlapping ranges and the kinds of operations I was looking for, so I implemented my own (RangeMap) using Data.Map.Strict.
RangeMap could presumably be optimised further, but recompiling Agda and running benchmarks takes a lot of time.
It might make sense to move some code to one or more modules under Agda.Utils.

andreasabel · 2021-04-22T07:06:25Z

Excellent!

* It might make sense to move some code to one or more modules under `Agda.Utils`.

Yes, I think it is good to separate the data structure (RangeMap) from its application (map to Aspects).

Please open a PR so we can have a structured discussion with the conveniences offered by the PR functionality.

The new code can be considerably faster for files with very many mutual blocks. The highlighting of some symbols have changed. See for instance test/interaction/Issue3020.agda, where the definition B is now highlighted as a function instead of as a bound variable. The reason is presumably that mappend is not associative for the type Aspects.

nad · 2021-04-23T07:21:01Z

Another comment: The append function (<>) is not associative for Aspects. I think this is the reason for some changes to the highlighting behaviour. @andreasabel, you made append non-associative (66a82e6). I'm not convinced that this is a good idea.

nad · 2021-04-23T11:30:17Z

I think this is the reason for some changes to the highlighting behaviour.

No, this was due to a mistake I made.

The new code can be considerably faster for files with very many mutual blocks.

nad · 2021-04-24T13:12:39Z

I opened a separate issue related to how append is not associative (#5347).

The code passes all the tests now. It took me a few iterations to get the code to work for old versions of GHC. I also changed the implementation. Now the conversion from HighlightingInfoBuilder to HighlightingInfo no longer goes via PositionMap. (The difference in performance seems to be small for the standard library.)

I assume that it is possible to create a more efficient implementation of RangeMap by not reusing the implementation of Data.Map.Strict, but instead having a dedicated tree type with direct support for ranges. For instance, even in the case of no overlap the current implementation of insert uses four tree operations (unless the range is empty), which I guess translates to four tree traversals. It should be possible to traverse the tree just a single time.

The new code can be considerably faster for files with very many mutual blocks.

nad added type: bug Issues and pull requests about actual bugs performance Slow type checking, interaction, compilation or execution of Agda programs ux: highlighting Issues relating to syntax highlighting help wanted labels Apr 17, 2021

nad added this to the 2.6.3 milestone Apr 17, 2021

nad mentioned this issue Apr 17, 2021

Missing highlighting when interactive highlighting is used #4954

Open

nad self-assigned this Apr 18, 2021

nad added a commit that referenced this issue Apr 21, 2021

[ #5326 ] Changed the representations of highlighting information.

5b57682

The new code can be considerably faster for files with very many mutual blocks.

nad added a commit that referenced this issue Apr 22, 2021

[ #5326 ] Added Agda.Utils.RangeMap.

774ad1e

nad removed the help wanted label Apr 22, 2021

nad linked a pull request Apr 22, 2021 that will close this issue

#5326 #5346

Closed

nad added a commit that referenced this issue Apr 23, 2021

[ #5326 ] Added Agda.Utils.RangeMap.

9f63020

nad added a commit that referenced this issue Apr 23, 2021

[ #5326 ] Added Agda.Utils.RangeMap.

abf7896

nad added a commit that referenced this issue Apr 23, 2021

[ #5326 ] Added Agda.Utils.RangeMap.

4be9e53

nad added a commit that referenced this issue Apr 23, 2021

[ #5326 ] Added Agda.Utils.RangeMap.

c6e826a

nad added a commit that referenced this issue Apr 23, 2021

[ #5326 ] Added Agda.Utils.RangeMap.

8cbc76d

nad added a commit that referenced this issue Apr 23, 2021

[ #5326 ] Changed the representations of highlighting information.

577f568

The new code can be considerably faster for files with very many mutual blocks.

nad added a commit that referenced this issue Apr 23, 2021

[ #5326 ] Changed the representations of highlighting information.

1750f5d

The new code can be considerably faster for files with very many mutual blocks.

andreasabel added type: enhancement Issues and pull requests about possible improvements and removed type: bug Issues and pull requests about actual bugs labels Apr 25, 2021

nad added a commit that referenced this issue May 8, 2021

[ #5326 ] Changed the representations of highlighting information.

863bcdb

The new code can be considerably faster for files with very many mutual blocks.

nad removed this from the 2.6.3 milestone May 8, 2021

nad added this to the 2.6.2 milestone May 8, 2021

nad closed this as completed May 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The highlighting code should be optimised #5326

The highlighting code should be optimised #5326

nad commented Apr 17, 2021

nad commented Apr 17, 2021

andreasabel commented Apr 19, 2021

nad commented Apr 21, 2021

andreasabel commented Apr 22, 2021

nad commented Apr 23, 2021

nad commented Apr 23, 2021

nad commented Apr 24, 2021

The highlighting code should be optimised #5326

The highlighting code should be optimised #5326

Comments

nad commented Apr 17, 2021

nad commented Apr 17, 2021

andreasabel commented Apr 19, 2021

nad commented Apr 21, 2021

andreasabel commented Apr 22, 2021

nad commented Apr 23, 2021

nad commented Apr 23, 2021

nad commented Apr 24, 2021