Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The highlighting code should be optimised #5326

Closed
nad opened this issue Apr 17, 2021 · 7 comments
Closed

The highlighting code should be optimised #5326

nad opened this issue Apr 17, 2021 · 7 comments
Assignees
Labels
performance Slow type checking, interaction, compilation or execution of Agda programs type: enhancement Issues and pull requests about possible improvements ux: highlighting Issues relating to syntax highlighting
Milestone

Comments

@nad
Copy link
Contributor

nad commented Apr 17, 2021

Elsewhere I wrote the following:

[T]he code that finds the overlapping part could be slow if it is used incorrectly:

-- | @splitAtC p f@ splits the compressed file @f@ into @(f1, f2)@,
-- where all the positions in @f1@ are @< p@, and all the positions
-- in @f2@ are @>= p@.
splitAtC :: Int -> CompressedFile ->
(CompressedFile, CompressedFile)
splitAtC p f = (CompressedFile f1, CompressedFile f2)
where
(f1, f2) = split $ ranges f
split [] = ([], [])
split (rx@(r,x) : f)
| p <= from r = ([], rx:f)
| to r <= p = (rx:f1, f2)
| otherwise = ([ (toP, x) ], (fromP, x) : f)
where (f1, f2) = split f
toP = Range { from = from r, to = p }
fromP = Range { from = p, to = to r }

However, performance should be OK if we always remove the first part. (If necessary one could presumably optimise the code by using a different data structure.)

The splitC function is called from generateAndPrintSyntaxInfo:

(prevTokens, (curTokens, postTokens)) <-
second (splitAtC to) . splitAtC from <$> useTC stTokens

I added code that counted the length of prevTokens, and this indicates that the worst-case time complexity is at least quadratic in the size of the input file. For instance, consider the following code:

_ : Set₁
_ = Set

_ : Set₁
_ = Set

_ : Set₁
_ = Set

_ : Set₁
_ = Set

_ : Set₁
_ = Set

_ : Set₁
_ = Set

For this code we get the following list of lengths: [0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 0]. For "real" code the numbers can exceed 1000.

I suggest that we define CompressedFile using a data structure that allows us to implement splitC efficiently. One option might be to use a map with Ranges as keys, along with the function splitLookup.

@nad nad added type: bug Issues and pull requests about actual bugs performance Slow type checking, interaction, compilation or execution of Agda programs ux: highlighting Issues relating to syntax highlighting help wanted labels Apr 17, 2021
@nad nad added this to the 2.6.3 milestone Apr 17, 2021
@nad
Copy link
Contributor Author

nad commented Apr 17, 2021

Perhaps the new representation can also be used for File, in which case CompressedFile could be removed.

@nad nad self-assigned this Apr 18, 2021
@andreasabel
Copy link
Member

5 years ago Guilhem Moulin was working on this, but this wasn't merged. Maybe Guilhem still has the patches. They were on a separate repo.
But if you are fixing this now, @nad, there may be no need to look for the old solution.

I had also looked around for better representations. There are some libraries for interval trees on hackage, but I didn't find one back than that implemented what I wanted.

nad added a commit that referenced this issue Apr 21, 2021
The new code can be considerably faster for files with very many
mutual blocks.
@nad
Copy link
Contributor Author

nad commented Apr 21, 2021

I've implemented some different data structures and run some benchmarks (make std-lib-test). Some observations and comments:

  • I made the changes and ran the benchmarks on top of (roughly) commit f86f704 plus the change in commit 92c6f64.
  • Times vary quite a lot from run to run.
  • Quite a bit of the time spent on computing and serialising highlighting data is not attributed to highlighting when -vprofile:7 is used. I turned off highlighting by editing the code, and limited benchmarking suggests that, before my changes, computation and serialisation of highlighting data accounts for about 4-11% of the time.
  • Given the lack of reliability of the benchmarks I am not entirely certain that the new implementation is faster than the previous one for the standard library.
  • The problem I mentioned in the OP does not seem to be very important for the standard library. However, I constructed a file with 8580 occurrences of the following piece of code:
    _ : Set₁
    _ = Set
    That file took about four times longer to type-check without my changes.
  • I've left several data structures in the code. (In fact, they are all used.) One data structure is used for the first step, where highlighting is gathered, and another for the second step, where it is modified. Some of the operations used for the second step are only implemented for one data structure (RangeMap), but several data structures can be used for the second operation by changing the type synonym HighlightingInfoBuilder.
  • I did not find a Haskell implementation of an interval map type with only non-overlapping ranges and the kinds of operations I was looking for, so I implemented my own (RangeMap) using Data.Map.Strict.
  • RangeMap could presumably be optimised further, but recompiling Agda and running benchmarks takes a lot of time.
  • It might make sense to move some code to one or more modules under Agda.Utils.

@andreasabel
Copy link
Member

Excellent!

* It might make sense to move some code to one or more modules under `Agda.Utils`.

Yes, I think it is good to separate the data structure (RangeMap) from its application (map to Aspects).

Please open a PR so we can have a structured discussion with the conveniences offered by the PR functionality.

nad added a commit that referenced this issue Apr 22, 2021
The new code can be considerably faster for files with very many
mutual blocks.

The highlighting of some symbols have changed. See for instance
test/interaction/Issue3020.agda, where the definition B is now
highlighted as a function instead of as a bound variable. The reason
is presumably that mappend is not associative for the type Aspects.
nad added a commit that referenced this issue Apr 22, 2021
@nad nad removed the help wanted label Apr 22, 2021
@nad nad linked a pull request Apr 22, 2021 that will close this issue
nad added a commit that referenced this issue Apr 23, 2021
@nad
Copy link
Contributor Author

nad commented Apr 23, 2021

Another comment: The append function (<>) is not associative for Aspects. I think this is the reason for some changes to the highlighting behaviour. @andreasabel, you made append non-associative (66a82e6). I'm not convinced that this is a good idea.

nad added a commit that referenced this issue Apr 23, 2021
nad added a commit that referenced this issue Apr 23, 2021
nad added a commit that referenced this issue Apr 23, 2021
nad added a commit that referenced this issue Apr 23, 2021
@nad
Copy link
Contributor Author

nad commented Apr 23, 2021

I think this is the reason for some changes to the highlighting behaviour.

No, this was due to a mistake I made.

nad added a commit that referenced this issue Apr 23, 2021
The new code can be considerably faster for files with very many
mutual blocks.
nad added a commit that referenced this issue Apr 23, 2021
The new code can be considerably faster for files with very many
mutual blocks.
@nad
Copy link
Contributor Author

nad commented Apr 24, 2021

I opened a separate issue related to how append is not associative (#5347).

The code passes all the tests now. It took me a few iterations to get the code to work for old versions of GHC. I also changed the implementation. Now the conversion from HighlightingInfoBuilder to HighlightingInfo no longer goes via PositionMap. (The difference in performance seems to be small for the standard library.)

I assume that it is possible to create a more efficient implementation of RangeMap by not reusing the implementation of Data.Map.Strict, but instead having a dedicated tree type with direct support for ranges. For instance, even in the case of no overlap the current implementation of insert uses four tree operations (unless the range is empty), which I guess translates to four tree traversals. It should be possible to traverse the tree just a single time.

@andreasabel andreasabel added type: enhancement Issues and pull requests about possible improvements and removed type: bug Issues and pull requests about actual bugs labels Apr 25, 2021
nad added a commit that referenced this issue May 8, 2021
The new code can be considerably faster for files with very many
mutual blocks.
@nad nad removed this from the 2.6.3 milestone May 8, 2021
@nad nad added this to the 2.6.2 milestone May 8, 2021
@nad nad closed this as completed May 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Slow type checking, interaction, compilation or execution of Agda programs type: enhancement Issues and pull requests about possible improvements ux: highlighting Issues relating to syntax highlighting
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants