
Conversation

@DmitryNekrasov
Contributor

- Replace direct list concatenation in `ParserStructure.append()` with `ConcatenatedListView` for improved efficiency.
- Add `ConcatenatedListView` implementation to lazily combine two lists without creating a new collection (see the sketch below).
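
The patch itself isn't quoted in this thread. As a rough sketch of the idea (names and details are illustrative, not the actual PR code), such a view could look like this:

     // Hypothetical sketch, not the code from this PR: a read-only view
     // that presents two lists as one without copying their elements.
     class ConcatenatedListView<T>(
         private val first: List<T>,
         private val second: List<T>,
     ) : AbstractList<T>() {
         override val size: Int
             get() = first.size + second.size

         override fun get(index: Int): T =
             if (index < first.size) first[index]
             else second[index - first.size]
     }
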
@DmitryNekrasov DmitryNekrasov self-assigned this Nov 3, 2025
@DmitryNekrasov
Contributor Author

@dkhalanskyjb Hello! What do you think about this idea?

@DmitryNekrasov DmitryNekrasov marked this pull request as draft November 3, 2025 14:33
@dkhalanskyjb
Collaborator

Hi! This may help with the first stage (building the parser before normalisation), but normalisation also has quadratic complexity, and it wouldn't benefit from the proposed approach, as the lists themselves will still need to be reconstructed.

We could extract the happy fast path where there are no adjacent numeric parser operations and simplify normalisation there. That is a common case, so it would be nice to provide excellent performance for it. I'm not yet convinced the common case can't be drastically improved by a better algorithm.
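
To make the list-reconstruction point concrete: a pass that merges adjacent operations has to build fresh output lists no matter how cheaply the inputs were concatenated, so a lazy view over the inputs can't be carried through. A generic sketch (my own illustration, not the library's actual normalisation code):

     // Illustrative only: merging adjacent elements forces a new list
     // to be built, so lazily-concatenated inputs don't help here.
     fun <T> normalise(ops: List<T>, merge: (T, T) -> T?): List<T> {
         val result = ArrayList<T>()
         for (op in ops) {
             val last = result.lastOrNull()
             val merged = if (last == null) null else merge(last, op)
             if (merged == null) result.add(op)
             else result[result.lastIndex] = merged
         }
         return result
     }
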

@dkhalanskyjb
Collaborator

Now that I think about it, this doesn't even fix the quadratic complexity of the initial stage. Concatenating n parsers gives us a binary tree with n leaves. We need to traverse the tree at least once, and even a single enumeration of all its elements has quadratic complexity: a depth of n to access the first parser, n - 1 to access the second, and so on. Constructing the new list does indeed become O(n), but each traversal is then O(n^2).

Most parsers we concatenate are going to be single-element, so n parsers basically means n operations.
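
As a concrete illustration using the view sketched earlier (again my own code, not the PR's): appending n single-element lists yields a left-leaning chain of views, so element 0 sits n - 1 layers deep, element 1 sits n - 2 deep, and a full scan does O(n^2) work.

     // Builds a left-leaning chain of views: (((0) ++ 1) ++ 2) ++ ...
     // get(0) descends through n - 1 nested views, so summing all the
     // elements forces a quadratic traversal.
     fun buildChain(n: Int): List<Int> {
         var acc: List<Int> = listOf(0)
         for (i in 1 until n) {
             acc = ConcatenatedListView(acc, listOf(i))
         }
         return acc
     }

     fun main() {
         println(buildChain(1000).sum())
     }
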

@dkhalanskyjb
Collaborator

Yep, the initial stage also doesn't benefit from this. A quick run of a benchmark shows this:

Before the change:

     Benchmark                        Mode  Cnt  Score   Error  Units
     FormattingBenchmark.buildFormat  avgt   25  5.205 ± 0.090  us/op

After the change:

     Benchmark                        Mode  Cnt  Score   Error  Units
     FormattingBenchmark.buildFormat  avgt   25  7.830 ± 0.160  us/op

Here, less is better (the numbers 5.2 and 7.8 show how long one operation takes, in microseconds).

The benchmark itself builds the datetime format used by Python:

     // Imports needed to compile this snippet (JMH + kotlinx-datetime):
     import kotlinx.datetime.LocalDateTime
     import kotlinx.datetime.format.*
     import org.openjdk.jmh.annotations.Benchmark
     import org.openjdk.jmh.infra.Blackhole

     @Benchmark
     fun buildFormat(blackhole: Blackhole) {
         val v = LocalDateTime.Format {
             year()
             char('-')
             monthNumber()
             char('-')
             day()
             char(' ')
             hour()
             char(':')
             minute()
             optional {
                 char(':')
                 second()
                 optional {
                     char('.')
                     secondFraction()
                 }
             }
         }
         blackhole.consume(v)
     }
