Skip to content

Qi Compiler Sync Sept 9 2022

Siddhartha Kasivajhula edited this page Dec 15, 2022 · 7 revisions

Field Report and Compiler Implementation Approaches

Qi Compiler Sync Sept 9 2022

Adjacent meetings: Previous | Up | Next

Summary

We discussed some issues encountered during testing of the current compiler branch, and some before/after performance data for the bindingspec integration. We also discussed some compiler implementation approaches at a high level.

Background

Previously, we achieved end-to-end functionality with the expander being cleanly separated out and implemented using bindingspec, instead of being conflated with core form expansion in the compiler as it was formerly. All the tests were passing, and docs were building, too.

In the interim, we tried out the new code and ran some benchmarks to see how things looked.

Field Report

Error: "count: unbound identifier"

While the tests were passing, generating the performance benchmark data using make report-benchmarks was failing with the error:

profile/forms.rkt:614:35: count: unbound identifier
  in: count
  context...:
   #(22208 local) #(22209 intdef) [common scopes]
  other binding...:
   #(count.1 #<module-path-index:"flow/extended/forms.rkt" qi/flow> 0)
   #(-2870 interned qi) [common scopes]
  common scopes...:
   #(22182 module) #(22191 module partition)
  location...:
   profile/forms.rkt:614:35
[...]

This turned out to be because of the binding spaces "quirk" discussed at length last time. Most of the submodules used for the microbenchmarks in profile/forms.rkt use the same names for the test runner as the Qi form being benchmarked. As a result, by virtue of the name collision, they were each failing in unexpected ways since the input syntax was being treated as Racket code rather than Qi code, and the ways in which those were ending up being evaluated varied from case to case. For instance, while some caused the unbound identifier failure encountered above, others failed with bad syntax, and in other cases it simply caused an infinite loop during the module compilation phase. These will be fixed by renaming all of the Racket-level identifiers to avoid the collision, for next time.

In addition, it may be possible for bindingspec to detect this kind of inter-space name collision and provide a helpful error message (e.g. "ambiguous binding") before it falls through and is interpreted in some unpredictable way.

Is it possible to group expansion rules by the forms they correspond to?

It wouldn't be straightforward -- in fact, one of the things bindingspec does is make it so that such grouping isn't necessary. For now, if we want such separation, we could just keep the rules for each form separated from the other rules by using whitespace.

Build and Testing Times

Build Times

Before

make build [14.65s]
make build-all [56.65s]
make build-docs [43.10s]

After

make build [15.44s]
make build-all [1:01]
make build-docs [58.78]

Testing Times

Before

make test-flow [4.43s]
make test [1:22.82]

After

make test-flow [5.30s]
make test [1:27]

Assessment

That comes out to about a 15% increase in build and testing times.

Require Latency

The latency when invoking (require qi):

Before

183ms

After

250ms

Assessment

The new architecture of the code including the bindingspec dependency increased the require latency by around 37%. At 250ms, it is still well within the normal range for Racket packages, which empirically vary from 0 - 500ms. Still, we recorded an issue in case this can be improved down the line.

Compiler Implementation Approaches

Intermediate Representations (IRs)

There are a few different possibilities:

  1. Use structs (e.g. syntax-parse and match do it this way)
  2. Use syntax objects (e.g. minikanren) together with a binding store.
  3. Use Nanopass.

For (2), we could have the expander create bindings which could be compared using free-identifier=?. This allows the use of syntax-parse for generating the IRs but suffers from the drawback that the bindings are only available during the expansion phase, and that causes some awkwardness e.g. with unit testing, where the tests need to expand into things that run the tests(?). (3) is no longer dependent on running during expansion and the compiler can match using datum parsing instead of maintaining a binding store, but suffers from the drawback that we'd need to parse the bindingspec output into a form that can serve as the input to the Nanopass framework. For larger languages (like Qi) this could be undesirable since we already have a grammar specification in bindingspec, and would need to once again employ (a differently specified) one at the nanopass stage, which seems gratuitous from an implementation standpoint.

None of these options seems especially good at this point, though (3) is probably the most interesting, and there may be opportunities for innovations that could improve the integration here.

To Bind or Not to Bind?

Qi doesn't have bindings at the moment but we would like to support them in the future. We could go about this in one of two ways:

  1. Add bindings right away and then embark on writing compiler optimizations.

  2. Write compiler optimizations without bindings in the picture, and then add bindings later.

(1) would involve implementing the binding forms in the compiler, and declaring the binding forms in the expander grammar in order to preserve hygiene.

On the other hand, bindings complicate the compiler implementation, and so it may be easier to reach nontrivial optimization results without their being in the picture, i.e. forging ahead with (2) and then adding bindings in at a later stage.

We could potentially start exploring both of these independently in separate branches against the compiler integration branch and that may give us more clues on the best order in which to proceed.

DSLs for Performance

Positive results for Qi with optimizations for specialized cases would make a strong case for the use of DSLs for performance, and not only for syntactic and semantic economy in expressing the problem at the right level of abstraction.

Next Steps

  • Fix remaining occurrences of name collisions in the performance benchmarks in profile/forms.rkt
  • Put up bindingspec on the package index (on a provisional, no-backwards-compatibility-guarantees basis) to fix CI failures
  • Investigate detecting inter-space name collisions in bindingspec to provide a helpful error message
  • Merge the bindingspec branch into the compiler integration branch
  • Start a branch to explore adding bindings
  • Start a branch to implement a compiler optimization
  • Post about the compiler meetup on Twitter and Reddit to reach other CS folks

Attendees

Michael, Sid, Stephen

Clone this wiki locally