
Tying Up Loose Ends

Qi Compiler Sync Dec 15 2023

Adjacent meetings: Previous | Up | Next

Summary

Michael returned after a semester away, and we took the opportunity to discuss many issues requiring his input. We discussed some blockers for the upcoming release planned for next Friday, and resolved most of the issues to our satisfaction, including the "bind and switch" issue regarding binding scope not extending downstream of conditional forms like switch. We discussed the need to preserve original syntax through expansion and successive stages of compilation for the purposes of error reporting and assigning "blame" appropriately. We also discussed some performance concerns around the use of contracts in deforestation, ways to unit test the expander, and miscellaneous other smaller items.

Background

We recently rebased the compiler branch against main, resolving merge conflicts, and moved the branch into the code review stage. Ben reviewed the normalization pass and identified some unsound rules with real-world impact, which were addressed.

The recent "phase shift" changes were partially reverted to avoid an unaccountable phase issue.

Some formatting was changed to adhere to Racket bracketing conventions.

Bind and Switch

Recently, we discovered that bindings assigned within a conditional form like switch were undefined downstream of the conditional. We considered this a blocker for release.

This time, we first considered what we would like the binding rules to be, and came up with the following candidate scheme.

Proposed Rules

Predicate binds consequent

(~> (3)
    (switch
      [(~> sqr add1 (ε (as v) #t))
       (gen v)])) ; => 10

Consequent binds downstream

(~> (3)
    (switch
      [odd? (~> sqr (-< (as v) add1) sub1)])
    (+ v 5)) ; => (+ 9 v 5), or 23

Downstream reference must be bound in all consequent clauses

(~> (3)
    (switch
      [odd? (~> sqr add1 (as v))]
      [even? displayln])
    (+ v 5)) ; ERROR

Predicate does not bind downstream

(~> (3)
    (switch
      [(~> sqr add1 (ε (as v) (> 10)))
       displayln])
    (+ v 5)) ; ERROR

Consequent shadows predicate

(~> (3)
    (switch
      [(~> sqr add1 (ε (as v) #t))
       (~> (+ v 5) (as v))])
    (+ v 5)) ; => 23

Implemented Rules

We discussed the above candidate binding specification and observed that some of the rules have DAG structure rather than tree structure. That is, they encode scoping rules that cannot be expressed as nested (e.g. let) scopes, which is not supported by either Racket itself or Syntax Spec. For instance, in an analogous if expression in Racket, we would not expect a variable bound in one clause (e.g. via let) to be available outside the if expression. Additionally, the third rule above ("downstream reference must be bound in all consequent clauses") requires nonlocal processing of the entire conditional form in order to determine whether the scoping is valid or whether there is a syntax error; it cannot be inferred purely from the use of the binding form, and such nonlocal processing is not currently supported in Syntax Spec. Finally, we considered in what circumstances we might actually make use of, or expect to have, this kind of binding. We concluded that such cases are awkward at best and generally indicative of poor design, and that it would therefore be best not to support them in any case. So we settled on the following subset of rules:

  1. Predicate binds consequent.
  2. Later predicates shadow earlier ones.

Since switch isn't a core form, these rules would have to be defined in terms of if, which switch expands to.

We notated these rules using Syntax Spec and verified that they behaved as expected.
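For illustration (this example was not part of the meeting, and follows the conventions of the examples above), the second rule means that a binding established in a later predicate shadows a same-named binding from an earlier predicate within its own consequent:

(~> (3)
    (switch
      [(~> sqr (ε (as v) #f))  ; binds v = 9, but this clause is not taken
       (gen v)]
      [(~> add1 (ε (as v) #t)) ; binds v = 4, shadowing the earlier v
       (gen v)])) ; => 4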

Designing Qi Normal Form

We've talked about Qi Normal Form a few times before, and rewriting to this form is currently implemented as the first pass of the compiler so that subsequent stages need not handle all permutations of equivalent flow expressions, and only deal in terms of the canonical form. But so far, we've only said that QNF is the "simplest" or "reduced" form of the flow expression, without really defining the context of such reduction. This time we talked about what factors and questions might inform this design:

  • What optimizations are we trying to enable? In other words, what would enable us to apply our theory of optimization? For instance, we currently "simplify" nested threads into a flattened thread using the associative law, but from this practical perspective, we would say that we do it "to maximize the scope of deforestation" (see the sketch after this list). E.g. (~> (filter odd?) (~> (map sqr)) (foldl +)) would not be deforested, but (~> (filter odd?) (map sqr) (foldl +)) would be. Another example: some languages may prefer to rewrite all bindings into static single assignment (SSA) form so that variables are never mutated, which allows the remaining compiler passes to apply optimizations under the assumption that there is no mutation.
  • How can we expose information that would allow desired optimizations?
  • Some goals may be in conflict, and we should think about how to reconcile them in the normal form.
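As an illustration of the first point, here is a minimal sketch of the associativity rewrite that flattens nested threads, written with syntax-parse. This is a hypothetical stand-in, not the actual Qi normalization pass:

(require syntax/parse)

;; Hypothetical sketch: splice any nested (~> ...) form into the
;; enclosing thread, so that later passes (e.g. deforestation) see one
;; flat sequence of components.
(define (flatten-thread stx)
  (syntax-parse stx
    [((~datum ~>) pre ... ((~datum ~>) inner ...) post ...)
     (flatten-thread #'(~> pre ... inner ... post ...))]
    [_ stx]))

;; (flatten-thread #'(~> (filter odd?) (~> (map sqr)) (foldl +)))
;; ; => a syntax object for (~> (filter odd?) (map sqr) (foldl +))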

More generally, the design of each compiler pass is informed both by what comes before as well as what comes after.

Getting to the Source

One of the challenges we've encountered in producing good error messages has been that the source expression is lost through expansion and the different stages of compilation. Instead, after expansion, we only have a core language expression corresponding to the source expression, and then, after each stage of compilation, we only have the output of the preceding stage.

To address this, we recently implemented a "de-expander" to produce an expression resembling the original surface language expression. Though some information is inevitably lost, it is at least sufficiently representative that it should help the user easily identify the error.

But of course, it would be better to have the original source syntax through the various stages of expansion and compilation. There are two issues at play here that conspire to prevent this:

  1. The source syntax object is lost after each macro expansion.
  2. Racket only tracks one source location, which is overwritten each time we use quote-syntax, and propagated when we use syntax/loc.

Racket likely does not preserve intermediate expressions through expansion and compilation due to the memory cost of retaining them through large numbers of transformations. Already, the expansion process takes up a lot of memory even though garbage collection is able to recover the memory from intermediate expressions that are no longer referred to.

But we agreed that, at least in the case of languages hosted on Racket, we are dealing with much less voluminous source transformations, so it would be feasible to store these intermediate stages of expressions on the syntax object as a property, which could form the basis of useful APIs provided by Syntax Spec. And who knows, maybe this initial step and the lessons learned from it could motivate similar changes to Racket's expander in the future.

We considered two values this origin property could have:

  1. The immediately preceding expression (syntax object) that generated the current one.
  2. A list of all preceding expressions.

In the former case, the full stack of expressions is implicit and could be derived by traversing the syntax objects. In the latter, the entire stack would be available without the need for traversal. Both provide the same information but one or the other may prove a more convenient and/or performant basis for APIs.
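A minimal sketch of what recording such a property might look like, using Racket's existing syntax-property facility; the key name 'compiler-origin is made up here for illustration and is not an existing Qi or Syntax Spec API:

;; Option 1: record only the immediately preceding expression.
(define (tag-origin new-stx old-stx)
  (syntax-property new-stx 'compiler-origin old-stx))

;; Option 2: record the full stack of preceding expressions.
(define (tag-origin/stack new-stx old-stx)
  (syntax-property new-stx
                   'compiler-origin
                   (cons old-stx
                         (or (syntax-property old-stx 'compiler-origin)
                             '()))))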

Although there are definable levels in the transformation of syntax that are commonly of interest -- for instance, surface language, core language, stages of optimized intermediate representations, and target language -- tracking the history of syntax in this way would be generally applicable through the entire process of syntax transformation and is not limited to implicating one of these common levels.

The Blame Game

In general, when there is a syntax error, there are many options for whom to blame:

  1. The surface syntax
  2. A bad macro
  3. The core language syntax
  4. The target syntax

There are no hard and fast rules on whom to blame, and the developer must decide on a case-by-case basis.

The surface syntax

Consider:

(range 1 2 3 4)

range accepts up to 3 arguments, so this source expression is invalid, and should be blamed.

Another instance where we should blame the surface syntax is, of course, the Qi deforestation optimization. There, it wouldn't make sense to blame the deforested, continuation-passing implementation, since that isn't what the user wrote. We would further assume that the Qi expander and compiler transformations are sound (otherwise it would be a bug), and would therefore assume that the error stems from invalid surface syntax, which we would like to be able to clearly implicate.

A bad macro

Consider:

(define-qi-syntax-rule (my-amp f)
  (>< f f))

(~> (3) (my-amp add1))

Here, it is the my-amp macro that is in error, since it passes more than one argument to ><, which accepts only one. The template (>< f f) in my-amp should be blamed.

An analogous Racket example could be something like:

(define-syntax-rule (abc v)
  (sqr v v))

(abc 5)

... where abc should be blamed because sqr only accepts one argument.

The core language syntax

Consider this code generation step, which compiles group, a Qi core language form, to Racket:

(define (group-parser stx)
  (syntax-parse stx
    [(_ n:expr
        selection-onex:clause
        remainder-onex:clause)
     #'(loom-compose (qi0->racket selection-onex)
                     (qi0->racket remainder-onex)
                     n)]
    [_:id
     #'(λ (n selection-flo remainder-flo . vs)
         (apply (qi0->racket (group n
                                    (esc selection-flo)
                                    (esc remainder-flo))) vs))]))

Here, this is just a macro that happens to use the function loom-compose. If it misused loom-compose (e.g. by passing it more arguments than it accepts), that would be the same as the "bad macro" case above, so it seems reasonable to blame this macro in the context of expanding a core language form, rather than implicate the surface syntax. That is, although such an error would perhaps not be the most helpful to a user of Qi (in this contrived example), it would correctly suggest that the fix is to tighten up the patterns in the Qi language so that the right, useful error could have been reported to the user.

The target syntax

Once Qi has generated Racket code, it is the Racket expander that is in a position to generate error messages, and at that stage, of course, it is Racket expressions that would be blamed. In the above example of compiling the group Qi form to a use of a Racket function called loom-compose, if it happened to be the case that loom-compose misuses some Racket syntax (by e.g. passing two arguments to sqr), then it would be appropriate to blame loom-compose -- a Racket function -- rather than implicate any preceding Qi syntax.

Blame continuations?

Having access to a stack of prior expressions and deciding whom to blame is in some ways analogous to the prompt and abort primitives that are used in Racket's continuations (and exception handling). Once we have the stack of prior syntax states available as a data structure, it could be that patterns used in continuations (including exceptions) would be applicable here in correctly assigning blame (note, this is unrelated to the "expansion continuations" already exposed by Syntax Spec).

Having exception-like abstractions on top of the stack of syntax history could enable higher level reasoning about handling syntax errors, perhaps allowing developers to declare which expression is taking responsibility for what kinds of errors, without having to do the legwork of traversing the stack.

A safe blame protocol

For now, we agreed that we should implicate a surface expression when we are sure that it should be implicated (e.g. in deforestation), and in all other cases defer to whoever Racket chooses to blame.

Costs from Contracts

The current release of the compiler is simply to prove that nontrivial optimizations are possible and useful, and in that, we have succeeded. But we also talked about a possible concern with the use of contracts in the deforestation implementation. In general, contracts tend to incur a performance penalty, so in practice it is often better to manually construct and raise an exception. But as the contract machinery provides a lot of useful error information, and as we are using the low-level API, it is perhaps not a significant cost. We noted this for further review in the future. We also discussed that it might be faster to use a flat contract, if possible, instead of a chaperone, especially since Racket provides some utilities that could allow such contracts to follow a "fast path" to verification in some cases.

One specific issue discussed recently was that in checking (chaperone?) contracts, Racket first does an arity check before even supplying the values to the contract. This was incurring some cost, and it was mitigated by adding a "case-lambda hack." Even this incurs a one-time cost at the beginning of the functional pipeline, which is why it is noticeable only for small lists such as those used in the filter-map benchmark, but insignificant for large lists (e.g. the filter-map (large list) benchmark). It could be that the use of flat contracts would avoid this issue.
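To make the tradeoff concrete, here is a hypothetical comparison (not Qi's actual deforestation code, and assuming #lang racket, where racket/contract is available) of a manual check, which raises a specific exception directly, versus attaching a contract, which provides richer blame information at some runtime cost:

;; Manual check: cheap, but all error details must be assembled by hand.
(define (check-list-input who v)
  (unless (list? v)
    (raise-argument-error who "list?" v))
  v)

;; Contract: richer blame and error reporting, with the checking cost
;; (including an arity check for function contracts) paid at each use.
(define/contract (sum-squares vs)
  (-> (listof number?) number?)
  (for/sum ([v (in-list vs)]) (* v v)))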

Testing Improvements

Unit testing the expander

We've talked about how having a more granular way to unit test the expander would be useful. This time, we used utilities from syntax/macro-testing, specifically phase1-eval, to implement such unit tests, although, for now, they are only able to verify syntax through datum matching rather than literal (binding-aware) matching. Datum matching catches obvious errors in expansion, but would miss more subtle ones, such as an identifier that has the expected name but not the expected binding, which literal matching would detect.
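A sketch of what such a datum-matching test might look like; here, expand-flow and the "expander.rkt" module path are placeholders for Qi's actual phase-1 expander entry point, and the expected datum is purely illustrative:

(require rackunit
         syntax/macro-testing
         (for-syntax racket/base
                     "expander.rkt")) ; placeholder for the module providing expand-flow

;; phase1-eval runs the expander at phase 1 and returns the resulting
;; syntax object; we then compare by datum, not by binding.
(check-equal?
 (syntax->datum
  (phase1-eval (expand-flow #'(~> add1 sqr))))
 '(thread (esc add1) (esc sqr))) ; illustrative expected core form only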

To compare expansions literally instead, we would need an alpha-equivalence predicate for Qi to complement the similar predicate for Racket already provided by Syntax Spec. It's possible that Syntax Spec could infer such a predicate from the core language grammar, but for now we must rely on datum matching.

This is still a great start on testing the expander: being able to test every link in the chain of translating Qi to Racket in a granular way will help us triangulate any issues in the future. Qi already has close to 100% test coverage, but we would still like to have such granular tests, even for code that is already covered, to help us develop, debug, and iterate precisely.

Rich expansion events

We also discussed the issue where the Macro Stepper does not have full, hierarchical visibility into syntax transformations that we perform during compilation. We felt that trying out unpublished internal APIs that could accomplish this would be a reasonable next step, and if that works out, to encourage publication and standardization of those APIs (perhaps via a PR against core Racket).

Optimization Fixation

We currently optimize to a fixed point in each of our (two) optimization passes. We discussed and agreed that this isn't necessary in deforestation since we expect it to reach a fixed point after a single pass, and furthermore, that we should not look for fixed points by default but only when we specifically expect to have to do that.
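For reference, optimizing to a fixed point just means re-running a pass until it stops changing the expression; a minimal sketch (not Qi's actual pass driver) looks like this:

;; Hypothetical sketch: apply pass repeatedly until the output no longer
;; changes (compared by datum), then stop.
(define (optimize-to-fixed-point pass stx)
  (define next (pass stx))
  (if (equal? (syntax->datum next) (syntax->datum stx))
      stx
      (optimize-to-fixed-point pass next)))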

Next Steps

(Some of these are carried over from last time)

  • Update the main documentation to reflect the compiler work
  • Update the developer documentation with details of the compiler implementation
  • Comprehensively review error messages in Qi
  • Review the use of explicit side effects in Qi wrt optimizations.
  • Add test cases for the backwards incompatibility on datum vs literal matching.
  • Investigate ways to extend the compiler (e.g. language composition or compiler macros) and use that to implement qi/list (deforesting racket/list) as a proof of concept.
  • Prototype and then propose map1 for addition to Racket.
  • Validate that optimization passes can prematurely terminate if they don't return false when they don't transform the input (and then fix it).
  • Preserve the stack of source expressions through expansion in Syntax Spec.
  • Review the use of contracts in deforestation.
  • Improve expander tests to compare expressions literally, using a test for alpha-equivalence.
  • Try unpublished expansion event APIs to implement hierarchical reporting of expansion in the compiler.
  • Implement streams for the other list operations described in St-Amour's writeup (i.e. unfoldr, zip, average). Note: implementing average might benefit from having "multi-valued CPS" for deforestation.
  • Write unit tests for more compiler rewrite rules, as well as end-to-end tests for pairs of input expressions, "logging" style (like Typed Racket) tests to validate expected compilation sequences, and sanity tests for the expander.
  • Find a way to reliably detect list-like operations on values.
  • Rewrite these list-like operations to list-based functional pipelines, e.g. (~> (-< (>< f) (>< g)) ...)
  • Generalize the stream implementation to support multiple values at each step (in addition to zero or one) and recheck the performance.
  • Review other core Qi combinators besides ~> (e.g. -<, == and ><, maybe if and pass) and see whether it would make sense to extend fusion to them.
  • Should Qi provide a map utility (overriding the default one) that supports "0 or more" values (as was discussed in the RacketCon talk)?
  • Design a core continuation form(s?) for Qi, and show how try can be implemented in terms of it.
  • Describe the sequence of compilation for input expressions with the optimizing compiler against the reference compiler, showing cases in which these do not terminate.
  • Come up with a methodology to formalize and theoretically validate compiler behavior.
  • Review other bindings cases that we could support next (e.g. selective binding, packing values, and binding exceptions in try).

Attendees

Dominik, Michael, Sid
