CPS

Introduction

There are three commonly used starting points for Intermediate Representations (IR)

Static Single Assignment (SSA) : commonly used for imperative languages

Continuation Passing Style (CPS) : based on λ calculus

Administrative Normal Form (ANF) : based on λ calculus
Generally, many functional compilers use CPS or ANF.
- First(?) was Steele's Rabbit Scheme
- Later: Orbit Scheme
- Nowadays, ANF + local CPS seems to be accepted.
Interestingly, our choice might matter less than it seems
- it can be proven that CPS = ANF = SSA
- however each form makes certain things harder or even infeasible
Note: Many optimisations become substitution, function application, and arithmetic simplification in CPS

Reading

Compiling with Continuations

: Appel

-   CPS as an IR

The Essence of Compiling with Continuations

: Flanagan 1993

-   introduces ANF as an alternative

Compiling with Continuations, continued

: Kennedy 2007

-   alas, some optimisations are not accessible from ANF
-   argues for CPS IR
-   very practical

Compiling without Continuations

: Maurer 2017

-   introduces *join points* to augment ANF

Compiling with Continuations, or without? Whatever : Cong 2019

LISP in Small Pieces

:

RABBIT: A compiler for Scheme (A study in optimisation) : Steele 1978

Design notes of the STALIN compiler

:

Typical Compiler Workflow

Phases
1. Conversion to CPS
2. Optimisation of CPS
3. Closure-conversion
  - add extra argument to all form that captures the current environment
  - this contains only the free variables of the form
  - now, all forms are closed, ie they contain no free variables
4. Elimination of nested scopes
  - one global set of mutually recursive functions
5. Register spilling
6. Lowering to abstract machine code
7. Optimisations at machine level
  - scheduling, etc
8. Instruction generation
We will replace 5.-8. with vectorisation and code generation
Closure generated for the continuation usually treated as a bit special
- efficiency seems to be the main reason
Many of the additional $\bar\lambda$ are eliminated using $\beta$-reduction $((\bar \lambda x.P) Q) \rightarrow P[x := Q]$
- this is for efficiency, since CPS conversion explodes the size of the program
- most often done in the actual CPS steps

The Essence of Compiling with Continuations

Source Language: Core Scheme

#+begin_example example

-- Expressions M ::= V -- value

(let (x M) M') -- bind (if M M' M'') -- if-then-else (M M' M'' ...) -- apply (O M' M'' ...) -- primitive

-- Values V ::= c -- constant

x -- named variable (λ x x' ... M) -- lambda abstraction

#+end_example

evaluated via the CEK abstract machine
- definition not reproduced here

Continuation Passing Style - CPS

High level CPS terms P op [v] [k]
- perform an operation op
- bind one or more new variables v
- continue into one of the given continuations k

Abstract syntax

-- Expressions
P ::= (k W)                  -- apply = return
    | (let (x W) P)          -- bind
    | (W k       W' W'' ...) -- tail call
    | (W (\lambda x.P) W' W'' ...) -- call
    | (if W P P')            -- if-then-else
    | (O k       W' W'' ...) -- primitive (tail)
    | (O (\lambda x.P) W' W'' ...) -- primitive

-- Values
W ::= c                      -- constant
    | x                      -- named variable
    | (\lambda x x' ... P)         -- lambda abstraction

Take source language and transform such that
- every intermediate value is bound via an administrative lambda form $\bar\lambda$
- every form now takes one or more continuations, an abstraction for the rest of the program
  - no return
  - only tail calls
This makes the control flow obvious and explicit

Administrative Normal Form - ANF

M ::= V                       -- Values
   | (let (x V) M)            -- bind
   | (if V M M')              -- if-then-else
   | (V V' V'' ...)           -- tail call
   | (let (x (V V' ...)) M)   -- call
   | (O V V' ...)             -- tail primitive
   | (let (x (O V V' ...)) M) -- primitive

-- Values
V ::= c                      -- constant
    | x                      -- named variable
    | (/\ x x' ... M)        -- lambda abstraction

The issue with CPS is this:
- First we introduce tons of new abstractions and, for effiency, we eliminate them again
- Second, much of the information in CPS is redundant
  - eg: the return form takes a continuation, to which it will return, but this information is actually present in the control state of the (abstract) machine
- These two steps basically undo (most of) CPS
- So, why do it in the first place?
ANF is a transform from the source language to the source language, that captures this CPS-β-unCPS cycle into one step
- no form takes unevaluated expressions, only values
- all intermediates are let-bound

ANF conversion

-- evaluation context
E ::= []            -- empty
    | (let (x E) M)
    | (if E M M')
    | (F V V' ... E M M') where F = V -- lambda
                                  | O -- primitive

-- reduction rules/
A[(let (x M) M')] = (let (x M) (A[M']))          -- recurse into body
A[(if V M M')]    = (if V A[M] A[M'])            -- recurse into both arms
A[(F V V' ...)]   = (let (t (F V V' ...) A[t])   -- name intermediates and lift out computation

Example
```
#+begin_example scheme
```
(+ ;; E = (+ (+ V V) (let (x V) (F x)))] (+ 2 2) ;; => E = (+ [] (let (x V) (F x))) (let (x 1) ;; => E[t] = (+ t (let (x V) (F x))) (f x))) ;; A[(F V V' ...)] = (let (t (F V V' ...) A[t]) (let (a (+ 2 2)) (+ a (let (x 1) (f x)))) ;; A[(let (x M) M')] = (let (x M) (A[M'])) (let (a (+ 2 2)) (let (x 1) (+ a ;; To be reduced (f x)))) ;; ---- " ---- ;; A[(F V V' ...)] = (let (t (F V V' ...) A[t]) (let (a (+ 2 2)) (let (x 1) (let (b (f x)) (+ a b)))) ;; clean-up (let ((a (+ 2 2)) (x 1) (b (f x))) (+ a b))
```
#+end_example
```

Optimisations

Partial evaluation / constant propagation

Compiling with Continuations

Language

#+begin_example haskell

-- ??? No idea, yet data Access where Off :: Int -> Access Sel :: Int -> Access -> Access deriving(Show)

-- Primitive operations data PrimOp = FAdd | FMul -- FP arithmetic operations

IAdd IMul -- Int arithmetic operations IEq ILt

deriving(Show)

-- CPS language values data Value where Var :: Tag -> Value -- Variable Label :: Tag -> Value -- An abstract label I64 :: Int -> Value -- Integer F64 :: Double -> Value -- Float Str :: String -> Value -- String deriving(Show)

data CExp where -- Define a record with a name and continue Record :: [(Value, -- Field Access)] -- ??? -> Tag -- Name -> CExp -- Continuation -> CExp -- Retrieve field from record Select :: Int -- Field to pick -> Value -- Record -> Tag -- Name given to result -> CExp -- Continuation -> CExp -- Given a record field, bind a new variable pointing to a field shifted against the input Offset :: Int -- Shift -> Value -- Value pointing into a record -> Tag -- Bind the result -> CExp -- Continuation -> CExp -- Call a function. NB. No continuation Apply :: Value -- Function -> [Value] -- Arguments -> CExp -- Define a set of mutually recursive functions and step into continuation Fix :: [(Tag, -- Name [Tag], -- Formal parameter list CExp)] -- Body -> CExp -- Continuation -> CExp -- Take i'th branch Switch :: Value -- Which branch to take -> [CExp] -- Branches -> CExp -- Primitive operation: define some variables and step into one of the continuations. Prim :: PrimOp -- Operation to call -> [Value] -- Arguments -> [Tag] -- Variable names for the outputs -> [CExp] -- Continuations -> CExp

#+end_example

Scope

Prim p (...) [w, ...] [e, ...] : all w are visible in each e

Record [(f, a), ...] w e : w is visible in e

Select i v w e : w is visible in e

Offset i v w e : w is visible in e

Fix [..., (f_i, [... ,w_ij, ...], b_i), ...] e

: - all w_ij are visible in b_i - all f_i are visible in all b_k and e

Closure Conversion

A function f with free variables u, v, w converted to a closure C = <f', [u, v, w]>
- f' = \lambda u v w. f
- C does not have free variables
Label is used to mark constant references to functions
- this is motivated by easier register allocation/computation of the free set
- these are not considered free, despite being free terms, since they do not occupy space/registers
After this step Fix [..., (f_i, [... ,w_ij, ...], b_i), ...] e obeys this rule
- in b_i only w_ik and f_j can be free
At this point all functions in a compilation unit can be defined in one Fix
- since we do not have free variables, no more nesting is needed
- Thus, each unit becomes
```
Fix [ (f_1, [v_11, v_12, ... ], b_1),
      ···,
      (f_n, [v_n1, v_n2, ... ], b_n)]
    E
```
  - none of the f_i nor E contains a Fix
- Which can be re-written, by abstracting over E
```
Fix [ (f_1, [v_11, v_12, ... ], b_1),
      ···,
      (f_n, [v_n1, v_n2, ... ], b_n),
      (f_E, [v_E1, v_E2, ... ], b_E)
    ]
    App (Var f_E) [Var v_E1, ...]
```
- From, we only need to deal with triples (f, [v, ...], b)

Free Variables

in the λ calculus we this simple rule

free (Var x)  = {x}
free (\lambda x. M) = free(M) - {x}
free (M N)    = free(M) \cup free(N)

in Appel, we have this specialised to the IR
```
#+begin_example haskell
```
-- free vars in a list freeList [] = {} freeList ((Var x) : vs) = {x} <> freeList vs freeList ((Label _) : vs) = freeList vs freeList ((Int _) : vs) = freeList vs freeList ((Real _) : vs) = freeList vs freeList ((String _) : vs) = freeList vs

-- free vars in an expression free (Apply f vs) = freeList (f:vs) free (Swtich v cs) = freeList v <> free c0 <> ... free (Select i v w e) = freeList v <> free e - {w} -- What is the precedence here? free (Offset i v w e) = freeList v <> free e - {w} -- What is the precedence here? free (PrimOp o vs ws cs) = freeList vs <> (free c0 <> ...) - {w0, ...} free (Fix [(f0, ws0, b0), ...] e) = free e <> (free b0 - (free w00 <> ...)) - {f0, ...}
```
#+end_example
```

Side Notes

Static Single Assignment

Intermediate form were
- all variables are assigned once
- every intermediate is bound
Control flow via Φ-nodes $x \leftarrow \Phi(p, A, B) // x \leftarrow if p then A else B$
Splitting into Basic Blocks
- one entry and one exit
- can be optimised in isolation

Lambda Calculus

Reductions

β $((\lambda x.P) Q) \rightarrow P[x := Q]$
- replace all non-free occurences of $x$ in $Q$ with $P$
η $((\lambda x (f x)) \rightarrow f$

Contification

optimise around functions that return to a single location eg, this

if p then
   ...
   j()
else
   ...
   j()

can be re-written without calls as

j:
  ...
  goto end

if p then
   ...
   goto j
else
   ...
   goto j
end:

and this becomes

if p then
   ...
else
   ...
end:
...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CPS

Introduction

Reading

Typical Compiler Workflow

The Essence of Compiling with Continuations

Source Language: Core Scheme

(let (x M) M') -- bind (if M M' M'') -- if-then-else (M M' M'' ...) -- apply (O M' M'' ...) -- primitive

x -- named variable (λ x x' ... M) -- lambda abstraction

Continuation Passing Style - CPS

Administrative Normal Form - ANF

Optimisations

Compiling with Continuations

Language

Scope

Closure Conversion

Free Variables

Side Notes

Static Single Assignment

Lambda Calculus

Reductions

Contification

Clone this wiki locally