# Tagless Final Encoding
##### (http://okmij.org/ftp/tagless-final/course/lecture.pdf)

The tagless final style of encoding makes an embedded DSL more extensible than the initial encoding does.

Initial encoding uses ADTs to construct a syntax tree:

In [1]:
data Exp = Lit Int
         | Add Exp Exp
         | Neg Exp
         
eval :: Exp -> Int
eval (Lit x) = x
eval (Neg e) = negate $ eval e
eval (Add a b) = eval a + eval b

eval $ Add (Lit 3) (Lit 4)

prettyPrint :: Exp -> String
prettyPrint e = case e of
  Lit x -> show x
  Neg e -> "(-" ++ prettyPrint e ++ ")"
  Add a b -> "(" ++ prettyPrint a ++ '+' : prettyPrint b ++ ")"
  
prettyPrint $ Add (Lit 3) (Lit 4)

7

"(3+4)"

The same using final encoding:

In [2]:
:ext TypeSynonymInstances
:ext FlexibleInstances

class ExpSYM repr where
  lit :: Int -> repr
  add :: repr -> repr -> repr
  neg :: repr -> repr
  
instance ExpSYM Int where
  lit x = x
  add a b = a + b
  neg x = negate x
  
instance ExpSYM String where
  lit = show
  add a b = "(" ++ a ++ '+' : b ++ ")"
  neg x = "(-" ++ x ++ ")"
  
eval :: Int -> Int
eval = id

eval $ add (lit 3) (lit 4)

prettyPrint :: String -> String
prettyPrint = id

prettyPrint $ add (lit 3) (lit 4)

7

"(3+4)"

We can easily extend the language using the final encoding. With the initial encoding we would have to modify the ADT and every function that interprets it (here `eval` and `prettyPrint`), and any code that uses these definitions would have to be recompiled.

In [3]:
class MultSYM repr where
  mult :: repr -> repr -> repr
  
instance MultSYM Int where
  mult a b = a * b

instance MultSYM String where
  mult a b = "(" ++ a ++ '*' : b ++ ")"
  
eval $ mult (lit 3) (lit 4)
prettyPrint $ mult (lit 3) (lit 4)

12

"(3*4)"

We didn't need to alter any existing code for this to work.
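The extension also composes with the original language: a single term can mix `mult` with `lit`, `add` and `neg`, and its constraint lists exactly the fragments it uses. The block below is a minimal sketch that repeats the classes and instances from the cells above so it loads on its own; the names `expr` and `both` are hypothetical and not part of the original notebook.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- Classes and instances repeated from the cells above (self-contained copy).
class ExpSYM repr where
  lit :: Int -> repr
  add :: repr -> repr -> repr
  neg :: repr -> repr

class MultSYM repr where
  mult :: repr -> repr -> repr

instance ExpSYM Int where
  lit = id
  add = (+)
  neg = negate

instance MultSYM Int where
  mult = (*)

instance ExpSYM String where
  lit = show
  add a b = "(" ++ a ++ "+" ++ b ++ ")"
  neg x = "(-" ++ x ++ ")"

instance MultSYM String where
  mult a b = "(" ++ a ++ "*" ++ b ++ ")"

-- Hypothetical example term: polymorphic in the interpreter, constrained
-- by both language fragments because it uses operations from both.
expr :: (ExpSYM repr, MultSYM repr) => repr
expr = add (lit 1) (mult (lit 2) (neg (lit 3)))

-- The same term under both interpreters; the result type selects the instance.
both :: (Int, String)
both = (expr, expr)
```

Loading this in GHCi and evaluating `both` should give the evaluated result and the printed form of the same term.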

### Non-fold Interpreters

Suppose we want to "push negation down" so that there are no double negations and negation is only ever applied to literals. This is easy with the initial encoding, where we can pattern match on nested constructors. How do we do it with the final encoding, where we cannot inspect the context of a term directly, even though we need that context to detect a double negation?
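For comparison, here is a sketch of what the pattern-matching version could look like in the initial encoding. It is written for this note rather than taken from the notebook, and it is named `pushNeg` (hypothetical) to avoid clashing with the final-encoding `push_neg` defined in the next cell.

```haskell
-- Initial encoding, repeated from the first cell; Eq and Show are derived
-- only so that results can be compared and displayed.
data Exp = Lit Int
         | Add Exp Exp
         | Neg Exp
  deriving (Eq, Show)

-- Push negation down to the literals by matching on nested constructors.
pushNeg :: Exp -> Exp
pushNeg e@(Lit _)       = e                                       -- nothing to do
pushNeg e@(Neg (Lit _)) = e                                       -- already at a literal
pushNeg (Neg (Neg e))   = pushNeg e                               -- cancel double negation
pushNeg (Neg (Add a b)) = Add (pushNeg (Neg a)) (pushNeg (Neg b)) -- distribute over addition
pushNeg (Add a b)       = Add (pushNeg a) (pushNeg b)
```

The nested patterns such as `Neg (Neg e)` are exactly the "context" that the final encoding has to pass around explicitly.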

In [4]:
data Ctx = Pos | Neg

instance ExpSYM repr => ExpSYM (Ctx -> repr) where
  lit n Pos = lit n
  lit n Neg = neg (lit n)
  neg e Pos = e Neg
  neg e Neg = e Pos
  add a b ctx = add (a ctx) (b ctx)
  
instance MultSYM repr => MultSYM (Ctx -> repr) where
  mult a b Neg = mult (a Pos) (b Neg)
  mult a b Pos = mult (a Pos) (b Pos)
  
push_neg :: ExpSYM repr => (Ctx -> repr) -> repr
push_neg e = e Pos

prettyPrint $ neg (neg (lit 3))
prettyPrint . push_neg $ neg (neg (lit 3))

"(-(-3))"

"3"

Now we want to flatten addition by re-associating every nested addition to the right.

`Add (Add 2 3) 4` becomes `Add 2 (Add 3 4)`.
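In the initial encoding this transformation is again a short recursive function. The following is a sketch under the same assumption the final version makes, namely that negation has already been pushed down to the literals; the name `flatAdd` is hypothetical.

```haskell
-- Initial encoding, repeated from the first cell.
data Exp = Lit Int
         | Add Exp Exp
         | Neg Exp
  deriving (Eq, Show)

-- Re-associate nested additions to the right.
-- Assumes negation only wraps literals, so Neg nodes are left untouched.
flatAdd :: Exp -> Exp
flatAdd (Add (Add a b) c) = flatAdd (Add a (Add b c)) -- rotate a left-nested sum
flatAdd (Add a b)         = Add a (flatAdd b)         -- left operand is flat, continue right
flatAdd e                 = e                         -- literals and negated literals
```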

In [5]:
data Ctx2 e = LCA e | NonLCA

instance ExpSYM repr => ExpSYM (Ctx2 repr -> repr) where
  lit n NonLCA = lit n
  lit n (LCA e) = add (lit n) e
  neg e NonLCA = neg (e NonLCA)
  neg e (LCA e3) =  add (neg (e NonLCA)) e3 -- we assume that push_neg has already been applied (e is a lit)
  add a b ctx = a (LCA (b ctx))
  
instance (MultSYM repr, ExpSYM repr) => MultSYM (Ctx2 repr -> repr) where
  mult a b NonLCA = mult (a NonLCA) (b NonLCA)
  mult a b (LCA e) = add (mult (a NonLCA) (b NonLCA)) e
  
flata :: ExpSYM repr => (Ctx2 repr -> repr) -> repr
flata e = e NonLCA

tf1 = add (lit 8) (neg (add (lit 1) (lit 2)))
tf3 = add tf1 (neg (neg tf1))

prettyPrint tf3
prettyPrint . flata $ push_neg tf3
eval tf3
eval . flata . push_neg $ tf3

eval . flata . push_neg $ add (mult (lit 2) (lit 3)) (lit 5)

"((8+(-(1+2)))+(-(-(8+(-(1+2))))))"

"(8+((-1)+((-2)+(8+((-1)+(-2))))))"

10

10

11

## Tags
Here's an object language for lambda calculus with booleans, in the initial form.

In [6]:

data Exp
  = V Var
  | B Bool
  | L Exp
  | A Exp Exp

data Var = VZ | VS Var

-- apply the identity function to a boolean
ti1 = A (L (V VZ)) (B True)

A problem when writing an evaluator for this language is that `L` evaluates to a function while `B` evaluates to a Bool, so there is no single result type.

To fix this, the result type must be the union of Bool and functions. The discriminators (`UB`, `UA`) are the *tags*.

In [7]:
data U = UB Bool | UA (U -> U)

eval :: [U] -> Exp -> U
eval env (V v) = lookp v env
eval env (B b) = UB b
eval env (L e) = UA $ \x -> eval (x:env) e
eval env (A e1 e2) = case eval env e1 of UA f -> f (eval env e2)

lookp :: Var -> [U] -> U
lookp VZ (x:_) = x
lookp (VS v) (_:xs) = lookp v xs

(UB x) = eval [] ti1
x

True

This evaluator trips up if we try to use it with invalid expressions such as `A (B True) (B False)` or `A (L (V (VS VZ))) (B True)`. We need to make the language typed in order to catch these problems.
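To make the two failure modes concrete, here is a sketch of a variant evaluator that reports them as values instead of crashing on an incomplete pattern match. It is not part of the original notebook: `evalSafe`, `lookpSafe`, the error strings and the changed definition of `U` (functions may now fail) are all assumptions made for illustration. Unlike the original `eval`, it also evaluates the argument before applying the function.

```haskell
data Var = VZ | VS Var

data Exp
  = V Var
  | B Bool
  | L Exp
  | A Exp Exp

type ErrorMsg = String

-- Hypothetical variant of U: applying a function may itself fail.
data U = UB Bool | UA (U -> Either ErrorMsg U)

-- Variable lookup that reports an out-of-range index.
lookpSafe :: Var -> [U] -> Either ErrorMsg U
lookpSafe _      []     = Left "unbound variable"
lookpSafe VZ     (x:_)  = Right x
lookpSafe (VS v) (_:xs) = lookpSafe v xs

-- Evaluator with explicit run-time tag checks.
evalSafe :: [U] -> Exp -> Either ErrorMsg U
evalSafe env (V v)     = lookpSafe v env
evalSafe _   (B b)     = Right (UB b)
evalSafe env (L e)     = Right (UA (\x -> evalSafe (x : env) e))
evalSafe env (A e1 e2) = do
  f <- evalSafe env e1
  x <- evalSafe env e2          -- argument is evaluated before application
  case f of
    UA g -> g x
    UB _ -> Left "applying a non-function"  -- the tag check made explicit
```

Every application still inspects a tag at run time; the errors are merely reported more politely.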

We could implement a typecheck function like this:

`type ErrorMsg = String`

`typecheck :: Exp -> Either ErrorMsg Exp`

Running this typechecker before `eval` ensures that `eval` returns a valid result, because the partial pattern matches in the `A` case and in `lookp` can then never fail. Haskell still compiles them as pattern matches on tags, however, which incurs the cost of run-time tag checking. The underlying issue is that `typecheck` takes an `Exp` and returns an `Exp` of the same type, so Haskell has no way of knowing that the resulting `Exp` has been typechecked.

Thus, the presence of type tags such as `UA` and `UB`, and the run-time tag checking that comes with them, is a symptom of the problem of embedding typed object languages.

## Going tagless
To make our initial embedding unable to represent ill-typed terms, we use GADTs:

In [16]:
:ext GADTs

data Var env t where
  VZ :: Var (t, env) t
  VS :: Var env t -> Var (a, env) t

data Expr env t where
  B :: Bool -> Expr env Bool
  V :: Var env t -> Expr env t
  L :: Expr (a, env) b -> Expr env (a -> b)
  A :: Expr env (a -> b) -> Expr env a -> Expr env b
  
lookp :: Var env t -> env -> t
lookp VZ (a,_) = a
lookp (VS v) (_, env) = lookp v env
  
eval :: env -> Expr env t -> t
eval env (B b) = b
eval env (V v) = lookp v env
eval env (L e) = \x -> eval (x, env) e
eval env (A e1 e2) = eval env e1 (eval env e2)

ti1 = A (L (V VZ)) (B True)

eval () ti1

-- These guys don't compile anymore
--ti2 = A (B True) (B True)
--eval () ti2

--ti3 = A (L (V (VS VZ))) (B True)
--eval () ti3

True

## Final Tagless
We can write the entire tagless version in 5 lines. The evaluator is part and parcel of the encoding.

In [5]:
vz (vc, _) = vc
vs vp (_, env) = vp env

b bv env = bv
l e env = \x -> e (x, env)
a e1 e2 env = e1 env $ e2 env

tf1 = a (l vz) (b True)
tf1 ()

True
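It can help to see the types behind these five definitions. The sketch below repeats them with explicit signatures: a term is simply a function from an environment to a value, and the signatures mirror the constructors of the GADT version. The signatures are annotations added here, not part of the original cell, and `b` is deliberately restricted to `Bool` to match the `B` constructor (GHC would infer a more general type). The name `open1` is hypothetical.

```haskell
-- Variables project a value out of a nested-pair environment.
vz :: (t, env) -> t
vz (vc, _) = vc

vs :: (env -> t) -> (u, env) -> t
vs vp (_, env) = vp env

-- A boolean literal ignores its environment.
b :: Bool -> env -> Bool
b bv _ = bv

-- Abstraction extends the environment with the argument.
l :: ((s, env) -> t) -> env -> s -> t
l e env = \x -> e (x, env)

-- Application evaluates both sides in the same environment.
a :: (env -> s -> t) -> (env -> s) -> env -> t
a e1 e2 env = e1 env (e2 env)

-- The closed term from the cell above: works in any environment.
tf1 :: env -> Bool
tf1 = a (l vz) (b True)

-- Hypothetical open term: refers to the second variable in scope,
-- so its type demands an environment with at least two entries.
open1 :: (u, (t, env)) -> t
open1 = vs vz
```

An ill-typed term such as `a (b True) (b True)` is rejected by the Haskell typechecker, with no tags involved.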

In order to allow multiple interpretations of an embedded language term, we employ a type class.

In [33]:
class Symantics repr where
  int :: Int -> repr h Int
  add :: repr h Int -> repr h Int -> repr h Int
  
  z :: repr (a, h) a
  s :: repr h a -> repr (any, h) a
  lam :: repr (a, h) b -> repr h (a -> b) -- if we know b given a, we can derive a -> b
  app :: repr h (a -> b) -> repr h a -> repr h b
  
td1 :: Symantics repr => repr h Int
td1 = add (int 2) (int 3)

td2o :: Symantics repr => repr (Int, h) (Int -> Int)
td2o = lam (add z (s z))

td3 :: Symantics repr => repr h ((Int -> Int) -> Int)
td3 = lam (add (app z (int 3)) (int 2))

Here is one interpreter, the evaluator:

In [37]:
-- We wrap h -> a in a newtype R instead of using (->) directly, so that other
-- interpreters with the same underlying type can have their own Symantics instance.
newtype R h a = R { unR :: h -> a }

instance Symantics R where
  int x     = R $ const x
  add a b   = R $ \h -> unR a h + unR b h
  
  z         = R $ \(x, _) -> x
  s v       = R $ \(_, h) -> unR v h
  lam e     = R $ \h -> \x -> unR e (x, h)
  app e1 e2 = R $ \h -> unR e1 h $ unR e2 h
  
eval e = unR e ()

eval td1
eval td3 (+3)
-- td2o is open, therefore eval td2o is ill-typed

5

8
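Since the point of the type class is to allow several interpretations, here is a sketch of a second interpreter that renders a term as a string. It is an illustration written for this note: the names `S`, `unS` and `view`, and the scheme of naming variables `x0`, `x1`, ... by binding depth, are assumptions rather than something defined in the cells above. The class and two of the example terms are repeated so the block loads on its own.

```haskell
-- Class repeated from the cell above.
class Symantics repr where
  int :: Int -> repr h Int
  add :: repr h Int -> repr h Int -> repr h Int

  z   :: repr (a, h) a
  s   :: repr h a -> repr (any, h) a
  lam :: repr (a, h) b -> repr h (a -> b)
  app :: repr h (a -> b) -> repr h a -> repr h b

-- Hypothetical printer: h and a are phantom types; the Int argument is the
-- number of enclosing binders, used to generate variable names.
newtype S h a = S { unS :: Int -> String }

instance Symantics S where
  int x     = S $ const (show x)
  add a b   = S $ \d -> "(" ++ unS a d ++ "+" ++ unS b d ++ ")"

  z         = S $ \d -> "x" ++ show (d - 1)  -- innermost bound variable
  s v       = S $ \d -> unS v (d - 1)        -- skip one binder
  lam e     = S $ \d -> "(\\x" ++ show d ++ " -> " ++ unS e (d + 1) ++ ")"
  app e1 e2 = S $ \d -> "(" ++ unS e1 d ++ " " ++ unS e2 d ++ ")"

-- Print a closed term, starting with no binders in scope.
view :: S () a -> String
view e = unS e 0

-- Example terms repeated from the cell above.
td1 :: Symantics repr => repr h Int
td1 = add (int 2) (int 3)

td3 :: Symantics repr => repr h ((Int -> Int) -> Int)
td3 = lam (add (app z (int 3)) (int 2))
```

The terms `td1` and `td3` are unchanged; only the instance chosen by `view` differs from the one chosen by `eval`.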
