Skip to content
This repository

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
branch: master
Fetching contributors…

Octocat-spinner-32-eaf2f5

Cannot retrieve contributors at this time

file 444 lines (332 sloc) 15.209 kb
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443

Background
=====================================================================

  As an exercise while reading through
Types and Programming Languages,
I decided to implement an interpreter
and typechecker for System F,
using HOAS (Higher-Order Abstract Syntax)
and Haskell's GADTs (Generalized Algebraic Data Types).
There were some really cute tricks
that made the implementation fairly simple,
so I decided to blog about it.

> {-# LANGUAGE GADTs, KindSignatures, FlexibleInstances, EmptyDataDecls #-}

> import Prelude hiding (succ)
> import Control.Applicative
> import Control.Monad

  I've left the module declaration out of this version
for the sake of space. See
[http://github.com/DanBurton/system-f](http://github.com/DanBurton/system-f)
for the module-ized version with few comments.


Preliminaries
=====================================================================

  I've provided a few synonyms around the Either type,
in the event that I might want to change the error handling
in the future. Basic stuff, really.

> type ErrOr a = Either String a

> good :: a -> ErrOr a
> good = Right

> err :: String-> ErrOr a
> err = Left

> isGood :: ErrOr a -> Bool
> isGood Right{} = True
> isGood Left{} = False

  This next function is a bit of an abombination.
I promise I won't abuse it too much.

> get :: ErrOr a -> a
> get (Right x) = x
> get (Left e) = error e


Data types
=====================================================================

  The language should provide some primitives.
Here I've provided Num primitives which correspond to
natural numbers, sort of. Primitive doesn't just have kind `*`,
it has kind `* -> *`, meaning primitives are parameterized over
some Haskell type. More on this later.

> data Primitive :: * -> * where
> Num :: Integer -> Primitive Integer
> Succ :: Primitive (Integer -> Integer)

  In retrospect, it is slightly confusing
that I named one of these constructors "Num".
This is a *value*, so Haskell doesn't confuse it with
the typeclass of the same name. This might cause problems
if I turned on some of the extra kind-wankery GHC provides.

---------------------------------------------------------------------

  Types are a little more interesting.
In System F, Types are also "values" in the language:
you apply type literals to type abstractions.

  There are three types that I provide here:
the Num type, the Function type, and the Type Abstraction type.
I use V as a pun for the "forall" symbol,
which represents type abstractions.
If additional primitives were added,
then this could be extended with relative ease
(e.g. CharTy, ListTy, etc).

  Notice how Types are also parameterized on Haskell types.
We'll talk about that soon... promise!

> data Type :: * -> * where
> NumTy :: Type Integer
> FunTy :: Type a -> Type b -> Type (a -> b)
> VTy :: (Type a -> Type b) -> Type (V a b)
> TyVar :: Char -> Type a

  Now, obviously I can't count,
because I said there were three types of types,
but in fact there are four. Well the fourth (TyVar)
is used purely as a hack for the purpose of printing
and equality testing. Let's check those out:

> instance Eq (Type a) where
> (==) = eqTy (['A' .. 'Z'] ++ ['a' .. 'z'])

  Here we are about to define equality on types
based on a helper function that takes a list of Chars.
A "forall" type is represented by a Haskell function from Type to Type,
so whenever we hit one of those, we will just pull a Char out of the list
and create a TyVar to hand into that function.
By handing the same Char to two different VTys,
we will be able to see whether or not the functions are equal.
There are some underlying assumptions going on here,
primarily that you are not constructing types using strange means.

> eqTy :: [Char] -> Type a -> Type a -> Bool
> eqTy _ NumTy NumTy = True
> eqTy cs (FunTy dom rng) (FunTy dom' rng') = eqTy cs dom dom' && eqTy cs rng rng'
> eqTy (c:cs) (VTy f) (VTy f') = eqTy cs (f (TyVar c)) (f' (TyVar c))
> eqTy [] _ _ = error "Congratulations, you've used up all of the characters. Impressive."
> eqTy _ (TyVar c) (TyVar c') = c == c'
> eqTy _ _ _ = False

  The Show instance is much the same.
I prefer my type variables to be X, Y, and Z when possible,
so the list of Chars starts with those.
Whenever we need to print out a Forall'd type,
we just pick a Char from the list, and use that TyVar hack.

> instance Show (Type a) where
> show = showTy ("XYZ" ++ ['A' .. 'W'])

> showTy :: [Char] -> (Type a) -> String
> showTy _ NumTy = "Num"
> showTy cs (FunTy dom rng) = "(" ++ showTy cs dom ++ " -> " ++ showTy cs rng ++ ")"
> showTy (c:cs) (VTy f) = "(∀ " ++ [c] ++ ". " ++ showTy cs (f (TyVar c)) ++ ")"
> showTy [] VTy{} = error "Too many nested type applications"
> showTy _ (TyVar t) = [t]

---------------------------------------------------------------------

  Here's a lonely little line of code.
This is the entire reason for the EmptyDataDecls pragma.
The Haskell type of a System F function abstraction is parameterized
on a Haskell type (as we will soon see). But type abstractions
are not parameterized by any pre-existing Haskell type.
Instead, they are parameterized on this V type.

> data V a b

---------------------------------------------------------------------

  Now for the exciting part, terms!
GADTs really shine here. (Well, they shine for Types too).

  There are five kinds of terms in System F:
primitives, function abstractions, function applications,
type abstractions, and type applications.
Look very carefully at the type signature for each of these constructors.

  A Primitive parameterized on Haskell type `a` turns into
a Term parameterized on Haskell type `a`.

  An Abstraction requires you to declare the input Type
and to provide a function from Term to Term. Notice how
these are parameterized on types `a` and `b` in a nifty way.

  An Application requires a Term parameterized on a function
from `a` to `b`, and a Term parameterized on `a`, and creates
a term parameterized on `b`.

  Type abstractions and applications are similar to those of functions.
The only differences are the use of the V type,
and the type abstraction does not require you to specify the type of the input.
(This implementation of System F does not support bounded quantification)
Now go back and review the type signatures for constructors of Type.
Makes sense, right?

> data Term :: * -> * where
> Prim :: Primitive a -> Term a
> Abs :: Type a -> (Term a -> Term b) -> Term (a -> b)
> App :: Term (a -> b) -> Term a -> Term b
> TAbs :: (Type a -> Term b) -> Term (V a b)
> TApp :: Term (V a b) -> Type a -> Term b
> Unknown :: Char -> Term a

  There's also that nifty little extra constructor: Unknown.
This is a hack analogous to TyVar; we'll see how they play together later.

  Now for another cute little trick:
getting ghci to also be the repl for our little System F language.
All you have to do is provide a Show instance for Terms
that evaluates its argument and discovers its type.

> instance Show a => Show (Term a) where
> show t = let v = get $ eval t
> ty = get $ typeOf t
> in show v ++ " : " ++ show ty

Using a GHC < 7.4, this Eq instance is required for what follows.
Implementing Eq on terms could be done in similar fashion to
Eq on Types, but I was too lazy to do it.

> instance Eq (Term Integer) where
> (==) = undefined

  Here's a related trick to make our System F even more convenient,
by defining `fromInteger = num` for `Term Integer`, we can now use
numeric literals as primitive values in our System F language.
`num` is defined near the end, along with a few other
language conveniences.

> instance Num (Term Integer) where
> fromInteger = num
> (+) = undefined
> (-) = undefined
> (*) = undefined
> abs = undefined
> signum = undefined
> negate = undefined


Evaluation
=====================================================================

  Now for evaluation. The code is almost insanely simple.
First, let's define `eval'`, which reduces terms as much as possible,
using a big-step approach.

> eval' :: Term a -> ErrOr (Term a)
> eval' (Prim p) = good $ Prim p
> eval' (Abs t f) = good $ Abs t f
> eval' (TAbs f) = good $ TAbs f
> eval' (App f x) = do
> f' <- eval' f
> res <- runApp f' <*> eval' x
> eval' res
> eval' (TApp f x) = do
> f' <- eval' f
> res <- runTApp f' <*> pure x
> eval' res

  Here for function and type applications,
we rely on helper functions `runApp` and `runTApp` respectively,
which will either hit an error, or produce an actual Haskell function.

  We'll also define a "full eval" function,
which creates an actual Haskell value out of evaluating a `Term`
(or produces an error).

> eval :: Term a -> ErrOr a
> eval t = eval' t >>= valueOf

---------------------------------------------------------------------

  Here I've only provided a way to extract an Int.
Functions are left as an exercise to the reader.

> valueOf :: Term a -> ErrOr a
> valueOf (Prim n) = fromPrim n
> valueOf _ = err "Not a value"

> fromPrim :: Primitive a -> ErrOr a
> fromPrim (Num n) = good n
> fromPrim _ = err "fromPrim failed unexpectedly"

---------------------------------------------------------------------

> runApp :: Term (a -> b) -> ErrOr (Term a -> Term b)
> runApp (Abs t f) = good f
> runApp (Prim p) = runAppPrim p
> runApp _ = err "unexpected non-abstraction used in application"

  runApp is simple. If it is a function abstraction,
just grab the Haskell function that was already provided.
If it is a primitive, then provide a primitive implementation.

> runAppPrim :: Primitive (a -> b) -> ErrOr (Term a -> Term b)
> runAppPrim Succ = good $ \(Prim (Num n)) -> num (n + 1)

  Here's something really cute about GADTs.
At first I had another case,
`runAppPrim _ = err "blah blah"`
which followed the `Succ` case.
But GHC warned me about overlapping patterns!
Since I gave this function the type `Primitive (a -> b) -> blah`,
GHC *knows* that the `Num` is not a possibility.

---------------------------------------------------------------------

  There are no primitives that are type abstractions,
so `runTApp` is even simpler than `runApp`.

> runTApp :: Term (V a b) -> ErrOr (Type a -> Term b)
> runTApp (TAbs f) = good f
> runTApp _ = err "runTApp failed unexpectedly"


Typing
=====================================================================

  Now for another fun part, type reconstruction!
Given a `Term`, we want to discover its `Type`.
But here's something really cool: the `Term` and the `Type`
have to be parameterized over *the same Haskell type*!
Basically, Haskell's type checker will prevent me from
writing my type checker incorrectly.

  I think it would actually be safe to remove the Error wrapping
around the result of this function, and always assume that it is correct,
because the Haskell type checker should prevent you from constructing
an ill-typed Term in the first place.

> typeOf :: Term a -> ErrOr (Type a)
> typeOf (Prim p) = good $ primType p
> typeOf (Abs t f) = FunTy t <$> typeOf (f (genTy t))
> typeOf (TAbs f) = good $ VTy (\x -> get $ typeOf (f x))
> typeOf (App f x) = do
> FunTy dom rng <- typeOf f
> t <- typeOf x
> if (t == dom)
> then good rng
> else err "function domain does not match application input"
> typeOf (TApp f x) = do
> VTy f' <- typeOf f
> good (f' x)
> typeOf (Unknown c) = good $ TyVar c

The types of primitives are predetermined

> primType :: Primitive a -> Type a
> primType Num{} = NumTy
> primType Succ = FunTy NumTy NumTy

  Now even more fun! In order to determine the type of a function abstraction,
we need to be able to inspect the "body" or "result" of that function.
However, since it is represented as a Haskell function, it is opaque to us!
Or is it? All we really have to do is give it some Term, any Term,
of the correct input type, and then look at the type of the result.

  So all we have to do is, given a Type, generate a Term of the correct Type.
Can we actually do that? Well, sure! Check it out:

> genTy :: Type a -> Term a
> genTy NumTy = num 0
> genTy (FunTy dom rng) = l dom (\_ -> genTy rng)
> genTy (VTy f) = TAbs (\x -> genTy (f x))
> genTy (TyVar c) = Unknown c

  First, notice the interplay between genTy and typeOf
when it comes to Unknown and TyVar. An Unknown c has type TyVar c,
and to get a value of TyVar c, just create an Unknown c! Cute.

  More seriously, this function is a testament to the awesomeness of GADTs.
Check this out:

    [haskell]
    num 0 :: Term Integer
    l foo (\_ -> genTy bar) :: Term (Foo -> Bar)

(l = Abs, see below)
genTy cannot possibly be well-typed, because these two expressions
have entirely different (and entirely concrete, non-polymorphic) types!

  ... and yet it is, and this is the real magic of parameterizing *both*
Types and Terms on Haskell types. Since NumTy is parameterized on `Integer`,
that means that the result of `genTy NumTy` must be a `Term Integer'.
But since `FunTy foo bar` is parameterized on `Foo -> Bar`,
that means that the result of `genTy (FunTy foo bar)` must be a `Term (Foo -> Bar)`.
So genTy, which would otherwise be impossible to type,
is, in fact, well typed! All thanks to (quite natural) use of GADTs and
Haskell's sexy types.

tl;dr - parametric polymorphism + GADTs = awesomesauce


Language primitives
=====================================================================

  These are just a few "primitives".
Here I use the term "primitive" to mean "you should actually write
System F expressions using these". Although with the Num typeclass hack,
`num` should be unnecessary.

> num = Prim . Num
> succ = Prim Succ

> v = TAbs
> l = Abs

> app = App
> tapp = TApp


Basic testing functions
=====================================================================

  Let's play around, defining a couple functions in System F.
You'll notice how verbose it is to perform function and type applications.
Brownie points to you if you write some Template Haskell quasi-quoting
that can prettify this. (Contribute it to the github repo linked at the top!)

A simple function that simply applies the primitive succ to its input

> succ' = l NumTy (\x -> app succ x)

The identity function. Given a type and an input of that type,
reproduce the input.

> id' = v (\t -> l t (\x -> x))

The const function. Given two types, and two inputs of those types,
reproduce the first.

> const' = v (\t1 -> v (\t2 -> l t1 (\x -> l t2 (\_ -> x))))

Given a function from X -> X, produce the function
that performs the original function twice on its given input.

> twice = v (\t -> l (FunTy t t) (\f -> l t (\x -> (app f (app f x)))))

Use the twice function on itself!

> fourTimes = v (\t -> app (tapp twice (FunTy t t)) (tapp twice t))

Example usage:

    [ghci]
    app (app (tapp twice NumTy) succ) 0

I wish this could be written in a more System F style:

    [haskell]
    [| twice NumTy succ 0 |]

----------------------------------------------------------------------

Here's a cool thing to check out. Go into ghci and try out the following:

    [ghci]
    :type const'
    typeOf const'

Cool, huh? The inferred Haskell type really captures a lot of the meaning.
Something went wrong with that request. Please try again.