jupytext | kernelspec | ||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
{{ video_embed | replace("%%VID%%", "CLi5RmgQ9Mg")}}
The problem we were having in the previous section was that we wanted to add code to two different modules, but that code needed to be parameterized on the details of the module to which it was being added. It's that kind of parameterization that is enabled by an OCaml language feature called functors.
**Why the name "functor"?** In [category theory][intellectualterrorism], a
*category* contains *morphisms*, which are a generalization of functions as we
know them, and a *functor* is map between categories. Likewise, OCaml modules
contain functions, and OCaml functors map from modules to modules.
The name is unfortunately intimidating, but a functor is simply a "function" from modules to modules. The word "function" is in quotation marks in that sentence only because it's a kind of function that's not interchangeable with the rest of the functions we've already seen. OCaml's type system is stratified: module values are distinct from other values, so functions from modules to modules cannot be written or used in the same way as functions from values to values. But conceptually, functors really are just functions.
Here's a tiny example of a functor:
module type X = sig
val x : int
end
module IncX (M : X) = struct
let x = M.x + 1
end
The functor's name is IncX
. It's essentially a function from modules to
modules. As a function, it takes an input and produces an output. Its input is
named M
, and the type of its input is X
. Its output is the structure that
appears on the right-hand side of the equals sign: struct let x = M.x + 1 end
.
Another way to think about IncX
is that it's a parameterized structure. The
parameter that it takes is named M
and has type X
. The structure itself has
a single value named x
in it. The value that x
has will depend on the
parameter M
.
Since functors are essentially functions, we apply them. Here's an example of
applying IncX
:
module A = struct let x = 0 end
A.x
module B = IncX (A)
B.x
module C = IncX (B)
C.x
Each time, we pass IncX
a module. When we pass it the module bound to the name
A
, the input to IncX
is struct let x = 0 end
. Functor IncX
takes that
input and produces an output struct let x = A.x + 1 end
. Since A.x
is 0
,
the result is struct let x = 1 end
. So B
is bound to struct let x = 1 end
.
Similarly, C
ends up being bound to struct let x = 2 end
.
Although the functor IncX
returns a module that is quite similar to its input
module, that need not be the case. In fact, a functor can return any module it
likes, perhaps something very different than its input structure:
module AddX (M : X) = struct
let add y = M.x + y
end
Let's apply that functor to a module. The module doesn't even have to be bound to a name; we can just write an anonymous structure:
module Add42 = AddX (struct let x = 42 end)
Add42.add 1
Note that the input module to AddX
contains a value named x
, but the output
module from AddX
does not:
:tags: ["raises-exception"]
Add42.x
It's tempting to think that a functor is the same as `extends` in Java, and that
the functor therefore extends the input module with new definitions while
keeping the old definitions around too. The example above shows that is not the
case. A functor is essentially just a function, and that function can return
whatever the programmer wants. In fact the output of the functor could be
arbitrarily different than the input.
In the functor syntax we've been using:
module F (M : S) = ...
end
the type annotation : S
and the parentheses around it, (M : S)
are required.
The reason why is that OCaml needs the type information about S
to be provided
in order to do a good job with type inference for F
itself.
Much like functions, functors can be written anonymously. The following two syntaxes for functors are equivalent:
module F (M : S) = ...
module F = functor (M : S) -> ...
The second form uses the functor
keyword to create an anonymous functor, like
how the fun
keyword creates an anonymous function.
And functors can be parameterized on multiple structures:
module F (M1 : S1) ... (Mn : Sn) = ...
Of course, that's just syntactic sugar for a higher-order functor that takes a structure as input and returns an anonymous functor:
module F = functor (M1 : S1) -> ... -> functor (Mn : Sn) -> ...
If you want to specify the output type of a functor, the syntax is again similar to functions:
module F (M : Si) : So = ...
As usual, it's also possible to write the output type annotation on the module expression:
module F (M : Si) = (... : So)
To evaluate an application module_expression1 (module_expression2)
, the first
module expression is evaluated and must produce a functor F
. The second module
expression is then evaluated to a module M
. The functor is then applied to the
module. The functor will be producing a new module N
as part of that
application. That new module is evaluated as always, in order of definition from
top to bottom, with the definitions of M
available for use.
The simplest syntax for functor types is actually the same as for functions:
module_type -> module_type
For example, X -> Add
below is a functor type, and it works for the AddX
module we defined earlier in this section:
module type Add = sig val add : int -> int end
module CheckAddX : X -> Add = AddX
Functor type syntax becomes more complicated if the output module type is dependent upon the input module type. For example, suppose we wanted to create a functor that pairs up a value from one module with another value:
module type T = sig
type t
val x : t
end
module Pair1 (M : T) = struct
let p = (M.x, 1)
end
The type of Pair1
turns out to be:
functor (M : T) -> sig val p : M.t * int end
So we could also write:
module type P1 = functor (M : T) -> sig val p : M.t * int end
module Pair1 : P1 = functor (M : T) -> struct
let p = (M.x, 1)
end
Module type P1
is the type of a functor that takes an input module named M
of module type T
, and returns an output module whose module type is given by
the signature sig..end
. Inside the signature, the name M
is in scope. That's
why we can write M.t
in it, thereby ensuring that the type of the first
component of pair p
is the type from the specific module M
that is passed
into Pair1
, not any other module. For example, here are two different
instantiations:
module P0 = Pair1 (struct type t = int let x = 0 end)
module PA = Pair1 (struct type t = char let x = 'a' end)
Note the difference between int
and char
in the resulting module types. It's
important that the output module type of Pair1
can distinguish those. And
that's why M
has to be nameable on the right-hand side of the arrow in P1
.
Functor types are an example of an advanced programming language feature called
*dependent types*, with which the **type** of an output is determined by the
**value** of an input. That's different than the normal case of a function,
where it's the output **value** that's determined by the input value, and the
output **type** is independent of the input value.
Dependent types enable type systems to express much more about the correctness
of a program, but type checking and inference for dependent types is much more
challenging. Practical dependent type systems are an active area of research.
Perhaps someday they will become popular in mainstream languages.
The module type of a functor's actual argument need not be identical to the
formal declared module type of the argument; it's fine to be a subtype. For
example, it's fine to apply F
below to either X
or Z
. The extra item in
Z
won't cause any difficulty.
module F (M : sig val x : int end) = struct let y = M.x end
module X = struct let x = 0 end
module Z = struct let x = 0;; let z = 0 end
module FX = F (X)
module FZ = F (Z)
{{ video_embed | replace("%%VID%%", "sCbUwQvNYJA")}}
The standard library's Map module implements a map (a binding from keys to values) using balanced binary trees. It uses functors in an important way. In this section, we study how to use it. You can see the implementation of that module on GitHub as well as its interface.
The Map module defines a functor Make
that creates a structure implementing a
map over a particular type of keys. That type is the input structure to Make
.
The type of that input structure is Map.OrderedType
, which are types that
support a compare
operation:
module type OrderedType = sig
type t
val compare : t -> t -> int
end
The Map module needs ordering, because balanced binary trees need to be able to
compare keys to determine whether one is greater than another. The compare
function's specification is the same as that for the comparison argument to
List.sort_uniq
, which we previously discussed:
- The comparison should return
0
if two keys are equal. - The comparison should return a strictly negative number if the first key is lesser than the second.
- The comparison should return a strictly positive number if the first key is greater than the second.
Does that specification seem a little strange? Does it seem hard to remember
when to return a negative vs. positive number? Why not define a variant instead?
```ocaml
type order = LT | EQ | GT
val compare : t -> t -> order
```
Alas, historically many languages have used comparison functions with similar
specifications, such as the C standard library's [`strcmp` function][strcmp].
When comparing two integers, it does make the comparison easy: just perform a
subtraction. It's not necessarily so easy for other data types.
[strcmp]: http://www.gnu.org/software/libc/manual/html_node/String_002fArray-Comparison.html
The output of Map.Make
supports all the usual operations we would expect from
a dictionary:
module type S = sig
type key
type 'a t
val empty: 'a t
val mem: key -> 'a t -> bool
val add: key -> 'a -> 'a t -> 'a t
val find: key -> 'a t -> 'a
...
end
The type variable 'a
is the type of values in the map. So any particular
map module created by Map.Make
can handle only one type of key, but is not
restricted to any particular type of value.
Here's an example of using the Map.Make
functor:
:tags: ["hide-output"]
module IntMap = Map.Make(Int)
If you show that output, you'll see the long module type of IntMap
. The Int
module is part of the standard library. Conveniently, it already defines the two
items required by OrderedType
, which are t
and compare
, with appropriate
behaviors. The standard library also already defines modules for the other
primitive types (String
, etc.) that make it convenient to use any primitive
type as a key.
Now let's try out that map by mapping an int
to a string
:
open IntMap;;
let m1 = add 1 "one" empty
find 1 m1
mem 42 m1
:tags: ["raises-exception"]
find 42 m1
bindings m1
The same IntMap
module allows us to map an int
to a float
:
let m2 = add 1 1. empty
bindings m2
But the keys must be int
, not any other type:
:tags: ["raises-exception"]
let m3 = add true "one" empty
That's because the IntMap
module was specifically created for keys that are
integers and ordered accordingly. Again, order is crucial, because the
underlying data structure is a binary search tree, which requires key
comparisons to figure out where in the tree to store a key. You can even see
that in the standard library source code (v4.12), of which the
following is a lightly-edited extract:
module Make (Ord : OrderedType) = struct
type key = Ord.t
type 'a t =
| Empty
| Node of {l : 'a t; v : key; d : 'a; r : 'a t; h : int}
(** left subtree, key, value/data, right subtree, height of node *)
let empty = Empty
let rec mem x = function
| Empty -> false
| Node {l, v, r} ->
let c = Ord.compare x v in
c = 0 || mem x (if c < 0 then l else r)
...
end
The key
type is defined to be a synonym for the type t
inside Ord
, so
key
values are comparable using Ord.compare
. The mem
function uses
that to compare keys and decide whether to recurse on the left subtree or right
subtree.
Note how the implementor of Map
had a tricky problem to solve: balanced binary
search trees require a way to compare keys, but the implementor can't know in
advance all the different types of keys that a client of the data structure will
want to use. And each type of key might need its own comparison function.
Although Stdlib.compare
can be used to compare any two values of the same
type, the result it returns isn't necessarily what a client will want. For
example, it's not guaranteed to sort names in the way we wanted above.
So the implementor of Map
used a functor to solve their problem. They
parameterized on a module that bundles together the type of keys with a function
that can be used to compare them. It's the client's responsibility to implement
that module.
The Java Collections Framework solves a similar problem in the TreeMap class, which has a constructor that takes a Comparator. There, the client has the responsibility of implementing a class for comparisons, rather than a structure. Though the language features are different, the idea is the same.
When the type of a key becomes complicated, we might want to write our own
custom comparison function. For example, suppose we want a map in which keys are
records representing names, and in which names are sorted alphabetically by last
name then by first name. In the code below, we provide a module Name
that can
compare records that way:
type name = {first : string; last : string}
module Name = struct
type t = name
let compare { first = first1; last = last1 } { first = first2; last = last2 }
=
match String.compare last1 last2 with
| 0 -> String.compare first1 first2
| c -> c
end
The Name
module can be used as input to Map.Make
because it satisfies the
Map.OrderedType
signature:
:tags: ["hide-output"]
module NameMap = Map.Make (Name)
Now we could use that map to associate names with birth years:
let k1 = {last = "Kardashian"; first = "Kourtney"}
let k2 = {last = "Kardashian"; first = "Kimberly"}
let k3 = {last = "Kardashian"; first = "Khloe"}
let nm =
NameMap.(empty |> add k1 1979 |> add k2 1980 |> add k3 1984)
let lst = NameMap.bindings nm
Note how the order of keys in that list is not the same as the order in which we
added them. The list is sorted according to the Name.compare
function we
wrote. Several of the other functions in the Map.S
signature will also process
map bindings in that sorted order—for example, map
, fold
, and iter
.
In the standard library's map.mli
interface, the specification for
Map.Make
is:
module Make (Ord : OrderedType) : S with type key = Ord.t
The with
constraint there is crucial. Recall that type constraints specialize
a module type. Here, S with type key = Ord.t
specializes S
to expose the
equality of S.key
and Ord.t
. In other words, the type of keys is the ordered
type.
You can see the effect of that sharing constraint by looking at the module type
of our IntMap
example from before. The sharing constraint is what caused the
= Int.t
to be present:
module IntMap : sig
type key = Int.t
...
end
And the Int
module contains this line:
type t = int
So IntMap.key = Int.t = int
, which is exactly why we're allowed to pass
an int
to the add
and mem
functions of IntMap
.
Without the type constraint, type key
would have remained abstract. We can
simulate that by adding a module type annotation of Map.S
, thereby
resealing the module at that type without exposing the equality:
module UnusableMap = (IntMap : Map.S);;
Now it's impossible to add a binding to the map:
:tags: ["raises-exception"]
let m = UnusableMap.(empty |> add 0 "zero")
This kind of use case is why module type constraints are quite important in effective programming with the OCaml module system. Often it is necessary to specialize the output type of a functor to show a relationship between a type in it and a type in one of the functor's inputs. Thinking through exactly what constraint is necessary can be challenging, though!
With Map
we saw one use case for functors: producing a data structure that was
parameterized on a client-provided ordering. Here are two more use cases.
Here are two implementations of a stack:
:tags: ["hide-output"]
exception Empty
module type Stack = sig
type 'a t
val empty : 'a t
val push : 'a -> 'a t -> 'a t
val peek : 'a t -> 'a
val pop : 'a t -> 'a t
end
module ListStack = struct
type 'a t = 'a list
let empty = []
let push = List.cons
let peek = function [] -> raise Empty | x :: _ -> x
let pop = function [] -> raise Empty | _ :: s -> s
end
module VariantStack = struct
type 'a t = E | S of 'a * 'a t
let empty = E
let push x s = S (x, s)
let peek = function E -> raise Empty | S (x, _) -> x
let pop = function E -> raise Empty | S (_, s) -> s
end
Suppose we wanted to write an OUnit test for ListStack
:
:tags: ["remove-cell"]
#use "topfind";;
#require "ounit2";;
open OUnit2;;
:tags: ["remove-output"]
let test = "peek (push x empty) = x" >:: fun _ ->
assert_equal 1 ListStack.(empty |> push 1 |> peek)
Unfortunately, to test a VariantStack
, we'd have to duplicate that code:
:tags: ["remove-output"]
let test' = "peek (push x empty) = x" >:: fun _ ->
assert_equal 1 VariantStack.(empty |> push 1 |> peek)
And if we had other stack implementations, we'd have to duplicate the test for them, too. That's not so horrible to contemplate if it's just one test case for a couple implementations, but if it's hundreds of tests for even a couple implementations, that's just too much duplication to be good software engineering.
Functors offer a better solution. We can write a functor that is parameterized on the stack implementation, and produces the test for that implementation:
:tags: ["remove-output"]
module StackTester (S : Stack) = struct
let tests = [
"peek (push x empty) = x" >:: fun _ ->
assert_equal 1 S.(empty |> push 1 |> peek)
]
end
module ListStackTester = StackTester (ListStack)
module VariantStackTester = StackTester (VariantStack)
let all_tests = List.flatten [
ListStackTester.tests;
VariantStackTester.tests
]
Now whenever we invent a new test we add it to StackTester
, and it
automatically gets run on both stack implementations. Nice!
There is still some objectionable code duplication, though, in that we have to write two lines of code per implementation. We can eliminate that duplication through the use of first-class modules:
:tags: ["remove-output"]
let stacks = [ (module ListStack : Stack); (module VariantStack) ]
let all_tests =
let tests m =
let module S = (val m : Stack) in
let module T = StackTester (S) in
T.tests
in
let open List in
stacks |> map tests |> flatten
Now it suffices just to add the newest stack implementation to the stacks
list. Nicer!
Earlier, we tried to add a function of_list
to both ListSet
and
UniqListSet
without having any duplicated code, but we didn't totally succeed.
Now let's really do it right.
The problem we had earlier was that we needed to parameterize the implementation
of of_list
on the add
function and empty
value in the set module. We can
accomplish that parameterization with a functor:
:tags: ["remove-output"]
module type Set = sig
type 'a t
val empty : 'a t
val mem : 'a -> 'a t -> bool
val add : 'a -> 'a t -> 'a t
val elements : 'a t -> 'a list
end
module SetOfList (S : Set) = struct
let of_list lst = List.fold_right S.add lst S.empty
end
Notice how the functor, in its body, uses S.add
. It takes the implementation
of add
from S
and uses it to implement of_list
(and the same for empty
),
thus solving the exact problem we had before when we tried to use includes.
When we apply SetOfList
to our set implementations, we get modules
containing an of_list
function for each implementation:
:tags: ["remove-output"]
module ListSet : Set = struct
type 'a t = 'a list
let empty = []
let mem = List.mem
let add = List.cons
let elements s = List.sort_uniq Stdlib.compare s
end
module UniqListSet : Set = struct
(** All values in the list must be unique. *)
type 'a t = 'a list
let empty = []
let mem = List.mem
let add x s = if mem x s then s else x :: s
let elements = Fun.id
end
module OfList = SetOfList (ListSet)
module UniqOfList = SetOfList (UniqListSet)
The functor has enabled the code reuse we couldn't get before: we now can
implement a single of_list
function and from it derive implementations for two
different sets.
But that's the only function those two modules contain. Really what we want
is a full set implementation that also contains the of_list
function. We can
get that by combining includes with functors:
module SetWithOfList (S : Set) = struct
include S
let of_list lst = List.fold_right S.add lst S.empty
end
That functor takes a set as input, and produces a module that contains
everything from that set (because of the include
) as well as a new function
of_list
.
When we apply the functor, we get a very nice set module:
module SetL = SetWithOfList (ListSet)
module UniqSetL = SetWithOfList (UniqListSet)
Notice how the output structure records the fact that its type t
is the same
type as the type t
in its input structure. They share it because of the
include
.
Stepping back, what we just did bears more than a passing resemblance to class
extension in Java. We created a base module and extended its functionality with
new code while preserving its old functionality. But whereas class extension
necessitates that the newly extended class is a subtype of the old, and that it
still has all the old functionality, OCaml functors are more fine-grained in
what they can accomplish. We can choose whether they include the old
functionality. And no subtyping relationships are necessarily involved.
Moreover, the functor we wrote can be used to extend any set implementation
with of_list
, whereas class extension applies to just a single base class.
There are ways of achieving something similar in object-oriented languages with
mixins, which enable a class to re-use functionality from other classes
without necessitating the complication of multiple inheritance.