Skip to content
Sindre Føring Devik edited this page Feb 12, 2014 · 4 revisions

The type system

This section details the type system of Opa, including:

  • definition of types
  • polymorphism and rank 2 polymorphism
  • interactions with packages and modules
  • type error messages
  • type best practices

Type system overview

Strong typing

Opa is a statically and strongly typechecked language. This means that the type of an expression is computed once and for all at compile-time, an expression has one unique type and this type can't be used as another one. In Opa, any expression evaluates into a value and functions are first class values. This means that, like integers, booleans, strings in other languages, functions in Opa can be passed as arguments, returned as results, stored in data-types, and so on.

Type inference

Opa compiler offers type inference, i.e. it automatically determines the type of expressions from the context, without the user needing to write these types. Hence, a definition like

x = 5

will lead x to have the type int because the integer literal 5 is known to be of type int, this type representing the type of integer values.

As a bit more complex example, the function

function chose(cond, if_true, otherwise) {
  if (cond) { if_true } else { otherwise }
}

will be found having the type

(bool, int, int) -> int

where bool represents the type or boolean values and -> is the function taking arguments of the types stated on the left side and returning a result of the type stated on the right side of the arrow symbol.

Polymorphism

The type system of Opa features polymorphism, i.e. types in which some parts are not constrained and can be of any type. For instance, the identity function:

function identity(x) { x }

doesn't set any constraint on it argument's type: it can safely be applied to any kind of value and will return this kind of value. Its type will then be 'a -> 'a where 'a is a type variable, i.e. any type will do for 'a. Using identity with an integer as argument will then instantiate 'a with int. Using identity with a boolean will then instantiate 'a with bool.

Records and sums

In addition to usual data-types (integers, booleans, string, floating point numbers), Opa provides two cornerstone structures: records and sums. A record can be seen as a struct in C, i.e. a non-ordered sequence of fields packed into one global entity. However, conversely to C these records can be either closed or open: the list of fields can be fixed or can contain at least the ones stated in the record type. A sum is a set of possible cases, each case being a record. Hence, a sum allows to express the fact that a data-structure can be of different shapes. This can be compared to union in C, with the difference that the effective shape of the data is directly embedded inside its representation.

Type abstraction

In the Opa system, type abstraction, i.e. ability to hide the actual structure of a type is provided thanks to the notions of private and abstract types. The first ones are types that only have an existence in the package where they are defined; they are not visible outside this package. The second ones are transparently visible (i.e. their structure can be manipulated) only in the package where they are defined; outside this package, they become opaque, i.e. are still visible but their internal structure is not visible anymore.

Modularity

Opa provides a certain notion of module, i.e. packages containing value definitions via a particular form of records. Such records are syntactically marked as being modules and will get a particular type called type forall. Under some restrictions due to typechecking decidability, it is possible to use these modules like regular record, hence passing them through functions or returning them as result.

Type inference: restriction

Type inference is not complete for some parts of Opa features and a few type annotations are required from the user. This is mostly due to the expressiveness of the language that provides very flexible features at the price of the non-existence of a principal type. Existence of a principal type means that there is always one better type, i.e. all the other types that could be chosen for an expression would reduce the contexts where this expression can be used. In such a case, the user will have to indicate himself the type he wishes using a type annotation to enforce the expression to be of this type. This construct is called a coercion. On the other side, coercion by type annotations may also be required when the typechecker infers a cyclic type. In effect, such types are internally used by the typechecker but are not currently allowed outside in the language.

Type algebra ------------

As previously stated, expressions evaluate to values of certain types. The type algebra describes the shape of these types. The present section doesn't explains how types are synthesized from expressions even if some examples are given. We rather address here an intuitive presentation of the types.

Constant types

Natively, Opa provides three basic types:

  • int : The type of signed integer values (e.g. 0, -4).
  • string : The type of characters strings (e.g. "this is a string").
  • float : The type of signed floating point values (e.g. 0.489, -5.94).
### Type variable

As stated in introduction, type variables are used to represent types with not particular constraint, i.e. types we don't need to know about or we don't know yet their effective internal structure. Type variables are written by a leading quote followed by regular identifier, for instance: 'a, 'myvar5.

Type variables act in two particular parts of the type system.

  • During type inference, they are used by the typechecker to temporarily give types to expressions that at a certain point do not exhibit the need to have a particular type.
  • At the end of type inference of a definition, type variables that still remain unconstrained are generalized. This means that the typechecker discovered that there is no need to know more about the structure of the types they represent and hence decides to make the type polymorphic: any type will be suitable in place of these variables. Hence, in the type of a definition (not expression), all finally appearing type variables are implicitly quantified by a for all qualifier, telling that the definition having this type has all the possible type schemes where type variables can be instantiated (replaced) by any type. In other words, a type scheme is a model, a pattern of types, describing a family of possible types.

This last point raises an important concept:

  • Expressions have a type.
  • Definitions have a type scheme.

In other words:

  • In an expression a type variable is just an unknown that may one day be known equal to some type.
  • In a definition a type variable is a universally quantified type. Each instance, usage of the identifier bound by this definition is free to replace these variables by types it needs in its context.

Function type

A function is characterized by the type of its arguments and the type of its result. The type constructor for a function is the arrow symbol: ->. On the left part the types of the arguments are listed, comma-separated if there are several arguments. On the right part is the type of the result of the function.

Examples:
  • -> int : Function taking no argument and returning an integer.
  • int -> bool : Function taking one argument of type integer and returning a boolean.
  • int, string -> string : Function taking two arguments, the first one being of type integer, the second of type characters string and returning a characters string.
  • int, (int -> bool), int -> bool : Function taking three arguments, the first and third ones being of type integer, the second one being a function taking an integer and returning a boolean. Finally, the function returns a boolean. Note the presence of parentheses enclosing the function type of the second argument.

Named type

Types can be inductively combined to form complex data-types and bound to names thanks to type definitions. Hence, the following extract defines a new type name natural, whose structure is int.

type natural = int

Once this type is defined, we may use type name natural to denote this new type.

Types can have parameters. For instance, the type of pairs with their two components being of a same type can be defined as:

type same_comp_pair_ty('a) = ('a, 'a)

A pair denoted by the type same_comp_pair_ty(float) will have its two components carrying floating point numbers.

The type of pairs with components of any types can be defined as:

type same_comp_pair_ty('a,'b) = ('b, 'a)

Note that the order of the type variables 'a and 'b is different in the quantification (i.e. left-part of the definition, on the left of the = sign) and in the body of the type definition. As a result, this means that pairs having type same_comp_pair_ty(int, bool) will carry integers in their second component and booleans in their first component.

Finally the type of relational operators (i.e. functions taking two arguments of the same type and returning a boolean value) can be defined as:

type relational_ty('a) = 'a, 'a -> bool

Following this definition, the type of relational operators testing integers is relational_ty(int).

Hence, a named type is an identifier followed by as many types comma-separated and enclosed between parentheses as the corresponding type definition has parameters.

See also the section dealing with type definitions.

### Sum type

Sums are a sequence of cases, each case being a record type.

A record is a sequence of typed fields. In a record, accessing a fields will yield a value of the type of this field. Sums as record can be either closed of open. A closed record type (respectively sum) enumerates exactly the fields it contains (respectively cases for a sum). An open record type (respectively sum) enumerates the fields (respectively cases for a sum) it may contain at minimum, and allows to have more if needed. In case of open record or sum, the type ends by a trailing ellipsis, otherwise it is silently closed.

An open sum ends by a column variable, i.e. a kind of type variable that can be instantiated by another sum. In other words, this column variable allows the sum to gain more sum cases.

An open record ends by a row variable, i.e. a kind of type variable that can be instantiated by another record (also called a row). In other words, this row variable allows the record to gain more fields.

In error messages (where they need to be clearly named), column variables (resp. row variables) are denoted by 'c.a, 'c.b, etc (resp. 'r.a, 'r.b, etc). In other cases, when types are printed and the identity of such variables doesn't matter, they are both written "...".

Here follow some examples of correct record types :

  • { } : Empty record type. It contains no field.
  • { int x, float y } : Closed record containing exactly the two fields x of type int and 'y of type float.
  • { int x, ... } : Open record type containing at least a field x of type int and possibly any other fields of any type.

Sums are a form of sequence of records. They can be non-disjoint, meaning that some records forming cases may overlap.

IMPORTANT:

A strong restriction imposes that an open sum must not host any case that is an open record and any (closed or open) sum must not host several cases that are open records. Moreover, if several cases (i.e. records) of a sum share a same label, this label must have the same type in each cases. Finally, if a sum has several cases, none of them must be open.

Here follow some correct sum types examples:

  • { int a } or { float b } : Closed sum containing two cases, one being a record having exactly one field a of type int, the second begin a record containing a record having exactly a field b of type float.
  • { int a } or ... : Open sum containing at least one case being a record having exactly a field a of type int and possibly any other cases.
  • { int a } or { int a, float b } : Closed sum containing exactly two cases, one being a record containing exactly one field a of type int, the second being a record containing exactly two fields, a of type int and b of type float. We see in this example that fields can be common to several cases as long they keep the same type. For example, having a of type bool in one of the cases would be ill-typed.
  • { int a } or { int a, float b } or ... : Open sum type similar to the previous example. It may contains additional other cases compared to the previous example.
  • { int a } : Closed sum with only one case. It can be hence seen as a record (i.e. like a struct a la C) exactly containing one field a of type int.
  • { int a, ... } : Closed sum with only one case. This case is open hence can be seen as a record containing at least one field a of type int. Any other record containing additional fields is compatible with this type.
### Type forall

We saw in section Type variable the difference between types and type schemes. Variables present in type schemes are universally quantified, hence a type scheme represent a family of types. A type forall embeds a type scheme inside a type and represents the type of a module field. Such a type is written by prefixing its body by the keyword forall followed by the list of polymorphic type variables of the type between parentheses followed by a "." (dot) then the body of the type. Type-forall are in fact a way to introduce a flavor of rank-2 polymorphism in Opa type system.

Rank-2 polymorphism is a system in which a forall quantifier may not appear to the left of more than 2 arrows (when the type is drawn as a tree). To illustrate this concept, consider the function:

function f(x) { x }
_ = (f (true), f(2))

variables bound with a definition (sometimes called let-definition may be polymorphic, as in f above. That's the reason why f can be applied to two different types, bool and int. However, in a rank-1 polymorphism system, function arguments may not be:

function g(f) {
  (f(true), f(2))
}

results in an error:

> *Error*
> Function was found of type bool -> 'a but application expects it to be of
> type int -> 'b.
> Types int and { false } or { true } are not compatible

In a rank-2 polymorphism system, we can give g a signature like (forall a. a -> a) -> (bool, int). This is called a rank 2 type, because a function argument is polymorphic, as indicated by the forall quantifier. Now the function g may be applied to expression whose generalized type is at least as general as that declared.

Modules are particular records in which each field type gets generalized. This way, a module can be seen as a package of definitions grouped in the module rather than laying at toplevel.

When a module field definition appears to be polymorphic, its type is turned into a type forall where all generalizable variables are universally quantified. The type of the module is hence a closed record type whose fields names are the fields definitions names with their related type turned into a type forall if they are polymorphic. Hence, types forall are closely related to module fields types and can only be created via module expressions.

The following program clearly show the difference between the type of r which is a regular record and the one a m which is a record containing types forall for fields having a polymorphic type.

r = {
  function id(x) { x },
  function pair(x) { (x, x) },
  one: 1
}
> r : {('v0 -> 'v0) id, ('v1 -> tuple_2('v1, 'v1)) pair, int one} or ... =
>  { id: <fun>, one: 1, pair: <fun> }
m = {{
  function id(x) { x },
  function pair(x) { (x, x) },
  one: 1
}}
> m : {(forall('v0). 'v0 -> 'v0) id,
>           (forall('v1). 'v1 -> tuple_2('v1, 'v1)) pair,
>           int one } =
>      { id: <fun>, one: 1, pair: <fun> }

In the record type of r, no type forall appear even if the type scheme of r itself is polymorphic ('v0 and 'v1 are generalized). In the record type of m, since it is a module, all fields having a polymorphic type get generalized and turned into forall types. One can notice that one being not polymorphic, it is not turned into a forall type.

IMPORTANT:

Types forall are explicitly created by the developer when he uses a module construct: there is hence no automatic type forall creation. Conversely, when a type forall appears in an expression, it is automatically instantiated, hence leading to a type in which generalized variables are replaced by fresh variables and universal quantification disappears (in other words, the type forall becomes a type non-forall again).

Type algebra -- formal grammar

For sake of completeness, we give here a presentation of the type algebra like usually described in theoretical approaches. This section is mostly dedicated to advanced users or people familiar with type systems design. A more complete description of this theoretical presentation is available in the technical and architecture documents of Opa.

Type
τ ::= `int`              Integer constant type
    | `bool`             Boolean constant type
    | `string`           String constant type
    | `float`            Floating point numbers type
    | `void`             Trivial (i.e. having 1 unique constructor) constant type
    | α                  Type variable
    | t ( τ )            Named type constructor (possibly parameterized)
    | τ `->` τ           Function type
    | κ                  Sum type
    | ∀ α, ρ, δ . τ      Type forall
Row
φ ::= •                  Closed row
    | ρ                  Row variable
    | l : τ, φ           Field sequence
Sum
κ ::= •                  Closed column
    | ζ                  Column variable
    | { φ } or κ         Case (record) sequence

Typing a program

Now that the form of Opa's types has been presented, we will address how Opa programs are typechecked. The aim of this section is to understand the type to expect for each construct of the language. This presentation is split in two parts, one dealing with type definitions and one dealing with expressions.

### Typing type definitions

Type definitions allow to create new data-structure from previously existing ones or allow to alias, i.e. provide a name as a shortcut for cumbersome type expressions. Both usages follow the same syntax and rules.

Structure of a type definition

A type definition is introduced by:

[directive] type [directive] ident[(parameters, ...)] = type expression

where ident is the name of the created type, parameters is a comma-separated enumeration between parentheses of the type variables parameterizing the definition (or nothing if the definition is not parameterized), and type expression is the body of the defined type. We will see directives in the next section.

Once defined, the new type name will be compatible with itself and any type expression compatible with the body of its definition. Hence, following the definition type t = int, the type t can be used anywhere an int is expected and reciprocally.

Directives in type definitions

Directives allow to alter the visibility of a type definition to ensure abstraction. Two directives are available: private and abstract.

private makes the type definition only visible in the package where it it defined. Outside this package, the type is no more visible : it doesn't exist for other packages. Such types are typically internal to a package and serve only to implement functionalities without appearing in the interface of the package. As a consequence, they must not escape outside the package, i.e. must not appear in the signatures of values visible outside the package. To make this possible, values manipulating such types must also be declared private, using the private directive also available for value definitions. If a private type appears in the signature of an exported value, an error is raised, advising to make this value private, like in the following example:

package private_ty_escape2
private type prv_string_ty = string
vv = "" : prv_string_ty
> *Error*
> Description:
> Definition of vv is not private but its type contains a type private to the
> package. Type prv_string_ty must not escape its scope.
> Hint:
>   Add a private directive on this definition.

abstract makes the type definition transparently visible inside the package where it is defined, i.e. its internal structure can be accessed and manipulated, but outside this package it turns opaque: the type is still visible (i.e. it still exists) but its internal structure is not visible anymore.

Such types are typically data-structures implemented in a package, provided to other packages that must not directly manipulate the effective implementation of the data-structure and instead use the API provided by the package implementing the data-structure. This is a kind of abstract data type (ADT), which permits to preserve consistency and internal invariants required by the implementation of a data-type. The basic idea is that the definition package "knows" how to safely manage the type, provides a safe API to the outer "world", hence one gets sure that nobody will come and "hack by hand", hence break, invariants required by the data-structure to keep consistent. Outside its definition module, an abstract type loses its internal representation, hence becomes compatible only with itself.

package p1
abstract type u = int
function u to_u(int x) { x }
function int from_u(u x) { x }
// Legal since we are in the defining package: we see u's internals.
_ = 1 + ((u) 1)

package p2
import p1
// Obtain a value of type u from the integer 0.
v = to_u(0)
_ = 1 + v

Because the last line of code of package p2 tries to rely on u's internal structure although it is not the package defining it, u being turned abstract can't be compatible with int anymore, hence leading to the error:

> *Error*
> Function was found of type 'a, 'a -> 'a but application expects it to be of
> type int, u -> 'b.
> Types int and u are not compatible

Parameters in type definitions

Parameters allow to define types polymorphically, in order to instantiate them later by providing effective type arguments to the type name in future type definitions or type expressions. Let's examine the following sample code.

type fun_t('a, 'b) = 'a -> 'b
type int_string_fun_t = fun_t(int, string)

function twice_general(f , i) {
  f(i) + f(i)
}
> twice_general : ('v0 -> int), 'v0 -> int = <fun>

function twice(fun_t(int, int) f, i) {
  f(i) + f(i)
}
> twice : fun_t(int, int), int -> int = <fun>

fun_t is defined as the type of functions taking one argument an returning a value, both of them not being fixed in the definition. Hence the type fun_t is partially defined and will need to be instantiated with effective arguments in further usages.

Next comes the definition of int_string_fun_t using fun_t and specializing this latter with int for both the types of the argument and returned value of the function. By this definition, we used the general fun_ty to create a more specialized type definition.

Let's now consider the definition of twice_general taking a function, a parametric value and adding twice the result of applying this function to this value. The type of `` (internally int, int -> int) enforces the result of `f` to be `int` but no constraint applies on the type of the argument of `f`. Hence `f` has type `'v0 -> int`, and by consequence `twice_general` also has a parameterized type `('v0 -> int), 'v0 -> int`.

Using the type definition int_string_fun_t, we can define a function similar to twice_general, i.e. with the same body but with a more restricted type: twice. Hence, by adding a type coercion on the parameter f of twice, we can enforce it to be of type fun_t(int, int) hence forcing the parameter of f to be int instead of leaving it free like in twice_general.

NOTE:

Note that both definitions of fun_t and int_string_fun_t do not really create new types. They are aliases, shortcuts for a type expression. Instead of using these 2 defined type names, we could directly use their body but this would be less handy. Creation of really new types is achieved by defining sums and records as described below.

Using type expressions in a type definition body

The basic type expressions that can be used allow to represent the types described in the Opa type algebra.

Basic constant named types

int, bool, string, float, void.

type t = int defines t as an alias on the type of integers int.

Function type: ->
  • type t1 = int -> float defines t1 as the type of functions taking an int and returning a float.
  • type t2 = -> float defines t2 as the type of functions taking no argument and returning a float.
  • type t3('a, 'b) = 'a -> 'b defines t3 as the type of functions taking an argument and returning a value whose types are not fixed and are parameters of the type definition. Hence, t3(int, float) specializes these parameters and lead to a type compatible with the above type t1.
  • type t4 = int, string -> bool defines t4 as the type of functions taking two arguments (of types int and string) and returning a bool.
Named type

Type definitions bind type expressions to names. So defined type names can be on their turn used to form new type definitions. If a type name represents a parameterized type, it must be instantiated providing effective type arguments according to its arity (i.e. number of required arguments).

  • type t5 = t1 -> t2 defines t5 as the type of functions taking a parameter of type t1 and returning a result of type t2, both types being those presented in the above paragraph.
  • type t6 = t3(bool, int) defines t6 using the previously defined type t3 and applying it to bool and int. As a consequence, t6 is the type of functions taking an argument of type bool and returning a value of type int.
Sum

A sum type is an enumeration of cases separated by or. Each case is a record i.e. an enumeration between braces of labels with their type.

type t7 = { int A } or { string B, bool C } defines t7 as a new sum type containing two cases. The first one is a record containing a field A of type int; The second one is a record containing two fields, B of type string and C of type bool. Values of this type can be pattern matched to discriminate their effective form and access the particular fields of each case.

Sum type definitions always lead to closed sum, i.e. the resulting type doesn't ends by a "..." (column variable). This simply means that when defining a type, the definition simply describes all the cases this type can host.

As a shortcut, when a field of record has type void, its type specification can be ommited. Hence,

type color = { Red } or { Green } or { Blue } defines color as a sum with three cases, each of them implicitly of type void. This is equivalent to { Red : void } or { Green : void } or { Blue : void }. The idea in such a definition is that the fields of the record are self-sufficient and do not need to carry additional information to be precise enough.

type pair('a, 'b) = { 'a first, 'b second } defines pair as a parameterized sum with only one case. This case is a record containing two fields, one per component of the pair. Since the pair data-structure is naturally polymorphic, this type definition has in effect 2 parameters 'a and 'b.

IMPORTANT:

Remember, as already state while presenting Opa type algebra, in the sum types section, that a strong restriction imposes that an open sum must not host any case that is an open record and any (closed or open) sum must not host several cases that are open records. Moreover, if several cases (i.e. records) of a sum share a same label, this label must have the same type in each cases. Finally, if a sum has several cases, none of them must be open.

Record

A record type is defined like (and compatible with) a sum type with only one case. This way, by default a record type is a closed row in an open column.

Type forall

Type-forall being strongly related to modules, they are defined using a syntax close to modules, allowing to avoid the explicit quantification of type variables. In fact, type-forall are not directly written, they are introduced on need through module types. Any type variable appearing in the type of a field of the definition and not bound outside will be implicitly considered as quantified.

type t = {{ identity : 'a -> 'a }} defines t as a module type (i.e. a record) in which the field identity has the type-forall forall('a).'a -> 'a.

Note that variables not bound outside the module type have a scope local to the field's type where they appear. For instance, in type t = {{ ('a -> 'a) identity, 'a val }}, 'a represents 2 different variables in identity and val.

Variables bound outside the module type are not quantified, hence not participating into type-forall creation. For instance in type t('a) = {{ ('a -> 'a) id, 'b v }} -> 'a the type variable 'a being a type parameter is bound outside the module type, hence is not quantified. By consequence, the type of the field identity is no more a type-forall. Conversely, 'b being not bound remains quantified and introduces a type-forall. This also shows that type-forall are introduced only when polymorphism exists: there exists no trivial type-forall, i.e. no type-forall with an empty set of quantified variables.

A few syntactic extensions allow shortcuts to make life easier.

Sum sugar

Additionally, a syntactic shortcut allows to merge several sum types definitions to create a new definition containing the cases of all the initial types. In the same idea as when creating a sum type, i.e. separating its different cases by a or, the names of types to merge are simply separated by or.

type t = { A } or { B }
type u = { C }
type v = { D }
type w = t or u or v
_ = (u) { C }
> _do_ : u = { C: void }

In the above sample code, w finally stands for the type definition { A } or { B } or { C } or { D } and expression { C } can be coerced into w although it is initially a value of type u. You may note the _ = expr construct which allows evaluating the expression expr then throw its result value. Since this value is not bound to any identifier, Opa names it _do_ in its feedback messages.

Tuples

Tuples are generalization of pairs. A pair is a data-structure with 2 components: a tuple has an arbitrary number of such components. Two tuples types are compatible if their components are compatible 2 by 2. This especially means that 2 compatible tuples types must at least have the same number of components. Syntactic sugar allows to write tuples types as comma-separated sequence of types enclosed between parentheses.

type t10 = (int, bool, int, int) defines t10 as a tuple with 4 components, all being of type int except the second one.

type t11('a) = ('a, 'a, 'a) defines t11 as a tuple with 3 components all being of the same type, parameter of the type definition. Using t11 in a type expression with bool as effective argument will hence represent the type of a tuple with 3 components all being booleans.

NOTE:

Opa standard library already proposes numerous data-types like lists, maps, sets, etc which may be directly used in developments, avoiding the need to implement again the related common and widespread algorithms.

Typing expressions

In this section, we will review each Opa construct and explain which way its type is computed. In some cases, we will see that inference is not complete and type annotations may be added by the developper to help the typechecker in its job.

Constant expressions

Constant expressions are given their natively defined type:

  • Integer values: int
  • Boolean values ({ true } and { false }): bool
  • Characters strings literals: string
  • Signed floating point values: float
  • The trivial type value ({}): void
Examples:
i = 5
> i : int = 5
f = 3.14159
> f : float = 3.14159
s = "foo"
> s : string = "foo"
v = {}
> v : {} or ... = void
b = true
> b : bool = { void true }

First, notice the apparition of a construct ident = ... we will detail deeper later. For the intuition, let's simply say it binds (attaches the result of its right-side expression to the name ident). This allows to name values. Be aware that this is different from the notion of assignment "à-la C" which permits to modify in place the value stored in the memory cell touched by the assignment. The difference between a binding and an assignment refers to the notion of mutability, i.e. the possibility to alter the value represented by the identifier. A binding links an identifier once for all to a value. Some kind of values are mutable, this is the way assignments are obtained aside bindings.

Note in the above sample code that the void and boolean values have been typed as sums, not as named types void and bool. This is due to the fact that type inference is structural: named types are introduced by type annotations. Types bool and void are just abbreviations on the structural types {} and { void true } or { void false }. If no explicit type annotation imposes to see these void and boolean values as void and bool, the structural types induced by their structures remain the primarily ones inferred.

#### Functions

A function may have arguments but always return a value. Hence, defining a function requires to define its possible arguments and write its body, i.e. the expression building the returned value. The form of a function expression is:

[parameters, ...] -> expr

Parameters of the function are bound by the definition, hence are only available in the function's body. Obviously, to typecheck the body of the function, we will need the types of its arguments. Type schemes bound to identifiers (function parameters or identifiers introduced by the identifier binding construct) are recorded in an internal structure we call here a typing environment acting as the memory of already seen identifiers' types.

Typechecking a function expression is performed in 3 steps:

  1. First, each parameter is given a temporary unknown type: a trivial type scheme (i.e. with no generalized variables) whose body is a type variable. This especially means that a function parameter can't be used polymorphically inside the body of the function.
  2. The body expr is then typechecked in the typing environment extended by the function's parameters, hence leading to the type τ of the returned value.
  3. Finally, an arrow type is built, grouping the types of the parameters (that were possibly instantiated because of type constraints found while typechecking the function's body) on the left-side of the arrow and setting the returned type on the right-side, finally giving a type of the form τ_1, ..., τ_n -> τ.
Examples:
_ = x -> x + 1
> _do_ : int -> int = <fun>

First, notice the _ = ... notation that allows to throw the result of the expression laying on its right-side. This is a particular form of identifier binding, a construct described in detail later. For the moment, the intuition to keep is that it binds the result of the expression to nothing.

In this sample code, x is inserted in the typing environment with a type variable, say 'a (what we previously called an unknown type). The body of the function is then typechecked. Even is we didn't present yet the typechecking of function application, driven by the intuition, the operator + expects two integers and returns an integer. Hence, to be correctly typed, the parameter x of the function must be of type int i.e. the type variable 'a previously introduced for x gets instantiated by int. Now, 1 also has type int, so the operator + is used in a consistent typing context, returning a value of type int which is the global type of the function's body. Finally, the whole function type is built having 'a now being int on the left-side of the -> and int on its right-side, giving int -> int.

_ = x, y -> x + 1
> _do_ : int, 'v0 -> int = <fun>

This second example shows a function with 2 arguments. As previously, each of them is entered in the typing environment with a type variable before typechecking the body of the function. This body is the same as in the previous example, hence it doesn't use the y parameter of the function. As a consequence, no constraint applies on y and its type remains a variable. As a final result, grouping the types of the arguments on the left-side of the -> and the result type on its right, we obtain the final type int, 'v0 -> int.

_ = -> "constant"
> _do_ : -> string = <fun>

In this last code snippet, since the function doesn't have any argument, its body is typechecked in the unmodified typing environment. The string literal "constant" representing the body of the function had type string which is the type of its returned value. Since there is no argument, there is no type to set on the left-side of the final -> type, only the result type will appear on its right-side, leading to the function type -> string.

Application

Application is the construct allowing to "use" a function by providing it some effective arguments. In Opa, an application expression is denoted by a functional expression on the left-side and the enumeration comma-separated and between parentheses of the expressions being effective arguments to pass to the function:

expr1([expr2, ...])

If the function requires no argument, then the enumeration between the parentheses is empty, however, the parentheses still remain required. Note that there must not be any blank between the functional expression and the opening parenthesis. Typechecking an application is performed in 4 steps:

  1. First arguments expressions expr2, ... are typechecked, leading to as many types τ_2, ....
  2. Next, the functional expression expr1 is typechecked. It is then expected to lead to a functional type with as many arguments' types as there are effective arguments and returning a value of a certain type: τ. Hence, the expected type must have the form τ'_2, ... -> τ.
  3. Each effective argument's type τ_i is checked against the corresponding τ'_i of the type inferred for the function to ensure they are compatible.
  4. Finally, the type of the application is the result type of the function, i.e. τ
Examples:
function f(x, y) { x + y + 1 }
> f : int, int -> int = <fun>
_ = f(2, 4)
> _do_ : int = 7

To begin, simply ignore the typechecking of the function f, we will have a look at it later. Simply accept it has type int, int -> int and let's examine the second expression. The arguments expressions 2 and 4 have type int. We then typecheck the functional expression which is the identifier f. Even if we don't have described typechecking of identifiers yet, we already introduced the notion of typing environment keeping trace of types of identifiers (see Functions). Looking-up in this environment we find that f has type int, int -> int. Hence, we see that the types of the effective arguments provided in the application are compatible with the expected types arguments of f (i.e. int). Hence the result type is the one on the right-side of f's type: int.

To go further, we can study how f is typechecked since its body contains the application of the operator +. Like previously described for functions, x and y are entered in the typing environment both with temporary type variables, say 'a and 'b. Then the body of the function which is an application must be typechecked. We then first type the arguments, i.e. x and y that have types 'a and 'b. Assuming that + has the obvious type int, int -> int we must now ensure that the types of the effective arguments are compatible with the type of +. This means that 'a must be int and same thing for 'b: the two type variables get then instantiated, i.e. made equal to int. The result type of the application is then the result type of +, i.e. int. We iterate exactly the same process to typecheck the addition with the constant 1 which finally gives type int. Hence, int is the type of f's body, so the returned type of f. As a consequence, f is inferred the final type int, int ->int.

#### Identifier binding

The construct

ident = expr1
expr2

allows to evaluate expr1, bind its value to ident and evaluate expr2 in which ident possibly appears and will be replaced by its bound value. Identifier binding introduces polymorphism in Opa: once the type of the expression to bind to the identifier is computed, all the type variables remaining in it with no constraint are considered polymorphic and generalized. As a consequence, in identifier binding expressions, identifiers obtain not a type but a type scheme (see also the presentation of type variables) that can be polymorphic.

Typechecking this construct processes in 3 steps:

  1. First, expr1 is typechecked, leading to a type τ1.
  2. Then τ1 gets generalized: all type variables without any constraint are generalized and universally quantified to lead to a type scheme. This type scheme is then bound to the identifier ident in the typing environment.
  3. Finally, expr2 is typechecked in the extended typing environment, giving a type τ2 which is the final type of the whole expression.
Examples:
v =
  one = 1
  one + one
> v : int = 2

The identifier one is bound to the value 1 which has type int. Then it is used in an addition. The operator + expects 2 integers and returns an integer. Hence, since one was found of type int, it can safely be used with the addition, hence this latter expression leads to the final type int.

By the way, we can note a special identifier binding construct, the toplevel one that binds the identifier v. In its toplevel flavor, this construct doesn't have any trailing expression (we previously called expr2). This allows to bind identifiers at the outermost level of a program, i.e. to give it a scope (a life duration) spanning on all the rest of the program.

pair =
  function identity(x) { x }
  (identity(1), identity("foo"))
> pair : tuple_2(int, string) = { f1: 1, f2: "foo" }

The above code snippet introduces 2 features. First, the shortcut for defining a function by directly enumerating its arguments. Instead of writing f = x, y -> 4, it is possible to write function f(x, y) { 4 }. Second, we make use of the built-in syntactic sugar to represent tuples whose components are enclosed between parentheses and comma-separated.

Hence, the identifier identity gets bound to a function type directly returning its argument. Clearly, such a function can take any kind of value and will return this kind of value. The only constraint on the type of this function is that it returns the same type it got for its argument, hence giving the type 'a -> 'a. This type can then be generalized giving identity the polymorphic type scheme ∀ 'a.'a -> 'a. The function is then invoked twice, once with an int, once with a string, which is legal since it has a polymorphic type. Hence, the results of these invocations have types int and string and are grouped in a tuple, tuple_2(int, string) which is the final type of this expression.

#### Recursive identifier binding

Similarly to the identifier binding construct, the recursive flavor

rec ident_1 = expr_1
...
rec ident_N = expr_N
expr

allows to bind each ident_i to the value of expr_i but with all the ident_x possibly appearing in the bodies, expr_x, of the bound expressions. This hence allows (mutually) recursive definitions. Intuitively, this implies that to compute the type to bind to ident_i, we must compute the type of all the expressions expr_x for which we may need to already known the types bound to the ident_x.

Note that toplevel definitions are automatically reordered, i.e. it is possible to use a function and define it later and if needed, definitions are implicitly made recursive. Hence, the rec construct mostly serves to define local recursive definitions, i.e. definitions not spanning at toplevel (or in other words, definitions nested in other definitions).

Typechecking this construct processes in 5 steps:

  1. First, each ident_i is temporarily given a unknown type: a trivial type scheme (i.e. with no generalized variables) whose body is a type variable. This especially means that a recursively defined identifier can't be used polymorphically in its own definition. The typing environment gets extended, binding each ident_i to its related temporary type scheme.
  2. Then each expr_i is typechecked in the extended typing environment, leading to a type τ_i.
  3. Now the bodies of the bindings are typechecked, each τ_i and the related temporary type scheme bound to ident_i (which have possibly be instantiated by some type constraints) are checked to ensure they are compatible.
  4. If so, each τ_i is then generalized and becomes the type scheme bound to ident_i in the typing environment.
  5. Finally, expr_n+1 is typechecked in the extended typing environment, giving a type τ_n+1 which is the final type of the whole expression.

Value identifiers

As we previously introduced (see Functions), the typing environment is a memory keeping trace of types inferred and bound to identifiers (bound by a function definition or an identifier binding construct). More accurately, it contains, not the types, but the type schemes bound to identifiers since Opa supports polymorphism.

Typechecking an identifier expression aims to find a type for this identifier and is performed in two steps.

  1. First a looking-up is performed in the typing environment to find the type scheme entered for this identifiers. Providing that the identifier was actually defined, it das obviously been typechecked and bound a type scheme in this environment.
  2. The obtained type scheme must be transformed into a type. Remember that a type scheme is a model, a pattern of types, describing a family of possible types. Generalized (i.e. universally quantified) type variables in the scheme represent parts that can be instantiated at need by any type expression. In order to allow the identifier to be used with different types in place of the type variables of its scheme, the scheme gets specialized, i.e. the type for this occurrence of the identifier is built by copying the scheme's body, replacing all generalized type variables by fresh (i.e. not appearing anywhere in the program) type variables.
Examples:
function id(x) { x }
> id : 'v0 -> 'v0 = <fun>
one = 1
> one : int = 1
hello = "Hi"
> hello : string = "Hi"
_ = (id(one), id(hello), id(id))
> _do_ : tuple_3(int, string, 'v1 -> 'v1)

In this sample code, the interesting part deals with the 3 applications used to build the tuple expression. We see that id was entered in the typing environment with the polymorphic type scheme ∀ 'a.'a -> 'a.

Hence when typechecking id(one), the type scheme of one is fetched, which is not polymorphic and whose body is int. Hence, the occurrence of one is given the type int. The type scheme of id is then fetched and instantiated: the quantified type variable 'a is replaced by a fresh one, say 'b leading to type 'b ->'b (note there is no more ∀ since this is a type, not a type scheme). Application then makes this 'b instantiated into int and leading to the final application type int for the first component of the tuple.

The second component of the tuple is processed the same way, the type scheme of id being fetched again in the environment and specialized again by replacing 'a by another fresh type variable, say 'c. The type scheme of hello is fetched and is not polymorphic, leading to type string. As a consequence, not surprisingly the type of the second component of the tuple is string.

Finally remains the third and last component of the tuple. The type scheme of id is fetched twice: one first time for the argument of the application, one second time for the functional part of it. Each occurrence of id is then given a type obtained by specialization of the scheme, each one having fresh type variables in place of the ones generalized in id's scheme, leading to types 'd ->'d for the functional usage of id and 'e ->'e for id used as argument. Then typechecking of the application imposes that 'd and 'e->'e must be the same type. The result of the application expression being 'd, from the constraint we have just seen, we obtain that id(id) has type 'e->'e, which is the type of the third component of the tuple.

Records and sums

A sum expression consists in the record representing the case of sum, i.e. an enumeration (possibly empty) between braces and semi-separated of the field labels with their related values that represent the sum case:

{ [label: expr or ...] }

Such an expression always lead to an open sum type containing a closed record. The reasons for closed / open are the following:

  • Since the record enumerates all the fields of the case, this means that this record has no more fields in this expression. For this reason, it must be closed because it contains no other fields than thoses explicitly stated in the expession.
  • Since the expression represents one case of sum, its type at least contains this case, but possibly some other cases. Hence it must be open to accept being compatible with these possible future other cases.

Typechecking of a record expression is performed in 3 steps:

  1. Each field expression expr_i is typechecked, leading to as many types τ_i.
  2. A closed record type is then built, binding each label_i to its related found type τ_i. This record being closed means that it doesn't end by a row variable, hence won't be compatible with record containing other fields than those explicitly specified.
  3. Finally, an open sum type is created, plugging the above record type as one of its case. This sum being open means that it ends by a column variable, hence means that the type of this expression contains at least this case, but may be more if needed.

Pattern-matching

Pattern-matching is the construct allowing to discriminate by cases on the structure of a value. According to the syntax:

match (expr) {
  case case_1: expr_1
  case case_2: expr_2
  ...
  case case_n: expr_n
  default: expr_def
}

the expression expr is matched against each pattern in sequence, stopping on the first one that matches. This fitting between the value and the pattern may induce bindings of pattern variables. The environment is then extended by these bindings and the right-side part expression of the matching case is then evaluated as the result of the whole expression.

As a consequence, the matched value and all the patterns must have a same type. Because any right-side part expression of the cases can be returned, they must also have a same type.

More accurately, typechecking of this construct is performed as follows. First, the matched expression is typechecked, leading to a type τ. Then, each pattern is typechecked leading to types τ_1, ... τ_n that are checked to be compatible together and with τ. While typechecking patterns, pattern variable types are inferred and reminded for later. At this point, a special operation is performed on catchall patterns ("_", or pattern variables) having a sum type. In effect, a catchall pattern absorbs any other value not already matched by previous patterns. In the case of sum types, this means that the column type representing this sum can have any other cases, hence is open. For this reason, sum types appearing in the type of a catchall pattern get recursively open (i.e. opening is done in depth). Now the type of the matched expression and patterns is computed, each right-side part expression of cases is typechecked in the environment extended by bindings of its related pattern variables (previously recorded). As a result, each branch gets a type τ'1, ..., τ'n that are checked to be compatible. The resulting type of the whole expression is then the type of the right side parts expressions of the pattern-matching.

Below follow some examples showing pattern-matching typechecking.

function f(x) {
  match (x) {
    case 0: false
    case 1: false
    default: true
  }
}
> f : int -> bool =  <fun>

Typechecking the matched expression x doesn't impose any particular constraint. Hence x is assigned a type variable. Then, each pattern is typechecked and appears to be int. Now, each right-side part is typechecked leading to type bool which is the return type of the function. The last pattern is a catchall, but since it doesn't have a sum type, opening has no effect. Since this function takes x as parameter and this latter was inferred the type int, the final type of this function is as expected int -> bool.

function g1(x) {
  match (x) {
    case { a: 1 }: 0
    case { b: bval }: bval
  }
> g1 : {int b} or {int a} -> int = <fun>

Typechecking the patterns, it can be seen that the matched expression must be of a type sum containing either { a : int } or { b : 'a }. At this point, the pattern variable bval is not yet constrained to any particular type from the current context, so it remains a type variable. Hence, the type of the matched expression must be, a priori, {b : 'a} or {a : int}. The first right side part is then typechecked leading to type int. The second one, thanks to the previously recorded binding of bval to 'a, has type 'a and must be compatible with int. Hence 'a must be instantiated by int. As a result, the matched value must have type {b : int } or {a : int} and the result of the pattern-matching must have type int. Finally, we have a catchall pattern, the variable bval, but since its type is not a sum type, opening has no effect and types are not changed. Hence, the function g1 is inferred of type {b : int } or {a : int} -> int.

function g2(x) {
  match (x) {
    case { a: 1 }: 0
    case { b: bval }: bval
    default: 42
  }
}
> g2 : {int b} or {int a} or ... -> int = <fun>

This example only differs from the previous one by the addition of a final catchall case. Typechecking is done in a similar way as in the previous example. The only difference is that because the catchall pattern is related to a sum type ({b : int} or {a : int}), opening columns has now an effect. The type of the matched value finally gets open, under the form {b : int} or {a : int} or ... to reflect the fact that this pattern-matching can handle sums with a case { a }, a case { b } or any other case.

function g3(x) {
  match (x) {
    case { a: { c } }: { d }
    case { b: bval }: bval
  }
}
> g3 : {{d} b or ...} or {{c} a} -> {d} or ... = <fun>

More subtle, the above example shows that while typechecking the patterns, the catchall pattern variable bval is not yet constrained, hence remains a type variable 'a. If we have a look "in the future", we will see that right-side parts impose that 'a is instantiated by a sum type. However, this is not yet known and the opening operation has no effect on the type currently bound to bval. The type of right-side parts is now inferred. The first one imposes to have a sum type containing a case { d }, hence has type { d } or ... (remember that a record expression is by default inferred as plugged in an open column). The second one imposes now 'a to be instantiated by the former type. This instantiation of 'a propagates both in the types of the argument and of the result of the function g3, stating that this pattern-matching can in fact handle sums with a case { a : { c } } and a case { b } itself having a sum type containing "d or any other case". This difference then appears in the global type of the function g3 since it shows that it returns a sum with either a case a or any other case being the embedded value of its argument when it is a sum of the form { b = ... }.

function g4(x) {
  match (x) {
    case { a: 1, b: { c: 3 } }: 0
    case { a: 1, b: { c: 4 } }: 0
    default: 0
  }
}
> g4 : {int a, ({c : int} or ...) b} or ... -> int = <fun>

This example demonstrates the opening of sums due to catchall patterns operating in depth. In effect, the final "_" pattern is bound to a sum type {int a, {int c} b}, i.e. a sum whose first case has a field b also having a sum type. Since this catchall "absorbs all the other cases" of this sum type, it absorbs others cases than { a, b } but recursively also " the other cases than { c } in the type of b". For this reason, both sum types get open in the argument of the function (note the apparition of the 2 column variables or ...).

Pattern-matching and records

By default, when typechecking patterns, if all the patterns lead to a same record type, the inferred type is a sum type with only one case being this record type:

function g5(x) {
  match (x) {
    case { a: 1, b: "" }: x
    case { a: 2, b: "A" }: x
    default: x
  }
}
> g5 : {int a, string b} or ... -> {int a, string b} or ... = <fun>
v = g5({ a: 1, b: "" })
> v : {int a, string b} or ... = { a: 1, b: "" }
_ = v.a
> _do_ : int = 1

Since such a sum only has one case, it remains compatible with a record as shows the creation of v then its field access in the above sample code. However, it is possible to add constraints enforcing the matched expression to have a pure record type, i.e. a closed sum. For instance, in:

function g6(x) {
  match (x) {
    case { a: 1, b: "" }: x.a
    case { a: 2, b: "A" }: 1
    default: 2
  }
}
> g6 : {int a, string b} -> int = <fun>

the field access performed on x in the right-side part of the first case imposes x to be a record (see section dealing with field access), i.e. to be plugged into a closed column. This appears in the type of g6 since is has any more column variable in spite of the catchall pattern. In effect, opening operation is performed after typechecking the patterns and before typechecking right-side parts. Hence, the field access construct enforce closing again the column that was open due to the final catchall pattern. This is sound because if the sum was remained open, the following example

function g6bug(x) {
  match x {
    case { a: 1, b: "" }: x.a
    case { a: 2, b: "A" }: 1
    default: x.a
  }
}

would also have type {int a, string b} or ... -> int which would be unsound because it would mean that in the default case, when none of the 2 first patterns matches the tested value, we return its field x. Passing this function a value of any form not compatible with the 2 first patterns and not containing a field x, for instance, { c: 1 }, would fall in this default case and since no field a is present, the field access would fail.

Pattern-matching and elipsis

A special form of pattern, the elipsis written "..." can be used in record patterns involved with both sums and records. However, it currently has 2 different meanings depending on the context and these 2 meanings may sometimes conflict. Elipsis is used in patterns under the form:

match (x) {
  case { a: 1, ... }: 1
}

The first meaning of the elipsis is "this pattern is an open record". In other words, "this pattern matches records with at least all the specified fields, all other ones being of no interest". For instance:

function h1(x) {
  match (x) {
    case { a: v, ... }: v + 1
  }
}
> h1 : {int a, ...} -> int = <fun>
_ = h1({ a: 3, b: "useless", c: 3.14159 })
> _do_ : int = 4

states that the function h1 takes as argument a record having at least a field a of type int and ignores all other possible fields of the record actually passed.

The second meaning of the elipsis deals with sum cases in which only some fields of each case are useful and means "this pattern matches a case of a sum containing the specified fields that are useful and the other fields of this case are of no interest but I don't want to enumerate all of them". For instance

type t =
  { int a, int b, int c, int d } or
  { int e, int f }
> type t = {int a, int b, int c, int d} or {int e, int f}
function h2(x) {
  match (x : t) {
    case { a: a_val, ... }: a_val
    case { f: f_val, ... }: f_val
  }
> h2 : t -> int = <fun>

defines a sum type with 2 cases and several fields inside. The function h2 performs a pattern-matching on a value of this type but is only interested in one of the fields of each case. To avoid the tedious enumeration of all the other useless cases, elipsis is used to let the typechecker determine the remaining part of the fields of each case.

However, this 2-sided aspect of the elipsis introduces tricky behavior when the context doesn't permit to guess if the elipsis is used in the context of a record or a sum type. Without enough information, the elipsis is implicitly considered as meaning an open record type. This means that this creates, for the pattern, a record type ended by a row variable. When unifying the type of patterns and matched expression to ensure they are compatible together, if the open record types of 2 patterns get unified, they will be along the record and not along the sum. In other words, 2 open records will join their fields in a unique final type although 2 closed records with incompatible fields will join into a sum type with these 2 records as cases. Hence,

function h3(x) {
  match (x) {
    case { a: 1, ... }: 0
    case { b: 2, ... }: 0
  }
}
> h3 : {int a, int b, ...} -> int = <fun>

having 2 open cases, infers for them { int a, ... } and { int b, ... }, that get unified into {int a, int b, ...}, giving this possibly surprising type to h3. In fact, the point is that there is not enough information in the context to guess if these 2 patterns represent 2 different "interesting" fields of a record or represent 2 cases of a sum in which we don't want to enumerate useless fields of each case. The difference with the previous example is that in the previous one, there was a type constraint on the matched expression bringing the information that the matched value had a sum type. With this extra information, the typechecker "knew" that patterns were related to a sum type and could complete the omitted fields of each case instead of considering them as 2 records to merge together.

As we stated above, the elipsis is implicitly considered as meaning an open record type hence creates a type ended by a row variable. If when checking that patterns have compatible types unification fails unifying along the record, i.e. merging the fields of the record, then it tries to unify along the column, i.e. creating a sum case with the 2 records as cases. Consider

function h4(x) {
  match (x) {
    case { a: 1, ... }: 0
    case { b: "true" }: 0
    case { a: _ }: 0
  }
}
> h4 : {int a} or {string b} -> int = <fun>

the types inferred for each pattern is { int a, ... } or ..., { string b } or ... and { int a }. Unifying the 2 first types can't succeed along the row because the second record type is closed, then can't acquire the field a. Hence unification is performed along the column, leading to a sum type {int a, ...} or {string b} or .... Now unification is done between this type and the one of the last pattern, { int a }. These 2 types can be unified since {int a, ... } and { int a } can be, instantiating and closing the ending row variable. Since there is no catchall pattern, the resulting sum is not open, hence doesn't show any column variable.

CAUTION:

What happens if both row and column variables appear ? The answer is that in this case, it may be possible to unify types along row or columns, i.e. to merge fields of records or to catenate records as sum cases. In some cases, there is no better choice and hence no principal type. This weakness of the type system is known, as well as the fact that unification "prefers" (first tries) to unify along the rows. That's some reasons why some type annotations are sometimes required to get out of tying errors.

Let's consider the following example:

function h5(x) {
  match (x) {
    case { a: 1, ... }: 0
    case { b: "true" }: 0
    default: 0
  }
}

The first pattern has type { int a, ... } or ..., the second one has type { string b } or ... and the third has type 'a. When unifying the 2 first types to ensure they are compatible, the only solution is to unify along the columns, hence leading to a sum type: { int a, ... } or { string b } or .... Unification with the last type only instantiates 'a. Because there is a catchall pattern, all sum types remain open, hence the global type of the patterns is finally { int a, ... } or { string b } or .... We are exactly in presence of a type that can be unified either along its column or row variable, hence in presence of possibilities of non-existence of principal type. In such a case, an error is raised by the typechecker:

> Sum type with row and column variables
> { string b } or { int a, 'r.a } or 'c.a
> Error
> Exportation failure

It is worthy to note that at this point, the typechecker detected an issue while working with a richer type algebra that the one available to state the signatures. In effect, this algebra temporarily tolerates sums with row and column variables but is only internally used by the typechecker. Error messages related to the internal type algebra print types a bit differently.

CAUTION:

Because a sum type represents an enumeration of cases, possibly open, but with each case having a fixed and finite set of fields, sums with cases containing row variables are also not legal types. For this reason, even if the richer and internal type algebra used by the typechecker temporarily tolerates such types, once typecheking ends such types, if they escape are rejected and don't fit the public type algebra in which signatures of the values are stated. Hence, the following example:

function h6(x) { match (x) { case { a: 1, ... }: 0 case { b: "true" }: 0 } }

Closed sum type with row variable(s) { string b } or { int a, 'r.a }## Error Exportation failure

is rejected since the first pattern has type { int a, ... } or .... The second one has type { string b } or .... Hence, unifying these 2 types gives { int a, ... } or { string b } or .... Since there is no catchall pattern, the sum doesn't remain open hence leading to type { int a, ... } or { string b }. This type is a sum and it actually has a case being a row ended by a variable.

Binary conditional

Binary conditional is introduced under the form:

if expr_1 then expr_2 else expr_3

where expr_1 is the tested condition, expected to be of type bool, expr_2 is the result value in case the condition holds (i.e. the evaluation of expr_1 yields true) and expr_3 is the result value in case the condition doesn't holds. Since this construct can return the value of either expr_2 or expr_3, both expressions must have the same type. Hence, typechecking of this construct if performed in 3 steps:

  1. The condition expression expr_1 is typechecked and it is checked to be of type bool.
  2. The expression expr_2 and expr_3 are typechecked, leading to types τ_2 and τ_3.
  3. Types τ_2 and τ_3 are checked to be compatible, i.e. are unified in a common type τ and if so, the type of the whole expression is then τ.
Example:
ok = if (true) { 1 } else { 2 }
> ok : int = 1
reok = if (1 != 2) { {A} } else { {B} }
> reok : {{} A} or {{} B} or ... = { A: void }
ko = if (1) { {A} } else { {B} }
> *Error*
> Matched expression or previous patterns have type int but new pattern is
> found of type { true } or 'c.a.
reko = if (false) { 1 } else { "one" }
> *Error*
> Previous right-side parts of the pattern matching return type int but current
> one returns type string.
> Types int and string are not compatible
test(c, default) = if (c) { "Holds" } else { default }
> test : bool, string -> string = <fun>

In the first example, the condition expression true is correctly of type bool, then both 1 and 2 have type int hence the whole expression is well-typed and of type int.

In the second example, assuming that the comparison operator "different" != has type 'a -> 'a -> bool, its application to 2 integers is well-typed and of type bool. Hence the condition expression is well-typed. The "then" part has type { A } or ..., the "else "part has type { B } or ..., hence the common type obtained by unification of these 2 types is { A } or { B } or ... which is the type of the whole expression.

The third example shows that because the condition expression 1 has type int and not bool the whole expression is ill-typed.

The next example shows that because both branches of the binary conditional do not have the same type (string versus int), even if the condition expression is of the correct type bool, the whole expression is ill-typed.

Finally, the last example exhibits the constraints imposed by the binary conditional construct, making them appearing in the type of the function using it. In effect, since the tested condition is the first parameter, c of the function test, this enforce this parameter to be of type int. Moreover, since the "then" branch has type string and since both branches are assumed to have the same type, the "else" part must also have type string, which requires default, the second parameter of test to be of type string, which is also the type of the whole conditional expression. From these constraints, we actually find that test has type bool, string -> string.

NOTE:

Internally, the binary conditional construct is transformed in a pattern-matching on boolean values. In other words, if c then t else e is rewritten as

match (c) { case { true }: t case { false }: e }

That's the reason why, in case of typechecking error on a binary conditional, one sometimes get a message involving pattern-matching as we could see in some above examples.

Modules

Opa provides a "light" notion of modules through a special form of records that are written between double-braces instead of simple-braces. Hence, a module expression has the form:

{{ [label: exp, ...] }}

The main feature brought by modules compared to records is the ability to have each field separately generalized and all of them implicitly mutually recursive. Hence, a module can be seen as a record grouping several definitions instead of leaving them spanning at toplevel.

Since modules are a sort of records they are first-class values, i.e. can be passed to functions, returned as result, stored in data-structures, etc. However their type is not a simple record type. In effect, since each field, i.e. definition of the module can be generalized, the type of the definitions are especially processed before being part of the module type: if they are polymorphic then their type is turned into a type-forall (see types forall) to mimic a type scheme. The final type of a module is them a closed record type in which labels are bound to their respective type (possibly being type-forall), embedded in a closed sum (i.e. the sum has the record as unique case).

Typechecking of a module is performed in 4 steps:

  1. Like for record expressions, each field expression expr_i is typechecked and like for a recursive identifier binding (see dedicated section), fields are considered as implicitly mutually recursive. As a result, we obtain as many types τ_i.
  2. Each type τ_i is inspected, any unconstrained type variables get generalized and a type-forall τ'_i is built quantifying these variables and whose body it τ_i. The related module field is then assigned this new type τ_i. However if no such type variable exist in τ_i, then no change is done and the module field keeps its initial type τ_i.
  3. A closed record type is then built, binding each label_i to its related found type.
  4. Finally, an closed sum type is created, plugging the above record type as its unique case.
Examples:
M = {{
  function plus(x, y) { x + y }
  function f(x) { x }
}}
> M : { f : forall('v0). 'v0 -> 'v0, plus : int, int -> int} or ... = { f: <fun>, plus: <fun> }

As we stated before, the type of a module is a closed record in a closed sum. The reason for having a closed record is that, like any record, a module expression enumerates all the fields this module has, and it has no more fields that those making its definition. There are 2 reasons for having a closed column. First, modules are not sum types: they do not present "different cases". Next, having an ending column variable would make it unconstrained, hence generalized. In such a case, when creatign sub-modules, the type of the sub-module would have a generalizable column variable and would naturally turn itself into a type-forall which is not the expected type for a module.

This feature, pretty related to rank 2 polymorphism, however introduces some usage restrictions. The first important point is that modules are explicitly introduced: when the syntactic construct {{ }} is used, a module is to be created. The second point to remind is that instantiation of type-forall is automatic: each time an expression is found having a type-forall, it is immediately specialized, replacing its generalized variables by fresh ones and removing the forall quantification (i.e. the obtained type is an instance of the type-forall's body).

Let's examine a simple example.

// Definition of a module type.
type Mty = {{ ('a -> 'a) f, (int -> int) g }}
> type Mty = { (forall('a). 'a -> 'a) f, (int -> int) g}

// Coercion to a module type
M = {{ function f(x) { x }, function g(x) { x + 1 } }} : Mty
> M : Mty = { f: <fun>, g: <fun> }
_ = M.f(5)
> _do_ : int = 5
_ = M.f("foo")
> _do_ : string = "foo"

The first type definition introduces a module type, automatically adding the forall-types on polymorphic fields. This avoid having to explicitly state them in the type de finition. Next a module M is defined and its signature gets constrained by Mty. Note that this constraint is not mandatory but set here to show how to constrain a module and to avoid getting a verbose structural type like {(forall('a). 'a -> 'a) f, (int -> int) g} = { f: <fun>, g: <fun> } we would get otherwise. Module fields access is performed like for record, via the dot-notation. The point to note is that M.f is a forall-type, hence during the application to the expression 5, M.f appears to have type forall('a). 'a -> 'a. As stated above, this forall-type get automatically instantiated into a fresh type 'b -> 'b, and the type of the whole expression is int as expected. Following this point, when applied a second time, to the expression "foo", this type is instantiated again into another fresh type 'c -> 'c, hence preserving the polymorphism and allowing to apply M.f to a string, returning the expected string final type.

An important remark is that module types are strict and there is no implicit instantiation of explicit quantification. Automatic instantiation is only done when encountering a forall-type as a type, not as a subterm-of a type. Hence, coercion cannot be used to restrict the type of a module:

// Rejected.
_ = {{ function f(x) { x }, function g(x) { x + 1 } }} : {{ f : int -> int, g : int -> int }}
> *Error*
Expression has type { f: !for all 'a . 'a -> 'a, g: int -> int } but is
coerced into { f: int -> int, g: int -> int }.
Types !for all 'a . 'a -> 'a and int -> int are not compatible
Hint:
  Error occurred through field f.

// Ok.
_ = {{ function f(int x) { x }, function g(x) { x + 1 } }} : {{ f : int -> int, g : int -> int }}
> _do_ : {f : int -> int, g : int -> int} = { f = <fun>, g = <fun> }

// Rejected.
_ = M : {{ f : int -> int, g : int -> int }}
> *Error*
Expression has type Mty but is coerced into { f: int -> int, g: int -> int }.
Types int -> int and !for all 'a . 'a -> 'a are not compatible
Hint:
  Error occurred through field f.

// Ok.
_ = {{ f = M.f : int -> int, g = M.g }} : {{ f : int -> int, g : int -> int }}
> _do_ : {f : int -> int, g : int -> int} = { f = <fun>, g = <fun> }

// Ok.
_ = {{ function f(int x) { M.f(x) }, function g(int x) { M.g(x) } }} : {{ f : int -> int, g : int -> int }}
> _do_ : {f : int -> int, g : int -> int} = { f = <fun>, g = <fun> }

Modules can be returned as results of functions, hence providing the mean to create functors. For instance:

type Mty2 = {{ f : 'a -> 'a }}
> type Mty2 = {f : forall('a). 'a -> 'a}
M2 = {{ function f(x) { x } }} : Mty2
> M2 : Mty2 = { f = <fun> }

// A functor.
F(m : Mty2) = {{
  function a(x) { m.f(x) }
  function b(x, y) {
    { f1: m.f(1)
    , f2: m.f(x)
    , f3: m.f("string")
    , f4: m.f(y)
    }
  }
}}
> F : Mty2 ->
>     {a : forall('v0). 'v0 -> 'v0,
>       b : forall('v0, 'v1, rows:, cols:...).
>             'v0, 'v1 -> {f1 : int, f2 : 'v0, f3 : string, f4 : 'v1} or ...} = <fun>

// Application of the functor.
M3 = F(M2)
> M3 : {a : forall('v0). 'v0 -> 'v0,
>      b : forall('v0, 'v1, rows:, cols:...).
>            'v0, 'v1 -> {f1 : int, f2 : 'v0, f3 : string, f4 : 'v1} or ...} = { a = <fun> , b = <fun> }

_ = M3.a(1)
> _do_ : int = 1
_ = M3.b("toto", 18.26)
> _do_ : {f1 : int, f2 : string, f3 : string, f4 : float} or ... =
>  { f1: 1 , f2: "toto" , f3: "string" , f4: 18.26 }

However, an important consequence of module types strictness is that a module cannot be passed to a function expecting a less general module or a record, for instance:

f(m) = {{
  function a(x) {
    { f1: m.f(x)
    , f2: m.f(365)
    }
  }
}}
> f : {f : int -> 'v0, ...} ->
>  {a : forall(, rows:, cols:...). int -> {f1 : 'v0, f2 : 'v0} or ...} = <fun>

_ = f(M)     // Rejected because M is too general.
> *Error*
> Function was found of type
> { f: int -> 'a, 'r.a } ->
> { a: !for all 'c.a . int -> { f1: 'a, f2: 'a } or 'c.a } but application
> expects it to be of type Mty -> 'b.
> Types int -> 'a and !for all 'c . 'c -> 'c are not compatible
> Hint:
>  Error occurred through field f.

// Ok because M.f is instantiated.
_ = f({ f = M.f })
> _do_ : {a : forall(, rows:, cols:...). int -> {f1 : int, f2 : int} or ...} = { a = <fun> }

// Rejected.
_ = f({{ f = M.f }})
> *Error*
> Function was found of type
> { f: int -> 'a, 'r.a } ->
> { a: !for all 'c.a . int -> { f1: 'a, f2: 'a } or 'c.a } but application
> expects it to be of type { f: !for all 'b . 'b -> 'b } -> 'c.
> Types int -> 'a and !for all 'b . 'b -> 'b are not compatible

// Ok.
_ = f({{ f = M.f : int -> int }})
> _do_ : {a : forall(, rows:, cols:...). int -> {f1 : int, f2 : int} or ...} = { a = <fun> }
#### Field access ("dot" access)

This construct allows to access the value of a record field following the form expr.field_name. As a consequence, it is expected that expr has a record type in which the field field_name exists with a given type τ. The type of the whole expression is then this type τ.

type t('a) = { 'a a }
> type t('a) = { 'a a }
x = { a: 3.14159 } : t(float)
> x : t(float) = { a: 3.14159 }
_ = x.a
> _do_ : float = 3.14159
_ = x.b
> *Error*
Record has type t(float) but field access expected it to have type
{ b: 'a, 'r.a }.

As shows the last example, if the "dotted" expression doesn't have a record type containing at least the required field, the expression is ill-typed. It is important to note that the expected record type must contain at least the accessed fields and may contain any other ones as the ending row variable 'r.a suggests.

#### Record update

The construct:

{ expr_0 with field_1: expr_1 , ... , field_n: expr_n }

allows creating a new record value, borrowing all the fields from expr_0 but replacing the values of field_1 to field_n by those evaluated for expr_1 to `expr_n. Three important points are:

  • The fields updated must belong to the type of expr_0. This means that record update doesn't add new fields to the record. This restriction is required to keep Opa's type system consistent.
  • The types of the updated fields can however change from the ones they had in expr_0.
  • This construct only applies on records, i.e. open or closed rows hosted by a closed column.

Let's assume that expr_0 has a record type with fields field_1 to field_n and other possible fields fields_m of type τ_ms. Expressions expr_1, to expr_n are typechecked leading to types τ_1, ..., τ_n. The type of the whole expression is then the record type where field_1 is bound to type τ_1, ..., field_n bound to type τ_n and all other fields from the type of expr_0 are unchanged. In particular, if the original record type was open (resp. closed), so will be the resulting one. This is clearly shown by the type of the function f in the example below where both the argument and return types are open records since the body of this function only add the constraint that the record must contain at least the field x.

v = { x: 3 , y: 4 }
> v : {x : int, y : int} or ... = { x: 3 , y: 4 }
w = { v with x = 3.14159 }
> w : {x : float, y : int} = { x: 3.14159 , y: 4 }
function f(r) {
  { r with x: "str" }
}
> f : {x : 'v0, ...} -> {x : string, ...} = <fun>
x = f({ z: 5 })
> *Error*
> Function was found of type { x: 'a, 'r.a } -> { x: string, <'r.a> } but
> application expects it to be of type { z: int } or 'c.a -> 'b.
> Types { x: 'a, 'r.a } and { z: int } or 'c.a are not compatible
> Hint:
>   Field(s) x only appear(s) in the first type.
> Hint:
>   Field(s) z only appear(s) in the second type.

Type coercion

Although Opa proposes type inference, it is sometimes useful to force an expression to have a certain type instead of the one naturally guessed by the typechecker. This mostly happen when a particular signature is wanted for an expression (for instance for sake of abstraction) or when a type was structurally inferred although the user prefers to get a named type having the same structure (either for sake of shortness or because the type structurally inferred is recursive and hence must be explicitly replaced by a named type since cyclic types are not accepted in types appearing in values interfaces).

Type coercion, expr : t, typechecks expr leading to a type τ. Then is converts the type expression t leading to a second type τ'. Finally, if τ and τ' are compatible, it gives the whole expression the type τ'.

type t = int
> type t = int
_ = 4 : t
> _do_ : t = 4
_ = 4.5 : t
> *Error*
> Description: This expression has type float.
> However, according to the context, it seems that it should have type t.
type ilst = { head : int , tail : ilst } or { nil }
> type ilst = {head : int, tail : ilst} or {nil}
function f(x, y) {
  if (x) {
    { nil }
  } else {
    { head: y , tail: f(true, y) }
  }
> Cyclic type 'a
>  where 'a = { head: 'b, tail: 'a } or { nil } or 'c.a##
> *Error*
> Exportation failure
function ilst f(x, y) {
  if (x) {
    {nil}
  } else {
    {head: y , tail: f(true, y)}
  }
> f : bool, int -> ilst = <fun>

The first coercion in the above sample code show that because t and int are compatible, the value 4 being naturally of type int can be assigned the type t. The second example shows that because 4.5 has type float and this latter and int are not compatible, coercion is a failure. Finally, the last example exhibits a case where an explicit naming of a recursive type must be done. First, a type similar to a list of integer is defined, ilst. Then the function f that builds empty or one-length lists of integer is defined. Since Opa typechecking is performed in a structural way, nothing in the body of f indicates that it works on a particular named type. The constraints only indicates that the function returns a record type containing a field tail having this record type. As a result, an error is issued since cyclic types must not escape in signatures. Adding a coercion on the return type of f, like in the last line of code, allows the typechecker to find a named type hiding the recursion and removes the error.

Implicit type parameters in coercions

When using a parameterized type name in a coercion, it is possible to omit the arguments by which its parameters must be instantiated. The typechecker will then automatically instantiate them according to the context. Omission applies to all or none of the parameters: only omitting some of them will result in a typechecking error related to the arity of the used type name. To omit only some of the arguments, they must explicitly be replaced by some underscore ("_") characters upon matching the right arity of the type constructor.

type t('a, 'b) = { 'a first, 'b second }
> type t('a, 'b) = { 'a first, 'b second }
_ : t = { first: 1 , second: "str" }
> _do_ : t(int, string) = { first: 1 , second: "str" }
_ : t(int) = { first: 1 , second: "str" }
> *Error*
> Description: The type instanciation for t is invalid. According to the
> definition of this type, it should have 2 argument(s). However, here, it is
> applied to 1 argument(s).
_ : t(int, _) = { first: 1 , second: "str" }
> _do_ : t(int, string) = { first: 1 , second: "str" }

In the above sample code, a record type, t, with 2 parameters is defined. The first coercion omits all of them. According to the context, since field first is assigned the value 1 and field the value "str" it is inferred that 'a maps onto int and 'b onto string. The second coercion only omits 1 of the parameters but doesn't explicitly state the omission, hence making the type t used with a wrong number of argument. Finally, the last example explicitly claims that the first parameter of t is expected to be int but second one is not explicitly given, hence making the t type constructor used with its right number of arguments and avoiding the previous error.

Type variables in coercions and generalization

Type variables appearing in coercion type expressions have precise scoping rules and may influence the generalization process, i.e. the polymorphism level of definitions. Mostly, in type annotations, type variables are identified by their name. This means that inside a definition, several occurrences of a same type variable name implies the sharing of this type variable among these occurrences. The scope of a type variable name is:

  • the current toplevel value definition,
  • the current module field definition.

In other words, all along these 2 constructs, a same variable name represents a unique type variable. For instance, in:

function 'a f(x, y) { y }
> f : 'v0, 'a -> 'a = <fun>
function 'a g('a x, y) { y }
> g : 'a, 'a -> 'a = <fun>

the definition of f introduces a type variable 'a used only once. As a result, x and y have 2 different types since they are not related. Conversely, in g, a new type variable 'a is introduced (diferent from the one of f), but used twice, once to constrain x and once to constrain y. Since these 2 occurrences of 'a are hosted in the same toplevel definition, they represent the same variable. As a result, the types of the 2 arguments of g are enforced to be the same which was not the case in f.

This example can be reproduced the same way in a module since the scope of a variable introduced in a module field is this module field:

M = {{
  function 'a f(x, y) { y }
  function 'a g(x:'a, y) { y }
}}
> M : {
>  f : forall('v0, 'a). 'v0, 'a -> 'a ,
>  g : forall('a). 'a, 'a -> 'a } =
>  { f: <fun> , g: <fun> }

As the type-forall's suggest variables are differents (i.e. not shared) between the fields f and g since they are generalized separatly. And like in the previous example, f has 2 arguments with different types although g have its ones of the same type.

CAUTION:

Introducing type variables in coercions doesn't mean that the annotated expression is polymorphic ! This only means that its type is named by this type variable. Generalization, i.e. polymorphism introduction is a separate process that arises at the end of a definition. In particular, introducing a type variable in a annotation will postpone its possible generalization upon the end of the current toplevel or module field definition. In effect, as soon as a name is introduced for a type, we can't make sure this name will be or not used later, until we get out of the definition where it is introduced. For this reason, it is not safe to generalize it as soon as we leave the definition where it appears since some later type constraints may change this variable into a non-variable type, i.e. the variable will not remain a variable hence can't be subject to generalization. Let's consider:

function f1() { rec function 'b v1('a x) { v1(x) } v1 }

f1 : -> ('a -> 'b) = function f2() { rec function 'b v1('a x) { v1(x) } _ = v1(1) _ = v1(1.0) v1 } Error Function was found of type int -> 'a but application expects it to be of type float -> 'b. Types int and float are not compatible function f3() { rec function 'b v1('a x) { v1(x) } v2 : bool -> int = v1 v1 } f3 : -> (bool -> int) =

Let's first have a look at f1. It introduces 2 variables, 'a and 'b that finally get generalized in the final type of f1 that really is polymorphic. However, it must be clear that these 2 type variables were not generalized at the end of the local definition of v1 ! They were generalized at the end of the toplevel definition of f1 through the generalization of the type of f1. In other words, inside the body of f1, the type of v1 was not polymorphic because of the presence of the type variables. If we used v1 with several types in the body of f1, we would get a typing error like it is the case in f2. The definition of f3 also shows that 'a and 'b are not generalized as soon as exiting from the definition of v1. In effect, before returning v1 as the result of f3, we define a local variable v2 as being equal to v1 but coercing its type. This implies a unification between the type of v2 which is forced to be bool -> int and an instance of the type scheme of v1. But, because v1 is not yet generalized, its type scheme is "trivial", i.e. has no generic variables. Hence, taking an instance of it implies directly sharing its scheme's body, then instantiating its non-generic variables during the unification with the type of v2. For this reason, 'a becomes bool and 'b becomes int, hence leading f3 to finally have type -> (bool -> int).

Database path

Since databases are explicitly typed at their definition, accesses to paths simply have the type they were given when the database was defined. For instance, defining a database as:

db /test/i : int
db /test/map : map(int, string)

will explicitly state that the path /test/i has type int and the path /test/map has type map(int, string). This way, paths can only by used in contexts where their types are compatible with expected ones.

#### @opensums directive

Sum type definitions always lead to closed columns. In effect, such a definition enumerates all the values of the type, hence there is no reason for having a trailing column variable. In some particular cases, when an expression has a type coming from such a definition, it may be useful to alter its type to make it an anonymous open sum in order to "plug" it in a larger sum type. Let's consider the following example:

type small = { a } or { b } or { c }
> type small = {a} or {b} or {c}
type larger = small or { d }
> type larger = small or {d}
function larger_to_string(larger x) {
  match (x) {
    case { a }: "a"
    case { b }: "b"
    case { c }: "c"
    case { d }: "d"
  }
}
> larger_to_string : larger -> string = <fun>
small_v = { a } : small
> small_v : small = { void a }
_ = larger_to_string(small_v)
> *Error*
> Function was found of type larger -> string but application expects it to be
> of type small -> 'a.
> Types { a } or { b } or { c } or { d } and { a } or { b } or { c } are not
> compatible
> Hint:
>   One of the sum types may be missing the following cases of the other:
>   { d }.

In the above example, the type larger clearly contains all the values of the type small, and even more. The function larger_to_string operates on all these values, hence is also able to operate on the type small. However, applying it to a value of type small fails. The reason is that the type small is a closed sum. Hence it can't unify with larger since it has no column variable to acquire the field c. To workaround this issue, the @opensums directive can be applied on the expression small_v in order to replace its type small by the structure of this type in which all columns are turned open (i.e. ending by a column variable).

_ = larger_to_string(@opensums(small_v))
> _do_ : string = "a"

This directive only has an effect when applied to an expression having a sum type. Applying it to another type will not change the type unless it is a type variable. In effect, type constraints are discovered in an arbitrary order by the typechecker. At some point, it may arise that the type of an expression is not yet complete, hence still remain a type variable even if later it will appear to be a sum type. If the directive is applied to such an expression, it can't yet have any effect although if the type constraints were acquired sooner it would have an effect. When later the type variable may get instantiated by a sum type, it will be too late to open it since the directive expression will have been already processed. To prevent such non-commutative behavior, an error message is issued requesting to add a type constraint on the expression in order to be able to have its type before applying the directive.

function f(x) { @opensums(x) }
> *Error*
> Expression argument of the @opensums directive has a type that can't be subtype of another
> type (sum or record).
> Hint:
>   Its type is currently unconstrained. Consider explicitly adding a type constraint toward
> a sum type to precise the expression's type.

It is important to note that this directive recursively (i.e. searching in depth) opens all the sums present in the type of its argument, except on function types where it only descends on the result type and not on the arguments types. In particular, named types are examined and their structure is also recursively traversed by the opening operation. It is worthy to make clear that because opening a sum type changes its nature, if an expression appears to have a named type being a sum, once open, it won't have anymore this named type, but instead, the structure of this named type in which sums are now open.

#### @openrecord directive

As presented in the section dedicated to the record update construct, fields borrowed from a record must exist in this record: it is forbidden to add a new field. This restriction is imposed for sake of consistency of Opa's type system. However, when 2 different record types have numerous fields in common, writing a function transforming a value of the first type to a value of the second one, we would like not to have to manually map the common fields' values 2 by 2 between the 2 records. For instance:

type house = { number : int , street : string , city : string , zip : int }
type rented_house = { number : int , street : string , city : string ,
  zip : int , start_date : int , end_date : int }

defines 2 types, the first one describing houses in general and the second one describing houses being rented. The only difference is that the latter has 2 more fields than the first one. When modelling the fact that a house gets rented, we would like to write a function like:

function rented_house rent(house h, start_day, stop_day) {
  { h with start_date: start_day, end_date: stop_day }
}

However, above restriction forbids the addition of new fields as reported by the error message:

> *Error*
> Record to update has type house but extension requires it to have type
> { end_date: 'a, 'r.a }.
> Types { end_date: 'a, 'r.a } and
> { city: string, number: int, street: string, zip: int } are not compatible
> Hint:
>   Field(s) end_date only appear(s) in the first type.
> Hint:
>   Field(s) city number street zip only appear(s) in the second type.

One solution would then be to write this function under the form:

function rented_house rent(house h, start_day, stop_day) {
  { number: h.number, street: h.street, city: h.city, zip: h.zip,
    start_date: start_day, end_date: stop_day }
}

which is very tedious when the number of common fields is important. As an alternative it is possible to explicitly request a record type to get open, hence adding it a fresh row variable to accept addition of new fields by the directive @openrecord. The renting function can then be written as:

function rented_house rent(house h, start_day, stop_day) {
  { @openrecord(h) with start_date: start_day, end_date: stop_day }
}

and finally gets the expected type: house, int, int -> rented_house. The @openrecord directive can only be applied on an expression having a record type. Applying it to another type will raise an error depending on the encountered type.

First, when the type is a variable, for the same kind of reason as explained about the @opensums directive, it is requested to add a type constraint to explicitly precise the type as shown in the following example:

function f(x) { @openrecord(x) }
> *Error*
> Expression argument of the @openrecord directive has a type that can't be subtype of
> another type (record).
> Hint:
>   Its type is currently unconstrained. Consider explicitly adding a type constraint
> toward a record type to precise the expression's type.

Second, for any other type not being a record or a variable the default error message is issued without any hint:

_ = @openrecord(1.0)
> *Error*
> Expression argument of the @openrecord directive has a type that can't be subtype of a record type.

It is important to note that, conversely to the @opensums directive, @openrecord doesn't recursively traverse types: it only acts on the toplevel structure of the type being expected, as previously explained, a record type.

Dealing with recursive types

By default, Opa typechecker infers types in a structural way. This means that, without user coercions, it determines the types by guessing their structure rather by giving them names. This is pretty satisfactory since several named types can have a same structure and there is no reason the typechecker decides itself to give a name a priori to a type structure. For instance, consider:

type t = { a } or { b } or { c }
> type t = { a } or { b } or { c }
type u = { a } or { z } or { c }
> type u = { a } or { z } or { c }
function f(x) {
  match (x) {
    case { a }: "a"
    case { c }: "c"
    default: "something else"
  }
}
> f : {c} or {a} or ... -> string = <fun>

the function f has no reason to be considered as taking an argument of type t more than of type u. The typechecker inferred a structural type, i.e. the shape of the type. If we want f to take explicitly a particular named type, it must be coerced by f(x : t) or f(x : u). The fact that types are inferred structurally is powerful since f can naturally be used with a value of type t or u since both of them have a compatible structure:

_ = f({ a } : t)
> _do_ : string = "a"
_ = f({ c } : u)
> _do_ : string = "c"

However, this also has drawbacks. During type inference, the typechecker knows how to deal with, i.e. build if required, cyclic (recursive) types. However such types can't escape in values signatures. They must get mapped onto a named type, whose definition is obviously recursive, who hides the structural recursion. This may require the user to add some type annotations, i.e coercions, to prevent these recursive types from remaining in the final signature of the definition which leads to an error message.

function len(l) {
  match (l) {
    case { end }: 0
    case { head: _, tail: t }: 1 + len(t)
  }
}
> Cyclic type 'a
>   where 'a = { 'b head, 'a tail } or { end }
> *Error*
> Exportation failure

In the above example, the system determined that len is a function taking a sum type with 2 cases, the second one containing a field, tail being of this sum type. Hence, the type is really cyclic as reported in the error message. To avoid such a recursive type, one must define a named type having an equivalent structure:

type mylst('a) = { end } or { 'a head, mylst('a) tail }

allowing to "name" the recursion. From now, adding a type annotation in the source of len will give the typechecker a clue "which type is used here" and will allow replacing the cyclic parts of the structure by the type name.

function len(mylst('a) l) {
  match (l) {
    case { end }: 0
    case { head: _ , tail: t }: 1 + len(t)
  }
}
> len : mylst('a) -> int = <fun>

It could have been possible to put the type constraint at some other points, like:

function len(l) {
  match (l) {
    case { end }: 0
    case { head: _ , tail: t } -> 1 + len((mylst('a)) t)
}

but a good practice is to put constraints directly in the function's header, the closest to its parameters and return types definition. Hence, the fully annotated definition of len according to this good practice would be:

function int len(mylst('a) l) {
  match (l) {
    case { end }: 0
    case { head: _ , tail: t }: 1 + len(t)
  }
}

where the return type is stated after the parameters of the function and before its body.

Clone this wiki locally