/
ref_type_system.omd
2262 lines (1862 loc) · 104 KB
/
ref_type_system.omd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
//[appendix]
The type system
===============
This section details the type system of Opa, including:
- definition of types
- polymorphism and rank 2 polymorphism
- interactions with packages and modules
- type error messages
- type best practices
/////////////////////////////////////////////////
// Main editor for this section: Francois Pessaux
/////////////////////////////////////////////////
//////////////////////////////////////////////////////
// If an item spans several sections, please provide
// hyperlinks, e.g. type definitions have both a syntax
// and a more complete definition on the corresponding
// section
//////////////////////////////////////////////////////
//////////////////////////////////////////////////////
// If an item is considered experimental and may or may
// not survive to future versions, please label it using
// an Admonition block with style [CAUTION]
//////////////////////////////////////////////////////
Type system overview
------------------------
### Strong typing
Opa is a *statically* and *strongly* typechecked language. This means
that the type of an expression is computed once and for all at compile-time,
an expression has one unique type and this type can't be used as another one.
In Opa, any expression evaluates into a value and functions are first class
values. This means that, like integers, booleans, strings in other languages,
functions in Opa can be passed as arguments, returned as results, stored in
data-types, and so on.
### Type inference
Opa compiler offers *type inference*, i.e. it automatically determines
the type of expressions from the context, without the user needing to write these
types. Hence, a definition like
x = 5
will lead `x` to have the type `int` because the integer literal `5` is
known to be of type `int`, this type representing the type of integer values.
As a bit more complex example, the function
function chose(cond, if_true, otherwise) {
if (cond) { if_true } else { otherwise }
}
will be found having the type
(bool, int, int) -> int
where `bool` represents the type or boolean values
and `->` is the function taking arguments of the types
stated on the left side and returning a result of the type stated on the right
side of the arrow symbol.
### Polymorphism
The type system of Opa features *polymorphism*, i.e. types in which some parts
are not constrained and can be of any type. For instance, the identity function:
function identity(x) { x }
doesn't set any constraint on it argument's type:
it can safely be applied to any kind of value and will return this kind of
value. Its type will then be `'a -> 'a` where `'a` is a *type variable*,
i.e. _any type will do for 'a_. Using `identity` with an
integer as argument will then instantiate `'a` with `int`. Using `identity`
with a boolean will then instantiate `'a` with `bool`.
### Records and sums
In addition to usual data-types (integers, booleans, string, floating point
numbers), Opa provides two cornerstone structures: *records* and *sums*.
A *record* can be seen as a `struct` in C, i.e. a non-ordered sequence of
fields packed into one global entity. However, conversely to C these records
can be either _closed_ or _open_: the list of fields can be fixed or can
contain _at least_ the ones stated in the record type.
A *sum* is a set of possible _cases_, each case being a record. Hence, a sum
allows to express the fact that a data-structure can be of different shapes. This
can be compared to `union` in C, with the difference that the effective shape
of the data is directly embedded inside its representation.
### Type abstraction
In the Opa system, *type abstraction*, i.e. ability to hide the actual
structure of a type is provided thanks to the notions of *private* and *abstract*
types. The first ones are types that only have an existence in
the package where they are defined; they are not visible outside this package.
The second ones are transparently visible (i.e. their structure can be
manipulated) only in the package where they are defined; outside this package,
they become opaque, i.e. are still visible but their internal structure is not
visible anymore.
### Modularity
Opa provides a certain notion of *module*, i.e. packages containing value
definitions via a particular form of records. Such records are syntactically
marked as being modules and will get a particular type called *type forall*.
Under some restrictions due to typechecking decidability, it is possible to
use these modules like regular record, hence passing them through functions or
returning them as result.
### Type inference: restriction
Type inference is not complete for some parts of Opa features and
a few type annotations are required from the user. This is mostly due to the
expressiveness of the language that provides very flexible features at the price
of the non-existence of a principal type. Existence of a principal type means
that there is always one _better_ type, i.e. all the other types that could be
chosen for an expression would reduce the contexts where this expression can
be used. In such a case, the user will have to indicate himself the type he
wishes using a type annotation to enforce the expression to be of this type.
This construct is called a *coercion*. On the other side, coercion by type
annotations may also be required when the
typechecker infers a cyclic type. In effect, such types are internally used by
the typechecker but are not currently allowed outside in the language.
Type algebra
------------
As previously stated, expressions evaluate to values of certain types. The
type algebra describes the shape of these types. The present section doesn't
explains how types are synthesized from expressions even if some examples are
given. We rather address here an intuitive presentation of the types.
### Constant types
Natively, Opa provides three basic types:
- `int` : The type of signed integer values (e.g. `0`, `-4`).
- `string` : The type of characters strings (e.g. `"this is a string"`).
- `float` : The type of signed floating point values (e.g. `0.489`, `-5.94`).
[[type_variable]]
### Type variable
As stated in introduction, type variables are used to represent
types with not particular constraint,
i.e. types we don't need to know about or we don't know yet their effective
internal structure. Type variables are written by a leading quote followed by
regular identifier, for instance: `'a`, `'myvar5`.
Type variables act in two particular parts of the type system.
* During type inference, they are used by the typechecker to temporarily give types to expressions that at a certain point do not exhibit the need to have a particular type.
* At the end of type inference of a definition, type variables that still remain unconstrained are _generalized_. This means that the typechecker discovered that there is no need to know more about the structure of the types they represent and hence decides to make the type polymorphic: any type will be suitable in place of these variables. Hence, in the type of a definition (not expression), all finally appearing type variables are implicitly quantified by a _for all_ qualifier, telling that the definition having this type has all the possible *type schemes* where type variables can be instantiated (replaced) by any type. In other words, a type scheme is a _model_, a _pattern_ of types, describing a _family_ of possible types.
This last point raises an important concept:
- *Expressions* have a *type*.
- *Definitions* have a *type scheme*.
In other words:
- In an expression a type variable is just an *unknown* that may one day be known equal to some type.
- In a definition a type variable is a *universally quantified type*. Each instance, usage of the identifier bound by this definition is free to replace these variables by types it needs in its context.
### Function type
A function is characterized by the type of its arguments and the type of its
result. The type constructor for a function is the _arrow_ symbol: `->`. On the
left part the types of the arguments are listed, comma-separated if there are
several arguments. On the right part is the type of the result of the function.
###### Examples:
- `-> int` : Function taking no argument and returning an integer.
- `int -> bool` : Function taking one argument of type integer and returning a boolean.
- `int, string -> string` : Function taking two arguments, the first one being of type integer, the second of type characters string and returning a characters string.
- `int, (int -> bool), int -> bool` : Function taking three arguments, the first and third ones being of type integer, the second one being a function taking an integer and returning a boolean. Finally, the function returns a boolean. Note the presence of parentheses enclosing the function type of the second argument.
### Named type
Types can be inductively combined to form complex data-types and bound to names
thanks to *type definitions*. Hence, the following extract defines a new type
name `natural`, whose structure is `int`.
type natural = int
Once this type is defined, we may use type name `natural` to denote this new
type.
Types can have parameters. For instance, the type of pairs with their two
components being of a same type can be defined as:
type same_comp_pair_ty('a) = ('a, 'a)
A pair denoted by the type `same_comp_pair_ty(float)` will have its two
components carrying floating point numbers.
The type of pairs with components of any types can be defined as:
type same_comp_pair_ty('a,'b) = ('b, 'a)
Note that the order of the type variables `'a` and `'b` is different in the
quantification (i.e. left-part of the definition, on the left of the `=` sign)
and in the body of the type definition. As a result, this means that pairs
having type `same_comp_pair_ty(int, bool)` will carry integers in their second
component and booleans in their first component.
Finally the type of relational operators (i.e. functions taking two arguments
of the same type and returning a boolean value) can be defined as:
type relational_ty('a) = 'a, 'a -> bool
Following this definition, the type of relational operators testing integers
is `relational_ty(int)`.
Hence, a named type is an identifier followed by as many types comma-separated
and enclosed between parentheses as the corresponding type definition has
parameters.
See also the section dealing with [type definitions](/manual/The-core-language/Type-definitions).
[[algebra_sum_type]]
### Sum type
Sums are a sequence of cases, each case being a record type.
A record is a sequence of typed fields. In a record, accessing a fields will
yield a value of the type of this field. Sums as record can be either *closed*
of *open*. A *closed* record type (respectively sum) enumerates exactly the
fields it contains (respectively cases for a sum). An *open* record type
(respectively sum) enumerates the fields (respectively cases for a sum) it may
contain at minimum, and allows to have more if needed. In case of open record
or sum, the type ends by a trailing ellipsis, otherwise it is silently closed.
An *open* sum ends by a *column variable*, i.e. a kind of type variable that
can be instantiated by another sum. In other words, this column variable allows
the sum to gain more sum cases.
An *open* record ends by a *row variable*, i.e. a kind of type variable that
can be instantiated by another record (also called a _row_). In other words,
this row variable allows the record to gain more fields.
In error messages (where they need to be clearly named), column variables
(resp. row variables) are denoted by `'c.a`, `'c.b`, etc
(resp. `'r.a`, `'r.b`, etc). In other cases, when types are printed and
the identity of such variables doesn't matter, they are both written "...".
Here follow some examples of correct record types :
- `{ }` : Empty record type. It contains no field.
- `{ int x, float y }` : Closed record containing exactly the two fields `x` of type `int` and '`y` of type `float`.
- `{ int x, ... }` : Open record type containing at least a field `x` of type `int` and possibly any other fields of any type.
Sums are a form of sequence of records. They can be non-disjoint, meaning that
some records forming cases may overlap.
{block}[IMPORTANT]
A strong restriction imposes that an *open sum* must not host *any* case
that is an open record and any (*closed or open*) *sum* must not
host *several* cases that are open records.
Moreover, if several cases (i.e. records) of a sum share a same label, this label
must have the same type in each cases. Finally, if a sum has several cases,
none of them must be open.
{block}
Here follow some correct sum types examples:
- `{ int a } or { float b }` : Closed sum containing two cases, one being a record having exactly one field `a` of type `int`, the second begin a record containing a record having exactly a field `b` of type `float`.
- `{ int a } or ...` : Open sum containing at least one case being a record having exactly a field `a` of type `int` and possibly any other cases.
- `{ int a } or { int a, float b }` : Closed sum containing exactly two cases, one being a record containing exactly one field `a` of type `int`, the second being a record containing exactly two fields, `a` of type `int` and `b` of type `float`. We see in this example that fields can be common to several cases as long they keep the same type. For example, having `a` of type `bool` in one of the cases would be ill-typed.
- `{ int a } or { int a, float b } or ...` : Open sum type similar to the previous example. It may contains additional other cases compared to the previous example.
- `{ int a }` : Closed sum with only one case. It can be hence seen as a _record_ (i.e. like a _struct_ a la C) exactly containing one field `a` of type `int`.
- `{ int a, ... }` : Closed sum with only one case. This case is open hence can be seen as a record containing at least one field `a` of type `int`. Any other record containing additional fields is compatible with this type.
[[type_forall]]
### Type forall
We saw in section [Type variable](/manual/The-type-system/Type-algebra/type_variable) the difference between *types*
and *type schemes*. Variables present in type schemes are universally
quantified, hence a type scheme represent a family of types. A *type forall*
embeds a type scheme inside a type and represents the type of a module field.
Such a type is written by prefixing its body by the keyword `forall` followed
by the list of polymorphic type variables of the type between parentheses
followed by a "`.`" (dot) then the body of the type. Type-forall are in fact a
way to introduce a flavor of rank-2 polymorphism in Opa type system.
Rank-2 polymorphism is a system in which a forall quantifier
may not appear to the left of more than 2 arrows (when the type is drawn as a
tree). To illustrate this concept, consider the function:
function f(x) { x }
_ = (f (true), f(2))
variables bound with a definition (sometimes called _let-definition_ may be
polymorphic, as in `f` above. That's the reason why `f` can be applied to two
different types, `bool` and `int`.
However, in a rank-1 polymorphism system, function arguments may not be:
function g(f) {
(f(true), f(2))
}
results in an error:
> *Error*
> Function was found of type bool -> 'a but application expects it to be of
> type int -> 'b.
> Types int and { false } or { true } are not compatible
In a rank-2 polymorphism system, we can give `g` a signature like
`(forall a. a -> a) -> (bool, int)`. This is called a rank 2 type, because a
function argument is polymorphic, as indicated by the forall quantifier. Now the
function `g` may be applied to expression whose generalized type is at least
as general as that declared.
Modules are particular records in which each field type gets generalized. This
way, a module can be seen as a package of definitions grouped in the module
rather than laying at toplevel.
When a module field definition appears to be polymorphic, its type is turned
into a type forall where all generalizable variables are universally quantified.
The type of the module is hence a *closed* record type whose fields names are
the fields definitions names with their related type turned into a type forall
if they are polymorphic. Hence, types forall are closely related to module
fields types and can only be created via module expressions.
The following program clearly show the difference between the type of `r` which
is a regular record and the one a `m` which is a record containing types
forall for fields having a polymorphic type.
r = {
function id(x) { x },
function pair(x) { (x, x) },
one: 1
}
> r : {('v0 -> 'v0) id, ('v1 -> tuple_2('v1, 'v1)) pair, int one} or ... =
> { id: <fun>, one: 1, pair: <fun> }
m = {{
function id(x) { x },
function pair(x) { (x, x) },
one: 1
}}
> m : {(forall('v0). 'v0 -> 'v0) id,
> (forall('v1). 'v1 -> tuple_2('v1, 'v1)) pair,
> int one } =
> { id: <fun>, one: 1, pair: <fun> }
In the record type of `r`, no type forall appear even if the type scheme of
`r` itself is polymorphic (`'v0` and `'v1` are generalized). In the record
type of `m`, since it is a module, all fields having a polymorphic type get
generalized and turned into forall types. One can notice that `one` being
not polymorphic, it is not turned into a forall type.
{block}[IMPORTANT]
Types forall are explicitly created by the developer when he uses a module
construct: there is hence no automatic type forall creation. Conversely, when
a type forall appears in an expression, it is automatically instantiated, hence
leading to a type in which generalized variables are replaced by fresh variables
and universal quantification disappears (in other words, the type forall
becomes a type non-forall again).
{block}
Type algebra -- formal grammar
------------------------------
For sake of completeness, we give here a presentation of the type algebra like
usually described in theoretical approaches. This section is mostly dedicated
to advanced users or people familiar with type systems design. A more complete
description of this theoretical presentation is available in the technical
and architecture documents of Opa.
###### Type
```
τ ::= `int` Integer constant type
| `bool` Boolean constant type
| `string` String constant type
| `float` Floating point numbers type
| `void` Trivial (i.e. having 1 unique constructor) constant type
| α Type variable
| t ( τ ) Named type constructor (possibly parameterized)
| τ `->` τ Function type
| κ Sum type
| ∀ α, ρ, δ . τ Type forall
```
###### Row
```
φ ::= • Closed row
| ρ Row variable
| l : τ, φ Field sequence
```
###### Sum
```
κ ::= • Closed column
| ζ Column variable
| { φ } or κ Case (record) sequence
```
Typing a program
----------------
Now that the form of Opa's types has been presented, we will address how Opa
programs are typechecked. The aim of this section is to understand the type
to expect for each construct of the language. This presentation is split in two
parts, one dealing with type definitions and one dealing with expressions.
[[type_definitions]]
### Typing type definitions
Type definitions allow to create new data-structure from previously existing
ones or allow to alias, i.e. provide a name as a shortcut for cumbersome type
expressions. Both usages follow the same syntax and rules.
#### Structure of a type definition
A type definition is introduced by:
[directive] type [directive] ident[(parameters, ...)] = type expression
where _ident_ is the name of the created type, _parameters_ is a
comma-separated enumeration between parentheses of the type variables
parameterizing the definition (or nothing if the definition is not
parameterized), and _type expression_ is the body of the defined type. We will
see directives in the next section.
Once defined, the new type name will be compatible with itself and any
type expression compatible with the body of its definition. Hence, following
the definition `type t = int`, the type `t` can be used anywhere an `int` is
expected and reciprocally.
#### Directives in type definitions
Directives allow to alter the visibility of a type definition to ensure
abstraction. Two directives are available: `@private` and `@abstract`.
`@private` makes the type definition only visible in the package where it
it defined. Outside this package, the type is no more visible : it doesn't
exist for other packages. Such types are typically _internal_ to a
package and serve only to implement functionalities without appearing in
the interface of the package. As a consequence, they must not escape outside
the package, i.e. must not appear in the signatures of values visible outside
the package. To make this possible, values manipulating such types must also
be declared private, using the `@private` directive also available for
value definitions. If a private type appears in the signature of an exported
value, an error is raised, advising to make this value private, like in the
following example:
package private_ty_escape2
@private type prv_string_ty = string
vv = "" : prv_string_ty
> *Error*
> Description:
> Definition of vv is not private but its type contains a type private to the
> package. Type prv_string_ty must not escape its scope.
> Hint:
> Add a @private directive on this definition.
`@abstract` makes the type definition transparently visible inside the
package where it is defined, i.e. its internal structure can be accessed and
manipulated, but outside this package it turns opaque: the type is still
visible (i.e. it still exists) but its internal structure is not visible
anymore.
Such types are typically data-structures implemented in a package, provided
to other packages that must not directly manipulate the effective
implementation of the data-structure and instead use the API provided by the
package implementing the data-structure. This is a kind of abstract data type
(ADT), which permits to preserve consistency and internal invariants required
by the implementation of a data-type. The basic idea is that the definition
package "knows" how to safely manage the type, provides a safe API to the
outer "world", hence one gets sure that nobody will come and "hack by hand",
hence break, invariants required by the data-structure to keep consistent.
Outside its definition module, an abstract type loses its internal
representation, hence becomes compatible only with itself.
package p1
@abstract type u = int
function u to_u(int x) { x }
function int from_u(u x) { x }
// Legal since we are in the defining package: we see u's internals.
_ = 1 + ((u) 1)
package p2
import p1
// Obtain a value of type u from the integer 0.
v = to_u(0)
_ = 1 + v
Because the last line of code of package `p2` tries to rely on `u`'s
internal structure although it is not the package defining it, `u` being
turned abstract can't be compatible with `int` anymore, hence leading to the
error:
> *Error*
> Function was found of type 'a, 'a -> 'a but application expects it to be of
> type int, u -> 'b.
> Types int and u are not compatible
### Parameters in type definitions
Parameters allow to define types polymorphically, in order to instantiate them
later by providing effective type arguments to the type name in future type
definitions or type expressions. Let's examine the following sample code.
type fun_t('a, 'b) = 'a -> 'b
type int_string_fun_t = fun_t(int, string)
function twice_general(f , i) {
f(i) + f(i)
}
> twice_general : ('v0 -> int), 'v0 -> int = <fun>
function twice(fun_t(int, int) f, i) {
f(i) + f(i)
}
> twice : fun_t(int, int), int -> int = <fun>
`fun_t` is defined as the type of functions taking one argument an returning
a value, both of them not being fixed in the definition. Hence the type
`fun_t` is _partially defined_ and will need to be instantiated with
effective arguments in further usages.
Next comes the definition of `int_string_fun_t` using `fun_t` and specializing
this latter with `int` for both the types of the argument and returned value of
the function. By this definition, we used the general `fun_ty` to create a more
specialized type definition.
Let's now consider the definition of `twice_general` taking a function, a
parametric value and adding twice the result of applying this function to this
value. The type of `` (internally `int, int -> int`) enforces the result of
`f` to be `int` but no constraint applies on the type of the argument of
`f`. Hence `f` has type `'v0 -> int`, and by consequence `twice_general` also
has a parameterized type `('v0 -> int), 'v0 -> int`.
Using the type definition `int_string_fun_t`, we can define a function similar
to `twice_general`, i.e. with the same body but with a more restricted type:
`twice`.
Hence, by adding a type coercion on the parameter `f` of `twice`, we can
enforce it to be of type `fun_t(int, int)` hence forcing the parameter of `f`
to be `int` instead of leaving it free like in `twice_general`.
{block}[NOTE]
Note that both definitions of `fun_t` and `int_string_fun_t` do not really
create _new_ types. They are aliases, _shortcuts_ for a type expression.
Instead of using these 2 defined type names, we could directly use their body
but this would be less handy. Creation of really new types is achieved by
defining sums and records as described below.
{block}
#### Using type expressions in a type definition body
The basic type expressions that can be used allow to represent the types
described in the [Opa type algebra](/manual/The-type-system/Type-algebra).
##### Basic constant named types
`int`, `bool`, `string`, `float`, `void`.
`type t = int` defines `t` as an alias on the type of integers `int`.
##### Function type: `->`
* `type t1 = int -> float` defines `t1` as the type of functions taking an `int` and returning a `float`.
* `type t2 = -> float` defines `t2` as the type of functions taking no argument and returning a `float`.
* `type t3('a, 'b) = 'a -> 'b` defines `t3` as the type of functions taking an argument and returning a value whose types are not fixed and are parameters of the type definition. Hence, `t3(int, float)` specializes these parameters and lead to a type compatible with the above type `t1`.
* `type t4 = int, string -> bool` defines `t4` as the type of functions taking two arguments (of types `int` and `string`) and returning a `bool`.
##### Named type
Type definitions bind type expressions to names. So defined type names can
be on their turn used to form new type definitions. If a type name represents
a parameterized type, it must be instantiated providing effective type
arguments according to its arity (i.e. number of required arguments).
* `type t5 = t1 -> t2` defines `t5` as the type of functions taking a parameter of type `t1` and returning a result of type `t2`, both types being those presented in the above paragraph.
* `type t6 = t3(bool, int)` defines `t6` using the previously defined type `t3` and applying it to `bool` and `int`. As a consequence, `t6` is the type of functions taking an argument of type `bool` and returning a value of type `int`.
##### Sum
A sum type is an enumeration of _cases_ separated by `or`.
Each _case_ is a _record_ i.e. an enumeration between braces of labels
with their type.
`type t7 = { int A } or { string B, bool C }` defines `t7` as a new
sum type containing two _cases_. The first one is a record containing a
field `A` of type `int`; The second one is a record containing two fields,
`B` of type `string` and `C` of type `bool`. Values of this type can be
_pattern matched_ to discriminate their effective form and access the
particular fields of each case.
Sum type definitions always lead to _closed_ sum, i.e. the resulting
type doesn't ends by a "..." (column variable). This simply means that
when defining a type, the definition simply describes all the _cases_
this type can host.
As a shortcut, when a field of record has type `void`, its type
specification can be ommited. Hence,
`type color = { Red } or { Green } or { Blue }` defines `color` as a sum
with three cases, each of them implicitly of type `void`. This is equivalent
to `{ Red : void } or { Green : void } or { Blue : void }`. The idea in
such a definition is that the fields of the record are self-sufficient and
do not need to carry additional information to be precise enough.
`type pair('a, 'b) = { 'a first, 'b second }` defines `pair` as a
parameterized sum with only one _case_. This _case_ is a record
containing two fields, one per component of the pair. Since the pair
data-structure is naturally polymorphic, this type definition has in effect
2 parameters `'a` and `'b`.
{block}[IMPORTANT]
Remember, as already state while presenting Opa type algebra, in the
[sum types section](/manual/The-type-system/Type-algebra/algebra_sum_type), that a strong restriction imposes that
an *open sum* must not host *any* case that is an open record and any
(*closed or open*) *sum* must not host *several* cases that are open records.
Moreover, if several cases (i.e. records) of a sum share a same label, this label
must have the same type in each cases. Finally, if a sum has several cases,
none of them must be open.
{block}
##### Record
A record type is defined like (and compatible with) a sum type with only one
_case_. This way, by default a record type is a closed row in an open
column.
##### Type forall
Type-forall being strongly related to modules, they are defined using a
syntax close to modules, allowing to avoid the explicit quantification of
type variables. In fact, type-forall are not directly written, they are
introduced on need through module types. Any type variable appearing in the
type of a field of the definition and not bound outside will be implicitly
considered as quantified.
`type t = {{ identity : 'a -> 'a }}` defines `t` as a module type (i.e. a
record) in which the field `identity` has the type-forall
`forall('a).'a -> 'a`.
Note that variables not bound outside the module type have a scope local to
the field's type where they appear. For instance, in
`type t = {{ ('a -> 'a) identity, 'a val }}`, `'a` represents 2
different variables in `identity` and `val`.
Variables bound outside the module type are not quantified, hence not
participating into type-forall creation. For instance in
`type t('a) = {{ ('a -> 'a) id, 'b v }} -> 'a` the type variable `'a`
being a type parameter is bound outside the module type, hence is not
quantified. By consequence, the type of the field `identity` is no more a
type-forall. Conversely, `'b` being not bound remains quantified and
introduces a type-forall. This also shows that type-forall are introduced
only when polymorphism exists: there exists no _trivial_ type-forall,
i.e. no type-forall with an empty set of quantified variables.
A few syntactic extensions allow shortcuts to make life easier.
#### Sum sugar
Additionally, a syntactic shortcut allows to merge several sum types
definitions to create a new definition containing the _cases_ of all the
initial types. In the same idea than when creating a sum type, i.e. separating
its different cases by a `or`, the names of types to merge are simply separated
by `or`.
type t = { A } or { B }
type u = { C }
type v = { D }
type w = t or u or v
_ = (u) { C }
> _do_ : u = { C: void }
In the above sample code, `w` finally stands for the type definition
`{ A } or { B } or { C } or { D }` and expression `{ C }` can
be coerced into `w` although it is initially a value of type `u`.
You may note the `_ = expr` construct which allows evaluating the
expression `expr` then throw its result value. Since this value is not bound
to any identifier, Opa names it `_do_` in its feedback messages.
#### Tuples
Tuples are generalization of pairs. A pair is a data-structure with 2
components: a tuple has an arbitrary number of such components.
Two tuples types are compatible if their components are compatible 2 by 2.
This especially means that 2 compatible tuples types must at least have the
same number of components. Syntactic sugar allows to write tuples types as
comma-separated sequence of types enclosed between parentheses.
`type t10 = (int, bool, int, int)` defines `t10` as a tuple with 4
components, all being of type `int` except the second one.
`type t11('a) = ('a, 'a, 'a)` defines `t11` as a tuple with 3 components
all being of the same type, parameter of the type definition. Using
`t11` in a type expression with `bool` as effective argument will hence
represent the type of a tuple with 3 components all being booleans.
{block}[NOTE]
Opa standard library already proposes numerous data-types like lists, maps,
sets, etc which may be directly used in developments, avoiding the need to
implement again the related common and widespread algorithms.
{block}
### Typing expressions
In this section, we will review each Opa construct and explain which way its
type is computed. In some cases, we will see that inference is not complete and
type annotations may be added by the developper to help the typechecker in its
job.
#### Constant expressions
Constant expressions are given their natively defined type:
- Integer values: `int`
- Boolean values (`{ true }` and `{ false }`): `bool`
- Characters strings literals: `string`
- Signed floating point values: `float`
- The trivial type value (`{}`): `void`
###### Examples:
i = 5
> i : int = 5
f = 3.14159
> f : float = 3.14159
s = "foo"
> s : string = "foo"
v = {}
> v : {} or ... = void
b = true
> b : bool = { void true }
First, notice the apparition of a construct `ident = ...` we will detail
deeper [later](/manual/The-type-system/Typing-a-program/identifier_binding). For the intuition, let's simply
say it *binds* (_attaches_ the result of its right-side expression to the
name `ident`). This allows to name values. Be aware that this is different
from the notion of *assignment* "à-la C" which permits to modify in place the
value stored in the memory cell touched by the assignment. The difference between
a *binding* and an *assignment* refers to the notion of mutability, i.e. the
possibility to alter the value represented by the identifier. A binding links
an identifier once for all to a value. Some kind of values are mutable, this is
the way assignments are obtained aside bindings.
Note in the above sample code that the void and boolean values have been typed
as sums, not as named types `void` and `bool`. This is due to the fact that
type inference is structural: named types are introduced by type
annotations. Types `bool` and `void` are just abbreviations on the
structural types `{}` and `{ void true } or { void false }`. If no explicit
type annotation imposes to see these void and boolean values as `void` and
`bool`, the structural types induced by their structures remain the primarily
ones inferred.
[[typing_function]]
#### Functions
A function may have arguments but always return a value. Hence, defining
a function requires to define its possible arguments and write its body, i.e.
the expression building the returned value. The form of a function expression
is:
[parameters, ...] -> expr
Parameters of the function are bound by the definition, hence are only available
in the function's body. Obviously, to typecheck the body of the
function, we will need the types of its arguments. Type schemes bound to
identifiers (function parameters or identifiers introduced by the
[identifier binding construct](/manual/The-type-system/Typing-a-program/identifier_binding)) are recorded in
an internal structure we call here a _typing environment_ acting as the
memory of _already seen identifiers'_ types.
Typechecking a function expression is performed in 3 steps:
1. First, each parameter is given a temporary _unknown_ type: a trivial type scheme (i.e. with no generalized variables) whose body is a type variable. *This especially means that a function parameter can't be used polymorphically inside the body of the function.*
2. The body `expr` is then typechecked in the typing _environment_ extended by the function's parameters, hence leading to the type `τ` of the returned value.
3. Finally, an _arrow_ type is built, grouping the types of the parameters (that were possibly instantiated because of type constraints found while typechecking the function's body) on the left-side of the _arrow_ and setting the returned type on the right-side, finally giving a type of the form `τ_1, ..., τ_n -> τ`.
###### Examples:
_ = x -> x + 1
> _do_ : int -> int = <fun>
First, notice the `_ = ...` notation that allows to throw the result of the
expression laying on its right-side. This is a particular form of identifier
binding, a construct described in detail
[later](/manual/The-type-system/Typing-a-program/identifier_binding). For the moment, the intuition to keep
is that it binds the result of the expression to _nothing_.
In this sample code, `x` is inserted in the typing environment with a type
variable, say `'a` (what we previously called an _unknown_ type). The body of
the function is then typechecked. Even is we didn't present yet the typechecking
of function application, driven by the intuition, the operator `+` expects two
integers and returns an integer. Hence, to be correctly typed, the parameter
`x` of the function must be of type `int` i.e. the type variable `'a`
previously introduced for `x` gets instantiated by `int`. Now, `1` also has type
`int`, so the operator `+` is
used in a consistent typing context, returning a value of type `int` which is
the global type of the function's body. Finally, the whole function type is
built having `'a` now being `int` on the left-side of the `->` and `int` on
its right-side, giving `int -> int`.
_ = x, y -> x + 1
> _do_ : int, 'v0 -> int = <fun>
This second example shows a function with 2 arguments. As previously, each of
them is entered in the typing environment with a type variable before
typechecking the body of the function. This body is the same than in the
previous example, hence it doesn't use the `y` parameter of the function. As a
consequence, no constraint applies on `y` and its type remains a variable. As a
final result, grouping the types of the arguments on the left-side of the `->`
and the result type on its right, we obtain the final type `int, 'v0 -> int`.
_ = -> "constant"
> _do_ : -> string = <fun>
In this last code snippet, since the function doesn't have any argument, its
body is typechecked in the unmodified typing environment. The string literal
`"constant"` representing the body of the function had type `string` which
is the type of its returned value. Since there is no argument, there is no
type to set on the left-side of the final `->` type, only the result type will
appear on its right-side, leading to the function type ` -> string`.
#### Application
Application is the construct allowing to "use" a function by providing it
some effective arguments. In Opa, an application expression is denoted by a
functional expression on the left-side and the enumeration comma-separated and
between parentheses of the expressions being effective arguments to pass to the
function:
expr1([expr2, ...])
If the function requires no argument, then the enumeration between the
parentheses is empty, however, the parentheses still remain required. Note *that there must not be any blank between the functional expression and the opening parenthesis*. Typechecking an application is performed in 4 steps:
1. First arguments expressions `expr2`, ... are typechecked, leading to as many types `τ_2, ...`.
2. Next, the functional expression `expr1` is typechecked. It is then expected to lead to a functional type with as many arguments' types as there are effective arguments and returning a value of a certain type: `τ`. Hence, the expected type must have the form `τ'_2, ... -> τ`.
3. Each effective argument's type `τ_i` is checked against the corresponding `τ'_i` of the type inferred for the function to ensure they are compatible.
4. Finally, the type of the application is the result type of the function, i.e. `τ`
###### Examples:
function f(x, y) { x + y + 1 }
> f : int, int -> int = <fun>
_ = f(2, 4)
> _do_ : int = 7
To begin, simply ignore the typechecking of the function `f`, we will have
a look at it later. Simply accept it has type `int, int -> int` and let's
examine the second expression. The arguments expressions `2` and `4` have
type `int`. We then typecheck the functional expression which is the identifier
`f`. Even if we don't have described typechecking of identifiers yet, we already
introduced the notion of _typing environment_ keeping trace of types of
identifiers (see [Functions](/manual/The-type-system/Typing-a-program/typing_function)). Looking-up in this
environment we find that `f` has type `int, int -> int`. Hence, we see that
the types of the effective arguments provided in the application are compatible
with the expected types arguments of `f` (i.e. `int`). Hence the result type
is the one on the right-side of `f`'s type: `int`.
To go further, we can study how `f` is typechecked since its body contains the
application of the operator `+`. Like previously described for functions, `x` and
`y` are entered in the typing environment both with temporary type variables,
say `'a` and `'b`. Then the body of the function which is an application must
be typechecked. We then first type the arguments, i.e. `x` and `y` that have
types `'a` and `'b`. Assuming that `+` has the obvious type `int, int -> int` we
must now ensure that the types of the effective arguments are compatible with
the type of `+`. This means that `'a` must be `int` and same thing for `'b`: the
two type variables get then instantiated, i.e. made _equal_ to `int`. The
result
type of the application is then the result type of `+`, i.e. `int`. We iterate
exactly the same process to typecheck the addition with the constant `1` which
finally gives type `int`. Hence, `int` is the type of `f`'s body, so the
returned type of `f`. As a consequence, `f` is inferred the final type
`int, int ->int`.
[[identifier_binding]]
#### Identifier binding
The construct
ident = expr1
expr2
allows to evaluate `expr1`, bind its value to `ident` and evaluate
`expr2` in which `ident` possibly appears and will be replaced by its
bound value. Identifier binding introduces polymorphism in Opa: once the type
of the expression to bind to the identifier is computed, all the type
variables remaining in it with no constraint are considered polymorphic
and _generalized_. As a consequence, in identifier binding expressions,
identifiers obtain not a type but a type scheme (see also
the presentation of [type variables](/manual/The-type-system/Type-algebra/type_variable)) that can be
polymorphic.
Typechecking this construct processes in 3 steps:
1. First, `expr1` is typechecked, leading to a type `τ1`.
2. Then `τ1` gets generalized: all type variables without any constraint are generalized and universally quantified to lead to a type scheme. This type scheme is then bound to the identifier `ident` in the typing environment.
3. Finally, `expr2` is typechecked in the extended typing environment, giving a type `τ2` which is the final type of the whole expression.
###### Examples:
v =
one = 1
one + one
> v : int = 2
The identifier `one` is bound to the value `1` which has type `int`. Then it
is used in an addition. The operator `+` expects 2 integers and returns an
integer.
Hence, since `one` was found of type `int`, it can safely be used with the
addition, hence this latter expression leads to the final type `int`.
By the way, we can note a special identifier binding construct, the
toplevel one that binds the identifier `v`. In its toplevel flavor, this
construct doesn't have any trailing expression (we previously called
`expr2`). This allows to bind identifiers at the outermost level of a
program, i.e. to give it a scope (a _life duration_) spanning on
_all the rest of the program_.
pair =
function identity(x) { x }
(identity(1), identity("foo"))
> pair : tuple_2(int, string) = { f1: 1, f2: "foo" }
The above code snippet introduces 2 features. First, the shortcut for defining
a function by directly enumerating its arguments. Instead of writing
`f = x, y -> 4`, it is possible to write `function f(x, y) { 4 }`. Second, we make use of
the built-in syntactic sugar to represent tuples whose components are enclosed
between parentheses and comma-separated.
Hence, the identifier `identity` gets bound to a function type directly
returning its argument. Clearly, such a function can take any kind of value and
will return this kind of value. The only constraint on the type of this
function is that it returns the same type it got for its argument, hence
giving the type `'a -> 'a`. This type can then be generalized giving `identity`
the polymorphic type scheme `∀ 'a.'a -> 'a`. The function is then invoked
twice, once with an `int`, once with a `string`, which is legal since it has
a polymorphic type. Hence, the results of these invocations have types `int`
and `string` and are grouped in a tuple, `tuple_2(int, string)` which is the
final type of this expression.
[[recursive_identifier_binding]]
#### Recursive identifier binding
Similarly to the identifier binding construct, the recursive flavor
rec ident_1 = expr_1
...
rec ident_N = expr_N
expr
allows to bind each `ident_i` to the value of `expr_i` but with
all the `ident_x` possibly appearing in the bodies, `expr_x`, of
the bound expressions. This hence allows (mutually) recursive
definitions. Intuitively, this implies that to compute the type to bind to
`ident_i`, we must compute the type of all the expressions `expr_x`
for which we may need to already known the types bound to the `ident_x`.
Note that toplevel definitions are automatically reordered, i.e. it is possible
to use a function and define it later and if needed, definitions are implicitly
made recursive. Hence, the `rec` construct mostly serves to define *local*
recursive definitions, i.e. definitions not spanning at toplevel (or in other
words, definitions nested in other definitions).
Typechecking this construct processes in 5 steps:
1. First, each `ident_i` is temporarily given a _unknown_ type: a trivial type scheme (i.e. with no generalized variables) whose body is a type variable. *This especially means that a recursively defined identifier can't be used polymorphically in its own definition.* The typing environment gets extended, binding each `ident_i` to its related temporary type scheme.
2. Then each `expr_i` is typechecked in the extended typing environment, leading to a type `τ_i`.
3. Now the bodies of the bindings are typechecked, each `τ_i` and the related temporary type scheme bound to `ident_i` (which have possibly be instantiated by some type constraints) are checked to ensure they are compatible.
4. If so, each `τ_i` is then generalized and becomes the type scheme bound to `ident_i` in the typing environment.
5. Finally, `expr_n+1` is typechecked in the extended typing environment, giving a type `τ_n+1` which is the final type of the whole expression.
#### Value identifiers
As we previously introduced (see [Functions](/manual/The-type-system/Typing-a-program/typing_function)), the
typing environment is a memory keeping trace of types inferred and bound
to identifiers (bound by a function definition or an identifier binding
construct). More accurately, it contains, not the types, but the *type schemes*
bound to identifiers since Opa supports polymorphism.
Typechecking an identifier expression aims to find a type for this identifier
and is performed in two steps.
1. First a looking-up is performed in the typing environment to find the type scheme entered for this identifiers. Providing that the identifier was actually defined, it das obviously been typechecked and bound a type scheme in this environment.
2. The obtained type scheme must be transformed into a type. Remember that a type scheme is a _model_, a _pattern_ of types, describing a _family_ of possible types. Generalized (i.e. universally quantified) type variables in the scheme represent _parts_ that can be instantiated at need by any type expression. In order to allow the identifier to be used with different types in place of the type variables of its scheme, the scheme gets _specialized_, i.e. the type for this occurrence of the identifier is built by copying the scheme's body, replacing all generalized type variables by fresh (i.e. not appearing anywhere in the program) type variables.
###### Examples:
function id(x) { x }
> id : 'v0 -> 'v0 = <fun>
one = 1
> one : int = 1
hello = "Hi"
> hello : string = "Hi"