Plasma Language Reference
=========================
Paul Bone <paul@plasmalang.org>
v0.1, January 2018: Draft.
Copyright (C) 2015-2018 Plasma Team
License: CC BY-SA 4.0
:toc:
As the language is under development this is a working draft.
Many choices may be described only as bullet points.
As the language develops these will be filled out and terms will be
clarified.
link:https://github.com/PlasmaLang/plasma/tree/master/docs/plasma_ref.txt[Contribute to this page]
== Lexical analysis and parsing
The "front end" passes of Plasma compilation work as follows:
* Tokenisation converts a character stream into a token stream.
* Parsing converts the token stream into an AST.
* AST->Core transformation converts the AST into the core representation.
This phase also performs symbol resolution, converting textual identifiers
in the AST into unique references.
=== Lexical analysis
* Input files are UTF-8
* Comments begin with a '#' and extend to the end of line.
* Curly braces for blocks/scoping
* Whitespace is only significant when it separates two tokens that would
otherwise form a single token
* Statements and declarations are not delimited. The end of a statement can
be determined by the statement alone. Therefore: there are no statement
terminators or separators (such as semicolons in C) nor significant
whitespace (as in Python or Haskell).
* String constants are surrounded by double quotes and may contain the
following escapes: +\n \r \t \v \f \b \\+ (see the example below).
Escaping the double quote character and using character codes are not
currently supported. Escaping any other character prints that character
as is; this allows +\'+ to work as many programmers may expect, even
though it's not necessary.
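For example, a string constant using some of these escapes (this snippet is
purely illustrative):
----
greeting = "Hello\tworld\n"   # A tab and a trailing newline.
quoted = "it\'s fine"         # \' is accepted and prints ' as is.
----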
=== Parsing
Plasma's EBNF is given in pieces throughout this document as concepts are
introduced.
However, the top level and some shared definitions are given here.
In this EBNF syntax I use ( and ) to denote groups and ?, + and * to denote
optional, one or more, and zero or more, respectively.
----
Plasma := ModuleDecl ToplevelItem*
ModuleDecl := module ident
ToplevelItem := ExportDirective
| ImportDirective
| TypeDefinition
| ResourceDefinition
| FuncDefinition
ModuleQualifiers := ( ident . )*
QualifiedIdent := ModuleQualifiers ident
IdentList := ident ( , ident )*
----
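As an illustration of this top level structure, a minimal module might look
like the following sketch. It combines constructs (exports, imports,
functions and resources) that are described in later sections; the specific
names are only examples.
----
module Hello

export main

import IO

func main() -> Int uses IO {
    print!("Hello world\n")
    return 0
}
----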
=== A note on case and style.
Sometimes it is necessary to use case to distinguish symbols in
different namespaces that may appear in the same expression. For example
type names and type variables can both appear in type expressions. In other
situations there is no requirement but it can be useful to adopt a
convention that makes it easier to read code.
Disambiguation based on case is done as part of AST->Core transformation.
Plasma either requires or suggests the following cases in the following
situations.
|===
| | Requirement | Suggestion | Notes
| Variable | first letter lower | lower_case |
| Function Name | first letter lower | lower_case |
| Module Name | - | UpperCase | Case insensitive
| Type Name | first letter upper | UpperCase |
| Type Variable | first letter lower | lower_case |
| Data constructor | first letter upper | UpperCase | to distinguish construction from function application or variable use.
| Field selector | first letter lower | lower_case | Must be the same as function names.
| Interface | ? | UpperCase |
| Instance | ? | lower_case | not first class,
but may appear in expressions.
| Resources | - | lower_case |
|===
Note that there may be more symbol namespaces in the future.
The rationale for these decisions is:
.Variables, functions and field selectors
The most common symbols should be lower case, using '_' to separate words;
this is preferred but not enforced.
.Module names
It is useful to visually distinguish module names from other symbols that
can appear within expressions. Currently module names can only be used to
module-qualify other symbols, but this may change in the future. This may
also become a requirement rather than a suggestion.
.Types and type variables
Type variables must be distinguished from types. Note that variables
don't need to be distinguished from functions as this is available from
context: free variables do not exist and a bound variable has the same
semantics as a defined function name.
Therefore Plasma uses case to distinguish between type names (uppercase
first letter) and type variables (lowercase first letter).
----
List(t)
----
is a list of _t_, where +List+ is the list type and +t+ is a type
variable; it may stand for any type. Note that this is the same as
Haskell but the opposite of Mercury.
.Data constructors
Code that does different things should look different.
Therefore data construction should stand apart from function calls, and
hence it is useful for data constructors to begin with capital letters.
It could be argued that the same is true for field selection. Suggestions
welcome.
.Interfaces and Interface Instances
Interfaces are to instances as types are to values.
This is reflected in our decision to suggest that interfaces should be
CamelCase and instances lower_case.
Also, instances and module qualifiers can both appear within
expressions as a prefix to another symbol; with these conventions
instances will also appear distinct from module qualifiers.
[[environment]]
== Environment
The _environment_ is a concept we will consider for Plasma's scoping rules.
The environment maps symbols to their underlying items (modules, types,
functions, variables etc). Even though no environment exists at runtime,
and the compile-time structure is an implementation detail of the compiler
(+pre.env+), it is useful to think of scoping in these terms, as it explains
most scoping behaviours.
Some languages allow overloading of symbols, usually based on a symbol's
type and sometimes on its arity. Plasma does not support any overloading.
=== Scopes
When a new name is defined it is added to the current environment.
----
print!(x) # x does not exist.
x = "hello" # x (a variable) is added to the environment.
print!(x) # We may now refer to x.
----
When a nested block starts, it creates a new environment based upon the old
environment.
----
x = "hello"
if (...) {
    print!(x) # Ok
}
----
When a nested block ends, the original environment is restored.
----
if (...) {
    x = "Hello"
    print!(x) # Ok
}
print!(x) # Error
----
.Variables
When all paths (of a branching statement) assign the same variable, then
that variable is _added to_ the original environment. No earlier
declaration is required. This only makes sense for branching statements,
and only happens with variables.
----
if (...) {
    x = "Hello"
    y = "Hi"
} else {
    x = "Goodbye"
}
print!(x) # Ok
print!(y) # Error
----
.Other symbols
This is an error for other symbols.
----
if (...) {
    import SortedListSet as Set
    # Some code using Set
} else {
    import RBTreeSet as Set
    # Some code using Set
}
# Cannot use Set here.
----
This would be a desirable feature; however, it complicates the type system.
Instead, <<interfaces,Interfaces>> provide a less direct way to accomplish
this kind of thing.
=== Shadowing
Shadowing refers to a new binding with the same name as an existing binding
being permitted and taking precedence in an _inner_ or _later_ scope.
Shadowing is not permitted for variables at all. It is permitted for other
symbols.
NOTE: TODO: Decide on rules for a symbol of one kind shadowing a symbol of
another kind. For example it should probably be an error for a module
import to shadow an interface declaration. But it's probably okay for a
variable to shadow a function, unless that function is defined within
another function (a closure).
==== Variables
A variable cannot shadow another variable.
----
x = 3
x = 4 # Error
if (...) {
    x = 5 # Error
}
----
NOTE: We are considering a special syntax to use with variables that allows
shadowing.
==== Other symbols
Symbols other than variables allow shadowing; for example, module imports
can create shadowing of their contents (types, functions etc), including
when import is used with a wildcard. Therefore we can use a
different +Set+ implementation in the inner scope:
----
import SortedListSet as Set
...
# some code
...
{
    import RBTreeSet as Set
    ...
    # some code using RBTreeSets
    ...
}
...
# back to SortedListSet
...
----
(Yes, module imports may appear within function bodies and so-on.)
However, a binding that cannot be observed such as:
----
import SortedListSet as Set
import RBTreeSet as Set
----
doesn't make sense, and the compiler should generate a warning.
TODO: Figure out if context always tells us enough about the role of a
symbol that modules do not need to shadow types and constructors. I suspect
this is true but I'll have to define the rest of the language first.
=== Namespaces
The environment maps names to items. Names might be qualified, and if so
the qualifier is required to refer to that name. For example:
----
import Set
my_set1 = Set.new # Ok
my_set2 = new # Undefined symbol
----
Or they can be unqualified
----
import Set.new
my_set1 = Set.new # Undefined symbol Set
my_set2 = new # Ok
----
The name within the namespace does not need to correspond to the name as
it was defined.
----
import Set.new as new_set
my_set = new_set # Ok
----
This applies to all symbols except for variables, which can never be
qualified. There is no syntax that would allow a variable to be defined
with a qualifier.
== Modules
Each file is a module; the file name must match the module name (case
insensitively). By convention CamelCase is used.
NOTE: I prefer lower case filenames, but I also want these to match. Maybe
I'll grow to like CamelCase file names.
Each module begins with a module declaration.
----
module MyModule
----
=== Module Imports
----
ImportDirective := import QualifiedIdent
| import QualifiedIdent . *
| import QualifiedIdent as ident
----
Modules may be imported with an import declaration.
----
import RBTreeMap
import RBTreeMap as Map
import IO.getpid as getpid
import IO.*
----
+import+ imports one or more symbols from a module. Lines one and two add a
module name (+RBTreeMap+ or +Map+, respectively) to the current environment.
Line three imports only the +getpid+ function from +IO+ and names it
+getpid+ in the current environment, while line four imports everything in
+IO+, adding them all to the current environment.
Code using symbols imported by lines one and two will require module
qualification (either +RBTreeMap+ or +Map+), while code using +getpid+ (or
other symbols from +IO+) will not.
A module cannot be used without an +import+ declaration.
=== Module exports
----
ExportDirective := export IdentList
| export *
----
Symbols can be exported with export directives.
----
export my_function
----
If a module has no export directives then nothing is exported, which
probably makes the module useless.
To export everything from a module use a +*+.
----
export *
----
TODO: Syntax for exporting types abstractly & fully.
== Types
The Plasma type system supports:
* Algebraic types
* Parametric polymorphism (aka generics)
* Abstract types
* Other features may be considered for a later version
* Type variables are lower case, type names begin with an uppercase letter
(Haskell style).
See also link:types.html[Type system design] which reflects more up-to-date
ideas.
Type expressions refer to types.
----
TypeExpr := TypeName ( '(' TypeExpr ( ',' TypeExpr )* ')' )?
| 'func' '(' ( TypeExpr ( ',' TypeExpr )* )? ')' Uses* RetTypes?
| TypeVar
# Uses denotes which resources a function may use.
Uses := uses Ident
| uses '(' IdentList ')'
| observes Ident
| observes '(' IdentList ')'
RetTypes := '->' TypeExpr
| '->' '(' TypeExpr ( ',' TypeExpr )* ')'
TypeName := QualifiedIdent
TypeVar := ident
----
Type names must begin with an upper case letter, and +TypeVars+
with a lower case letter; otherwise they are both identifiers.
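For illustration, here are a few type expressions that this grammar accepts
(the particular types are drawn from the builtin and standard library types
listed below; treat them as examples only):
----
List(Int)                # A type name applied to a type argument.
Map(String, List(t))     # Nested application; t is a type variable.
func(Int, Int) -> Int    # A function type with two arguments and one result.
func(String) uses IO     # A function type that uses the IO resource.
----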
We can define new types using type definitions
----
TypeDefinition := type UpperIdent TypeParams? = OrTypeDefn ( '|' OrTypeDefn )*
TypeParams := '(' IdentList ')'
OrTypeDefn := ConstructorName
| ConstructorName '(' TypeField ( , TypeField )+ ')'
TypeField := FieldName ':' TypeExpr
| TypeExpr # Not supported
ConstructorName := ident
FieldName := ident
----
+TypeParams+ is a comma separated list of lowercase identifiers.
+TypeField+ will need lookahead, so for now all fields must be named, but
the anonymous name (+_+) is supported.
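As a sketch of what these definitions look like (the type and constructor
names are invented for this example): an enumeration-style type, and a
generic type with named fields.
----
type Suit = Hearts | Diamonds | Clubs | Spades

type Pair(a, b) = Pair(
        p_first : a,
        p_second : b
    )
----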
TODO: We use vertical bars to separate or types. Vertical bars mean "or"
and are used in Haskell, but in C commas (for enums) and semicolons (for
unions) are used. Which is best? Mercury uses semicolons as these mean
"or" in Mercury.
TODO: We use parens around the arguments of constructors, like Mercury, and
because fancy brackets aren't required. However curly braces would be more
familiar to C programmers.
=== Builtin types
How "builtin" these are varies. +Ints+ are completely builtin and handled by
the compiler where as a List has some compiler support (for special symbols
& no imports required to say "List(t)") but operations may be via library
calls.
* Int
* Uint
* Int8, Int16, Int32, Int64
* Uint8, Uint16, Uint32, Uint64
* Char (a unicode codepoint)
* Float (NIY)
* Array(t)
* List(t)
* String (neither a CString nor a list of chars).
* Function types
These types are implemented in the standard library.
* CString
* Map(k, v)
* Set(t)
* etc...
=== User types
User defined types support discriminated unions (here a +Map+ is
either a +Node+ or +Empty+), and generics (+k+ and +v+ are type parameters).
----
type Map(k, v) = Node(
        m_key : k,
        m_value : v,
        m_left : Map(k, v),
        m_right : Map(k, v)
    )
    | Empty
----
TODO: Syntax will probably change, I don't like +,+ as a separator, I prefer
a terminator, or nothing to match the rest of the language. Curly braces?
+|+ is also used as a separator here.
Types may also be defined abstractly, with their details hidden behind module
abstraction.
[[interfaces]]
== Interfaces
Interfaces are a lot like OCaml modules. They are not like OO classes and
only a little bit like Haskell typeclasses.
Interfaces are used to say that some type and/or code behaves in a particular
way.
The +Ord+ interface says that values of type +Ord.t+ are totally ordered
and provides a generic comparison function for +Ord.t+.
----
type CompareResult = LessThan | EqualTo | GreaterThan
interface Ord {
    type t
    func compare(t, t) -> CompareResult
}
----
+t+ is not a type parameter but +Ord+ itself may be a parameter to another
interface, which is what enables +t+ to represent different types in different
situations; +compare+ may also represent different functions in different
situations.
We can create instances of this interface.
----
instance ord_int : Ord {
    type t = Int
    func compare(a : Int, b : Int) -> CompareResult {
        if (a < b) {
            LessThan
        } else if (a > b) {
            GreaterThan
        } else {
            EqualTo
        }
    }
}
----
Note that in this case each member has a definition. This is what makes
this an interface instance (plus the different keyword), rather than an
(abstract) interface. The importance of this distinction is that interfaces
cannot be used by code directly, instances can.
Code can now use this instance.
----
r = ord_int.compare(3, 4)
----
Interfaces can also be used as parameter types for other interfaces.
Here we define a sorting algorithm interface using an instance (+o+) of the
+Ord+ interface.
----
interface Sort {
    type t
    func sort(List(t)) -> List(t)
}
instance merge_sort(o : Ord) : Sort {
    type t = o.t
    func sort(l : List(t)) -> List(t) {
        ...
    }
}
----
+merge_sort+ is an instance, each of its members has a definition, but it
cannot be used without passing an argument (an instance of the +Ord+
interface). A list of +Int+s can now be sorted using:
----
sorted_list = merge_sort(ord_int).sort(unsorted_list)
----
NOTE: This example is somewhat contrived, I think it'd be more convenient
for sort to take a higher order parameter. But the example is easy to
follow.
+merge_sort(ord_int)+ is an instance expression, as is +ord_int+ in the
example above.
Instance expressions will also allow developers to name and reuse specific
interfaces, for example:
----
instance s = merge_sort(ord_int)
sorted_list = s.sort(unsorted_list)
----
More powerful expressions may also be added.
Instances can also be made implicit within a context:
----
implicit_instance merge_sort(ord_int)
sorted_list = sort(unsorted_list)
----
This is useful when an instance defines one or more operators, as it makes
using the interface more convenient. Suitable instances for the basic types
such as Int are implicitly made available in this way.
Only one implicit instance for the given interface and types may be used at
a time.
== Resources
----
ResourceDefinition := 'resource' UpperIdent 'from' QualifiedIdent
----
This defines a new resource. The resource has the given name and is a
child resource of the specified resource. +SuperRes+ is the ultimate
resource and is already defined, along with its child resources such as
+IO+. See 'Handling effects' below.
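As an illustration of the syntax only (the actual resource hierarchy has
not been decided, see 'Handling effects' below), a child resource of +IO+
might be declared as:
----
resource Filesystem from IO
----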
== Code
=== Functions
----
FuncDefinition := func ident '(' ( Param ( ',' Param )* )? ')'
                  Uses* RetTypes? Block
Param := IdentLower ':' TypeExpr
| '_' ':' TypeExpr
RetTypes := '->' TypeExpr
| '->' '(' TypeExpr ( ',' TypeExpr )* ')'
Block := '{' Statement* '}'
----
Uses is defined above in the type declarations section.
TODO: Probably add support for naming return parameters
TODO: Consider adding optional parens to enclose return parameters.
TODO: More expressions and statements
Code is organised into functions.
A function has the following form.
----
func Name(arg1 : type1, arg2 : type2, ...) -> ret_type1, ret_type2
    Resources?
{
    Statement+
}
----
In the future if the types are omitted from a non-exported function's
argument list the compiler will attempt to infer them. For now all types
are required.
TODO: Find a way that return parameters can be named. This will change the
behaviour of functions WRT having the value of their last statement.
TODO: What if neither the name or type of a return value is specified?
Resources is optional and may include either or both of the "uses" and
"observes" clauses, which are the uses or observes keyword followed by a
list of one or more comma-separated resource names.
The special symbol +_+ can be used as a parameter to ignore any arguments
passed in that position; the type is still enforced.
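Putting this together, a function definition might look like the following
sketch (the names are invented for the example). It ignores its second
argument, uses the +IO+ resource, and returns an +Int+.
----
func greet(name : String, _ : Int) -> Int uses IO {
    print!("Hello " ++ name ++ "\n")
    return 0
}
----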
=== Statements
----
Statement := Assignment
| Call
| Return
| MatchStmt
----
==== Assignment
----
Assignment := LHSIdent ( ',' LHSIdent )* '=' Expr
LHSIdent := IdentLower
| '_'
----
Statements may be assignments, for example:
----
variable = expr
----
An expression may have more than one value.
----
var1, var2 = expr # expr returns two items.
----
This works when the expression is a multi-valued call.
TODO: it should also work when the expression is, for example, an if or
switch expression that returns multiple items. This is a property of the if
or switch expression, not the assignment statement.
----
var1, var2 = if (...) then expr1, expr2 else expr3, expr4
----
Plasma is a single assignment language. Each variable can only be assigned
to once along any execution path, and must be assigned on each execution
path that falls-through (see <<environment,Environment>>).
The special symbol '_' can be used to ignore the return value of a function.
It can be used to selectively capture only some values.
----
div, _ = div_and_quot(7, 5)
----
Or to ignore the result of a function call that affects a resource.
----
_ = close!(file)
----
==== Function call
----
Call := ExprPart1 '!'? '(' Expr ( , Expr )* ')'
----
Function calls often return values, however functions that do not return
anything can be called as a statement. Such a function only makes sense if
it affects a resource, and therefore will have a '!'. However the grammar
and semantics allow functions that don't have an effect (the compiler will
almost certainly optimize these away).
----
function_name!(arg1, arg2)
----
Calls may also be expressions (see below); as an expression a call might
still use or observe some resource. However only one call per statement may
observe the same or a related resource; this ensures that effects happen in
a clear order.
==== Return
----
Return := 'return' TupleExpr
| 'return'
----
For example:
----
# Return one thing
return expr
# Return two things
return expr1, expr2
# Return nothing
return
----
A function that returns one or more values must always end in a return
statement, or a branching statement that (indirectly) ends in a return
statement on each branch.
TODO: This will need to be relaxed for code that aborts.
TODO: Named returns.
Functions that return nothing may optionally use a return statement; this
can be used to implement early return.
Functions and blocks do not have values.
This is deliberate to keep functions and expressions _semantically_
separate.
This means that the last statement of a block does not have any special
significance as it does in some other languages.
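For example, early return from a function that returns nothing might look
like this sketch (the function and its arguments are invented for the
example):
----
func maybe_print(level : Int, msg : String) uses IO {
    if (level < 1) {
        return
    }
    print!(msg)
}
----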
==== Pattern matching
----
MatchStmt := 'match' Expr '{' Case+ '}'
Case := Pattern '->' '{' Statement* '}'
Pattern := Number
| IdentLower
| '_'
| IdentUpper ( '(' Pattern ( ',' Pattern )* ')' )?
----
Pattern matching is also a statement (as well as an expression).
Cases are tried in the order they are written, the compiler should provide a
warning if a case will never be executed, or a value is not covered by any
cases. All variables in the pattern (the LHS of the +->+) must be new.
----
match (n) {
    0 -> {
        beer = "There's no beer!"
    }
    1 -> {
        beer = "There's only one beer"
    }
    m -> {
        beer = "There are " ++ show(m) ++ " bottles of beer"
    }
}
print!(beer)
----
If a variable bound by one of the cases is used outside the match (like
+beer+) then it must be bound by every case. If it is not used outside the
match, then each case's variable of the same name is "named apart"
(see <<environment,Environment>>).
Currently either all cases must have a return statement or none of them.
TODO: Matches where some return and others do not will be added in the future.
==== If-then-else
----
if (expr) {
    statements
} else if (expr) {
    statements
} else {
    statements
}
----
There may be zero or more else if parts.
Plasma's single-assignment rules imply that if the "then" part of an
if-then-else binds a non-local variable, then there must be an else part
that also binds the variable (or does not fall-through). Else branches
aren't required if the then branch does not fall-through or does not bind
anything (it may have an effect).
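For example (a sketch; assume +n+ is an +Int+ already in scope and the
surrounding function uses +IO+):
----
if (n < 0) {
    sign = "negative"
} else {
    sign = "non-negative"
}
print!(sign)    # Ok: sign is bound on every path.
----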
[[loops]]
==== Loops
NOTE: Not implemented yet.
NOTE: I'm seeking feedback on this section in particular.
----
# Loop over both structures in a pairwise way.
for [x <- xs, y <- ys] {
    # foo0 and foo form an accumulator starting at 0. The value of foo
    # becomes the value of foo0 in the next iteration.
    accumulator foo0 foo initial 0
    # The loop body.
    z = f(x, y)
    foo = foo0 + bar(x)
    # This loop has three outputs. "list" and "sum" are names of
    # reductions. Reductions are instances of the reduction
    # interfaces. They "reduce" the values produced by each iteration
    # into a single value.
    output zs = list of z
    output sum = sum of x
    # foo is not visible outside the loop, an output is required to
    # expose it. value is a keyword, it is handled specially and
    # simply takes the last value encountered.
    output foo_final = value of foo
}
----
NOTE: the accumulator syntax will probably change after the introduction of
some kind of state variable notation.
TODO: Introduce a more concise syntax for one-liners and expressions.
Similar in succinctness to using map and foldl calls.
The loop will iterate over corresponding items from multiple inputs. When
they're not of equal length the loop will stop after the shortest one is
exhausted. This decision allows them to be used with a mix of finite and
infinite sequences.
Looping over the Cartesian combination of all items should also be supported
(syntax not yet defined, maybe use +&+). This is equivalent to using nested
loops in many other languages.
Valid input structures are: lists, arrays and sequences. Sequences are
coroutines and therefore can be used to iterate over the keys and values of
a dictionary, or generate a list of numbers.
TODO: Possibly allow this to work on keys and values in dictionaries. If
the keys are unmodified during the loop then the output dictionary can be
rebuilt more easily, its structure doesn't need to change. Lua has the
ability to require keys to be sorted, or to drop this requirement.
The output declarations include a reduction. This is how the loop should
build the result.
TODO: Reduction isn't a good word for it, since the output type can be
either a scalar or a vector.
The type of the reduction's result can be completely different from the
type of any of the inputs. The following example builds an array from a
list (or other ADT), using the +array+ reduction.
----
for [x <- xs] {
    y = f(x)
    output ys = array of y
}
----
Many reductions will be possible: +array+, +list+, +sequence+, +min+, +max+,
+sum+, +product+, +concat_list+. Developers will be able to create their
own as these are interfaces.
Loops are implemented in terms of coroutines. Coroutines return the values
for the inputs, and coroutines handle building the value of the outputs
from the loop body (+list+ and +sum+ are coroutines above). Coroutines
offer the most flexibility as some of their state is kept on the stack.
Simpler implementations should be used as an optimisation when it is
possible. In these cases some loops may be optimised to calls to map or
foldl, or even simpler inline code.
Auto-parallelisation (a future goal) will work better with reductions that
are known to be either:
- Order independent
- Associative / commutative, but whose input type is the same as the output
- Mergeable, with a known identity value.
Accumulators are implemented more directly (not coroutines). However they
require the iterations to be processed in a specific order and may inhibit
parallelisation. A dependency analysis on the body and separating out the
code for each accumulator may mitigate this, especially if it can be
combined with the same analyses as reductions above.
=== Expressions
Expressions are broken into two parts. This allows us to parse call
expressions properly, with the correct precedence and without a left
recursive grammar. Binary operators are described as a left recursive
grammar, but are not implemented this way; their precedence rules are
documented below.
----
TupleExpr := Expr ( ',' Expr )*
Expr := Expr BinOp Expr
| UOp Expr
| ExprPart1 '!'? '(' Expr ( , Expr )* ')' # A call or
# construction
| ExprPart1 '[' '-'? Expr ( '..' '-'? Expr )? ']' # array access
| ExprPart1
ExprPart1 := '(' Expr ')'
| '[' ListExpr ']'
| '[:' TupleExpr? ':]' # An array
| QualifiedIdent # A value
| Const # A constant value
BinOp := '+'
| '-'
| '*'
| '/'
| '%'
| '++'
| '>'
| '<'
| '>='
| '<='
| '=='
| '!='
| 'and'
| 'or'
UOp := '-' # Minus
| 'not' # Logical negation
----
UOp operators have higher precedence than BinOp. BinOp precedence is as
follows: group 1: * / %; group 2: + -; group 3: < > <= >= == !=;
groups 4-7: and, or, ++, ','.
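For example (a sketch limited to the groups whose relative precedence is
spelled out above):
----
x = a + b * c     # Parsed as a + (b * c): group 1 binds tighter than group 2.
y = -a + b        # Parsed as (-a) + b: unary operators bind tighter still.
----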
Lists have the following syntax (within square brackets)
----
ListExpr := e
| Expr ( ',' Expr )* ( '|' Expr )?
----
Examples of lists are:
----
# The empty list
[]
# A cons cell
[ head | tail ]
# A list 1, 2, and 3 are "consed" onto the empty list.
[ 1, 2, 3 ]
# Consing multiple items at once onto a list.
[ 1, 2, 3 | list ]
----
Array elements may be accessed by _subscripting_ the array. Eg
+a[3]+ will retrieve the 3rd element (1-based). A dash before the subscript
expression will count backwards from the end of the array; +a[-2]+ is the
second last element. This syntax currently clashes with unary minus and so
is currently unimplemented. Array slices will use the +..+ token and are
also unimplemented.
TODO: Streams
Any control-flow statement is also an expression.
----
x = if (...) { statements } else { statements }
----
In this case the branches cannot bind anything visible outside of
themselves, and the value of a branch is the value of the last statement in
that branch.
TODO: Pattern matching expressions.
==== Ideas
These are just ideas at this stage, they are probably bad ideas.
If a multi-return expression is used as a sub-expression in another context
then that expression is in-turn duplicated.
----
x, y = multi_value_expr + 3
----
is
----
x0, y0 = multi_value_expr
x = x0 + 3
y = y0 + 3
----
Therefore calls involved in these expressions must not "use resources".
Another idea to consider is that a multiple return expression in the context
of function application applies as many arguments as values it returns. We
probably won't do this.
----
... = bar(foo(), z)
----
is the same as
----
x, y = foo()
... = bar(x, y, z)
----
== Handling effects (IO, destructive update)
Plasma is a pure language, we need a way to handle effects like IO and
destructive update. This is called resources. A function call that uses a
resource (such as +print()+), may only be called from functions that declare
that they use a resource. This means that a callee cannot use a resource
that a caller doesn't expect (resource usage is transitive) and anyone
looking at a functions' signature can tell that it might use a resource.
A resource usage declaration looks like:
----
func main() -> Int uses IO
----
Here +main()+ declares that it uses (technically _may use_) the +IO+
resource. Resources can be either _used_ or _observed_; and a function may
use or observe any number of resources (decided statically). An observed
resource may be read but is never updated, a used resource may be read or
updated. This distinction allows two observations of a resource to commute
(code may be re-arranged during optimisation), but two uses of a resource
may not commute.
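For example (signatures only, in the style of +main()+ above; +Time+ is one
of the hypothetical resources mentioned below, and +now+ and +sleep+ are
invented names): reading a clock merely observes it, while sleeping changes
what later reads will return, so it uses it.
----
func now() -> Int observes Time

func sleep(seconds : Int) uses Time
----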
Developers may declare new resources, the standard library will provide some
resources including the +IO+ resource. Examples of +IO+ 's children might be
+Filesystem+ and +Time+, +Filesystem+ might have children for open files
(WIP), although none of these have been decided / implemented.
A call is valid if:
|===
| | Callee is Pure | Callee may Observe | Callee may Use
| Caller is Pure | Y | N | N
| Caller may Observe | Y | Y | N
| Caller may Use | Y | Y | Y
|===
You'll find that this is very intuitive.
It's shown in a table for completeness.
=== Resource hierarchy
Resources form a hierarchy (not yet defined). For a call to be valid either
the resource, or its parent must be available in the caller. For example if
+mkdir()+ uses the +Filesystem+ resource, which is a child of +IO+ then any
caller that +uses IO+ can call +mkdir()+.
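Continuing that hypothetical example (neither +Filesystem+ nor +mkdir()+
has been decided or implemented), the declarations involved might look
like:
----
resource Filesystem from IO

func mkdir(path : String) uses Filesystem

# A caller that uses IO may call mkdir, because Filesystem is a child of IO.
func setup() uses IO {
    mkdir!("/tmp/example")
}
----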
=== Temporary resources (NIY)
Some resources can be created and destroyed, and rather than always being a
part of their parent (+Filesystem+ is always a part of +IO+) they are
subsumed by their parent instead. For example an array uses some memory as
its resource; that memory is allocated and freed when the array is
initialised and then goes out of scope (it is unique). But if the memory
resource is created and destroyed within the same function, its caller does
not need the uses declaration; memory and possibly some other resources are
special cases.
=== Resources in statements
Every call that uses a resource must have the +!+ suffix. For example:
----
print!("Hello world\n")
----
This makes it clear to anyone reading the code to *beware*: something
_happens_, _changes_ or might be _observed_ to have happened or have
changed. This is also the entire reason to have it in the language; it
serves no other function, but the compiler will make sure that it is
present on every call that either uses or observes something.
Multiple calls with +!+ may be used in the same statement, provided that
their resources do not overlap, or they are all observing the resource and
not modifying it. (Note that we are debating this at the moment.)
=== Commutativity of resources
Optimisation may cause code to be executed in a different order than
written. The following reorderings of two related (ancestor/descendant)
resources are legal.
|===
| | None | Observe | Use
| None | Y | Y | Y
| Observe | Y | Y | N
| Use | Y | N | N
|===
Non-related resources may be reordered freely.
=== Higher order code (NIY)
This aspect of Plasma is currently under consideration.
* Higher order functions need to handle resources, otherwise their
usefulness is reduced.
* Resource usage from such code needs to be safe (WRT order of operations).
* We want to encourage polymorphism here, otherwise people will write
higher-order abstractions that can't be used with resources.
* We'd prefer to make code concise that isn't intended to be used with
resources, but ought to be resource-capable anyway.
==== Proposal 1
Higher order values may have +uses+/+observes+ declarations (added to their
type); values without such declarations are pure. All higher order calls
have the usual +!+ sigil and the statement rules apply.
Map over list looks like:
----
func map(f : a -> b using r, l : List(a)) -> List(b) using r {
    switch (l) {
        case [] -> {
            return []
        }
        case [x0 | xs0] -> {
            x = f!(x0)
            xs = map!(f, xs0)
            return [x | xs]
        }
    }
}
----
Note that the calls to +f+ and +map+ must be in separate statements.
This has the disadvantage that it is not as concise, and that people who
aren't planning to use resources won't write resource-capable code; if that
code is in a library it may be annoying to modify it if it needs to be used
with a resource later.
==== Other proposals
There are several other ideas and their combinations that may help.
* All higher order code implicitly uses resources; a function like map
therefore also uses that resource since it contains such calls. Unless
there's a way to show this in the function's signature I'm not fond of it,
unless, that is, we leave that to type inference.
* Require all higher-order code to handle resources, users may feel that the
compiler is being overly-pedantic.
* Higher order calls are exempt from the one-resource-per-statement rule.
Making the code more concise (it still includes a !).
* Either expressions have a well-ordered declarative semantics or
* resources must be declared as 'don't-care' ordering so they can be placed
in the same statements.
=== Linking to and storing as data (NIY)
Linking a resource with a _real_ piece of data, such as a file descriptor,
is highly desirable. Likewise putting such data inside a structure to be
used later, such as a pool of warmed-up database connections, will be
necessary.
There are a couple of ideas. We could add information to the types to say
that they are resources and what their parent resource type is, so that the
variable can stand in for the resource.
----
type Fd =
    resource from Filesystem
func write(Fd, ...) uses Fd
----