- Daniil Livitin - language features developer
- Roman Soldatov - language features developer
- Mikhail Martovitsky - implement a standard library and predefined functions. Online compiler Backend developer.
- Timur Nugaev - tester & test writer. Online compiler Frontend developer.
- The link to the source code.
- The link to the readme with language documentation
- The organisation with repositories.
- Functional Lisp-like language with a static type system.
- The language is not purely functional.
The language represents a sequence of elements. Elements can be:
- List. The sequence of elements. They can be considered as just a sequence of homogenous elements or a list can be of a special form which represents a list as a function. In this case, elements don't require to have the same type. The special form of a list is an S-expression, so each element is evaluated before a function call.
- Literal. The basic values such as integers, doubles, booleans and strings.
- Atom. Identifier of a variable which can be assigned to a literal or a list. An identifier can be written with the use of letters, digits and underscores. The name can't start with a digit and it must contain at least one letter.
- Function type. A function declaration of input argument types and return types separated by
->
- Tuple. A sequence of heterogeneous elements separated by a comma
,
.
Program : Element { Element }
List : ( Element { Element } )
Element : Atom | Literal | List | Tuple | Functype
Atom : Identifier
Literal : [+|-] Integer | [+|-] Double | Boolean | String
Tuple : (Element , Element {, Element})
Functype : (Element {Element} -> Element)
Identifier : { _ } Letter { Letter | DecimalDigit | _ }
Letter : Any Unicode character that represents a letter
Integer : DecimalDigit { DecimalDigit }
DecimalDigit : 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Double : Integer . Integer
Boolean : true | false
String: " Characters "
Characters: { Letter }
- Before interpretation compiler runs the type checker which analysis AST asking each node its type. Connected nodes compare their expected types with actual or infer ones. Each node has a set of rules on which type of arguments it expects and what type of argument it should return. For example, if the node is a condition which accepts a single argument of type boolean and returns some node of any type, then we take into account several things
(cond BooleanCondition TrueBranch FalseBranch
: cond
has three children nodes that represent branches(TrueBranch
andFalseBranch
) andBooleanCondition
.BooleanCondition
node must return a boolean.- Branches must return the same type. For this purpose, we recursively check what each branch returns and then compare types.
- The parent node (which calls
cond
) must provide an external atom argument forcond
of type boolean. Also, this parent node knows whatcond
returns since we checked its branches. Therefore, we can conclude what the parent node should return. Thus, if we know that e.g.cond
returnsInt
, but its parent node should returnDouble
, then it's the types error which we detected during the compilation. - if the node has an
Auto
type, but its children or parent provides some defined type for it (e.g.String
). Then we can conclude that argument's typeAuto
is actually a String. That's how the type inference works.
In compiler implementation, to check that connected nodes pass and return arguments of expected types, each AST node has the following methods
List<NodeType> getArgumentType(); // Get a list of argument types which node accepts
void setArgumentType(List<NodeType>); // Set a list of argument types which node should accept. Call this method for type inference (if node had Auto type)
NodeType getReturnType(); // Get a type of return type
void setReturnType(NodeType type); // Set return type which node should provide. Call this method for type inference (if node had Auto type)
- For the
function
we should check if the provided arguments are the same types as declared in thefunctype
keyword. If the argument is not Base type then we should recursively get the type of argument. - For the variable (atom) we can use the
define
keyword to specify the variable's type. - If the function accepts or returns an atom of type
Any
orNum
then it can be used for different argument types (Int
andDouble
will be acceptable forNum
). In this case, possible exceptions would be detected only during the runtime.
In our language type annotation in the source code is optional.
If type is not specified explicitly then it should be inferred by the interpreter automatically. Alternatively, you can specify a type as unknown using Auto
.
- if the node's argument or the return type is
Auto
and such node does not have special rules of required types (e.g. thecond
has a rule that condition has a type of boolean) and its connected nodes also can't conclude their required types then the type inference won't work during the compile time. So, it's not possible to traverse through nodes and detect their types if everything has the typeAuto
. - if connected nodes have types
Any
andAuto
thenAuto
could be only resolved as a type ofAny
. However, we could get a type exception during the runtime because the type checker was not able to check these generic types.
Language has general-purpose types that are built into it:
-
Numeric types (
Int
,Double
,Num
- parent class ofInt
andDouble
)Int
- integer numberDouble
- number with floating pointNum
- can be acceptable eitherInt
, orDouble
-
Boolean type (
Boolean
)Boolean
- has values:true
,false
-
Unit type (
Unit
). It indicates that a function accepts or returns nothing. -
String type (
String
). Any symbols between quotes signs "..." -
Any type (
Any
). Parent of all types. It can be used in functions in which we don't care about the variable's type. -
Homogeneous lists. A list is a sequence of elements separated by whitespaces and enclosed by parentheses. However, there are two types of lists in general.
- The list with elements of the same type. So, this list is just a simple list of elements.
- However, the list can be interpreted as an S-expression. So, the compiler considers this list as a function. We call such a list a list with a special form. A list with a special form has the same syntax, but it is considered as a function definition or a function call. Its elements don't require to have the same type (since it's not a simple sequence of homogeneous elements). Therefore, such a list doesn't support classic operations on a list (e.g.
cons
,tail
) since it represents a function.- Function definition
- The first element is an atom representing the function name
- The second element is a list of arguments
- The third element is the function body
- Function call
- The first element is an atom representing the function name
- The second element is a list of arguments
- Function definition
-
Heterogeneous tuples. The tuple should have more than 1 element. The difference between tuples and lists is that tuple's elements are separated by a comma.
- User can define aliases with the use of primitives.
- Aliases can be defined in a local scope.
(type Seconds Int)
(type Minutes Int)
(functype ConvertSecondsToMinutes (Seconds -> Minutes))
(func ConvertSecondsToMinutes (seconds) (divide seconds 60))
- To import some program with defined functions and atoms from another file just write
import file.txt
. - It should be written above the usage of the functions.
- From an implementation point of view, before running the lexical analysis we substitute the content of the imported file in a program row where
import
was written.
Our standard library has the following functions:
(functype lastElement ((Any) -> Any))
(functype reverse ((Any) -> (Any)))
(functype getListWithoutLastElement ((Any) -> (Any)))
(functype getMinElement ((Any) -> Any))
(functype removeElement (Any (Any) -> (Any)))
(functype sort ((Any) -> (Any)))
To import the standard library you need to write in the source code:
import std.txt
After that, you can use the aforesaid functions in your code.
- Our language support nested definitions. It means that it is possible to define a local variable/function in any scope.
- From an implementation point of view, we keep the singleton stack of hashmaps. When the function should enter the local scope, we create a new hashmap and put it in the stack. All local function and atom definitions are stored in such a hashmap. To get the outer scope atom we traverse through the stack. When we leave the local scope, the top stack's hashmap is deleted.
Example:
(setq x 55) ; define global x = 55
(setq y 66) ; define global y = 66
(prog (x y) (
(func helper () 42) ; define local function 'helper'
(setq x (helper)) ; define local x = 42
(setq y 6) ; define local y = 6
(x y) ; (42 6)
))
; the 'helper' function doesn't exist anymore
(x y) ; These x and y are the global ones (55 66)
Reserved word Auto
is used for specifying omitted types
; Define a function 'sum'
(functype sum (Auto Auto -> Auto))
(func sum (a b) (plus a b))
; Compiler checks that the function 'sum' is used for integers.
; Therefore, 'sum' is resolved as (Int Int -> Int) function
(sum 5 4)
; Since 'sum' was inferred as (Int Int -> Int) function, here you'll get a compile error because arguments must be integers, not doubles.
; (sum 4.4 2.3)
- Type of the function :
Everything before
->
is the type of argument of the function Everything after->
is returned type of function
(functype sum (Num Num -> Num)) ; function takes 2 arguments of type Num and return type Num
(functype example ((Num -> Num) -> Num)) ; argument of the function is a function that takes Num and return Num and returned type of the function is Num
Comment lines ;
; Everything after the semicolon is a comment on this line
; The compiler will ignore this part of the code
Create a variable setq
(functype setq (Any Any -> Unit))
(setq a 5) ; a has type Int
(setq b 5.3) ; b has type Double
(setq c true); c has type Boolean
(setq d "hello, world"); d has type String
Reassignment
(setq x 5) ; We declared and initialized a variable x which has type Int
(setq x 6); x reassigned a value. It's still has a type Int
; (setq x 8.0) compile error. Can't reassign to Double since x was declared as Int
(setq a 8)
(setq x a); now x has a value of 8
define
keyword
(functype define (Any Any -> Unit))
The user can't define an atom of types: Unit
, Any
, Num
(define x Int) ; declare a variable e with type Int
(setq x 5); assign an integer value
(define x Int); In futher usage x must be Int
; (setq x true) compile error
(setq x 5); valid assignment since 5 is integer
functype
keyword
- The syntax: (
functype
(list of arguments types
->return type
)). - You can declare the
functype
for a particular function only once above thefunc
. Unit
argument can be used only as a single type.
(functype unitExample1 (Unit -> Int)) ; can use Unit as argument
(functype unitExample1 (Unit Double -> Int)) ; can't use Unit. It should be single.
(functype myEasyFunction (Int -> Double)) ; a function which takes Int as an argument and return Double
(functype myMediumFunction (Int Boolean -> Double)) ; a function which takes two arguments: Int and Boolean and returns Double.
(functype myHardFunction ((Int -> Boolean) -> Double)) ; a function which takes a function as an argument and returns Double. The argument function is a function which takes Int and returns Boolean.
(functype myInterestingFunction ((Int -> Boolean) -> (Double -> String))) ; a function which takes a function as an argument and returns a function. The argument function accepts Int and return Bolean. The result function is a function which takes Double and returns String
Creare a function func
The syntax: (func
function name
list of arguments
function body
)
(functype squareSum)
(func squareSum (a b)
(
(plus (times a a) (times b b))
))
(setq x (squareSum 2 3)) ; call the function squareSum with arguments 2 and 3. Assign result to x
x ; should output 13
(functype getBestNumber (Unit -> Int)) ; The function doesn't reqire any argument and returns an integer.
(func getBestNumber () 42)
(getBestNumber) ; this line will call function getBestNumber and outputs 42
(setq x 3)
(functype incrementByValue (Int -> Unit))
(func incrementByValue (n)
(setq x (plus x n))
)
; The function accepts an argument n and increments outer scope variable x by n. The function itself returns nothing
(setq x 3)
(functype sideEffect (Unit -> Unit))
(func sideEffect ()
(setq x (plus x 1))
)
; The function accepts nothing and returns nothing. It just increments outer scope variable x
Arithmetic functions
(functype plus (Num Num -> Num))
(functype minus (Num Num -> Num))
(functype times (Num Num -> Num))
(functype divide (Num Num -> Num))
Logical operators
(functype and (Boolean Boolean -> Boolean))
(functype or (Boolean Boolean -> Boolean))
(functype xor (Boolean Boolean -> Boolean))
(functype not (Boolean -> Boolean))
Comparisons
(functype equal (Any Any -> Boolean))
(nonequal false (Any Any -> Boolean))
(nonequal (Any Any -> Boolean))
(less (Num Num -> Boolean))
(lesseq (Num Num -> Boolean))
(greater (Num Num -> Boolean))
(greatereq (Num Num -> Boolean))
Cond
keyword
Syntax: (cond
Boolean condition
result if true
result if false
)
The result for true
and false
branches must have the same type.
(setq a 5)
(setq b 6)
(cond (less a b) ; condition
(plus a b) ; the branch if condition is true
(minus a b) ; the branch if condition is false
)
(cond (less 4 3)
45.2
false
) ; compile error since the return type must be the same for both branches
Lists. The list is homogeneous. If the first element is a function name then such a list is considered as a function call, the rest list's elements are function arguments.
(plus 1 2) ; this is a function call
(1 2 3) ; this is a list of integers
; (1 5.23 true) compile error. The list must be homogeneous
() ; empty list
(define x (Int)) ; x is a list of integers
(setq x (1 2 3)) ; now x is equal to (1 2 3)
; (setq x (true false)) complie error since x is a list of integers, not booleans
For function arguments, we denote a list as (T)
where T
- is the list type.
(functype getMaxElement ((Int) -> Int)) ; the function accepts a list of integers and returns an integer number
(functype getMaxElement ((Int) -> (Int))) ; the function accepts a list of integers and returns a list of integers
Operations on list
(functype cons (Any (Any) -> (Any)))
(functype tail ((Any) -> (Any)))
(functype isempty ((Any) -> Boolean))
(functype length ((Any) -> Int))
(cons 3 ()); returns a list with one element (3)
(cons 1 (2 3)); returns a list (1 2 3)
(head (1 2 3)); returns the first element
(tail (1 2 3)); returns a list without the first element
(isempty (1 2 3)); returns false
(isempty ()); returns true
(length (1 2 3)); returns 3
(length ()); returns 0
lambda
keyword.
Syntax: (lambda
list of arguments
function body
)
(define myFunc (Int -> Int))
(setq myFunc (lambda (p) (plus p 1)))
(functype applyFunction (Int (Int -> Int) -> Int)) ; the function takes an integer and some function as an arguments and returns intger number.
(func applyFunction (x f)
(f x)
)
(applyFunction 3 (lamda (p) (times p p)))
let binding
Syntax: (let
list of defined variables
in body
)
(func getMax (currentMax list)
(cond (isempty list)
currentMax
(let
(
(first (head list))
(restList (tail list))
)
(cond (greater first currentMax)
(getMax first restList)
(getMax currentMax restList))
)
)
)
(setq lst (2 4 1 3))
(getMax (head lst) (tail lst))
tuples
(define x (Int, Double, Int))
(setq x (3, 2.2, 30))
(getAt x 2) ; returns 2.2
(setAt x 2 4.4) ; now x = (3, 4.4, 30)
Currying function
(functype moreThanX (Int (Int) -> (Int)))
(func moreThanX (x lst) (
(cond (isepmty lst)
()
(let
(
(first (head lst))
(restLst (tail lst))
)
(cond (greater first x)
(cons first (moreThanX x restLst))
(moreThanX x restLst)
)
)
)
)
(setq moreThan5 (moreThanX 5)) ; moreThan5 can be considered as a function similar to moreThanX, but the first argument is already passed as 5.
(setq initialList (1 2 5 6 7))
(setq listResult (moreThan5 initialList))
; listResult will have values '(6 7)
In the interpreter package there are classes related to AST interpretation.
- AtomsTable.java saves defined atoms and their values. This class is represented as a singleton. Atoms are stored in some local context. Atoms with similar names shadows atoms from the outer scope. Each local context is represented as a HashMap. The hierarchy of local contexts is presented as a Stack. When you enter into some function or loop the interpreter calls the
introduceLocalContext()
function to create a new temporary local context which will be deleted after leaving the form (leaveLocalContext()
will be called). - FunctionsTable.java is similar to AtomsTable.java, but it stores defined user-functions.
- PredefinedFunction.java and DefinedFunction.java classes are utility functions to interpret a list as a function call and evaluate it.
Online interpreter You can test the interpreter on this website
Instructions for building the project
- Сlone a repository of our project.
- Navigate to the folder of the project using the terminal.
- Run the command
mvn clean install
. This command will create a ‘target’ folder and an executable .jar file inside.
Instructions for using the prototype
- Create a build of the application (see the previous step)
- Navigate to the created ‘target’ folder.
- Run a command:
java -jar Compilers_Innopolis-1.0-SNAPSHOT.jar ../code/main.txt
Note: before running a command you need to enter a code of the program inside the code/main.txt file. You can specify another path to the program file.
; sorting example
(functype getMinElement ((Int) -> Int))
(functype removeElement (Int (Int) -> (Int)))
(functype sort ((Int) -> (Int)))
(func getMinElement (lst)
(cond (isempty (tail lst))
(head lst)
(cond (less (head lst) (getMinElement (tail lst)))
(head lst)
(getMinElement (tail lst))))
)
(func removeElement (el lst)
(cond (equal (head lst) el) (tail lst) (cons (head lst) (removeElement el (tail lst))))
)
(func sort (lst)
(cond (isempty (tail lst))
lst
(cons (getMinElement lst) (sort (removeElement (getMinElement lst) lst)) )
)
)
(setq sortedList (sort initialList))
sortedList
(func getMax (currentMax list)
(cond (isempty list)
currentMax
(let
(
(first (head list))
(restList (tail list))
)
(cond (greater first currentMax)
(getMax first restList)
(getMax currentMax restList))
)
)
)
(define superMaxElement Auto)
(setq superMaxElement (getMax (head sortedList) (tail sortedList)))
(functype getMaxElementTwice (Unit -> Auto))
(func getMaxElementTwice () (times 2 (getMax (head sortedList) (tail sortedList))))