Skip to content

A simplistic stack-based programming language inspired by Factor with a compiler or interpreter (haven't decided yet) written in rust

Notifications You must be signed in to change notification settings

Will-Banksy/tower

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

33 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

tower

A simplistic stack-based programming language inspired by Factor with a compiler or interpreter (haven't decided yet) written in LLVM IR Rust.

Why not LLVM IR anymore? Cause I feel I've got what I wanted from writing it, and it was just tedious from there - additionally, I started caring about the result of the project. I made more progress in Rust in a few hours than I did in LLVM IR in a few days.

Extremely WIP, the lexer has not been finished yet, and the language design isn't fully decided upon either. This README serves as its documentation and discussion.

TODO: Update this README

Concepts

In tower, all operations operate on a stack of values that is shared throughout the whole program.

Functions do not have parameters, since they operate directly on the stack, and so any arguments you want to pass to a function have to be pushed onto the stack. This does mean that the responsibility of ensuring functions are provided all required arguments falls upon the programmer.

Currently being worked on though is a system of checking stack effects, where the exact inputs and outputs of every function are checked to make sure they are correct, much like type checking in most languages - this really requires generics and as a dependency of that, traits/interfaces, which are also planned. Further along, a method of type erasure will be made where the constraints of the stack effect checking is too restrictive.

Syntax

Top level tower code is currently composed entirely of comments and the arguably more exciting function declarations. Since all functions operate on a stack rather than taking arguments and returning values, tower code is essentially just postfix notation.

UTF-8 is fully supported, so that identifiers can be constructed out of any unicode characters except whitespace, or in ways that would identify it as another token type such as a numeric literal. E.g. fn ✨ = "sparkle" println ; is valid but fn "sparkle" = "✨" println ; is not.

Of note is that right now, all tokens are whitespace delimited. E.g. fn main = 1 println ; is valid but fn main=1 println; is not - main=1 will be parsed as a single identifier, as will println; and due to the lack of the = and ; tokens the function declaration will not make it through parsing.

Comments

Comments begin with #, and end at the end of the line, bash style. This also allows shebangs.

(Named) Functions

Function declarations take the form fn function_name = followed by the function body, and terminated with a ;.

The function body is a list of literals, words (function calls), and keywords. Literals are like an instruction to push a value on to the stack - E.g. fn hello = "hello" ; is a function that simply pushes "hello" onto the stack and then returns. Words are simply the name of a function, which when execution reaches it that function is called - E.g. fn hello = print_hello ; is a function that simply calls the print_hello function then returns. The only keywords that can appear in a function body are the { } keywords, denoting the start and end of an anonymous function.

Anonymous Functions

Anonymous functions are, as their name suggests, functions without a name. They are declared inline inside a function body between a pair of curly brackets { }. Anything that goes inside a normal/named function body can go inside an anonymous function body.

At parsing, anonymous functions are extracted into a named function with a unique name, and the original declaration turned into a fnptr literal to that uniquely named function, so when execution reaches an anonymous function declaration, a fnptr to the extracted anonymous function is pushed onto the stack.

Anonymous functions are extremely useful for control flow, as they remove the need to declare a named function for each control flow target.

Example:

fn main = { "inside an anonymous function" println } dup println call

Output:

&anon_<unique sequence of numbers and dashes>
inside an anonymous function

Types

The following types are (or will be) supported:

Type name Description Literal example(s)
str UTF-8 string "hello", "✨"
bool Boolean true, false
i64 Signed 64-bit integer -19725, 1, 64i
u64 Unsigned 64-bit integer 741u
f64 64-bit floating-point number 6.9, 7f
fnptr Function pointer &function_name

TODO: Support different integer sizes, and rethink how data is stored on the stack

TODO: Also support Algebraic Data Types (ADTs), where the type would be e.g. bool | i64 and to deconstruct it you use match keyword/instruction, which has stack effect e.g. bool | i64 -> T, where T is the type returned from the anon fns passed to match (there may be multiple things returned... how might the stack effect reflect that...? Maybe it just copies the output of the anon fns as it's stack effect... be nice if I could do it in a less special case way tho). Also as we might need named variants, can do something like None | Some(i64)

String Literals

String literals are enclosed in "" and are UTF-8 compatible. All whitespace in the string literal is included in the final string. Escape sequences similar to in other languages are supported.

Supported escape sequences:

  • \n - newline
  • \r - carriage return
  • \t - tab
  • \0 - null character
  • \\ - single backslash
  • \u{######} - specified unicode code point, in hex, up to 6 digits

Control Flow

Branching is done by using the instructions if and ifelse with fnptrs. Usually, the fnptrs will be anonymous functions.

The if instruction expects a fnptr and bool on the stack, and if the bool is true then the fnptr is called, otherwise nothing happens.

Example usage:

fn main = true { "hello world" println } if

Output:

hello world

The ifelse instruction expects two fnptrs and a bool on the stack, and if the bool is true then the first fnptr is called, otherwise the second is.

Example usage:

fn main = 1 2 eq { "equal" } { "not equal" } ifelse

Output:

not equal

Loops are done using the while instruction, which expects a fnptr and a bool on the stack, much like if. while pops the two arguments off the stack, and if the bool is true, the fnptr gets called. After that, while expects another bool on the stack, and if that is true, the fnptr gets called again.

Example usage:

fn main = 0 true { 1 add dup dup print 10 ne " " print } while

Output:

1 2 3 4 5 6 7 8 9 10

Standard Library

No standard library has been finalised yet. Currently, there are a few functions such as print, dup, add, etc. that are implemented in the interpreter, known internally as instructions.

Ideally I'd like to have a way to define the standard library and whole language inside of itself, but most basic operations such as arithmetic and stack manipulation will for now need to be defined in the compiler/interpreter (function argument binding would be a way to move stack manipulation to be writable in pure tower). C/rust interop and low level functionality (inline IR?) might be a way forward, if I can think of a way to get that working.

Dev Notes

C Interop Notes

Current thinking is to in tower code declare an interface with syntax like below:

dynlib "libc.so" decl malloc/i32 -> i8*
dynlib "libc.so" decl fseek/i8* i32 i32 -> i32

or

extern "libc.so" fn malloc/i32 -> i8*
extern "libc.so" fn fseek/i8* i32 i32 -> i32

(The syntax is very much not finalised, the syntax for the function parameters should also probably be integrated into normal tower functions e.g. fn main/i32 i8** -> i32 = but maybe take more inspiration from factor's stack effect declarations like fn shuffle/x y -> y x = y x (some functions can be untyped? generics? and how is this enforced? This is getting a bit complicated. Maybe just leave it as returning types for now rather than whole stack effect declarations))

And then dynamically load the library with dlopen and call functions with dlsym on posix, LoadLibrary and GetProcAddress on windows (meaning I have to link to the dl library on posix, think on windows should be good as kernel32.dll should be linked automatically) That's if I write an interpreter, a compiler to LLVM IR would be able to avoid that. But then I have to write a compiler.

Syntax Extensibility

In a similar vein to Factor, perhaps one could define new ways of parsing tower code, like creating new literals... Or simply macros... I can't think of a way of doing this right now, but the keyword syn defined like a function might be good.

About

A simplistic stack-based programming language inspired by Factor with a compiler or interpreter (haven't decided yet) written in rust

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published