GitHub - doublec/cf: a simple concatenative programming language written in C++

doublec / cf Public
Notifications You must be signed in to change notification settings
Fork 2
Star 34
a simple concatenative programming language written in C++
www.bluishcoder.co.nz
Notifications
Name		Name	Last commit message	Last commit date
Latest commit History 209 Commits
gc @ 8b1c928		gc @ 8b1c928
.gitignore		.gitignore
.gitmodules		.gitmodules
CHANGELOG		CHANGELOG
README		README
bench.lcf		bench.lcf
cf.cpp		cf.cpp
cf.h		cf.h
cfbot.lcf		cfbot.lcf
factorial.lcf		factorial.lcf
leakmain.cpp		leakmain.cpp
literate.lcf		literate.lcf
main.cpp		main.cpp
makefile		makefile
prelude.cf		prelude.cf
repl.cf		repl.cf
socket.cpp		socket.cpp
socket.h		socket.h
test.lcf		test.lcf
testmain.cpp		testmain.cpp
threads.cpp		threads.cpp
threads.h		threads.h
Repository files navigation

cf programming language
=======================

cf is a simple concatenative programming language written in C++. It
is based on the languages XY [1] and F [2] written by Stevan Apter. 

It originally started out as an implementation of F, written in C,
hence the name 'cf'. I later moved to C++ and changed things a bit
to try out some ideas.

This isn't really intended to be a usable language in itself. The code
is meant to be hacked and modified to add features and make it easy
to try out concatenative language ideas.

[1] http://nsl.com/k/xy/xy.htm
[2] http://nsl.com/k/f/f.htm

License
=======
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
   this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
DEVELOPERS AND CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Building
======== 
You'll need the following:

1) A C++ compiler. I used gccc 4.3.3 to build it and it worked fine.
2) libgmp [3] for 'big numbers'
3) Boost [4] for various library features I use.
4) gc C++ garbage collector [5]

'cf' uses 'gc', a C++ garbage collector. The git repository for 'cf'
has 'gc' included as a submodule. When first getting the 'cf' source
code via git you should do the following:

  $ git submodule init
  $ git submodule update

To build 'cf' run 'make'. This produces 'cf', 'testcf' and 'leakcf'
executables.  Run 'cf' to get a simple command line repl and 'testcf'
to run some tests to see if things are working. 'leakcf' will run,
load some test files, execute a garbage collection and exit. It's
useful for testing for leaks under valgrind.

[3] http://gmplib.org/
[4] http://www.boost.org/
[5] http://github.com/doublec/gc

Running
=======
Run the 'cf' command. It optionally taks a list of filenames as
arguments. These files are loaded, in order, and evaluated. For 
example, 'prelude.cf' provides some standard library functions.
I recommend using rlwrap, or similar, to get command line editing:

  $ rlwrap ./cf prelude.cf
    Loading prelude.cf
   ->
  ok 

Data entered at the 'ok' prompt is pushed onto the stack, and pressing enter
evaluates the program:

  ok  25 5 + fac.
  265252859812191058636308480000000  ->
  ok

'fac' is the factorial function implemented in prelude.cf.

Quick Overview
==============
I'll add more detailed information here later, for now reading the
information in [1] and [2] should enable a quick overview.

Basically the running system is composed of an environment, E, a
stack X, and a queue Y. A 'program' is a series of functions and
literals entered at the prompt as above.

When a program is evaluated, each item is prepended to the queue Y.
The system then runs, taking the item at the front of the queue and
evaluating it. 

  Literals (Numbers, Strings, Symbols) are immediately pushed onto the
  stack X.
  Primitives and shuffle patterns are executed immediately
  and the results can affect E, X or Y.

Primitives are the only symbols that are immediately executed. Symbols
can be set to values and the value is obtained using the ';' primitive.
If this value is a list it can then be further evaluated as a program using
the '.' primitive. If '.' is used on a symbol then it's value will be
obtained (if it has one) and '.' called again on that value. This
makes the use of ';' optional in a lot of common cases.

Comments are anything between ** and **.
For example:

  ok [1 2 +] myfunc set   ** set the 'myfunc' symbol to the list 1,2 and + **
  ok myfunc               ** push the 'myfunc' symbol on the stack         **
  ok ;                    ** get the value of it ([1 2 +])                 **
  ok .                    ** execute the list as a program                 **
  ok println              ** print the result                              **
   => 3

Literate Files
==============
A form of literate programming is supported, based on that of Literate
Haskell: http://haskell.org/haskellwiki/Literate_programming

Any file that ends with the extension .lcf is a literate cf file. All
lines starting with '>' are treated as code, everything else is a
comment. For example:

> "The sum of 1 + 2 is " write 
> 1 2 + .

To facilitate use in LaTeX documents, all code between \begin{code}
and \end{code} are also treated as code:

\begin{code}
"This is evaluated as code" .
\end{code}

See the examples in literate.lcf. They can be run with:

$ ./cf prelude.cf literate.lcf

Primitives
==========
Look up cf.cpp in the XY() constructor for the list of primitives. Currently
they are:

+        - Addition
-        - Subtraction
*        - Multiplication
%        - Division
^        - Power
_        - Floor
set      - set a symbol to a value
;        - get the value of a symbol
.        - Execute a program on the stack
)        - Process a pattern and store the result on the stack X
(        - Process a pattern and prepend the result to the queue Y
`        - dip operator
|        - Reverse a list
'        - quote 
$        - stack operator
$$       - stackqueue operator
=        - equals
<        - less than
<=       - less than or equal to
>        - greater than
>=       - greater than or equal to
not      - if true, return false and vice versa.
@        - retrieve nth item from a list (2 [1 2 3 4] @ => 3)
.        - print the topmost item on the stack
print    - as '.' but don't do a newline
write    - write the topmost item on the stack
count    - return the length of the list
tokenize - given a string, return a list of the cf tokens.
parse    - given a list of tokens, return a list of cf objects
getline  - get a line of input from the user
millis   - returns number of milliseconds since 1970/1/1.
enum     - given a number, returns a list of elements from 0 to n-1.
clone    - creates a copy of the object on the stack
to-string - leaves a string representation of the object on the stack
split     - ( string seperators -- seq ) splits a string
sdrop     - ( seq n -- seq ) removes n items from the sequence
stake     - ( seq n -- seq ) returns a sequence with the first n elements
foldl     - ( seq seed q -- seq ) left fold 
foldr     - ( seq seed q -- seq ) right fold 
if        - ( bool then else -- )
?         - ( seq elt -- index ) find
gc        - ( -- ) Perform garbage collection

Numbers can be floats or integers. Integers can be of any length. For example:

  ok 1000 fac. println
  ...a really big number...

Shuffle patterns can be used to move things around on the stack. These are
templates for stack operations:

  ok 1 2 3 abc-cba 
   => 3 2 1

There is Pattern support as per F [2]. Patterns, ( and ) operate the same as
in F as far as I know:

  ok [1 2 [3 4 5]] [ [a b [c C]] c C [a b] ] )
   => 3 [4 5] [1 2]

$ works the same as in F [2]. It takes a program on the stack. That program
will be evaluated with X and Y on the stack. It should return a new X and Y.
This powerful operator can be used to do anything, including modify the future
of the computation - it can implement partial and full continuations.

$$ is a helper function for $. It takes and X and Y on the stack and replaces
X and Y with these.

` is dip. For example:

  Joy/Factor: 1 2 3 [ - ] dip => -1 3
          cf: 1 2 3 [-]`

' is the quote operator. It takes the next item on the Y queue, puts it into a
one elements list and pushes it on the X stack.  

  1 2 '+ 
   => 1 2 [+]

, is the join operator. It appends things to lists. Given two lists they are
appended. Given two non-lists a two element list is made containing them. 
Given a list and a non-list, the non-list is added to the list. eg:

  [1 2] [3 4], => [1 2 3 4]
  1 [2 3 4],   => [1 2 3 4]
  [1 2 3] 4,   => [1 2 3 4]
  1 2,         => [1 2]

Non-Primitives
==============
See 'prelude.cf' for the current non-primitives. Some of these include:

dup        - 1 dup.           => 1 1
drop       - 1 2 drop.        => 1
swap       - 1 2 swap.        => 2 1 
uncons     - [1 2 3] uncons.  => 1 [2 3]
cons       - 1 [2 3] cons.    => [1 2 3]
first      - [1 2 3] first.   => 1
rest       - [1 2 3] rest.    => [2 3]
clear      - Clears the X and Y stacks. Useful at the REPL.
queue      - Move the top stack item to the tail of the queue. >r in forth/factor
unqueue    - Move the tail of the queue to the top of the stack. r> in forth/factor
stack      - push a list cotaining stack contents on the stack
unstack    - replace the stack with the contents of the list on the top of the stack
cond       - conditional: 5 [0 >] [ drop. "is true" ] [drop. "is false"] cond.
while      - 1 [10 <] [1 + a-aa .] while.
do         - 10 [ "hello" println ] do.
map        - [1 2 3] [1 +] map. => [2 3 4]
each       - [1 2 3] [println] each.
fold       - [1 2 3] 0 [+] fold. => 6
unfold     - 1 [10 >] [] [1 +] unfold => [1 2 3 4 5 6 7 8 9 10]
cleave     - 1 [1 +] [2 +] cleave => 2 3
time       - [100 fac.drop.] time. => 7
fac        - 5 fac. => 120
fac2       - same as 'fac' but uses a fold/unfold and is much slower.
match      - ( regexp string -- seq ) 
             matches string against the regular expression, returning
             a sequence of matches. 
REPL
====
repl.cf contains a tiny repl implemented in cf. It calls the 'tokenize' and
'parse' primitives to evaluate the user input and prints the X and Y stack/queue
after each line. Note that the Y queue will show the 'remaining computation'
of the repl itself. Run with:

  $ ./cf prelude.cf repl.cf
  ok repl.
  Y: prompt . getline [ "bye" = ] [ drop . ] [ tokenize parse . repl . ] cond . 
  X: 
  repl> 

Exit using 'bye'.

Other Stuff
===========
A bunch of useful primitives remain to be written. I want to implement
many of them in the language itself.

I find needing to do the ';' after functions to get their value a bit 
annoying but I'm interested in seeing what can be done with this. This
idea comes from the False language mentioned in the F documentation.

Combinators are generally written by building lists of programs to do
the work. For example, the definition of 'each' from the prelude:

  [[[]]`swap;unit.`,uncons;unit.`,while.drop.] each set

This builds a while expression by taking the definition of 'uncons'
and 'swap' and prefixing them to the program given by the user. So
the translation looks like:

  [1 2 3] [println] each. => [1 2 3] [] [uncons.swap.] while.

'fold' works similar it translates like this:

  [1 2 3] 0 [+] fold => [0 1 2 3 + + + ].

Experimental Object System
==========================
cf includes an experimental protoype based object system. This is
still being designed and written so it is likely to change quite
a bit. The object system is based on that in the Self programming
language.

cf objects can have named slots. Each slot refernces another cf
object. Messages are symbols. The value of a slot is obtained
using the 'lookup' primitive: 

  lookup ( object message -- object list )

The result of lookup is the original object and a list containing
code that when unquoted will send the message to the object. The
list can be unquoted with '.' as normal. So 'lookup .' is to objects
as 'foo ;.' is to standard cf lookup and execution of functions.

Slots can be data or method slots. A data slot holds an initial value
and when a message is sent to it retrieves the value. Data slots are
added with 'add-slot':

  add-slot ( object value name -- object )

  object copy 42 foo add-data foo lookup .
    => 42
  object copy 42 foo add-data 25 over. foo: lookup . foo lookup .
    => 25

There is also an 'add-ro-slot' which adds a read only slot. No setter will
be added for this type of slot.

Parent slots have a name ending in '*'. If a message is not found when
looked up in an object then all that objects parent slots are also searched
for the message. This is how the prototype object system handles inheritance:

  object copy 42 foo add-slot
  object copy swap. parent* add-slot
  foo lookup .
    => 42

Method slots take a list representing the program to run when the
message is sent. When a message is sent that results in a method slot a
frame object is created. This frame object has a parent slot called 'self'
that points to the original object holding the method. Slot values of this
original object can be retrieved by accessing this frame object.

  add-method ( object args code name -- object )
  frame ( -- object )

  object copy [] [ frame message lookup . print ] display add-method
  object copy "hello" message add-slot
  swap. parent* add-slot
  display lookup .
    => "hello"

There is an 'object' primitive that returns the prototypical object. The
'copy' primitive returns a shallow copy of an object.

  object ( -- object )
  copy ( object -- object' )

The get and unquote primitives have been overloaded to work with the object
system. If ';' is called on a symbol, and that symbol is not found in the
environment, then it will be looked up in an object on the stack using
'lookup':

  object copy 25 foo add-slot foo;.
    => 25

If '.' is called on a symbol, and that symbol is not found in the 
environment, then it will be looked up in the frame object using
'lookup'. This makes '.' similar to Self's implicit self sends:

  object copy 25 foo add-slot
  [] [ foo. println ] print-foo add-method
  print-foo;.
    => 25

This works with arguments to methods too:

  object copy 
  [msg] [ msg. println ] print-msg add-method
  "hello" swap. print-msg;.
    => "hello"

Troubleshooting
===============
Error handling started of as being C++ assertions so a lot of
error handling still aborts with a fatal error. 

Most errors that occur as a result of interpreting (programming
errors, stack underflows, type errors, etc) cause the interpreter drop
back to the repl with an error object on the stack. The error object
has the following elements:

  1: The 'error' symbol
  2. The error message
  3. The stack at the time of the error
  4. The queue at the time of the error

From the repl you can manually examine the stacks, modify them
and restart execution. The following can be used to restart
given the stack and queue:

  ok [ ab- ] $

Why?
====
Why write this when concatenative languages like Factor exist
that are full featured with lots of libraries?

I wanted something small to play around with different ideas. Like
a programming puzzle to find different implementations of prelude
functions in different styles.

I wanted a C/C++ program that used small amounts of memory so I
could run it on embedded devices like a phone.

Feedback
========
I host this on githib:
  http://github.com/doublec/cf/tree/master

Feel free to clone, hack away, suggest patches, etc. I can be reached
via email: chris.double@double.co.nz