a simple concatenative programming language written in C++
Latest commit 71a979c Aug 2, 2009 @doublec Handle \r\n in strings
Failed to load latest commit information.
gc @ 8b1c928 Fix leak due to using frame copy constructor Jul 29, 2009
.gitignore Convert test.lcf to use latex literate syntax Jul 31, 2009
.gitmodules Split out gc implementation into an external project May 30, 2009
CHANGELOG Add details of object system in README Jul 29, 2009
README Use regexps to parse irc message in cfbot Jul 31, 2009
bench.lcf Benchmark fixes May 21, 2009
cf.cpp Handle \r\n in strings Aug 2, 2009
cf.h Add primitive for adding methods with arguments Jul 30, 2009
cfbot.lcf Handle printing of empty stack in cfbot Jul 31, 2009
leakmain.cpp Add factorial to leaktest and remove compiler warnings Jul 30, 2009
literate.lcf Rename printnl to println Apr 19, 2009
main.cpp Implement Mark and Sweep GC May 20, 2009
makefile Split out gc implementation into an external project May 30, 2009
prelude.cf Use regexps to parse irc message in cfbot Jul 31, 2009
repl.cf Major primitive word change. unquote '!' is now . Apr 17, 2009
socket.cpp Add socket-readn Aug 2, 2009
socket.h Convert from reference counting to garbage collection May 12, 2009
test.lcf Convert test.lcf to use latex literate syntax Jul 31, 2009
testmain.cpp XYObject is no longer pure virtual Jul 29, 2009
threads.cpp thread-resume wasn't popping sequence off stack Jul 31, 2009
threads.h Convert from reference counting to garbage collection May 12, 2009


cf programming language

cf is a simple concatenative programming language written in C++. It
is based on the languages XY [1] and F [2] written by Stevan Apter. 

It originally started out as an implementation of F, written in C,
hence the name 'cf'. I later moved to C++ and changed things a bit
to try out some ideas.

This isn't really intended to be a usable language in itself. The code
is meant to be hacked and modified to add features and make it easy
to try out concatenative language ideas.

[1] http://nsl.com/k/xy/xy.htm
[2] http://nsl.com/k/f/f.htm

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
   this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.


You'll need the following:

1) A C++ compiler. I used gccc 4.3.3 to build it and it worked fine.
2) libgmp [3] for 'big numbers'
3) Boost [4] for various library features I use.
4) gc C++ garbage collector [5]

'cf' uses 'gc', a C++ garbage collector. The git repository for 'cf'
has 'gc' included as a submodule. When first getting the 'cf' source
code via git you should do the following:

  $ git submodule init
  $ git submodule update

To build 'cf' run 'make'. This produces 'cf', 'testcf' and 'leakcf'
executables.  Run 'cf' to get a simple command line repl and 'testcf'
to run some tests to see if things are working. 'leakcf' will run,
load some test files, execute a garbage collection and exit. It's
useful for testing for leaks under valgrind.

[3] http://gmplib.org/
[4] http://www.boost.org/
[5] http://github.com/doublec/gc

Run the 'cf' command. It optionally taks a list of filenames as
arguments. These files are loaded, in order, and evaluated. For 
example, 'prelude.cf' provides some standard library functions.
I recommend using rlwrap, or similar, to get command line editing:

  $ rlwrap ./cf prelude.cf
    Loading prelude.cf

Data entered at the 'ok' prompt is pushed onto the stack, and pressing enter
evaluates the program:

  ok  25 5 + fac.
  265252859812191058636308480000000  ->

'fac' is the factorial function implemented in prelude.cf.

Quick Overview
I'll add more detailed information here later, for now reading the
information in [1] and [2] should enable a quick overview.

Basically the running system is composed of an environment, E, a
stack X, and a queue Y. A 'program' is a series of functions and
literals entered at the prompt as above.

When a program is evaluated, each item is prepended to the queue Y.
The system then runs, taking the item at the front of the queue and
evaluating it. 

  Literals (Numbers, Strings, Symbols) are immediately pushed onto the
  stack X.
  Primitives and shuffle patterns are executed immediately
  and the results can affect E, X or Y.

Primitives are the only symbols that are immediately executed. Symbols
can be set to values and the value is obtained using the ';' primitive.
If this value is a list it can then be further evaluated as a program using
the '.' primitive. If '.' is used on a symbol then it's value will be
obtained (if it has one) and '.' called again on that value. This
makes the use of ';' optional in a lot of common cases.

Comments are anything between ** and **.
For example:

  ok [1 2 +] myfunc set   ** set the 'myfunc' symbol to the list 1,2 and + **
  ok myfunc               ** push the 'myfunc' symbol on the stack         **
  ok ;                    ** get the value of it ([1 2 +])                 **
  ok .                    ** execute the list as a program                 **
  ok println              ** print the result                              **
   => 3

Literate Files
A form of literate programming is supported, based on that of Literate
Haskell: http://haskell.org/haskellwiki/Literate_programming

Any file that ends with the extension .lcf is a literate cf file. All
lines starting with '>' are treated as code, everything else is a
comment. For example:

> "The sum of 1 + 2 is " write 
> 1 2 + .

To facilitate use in LaTeX documents, all code between \begin{code}
and \end{code} are also treated as code:

"This is evaluated as code" .

See the examples in literate.lcf. They can be run with:

$ ./cf prelude.cf literate.lcf

Look up cf.cpp in the XY() constructor for the list of primitives. Currently
they are:

+        - Addition
-        - Subtraction
*        - Multiplication
%        - Division
^        - Power
_        - Floor
set      - set a symbol to a value
;        - get the value of a symbol
.        - Execute a program on the stack
)        - Process a pattern and store the result on the stack X
(        - Process a pattern and prepend the result to the queue Y
`        - dip operator
|        - Reverse a list
'        - quote 
$        - stack operator
$$       - stackqueue operator
=        - equals
<        - less than
<=       - less than or equal to
>        - greater than
>=       - greater than or equal to
not      - if true, return false and vice versa.
@        - retrieve nth item from a list (2 [1 2 3 4] @ => 3)
.        - print the topmost item on the stack
print    - as '.' but don't do a newline
write    - write the topmost item on the stack
count    - return the length of the list
tokenize - given a string, return a list of the cf tokens.
parse    - given a list of tokens, return a list of cf objects
getline  - get a line of input from the user
millis   - returns number of milliseconds since 1970/1/1.
enum     - given a number, returns a list of elements from 0 to n-1.
clone    - creates a copy of the object on the stack
to-string - leaves a string representation of the object on the stack
split     - ( string seperators -- seq ) splits a string
sdrop     - ( seq n -- seq ) removes n items from the sequence
stake     - ( seq n -- seq ) returns a sequence with the first n elements
foldl     - ( seq seed q -- seq ) left fold 
foldr     - ( seq seed q -- seq ) right fold 
if        - ( bool then else -- )
?         - ( seq elt -- index ) find
gc        - ( -- ) Perform garbage collection

Numbers can be floats or integers. Integers can be of any length. For example:

  ok 1000 fac. println
  ...a really big number...

Shuffle patterns can be used to move things around on the stack. These are
templates for stack operations:

  ok 1 2 3 abc-cba 
   => 3 2 1

There is Pattern support as per F [2]. Patterns, ( and ) operate the same as
in F as far as I know:

  ok [1 2 [3 4 5]] [ [a b [c C]] c C [a b] ] )
   => 3 [4 5] [1 2]

$ works the same as in F [2]. It takes a program on the stack. That program
will be evaluated with X and Y on the stack. It should return a new X and Y.
This powerful operator can be used to do anything, including modify the future
of the computation - it can implement partial and full continuations.

$$ is a helper function for $. It takes and X and Y on the stack and replaces
X and Y with these.

` is dip. For example:

  Joy/Factor: 1 2 3 [ - ] dip => -1 3
          cf: 1 2 3 [-]`

' is the quote operator. It takes the next item on the Y queue, puts it into a
one elements list and pushes it on the X stack.  

  1 2 '+ 
   => 1 2 [+]

, is the join operator. It appends things to lists. Given two lists they are
appended. Given two non-lists a two element list is made containing them. 
Given a list and a non-list, the non-list is added to the list. eg:

  [1 2] [3 4], => [1 2 3 4]
  1 [2 3 4],   => [1 2 3 4]
  [1 2 3] 4,   => [1 2 3 4]
  1 2,         => [1 2]

See 'prelude.cf' for the current non-primitives. Some of these include:

dup        - 1 dup.           => 1 1
drop       - 1 2 drop.        => 1
swap       - 1 2 swap.        => 2 1 
uncons     - [1 2 3] uncons.  => 1 [2 3]
cons       - 1 [2 3] cons.    => [1 2 3]
first      - [1 2 3] first.   => 1
rest       - [1 2 3] rest.    => [2 3]
clear      - Clears the X and Y stacks. Useful at the REPL.
queue      - Move the top stack item to the tail of the queue. >r in forth/factor
unqueue    - Move the tail of the queue to the top of the stack. r> in forth/factor
stack      - push a list cotaining stack contents on the stack
unstack    - replace the stack with the contents of the list on the top of the stack
cond       - conditional: 5 [0 >] [ drop. "is true" ] [drop. "is false"] cond.
while      - 1 [10 <] [1 + a-aa .] while.
do         - 10 [ "hello" println ] do.
map        - [1 2 3] [1 +] map. => [2 3 4]
each       - [1 2 3] [println] each.
fold       - [1 2 3] 0 [+] fold. => 6
unfold     - 1 [10 >] [] [1 +] unfold => [1 2 3 4 5 6 7 8 9 10]
cleave     - 1 [1 +] [2 +] cleave => 2 3
time       - [100 fac.drop.] time. => 7
fac        - 5 fac. => 120
fac2       - same as 'fac' but uses a fold/unfold and is much slower.
match      - ( regexp string -- seq ) 
             matches string against the regular expression, returning
             a sequence of matches. 
repl.cf contains a tiny repl implemented in cf. It calls the 'tokenize' and
'parse' primitives to evaluate the user input and prints the X and Y stack/queue
after each line. Note that the Y queue will show the 'remaining computation'
of the repl itself. Run with:

  $ ./cf prelude.cf repl.cf
  ok repl.
  Y: prompt . getline [ "bye" = ] [ drop . ] [ tokenize parse . repl . ] cond . 

Exit using 'bye'.

Other Stuff
A bunch of useful primitives remain to be written. I want to implement
many of them in the language itself.

I find needing to do the ';' after functions to get their value a bit 
annoying but I'm interested in seeing what can be done with this. This
idea comes from the False language mentioned in the F documentation.

Combinators are generally written by building lists of programs to do
the work. For example, the definition of 'each' from the prelude:

  [[[]]`swap;unit.`,uncons;unit.`,while.drop.] each set

This builds a while expression by taking the definition of 'uncons'
and 'swap' and prefixing them to the program given by the user. So
the translation looks like:

  [1 2 3] [println] each. => [1 2 3] [] [uncons.swap.] while.

'fold' works similar it translates like this:

  [1 2 3] 0 [+] fold => [0 1 2 3 + + + ].

Experimental Object System
cf includes an experimental protoype based object system. This is
still being designed and written so it is likely to change quite
a bit. The object system is based on that in the Self programming

cf objects can have named slots. Each slot refernces another cf
object. Messages are symbols. The value of a slot is obtained
using the 'lookup' primitive: 

  lookup ( object message -- object list )

The result of lookup is the original object and a list containing
code that when unquoted will send the message to the object. The
list can be unquoted with '.' as normal. So 'lookup .' is to objects
as 'foo ;.' is to standard cf lookup and execution of functions.

Slots can be data or method slots. A data slot holds an initial value
and when a message is sent to it retrieves the value. Data slots are
added with 'add-slot':

  add-slot ( object value name -- object )

  object copy 42 foo add-data foo lookup .
    => 42
  object copy 42 foo add-data 25 over. foo: lookup . foo lookup .
    => 25

There is also an 'add-ro-slot' which adds a read only slot. No setter will
be added for this type of slot.

Parent slots have a name ending in '*'. If a message is not found when
looked up in an object then all that objects parent slots are also searched
for the message. This is how the prototype object system handles inheritance:

  object copy 42 foo add-slot
  object copy swap. parent* add-slot
  foo lookup .
    => 42

Method slots take a list representing the program to run when the
message is sent. When a message is sent that results in a method slot a
frame object is created. This frame object has a parent slot called 'self'
that points to the original object holding the method. Slot values of this
original object can be retrieved by accessing this frame object.

  add-method ( object args code name -- object )
  frame ( -- object )

  object copy [] [ frame message lookup . print ] display add-method
  object copy "hello" message add-slot
  swap. parent* add-slot
  display lookup .
    => "hello"

There is an 'object' primitive that returns the prototypical object. The
'copy' primitive returns a shallow copy of an object.

  object ( -- object )
  copy ( object -- object' )

The get and unquote primitives have been overloaded to work with the object
system. If ';' is called on a symbol, and that symbol is not found in the
environment, then it will be looked up in an object on the stack using

  object copy 25 foo add-slot foo;.
    => 25

If '.' is called on a symbol, and that symbol is not found in the 
environment, then it will be looked up in the frame object using
'lookup'. This makes '.' similar to Self's implicit self sends:

  object copy 25 foo add-slot
  [] [ foo. println ] print-foo add-method
    => 25

This works with arguments to methods too:

  object copy 
  [msg] [ msg. println ] print-msg add-method
  "hello" swap. print-msg;.
    => "hello"

Error handling started of as being C++ assertions so a lot of
error handling still aborts with a fatal error. 

Most errors that occur as a result of interpreting (programming
errors, stack underflows, type errors, etc) cause the interpreter drop
back to the repl with an error object on the stack. The error object
has the following elements:

  1: The 'error' symbol
  2. The error message
  3. The stack at the time of the error
  4. The queue at the time of the error

From the repl you can manually examine the stacks, modify them
and restart execution. The following can be used to restart
given the stack and queue:

  ok [ ab- ] $

Why write this when concatenative languages like Factor exist
that are full featured with lots of libraries?

I wanted something small to play around with different ideas. Like
a programming puzzle to find different implementations of prelude
functions in different styles.

I wanted a C/C++ program that used small amounts of memory so I
could run it on embedded devices like a phone.

I host this on githib:

Feel free to clone, hack away, suggest patches, etc. I can be reached
via email: chris.double@double.co.nz