This free software project has its source code on https://github.com/RefPerSys/RefPerSys, with an obsolete variant on https://gitlab.com/bstarynk/refpersys/ which is no longer maintained (as of 2024). Its own web site is http://refpersys.org/ where more details are given.
RefPerSys is aiming to become a free software symbolic artificial intelligence system or inference engine, e.g. a better alternative to CLIPSrules.
Contributions to RefPerSys are welcome. Contact by email Basile STARYNKEVITCH (near Paris, France) at basile@starynkevitch.net and b.starynkevitch@gmail.com, or Abhishek Chakravarti (Kolkata, India) at chakravarti.avishek@gmail.com.
Basile can also be contacted by snail mail (8 rue de la Faïencerie, 92340 Bourg-la-Reine, France) or (during French office hours) by WhatsApp at +33 6 8501 followed by the four digits of the product of seven and 337.
There used to be several variants of RefPerSys. But unless we get help (including funding) and contributors, we focus on RefPerSys in C++ since commit 559ea329f46a (2024, Nov, 20).
HorizonEurope or ITEA consortia interested in RefPerSys are welcome to contact us. Likewise for students, academics, or passionate open source developers wanting to contribute.
The Reflective Persistent System language is a research project, taking many good ideas from Bismon and sharing a lot of goals with it (except static source code analysis), but avoiding its bad ideas.
For Linux/x86-64 only, unless someone provides us ssh remote access to some other 64-bit Linux system. Don't even think of running it on non-Linux systems, unless you provide patches for that. And we need a 64-bit processor. If you can give us ssh access to non-x86-64 64-bit Linux machines (multi-core, at least 32 Gbytes of RAM, at least 128 Gbytes of disk space, with GCC installed for C and C++), please contact us.
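As a minimal sketch of these platform requirements (assuming a C++17 compiler on Linux), a build could refuse non-Linux or non-64-bit targets at compile time and report the processor name at run time through uname(2). The toy_* names are hypothetical and not part of RefPerSys.

```cpp
#include <cassert>
#include <string>
#include <sys/utsname.h>

// Compile-time guards: refuse to build on non-Linux or non-64-bit targets.
#ifndef __linux__
#error "this sketch assumes a Linux system"
#endif
static_assert(sizeof(void*) == 8, "a 64-bit processor is needed");

// Hypothetical helper: machine name reported by uname(2), e.g. "x86_64",
// or an empty string on failure.
std::string toy_machine_name() {
  struct utsname un;
  if (uname(&un) != 0)
    return "";
  return un.machine;
}

// True on Linux/x86-64, the only platform currently targeted.
bool toy_is_primary_platform() { return toy_machine_name() == "x86_64"; }
```

On another 64-bit Linux machine the sketch still compiles, but toy_is_primary_platform() returns false.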
We have multi-threading in mind, but in some limited way. We think of a pool of a few dozen Pthreads at most (but not of a thousand Pthreads).
We absolutely want to avoid any GIL (global interpreter lock).
Don't expect anything useful from RefPerSys before at least 2024. But you could have fun sharing our ideas and experimenting with yours.
A rewrite of RefPerSys in C was attempted in refpersys-in-c but was abandoned in commit 559ea329f46a28 (nov. 2024).
We previously considered using the garbage collector from Ravenbrook MPS.
Don't expect RefPerSys to be a mature project. It is not, as of October 2024.
Your $EDITOR should give a valid editor path (e.g. /usr/bin/emacsclient or /usr/bin/gedit or /usr/bin/vim ...).
Your $HOME should be valid.
Your $PATH should be valid and give access to your refpersys and (when relevant) lto-refpersys executables. One possibility is to have (or create) a $HOME/bin/ directory (appearing in your $PATH) containing symbolic links to them. Your $PATH should give access to GNU make as make (a simpler make utility, including BSD make, is not ok). The git version control command should be accessible, and likewise for other GNU utilities.
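The $PATH requirement can be checked with a small helper. This is a hypothetical sketch (toy_find_in_path is not a RefPerSys function) that walks a colon-separated PATH the way a POSIX shell does, treating an empty entry as the current directory. It only checks that an executable file is present; it does not check that make is really GNU make version 4 or newer.

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: search a colon-separated PATH for an executable regular
// file named cmd; return its full path, or "" when it is not found.
std::string toy_find_in_path(const std::string& cmd, const std::string& path) {
  std::string::size_type start = 0;
  while (start <= path.size()) {
    std::string::size_type colon = path.find(':', start);
    if (colon == std::string::npos)
      colon = path.size();
    std::string dir = path.substr(start, colon - start);
    if (dir.empty())
      dir = ".";  // an empty PATH entry means the current directory
    const std::string candidate = dir + "/" + cmd;
    struct stat st;
    if (stat(candidate.c_str(), &st) == 0 && S_ISREG(st.st_mode) &&
        access(candidate.c_str(), X_OK) == 0)
      return candidate;
    start = colon + 1;
  }
  return "";
}
```

For example, call toy_find_in_path("git", getenv("PATH")) after checking that getenv did not return a null pointer.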
The GNU make software should have version at least 4 and be configured to use GNU guile. See GNU make bug #66658.
Conventionally any RefPerSys specific environment variable (outside of the usual environment variables mentioned above) starts with REFPERSYS_. Some of them are set by the refpersys executable before forking any command:
The $REFPERSYS_TOPDIR should contain the path of the RefPerSys source directory. So if you ran some command like git clone https://github.com/RefPerSys/RefPerSys in your $HOME/work/ directory, you need to export REFPERSYS_TOPDIR=$HOME/work/RefPerSys (probably in one of your $HOME/.bashrc or $HOME/.zshrc or $HOME/.zshenv shell files).
When the refpersys executable is forking a process or running a command, the following environment variables have been set (by its rps_extend_env C++ function):
- REFPERSYS_TOPDIR to the top directory, e.g. $HOME/work/RefPerSys
- REFPERSYS_RUN_NAME to the run name (if given with --run-name=NAME)
- REFPERSYS_GITID to the complete git id (e.g. d65f8c47aed61d31454b3612a33a30a308660d31+). The + suffix indicates that your RefPerSys source code has been modified locally and is newer than what you are reading on this https://github.com/RefPerSys/RefPerSys webpage.
- REFPERSYS_PID to the process id, e.g. 1653963
- REFPERSYS_SHORTGITID to a shortened git id, e.g. d65f8c47aed6+
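The sketch below only illustrates the idea. It is not the actual rps_extend_env code, and toy_short_gitid and toy_extend_env are hypothetical names. It assumes the short git id keeps the first 12 hexadecimal digits plus the + suffix, as the two examples above suggest. It uses setenv(3) so that forked children inherit the variables.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <unistd.h>

// Hypothetical: keep the first 12 hex digits of the complete git id,
// preserving a trailing '+' (locally modified sources).
std::string toy_short_gitid(const std::string& gitid) {
  const bool modified = !gitid.empty() && gitid.back() == '+';
  const std::string hex = modified ? gitid.substr(0, gitid.size() - 1) : gitid;
  const std::string shortid = hex.substr(0, 12);
  return modified ? shortid + "+" : shortid;
}

// Hypothetical stand-in for what is done before forking: export the
// REFPERSYS_* variables so that child processes inherit them.
void toy_extend_env(const std::string& topdir, const std::string& runname,
                    const std::string& gitid) {
  setenv("REFPERSYS_TOPDIR", topdir.c_str(), 1);
  if (!runname.empty())  // only when given with --run-name=NAME
    setenv("REFPERSYS_RUN_NAME", runname.c_str(), 1);
  setenv("REFPERSYS_GITID", gitid.c_str(), 1);
  setenv("REFPERSYS_SHORTGITID", toy_short_gitid(gitid).c_str(), 1);
  setenv("REFPERSYS_PID", std::to_string(getpid()).c_str(), 1);
}
```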
Some draft design ideas are written in the RefPerSys design draft which is very incomplete work in progress.
If you happen to know about any research call for proposals or funding opportunities, e.g. through some HorizonEurope consortium in Europe (Euro zone), related to this (e.g. to artificial intelligence goals) and to open source, please mention them to Basile Starynkevitch (France) by email to basile@starynkevitch.net (personal email) or by snail mail; postcards are also welcome.
Like Bismon, RefPerSys manages an evolving, persistable heap of dynamically typed, garbage-collected values (see §2 Data and its persistence in Bismon of the Bismon draft report...). The semantics (but not the syntax) of values is on purpose close to those of Lisp, Python, Scheme, JavaScript, Go, or even Java, etc. Most of these RefPerSys values are immutable: for example boxed strings, sets (with dichotomic search inside them), tuples of references to objects, closures, etc. But some of these RefPerSys values are mutable objects, and by convention every mutable value is called an object. Each mutable object has its own lock, and any access or update of mutable data inside objects is generally made under its lock. As an exception, a very few, very often accessed, mutable fields inside objects (e.g. their class) are atomic pointers, for performance reasons. Objects have (exactly like in Bismon) attributes, components, and some optional payload. An attribute is an association between an object (called the key of that attribute) and some arbitrary non-nil RefPerSys value (called the value of that attribute), and each object has its mutable associative table of attributes. A component is an arbitrary RefPerSys value, and each object has a mutable vector of them. The payload is any additional mutable data (e.g. a string buffer, a mutable vector or hashtable of values, some class metadata, etc.), owned by the object. So the data model of a RefPerSys object is as flexible as the data model of JavaScript. However, RefPerSys objects have a mutable class defining their behavior (not their fields, which are represented as attributes), used for dynamic message dispatching.
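To make this data model concrete, here is a minimal C++17 sketch under loud assumptions. ToyObject, ToyValue and ToyPayload are hypothetical names, values are a std::variant, and objects are plain pointers instead of garbage-collected zones. It only shows the shape: an atomic class pointer read without locking, and a per-object mutex guarding the attribute table, the component vector and the owned payload.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <variant>
#include <vector>

struct ToyObject;

// A toy value: nil, an immutable boxed string, or a reference to an object.
using ToyValue =
    std::variant<std::monostate, std::shared_ptr<const std::string>, ToyObject*>;

// Base class for optional payloads (string buffers, hashtables, class metadata...).
struct ToyPayload {
  virtual ~ToyPayload() = default;
};

struct ToyObject {
  std::atomic<ToyObject*> klass{nullptr};  // often read, so atomic instead of locked
  mutable std::mutex mtx;                  // protects the three fields below
  std::map<ToyObject*, ToyValue> attrs;    // attribute key object -> non-nil value
  std::vector<ToyValue> comps;             // components
  std::unique_ptr<ToyPayload> payload;     // optional, owned by the object

  // Store an attribute; nil values are rejected, since attribute values are non-nil.
  bool put_attr(ToyObject* key, ToyValue val) {
    if (key == nullptr || std::holds_alternative<std::monostate>(val))
      return false;
    std::lock_guard<std::mutex> guard(mtx);
    attrs[key] = std::move(val);
    return true;
  }
  // Fetch an attribute, or nil when absent.
  ToyValue get_attr(ToyObject* key) const {
    std::lock_guard<std::mutex> guard(mtx);
    auto it = attrs.find(key);
    return it == attrs.end() ? ToyValue{} : it->second;
  }
  void append_comp(ToyValue val) {
    std::lock_guard<std::mutex> guard(mtx);
    comps.push_back(std::move(val));
  }
  std::size_t nb_comps() const {
    std::lock_guard<std::mutex> guard(mtx);
    return comps.size();
  }
};
```

A real implementation would also need immutable sets and tuples, closures, and hooks for the garbage collector, none of which appear here.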
See separate CODE-REPR.md markup file.
RefPerSys will have a small fixed set of worker threads (perhaps a dozen of them), each running some agenda loop. We would have some central data structure, called the agenda (like in Bismon, see §1.7 of the Bismon draft report...), organizing runnable tasklets (e.g. a few FIFO queues of them). A tasklet should conceptually run quickly (in a few milliseconds) and is allowed to add runnable tasklets to the agenda, or remove them from it (including itself). Each worker thread is looping: fetching a runnable tasklet from the agenda, then running that tasklet.
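The agenda loop can be sketched as follows. This is a hypothetical simplification (ToyAgenda is not RefPerSys code): one FIFO queue instead of a few, plain std::function tasklets, and only adding tasklets, not removing them. Each worker thread waits for a runnable tasklet, pops it under the agenda lock, and runs it outside that lock.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// A toy agenda: a single FIFO queue of runnable tasklets shared by worker threads.
class ToyAgenda {
public:
  using Tasklet = std::function<void(ToyAgenda&)>;

  // Add a runnable tasklet; may be called from inside a running tasklet.
  void add_tasklet(Tasklet t) {
    {
      std::lock_guard<std::mutex> guard(mtx_);
      fifo_.push_back(std::move(t));
    }
    cv_.notify_one();
  }

  // Ask the workers to exit once the queue is drained.
  void stop() {
    {
      std::lock_guard<std::mutex> guard(mtx_);
      stopping_ = true;
    }
    cv_.notify_all();
  }

  // The agenda loop of each worker thread: fetch a tasklet, run it, repeat.
  void worker_loop() {
    for (;;) {
      Tasklet t;
      {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return stopping_ || !fifo_.empty(); });
        if (fifo_.empty())
          return;  // stopping, and nothing left to run
        t = std::move(fifo_.front());
        fifo_.pop_front();
      }
      t(*this);  // run outside the lock, so the tasklet can add tasklets
    }
  }

private:
  std::mutex mtx_;
  std::condition_variable cv_;
  std::deque<Tasklet> fifo_;
  bool stopping_ = false;
};
```

Running each tasklet outside the lock matters: a tasklet adding follow-up work while the non-recursive agenda mutex was held would deadlock.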
This research project is GPLv3+ licensed and copyrighted by the RefPerSys team, currently made of:
- Basile Starynkevitch <basile@starynkevitch.net>, 8 rue de la Faïencerie 92340 Bourg-la-Reine France, homepage http://starynkevitch.net/Basile/, near Paris, France, so usual timezone `TZ=MEST`
- Abhishek Chakravarti <ack@fifthestate.co.in>, Kolkata, India
- Nimesh Neema <nimeshneema@gmail.com>
- Niklas Rosencrantz in Stockholm, Sweden.
Some files might be "borrowed" from other similar GPLv3+ licensed projects (notably from Bismon...) and could retain their original copyright owner.
Please ask the above RefPerSys team, by email, for C++ coding conventions before starting non-trivial contributions to the C++ runtime of RefPerSys. If you are contributing to its C++ runtime, please run make clean after any git pull. You generally should then run make config.
The GPLv3+ license of RefPerSys is unlikely to change before 2025 (and probably even after).
RefPerSys could be patched and extended to generate proprietary code or data. In 2023 some authors (including Basile Starynkevitch) were not interested in adding such features. Other authors (in India) are interested in adding them. Their contributions are pending (as of October 2024).
The RefPerSys runtime is implemented in C++17, with hand-written C++ code in *_rps.cc, and has a single C++ header file refpersys.hh.
We don't claim to be C++ gurus. Most C++ experts could write more genuine C++ code than we do and will find our C++ code pitiful. We just want our runtime to work, not to serve as an example of well written C++17 code.
The preferred C++ compiler for RefPerSys is GCC version 13 (in 2023Q2) or, preferably (in 2024Q3), GCC 14.
It could be worthwhile to sometimes compile RefPerSys with clang++ (see http://clang.llvm.org/ for more). In practice, run make clean then make RPS_BUILD_CXX=clang++. The Clang static analyzer could be useful, but expect a lot of warnings, since C++ doesn't have flexible array members but we need something similar.
RefPerSys may later also use generated C++ code in some _*.cc file, some generated C code in some _*.c file, and generated C or C++ headers in some _*.h files. By convention, files starting with an underscore are generated (but they may or may not be git versioned). Some generated C++ files which are git add-ed are under the generated/ subdirectory.
A RefPerSys generated C++ file should be generated from some RefPerSys object (its generator).
We could later need some C++ generating program (maybe similar in spirit to Bismon's BM_makeconst.cc). It would then be named rps_* for the executable, and fit in a single self-sufficient rps_*.cc C++ file. Perhaps we'll later have some rps_makeconst executable to generate some C++, with its source in some rps_makeconst.cc. So the convention is that any future C++ generating source code is in some rps_*.cc C++ file. In commit 65a8f84aeffc9ba4e468 or newer, the dumping facility scans hand-written C++ source files to emit generated/rps-constants.hh.
RefPerSys aims to become a homoiconic system: we hope to generate most of its C++ source code (under the generated/ subdirectory), and explicitly represent the generated code as objects.
A plugin is some shared object (some *.so ELF file) loaded by dlopen(3). The C++ code of a plugin is hand-written, with the hope of needing fewer and fewer of them (and having them replaced by automatically generated module code). We don't bother dlclose(3)-ing plugins.
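A minimal sketch of loading a plugin follows. It assumes the plugin exposes some symbol to look up; toy_load_plugin_symbol and the RTLD_NOW | RTLD_GLOBAL flags are illustrative assumptions, not RefPerSys code. As described above, the handle is deliberately never closed. On glibc older than 2.34, link with -ldl.

```cpp
#include <cassert>
#include <cstdio>
#include <dlfcn.h>
#include <string>

// Hypothetical plugin loader: dlopen(3) a shared object and look up one symbol.
// The handle is never dlclose(3)-d, following the convention above.
void* toy_load_plugin_symbol(const std::string& sopath, const char* symname) {
  void* handle = dlopen(sopath.c_str(), RTLD_NOW | RTLD_GLOBAL);
  if (!handle) {
    std::fprintf(stderr, "cannot load plugin %s: %s\n", sopath.c_str(), dlerror());
    return nullptr;
  }
  dlerror();  // clear any stale error before dlsym
  void* sym = dlsym(handle, symname);
  if (const char* err = dlerror()) {
    std::fprintf(stderr, "no symbol %s in %s: %s\n", symname, sopath.c_str(), err);
    return nullptr;
  }
  return sym;
}
```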
A binary module is a shared object whose C++ code is generated by RefPerSys at dump time. The generated C++ code is called the source module (unrelated to C++20 modules). It is conventionally named generated/_objid.cc (with objid standing for an object id). Binary modules are conventionally named generated/__rps_*.so; see in our GNUmakefile the lines around the comment generated binary modules.
See our C++ utility do-build-refpersys-plugin.cc to understand the compiling conventions of binary modules or plugins. In particular, if the first fifty (50) lines of a generated C++ file contain (probably inside a comment) @RPSCOMPILEFLAGS=, that is used to compile the plugin. If they contain @RPSLIBES=, that is used to link the plugin.
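The fifty-line convention could be implemented along these lines. This is a hypothetical sketch, not the code of do-build-refpersys-plugin.cc; ToyBuildFlags and toy_scan_build_flags are invented names, and it assumes the flags are whatever follows the marker up to the end of that line.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

struct ToyBuildFlags {
  std::string compileflags;  // text after @RPSCOMPILEFLAGS=
  std::string libes;         // text after @RPSLIBES=
};

// Scan only the first fifty lines of a generated C++ source for both markers.
// Assumption: the flags are the rest of the line following the marker.
ToyBuildFlags toy_scan_build_flags(std::istream& src) {
  static const std::string compmark = "@RPSCOMPILEFLAGS=";
  static const std::string libmark = "@RPSLIBES=";
  ToyBuildFlags flags;
  std::string line;
  for (int lineno = 1; lineno <= 50 && std::getline(src, line); lineno++) {
    if (auto pos = line.find(compmark); pos != std::string::npos)
      flags.compileflags = line.substr(pos + compmark.size());
    if (auto pos = line.find(libmark); pos != std::string::npos)
      flags.libes = line.substr(pos + libmark.size());
  }
  return flags;
}
```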
A transpiler for RefPerSys is being developed on https://github.com/bstarynk/misc-basile in files transpiler-refpersys.{cc,hh}.
The build automation tool used here is GNU make since commit 6d56f50660c7cc41b9 (it was omake before).
The GNU lightning library is needed for machine code generation. You may want to compile it from source code and configure this lightning library with
./configure --with-gnu-ld --enable-disassembler --enable-devel-disassembler \
--enable-devel-get-jit-size --disable-silent-rules CFLAGS='-O2 -g2'
then the usual build commands.
You should have compiled and installed Ian Taylor's libbacktrace, e.g. under /usr/local/. You may need to add /usr/local/lib/ to your /etc/ld.so.conf and run ldconfig -v -a after installing that libbacktrace.
You need the JsonCPP library, and also a mail command in your $PATH.
To install the dependencies on a recent Debian 12 bookworm or Ubuntu 22 system, you could run the following steps:
sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test   # for Ubuntu 20.04
sudo apt install -y gcc-12 g++-12 clang-14 libc++-11-dev libc++abi-11-dev   # for Ubuntu 22.04
sudo apt install libunistring-dev
sudo apt install libjsoncpp-dev
sudo apt-get install libssl-dev
sudo apt install bisonc++ bisonc++-doc
sudo apt install ccache g++ make build-essential remake gdb automake
sudo apt install ttf-unifont ttf-mscorefonts-installer unifont msttcorefonts fonts-ubuntu fonts-tuffy fonts-spleen fonts-roboto fonts-recommended fonts-yanone-kaffeesatz fonts-play fonts-eurofurence fonts-ecolier-court fonts-dejavu fonts-croscore fonts-cegui fonts-inter fonts-inconsolata
git clone https://github.com/ianlancetaylor/libbacktrace.git
cd libbacktrace
./configure
make
make install
Some C++ code is important since it is shared between RefPerSys and the guifltk-refpersys program. In particular, the code of rps_compute_cstr_two_64bits_hash should not be changed after mid-September 2023. It uses GNU libunistring and is shared (in file jsonrpsfltk.cc of the guifltk-refpersys program).
See also some files from misc-basile, in particular build-with-guile.c.
You need a recent C99 compiler, e.g. gcc (at least GCC 12). Check with gcc --version.
You need a recent C++17 compiler such as g++ (we use GCC 13 or GCC 14, and sometimes clang++, whose warnings are different). Look into, and perhaps improve, our GNUmakefile. Check with g++ --version. Build using make -j 3 or more.
You may need the ninja build utility; in commit 559ea329f46a and earlier, it is used by our do-refpersys-build-plugin utility.
You need glibmm from the GTK suite, and some gtkmm for some RefPerSys plugins.
You also should do a make clean after any git pull and often redo a make config...
Your Unix environment should contain a REFPERSYS_TOPDIR shell variable. The author has env REFPERSYS_TOPDIR=$HOME/RefPerSys/ and did git clone this into that $REFPERSYS_TOPDIR.
You need a Linux pkg-config utility.
You first need to run make config. I recommend having the GNU readline library.
You then build with make -j4 refpersys && make all
Debugging them (or my code using GUI libraries) is painful because of the lack of DWARF debugging information in many GUI libraries (and even for my bugs, it is easier to have GUI libraries compiled with -O2 -g). If you are familiar with FLTK and able and allowed to download its source code and configure it using
'./configure' 'CFLAGS=-O2 -fPIC -g -Wall -Wextra' 'CXXFLAGS=-O2 -g -fPIC -Wall -Wextra' '--disable-static' '--enable-shared' '--enable-debug' '--with-abiversion' '--with-optim=-g -O2 -fPIC' '--sysconfdir=/etc/local/'
please send us an email.
RefPerSys is a multi-threaded and garbage-collected system. We are fully aware that multi-thread friendly and efficient garbage collection is a very difficult topic.
The reader unaware of garbage collection terminology (precise vs. conservative GC, tracing garbage collection, copying GC, GC roots, GC locals, mark and sweep GC, incremental GC, write barrier) is advised to read the GC handbook and is expected to have read very carefully the Tracing Garbage Collection wikipage.
We have considered using Ravenbrook MPS. Unfortunately for us, that very good GC implementation seems unmaintained, and with almost a hundred thousand lines of code it is very difficult to grasp, understand, and adopt. In the end, using MPS is not reasonable in our eyes.
We also did consider using Boehm GC. That conservative GC is really simple to use (basically, use GC_MALLOC instead of malloc, etc.) and is C++ friendly. However, it is rather slow (even for allocations of GC-ed zones, and we would have many of them) and might be quite unsuitable for programs having lots of circular references, and reflexive programs have lots of them.
So we probably are heading towards developing our own precise and multi-thread friendly GC (hopefully "better" than Boehm, but worse than MPS), with the following ideas:
- Local roots in the local frame are explicit, like in Bismon (LOCALFRAME_BM macro of bismon/cmacros_BM.h) or Ocaml (see its §20.5 Living in harmony with the garbage collector and the CAMLlocal*, CAMLparam* and CAMLreturn* macros). The local call frame is conventionally reified as the _ local variable, so an automatic variable GC-ed pointer foo is coded _f.foo in our C++ runtime. A local frame in RefPerSys should be declared in C++ using RPS_LOCALFRAME. By convention, and for readability, use RPS_NULL_CALL_FRAME in C++ code when the caller frame argument of an invocation of the C++ macro RPS_LOCALFRAME is statically null, and RPS_CALL_FRAME_UNDESCRIBED when its descriptor is not given.
- Our garbage collector manages memory zones inside a set of mmap-ed memory blocks: either small blocks of one megaword, that is 8 megabytes (i.e. RPS_SMALL_BLOCK_SIZE), or large blocks of 8 megawords, that is 64 megabytes (i.e. RPS_LARGE_BLOCK_SIZE). Values are inside such memory zones. Mutable objects may contain (perhaps indirectly) pointers to quasivalues (notably in their payload), that is to garbage collected zones which are not first-class values. A typical example of a quasivalue could be some bucket in some (fully RefPerSys-implemented) array hash table (appearing as the payload of some object), in which buckets would be small mutable dynamic arrays of entries with colliding hashes. Such buckets are indeed garbage collected zones, but are not themselves values (since they are mutable, but not reified as objects).
- The GC allocation operations are explicitly given the pointer to the local frame (i.e. &_, named RPS_CURFRAME), which is linked to the previous call frame and so on. That pointer is passed to every routine needing the GC (i.e. allocating or mutating values); only functions which don't allocate or mutate (e.g. accessor or getter functions) can avoid getting that local frame pointer.
- The C++ runtime and plugins, and any code generated in RefPerSys (i.e. modules) or with libgccjit, should explicitly be in A-normal form. So coding z = f(g(x),y) is forbidden in C++ (where f and g are C++ functions using the GC). Instead, reserve a local slot such as _.tmp1 in the local frame, then code _.tmp1 = g(RPS_CURFRAME, _.x); _.z = f(RPS_CURFRAME, _.tmp1, _.y); In less pedantic terms, we should do only one call (to GC-aware functions) or one allocation per statement; and every such call to some allocation primitive, or to a GC-aware function, should pass the RPS_CURFRAME and use RPS_LOCALFRAME in the calling function.
- A write barrier should be called after object or quasivalue updates, and before any other allocation or update of some other object, value, or quasivalue. In practice, code _.foo.rps_write_barrier(RPS_CURFRAME) or more simply _.foo.RPS_WRITE_BARRIER().
- Every garbage-collection aware thread (a thread allocating GC-ed values, mutating GC-ed quasivalues or objects, or forcibly running the GC) should call the Rps_GarbageCollector::maybe_garbcoll routine quite often, typically once every few milliseconds. If this is not possible (e.g. before a potentially blocking read or poll system call), special precautions should be taken. Forgetting to call that maybe_garbcoll function often enough (typically every few milliseconds) might crash the system.
- Consequently, as a rule of thumb, any routine which can directly or indirectly allocate GC-ed values or quasi-values, or directly or indirectly mutate GC-ed values or quasi-values, should take a calling callframe argument. We might need to consider putting that specific callframe argument in some global register, using the GCC register ... asm extension to define global register variables, and compiling with the -ffixed-reg code generation option (where reg names that register). By coding convention, that calling callframe argument should preferably be named callingfra, and should be the first argument of every function or method (member functions in C++ classes) requiring the GC.
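The following toy program illustrates several of these ideas together, under stated assumptions: explicit frames linked by a prev pointer (playing the role of RPS_LOCALFRAME and RPS_CURFRAME), a callingfra first argument for GC-aware functions, and an A-normal form caller using a tmp1 slot. ToyHeap, ToyCallFrame, toy_g, toy_f and toy_caller are hypothetical. ToyHeap is a non-moving mark and sweep collector whose only roots are frame slots, with no write barrier, no mmap-ed blocks, and zones holding no pointers, so it is not the RefPerSys GC.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// A toy GC-managed zone (real RefPerSys zones live inside mmap-ed blocks).
struct ToyZone {
  bool marked = false;
  int payload = 0;
};

// An explicit call frame, linked to its caller, registering the addresses of its
// GC-ed local slots so that a precise GC can find every root.
struct ToyCallFrame {
  ToyCallFrame* prev;            // caller frame, or nullptr (cf. RPS_NULL_CALL_FRAME)
  std::vector<ToyZone**> slots;  // addresses of local GC-ed pointers
  explicit ToyCallFrame(ToyCallFrame* callingfra) : prev(callingfra) {}
};

// A toy non-moving mark and sweep heap whose only roots are the frame slots.
struct ToyHeap {
  std::vector<std::unique_ptr<ToyZone>> zones;

  void collect(ToyCallFrame* curframe) {
    for (auto& z : zones) z->marked = false;
    for (ToyCallFrame* fr = curframe; fr != nullptr; fr = fr->prev)
      for (ToyZone** slot : fr->slots)
        if (*slot != nullptr) (*slot)->marked = true;
    std::vector<std::unique_ptr<ToyZone>> live;
    for (auto& z : zones)
      if (z->marked) live.push_back(std::move(z));
    zones.swap(live);  // unmarked zones are freed when live goes out of scope
  }

  // Every allocation is given the current frame, and may trigger a collection.
  ToyZone* alloc(ToyCallFrame* curframe, int payload) {
    collect(curframe);
    zones.push_back(std::make_unique<ToyZone>());
    zones.back()->payload = payload;
    return zones.back().get();
  }
};

// GC-aware functions take the calling frame as their first argument.
ToyZone* toy_g(ToyCallFrame* callingfra, ToyHeap& heap, ToyZone* x) {
  ToyCallFrame fr(callingfra);
  fr.slots = {&x};  // keep the argument visible to the GC
  return heap.alloc(&fr, x->payload + 1);
}

ToyZone* toy_f(ToyCallFrame* callingfra, ToyHeap& heap, ToyZone* a, ToyZone* b) {
  ToyCallFrame fr(callingfra);
  fr.slots = {&a, &b};
  return heap.alloc(&fr, a->payload + b->payload);
}

// Caller in A-normal form: one GC-aware call per statement, with the
// intermediate result of toy_g kept in the frame slot tmp1.
int toy_caller(ToyHeap& heap) {
  struct {
    ToyZone *x = nullptr, *y = nullptr, *z = nullptr, *tmp1 = nullptr;
  } _;
  ToyCallFrame curframe(nullptr);
  curframe.slots = {&_.x, &_.y, &_.z, &_.tmp1};
  _.x = heap.alloc(&curframe, 1);
  _.y = heap.alloc(&curframe, 10);
  _.tmp1 = toy_g(&curframe, heap, _.x);
  _.z = toy_f(&curframe, heap, _.tmp1, _.y);
  return _.z->payload;
}
```

The A-normal form matters as soon as an argument is itself a GC-aware call: in z = f(g(x), h(y)), a collection triggered inside h could sweep the result of g(x) before f registers it, because that result sits in no frame slot.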
For Bismon, see http://github.com/bstarynk/bismon and read its draft Bismon report.
For the C++17 language, see this C++ reference.
For Linux programming, see Advanced Linux Programming and the syscalls(2) man page.
For GCC, see notably its Invoking GCC chapter.
For garbage collection, read Paul Wilson's old Uniprocessor Garbage Collection Techniques paper, then read the GC handbook.
We already need the following libraries:
- libunistring for UTF-8 support, since UTF-8 is everywhere
- libbacktrace for backtraces is essential. We hope to have rules mentioning and querying the call stack.
- libgccjit for code generation.
We may want to use, either soon or within a few years (usually after 2022), interesting C or C++ libraries such as:
- libonion or Wt should soon (even in 2019) be useful for the web interface
- libevent or libev for some event loop (quite soon).
- TensorFlow for machine learning purposes
- Gudhi for topological data analysis
- libcurl for HTTP client
- GMPlib for Arbitrary Precision Arithmetic or Bignums.
- 0mq for distributed messaging, in relation with distributed computing and message passing approaches.
- JsonCPP could be useful for JSON.
- POCO is a useful C++ generic framework library, and Qt might also be useful, even without its GUI aspect.
We should list other libraries interesting for us here, just in case (to avoid forgetting them).
Thanks to Niklas Rosencrantz (Sweden) (he is montao on github) for several contributions.
Thanks to Abhishek Chakravarti (India) (he is achakravarti on github) for several contributions.
Other contributors, please email basile@starynkevitch.net about yourselves.
We are adding an HTTP service to RefPerSys, so libonion is required. For many months, we just hope to use http://localhost:9090/ in a recent (e.g. Firefox 80) web browser.
We really need to be able to show a demo of RefPerSys on a laptop without Internet connection. So all required resources should be copied here, under webroot/. Be careful about copyright and licensing issues.
The webroot/ subdirectory holds resources useful for HTTP requests. In particular the following subdirectories:
- webroot/css/ for hand-written style sheets.
- webroot/img/ for additional images. Prefer SVG or PNG formats.
- webroot/js/ for JavaScript code.
- apt install make
- apt install pkg-config
- apt install libcurl4-openssl-dev
- apt install zlib1g-dev
- apt install libreadline-dev
- apt install libjsoncpp-dev
- apt install qt5-default
- apt install cmake # we probably want to remove this dependency
- apt install build-essential