A research project
The Reflective Persistent System language is a research project, taking many good ideas from Bismon, sharing a lot of goals (except static source code analysis) with it but avoiding bad ideas from it.
For Linux/x86-64 only. Don't even think of running it on non-Linux systems, unless you provide patches for that. And we need a 64-bit processor.
We have multi-threading in mind, but in some limited way. We think of a pool of a few dozen Pthreads at most (but not of a thousand Pthreads).
We absolutely want to avoid any GIL (Global Interpreter Lock).
Don't expect anything useful from RefPerSys before at least 2023. But you could have fun sharing our ideas and experimenting with yours.
A rewrite of RefPerSys in C happens on refpersys-in-c.
We previously considered using the garbage collector from Ravenbrook MPS. Since that project is now obsolete, we gave up that idea.
Don't expect RefPerSys to be a realistic project. It is not (and certainly not before 2025).
Some draft design ideas are written in the RefPerSys design draft, which is a very incomplete work in progress.
If you happen to know about any research call for proposals or funding
opportunities in Europe (Euro zone) about this (e.g. related to
goals) please mention them to Basile
Starynkevitch (France) by email to
Worker threads and agenda of tasklets
RefPerSys will have a small fixed set of worker threads (perhaps a dozen of them), each running some agenda loop; we would have some central data structure (called the agenda, like in Bismon; see §1.7 of the Bismon draft report...) organizing runnable tasklets (e.g. a few FIFO queues of them). A tasklet should conceptually run quickly (in a few milliseconds) and is allowed to add runnable tasklets (including itself) to the agenda or remove them from it. Each worker thread is looping: fetching a runnable tasklet from the agenda, then running that tasklet.
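The agenda loop described above can be sketched in portable C++17 as follows. This is only an illustrative sketch under our own assumptions, not the RefPerSys implementation: the names `Agenda`, `Tasklet` and `run_agenda` are hypothetical, a single FIFO queue stands in for "a few FIFO queues", and removal of tasklets is left out.

```cpp
#include <atomic>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical names throughout; this is not the RefPerSys API.
class Agenda;
using Tasklet = std::function<void(Agenda&)>;

class Agenda {
  std::mutex mtx_;
  std::condition_variable cond_;
  std::deque<Tasklet> fifo_;  // one FIFO queue of runnable tasklets
  bool stopping_ = false;

public:
  // Append a runnable tasklet; a running tasklet may call this too,
  // even to re-add itself.
  void add(Tasklet t) {
    {
      std::lock_guard<std::mutex> guard(mtx_);
      fifo_.push_back(std::move(t));
    }
    cond_.notify_one();
  }

  // Ask every worker to return once the queue has been drained.
  void stop() {
    {
      std::lock_guard<std::mutex> guard(mtx_);
      stopping_ = true;
    }
    cond_.notify_all();
  }

  // The agenda loop run by each worker thread:
  // fetch a runnable tasklet, then run it.
  void work() {
    for (;;) {
      Tasklet t;
      {
        std::unique_lock<std::mutex> lock(mtx_);
        cond_.wait(lock, [this] { return stopping_ || !fifo_.empty(); });
        if (fifo_.empty())
          return;  // stopping, and nothing is left to run
        t = std::move(fifo_.front());
        fifo_.pop_front();
      }
      t(*this);  // run the tasklet outside the lock
    }
  }
};

// Start a small fixed set of workers and wait for all of them.
inline void run_agenda(Agenda& agenda, unsigned nbworkers) {
  std::vector<std::thread> workers;
  for (unsigned i = 0; i < nbworkers; i++)
    workers.emplace_back([&agenda] { agenda.work(); });
  for (auto& w : workers)
    w.join();
}
```

Running each tasklet outside the lock is what lets a tasklet call `add` on the agenda without deadlocking, and keeps the other workers free to fetch tasklets meanwhile.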
License and copyright
This research project is GPLv3+ licensed and copyrighted by the RefPerSys team, currently made of:
Basile Starynkevitch <firstname.lastname@example.org>, homepage http://starynkevitch.net/Basile/ near Paris, France. So usual timezone `TZ=MEST`
Abhishek Chakravarti <email@example.com>
Nimesh Neema <firstname.lastname@example.org>
Some files might be "borrowed" from other similar GPLv3+ licensed projects (notably from Bismon...) and could retain their original copyright owner.
Please ask, by email, the above RefPerSys team for C++ coding
conventions before starting non-trivial contributions to the C++
runtime of RefPerSys. If you are contributing to its C++ runtime,
make clean after any
The GPLv3+ license of RefPerSys is unlikely to change before 2025 (and probably even after).
The RefPerSys runtime is implemented in C++17, with hand-written C++
*_rps.cc, and has a single C++ header file
We don't claim to be C++ gurus. Most C++ experts could write more
genuine C++ code than we do and will find our C++ code pitiful. We
just want our runtime to work, not to serve as an example of
well-written C++17 code.
It could be worthwhile to sometimes compile RefPerSys with
(see http://clang.llvm.org/ for more). In practice
make clean then
make RPS_BUILD_CXX=clang++. The Clang static
analyzer could be useful, but
expect a lot of warnings, since C++ does not have flexible array
members but we
need something similar.
RefPerSys may later also use generated C++ code in some
file, some generated C code in some
_*.c and generated C or C++
headers in some
_*.h files. By convention, files starting with an
underscore are generated (but they may or may not be git
versioned). Some generated C++ files which are
git add-ed are under
We could later need some C++ generating program (maybe similar in
spirit to Bismon's
would then be named
rps_* for the executable, and fits in a single
rps_*.cc C++ file. Perhaps we'll later have some
rps_makeconst executable to generate some C++, and its source in
rps_makeconst.cc. So the convention is that any future C++
generating source code is in some
rps_*.cc C++ file. In commit
65a8f84aeffc9ba4e468 or newer the dumping facility is scanning
hand-written C++ source files to emit
Building and dependencies.
You should have compiled and installed Ian Taylor's
/usr/local/. You may need to add
/etc/ld.so.conf and run
ldconfig -v -a after installation of
sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test (for Ubuntu 20.04)
sudo apt install -y gcc-11 g++-11 clang-11 libc++-11-dev libc++abi-11-dev (for Ubuntu 20.04)
sudo apt install libunistring-dev
sudo apt install libjsoncpp-dev
sudo apt-get install libssl-dev
sudo apt install ccache g++ make build-essential remake gdb automake
sudo apt install ttf-unifont ttf-mscorefonts-installer unifont msttcorefonts fonts-ubuntu fonts-tuffy fonts-spleen fonts-roboto fonts-recommended fonts-yanone-kaffeesatz fonts-play fonts-eurofurence fonts-ecolier-court fonts-dejavu fonts-croscore fonts-cegui fonts-inter fonts-inconsolata
git clone https://github.com/ianlancetaylor/libbacktrace.git
Compiling FLTK with DWARF debug information
RefPerSys is using (e.g. in its commit
the FLTK graphical user interface toolkit
(e.g. FLTK version
or newer...). That toolkit should be
compiled with both debug information and optimization, by configuring
./configure --enable-debug --with-optim="-O2"
You also should do a
make clean after any
You may want to edit your
$HOME/.refpersys.mk file to contain
definitions of GNU
make variables for your particular C and C++ compiler,
# file ~/.refpersys.mk
RPS_BUILD_CC= gcc-11
RPS_BUILD_CXX= g++-11
You then build with
make -j4 refpersys && make all
RefPerSys is a multi-threaded and garbage-collected system. We are fully aware that multi-thread friendly and efficient garbage collection is a very difficult topic.
The reader unaware of garbage collection terminology (precise vs. conservative GC, tracing garbage collection, copying GC, GC roots, GC locals, mark and sweep GC, incremental GC, write barrier) is advised to read the GC handbook and is expected to have read very carefully the Tracing Garbage Collection wikipage.
We have considered using Ravenbrook MPS. Unfortunately for us, that very good GC implementation seems unmaintained and, with almost a hundred thousand lines of code, is very difficult to grasp, understand, and adopt. In the end, using MPS is not reasonable in our eyes.
We also did consider using Boehm
GC. That conservative GC is really simple
to use (basically, use
GC_MALLOC instead of
malloc, etc...) and is
C++ friendly. However,
it is rather slow (even for allocations of GC-ed zones, and we would
have many of them) and might be quite unsuitable for programs having
lots of circular references;
reflexive programs have lots of them.
Garbage collection ideas
So we probably are heading towards developing our own precise and multi-thread friendly GC (hopefully "better" than Boehm, but worse than MPS), with the following ideas:
local roots in the local frame are explicit, like in Bismon (LOCALFRAME_BM macro of bismon/cmacros_BM.h) or OCaml (see its §20.5 Living in harmony with the garbage collector and
CAMLreturn* macros). The local call frame is conventionally reified as the
_ local variable, so an automatic variable holding a GC-ed pointer is written like
_.foo in our C++ runtime. A local frame in RefPerSys should be declared in C++ using
our garbage collector manages memory zones inside a set of
mmap-ed memory blocks: either small blocks of a megaword, that is 8 megabytes (i.e.
RPS_SMALL_BLOCK_SIZE), or large blocks of 8 megawords, that is 64 megabytes (i.e.
RPS_LARGE_BLOCK_SIZE). Values are inside such memory zones. Mutable objects may contain, perhaps indirectly, pointers to quasivalues (notably in their payload), that is to garbage-collected zones which are not first-class values. A typical example of a quasivalue could be some bucket in some (fully RefPerSys-implemented) array hash table (appearing as the payload of some object), in which buckets would be small, mutable dynamic arrays of entries with colliding hashes. Such buckets are indeed garbage-collected zones, but are not themselves values (since they are mutable, but not reified as objects).
The GC allocation operations are explicitly given the pointer to the local frame (i.e.
RPS_CURFRAME), which is linked to the previous call frame and so on. That pointer is passed to every routine needing the GC (i.e. allocating or mutating values); only functions which don't allocate or mutate (e.g. accessor or getter functions) can avoid getting that local frame pointer.
The C++ runtime, and any code generated in RefPerSys, should explicitly be in A-normal form. So coding
z = f(g(x),y) is forbidden in C++ (where f and
g are C++ functions using the GC). Instead, reserve a local slot such as
_.tmp1 in the local frame, then code
_.tmp1 = g(RPS_CURFRAME, _.x); _.z = f(RPS_CURFRAME, _.tmp1, _.y); In less pedantic terms, we should do only one call (to GC-aware functions) or one allocation per statement; and every such call to some allocation primitive, or to a GC-aware function, should pass the
RPL_LOCALFRAME in the calling function.
A write barrier should be called after object or quasivalue updates, and before any other allocation or update of some other object, value, or quasivalue. In practice, code
_.foo.rps_write_barrier(RPS_CURFRAME) or more simply
Every garbage-collection aware thread (a thread allocating GC-ed values, mutating GC-ed quasivalues or objects, or forcibly running the GC) should call quite often, typically once every few milliseconds, the
Rps_GarbageCollector::maybe_garbcoll routine. If this is not possible (e.g. before a potentially blocking
poll system call), special precautions should be taken. Forgetting to call that
maybe_garbcoll function often enough (typically every few milliseconds) could crash the system.
Consequently, as a rule of thumb, any routine which can directly or indirectly allocate GC-ed values or quasi-values, or directly or indirectly mutate GC-ed values or quasi-values, should take a calling callframe argument. We might need to consider putting that specific
callframe argument in some global register, using the GCC
asm extension to define global register variables and compiling with the
-ffixed-reg code generation option. By coding convention, that calling callframe argument should preferably be named
callingfra, and should be the first argument of every function or method (member function in C++ classes) requiring the GC.
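The conventions above (explicit local frames, a calling frame passed as first argument, and A-normal form) could be sketched as follows. This is a toy illustration under our own assumptions: `Value`, `CallFrame`, `count_roots`, `make_value` and `compute_z` are hypothetical stand-ins rather than the RefPerSys types or macros, and no collector actually runs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins; the real RefPerSys types and macros differ.
struct Value { std::string text; };

// A call frame: linked to the caller's frame, and describing its local
// GC-ed pointers so that a precise collector could treat them as roots.
struct CallFrame {
  CallFrame* prev;      // calling frame, or nullptr at the bottom
  Value** slots;        // address of the first local root (not read here)
  std::size_t nbslots;  // number of local roots in this frame
};

// Walk the chain of frames, as a root-scanning collector would.
inline std::size_t count_roots(const CallFrame* fr) {
  std::size_t n = 0;
  for (; fr != nullptr; fr = fr->prev)
    n += fr->nbslots;
  return n;
}

// Toy allocation primitive, taking the calling frame first. A real GC
// might collect here and would use `callingfra` to find the roots.
inline Value* make_value(CallFrame* callingfra, const std::string& s) {
  (void)callingfra;
  return new Value{s};  // leaked on purpose: no collector in this sketch
}

inline Value* g(CallFrame* callingfra, Value* x) {
  struct { Value* x; Value* res; } _ = {x, nullptr};
  CallFrame curframe{callingfra, &_.x, 2};
  _.res = make_value(&curframe, "g(" + _.x->text + ")");
  return _.res;
}

inline Value* f(CallFrame* callingfra, Value* a, Value* b) {
  struct { Value* a; Value* b; Value* res; } _ = {a, b, nullptr};
  CallFrame curframe{callingfra, &_.a, 3};
  _.res = make_value(&curframe, "f(" + _.a->text + "," + _.b->text + ")");
  return _.res;
}

inline Value* compute_z(CallFrame* callingfra, Value* x, Value* y) {
  struct { Value* x; Value* y; Value* tmp1; Value* z; } _ =
      {x, y, nullptr, nullptr};
  CallFrame curframe{callingfra, &_.x, 4};
  // A-normal form: one GC-aware call per statement, never f(g(x), y),
  // so every intermediate result sits in a frame slot the GC can see.
  _.tmp1 = g(&curframe, _.x);
  _.z = f(&curframe, _.tmp1, _.y);
  return _.z;
}
```

The point of the intermediate `_.tmp1` slot is that the result of `g` is already reachable from the frame chain when `f` allocates; in `f(g(x), y)` it would live only in a compiler temporary that a precise collector cannot find.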
For the C++17 language, see this C++ reference.
Useful and relevant libraries
We already need the following libraries:
We may want to use, either soon or within a few years (usually after 2022), interesting C or C++ libraries such as:
- libonion or Wt should be useful very soon (even in 2019) for the web interface
- libevent or libev for some event loop (quite soon).
- TensorFlow for machine learning purposes
- Gudhi for topological data analysis
- libcurl for HTTP client
- GMPlib for Arbitrary Precision Arithmetic or Bignums.
- 0mq for distributed messaging, in relation with distributed computing and message passing approaches.
- JsonCPP could be useful for JSON.
- POCO is a useful C++ generic framework library, and Qt might also be useful, even without its GUI aspect.
We should list other libraries interesting for us here, just in case (to avoid forgetting them).
Thanks to Niklas Rosencrantz (Sweden) for past minor contributions.
We are adding HTTP service in RefPerSys. So
libonion is required. For
many months, we just hope to use
http://localhost:9090/ in a recent
(e.g. Firefox 80) web browser.
We really need to be able to show a demo of RefPerSys on a laptop
without Internet connection. So all required resources should be
copied here, under
webroot/. Be careful about copyright and
webroot/ subdirectory holds resources useful for HTTP
requests. In particular the following subdirectories:
webroot/css/ for hand-written style sheets.
webroot/img/ for additional images. Prefer SVG or PNG formats.
Dependency installation notes (Ubuntu 20.04 Focal Fossa)
- apt install make
- apt install pkg-config
- apt install libcurl4-openssl-dev
- apt install zlib1g-dev
- apt install libreadline-dev
- apt install libjsoncpp-dev
- apt install qt5-default
- apt install cmake
- apt install build-essential