RefPerSys

This free software project has its source code on https://gitlab.com/bstarynk/refpersys/ and on https://github.com/RefPerSys/RefPerSys and has its own web site at http://refpersys.org/ where more details are given.

RefPerSys is aiming to become a free software symbolic artificial intelligence system or inference engine, e.g. an alternative to CLIPSrules.

Contributions to RefPerSys are welcome. Contact by email Basile Starynkevitch (near Paris, France) at basile@starynkevitch.net or Abhishek Chakravarti (Kolkata, India) at chakravarti.avishek@gmail.com. Basile can also be contacted by snail mail (8 rue de la Faïencerie, 92340 Bourg-la-Reine, France).

HorizonEurope or ITEA consortia interested in RefPerSys are welcome to contact us.

A research project

The Reflective Persistent System language is a research project, taking many good ideas from Bismon, sharing a lot of goals (except static source code analysis) with it but avoiding bad ideas from it.

For Linux/x86-64 only, unless someone provides us ssh remote access to some other 64-bit Linux system. Don't even think of running that on non-Linux systems, unless you provide patches for that. And we need a 64-bit processor. If you can give us ssh access to non-x86-64 64-bit Linux machines (multi-core, at least 32 Gbytes of RAM, at least 128 Gbytes of disk space, with GCC installed for C and C++), please contact us.

We have multi-threading in mind, but in some limited way. We think of a pool of a few dozen Pthreads at most (but not of a thousand Pthreads).

We absolutely want to avoid any global interpreter lock (GIL).

Don't expect anything useful from RefPerSys before at least 2024. But you could have fun sharing our ideas and experimenting yours.

A rewrite of RefPerSys in C was attempted on refpersys-in-c.

We previously considered using the garbage collector from Ravenbrook MPS.

Don't expect RefPerSys to be a mature project. It is not in Feb 2024.

Some draft design ideas are written in the RefPerSys design draft which is very incomplete work in progress.

If you happen to know about any research call for proposals or funding opportunities, e.g. through some HorizonEurope consortium in Europe (Euro zone), about this (e.g. related to artificial intelligence goals) and open source, please mention them to Basile Starynkevitch (France) by email to basile@starynkevitch.net (personal email) or basile.starynkevitch@cea.fr (professional email).

persistent values

Like Bismon, RefPerSys manages an evolving, persistable heap of dynamically typed, garbage-collected values (see §2 Data and its persistence in Bismon of the Bismon draft report...). The semantics, but not the syntax, of values is on purpose close to that of Lisp, Python, Scheme, JavaScript, Go, or even Java.

Most RefPerSys values are immutable: for example boxed strings, sets (with dichotomic search inside them) or tuples of references to objects, closures, etc. But some RefPerSys values are mutable objects, and by convention every mutable value is called an object. Each mutable object has its own lock, and any access or update of mutable data inside an object is generally made under its lock. As an exception, a very few, very often accessed, mutable fields inside objects (e.g. their class) are atomic pointers, for performance reasons.

Objects have (exactly like in Bismon) attributes, components, and some optional payload. An attribute is an association between an object (called the key of that attribute) and some arbitrary non-nil RefPerSys value (called the value of that attribute), and each object has its own mutable associative table of attributes. A component is an arbitrary RefPerSys value, and each object has some mutable vector of them. The payload is any additional mutable data (e.g. a string buffer, a mutable vector or hashtable of values, some class metadata, etc.), owned by the object.

So the data model of a RefPerSys object is as flexible as the data model of JavaScript. However, RefPerSys objects have a mutable class defining their behavior (not their fields, which are represented as attributes), which is used for dynamic message dispatching.
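The object data model described above could be pictured with the following minimal C++17 sketch. All names here (Value, Object, ObjectRef, Payload, put_attribute, etc.) are hypothetical and are not the refpersys.hh API; std::shared_ptr is only a stand-in for a garbage-collected reference, and the set of value kinds is reduced to a few cases.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical names; this is NOT the actual refpersys.hh API.
struct Object;                                // every mutable value is an object
using ObjectRef = std::shared_ptr<Object>;    // stand-in for a GC-ed reference

// A value is nil, an immutable scalar, or a reference to a mutable object.
using Value =
    std::variant<std::monostate, std::int64_t, std::string, ObjectRef>;

// Optional extra mutable data owned by one object (string buffer, etc.).
struct Payload {
  virtual ~Payload() = default;
};

struct Object {
  std::mutex lock;                        // guards the mutable data below
  std::atomic<Object*> klass{nullptr};    // often read, so atomic, not locked
  std::map<ObjectRef, Value> attributes;  // key object -> non-nil value
  std::vector<Value> components;          // mutable vector of values
  std::unique_ptr<Payload> payload;       // optional payload

  // Associate a non-nil value to a key object, under the object's lock.
  void put_attribute(const ObjectRef& key, Value val) {
    assert(key != nullptr);
    assert(!std::holds_alternative<std::monostate>(val));
    std::lock_guard<std::mutex> guard(lock);
    attributes[key] = std::move(val);
  }

  // Return the value of an attribute, or nil when the key is absent.
  Value get_attribute(const ObjectRef& key) {
    std::lock_guard<std::mutex> guard(lock);
    auto it = attributes.find(key);
    return it == attributes.end() ? Value{} : it->second;
  }

  void append_component(Value val) {
    std::lock_guard<std::mutex> guard(lock);
    components.push_back(std::move(val));
  }

  std::size_t component_count() {
    std::lock_guard<std::mutex> guard(lock);
    return components.size();
  }
};
```

In this sketch the class pointer is the only field read without taking the lock, mirroring the exception mentioned above; everything else goes through the per-object mutex.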

Worker threads and agenda of tasklets

RefPerSys will have a small fixed set of worker threads (perhaps a dozen of them), each running some agenda loop. We would have some central data structure, called the agenda like in Bismon (see §1.7 of the Bismon draft report...), organizing runnable tasklets (e.g. in a few FIFO queues of them). A tasklet should conceptually run quickly (in a few milliseconds) and is allowed to add or remove runnable tasklets (including itself) to the agenda. Each worker thread is looping: fetching a runnable tasklet from the agenda, then running that tasklet.

License and copyright

This research project is GPLv3+ licensed and copyrighted by the RefPerSys team, currently made of:

  •  Basile Starynkevitch <basile@starynkevitch.net>,
     8 rue de la Faïencerie, 92340 Bourg-la-Reine, France (near Paris),
     homepage http://starynkevitch.net/Basile/, usual timezone TZ=MEST

  •  Abhishek Chakravarti <abhishek@taranjali.org>
     Kolkata, India

  •  Nimesh Neema <nimeshneema@gmail.com>

  •  Niklas Rosencrantz, in Sweden.
    

Some files might be "borrowed" from other similar GPLv3+ licensed projects (notably from Bismon...) and could retain their original copyright owner.

Contributing

Please ask, by email, the above RefPerSys team for C++ coding conventions before starting non-trivial contributions to the C++ runtime of RefPerSys. If you are contributing to its C++ runtime, please run make clean after any git pull.

The GPLv3+ license of RefPerSys is unlikely to change before 2025 (and probably even after).

Generated output.

RefPerSys could be patched and extended to generate proprietary code or data. In 2023 some authors (including Basile Starynkevitch) are not interested in adding such a feature. Other authors (in India) are interested in adding it. Their contributions are pending (in Sept. 2023).

File conventions

The RefPerSys runtime is implemented in C++17, with hand-written C++ code in *_rps.cc, and has a single C++ header file refpersys.hh. We don't claim to be C++ gurus. Most C++ experts could write more genuine C++ code than we do and will find our C++ code pitiful. We just want our runtime to work, not to serve as an example of well written C++17 code.

The preferred C++ compiler (in 2023Q2) for RefPerSys is GCC version 12 or 13.

It could be worthwhile to sometimes compile RefPerSys with clang++ (see http://clang.llvm.org/ for more). In practice, run make clean then make RPS_BUILD_CXX=clang++. The Clang static analyzer could be useful, but expect a lot of warnings, since C++ does not have flexible array members but we need something similar.

RefPerSys may later also use generated C++ code in some _*.cc files, some generated C code in some _*.c files, and generated C or C++ headers in some _*.h files. By convention, files starting with an underscore are generated (but they may or may not be git versioned). Some generated C++ files which are git add-ed are under the generated/ subdirectory.

A RefPerSys generated C++ file should be generated from some RefPerSys object (its generator).

We could later need some C++ generating program (maybe similar in spirit to Bismon's BM_makeconst.cc). It would then be named rps_* for the executable, and fit in a single self-sufficient rps_*.cc C++ file. Perhaps we'll later have some rps_makeconst executable to generate some C++, with its source in some rps_makeconst.cc. So the convention is that any future C++ generating source code is in some rps_*.cc C++ file. In commit 65a8f84aeffc9ba4e468 or newer, the dumping facility scans hand-written C++ source files to emit generated/rps-constants.hh.

Terminology and conventions

RefPerSys aims to become a homoiconic system: we hope to generate most of its C++ source code (under the generated/ subdirectory), and explicitly represent the generated code as objects.

A plugin is some shared object (some *.so file) loaded by dlopen(3).

A binary module is a shared object whose C++ code is generated by RefPerSys at dump time. The generated C++ code is called the source module (unrelated to C++20 modules). It is conventionally named generated/_objid.cc

Binary modules are conventionally named generated/__rps_*.so, see in our GNUmakefile the line around comment **generated binary modules

See our shell script build-plugin.sh to understand compiling conventions in binary modules or plugins. In particular, if the first fifty (50) lines of a generated C++ file contain - probably inside a comment - @RPSCOMPILEFLAGS= that is used to compile the plugin. If they contain @RPSLIBES= that is used to link the plugin.

Building and dependencies.

The build automation tool used here is GNU make since commit 6d56f50660c7cc41b9 (it was omake before).

You should have compiled and installed Ian Taylor's libbacktrace, e.g. under /usr/local/. You may need to add /usr/local/lib/ in your /etc/ld.so.conf and run ldconfig -v -a after installation of that libbacktrace.

The JsonCPP library is also needed, as well as a mail command in your $PATH.

To install the dependencies on a recent Debian 12 bookworm or Ubuntu 22 system, you could run the following steps

  • sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test (for Ubuntu 20.04)
  • sudo apt install -y gcc-12 g++-12 clang-14 libc++-11-dev libc++abi-11-dev (for Ubuntu 22.04)
  • sudo apt install libunistring-dev
  • sudo apt install libjsoncpp-dev
  • sudo apt-get install libssl-dev
  • sudo apt install bisonc++ bisonc++-doc
  • sudo apt install ccache g++ make build-essential remake gdb automake
  • sudo apt install ttf-unifont ttf-mscorefonts-installer unifont msttcorefonts fonts-ubuntu fonts-tuffy fonts-spleen fonts-roboto fonts-recommended fonts-yanone-kaffeesatz fonts-play fonts-eurofurence fonts-ecolier-court fonts-dejavu fonts-croscore fonts-cegui fonts-inter fonts-inconsolata
  • git clone https://github.com/ianlancetaylor/libbacktrace.git
  • cd libbacktrace
  • ./configure
  • make
  • make install

Important simple functions

Some C++ code is important since it is shared between RefPerSys and the guifltk-refpersys program. In particular, the code of rps_compute_cstr_two_64bits_hash should not be changed after mid-September 2023. It uses GNU libunistring and is shared with the guifltk-refpersys program (in its file jsonrpsfltk.cc).

Related files

See also some files from misc-basile

Build instructions

You need a recent C++17 compiler such as g++ (we use GCC 10 or GCC 11) or clang++ (Clang 11). Look into, and perhaps improve, our Makefile. Build using make -j 3 or more.

You should also run make clean after any git pull.

You may want to edit your $HOME/.refpersys.mk file to contain definitions of GNU make variables for your particular C and C++ compiler, for example:

 # file ~/.refpersys.mk
 RPS_BUILD_CC= gcc-12
 RPS_BUILD_CXX= g++-12

You then build with make -j4 refpersys && make all

Garbage collection

RefPerSys is a multi-threaded and garbage-collected system. We are fully aware that multi-thread friendly and efficient garbage collection is a very difficult topic.

The reader unaware of garbage collection terminology (precise vs. conservative GC, tracing garbage collection, copying GC, GC roots, GC locals, mark and sweep GC, incremental GC, write barrier) is advised to read the GC handbook and is expected to have read very carefully the Tracing Garbage Collection wikipage.

We have considered using Ravenbrook MPS. Unfortunately for us, that very good GC implementation seems unmaintained and, with almost a hundred thousand lines of code, is very difficult to grasp, understand, and adopt. In the end, using MPS is not reasonable in our eyes.

We also did consider using Boehm GC. That conservative GC is really simple to use (basically, use GC_MALLOC instead of malloc, etc...) and is C++ friendly. However, it is rather slow (even for allocations of GC-ed zones, and we would have many of them) and might be quite unsuitable for programs having lots of circular references, and reflexive programs have lots of them.

Garbage collection ideas

So we probably are heading towards developing our own precise and multi-thread friendly GC (hopefully "better" than Boehm, but worse than MPS), with the following ideas:

  • local roots in the local frame are explicit, like in Bismon (LOCALFRAME_BM macro of bismon/cmacros_BM.h) or OCaml (see its §20.5 Living in harmony with the garbage collector and CAMLlocal* and CAMLparam* and CAMLreturn* macros). The local call frame is conventionally reified as the _ local variable, so an automatic variable GC-ed pointer foo is coded _.foo in our C++ runtime. A local frame in RefPerSys should be declared in C++ using RPS_LOCALFRAME. By convention, and for readability, use RPS_NULL_CALL_FRAME in C++ code when the caller frame argument of an invocation of the C++ macro RPS_LOCALFRAME is statically null, and RPS_CALL_FRAME_UNDESCRIBED when its descriptor is not given.

  • our garbage collector manages memory zones inside a set of mmap-ed memory blocks: either small blocks of a megaword, that is 8 megabytes (i.e. RPS_SMALL_BLOCK_SIZE), or large blocks of 8 megawords, that is 64 megabytes (i.e. RPS_LARGE_BLOCK_SIZE). Values are inside such memory zones. Mutable objects may contain, perhaps indirectly, pointers to quasivalues (notably in their payload), that is to garbage collected zones which are not first-class values. A typical example of quasivalue could be some bucket in some (fully RefPerSys-implemented) array hash table (appearing as the payload of some object), in which buckets would be some small and mutable dynamic arrays of entries with colliding hashes. Such buckets are indeed garbage collected zones, but are not themselves values (since they are mutable, but not reified as objects).

  • The GC allocation operations are explicitly given the pointer to the local frame (i.e. &_, named RPS_CURFRAME), which is linked to the previous call frame and so on. That pointer is passed to every routine needing the GC (i.e. allocating or mutating values); only functions which don't allocate or mutate (e.g. accessor or getter functions) can avoid getting that local frame pointer.

  • The C++ runtime, and any code generated in RefPerSys, should explicitly be in A-normal form. So coding z = f(g(x),y) is forbidden in C++ (where f and g are C++ functions using the GC). Instead, reserve a local slot such as _.tmp1 in the local frame, then code _.tmp1 = g(RPS_CURFRAME, _.x); _.z = f(RPS_CURFRAME, _.tmp1, _.y); In less pedantic terms, we should do only one call (to GC-aware functions) or one allocation per statement; and every such call to some allocation primitive, or to a GC-aware function, should pass the RPS_CURFRAME and use RPS_LOCALFRAME in the calling function.

  • A write barrier should be called after object or quasivalue updates, and before any other allocation or update of some other object, value, or quasivalue. In practice, code _.foo.rps_write_barrier(RPS_CURFRAME) or more simply _.foo.RPS_WRITE_BARRIER().

  • Every garbage-collection aware thread (a thread allocating GC-ed values, mutating GC-ed quasivalues or objects, running the GC forcibly) should call quite often, typically once per few milliseconds, the Rps_GarbageCollector::maybe_garbcoll routine. If this is not possible (e.g. before a potentially blocking read or poll system call), special precautions should be taken. Forgetting to call that maybe_garbcoll function often enough (typically every few milliseconds) could maybe crash the system.

  • Consequently, as a rule of thumb, any routine which can directly or indirectly allocate GC-ed values or quasi-values, or directly or indirectly mutate GC-ed values or quasi-values, should take a calling callframe argument. We might need to consider putting that specific callframe argument in some global register, using the GCC register ... asm extension to define global register variables, and compiling with the -ffixed-reg code generation option. By coding convention, that calling callframe argument should preferably be named callingfra, and should be the first argument of every function or method (member function in C++ classes) requiring the GC.
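To make the explicit local frame and A-normal form conventions more concrete, here is a toy C++17 sketch. It is not the RefPerSys garbage collector: Zone, CallFrame, Heap and the functions f, g and compute are hypothetical, zones contain no pointers (so marking is not recursive), the frame is built by hand instead of through the RPS_LOCALFRAME macro, and a collection is deliberately run before every allocation to exercise the roots.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

// Toy model with hypothetical names; not the RefPerSys runtime.
struct Zone {            // stand-in for a GC-ed memory zone (no inner pointers)
  int payload;
  bool marked;
};

struct CallFrame {
  CallFrame* previous;        // frame of the caller, nullptr at the bottom
  std::vector<Zone**> roots;  // addresses of the explicit local slots
};

struct Heap {
  std::vector<std::unique_ptr<Zone>> zones;

  // Naive mark and sweep: mark zones reachable from the chain of frames.
  void collect(CallFrame* frame) {
    for (auto& z : zones)
      z->marked = false;
    for (CallFrame* f = frame; f != nullptr; f = f->previous)
      for (Zone** root : f->roots)
        if (*root != nullptr)
          (*root)->marked = true;
    zones.erase(std::remove_if(zones.begin(), zones.end(),
                               [](const std::unique_ptr<Zone>& z) {
                                 return !z->marked;
                               }),
                zones.end());
  }

  // Every allocation gets the calling frame; here it always collects first.
  Zone* allocate(CallFrame* callingfra, int payload) {
    collect(callingfra);
    zones.push_back(std::unique_ptr<Zone>(new Zone{payload, false}));
    return zones.back().get();
  }
};

Heap heap;

// GC-aware function: the calling frame is its first argument.
Zone* g(CallFrame* callingfra, Zone* x) {
  struct { Zone* x = nullptr; Zone* res = nullptr; } _;
  CallFrame curframe{callingfra, {&_.x, &_.res}};
  _.x = x;
  _.res = heap.allocate(&curframe, 2 * _.x->payload);
  return _.res;
}

Zone* f(CallFrame* callingfra, Zone* a, Zone* b) {
  struct { Zone* a = nullptr; Zone* b = nullptr; Zone* res = nullptr; } _;
  CallFrame curframe{callingfra, {&_.a, &_.b, &_.res}};
  _.a = a;
  _.b = b;
  _.res = heap.allocate(&curframe, _.a->payload + _.b->payload);
  return _.res;
}

// A-normal form: one GC-aware call or allocation per statement, and every
// intermediate result is stored in a slot of the local frame.
int compute(int xval, int yval) {
  struct {
    Zone* x = nullptr; Zone* y = nullptr;
    Zone* tmp1 = nullptr; Zone* z = nullptr;
  } _;
  CallFrame curframe{nullptr, {&_.x, &_.y, &_.tmp1, &_.z}};
  _.x = heap.allocate(&curframe, xval);
  _.y = heap.allocate(&curframe, yval);
  _.tmp1 = g(&curframe, _.x);
  _.z = f(&curframe, _.tmp1, _.y);
  return _.z->payload;
}
```

Because each intermediate result sits in a slot registered in the current frame, the collector can find it by walking the chain of frames, whichever allocation happens to trigger a collection.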

useful references

For Bismon, see http://github.com/bstarynk/bismon and read its draft Bismon report.

For the C++17 language, see this C++ reference.

For Linux programming, see Advanced Linux Programming and the syscalls(2) man page.

For GCC, see notably its Invoking GCC chapter.

For garbage collection, read Paul Wilson's old paper Uniprocessor Garbage Collection Techniques, then read the GC handbook.

useful and relevant libraries

We already need the following libraries:

We may want to use, either soon or within a few years, (usually after 2022) interesting C or C++ libraries such as:

We should list other libraries interesting for us here, just in case (to avoid forgetting them).

some contributors

Thanks to Niklas Rosencrantz (Sweden) (he is montao on github) for several contributions. Thanks to Abhishek Chakravarti (India) (he is achakravarti on github) for several contributions.

Other contributors, please email basile@starynkevitch.net about you.

HTTP service

We are adding an HTTP service in RefPerSys, so libonion is required. For many months, we just hope to use http://localhost:9090/ in a recent (e.g. Firefox 80) web browser.

We really need to be able to show a demo of RefPerSys on a laptop without Internet connection. So all required resources should be copied here, under webroot/. Be careful about copyright and licensing issues.

Web conventions.

The webroot/ subdirectory holds resources useful for HTTP requests. In particular the following subdirectories:

  • webroot/css/ for -hand-written- style sheets.

  • webroot/img/ for additional images. Prefer SVG or PNG formats.

  • webroot/js/ for JavaScript code.

Dependency installation notes (Ubuntu 20.04 Focal Fossa)

  • apt install make
  • apt install pkg-config
  • apt install libcurl4-openssl-dev
  • apt install zlib1g-dev
  • apt install libreadline-dev
  • apt install libjsoncpp-dev
  • apt install qt5-default
  • apt install cmake # we probably want to remove this dependency
  • apt install build-essential
