switching back to single-threaded contexts, two-space collectors, as better fit for Wikilon's use case (really good straight-line code more important than cheap parallelism).
dmbarbour committed Mar 28, 2016
1 parent c98d2c4 commit e50d54f
Showing 8 changed files with 211 additions and 825 deletions.
33 changes: 22 additions & 11 deletions docs/ABCRT.md
@@ -36,6 +36,12 @@ I'll actually include a copy of these directly, no shared libs.

With LMDB, I have the opportunity to use a single-process or multi-process access to the database. Multi-process access to a database would be very convenient if I want to later develop shells, command-line interpreters, etc. to access this database in the background. But that isn't a use case for Wikilon. For now I should probably favor single-process access using a lockfile, since it's much easier (with LMDB) to go from single-process to multi-process than vice versa.

If I do favor multi-process access, I'll need to somehow track ephemeral stowage references safely between processes. This is non-trivial.

### Regarding Error Reporting

Originally I was returning an 'error' value from most functions. I'm beginning to think this was a bad idea. First, it overly complicates the client API. Second, most errors are not recoverable. Third, tracking the exact cause of an error is expensive. I may be better served by accumulating errors - similar to a context-specific `errno`, an error list of sorts, or wrapped error values - or by simply tracking, at the context level, whether an error has occurred.
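
Roughly, the context-level approach might look like the following minimal sketch (the struct, field, and function names here are hypothetical, not the actual API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: accumulate error flags on the context instead of
 * returning an error code from every function. */
typedef struct wikrt_cx_sketch {
    uint32_t errors;            /* bit-set of accumulated error conditions */
    /* ... memory region, value stack, etc. ... */
} wikrt_cx_sketch;

static inline void wikrt_set_error(wikrt_cx_sketch* cx, uint32_t flag) {
    cx->errors |= flag;         /* later operations may simply no-op once set */
}

static inline bool wikrt_has_error(wikrt_cx_sketch const* cx) {
    return (0 != cx->errors);
}
```

Client code would then run a batch of operations and check for errors once at the end, rather than inspecting a return code at every step.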

## Structure

I need an *environment* bound to an underlying stowage and persistence model, e.g. via LMDB. The environment may also be associated with worker threads for par-seq parallelism. I also need *contexts* that associate an environment with a heap of memory for computations.
@@ -52,15 +58,11 @@ Originally I was going for a more conventional API where values have handles out

### Memory Management

ABC is well suited for manual memory management due to its explicit copy and drop operators. This couples nicely with a preference for linear structure and 'move' semantics, deep copies favored over aliasing. Linearity allows me to implement many functions (or their common cases) without allocation.

*Thoughts:* I've frequently contemplated a bump-pointer nursery. This is feasible, perhaps, though it might require rewriting data plumbing operations to allocate within the nursery (lest `l` and `r` and the like produce references from the old space to the new space). This would probably complicate code a fair bit... so I'll hold off for now. Without a bump-pointer nursery, I cannot guarantee stack-like locality for 'new' values.
Bump-pointer allocators, together with two-space collectors, would also serve me very well. Parallelism would be 'heavy' in the sense that it copies computation data, but could also be 'lightweight' in the sense that it only triggers upon compacting GC - i.e. when data is to be copied regardless. Two-space compacting could drive both stowage and quotas, and the extra space would be convenient for stowage without runtime allocations (assuming stowage is no more expensive than a local representation). Stowage would serve as the implicit 'old space' where we no longer copy-collect data.

For the moment, I use a conventional size-segregated free-list and quick-fit mechanisms. With this, fragmentation is a problem, especially in a multi-threaded scenario. Use of `wikrt_move()` is a fragmentation hazard because it results in data being free'd in a separate thread from the one where it was allocated. When that memory is then reused, local fragmentation worsens. I'll accept high fragmentation so I can move forward with implementation.

Latent destruction of large values is a related, viable feature that could improve cx_destroy performance for large context values. OTOH, it would make it harder to track substructural properties. So I'll probably leave this feature out for now.

At the moment, copying large stacks, lists, and the like will tend to use slab allocations for the large 'spine' value, but we will 'free' the same slab one element at a time. I'm not sure what effect this will have on fragmentation - it might help or hurt.
A potential difficulty with a moving collector is that I must represent active computation state within the memory region and deal with pointers changing after each allocation. I might achieve this by modeling a small set of active registers - i.e. the active computation stack - which could also serve as scratch space as needed.
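
Roughly, bump-pointer allocation plus semispace (Cheney-style) collection could look like the sketch below, with a small register set serving as the GC roots and scratch space. Every name here (`heap`, `cell`, `alloc_pair`, and so on) is illustrative, not the actual runtime representation:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative semispace heap: cells are two-word pairs; a value is either a
 * small integer (low bit set) or an aligned cell pointer. */
typedef uintptr_t val;
typedef struct cell { val fst, snd; } cell;
#define IS_PTR(v) (0 == ((v) & 1))

typedef struct heap {
    cell *space, *other;   /* active semispace and reserve semispace */
    size_t size, next;     /* capacity (in cells) and bump-pointer index */
    val reg[8];            /* GC roots: the active computation registers */
} heap;

/* Copy one value into the reserve space, leaving a forwarding pointer. */
static val copy_val(heap* h, size_t* next, val v) {
    if (!IS_PTR(v) || (0 == v)) { return v; }    /* numbers and unit pass through */
    cell* const c  = (cell*)v;
    cell* const to = h->other;
    if (IS_PTR(c->fst) && ((cell*)c->fst >= to) && ((cell*)c->fst < (to + h->size))) {
        return c->fst;                           /* already copied: follow forwarding ptr */
    }
    cell* const dst = to + (*next)++;            /* bump-allocate in the reserve space */
    (*dst) = (*c);
    c->fst = (val)dst;                           /* install forwarding pointer */
    return (val)dst;
}

/* Cheney-style collection: copy everything reachable from the registers,
 * then swap spaces. Every live pointer may change here. */
static void collect(heap* h) {
    size_t next = 0;
    for (size_t i = 0; i < 8; ++i) { h->reg[i] = copy_val(h, &next, h->reg[i]); }
    for (size_t scan = 0; scan < next; ++scan) {
        h->other[scan].fst = copy_val(h, &next, h->other[scan].fst);
        h->other[scan].snd = copy_val(h, &next, h->other[scan].snd);
    }
    cell* const tmp = h->space; h->space = h->other; h->other = tmp;
    h->next = next;
}

/* Allocate a pair from two registers into register rd. Operands live in
 * registers precisely so they survive a collection-triggered move. */
static void alloc_pair(heap* h, int rd, int ra, int rb) {
    if (h->next == h->size) { collect(h); }      /* a real runtime would also grow or fail */
    cell* const c = h->space + h->next++;
    c->fst = h->reg[ra];
    c->snd = h->reg[rb];
    h->reg[rd] = (val)c;
}

static int heap_init(heap* h, size_t cells) {
    h->space = calloc(cells, sizeof(cell));
    h->other = calloc(cells, sizeof(cell));
    h->size = cells; h->next = 0;
    memset(h->reg, 0, sizeof(h->reg));
    return (h->space && h->other) ? 0 : -1;
}
```

Compaction in `collect` is also a natural place to hook quota checks and to push cold data toward stowage, since the data is being traversed and copied regardless.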

## Representations

@@ -148,14 +150,23 @@ ABC's discretionary value sealers. I'll optimize for 3-character (or fewer) disc

### Value Stowage

We can annotate a value to be moved to persistent storage, or transparently reverse this.

*Thoughts*: If stowage is constrained to sealed values, that may simplify implementation code because I only need to handle transparent expansion as part of an 'unseal' operation. Further, this would enable me to present better metadata about sealed values. I believe this is a path worth pursuing or attempting. This is an easy constraint to enforce, and I may proceed with it.
It might be useful to align stowage with value sealers, i.e. such that stowage may only occur at sealed values. This would simplify processing of stowage because transparent expansion would only need to be handled at an unseal operation. Further, it would provide some typeful metadata around each node - relatively convenient for debugging. Conversely, having implicit stowage upon sealing a value could be convenient for space management purposes.
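
A minimal sketch of how 'transparent expansion only at unseal' might look in the runtime follows; every name below is illustrative rather than the actual API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative declarations; these are not the real wikrt interfaces. */
typedef uint32_t wk_val;
typedef struct wk_cx wk_cx;
typedef int wk_err;

extern wk_val* wk_head(wk_cx*);                     /* sealed value at head of the stack */
extern bool    wk_is_stowed(wk_cx*, wk_val);        /* is the payload a stowage reference? */
extern wk_err  wk_load_stowed(wk_cx*, wk_val*);     /* pull it back from LMDB-backed storage */
extern wk_err  wk_unseal_plain(wk_cx*, wk_val*, char const* tok);

/* With stowage restricted to sealed values, unseal is the one operation
 * that must transparently expand a stowed value. */
wk_err wk_unseal(wk_cx* cx, char const* tok) {
    wk_val* const v = wk_head(cx);
    if (wk_is_stowed(cx, *v)) {
        wk_err const st = wk_load_stowed(cx, v);
        if (0 != st) { return st; }
    }
    return wk_unseal_plain(cx, v, tok);
}
```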

I may need similar alignment for parallelism or laziness, to avoid true transparent transforms. A simple notion of parallelism occurring behind a sealer is interesting because it would also align nicely with a notion of distributed computing (each sealer acts as a logical space for computation). So we could have something like `{&par:sealer}` that operates on sealed values of the appropriate type. OTOH, I could just require `{&par}` be paired with `{&seq}` for best performance.
(Parallel computations may benefit from similar techniques, to avoid 'transparent' translation of types and to essentially name the computation.)

### Computations

*Thoughts:* Lightweight parallelism has been a sticky point for me. It complicates memory management and increases the risk of memory fragmentation. Heavy-weight shared-nothing parallelism could reduce synchronization a great deal, but carries a larger overhead for copying data between threads. This could be mitigated by having some shared resources (e.g. binaries, blocks).

I'd benefit from focusing instead on fast alloc/free and fast processing within a context. Fast logical copy may also prove more useful than I initially thought, at least for specific value types (e.g. blocks, maybe binaries and arrays). Though I could presumably achieve that much with reference counting.

An interesting option, perhaps, is to combine a fast bump-pointer alloc and GC (instead of `free`) with scope-limited copying of blocks, etc.

I'd prefer to avoid arbitrary 'force to type' code scattered about. This concern impacts stowage, laziness, and parallelism. If I enforce that stowage, laziness, and parallelism always sit behind a sealer token, that would simplify my runtime code. This alignment may also simplify presentation and metadata, and eventual distribution. So, for now, let's assume I'll be aiming for this feature.

If I support lazy computations, that could potentially simplify many things, such as streaming outputs from block-to-text conversions. Laziness is easy to express via annotation, e.g. `{&lazy}` could tag a block. One difficulty with laziness is that we need either to integrate state or to treat lazy values as linear objects (no copy). The other difficulty is integrating laziness appropriately with effort quotas - i.e. it requires that we shift the quota to our context, somehow, rather than to a larger evaluation.
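
Concretely, a `{&lazy}` thunk might carry its own effort quota and a linearity flag, along the lines of this sketch (all names hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of a {&lazy} thunk. */
typedef struct wk_block wk_block;   /* the suspended block of code */
typedef struct wk_value wk_value;   /* its eventual result */

typedef struct wk_thunk {
    wk_block* code;      /* suspended computation; NULL once forced */
    wk_value* result;    /* cached result once forced */
    uint64_t  quota;     /* effort budget charged to this thunk, not the caller */
    bool      linear;    /* treated as a linear object: copy is rejected */
} wk_thunk;

/* Evaluation of the block spends the thunk's own quota. */
extern wk_value* wk_eval_block(wk_block*, uint64_t* quota_remaining);

static wk_value* wk_force(wk_thunk* t) {
    if (NULL != t->code) {
        t->result = wk_eval_block(t->code, &t->quota);
        t->code   = NULL;   /* stateful update: the thunk now acts as a plain value */
    }
    return t->result;
}
```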
@@ -209,7 +220,7 @@ ABC supports integers. But other numbers could be supported via arithmetic accel

One of the bigger challenges will be supporting evaluation limits, i.e. effort quotas.


## Parallelism


# older content
5 changes: 1 addition & 4 deletions wikilon-runtime/Makefile
@@ -5,7 +5,7 @@ THREADS := -pthread
OPT := -O2 -flto
CFLAGS := $(THREADS) $(OPT) $(W) -fPIC -g -std=gnu11
CC := gcc $(CFLAGS)
OBJECTS := lmdb/mdb.o lmdb/midl.o murmur3/murmur3.o futil.o utf8.o wikrt.o wikrt_mem.o wikrt_db.o
OBJECTS := lmdb/mdb.o lmdb/midl.o murmur3/murmur3.o futil.o utf8.o wikrt.o wikrt_db.o
INSTALL_PREFIX := /usr/local

.PHONY: all lib clean install uninstall runtests
@@ -45,9 +45,6 @@ libwikilon-runtime.so: $(OBJECTS) wikrt.lds
wikrt.o: wikrt.c wikrt.h utf8.h lmdb/lmdb.h
$(CC) -o $@ -c wikrt.c

wikrt_mem.o: wikrt_mem.c wikrt.h
$(CC) -o $@ -c wikrt_mem.c

wikrt_db.o: wikrt_db.c wikrt.h futil.h lmdb/lmdb.h
$(CC) -o $@ -c wikrt_db.c

6 changes: 1 addition & 5 deletions wikilon-runtime/testSuite.c
@@ -686,16 +686,12 @@ void run_tests(wikrt_cx* cx, int* runct, int* passct) {
#define TCX(T) \
{ \
++(*runct); \
wikrt_cx* fork; \
wikrt_cx_fork(cx,&fork); \
assert(NULL != fork); \
bool const pass = T(fork); \
bool const pass = T(cx); \
if(pass) { ++(*passct); } \
else { \
char const* name = #T ; \
fprintf(stderr, errFmt, *runct, name); \
} \
wikrt_cx_destroy(fork); \
}

TCX(test_tcx);
137 changes: 60 additions & 77 deletions wikilon-runtime/wikilon-runtime.h
@@ -34,10 +34,9 @@
* need for external persistence (e.g. no need for true filesystem
* access).
*
* - Parallelism. Modulo space requirements, pure computations behave
* independently of evaluation order. Divide and conquer tactics
* are effective if we can divide into coarse-grained tasks. ABC
* easily supports par/seq parallelism.
* - Parallelism. Pure computations can be parallelized easily because
* their behavior is independent of evaluation order. Though, they
* may fail non-deterministically due to memory consumption.
*
* Wikilon runtime shall support these techniques. I'll also support
* an integrated key-value store for stowage-friendly persistence.
@@ -100,15 +99,10 @@ typedef struct wikrt_env wikrt_env;

/** @brief Opaque structure representing a context for computation.
*
* A wikrt_cx - a wikilon runtime context - represents both a value and
* a space in which computations may be performed. Contexts are single
* threaded (that is, they must be used from a single thread). It is
* possible to communicate between contexts by use of `wikrt_move()`.
*
* A context contains a single value, to which we may apply a stream of
* functions (much like Awelon Bytecode). Values include unit, products,
* sums, integers, blocks of code, sealed values, and optimized encodings
* for texts, binaries, lists, etc.. A new context has the unit value.
* A wikrt_cx holds a single value that may be manipulated by a single
* external thread in order to perform a computation. The manipulations
* may introduce data and process it. The 'value' in question is used
* implicitly as a stack in most cases.
*/
typedef struct wikrt_cx wikrt_cx;

@@ -137,7 +131,7 @@ typedef enum wikrt_err
, WIKRT_BUFFSZ = (1<< 5) // output buffer too small

// Transactions
, WIKRT_CONFLICT = (1<< 6) // transaction state conflict
, WIKRT_CONFLICT = (1<< 6) // any transaction conflict

// Evaluations
, WIKRT_QUOTA_STOP = (1<< 7) // halted on time/effort quota
@@ -147,21 +141,45 @@
/** @brief Translate a single wikrt_err to human text. */
char const* wikrt_strerr(wikrt_err);

/** @brief Open or Create a Wikilon environment.
/** @brief Support a simple consistency check for dynamic library.
*
* The developer specifies a directory and how much space to allocate
* for persistent storage. This space will be used for transactions and
* stowage, and is also allocated within the address space.
*
* It is possible to create an environment without a database by setting
* dirPath to NULL and dbMaxMB to 0. In this case, transactions fail and
* stowage is mostly ignored.
* Compare WIKRT_API_VER to wikrt_api_ver(). If they aren't the same,
* then your app was compiled against a different interface than the
* dynamic lib implements. This is just a simple sanity check.
*/
uint32_t wikrt_api_ver();
#define WIKRT_API_VER 20160328
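/* Example (illustrative usage, not part of the original header; assumes
 * <stdio.h> and <stdlib.h>):
 *
 *   if (WIKRT_API_VER != wikrt_api_ver()) {
 *       fprintf(stderr, "wikilon-runtime: header/library version mismatch\n");
 *       abort();
 *   }
 */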

/** @brief Configuration options for an environment.
*
* Zero or NULL is always an accepted value, resulting in some default or
* disabling the associated feature. All contexts are the same size in
* Wikilon because this simplifies parallelism and eventual flyweight
* allocation features.
*
* wikrt_env_create will attempt to recursively create a directory if it
* does not exist. The most likely error is WIKRT_DBERR if the directory
* cannot be created or opened.
* Wikilon runtime will attempt to create dirPath (including parents) if
* necessary and feasible.
*
* Context size is currently limited to about 4 gigabytes. Wikilon runtime
* uses 32-bit words within a context. Use of stowage is necessary for
* computations that will process more than 4GB at once, the idea being to
* treat memory as a cache for stowed data.
*
* Context address space is currently doubled for semispace memory management.
*/
wikrt_err wikrt_env_create(wikrt_env**, char const* dirPath, uint32_t dbMaxMB);
typedef struct wikrt_env_options
{
char const* dirPath; // where is our persistent database and stowage?
uint32_t dbMaxMB; // how much space (megabytes) for database and stowage?
uint32_t cxMemMB; // how much space (megabytes) for context memory?
uint32_t maxPar; // how many {&par} threads (shared by contexts)?
} wikrt_env_options;

#define WIKRT_CX_MIN_SIZE 4
#define WIKRT_CX_MAX_SIZE 4092

/** @brief Open or Create a Wikilon environment with given options. */
wikrt_err wikrt_env_create(wikrt_env**, wikrt_env_options const*);
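/* Example (illustrative usage sketch; the field values are arbitrary):
 *
 *   wikrt_env_options opts = { 0 };
 *   opts.dirPath = "./wikrt-db";   // created (with parents) where feasible
 *   opts.dbMaxMB = 4096;           // persistent database and stowage
 *   opts.cxMemMB = 64;             // memory size shared by every context
 *   opts.maxPar  = 4;              // {&par} worker threads
 *
 *   wikrt_env* env = NULL;
 *   wikrt_err const st = wikrt_env_create(&env, &opts);
 *   // st carries the wikrt_err flags declared above; wikrt_strerr describes them
 */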

/** @brief Destroy the environment.
*
Expand All @@ -182,37 +200,20 @@ void wikrt_env_sync(wikrt_env*);
*
* This creates a new shared-nothing context in the environment with
* a given size in megabytes. The context initially contains the unit
* value. At the moment, we're limited to contexts between 3 and 4092
* megabytes in size (due to use of 32-bit words under the hood).
*/
wikrt_err wikrt_cx_create(wikrt_env*, wikrt_cx**, uint32_t sizeMB);

#define WIKRT_CX_SIZE_MIN 4
#define WIKRT_CX_SIZE_MAX 4092

/** @brief Lightweight external parallelism.
*
* Use of `wikrt_cx_fork()` creates a lightweight context that shares
* the same memory as its parent. Wikilon runtime doesn't support the
* aliasing of memory, so context values remain separate. But shared
* memory does enable an efficient `wikrt_move()` to communicate data
* between two contexts (i.e. without a deep copy).
*
* Active forks tend each to acquire a few megabytes working memory.
* Be certain to account for this in `wikrt_cx_create()`. Passive forks
* that merely hold or transport data, OTOH, don't require extra space.
* value, but may be loaded with more data via the `wikrt_intro` verbs
* or key-value reads against the implicit database.
*/
wikrt_err wikrt_cx_fork(wikrt_cx*, wikrt_cx**);
wikrt_err wikrt_cx_create(wikrt_env*, wikrt_cx**);

/** @brief Destroy a context and recover memory.
*
* Destroying a context will automatically free the bound values, abort
* a transaction if necessary, and release working memory (thread-local
* free lists). If it's the last context associated with wikrt_cx_create,
* the underlying volume of memory is unmapped and returned to the OS.
*
* A wikrt_cx_fork context will hold onto the underlying shared space.
/* Note: I originally pursued `wikrt_cx_fork()` as a basis for lightweight
* external parallelism. At this time, however, I feel the synchronization
* overheads, memory fragmentation, and other costs are not worthwhile. If
* external parallelism is necessary, we can instead copy values from one
* context to another which isn't cheap but is only paid for at specific
* boundaries.
*/

/** @brief Destroy a context and recover memory. */
void wikrt_cx_destroy(wikrt_cx*);

/** @brief A context knows its parent environment. */
@@ -391,20 +392,8 @@ bool wikrt_valid_token(char const* s);
*
* For the left context, this has type `(a*b)→b`. For the right context,
* this has type `c→(a*c)`. The `a` value is moved from the left context
* to the right context. Move returns WIKRT_INVAL and does nothing if
* left and right contexts are identical.
*
* Between different forks that share the same memory, `wikrt_move()` is
* a non-allocating, non-copying, O(1) operation, guaranteed to succeed
* if the argument has the correct type. Otherwise, the value is copied
* to the right hand context. There is some risk of memory fragmentation
* when values are moved between forks, i.e. due to reduced locality.
*
* Note that the `wikrt_move` call requires exclusive control over both
* contexts. In many cases, it may be useful to create intermediate
* 'messenger' forks to carry data between threads, i.e. such that the
* creator pushes data to the messenger then the receiver takes data
* from the messenger.
* to the right context. Move fails if left and right contexts are the
* same, or if the RHS context isn't large enough.
*/
wikrt_err wikrt_move(wikrt_cx*, wikrt_cx*);

@@ -436,15 +425,9 @@ wikrt_err wikrt_copy(wikrt_cx*, wikrt_ss*);

/** @brief Combined copy and move operation. Fail-safe.
*
* For the left context, this has type (a*b)→(a*b). For the right context
* it has type (c)→(a*c). This corresponds to wikrt_copy followed immediately
* by wikrt_move, albeit in one step. The combination avoids an intermediate
* copy and reduces risk of memory fragmentation from wikrt_move.
*
* This is roughly wikrt_copy followed by wikrt_move, albeit in one step.
* The combined action has some advantages: it avoids an intermediate copy
* between two different memories, and it potentially mitigates memory
* fragmentation between forks (copies to the target's working memory).
* This is equivalent to wikrt_copy followed by wikrt_move in one step.
* It has performance advantages over performing copy and move in separate
* steps.
*/
wikrt_err wikrt_copy_move(wikrt_cx*, wikrt_ss*, wikrt_cx*);

