diff --git a/docs/ABCRT.md b/docs/ABCRT.md
index 0d98f74..6bd49ac 100644
--- a/docs/ABCRT.md
+++ b/docs/ABCRT.md
@@ -36,6 +36,12 @@ I'll actually include a copy of these directly, no shared libs.
 
 With LMDB, I have the opportunity to use a single-process or multi-process access to the database. Multi-process access to a database would be very convenient if I want to later develop shells, command-line interpreters, etc. to access this database in the background. But that isn't a use case for Wikilon. For now I should probably favor single-process access using a lockfile, since it's much easier (with LMDB) to go from single-process to multi-process than vice versa.
 
+If I do favor multi-process access, I'll need to somehow track ephemeral stowage references safely between processes. This is non-trivial.
+
+### Regarding Error Reporting
+
+Originally I was returning an 'error' value for most functions. I'm beginning to think this was a bad idea. First, it overly complicates the client API. Second, most errors are not recoverable. Third, tracking the exact cause of each error is expensive. I may be better served by accumulating errors in a context-specific `errno` or error list, by wrapping error values, or by simply tracking, at the context level, whether an error has occurred.
+
 ## Structure
 
 I need an *environment* bound to an underlying stowage and persistence model, e.g. via LMDB. The environment may also be associated with worker threads for par-seq parallelism. I also need *contexts* that associate an environment with a heap of memory for computations.
@@ -52,15 +58,11 @@ Originally I was going for a more conventional API where values have handles out
 
 ### Memory Management
 
-ABC is well suited for manual memory management due to its explicit copy and drop operators. This couples nicely with a preference for linear structure and 'move' semantics, deep copies favored over aliasing. Linearity allows me to implement many functions (or their common cases) without allocation.
+ABC is well suited for manual memory management due to its explicit copy and drop operators. This couples nicely with a preference for linear structure and 'move' semantics, deep copies favored over aliasing. Linearity allows me to implement many functions (or their common cases) without allocation.
 
-*Thoughts:* I've frequently contemplated a bump-pointer nursery. This is feasible, perhaps, though it might require rewriting data plumbing operations to allocate within the nursery (lest `l` and `r` and the like produce references from the old space to the new space). This would probably complicate code a fair bit... so I'll hold off for now. Without a bump-pointer nursery, I cannot guarantee stack-like locality for 'new' values.
+Bump-pointer allocators, together with two-space collectors, would also serve me very well. Parallelism would be 'heavy' in the sense that it copies computation data, but could also be 'lightweight' in the sense that it only triggers upon compacting GC - i.e. when data is to be copied regardless. Two-space compacting could drive both stowage and quotas, and the extra space would be convenient for stowage without runtime allocations (assuming stowage is no more expensive than a local representation). Stowage would serve as the implicit 'old space' where we no longer copy-collect data.
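+
+An illustrative sketch of this allocate-then-compact pattern (the names here are provisional; they merely mirror the `wikrt.h` draft later in this diff):
+
+```c
+#include <stddef.h>
+
+typedef struct cx {
+    char*  mem;    // active space
+    char*  ssp;    // scratch semispace, target of compacting GC
+    size_t alloc;  // bump pointer into the active space
+    size_t size;   // bytes per space
+} cx;
+
+void cx_compact(cx*); // copy live data mem -> ssp, swap spaces, reset alloc
+
+static inline void* cx_alloc(cx* c, size_t sz) {
+    if((c->size - c->alloc) < sz) {
+        cx_compact(c); // live data is copied regardless; stow cold values here
+        if((c->size - c->alloc) < sz) { return NULL; } // context truly full
+    }
+    void* const r = c->mem + c->alloc;
+    c->alloc += sz;
+    return r;
+}
+```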
 
-For the moment, I use a conventional size-segregated free-list and quick-fit mechanisms. With this, fragmentation is a problem, especially in a multi-threaded scenario. Use of `wikrt_move()` is a fragmentation hazard because it results in data being free'd in a separate thread from the one where it was allocated. When that memory is then reused, local fragmentation worsens. I'll accept high fragmentation so I can move forward with implementation.
-
-Latent destruction of large values is a related, viable feature that could improve cx_destroy performance for large contexts values. OTOH, it limits an easy means to track substructural properties. So I'll probably leave this feature out for now.
-
-At the moment, copy on large stacks and lists and such will tend to use slab-allocations for a large 'spine' value, but we will 'free' the same slab one element at a time. I'm not sure what effect this will have on fragmentation - it might help or hurt.
+A potential difficulty with a moving collector is that I must represent active computation state within the memory region and deal with any changes in pointers after each allocation. I might achieve this by modeling a small set of active registers, i.e. our active computation stack, which could also serve as scratch space when needed.
 
 ## Representations
 
@@ -148,14 +150,23 @@ ABC's discretionary value sealers. I'll optimize for 3-character (or fewer) disc
 
 ### Value Stowage
 
-We can annotate a value to moved to persistent storage, or transparently reverse this.
+We annotate a value to be moved to persistent storage, or transparently reverse this.
 
-*Thoughts*: If stowage is constrained to sealed values, that may simplify implementation code because I only need to handle transparent expansion as part of an 'unseal' operation. Further, this would enable me to present better metadata about sealed values. I believe this is a path worth pursuing or attempting. This is an easy constraint to enforce, and I may proceed with it.
+It might be useful to align stowage with value sealers, i.e. such that stowage may only occur on sealed values. This would simplify processing of stowage because transparent expansion would only need to be handled at an unseal operation. Further, it would provide some typeful metadata around each node - relatively convenient for debugging. Conversely, having implicit stowage upon sealing a value could be convenient for space management purposes.
 
-I may need similar alignment for parallelism or laziness, to avoid true transparent transforms. A simple notion of parallelism occurring behind a sealer is interesting because it would also align nicely with a notion of distributed computing (each sealer acts as a logical space for computation). So we could have something like `{&par:sealer}` that operates on sealed values of the appropriate type. OTOH, I could just require `{&par}` be paired with `{&seq}` for best performance.
+(Parallel computations may benefit from similar techniques, to avoid 'transparent' translation of types and to essentially name the computation.)
 
 ### Computations
 
+*Thoughts:* Lightweight parallelism has been a sticky point for me. It complicates memory management and increases the risk of memory fragmentation. Heavy-weight shared-nothing parallelism could reduce synchronization a great deal, but has a larger overhead to copy data between threads. This could be mitigated by having some shared resources (e.g. binaries, blocks).
+
+I'd benefit from focusing instead on fast alloc/free and fast processing within a context. Fast logical copy may also prove more useful than I was initially thinking, at least for specific value types (e.g. blocks, maybe binaries and arrays). Then again, I could presumably achieve that much with reference counting.
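+
+For instance, a minimal sketch of logical copy via reference counting (illustrative only; `shared`, `share_copy`, and `share_drop` are hypothetical names, not a settled representation):
+
+```c
+#include <stdint.h>
+#include <stdlib.h>
+
+// A shared, refcounted payload: logical copy is O(1); the payload is
+// freed only when the last reference is dropped.
+typedef struct shared { uint32_t refct; /* payload follows */ } shared;
+
+static inline shared* share_copy(shared* s) { s->refct += 1; return s; }
+static inline void share_drop(shared* s) { if(0 == --(s->refct)) { free(s); } }
+```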
+
+An interesting option, perhaps, is to combine a fast bump-pointer alloc and GC (instead of `free`) with scope-limited copying of blocks etc.
+
 I'd prefer to avoid arbitrary 'force to type' code scattered about. This would impact stowage, laziness, parallelism. If I enforce that stowage, laziness, and parallelism are always behind a sealer token, this would simplify my runtime code. This alignment may also simplify presentation and metadata, and eventual distribution. So, for now, let's assume I'll be aiming for this feature.
 
 If I support lazy computations, that can potentially simplify many things such as streaming outputs from block-to-text conversions. Laziness is easy to express via annotation, e.g. `{&lazy}` could tag a block. One difficulty with laziness is that we need either to integrate state or to treat them somehow as linear objects (no copy). The other difficulty is integrating laziness appropriately with effort quotas - i.e. it requires we shift the quota to our context, somehow, rather than to a larger evaluation.
@@ -209,7 +220,7 @@ ABC supports integers. But other numbers could be supported via arithmetic accel
 
 One of the bigger challenges will be supporting evaluation limits, i.e. effort quotas.
 
-
+## Parallelism
 
 # older content
 
diff --git a/wikilon-runtime/Makefile b/wikilon-runtime/Makefile
index bd315b0..f1c2351 100644
--- a/wikilon-runtime/Makefile
+++ b/wikilon-runtime/Makefile
@@ -5,7 +5,7 @@ THREADS := -pthread
 OPT := -O2 -flto
 CFLAGS := $(THREADS) $(OPT) $(W) -fPIC -g -std=gnu11
 CC := gcc $(CFLAGS)
-OBJECTS := lmdb/mdb.o lmdb/midl.o murmur3/murmur3.o futil.o utf8.o wikrt.o wikrt_mem.o wikrt_db.o
+OBJECTS := lmdb/mdb.o lmdb/midl.o murmur3/murmur3.o futil.o utf8.o wikrt.o wikrt_db.o
 INSTALL_PREFIX := /usr/local
 
 .PHONY: all lib clean install uninstall runtests
@@ -45,9 +45,6 @@ libwikilon-runtime.so: $(OBJECTS) wikrt.lds
 wikrt.o: wikrt.c wikrt.h utf8.h lmdb/lmdb.h
 	$(CC) -o $@ -c wikrt.c
 
-wikrt_mem.o: wikrt_mem.c wikrt.h
-	$(CC) -o $@ -c wikrt_mem.c
-
 wikrt_db.o: wikrt_db.c wikrt.h futil.h lmdb/lmdb.h
 	$(CC) -o $@ -c wikrt_db.c
 
diff --git a/wikilon-runtime/testSuite.c b/wikilon-runtime/testSuite.c
index 5db6333..1eebcd9 100644
--- a/wikilon-runtime/testSuite.c
+++ b/wikilon-runtime/testSuite.c
@@ -686,16 +686,12 @@ void run_tests(wikrt_cx* cx, int* runct, int* passct) {
 #define TCX(T)                                      \
     {                                               \
         ++(*runct);                                 \
-        wikrt_cx* fork;                             \
-        wikrt_cx_fork(cx,&fork);                    \
-        assert(NULL != fork);                       \
-        bool const pass = T(fork);                  \
+        bool const pass = T(cx);                    \
         if(pass) { ++(*passct); }                   \
         else {                                      \
             char const* name = #T ;                 \
             fprintf(stderr, errFmt, *runct, name);  \
         }                                           \
-        wikrt_cx_destroy(fork);                     \
     }
 
     TCX(test_tcx);
diff --git a/wikilon-runtime/wikilon-runtime.h b/wikilon-runtime/wikilon-runtime.h
index 37a015c..3c54416 100644
--- a/wikilon-runtime/wikilon-runtime.h
+++ b/wikilon-runtime/wikilon-runtime.h
@@ -34,10 +34,9 @@
  *   need for external persistence (e.g. no need for true filesystem
  *   access).
  *
- *  - Parallelism. Modulo space requirements, pure computations behave
- *    independently of evaluation order. Divide and conquer tactics
- *    are effective if we can divide into coarse-grained tasks. ABC
- *    easily supports par/seq parallelism.
+ *  - Parallelism. Pure computations can be parallelized easily because
+ *    their behavior is independent of evaluation order. They may,
+ *    however, fail non-deterministically due to memory consumption.
 *
 * Wikilon runtime shall support these techniques. I'll also support
 * an integrated key-value store for stowage-friendly persistence.
@@ -100,15 +99,10 @@ typedef struct wikrt_env wikrt_env;
 
 /** @brief Opaque structure representing a context for computation.
  *
- * A wikrt_cx - a wikilon runtime context - represents both a value and
- * a space in which computations may be performed. Contexts are single
- * threaded (that is, they must be used from a single thread). It is
- * possible to communicate between contexts by use of `wikrt_move()`.
- *
- * A context contains a single value, to which we may apply a stream of
- * functions (much like Awelon Bytecode). Values include unit, products,
- * sums, integers, blocks of code, sealed values, and optimized encodings
- * for texts, binaries, lists, etc.. A new context has the unit value.
+ * A wikrt_cx holds a single value that may be manipulated by a single
+ * external thread in order to perform a computation. The manipulations
+ * may introduce data and process it. The 'value' in question is used
+ * implicitly as a stack in most cases.
  */
 typedef struct wikrt_cx wikrt_cx;
 
@@ -137,7 +131,7 @@ typedef enum wikrt_err
 , WIKRT_BUFFSZ     = (1<< 5)  // output buffer too small
 
 // Transactions
-, WIKRT_CONFLICT   = (1<< 6)  // transaction state conflict
+, WIKRT_CONFLICT   = (1<< 6)  // any transaction conflict
 
 // Evaluations
 , WIKRT_QUOTA_STOP = (1<< 7)  // halted on time/effort quota
@@ -147,21 +141,45 @@
 /** @brief Translate a single wikrt_err to human text. */
 char const* wikrt_strerr(wikrt_err);
 
-/** @brief Open or Create a Wikilon environment.
+/** @brief Support a simple consistency check for the dynamic library.
  *
- * The developer specifies a directory and how much space to allocate
- * for persistent storage. This space will be used for transactions and
- * stowage, and is also allocated within the address space.
- *
- * It is possible to create an environment without a database by setting
- * dirPath to NULL and dbMaxMB to 0. In this case, transactions fail and
- * stowage is mostly ignored.
+ * Compare WIKRT_API_VER to wikrt_api_ver(). If they aren't the same,
+ * then your app was compiled against a different interface than the
+ * dynamic lib implements. This is just a simple sanity check.
+ */
+uint32_t wikrt_api_ver();
+#define WIKRT_API_VER 20160328
+
+/** @brief Configuration options for an environment.
+ *
+ * Zero or NULL is always an accepted value, resulting in some default
+ * behavior or disabling the associated feature. All contexts are the
+ * same size in Wikilon because this simplifies parallelism and eventual
+ * flyweight allocation features.
  *
- * wikrt_env_create will attempt to recursively create a directory if it
- * does not exist. The most likely error is WIKRT_DBERR if the directory
- * cannot be created or opened.
+ * Wikilon runtime will attempt to create dirPath (including parents) if
+ * necessary and feasible.
+ *
+ * Context size is currently limited to about 4 gigabytes. Wikilon runtime
+ * uses 32-bit words within a context. Use of stowage is necessary for
+ * computations that will process more than 4GB at once, the idea being to
+ * treat memory as a cache for stowed data.
+ *
+ * Context address space is currently doubled for semispace memory management.
 */
-wikrt_err wikrt_env_create(wikrt_env**, char const* dirPath, uint32_t dbMaxMB);
+typedef struct wikrt_env_options
+{
+    char const* dirPath;   // where is our persistent database and stowage?
+    uint32_t    dbMaxMB;   // how much space (megabytes) for database and stowage?
+    uint32_t    cxMemMB;   // how much space (megabytes) for context memory?
+    uint32_t    maxPar;    // how many {&par} threads (shared by contexts)?
+} wikrt_env_options;
+
+#define WIKRT_CX_MIN_SIZE 4
+#define WIKRT_CX_MAX_SIZE 4092
+
+/** @brief Open or Create a Wikilon environment with given options. */
+wikrt_err wikrt_env_create(wikrt_env**, wikrt_env_options const*);
 
 /** @brief Destroy the environment.
  *
@@ -182,37 +200,20 @@ void wikrt_env_sync(wikrt_env*);
  *
  * This creates a new shared-nothing context in the environment with
  * a given size in megabytes. The context initially contains the unit
- * value. At the moment, we're limited to contexts between 3 and 4092
- * megabytes in size (due to use of 32-bit words under the hood).
- */
-wikrt_err wikrt_cx_create(wikrt_env*, wikrt_cx**, uint32_t sizeMB);
-
-#define WIKRT_CX_SIZE_MIN 4
-#define WIKRT_CX_SIZE_MAX 4092
-
-/** @brief Lightweight external parallelism.
- *
- * Use of `wikrt_cx_fork()` creates a lightweight context that shares
- * the same memory as its parent. Wikilon runtime doesn't support the
- * aliasing of memory, so context values remain separate. But shared
- * memory does enable an efficient `wikrt_move()` to communicate data
- * between two contexts (i.e. without a deep copy).
- *
- * Active forks tend each to acquire a few megabytes working memory.
- * Be certain to account for this in `wikrt_cx_create()`. Passive forks
- * that merely hold or transport data, OTOH, don't require extra space.
+ * value, but may be loaded with more data via the `wikrt_intro` verbs
+ * or key-value reads against the implicit database.
 */
-wikrt_err wikrt_cx_fork(wikrt_cx*, wikrt_cx**);
+wikrt_err wikrt_cx_create(wikrt_env*, wikrt_cx**);
 
-/** @brief Destroy a context and recover memory.
- *
- * Destroying a context will automatically free the bound values, abort
- * a transaction if necessary, and release working memory (thread-local
- * free lists). If it's the last context associated with wikrt_cx_create,
- * the underlying volume of memory is unmapped and returned to the OS.
- *
- * A wikrt_cx_fork context will hold onto the underlying shared space.
+/* Note: I originally pursued `wikrt_cx_fork()` as a basis for lightweight
+ * external parallelism. At this time, however, I feel the synchronization
+ * overheads, memory fragmentation, and other costs are not worthwhile. If
+ * external parallelism is necessary, we can instead copy values from one
+ * context to another, which isn't cheap but is only paid at specific
+ * boundaries.
 */
+
+/** @brief Destroy a context and recover memory. */
 void wikrt_cx_destroy(wikrt_cx*);
 
 /** @brief A context knows its parent environment. */
@@ -391,20 +392,8 @@ bool wikrt_valid_token(char const* s);
  *
  * For the left context, this has type `(a*b)→b`. For the right context,
  * this has type `c→(a*c)`. The `a` value is moved from the left context
- * to the right context. Move returns WIKRT_INVAL and does nothing if
- * left and right contexts are identical.
- *
- * Between different forks that share the same memory, `wikrt_move()` is
- * a non-allocating, non-copying, O(1) operation, guaranteed to succeed
- * if the argument has the correct type. Otherwise, the value is copied
- * to the right hand context. There is some risk of memory fragmentation
- * when values are moved between forks, i.e. due to reduced locality.
- *
- * Note that the `wikrt_move` call requires exclusive control over both
- * contexts. In many cases, it may be useful to create intermediate
- * 'messenger' forks to carry data between threads, i.e. such that the
- * creator pushes data to the messenger then the receiver takes data
- * from the messenger.
+ * to the right context. Move fails if left and right contexts are the
+ * same, or if the RHS context isn't large enough.
  */
 wikrt_err wikrt_move(wikrt_cx*, wikrt_cx*);
 
@@ -436,15 +425,9 @@ wikrt_err wikrt_copy(wikrt_cx*, wikrt_ss*);
 
 /** @brief Combined copy and move operation. Fail-safe.
  *
- * For the left context, this has type (a*b)→(a*b). For the right context
- * it has type (c)→(a*c). This corresponds to wikrt_copy followed immediately
- * by wikrt_move, albeit in one step. The combination avoids an intermediate
- * copy and reduces risk of memory fragmentation from wikrt_move.
- *
- * This is roughly wikrt_copy followed by wikrt_move, albeit in one step.
- * The combined action has some advantages: it avoids an intermediate copy
- * between two different memories, and it potentially mitigates memory
- * fragmentation between forks (copies to the target's working memory).
+ * This is equivalent to wikrt_copy followed by wikrt_move in one step.
+ * It has performance advantages over performing copy and move in separate
+ * steps.
  */
 wikrt_err wikrt_copy_move(wikrt_cx*, wikrt_ss*, wikrt_cx*);
 
diff --git a/wikilon-runtime/wikrt.c b/wikilon-runtime/wikrt.c
index 93398fc..111a9a1 100644
--- a/wikilon-runtime/wikrt.c
+++ b/wikilon-runtime/wikrt.c
@@ -8,7 +8,6 @@
 #include "wikrt.h"
 
 void wikrt_acquire_shared_memory(wikrt_cx* cx, wikrt_sizeb sz);
-static void wikrt_cx_init(wikrt_cxm*, wikrt_cx*);
 
 char const* wikrt_abcd_operators() {
     // currently just pure ABC...
@@ -93,21 +92,34 @@ bool wikrt_valid_token(char const* cstr) {
     return true;
 }
 
-wikrt_err wikrt_env_create(wikrt_env** ppEnv, char const* dirPath, uint32_t dbMaxMB) {
+uint32_t wikrt_api_ver()
+{
+    _Static_assert(WIKRT_API_VER < UINT32_MAX, "bad value for WIKRT_API_VER");
+    return WIKRT_API_VER;
+}
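+
+/* Usage sketch (hypothetical client code; the directory path is
+ * illustrative, and error handling is elided):
+ *
+ *     assert(wikrt_api_ver() == WIKRT_API_VER);
+ *     wikrt_env_options opts = { 0 };
+ *     opts.dirPath = "./wikrt-db"; // persistent database and stowage
+ *     opts.dbMaxMB = 64;           // database size quota
+ *     opts.cxMemMB = 8;            // size shared by every context
+ *     wikrt_env* env = NULL;
+ *     wikrt_cx* cx = NULL;
+ *     if((WIKRT_OK == wikrt_env_create(&env, &opts)) &&
+ *        (WIKRT_OK == wikrt_cx_create(env, &cx))) {
+ *         // ... introduce data and perform computations against cx ...
+ *         wikrt_cx_destroy(cx);
+ *     }
+ *     if(NULL != env) { wikrt_env_destroy(env); }
+ */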
+
+wikrt_err wikrt_env_create(wikrt_env** ppEnv, wikrt_env_options const* opts) {
     _Static_assert(WIKRT_SIZE_MAX >= 4294967295, "minimum 32-bit words (for texts, etc.)");
     _Static_assert(WIKRT_CELLSIZE == WIKRT_CELLBUFF(WIKRT_CELLSIZE), "cell size must be a power of two");
     _Static_assert(WIKRT_PAGESIZE == WIKRT_PAGEBUFF(WIKRT_PAGESIZE), "page size must be a power of two");
+    assert((NULL != ppEnv) && (NULL != opts));
     (*ppEnv) = NULL;
 
+    size_t const cxMemMB = (0 == opts->cxMemMB) ? WIKRT_CX_MIN_SIZE : (size_t)(opts->cxMemMB);
+    bool const okCxSize = ((WIKRT_CX_MIN_SIZE <= cxMemMB) && (cxMemMB <= WIKRT_CX_MAX_SIZE))
+                       && (cxMemMB < (SIZE_MAX >> 21)); // last condition to prevent overflow on 32-bit systems
+    if(!okCxSize) { return WIKRT_INVAL; }
+
     wikrt_env* const e = calloc(1, sizeof(wikrt_env));
-    if(NULL == e) return WIKRT_NOMEM;
+    if(NULL == e) { return WIKRT_NOMEM; }
+    e->cxsize = cxMemMB << 20;
     e->mutex = (pthread_mutex_t) PTHREAD_MUTEX_INITIALIZER;
 
-    if(!dirPath || (0 == dbMaxMB)) {
+    if(!opts->dirPath || (0 == opts->dbMaxMB)) {
         e->db = NULL;
-    } else if(!wikrt_db_init(&(e->db), dirPath, dbMaxMB)) {
+    } else if(!wikrt_db_init(&(e->db), opts->dirPath, opts->dbMaxMB)) {
         free(e);
         return WIKRT_DBERR;
     }
@@ -119,7 +131,7 @@ wikrt_err wikrt_env_create(wikrt_env** ppEnv, wikrt_env_options const* opts) {
 }
 
 void wikrt_env_destroy(wikrt_env* e) {
-    assert(NULL == e->cxmlist);
+    assert(NULL == e->cxlist);
     if(NULL != e->db) {
         wikrt_db_destroy(e->db);
     }
@@ -134,140 +146,81 @@ void wikrt_env_sync(wikrt_env* e) {
     }
 }
 
-static void wikrt_cx_init(wikrt_cxm* cxm, wikrt_cx* cx) {
-    cx->cxm = cxm;
-    cx->memory = cxm->memory;
-    cx->val = WIKRT_UNIT;
-    cx->txn = WIKRT_VOID;
-
-    wikrt_cxm_lock(cxm); {
-        cx->next = cxm->cxlist;
-        if(NULL != cx->next) { cx->next->prev = cx; }
-        cxm->cxlist = cx;
-    } wikrt_cxm_unlock(cxm);
-}
-
-wikrt_err wikrt_cx_create(wikrt_env* e, wikrt_cx** ppCX, uint32_t sizeMB)
+wikrt_err wikrt_cx_create(wikrt_env* e, wikrt_cx** ppCX)
 {
     (*ppCX) = NULL;
-
-    bool const bSizeValid = (WIKRT_CX_SIZE_MIN <= sizeMB)
-                         && (sizeMB <= WIKRT_CX_SIZE_MAX);
-    if(!bSizeValid) return WIKRT_IMPL;
-    wikrt_sizeb const sizeBytes = (wikrt_sizeb) ((1024 * 1024) * sizeMB);
-
-    wikrt_cxm* const cxm = calloc(1,sizeof(wikrt_cxm));
-    wikrt_cx* const cx = calloc(1,sizeof(wikrt_cx));
-    if((NULL == cxm) || (NULL == cx)) { goto callocErr; }
-
+    wikrt_cx* const cx = calloc(1, sizeof(wikrt_cx));
+    if(NULL == cx) { goto callocErr; }
+
     static int const prot = PROT_READ | PROT_WRITE | PROT_EXEC;
     static int const flags = MAP_ANONYMOUS | MAP_PRIVATE;
-    void* const memory = mmap(NULL, sizeBytes, prot, flags, -1, 0);
-    if(NULL == memory) { goto mmapErr; }
-
-    cxm->env = e;
-    cxm->mutex = (pthread_mutex_t) PTHREAD_MUTEX_INITIALIZER;
-    cxm->memory = memory;
-    cxm->size = sizeBytes;
-
-    // insert into environment's list of context roots
+    size_t const twospace_size = 2 * e->cxsize;
+    void* const twospace = mmap(NULL, twospace_size, prot, flags, -1, 0);
+    if(MAP_FAILED == twospace) { goto mmapErr; } // mmap reports failure as MAP_FAILED, not NULL
+
+    cx->env = e;
+    cx->mem = twospace;
+    cx->size = (wikrt_size)(e->cxsize); // must be set before computing ssp
+    cx->ssp = (void*)(cx->size + ((char*)cx->mem));
+    cx->alloc = WIKRT_ALLOC_START;
+    cx->val = WIKRT_UNIT;
+    cx->txn = WIKRT_VOID;
+
+    // track context in list
     wikrt_env_lock(e); {
-        cxm->next = e->cxmlist;
-        if(NULL != cxm->next) { cxm->next->prev = cxm; }
-        e->cxmlist = cxm;
+        cx->cxnext = e->cxlist;
+        if(NULL != cx->cxnext) { cx->cxnext->cxprev = cx; }
+        e->cxlist = cx;
     } wikrt_env_unlock(e);
 
-    // I'll block cell 0 from allocation (it's used for 'unit'.)
-    // For alignment, allocation of first page is delayed. It might
-    // be useful to eventually support page-aligned memory in the cxm.
- wikrt_fl_free(memory, &(cxm->fl), (WIKRT_PAGESIZE - WIKRT_CELLSIZE), WIKRT_CELLSIZE); // first page minus first cell - wikrt_fl_free(memory, &(cxm->fl), (sizeBytes - WIKRT_PAGESIZE), WIKRT_PAGESIZE); // all pages after the first - - // initialize thread-local context - wikrt_cx_init(cxm, cx); - (*ppCX) = cx; return WIKRT_OK; mmapErr: callocErr: - free(cxm); free(cx); return WIKRT_NOMEM; } -wikrt_err wikrt_cx_fork(wikrt_cx* cx, wikrt_cx** pfork) -{ - wikrt_cxm* const cxm = cx->cxm; - (*pfork) = calloc(1, sizeof(wikrt_cx)); - if(NULL == (*pfork)) { return WIKRT_NOMEM; } - wikrt_cx_init(cxm, (*pfork)); - return WIKRT_OK; -} - void wikrt_cx_destroy(wikrt_cx* cx) { - // drop bound values to recover memory - wikrt_drop_v(cx, cx->val, NULL); cx->val = WIKRT_VOID; - wikrt_txn_abort(cx); + // remove context from environment. + wikrt_env* const e = cx->env; + wikrt_env_lock(e); { + if(NULL != cx->cxnext) { cx->cxnext->cxprev = cx->cxprev; } + if(NULL != cx->cxprev) { cx->cxprev->cxnext = cx->cxnext; } + else { assert(cx == e->cxlist); e->cxlist = cx->cxnext; } + } wikrt_env_unlock(e); - // remove cx from cxm - wikrt_cxm* const cxm = cx->cxm; - wikrt_cxm_lock(cxm); { - wikrt_fl_merge(cx->memory, &(cx->fl), &(cxm->fl)); - if(NULL != cx->next) { cx->next->prev = cx->prev; } - if(NULL != cx->prev) { cx->prev->next = cx->next; } - else { assert(cx == cxm->cxlist); cxm->cxlist = cx->next; } - } wikrt_cxm_unlock(cxm); - - free(cx); + // TODO: consider flyweight allocator for a few contexts. - if(NULL == cxm->cxlist) { - // last context for this memory destroyed. - // remove context memory from environment - wikrt_env* const e = cxm->env; - wikrt_env_lock(e); { - if(NULL != cxm->next) { cxm->next->prev = cxm->prev; } - if(NULL != cxm->prev) { cxm->prev->next = cxm->next; } - else { assert(cxm == e->cxmlist); e->cxmlist = cxm->next; } - } wikrt_env_unlock(e); - - // release memory back to operating system - errno = 0; - int const unmapStatus = munmap(cxm->memory, cxm->size); - bool const unmapSucceeded = (0 == unmapStatus); - if(!unmapSucceeded) { - fprintf(stderr,"Failure to unmap memory (%s) when destroying context.\n", strerror(errno)); - abort(); // this is some sort of OS failure. - } - pthread_mutex_destroy(&(cxm->mutex)); - free(cxm); + // free the memory-mapped twospace + errno = 0; + void* const twospace = (cx->mem < cx->ssp) ? cx->mem : cx->ssp; + size_t const twospace_size = 2 * e->cxsize; + int const unmapStatus = munmap(twospace, twospace_size); + if(0 != unmapStatus) { + fprintf(stderr,"Failure to unmap memory (%s) when destroying context.\n", strerror(errno)); + abort(); // this is some sort of OS failure. } + + // recover memory from context structure. + free(cx); } wikrt_env* wikrt_cx_env(wikrt_cx* cx) { - return cx->cxm->env; + return cx->env; } wikrt_err wikrt_move(wikrt_cx* const lcx, wikrt_cx* const rcx) { - if(lcx == rcx) { return WIKRT_INVAL; } - - if(lcx->cxm == rcx->cxm) { - // local move between forks, non-allocating. - wikrt_val const v = lcx->val; - if(!wikrt_p(v)) { return WIKRT_TYPE_ERROR; } - wikrt_val* const pv = wikrt_pval(lcx, v); - lcx->val = pv[1]; - pv[1] = rcx->val; - rcx->val = v; - return WIKRT_OK; - } else { - // fail-safe move via copy to rhs then drop from lhs. - wikrt_err const st = wikrt_copy_move(lcx, NULL, rcx); - if(WIKRT_OK == st) { wikrt_drop(lcx, NULL); } - return st; - } + // this function could feasibly be optimized, especially if + // I later restore the wikrt_cx_fork() feature. 
But for now, + // we'll simply deep copy the value then drop the original + // if the copy succeeds. + wikrt_err const st = wikrt_copy_move(lcx, NULL, rcx); + if(WIKRT_OK == st) { wikrt_drop(lcx, NULL); } + return st; } wikrt_err wikrt_copy(wikrt_cx* cx, wikrt_ss* ss) @@ -1948,6 +1901,7 @@ wikrt_err wikrt_unwrap_sum_v(wikrt_cx* cx, bool* inRight, wikrt_val* v) return WIKRT_OK; } else { // Note: I reserve one cell for fail-safe data plumbing. + // This ensures I can rebuild the sum type after expansion. wikrt_addr reserved_cell; if(!wikrt_alloc(cx, WIKRT_CELLSIZE, &reserved_cell)) { return WIKRT_CXFULL; } wikrt_err const st = wikrt_expand_sum_v(cx, v); diff --git a/wikilon-runtime/wikrt.h b/wikilon-runtime/wikrt.h index 276dc75..94859f5 100644 --- a/wikilon-runtime/wikrt.h +++ b/wikilon-runtime/wikrt.h @@ -69,14 +69,10 @@ typedef enum wikrt_opcode_ext #define WIKRT_PAGESIZE (1 << 17) #define WIKRT_PAGEBUFF(sz) WIKRT_LNBUFF_POW2(sz, WIKRT_PAGESIZE) -// free list management -#define WIKRT_FLCT_QF 16 // quick-fit lists (sep by cell size) -#define WIKRT_FLCT_FF 10 // first-fit lists (exponential) -#define WIKRT_FLCT (WIKRT_FLCT_QF + WIKRT_FLCT_FF) -#define WIKRT_QFSIZE (WIKRT_FLCT_QF * WIKRT_CELLSIZE) -#define WIKRT_FFMAX (WIKRT_QFSIZE * (1 << (WIKRT_FLCT_FF - 1))) -#define WIKRT_QFCLASS(sz) ((sz - 1) / WIKRT_CELLSIZE) -#define WIKRT_FREE_THRESH (1 << 21) +// I'll reserve one cell at the start for error capture on CXFULL. +// The zero address is also reserved for unit and void values. +#define WIKRT_ALLOC_START (2 * WIKRT_CELLSIZE) + // for lockfile, LMDB file #define WIKRT_FILE_MODE (mode_t)(S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP) @@ -375,9 +371,9 @@ void wikrt_db_flush(wikrt_db*); struct wikrt_env { wikrt_db *db; - wikrt_cxm *cxmlist; // linked list of context roots - pthread_mutex_t mutex; // shared mutex for environment - uint64_t cxm_created; // stat: wikrt_cx_create() count. + wikrt_cx *cxlist; // linked list of context roots + size_t cxsize; // size for each new context + pthread_mutex_t mutex; // shared mutex for environment }; static inline void wikrt_env_lock(wikrt_env* e) { @@ -385,149 +381,83 @@ static inline void wikrt_env_lock(wikrt_env* e) { static inline void wikrt_env_unlock(wikrt_env* e) { pthread_mutex_unlock(&(e->mutex)); } -/** wikrt size class index, should be in 0..(WIKRT_FLCT-1) */ -typedef int wikrt_sc; - -/** A circular free list of (size, next) pairs. - * - * Note that the head element of a circular singly-linked free list is - * also the last one allocated (since we cannot update its prior ref). - * It's closer in nature to a tail pointer. - */ -typedef wikrt_addr wikrt_flst; - -/** @brief Size-segregated free lists. - * - * Each of these 'free lists' is a *circular* free-list of (size, next) - * pairs, or empty (address 0). The circular structure is intended to - * simplify splicing of lists while reducing header memory overhead. - * - * Todo: minimize fragmentation between contexts. +/** wikrt_cx internal state. * - * For now, I'll move forward with the current implementation. 
- */ -typedef struct wikrt_fl { - wikrt_size free_bytes; - wikrt_size frag_count; - wikrt_flst size_class[WIKRT_FLCT]; -} wikrt_fl; - -bool wikrt_fl_alloc(void* mem, wikrt_fl*, wikrt_sizeb, wikrt_addr*); -void wikrt_fl_free(void* mem, wikrt_fl*, wikrt_sizeb, wikrt_addr); // -void wikrt_fl_coalesce(void* mem, wikrt_fl*); // combines adjacent free blocks -void wikrt_fl_merge(void* mem, wikrt_fl* src, wikrt_fl* dst); // moves free blocks from src to dst - -/** Shared state for multi-threaded contexts. */ -struct wikrt_cxm { - // doubly-linked list of contexts for env. - wikrt_cxm *next; - wikrt_cxm *prev; - - // for now, keeping a list of associated contexts - wikrt_cx *cxlist; - - // lock for shared state within context - pthread_mutex_t mutex; - - // shared environment for multiple contexts. + * I've decided to try a bump-pointer allocation with a semi-space + * collection process. + */ +struct wikrt_cx { + // doubly-linked list of contexts in environment. + wikrt_cx *cxnext; + wikrt_cx *cxprev; wikrt_env *env; - // primary context memory - wikrt_size size; - void *memory; + // maybe add a mutex, if necessary - // root free-list, shared between threads - wikrt_fl fl; -}; + // Memory + void* mem; // active memory + wikrt_addr alloc; // bump-pointer allocation + wikrt_size size; // size of memory -static inline void wikrt_cxm_lock(wikrt_cxm* cxm) { - pthread_mutex_lock(&(cxm->mutex)); } -static inline void wikrt_cxm_unlock(wikrt_cxm* cxm) { - pthread_mutex_unlock(&(cxm->mutex)); } + wikrt_val val; // primary value + wikrt_val txn; // transaction data -/* The 'wikrt_cx' is effectively the thread-local storage for - * wikilon runtime computations. It's assumed this is used from - * only one thread. - * - * Todo: - * latent destruction of items - * - */ -struct wikrt_cx { - wikrt_cx *next; // sibling context - wikrt_cx *prev; // sibling context - wikrt_cxm *cxm; // shared memory structures - - wikrt_val txn; // context's transaction - wikrt_val val; // context's held value + // semispace garbage collection. + void* ssp; // for GC, scratch + wikrt_size compaction_size; // memory after compaction + wikrt_size compaction_count; // count of compactions - void *memory; // main memory - wikrt_fl fl; // local free space - - // statistics and metrics, supports quotas and heuristics - uint64_t ct_bytes_freed; // bytes freed - uint64_t ct_bytes_alloc; // bytes allocated - uint64_t fragmentation; // fragments added to cxm + // Other... maybe move error tracking here? }; + static inline wikrt_val* wikrt_paddr(wikrt_cx* cx, wikrt_addr addr) { - return (wikrt_val*)(addr + ((char*)(cx->memory))); + return (wikrt_val*)(addr + ((char*)(cx->mem))); } static inline wikrt_val* wikrt_pval(wikrt_cx* cx, wikrt_val v) { return wikrt_paddr(cx, wikrt_vaddr(v)); } -bool wikrt_alloc(wikrt_cx*, wikrt_size, wikrt_addr*); -void wikrt_free(wikrt_cx*, wikrt_size, wikrt_addr); -void wikrt_release_mem(wikrt_cx*); -bool wikrt_realloc(wikrt_cx*, wikrt_size, wikrt_addr*, wikrt_size); +/* NOTE: Because I'm using a moving GC, I need to be careful about + * how I represent and process allocations. Any allocation I wish + * to preserve must be represented in the root set. When copying an + * array or similar, I'll need to be sure that all the contained + * values are copied upon moving them. + * + * Despite being a semi-space collector, I still use linear values + * and mutation in place where feasible. 
So locality shouldn't be
+ * a huge problem for carefully designed code.
+ */
 
-// Allocate a cell value tagged with WIKRT_O, WIKRT_P, WIKRT_PL, or WIKRT_PR
-// note that 'dst' is only modified on success. Some code depends on this.
-static inline bool wikrt_alloc_cellval(wikrt_cx* cx, wikrt_val* dst,
-    wikrt_tag tag, wikrt_val v0, wikrt_val v1)
-{
-    wikrt_addr addr;
-    if(!wikrt_alloc(cx, WIKRT_CELLSIZE, &addr)) {
-        return false;
-    }
-    (*dst) = wikrt_tag_addr(tag, addr);
-    wikrt_val* const pv = wikrt_paddr(cx, addr);
-    pv[0] = v0;
-    pv[1] = v1;
-    return true;
+// copy from mem to ssp. swap ssp to mem.
+void wikrt_mem_compact(wikrt_cx*);
+static inline bool wikrt_mem_available(wikrt_cx* cx, wikrt_size sz) {
+    return ((cx->size - cx->alloc) >= sz); }
+static inline bool wikrt_mem_reserve(wikrt_cx* cx, wikrt_sizeb sz) {
+    if(wikrt_mem_available(cx,sz)) { return true; }
+    wikrt_mem_compact(cx);
+    return wikrt_mem_available(cx,sz);
 }
 
-// add a value to the main context stack
-static inline wikrt_err wikrt_intro(wikrt_cx* cx, wikrt_val v)
+// Allocate a given amount of space, assuming enough space is reserved.
+// There is no risk of reorganizing data. But there is risk of overflow.
+static inline wikrt_addr wikrt_alloc_unsafe(wikrt_cx* cx, wikrt_sizeb sz)
 {
-    if(!wikrt_alloc_cellval(cx, &(cx->val), WIKRT_P, v, cx->val)) {
-        wikrt_drop_v(cx, v, NULL);
-        return WIKRT_CXFULL;
-    }
-    return WIKRT_OK;
+    wikrt_addr result = cx->alloc;
+    cx->alloc += sz;
+    return result;
 }
 
-// Allocate a double cell tagged WIKRT_O.
-// note that 'dst' is only modified on success. Some code depends on this.
-static inline bool wikrt_alloc_dcellval(wikrt_cx* cx, wikrt_val* dst,
-    wikrt_val v0, wikrt_val v1, wikrt_val v2, wikrt_val v3)
+// Note: need to ensure that each allocation becomes a proper value on our
+// stack, strictly before the next allocation.
+static inline bool wikrt_alloc(wikrt_cx* cx, wikrt_size sz, wikrt_addr* addr)
 {
-    wikrt_addr addr;
-    if(!wikrt_alloc(cx, (2 * WIKRT_CELLSIZE), &addr)) {
-        return false;
-    }
-    (*dst) = wikrt_tag_addr(WIKRT_O, addr);
-    wikrt_val* const pv = wikrt_paddr(cx, addr);
-    pv[0] = v0;
-    pv[1] = v1;
-    pv[2] = v2;
-    pv[3] = v3;
+    sz = WIKRT_CELLBUFF(sz);
+    if(!wikrt_mem_reserve(cx, sz)) { return false; }
+    (*addr) = wikrt_alloc_unsafe(cx, sz);
     return true;
 }
-
-
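+// Usage sketch (hypothetical): reserve the full size up front, then bump.
+// A reserve may compact and move data, so roots (cx->val, cx->txn) must be
+// consistent before reserving, and addresses taken earlier become invalid.
+//
+//     if(wikrt_mem_reserve(cx, WIKRT_CELLSIZE)) {
+//         wikrt_addr const cell = wikrt_alloc_unsafe(cx, WIKRT_CELLSIZE);
+//         wikrt_val* const pv = wikrt_paddr(cx, cell);
+//         pv[0] = WIKRT_UNIT; pv[1] = cx->val;  // build the value immediately
+//         cx->val = wikrt_tag_addr(WIKRT_P, cell);
+//     }
 
 /* Recognize values represented entirely in the reference. 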
*/ static inline bool wikrt_copy_shallow(wikrt_val const v) { return (wikrt_i(v) || (0 == wikrt_vaddr(v))); diff --git a/wikilon-runtime/wikrt.lds b/wikilon-runtime/wikrt.lds index de280d6..0d53837 100644 --- a/wikilon-runtime/wikrt.lds +++ b/wikilon-runtime/wikrt.lds @@ -6,7 +6,6 @@ wikrt_env_sync; wikrt_cx_create; - wikrt_cx_fork; wikrt_cx_destroy; wikrt_cx_env; @@ -14,11 +13,11 @@ wikrt_abcd_expansion; wikrt_strerr; wikrt_valid_token; + wikrt_api_ver; wikrt_copy; + wikrt_copy_remote; wikrt_drop; - wikrt_move; - wikrt_copy_move; wikrt_stow; wikrt_wswap; diff --git a/wikilon-runtime/wikrt_mem.c b/wikilon-runtime/wikrt_mem.c deleted file mode 100644 index 5783b29..0000000 --- a/wikilon-runtime/wikrt_mem.c +++ /dev/null @@ -1,484 +0,0 @@ - -#include "wikrt.h" -#include -#include -//#include - -typedef struct wikrt_fb { - wikrt_size size; - wikrt_flst next; -} wikrt_fb; - -static inline wikrt_fb* wikrt_pfb(void* mem, wikrt_flst addr) { - return (wikrt_fb*) (addr + (char*)mem); -} - - -// basic strategies without fallback resources -static bool wikrt_fl_alloc_ff(void* mem, wikrt_fl* fl, wikrt_sizeb sz, wikrt_addr* addr); -static wikrt_sc wikrt_size_class_ff(wikrt_size const sz); - -static inline wikrt_sc wikrt_size_class(wikrt_size const sz) { - return (sz <= WIKRT_QFSIZE) - ? (wikrt_sc) WIKRT_QFCLASS(sz) - : wikrt_size_class_ff(sz); -} - -static wikrt_sc wikrt_size_class_ff(wikrt_size const sz) { - _Static_assert((WIKRT_CELLSIZE == sizeof(wikrt_fb)), "invalid free-block size"); - _Static_assert((WIKRT_FLCT_FF > 0), "code assumes WIKRT_FLCT_FF > 0"); - wikrt_sc sc = (WIKRT_FLCT - 1); - wikrt_size szt = WIKRT_FFMAX; - while(szt >= sz) { - szt = szt >> 1; - sc = sc - 1; - } - return sc; -} - -static inline wikrt_flst wikrt_flst_singleton(void* mem, wikrt_sizeb sz, wikrt_addr a) -{ - wikrt_fb* const pa = wikrt_pfb(mem, a); - pa->size = sz; - pa->next = a; - return a; -} - -/* merge two circular free-lists in constant time. - * - * The resulting list will allocate the `a` elements before the `b` elements. - */ -static inline wikrt_flst wikrt_flst_join(void* mem, wikrt_flst a, wikrt_flst b) -{ - if(0 == a) { return b; } - if(0 == b) { return a; } - wikrt_fb* const pa = wikrt_pfb(mem, a); - wikrt_fb* const pb = wikrt_pfb(mem, b); - wikrt_flst const hd = pa->next; - pa->next = pb->next; - pb->next = hd; - return b; -} - -/* Add to head of free-list. No coalescing adjacent memory. */ -void wikrt_fl_free(void* mem, wikrt_fl* fl, wikrt_sizeb sz, wikrt_addr addr) -{ - fl->free_bytes += sz; - fl->frag_count += 1; - wikrt_addr* const l = fl->size_class + wikrt_size_class(sz); - (*l) = wikrt_flst_join(mem, wikrt_flst_singleton(mem, sz, addr), (*l)); -} - - -/* For small allocations, we'll simply double the allocation if we couldn't - * find an exact match. This should reduce fragmentation. Large allocations - * will use first-fit. We'll always fall back on first-fit for completeness. - * - * Note: We do not coalesce automatically. Any decision to coalesce memory - * will be deferred to our caller. 
- */ -bool wikrt_fl_alloc(void* mem, wikrt_fl* fl, wikrt_sizeb sz, wikrt_addr* addr) -{ - _Static_assert((WIKRT_CELLSIZE == sizeof(wikrt_fb)), "free-block should match minimum allocation"); - if(sz <= WIKRT_QFSIZE) { - wikrt_flst* const l = fl->size_class + WIKRT_QFCLASS(sz); - if(0 != (*l)) { - wikrt_fb* const hd = wikrt_pfb(mem, (*l)); - (*addr) = hd->next; - if((*addr) == (*l)) { (*l) = 0; } - else { hd->next = wikrt_pfb(mem, (*addr))->next; } - fl->frag_count -= 1; - fl->free_bytes -= sz; - return true; - } else if(wikrt_fl_alloc(mem, fl, (sz << 1), addr)) { - // double sized allocation, then free latter half. - // we cannot assume l empty (first-fit leftovers) - wikrt_fl_free(mem, fl, sz, sz + (*addr)); - return true; - } - } - // fallback on first-fit - return wikrt_fl_alloc_ff(mem, fl, sz, addr); -} - -/* allocate using a first-fit strategy. - * - * This will also rotate our free list such that we always match the - * first item in the list. This rotation isn't necessarily optimal, - * but it also shouldn't hurt much due to segregation of free lists. - */ -static bool wikrt_fl_alloc_ff(void* mem, wikrt_fl* fl, wikrt_sizeb sz, wikrt_addr* addr) { - wikrt_sc sc = wikrt_size_class(sz); - do { - wikrt_flst* const l = fl->size_class + (sc++); // increment included - wikrt_flst const l0 = (*l); - if(0 == l0) { continue; } - wikrt_fb* pl = wikrt_pfb(mem, l0); - do { - wikrt_flst const a = pl->next; - wikrt_fb* const pa = wikrt_pfb(mem, a); - if(pa->size >= sz) { - (*addr) = a; - if(pa == pl) { (*l) = 0; } - else { pl->next = pa->next; } - fl->free_bytes -= pa->size; - fl->frag_count -= 1; - if(pa->size > sz) { - wikrt_fl_free(mem, fl, (pa->size - sz), (a + sz)); - } - return true; - } else { (*l) = a; pl = pa; } - } while(l0 != (*l)); - } while(sc < WIKRT_FLCT); - return false; -} - -// Optimization: Is growing allocations in place worthwhile? -// -// It's a bunch of duplicate code. It only triggers in rare cases, -// which damages robustness. It probably won't work nicely with -// rapid allocations, small increases, or multi-threading. If we -// cannot easily predict or control an optimization, it seems an -// unnecessary source of frustration... -// -// Resolution: leave it out. - -// join segregated free-lists into a single large free-list. -static inline wikrt_flst wikrt_fl_flatten(void* const mem, wikrt_fl* const fl) { - wikrt_flst r = 0; - for(wikrt_sc sc = 0; sc < WIKRT_FLCT; ++sc) { - r = wikrt_flst_join(mem, r, fl->size_class[sc]); - } - return r; -} - -// break a circular free-list into a non-circular linked list. -static inline wikrt_addr wikrt_flst_open(void* mem, wikrt_flst a) { - if(0 == a) { return 0; } // empty list - wikrt_fb* const pa = wikrt_pfb(mem, a); - wikrt_addr const hd = pa->next; - pa->next = 0; - return hd; -} - -static void wikrt_fl_split(void* const mem, wikrt_addr const hd, wikrt_addr* const a, - wikrt_size const sza, wikrt_addr* const b) -{ - // I assume sza is valid (no larger than the list) - (*a) = hd; - wikrt_addr* tl = a; - wikrt_size ct = 0; - while(ct++ != sza) { - tl = &(wikrt_pfb(mem, (*tl))->next); - } - // at this point 'tl' points to the location of the split. - (*b) = (*tl); // split remainder of list into 'b'. - (*tl) = 0; // 'a' now terminates where 'b' starts. -} - -/* Sort free nodes by ascending address, using an in-place mergesort. 
*/ -static void wikrt_fl_mergesort(void* const mem, wikrt_addr* hd, wikrt_size const count) -{ - // base case: list of size zero or one is fully sorted - if(count < 2) { return; } - - wikrt_size const sza = count / 2; - wikrt_size const szb = count - sza; - wikrt_addr a, b; - - // split list in two and sort each half - wikrt_fl_split(mem, (*hd), &a, sza, &b); - wikrt_fl_mergesort(mem, &a, sza); - wikrt_fl_mergesort(mem, &b, szb); - - // merge sublists 'a' and 'b'. - wikrt_addr* tl = hd; - do { - if(a < b) { - (*tl) = a; - tl = &(wikrt_pfb(mem, a)->next); - a = (*tl); - if(0 == a) { (*tl) = b; return; } - } else { - (*tl) = b; - tl = &(wikrt_pfb(mem, b)->next); - b = (*tl); - if(0 == b) { (*tl) = a; return; } - } - } while(true); -} - -/* Combine all adjacent free addresses. - * - * This is O(N*lg(N)) with the number of fragments. It uses an - * in-place linked list merge sort of the fragments. - */ -void wikrt_fl_coalesce(void* mem, wikrt_fl* fl) -{ - wikrt_size const frag_count_init = fl->frag_count; - wikrt_size const free_bytes_init = fl->free_bytes; - - // obtain the coalesced, address-sorted list of free nodes. - wikrt_addr a = wikrt_flst_open(mem, wikrt_fl_flatten(mem, fl)); - wikrt_fl_mergesort(mem, &a, frag_count_init); - - // reset the free list. - (*fl) = (wikrt_fl){0}; - - // coalesce adjacent nodes and add back to our free-lists. - while(0 != a) { - wikrt_fb* const pa = wikrt_pfb(mem, a); - - // coalesce adjacent nodes - while((a + pa->size) == pa->next) { - wikrt_fb* const pn = wikrt_pfb(mem, pa->next); - pa->size += pn->size; - pa->next = pn->next; - } - wikrt_addr const a_next = pa->next; - pa->next = a; // close singleton flst (circular) - - // recompute free list stats - fl->free_bytes += pa->size; - fl->frag_count += 1; - - wikrt_sc const sc = wikrt_size_class(pa->size); - wikrt_flst* const l = fl->size_class + sc; - - // update free-list, adding new node to end of list. - (*l) = wikrt_flst_join(mem, (*l), a); - a = a_next; - } - - // weak validation - assert( (free_bytes_init == fl->free_bytes) && - (frag_count_init >= fl->frag_count) ); - -} - -/** Combine two free-lists, moving nodes from 'src' into 'dst'. - * - * Performed in constant time via circular linked list joins. I'll favor - * nodes from `dst` before nodes from `src` in the result, though this is - * entirely arbitrary. - * - * The `src` list is zeroed during this process. - */ -void wikrt_fl_merge(void* const mem, wikrt_fl* const src, wikrt_fl* const dst) -{ - assert(src != dst); - // a merge in approximately constant time - dst->free_bytes += src->free_bytes; - dst->frag_count += src->frag_count; - src->free_bytes = 0; - src->frag_count = 0; - for(wikrt_sc sc = 0; sc < WIKRT_FLCT; ++sc) { - wikrt_flst* const lsrc = src->size_class + sc; - wikrt_flst* const ldst = dst->size_class + sc; - (*ldst) = wikrt_flst_join(mem, (*ldst), (*lsrc)); - (*lsrc) = 0; - } -} - -#if 0 -static void wikrt_fl_print(FILE* out, void* mem, wikrt_fl* fl) -{ - fprintf(out, "wikrt_fl: frags=%d, bytes=%d\n", (int)fl->frag_count, (int)fl->free_bytes); - for(wikrt_sc sc = 0; sc < WIKRT_FLCT; ++sc) { - wikrt_flst const l0 = fl->size_class[sc]; - if(0 == l0) { continue; } - fprintf(out, "\t[%d]: ", (int)sc); - wikrt_flst iter = l0; - do { - iter = wikrt_pfb(mem, iter)->next; - fprintf(out, " %d(%d)", (int)iter, (int)wikrt_pfb(mem, iter)->size); - } while(iter != l0); - fprintf(out, "\n"); - } -} -#endif - - -/** attempt to acquire a quantity of space, accepting fragmented memory. 
- * - * On success, this may overshoot the request by most of one fragment. So - * it shouldn't be unless we know there is no single fragment large enough - * to fulfill the request. This favors larger fragments. - */ -static bool wikrt_move_frags(void* mem, wikrt_fl* src, wikrt_fl* dst, wikrt_size amt) -{ - assert(src != dst); - wikrt_sc sc = WIKRT_FLCT; - - while(sc-- > 0) { - wikrt_flst* const s = src->size_class + sc; - wikrt_flst* const d = dst->size_class + sc; - while(0 != (*s)) { - wikrt_fb* const ps = wikrt_pfb(mem, (*s)); - wikrt_flst const f = ps->next; - wikrt_fb* const pf = wikrt_pfb(mem, f); - wikrt_size const sz = pf->size; - - // remove fragment from src - if(ps == pf) { (*s) = 0; } - else { ps->next = pf->next; } - - // add fragment to end of dst list - pf->next = f; - (*d) = wikrt_flst_join(mem, (*d), f); - - // track changes in statistics. - dst->free_bytes += sz; - dst->frag_count += 1; - src->free_bytes -= sz; - src->frag_count -= 1; - - // manage allocation goals - if(sz >= amt) { return true; } - else { amt -= sz; } - } - } - return false; -} - -static inline bool wikrt_acquire_shm(wikrt_cx* cx, wikrt_sizeb sz) -{ - // assuming cxm lock is held - wikrt_addr block; - if(wikrt_fl_alloc(cx->memory, &(cx->cxm->fl), sz, &block)) { - wikrt_fl_free(cx->memory, &(cx->fl), sz, block); - return true; - } - return false; -} - -static void wikrt_acquire_shared_memory(wikrt_cx* cx, wikrt_size sz) -{ - - // I want a simple, predictable heuristic strategy that is very - // fast for smaller computations (the majority of Wikilon ops). - // - // Current approach: - // - // - allocate a single slab directly, if feasible. - // - otherwise: merge, coalesce, retry once. - // - final fallback: accept fragmented memory. - // - // The final case is near to 'thrashing'. Before we get this far, - // we should probably also try to recover memory (e.g. stowage). - // - wikrt_cxm* const cxm = cx->cxm; - void* const mem = cx->memory; - wikrt_cxm_lock(cxm); { - if(!wikrt_acquire_shm(cx, sz)) { - wikrt_size const f0 = cxm->fl.frag_count; - wikrt_fl_merge(mem, &(cx->fl), &(cxm->fl)); - wikrt_fl_coalesce(mem, &(cxm->fl)); - wikrt_size const ff = cxm->fl.frag_count; - - // track fragmentation of memory - cx->fragmentation += ((ff < f0) ? 0 : (ff - f0)); - - if(!wikrt_acquire_shm(cx, sz)) { - wikrt_move_frags(mem, &(cxm->fl), &(cx->fl), sz); - } - } - } wikrt_cxm_unlock(cxm); -} - -static inline bool wikrt_alloc_local(wikrt_cx* cx, wikrt_sizeb sz, wikrt_addr* addr) -{ - if(wikrt_fl_alloc(cx->memory, &(cx->fl), sz, addr)) { - cx->ct_bytes_alloc += sz; - return true; - } - return false; -} - -/** Currently using a simple allocation strategy. - * - * I'll try to allocate locally, if feasible. Coalescing locally could - * help in cases like copying stacks and vectors, where I perform slab - * allocations, and otherwise doesn't hurt much (because our local free - * list has no more than WIKRT_FREE_THRESH bytes.) - * - * If that doesn't work, we'll try to allocate from the shared space. - */ -bool wikrt_alloc(wikrt_cx* cx, wikrt_size sz, wikrt_addr* addr) -{ - sz = WIKRT_CELLBUFF(sz); - - // allocate locally if feasible. - if(cx->fl.free_bytes >= sz) { - if(wikrt_alloc_local(cx, sz, addr)) { return true; } - - // coalesce and retry - wikrt_fl_coalesce(cx->memory, &(cx->fl)); - if(wikrt_alloc_local(cx, sz, addr)) { return true; } - } - - // otherwise try to use external memory resources. - wikrt_acquire_shared_memory(cx, WIKRT_PAGEBUFF(sz)); - return wikrt_alloc_local(cx, sz, addr); -} - -/** Free locally. 
If we overflow, dump everything. */ -void wikrt_free(wikrt_cx* cx, wikrt_size sz, wikrt_addr addr) -{ - sz = WIKRT_CELLBUFF(sz); - cx->ct_bytes_freed += sz; - wikrt_fl_free(cx->memory, &(cx->fl), sz, addr); - - // Don't allow a context to 'own' too much unused space. - if(WIKRT_FREE_THRESH < cx->fl.free_bytes) { - wikrt_release_mem(cx); - } - - // thoughts: it might be wortwhile to free larger blocks of memory - // directly to our shared space. But I'm not sure what impact this - // would have on fragmentation. -} - -/* Release working memory back to the root. */ -void wikrt_release_mem(wikrt_cx* cx) -{ - // Estimate exposed memory fragmentation. - wikrt_fl_coalesce(cx->memory, &(cx->fl)); - cx->fragmentation += cx->fl.frag_count; - - // Release memory fragments to the commons. - wikrt_cxm* const cxm = cx->cxm; - wikrt_cxm_lock(cxm); { - wikrt_fl_merge(cxm->memory, &(cx->fl), &(cxm->fl)); - } wikrt_cxm_unlock(cxm); -} - -bool wikrt_realloc(wikrt_cx* cx, wikrt_size sz0, wikrt_addr* addr, wikrt_size szf) -{ - sz0 = WIKRT_CELLBUFF(sz0); - szf = WIKRT_CELLBUFF(szf); - if(sz0 == szf) { - // no buffered size change - return true; - } else if(szf < sz0) { - // free up a little space at the end of the buffer - wikrt_free(cx, (sz0 - szf), ((*addr) + szf)); - return true; - } else { - // As an optimization, in-place growth is unreliable and - // unpredictable. So, Wikilon runtime doesn't bother. We'll - // simply allocate, shallow-copy, and free the original. - wikrt_addr const src = (*addr); - wikrt_addr dst; - if(!wikrt_alloc(cx, szf, &dst)) { - return false; - } - void* const pdst = (void*) wikrt_pval(cx, dst); - void const* const psrc = (void*) wikrt_pval(cx, src); - memcpy(pdst, psrc, sz0); - wikrt_free(cx, sz0, src); - (*addr) = dst; - return true; - } -} -