Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Fetching contributors…

Cannot retrieve contributors at this time

460 lines (377 sloc) 15.64 kb
I Am Legend:
Items are sorted by priority (highest on top).
o a pending TODO item (for the current release)
. a pending TODO item (for future releases)
x a finished TODO item
-----------------------------------------------------------------------------
This Branch Is About Integrating The hamsterdb2 Functionality!!!!!
-----------------------------------------------------------------------------
The big headline is:
As a user i want to run many Transactions in parallel with high performance.
I'm using multiple threads b/c my CPU has multiple cores, and expect hamsterdb
to scale with the number of cores.
==============================================================================
o involve everyone in decisions on the new interface
Do you use any of the HAM_HINT_* flags?
If yes, would it be very problematic if I would remove this functionality?
Do you use any of the HAM_DAM_* flags?
If yes, would it be very problematic if I would remove this functionality?
Do you use AES encryption?
If yes, would it be very problematic if I would remove this functionality?
Do you use zlib compression?
If yes, would it be very problematic if I would remove this functionality?
Do you use to insert duplicate keys at exact positions (i.e. use one of
HAM_CURSOR_INSERT_BEFORE, HAM_CURSOR_INSERT_AFTER,
HAM_CURSOR_INSERT_FIRST, HAM_CURSOR_INSERT_LAST)?
If yes, would it be very problematic if I would remove this functionality?
Do you use ham_cursor_move in combination with HAM_SKIP_DUPLICATES
or HAM_ONLY_DUPLICATES?
If yes, would it be very problematic if I would remove this functionality?
Do you use sorted duplicates?
If yes, would it be very problematic if I would remove this functionality?
o clean up interface
o rename ham_env_open_ex to ham_env_open, remove ham_open[_ex]
o rename ham_env_create_ex to ham_env_create, remove ham_create[_ex]
o rename ham_cursor_find_ex to ham_cursor_find
o clean up ham_get_parameters, ham_env_get_parameters, ham_get_flags
o clean up ham/hamsterdb_stats.h, ham/hamsterdb_int.h
o clean up and refactor the code
x os.h
x os_posix.cc
x os_win32.cc
x packstart.h
x packstop.h
x rb.h
x serial.h
x version.h
x util.cc
x util.h
x config.h
x internal_fwd_decl.h
x errorinducer.h
x endianswap.h
x mem.cc
x mem.h
o error.cc
fill dbg_lock, dbg_unlock
o error.h
ham_assert/ham_verify: remove second parameter
o hash-table.h
replace this with a boost class
o cache.cc
o cache.h
o changeset.cc
o changeset.h
o device.cc
o device.h
o journal.cc
o journal_entries.h
find a decent way to handle endian-dependent structures
o journal.h
o log.cc
o log.h
find a decent way to handle endian-dependent structures
o page.cc
o page.h
find a decent way to handle endian-dependent structures
o remote.cc
o extkeys.h
o blob.cc
o blob.h
o backend.h
o btree.h
o btree_key.cc
o btree_key.h
o btree_stats.cc
should use arena for lower/upper bounds
o btree_stats.h
o btree.cc
o btree_check.cc
o btree_cursor.cc
o btree_cursor.h
o btree_enum.cc
o btree_erase.cc
o btree_find.cc
o btree_insert.cc
o freelist.cc
o freelist.h
o freelist_statistics.cc
o freelist_statistics.h
o cursor.cc
o cursor.h
o db.cc
should use arena for get_extended_key
o db.h
o env.cc
o env.h
o txn.cc
o txn_cursor.cc
o txn_cursor.h
o txn.h
o hamsterdb.cc
o unittests
o more fine-grained locking
x cache
x make sure it's perfectly abstracted
x protect with mutex
x extkey-cache
x make sure it's perfectly abstracted
x protect with mutex
x log
x make sure it's perfectly abstracted
x protect with mutex
x journal
x make sure it's perfectly abstracted
x protect with mutex
x changeset
x make sure it's perfectly abstracted
x protect with mutex
x device
x make sure it's perfectly abstracted
x protect with mutex
x freelist
x remove HAM_DAM_ENFORCE_PRE110_FORMAT
x remove legacy code (1.x)
x remove freelist_v2.cc
x freelist class takes ownership of freelist_cache (from device)
x what is the difference between alloc_area and alloc_area_ex??
x move call to lazy_create into class
x create c++ class
x rename lazy_create to "initialize"
x do not create a freelist when opening in read-only mode
or in-memory
x freelist functions do not need to check for in-memory!
x completely refactor/reformat the code
x get rid of static functions in freelist.cc
x freelist_entry_t -> Freelist::Entry
x freelist_payload_t -> Freelist::Payload
x also merge/refactor freelist_statistics.cc
x check with valgrind
x protect with mutex
o backend
x replace with C++ class and inheritance
x remove be_set_dirty, replace with be->_fun_flush()
x btree has keydata1, keydata2 - why? better use DynamicArray?
x BtreeBackend does not require to store in little/big-endian!
o btree.h declares many static functions - still used?
o include the functions in btree_insert.cc, btree_erase.cc etc
o improve abstraction
o statistics
o split into env/db/freelist
o remove unused code - also from public interface?
o protect env and freelist with mutex
o in db: assert that the db is locked
o db
o completely hide the local/remote implementation in the database
o move more members from Database to the Implementation classes
o improve abstraction
o protect with mutex
o env
o completely hide the local/remote implementation in the Environment
o move more members from Environment to the Implementation classes
o improve abstraction
o txn
o convert to clean c++ code
o protect with a mutex
o when changing the transaction queue: assert that the env is locked
o configuration/env-header-page
o create configuration object
o protect with mutex
o file filters -- why? the filter list is not modified in the selected
functions; but it needs to be documented that it has to be
threadsafe, therefore verify that the AES filter *is* threadsafe
o convert to C++ visitor pattern?
o page
o do we also need to lock pages?? or are they locked implicitly by
adding them to the changeset? if yes then make sure that this is
failsafe! and assert that there can be only one changeset at a
time
o changeset::contains is not thread safe
only used in Cache::purge
o blob manager
o introduce a buffer manager?
o i.e. holds the cache and manages the device
o i.e. for de-coupling freelist from env
o i.e. for de-coupling btree from db and env
o the pages are only used in the btree (with one exception: header
page); should the cache therefore be limited to btree functions?
then purge and other stuff can also move to the btree level and
everything gets simplified
o all other functions/modules: assert that the env-mutex is locked!
o use read/write lock for Environment
o introduce a new lock in the Database
o the following functions lock the Environment for reading, but the
Database for writing:
o ham_insert
o ham_find
o ham_erase
o ham_cursor_*
o any others?
o make sure that the cache does not flush pages that are currently locked
by any Transaction
o freelist: store the last successful freelist page in the hinter of the
database
o is the recovery working if there's a crash during ham_close?
o blob_allocate: remove writev when writing header and blob body
o also remove locking from C# and Java APIs
o why do tests with 20 threads fail with oom when using mmap? the cache limits
should still work
o implementation of hamsterdb should move into a namespace; otherwise
there are conflicts if users have a C++ class called Database or
Environment etc
o clean up the "close" functions
o ham_close - move all functionality to Database::close
o ham_env_close - move all functionality to Environment::close
o ~Database: call close(), then simplify all code
o ~Environment: call close(), then simplify all code
o Cursor::close: currently (nearly) empty; merge with ~Cursor and
Database::close_cursor()
o do not have st2/st; just fail if there's a serious problem
o python API - update and integrate
o rewrite with boost::python??
o also add to win32 package
o continue with c++-ification of db.h/db.cc
o c++-ify the backend
o replace with C++ class and inheritance
o BtreeBackend does not require to store in little/big-endian!
o remove be_set_dirty, replace with be->_fun_flush()
o btree has keydata1, keydata2 - why? better use DynamicArray?
o move flushing of transactions in background
o new flag HAM_DISABLE_ASYNC_COMMITS
o need new test: n threads; each thread inserts keys [i*n, i*(n+1))
o make sure that each Database accesses its own pages; i.e. do not share
blob pages between databases. Each database stores the address of the
previously used blob page in order to reduce freelist access
o when flushing and values are written to multiple databases: write them
in parallel
o need a function to get the txn of a conflict (same as in v2)
ham_status_t ham_txn_get_conflicting_txn(ham_txn_t *txn, ham_txn_t **other);
oder: txn-id zurückgeben?
o also add to c++ API
o add documentation (header file)
o add documentation (wiki)
o recovery: re-create pending transactions (if required)
o needs a function to enumerate them
o c++-ify the transaction
o allow transactions in-memory
o allow transactions w/o journal
o allow transactions w/o recovery
o rename HAM_WRITE_THROUGH to HAM_ENABLE_FSYNC
o new node format:
iiikkkkkkkkkrrrrrrrr
iii: fixed size index (skip-list)
kkk: keys
rrr: records
keys can have different compressions (bitmap, suppress null, ...)
records are compressed; will be multiplied by BLOCKSIZE (32)
each key consists of { char flag; short record_id; char data[1]; }
where record_id is an offset into rrrrrrrrr
keys are sorted lazily (i.e. inserted at the end and only sorted
when flushed to disk or when searched
. android port (needs new java api) in /contrib directory (it's on a separate
branch)
. new test case for cursors
insert (1, a)
insert (1, b) (duplicate of 1)
move (last) (-> 1, b)
insert (1, c)
move (last) (-> 1, c)? is the dupecache updated correctly?
. look for someone who can write a PHP or Perl or Ruby wrapper
. implement support for partial keys
. test with tcmalloc; if it works then also use it in the master branch, but
make sure that memory consumption does not increase significantly
. there are a couple of areas where a btree cursor is uncoupled, just to
retrieve the key and to couple the txn-key. that's not efficient
db.c:__btree_cursor_points_to
db.c:__compare_cursors
txn_cursor.c:cursor_sync
txn_cursor.c:cursor_overwrite
o move to a separate function
o try to optimize
. hash-table.h: the foreach/remove_if/visitor pattern is clumsy. use
functor or class w/ operator() instead
. changeset: use a generic hash table for fast lookup (but leave list in place
for everything else)
. cache: use a generic hash table
. add tests to verify that the cursor is not modified if an operation fails!
(in cursor.cpp:LongTxnCursorTest are some wrapper functions to move or
insert the cursor; that's a good starting point)
. the whole c++ protocol should be c++-ified
. move the whole configuration (key sizes, parameters, page size, etc) into a
separate class which is instantiated by the env
. c++-ify the btree node representation;
o include duplicates as well! disentangle duplicates from blob-handling
. c++-ify the blob handling and split off the duplicates
. cleanup db.h/db.cc - move functions into Database or
DatabaseImplementationLocal namespace - but take care b/c these functions
are also used by Cursor or other modules which don't necessarily have
access to the Local stuff
o db_get_key_count
o db_alloc_page
o db_fetch_page
o db_insert_txn
o db_erase_txn
o db_find_txn
o db_check_insert_conflicts
o db_check_erase_conflicts
o __increment_dupe_index
. c++-ify everything else
. device, page and os shold no longer return errors but throw exceptions
XXXXXXXXXXXXXXXXXXXXX release 2.0.0 STABLE !!! XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
. new flag for transactions: HAM_TXN_WILL_COMMIT
if this flag is set, then write all records directly to the blob, not
to the log. the log will only contain the rid.
o document this (and also the drawback - that an abort will lose the
blobs and they cannot be reused
-> this affects all temporary ham_insert-transactions
(not sure if this should get high priority)
o hamsterdb.com
x add twitter feed
o API documentation: don't link to "modules" but to startup page, update
with newest version
o crupp.de: do a backup of the database
. google +1 button
. can we use something like Aller.font?
o update documentation
x in header file
o in the wiki
o don't forget to list all functions that are currently disabled
w/ txn's -> sorting dupes, approx. matching, ...
o transactional behavior/conflicts of duplicate keys
o in the wiki: start with internal documentation
o transactions
o architecture
o btree
o journal/log
o cache
o I/O
o unittests
o cursor(s)
o monstertests - how to use them?
o fully (!) automize the whole release process for unix; the result (on
success) are the following files:
o tar-ball
o the README
o the documentation as a tar
o the Changelog
o the release notes (a template)
o the output (xml) of the monster tests
. port to WinCE
o how can we extend the monster-tests to have reliable tests for transactions?
. if memory consumption in the txn-tree is too high: flush records to disk
(not sure if this should get high priority)
o when recovering, give users the choice if active transactions should be
aborted (default behavior) or re-created
o extkeys: don't use txn_id for the 'age', use lsn instead
o the DatabaseImplementation subclass is not neccessary; all subclasses
can directly derive from Database.
. allow use of transactions without a log/journal
. allow use of transactions for in-memory databases
XXXXXXXXXXXXXXXXXXXXX release 2.0.0 STABLE XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
o ham_get_count: could be atomically updated with every journal entry
o when flushing a changeset: sort by offset, use writev()
o add concurrency (on a high level)
o flush transactions in background
. have a flag to disable flushing of logfiles/journal files (or flush them
async.)
o continue as described in integrate-ham2.txt...
Jump to Line
Something went wrong with that request. Please try again.