TODO
====

* C
  - IMPORTANT:
    + FIX file descriptor overflow. See Tickets #341 and #343
  - add .. operator to query parser. For example, [100 200] could be written
    as 100..200 or 100...201 like in Ruby Ranges
  - remove exception handling from C code. All errors to be handled by return
    values.
  - Move to sqlite's locking model. Ferret should work fine in a multi-process
    environment.
  - Add optional logging. To be enabled at compilation time, perhaps?
  - Add support for changing zlib and bzlib compression parameters
  - Improve unit test coverage to 100%
  - Add benchmark suite
  - Add Rakefile for development purposes
    + task to publish gcov and benchmark results to ferret wiki
  - Index rebuilding of old versioned indexes.
  - Add a globally accessible, threadsafe symbol table. This will be very
    useful for storing field names so that no objects need to strdup the
    field-names but can just store the symbol representative instead.
    + this has been done but it can be improved using actual Symbol structs
      instead of plain char*
  - Make threading optional at compile time
  - to_json should limit output to prevent memory overflow on large indexes.
    Perhaps we could use some type of buffered read for this.
  - Make BitVector run as fast as bitset from C++ STL. See
    c/benchmark/bm_bitvector.c
  - Add a symbol table for field names. This will mean that we won't need to
    worry about mallocing and freeing field names which happens all over the
    place.
  - Divide the headers into public and private (the private headers to be
    stored in the src directory).
  - Group-by search. i.e. you should be able to pass a field to group search
    results by
  - Auto-loading of documents during search. i.e. actual documents get
    returned instead of document numbers.

* Ruby bindings
  - argument checking for every method. We need a new api for argument
    checking so that the arguments get checked at the start of each method
    that could cause a segfault.
  - improve memory management. It is way too complex at the moment.
    I also need to document how it works so that other developers understand
    what is going on.
  - Replace Data_Wrap_Struct with ferret alternative which handles rewrapping
    of structs automatically and also knows when to release a struct by using
    refcounting.

* Ruby
  - integrate rcov
  - improve unit test coverage to 100%

* Documentation.
  - generate Ruby binding documentation with custom build template similar to
    jaxdoc http://rubyforge.org/projects/jaxdoc
  - all documentation should meet DOCUMENTATION_STANDARDS
  - documentation in C code to be generated by doxygen

Someday Maybe
=============

* apply for Google Summer of Code 2009
* optimize read and write vint
  - test the following outside of ferret before implementing
  - perform a binary scan using bit-wise or to find out how many bytes need
    to be written
  - if the write/read will overflow the buffer, split it into two, refreshing
    the buffer in between
  - use Duff's device to write bytes now that we know how many we need
* add a super fast language based dictionary compression
* add portable stacktrace function. Perhaps implement as an external library.
  - See http://www.nongnu.org/libunwind/
  - See http://www.tlug.org.za/wiki/index.php/Obtaining_a_stack_trace_in_C_upon_SIGSEGV
* investigate unscored searching
* user defined sorting
* Fix highlighting to work for external fields
* investigate faster string hashing method

Done
====

* add rake install task
* FIX :create parameter so that it only deletes the files owned by Ferret.
* fix compression. Currently nothing is happening if you set a field to
  :compress. I guess we'll just assume zlib is installed, as I think it has
  to be for Ruby to be installed.
* add bzlib support
* integrate gcov
* add a field cache to IndexReader
* setup email alerts for svn commits
* Ranged, unordered searching. I.e. search through the index until you have
  the required number of documents and then break. This will require the
  ability to start searches from a particular doc-num.
  + See searcher_search_unordered in the C code and Searcher#scan in Ruby
* improve unit test code. I'd like to implement some way to print out a stack
  trace when a test fails so that it is easy to find the source of the error.
* catch segfaults and print stack trace so users can post helpful bug
  tickets. Again, see the same links for adding stacktrace to unit tests.
* Add string Sort descriptor
* fix memory bug
* add MultiReader interface
* add lexicographical sort (byte sort)
* Add highlighting
* add field compression
* Fix highlighting to work for compressed fields
* Add Ferret::Index::Index
* Fix:
  + Working Query: field1:value1 AND NOT field2:value2
  + Failing Query: field1:value1 AND ( NOT field2:value2 )
* update benchmark suite to use getrusage
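
A rough sketch of the Ruby Range semantics behind the `..` query-parser item
in the TODO list: `100..200` includes its upper endpoint while `100...201`
excludes it, so both describe the same inclusive range as `[100 200]`. The
helper name `range_bounds` is hypothetical and is not part of Ferret or its
query parser; it also assumes plain non-negative integer endpoints, which the
real parser may not.

```ruby
# Hypothetical helper (NOT Ferret API): map a Ruby-style range string onto the
# inclusive [lower, upper] pair expressed today as [100 200].
# Assumption: endpoints are non-negative decimal integers.
def range_bounds(text)
  match = /\A(\d+)(\.\.\.?)(\d+)\z/.match(text)
  return nil unless match # not a range expression

  lower = Integer(match[1], 10)
  upper = Integer(match[3], 10)
  # "..." excludes the upper endpoint, exactly like Ruby's (a...b).
  upper -= 1 if match[2] == "..."
  [lower, upper]
end

p range_bounds("100..200")   # inclusive form
p range_bounds("100...201")  # exclusive form, same bounds
```

Both calls give the same pair, which is why either spelling could stand in
for `[100 200]`. Exclusive ranges over non-integer terms (strings, dates)
would need a different rule and are left out of this sketch.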