Skip to content
Find file
Fetching contributors…
Cannot retrieve contributors at this time
189 lines (149 sloc) 9.12 KB

Riak 1.1.2 Release Notes

Features and Improvements for Riak

This is mainly a stabalization release, adding a few stability improvements for heavily loaded networks and improving support for running mixed clusters while migrating to the 1.1.x series.

The frequency that Riak checks whether primary partitions need to be transferred and also a few other administrative tasks has been lowered from every 30s to every 10s by default. It is also configurable on an application environment variable in app.config.

{riak_core, [{vnode_management_timer, #Millisecs}, ...]}

Known Issues

The /usr/sbin/riak script expects /var/run/riak to exist. On Fedora and possibly other Linux distributions /var/run is built on tmpfs, so the directory created at package install time goes away on reboot and the /usr/sbin/riak script does not have permissions to recreate it.

As a work around, add this to /etc/rc.local

mkdir /var/run/riak && chown -R riak /var/run/riak

Bugs Fixed

-erlang_js inlines do not work on XCode 4.3 -Work-around for net_kernel:get_status() race condition -Handoff reported as started even if rejected for exceeding handoff concurrency -Intermittent hang with handoff sender -handoff of fallback data can be delayed or never performed -Chicken & egg problem with vnode proxy process start & registration? -Put FSM crashes if forwarded node goes down

Riak 1.1.1 Release Notes

Major Features and Improvements for Riak

MapReduce Changes

This release includes fixes for systems with heavy MapReduce load.

Pipe maintains a limit on the number of active worker processes in the system. Workers are created on demand when the pipe executes - they are not reserved at pipe creation time.

Prior to 1.1.1 not all fittings (e.g. get object from riak, map value, reduce) were checking to make sure their output was being delivered to the next stage of the pipeline. This has been changed so that the pipe will error if processing errors or system limits are hit.

As a result of the change if the pipe is overloaded clients will see errors of this form: {error,<<"Error sending inputs: {error,worker_limit_reached}">>}

Clients can either decide to retry (preferably with a backoff/retries limit mechanism) or if the hardware has sufficient capacity the number of workers per vnode can be increased by setting worker_limit in etc/app.config.

{riak_pipe, [{worker_limit, #Workers}]}

Different numbers of workers are used based on the type of MapReduce query so tuning will be very workload dependent.

If you are using Javascript MapReduce you will also need to increase the number of Javascript virtual machines in the riak_kv section of app.config.

{map_js_vm_count,  #MapVMs},
{reduce_js_vm_count, #ReduceVMs}

Again, tuning will be workload dependent. If sufficient resources are available setting #MapVMs to the maximum number of JS MapReduce jobs being run in parallel will give best results and #ReduceVMs set to number of jobs / number of nodes plus a margin for overload.

Bugs Fixed

-Map-reduce jobs fail during rolling upgrade to 1.1 -Race Condition causes Javascript VMs to never be returned to VM Pool -Avoid {case_clause, ok} errors when vnode backend fails to start -Bitcask - Fix incorrect NIF error tuples

Riak 1.1.0 Release Notes

Major Features and Improvements for Riak

Riak Control

-Riak Control preview demo -Riak Control repository with documentation on how to get started


-Blog post announcing riaknostic -Riaknostic Homepage

Bitcask Improvements

  • CRC checks have been added to hintfiles for bitcask

LevelDB Improvements

  • Snappy (compression algorithm from Google) has been enabled
  • The compaction algorithm has been tuned to reduce delays due to compaction

Other backend changes

  • Multi-backend now supports 2i properly

Lager Improvements

  • Tracing support (see the README)
  • Term printing is ~4x faster and much more correct (compared to io:format)
  • Bitstring printing support was added

MapReduce Improvements

  • The MapReduce interface now supports requests with empty queries. This allows the 2i, list-keys, and search inputs to return matching keys to clients without needing to include a reduce_identity query phase.
  • MapReduce error messages have been improved. Most error cases should now return helpful information all the way to the client, while also producing less spam in Riak’s logs.

Riak KV Improvements

Listkeys Backpressure

Backpressure has been added to listkeys to prevent the node listing keys from being overwhelemed. The change has required a protocol change so that the key lister can limit the rate it receives data.

In mixed clusters where some of the nodes are < 1.1 please set listkeys_backpressure false in the riak_kv section of app.config until all nodes are upgraded.

{listkeys_backpressure, false}

Once all nodes are upgraded, set listkeys_backpressue to true in the riak_kv section of app.config

{listkeys_backpressure, true}

Running nodes can be upgraded without restarting by running this snippet from the riak console

application:set_env(riak_kv, listkeys_backpressure, true).

Fresh 1.1.0 and above installs default to using listkeys backpressure - adjust app.config if different behavior is desired.

Don’t drop post-commit errors on floor

In previous releases there is no easy way to determine if a post-commit hook is failing. In this release two counters have been added to riak-admin status that will indicate pre/post-commit hook failures. They are precommit_fail and postcommit_fail. By default the errors themselves are not logged. The thought is that a bad hook could cause unnecessary IO overload.

If the error needs to be discovered then Lager, Riak’s logging system, will allow you to dynamically change the logging level to debug on the function executing the hook. To do that you need to riak attach on one of the nodes and run the following.

{ok, Trace} = lager:trace_file("<path>/failing-postcommits", [{module, riak_kv_put_fsm}, {function, decode_postcommit}], debug).

This will output all post-commit errors to <path>/failing-postcommits. When you’ve got enough samples you can stop the trace like so.


Other Additions

Default small_vclock to be equal to big_vclock

If you are using bidirectional cluster replication and you have overridden the defaults for either of these then you should consider setting both to the same value.

The default value of small_vclock has been changed to be equal to big_vclock in order to delay or even prevent unnecessary sibling creation in a Riak deployment with bidirectional cluster replication. When you replicate a pruned vector clock the other cluster will think it isn’t a descendent, even though it is, and create a sibling. By raising small_vclock to match big_vclock you reduce the frequency of pruning and thus siblings. Combined with vnode vclocks, sibling creation, for this particular reason, may be entirely avoided since the number of entries will almost always stay below the threshold in a well behaved cluster (i.e. one not under constant node membership change or network partitions).

Known Issues

-Luwak has been deprecated in the 1.1 release -bz1160 - Bitcask fails to merge on corrupt file

Bugs Fixed

-bz775 - Start-up script does not recreate /var/run/riak -bz1283 - erlang_js uses non-thread-safe driver function -bz1333 - Bitcask attempts to open backup/other files -Poolboy - Lots of potential bugs fixed, see detailed post by Andrew Thompson

Lager Specific Bugs Fixed

  • #26 - don’t make a crash log called ‘undefined’
  • #28 - R13A support (god only knows why I bothered merging this)
  • #29 - Don’t unnecessarily quote atoms
  • #31 - Better crash reports for proc_lib processes
  • #33 - Don’t assume supervisor children are named with atoms
  • #35 - Support printing bitstrings (binaries with trailing bits)
  • #37 - Don’t generate dynamic atoms
Something went wrong with that request. Please try again.