Reliable Allreduce and Broadcast Interface for distributed machine learning


Rabit: Reliable Allreduce and Broadcast Interface


rabit is a lightweight library that provides a fault-tolerant interface for Allreduce and Broadcast. It is designed to make distributed machine learning programs easy to implement, since many of them fall naturally under the Allreduce abstraction. The goal of rabit is to support portable, scalable and reliable distributed machine learning programs.
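To make the Allreduce abstraction concrete, here is a minimal single-process sketch of its semantics: every worker contributes a local vector, the vectors are combined element-wise, and every worker receives the same combined result. The `allreduce` helper and the sample data are hypothetical illustrations, not rabit's API, and no networking or fault tolerance is modeled.

```python
from functools import reduce
import operator


def allreduce(local_values, op=operator.add):
    """Simulate Allreduce in one process (illustration only, not rabit's API).

    local_values holds one equal-length list per worker. The lists are
    combined element-wise with `op`, and each worker gets its own copy
    of the combined result.
    """
    combined = [reduce(op, column) for column in zip(*local_values)]
    return [list(combined) for _ in local_values]


# Three hypothetical workers, each holding a local gradient of length 2.
local_grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

summed = allreduce(local_grads)        # element-wise sum across workers
largest = allreduce(local_grads, max)  # element-wise maximum across workers
```

After the call, every worker holds the same vector, which is why iterative algorithms such as gradient aggregation map onto this pattern so directly.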

Features

All these features come from facts about small rabbits :)

  • Portable: rabit is lightweight and runs everywhere
    • Rabit is a library rather than a framework; a program only needs to link against the library to run
    • Rabit relies only on a mechanism to start programs, which most frameworks provide
    • You can run rabit programs on many platforms, including Yarn (Hadoop) and MPI, using the same code
  • Scalable and Flexible: rabit runs fast
    • Rabit programs use Allreduce to communicate and do not pay the between-iteration cost of the MapReduce abstraction.
    • Programs can call rabit functions in any order, unlike frameworks in which the framework invokes user-supplied callbacks (the inversion-of-control principle).
    • Programs persist over all the iterations, unless they fail and recover.
  • Reliable: rabit digs burrows to avoid disasters
    • Rabit programs can recover the model and results using synchronous function calls.
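
The persist-and-recover behaviour described in the list above can be pictured with a small simulation. This is only a sketch under stated assumptions: `CheckpointStore`, `train` and the `fail_at` parameter are hypothetical names invented for illustration, the store lives in memory, and rabit's actual recovery mechanism is not modeled. It shows the general pattern of saving the model after each iteration and resuming from the last saved version after a failure.

```python
class CheckpointStore:
    """Hypothetical in-memory stand-in for a checkpoint store."""

    def __init__(self):
        self.version = 0   # number of checkpoints saved so far
        self.model = None  # most recently saved model

    def load(self):
        return self.version, self.model

    def save(self, model):
        self.model = model
        self.version += 1


def train(store, max_iter, fail_at=None):
    # Resume from the last checkpoint; version 0 means a fresh start.
    version, model = store.load()
    if version == 0:
        model = 0.0
    for it in range(version, max_iter):
        if it == fail_at:
            raise RuntimeError("simulated worker failure")
        model += it + 1    # stand-in for one iteration of real work
        store.save(model)  # synchronous checkpoint after each iteration
    return model


store = CheckpointStore()
try:
    train(store, max_iter=5, fail_at=3)  # fails before iteration 3 runs
except RuntimeError:
    pass                                 # the worker is restarted below
recovered = train(store, max_iter=5)     # resumes at iteration 3

uninterrupted = train(CheckpointStore(), max_iter=5)
```

The restarted run repeats no completed iterations and ends with the same model as a run that never failed.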

Use Rabit

  • Typing make in the root folder compiles the rabit library into the lib folder
  • Add lib to the library path and include to the include path of your compiler
  • Languages: You can use rabit in C++ and Python
    • It is also possible to port the library to other languages

Contributing

Rabit is an open-source library. Contributions are welcome, including:

  • The rabit core library.
  • Customized tracker scripts for new platforms and interfaces for new languages.
  • Tutorials and examples for the library.