
Commit

Update doc index.html
bichengying committed Nov 6, 2020
1 parent 30539b0 commit a937a1d
Showing 2 changed files with 35 additions and 109 deletions.
6 changes: 4 additions & 2 deletions README.rst
@@ -63,7 +63,7 @@ Quick Start
First, make sure your environment has ``python>=3.7`` and ``openmpi >= 4.0``.
Then, install Bluefog with: ``pip install --no-cache-dir bluefog`` or
``BLUEFOG_WITH_NCCL=1 pip install bluefog`` if NCCL is supported (``NCCL>=2.7``). Check
the ``install_bluefog`` page if you need more information or other install options.
the `install_bluefog <https://bluefog-lib.github.io/bluefog/install.html>`_ page if you need more information or other install options.

We provide a high-level wrapper for the torch optimizer. The only change you need to
make an existing script distributed is wrapping the optimizer
@@ -110,7 +110,8 @@ more code:
optimizer.step()
Check our BlueFog `dynamic topology neighbor averaging <https://bluefog-lib.github.io/bluefog/neighbor_average.html>`_
page to see more on how to control and use topology.
page to see more on how to control and use topology. See the BlueFog `examples`_ folder for the full code.


We also provide many low-level functions, which you can use as building
blocks to construct your own distributed training algorithm. The following example
@@ -136,3 +137,4 @@ Citation
*BlueFog: Make Decentralized Algorithms Practical for Optimization and Deep Learning*, Bluefog Team, to appear in 2020

.. _AWS: https://aws.amazon.com/about-aws/whats-new/2018/12/introducing-amazon-ec2-p3dn-instances-our-most-powerful-gpu-instance-yet/
.. _examples: https://github.com/Bluefog-Lib/bluefog/tree/master/examples
138 changes: 31 additions & 107 deletions docs/index.rst
@@ -11,112 +11,34 @@ Bluefog
   :align: center
   :alt: Bluefog Logo

Bluefog is a distributed training framework for PyTorch based
on diffusion/consensus-type algorithms.
The goal of Bluefog is to make distributed and decentralized machine learning fast,
fault-tolerant, friendly to heterogeneous environments, and easy to use.

The most distinguishing feature of Bluefog, compared with other popular distributed training frameworks such as
DistributedDataParallel provided by PyTorch, Horovod, and BytePS, is that our core implementation is rooted in the idea
of introducing a virtual topology over the processes: local averaging, defined by the connections in the virtual topology,
converges to the global average as the iterations proceed.
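
As a quick, framework-free illustration of this claim (a minimal numpy sketch, not the Bluefog API;
the ring topology and the 1/3 mixing weights below are chosen only for illustration), repeated local
averaging with a doubly-stochastic mixing matrix drives every local value to the global average:

.. code-block:: python

    import numpy as np

    n = 8
    # Doubly-stochastic mixing matrix for a ring: each process averages
    # itself with its two direct neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3
    x = np.arange(n, dtype=float)   # each process starts with its own value
    for _ in range(200):
        x = W @ x                   # one round of local neighbor averaging
    print(x)                        # every entry is close to the global average 3.5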

Leveraging the *One-sided Communication Ops* (i.e. remote-memory access) of MPI, Bluefog is not only a distributed
but also a decentralized training framework with high performance.

* Unlike the *ring-allreduce* based Bulk Synchronous Parallel algorithm, each process in bluefog is highly decoupled, so we can maximize the power of asynchronous algorithms.
* Unlike the *parameter-server* (PS) based architecture, there is no central node to collect and distribute information, so we avoid the bottleneck inherent in the PS.

The cost of these advantages is the inconsistency between local models. Please check our papers for the theoretical guarantees.

.. Important::

   Although most torch-based APIs perform well, this repository is still
   in an early stage of development and more features remain to be implemented.
   If you are interested, you are more than welcome to join us and contribute to this project!

Quick Start
-----------

First, make sure your environment has ``python>=3.7`` and `openmpi`_ >= 4.0.
Then, install Bluefog with: ``pip install --no-cache-dir bluefog``. Check
the :ref:`install_bluefog` page if you need more information or other install options.

We provide a high-level wrapper for the torch optimizer.
Often, the only change you need to make an existing script distributed is wrapping the optimizer
with our ``DistributedNeighborAllreduceOptimizer``
and then running it through ``bfrun``. That is it!

.. code-block:: python

    # Execute Python functions in parallel through
    # bfrun -np 4 python file.py
    import torch
    import bluefog.torch as bf
    ...
    bf.init()
    optimizer = optim.SGD(model.parameters(), lr=lr * bf.size())
    optimizer = bf.DistributedNeighborAllreduceOptimizer(
        optimizer, model
    )
    ...

We also provide many low-level functions, which you can use as building
blocks to construct your own distributed training algorithm. The following example
illustrates how to run a simple consensus algorithm through bluefog.

.. code-block:: python

    import torch
    import bluefog.torch as bf

    bf.init()
    x = torch.Tensor([bf.rank()])
    for _ in range(100):
        x = bf.neighbor_allreduce(x)
    print(f"{bf.rank()}: Average value of all ranks is {x}")

One notable feature of Bluefog is that we leverage the one-sided communication of MPI
to build truly decentralized and asynchronous algorithms. The following example shows how to use
Bluefog to implement an asynchronous push-sum consensus algorithm, in which each agent maintains
a value together with a weight and the ratio of the two converges to the global average.

.. code-block:: python

    import torch
    import bluefog.torch as bf
    from bluefog.common import topology_util

    bf.init()
    # Setup the topology for communication
    bf.set_topology(topology_util.PowerGraph(bf.size()))
    outdegree = len(bf.out_neighbor_ranks())
    indegree = len(bf.in_neighbor_ranks())
    # Create the buffer for neighbors.
    x = torch.Tensor([bf.rank(), 1.0])
    bf.win_create(x, name="x_buff", zero_init=True)
    for _ in range(100):
        bf.win_accumulate(
            x, name="x_buff",
            dst_weights={rank: 1.0 / (outdegree + 1)
                         for rank in bf.out_neighbor_ranks()},
            require_mutex=True)
        x.div_(1 + outdegree)
        bf.win_update_then_collect(name="x_buff")
    bf.barrier()
    # Do not forget to sync at last!
    bf.win_update_then_collect(name="x_buff")
    print(f"{bf.rank()}: Average value of all ranks is {x[0]/x[-1]}")

Please explore our `examples`_ folder to see more about
how to implement deep learning training and distributed
optimization algorithms quickly and easily through bluefog. If you want to understand more about
how to use the low-level API as building blocks for your own distributed
algorithm, please read our :ref:`Ops Explanation` page.
OVERVIEW
--------

BlueFog is a high-performance distributed training framework for PyTorch built on **decentralized optimization** algorithms.
The goal of Bluefog is to make decentralized algorithms easy to use, fault-tolerant, friendly to heterogeneous environments,
and even faster than training frameworks built on parameter-server or ring-allreduce architectures.

In each communication stage, BlueFog uses neither the typical star-shaped parameter-server topology nor the pipelined ring-allreduce topology,
which is fundamentally different from other popular distributed training frameworks such as DistributedDataParallel provided by PyTorch, Horovod, and BytePS.
Instead, BlueFog exploits a virtual, and possibly dynamic, network topology (which can be of any shape) to achieve the best communication efficiency.
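
For concreteness, below is a minimal sketch of installing and inspecting a virtual topology. It relies only on the
``bf.set_topology``, ``topology_util.PowerGraph``, ``bf.in_neighbor_ranks``, and ``bf.out_neighbor_ranks`` calls that
appear in the examples of this repository; the choice of ``PowerGraph`` here is just one possibility.

.. code-block:: python

    import bluefog.torch as bf
    from bluefog.common import topology_util

    bf.init()
    # Install a virtual topology; other graphs from topology_util could be used instead.
    bf.set_topology(topology_util.PowerGraph(bf.size()))
    print(f"rank {bf.rank()} receives from {bf.in_neighbor_ranks()} "
          f"and sends to {bf.out_neighbor_ranks()}")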

..
   Main Idea: Replace expensive allreduce averaging over gradients by cheap neighbor averaging over parameters

In each training iteration, every process (or agent) updates its model with information received from its **direct** neighbors as defined by the virtual topology. Note that all communication occurs only over the predefined virtual topology and no global communication is required. This is why such algorithms are called *decentralized*.
Decentralized training algorithms have been proved in the literature to converge to the same solution as their standard centralized counterparts.
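
For example, the following minimal sketch performs one hundred decentralized iterations using the ``bf.neighbor_allreduce``
primitive provided by this repository; the toy quadratic objective, in which each agent pulls toward its own rank, is only
an illustration.

.. code-block:: python

    import torch
    import bluefog.torch as bf

    bf.init()
    w = torch.zeros(1)                   # local copy of the model parameter
    target = torch.Tensor([bf.rank()])   # toy local objective: (w - rank)^2
    for _ in range(100):
        grad = 2 * (w - target)          # gradient of the local loss
        w = w - 0.1 * grad               # local gradient step
        w = bf.neighbor_allreduce(w)     # average parameters with direct neighbors only
    print(f"rank {bf.rank()}: w = {w}")  # all agents end up close to a common solution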

The topology determines the communication efficiency. BlueFog supports both **static** and **dynamic** topologies. After extensive trials, the dynamic Exponential-2 graph is observed to achieve the best performance
when the number of agents is a power of 2, such as 4, 32, or 128 agents. In the Exponential-2 graph, each agent
communicates with the neighbors that are 2 :sup:`0`, 2 :sup:`1`, ..., 2 :sup:`t` hops away. A **dynamic** topology means each agent selects
only one neighbor in one iteration and the next neighbor in the next iteration, as illustrated in the following figure:

.. raw:: html

   <p align="center"><img src="https://user-images.githubusercontent.com/16711681/97928035-04654400-1d1b-11eb-91d2-2da890b4522e.png" alt="one-peer-exp2" width="650"/></p>

In this scenario, the communication cost for each iteration is only one unit of delay, one standard parameter size to transmit, and no communication conflicts occur, which is better than what parameter-server or ring-allreduce promises. As for the loss and accuracy guarantees, please check out our theoretical paper. [Will add a full tutorial soon].
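
As a framework-free illustration of this schedule (a sketch only, not a BlueFog API), the following function returns the
single neighbor an agent contacts at a given iteration under the one-peer Exponential-2 rule described above:

.. code-block:: python

    def one_peer_exp2_neighbor(rank: int, size: int, iteration: int) -> int:
        """Neighbor that `rank` contacts at `iteration`, assuming `size` is a power of 2."""
        num_rounds = size.bit_length() - 1       # log2(size) for a power-of-2 size
        hop = 2 ** (iteration % num_rounds)      # 1, 2, 4, ... hops, cycling
        return (rank + hop) % size

    # With 8 agents, rank 3 cycles through neighbors 4, 5, 7, 4, 5, 7, ...
    for t in range(6):
        print(t, one_peer_exp2_neighbor(rank=3, size=8, iteration=t))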

.. toctree::
   :maxdepth: 1
@@ -148,4 +70,6 @@ algorithm, please read our :ref:`Ops Explanation` page.
   Development Guide <devel_guide>

.. _openmpi: https://www.open-mpi.org/
.. _examples: https://github.com/ybc1991/bluefog/tree/master/examples
.. _examples: https://github.com/Bluefog-Lib/bluefog/tree/master/examples
.. _AWS: https://aws.amazon.com/about-aws/whats-new/2018/12/introducing-amazon-ec2-p3dn-instances-our-most-powerful-gpu-instance-yet/
