
**What is Fiber?**

Fiber is a Python distributed computing library for modern computer clusters.

**Note**: Fiber is experimental and its APIs are not stable. (Source: Fiber's GitHub repository)

# Basic Properties

* **Easy to use**
    * Fiber allows you to write programs that run at computer-cluster scale without needing to dive into the details of the cluster.


* **Easy to learn**
    * Fiber provides the same API as Python's standard multiprocessing library that you are familiar with. If you know how to use multiprocessing, you can program a computer cluster with Fiber.


* **Fast**
    * Fiber's communication backbone is built on top of Nanomsg, a high-performance asynchronous messaging library that allows fast and reliable communication.


* **Batteries included**
    * You don't need to deploy Fiber on computer clusters. You run it the same way as you would run a normal application on a computer cluster, and Fiber handles the rest for you.


* **Reliable**
    * Fiber has built-in error handling when you are running a pool of workers. Users can focus on writing the actual application code instead of dealing with crashed workers.


* **Dynamic scaling**
    * Fiber can dynamically allocate resources from computer clusters including CPU/Memory/GPU etc. It can scale up and down according to the computation needed by the user.
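
To illustrate the "Easy to learn" point, here is a minimal sketch in which the pool code is written once and only the import decides which library runs it. It assumes Fiber's `Pool` accepts `processes=` and provides `map`, `close` and `join` like `multiprocessing.Pool` (the first two match the Pi example later in this notebook). The standard-library fallback and the names `square` and `parallel_squares` are illustrative additions, not part of Fiber.

```python
# Only the import differs between Fiber and the standard library.
try:
    from fiber import Pool  # Fiber's multiprocessing-style Pool
except ImportError:
    from multiprocessing import Pool  # fallback so the sketch runs without Fiber


def square(x):
    """Work done by each worker; module-level so it can be pickled."""
    return x * x


def parallel_squares(n, processes=2):
    """Square 0..n-1 across a pool of worker processes."""
    pool = Pool(processes=processes)
    try:
        return pool.map(square, range(n))  # results keep the input order
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the workers to exit


if __name__ == "__main__":
    print(parallel_squares(5))  # [0, 1, 4, 9, 16]
```

The `if __name__ == "__main__":` guard matters for `multiprocessing` on platforms that start workers by re-importing the main module.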

# Installation

Since Fiber is installed like any other Python library, **pip install** works as shown below.

```
! pip install fiber
```

# How to use

To understand Fiber, we will walk through an example: a simple program that estimates Pi with the [Monte Carlo method](https://en.wikipedia.org/wiki/Monte_Carlo_method).
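
The estimate rests on one identity. For a point $(x, y)$ drawn uniformly from the unit square, the chance of landing inside the quarter circle $x^2 + y^2 < 1$ equals the quarter circle's area, $\pi/4$. Averaging that indicator over $N$ independent samples and multiplying by 4 gives the estimator computed by the program below, with $N$ = `NUM_SAMPLES` $= 10^6$:

$$
P\left(x^2 + y^2 < 1\right) = \frac{\pi}{4}, \qquad x, y \sim \mathcal{U}(0, 1)
$$

$$
\hat{\pi} = \frac{4}{N} \sum_{i=1}^{N} \mathbf{1}\left[x_i^2 + y_i^2 < 1\right],
\qquad
\operatorname{SE}(\hat{\pi}) = 4\sqrt{\frac{p(1-p)}{N}} \approx 4\sqrt{\frac{0.785 \times 0.215}{10^6}} \approx 0.0016,
\quad p = \frac{\pi}{4}
$$

So with a million samples the printed estimate should typically land within about 0.002 of $\pi$, which is consistent with the outputs shown in this notebook (3.139944 and 3.142896).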

We will create a file **pi_estimation.py** with the following content:

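```python
from fiber import Pool
import random

NUM_SAMPLES = int(1e6)

def is_inside(p):
    # Draw a random point in the unit square; `p` is only the task index and
    # is ignored. Return True if the point falls inside the quarter circle.
    x, y = random.random(), random.random()
    return x * x + y * y < 1

def main():
    # Start 4 worker processes and spread the NUM_SAMPLES tasks across them.
    pool = Pool(processes=4)
    # The fraction of points inside the quarter circle approximates pi / 4.
    pi = 4.0 * sum(pool.map(is_inside, range(0, NUM_SAMPLES))) / NUM_SAMPLES
    print("Pi is roughly {}".format(pi))

if __name__ == '__main__':
    main()
```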

Running the script prints the estimated value of Pi:

```
! python pi_estimation.py
```

Output:

```
Pi is roughly 3.139944
```


In this example, Fiber created a pool of 4 workers, distributed the workload among them, and collected their results. We can increase the degree of parallelism by increasing the number of Pool workers.

Since this code ran on my local machine, it is essentially multiprocessing (different workers running on different cores) rather than cluster computing.
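
As a sketch of how the worker count can be tuned, the variant below makes `processes` a parameter. It also makes one design change that is my own assumption rather than something taken from the Fiber docs: instead of sending a million one-sample tasks, each task counts hits for a whole batch of samples, which cuts the number of messages between the master and the workers. Each batch uses its own seeded `random.Random`, so results are reproducible however the batches are scheduled. The names `count_inside`, `make_tasks`, `estimate_pi` and `batch_size` are hypothetical.

```python
import random

try:
    from fiber import Pool  # Fiber's multiprocessing-compatible Pool
except ImportError:
    from multiprocessing import Pool  # fallback: the standard library


def count_inside(task):
    """Count how many of `n` random points fall inside the quarter circle."""
    seed, n = task
    rng = random.Random(seed)  # independent, reproducible stream per batch
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y < 1:
            hits += 1
    return hits


def make_tasks(num_samples, batch_size):
    """Split num_samples into (seed, size) batches; the last one may be smaller."""
    return [
        (start, min(batch_size, num_samples - start))
        for start in range(0, num_samples, batch_size)
    ]


def estimate_pi(num_samples=int(1e6), processes=4, batch_size=10_000):
    """Estimate pi with `processes` workers, one task per batch of samples."""
    tasks = make_tasks(num_samples, batch_size)
    pool = Pool(processes=processes)
    try:
        hits = sum(pool.map(count_inside, tasks))
    finally:
        pool.close()
        pool.join()
    return 4.0 * hits / num_samples


if __name__ == "__main__":
    for n_workers in (1, 2, 4):
        print(n_workers, "workers: pi is roughly", estimate_pi(processes=n_workers))
```

Because every batch has a fixed seed, the estimate for a given `num_samples` and `batch_size` is the same with 1, 2 or 4 workers; only the running time changes.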

We will look at cluster computing in the next section.

# Running on a Kubernetes cluster

To run our program on a computer cluster, we need to containerize it. The following **Dockerfile** and **docker build** command do that.

Dockerfile:

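```dockerfile
# Base image with Python 3.6 (Debian Buster)
FROM python:3.6-buster
# Copy the script into the image so worker containers can run it
ADD pi_estimation.py /root/pi_estimation.py
# Fiber must be installed inside the image used for worker processes
RUN pip install fiber
```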

Docker build command:

```bash
# Run from the directory that contains the Dockerfile and pi_estimation.py
docker build -t fiber-pi-estimation .
```

Now we can run the same code with the **docker backend**:

```
! FIBER_BACKEND=docker FIBER_IMAGE=fiber-pi-estimation:latest python pi_estimation.py
```

Output:

```
Pi is roughly 3.142896
```


Some points to note:

* **FIBER_BACKEND** tells Fiber which backend to use. Currently, Fiber supports these backends: *local*, *docker* and *kubernetes*. When FIBER_BACKEND is set to docker, all new processes are launched through the Docker backend, which means each of them runs inside its own Docker container.

* **FIBER_IMAGE** tells Fiber which Docker image to use when launching new containers. This image provides the runtime environment for your child processes, so it needs to have Fiber installed. We already did that in the previous step when building the Docker image.

* **Note** that in this example, the master process (the one you started with `python pi_estimation.py`) still runs on the local machine instead of inside a Docker container. All the processes started by Fiber run inside containers.

* Also **note** that Fiber does not need to be installed on every system that runs a worker: it is part of the environment inside the Docker image, so every container Fiber launches already has it.
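
The `FIBER_BACKEND=docker FIBER_IMAGE=... python pi_estimation.py` one-liner only sets two environment variables for a single command. If you prefer to launch the run from Python, a minimal sketch with the standard `subprocess` module could look like this. The helper names `fiber_env` and `run_with_backend` are hypothetical, and the set of valid backends is taken from the list above (*local*, *docker*, *kubernetes*).

```python
import os
import subprocess
import sys

# Backends listed in the notes above.
SUPPORTED_BACKENDS = {"local", "docker", "kubernetes"}


def fiber_env(backend, image=None, base=None):
    """Return a copy of the environment with FIBER_BACKEND (and FIBER_IMAGE) set."""
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError("unsupported Fiber backend: {!r}".format(backend))
    env = dict(os.environ if base is None else base)
    env["FIBER_BACKEND"] = backend
    if image is not None:
        env["FIBER_IMAGE"] = image  # this image must have Fiber installed
    return env


def run_with_backend(script, backend, image=None):
    """Run `script` with the current Python interpreter under the given backend."""
    return subprocess.run(
        [sys.executable, script],
        env=fiber_env(backend, image),
        check=True,  # raise CalledProcessError if the script fails
    )
```

For example, `run_with_backend("pi_estimation.py", "docker", "fiber-pi-estimation:latest")` mirrors the shell one-liner, except that it uses the current interpreter (`sys.executable`) instead of whatever `python` is on the `PATH`.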

You can check the containers launched by Fiber by running this command:

```
!docker ps -a|grep fiber-pi-estimation
```

Output:

```
d41ef4ad7ee6        fiber-pi-estimation:latest   "/usr/local/bin/pyth…"   25 seconds ago      Exited (1) 14 seconds ago                       PoolWorker-4-bb15d42e-3c0d-474e-9f48-ccf466fd0522
db4a5b510d56        fiber-pi-estimation:latest   "/usr/local/bin/pyth…"   25 seconds ago      Exited (1) 14 seconds ago                       PoolWorker-3-eaef3af5-a862-4251-b1e6-c77e29e304b6
46c0a175b6a1        fiber-pi-estimation:latest   "/usr/local/bin/pyth…"   25 seconds ago      Exited (1) 14 seconds ago                       PoolWorker-2-96ae84be-7ef4-4a99-94b7-9cfbaa8fae99
2822b7583f3d        fiber-pi-estimation:latest   "/usr/local/bin/pyth…"   25 seconds ago      Exited (1) 14 seconds ago                       PoolWorker-1-35e610ca-13a5-433d-9edc-02328d3ddbe5
```


*As the output above shows, Fiber started four containers for the computation. Since we are running locally, these containers run on different cores of my machine rather than on different computers. In essence, the same thing happens in cluster computing: the 4 containers that run on different cores here would run on different computers in a cluster.*
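
To check the worker count programmatically rather than by eye, here is a small sketch that extracts Fiber pool-worker numbers from container names. It assumes the naming seen in the output above (`PoolWorker-<n>-<uuid>`); that pattern is inferred from this one run, not a documented Fiber guarantee. It lists names with the standard `docker ps` options `--filter ancestor=...` and `--format '{{.Names}}'`; the function names are hypothetical.

```python
import re
import subprocess

# Naming pattern observed in the `docker ps` output above (an assumption, not a Fiber API).
WORKER_NAME = re.compile(r"^PoolWorker-(\d+)-[0-9a-f-]+$")


def pool_worker_ids(names):
    """Return the sorted worker numbers found in a list of container names."""
    ids = []
    for name in names:
        match = WORKER_NAME.match(name.strip())
        if match:
            ids.append(int(match.group(1)))
    return sorted(ids)


def container_names(image):
    """List the names of all containers (running or exited) started from `image`."""
    out = subprocess.run(
        ["docker", "ps", "-a", "--filter", "ancestor=" + image, "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return out.splitlines()
```

Applied to the four container names listed above, `pool_worker_ids` returns `[1, 2, 3, 4]`; on a live system you would call `pool_worker_ids(container_names("fiber-pi-estimation:latest"))`.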

To run on a Kubernetes cluster, we need to install **kubectl** and the **Google Cloud SDK**, and authenticate Docker to access **Google Container Registry (GCR)**. After that, the following 4 commands do the trick.

**Note**: I have not tested the 4 commands below, as I don't have access to a compute cluster or GCP credits; they are copied from the Fiber repository.

```bash
# Tag the image and push it to a container registry that the Kubernetes cluster can access.
docker tag fiber-pi-estimation:latest gcr.io/[your-project-name]/fiber-pi-estimation:latest
docker push gcr.io/[your-project-name]/fiber-pi-estimation:latest

# Launch the job.
kubectl create job fiber-pi-estimation --image=gcr.io/[your-project-name]/fiber-pi-estimation:latest -- python3 /root/pi_estimation.py

# The job has been submitted to the Kubernetes cluster, and now we can get its
# logs. It may take some time before the job is scheduled. Running this command
# shows our output.
kubectl logs $(kubectl get po|grep fiber-pi-estimation|awk '{print $1}')
```

On Kubernetes, Fiber behaves similarly to when running locally with Docker. **Each process becomes a Kubernetes pod and all the pods work collectively to compute our estimation of Pi**!


To avoid this hassle, we can also use the **fiber** command-line tool, which takes care of all the above steps; however, it currently works with GCP only.

The command below is an alternative to all the above steps. It uses the same Dockerfile:

```bash
fiber run -a python3 /root/pi_estimation.py
```

# Summary

* Fiber can be used to run your code on a cluster.

* Fiber runs your code, packaged in a Docker container (or another kind of container), on multiple machines with the help of Kubernetes (or some other orchestrator). Because everything runs inside containers, the code and environment are consistent across machines.

* You do not need to install Fiber on every machine. It only has to be installed on the master machine and in the container image; Fiber handles the rest.

* Fiber works similarly to Python's multiprocessing. With multiprocessing, your code runs on different cores of the same machine, whereas with Fiber it runs on different machines.

* Fiber automatically handles crashed workers: it replaces them with new workers and, if a task was still pending, restarts it on a new worker.

* Fiber is fast because its communication backbone is built on top of Nanomsg, a high-performance asynchronous messaging library.

* With Fiber we can scale out with ease.
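
The crash-handling bullet above can be illustrated with a toy, single-process simulation of the idea: tasks that were pending on a failed worker are resubmitted until they succeed. This is only a sketch of the concept under simplifying assumptions (a "crash" is modelled as an exception, and retries are capped); it is not how Fiber implements it internally, and `run_with_retries` and `make_flaky` are hypothetical names.

```python
from collections import deque


def run_with_retries(func, tasks, max_attempts=3):
    """Apply func to every task, resubmitting tasks whose "worker" crashed.

    A crash is simulated by func raising an exception. A failed task goes to
    the back of the queue, as if handed to a fresh worker, and is retried up
    to max_attempts times before the whole run gives up.
    """
    tasks = list(tasks)
    results = [None] * len(tasks)
    pending = deque((i, 1) for i in range(len(tasks)))  # (task index, attempt)
    while pending:
        i, attempt = pending.popleft()
        try:
            results[i] = func(tasks[i])
        except Exception as exc:
            if attempt >= max_attempts:
                raise RuntimeError(
                    "task {!r} failed after {} attempts".format(tasks[i], attempt)
                ) from exc
            pending.append((i, attempt + 1))
    return results  # in the same order as the input tasks


def make_flaky(func, fail_once):
    """Wrap func so it 'crashes' the first time it sees any task in fail_once."""
    seen = set()

    def wrapper(task):
        if task in fail_once and task not in seen:
            seen.add(task)
            raise RuntimeError("worker crashed on task {!r}".format(task))
        return func(task)

    return wrapper
```

For example, `run_with_retries(make_flaky(lambda x: x * x, {2, 3}), range(5))` returns `[0, 1, 4, 9, 16]` even though tasks 2 and 3 "crash" on their first attempt.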

# Side Note

**Kubernetes-vs-Fiber**

Note: I am not sure about the Kubernetes part, as I don't know much about Kubernetes.

We can achieve cluster computing with Kubernetes alone, but then the whole program runs independently on each worker. With Fiber, the work is split across the workers, and their outputs are sent back to the master process to be assembled.