# Book Part II: Neural Networks & Deep Learning
   
<img src="res/book.jpg" width = 25% align = "right">
   
---
**CH 12 - Distributing TensorFlow Across Devices and Servers <---  <span style="color: #FF0000">THIS WEEK !</span>**

---
**CH 13 - Convolutional Neural Networks <--- <span style="color: #0000FF">NEXT WEEK</span>**

---
**CH 14 - Recurrent Neural Networks**

---
**CH 15 - Autoencoders**

---
**CH 16 - Reinforcement Learning**

---


# Distributing TensorFlow Across Devices and Servers


### With the standard CPU-only install, TensorFlow runs everything on your CPU
<img src="./images/cpu.png" width="20%" align="right"/>

- This is quite slow

- Your CPU is also busy with the OS and all your background tasks at the same time

- We need a way to speed up our model training so it doesn't take days or even weeks to train something good!


## So what can we do to speed it up?

### - TensorFlow was built to scale!
### - Because each node in a TensorFlow graph is a separate operation, different nodes can run on different devices.

<img src="./images/deep_neural_network.png"/>

**Example Deep Neural Network:** with 5 hidden layers, 4 inputs, and 3 outputs

# 1. Use a GPU

<img src="./images/gpu.png" width="35%" align="right"/>

### Setup
- Uninstall your old TensorFlow package
- Install the GPU build: `pip install tensorflow-gpu`
- Install an Nvidia graphics card and its drivers (you probably already have these)
- Download and install CUDA
- Download and install cuDNN
- Verify that TensorFlow can see the GPU:
```python
from tensorflow.python.client import device_lib

# Lists every device TensorFlow can see; a working setup includes a GPU entry.
print(device_lib.list_local_devices())
```
 
### 'Pin' Some (or All) Nodes to the GPU
- A `tf.device()` block assigns every node created inside it to the named device

In [None]:
import tensorflow as tf  # TensorFlow 1.x API

with tf.device("/gpu:0"):
    pi = tf.Variable(3.1415926535897932384)  # stored as float32, so the extra digits are dropped

### And the rest on the CPU

In [None]:
with tf.device("/cpu:0"):
    a = tf.Variable(3.0)
    b = tf.constant(4.0)
c = a * b  # created outside the block, so it is not pinned and goes to the default device

### Important: A GPU can only run operations that it has a kernel (a device-specific implementation) for!
- For example, there is no GPU kernel for integer variables!
- Use **soft placement** to automatically fall back to the CPU for nodes that have no GPU kernel

In [None]:
with tf.device("/gpu:0"):
    i = tf.Variable(3)  # integer variable: no GPU kernel

config = tf.ConfigProto()
config.allow_soft_placement = True
sess = tf.Session(config=config)
sess.run(i.initializer)  # the placer runs and falls back to /cpu:0
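
The fallback rule behind soft placement can be illustrated with a small toy model. This is only a sketch in plain Python, not TensorFlow's actual placer: the `KERNELS` registry and the `place` function are hypothetical names invented for the illustration, and the registry contents are assumptions chosen to mirror the integer-variable case above.

```python
# Toy model of device placement; NOT TensorFlow's real placer.
# Hypothetical registry: which (op, dtype) pairs each device type has a kernel for.
KERNELS = {
    "cpu": {("Variable", "int32"), ("Variable", "float32"), ("Mul", "float32")},
    "gpu": {("Variable", "float32"), ("Mul", "float32")},
}


def place(op, dtype, requested, allow_soft_placement=False):
    """Return the device a node ends up on, or raise if it cannot be placed."""
    device_type = requested.strip("/").split(":")[0]  # "/gpu:0" -> "gpu"
    if (op, dtype) in KERNELS[device_type]:
        return requested  # the requested device has a kernel: honor the pin
    if allow_soft_placement and (op, dtype) in KERNELS["cpu"]:
        return "/cpu:0"  # no kernel on the requested device: fall back to the CPU
    raise ValueError("no %s kernel for %s[%s]" % (device_type, op, dtype))


print(place("Variable", "float32", "/gpu:0"))
print(place("Variable", "int32", "/gpu:0", allow_soft_placement=True))
```

Without `allow_soft_placement=True`, the second call raises an error instead of falling back, which is the same trade-off the real setting controls: fail loudly, or quietly run the node on the CPU.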

### And that's it! TensorFlow will run independent nodes in parallel across the devices.
#### This works for more devices if you have them (`/gpu:1`, `/gpu:2`...). 

# 2. Use Servers

### We can also use multiple computers, each with many GPUs and CPUs. But things get more complicated...

- We need to make a cluster of TensorFlow servers and divide our work evenly!

- The hardest part is choosing how to divide your work and data among many devices
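
As a small illustration of dividing data evenly, the sketch below splits a dataset's indices into contiguous shards whose sizes differ by at most one. The function name `shard_bounds` is hypothetical, and contiguous index ranges are just one possible scheme assumed here for simplicity.

```python
def shard_bounds(num_examples, num_workers):
    """Return one (start, end) index range per worker; sizes differ by at most 1."""
    base, extra = divmod(num_examples, num_workers)
    bounds, start = [], 0
    for worker in range(num_workers):
        # The first `extra` workers each take one leftover example.
        size = base + (1 if worker < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


print(shard_bounds(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

Each worker would then read only the examples in its own range, so no example is processed twice and no worker is left with a much larger share.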

**Official TensorFlow Summary with Code:**

https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/distributed.md

<img src="./images/terms.PNG" />

#### Client

A client is typically a program that builds a TensorFlow graph and constructs a `tensorflow::Session` to interact with a cluster. Clients are typically written in Python or C++. A single client process can directly interact with multiple TensorFlow servers, and a single server can serve multiple clients.

<img src="./images/client-server.PNG" />

#### Cluster

A TensorFlow cluster comprises one or more "jobs", each divided into lists of one or more "tasks". A cluster is typically dedicated to a particular high-level objective, such as training a neural network, using many machines in parallel. A cluster is defined by a `tf.train.ClusterSpec` object.


In [None]:
cluster_spec = tf.train.ClusterSpec({    
    "ps": [
        "machine-a.example.com:2221",  # /job:ps/task:0    
    ],    
    "worker": [        
        "machine-a.example.com:2222",  # /job:worker/task:0        
        "machine-b.example.com:2222",  # /job:worker/task:1    
    ]
}) 

#### Job

A job comprises a list of "tasks", which typically serve a common purpose. For example, a job named `ps` (for "parameter server") typically hosts nodes that store and update variables; while a job named `worker` typically hosts stateless nodes that perform compute-intensive tasks. The tasks in a job typically run on different machines. The set of job roles is flexible: for example, a `worker` may maintain some state.

<img src="./images/job-task-cluster.PNG"/>

#### Task

A task corresponds to a specific TensorFlow server, and typically corresponds to a single process. A task belongs to a particular "job" and is identified by its index within that job's list of tasks.
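
To make the naming concrete, the sketch below derives each task's `/job:<name>/task:<index>` identifier from a plain dictionary with the same shape as the cluster definition above. It deliberately avoids TensorFlow so it runs anywhere; `task_names` is a hypothetical helper written for this illustration, not part of the TensorFlow API.

```python
# Same jobs and addresses as the ClusterSpec example, as a plain dict.
cluster = {
    "ps": ["machine-a.example.com:2221"],
    "worker": ["machine-a.example.com:2222", "machine-b.example.com:2222"],
}


def task_names(cluster):
    """Map each task's device-name prefix to its host:port address."""
    return {
        "/job:%s/task:%d" % (job, index): address
        for job, addresses in cluster.items()
        for index, address in enumerate(addresses)
    }


for name, address in sorted(task_names(cluster).items()):
    print(name, "->", address)
```

The index is simply the task's position in its job's list, which is why reordering the addresses in a cluster definition changes which machine `/job:worker/task:0` refers to.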

#### TensorFlow server 

A process running a `tf.train.Server` instance, which is a member of a cluster, and exports a "master service" and "worker service".

<img src="./images/server-labelled.PNG"/>

#### Master service

A service that provides remote access to a set of distributed devices, and acts as a session target. The master service implements the `tensorflow::Session` interface, and is responsible for coordinating work across one or more "worker services". All TensorFlow servers implement the master service.

#### Worker service

A service that executes parts of a TensorFlow graph using its local devices. All TensorFlow servers implement the worker service.

## How do we split the work in our servers?

- #### There are many ways to split the work, and the best choice is sensitive to what kind of network you are using!
- #### You can split your network by layer, or horizontally per worker, and split your data into mini-batches so you aren't waiting on other servers for too long
- #### You can train a whole neural network on each server with different hyperparameters when you are searching for the best combination
- #### You can distribute the training of one neural network across all servers by programmatically pinning nodes to each GPU/CPU on each server in a round-robin way, and then sharing the parameter updates through a parameter server (with a queue to eliminate race conditions)
- #### You can shard your data, train different neural networks on different servers, and then ensemble them together
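
The round-robin idea can be sketched in a few lines of plain Python. The layer names and device strings below are hypothetical, chosen only for illustration, and `round_robin` is a helper written for this sketch rather than a TensorFlow function.

```python
from itertools import cycle


def round_robin(node_names, devices):
    """Assign each node to the next device in turn, wrapping around."""
    return dict(zip(node_names, cycle(devices)))


# Hypothetical layer names and worker devices, for illustration only.
layers = ["hidden1", "hidden2", "hidden3", "hidden4", "hidden5"]
devices = ["/job:worker/task:0/gpu:0", "/job:worker/task:1/gpu:0"]

assignment = round_robin(layers, devices)
for layer in layers:
    print(layer, "->", assignment[layer])
```

Each resulting device string could then be passed to `tf.device()` while building the corresponding layer. Round-robin balances the number of nodes per device, not their cost, so a few very expensive layers can still leave one device doing most of the work.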
