![saama](images/slide_1.png)

# Boosting TensorFlow inference performance with asyncio
### by Derrick Joseph, Associate Consultant, Saama Technologies Pvt Ltd.

### About me..

* I am Derrick Joseph, a Research Engineer focused on bringing software engineering practice to the world of Deep Learning.
* I am currently trying to optimize and accelerate drug clinical trials using Deep Learning. 
* Connect with me 
    * Linkedin : https://www.linkedin.com/in/derrick-joseph-545566a4/
    * Twitter  : @c0d3eswaran
    * Email    : derrick.joseph@saama.com

### What is a "Tensor" in TensorFlow...?

* A mathematical structure that generalizes scalars, vectors, and matrices to any number of dimensions; in practice, an n-dimensional array.
* The major advantage is that mathematical operations can be applied to a whole batch of linear equations at once.
* Can be represented with higher and more complex dimensions than traditional matrices.

![Tensors](images/tensor examples.svg)
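
As a rough illustration of why batching matters, the sketch below uses plain nested lists (no TensorFlow) to apply one weight matrix to a whole batch of input vectors in a single call. The helper names `shape` and `batch_matvec` are made up for this example and are not part of any library.

```python
def shape(tensor):
    """Return the dimensions of a regular, non-empty nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


def batch_matvec(weights, batch):
    """Apply y = W x to every vector x in the batch."""
    return [
        [sum(w * x for w, x in zip(row, vector)) for row in weights]
        for vector in batch
    ]


weights = [[1, 0], [0, 2]]          # a matrix: shape (2, 2)
batch = [[1, 1], [2, 3], [4, 5]]    # three input vectors stacked into one batch
outputs = batch_matvec(weights, batch)

print(shape(weights), shape(batch))
print(outputs)
```

A real framework performs the same computation as one vectorized operation, which is why feeding several samples together tends to be cheaper than feeding them one by one.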

### But Alas, every Rose 🌹 has a thorn.

Batching is fine on paper and while training the model, but in a production service..😶 🤐


![Batching](images/batching.png)

> https://www.tensorflow.org/serving/serving_advanced

## There has to be a better way...🤔 

We need to figure out a scalable way to batch tensors in a production deep learning service.

## Actor and CSP patterns

* The Actor model was proposed by **Carl Hewitt** (with Peter Bishop and Richard Steiger) in 1973.
* CSP was introduced by **Tony Hoare** in 1978.
* CSP stands for **Communicating Sequential Processes**.

Both look eerily similar but have major differences.

## CSP a quick primer

* **Communicating Sequential Processes**
* Consists of Processes connected via Channels (think of pipes if you like, *but a pipe is an OS primitive*).
* In this context **Processes and Channels are not created with OS services** but rather by the VM internally.
* Processes respond to Events.
* An example of an Event is the availability of a message on the Channel a Process is listening to.
* **Events are managed by the OS service "Selectors".**


## Coroutine FAQs..

* A Subroutine is a block of assembly code that does not return a value, but rather changes the values of processor registers. It is a low-level primitive.
* A Function is a higher-level primitive that starts execution and returns a value. It has a single entry and exit point, like the traditional `def foo():`.
* A Coroutine is a high-level primitive that has multiple entry and exit points, for example a generator.
```python
def gen_foo():
    yield 1
    yield 2
```
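
To make the "multiple entry and exit points" idea concrete, here is a small sketch of a generator-based coroutine that is re-entered with `send()`. The name `running_total` is invented for this illustration.

```python
def running_total():
    """Every `yield` is both an exit point and a later re-entry point."""
    total = 0
    while True:
        # Hand `total` to the caller, pause, and resume with the sent value.
        value = yield total
        total += value


coro = running_total()
first = next(coro)        # run up to the first yield
second = coro.send(10)    # re-enter, add 10, pause again
third = coro.send(5)      # re-enter, add 5, pause again
coro.close()

print(first, second, third)
```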

### Coroutine FAQs contd..:
* Threads, Processes and Pipes are OS primitives managed by the OS; the VM controls them using OS kernel services.
* Coroutines are purely a VM primitive; the OS isn't aware of the existence of a coroutine.
* The coroutine waits for its designated Event to occur.
  ```python
  data = await a_data_source()
  # or
  yield some_value
  ```
* These Events are OS primitives managed by the OS.
* The OS services that manage events are kqueue (`KqueueSelector`) on macOS, epoll/poll on Linux, and select on Windows.
* For more info, refer to the documentation of the `selectors` module in Python 3.
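
A minimal sketch of what "waiting for an Event" looks like at the selector level, using only the standard library. The socket pair and the `"reader-ready"` label are arbitrary choices for the demo; an event loop does the same kind of registration and polling on behalf of coroutines.

```python
import selectors
import socket

# DefaultSelector picks the most efficient mechanism available on the platform.
sel = selectors.DefaultSelector()
reader, writer = socket.socketpair()
reader.setblocking(False)

# Ask the OS to tell us when `reader` has data to read.
sel.register(reader, selectors.EVENT_READ, data="reader-ready")

writer.send(b"ping")

received = []
# select() blocks until a registered event fires or the timeout passes.
for key, mask in sel.select(timeout=1):
    received.append((key.data, key.fileobj.recv(1024)))

sel.unregister(reader)
sel.close()
reader.close()
writer.close()

print(received)
```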

### How does a Coroutine/generator work in Python.
* When a Function call happens:
    1. Python Creates a Stack Frame inside the VM Stack(PUSH) and assigns the respective PyCodeObject to the frame.
    2. Executes the Function code.
    3. Takes the Return value, POPS the Stack Frame off the VM Stack by marking it for Garbage collection.


**The Python VM Stack is actually a HEAP behaving like a STACK..!!!** 😮

Which means you can pick and choose stack frames and need not maintain a strict LIFO order.
A `yield` statement freezes the current state of the generator and returns the value; calling `next(generator)` resumes it from the frozen state.
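
The frozen state can be observed from Python itself. The sketch below uses `inspect.getgeneratorstate` and the generator's `gi_frame` attribute to peek at the suspended frame; `counter` is just an example generator.

```python
import inspect


def counter():
    step = 1
    yield step
    step += 1
    yield step


gen = counter()
states = [inspect.getgeneratorstate(gen)]       # not started yet
first = next(gen)
states.append(inspect.getgeneratorstate(gen))   # paused at the first yield
saved_locals = dict(gen.gi_frame.f_locals)      # the frame still holds `step`
second = next(gen)                              # resumes from the frozen state

print(states, saved_locals, first, second)
```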

### So, Why are we talking about this..?

1. The idea is to make the Python VM do the complicated **state management** for us.
2. Especially the parts where we use external dependencies like **Redis, Celery etc**.

## Asyncio

* Builds on generators.
* Uses OS Selectors to become self-sustaining pseudo processes.
* Borrows design cues from Twisted.
* Consists of an **Event Loop that deals with the OS Selector service**.
* Coroutines are submitted to the Event Loop; at that point each one is wrapped in a **Future**, analogous to a JavaScript Promise.
* A *Future* that wraps a coroutine and is driven by the *event loop* is called a **Task** (`asyncio.Task` is a subclass of `asyncio.Future`).
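
The relationship between coroutines, Futures, and Tasks can be checked with a few lines of standard-library code. This is a small sketch; `add` is a throwaway coroutine used only for the demonstration.

```python
import asyncio


async def add(a, b):
    await asyncio.sleep(0)   # yield control to the event loop once
    return a + b


async def main():
    # Scheduling the coroutine on the running loop wraps it in a Task.
    task = asyncio.ensure_future(add(1, 2))
    is_task = isinstance(task, asyncio.Task)
    is_future = isinstance(task, asyncio.Future)
    done_before = task.done()
    result = await task
    return is_task, is_future, done_before, task.done(), result


summary = asyncio.run(main())
print(summary)
```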

## Coroutine Chains
* Coroutines can be connected to form a lightweight Data Pipeline.
* Two types of Chains:
    1. Coroutine spawns new child Coroutine(Actor Pattern).
    2. Producer and Consumer Coroutines are connected by a Message passing Pipeline(CSP Pattern).

### Coroutine spawning Coroutine

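```python
import asyncio
import logging

from sanic import Sanic

log = logging.getLogger(__name__)
app = Sanic(__name__)


async def a_batch_processing_service(sample):
    # Placeholder (assumption): stands in for the real batched inference call.
    await asyncio.sleep(0.1)
    return "processed: {}".format(sample)


async def child(sample, ws):
    # Child coroutine: wait for the batch result, then push it to the client.
    resp = await a_batch_processing_service(sample)
    await ws.send(resp)


# Assumption: a websocket route, since the child replies over `ws`.
@app.websocket("/process_sample")
async def parent(request, ws):
    # Parent coroutine: spawn one child per incoming sample without awaiting it.
    while True:
        sample = await ws.recv()
        fut = asyncio.ensure_future(child(sample, ws))
        fut.add_done_callback(lambda fut: log.info("DONE"))


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000, debug=True)
```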
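
The same spawning pattern can be tried without a web server. The sketch below is a simplified stand-in rather than the service itself: `child` fakes the slow call with a short sleep, and `parent` waits for its children at the end only so that the script can exit cleanly.

```python
import asyncio


async def child(sample, results):
    await asyncio.sleep(0.01)        # stands in for a slow batch-processing call
    results.append(sample * 2)


async def parent(samples):
    results = []
    # Spawn one child per sample; none of them blocks the parent.
    children = [asyncio.ensure_future(child(s, results)) for s in samples]
    # The parent could do other work here before collecting the children.
    await asyncio.gather(*children)
    return results


spawned = asyncio.run(parent([1, 2, 3]))
print(sorted(spawned))
```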

```python
from IPython.display import HTML

HTML("""<iframe src="https://docs.google.com/a/saama.com/presentation/d/e/2PACX-1vRRFuZEwpzgElezcJmoqn2wSR2MvXOcntSC32cvdPKcKUpCopul7T3wTtoC5kF2_10I2t40mHNLWfzL/embed?start=false&loop=false&delayms=3000" frameborder="0" width="960" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>""")
```

## Connected Coroutines/Coroutine chains
Producer and Consumer coroutines...

```python
import asyncio
import random


async def produce(queue, n):
    for x in range(n):
        # produce an item
        print('producing {}/{}'.format(x, n))
        # simulate an I/O operation using sleep
        await asyncio.sleep(random.random())
        item = str(x)
        # put the item in the queue
        await queue.put(item)


async def consume(queue):
    while True:
        # wait for an item from the producer
        item = await queue.get()

        # process the item
        print('consuming {}...'.format(item))
        # simulate an I/O operation using sleep
        await asyncio.sleep(random.random())

        # notify the queue that the item has been processed
        queue.task_done()


async def run(n):
    queue = asyncio.Queue()
    # schedule the consumer
    consumer = asyncio.ensure_future(consume(queue))
    # run the producer and wait for completion
    await produce(queue, n)
    # wait until the consumer has processed all items
    await queue.join()
    # the consumer is still waiting for an item, so cancel it
    consumer.cancel()


# asyncio.run creates the event loop, runs the coroutine, and closes the loop
asyncio.run(run(10))
```

```python
from IPython.display import HTML

HTML("""<iframe src="https://docs.google.com/a/saama.com/presentation/d/e/2PACX-1vTZpD8Qb8MD1FCC0gQtYNHpGTiC04gpkA8H85_JR9bznIJysryq8AF2YU1zuSH5IC-lthMfUGA6Xq90/embed?start=false&loop=false&delayms=3000" frameborder="0" width="960" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>""")
```
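
One way the producer/consumer chain could be turned toward the batching problem is to let the consumer drain the queue into small batches. The sketch below is an assumption about how that might look, not the implementation from the talk: `batch_consumer`, `max_batch`, and `max_wait` are hypothetical names, and the point where a model would run is only marked with a comment.

```python
import asyncio


async def batch_consumer(queue, batches, max_batch=4, max_wait=0.05):
    """Collect up to max_batch items, or whatever arrives within max_wait seconds."""
    loop = asyncio.get_running_loop()
    while True:
        batch = [await queue.get()]           # block until at least one item exists
        deadline = loop.time() + max_wait
        while len(batch) < max_batch:
            remaining = deadline - loop.time()
            if remaining <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), remaining))
            except asyncio.TimeoutError:
                break
        batches.append(batch)                 # a real service would run the model here
        for _ in batch:
            queue.task_done()


async def main():
    queue = asyncio.Queue()
    batches = []
    consumer = asyncio.ensure_future(batch_consumer(queue, batches))
    for i in range(10):
        await queue.put(i)
    await queue.join()                        # wait until every item is accounted for
    consumer.cancel()
    return batches


batches = asyncio.run(main())
print(batches)
```

The trade-off sits in `max_wait`: a longer wait tends to produce fuller batches at the cost of added latency for the first request in each batch.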