
Industrial Machine Learning

Horizontally scalable Machine Learning in Python



Alejandro Saucedo

@AxSaucedo
in/axsaucedo

[NEXT]

Industrial Machine Learning

Horizontally scalable Machine Learning in Python

![portrait](images/alejandro.jpg)
Alejandro Saucedo

    <br>
    Head of Deployed Engineering
    <br>
    <a style="color: cyan" href="http://eigentech.com">Eigen Technologies</a>
    <br>
    <br>
    Chairman
    <br>
    <a style="color: cyan" href="http://ethical.institute">The Institute for Ethical AI & ML</a>
    <br>
    <br>
    Fellow (AI, Data & ML)
    <br>
    <a style="color: cyan" href="#">The RSA</a>
    <br>
    <br>
    Advisor
    <br>
    <a style="color: cyan" href="http://teensinai.com">TeensInAI.com Initiative</a>
    <br>


[NEXT]

Industry-ready ML

An overview of caveats in deploying ML

Very high level talk

Intuitive overview of ML

Going distributed, and beyond

The big picture

[NEXT]

Learning by example

Today we are...

Building a tech startup

[NEXT]

Crypto-ML Ltd.

Let's jump on the hype-train!

A ML framework for crypto-data

Supporting heavy compute/memory ML

Can our system survive the 2017 crypto-craze?

[NEXT]

The Dataset

All historical data from top 100 cryptos

Data runs from the earliest records to 11/2017

563,871 daily price (close) points

Objectives:

  • Supporting heavy ML computations
  • Supporting increasing traffic

[NEXT]

Interface: CryptoLoader


from crypto_ml.data_loader import CryptoLoader as cl

loader = cl()

loader.get_prices("bitcoin")
> array([ 134.21,  144.54,  139.  , ..., 3905.95, 3631.04, 3630.7 ])


loader.get_df("bitcoin").head()
>            Date    Open    High     Low   Close     Market Cap
> 1608 2013-04-28  135.30  135.98  132.10  134.21  1,500,520,000
> 1607 2013-04-29  134.44  147.49  134.00  144.54  1,491,160,000
> 1606 2013-04-30  144.00  146.93  134.05  139.00  1,597,780,000
> 1605 2013-05-01  139.00  139.89  107.72  116.99  1,542,820,000
> 1604 2013-05-02  116.38  125.60   92.28  105.21  1,292,190,000

[NEXT]

Interface: CryptoManager


from crypto_ml.manager import CryptoManager as cm

manager = cm()

manager.send_tasks()

> bitcoin [[4952.28323284 5492.85474648 6033.42626011 6573.99777374 7114.56928738
> 7655.14080101 8195.71231465 8736.28382828 9276.85534192 9817.42685555]]
> bitconnect [[157.70136155 181.86603134 206.03070113 230.19537092 254.36004071
> 278.5247105  302.6893803  326.85405009 351.01871988 375.18338967]]

[NEXT]

Code

https://github.com/axsauze/crypto-ml

Slides

http://github.com/axsauze/industrial-machine-learning

[NEXT]

#LetsDoThis

[NEXT]

1. The Early Crypto-Beginnings

[NEXT]

Crypto-ML Ltd. managed to obtain access to a unique dataset, which allowed them to build their initial prototype and raise massive amounts of VC money


import random

def predict_crypto(crypto_data):
    # I have no clue what I'm doing
    return crypto_data * random.uniform(0, 1)


Now they need to figure out what ML is

[NEXT]

ML Tutorials Everywhere

shapes

[NEXT] Given some input data, predict the correct output

shapes

Let's try to predict whether a shape is a square or a triangle

How do we do this?

[NEXT]

Let's visualise it

  • Imagine a 2-d plot
  • The x-axis is the area of the input shape
  • The y-axis is the perimeter of the input shape

classification

[NEXT]

All about the function

**$f(x̄) = mx̄ + b$**, where:

**x̄** is input (area & perimeter)

**m** and **b** are weights/bias

The result $f(x̄)$ states whether it's a triangle or square

(i.e. if it's larger than 0.5 it's a triangle, otherwise a square)
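
As a hedged sketch (the weights, bias, and feature scaling here are made up for illustration, not a trained model):

def classify(area, perimeter, m=(0.4, 0.2), b=-0.1):
    # f(x̄) = m·x̄ + b over the two input features
    score = m[0] * area + m[1] * perimeter + b
    # threshold at 0.5: triangle above, square below
    return "triangle" if score > 0.5 else "square"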

[NEXT]

We let the machine do the learning

classification

[NEXT]

We let the machine do the learning

classification

[NEXT]

We let the machine do the learning

classification

[NEXT]

Minimising loss function

We optimise the model by minimising its loss.

Keep adjusting the weights...

...until loss is not getting any smaller.

gradient_descent
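
A minimal sketch of that loop, assuming plain gradient descent on mean squared error with a made-up learning rate and step count:

import numpy as np

def fit_line(x, y, lr=0.01, steps=1000):
    m, b = 0.0, 0.0
    for _ in range(steps):
        pred = m * x + b
        # gradients of the mean squared error w.r.t. m and b
        dm = 2 * np.mean((pred - y) * x)
        db = 2 * np.mean(pred - y)
        # step the weights against the gradient until loss stops shrinking
        m -= lr * dm
        b -= lr * db
    return m, b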

[NEXT]

Finding the weights!

When it finishes, we find optimised weights and biases

i.e. $f(x̄)$ = triangle if ($0.3 x̄ + 10$) > 0.5 else square

[NEXT]

Now predict new data

classification_small

Once we have our function, we can predict NEW data!

[NEXT]

We're ML experts!

Please collect your certificates after the talk

These are valid in:

  • Your Linkedin profile
  • Non-tech Meetups and Parties
  • Any time you reply to a tweet

[NEXT] The Crypto-ML devs asked themselves...


from crypto_ml.data_loader import CryptoLoader

btc = CryptoLoader().get_df("bitcoin")

btc.head()

>            Date    Open    High     Low   Close     Market Cap
> 1608 2013-04-28  135.30  135.98  132.10  134.21  1,500,520,000
> 1607 2013-04-29  134.44  147.49  134.00  144.54  1,491,160,000
> 1606 2013-04-30  144.00  146.93  134.05  139.00  1,597,780,000
> 1605 2013-05-01  139.00  139.89  107.72  116.99  1,542,820,000
> 1604 2013-05-02  116.38  125.60   92.28  105.21  1,292,190,000


...can this be used for our cryptocurrency price data?

[NEXT]

Not yet.

Processing sequential data requires a different approach.

Instead of trying to predict two classes...

...we want to predict future steps

[NEXT]

Sequential Models

Sequential models are often used to predict future data points.

![classification_small](images/linreg.gif)
It still uses the same approach

f(x) = mx + b

to find the weights and biases

But it can be applied to time-sequence data - i.e. prices, words, characters, etc.

[NEXT]

The hello_world of sequential models

Predicting prices by fitting a line on a set of time-series points

classification_small

Linear Regression

[NEXT]

Linear Regression


from sklearn import linear_model


def predict(prices, times, predict=10):

    model = linear_model.LinearRegression()

    model.fit(times, prices)

    predict_times = get_prediction_timeline(times, predict)

    return model.predict(predict_times)
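
get_prediction_timeline isn't shown in the slides; a plausible sketch, assuming the time column is numeric and evenly spaced, would be:

import numpy as np

def get_prediction_timeline(times, predict):
    # Hypothetical helper: extend the (n, 1) time column by `predict`
    # future steps, spaced like the last observed interval.
    step = times[-1, 0] - times[-2, 0]
    future = times[-1, 0] + step * np.arange(1, predict + 1)
    return future.reshape(-1, 1)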

[NEXT]

Linear Regression


from crypto_ml.data_loader import CryptoLoader


cl = CryptoLoader()

df = cl.get_df("bitcoin")

times = df[["Date"]].values
prices = df[["Close"]].values

results = predict(prices, times, 5)

Success

[NEXT]

But the Crypto-ML team wants

#cutting edge tech

[NEXT]

2. Diving deep into the hype

[NEXT] The Crypto-ML team realised that their use cases were much more complex

perceptron_learning

[NEXT] The team had to learn many critical points of machine learning development!

![perceptron_learning](images/perceptron_learning4.png)

  • Extending our feature space
  • Increasing number of inputs
  • Regularisation techniques (dropout, batch normalisation)
  • Normalising datasets

[NEXT] But they also stumbled upon some neural network tutorials...

[NEXT]

Remember our favourite function?

f(x) = mx + b

Now on a simple perceptron function

in a neural network

perceptron

[NEXT]

Instead of just one neuron

rnn_diagram

[NEXT]

We just have many

rnn_diagram

This gives the function more flexibility

[NEXT]

With a few layers

deep_rnn_diagram

This gives more flexibility for learning

[NEXT]

Deep Neural Networks

perceptron_learning

[NEXT]

Deep Networks: many hidden layers

deep_rnn_diagram

[NEXT]

For sequential models?

Deep Recurrent Neural Networks

(e.g. LSTMs)

[NEXT]

Simplified Visualisation

rnn_compress_expanded

One node represents a full layer of neurons.

[NEXT]

Simplified Visualisation

rnn_compressed

One node represents a full layer of neurons.

[NEXT]

Unrolled Recurrent Network

Previous predictions help make the next prediction.

Each prediction is a time step.

rnn_unrolled_chars

[NEXT]

Recurrent Networks

rnn_compressed

The hidden layer's input includes its own output from the previous run of the network.

[NEXT] rnn_unrolled_chars

[NEXT] rnn_unrolled_chars

[NEXT] rnn_unrolled_chars

[NEXT] rnn_unrolled_chars

[NEXT] rnn_unrolled_chars

[NEXT] rnn_unrolled_chars

[NEXT] rnn_unrolled_chars

[NEXT]

Loss/Cost function

Cost function is based on getting the prediction right!

rnn_unrolled_chars

[NEXT]

Training an RNN in Python



def deep_predict(prices):

    model = get_rnn_model()

    # Slice the price series into windows of 50 steps for the LSTM
    x, y = build_lstm_data(prices, 50)

    model.fit(x, y, batch_size=512, nb_epoch=1, validation_split=0.05)

    return rnn_predict(model, x, prices)

  • Build model
  • Compute the weights
  • Return the prediction
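
build_lstm_data isn't defined in the slides; a common sketch, assuming a sliding window over the price series, looks like:

import numpy as np

def build_lstm_data(prices, window):
    # Hypothetical helper: slice the series into overlapping windows;
    # each window is used to predict the value that follows it.
    xs, ys = [], []
    for i in range(len(prices) - window):
        xs.append(prices[i:i + window])
        ys.append(prices[i + window])
    x = np.array(xs).reshape(-1, window, 1)  # (samples, timesteps, features)
    y = np.array(ys).reshape(-1, 1)
    return x, y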

[NEXT]

Not too different, eh!


from sklearn import linear_model


def predict(prices, times, predict=10):

    model = linear_model.LinearRegression()

    model.fit(times, prices)

    predict_times = get_prediction_timeline(times, predict)

    return model.predict(predict_times)



def deep_predict(prices):

    model = get_rnn_model()

    x, y = build_lstm_data(prices, 50)

    model.fit(x, y, batch_size=512, nb_epoch=1, validation_split=0.05)

    return rnn_predict(model, x, prices)

[NEXT]

Code to build the LSTM


from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from keras.models import Sequential

def get_rnn_model():
    model = Sequential()
    model.add(LSTM(input_dim=1, output_dim=50, return_sequences=True))
    model.add(Dropout(0.2))

    model.add(LSTM(100, return_sequences=False))
    model.add(Dropout(0.2))

    model.add(Dense(output_dim=1))
    model.add(Activation('linear'))

    model.compile(loss="mse", optimizer="rmsprop")

    return model

A linear dense layer to aggregate the data into a single value

Compile with mean sq. error & gradient descent as optimizer

Simple!

[NEXT]

RNN Test-run


from crypto_ml.data_loader import CryptoLoader


cl = CryptoLoader()

df = cl.get_df("bitcoin")

times = df[["Date"]].values
prices = df[["Close"]].values

results = deep_predict(prices)

Success

[NEXT]

Side note

In this example we are training and predicting in the same function.

Normally you would train your model, and then run the model in "production" for predictions

[NEXT]

Crypto-ML

has the DL!

Are we done then?

Nope

The fun is just starting

[NEXT]

3. Going distributed

[NEXT]

After Crypto-ML was caught using deep learning...

...they were featured in the top 10 global news

[NEXT]

Their userbase exploded

Now they have quite a few users coming in every day

Each user is running several ML algorithms concurrently

They tried getting larger and larger AWS servers

[NEXT]

They should've seen this coming

  • Machine learning is known for being compute-heavy
  • But it's often forgotten how memory-heavy it is
  • I'm talking VERY heavy - holding whole models in memory
  • Scaling to bigger instances with more cores is expensive
  • Having everything in one node is a single point of failure


It's time to go for scale

[NEXT]

Producer-consumer Architecture

distributed_architecture

The Crypto-ML devs thought going distributed was too hard

It's not.

[NEXT]

Introducing Celery

celery_celery

  • Distributed
  • Asynchronous
  • Task Queue
  • For Python

[NEXT]

Step 1: Choose a function



def deep_predict(prices, times, predict=10):

    model = utils.get_rnn_model()

    model.fit(times, prices, batch_size=512, nb_epoch=1, validation_split=0.05)

    predict_times = get_prediction_timeline(times, predict)

    return model.predict(predict_times)

[NEXT]

Step 2: Celerize it


from celery import Celery
from utils import load, dump, get_rnn_model

# Initialise celery with rabbitMQ address
app = Celery('crypto_ml',
    backend='amqp://guest@localhost/',    
    broker='amqp://guest@localhost/')

# Add decorator for task (can also be class)
@app.task
def deep_predict(d_prices, d_times, predict=10):

    # Load from stringified binaries (Pandas Dataframes)
    prices = load(d_prices)
    times = load(d_times)

    model = get_rnn_model()

    model.fit(times, prices, batch_size=512, nb_epoch=1, validation_split=0.05)

    predict_times = get_prediction_timeline(times, predict)

    return dump(model.predict(predict_times))
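
The load/dump helpers aren't shown in the slides; one plausible sketch (hypothetical, not the repo's actual utils) serialises objects to strings with pickle and base64 so they fit in the task payload:

import base64
import pickle

def dump(obj):
    # Pickle the object (e.g. a DataFrame) and wrap it in base64
    # so it travels as a plain string through the broker.
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")

def load(payload):
    # Inverse of dump: decode the base64 string, then unpickle.
    return pickle.loads(base64.b64decode(payload))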

[NEXT]

Step 3: Run it!

from crypto_ml.models import deep_predict

$ celery -A crypto_ml worker

Monitor the activity logs:


$ celery -A crypto_ml worker
Darwin-15.6.0-x86_64-i386-64bit 2018-03-10 00:43:28

[config]
.> app:         crypto_ml:0x10e81a9e8
.> transport:   amqp://user:**@localhost:5672//
.> results:     amqp://
.> concurrency: 8 (prefork)
.> task events: OFF (enable -E to monitor tasks in this worker)

[queues]
.> celery           exchange=celery(direct) key=celery

[NEXT]

We're already halfway there

Now we just need to make the producer!

We can just follow the same recipe

[NEXT]

Step 1: Take the code


cl = CryptoLoader()
results = {}

# Compute results
for name in cl.get_crypto_names():

    prices, times = cl.get_prices(name)

    result = deep_predict(prices, times)

    results[name] = result

# Print results
for k,v in results.items():
    print(k, v)


[NEXT]

Step 2: Celerize it

from crypto_ml.data_loader import CryptoLoader
from crypto_ml.models import deep_predict
from utils import load, dump

cl = CryptoLoader()
results = {}

# Send task for distributed computing
for name in cl.get_crypto_names():

    prices, times = cl.get_prices(name)

    task = deep_predict.delay(
                      dump(prices)
                    , dump(times))

    results[name] = task

# Wait for results and print
for k,v in results.items():
    p_result = v.get()

    if p_result:
        result = load(p_result)
        print(k, result)

[NEXT]

Step 3: Run it!

By just running the Python script from a shell!

distributed_architecture

[NEXT]

Visualise it

You can visualise the tasks through Celery "Flower"

$ celery -A crypto_ml flower
> INFO: flower.command: Visit me at http://localhost:5555

distributed_architecture

[NEXT]

Distributed #Win

We now have ML, and are distributed.

We have surely won.

We can pack our ba- oh, not yet?

Not really

[NEXT]

4. Smart Data Pipelines

[NEXT]

Crypto-ML now has an exponentially increasing number of internal use-cases

Their data pipeline is getting unmanageable!

[NEXT]

Increasingly complex data flows

  • There is a growing need to pull data from different sources
  • There is a growing need to pre- and post-process data
  • As complexity increases, tasks start depending on other tasks
  • If a task fails, we don't want to run its child tasks
  • Some tasks need to be triggered chronologically
  • Data pipelines can get quite complex
  • Having celerized tasks run by a Linux cron job gets messy


[NEXT]

You want to go from here

cron_tab

[NEXT]

To here

airflow

[NEXT]

Introducing Airflow

The Swiss Army knife of data pipelines

[NEXT]

What Airflow IS

  • Written in Python!
  • Alternative to Chronos (also built by Airbnb) + Luigi
  • Has a scheduler (cron-like, but much more than cron)
  • Can define tasks and dependent tasks (as a pipeline)
  • Has real-time visualisation of jobs
  • Modular separation between framework and logic
  • Can run on top of Celery (without any modifications)
  • Being introduced into the Apache family (incubating)
  • Actively maintained, with a growing community
  • Used by tons of companies (Airbnb, PayPal, Quora, etc.)

[NEXT]

What Airflow is NOT

  • Airflow is not perfect (but it's the best out there)
  • It's not a Lambda/FaaS framework (but can be programmed to act like one)
  • It's not extremely mature (i.e. still incubating)
  • Airflow is not a data streaming solution (e.g. Storm/Spark Streaming)

[NEXT]

The DAG Architecture

cron_tab

[NEXT]

The scheduler

cron_tab

[NEXT]

The tree view (and sub-components)

cron_tab

[NEXT]

The Crypto-ML Usecase

Scheduled task:

  • Operator: polls for new crypto-data + triggers each sub-DAG

Sub-DAG:

  • Operator: Transforms the data
  • Operator: Sends data to the crypto-prediction engine
  • Sensor: Polls until the prediction is finished
  • Branch: Modifies data & triggers an action based on rules
    • Operator: Stores data in the database
    • Operator: Sends a request to trade
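
As a rough sketch of that pipeline (task ids, callables, and schedule are illustrative assumptions, not the real Crypto-ML DAG):

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

# Hypothetical callables; the real ones would live in crypto_ml
def poll_crypto_data(**context): ...
def transform_data(**context): ...
def send_to_prediction_engine(**context): ...
def store_results(**context): ...

dag = DAG("crypto_ml_pipeline",
          start_date=datetime(2018, 1, 1),
          schedule_interval="@daily")

poll = PythonOperator(task_id="poll_data",
                      python_callable=poll_crypto_data, dag=dag)
prep = PythonOperator(task_id="transform",
                      python_callable=transform_data, dag=dag)
predict = PythonOperator(task_id="predict",
                         python_callable=send_to_prediction_engine, dag=dag)
store = PythonOperator(task_id="store",
                       python_callable=store_results, dag=dag)

# Downstream tasks only run if their parents succeed
poll >> prep >> predict >> store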

[NEXT]

The Crypto-ML Usecase

cron_tab

[NEXT]

Go check it out!

[NEXT]

5. Elastic DevOps Infrastructure

[NEXT] The Crypto-ML team remembers the easy days...

...when they only had a few new users a day

Now they have much heavier traffic!

They are working with large organisations!

They are processing massive loads of ML requests!

Their DevOps infrastructure can't keep up!

[NEXT]

Underestimating DevOps complexity

  • Complexity of staging and deploying ML models
  • Backwards compatibility of feature-spaces/models
  • Distributing load across infrastructure
  • Idle resource time minimisation
  • Node failure back-up strategies
  • Testing of Machine Learning functionality
  • Monitoring of ML ecosystem
  • And the list goes on and on...

[NEXT]

Did anyone say docker?

weight_matrix

Package it. Ship it.

[NEXT]

Gotta love the simplicity

FROM conda/miniconda3-centos7:latest

ADD . /crypto_ml/
WORKDIR /crypto_ml/

# Dependency for one of the ML predictors
RUN yum install mesa-libGL -y

# Install all the dependencies
RUN conda env create -f crypto_ml.yml

and...

docker build -t crypto_ml .

it's done!

[NEXT]

Can this be more awesome?

Yes it can!

Packaging it up and installing through pip/conda.

FROM conda/miniconda3-centos7:latest
RUN conda install <YOUR_PACKAGE>

Nice and simple!
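
A hypothetical minimal setup.py for that (package name and dependencies are assumptions, not taken from the repo):

from setuptools import setup, find_packages

setup(
    name="crypto_ml",
    version="0.1.0",
    packages=find_packages(),
    # Assumed runtime dependencies for the examples in this deck
    install_requires=["celery", "keras", "scikit-learn", "pandas"],
)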

[NEXT]

The obvious docker-compose

version: '2'
services:
    manager:
        container_name: crypto_manager
        image: crypto_ml
        build: .
        links:
            - rabbitmq
        depends_on:
            - rabbitmq
        command: tail -f /dev/null
    worker:
        container_name: crypto_worker
        image: crypto_ml
        build: .
        links:
            - rabbitmq
        depends_on:
            - rabbitmq
        command: /usr/local/envs/crypto_ml/bin/celery -A crypto_ml worker --prefetch-multiplier 1 --max-tasks-per-child 1 -O fair
    rabbitmq:
        container_name: rabbitmq
        image: rabbitmq:3.6.0
        environment:
            - RABBITMQ_DEFAULT_USER=user
            - RABBITMQ_DEFAULT_PASS=1234
        ports:
            - 4369
            - 5672
            - 15672

It all just works automagically!

docker-compose up --scale worker=4

[NEXT]

Taking it to the next level with

Kubernetes

weight_matrix

[NEXT]

All the YAML

distributed_architecture

Creating all the services and controllers

kubectl create -f k8s/

[NEXT]

Kubernetes: Dev

Development

  • Minikube
  • Docker for Mac

[NEXT]

Kubernetes: Prod

Enterprise

  • Elastic Container Service for K8s (AWS)
  • CoreOS Tectonic
  • Red Hat OpenShift

[NEXT]

Kubernetes: Tools

Many options

  • CloudFormation (AWS)
  • Terraform
  • Kops

[NEXT]

Docker for Mac now supports it!

weight_matrix

Mini fist-pump

[NEXT] What's next for Crypto-ML?

weight_matrix

As with every other tech startup

the roller-coaster keeps going!

[NEXT]

But for us?

Obtained an intuitive understanding of ML

Learned about caveats in practical ML

Obtained tips on building distributed architectures

Got a taste of elastic DevOps infrastructure

[NEXT]

Code

https://github.com/axsauze/crypto-ml

Slides

http://github.com/axsauze/industrial-machine-learning

[NEXT]

Industrial Machine Learning

![portrait](images/alejandro.jpg)
Alejandro Saucedo

    <br>
    Head of Deployed Engineering
    <br>
    <a style="color: cyan" href="http://eigentech.com">Eigen Technologies</a>
    <br>
    <br>
    Chairman
    <br>
    <a style="color: cyan" href="http://ethical.institute">The Institute for Ethical AI & ML</a>
    <br>
    <br>
    Fellow (AI, Data & ML)
    <br>
    <a style="color: cyan" href="#">The RSA</a>
    <br>
    <br>
    Advisor
    <br>
    <a style="color: cyan" href="http://teensinai.com">TeensInAI.com initiative</a>
    <br>
