
Industrial Machine Learning

Horizontally scalable Machine Learning in Python



Alejandro Saucedo

@AxSaucedo
in/axsaucedo

[NEXT]

Industrial Machine Learning

Horizontally scalable Machine Learning in Python

![portrait](images/alejandro.jpg)
Alejandro Saucedo

    <br>
    Head of Deployed Engineering
    <br>
    <a style="color: cyan" href="http://eigentech.com">Eigen Technologies</a>
    <br>
    <br>
    Chairman
    <br>
    <a style="color: cyan" href="http://ethical.institute">The Institute for Ethical AI & ML</a>
    <br>
    <br>
    Fellow (AI, Data & ML)
    <br>
    <a style="color: cyan" href="#">The RSA</a>
    <br>
    <br>
    Advisor
    <br>
    <a style="color: cyan" href="http://teensinai.com">TeensInAI.com Initiative</a>
    <br>


[NEXT]

Industry-ready ML

An overview of caveats in deploying ML

Very high level talk

Intuitive overview of ML

Going distributed, and beyond

The big picture

[NEXT]

Learning by example

Today we are...

Building a tech startup

[NEXT]

Crypto-ML Ltd.

Let's jump on the hype-train!

A ML framework for crypto-data

Supporting heavy compute/memory ML

Can our system survive the 2017 crypto-craze?

[NEXT]

The Dataset

All historical data from top 100 cryptos

Data runs from the earliest records to 11/2017

563,871 daily price (close) points

Objectives:

  • Supporting heavy ML computations
  • Supporting increasing traffic

[NEXT]

Interface: CryptoLoader


from crypto_ml.data_loader import CryptoLoader as cl

loader = cl()

loader.get_prices("bitcoin")
> array([ 134.21,  144.54,  139.  , ..., 3905.95, 3631.04, 3630.7 ])


loader.get_df("bitcoin").head()
>            Date    Open    High     Low   Close     Market Cap
> 1608 2013-04-28  135.30  135.98  132.10  134.21  1,500,520,000
> 1607 2013-04-29  134.44  147.49  134.00  144.54  1,491,160,000
> 1606 2013-04-30  144.00  146.93  134.05  139.00  1,597,780,000
> 1605 2013-05-01  139.00  139.89  107.72  116.99  1,542,820,000
> 1604 2013-05-02  116.38  125.60   92.28  105.21  1,292,190,000

[NEXT]

Interface: CryptoManager


from crypto_ml.manager import CryptoManager as cm

manager = cm()

manager.send_tasks()

> bitcoin [[4952.28323284 5492.85474648 6033.42626011 6573.99777374 7114.56928738
> 7655.14080101 8195.71231465 8736.28382828 9276.85534192 9817.42685555]]
> bitconnect [[157.70136155 181.86603134 206.03070113 230.19537092 254.36004071
> 278.5247105  302.6893803  326.85405009 351.01871988 375.18338967]]

[NEXT]

Code

https://github.com/axsauze/crypto-ml

Slides

http://github.com/axsauze/industrial-machine-learning

[NEXT]

#LetsDoThis

[NEXT]

1. The Early Crypto-Beginnings

[NEXT]

Crypto-ML Ltd. managed to obtain access to a unique dataset, which allowed them to build their initial prototype and raise massive amounts of VC money


import random

def predict_crypto(crypto_data):
    # I have no clue what I'm doing
    return crypto_data * random.uniform(0, 1)


Now they need to figure out what ML is

[NEXT]

ML Tutorials Everywhere

shapes

[NEXT] Given some input data, predict the correct output

shapes

Let's try to predict whether a shape is a square or a triangle

How do we do this?

[NEXT]

Let's visualise it

  • Imagine a 2-d plot
  • The x-axis is the area of the input shape
  • The y-axis is the perimeter of the input shape

classification

[NEXT]

All about the function

**$f(x̄) = mx̄ + b$**, where:

**x̄** is input (area & perimeter)

**m** and **b** are weights/bias

The result $f(x̄)$ states whether it's a triangle or square

(i.e. if it's larger than 0.5 it's a triangle, otherwise a square)
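
As a hedged sketch (the weights, bias, and feature scaling here are made up for illustration, not a trained model):

def classify(area, perimeter, m=(0.4, 0.2), b=-0.1):
    # f(x̄) = m·x̄ + b over the two input features
    score = m[0] * area + m[1] * perimeter + b
    # threshold at 0.5: triangle above, square below
    return "triangle" if score > 0.5 else "square"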

[NEXT]

We let the machine do the learning

classification

[NEXT]

We let the machine do the learning

classification

[NEXT]

We let the machine do the learning

classification

[NEXT]

Minimising loss function

We optimise the model by minimising its loss.

Keep adjusting the weights...

...until loss is not getting any smaller.

gradient_descent
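
A minimal sketch of that loop, assuming plain gradient descent on mean squared error with a made-up learning rate and step count:

import numpy as np

def fit_line(x, y, lr=0.01, steps=1000):
    m, b = 0.0, 0.0
    for _ in range(steps):
        pred = m * x + b
        # gradients of the mean squared error w.r.t. m and b
        dm = 2 * np.mean((pred - y) * x)
        db = 2 * np.mean(pred - y)
        # step the weights against the gradient until loss stops shrinking
        m -= lr * dm
        b -= lr * db
    return m, b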

[NEXT]

Finding the weights!

When it finishes, we find optimised weights and biases

i.e. $f(x̄)$ = triangle if ($0.3 x̄ + 10$) > 0.5 else square

[NEXT]

Now predict new data

classification_small

Once we have our function, we can predict NEW data!

[NEXT]

We're ML experts!

Please collect your certificates after the talk

These are valid in:

  • Your Linkedin profile
  • Non-tech Meetups and Parties
  • Any time you reply to a tweet

[NEXT] The Crypto-ML devs asked themselves...


from crypto_ml.data_loader import CryptoLoader

btc = CryptoLoader().get_df("bitcoin")

btc.head()

>            Date    Open    High     Low   Close     Market Cap
> 1608 2013-04-28  135.30  135.98  132.10  134.21  1,500,520,000
> 1607 2013-04-29  134.44  147.49  134.00  144.54  1,491,160,000
> 1606 2013-04-30  144.00  146.93  134.05  139.00  1,597,780,000
> 1605 2013-05-01  139.00  139.89  107.72  116.99  1,542,820,000
> 1604 2013-05-02  116.38  125.60   92.28  105.21  1,292,190,000


...can this be used for our cryptocurrency price data?

[NEXT]

Not yet.

Processing sequential data requires a different approach.

Instead of trying to predict two classes...

...we want to predict future steps

[NEXT]

Sequential Models

Sequential models are often used to predict future data points.

![classification_small](images/linreg.gif)
It still uses the same approach

f(x) = mx + b

to find the weights and biases

But it can be applied to time-sequence data - i.e. prices, words, characters, etc.

[NEXT]

The hello_world of sequential models

Predicting prices by fitting a line on a set of time-series points

classification_small

Linear Regression

[NEXT]

Linear Regression


from sklearn import linear_model


def predict(prices, times, predict=10):

    model = linear_model.LinearRegression()

    model.fit(times, prices)

    predict_times = get_prediction_timeline(times, predict)

    return model.predict(predict_times)
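
get_prediction_timeline isn't shown in the slides; a plausible sketch, assuming the time column is numeric and evenly spaced, would be:

import numpy as np

def get_prediction_timeline(times, predict):
    # Hypothetical helper: extend the (n, 1) time column by `predict`
    # future steps, spaced like the last observed interval.
    step = times[-1, 0] - times[-2, 0]
    future = times[-1, 0] + step * np.arange(1, predict + 1)
    return future.reshape(-1, 1)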

[NEXT]

Linear Regression


from crypto_ml.data_loader import CryptoLoader


cl = CryptoLoader()

df = cl.get_df("bitcoin")

times = df[["Date"]].values
prices = df[["Close"]].values

results = predict(prices, times, 5)

Success

[NEXT]

But the Crypto-ML team wants

#cutting edge tech

[NEXT]

2. Diving deep into the hype

[NEXT] The Crypto-ML team realised that their use cases were much more complex

perceptron_learning

[NEXT] The team had to learn many critical points of machine learning development!

![perceptron_learning](images/perceptron_learning4.png)

  • Extending our feature space
  • Increasing number of inputs
  • Regularisation techniques (dropout, batch normalisation)
  • Normalising datasets

[NEXT] But they also stumbled upon some neural network tutorials...

[NEXT]

Remember our favourite function?

f(x) = mx + b

Now on a simple perceptron function

in a neural network

perceptron

[NEXT]

Instead of just one neuron

rnn_diagram

[NEXT]

We just have many

rnn_diagram

This gives the function more flexibility

[NEXT]

With a few layers

deep_rnn_diagram

This gives more flexibility for learning

[NEXT]

Deep Neural Networks

perceptron_learning

[NEXT]

Deep Networks: many hidden layers

deep_rnn_diagram

[NEXT]

For sequential models?

Deep Recurrent Neural Networks

(e.g. LSTMs)

[NEXT]

Simplified Visualisation

rnn_compress_expanded

One node represents a full layer of neurons.

[NEXT]

Simplified Visualisation

rnn_compressed

One node represents a full layer of neurons.

[NEXT]

Unrolled Recurrent Network

Previous predictions help make the next prediction.

Each prediction is a time step.

rnn_unrolled_chars

[NEXT]

Recurrent Networks

rnn_compressed

The hidden layer's input includes its own output from the previous run of the network.

[NEXT] rnn_unrolled_chars

[NEXT] rnn_unrolled_chars

[NEXT] rnn_unrolled_chars

[NEXT] rnn_unrolled_chars

[NEXT] rnn_unrolled_chars

[NEXT] rnn_unrolled_chars

[NEXT] rnn_unrolled_chars

[NEXT]

Loss/Cost function

Cost function is based on getting the prediction right!

rnn_unrolled_chars

[NEXT]

Training an RNN in Python



def deep_predict(prices):

    model = get_rnn_model()

    # Slice the price series into windows of 50 steps for the LSTM
    x, y = build_lstm_data(prices, 50)

    model.fit(x, y, batch_size=512, nb_epoch=1, validation_split=0.05)

    return rnn_predict(model, x, prices)

  • Build model
  • Compute the weights
  • Return the prediction
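
build_lstm_data isn't defined in the slides; a common sketch, assuming a sliding window over the price series, looks like:

import numpy as np

def build_lstm_data(prices, window):
    # Hypothetical helper: slice the series into overlapping windows;
    # each window is used to predict the value that follows it.
    xs, ys = [], []
    for i in range(len(prices) - window):
        xs.append(prices[i:i + window])
        ys.append(prices[i + window])
    x = np.array(xs).reshape(-1, window, 1)  # (samples, timesteps, features)
    y = np.array(ys).reshape(-1, 1)
    return x, y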

[NEXT]

Not too different, eh!


from sklearn import linear_model


def predict(prices, times, predict=10):

    model = linear_model.LinearRegression()

    model.fit(times, prices)

    predict_times = get_prediction_timeline(times, predict)

    return model.predict(predict_times)



def deep_predict(prices):

    model = get_rnn_model()

    x, y = build_lstm_data(prices, 50)

    model.fit(x, y, batch_size=512, nb_epoch=1, validation_split=0.05)

    return rnn_predict(model, x, prices)

[NEXT]

Code to build the LSTM


from keras.layers.core import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from keras.models import Sequential

def get_rnn_model():
    model = Sequential()
    model.add(LSTM(input_dim=1, output_dim=50, return_sequences=True))
    model.add(Dropout(0.2))

    model.add(LSTM(100, return_sequences=False))
    model.add(Dropout(0.2))

    model.add(Dense(output_dim=1))
    model.add(Activation('linear'))

    model.compile(loss="mse", optimizer="rmsprop")

    return model

A linear dense layer to aggregate the data into a single value

Compile with mean sq. error & gradient descent as optimizer

Simple!

[NEXT]

RNN Test-run


from crypto_ml.data_loader import CryptoLoader


cl = CryptoLoader()

df = cl.get_df("bitcoin")

times = df[["Date"]].values
prices = df[["Close"]].values

results = deep_predict(prices)

Success

[NEXT]

Side note

In this example we are training and predicting in the same function.

Normally you would train your model, and then run the model in "production" for predictions

[NEXT]

Crypto-ML

has the DL!

Are we done then?

Nope

The fun is just starting

[NEXT]

3. Going distributed

[NEXT]

After Crypto-ML was caught using deep learning...

...they were featured in the top 10 global news

[NEXT]

Their userbase exploded

Now they have quite a few users coming in every day

Each user is running several ML algorithms concurrently

They tried getting larger and larger AWS servers

[NEXT]

They should've seen this coming

  • Machine learning is known for being compute-heavy
  • But it's often forgotten how memory-heavy it is
  • I'm talking VERY heavy - holding whole models in memory
  • Scaling to bigger instances with more cores is expensive
  • Having everything in one node is a single point of failure


It's time to go for scale

[NEXT]

Producer-consumer Architecture

distributed_architecture

The Crypto-ML devs thought going distributed was too hard

It's not.

[NEXT]

Introducing Celery

celery_celery

  • Distributed
  • Asynchronous
  • Task Queue
  • For Python

[NEXT]

Step 1: Choose a function



def deep_predict(prices, times, predict=10):

    model = utils.get_rnn_model()

    model.fit(times, prices, batch_size=512, nb_epoch=1, validation_split=0.05)

    predict_times = get_prediction_timeline(times, predict)

    return model.predict(predict_times)

[NEXT]

Step 2: Celerize it


from celery import Celery
from utils import load, dump, get_rnn_model

# Initialise celery with rabbitMQ address
app = Celery('crypto_ml',
    backend='amqp://guest@localhost/',    
    broker='amqp://guest@localhost/')

# Add decorator for task (can also be class)
@app.task
def deep_predict(d_prices, d_times, predict=10):

    # Load from stringified binaries (Pandas Dataframes)
    prices = load(d_prices)
    times = load(d_times)

    model = get_rnn_model()

    model.fit(times, prices, batch_size=512, nb_epoch=1, validation_split=0.05)

    predict_times = get_prediction_timeline(times, predict)

    return dump(model.predict(predict_times))
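
The load/dump helpers aren't shown in the slides; one plausible sketch (hypothetical, not the repo's actual utils) serialises objects to strings with pickle and base64 so they fit in the task payload:

import base64
import pickle

def dump(obj):
    # Pickle the object (e.g. a DataFrame) and wrap it in base64
    # so it travels as a plain string through the broker.
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")

def load(payload):
    # Inverse of dump: decode the base64 string, then unpickle.
    return pickle.loads(base64.b64decode(payload))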

[NEXT]

Step 3: Run it!

from crypto_ml.models import deep_predict

$ celery -A crypto_ml worker

Monitor the activity logs:


$ celery -A crypto_ml worker
Darwin-15.6.0-x86_64-i386-64bit 2018-03-10 00:43:28

[config]
.> app:         crypto_ml:0x10e81a9e8
.> transport:   amqp://user:**@localhost:5672//
.> results:     amqp://
.> concurrency: 8 (prefork)
.> task events: OFF (enable -E to monitor tasks in this worker)

[queues]
.> celery           exchange=celery(direct) key=celery

[NEXT]

We're already halfway there

Now we just need to make the producer!

We can just follow the same recipe

[NEXT]

Step 1: Take the code


cl = CryptoLoader()
results = {}

# Compute results
for name in cl.get_crypto_names():

    prices, times = cl.get_prices(name)

    result = deep_predict(prices, times)

    results[name] = result

# Print results
for k,v in results.items():
    print(k, v)


[NEXT]

Step 2: Celerize it

from crypto_ml.data_loader import CryptoLoader
from crypto_ml.models import deep_predict
from utils import load, dump

cl = CryptoLoader()
results = {}

# Send task for distributed computing
for name in cl.get_crypto_names():

    prices, times = cl.get_prices(name)

    task = deep_predict.delay(
                      dump(prices)
                    , dump(times))

    results[name] = task

# Wait for results and print
for k,v in results.items():
    p_result = v.get()

    if p_result:
        result = load(p_result)
        print(k, result)

[NEXT]

Step 3: Run it!

By just running the Python script from a shell!

distributed_architecture

[NEXT]

Visualise it

You can visualise the tasks through Celery "Flower"

$ celery -A crypto_ml flower
> INFO: flower.command: Visit me at http://localhost:5555

distributed_architecture

[NEXT]

Distributed #Win

We now have ML, and are distributed.

We have surely won.

We can pack our ba- oh, not yet?

Not really

[NEXT]

4. Smart Data Pipelines

[NEXT]

Crypto-ML now has an exponentially increasing number of internal use-cases

Their data pipeline is getting unmanageable!

[NEXT]

Increasingly complex data flows

  • There is a growing need to pull data from different sources
  • There is a growing need to pre- and post-process data
  • As complexity increases, tasks start depending on other tasks
  • If a task fails, we don't want to run its child tasks
  • Some tasks need to be triggered chronologically
  • Data pipelines can get quite complex
  • Having celerized tasks run by a Linux cron job gets messy


[NEXT]

You want to go from here

cron_tab

[NEXT]

To here

airflow

[NEXT]

Introducing Airflow

The Swiss Army knife of data pipelines

[NEXT]

What Airflow IS

  • Written in Python!
  • Alternative to Chronos (also built by Airbnb) + Luigi
  • Has a scheduler (cron-like, but much more than cron)
  • Can define tasks and dependent tasks (as a pipeline)
  • Has real-time visualisation of jobs
  • Modular separation between framework and logic
  • Can run on top of Celery (without any modifications)
  • Being introduced into the Apache family (incubating)
  • Actively maintained, with a growing community
  • Used by tons of companies (Airbnb, PayPal, Quora, etc.)

[NEXT]

What Airflow is NOT

  • Airflow is not perfect (but it's the best out there)
  • It's not a Lambda/FaaS framework (but can be programmed to act like one)
  • It's not extremely mature (i.e. still incubating)
  • Airflow is not a data streaming solution (e.g. Storm/Spark Streaming)

[NEXT]

The DAG Architecture

cron_tab

[NEXT]

The scheduler

cron_tab

[NEXT]

The tree view (and sub-components)

cron_tab

[NEXT]

The Crypto-ML Usecase

Scheduled task:

  • Operator: polls for new crypto-data + triggers each sub-DAG

Sub-DAG:

  • Operator: Transforms the data
  • Operator: Sends data to the crypto-prediction engine
  • Sensor: Polls until the prediction is finished
  • Branch: Modifies data & triggers an action based on rules
    • Operator: Stores data in the database
    • Operator: Sends a request to trade
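
As a rough sketch of that pipeline (task ids, callables, and schedule are illustrative assumptions, not the real Crypto-ML DAG):

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

# Hypothetical callables; the real ones would live in crypto_ml
def poll_crypto_data(**context): ...
def transform_data(**context): ...
def send_to_prediction_engine(**context): ...
def store_results(**context): ...

dag = DAG("crypto_ml_pipeline",
          start_date=datetime(2018, 1, 1),
          schedule_interval="@daily")

poll = PythonOperator(task_id="poll_data",
                      python_callable=poll_crypto_data, dag=dag)
prep = PythonOperator(task_id="transform",
                      python_callable=transform_data, dag=dag)
predict = PythonOperator(task_id="predict",
                         python_callable=send_to_prediction_engine, dag=dag)
store = PythonOperator(task_id="store",
                       python_callable=store_results, dag=dag)

# Downstream tasks only run if their parents succeed
poll >> prep >> predict >> store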

[NEXT]

The Crypto-ML Usecase

cron_tab

[NEXT]

Go check it out!

[NEXT]

5. Elastic DevOps Infrastructure

[NEXT] The Crypto-ML team remembers the easy days...

...when they only had a few new users a day

Now they have much heavier traffic!

They are working with large organisations!

They are processing massive loads of ML requests!

Their DevOps infrastructure can't keep up!

[NEXT]

Underestimating DevOps complexity

  • Complexity of staging and deploying ML models
  • Backwards compatibility of feature-spaces/models
  • Distributing load across infrastructure
  • Idle resource time minimisation
  • Node failure back-up strategies
  • Testing of Machine Learning functionality
  • Monitoring of ML ecosystem
  • And the list goes on and on...

[NEXT]

Did anyone say docker?

weight_matrix

Package it. Ship it.

[NEXT]

Gotta love the simplicity

FROM conda/miniconda3-centos7:latest

ADD . /crypto_ml/
WORKDIR /crypto_ml/

# Dependency for one of the ML predictors
RUN yum install mesa-libGL -y

# Install all the dependencies
RUN conda env create -f crypto_ml.yml

and...

docker build -t crypto_ml .

it's done!

[NEXT]

Can this be more awesome?

Yes it can!

Packaging it up and installing through pip/conda.

FROM conda/miniconda3-centos7:latest
RUN conda install <YOUR_PACKAGE>

Nice and simple!
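
A hypothetical minimal setup.py for that (package name and dependencies are assumptions, not taken from the repo):

from setuptools import setup, find_packages

setup(
    name="crypto_ml",
    version="0.1.0",
    packages=find_packages(),
    # Assumed runtime dependencies for the examples in this deck
    install_requires=["celery", "keras", "scikit-learn", "pandas"],
)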

[NEXT]

The obvious docker-compose

version: '2'
services:
    manager:
        container_name: crypto_manager
        image: crypto_ml
        build: .
        links:
            - rabbitmq
        depends_on:
            - rabbitmq
        command: tail -f /dev/null
    worker:
        container_name: crypto_worker
        image: crypto_ml
        build: .
        links:
            - rabbitmq
        depends_on:
            - rabbitmq
        command: /usr/local/envs/crypto_ml/bin/celery -A crypto_ml worker --prefetch-multiplier 1 --max-tasks-per-child 1 -O fair
    rabbitmq:
        container_name: rabbitmq
        image: rabbitmq:3.6.0
        environment:
            - RABBITMQ_DEFAULT_USER=user
            - RABBITMQ_DEFAULT_PASS=1234
        ports:
            - 4369
            - 5672
            - 15672

It all just works automagically!

docker-compose up --scale worker=4

[NEXT]

Taking it to the next level with

Kubernetes

weight_matrix

[NEXT]

All the YAML

distributed_architecture

Creating all the services and controllers

kubectl create -f k8s/

[NEXT]

Kubernetes: Dev

Development

  • Minikube
  • Docker for Mac

[NEXT]

Kubernetes: Prod

Enterprise

  • Elastic Container Service for K8s (AWS)
  • CoreOS Tectonic
  • Red Hat OpenShift

[NEXT]

Kubernetes: Tools

Many options

  • CloudFormation (AWS)
  • Terraform
  • Kops

[NEXT]

Docker for Mac now supports it!

weight_matrix

Mini fist-pump

[NEXT] What's next for Crypto-ML?

weight_matrix

As with every other tech startup

the roller-coaster keeps going!

[NEXT]

But for us?

Obtained an intuitive understanding of ML

Learned about caveats in practical ML

Obtained tips on building distributed architectures

Got a taste of elastic DevOps infrastructure

[NEXT]

Code

https://github.com/axsauze/crypto-ml

Slides

http://github.com/axsauze/industrial-machine-learning

[NEXT]

Industrial Machine Learning

![portrait](images/alejandro.jpg)
Alejandro Saucedo

    <br>
    Head of Deployed Engineering
    <br>
    <a style="color: cyan" href="http://eigentech.com">Eigen Technologies</a>
    <br>
    <br>
    Chairman
    <br>
    <a style="color: cyan" href="http://ethical.institute">The Institute for Ethical AI & ML</a>
    <br>
    <br>
    Fellow (AI, Data & ML)
    <br>
    <a style="color: cyan" href="#">The RSA</a>
    <br>
    <br>
    Advisor
    <br>
    <a style="color: cyan" href="http://teensinai.com">TeensInAI.com initiative</a>
    <br>
