Getting started with SinaraML
After completing this tutorial you will have created a SinaraML Server, set up an example ML pipeline, and trained a model. After that you will create a model image with a REST service.
RAM: 16 GB
vCPU: 4
Free disk space: 20 GB + space for user data
SinaraML components can be run on Linux, macOS, and Windows. The following programs should be installed.
- Docker is running
- Python 3.6+ installed
- Unzip installed
- Git installed
To check prerequisites please follow Prerequisites checklist for Linux
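The checklist above can also be verified quickly from a terminal. A minimal sketch, assuming a POSIX shell (`check_tool` is an illustrative helper, not part of SinaraML):

```shell
# check_tool prints "<name>: found" or "<name>: MISSING"
# depending on whether the command is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
    fi
}

for tool in docker python3 unzip git; do
    check_tool "$tool"
done

# Being installed is not enough for Docker -- the daemon must be running:
docker info >/dev/null 2>&1 && echo "docker daemon: running" || echo "docker daemon: not running"
```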
- Docker Desktop is running
- Ubuntu installed
- Python 3.6+ installed in Ubuntu
- Unzip installed in Ubuntu
- Git installed in Ubuntu
To check prerequisites please follow Prerequisites checklist for Windows
For now, only macOS running on Intel CPUs is supported. macOS devices running on Apple M-series CPUs can experience issues when running Apache Spark tasks.
- Docker Desktop is running
- Python 3.6+ installed
- Unzip installed
- Git installed
- Before sinaraml installation, add Python 3 scripts to PATH:
echo 'export PATH="/Users/$(whoami)/Library/Python/3.8/bin:$PATH"' >> ~/.zshrc
- All commands should use pip3 instead of pip
To check prerequisites please follow Prerequisites checklist for MacOS
Use SinaraML CLI for SinaraML Server management - https://pypi.org/project/sinaraml/
Important
On Linux and macOS, use the built-in terminal
On Windows, use the Ubuntu terminal
pip install sinaraml
If you see a warning at the end of the installation process:
WARNING: The script sinara is installed in '/home/testuser/.local/bin' which is not on PATH.
Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Add $HOME/.local/bin to your $PATH environment variable to enable CLI commands.
export PATH=$PATH:$HOME/.local/bin
You may need to reload your shell or reboot your machine after installation to enable CLI commands.
Perform Linux post-installation steps for Docker Engine to run SinaraML CLI commands without sudo:
sudo groupadd -f docker
sudo usermod -aG docker $USER
For additional details check Remote Platform
After installing SinaraML CLI you can use the 'sinara' command in a terminal (or a remote machine terminal via SSH).
First, create a SinaraML server.
sinara server create
Second, run SinaraML server:
sinara server start
After the SinaraML Server starts, you will see the URLs where the running server is available, so you can open it in your browser.
Important
Please read Known Issues to address inconveniences you may experience while running pipelines.
Open Jupyter Notebook Server at http://127.0.0.1:8888/lab in any browser.
Inside Jupyter server terminal execute:
git clone --recursive https://github.com/4-DS/pipeline-step_template.git
cd pipeline-step_template
Perform the following actions:
- Open the 'prepare_data_for_template.ipynb' notebook and run all cells to get sample data, or execute the following command in the server terminal:
ipython3 prepare_data_for_template.ipynb
- Execute 'step.dev.py' in the Jupyter server terminal:
python step.dev.py
To stop the SinaraML Server execute:
sinara server stop
To continue using SinaraML, execute:
sinara server start
To remove the SinaraML Server execute:
sinara server remove
Note
By default, when creating and removing a SinaraML Server, the setup scripts create Docker volumes for data, code, and temporary data. For day-to-day usage it is recommended to map folders on the local disk, using the create command with the option:
sinara server create --runMode b
You will be asked to enter host path where to store server's '/data', '/tmp' and '/work' folders.
The following example shows how to build a model serving pipeline, from a raw dataset to an ML-model Docker container with a REST API.
The ML pipeline is based on the SinaraML Framework and tools.
This example ML model predicts the median house price. It is based on an open data sample.
Example pipeline includes 5 steps, which must be run sequentially:
- Data Load
- Data Preparation
- Model Train
- Model Evaluation
- Model Test
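Each of the five steps below is run the same way (clone the repository, then execute `python step.dev.py`), so the whole sequence can be scripted as a loop. A sketch, assuming the repository naming shown in the steps below; the `RUN_PIPELINE` guard is an illustrative safety switch, not a SinaraML convention:

```shell
# The five example steps, in the order they must run.
STEPS="data_load data_prep model_train model_eval model_test"

# Clone a step's repository and execute it.
run_step() {
    repo="house_price-$1"
    git clone --recursive "https://github.com/4-DS/${repo}.git" &&
        (cd "$repo" && python step.dev.py)
}

# Set RUN_PIPELINE=1 to actually execute; off by default so that
# sourcing this sketch does not start cloning repositories.
if [ "${RUN_PIPELINE:-0}" = "1" ]; then
    for step in $STEPS; do
        run_step "$step" || { echo "step $step failed" >&2; exit 1; }
    done
fi
```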
Warning
Creating a new pipeline from the examples is not recommended, since the examples can be outdated and use an old version of the SinaraML Library. Please see the Creating New Pipeline Tutorial
This step downloads a CSV dataset from the internet and converts it to partitioned Parquet files that can later be read efficiently by Apache Spark
To run the step do:
- Clone git repository:
git clone --recursive https://github.com/4-DS/house_price-data_load.git
cd house_price-data_load
- Run step:
python step.dev.py
This step splits the dataset into train, test, and evaluation sets using the partitioned Parquet files made by the data load step
To run the step do:
- Clone git repository:
git clone --recursive https://github.com/4-DS/house_price-data_prep.git
cd house_price-data_prep
- Run step:
python step.dev.py
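The splitting logic can be sketched with NumPy as follows. The 70/15/15 ratio is an illustrative assumption; the real step defines its own split:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility
n_rows = 100
shuffled = rng.permutation(n_rows)    # shuffled row indices

# 70% train, 15% test, 15% evaluation (illustrative ratios).
train_idx = shuffled[:70]
test_idx = shuffled[70:85]
eval_idx = shuffled[85:]
```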
This step
- Trains a GradientBoostingRegressor model on the train set made by the data prep step
- Packs the model into a BentoService using the BentoML library
To run the step do:
- Clone git repository:
git clone --recursive https://github.com/4-DS/house_price-model_train.git
cd house_price-model_train
- Run step:
python step.dev.py
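The training part can be sketched with scikit-learn. Synthetic data stands in for the real train set, and the BentoML packing step is omitted here because its API depends on the installed version:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic regression data standing in for the prepared train set:
# a noisy linear target over three features.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=200)

model = GradientBoostingRegressor(random_state=0)
model.fit(X, y)

# R^2 on the training data; a well-fit model should be close to 1.
train_r2 = model.score(X, y)
```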
This step checks the quality of the model made by the model train step
To run the step do:
- Clone git repository:
git clone --recursive https://github.com/4-DS/house_price-model_eval.git
cd house_price-model_eval
- Run step:
python step.dev.py
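A quality check like this typically computes regression metrics on the evaluation set. A minimal sketch of one common metric, RMSE (the real step may use different metrics):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```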
This step checks that the BentoService's REST API (made by the model train step) is working properly before it is built into a Docker image
To run the step do:
- Clone git repository:
git clone --recursive https://github.com/4-DS/house_price-model_test.git
cd house_price-model_test
- Run step:
python step.dev.py
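A REST smoke test of this kind typically POSTs a JSON feature row to the service and checks the response. A sketch using only the standard library; the URL, endpoint path, and feature names below are hypothetical, and the actual request schema depends on the BentoService:

```python
import json
from urllib import request

def build_predict_payload(rows):
    """Serialize feature rows into a JSON request body
    (illustrative schema: a JSON list of feature dicts)."""
    return json.dumps(rows).encode("utf-8")

def smoke_test(url, rows, timeout=10):
    """Return True if the service answers the prediction request with HTTP 200."""
    req = request.Request(
        url,
        data=build_predict_payload(rows),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return resp.status == 200

# Example (hypothetical endpoint and feature names):
# smoke_test("http://127.0.0.1:5000/predict", [{"rooms": 3, "area": 120}])
```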
Important
The following commands should be executed in the host terminal.
After running all 5 steps we are ready to build a Docker image with a REST API. To build, do the following:
- SinaraML CLI should be installed
- Create docker image for model service - execute:
sinara model containerize
- Enter the path to the BentoService's model.zip inside your running dev Jupyter environment
- Enter the repository URL to push the image to (enter "local" to use the local Docker repository on the machine where Docker is installed)
- When the docker build command finishes, it will output the model image name, which we'll run
- Execute docker run command with the image name from previous step.
Ensure that port 5000 is free on the host system, or choose your own port at which the REST service will be available:
docker run -p 0.0.0.0:5000:5000 %model_image_name%
- The model's Swagger UI should be available at http://127.0.0.1:5000
To create your own pipeline please read Creating New Pipeline Tutorial