Data science production pipeline tutorial
- This tutorial assumes basic knowledge of Python, websockets, and sklearn (basic ML: binary classification using the GaussianNB model). It assumes that you understand the code base, which has been kept as simple as possible. For further understanding, please refer to #4 in the current 'Note' section.
- All communication between the services happens through the message broker RabbitMQ. You can use a different broker you are familiar with by changing the protocol and dependencies. A minimal sketch of the RabbitMQ plumbing is shown after this list.
- You can clone the repository for a Quick Start on this tutorial
- For any queries, discussions, and ideas, please feel free to drop a mail at bmonikraj@gmail.com
- The dataset used in this tutorial is the Connectionist Bench (Sonar, Mines vs. Rocks) dataset, available at http://archive.ics.uci.edu/ml/datasets/connectionist+bench+(sonar,+mines+vs.+rocks)
- The artifacts of the project are => { clf_model.sav, predictor.py } , { service.py }
- The testing script of the project is => { client.py }
- Install all the dependencies of the project on each target system before running any script => Quick note: if you have a recent Anaconda distribution of Python (3.6+) installed, then you only need to install pika
- Make sure the data file is in the same directory as trainer.py => it is also available in the repository
- You will need RabbitMQ for this tutorial. You can install it locally or use the free CloudAMQP service => once you have the RabbitMQ connection URL string, put it in the RABBITMQ_CONN variable in the files { predictor.py, service.py }
- Once training is complete (by running the command mentioned below in #1) and clf_model.sav is generated, { clf_model.sav, predictor.py } should reside in the same directory and can be treated as one logical deployment unit (LDU 1), with their required dependencies, as mentioned in trainer.py and predictor.py, installed
- After installing the dependencies of service.py, it can be treated as another logical deployment unit (LDU 2).
- We can run LDU 1 by taking care of the above-mentioned prerequisites and running the command mentioned below in #2
- We can run LDU 2 by taking care of the above-mentioned prerequisites and running the command mentioned below in #3
- We can test by running the command mentioned below in #4. The test data is hard-coded in client.py; change it as needed.
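Since both predictor.py and service.py talk to the broker through pika, here is a minimal sketch of that shared plumbing. The connection URL and the queue name prediction_requests are placeholders; only RABBITMQ_CONN is named in this tutorial, so treat the rest as assumptions and check the repository code for the real identifiers.

```python
# Minimal pika plumbing shared by predictor.py and service.py (sketch).
import pika

# Paste your RabbitMQ / CloudAMQP connection URL string here.
RABBITMQ_CONN = "amqp://user:password@host/vhost"  # placeholder

connection = pika.BlockingConnection(pika.URLParameters(RABBITMQ_CONN))
channel = connection.channel()

# queue_declare is idempotent, so producer and consumer can both call it.
channel.queue_declare(queue="prediction_requests")  # assumed queue name

# Publish to the default exchange; the routing key is the queue name.
channel.basic_publish(exchange="",
                      routing_key="prediction_requests",
                      body="sample payload")
connection.close()
```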
Running trainer (#1)
python trainer.py
-> This will save clf_model.sav in the same directory as trainer.py. The clf_model.sav file is the artifact which must be kept in the same directory as predictor.py.
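For orientation, here is a minimal sketch of what a trainer like this does. The data file name sonar.all-data and the column layout (60 numeric features, last column holding the 'R'/'M' label) are assumptions based on the UCI sonar dataset; trainer.py in the repository is the reference.

```python
# Sketch: fit a GaussianNB binary classifier on the sonar data and
# persist it as clf_model.sav for predictor.py to load.
import pickle

import pandas as pd
from sklearn.naive_bayes import GaussianNB

data = pd.read_csv("sonar.all-data", header=None)  # assumed file name
X = data.iloc[:, :-1]  # 60 numeric sonar readings per sample
y = data.iloc[:, -1]   # 'R' (rock) / 'M' (mine) labels

clf = GaussianNB().fit(X, y)

# Save the trained model artifact next to the script.
with open("clf_model.sav", "wb") as f:
    pickle.dump(clf, f)
```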
Running predictor (#2)
python predictor.py
-> This will run the predictor application.
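A rough sketch of the predictor side follows, reusing the illustrative queue names from the earlier sketch and assuming each message is a comma-separated string of feature values; the repository's predictor.py is the reference implementation.

```python
# Sketch: load clf_model.sav, consume feature vectors from a request
# queue, and publish the predicted label to a response queue.
import pickle

import pika

RABBITMQ_CONN = "amqp://user:password@host/vhost"  # placeholder

with open("clf_model.sav", "rb") as f:
    clf = pickle.load(f)

connection = pika.BlockingConnection(pika.URLParameters(RABBITMQ_CONN))
channel = connection.channel()
channel.queue_declare(queue="prediction_requests")   # assumed name
channel.queue_declare(queue="prediction_responses")  # assumed name

def on_request(ch, method, properties, body):
    # Assumed message format: "0.02,0.37,..." (60 comma-separated floats).
    features = [float(x) for x in body.decode().split(",")]
    label = clf.predict([features])[0]  # 'R' or 'M'
    ch.basic_publish(exchange="",
                     routing_key="prediction_responses",
                     body=str(label))

channel.basic_consume(queue="prediction_requests",
                      on_message_callback=on_request,
                      auto_ack=True)
channel.start_consuming()
```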
Running service (#3)
python service.py
-> This will run the service application, a websocket application which acts in async mode with the respective client.
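The service bridges websocket clients to the broker. A rough sketch is below, assuming the websockets library (10.1+ handler signature), port 8765, and the same illustrative queue names; the blocking pika calls are tolerable in a single-client sketch, but service.py in the repository is the reference for the real async design.

```python
# Sketch: async websocket server that forwards client payloads to the
# request queue and polls the response queue for the prediction.
import asyncio

import pika
import websockets

RABBITMQ_CONN = "amqp://user:password@host/vhost"  # placeholder

connection = pika.BlockingConnection(pika.URLParameters(RABBITMQ_CONN))
channel = connection.channel()
channel.queue_declare(queue="prediction_requests")   # assumed name
channel.queue_declare(queue="prediction_responses")  # assumed name

async def handler(websocket):
    async for message in websocket:
        # Hand the raw feature string to the predictor via the broker.
        channel.basic_publish(exchange="",
                              routing_key="prediction_requests",
                              body=message)
        # Poll for the predictor's reply without starving the event loop.
        while True:
            _method, _props, body = channel.basic_get(
                queue="prediction_responses", auto_ack=True)
            if body is not None:
                await websocket.send(body.decode())
                break
            await asyncio.sleep(0.1)

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):  # assumed port
        await asyncio.Future()  # run forever

asyncio.run(main())
```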
Running client (#4)
python client.py <host_of_ws_server>:<port>
-> This will run a simple websocket-based client which will connect to the given websocket server (it should connect to an accessible service instance), send test data, and receive its response.
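A matching sketch of the test client, assuming the same comma-separated payload format; the actual hard-coded test data lives in client.py.

```python
# Sketch: connect to the websocket service, send one hard-coded sonar
# feature vector, and print the returned prediction.
# Usage: python client.py <host_of_ws_server>:<port>
import asyncio
import sys

import websockets

async def main():
    uri = "ws://" + sys.argv[1]  # e.g. "localhost:8765"
    # 60 comma-separated readings; a constant vector just for illustration.
    test_data = ",".join(["0.02"] * 60)
    async with websockets.connect(uri) as websocket:
        await websocket.send(test_data)
        print("prediction:", await websocket.recv())

asyncio.run(main())
```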