cdsw-simple-serving-python

This repo aims to be a Python version of cdsw-simple-serving.

This repo has:

  • data preparation with RDDs
  • a simple machine learning pipeline built with Spark ML (spark.ml)
  • export of the built model
  • example web server code for scoring
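
As a rough illustration of the pipeline and export steps listed above, here is a minimal sketch. It is not the repo's actual code: the feature names are taken from the sample request shown later in this README, while the label column `Occupancy`, the choice of `LogisticRegression`, and the helper names `build_pipeline` and `train_and_export` are assumptions made for this example.

```python
# Feature names taken from the sample POST payload in this README.
FEATURE_COLUMNS = ["Temperature", "Humidity", "Light", "CO2", "HumidityRatio"]

# Assumption: the label column name is hypothetical, not confirmed by the repo.
LABEL_COLUMN = "Occupancy"


def build_pipeline():
    """Build an unfitted spark.ml pipeline: feature assembly, then a classifier."""
    # Imported lazily so this module can be loaded without pyspark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    # Combine the numeric input columns into a single vector column.
    assembler = VectorAssembler(inputCols=FEATURE_COLUMNS, outputCol="features")
    # Assumption: logistic regression stands in for whatever model the repo trains.
    classifier = LogisticRegression(featuresCol="features", labelCol=LABEL_COLUMN)
    return Pipeline(stages=[assembler, classifier])


def train_and_export(training_df, model_path):
    """Fit the pipeline on a DataFrame and save the fitted model to model_path."""
    model = build_pipeline().fit(training_df)
    # Overwrite any model previously saved at the same path.
    model.write().overwrite().save(model_path)
    return model
```

Calling `build_pipeline()` requires an active Spark session, so in practice both helpers would run inside a script launched with spark-submit.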

Currently, this repo does not have the following feature:

  • export of the built model as PMML

Requirements

pip install -r requirements.txt -c constraints.txt

Set the following environment variable:

  • HDFS_HOST for handling HDFS files via the hdfs package
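
The following sketch shows one way a script could read HDFS_HOST and hand it to the hdfs package. It assumes that HDFS_HOST holds only a hostname and that WebHDFS listens on port 50070 (the Hadoop 2 default); the repo's own code may build the URL differently. The helper names `webhdfs_url` and `make_client` are hypothetical.

```python
import os


def webhdfs_url(port=50070):
    """Build a WebHDFS URL from the HDFS_HOST environment variable.

    Assumption: HDFS_HOST is a bare hostname and the port is supplied here.
    """
    host = os.environ.get("HDFS_HOST")
    if not host:
        raise RuntimeError("HDFS_HOST environment variable is not set")
    return "http://{0}:{1}".format(host, port)


def make_client():
    """Create an hdfs client for the configured host."""
    # Imported lazily so the URL helper works without the hdfs package installed.
    from hdfs import InsecureClient

    return InsecureClient(webhdfs_url())
```

Failing early with a clear error when the variable is missing makes a misconfigured job easier to diagnose than a connection error later on.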

You can use this repo:

  • as a template for collaboration between data engineers and data scientists
  • as an example of creating job dependencies from data preparation to model serving

How to run the sample web app

  1. Create a virtualenv for your app: virtualenv -p python2 venv && source ./venv/bin/activate
  2. Install the dependencies: pip install -r requirements-webapp.txt
  3. Run the example app: spark-submit serving/web_app.py

Then you can POST data as follows:

$ curl -v -H "Accept: application/json" -H "Content-type: application/json" -X POST -d '{"Temperature":23.18,"Humidity":27.272,"Light":426,"CO2":721.25,"HumidityRatio":0.00478}' http://localhost:5000/api/predict
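
The same request can be sent from Python using only the standard library. This is a sketch under the assumption that the app is listening on localhost:5000 as in the curl command above; the helper names `build_predict_request` and `predict` are hypothetical, and the response is returned as raw text because its format is not documented here.

```python
import json

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

PREDICT_URL = "http://localhost:5000/api/predict"

SAMPLE = {
    "Temperature": 23.18,
    "Humidity": 27.272,
    "Light": 426,
    "CO2": 721.25,
    "HumidityRatio": 0.00478,
}


def build_predict_request(features, url=PREDICT_URL):
    """Build a JSON request; supplying a body makes urllib send it as a POST."""
    body = json.dumps(features).encode("utf-8")
    headers = {"Accept": "application/json", "Content-Type": "application/json"}
    return Request(url, body, headers)


def predict(features, url=PREDICT_URL):
    """Send the features to the scoring endpoint and return the raw response text."""
    response = urlopen(build_predict_request(features, url))
    try:
        return response.read().decode("utf-8")
    finally:
        response.close()
```

With the web app running, `predict(SAMPLE)` sends the same payload as the curl command.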

Or, if you want to use gunicorn:

  1. Install the dependencies: pip install -r requirements-webapp.txt
  2. Download the Spark repo
  3. Install the pyspark dependencies: cd some-spark-directory/python && pip install -e .
  4. Run the example app: cd serving; gunicorn web_app:app --log-file -