# fastai with Tabular Data

This notebook is based on lesson 4 of fastai's course v3. We are going to train a model that predicts a salary range (`<50k` or `>=50k`) from the data provided.


In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

!pip install fastai
!pip install bentoml





In [2]:
from fastai.tabular import *

## Prepare Training Data

In [3]:
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')

In [4]:
dep_var = 'salary'
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [FillMissing, Categorify, Normalize]

In [5]:
test = TabularList.from_df(df.iloc[800:1000].copy(), path=path, cat_names=cat_names, cont_names=cont_names)

In [6]:
data = (TabularList.from_df(df, path=path, cat_names=cat_names, cont_names=cont_names, procs=procs)
                           .split_by_idx(list(range(800,1000)))
                           .label_from_df(cols=dep_var)
                           .add_test(test)
                           .databunch())

In [7]:
data.show_batch(rows=10)

workclass,education,marital-status,occupation,relationship,race,education-num_na,age,fnlwgt,education-num,target
Federal-gov,HS-grad,Never-married,Prof-specialty,Not-in-family,White,False,-1.1425,1.238,-0.4224,<50k
Local-gov,Bachelors,Divorced,Prof-specialty,Not-in-family,Black,False,1.7161,0.5628,1.1422,<50k
Self-emp-not-inc,Some-college,Married-civ-spouse,Farming-fishing,Husband,White,False,-0.2629,1.2336,-0.0312,<50k
Self-emp-not-inc,Bachelors,Married-civ-spouse,Exec-managerial,Husband,White,False,0.7632,-0.7372,1.1422,>=50k
Private,Some-college,Married-civ-spouse,Tech-support,Husband,White,False,-0.776,-1.1808,-0.0312,<50k
Private,HS-grad,Never-married,Craft-repair,Own-child,White,False,-1.1425,-1.2604,-0.4224,<50k
Private,7th-8th,Married-civ-spouse,Craft-repair,Husband,White,False,2.0826,-0.3717,-2.3781,<50k
Private,Some-college,Never-married,#na#,Own-child,White,False,-1.2158,0.2702,-0.0312,<50k
Without-pay,HS-grad,Never-married,Craft-repair,Own-child,Black,False,-1.2891,0.4077,-0.4224,<50k
Private,HS-grad,Married-civ-spouse,Transport-moving,Husband,White,False,-0.4095,-1.0861,-0.4224,<50k
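
The continuous columns (`age`, `fnlwgt`, `education-num`) appear as small signed numbers in the batch above because the `Normalize` proc rescales them. A minimal sketch of that kind of standardization, assuming a plain z-score (subtract the mean, divide by the standard deviation). The sample ages are made up for illustration, and the exact statistics fastai computes may differ:

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]

# Hypothetical ages, not taken from the dataset.
ages = [25, 38, 49, 52, 61]
z_ages = standardize(ages)

for age, z in zip(ages, z_ages):
    print(f"{age:>3} -> {z:+.4f}")
```

Values below the column mean come out negative and values above it positive, which is the pattern visible in the batch output.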


## Model Training

In [8]:
learn = tabular_learner(data, layers=[200,100], metrics=accuracy)

In [9]:
learn.fit(1, 1e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.363818,0.390632,0.805,00:03


In [10]:
row = df.iloc[0] # sample input data for testing

learn.predict(row)

(Category >=50k, tensor(1), tensor([0.2878, 0.7122]))
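
`learn.predict` returns three values: the predicted category, its class index, and the per-class probabilities. The sketch below mirrors that relationship with plain Python lists instead of tensors. The class order `['<50k', '>=50k']` is inferred from the outputs in this notebook, and `interpret` is a hypothetical helper, not a fastai function:

```python
# Class order inferred from the notebook outputs:
# index 0 is '<50k', index 1 is '>=50k'.
CLASSES = ["<50k", ">=50k"]

def interpret(probs):
    """Return (label, class index, probability) for a list of class probabilities."""
    idx = max(range(len(probs)), key=lambda i: probs[i])  # argmax
    return CLASSES[idx], idx, probs[idx]

# Probabilities copied from the prediction above.
label, idx, confidence = interpret([0.2878, 0.7122])
print(label, idx, confidence)  # >=50k 1 0.7122
```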

## Create BentoService for model serving

In [11]:
%%writefile tabular_csv.py

from bentoml import env, api, artifacts, BentoService
from bentoml.artifact import FastaiModelArtifact
from bentoml.handlers import DataframeHandler


@env(conda_environment=['fastai'])
@artifacts([FastaiModelArtifact('model')])
class TabularModel(BentoService):
    
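    @api(DataframeHandler)
    def predict(self, df):
        # DataframeHandler parses the request body (JSON or CSV) into a pandas DataFrame.
        result = []
        # The fastai learner predicts one row at a time.
        for _, row in df.iterrows():
            result.append(self.artifacts.model.predict(row))
        # Each prediction is a (category, class index, probabilities) tuple; return them as text.
        return str(result)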

Overwriting tabular_csv.py


## Save BentoService to file archive

In [12]:
# 1) import the custom BentoService defined above
from tabular_csv import TabularModel

# 2) `pack` it with required artifacts
svc = TabularModel.pack(model=learn)

# 3) save your BentoService
saved_path = svc.save()

[2019-09-17 14:49:16,473] INFO - Successfully saved Bento 'TabularModel:2019_09_17_e8f3521a' to path: /Users/chaoyuyang/bentoml/repository/TabularModel/2019_09_17_e8f3521a


## Install saved BentoService as PyPI package

In [13]:
!pip install {saved_path}

Processing /Users/chaoyuyang/bentoml/repository/TabularModel/2019_09_17_e8f3521a
Building wheels for collected packages: TabularModel
  Building wheel for TabularModel (setup.py) ... done
  Stored in directory: /private/var/folders/ns/vc9qhmqx5dx_9fws7d869lqh0000gn/T/pip-ephem-wheel-cache-dm18zls6/wheels/d8/22/b3/193cd35f0ca411b7962b7e2ea98b8ab919ceb2e8e8ddd5383b
Successfully built TabularModel
Installing collected packages: TabularModel
  Found existing installation: TabularModel 2019-09-17-7d8feac4
    Uninstalling TabularModel-2019-09-17-7d8feac4:
      Successfully uninstalled TabularModel-2019-09-17-7d8feac4
Successfully installed TabularModel-2019-09-17-e8f3521a


In [14]:
# Use JSON data
!TabularModel predict --input=test.json

[(Category <50k, tensor(0), tensor([0.6706, 0.3294]))]


In [15]:
# Use CSV data
!TabularModel predict --input=test.csv

[(Category >=50k, tensor(1), tensor([0.2878, 0.7122]))]
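
The `test.json` and `test.csv` files used above are not created anywhere in this notebook. A sketch of how such files could be written with the standard library, assuming each holds one record with the same columns as `adult.csv`. The values are copied from the request examples later in this notebook, `write_test_files` is a hypothetical helper, and the exact contents of the original files are unknown:

```python
import csv
import json
from pathlib import Path

# One record with the same columns as adult.csv (values are an assumption).
record = {
    "age": 49,
    "workclass": "Private",
    "fnlwgt": 101320,
    "education": "Assoc-acdm",
    "education-num": 12.0,
    "marital-status": "Married-civ-spouse",
    "occupation": "",
    "relationship": "Wife",
    "race": "White",
    "sex": "Female",
    "capital-gain": 0,
    "capital-loss": 1902,
    "hours-per-week": 40,
    "native-country": "United-States",
    "salary": ">=50k",
}

def write_test_files(out_dir="."):
    """Write test.json (a list of records) and test.csv (header plus one row)."""
    out = Path(out_dir)
    json_path = out / "test.json"
    csv_path = out / "test.csv"
    json_path.write_text(json.dumps([record], indent=2))
    with csv_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(record))
        writer.writeheader()
        writer.writerow(record)
    return json_path, csv_path

json_path, csv_path = write_test_files()
```

The JSON file holds a list of records, matching the shape of the JSON request body shown in the REST API section.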


## Model Serving via REST API

*Note: Running a local REST API server does not work in Google Colab. Please copy this notebook and run it locally.*

In [16]:
!bentoml serve {saved_path}

 * Serving Flask app "TabularModel" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [17/Sep/2019 14:49:46] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [17/Sep/2019 14:49:47] "GET /docs.json HTTP/1.1" 200 -
127.0.0.1 - - [17/Sep/2019 14:49:52] "POST /predict HTTP/1.1" 200 -
^C


### Send prediction request to the REST API server

#### JSON Request

```bash
curl -X POST \
  http://localhost:5000/predict \
  -H 'Content-Type: application/json' \
  -d '[{
  "age": 49,
  "workclass": "Private",
  "fnlwgt": 101320,
  "education": "Assoc-acdm",
  "education-num": 12.0,
  "marital-status": "Married-civ-spouse",
  "occupation": "",
  "relationship": "Wife",
  "race": "White",
  "sex": "Female",
  "capital-gain": 0,
  "capital-loss": 1902,
  "hours-per-week": 40,
  "native-country": "United-States",
  "salary": ">=50k"
}]'
```

#### CSV Request

```bash
curl -X POST \
  http://localhost:5000/predict \
  -H 'Content-Type: text/csv' \
  -d 'age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
49, Private,101320, Assoc-acdm,12.0, Married-civ-spouse,, Wife, White, Female,0,1902,40, United-States,>=50k'
```
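
#### Python Request

The same JSON request can be sent from Python. A minimal sketch using only the standard library, assuming the server started by `bentoml serve` is listening on `http://localhost:5000`. `build_request` and `send` are hypothetical helper names, and the row includes only the feature columns the model was trained on, which is assumed to be sufficient:

```python
import json
import urllib.request

PREDICT_URL = "http://localhost:5000/predict"  # endpoint from the curl examples

def build_request(records, url=PREDICT_URL):
    """Build a POST request whose JSON body is a list of row dictionaries."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(request, timeout=10):
    """Send the request and return the response body as text (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")

row = {
    "age": 49,
    "workclass": "Private",
    "fnlwgt": 101320,
    "education": "Assoc-acdm",
    "education-num": 12.0,
    "marital-status": "Married-civ-spouse",
    "occupation": "",
    "relationship": "Wife",
    "race": "White",
}

request = build_request([row])
# With the server running: print(send(request))
```

Keeping request construction separate from sending makes it possible to inspect the payload without a running server.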