<div class="alert alert-block alert-info">
# <font size = 30>anomaly → PMML (using Nyoka) </font>

<div class="alert alert-block alert-success">
### <font color=purple> Exporter: Anomaly Detection models (OneClassSVM) </font>
### <font color=purple> Data Set used: iris </font>


### <font color=purple>**STEPS**: </font>
<font color=brown>
- Build the model using sklearn OneClassSVM
- Build PMML (Data Dictionary, Mining schema, Ouput, PMML) using Nyoka classes
- Upload PMML into Zementis using REST API and perform predictions using test data set
- Predict using original sklearn model using test data set
- Compare both the predictions

In [1]:
# Jupyter cells: 100% width 
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
%config NotebookApp.iopub_data_rate_limit = 7000000000000.0 
%config NotebookApp.rate_limit_window=60.0

<div class="alert alert-block alert-warning">
# Pre-processing, Model building (using pipeline) for iris data set

In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from requests.auth import HTTPBasicAuth
import requests
import json
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, Imputer
from sklearn.svm import OneClassSVM

irisdata = datasets.load_iris()
iris = pd.DataFrame(irisdata.data,columns=irisdata.feature_names)
iris['Species'] = irisdata.target

feature_names = iris.columns.drop('Species')

X_train, X_test, y_train, y_test = train_test_split(iris[iris.columns.drop(['Species'])], 
                                                    iris['Species'], test_size=0.33, random_state=101)

X_test.to_csv("iris_test.csv")


pipe = Pipeline([('standard_scaler',StandardScaler()), ('Imputer',Imputer()), ('model',OneClassSVM())])

pipe.fit(X_train)

print("\n","Anomaly detection model is built successfully.")


 Anomaly detection model is built successfully.


<div class="alert alert-block alert-warning">
# Export the Pipeline object into PMML using the Nyoka package

In [3]:
from nyoka import skl_to_pmml
skl_to_pmml(pipeline=pipe, col_names=feature_names, pmml_f_name="anomaly_model.pmml")

<div class="alert alert-block alert-warning">
# PMML Upload using REST APIs

<div class="alert alert-block alert-warning">
### Adapa engine URL and authentication

In [3]:
# Set the URL and auth
url = "http://dcindgo01:8083/adapars/"
auth = HTTPBasicAuth('Administrator', 'manage')

<div class="alert alert-block alert-warning">
### Check for the model existence

In [5]:
# Check if the model is already uploaded into Zementis, if yes delete it
response = requests.get(url + "models/", auth=auth)
if 'OneClassSVM' in json.loads(response.text)['models']:
    requests.delete(url + "model/OneClassSVM",auth=auth)
    print("Model with the same name was deleted from Zementis! You can go ahead and upload your pmml file...")
else:
    print("OneClassSVM model does not exist in Zementis! You can go ahead and upload your pmml file...")

Model with the same name was deleted from Zementis! You can go ahead and upload your pmml file...


<div class="alert alert-block alert-warning">
### Upload the new PMML file

In [6]:
# Upload the PMML into Zementis 
pmml_file = open("anomaly_model.pmml","r")
pmml = {'file': pmml_file}

pmml_upload = requests.post(url + "models/", files=pmml, auth=auth)

print(pmml_upload.status_code)
print(pmml_upload.text)

201
{
  "modelName" : "OneClassSVM",
  "description" : "Default description",
  "isActive" : true,
  "inputFields" : [ {
    "usage" : "SUPPLEMENTARY",
    "name" : "sepal length (cm)",
    "type" : "DOUBLE"
  }, {
    "usage" : "SUPPLEMENTARY",
    "name" : "sepal width (cm)",
    "type" : "DOUBLE"
  }, {
    "usage" : "SUPPLEMENTARY",
    "name" : "petal length (cm)",
    "type" : "DOUBLE"
  }, {
    "usage" : "SUPPLEMENTARY",
    "name" : "petal width (cm)",
    "type" : "DOUBLE"
  } ],
  "outputFields" : [ {
    "usage" : "OUTPUT",
    "name" : "anomalyScore",
    "type" : "FLOAT"
  }, {
    "usage" : "OUTPUT",
    "name" : "anomaly",
    "type" : "BOOLEAN"
  } ]
}


<div class="alert alert-block alert-warning">
### Extract the model name from model properties

In [7]:
# Extract the model name from model properties
model_properties = json.loads(pmml_upload.text)
model_name = model_properties['modelName']
print(model_name)

OneClassSVM


<div class="alert alert-block alert-warning">
# Predictions using REST APIs

<div class="alert alert-block alert-warning">
## Single record prediction

In [8]:
# Create a single record for scoring
single_rec = X_test.iloc[21].to_dict()
print(single_rec,"\n")

for i in single_rec.keys():
    single_rec[i] = int(single_rec[i])

record = json.dumps(single_rec)

predictions_data = requests.put(url + "apply/" + model_name, data=record, auth=auth)
print("Status code:",predictions_data.status_code)
json.loads(predictions_data.text)

{'sepal length (cm)': 5.3, 'sepal width (cm)': 3.7, 'petal length (cm)': 1.5, 'petal width (cm)': 0.2} 

Status code: 200


{'model': 'OneClassSVM',
 'outputs': [{'anomalyScore': -1.5678122, 'anomaly': True}]}

<div class="alert alert-block alert-warning">
# Multiple records prediction

In [9]:
# Read the test data into Pandas data frame and display first 5 rows
df = pd.read_csv("iris_test.csv")
df.head()

Unnamed: 0.1,Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
0,33,5.5,4.2,1.4,0.2
1,16,5.4,3.9,1.3,0.4
2,43,5.0,3.5,1.6,0.6
3,129,7.2,3.0,5.8,1.6
4,50,7.0,3.2,4.7,1.4


In [10]:
# Perform the predictions using test data
test_file = open("iris_test.csv","r")
test_csv = {'file':test_file}

predictions_data = requests.post(url + "apply/" + model_name, files=test_csv, auth=auth)

# Compare the predictions of Zementis against original sklearn model
predictions = predictions_data.text.split()
predictions.pop(0)

anomaly_scores = [predictions[i].split(",")[0] for i in range(len(predictions))]
predictions = [predictions[i].split(",")[1] for i in range(len(predictions))]
original = list(pipe.predict(X_test))

def convert_anomaly(val):
    if val == 'false':
        return 1
    else:
        return -1
    
results = list(map(convert_anomaly, predictions))

anomaly_df = pd.DataFrame(pd.Series(anomaly_scores), columns=['Anomaly_Score'])
anomaly_df['Anomaly'] = results

#anomaly_df.head()

In [11]:
print("Do Zementis and sklearn predictions match? :", np.array_equal(results,original), "\n")
print("Number of unmatched predictions:", len([(a,b) for (a,b) in zip(results,original) if a!=b]))

Do Zementis and sklearn predictions match? : True 

Number of unmatched predictions: 0
