<div class="alert alert-block alert-info">
<font size = 30>scikit-learn → PMML (using Nyoka) </font>

<div class="alert alert-block alert-success">
### <font color=purple> Exporter: Logistic Regression </font> <br>
### <font color=purple> Data Set used: Auto-mpg Dataset </font>


### <font color=purple>**STEPS**: </font>
<font color=brown>
<br>- Build the model using scikit-learn's LogisticRegression
<br>- Perform preprocessing using Pipeline and DataFrameMapper
<br>- Build the PMML (Data Dictionary, Mining Schema, Output, PMML) using Nyoka classes
<br>- Upload the PMML to Zementis via its REST API and score the test data set
<br>- Predict on the test data set with the original scikit-learn model
<br>- Compare the two sets of predictions<br> </font>
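To make the PMML sections named in the steps concrete, here is a minimal sketch that builds a bare PMML outline with Python's standard `xml.etree.ElementTree`. It is illustrative only: it is not what Nyoka emits, it assumes the PMML 4.4 namespace, and it omits the `RegressionTable` elements a real logistic regression model needs, so the result is well-formed XML but not a scoreable model. The helper names `q` and `pmml_skeleton` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: PMML 4.4 namespace; other Nyoka releases may write a different PMML version.
PMML_NS = "http://www.dmg.org/PMML-4_4"
ET.register_namespace("", PMML_NS)  # serialize with a default xmlns instead of ns0:


def q(tag):
    """Qualify a tag name with the PMML namespace, as ElementTree expects."""
    return f"{{{PMML_NS}}}{tag}"


def pmml_skeleton(feature_names, target_name, target_values):
    """Hypothetical helper: an outline of a PMML file, without any model coefficients."""
    root = ET.Element(q("PMML"), version="4.4")
    ET.SubElement(root, q("Header"), description="illustrative skeleton")

    # DataDictionary: one DataField per input column, plus the categorical target.
    dd = ET.SubElement(root, q("DataDictionary"),
                       numberOfFields=str(len(feature_names) + 1))
    for name in feature_names:
        ET.SubElement(dd, q("DataField"), name=name,
                      optype="continuous", dataType="double")
    target = ET.SubElement(dd, q("DataField"), name=target_name,
                           optype="categorical", dataType="integer")
    for value in target_values:
        ET.SubElement(target, q("Value"), value=str(value))

    # The model element owns the MiningSchema (which fields it uses) and the Output.
    model = ET.SubElement(root, q("RegressionModel"), functionName="classification")
    schema = ET.SubElement(model, q("MiningSchema"))
    for name in feature_names:
        ET.SubElement(schema, q("MiningField"), name=name)  # usageType defaults to "active"
    ET.SubElement(schema, q("MiningField"), name=target_name, usageType="target")
    output = ET.SubElement(model, q("Output"))
    ET.SubElement(output, q("OutputField"), name=f"predicted_{target_name}",
                  feature="predictedValue")
    return root


doc = pmml_skeleton(["cylinders", "displacement"], "origin", [1, 2, 3])
print(ET.tostring(doc, encoding="unicode"))
```

Comparing this outline with a file written by `skl_to_pmml` is a quick way to locate the DataDictionary, MiningSchema and Output sections in the real export.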

In [1]:
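# Jupyter cells: 100% width
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
%config Completer.use_jedi = False
# NotebookApp is a server-side setting: from inside the kernel this line has no effect;
# pass --NotebookApp.iopub_data_rate_limit=... when starting Jupyter instead
%config NotebookApp.iopub_data_rate_limit = 7000000000.0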

<div class="alert alert-block alert-warning">
# Importing dependencies

In [3]:
from nyoka import skl_to_pmml
from zementis_calls import ZementisCalls

In [4]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn_pandas import DataFrameMapper

<div class="alert alert-block alert-warning">
<b><font size =5 color="brown">Model building and pre-processing (using Pipeline) with Logistic Regression</font></b>

<div class="alert alert-block alert-warning">
# Setting up the Zementis REST API calls and the comparison process

In [5]:
def get_pmlpredictions(pmml_f_name, test_csv_f_name):
    call = ZementisCalls()
    if not call.authentication():
        print('authentication failed')
        return None
    model_name = pmml_f_name[:-5]  # use the PMML file name (minus '.pmml') as the model name
    call.check_model_existence(model_name)
    upload_status = call.pmml_upload(pmml_f_name)
    if upload_status != 201:  # HTTP 201 Created signals a successful upload
        print('upload failed with status', upload_status)
        return None
    print('upload successful')
    return np.asarray(call.get_predictions(test_csv_f_name, model_name))
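`get_pmlpredictions` derives the model name with `pmml_f_name[:-5]`, which assumes the name ends in `.pmml` and keeps any directory prefix (`'out/logt_auto_1.pmml'` would become `'out/logt_auto_1'`). A slightly more defensive alternative, sketched with `pathlib`, follows; `model_name_from_pmml` is a hypothetical helper, not part of `zementis_calls`.

```python
from pathlib import Path


def model_name_from_pmml(pmml_f_name):
    """Hypothetical helper: file name without its directory and '.pmml' suffix."""
    path = Path(pmml_f_name)
    if path.suffix.lower() != ".pmml":
        raise ValueError(f"expected a .pmml file, got {pmml_f_name!r}")
    return path.stem


print(model_name_from_pmml("logt_auto_1.pmml"))      # logt_auto_1
print(model_name_from_pmml("out/logt_auto_1.pmml"))  # logt_auto_1
print("out/logt_auto_1.pmml"[:-5])                   # out/logt_auto_1 (slicing keeps the folder)
```

Raising on an unexpected suffix surfaces a wrong file name immediately instead of silently uploading under a truncated model name.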
            

In [6]:
def compare_predictions(skl_predictions, z_predictions, mdl_type='regression'):
    if mdl_type == 'regression':
        # Zementis results may arrive as strings; cast before the tolerance-based check
        z_predictions = z_predictions.astype(np.float64)
        consistent = np.allclose(skl_predictions, z_predictions)
    elif mdl_type == 'classification':
        # class labels must match exactly
        z_predictions = list(map(int, z_predictions))
        consistent = np.array_equal(skl_predictions, z_predictions)
    else:
        raise ValueError("mdl_type must be 'regression' or 'classification'")
    print('No inconsistent predictions' if consistent else 'Inconsistent predictions')
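`compare_predictions` only prints a yes/no verdict. When the verdict is "Inconsistent predictions", it helps to know which rows differ. The sketch below is a hypothetical extension (`mismatch_report` is not part of the notebook) that returns the indices of disagreeing rows; it parses labels through `float` first, so a response such as `'1.0'` would also be accepted, whereas `int('1.0')` in the original raises an error.

```python
import numpy as np


def mismatch_report(skl_predictions, z_predictions, mdl_type="classification"):
    """Hypothetical helper: indices where scikit-learn and Zementis predictions differ."""
    skl = np.asarray(skl_predictions)
    z = np.asarray(z_predictions)
    if skl.shape != z.shape:
        raise ValueError(f"shape mismatch: {skl.shape} vs {z.shape}")
    if mdl_type == "classification":
        # Parse via float so labels such as "1" or "1.0" both become the integer 1.
        same = skl.astype(np.int64) == z.astype(np.float64).astype(np.int64)
    elif mdl_type == "regression":
        same = np.isclose(skl.astype(np.float64), z.astype(np.float64))
    else:
        raise ValueError(f"unknown mdl_type: {mdl_type!r}")
    return np.flatnonzero(~same)


skl = np.array([1, 3, 2, 1])
zem = np.array(["1", "3", "1.0", "1"])  # made-up values, not output from this notebook
print(mismatch_report(skl, zem))  # [2]
```

With the notebook's objects, `mismatch_report(skl_predictions_1, pml_predictions_1)` would return an empty array when the two engines agree.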

<div class="alert alert-block alert-warning">
# Auto-mpg dataset: classifying the 'origin' attribute

In [45]:
df = pd.read_csv('auto-mpg.csv')
# plain list of input column names (avoids the private Index._data attribute)
feature_names = df.columns.drop(['car name', 'origin']).tolist()
target_name = 'origin'
x_train, x_test, y_train, y_test = train_test_split(df[feature_names], df[target_name], test_size=0.33)
test_csv_f_name = 'auto_test.csv'
x_test.to_csv(test_csv_f_name, index=False)  # features only, no index column

<br><div class="alert alert-block alert-warning">
<b><font size = 5 color="brown">Case 1: Creating a Pipeline with only an estimator</font></b><br>

In [46]:
pipeline_1 = Pipeline([
    ('mdl_name', LogisticRegression())
])

<div class="alert alert-block alert-warning">
PMML file name

In [47]:
pmml_f_name_1 = 'logt_auto_1.pmml'

<div class="alert alert-block alert-warning">
Fitting the pipeline instance

In [48]:
pipeline_1.fit(x_train, y_train)

Pipeline(memory=None,
     steps=[('mdl_name', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False))])
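The fitted estimator above uses `multi_class='ovr'`: one binary logistic model per class. Its `predict` picks the class whose linear score `x · w_k + b_k` is largest, and the exported PMML has to reproduce that rule for the comparison to pass. The numpy sketch below restates the rule from a fitted model's `coef_`, `intercept_` and `classes_`. The numbers are made up, it assumes three or more classes (with two classes scikit-learn stores a single coefficient row), and how Nyoka lays this out in PMML is not shown here.

```python
import numpy as np


def ovr_predict(X, coef, intercept, classes):
    """Restate LogisticRegression(multi_class='ovr').predict for 3+ classes."""
    scores = X @ coef.T + intercept            # shape (n_samples, n_classes)
    return classes[np.argmax(scores, axis=1)]  # highest score wins


def ovr_predict_proba(X, coef, intercept):
    """OvR probabilities: a sigmoid per class, renormalised to sum to 1 per row."""
    scores = X @ coef.T + intercept
    probs = 1.0 / (1.0 + np.exp(-scores))
    return probs / probs.sum(axis=1, keepdims=True)


# Made-up parameters for a 2-feature, 3-class model (not fitted on auto-mpg).
coef = np.array([[1.0, -0.5], [-1.0, 0.2], [0.0, 0.3]])
intercept = np.array([0.1, -0.2, 0.0])
classes = np.array([1, 2, 3])
X = np.array([[2.0, 1.0], [-1.0, 0.5]])
print(ovr_predict(X, coef, intercept, classes))  # [1 2]
```

With the notebook's objects, `lr = pipeline_1.named_steps['mdl_name']` followed by `ovr_predict(x_test.values, lr.coef_, lr.intercept_, lr.classes_)` should agree with `skl_predictions_1`. Because the sigmoid is increasing, the argmax of the probabilities picks the same class as the argmax of the raw scores.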

<div class="alert alert-block alert-warning">
Export the Pipeline object into PMML using the Nyoka package

In [49]:
skl_to_pmml(pipeline=pipeline_1,
          col_names=feature_names,
          target_name=target_name,
          pmml_f_name=pmml_f_name_1)
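Assuming Zementis matches the columns of `auto_test.csv` to the model's input fields by name, a mismatch between the CSV header and the PMML MiningSchema is an easy way to break scoring. The hypothetical helper `missing_columns` below uses only the standard library to list MiningSchema input fields that the CSV header lacks. The inline PMML and CSV strings are made-up stand-ins for the files written above.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def active_fields(pmml_text):
    """Names of MiningField elements used as inputs (usageType absent or 'active')."""
    root = ET.fromstring(pmml_text)
    return [el.get("name") for el in root.iter()
            if local_name(el.tag) == "MiningField"
            and el.get("usageType", "active") == "active"]


def missing_columns(csv_text, pmml_text):
    """Active PMML input fields that do not appear in the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in active_fields(pmml_text) if name not in header]


# Made-up stand-ins for the exported PMML and the test CSV.
pmml_text = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <RegressionModel functionName="classification">
    <MiningSchema>
      <MiningField name="displacement"/>
      <MiningField name="weight"/>
      <MiningField name="origin" usageType="target"/>
    </MiningSchema>
  </RegressionModel>
</PMML>"""
csv_text = "cylinders,displacement,horsepower\n8,307.0,130\n"
print(missing_columns(csv_text, pmml_text))  # ['weight']
```

Against the real files, pass `open(pmml_f_name_1).read()` and `open(test_csv_f_name).read()`; an empty list means every input field has a matching column.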

<div class="alert alert-block alert-warning">
Storing the scikit-learn predictions

In [50]:
skl_predictions_1 = pipeline_1.predict(x_test)

<div class="alert alert-block alert-warning">
Getting the predictions from the Zementis server, using the exported PMML as input

In [51]:
pml_predictions_1 = get_pmlpredictions(pmml_f_name_1,test_csv_f_name)

upload successful


<div class="alert alert-block alert-warning">
Comparing the two sets of predictions

In [52]:
compare_predictions(skl_predictions_1, pml_predictions_1, mdl_type='classification')

No inconsistent predictions


<br><div class="alert alert-block alert-warning">
<b><font size = 5 color="brown">Case 2: Creating a Pipeline with a pipeline step and an estimator</font></b> <br>

In [53]:
pipeline_2 = Pipeline([
    ("p_step0",StandardScaler()),
    ('mdl_name', LogisticRegression())
])
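In Case 2 the `StandardScaler` step learns a per-column mean and standard deviation on `x_train` and applies `z = (x - mean) / scale` at prediction time; the exported PMML must carry those learned values so that Zementis applies the same transformation before the logistic regression. A hand-written numpy version makes the arithmetic explicit. The helper name `standardize` is hypothetical, and the exact PMML elements Nyoka uses for this step are not shown here.

```python
import numpy as np


def standardize(X_train, X_new):
    """Hand-written equivalent of StandardScaler() fitted on X_train, applied to X_new."""
    mean = X_train.mean(axis=0)
    scale = X_train.std(axis=0)      # population standard deviation (ddof=0)
    scale[scale == 0.0] = 1.0        # constant columns are left unscaled
    return (X_new - mean) / scale


X_train = np.array([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])  # made-up data
print(standardize(X_train, np.array([[3.0, 10.0]])))  # [[0. 0.]]
```

For the fitted pipeline, the learned values are available as `pipeline_2.named_steps['p_step0'].mean_` and `.scale_`.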

<div class="alert alert-block alert-warning">
PMML file name

In [54]:
pmml_f_name_2 = 'logt_auto_2.pmml'

<div class="alert alert-block alert-warning">
Fitting the pipeline instance

In [55]:
pipeline_2.fit(x_train, y_train)

Pipeline(memory=None,
     steps=[('p_step0', StandardScaler(copy=True, with_mean=True, with_std=True)), ('mdl_name', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False))])

<div class="alert alert-block alert-warning">
Export the Pipeline object into PMML using the Nyoka package

In [56]:
skl_to_pmml(pipeline=pipeline_2,
          col_names=feature_names,
          target_name=target_name,
          pmml_f_name=pmml_f_name_2)

<div class="alert alert-block alert-warning">
Storing the scikit-learn predictions

In [57]:
skl_predictions_2 = pipeline_2.predict(x_test)

<div class="alert alert-block alert-warning">
Getting the predictions from the Zementis server, using the exported PMML as input

In [58]:
pml_predictions_2 = get_pmlpredictions(pmml_f_name_2,test_csv_f_name)

upload successful


<div class="alert alert-block alert-warning">
Comparing the two sets of predictions

In [59]:
compare_predictions(skl_predictions_2, pml_predictions_2, mdl_type='classification')

No inconsistent predictions


<br><div class="alert alert-block alert-warning">
<b><font size = 5 color="brown">Case 3: Creating a Pipeline with two pipeline steps and an estimator</font></b> <br>

In [60]:
pipeline_3 = Pipeline([
    ("p_step0", StandardScaler()),
    ('p_step1', MinMaxScaler()),
    ('mdl_name', LogisticRegression()),
])
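One property of the Case 3 pipeline is worth knowing: `StandardScaler` is an increasing affine map per column and `MinMaxScaler` is fitted on its output, so the two steps together are mathematically the same as `(x - x_min) / (x_max - x_min)` on the raw column, i.e. `MinMaxScaler` alone (assuming no constant columns). The case is still a useful export test, since the PMML must chain both transformations correctly. The numpy check below uses synthetic data and hypothetical helper names.

```python
import numpy as np


def standard_then_minmax(X_train, X_new):
    """StandardScaler then MinMaxScaler(feature_range=(0, 1)), both fitted on X_train."""
    mean, std = X_train.mean(axis=0), X_train.std(axis=0)
    Z_train = (X_train - mean) / std                          # StandardScaler output
    z_min, z_max = Z_train.min(axis=0), Z_train.max(axis=0)   # MinMaxScaler fitted on it
    return ((X_new - mean) / std - z_min) / (z_max - z_min)


def minmax_only(X_train, X_new):
    """MinMaxScaler(feature_range=(0, 1)) applied directly to the raw columns."""
    x_min, x_max = X_train.min(axis=0), X_train.max(axis=0)
    return (X_new - x_min) / (x_max - x_min)


rng = np.random.default_rng(0)
# Synthetic two-column data on roughly displacement/weight-like scales.
X_train = rng.normal(loc=[190.0, 3000.0], scale=[100.0, 850.0], size=(50, 2))
X_new = rng.normal(loc=[190.0, 3000.0], scale=[100.0, 850.0], size=(5, 2))
print(np.allclose(standard_then_minmax(X_train, X_new), minmax_only(X_train, X_new)))  # True
```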

<div class="alert alert-block alert-warning">
PMML file name

In [61]:
pmml_f_name_3 = 'logt_auto_3.pmml'

<div class="alert alert-block alert-warning">
Fitting the pipeline instance

In [62]:
pipeline_3.fit(x_train, y_train)

Pipeline(memory=None,
     steps=[('p_step0', StandardScaler(copy=True, with_mean=True, with_std=True)), ('p_step1', MinMaxScaler(copy=True, feature_range=(0, 1))), ('mdl_name', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False))])

<div class="alert alert-block alert-warning">
Export the Pipeline object into PMML using the Nyoka package

In [63]:
skl_to_pmml(pipeline=pipeline_3,
          col_names=feature_names,
          target_name=target_name,
          pmml_f_name=pmml_f_name_3)

<div class="alert alert-block alert-warning">
Storing the scikit-learn predictions

In [64]:
skl_predictions_3 = pipeline_3.predict(x_test)

<div class="alert alert-block alert-warning">
Getting the predictions from the Zementis server, using the exported PMML as input

In [65]:
pml_predictions_3 = get_pmlpredictions(pmml_f_name_3,test_csv_f_name)

upload successful


<div class="alert alert-block alert-warning">
Comparing the two sets of predictions

In [66]:
compare_predictions(skl_predictions_3, pml_predictions_3, mdl_type='classification')

No inconsistent predictions


<br><div class="alert alert-block alert-warning">
<b><font size = 5 color="brown">Case 4: Creating a Pipeline with one DataFrameMapper as a pipeline step and an estimator</font></b> <br>

In [67]:
dframe_0 = DataFrameMapper([
    (['displacement'],[MinMaxScaler()])
])
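`dframe_0` selects only the `displacement` column and min-max scales it. With `default=False` (shown in the fitted mapper's repr), all other columns are dropped, so the logistic regression in Case 4 is fitted on a single feature even though `skl_to_pmml` receives the full `feature_names` list. The sketch below reproduces that mapping by hand with numpy; `map_displacement_only` is hypothetical, and `train_min`/`train_max` play the role of the scaler's learned `data_min_`/`data_max_`. Because the scaling uses the training range, test values outside it map outside [0, 1].

```python
import numpy as np


def map_displacement_only(columns, train_min, train_max):
    """Hypothetical stand-in for DataFrameMapper([(['displacement'], [MinMaxScaler()])]).

    Only 'displacement' is kept and scaled with the training min/max; every other
    column is ignored, as with DataFrameMapper(default=False).
    """
    x = np.asarray(columns["displacement"], dtype=float)
    scaled = (x - train_min) / (train_max - train_min)
    return scaled.reshape(-1, 1)  # 2-D array with a single feature column


# Made-up rows; 'weight' is present but dropped by the mapping.
rows = {"displacement": [97.0, 276.0, 500.0], "weight": [2130.0, 3400.0, 4500.0]}
print(map_displacement_only(rows, train_min=97.0, train_max=455.0))
```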

In [68]:
pipeline_4 = Pipeline([
    ("df_0", dframe_0),
    ('mdl_name', LogisticRegression()),
])

<div class="alert alert-block alert-warning">
PMML file name

In [69]:
pmml_f_name_4 = 'logt_auto_4.pmml'

<div class="alert alert-block alert-warning">
Fitting the pipeline instance

In [70]:
pipeline_4.fit(x_train, y_train)

Pipeline(memory=None,
     steps=[('df_0', DataFrameMapper(default=False, df_out=False,
        features=[(['displacement'], [MinMaxScaler(copy=True, feature_range=(0, 1))])],
        input_df=False, sparse=False)), ('mdl_name', LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False))])

<div class="alert alert-block alert-warning">
Export the Pipeline object into PMML using the Nyoka package

In [71]:
skl_to_pmml(pipeline=pipeline_4,
          col_names=feature_names,
          target_name=target_name,
          pmml_f_name=pmml_f_name_4)

<div class="alert alert-block alert-warning">
Storing the scikit-learn predictions

In [72]:
skl_predictions_4 = pipeline_4.predict(x_test)

<div class="alert alert-block alert-warning">
Getting the predictions from the Zementis server, using the exported PMML as input

In [73]:
pml_predictions_4 = get_pmlpredictions(pmml_f_name_4,test_csv_f_name)

upload successful


<div class="alert alert-block alert-warning">
Comparing the two sets of predictions

In [74]:
compare_predictions(skl_predictions_4, pml_predictions_4, mdl_type='classification')

No inconsistent predictions


<div class="alert alert-block alert-warning">
<b><font size = 5 color="brown">Case 5: Creating a Pipeline with one DataFrameMapper, a pipeline step and an estimator</font></b>

In [75]:
dframe_1 = DataFrameMapper([
    (['displacement'],[MinMaxScaler()])
])

In [76]:
pipeline_5 = Pipeline([
    ("df_0", dframe_1),
    ("p_step0", StandardScaler()),
    ('mdl_name', LogisticRegression()),
])
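In Case 5 the `StandardScaler` is fitted on the min-max-scaled `displacement`. Standardization cancels any increasing affine transform applied before it, so mathematically the feature reaching the logistic regression equals a plain `StandardScaler` applied to raw `displacement` (assuming the column is not constant). The value of the case lies in checking that the PMML chains both steps. A numpy check on synthetic data, with hypothetical helper names:

```python
import numpy as np


def minmax_then_standard(x_train, x_new):
    """MinMaxScaler then StandardScaler on one column, each fitted on training data."""
    lo, hi = x_train.min(), x_train.max()
    y_train = (x_train - lo) / (hi - lo)      # MinMaxScaler output on the training data
    mu, sd = y_train.mean(), y_train.std()    # StandardScaler fitted on that output
    return ((x_new - lo) / (hi - lo) - mu) / sd


def standard_only(x_train, x_new):
    """StandardScaler applied directly to the raw column."""
    return (x_new - x_train.mean()) / x_train.std()


rng = np.random.default_rng(1)
x_train = rng.uniform(70.0, 455.0, size=40)   # synthetic displacement-like values
x_new = rng.uniform(70.0, 455.0, size=5)
print(np.allclose(minmax_then_standard(x_train, x_new), standard_only(x_train, x_new)))  # True
```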

<div class="alert alert-block alert-warning">
PMML file name

In [77]:
pmml_f_name_5 = 'logt_auto_5.pmml'

<div class="alert alert-block alert-warning">
Fitting the pipeline instance

In [78]:
pipeline_5.fit(x_train, y_train)

Pipeline(memory=None,
     steps=[('df_0', DataFrameMapper(default=False, df_out=False,
        features=[(['displacement'], [MinMaxScaler(copy=True, feature_range=(0, 1))])],
        input_df=False, sparse=False)), ('p_step0', StandardScaler(copy=True, with_mean=True, with_std=True)), ('mdl_name', LogisticRegression(C=1.0, c...ty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False))])

<div class="alert alert-block alert-warning">
Export the Pipeline object into PMML using the Nyoka package

In [79]:
skl_to_pmml(pipeline=pipeline_5,
          col_names=feature_names,
          target_name=target_name,
          pmml_f_name=pmml_f_name_5)

<div class="alert alert-block alert-warning">
Storing the scikit-learn predictions

In [80]:
skl_predictions_5 = pipeline_5.predict(x_test)

<div class="alert alert-block alert-warning">
Getting the predictions from the Zementis server, using the exported PMML as input

In [81]:
pml_predictions_5 = get_pmlpredictions(pmml_f_name_5,test_csv_f_name)

upload successful


<div class="alert alert-block alert-warning">
Comparing the two sets of predictions

In [82]:
compare_predictions(skl_predictions_5, pml_predictions_5, mdl_type='classification')

No inconsistent predictions
