## Recipe Builder Actions Overview

### Saving a File Cell
To save the contents of a file cell, run it: the `%%writefile` command on the cell's first line writes the rest of the cell to the file it names. Run cells manually whenever you want to save changes on their own. However, **pressing any of the actions at the top will automatically run all file cells relevant to the action**.
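Conceptually, `%%writefile <path>` takes everything below its first line and writes it to `<path>`, replacing any existing file. This is why each run prints an `Overwriting ...` line. The sketch below mimics that behaviour in plain Python. `write_cell` is a hypothetical helper for illustration, not part of the notebook environment.

```python
import os
import tempfile


def write_cell(path, cell_body):
    """Hypothetical helper mimicking what %%writefile does with a cell body."""
    # IPython reports whether it is creating a new file or replacing an existing one
    message = ("Overwriting " if os.path.exists(path) else "Writing ") + path
    print(message)
    # The whole cell body replaces the file's previous contents
    with open(path, "w") as f:
        f.write(cell_body)
    return message


# Usage: "run" two versions of the same cell against a temporary file
target = os.path.join(tempfile.mkdtemp(), "requirements.txt")
first = write_cell(target, "# pandas=0.22.0\n# numpy\n")
second = write_cell(target, "numpy\n")
with open(target) as f:
    saved = f.read()
```

Only the last run's contents survive, so deleting or editing the `%%writefile` line would change which file the cell updates.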

### Training and Scoring
Press the corresponding button at the top to run training or scoring. Training output appears below the `evaluator.py` cell, and scoring output appears below the `datasaver.py` cell. You must run training at least once before you can run scoring. You may delete the output cell(s). The first training run, and any run after changing `requirements.txt`, is slower because the recipe's dependencies must be installed; subsequent runs are significantly faster. To see hidden output, add `debug` to the end of the output cell and re-run it.

### Creating the Recipe
When you are done editing the recipe and are satisfied with the training and scoring output, create a recipe from the notebook by pressing `Create Recipe`. You must run scoring at least once before you can create the recipe. After you press it, a progress bar shows how much time remains until the build finishes. If recipe creation succeeds, the progress bar is replaced by an external link that you can click to navigate to the created recipe.


## Caution!
* **Do not delete any of the file cells**
* **Do not edit the `%%writefile` line at the top of the file cells**

---

#### **Requirements File** (Optional)
Add any additional libraries you want to use in the recipe to the cell below, specifying version numbers if necessary. The file cell below contains a **commented-out example**.

In [1]:
# pandas=0.22.0
# numpy

Overwriting /home/asruser/my-workspace/.recipes/recipe-kgur_YT7/requirements.txt


Search https://anaconda.org/ for additional libraries. These are the main **libraries already in use**:
`python=3.5.2` `scikit-learn` `pandas` `numpy` `data_access_sdk_python`
**Warning: libraries or specific versions you add may be incompatible with the above libraries**.
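The example entries follow a conda-style `name=version` format, one library per line, with `#` starting a comment. As a rough pre-check against the warning above, the sketch below parses such a file and flags pinned versions of libraries that are already in use. It handles only the simple `name=version` form shown in the example (not `==`, `>=`, or channel prefixes), and `parse_requirements` is a hypothetical helper, not something the recipe builder runs.

```python
def parse_requirements(text):
    """Parse simple conda-style 'name=version' lines; '#' starts a comment."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        name, _, version = line.partition("=")
        entries.append((name.strip(), version.strip() or None))
    return entries


# Libraries the recipe environment already provides (from the list above)
ALREADY_IN_USE = {"python", "scikit-learn", "pandas", "numpy", "data_access_sdk_python"}

requirements = """
# pandas=0.22.0
pandas=0.22.0
xgboost
"""
entries = parse_requirements(requirements)
# Pinned versions of already-installed libraries are the likeliest source of conflicts
conflicts = [name for name, version in entries if name in ALREADY_IN_USE and version]
print(entries)
print(conflicts)
```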

---

#### **Configuration Files**
List any hyperparameters you want to use, and specify the dataset(s) and schema(s) needed for training and scoring. To find dataset IDs, go to the **Data tab** in Adobe Experience Platform or open the **Datasets** folder in the **Notebooks Data tab** on the left. Schema IDs are listed in the **Notebooks Data tab** under the **Schemas** folder. Each configuration is used only by its corresponding action. `ACP_DSW_TRAINING_XDM_SCHEMA` and `ACP_DSW_SCORING_RESULTS_XDM_SCHEMA` are used only after the recipe has been created.

##### Training Configuration

In [2]:
{
   "trainingDataSetId": "5dd2ee272a371e18a8fa7ebd",
   "ACP_DSW_TRAINING_XDM_SCHEMA": "2Fd2ed6f8ae2dab35ec660cf998383a79d",
   "learning_rate": "0.1",
   "n_estimators": "100",
   "max_depth": "3"
}

Overwriting /home/asruser/my-workspace/.recipes/recipe-kgur_YT7/training.conf


##### Scoring Configuration

In [3]:
{
   "scoringDataSetId": "5dd2ee272a371e18a8fa7ebd",
   "scoringResultsDataSetId": "5dd320d152d57618b787e38a",
   "ACP_DSW_SCORING_RESULTS_XDM_SCHEMA": "2Fd2ed6f8ae2dab35ec660cf998383a79d"
}

Overwriting /home/asruser/my-workspace/.recipes/recipe-kgur_YT7/scoring.conf


**The following configuration parameters are automatically set for you when you train/score:** 
`ML_FRAMEWORK_IMS_USER_CLIENT_ID` `ML_FRAMEWORK_IMS_TOKEN` `ML_FRAMEWORK_IMS_ML_TOKEN` `ML_FRAMEWORK_IMS_TENANT_ID`
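Every value in these JSON files is a string, so numeric hyperparameters have to be converted before use, as the pipeline's `train` function does with `float(...)` and `int(...)`. The sketch below assumes the action parses the configuration file as JSON and merges in the automatically set parameters to form `configProperties`; that merge step and the placeholder value are assumptions for illustration, not documented platform behaviour.

```python
import json

training_conf = """
{
   "trainingDataSetId": "5dd2ee272a371e18a8fa7ebd",
   "learning_rate": "0.1",
   "n_estimators": "100",
   "max_depth": "3"
}
"""

# Hypothetical placeholder for a value the platform injects at run time
runtime_params = {"ML_FRAMEWORK_IMS_TENANT_ID": "<set-by-platform>"}

config_properties = dict(json.loads(training_conf), **runtime_params)

# All values arrive as strings, so numeric hyperparameters must be cast explicitly
learning_rate = float(config_properties["learning_rate"])
n_estimators = int(config_properties["n_estimators"])
max_depth = int(config_properties["max_depth"])
```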

---

#### **Training Data Loader File**
Implement the `load` function to load and prepare the training data.

In [4]:
import logging
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)

# Only the libraries this loader actually uses
import pandas as pd
import numpy as np

from platform_sdk.dataset_reader import DatasetReader
from .utils import get_client_context

def load(configProperties):
    print("Training Data Load Start")

    #########################################
    # Load Data
    #########################################    
    client_context = get_client_context(configProperties)

    dataset_reader = DatasetReader(client_context, configProperties['trainingDataSetId'])
    df0 = dataset_reader.read()
    
    print("Training read complete")
    df0['timestamp'] = df0['timestamp'].astype('datetime64[ns]')
    df0['Invoice_Month'] = df0['timestamp'].dt.month    

    df0.loc[(df0['Invoice_Month'] <= 8) ,'Timeperiod'] = '1'
    df0.loc[(df0['Invoice_Month'] > 8) ,'Timeperiod'] = '2'
    
    print("Timeperiod calculated")
    
    df_orders = df0[df0['_experienceplatform.qty']>0]
    df_returns= df0[df0['_experienceplatform.qty']<0]

    df00 = df_orders

    df_GroupByUser=df00.groupby(['_experienceplatform.acctnumber'])['timestamp'].count().to_frame('Store_Orders_Placed').reset_index()
    df1=df00.groupby(['_experienceplatform.acctnumber'])['timestamp'].apply(min).to_frame('First_Order_Date').reset_index()
    df2=df00.groupby(['_experienceplatform.acctnumber'])['timestamp'].apply(max).to_frame('Last_Order_Date').reset_index()
    df3=df00.groupby(['_experienceplatform.acctnumber'])['_experienceplatform.salesamount'].sum().to_frame('Store_Revenue').reset_index()
    df4=df00.groupby(['_experienceplatform.acctnumber'])['_experienceplatform.qty'].sum().to_frame('Store_Product_Quantity').reset_index()

    df_GroupByUser['First_Order_Date'] = df1['First_Order_Date']
    df_GroupByUser['Last_Order_Date'] = df2['Last_Order_Date']
    df_GroupByUser['Store_Revenue'] = df3['Store_Revenue']
    df_GroupByUser['Store_Product_Quantity'] = df4['Store_Product_Quantity']

    # Calculate total orders, revenue, and quantity for each time period for modeling
    Store_Orders_Placed = pd.pivot_table(df00, index='_experienceplatform.acctnumber', columns=['Timeperiod'],
                                         values='timestamp', aggfunc='count', fill_value=0).reset_index()

    Store_Revenue = pd.pivot_table(df00, index='_experienceplatform.acctnumber', columns=['Timeperiod'],
                                   values='_experienceplatform.salesamount', aggfunc='sum', fill_value=0).reset_index()

    Store_Product_Quantity = pd.pivot_table(df00, index='_experienceplatform.acctnumber', columns=['Timeperiod'],
                                            values='_experienceplatform.qty', aggfunc='sum', fill_value=0).reset_index()


    df_GroupByUser['Store_Orders_Placed_TP1'] = Store_Orders_Placed['1']
    df_GroupByUser['Store_Orders_Placed_TP2'] = Store_Orders_Placed['2']

    df_GroupByUser['Store_Revenue_TP1'] = Store_Revenue['1']
    df_GroupByUser['Store_Revenue_TP2'] = Store_Revenue['2']

    df_GroupByUser['Store_Product_Quantity_TP1'] = Store_Product_Quantity['1']
    df_GroupByUser['Store_Product_Quantity_TP2'] = Store_Product_Quantity['2']

    
    df01 = df_returns

    dfr_GroupByUser=df01.groupby(['_experienceplatform.acctnumber'])['timestamp'].count().to_frame('Store_Orders_Returned').reset_index()
    df8=df01.groupby(['_experienceplatform.acctnumber'])['timestamp'].apply(min).to_frame('First_Return_Date').reset_index()
    df9=df01.groupby(['_experienceplatform.acctnumber'])['timestamp'].apply(max).to_frame('Last_Return_Date').reset_index()
    df10=df01.groupby(['_experienceplatform.acctnumber'])['_experienceplatform.salesamount'].sum().to_frame('Store_Revenue_Refund').reset_index()
    df11=df01.groupby(['_experienceplatform.acctnumber'])['_experienceplatform.qty'].sum().to_frame('Store_Product_Quantity_Returned').reset_index()

    # Calculate total returns, refunds, and returned quantity for each time period for modeling
    Store_Orders_Returned = pd.pivot_table(df01, index='_experienceplatform.acctnumber', columns=['Timeperiod'],
                                           values='timestamp', aggfunc='count', fill_value=0).reset_index()

    Store_Revenue_Refund = pd.pivot_table(df01, index='_experienceplatform.acctnumber', columns=['Timeperiod'],
                                          values='_experienceplatform.salesamount', aggfunc='sum', fill_value=0).reset_index()

    Store_Product_Quantity_Returned = pd.pivot_table(df01, index='_experienceplatform.acctnumber', columns=['Timeperiod'],
                                                     values='_experienceplatform.qty', aggfunc='sum', fill_value=0).reset_index()

    dfr_GroupByUser['Store_Orders_Returned'] = abs(dfr_GroupByUser['Store_Orders_Returned'])

    dfr_GroupByUser['First_Return_Date'] = df8['First_Return_Date']
    dfr_GroupByUser['Last_Return_Date'] = df9['Last_Return_Date']
    dfr_GroupByUser['Store_Revenue_Refund'] = abs(df10['Store_Revenue_Refund'])
    dfr_GroupByUser['Store_Product_Quantity_Returned'] = abs(df11['Store_Product_Quantity_Returned'])

    dfr_GroupByUser['Store_Orders_Returned_TP1'] = abs(Store_Orders_Returned['1'])
    dfr_GroupByUser['Store_Orders_Returned_TP2'] = abs(Store_Orders_Returned['2'])

    dfr_GroupByUser['Store_Revenue_Refund_TP1'] = abs(Store_Revenue_Refund['1'])
    dfr_GroupByUser['Store_Revenue_Refund_TP2'] = abs(Store_Revenue_Refund['2'])

    dfr_GroupByUser['Store_Product_Quantity_Returned_TP1'] = abs(Store_Product_Quantity_Returned['1'])
    dfr_GroupByUser['Store_Product_Quantity_Returned_TP2'] = abs(Store_Product_Quantity_Returned['2'])


    df_store = pd.merge(df_GroupByUser, dfr_GroupByUser, how='outer',on = '_experienceplatform.acctnumber')
    df_store= df_store.reset_index(drop=True)

    df_store['Store_Orders_Returned'] = df_store['Store_Orders_Returned'].fillna(0)
    df_store['Store_Orders_Returned_TP1'] = df_store['Store_Orders_Returned_TP1'].fillna(0)
    df_store['Store_Orders_Returned_TP2'] = df_store['Store_Orders_Returned_TP2'].fillna(0)

    df_store['Store_Revenue_Refund'] = df_store['Store_Revenue_Refund'].fillna(0)
    df_store['Store_Revenue_Refund_TP1'] = df_store['Store_Revenue_Refund_TP1'].fillna(0)
    df_store['Store_Revenue_Refund_TP2'] = df_store['Store_Revenue_Refund_TP2'].fillna(0)


    df_store['Store_Product_Quantity_Returned'] = df_store['Store_Product_Quantity_Returned'].fillna(0)
    df_store['Store_Product_Quantity_Returned_TP1'] = df_store['Store_Product_Quantity_Returned_TP1'].fillna(0)
    df_store['Store_Product_Quantity_Returned_TP2'] = df_store['Store_Product_Quantity_Returned_TP2'].fillna(0)

    
    df_store['First_Order_Date'] = pd.to_datetime(df_store['First_Order_Date'], errors='coerce')
    df_store['Last_Order_Date'] = pd.to_datetime(df_store['Last_Order_Date'], errors='coerce')
    
    df_store = df_store[~(df_store.First_Order_Date.isnull())]

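    # Keep customers whose first order is on or before the cutoff; .copy() avoids
    # pandas' SettingWithCopyWarning when new columns are added below
    df_store = df_store[df_store['First_Order_Date'] <= '2019-08-31'].copy()
    # Label every customer as churned (1) unless they ordered again after the cutoff
    df_store['Churn'] = 1
    df_store.loc[df_store['Last_Order_Date'] > '2019-08-31', 'Churn'] = 0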
    
    
    all_data = df_store
    
    all_data.drop(['First_Return_Date',
                   'Last_Return_Date',
                   'First_Order_Date',
                   'Last_Order_Date'
                  ],axis=1, inplace=True)

    # Cap the period-1 features at fixed upper bounds to limit the effect of extreme values
    tp1_caps = {
        'Store_Orders_Placed_TP1': 264,
        'Store_Orders_Returned_TP1': 11,
        'Store_Product_Quantity_Returned_TP1': 40,
        'Store_Product_Quantity_TP1': 871,
        'Store_Revenue_Refund_TP1': 497,
        'Store_Revenue_TP1': 3286,
    }
    for column, cap in tp1_caps.items():
        all_data[column] = all_data[column].clip(upper=cap)

    all_data.rename(columns={'_experienceplatform.acctnumber': 'acctnumber'},inplace=True)

    dataframe = all_data
    

    print("Training Data Load Finish")
    return dataframe

Overwriting /home/asruser/my-workspace/.recipes/recipe-kgur_YT7/recipe/trainingdataloader.py
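The core of the loader is: split the year into period '1' (months 1 to 8) and period '2' (months 9 to 12), treat positive quantities as orders and negative ones as returns, pivot per account and period, and label an account as churned unless it ordered after 2019-08-31. The toy example below runs a simplified subset of those steps on a few hypothetical transactions so the intermediate shapes are easy to inspect; it leaves out the returns aggregates, the first-order filter, and the feature caps.

```python
import pandas as pd

# Hypothetical toy transactions using the same column names as the dataset
df0 = pd.DataFrame({
    "_experienceplatform.acctnumber": ["A", "A", "A", "B", "B"],
    "timestamp": pd.to_datetime(["2019-03-01", "2019-07-15", "2019-10-02",
                                 "2019-02-10", "2019-05-20"]),
    "_experienceplatform.qty": [2, -1, 3, 1, 4],
    "_experienceplatform.salesamount": [20.0, -10.0, 30.0, 5.0, 40.0],
})

# Months 1-8 form time period '1', months 9-12 form period '2'
df0["Timeperiod"] = df0["timestamp"].dt.month.map(lambda m: "1" if m <= 8 else "2")

# Positive quantities are orders; negative quantities would be returns
orders = df0[df0["_experienceplatform.qty"] > 0]

# Per-account order counts for each time period (missing combinations become 0)
orders_per_period = pd.pivot_table(
    orders, index="_experienceplatform.acctnumber", columns=["Timeperiod"],
    values="timestamp", aggfunc="count", fill_value=0)

# An account is labelled as churned (1) unless its last order falls after the cutoff
cutoff = pd.Timestamp("2019-08-31")
last_order = orders.groupby("_experienceplatform.acctnumber")["timestamp"].max()
churn = (last_order <= cutoff).astype(int)

print(orders_per_period)
print(churn)
```

Because period '2' (September to December) is also the window that decides the label, the `_TP2` columns the loaders keep are closely tied to `Churn` when the data covers a single year: an account with any period-2 order in 2019 cannot be labelled as churned. It may be worth checking this if the validation metrics look unexpectedly high.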


---

#### **Scoring Data Loader File**
Implement the `load` function to load and prepare the scoring data.

In [5]:
import pandas as pd
import numpy as np
from .utils import get_client_context
from platform_sdk.dataset_reader import DatasetReader

def load(config_properties):

    print("Scoring Data Load Start")

    #########################################
    # Load Data
    #########################################
    client_context = get_client_context(config_properties)

    dataset_reader = DatasetReader(client_context, config_properties['scoringDataSetId'])
    df0 = dataset_reader.read()
    
    df0['timestamp'] = df0['timestamp'].astype('datetime64[ns]')
    df0['Invoice_Month'] = df0['timestamp'].dt.month    

    df0.loc[(df0['Invoice_Month'] <= 8) ,'Timeperiod'] = '1'
    df0.loc[(df0['Invoice_Month'] > 8) ,'Timeperiod'] = '2'

    df_orders = df0[df0['_experienceplatform.qty']>0]
    df_returns= df0[df0['_experienceplatform.qty']<0]

    df00 = df_orders

    df_GroupByUser=df00.groupby(['_experienceplatform.acctnumber'])['timestamp'].count().to_frame('Store_Orders_Placed').reset_index()
    df1=df00.groupby(['_experienceplatform.acctnumber'])['timestamp'].apply(min).to_frame('First_Order_Date').reset_index()
    df2=df00.groupby(['_experienceplatform.acctnumber'])['timestamp'].apply(max).to_frame('Last_Order_Date').reset_index()
    df3=df00.groupby(['_experienceplatform.acctnumber'])['_experienceplatform.salesamount'].sum().to_frame('Store_Revenue').reset_index()
    df4=df00.groupby(['_experienceplatform.acctnumber'])['_experienceplatform.qty'].sum().to_frame('Store_Product_Quantity').reset_index()

    df_GroupByUser['First_Order_Date'] = df1['First_Order_Date']
    df_GroupByUser['Last_Order_Date'] = df2['Last_Order_Date']
    df_GroupByUser['Store_Revenue'] = df3['Store_Revenue']
    df_GroupByUser['Store_Product_Quantity'] = df4['Store_Product_Quantity']

    # Calculate total orders, revenue, and quantity for each time period for modeling
    Store_Orders_Placed = pd.pivot_table(df00, index='_experienceplatform.acctnumber', columns=['Timeperiod'],
                                         values='timestamp', aggfunc='count', fill_value=0).reset_index()

    Store_Revenue = pd.pivot_table(df00, index='_experienceplatform.acctnumber', columns=['Timeperiod'],
                                   values='_experienceplatform.salesamount', aggfunc='sum', fill_value=0).reset_index()

    Store_Product_Quantity = pd.pivot_table(df00, index='_experienceplatform.acctnumber', columns=['Timeperiod'],
                                            values='_experienceplatform.qty', aggfunc='sum', fill_value=0).reset_index()


    df_GroupByUser['Store_Orders_Placed_TP1'] = Store_Orders_Placed['1']
    df_GroupByUser['Store_Orders_Placed_TP2'] = Store_Orders_Placed['2']

    df_GroupByUser['Store_Revenue_TP1'] = Store_Revenue['1']
    df_GroupByUser['Store_Revenue_TP2'] = Store_Revenue['2']

    df_GroupByUser['Store_Product_Quantity_TP1'] = Store_Product_Quantity['1']
    df_GroupByUser['Store_Product_Quantity_TP2'] = Store_Product_Quantity['2']

    
    df01 = df_returns

    dfr_GroupByUser=df01.groupby(['_experienceplatform.acctnumber'])['timestamp'].count().to_frame('Store_Orders_Returned').reset_index()
    df8=df01.groupby(['_experienceplatform.acctnumber'])['timestamp'].apply(min).to_frame('First_Return_Date').reset_index()
    df9=df01.groupby(['_experienceplatform.acctnumber'])['timestamp'].apply(max).to_frame('Last_Return_Date').reset_index()
    df10=df01.groupby(['_experienceplatform.acctnumber'])['_experienceplatform.salesamount'].sum().to_frame('Store_Revenue_Refund').reset_index()
    df11=df01.groupby(['_experienceplatform.acctnumber'])['_experienceplatform.qty'].sum().to_frame('Store_Product_Quantity_Returned').reset_index()

    # Calculate total returns, refunds, and returned quantity for each time period for modeling
    Store_Orders_Returned = pd.pivot_table(df01, index='_experienceplatform.acctnumber', columns=['Timeperiod'],
                                           values='timestamp', aggfunc='count', fill_value=0).reset_index()

    Store_Revenue_Refund = pd.pivot_table(df01, index='_experienceplatform.acctnumber', columns=['Timeperiod'],
                                          values='_experienceplatform.salesamount', aggfunc='sum', fill_value=0).reset_index()

    Store_Product_Quantity_Returned = pd.pivot_table(df01, index='_experienceplatform.acctnumber', columns=['Timeperiod'],
                                                     values='_experienceplatform.qty', aggfunc='sum', fill_value=0).reset_index()

    dfr_GroupByUser['Store_Orders_Returned'] = abs(dfr_GroupByUser['Store_Orders_Returned'])

    dfr_GroupByUser['First_Return_Date'] = df8['First_Return_Date']
    dfr_GroupByUser['Last_Return_Date'] = df9['Last_Return_Date']
    dfr_GroupByUser['Store_Revenue_Refund'] = abs(df10['Store_Revenue_Refund'])
    dfr_GroupByUser['Store_Product_Quantity_Returned'] = abs(df11['Store_Product_Quantity_Returned'])

    dfr_GroupByUser['Store_Orders_Returned_TP1'] = abs(Store_Orders_Returned['1'])
    dfr_GroupByUser['Store_Orders_Returned_TP2'] = abs(Store_Orders_Returned['2'])

    dfr_GroupByUser['Store_Revenue_Refund_TP1'] = abs(Store_Revenue_Refund['1'])
    dfr_GroupByUser['Store_Revenue_Refund_TP2'] = abs(Store_Revenue_Refund['2'])

    dfr_GroupByUser['Store_Product_Quantity_Returned_TP1'] = abs(Store_Product_Quantity_Returned['1'])
    dfr_GroupByUser['Store_Product_Quantity_Returned_TP2'] = abs(Store_Product_Quantity_Returned['2'])


    df_store = pd.merge(df_GroupByUser, dfr_GroupByUser, how='outer',on = '_experienceplatform.acctnumber')
    df_store= df_store.reset_index(drop=True)

    df_store['Store_Orders_Returned'] = df_store['Store_Orders_Returned'].fillna(0)
    df_store['Store_Orders_Returned_TP1'] = df_store['Store_Orders_Returned_TP1'].fillna(0)
    df_store['Store_Orders_Returned_TP2'] = df_store['Store_Orders_Returned_TP2'].fillna(0)

    df_store['Store_Revenue_Refund'] = df_store['Store_Revenue_Refund'].fillna(0)
    df_store['Store_Revenue_Refund_TP1'] = df_store['Store_Revenue_Refund_TP1'].fillna(0)
    df_store['Store_Revenue_Refund_TP2'] = df_store['Store_Revenue_Refund_TP2'].fillna(0)


    df_store['Store_Product_Quantity_Returned'] = df_store['Store_Product_Quantity_Returned'].fillna(0)
    df_store['Store_Product_Quantity_Returned_TP1'] = df_store['Store_Product_Quantity_Returned_TP1'].fillna(0)
    df_store['Store_Product_Quantity_Returned_TP2'] = df_store['Store_Product_Quantity_Returned_TP2'].fillna(0)

    
    df_store['First_Order_Date'] = pd.to_datetime(df_store['First_Order_Date'], errors='coerce')
    df_store['Last_Order_Date'] = pd.to_datetime(df_store['Last_Order_Date'], errors='coerce')
    
    df_store = df_store[~(df_store.First_Order_Date.isnull())]

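    # Keep customers whose first order is on or before the cutoff; .copy() avoids
    # pandas' SettingWithCopyWarning when new columns are added below
    df_store = df_store[df_store['First_Order_Date'] <= '2019-08-31'].copy()
    # Label every customer as churned (1) unless they ordered again after the cutoff
    df_store['Churn'] = 1
    df_store.loc[df_store['Last_Order_Date'] > '2019-08-31', 'Churn'] = 0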
    
    
    all_data = df_store
    
    all_data.drop(['First_Return_Date',
                   'Last_Return_Date',
                   'First_Order_Date',
                   'Last_Order_Date'
                  ],axis=1, inplace=True)
    
    
    # Cap the period-1 features at the same fixed upper bounds used for training
    tp1_caps = {
        'Store_Orders_Placed_TP1': 264,
        'Store_Orders_Returned_TP1': 11,
        'Store_Product_Quantity_Returned_TP1': 40,
        'Store_Product_Quantity_TP1': 871,
        'Store_Revenue_Refund_TP1': 497,
        'Store_Revenue_TP1': 3286,
    }
    for column, cap in tp1_caps.items():
        all_data[column] = all_data[column].clip(upper=cap)

    all_data.rename(columns={'_experienceplatform.acctnumber': 'acctnumber'},inplace=True)

    dataframe = all_data
    
    print("Scoring Data Load Finish")

    return dataframe

Overwriting /home/asruser/my-workspace/.recipes/recipe-kgur_YT7/recipe/scoringdataloader.py
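Both loaders read columns such as `Store_Orders_Placed['2']` directly from the pivot tables. If the data passed in contains no transactions in one of the periods, that column never appears and the lookup raises `KeyError`. One possible safeguard, sketched below on hypothetical data, is to reindex the pivot's columns to the expected periods before calling `reset_index()`.

```python
import pandas as pd

# Hypothetical scoring data in which every transaction falls in period '1'
df = pd.DataFrame({
    "_experienceplatform.acctnumber": ["A", "B", "B"],
    "Timeperiod": ["1", "1", "1"],
    "_experienceplatform.qty": [2, 1, 4],
})

table = pd.pivot_table(df, index="_experienceplatform.acctnumber", columns=["Timeperiod"],
                       values="_experienceplatform.qty", aggfunc="sum", fill_value=0)

# Without this step, table['2'] would raise KeyError because period '2' never occurs
table = table.reindex(columns=["1", "2"], fill_value=0).reset_index()
print(table)
```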


---

#### **Pipeline File**
Implement the `train` function and return the trained model. Implement the `score` function to return a prediction made on the scoring data.

In [6]:
from sklearn.linear_model import LogisticRegression

def train(configProperties, data):

    print("Train Start")

    #########################################
    # Extract fields from configProperties
    #########################################
    # These hyperparameters are read from training.conf but are not used by
    # LogisticRegression below; they apply to tree-based models such as
    # gradient boosting.
    learning_rate = float(configProperties['learning_rate'])
    n_estimators = int(configProperties['n_estimators'])
    max_depth = int(configProperties['max_depth'])

    # Features are every column except the label and the account identifier
    X_train = data.drop(['Churn', 'acctnumber'], axis=1).values
    Y_train = data['Churn'].values

    model = LogisticRegression()
    model.fit(X_train, Y_train)

    print("Train Complete")

    return model

def score(configProperties, data, model):

    print("Score Start")

    X_test = data.drop(['Churn','acctnumber'], axis=1).values
    
    y_pred = model.predict(X_test)

    data['Churn_Prediction'] = y_pred

    data = data[['acctnumber','Churn_Prediction']]

    data = data.rename(columns={'Churn_Prediction': '_experienceplatform.Churn_Prediction', 'acctnumber': '_experienceplatform.acctnumber'})


    print("Score Complete")
    return data

Overwriting /home/asruser/my-workspace/.recipes/recipe-kgur_YT7/recipe/pipeline.py
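The round trip below mirrors what `train` and `score` do, on a hypothetical toy frame, so the input and output shapes can be checked outside the recipe builder. The column values are made up; only the handling of `Churn` and `acctnumber` and the final renaming follow the pipeline above.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy frame shaped like the data loaders' output
orders = list(range(20))
data = pd.DataFrame({
    "acctnumber": ["acct%d" % i for i in orders],
    "Store_Orders_Placed_TP1": orders,
    "Store_Revenue_TP1": [2.0 * n for n in orders],
})
data["Churn"] = (data["Store_Orders_Placed_TP1"] < 10).astype(int)

# train(): fit on every column except the label and the identifier
features = data.drop(["Churn", "acctnumber"], axis=1).values
model = LogisticRegression()
model.fit(features, data["Churn"].values)

# score(): predict, keep only the identifier and the prediction,
# then rename both columns to the field names written to the results dataset
scored = data[["acctnumber"]].copy()
scored["Churn_Prediction"] = model.predict(features)
scored = scored.rename(columns={
    "Churn_Prediction": "_experienceplatform.Churn_Prediction",
    "acctnumber": "_experienceplatform.acctnumber",
})
print(scored.head())
```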


---

#### **Evaluator File**
Implement the `split` function to partition the training data and the `evaluate` function to return the validation metrics you want to see. Training output will be shown below this file cell.

In [7]:
from ml.runtime.python.core.regressionEvaluator import RegressionEvaluator
from sklearn import metrics
from sklearn.model_selection import train_test_split


class Evaluator(RegressionEvaluator):
    def __init__(self):
        print("Initiate")

    def split(self, configProperties={}, dataframe=None):
        # Hold out 30% of the training data for validation (fixed seed for reproducibility)
        train, val = train_test_split(dataframe, test_size=0.3, random_state=1234)

        return train, val

    def evaluate(self, data=[], model={}, configProperties={}):
        print("Evaluation evaluate triggered")
        # Use the same feature columns, as a NumPy array, that train() fit on
        val = data.drop(['Churn', 'acctnumber'], axis=1).values
        y_pred = model.predict(val)
        y_actual = data['Churn'].values

        # Churn (1) is the positive class for recall and precision
        accuracy = metrics.accuracy_score(y_actual, y_pred)
        recall = metrics.recall_score(y_actual, y_pred, pos_label=1, average='binary')
        precision = metrics.precision_score(y_actual, y_pred, pos_label=1, average='binary')

        metric = [{"name": "Accuracy", "value": accuracy, "valueType": "double"},
                  {"name": "Recall", "value": recall, "valueType": "double"},
                  {"name": "Precision", "value": precision, "valueType": "double"}]

        print(metric)
        return metric

Overwriting /home/asruser/my-workspace/.recipes/recipe-kgur_YT7/recipe/evaluator.py
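With churn as the positive class (`pos_label=1`), recall is the share of actual churners the model catches and precision is the share of predicted churners who actually churned. The sketch below computes the same three values by hand on a small example, which can help sanity-check the numbers in the training output. `binary_metrics` is a hypothetical helper, not part of the recipe; returning 0.0 when a denominator is zero matches scikit-learn's default for these metrics (which also emits a warning).

```python
def binary_metrics(y_actual, y_pred, pos_label=1):
    """Accuracy, recall and precision for binary labels, computed by hand."""
    pairs = list(zip(y_actual, y_pred))
    tp = sum(1 for a, p in pairs if a == pos_label and p == pos_label)
    fp = sum(1 for a, p in pairs if a != pos_label and p == pos_label)
    fn = sum(1 for a, p in pairs if a == pos_label and p != pos_label)
    correct = sum(1 for a, p in pairs if a == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when nothing is predicted or present as positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0

    # Same list-of-dicts structure that Evaluator.evaluate returns
    return [{"name": "Accuracy", "value": accuracy, "valueType": "double"},
            {"name": "Recall", "value": recall, "valueType": "double"},
            {"name": "Precision", "value": precision, "valueType": "double"}]


# 2 true positives, 1 false positive, 1 false negative, 1 true negative
metric = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(metric)
```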


---

#### **Data Saver File**
Implement the `save` function for saving your prediction. Scoring output will be added below this cell.

In [8]:
from .utils import get_client_context
from platform_sdk.models import Dataset
from platform_sdk.dataset_writer import DatasetWriter

def save(config_properties, prediction):
    print("Datasaver Start")

    # Look up the scoring results dataset and write the predictions to it as JSON
    client_context = get_client_context(config_properties)
    dataset = Dataset(client_context).get_by_id(config_properties['scoringResultsDataSetId'])
    dataset_writer = DatasetWriter(client_context, dataset)
    dataset_writer.write(prediction, file_format='json')

    print("Datasaver Finish")
    print(prediction)


Overwriting /home/asruser/my-workspace/.recipes/recipe-kgur_YT7/recipe/datasaver.py
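`save` hands the scored frame to `DatasetWriter` with `file_format='json'`. The exact payload is determined by the platform SDK, but to preview the records locally you could serialize the frame yourself, one JSON object per row, as in this hypothetical check. It is a local preview only, not the SDK's own serialization.

```python
import json
import pandas as pd

# Hypothetical output of score(): one row per account
prediction = pd.DataFrame({
    "_experienceplatform.acctnumber": ["A100", "A200"],
    "_experienceplatform.Churn_Prediction": [1, 0],
})

# One JSON object per line, keyed by the output field names
preview = prediction.to_json(orient="records", lines=True)
records = [json.loads(line) for line in preview.splitlines() if line]
print(records)
```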
