## Recipe Builder Actions Overview

### Saving a File Cell
To save the contents of a file cell, run it. The `%%writefile` command at the top of the cell writes the cell's contents to the file named on that line. Run file cells manually where applicable. However, **pressing any of the actions at the top will automatically run all file cells relevant to that action**.

### Training and Scoring
Press the associated buttons at the top to run training or scoring. The training output is shown below the `pipeline.py` cell and the scoring output is shown below the `datasaver.py` cell. You must run training at least once before you can run scoring. You may delete the output cell(s). The first training run, and any run after `requirements.txt` changes, is slower because the recipe's dependencies need to be installed; subsequent runs are significantly faster.

### Creating the Recipe
When you are done editing the recipe and are satisfied with the training/scoring output, you can create a recipe from the notebook by pressing `Create Recipe`. A spinner is shown until recipe creation has finished. If recipe creation succeeds, the spinner is replaced by an external link that you can click to navigate to the created recipe.


## Caution!
* **Do not delete any of the file cells**
* **Do not edit the `%%writefile` line at the top of the file cells**
* **Do not refresh the JupyterLab page while the recipe is being created**
<br>
<br>
---

#### **Requirements file** (Optional)
Add additional libraries you wish to use in the recipe to the cell below. You can specify the version number if necessary. The file cell below is a **commented out example**.

In [None]:

# pandas=0.22.0
# numpy

Search for additional libraries at https://anaconda.org/. The main **libraries already in use** are: <br>
`python=3.5.2` `scikit-learn` `pandas` `numpy` `data_access_sdk_python` <br>
**Warning: libraries or specific versions you add may be incompatible with the libraries listed above**.

#### **Configuration file**
Specify the dataset(s) you wish to use for training/scoring and add hyperparameters. To find the dataset IDs, go to the **Data tab** in Adobe Experience Platform.

In [None]:


{
    "trainingDataSetId": "5c8c31db3603991515a6e2da",
    "scoringDataSetId": "5c8c31db3603991515a6e2da",
    "scoringResultsDataSetId":"5c913bbf16f0dc151609b69d",
    "ACP_DSW_TRAINING_XDM_SCHEMA":"https://ns.adobe.com/platformlab/schemas/764807fb1f2503047c09945e46cfdf30",
    "ACP_DSW_SCORING_RESULTS_XDM_SCHEMA":"https://ns.adobe.com/platformlab/schemas/637f4724918fe241b2e6d1cdf8353c39",
    "num_recommendations": "5",
    "sampling_fraction": "0.5"
}

**The following configuration parameters are automatically set for you when you train/score:** <br>
`ML_FRAMEWORK_IMS_USER_CLIENT_ID` `ML_FRAMEWORK_IMS_TOKEN` `ML_FRAMEWORK_IMS_ML_TOKEN` `ML_FRAMEWORK_IMS_TENANT_ID` `saveData`
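
Hyperparameters in the configuration file are stored as JSON strings (`"5"`, `"0.5"`), so recipe code has to cast them before use, as `train` does with `int(configProperties['num_recommendations'])`. The sketch below shows one way to do that. The helper name `parse_hyperparameters` and the range checks are assumptions for illustration and are not part of the recipe.

```python
import json

# Stand-in for the configuration file cell; only the two hyperparameters are shown.
CONFIG_JSON = """
{
    "num_recommendations": "5",
    "sampling_fraction": "0.5"
}
"""


def parse_hyperparameters(config_properties):
    """Hypothetical helper: cast string-valued hyperparameters to numeric types."""
    num_recommendations = int(config_properties["num_recommendations"])
    sampling_fraction = float(config_properties["sampling_fraction"])
    # The range checks below are assumptions, not rules enforced by the platform.
    if num_recommendations < 1:
        raise ValueError("num_recommendations must be at least 1")
    if not 0.0 < sampling_fraction <= 1.0:
        raise ValueError("sampling_fraction must be in (0, 1]")
    return num_recommendations, sampling_fraction


config_properties = json.loads(CONFIG_JSON)
num_recommendations, sampling_fraction = parse_hyperparameters(config_properties)
print(num_recommendations, sampling_fraction)
```

Casting in one place keeps type errors out of the training and scoring functions.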

---

#### **Evaluator file**
Fill in how you wish to evaluate your trained recipe and how your training data should be split. You can also use this file to load and prepare the training data.

In [None]:


from ml.runtime.python.Interfaces.AbstractEvaluator import AbstractEvaluator
from data_access_sdk_python.reader import DataSetReader
import numpy as np
import pandas as pd

class Evaluator(AbstractEvaluator):
    def __init__(self):
        print("Initiate")
        self.user_id_column = '_platformlab.userId'
        self.recommendations_column = '_platformlab.recommendations'
        self.item_id_column = '_platformlab.itemId'


    def evaluate(self, data=[], model={}, configProperties={}):
        print("Evaluation evaluate triggered")
        
        # Drop rows whose item ID is null
        data = data[data[self.item_id_column].notnull()]
        
        data_grouped_by_user = data.groupby(self.user_id_column).agg(
            {self.item_id_column: lambda x: '#'.join(x)})\
        .rename(columns={self.item_id_column:'interactions'}).reset_index()
        
        data_recommendations = model.predict(data)
        
        merged_df = pd.merge(data_grouped_by_user, data_recommendations, on=[self.user_id_column]).reset_index()
        
        def compute_recall(row):
            # Hit indicator: 1 if at least one recommended item was interacted with, else 0
            set_interactions = set(row['interactions'].split('#'))
            set_recommendations = set(row[self.recommendations_column].split('#'))
            inters = set_interactions.intersection(set_recommendations)
            if len(inters) > 0:
                return 1
            return 0
        
        def compute_precision(row):
            # Rank-weighted score: a hit at position 1 adds 0.5, position 2 adds 0.25, and so on
            set_interactions = set(row['interactions'].split('#'))
            list_recommendations = row[self.recommendations_column].split('#')
            score = 0
            weight = 0.5
            for rec in list_recommendations:
                if rec in set_interactions:
                    score = score + weight
                weight = weight / 2

            return score


        merged_df['recall'] = merged_df.apply(compute_recall, axis=1)
        merged_df['precision'] = merged_df.apply(compute_precision, axis=1)

        recall = merged_df['recall'].mean()
        precision = merged_df['precision'].mean()

        metric = [{"name": "Recall", "value": recall, "valueType": "double"},
                 {"name": "Precision", "value": precision, "valueType": "double"}]

        print(metric)

        return metric

    def split(self, configProperties={}):
        #########################################
        # Load Data
        #########################################
        prodreader = DataSetReader(client_id=configProperties['ML_FRAMEWORK_IMS_USER_CLIENT_ID'],
                                   user_token=configProperties['ML_FRAMEWORK_IMS_TOKEN'],
                                   service_token=configProperties['ML_FRAMEWORK_IMS_ML_TOKEN'])

        df = prodreader.load(data_set_id=configProperties['trainingDataSetId'],
                             ims_org=configProperties['ML_FRAMEWORK_IMS_TENANT_ID'])

        # Sample behavior: train and test are both the full dataset (no held-out rows)
        train = df[:]
        test = df[:]

        return train, test
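
To see what the two metrics in `evaluate` measure, it can help to run the same logic on a few hand-written rows. The sketch below restates `compute_recall` and `compute_precision` as standalone functions. The function names, the toy data, and the short column names are made up for illustration.

```python
import pandas as pd


def hit(interactions, recommendations):
    """Return 1 if any recommended item was interacted with, else 0."""
    overlap = set(interactions.split('#')) & set(recommendations.split('#'))
    return 1 if overlap else 0


def weighted_precision(interactions, recommendations):
    """Rank-weighted score: a hit at rank 1 adds 0.5, rank 2 adds 0.25, and so on."""
    seen = set(interactions.split('#'))
    score, weight = 0.0, 0.5
    for rec in recommendations.split('#'):
        if rec in seen:
            score += weight
        weight /= 2
    return score


# Toy data: one row per user, items joined with '#', as in the evaluator.
toy = pd.DataFrame({
    'interactions': ['a#b', 'c', 'd'],
    'recommendations': ['a#x#b', 'x#c#y', 'x#y#z'],
})
pairs = list(zip(toy['interactions'], toy['recommendations']))
toy['recall'] = [hit(i, r) for i, r in pairs]
toy['precision'] = [weighted_precision(i, r) for i, r in pairs]

print(toy)
print("Recall:", toy['recall'].mean(), "Precision:", toy['precision'].mean())
```

The first user scores 0.5 + 0.125 = 0.625 because the hits are at ranks 1 and 3. The second scores 0.25 for a single hit at rank 2. The third scores 0. The value reported as "Recall" is therefore the share of users with at least one hit, not recall in the usual per-item sense.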


---

#### **Training Data Loader file**
Call your Evaluator split here and/or use this file to load and prepare the training data.

In [None]:


import numpy as np
import pandas as pd
from data_access_sdk_python.reader import DataSetReader

from recipe.evaluator import Evaluator

def load(configProperties):
    print("Training Data Load Start")
    evaluator = Evaluator()
    (train_data, _) = evaluator.split(configProperties)

    print("Training Data Load Finish")
    return train_data
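
The loader above keeps only the first value returned by `Evaluator.split`, and in the sample evaluator both returned values are the complete dataset. If you want evaluation on rows the model has not seen, `split` could hold out part of the data. The sketch below is one minimal way to do that with pandas. The helper name `holdout_split`, the 80/20 ratio, and the seed are assumptions, and the helper requires a unique DataFrame index.

```python
import pandas as pd


def holdout_split(df, train_fraction=0.8, seed=42):
    """Hypothetical helper: shuffle rows and return disjoint train/test frames.

    Assumes df has a unique index, since the test set is built by dropping
    the sampled index labels.
    """
    train = df.sample(frac=train_fraction, random_state=seed)
    test = df.drop(train.index)
    return train, test


# Toy interaction log standing in for the loaded training dataset.
toy = pd.DataFrame({
    'userId': ['u1', 'u1', 'u2', 'u2', 'u3', 'u3', 'u4', 'u4', 'u5', 'u5'],
    'itemId': ['a', 'b', 'a', 'c', 'a', 'b', 'd', 'a', 'b', 'c'],
})
train, test = holdout_split(toy)
print(len(train), len(test))
```

This splits by interaction row, so the same user can appear in both parts. Whether that is appropriate depends on how you want to evaluate the recommendations.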



---

#### **Scoring Data Loader file**
Use this file to load and prepare your scoring data.

In [None]:


import numpy as np
import pandas as pd
from data_access_sdk_python.reader import DataSetReader

def load(configProperties):

    print("Scoring Data Load Start")

    #########################################
    # Load Data
    #########################################
    prodreader = DataSetReader(client_id=configProperties['ML_FRAMEWORK_IMS_USER_CLIENT_ID'],
                               user_token=configProperties['ML_FRAMEWORK_IMS_TOKEN'],
                               service_token=configProperties['ML_FRAMEWORK_IMS_ML_TOKEN'])

    df = prodreader.load(data_set_id=configProperties['scoringDataSetId'],
                         ims_org=configProperties['ML_FRAMEWORK_IMS_TENANT_ID'])

    print("Scoring Data Load Finish")

    return df


---

#### **Pipeline file**
Fill in the training and scoring functions for your recipe. Training output will be added below this file cell.

In [None]:


import pandas as pd
import numpy as np
from collections import Counter

class PopularityBasedRecommendationModel():
    def __init__(self, num_to_recommend):
        self.num_to_recommend = num_to_recommend
        self.recommendations = ['dummy']
        self.user_id_column = '_platformlab.userId'
        self.recommendations_column = '_platformlab.recommendations'
        self.item_id_column = '_platformlab.itemId'
    
    def fit(self, df):
        df = df[df[self.item_id_column].notnull()]
        self.recommendations = [item for item, freq in 
                                Counter(list(df[self.item_id_column].values)).most_common(self.num_to_recommend)]

        
    def predict(self, df):
        # Drop rows whose item ID is null
        df = df[df[self.item_id_column].notnull()]
        
        df_grouped_by_user = df.groupby(self.user_id_column).agg(
            {self.item_id_column: lambda x: ','.join(x)})\
        .rename(columns={self.item_id_column:'interactions'}).reset_index()
        
        df_grouped_by_user[self.recommendations_column] = '#'.join(self.recommendations)
        df_grouped_by_user = df_grouped_by_user.drop(['interactions'],axis=1)
        
        return df_grouped_by_user

def train(configProperties, data):

    print("Train Start")

    #########################################
    # Extract fields from configProperties
    #########################################
    num_recommendations = int(configProperties['num_recommendations'])

    #########################################
    # Fit model
    #########################################
    model = PopularityBasedRecommendationModel(num_recommendations)

    model.fit(data)

    print("Train Complete")

    return model

def score(configProperties, data, model):

    print("Score Start")

    result = model.predict(data)

    print("Score Complete")

    return result
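
The model above recommends the same most popular items to every user. The sketch below reproduces that idea on a toy interaction log so you can check the behavior before training on real data. The short column names and the data are made up for illustration.

```python
from collections import Counter

import pandas as pd

USER_COL = 'userId'  # stand-in for '_platformlab.userId'
ITEM_COL = 'itemId'  # stand-in for '_platformlab.itemId'

# Toy interaction log: item 'a' appears 3 times, 'b' twice, 'c' once.
interactions = pd.DataFrame({
    USER_COL: ['u1', 'u1', 'u2', 'u2', 'u3', 'u3'],
    ITEM_COL: ['a', 'b', 'a', 'c', 'a', 'b'],
})

# "Fit": keep the two most frequent items.
item_counts = Counter(interactions[ITEM_COL].tolist())
top_items = [item for item, _ in item_counts.most_common(2)]

# "Predict": one row per user, each with the same '#'-joined list.
users = interactions[USER_COL].drop_duplicates().reset_index(drop=True)
recommendations = pd.DataFrame({
    USER_COL: users,
    'recommendations': '#'.join(top_items),
})

print(top_items)
print(recommendations)
```

Because the list is identical for all users, changing `num_recommendations` only changes how many popular items each user receives.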


---

#### **Data Saver file**
Define how your scored data should be saved. **By default, `saveData=False`, since saving data is not relevant when scoring from the notebook.** Scoring output will be added below this cell.

In [None]:

from data_access_sdk_python.writer import DataSetWriter
from functools import reduce
import json

def save(configProperties, prediction):
    
    print(prediction)
    prodwriter = DataSetWriter(client_id=configProperties['ML_FRAMEWORK_IMS_USER_CLIENT_ID'],
                               user_token=configProperties['ML_FRAMEWORK_IMS_TOKEN'],
                               service_token=configProperties['ML_FRAMEWORK_IMS_ML_TOKEN'])
    
    batch_id = prodwriter.write(data_set_id=configProperties['scoringResultsDataSetId'],
                 dataframe=prediction,
                 ims_org=configProperties['ML_FRAMEWORK_IMS_TENANT_ID'])
    print("Data written successfully to platform:",batch_id)