# Building a collaborative filtering model for recommending products to customers
* Description: A walkthrough on building a collaborative filtering model for recommending products to customers
* Tech: python + turicreate + (optional) Amazon SageMaker  
* Author: Ariel Novelli / arieladriannovelli@gmail.com
     
## Problem statement
* We are building a model that recommends products based on the items a customer has previously purchased.
* The output will be a function that looks up the recommendation list for a specified user.  
Input: userID  
Returns: a ranked list of items (productIDs) that the user is most likely to want to put in her "basket".  

## Some considerations
* I will be using a dataset from a bookstore, but you can use a dataset from any domain.
* The input file should have one row per purchase, with two columns: userID, productID  
* I will be using a large dataset with more than 100k users and more than 20k products.  
* To avoid "out of memory" errors I will be using an AWS SageMaker Notebook instance.  
* If you are going to use a large dataset, I encourage you to do the same and launch an AWS SageMaker Notebook instance. It's easy!  

## AWS SageMaker Notebook instance
To create an Amazon SageMaker notebook instance:
1. Open the Amazon SageMaker console at https://console.aws.amazon.com/sagemaker/.  
2. Choose Notebook instances, then choose Create notebook instance.  
3. On the Create notebook instance page, provide the following information:  
a. For Notebook instance name, type a name for your notebook instance.  
b. For Notebook instance type, choose an instance type for your notebook instance. For a list of supported instance types, see Amazon SageMaker Limits.  
c. For Elastic Inference, choose an inference accelerator type to associate with the notebook instance if you plan to conduct inferences from the notebook instance, or choose none. For information about elastic inference, see Use Amazon SageMaker Elastic Inference (EI).  
d. (Optional) Additional configuration lets advanced users create a shell script that can run when you create or start the instance. This script, called a lifecycle configuration script, can be used to set the environment for the notebook or to perform other functions. For information, see Customize a Notebook Instance Using a Lifecycle Configuration Script.  
e. (Optional) Additional configuration also lets you specify the size, in GB, of the ML storage volume that is attached to the notebook instance. You can choose a size between 5 GB and 16,384 GB, in 1 GB increments. You can use the volume to clean up the training dataset or to temporarily store validation or other data.  
f. For IAM role, choose either an existing IAM role in your account that has the necessary permissions to access Amazon SageMaker resources or choose Create a new role. If you choose Create a new role, Amazon SageMaker creates an IAM role named AmazonSageMaker-ExecutionRole-YYYYMMDDTHHmmSS. The AWS managed policy AmazonSageMakerFullAccess is attached to the role. The role provides permissions that allow the notebook instance to call Amazon SageMaker and Amazon S3.  
g. For Root access, choose Enable to give all notebook instance users root access, or Disable to withhold it. If you enable root access, all notebook instance users have administrator privileges and can access and edit all files on the instance.  
h. (Optional) Encryption key lets you encrypt data on the ML storage volume attached to the notebook instance using an AWS Key Management Service (AWS KMS) key. If you plan to store sensitive information on the ML storage volume, consider encrypting the information.  
i. (Optional) Network lets you put your notebook instance inside a Virtual Private Cloud (VPC). A VPC provides additional security and restricts access to resources in the VPC from sources outside the VPC. For more information on VPCs, see Amazon VPC User Guide.  
To add your notebook instance to a VPC:  
i. Choose the VPC and a SubnetId.  
ii. For Security Group, choose your VPC's default security group.  
iii. If you need your notebook instance to have internet access, enable direct internet access. For Direct internet access, choose Enable. Internet access can make your notebook instance less secure. For more information, see Connect a Notebook Instance to Resources in a VPC.  
j. (Optional) To associate Git repositories with the notebook instance, choose a default repository and up to three additional repositories. For more information, see Associate Git Repositories with Amazon SageMaker Notebook Instances.  
Choose Create notebook instance.  
In a few minutes, Amazon SageMaker launches an ML compute instance (in this case, a notebook instance) and attaches an ML storage volume to it. The notebook instance has a preconfigured Jupyter notebook server and a set of Anaconda libraries. For more information, see the CreateNotebookInstance API.  
4. When the status of the notebook instance is InService, in the console, the notebook instance is ready to use. Choose Open Jupyter next to the notebook name to open the classic Jupyter dashboard.  
You can choose Open JupyterLab to open the JupyterLab dashboard. The dashboard provides access to your notebook instance and sample Amazon SageMaker notebooks that contain complete code walkthroughs. These walkthroughs show how to use Amazon SageMaker to perform common machine learning tasks. For more information, see Use Example Notebooks. For more information, see Access Notebook Instances.

## 1. Import modules
* `pandas` and `numpy` for data manipulation
* `turicreate`(*) for performing model selection and evaluation
* `sklearn` for splitting the data into train and test sets  
  
(*) turicreate only supports macOS and Linux. If you are on Windows, follow my advice and launch an AWS SageMaker Notebook instance (see previous section).


In [173]:
import pandas as pd
import numpy as np
import time
import turicreate as tc
from sklearn.model_selection import train_test_split

import sys
sys.path.append("..")

## 2. Load data
I will be using a dataset from a bookstore that I am not allowed to share.  
You can use any dataset of your own with this format: customerId, productId (in my file the product column is named `products`).  
* `data.csv` consisting of user transactions  
  
The format is as follows.


In [174]:
transactions = pd.read_csv('data.csv', dtype={'customerId': np.int32, 'products': np.int32})
print(transactions.shape)
transactions.head()

(572205, 2)


Unnamed: 0,customerId,products
0,7701,43293
1,7701,37268
2,54490,42591
3,64646,28661
4,73889,37399


## 3. Data preparation
### 3.1. Transform the data  
Right now our data has one row per purchase, with this structure:  
customerID, productID  
1, 101  
1, 101  
1, 101  
2, 101  
2, 101  
  
We would like one row per customer and product, with a purchase count:  
customerID, productID, purchase_count  
1, 101, 3  
2, 101, 2  


In [175]:
# Count how many times each customer bought each product,
# then rename the product column to productId.
data = transactions.groupby(['customerId', 'products']) \
    .size() \
    .reset_index(name='purchase_count') \
    .rename(columns={'products': 'productId'})
data['productId'] = data['productId'].astype(np.int64)

print(data.shape)
data.head()

(565725, 3)


Unnamed: 0,customerId,productId,purchase_count
0,1,3811,1
1,1,4037,1
2,1,29040,1
3,1,30306,1
4,1,47215,1
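
As a sanity check on the aggregation, the toy example from section 3.1 can be reproduced with the standard library alone. This is a minimal sketch and not the notebook's actual pipeline; `toy_transactions` and `count_purchases` are hypothetical names introduced only for illustration.

```python
from collections import Counter

def count_purchases(rows):
    """Collapse (customer_id, product_id) rows into purchase counts.

    Returns a sorted list of (customer_id, product_id, purchase_count) tuples.
    """
    counts = Counter(rows)  # each distinct pair is a key, its frequency the value
    return sorted((cust, prod, n) for (cust, prod), n in counts.items())

# Toy data from section 3.1: customer 1 bought product 101 three times,
# customer 2 bought it twice.
toy_transactions = [(1, 101), (1, 101), (1, 101), (2, 101), (2, 101)]
toy_counts = count_purchases(toy_transactions)
print(toy_counts)  # [(1, 101, 3), (2, 101, 2)]
```

The result matches the target structure shown in section 3.1, which is what the pandas transformation should also produce on the real data.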


### 3.2. Create a dummy column

* The dummy marks whether a customer bought an item or not.  
* If a customer bought an item, `purchase_dummy` is set to 1.  
* Why create a dummy?
    * In my domain (bookstore), purchase_count does not give much insight into customer preferences. 
    * This could be very different in other domains. 
    * I will create models using purchase_dummy. Feel free to use purchase_count if it applies to your domain.


In [176]:
def create_data_dummy(data):
    data_dummy = data.copy()
    data_dummy['purchase_dummy'] = 1
    return data_dummy

In [177]:
data_dummy = create_data_dummy(data)
data_dummy.head()

Unnamed: 0,customerId,productId,purchase_count,purchase_dummy
0,1,3811,1,1
1,1,4037,1,1
2,1,29040,1,1
3,1,30306,1,1
4,1,47215,1,1


## 4. Split train and test set
* Splitting the data into training and test sets is an important part of evaluating a predictive model, in this case a collaborative filtering model.   
* Typically, we use a larger portion of the data for training and a smaller portion for testing.  
* We use a 70:30 ratio for our train-test split.  
* The training portion is used to fit the model, and the test portion is used to evaluate its performance.  
* We now have two datasets, one with purchase counts and one with the purchase dummy, and we split each of them.  


In [178]:
# We define a split_data function for splitting data into training and test sets

def split_data(data, testsize):
    '''
    Splits dataset into training and test set.
    
    Args:
        data (pandas.DataFrame)
        testsize (float): fraction of rows held out for the test set
        
    Returns:
        train_data (tc.SFrame)
        test_data (tc.SFrame)
    '''
    train, test = train_test_split(data, test_size=testsize)
    train_data = tc.SFrame(train)
    test_data = tc.SFrame(test)
    return train_data, test_data

In [179]:
train_data, test_data = split_data(data, .3)
train_data_dummy, test_data_dummy = split_data(data_dummy, .3)
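
The split above is random and unseeded, so the numbers in the following sections will vary slightly from run to run. The sketch below illustrates a reproducible 70:30 split using only the standard library; `split_rows` and its `seed` parameter are hypothetical names for illustration, not part of the notebook.

```python
import random

def split_rows(rows, test_size, seed=0):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded generator gives a repeatable order
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

# 100 hypothetical (customer, product) rows
rows = [(customer, product) for customer in range(10) for product in range(10)]
train_rows, test_rows = split_rows(rows, test_size=0.3, seed=42)
print(len(train_rows), len(test_rows))  # 70 30
```

With scikit-learn, the same effect can be obtained by passing `random_state` to `train_test_split`.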


## 5. Models
### 5.1 Baseline: popularity
* We use a baseline model as a reference point for evaluating the collaborative filtering models.   
* The popularity baseline recommends the most popular items (the products with the highest number of sales across all customers) to every customer. 
* Because the baseline is a very simple approach, a more complex technique should only be chosen if it shows better accuracy.  
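
To make the idea of a popularity baseline concrete, here is a minimal count-based sketch. It assumes popularity means "number of distinct customers who bought the item"; Turi Create's own popularity scoring may differ from this, and `most_popular` and `toy_purchases` are hypothetical names.

```python
from collections import Counter

def most_popular(purchases, k):
    """Return the k product IDs bought by the most distinct customers."""
    # Deduplicate so repeat purchases by one customer count once (purchase_dummy style).
    unique_pairs = set(purchases)
    buyers_per_product = Counter(product for _customer, product in unique_pairs)
    # Sort by descending popularity, breaking ties by product ID for determinism.
    ranked = sorted(buyers_per_product.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _count in ranked[:k]]

# (customerId, productId) pairs
toy_purchases = [(1, 101), (1, 102), (2, 101), (2, 103), (3, 101), (3, 102)]
top_two = most_popular(toy_purchases, k=2)
print(top_two)  # [101, 102]
```

Every customer receives the same list, which is why this kind of model is only useful as a reference point.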

### 5.2 Collaborative filtering  
* In collaborative filtering, we recommend items based on how users with similar purchase histories buy items.   
* To compute similarity we will try three different measures: cosine, Pearson and Jaccard.  
#### 5.2.1 Cosine similarity
* Similarity is the cosine of the angle between the item vectors of A and B.
* It is defined by the following formula:
![](https://wikimedia.org/api/rest_v1/media/math/render/svg/1d94e5903f7936d3c131e040ef2c51b473dd071d)
* The closer the vectors, the smaller the angle and the larger the cosine.
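
A small worked example may help here. The sketch below computes the cosine similarity of two items represented as binary purchase vectors (one position per customer). The vectors and the `cosine_similarity` helper are made up for illustration and are not Turi Create's implementation.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)

# Each position is one customer; 1 means that customer bought the item.
item_a = [1, 1, 0, 1]
item_b = [1, 0, 0, 1]
sim_ab = cosine_similarity(item_a, item_b)
print(round(sim_ab, 4))  # 0.8165
```

Two customers bought both items, so the dot product is 2, and the result is 2 / (sqrt(3) * sqrt(2)).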

#### 5.2.2 Pearson similarity
* Similarity is the Pearson correlation coefficient between the two vectors.
* It is defined by the following formula:
![](http://critical-numbers.group.shef.ac.uk/glossary/images/correlationKT1.png)
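
The following sketch computes the Pearson coefficient by hand. It also shows what happens with a constant vector: after subtracting the mean, nothing is left to correlate. With `purchase_dummy` every observed value is 1, so this may be one reason the Pearson run in this notebook reports scores of 0.0, although that is an assumption about Turi Create's internals and not something verified here. `pearson_similarity` is a hypothetical helper.

```python
import math

def pearson_similarity(a, b):
    """Pearson correlation between two equal-length numeric vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    cov = sum(x * y for x, y in zip(centered_a, centered_b))
    spread_a = math.sqrt(sum(x * x for x in centered_a))
    spread_b = math.sqrt(sum(y * y for y in centered_b))
    if spread_a == 0 or spread_b == 0:
        return 0.0  # convention chosen here: a constant vector has no defined correlation
    return cov / (spread_a * spread_b)

ratings_a = [1, 2, 3, 4]
ratings_b = [4, 3, 2, 1]
constant = [1, 1, 1, 1]
print(round(pearson_similarity(ratings_a, ratings_b), 4))  # -1.0
print(pearson_similarity(constant, ratings_a))             # 0.0
```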

#### 5.2.3 Jaccard similarity
* Jaccard similarity measures the similarity between two sets of elements.  
* In the context of recommendation, the Jaccard similarity between two items is computed as:  
![](https://wikimedia.org/api/rest_v1/media/math/render/svg/eaef5aa86949f49e7dc6b9c8c3dd8b233332c9e7)
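
As an illustration, the sketch below treats each item as the set of customers who bought it and computes the Jaccard similarity between two such sets. The customer IDs and the `jaccard_similarity` helper are invented for the example.

```python
def jaccard_similarity(buyers_a, buyers_b):
    """Size of the intersection divided by size of the union of two sets."""
    buyers_a, buyers_b = set(buyers_a), set(buyers_b)
    union = buyers_a | buyers_b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(buyers_a & buyers_b) / len(union)

buyers_of_a = {1, 2, 4}     # customers who bought item A
buyers_of_b = {1, 4, 5, 6}  # customers who bought item B
sim = jaccard_similarity(buyers_of_a, buyers_of_b)
print(sim)  # 0.4
```

Two customers bought both items and five customers bought at least one of them, giving 2 / 5. Because only set membership matters, this measure ignores purchase counts entirely, which fits the purchase_dummy target used here.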


#### Define a `model` function for model selection

In [180]:
# Define a model function for model selection using turicreate

def model(train_data, name, user_id, item_id, target, users_to_recommend, n_rec, n_display):
    if name == 'popularity':
        rec_model = tc.popularity_recommender.create(train_data, 
                                                    user_id=user_id, 
                                                    item_id=item_id, 
                                                    target=target)
    elif name in ('cosine', 'pearson', 'jaccard'):
        # The three similarity models differ only in similarity_type
        rec_model = tc.item_similarity_recommender.create(train_data, 
                                                    user_id=user_id, 
                                                    item_id=item_id, 
                                                    target=target, 
                                                    similarity_type=name)
    else:
        raise ValueError('Unknown model name: {}'.format(name))
        
    # Recommend n_rec items per user and print the first n_display rows
    recom = rec_model.recommend(users=users_to_recommend, k=n_rec)
    recom.print_rows(n_display)
    return rec_model

In [181]:
# variables to define field names
user_id = 'customerId'
item_id = 'productId'

users_to_recommend = list(data.customerId.unique())
n_rec = 5 # number of items to recommend
n_display = 30 # to print the head

In [182]:
# Baseline: popularity over purchase_dummy
# feel free to use purchase_count
name = 'popularity'
target = 'purchase_dummy'
#target = 'purchase_count'
pop_dummy = model(train_data_dummy, name, user_id, item_id, target, users_to_recommend, n_rec, n_display)

+------------+-----------+-------+------+
| customerId | productId | score | rank |
+------------+-----------+-------+------+
|     1      |   29441   |  1.0  |  1   |
|     1      |   19250   |  1.0  |  2   |
|     1      |   31536   |  1.0  |  3   |
|     1      |   25292   |  1.0  |  4   |
|     1      |   32880   |  1.0  |  5   |
|     2      |   29441   |  1.0  |  1   |
|     2      |   19250   |  1.0  |  2   |
|     2      |   31536   |  1.0  |  3   |
|     2      |   25292   |  1.0  |  4   |
|     2      |   32880   |  1.0  |  5   |
|     3      |   29441   |  1.0  |  1   |
|     3      |   19250   |  1.0  |  2   |
|     3      |   31536   |  1.0  |  3   |
|     3      |   25292   |  1.0  |  4   |
|     3      |   32880   |  1.0  |  5   |
|     4      |   29441   |  1.0  |  1   |
|     4      |   19250   |  1.0  |  2   |
|     4      |   31536   |  1.0  |  3   |
|     4      |   25292   |  1.0  |  4   |
|     4      |   32880   |  1.0  |  5   |
|     5      |   29441   |  1.0  |

In [183]:
# Cosine
name = 'cosine'
target = 'purchase_dummy'
#target = 'purchase_count'
cos_dummy = model(train_data_dummy, name, user_id, item_id, target, users_to_recommend, n_rec, n_display)

+------------+-----------+----------------------+------+
| customerId | productId |        score         | rank |
+------------+-----------+----------------------+------+
|     1      |   17440   | 0.09622504313786824  |  1   |
|     1      |   43977   | 0.09622504313786824  |  2   |
|     1      |    6034   | 0.04811253150304159  |  3   |
|     1      |   15145   | 0.04811253150304159  |  4   |
|     1      |    595    | 0.04303314288457235  |  5   |
|     2      |   32738   | 0.047377943992614746 |  1   |
|     2      |   55484   |  0.0370370348294576  |  2   |
|     2      |    5678   |  0.0370370348294576  |  3   |
|     2      |   22268   |  0.0370370348294576  |  4   |
|     2      |   55744   |  0.0370370348294576  |  5   |
|     3      |    9861   | 0.020689308643341064 |  1   |
|     3      |    1825   | 0.02044650912284851  |  2   |
|     3      |   42787   | 0.01811191439628601  |  3   |
|     3      |   41231   | 0.01669451594352722  |  4   |
|     3      |   56688   | 0.01

In [187]:
# Pearson
name = 'pearson'
target = 'purchase_dummy'
#target = 'purchase_count'
pear_dummy = model(train_data_dummy, name, user_id, item_id, target, users_to_recommend, n_rec, n_display)

+------------+-----------+-------+------+
| customerId | productId | score | rank |
+------------+-----------+-------+------+
|     1      |   29441   |  0.0  |  1   |
|     1      |   19250   |  0.0  |  2   |
|     1      |   31536   |  0.0  |  3   |
|     1      |   25292   |  0.0  |  4   |
|     1      |   32880   |  0.0  |  5   |
|     2      |   29441   |  0.0  |  1   |
|     2      |   19250   |  0.0  |  2   |
|     2      |   31536   |  0.0  |  3   |
|     2      |   25292   |  0.0  |  4   |
|     2      |   32880   |  0.0  |  5   |
|     3      |   29441   |  0.0  |  1   |
|     3      |   19250   |  0.0  |  2   |
|     3      |   31536   |  0.0  |  3   |
|     3      |   25292   |  0.0  |  4   |
|     3      |   32880   |  0.0  |  5   |
|     4      |   29441   |  0.0  |  1   |
|     4      |   19250   |  0.0  |  2   |
|     4      |   31536   |  0.0  |  3   |
|     4      |   25292   |  0.0  |  4   |
|     4      |   32880   |  0.0  |  5   |
|     5      |   29441   |  0.0  |

In [188]:
# Jaccard
name = 'jaccard'
target = 'purchase_dummy'
#target = 'purchase_count'
jacc_dummy = model(train_data_dummy, name, user_id, item_id, target, users_to_recommend, n_rec, n_display)


+------------+-----------+-----------------------+------+
| customerId | productId |         score         | rank |
+------------+-----------+-----------------------+------+
|     1      |   17440   |  0.02777777115503947  |  1   |
|     1      |   43977   |  0.02777777115503947  |  2   |
|     1      |    6034   |  0.02222222089767456  |  3   |
|     1      |   15145   |  0.02222222089767456  |  4   |
|     1      |    595    |  0.020833333333333332 |  5   |
|     2      |   23708   |  0.013550142447153727 |  1   |
|     2      |   32738   |  0.006734013557434082 |  2   |
|     2      |   22433   |  0.006472488244374593 |  3   |
|     2      |   40050   |  0.00584795077641805  |  4   |
|     2      |   24461   |  0.005108555157979329 |  5   |
|     3      |    9861   | 0.0072921812534332275 |  1   |
|     3      |   42787   |  0.007282644510269165 |  2   |
|     3      |   19852   |  0.005385607481002808 |  3   |
|     3      |   53642   |  0.004958689212799072 |  4   |
|     3      |

## 7. Model Evaluation
For evaluating recommendation models, we will use the concept of precision-recall.

* Precision
    * Also called positive predictive value, it is the fraction of relevant instances among the retrieved instances.  
    * If 5 products were recommended to the customer and she buys 3 of them, the precision is 0.6

* Recall
    * Also known as sensitivity, it is the fraction of the total number of relevant instances that were actually retrieved.   
    * If a customer buys 5 products and 4 of them were among the recommendations, the recall is 0.8
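
The two definitions can be checked with a short per-customer sketch. It assumes "relevant" means "actually purchased"; Turi Create reports the mean of these values over users for each cutoff, which this simplified helper does not reproduce. `precision_recall_at_k` and the sample IDs are hypothetical.

```python
def precision_recall_at_k(recommended, purchased, k):
    """Precision and recall of the top-k recommendations for one customer."""
    top_k = list(recommended)[:k]
    relevant = set(purchased)
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 5 products recommended, the customer bought 3 of them (and nothing else).
recommended = [10, 20, 30, 40, 50]
purchased = {10, 30, 50}
p, r = precision_recall_at_k(recommended, purchased, k=5)
print(p, r)  # 0.6 1.0
```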
    
Our aim is to achieve both high precision and high recall.  
Let's compare the models we have built based on precision and recall:  

In [189]:
models = [pop_dummy, cos_dummy, pear_dummy, jacc_dummy]
names = ['Baseline Popularity Model', 'Cosine Similarity', 'Pearson Similarity', 'Jaccard Similarity']


In [190]:
eval_dummy = tc.recommender.util.compare_models(test_data_dummy, models, model_names=names)

PROGRESS: Evaluate model Baseline Popularity Model



Precision and recall summary statistics by cutoff
+--------+------------------------+------------------------+
| cutoff |     mean_precision     |      mean_recall       |
+--------+------------------------+------------------------+
|   1    | 2.3298618391929365e-05 | 9.707757663303904e-06  |
|   2    | 1.1649309195964682e-05 | 9.707757663303908e-06  |
|   3    | 1.164930919596466e-05  | 1.1371944691298866e-05 |
|   4    | 8.736981896973541e-06  | 1.1371944691298809e-05 |
|   5    | 6.989585517578789e-06  | 1.1371944691298826e-05 |
|   6    | 9.707757663303904e-06  | 2.4186184806859962e-05 |
|   7    | 1.1649309195964688e-05 | 3.971859706814611e-05  |
|   8    | 1.3105472845460233e-05 | 4.5543251666128706e-05 |
|   9    | 1.2943676884405236e-05 |  5.13679062641109e-05  |
|   10   | 1.1649309195964667e-05 | 5.136790626411092e-05  |
+--------+------------------------+------------------------+
[10 rows x 3 columns]


Overall RMSE: 0.0

Per User RMSE (best)
+------------+------+-------+
|


Precision and recall summary statistics by cutoff
+--------+-----------------------+-----------------------+
| cutoff |     mean_precision    |      mean_recall      |
+--------+-----------------------+-----------------------+
|   1    |  0.001945434635726103 | 0.0012218835181370975 |
|   2    | 0.0014969362316814617 | 0.0017647872631447136 |
|   3    | 0.0013086057330133527 | 0.0022926528090084423 |
|   4    | 0.0011794925560914256 | 0.0027142052514247937 |
|   5    |  0.001129982992008575 |   0.0031683803823316  |
|   6    | 0.0010600871368327846 | 0.0035387502484028658 |
|   7    | 0.0009918554686849858 | 0.0038851558212709978 |
|   8    | 0.0009421378812236443 |  0.00422210746702253  |
|   9    | 0.0009021742788430556 | 0.0045778324872474616 |
|   10   | 0.0008760280515365441 |  0.00493069561008328  |
+--------+-----------------------+-----------------------+
[10 rows x 3 columns]


Overall RMSE: 0.9990383995416421

Per User RMSE (best)
+------------+------+-------+
| customerId |


Precision and recall summary statistics by cutoff
+--------+------------------------+------------------------+
| cutoff |     mean_precision     |      mean_recall       |
+--------+------------------------+------------------------+
|   1    | 3.4947927587893936e-05 | 1.0484378276368205e-05 |
|   2    | 1.7473963793947042e-05 | 1.0484378276368188e-05 |
|   3    | 1.1649309195964621e-05 | 1.0484378276368206e-05 |
|   4    | 8.736981896973535e-06  | 1.0484378276368205e-05 |
|   5    | 6.989585517578765e-06  | 1.0484378276368196e-05 |
|   6    | 5.824654597982333e-06  | 1.0484378276368201e-05 |
|   7    | 6.656748111979829e-06  | 1.6309032874350556e-05 |
|   8    | 5.824654597982331e-06  | 1.630903287435065e-05  |
|   9    | 1.035494150752416e-05  | 4.640308163059274e-05  |
|   10   | 9.319447356771752e-06  | 4.640308163059276e-05  |
+--------+------------------------+------------------------+
[10 rows x 3 columns]


Overall RMSE: 1.0

Per User RMSE (best)
+------------+------+-------+
|


Precision and recall summary statistics by cutoff
+--------+-----------------------+----------------------+
| cutoff |     mean_precision    |     mean_recall      |
+--------+-----------------------+----------------------+
|   1    |  0.006139185946273366 | 0.003906332051645545 |
|   2    |  0.004980079681274885 | 0.00614822353569121  |
|   3    |  0.004244231650396437 | 0.007688942842138102 |
|   4    | 0.0038500966892663106 | 0.009013848782791597 |
|   5    |  0.003476153864075791 | 0.009989765167249262 |
|   6    | 0.0031996769258249343 | 0.010960763794989371 |
|   7    |  0.002975566406054933 | 0.011818273143097713 |
|   8    | 0.0028016588616295093 | 0.012702136741891029 |
|   9    | 0.0026741636443181685 | 0.013547687057106557 |
|   10   | 0.0025488688520769877 | 0.014351992559287234 |
+--------+-----------------------+----------------------+
[10 rows x 3 columns]


Overall RMSE: 0.9993991692398976

Per User RMSE (best)
+------------+--------------------+-------+
| customerId |

## 8. Model Selection
### 8.1. Evaluation summary
* Precision and Recall
![](../images/model_comparisons.png)


* Looking at the evaluation summary, the cosine and Jaccard collaborative filtering models achieve clearly higher precision and recall than the baseline popularity model, while the Pearson model performs about the same as the baseline. Jaccard similarity has the best precision and recall at every cutoff, so we choose it.  
* Evaluating this kind of model is difficult, and I would say it is normal to obtain low values for precision and recall.  
* This is because we are not actually trying to predict which products the customer is going to buy, but to give the customer recommendations based on her own previous purchases.    
* This evaluation gives us an idea of model performance, but it is not the final word.    
* It is important to understand that every recommendation given by the model comes with a similarity score.    
* This score indicates how much support the recommendation has.    
* The higher the score, the bigger the chance that the recommendation is a good one.  
* I would only suggest recommendations whose scores are sufficiently large.  
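
One way to apply that last point is to drop recommendations below a minimum score before showing them. The sketch below assumes the recommendations are available as a list of dicts; the `MIN_SCORE` cutoff is a hypothetical value that would need tuning per domain, and the sample rows are rounded from the Jaccard output above.

```python
def filter_by_score(recommendations, min_score):
    """Keep only recommendations whose similarity score reaches min_score."""
    return [rec for rec in recommendations if rec["score"] >= min_score]

# Rows shaped like the recommender output: customerId, productId, score, rank.
sample = [
    {"customerId": 1, "productId": 17440, "score": 0.0278, "rank": 1},
    {"customerId": 1, "productId": 595, "score": 0.0208, "rank": 5},
    {"customerId": 2, "productId": 24461, "score": 0.0051, "rank": 5},
]
MIN_SCORE = 0.02  # hypothetical cutoff, not a recommended value
kept = filter_by_score(sample, MIN_SCORE)
print([rec["productId"] for rec in kept])  # [17440, 595]
```

A customer whose recommendations all fall below the cutoff simply gets an empty list, which may be preferable to showing weakly supported suggestions.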



## 9. Output

### 9.1 Rebuild the model using the winning similarity type (Jaccard) and all the data


In [191]:
# Recommend for every customer present in the data
users_to_recommend = list(data_dummy[user_id].unique())

final_model = tc.item_similarity_recommender.create(tc.SFrame(data_dummy), 
                                            user_id=user_id, 
                                            item_id=item_id, 
                                            target='purchase_dummy', 
                                            similarity_type='jaccard')

recom = final_model.recommend(users=users_to_recommend, k=5)
recom.print_rows(n_display)

+------------+-----------+-----------------------+------+
| customerId | productId |         score         | rank |
+------------+-----------+-----------------------+------+
|     1      |   46978   |   0.0555555522441864  |  1   |
|     1      |   51614   |   0.0370370348294576  |  2   |
|     1      |    2684   |  0.02777778108914693  |  3   |
|     1      |    740    |  0.02777778108914693  |  4   |
|     1      |   11726   |  0.02777778108914693  |  5   |
|     2      |   23708   |  0.014981269836425781 |  1   |
|     2      |   54554   |  0.008572399616241455 |  2   |
|     2      |   42978   |  0.006504058837890625 |  3   |
|     2      |   48866   |  0.006097555160522461 |  4   |
|     2      |   25737   | 0.0054200490315755205 |  5   |
|     3      |    7407   |   0.0416666716337204  |  1   |
|     3      |    6470   |   0.0416666716337204  |  2   |
|     3      |   56973   |   0.0357142835855484  |  3   |
|     3      |    4209   |   0.0357142835855484  |  4   |
|     3      |

### 9.2. CSV output file with all recommendations

In [192]:
df_rec = recom.to_dataframe()
df_rec.to_csv('recommendation.csv')


### 9.3. Customer recommendation function

In [193]:
df_rec['recommendedProducts'] = df_rec.groupby([user_id])[item_id].transform(lambda x: '|'.join(x.astype(str)))
df_output = df_rec[['customerId', 'recommendedProducts']].drop_duplicates().sort_values('customerId').set_index('customerId')

def customer_recomendation(customer_id):
    # Return the pipe-separated recommendations for a customer, or None if unknown
    if customer_id not in df_output.index:
        print('Customer not found.')
        return None
    return df_output.loc[customer_id]

In [194]:
customer_recomendation(101)

recommendedProducts    11685|51032|49437|55977|36001
Name: 101, dtype: object
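
The function above returns the recommendations as a single pipe-separated string. If a ranked Python list is preferred, as described in the problem statement, a plain dictionary lookup is one option. This is a sketch under that assumption; `build_lookup` and `recommend_for` are hypothetical helpers, and the sample rows reuse the first three products shown for customer 101.

```python
from collections import defaultdict

def build_lookup(rows):
    """Map each customer ID to its product IDs ordered by rank."""
    by_customer = defaultdict(list)
    for customer_id, product_id, rank in rows:
        by_customer[customer_id].append((rank, product_id))
    # Sorting the (rank, product) pairs orders each list by rank
    return {cust: [prod for _rank, prod in sorted(pairs)]
            for cust, pairs in by_customer.items()}

def recommend_for(lookup, customer_id):
    """Return the ranked product list, or an empty list for unknown customers."""
    return lookup.get(customer_id, [])

# (customerId, productId, rank) rows, deliberately out of order
rows = [(101, 51032, 2), (101, 11685, 1), (101, 49437, 3), (7, 595, 1)]
lookup = build_lookup(rows)
print(recommend_for(lookup, 101))  # [11685, 51032, 49437]
print(recommend_for(lookup, 999))  # []
```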

## Summary
We built a recommendation system for making product recommendations to customers.   
We used collaborative filtering approaches with cosine, Pearson and Jaccard similarity measures and compared the models with our baseline popularity model.   
We evaluated our models using precision and recall and realized the impact of personalization.   
Finally, we selected the best approach, exported the recommendations and created a recommendation function.  
