# Creating a Recommendation System Web App

_Deep Learning Nanodegree Program | Capstone Project_

---

## General Outline

Recall the general outline for SageMaker projects using a notebook instance.

1. Download or otherwise retrieve the data.
2. Process / Prepare the data.
3. Upload the processed data to S3.
4. Train a chosen model.
5. Test the trained model (typically using a batch transform job).
6. Deploy the trained model.
7. Use the deployed model.

**TODO**: provide more information on the special setup related to the project here.


------
## 1. Introduction

### 1.1 Project Overview

#### Background Info

In this project, our goal is to build a simple web page on which one can select a song and a music app user. The web page then sends this data to our deployed model, which predicts whether the song should be recommended to that user.


#### The Project Origin

[WSDM - KKBox's Music Recommendation Challenge](https://www.kaggle.com/c/kkbox-music-recommendation-challenge/data)

#### Related data sets

We will be using the [kkbox dataset](https://www.kaggle.com/bvmadduluri/wsdm-kkbox), which was used for the [WSDM - KKBox's Music Recommendation Challenge](https://www.kaggle.com/c/kkbox-music-recommendation-challenge/data) Kaggle competition.


- **For detailed data preprocessing, please find the dedicated notebook within the same folder.**

In [1]:
# Make sure that we use SageMaker 1.x
!pip install sagemaker==1.72.0

Collecting sagemaker==1.72.0
  Downloading sagemaker-1.72.0.tar.gz (297 kB)
     |████████████████████████████████| 297 kB 15.7 MB/s eta 0:00:01
Collecting smdebug-rulesconfig==0.1.4
  Downloading smdebug_rulesconfig-0.1.4-py2.py3-none-any.whl (10 kB)
Building wheels for collected packages: sagemaker
  Building wheel for sagemaker (setup.py) ... done
  Created wheel for sagemaker: filename=sagemaker-1.72.0-py2.py3-none-any.whl size=386358 sha256=c47bc5392f1844a496857acf174acaa461b3e23c5e88a9f0e278ee695b78ca1a
  Stored in directory: /home/ec2-user/.cache/pip/wheels/c3/58/70/85faf4437568bfaa4c419937569ba1fe54d44c5db42406bbd7
Successfully built sagemaker
Installing collected packages: smdebug-rulesconfig, sagemaker
  Attempting uninstall: smdebug-rulesconfig
    Found existing installation: smdebug-rulesconfig 1.0.1
    Uninstalling smdebug-rulesconfig-1.0.1:
      Successfully uninstalled smdebug-rulesconfig-1.0.1
  Attempting uninstall: sagemaker
    Found existing instal

### 1.2 Problem Statement

#### Problem definition
Given the song id, the user id (`msno`), and further metadata about the song and the user, the trained model predicts whether the song should be recommended to the user.


#### A strategy for solving the problem
We will try to extend collaborative filtering with the rich metadata of songs and users. After some investigation, we chose the following paper as the basis of our solution.

- Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. Neural Collaborative Filtering. WWW 2017, 173–182. [Link](https://arxiv.org/pdf/1708.05031.pdf)

We will modify the NCF model architecture to include additional meta information.
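To make the idea concrete, here is a rough sketch of one possible way to feed metadata into an NCF-style model: look up an embedding for the user, the song, and each metadata field, concatenate them, and map the result to a score between 0 and 1. The feature selection, table sizes, embedding dimension, and the single linear layer are all assumptions made for illustration; this is not the architecture in `model.py`. Plain Python is used so the sketch runs without PyTorch.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the illustration is reproducible


def make_embedding(num_ids, dim):
    """Hypothetical embedding table: one small random vector per id."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_ids)]


# Assumed toy vocabulary sizes; the real dataset is far larger.
DIM = 4
tables = {
    "msno": make_embedding(5, DIM),     # users
    "song_id": make_embedding(7, DIM),  # songs
    "genre": make_embedding(3, DIM),    # song metadata
    "city": make_embedding(2, DIM),     # user metadata
}

# A single linear output layer over the concatenated embeddings (assumption).
weights = [rng.uniform(-0.1, 0.1) for _ in range(DIM * len(tables))]
bias = 0.0


def predict(sample):
    """Return a recommendation score in (0, 1) for one encoded sample."""
    features = []
    for name, table in tables.items():
        features.extend(table[sample[name]])  # look up and concatenate
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


score = predict({"msno": 1, "song_id": 4, "genre": 2, "city": 0})
print(round(score, 3))
```

In the original NCF paper only the user and item embeddings enter the network; the extra `genre` and `city` tables stand in for the additional metadata columns.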




discussion of the expected solution


### 1.3 Metrics:
Loss function
- binary cross entropy

The performance of the algorithm is measured by
- precision,
- recall,
- F1 score,
- ROC-AUC score.
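As a reminder of what these quantities measure, the following is a minimal pure-Python sketch of the loss and the four metrics. The labels and probabilities are made-up toy values, and the 0.5 decision threshold is an assumption; in practice a library such as scikit-learn would normally be used instead.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross entropy between labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def precision_recall_f1(y_true, y_prob, threshold=0.5):
    """Threshold the probabilities, then count true/false positives and negatives."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 1)
    fp = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 1)
    fn = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative
    (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]

loss = binary_cross_entropy(y_true, y_prob)
precision, recall, f1 = precision_recall_f1(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
print(loss, precision, recall, f1, auc)
```

Precision, recall, and F1 depend on the chosen threshold, whereas ROC-AUC only depends on how the scores rank positives against negatives.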

**For model evaluation, please check out the visualization notebook based on local model evaluation.**


In [1]:
%load_ext autoreload
%autoreload 2

## 3. Model Training

In [1]:
import pandas as pd
train_data = pd.read_csv('./data/train_sample.csv')

In [2]:
train_data

Unnamed: 0,msno,song_id,source_system_tab,source_screen_name,source_type,target,song_length,composer,lyricist,language,year,country,genre,artist,city,gender,registered_via,app_age,registration
0,27736,2177998,7,5,11,1,206471,241316,0,5,2016,200,58,131519,14,1,5,2103,2838
1,27736,456984,7,5,11,1,187802,147477,0,5,2016,153,103,196188,14,1,5,2103,2838
2,27736,2193373,7,5,11,1,247803,35078,0,5,2016,161,5,26332,14,1,5,2103,2838
3,27736,1821216,7,5,11,1,181115,117840,0,5,2016,161,103,156774,14,1,5,2103,2838
4,27736,2283904,7,5,11,0,257369,132191,0,5,2013,161,89,144996,14,1,5,2103,2838
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,1903,1450857,6,16,11,1,239709,53578,75968,5,2013,200,89,71037,14,1,5,396,4523
9996,1903,197806,6,16,11,0,149629,64332,0,5,2016,161,165,135485,14,1,5,396,4523
9997,1903,1484475,2,20,7,0,226440,99175,0,7,2016,81,165,71325,14,1,5,396,4523
9998,1903,674,8,4,12,0,179908,0,0,5,1999,161,89,103736,14,1,5,396,4523


## Step 3: Upload the data to S3

We will need to upload the training dataset to S3 in order for our training code to access it. 

### Uploading the training data


Next, we need to upload the training data to the SageMaker default S3 bucket so that we can provide access to it while training our model.

In [3]:
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()
prefix = 'sagemaker/recommendation_system'

role = sagemaker.get_execution_role()

In [5]:
data_dir = "./upload/"
input_data = sagemaker_session.upload_data(path=data_dir, bucket=bucket, key_prefix=prefix)
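When `path` is a directory, `upload_data` returns an S3 URI of the form `s3://{bucket}/{key_prefix}`, and each file is stored under that prefix at its path relative to the directory. The sketch below only illustrates this mapping locally; `planned_s3_keys` is a hypothetical helper, not part of the SageMaker SDK, and the file name `train.csv` is an assumption.

```python
import os
import tempfile


def planned_s3_keys(path, key_prefix):
    """List (local_file, s3_key) pairs for every file under `path`.

    Hypothetical helper that mimics the key layout; it uploads nothing.
    """
    pairs = []
    for dirpath, _, filenames in os.walk(path):
        for name in filenames:
            local_path = os.path.join(dirpath, name)
            # Path relative to the uploaded directory, with "/" as S3 separator
            rel = os.path.relpath(local_path, start=path).replace(os.sep, "/")
            pairs.append((local_path, "{}/{}".format(key_prefix, rel)))
    return sorted(pairs, key=lambda pair: pair[1])


with tempfile.TemporaryDirectory() as data_dir:
    # Assumed file name, for illustration only.
    with open(os.path.join(data_dir, "train.csv"), "w") as f:
        f.write("msno,song_id,target\n")
    keys = [key for _, key in planned_s3_keys(data_dir, "sagemaker/recommendation_system")]

print(keys)
```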

**NOTE:** The cell above uploads the entire contents of our data directory (`./upload/`), so every file in it ends up under the same S3 prefix and is available to the training job. Any auxiliary file that the endpoint will need at inference time must also be saved in the model directory during training.

## Step 4: Build and Train the PyTorch Model

A model comprises three objects:

 - Model Artifacts,
 - Training Code, and
 - Inference Code.
 
We will start by implementing our own neural network in PyTorch along with a training script. For the purposes of this project we have provided the necessary model object in the `model.py` file, inside of the `code` folder. You can see the provided implementation by running the cell below.

### Neural Collaborative Filtering 

We have implemented the Neural Collaborative Filtering (NCF) model based on the paper:
- Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. Neural Collaborative Filtering. WWW 2017, 173–182.

Link to the paper:
- https://arxiv.org/pdf/1708.05031.pdf

### Training method



In [9]:
import torch


def move_to(obj, device):
    """Helper function to move a (possibly nested) data structure to a device."""
    if torch.is_tensor(obj):
        return obj.to(device)
    elif isinstance(obj, dict):
        # Move every value of the dictionary, keeping the keys
        return {k: move_to(v, device) for k, v in obj.items()}
    elif isinstance(obj, list):
        return [move_to(v, device) for v in obj]
    else:
        raise TypeError("Invalid type for move_to")


def train(model, train_loader, epochs, optimizer, loss_fn, device):
    for epoch in range(1, epochs + 1):
        model.train()
        total_loss = 0
        for batch in train_loader:
            # The whole batch dictionary is the model input;
            # the label is its 'target' column
            batch_X, batch_y = batch, batch['target'].float()

            batch_X = move_to(batch_X, device)
            batch_y = move_to(batch_y, device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = model(batch_X)

            loss = loss_fn(outputs, batch_y)
            loss.backward()
            optimizer.step()

            total_loss += loss.item()
        print("Epoch: {}, BCELoss: {}".format(epoch, total_loss / len(train_loader)))

Supposing we have the training method above, we will test that it is working by writing a bit of code in the notebook that executes our training method on the small sample training set that we loaded earlier. The reason for doing this in the notebook is so that we have an opportunity to fix any errors that arise early when they are easier to diagnose.

In [13]:
from torch.utils.data import Dataset, DataLoader

class KKbox_set(Dataset):
    def __init__(self, df, transformation=None):
        super().__init__()
        # Store each row of the DataFrame as a dictionary {column name: value}
        self.data = df.to_dict('records')
        self.transform = transformation

    def __getitem__(self, index):
        """Return one sample (a user-song interaction) as a dictionary."""
        sample = self.data[index]
        return sample

    def __len__(self):
        return len(self.data)
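Because each sample is a dictionary, a `DataLoader` with PyTorch's default collate function batches the samples key by key, which is why `train()` can read `batch['target']`. The pure-Python sketch below imitates that behaviour with lists instead of tensors so it runs without PyTorch; `RecordDataset`, `collate_dicts`, and `iterate_batches` are hypothetical names, and the three records reuse values from the sample rows shown earlier.

```python
class RecordDataset:
    """Pure-Python analogue of a map-style dataset that returns dictionaries."""

    def __init__(self, records):
        self.data = list(records)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)


def collate_dicts(samples):
    """Merge a list of {column: value} samples into one {column: [values]} batch."""
    return {key: [sample[key] for sample in samples] for key in samples[0]}


def iterate_batches(dataset, batch_size):
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield collate_dicts([dataset[i] for i in range(start, stop)])


records = [
    {"msno": 27736, "song_id": 2177998, "target": 1},
    {"msno": 27736, "song_id": 456984, "target": 1},
    {"msno": 1903, "song_id": 674, "target": 0},
]

batches = list(iterate_batches(RecordDataset(records), batch_size=2))
print(batches[0]["target"])  # labels of the first batch
```

With the real `DataLoader`, each list in the batch would be a tensor, which is what `move_to` expects.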

In order to construct a PyTorch model using SageMaker we must provide SageMaker with a training script. We may optionally include a directory which will be copied to the container and from which our training code will be run. When the training container is executed it will check the uploaded directory (if there is one) for a `requirements.txt` file and install any required Python libraries, after which the training script will be run.

### (TODO) Training the model

When a PyTorch model is constructed in SageMaker, an entry point must be specified. This is the Python file which will be executed when the model is trained. Inside of the `code` directory (the `source_dir` given to the estimator below) is a file called `train.py`, which has been provided and which contains most of the necessary code to train our model. The only thing that is missing is the implementation of the `train()` method which you wrote earlier in this notebook.

**TODO**: Copy the `train()` method written above and paste it into the `code/train.py` file where required.

SageMaker passes hyperparameters to the training script as command-line arguments. These arguments can then be parsed and used in the training script. To see how this is done, take a look at the provided `code/train.py` file.
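A minimal sketch of how a training script might read these arguments is shown below. It is not the contents of the provided script: the two hyperparameter names match those passed to the estimator in this notebook, `SM_MODEL_DIR` and `SM_CHANNEL_TRAINING` are environment variables set by the SageMaker training container, and the local fallback paths `./model` and `./data` are assumptions so the snippet also runs outside a container.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()

    # Hyperparameters arrive as command-line flags, e.g. --epochs 10
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--embedding_dim", type=int, default=10)

    # Locations provided by the training container as environment variables;
    # the fallbacks are assumed local paths for running outside SageMaker.
    parser.add_argument("--model-dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "./model"))
    parser.add_argument("--data-dir", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAINING", "./data"))

    return parser.parse_args(argv)


# Simulate the flags SageMaker would pass for the hyperparameters above.
args = parse_args(["--epochs", "10", "--embedding_dim", "10"])
print(args.epochs, args.embedding_dim, args.model_dir, args.data_dir)
```

The channel name `training` used in `estimator.fit({'training': input_data})` is what determines the `SM_CHANNEL_TRAINING` variable name.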

In [39]:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(entry_point="train.py",
                    source_dir="code",
                    role=role,
                    framework_version='0.4.0',
                    train_instance_count=1,
                    train_instance_type='ml.p2.xlarge',
                    hyperparameters={
                        'epochs': 10,
                        'embedding_dim': 10,
                    })

In [40]:
estimator.fit({'training': input_data})

'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


2021-02-28 19:53:26 Starting - Starting the training job...
2021-02-28 19:53:28 Starting - Launching requested ML instances......
2021-02-28 19:54:35 Starting - Preparing the instances for training......
2021-02-28 19:55:50 Downloading - Downloading input data.........
2021-02-28 19:57:09 Training - Downloading the training image..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2021-02-28 19:57:32,174 sagemaker-containers INFO     Imported framework sagemaker_pytorch_container.training
2021-02-28 19:57:32,198 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2021-02-28 19:57:35,231 sagemaker_pytorch_container.training INFO     Invoking user training script.
2021-02-28 19:57:35,473 sagemaker-containers INFO     Module train does not provide a setup.py.
Generating setup.py
2021-02-28 19:57:35,474 sagemaker-containers IN


2021-02-28 19:58:02 Uploading - Uploading generated training model
2021-02-28 19:59:05 Completed - Training job completed
Training seconds: 195
Billable seconds: 195


## Step 5: Testing the model

As mentioned at the top of this notebook, we will be testing this model by first deploying it and then sending the testing data to the deployed endpoint. We will do this so that we can make sure that the deployed model is working correctly.

## Step 6: Deploy the model for testing

Now that we have trained our model, we would like to test it to see how it performs. Our model takes a dictionary of encoded feature columns (for example `msno` and `song_id`) as input, which is the same structure that `train()` passes to `model(batch_X)`. SageMaker provides built-in inference code for PyTorch models, although its default input handling is designed for simple array-like inputs.

There is one thing that we need to provide, however, and that is a function which loads the saved model. This function must be called `model_fn()` and takes as its only parameter a path to the directory where the model artifacts are stored. This function must also be present in the python file which we specified as the entry point. In our case the model loading function has been provided and so no changes need to be made.

**NOTE**: When the built-in inference code is run, it must import the `model_fn()` method from the `train.py` file. This is why the training code is wrapped in a main guard (i.e., `if __name__ == '__main__':`).

Since we don't need to change anything in the code that was uploaded during training, we can simply deploy the current model as-is.

**NOTE:** When deploying a model, you are asking SageMaker to launch a compute instance that will wait for data to be sent to it. As a result, this compute instance will continue to run until *you* shut it down. This is important to know since the cost of a deployed endpoint depends on how long it has been running.

In other words, **if you are no longer using a deployed endpoint, shut it down!**

**TODO:** Deploy the trained model.

In [41]:
# TODO: Deploy the trained model
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


-----------------------------*

UnexpectedStatusException: Error hosting endpoint sagemaker-pytorch-2021-02-28-19-53-25-732: Failed. Reason:  The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint..

### Delete the endpoint

Once we've deployed an endpoint, it continues to run until we tell it to shut down. Since we are done using our endpoint for now, we can delete it.

In [51]:
estimator.delete_endpoint()

estimator.delete_endpoint() will be deprecated in SageMaker Python SDK v2. Please use the delete_endpoint() function on your predictor instead.
