<div align="center"><a href="https://www.nvidia.com/en-us/deep-learning-ai/education/"><img src="./images/DLI_Header.png"></a></div>

Welcome to the assessment for this course. While we have gone over a variety of algorithms, we haven't done the most important part. We still need to serve a user a recommendation!

Not a problem. We have all the pieces, time to put them together. For this assessment, we've constructed a skeleton for an end-to-end recommender system by copying over some code from Labs 1 and 2. By using what we've learned about the strengths and weaknesses of these algorithms, it will be up to you to apply them in the correct places.

The code we have from the previous labs is:
* [als.py](assessment/als.py) - Train and predict with an alternating least squares model
* [dataset.py](assessment/dataset.py) - Prepare a dataset as a `tf.data` input pipeline
* [wide_and_deep.py](assessment/wide_and_deep.py) - Construct a Wide & Deep TensorFlow model
* [utils.py](assessment/utils.py) - Helper functions such as calculating RMSE

As a reference, we can also unzip the packaged notebooks from the previous labs by uploading them (with the upward pointing arrow above the file menu) and running the cell block below. Please keep in mind that these notebooks won't run in the Lab 3 coding environment.

In [1]:
!unzip lab1_work.zip -d lab1/
!unzip lab2_work.zip -d lab2/

Archive:  lab1_work.zip
  inflating: lab1/1-01_intro.ipynb   
  inflating: lab1/1-03_content_based_intro.ipynb  
  inflating: lab1/1-05_als_intro.ipynb  
  inflating: lab1/1-02_environment.ipynb  
  inflating: lab1/1-06_als_real_data.ipynb  
  inflating: lab1/1-04_content_based_real_data.ipynb  
Archive:  lab2_work.zip
  inflating: lab2/2-04_wide_and_deep.ipynb  
  inflating: lab2/2-01_intro.ipynb   
  inflating: lab2/2-02_dataloader.ipynb  
  inflating: lab2/2-03_deep_nn.ipynb  


Let's get started. Here's the total pipeline we'll be constructing.

<img src="images/endtoend.png" height="300" width="500"/>

The goal is to create two Python files, [trainer.py](assessment/trainer.py) and [client.py](assessment/client.py). [trainer.py](assessment/trainer.py) will train both the Candidate Generator Model and the Candidate Scoring Model. [client.py](assessment/client.py) will return recommendations for a specified user from Triton.

## Scoring
* [1. Model Training](#1.-Model-Training)
 * [1.1 Candidate Generator Model](#1.1-Candidate-Generator-Model) - `30 Points`
 * [1.2 Candidate Scoring Model](#1.2-Candidate-Scoring-Model) - `30 Points`
* [2. Model Deployment](#2.-Model-Deployment)
 * [2.1 Triton](#2.1-Triton) - `20 Points`
 * [2.2 Client Application](#2.2-Client-Application) - `20 Points`

## 1. Model Training

In [trainer.py](assessment/trainer.py), we have four functions:
* `get_als_model`: Initializes and trains an ALS Model
* `get_wide_and_deep_model`: Initializes and trains a Wide & Deep Model
* `get_candidate_generator`: Creates a Candidate Generator Model
* `get_candidate_scoring_model`: Creates a Candidate Scoring Model

Each of them has a number of `FIXME`s.

### 1.1 Candidate Generator Model
For `get_candidate_generator`, implement either an ALS (`get_als_model`) or a Wide & Deep (`get_wide_and_deep_model`) model. It will be trained on the ratings data below.

In [2]:
import pandas as pd

ratings = pd.read_csv("data/task_3_ratings.csv")
ratings.head()

Unnamed: 0,user_index,item_index,overall,valid
0,180332,781,1.0,False
1,55433,781,2.0,False
2,34202,781,5.0,False
3,77087,781,4.0,False
4,67012,781,1.0,False


The Candidate Generator is focused on speed, and is meant to quickly reduce our item catalogue down to a few hundred items that our user might like.

One way we can speed up predictions for our users is to cache the results of the Candidate Generator. If we do this, including contextual information such as time and location is not so straightforward. We would need to cache different results for different combinations of time and place for the variety of different ways users can interact with our system.
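To see why context makes caching harder, here is a minimal sketch of a candidate cache. The names `build_candidate_cache` and `toy_generator` are hypothetical and are not part of the assessment code; the toy generator just stands in for a trained model.

```python
def build_candidate_cache(user_ids, contexts, generate_candidates):
    """Precompute candidates for every (user, context) pair."""
    return {
        (user_id, context): generate_candidates(user_id, context)
        for user_id in user_ids
        for context in contexts
    }


def toy_generator(user_id, context):
    # Hypothetical stand-in for a trained Candidate Generator.
    return [user_id, user_id + 1]


users = [0, 1, 2]

# Without context, we store one cache entry per user.
user_only_cache = build_candidate_cache(users, [None], toy_generator)

# With context, we store one entry per user per (time, place) combination.
time_and_place = [(t, p) for t in ("morning", "evening") for p in ("home", "work")]
contextual_cache = build_candidate_cache(users, time_and_place, toy_generator)

print(len(user_only_cache), len(contextual_cache))
```

The number of cache entries grows with the product of users and contexts, which is why the text keeps the Candidate Generator context-free.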

Let's keep things simple for now, and instead just focus on `user_index`, `item_index`, and `overall`.

Whether ALS or Wide and Deep is chosen, the Candidate Generator must be fast in both training and prediction:

* The Candidate Generator should be trained in less than 15 seconds `10 points`.
* The Candidate Generator should have a Root Mean Squared Error of less than 1.3 on the Validation Dataset `10 points`.
* The Candidate Generator should make predictions for the Validation Dataset in less than 0.005 seconds `10 points`.

**While either algorithm can hypothetically pass these requirements, one will be significantly easier to implement than the other.**
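The requirements above come down to two measurements: wall-clock time and RMSE. A minimal sketch of both follows, assuming NumPy is available. The helpers `rmse`, `timed`, and `predict_global_mean` are illustrative names, not the functions in [utils.py](assessment/utils.py), and the constant predictor is only a placeholder for a real model.

```python
import time

import numpy as np


def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def predict_global_mean(n_rows, mean_rating=3.0):
    # Hypothetical placeholder model: predicts the same rating for every row.
    return np.full(n_rows, mean_rating)


# The `overall` values shown in ratings.head() above, used as sample targets.
sample_ratings = np.array([1.0, 2.0, 5.0, 4.0, 1.0])

predictions, seconds = timed(predict_global_mean, len(sample_ratings))
score = rmse(sample_ratings, predictions)
print(f"RMSE {score:.4f} in {seconds:.6f} seconds")
```

Wrapping training and prediction in the same `timed` helper gives a quick local check before running the graded tests.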

### 1.2 Candidate Scoring Model

The goal of the Candidate Scoring Model is to take the candidates from the Generator Model and rank them in order of the user's predicted preference. While the Generator Model is focused on speed, the Scoring Model is focused on accuracy. Additionally, the Scoring Model can more easily incorporate contextual information, as it is run when a user asks for a recommendation.
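The two-stage idea can be sketched in plain Python. Everything here is a toy assumption: `recommend`, `cheap_score`, and `accurate_score` are hypothetical names, and the scoring functions stand in for the trained Generator and Scoring models.

```python
import heapq


def recommend(user, catalogue, cheap_score, accurate_score,
              n_candidates, n_results, context):
    """Narrow the catalogue with a fast model, then rank with a slower one."""
    # Stage 1: the fast score keeps only the most promising items.
    candidates = heapq.nlargest(
        n_candidates, catalogue, key=lambda item: cheap_score(user, item)
    )
    # Stage 2: the slower score sees request-time context, but only for
    # the few candidates that survived stage 1.
    ranked = sorted(
        candidates,
        key=lambda item: accurate_score(user, item, context),
        reverse=True,
    )
    return ranked[:n_results]


def cheap_score(user, item):
    # Toy generator score: prefers items with a higher index.
    return item


def accurate_score(user, item, context):
    # Toy scorer: prefers items close to a context-dependent target.
    return -abs(item - context["target"])


top_items = recommend(
    user=0,
    catalogue=range(10),
    cheap_score=cheap_score,
    accurate_score=accurate_score,
    n_candidates=4,
    n_results=2,
    context={"target": 7.4},
)
print(top_items)
```

The expensive scorer is called only `n_candidates` times per request, no matter how large the catalogue is.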

When we trained our Wide and Deep Model in Lab 2, our `metadata` was already joined with user ratings. However, in production when we're making a prediction for one user, our predictions will be based on our separate metadata table:

In [3]:
metadata = pd.read_csv("data/task_3_metadata.csv")
metadata.head()

Unnamed: 0,item_index,brand_index,category_0_0,category_0_1,category_0_2,category_0_3,category_1_0,category_1_1,category_1_2,category_1_3,...,category_1_2_index,salesRank_Electronics,salesRank_Camera,salesRank_Computers,salesRank_CellPhones,salesRank_CellPhones_NA,salesRank_Electronics_NA,salesRank_Camera_NA,salesRank_Computers_NA,price_filled
0,0,0,Electronics,GPS & Navigation,Vehicle GPS,Trucking GPS,,,,,...,101,147236.0,18665.5,14369.5,254050.0,False,False,False,False,299.99
1,1,0,Electronics,Computers & Accessories,Touch Screen Tablet Accessories,Chargers & Adapters,,,,,...,101,147236.0,18665.5,14369.5,254050.0,False,False,False,False,49.95
2,2,337,Electronics,eBook Readers & Accessories,Power Adapters,,,,,,...,101,147236.0,18665.5,14369.5,254050.0,False,False,False,False,19.65
3,3,3146,Electronics,Accessories & Supplies,Audio & Video Accessories,TV Accessories & Parts,,,,,...,101,147236.0,18665.5,14369.5,254050.0,False,False,False,False,29.99
4,4,0,Electronics,Computers & Accessories,Tablets,,,,,,...,101,147236.0,18665.5,14369.5,254050.0,False,False,False,False,188.88


We should update our training data to reflect this. We've kept our training data from Lab 2.

In [4]:
lab_2_data = pd.read_csv("data/task_2_wide_and_deep.csv")
lab_2_data.head()


Unnamed: 0,reviewerID,asin,overall,unixReviewTime,brand,category_0_0,category_0_1,category_0_2,category_0_3,category_1_0,...,user_index,item_index,brand_index,als_prediction,user_embed_0,user_embed_1,item_embed_0,item_embed_1,category_0_2_index,category_1_2_index
0,ARA6X7G3KBX39,B00005B4BW,1.0,1042243200,,Electronics,Camera & Photo,Lighting & Studio,Photo Studio,,...,180332,781,0,3.105687,1.451223,-2.425463,0.715103,-1.19517,83,101
1,A231WM2Z2JL0U3,B00005B4BW,2.0,965433600,,Electronics,Camera & Photo,Lighting & Studio,Photo Studio,,...,55433,781,0,3.53297,1.65088,-2.759162,0.76952,-1.28612,83,101
2,A1O130H3XTF5WF,B00005B4BW,5.0,954460800,,Electronics,Camera & Photo,Lighting & Studio,Photo Studio,,...,34202,781,0,3.962095,1.851401,-3.094298,0.820172,-1.370778,83,101
3,A2IIZ25SZSQGCC,B00005B4BW,4.0,1030233600,,Electronics,Camera & Photo,Lighting & Studio,Photo Studio,,...,77087,781,0,2.661104,1.243477,-2.078256,0.770355,-1.28751,83,101
4,A2BBDPGILE8EN4,B00005B4BW,1.0,1009497600,,Electronics,Camera & Photo,Lighting & Studio,Photo Studio,,...,67012,781,0,3.232237,1.510355,-2.524297,0.709057,-1.185062,83,101


Update `get_candidate_scoring_model` in [trainer.py](assessment/trainer.py) to calculate the [Gaussian Rank](https://medium.com/rapids-ai/gauss-rank-transformation-is-100x-faster-with-rapids-and-cupy-7c947e3397da) of our `metadata`'s `salesRank_Electronics` column. Then, join this column into `lab_2_data`.
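As a rough illustration of the transformation, here is a CPU-only sketch of a Gauss rank, assuming NumPy. It is not the code expected in [trainer.py](assessment/trainer.py). The function name `gauss_rank`, the `epsilon` value, and the sample values are assumptions, and the standard library's `NormalDist.inv_cdf` is swapped in for `erfinv` using the identity erfinv(x) = inv_cdf((x + 1) / 2) / sqrt(2).

```python
from statistics import NormalDist

import numpy as np


def gauss_rank(values, epsilon=1e-6):
    """Map values to a roughly Gaussian shape based on their rank order."""
    values = np.asarray(values, dtype=np.float64)
    n = values.size
    if n < 2:
        return np.zeros(n)
    # Rank each value from 0 to n - 1. Tied values get distinct,
    # order-dependent ranks in this simple version.
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")
    # Rescale ranks to [-1, 1], then clip so the inverse stays finite.
    scaled = ranks / (n - 1) * 2.0 - 1.0
    scaled = np.clip(scaled, -1.0 + epsilon, 1.0 - epsilon)
    # Inverse error function, expressed through the normal inverse CDF.
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf((float(x) + 1.0) / 2.0) for x in scaled]) / np.sqrt(2.0)


# Hypothetical sales ranks, not taken from the metadata table.
sales_rank = np.array([10.0, 30.0, 20.0])
transformed = gauss_rank(sales_rank)
print(transformed)
```

The result keeps the original ordering, centers the middle rank at zero, and is symmetric around it. Stored as a new `metadata` column, it can then be joined on `item_index`.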

Finally, choose either the ALS (`get_als_model`) or the Wide and Deep (`get_wide_and_deep_model`) model to implement as the Candidate Scoring Model.

Whether ALS or Wide and Deep is chosen, the Candidate Scoring Model needs to meet the following requirements:

* The Candidate Scoring Model should be trained in less than 600 seconds `10 points`.
* The Candidate Scoring Model should have a Root Mean Squared Error of less than 1.175 on the Validation Dataset `10 points`.
* The Candidate Scoring Model should make predictions for the Validation Dataset in less than 30 seconds `10 points`.

**While either algorithm can hypothetically pass these requirements, one will be significantly easier to implement than the other.** Run the cell below to verify that [trainer.py](assessment/trainer.py) has been correctly implemented for both Candidate Generation and Candidate Scoring, meeting all of the above requirements.

In [8]:
!python3 -m assessment.trainer

2023-03-18 05:18:10.407533: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-03-18 05:18:12.875603: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2023-03-18 05:18:12.875737: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2023-03-18 05:18:12.875911: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1038] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-03-18 05:18:12.876521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Found device 0 with properties: 
pciBusID: 0000:00:1e.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2023-03-18 05:18:12.876574: I tensorflow/stream_executor/platform/default/dso_load

## 2. Model Deployment

If the above code passed, it should have saved the components for the Candidate Generation Model and the Candidate Scoring Model. We'll need to load them into Triton and our client application. Let's start with Triton.

### 2.1 Triton

Open the File Browser (Ctrl/Command + Shift + F) to see the lab's file directory on the left (The list may need to be refreshed). The Candidate Scoring Model should now be saved to a folder on the left called `candidate_scorer`. Load it into `model_repository` using the following structure:

```
model_repository/
  candidate_scorer/
    config.pbtxt
    1/
      model.savedmodel/
        <tensorflow_saved_model_files>/
          ...
```

Feel free to use the [previous lab](3-03_triton.ipynb) as a guide. The cell below will **delete and rebuild** a `candidate_scorer` folder in `model_repository`. Please complete the `FIXME` below.

In [9]:
!rm -rf model_repository/candidate_scorer ||:; mkdir model_repository/candidate_scorer

In [10]:

import tensorflow as tf
from nvtabular.inference.triton.ensemble import export_tensorflow_model

model_name = 'candidate_scorer'
model = tf.keras.models.load_model("candidate_scorer")
tf_config = export_tensorflow_model(model, model_name, "model_repository/candidate_scorer", version=1)


INFO:tensorflow:Assets written to: model_repository/candidate_scorer/1/model.savedmodel/assets


Finally, let's verify that the model is running on Triton. `20 points`

In [11]:
import tritonhttpclient

try:
    triton_client = tritonhttpclient.InferenceServerClient(url="triton:8000", verbose=True)
    print("client created.")
except Exception as e:
    print("channel creation failed: " + str(e))

triton_client.get_model_repository_index()
triton_client.load_model(model_name=model_name)
triton_client.is_model_ready(model_name)

client created.
POST v2/repository/index, headers None

<HTTPSocketPoolResponse status=200 headers={'content-type': 'application/json', 'content-length': '77'}>
bytearray(b'[{"name":"candidate_scorer"},{"name":"wnd_tf","version":"1","state":"READY"}]')
POST v2/repository/models/candidate_scorer/load, headers None

<HTTPSocketPoolResponse status=200 headers={'content-type': 'application/json', 'content-length': '0'}>
Loaded model 'candidate_scorer'
GET v2/models/candidate_scorer/ready, headers None
<HTTPSocketPoolResponse status=200 headers={'content-length': '0', 'content-type': 'text/plain'}>


True

Once verified, run the code below to free the GPU for the next section.

In [12]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

### 2.2 Client Application

We have our models, and our Candidate Scorer is loaded into Triton. Time for the magic to happen. Let's string everything together in [client.py](assessment/client.py) where there are a number of TODOs.

1. Load your Candidate Generation Model and use it to find a user's top 16 (`BATCH_SIZE`) recommendations.
2. Specify the metadata columns specific to your Candidate Scoring Model.
3. Fix the `for` loop to format the data for Triton.

For both the Candidate Generation Model and the Candidate Scoring Model, CuPy's/NumPy's [argpartition](https://numpy.org/doc/stable/reference/generated/numpy.argpartition.html) function is used to quickly separate top scoring items from low scoring items.
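Here is a small NumPy sketch of that top-k selection. The helper name `top_k_indices` and the sample scores are made up for illustration and are not taken from [client.py](assessment/client.py).

```python
import numpy as np


def top_k_indices(scores, k):
    """Return the indices of the k highest scores, best first."""
    scores = np.asarray(scores)
    k = min(k, scores.size)
    if k == 0:
        return np.array([], dtype=np.int64)
    # argpartition moves the k largest scores into the last k slots
    # without fully sorting the array.
    top = np.argpartition(scores, -k)[-k:]
    # Only those k entries need a real sort.
    return top[np.argsort(scores[top])[::-1]]


predicted_scores = np.array([0.1, 4.5, 3.2, 4.9, 2.0])  # hypothetical scores
best = top_k_indices(predicted_scores, k=2)
print(best)
```

Partitioning first means only `k` items are ever sorted, which matters when the catalogue is much larger than the number of recommendations.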

The following tests are in place:
* Prediction for one user's top 16 items takes less than `2` seconds `10 points`
* The average predicted score for user `131676`'s top 4 recommended items is greater than `4.2`. `10 points`

In [1]:
!python3 -m assessment.client

2023-03-18 05:23:00.290320: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-03-18 05:23:02.104918: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2023-03-18 05:23:02.105054: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2023-03-18 05:23:02.105249: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1038] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-03-18 05:23:02.105847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Found device 0 with properties: 
pciBusID: 0000:00:1e.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2023-03-18 05:23:02.105900: I tensorflow/stream_executor/platform/default/dso_load

Got both `Tests pass!` for [1. Model Training](#1.-Model-Training) and [2. Model Deployment](#2.-Model-Deployment)? Run the cell below for the final check! This will verify your code against our assessment server, and if it passes, will qualify you for a certificate.

In [None]:
from run_assessment import run_assessment
run_assessment()

client created.
RMSE 4.717508816405891
RMSE 4.7618305926376125
RMSE 4.456239081430795
RMSE 2.7517976589880377
RMSE 1.6074786439280702
RMSE 1.4030620056945249
RMSE 1.305157570778642
RMSE 1.25908484286197
RMSE 1.2458316684430941
RMSE 1.23350007525153
RMSE 1.2218358127779516
RMSE 1.2159572277360957
RMSE 1.214025268669153
RMSE 1.2127061303551454
RMSE 1.21046435671919
RMSE 1.2077734250261178



Epoch 1/2
{'val_loss': 2.4234686, 'val_rmse': 1.1854905}
Epoch 2/2

## Generate a Certificate
If you passed the assessment, please return to the course page (shown below) and click the "ASSESS TASK" button, which will generate your certificate for the course.

<img src="images/run_assess_task.png" height="300" width="500"/>
