# Two-Tower DNN Model for Learning Query and Candidate Embeddings

References: 
- https://www.tensorflow.org/recommenders/examples/basic_retrieval
- https://storage.googleapis.com/pub-tools-public-publication-data/pdf/6c8a86c981a62b0126a11896b7f6ae0dae4c3566.pdf
_____

In this notebook, we will build a Two-Tower DNN Model system to recommend a set of courses from an online catalogue that a given user is likely to watch. One of the advantages of building a Deep Learning Recommendation engine is the ability to build rich, flexible feature representations.

## DNN: Retrival and Rankng Architecture 

![](../docs/assets/deep_retrieval.png) 

### Retrieval 

Stage is responsible for selecting an initial set of hundreds of candidates from all candidates and efficiently filter out candidates that a user is not interested in. Because the retrieval model could consists of millions of users and items, it has to be computationally efficient. We'll discuss more below on optimizing this stage, but first there are two main components of the retrieval model:

**1. Query Model**: computes the query representation (normally a fixed-dimensionality embedding vector) using query features. A query fearture could consist of, but not limited to the the following:

- unique id
- title
- previous_history
- content viewing time
- likes
- ratings
- view date

**2. Candidate model** computes the candidate representation (an equally-sized vector) using the candidate features. A candidate fearture could consist of:

- uqniue id
- title


The process in the retrieval stage is to translate User and Item ids into embedding vectors, which are just high-dimensional numerical representations. The weights for the embedding layers are adjuste during training and the outputs of the two models are then multiplied together to give a query-candidate affinity score, with higher scores expressing a better match between the candidate and the query.

### Ranking

Stage takes the outputs of the retrieval model and fine-tunes them to a much smaller subset of recommendations. For the ranking stage, one of the main differences with this architecture is the ability to substantially improve the recommendations by using more features rather than just user and candidate identifiers. We can include features about the candidate such as:

- prices
- genres
- posted time

_______
Keep in mind that **Matrix Factorization** `user_id` and `candidate_id` features collaboratively to learn the latent features and does not consider side-features like mentioned above, and thus may not be highly performant. We can use deep learnning architectures such as the two-tower nerual network to include side features into the model as shown below.

![](../docs/assets/two_tower_model.png) 

![](../docs/assets/two_tower_model2.png) 

________
### Evaluate Model

The training data contains positive (user, item) pairs. To evaluate the mode, we need to compare the affinity score that the model calculates for this pair to the scores of all the other possible candidates. So if the score for the positive pair is higher than for all other candidates, our model is highly accurate. The metric we will be using to evaluate the model is:

- **Metrics:** The metric utilized for evaluating the model is [factorized_top_k.TopK](https://www.tensorflow.org/recommenders/api_docs/python/tfrs/metrics/FactorizedTopK), which computes the top K categorical accuracy: how often the true candidate is in the top K candidates for a given query.

    - As the model trains, the top-k retrieval metrics updates. The `factorized_top_k` retrieval metric measures the number of true positive that is in the top-k retrieved items from the entire candidate set. As an example, a top-5 categorical accuracy metric of 0.2 would tell us that, on average, the true positive is in the top 5 retrieved items 20% of the time.

        - **Recall:** the ratio of items that a user likes were actually recommended. If a user likes say 5 items and the recommendation shows 3 of them, then the recall is 0.60.

        - **Precision:** out of all the recommended items, how many the user actually liked? If 5 items were recommended to the user out of which he liked, say 4 of them, then the precision is 0.80.

         - So we if we recommend all items to a user, then we have 100% recall! If we recommend say 1000 items and the user only likes 10, the precision is 0.10%, which is very low. 

        - **Goal:** Maximize both the precision and recall. 

- **Loss Function**: The loss function [tfrs.tasks.Retrieval](https://www.tensorflow.org/recommenders/api_docs/python/tfrs/tasks/Retrieval) will try to maximize the affinity of these query, candidate pairs while minimizing the affinity between the query and candidates belonging to other queries in the batch.



In [2]:
%cd ../

/Users/tracesmith/Desktop/Trace/Coding/user-recommender


In [3]:
%load_ext autoreload
import os
import pandas as pd
import tensorflow as tf
import tensorflow_recommenders as tfrs

from recommender.engine.deep_retrieval import train, QueryModel, CandidateModel, brute_force_recommendation
from recommender.utils.plots import plot_metrics

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### Build Dataset

For this example, the dataset we will focus on is `user_course_views.csv`.

In [6]:
# User content
user_course_views = pd.read_csv(os.path.join('data','user_course_views.csv'))
course_tags = pd.read_csv(os.path.join('data','course_tags.csv'))

In [7]:
user_course_views['course_id'] = user_course_views['course_id'].apply(lambda x: ' '.join(x.split('-')))
courses = user_course_views[['course_id','author_handle','level']].drop_duplicates()

In [8]:
# Users --> Convert to Tensors
user_id = tf.convert_to_tensor(user_course_views['user_handle'].astype(str), dtype=tf.string)
users_watched_courses = tf.convert_to_tensor(user_course_views['course_id'].astype(str), dtype=tf.string)
users_watched_courses_view_time = tf.convert_to_tensor(user_course_views['view_time_seconds'].astype(str), dtype=tf.int64)
tensors = {'user_id':user_id,'title':users_watched_courses,'user_view_time':users_watched_courses_view_time}
users = tf.data.Dataset.from_tensor_slices(tensors)

In [9]:
# Courses --> Convert to Tensors
course_ids = tf.convert_to_tensor(courses['course_id'].astype(str), dtype=tf.string)
tensors = {'title':course_ids}
courses = tf.data.Dataset.from_tensor_slices(tensors)
courses = courses.map(lambda x: x['title'])

### Test Query/Candidate Towers

In [291]:
%autoreload
query_model = QueryModel(users)
query_model.user_embeddings.layers[0].adapt(users.map(lambda x: x['user_id']).batch(10))
query_model.viewing_layer_norm.adapt(users.map(lambda x: x['user_view_time']).batch(10))

for row in users.batch(1).take(1):
    print(f"Computed representations: {user_model(row)[0, :3]}")
    
candidate_model = CandidateModel(courses)
candidate_model.title_id_embeddings.layers[0].adapt(courses.map(lambda x: x['title']).batch(10))
candidate_model.title_text_embedding.layers[0].adapt(courses.map(lambda x: x['title']).batch(10))
candidate_model.title_text_embedding.layers[0].adapt(courses.map(lambda x: x['level']).batch(10))
candidate_model.author_text_embedding.layers[0].adapt(courses.map(lambda x: x['level']).batch(10))

for row in courses.batch(1).take(1):
    print(f"Computed representations: {candidate_model(row)[0, :3]}")

Computed representations: [-0.01667677  0.04237409 -0.00219495]


### Train Model

In [None]:
%autoreload
model = train(users,courses,epochs=3)

### Evaluate Model

In [None]:
%autoreload
plot_metrics(history,['factorized_top_k/top_5_categorical_accuracy','factorized_top_k/top_10_categorical_accuracy',
                      'factorized_top_k/top_50_categorical_accuracy','factorized_top_k/top_100_categorical_accuracy'])

### Model Summary

In [334]:
model[1].query_model.summary()

Model: "sequential_65"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
query_model_13 (QueryModel)  multiple                  280355    
_________________________________________________________________
dense_25 (Dense)             multiple                  1088      
Total params: 281,443
Trainable params: 281,440
Non-trainable params: 3
_________________________________________________________________


In [335]:
model[1].candidate_model.summary()

Model: "sequential_68"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
candidate_model_13 (Candidat (None, 64)                222176    
_________________________________________________________________
dense_26 (Dense)             (None, 32)                2080      
Total params: 224,256
Trainable params: 224,256
Non-trainable params: 0
_________________________________________________________________


### Model Evaluation 

In [None]:
# Load Models
query_model = tf.keras.models.load_model('models/QueryModel')
candidate_model = tf.keras.models.load_model('models/QueryModel')

### Make Recommendation

To make a prediction, a query model takes in the features of the query and transforming them into a query embedding, and a candidate model. 

- As mentioned, we will be using [tfrs.layers.factorized_top_k.BruteForce](https://www.tensorflow.org/recommenders/api_docs/python/tfrs/layers/factorized_top_k/BruteForce), but this this approach is not suitable for large scale datasets since this layer sorts the pre-computed candidate representations, and calculate the scores of the query-candidate pairs for all possible candidates, and then returns the highest `k` ranked itmes. 

- An alternative approach is using approximate nearest neighbours (ANN) index which allows fast approximate lookup of candidates in response to a query produced by the query model. [ScaNN (Scalable Nearest Neighbors)](https://eugeneyan.com/writing/how-to-install-scann-on-mac/) is a method for efficient vector similarity search at scale.

- https://github.com/google-research/google-research/tree/master/scann

In [10]:
brute_force = brute_force_recommendation(courses,query_model,candidate_model,batch_size=1_000,k=10)

In [13]:
brute_force(tf.constant(["0"]))

(<tf.Tensor: shape=(1, 10), dtype=float32, numpy=
 array([[0.02702044, 0.02702044, 0.02702044, 0.02702044, 0.02702044,
         0.02702044, 0.02702044, 0.02702044, 0.02702044, 0.02702044]],
       dtype=float32)>,
 <tf.Tensor: shape=(1, 10), dtype=string, numpy=
 array([[b'cpt sp2010 web designers branding intro',
         b'cpt sp2010 web designers css',
         b'aws certified solutions architect professional',
         b'aws certified sysops admin associate',
         b'aws system admin fundamentals', b'react js getting started',
         b'arnold maya fundamentals',
         b'animated web social media banners photoshop flash 1857',
         b'design 2d game level illustrator 2113',
         b'twod racing game series unity 1 1245']], dtype=object)>)