<h1>Content-Based Filtering by Hand</h1>

We need to use TensorFlow version 1.15.0.

In [1]:
!pip install tensorflow==1.15.0

Collecting tensorflow==1.15.0
  Downloading tensorflow-1.15.0-cp37-cp37m-manylinux2010_x86_64.whl (412.3 MB)
Installing collected packages: tensorflow
  Attempting uninstall: tensorflow
    Found existing installation: tensorflow 1.15.2
    Uninstalling tensorflow-1.15.2:
      Successfully uninstalled tensorflow-1.15.2
Successfully installed tensorflow-1.15.0


Make sure to restart your kernel so that this change takes effect.

<h2>Importing Required Modules </h2>

In [1]:
import numpy as np
import tensorflow as tf  # you can follow along in Colab; AI Platform is not needed for this tutorial

In [2]:
tf.enable_eager_execution()
print(tf.__version__)

1.15.0


To start, we'll create our list of users, movies and features. While the users and movies represent elements in our database, for a content-based filtering method the features of the movies are likely hand-engineered and rely on domain knowledge to provide the best embedding space. Here we use the categories of Action, Sci-Fi, Comedy, Cartoon, and Drama to describe our movies (and thus our users).

In this example, we will assume our database consists of four users and six movies, listed below.

In [6]:
users = ["Ayan", "Anusha", "Prem", "Srinjoy"]  # four example users
movies = ["Star Wars", "The Dark Knight", "Shrek", "The Incredibles", "Ra-one", "DDLJ"]  # six movies, a mix of Hollywood and Bollywood
features = ["Action", "Sci-Fi", "Comedy", "Cartoon", "Drama"]
num_users = len(users)
num_movies = len(movies)
num_feats = len(features)
num_recommendations = 2  # number of recommendations to make per user

### Initialize our users, movie ratings and features

We'll need to enter each user's movie ratings and the k-hot encoded movie features matrix. Each row of the `users_movies` matrix holds a single user's ratings (from 1 to 10) for each movie. <b>A zero indicates that the user has not seen/rated that movie</b>. The `movies_feats` matrix contains the features for each of the given movies: each row represents one of the six movies, and the columns represent the five categories. A one indicates that a movie fits within a given genre/category.
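Instead of typing the k-hot rows by hand, they could be derived from a list of genres per movie. The following is only a sketch: the `movie_genres` dictionary and the `k_hot` helper are hypothetical names introduced for illustration, and the genre lists simply mirror the `movies_feats` matrix used in this notebook.

```python
features = ["Action", "Sci-Fi", "Comedy", "Cartoon", "Drama"]

# Hypothetical genre lists, chosen to mirror the movies_feats matrix in this notebook.
movie_genres = {
    "Star Wars": ["Action", "Sci-Fi", "Drama"],
    "The Dark Knight": ["Action", "Sci-Fi"],
    "Shrek": ["Comedy", "Cartoon"],
    "The Incredibles": ["Action", "Comedy", "Cartoon"],
    "Ra-one": ["Drama"],
    "DDLJ": ["Action", "Drama"],
}


def k_hot(genres, vocabulary):
    """Return a row with 1.0 for every vocabulary entry present in `genres`, else 0.0."""
    return [1.0 if name in genres else 0.0 for name in vocabulary]


# One row per movie, in the insertion order of the dictionary.
movies_feats_rows = [k_hot(genres, features) for genres in movie_genres.values()]

for title, row in zip(movie_genres, movies_feats_rows):
    print(title, row)
```

Unlike a one-hot row, a k-hot row may contain several ones, since a movie can belong to more than one category.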

In [7]:
# each row holds one user's ratings for the six movies (0 = not seen/rated)

users_movies = tf.constant([
                [4,  6,  8,  0, 0, 0],
                [0,  0, 10,  0, 8, 3],
                [0,  6,  0,  0, 3, 7],
                [10, 9,  0,  5, 0, 2]], dtype=tf.float32)  # shape: (num_users, num_movies)


# k-hot encoded movie features: a movie can belong to several categories
# columns represent ['Action', 'Sci-Fi', 'Comedy', 'Cartoon', 'Drama']

movies_feats = tf.constant([
                [1, 1, 0, 0, 1],
                [1, 1, 0, 0, 0],
                [0, 0, 1, 1, 0],
                [1, 0, 1, 1, 0],
                [0, 0, 0, 0, 1],
                [1, 0, 0, 0, 1]], dtype=tf.float32)  # shape: (num_movies, num_feats)

### Computing the user feature matrix

We will compute the user feature matrix; that is, a matrix containing each user's embedding in the five-dimensional feature space. We can calculate this as the matrix multiplication of the `users_movies` tensor with the `movies_feats` tensor, as done in the cell below.

In [8]:
# matrix-multiply the ratings by the movie features to get each user's interest in each category (e.g. Action, Drama)
users_feature = tf.matmul(users_movies,movies_feats)
users_feature

<tf.Tensor: id=2, shape=(4, 5), dtype=float32, numpy=
array([[10., 10.,  8.,  8.,  4.],
       [ 3.,  0., 10., 10., 11.],
       [13.,  6.,  0.,  0., 10.],
       [26., 19.,  5.,  5., 12.]], dtype=float32)>

Each of the four rows holds one user's interest in the five categories, accumulated from the ratings that user gave and the features of the rated movies.
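To see what the matrix multiplication does for a single user, the first row can be rebuilt by hand as a ratings-weighted sum of movie feature rows. This is a NumPy sketch rather than the notebook's TensorFlow code; the variable names `ayan_ratings` and `weighted_rows` are introduced here only for illustration.

```python
import numpy as np

# Ayan's ratings for the six movies (0 = not rated).
ayan_ratings = np.array([4, 6, 8, 0, 0, 0], dtype=np.float32)

movies_feats = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 1]], dtype=np.float32)

# Scale each movie's feature row by Ayan's rating for that movie...
weighted_rows = ayan_ratings[:, None] * movies_feats
# ...then add the rows up to get one score per category.
ayan_features = weighted_rows.sum(axis=0)

# Same values as the first row of the matrix product: 10, 10, 8, 8, 4.
print(ayan_features)
print(np.array_equal(ayan_features, ayan_ratings @ movies_feats))
```

Unrated movies carry a rating of zero, so they contribute nothing to the sum.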

Next we normalize each user feature vector to sum to 1. Normalizing isn't strictly necessary, but it makes rating magnitudes comparable between users.

In [9]:
users_feature = users_feature / tf.reduce_sum(users_feature,axis=1,keepdims=True)
users_feature

<tf.Tensor: id=5, shape=(4, 5), dtype=float32, numpy=
array([[0.25      , 0.25      , 0.2       , 0.2       , 0.1       ],
       [0.0882353 , 0.        , 0.29411766, 0.29411766, 0.32352942],
       [0.44827586, 0.20689656, 0.        , 0.        , 0.3448276 ],
       [0.3880597 , 0.2835821 , 0.07462686, 0.07462686, 0.17910448]],
      dtype=float32)>
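One case the division above does not cover is a user with no ratings at all: that row sums to zero, and dividing by it would produce NaN values. The NumPy sketch below shows one possible guard. The third row is a hypothetical user added only to illustrate the problem; it is not part of the notebook's data.

```python
import numpy as np

users_feature = np.array([
    [10, 10,  8,  8,  4],
    [ 3,  0, 10, 10, 11],
    [ 0,  0,  0,  0,  0],   # hypothetical user with no ratings yet
], dtype=np.float32)

row_sums = users_feature.sum(axis=1, keepdims=True)

# Divide only where the row sum is non-zero; all-zero rows stay all zeros.
normalized = np.divide(users_feature, row_sums,
                       out=np.zeros_like(users_feature),
                       where=row_sums != 0)

print(normalized)
```

Leaving such a row at zero is a design choice made for this sketch; a real system would more likely handle users without ratings separately.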

Each row now holds one user's normalized scores.
<p>Now we can use users_feature computed above to represent the relative importance of each movie category for each user.</p>

In [10]:
top_users_features = tf.nn.top_k(users_feature, num_feats)[1]
top_users_features

<tf.Tensor: id=8, shape=(4, 5), dtype=int32, numpy=
array([[0, 1, 2, 3, 4],
       [4, 2, 3, 0, 1],
       [0, 4, 1, 2, 3],
       [0, 1, 4, 2, 3]], dtype=int32)>

This ranks the features for each user from most to least preferred; each entry is an index into the `features` list.

In [11]:
for i in range(num_users):
    feature_names = [features[int(index)] for index in top_users_features[i]]
    print("{}: {}".format(users[i],feature_names))

Ayan: ['Action', 'Sci-Fi', 'Comedy', 'Cartoon', 'Drama']
Anusha: ['Drama', 'Comedy', 'Cartoon', 'Action', 'Sci-Fi']
Prem: ['Action', 'Drama', 'Sci-Fi', 'Comedy', 'Cartoon']
Srinjoy: ['Action', 'Sci-Fi', 'Drama', 'Comedy', 'Cartoon']
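The same ranking can be reproduced without TensorFlow, which also makes the handling of ties explicit. `tf.nn.top_k` places the lower index first when two values are equal; the NumPy sketch below gets the same behavior with a stable sort on the negated scores. It is an illustrative equivalent, not the notebook's own code, and `raw` and `ranking` are names introduced here.

```python
import numpy as np

users = ["Ayan", "Anusha", "Prem", "Srinjoy"]
features = ["Action", "Sci-Fi", "Comedy", "Cartoon", "Drama"]

# Unnormalized user feature matrix from the matrix product above.
raw = np.array([
    [10, 10,  8,  8,  4],
    [ 3,  0, 10, 10, 11],
    [13,  6,  0,  0, 10],
    [26, 19,  5,  5, 12]], dtype=np.float32)
users_feature = raw / raw.sum(axis=1, keepdims=True)

# Negate so the largest score sorts first; a stable sort keeps the
# lower index ahead when two scores are equal.
ranking = np.argsort(-users_feature, axis=1, kind="stable")

for name, order in zip(users, ranking):
    print("{}: {}".format(name, [features[i] for i in order]))
```

Ties matter here: Ayan weights Action and Sci-Fi equally, so the order between those two reflects only the column order, not a real preference.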


## Determining movie recommendations

We'll now use the `users_feature` tensor we computed above to determine the movie ratings and recommendations for each user.

To compute the projected ratings for each movie, we compute the similarity measure between the user's feature vector and the corresponding movie feature vector.  

We will use the dot product as our similarity measure. Because the movie features are 0/1 indicators, each projected rating is the sum of the user's normalized weights over the categories the movie belongs to.

In [13]:
users_ratings = tf.matmul(users_feature,tf.transpose(movies_feats))
users_ratings

<tf.Tensor: id=107, shape=(4, 6), dtype=float32, numpy=
array([[0.6       , 0.5       , 0.4       , 0.65      , 0.1       ,
        0.35      ],
       [0.4117647 , 0.0882353 , 0.5882353 , 0.67647064, 0.32352942,
        0.4117647 ],
       [1.        , 0.6551724 , 0.        , 0.44827586, 0.3448276 ,
        0.79310346],
       [0.8507463 , 0.6716418 , 0.14925373, 0.53731346, 0.17910448,
        0.5671642 ]], dtype=float32)>
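As a worked example of a single entry, take Ayan and The Incredibles (row 0, column 3 of the result). The plain-Python sketch below recomputes that dot product; the variable names are introduced here only for illustration.

```python
features = ["Action", "Sci-Fi", "Comedy", "Cartoon", "Drama"]

ayan_weights = [0.25, 0.25, 0.2, 0.2, 0.1]   # Ayan's normalized feature vector
incredibles = [1, 0, 1, 1, 0]                # k-hot features of The Incredibles

# Dot product: only the categories the movie belongs to contribute.
score = sum(w * f for w, f in zip(ayan_weights, incredibles))
matched = [name for name, f in zip(features, incredibles) if f]

# Action (0.25) + Comedy (0.2) + Cartoon (0.2)
print(matched, round(score, 2))
```

Because the score is a plain sum, a movie tagged with more categories has more chances to pick up weight, which is part of why The Incredibles scores well for several users here.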

<h1> In case of new ratings </h1>
The computation above finds the similarity measure between each user and each movie in our database. To focus only on the ratings for new movies, we apply a mask to the `users_ratings` matrix.

If a user has already rated a movie, we set its projected rating to zero. This way, we only focus on ratings for previously unseen/unrated movies.

In [14]:
# like np.where: keep the projected rating where the user has not rated the movie, zero elsewhere
users_ratings_new = tf.where(tf.equal(users_movies, tf.zeros_like(users_movies)),
                             users_ratings,
                             tf.zeros_like(users_movies))

users_ratings_new

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


<tf.Tensor: id=111, shape=(4, 6), dtype=float32, numpy=
array([[0.        , 0.        , 0.        , 0.65      , 0.1       ,
        0.35      ],
       [0.4117647 , 0.0882353 , 0.        , 0.67647064, 0.        ,
        0.        ],
       [1.        , 0.        , 0.        , 0.44827586, 0.        ,
        0.        ],
       [0.        , 0.        , 0.14925373, 0.        , 0.17910448,
        0.        ]], dtype=float32)>

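For example, the first user (Ayan) has rated three of the six movies, so those three entries are set to 0 and only the three unrated movies keep their projected ratings.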
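Masking with zeros works here because every user has at least `num_recommendations` unrated movies. If more recommendations were requested than a user has unrated movies, already-rated movies (all tied at zero) would fill the remaining slots. The NumPy sketch below illustrates this with Srinjoy's row and a hypothetical `k = 3`, and shows one possible alternative: masking with negative infinity and dropping those entries afterwards.

```python
import numpy as np

# Srinjoy's ratings and projected ratings, copied from the cells above.
users_movies = np.array([[10, 9, 0, 5, 0, 2]], dtype=np.float32)
users_ratings = np.array([[0.8507463, 0.6716418, 0.14925373,
                           0.53731346, 0.17910448, 0.5671642]], dtype=np.float32)

unseen = users_movies == 0
masked_zero = np.where(unseen, users_ratings, 0.0)      # as in the notebook
masked_inf = np.where(unseen, users_ratings, -np.inf)   # alternative mask

k = 3  # hypothetical: more recommendations than Srinjoy has unrated movies

top_zero = np.argsort(-masked_zero, axis=1, kind="stable")[:, :k]
top_inf = np.argsort(-masked_inf, axis=1, kind="stable")[:, :k]

# With -inf, already-rated movies can be recognized and filtered out.
valid = np.isfinite(np.take_along_axis(masked_inf, top_inf, axis=1))
recommended = [int(i) for i, ok in zip(top_inf[0], valid[0]) if ok]

print(top_zero[0])    # third slot is a movie Srinjoy has already rated
print(recommended)    # only the two unrated movies remain
```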

In [16]:
top_movies = tf.nn.top_k(users_ratings_new, num_recommendations)[1]
top_movies

<tf.Tensor: id=114, shape=(4, 2), dtype=int32, numpy=
array([[3, 5],
       [3, 0],
       [0, 3],
       [4, 2]], dtype=int32)>

In [17]:
for i in range(num_users):
    movie_names = [movies[int(index)] for index in top_movies[i]]
    print("{}: {}".format(users[i],movie_names))

Ayan: ['The Incredibles', 'DDLJ']
Anusha: ['The Incredibles', 'Star Wars']
Prem: ['Star Wars', 'The Incredibles']
Srinjoy: ['Ra-one', 'Shrek']
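The steps in this notebook can be collected into one function. The version below is a NumPy sketch, not the notebook's TensorFlow code: the function name `recommend` is hypothetical, and it assumes every user has rated at least one movie (otherwise the normalization divides by zero).

```python
import numpy as np


def recommend(users_movies, movies_feats, k):
    """Return, per user, the indices of the k highest-scoring unrated movies."""
    users_movies = np.asarray(users_movies, dtype=np.float32)
    movies_feats = np.asarray(movies_feats, dtype=np.float32)
    # User embedding: ratings-weighted sum of movie feature rows, normalized to sum to 1.
    users_feature = users_movies @ movies_feats
    users_feature = users_feature / users_feature.sum(axis=1, keepdims=True)
    # Projected rating: dot product of each user embedding with each movie's features.
    users_ratings = users_feature @ movies_feats.T
    # Zero out movies the user has already rated.
    users_ratings_new = np.where(users_movies == 0, users_ratings, 0.0)
    # Stable sort on negated scores: highest first, ties go to the lower index.
    return np.argsort(-users_ratings_new, axis=1, kind="stable")[:, :k]


users = ["Ayan", "Anusha", "Prem", "Srinjoy"]
movies = ["Star Wars", "The Dark Knight", "Shrek", "The Incredibles", "Ra-one", "DDLJ"]
users_movies = [[4, 6, 8, 0, 0, 0],
                [0, 0, 10, 0, 8, 3],
                [0, 6, 0, 0, 3, 7],
                [10, 9, 0, 5, 0, 2]]
movies_feats = [[1, 1, 0, 0, 1],
                [1, 1, 0, 0, 0],
                [0, 0, 1, 1, 0],
                [1, 0, 1, 1, 0],
                [0, 0, 0, 0, 1],
                [1, 0, 0, 0, 1]]

top_movies = recommend(users_movies, movies_feats, k=2)
for name, row in zip(users, top_movies):
    print("{}: {}".format(name, [movies[i] for i in row]))
```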


<h3>Finish Up</h3>
Congratulations! You have learned how to build a content-based filtering recommender from scratch.