# Content-Based Recommenders: Basic User Profile

In this exercise we build a simple user profile from document attribute data and use it to predict
whether each user will like or dislike the unrated documents.

We will also answer the following questions:

1) Which document will **User1** like best according to the simple profile prediction? What is the prediction score for this document?

2) Which document will **User2** like best according to the simple profile prediction? What is the prediction score for this document?

3) How many documents will **User1** dislike?

4) How many documents will **User2** dislike?

## 1. Settings

In [1]:
# Settings 
import numpy as np
import pandas as pd

## 2. Data

The dataset contains a table of document content attributes: 20 documents described by 10 topic attributes.

We also have two users' evaluations of five documents each. 

The content attributes should be interpreted as follows:

- `1`: the document is about the listed topic;
- `0`: the document is not about the listed topic.

User evaluations should be read as follows:

- `1`: the user liked the document;
- `0`: the user never saw the document;
- `-1`: the user didn't like the document.


### 2.1. Content Attributes Table

In [2]:
# Content Attributes Table
topics = ["baseball", "economics", "politics", "Europe", "Asia", 
          "soccer", "war", "security", "shopping", "family"]
doclist = ["doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7", "doc8", "doc9", "doc10",
           "doc11", "doc12", "doc13", "doc14", "doc15", "doc16", "doc17", "doc18", "doc19", "doc20"]

document_attributes = pd.DataFrame(columns=topics, index=doclist)

# Adding documents data
document_attributes.loc['doc1'] = [1,0,1,0,1,1,0,0,0,1]
document_attributes.loc['doc2'] = [0,1,1,1,0,0,0,1,0,0]
document_attributes.loc['doc3'] = [0,0,0,1,1,1,0,0,0,0]
document_attributes.loc['doc4'] = [0,0,1,1,0,0,1,1,0,0]
document_attributes.loc['doc5']= [0,1,0,0,0,0,0,0,1,1]
document_attributes.loc['doc6'] = [1,0,0,1,0,0,0,0,0,0]
document_attributes.loc['doc7'] = [0,0,0,0,0,0,0,1,0,1]
document_attributes.loc['doc8'] = [0,0,1,1,0,0,1,0,0,1]
document_attributes.loc['doc9'] = [0,0,0,0,0,1,0,0,1,0]
document_attributes.loc['doc10'] = [0,1,0,0,1,0,1,0,0,0]
document_attributes.loc['doc11'] = [0,0,1,0,1,0,0,0,1,0]
document_attributes.loc['doc12'] = [1,0,0,0,0,1,1,0,0,0]
document_attributes.loc['doc13'] = [0,0,1,1,1,0,0,1,0,0]
document_attributes.loc['doc14'] = [0,1,1,1,0,0,0,0,1,0]
document_attributes.loc['doc15'] = [0,0,0,1,0,1,1,1,0,0]
document_attributes.loc['doc16'] = [1,0,0,0,0,1,0,0,1,0]
document_attributes.loc['doc17'] = [0,1,1,1,0,0,0,1,0,0]
document_attributes.loc['doc18'] = [0,0,0,1,0,0,0,0,1,0]
document_attributes.loc['doc19'] = [0,1,1,0,1,0,1,0,0,1]
document_attributes.loc['doc20'] = [0,0,1,1,0,0,1,0,1,0]

document_attributes

Unnamed: 0,baseball,economics,politics,Europe,Asia,soccer,war,security,shopping,family
doc1,1,0,1,0,1,1,0,0,0,1
doc2,0,1,1,1,0,0,0,1,0,0
doc3,0,0,0,1,1,1,0,0,0,0
doc4,0,0,1,1,0,0,1,1,0,0
doc5,0,1,0,0,0,0,0,0,1,1
doc6,1,0,0,1,0,0,0,0,0,0
doc7,0,0,0,0,0,0,0,1,0,1
doc8,0,0,1,1,0,0,1,0,0,1
doc9,0,0,0,0,0,1,0,0,1,0
doc10,0,1,0,0,1,0,1,0,0,0


### 2.2. Users' Evaluations Data

In [3]:
# User Evaluations Data
users = ["User1", "User2"]
user_evaluations = pd.DataFrame(columns=users, index=doclist).fillna(0)

# Adding users evaluations data
user_evaluations.loc['doc1'] = [1,-1]
user_evaluations.loc['doc2'] = [-1,1]
user_evaluations.loc['doc4'] = [0,1]
user_evaluations.loc['doc6'] = [1,0]
user_evaluations.loc['doc12'] = [0,-1]
user_evaluations.loc['doc16'] = [1,0]
user_evaluations.loc['doc17'] = [0,1]
user_evaluations.loc['doc19'] = [-1,0]

user_evaluations.T

Unnamed: 0,doc1,doc2,doc3,doc4,doc5,doc6,doc7,doc8,doc9,doc10,doc11,doc12,doc13,doc14,doc15,doc16,doc17,doc18,doc19,doc20
User1,1,-1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,-1,0
User2,-1,1,0,1,0,0,0,0,0,0,0,-1,0,0,0,0,1,0,0,0


## 3. Building Basic User Profile

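To build the user profiles we multiply the transposed **user_evaluations** matrix (E, 20 documents × 2 users) by the **document_attributes** matrix (A, 20 documents × 10 topics). The result is a 2 × 10 matrix with one row of topic weights per user.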

$$ U = E^T \times A $$

$U$ - user profiles matrix

$E$ - user evaluations matrix

$A$ - document attributes matrix
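
To make the product more concrete, here is a minimal sketch that rebuilds the **User1** profile by hand. It assumes only the attribute rows of the five documents **User1** rated (unrated documents carry a rating of 0 and contribute nothing), and the variable names are illustrative rather than part of the notebook. Under that assumption, each topic weight is simply the number of liked documents about the topic minus the number of disliked ones.

```python
import numpy as np

# Attribute rows of the five documents User1 rated, copied from the table above.
# Column order: baseball, economics, politics, Europe, Asia,
#               soccer, war, security, shopping, family
rated_attributes = np.array([
    [1, 0, 1, 0, 1, 1, 0, 0, 0, 1],  # doc1  (liked)
    [0, 1, 1, 1, 0, 0, 0, 1, 0, 0],  # doc2  (disliked)
    [1, 0, 0, 1, 0, 0, 0, 0, 0, 0],  # doc6  (liked)
    [1, 0, 0, 0, 0, 1, 0, 0, 1, 0],  # doc16 (liked)
    [0, 1, 1, 0, 1, 0, 1, 0, 0, 1],  # doc19 (disliked)
])
user1_ratings = np.array([1, -1, 1, 1, -1])

# Profile as a matrix product: one row of E^T times the rated rows of A.
profile_by_product = user1_ratings @ rated_attributes

# The same profile as "liked count minus disliked count" per topic.
liked = rated_attributes[user1_ratings == 1].sum(axis=0)
disliked = rated_attributes[user1_ratings == -1].sum(axis=0)
profile_by_counts = liked - disliked

print(profile_by_product)
print(profile_by_counts)
```

Both arrays should agree with the **User1** row of the profile matrix computed in the next cell.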

In [4]:
# Calculating User Profiles Matrix

user_profiles = user_evaluations.T.dot(document_attributes)
user_profiles

Unnamed: 0,baseball,economics,politics,Europe,Asia,soccer,war,security,shopping,family
User1,3,-2,-1,0,0,2,-1,-1,1,0
User2,-2,2,2,3,-1,-2,0,3,0,-1


## 4. Computing Documents Prediction Scores

In this section we predict whether each user will like or dislike each document. To do that we compute a prediction score for every user and document, using a single matrix multiplication.

The user predicted preference matrix (L) consists of the document prediction scores for each user.

$$ L = U \times A^T $$

$L$ - user predicted preference matrix

$U$ - user profiles matrix

$A$ - document attributes matrix
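
Each entry of $L$ is a dot product between one user profile and one document's attribute vector. The sketch below works through two entries by hand; the profile values are copied from the user profiles table in section 3, and the variable names are illustrative.

```python
import numpy as np

# User profiles copied from the user profiles table.
# Column order: baseball, economics, politics, Europe, Asia,
#               soccer, war, security, shopping, family
user1_profile = np.array([3, -2, -1, 0, 0, 2, -1, -1, 1, 0])
user2_profile = np.array([-2, 2, 2, 3, -1, -2, 0, 3, 0, -1])

# Attribute vectors of two documents from the content attributes table.
doc16 = np.array([1, 0, 0, 0, 0, 1, 0, 0, 1, 0])  # baseball, soccer, shopping
doc2 = np.array([0, 1, 1, 1, 0, 0, 0, 1, 0, 0])   # economics, politics, Europe, security

# The score sums the profile weights of the topics the document covers.
score_user1_doc16 = int(user1_profile @ doc16)  # 3 + 2 + 1
score_user2_doc2 = int(user2_profile @ doc2)    # 2 + 2 + 3 + 3

print(score_user1_doc16, score_user2_doc2)
```

Because the score is an unnormalized sum, a document that covers more topics has more terms contributing to its score, in either direction.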

In [5]:
# User Predicted Preference Matrix
user_predicted_preferences = user_profiles.dot(document_attributes.T)
user_predicted_preferences

Unnamed: 0,doc1,doc2,doc3,doc4,doc5,doc6,doc7,doc8,doc9,doc10,doc11,doc12,doc13,doc14,doc15,doc16,doc17,doc18,doc19,doc20
User1,4,-4,2,-3,-1,3,-1,-2,3,-3,0,4,-2,-2,0,6,-4,1,-4,-1
User2,-4,10,0,8,1,1,2,4,-2,1,1,-4,7,7,4,-4,10,3,2,5


### 4.1. Ordered Document Preferences: User1

In [6]:
# Predicted Documents Preference: User1
user_predicted_preferences.loc[['User1']].T.sort_values(['User1'], ascending=False).T

Unnamed: 0,doc16,doc1,doc12,doc9,doc6,doc3,doc18,doc15,doc11,doc7,doc5,doc20,doc8,doc13,doc14,doc10,doc4,doc2,doc17,doc19
User1,6,4,4,3,3,2,1,0,0,-1,-1,-1,-2,-2,-2,-3,-3,-4,-4,-4


### 4.2. Ordered Document Preferences: User2

In [7]:
# Predicted Documents Preference: User2
user_predicted_preferences.loc[['User2']].T.sort_values(['User2'], ascending=False).T

Unnamed: 0,doc17,doc2,doc4,doc13,doc14,doc20,doc8,doc15,doc18,doc19,doc7,doc11,doc10,doc6,doc5,doc3,doc9,doc12,doc16,doc1
User2,10,10,8,7,7,5,4,4,3,2,2,1,1,1,1,0,-2,-4,-4,-4
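
The double transpose in the two cells above keeps the result as a one-row table. If a plain ranking is enough, one possible alternative is to select the user's row as a Series and sort it directly. The sketch below assumes the **User2** scores copied from the predicted preference matrix; `user2_scores` is an illustrative name.

```python
import pandas as pd

# User2 prediction scores for doc1..doc20, copied from the matrix above.
user2_scores = pd.Series(
    [-4, 10, 0, 8, 1, 1, 2, 4, -2, 1, 1, -4, 7, 7, 4, -4, 10, 3, 2, 5],
    index=[f"doc{i}" for i in range(1, 21)],
    name="User2",
)

# Sort the Series in descending order; no transposes are needed.
ranked = user2_scores.sort_values(ascending=False)

# Show the three highest-scoring documents.
print(ranked.head(3))
```

With the full matrix available, the equivalent call would be `user_predicted_preferences.loc['User2'].sort_values(ascending=False)`. The relative order of tied documents may differ from the output shown above.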


## 5. Q & A

In this section we are going to answer the questions stated at the beginning of the notebook.

**Question 1:** Which document(s) will User1 like best according to the simple profile prediction? What is the prediction score for the document(s)?


In [8]:
user1_max = user_predicted_preferences.loc['User1'].max()
user1_favorite_documents = \
    user_predicted_preferences.loc['User1'][user_predicted_preferences.loc['User1']==user1_max]
user1_favorite_documents

doc16    6
Name: User1, dtype: object

According to the prediction scores, **User1** will like **doc16** best. This document has a score of **6**. Let's look at its content.

In [9]:
# Content of User1 favourite document(s)
document_attributes.loc[user1_favorite_documents.index]
    

Unnamed: 0,baseball,economics,politics,Europe,Asia,soccer,war,security,shopping,family
doc16,1,0,0,0,0,1,0,0,1,0


**Question 2:** Which document(s) will User2 like best according to the simple profile prediction? What is the prediction score for the document(s)?

In [10]:
user2_max = user_predicted_preferences.loc['User2'].max()
user2_favorite_documents = \
    user_predicted_preferences.loc['User2'][user_predicted_preferences.loc['User2']==user2_max]
user2_favorite_documents

doc2     10
doc17    10
Name: User2, dtype: object

We can see that **User2** will like **doc2** and **doc17** best; both documents have a score of **10**. Let's see what they contain.

In [11]:
# Content of User2 favourite document(s)
document_attributes.loc[user2_favorite_documents.index]

Unnamed: 0,baseball,economics,politics,Europe,Asia,soccer,war,security,shopping,family
doc2,0,1,1,1,0,0,0,1,0,0
doc17,0,1,1,1,0,0,0,1,0,0
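
The tie between **doc2** and **doc17** is why the cells above filter on the maximum score rather than ask for a single label. As a small sketch (using the **User2** scores copied from the predicted preference matrix, with illustrative variable names), `idxmax` returns only the first label that reaches the maximum, while the boolean mask keeps every tied document.

```python
import pandas as pd

# User2 prediction scores for doc1..doc20, copied from the matrix above.
user2_scores = pd.Series(
    [-4, 10, 0, 8, 1, 1, 2, 4, -2, 1, 1, -4, 7, 7, 4, -4, 10, 3, 2, 5],
    index=[f"doc{i}" for i in range(1, 21)],
    name="User2",
)

# idxmax reports a single label: the first one that attains the maximum.
first_best = user2_scores.idxmax()

# A boolean mask against the maximum keeps all tied documents.
all_best = user2_scores[user2_scores == user2_scores.max()]

print(first_best)
print(list(all_best.index))
```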


**Question 3:** How many documents will **User1** dislike?

To answer this question we need to find all the documents for which **User1** has a negative prediction score.

In [12]:
user1_disliked_documents = \
    user_predicted_preferences.loc['User1'][user_predicted_preferences.loc['User1'] < 0]
len(user1_disliked_documents)


11

We can see that **User1** is predicted to dislike **11** of the **20** documents.

**Question 4:** How many documents will **User2** dislike?

To answer this question we need to find all the documents for which **User2** has a negative prediction score.

In [13]:
user2_disliked_documents = \
    user_predicted_preferences.loc['User2'][user_predicted_preferences.loc['User2'] < 0]
len(user2_disliked_documents)

4

**User2** is predicted to dislike **4** of the **20** documents.
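
The counts above cover all 20 documents, including the five each user has already rated. As a hedged variation, the sketch below counts negative scores for both users at once and then repeats the count for unrated documents only. Restricting to unrated documents is our own assumption about how the results might be used, not a step of the exercise; the scores and ratings are copied from the tables above, and the variable names are illustrative.

```python
import pandas as pd

docs = [f"doc{i}" for i in range(1, 21)]

# Prediction scores copied from the user predicted preference matrix.
scores = pd.DataFrame(
    {
        "User1": [4, -4, 2, -3, -1, 3, -1, -2, 3, -3, 0, 4, -2, -2, 0, 6, -4, 1, -4, -1],
        "User2": [-4, 10, 0, 8, 1, 1, 2, 4, -2, 1, 1, -4, 7, 7, 4, -4, 10, 3, 2, 5],
    },
    index=docs,
)

# Known evaluations: 1 = liked, -1 = disliked, 0 = never seen.
ratings = pd.DataFrame(0, index=docs, columns=["User1", "User2"])
ratings.loc[["doc1", "doc6", "doc16"], "User1"] = 1
ratings.loc[["doc2", "doc19"], "User1"] = -1
ratings.loc[["doc2", "doc4", "doc17"], "User2"] = 1
ratings.loc[["doc1", "doc12"], "User2"] = -1

# Summing a boolean frame counts the True values per column (per user).
disliked_all = (scores < 0).sum()

# Assumption: only documents the user has not rated yet are of interest.
unrated = ratings == 0
disliked_unrated = ((scores < 0) & unrated).sum()

# Highest score among unrated documents (rated ones are masked out as NaN).
best_unrated = scores.where(unrated).max()

print(disliked_all)
print(disliked_unrated)
print(best_unrated)
```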