# Content Oriented Recommender Systems:
Content-oriented (or content-filtering) recommender systems use a user's previous choices and preferences, extracting features from those choices to decide what to recommend. These features, such as genre, actors, or central idea, help determine what a user likes or dislikes. Items with similar features are then found and recommended.

## Example Demonstration:
For the sake of simplicity, let's assume that we have a movie recommendation system and a user, say A, who is interested in finding a good movie for the weekend. Let's say that A gives ratings of 4 and 3 to two movies, X and Y, whose genres can be represented as {comedy:2, adventure:1, thriller:1, suspense:1} and {comedy:2, adventure:2, action:1} respectively.

![image.png](attachment:image.png)

Now, based on these choices, assume that we want to predict the rating user A would give to a movie Z with the genre representation {comedy:1, adventure:2, thriller:0, suspense:0, action:2}.

### Step 1 - Compute User Profile:
Before we move to the recommendation part, we first have to find the overall preference of user A for each genre. One way of doing this is to multiply each movie's genre weight by the rating the user gave that movie, sum these products per genre across all rated movies, and then normalize the result to a standard overall rating of 5. This process is shown visually below:

![image.png](attachment:image.png)
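
The profile computation can also be sketched in code. This is a minimal sketch, not necessarily the exact calculation in the figure: it assumes that "normalizing to an overall rating of 5" means rescaling the per-genre totals so that they sum to 5. The function name `build_user_profile` and the `scale` parameter are illustrative. For user A, the raw rating-weighted totals are comedy 14, adventure 10, thriller 4, suspense 4, and action 3.

```python
# Genre counts for each rated movie, taken from the example above.
movies = {
    "X": {"comedy": 2, "adventure": 1, "thriller": 1, "suspense": 1},
    "Y": {"comedy": 2, "adventure": 2, "action": 1},
}
# Ratings given by user A.
ratings = {"X": 4, "Y": 3}


def build_user_profile(movies, ratings, scale=5.0):
    """Sum rating * genre weight over the rated movies, then rescale."""
    raw = {}
    for title, genres in movies.items():
        for genre, weight in genres.items():
            raw[genre] = raw.get(genre, 0.0) + ratings[title] * weight
    total = sum(raw.values())
    if total == 0:
        return {genre: 0.0 for genre in raw}
    # Assumption: the normalized entries should sum to `scale` (5 here).
    return {genre: scale * value / total for genre, value in raw.items()}


profile = build_user_profile(movies, ratings)
# Print genres from most to least preferred.
for genre, score in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{genre:10s} {score:.2f}")
```

Other normalizations, such as scaling the largest entry to 5, would change the numbers but not the order of the genres.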

### Step 2 - Obtain Similarity Score:
Now that we have computed the user's genre preferences, we can use them to estimate how the user will rate movie Z by applying a basic similarity metric. This can be done by first normalizing the genre representation of movie Z and then computing the Euclidean distance between the resulting vector and the user's genre-preference profile. Let's take a look at a visual representation.

![image.png](attachment:image.png)

Just as for movie Z, given some n candidate movies to recommend from, the content-based recommender system uses Euclidean distance, or a better similarity metric, to compare each movie's genre representation with the user's profile. The lowest distances indicate the greatest similarity, so those movies are ranked higher than the others.
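
The distance and ranking step might look like the sketch below. It hard-codes a profile for user A under the assumption that profile entries are scaled to sum to 5, and it normalizes each movie the same way. Movie Z comes from the example; movies P and Q and the helper names `normalize` and `distance` are made up for illustration.

```python
import math

GENRES = ["comedy", "adventure", "thriller", "suspense", "action"]

# User A's profile, assuming the per-genre totals are rescaled to sum to 5.
user_profile = {"comedy": 14 / 7, "adventure": 10 / 7, "thriller": 4 / 7,
                "suspense": 4 / 7, "action": 3 / 7}

# Z is the movie from the example; P and Q are hypothetical candidates.
candidates = {
    "Z": {"comedy": 1, "adventure": 2, "thriller": 0, "suspense": 0, "action": 2},
    "P": {"comedy": 3, "adventure": 2},
    "Q": {"thriller": 3, "suspense": 2},
}


def normalize(genres, scale=5.0):
    """Return the genre vector in GENRES order, rescaled to sum to `scale`."""
    total = sum(genres.values())
    if total == 0:
        return [0.0] * len(GENRES)
    return [scale * genres.get(g, 0) / total for g in GENRES]


def distance(profile, genres):
    """Euclidean distance between the profile and a normalized movie vector."""
    p = [profile.get(g, 0.0) for g in GENRES]
    m = normalize(genres)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, m)))


# Smaller distance means more similar, so sort in ascending order.
ranking = sorted(candidates, key=lambda t: distance(user_profile, candidates[t]))
for title in ranking:
    print(title, round(distance(user_profile, candidates[title]), 3))
```

Under these assumptions, movie Z ends up at a distance of about 2.11 from the profile and ranks between the comedy-heavy P and the thriller-heavy Q.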

## Scalability of Approach:
The problem with this approach is efficiency: the algorithm would need to compute the distance metric for every movie on every request. A better way is to store the movies as a set of k clusters (each with at most m movies). The user profile is first compared with each cluster to find the closest one, and then only the movies in that cluster are compared and ranked. This technique reduces the cost of a query from O(n) to O(k+m) distance computations, where n is the total number of movies, and gives a solid way to perform the matching and ranking operations for the recommender model.
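
A minimal sketch of the cluster lookup follows. It assumes the clusters were built offline (for example with k-means) and that each one stores a centroid, taken here as the mean of its member vectors. The `clusters` data, the movies P, Q, and R, and the function name `recommend` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Vector order: comedy, adventure, thriller, suspense, action.
# Hypothetical precomputed clusters; each centroid is the mean of its members.
clusters = [
    {"centroid": [2.0, 2.0, 0.0, 0.0, 1.0],
     "movies": {"P": [3.0, 2.0, 0.0, 0.0, 0.0],
                "Z": [1.0, 2.0, 0.0, 0.0, 2.0]}},
    {"centroid": [0.0, 0.5, 2.5, 2.0, 0.0],
     "movies": {"Q": [0.0, 0.0, 3.0, 2.0, 0.0],
                "R": [0.0, 1.0, 2.0, 2.0, 0.0]}},
]


def recommend(profile, clusters, top_n=2):
    """Pick the nearest cluster, then rank only the movies inside it."""
    # k distance computations: one per cluster centroid.
    nearest = min(clusters, key=lambda c: euclidean(profile, c["centroid"]))
    # At most m distance computations: one per movie in the chosen cluster.
    members = nearest["movies"]
    ranked = sorted(members, key=lambda t: euclidean(profile, members[t]))
    return ranked[:top_n]


# User A's profile, assuming entries are rescaled to sum to 5.
profile = [14 / 7, 10 / 7, 4 / 7, 4 / 7, 3 / 7]
print(recommend(profile, clusters))
```

Note that this lookup is approximate: a movie in a neighboring cluster can be closer to the profile than some movies in the chosen cluster, so the speedup comes with a possible loss in ranking quality.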

## Pros and Cons of Approach:

### Advantages of Approach:
This approach offers the following advantages compared to its counterparts:

* It does not rely on other users in the system, i.e. in the initial stages of the system you can easily use this method to gather information about different users. Thus, this approach can be used to build the recommender from the ground up.


* It does not suffer from the user cold start problem. The user cold start problem arises when a new user enters the system: nothing is known about the user's preferences, which makes it difficult for the recommender to provide any recommendations. To avoid this, the content filtering system simply asks for some well-known films that the user likes or genres the user prefers, and then uses the clustering approach to determine which movies are the best fit. Once the initial values have been set, the user profile is updated as movies are watched and rated, which yields better recommendations over time. If the user doesn't provide any information in the initial stage, the system can show the trending movies for different genres.
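
The cold start handling described above could be sketched as follows. This is only one possible scheme, under assumed choices: each genre the new user picks gets the same starting weight, every later rating adds rating times genre weight to running totals, and the totals are rescaled to sum to 5. The helper names `seed_totals`, `add_rating`, and `to_profile` are hypothetical.

```python
def seed_totals(preferred_genres, prior_weight=1.0):
    """Give each genre the new user picked the same starting weight."""
    return {genre: prior_weight for genre in preferred_genres}


def add_rating(totals, movie_genres, rating):
    """Return new totals with one rated movie folded in (input is not mutated)."""
    updated = dict(totals)
    for genre, weight in movie_genres.items():
        updated[genre] = updated.get(genre, 0.0) + rating * weight
    return updated


def to_profile(totals, scale=5.0):
    """Rescale totals to sum to `scale`; None means nothing is known yet."""
    total = sum(totals.values())
    if total == 0:
        return None  # caller can fall back to trending movies per genre
    return {genre: scale * value / total for genre, value in totals.items()}


# New user who selected two genres at sign-up.
totals = seed_totals(["comedy", "adventure"])
print(to_profile(totals))

# The same user later rates a comedy-thriller 4 out of 5.
totals = add_rating(totals, {"comedy": 2, "thriller": 1}, rating=4)
print(to_profile(totals))

# A user who skipped the sign-up questions has no profile yet.
print(to_profile(seed_totals([])))
```

A small `prior_weight` lets real ratings quickly outweigh the initial selection, while a larger one keeps the stated preferences influential for longer.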

### Disadvantages of Approach:
This approach has the following disadvantages compared to its counterparts:

* It does not rely on other users' preferences, i.e. the recommendations are user-centric only. This narrows the list of recommended items, since it revolves around the user's own genre preferences, and hides trending items in other genres. For example, if a new item is released whose genres the user has no ratings or preferences for, that item may stay hidden from the user regardless of how popular it is in the wider community.


* Features used in content filtering are usually manually engineered and need to be determined by a human, i.e. if we want to label the genres of a particular movie, we would need to annotate whether it contains comedy and, if so, in what proportion. This is time-consuming, and the representation is limited to the features chosen.

In practice, this approach is usually combined with collaborative filtering to offset these weaknesses, resulting in a system that mitigates the flaws mentioned above.