A recommendation system built on a movie rating dataset. Film titles are taken into account.

The top 10 recommendations for the user with id $10$ are shown as an example of model usage.

## Investigation

First, I survey the field, starting from the [Recommender system](https://en.wikipedia.org/wiki/Recommender_system) Wikipedia page.

Wikipedia:

---
### Intro
A **recommendation system** is a subclass of information filtering system that seeks to **predict the "rating" or "preference"** a user would give to an item.

Recommender systems usually make use of either or both **collaborative filtering** and **content-based filtering**. 

**Collaborative filtering** approaches build a **model from a user's past behavior** (items previously purchased or selected and/or numerical ratings given to those items) as well as **similar decisions made by other users**. This model is then used to predict items (or ratings for items) that the user may have an interest in. 

**Content-based filtering** approaches utilize a series of discrete, pre-tagged characteristics of an item in order to **recommend additional items with similar properties**. 

Current recommender systems typically **combine** one or more approaches into a **hybrid system**.

The **differences** between collaborative and content-based filtering can be demonstrated by comparing two early music recommender systems – Last.fm and Pandora Radio.

- Last.fm creates a "station" of recommended songs by observing what bands and individual tracks the **user has listened** to on a regular basis and **comparing** those against the **listening behavior of other users**. Last.fm will play tracks that do not appear in the user's library, but are often **played by other users with similar interests**. As this approach leverages the behavior of users, it is an example of a collaborative filtering technique.

- Pandora uses the **properties of a song or artist** (a subset of the 400 attributes provided by the Music Genome Project) to seed a "station" that **plays music with similar properties**. **User feedback** is used to **refine** the station's results, **deemphasizing** certain **attributes** when a user "dislikes" a particular song and **emphasizing** other **attributes** when a user "likes" a song. This is an example of a content-based approach.

In the above example, Last.fm requires a **large amount of information** about a user to make accurate recommendations. This is an example of the **cold start** problem, and is common in collaborative filtering systems.

Whereas Pandora needs **very little information to start**, it is far more limited in scope (for example, it can only make recommendations that are similar to the original seed).
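To make the content-based idea concrete, here is a minimal sketch that ranks items by the overlap of their tags with a seed item. The film titles, the tags, and the choice of Jaccard similarity are all assumptions made for illustration; they do not come from the dataset or from Pandora.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of tags."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical item tags (illustrative only, not taken from the dataset).
ITEM_TAGS = {
    "Film A": {"comedy", "romance"},
    "Film B": {"comedy", "romance", "drama"},
    "Film C": {"horror", "thriller"},
    "Film D": {"comedy"},
}


def similar_items(seed, item_tags, top_n=2):
    """Rank all other items by tag overlap with the seed item."""
    seed_tags = item_tags[seed]
    scored = [
        (jaccard(seed_tags, tags), title)
        for title, tags in item_tags.items()
        if title != seed
    ]
    # Highest similarity first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]


content_recs = similar_items("Film A", ITEM_TAGS)
print(content_recs)
```

Note that no user history is needed here, which is why this kind of approach can start from very little information, and also why it can only return items similar to the seed.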

### Collaborative filtering

Collaborative filtering is based on the assumption that **people who agreed in the past will agree in the future**, and that they **will like similar kinds of items** as they liked in the past. The system generates recommendations using only information about rating profiles for different users or items. By locating peer users/items with a rating history similar to the current user or item, they generate recommendations using this neighborhood. Collaborative filtering methods are classified as memory-based and model-based. A well-known example of memory-based approaches is the user-based algorithm,[34] while that of model-based approaches is the Kernel-Mapping Recommender.[35]

A key advantage of the collaborative filtering approach is that it does not rely on machine analyzable content and therefore it is capable of accurately recommending complex items such as movies without requiring an "understanding" of the item itself. Many algorithms have been used in measuring user similarity or item similarity in recommender systems. For example, the k-nearest neighbor (k-NN) approach[36] and the Pearson Correlation as first implemented by Allen.[37]
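As a rough illustration of the neighborhood idea, the sketch below combines Pearson correlation with a k-nearest-neighbor prediction. The ratings and the function names `pearson` and `predict` are hypothetical, and the prediction is a plain similarity-weighted mean (without mean-centering), which is a simplification rather than the method of any cited paper.

```python
import math

# Hypothetical ratings: user id -> {film title: rating}.
RATINGS = {
    1: {"A": 5, "B": 4, "C": 1},
    2: {"A": 4, "B": 5, "C": 2, "D": 5},
    3: {"A": 1, "B": 2, "C": 5, "D": 1},
}


def pearson(r_u, r_v):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(r_u) & set(r_v))
    if len(common) < 2:
        return 0.0
    mean_u = sum(r_u[i] for i in common) / len(common)
    mean_v = sum(r_v[i] for i in common) / len(common)
    cov = sum((r_u[i] - mean_u) * (r_v[i] - mean_v) for i in common)
    var_u = sum((r_u[i] - mean_u) ** 2 for i in common)
    var_v = sum((r_v[i] - mean_v) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)


def predict(user, item, ratings, k=2):
    """Similarity-weighted mean rating over the k most similar users.

    Only neighbours with positive similarity who rated the item are used.
    Returns None when no such neighbour exists.
    """
    neighbours = [
        (pearson(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    neighbours = sorted((n for n in neighbours if n[0] > 0), reverse=True)[:k]
    if not neighbours:
        return None
    total = sum(sim for sim, _ in neighbours)
    return sum(sim * rating for sim, rating in neighbours) / total


predicted = predict(1, "D", RATINGS)
print(predicted)
```

In this toy data user 2 rates like user 1 and user 3 rates in the opposite way, so only user 2 contributes to the prediction for film "D".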

When building a model from a user's behavior, a distinction is often made between explicit and implicit forms of data collection.

Examples of explicit data collection include the following:

- Asking a user to rate an item on a sliding scale.
- Asking a user to search.
- Asking a user to rank a collection of items from favorite to least favorite.
- Presenting two items to a user and asking him/her to choose the better one of them.
- Asking a user to create a list of items that he/she likes (see Rocchio classification or other similar techniques).

Examples of implicit data collection include the following:

- Observing the items that a user views in an online store.
- Analyzing item/user viewing times.[38]
- Keeping a record of the items that a user purchases online.
- Obtaining a list of items that a user has listened to or watched on his/her computer.
- Analyzing the user's social network and discovering similar likes and dislikes.

Collaborative filtering approaches often suffer from three problems: cold start, scalability, and sparsity.[39]

- Cold start: For a new user or item, there isn't enough data to make accurate recommendations.[10][11][12]
- Scalability: In many of the environments in which these systems make recommendations, there are millions of users and products. Thus, a large amount of computation power is often necessary to calculate recommendations.
- Sparsity: The number of items sold on major e-commerce sites is extremely large. The most active users will only have rated a small subset of the overall database. Thus, even the most popular items have very few ratings.

One of the most famous examples of collaborative filtering is item-to-item collaborative filtering (people who buy x also buy y), an algorithm popularized by Amazon.com's recommender system.[40]

Many social networks originally used collaborative filtering to recommend new friends, groups, and other social connections by examining the network of connections between a user and their friends.[1] Collaborative filtering is still used as part of hybrid systems.
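The "people who buy x also buy y" idea mentioned above can be sketched with simple co-occurrence counts. This is a deliberately simplified illustration on hypothetical baskets, not Amazon's production algorithm; the names `co_occurrence` and `also_bought` are made up for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of items per user.
BASKETS = [
    {"x", "y", "z"},
    {"x", "y"},
    {"x", "z"},
    {"y", "w"},
]


def co_occurrence(baskets):
    """Count how often each ordered pair of items shares a basket."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def also_bought(item, counts, top_n=2):
    """Items most often bought together with the given item."""
    related = [(n, other) for (first, other), n in counts.items() if first == item]
    # Most frequent first; ties are broken alphabetically.
    related.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in related[:top_n]]


pair_counts = co_occurrence(BASKETS)
print(also_bought("x", pair_counts))
```

Raw counts favor popular items, so a real system would likely normalize them (for example with cosine similarity between item vectors).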

## Solution requirements

We need **metrics** to check the quality of our solution. They should be easy to compute and representative of what we want to improve.

For example, for the [MovieLens](https://movielens.org/) website, the metrics might be:
- Success metrics
    - high ratings from users for recommended films
    - number of shares of recommended films
- Tracking metrics
    - daily visitors
    - time spent on the site
    - daily new users
    - average number of new ratings

We might use **A/B testing** to acquire this data.

When a page loads, serve the new recommendation model to a portion of the site's visitors for a while and observe how they behave. Calculate the success metrics before and after the change, and roll out the new model if the metrics improve.
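One possible way to split visitors is to hash the user id, so that each user always sees the same model. The sketch below is only an illustration under assumptions: the salt `"rec-model-v2"`, the 50/50 split, and the `(user_id, rating)` event format are hypothetical, and the success metric is taken to be the mean rating given to recommended films, compared between groups over the same period.

```python
import hashlib


def assign_group(user_id, treatment_share=0.5, salt="rec-model-v2"):
    """Deterministically map a user to 'treatment' or 'control'."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    # First 32 bits of the hash, scaled to a value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"


def mean_rating_by_group(events):
    """Average rating of recommended films per group.

    `events` is an iterable of (user_id, rating) pairs (assumed format).
    A group with no events gets None.
    """
    totals = {"control": [0.0, 0], "treatment": [0.0, 0]}
    for user_id, rating in events:
        group = assign_group(user_id)
        totals[group][0] += rating
        totals[group][1] += 1
    return {g: (s / n if n else None) for g, (s, n) in totals.items()}


# Hypothetical usage: user 10 always lands in the same group.
print(assign_group(10))
```

With real traffic, the difference between the two groups should also be checked for statistical significance before the new model is rolled out.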