# Recommendation Engines

## Introduction

Recommendation engines are used to suggest everything from movies to music to friends to new destinations. There are three main methods for implementing recommendations that you will become familiar with throughout this lesson:
* Knowledge Based Recommendations
* Collaborative Filtering Based Recommendations
* Content Based Recommendations

After completing this lesson, you will be ready for the upcoming lessons where you will:
* Learn about more advanced techniques.
* Deploy your recommendations in a web application.

These three lessons will aim to be extremely practical. The lessons will require that you write code to implement a number of different recommendation techniques.

**Example Recommendations:**

* LinkedIn and Facebook
> Both LinkedIn and Facebook recommend connections (business or friends) to their users.

* AirBnB Experiences and Destinations
> AirBnB uses recommendations to suggest experiences and destinations to its users.

* Walmart, Amazon, and Other Retailers
> As humans on the Internet, we all get pinged with constant recommendations from retailers.

## What's Ahead

### Types of Recommendations

In this lesson, you will be working with the MovieTweetings data to apply each of the three methods of recommendations:
1. Knowledge Based Recommendations
2. Collaborative Filtering Based Recommendations
3. Content Based Recommendations
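As a rough illustration of the first method, the sketch below assumes that a knowledge based recommender applies a constraint the user states explicitly (here, a genre) and then ranks the remaining movies by average rating, skipping movies with too few ratings. The toy table, the `min_ratings` cutoff, and the helper `knowledge_based_recs` are hypothetical and are not taken from the lesson.

```python
import pandas as pd

# Hypothetical toy data: ratings already joined to each movie's pipe-separated genre string.
rated = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u1", "u2", "u3", "u1", "u2"],
    "movie":   ["A", "A", "A", "B", "B", "C", "D", "D"],
    "genre":   ["Drama|Romance"] * 3 + ["Comedy"] * 2 + ["Drama"] + ["Comedy|Drama"] * 2,
    "rating":  [8, 9, 7, 10, 6, 10, 5, 7],
})

def knowledge_based_recs(df, genre, min_ratings=2, n=5):
    """Hypothetical helper: filter on an explicit user constraint, then rank."""
    # 1. Apply the user's stated requirement: keep movies tagged with `genre`.
    has_genre = df["genre"].str.split("|").apply(lambda genres: genre in genres)
    subset = df[has_genre]
    # 2. Average rating and number of ratings per movie.
    stats = subset.groupby("movie")["rating"].agg(["mean", "count"])
    # 3. Ignore movies with too few ratings to trust their average.
    stats = stats[stats["count"] >= min_ratings]
    # 4. Best average first; ties broken by the larger number of ratings.
    stats = stats.sort_values(["mean", "count"], ascending=False)
    return list(stats.index[:n])

print(knowledge_based_recs(rated, "Drama"))   # ['A', 'D']
```

Raising `min_ratings` trades coverage for trust in the averages: with `min_ratings=1`, movie `C` would rank first on the strength of a single rating of 10.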

Within Collaborative Filtering, there are two main branches:
1. Model Based Collaborative Filtering
2. Neighborhood Based Collaborative Filtering

In this lesson, you will implement Neighborhood Based Collaborative Filtering. In the next lesson, you will implement Model Based Collaborative Filtering.
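To make the neighborhood idea concrete, here is a minimal user-user sketch. It assumes similarity is Pearson correlation over co-rated movies, keeps the `k` most similar positively correlated users, and scores each unseen movie by a similarity-weighted average of those neighbors' ratings. The ratings matrix and the helper `user_user_recs` are hypothetical; the lesson's own implementation may choose a different metric or a simpler scoring rule.

```python
import numpy as np
import pandas as pd

# Hypothetical user-item matrix: rows are users, columns are movies, NaN = not rated.
ratings = pd.DataFrame(
    {
        "A": [9, 10, 2, np.nan],
        "B": [8, 7, 3, np.nan],
        "C": [2, 1, 9, 5],
        "D": [np.nan, 9, 1, 5],
        "E": [np.nan, 3, 10, 5],
    },
    index=["u1", "u2", "u3", "u4"],
    dtype=float,
)

def user_user_recs(ratings, user, k=2, n=3):
    """Hypothetical helper: neighborhood-based CF using similar users' ratings."""
    # Pearson correlation between users over co-rated movies;
    # pairs that share fewer than 2 rated movies get NaN.
    sims = ratings.T.corr(method="pearson", min_periods=2)[user].drop(user)
    # Keep the k most similar users that are positively correlated.
    neighbors = sims[sims > 0].nlargest(k)
    # Candidate movies: those the target user has not rated yet.
    unseen = ratings.columns[ratings.loc[user].isna()]
    neighbor_ratings = ratings.loc[neighbors.index, unseen]
    # Similarity-weighted average of the neighbors' ratings for each candidate.
    rated = neighbor_ratings.notna()
    weighted_sum = neighbor_ratings.fillna(0).mul(neighbors, axis=0).sum()
    weight_total = rated.mul(neighbors, axis=0).sum()
    scores = (weighted_sum / weight_total).dropna()
    return list(scores.sort_values(ascending=False).index[:n])

print(user_user_recs(ratings, "u1"))   # ['D', 'E']
```

Here `u2` is the only user positively correlated with `u1`, so `u1` is recommended `D` ahead of `E`. User `u4` gets no recommendations because no other user has a defined, positive correlation with it.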

### Similarity Metrics

In order to implement Neighborhood Based Collaborative Filtering, you will learn about some common ways to measure the similarity between two users (or two items) including:
1. Pearson's correlation coefficient
2. Spearman's correlation coefficient
3. Kendall's Tau
4. Euclidean Distance
5. Manhattan Distance
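The sketch below computes all five measures for two users on a set of movies they have both rated (in practice you would first restrict to co-rated movies). The correlations range from -1 to 1, where higher means more similar; the distances are 0 for identical ratings and grow as users disagree. To avoid a SciPy dependency, Spearman is written as Pearson on average ranks and Kendall's tau as the tau-b pair-counting formula; `scipy.stats.pearsonr`, `spearmanr`, and `kendalltau` are ready-made alternatives. The ratings are hypothetical.

```python
import numpy as np
import pandas as pd

def pearson(x, y):
    # Linear correlation of the raw ratings.
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    # Pearson correlation of the ranks (ties get their average rank).
    return pearson(pd.Series(x).rank(), pd.Series(y).rank())

def kendall_tau(x, y):
    # Kendall's tau-b: compare the ordering of every pair of items.
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    sx, sy = np.sign(x[i] - x[j]), np.sign(y[i] - y[j])
    return float((sx * sy).sum() / np.sqrt((sx ** 2).sum() * (sy ** 2).sum()))

def euclidean(x, y):
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

def manhattan(x, y):
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sum(np.abs(diff)))

# Hypothetical ratings by two users on the same five movies.
user_a = [10, 8, 6, 4, 2]
user_b = [9, 7, 6, 3, 1]

for name, fn in [("pearson", pearson), ("spearman", spearman), ("kendall", kendall_tau),
                 ("euclidean", euclidean), ("manhattan", manhattan)]:
    print(f"{name:>9}: {fn(user_a, user_b):.3f}")
```

Because both users order the five movies identically, the rank-based measures (Spearman and Kendall) are exactly 1, while Pearson is slightly below 1 because the gaps between ratings are not perfectly proportional.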

You will learn why one metric sometimes works better than another by examining a specific situation in which one metric provides more information than the other.
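One illustrative situation (our own example, not necessarily the one the lesson uses) is a user who agrees with you on the ordering of movies but rates everything a few points lower. Correlation ignores that constant offset, while Euclidean distance penalizes it. The three users below are hypothetical.

```python
import numpy as np

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def euclidean(x, y):
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.linalg.norm(x - y))

# Hypothetical ratings on the same five movies (0-10 scale).
user_a = [10, 8, 7, 5, 4]
user_c = [6, 4, 3, 1, 0]   # same ordering as user_a, every rating 4 points lower
user_d = [8, 9, 5, 6, 3]   # similar magnitudes, but a different ordering

for other, other_ratings in [("user_c", user_c), ("user_d", user_d)]:
    print(f"{other}: pearson={pearson(user_a, other_ratings):.3f}, "
          f"euclidean={euclidean(user_a, other_ratings):.3f}")
```

Pearson rates `user_c` as a perfect match (1.000) and `user_d` as a weaker one (about 0.798), whereas Euclidean distance puts `user_d` closer (about 3.317 versus 8.944). Which answer is more useful depends on whether you care about a user's taste ordering or the absolute level of their ratings.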

### Business Cases For Recommendations

Finally, you will look at four ideas that businesses need in order to implement successful recommendations that drive revenue:
1. Relevance
2. Novelty
3. Serendipity
4. Increased Diversity

At the end of this lesson, you will have gained a ton of skills to build upon or to start creating your own recommendations in practice.

## Base Data - MovieTweetings

For more information about the MovieTweetings data, see the following links:
* [The MovieTweetings white paper](http://crowdrec2013.noahlab.com.hk/papers/crowdrec2013_Dooms.pdf) (dead link)
* [A GitHub account set up for MovieTweetings](https://github.com/sidooms/MovieTweetings)
* [A slide deck by Simon Dooms about MovieTweetings](https://www.slideshare.net/simondooms/movie-tweetings-a-movie-rating-dataset-collected-from-twitter)
> Also attached in this repo.

### Recommendations with MovieTweetings: Getting to Know The Data

Throughout this lesson, you will be working with the [MovieTweetings Data](https://github.com/sidooms/MovieTweetings/tree/master/recsyschallenge2014).

**Note:** Solutions to each notebook are available by clicking the orange Jupyter logo in the top left of the notebook. You can also watch me work through the solutions in the screencasts that follow each notebook.

To get started, import the libraries and read in the two datasets you will use throughout the lesson with the code below.

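```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tests as t  # local helper module shipped with the lesson notebooks

# IPython magic: render plots inside the notebook (works only in Jupyter/IPython)
%matplotlib inline
```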

```python
# Read in the MovieTweetings dataset originally taken from https://github.com/sidooms/MovieTweetings/tree/master/latest
# Both files use '::' as the field separator. The default C parser does not
# support multi-character separators, so the Python engine is selected explicitly.
# IDs are read as strings (object dtype) so zero-padded movie IDs keep their leading zeros.
movies = pd.read_csv(
    '06_recommendation_engines/movies.dat',
    delimiter='::',
    header=None,
    names=['movie_id', 'movie', 'genre'],
    dtype={'movie_id': object},
    engine='python')
reviews = pd.read_csv(
    '06_recommendation_engines/ratings.dat',
    delimiter='::',
    header=None,
    names=['user_id', 'movie_id', 'rating', 'timestamp'],
    dtype={'movie_id': object, 'user_id': object, 'timestamp': object},
    engine='python')
```
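To get a first feel for the two tables, the sketch below parses a few hypothetical rows in the same `::`-delimited layout using the same `read_csv` arguments, then computes some basic summaries: counts of users, movies, and ratings; the average rating per movie joined back to titles; and one indicator column per genre. The rows and the `toy_` names are made up for illustration so the real `movies` and `reviews` are not overwritten; on the real data you would run the summaries on those tables directly.

```python
import io
import pandas as pd

# Hypothetical rows in the same '::'-delimited layout as movies.dat and ratings.dat.
movies_txt = (
    "0000001::Example Movie One (2001)::Drama|Romance\n"
    "0000002::Example Movie Two (2010)::Comedy\n"
    "0000003::Example Movie Three (2013)::Drama\n"
)
ratings_txt = (
    "1::0000001::8::1368291600\n"
    "1::0000002::6::1368291700\n"
    "2::0000001::10::1368378000\n"
    "3::0000002::4::1368464400\n"
)

# Same read_csv arguments as the lesson's cell, applied to in-memory text.
toy_movies = pd.read_csv(io.StringIO(movies_txt), delimiter="::", header=None,
                         names=["movie_id", "movie", "genre"],
                         dtype={"movie_id": object}, engine="python")
toy_reviews = pd.read_csv(io.StringIO(ratings_txt), delimiter="::", header=None,
                          names=["user_id", "movie_id", "rating", "timestamp"],
                          dtype={"movie_id": object, "user_id": object, "timestamp": object},
                          engine="python")

# How many users, rated movies, and ratings are there?
n_users = toy_reviews["user_id"].nunique()
n_movies = toy_reviews["movie_id"].nunique()
n_ratings = len(toy_reviews)

# Average rating per movie, joined back to the titles.
avg = (toy_reviews.groupby("movie_id")["rating"].mean()
       .rename("avg_rating").reset_index()
       .merge(toy_movies, on="movie_id", how="left"))

# One 0/1 indicator column per genre, split on the '|' separator.
genre_dummies = toy_movies["genre"].str.get_dummies(sep="|")

print(n_users, n_movies, n_ratings)   # 3 2 4
print(avg[["movie", "avg_rating"]])
print(genre_dummies.sum())            # counts per genre: Comedy 1, Drama 2, Romance 1
```

Reading `movie_id` as a string keeps values such as `0000001` intact, which matters when joining the two tables on that column.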