# Chapter 22. Recommender Systems

In [66]:
from __future__ import division
import math, random
from collections import defaultdict, Counter
from linear_algebra import dot

Another common data challenge is producing [recommendations](https://en.wikipedia.org/wiki/Recommender_system) of some sort.  
Netflix recommends movies that you might want to watch, Amazon recommends products that you might want to buy, Twitter recommends accounts that you might want to follow, and so on.  
In this chapter, we'll examine a few different ways to use data to make recommendations.  
In particular, we'll look at the data set of `users_interests` that we've used before:

In [67]:
users_interests = [
    ["Hadoop", "Big Data", "HBase", "Java", "Spark", "Storm", "Cassandra"],
    ["NoSQL", "MongoDB", "Cassandra", "HBase", "Postgres"],
    ["Python", "scikit-learn", "scipy", "numpy", "statsmodels", "pandas"],
    ["R", "Python", "statistics", "regression", "probability"],
    ["machine learning", "regression", "decision trees", "libsvm"],
    ["Python", "R", "Java", "C++", "Haskell", "programming languages"],
    ["statistics", "probability", "mathematics", "theory"],
    ["machine learning", "scikit-learn", "Mahout", "neural networks"],
    ["neural networks", "deep learning", "Big Data", "artificial intelligence"],
    ["Hadoop", "Java", "MapReduce", "Big Data"],
    ["statistics", "R", "statsmodels"],
    ["C++", "deep learning", "artificial intelligence", "probability"],
    ["pandas", "R", "Python"],
    ["databases", "HBase", "Postgres", "MySQL", "MongoDB"],
    ["libsvm", "regression", "support vector machines"]
]

We'll use this data to address the problem of recommending new interests to a user based on her currently specified interests.

## Manual Curation

Given DataSciencester's limited number of users and interests, you could probably just spend an afternoon manually recommending interests for each user.  
However, this method doesn't scale very well, and it's limited by your personal knowledge and imagination.  
Instead, let's think about what we can do with our data.

## Recommending What's Popular

One easy approach is to simply recommend what's popular:

In [68]:
popular_interests = Counter(interest
                            for user_interests in users_interests
                            for interest in user_interests).most_common()

popular_interests

[('Python', 4),
 ('R', 4),
 ('Java', 3),
 ('regression', 3),
 ('statistics', 3),
 ('probability', 3),
 ('HBase', 3),
 ('Big Data', 3),
 ('neural networks', 2),
 ('Hadoop', 2),
 ('deep learning', 2),
 ('pandas', 2),
 ('artificial intelligence', 2),
 ('libsvm', 2),
 ('C++', 2),
 ('Postgres', 2),
 ('MongoDB', 2),
 ('scikit-learn', 2),
 ('machine learning', 2),
 ('statsmodels', 2),
 ('Cassandra', 2),
 ('NoSQL', 1),
 ('Mahout', 1),
 ('Storm', 1),
 ('MySQL', 1),
 ('programming languages', 1),
 ('Haskell', 1),
 ('mathematics', 1),
 ('Spark', 1),
 ('numpy', 1),
 ('theory', 1),
 ('decision trees', 1),
 ('MapReduce', 1),
 ('scipy', 1),
 ('databases', 1),
 ('support vector machines', 1)]
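As a small, self-contained sketch of the counting step, the example below runs the same `Counter(...).most_common()` pattern on a toy list (`sample_interests` is made up for illustration and is not the chapter's data). One caveat worth hedging: `most_common()` sorts by count only, so the order among tied interests depends on the Python version. On Python 3.7 and later, ties keep first-encountered order, which means the tie order in the output above may differ if you rerun the cell on a newer interpreter.

```python
from collections import Counter

# Hypothetical toy data, not the chapter's users_interests.
sample_interests = [
    ["Python", "R"],
    ["R", "Java"],
    ["Java", "Python", "Spark"],
]

# Flatten the per-user lists and count how often each interest appears.
counts = Counter(interest
                 for user_interests in sample_interests
                 for interest in user_interests)

# most_common() returns (interest, count) pairs sorted by count, descending.
# Ties ("Python", "R", "Java" all appear twice) are not ordered by name;
# on Python 3.7+ they keep first-encountered order.
ranked = counts.most_common()
print(ranked)
```

If a reproducible order matters, one option is to sort explicitly, for example `sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))`, which breaks ties alphabetically.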

Having computed this, we can just suggest to a user the most popular interests that she hasn't already specified:

In [69]:
def most_popular_new_interests(user_interests, max_results=5):
    suggestions = [(interest, frequency)
                    for interest, frequency in popular_interests
                    if interest not in user_interests]
    return suggestions[:max_results]

So, if you are user 1, with interests `["NoSQL", "MongoDB", "Cassandra", "HBase", "Postgres"]`, then you would be recommended:

In [70]:
most_popular_new_interests(users_interests[1], 5)

[('Python', 4), ('R', 4), ('Java', 3), ('regression', 3), ('statistics', 3)]

If you are user 3, who has already specified four of those five interests, you would instead be recommended:

In [71]:
most_popular_new_interests(users_interests[3], 5)

[('Java', 3),
 ('HBase', 3),
 ('Big Data', 3),
 ('neural networks', 2),
 ('Hadoop', 2)]

User 8, whose interests (`["neural networks", "deep learning", "Big Data", "artificial intelligence"]`) do not overlap with user 1's at all, nonetheless gets the same 5 recommendations:

In [72]:
most_popular_new_interests(users_interests[8], 5)

[('Python', 4), ('R', 4), ('Java', 3), ('regression', 3), ('statistics', 3)]
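To look at the filtering step in isolation, here is a minimal sketch that takes the ranked list as an argument instead of reading the global `popular_interests`, and converts the user's interests to a set so that membership checks stay cheap. The name `recommend_popular` and the toy data are assumptions for illustration, not part of the chapter's code. Passing an empty interest list shows what a user with no recorded interests would get: nothing is filtered out, so the result is simply the top of the popularity ranking.

```python
from collections import Counter

def recommend_popular(user_interests, ranked_interests, max_results=5):
    """Return up to max_results (interest, count) pairs the user lacks.

    ranked_interests is assumed to be sorted by count, descending,
    e.g. the output of Counter.most_common().
    """
    known = set(user_interests)  # set lookup instead of scanning a list
    suggestions = [(interest, count)
                   for interest, count in ranked_interests
                   if interest not in known]
    return suggestions[:max_results]

# Hypothetical toy data with no tied counts, so the ranking is unambiguous.
sample_interests = [
    ["Python", "R", "Java"],
    ["Python", "R"],
    ["Python"],
]
ranked = Counter(interest
                 for user_interests in sample_interests
                 for interest in user_interests).most_common()

# A user with no interests gets the unfiltered top of the ranking.
new_user_recs = recommend_popular([], ranked, max_results=2)
# An existing user only sees interests they have not listed yet.
python_user_recs = recommend_popular(["Python"], ranked)

print(new_user_recs)
print(python_user_recs)
```

Any two users who have listed none of the top-ranked interests receive identical output from this function, which is the limitation of popularity-only recommendations.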

This technique can be somewhat useful, but "lots of people are interested in Python so you should be too" is not the most compelling sales pitch.  
However, if someone is brand new to our site and we know nothing about them, that might be the best we can do.  
Let's see how we can do better by basing each user's recommendations on her particular interests.

## User-Based Collaborative Filtering