<a href="https://colab.research.google.com/github/bdwalker1/UCSD_MLE_Bootcamp_mec2-projects/blob/main/21_9_5_Student_MLE_MiniProject_Recommendation_Engines_WalkerBruce.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Mini Project: Recommendation Engines

Recommendation engines are algorithms designed to provide personalized suggestions or recommendations to users. These systems analyze user behavior, preferences, and interactions with items (products, movies, music, articles, etc.) to predict and offer items that users are likely to be interested in. Recommendation engines play a crucial role in enhancing user experience, driving engagement, and increasing conversion rates in various applications, including e-commerce, entertainment, content platforms, and more.

There are generally two approaches taken in collaborative filtering and content-based recommendation engines:

**1. Collaborative Filtering:**
Collaborative Filtering is a popular approach to building recommendation systems that leverages the collective behavior of users to make personalized recommendations. It is based on the idea that users who have agreed in the past will likely agree in the future. There are two main types of collaborative filtering:

- **User-based Collaborative Filtering:** This method finds users similar to the target user based on their past interactions (e.g., ratings or purchases). It then recommends items that similar users have liked but the target user has not interacted with yet.

- **Item-based Collaborative Filtering:** In this approach, the system identifies similar items based on user interactions. It recommends items that are similar to the ones the target user has already liked or interacted with.

Collaborative filtering does not require any explicit information about items but relies on the similarity between users or items. It is effective in capturing complex patterns and can provide serendipitous recommendations. However, it suffers from the cold-start problem (i.e., difficulty in recommending to new users or items with no interactions) and scalability challenges in large datasets.

**2. Content-Based Recommendation:**
Content-based recommendation is an alternative approach to building recommendation systems that focuses on the attributes or features of items and users. It leverages the characteristics of items to make recommendations. The key steps involved in content-based recommendation are:

- **Feature Extraction:** For each item, relevant features are extracted. For movies, these features could be genre, director, actors, and plot summary.

- **User Profile:** A user profile is created based on the items they have interacted with in the past. The user profile contains the weighted importance of features based on their interactions.

- **Similarity Calculation:** The similarity between items or between items and the user profile is calculated using similarity metrics like cosine similarity or Euclidean distance.

- **Recommendation:** Items that are most similar to the user profile are recommended to the user.

Content-based recommendation systems are less affected by the cold-start problem as they can still recommend items based on their features. They are also more interpretable as they rely on item attributes. However, they may miss out on providing serendipitous recommendations and can be limited by the quality of feature extraction and user profiles.

**Choosing Between Collaborative Filtering and Content-Based:**
Both collaborative filtering and content-based approaches have their strengths and weaknesses. The choice between them depends on the specific requirements of the recommendation system, the type of data available, and the user base. Hybrid approaches that combine collaborative filtering and content-based techniques are also common, aiming to leverage the strengths of both methods and mitigate their weaknesses.

In this mini-project, you'll be building both content based and collaborative filtering engines for the [MovieLens 25M dataset](https://grouplens.org/datasets/movielens/25m/). The MovieLens 25M dataset is one of the most widely used and popular datasets for building and evaluating recommendation systems. It is provided by the GroupLens Research project, which collects and studies datasets related to movie ratings and recommendations. The MovieLens 25M dataset contains movie ratings and other related information contributed by users of the MovieLens website.

**Dataset Details:**
- **Size:** The dataset contains approximately 25 million movie ratings.
- **Users:** It includes ratings from over 162,000 users.
- **Movies:** The dataset consists of ratings for more than 62,000 movies.
- **Ratings:** The ratings are provided on a scale of 1 to 5, where 1 is the lowest rating and 5 is the highest.
- **Timestamps:** Each rating is associated with a timestamp, indicating when the rating was given.

**Data Files:**
The dataset is usually split into three CSV files:

1. **movies.csv:** Contains information about movies, including the movie ID, title, genres, and release year.
   - Columns: movieId, title, genres

2. **ratings.csv:** Contains movie ratings provided by users, including the user ID, movie ID, rating, and timestamp.
   - Columns: userId, movieId, rating, timestamp

3. **tags.csv:** Contains user-generated tags for movies, including the user ID, movie ID, tag, and timestamp.
   - Columns: userId, movieId, tag, timestamp

First, import all the libraries you'll need.

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [4]:
import os
import zipfile
import numpy as np
import pandas as pd
from urllib.request import urlretrieve
from sklearn.metrics.pairwise import cosine_similarity

Next, download the relevant components of the MoveLens dataset. Note, these instructions are roughly based on the colab [here](https://colab.research.google.com/github/google/eng-edu/blob/main/ml/recommendation-systems/recommendation-systems.ipynb?utm_source=ss-recommendation-systems&utm_campaign=colab-external&utm_medium=referral&utm_content=recommendation-systems#scrollTo=O3bcgduFo4s6).

In [5]:
data_dir = "/content/drive/MyDrive/miscdata/29.1.5_RecEngines"
os.chdir(f"{data_dir}")

if os.path.exists(f"{data_dir}/ml-100k"):
  print("Dataset folder found.")
else:
  print("Downloading movielens data...")

  urlretrieve('http://files.grouplens.org/datasets/movielens/ml-100k.zip', 'movielens.zip')
  zip_ref = zipfile.ZipFile('movielens.zip', 'r')
  zip_ref.extractall()
  print("Done. Dataset contains:")
  print(zip_ref.read('ml-100k/u.info'))



Dataset folder found.


In [6]:
ratings_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings_df = pd.read_csv(
    'ml-100k/u.data', sep='\t', names=ratings_cols, encoding='latin-1')

# The movies file contains a binary feature for each genre.
genre_cols = [
    "genre_unknown", "Action", "Adventure", "Animation", "Children", "Comedy",
    "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
    "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"
]
movies_cols = [
    'movie_id', 'title', 'release_date', "video_release_date", "imdb_url"
] + genre_cols
movies_df = pd.read_csv(
    'ml-100k/u.item', sep='|', names=movies_cols, encoding='latin-1')

Before doing any kind of machine learning, it's always good to familiarize yourself with the datasets you'lll be working with.

Here are your tasks:

1. Spend some time familiarizing yourself with both the `movies` and `ratings` dataframes. How many unique user ids are present? How many unique movies are there?
2. Create a new dataframe that merges the `movies` and `ratings` tables on 'movie_id'. Only keep the 'user_id', 'title', 'rating' fields in this new dataframe.

In [7]:
# Spend some time familiarizing yourself with both the movies and ratings
# dataframes. How many unique user ids are present? How many unique movies
# are there?

In [8]:
print(f"Unique user IDs in ratings: {ratings_df['user_id'].nunique()}")
print(f"Unique movie IDs in ratings: {ratings_df['movie_id'].nunique()}")
print(f"Unique movie IDs in movies: {movies_df['movie_id'].nunique()}")

Unique user IDs in ratings: 943
Unique movie IDs in ratings: 1682
Unique movie IDs in movies: 1682


In [9]:
print(ratings_df.head())
print(movies_df.head())


   user_id  movie_id  rating  unix_timestamp
0      196       242       3       881250949
1      186       302       3       891717742
2       22       377       1       878887116
3      244        51       2       880606923
4      166       346       1       886397596
   movie_id              title release_date  video_release_date  \
0         1   Toy Story (1995)  01-Jan-1995                 NaN   
1         2   GoldenEye (1995)  01-Jan-1995                 NaN   
2         3  Four Rooms (1995)  01-Jan-1995                 NaN   
3         4  Get Shorty (1995)  01-Jan-1995                 NaN   
4         5     Copycat (1995)  01-Jan-1995                 NaN   

                                            imdb_url  genre_unknown  Action  \
0  http://us.imdb.com/M/title-exact?Toy%20Story%2...              0       0   
1  http://us.imdb.com/M/title-exact?GoldenEye%20(...              0       1   
2  http://us.imdb.com/M/title-exact?Four%20Rooms%...              0       0   
3  http://u

In [10]:
print(movies_df.isnull().sum())
print(ratings_df.isnull().sum())

movie_id                 0
title                    0
release_date             1
video_release_date    1682
imdb_url                 3
genre_unknown            0
Action                   0
Adventure                0
Animation                0
Children                 0
Comedy                   0
Crime                    0
Documentary              0
Drama                    0
Fantasy                  0
Film-Noir                0
Horror                   0
Musical                  0
Mystery                  0
Romance                  0
Sci-Fi                   0
Thriller                 0
War                      0
Western                  0
dtype: int64
user_id           0
movie_id          0
rating            0
unix_timestamp    0
dtype: int64


In [11]:
# Merge movies and ratings dataframes
merged_df = pd.merge(movies_df[['movie_id','title']], ratings_df[['movie_id','user_id','rating']], on='movie_id').set_index('movie_id')
merged_df.head()

Unnamed: 0_level_0,title,user_id,rating
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,Toy Story (1995),308,4
1,Toy Story (1995),287,5
1,Toy Story (1995),148,4
1,Toy Story (1995),280,4
1,Toy Story (1995),66,3


In [18]:
def user_top_5(user_id):
  user_top_5_df = ratings_df[ratings_df['user_id'] == user_id].sort_values(by='rating', ascending=False).head(5)
  return user_top_5_df.merge(movies_df[['movie_id','title']], on='movie_id')['title'].values

for i in range(20):
  user_id = np.random.choice(ratings_df['user_id'].unique())
  print(f"User ID {user_id}")
  print(f"-------------")
  for title in user_top_5(user_id):
    print(title)
  print("\n")

User ID 515
-------------
Air Force One (1997)
Blues Brothers 2000 (1998)
Scream (1996)
Titanic (1997)
Starship Troopers (1997)


User ID 8
-------------
Braveheart (1995)
GoodFellas (1990)
Star Wars (1977)
Alien (1979)
Full Metal Jacket (1987)


User ID 851
-------------
Apocalypse Now (1979)
Game, The (1997)
Primal Fear (1996)
She's So Lovely (1997)
Escape from L.A. (1996)


User ID 125
-------------
Air Force One (1997)
Evita (1996)
Aladdin (1992)
Ace Ventura: Pet Detective (1994)
Monty Python and the Holy Grail (1974)


User ID 298
-------------
Sound of Music, The (1965)
Monty Python and the Holy Grail (1974)
It Happened One Night (1934)
M*A*S*H (1970)
E.T. the Extra-Terrestrial (1982)


User ID 631
-------------
Good Will Hunting (1997)
Jackie Brown (1997)
Evita (1996)
Titanic (1997)
Life Less Ordinary, A (1997)


User ID 800
-------------
Sling Blade (1996)
Rosewood (1997)
Ransom (1996)
Godfather, The (1972)
Air Force One (1997)


User ID 266
-------------
Sense and Sensibility 

As mentioned in the introduction, content-Based Filtering is a recommendation engine approach that focuses on the attributes or features of items (products, movies, music, articles, etc.) and leverages these features to make personalized recommendations. The underlying idea is to match the characteristics of items with the preferences of users to suggest items that align with their interests. Content-based filtering is particularly useful when explicit user-item interactions (e.g., ratings or purchases) are sparse or unavailable.

**Key Steps in Content-Based Filtering:**

1. **Feature Extraction:**
   - For each item, relevant features are extracted. These features are typically descriptive attributes that can be represented numerically, such as genre, director, actors, author, publication date, and keywords.
   - In the case of text-based items, natural language processing techniques may be used to extract features like TF-IDF (Term Frequency-Inverse Document Frequency) scores.

2. **User Profile Creation:**
   - A user profile is created based on the items they have interacted with in the past. The user profile contains the weighted importance of features based on their interactions.
   - For example, if a user has watched several action movies, the action genre feature would receive a higher weight in their profile.

3. **Similarity Calculation:**
   - The similarity between items or between items and the user profile is calculated using similarity metrics like cosine similarity, Euclidean distance, or Pearson correlation.
   - Cosine similarity is commonly used as it measures the cosine of the angle between two vectors, which represents their similarity.

4. **Recommendation:**
   - Items that are most similar to the user profile are recommended to the user. These are items whose features have the highest similarity scores with the user profile.
   - The recommended items are presented as a list sorted by their similarity scores.

**Advantages of Content-Based Filtering:**
1. **No Cold-Start Problem:** Content-based filtering can make recommendations even for new users with no historical interactions because it relies on item features rather than user history.

2. **User Independence:** The recommendations are based solely on the features of items and do not require knowledge of other users' preferences or behavior.

3. **Transparency:** Content-based recommendations are interpretable, as they depend on the features of items, making it easier for users to understand why specific items are recommended.

4. **Serendipity:** Content-based filtering can recommend items with characteristics not seen before by the user, leading to serendipitous discoveries.

5. **Diversity in Recommendations:** The method can offer diverse recommendations since it suggests items with different feature combinations.

**Limitations of Content-Based Filtering:**
1. **Limited Discovery:** Content-based filtering may struggle to recommend items outside the scope of users' historical interactions or interests.

2. **Over-Specialization:** Users may receive recommendations that are too similar to their previous choices, leading to a lack of exposure to new item categories.

3. **Dependency on Feature Quality:** The quality and relevance of item features significantly influence the quality of recommendations.

4. **Limited for Cold Items:** Content-based filtering can struggle to recommend new items with limited feature information.

Here is your task:

1. Write a function that takes in a user id and the dataframe you created before that contains 'user_id', 'title', and 'rating'. The function should return content-based recommendations for this user. Here are steps you can take:

  A. Get the user's rated movies

  B. Create a TF-IDF matrix using movie genres. Note, this can be extracted from the `movies` dataframe.

  C. Compute the cosine similarity between movie genres. Use the [cosine_similarity](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html) function.

  D. Get the indices of similar movies to those rated by the user based on cosine similarity. Keep only the top 5.

  E. Remove duplicates and movies already rated by the user.

In [13]:
# from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Content-Based Filtering using Movie Genres
def content_based_recommendation(user_id, df):
  # Get the user's rated movies
  movies_seen = df[df['user_id'] == user_id].index
  # print(movies_seen)

  # Create a TF-IDF matrix using movie genres
  movies_tfidf_df = movies_df[genre_cols]
  # print(movies_tfidf_df)

  # Build user profile based on movies seen
  movies_seen_df = movies_tfidf_df.iloc[movies_seen]
  # print(movies_seen_df)
  user_profile = movies_seen_df.mean()
  # print(user_profile)

  # Compute the cosine similarity between movie genres
  movies_unseen_df = movies_tfidf_df.drop(movies_seen, axis=0)
  movies_similar_df = pd.DataFrame(cosine_similarity(user_profile.values.reshape(1,-1),movies_unseen_df).T, index=movies_unseen_df.index, columns=["score"])
  # print(movies_similar_df.head())

  # Get the indices of the similar movies based on cosine similarity
  similar_movies_indices = movies_similar_df.sort_values(by="score", ascending=False).head(5).index.values
  # print(similar_movies_indices)

  # Remove duplicates and movies already rated by the user
  recommendations_df = movies_df.iloc[similar_movies_indices][['title']]
  recommendations = list(recommendations_df['title'])
  # print(recommendations)
  return recommendations


In [14]:
# Test recommendation function
for i in range(20):
  user_id=np.random.choice(merged_df['user_id'].unique())
  print(f"User ID {user_id}")
  print(f"-------------")
  recs = content_based_recommendation(user_id, merged_df)
  for rec in recs:
    print(rec)
  print("\n")

User ID 334
-------------
Get Shorty (1995)
House of Yes, The (1997)
Grace of My Heart (1996)
Swingers (1996)
Kicked in the Head (1997)


User ID 692
-------------
House of Yes, The (1997)
Twelfth Night (1996)
Cinema Paradiso (1988)
Wings of Desire (1987)
What Happened Was... (1994)


User ID 119
-------------
House of Yes, The (1997)
Faster Pussycat! Kill! Kill! (1965)
Get Shorty (1995)
American President, The (1995)
Corrina, Corrina (1994)


User ID 186
-------------
Apollo 13 (1995)
Tough and Deadly (1995)
Mercury Rising (1998)
Fire Down Below (1997)
Hostile Intentions (1994)


User ID 302
-------------
Diva (1981)
Apollo 13 (1995)
Mercury Rising (1998)
Outbreak (1995)
Tough and Deadly (1995)


User ID 93
-------------
Wings of Desire (1987)
Manhattan (1979)
Don Juan DeMarco (1995)
Cinema Paradiso (1988)
Brassed Off (1996)


User ID 394
-------------
Get Shorty (1995)
Best Men (1997)
Wings of Desire (1987)
Corrina, Corrina (1994)
American President, The (1995)


User ID 188
--------

The key idea behind collaborative filtering is that users who have agreed in the past will likely agree in the future. Instead of relying on item attributes or user profiles, collaborative filtering identifies patterns of user behavior and item preferences from the interactions present in the data.

**Types of Collaborative Filtering:**
There are two main types of collaborative filtering:

**Collaborative Filtering Process:**
The collaborative filtering process typically involves the following steps:

1. **Data Collection:**
   - Gather data on user-item interactions, such as movie ratings, product purchases, or article clicks.

2. **User-Item Matrix:**
   - Organize the data into a user-item matrix, where rows represent users, columns represent items, and the entries contain the users' interactions (e.g., ratings).

3. **Similarity Calculation:**
   - Calculate the similarity between users or items using similarity metrics such as cosine similarity, Pearson correlation, or Jaccard similarity.
   - For user-based collaborative filtering, user similarities are calculated, and for item-based collaborative filtering, item similarities are calculated.

4. **Neighborhood Selection:**
   - For each user or item, select the most similar users or items as the neighborhood.
   - The size of the neighborhood (the number of similar users or items to consider) is an important parameter to control the system's behavior.

5. **Prediction Generation:**
   - Predict the ratings for items that the target user has not yet interacted with by combining the ratings of neighboring users or items.

6. **Recommendation Generation:**
   - Recommend items with the highest predicted ratings to the target user.

**Advantages of Collaborative Filtering using User-Item Interactions:**
- Collaborative filtering is based solely on user interactions and does not require knowledge of item attributes, making it useful for cases where item data is sparse or unavailable.
- It can provide serendipitous recommendations, suggesting items that users may not have discovered on their own.
- Collaborative filtering can be applied in various domains, including e-commerce, music, movie, and content recommendations.

**Limitations of Collaborative Filtering:**
- The cold-start problem: Collaborative filtering struggles to recommend to new users or items with no or limited interaction history.
- It may suffer from sparsity when data is limited or when users have only interacted with a small subset of items.
- Scalability issues can arise with large datasets and an increasing number of users or items.

Here is your task:

1. Write a function that takes in a user id and the dataframe you created before that contains 'user_id', 'title', and 'rating'. The function should return collaborative filtering recommendations for this user based on a user-item interaction matrix. Here are steps you can take:

  A. Create the user-item matrix using Pandas' [pivot_table](https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html).

  B. Fill missing values with zeros in this matrix.

  C. Calculate user-user similarity matrix using cosine similarity.

  D. Get the array of similarity scores of the target user with all other users from the similarity matrix.

  E. Extract, say the the top 5 most similar users (excluding the target user).

  F. Generate movie recommendations based on the most similar users.

  G. Remove duplicate movies recommendations.

In [15]:
# Collaborative Filtering using User-Item Interactions
def collaborative_filtering_recommendation(user_id, df):
  # Create the user-item matrix
  ratings_pivot = df.pivot_table(index='user_id', columns='movie_id', values='rating')

  # Fill missing values with 0 (indicating no rating)
  avg_ratings = ratings_pivot.mean(axis=1)
  ratings_pivot = ratings_pivot.sub(avg_ratings, axis=0)
  ratings_pivot.fillna(0, inplace=True)
  # print(ratings_pivot)

  # Calculate user-user similarity matrix using cosine similarity
  similarities = cosine_similarity(ratings_pivot)
  similarities_df = pd.DataFrame(similarities, index=ratings_pivot.index, columns=ratings_pivot.index)

  # Get the similarity scores of the target user with all other users
  similar_users = similarities_df.loc[user_id]
  # print(similar_users)

  # Find the top N most similar users (excluding the target user)
  most_similar_users = similar_users.sort_values(ascending=False)[1:11].index.values
  # print(most_similar_users)

  # Generate movie recommendations based on the most similar users
  similar_ratings = ratings_df.loc[ratings_df["user_id"].isin(most_similar_users)]
  similar_ratings_count = similar_ratings.groupby("movie_id")["rating"].count()
  similar_ratings_mean = similar_ratings.groupby("movie_id")["rating"].mean()
  similar_users_ratings = pd.DataFrame({"rating_count": similar_ratings_count, "rating_mean": similar_ratings_mean})
  # print(similar_users_ratings)

  # Remove movies the user has seen
  recommended_movies = similar_users_ratings.iloc[~similar_users_ratings.index.isin(merged_df.loc[merged_df["user_id"] == user_id].index)]
  recommended_movies = recommended_movies.drop(recommended_movies[recommended_movies["rating_count"] < 3].index)
  recommended_movies = recommended_movies.sort_values(by="rating_mean", ascending=False)[:5]
  # print(recommended_movies)

  recommendations = list(movies_df.loc[movies_df["movie_id"].isin(recommended_movies.index)]["title"])
  # print(recommendations)
  return recommendations


In [16]:
# Test recommendation function
for i in range(10):
  user_id=np.random.choice(merged_df['user_id'].unique())
  print(f"User ID {user_id}")
  print(f"-------------")
  recs = collaborative_filtering_recommendation(user_id, merged_df)
  for rec in recs:
    print(rec)
  print("\n")

User ID 941
-------------
Star Wars (1977)
Starship Troopers (1997)
English Patient, The (1996)
Game, The (1997)
Saint, The (1997)


User ID 86
-------------
In the Company of Men (1997)
L.A. Confidential (1997)
Ulee's Gold (1997)
Titanic (1997)
Apostle, The (1997)


User ID 35
-------------
Full Monty, The (1997)
Rosewood (1997)
Liar Liar (1997)
Titanic (1997)
I Know What You Did Last Summer (1997)


User ID 729
-------------
Fargo (1996)
Devil's Own, The (1997)
L.A. Confidential (1997)
Ulee's Gold (1997)
Wag the Dog (1997)


User ID 546
-------------
Godfather, The (1972)
Empire Strikes Back, The (1980)
L.A. Confidential (1997)
Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1963)
Trainspotting (1996)


User ID 557
-------------
Silence of the Lambs, The (1991)
Raiders of the Lost Ark (1981)
Indiana Jones and the Last Crusade (1989)
Good Will Hunting (1997)
Titanic (1997)


User ID 415
-------------
Godfather, The (1972)
Fly Away Home (1996)
Titanic (1997)
Lost 

Now, test your recommendations engines! Select a few user ids and generate recommendations using both functions you've written. Are the recommendations similar? Do the recommendations make sense?

In [17]:
# Test the recommendation engines
for i in range(10):
  user_id=np.random.choice(merged_df['user_id'].unique())
  print(f"User ID {user_id}")
  print(f"-------------")
  top5_df = pd.DataFrame(user_top_5(user_id), columns=["Top 5"])
  recs_content_df = pd.DataFrame(content_based_recommendation(user_id, merged_df), columns=["Content-Based"])
  recs_collab_df = pd.DataFrame(collaborative_filtering_recommendation(user_id, merged_df), columns=["Collaborative"])
  combo_df = pd.concat([top5_df, recs_content_df, recs_collab_df], axis=1)
  display(combo_df)
  print('\n')

User ID 564
-------------


Unnamed: 0,Top 5,Content-Based,Collaborative
0,Independence Day (ID4) (1996),Best Men (1997),Mr. Holland's Opus (1995)
1,Deconstructing Harry (1997),Get Shorty (1995),Fargo (1996)
2,"Godfather, The (1972)",Faster Pussycat! Kill! Kill! (1965),Star Trek: First Contact (1996)
3,Rosewood (1997),C'est arrivé près de chez vous (1992),Courage Under Fire (1996)
4,"Apostle, The (1997)",Hana-bi (1997),"Boot, Das (1981)"




User ID 815
-------------


Unnamed: 0,Top 5,Content-Based,Collaborative
0,"Killing Fields, The (1984)",Get Shorty (1995),Babe (1995)
1,"Princess Bride, The (1987)",Faster Pussycat! Kill! Kill! (1965),"Wrong Trousers, The (1993)"
2,Casablanca (1942),Twelfth Night (1996),"Sting, The (1973)"
3,Star Trek: The Wrath of Khan (1982),I Like It Like That (1994),Sense and Sensibility (1995)
4,"City of Lost Children, The (1995)",Manhattan (1979),"Philadelphia Story, The (1940)"




User ID 713
-------------


Unnamed: 0,Top 5,Content-Based,Collaborative
0,"Apostle, The (1997)",Hostile Intentions (1994),Scream (1996)
1,"Rainmaker, The (1997)",Condition Red (1995),As Good As It Gets (1997)
2,Apt Pupil (1998),Fire Down Below (1997),Conspiracy Theory (1997)
3,"Full Monty, The (1997)",Mercury Rising (1998),Scream 2 (1997)
4,L.A. Confidential (1997),Faster Pussycat! Kill! Kill! (1965),"Sweet Hereafter, The (1997)"




User ID 170
-------------


Unnamed: 0,Top 5,Content-Based,Collaborative
0,"MatchMaker, The (1997)",Outbreak (1995),"Full Monty, The (1997)"
1,Rosewood (1997),Hostile Intentions (1994),L.A. Confidential (1997)
2,"Devil's Own, The (1997)",Tough and Deadly (1995),Titanic (1997)
3,Air Force One (1997),Apollo 13 (1995),Kiss the Girls (1997)
4,G.I. Jane (1997),Fire Down Below (1997),Picture Perfect (1997)




User ID 10
-------------


Unnamed: 0,Top 5,Content-Based,Collaborative
0,Sling Blade (1996),Something to Talk About (1995),"Godfather: Part II, The (1974)"
1,"Magnificent Seven, The (1954)",Manhattan (1979),Shall We Dance? (1996)
2,Alien (1979),Don Juan DeMarco (1995),"Boot, Das (1981)"
3,Vertigo (1958),"Corrina, Corrina (1994)","Killing Fields, The (1984)"
4,2001: A Space Odyssey (1968),Cinema Paradiso (1988),High Noon (1952)




User ID 249
-------------


Unnamed: 0,Top 5,Content-Based,Collaborative
0,Real Genius (1985),Faster Pussycat! Kill! Kill! (1965),Apocalypse Now (1979)
1,2 Days in the Valley (1996),Get Shorty (1995),Miller's Crossing (1990)
2,Like Water For Chocolate (Como agua para choco...,"House of Yes, The (1997)",Ran (1985)
3,North by Northwest (1959),Á köldum klaka (Cold Fever) (1994),Chinatown (1974)
4,Menace II Society (1993),Sleepover (1995),Some Folks Call It a Sling Blade (1993)




User ID 803
-------------


Unnamed: 0,Top 5,Content-Based,Collaborative
0,"English Patient, The (1996)",Mercury Rising (1998),Twelve Monkeys (1995)
1,"Wings of the Dove, The (1997)",Condition Red (1995),Fargo (1996)
2,Eve's Bayou (1997),Outbreak (1995),Breaking the Waves (1996)
3,"Full Monty, The (1997)",Tough and Deadly (1995),L.A. Confidential (1997)
4,Kolya (1996),Apollo 13 (1995),Trainspotting (1996)




User ID 107
-------------


Unnamed: 0,Top 5,Content-Based,Collaborative
0,"Full Monty, The (1997)",Diva (1981),Fargo (1996)
1,Boogie Nights (1997),Condition Red (1995),Good Will Hunting (1997)
2,"Big Lebowski, The (1998)",Outbreak (1995),Ulee's Gold (1997)
3,L.A. Confidential (1997),Hostile Intentions (1994),"Devil's Advocate, The (1997)"
4,Contact (1997),Fire Down Below (1997),Jackie Brown (1997)




User ID 526
-------------


Unnamed: 0,Top 5,Content-Based,Collaborative
0,Apt Pupil (1998),Faster Pussycat! Kill! Kill! (1965),Mr. Holland's Opus (1995)
1,"Full Monty, The (1997)",Get Shorty (1995),Jerry Maguire (1996)
2,"Close Shave, A (1995)",Mercury Rising (1998),Chasing Amy (1997)
3,Brassed Off (1996),Condition Red (1995),Fly Away Home (1996)
4,That Thing You Do! (1996),Tough and Deadly (1995),As Good As It Gets (1997)




User ID 393
-------------


Unnamed: 0,Top 5,Content-Based,Collaborative
0,"Man Who Knew Too Little, The (1997)",Don Juan DeMarco (1995),"Silence of the Lambs, The (1991)"
1,"Long Kiss Goodnight, The (1996)","American President, The (1995)",Raiders of the Lost Ark (1981)
2,Happy Gilmore (1996),What Happened Was... (1994),"Godfather: Part II, The (1974)"
3,"Bye Bye, Love (1995)",Manhattan (1979),Patton (1970)
4,Gattaca (1997),Brassed Off (1996),Fried Green Tomatoes (1991)






## **Observations**

## Are the recommendations similar?

* Interestingly, there seems to be little to no overlap in the top five content-based and collaborative recommendations.
* Both methods, however, seem to offer a good mix of movies from various genres.

## Do the recommendations make sense?

* Based on each user's top five rated movies, I would say both recommendation methods are doing a good job recommending movies similar to the user's top rated movies.

## Other thoughts

* The content-based recommendation engine recommends the same movies more often than the collaborative engine. I think this provides more personalized and more unique options.