
Recommender Systems are a class of AI systems that predict and recommend new items (e.g. YouTube videos, Netflix shows, Amazon products).

In this lab, we'll use recommender systems to try to find a good movie for our next movie night!

Here's what we need to do:
* Step 1: Get a dataset of movie ratings, and make sure we understand how the dataset is structured.
* Step 2: Try to get just a non-personalized set of recommendations for John-Green-bot and me, to see if we can find a movie to watch that way.
* Step 3: Get personalized ratings for John-Green-bot and me, and import them into the system in the correct format.
* Step 4: Train a User-User collaborative filtering model to provide personalized recommendations based on John-Green-bot's and my prior ratings.
* Step 5: Combine ratings to generate a single ranked recommendation list for our movie night together!




Just like in our other labs, we're not going to reinvent the wheel from scratch. We'll use an existing dataset published by MovieLens, which contains about 100,000 user ratings for about 10,000 different movies. You can read more about this dataset here: http://files.grouplens.org/datasets/movielens/ml-latest-small-README.html

We'll also use the LensKit API to implement our recommender systems algorithms.

***STEP 1***

**Step 1.1**

In [1]:
!pip install lenskit

import lenskit.datasets as ds
import pandas as pd

!git clone https://github.com/crash-course-ai/lab4-recommender-systems.git

data = ds.MovieLens('lab4-recommender-systems/')

print("Successfully installed dataset.")

Collecting lenskit
  Downloading lenskit-0.14.4-py3-none-any.whl.metadata (7.4 kB)
Collecting numba<0.59,>=0.51 (from lenskit)
  Downloading numba-0.58.1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.7 kB)
Collecting binpickle>=0.3.2 (from lenskit)
  Downloading binpickle-0.3.4-py3-none-any.whl.metadata (2.8 kB)
Collecting seedbank>=0.1.0 (from lenskit)
  Downloading seedbank-0.1.3-py3-none-any.whl.metadata (3.7 kB)
Collecting csr>=0.3.1 (from lenskit)
  Downloading csr-0.5.2-py3-none-any.whl.metadata (2.5 kB)
Collecting llvmlite<0.42,>=0.41.0dev0 (from numba<0.59,>=0.51->lenskit)
  Downloading llvmlite-0.41.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.8 kB)
Collecting anyconfig==0.13.* (from seedbank>=0.1.0->lenskit)
  Downloading anyconfig-0.13.0-py2.py3-none-any.whl.metadata (2.3 kB)
Downloading lenskit-0.14.4-py3-none-any.whl (74 kB)

It's important to understand how a dataset is structured and to make sure that the dataset imported correctly.  Let's print out a few rows of the rating data.

As you can see, MovieLens stores a user's ID number (the first few rows look like they're all ratings from user 1), the item's ID (in this case each ID is a different movie), the rating the user gave this item, and a timestamp for when the rating was left.

**Step 1.2**

In [2]:
rows_to_show = 10   # <-- Try changing this number to see more rows of data
data.ratings.head(rows_to_show)  # <-- Try changing "ratings" to "movies", "tags", or "links" to see the kinds of data that's stored in the other MovieLens files

Unnamed: 0,user,item,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931
5,1,70,3.0,964982400
6,1,101,5.0,964980868
7,1,110,4.0,964982176
8,1,151,5.0,964984041
9,1,157,5.0,964984100


A big aspect of recommender system datasets is how they handle missing data. Recommender systems usually have a LOT of missing data, because most users only rate a few movies and most movies only receive ratings from a few users.

For example, we can see that user #1 gave a rating of 4.0 to item #1 and a rating of 4.0 to item #3. But there isn't a rating for item #2 at all, which means that user #1 never rated this item. It's helpful to know that this dataset doesn't store unrated items at all, instead of, for example, storing unrated items as 0 ratings.
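
To make the missing-data point concrete, here is a minimal sketch on a made-up ratings table (the rows below are illustrative, not taken from MovieLens). It counts what fraction of all possible user-item pairs actually has a rating, and pivots the table so the gaps show up as `NaN` rather than 0.

```python
import pandas as pd

# Toy ratings in the same long format as MovieLens: one row per (user, item) rating.
# These values are invented for illustration.
toy_ratings = pd.DataFrame({
    "user":   [1, 1, 2, 3],
    "item":   [1, 3, 1, 2],
    "rating": [4.0, 4.0, 5.0, 3.0],
})

n_users = toy_ratings["user"].nunique()
n_items = toy_ratings["item"].nunique()

# Fraction of all possible (user, item) pairs that actually have a rating.
density = len(toy_ratings) / (n_users * n_items)

# Pivoting to a user x item grid makes the gaps visible as NaN, not 0.
grid = toy_ratings.pivot(index="user", columns="item", values="rating")

print(grid)
print(f"{density:.0%} of user-item pairs are rated")
```

Running the same calculation on `data.ratings` should show a far lower density, since most users have rated only a small share of the movies.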

But here we have another small issue: names like item #1 and item #2 aren't very descriptive, so we can't tell what those movies are. Thankfully, MovieLens also has a data table called "movies" that includes information about titles and genres. We can get a more meaningful look at these data by joining the two data files.

**Step 1.3**

In [3]:
joined_data = data.ratings.join(data.movies['genres'], on='item')
joined_data = joined_data.join(data.movies['title'], on='item')
joined_data.head(rows_to_show)

Unnamed: 0,user,item,rating,timestamp,genres,title
0,1,1,4.0,964982703,Adventure|Animation|Children|Comedy|Fantasy,Toy Story (1995)
1,1,3,4.0,964981247,Comedy|Romance,Grumpier Old Men (1995)
2,1,6,4.0,964982224,Action|Crime|Thriller,Heat (1995)
3,1,47,5.0,964983815,Mystery|Thriller,Seven (a.k.a. Se7en) (1995)
4,1,50,5.0,964982931,Crime|Mystery|Thriller,"Usual Suspects, The (1995)"
5,1,70,3.0,964982400,Action|Comedy|Horror|Thriller,From Dusk Till Dawn (1996)
6,1,101,5.0,964980868,Adventure|Comedy|Crime|Romance,Bottle Rocket (1996)
7,1,110,4.0,964982176,Action|Drama|War,Braveheart (1995)
8,1,151,5.0,964984041,Action|Drama|Romance|War,Rob Roy (1995)
9,1,157,5.0,964984100,Comedy|War,Canadian Bacon (1995)


Now we can see the titles and genres of each item, and we'll continue using "join" before printing results in other parts of the lab as well.
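
As a small illustration of what `join(..., on='item')` is doing, the sketch below rebuilds two of the rows shown above by hand. It assumes, as the lab's code implies, that the movies table is indexed by item ID; the toy tables are stand-ins for `data.ratings` and `data.movies`.

```python
import pandas as pd

# Stand-in for data.ratings: one row per rating.
ratings = pd.DataFrame({"user": [1, 1], "item": [1, 3], "rating": [4.0, 4.0]})

# Stand-in for data.movies: indexed by item ID (assumed), with title and genres columns.
movies = pd.DataFrame(
    {
        "title": ["Toy Story (1995)", "Grumpier Old Men (1995)"],
        "genres": ["Adventure|Animation|Children|Comedy|Fantasy", "Comedy|Romance"],
    },
    index=pd.Index([1, 3], name="item"),
)

# join matches the 'item' column on the left against the index on the right.
# Passing a list of columns attaches genres and title in a single call.
joined = ratings.join(movies[["genres", "title"]], on="item")
print(joined)
```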

Because we've successfully imported our ratings data and seen how it's structured, we're done with Step 1.

***STEP 2***

Now that we have ratings, let's create a generic set of recommended movies by looking at the highest rated films. We can average all the ratings by item, sort the list in descending order, and print that top set of recommendations.

**Step 2.1**

In [4]:
average_ratings = (data.ratings).groupby(['item']).mean()
sorted_avg_ratings = average_ratings.sort_values(by="rating", ascending=False)
joined_data = sorted_avg_ratings.join(data.movies['genres'], on='item')
joined_data = joined_data.join(data.movies['title'], on='item')
joined_data = joined_data[joined_data.columns[1:]]

print("RECOMMENDED FOR ANYBODY:")
joined_data.head(rows_to_show)

RECOMMENDED FOR ANYBODY:


item,rating,timestamp,genres,title
88448,5.0,1315438000.0,Comedy|Drama,Paper Birds (Pájaros de papel) (2010)
100556,5.0,1456151000.0,Documentary,"Act of Killing, The (2012)"
143031,5.0,1520409000.0,Comedy|Drama|Romance,Jump In! (2007)
143511,5.0,1526207000.0,Documentary,Human (2015)
143559,5.0,1520410000.0,Comedy|Crime|Fantasy,L.A. Slasher (2015)
6201,5.0,1100120000.0,Drama|Romance,Lady Jane (1986)
102217,5.0,1443200000.0,Comedy,Bill Hicks: Revelations (1993)
102084,5.0,1493422000.0,Action|Animation|Fantasy,Justice League: Doom (2012)
6192,5.0,1063275000.0,Romance,Open Hearts (Elsker dig for evigt) (2002)
145994,5.0,1526207000.0,Comedy,Formula of Love (1984)


That seemed like a good idea, but the results are strange... _Paper Birds_? _Bill Hicks: Revelations_? Those are pretty obscure movies. Let's see what's actually happening here.

In [5]:
average_ratings = (data.ratings).groupby('item') \
       .agg(count=('user', 'size'), rating=('rating', 'mean')) \
       .reset_index()

sorted_avg_ratings = average_ratings.sort_values(by="rating", ascending=False)
joined_data = sorted_avg_ratings.join(data.movies['genres'], on='item')
joined_data = joined_data.join(data.movies['title'], on='item')
joined_data = joined_data[joined_data.columns[1:]]


print("RECOMMENDED FOR ANYBODY:")
joined_data.head(rows_to_show)

RECOMMENDED FOR ANYBODY:


Unnamed: 0,count,rating,genres,title
7638,1,5.0,Comedy|Drama,Paper Birds (Pájaros de papel) (2010)
8089,1,5.0,Documentary,"Act of Killing, The (2012)"
9065,1,5.0,Comedy|Drama|Romance,Jump In! (2007)
9076,1,5.0,Documentary,Human (2015)
9078,1,5.0,Comedy|Crime|Fantasy,L.A. Slasher (2015)
4245,1,5.0,Drama|Romance,Lady Jane (1986)
8136,1,5.0,Comedy,Bill Hicks: Revelations (1993)
8130,1,5.0,Action|Animation|Fantasy,Justice League: Doom (2012)
4240,1,5.0,Romance,Open Hearts (Elsker dig for evigt) (2002)
9104,1,5.0,Comedy,Formula of Love (1984)


Adding the "count" column, we can see that each of these movies was given a perfect 5.0 rating but by just ONE person. They might be good movies, but we can't be very confident in these recommendations.

To improve this list, we should only include movies in this recommendation list if they have more than a certain number of ratings, so we can be more confident that each movie is generally good. Let's start with movies that more than 20 people rated.

**Step 2.2**

In [6]:
minimum_to_include = 20 #<-- You can try changing this minimum to include movies rated by fewer or more people

average_ratings = (data.ratings).groupby(['item']).mean()
rating_counts = (data.ratings).groupby(['item']).count()
average_ratings = average_ratings.loc[rating_counts['rating'] > minimum_to_include]
sorted_avg_ratings = average_ratings.sort_values(by="rating", ascending=False)
joined_data = sorted_avg_ratings.join(data.movies['genres'], on='item')
joined_data = joined_data.join(data.movies['title'], on='item')
joined_data = joined_data[joined_data.columns[3:]]

print("RECOMMENDED FOR ANYBODY:")
joined_data.head(rows_to_show)

RECOMMENDED FOR ANYBODY:


item,genres,title
318,Crime|Drama,"Shawshank Redemption, The (1994)"
922,Drama|Film-Noir|Romance,Sunset Blvd. (a.k.a. Sunset Boulevard) (1950)
898,Comedy|Drama|Romance,"Philadelphia Story, The (1940)"
475,Drama,In the Name of the Father (1993)
1204,Adventure|Drama|War,Lawrence of Arabia (1962)
246,Documentary,Hoop Dreams (1994)
858,Crime|Drama,"Godfather, The (1972)"
1235,Comedy|Drama|Romance,Harold and Maude (1971)
168252,Action|Sci-Fi,Logan (2017)
2959,Action|Crime|Drama|Thriller,Fight Club (1999)


These movies are better known, and because each one was rated by more than 20 people, we can be more confident in these recommendations. But these movies span a bunch of genres, so we can try narrowing the list down a bit more.

Let's try to get a list of recommendations from John-Green-bot's and my favorite genres. I like Action movies and he prefers Romance movies. So in addition to filtering by the number of ratings, let's also filter by a particular genre. We'll run the recommendations for an action movie fan, then for a romance movie fan.

**Step 2.3**

In [7]:
average_ratings = (data.ratings).groupby(['item']).mean()
rating_counts = (data.ratings).groupby(['item']).count()
average_ratings = average_ratings.loc[rating_counts['rating'] > minimum_to_include]
average_ratings = average_ratings.join(data.movies['genres'], on='item')
average_ratings = average_ratings.loc[average_ratings['genres'].str.contains('Action')]

sorted_avg_ratings = average_ratings.sort_values(by="rating", ascending=False)
joined_data = sorted_avg_ratings.join(data.movies['title'], on='item')
joined_data = joined_data[joined_data.columns[3:]]
print("RECOMMENDED FOR AN ACTION MOVIE FAN:")
joined_data.head(rows_to_show)

RECOMMENDED FOR AN ACTION MOVIE FAN:


item,genres,title
168252,Action|Sci-Fi,Logan (2017)
2959,Action|Crime|Drama|Thriller,Fight Club (1999)
58559,Action|Crime|Drama|IMAX,"Dark Knight, The (2008)"
1197,Action|Adventure|Comedy|Fantasy|Romance,"Princess Bride, The (1987)"
260,Action|Adventure|Sci-Fi,Star Wars: Episode IV - A New Hope (1977)
3275,Action|Crime|Drama|Thriller,"Boondock Saints, The (2000)"
1208,Action|Drama|War,Apocalypse Now (1979)
1196,Action|Adventure|Sci-Fi,Star Wars: Episode V - The Empire Strikes Back...
1233,Action|Drama|War,"Boot, Das (Boat, The) (1981)"
1198,Action|Adventure,Raiders of the Lost Ark (Indiana Jones and the...


In [8]:
average_ratings = (data.ratings).groupby(['item']).mean()
rating_counts = (data.ratings).groupby(['item']).count()
average_ratings = average_ratings.loc[rating_counts['rating'] > minimum_to_include]
average_ratings = average_ratings.join(data.movies['genres'], on='item')
average_ratings = average_ratings.loc[average_ratings['genres'].str.contains('Romance')]

sorted_avg_ratings = average_ratings.sort_values(by="rating", ascending=False)
joined_data = sorted_avg_ratings.join(data.movies['title'], on='item')
joined_data = joined_data[joined_data.columns[3:]]
print("RECOMMENDED FOR A ROMANCE MOVIE FAN:")
joined_data.head(rows_to_show)

RECOMMENDED FOR A ROMANCE MOVIE FAN:


item,genres,title
922,Drama|Film-Noir|Romance,Sunset Blvd. (a.k.a. Sunset Boulevard) (1950)
898,Comedy|Drama|Romance,"Philadelphia Story, The (1940)"
1235,Comedy|Drama|Romance,Harold and Maude (1971)
912,Drama|Romance,Casablanca (1942)
1197,Action|Adventure|Comedy|Fantasy|Romance,"Princess Bride, The (1987)"
933,Crime|Mystery|Romance|Thriller,To Catch a Thief (1955)
908,Action|Adventure|Mystery|Romance|Thriller,North by Northwest (1959)
4973,Comedy|Romance,"Amelie (Fabuleux destin d'Amélie Poulain, Le) ..."
356,Comedy|Drama|Romance|War,Forrest Gump (1994)
7361,Drama|Romance|Sci-Fi,Eternal Sunshine of the Spotless Mind (2004)


There's actually one movie that's on both of these lists: _The Princess Bride_. But John-Green-bot doesn't want to rewatch it.

So, while Step 2 produced some generic recommendations, our AI hasn't given us a new movie we want to watch together.
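
Steps 2.1 through 2.3 repeat the same average, filter, and sort pattern, so it can be wrapped in one function. The sketch below is one possible way to do that; `top_rated` is a hypothetical helper name, and the toy tables are made up so the block runs on its own. It keeps the lab's "more than `min_count` ratings" rule.

```python
import pandas as pd

def top_rated(ratings, movies, min_count=20, genre=None, n=10):
    """Hypothetical helper: highest mean-rated items with more than min_count ratings."""
    # Mean rating and number of ratings per item, computed in one pass.
    stats = ratings.groupby("item")["rating"].agg(["mean", "count"])
    stats = stats[stats["count"] > min_count]
    # Both tables are indexed by item here, so a plain index join works.
    stats = stats.join(movies[["genres", "title"]])
    if genre is not None:
        # regex=False treats the genre as plain text; na=False drops items with no genre info.
        stats = stats[stats["genres"].str.contains(genre, regex=False, na=False)]
    return stats.sort_values("mean", ascending=False).head(n)

# Made-up data: item 20 has a perfect score, but from a single user.
toy_ratings = pd.DataFrame({
    "user":   [1, 2, 3, 1, 1, 2],
    "item":   [10, 10, 10, 20, 30, 30],
    "rating": [4.0, 5.0, 3.0, 5.0, 4.5, 4.5],
})
toy_movies = pd.DataFrame(
    {"title": ["Movie A", "Movie B", "Movie C"],
     "genres": ["Action", "Romance", "Action|Romance"]},
    index=pd.Index([10, 20, 30], name="item"),
)

print(top_rated(toy_ratings, toy_movies, min_count=1))
print(top_rated(toy_ratings, toy_movies, min_count=1, genre="Romance"))
```

With the real data, a call like `top_rated(data.ratings, data.movies, genre="Action")` should be roughly equivalent to the Step 2.3 cell, assuming `data.movies` is indexed by item ID.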

***STEP 3***

Step 3 is personalizing our recommender system AI. John-Green-bot and I each need to provide our own movie ratings as data, so we filled out simple spreadsheets. We've uploaded these spreadsheets to GitHub. Here's mine, for example: https://github.com/crash-course-ai/lab4-recommender-systems/blob/master/jabril-movie-ratings.csv

But, we need to provide these personalized ratings in the correct format. By looking at the documentation for LensKit (https://lkpy.lenskit.org/en/stable/interfaces.html#lenskit.algorithms.Recommender.recommend), we know that we need to provide a dictionary of item-rating pairs for each person. This means that we need to import the two spreadsheets from GitHub and format the data in a way that will make sense to our AI: two dictionaries.

To test that it worked, let's also print both our ratings for _The Princess Bride_, since we know that's a movie we both watched.
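
Before reading the real files, here is a self-contained sketch of the target format. It parses a tiny in-memory CSV with the same `item` and `ratings` column names, skips blank or out-of-range entries, and builds an item-to-rating dictionary. The sample rows and the `read_ratings` helper are made up for illustration, and the valid range (above 0, at most 5) is an assumption about the rating scale.

```python
import csv
import io

import pandas as pd

# Stand-in for one of the ratings spreadsheets; these rows are invented.
sample_csv = io.StringIO("item,ratings\n1197,4.5\n260,\n318,7\n")

def read_ratings(handle, low=0.0, high=5.0):
    """Hypothetical helper: return {item_id: rating}, skipping blank or out-of-range rows."""
    ratings = {}
    for row in csv.DictReader(handle):
        text = row["ratings"].strip()
        if not text:
            continue  # movie left unrated in the spreadsheet
        value = float(text)
        if low < value <= high:
            ratings[int(row["item"])] = value
    return ratings

ratings = read_ratings(sample_csv)
print(ratings)

# A dictionary converts directly into a pandas Series indexed by item ID.
as_series = pd.Series(ratings, name="rating")
print(as_series)
```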

**Step 3.1**

In [9]:
import csv

jabril_rating_dict = {}
jgb_rating_dict = {}

with open("/content/lab4-recommender-systems/jabril-movie-ratings.csv", newline='') as csvfile:
  ratings_reader = csv.DictReader(csvfile)
  for row in ratings_reader:
    if ((row['ratings'] != "") and (float(row['ratings']) > 0) and (float(row['ratings']) < 6)):
      jabril_rating_dict.update({int(row['item']): float(row['ratings'])})

with open("/content/lab4-recommender-systems/jgb-movie-ratings.csv", newline='') as csvfile:
  ratings_reader = csv.DictReader(csvfile)
  for row in ratings_reader:
    if ((row['ratings'] != "") and (float(row['ratings']) > 0) and (float(row['ratings']) < 6)):
      jgb_rating_dict.update({int(row['item']): float(row['ratings'])})

print("Rating dictionaries assembled!")
print("Sanity check:")
print("\tJabril's rating for 1197 (The Princess Bride) is " + str(jabril_rating_dict[1197]))
print("\tJohn-Green-Bot's rating for 1197 (The Princess Bride) is " + str(jgb_rating_dict[1197]))


Rating dictionaries assembled!
Sanity check:
	Jabril's rating for 1197 (The Princess Bride) is 4.5
	John-Green-Bot's rating for 1197 (The Princess Bride) is 3.5


***STEP 4***

In Step 4, we want to actually train a new collaborative filtering model to provide recommendations. We'll use the UserUser algorithm from LensKit to do this. This algorithm groups together users with similar movie ratings (a "neighborhood," which we'll loosely call a cluster), and uses those clusters to predict movie ratings for one user (in this case, we'll want that user to be John-Green-bot or myself).

We're guiding how the algorithm decides whether a particular group of users should be clustered together by setting a minimum and maximum neighborhood size. These parameters modify the result of the algorithm.

Really small clusters represent groups of people who aren't very similar to a lot of others. So by keeping cluster size small, we'll see more unconventional recommendations. But increasing our minimum cluster size will probably give more conventionally popular recommendations.

Right now, we set the minimum to 3 and the maximum to 15, so the algorithm won't define a cluster unless it has at least 3 users, and it will use the 15 closest users (at most) to make rating predictions. The values we've chosen are considered reasonable defaults, and the "best" values depend on what people want from the recommender system AI. Do they want to be surprised by recommendations they wouldn't otherwise know about? Or are they looking for a more confident expression of quality?

**Step 4.1**

In [10]:
from lenskit.algorithms import Recommender
from lenskit.algorithms.user_knn import UserUser

num_recs = 10  #<---- This is the number of recommendations to generate. You can change this if you want to see more recommendations

user_user = UserUser(15, min_nbrs=3) #These two numbers set the minimum (3) and maximum (15) number of neighbors to consider. These are considered "reasonable defaults," but you can experiment with others too
algo = Recommender.adapt(user_user)
algo.fit(data.ratings)

print("Set up a User-User algorithm!")

Set up a User-User algorithm!


Now that the system has defined clusters, we can give it our personal ratings to get the top 10 recommended movies for me and for John-Green-bot!

For each of us, the User-User algorithm will find a neighborhood of users similar to us based on their movie ratings. It will look at movies that these similar users have rated that we haven't seen yet. Based on their ratings, it will predict how we may rate that movie if we watched it. Finally, it will order these predictions and print them in descending order to give our "top 10."
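
The idea in the paragraph above can be sketched in a few lines of pandas. This is a simplified illustration, not LensKit's actual implementation: it assumes cosine similarity over mean-centered ratings and a similarity-weighted average of neighbors' offsets. The function name `predict_user_user` and the toy ratings are hypothetical.

```python
import numpy as np
import pandas as pd

def predict_user_user(ratings, new_ratings, item, max_nbrs=15, min_nbrs=3):
    """Sketch: predict a new user's rating for `item` from similar existing users.

    ratings: DataFrame with columns user, item, rating.
    new_ratings: Series mapping item ID -> rating for the new user.
    Returns NaN when fewer than min_nbrs usable neighbors rated the item.
    """
    grid = ratings.pivot(index="user", columns="item", values="rating")
    if item not in grid.columns:
        return np.nan

    # Mean-center each user's ratings so generous and harsh raters are comparable.
    centered = grid.sub(grid.mean(axis=1), axis=0)
    new_mean = new_ratings.mean()
    new_centered = (new_ratings - new_mean).reindex(grid.columns)

    # Cosine similarity, treating missing ratings as 0 after centering.
    a = centered.fillna(0.0).to_numpy()
    b = new_centered.fillna(0.0).to_numpy()
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b)
    sims = pd.Series(
        np.divide(a @ b, norms, out=np.zeros(len(a)), where=norms > 0),
        index=grid.index,
    )

    # Keep positively similar users who rated the item, then take the closest max_nbrs.
    neighbors = sims[(sims > 0) & grid[item].notna()].nlargest(max_nbrs)
    if len(neighbors) < min_nbrs:
        return np.nan

    # Similarity-weighted average of how far above or below their own mean neighbors rated it.
    offsets = centered.loc[neighbors.index, item]
    return new_mean + (neighbors * offsets).sum() / neighbors.sum()

# Made-up ratings: users 1 and 2 share the new user's taste, user 3 has the opposite taste.
toy = pd.DataFrame({
    "user":   [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "item":   [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "rating": [5.0, 1.0, 5.0, 4.0, 2.0, 4.0, 1.0, 5.0, 1.0],
})
me = pd.Series({1: 5.0, 2: 1.0})

pred_loose = predict_user_user(toy, me, item=3, min_nbrs=2)
pred_default = predict_user_user(toy, me, item=3)  # min_nbrs=3: too few neighbors here
print(pred_loose, pred_default)
```

With only two like-minded users in the toy data, the default minimum of 3 neighbors yields no prediction, which mirrors how a stricter minimum trades coverage for confidence.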

**Step 4.2**

In [11]:
jabril_recs = algo.recommend(-1, num_recs, ratings=pd.Series(jabril_rating_dict))  #Here, -1 tells it that this is not an existing user in the set and that we're giving new ratings, while num_recs (10) is how many recommendations it should generate

joined_data = jabril_recs.join(data.movies['genres'], on='item')
joined_data = joined_data.join(data.movies['title'], on='item')
joined_data = joined_data[joined_data.columns[2:]]
print("\n\nRECOMMENDED FOR JABRIL:")
joined_data



RECOMMENDED FOR JABRIL:


Unnamed: 0,genres,title
0,Comedy|Drama,"Last Detail, The (1973)"
1,Comedy,Love and Death (1975)
2,Drama,Before Night Falls (2000)
3,Drama,"Magdalene Sisters, The (2002)"
4,Drama|Horror|Mystery|Sci-Fi|Thriller,Black Mirror: White Christmas (2014)
5,Action|Animation|Drama|Fantasy|Sci-Fi,Neon Genesis Evangelion: The End of Evangelion...
6,Action|Adventure|Thriller,Raiders of the Lost Ark: The Adaptation (1989)
7,Comedy|Drama|Romance,Submarine (2010)
8,Adventure|Drama,Nebraska (2013)
9,Documentary,"Endless Summer, The (1966)"


In [12]:
jgb_recs = algo.recommend(-1, num_recs, ratings=pd.Series(jgb_rating_dict))  #Here, -1 tells it that this is not an existing user in the set and that we're giving new ratings, while num_recs (10) is how many recommendations it should generate

joined_data = jgb_recs.join(data.movies['genres'], on='item')
joined_data = joined_data.join(data.movies['title'], on='item')
joined_data = joined_data[joined_data.columns[2:]]
print("RECOMMENDED FOR JOHN-GREEN-BOT:")
joined_data

RECOMMENDED FOR JOHN-GREEN-BOT:


Unnamed: 0,genres,title
0,Comedy,The Night Before (2015)
1,Adventure|Drama|Sci-Fi,"Day of the Doctor, The (2013)"
2,Drama|Fantasy|Romance,Wristcutters: A Love Story (2006)
3,Comedy|Musical,Holiday Inn (1942)
4,Comedy,Outside Providence (1999)
5,Comedy|Romance,Adam's Rib (1949)
6,Drama,Reign Over Me (2007)
7,Drama,Guess Who's Coming to Dinner (1967)
8,Drama,Half Nelson (2006)
9,Comedy,Fired Up (2009)


Now, we have "top 10" lists of movies for both John-Green-bot and myself! Each list only includes movies that its owner hasn't watched before (or at least didn't rate in our personal ratings). These lists include both popular movies and more obscure ones.

That concludes Step 4 of getting personalized recommendations, but our lists don't overlap at all, so we still haven't found a movie for both of us to watch.

***STEP 5***

That brings us to Step 5, making a combined movie recommendation list. Because rating preferences are stored as numbers, we can create a Jabril/John-Green-bot hybrid!

We'll do this by creating a combined dictionary of ratings. If both of us have rated a movie, it will average our ratings. If only one of us has rated a movie, it will just add that movie to the list of preferences. This isn't a perfect strategy; it's possible that I would have hated some movie that I've never seen but John-Green-bot rated highly. But we should get a reasonable estimate across both of our datasets.

We'll also do a quick sanity check by looking at _The Princess Bride_ again. I rated it as a 4.5 (because it's awesome!!) and John-Green-bot rated it as a 3.5, so we'd expect our combined list would have it as a 4.
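
The same averaging rule can also be written with pandas, as a compact alternative to looping over dictionaries. In the sketch below, only the two ratings for item 1197 (_The Princess Bride_) come from this lab; the other entries are made up so the block runs on its own.

```python
import pandas as pd

# Item 1197 uses the ratings quoted above; the ratings for 260 and 912 are invented.
jabril = pd.Series({1197: 4.5, 260: 5.0})
jgb = pd.Series({1197: 3.5, 912: 4.0})

# Placing the two Series side by side aligns them on item ID (missing entries become NaN).
# mean(axis=1) skips NaN, so a movie rated by only one of us keeps that single rating.
combined = pd.concat([jabril, jgb], axis=1).mean(axis=1)
print(combined)
```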

**Step 5.1**

In [13]:
combined_rating_dict = {}
for k in jabril_rating_dict:
  if k in jgb_rating_dict:
    combined_rating_dict.update({k: float((jabril_rating_dict[k]+jgb_rating_dict[k])/2)})
  else:
    combined_rating_dict.update({k:jabril_rating_dict[k]})
for k in jgb_rating_dict:
   if k not in combined_rating_dict:
      combined_rating_dict.update({k:jgb_rating_dict[k]})

print("Combined ratings dictionary assembled!")
print("Sanity check:")
print("\tCombined rating for 1197 (The Princess Bride) is " + str(combined_rating_dict[1197]))

Combined ratings dictionary assembled!
Sanity check:
	Combined rating for 1197 (The Princess Bride) is 4.0


Looks like everything checks out. So now, we have a combined dictionary that we can plug right into our User-User model to output a ranked list of new movies that we should both enjoy!

**Step 5.2**

In [14]:
combined_recs = algo.recommend(-1, num_recs, ratings=pd.Series(combined_rating_dict))  #Here, -1 tells it that this is not an existing user in the set and that we're giving new ratings, while num_recs (10) is how many recommendations it should generate

joined_data = combined_recs.join(data.movies['genres'], on='item')
joined_data = joined_data.join(data.movies['title'], on='item')
joined_data = joined_data[joined_data.columns[2:]]
print("\n\nRECOMMENDED FOR JABRIL / JOHN-GREEN-BOT HYBRID:")
joined_data



RECOMMENDED FOR JABRIL / JOHN-GREEN-BOT HYBRID:


Unnamed: 0,genres,title
0,Comedy|Drama|Romance,Submarine (2010)
1,Drama|Romance,Call Me by Your Name (2017)
2,Drama|Sci-Fi,"Man Who Fell to Earth, The (1976)"
3,Comedy|Romance,Adam's Rib (1949)
4,Drama|War,Gallipoli (1981)
5,Drama,Before Night Falls (2000)
6,Adventure|Drama|Sci-Fi,"Day of the Doctor, The (2013)"
7,Action|Adventure|Thriller,Raiders of the Lost Ark: The Adaptation (1989)
8,Adventure|Drama|Western,True Grit (1969)
9,Comedy,Love and Death (1975)


The number one recommendation is _[Submarine](https://www.imdb.com/title/tt1440292/)_, which is a quirky movie from 2010. If this is too obscure, we could pick a different recommendation from this list, like _[True Grit](https://www.imdb.com/title/tt1403865/)_.

We could also go back to Step 4.1 and set different parameters. Setting the minimum and maximum number of neighbors to make bigger clusters (for example, a minimum of 10 and a maximum of 50) would probably yield a more well-known set of movies, but it would also be less tailored to our individual interests. The trade-off between unconventional and popular results is what really characterizes recommender systems!