# Project 4: Movie Recommender System

CS 598 Practical Statistical Learning

2023-12-10

UIUC Fall 2023

**Authors**
* Ryan Fogle
    - rsfogle2@illinois.edu
    - UIN: 652628818
* Sean Enright
    - seanre2@illinois.edu
    - UIN: 661791377

**Contributions**

* Ryan: Initial System I implementation. Edge-case handling for System II. Initial web application deployment. Testing web app, System I and System II.
* Sean: Initial System II implementation. Review and refactoring. Adding initial suggestions and input to web app. Testing web app, System I and System II.

## Introduction

In this project, we describe our implementation of a recommender system to provide movie recommendations using the [MovieLens 1M Dataset](https://grouplens.org/datasets/movielens/1m/).

Our project consists of two components:

**System I: Recommender by Genre**
Given an input genre, we recommend the top movies within the selected genre.

**System II: Recommender by Rating**
This system accepts a set of movie ratings and outputs a list of recommended movies, ordered by the predicted rating.

These two systems are described in greater detail below.

Our implementation uses Python.

## Loading MovieLens Data

We build DataFrames using `ratings.dat`, `movies.dat` and `users.dat` from the dataset. These three files contain the information needed to link movies to their titles and review data.

In [21]:
import pandas as pd
from pathlib import Path

In [22]:
movie_dir = Path('.') / 'ml-1m' / 'ml-1m' 
ratings = pd.read_csv(movie_dir / 'ratings.dat', sep='::',
                      engine = 'python', header=None)
ratings.columns = ['UserID', 'MovieID', 'Rating', 'Timestamp']
movies = pd.read_csv(movie_dir / 'movies.dat', sep='::', engine = 'python',
                     encoding="ISO-8859-1", header = None)
movies.columns = ['MovieID', 'Title', 'Genres']
users = pd.read_csv(movie_dir / 'users.dat', sep='::',
                    engine='python', header=None)
users.columns = ['UserID', 'Gender', 'Age', 'Occupation', 'Zipcode']

# Create new entry for each genre in a movie, join movie and ratings together. 
movies['Genres'] = movies['Genres'].str.split('|')
df = movies.merge(ratings, left_on="MovieID", right_on="MovieID")
df = df.explode('Genres')
df.rename(columns={'Genres': 'Genre'}, inplace=True)
df

Index,MovieID,Title,Genre,UserID,Rating,Timestamp
0,1,Toy Story (1995),Animation,1,5,978824268
0,1,Toy Story (1995),Children's,1,5,978824268
0,1,Toy Story (1995),Comedy,1,5,978824268
1,1,Toy Story (1995),Animation,6,4,978237008
1,1,Toy Story (1995),Children's,6,4,978237008
...,...,...,...,...,...,...
1000206,3952,"Contender, The (2000)",Thriller,5837,4,1011902656
1000207,3952,"Contender, The (2000)",Drama,5927,1,979852537
1000207,3952,"Contender, The (2000)",Thriller,5927,1,979852537
1000208,3952,"Contender, The (2000)",Drama,5998,4,1001781044


Create a map to look up movie titles from their `MovieID` index.

In [23]:
mov_title_map = dict(zip(movies['MovieID'], movies['Title']))
list(mov_title_map.items())[:5]

[(1, 'Toy Story (1995)'),
 (2, 'Jumanji (1995)'),
 (3, 'Grumpier Old Men (1995)'),
 (4, 'Waiting to Exhale (1995)'),
 (5, 'Father of the Bride Part II (1995)')]

## System I: Recommendation Based on Genres

General idea: Use Bayesian probability to calculate new ratings based upon additional prior assumptions, and then rank movies by this new rating to select the top 10 per genre.

This algorithm is based upon [How to Balance Number of Ratings Versus the Ratings Themselves](https://stackoverflow.com/questions/2495509/how-to-balance-number-of-ratings-versus-the-ratings-themselves).

$$
\tilde{R} = \frac{\bar{w} \bar{r} + \sum_{i=1}^{n}{r_i}}{\bar{w} + n}
$$

Where:
- $\tilde{R}$ is the new rating
- $\bar{w}$ is the predefined number of ratings (weight) to include in our prior assumption
- $\bar{r}$ is the predefined average rating to include in our prior assumption
- $n$ is the number of ratings
- $r_i$ is the rating for a given entry.

**Interpretation**:

Consider $\bar{r}$ to be a typical rating for a movie in a given genre, and $\bar{w}$ to be the number of times to count that rating, i.e., the weight given to the prior. We first assume the rating of a movie to be $\frac{\bar{w} \bar{r}}{\bar{w}} = \bar{r}$ when $n=0$, then update the estimate with each new rating.

In our implementation, we define $\bar{r}$ to be the genre's median rating, taken over the average ratings of the movies in that genre. We define $\bar{w}$ to be the genre's 25th percentile of the number of ratings per movie.
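As a minimal sketch of the formula above, the function below shrinks a movie's mean rating toward the prior. The function name and the numbers are hypothetical and chosen only for illustration; `prior_mean` and `prior_weight` play the roles of $\bar{r}$ and $\bar{w}$.

```python
def bayesian_weighted_rating(ratings, prior_mean, prior_weight):
    """Shrink a movie's mean rating toward a prior mean.

    prior_mean and prior_weight correspond to r-bar and w-bar in the formula.
    """
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)


# Hypothetical numbers, not taken from the dataset
few = bayesian_weighted_rating([5, 5], prior_mean=3.0, prior_weight=10)
many = bayesian_weighted_rating([5] * 200, prior_mean=3.0, prior_weight=10)
print(round(few, 3), round(many, 3))
```

With only two 5-star ratings the estimate stays close to the prior mean, while 200 such ratings pull it close to 5. This is the behavior we want when ranking movies with very different numbers of ratings.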

In [24]:
# Group by Genre and Movie, this will be used to find median ratings
# and percentile counts for our prior
gb = df.groupby(['Genre', 'MovieID'])

# Median Genre ratings of Average Movie ratings
median_ratings = gb['Rating'].mean().reset_index()
median_ratings = median_ratings.groupby('Genre')['Rating'].median()
median_ratings = median_ratings.reset_index()
median_ratings = dict(zip(median_ratings['Genre'], median_ratings['Rating']))
median_ratings

{'Action': 3.1510791366906474,
 'Adventure': 3.1986301369863015,
 'Animation': 3.483720930232558,
 "Children's": 3.0820630318900215,
 'Comedy': 3.202020202020202,
 'Crime': 3.388888888888889,
 'Documentary': 3.802223154362416,
 'Drama': 3.5,
 'Fantasy': 3.2377450980392157,
 'Film-Noir': 3.927212522576761,
 'Horror': 2.7450980392156863,
 'Musical': 3.5777233782129745,
 'Mystery': 3.404844933503611,
 'Romance': 3.3846153846153846,
 'Sci-Fi': 3.165253793311199,
 'Thriller': 3.2687585266030013,
 'War': 3.733826247689464,
 'Western': 3.5236907730673317}

In [25]:
# Grab 25th Percentile of count by genre
quantile_count = gb['Timestamp'].count().reset_index()
quantile_count = quantile_count.groupby('Genre')['Timestamp'].quantile(0.25)
quantile_count = quantile_count.reset_index()
quantile_count.columns = ['Genre', 'Count']
quantile_count = dict(zip(quantile_count['Genre'], quantile_count['Count']))
quantile_count

{'Action': 126.0,
 'Adventure': 91.0,
 'Animation': 129.0,
 "Children's": 65.5,
 'Comedy': 45.0,
 'Crime': 42.0,
 'Documentary': 5.0,
 'Drama': 25.0,
 'Fantasy': 154.5,
 'Film-Noir': 32.0,
 'Horror': 36.0,
 'Musical': 112.0,
 'Mystery': 114.75,
 'Romance': 43.5,
 'Sci-Fi': 136.0,
 'Thriller': 79.0,
 'War': 82.0,
 'Western': 46.0}

Run the algorithm, computing the weighted rating of each movie within each genre.

In [26]:
weighted_ratings = []
for (genre, movie_id), movie in gb:
    n = movie.shape[0]
    w = quantile_count[genre]
    r = median_ratings[genre] 
    weighted_rating = (r * w + movie['Rating'].sum()) / (w + n)
    weighted_ratings.append(
        (genre, movie_id, mov_title_map[movie_id],
         weighted_rating, movie['Rating'].sum() / n, n))

weighted_recs = pd.DataFrame(
    weighted_ratings,
    columns=['Genre', 'MovieID', 'Title', 'WeightedRating',
             'AverageRating', '# of Ratings'])
weighted_recs

Index,Genre,MovieID,Title,WeightedRating,AverageRating,# of Ratings
0,Action,6,Heat (1995),3.7927167,3.8787234,940
1,Action,9,Sudden Death (1995),2.9299823,2.6568627,102
2,Action,10,GoldenEye (1995),3.4921459,3.5405405,888
3,Action,15,Cutthroat Island (1995),2.7795440,2.4589041,146
4,Action,20,Money Train (1995),2.8078181,2.5375000,160
...,...,...,...,...,...,...
6187,Western,3737,Lonely Are the Brave (1962),3.7717685,4.0000000,50
6188,Western,3792,Duel in the Sun (1946),3.5431337,3.5555556,72
6189,Western,3806,MacKenna's Gold (1969),3.3758632,3.2586207,58
6190,Western,3871,Shane (1953),3.7979766,3.8393443,305


Sort ratings by `WeightedRating`, group by genre and grab the first ten occurrences.

In [27]:
sysI_recs = weighted_recs.sort_values(
    'WeightedRating', ascending=False).groupby('Genre').head(n=10)
sysI_recs = sysI_recs.sort_values(
    ['Genre', 'WeightedRating'], ascending=[True, False])
sysI_recs

Index,Genre,MovieID,Title,WeightedRating,AverageRating,# of Ratings
111,Action,858,"Godfather, The (1972)",4.4512712,4.5249663,2223
137,Action,1198,Raiders of the Lost Ark (1981),4.4144076,4.4777247,2514
31,Action,260,Star Wars: Episode IV - A New Hope (1977),4.4010382,4.4536944,2991
258,Action,2019,Seven Samurai (The Magnificent Seven) (Shichin...,4.3249814,4.5605096,628
260,Action,2028,Saving Private Ryan (1998),4.2835682,4.3373539,2653
...,...,...,...,...,...,...
6184,Western,3671,Blazing Saddles (1974),4.0266865,4.0473637,1119
6167,Western,3037,Little Big Man (1970),4.0064204,4.0439932,591
6136,Western,599,"Wild Bunch, The (1969)",4.0035154,4.0871212,264
6173,Western,3365,"Searchers, The (1956)",3.9968721,4.0857143,245


Write the System I recommendations to file for the dashboard to use.

In [28]:
sysI_recs.to_csv('sysI_recs.csv', index=False)

## System II: Recommendation Based on IBCF

### Overview

The recommendation system we have implemented follows these main steps:
1) Collect the user and rating data for all considered movies.
2) Calculate the centered cosine similarity between all pairs of movies (items) for the provided user rating data.
3) Use Item-based Collaborative Filtering to predict the ratings of unrated movies.
4) Suggest the movies with the highest predicted ratings.

### Similarity Matrix Construction

To construct the similarity matrix, we require user ratings for various items. The input rating matrix is $R_{a \times i}$, where $a$ is the number of users who have reviewed one or more movies, and $i$ is the number of movies.

In the case of our dataset, there are 6040 users and 3706 movies, so $R$ is of shape $6040 \times 3706$.

In [29]:
user_mov_df = pd.read_csv('Rmat.csv')
user_mov_df.shape

(6040, 3706)

### Normalization of Ratings Matrix
We normalize the rating matrix by subtracting the row means from each row, ignoring `NA` entries. This addresses the variation in each user's average rating.

In [30]:
user_mov_df_norm = user_mov_df.sub(user_mov_df.mean(axis=1, skipna=True), axis=0)
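To illustrate the centering step, here is a small sketch on a toy matrix. The values and the names `toy` and `toy_norm` are hypothetical and are not part of the project data.

```python
import numpy as np
import pandas as pd

# Toy ratings matrix (hypothetical values): rows are users, columns are movies
toy = pd.DataFrame(
    [[5.0, 3.0, np.nan],
     [np.nan, 2.0, 4.0]],
    index=["u1", "u2"], columns=["m1", "m2", "m3"])

# Subtract each user's mean rating; NaN entries are skipped when averaging
toy_norm = toy.sub(toy.mean(axis=1, skipna=True), axis=0)
print(toy_norm)
```

After centering, each user's observed ratings average to zero and the missing entries remain `NaN`.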


### Cosine Similarity

We seek to compute the similarity between movies (items). We select centered cosine similarity as our measure of similarity. Having normalized our ratings matrix by each user's average rating, the next step is computation of similarity.

Cosine similarity is defined generally as

$$
\frac{(u - \bar{u})^T (v - \bar{v})}
     {\lVert u - \bar{u} \rVert \cdot \lVert v - \bar{v} \rVert}
$$
where $u$ and $v$ are vectors.

In our case, these vectors represent user ratings for a given movie. The above similarity value lies in the range $[-1, 1]$, but we prefer a range of $[0, 1]$, so we perform a transformation on the similarity accordingly. For each pair of movies $i$ and $j$, the similarity value $S_{ij}$ is

$$
S_{ij} =
\frac{1}{2} +
\frac{1}{2}
     \frac{\sum_{l \in \mathcal{I}_{ij}} R_{li} R_{lj}}
          {\sqrt{\sum_{l \in \mathcal{I}_{ij}} R_{li}^2} \,
           \sqrt{\sum_{l \in \mathcal{I}_{ij}} R_{lj}^2}}
$$

where $\mathcal{I}_{ij}$ is the set of users who have both reviewed movies $i$ and $j$, and $R$ is the ratings matrix defined above.

Additionally, we exclude the similarity value for any pair of movies that fewer than three users have both rated, i.e., pairs with cardinality $|\mathcal{I}_{ij}| < 3$. These entries are left as `NA`.

Since the resulting similarity matrix $S$ is symmetric, we only compute the upper half, and then fill in the lower half by transposing it.
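The formula for $S_{ij}$ can be checked by hand on a single pair of movies. The helper below is a sketch with a hypothetical name and toy inputs; it assumes the two vectors are columns of an already centered ratings matrix.

```python
import numpy as np


def pair_similarity(r_i, r_j, min_cardinality=3):
    """Transformed cosine similarity for two columns of a centered ratings matrix."""
    # Keep only users who rated both movies
    both = ~np.isnan(r_i) & ~np.isnan(r_j)
    if both.sum() < min_cardinality:
        return np.nan
    a, b = r_i[both], r_j[both]
    cos = np.dot(a, b) / (np.sqrt((a ** 2).sum()) * np.sqrt((b ** 2).sum()))
    # Map the cosine from [-1, 1] to [0, 1]
    return 0.5 + 0.5 * cos


# Hypothetical centered ratings from five users for two movies
m_a = np.array([1.0, -1.0, 0.5, np.nan, 2.0])
m_b = np.array([2.0, -2.0, 1.0, 1.0, np.nan])
sim_same_direction = pair_similarity(m_a, m_b)
sim_opposite = pair_similarity(m_a, -m_b)
sim_too_few = pair_similarity(m_a, m_b, min_cardinality=4)
```

Only the first three users rated both movies. Their ratings point in the same direction, so the similarity is at the top of the range, negating one vector puts it at the bottom, and raising the cardinality threshold to 4 yields `NaN`.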

Our implementation of the centered cosine similarity between items follows.

In [31]:
import numpy as np
from tqdm import tqdm

def cosine_similarity(x, min_cardinality=3):
    """Compute the cosine similarity between the movies of the input ratings
       matrix, producing a symmetric similarity matrix.

    Args:
        x (np.ndarray): Ratings matrix, of shape (# users, # movies)
        min_cardinality (int, optional): Minimum cardinality of rating.
                                         Defaults to 3.

    Returns:
        np.ndarray: Symmetric similarity matrix
    """
    # Prepare symmetric result matrix
    s = np.empty((x.shape[1], x.shape[1]))
    s[:] = np.nan

    # Calculate similarity for upper triangular matrix
    for i in tqdm(range(0, x.shape[1] - 1)):
        i_valid = ~np.isnan(x[:, i])
        for j in range(i + 1, x.shape[1]):
            j_valid = ~np.isnan(x[:, j])
            row_mask = np.logical_and(i_valid, j_valid)
            if row_mask.sum() >= min_cardinality:
                r_li = x[row_mask, i]
                r_lj = x[row_mask, j]
                s[i, j] = (np.dot(r_li, r_lj)
                           / (np.sqrt(np.power(r_li, 2).sum()) 
                              * np.sqrt(np.power(r_lj, 2).sum())))
    s = 0.5 + s / 2

    # Transpose upper triangular matrix to form lower
    lower_idx = np.tril_indices(x.shape[1])
    s[lower_idx] = s.T[lower_idx]
    return s

We apply this function to our centered ratings matrix, producing a symmetric similarity matrix $S_{i \times i}$.

We extract and re-wrap the column indices to retain the movie IDs.

In [32]:
min_cardinality = 3

s = cosine_similarity(user_mov_df_norm.to_numpy(),
                      min_cardinality=min_cardinality)
s = pd.DataFrame(data=s,
                 index=user_mov_df_norm.columns,
                 columns=user_mov_df_norm.columns)

100%|██████████| 3705/3705 [04:07<00:00, 14.99it/s] 


#### Validation of Similarity Matrix Before Filtering

In order to validate our similarity matrix and our implementation of centered cosine similarity, we show the pairwise similarity values from the $S$ matrix for the following specified movies:

```m1, m10, m100, m1510, m260, m3212```

We are validating our results against the values in [Campuswire post #861](https://campuswire.com/c/G06C55090/feed/861)

In [33]:
pd.set_option("display.precision", 7)
specified_movies = ["m1", "m10", "m100", "m1510", "m260", "m3212"]
s.loc[specified_movies, specified_movies]

MovieID,m1,m10,m100,m1510,m260,m3212
m1,,0.5121055,0.3919999,,0.7411482,
m10,0.5121055,,0.5474583,,0.5343338,
m100,0.3919999,0.5474583,,,0.3296943,
m1510,,,,,,
m260,0.7411482,0.5343338,0.3296943,,,
m3212,,,,,,


### Filtering by Most Similar Movies

Next, for each movie, we determine the 30 most similar movies and set all other movies to NA. This allows for a more compact $S$ matrix. For movies that have fewer than 30 similar movies, all available similar movies (i.e., non-`NA`) are kept.
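The rule above can be sketched on a single toy row. The helper name `keep_top_k` and the values are hypothetical. Unlike the notebook cell, this sketch breaks ties at the cutoff arbitrarily rather than keeping every value equal to the threshold.

```python
import numpy as np


def keep_top_k(row, k):
    """Return a copy of row with all but the k largest non-NaN values set to NaN."""
    out = row.astype(float)
    valid = np.flatnonzero(~np.isnan(out))
    if valid.size > k:
        # Indices of valid entries ordered from largest to smallest similarity
        order = valid[np.argsort(out[valid])[::-1]]
        out[order[k:]] = np.nan
    return out


toy_row = np.array([0.9, np.nan, 0.4, 0.7, 0.6])
filtered = keep_top_k(toy_row, k=2)
print(filtered)
```

When a row has fewer than `k` valid similarities, it is returned unchanged, which matches the handling of movies with fewer than 30 similar movies.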

In [34]:
max_similar = 30

for i in range(s.shape[0]):
    row = s.iloc[i, :]
    num_selected = min([(~np.isnan(row)).sum(), max_similar, len(row)])
    # Find max allowed similarity with NaN vals
    max_sim = np.roll(np.sort(row)[::-1],
                      -np.count_nonzero(np.isnan(row)))[num_selected - 1]
    na_mask = row < max_sim
    s.iloc[i, na_mask] = np.nan

This filtered similarity matrix is written to file as `similarity.csv`.

In [35]:
s.to_csv("similarity.csv")

### Item-based Collaborative Filtering

In Item-based Collaborative Filtering (IBCF), given a single set of ratings for a user or a hypothetical user, we seek to predict the ratings of unrated movies based on that user's ratings of similar movies. The similarities between movies are inferred from the ratings of all users, as described above.

#### Implementation of IBCF

**IBCF Calculation**

For each unrated movie $l$, we compute a predicted rating. It is the inner product of the input rating vector and the similarity vector for that movie, taken over the movies present in both, and normalized by the sum of the similarity scores for these rated movies. It is computed as follows:

 $$
 \frac
    {1}
    {\sum_{i \in S(l)} S_{li} \textbf{1}_{w_i \neq NA}}
 \sum_{i \in S(l)} S_{li} w_i
 $$

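where $w_i$ is the user's rating of movie $i$, and $S(l)$ is the set of movies with a non-`NA` similarity to movie $l$ in the filtered similarity matrix. Terms with $w_i = NA$ are omitted from both sums.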

Movies with no overlap between the input rating vector and the similarity vector for that movie are given an `NA` rating.
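As a sketch of this calculation for a single movie $l$, the helper below takes one row of the similarity matrix and one rating vector. The name `predict_rating` and the toy values are hypothetical.

```python
import numpy as np


def predict_rating(s_l, w):
    """Predict a rating for one movie from its similarity row s_l and user ratings w."""
    # Use only movies that are both similar to l and rated by the user
    mask = ~np.isnan(s_l) & ~np.isnan(w)
    denom = s_l[mask].sum()
    if denom == 0:
        return np.nan
    return np.dot(s_l[mask], w[mask]) / denom


# Hypothetical similarity row for movie l and a user's ratings
s_l = np.array([0.8, np.nan, 0.6, 0.9])
w = np.array([5.0, 4.0, np.nan, 3.0])
pred = predict_rating(s_l, w)

# No movie is both similar to l and rated by the user
no_overlap = predict_rating(np.array([np.nan, 0.7]), np.array([4.0, np.nan]))
```

Here only the first and last movies contribute, so the prediction is the similarity-weighted average $(0.8 \cdot 5 + 0.9 \cdot 3) / (0.8 + 0.9)$. The second call has no overlap and returns `NaN`.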

**Results, Ordering and Special Cases**

Having computed the predicted ratings for all movies, we return the `MovieID`s for the 10 movies with the highest predicted rating.

In the case of ties in predicted rating, movies are recommended by rating in descending order. We use the `WeightedRating` column from the System I implementation for our definition of highest-rated movies. If further ties occur, the movie with the lowest `MovieID` is taken.

If fewer than 10 recommendations are calculated, we fill the missing recommendations with the highest-rated movies in the user's most-watched genre, i.e., the genre that appears most often among the movies the user has rated. In the case of a tie in most-watched genre, the genre with the lowest name in lexicographic order is taken.

Both cases ensure that the recommended movies are unrated by the user.
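The genre tie-break rule stated above can be sketched as follows. This is an illustration of the rule only, with a hypothetical helper name and toy input; it is not the code used in `myIBCF`, which relies on `np.unique` and `np.argsort`.

```python
from collections import Counter


def most_rated_genre(genre_lists):
    """Return the most frequent genre; ties go to the lexicographically smallest name."""
    counts = Counter(g for genres in genre_lists for g in genres)
    if not counts:
        return None
    # Sort key: highest count first, then alphabetical order
    return min(counts, key=lambda g: (-counts[g], g))


# Hypothetical genre lists for three rated movies
rated = [["Drama", "Thriller"], ["Action", "Thriller"], ["Action", "Drama"]]
top_genre = most_rated_genre(rated)
print(top_genre)
```

All three genres appear twice in this toy input, so the alphabetical rule decides the result.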

In [36]:
def myIBCF(s, newuser, mov_rate_genre, genre_top_recs, num_recs=10):
    """Use item-based collaborative filtering to generate predicted ratings
       for unrated movies.

    Args:
        s (pd.DataFrame): Filtered similarity matrix
        newuser (pd.Series): A list of ratings for each MovieID
        mov_rate_genre (pd.DataFrame): For each movie, the title, weighted
            rating, # of ratings, and genres.
        genre_top_recs (pd.DataFrame): The top 10 recommended movies per genre
        num_recs (int, optional): # of recommendations to give. Defaults to 10.

    Returns:
        pd.DataFrame: The recommended movies
    """

    recs = newuser.copy(deep=True).rename("PredictedRating")
    recs.iloc[:] = np.nan
    
    i_in_w = ~np.isnan(newuser)
    # Compute predicted rating for all non-rated movies
    for l in np.arange(newuser.shape[0])[np.isnan(newuser)]:
        s_li = s.iloc[l, :]
        i_in_sl = ~np.isnan(s_li)
        col_mask = np.logical_and(i_in_sl, i_in_w)
        if s_li[col_mask].sum() == 0:
            continue
        recs.iloc[l] = (
            1 / (s_li[col_mask].sum())
            * np.dot(s_li[col_mask], newuser[col_mask])
        )
    recs = recs[~np.isnan(recs)] 
    
    movie_recs = mov_rate_genre.join(recs, how="inner")
    movie_recs.sort_values(by=["PredictedRating", "WeightedRating"],
                           axis=0, ascending=False, inplace=True)

    if movie_recs.shape[0] >= num_recs:
        movie_recs = movie_recs.iloc[:num_recs, :]
        rec_movie_ids = movie_recs.index.tolist()
    else:
        # Begin with all available recommendations
        rec_movie_ids = movie_recs.index.tolist()
        
        # Add remaining recommendations based on most rated genre
        addl_recs = num_recs - movie_recs.shape[0]
        
        # Identify most-rated genre
        rated_genres = mov_rate_genre["Genres"][~np.isnan(newuser)]
        genre_tup = np.unique(np.concatenate(rated_genres.values),
                              return_counts=True)
        most_watched_genre = genre_tup[0][np.argsort(genre_tup[1])[-1]]
        
        # Identify highest-rated movies in this genre
        # Copy the slice so the caller's DataFrame is not modified
        genre_recs = genre_top_recs[
            genre_top_recs["Genre"] == most_watched_genre].copy()
        
        # Check that top movies in genre are unrated
        genre_recs["MovieID"] = "m" + genre_recs["MovieID"].astype(str)
        rated_ids = set(newuser[i_in_w].index)
        unwatched = [m not in rated_ids
                     for m in genre_recs["MovieID"].tolist()]
        genre_recs = genre_recs[unwatched]["MovieID"][:addl_recs].tolist()
        
        rec_movie_ids += genre_recs

    return rec_movie_ids
        

Our implementation requires weighted rating data for each movie, as well as the genre(s) of each movie. This is computed in advance and stored in `movie_ratings_genre.csv`.

In [37]:
# Link each MovieID to its rating
mov_rate_genre = weighted_recs[
    ["MovieID", "WeightedRating", "# of Ratings"]
    ].groupby("MovieID").mean()
mov_rate_genre["# of Ratings"] = mov_rate_genre["# of Ratings"].astype(int)

# Link movie to title
title_df = (weighted_recs[["MovieID", "Title"]]
            .groupby("MovieID").agg(lambda x: np.unique(x)[0]))
mov_rate_genre = mov_rate_genre.join(title_df)

# Link each MovieID to its genre(s)
mov_rate_genre["Genres"] = (weighted_recs[["MovieID", "Genre"]]
                                .groupby("MovieID")["Genre"]
                                .apply(list))
mov_rate_genre.index = "m" + mov_rate_genre.index.astype(str)
mov_rate_genre.sort_values(by="MovieID", inplace=True)

# Write to file
mov_rate_genre.to_csv("movie_ratings_genre.csv")

# Read from file to simulate app
mov_rate_genre = pd.read_csv("movie_ratings_genre.csv", index_col=0,
                             converters={"Genres": pd.eval})
mov_rate_genre

MovieID,WeightedRating,# of Ratings,Title,Genres
m1,4.1163910,2077,Toy Story (1995),"[Animation, Children's, Comedy]"
m10,3.5064141,888,GoldenEye (1995),"[Action, Adventure, Thriller]"
m100,3.1376020,128,City Hall (1996),"[Drama, Thriller]"
m1000,3.2795699,20,Curdled (1996),[Crime]
m1002,3.3602058,8,Ed's Next Move (1996),[Comedy]
...,...,...,...,...
m994,4.0642105,450,Big Night (1996),[Drama]
m996,2.9821257,256,Last Man Standing (1996),"[Action, Drama, Western]"
m997,3.3582077,28,Caught (1996),"[Drama, Thriller]"
m998,3.1099418,93,Set It Off (1996),"[Action, Crime]"


#### Validation of `myIBCF`

To validate our implementation of `myIBCF`, we show the top 10 recommendations for:
* User "u1181" from rating matrix $R$
* User "u1351" from rating matrix $R$
* A hypothetical user who rates movie “m1613” with 5 and movie “m1755” with 4

In [38]:
hypothetical_user = user_mov_df.iloc[0, :].copy(deep=True)
hypothetical_user.iloc[:] = np.nan
hypothetical_user.loc[["m1613", "m1755"]] = [5, 4]

test_users = [
    ("User u1181", user_mov_df.loc["u1181", :]),
    ("User u1351", user_mov_df.loc["u1351", :]),
    ("Hypothetical user", hypothetical_user)
]

for username, w in test_users:
    print(f"\n{username}\n--{len(username)*'-'}\n")
    print(myIBCF(s, w, mov_rate_genre, sysI_recs))


User u1181
------------

['m3732', 'm749', 'm3899', 'm3789', 'm1253', 'm337', 'm1914', 'm1734', 'm249', 'm504']

User u1351
------------

['m1234', 'm2061', 'm853', 'm3887', 'm1514', 'm3012', 'm318', 'm2028', 'm2858', 'm1262']

Hypothetical user
-------------------

['m3536', 'm592', 'm1688', 'm3566', 'm2724', 'm74', 'm3254', 'm2805', 'm424', 'm765']


Test the edge case in which IBCF yields fewer than 10 recommendations.

In [39]:
hypothetical_user = user_mov_df.iloc[0, :].copy(deep=True)
hypothetical_user.iloc[:] = np.nan
hypothetical_user.loc[["m6"]] = [5]

test_users = [
    ("Hypothetical user", hypothetical_user)
]

for username, w in test_users:
    print(f"\n{username}\n--{len(username)*'-'}\n")
    print(myIBCF(s, w, mov_rate_genre, sysI_recs))


Hypothetical user
-------------------

['m3012', 'm860', 'm526', 'm1102', 'm3056', 'm50', 'm904', 'm745', 'm2762', 'm908']


### Initial Movie Recommendations

In our application, in order to gauge the user's movie preferences, we provide an initial set of movies and ask for ratings to be input. This set of initial movies consists of the most-rated movies with a rating of 4 or higher. It is precomputed to improve application performance.

In [40]:
initial_size = 10

# Determine most-reviewed movies with rating of 4 or higher
title_suggs = (weighted_recs[["MovieID", "WeightedRating", "# of Ratings"]]
               .groupby("MovieID").mean())
title_suggs = title_suggs[title_suggs["WeightedRating"] >= 4]
title_suggs.sort_values(by=["# of Ratings"], axis=0,
                         ascending=False, inplace=True)
title_suggs = title_suggs.iloc[:initial_size, :]
title_suggs.index = "m" + title_suggs.index.astype(str)
title_suggs = title_suggs.index.tolist()

# Save to file
with open("initial_suggestions.txt", "w") as fp:
    # Write directly so the similarity matrix `s` is not overwritten
    fp.write("\n".join(map(str, title_suggs)))

## Application

We demonstrate our implementation of System I and System II in [our web application](https://psl-2023-recommender.streamlit.app/), which uses the Streamlit framework to deploy our recommender.