# User-Based Collaborative Filtering with Word2Vec

This notebook can be used to train and test user-based collaborative filtering models using Word2Vec. Before you start, a couple of things are needed:

1. A database with scraped movie reviews and movie data. See the Groa/web_scraping folder for a [scraper](https://github.com/Lambda-School-Labs/Groa/blob/master/web_scraping/scraper.py) you can use to get the reviews. Movie data can be found [here](https://datasets.imdbws.com/). You'll want the files 'title.basics.tsv.gz' and 'title.ratings.tsv.gz'.

2. Folders in the current working directory titled:
    - /models
        - This is where trained models are saved.
    - /training_data
        - This is where training data is saved.
    - /exported_data
        - This contains a small version of the IMDb title.basics.tsv file that we use to quickly retrieve movie info.
        - It also contains /letterboxd and /imdb, where test data is saved for scoring various models. I have included my own data as an example of how to format any test data you can collect. The letterboxd data is just the unzipped Letterboxd export folder, exactly as it comes from the site.
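
The folder layout above can be created in one step. This is a minimal sketch using only the standard library; the helper name `make_project_dirs` is hypothetical, and the folder names are taken from the list above.

```python
from pathlib import Path

# Folder names from the list above, relative to the working directory.
REQUIRED_DIRS = [
    "models",
    "training_data",
    "exported_data/letterboxd",
    "exported_data/imdb",
]

def make_project_dirs(root="."):
    """Create the expected folders under `root` (hypothetical helper)."""
    created = []
    for name in REQUIRED_DIRS:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Calling `make_project_dirs()` from the notebook's working directory is safe to repeat, since existing folders are left untouched.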


## Connect to Database
Do this at the start of every session.

In [None]:
! pip3 install psycopg2-binary --user
import pandas as pd
import psycopg2
import numpy as np
from getpass import getpass

# Connect to the database. Never save your password to a notebook.
# connect() is inside the try block because that is the call that fails on bad credentials or host.
try:
    connection = psycopg2.connect(
        database  = "postgres",
        user      = "postgres",
        password  = getpass(), # secure password entry. Enter DB password in the prompt and press Enter.
        host      = "groadb1.c8fhsnrobgko.us-east-1.rds.amazonaws.com",
        port      = '5432'
    )
    # create cursor that is used throughout
    c = connection.cursor()
    print("Connected!")
except psycopg2.Error as e:
    print("Connection problem chief!\n")
    print(e)

## Data preparation plan:

1. Get the list of reviewers whose reviews we want (about 17k).
    - We only want to train on users who have positively rated a minimum number of movies, otherwise their watch histories will give us poor-quality associations between movies. So we set the minimum number as the variable `n` below.
2. Create the dataframe of (reviewer, movie ID) pairs for all positive reviews.
3. Inner join the above two dataframes to remove positive reviews whose reviewers don't meet our criteria.
4. Run the list constructor on the join table to construct the training data.
    - Training data has this format: `[['movieid1', 'movieid2', ...], ['movieid3', 'movieid4', ...], ...]`, where the first inner list is user 1's watch history, the second is user 2's, and so on.
5. Save the training data.
6. Train Word2Vec on the list of watch histories (which are themselves lists of movie IDs).
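
The plan above can be sketched on toy data without a database. This is an illustration in plain Python rather than the SQL and pandas code the notebook uses; the function name `build_watch_histories` and the sample reviews are hypothetical.

```python
from collections import defaultdict

def build_watch_histories(reviews, n=5, m=7):
    """Group positively rated movie IDs by user (illustrative sketch).

    reviews : iterable of (username, movie_id, rating) tuples
    n       : minimum number of positive reviews per user
    m       : minimum rating that counts as positive (10-point scale)
    """
    histories = defaultdict(list)
    for username, movie_id, rating in reviews:
        if m <= rating <= 10:            # keep positive reviews only
            histories[username].append(movie_id)
    # drop users with fewer than n positive reviews,
    # leaving one list of movie IDs per remaining user
    return [movies for movies in histories.values() if len(movies) >= n]

toy_reviews = [
    ("ann", "0000001", 9), ("ann", "0000002", 8), ("ann", "0000003", 3),
    ("bob", "0000001", 10), ("bob", "0000004", 2),
]
print(build_watch_histories(toy_reviews, n=2))
# [['0000001', '0000002']]
```

Here "bob" is dropped because he has only one positive review, and the 3-star review by "ann" never enters her watch history.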

In [49]:
# Get reviewers with at least n positive reviews (rating m-10 inclusive)

n = 5 # minimum number of positive reviews for a watch history to be included
m = 7 # minimum star rating for inclusion (IMDb rating scale)

c.execute(f"""
SELECT username
FROM reviews
WHERE user_rating BETWEEN {m} AND  10
GROUP BY username
HAVING COUNT(username) >= {n}
ORDER BY COUNT(username) DESC
""")

# Minimum rating for training data has been increased to 8 stars in v3_LimitingFactor.
#
# Explanation: v2_MistakeNot is returning movies with an average rating of 7.66,
# which is towards the low end of the distribution in the training data. It might be
# near the mean, but we want our model to give the user an above-average movie experience.
reviewers = c.fetchall()

In [3]:
# how many reviewers qualify for inclusion?
len(reviewers)

35371

#### Training note: 

The following query currently returns reviews in no discernible order.
This is because the reviews were inserted into the database by multiple scrapers
running in parallel.

Future users of this notebook should take care to note whether their database gives
the same result. The reason the order is important is that Word2Vec trains by learning to predict the movies
within a 10-movie window. A non-random ordering may introduce a bias. This bias might improve the model, e.g. 
in the case where training data is sorted by review date. For all our initial models, however, we have elected
not to pursue that approach. 

There are two reasons for this:
1. Ultimately, the user "taste vector" that is the input for making recommendations is
a vector average of all the movies the user has watched, so it's not perfectly analogous to finding similar movies
to a single movie.
2. More importantly, sorting the training data by review date would influence the model to associate movies
according to the order people watch them in. This has pros and cons. We don't want to provide recommendations 
that lead a user down a path that others have trodden, and this would seem to be one potential drawback. But further
testing is needed. It might be a great idea.
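
To see why ordering matters, it helps to list the (target, context) pairs a skip-gram model is trained on. The sketch below is a simplification: it uses a fixed window, whereas actual Word2Vec implementations also shrink the window at random and subsample frequent items. The function name `skipgram_pairs` is hypothetical.

```python
def skipgram_pairs(history, window):
    """Return (target, context) pairs within `window` positions of each other."""
    pairs = []
    for i, target in enumerate(history):
        lo = max(0, i - window)
        hi = min(len(history), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, history[j]))
    return pairs

history = ["A", "B", "C", "D"]
print(skipgram_pairs(history, window=1))
# [('A', 'B'), ('B', 'A'), ('B', 'C'), ('C', 'B'), ('C', 'D'), ('D', 'C')]

# With a window at least as long as the history, order no longer matters:
# every movie is paired with every other movie.
shuffled = ["C", "A", "D", "B"]
assert set(skipgram_pairs(history, window=10)) == set(skipgram_pairs(shuffled, window=10))
```

For any watch history longer than the window, the set of pairs depends on how the list is ordered, which is where a non-random ordering could introduce bias.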

In [4]:
# Get IDs of all positive reviews from database with minimum star rating m
# Note: we didn't scrape Letterboxd reviews so we only have 10-scale ratings

# query for IDs
c.execute(f"""SELECT movie_id, username 
            FROM reviews 
            WHERE user_rating BETWEEN {m} and 10""")
result = c.fetchall()

# create reviews dataframe
df = pd.DataFrame(result, columns = ['movieid', 'userid'])
df.head()

   movieid        userid
0  1485796   jchao-49659
1  1485796    areneehunt
2  1485796  elycooldzine
3  1485796   tonka-82858
4  1485796      aryassen


In [5]:
# create reviewers dataframe
df_reviewers = pd.DataFrame(reviewers, columns = ['userid'])

In [6]:
# number of reviewers
df_reviewers.shape

(35371, 1)

In [7]:
# merge to get only the IDs relevant to training
df = df.merge(df_reviewers, how='inner', on='userid')
df.shape

(933359, 2)

In [11]:
# ! sudo su
# ! yum update -y
# ! yum -y install python-pip
! python -V #should be python 3.7

Python 3.7.1


In [66]:
# ! which pip

### Install gensim

In [None]:
! python -m pip install tqdm # this is just a terminal loading bar, not necessary
# ! python -c 'import tqdm'
! python -m pip install gensim

In [14]:
# list to capture watch history of the users
# group the movie codes by user once, then collect them in the same order as `reviewers`
histories = df.groupby("userid")["movieid"].apply(list).to_dict()
watched_train = [histories.get(i[0], []) for i in reviewers]

len(watched_train) # number of watch histories

35371

In [15]:
# save the data so we don't lose all that hard work
import pickle
with open(f'training_data/watched_train_{m}-10_{n}reviews.sav', 'wb') as f:
    pickle.dump(watched_train, f)

In [166]:
# # uncomment if you want to re-save the training data in protocol 2 so it can be opened in python 2.7
# import pickle
# temp = pickle.load(open('watched_train.sav', 'rb'))
# pickle.dump(temp, open('watched_train.sav', 'wb'), protocol=2)

## Train the Model

Small note:
The first version of this model was trained on movie IDs that were inside lists of length 1, with watch histories being lists of lists. This version eschews the inner lists. Each watch history is simply a list of strings.
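
A small sketch of the difference between the two formats, assuming the older format wrapped every movie ID in its own single-element list. The helper `flatten_history` is hypothetical and only illustrates the conversion.

```python
def flatten_history(history):
    """Turn [['id1'], ['id2']] into ['id1', 'id2']; flat lists pass through unchanged."""
    flat = []
    for item in history:
        if isinstance(item, list):
            flat.extend(item)
        else:
            flat.append(item)
    return flat

old_format = [[["0053285"], ["0038650"]], [["0046022"]]]   # first version
new_format = [flatten_history(h) for h in old_format]      # this version
print(new_format)
# [['0053285', '0038650'], ['0046022']]
```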

In [154]:
# should be 3.6.0 or above (and below 4.0: the training cell uses the 3.x `size` argument)
import gensim
gensim.__version__

'3.6.0'

In [58]:
# load training data
import pickle
with open('training_data/watched_train_4.sav', 'rb') as f:
    watched_train = pickle.load(f)
len(watched_train)

35371

### Tuning Hyperparameters

Much insight can be gained from reading [Word2vec applied to Recommendation: Hyperparameters Matter](https://www.groundai.com/project/word2vec-applied-to-recommendation-hyperparameters-matter/1).

- 4 hyperparameters that can significantly improve results:
    - negative sampling distribution
        - not included yet.
        - (negative) 0.5 in best results
    - number of epochs
        - 90-150 in the best results.
    - subsampling parameter
        - 10^-4 in all the best results.
    - window-size
        - Set to 6000 to capture whole history?
        - Perhaps it shouldn't be so large, to capture movies watched around the same time. But that requires ordering the training data by review date. Great experiment to try. 
        - Best results set this 3-7. Surprisingly small.
- `Tuning the parameters seems to drive the algorithm towards organizing the space in a way or another (e.g. better positioning the top items in the space, or pushing away less demanded items). Furthermore, the homogeneity of popularity between items of a same sequence, the shape of the popularity distribution, or the heterogeneity of the items in the catalog have a direct impact on the task evaluation.`
    - Probably do this after the first try at optimizing the above.
- `Some authors claim without empirical or theoretical verification that it is best to use a "infinite" window-size (Barkan and Koenigstein, [2016](https://www.groundai.com/project/word2vec-applied-to-recommendation-hyperparameters-matter/1#bib.bib2)), meaning that the whole sessions is considered as one context, but most arbitrarily used a fixed value without further discussion.`
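
The negative sampling distribution is controlled by an exponent applied to item counts (`ns_exponent` in gensim). The sketch below, with made-up counts, shows how the exponent changes the probability of drawing each movie as a negative sample: 0.75 is the usual Word2Vec default, 0 samples all items equally, and a negative value favors unpopular items. The function name and the counts are hypothetical.

```python
def negative_sampling_probs(counts, exponent):
    """Probability of drawing each item as a negative sample: count**exponent, normalized."""
    weights = {item: count ** exponent for item, count in counts.items()}
    total = sum(weights.values())
    return {item: w / total for item, w in weights.items()}

# Made-up review counts for three movies
counts = {"blockbuster": 1000, "cult_classic": 100, "obscure": 10}

for exponent in (0.75, 0.0, -0.5):
    probs = negative_sampling_probs(counts, exponent)
    print(exponent, {k: round(v, 3) for k, v in probs.items()})
```

With a positive exponent the most-reviewed movie is drawn most often; with a negative exponent the ranking flips, so rarely reviewed movies are pushed away from the positives more often.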


In [59]:
import random
from gensim.models import Word2Vec 
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

# train word2vec model
model = Word2Vec(
                 size = 100, # vector size
                 window = 10, # perhaps increase this
                 sg = 1, # sets to skip-gram
                 hs = 0, # must be set to 0 for negative sampling
                 negative = 10, # for negative sampling
                 ns_exponent = 0.3, # 0.5 in best results
                 alpha=0.03, min_alpha=0.0007,
                 seed = 14,
                 sample = 0.0001 # 10^-4 in best results
                )
model.min_rating = "8?" # store the minimum rating of training data
model.min_pos_reviews = 5 # store the minimum number of positive reviews
model.build_vocab(watched_train, progress_per=200)
model.train(watched_train, total_examples = model.corpus_count, 
            epochs=1500, # best results set this 90-150
            report_delay=60, compute_loss=True)

# save word2vec model
model.save("models/w2v_limitingfactor_v3.36.model")

## Use the model

In [None]:
!pip install gensim==3.8.1

### Define inferencing functions

In [None]:
import os
import re
import warnings

import gensim
import numpy as np
import pandas as pd
import psycopg2

warnings.filterwarnings('ignore')
    
def fill_id(id):
    """Adds leading zeroes back if necessary. This makes the id match the database."""
    return str(id).zfill(7)
    
def df_to_id_list(df, id_book):
    """Converts dataframe of movies to a list of the IDs for those movies.

    Every title in the input dataframe is checked against the local file, which
    includes all the titles and IDs in our database. For anything without a match,
    replace the non-alphanumeric characters with wildcards, and query the database
    for matches.
    """
    df['Year'] = df['Year'].astype(int).astype(str)
    matched = pd.merge(df, id_book,
                left_on=['Name', 'Year'], right_on=['primaryTitle', 'startYear'],
                how='inner')
    ids = matched['tconst'].astype(str).tolist()
    final_ratings = []
    names = df.Name.tolist()
    years = [int(year) for year in df.Year.tolist()]
    if 'Rating' in df.columns:
        stars = [int(rating) for rating in df.Rating.tolist()]
        info = list(zip(names, years, stars))
        final_ratings = matched['Rating'].astype(int).tolist()
    else:
        info = list(zip(names, years, list(range(len(years)))))
    missed = [x for x in info if x[0] not in matched['primaryTitle'].tolist()]
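    for title, year, rating in missed:
        # replace runs of non-alphanumeric characters with the SQL wildcard '%'
        pattern = re.sub(r'[^\s0-9a-zA-Z]+', '%', title)
        try:
            # Uses the notebook's global cursor `c`.
            # Values are passed as query parameters rather than formatted into the SQL string.
            c.execute("""
                SELECT movie_id, original_title, primary_title
                FROM movies
                WHERE (primary_title ILIKE %s AND start_year = %s)
                  OR (original_title ILIKE %s AND start_year = %s)
                ORDER BY runtime_minutes DESC
                LIMIT 1""", (pattern, year, pattern, year))
            ids.append(c.fetchone()[0])
            final_ratings.append(rating)
        except Exception:
            # no match (fetchone() returned None) or the query failed
            continue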
    ids = [fill_id(id) for id in ids]
    final_ratings = [x*2 for x in final_ratings]
    ratings_dict = dict(zip(ids, final_ratings))
    return tuple([ids, ratings_dict])
    
def prep_data(ratings_df, watched_df=None, watchlist_df=None, 
                   good_threshold=4, bad_threshold=3):
    """Converts dataframes of exported Letterboxd data to lists of movie_ids.

    Parameters
    ----------
    ratings_df : pd dataframe
        Letterboxd ratings.

    watched_df : pd dataframe
        Letterboxd watch history.

    watchlist_df : pd dataframe
        Letterboxd list of movies the user wants to watch.
        Used in val_list for scoring the model's performance.

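    good_threshold : int
        Minimum star rating (5-star Letterboxd scale; doubled for IMDb data)
        for a movie to be considered "enjoyed" by the user.

    bad_threshold : int
        Maximum star rating (5-star Letterboxd scale; doubled for IMDb data)
        for a movie to be considered "disliked" by the user.


    Returns
    -------
    tuple
        (good_list, bad_list, hist_list, val_list, ratings_dict), where the
        first four are lists of movie ids and ratings_dict maps movie id
        to the user's rating.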
    """
    try:
        # try to read Letterboxd user data
        # drop rows with nulls in the columns we use
        ratings_df = ratings_df.dropna(axis=0, subset=['Rating', 'Name', 'Year'])
        # split according to user rating
        good_df = ratings_df[ratings_df['Rating'] >= good_threshold]
        bad_df = ratings_df[ratings_df['Rating'] <= bad_threshold]
        neutral_df = ratings_df[(ratings_df['Rating'] > bad_threshold) & (ratings_df['Rating'] < good_threshold)]
        # convert dataframes to lists
        good_list, good_dict = df_to_id_list(good_df, id_book)
        bad_list, bad_dict = df_to_id_list(bad_df, id_book)
        neutral_list, neutral_dict = df_to_id_list(neutral_df, id_book)
    except KeyError:
        # Try to read IMDb user data
        # strip ids of "tt" prefix
        ratings_df['movie_id'] = ratings_df['Const'].str.lstrip("tt")
        # drop rows with nulls in the columns we use
        ratings_df = ratings_df.dropna(axis=0, subset=['Your Rating', 'Year'])
        # split according to user rating
        good_df = ratings_df[ratings_df['Your Rating'] >= good_threshold*2]
        bad_df = ratings_df[ratings_df['Your Rating'] <= bad_threshold*2]
        neutral_df = ratings_df[(ratings_df['Your Rating'] > bad_threshold*2) & (ratings_df['Your Rating'] < good_threshold*2)]
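        # convert dataframes to lists
        good_list = good_df['movie_id'].to_list()
        bad_list = bad_df['movie_id'].to_list()
        neutral_list = neutral_df['movie_id'].to_list()
        # build the rating dicts that ratings_dict is assembled from below
        # (IMDb ratings are already on the 10-point scale)
        good_dict = dict(zip(good_list, good_df['Your Rating']))
        bad_dict = dict(zip(bad_list, bad_df['Your Rating']))
        neutral_dict = dict(zip(neutral_list, neutral_df['Your Rating']))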
    except Exception as e:
        # can't read the dataframe as Letterboxd or IMDb user data
        print("This dataframe has columns:", ratings_df.columns)
        raise  # re-raise the original exception with its type and traceback intact
        
    ratings_dict = dict(list(good_dict.items()) + list(bad_dict.items()) + list(neutral_dict.items()))

    if watched_df is not None:
        # Construct list of watched movies that aren't rated "good" or "bad"
        # First, get the set of titles that have already been rated.
        rated_names = set(good_df.Name.tolist() + bad_df.Name.tolist() + neutral_df.Name.tolist())
        # drop nulls from watched dataframe
        full_history = watched_df.dropna(axis=0, subset=['Name', 'Year'])
        # get list of watched movies that haven't been rated
        hist_list = df_to_id_list(full_history[~full_history['Name'].isin(rated_names)], id_book)[0]
        # add back list of "neutral" movies (whose IDs we already found before)
        hist_list = hist_list + neutral_list
    else: hist_list = neutral_list

    if watchlist_df is not None:
        try:
            watchlist_df = watchlist_df.dropna(axis=0, subset=['Name', 'Year'])
            val_list = df_to_id_list(watchlist_df, id_book)[0]
        except KeyError:
            watchlist_df = watchlist_df.dropna(axis=0, subset=['Const', 'Year'])
            watchlist_df['movie_id'] = watchlist_df['Const'].str.lstrip("tt")
            val_list = watchlist_df['movie_id'].tolist()
    else: val_list = []

    return (good_list, bad_list, hist_list, val_list, ratings_dict)

class Recommender(object):

    def __init__(self, model_path):
        """Initialize model with name of .model file"""
        self.model_path = model_path
        self.model = None
        self.cursor_dog = c # set to this notebook's connection
        self.id_book = pd.read_csv('title_basics_small.csv')

    def connect_db(self):
        """connect to database, create cursor.
        In the notebook, this isn't used, and all connections are 
        handled through the notebook's global connection at the top."""
        # connect to database
        connection = psycopg2.connect(
            database  = "postgres",
            user      = "postgres",
            password  = os.getenv('DB_PASSWORD'),
            host      = "movie-rec-scrape.cvslmiksgnix.us-east-1.rds.amazonaws.com",
            port      = '5432'
        )
        # create cursor that is used throughout

        try:
            self.cursor_dog = connection.cursor()
            print("Connected!")
        except:
            print("Connection problem chief!")

    def _get_model(self):
        """Get the model object for this instance, loading it if it's not already loaded."""
        if self.model is None:
            model_path = self.model_path
            w2v_model = gensim.models.Word2Vec.load(model_path)
            # Keep only the normalized vectors.
            # This saves memory but makes the model untrainable (read-only).
            w2v_model.init_sims(replace=True)
            self.model = w2v_model
        return self.model

    def _get_info(self, id, score=None):
        """Takes an id string and returns the movie info with a url."""
        try:
            info_query = f"""
            SELECT m.primary_title, m.start_year, r.average_rating, r.num_votes
            FROM movies m
            JOIN ratings r ON m.movie_id = r.movie_id
            WHERE m.movie_id = '{id}'"""
            self.cursor_dog.execute(info_query)
        except Exception as e:
            return tuple([f"Movie title unknown. ID:{id}", None, None, None, None, None, id])

        t = self.cursor_dog.fetchone()
        if t:
            title = tuple([t[0], t[1], f"https://www.imdb.com/title/tt{id}/", t[2], t[3], score, id])
            return title
        else:
            return tuple([f"Movie title not retrieved. ID:{id}", None, None, None, None, None, id])

    def get_most_similar_title(self, id, id_list):
        """Get the title of the most similar movie to id from id_list"""
        clf = self._get_model()
        vocab = clf.wv.vocab
        if id not in vocab:
            return ""
        id_list = [id for id in id_list if id in vocab] # ensure all in vocab
        id_book = self.id_book
        match = clf.wv.most_similar_to_given(id, id_list)
        return id_book['primaryTitle'].loc[id_book['tconst'] == int(match)].values[0]

    def predict(self, input, bad_movies=[], hist_list=[], val_list=[],
                ratings_dict = {}, checked_list=[], rejected_list=[],
                n=50, harshness=1, rec_movies=True,
                show_vibes=False, scoring=False, return_scores=False):
        """Returns a list of recommendations and useful metadata, given a pretrained
        word2vec model and a list of movies.

        Parameters
        ----------

            input : iterable
                List of movies that the user likes.

            bad_movies : iterable
                List of movies that the user dislikes.

            hist_list : iterable
                List of movies the user has seen.

            val_list : iterable
                List of movies the user has already indicated interest in.
                Example: https://letterboxd.com/tabula_rasta/watchlist/
                People really load these up over the years, and so they make for 
                the best validation set we can ask for with current resources.

            ratings_dict : dictionary
                Dictionary of movie_id keys, user rating values.

            checked_list : iterable
                List of movies the user likes on the feedback form.

            rejected_list : iterable
                List of movies the user dislikes on the feedback form.

            n : int
                Number of recommendations to return.

            harshness : int
                Weighting to apply to disliked movies.
                Ex:
                    1 - most strongly account for disliked movies.
                    3 - divide "disliked movies" vector by 3.

            rec_movies : boolean
                If False, doesn't return movie recommendations (used for scoring).

            show_vibes : boolean
                If True, prints out the dupes as a feature.
                These movies are closest to the user's taste vector, 
                indicating some combination of importance and popularity.

            scoring : boolean
                If True, prints out the validation score.
            
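            return_scores : boolean
                If True (and val_list is given), skips printing and returns
                the tuple (watchlist matches, average rating) instead of
                the recommendations.

        Returns
        -------
        A list of tuples
            (Title, Year, IMDb URL, Average Rating, Number of Votes, Similarity score, Movie ID)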
        """

        clf = self._get_model()
        dupes = []                 # list for storing duplicates for scoring

        def _aggregate_vectors(movies, feedback_list=[]):
            """Gets the vector average of a list of movies."""
            movie_vec = []
            for i in movies:
                try:
                    m_vec = clf[i]  # get the vector for each movie
                    if ratings_dict:
                        try:
                            r = ratings_dict[i] # get user_rating for each movie
                            # Use a polynomial to weight the movie by rating.
                            # This equation is somewhat arbitrary. I just fit a polynomial
                            # to some weights that look good. The effect is to weight extreme
                            # ratings more heavily than middling ones: about 1.8 at 1 star
                            # and 1.4 at 10 stars, versus about 1.0 at 5-6 stars.
                            w = ((r**3)*-0.00143) + ((r**2)*0.0533) + (r*-0.4695) + 2.1867
                            m_vec = m_vec * w
                        except KeyError:
                            continue
                    movie_vec.append(m_vec)
                except KeyError:
                    continue
            if feedback_list:
                for i in feedback_list:
                    try:
                        f_vec = clf[i]
                        movie_vec.append(f_vec*1.8) # weight feedback by changing multiplier here
                    except KeyError:
                        continue
            return np.mean(movie_vec, axis=0)

        def _similar_movies(v, bad_movies=[], n=50):
            """Aggregates movies and finds n vectors with highest cosine similarity."""
            if bad_movies:
                v = _remove_dislikes(bad_movies, v, harshness=harshness)
            return clf.similar_by_vector(v, topn= n+1)[1:]

        def _remove_dupes(recs, input, bad_movies, hist_list=[], feedback_list=[]):
            """remove any recommended IDs that were in the input list"""
            all_rated = input + bad_movies + hist_list + feedback_list
            nonlocal dupes
            dupes = [x for x in recs if x[0] in input]
            return [x for x in recs if x[0] not in all_rated]

        def _remove_dislikes(bad_movies, good_movies_vec, rejected_list=[], harshness=1):
            """Takes a list of movies that the user dislikes.
            Their embeddings are averaged,
            and subtracted from the input."""
            bad_vec = _aggregate_vectors(bad_movies, rejected_list)
            bad_vec = bad_vec / harshness
            return good_movies_vec - bad_vec

        def _score_model(recs, val_list):
            """Returns the number of recs that were already in the user's watchlist. Validation!"""
            ids = [x[0] for x in recs]
            return len(list(set(ids) & set(val_list)))

        aggregated = _aggregate_vectors(input, checked_list)
        recs = _similar_movies(aggregated, bad_movies, n=n)
        recs = _remove_dupes(recs, input, bad_movies, hist_list, checked_list + rejected_list)
        formatted_recs = [self._get_info(x[0], x[1]) for x in recs]
        if val_list:
            if return_scores:
                return tuple([_score_model(recs, val_list), sum([i[3] for i in formatted_recs if i[3] is not None])/len(formatted_recs)])
            elif scoring:
                print(f"The model recommended {_score_model(recs, val_list)} movies that were on the watchlist!\n")
                print(f"\t\t Average Rating: {sum([i[3] for i in formatted_recs if i[3] is not None])/len(formatted_recs)}\n")
        if show_vibes:
            print("You'll get along with people who like: \n")
            for x in dupes:
                print(self._get_info(x[0], x[1]))
            print('\n')
        if rec_movies:
            return formatted_recs
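
The vector arithmetic inside `predict` can be illustrated without a trained model. The sketch below uses made-up 2-dimensional embeddings and plain Python lists. It mirrors the steps above (rating-weighted mean of liked movies, minus the disliked mean divided by `harshness`, then ranking by cosine similarity) but is not the class's implementation, and it leaves disliked movies unweighted for brevity. All names, vectors, and ratings here are hypothetical except the polynomial, which is copied from `_aggregate_vectors`.

```python
import math

def rating_weight(r):
    """Polynomial from _aggregate_vectors: weight for a rating r on the 10-point scale."""
    return (r**3) * -0.00143 + (r**2) * 0.0533 + r * -0.4695 + 2.1867

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Made-up embeddings and ratings
emb = {"m1": [1.0, 0.0], "m2": [0.8, 0.2], "m3": [0.0, 1.0],
       "m4": [0.9, 0.1], "m5": [0.1, 0.9]}
ratings = {"m1": 10, "m2": 8}
liked, disliked, harshness = ["m1", "m2"], ["m3"], 1

# rating-weighted mean of the liked movies
good_vec = mean_vector([[x * rating_weight(ratings[m]) for x in emb[m]] for m in liked])
# mean of the disliked movies, scaled down by harshness
bad_vec = [x / harshness for x in mean_vector([emb[m] for m in disliked])]
taste = [g - b for g, b in zip(good_vec, bad_vec)]

# rank the movies the user has not rated by similarity to the taste vector
candidates = [m for m in emb if m not in liked + disliked]
ranked = sorted(candidates, key=lambda m: cosine(taste, emb[m]), reverse=True)
print(ranked)  # ['m4', 'm5']
```

For reference, `rating_weight` evaluates to about 1.77 at 1 star, 0.98 at 6 stars, and 1.39 at 10 stars.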

### Prep generic test data

In [64]:
# df to lookup ids from titles
id_book = pd.read_csv('exported_data/title_basics_small.csv')

# import user Letterboxd data
ratings = pd.read_csv('exported_data/letterboxd/cooper/ratings.csv')
watched = pd.read_csv('exported_data/letterboxd/cooper/watched.csv')
watchlist = pd.read_csv('exported_data/letterboxd/cooper/watchlist.csv')

# note: if you import IMDb data, it's currently encoded 'cp1252' (but they may someday switch to utf-8)

# prep user data
good_list, bad_list, hist_list, val_list, ratings_dict = prep_data(
                                    ratings, watchlist_df=watchlist, good_threshold=4, bad_threshold=3)

In [65]:
print(len(good_list), len(bad_list), len(hist_list), len(val_list))

389 194 146 1062


In [None]:
#-------------------------------------------------------------#
# To inspect a model, enter its path here and run the cell
model_path = "models/w2v_limitingfactor_v4.12.model"
#------------------------------------------------#

model = gensim.models.Word2Vec.load(model_path)

def _get_info(id):
    """Takes an (id, score) pair and returns the movie info with a url."""
    try:
        c.execute(f"""
        select m.primary_title, m.start_year, r.average_rating, r.num_votes
        from movies m
        join ratings r on m.movie_id = r.movie_id
        where m.movie_id = '{id[0]}'""")
    except Exception:
        return tuple([f"Movie title unknown. ID:{id[0]}", None, None, None, None, None])

    t = c.fetchone()
    if t:
        title = tuple([t[0], t[1], f"https://www.imdb.com/title/tt{id[0]}/", t[2], t[3], id[1]])
        return title
    else:
        return tuple([f"Movie title unknown. ID:{id[0]}", None, None, None, None, None])

print(model)
print(f"""\t
            corpus_count: {model.corpus_count}
            corpus_total_words: {model.corpus_total_words}
            window: {model.window}
            sg: {model.sg}
            hs: {model.hs}
            negative: {model.negative}
            ns_exponent: {model.ns_exponent}
            alpha: {model.alpha}
            min_alpha: {model.min_alpha}
            sample: {model.sample}
            epochs: {model.epochs}
            """)
print("Most similar movies to Porco Rosso minus Kiki's Delivery Service?")
movies = model.similar_by_vector(model['0104652']
                                -model['0097814']
                                 ,topn=11)
for i in movies:
    print(_get_info(i))

## Best model so far

LimitingFactor_v3.51 is the model to beat. It performs consistently well across various tests, which are detailed below.

In [96]:
s = Recommender('w2v_limitingfactor_v3.51.model')
s.predict(good_list, bad_list, hist_list, val_list, ratings_dict, n=100, harshness=1, rec_movies=False, scoring=True,)
# s.predict(aj2, n=100, harshness=1)

The model recommended 25 movies that were on the watchlist!

		 Average Rating: 7.891489361702127



### Add test cases

In [17]:
# Early test cases

# A list of some Coen Bros movies.
coen_bros = ['116282', '2042568', '1019452', 
             '1403865', '190590', '138524', 
             '335245', '477348', '887883', '101410']

# Data scientist's recent watches.
cooper_recent = ['0053285', '0038650', '0046022', 
                 '4520988', '1605783', '6751668', 
                 '0083791', '0115685', '0051459', 
                 '8772262', '0061184', '0041959',
                 '7775622']

# dirkh public letterboxd recent watches.
dirkh = ['7975244', '8106534', '1489887', 
         '1302006', '7286456', '6751668', 
         '8364368', '2283362', '6146586', 
         '2194499', '7131622', '6857112']

# Marvin watches
marvin = ['7286456', '0816692', '2543164', '2935510', 
          '2798920', '0468569', '5013056', '1375666', 
          '3659388', '0470752', '0266915', '0092675', 
          '0137523', '0133093', '1285016']  

# Gabe watches
gabe = ['6292852','0816692','2737304','3748528',
        '3065204','4154796','1536537','1825683',
        '1375666','8236336','2488496','1772341',
        '0317705','6857112','5052448']

# Eric watches
eric = ['2974050','1595842','0118539','0093405',
        '3216920','1256535','5612742','3120314',
        '1893371','0046248','0058548','0199481',
        '2296777','0071198','0077834']

chuckie = ['4263482',
'0084787',
'3286052',
'5715874',
'1172994',
'4805316',
'3139756',
'8772262',
'7784604',
'1034415',]

harlan = ['1065073','5052448','0470752','5688932','1853728','1596363','0432283','6412452','4633694','9495224','0443453','0063823',
          '0066921','0405296','1130884','1179933','0120630','0268126','0137523','0374900','8772262','0116996','0107290','7339248']

ryan = ['0166924','2866360','0050825','2798920','3416742','0060827','1817273','0338013','0482571','5715874','2316411','4550098']

karyn = ['4425200','0464141','1465522','0093779','0099810','0076759','3748528','6763664','0317740','2798920','0096283','0258463','0118799','0058092','0107290','0045152','0106364']

richard = ['0074119','0064115','0070735','0080474','0061512','0067774','0057115','0070511','0081283',
           '0065126','0068421','0078227','0079100','0078966','0081696','0082085','0072431','0075784',
           '0093640','0098051','0094226','0097576','0099810','0081633','0080761','0077975','0085244','0095159','0101969']

joe = ['6335734','0291350','0113568','0208502','0169858','0095327','0097814','0983213','0094625','7089878']

lena = ['1990314','3236120','1816518','0241527','0097757','0268978','0467406','2543164','2245084','3741834']

wade = ['0118665','0270846','0288441','2287250','2287238','8668804','9448868','1702443','1608290','5519340']

aj1 = ['0087995','0118694','0181689','0061184','0063032','2402927','4633694','0058946','0103074','0060196',
       '2543164','0109445','0245429','5105250','0088846','0370986','0246578','0053114','0014429','0047478',
       '0081505','2396224','0054215','1259521','0096283','0095159','0093779','0087544']

aj2 = ['0173716','0086541','0119809','0109445','0112887','0120879','0081455','0079813','0087995','0156610',
       '0097940','0089886','0088846','0090967','1523483','0109424','0102536','0105793','0246578','0370986']

## Score the models on various test cases

In [18]:
Recommender('w2v_limitingfactor_v3.model').predict(good_list, bad_list, hist_list, val_list, ratings_dict, n=100, harshness=1, rec_movies=False, scoring=True,)

The model recommended 22 movies that were on the watchlist!

		 Average Rating: 7.798148148148148



The below cell is your one-stop shop for scoring the various trained models and comparing their performance. Before using it, check the following:

    - The database connection is live (top cell of this notebook).
    
    - You have run the two cells under the headings "Define inferencing functions" and "Prep generic test data".
    
    - `id_book` is reading a CSV containing data from the IMDb title.basics.tsv document found [here](https://datasets.imdbws.com/).
    
    - `test_users` list contains the names of folders containing users' Letterboxd data that you want to use for scoring.
    
    - `model_list` contains the names of the versions you want to score.
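
The folder check in the list above can be automated. The following is a minimal sketch, not part of the original notebook: the helper name `missing_letterboxd_files` is hypothetical, and the three file names and the `exported_data/letterboxd` base path are taken from the `read_csv` calls in the scoring cell.

```python
from pathlib import Path

# File names taken from the read_csv calls in the scoring cell.
REQUIRED_FILES = ("ratings.csv", "watched.csv", "watchlist.csv")

def missing_letterboxd_files(test_users, base_dir="exported_data/letterboxd"):
    """Hypothetical helper: return {user: [missing file names]}.

    Users whose folder contains all three files are left out of the result,
    so an empty dict means every test user is ready for scoring.
    """
    problems = {}
    for user in test_users:
        folder = Path(base_dir) / user
        missing = [name for name in REQUIRED_FILES if not (folder / name).is_file()]
        if missing:
            problems[user] = missing
    return problems
```

Running `missing_letterboxd_files(test_users)` before the scoring cell should surface a missing export early, rather than as a `FileNotFoundError` partway through the loop.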
    
Scoring is achieved using two metrics:
    1. Watchlist validation: How many movies were found that the user has already indicated interest in?
    2. Avg. Rating: What is the average rating of the movies recommended?
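
To make the two metrics concrete, here is a rough sketch of how they could be computed for one list of recommendations. This is an illustration only: `score_recs` and its `avg_ratings` lookup are hypothetical names, and the real scoring happens inside `Recommender.predict`, which may differ in detail. The IDs and ratings in the example are copied from recommendation output elsewhere in this notebook, while the watchlist is made up.

```python
def score_recs(rec_ids, val_list, avg_ratings):
    """Hypothetical scorer: return (watchlist_hits, mean_rating).

    rec_ids:     recommended movie IDs, in rank order
    val_list:    IDs on the user's watchlist (the validation set)
    avg_ratings: dict mapping movie ID -> average rating (assumed lookup)
    """
    watchlist = set(val_list)
    # Metric 1: how many recommendations the user already wanted to see.
    hits = sum(1 for movie_id in rec_ids if movie_id in watchlist)
    # Metric 2: mean rating over the recommendations that have a rating.
    rated = [avg_ratings[m] for m in rec_ids if m in avg_ratings]
    # Guard against an empty list so we never divide by zero.
    mean_rating = sum(rated) / len(rated) if rated else None
    return hits, mean_rating

# Example: three recommendations, one of which is on a made-up watchlist.
recs = ["7653254", "7286456", "7984734"]
ratings = {"7653254": 8.1, "7286456": 8.6, "7984734": 7.8}
watchlist = ["7286456", "0073486"]
hits, mean_rating = score_recs(recs, watchlist, ratings)
```

Returning `None` for an empty recommendation list is a design choice in this sketch. It keeps one degenerate test case from raising an exception and aborting the whole scoring run.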
    
The cell below scores all models with these metrics and records the results in three dataframes. The first two, `match_test_results` and `rating_test_results`, use only the 'cooper' data in different configurations. They demonstrate scenarios where the user provides more or less data and selects different settings. The third, `user_test_results`, records both metrics for every user you care to test. For the sake of privacy, I have not included my test data with this repo, so future experimenters will have to gather their own Letterboxd data.

In [44]:
# log test results for 3 tests.

# df to lookup ids from titles
id_book = pd.read_csv('exported_data/title_basics_small.csv')

# define "user-test" test cases. set 'cooper' to last so it defines good_list, bad_list etc. for the cooper test cases
# each name must correspond to a folder of letterboxd data under "exported_data/letterboxd"
'''EXAMPLE file structure
current directory
    |
    thisnotebook.ipynb
    w2v_someversion.model
    ...
    |
    /exported_data
        |
        /letterboxd
            |
            /eric
            ... other folders
            /cooper
                |
                ratings.csv
                watched.csv
                watchlist.csv
'''
test_users = ['eric', 'wade', 'aj', 'kelly', 'cooper'] 

# these names must match the file names of the versions you want to test, without prefix or extensions
model_list = ['mistakenot',
              'limitingfactor_v1', 'limitingfactor_v2', 
              'limitingfactor_v3', 'limitingfactor_v3.5', 'limitingfactor_v3.51', 'limitingfactor_v3.6', 
              'limitingfactor_v4', 'limitingfactor_v4.1', 'limitingfactor_v4.12']

###########################################################################################
# Nothing below this needs to be configured, unless you want to change the tests themselves.
###########################################################################################

# import user Letterboxd data
test_users_data = {}
for user in test_users:
    user_data = {}
    path = f"exported_data/letterboxd/{user}/"
    ratings = pd.read_csv(f'{path}ratings.csv')
    watched = pd.read_csv(f'{path}watched.csv')
    watchlist = pd.read_csv(f'{path}watchlist.csv')
    good_list, bad_list, hist_list, val_list, ratings_dict = prep_data(
                                    ratings, watched_df=watched, watchlist_df=watchlist, good_threshold=4, bad_threshold=3)
    user_data.update([("good_list", good_list), ("bad_list", bad_list), 
                      ("hist_list", hist_list), ("val_list", val_list), 
                      ("ratings_dict", ratings_dict)])
    test_users_data.update([(str(user), user_data)])

# empty dictionary to use when user elects not to give extra weight to extreme ratings
no_weights = {}

# define "cooper" test cases
params = [
    (cooper_recent, [], 1, ratings_dict), #1
    (cooper_recent, bad_list, 1, ratings_dict),
    (cooper_recent, bad_list, 2, ratings_dict),
    (cooper_recent, bad_list, 3, ratings_dict),
    (good_list, [], 1, ratings_dict), #5
    (good_list, bad_list, 1, ratings_dict),
    (good_list, bad_list, 2, ratings_dict),
    (good_list, bad_list, 3, ratings_dict),
    (cooper_recent, [], 1, no_weights),
    (cooper_recent, bad_list, 1, no_weights), #10
    (cooper_recent, bad_list, 2, no_weights),
    (cooper_recent, bad_list, 3, no_weights),
    (good_list, [], 1, no_weights),
    (good_list, bad_list, 1, no_weights),
    (good_list, bad_list, 2, no_weights), #15
    (good_list, bad_list, 3, no_weights),
#     (good_3p5_list, [], 1),
#     (good_3p5_list, bad_list, 1), #10
#     (good_3p5_list, bad_list, 2),
#     (good_3p5_list, bad_list, 3),
#     (cooper_recent, karyn, 1),
#     (good_list, karyn, 1),
#     (good_3p5_list, karyn, 1), #15
]

mods = dict(zip(model_list, ["w2v_" + x + ".model" for x in model_list])) # zip model paths with their names

# attributes/training hyperparameters we want to tabulate
attr_list = ['vector_size', 'corpus_count', 'corpus_total_words', 
             'window', 'sg', 'hs', 'negative', 
             'alpha', 'min_alpha', 
             'sample', 'epochs']

# the three tests we will run
match_test_results = pd.DataFrame({'model':model_list}) # Test different settings for "cooper", score on watchlist items found
rating_test_results = pd.DataFrame({'model':model_list}) # Test different settings for "cooper", score on avg. rating of recs
user_test_results = pd.DataFrame({'model':model_list}) # Test different users on the same settings, score on watchlist items found and avg. rating of recs


# Get model training parameters for models
for attr in attr_list:
    match_test_results[str(attr)] = match_test_results['model'].apply(lambda x: getattr(Recommender(mods[x])._get_model(), attr))
    rating_test_results[str(attr)] = match_test_results[str(attr)]

# add ns_exponent parameter, which is buried in the model.vocabulary (can't be gotten with getattr())
for df in [match_test_results, rating_test_results]:
    df['ns_exponent'] = df['model'].apply(lambda x: Recommender(mods[x])._get_model().vocabulary.ns_exponent)

# get match scores, average ratings, attributes for all "cooper" test cases
count = 0
for i, j, k, m in params:
        count+=1
        print(count, "\t")
        match_test_results[str(count)] = match_test_results['model'].apply(
                                                lambda x: Recommender(mods[x]).predict(input=i, bad_movies=j, 
                                                                                          val_list=val_list, ratings_dict=m, 
                                                                                          n=100, harshness=k, 
                                                                                          rec_movies=False, show_vibes=False, 
                                                                                          scoring=True, return_scores=True))
        rating_test_results[str(count)] = match_test_results[str(count)].apply(lambda x: x[1])
        match_test_results[str(count)] = match_test_results[str(count)].apply(lambda x: x[0])
        
for user, data in test_users_data.items():
    print(str(user))
    user_test_results[str(user)] = user_test_results['model'].apply(
                                                lambda x: Recommender(mods[x]).predict(input=data['good_list'], 
                                                                                        bad_movies=data['bad_list'], 
                                                                                        val_list=data['val_list'], 
                                                                                        ratings_dict=data['ratings_dict'], 
                                                                                        n=100, harshness=1, 
                                                                                        rec_movies=False, show_vibes=False, 
                                                                                        scoring=True, return_scores=True))
        

1 	
2 	
3 	
4 	
5 	
6 	
7 	
8 	
9 	
10 	
11 	
12 	
13 	
14 	
15 	
16 	
eric
wade
aj
lena
cooper


## Test cases legend:

```
1 	(cooper_recent, [], 1, ratings_dict)

2 	(cooper_recent, bad_list, 1, ratings_dict)

3 	(cooper_recent, bad_list, 2, ratings_dict)

4 	(cooper_recent, bad_list, 3, ratings_dict)

5 	(good_list, [], 1, ratings_dict)

6 	(good_list, bad_list, 1, ratings_dict) 
        # hardcore movie fan looking for something crazy

7 	(good_list, bad_list, 2, ratings_dict)

8 	(good_list, bad_list, 3, ratings_dict)

9 	(cooper_recent, [], 1, no_weights) 
        # small list of movies with only a few good movie ratings

10 	(cooper_recent, bad_list, 1, no_weights) 
        # small list of movies with some bad movies too

11 	(cooper_recent, bad_list, 2, no_weights)

12 	(cooper_recent, bad_list, 3, no_weights)

13 	(good_list, [], 1, no_weights) 
        # Unlikely case

14 	(good_list, bad_list, 1, no_weights) 
        # Hardcore movie fan

15 	(good_list, bad_list, 2, no_weights)

16 	(good_list, bad_list, 3, no_weights)
```
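
The third element of each tuple is `harshness`. A comment in the Junkyard section describes it as controlling how strongly disliked movies are removed: 1 removes them at full strength, 3 at a third of the strength. The sketch below shows one way that behaviour could be implemented with plain vector arithmetic. It is an assumption for illustration, not the actual body of `Recommender.predict`, and `taste_vector` is a hypothetical name.

```python
import numpy as np

def taste_vector(good_vecs, bad_vecs, harshness=1):
    """Hypothetical sketch: combine liked and disliked movie vectors.

    The mean of the disliked vectors is subtracted from the mean of the
    liked vectors, scaled down by `harshness`, so harshness=3 subtracts
    one third as much as harshness=1.
    """
    good_mean = np.mean(good_vecs, axis=0)
    if len(bad_vecs) == 0:
        # No dislikes supplied (the `[]` test cases): use likes only.
        return good_mean
    bad_mean = np.mean(bad_vecs, axis=0)
    return good_mean - bad_mean / harshness
```

Under this assumption, the `[]` cases in the legend (1, 5, 9, 13) reduce to a plain average of the liked movies, so `harshness` has no effect on them.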

### Show test results

Overall, the best-scoring models seem to be limitingfactor_v1 and limitingfactor_v3.51. Unfortunately I did not have the presence of mind to document the settings for the former's training data, but I believe it may have been `m=7, n=10`.

In [45]:
pd.set_option('display.max_columns', 50)
match_test_results # for each model: training params, validation score out of 100 for 16 test cases

Unnamed: 0,model,vector_size,corpus_count,corpus_total_words,window,sg,hs,negative,alpha,min_alpha,sample,epochs,ns_exponent,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
0,mistakenot,100,17812,904140,10,1,0,10,0.03,0.0007,0.001,10,0.75,12,19,12,13,5,22,23,12,12,13,11,13,4,19,23,12
1,limitingfactor_v1,100,17812,817687,10,1,0,10,0.03,0.0007,0.0001,90,0.5,16,28,26,22,5,21,11,8,17,19,24,22,5,23,11,7
2,limitingfactor_v2,100,13641,766970,10,1,0,10,0.03,0.0007,0.0001,90,0.5,16,19,20,17,5,21,11,7,15,18,18,17,5,21,10,7
3,limitingfactor_v3,100,35371,933359,10,1,0,10,0.03,0.0007,0.0001,90,0.5,17,23,23,20,3,22,15,9,16,22,20,19,2,21,14,9
4,limitingfactor_v3.5,100,35371,933359,10,1,0,10,0.03,0.0007,0.0001,90,0.35,17,25,22,21,5,22,14,10,17,23,23,20,5,20,15,9
5,limitingfactor_v3.51,100,35371,933359,10,1,0,10,0.03,0.0007,0.0001,500,0.35,16,26,22,19,5,25,13,9,15,25,24,20,4,26,14,9
6,limitingfactor_v4,300,35371,933359,10,1,0,10,0.03,0.0007,0.0001,30,0.5,2,7,3,3,0,2,1,1,1,2,2,1,0,2,1,1
7,limitingfactor_v4.1,300,35371,933359,10,1,0,10,0.03,0.0007,0.0001,100,0.5,3,14,9,6,0,9,0,0,3,13,10,6,0,12,0,0
8,limitingfactor_v4.12,300,35371,933359,10,1,0,10,0.03,0.0007,0.0001,500,0.5,7,23,17,14,1,18,1,0,6,21,17,16,1,18,1,0
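
A table this wide is hard to compare by eye, so it can help to collapse the 16 test-case columns into a summary per model. The following is a small sketch of one way to do that. The `rank_models` helper and its summary column names are my own, and the three sample rows are copied from the table above so the block runs on its own. In the notebook you would pass `match_test_results` instead.

```python
import pandas as pd

case_cols = [str(i) for i in range(1, 17)]  # test-case columns "1".."16"

# Three rows copied from the match_test_results output above.
rows = {
    "limitingfactor_v1":    [16, 28, 26, 22, 5, 21, 11, 8, 17, 19, 24, 22, 5, 23, 11, 7],
    "limitingfactor_v3.51": [16, 26, 22, 19, 5, 25, 13, 9, 15, 25, 24, 20, 4, 26, 14, 9],
    "limitingfactor_v4":    [2, 7, 3, 3, 0, 2, 1, 1, 1, 2, 2, 1, 0, 2, 1, 1],
}
sample = pd.DataFrame(
    [[name, *scores] for name, scores in rows.items()],
    columns=["model", *case_cols],
)

def rank_models(results, case_cols):
    """Hypothetical helper: summarise watchlist hits per model, best first."""
    summary = results[["model"]].copy()
    # Mean number of watchlist hits across all test cases.
    summary["mean_hits"] = results[case_cols].mean(axis=1)
    # Number of test cases with at least 20 watchlist hits.
    summary["cases_20_plus"] = (results[case_cols] >= 20).sum(axis=1)
    return summary.sort_values("mean_hits", ascending=False).reset_index(drop=True)

ranking = rank_models(sample, case_cols)
print(ranking)
```

A plain mean weights every test case equally. If some cases matter more to you (for example, the sparse-input ones), a weighted mean may be a better summary.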


In [46]:
rating_test_results # for each model: training params, average rating of movies recommended for 16 test cases

Unnamed: 0,model,vector_size,corpus_count,corpus_total_words,window,sg,hs,negative,alpha,min_alpha,sample,epochs,ns_exponent,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
0,mistakenot,100,17812,904140,10,1,0,10,0.03,0.0007,0.001,10,0.75,7.069792,7.339175,7.195745,7.129474,7.153659,7.633333,7.628378,7.454054,7.26701,7.195918,7.278125,7.257895,7.13494,7.604444,7.647945,7.443243
1,limitingfactor_v1,100,17812,817687,10,1,0,10,0.03,0.0007,0.0001,90,0.5,7.478495,7.604255,7.496667,7.505495,8.167742,7.665152,8.01,8.088889,7.512903,7.494681,7.554945,7.531868,8.167742,7.602857,7.984375,8.115385
2,limitingfactor_v2,100,13641,766970,10,1,0,10,0.03,0.0007,0.0001,90,0.5,7.445161,7.488542,7.441935,7.448913,8.128125,7.667188,7.985714,8.062963,7.427957,7.482105,7.537363,7.445055,8.125806,7.574627,7.966667,8.062963
3,limitingfactor_v3,100,35371,933359,10,1,0,10,0.03,0.0007,0.0001,90,0.5,7.567033,7.437634,7.577174,7.588889,8.2,7.807273,8.025,8.137037,7.554348,7.476596,7.550538,7.595699,8.210714,7.728333,8.016129,8.139286
4,limitingfactor_v3.5,100,35371,933359,10,1,0,10,0.03,0.0007,0.0001,90,0.35,7.629348,7.482796,7.605435,7.60989,8.124138,7.784211,8.023333,8.144828,7.651064,7.444681,7.607527,7.541935,8.126667,7.720968,8.043333,8.164286
5,limitingfactor_v3.51,100,35371,933359,10,1,0,10,0.03,0.0007,0.0001,500,0.35,7.536264,7.486022,7.566667,7.555556,8.103448,7.9,8.148148,8.045833,7.582796,7.581915,7.578495,7.566304,8.103333,7.877778,8.15,8.045833
6,limitingfactor_v4,300,35371,933359,10,1,0,10,0.03,0.0007,0.0001,30,0.5,6.874227,7.027174,6.767742,6.823656,6.973958,7.329032,7.302151,7.257292,6.812121,6.769565,6.722581,6.723958,6.954167,7.303158,7.298936,7.298969
7,limitingfactor_v4.1,300,35371,933359,10,1,0,10,0.03,0.0007,0.0001,100,0.5,6.589011,7.269663,6.933708,6.652222,6.509302,7.366129,6.577465,6.469737,6.558696,7.335955,6.823596,6.542222,6.50814,7.396923,6.576056,6.5
8,limitingfactor_v4.12,300,35371,933359,10,1,0,10,0.03,0.0007,0.0001,500,0.5,6.94382,7.451685,7.237079,7.185227,6.511538,7.604348,6.418644,6.423881,7.009091,7.477273,7.338636,7.27931,6.4725,7.614286,6.4375,6.415152


In [47]:
user_test_results # for each model: validation score and avg. rating for a user's inputs.

Unnamed: 0,model,eric,wade,aj,lena,cooper
0,mistakenot,"(1, 7.299999999999996)","(4, 7.682474226804127)","(6, 7.234375000000001)",,"(22, 7.633333333333332)"
1,limitingfactor_v1,"(3, 7.618888888888891)","(6, 7.460465116279073)","(3, 7.043333333333335)",,"(21, 7.665151515151515)"
2,limitingfactor_v2,"(1, 7.59186046511628)","(10, 7.539325842696628)","(6, 7.212765957446808)",,"(21, 7.667187500000001)"
3,limitingfactor_v3,"(4, 7.673493975903615)","(11, 7.539759036144578)","(5, 7.288043478260868)",,"(22, 7.807272727272728)"
4,limitingfactor_v3.5,"(4, 7.746428571428571)","(10, 7.508045977011496)","(6, 7.327472527472528)",,"(22, 7.784210526315789)"
5,limitingfactor_v3.51,"(5, 7.801219512195122)","(15, 7.611538461538464)","(6, 7.559340659340662)",,"(25, 7.8999999999999995)"
6,limitingfactor_v4,"(0, 6.976842105263159)","(5, 7.422222222222221)","(1, 6.941935483870969)",,"(2, 7.329032258064515)"
7,limitingfactor_v4.1,"(0, 7.341025641025642)","(8, 7.38875)","(4, 6.950000000000001)",,"(9, 7.366129032258064)"
8,limitingfactor_v4.12,"(3, 7.556944444444446)","(11, 7.363636363636366)","(4, 7.296590909090906)",,"(18, 7.604347826086957)"
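
Each cell in this table holds a `(watchlist hits, average rating)` tuple, which is awkward to sort or plot. Once the dataframe has been written to CSV and read back, the tuples also become strings. The sketch below splits one user's column into two numeric columns and handles both forms. `split_score_column` is a hypothetical helper, and it assumes the column has no missing values, so an empty column like `lena` above would need to be dropped or filled first.

```python
import ast
import pandas as pd

def split_score_column(df, user):
    """Hypothetical helper: split (hits, avg_rating) cells into two columns."""
    def parse(cell):
        # After a CSV round trip the tuple arrives as a string such as
        # "(22, 7.63)", so turn it back into a real tuple.
        if isinstance(cell, str):
            cell = ast.literal_eval(cell)
        return cell

    parsed = df[user].apply(parse)
    out = df[["model"]].copy()
    out[f"{user}_hits"] = parsed.apply(lambda t: t[0])
    out[f"{user}_avg_rating"] = parsed.apply(lambda t: t[1])
    return out

# Two 'cooper' cells copied from the table above: one tuple, one string.
demo = pd.DataFrame({
    "model": ["mistakenot", "limitingfactor_v3.51"],
    "cooper": [(22, 7.633333333333332), "(25, 7.8999999999999995)"],
})
split = split_score_column(demo, "cooper")
```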


In [41]:
# Export test results
match_test_results.to_csv("w2v_match_test_results.csv", index=False)
rating_test_results.to_csv("w2v_rating_test_results.csv", index=False)
user_test_results.to_csv("w2v_user_test_results.csv", index=False)

# Beyond this point: Recommendation examples (long output)

## Best settings for LimitingFactor_v1

In [38]:
# focus on the best settings for LimitingFactor_v1
params = [
    (cooper_recent, bad_list, 2), #3
    (cooper_recent, bad_list, 3), #4
    (good_list, bad_list, 1), #6
    (good_3p5_list, bad_list, 1), #10
    (good_list, karyn, 1), #14
    (good_3p5_list, karyn, 1), #15
]

count = 0
settings = [3, 4, 6, 10, 14, 15]
# Test all the promising parameter sets
for i, j, k in params:
    print(settings[count], "\t")
    prediction = s.predict(input=i, bad_movies=j, n=100, harshness=k, rec_movies=True, show_vibes=False, scoring=True)
    for i in prediction:
        print(f"{i[0]}\t{i[1]}\n\t{i[2]}\n\tAvg. Rating: {i[3]}\n\t# Votes: {i[4]}\n\tSimilarity: {i[5]}\n\n")
    count+=1

"""
Settings graded out of 100:
3... 91
4... 86
6... 89 (lots of old stuff)
10.. 90
14.. 91
15.. 90

I'm inclined to pronounce model LimitingFactor_v1 a success, because it gives 
diverse and high-quality results in more cases than its predecessor.
In 6 cases it gives 20+ watchlisted movies, where before we only got 3.

In other words, it gives good results in cases where input data may be rich or sparse.

"""


3 	
The model recommended 23 movies that were on the watchlist!

		 Average Rating: 7.444999999999996

Marriage Story	2019
	https://www.imdb.com/title/tt7653254/
	Avg. Rating: 8.1
	# Votes: 130590
	Similarity: 0.6401818990707397


Joker	2019
	https://www.imdb.com/title/tt7286456/
	Avg. Rating: 8.6
	# Votes: 620299
	Similarity: 0.6330293416976929


The Lighthouse	2019
	https://www.imdb.com/title/tt7984734/
	Avg. Rating: 7.8
	# Votes: 52091
	Similarity: 0.6307841539382935


Apollo 11	2019
	https://www.imdb.com/title/tt8760684/
	Avg. Rating: 8.2
	# Votes: 15085
	Similarity: 0.6201057434082031


Once Upon a Time... in Hollywood	2019
	https://www.imdb.com/title/tt7131622/
	Avg. Rating: 7.8
	# Votes: 342653
	Similarity: 0.6006509065628052


Knives Out	2019
	https://www.imdb.com/title/tt8946378/
	Avg. Rating: 8.0
	# Votes: 124017
	Similarity: 0.6003093719482422


A Star Is Born	2018
	https://www.imdb.com/title/tt1517451/
	Avg. Rating: 7.7
	# Votes: 293127
	Similarity: 0.5982445478439331


The

### Compare two test cases for v1

In [32]:
# Compare the unique recs of settings 7 and 11 (numbered as in the older
# 15-case params list kept in the Junkyard section).
# The param difference is that setting 7 uses movies rated 4 and above,
# while setting 11 uses movies rated 3.5 and above.

p4 = s.predict(input=good_list, bad_movies=bad_list, n=200, harshness=2, scoring=True)
p35 = s.predict(input=good_3p5_list, bad_movies=bad_list, n=200, harshness=2, scoring=True)

print(len(p4), len(p35))

p4_unique = [x for x in p4 if x[0] not in [x[0] for x in p35]]
p35_unique = [x for x in p35 if x[0] not in [x[0] for x in p4]]

print(len(p4_unique), len(p35_unique))


The model recommended 29 movies that were on the watchlist!

		 Average Rating: 7.786516853932587

The model recommended 27 movies that were on the watchlist!

		 Average Rating: 7.732142857142859

89 84
27 22
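
The list comprehensions in the cell above rebuild the list of titles for every element they test. That is fine at n=200 but scales quadratically. A set-based variant, sketched below, returns the same result with one pass per list. `unique_recs` is a hypothetical helper name. As in the cell above, element 0 of each rec is taken to be the title, and the example tuples are modelled on the recommendation output shown in this notebook.

```python
def unique_recs(recs, other_recs):
    """Return the recs whose title (element 0) does not appear in other_recs."""
    other_titles = {rec[0] for rec in other_recs}  # build the lookup set once
    return [rec for rec in recs if rec[0] not in other_titles]

# Small example with (title, year, url) tuples.
a = [
    ("Manhattan", 1979, "https://www.imdb.com/title/tt0079522/"),
    ("Chinatown", 1974, "https://www.imdb.com/title/tt0071315/"),
]
b = [
    ("Chinatown", 1974, "https://www.imdb.com/title/tt0071315/"),
    ("To the Wonder", 2012, "https://www.imdb.com/title/tt1595656/"),
]
a_only = unique_recs(a, b)
b_only = unique_recs(b, a)
```

Matching on title mirrors the original cell. Matching on the IMDb URL instead would avoid treating two different films that share a title as the same movie.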


In [33]:
for i in p4_unique:
    print(f"{i[0]}\t{i[1]}\n\t{i[2]}")

Le parfum d'Yvonne	1994
	https://www.imdb.com/title/tt0110776/
Beautiful	1951
	https://www.imdb.com/title/tt0043332/
Journey to Italy	1954
	https://www.imdb.com/title/tt0046511/
A Story from Chikamatsu	1954
	https://www.imdb.com/title/tt0046851/
Manhattan	1979
	https://www.imdb.com/title/tt0079522/
The Circus	1928
	https://www.imdb.com/title/tt0018773/
Cold War	2018
	https://www.imdb.com/title/tt6543652/
Aguirre, the Wrath of God	1972
	https://www.imdb.com/title/tt0068182/
La Dolce Vita	1960
	https://www.imdb.com/title/tt0053779/
The Thin Blue Line	1988
	https://www.imdb.com/title/tt0096257/
Lancelot of the Lake	1974
	https://www.imdb.com/title/tt0071737/
Conversation Piece	1974
	https://www.imdb.com/title/tt0071585/
4 Months, 3 Weeks and 2 Days	2007
	https://www.imdb.com/title/tt1032846/
Romero	1989
	https://www.imdb.com/title/tt0098219/
Stagecoach	1939
	https://www.imdb.com/title/tt0031971/
Miller's Crossing	1990
	https://www.imdb.com/title/tt0100150/
Ankur: The Seedling	1974
	https:

In [212]:
for i in p35_unique:
    print(f"{i[0]}\t{i[1]}\n\t{i[2]}")

Once Upon a Time in High School: The Spirit of Jeet Kune Do	2004
	https://www.imdb.com/title/tt0390205/
Chillar Party	2011
	https://www.imdb.com/title/tt1841542/
Angel's Egg	1985
	https://www.imdb.com/title/tt0208502/
Unacknowledged	2017
	https://www.imdb.com/title/tt6400614/
Lootera	2013
	https://www.imdb.com/title/tt2224317/
Awe!	2018
	https://www.imdb.com/title/tt7797658/
Vada Chennai	2018
	https://www.imdb.com/title/tt5959980/
Aruvi	2016
	https://www.imdb.com/title/tt5867800/
Big Fish & Begonia	2016
	https://www.imdb.com/title/tt1920885/
Thithi	2015
	https://www.imdb.com/title/tt4881362/
Boy and the World	2013
	https://www.imdb.com/title/tt3183630/
Neon Genesis Evangelion: The End of Evangelion	1997
	https://www.imdb.com/title/tt0169858/
That Girl in Yellow Boots	2010
	https://www.imdb.com/title/tt1580704/
The Breath	2009
	https://www.imdb.com/title/tt1171701/
The Stupids	1996
	https://www.imdb.com/title/tt0117768/
To the Wonder	2012
	https://www.imdb.com/title/tt1595656/
The Meyer

### Examine ideal case

In [34]:
prediction = s.predict(input=good_list, bad_movies=bad_list, n=200, harshness=2)
for i in prediction:
    print(f"{i[0]}\t{i[1]}\n\t{i[2]}\n\tAvg. Rating: {i[3]}\n\t# Votes: {i[4]}\n\tSimilarity: {i[5]}\n\n")

One Flew Over the Cuckoo's Nest	1975
	https://www.imdb.com/title/tt0073486/
	Avg. Rating: 8.7
	# Votes: 860775
	Similarity: 0.637021541595459


The Devil, Probably	1977
	https://www.imdb.com/title/tt0075938/
	Avg. Rating: 7.3
	# Votes: 3150
	Similarity: 0.6019104719161987


Do the Right Thing	1989
	https://www.imdb.com/title/tt0097216/
	Avg. Rating: 7.9
	# Votes: 78577
	Similarity: 0.5995447635650635


Chinatown	1974
	https://www.imdb.com/title/tt0071315/
	Avg. Rating: 8.2
	# Votes: 275306
	Similarity: 0.5974133014678955


Sunset Blvd.	1950
	https://www.imdb.com/title/tt0043014/
	Avg. Rating: 8.4
	# Votes: 186742
	Similarity: 0.5961176753044128


On the Waterfront	1954
	https://www.imdb.com/title/tt0047296/
	Avg. Rating: 8.1
	# Votes: 132973
	Similarity: 0.5904337763786316


La Collectionneuse	1967
	https://www.imdb.com/title/tt0061495/
	Avg. Rating: 7.5
	# Votes: 5523
	Similarity: 0.5855226516723633


The Elephant Man	1980
	https://www.imdb.com/title/tt0080678/
	Avg. Rating: 8.1
	# Vo

## Best settings for v2

In [33]:
"""LimitingFactor_v2 is trained on movies rated 8 stars and above. The first thing I notice 
about it is that, using my movie history, whenever its recommendations get a high
average score, it strays farther from my watchlist. So this is where it may become 
challenging to evaluate the model: are we optimizing for stuff I haven't heard of, or average rating?"""

# focus on the best settings for LimitingFactor_v2
params = [
    (good_list, bad_list, 3), #8
    (good_3p5_list, bad_list, 2), #11
    (good_3p5_list, bad_list, 3)  #12
]

count = 0
settings = [8, 11, 12]
# Test all the promising parameter sets
for i, j, k in params:
    print(settings[count], "\t")
    prediction = s.predict(input=i, bad_movies=j, n=100, harshness=k, rec_movies=True, show_vibes=False, scoring=True)
    for i in prediction:
        print(f"{i[0]}\t{i[1]}\n\t{i[2]}\n\tAvg. Rating: {i[3]}\n\t# Votes: {i[4]}\n\tSimilarity: {i[5]}\n\n")
    count+=1

"""
Wow. Model v2 is crazily overfitted, which means that it returns so many duplicates that it recommends nothing new.
This technically shouldn't be happening, because we are actively removing duplicates. But _remove_dupes() cannot,
as of this time, remove titles that don't have ILIKE matches in the database. At n=25, we get a division-by-zero error because
the recs list is empty. So at n=30 we're getting just the unhandled dupes as a result.

Conclusions:
1. v2 is overfitted.
2. Average Rating is not a good metric to use for v2.
3. v2 fails the watchlist test spectacularly due to (1).

As a result, we should train on ratings 7-10.

"""


8 	
The model recommended 7 movies that were on the watchlist!

		 Average Rating: 7.970000000000001

Schindler's List	1993
	https://www.imdb.com/title/tt0108052/
	Avg. Rating: 8.9
	# Votes: 1138968
	Similarity: 0.6192294359207153


One Flew Over the Cuckoo's Nest	1975
	https://www.imdb.com/title/tt0073486/
	Avg. Rating: 8.7
	# Votes: 860775
	Similarity: 0.6185670495033264


The Last Supper	1976
	https://www.imdb.com/title/tt0075363/
	Avg. Rating: 7.5
	# Votes: 518
	Similarity: 0.6030973196029663


Pan's Labyrinth	2006
	https://www.imdb.com/title/tt0457430/
	Avg. Rating: 8.2
	# Votes: 588428
	Similarity: 0.5942332744598389


Casino	1995
	https://www.imdb.com/title/tt0112641/
	Avg. Rating: 8.2
	# Votes: 432378
	Similarity: 0.5923681855201721


78/52: Hitchcock's Shower Scene	2017
	https://www.imdb.com/title/tt4372240/
	Avg. Rating: 7.3
	# Votes: 2652
	Similarity: 0.5904847383499146


Chinatown	1974
	https://www.imdb.com/title/tt0071315/
	Avg. Rating: 8.2
	# Votes: 275306
	Similarity: 0.



# Junkyard

In [None]:
# s = Recommender('w2v_limitingfactor_v3.51.model')


# possible good inputs: cooper_recent (13 most recent), good_list(my 4-5 star ratings), good_3p5_list (my 3.5-5 star ratings)
# possible bad_movies: [] (not removing dislikes), bad_list(my ratings 3 and below), karyn (individual whose taste markedly differs from mine)
# possible harshness: 1, 2, 3 (harshness 1 strongly removes disliked movies, while harshness 3 removes with a third of the strength)
# define test cases
# params = [
#     (cooper_recent, [], 1), #1
#     (cooper_recent, bad_list),
#     (cooper_recent, bad_list),
#     (cooper_recent, bad_list),
#     (good_list, [], 1), #5
#     (good_list, bad_list, 1),
#     (good_list, bad_list, 2),
#     (good_list, bad_list, 3),
#     (good_3p5_list, [], 1),
#     (good_3p5_list, bad_list, 1), #10
#     (good_3p5_list, bad_list, 2),
#     (good_3p5_list, bad_list, 3),
#     (cooper_recent, karyn, 1),
#     (good_list, karyn, 1),
#     (good_3p5_list, karyn, 1), #15
# ]
# Good performance for v2MistakeNot: 6, 7, 11
# Good performance for v3LimitingFactor: 3, 4, 6, 10, 14, 15
# Good performance for v3LimitingFactor_3.51: 2, 3, 4, 6, 14
# Setting 6 seems robust to significant hyperparameter changes.