# Content-Based Filtering

Collaborative filtering relies solely on user-item interactions in the utility matrix. As a result, brand-new users and items with few or no interactions cannot receive meaningful recommendations. This is known as the "cold start" problem. Content-based filtering addresses it by generating recommendations from user and item features instead. In this notebook, the item features are movie genres.
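To make the cold start problem concrete, here is a minimal sketch. The utility matrix, the feature matrix, and the `cosine` helper are all hypothetical and are not part of the dataset used below. A new item with no ratings has no interaction signal to compare against, while its genre features still allow a comparison.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / norm) if norm > 0 else 0.0

# Hypothetical utility matrix: rows are users, columns are items A, B, C.
# Item C is brand new, so nobody has rated it yet (0 = no interaction).
ratings = np.array([
    [5, 4, 0],
    [4, 5, 0],
    [1, 2, 0],
])

# Hypothetical item features: columns are [Action, Comedy].
features = np.array([
    [1, 0],  # item A: Action
    [0, 1],  # item B: Comedy
    [1, 0],  # item C: Action
])

# Interaction-based similarity of the new item C to item A: no signal.
collab_sim = cosine(ratings[:, 2], ratings[:, 0])

# Feature-based similarity of C to A: identical genres, so maximal.
content_sim = cosine(features[2], features[0])

print(collab_sim, content_sim)  # 0.0 1.0
```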

## Step 1: Import the Dependencies and Load the Data

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

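# Load the movie metadata
movies = pd.read_csv('data/movies.csv')

# Genres are pipe-delimited, e.g. "Comedy|Romance" -> ["Comedy", "Romance"]
movies['genres'] = movies['genres'].apply(lambda x: x.split("|"))

def get_title(title):
    # Keep the text before the first "(", e.g. "Toy Story (1995)" -> "Toy Story"
    return title.split('(')[0].strip()

def get_year(title):
    # Read the four characters after the last "("; None if they are not a number
    try:
        return int(title.split('(')[-1][:4])
    except ValueError:
        return None

movies['year'] = movies['title'].apply(get_year)
movies['title'] = movies['title'].apply(get_title)

# Drop movies without a parseable year or with an empty title
movies = movies.dropna()

# drop=True avoids keeping the old index as an extra column
movies = movies[movies.title != ''].reset_index(drop=True)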

In [2]:
movies = movies[['title', 'genres']]

In [3]:
movies.head()

| | title | genres |
|---|---|---|
| 0 | Toy Story | [Adventure, Animation, Children, Comedy, Fantasy] |
| 1 | Jumanji | [Adventure, Children, Fantasy] |
| 2 | Grumpier Old Men | [Comedy, Romance] |
| 3 | Waiting to Exhale | [Comedy, Drama, Romance] |
| 4 | Father of the Bride Part II | [Comedy] |


In [4]:
# Collect the set of distinct genres across all movies
genres = {genre for genre_list in movies.genres for genre in genre_list}

In [5]:
movie_genres_matrix = movies.copy()

# Add one binary column per genre: 1 if the movie has that genre, else 0
for g in genres:
    movie_genres_matrix[g] = movies.genres.apply(lambda x: int(g in x))

movie_genres_matrix = movie_genres_matrix.drop('genres', axis=1)
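The cell above builds a binary movie-by-genre matrix by hand. As an alternative sketch (not the approach this notebook uses), scikit-learn's `MultiLabelBinarizer` produces the same kind of encoding in one call. The `toy` DataFrame and its titles are illustrative stand-ins, not rows from `movies.csv`.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Toy stand-in for the movies table; titles and genres here are illustrative.
toy = pd.DataFrame({
    'title': ['Movie A', 'Movie B', 'Movie C'],
    'genres': [['Action', 'Thriller'], ['Comedy'], ['Action']],
})

mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(toy['genres'])  # shape: (n_movies, n_genres)

# classes_ holds the genre names in sorted order, one per column
toy_matrix = pd.DataFrame(encoded, columns=mlb.classes_, index=toy.index)
toy_matrix.insert(0, 'title', toy['title'])
print(toy_matrix)
```

One practical difference is that the columns come out in sorted order, whereas iterating over a Python `set` gives no guaranteed column order. The similarity scores do not depend on column order either way.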

In [6]:
from sklearn.metrics.pairwise import cosine_similarity

mgm = movie_genres_matrix.drop('title', axis=1)
X = cosine_similarity(mgm, mgm)

In [7]:
def movie_lookup(idx, movie_genres_matrix):
    return movie_genres_matrix.iloc[idx].title

In [8]:
movie_lookup(8, movie_genres_matrix)

'Sudden Death'

In [9]:
X[8]

array([0.        , 0.        , 0.        , ..., 0.        , 0.70710678,
       0.        ])
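The value 0.70710678 in this row is 1/√2. That is what cosine similarity gives when a single-genre movie is compared with a movie that shares that genre and has one additional genre. The sketch below reproduces it with hypothetical two-column genre vectors, assuming the query movie has exactly one genre.

```python
import numpy as np

# Hypothetical genre vectors over the columns [Action, Thriller]
single_genre = np.array([1, 0])   # e.g. an Action-only movie
two_genres = np.array([1, 1])     # e.g. an Action + Thriller movie

# cosine(u, v) = (u . v) / (||u|| * ||v||)
similarity = single_genre @ two_genres / (
    np.linalg.norm(single_genre) * np.linalg.norm(two_genres)
)
print(round(float(similarity), 8))  # 0.70710678
```

A score of 0 means the two movies share no genres, and a score of 1 means their genre sets are identical.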

In [10]:
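idx = 8
n_recommendations = 10
# Pair each movie index with its similarity to the query movie, highest first
sim_scores = list(enumerate(X[idx]))
sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
# Drop the query movie by index; with ties at 1.0 it is not always ranked first
sim_scores = [s for s in sim_scores if s[0] != idx][:n_recommendations]
similar_movies = [i[0] for i in sim_scores]

[movie_lookup(i, movie_genres_matrix) for i in similar_movies]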

['Fair Game',
 'Under Siege 2: Dark Territory',
 'Hunted, The',
 'Bloodsport 2',
 'Best of the Best 3: No Turning Back',
 'Double Team',
 'Steel',
 'Knock Off',
 'Avalanche',
 'Aces: Iron Eagle III']
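The lookup-and-rank steps can be wrapped into a single helper. The following is a minimal sketch on toy data: the `recommend` function, the titles, and the genre matrix are hypothetical and only mirror the shape of `movie_genres_matrix` and `X`. It removes the query movie by index, so movies that tie with it at a similarity of 1.0 are not dropped by mistake.

```python
import numpy as np

def recommend(title, titles, sim_matrix, n_recommendations=10):
    """Return the titles most similar to `title`, excluding the movie itself."""
    idx = titles.index(title)  # raises ValueError for an unknown title
    # Negating the scores sorts descending; a stable sort keeps ties in index order
    order = np.argsort(-sim_matrix[idx], kind='stable')
    order = [i for i in order if i != idx][:n_recommendations]
    return [titles[i] for i in order]

# Toy data (illustrative only): genre columns are [Action, Comedy, Thriller]
titles = ['Movie A', 'Movie B', 'Movie C', 'Movie D']
genre_matrix = np.array([
    [1, 0, 0],  # Movie A: Action
    [1, 0, 1],  # Movie B: Action, Thriller
    [0, 1, 0],  # Movie C: Comedy
    [1, 0, 0],  # Movie D: Action
], dtype=float)

# Cosine similarity: normalize each row to unit length, then take dot products
unit = genre_matrix / np.linalg.norm(genre_matrix, axis=1, keepdims=True)
sim = unit @ unit.T

print(recommend('Movie A', titles, sim, n_recommendations=2))
# ['Movie D', 'Movie B']
```

Note that `titles.index` returns the first match, so this sketch assumes titles are unique. Remakes that share a title would need a different key, such as a movie ID.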