Content-based recommenders: suggest items similar to a particular item. This kind of system uses item metadata (for movies: genre, director, description, actors, and so on) to make its recommendations. The general idea is that if a person likes a particular item, they will also like items that are similar to it. To find those, the system draws on the metadata of items the user has interacted with in the past. A good example is YouTube, which suggests new videos you might want to watch based on your viewing history.

--> Then also add a simple recommender feature:
This offers generalized recommendations to every user, based on movie popularity and/or genre. The basic idea is that movies that are more popular and critically acclaimed have a higher probability of being liked by the average audience. An example is the IMDb Top 250.
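
A simple recommender of this kind could be sketched roughly as below. This is only an illustration: the toy data is made up, the column names are borrowed from the CSV used later in this notebook, and `simple_recommend` is a hypothetical helper that ranks by view count as a stand-in for "popularity".

```python
import pandas as pd

# Toy data, made up for illustration; column names mirror the notebook's CSV.
videos = pd.DataFrame({
    "vid_id": ["a", "b", "c", "d"],
    "category": ["Workout", "Workout", "Short Movie", "Short Movie"],
    "views": [1000, 50000, 200, 30000],
    "likes": [100, 1500, 50, 2400],
})

def simple_recommend(frame, category=None, top_n=2):
    """Hypothetical helper: the top_n most-viewed videos, optionally in one category."""
    if category is not None:
        frame = frame[frame["category"] == category]
    return frame.sort_values("views", ascending=False).head(top_n)

# The same list is returned for every user; no personal history is involved.
top_overall = simple_recommend(videos)
top_short = simple_recommend(videos, category="Short Movie", top_n=1)
print(top_overall["vid_id"].tolist())
print(top_short["vid_id"].tolist())
```

Ranking by raw views is the crudest possible choice; a like ratio or a weighted score would fit the same function shape.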

Source: https://github.com/codeheroku/Introduction-to-Machine-Learning/blob/master/Building%20a%20Movie%20Recommendation%20Engine/Assignment%20Solution.ipynb

In [2]:
# Content-based recommender

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [3]:
###### Helper functions. Use them when needed. #######
def get_title_from_index(index):
    # Despite its name, this returns the video ID (vid_id) stored at the given row index.
    return df.loc[index, "vid_id"]

def get_index_from_videoID(vid_id):
    # Row index of the first video whose ID matches vid_id.
    return df[df.vid_id == vid_id].index[0]

##################################################

In [4]:
##Step 1: Read CSV File
df = pd.read_csv("df_content_v2.csv")
print(df.columns)

Index(['vid_title', 'vid_id', 'category', 'duration', 'views', 'likes', 'url',
       'description', 'tags'],
      dtype='object')


In [5]:
##Step 2: Select features that could be important

features = ['vid_title','category','description','tags']

In [6]:
##Step 3: Create a column in DF which combines all selected features
for feature in features:
	df[feature] = df[feature].fillna('')

def combine_features(row):
    # Join the selected text fields into one space-separated string.
    # A non-string value raises a TypeError here instead of silently producing None.
    return " ".join(row[feature] for feature in features)

df["combined_features"] = df.apply(combine_features,axis=1)

print("Combined Features:", df["combined_features"].head())

Combined Features: 0    5 Minute Abs Desk Workout | Exercises For a Fl...
1    Real Time Desk Exercises &amp; Stretches - Ask...
2    Standing Desk Workout - 5 Exercises to do at W...
3    Easy 10-Minute Workout You Can Do at Your Desk...
4    How to Get Your Daily Workout at Your Desk | W...
Name: combined_features, dtype: object
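
As a possible alternative to the row-wise `apply` above, the same combination can be written as a single chained expression. The snippet below is a sketch on made-up toy data (the `toy` frame and `cols` list are illustrative names, not part of the notebook's dataset).

```python
import pandas as pd

# Toy frame with missing values, made up for illustration.
toy = pd.DataFrame({
    "vid_title": ["Desk Workout", None],
    "category": ["Workout", "Short Movie"],
    "description": [None, "A drama"],
    "tags": ["['desk']", "['film']"],
})
cols = ["vid_title", "category", "description", "tags"]

# Replace missing values with empty strings, then join each row with spaces.
toy["combined_features"] = toy[cols].fillna("").agg(" ".join, axis=1)
print(toy["combined_features"].tolist())
```

Missing fields leave an extra space behind, which is harmless here because the vectorizer in the next step splits on word boundaries.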


In [7]:
##Step 4: Create count matrix from this new combined column
cv = CountVectorizer()

count_matrix = cv.fit_transform(df["combined_features"])
print(count_matrix)

  (0, 6725)	5
  (0, 680)	10
  (0, 3120)	5
  (0, 11418)	22
  (0, 3842)	2
  (0, 4207)	6
  (0, 4138)	4
  (0, 1613)	3
  (0, 10390)	3
  (0, 9077)	2
  (0, 5447)	3
  (0, 7676)	1
  (0, 10490)	7
  (0, 10510)	1
  (0, 11577)	5
  (0, 10684)	1
  (0, 1052)	7
  (0, 1993)	1
  (0, 3982)	2
  (0, 5213)	8
  (0, 2413)	1
  (0, 7379)	6
  (0, 1303)	5
  (0, 11410)	3
  (0, 6255)	8
  :	:
  (699, 11633)	1
  (699, 5216)	1
  (699, 6921)	1
  (699, 3994)	1
  (699, 9229)	1
  (699, 1005)	1
  (699, 1267)	1
  (699, 9653)	1
  (699, 9996)	1
  (699, 9170)	1
  (699, 8580)	1
  (699, 10922)	1
  (699, 7022)	1
  (699, 8761)	1
  (699, 3194)	1
  (699, 8726)	1
  (699, 1239)	1
  (699, 9651)	1
  (699, 1416)	1
  (699, 3449)	1
  (699, 1270)	2
  (699, 3683)	1
  (699, 6675)	1
  (699, 4572)	1
  (699, 10122)	1
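
Each entry in the printout above reads as `(row, column)  count`: the row is the video, the column is a word in the learned vocabulary, and the value is how often that word occurs in the video's combined text. A tiny made-up example (the two `docs` strings are assumptions for illustration) makes the mapping easier to see:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two made-up "combined features" strings.
docs = ["desk workout desk", "short film drama"]

toy_cv = CountVectorizer()
toy_counts = toy_cv.fit_transform(docs)

# vocabulary_ maps each word to its column index; sort the words by that index.
vocab = sorted(toy_cv.vocabulary_, key=toy_cv.vocabulary_.get)
print(vocab)
print(toy_counts.toarray())  # dense view: one row per document, one column per word
```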


In [8]:
##Step 5: Compute the Cosine Similarity based on the count_matrix
cosine_sim = cosine_similarity(count_matrix) 
video_user_likes = "GwRzjFQa_Og"
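
For intuition, cosine similarity between two count vectors is their dot product divided by the product of their lengths. The sketch below computes it by hand with NumPy on made-up vectors; `cosine` is a hypothetical helper, not the scikit-learn function used above.

```python
import numpy as np

def cosine(u, v):
    """Hypothetical helper: cosine similarity of two non-zero 1-D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up word-count vectors over a five-word vocabulary.
a = [2, 0, 0, 0, 1]
b = [0, 1, 1, 1, 0]  # shares no words with a
c = [1, 0, 0, 0, 1]  # shares both of a's words

sim_ab = cosine(a, b)  # no overlap, so the dot product is zero
sim_ac = cosine(a, c)  # strong overlap, close to 1
sim_aa = cosine(a, a)  # a vector compared with itself
print(sim_ab, sim_ac, sim_aa)
```

Because counts are never negative, the scores fall between 0 (no shared words) and 1 (same direction), which is why a video's similarity to itself sits at the top of its own ranking.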

In [9]:
## Step 6: Get the row index of this video from its ID, then pair every video's index with its similarity score
video = get_index_from_videoID(video_user_likes)
similar_videos =  list(enumerate(cosine_sim[video]))
print(video)

501


In [10]:
## Step 7: Get a list of similar videos in descending order of similarity score
sorted_similar_videos = sorted(similar_videos,key=lambda x:x[1],reverse=True)

In [11]:
# Re-rank by view count, most viewed first. This re-sorts the entire list,
# so the similarity order from Step 7 is not preserved.
sorted_by_views = sorted(sorted_similar_videos, key=lambda x: df["views"][x[0]], reverse=True)

In [12]:
recommendations = []
print("Suggesting top 5 videos in order of views:\n")
for idx, _score in sorted_by_views[:5]:
    vid_id = get_title_from_index(idx)  # returns the video ID, despite the name
    print(vid_id)
    recommendations.append(vid_id)

Suggesting top 5 videos in order of views:

07d2dXHYb94
RKU6x1n9Hak
rZi_8t0xK44
afghBre8NlI
DODLEX4zzLQ
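
One way to combine similarity with popularity, so that the popularity ranking only reorders videos that are already similar, is to shortlist the top-k similar videos first and then sort that shortlist. The sketch below uses made-up data; `top_similar_by_likes`, `toy_df`, and `toy_scores` are illustrative names and this is not the notebook's original logic.

```python
import pandas as pd

# Made-up data: row 0 is the query video "q".
toy_df = pd.DataFrame({
    "vid_id": ["q", "a", "b", "c", "d"],
    "likes": [10, 5, 500, 50, 9000],
})
# Made-up similarity of each row to the query video.
toy_scores = [1.0, 0.9, 0.8, 0.7, 0.1]

def top_similar_by_likes(frame, scores, query_idx, k=3):
    """Hypothetical helper: keep the k most similar videos, then order them by likes."""
    ranked = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)
    # Drop the query itself and keep only the k closest matches.
    shortlist = [idx for idx, _ in ranked if idx != query_idx][:k]
    # Popularity only reorders the shortlist; it cannot pull in dissimilar videos.
    shortlist.sort(key=lambda idx: frame.loc[idx, "likes"], reverse=True)
    return [frame.loc[idx, "vid_id"] for idx in shortlist]

picks = top_similar_by_likes(toy_df, toy_scores, query_idx=0, k=3)
print(picks)
```

Here video "d" has by far the most likes but is left out because it is not among the three most similar videos.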


In [13]:
## Step 8: Collect the IDs of the 11 most similar videos
## (the liked video itself is typically first, since its self-similarity is 1)
recommendations = [get_title_from_index(idx) for idx, _score in sorted_similar_videos[:11]]

In [15]:
#filter dataframe for recommended videos:
df[df['vid_id'].isin(recommendations)]

index,vid_title,vid_id,category,duration,views,likes,url,description,tags,combined_features
656,Award Winning Hindi Short Film | The Blue Helm...,kHq0jniKmIE,Short Movie,9.97,222396,5966,https://www.youtube.com/watch?v=kHq0jniKmIE,"Mira, mid fifties lives alone in an apartment ...","['short film', 'short film hindi', 'hindi shor...",Award Winning Hindi Short Film | The Blue Helm...
658,Dear Biwi ( Short Film ) | Rahim Pardesi | Hee...,iFuY20IERq8,Short Movie,9.45,1384524,65139,https://www.youtube.com/watch?v=iFuY20IERq8,Dear Biwi ( Short Film ) | Rahim Pardesi | Hee...,"['rahim pardesi', 'new rahim pardesi video', '...",Dear Biwi ( Short Film ) | Rahim Pardesi | Hee...
659,Painkiller | Dark Comedy Short Film | MYM,HgZIYY1Ty7o,Short Movie,14.5,1510439,32423,https://www.youtube.com/watch?v=HgZIYY1Ty7o,"A ""dark comedy short film"" which sees street-s...","['painkiller short film', 'dark comedy short f...",Painkiller | Dark Comedy Short Film | MYM Shor...
666,Award Winning Hindi Short Film | Masala Steps ...,24Db-vgCsDE,Short Movie,20.93,235225,5355,https://www.youtube.com/watch?v=24Db-vgCsDE,An investment advisor Paresh (Vikram Kochhar) ...,"['hindi short films award winning', 'husband w...",Award Winning Hindi Short Film | Masala Steps ...
667,Stationary (2020) | Drama Short Film | MYM,WSvsRe4hqCs,Short Movie,12.78,172883,3800,https://www.youtube.com/watch?v=WSvsRe4hqCs,Taking place in a parked car over the course o...,"['short film', 'short film uk', 'short drama f...",Stationary (2020) | Drama Short Film | MYM Sho...
676,JOY (2020) | Drama Short Film | MYM [2K],qIwPsgP9KSg,Short Movie,23.05,34864,2694,https://www.youtube.com/watch?v=qIwPsgP9KSg,Joy is an emotional short film following the s...,"['joy 2020', 'short film 2020', 'joy', 'joy mo...",JOY (2020) | Drama Short Film | MYM [2K] Short...
679,WAJOOD (Selfhood) वजूद | Short Film | Bawra Ma...,r6qmspmMvS0,Short Movie,12.22,4360699,31449,https://www.youtube.com/watch?v=r6qmspmMvS0,"We all deserve to be loved, don't we? \nWAJOOD...","['Wajood', 'Selfhood', 'Bawra Manjhi', 'vishal...",WAJOOD (Selfhood) वजूद | Short Film | Bawra Ma...
680,SHIN-CHAN - HIMAWARI : शीन चैन 2 SHORT FILM | ...,DqbSXcLMr7c,Short Movie,12.0,215851,10560,https://www.youtube.com/watch?v=DqbSXcLMr7c,Shin Chan Is A Short Film About A Boy Shin Cha...,"['dogs love', 'short film', 'new short film', ...",SHIN-CHAN - HIMAWARI : शीन चैन 2 SHORT FILM | ...
687,The Twist | Short Film | Ritvik Sahore | Susha...,AxOe5CiNydg,Short Movie,13.35,3444180,120202,https://www.youtube.com/watch?v=AxOe5CiNydg,👉🏻 SUBSCRIBE to Zee Music Company - https://bi...,"['The Twist', 'short film', 'short movie', 'sh...",The Twist | Short Film | Ritvik Sahore | Susha...
693,ਮਾੜਾ ਸਮਾਂ true story #RishtayForever New punja...,Y5neQNTNPO8,Short Movie,10.38,10930,576,https://www.youtube.com/watch?v=Y5neQNTNPO8,"Hello friends,\nthis is Raman Saroya the owner...","['RISHTAYFOREVER', 'PUNJABI VIDEOS', 'rishtay ...",ਮਾੜਾ ਸਮਾਂ true story #RishtayForever New punja...
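
The steps above could be wrapped into one reusable function along these lines. This is a sketch on made-up data, assuming a frame that already has `vid_id` and `combined_features` columns; `recommend` is a hypothetical name and it excludes the query video from its own results.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up videos: two workout clips and two short films.
toy_df = pd.DataFrame({
    "vid_id": ["w1", "w2", "s1", "s2"],
    "combined_features": [
        "desk workout abs exercises",
        "standing desk workout exercises",
        "hindi short film drama",
        "dark comedy short film",
    ],
})

def recommend(frame, vid_id, k=2):
    """Hypothetical helper: IDs of the k videos most similar to vid_id."""
    counts = CountVectorizer().fit_transform(frame["combined_features"])
    sim = cosine_similarity(counts)
    positions = np.flatnonzero(frame["vid_id"].to_numpy() == vid_id)
    if len(positions) == 0:
        raise KeyError(f"unknown video id: {vid_id}")
    query = int(positions[0])
    # Rank every row by similarity to the query, most similar first.
    ranked = sorted(enumerate(sim[query]), key=lambda x: x[1], reverse=True)
    # Skip the query itself, then keep the first k matches.
    return [frame["vid_id"].iloc[i] for i, _ in ranked if i != query][:k]

recs_short = recommend(toy_df, "s1", k=1)
recs_workout = recommend(toy_df, "w1", k=1)
print(recs_short, recs_workout)
```

Recomputing the similarity matrix on every call keeps the sketch short; for a larger dataset it would be computed once and reused.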
