# Requirements of Totality Corp:
<b>Assignment 1 </b><br>(Design Recommendation System Architecture)<br>
Content discovery is a vital part of Yovo. Users often don't know what they want to watch and need a way to discover content without searching for it. <b>Create a feed personalisation algorithm strategy which enables users to discover the right content. The underlying algorithm must strike an elegant balance between Machine Learning and giving the user control over what content they want to see.</b>
Note:
- Video has certain text attributes: tags, category, title text (context)
- Use pseudocode wherever necessary
- Download yovo app to see actual feed



# Potential solutions to the problem:

### Recommender Systems generally follow one of two methods:
##### 1. Collaborative filtering 
    
This approach <b>does not apply to the Yovo app, because users have no option to like videos in their feed.  
It would require the app to track each user's likes, shares and similar signals in the form of a user-item matrix.</b>
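
To make the missing ingredient concrete, here is a minimal sketch of the kind of user-item matrix collaborative filtering would need. The event log, its field names and the signal types are hypothetical; nothing here reflects data that Yovo is known to collect.

```python
from collections import defaultdict

# Hypothetical interaction log: (user_id, video_id, signal).
# These fields are assumptions used only to illustrate the required data.
events = [
    ("u1", "v1", "like"),
    ("u1", "v2", "share"),
    ("u2", "v1", "like"),
    ("u3", "v3", "like"),
]

def build_user_item_matrix(events):
    """Return {user_id: {video_id: interaction_count}}."""
    matrix = defaultdict(lambda: defaultdict(int))
    for user_id, video_id, _signal in events:
        matrix[user_id][video_id] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {user: dict(items) for user, items in matrix.items()}

user_item = build_user_item_matrix(events)
print(user_item)
```

Without such per-user signals the matrix stays empty, which is why the content-based approach below is the better fit.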


##### 2. Content Based filtering (Suitable for YOVO app)

This approach uses a set of discrete characteristics of an item to recommend additional items with similar properties.

<b>For Yovo the items are videos, and their characteristics are the tags, category and title text (context).

This method is suitable when item metadata is available but no matrix of user IDs and preferences exists.</b>


# Practical Example of Content-Based Filtering (Architecture):

## The Dataset 

The dataset contains 500 entries for different items such as shoes and shirts, each with an item id and a <b>textual description of the item.</b><br>
The system creates a profile for each item and recommends similar items.

<b>For Totality, I imagine this dataset will be replaced with one containing a textual description of each video (videoId),
built from the tags or title attributes.</b>
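
As a rough sketch of that replacement, the text attributes of a video could be joined into one description string per videoId. The record layout and field names (`videoId`, `title`, `category`, `tags`) are assumptions based on the attributes named in the assignment, not a known Yovo schema.

```python
# Hypothetical video records; field names are assumptions, not a real schema.
videos = [
    {"videoId": 101, "title": "Street food tour", "category": "Food",
     "tags": ["travel", "street food"]},
    {"videoId": 102, "title": "Quick pasta recipe", "category": "Food",
     "tags": ["cooking", "pasta"]},
]

def build_description(video):
    """Join title, category and tags into one lower-cased text string."""
    parts = [video.get("title", ""), video.get("category", "")]
    parts.extend(video.get("tags", []))
    return " ".join(p.strip().lower() for p in parts if p and p.strip())

# One text description per video, playing the role of the 'description' column.
descriptions = {v["videoId"]: build_description(v) for v in videos}
print(descriptions)
```

The resulting strings can then stand in for the `description` column used in the process below.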

## Process

#### 1. Extract TF * IDF [(term frequency)*(Inverse document frequency)] Score

The TF*IDF algorithm scores how important a word is within an item's description (for Yovo, its tags or title) relative to the whole dataset. This score is computed for each word of each item.<br>
<b>This is implemented using scikit-learn's built-in TF-IDF vectorizer (`TfidfVectorizer`).</b>
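
To show what such a score looks like, here is a small hand-rolled sketch in plain Python. It uses the smoothed formula idf = ln((1 + n) / (1 + df)) + 1 followed by L2 normalization, which is what scikit-learn documents as its default. The tokenization is a plain whitespace split with no stop-word removal or n-grams, so the numbers will not match the `TfidfVectorizer` call used in the notebook. The sample documents are made up.

```python
import math
from collections import Counter

# Made-up tag strings standing in for video descriptions.
docs = [
    "funny cat video",
    "funny dog video",
    "cooking pasta recipe",
]

def tfidf_vectors(docs):
    """Return (vocab, vectors) with one L2-normalized TF-IDF vector per doc."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

vocab, vectors = tfidf_vectors(docs)
for term, weight in zip(vocab, vectors[0]):
    print(term, round(weight, 3))
```

In the first document the rare term `cat` receives a higher weight than `funny` or `video`, which also appear in the second document.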

#### 2. Calculating Similarity using Cosine Similarity  
Once we have a vector for each item, we can use cosine similarity to find items that are similar.<br>
Cosine similarity measures the cosine of the angle between two item vectors: the closer the value is to 1, the more similar the items.<br>

<b>This is done using the linear_kernel method of scikit-learn (the first cell below calls cosine_similarity instead, which gives the same scores because TfidfVectorizer L2-normalizes its vectors by default). It takes the TF-IDF matrix of the items as input and compares every pair of items. </b>
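
The relationship between a plain dot product and cosine similarity can be checked with a few lines of plain Python. This is only an illustrative sketch with made-up vectors, not the scikit-learn implementation.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine of the angle between a and b (0.0 if either vector is all zeros)."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)

def normalize(v):
    """Scale v to unit length (L2 norm of 1)."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n else list(v)

a = [1.0, 2.0, 0.0]
b = [2.0, 4.0, 0.0]  # same direction as a
c = [0.0, 0.0, 3.0]  # orthogonal to a

print(cosine(a, b))                    # close to 1: same direction
print(cosine(a, c))                    # 0.0: nothing in common
print(dot(normalize(a), normalize(b))) # dot product of unit vectors
```

For unit-length vectors the dot product and the cosine coincide, which is why a linear kernel is enough once the TF-IDF rows are L2-normalized.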
    
#### 3. Store results of cosine similarity 
The results of cosine similarity are stored in the dictionary `results`, keyed by item id, with each item's neighbours sorted by decreasing similarity to item i.
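
A minimal sketch of this storage step, using a small made-up similarity matrix in plain Python lists. The helper name `build_results` and the `top_k` parameter are illustrative assumptions. Unlike slicing off the first entry, this variant excludes the item itself by index, which stays correct when two different items have a similarity of 1.0.

```python
def build_results(ids, sim_matrix, top_k=3):
    """Map each item id to its top_k (score, other_id) pairs, best first."""
    results = {}
    for i, item_id in enumerate(ids):
        # Pair every other item with its similarity to item i.
        scored = [(sim_matrix[i][j], ids[j]) for j in range(len(ids)) if j != i]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        results[item_id] = scored[:top_k]
    return results

# Made-up ids and pairwise similarities (symmetric, 1.0 on the diagonal).
ids = [10, 20, 30]
sim = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]

results = build_results(ids, sim, top_k=2)
print(results[10])
```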

#### 4. Recommending Items
The function `recommend` takes the item for which recommendations are to be made and the number of recommendations, and reads the most similar items out of `results`.
<br>
We also pass a threshold value so that only recommendations above a certain similarity score are returned.<br>
<b> The recommended items can then be fed into the user's personalized feed as relevant video recommendations.</b>
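
One possible way to turn these per-item recommendations into a feed is sketched below. It is an assumption-laden illustration, not a definitive design: `build_feed`, the `categories` lookup and the `allowed_categories` filter are hypothetical names, the last one standing in for the user-control side of the assignment. The sketch collects candidates for each recently watched video, keeps those above the threshold, and ranks them by their best score.

```python
def build_feed(watched_ids, results, threshold=0.3, limit=10,
               categories=None, allowed_categories=None):
    """Rank unseen videos similar to the watched ones, best score first."""
    watched = set(watched_ids)
    best = {}
    for watched_id in watched_ids:
        for score, candidate in results.get(watched_id, []):
            # Skip videos already seen or not similar enough.
            if candidate in watched or score <= threshold:
                continue
            # Optional user control: keep only categories the user opted into.
            if allowed_categories is not None and categories is not None:
                if categories.get(candidate) not in allowed_categories:
                    continue
            best[candidate] = max(score, best.get(candidate, 0.0))
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [video_id for video_id, _ in ranked[:limit]]

# Made-up similarity results in the same (score, id) shape as `results`.
sample_results = {
    1: [(0.9, 2), (0.4, 3), (0.1, 4)],
    2: [(0.9, 1), (0.6, 3), (0.5, 5)],
}
sample_categories = {3: "comedy", 4: "comedy", 5: "sports"}

feed_all = build_feed([1, 2], sample_results)
feed_sports = build_feed([1, 2], sample_results,
                         categories=sample_categories,
                         allowed_categories={"sports"})
print(feed_all, feed_sports)
```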

In [31]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel,cosine_similarity 

# reads dataset 
ds = pd.read_csv("sample-data.csv")

ds.head()

# 1. calculates tf-idf scores for items
# min_df=1 keeps every term; recent scikit-learn releases reject an integer min_df of 0
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, stop_words='english')
tfidf_matrix = tf.fit_transform(ds['description'])


# 2. Calculate cosine similarity 
cosine_similarities = cosine_similarity(tfidf_matrix, tfidf_matrix) 
results = {}

# 3. Saving results in order of similarity to item i to dictionary results

for idx, row in ds.iterrows():
    # indices of the 99 most similar items, most similar first
    similar_indices = cosine_similarities[idx].argsort()[:-100:-1]
    similar_items = [(cosine_similarities[idx][i], ds['id'][i]) for i in similar_indices]

    # skip the first entry, which is normally the item itself (similarity 1.0)
    results[row['id']] = similar_items[1:]

print('Saved')

def item(id):
    return ds.loc[ds['id'] == id]['description'].tolist()[0].split(' - ')[0]

# 4.a Just reads the results out of the dictionary.
def recommend(item_id, num, threshold):
    print("Recommending " + str(num) + " products similar to " + item(item_id) + " with threshold " + str(threshold) + "...")
    print("-------")
    recs = results[item_id][:num]
    for rec in recs:
        
        # Only print items above the similarity threshold (e.g. 0.5 for 50%)
        if(rec[0]>threshold):
            print("Recommended: " + item(rec[1]) + " (score:" + str(rec[0]) + ")")

# 4. Recommend items similar to item_id and num = number of recommendations with a similarity threshold value 


recommend(item_id=9, num=5, threshold= 0.3)  # Recommend  5 items similar to item_id 9 with similarity of >30%
print('')
print('')
print('')
recommend(item_id=22, num=10, threshold= 0.5) # Recommend  10 items similar to item_id 22 with similarity of >50%
for itemid in range(1,500):
    print('')
    print('')
    recommend(item_id=itemid, num=10, threshold= 0.2)
    print('')
    print('')


Saved
Recommending 5 products similar to Baby micro d-luxe cardigan with threshold 0.3...
-------
Recommended: Micro d-luxe cardigan (score:0.37550840843454386)



Recommending 10 products similar to Cap 2 t-shirt with threshold 0.5...
-------
Recommended: Cap 2 crew (score:0.7049890803634628)
Recommended: Cap 2 t-shirt (score:0.7041906217524192)
Recommended: Cap 2 cap sleeve (score:0.6635362790007221)
Recommended: Cap 2 zip neck (score:0.6225162259563579)
Recommended: Cap 2 zip neck (score:0.5295225236280278)
Recommended: Cap 2 v-neck (score:0.5094581638236435)


Recommending 10 products similar to Active classic boxers with threshold 0.2...
-------
Recommended: Cap 1 boxer briefs (score:0.2203792147261746)




Recommending 10 products similar to Active sport boxer briefs with threshold 0.2...
-------
Recommended: Active sport briefs (score:0.4181663992161582)




Recommending 10 products similar to Active sport briefs with threshold 0.2...
-------
Recommended: Active sport boxer brie

Recommended: Organic logo t-shirt (score:0.3335370611111286)
Recommended: Mountain island t-shirt (score:0.33051956151354395)
Recommended: Squid t-shirt (score:0.3304360043896634)
Recommended: Fish frenzy t-shirt (score:0.32911161576557)
Recommended: Trout head t-shirt (score:0.3251562642285503)




Recommending 10 products similar to Duck pants with threshold 0.2...
-------
Recommended: Duck pants (score:0.9139073627612282)
Recommended: Duck pants (score:0.9129682337046948)
Recommended: Duck shorts (score:0.33733748601845526)
Recommended: Custodian pants (score:0.2713781940673174)
Recommended: Custodian pants (score:0.24146132925667188)
Recommended: Custodian pants (score:0.24120636598984607)




Recommending 10 products similar to Elias fz sweatshirt with threshold 0.2...
-------
Recommended: Elias sweatshirt (score:0.2782682084187074)
Recommended: Cotton fleece hoody (score:0.26265607432431826)




Recommending 10 products similar to Elias sweatshirt with threshold 0.2...
-------
Re

Recommended: Synch vest (score:0.261138002095995)




Recommending 10 products similar to Torrentshell jkt with threshold 0.2...
-------
Recommended: Rain shadow jkt (score:0.23885819327432883)
Recommended: Rain shadow jkt (score:0.2363945317399587)
Recommended: Rain shadow pants (score:0.22025132122716704)
Recommended: Rain shadow pants (score:0.21993842689385612)
Recommended: Torrentshell pants (score:0.21925039681865538)
Recommended: Torrentshell pants (score:0.21612667322608442)
Recommended: Torrentshell pants (score:0.21612259422748661)
Recommended: Rain shadow trench coat (score:0.20927125622854106)
Recommended: Torrentshell jkt (score:0.20142122220791234)




Recommending 10 products similar to La surfer maria t-shirt with threshold 0.2...
-------
Recommended: Organic logo t-shirt (score:0.3692080722309289)
Recommended: Mountain island t-shirt (score:0.366919468434053)
Recommended: Dragoons t-shirt (score:0.3613339630219294)
Recommended: Flying fish 2 t-shirt (score:0.3341259494

Recommended: Merino 2 crew (score:0.5683494594687434)
Recommended: Merino 2 t-shirt (score:0.5474223305741465)
Recommended: Merino 2 dress (score:0.5266566579166622)
Recommended: Merino 1 t-shirt (score:0.4420357576662773)
Recommended: Merino 1 crew (score:0.44173832718983536)
Recommended: Merino 1 crew (score:0.44078461163020627)




Recommending 10 products similar to Merino 2 t-shirt with threshold 0.2...
-------
Recommended: Merino 2 crew (score:0.7387583673473276)
Recommended: Merino 2 polo (score:0.6627218930438882)
Recommended: Merino 2 bottoms (score:0.6433303120933099)
Recommended: Merino 2 bottoms (score:0.6354414878769344)
Recommended: Merino 2 t-shirt (score:0.6213454076383335)
Recommended: Merino 2 crew (score:0.6097511363106206)
Recommended: Merino 2 dress (score:0.5396530354364161)
Recommended: Merino 1 t-shirt (score:0.469709488093083)
Recommended: Merino 1 crew (score:0.4603842313913137)
Recommended: Merino 1 t-shirt (score:0.458477566949799)




Recommending 10 produc

Recommending 10 products similar to Runshade top with threshold 0.2...
-------
Recommended: L/s runshade top (score:0.6118589217273248)
Recommended: Runshade t-shirt (score:0.5386214969873796)
Recommended: Runshade t-shirt (score:0.5299142171988567)
Recommended: L/s runshade top (score:0.476473219459205)




Recommending 10 products similar to Shop pants with threshold 0.2...
-------
Recommended: Shop pants (score:0.908753317300124)
Recommended: Shop pants (score:0.9085259342535041)
Recommended: Custodian pants (score:0.273793740065032)
Recommended: Custodian pants (score:0.2597508298428583)
Recommended: Custodian pants (score:0.25939696332174766)
Recommended: Reg fit organic ctn jeans-reg (score:0.2479546098624955)
Recommended: Reg fit organic ctn jeans-long (score:0.23946612142047038)
Recommended: Reg fit organic ctn jeans-short (score:0.23905244939639902)




Recommending 10 products similar to Shop pants with threshold 0.2...
-------
Recommended: Shop pants (score:0.908753317300124

Recommended: Ultra lw endurance ped socks (score:0.25966656464295407)
Recommended: Lw endurance ankle socks (score:0.24613399760856947)
Recommended: Lw everyday socks (score:0.20034710899949698)




Recommending 10 products similar to Velocity cap with threshold 0.2...
-------
Recommended: Velocity visor (score:0.37423906554910363)
Recommended: Logo visor (score:0.34254585796782283)
Recommended: Bimini cap (score:0.26695066186505606)




Recommending 10 products similar to Watermaster hip highs with threshold 0.2...
-------
Recommended: Guidewater waders (score:0.4127584882598938)
Recommended: Guidewater waders (score:0.4106052517131119)
Recommended: Watermaster waders (score:0.39980878180576895)
Recommended: Watermaster waders (score:0.399783373732339)
Recommended: Watermaster waders (score:0.3997378539482242)
Recommended: Watermaster waders (score:0.39964417876770364)




Recommending 10 products similar to World according to bikers t-shir with threshold 0.2...
-------
Recommended: F

Recommended: Cap 3 bottoms (score:0.268659496717569)




Recommending 10 products similar to Cap 1 t-shirt with threshold 0.2...
-------
Recommended: Cap 1 graphic t-shirt (score:0.6296589693391986)
Recommended: Cap 1 t-shirt (score:0.5766557156725726)
Recommended: Cap 1 scoop (score:0.574899830223809)
Recommended: Cap 1 graphic t-shirt (score:0.5374827505943757)
Recommended: Cap 1 crew (score:0.5341938228481482)
Recommended: Cap 1 graphic crew (score:0.5122701521239463)
Recommended: Cap 1 bottoms (score:0.49086410290038135)
Recommended: Cap 1 bottoms (score:0.48930235404068384)
Recommended: Cap 1 graphic tee (score:0.28598921277152983)




Recommending 10 products similar to Cap 2 bottoms with threshold 0.2...
-------
Recommended: Cap 2 bottoms (score:0.615185258187686)
Recommended: Cap 2 t-shirt (score:0.3846137053121345)
Recommended: Cap 2 crew (score:0.37226731482237774)
Recommended: Cap 2 zip neck (score:0.3614905740829708)
Recommended: Cap 2 t-shirt (score:0.36133399207736067)
Re

Recommended: Merino 1 t-shirt (score:0.6983391545795753)
Recommended: Merino 1 crew (score:0.6168376730361484)
Recommended: Merino 2 bottoms (score:0.47577285831584376)
Recommended: Merino 2 bottoms (score:0.4583042769288987)
Recommended: Merino 2 crew (score:0.4528023389148849)
Recommended: Merino 2 t-shirt (score:0.4478660780216554)
Recommended: Merino 2 t-shirt (score:0.4392176679632506)




Recommending 10 products similar to Merino 1 tank with threshold 0.2...
-------
Recommended: Merino 1 crew (score:0.8785539575354706)
Recommended: Merino 1 t-shirt (score:0.8464071452201953)
Recommended: Merino 1 graphic t-shirt (score:0.8268955309642986)
Recommended: Merino 1 t-shirt (score:0.6738398804186708)
Recommended: Merino 1 crew (score:0.5795538801791587)
Recommended: Merino 2 bottoms (score:0.4622176228841482)
Recommended: Merino 2 bottoms (score:0.4452066943860219)
Recommended: Merino 2 crew (score:0.4409528684236743)
Recommended: Merino 2 t-shirt (score:0.43784025158152606)
Recommend




Recommending 10 products similar to Solimar shorts with threshold 0.2...
-------
Recommended: Solimar pants (score:0.4786796733425654)
Recommended: Solimar skirt (score:0.3398238319486447)




Recommending 10 products similar to S/s a/c shirt with threshold 0.2...
-------
Recommended: Sleeveless a/c shirt (score:0.47740989747230084)




Recommending 10 products similar to S/s rashguard with threshold 0.2...
-------
Recommended: L/s rashguard (score:0.6668707412901518)
Recommended: S/s rashguard (score:0.5124016137205434)
Recommended: L/s rashguard (score:0.4848634690827261)
Recommended: L/s hooded rashguard (score:0.4563928739536496)
Recommended: Rashguard (score:0.29598830799473974)
Recommended: Sun glove (score:0.20296186696334617)




Recommending 10 products similar to S/s sol patrol shirt with threshold 0.2...
-------
Recommended: Sol patrol shirt (score:0.48432824258789176)
Recommended: S/s sol patrol shirt (score:0.2338708370994384)
Recommended: L/s sol patrol shirt (score:0.

Recommending 10 products similar to Sol patrol shirt with threshold 0.2...
-------
Recommended: S/s sol patrol shirt (score:0.48432824258789176)




Recommending 10 products similar to Sport top with threshold 0.2...
-------
Recommended: Active classic cami (score:0.27707959791357367)




Recommending 10 products similar to Meridian board shorts with threshold 0.2...
-------
Recommended: Girona board shorts (score:0.4476156051343866)
Recommended: Wavefarer board shorts (score:0.26280808525009236)




Recommending 10 products similar to Merino 1 crew with threshold 0.2...
-------
Recommended: Merino 1 t-shirt (score:0.9186840685666897)
Recommended: Merino 1 graphic t-shirt (score:0.9137388147141502)
Recommended: Merino 1 tank (score:0.8785539575354706)
Recommended: Merino 1 t-shirt (score:0.7559689146017282)
Recommended: Merino 1 crew (score:0.6385540623104611)
Recommended: Merino 2 bottoms (score:0.4995122651129927)
Recommended: Merino 2 bottoms (score:0.48117297319073316)
Recommended:

Recommending 10 products similar to Versatili-tee with threshold 0.2...
-------
Recommended: Versatiliti tee (score:0.39103997927626244)
Recommended: Versatiliti tank (score:0.3707327530898328)
Recommended: Versatiliti cardi (score:0.3654315836344197)
Recommended: Versatiliti polo (score:0.288086308762844)




Recommending 10 products similar to Versatiliti cardi with threshold 0.2...
-------
Recommended: Versatiliti tank (score:0.38108743563985825)
Recommended: Versatili-tee (score:0.3654315836344197)
Recommended: Versatiliti tee (score:0.3460128428257535)
Recommended: Versatiliti polo (score:0.31699166048005584)




Recommending 10 products similar to Versatiliti tank with threshold 0.2...
-------
Recommended: Versatiliti cardi (score:0.38108743563985825)
Recommended: Versatili-tee (score:0.3707327530898328)
Recommended: Versatiliti tee (score:0.34690029959730645)
Recommended: Versatiliti polo (score:0.3131627079404868)




Recommending 10 products similar to Active boy shorts with t

Recommended: Logo visor (score:0.40386415909056234)
Recommended: Velocity cap (score:0.37423906554910363)
Recommended: Bimini cap (score:0.28697284108739224)




Recommending 10 products similar to Versatiliti polo with threshold 0.2...
-------
Recommended: Versatiliti tee (score:0.4119054174837137)
Recommended: Versatiliti cardi (score:0.31699166048005584)
Recommended: Versatiliti tank (score:0.3131627079404868)
Recommended: Versatili-tee (score:0.288086308762844)




Recommending 10 products similar to Versatiliti tee with threshold 0.2...
-------
Recommended: Versatiliti polo (score:0.4119054174837137)
Recommended: Versatili-tee (score:0.39103997927626244)
Recommended: Versatiliti tank (score:0.34690029959730645)
Recommended: Versatiliti cardi (score:0.3460128428257535)




Recommending 10 products similar to Vintage logo pkt t-shirt with threshold 0.2...
-------
Recommended: Organic logo t-shirt (score:0.3756107019376241)
Recommended: Tarpon t-shirt (score:0.36561482739794426)
Reco



Recommending 10 products similar to Stand up shorts-7 in. with threshold 0.2...
-------
Recommended: Stand up shorts-5 in. (score:0.9616490301397086)




Recommending 10 products similar to Sticks 'n stones morocco poster with threshold 0.2...
-------
Recommended: Wyoming climbing poster (score:0.6720724010252392)
Recommended: Wild steelhead, alaska poster (score:0.6367033487849026)
Recommended: Going big in b.c. poster (score:0.6351296822709666)
Recommended: Lead an examined life poster (score:0.5647187318326301)
Recommended: Symmetry w16 poster (score:0.5506502377475058)
Recommended: Flyfishing the athabasca poster (score:0.5375184762507991)
Recommended: Traversing auguille d'entreves (score:0.5268481527546258)




Recommending 10 products similar to Peregrine t-shirt with threshold 0.2...
-------
Recommended: Fish frenzy t-shirt (score:0.4110301626864297)
Recommended: Trout head t-shirt (score:0.3746507592621579)
Recommended: Wind path t-shirt (score:0.36100709684469845)
Recommend



Recommending 10 products similar to Hip pack with threshold 0.2...
-------
Recommended: Pocket pack (score:0.27359141874773923)
Recommended: Atom (score:0.2545734766061306)
Recommended: Lightwire (score:0.21866871948410752)
Recommended: Single shot (score:0.21114229544356092)
Recommended: Crosstown (score:0.20333055417406964)




Recommending 10 products similar to Hooded monk sweatshirt with threshold 0.2...
-------




Recommending 10 products similar to Houdini full-zip jkt with threshold 0.2...
-------
Recommended: Houdini full-zip jkt (score:0.9376904578333136)




Recommending 10 products similar to Merino 2 bottoms with threshold 0.2...
-------
Recommended: Merino 2 bottoms (score:0.7843682198961147)
Recommended: Merino 2 crew (score:0.6466715499130407)
Recommended: Merino 2 t-shirt (score:0.6354414878769344)
Recommended: Merino 2 crew (score:0.6311892715268155)
Recommended: Merino 2 polo (score:0.6121277706073257)
Recommended: Merino 2 t-shirt (score:0.5887829006504969)
Recom

Recommended: Cap 2 bottoms (score:0.3002635911201806)
Recommended: Cap 4 bottoms (score:0.26279920332755136)
Recommended: Cap 2 bottoms (score:0.25479589708110395)
Recommended: Cap 3 bottoms (score:0.25403203925738566)
Recommended: Cap 4 bottoms (score:0.2265549359066764)
Recommended: Cap 1 bottoms (score:0.21495967877761227)




Recommending 10 products similar to Cap 3 crew with threshold 0.2...
-------
Recommended: Cap 3 crew (score:0.7734953701018465)
Recommended: Cap 3 zip neck (score:0.6309802636575084)
Recommended: Cap 3 bottoms (score:0.6055608676833139)
Recommended: Cap 3 bottoms (score:0.5529831864961339)
Recommended: Cap 2 crew (score:0.2608243059380334)
Recommended: Cap 2 t-shirt (score:0.22979312127318086)
Recommended: Cap 2 t-shirt (score:0.22239043571277542)
Recommended: Cap 3 crew (score:0.2215911354225018)
Recommended: Cap 2 cap sleeve (score:0.21597848229722838)
Recommended: Cap 3 bottoms (score:0.21580738082163753)




Recommending 10 products similar to Guidewater p

Recommending 10 products similar to Baby live simply deer t-shirt with threshold 0.2...
-------
Recommended: Baby live simply seal t-shirt (score:0.7371334886173992)
Recommended: Live simply deer t-shirt (score:0.5116094608869235)
Recommended: Live simply deer t-shirt (score:0.4652068666866127)
Recommended: Baby circus t-shirt (score:0.441860443463269)
Recommended: Baby tag you're it t-shirt (score:0.3827597256511779)
Recommended: Girl's live simply deer t-shirt (score:0.3640280930374177)
Recommended: Live simply guitar t-shirt (score:0.3411362565465133)
Recommended: Live simply bug t-shirt (score:0.3301615527444832)
Recommended: Rockpile t-shirt (score:0.32526708528728754)
Recommended: Flying fish 2 t-shirt (score:0.31336151058210865)




Recommending 10 products similar to Baby live simply seal t-shirt with threshold 0.2...
-------
Recommended: Baby live simply deer t-shirt (score:0.7371334886173992)
Recommended: Baby circus t-shirt (score:0.4160000825442999)
Recommended: Baby tag yo

Recommended: Cap 2 t-shirt (score:0.3723038081114073)
Recommended: Cap 2 zip neck (score:0.33874337630298856)
Recommended: Cap 2 v-neck (score:0.33541585404245494)
Recommended: Cap 3 bottoms (score:0.3002635911201806)
Recommended: Cap 3 bottoms (score:0.255209064611744)




Recommending 10 products similar to Cap 2 crew with threshold 0.2...
-------
Recommended: Cap 2 t-shirt (score:0.7049890803634628)
Recommended: Cap 2 t-shirt (score:0.6334452370495761)
Recommended: Cap 2 cap sleeve (score:0.6081146129427671)
Recommended: Cap 2 zip neck (score:0.5672627008820263)
Recommended: Cap 2 zip neck (score:0.5308649780944353)
Recommended: Cap 2 v-neck (score:0.5148329784985639)
Recommended: Cap 2 bottoms (score:0.38356142106852953)
Recommended: Cap 2 bottoms (score:0.37226731482237774)
Recommended: Cap 3 crew (score:0.3038439419387368)
Recommended: Cap 3 crew (score:0.2608243059380334)




Recommending 10 products similar to All-time shell with threshold 0.2...
-------
Recommended: All-time s

In [2]:
data = pd.read_csv('ml-latest-small/movies.csv')
data.info()
print('')
print('')


ratings = pd.read_csv('ml-latest-small/ratings.csv')
del ratings['timestamp']
ratings.info()
print('')
print('')

tags = pd.read_csv('ml-latest-small/tags.csv')
del tags['timestamp']
tags.info()
print('')
print('')

#Joining movies and ratings

movies_data = data.merge(ratings, on = 'movieId', how = 'inner')

#movies_data.info()
#movies_data.isnull().any()
#print(len(movies_data['movieId'].unique().tolist()))

# Drop duplicate movieId rows: the recommender takes a single item (a movieId here) as input and
# recommends from one text field per item, not from multiple duplicate rows for the same item.

movies_data = movies_data.drop_duplicates(subset = ['movieId'])
print('No of unique moviId in merged datset')
print(len(movies_data['movieId'].unique().tolist()))
print('')
print('')
movies_data.info()
print('')
print('')

#Joining movies_data and tags
tags = tags.drop_duplicates(subset = ['movieId'])
tags.info()

movies_data_tags = movies_data.merge(tags, on ='movieId', how = 'inner')
movies_data_tags.head()

#1554 unique movies with tags

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9742 entries, 0 to 9741
Data columns (total 3 columns):
movieId    9742 non-null int64
title      9742 non-null object
genres     9742 non-null object
dtypes: int64(1), object(2)
memory usage: 228.5+ KB


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100836 entries, 0 to 100835
Data columns (total 3 columns):
userId     100836 non-null int64
movieId    100836 non-null int64
rating     100836 non-null float64
dtypes: float64(1), int64(2)
memory usage: 2.3 MB


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3683 entries, 0 to 3682
Data columns (total 3 columns):
userId     3683 non-null int64
movieId    3683 non-null int64
tag        3683 non-null object
dtypes: int64(2), object(1)
memory usage: 86.4+ KB


No of unique moviId in merged datset
9724


<class 'pandas.core.frame.DataFrame'>
Int64Index: 9724 entries, 0 to 100835
Data columns (total 5 columns):
movieId    9724 non-null int64
title      9724 non-null object
genres     972

Unnamed: 0,movieId,title,genres,userId_x,rating,userId_y,tag
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,1,4.0,336,pixar
1,2,Jumanji (1995),Adventure|Children|Fantasy,6,4.0,62,fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance,1,4.0,289,moldy
3,5,Father of the Bride Part II (1995),Comedy,6,5.0,474,pregnancy
4,7,Sabrina (1995),Comedy|Romance,6,4.0,474,remake


from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# 1. calculates tf-idf scores for items
# min_df=1 keeps every term; recent scikit-learn releases reject an integer min_df of 0
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df=1, stop_words='english')
tfidf_matrix = tf.fit_transform(movies_data_tags['tag'])


# 2. Calculate cosine similarity (linear_kernel equals cosine similarity on L2-normalized tf-idf vectors)
cosine_similarities = linear_kernel(tfidf_matrix, tfidf_matrix)
results = {}

# 3. Saving results in order of similarity to item i

for idx, row in movies_data_tags.iterrows():
    similar_indices = cosine_similarities[idx].argsort()[:-100:-1]
    # store (score, movieId) pairs so that recommend() can look each movie up by id
    similar_items = [(cosine_similarities[idx][i], movies_data_tags['movieId'][i]) for i in similar_indices]

    results[row['movieId']] = similar_items[1:]

print('Saved')
print(results)

# Returns the title of the movie with the given movieId.
def item(id):
    return movies_data_tags.loc[movies_data_tags['movieId'] == id]['title'].tolist()[0]

# 4.a Just reads the results out of the dictionary.
def recommend(item_id, num, threshold):
    print("Recommending " + str(num) + " movies similar to " + item(item_id) + " with threshold " + str(threshold) + "...")
    print("-------")
    recs = results[item_id][:num]
    for rec in recs:
        # Only print items above the similarity threshold (e.g. 0.5 for 50%)
        if(rec[0]>threshold):
            print("Recommended: " + item(rec[1]) + " (score:" + str(rec[0]) + ")")

# 4. Recommend items similar to item_id and num = number of recommendations with a similarity threshold value


recommend(item_id=1, num=5, threshold= 0.3)  # Recommend 5 movies similar to movieId 1 with similarity of >30%
print('')
print('')
print('')
recommend(item_id=3, num=10, threshold= 0.5) # Recommend 10 movies similar to movieId 3 with similarity of >50%

