## Find me Love

#### This is a dating recommender system that suggests relevant profiles to users. It is built on the LibimSeTi dataset. We cover the following recommendation approaches:
#### 1. Popular Profiles Recommendation
#### 2. Profile Recommendation using Weighted Average
#### 3. Profile Recommendation using Pearson Correlation
#### 4. Profile Recommendation based on KNN
#### 5. Collaborative Filtering

In [1]:
import pandas as pd
import numpy as np
gender = pd.read_csv('data/gender.dat',names = ['userid','gender'],header=None)
ratings = pd.read_csv('data/ratings.dat',names = ['userid','profileid','rating'],header=None)

In [None]:
gender.head()

In [None]:
ratings.head()

In [2]:
dataset = pd.merge(gender,ratings,on='userid')
dataset.head()

Unnamed: 0,userid,gender,profileid,rating
0,1,F,133,8
1,1,F,720,6
2,1,F,971,10
3,1,F,1095,7
4,1,F,1616,10


In [None]:
dataset.isnull().sum()

#### We see that there are no missing values in the dataset.

## Exploratory Data Analysis

In [None]:
dataset[['rating','gender']].describe(include='all')

In [None]:
dataset['gender'].value_counts().plot(kind='bar')

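#### There are three gender values: M for male, F for female and U for unknown. Female users form the largest group. Note that the plot counts rows of the merged table, so it reflects how many ratings each gender submitted rather than the number of distinct users.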

In [None]:
dataset[['rating']].hist(bins=20)

#### The most common rating is 10, which suggests the platform has many highly favoured profiles.

## 1. Popular Profiles Recommendation

#### Here, we find the top profiles on the platform by counting how many ratings each profile has received.

In [None]:
topprofiles = pd.DataFrame(dataset.groupby('profileid')['rating'].count())
topprofiles = topprofiles.rename(columns={'rating':'ratingcounts'})
topprofiles = topprofiles.sort_values('ratingcounts',ascending=False)
topprofiles.head(10)

#### The table above shows the most popular profiles, but it says nothing about how they were rated. A profile can be popular yet poorly rated, so next we compute the average rating for these profiles.

In [None]:
topprofiles['avgrating'] = dataset.groupby('profileid')['rating'].mean()
topprofiles.head(10)

#### Profiles such as 89855 and 162707 are popular but poorly rated, so we explore other approaches to produce better recommendations.

## 2. Profile Recommendation using Weighted Average

In [None]:
profileavgratings = pd.DataFrame(dataset.groupby('profileid')['rating'].mean())
profileavgratings['numratingstoprofile'] = dataset.groupby('profileid')['rating'].count()
profileavgratings = profileavgratings.rename(columns={'rating':'avgrating'})
profileavgratings.head()

In [None]:
R = profileavgratings['avgrating']
v = profileavgratings['numratingstoprofile']
m = profileavgratings['numratingstoprofile'].quantile(0.70)
C = profileavgratings['avgrating'].mean()

In [None]:
profileavgratings['Weighted Average'] = ((R*v) + (C*m))/(v + m)
profileavgratings.head()
result_sorted = profileavgratings.sort_values('Weighted Average',ascending=False)#.reset_index()
result_sorted.head(10)
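
#### To make the weighted average concrete, here is a minimal sketch on made-up ratings. The toy data and the column name `numratings` are assumptions for illustration and are not part of the dataset. Each profile is scored as (R*v + C*m)/(v + m), which pulls profiles with few ratings towards the overall mean C.

```python
import pandas as pd

# Toy ratings: profile 1 has several mediocre ratings, profile 2 a single perfect one.
toy = pd.DataFrame({
    "profileid": [1, 1, 1, 1, 2, 3, 3],
    "rating":    [6, 7, 6, 7, 10, 9, 8],
})

stats = toy.groupby("profileid")["rating"].agg(avgrating="mean", numratings="count")

R = stats["avgrating"]   # mean rating per profile
v = stats["numratings"]  # number of ratings per profile
m = v.quantile(0.70)     # rating-count threshold, same quantile as in the cell above
C = R.mean()             # mean of the per-profile averages

stats["weighted"] = (R * v + C * m) / (v + m)
print(stats.sort_values("weighted", ascending=False))
```

#### In this toy data, profile 2 has a single rating of 10, so its weighted score is pulled well below 10 towards C, while profile 1, which has four ratings, is shrunk proportionally less.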

## 3. Profile Recommendation using Pearson Correlation

#### During EDA, we saw that many profiles have a low number of ratings (e.g. profileid = 4 has only 1 rating). We filter these profiles out to build a more robust system. The threshold also reduces the number of rows, because the entire dataset failed to run on my machine. Hence, the steps below only recommend profiles that have at least 2000 ratings.

In [None]:
thres = 2000
filter_profileavgratings = profileavgratings.query('numratingstoprofile >= @thres')
filter_profileavgratings.head()

In [None]:
data_users = pd.merge(filter_profileavgratings,dataset,on='profileid').drop(['Weighted Average','avgrating','numratingstoprofile'],axis=1)
data_users.head()

In [None]:
pivot_table = data_users.pivot_table(index='userid',columns='profileid',values='rating')
pivot_table = pivot_table.fillna(0)

In [None]:
pivot_table.head()

#### Below we pick a random profile from our dataset, assuming this profile is what the user sees first.

In [None]:
ranchoice = np.random.choice(pivot_table.columns)
print(ranchoice)

In [None]:
profile = pivot_table[ranchoice]
similar_profiles = pivot_table.corrwith(profile)
recommend = pd.DataFrame(similar_profiles, columns=['pearsonR'])
recommend.dropna(inplace=True)
recommend.head()
recommend = recommend.join(profileavgratings['numratingstoprofile'])
recommend = recommend.sort_values('pearsonR', ascending=False)

In [None]:
recommend[1:11]
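
#### The pivot table above fills missing ratings with 0 before calling `corrwith`. The sketch below, on a made-up 5-user matrix (all values are assumptions for illustration), compares that choice with leaving the gaps as NaN, in which case each correlation is computed only over the users who rated both profiles.

```python
import numpy as np
import pandas as pd

# Toy user x profile matrix; NaN means "not rated".
toy = pd.DataFrame(
    {
        "A": [9, 8, np.nan, 2, 1],
        "B": [10, 7, 3, np.nan, 2],
        "C": [1, np.nan, 9, 8, 10],
    },
    index=["u1", "u2", "u3", "u4", "u5"],
)

# With NaN left in place, each pairwise correlation uses only co-rated rows.
corr_sparse = toy.corrwith(toy["A"])

# After zero-filling, "not rated" is treated as a rating of 0.
filled = toy.fillna(0)
corr_filled = filled.corrwith(filled["A"])

print(pd.DataFrame({"co_rated_only": corr_sparse, "zero_filled": corr_filled}))
```

#### The two columns differ, which shows that zero-filling changes the similarity scores. Which variant is preferable depends on how sparse the matrix is, so this is a point to check rather than a fixed rule.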

## 4. Profile Recommendation based on KNN

In [None]:
pivot_table_transpose = pivot_table.T
pivot_table_transpose.shape

In [None]:
pivot_table_transpose.head()

In [None]:
from scipy.sparse import csr_matrix
matrix = csr_matrix(pivot_table_transpose)

In [None]:
from sklearn.neighbors import NearestNeighbors

model_knn = NearestNeighbors(metric='cosine',algorithm='brute')  # cosine distance; the default metric is Minkowski with p=2 (Euclidean)
model_knn.fit(matrix)

#### We pick a random profile from our dataset and find the profiles closest to it using KNN

In [None]:
q_index = np.random.choice(matrix.shape[0])
profileid = pivot_table_transpose.index[q_index]

#### Now we print the closest profiles to this profile with their similarity

In [None]:
distances,indices = model_knn.kneighbors(pivot_table_transpose.iloc[q_index,:].values.reshape(1,-1),n_neighbors=6)
for i in range(0,len(distances.flatten())):
    if i == 0:
        print("Recommendations for {0}:-\n".format(pivot_table_transpose.index[q_index]))
    else:
        print("{0}: {1} with distance of {2}".format(i,pivot_table_transpose.index[indices.flatten()[i]],distances.flatten()[i]))
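
#### The distances printed above are cosine distances, i.e. 1 minus the cosine similarity, so smaller means more similar. A minimal sketch on a made-up 3-profile matrix (the values are assumptions for illustration) shows how to read them:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Rows are profiles, columns are users (same orientation as the transposed pivot).
profiles = np.array([
    [10, 8, 0, 0],  # profile 0
    [5, 4, 1, 0],   # profile 1: similar rating pattern at a lower scale
    [0, 0, 9, 7],   # profile 2: rated by a disjoint set of users
])

knn = NearestNeighbors(metric="cosine", algorithm="brute").fit(profiles)
distances, indices = knn.kneighbors(profiles[[0]], n_neighbors=3)

# Convert cosine distance back to cosine similarity.
similarities = 1 - distances.flatten()
for idx, sim in zip(indices.flatten(), similarities):
    print(f"profile {idx}: cosine similarity {sim:.3f}")
```

#### The query profile is always its own nearest neighbour with distance 0, which is why the loop above skips the first result.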

#### If we look up the same profile ID using the Pearson method discussed above, the results are:

In [None]:
#ranchoice = 54349
ranchoice = profileid
profile = pivot_table[ranchoice]
similar_profiles = pivot_table.corrwith(profile)
recommend = pd.DataFrame(similar_profiles, columns=['pearsonR'])
recommend.dropna(inplace=True)
recommend.head()
recommend = recommend.join(profileavgratings['numratingstoprofile'])
recommend = recommend.sort_values('pearsonR', ascending=False)
recommend[1:6]

## 5. Collaborative Filtering

In [3]:
dataset.head()

Unnamed: 0,userid,gender,profileid,rating
0,1,F,133,8
1,1,F,720,6
2,1,F,971,10
3,1,F,1095,7
4,1,F,1616,10


In [4]:
Mean = ratings.groupby(by="userid",as_index=False)['rating'].mean()#.rename(columns={'rating':'meanrating'})
avg_rating = pd.merge(ratings,Mean,on='userid')
avg_rating['adjustrating']=avg_rating['rating_x']-avg_rating['rating_y']
avg_rating.head()

Unnamed: 0,userid,profileid,rating_x,rating_y,adjustrating
0,1,133,8,6.510145,1.489855
1,1,720,6,6.510145,-0.510145
2,1,971,10,6.510145,3.489855
3,1,1095,7,6.510145,0.489855
4,1,1616,10,6.510145,3.489855
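
#### The `adjustrating` column is each rating minus the mean rating of the user who gave it. As a minimal sketch on made-up data, the same mean-centring can be written with `groupby(...).transform`, which avoids the merge and the `rating_x`/`rating_y` suffixes. The column name `usermean` is an assumption for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "userid":    [1, 1, 1, 2, 2],
    "profileid": [10, 11, 12, 10, 12],
    "rating":    [8, 6, 10, 3, 5],
})

# transform("mean") returns each user's mean aligned to the original rows.
toy["usermean"] = toy.groupby("userid")["rating"].transform("mean")
toy["adjustrating"] = toy["rating"] - toy["usermean"]
print(toy)
```

#### After centring, each user's deviations sum to zero, so a generous rater and a strict rater become comparable.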


In [5]:
profileavgratings = pd.DataFrame(dataset.groupby('profileid')['rating'].mean())
profileavgratings['numratingstoprofile'] = dataset.groupby('profileid')['rating'].count()
#profileavgratings = profileavgratings.rename(columns={'rating':'avgrating'})
#profileavgratings.head()
thres = 500
filter_profileavgratings = profileavgratings.query('numratingstoprofile >= @thres')
filter_profileavgratings.head()

profileid,rating,numratingstoprofile
55,5.780652,889
77,9.200611,982
90,4.439437,1065
132,2.787524,513
133,6.22125,6974


In [6]:
useravgratings = pd.DataFrame(dataset.groupby('userid')['rating'].mean())
useravgratings['numratingsfromuser'] = dataset.groupby('userid')['rating'].count()
#useravgratings = useravgratings.rename(columns={'rating':'avgratingfromuser'})
#profileavgratings.head()
thres = 500
filter_useravgratings = useravgratings.query('numratingsfromuser >= @thres')
filter_useravgratings.head()

userid,rating,numratingsfromuser
9,5.856007,3521
73,6.799544,1317
99,5.869725,545
128,4.818512,551
134,6.427885,5616


In [7]:
small_data = pd.merge(filter_profileavgratings,avg_rating,on='profileid')
small_data.head()

Unnamed: 0,profileid,rating,numratingstoprofile,userid,rating_x,rating_y,adjustrating
0,55,5.780652,889,9,5,5.856007,-0.856007
1,55,5.780652,889,251,5,4.479259,0.520741
2,55,5.780652,889,316,7,5.463988,1.536012
3,55,5.780652,889,365,5,3.691781,1.308219
4,55,5.780652,889,378,9,7.925926,1.074074


In [8]:
small_data = pd.merge(small_data,filter_useravgratings,on='userid')
small_data.head()

Unnamed: 0,profileid,rating_x,numratingstoprofile,userid,rating_x.1,rating_y,adjustrating,rating_y.1,numratingsfromuser
0,55,5.780652,889,9,5,5.856007,-0.856007,5.856007,3521
1,466,5.644139,2619,9,3,5.856007,-2.856007,5.856007,3521
2,538,7.161812,618,9,10,5.856007,4.143993,5.856007,3521
3,855,6.703476,1467,9,5,5.856007,-0.856007,5.856007,3521
4,1205,2.414752,827,9,1,5.856007,-4.856007,5.856007,3521


In [116]:
small_data[(small_data['userid']==9) & (small_data['profileid']==55)]

Unnamed: 0,profileid,rating_x,numratingstoprofile,userid,rating_x.1,rating_y,adjustrating,rating_y.1,numratingsfromuser
0,55,5.780652,889,9,5,5.856007,-0.856007,5.856007,3521


In [10]:
final = pd.pivot_table(small_data,values='adjustrating',index='userid',columns='profileid')
final.head()

profileid,55,77,90,132,133,208,215,243,261,276,...,220715,220717,220718,220752,220754,220760,220782,220840,220861,220953
userid,,,,,,,,,,,,,,,,,,,,,
9,-0.856007,,,,,,,,,,...,,,,,,,,,,
73,,,,,,,,,,,...,,,,,,,,,,
99,,,,,,,,,,,...,,,,,,,,,,
128,,,,,,,5.181488,,,,...,,,,,,,,,,
134,,,,,,,,,,,...,,,-3.427885,,,,,,,-5.427885


In [39]:
#final_profileratings = final.fillna(final.mean(axis=0))
# Replacing NaN by user Average
final_profileratings = final.apply(lambda row: row.fillna(row.mean()), axis=1)
final_profileratings.shape

(3910, 6526)
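
#### The row-wise `apply` above fills every gap in a user's row with that user's mean deviation. On a matrix of this size that can be slow, so the sketch below shows a vectorised alternative on a made-up 2-user matrix (the values are assumptions for illustration) and checks that it matches the `apply` version.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    [[1.5, np.nan, -1.5],
     [np.nan, 2.0, np.nan]],
    index=["u1", "u2"],
    columns=[55, 77, 90],
)

# Row-wise version, as used in the notebook.
applied = toy.apply(lambda row: row.fillna(row.mean()), axis=1)

# Vectorised version: transpose so users become columns, fill each column
# with that user's mean, then transpose back.
filled = toy.T.fillna(toy.mean(axis=1)).T

print(filled)
```

#### Both versions produce the same values here; whether the vectorised form is faster on the full matrix has not been measured in this notebook.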

In [40]:
# user-user cosine similarity after replacing NaN with each user's own mean (see the cell above)
from sklearn.metrics.pairwise import cosine_similarity
cosine = cosine_similarity(final_profileratings)
np.fill_diagonal(cosine, 0 )
similarity_with_movie = pd.DataFrame(cosine,index=final_profileratings.index)
similarity_with_movie.columns=final_profileratings.index
similarity_with_movie.head()


userid,9,73,99,128,134,147,155,245,251,299,...,135036,135066,135158,135216,135234,135240,135273,135281,135285,135298
userid,,,,,,,,,,,,,,,,,,,,,
9,0.0,0.219461,0.329288,0.325482,0.112167,0.249814,0.151569,0.344976,0.322972,0.220652,...,0.231191,0.235683,0.26863,0.268901,0.243671,0.239375,0.351643,0.293893,0.19732,0.284677
73,0.219461,0.0,0.541188,0.559878,0.276397,0.472133,0.300351,0.508174,0.37966,0.453018,...,0.431213,0.395241,0.489603,0.367853,0.448288,0.496369,0.377587,0.170179,0.314685,0.590231
99,0.329288,0.541188,0.0,0.785191,0.266659,0.628168,0.359158,0.739263,0.554422,0.539665,...,0.586076,0.518502,0.657078,0.516782,0.584182,0.592498,0.525282,0.232268,0.388382,0.711534
128,0.325482,0.559878,0.785191,0.0,0.278457,0.647201,0.37087,0.727758,0.580035,0.558577,...,0.606109,0.537182,0.680752,0.526027,0.600947,0.603,0.516266,0.232507,0.402916,0.739803
134,0.112167,0.276397,0.266659,0.278457,0.0,0.312911,0.415371,0.257473,0.196617,0.261836,...,0.264193,0.279788,0.280639,0.194904,0.250376,0.23057,0.224869,0.140025,0.1966,0.295105


In [41]:
def find_n_neighbours(df,n):
    # For each user (row), keep the ids of the n most similar users, ordered from most to least similar
    df = df.apply(lambda x: pd.Series(x.sort_values(ascending=False)
           .iloc[:n].index, 
          index=['top{}'.format(i) for i in range(1, n+1)]), axis=1)
    return df

In [42]:
# top 30 neighbours for each user
sim_user_30_m = find_n_neighbours(similarity_with_movie,30)
sim_user_30_m.head()  # the 30 closest users to each user, based on the cosine similarity computed above

userid,top1,top2,top3,top4,top5,top6,top7,top8,top9,top10,...,top21,top22,top23,top24,top25,top26,top27,top28,top29,top30
9,35605,115105,1017,106373,83817,89746,11858,31620,119759,31007,...,114810,7747,114670,115533,100299,122930,68332,43562,65384,72409
73,37972,33737,70299,119030,48997,5294,28220,64187,125520,30195,...,3398,107754,134696,44497,41155,128398,57331,132034,128775,99329
99,12087,18628,62538,93286,128982,74937,52056,76802,90763,26530,...,31847,28697,35899,81790,64961,119038,47981,28298,129403,58386
128,52498,93516,62538,90763,10838,74937,50294,133879,37830,87943,...,28697,105133,129403,58386,18628,76802,35151,48978,81790,64961
134,78392,46161,77473,15056,44717,81552,63060,37654,54499,76099,...,107092,103232,10036,46716,58801,69075,85983,71873,86691,17678
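
#### The neighbour table above is built with a row-wise `apply`. As a minimal sketch, the same top-n selection can be done with `np.argsort` on the similarity values. The function name `top_n_neighbours` and the 3-user matrix are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def top_n_neighbours(sim: pd.DataFrame, n: int) -> pd.DataFrame:
    # Sort each row by descending similarity and keep the first n column positions.
    order = np.argsort(-sim.to_numpy(), axis=1)[:, :n]
    # Map column positions back to user ids.
    labels = sim.columns.to_numpy()[order]
    return pd.DataFrame(
        labels,
        index=sim.index,
        columns=[f"top{i}" for i in range(1, n + 1)],
    )

users = [9, 73, 99]
sim = pd.DataFrame(
    [[0.0, 0.2, 0.3],
     [0.2, 0.0, 0.5],
     [0.3, 0.5, 0.0]],
    index=users,
    columns=users,
)

neighbours = top_n_neighbours(sim, 2)
print(neighbours)
```

#### Because the diagonal was set to 0 earlier, a user is not returned as their own nearest neighbour as long as at least n other users have positive similarity.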


In [43]:
def get_user_similar_profiles( user1, user2 ):
    common_profiles = avg_rating[avg_rating.userid == user1].merge(
    avg_rating[avg_rating.userid == user2],
    on = "profileid",
    how = "inner" )
    return common_profiles#.merge( dataset, on = 'profileid' )

In [45]:
a = get_user_similar_profiles(134,78392)
a
a = a.loc[ : , ['rating_x_x','rating_x_y','profileid']]
a.head()

Unnamed: 0,rating_x_x,rating_x_y,profileid
0,10,10,199
1,4,1,214
2,1,1,225
3,7,8,328
4,10,10,394


In [46]:
final_profileratings.head()

profileid,55,77,90,132,133,208,215,243,261,276,...,220715,220717,220718,220752,220754,220760,220782,220840,220861,220953
userid,,,,,,,,,,,,,,,,,,,,,
9,-0.856007,0.383257,0.383257,0.383257,0.383257,0.383257,0.383257,0.383257,0.383257,0.383257,...,0.383257,0.383257,0.383257,0.383257,0.383257,0.383257,0.383257,0.383257,0.383257,0.383257
73,0.620251,0.620251,0.620251,0.620251,0.620251,0.620251,0.620251,0.620251,0.620251,0.620251,...,0.620251,0.620251,0.620251,0.620251,0.620251,0.620251,0.620251,0.620251,0.620251,0.620251
99,0.72309,0.72309,0.72309,0.72309,0.72309,0.72309,0.72309,0.72309,0.72309,0.72309,...,0.72309,0.72309,0.72309,0.72309,0.72309,0.72309,0.72309,0.72309,0.72309,0.72309
128,0.806488,0.806488,0.806488,0.806488,0.806488,0.806488,5.181488,0.806488,0.806488,0.806488,...,0.806488,0.806488,0.806488,0.806488,0.806488,0.806488,0.806488,0.806488,0.806488,0.806488
134,0.401145,0.401145,0.401145,0.401145,0.401145,0.401145,0.401145,0.401145,0.401145,0.401145,...,0.401145,0.401145,-3.427885,0.401145,0.401145,0.401145,0.401145,0.401145,0.401145,-5.427885


In [18]:
Mean.head()

Unnamed: 0,userid,rating
0,1,6.510145
1,2,8.041237
2,3,7.15
3,4,6.841584
4,5,8.419048


In [47]:
def User_item_score(user,item):
    # Predict the rating `user` would give to profile `item` from the 30 most similar users
    a = sim_user_30_m[sim_user_30_m.index==user].values  # ids of the 30 users most similar to `user`
    b = a.squeeze().tolist()
    c = final_profileratings.loc[:,item]  # every user's mean-centred (and mean-filled) rating for this profile
    d = c[c.index.isin(b)]  # keep only the 30 similar users
    f = d[d.notnull()]
    avg_user = Mean.loc[Mean['userid'] == user,'rating'].values[0]  # the target user's own mean rating
    index = f.index.tolist()
    corr = similarity_with_movie.loc[user,index]  # cosine similarity between `user` and each neighbour
    fin = pd.concat([f, corr], axis=1)  # deviation and similarity side by side
    fin.columns = ['adg_score','correlation']
    fin['score'] = fin['adg_score'] * fin['correlation']
    nume = fin['score'].sum()
    deno = fin['correlation'].sum()
    final_score = avg_user + (nume/deno)  # user's mean plus the similarity-weighted average deviation
    return final_score

In [48]:
score = User_item_score(128,55)
print("score (u,i) is",score)

score (u,i) is 5.663415218142876
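
#### The score is the user's own mean rating plus the similarity-weighted average of the neighbours' mean-centred ratings. A worked example with three made-up neighbours (all numbers are assumptions for illustration) shows the arithmetic:

```python
import pandas as pd

user_mean = 4.8  # hypothetical mean rating of the target user

neighbours = pd.DataFrame(
    {
        "adg_score":   [1.0, -0.5, 2.0],  # neighbours' mean-centred ratings of the profile
        "correlation": [0.8, 0.6, 0.4],   # cosine similarity to the target user
    },
    index=[101, 102, 103],
)

numerator = (neighbours["adg_score"] * neighbours["correlation"]).sum()
denominator = neighbours["correlation"].sum()
predicted = user_mean + numerator / denominator
print(round(predicted, 3))
```

#### Here the numerator is 0.8 - 0.3 + 0.8 = 1.3 and the denominator is 1.8, so the prediction is about 0.72 above the user's mean. If the similarities sum to zero, the division is undefined, which the notebook's function does not guard against.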


In [117]:
small_data = small_data.astype({"profileid": str})  # convert profileid to str so it can be used in the join below
Movie_user = small_data.groupby(by = 'userid')['profileid'].apply(lambda x:','.join(x))
# Comma-separated list of all profiles each user has rated

In [129]:
def User_item_score1(user):
    # Recommend the 5 highest-scoring profiles that `user` has not rated but the similar users have
    Movie_seen_by_user = final.columns[final[final.index==user].notna().any()].tolist()  # profiles actually rated by user
    a = sim_user_30_m[sim_user_30_m.index==user].values  # ids of the 30 users most similar to `user`
    b = a.squeeze().tolist()
    d = Movie_user[Movie_user.index.isin(b)]  # profiles rated by those 30 users
    l = ','.join(d.values)  # all of them joined into one comma-separated string
    Movie_seen_by_similar_users = l.split(',')  # individual profile ids rated by the similar users
    # Profiles rated by the similar users but not by `user`
    Movies_under_consideration = list(set(Movie_seen_by_similar_users)-set(list(map(str, Movie_seen_by_user))))
    Movies_under_consideration = list(map(int, Movies_under_consideration))
    avg_user = Mean.loc[Mean['userid'] == user,'rating'].values[0]  # the target user's own mean rating
    score = []
    for item in Movies_under_consideration:
        c = final_profileratings.loc[:,item]
        d = c[c.index.isin(b)]
        f = d[d.notnull()]
        index = f.index.tolist()
        corr = similarity_with_movie.loc[user,index]
        fin = pd.concat([f, corr], axis=1)
        fin.columns = ['adg_score','correlation']
        fin['score'] = fin['adg_score'] * fin['correlation']
        nume = fin['score'].sum()
        deno = fin['correlation'].sum()
        final_score = avg_user + (nume/deno)
        score.append(final_score)
    data = pd.DataFrame({'profileid':Movies_under_consideration,'score':score})
    top_5_recommendation = data.sort_values(by='score',ascending=False).head(5)
    Movie_Name = top_5_recommendation.merge(filter_profileavgratings, how='inner', on='profileid')
    Movie_Names = Movie_Name.profileid.values.tolist()
    return Movie_Names

In [133]:
user = int(input("Enter the user id to whom you want to recommend : "))
predicted_movies = User_item_score1(user)
print(predicted_movies)
print(" ")
print("The Recommendations for User Id : {0}".format(user))
print("   ")
for i in predicted_movies:
    print(i)

Enter the user id to whom you want to recommend : 128
[71636, 93681, 130120, 9855, 32792]
 
The Recommendations for User Id : 128
   
71636
93681
130120
9855
32792
