### Evaluation
- You are given a recommendations.csv file containing user ids; recommend at most 10 songs for each user.
- The training data is provided in the train.csv file.
- Each row of recommendations.csv must start with the user_id, followed by the recommended song_ids, all separated by commas.
- Every user in recommendations.csv must have at least one recommendation; otherwise your score will be zero.
- The songs recommended to a user must not already appear in the training set for that user.
- The final score is F1, the harmonic mean of precision and recall.
- Run the script below to score your recommendations.
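
For reference, the harmonic mean mentioned above can be sketched as follows. This is only an illustration for a single user: the function name `precision_recall_f1` and the example song ids are hypothetical, and how the actual scoring script aggregates precision and recall across users is not stated here.

```python
def precision_recall_f1(recommended, relevant):
    """Return (precision, recall, f1) for one user's recommendations."""
    recommended = set(recommended)
    relevant = set(relevant)
    hits = len(recommended & relevant)
    # Precision: share of recommended songs that are relevant
    precision = hits / len(recommended) if recommended else 0.0
    # Recall: share of relevant songs that were recommended
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 2 of 4 recommended songs are among 5 held-out songs
p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "b", "x", "y", "z"])
print(p, r, round(f1, 4))
```

Because the harmonic mean is pulled toward the smaller of the two values, padding the list with weak guesses to raise recall tends to cost precision.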

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Project module that provides item_similarity_recommender_py
import Recommenders



In [2]:
song_df = pd.read_csv('train.csv')
song_df.head(3)


| | user_id | song_id | listen_count | title | release | artist_name | year |
|---|---|---|---|---|---|---|---|
| 0 | 806ccae96c8ecb1c198482aff785ccd6bbe17143 | SOBOAFP12A8C131F36 | 1 | Lucky (Album Version) | We Sing. We Dance. We Steal Things. | Jason Mraz & Colbie Caillat | 0 |
| 1 | ed3664f9cd689031fe4d0ed6c66503bdc3ad7cb6 | SOPTLQL12AB018D56F | 1 | Billionaire [feat. Bruno Mars] (Explicit Albu... | Billionaire [feat. Bruno Mars] | Travie McCoy | 0 |
| 2 | 0dd93f61fe69f292ac336715ef607214efb3dbaa | SORALYQ12A8151BA99 | 3 | If I Ain't Got You | R&B Love Collection 08 | Alicia Keys | 2003 |


In [3]:
# Combine title and artist_name into a single 'song' label ("title - artist")
song_df['song'] = song_df['title'].map(str) + " - " + song_df['artist_name']

In [4]:
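# With 'count', listen_count becomes the number of user-song rows per song,
# not the total number of plays
song_grouped = song_df.groupby(['song']).agg({'listen_count': 'count'}).reset_index()
grouped_sum = song_grouped['listen_count'].sum()
song_grouped['percentage'] = song_grouped['listen_count'].div(grouped_sum) * 100
# Most frequent songs first, ties broken alphabetically
song_grouped.sort_values(['listen_count', 'song'], ascending=[False, True])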

| | song | listen_count | percentage |
|---|---|---|---|
| 4703 | Sehr kosmisch - Harmonia | 68 | 0.42500 |
| 1360 | Dog Days Are Over (Radio Edit) - Florence + Th... | 58 | 0.36250 |
| 5987 | Undo - Björk | 56 | 0.35000 |
| 4697 | Secrets - OneRepublic | 52 | 0.32500 |
| 4462 | Revelry - Kings Of Leon | 51 | 0.31875 |
| ... | ... | ... | ... |
| 6551 | Zwitter - Rammstein | 1 | 0.00625 |
| 6553 | aNYway - Armand Van Helden & A-TRAK Present Du... | 1 | 0.00625 |
| 6555 | high fives - Four Tet | 1 | 0.00625 |
| 6556 | in white rooms - Booka Shade | 1 | 0.00625 |


In [5]:
users = song_df['user_id'].unique()

In [13]:
is_model = Recommenders.item_similarity_recommender_py()
is_model.create(song_df, 'user_id', 'song_id')

In [14]:
test_df = pd.read_csv("recommendations.csv", header=None)
test_df.rename(columns = {list(test_df)[0]: 'user_id'}, inplace = True)

In [15]:
test_users = test_df['user_id'].unique()

In [16]:
for j in range(10):
    test_df["song {}".format(j)] = 0

In [17]:
# Make the song columns object dtype so song-id strings can be stored in them
test_df = test_df.astype({"song {}".format(j): object for j in range(10)})
# Only the first 5 users are filled here; loop over range(len(test_df)) to cover
# every user, since a user with no recommendation makes the score zero
for i in range(5):
    user_id = test_df.loc[i, 'user_id']
    # nb appears to be the number of recommendations returned for this user
    df, nb = is_model.recommend(user_id)
    df = df.iloc[:min(nb, 10)]  # keep at most 10 songs
    for j in range(len(df)):
        # .loc writes into test_df itself, avoiding SettingWithCopyWarning
        test_df.loc[i, "song {}".format(j)] = df['song'].iloc[j]


In [21]:
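test_df = test_df.astype(str)
for j in range(10):
    col = 'song {}'.format(j)
    # Blank out leftover 0 placeholders; assign the result back, because an
    # in-place replace on a selected column may not update test_df
    test_df[col] = test_df[col].replace(['0', '0.0'], '')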

In [22]:
test_df.head(5)

| | user_id | song 0 | song 1 | song 2 | song 3 | song 4 | song 5 | song 6 | song 7 | song 8 | song 9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 43683da3c6c5a93c7938ff550faf0d039a9a639a | SOERVER12A58A7DE48 | SODABLD12A6D4F8B3C | SOKKDQB12AB01883B7 | SOXUSKK12A8C144F94 | SOBITYB12AB01830F5 | SOLKPSQ12A6D223BC4 | SOXDFVJ12A6D4FD18A | SOVZGLW12A8AE4570B | SOPOYLD12A8C13B17A | SODYWBD12A8C139845 |
| 1 | 85d0d381551960608e02df98956277e495b3cf6b | SOQPYQS12A58A7B8DF | SOCCYYG12AB0184DE8 | SOCMSRR12A81C22F45 | SORCAKN12A58A7A2CF | SOHGWFC12AB017F2E7 | SOUFABE12A6701CFFF | SOBYRVR12A6D4FAF83 | SOSHQHA12A58A7B1E9 | SOFRQTD12A81C233C0 | |
| 2 | ac1cb58f839ae6773732125e99b4a7394e0661e4 | SOVMKIC12AF72A05CC | SOWMGHQ12A6D4F914D | SOZZIOH12A67ADE300 | SOLGWFD12B0B807B28 | SOFWUWJ12A6D4F991E | SOPRVKW12A6D4FD57B | SOQAQYN12A58A7B08D | SOPAETP12A8C131E3B | SOIJTAV12AB01825CA | SOTMFRY12A8C13C8A2 |
| 3 | 9c2032efba612bccec98435a3928b67d69350bed | SOEPJEF12AF72A44DA | SODLYRF12AB01861E0 | SOCKZGC1280EC90D76 | SOTOXYL12A8C139E18 | SOWMACK12AB017E581 | SOVXEFY12AB017CE50 | SOMRCAS12A58A7CB34 | | | |
| 4 | c4bcf00d005e6848a032d94f7fb212f499cdc1ba | SOWKVVW12A8AE45E8C | SOLVTSK12AB017EFCC | SOCZVSX12A6D4F5033 | SOAUNAX12AB01876D0 | SOTMTTY12A6D4F95A1 | SOBNXJY12A8C13E070 | SOVFFSK12A6BD55C96 | SOSJIUS12A8C13C501 | SOGISVQ12A8C13AE9B | |
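
The cells shown here never write the filled table back to disk, so the following is a minimal sketch of one way to save it in the row format described under Evaluation (user_id first, then the song ids, comma separated). The helper `write_recommendations`, the demo frame and the file name `recommendations_demo.csv` are hypothetical; it also assumes the scorer accepts rows of different lengths without trailing empty fields, which is not confirmed by the source.

```python
import csv

import pandas as pd


def write_recommendations(df, path, max_songs=10):
    """Write one row per user: user_id followed by its non-empty song ids."""
    song_cols = [c for c in df.columns if c != "user_id"]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for _, row in df.iterrows():
            # Skip blank cells and leftover placeholder values
            songs = [str(s) for s in row[song_cols]
                     if str(s) not in ("", "0", "0.0", "nan")]
            writer.writerow([row["user_id"]] + songs[:max_songs])


# Hypothetical two-user table with the same column layout as test_df
demo = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "song 0": ["SOA", "SOC"],
    "song 1": ["SOB", ""],
})
write_recommendations(demo, "recommendations_demo.csv")
```

Writing through `csv.writer` rather than `DataFrame.to_csv` keeps the header and index out of the file and drops the empty cells for users with fewer than 10 songs.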


In [19]:
import dill
import csv

In [None]:
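# Load the pickled scorer with context managers so the files are closed.
# Evaluate.pik is loaded first, presumably because eval.pik depends on it.
with open("Evaluate.pik", 'rb') as f:
    Evaluate = dill.load(f)
with open("eval.pik", 'rb') as f:
    evaluate = dill.load(f)
print("F1 score: {}%".format(evaluate.score('recommendations.csv')))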
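
If the score comes back as zero, a quick check of the recommendations against the rules listed under Evaluation may help locate the cause. The sketch below is an assumption-laden illustration, not part of the scoring script: `check_recommendations`, the toy training frame and the toy recommendations are all hypothetical, and the recommendations are assumed to be available as a dict mapping each user_id to a list of song ids.

```python
import pandas as pd


def check_recommendations(recs, train_df, max_songs=10):
    """Return a list of rule violations (empty if none were found)."""
    # Songs each user already has in the training data
    seen = train_df.groupby("user_id")["song_id"].apply(set).to_dict()
    problems = []
    for user_id, songs in recs.items():
        if not songs:
            problems.append("{}: no recommendations".format(user_id))
        if len(songs) > max_songs:
            problems.append("{}: more than {} songs".format(user_id, max_songs))
        overlap = set(songs) & seen.get(user_id, set())
        if overlap:
            problems.append("{}: already in training set: {}".format(
                user_id, sorted(overlap)))
    return problems


# Hypothetical data: u1 repeats a training song, u2 has no recommendations
train = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "song_id": ["SOA", "SOB", "SOC"],
})
recs = {"u1": ["SOB", "SOD"], "u2": [], "u3": ["SOA"]}
problems = check_recommendations(recs, train)
for line in problems:
    print(line)
```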