[View in Colaboratory](https://colab.research.google.com/github/AanyaJindal/Problem-Recommender/blob/master/Lastfm_dataset_als.ipynb)

**Matrix factorization (MF) with alternating least squares (ALS) on the Last.fm 360K dataset, using the Cython-based `implicit` library ([here](https://github.com/benfred/implicit))**

In [1]:
!wget http://mtg.upf.edu/static/datasets/last.fm/lastfm-dataset-360K.tar.gz


Redirecting output to ‘wget-log’.


In [2]:
!ls

lastfm-dataset-360K.tar.gz  sample_data  wget-log


In [3]:
!tar -xvzf lastfm-dataset-360K.tar.gz

lastfm-dataset-360K/
lastfm-dataset-360K/usersha1-artmbid-artname-plays.tsv
lastfm-dataset-360K/README.txt
lastfm-dataset-360K/mbox_sha1sum.py
lastfm-dataset-360K/usersha1-profile.tsv


In [4]:
!ls

lastfm-dataset-360K  lastfm-dataset-360K.tar.gz  sample_data  wget-log


In [8]:
!pip install implicit

Collecting implicit
  Downloading https://files.pythonhosted.org/packages/d9/0c/7f9f065cceab3c27b7207bbbc3127ff698f1dbaabc2e5f5ef92cb3a39a43/implicit-0.3.6.tar.gz (766kB)
    100% |████████████████████████████████| 768kB 6.7MB/s 
Building wheels for collected packages: implicit
  Running setup.py bdist_wheel for implicit ... done
  Stored in directory: /root/.cache/pip/wheels/6c/3d/30/d09ce4a97747d950f06bebbf644590915788b0e8d406795c6f
Successfully built implicit
Installing collected packages: implicit
Successfully installed implicit-0.3.6


Import all required libraries

In [0]:
import sys
import pandas as pd
import numpy as np
import scipy.sparse as sparse
from scipy.sparse.linalg import spsolve
import random

from sklearn.preprocessing import MinMaxScaler

import implicit

Load the dataset

In [0]:
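# Load the play counts; the TSV has no header row, so pass header=None
# to avoid consuming the first record as column names
raw_data = pd.read_table('lastfm-dataset-360K/usersha1-artmbid-artname-plays.tsv', header=None)
# Drop the MusicBrainz artist id column; the artist name is enough here
raw_data = raw_data.drop(raw_data.columns[1], axis=1)
raw_data.columns = ['user', 'artist', 'plays']
# Drop rows with missing values
data = raw_data.dropna()
data = data.copy()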

# Create a numeric user_id and artist_id column
data['user'] = data['user'].astype("category")
data['artist'] = data['artist'].astype("category")
data['user_id'] = data['user'].cat.codes
data['artist_id'] = data['artist'].cat.codes
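
The category codes created above are plain integer indices into each column's list of categories, so name lookups in either direction can be done with small dictionaries rather than by scanning the DataFrame. The sketch below uses a made-up toy table; `toy`, `id_to_artist`, and `artist_to_id` are illustrative names that do not appear in the notebook.

```python
import pandas as pd

# Toy stand-in for the play-count table (values are made up for illustration).
toy = pd.DataFrame({
    'user': ['u1', 'u1', 'u2', 'u3'],
    'artist': ['jay-z', 'outkast', 'jay-z', 'the roots'],
    'plays': [10.0, 3.0, 7.0, 1.0],
})
toy['artist'] = toy['artist'].astype('category')
toy['artist_id'] = toy['artist'].cat.codes

# Code i refers to cat.categories[i], so enumerate() recovers the mapping.
id_to_artist = dict(enumerate(toy['artist'].cat.categories))
artist_to_id = {name: idx for idx, name in id_to_artist.items()}

print(artist_to_id['jay-z'])
print(id_to_artist[2])
```

On the full dataset, the same pattern applied to `data['artist']` would give the numeric id for an artist name without hard-coding it.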


**Train on the first 500k rows only, as a quick test**


In [0]:

# The implicit library expects data as an item-user matrix, so we
# create two matrices: one for fitting the model (item-user)
# and one for recommendations (user-item), used to filter out already-heard artists
subset = data.iloc[:500000]  # first 500k rows, including row 0
plays = subset['plays'].astype(float)
sparse_item_user = sparse.csr_matrix((plays, (subset['artist_id'], subset['user_id'])))
sparse_user_item = sparse.csr_matrix((plays, (subset['user_id'], subset['artist_id'])))
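
To make the `(values, (rows, cols))` constructor concrete, here is a small sketch on made-up triples. The arrays and the names `item_user` and `user_item` are illustrative only. It also shows that the user-item view can be obtained by transposing rather than rebuilding from the triples.

```python
import numpy as np
import scipy.sparse as sparse

# Toy triples (artist_id, user_id, plays); values are illustrative only.
artist_ids = np.array([0, 1, 0, 2])
user_ids = np.array([0, 0, 1, 2])
plays = np.array([10.0, 3.0, 7.0, 1.0])

# Rows are artists, columns are users; the shape is inferred from the largest ids.
item_user = sparse.csr_matrix((plays, (artist_ids, user_ids)))

# Transposing gives the user-item view of the same data.
user_item = item_user.T.tocsr()

print(item_user.shape)
print(item_user.toarray())
```

Because the shape is inferred from the largest index present, a matrix built from a subset of rows only covers the ids that occur in that subset.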

**Train the model**

In [37]:
# Initialize the ALS model
model = implicit.als.AlternatingLeastSquares(factors=20, regularization=0.1, iterations=20)

# Scale the play counts by alpha to obtain confidence values
alpha_val = 15
data_conf = (sparse_item_user * alpha_val).astype('double')

# Fit the model on the sparse item-user confidence matrix
model.fit(data_conf)


100%|██████████| 20.0/20 [01:53<00:00,  5.05s/it]
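
For intuition about what alternating least squares does with confidence-weighted play counts, the following is a minimal dense NumPy sketch of the textbook implicit-feedback formulation (binary preference, confidence `1 + alpha * plays`). It is not the `implicit` library's code, which works on sparse data and may differ in detail. `solve_side`, `toy_als`, and the matrix `R` are hypothetical names and data introduced here.

```python
import numpy as np

def solve_side(R, fixed, alpha, reg):
    """Solve one side's factors in closed form while the other side is fixed.

    R: dense (n_rows, n_cols) play counts; fixed: (n_cols, k) factors.
    """
    n_rows, k = R.shape[0], fixed.shape[1]
    out = np.zeros((n_rows, k))
    for r in range(n_rows):
        c = 1.0 + alpha * R[r]            # confidence for each column
        p = (R[r] > 0).astype(float)      # binary preference
        A = fixed.T @ (c[:, None] * fixed) + reg * np.eye(k)
        b = fixed.T @ (c * p)
        out[r] = np.linalg.solve(A, b)    # regularized weighted least squares
    return out

def toy_als(R, factors=2, alpha=15.0, reg=0.1, iterations=20, seed=0):
    rng = np.random.default_rng(seed)
    users = rng.normal(scale=0.1, size=(R.shape[0], factors))
    items = rng.normal(scale=0.1, size=(R.shape[1], factors))
    for _ in range(iterations):
        users = solve_side(R, items, alpha, reg)      # items held fixed
        items = solve_side(R.T, users, alpha, reg)    # users held fixed
    return users, items

# Made-up user-by-artist play counts with two loose taste groups.
R = np.array([[5, 3, 0, 0],
              [4, 0, 0, 0],
              [0, 0, 2, 6],
              [0, 0, 0, 3]], dtype=float)
users, items = toy_als(R)
pred = users @ items.T
print(np.round(pred, 2))
```

Each half-step is an exact solve, which is why the method alternates: with one side fixed, the objective is quadratic in the other.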


In [38]:
# Find the 10 artists most similar to Jay-Z
item_id = 147068  # Jay-Z
n_similar = 10

# Use implicit to get similar items.
similar = model.similar_items(item_id, n_similar)

# Print the names of our most similar artists
for item in similar:
    idx, score = item
    print(data.artist.loc[data.artist_id == idx].iloc[0])

    


jay-z
outkast
the roots
kanye west
atmosphere
2pac
girl talk
justin timberlake
eminem
beastie boys
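
Item-to-item similarity in a factor model is commonly computed as the cosine similarity between learned item vectors. The sketch below illustrates that idea on made-up factors. It is an assumption about the general technique rather than a copy of the library's implementation, and `most_similar` is a hypothetical helper.

```python
import numpy as np

def most_similar(item_factors, item_id, n=3):
    """Rank items by cosine similarity to the query item's factor vector."""
    norms = np.linalg.norm(item_factors, axis=1)
    norms[norms == 0] = 1e-10  # guard against division by zero
    scores = item_factors @ item_factors[item_id] / (norms * norms[item_id])
    best = np.argsort(-scores)[:n]
    return [(int(i), float(scores[i])) for i in best]

# Made-up 2-dimensional item factors for four items.
item_factors = np.array([[1.0, 0.0],
                         [0.9, 0.1],
                         [0.0, 1.0],
                         [0.7, 0.7]])
result = most_similar(item_factors, 0)
print(result)
```

The query item is always its own nearest neighbour under this measure, which is consistent with jay-z heading the list above.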


In [41]:

# Create recommendations for user with id 999
user_id = 999

# Use the implicit recommender.
recommended = model.recommend(user_id, sparse_user_item)

artists = []
scores = []

# Get artist names from ids
for item in recommended:
    idx, score = item
    artists.append(data.artist.loc[data.artist_id == idx].iloc[0])
    scores.append(score)

# Create a dataframe of artist names and scores
recommendations = pd.DataFrame({'artist': artists, 'score': scores})

print(recommendations)

                artist     score
0         taylor swift  1.299753
1        metro station  1.277994
2        savage garden  1.243646
3       carter burwell  1.227155
4         kate voegele  1.211949
5      plain white t's  1.199481
6             aly & aj  1.195360
7  panic! at the disco  1.160142
8          onerepublic  1.148700
9         superchic[k]  1.146393
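
Conceptually, a recommendation score is the dot product of a user's factor vector with each item's factor vector, and the user-item matrix is used to mask out artists the user has already played. A minimal sketch under those assumptions, with made-up factors and a hypothetical `recommend_for_user` helper, might look like this:

```python
import numpy as np
import scipy.sparse as sparse

def recommend_for_user(user_id, user_factors, item_factors, user_item, n=2):
    """Score every item for one user and drop items the user already has."""
    scores = item_factors @ user_factors[user_id]
    seen = user_item[user_id].indices     # column ids of non-zero entries
    scores[seen] = -np.inf                # never recommend already-heard items
    best = np.argsort(-scores)[:n]
    return [(int(i), float(scores[i])) for i in best]

# Made-up factors: 2 users, 4 items, 2 latent dimensions.
user_factors = np.array([[1.0, 0.0],
                         [0.0, 1.0]])
item_factors = np.array([[0.9, 0.1],
                         [0.8, 0.3],
                         [0.1, 0.9],
                         [0.5, 0.5]])
# User 0 has already played item 0; user 1 has already played item 2.
user_item = sparse.csr_matrix(np.array([[3.0, 0.0, 0.0, 0.0],
                                        [0.0, 0.0, 2.0, 0.0]]))
result = recommend_for_user(0, user_factors, item_factors, user_item)
print(result)
```

Here item 0 has the highest raw score for user 0 but is excluded because it is already in that user's history.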
