# Summary

Created on Thu Jan 23 2020 <br>
@author: goutham <br>

### User-Based Music Recommendation System <br>

Topics
* Collaborative Filtering
* User-Based or Memory-Based Filtering <br>

Similarity Measures Used: 
* Minkowski Distance
* Pearson Correlation <br>

The purpose of this code is to show the rudimentary workings of a recommendation system, so every procedure is coded as simply as possible. The code can nonetheless serve as a starting point for building a more complex recommendation system.
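
For reference, the two similarity measures listed above can be written out as follows. This is a sketch of the standard definitions, with the Pearson correlation given in the single-pass form that the code below computes. The notation is chosen here and is not part of the original notebook: $p_k$ and $q_k$ are the two users' ratings of a commonly rated item $k$, $n$ is the number of commonly rated items, and $r$ is the Minkowski order.

$$
d_r(P, Q) = \left( \sum_{k=1}^{n} \lvert p_k - q_k \rvert^{r} \right)^{1/r}
$$

$$
\rho(P, Q) = \frac{\sum_{k} p_k q_k - \dfrac{\left(\sum_{k} p_k\right)\left(\sum_{k} q_k\right)}{n}}{\sqrt{\sum_{k} p_k^{2} - \dfrac{\left(\sum_{k} p_k\right)^{2}}{n}} \; \sqrt{\sum_{k} q_k^{2} - \dfrac{\left(\sum_{k} q_k\right)^{2}}{n}}}
$$

With $r = 1$ the Minkowski distance is the Manhattan distance, and with $r = 2$ it is the Euclidean distance. The Pearson correlation is undefined when $n = 0$ or when either square root is zero.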

# Setup the Environment

In [0]:
# Import required packages and functions
import math
from operator import itemgetter

# Create distance and similarity calculation modules as a Class

In [0]:
# define class similarity
class similarity:
    
    # Class instantiation 
    def __init__ (self, ratingP, ratingQ):
        self.ratings1 = ratingP
        self.ratings2 = ratingQ

    # Minkowski Distance between two vectors
    def minkowski(self, r):
    
        # calculate minkowski distance
        distance = 0       
        for k in (set(self.ratings1.keys()) & set(self.ratings2.keys())):
            p = self.ratings1[k]
            q = self.ratings2[k]
            distance += pow(abs(p - q), r)
    
        # return value of minkowski distance
        return pow(distance,1/r)

    # Pearson Correlation between two vectors
    def pearson(self):
        
        # Step 1.1
        # set n to the number of common keys
        # do not hardcode! 
        # this should work no matter which 2 dictionaries we provide
        commonRat = set(self.ratings1.keys()) & set(self.ratings2.keys())
        n = len(commonRat)
        
        # Step 1.2
        # error check for n==0 condition, and
        # return -2 if n==0 (no common keys, so no correlation can be computed)
        if n == 0:
            return -2
         
        # Step 1.3
        # use a SINGLE for loop to calculate the partial sums
        # in the computationally efficient form of the pearson correlation   
        sumP=0
        sumPsq=0
        sumQ=0
        sumQsq=0
        sumPQ=0
        for x in commonRat:
            p = self.ratings1[x]
            q = self.ratings2[x]
            sumP = sumP+p
            sumPsq = sumPsq+(p**2)
            sumQ = sumQ+q
            sumQsq = sumQsq+(q**2)
            sumPQ = sumPQ+(p*q)
          
        # Step 1.4
        # calculate the numerator term for pearson correlation
        # using relevant partial sums
        numer = sumPQ-((sumP*sumQ)/n)
        
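        # Step 1.5
        # calculate the denominator term for pearson correlation
        # using relevant partial sums
        # (max(..., 0) guards against tiny negative values caused by floating-point rounding)
        denor = math.sqrt(max(sumPsq-((sumP**2)/n), 0)) * math.sqrt(max(sumQsq-((sumQ**2)/n), 0))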
        
        # Step 1.6
        # error check for denominator==0 condition
        # return -2 if denominator==0 (one user's common ratings have zero variance)
        if denor == 0:
            return -2

        # Step 1.7
        # calculate the pearson correlation
        # using the numerator and denominator
        # and return the pearson correlation
        r = numer/denor
        return r
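
As a quick check of the two measures, the sketch below re-implements them as standalone functions so that it runs on its own, then applies them to two small rating dictionaries. The function names `minkowski_distance` and `pearson_correlation` and the toy ratings are made up for illustration and are not part of the notebook; -2 is kept as the "undefined" sentinel described in the comments above.

```python
import math

def minkowski_distance(ratings_p, ratings_q, r):
    """Minkowski distance of order r over the keys both dictionaries share."""
    common = set(ratings_p) & set(ratings_q)
    total = sum(abs(ratings_p[k] - ratings_q[k]) ** r for k in common)
    return total ** (1 / r)

def pearson_correlation(ratings_p, ratings_q):
    """Single-pass Pearson correlation; returns -2 when it is undefined."""
    common = set(ratings_p) & set(ratings_q)
    n = len(common)
    if n == 0:
        return -2  # no shared keys
    sum_p = sum_q = sum_p_sq = sum_q_sq = sum_pq = 0.0
    for k in common:
        p, q = ratings_p[k], ratings_q[k]
        sum_p += p
        sum_q += q
        sum_p_sq += p * p
        sum_q_sq += q * q
        sum_pq += p * q
    numer = sum_pq - (sum_p * sum_q) / n
    denom = (math.sqrt(max(sum_p_sq - sum_p ** 2 / n, 0.0))
             * math.sqrt(max(sum_q_sq - sum_q ** 2 / n, 0.0)))
    if denom == 0:
        return -2  # one user rated every shared item identically
    return numer / denom

# Hypothetical toy ratings: user_b always rates exactly twice as high as user_a
user_a = {"x": 1.0, "y": 2.0, "z": 3.0}
user_b = {"x": 2.0, "y": 4.0, "z": 6.0, "w": 5.0}  # "w" is ignored: not shared

manhattan = minkowski_distance(user_a, user_b, 1)   # |1-2| + |2-4| + |3-6|
euclidean = minkowski_distance(user_a, user_b, 2)   # sqrt(1 + 4 + 9)
correlation = pearson_correlation(user_a, user_b)   # perfectly linear ratings

print(manhattan, round(euclidean, 3), round(correlation, 3))
```

The two users disagree on every absolute rating, so the distances are non-zero, yet their preferences move together, so the correlation is essentially 1. This difference is why a correlation-based measure can suit users who rate on different personal scales.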

# Load the data

In [0]:
# As the intention was to show how the system works, the data size is limited for ease of processing

# user ratings
songData = {"Angelica": {"Blues Traveler": 3.5, "Broken Bells": 2.0, "Norah Jones": 4.5, "Phoenix": 5.0, "Slightly Stoopid": 1.5, "The Strokes": 2.5, "Vampire Weekend": 2.0},
         "Bill":{"Blues Traveler": 2.0, "Broken Bells": 3.5, "Deadmau5": 4.0, "Phoenix": 2.0, "Slightly Stoopid": 3.5, "Vampire Weekend": 3.0},
         "Chan": {"Blues Traveler": 5.0, "Broken Bells": 1.0, "Deadmau5": 1.0, "Norah Jones": 3.0, "Phoenix": 5, "Slightly Stoopid": 1.0},
         "Dan": {"Blues Traveler": 3.0, "Broken Bells": 4.0, "Deadmau5": 4.5, "Phoenix": 3.0, "Slightly Stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 2.0},
         "Hailey": {"Broken Bells": 4.0, "Deadmau5": 1.0, "Norah Jones": 4.0, "The Strokes": 4.0, "Vampire Weekend": 1.0},
         "Jordyn":  {"Broken Bells": 4.5, "Deadmau5": 4.0, "Norah Jones": 5.0, "Phoenix": 5.0, "Slightly Stoopid": 4.5, "The Strokes": 4.0, "Vampire Weekend": 4.0},
         "Sam": {"Blues Traveler": 5.0, "Broken Bells": 2.0, "Norah Jones": 3.0, "Phoenix": 5.0, "Slightly Stoopid": 4.0, "The Strokes": 5.0},
         "Veronica": {"Blues Traveler": 3.0, "Norah Jones": 5.0, "Phoenix": 4.0, "Slightly Stoopid": 2.5, "The Strokes": 3.0}
        }
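
Because each user has rated only some of the artists, both measures compare two users only on the artists they have both rated. The short sketch below shows that overlap for two users. Their ratings are copied from `songData` so that the block runs on its own, and the variable names are illustrative.

```python
# Ratings copied from songData for two of the users
hailey = {"Broken Bells": 4.0, "Deadmau5": 1.0, "Norah Jones": 4.0,
          "The Strokes": 4.0, "Vampire Weekend": 1.0}
veronica = {"Blues Traveler": 3.0, "Norah Jones": 5.0, "Phoenix": 4.0,
            "Slightly Stoopid": 2.5, "The Strokes": 3.0}

# Artists rated by both users: the only ratings the similarity measures see
common_artists = sorted(set(hailey) & set(veronica))
pairs = [(artist, hailey[artist], veronica[artist]) for artist in common_artists]

for artist, h, v in pairs:
    print(f"{artist}: Hailey={h}, Veronica={v}")
```

Hailey gave the same rating to both shared artists, so her ratings have zero variance over the overlap and the Pearson denominator is zero. That is the situation the -2 return value is meant to flag.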

# Make a Recommendation

In [0]:
# for whom are we making recommendations?
userX = "Veronica" #Angelica,Bill,Chan,Dan,Hailey,Jordyn,Sam,Veronica

In [5]:
# Step 2.0
# Extract userX ratings
userXRatings = songData[userX]

# Step 2.1
# find the similarity measure (pearson correlation) between userX's ratings and each of the other users' ratings.
# DO NOT include userX's similarity with userX.
# use a for loop to get at the other users and their ratings - DO NOT hard code.
# use the similarity class to calculate the similarity measure (pearson correlation) between user ratings.
# assign list of (user, similarityMeasure) tuples to a variable called userSimilarities.
# Example of how userSimilarities might look: [('Angelica', 0.42), ('Bill', 0.0), ('Chan', 0.5), ('Dan', 0.39), ('Jordyn', 0.61), ('Sam', -2), ('Veronica', -2)]
userSimilarities = []
for userY, userYRatings in songData.items():
    sim = similarity(userXRatings,userYRatings)
    pearsonXY = sim.pearson()
    if userX != userY:
        userSimilarityXY = (userY,pearsonXY)
        userSimilarities.append(userSimilarityXY)
#print(userSimilarities)

# Step 2.2
# sort the list of tuples by highest similarity to lowest similarity.
# assign the sorted list to a variable called sortedUserSimilarities.
# Example of how sortedUserSimilarities might look: [('Jordyn', 0.61), ('Chan', 0.5), ('Angelica', 0.42), ('Dan', 0.39), ('Bill', 0.0), ('Sam', -2), ('Veronica', -2)]
sortedUserSimilarities = sorted(userSimilarities, key=itemgetter(1), reverse=True)
#print(sortedUserSimilarities)

# Step 2.3
# userX's NN is the user at the 0th position of the sorted list.
# assign the NN to a variable called userXNN.
# Example of how userXNN might look: 'Jordyn'
# To keep things simple, we are considering only one NN
userXNN = sortedUserSimilarities[0][0]
#print(userXNN)

# Step 2.4
# recos for userX should include artists rated by userXNN, not already rated by userX.
# assign the list of (artist, rating) tuples to a variable called userXRecos.
# Example of how userXRecos might look: [('Slightly Stoopid', 4.5), ('Phoenix', 5.0)]
userXNNRatings = songData[userXNN]

userXRecos = []
for artist, rating in userXNNRatings.items():
    if artist not in userXRatings:
        userXRecos.append((artist, rating))
#print(userXRecos)

# Step 2.5
# sort list of tuples by highest rating to lowest rating.
# assign sorted list to a variable userXSortedRecos.
# Example of how userXSortedRecos might look: [('Phoenix', 5.0), ('Slightly Stoopid', 4.5)]
userXSortedRecos = sorted(userXRecos, key=itemgetter(1), reverse=True)
#print(userXSortedRecos)

# Final Output
print()
print ("Recommendations for", userX)
print ("-"*(len("Recommendations for"+userX)+1))
print ()
print (userXSortedRecos)


Recommendations for Veronica
----------------------------

[('Broken Bells', 2.0), ('Vampire Weekend', 2.0)]
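
Steps 2.0 to 2.5 can also be collected into one reusable function. The following is a minimal sketch under stated assumptions, not the notebook's own implementation: the names `pearson` and `recommend` are hypothetical, only three of the users from `songData` are included to keep the block short, and ties in rating are broken alphabetically so that the output order is deterministic.

```python
import math

def pearson(ratings_p, ratings_q):
    """Pearson correlation over shared keys; -2 signals 'undefined'."""
    common = set(ratings_p) & set(ratings_q)
    n = len(common)
    if n == 0:
        return -2
    sum_p = sum(ratings_p[k] for k in common)
    sum_q = sum(ratings_q[k] for k in common)
    sum_p_sq = sum(ratings_p[k] ** 2 for k in common)
    sum_q_sq = sum(ratings_q[k] ** 2 for k in common)
    sum_pq = sum(ratings_p[k] * ratings_q[k] for k in common)
    numer = sum_pq - sum_p * sum_q / n
    denom = (math.sqrt(max(sum_p_sq - sum_p ** 2 / n, 0.0))
             * math.sqrt(max(sum_q_sq - sum_q ** 2 / n, 0.0)))
    return numer / denom if denom != 0 else -2

def recommend(user, data):
    """Return (nearest_neighbour, [(artist, rating), ...]) for `user`."""
    own = data[user]
    # Similarity to every other user, most similar first
    similarities = sorted(
        ((other, pearson(own, ratings))
         for other, ratings in data.items() if other != user),
        key=lambda pair: pair[1],
        reverse=True,
    )
    neighbour = similarities[0][0]
    # Artists the neighbour rated that `user` has not, best rated first
    recos = [(a, r) for a, r in data[neighbour].items() if a not in own]
    recos.sort(key=lambda pair: (-pair[1], pair[0]))
    return neighbour, recos

# Subset of songData (values copied from the notebook)
sample = {
    "Angelica": {"Blues Traveler": 3.5, "Broken Bells": 2.0, "Norah Jones": 4.5,
                 "Phoenix": 5.0, "Slightly Stoopid": 1.5, "The Strokes": 2.5,
                 "Vampire Weekend": 2.0},
    "Hailey": {"Broken Bells": 4.0, "Deadmau5": 1.0, "Norah Jones": 4.0,
               "The Strokes": 4.0, "Vampire Weekend": 1.0},
    "Veronica": {"Blues Traveler": 3.0, "Norah Jones": 5.0, "Phoenix": 4.0,
                 "Slightly Stoopid": 2.5, "The Strokes": 3.0},
}

neighbour, recos = recommend("Veronica", sample)
print(neighbour, recos)
```

On this subset the nearest neighbour is Angelica, and the recommendations are the same two artists as in the output above. Returning the neighbour alongside the recommendations makes it easier to see why a given list was produced, and the function would be a natural place to extend the approach to more than one nearest neighbour.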
