### User-based recommendation system

- This notebook builds a user-based recommendation system.
- It recommends locations (museums, monuments, events, and so on) that a user has not seen before.
- The dataset is a small mock-up, but the system should work with any dataset in the same format.
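
The cells below read `data/data.json` and index it as `data[user][location]`, which suggests a JSON object that maps each user name to an object of location names and numeric ratings. A minimal sketch of that shape follows; the user names, locations, and ratings in `sample` are made up for illustration and are not taken from the real file.

```python
import json

# Hypothetical sample in the assumed format: {user: {location: rating}}
sample = {
    "Anna": {"Location A": 2.5, "Location B": 4.0},
    "Luca": {"Location A": 3.0},
}

# Round-trip through JSON to confirm the structure serializes cleanly
text = json.dumps(sample, ensure_ascii=False, indent=2)
loaded = json.loads(text)

# Same summary as the notebook prints: user name and number of ratings
for user, ratings in loaded.items():
    print(user, len(ratings))
```

Any dataset that follows this nesting should be usable with the functions defined later, as long as the ratings are numbers.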

In [38]:
import json

# Load the ratings; the context manager closes the file afterwards
with open('data/data.json') as f:
    data = json.load(f)

In [39]:
# Print each user with the number of locations they have rated
for user, ratings in data.items():
    print(user, len(ratings))

Alket 6
Paolo 6
Maurizio 4
Simone 5
Francesco 6
Marco 5
Mirko 3


In [40]:
# Access the rating a user gave to a location
data['Alket']['Museo delle cere']

2.5

### Finding Similar Users

- After collecting data about what people like, you need a way to determine how similar their tastes are. You do this by comparing each person with every other person and calculating a similarity score.

- This notebook uses two measures for calculating similarity scores: Euclidean distance and Pearson correlation.
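
For reference, the two scores as computed by `sim_distance` and `sim_pearson` in this notebook can be written as follows. Here $x_i$ and $y_i$ are the two users' ratings for the $n$ items both have rated; the symbols are notation introduced for this summary only.

$$
s_{\text{dist}} = \frac{1}{1 + \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}}
$$

$$
r = \frac{\sum_{i} x_i y_i - \frac{\sum_{i} x_i \sum_{i} y_i}{n}}{\sqrt{\left(\sum_{i} x_i^2 - \frac{\left(\sum_{i} x_i\right)^2}{n}\right)\left(\sum_{i} y_i^2 - \frac{\left(\sum_{i} y_i\right)^2}{n}\right)}}
$$

The distance-based score lies in $(0, 1]$ and equals 1 when the shared ratings are identical, while $r$ lies in $[-1, 1]$. Both functions return 0 when the users share no rated items.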

### Using Euclidean distance

In [41]:
from math import sqrt

# Returns a distance-based similarity score for person1 and person2
def sim_distance(prefs, person1, person2):
    # Collect the items both people have rated
    shared_items = {}
    for item in prefs[person1]:
        if item in prefs[person2]:
            shared_items[item] = 1

    # Debug output: show the shared items
    print(shared_items)

    # If they have no ratings in common, return 0
    if len(shared_items) == 0:
        return 0

    # Add up the squares of all the differences
    sum_of_squares = sum((prefs[person1][item] - prefs[person2][item]) ** 2
                         for item in shared_items)

    # Map the distance into (0, 1]: identical ratings give 1
    return 1 / (1 + sqrt(sum_of_squares))

In [42]:
sim_distance(data, 'Alket','Francesco')

{'Museo delle cere': 1, 'Duomo di Bologna': 1, 'Piazza Nettuno': 1, 'Punto Panoramico San Luca': 1, 'Mercato delle spezie': 1, 'Teatro Valli': 1}


0.4142135623730951

### Using Pearson correlation

In [43]:
# Returns the Pearson correlation coefficient for p1 and p2
def sim_pearson(prefs, p1, p2):
    # Get the list of mutually rated items
    si = {}
    for item in prefs[p1]:
        if item in prefs[p2]:
            si[item] = 1

    # Find the number of elements
    n = len(si)

    # If they have no ratings in common, return 0
    if n == 0:
        return 0

    # Add up all the preferences
    sum1 = sum(prefs[p1][it] for it in si)
    sum2 = sum(prefs[p2][it] for it in si)

    # Sum up the squares
    sum1Sq = sum(prefs[p1][it] ** 2 for it in si)
    sum2Sq = sum(prefs[p2][it] ** 2 for it in si)

    # Sum up the products
    pSum = sum(prefs[p1][it] * prefs[p2][it] for it in si)

    # Calculate the Pearson score
    num = pSum - (sum1 * sum2 / n)
    den = sqrt((sum1Sq - sum1 ** 2 / n) * (sum2Sq - sum2 ** 2 / n))
    # A zero denominator means one user's shared ratings have no variance
    if den == 0:
        return 0

    return num / den

In [44]:
sim_pearson(data,'Alket','Francesco')

0.6482037235521628
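
One practical difference between the two measures is how they treat a user who rates everything consistently higher than another. The sketch below uses two hypothetical users and compact re-implementations of the two scores (`euclidean_sim` and `pearson_sim` are illustrative names that take two rating dictionaries directly, unlike the notebook's functions).

```python
from math import sqrt

def euclidean_sim(a, b):
    # Distance-based score over the items both users rated
    shared = [k for k in a if k in b]
    if not shared:
        return 0
    return 1 / (1 + sqrt(sum((a[k] - b[k]) ** 2 for k in shared)))

def pearson_sim(a, b):
    # Pearson correlation over the items both users rated
    shared = [k for k in a if k in b]
    n = len(shared)
    if n == 0:
        return 0
    sum_a = sum(a[k] for k in shared)
    sum_b = sum(b[k] for k in shared)
    sum_a_sq = sum(a[k] ** 2 for k in shared)
    sum_b_sq = sum(b[k] ** 2 for k in shared)
    p_sum = sum(a[k] * b[k] for k in shared)
    num = p_sum - sum_a * sum_b / n
    den = sqrt((sum_a_sq - sum_a ** 2 / n) * (sum_b_sq - sum_b ** 2 / n))
    return num / den if den else 0

# Hypothetical users: "generous" rates every place exactly 1.0 higher than "strict"
strict = {"A": 1.0, "B": 2.0, "C": 3.0}
generous = {"A": 2.0, "B": 3.0, "C": 4.0}

print(euclidean_sim(strict, generous))  # about 0.366
print(pearson_sim(strict, generous))    # 1.0
```

The Pearson score treats the two users as perfectly aligned because their ratings move together, while the distance-based score penalizes the constant offset. Which behavior is preferable depends on whether absolute rating levels matter for the application.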

In [45]:
# Returns the best matches for person from the prefs dictionary.
# Number of results and similarity function are optional params.
def topMatches(prefs, person, n=5, similarity=sim_pearson):
    scores = [(similarity(prefs, person, other), other)
              for other in prefs if other != person]

    # Sort so the highest scores appear first
    scores.sort(reverse=True)
    return scores[0:n]

In [46]:
topMatches(data,'Alket',n=3)

[(0.6482037235521628, 'Francesco'),
 (-0.10714285714285597, 'Marco'),
 (-0.13483997249264842, 'Maurizio')]
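
When the number of users is large and only a few matches are needed, sorting the full list is not strictly necessary. The sketch below is one possible variant that uses `heapq.nlargest` from the standard library. The names `top_matches`, `shared_count`, and `toy` are hypothetical, and `shared_count` is only a stand-in similarity for the demo, not one of the measures used in this notebook.

```python
import heapq

def top_matches(prefs, person, n, similarity):
    """Return the n highest (score, other) pairs without sorting the full list."""
    scores = ((similarity(prefs, person, other), other)
              for other in prefs if other != person)
    return heapq.nlargest(n, scores)

# Stand-in similarity for the demo only: number of locations both users rated
def shared_count(prefs, p1, p2):
    return len(set(prefs[p1]) & set(prefs[p2]))

# Hypothetical ratings
toy = {
    "Anna": {"A": 4.0, "B": 3.0, "C": 5.0},
    "Luca": {"A": 2.0, "B": 3.5},
    "Gina": {"C": 1.0},
    "Teo": {"D": 4.0},
}

print(top_matches(toy, "Anna", n=2, similarity=shared_count))
# [(2, 'Luca'), (1, 'Gina')]
```

For a dataset as small as the one in this notebook, the sort-based version is just as suitable.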

### Recommending Items
- Finding a user with similar tastes is useful, but what we really want is a list of recommended locations. We could take the single most similar user and pick a location they liked that the target user has not seen yet, but that would be too permissive: it could miss locations that this one user never rated, and it could surface a location that this user oddly liked but that every other user returned by topMatches rated poorly.

- To solve these issues, score each item with a weighted average of the other users' ratings. For each location, multiply every other user's rating by that user's similarity to the target user, sum the products, and divide by the sum of those similarities.
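
The arithmetic behind the weighted score can be sketched with a single location and two other users. All numbers and names below are hypothetical and serve only to show the calculation.

```python
# Hypothetical similarities between the target user and two other users
similarities = {"Anna": 0.9, "Luca": 0.3}
# Hypothetical ratings those users gave to one location the target has not seen
ratings = {"Anna": 4.0, "Luca": 2.0}

# Each rating counts in proportion to how similar its author is to the target
weighted_total = sum(similarities[u] * ratings[u] for u in ratings)
similarity_sum = sum(similarities[u] for u in ratings)

# Dividing by the sum of similarities keeps the result on the rating scale
predicted = weighted_total / similarity_sum
print(round(predicted, 2))  # 3.5
```

The prediction lands closer to Anna's rating than to Luca's because Anna's similarity weight is three times larger.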

In [47]:
# Gets recommendations for a person by using a weighted average
# of every other user's ratings
def getRecommendations(prefs, person, similarity=sim_pearson):
    totals = {}
    simSums = {}
    for other in prefs:
        # Don't compare the person to themselves
        if other == person:
            continue
        sim = similarity(prefs, person, other)

        # Ignore scores of zero or lower
        if sim <= 0:
            continue
        for item in prefs[other]:
            # Only score locations the person hasn't rated yet
            if item not in prefs[person] or prefs[person][item] == 0:
                # Similarity * rating
                totals.setdefault(item, 0)
                totals[item] += prefs[other][item] * sim
                # Sum of similarities
                simSums.setdefault(item, 0)
                simSums[item] += sim

    # Create the normalized list
    rankings = [(total / simSums[item], item) for item, total in totals.items()]

    # Return the list sorted from highest to lowest score
    rankings.sort(reverse=True)
    return rankings

In [49]:
getRecommendations(data,'Mirko')

[(4.5, 'Piazza Nettuno'),
 (4.0, 'Punto Panoramico San Luca'),
 (2.5, 'Mercato delle spezie')]