# Similarity Scores

K-Nearest Neighbors picks similar data points, but how do we decide what "similar" means? Two popular measures are the Euclidean score and the Pearson score.

- Euclidean scores are based on the Euclidean distance between two points. The distance itself can be unbounded.
  - The score uses the inverse of the distance, so it falls as the distance grows.
- Pearson scores are computed from the covariance and standard deviations of the two points' values. They range from -1 (very dissimilar) through 0 (no linear relation) to 1 (very similar).
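
To make the two definitions concrete, here is a minimal sketch of how such scores might be computed. It is not the code in `common.compute_scores`, which is not shown here. It assumes the ratings are a dict mapping each user to a dict of movie ratings, that only movies rated by both users are compared, and that the Euclidean score is taken as `1 / (1 + distance)`. The names `euclidean_score`, `pearson_score`, and `toy` are hypothetical.

```python
import math


def euclidean_score(ratings, user1, user2):
    """Assumed form: 1 / (1 + Euclidean distance) over commonly rated movies."""
    common = set(ratings[user1]) & set(ratings[user2])
    if not common:
        return 0.0  # assumption: no shared movies means no similarity
    distance = math.sqrt(
        sum((ratings[user1][m] - ratings[user2][m]) ** 2 for m in common)
    )
    return 1.0 / (1.0 + distance)


def pearson_score(ratings, user1, user2):
    """Pearson correlation of the two users' ratings over commonly rated movies."""
    common = sorted(set(ratings[user1]) & set(ratings[user2]))
    n = len(common)
    if n == 0:
        return 0.0
    x = [ratings[user1][m] for m in common]
    y = [ratings[user2][m] for m in common]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-deviations and squared deviations; the 1/n factors cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: constant ratings are treated as "no relation"
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data, not taken from ratings.json.
toy = {
    "A": {"m1": 1.0, "m2": 2.0, "m3": 3.0},
    "B": {"m1": 2.0, "m2": 4.0, "m3": 6.0},
    "C": {"m1": 3.0, "m2": 2.0, "m3": 1.0},
}

print(euclidean_score(toy, "A", "B"))
print(pearson_score(toy, "A", "B"))
print(pearson_score(toy, "A", "C"))
```

In this toy data, users A and B rate on different scales but in the same order, so their Pearson score is high even though their Euclidean score is low. A and C rate in opposite order, which gives a negative Pearson score.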

## Imports

In [43]:
import numpy
import json
import sys
import ipywidgets as widgets

sys.path.append("../")

from common.compute_scores import euclidean, pearson

## Data Set and Widgets

In [44]:
with open("ratings.json", 'r') as file:
    dataset = json.load(file)

users = tuple(sorted(dataset.keys()))

user1 = widgets.Dropdown(options=users, description="User #1")
user2 = widgets.Dropdown(options=users, description="User #2")
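
The contents of `ratings.json` are not shown in this notebook. The sketch below illustrates the layout the code appears to assume: a JSON object keyed by user name, with each value mapping movie titles to ratings. The user names, titles, and numbers are hypothetical placeholders.

```python
import json

# Hypothetical sample; the real ratings.json is not reproduced here.
sample_json = """
{
  "User B": {"Movie X": 3.0, "Movie Y": 4.5},
  "User A": {"Movie X": 2.5, "Movie Z": 5.0}
}
"""

sample_dataset = json.loads(sample_json)

# Same construction as the `users` tuple above: sorted user names.
sample_users = tuple(sorted(sample_dataset.keys()))

# Movies rated by both users, the natural basis for comparing them.
shared = sorted(set(sample_dataset["User A"]) & set(sample_dataset["User B"]))

print(sample_users)
print(shared)
```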

## Computing Similarity

In [50]:
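def similarity(user1, user2):
    """Print both similarity scores for the two selected users."""
    # Inside this function, user1 and user2 are the selected names (strings),
    # not the Dropdown widgets defined above.
    euclidean_score = euclidean(dataset, user1, user2)
    pearson_score = pearson(dataset, user1, user2)
    
    print("Movie Taste: {0} vs. {1}".format(user1, user2))
    # Sorting dataset[user] lists that user's movies in alphabetical order.
    print("{0}'s Movies:\t{1}".format(user1, tuple(sorted(dataset[user1]))))
    print("{0}'s Movies:\t{1}".format(user2, tuple(sorted(dataset[user2]))))
    print()
    print("Euclidean Similarity:\t{0:.4f}".format(euclidean_score))
    print("Pearson Similarity:\t{0:.4f}".format(pearson_score))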

widgets.interactive(
    similarity,
    user1=user1,
    user2=user2
)
