# K-Nearest Neighbor Regressor

In [12]:
import json
from sklearn.neighbors import KNeighborsRegressor

## 1. Regression

**Task 1**  
- We’ve imported most of the K-Nearest Neighbor algorithm. 
- Before we finish the regressor, let’s refresh our memory of the data.
- At the bottom of your code, print `movie_dataset["Life of Pi"]`. 
- You should see a list of three values. 
- These values are the normalized values for the movie’s budget, runtime, and release year.

<br>

**Task 2**  
- Print the rating for `"Life of Pi"`. 
- This can be found in `movie_ratings`.

<br>

**Task 3**  
- We’ve included the majority of the K-Nearest Neighbor algorithm in the `predict()` function. 
- Right now, the variable `neighbors` stores a list of `[distance, title]` pairs.
- Loop through every neighbor and find its rating in `movie_ratings`. 
- Add those ratings together and return that sum divided by the total number of neighbors.
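
The averaging step in Task 3 can be sketched on a tiny, made-up dataset. The titles, features, and ratings below are hypothetical and exist only for illustration, and the helper name `predict_unweighted` is not part of the exercise. `math.dist` is the standard-library Euclidean distance (Python 3.8+).

```python
import math

# Hypothetical toy data: title -> [feature1, feature2] and title -> rating
toy_dataset = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 2.0], "D": [5.0, 5.0]}
toy_ratings = {"A": 6.0, "B": 7.0, "C": 8.0, "D": 1.0}

def predict_unweighted(unknown, dataset, ratings, k):
    # Build [distance, title] pairs, sort by distance, keep the k closest
    neighbors = sorted([math.dist(features, unknown), title]
                       for title, features in dataset.items())[:k]
    # Plain average of the neighbors' ratings
    return sum(ratings[title] for _, title in neighbors) / len(neighbors)

toy_prediction = predict_unweighted([0.0, 0.0], toy_dataset, toy_ratings, 3)
print(toy_prediction)  # 7.0
```

The three closest toy points are A, B, and C, so the prediction is the mean of 6.0, 7.0, and 8.0; the distant point D is ignored.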

<br>

**Task 4**  
- Call `predict` with the following parameters:
    - `[0.016, 0.300, 1.022]`
    - `movie_dataset`
    - `movie_ratings`
    - `5`
- Print the result.
- Note that the list `[0.016, 0.300, 1.022]` is the normalized budget, runtime, and year of the movie *Incredibles 2*! 
- The normalized year is larger than 1 because our training set only had movies that were released between 1927 and 2016 — *Incredibles 2* was released in 2018.
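
A value above 1 is what min-max scaling produces for an input beyond the training maximum. The sketch below assumes the features were scaled as `(value - min) / (max - min)` over the 1927 to 2016 range mentioned above. The helper name `min_max_normalize` is hypothetical, and the dataset's actual preprocessing code is not shown in this notebook.

```python
def min_max_normalize(value, minimum, maximum):
    # Maps minimum -> 0 and maximum -> 1; values outside the range land outside [0, 1]
    return (value - minimum) / (maximum - minimum)

# Assumption: release years in the training set span 1927 to 2016
incredibles_year = min_max_normalize(2018, 1927, 2016)
print(round(incredibles_year, 3))  # 1.022
```

Under the same assumption, 2012 (the release year of *Life of Pi*) maps to about 0.955, which agrees with the third value stored for that movie in `movie_dataset`.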

In [9]:
# Load the data with context managers so the files are closed afterwards
with open("movie_dataset.json") as f:
    movie_dataset = json.load(f)
with open("movie_ratings.json") as f:
    movie_ratings = json.load(f)


def distance(movie1, movie2):
    squared_difference = 0
    for i in range(len(movie1)):
        squared_difference += (movie1[i] - movie2[i]) ** 2
    final_distance = squared_difference ** 0.5
    return final_distance

def predict(unknown, dataset, movie_ratings, k):
    distances = []
    # Loop through all points in the dataset
    for title in dataset:
        movie = dataset[title]
        distance_to_point = distance(movie, unknown)
        # Store the distance together with the title it belongs to
        distances.append([distance_to_point, title])
    distances.sort()
    # Keep only the k closest points
    neighbors = distances[0:k]
    rating_sum = 0
    for neighbor in neighbors:
        rating = movie_ratings[neighbor[1]]
        rating_sum += rating
    # Divide by the number of neighbors actually used (fewer than k for a small dataset)
    return rating_sum / len(neighbors)



print(movie_dataset["Life of Pi"])
print(movie_ratings["Life of Pi"])
print(predict([0.016, 0.300, 1.022], movie_dataset, movie_ratings, 5))

[0.00982356711895032, 0.30716723549488056, 0.9550561797752809]
8.0
6.859999999999999


## 2. Weighted Regression

**Task 1**  
- Let’s redo our `predict()` function so it computes the weighted average.
- Before you begin looping through the neighbors, create a variable named `numerator` and set it to `0`. 
- Loop through every neighbor and add the neighbor’s rating (found in `movie_ratings`) divided by the neighbor’s distance to `numerator`.
- For now, return `numerator`.

<br>

**Task 2**  
- Let’s now calculate the denominator of the weighted average. 
- Before your loop, create a variable named `denominator` and set it equal to `0`.
- Inside your for loop, add `1` divided by the neighbor’s distance to `denominator`.
- Outside the loop, return `numerator/denominator`.
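
A small worked example may help show what the numerator and denominator do. The distances and ratings below are made up for illustration and are not taken from the movie data.

```python
# Hypothetical neighbors as [distance, rating] pairs
toy_neighbors = [[1.0, 5.0], [2.0, 7.0], [4.0, 9.0]]

numerator = 0
denominator = 0
for dist, rating in toy_neighbors:
    numerator += rating / dist   # closer neighbors contribute more
    denominator += 1 / dist      # running sum of the weights

weighted = numerator / denominator
unweighted = sum(rating for _, rating in toy_neighbors) / len(toy_neighbors)
print(weighted, unweighted)
```

Here the numerator is 5 + 3.5 + 2.25 = 10.75 and the denominator is 1 + 0.5 + 0.25 = 1.75. The closest neighbor has the lowest rating, so the weighted average (about 6.14) falls below the plain average (7.0).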

<br>

**Task 3**  
- Once again call your `predict` function using *Incredibles 2*’s features. 
- Those features were `[0.016, 0.300, 1.022]`. 
- Set `k = 5`. 
- Print the results.
- How did using a weighted average change the predicted rating? Remember, the unweighted prediction was 6.86.

In [11]:
# Load the data with context managers so the files are closed afterwards
with open("movie_dataset.json") as f:
    movie_dataset = json.load(f)
with open("movie_ratings.json") as f:
    movie_ratings = json.load(f)


def distance(movie1, movie2):
    squared_difference = 0
    for i in range(len(movie1)):
        squared_difference += (movie1[i] - movie2[i]) ** 2
    final_distance = squared_difference ** 0.5
    return final_distance

def predict(unknown, dataset, movie_ratings, k):
    distances = []
    # Loop through all points in the dataset
    for title in dataset:
        movie = dataset[title]
        distance_to_point = distance(movie, unknown)
        # Store the distance together with the title it belongs to
        distances.append([distance_to_point, title])
    distances.sort()
    # Keep only the k closest points
    neighbors = distances[0:k]
    numerator = 0
    denominator = 0
    for neighbor in neighbors:
        rating = movie_ratings[neighbor[1]]
        # An exact match has distance 0; return its rating to avoid dividing by zero
        if neighbor[0] == 0:
            return rating
        numerator += rating / neighbor[0]
        denominator += 1 / neighbor[0]
    return numerator / denominator


print(predict([0.016, 0.300, 1.022], movie_dataset, movie_ratings, 5))

6.849139678439045


## 3. Scikit-learn

**Task 1**  
- Create a `KNeighborsRegressor` named `regressor` where `n_neighbors = 5` and `weights = "distance"`.

<br>

**Task 2**  
- We’ve also imported some movie data. 
- Train your regressor using `movie_dataset` as the training points and `movie_ratings` as the training values.
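
`fit` pairs training points with training values by position, so the two lists need to follow the same title order. One way to guarantee that is to build both lists from a single list of titles. The sketch below uses hypothetical toy dictionaries and only the standard library; whether the real JSON files already share an ordering is not stated in this notebook.

```python
# Hypothetical toy data whose keys are deliberately in different orders
toy_dataset = {"A": [0.1, 0.2], "B": [0.3, 0.4], "C": [0.5, 0.6]}
toy_ratings = {"C": 8.0, "A": 6.0, "B": 7.0}

titles = list(toy_dataset)                          # one shared ordering
training_points = [toy_dataset[t] for t in titles]  # features, in title order
training_values = [toy_ratings[t] for t in titles]  # ratings, in the same order

print(training_points)
print(training_values)
```

Lists built this way could then be passed as `regressor.fit(training_points, training_values)`.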

<br>

**Task 3**  
- Let’s predict some movie ratings. 
- Predict the ratings for the following movies:
    - `[0.016, 0.300, 1.022]`,
    - `[0.0004092981, 0.283, 1.0112]`,
    - `[0.00687649, 0.235, 1.0112]` 
- These three lists are the features for *Incredibles 2*, *The Big Sick*, and *The Greatest Showman*. 
- The three numbers for each movie are its normalized budget, runtime, and year of release.
- Print the predictions!

In [13]:
with open("movie_dataset.json") as f:
    movie_dataset = json.load(f)
with open("movie_ratings.json") as f:
    movie_ratings = json.load(f)

In [16]:
# Task 1
regressor = KNeighborsRegressor(n_neighbors=5, weights="distance")

# Task 2
regressor.fit(list(movie_dataset.values()), list(movie_ratings.values()))

# Task 3
regressor.predict([[0.016, 0.300, 1.022], [0.0004092981, 0.283, 1.0112], [0.00687649, 0.235, 1.0112]])

array([6.84913968, 5.47572913, 6.91067999])