# The K-Nearest Neighbors Algorithm with Julia
## Daisy Nsibu

k-nearest neighbors (KNN) is one of the simplest supervised machine learning algorithms there is. It can be used to solve classification, regression, and search (recommendation) problems. It makes no assumptions about the shape of the data's distribution (it is non-parametric), and it doesn’t require any heavy mathematical machinery.

The only things it requires are:

•	Some notion of distance

•	An assumption that points that are close to one another are similar
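To make those two requirements concrete, here is a minimal sketch of KNN classification by majority vote: distance supplies the "notion of distance", and trusting the labels of the closest points is the "close means similar" assumption. The helper names `euclid` and `knn_classify` and the toy points and labels are hypothetical, made up for illustration; they are not part of the movie example.

```julia
# Minimal k-NN classifier sketch: majority vote among the k closest points.
# euclid, knn_classify and the toy data are hypothetical, for illustration only.

# Euclidean distance between two points given as equal-length tuples
euclid(p, q) = sqrt(sum((a - b)^2 for (a, b) in zip(p, q)))

function knn_classify(query, points, labels, k)
    # Indices of the stored points, nearest first
    order = sortperm([euclid(query, p) for p in points])
    nearest = labels[order[1:k]]
    # Tally the labels of the k nearest neighbors
    counts = Dict{eltype(labels), Int}()
    for lab in nearest
        counts[lab] = get(counts, lab, 0) + 1
    end
    # Pick the most common label (ties go to whichever label the Dict yields first)
    best, best_count = first(nearest), 0
    for (lab, c) in counts
        if c > best_count
            best, best_count = lab, c
        end
    end
    return best
end

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
labels = ["A", "A", "A", "B", "B"]
println(knn_classify((1.1, 1.0), points, labels, 3))  # prints A
println(knn_classify((4.8, 5.1), points, labels, 3))  # prints B (two B votes, one A)
```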

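Pros:
+  Can be used for classification, regression, and search problems
+  Simple and easy to implement
+  No training step: there are no model parameters to fit (only k and the distance measure must be chosen)

Cons:
+  Slow at prediction time: every query computes its distance to every stored point
+  Performance degrades with high-dimensional feature vectors and with large volumes of data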
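Several of these points show up in a tiny regression sketch: the prediction is the mean target of the k nearest points, there is nothing to fit beforehand (the "model" is just the stored data), and every query scans all stored points. The names `euclid` and `knn_regress` and the 1-D data are hypothetical; plain averaging is one common choice for KNN regression, not the only one (distance-weighted averages are another).

```julia
# k-NN regression sketch: predict the mean target of the k nearest points.
# No fitting step: the stored (points, targets) pair is the whole "model",
# and each prediction computes a distance to every stored point.
using Statistics  # for mean

# Euclidean distance between two points given as equal-length tuples
euclid(p, q) = sqrt(sum((a - b)^2 for (a, b) in zip(p, q)))

function knn_regress(query, points, targets, k)
    dists = [euclid(query, p) for p in points]  # one distance per stored point
    nearest = sortperm(dists)[1:k]              # indices of the k closest points
    return mean(targets[nearest])
end

xs = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,)]  # made-up 1-D inputs
ys = [2.0, 4.0, 6.0, 20.0, 22.0]                 # made-up targets
println(knn_regress((2.1,), xs, ys, 3))  # mean of 4.0, 6.0, 2.0, prints 4.0
```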


# KNN in Practice

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcR6_nOORLYqVlmjaWOMOAg1_JdIvFZlVSb99w&usqp=CAU)



This is a simple example showing how useful k-nearest neighbors can be. In the following example, the data are real IMDB-rated movies, and the algorithm recommends movies that are similar to the queried movie.

Given our movie data set, let's find the 6 movies most similar to any given movie!

## Data Set Information:

We are going to use the [movies_recommendation data](https://github.com/Dnsibu/NsibuD_DATA_4319/blob/main/Supervised%20Learning/02-KNN/movies_recommendation_data.csv).

The dataset contains 30 rows and 11 columns.

The columns used in this example are:

+ **MovieID**: The ID for the movie

+ **MovieName**: The name of the movie

+ **IMDBRating**: The IMDB rating of the movie

+ **Biography**: 1 if the movie belongs to the Biography genre, 0 if not

+ **Drama**: 1 if the movie belongs to the Drama genre, 0 if not

+ **Thriller**: 1 if the movie belongs to the Thriller genre, 0 if not

## Import Packages

```julia
using CSV         # reads the .csv file
using DataFrames  # provides the DataFrame type passed to CSV.read
```


## Read Dataset

```julia
movies = CSV.read("movies_recommendation_data.csv", DataFrame)
```

First 10 of the 30 rows:

| Row | MovieID | MovieName | IMDBRating | Biography | Drama | Thriller |
|---|---|---|---|---|---|---|
| | Int64 | String | Float64 | Int64 | Int64 | Int64 |
| 1 | 58 | The Imitation Game | 8.0 | 1 | 1 | 1 |
| 2 | 8 | Ex Machina | 7.7 | 0 | 1 | 0 |
| 3 | 46 | A Beautiful Mind | 8.2 | 1 | 1 | 0 |
| 4 | 62 | Good Will Hunting | 8.3 | 0 | 1 | 0 |
| 5 | 97 | Forrest Gump | 8.8 | 0 | 1 | 0 |
| 6 | 98 | 21 | 6.8 | 0 | 1 | 0 |
| 7 | 31 | Gifted | 7.6 | 0 | 1 | 0 |
| 8 | 3 | Travelling Salesman | 5.9 | 0 | 1 | 0 |
| 9 | 51 | Avatar | 7.9 | 0 | 0 | 0 |
| 10 | 47 | The Karate Kid | 7.2 | 0 | 1 | 0 |


## X and Y Arrays

```julia
# One feature tuple per movie: (IMDBRating, Biography, Drama, Thriller)
x_movie_data = [x for x in zip(movies.IMDBRating, movies.Biography, movies.Drama, movies.Thriller)]
# The label of each movie is its name
y_movie_data = [x for x in movies.MovieName]
```

```
30-element Array{String,1}:
 "The Imitation Game"
 "Ex Machina"
 "A Beautiful Mind"
 "Good Will Hunting"
 "Forrest Gump"
 "21"
 "Gifted"
 "Travelling Salesman"
 "Avatar"
 "The Karate Kid"
 "A Brilliant Young Mind"
 "A Time To Kill"
 "Interstellar"
 ⋮
 "Finding Forrester"
 "The Fountain"
 "The DaVinci Code"
 "Stand and Deliver"
 "The Terminator"
 "21 Jump Street"
 "The Avengers"
 "Thor: Ragnarok"
 "Spirit: Stallion of the Cimarron"
 "Hacksaw Ridge"
 "12 Years a Slave"
 "Queen of Katwe"
```

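The `zip` comprehension pairs the i-th entry of each column into one feature tuple per movie. Here is a minimal sketch of the same step on the first three rows of the table, typed in by hand so no CSV file is needed; `collect(zip(...))` is used as an equivalent of the comprehension `[x for x in zip(...)]`, and the variable names are illustrative.

```julia
# First three rows of the table, typed in by hand (illustrative names)
ratings   = [8.0, 7.7, 8.2]
biography = [1, 0, 1]
drama     = [1, 1, 1]
thriller  = [1, 0, 0]
titles    = ["The Imitation Game", "Ex Machina", "A Beautiful Mind"]

# zip walks the columns in lockstep, giving one (rating, bio, drama, thriller) tuple per movie
x = collect(zip(ratings, biography, drama, thriller))
y = titles

println(x[1])  # (8.0, 1, 1, 1), the features of The Imitation Game
```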
## Distance Function: Euclidean distance
![Euclidean distance](https://i.stack.imgur.com/RtnTY.jpg)
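In symbols, for two feature vectors $\mathbf{p}$ and $\mathbf{q}$ with $n$ entries each ($n = 4$ here: IMDBRating, Biography, Drama, Thriller), the `distance` function computes

$$
d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
$$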

```julia
# Euclidean distance between two feature tuples of equal length
function distance(p1, p2)
    return sqrt(sum((p1[i] - p2[i])^2 for i = 1:length(p1)))
end
```
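As a worked check, here is the distance function applied to three movies from the table, using their (IMDBRating, Biography, Drama, Thriller) values. The variable names are illustrative, and the function is repeated so the block runs on its own.

```julia
# Same Euclidean distance as above, repeated so this block is self-contained
function distance(p1, p2)
    return sqrt(sum((p1[i] - p2[i])^2 for i = 1:length(p1)))
end

# (IMDBRating, Biography, Drama, Thriller) values from the table
imitation_game    = (8.0, 1, 1, 1)
beautiful_mind    = (8.2, 1, 1, 0)
good_will_hunting = (8.3, 0, 1, 0)

d1 = distance(imitation_game, beautiful_mind)     # sqrt(0.2^2 + 1^2), about 1.020
d2 = distance(imitation_game, good_will_hunting)  # sqrt(0.3^2 + 1^2 + 1^2), about 1.446
println(round(d1, digits=3), "  ", round(d2, digits=3))
```

Each genre flag that differs adds 1 to the sum of squares, far more than the 0.2 or 0.3 rating gaps here, so with these unscaled features the genre columns largely decide which movies count as close.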

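```julia
# Return the k (distance, label) pairs closest to point p
function KNN(p, features, labels, k)
    # Distance from p to every point in the data set, paired with that point's label
    distance_array = [(distance(p, features[i]), labels[i]) for i = 1:length(features)]
    # Sort so the nearest points come first
    sort!(distance_array, by = x -> x[1])
    return distance_array[1:k]
end
```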
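`KNN` sorts every distance just to keep the first k. One possible refinement, sketched here under the assumption that only the k nearest pairs are needed, is Base's `partialsortperm`, which selects the k smallest entries without fully sorting the list. `KNN_partial` and the toy points are hypothetical; among tied distances the order may differ from `KNN`. This does not remove the cost of computing all n distances per query, which is the main reason KNN is slow.

```julia
# Same Euclidean distance as above, repeated so this block is self-contained
function distance(p1, p2)
    return sqrt(sum((p1[i] - p2[i])^2 for i = 1:length(p1)))
end

# Hypothetical variant of KNN: partial selection instead of a full sort
function KNN_partial(p, features, labels, k)
    dists = [distance(p, f) for f in features]   # still one distance per point
    idx = partialsortperm(dists, 1:k)            # indices of the k nearest, nearest first
    return [(dists[i], labels[i]) for i in idx]  # same (distance, label) shape as KNN
end

features = [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0), (0.0, 2.0)]  # toy points
labels   = ["origin", "far", "near", "mid"]
println(KNN_partial((0.0, 0.0), features, labels, 3))  # the three nearest (distance, label) pairs
```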

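```julia
# Print the k movies most similar to movie_name
function more_like_this(movie_name, features, labels, k)
    i = findfirst(==(movie_name), labels)   # position of the queried movie
    if i === nothing
        println("\"$movie_name\" is not in the data set.")
        return
    end
    # Ask for k+1 neighbors, because the queried movie itself (distance 0) is normally one of them
    neighbors = KNN(features[i], features, labels, k + 1)
    # Drop the queried movie by name, so it is never recommended even if another movie ties at distance 0
    recommendations = [label for (d, label) in neighbors if label != movie_name]
    println("The top $k movies similar to $movie_name are:")
    for (j, name) in enumerate(recommendations[1:min(k, end)])
        println("$j. ", name)
    end
end
```

```julia
more_like_this("The Terminator", x_movie_data, y_movie_data, 6)
```

```
The top 6 movies similar to The Terminator are:
1. Avatar
2. The Avengers
3. Thor: Ragnarok
4. Black Panther
5. Spirited Away
6. The Fountain
```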
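To see the distances behind a recommendation list, here is a sketch that reruns the notebook's `distance` and `KNN` on only the ten rows shown in the table (not the full 30), querying Good Will Hunting. Because most of the data set is left out, these neighbors are illustrative and may differ from what the full data set returns.

```julia
# The notebook's distance and KNN, repeated so this block is self-contained
function distance(p1, p2)
    return sqrt(sum((p1[i] - p2[i])^2 for i = 1:length(p1)))
end

function KNN(p, features, labels, k)
    distance_array = [(distance(p, features[i]), labels[i]) for i = 1:length(features)]
    sort!(distance_array, by = x -> x[1])
    return distance_array[1:k]
end

# The ten rows from the table: (IMDBRating, Biography, Drama, Thriller)
features = [(8.0, 1, 1, 1), (7.7, 0, 1, 0), (8.2, 1, 1, 0), (8.3, 0, 1, 0), (8.8, 0, 1, 0),
            (6.8, 0, 1, 0), (7.6, 0, 1, 0), (5.9, 0, 1, 0), (7.9, 0, 0, 0), (7.2, 0, 1, 0)]
labels = ["The Imitation Game", "Ex Machina", "A Beautiful Mind", "Good Will Hunting",
          "Forrest Gump", "21", "Gifted", "Travelling Salesman", "Avatar", "The Karate Kid"]

# Four nearest to Good Will Hunting, then drop the movie itself
query = features[4]
neighbors = [n for n in KNN(query, features, labels, 4) if n[2] != "Good Will Hunting"]
for (d, name) in neighbors
    println(rpad(name, 14), round(d, digits=2))  # name, then its distance
end
```

On these ten rows, Forrest Gump (0.5), Ex Machina (0.6), and Gifted (0.7) come out closest: all four movies share the same genre flags, so the rating difference alone decides the order.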
