# **KNN Model**

KNN (K-Nearest Neighbors) is a supervised learning algorithm that can be used for classification or regression tasks. 

It works by finding the K most similar instances in the training data to a new instance and using their labels to make a prediction.

KNN is a simple and intuitive algorithm, but it can be computationally expensive for large datasets. 
It is also sensitive to the choice of K, which can affect the accuracy of the model. 

Use cases for KNN include:
- Image classification
- Recommendation systems
- Time series forecasting


In [174]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.metrics import classification_report

In [175]:
df = pd.read_csv("https://github.com/RyanNolanData/YouTubeData/blob/main/500hits.csv?raw=true", encoding="latin-1")

In [176]:
df.head()

Unnamed: 0,PLAYER,YRS,G,AB,R,H,2B,3B,HR,RBI,BB,SO,SB,CS,BA,HOF
0,Ty Cobb,24,3035,11434,2246,4189,724,295,117,726,1249,357,892,178,0.366,1
1,Stan Musial,22,3026,10972,1949,3630,725,177,475,1951,1599,696,78,31,0.331,1
2,Tris Speaker,22,2789,10195,1882,3514,792,222,117,724,1381,220,432,129,0.345,1
3,Derek Jeter,20,2747,11195,1923,3465,544,66,260,1311,1082,1840,358,97,0.31,1
4,Honus Wagner,21,2792,10430,1736,3430,640,252,101,0,963,327,722,15,0.329,1


In [177]:
df = df.drop(columns= ['PLAYER', 'CS'])
df.head()

Unnamed: 0,YRS,G,AB,R,H,2B,3B,HR,RBI,BB,SO,SB,BA,HOF
0,24,3035,11434,2246,4189,724,295,117,726,1249,357,892,0.366,1
1,22,3026,10972,1949,3630,725,177,475,1951,1599,696,78,0.331,1
2,22,2789,10195,1882,3514,792,222,117,724,1381,220,432,0.345,1
3,20,2747,11195,1923,3465,544,66,260,1311,1082,1840,358,0.31,1
4,21,2792,10430,1736,3430,640,252,101,0,963,327,722,0.329,1


In [178]:
X = df.iloc[:, 0:13]
y = df.iloc[:, 13]

In [179]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=45)

In [180]:
# Initialize the MinMaxScaler
# This will scale the features to a range between 0 and 1
scaler = MinMaxScaler(feature_range=(0, 1))

In [181]:
X_train = scaler.fit_transform(X_train)

In [182]:
X_test = scaler.fit_transform(X_test)

In [183]:
# Initialize the KNN classifier
# Using 8 neighbors 
# This is a common choice for KNN
# Using Euclidean distance as the metric, which is standard for KNN
# Euclidean distance is the most common distance metric used in KNN that measures the straight-line distance between two points in Euclidean space.
knn = KNeighborsClassifier(n_neighbors=8, metric='euclidean')

In [184]:
knn.fit(X_train, y_train)

In [185]:
Y_pred = knn.predict(X_test)
print(Y_pred)

[1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 1 1 0 0 1 0 1 0
 1 1 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 1 1 1 0 1 0 1 1 0 0 0 1 0 0 0 1 0
 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 1 0 0 0]


In [186]:
print("KNN Score:", knn.score(X_test, y_test))
# The accuracy of the KNN model on the test set
# Accuracy is the ratio of correctly predicted instances to the total instances in the test set.
print("Accuracy:", accuracy_score(y_test, Y_pred))
# Print the confusion matrix and classification report
# Confusion matrix shows the number of correct and incorrect predictions for each class
print("Confusion Matrix:\n", confusion_matrix(y_test, Y_pred))
# Classification report provides precision, recall, f1-score for each class
# It gives a detailed performance evaluation of the model.
print("Classification Report:\n", classification_report(y_test, Y_pred))

KNN Score: 0.8387096774193549
Accuracy: 0.8387096774193549
Confusion Matrix:
 [[58 11]
 [ 4 20]]
Classification Report:
               precision    recall  f1-score   support

           0       0.94      0.84      0.89        69
           1       0.65      0.83      0.73        24

    accuracy                           0.84        93
   macro avg       0.79      0.84      0.81        93
weighted avg       0.86      0.84      0.84        93



In [187]:
print(knn.n_samples_fit_)

372
