# Recap

In [55]:
import pandas as pd

df = pd.read_csv('data.csv')

df.head()

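|   | age | sex | bmi | children | smoker | region | charges |
|---|-----|-----|-----|----------|--------|--------|---------|
| 0 | 19 | female | 27.9 | 0 | True | southwest | 16884.924 |
| 1 | 18 | male | 33.77 | 1 | False | southeast | 1725.5523 |
| 2 | 28 | male | 33.0 | 3 | False | southeast | 4449.462 |
| 3 | 33 | male | 22.705 | 0 | False | northwest | 21984.47061 |
| 4 | 32 | male | 28.88 | 0 | False | northwest | 3866.8552 |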


## Finding nearest neighbors

❓I am 28 years old, have a BMI of 30, and don't smoke. Which person in the dataset is most like me, and how much do they pay?

Check out the `kneighbors` [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html#sklearn.neighbors.KNeighborsRegressor.kneighbors).

In [179]:
from sklearn.neighbors import KNeighborsRegressor

# Define X and y
X = df[['age','bmi','smoker']]
y = df['charges']

# Instantiate and train model
knn_model = KNeighborsRegressor()
knn_model.fit(X,y)

knn_model.kneighbors([[28,30,False]],n_neighbors=1)

(array([[0.68493223]]), array([[63]]))

In [180]:
df.iloc[63]

age                  28
sex              female
bmi             30.6849
children              1
smoker                0
region        northwest
charges         4133.64
age_scaled     0.217391
bmi_scaled     0.396151
Name: 63, dtype: object
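`kneighbors` returns a tuple of `(distances, indices)`, so `0.68493223` is the distance to row `63`. As a quick sanity check, here is a minimal sketch that recomputes that distance by hand, assuming the default Euclidean metric and using the rounded `bmi` value displayed above (so the result only matches to four decimals).

```python
import numpy as np

# Query point from the question: age 28, bmi 30, non-smoker (False -> 0)
query = np.array([28.0, 30.0, 0.0])

# Row 63 as displayed above; bmi is shown rounded to 30.6849
neighbor = np.array([28.0, 30.6849, 0.0])

# Euclidean distance, the default metric of KNeighborsRegressor
# (metric="minkowski" with p=2)
distance = np.sqrt(np.sum((neighbor - query) ** 2))
print(round(distance, 4))  # 0.6849
```

Age and smoking status are identical here, so the whole distance comes from the difference in `bmi`.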

❓Which person is least like me?

In [184]:
knn_model.kneighbors([[28,30,False]],n_neighbors=len(df))

(array([[ 0.68493223,  0.74      ,  0.875     , ..., 37.18936542,
         37.28391074, 37.49440492]]),
 array([[  63, 1006,  749, ...,  199,  768,  534]]))

In [185]:
df.iloc[534]

age                  64
sex                male
bmi               40.48
children              0
smoker                0
region        southeast
charges         13831.1
age_scaled            1
bmi_scaled     0.659672
Name: 534, dtype: object
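Requesting `n_neighbors=len(df)` works because `kneighbors` returns neighbors sorted by increasing distance, so the last index is the farthest point. The sketch below illustrates this on a hypothetical four-row toy dataset (the values in `X_toy` and `y_toy` are made up for illustration, not taken from `data.csv`).

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical toy data: columns are age, bmi, smoker (0/1)
X_toy = np.array([
    [28, 31.0, 0],
    [19, 27.9, 1],
    [64, 40.5, 0],
    [33, 22.7, 0],
])
y_toy = np.array([4100.0, 16900.0, 13800.0, 22000.0])

model = KNeighborsRegressor(n_neighbors=2).fit(X_toy, y_toy)

# Ask for every point: results come back sorted from nearest to farthest
distances, indices = model.kneighbors([[28, 30, 0]], n_neighbors=len(X_toy))

nearest = indices[0, 0]    # first column: most similar row
farthest = indices[0, -1]  # last column: least similar row
print(nearest, farthest)   # 0 2
```

Note that on unscaled features the farthest point is mostly decided by `age`, simply because it spans the largest numeric range.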

## Base KNN

👇 Train and score a base KNN model with `age`, `bmi`, and `smoker` to predict `charges`.

In [186]:
from sklearn.model_selection import train_test_split

# Train/test split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=3)

# Train KNN model
knn_model = KNeighborsRegressor()
knn_model.fit(X_train,y_train)
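# Note: this scores on the training set; use (X_test, y_test) for a
# held-out score that is comparable with the scaled model further down
knn_model.score(X_train,y_train)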

0.4831389887309615
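For a regressor, `score` returns the coefficient of determination R², and the value depends on which split it is computed on. The following sketch uses synthetic data (the arrays `X_demo` and `y_demo` are assumptions for illustration only) to show the train score, the test score, and that `score` agrees with `r2_score` on the predictions.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data, only for illustration
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 2))
y_demo = 3 * X_demo[:, 0] + rng.normal(0, 1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=3
)
model = KNeighborsRegressor().fit(X_tr, y_tr)

train_r2 = model.score(X_tr, y_tr)  # R² on data the model has already seen
test_r2 = model.score(X_te, y_te)   # R² on held-out data

# score() is the same quantity as r2_score on the model's predictions
manual_r2 = r2_score(y_te, model.predict(X_te))
print(train_r2, test_r2, manual_r2)
```

When comparing two models, it is usually fairer to compare both on the test split.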

## Scaling

👇 Distance-based algorithms such as KNN are sensitive to the scale of features: a feature with a large range (here `age` or `bmi`) dominates the distance over one with a small range (`smoker`). Go to [this link](https://www.codecademy.com/articles/normalization#:~:text=Min%2Dmax%20normalization%20is%20one,decimal%20between%200%20and%201.), read up to the part on Min-Max Normalization, and transform `X` according to the formula.

In [188]:
normalized_X = (X - X.min()) / (X.max() - X.min())

normalized_X.head()

|   | age | bmi | smoker |
|---|-----|-----|--------|
| 0 | 0.021739 | 0.321227 | 1.0 |
| 1 | 0.0 | 0.47915 | 0.0 |
| 2 | 0.217391 | 0.458434 | 0.0 |
| 3 | 0.326087 | 0.181464 | 0.0 |
| 4 | 0.304348 | 0.347592 | 0.0 |
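The min-max formula used above, `(x - min) / (max - min)`, is also what scikit-learn's `MinMaxScaler` computes with its default settings. As a small sketch, the snippet below applies both to the five rows shown in `df.head()`; because the min and max are taken over these five rows only, the numbers differ from the full-dataset table above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# The three features from the first five rows shown in df.head()
X_small = pd.DataFrame({
    "age": [19, 18, 28, 33, 32],
    "bmi": [27.9, 33.77, 33.0, 22.705, 28.88],
    "smoker": [True, False, False, False, False],
}).astype(float)

# Manual min-max normalization, column by column
manual = (X_small - X_small.min()) / (X_small.max() - X_small.min())

# Same transformation with scikit-learn (default feature_range is (0, 1))
scaled = MinMaxScaler().fit_transform(X_small)

print(np.allclose(manual.to_numpy(), scaled))  # True
```

Every column now lies between 0 and 1, so no single feature dominates the distance computation.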


## KNN Scaled features

👇 Train and score a KNN model with the features you just scaled.

In [191]:
X_train,X_test,y_train,y_test = train_test_split(normalized_X,y,test_size=0.3,random_state=3)

knn_model = KNeighborsRegressor()

knn_model.fit(X_train,y_train)

knn_model.score(X_test,y_test)

0.8019048118837959
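The cells above scale all of `X` before splitting, so the test rows contribute to the min and max. One possible alternative, shown here as a sketch and not as the notebook's method, is to wrap the scaler and the model in a pipeline so that the scaler is fitted on the training split only. The data below is synthetic and the names `X_demo`, `y_demo`, and `pipe` are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic features on very different scales (illustration only)
rng = np.random.default_rng(0)
age = rng.uniform(18, 64, 300)
smoker = rng.integers(0, 2, 300)
X_demo = np.column_stack([age, smoker])
y_demo = 250 * age + 20000 * smoker + rng.normal(0, 1000, 300)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=3
)

# fit() learns min/max from X_tr only, then trains KNN on the scaled data
pipe = make_pipeline(MinMaxScaler(), KNeighborsRegressor())
pipe.fit(X_tr, y_tr)

# score() applies the training-set scaling to X_te before predicting
test_r2 = pipe.score(X_te, y_te)
print(test_r2)
```

The pipeline also guarantees that any new point passed to `predict` is scaled the same way as the training data.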

# 🏁