<a href="https://colab.research.google.com/github/devadathen/datasciencelab/blob/main/Devadathan_U_KNN3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Goal of the Project

This project lets you practise solving activities based on the concept of kNN (k-nearest neighbours).




---

### Problem Statement

As an owner of a startup, you wish to forecast the sales of your product to plan how much money should be spent on advertisements. This is because the sale of a product is usually proportional to the money spent on advertisements. To analyse this, you are given a dataset having the following attributes:         

|Attribute|Description|
|-|-|
|TV| TV advertising budget in thousands of dollars.|
|Radio| Radio advertising budget in thousands of dollars.|
|Newspaper| Newspaper advertising budget in thousands of dollars.|
|Sales| Product sales in thousands of units.|

  **Source:** https://www.kaggle.com/ishaanv/ISLR-Auto

Predict the impact of TV advertising on your product sales by using kNN regression and evaluate the accuracy of the model.





---

### List of Activities

**Activity 1:** Import Modules and Read Data
  
**Activity 2:**  Perform Train-Test Split

**Activity 3:** Build kNN Regressor Model






---


#### Activity 1: Import Modules and Read Data

Create a Pandas DataFrame for the **Advertising-Sales** dataset using the file listed below. This dataset records the money spent on TV, radio, and newspaper advertisements (in thousands of dollars) and the sales they generated (in thousands of units). In other words, every value in the dataset has already been divided by 1000.



  **Dataset :** advertising.csv

Also, print the first five rows of the dataset. Check for null values and treat them accordingly.




In [None]:
# Import modules
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Load the dataset
df = pd.read_csv('/content/advertising.csv')

# Print the first five rows using the head() function
df.head()

| |TV|Radio|Newspaper|Sales|
|-|-|-|-|-|
|0|230.1|37.8|69.2|22.1|
|1|44.5|39.3|45.1|10.4|
|2|17.2|45.9|69.3|12.0|
|3|151.5|41.3|58.5|16.5|
|4|180.8|10.8|58.4|17.9|


In [None]:
# Check if there are any null values. If any column has null values, treat them accordingly.
df.isnull().sum()

TV           0
Radio        0
Newspaper    0
Sales        0
dtype: int64

**Q:** Are there any missing or null values in the dataset?

**A:** No. All four columns (`TV`, `Radio`, `Newspaper`, and `Sales`) have zero null values.
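
This dataset has no gaps, so no treatment is needed here. For reference, the snippet below is a minimal sketch of two common treatments. It uses a toy frame built from the first rows shown above, with two values replaced by `NaN` purely for illustration (`toy_df` and its gaps are hypothetical and not part of the real data).

```python
import numpy as np
import pandas as pd

# Toy frame with the dataset's column names; two NaN values are inserted
# on purpose (hypothetical gaps, for illustration only).
toy_df = pd.DataFrame({
    "TV": [230.1, 44.5, np.nan, 151.5, 180.8],
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8],
    "Newspaper": [69.2, 45.1, 69.3, np.nan, 58.4],
    "Sales": [22.1, 10.4, 12.0, 16.5, 17.9],
})

# Count the missing values per column
null_counts = toy_df.isnull().sum()
print(null_counts)

# Option 1: fill each gap with the median of its column
filled_df = toy_df.fillna(toy_df.median(numeric_only=True))

# Option 2: drop every row that contains a missing value
dropped_df = toy_df.dropna()

print(filled_df)
print(dropped_df.shape)
```

Filling keeps every row, which matters for a small dataset, while dropping avoids introducing made-up values. Which one is appropriate depends on how many values are missing.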


---

#### Activity 2: Perform Train-Test Split

In this dataset, `Sales` is the target variable and all the other columns are feature variables.

Create two separate DataFrames, one containing the feature variables and the other containing the target variable.





In [None]:
df.columns

Index(['TV', 'Radio', 'Newspaper', 'Sales'], dtype='object')

In [None]:
# Features exclude the target column 'Sales' to avoid leaking it into the model
x = df[['TV', 'Radio', 'Newspaper']]
y = df['Sales']

In [None]:
# Split the dataset into feature variables (independent) and the target variable (dependent)
feature_df = df.drop(['Sales'], axis=1)
target_df = df['Sales']

Normalise all the feature variables using the `StandardScaler` technique so that every feature has mean `0` and unit variance before applying kNN. kNN relies on distances between points, so features on larger scales would otherwise dominate the result.

In [None]:
# Normalise the feature variables using 'StandardScaler'.

# Import 'StandardScaler' from 'sklearn.preprocessing' module.
from sklearn.preprocessing import StandardScaler

# Create an object of 'StandardScaler' and call 'fit_transform()' function by passing feature variables.
ob = StandardScaler()
scaled_df = ob.fit_transform(feature_df)

# Convert the scaled features array obtained from 'fit_transform()' function into a DataFrame.
converted_df = pd.DataFrame(scaled_df, columns=feature_df.columns)

# Display the first ten rows of the scaled features.
converted_df.head(10)

| |TV|Radio|Newspaper|
|-|-|-|-|
|0|0.969852|0.981522|1.778945|
|1|-1.197376|1.082808|0.669579|
|2|-1.516155|1.528463|1.783549|
|3|0.05205|1.217855|1.286405|
|4|0.394182|-0.841614|1.281802|
|5|-1.615408|1.731034|2.04593|
|6|-1.045577|0.643905|-0.324708|
|7|-0.313437|-0.247406|-0.872487|
|8|-1.616576|-1.429069|-1.360424|
|9|0.616043|-1.395307|-0.430582|
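
To see what `StandardScaler` does under the hood, the sketch below applies the z-score formula `(x - mean) / std` by hand and compares the result with scikit-learn's output. It uses only the first five feature rows of the dataset, so the numbers differ from the table above, which was scaled using the full dataset. The name `sample_df` is introduced here for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# First five feature rows of the dataset (a small sample for illustration)
sample_df = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8],
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8],
    "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4],
})

# z = (x - mean) / std, using the population standard deviation (ddof=0),
# which is what StandardScaler uses
manual_scaled = (sample_df - sample_df.mean()) / sample_df.std(ddof=0)

# The same transformation done by scikit-learn
sklearn_scaled = StandardScaler().fit_transform(sample_df)

# Both approaches should agree up to floating-point error
print(np.allclose(manual_scaled.to_numpy(), sklearn_scaled))

# After scaling, each column has mean ~0 and standard deviation ~1
print(manual_scaled.mean().abs().max())
print(manual_scaled.std(ddof=0).tolist())
```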


Split the dataset into a train set and test set such that the train set contains 70% of the instances and the remaining instances will become the test set.

In [None]:
# Split the DataFrame into the train and test sets.
# Perform train-test split using 'train_test_split' function.
x_train,x_test,y_train,y_test=train_test_split(converted_df,target_df,train_size=0.7,random_state=1)

# Print the shape of train and test sets.
print("x Train",x_train.shape)
print("x Test",x_test.shape)
print("y Train",y_train.shape)
print("y Test",y_test.shape)

x Train (140, 3)
x Test (60, 3)
y Train (140,)
y Test (60,)


After this activity, you should have train and test sets that can be used for training and testing the kNN regressor model.
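
In this activity the scaler is fitted on the full dataset before the split, so the test rows influence the scaling statistics. A common alternative, sketched below as an option rather than as the method used in this notebook, is to split first and let a scikit-learn `Pipeline` fit the scaler on the train set only. The sketch uses the ten rows displayed in Activity 1 instead of the full file, and the names `rows_df`, `features`, `target`, and `pipe` are introduced here for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Ten rows of the dataset, used here as a small stand-in for the full file
rows_df = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2, 8.6, 199.8],
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8, 48.9, 32.8, 19.6, 2.1, 2.6],
    "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4, 75.0, 23.5, 11.6, 1.0, 21.2],
    "Sales": [22.1, 10.4, 12.0, 16.5, 17.9, 7.2, 11.8, 13.2, 4.8, 15.6],
})
features = rows_df.drop(columns=["Sales"])
target = rows_df["Sales"]

# Split the raw (unscaled) data first
X_tr, X_te, y_tr, y_te = train_test_split(
    features, target, train_size=0.7, random_state=1
)

# The pipeline fits the scaler on the train set only, then trains kNN on the scaled data
pipe = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=2))
pipe.fit(X_tr, y_tr)

# At prediction time the test rows are scaled with the train-set statistics
test_predictions = pipe.predict(X_te)
print(X_tr.shape, X_te.shape)
print(test_predictions)
```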

---

#### Activity 3: Build kNN Regressor Model

Build the kNN regressor model for the optimal value of $k$ using the steps given below:

1. Import the `KNeighborsRegressor` class from the `sklearn.neighbors` module (if not imported yet).

2. Create an object of `KNeighborsRegressor` and pass the optimal $k$ value 2 as input to its constructor.

3. Call the `fit()` function using the regressor object and pass the train set as inputs to this function.

4. Perform prediction for train and test sets using the `predict()` function.

5. Also, determine the score of the train and test sets using the `score()` function. For a regressor, `score()` returns the coefficient of determination ($R^2$), not a classification accuracy.

In [None]:
from sklearn.neighbors import KNeighborsRegressor

# Train the kNN regressor model with k = 2.
kn = KNeighborsRegressor(n_neighbors=2)
kn.fit(x_train, y_train)

# Perform prediction for the train and test sets using the 'predict()' function.
y_train_predicted = kn.predict(x_train)
y_predicted = kn.predict(x_test)

# Predict sales for a single scaled sample (the scaled feature values of row 2).
sample_data = pd.DataFrame([[-1.516155, 1.528463, 1.783549]], columns=x_train.columns)
p = kn.predict(sample_data)
print("Predicted Values=", p)

# Call the 'score()' function to check the R^2 score of the test set and train set.
print("Test score", kn.score(x_test, y_test))
print("Train score", kn.score(x_train, y_train))

Predicted Values= [9.6]
Test score 0.9243088142911436
Train score 0.9635754264321423
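
To make the prediction step concrete: a kNN regressor with uniform weights predicts the mean target value of the $k$ training points closest to the query. The sketch below reproduces this by hand on the first five scaled rows shown in Activity 2 and checks it against `KNeighborsRegressor`. The helper `knn_predict` and the `query` point are hypothetical and introduced only for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# First five scaled feature rows and their sales values from this notebook
X = np.array([
    [0.969852, 0.981522, 1.778945],
    [-1.197376, 1.082808, 0.669579],
    [-1.516155, 1.528463, 1.783549],
    [0.05205, 1.217855, 1.286405],
    [0.394182, -0.841614, 1.281802],
])
y = np.array([22.1, 10.4, 12.0, 16.5, 17.9])

# Hypothetical query point in the scaled feature space
query = np.array([-1.0, 1.2, 1.0])


def knn_predict(X_train, y_train, point, k=2):
    """Average the targets of the k training rows nearest to 'point'."""
    # Euclidean distance from the query to every training row
    distances = np.linalg.norm(X_train - point, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    return y_train[nearest].mean()


manual_prediction = knn_predict(X, y, query, k=2)
sklearn_prediction = KNeighborsRegressor(n_neighbors=2).fit(X, y).predict([query])[0]

print(manual_prediction, sklearn_prediction)
```

Both values should match, because scikit-learn's default settings (Euclidean distance, uniform weights) correspond to this plain average.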




**Q:** Write down the train and test set scores ($R^2$) for the kNN regressor.

**A:** Test score: 0.9243088142911436. Train score: 0.9635754264321423.
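
The scores above are $R^2$ values, defined as $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$, where $SS_{res}$ is the sum of squared prediction errors and $SS_{tot}$ is the sum of squared deviations of the true values from their mean. The sketch below computes this by hand and compares it with scikit-learn's `r2_score`. The `y_true` values are sales figures from the first rows of the dataset, while `y_pred` holds hypothetical predictions made up for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

# True sales values (first five rows) and hypothetical predictions
y_true = np.array([22.1, 10.4, 12.0, 16.5, 17.9])
y_pred = np.array([20.0, 11.0, 9.6, 17.0, 18.5])  # made-up values for illustration

# Residual sum of squares: how far the predictions are from the true values
ss_res = np.sum((y_true - y_pred) ** 2)

# Total sum of squares: how far the true values are from their own mean
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot
r2_sklearn = r2_score(y_true, y_pred)

print(r2_manual, r2_sklearn)
```

A value of 1 means perfect predictions, and a value of 0 means the model does no better than always predicting the mean.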


After this activity, you should have a kNN regressor model built with the `sklearn` module that predicts total sales based on advertising budgets.
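
This notebook takes $k = 2$ as given and does not show how that value was chosen. One common way to pick $k$, sketched below under stated assumptions, is a cross-validated grid search. The sketch runs on the ten rows displayed in Activity 1 rather than the full file, so the `best_k` it reports is only illustrative and need not equal 2. The candidate values and the names `rows_df`, `search`, and `best_k` are introduced here and are not part of the original activity.

```python
import pandas as pd
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Ten rows of the dataset, used here as a small stand-in for the full file
rows_df = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2, 151.5, 180.8, 8.7, 57.5, 120.2, 8.6, 199.8],
    "Radio": [37.8, 39.3, 45.9, 41.3, 10.8, 48.9, 32.8, 19.6, 2.1, 2.6],
    "Newspaper": [69.2, 45.1, 69.3, 58.5, 58.4, 75.0, 23.5, 11.6, 1.0, 21.2],
    "Sales": [22.1, 10.4, 12.0, 16.5, 17.9, 7.2, 11.8, 13.2, 4.8, 15.6],
})
features = rows_df.drop(columns=["Sales"])
target = rows_df["Sales"]

# Try several k values; each candidate is scored by 5-fold cross-validation.
# Mean squared error is used because R^2 is unstable on very small folds.
search = GridSearchCV(
    make_pipeline(StandardScaler(), KNeighborsRegressor()),
    param_grid={"kneighborsregressor__n_neighbors": [1, 2, 3, 4]},
    scoring="neg_mean_squared_error",
    cv=KFold(n_splits=5, shuffle=True, random_state=1),
)
search.fit(features, target)

best_k = search.best_params_["kneighborsregressor__n_neighbors"]
print("Best k on this sample:", best_k)
```

On the full dataset, the same search could be run on the train set only, keeping the test set aside for the final score.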

---