[Note: Run the cells in the Data Preparation section as they are, then start answering the questions that follow.]

### Data Preparation
For this task, you will perform the following steps:
- Load all the necessary packages for this exercise
- Load the data
- Split the data into input features and the target variable
- Set categorical columns as "Categorical" in the input dataset
- Split the data into training and validation datasets
- Standardize numeric variables in the datasets
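
The encoding and standardization steps above can be illustrated on a tiny example. This is only a sketch: the `toy` frame and its `Contract` column are hypothetical and are not taken from the telecom dataset. It shows how `drop_first` changes the number of dummy columns, and how a scaler is fitted on the training rows only and then reused on the held-out rows.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame; the column names are illustrative only
toy = pd.DataFrame({
    "Contract": pd.Categorical(["Monthly", "Yearly", "Monthly", "Two-year"]),
    "tenure": [1.0, 24.0, 5.0, 60.0],
})

full = pd.get_dummies(toy, drop_first=False)    # one dummy column per category level
reduced = pd.get_dummies(toy, drop_first=True)  # first level dropped as the baseline

print(list(full.columns))
print(list(reduced.columns))

# Fit the scaler on the "training" rows only, then reuse it for the held-out rows
train, val = full.iloc[:3].copy(), full.iloc[3:].copy()
scaler = StandardScaler()
train[["tenure"]] = scaler.fit_transform(train[["tenure"]])
val[["tenure"]] = scaler.transform(val[["tenure"]])
```

Dropping the first level is commonly done for unpenalized logistic regression so that the dummy columns are not perfectly collinear, while kNN and trees can use the full set.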

In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import f1_score, precision_score, recall_score, accuracy_score, mean_squared_error

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Suppressing Warnings
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Importing the dataset
telecom = pd.read_csv("telecom_churn_dataset.csv")

In [None]:
# Check the dimensions of the dataset (rows, columns)
telecom.shape

In [None]:
non_categorical_columns = ['tenure','MonthlyCharges','TotalCharges']
for column in telecom.columns:
    if column not in non_categorical_columns:
        telecom[column] = pd.Categorical(telecom[column])

In [None]:
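features = telecom.drop(['Churn','customerID'], axis=1)
y = telecom['Churn']

# Encode from the raw features both times: calling get_dummies on an
# already-encoded frame leaves nothing for drop_first to drop
X = pd.get_dummies(features, drop_first=False)  # for kNN and trees
X2 = pd.get_dummies(features, drop_first=True)  # for logistic regression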

X_train, X_val, X2_train, X2_val, y_train, y_val = train_test_split(X, X2, y, test_size=0.3, random_state = 1)

# Standardize our non-dummy variables (fit on the training data, apply to the validation data)
scaler = StandardScaler()
X_train[non_categorical_columns] = scaler.fit_transform(X_train[non_categorical_columns])
X_val[non_categorical_columns] = scaler.transform(X_val[non_categorical_columns])

X2_train[non_categorical_columns] = scaler.fit_transform(X2_train[non_categorical_columns])
X2_val[non_categorical_columns] = scaler.transform(X2_val[non_categorical_columns])

### Q1 - Value of k: Validation Set

First, we will build a k-NN model for this problem statement. What is the optimal k for fitting a k-NN model using the validation set? (Iterate the k value from 1 to 35)


For this task, you will perform the following steps:
- Find the optimal k value for which the kNN model gives the maximum validation set accuracy
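
One way to structure this search is sketched below. It runs on synthetic data from `make_classification` (an assumption, used so the block is self-contained); in the notebook the same loop would use `X_train`, `X_val`, `y_train` and `y_val`. The names `Xd_train`, `Xd_val`, `val_accuracies` and `best_k` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption); not the telecom dataset
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=1)
Xd_train, Xd_val, yd_train, yd_val = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)

k_values = list(range(1, 36))  # k = 1, ..., 35
val_accuracies = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(Xd_train, yd_train)
    val_accuracies.append(accuracy_score(yd_val, knn.predict(Xd_val)))

# np.argmax returns the first maximum, so ties resolve to the smallest k
best_k = k_values[int(np.argmax(val_accuracies))]
print(best_k, max(val_accuracies))
```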

In [None]:
# Define the parameter range of k from 1 to 35

# Fit a kNN model for each k value, find the validation set accuracy and store them in a list


# Find the k value which gives the maximum validation set accuracy in the list


### Q2 - Value of k: Cross Validation

What is the optimal k for fitting a k-NN model using cross validation? (Iterate the k value from 1 to 35 and use 5 folds of cross validation)

For this task, you will perform the following steps:
- Find the optimal k value for which the kNN model gives the maximum mean test accuracy using GridSearchCV
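
A minimal sketch of the grid search, again on synthetic stand-in data (an assumption) rather than the telecom training set. `GridSearchCV` fits one kNN model per fold for each candidate k and reports the k with the highest mean test accuracy across folds; `best_k_cv` is an illustrative name.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the training data (assumption)
Xd_train, yd_train = make_classification(n_samples=300, n_features=8, random_state=1)

# Candidate values k = 1, ..., 35 for the n_neighbors parameter
param_grid = {"n_neighbors": list(range(1, 36))}

# 5-fold cross validation, scored by accuracy
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
grid.fit(Xd_train, yd_train)

best_k_cv = grid.best_params_["n_neighbors"]
print(best_k_cv, grid.best_score_)
```

The per-k mean accuracies are available in `grid.cv_results_["mean_test_score"]` if you want to plot them against k.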

In [None]:
# Initialize the kNN classifier model


# Define the parameter range of k from 1 to 35

# Define the GridSearchCV with the parameter range, kNN model and 5 folds of cross validation

# Fit the GridSearchCV on the training dataset

# Find the best k value from the grid search


### Q3 - Accuracy

Of the optimal k values found using the validation set (Q1) and using cross validation (Q2), which one gives the higher accuracy on the validation set?

For this task, you will perform the following steps:
- Find the approach whose optimal k gives the maximum validation set accuracy
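
The comparison could be organized as in the sketch below. The data is synthetic and the two k values are hypothetical placeholders (assumptions); substitute the values you obtained in Q1 and Q2. The helper `validation_accuracy` is an illustrative name, not part of the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption); not the telecom dataset
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=1)
Xd_train, Xd_val, yd_train, yd_val = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)

def validation_accuracy(k):
    """Fit kNN with k neighbours on the training split and score it on the validation split."""
    model = KNeighborsClassifier(n_neighbors=k).fit(Xd_train, yd_train)
    return accuracy_score(yd_val, model.predict(Xd_val))

# Placeholder k values (hypothetical); replace with the results of Q1 and Q2
candidate_k = {"validation set (Q1)": 7, "cross validation (Q2)": 15}
scores = {approach: validation_accuracy(k) for approach, k in candidate_k.items()}

# The approach whose k scores highest on the validation split
best_approach = max(scores, key=scores.get)
print(scores, best_approach)
```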

In [None]:
# Fit a kNN model using the optimal k value obtained in Q1 and find the validation set accuracy

# Fit a kNN model using the optimal k value obtained in Q2 and find the validation set accuracy

# Find which task's optimal k gives the highest validation set accuracy


### Q4 - Model Performance

Explore the performance of a logistic regression model and a decision tree model for this dataset and select the correct statements from the options given below.

Note:

- Use the optimal k obtained using the validation set for the kNN model
- Use a CCP alpha of 0.0048016 for the decision tree model
- Use no penalty, the lbfgs solver, a random state of 0 and a maximum of 200 iterations for the logistic regression model

For this task, you will perform the following steps:
- Analyze the training and validation accuracies obtained for the logistic regression model, decision tree model and k-NN model

#### Logistic Regression Model

In [None]:
# Fit a logistic regression model on the training dataset and find the accuracy for training and validation datasets
# Hint: Set the 'penalty' parameter to None (the string 'none' on scikit-learn versions older than 1.2) and 'solver' to 'lbfgs'
# Note: Use 'max_iter = 200' and 'random_state = 0' for the model
# Note: Use X2_train and X2_val for logistic regression



In [None]:
#Compute the training accuracy


In [None]:
#Compute the validation accuracy


#### Decision Tree Model

In [None]:
# Fit a decision tree model on the training dataset and find the accuracy for training and validation datasets
# Note: Use 'ccp_alpha = 0.0048016' and 'random_state = 0' for the model

# Fit a kNN model on the training dataset and find the accuracy for training and validation datasets
# Note: Use the optimal k value obtained using the validation dataset in Q1

# Tabulate your results and answer the question

In [None]:
# Write your code to build the decision tree model here


In [None]:
##Compute the training accuracy


In [None]:
##Compute the validation accuracy
