# ASSIGNMENT 4 - INTRO TO MACHINE LEARNING | Resampling, Model Evaluation, Feature Selection and Regularization


> **FULL MARKS = 160**







**Note:** To submit the assignment, please follow the same steps as in assignments 1, 2, and 3.

In this assignment we will build on what we have learned in previous exercises. You will not be given instructions on how to load data, plot data, standardize/normalize data, write functions, and so on; you will only be told what you need to do. **SO PLEASE START THIS ASSIGNMENT AS EARLY AS POSSIBLE**

1. **Resampling Methods | SCORE : 40**
  
  **1.1 K-Fold Cross Validation**
      
    References
    > Please follow lecture notes

  **1.2 Leave-One-Out Cross Validation(LOOCV) | SCORE :**
      
    References
    > Please follow lecture notes


2. **Model Evaluation | SCORE : 35**
  
  **2.1 Confusion Matrix:**
      
    References
    > Please follow references on previous assignments

  **2.2 Metrics : Accuracy, Precision, Recall, F1-Score and ROC-Curve**
      
    References
    > https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html
    > https://towardsdatascience.com/receiver-operating-characteristic-curves-demystified-in-python-bd531a4364d0
    > https://www.daniweb.com/programming/computer-science/tutorials/520084/understanding-roc-curves-from-scratch

3. **Linear Model Selection | SCORE : 40**
  
  **3.1 Subset Feature Selection**
      
    References
    > https://archive.ics.uci.edu/ml/datasets/Online+Video+Characteristics+and+Transcoding+Time+Dataset#

  **3.2 Forward stepwise feature selection**
      
    References
    > Please follow lecture notes

4. **Model Regularization | SCORE : 45**
  
  **4.1 Ridge Regression - L2 Regularization**
      
    References
    > https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html

  **4.2 Lasso Regression - L1 Regularization**
      
    References
    > https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html


### 1. Resampling Methods
---



---



In [None]:
# Required libraries are loaded for you
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as ticker
from matplotlib import rcParams
# figure size in inches
rcParams['figure.figsize'] = 16,8

In [None]:
# The base class ResamplingMethods is implemented below
# You will implement the KFold and LOOCV classes by inheriting from this base class
# To make your work simpler, we first show a simple example
# that inherits from this class to implement the validation set resampling method
In [None]:
class ResamplingMethods(object):
  def __init__(self, sklearn_databunch = load_iris(), test_size = 0.2, val_size = 0.2):
    """
    :param dict sklearn_databunch: a Bunch or dictionary that contains the keys (data, target, feature_names, target_names, DESCR)
    :param float test_size: fraction of the given data to hold out as the test set
    :param float val_size: fraction of the remaining training data to hold out as the validation set
    :return: None
    """
    # Data Initialization
    self.__x = preprocessing.scale(sklearn_databunch.get('data'))
    self.__y = sklearn_databunch.get('target')
    self.xlabel = sklearn_databunch.get('feature_names')
    self.ylabel = sklearn_databunch.get('target_names')
    self.description = sklearn_databunch.get('DESCR')

    self.test_size = test_size
    self.val_size = val_size

    # Do train-test split here
    # Please note one important remark here:
    # self.xtest and self.ytest stay untouched and are used only in the final step of model validation
    # To implement resampling methods we will further split self.xtrain and self.ytrain into train and val sets
    # So we can have different train-val sets across the different resampling implementations
    # However, our test set is kept fixed; this is our FINAL CHECK
    # If our model doesn't perform well on the FINAL CHECK, it does not generalize to unseen data
    # i.e., in the ideal case (given that the accuracy itself is acceptable), test_accuracy>=val_accuracy>train_accuracy
    self.xtrain, self.xtest, self.ytrain, self.ytest = train_test_split(self.__x,
                                                                        self.__y,
                                                                        test_size=self.test_size,
                                                                        random_state=4347)

    self.record = {'data':[],'accuracy':[],'loss':[],}

  # Create_model creates a logistic regression model with random state of our course_id
  def create_model(self,**kwargs):
    # Classifier Initialization
    self.model = LogisticRegression(random_state=4347)

  # If you print an instance of this class it will print description of the dataset
  def __repr__(self):
    return self.description

  # Run corresponding resampling methods implemented in inherited classes
  def run(self,**kwargs):
    return getattr(self,self.method)()

  # Calculate and return the logistic loss between y and y_hat (the predicted probabilities of y given x)
  def logistic_loss(self,x,y):
    """
    :param ndarray x: input to model
    :param ndarray y: true label
    :return float: logistic_loss
    """
    y_hat = self.model.predict_proba(x)
    y = np.eye(len(self.ylabel))[y]
    return -np.mean(y*np.log(y_hat)+(1-y)*np.log(1-y_hat))

  def predict(self,*X):
    """
    :param tuple of ndarray X: X contains number of different x's eg.X may be xtrain only or (xtrain,xval,xtest)
    :return list of ndarray: if X contains more than one x(i.e, xtrain,xval,xtest) returns list of [ytrain_predicted, yval_predicted,ytest_predicted]
    """
    return [*map(lambda x:self.model.predict(x),X)]

  def accuracy(self,*XY):
    """
    :params pair tuple of ndarray XY: eg. ((xtrain,ytrain),(xval,yval),(xtest,ytest))
    :return list of accuracy: eg.[train_accuracy,val_accuracy,test_accuracy]
    """
    return [*map(lambda xy:self.model.score(*xy),XY)]

  def loss(self,*XY):
    """
    :params pair tuple of ndarray XY: eg. ((xtrain,ytrain),(xval,yval),(xtest,ytest))
    :return list of loss: eg.[train_loss,val_loss,test_loss]
    """
    return [*map(lambda xy:self.logistic_loss(*xy),XY)]

  def fitmodel(self,xtrain, ytrain, **kwargs):
    """
    :param ndarray xtrain: input to model
    :param ndarray ytrain: true label
    """
    self.create_model()
    self.model.fit(xtrain, ytrain)

  def get_report(self):
    """
    calculate average loss and accuracy
    """
    report = self.record.groupby(['data']).mean()[['accuracy','loss']]
    report['method'] = self.method
    return report

  def visualize(self, **kwargs):
    """
    :param dict kwargs:contain argument for plotting
    """
    config = dict(data=self.record,
                  x=self.method,
                  y="accuracy",
                  hue="data",
                  size="loss",
                  sizes=(40, 400),
                  alpha=.5,
                  aspect=3)
    config.update(**kwargs)
    sns.relplot(**config)
    plt.title(f'Visualization for CV Method : {self.method}')
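
The `logistic_loss` method above averages a binary cross-entropy term over every (sample, class) cell of the one-hot label matrix. A minimal standalone sketch of that computation is shown below; the function name `one_hot_log_loss`, the toy probabilities, and the `eps` clipping (added here to avoid `log(0)`) are assumptions for illustration and are not part of the base class.

```python
import numpy as np

def one_hot_log_loss(y_true, proba, eps=1e-15):
    """Mean binary cross-entropy over every (sample, class) cell."""
    proba = np.clip(proba, eps, 1 - eps)      # avoid log(0); eps is an assumption
    onehot = np.eye(proba.shape[1])[y_true]   # same one-hot trick as logistic_loss
    return -np.mean(onehot * np.log(proba) + (1 - onehot) * np.log(1 - proba))

# Hypothetical predictions for two samples whose true classes are 0 and 2
y_true = np.array([0, 2])
confident = np.array([[0.9, 0.05, 0.05], [0.05, 0.05, 0.9]])
unsure = np.array([[0.4, 0.3, 0.3], [0.3, 0.3, 0.4]])

# Confident, correct probabilities give a smaller loss than hesitant ones
print(one_hot_log_loss(y_true, confident))
print(one_hot_log_loss(y_true, unsure))
```

A lower value means the predicted probabilities sit closer to the one-hot labels, which is why the plots use loss alongside accuracy.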

In [None]:
# The following is a simple implementation of the Validation Set Approach
# Please follow the lecture notes in the given references to understand the validation set approach
# We will now create a class ValidationSet by inheriting from the base class ResamplingMethods
# We need to implement two important functions here
# Function train_val_split returns xtrain,xval,ytrain,yval
# Function valset performs 3 important tasks
                  # 1. Create a record instance that keeps track of accuracy and loss and is finally converted to a DataFrame
                  # 2. Fit N random steps (each step does a random train-val split, as you can see in train_val_split, without specifying random_state)
                  # 3. Calculate accuracy and loss
# All other methods are reused from the base class (you can think of this as an extension of the base class)

In [None]:
class ValidationSet(ResamplingMethods):
  def __init__(self,fit_steps = 10):
    """
    :params int fit_steps: number of times the model is fit, each time on a random split of the data (excluding test) into train and val sets
    """
    super(ValidationSet, self).__init__()
    self.fit_steps = fit_steps
    self.method = 'valset'

  def train_val_split(self, **kwargs):
    """
    :params ndarray x: features to be split further into train-val sets
    :params ndarray y: labels to be split further into train-val sets
    :return tuples of ndarray
    """
    x = kwargs.get('x', self.xtrain)
    y = kwargs.get('y', self.ytrain)
    xtrain, xval, ytrain, yval = train_test_split(x,y,test_size = self.val_size, random_state = kwargs.get('random_state'))
    return xtrain, xval, ytrain, yval

  def valset(self,**kwargs):
    self.record.update({self.method:[]})
    # Run fit_steps times
    # Each time, split self.xtrain and self.ytrain to produce xtrain,xval,ytrain,yval
    # Each time, fit a new model with xtrain,ytrain
    # Calculate accuracy on (xtrain,ytrain),(xval,yval),(self.xtest,self.ytest)
    # Calculate loss on (xtrain,ytrain),(xval,yval),(self.xtest,self.ytest)
    # Record all metrics
    # When done, convert record into a DataFrame, which will be used for visualization
    for i in range(self.fit_steps):
      xtrain, xval, ytrain, yval = self.train_val_split(**kwargs)
      self.fitmodel(xtrain,ytrain)
      # Labels must follow the same order as the metrics below: train, val, test
      self.record['data'].extend(['train','val','test'])
      self.record['accuracy'].extend(self.accuracy((xtrain, ytrain),(xval,yval),(self.xtest,self.ytest)))
      self.record['loss'].extend(self.loss((xtrain, ytrain),(xval,yval),(self.xtest,self.ytest)))
      self.record[self.method].extend([i]*3)

    self.record = pd.DataFrame(self.record)

# Now let's create an instance of ValidationSet with fit_steps = 10
vs = ValidationSet(fit_steps = 10)
# Let's see the data description
print(vs)
# Now run this model
vs.run()
# Now let us visualize the results
vs.visualize()

***EXERCISE NO. 1***

  > **Task-1 | Score : 10**

In [None]:
# Now you will implement the KFold method
# Please follow the lecture notes in the given references to understand the KFold approach
# You will now create a class KFold by inheriting from the base class ResamplingMethods
# This class has four important functions; the first two are implemented for you
                  # 1. create_k_slices : takes k, N (optional, default : length of self.xtrain), window (optional, default : length of self.xtrain // k)
                                         # and returns a list of slices, e.g. for k = 10 and length of self.xtrain = 120, [slice(0,12),slice(12,24),.....,slice(108,120)]
                  # 2. get_kth_mask : takes k, the index of the fold, and returns train_mask and val_mask
                  # 3. train_val_split : takes k, the index of the fold, calculates the masks and returns xtrain,xval,ytrain,yval
                  # 4. kfold : runs a loop of k steps; each step fits the model on the kth fold and records the results; finally the record is converted to a DataFrame
# Now fill in the empty code sections (marked ??) wherever asked
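
The slice-and-mask idea described above can be sketched on a toy array. This is only an illustration under assumed data (12 hypothetical samples, `k = 3`); the variable names are stand-ins and not the class attributes themselves.

```python
import numpy as np

# Toy stand-ins for self.xtrain / self.ytrain (hypothetical data, 12 samples)
x = np.arange(24).reshape(12, 2)
y = np.arange(12) % 3

k = 3
window = len(x) // k  # 4 samples per fold

# Consecutive, non-overlapping windows, in the spirit of create_k_slices
slices = [slice(start, start + window) for start in range(0, window * k, window)]

# In the spirit of get_kth_mask: True marks the validation rows of fold 1
indices = np.arange(len(x))
val_mask = (slices[1].start <= indices) & (indices < slices[1].stop)
train_mask = ~val_mask

# Boolean masks select rows; the two parts never overlap
x_val, x_train = x[val_mask], x[train_mask]
y_val, y_train = y[val_mask], y[train_mask]
print(slices)
print(x_val.shape, x_train.shape)
```

Each fold uses one window for validation and the remaining rows for training, so every sample is validated exactly once across the k folds.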

In [None]:
class KFold(ResamplingMethods):
  def __init__(self,k = 10):
    """
    :params int k: number of folds; the model is fit k times, once per fold
    """
    super(KFold, self).__init__()
    self.k = k
    self.create_k_slices(k)
    self.method = 'kfold'

  def create_k_slices(self, k =10, **kwargs):
    N = kwargs.get('N',len(self.xtrain))
    window = kwargs.get('window', len(self.xtrain)//k)
    self.slices = [slice(*i) for i in zip(range(0,N+1,window),range(window,N+1,window))]

  def get_kth_mask(self, k=0):
    assert k<len(self.slices), '!!!K CANNOT BE MORE THAN NUMBER OF SLICES!!!'
    indices = np.arange(0,len(self.xtrain))
    val_mask = (self.slices[k].start <= indices)*(self.slices[k].stop > indices)
    train_mask = np.logical_not(val_mask)
    return train_mask,val_mask

  def train_val_split(self,k):
    # calculate train and val mask
    train_mask, val_mask = ??
    # Now you will be selecting xtrain and xval set from self.xtrain
    # Use indexing mask and select required data
    # Note: xtrain and xval are split from self.xtrain using the mask index, so xtrain is not self.xtrain but a subset of it
    # select xtrain using train mask
    xtrain = ??
    # select xval using val mask
    xval = ??
    # select ytrain using train mask
    ytrain = ??
    # select yval using val mask
    yval = ??
    # return xtrain,xval, ytrain, yval
    return xtrain,xval,ytrain,yval

  def kfold(self,**kwargs):
    self.record.update({self.method:[]})
    for i in range(self.k):
      # calculate xtrain,xval,ytrain,yval for ith fold
      xtrain, xval, ytrain, yval = ??
      # fit model with xtrain and ytrain
      ??
      # Labels must follow the same order as the recorded metrics: train, val, test
      self.record['data'].extend(['train','val','test'])
      # record results for train-val-test sets
      self.record['accuracy'].extend(??)
      self.record['loss'].extend(??)

      self.record[self.method].extend([i]*3)
    # Convert results into dataframe
    self.record = pd.DataFrame(self.record)



# Now let's create an instance of KFold with k = 10
kf = KFold(k = 10)
# Now run this model
kf.run()
# Now Let us Visualize this model
kf.visualize()

***EXERCISE NO. 1***

  > **Task-2 | Score :5**

In [None]:
# Now you will implement the LOOCV (Leave-One-Out Cross Validation) method
# Please follow the lecture notes in the given references to understand the LOOCV approach

# LOOCV is a special case of KFold where k equals the train size and each validation set has size = 1 (that is why it is called leave one out)
# So we will now inherit from the KFold class instead of the base class
# You have to fit your model k times (that is, as many times as there are training samples)
# You have to make one small change in __init__ to make this work
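
To make the "k equals train size" idea concrete, here is a small standalone sketch that enumerates leave-one-out folds for five hypothetical samples. The helper name `loocv_folds` is an assumption for illustration and does not appear in the assignment classes.

```python
def loocv_folds(n):
    """Yield (train_indices, val_index) pairs; each sample is held out once."""
    for i in range(n):
        train = [j for j in range(n) if j != i]
        yield train, i

# With n samples there are n folds, each validating on a single sample
folds = list(loocv_folds(5))
for train, val in folds:
    print(f"val={val} train={train}")
```

Because the number of model fits equals the number of training samples, the cost grows linearly with the dataset size.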

In [None]:
class LOOCV(KFold):
  def __init__(self):
    super(LOOCV, self).__init__()
    self.method = 'leaveoneout'
    # Make that small change here
    self.k = ??
    ?? # create k slices using k; you can use the function defined in the parent class (KFold)


  def leaveoneout(self):
    self.record.update({self.method:[]})
    # just call default kfold method
    return self.kfold()

# Now let's create an instance of LOOCV
lo = LOOCV()
# Now run this model
lo.run()
# Now Let us Visualize this model
lo.visualize()

***EXERCISE NO. 1***

  > **Task-3 | Score :10**

In [None]:
# Report Plotting
def plotreports(*reports,metric='accuracy'):
  reports = pd.concat(reports).reset_index()
  sns.barplot(data = reports, x = 'method', hue='data', y=metric)
  plt.title(f'Comparing {metric}')

# Now Plot all 3 different results
vs.visualize(aspect=1.5, height = 4)
kf.visualize(aspect=1.5, height = 4)
lo.visualize(aspect=1.5, height = 4)

In [None]:
# Plot accuracy report
plotreports(vs.get_report(),kf.get_report(),lo.get_report(), metric='accuracy')

In [None]:
# Plot loss report
plotreports(vs.get_report(),kf.get_report(),lo.get_report(), metric='loss')

***EXERCISE NO. 1***

  > **Task-4 | Score :15**

In [None]:
# Now Answer following questions
# Answer should go below
"""
Question no. 1 : Why does the Validation Set approach have high variability?

Question no. 2 : Why does the LeaveOneOut approach have very low variability?

Question no. 3: Why would you prefer KFold over the Validation Set approach and the LeaveOneOut approach?

Question no. 4: What is the serious limitation of the LeaveOneOut approach?

Question no. 5: Compare the results of these 3 approaches.

"""
print()

***EXERCISE NO. 2***

  > **Task-1 | Score :10**

In [None]:
# We will create an instance of the validation set model with fit steps = 1, and fit the model
# We will now assign the model associated with the vs to a variable named model
# We use random_state = 4347 so that the .run method produces a specific validation set that does not change
vs = ??
??#run with given random state

# Now get xtrain,ytrain,xval,yval,xtest,ytest
# Since we used random_state=4347 in .run we need to pass this random_state=4347 argument to get xtrain,xval,ytrain,yval from train_val_split
xtrain, xval, ytrain, yval = ??
xtest,ytest = ??

# Model you will be using from vs
model = ??

***EXERCISE NO. 2***

  > **Task-1 | Score :10**

In [None]:
# Now you have model, and you know how to access the iris dataset
# Now perform following task based on your previous exercise

# 1. Calculate Two Confusion Matrices, on val and test set
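
As a reminder of what a confusion matrix holds, the following is a minimal sketch on made-up labels. The helper `confusion` and the toy label lists are assumptions for illustration; rows are taken to be true classes and columns predicted classes.

```python
import numpy as np

def confusion(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical labels for a 3-class problem
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
cm = confusion(y_true, y_pred, 3)
print(cm)
```

The diagonal counts correct predictions, and every off-diagonal cell shows which class was mistaken for which.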




In [None]:
# 2. Plot your confusion matrix on val set (better use seaborn plotting as shown in previous exercises)
#


In [None]:
# 3. Plot your confusion matrix on test set (better use seaborn plotting as shown in previous exercises)



In [None]:
# 4. Now Answer following questions

"""
# Question no. 1: Based on your confusion matrices discuss important observations you made


"""
print()

***EXERCISE NO. 2***

  > **Task-2 | Score :25**

In [None]:
# Here we will perform the one-vs-all approach
# The Iris dataset has 3 classes, namely ['setosa', 'versicolor', 'virginica']
# We can modify this dataset to make 3 different binary datasets
# First case will be setosa vs other, i.e, setosa will be encoded as 1 and the rest as 0
# Second case will be versicolor vs other, i.e, versicolor will be encoded as 1 and the rest as 0
# Third case will be virginica vs other, i.e, virginica will be encoded as 1 and the rest as 0
# We will consider only the first case, where we treat this problem as setosa vs other

x, y = load_iris(return_X_y = True)# x = preprocessing.scale(x)
y = 1*(y==0)
xtrain,xtest, ytrain, ytest = train_test_split(x,y,test_size=0.2)
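
The one-vs-all encoding described above can be sketched for all three cases at once. The label vector here is a toy assumption, not the iris targets; the dictionary name `one_vs_rest` is also only illustrative.

```python
import numpy as np

y_multi = np.array([0, 1, 2, 1, 0, 2])   # toy multi-class labels
names = ['setosa', 'versicolor', 'virginica']

# One binary target per class: 1 for that class, 0 for the rest
one_vs_rest = {name: (y_multi == c).astype(int) for c, name in enumerate(names)}
for name, target in one_vs_rest.items():
    print(name, target)
```

Every sample is positive in exactly one of the three binary targets.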

In [None]:
# Now create a logistic regression model using sklearn
model = LogisticRegression()

# Fit this model on the training data
model.fit(xtrain,ytrain)

# Now follow this reference
# https://www.daniweb.com/programming/computer-science/tutorials/520084/understanding-roc-curves-from-scratch
# Perform a similar experiment

In [None]:
# 1. Report TPR,FPR,F1 Score

# 2. Discuss meaning of these scores

# 3. Plot a Predicted_probability vs Decision_boundary plot

# 4. Plot an ROC Curve

# 5. Discuss your important findings
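
For orientation, the quantities named in the list above can be computed by hand on a few made-up scores. This is a sketch under assumptions: the helper `rates`, the toy labels, and the threshold grid are hypothetical, and samples with a score at or above the threshold are treated as positive.

```python
def rates(y_true, scores, threshold):
    """Return (TPR, FPR, F1) when scores >= threshold are predicted positive."""
    pred = [s >= threshold for s in scores]
    tp = sum(p and t == 1 for p, t in zip(pred, y_true))
    fp = sum(p and t == 0 for p, t in zip(pred, y_true))
    fn = sum((not p) and t == 1 for p, t in zip(pred, y_true))
    tn = sum((not p) and t == 0 for p, t in zip(pred, y_true))
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return tpr, fpr, f1

# Hypothetical true labels and predicted probabilities of the positive class
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

print(rates(y_true, scores, 0.5))

# Sweeping the threshold from high to low traces the ROC curve as (TPR, FPR) pairs
roc = [rates(y_true, scores, th)[:2] for th in (1.1, 0.9, 0.8, 0.7, 0.4, 0.3, 0.1)]
print(roc)
```

Plotting FPR on the x-axis against TPR on the y-axis for these pairs gives the ROC curve; a threshold above every score yields the (0, 0) corner and a threshold at or below every score yields (1, 1).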

***EXERCISE NO. 3***

  > **Task-1 | Score :10**

In [None]:
# The following is the Online Video Characteristics and Transcoding Time Dataset from the UCI machine learning repository
# This dataset is much larger than the iris dataset
# We will use this dataset for a regression problem
# You can view the metadata at the given link
import IPython
IPython.display.HTML("https://archive.ics.uci.edu/ml/datasets/Online+Video+Characteristics+and+Transcoding+Time+Dataset#")

In [None]:
# We have already downloaded and processed this dataset for you
# You can download this dataset by
!wget https://raw.githubusercontent.com/keshavsbhandari/CS4347/master/assignment1_regression/data/video_feature_train.csv
!wget https://raw.githubusercontent.com/keshavsbhandari/CS4347/master/assignment1_regression/data/video_feature_test.csv

In [None]:
# Now let us load this dataset
# You have to do nothing; this data is already processed and standardized
train = pd.read_csv('video_feature_train.csv')
test = pd.read_csv('video_feature_test.csv')

feature_names = list(train.columns)[:-1]
target_names = list(train.columns)[-1:]


# Now create xtrain and xtest
xtrain = train.drop(columns=['label']).values
ytrain = train['label'].values

xtest = test.drop(columns=['label']).values
ytest = test['label'].values

# View size of data
print(f'xtrain : {xtrain.shape}, ytrain : {ytrain.shape}')
print(f'xtest : {xtest.shape}, ytest : {ytest.shape}')

In [None]:
# Now let us define a LinearRegression model
from sklearn.linear_model import LinearRegression
model = LinearRegression()

In [None]:
%%timeit
# Check the model fitting time with the %%timeit cell magic (it must be the first line of the cell)
model.fit(xtrain,ytrain)

In [None]:
# Now let us see our model score
# score returns the coefficient of determination (R^2): the fraction of the target's variance that the model explains
score = model.score(xtest,ytest)
score
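
Since `LinearRegression.score` reports the coefficient of determination, it may help to see that quantity computed by hand. The helper `r2` and the toy targets below are assumptions for illustration.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

y_toy = [1.0, 2.0, 3.0, 4.0]
print(r2(y_toy, y_toy))                  # perfect predictions
print(r2(y_toy, [2.5] * 4))              # always predicting the mean
print(r2(y_toy, [4.0, 3.0, 2.0, 1.0]))   # worse than predicting the mean
```

A score of 1 means a perfect fit, 0 means no better than predicting the mean, and the score can be negative when the model does worse than that.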

In [None]:
# Let us recall some basics
# Our model in fact follows the equation
# model(x) = b + w0*x0 + w1*x1 + ..... + w24*x24
# Note: x has dim 25
# where b is intercept_ with dim 1
# and coef_ = [w0,w1,.....,w24] has dim 25
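
The equation above is a dot product plus an intercept. The sketch below checks, on random stand-in values (not the fitted model), that the explicit sum and the vectorized form `X @ w + b` agree; `X_demo`, `w`, and `b` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 25))   # 5 hypothetical samples, 25 features
w = rng.normal(size=25)             # plays the role of coef_
b = 0.5                             # plays the role of intercept_

# Term-by-term sum b + w0*x0 + ... + w24*x24 for each sample
pred_loop = np.array([b + sum(w[j] * x[j] for j in range(25)) for x in X_demo])
# The same computation as a matrix-vector product
pred_vec = X_demo @ w + b
print(np.allclose(pred_loop, pred_vec))
```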

In [None]:
# Print size of coef_ and intercept_
print(f'coef_ size : {model.coef_.size}, intercept_ size : {model.intercept_.size}')

In [None]:
# Now let us see our coef_ = [w0,w1,.....]
model.coef_

In [None]:
# Now let us see our intercept_
model.intercept_

In [None]:
# Let us visualize importance of these coef_
plt.bar(x=feature_names, height=np.abs(model.coef_), label='Feature Importance')
plt.xticks(rotation='vertical')
plt.legend()

In [None]:
# We can see that many features do not contribute much as they are close to 0

In [None]:
# So with this understanding we come to realize that not every feature is important
# There are many different ways to get rid of features that are not very important
# One of the easiest ways is to iterate over all possible combinations

In [None]:
# We will do subset feature selection here
# Subset feature selection fits a model on every possible combination (subset) of features
# For our case the number of subsets would be extremely large, so let us first keep only the features whose coefficient is greater than the mean of coef_
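
To see why the subset count explodes, note that n features have 2^n - 1 non-empty subsets. A short sketch with four hypothetical feature names, followed by the count for 25 features:

```python
from itertools import combinations
from math import comb

features = ['a', 'b', 'c', 'd']   # hypothetical feature names
subsets = [c for r in range(1, len(features) + 1) for c in combinations(features, r)]
print(len(subsets), 2 ** len(features) - 1)

# Number of non-empty subsets of 25 features, without any filtering
n_subsets_25 = sum(comb(25, r) for r in range(1, 26))
print(n_subsets_25)
```

Fitting one model per subset is feasible for a handful of features but not for all 25, which is why a filtering step comes first.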

In [None]:
important_feature = list(np.array(feature_names)[model.coef_>np.mean(model.coef_)])
print(important_feature)
print(f'length : {len(important_feature)}')

In [None]:
# Let us get entire combinations
from itertools import  combinations
all_feature_combinations = []
for i in range(1,len(important_feature)+1):
  combo = [*combinations(important_feature,i)]
  combo = [*map(list,combo)]
  all_feature_combinations.extend(combo)

In [None]:
# Print length of all combinations
print(len(all_feature_combinations))

In [None]:
# Print all feature combinations
all_feature_combinations

In [None]:
# Let us write a function that does subset selection
# this function takes combo, which is a list of feature_list
def fit_and_record(combo):
  record = {'subset':None,'score':-1,'model':None}
  record_feature_count_wise = {}
  for i,subset in enumerate(combo):
    if i%100==0:
      print(f'{i} of {len(combo)}')
    # Create a model
    model = ??
    # Select the columns corresponding to subset, extract their values from the train and test df
    # and assign them to xtrain and xtest
    xtrain = ??
    xtest = ??

    # Fit model with xtrain and ytrain
    ??

    # calculate score
    score = ??

    if len(subset) in record_feature_count_wise:
      record_feature_count_wise[len(subset)].append(score)
    else:
      record_feature_count_wise[len(subset)] = [score,]

    # Write a condition where if current score is greater update subset, model and score in record
    if record.get('score')<score:
      record['subset'] = subset
      record['model'] = model
      record['score'] = score
  record_feature_count_wise = {i:np.array(j).mean() for i,j in record_feature_count_wise.items()}
  return record,record_feature_count_wise

result,count_wise_score = fit_and_record(all_feature_combinations)

In [None]:
# Print the result
print(result)

In [None]:
# Print count_wise_score
print(count_wise_score)

In [None]:
# Let us visualize importance of these coef_
plt.bar(x=result['subset'], height=np.abs(result['model'].coef_), label='Feature Importance')
plt.xticks(rotation='vertical')
plt.legend()

In [None]:
# Let us visualize count_wise_score
plt.scatter(x=count_wise_score.keys(), y=count_wise_score.values(), label='Count Wise Score')
plt.xticks(rotation='vertical')
plt.legend()

***EXERCISE NO. 3***

  > **Task-2 | Score :15**

In [None]:
# Now answer following

"""
Q1. What important difference did you notice between the two feature importance plots? How does the importance change?


Q2. What important trend do you notice in count_wise_score?


Q3. Why do you think this method is not a good approach? Write two important points.


"""
print()

***EXERCISE NO. 3***

  > **Task-3 | Score :10**

In [None]:
# IMPORTANT!!!!!!!!!!!!!! PLEASE PAY ATTENTION ONLY TO THE INSTRUCTIONS
# YOU DON'T HAVE TO CODE EVERYTHING
# JUST TRY TO FIGURE OUT A FEW THINGS AS INSTRUCTED


# Since you have now realized the above method is not a good one
# Recall the lecture where we talked about forward feature selection
# Now we will grow our feature set sequentially
# We will start with best_feature = []
# We will add the first feature and store its score
# We then add the next feature and see whether the score is higher or lower
# If the score is higher we keep the feature in best_feature, otherwise we drop (pop) it
# We continue until we have iterated over the whole feature space

# THE FOLLOWING FUNCTION fits a model on the given candidate features and returns the score and model coefficients
# example candidate_best_features = ['feature1', 'feature2',...] from feature_names
from graphviz import Digraph

def fit_and_return_score(candidate_best_features):
  xtrain = train[candidate_best_features].values
  xtest = test[candidate_best_features].values
  if len(xtrain.shape) == 1:
    xtrain = xtrain.reshape(-1,1)
    xtest = xtest.reshape(-1,1)
  model = LinearRegression()
  model.fit(xtrain,ytrain)
  return model.score(xtest,ytest), model.coef_


# greedy takes the existing best features, the candidate features that could become the next best feature, and a network graph object
# You will understand why we use this network graph object later
# You don't need to implement everything inside the greedy function
# Try to fill in the remaining code sections as instructed


def greedy(existing_best_feature, candidate_feature_list,process):
  best_feature = None
  if existing_best_feature:
    score, coef_ = fit_and_return_score(existing_best_feature)
  else:
    score = -1

  best_candidate = None
  for i,feature in enumerate(candidate_feature_list):
    score_, coef_ = fit_and_return_score(existing_best_feature + [feature])

    # Do not pay attention here,
    # You are welcome to explore what I am doing here
    # But this is just for recording our search in terms of graph
    # You will play with this graph later

    if existing_best_feature:
      process.edge(existing_best_feature[-1]+'_'+str(len(existing_best_feature)+1),feature+'_'+str(len(existing_best_feature)+1))
      process.edge(feature+'_'+str(len(existing_best_feature)+1), f'Score = {score_:.6f}')
    else:
      process.edge(feature, f'Score = {score_:.6f}')

    # If the newly calculated score is greater than the old one, enter the condition block and do as instructed
    if score < score_:
      # if above condition as instructed is true
      # do following
      #1. update old score to new score
      #2. update best_feature to current feature
      ??
      ??

  if best_feature:
    # If we were able to get our best feature we have to do something here
    # Do following
    # 1. remove best_feature from candidate_feature_list (use .remove method, see this method from list)
    # 2. append best_feature to existing_best_feature

    ??
    ??
    # Please do not pay attention to the following code
    process.edge(f'Score = {score:.6f}',best_feature+'_'+str(len(existing_best_feature)+1))

  else:
    # If we are not able to get any best_feature then we have to terminate our search
    # The best way to do this is to make our candidate_feature_list None or an empty list
    # Do following
    # 1. make candidate_feature_list None
    ??
    # We need to reset our model with old existing_best_feature
    score_, coef_ = fit_and_return_score(existing_best_feature)

  return existing_best_feature, candidate_feature_list, coef_, process


def forward_stepwise_selection(features):
  best_feature = []
  coef = None
  score = -1
  process = Digraph('FSW')
  process.node(f'Score = {0}')
  process.edges([(f'Score = {0}',f) for f in features])

  # Here we run our while loop until we run out of features (which in fact is candidate_feature_list)
  # or until our search is terminated explicitly during the greedy search
  # So keep looping while there is something in features
  # complete while ?? ; ?? should be a condition that is true if there is something in features
  while ??:
    best_feature, features, coef,process = greedy(best_feature.copy(), features.copy(), process)
  return best_feature, coef,process

best_feature, coef,_ = forward_stepwise_selection(feature_names.copy())
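
The greedy search above can also be illustrated on synthetic data, independent of the notebook's helpers. This is a sketch under stated assumptions: the functions `r2_of_subset` and `forward_select`, the synthetic data, and the `min_gain` stopping threshold are all hypothetical, and the score here is in-sample R^2 (which never decreases when a feature is added, hence the threshold) rather than the test score used in the assignment.

```python
import numpy as np

def r2_of_subset(X, y, cols):
    """Fit least squares with an intercept on the given columns; return in-sample R^2."""
    A = np.column_stack([np.ones(len(X)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, min_gain=1e-3):
    """Greedily add the single feature that improves the score most; stop when the gain is tiny."""
    selected, remaining, best = [], list(range(X.shape[1])), -1.0
    while remaining:
        scores = {j: r2_of_subset(X, y, selected + [j]) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:   # no useful improvement: stop
            break
        best = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best

rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 6))
# Only columns 1 and 4 drive the target; the rest are noise features
y_syn = 3 * X_syn[:, 1] - 2 * X_syn[:, 4] + 0.1 * rng.normal(size=200)
selected, best = forward_select(X_syn, y_syn)
print(selected, round(best, 3))
```

Each round fits one model per remaining candidate, so the total number of fits grows roughly quadratically with the number of features instead of exponentially.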

In [None]:
# You might be wondering how things work out in the above example
# Now we will play with our graph object called process
# You don't have to code anything here
# Just play with the given code
# You can tweak the number of features
# We have 25 feature names; choose a number from 1 to 25 to play with the graph
# If you choose a smaller number the graph will be smaller
# If you choose a larger number the graph will be larger
# You have to scroll right-left and top-down to see what's going on in the graph
# This graph is basically showing a tree traversal
# where we loop over each combination of features one by one, progressively from 0 to 1 to 2 to 3 to 4..... and so on
# You will see that we pick only the feature sets that generate the max score

# You can run this cell as many times as you like and you will get different results
import random
number_of_feature_you_want_to_play_with = 15
best_feature_, _,process = forward_stepwise_selection(random.sample(feature_names.copy(),number_of_feature_you_want_to_play_with))
print(best_feature_)
# You might see '_someNumber' in feature names; these numbers represent the tree level or height
process

In [None]:
# Now plot coef against best_feature in a barplot (coef has one entry per selected feature)
plt.bar(x = best_feature, height = np.abs(coef),label = 'Forward Step Wise')
plt.xticks(rotation='vertical')
# plt.bar(x = range(25), height = f_regression(xtrain,ytrain)[1],label = 'Important')
plt.legend()

***EXERCISE NO. 3***

  > **Task-4 | Score :5**

In [None]:
# Answer following question

"""
Q1. What is the advantage of step-wise forward feature selection over the subset feature selection method?



"""
print()

***EXERCISE NO. 4***

  > **Task-1 | Score :10**

In [None]:
# So far we have seen some manual ways of doing feature selection
# Now we will try to understand two very important methods for feature selection
# In this first task we will study Ridge Regression - L2 Regularization
# https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html
from sklearn.linear_model import  RidgeCV

def ridge_regr(min_alpha = 1e-3, max_alpha = 1, count = 100):
  # Create a linearly spaced numpy array with given min , max and count
  alphas = ??
  # create ridge model with given alphas
  ridge = ??
  # Fit model
  ??
  print('score ',ridge.score(xtest,ytest))
  return ridge.coef_

coef = ridge_regr()
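
To build intuition for what the L2 penalty does to the coefficients, here is a sketch of the closed-form ridge solution on synthetic data. It is not the `RidgeCV` API: the helper `ridge_fit`, the data, and the alpha values are assumptions, and the intercept is left out for brevity.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution w = (X^T X + alpha I)^{-1} X^T y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X_r = rng.normal(size=(50, 3))
y_r = X_r @ np.array([2.0, 0.0, -1.0]) + 0.1 * rng.normal(size=50)

# Larger alpha shrinks the whole coefficient vector toward zero
for alpha in (0.0, 10.0, 1000.0):
    w_r = ridge_fit(X_r, y_r, alpha)
    print(alpha, np.round(w_r, 3), round(float(np.linalg.norm(w_r)), 3))
```

The norm of the coefficient vector decreases as alpha grows, but the individual coefficients generally stay non-zero.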

In [None]:
# Now plot the ridge coef for each feature in a barplot
plt.bar(x = feature_names, height = np.abs(coef),label = 'Ridge')
plt.xticks(rotation='vertical')
# plt.bar(x = range(25), height = f_regression(xtrain,ytrain)[1],label = 'Important')
plt.legend()

***EXERCISE NO. 4***

  > **Task-2 | Score :10**

In [None]:
# So far we have seen some manual ways of doing feature selection
# Now we will try to understand two very important methods for feature selection
# In this second task we will study Lasso Regression - L1 Regularization
# https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html
from sklearn.linear_model import  LassoCV

def lasso_regr(min_alpha = 1e-3, max_alpha = 1, count = 100):
  # Create a linearly spaced numpy array with given min , max and count
  alphas = ??
  # create lasso model with given alphas
  lasso = ??
  # fit your model with xtrain and ytrain
  ??
  print('score ',lasso.score(xtest,ytest))
  return lasso.coef_

coef = lasso_regr()
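
One way to picture the L1 penalty is the soft-thresholding rule, which is the lasso solution in the special case of an orthonormal design (up to how the penalty is scaled). The sketch below contrasts it with the corresponding L2 shrinkage; the helper names, the weights, and `lam` are hypothetical and this is not the `LassoCV` API.

```python
import numpy as np

def soft_threshold(w, lam):
    """L1 shrinkage: move each weight toward zero by lam, clipping at zero."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def l2_shrink(w, lam):
    """L2 shrinkage for an orthonormal design: scale every weight by 1 / (1 + lam)."""
    return w / (1.0 + lam)

w_ols = np.array([3.0, -0.4, 0.2, -2.0])   # hypothetical unregularized weights
lam = 0.5
w_l1 = soft_threshold(w_ols, lam)
w_l2 = l2_shrink(w_ols, lam)
print(w_l1)
print(w_l2)
```

Weights smaller in magnitude than `lam` become exactly zero under the L1 rule, while the L2 rule only scales them down.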

In [None]:
# Now plot the lasso coef for each feature in a barplot
plt.bar(x = feature_names, height = np.abs(coef),label = 'Lasso')
plt.xticks(rotation='vertical')
# plt.bar(x = range(25), height = f_regression(xtrain,ytrain)[1],label = 'Important')
plt.legend()

***EXERCISE NO. 4***

  > **Task-3 | Score :25**

In [None]:
# Answer following question

"""
Question no.1 What is the key difference between the L2 (Ridge) and L1 (LASSO) approaches?

Question no.2 What is your key observation on Task1 and Task2 of Ex4?

Question no.3 How is the bias/variance trade-off related to Task1 and Task2?

Question no.4 What are the advantages of the Task1 and Task2 methods in Ex4 compared to the methods in Ex3?

Question no.5 What are the disadvantages of the above two methods (L2 and L1)?


"""