<div>
<img src=https://www.institutedata.com/wp-content/uploads/2019/10/iod_h_tp_primary_c.svg width="300">
</div>

# Lab 4.3: Measurements

Building on the forward feature selection technique, we apply it to the diabetes dataset. By adding features one at a time and comparing every remaining candidate at each step, we identify the subset of features that yields the best adjusted R-squared score on the training data. We then plot R-squared and adjusted R-squared against the number of features to show how the model's performance changes as features are added.

In [None]:
## Import Libraries

import pandas as pd

%matplotlib inline
import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

### 1. Forward Feature Selection

> Forward Selection: Forward selection is an iterative method in which we start with no features in the model. In each iteration, we add the feature that most improves the model, until adding a new variable no longer improves the model's performance.

Create a regression model using forward feature selection: loop over the features, adding one at a time, until the selection metric stops improving. Here adjusted $R^2$ decides which feature to add and when to stop; $R^2$ is recorded alongside it for comparison.
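
Adjusted $R^2$ penalises $R^2$ for the number of predictors $k$ given $n$ training cases: $\bar{R}^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$. The small helper below (the name `compute_adjusted_r2` is our own, not a scikit-learn function) illustrates that, for the same $R^2$, adding a predictor lowers the adjusted score. That penalty is why adjusted $R^2$ can serve as a stopping rule, while plain $R^2$ on training data never decreases as features are added.

```python
def compute_adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted on n cases (hypothetical helper)."""
    if n - k - 1 <= 0:
        raise ValueError("need n - k - 1 > 0 (more cases than predictors + 1)")
    return 1 - ((1 - r2) * (n - 1)) / (n - k - 1)


# Same R^2 = 0.6 on n = 11 cases, with 1 predictor versus 2 predictors
print(round(compute_adjusted_r2(0.6, n=11, k=1), 3))  # → 0.556
print(round(compute_adjusted_r2(0.6, n=11, k=2), 3))  # → 0.5
```

The extra predictor would have to raise $R^2$ enough to offset the larger penalty before it counts as an improvement.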

#### 1.1 Load the Diabetes Data Using sklearn's `datasets` Module

In [None]:
## Load the Diabetes dataset

# Load the diabetes dataset from sklearn
diabetes = datasets.load_diabetes()

In [None]:
# Description
print(diabetes.DESCR)

In [None]:
# Predictors
X = pd.DataFrame(diabetes.data, columns = diabetes.feature_names)

In [None]:
# Target
y = diabetes.target
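
As an optional shortcut, assuming scikit-learn 0.23 or later (where `load_diabetes` accepts `as_frame=True`), the predictors and target can be loaded directly as pandas objects and checked before modelling. In this variant `y` is a pandas Series rather than a NumPy array; the rest of the notebook works with either.

```python
from sklearn.datasets import load_diabetes

# as_frame=True returns the predictors as a DataFrame and the target as a Series
diabetes = load_diabetes(as_frame=True)
X = diabetes.data      # one column per baseline feature
y = diabetes.target    # disease-progression target

print(X.shape)           # → (442, 10)
print(list(X.columns))   # → ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
print(y.shape)           # → (442,)
```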

In [None]:
## Create training and testing subsets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

In [None]:
X_train.shape
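
With `test_size = 0.2`, scikit-learn rounds the test count up (ceil(0.2 × 442) = 89) and assigns the remaining 353 rows to training, so `n = 353` is the case count used in the adjusted $R^2$ formula later on. A self-contained check of those sizes:

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)

# Same split as above: 20% held out, fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # → (353, 10) (89, 10)
```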

#### 1.2 Use Forward Feature Selection to pick a good model

**Hint: Same as Lab 4.2.2**

- Append each step's $R^2$ value to a list
- Append each step's adjusted $R^2$ value to another list
- Display both $R^2$ and adjusted $R^2$
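
The same bookkeeping can be packaged as a reusable function. This is a minimal sketch on synthetic data (the names `forward_select`, `X_demo` and `y_demo` are hypothetical): it scores candidates by adjusted $R^2$ on the training data, starts from the same threshold of 0 as the loop in the next cell, and records $R^2$ and adjusted $R^2$ only when a feature is accepted, so each list has exactly one entry per selected feature.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def forward_select(X, y):
    """Greedy forward selection scored by adjusted R^2 on (X, y)."""
    n = X.shape[0]
    included, r2_list, adj_list = [], [], []
    best_adj = 0.0  # a model must beat adjusted R^2 = 0 to be accepted
    while True:
        candidates = [c for c in X.columns if c not in included]
        best_col = None
        for col in candidates:
            cols = included + [col]
            r2 = LinearRegression().fit(X[cols], y).score(X[cols], y)
            adj = 1 - ((1 - r2) * (n - 1)) / (n - len(cols) - 1)
            if adj > best_adj:
                best_adj, best_r2, best_col = adj, r2, col
        if best_col is None:  # nothing improved adjusted R^2, or no features left
            break
        included.append(best_col)
        r2_list.append(best_r2)
        adj_list.append(best_adj)
    return included, r2_list, adj_list


# Synthetic data: y depends strongly on 'a', less on 'b', not at all on the noise columns
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=['a', 'b', 'noise1', 'noise2'])
y_demo = 3 * X_demo['a'] + 2 * X_demo['b'] + rng.normal(scale=0.5, size=200)

features, r2s, adj_r2s = forward_select(X_demo, y_demo)
print(features[:2])  # → ['a', 'b']
```

Whether a noise column is also admitted depends on the sample; adjusted $R^2$ only discourages, rather than forbids, weak additions.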

In [None]:
## Flag intermediate output

# show_steps = True   # for testing/debugging
show_steps = False    # without showing steps

In [None]:
## Use Forward Feature Selection to pick a good model

# start with no predictors
included = []
# track the best feature found so far and its scores (selection uses adjusted R^2)
best = {'feature': '', 'r2': 0, 'a_r2': 0}
# create a model object to hold the modelling parameters
model = LinearRegression()
# get the number of cases in the training data
n = X_train.shape[0]

# R^2 and adjusted R^2 recorded once per accepted feature
r2_list = []
adjusted_r2_list = []

while True:
    changed = False

    if show_steps:
        print('')

    # candidate features not yet in the model (keeps column order deterministic)
    excluded = [col for col in X_train.columns if col not in included]

    if show_steps:
        print(f"(Step) Excluded = {', '.join(excluded)}")

    for new_column in excluded:
        features = included + [new_column]
        if show_steps:
            print(f"(Step) Trying {new_column}...")
            print(f"(Step) - Features = {', '.join(features)}")

        # fit on the training data with the candidate feature added
        model.fit(X_train[features], y_train)
        # R^2 on the training data
        r2 = model.score(X_train[features], y_train)

        # k = number of predictors in this candidate model
        k = len(features)
        # adjusted R^2 penalises R^2 for the number of predictors
        adjusted_r2 = 1 - ((1 - r2) * (n - 1)) / (n - k - 1)

        if show_steps:
            print(f"(Step) - Adjusted R^2: This = {adjusted_r2:.3f}; Best = {best['a_r2']:.3f}")

        # keep the candidate only if it beats the best adjusted R^2 so far
        if adjusted_r2 > best['a_r2']:
            best = {'feature': new_column, 'r2': r2, 'a_r2': adjusted_r2}
            changed = True
            if show_steps:
                print(f"(Step) - New Best!   : Feature = {best['feature']}; R^2 = {best['r2']:.3f}; Adjusted R^2 = {best['a_r2']:.3f}")

    if changed:
        # accept this round's best feature and record its scores
        included.append(best['feature'])
        r2_list.append(best['r2'])
        adjusted_r2_list.append(best['a_r2'])
        print(f"Added feature {best['feature']} with R^2 = {best['r2']:.3f} and adjusted R^2 = {best['a_r2']:.3f}")
    else:
        # no remaining feature improved adjusted R^2, so stop
        print('*'*50)
        break

print('')
print('Resulting features:')
print(', '.join(included))
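
Every score in the loop above is computed on the training data, and `X_test`/`y_test` are never used. A natural follow-up is to compare the chosen subset with the full model on the held-out rows. This sketch assumes the loop's output is copied into `selected`; the list shown is only a hypothetical placeholder, not the result of this notebook.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hypothetical subset: replace with the "Resulting features" printed by the loop
selected = ['bmi', 's5', 'bp']


def holdout_r2(columns):
    """Fit on the training rows, then report R^2 on the untouched test rows."""
    model = LinearRegression().fit(X_train[columns], y_train)
    return model.score(X_test[columns], y_test)


r2_subset = holdout_r2(selected)
r2_all = holdout_r2(list(X.columns))
print(f"Test R^2, selected features: {r2_subset:.3f}")
print(f"Test R^2, all features:      {r2_all:.3f}")
```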

In [None]:
## Chart both R^2 and Adjusted R^2

_range = range(1, len(r2_list)+1)

# define chart size
plt.figure(figsize = (10, 5))
# plot each metric
plt.plot(_range, r2_list, label = '$R^2$')
plt.plot(_range, adjusted_r2_list, label = r'Adjusted $R^2$')
# label the axes and add a legend
plt.xlabel('Number of Features')
plt.ylabel('Score (training data)')
plt.legend()
# output the chart
plt.show()
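
scikit-learn (0.24 and later) includes `SequentialFeatureSelector`, which runs a similar greedy forward search. It ranks candidates by cross-validated score on the training data rather than by adjusted $R^2$, and in this sketch it is told to stop at a fixed number of features (4 is an arbitrary illustrative choice), so its subset need not match the one found above.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Forward selection driven by 5-fold cross-validated R^2 (not adjusted R^2)
sfs = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=4,
    direction='forward',
    scoring='r2',
    cv=5,
)
sfs.fit(X_train, y_train)

# get_support() returns a boolean mask over the input columns
selected = list(X_train.columns[sfs.get_support()])
print(selected)
```

Cross-validation makes the choice less dependent on in-sample fit, at the cost of fitting each candidate several times.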



---



© 2024 Institute of Data

