### Question 1

Interpretability matters in machine learning because it lets us understand how a model arrives at its decisions. This is especially important in high-stakes domains such as healthcare, finance, and hiring, where a prediction can directly affect people's lives. Being able to explain a model's predictions builds trust, makes biases and errors easier to detect and correct, and helps ensure the model is used fairly and responsibly. It also supports legal and ethical requirements by giving the people affected a way to understand, and if necessary challenge, a decision. Without interpretability, we end up relying on “black box” models whose decisions we cannot fully understand or justify.

### Question 2

In [None]:
!pip install autofeat


In [None]:
from sklearn.datasets import load_diabetes
import pandas as pd
from autofeat import FeatureSelector

# load data
X, y = load_diabetes(return_X_y=True, as_frame=True)

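# initialize FeatureSelector and keep only the features it judges useful
# note: the selector is fit on all rows, including those later held out as the
# test set in Question 3, so the test scores below may be slightly optimistic
fs = FeatureSelector(verbose=1)
X_selected = fs.fit_transform(X, y)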

# number of discarded features
num_original_features = X.shape[1]
num_selected_features = X_selected.shape[1]
num_discarded = num_original_features - num_selected_features

print(f"Original features: {num_original_features}")
print(f"Selected features: {num_selected_features}")
print(f"Discarded features: {num_discarded}")

[featsel] Scaling data...done.
Original features: 10
Selected features: 6
Discarded features: 4




### Question 3

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# use selected features
X_train, X_test, y_train, y_test = train_test_split(X_selected, y, test_size=0.2, random_state=42)

# fit a regression model
model = LinearRegression()
model.fit(X_train, y_train)

# evaluate
r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, model.predict(X_test))

print(f"R2 score (train): {r2_train:.4f}")
print(f"R2 score (test): {r2_test:.4f}")

R2 score (train): 0.5213
R2 score (test): 0.4632
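
For reference, the $R^2$ reported by `r2_score` is $1 - SS_{res}/SS_{tot}$: the share of the target's variance around its mean that the predictions account for. The sketch below reimplements that formula in plain Python on made-up toy values (not the diabetes data), only to show how the score should be read: 1 means a perfect fit, 0 means no better than always predicting the mean, and negative values mean worse than that.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # residual sum of squares: error left over after prediction
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # total sum of squares: spread of the targets around their mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# toy targets, chosen only for illustration
y_true = [3.0, 5.0, 7.0, 9.0]

perfect = r2(y_true, y_true)                  # predictions equal targets
mean_only = r2(y_true, [6.0] * 4)             # always predict the mean (6.0)
reversed_fit = r2(y_true, [9.0, 7.0, 5.0, 3.0])  # worse than the mean

print(perfect, mean_only, reversed_fit)
```

On this scale, the test score of 0.4632 above means the linear model explains a bit under half of the variance in the held-out targets.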


### Question 4

In [None]:
from autofeat import AutoFeatRegressor

# initialize and fit AutoFeatRegressor with 3 feature engineering steps
afr = AutoFeatRegressor(verbose=1, feateng_steps=3)
X_train_feat = afr.fit_transform(X_train, y_train)
X_test_feat = afr.transform(X_test)

# refit the LinearRegression from Question 3 on the engineered features
model.fit(X_train_feat, y_train)

[featsel] Scaling data...done.








In [None]:
# calculate r^2 scores
r2_train_feat = model.score(X_train_feat, y_train)
r2_test_feat = model.score(X_test_feat, y_test)

print(f"R² score on training set (with feature engineering): {r2_train_feat:.4f}")
print(f"R² score on test set (with feature engineering): {r2_test_feat:.4f}")

# show 5 new features generated by AutoFeat
new_features = afr.new_feat_cols_[:5]
print("Five new features generated:", new_features)

R² score on training set (with feature engineering): 0.5746
R² score on test set (with feature engineering): 0.5072
Five new features generated: ['Abs(s5)/s5', 'exp(s3)*Abs(sex)', 'exp(bmi)*exp(bp)', 'exp(bmi)*exp(s5)', 's3/Abs(bmi)']
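
The feature names printed above are symbolic expressions, and some simplify to something more familiar. The sketch below checks this numerically for three of them. The input values are made up for illustration, and the helper names are hypothetical, not part of autofeat.

```python
import math


def sign_feature(s5):
    # 'Abs(s5)/s5' is +1 for positive s5 and -1 for negative s5
    # (undefined at s5 == 0), i.e. it only encodes the sign of s5
    return abs(s5) / s5


def exp_product(bmi, s5):
    # 'exp(bmi)*exp(s5)' as written in the feature name
    return math.exp(bmi) * math.exp(s5)


def ratio_feature(s3, bmi):
    # 's3/Abs(bmi)' grows without bound as bmi approaches 0
    return s3 / abs(bmi)


# made-up inputs, small and centered around zero
for bmi, s5 in [(0.06, -0.02), (-0.03, 0.04)]:
    # the product of exponentials equals the exponential of the sum
    assert math.isclose(exp_product(bmi, s5), math.exp(bmi + s5))
    print(bmi, s5, sign_feature(s5), exp_product(bmi, s5))
```

Read this way, `Abs(s5)/s5` is a binary indicator, `exp(bmi)*exp(s5)` is a monotone transform of `bmi + s5`, and `s3/Abs(bmi)` can take extreme values for samples with `bmi` near zero.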


With the AutoFeatRegressor features, the linear model improves on the Question 3 baseline on both splits: training $R^2$ rises from 0.5213 to 0.5746 and test $R^2$ from 0.4632 to 0.5072. This suggests that the engineered features capture non-linear structure that the plain linear model misses. A train-test gap of about 0.07 remains, however, so some overfitting cannot be ruled out, and a single 80/20 split is not enough to confirm that the gain generalizes. The new features, such as `exp(bmi)*exp(s5)` and `s3/Abs(bmi)`, encode non-linear relationships that may improve accuracy but should be used with caution until they are validated on further held-out data.
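
The comparison can be made explicit with a little arithmetic on the four scores reported in Questions 3 and 4. This is only a sketch: the scores are copied from the outputs above, and the dictionary layout and helper name are my own.

```python
# R^2 scores copied from the cell outputs above
scores = {
    "baseline": {"train": 0.5213, "test": 0.4632},
    "autofeat": {"train": 0.5746, "test": 0.5072},
}


def train_test_gap(s):
    # a larger gap means the model fits the training data
    # noticeably better than unseen data
    return s["train"] - s["test"]


gaps = {name: train_test_gap(s) for name, s in scores.items()}
train_gain = scores["autofeat"]["train"] - scores["baseline"]["train"]
test_gain = scores["autofeat"]["test"] - scores["baseline"]["test"]

for name, g in gaps.items():
    print(f"{name} train-test gap: {g:.4f}")
print(f"gain on train: {train_gain:.4f}")
print(f"gain on test:  {test_gain:.4f}")
```

The gap widens only slightly (from roughly 0.058 to roughly 0.067), while the test score gains about 0.044, so on this split the engineered features help held-out performance more than they add to the gap.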