## Homework

> Note: sometimes your answer doesn't match one of 
> the options exactly. That's fine. 
> Select the option that's closest to your solution.
> If it's exactly in between two options, select the higher value.


### Dataset

In this homework, we continue using the fuel efficiency dataset.
Download it from [here](https://raw.githubusercontent.com/alexeygrigorev/datasets/master/car_fuel_efficiency.csv).

You can do it with wget:

In [2]:
!wget https://raw.githubusercontent.com/alexeygrigorev/datasets/master/car_fuel_efficiency.csv

--2025-11-02 12:40:02--  https://raw.githubusercontent.com/alexeygrigorev/datasets/master/car_fuel_efficiency.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 874188 (854K) [text/plain]
Saving to: ‘car_fuel_efficiency.csv’


2025-11-02 12:40:02 (129 MB/s) - ‘car_fuel_efficiency.csv’ saved [874188/874188]




The goal of this homework is to create a regression model for predicting car fuel efficiency (column `'fuel_efficiency_mpg'`).



### Preparing the dataset 

Preparation:

* Fill missing values with zeros. ✔️
* Do train/validation/test split with 60%/20%/20% distribution. ✔️
* Use the `train_test_split` function and set the `random_state` parameter to 1. ✔️
* Use `DictVectorizer(sparse=True)` to turn the dataframes into matrices.

In [1]:
import numpy as np
import pandas as pd

file = 'car_fuel_efficiency.csv'
df_original = pd.read_csv(file)
df = df_original.copy()
display(df.isnull().sum())
df = df.fillna(0)
df

engine_displacement      0
num_cylinders          482
horsepower             708
vehicle_weight           0
acceleration           930
model_year               0
origin                   0
fuel_type                0
drivetrain               0
num_doors              502
fuel_efficiency_mpg      0
dtype: int64

| | engine_displacement | num_cylinders | horsepower | vehicle_weight | acceleration | model_year | origin | fuel_type | drivetrain | num_doors | fuel_efficiency_mpg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 170 | 3.0 | 159.0 | 3413.433759 | 17.7 | 2003 | Europe | Gasoline | All-wheel drive | 0.0 | 13.231729 |
| 1 | 130 | 5.0 | 97.0 | 3149.664934 | 17.8 | 2007 | USA | Gasoline | Front-wheel drive | 0.0 | 13.688217 |
| 2 | 170 | 0.0 | 78.0 | 3079.038997 | 15.1 | 2018 | Europe | Gasoline | Front-wheel drive | 0.0 | 14.246341 |
| 3 | 220 | 4.0 | 0.0 | 2542.392402 | 20.2 | 2009 | USA | Diesel | All-wheel drive | 2.0 | 16.912736 |
| 4 | 210 | 1.0 | 140.0 | 3460.870990 | 14.4 | 2009 | Europe | Gasoline | All-wheel drive | 2.0 | 12.488369 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9699 | 140 | 5.0 | 164.0 | 2981.107371 | 17.3 | 2013 | Europe | Diesel | Front-wheel drive | 0.0 | 15.101802 |
| 9700 | 180 | 0.0 | 154.0 | 2439.525729 | 15.0 | 2004 | USA | Gasoline | All-wheel drive | 0.0 | 17.962326 |
| 9701 | 220 | 2.0 | 138.0 | 2583.471318 | 15.1 | 2008 | USA | Diesel | All-wheel drive | -1.0 | 17.186587 |
| 9702 | 230 | 4.0 | 177.0 | 2905.527390 | 19.4 | 2011 | USA | Diesel | Front-wheel drive | 1.0 | 15.331551 |


In [2]:
from sklearn.model_selection import train_test_split

df_full_train, df_test = train_test_split(df, test_size=0.2, random_state=1)
df_train, df_val = train_test_split(df_full_train, test_size=0.25, random_state=1)

df_train = df_train.reset_index(drop=True)
df_val = df_val.reset_index(drop=True)
df_test = df_test.reset_index(drop=True)

y_train = df_train.fuel_efficiency_mpg.values
y_val = df_val.fuel_efficiency_mpg.values
y_test = df_test.fuel_efficiency_mpg.values
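
The two `test_size` values above give a 60%/20%/20% split because the second call takes 25% of the 80% that remains after the first call. A quick check of that arithmetic with exact fractions (a sketch; the variable names are illustrative and not part of the homework code):

```python
from fractions import Fraction

# First split: 20% of all rows go to the test set.
test_share = Fraction(20, 100)
full_train_share = 1 - test_share          # 80% left for train + validation

# Second split: 25% of the remaining 80% becomes the validation set.
val_share = Fraction(25, 100) * full_train_share
train_share = full_train_share - val_share

print(train_share, val_share, test_share)  # 3/5 1/5 1/5
```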

In [3]:
del df_train['fuel_efficiency_mpg']
del df_val['fuel_efficiency_mpg']
del df_test['fuel_efficiency_mpg']

In [4]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.feature_extraction import DictVectorizer

dv = DictVectorizer(sparse=True)
train_dict = df_train.to_dict(orient='records')
val_dicts = df_val.to_dict(orient='records')
X_train = dv.fit_transform(train_dict)
display(X_train)
X_val = dv.transform(val_dicts)
X_val

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 58220 stored elements and shape (5822, 14)>

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 19410 stored elements and shape (1941, 14)>
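
To make the encoding step less of a black box, here is a minimal pure-Python sketch of the idea behind `DictVectorizer`: numeric values pass through unchanged, while each string value becomes a `name=value` indicator column. The helper `one_hot_records` is hypothetical and returns dense lists, whereas `DictVectorizer(sparse=True)` returns a SciPy sparse matrix.

```python
def one_hot_records(records):
    """Encode a list of dicts: numbers pass through, strings become 'name=value' indicators."""
    def column(key, value):
        return f"{key}={value}" if isinstance(value, str) else key

    # Collect the column names seen in the data and sort them.
    names = sorted({column(k, v) for row in records for k, v in row.items()})
    index = {name: i for i, name in enumerate(names)}

    matrix = []
    for row in records:
        encoded = [0.0] * len(names)
        for k, v in row.items():
            encoded[index[column(k, v)]] = 1.0 if isinstance(v, str) else float(v)
        matrix.append(encoded)
    return names, matrix


# Rows 0 and 3 from the dataframe preview above, restricted to three columns.
rows = [
    {"origin": "Europe", "fuel_type": "Gasoline", "horsepower": 159.0},
    {"origin": "USA", "fuel_type": "Diesel", "horsepower": 0.0},
]
names, matrix = one_hot_records(rows)
print(names)
print(matrix)
```

This is why the 10 input columns turn into 14 features: the three categorical columns (`origin`, `fuel_type`, `drivetrain`) expand into one indicator per category.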

## Question 1

Let's train a decision tree regressor to predict the `fuel_efficiency_mpg` variable. 

* Train a model with `max_depth=1`.


Which feature is used for splitting the data?


* `'vehicle_weight'` <---
* `'model_year'`
* `'origin'`
* `'fuel_type'`

In [5]:
from sklearn.tree import export_text

dt = DecisionTreeRegressor(max_depth=1)
dt.fit(X_train, y_train)
print(export_text(dt, feature_names=list(dv.get_feature_names_out())))

|--- vehicle_weight <= 3022.11
|   |--- value: [16.88]
|--- vehicle_weight >  3022.11
|   |--- value: [12.94]



## Question 2

Train a random forest regressor with these parameters:

* `n_estimators=10`
* `random_state=1`
* `n_jobs=-1` (optional - to make training faster)


What's the RMSE of this model on the validation data?

* 0.045
* 0.45 <---
* 4.5
* 45.0

In [6]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rfr = RandomForestRegressor(n_estimators=10, random_state=1, n_jobs=-1)
rfr.fit(X_train, y_train)

# Predict
y_pred = rfr.predict(X_val)
rmse = np.sqrt(mean_squared_error(y_val, y_pred))
print(f"RMSE: {rmse}")

RMSE: 0.4595777223092726
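
RMSE is expressed in the same units as the target, so a value of about 0.46 means the predictions are off by roughly 0.46 mpg in the root-mean-square sense. The calculation behind `np.sqrt(mean_squared_error(...))` can be sketched in plain Python as follows (the toy numbers are made up, not taken from the dataset):

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred)^2))."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy values: the errors are 1, 0 and -2, so the mean squared error is 5/3.
print(rmse([13.0, 15.0, 17.0], [12.0, 15.0, 19.0]))
```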


## Question 3

Now let's experiment with the `n_estimators` parameter:

* Try different values of this parameter from 10 to 200 with step 10.
* Set `random_state` to `1`.
* Evaluate the model on the validation dataset.


After which value of `n_estimators` does RMSE stop improving?
Consider 3 decimal places for calculating the answer.

- 10
- 25
- 80
- 200 <---

If it doesn't stop improving, use the last iteration number in
your answer.

In [8]:
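# rmse_values is filled by cell In [7] below (the cells were run out of order);
# this cell only prints the same results at full precision.
for n, rmse in rmse_values:
    print(f"n_estimators: {n}, RMSE: {rmse}")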

n_estimators: 10, RMSE: 0.4595777223092726
n_estimators: 20, RMSE: 0.45359067251247054
n_estimators: 30, RMSE: 0.45168672575457125
n_estimators: 40, RMSE: 0.44872083017369974
n_estimators: 50, RMSE: 0.4466568972416094
n_estimators: 60, RMSE: 0.4454597026081121
n_estimators: 70, RMSE: 0.4451263244986996
n_estimators: 80, RMSE: 0.4449843119777284
n_estimators: 90, RMSE: 0.4448614906399875
n_estimators: 100, RMSE: 0.4446518680868042
n_estimators: 110, RMSE: 0.44357876439860233
n_estimators: 120, RMSE: 0.4439118681233817
n_estimators: 130, RMSE: 0.443702590396687
n_estimators: 140, RMSE: 0.4433549955101688
n_estimators: 150, RMSE: 0.44289761494219454
n_estimators: 160, RMSE: 0.4427612219659299
n_estimators: 170, RMSE: 0.44280146504730905
n_estimators: 180, RMSE: 0.44236195357041347
n_estimators: 190, RMSE: 0.4424939711220692
n_estimators: 200, RMSE: 0.4424785084688597


In [7]:
rmse_values = []

for n in range(10, 201, 10):
    print(f'n = {n}')
    model = RandomForestRegressor(n_estimators=n, random_state=1, n_jobs=-1)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_val)
    rmse = np.sqrt(mean_squared_error(y_val, y_pred))
    rmse_values.append((n, rmse))

# Print results
for n, rmse in rmse_values:
    print(f"n_estimators: {n}, RMSE: {rmse:.3f}")

n = 10
n = 20
n = 30
n = 40
n = 50
n = 60
n = 70
n = 80
n = 90
n = 100
n = 110
n = 120
n = 130
n = 140
n = 150
n = 160
n = 170
n = 180
n = 190
n = 200
n_estimators: 10, RMSE: 0.460
n_estimators: 20, RMSE: 0.454
n_estimators: 30, RMSE: 0.452
n_estimators: 40, RMSE: 0.449
n_estimators: 50, RMSE: 0.447
n_estimators: 60, RMSE: 0.445
n_estimators: 70, RMSE: 0.445
n_estimators: 80, RMSE: 0.445
n_estimators: 90, RMSE: 0.445
n_estimators: 100, RMSE: 0.445
n_estimators: 110, RMSE: 0.444
n_estimators: 120, RMSE: 0.444
n_estimators: 130, RMSE: 0.444
n_estimators: 140, RMSE: 0.443
n_estimators: 150, RMSE: 0.443
n_estimators: 160, RMSE: 0.443
n_estimators: 170, RMSE: 0.443
n_estimators: 180, RMSE: 0.442
n_estimators: 190, RMSE: 0.442
n_estimators: 200, RMSE: 0.442
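
One way to read "stops improving" is: the last `n_estimators` at which the RMSE, rounded to 3 decimals, sets a new minimum. That is an interpretation rather than the only possible one. The sketch below applies it to the rounded values printed above and then picks the closest answer option.

```python
# Validation RMSE rounded to 3 decimals, copied from the output above.
rounded_rmse = {
    10: 0.460, 20: 0.454, 30: 0.452, 40: 0.449, 50: 0.447,
    60: 0.445, 70: 0.445, 80: 0.445, 90: 0.445, 100: 0.445,
    110: 0.444, 120: 0.444, 130: 0.444, 140: 0.443, 150: 0.443,
    160: 0.443, 170: 0.443, 180: 0.442, 190: 0.442, 200: 0.442,
}

best_so_far = float("inf")
last_improvement = None
for n, score in sorted(rounded_rmse.items()):
    if score < best_so_far:       # strictly better at 3 decimals
        best_so_far = score
        last_improvement = n

options = [10, 25, 80, 200]
closest_option = min(options, key=lambda opt: abs(opt - last_improvement))
print(last_improvement, closest_option)  # 180 200
```

Under this reading the last improvement happens at 180, and 200 is the closest of the offered options.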


## Question 4

Let's select the best `max_depth`:

* Try different values of `max_depth`: `[10, 15, 20, 25]` ✔️
* For each of these values,
  * try different values of `n_estimators` from 10 to 200 (with step 10) ✔️
  * calculate the mean RMSE ✔️
* Fix the random seed: `random_state=1` ✔️


What's the best `max_depth`, using the mean RMSE?

* 10 <---
* 15
* 20
* 25
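
The procedure boils down to a two-level loop: for each `max_depth`, average the RMSE over all `n_estimators` values, then pick the depth with the lowest average. A minimal sketch of that bookkeeping is shown here; `mean_rmse_by_depth` and `fake_evaluate` are hypothetical names, and `fake_evaluate` returns made-up numbers in place of training a real forest, so its result says nothing about the actual answer.

```python
from statistics import mean

def mean_rmse_by_depth(evaluate, depths, n_estimators_values):
    """Average evaluate(depth, n) over all n for each depth."""
    return {
        depth: mean(evaluate(depth, n) for n in n_estimators_values)
        for depth in depths
    }

def fake_evaluate(depth, n):
    # Stand-in for "train a forest and return its RMSE"; the numbers are made up.
    return 0.5 + 0.001 * abs(depth - 15) - 0.0001 * n

results = mean_rmse_by_depth(fake_evaluate, [10, 15, 20, 25], range(10, 201, 10))
best_depth = min(results, key=results.get)
print(best_depth)  # best depth for the made-up scores only
```

In the real run, `evaluate` would fit a `RandomForestRegressor` with the given `max_depth` and `n_estimators` and return its RMSE.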

In [25]:
from sklearn.model_selection import cross_val_score
from tqdm.auto import tqdm

max_depth_values = [10, 15, 20, 25]
n_estimators_range = range(10, 201, 10)

results = {}

for depth in tqdm(max_depth_values):
    print(f'current depth={depth}')
    rmse_list = []
    for n in tqdm(n_estimators_range):
        print(f'    current estimator={n}')
        model = RandomForestRegressor(
            max_depth=depth,
            n_estimators=n,
            random_state=1,
            n_jobs=-1
        )
        scores = cross_val_score(model, X_train, y_train, 
                                 scoring='neg_root_mean_squared_error', cv=5)
        mean_rmse = -np.mean(scores)
        rmse_list.append(mean_rmse)
    results[depth] = np.mean(rmse_list)
    print(f'result is {results[depth]}')

# Find best max_depth
best_depth = min(results, key=results.get)
print("Best max_depth:", best_depth)
print("Mean RMSE:", results[best_depth])

  0%|          | 0/4 [00:00<?, ?it/s]

current depth=10


  0%|          | 0/20 [00:00<?, ?it/s]

    current estimator=10
    current estimator=20


KeyboardInterrupt: 

In [19]:
import csv
import os
from sklearn.model_selection import cross_val_score

from tqdm.auto import tqdm

max_depth_values = [10, 15, 20, 25]
n_estimators_range = range(10, 201, 10)
results_file = "results.csv"
results = {}

# --- Load existing results if file exists ---
if os.path.exists(results_file):
    with open(results_file, "r") as f:
        reader = csv.reader(f)
        # next(reader)  # skip header
        for row in reader:
            if row[0] == "max_depth":
                continue  # skip any header rows
            depth = int(row[0])
            print(f'{depth} detected')
            mean_rmse = float(row[1])
            results[depth] = mean_rmse
            print(f"Result for depth={depth}: {mean_rmse:.4f}")

# --- Open file for appending new results ---
with open(results_file, "a", newline="") as f:
    writer = csv.writer(f)
    if not results:  # write header only if file was empty
        print('header being written')
        writer.writerow(["max_depth", "mean_rmse"])

    for depth in tqdm(max_depth_values):
        if depth in results:
            print(f"Skipping depth={depth} (already computed)")
            continue

        print(f"Computing depth={depth}")
        rmse_list = []
        for n in tqdm(n_estimators_range, leave=True):
            print(f"    estimator={n}")
            model = RandomForestRegressor(
                max_depth=depth,
                n_estimators=n,
                random_state=1,
                n_jobs=-1
            )
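            # Note: this scores each forest with 5-fold cross-validation on the
            # training data, not on the held-out validation set (X_val, y_val).
            scores = cross_val_score(model, X_train, y_train,
                                     scoring='neg_root_mean_squared_error', cv=5)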
            mean_rmse = -np.mean(scores)
            rmse_list.append(mean_rmse)

        avg_rmse = np.mean(rmse_list)
        results[depth] = avg_rmse
        writer.writerow([depth, avg_rmse])
        f.flush()
        print(f"Result for depth={depth}: {avg_rmse:.4f}")

# --- Final result ---
best_depth = min(results, key=results.get)
print("Best max_depth:", best_depth)
print("Mean RMSE:", results[best_depth])


header being written


  0%|          | 0/4 [00:00<?, ?it/s]

Computing depth=10


  0%|          | 0/20 [00:00<?, ?it/s]

    estimator=10
    estimator=20
    estimator=30
    estimator=40
    estimator=50
    estimator=60
    estimator=70
    estimator=80
    estimator=90
    estimator=100
    estimator=110
    estimator=120
    estimator=130
    estimator=140
    estimator=150
    estimator=160
    estimator=170
    estimator=180
    estimator=190
    estimator=200
Result for depth=10: 0.4432
Computing depth=15


  0%|          | 0/20 [00:00<?, ?it/s]

    estimator=10
    estimator=20
    estimator=30
    estimator=40
    estimator=50
    estimator=60
    estimator=70
    estimator=80
    estimator=90
    estimator=100
    estimator=110
    estimator=120
    estimator=130
    estimator=140
    estimator=150
    estimator=160
    estimator=170
    estimator=180
    estimator=190
    estimator=200
Result for depth=15: 0.4457
Computing depth=20


  0%|          | 0/20 [00:00<?, ?it/s]

    estimator=10
    estimator=20
    estimator=30
    estimator=40
    estimator=50
    estimator=60
    estimator=70
    estimator=80
    estimator=90
    estimator=100
    estimator=110
    estimator=120
    estimator=130
    estimator=140
    estimator=150
    estimator=160
    estimator=170
    estimator=180
    estimator=190
    estimator=200
Result for depth=20: 0.4455
Computing depth=25


  0%|          | 0/20 [00:00<?, ?it/s]

    estimator=10
    estimator=20
    estimator=30
    estimator=40
    estimator=50
    estimator=60
    estimator=70
    estimator=80
    estimator=90
    estimator=100
    estimator=110
    estimator=120
    estimator=130
    estimator=140
    estimator=150
    estimator=160
    estimator=170
    estimator=180
    estimator=190
    estimator=200
Result for depth=25: 0.4454
Best max_depth: 10
Mean RMSE: 0.4431805944095529


In [18]:
results_old = results
results_old

{10: 0.44318059440955293,
 15: 0.44569659863415,
 20: 0.44553083381794745,
 25: 0.44544460600529623}

## Question 5

We can extract feature importance information from tree-based models. 

At each step, the decision tree learning algorithm finds the best split. 
While doing so, it can calculate the "gain": the reduction in impurity from before the split to after it. 
This gain is useful for understanding which features are important in tree-based models.

In Scikit-Learn, tree-based models contain this information in the
[`feature_importances_`](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html#sklearn.ensemble.RandomForestRegressor.feature_importances_)
field. 
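
To make "gain" concrete, the sketch below computes it for a single split of a regression node, using variance as the impurity (this corresponds to the squared-error criterion). The helper names and the toy data are made up for illustration; scikit-learn accumulates reductions like this over all splits on a feature and normalizes them to obtain `feature_importances_`.

```python
def variance(values):
    """Population variance, used here as the impurity of a regression node."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def split_gain(x, y, threshold):
    """Impurity of the parent minus the size-weighted impurity of the two children."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty reduces nothing
    n = len(y)
    children = len(left) / n * variance(left) + len(right) / n * variance(right)
    return variance(y) - children

# Toy data (made up): heavier cars have lower mpg.
weight = [2500, 2700, 2900, 3100, 3300, 3500]
mpg = [18.0, 17.0, 16.0, 14.0, 13.0, 12.0]

print(split_gain(weight, mpg, 3000))  # separates light from heavy cars
print(split_gain(weight, mpg, 2600))  # a worse threshold, smaller gain
```

The tree learner tries many candidate thresholds per feature and keeps the one with the largest gain.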

For this homework question, we'll find the most important feature:

* Train the model with these parameters: ✔️
  * `n_estimators=10`,
  * `max_depth=20`,
  * `random_state=1`,
  * `n_jobs=-1` (optional)
* Get the feature importance information from this model ✔️


What's the most important feature (among these 4)? 

* `vehicle_weight` <---
* `horsepower`
* `acceleration`
* `engine_displacement`

In [11]:
model = RandomForestRegressor(
    n_estimators=10,
    max_depth=20,
    random_state=1,
    n_jobs=-1
)
model.fit(X_train, y_train)

# Get feature importances
importances = model.feature_importances_
feature_names = dv.get_feature_names_out()
feature_names

array(['acceleration', 'drivetrain=All-wheel drive',
       'drivetrain=Front-wheel drive', 'engine_displacement',
       'fuel_type=Diesel', 'fuel_type=Gasoline', 'horsepower',
       'model_year', 'num_cylinders', 'num_doors', 'origin=Asia',
       'origin=Europe', 'origin=USA', 'vehicle_weight'], dtype=object)

In [12]:
importance_dict = dict(zip(feature_names, importances))
display(importance_dict)
sorted_features = sorted(importance_dict.items(), key=lambda x: x[1], reverse=True)
sorted_features

{'acceleration': np.float64(0.01147970063142938),
 'drivetrain=All-wheel drive': np.float64(0.0003571085493021933),
 'drivetrain=Front-wheel drive': np.float64(0.00034538411263183535),
 'engine_displacement': np.float64(0.0032727919136094864),
 'fuel_type=Diesel': np.float64(0.000325424322869738),
 'fuel_type=Gasoline': np.float64(0.00036038360069172865),
 'horsepower': np.float64(0.015997897714266237),
 'model_year': np.float64(0.003212300094794675),
 'num_cylinders': np.float64(0.0023433469524512048),
 'num_doors': np.float64(0.0016349895439306998),
 'origin=Asia': np.float64(0.0004622464955097423),
 'origin=Europe': np.float64(0.000518739638586969),
 'origin=USA': np.float64(0.0005397216891829147),
 'vehicle_weight': np.float64(0.9591499647407432)}

[('vehicle_weight', np.float64(0.9591499647407432)),
 ('horsepower', np.float64(0.015997897714266237)),
 ('acceleration', np.float64(0.01147970063142938)),
 ('engine_displacement', np.float64(0.0032727919136094864)),
 ('model_year', np.float64(0.003212300094794675)),
 ('num_cylinders', np.float64(0.0023433469524512048)),
 ('num_doors', np.float64(0.0016349895439306998)),
 ('origin=USA', np.float64(0.0005397216891829147)),
 ('origin=Europe', np.float64(0.000518739638586969)),
 ('origin=Asia', np.float64(0.0004622464955097423)),
 ('fuel_type=Gasoline', np.float64(0.00036038360069172865)),
 ('drivetrain=All-wheel drive', np.float64(0.0003571085493021933)),
 ('drivetrain=Front-wheel drive', np.float64(0.00034538411263183535)),
 ('fuel_type=Diesel', np.float64(0.000325424322869738))]
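
The sorted list covers all 14 encoded columns, while the question only asks about four candidates. A small sketch that restricts the comparison to those four, using the importances copied from the output above:

```python
# Importances copied from the output above (only the four candidate features).
candidates = {
    "vehicle_weight": 0.9591499647407432,
    "horsepower": 0.015997897714266237,
    "acceleration": 0.01147970063142938,
    "engine_displacement": 0.0032727919136094864,
}

top_feature = max(candidates, key=candidates.get)
print(top_feature)  # vehicle_weight
```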

## Question 6

Now let's train an XGBoost model! For this question, we'll tune the `eta` parameter:

* Install XGBoost ✔️
* Create DMatrix for train and validation ✔️
* Create a watchlist ✔️
* Train a model with these parameters for 100 rounds: ✔️

```
xgb_params = {
    'eta': 0.3, 
    'max_depth': 6,
    'min_child_weight': 1,
    
    'objective': 'reg:squarederror',
    'nthread': 8,
    
    'seed': 1,
    'verbosity': 1,
}
```

In [13]:
import xgboost as xgb

dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)

watchlist = [(dtrain, 'train'), (dval, 'val')]

xgb_params = {
    'eta': 0.3,
    'max_depth': 6,
    'min_child_weight': 1,
    'objective': 'reg:squarederror',
    'nthread': 8,
    'seed': 1,
    'verbosity': 1,
}

model = xgb.train(
    params=xgb_params,
    dtrain=dtrain,
    num_boost_round=100,
    evals=watchlist
)

[0]	train-rmse:1.81393	val-rmse:1.85444
[1]	train-rmse:1.31919	val-rmse:1.35353
[2]	train-rmse:0.98120	val-rmse:1.01316
[3]	train-rmse:0.75443	val-rmse:0.78667
[4]	train-rmse:0.60680	val-rmse:0.64318
[5]	train-rmse:0.51381	val-rmse:0.55664
[6]	train-rmse:0.45470	val-rmse:0.50321
[7]	train-rmse:0.41881	val-rmse:0.47254
[8]	train-rmse:0.39534	val-rmse:0.45509
[9]	train-rmse:0.38038	val-rmse:0.44564
[10]	train-rmse:0.37115	val-rmse:0.43896
[11]	train-rmse:0.36361	val-rmse:0.43594
[12]	train-rmse:0.35850	val-rmse:0.43558
[13]	train-rmse:0.35365	val-rmse:0.43394
[14]	train-rmse:0.35025	val-rmse:0.43349
[15]	train-rmse:0.34666	val-rmse:0.43362
[16]	train-rmse:0.34459	val-rmse:0.43378
[17]	train-rmse:0.34128	val-rmse:0.43405
[18]	train-rmse:0.33822	val-rmse:0.43391
[19]	train-rmse:0.33709	val-rmse:0.43374
[20]	train-rmse:0.33553	val-rmse:0.43376
[21]	train-rmse:0.33243	val-rmse:0.43453
[22]	train-rmse:0.33031	val-rmse:0.43510
[23]	train-rmse:0.32815	val-rmse:0.43601
[24]	train-rmse:0.32670	va

In [14]:

xgb_params['eta'] = 0.1
model_low_eta = xgb.train(xgb_params, dtrain, num_boost_round=100, evals=watchlist)

xgb_params['eta'] = 0.3
model_high_eta = xgb.train(xgb_params, dtrain, num_boost_round=100, evals=watchlist)


[0]	train-rmse:2.28944	val-rmse:2.34561
[1]	train-rmse:2.07396	val-rmse:2.12434
[2]	train-rmse:1.88066	val-rmse:1.92597
[3]	train-rmse:1.70730	val-rmse:1.74987
[4]	train-rmse:1.55163	val-rmse:1.59059
[5]	train-rmse:1.41247	val-rmse:1.44988
[6]	train-rmse:1.28796	val-rmse:1.32329
[7]	train-rmse:1.17660	val-rmse:1.20930
[8]	train-rmse:1.07736	val-rmse:1.10830
[9]	train-rmse:0.98883	val-rmse:1.02009
[10]	train-rmse:0.91008	val-rmse:0.94062
[11]	train-rmse:0.84030	val-rmse:0.87100
[12]	train-rmse:0.77874	val-rmse:0.80916
[13]	train-rmse:0.72417	val-rmse:0.75465
[14]	train-rmse:0.67626	val-rmse:0.70780
[15]	train-rmse:0.63402	val-rmse:0.66672
[16]	train-rmse:0.59690	val-rmse:0.63062
[17]	train-rmse:0.56447	val-rmse:0.60016
[18]	train-rmse:0.53619	val-rmse:0.57383
[19]	train-rmse:0.51138	val-rmse:0.55044
[20]	train-rmse:0.48983	val-rmse:0.53064
[21]	train-rmse:0.47135	val-rmse:0.51451
[22]	train-rmse:0.45501	val-rmse:0.49998
[23]	train-rmse:0.44120	val-rmse:0.48790
[24]	train-rmse:0.42929	va

Now change `eta` from `0.3` to `0.1`.

Which eta leads to the best RMSE score on the validation dataset?

* 0.3
* 0.1 <---
* Both give equal value

In [15]:
# Prepare DMatrix
dtrain = xgb.DMatrix(X_train, label=y_train)
dval = xgb.DMatrix(X_val, label=y_val)
watchlist = [(dtrain, 'train'), (dval, 'val')]

# Base parameters
base_params = {
    'max_depth': 6,
    'min_child_weight': 1,
    'objective': 'reg:squarederror',
    'nthread': 8,
    'seed': 1,
    'verbosity': 0
}

etas = [0.3, 0.1]
rmse_results = {}

for eta in etas:
    print(f"\nTraining with eta={eta}")
    params = base_params.copy()
    params['eta'] = eta

    evals_result = {}
    model = xgb.train(
        params=params,
        dtrain=dtrain,
        num_boost_round=100,
        evals=watchlist,
        evals_result=evals_result,
        verbose_eval=False
    )

    val_rmse = evals_result['val']['rmse']
    best_rmse = min(val_rmse)
    rmse_results[eta] = best_rmse
    print(f"Best RMSE for eta={eta}: {best_rmse:.4f}")

# Final comparison
best_eta = min(rmse_results, key=rmse_results.get)
print("\n--- Comparison Summary ---")
for eta, score in rmse_results.items():
    print(f"eta={eta}: RMSE={score:.9f}")
print(f"\n✅ Best eta: {best_eta}")


Training with eta=0.3
Best RMSE for eta=0.3: 0.4335

Training with eta=0.1
Best RMSE for eta=0.1: 0.4243

--- Comparison Summary ---
eta=0.3: RMSE=0.433486130
eta=0.1: RMSE=0.424262563

✅ Best eta: 0.1
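
Note that the comparison above uses the minimum validation RMSE over all rounds, not the value at the final round. For `eta=0.3` the difference matters: in the visible part of the first training log, the validation RMSE bottoms out early and then drifts upward while the training RMSE keeps falling. The sketch below locates that best round from the logged values (the log is cut off after round 23, so only those rounds are used).

```python
# Validation RMSE for eta=0.3, rounds 0-23, copied from the training log above
# (the log is cut off after round 23, so later rounds are not included).
val_rmse = [
    1.85444, 1.35353, 1.01316, 0.78667, 0.64318, 0.55664,
    0.50321, 0.47254, 0.45509, 0.44564, 0.43896, 0.43594,
    0.43558, 0.43394, 0.43349, 0.43362, 0.43378, 0.43405,
    0.43391, 0.43374, 0.43376, 0.43453, 0.43510, 0.43601,
]

# Index of the smallest validation RMSE among the rounds shown.
best_round = min(range(len(val_rmse)), key=val_rmse.__getitem__)
print(best_round, val_rmse[best_round])  # 14 0.43349
print(val_rmse[-1])                      # 0.43601, the last round shown is worse
```

The value at round 14 agrees with the best RMSE of about 0.4335 reported for `eta=0.3` in the summary.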
