# AI Programming – Winter 2026
## Assignment 1 – Question 2: Auto MPG Dataset
**AI Assistance Disclosure:** Portions of this code were written with the assistance of Claude (Anthropic, claude-sonnet-4-6). Prompts used: "Fill missing horsepower values with cylinder-group averages in pandas", "Build a Keras Sequential DNN regression model for the Auto MPG dataset", "Predict MPG for custom vehicle data using a trained Keras model". All code has been reviewed and understood by the student.

In [1]:
# ── Imports ──────────────────────────────────────────────────────────────────
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')   # select the non-interactive backend before pyplot is imported
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

print('TensorFlow version:', tf.__version__)

TensorFlow version: 2.20.0


## Load the Auto MPG Dataset

In [2]:
# ── Load data ─────────────────────────────────────────────────────────────────
# Dataset source: https://archive.ics.uci.edu/ml/datasets/Auto+MPG
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
col_names = ['mpg','cylinders','displacement','horsepower','weight',
             'acceleration','year','origin','name']

df = pd.read_csv(url, sep=r'\s+', header=None, names=col_names,
                 na_values='?')       # raw string avoids an invalid-escape warning; '?' marks missing horsepower

print(df.shape)
df.head()


(398, 9)


| | mpg | cylinders | displacement | horsepower | weight | acceleration | year | origin | name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130.0 | 3504.0 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165.0 | 3693.0 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150.0 | 3436.0 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150.0 | 3433.0 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140.0 | 3449.0 | 10.5 | 70 | 1 | ford torino |


## Part (a) – Drop `name` and `origin`

In [3]:
# ── (a) Drop name and origin ──────────────────────────────────────────────────
df.drop(columns=['name', 'origin'], inplace=True)
print('Columns after drop:', df.columns.tolist())

Columns after drop: ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'year']


## Part (b) – Fill Missing Horsepower Values

In [4]:
# ── (b) Fill missing horsepower with per-cylinder-group mean ─────────────────
# AI-assisted (Claude, claude-sonnet-4-6):
#   Prompt – "Fill missing horsepower values with cylinder-group averages in pandas."

print('Missing horsepower rows before fill:')
print(df[df['horsepower'].isna()][['cylinders','horsepower']])

# Compute average horsepower per cylinder count (excluding NaN)
cyl_hp_mean = df.groupby('cylinders')['horsepower'].mean()
print('\nMean horsepower by cylinder count:')
print(cyl_hp_mean.round(2))

# Fill NaNs using the appropriate group mean
df['horsepower'] = df.apply(
    lambda row: cyl_hp_mean[row['cylinders']]
                if pd.isna(row['horsepower']) else row['horsepower'],
    axis=1
)

print(f'\nMissing after fill: {df["horsepower"].isna().sum()}')

Missing horsepower rows before fill:
     cylinders  horsepower
32           4         NaN
126          6         NaN
330          4         NaN
336          4         NaN
354          4         NaN
374          4         NaN

Mean horsepower by cylinder count:
cylinders
3     99.25
4     78.28
5     82.33
6    101.51
8    158.30
Name: horsepower, dtype: float64

Missing after fill: 0
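
The row-wise `apply` above works, but the same group-mean fill can also be written in vectorised form. The following is a minimal sketch of that alternative on a toy DataFrame; the values are hypothetical and are not taken from the Auto MPG file.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not taken from the Auto MPG file)
toy = pd.DataFrame({
    'cylinders':  [4, 4, 4, 6, 6, 8],
    'horsepower': [70.0, 90.0, np.nan, 100.0, np.nan, 150.0],
})

# transform('mean') returns one group mean per row, aligned with toy's index;
# NaN values are skipped when each mean is computed
group_mean = toy.groupby('cylinders')['horsepower'].transform('mean')

# fillna replaces only the missing entries and leaves observed values untouched
toy['horsepower_filled'] = toy['horsepower'].fillna(group_mean)
print(toy)
```

The missing 4-cylinder row receives the mean of 70 and 90, and the missing 6-cylinder row receives 100, the only observed value in its group.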


## Part (c) – Convert 2-digit Year to 4-digit Year

In [5]:
# ── (c) Add 1900 to year column ───────────────────────────────────────────────
# Student-authored
print('Year values before:', df['year'].unique())
df['year'] = df['year'] + 1900
print('Year values after: ', df['year'].unique())

Year values before: [70 71 72 73 74 75 76 77 78 79 80 81 82]
Year values after:  [1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982]


## Part (d) – 50/50 Train–Validation Split

In [6]:
# ── (d) 50 / 50 split ─────────────────────────────────────────────────────────
features = ['cylinders','displacement','horsepower','weight','acceleration','year']
target   = 'mpg'

X = df[features].values
y = df[target].values

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=42)

# Standardise features (student-authored)
scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)
X_val_sc   = scaler.transform(X_val)

print(f'Train: {X_train_sc.shape}  |  Val: {X_val_sc.shape}')

Train: (199, 6)  |  Val: (199, 6)
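
The scaler is fitted on the training half only, and the same mean and standard deviation are then reused for every other input. As a rough illustration of what that means for unseen values, the sketch below standardises a single column by hand. The `train_year` column is an assumption (one row per model year, 1970 to 1982); the real training split has repeated years, so its statistics will differ.

```python
import numpy as np

# Illustrative training column: one value per model year 1970..1982
# (an assumption; the real training split contains repeated years)
train_year = np.arange(1970, 1983, dtype=float)

# Statistics come from the training column only
mean = train_year.mean()
std = train_year.std()   # ddof=0, the population std that StandardScaler uses

def standardise(x):
    """Apply the training mean/std to any value, seen or unseen."""
    return (np.asarray(x, dtype=float) - mean) / std

z_in_range = standardise(1976)   # the training mean maps to 0
z_2025 = standardise(2025)       # far outside the fitted range
print(f'z(1976) = {z_in_range:.2f}, z(2025) = {z_2025:.2f}')
```

Under this assumption a model year of 2025 lands about 13 standard deviations above the training mean, a region where the network has seen no data.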


## Part (e) – Train a Keras Sequential DNN for MPG Prediction

In [7]:
# ── (e) Build & train regression DNN ─────────────────────────────────────────
# AI-assisted (Claude, claude-sonnet-4-6):
#   Prompt – "Build a Keras Sequential DNN regression model for the Auto MPG dataset."

def build_mpg_model(input_dim=6):
    model = keras.models.Sequential([
        layers.Input(shape=(input_dim,)),

        layers.Dense(128, kernel_initializer='he_normal', activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.2),

        layers.Dense(64, kernel_initializer='he_normal', activation='relu'),
        layers.BatchNormalization(),
        layers.Dropout(0.2),

        layers.Dense(32, kernel_initializer='he_normal', activation='relu'),

        layers.Dense(1)   # linear output for regression
    ], name='mpg_model')

    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss='mse',
        metrics=['mae']
    )
    return model

mpg_model = build_mpg_model()
mpg_model.summary()

In [8]:
history = mpg_model.fit(
    X_train_sc, y_train,
    epochs=200,
    batch_size=32,
    validation_data=(X_val_sc, y_val),
    callbacks=[keras.callbacks.EarlyStopping(patience=20, restore_best_weights=True)],
    verbose=0
)

val_mae = mpg_model.evaluate(X_val_sc, y_val, verbose=0)[1]
print(f'Validation MAE: {val_mae:.3f} mpg')

Validation MAE: 1.827 mpg
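
The `EarlyStopping(patience=20, restore_best_weights=True)` callback stops training once the validation loss has failed to improve for 20 consecutive epochs and rolls the weights back to the best epoch. The sketch below imitates that rule in plain Python. It is a simplified illustration, not the Keras implementation: the function name `early_stopping` and the loss values are hypothetical, and options such as `min_delta` are ignored.

```python
def early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    Simplified: ignores min_delta and the other Keras options.
    """
    best_loss = float('inf')
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch    # stop here, roll back to best_epoch
    return len(val_losses) - 1, best_epoch  # ran to the final epoch

# Hypothetical loss curve: improves until epoch 2, then stagnates
losses = [9.0, 7.5, 6.8, 6.9, 7.0, 6.85, 7.2]
stop_epoch, best_epoch = early_stopping(losses, patience=3)
print(stop_epoch, best_epoch)
```

With `patience=3`, training in this toy curve stops at epoch 5 and the weights from epoch 2 would be restored.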


In [9]:
# ── Plot training curves (student-authored) ───────────────────────────────────
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].plot(history.history['loss'],     label='Train MSE')
axes[0].plot(history.history['val_loss'], label='Val MSE')
axes[0].set_title('MPG Model – MSE Loss')
axes[0].set_xlabel('Epoch'); axes[0].set_ylabel('MSE')
axes[0].legend(); axes[0].grid(True)

axes[1].plot(history.history['mae'],     label='Train MAE')
axes[1].plot(history.history['val_mae'], label='Val MAE')
axes[1].set_title('MPG Model – MAE')
axes[1].set_xlabel('Epoch'); axes[1].set_ylabel('MAE (mpg)')
axes[1].legend(); axes[1].grid(True)

fig.suptitle('Auto MPG – DNN Training History', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('auto-training.png', dpi=150)
plt.close(fig)   # the Agg backend is non-interactive, so plt.show() would only emit a warning
print('Saved auto-training.png')

Saved auto-training.png



## Part (f) – Predict MPG for New Vehicles

In [10]:
# ── (f) Predictions for 10 vehicles ──────────────────────────────────────────
# AI-assisted (Claude, claude-sonnet-4-6):
#   Prompt – "Predict MPG for custom vehicle data using a trained Keras model."

# Vehicle data: [cylinders, displacement, horsepower, weight, acceleration, year]
new_vehicles = np.array([
    [6,  2170, 502,  3164, 4.2,  2025],   # Vehicle 1*
    [12, 6498, 730,  3472, 3.2,  2025],   # Vehicle 2*
    [8,  3902, 986,  3020, 2.5,  2025],   # Vehicle 3*
    [8,  6162, 670,  3721, 2.6,  2025],   # Vehicle 4*
    [4,  122,  181,  2496, 8.3,  2025],   # Vehicle 5
    [6,  3232, 155,  3232, 11.5, 1969],   # Vehicle 6
    [3,  598,  89,   1550, 10.1, 2025],   # Vehicle 7
    [3,  900,  50,   642,  5.8,  2025],   # Vehicle 8
    [4,  1189, 60,   2355, 28.1, 1964],   # Vehicle 9
    [4,  201,  40,   2265, 32,   1908],   # Vehicle 10
], dtype=float)

new_vehicles_sc = scaler.transform(new_vehicles)
predictions = mpg_model.predict(new_vehicles_sc, verbose=0).flatten()

results = pd.DataFrame({
    'Vehicle': [f'{i+1}' + ('*' if i < 4 else '') for i in range(10)],
    'Cylinders': new_vehicles[:, 0].astype(int),
    'Displacement': new_vehicles[:, 1],
    'Horsepower': new_vehicles[:, 2],
    'Weight': new_vehicles[:, 3],
    'Acceleration': new_vehicles[:, 4],
    'Year': new_vehicles[:, 5].astype(int),
    'Predicted_MPG': predictions.round(2)
})

print('=== Predicted MPG ===')
print(results.to_string(index=False))

=== Predicted MPG ===
Vehicle  Cylinders  Displacement  Horsepower  Weight  Acceleration  Year  Predicted_MPG
     1*          6        2170.0       502.0  3164.0           4.2  2025      58.930000
     2*         12        6498.0       730.0  3472.0           3.2  2025     166.000000
     3*          8        3902.0       986.0  3020.0           2.5  2025      84.650002
     4*          8        6162.0       670.0  3721.0           2.6  2025     149.389999
      5          4         122.0       181.0  2496.0           8.3  2025      94.919998
      6          6        3232.0       155.0  3232.0          11.5  1969      75.410004
      7          3         598.0        89.0  1550.0          10.1  2025      87.709999
      8          3         900.0        50.0   642.0           5.8  2025      93.680000
      9          4        1189.0        60.0  2355.0          28.1  1964      24.990000
     10          4         201.0        40.0  2265.0          32.0  1908       3.550000


In [11]:
# ── Predictions as comments (required by submission) ─────────────────────────
#
# Predicted MPG from the run above (values will vary with each training run):
#
# Vehicle 1* :  58.93 mpg  (6-cyl, displacement=2170, HP=502, 2025)
# Vehicle 2* : 166.00 mpg  (12-cyl, displacement=6498, HP=730, 2025)
# Vehicle 3* :  84.65 mpg  (8-cyl, displacement=3902, HP=986, 2025)
# Vehicle 4* : 149.39 mpg  (8-cyl, displacement=6162, HP=670, 2025)
# Vehicle 5  :  94.92 mpg  (4-cyl, displacement=122, HP=181, 2025)
# Vehicle 6  :  75.41 mpg  (6-cyl, displacement=3232, 1969)
# Vehicle 7  :  87.71 mpg  (3-cyl, HP=89, weight=1550, 2025)
# Vehicle 8  :  93.68 mpg  (3-cyl, HP=50, weight=642, 2025)
# Vehicle 9  :  24.99 mpg  (4-cyl, displacement=1189, acceleration=28.1, 1964)
# Vehicle 10 :   3.55 mpg  (4-cyl, HP=40, acceleration=32, 1908)
#
# NOTE: Only Vehicle 9 receives a prediction that looks plausible for a road car.
# All ten rows lie outside the training range in at least one feature
# (model year outside 1970–1982, HP > 230, or displacement > 455), and several
# displacement values (e.g. 2170–6498) appear to be in cubic centimetres,
# whereas the training data uses cubic inches. These are extrapolations, so
# the predictions should not be trusted.

print('Predictions printed above. See comments in this cell for annotated estimates.')

Predictions printed above. See comments in this cell for annotated estimates.
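
The note in the cell above mentions metric displacement values. If the starred vehicles' displacements are indeed in cubic centimetres (an assumption; the notebook does not state the units), they would have to be converted to cubic inches before scaling to match the training data. A minimal sketch of that conversion, with a hypothetical helper name:

```python
CC_PER_CUBIC_INCH = 16.387064   # exact: 2.54 cm cubed

def cc_to_cubic_inches(cc):
    """Convert an engine displacement from cubic centimetres to cubic inches."""
    return cc / CC_PER_CUBIC_INCH

# Displacements of vehicles 1*-4*, assumed here to be in cc
for cc in [2170, 6498, 3902, 6162]:
    print(f'{cc} cc = {cc_to_cubic_inches(cc):.1f} cu in')
```

Under that assumption, all four converted values fall inside the 68–455 cu-in range of the training data. The predictions in this notebook were made without such a conversion.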


## Rationale for Model Predictions

```
# RATIONALE FOR MODEL PREDICTIONS
#
# The DNN was trained on vehicles from 1970–1982 with typical horsepower
# values of 50–230 HP and displacements of 68–455 cu-in. Within that range,
# fuel efficiency (mpg) is driven mainly by engine size (displacement,
# cylinders, horsepower) and vehicle weight, and newer model years generally
# correlate with better efficiency because engine technology improved over
# the 1970s.
#
# Vehicles 1*–4*: These have extreme specs (HP 502–986, displacement
# 2170–6498) far beyond the training distribution. Intuition says such
# engines should get very low mpg (< 15), but the model predicted
# 58.93–166.00 mpg. After standardisation these inputs lie many standard
# deviations from the training mean, where the network's output is not
# constrained by any data. The model year 2025 is also decades past 1982,
# and since the model learned that later years mean higher mpg, it probably
# pushes the estimates upward.
#
# Vehicles 5, 7 and 8: 3- and 4-cylinder cars from 2025. The model associates
# few cylinders and low weight with high efficiency, and the out-of-range
# year likely amplifies this, giving 87.71–94.92 mpg, well above the mpg
# values seen in training.
#
# Vehicle 6 (1969): The displacement of 3232 is about seven times the
# training maximum, and the prediction of 75.41 mpg is not credible.
#
# Vehicles 9 and 10 (1964 and 1908): The year feature extrapolates before
# the training data. Vehicle 9 receives 24.99 mpg, the only value that looks
# plausible; Vehicle 10 receives 3.55 mpg despite its low HP, which suggests
# that the year feature dominates its prediction.
#
# Overall, the model is accurate near the training distribution (validation
# MAE of 1.827 mpg), but its outputs for these ten vehicles should be treated
# as extrapolation artefacts rather than realistic mpg estimates.
```
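
One simple way to spot extrapolation before trusting a prediction is to compare each input against the training range. The sketch below does this for a few of the part (f) vehicles, using the ranges quoted in the rationale (50–230 HP, 68–455 cu-in, 1970–1982). The helper `out_of_range` is hypothetical, and the other features are left out because their ranges are not stated in this notebook.

```python
# Training ranges quoted in the rationale; other features are omitted
# because their ranges are not stated in this notebook
TRAIN_RANGE = {
    'displacement': (68, 455),
    'horsepower':   (50, 230),
    'year':         (1970, 1982),
}

def out_of_range(vehicle):
    """Return the names of the features that fall outside the training range."""
    return [name for name, (lo, hi) in TRAIN_RANGE.items()
            if not lo <= vehicle[name] <= hi]

# Rows copied from new_vehicles in part (f)
vehicles = {
    '1*': {'displacement': 2170, 'horsepower': 502, 'year': 2025},
    '5':  {'displacement': 122,  'horsepower': 181, 'year': 2025},
    '9':  {'displacement': 1189, 'horsepower': 60,  'year': 1964},
    '10': {'displacement': 201,  'horsepower': 40,  'year': 1908},
}
for label, v in vehicles.items():
    print(label, out_of_range(v))
```

Every vehicle checked here is flagged on at least the year feature, so none of these predictions is an interpolation within the training data.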

## AI Reference
Claude (Anthropic, `claude-sonnet-4-6`).  
Prompts used:
1. *"Fill missing horsepower values with cylinder-group averages in pandas."*
2. *"Build a Keras Sequential DNN regression model for the Auto MPG dataset."*
3. *"Predict MPG for custom vehicle data using a trained Keras model."*