# Tutorial A: Simple Regression with DNN

Adapted from: https://www.tensorflow.org/tutorials/keras/regression

This tutorial guides you through a simple *regression* problem, solved with a **deep neural network**.

The dataset is the Auto MPG dataset. Given several features of a vehicle, we want to predict its fuel efficiency in miles per gallon (MPG).

## 1. Prepare Libraries

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Make NumPy printouts easier to read.
np.set_printoptions(precision=3, suppress=True)

In [None]:
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.utils import plot_model

print(tf.__version__)

We can also check which GPU resources are available to this kernel. The `nvidia-smi` command works only with NVIDIA GPUs.

In [None]:
!nvidia-smi

The GPU installed in the DGX-A100 system (NVIDIA A100) supports **Multi-Instance GPU (MIG)**, which lets one GPU be partitioned into several separate instances. The resources shown in the second table are the GPU instance assigned to this specific Pod.

More information on NVIDIA's MIG: https://blogs.nvidia.com/blog/2020/05/14/multi-instance-gpus/
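If you prefer to check the devices from Python, here is a minimal sketch. The helper `list_nvidia_devices` is hypothetical, not part of any library. It calls `nvidia-smi -L`, which prints one line per GPU, and on MIG-enabled systems the listing typically includes the MIG devices as well. When `nvidia-smi` is missing or fails, the helper returns an empty list, so it is safe to run on machines without an NVIDIA driver.

```python
import shutil
import subprocess


def list_nvidia_devices():
    """Return the non-empty lines printed by `nvidia-smi -L`, or [] if unavailable."""
    # nvidia-smi ships with the NVIDIA driver; skip gracefully elsewhere.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return []
    # One line per GPU (and per MIG device, where present).
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


for device in list_nvidia_devices():
    print(device)
```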

## 2. Retrieve The Dataset

For this task, we will use the Auto MPG dataset from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/).

The dataset consists of 8 columns, which are:
- **MPG (Miles per gallon)**: Measures the fuel efficiency of a vehicle. This is the value we want to predict.
- **Cylinders**: The number of cylinders in the vehicle's engine.
- **Displacement**: The engine size (total cylinder volume), in cubic inches.
- **Horsepower**: The engine's power output.
- **Weight**: The vehicle's weight, in pounds.
- **Acceleration**: The time, in seconds, the vehicle needs to accelerate from 0 to 60 mph.
- **Model Year**: The vehicle's model year, stored as two digits (e.g. 70 for 1970).
- **Origin**: The region where the vehicle was manufactured (1 = USA, 2 = Europe, 3 = Japan).

In [None]:
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

# '?' marks missing values; everything after a tab (the car name) is ignored.
raw_dataset = pd.read_csv(url, names=column_names,
                          na_values='?', comment='\t',
                          sep=' ', skipinitialspace=True)
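To see how these parsing options behave without downloading the file, the sketch below feeds two made-up rows, written to mimic the whitespace-padded layout of `auto-mpg.data`, through the same `read_csv` call. The rows are illustrative, not taken from the dataset. Because `skipinitialspace=True` collapses the runs of spaces and `comment='\t'` cuts off the quoted car name, each row yields exactly eight columns, and the `?` becomes `NaN`.

```python
import io

import pandas as pd

column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

# Illustrative rows: space-padded numeric fields, '?' for a missing value,
# then a tab followed by the quoted car name.
sample = (
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"car a"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"car b"\n'
)

parsed = pd.read_csv(io.StringIO(sample), names=column_names,
                     na_values='?', comment='\t',
                     sep=' ', skipinitialspace=True)
print(parsed)
```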

In [None]:
dataset = raw_dataset.copy()
dataset.tail()

### Cleaning The Data

Most of the data is already clean, except for the Horsepower column, which contains a few missing values. There are several ways to handle missing data. Here we simply drop the rows for vehicles without horsepower information.

In [None]:
dataset.isna().sum()

In [None]:
dataset = dataset.dropna()
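To make the choice concrete, here is a toy sketch (made-up numbers) comparing the approach used here, dropping incomplete rows, with one common alternative that this tutorial does not use: filling the gap with the column median.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing Horsepower value (made-up numbers).
toy = pd.DataFrame({
    'MPG': [18.0, 25.0, 30.0, 22.0],
    'Horsepower': [130.0, np.nan, 70.0, 95.0],
})

# Approach used in this tutorial: drop incomplete rows.
dropped = toy.dropna()

# Alternative (not used here): fill the gap with the column median.
imputed = toy.fillna({'Horsepower': toy['Horsepower'].median()})

print(len(toy), len(dropped), len(imputed))
print(imputed['Horsepower'].tolist())
```

Dropping loses a row but adds no guessed values. If you impute instead, compute the fill value from the training split only, so that no information from the test set leaks into training.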

### Do One-Hot Encoding for Column "Origin"

The Origin column is categorical: its numeric codes 1, 2, and 3 stand for regions, not quantities. We therefore one-hot encode it, replacing it with one 0/1 column per region.

In [None]:
dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})
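# dtype=float keeps the dummy columns numeric; recent pandas versions default
# to bool, which can turn np.array(train_features) into an object array.
dataset = pd.get_dummies(dataset, columns=['Origin'], prefix='', prefix_sep='',
                         dtype=float)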
dataset.tail()
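The same two steps applied to a three-row toy frame (made-up values) show what the encoding produces. The column names come straight from the region labels because `prefix=''` and `prefix_sep=''`, and each row has exactly one `1.0` among the new columns. `dtype=float` is passed so the dummies are numeric rather than bool.

```python
import pandas as pd

toy = pd.DataFrame({'MPG': [18.0, 26.0, 31.0], 'Origin': [1, 2, 3]})

# Replace numeric codes with region names, then one-hot encode them.
toy['Origin'] = toy['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})
encoded = pd.get_dummies(toy, columns=['Origin'], prefix='', prefix_sep='',
                         dtype=float)
print(encoded)
```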

### Split The Data

In [None]:
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)
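The split above draws a random 80% of the rows for training and keeps the remaining rows for testing. A quick check on a toy frame confirms that the two parts are disjoint and together cover every row. This relies on the index labels being unique, which holds here because `dropna` keeps the original row labels.

```python
import pandas as pd

toy = pd.DataFrame({'x': range(10)})

train = toy.sample(frac=0.8, random_state=0)  # random 80% of the rows
test = toy.drop(train.index)                  # every row not sampled

# The two splits share no rows and together cover all of them.
print(len(train), len(test))
print(train.index.intersection(test.index).empty)
```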

## 3. Exploratory Data Analysis

In [None]:
sns.pairplot(train_dataset[['MPG', 'Cylinders', 'Displacement', 'Weight']], diag_kind='kde')

In [None]:
train_dataset.describe()

### Separate Features ($X$) and Labels ($y$)


In [None]:
train_features = train_dataset.copy()
test_features = test_dataset.copy()

train_labels = train_features.pop('MPG')
test_labels = test_features.pop('MPG')

## 4. Simple Regression with Deep Neural Network

Now we will begin the modeling process using a deep neural network.

### Add Normalization

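Before going any further, it is recommended to normalize every feature column so that all features share a comparable scale (zero mean and unit variance). Otherwise, features such as Weight (thousands of pounds) and Cylinders (single digits) differ by orders of magnitude.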

In [None]:
normalizer = tf.keras.layers.Normalization(axis=-1)
# Learn the per-feature mean and variance from the training features only.
normalizer.adapt(np.array(train_features))
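Conceptually, `adapt` stores a mean and a variance for each column, and the layer then maps each input to `(x - mean) / sqrt(variance)`. The NumPy sketch below reproduces that standardization on a made-up matrix. It assumes population variance (`ddof=0`) and a small floor on the standard deviation, which is how we understand the Keras layer to behave (Keras's default epsilon is `1e-7`). It is an illustration, not the layer's actual implementation. Note that `describe()` reports the sample standard deviation (`ddof=1`), so its `std` column differs slightly.

```python
import numpy as np

# Made-up feature matrix: rows are vehicles, columns are features on very
# different scales (cylinders vs. weight).
X = np.array([[8.0, 3504.0],
              [4.0, 2046.0],
              [6.0, 2800.0],
              [4.0, 2130.0]])

mean = X.mean(axis=0)
std = X.std(axis=0)                          # population std (ddof=0)
X_norm = (X - mean) / np.maximum(std, 1e-7)  # floor guards against zero std

print(X_norm.mean(axis=0))
print(X_norm.std(axis=0))
```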

The model starts with the normalization layer, followed by three hidden dense layers of 64 ReLU units each and a single-unit linear output layer.

In [None]:
def build_and_compile_model(norm):
  model = keras.Sequential([
      norm,
      layers.Dense(64, activation='relu'),
      layers.Dense(64, activation='relu'),
      layers.Dense(64, activation='relu'),
      layers.Dense(1)  # output layer
  ])

  model.compile(loss='mean_absolute_error',
                optimizer=tf.keras.optimizers.Adam(0.001))
  return model
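With `loss='mean_absolute_error'`, training minimizes the average absolute difference between the true and predicted labels. Because this value is expressed in MPG, the loss curves plotted later can be read directly as "miles per gallon of error". A plain NumPy version of the formula, using made-up numbers:

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in the same units as the labels (MPG)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))


# Errors are 2, 1 and 3 MPG, so the mean absolute error is 2 MPG.
print(mean_absolute_error([18.0, 25.0, 30.0], [20.0, 24.0, 27.0]))
```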

In [None]:
dnn_model = build_and_compile_model(normalizer)
dnn_model.summary()
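To sanity-check the parameter counts reported by `summary()`, the arithmetic can be reproduced by hand. A dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch assumes 9 input features: the 6 numeric columns left after popping MPG plus the 3 one-hot Origin columns. It also assumes that the adapted Normalization layer stores a mean and a variance per feature plus one sample count as non-trainable weights.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


n_features = 9  # assumption: 6 numeric columns + 3 one-hot Origin columns
layer_sizes = [n_features, 64, 64, 64, 1]

# 9*64+64 = 640, 64*64+64 = 4160 (twice), 64*1+1 = 65
trainable = sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

# Assumption: Normalization keeps mean and variance per feature plus a count.
non_trainable = 2 * n_features + 1

print(trainable, non_trainable, trainable + non_trainable)
```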

In [None]:
# plot_model needs the pydot package and the Graphviz binaries installed.
plot_model(dnn_model)

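We will train for 100 epochs, holding out 20% of the training data for validation (`validation_split=0.2`). Feel free to experiment with other hyperparameter values, such as the number of epochs, the layer sizes, or the learning rate.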

In [None]:
%%time
history = dnn_model.fit(
    train_features,
    train_labels,
    validation_split=0.2,
    verbose=0, epochs=100)
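According to the Keras documentation, `validation_split` takes the validation data from the *last* samples of the provided data, before any shuffling. Because `train_dataset` was drawn with a random `sample`, those last rows are still a random subset here. The helper below is hypothetical and only mimics that selection. The exact rounding at the split point is an implementation detail that we assume to be a floor.

```python
import math

import numpy as np


def tail_validation_split(x, validation_split):
    """Hypothetical helper: keep the first part for training and use the
    last `validation_split` fraction of rows for validation."""
    split_at = int(math.floor(len(x) * (1.0 - validation_split)))
    return x[:split_at], x[split_at:]


rows = np.arange(10)
train_part, val_part = tail_validation_split(rows, 0.2)
print(train_part, val_part)
```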

In [None]:
def plot_loss(history):
  plt.plot(history.history['loss'], label='loss')
  plt.plot(history.history['val_loss'], label='val_loss')
  plt.ylim([0, 10])
  plt.xlabel('Epoch')
  plt.ylabel('Error [MPG]')
  plt.legend()
  plt.grid(True)

plot_loss(history)

## Additional: View GPU Usage

In [None]:
# Current and peak memory (in bytes) that TensorFlow has used on the first GPU.
if tf.config.list_physical_devices('GPU'):
  print(tf.config.experimental.get_memory_info('GPU:0'))
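`tf.config.experimental.get_memory_info` returns a dict with `'current'` and `'peak'` keys, both in bytes. A small formatting helper (hypothetical, not part of TensorFlow) makes the numbers easier to read. It is shown here with made-up byte counts so that it runs without a GPU.

```python
def format_memory_info(info):
    """Turn a {'current': bytes, 'peak': bytes} dict into a MiB summary."""
    mib = 1024 ** 2
    return (f"current: {info['current'] / mib:.1f} MiB, "
            f"peak: {info['peak'] / mib:.1f} MiB")


# Made-up byte counts; on a GPU machine, pass
# tf.config.experimental.get_memory_info('GPU:0') instead.
print(format_memory_info({'current': 52428800, 'peak': 104857600}))
```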