# Part 1: Basic regression: Predict fuel efficiency

In a *regression* problem, we aim to predict the output of a continuous value, like a price or a probability. Contrast this with a *classification* problem, where we aim to select a class from a list of classes (for example, where a picture contains an apple or an orange, recognizing which fruit is in the picture).

This notebook uses the classic [Auto MPG](https://archive.ics.uci.edu/ml/datasets/auto+mpg) Dataset and builds a model to predict the fuel efficiency of late 1970s and early 1980s automobiles. To do this, we'll provide the model with a description of many automobiles from that time period. This description includes attributes such as cylinders, displacement, horsepower, and weight.

This example uses the `tf.keras` API; see [this guide](https://www.tensorflow.org/guide/keras) for details.

In [1]:
# Use seaborn for pairplot
!pip install -q seaborn 

In [None]:
# Use some functions from tensorflow_docs
!pip install git+https://github.com/tensorflow/docs

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals

import pathlib

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


In [None]:
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
import tensorflow_docs as tfdocs
import tensorflow_docs.plots
import tensorflow_docs.modeling

print(tf.__version__)  # Confirm that this is version 2.0

## The Auto MPG dataset

The dataset is available from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/).



### Get the data
First download the dataset.

In [None]:
dataset_path = keras.utils.get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
dataset_path

Import it using pandas.

In [None]:
# Assign column names (the raw file has no header row)
column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
                'Acceleration', 'Model Year', 'Origin']
dataset = pd.read_csv(dataset_path, names=column_names,
                      na_values = "?", comment='\t',
                      sep=" ", skipinitialspace=True)

dataset.head()

### Clean the data

The dataset contains a few unknown values.

In [None]:
dataset.isna().sum()

To keep this initial tutorial simple, drop those rows.

In [None]:
dataset = dataset.dropna()  # Drop rows that contain NA in any column

The `"Origin"` column is really categorical, not numeric. So convert that to a one-hot:

In [None]:
# The Origin column holds numeric codes; map each known code to its region name (USA, Europe, Japan)
dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})

In [None]:
dataset.head()

In [None]:
# One-hot encode the categorical column
dataset = pd.get_dummies(dataset, prefix='', prefix_sep='')
dataset.head()
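
To make the one-hot step concrete, here is a minimal NumPy sketch of what the encoding does conceptually for the `Origin` codes. The sample codes and the helper name `one_hot` are illustrative assumptions, not part of the notebook, and the column order here (USA, Europe, Japan) follows the codes rather than the alphabetical order that `pd.get_dummies` produces.

```python
import numpy as np

def one_hot(codes, categories):
    """Return a (len(codes), len(categories)) matrix of 0/1 values (hypothetical helper)."""
    codes = np.asarray(codes)
    categories = np.asarray(categories)
    # Broadcasting compares every code against every category.
    return (codes[:, None] == categories[None, :]).astype(np.float64)

# Assumed sample of raw Origin codes: 1 = USA, 2 = Europe, 3 = Japan.
origin_codes = [1, 3, 2, 1]
encoded = one_hot(origin_codes, categories=[1, 2, 3])
print(encoded)
```

Each row contains exactly one 1, so the model never sees the codes as ordered quantities.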

### Split the data into train and test

Now split the dataset into a training set and a test set.

We will use the test set in the final evaluation of our model.

In [None]:
train = dataset.sample(frac=0.8,random_state=0)
test = dataset.drop(train.index)
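
The same 80/20 idea can be sketched with plain index arrays. This is only an illustration: the helper `split_indices` is hypothetical, the row count 392 is taken from the shapes printed later in this notebook, and NumPy's generator will not pick the same rows as `dataset.sample`.

```python
import numpy as np

def split_indices(n_rows, frac=0.8, seed=0):
    """Shuffle row indices and cut them into train/test parts (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    n_train = int(round(frac * n_rows))
    return shuffled[:n_train], shuffled[n_train:]

train_idx, test_idx = split_indices(392)
print(len(train_idx), len(test_idx))  # 314 78
```

The two index sets are disjoint, which is what `dataset.drop(train.index)` guarantees in the cell above.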

### Split features from labels

Separate the target value, or "label", from the features. This label is the value that you will train the model to predict.

In [None]:
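# Cast to float so the one-hot columns do not force an object-dtype array
train = np.array(train, dtype=np.float64)
test = np.array(test, dtype=np.float64)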

In [None]:
train_x = train[:,1:]
train_y = train[:,0:1]
test_x = test[:,1:]
test_y = test[:,0:1]
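
A small sketch on a toy array (an assumption, not the real data) shows why the label slice is written as `0:1` rather than `0`: the range slice keeps the labels as a 2-D column, which matches the `(n, 1)` output of a single-unit output layer.

```python
import numpy as np

# Toy table with 3 rows and 4 columns; column 0 plays the role of MPG.
table = np.arange(12, dtype=np.float64).reshape(3, 4)

labels_2d = table[:, 0:1]   # range slice keeps the column axis
labels_1d = table[:, 0]     # integer index drops the column axis
features = table[:, 1:]     # everything except the label column

print(labels_2d.shape, labels_1d.shape, features.shape)  # (3, 1) (3,) (3, 3)
```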

### Check the number of training and test samples

In [None]:
print("train_x shape: " + str(train_x.shape)) # => (314, 9)
print("train_y shape: " + str(train_y.shape)) # => (314, 1)
print("test_x shape: " + str(test_x.shape)) # => (78, 9)
print("test_y shape: " + str(test_y.shape)) # => (78, 1)

### Normalize the data


In [None]:
def compute_mean_std(X):
    """
    X: training data
    mu: mean of each feature
    sigma: standard deviation of each feature
    """
    # Hint: when using np.mean(), pass `keepdims=True`.
    mu = np.mean(X, keepdims=True, axis=0)
    sigma = np.std(X, keepdims=True, axis=0)

    return mu, sigma

# Standardize using the mean and standard deviation
def normalize_feat(X, mu, sigma):

    normalized_X = (X - mu) / sigma
    
    return normalized_X



mu, sigma = compute_mean_std(train_x)

# Standardize both the training data and the test data (using the training statistics)

train_x = normalize_feat(train_x, mu, sigma)
test_x = normalize_feat(test_x, mu, sigma)
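
The effect of this standardization can be checked on synthetic data. The sketch below uses made-up features on very different scales (an assumption for illustration) and separate `toy_` names so that it does not overwrite the notebook's variables.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed toy data: two features with very different means and spreads.
toy_train = rng.normal(loc=[100.0, 3000.0], scale=[30.0, 800.0], size=(200, 2))
toy_test = rng.normal(loc=[100.0, 3000.0], scale=[30.0, 800.0], size=(50, 2))

# Statistics come from the training split only.
toy_mu = toy_train.mean(axis=0, keepdims=True)
toy_sigma = toy_train.std(axis=0, keepdims=True)

toy_train_norm = (toy_train - toy_mu) / toy_sigma
toy_test_norm = (toy_test - toy_mu) / toy_sigma

print(toy_train_norm.mean(axis=0))  # close to 0 for each feature
print(toy_train_norm.std(axis=0))   # close to 1 for each feature
print(toy_test_norm.mean(axis=0))   # near 0, but not exactly
```

The test split is only approximately centered because it is scaled with the training statistics, which is the intended behavior: the test data must not influence preprocessing.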

## The model

### Build the model

Let's build our model. Here, we'll use a `Sequential` model with two densely connected hidden layers, and an output layer that returns a single, continuous value. The model building steps are wrapped in a function, `build_model`, since we'll create a second model, later on.

Unlike a classification problem, this regression problem uses mean squared error (MSE) as the loss function.
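
As a quick reference, the sketch below computes MSE, and the mean absolute error (MAE) that is tracked as a metric, by hand with NumPy. The three MPG values are invented for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean of the absolute differences."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Assumed toy labels and predictions, in MPG.
y_true = np.array([20.0, 30.0, 25.0])
y_pred = np.array([22.0, 27.0, 25.0])

print(mse(y_true, y_pred))  # (4 + 9 + 0) / 3
print(mae(y_true, y_pred))  # (2 + 3 + 0) / 3
```

MSE penalizes large errors more heavily because of the square, while MAE stays in the same unit as the label.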

In [None]:
def build_model(input_dim,learning_rate):
    
    
    model = keras.Sequential([
        
        
        ### FILL IN THE BLANK ###
        ### hint: use relu as activation function in the hidden layer
        layers.Dense(64, activation=__, input_shape=(input_dim,)),  # input shape: (*, input_dim); output shape: (*, 64)
        
        layers.Dense(64, activation=__),
        
        ### START CODE HERE ###
        ### hint: output layer
        
        ### END CODE HERE ###        

  ])
    optimizer = tf.keras.optimizers.RMSprop(learning_rate=learning_rate)
    
    model.compile(loss='mse',
                optimizer=optimizer,
                metrics=['mae', 'mse'])
    return model

In [None]:
### START CODE HERE ###
### hint: feature number of training set
dim =
### END CODE HERE ###

learning_rate=0.001

model = build_model(input_dim=dim,learning_rate=learning_rate)

### Inspect the model

Use the `.summary` method to print a simple description of the model.

In [None]:
model.summary()
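
The parameter counts in the summary can be reproduced by hand. The sketch below assumes 9 input features (the width of `train_x` printed earlier) and a single-unit output layer, which is one possible way to complete the exercise; `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Assumed layer widths: input, two hidden layers, one output unit.
layer_sizes = [9, 64, 64, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [640, 4160, 65]
print(sum(counts))  # 4865
```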

### Train the model

Train the model for 1000 epochs, and record the training and validation loss and metrics in the `history` object.

In [None]:
EPOCHS = 1000

history = model.fit(
  train_x, train_y,
  epochs=EPOCHS, verbose=0,validation_split = 0.2,
  callbacks=[tfdocs.modeling.EpochDots()],)
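
With `validation_split=0.2`, Keras holds out the last 20% of the arrays passed to `fit` (taken before shuffling) and trains on the rest. The sketch below estimates the resulting sizes for the 314 training rows; the helper name and the floor-based rounding are assumptions meant to mirror that documented behavior, not a copy of the Keras internals.

```python
import math

def validation_split_sizes(n_samples, validation_split):
    """Return (rows used for fitting, rows held out for validation)."""
    split_at = int(math.floor(n_samples * (1.0 - validation_split)))
    return split_at, n_samples - split_at

n_fit, n_val = validation_split_sizes(314, 0.2)
print(n_fit, n_val)  # 251 63
```

So each epoch updates the weights on roughly 251 rows, and the `val_` metrics are computed on the remaining 63.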

Visualize the model's training progress using the stats stored in the `history` object.

In [None]:
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()

In [None]:
plotter = tfdocs.plots.HistoryPlotter(smoothing_std=2)

In [None]:
plotter.plot({'Basic': history}, metric = "mae")
plt.ylim([0, 10])
plt.ylabel('MAE [MPG]')

In [None]:
plotter.plot({'Basic': history}, metric = "mse")
plt.ylim([0, 20])
plt.ylabel('MSE [MPG^2]')

This graph shows little improvement, or even degradation in the validation error after about 100 epochs. 

Let's update the `model.fit` call to automatically stop training when the validation score doesn't improve. We'll use an **EarlyStopping** callback that tests a training condition for every epoch. If a set number of epochs elapses without showing improvement, then training stops automatically.
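
The patience rule can be sketched in plain Python. This is a simplified illustration under stated assumptions (any strict decrease counts as improvement, no minimum delta), not the Keras implementation; `early_stop_epoch` and the loss values are hypothetical.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Assumed validation losses: the best value is reached at epoch 2.
val_losses = [10.0, 8.0, 7.0, 7.5, 7.2, 7.1, 7.3]
stopped = early_stop_epoch(val_losses, patience=3)
print(stopped)  # 5
```

With `patience=3`, training continues for three non-improving epochs after the best one and then halts.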

In [None]:
model = build_model(dim,learning_rate)

# The patience parameter is the number of epochs to wait for an improvement
early_stop = keras.callbacks._____(monitor='val_loss', patience=10)

# Use the same `model.fit` arguments as above, and add early_stop to the callbacks
early_history = model.fit(___)

In [None]:
plotter.plot({'Early Stopping': early_history}, metric = "mae")
plt.ylim([0, 10])
plt.ylabel('MAE [MPG]')

Let's see how well the model generalizes by using the **test** set, which we did not use when training the model.  This tells us how well we can expect the model to predict when we use it in the real world.

In [None]:
loss, mae, mse = model.evaluate(test_x, test_y, verbose=2)

print("Testing set Mean Abs Error: {:5.2f} MPG".format(mae))

### Your turn

Try a different learning rate or optimizer (e.g. stochastic gradient descent (SGD), SGD with momentum, or Adam).

You can learn more about how to use optimizers in Keras [here](https://keras.io/zh/optimizers/).

Build the model, and plot the loss on the training and validation sets. Does it perform better? Does it perform as well on the test set?
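
Before experimenting, it may help to see the update rules on a toy problem. The sketch below minimizes the assumed objective f(w) = w² with hand-written SGD and momentum updates; it is plain Python for illustration and does not use the Keras optimizers.

```python
def grad(w):
    """Gradient of the toy objective f(w) = w**2."""
    return 2.0 * w

def run_sgd(w, lr, steps):
    """Plain gradient descent: step against the gradient."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def run_momentum(w, lr, steps, beta=0.9):
    """Gradient descent with a velocity term that accumulates past gradients."""
    v = 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(w)
        w += v
    return w

print(run_sgd(1.0, lr=0.1, steps=50))        # shrinks toward 0
print(run_momentum(1.0, lr=0.1, steps=200))  # oscillates, then settles near 0
print(run_sgd(1.0, lr=1.1, steps=20))        # learning rate too large: diverges
```

The last call shows why the learning rate matters: with a step that is too large, every update overshoots the minimum and the error grows.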

In [None]:
def build_model2(input_dim,learning_rate):
    
    
    model = keras.Sequential([
        
        
        ### FILL IN THE BLANK ###
        ### hint: use relu as activation function in the hidden layer
        layers.Dense(64, activation="relu", input_shape=(input_dim,)),  # input shape: (*, input_dim); output shape: (*, 64)
        
        layers.Dense(64, activation="relu"),
        
        ### START CODE HERE ###
        ### hint: output layer
        layers.Dense(1)
        ### END CODE HERE ###        

  ])
    ### START CODE HERE ###
    ### TRY OTHER OPTIMIZER HERE
    optimizer = 
    ### END CODE HERE ###
    
    
    model.compile(loss='mse',
                optimizer=optimizer,
                metrics=['mae', 'mse'])
    return model

In [None]:
### FILL IN THE BLANK ###
### try different learning rate
learning_rate =
model2 = build_model2(dim,learning_rate)

In [None]:
EPOCHS = 1000

history = model2.fit(
  train_x, train_y,
  epochs=EPOCHS, verbose=0,validation_split = 0.2,
  callbacks=[tfdocs.modeling.EpochDots()],)

In [None]:
plotter = tfdocs.plots.HistoryPlotter(smoothing_std=2)
plotter.plot({'Basic': history}, metric = "mae")
plt.ylim([0, 10])
plt.ylabel('MAE [MPG]')

In [None]:
plotter.plot({'Basic': history}, metric = "mse")
plt.ylim([0, 20])
plt.ylabel('MSE [MPG^2]')

In [None]:
model2 = build_model2(dim,learning_rate)

# The patience parameter is the amount of epochs to check for improvement
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

early_history = model2.fit(train_x, train_y, 
                    epochs=EPOCHS, validation_split = 0.2, verbose=0, 
                    callbacks=[early_stop, tfdocs.modeling.EpochDots()])

In [None]:
plotter.plot({'Early Stopping': early_history}, metric = "mae")
plt.ylim([0, 10])
plt.ylabel('MAE [MPG]')

In [None]:
loss, mae, mse = model2.evaluate(test_x, test_y, verbose=2)

print("Testing set Mean Abs Error: {:5.2f} MPG".format(mae))

### Make predictions

Finally, predict MPG values using data in the testing set:

In [None]:
test_predictions = model.predict(test_x).flatten()

a = plt.axes(aspect='equal')
plt.scatter(test_y, test_predictions)
plt.xlabel('True Values [MPG]')
plt.ylabel('Predictions [MPG]')
lims = [0, 50]
plt.xlim(lims)
plt.ylim(lims)
plt.plot(lims, lims)


It looks like our model predicts reasonably well. Let's take a look at the error distribution.

In [None]:
error = test_predictions - test_y.flatten()
plt.hist(error, bins = 25)
plt.xlabel("Prediction Error [MPG]")
plt.ylabel("Count")

It's not quite Gaussian, but we might expect that because the number of samples is very small.

## Conclusion

This notebook introduced a few techniques to handle a regression problem.

* Mean Squared Error (MSE) is a common loss function used for regression problems (different loss functions are used for classification problems).
* Similarly, evaluation metrics used for regression differ from classification. A common regression metric is Mean Absolute Error (MAE).
* When numeric input data features have values with different ranges, each feature should be scaled independently to the same range.
* If there is not much training data, one technique is to prefer a small network with few hidden layers to avoid overfitting.