# Neural Network Models for Combined Classification and Regression

Some prediction problems require predicting both numeric values and a class label for the same input.

A simple approach is to develop both regression and classification predictive models on the same data and use the models sequentially.

An alternative and often more effective approach is to develop a single neural network model that can predict both a numeric and class label value from the same input. This is called a multi-output model and can be relatively easy to develop and evaluate using modern deep learning libraries such as Keras and TensorFlow

**Single Model for Regression and Classification**

It is common to develop a deep learning neural network model for a regression or classification problem, but on some predictive modeling tasks, we may want to develop a single model that can make both regression and classification predictions.

Regression refers to predictive modeling problems that involve predicting a numeric value given an input.
Classification refers to predictive modeling problems that involve predicting a class label or probability of class labels for a given input.
There may be some problems where we want to predict both a numerical value and a classification value.

One approach to solving this problem is to develop a separate model for each prediction that is required.
The problem with this approach is that the predictions made by the separate models may diverge.
An alternate approach that can be used when using neural network models is to develop a single model capable of making separate predictions for a numeric and class output for the same input.

**This is called a multi-output neural network model.**

The benefit of this type of model is that we have a single model to develop and maintain instead of two models and that training and updating the model on both output types at the same time may offer more consistency in the predictions between the two output types.

We will develop a multi-output neural network model capable of making regression and classification predictions at the same time.

First, let’s select a dataset where this requirement makes sense and start by developing separate models for both regression and classification predictions.

**Separate Regression and Classification Models**

In this section, we will start by selecting a real dataset where we may want regression and classification predictions at the same time, then develop separate models for each type of prediction.

**Abalone Dataset**
We will use the “abalone” dataset.

Determining the age of an abalone is a time-consuming task and it is desirable to determine the age from physical details alone.

This is a dataset that describes the physical details of abalone and requires predicting the number of rings of the abalone, which is a proxy for the age of the creature.

You can learn more about the dataset from here:

- Dataset (abalone.csv)
- Dataset Details (abalone.names)
The “age” can be predicted as both a numerical value (in years) or a class label (ordinal year as a class).

No need to download the dataset as we will download it automatically as part of the worked examples.
The dataset provides an example of a dataset where we may want both a numerical and classification of an input.


In [16]:
#Load the dataset
import pandas as pd
from pandas import read_csv
from matplotlib import pyplot
import numpy as np
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/abalone.csv'
dataframe = read_csv(url, header=None)
# summarize shape
print(dataframe.shape)
# summarize first few lines
print(dataframe.head())

(4177, 9)
   0      1      2      3       4       5       6      7   8
0  M  0.455  0.365  0.095  0.5140  0.2245  0.1010  0.150  15
1  M  0.350  0.265  0.090  0.2255  0.0995  0.0485  0.070   7
2  F  0.530  0.420  0.135  0.6770  0.2565  0.1415  0.210   9
3  M  0.440  0.365  0.125  0.5160  0.2155  0.1140  0.155  10
4  I  0.330  0.255  0.080  0.2050  0.0895  0.0395  0.055   7


We can see that there are 4,177 examples (rows) that we can use to train and evaluate a model and 9 features (columns) including the target variable.

We can see that all input variables are numeric except the first, which is a string value.

To keep data preparation simple, we will drop the first column from our models and focus on modeling the numeric input values.

# Regression Model

In [17]:
dataset = dataframe.values
X, y = dataset[:, 1:-1], dataset[:, -1]
X, y = X.astype('float'), y.astype('float')
n_features = X.shape[1]

In [18]:
#https://www.askpython.com/python/built-in-methods/python-iloc-function#:~:text=Python%20iloc()%20function%20enables,a%20data%20frame%20or%20dataset.
#https://datacarpentry.org/python-ecology-lesson/03-index-slice-subset/index.html
#X = dataframe.iloc[: , 1:8]
#X.head()

In [19]:
#Use iloc and select all rows (:) against the last column (-1):
#y = dataframe.iloc[: ,-1:] 
#y.head()

In [20]:
X, y = X.astype('float'), y.astype('float')
n_features = X.shape[1]
print(X.shape[0]) #Rows
print(X.shape[1]) # Cols

4177
7


In [21]:
#Split data into train test using train_test_split
import sklearn
from sklearn.model_selection import train_test_split
X_train , X_test , y_train , y_test = train_test_split(X,y,test_size =0.33 , random_state =1)

We can then define an MLP neural network model.

The model will have two hidden layers, **the first with 20 nodes** **and the second with 10 nodes**, both using ReLU activation and “he normal” weight initialization (a good practice). The number of layers and nodes were chosen arbitrarily.

The output layer will have a single node for predicting a numeric value and a linear activation function.

In [22]:
#define model
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential 
from tensorflow.keras.layers import Activation, Dense 

model = keras.Sequential()
model.add(Dense(20 , input_dim = n_features , activation = 'relu',kernel_initializer = 'he_normal'))
model.add(Dense(10 , activation = 'relu',kernel_initializer = 'he_normal'))
model.add(Dense(1 , activation = 'relu'))

In [23]:
# compile the keras model
model.compile(loss='mse', optimizer='adam') #mean-squared-error , Adam = momentum+SGD

In [24]:
# fit the keras model on the dataset
model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=2)

Epoch 1/150
2798/2798 - 0s - loss: 72.4171
Epoch 2/150
2798/2798 - 0s - loss: 22.7966
Epoch 3/150
2798/2798 - 0s - loss: 10.6451
Epoch 4/150
2798/2798 - 0s - loss: 9.6749
Epoch 5/150
2798/2798 - 0s - loss: 8.8557
Epoch 6/150
2798/2798 - 0s - loss: 8.2203
Epoch 7/150
2798/2798 - 0s - loss: 7.7734
Epoch 8/150
2798/2798 - 0s - loss: 7.4433
Epoch 9/150
2798/2798 - 0s - loss: 7.1905
Epoch 10/150
2798/2798 - 0s - loss: 6.9354
Epoch 11/150
2798/2798 - 0s - loss: 6.7577
Epoch 12/150
2798/2798 - 0s - loss: 6.5414
Epoch 13/150
2798/2798 - 0s - loss: 6.2976
Epoch 14/150
2798/2798 - 0s - loss: 6.0872
Epoch 15/150
2798/2798 - 0s - loss: 5.8842
Epoch 16/150
2798/2798 - 0s - loss: 5.7507
Epoch 17/150
2798/2798 - 0s - loss: 5.6194
Epoch 18/150
2798/2798 - 0s - loss: 5.4922
Epoch 19/150
2798/2798 - 0s - loss: 5.4805
Epoch 20/150
2798/2798 - 0s - loss: 5.3337
Epoch 21/150
2798/2798 - 0s - loss: 5.2353
Epoch 22/150
2798/2798 - 0s - loss: 5.1823
Epoch 23/150
2798/2798 - 0s - loss: 5.1416
Epoch 24/150
2798

<tensorflow.python.keras.callbacks.History at 0x7fa315008ac8>

In [25]:
#Evaluate test set
from sklearn.metrics import mean_absolute_error
yhat = model.predict(X_test)
error = mean_absolute_error(y_test, yhat)
print('MAE: %.3f' % error)

MAE: 1.551


# Classification Model

The abalone dataset can be framed as a classification problem where each “ring” integer is taken as a separate class label.

The example and model are much the same as the above example for regression, with a few important changes.

This requires first assigning a separate integer for each “ring” value, starting at 0 and ending at the total number of “classes” minus one.

This can be achieved using the LabelEncoder.

We can also record the total number of classes as the total number of unique encoded class values, which will be needed by the model later

In [31]:
#encode strings to integer
from sklearn import preprocessing
y = preprocessing.LabelEncoder().fit_transform(y)
n_class = len(pd.unique(y))

After splitting the data into train and test sets as before, we can define the model and change the number of outputs from the model to equal the number of classes and use the softmax activation function, common for multi-class classification.

In [62]:
model = Sequential()
model.add(Dense(20, input_dim=n_features, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(10, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(n_class, activation='softmax'))

Given we have encoded class labels as integer values, we can fit the model by minimizing the sparse categorical cross-entropy loss function, appropriate for multi-class classification tasks with integer encoded class labels

In [63]:
# compile the keras model
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

After the model is fit on the training dataset as before, we can evaluate the performance of the model by calculating the classification accuracy on the hold-out test set.

In [64]:
# evaluate on test set
from sklearn.metrics import accuracy_score
yhat = model.predict(X_test)
yhat = np.argmax(yhat, axis=-1).astype('int')
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)

Accuracy: 0.057


# Combined Regression and Classification Models

In this section, we can develop a single MLP neural network model that can make both regression and classification predictions for a single input.

This is called a multi-output model and can be developed using the functional Keras API.

In [65]:
# mlp for combined regression and classification predictions on the abalone dataset
from numpy import unique
from numpy import argmax
from pandas import read_csv
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import plot_model

# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/abalone.csv'
dataframe = read_csv(url, header=None)
dataset = dataframe.values

# split into input (X) and output (y) variables
X, y = dataset[:, 1:-1], dataset[:, -1]
X, y = X.astype('float'), y.astype('float')
n_features = X.shape[1]

# encode strings to integer
y_class = LabelEncoder().fit_transform(y)
n_class = len(unique(y_class))
# split data into train and test sets
X_train, X_test, y_train, y_test, y_train_class, y_test_class = train_test_split(X, y, y_class, test_size=0.33, random_state=1)
# input
visible = Input(shape=(n_features,))
hidden1 = Dense(20, activation='relu', kernel_initializer='he_normal')(visible)
hidden2 = Dense(10, activation='relu', kernel_initializer='he_normal')(hidden1)
# regression output
out_reg = Dense(1, activation='linear')(hidden2)
# classification output
out_clas = Dense(n_class, activation='softmax')(hidden2)
# define model
model = Model(inputs=visible, outputs=[out_reg, out_clas])
# compile the keras model
model.compile(loss=['mse','sparse_categorical_crossentropy'], optimizer='adam')
# plot graph of model
plot_model(model, to_file='model.png', show_shapes=True)
# fit the keras model on the dataset
model.fit(X_train, [y_train,y_train_class], epochs=150, batch_size=32, verbose=2)
# make predictions on test set
yhat1, yhat2 = model.predict(X_test)
# calculate error for regression model
error = mean_absolute_error(y_test, yhat1)
print('MAE: %.3f' % error)
# evaluate accuracy for classification model
yhat2 = argmax(yhat2, axis=-1).astype('int')
acc = accuracy_score(y_test_class, yhat2)
print('Accuracy: %.3f' % acc)

Failed to import pydot. You must install pydot and graphviz for `pydotprint` to work.
Epoch 1/150
2798/2798 - 0s - loss: 86.6990 - dense_55_loss: 83.1833 - dense_56_loss: 3.2698
Epoch 2/150
2798/2798 - 0s - loss: 36.5393 - dense_55_loss: 33.4620 - dense_56_loss: 2.9229
Epoch 3/150
2798/2798 - 0s - loss: 13.3321 - dense_55_loss: 10.6753 - dense_56_loss: 2.6392
Epoch 4/150
2798/2798 - 0s - loss: 10.8706 - dense_55_loss: 8.3228 - dense_56_loss: 2.5527
Epoch 5/150
2798/2798 - 0s - loss: 10.5514 - dense_55_loss: 8.0116 - dense_56_loss: 2.5191
Epoch 6/150
2798/2798 - 0s - loss: 10.2564 - dense_55_loss: 7.7441 - dense_56_loss: 2.5023
Epoch 7/150
2798/2798 - 0s - loss: 9.9802 - dense_55_loss: 7.4908 - dense_56_loss: 2.4914
Epoch 8/150
2798/2798 - 0s - loss: 9.7149 - dense_55_loss: 7.2150 - dense_56_loss: 2.4769
Epoch 9/150
2798/2798 - 0s - loss: 9.4734 - dense_55_loss: 6.9737 - dense_56_loss: 2.4658
Epoch 10/150
2798/2798 - 0s - loss: 9.2510 - dense_55_loss: 6.8008 - dense_56_loss: 2.4577
Epoc

---
https://machinelearningmastery.com/deep-learning-models-for-multi-output-regression/

END