# DNN multiclass classifier
Let’s say we have a set of body measurements (height and weight, for instance) along with the gender associated with each set, and we want to predict whether someone is a baby, toddler, preteen, teenager, or adult. Here the model must choose among more than two classes, or labels: in this example, five age categories. To do this, we can use another form of DNN, called a multiclass classifier.

We can already see some complications. For example, adult men are on average taller than adult women, but during the preteen years girls tend to be taller than boys. We also know that men on average gain weight in early adulthood relative to their teenage years, while women on average are less likely to. So we should anticipate problems in predicting around the preteen years for girls, the teenage years for boys, and the adult years for women.

These problems are examples of nonlinearity; the relationship between a feature and a prediction is not linear. Instead, the relationship can be broken into segments of disjointed linearity. This is the type of problem neural networks are good at.
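To make "segments of linearity" concrete, here is a minimal sketch of how a couple of ReLU units can produce a piecewise-linear curve. The function `tiny_network` and its weights are hypothetical, chosen by hand for illustration rather than learned from data.

```python
def relu(x):
    """Rectified linear unit: passes positives through, clamps negatives to zero."""
    return max(0.0, x)


def tiny_network(x):
    """A hypothetical one-input network with two hidden ReLU units.

    The weights are made up for illustration; they are not learned.
    """
    h1 = relu(1.0 * x - 1.0)    # switches on once x > 1
    h2 = relu(1.0 * x - 3.0)    # switches on once x > 3
    return 2.0 * h1 - 3.0 * h2  # linear readout of the hidden units


# The slope is 0 for x < 1, 2 for 1 < x < 3, and -1 for x > 3.
outputs = [tiny_network(x) for x in [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]]
print(outputs)
```

Each hidden unit contributes one "kink", so the output is made of straight segments joined at x = 1 and x = 3. A trained network places such kinks wherever the data calls for them.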

<img src="img_5.png">

The following code shows an example of constructing a multiclass classifier DNN. We start by setting up our input and output layers with the multiple features and multiple classes, respectively. Then we change the activation function on the output layer from sigmoid to softmax, so that the outputs form a probability distribution over the classes. Next we set our loss function to categorical_crossentropy, which is the usual choice for multiclass classification when the labels are one-hot encoded.
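As a brief aside before the Keras model, the sketch below works through softmax and categorical cross-entropy by hand in plain Python. The raw scores in `logits` are invented for illustration, and the helper functions are our own simplified versions, not the Keras implementations.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probabilities):
    """Negative log-probability that the model assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities))


# Hypothetical raw scores for the five age classes:
# baby, toddler, preteen, teenager, adult
logits = [0.5, 1.0, 3.0, 2.0, 0.1]
probs = softmax(logits)

# One-hot target: the true class is "preteen" (index 2).
target = [0, 0, 1, 0, 0]
loss = categorical_crossentropy(target, probs)

print([round(p, 3) for p in probs])
print(round(loss, 3))
```

The loss shrinks toward zero as the probability assigned to the true class approaches 1. Note that categorical_crossentropy in Keras expects one-hot targets like `target` above; for integer class labels, Keras provides sparse_categorical_crossentropy instead.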


Finally, we will use a popular variant of gradient descent called the Adam optimizer (adam). Adam combines ideas from other methods, such as RMSprop (root mean square propagation) and AdaGrad (adaptive gradient), and adapts the learning rate for each parameter. It is widely used as a default optimizer for a wide variety of neural networks:

In [1]:
from keras.layers import Dense, Input
from keras import Model, Sequential

## Sequential API

In [2]:
model = Sequential([
    Dense(10, activation="relu", input_shape=(13,)),  # hidden layer 1; expects 13 input features
    Dense(10, activation="relu"),                      # hidden layer 2
    Dense(5, activation="softmax")                     # output layer; one probability per class
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 10)                140       
                                                                 
 dense_1 (Dense)             (None, 10)                110       
                                                                 
 dense_2 (Dense)             (None, 5)                 55        
                                                                 
Total params: 305
Trainable params: 305
Non-trainable params: 0
_________________________________________________________________
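The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below reproduces the numbers; `dense_params` is a helper name of our own, not part of Keras.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


layer_sizes = [13, 10, 10, 5]  # input features, two hidden layers, five classes
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)
```

For the first layer this is 13 × 10 + 10 = 140, matching the Param # column above.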


## Functional API

In [3]:
functional_input = Input((13,))                          # 13 input features
X = Dense(10, activation="relu")(functional_input)       # hidden layer 1
X = Dense(10, activation="relu")(X)                      # hidden layer 2
functional_output = Dense(5, activation="softmax")(X)    # output layer; one probability per class
functional_model = Model(functional_input, functional_output)
functional_model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
functional_model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 13)]              0         
                                                                 
 dense_3 (Dense)             (None, 10)                140       
                                                                 
 dense_4 (Dense)             (None, 10)                110       
                                                                 
 dense_5 (Dense)             (None, 5)                 55        
                                                                 
Total params: 305
Trainable params: 305
Non-trainable params: 0
_________________________________________________________________
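Both models above are compiled with the adam optimizer. As a rough sketch of what a single Adam update does, the code below applies the published update rule to one scalar parameter on a toy loss. The function `adam_step`, the toy loss, and the hyperparameter values are our own illustrative assumptions; this is not how Keras implements the optimizer internally.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a single scalar parameter.

    m and v are running averages of the gradient and the squared gradient;
    t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad         # momentum-like first moment
    v = beta2 * v + (1 - beta2) * grad * grad  # RMSprop-like second moment
    m_hat = m / (1 - beta1 ** t)               # correct the bias toward zero
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Minimize the toy loss f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
print(w)
```

Dividing by the square root of the second moment is what gives each parameter its own effective step size, while the first moment smooths out noisy gradients. On this toy problem, `w` should end up close to the minimum at 3.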
