Load in the iris dataset:

In [1]:
from sklearn.datasets import load_iris

X, y = load_iris(as_frame=True, return_X_y=True)

In [3]:
X

,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2
...,...,...,...,...
145,6.7,3.0,5.2,2.3
146,6.3,2.5,5.0,1.9
147,6.5,3.0,5.2,2.0
148,6.2,3.4,5.4,2.3


In [2]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y)

We need to convert our pandas DataFrames into TensorFlow `Dataset` objects.

In [9]:
import tensorflow as tf

train = tf.data.Dataset.from_tensor_slices((X_train, y_train))
test = tf.data.Dataset.from_tensor_slices((X_test, y_test))

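We repeat the training data 20 times, so a single pass over `train` shows the model every example 20 times. Seeing the same example more than once is still useful, because the model is at a different point in the training process each time. We shuffle the data (using a buffer of 1000 examples) so that the model doesn't see the examples in the same order every time. The `batch` call groups the examples into batches of 32. The model updates its weights once per batch, so the batch size mainly affects the speed of training, although it can also have some effect on the outcome.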

In [11]:
train = train.repeat(20).shuffle(1000).batch(32)

In [12]:
test = test.batch(1)
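As a sanity check on the pipeline above, the batch counts can be worked out by hand. This is only a sketch: it assumes the 150-row iris dataset and scikit-learn's default `test_size` of 0.25 (neither is set explicitly in the cells above), and the helper name `batches_per_epoch` is made up for illustration.

```python
import math

N_ROWS = 150                        # assumed size of the iris dataset
n_test = math.ceil(N_ROWS * 0.25)   # assumed default test_size=0.25, rounded up
n_train = N_ROWS - n_test

def batches_per_epoch(n_examples, repeats, batch_size):
    """Number of batches produced by .repeat(repeats).batch(batch_size)."""
    return math.ceil(n_examples * repeats / batch_size)

train_steps = batches_per_epoch(n_train, repeats=20, batch_size=32)
test_steps = batches_per_epoch(n_test, repeats=1, batch_size=1)
print(n_train, n_test, train_steps, test_steps)
```

Under these assumptions one Keras epoch runs 70 training steps, which agrees with the `1/70` progress counter in the training output, and every training example is seen 20 times per epoch.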

# Model set-up

We set up the model so that we have 2 hidden layers containing 10 nodes apiece. The output layer has 3 nodes because we have three species that we are trying to classify. By using `Dense`, we are specifying that each layer is fully connected: every node receives input from every node in the previous layer.

The `activation` argument specifies our activation function $\phi$. The ReLU function is

$$\phi(x) = \max(0, x)$$

For the output layer, we want the output to be interpretable. In our case, we want to be able to treat each output node's value as the probability that the observation belongs to that particular species. The softmax activation does this by making the three outputs non-negative and making them sum to 1.
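To make the two activation functions concrete, here is a minimal pure-Python sketch of ReLU and softmax. It is not how TensorFlow implements them, and the input numbers are made up for illustration.

```python
import math

def relu(x):
    """ReLU: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)

def softmax(scores):
    """Turn raw scores into non-negative values that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

hidden = [relu(v) for v in [-1.5, 0.0, 2.0]]  # made-up hidden-layer inputs
probs = softmax([2.0, 1.0, 0.1])              # made-up output-layer scores
print(hidden)
print(probs, sum(probs))
```

The largest score always gets the largest probability, so softmax changes how the outputs are read without changing which class wins.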

In [13]:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation=tf.nn.relu),   # hidden layer
    tf.keras.layers.Dense(10, activation=tf.nn.relu),   # hidden layer
    tf.keras.layers.Dense(3, activation=tf.nn.softmax)  # output layer
])
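It can help to count how many numbers this model has to learn. The sketch below assumes the four iris measurements as inputs and counts one weight per connection plus one bias per node; `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit in a fully connected layer."""
    return n_inputs * n_units + n_units

# 4 input features, two hidden layers of 10 nodes, 3 output classes
layer_sizes = [4, 10, 10, 3]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(counts)
print(counts, total_params)
```

Once the model has been trained (and therefore built), `model.summary()` should report the same per-layer counts.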

We need a loss function that can handle data with more than two classes, so we use sparse categorical cross-entropy. We track accuracy to show how well the model is performing. Because we don't pass an `optimizer`, Keras uses its default (RMSprop).

In [14]:
model.compile(
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
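For a single example, sparse categorical cross-entropy is the negative log of the probability the model assigned to the true class. "Sparse" means the label is an integer class index rather than a one-hot vector. The sketch below uses made-up probabilities and leaves out details of the Keras version, such as averaging over a batch and clipping probabilities away from zero.

```python
import math

def sparse_categorical_crossentropy(true_index, probs):
    """Loss for one example: -log(probability given to the true class)."""
    return -math.log(probs[true_index])

probs = [0.7, 0.2, 0.1]  # made-up softmax output for one flower

loss_if_class_0 = sparse_categorical_crossentropy(0, probs)  # model favoured the true class
loss_if_class_2 = sparse_categorical_crossentropy(2, probs)  # model gave the true class only 10%
print(loss_if_class_0, loss_if_class_2)
```

The loss is small when the true class gets a high probability and grows quickly as that probability approaches zero.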

In [15]:
model.fit(
    train,
    validation_data=test,
    epochs=10,
)

Epoch 1/10
 1/70 [..............................] - ETA: 46s - loss: 1.2447 - accuracy: 0.2500



Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7089ef9ce450>

In [16]:
predict_X = [
  [5.1, 3.3, 1.7, 0.5],
  [5.9, 3.0, 4.2, 1.5],
  [6.9, 3.1, 5.4, 2.1],
]

predictions = model.predict(predict_X)



In [17]:
predictions[0]

array([9.6228272e-01, 3.7702501e-02, 1.4804093e-05], dtype=float32)

In [18]:
predictions[0].argmax()

0

In [20]:
target_names = load_iris().target_names  # look the species names up once

for probs, expected in zip(predictions, ["setosa", "versicolor", "virginica"]):
    predicted_index = probs.argmax()  # index of the most probable class
    predicted = target_names[predicted_index]
    probability = probs.max()         # probability assigned to that class
    tick_cross = "✓" if predicted == expected else "✗"
    print(f"{tick_cross} Prediction is '{predicted}' ({100 * probability:.1f}%), expected '{expected}'")

✓ Prediction is 'setosa' (96.2%), expected 'setosa'
✓ Prediction is 'versicolor' (81.2%), expected 'versicolor'
✓ Prediction is 'virginica' (70.6%), expected 'virginica'


What happens if we decrease the number of hidden layers?

In [22]:
hidden1_model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation=tf.nn.relu),   # hidden layer
    tf.keras.layers.Dense(3, activation=tf.nn.softmax)  # output layer
])
hidden1_model.compile(
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
hidden1_model.fit(
    train,
    validation_data=test,
    epochs=10,
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7089917dfa50>

We actually reach a `val_accuracy` of 97.37% after 5 epochs, which is sooner than the model with 2 hidden layers did. Let's try it with 0 hidden layers:
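With a test set this small, validation accuracy can only take a few distinct values. The sketch below assumes 38 test examples (25% of 150 rows, rounded up, which is scikit-learn's default split). Under that assumption, 97.37% corresponds to exactly one misclassified flower.

```python
n_test = 38  # assumed test-set size, not stated explicitly in the notebook

# Accuracy as a percentage for 36, 37 and 38 correct predictions
accuracies = {
    n_correct: f"{100 * n_correct / n_test:.2f}%"
    for n_correct in (36, 37, 38)
}
print(accuracies)
```

One flower more or less moves the accuracy by about 2.6 percentage points, so differences of that size between models may be down to chance.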

In [23]:
hidden0_model = tf.keras.Sequential([
    tf.keras.layers.Dense(3, activation=tf.nn.softmax)  # output layer
])
hidden0_model.compile(
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
hidden0_model.fit(
    train,
    validation_data=test,
    epochs=10,
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7089705d8e90>

It looks like we don't actually need any hidden layers to get high validation accuracy.
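A model with no hidden layers computes one weighted sum of the inputs per class and passes the results through softmax, which is multinomial logistic regression. The sketch below writes that forward pass in plain Python. The weights are hypothetical values chosen for illustration, not the ones learned by `hidden0_model`, and `predict_proba` is a made-up helper name.

```python
import math

def softmax(scores):
    """Turn raw scores into non-negative values that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(x, weights, biases):
    """A single Dense(3, softmax) layer: weighted sum per class, then softmax."""
    scores = [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)

# Hypothetical weights, one row per species (NOT the trained values)
weights = [
    [0.5, 1.0, -2.0, -1.0],
    [0.5, -0.5, 0.0, -0.5],
    [-1.0, -1.0, 1.5, 2.0],
]
biases = [0.0, 0.0, 0.0]

probs = predict_proba([5.1, 3.3, 1.7, 0.5], weights, biases)
print(probs)
```

Such a model can only draw straight-line boundaries between classes, so its high validation accuracy suggests the three species are close to linearly separable in these four measurements.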

What happens if we decrease the number of nodes in the hidden layers?