In [1]:
import keras
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.optimizers import Adam


In [9]:
model = Sequential([
    Input(shape=(4,)),  # Define the input shape using an Input layer
    Dense(16, activation='relu'),
    Dense(32, activation='relu'),
    Dense(3, activation='softmax')
])

Explanation of the Arguments (for the model.compile() call below):
optimizer='adam':

Adam (Adaptive Moment Estimation) adapts the step size of each parameter using running estimates of the first and second moments of its gradients.
The string 'adam' uses the Keras default learning rate of 0.001; pass Adam(learning_rate=...) to set it explicitly.
loss='categorical_crossentropy':

Used for multi-class classification problems when the target labels are one-hot encoded.
If the labels are integers (not one-hot encoded), use sparse_categorical_crossentropy instead.
metrics=['accuracy']:

Lists the metrics reported during training and validation; they are monitored but not optimized.
accuracy is the fraction of samples for which the predicted class matches the true class.
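
To make the difference between the two label formats concrete, here is a minimal sketch in plain Python. The helper names one_hot_cross_entropy and integer_label_cross_entropy and the probability vector are made up for illustration; they are not Keras functions, only hand-written versions of the per-sample formula.

```python
import math

def one_hot_cross_entropy(one_hot, probs):
    """Cross-entropy for one sample whose target is a one-hot vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def integer_label_cross_entropy(label, probs):
    """Cross-entropy for one sample whose target is an integer class index."""
    return -math.log(probs[label])

probs = [0.7, 0.2, 0.1]      # hypothetical softmax output for one sample
one_hot_target = [1, 0, 0]   # class 0, one-hot encoded
integer_target = 0           # the same class as an integer label

loss_one_hot = one_hot_cross_entropy(one_hot_target, probs)
loss_integer = integer_label_cross_entropy(integer_target, probs)
print(loss_one_hot, loss_integer)  # both equal -ln(0.7)
```

Both calls give the same number, so the choice between the two losses depends only on how the labels are stored, not on what is being measured.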


When to Use Other Loss Functions or Optimizers?
For regression tasks: Use loss='mean_squared_error' or loss='mean_absolute_error', with a linear output layer instead of softmax.
For binary classification tasks: Use loss='binary_crossentropy' with a single sigmoid output unit.
Alternative optimizers: Try SGD, RMSprop, or Adagrad if Adam doesn't yield good results.
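
As a rough illustration of what these alternative losses compute, the sketch below evaluates each one by hand on tiny made-up inputs. The helper functions are written for this note and only mirror the textbook formulas; they are not the Keras implementations.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_prob):
    """Average negative log-likelihood for 0/1 labels and predicted P(label = 1)."""
    total = sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for t, p in zip(y_true, y_prob))
    return -total / len(y_true)

# Hypothetical regression targets and predictions
mse = mean_squared_error([2.0, 4.0], [1.0, 6.0])   # (1 + 4) / 2 = 2.5
mae = mean_absolute_error([2.0, 4.0], [1.0, 6.0])  # (1 + 2) / 2 = 1.5

# Hypothetical binary labels and predicted probabilities of class 1
bce = binary_cross_entropy([1, 0], [0.9, 0.2])     # about 0.164

print(mse, mae, bce)
```

Squaring makes mean_squared_error more sensitive to large errors than mean_absolute_error, which is one reason to try both on data with outliers.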

In [12]:

# Compile the model
model.compile(
    optimizer=Adam(learning_rate=0.001),  # Use the Adam optimizer with a learning rate of 0.001
    loss='categorical_crossentropy',      # Use cross-entropy loss for classification tasks
    metrics=['accuracy']                  # Track accuracy during training
)
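
To give some intuition for what Adam does with the learning rate, here is a minimal single-parameter sketch of the textbook Adam update. The function adam_step and the toy objective are invented for illustration, and the default values are chosen to mirror the Keras defaults as I understand them; Keras's internal implementation may differ in detail.

```python
import math

def adam_step(param, grad, m, v, t, learning_rate=0.001,
              beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """One textbook Adam update for a single scalar parameter."""
    m = beta_1 * m + (1 - beta_1) * grad          # running mean of gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta_1 ** t)                 # bias correction for early steps
    v_hat = v / (1 - beta_2 ** t)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v

# Toy problem: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
first_step = None
for t in range(1, 2001):
    grad = 2 * (w - 3)
    w, m, v = adam_step(w, grad, m, v, t, learning_rate=0.01)
    if t == 1:
        first_step = w   # the first move is close to learning_rate in size

print(first_step)  # roughly 0.01, even though the gradient was -6
print(w)           # close to the minimum at 3
```

The first step has roughly the size of learning_rate regardless of the gradient's magnitude, which is why the learning rate is the main knob to adjust when training is too slow or unstable.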

In [14]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
import numpy as np

# Load the Iris dataset
data = load_iris()
X = data.data  # Features
y = data.target.reshape(-1, 1)  # Labels reshaped for one-hot encoding

# One-hot encode the target labels
encoder = OneHotEncoder(sparse_output=False)
y = encoder.fit_transform(y)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
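
For readers unfamiliar with one-hot encoding, the following plain-Python sketch shows the transformation that the encoder applies to the labels, and how to get integer labels back. The helpers one_hot and decode and the sample labels are illustrative only; they are not part of scikit-learn.

```python
def one_hot(labels, num_classes):
    """Return one row per label with 1.0 in the label's column and 0.0 elsewhere."""
    return [[1.0 if k == label else 0.0 for k in range(num_classes)]
            for label in labels]

def decode(rows):
    """Recover integer labels as the index of the largest entry in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]

labels = [0, 2, 1]                       # hypothetical Iris class indices
encoded = one_hot(labels, num_classes=3)

print(encoded)          # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(decode(encoded))  # [0, 2, 1]
```

The three columns line up with the three softmax outputs of the model, which is what categorical_crossentropy expects.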


In [16]:
# Train the model
history = model.fit(X_train, y_train, epochs=50, batch_size=16, validation_split=0.2)


Epoch 1/50
6/6 ━━━━━━━━━━━━━━━━━━━━ 1s 43ms/step - accuracy: 0.3940 - loss: 1.8997 - val_accuracy: 0.2083 - val_loss: 2.0298
Epoch 2/50
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.3688 - loss: 1.5508 - val_accuracy: 0.2083 - val_loss: 1.6687
Epoch 3/50
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 0.2521 - loss: 1.3324 - val_accuracy: 0.1667 - val_loss: 1.4578
Epoch 4/50
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 11ms/step - accuracy: 0.2210 - loss: 1.2205 - val_accuracy: 0.2917 - val_loss: 1.3153
Epoch 5/50
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.4205 - loss: 1.1259 - val_accuracy: 0.4167 - val_loss: 1.2051
Epoch 6/50
6/6 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step - accuracy: 0.6562 - loss: 1.0312 - val_accuracy: 0.5000 - val_loss: 1.1097
Epoch 7/50
6/6 ━━━━━━━━━━━━━━━━━━
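
The 6/6 step counter and the coarse val_accuracy values in the log above follow from the dataset sizes. The sketch below works through the arithmetic, assuming the 150-sample Iris dataset, the test_size=0.2 split, and the validation_split=0.2 and batch_size=16 arguments used earlier; the variable names are chosen for this note.

```python
import math

n_samples = 150          # rows in the Iris dataset
test_size = 0.2          # fraction held out by train_test_split
validation_split = 0.2   # fraction of the training arrays held out by model.fit
batch_size = 16

n_test = round(n_samples * test_size)               # 30
n_train_full = n_samples - n_test                   # 120
n_val = round(n_train_full * validation_split)      # 24
n_train = n_train_full - n_val                      # 96

steps_per_epoch = math.ceil(n_train / batch_size)   # 6, matching "6/6" in the log

# With 24 validation samples, val_accuracy can only be a multiple of 1/24.
logged_val_accuracy = [0.2083, 0.1667, 0.2917, 0.4167, 0.5000]
correct_counts = [round(a * n_val) for a in logged_val_accuracy]

print(n_train, n_val, steps_per_epoch)  # 96 24 6
print(correct_counts)                   # [5, 4, 7, 10, 12]
```

With only 24 validation samples, one extra correct prediction moves val_accuracy by about 4 percentage points, so small epoch-to-epoch swings in that metric are mostly noise.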

In [18]:
# Evaluate the model on the test set
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss}")
print(f"Test Accuracy: {accuracy}")


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 40ms/step - accuracy: 1.0000 - loss: 0.3510
Test Loss: 0.3509959876537323
Test Accuracy: 1.0



A test accuracy of 100% together with a non-zero loss (here 0.3510) may seem counterintuitive at first, but it follows directly from how the two metrics are calculated. Here’s why this happens:

1. Accuracy vs. Loss
Accuracy: A discrete metric that checks only whether the predicted class (the one with the highest predicted probability) matches the true class. A prediction counts as correct even if the winning probability is only slightly higher than the others.
Loss: A continuous metric that measures how far the predicted probabilities (e.g., [0.8, 0.1, 0.1]) are from the true one-hot distribution (e.g., [1, 0, 0]). With categorical_crossentropy, the loss for one sample is the negative log of the probability assigned to the true class, so this correct prediction still costs -ln(0.8) ≈ 0.223. The loss reaches zero only when the model assigns probability 1 to the true class for every sample.
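
The distinction can be checked with a few lines of plain Python. The probability vectors below are made up for illustration (they are not outputs of the trained model), and the formulas are hand-written versions of accuracy and per-sample cross-entropy rather than calls into Keras.

```python
import math

# Hypothetical softmax outputs for three samples whose true class is 0.
predictions = [
    [0.80, 0.10, 0.10],   # confident and correct
    [0.50, 0.30, 0.20],   # less confident, still correct
    [0.40, 0.35, 0.25],   # barely correct
]
true_classes = [0, 0, 0]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

# Accuracy only asks whether the top-scoring class is the true one.
correct = [argmax(p) == t for p, t in zip(predictions, true_classes)]
accuracy = sum(correct) / len(correct)

# Cross-entropy penalizes every bit of probability not on the true class.
losses = [-math.log(p[t]) for p, t in zip(predictions, true_classes)]
mean_loss = sum(losses) / len(losses)

# Reading a mean loss backwards: exp(-loss) is the geometric-mean
# probability the model put on the true class.
typical_confidence = math.exp(-0.3510)

print(accuracy)            # 1.0
print(mean_loss)           # about 0.61
print(typical_confidence)  # about 0.70
```

By the same reading, the reported test loss of 0.3510 means the model placed, in a geometric-mean sense, about 70% probability on the correct class: enough to get every test sample right, but short of full confidence.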