In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical

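# Load MNIST: 60,000 training and 10,000 test images (28x28 grayscale) with integer labels
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Add a channel axis, (N, 28, 28) -> (N, 28, 28, 1), and scale pixels from [0, 255] to [0, 1]
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# One-hot encode the labels 0-9 as length-10 vectors
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)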

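model = Sequential([
    # Declare the input explicitly: 28x28 images with a single channel
    tf.keras.Input(shape=(28, 28, 1)),
    # Convolution + pooling blocks extract features from the image
    Conv2D(32, (3, 3), activation='relu'),   # -> (26, 26, 32)
    MaxPooling2D((2, 2)),                    # -> (13, 13, 32)
    Conv2D(64, (3, 3), activation='relu'),   # -> (11, 11, 64)
    MaxPooling2D((2, 2)),                    # -> (5, 5, 64)
    Conv2D(64, (3, 3), activation='relu'),   # -> (3, 3, 64)
    # Classifier head: flatten to a vector, then fully connected layers
    Flatten(),                               # -> (576,)
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')          # one probability per digit class
])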

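# Adam optimizer with categorical cross-entropy, which matches the one-hot labels
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train for 10 passes over the training set in minibatches of 64 images
model.fit(X_train, y_train, batch_size=64, epochs=10, verbose=1)

# Measure loss and accuracy on the held-out test set
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss}")
print(f"Test Accuracy: {accuracy}")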


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test Loss: 0.037763990461826324
Test Accuracy: 0.9905999898910522


TensorFlow: A machine learning framework for creating and training neural networks.
Keras: A high-level API for building neural networks, now part of TensorFlow.
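MNIST: A dataset of 70,000 grayscale 28×28 images of handwritten digits (60,000 for training, 10,000 for testing), widely used for training and evaluating machine learning models.
Sequential Model: A linear stack of layers in which each layer's output feeds the next, well suited to architectures with a single input, a single output, and no branching.
Conv2D, MaxPooling2D, Flatten, Dense: Layers commonly used in convolutional neural networks (CNNs).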
to_categorical: A utility function to convert class labels to one-hot encoded format.
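
The phrase "a linear stack of layers" can be illustrated without TensorFlow. The sketch below uses a hypothetical ToySequential class (not part of Keras) and plain functions standing in for layers; it only shows the data flow that Sequential implies, in which each layer's output becomes the next layer's input.

```python
from typing import Callable, List, Sequence

# A "layer" here is just a function from a list of numbers to a list of numbers
Layer = Callable[[List[float]], List[float]]


class ToySequential:
    """Hypothetical stand-in for the idea behind keras.Sequential: apply layers in order."""

    def __init__(self, layers: Sequence[Layer]) -> None:
        self.layers = list(layers)

    def __call__(self, x: List[float]) -> List[float]:
        # Each layer's output becomes the next layer's input
        for layer in self.layers:
            x = layer(x)
        return x


def relu(x: List[float]) -> List[float]:
    """Element-wise ReLU: negative values become 0."""
    return [max(0.0, v) for v in x]


def double(x: List[float]) -> List[float]:
    """Toy 'layer' that scales every value by 2."""
    return [2.0 * v for v in x]


model = ToySequential([relu, double])
print(model([-1.0, 0.5, 3.0]))  # [0.0, 1.0, 6.0]
```

Real Keras layers also hold trainable weights and operate on whole batches of tensors; the toy version keeps only the ordering idea.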
    
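Loading Data: The mnist.load_data() function downloads the MNIST dataset on first use and loads it, returning two tuples of NumPy arrays: (X_train, y_train) with 60,000 training images and labels, and (X_test, y_test) with 10,000 test images and labels.
Reshaping and Normalizing: The images arrive with shape (N, 28, 28). Reshaping them to (N, 28, 28, 1) adds the explicit single channel (grayscale) that Conv2D expects. Dividing by 255 scales pixel values from the integer range [0, 255] to [0, 1]; inputs on a small, consistent scale typically help gradient-based training converge faster and more stably.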
One-Hot Encoding: Converts the labels (digits 0-9) into a one-hot encoded format, where each label is represented as a binary vector (e.g., 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]).
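
The three preprocessing steps can be mirrored for a single image and label in plain Python. The helper names below (normalize, one_hot, add_channel_axis) are hypothetical and only illustrate what the NumPy reshape, the division by 255, and to_categorical do to the data; note that to_categorical actually returns float arrays, while integers are used here for readability.

```python
from typing import List


def normalize(pixels: List[int]) -> List[float]:
    """Scale 8-bit pixel intensities from [0, 255] to [0, 1], like `/ 255.0` in the notebook."""
    return [p / 255.0 for p in pixels]


def one_hot(label: int, num_classes: int = 10) -> List[int]:
    """Plain-Python illustration of one-hot encoding a single integer label."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} out of range for {num_classes} classes")
    vec = [0] * num_classes
    vec[label] = 1  # only the position of the true class is 1
    return vec


def add_channel_axis(image: List[List[float]]) -> List[List[List[float]]]:
    """Turn an H x W grid into H x W x 1, mirroring reshape(-1, 28, 28, 1) for one image."""
    return [[[value] for value in row] for row in image]


print(normalize([0, 51, 255]))         # [0.0, 0.2, 1.0]
print(one_hot(3))                      # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(add_channel_axis([[0.0, 1.0]]))  # [[[0.0], [1.0]]]
```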
    
Sequential Model: Defines a straightforward CNN with a stack of layers.
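Conv2D Layers: Perform 2D convolutions. Each layer slides a set of learned 3×3 filters (32 in the first layer, 64 in the second and third) over its input (the image for the first layer, the previous layer's feature maps for the others) and produces one feature map per filter. Early filters tend to respond to simple patterns such as edges, later ones to more complex shapes.
Activation Function: ReLU (Rectified Linear Unit), f(x) = max(0, x), is used in the convolutional layers and the hidden dense layer, introducing non-linearity so the network can learn more than linear functions of its input.
MaxPooling2D: Reduces the spatial dimensions of the feature maps by keeping only the maximum value in each pooling window. With a 2×2 window and the default stride equal to the window size, height and width are halved (rounding down). This lowers the computational cost of later layers and keeps the strongest feature responses while adding some tolerance to small shifts in position.
Flatten: Converts the 3D feature maps (3 × 3 × 64 here) into a 1D vector of 576 values for input into the dense (fully connected) layers.
Dense Layers: Fully connected layers in which each unit is connected to every unit of the previous layer. The hidden dense layer has 64 units; the final dense layer has 10 units with the softmax activation function, producing a probability distribution over the 10 classes (digits 0-9).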
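
The shapes and parameter counts of this architecture can be checked by hand. The plain-Python sketch below uses hypothetical helper names (not Keras functions) and assumes Keras's defaults for these layers: 'valid' padding and stride 1 for Conv2D, and a stride equal to the pool size for MaxPooling2D. Under those assumptions the totals are what model.summary() would be expected to report.

```python
def conv_output(size: int, kernel: int = 3) -> int:
    """Spatial size after a 'valid' (no padding), stride-1 convolution."""
    return size - kernel + 1


def pool_output(size: int, window: int = 2) -> int:
    """Spatial size after max pooling with stride equal to the window (rounds down)."""
    return size // window


def conv_params(kernel: int, in_channels: int, filters: int) -> int:
    """kernel x kernel x in_channels weights per filter, plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs: int, units: int) -> int:
    """One weight per input-unit pair, plus one bias per unit."""
    return (inputs + 1) * units


# Trace the spatial size through the model defined in the notebook
size = 28
size = conv_output(size)  # 26
size = pool_output(size)  # 13
size = conv_output(size)  # 11
size = pool_output(size)  # 5
size = conv_output(size)  # 3
flat = size * size * 64   # Flatten: 3 * 3 * 64 = 576

# MaxPooling2D and Flatten have no trainable parameters
params = (
    conv_params(3, 1, 32)      # 320
    + conv_params(3, 32, 64)   # 18,496
    + conv_params(3, 64, 64)   # 36,928
    + dense_params(flat, 64)   # 36,928
    + dense_params(64, 10)     # 650
)
print(size, flat, params)  # 3 576 93322
```

Most of the weights sit in the last convolution and the first dense layer; pooling twice before Flatten is what keeps the dense layer's input down to 576 values.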
    
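Optimizer: Adam (Adaptive Moment Estimation) is a widely used optimizer for deep learning. It combines momentum (a running average of past gradients) with per-parameter adaptive step sizes (scaled by a running average of squared gradients). Passing the string 'adam' uses Keras's default settings, including a learning rate of 0.001.
Loss Function: Categorical cross-entropy is the standard loss for multi-class classification with one-hot encoded labels. For a single sample it equals the negative log of the probability the model assigns to the true class, so confident wrong predictions are penalized heavily.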
Metrics: Accuracy is used to evaluate model performance during training and testing.
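
Both ideas can be sketched in a few lines of plain Python. The function names are hypothetical; the loss clips predictions to avoid log(0), and adam_step is a textbook form of the Adam update using the same default values as Keras's 'adam' (learning rate 0.001, beta_1 0.9, beta_2 0.999, epsilon 1e-7). Keras's internal implementation may differ in small numerical details, such as exactly where epsilon is applied.

```python
import math
from typing import List, Tuple


def categorical_crossentropy(y_true: List[float], y_pred: List[float], eps: float = 1e-7) -> float:
    """-sum(y_true * log(y_pred)); with a one-hot y_true this is -log(prob of the true class)."""
    # Clip predictions away from 0 and 1 so log() stays finite
    return -sum(t * math.log(min(max(p, eps), 1.0 - eps)) for t, p in zip(y_true, y_pred))


def adam_step(param: float, grad: float, m: float, v: float, t: int,
              lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-7) -> Tuple[float, float, float]:
    """One Adam update for a scalar parameter at step t (1-based); returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # momentum: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Confident correct prediction -> small loss; confident wrong prediction -> large loss
label = [0, 0, 1]
print(round(categorical_crossentropy(label, [0.05, 0.05, 0.9]), 4))  # 0.1054
print(round(categorical_crossentropy(label, [0.9, 0.05, 0.05]), 4))  # 2.9957

# First Adam step on a scalar: the step is about lr, regardless of the gradient's scale
p, m, v = adam_step(param=1.0, grad=50.0, m=0.0, v=0.0, t=1)
print(round(p, 6))  # 0.999
```

The last line shows the "adaptive" part: dividing by the root of the squared-gradient average normalizes the step size, so a gradient of 50 and a gradient of 0.5 both move the parameter by roughly the learning rate on the first step.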
    
Training: The fit method trains the model on the training data.
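Batch Size: 64 images are processed per training step (minibatch training), so one epoch over the 60,000 training images takes 938 steps, with the last batch holding the remaining 32 images.
Epochs: The number of complete passes over the training set; here the model sees the full training data 10 times.
Verbose: Controls the training log output; verbose=1 shows a progress bar for each epoch (0 is silent, 2 prints one line per epoch).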
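
The batch arithmetic above can be reproduced with a small sketch. The minibatches helper is hypothetical and yields index ranges in order; unlike this sketch, Keras's fit shuffles the training data before each epoch by default (shuffle=True), but the number and sizes of the batches are the same.

```python
import math
from typing import Iterator, Tuple


def minibatches(n_samples: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index ranges covering n_samples; the last batch may be smaller."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


n_train, batch_size, epochs = 60_000, 64, 10
batches = list(minibatches(n_train, batch_size))

steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch, len(batches))  # 938 938
print(batches[-1])                    # (59968, 60000): a final batch of 32 images
print(steps_per_epoch * epochs)       # 9380 weight updates over the whole run
```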
    
Evaluation: The evaluate method computes the loss and accuracy on the test set, providing a measure of the model's performance on unseen data.
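Loss: The mean categorical cross-entropy over the test set. Lower values mean the model assigns higher probability to the correct digits; here the test loss is about 0.038.
Accuracy: The proportion of test images whose predicted class (the digit with the highest softmax probability) matches the true label. The reported value of about 0.9906 corresponds to roughly 9,906 of the 10,000 test images classified correctly. Exact figures vary slightly between runs because the weights are initialized randomly.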
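
How the two reported numbers relate to the softmax outputs can be sketched on a tiny, made-up batch. The evaluate and argmax helpers below are hypothetical plain-Python illustrations, not the Keras method: the loss is averaged over samples, and a prediction counts as correct when its highest-probability class matches the label.

```python
import math
from typing import List, Tuple


def argmax(values: List[float]) -> int:
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def evaluate(y_true: List[List[int]], y_pred: List[List[float]]) -> Tuple[float, float]:
    """Return (mean categorical cross-entropy, accuracy) for one-hot labels."""
    # Loss per sample: -log of the probability given to the true class (clipped to avoid log(0))
    losses = [-math.log(max(p[argmax(t)], 1e-7)) for t, p in zip(y_true, y_pred)]
    correct = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return sum(losses) / len(losses), correct / len(y_true)


# Three toy samples with one-hot labels for classes 2, 0, 1
y_true = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
y_pred = [[0.1, 0.1, 0.8], [0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]  # the third prediction is wrong
loss, accuracy = evaluate(y_true, y_pred)
print(round(loss, 4), round(accuracy, 4))  # 0.5946 0.6667
```

The wrong third prediction costs one sample of accuracy and also contributes the largest loss term, since only 0.3 of the probability went to the true class.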

Conclusion
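This code demonstrates how to build, train, and evaluate a CNN for digit classification on the MNIST dataset, reaching about 99% test accuracy after 10 epochs. The architecture (stacked Conv2D and MaxPooling2D layers followed by dense layers), ReLU activations, the Adam optimizer, and categorical cross-entropy loss are common choices for this type of task. Because the model is a plain list of layers, it is easy to adapt (for example, by adding layers or changing filter counts), which makes it a reasonable starting point for more complex deep learning projects.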