In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

# Load and preprocess the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train / 255.0
X_test = X_test / 255.0

# Define the model architecture
model = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, batch_size=64, epochs=10, verbose=1)

# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss}")
print(f"Test Accuracy: {accuracy}")

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test Loss: 0.07652948051691055
Test Accuracy: 0.9782000184059143


TensorFlow: A popular machine learning framework developed by Google for building and deploying neural networks.
Keras: The high-level API within TensorFlow used to create neural networks in a user-friendly way.
MNIST Dataset: A benchmark dataset for handwritten digit classification, widely used in machine learning.
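Sequential Model: A model built as a linear stack of layers, where each layer's output feeds directly into the next. "Linear" describes the layer ordering only; the network itself is non-linear because of its activation functions.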
Dense and Flatten Layers: Common layers used in neural networks. Flatten transforms multidimensional data into a 1D vector, while Dense is a fully connected layer.
Adam Optimizer: An optimization algorithm that uses adaptive learning rates and momentum.
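
To make "adaptive learning rates and momentum" concrete, here is a minimal pure-Python sketch of one Adam update for a single scalar parameter. The function name adam_step and the toy objective are hypothetical, and the default hyperparameters (beta1=0.9, beta2=0.999, eps=1e-7) are assumed commonly used values; this is an illustration of the update rule, not the Keras implementation.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a scalar parameter (illustrative sketch only)."""
    m = beta1 * m + (1 - beta1) * grad          # momentum: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for zero-initialized m
    v_hat = v / (1 - beta2 ** t)                # bias correction for zero-initialized v
    # The step is scaled by 1/sqrt(v_hat), which adapts the step size to gradient scale
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem (assumption for illustration): minimize f(w) = (w - 3)^2 from w = 0
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2 * (w - 3)                          # derivative of (w - 3)^2
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)

print(w)  # should end close to 3
```

On the very first step the bias-corrected ratio m_hat / sqrt(v_hat) has magnitude 1, so the parameter moves by roughly lr regardless of how large the gradient is.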

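Load Data: The mnist.load_data() function returns the dataset already split into training and test sets (60,000 and 10,000 images respectively). Each set consists of image data (X), stored as 28x28 grayscale arrays with integer pixel values in [0, 255], and labels (y), stored as integers from 0 to 9.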
Normalization: Dividing by 255 scales the pixel values from [0, 255] to [0, 1]. This normalization helps with faster convergence during training and reduces sensitivity to varying input scales.
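
The scaling step can be checked by hand on a tiny example. The sketch below uses plain Python lists instead of NumPy arrays, and the helper name normalize_pixels and the 2x2 "image" are made up for illustration.

```python
def normalize_pixels(image):
    """Scale 8-bit pixel values from [0, 255] to [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]

# A hypothetical 2x2 image with raw pixel intensities
tiny_image = [[0, 51], [102, 255]]
scaled = normalize_pixels(tiny_image)
print(scaled)  # [[0.0, 0.2], [0.4, 1.0]]
```

In the notebook the same division is applied element-wise to the whole NumPy array at once.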
    
Flatten Layer: The first layer in the model, converting 2D images (28x28) into a 1D vector of 784 elements. This prepares the data for the fully connected layers.
Dense Layer (128 units): A fully connected layer with 128 units and ReLU (Rectified Linear Unit) activation. ReLU introduces non-linearity, helping the model learn complex patterns.
Dense Layer (10 units): The output layer with 10 units, corresponding to the 10 digits (0-9). The softmax activation function ensures the output is a probability distribution, summing to 1. This is useful for multi-class classification, as it indicates the predicted probability for each class.
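
The three layers can be mimicked in a few lines of pure Python to see what each operation does to its input. This is a rough sketch with hypothetical helper names (flatten, relu, softmax, dense_param_count); it leaves out the learned weights and only shows the shape change, the activations, and the number of trainable parameters each Dense layer would hold.

```python
import math

def flatten(image):
    """Turn a 2D list of rows into a single 1D list."""
    return [pixel for row in image for pixel in row]

def relu(values):
    """Replace negative values with 0, keep non-negative values unchanged."""
    return [max(0.0, v) for v in values]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

flat = flatten([[1, 2], [3, 4]])              # [1, 2, 3, 4]
activated = relu([-1.5, 0.0, 2.0])            # [0.0, 0.0, 2.0]
probs = softmax([2.0, 1.0, 0.1])              # largest score gets the largest probability
total_params = dense_param_count(28 * 28, 128) + dense_param_count(128, 10)
print(total_params)  # 101770
```

The Flatten layer has no trainable parameters, so the total comes entirely from the two Dense layers: 784 * 128 + 128 = 100480 for the hidden layer and 128 * 10 + 10 = 1290 for the output layer.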

    
Optimizer (Adam): Adam is a commonly used optimizer in deep learning, known for its adaptive learning rates and fast convergence.
Loss Function (Sparse Categorical Cross-Entropy): This loss function is used for multi-class classification when the labels are integer class indices rather than one-hot vectors. For each sample it is the negative log of the probability the model assigns to the true class, so a confident correct prediction gives a loss near 0 and a confident wrong prediction is penalized heavily.
Metrics (Accuracy): Accuracy is a common metric for classification tasks, indicating the proportion of correct predictions.
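
As a rough illustration of what "sparse" means here, the sketch below computes the cross-entropy once from integer labels and once from the equivalent one-hot labels and gets the same value. The function names, the clipping constant eps, and the example probabilities are assumptions made for this example; they are not taken from the Keras source.

```python
import math

def sparse_categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -log(p[true_class]); y_true holds integer class indices."""
    losses = [-math.log(max(probs[label], eps))
              for label, probs in zip(y_true, y_prob)]
    return sum(losses) / len(losses)

def categorical_crossentropy(y_onehot, y_prob, eps=1e-7):
    """Same loss, but y_onehot holds one-hot encoded label vectors."""
    losses = [-sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))
              for target, probs in zip(y_onehot, y_prob)]
    return sum(losses) / len(losses)

# Two hypothetical samples over three classes
y_prob = [[0.7, 0.2, 0.1],
          [0.1, 0.3, 0.6]]
sparse_labels = [0, 2]                        # integer labels
onehot_labels = [[1, 0, 0], [0, 0, 1]]        # the same labels, one-hot encoded

sparse_loss = sparse_categorical_crossentropy(sparse_labels, y_prob)
onehot_loss = categorical_crossentropy(onehot_labels, y_prob)
print(sparse_loss, onehot_loss)               # the two values agree
```

Using integer labels avoids building a 10-column one-hot matrix for every label, which is why the notebook can pass y_train directly to fit.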
    
Training: The fit method trains the model on the training dataset.
Batch Size (64): The number of samples processed in each training step, with one weight update per step. Smaller batches give more updates per epoch but noisier gradient estimates, while larger batches give smoother estimates and usually better hardware utilization but require more memory.
Epochs (10): The number of times the entire training dataset is passed through the model during training.
Verbose: Controls the amount of output during training. A value of 1 displays a progress bar and training metrics.
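
The relationship between batch size, steps, and epochs is simple arithmetic. The sketch below assumes the standard 60,000-image MNIST training set and that the final, smaller batch is still processed; the helper names steps_per_epoch and iter_batches are hypothetical.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of weight updates needed to see every sample once."""
    return math.ceil(n_samples / batch_size)

def iter_batches(samples, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

steps = steps_per_epoch(60000, 64)
total_updates = steps * 10                    # 10 epochs
last_batch_size = 60000 - (steps - 1) * 64    # samples left for the final step
print(steps, total_updates, last_batch_size)  # 938 9380 32

# Small demonstration of batching: 10 samples in batches of 4
demo_batches = list(iter_batches(list(range(10)), 4))
print(demo_batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

With verbose=1, the step counter in each epoch's progress bar should therefore run up to 938.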
    
Evaluation: The evaluate method measures the model's performance on the test dataset, providing the loss and accuracy.
Loss: Indicates how well the model's predictions align with the actual labels. Lower loss typically means better performance.
Accuracy: A measure of the proportion of correct predictions. Higher accuracy indicates better classification performance.
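
Accuracy can be reproduced by taking the most probable class for each sample and comparing it with the true label. The following is a minimal sketch with made-up probabilities and hypothetical helper names; the last lines assume the standard 10,000-image MNIST test set to translate the reported accuracy into a count of correct predictions.

```python
def predict_classes(y_prob):
    """Pick the index of the highest probability for each sample (argmax)."""
    return [probs.index(max(probs)) for probs in y_prob]

def accuracy(y_true, y_prob):
    """Fraction of samples whose predicted class equals the true label."""
    predictions = predict_classes(y_prob)
    correct = sum(1 for p, t in zip(predictions, y_true) if p == t)
    return correct / len(y_true)

# Four hypothetical samples over three classes
y_prob = [[0.8, 0.1, 0.1],
          [0.2, 0.7, 0.1],
          [0.3, 0.3, 0.4],
          [0.6, 0.3, 0.1]]
y_true = [0, 1, 2, 1]                         # the last sample is misclassified

acc = accuracy(y_true, y_prob)
print(acc)  # 0.75

# Reported test accuracy from the run above, converted to a count
# (assumes 10,000 test images)
correct_on_test = round(0.9782000184059143 * 10000)
print(correct_on_test)  # 9782
```

Read this way, the reported accuracy corresponds to roughly 218 misclassified test images.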

Conclusion
This code demonstrates a simple feedforward neural network for the MNIST dataset. The model uses a Flatten layer to convert each 28x28 image into a 784-element vector, followed by a fully connected (Dense) layer with ReLU activation and an output layer with softmax activation. The code normalizes the input data, compiles the model with the Adam optimizer, and trains it with a batch size of 64 over 10 epochs. In the run shown above, evaluation on the test set gives a loss of about 0.077 and an accuracy of about 97.8%. Overall, this snippet is a good starting point for understanding basic neural network concepts and their application to digit classification.

