This code imports the TensorFlow library and loads the MNIST dataset, which consists of 70,000 grayscale images of handwritten digits (0–9), each 28x28 pixels. `load_data()` returns the dataset already split into a training set and a test set: `x_train` and `y_train` hold the images and labels for training (60,000 samples), and `x_test` and `y_test` hold the images and labels for testing (10,000 samples). Printing the array shapes verifies that the data loaded correctly and shows the structure of the inputs and labels. MNIST is commonly used for training and evaluating image classification models, especially by beginners in deep learning.

In [None]:
import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
print("x_train shape:", x_train.shape)
print("y_train shape:", y_train.shape)
print("x_test shape:", x_test.shape)
print("y_test shape:", y_test.shape)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
[1m11490434/11490434[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 0us/step
x_train shape: (60000, 28, 28)
y_train shape: (60000,)
x_test shape: (10000, 28, 28)
y_test shape: (10000,)
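
Beyond the shapes, it can help to confirm the pixel type and value range before feature extraction, since `cv2.Canny` in the next cell expects 8-bit images. The following is a minimal sketch of such a check. The arrays `x_sample` and `y_sample` and the helper `summarize` are hypothetical stand-ins filled with random data so the snippet runs without downloading MNIST; in the notebook, pass the real `x_train` and `y_train` instead.

```python
import numpy as np

# Hypothetical stand-ins shaped like a small slice of MNIST; in the notebook,
# use the real x_train and y_train returned by load_data() instead.
rng = np.random.default_rng(0)
x_sample = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
y_sample = rng.integers(0, 10, size=100, dtype=np.uint8)

def summarize(images, labels):
    """Return basic facts worth checking before feature extraction."""
    return {
        "dtype": str(images.dtype),
        "pixel_min": int(images.min()),
        "pixel_max": int(images.max()),
        "images_match_labels": images.shape[0] == labels.shape[0],
        # Number of samples per digit class 0-9.
        "class_counts": np.bincount(labels, minlength=10).tolist(),
    }

summary = summarize(x_sample, y_sample)
print(summary)
```

On the real data, the class counts also show how evenly the ten digits are represented.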


This code installs the required libraries (scikit-image, OpenCV, and TensorFlow) and then extracts features from the MNIST images using several image processing techniques. The Histogram of Oriented Gradients (HOG) descriptor captures edge and gradient-orientation information. Four scalar features are appended to it: the mean of the histogram-equalized image, the sum of the Canny edge map, and the mean and variance of the raw pixel values. These values are compiled into one feature vector per image for both the training and test sets. The resulting features are saved, together with the labels, to CSV files (`mnist_train_features.csv` and `mnist_test_features.csv`), which can be used for further analysis or machine learning model training.

In [None]:
!pip install scikit-image opencv-python tensorflow
import tensorflow as tf
import numpy as np
import pandas as pd
from skimage.feature import hog
from skimage import exposure
import cv2
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
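def extract_features(images):
    """Return one feature vector per image: HOG descriptor plus four scalar statistics."""
    all_features = []
    for img in images:
        image_features = []
        # HOG descriptor; the visualization image is not needed, so it is not computed.
        fd = hog(img, orientations=9, pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2), visualize=False, channel_axis=None)
        image_features.extend(fd)

        # Mean intensity after histogram equalization.
        equ = exposure.equalize_hist(img)
        image_features.append(np.mean(equ))

        # Sum of the Canny edge map (edge pixels are 255, all others 0).
        edges = cv2.Canny(img, 100, 200)
        image_features.append(np.sum(edges))

        # Mean and variance of the raw pixel values.
        image_features.append(np.mean(img))
        image_features.append(np.var(img))

        all_features.append(image_features)

    return np.array(all_features)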


train_features = extract_features(x_train)

test_features = extract_features(x_test)


train_df = pd.DataFrame(train_features)
train_df['label'] = y_train

test_df = pd.DataFrame(test_features)
test_df['label'] = y_test

train_df.to_csv('mnist_train_features.csv', index=False)
test_df.to_csv('mnist_test_features.csv', index=False)

print("Feature extraction and CSV creation complete.")

Feature extraction and CSV creation complete.
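
The length of each feature vector follows from the HOG parameters used above. The sketch below works through that arithmetic; the helper `hog_feature_length` is a hypothetical name introduced here, and it assumes square images and blocks that slide one cell at a time, which is how scikit-image's `hog` lays out blocks.

```python
def hog_feature_length(image_size, cell_size, block_size, orientations):
    """Length of a HOG descriptor for a square image.

    Assumes blocks slide one cell at a time across the grid of cells.
    """
    cells_per_side = image_size // cell_size            # 28 // 8 = 3
    blocks_per_side = cells_per_side - block_size + 1   # 3 - 2 + 1 = 2
    values_per_block = block_size * block_size * orientations  # 2 * 2 * 9 = 36
    return blocks_per_side ** 2 * values_per_block

hog_len = hog_feature_length(image_size=28, cell_size=8, block_size=2, orientations=9)
# Four scalars follow the HOG values: equalized mean, edge sum, mean, variance.
total_features = hog_len + 4
print(hog_len, total_features)  # 144 148
```

So each row of the CSV files should hold 148 feature columns plus the `label` column.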


This code loads the previously saved CSV files of MNIST features using the `pandas` library. It reads `mnist_train_features.csv` and `mnist_test_features.csv` and, if both reads succeed, prints the first few rows of each to give a snapshot of the training and testing data. The code also handles several failure cases by printing a message instead of raising: a `FileNotFoundError` if a file is missing, a `pd.errors.EmptyDataError` if a file is empty, and a `pd.errors.ParserError` if a file cannot be parsed. Any other exception is caught by a general handler that prints the error, so the cell does not stop with a traceback.

In [None]:
import pandas as pd

try:
    train_df = pd.read_csv('mnist_train_features.csv')
    test_df = pd.read_csv('mnist_test_features.csv')

    print("Training Data:")
    print(train_df.head())

    print("\nTesting Data:")
    print(test_df.head())

except FileNotFoundError:
    print("Error: One or both CSV files not found. Please make sure 'mnist_train_features.csv' and 'mnist_test_features.csv' exist in the current directory.")
except pd.errors.EmptyDataError:
    print("Error: One or both CSV files are empty.")
except pd.errors.ParserError:
    print("Error: Could not parse the CSV file(s). Check the file format.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

Training Data:
          0    1         2    3         4         5    6         7    8  \
0  0.114675  0.0  0.022903  0.0  0.000000  0.000000  0.0  0.000000  0.0   
1  0.000000  0.0  0.000000  0.0  0.000000  0.000000  0.0  0.000000  0.0   
2  0.177914  0.0  0.155180  0.0  0.058893  0.071822  0.0  0.061913  0.0   
3  0.000000  0.0  0.000000  0.0  0.000000  0.000000  0.0  0.000000  0.0   
4  0.000000  0.0  0.000000  0.0  0.000000  0.000000  0.0  0.000000  0.0   

          9  ...       139  140       141       142       143       144  \
0  0.000000  ...  0.064006  0.0  0.181533  0.000000  0.179817  0.813206   
1  0.003872  ...  0.054652  0.0  0.000000  0.010408  0.013600  0.802690   
2  0.000000  ...  0.000000  0.0  0.000000  0.000000  0.357638  0.859121   
3  0.000000  ...  0.000000  0.0  0.000000  0.000000  0.000000  0.885738   
4  0.000000  ...  0.000000  0.0  0.000000  0.073144  0.050103  0.836489   

       145        146          147  label  
0  24735.0  35.108418  6343.935950     

This code trains and evaluates several machine learning algorithms on the MNIST features stored in the CSV files. After loading the training and testing data, it separates the features (`X_train` and `X_test`) from the labels (`y_train` and `y_test`). It then defines a set of models with default settings: K-Nearest Neighbors (KNN), Random Forest, Support Vector Machine (SVM), Decision Tree, and Logistic Regression (with `max_iter=1000`). Each model is trained on the training data and evaluated on the test data using accuracy, weighted-average precision, and weighted-average F1 score. The code also prints a classification report for each model, which lists precision, recall, and F1 score per class. Finally, it prints a summary comparison of the models' performance. Error handling catches missing files and other unexpected errors.
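
To make the `average='weighted'` setting concrete, the sketch below computes a weighted precision by hand: each class's precision is weighted by its number of true samples (its support). The function `weighted_precision` and the toy labels are hypothetical and only illustrate the calculation. A class that is never predicted is given precision 0 here, which mirrors the value scikit-learn reports when it warns that precision is ill-defined.

```python
import numpy as np

def weighted_precision(y_true, y_pred, n_classes):
    """Support-weighted mean of the per-class precision values."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    total = 0.0
    for c in range(n_classes):
        predicted_c = y_pred == c
        support = np.sum(y_true == c)
        if predicted_c.sum() == 0:
            precision_c = 0.0  # class never predicted; counted as 0
        else:
            precision_c = np.sum(predicted_c & (y_true == c)) / predicted_c.sum()
        total += support * precision_c
    return total / len(y_true)

# Toy labels: class 2 is never predicted, so its precision counts as 0.
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 0, 0]
toy_score = weighted_precision(toy_true, toy_pred, n_classes=3)
print(round(toy_score, 4))  # 0.3333
```

Because the weights are the class supports, a large class with poor precision pulls the summary down more than a small one.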

In [None]:
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, f1_score, classification_report

try:
    train_df = pd.read_csv('mnist_train_features.csv')
    test_df = pd.read_csv('mnist_test_features.csv')

    X_train = train_df.drop('label', axis=1)
    y_train = train_df['label']
    X_test = test_df.drop('label', axis=1)
    y_test = test_df['label']

    models = {
        "KNN": KNeighborsClassifier(),
        "Random Forest": RandomForestClassifier(),
        "SVM": SVC(),
        "Decision Tree": DecisionTreeClassifier(),
        "Logistic Regression": LogisticRegression(max_iter=1000)
    }

    results = {}
    for name, model in models.items():
        print(f"Training {name}...")
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)

        results[name] = {
            "Accuracy": accuracy_score(y_test, y_pred),
            "Precision": precision_score(y_test, y_pred, average='weighted'),
            "F1 Score": f1_score(y_test, y_pred, average='weighted')
        }
        print(classification_report(y_test, y_pred))

    print("\nComparison of Model Performance:")
    for name, metrics in results.items():
        print(f"{name}:")
        for metric, value in metrics.items():
            print(f"  {metric}: {value:.4f}")
except FileNotFoundError:
    print("Error: One or both CSV files not found.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

Training KNN...
              precision    recall  f1-score   support

           0       0.29      0.52      0.37       980
           1       0.80      0.90      0.85      1135
           2       0.15      0.22      0.18      1032
           3       0.14      0.14      0.14      1010
           4       0.20      0.20      0.20       982
           5       0.12      0.08      0.10       892
           6       0.13      0.09      0.11       958
           7       0.25      0.24      0.24      1028
           8       0.12      0.07      0.09       974
           9       0.13      0.08      0.10      1009

    accuracy                           0.26     10000
   macro avg       0.23      0.25      0.24     10000
weighted avg       0.24      0.26      0.25     10000

Training Random Forest...
              precision    recall  f1-score   support

           0       0.96      0.98      0.97       980
           1       0.98      0.99      0.99      1135
           2       0.95      0.97   

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


              precision    recall  f1-score   support

           0       0.31      0.73      0.44       980
           1       0.88      0.89      0.89      1135
           2       0.21      0.17      0.19      1032
           3       0.15      0.18      0.16      1010
           4       0.18      0.44      0.25       982
           5       0.10      0.00      0.00       892
           6       0.36      0.00      0.01       958
           7       0.30      0.52      0.38      1028
           8       0.00      0.00      0.00       974
           9       0.13      0.03      0.05      1009

    accuracy                           0.31     10000
   macro avg       0.26      0.30      0.24     10000
weighted avg       0.27      0.31      0.25     10000

Training Decision Tree...
              precision    recall  f1-score   support

           0       0.86      0.89      0.88       980
           1       0.95      0.97      0.96      1135
           2       0.81      0.83      0.82      103

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


This cell installs Gradio, which is used to deploy the trained models as a web app.

In [None]:
!pip install gradio


Collecting gradio
  Downloading gradio-5.25.2-py3-none-any.whl.metadata (16 kB)
Collecting aiofiles<25.0,>=22.0 (from gradio)
  Downloading aiofiles-24.1.0-py3-none-any.whl.metadata (10 kB)
Collecting fastapi<1.0,>=0.115.2 (from gradio)
  Downloading fastapi-0.115.12-py3-none-any.whl.metadata (27 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.5.0-py3-none-any.whl.metadata (3.0 kB)
Collecting gradio-client==1.8.0 (from gradio)
  Downloading gradio_client-1.8.0-py3-none-any.whl.metadata (7.1 kB)
Collecting groovy~=0.1 (from gradio)
  Downloading groovy-0.1.2-py3-none-any.whl.metadata (6.1 kB)
Collecting pydub (from gradio)
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting python-multipart>=0.0.18 (from gradio)
  Downloading python_multipart-0.0.20-py3-none-any.whl.metadata (1.8 kB)
Collecting ruff>=0.9.3 (from gradio)
  Downloading ruff-0.11.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (25 kB)
Collecting safehttpx<0.2.0,>=0.1.6 (

This code creates a Gradio interface for a handwritten digit recognition system. Users upload an image of a handwritten digit, select a machine learning model, and receive a prediction of the digit. The `predict_image` function first processes the uploaded image: it resizes the image to 28x28 pixels, converts it to grayscale, and extracts features with the `extract_features` function defined earlier. Depending on the model selected by the user (SVM, KNN, Random Forest, Decision Tree, or Logistic Regression), it uses the corresponding trained model to predict the digit and returns the result as a string. If an error occurs during prediction, it returns an error message instead. The function relies on `cv2`, `np`, `extract_features`, and the trained models from the earlier cells, so those cells must be run first. The Gradio interface is set up with an image input, a radio button for model selection, and a text output to display the prediction. The app is launched with debugging enabled to aid troubleshooting.

In [None]:
import gradio as gr

def predict_image(image, model_choice):
    try:
        image = cv2.resize(image, (28, 28))
        image = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
        image_features = extract_features(np.array([image]))

        if model_choice == "SVM":
            prediction = models["SVM"].predict(image_features)[0]
        elif model_choice == "KNN":
            prediction = models["KNN"].predict(image_features)[0]
        elif model_choice == "Random Forest":
            prediction = models["Random Forest"].predict(image_features)[0]
        elif model_choice == "Decision Tree":
            prediction = models["Decision Tree"].predict(image_features)[0]
        elif model_choice == "Logistic Regression":
            prediction = models["Logistic Regression"].predict(image_features)[0]
        else:
            prediction = "Invalid model selected"

        return str(prediction)
    except Exception as e:
        return f"Error during prediction: {e}"

iface = gr.Interface(
    fn=predict_image,
    inputs=[
        gr.Image(type="numpy"),
        gr.Radio(["SVM", "KNN", "Random Forest", "Decision Tree", "Logistic Regression"], label="Choose a model")
    ],
    outputs="text",
    title="Handwritten Digit Recognition",
    description="Upload an image of a handwritten digit (0-9) and select a model to predict the digit."
)

iface.launch(debug=True)


It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://5b2bc619b53241fba6.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://5b2bc619b53241fba6.gradio.live
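
The model selection in `predict_image` could also be written as a dictionary lookup, since the radio button labels match the keys used when the models were trained. The following is a minimal sketch of that design, not the notebook's own code: `ConstantModel`, `demo_models`, and `predict_with` are hypothetical names, and the stand-in models return fixed digits so the example runs without any trained estimator.

```python
class ConstantModel:
    """Hypothetical stand-in for a fitted scikit-learn estimator."""

    def __init__(self, digit):
        self.digit = digit

    def predict(self, features):
        # Mimic the estimator API: one prediction per feature row.
        return [self.digit for _ in features]

# Stand-in for the `models` dictionary built in the training cell.
demo_models = {"KNN": ConstantModel(3), "SVM": ConstantModel(7)}

def predict_with(model_choice, features, available_models):
    """Look up the chosen model by name instead of branching per model."""
    model = available_models.get(model_choice)
    if model is None:
        return "Invalid model selected"
    return str(model.predict(features)[0])

one_row = [[0.0] * 148]  # one feature vector of the length used in this notebook
print(predict_with("SVM", one_row, demo_models))      # 7
print(predict_with("XGBoost", one_row, demo_models))  # Invalid model selected
```

With this layout, adding a model to the dictionary and to the radio button list is enough; no new branch is needed.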


