# Program 1: Plant Disease Classification using a CNN

This program designs and implements a deep learning model to classify plant leaf diseases. The model is a Convolutional Neural Network (CNN) built with TensorFlow and Keras. The dataset is `plant_village` from TensorFlow Datasets, which contains images of healthy and diseased leaves from several crop species, tomato among them. The code below trains on every class in the dataset; it does not filter for tomato.

In [1]:
import tensorflow as tf
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
import numpy as np


### 1. Load and Preprocess the Dataset

In [2]:
(train_ds, test_ds), ds_info = tfds.load(
    'plant_village',
    split=['train[:80%]', 'train[80%:]'],
    with_info=True,
    as_supervised=True,
)

num_classes = ds_info.features['label'].num_classes
class_names = ds_info.features['label'].names

print(f"Number of classes: {num_classes}")
print(f"Class names: {class_names}")

IMG_SIZE = 224

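def preprocess_image(image, label):
    # Resize every image to a fixed square input size.
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    # Scale pixel values from [0, 255] to [0, 1].
    image = tf.cast(image, tf.float32) / 255.0
    return image, label

# Map in parallel, batch, and prefetch so input processing overlaps with training.
train_ds = train_ds.map(preprocess_image, num_parallel_calls=tf.data.AUTOTUNE).batch(32).prefetch(tf.data.AUTOTUNE)
test_ds = test_ds.map(preprocess_image, num_parallel_calls=tf.data.AUTOTUNE).batch(32).prefetch(tf.data.AUTOTUNE)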

Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /home/abhijit-42/tensorflow_datasets/plant_village/1.0.2...

Dl Completed...:   0%|          | 0/1 [00:41<?, ? url/s]

KeyboardInterrupt: 
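
As a quick sanity check on the split and batching above, the number of batches per epoch can be worked out by hand. The sketch below assumes the `plant_village` train split holds 54,303 images (the figure listed in the TFDS catalog, not printed anywhere in this notebook) and that percent slicing rounds to the nearest example. `batches_per_epoch` is a hypothetical helper, not part of TensorFlow.

```python
import math

TOTAL_IMAGES = 54_303  # assumption: size of the plant_village 'train' split per the TFDS catalog
BATCH_SIZE = 32        # matches .batch(32) in the cell above


def batches_per_epoch(num_examples, batch_size):
    """Batches produced by .batch(batch_size) when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)


train_examples = round(TOTAL_IMAGES * 0.80)    # 'train[:80%]'
test_examples = TOTAL_IMAGES - train_examples  # 'train[80%:]'

print(train_examples, test_examples)
print(batches_per_epoch(train_examples, BATCH_SIZE))
```

Under these assumptions the training split yields 1358 batches, which agrees with the `1358/1358` step counter in the training log of Section 3.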

### 2. Build the CNN Model

In [None]:
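model = tf.keras.models.Sequential([
    # Declare the input shape with an explicit Input layer rather than passing
    # input_shape to the first Conv2D, which newer Keras versions warn about.
    tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3)),
    # Three Conv2D + MaxPooling2D stages with 32, 64, and 128 filters.
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    # Classifier head: flatten the feature maps, then two dense layers.
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(num_classes, activation='softmax')
])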

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()

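
The output of `model.summary()` is not shown above, so the following is a rough hand calculation of the feature-map sizes and parameter count for the architecture in this cell. It assumes 38 classes, the number TFDS documents for `plant_village`; the notebook never prints `num_classes`. All helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after max pooling with stride equal to the pool size."""
    return size // pool


def conv_params(in_ch, out_ch, kernel=3):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units


NUM_CLASSES = 38  # assumption: class count documented by TFDS for plant_village

size, channels, total = 224, 3, 0
for filters in (32, 64, 128):
    total += conv_params(channels, filters)
    size = pool_out(conv_out(size))  # Conv2D followed by MaxPooling2D
    channels = filters

flat = size * size * channels  # length of the vector produced by Flatten
total += dense_params(flat, 128) + dense_params(128, NUM_CLASSES)

print(size, flat, total)
```

With these assumptions the final feature map is 26x26x128, `Flatten` produces 86,528 values, and the model has about 11.2 million parameters, roughly 99% of them in the first `Dense` layer.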


### 3. Train and Evaluate the Model

In [None]:
history = model.fit(
    train_ds,
    epochs=10,
    validation_data=test_ds
)

test_loss, test_accuracy = model.evaluate(test_ds)
print(f"\nTest Accuracy: {test_accuracy*100:.2f}%")

Epoch 1/10
1358/1358 ━━━━━━━━━━━━━━━━━━━━ 103s 69ms/step - accuracy: 0.5583 - loss: 1.6428 - val_accuracy: 0.7324 - val_loss: 0.9218
Epoch 2/10
1358/1358 ━━━━━━━━━━━━━━━━━━━━ 68s 50ms/step - accuracy: 0.8739 - loss: 0.4025 - val_accuracy: 0.8816 - val_loss: 0.4161
Epoch 3/10
1358/1358 ━━━━━━━━━━━━━━━━━━━━ 70s 51ms/step - accuracy: 0.9298 - loss: 0.2205 - val_accuracy: 0.8883 - val_loss: 0.4046
Epoch 4/10
1358/1358 ━━━━━━━━━━━━━━━━━━━━ 68s 50ms/step - accuracy: 0.9520 - loss: 0.1438 - val_accuracy: 0.8986 - val_loss: 0.3781
Epoch 5/10
1358/1358 ━━━━━━━━━━━━━━━━━━━━ 83s 51ms/step - accuracy: 0.9679 - loss: 0.0956 - val_accuracy: 0.9140 - val_loss: 0.3712
Epoch 6/10
1358/1358 ━━━━━━━━━━━━━━━━━━━━ 69s 51ms/step - accuracy: 0.9734 - loss: 0.0829 - val_accuracy: 0.9093 - val_loss: 0.3975
Epo

### 4. Observations

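The CNN was trained on the `plant_village` leaf images. The training log above is truncated after epoch 6, so the final test accuracy is not shown. By epoch 6, training accuracy had reached 0.9734 while validation accuracy was 0.9093. Validation loss bottomed out at 0.3712 in epoch 5 and rose to 0.3975 in epoch 6 even as training loss kept falling, which is an early sign of overfitting. Plotting the training and validation accuracy and loss from `history.history` over epochs makes this divergence easier to see. Note that `test_ds` serves as both the validation data and the final test set, so the test accuracy is not measured on data held out from the whole process. Further improvements could be made by using a more complex model architecture, data augmentation, or transfer learning.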
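
To make the overfitting check concrete, here is a small sketch that works directly on the numbers from the six complete epochs in the training log above. The dictionary mirrors the layout of `history.history`; `accuracy_gaps` and `best_epoch` are hypothetical helpers, not Keras functions.

```python
# Metrics copied from the six complete epochs in the training log above.
history = {
    "accuracy":     [0.5583, 0.8739, 0.9298, 0.9520, 0.9679, 0.9734],
    "val_accuracy": [0.7324, 0.8816, 0.8883, 0.8986, 0.9140, 0.9093],
    "loss":         [1.6428, 0.4025, 0.2205, 0.1438, 0.0956, 0.0829],
    "val_loss":     [0.9218, 0.4161, 0.4046, 0.3781, 0.3712, 0.3975],
}


def accuracy_gaps(hist):
    """Training accuracy minus validation accuracy for each epoch."""
    return [round(t - v, 4) for t, v in zip(hist["accuracy"], hist["val_accuracy"])]


def best_epoch(hist):
    """1-based index of the epoch with the lowest validation loss."""
    losses = hist["val_loss"]
    return losses.index(min(losses)) + 1


gaps = accuracy_gaps(history)
print(gaps)
print(best_epoch(history))
```

A gap that keeps widening after the validation loss has stopped improving is the pattern to look for. The same dictionary could be passed to a plotting library to draw the curves.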

# Program 2: Vision Transformer (ViT) for CIFAR-10 Classification

This program implements a Vision Transformer (ViT) in TensorFlow/Keras to classify images from the CIFAR-10 dataset. The ViT model is built with a patch size of 8x8, 4 transformer encoder layers, and multi-head self-attention. The model is trained for 10 epochs, and its training and test accuracy are reported.

In [None]:
import tensorflow as tf
from tensorflow.keras import layers

### 1. Load and Preprocess the CIFAR-10 Dataset

In [None]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)

print(f"x_train shape: {x_train.shape} - y_train shape: {y_train.shape}")
print(f"x_test shape: {x_test.shape} - y_test shape: {y_test.shape}")

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 4s 0us/step
x_train shape: (50000, 32, 32, 3) - y_train shape: (50000, 10)
x_test shape: (10000, 32, 32, 3) - y_test shape: (10000, 10)
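
The cell above converts the integer labels to one-hot vectors, which is why Program 2 is later compiled with `CategoricalCrossentropy`, whereas Program 1 keeps integer labels and uses `sparse_categorical_crossentropy`. The sketch below, in plain Python with hypothetical helper names (none of them are Keras APIs), illustrates that the two formulations give the same loss value for a single example when both start from raw logits.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def one_hot(label, num_classes):
    """One-hot vector for an integer class label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def categorical_ce(one_hot_target, logits):
    """Cross-entropy for a one-hot target, computed from logits."""
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(one_hot_target, log_probs))


def sparse_categorical_ce(label, logits):
    """Cross-entropy for an integer label, computed from logits."""
    return -log_softmax(logits)[label]


logits = [2.0, 0.5, -1.0]  # illustrative scores for a 3-class problem
label = 0

loss_one_hot = categorical_ce(one_hot(label, 3), logits)
loss_sparse = sparse_categorical_ce(label, logits)
print(loss_one_hot, loss_sparse)
```

The choice between the two is mostly about label storage: one-hot vectors take more memory, while integer labels need the sparse variant of the loss.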


### 2. Implement the Vision Transformer

In [None]:
IMAGE_SIZE = 32
PATCH_SIZE = 8
NUM_PATCHES = (IMAGE_SIZE // PATCH_SIZE) ** 2
PROJECTION_DIM = 64
NUM_HEADS = 4
TRANSFORMER_UNITS = [PROJECTION_DIM * 2, PROJECTION_DIM]
TRANSFORMER_LAYERS = 4
MLP_HEAD_UNITS = [2048, 1024]

class Patches(layers.Layer):
    """Splits each image into non-overlapping square patches and flattens them."""

    def __init__(self, patch_size):
        super().__init__()
        self.patch_size = patch_size

    def call(self, images):
        batch_size = tf.shape(images)[0]
        # Window size equals stride, so the patches tile the image without overlap.
        patches = tf.image.extract_patches(
            images=images,
            sizes=[1, self.patch_size, self.patch_size, 1],
            strides=[1, self.patch_size, self.patch_size, 1],
            rates=[1, 1, 1, 1],
            padding="VALID",
        )
        # Each patch is flattened to patch_size * patch_size * channels values.
        patch_dims = patches.shape[-1]
        # Result shape: (batch, num_patches, patch_dims).
        patches = tf.reshape(patches, [batch_size, -1, patch_dims])
        return patches

class PatchEncoder(layers.Layer):
    """Projects flattened patches linearly and adds a learned position embedding."""

    def __init__(self, num_patches, projection_dim):
        super().__init__()
        self.num_patches = num_patches
        self.projection = layers.Dense(units=projection_dim)
        self.position_embedding = layers.Embedding(
            input_dim=num_patches, output_dim=projection_dim
        )

    def call(self, patch):
        # One embedding vector per patch index, broadcast across the batch.
        positions = tf.range(start=0, limit=self.num_patches, delta=1)
        encoded = self.projection(patch) + self.position_embedding(positions)
        return encoded

def mlp(x, hidden_units, dropout_rate):
    for units in hidden_units:
        x = layers.Dense(units, activation=tf.nn.gelu)(x)
        x = layers.Dropout(dropout_rate)(x)
    return x

def create_vit_classifier():
    inputs = layers.Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
    patches = Patches(PATCH_SIZE)(inputs)
    encoded_patches = PatchEncoder(NUM_PATCHES, PROJECTION_DIM)(patches)

    for _ in range(TRANSFORMER_LAYERS):
        x1 = layers.LayerNormalization(epsilon=1e-6)(encoded_patches)
        attention_output = layers.MultiHeadAttention(
            num_heads=NUM_HEADS, key_dim=PROJECTION_DIM, dropout=0.1
        )(x1, x1)
        x2 = layers.Add()([attention_output, encoded_patches])
        x3 = layers.LayerNormalization(epsilon=1e-6)(x2)
        x3 = mlp(x3, hidden_units=TRANSFORMER_UNITS, dropout_rate=0.1)
        encoded_patches = layers.Add()([x3, x2])

    representation = layers.LayerNormalization(epsilon=1e-6)(encoded_patches)
    representation = layers.Flatten()(representation)
    representation = layers.Dropout(0.5)(representation)
    features = mlp(representation, hidden_units=MLP_HEAD_UNITS, dropout_rate=0.5)
    logits = layers.Dense(10)(features)
    model = tf.keras.Model(inputs=inputs, outputs=logits)
    return model
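
To illustrate what the `Patches` layer produces, here is a NumPy-only sketch of the same non-overlapping patch split. It is an illustration under the assumption that the image height and width are multiples of the patch size, not a replacement for `tf.image.extract_patches`; `extract_patches_np` is a hypothetical name.

```python
import numpy as np


def extract_patches_np(images, patch_size):
    """Split (batch, H, W, C) images into flattened non-overlapping patches.

    Returns an array of shape (batch, num_patches, patch_size * patch_size * C),
    with patches ordered row by row across the image.
    """
    b, h, w, c = images.shape
    gh, gw = h // patch_size, w // patch_size
    # Drop any border that does not fill a whole patch (like padding="VALID").
    x = images[:, : gh * patch_size, : gw * patch_size, :]
    # Separate the grid index from the within-patch index on each spatial axis.
    x = x.reshape(b, gh, patch_size, gw, patch_size, c)
    # Reorder to (batch, grid_row, grid_col, patch_row, patch_col, channel).
    x = x.transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(b, gh * gw, patch_size * patch_size * c)


# Two dummy CIFAR-10-sized images with distinct values at every position.
images = np.arange(2 * 32 * 32 * 3, dtype=np.float32).reshape(2, 32, 32, 3)
patches = extract_patches_np(images, patch_size=8)
print(patches.shape)
```

For 32x32x3 inputs and 8x8 patches this gives 16 patches of 192 values each, which is the sequence that `PatchEncoder` projects to `PROJECTION_DIM`.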

### 3. Build, Train, and Evaluate the ViT Model

In [None]:
vit_classifier = create_vit_classifier()
vit_classifier.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

history = vit_classifier.fit(
    x=x_train,
    y=y_train,
    batch_size=64,
    epochs=10,
    validation_data=(x_test, y_test),
)

train_loss, train_accuracy = vit_classifier.evaluate(x_train, y_train)
test_loss, test_accuracy = vit_classifier.evaluate(x_test, y_test)

print(f"\nTraining Accuracy: {train_accuracy*100:.2f}%")
print(f"Test Accuracy: {test_accuracy*100:.2f}%")

Epoch 1/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 51s 31ms/step - accuracy: 0.2002 - loss: 2.2493 - val_accuracy: 0.4207 - val_loss: 1.6108
Epoch 2/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 10s 10ms/step - accuracy: 0.3816 - loss: 1.6897 - val_accuracy: 0.4828 - val_loss: 1.4406
Epoch 3/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 10s 10ms/step - accuracy: 0.4462 - loss: 1.5362 - val_accuracy: 0.5147 - val_loss: 1.3379
Epoch 4/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 10s 10ms/step - accuracy: 0.4835 - loss: 1.4513 - val_accuracy: 0.5382 - val_loss: 1.2657
Epoch 5/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 10s 10ms/step - accuracy: 0.5085 - loss: 1.3702 - val_accuracy: 0.5496 - val_loss: 1.2438
Epoch 6/10
782/782 ━━━━━━━━━━━━━━━━━━━━ 10s 10ms/step - accuracy: 0.5262 - loss: 1.3234 - val_accuracy: 0.5714 - val_loss: 1.1995
Epoch 7/10
7

### 4. Inferences

The Vision Transformer was implemented and trained on the CIFAR-10 dataset. The log above is truncated during epoch 7, so the final training and test accuracies are not shown. Over the six complete epochs, validation accuracy rose steadily from 0.4207 to 0.5714 and validation loss fell from 1.6108 to 1.1995, so the model was still learning when the log ends. Validation accuracy stays above training accuracy in every logged epoch, most likely because dropout is active only during training and the training metric is averaged over each epoch while the weights are still improving. These numbers show that a small Transformer trained from scratch can learn CIFAR-10, although the accuracy reached in the logged epochs is modest. For a lightweight mobile application, the trade-off between the ViT's accuracy and its computational cost, compared with a traditional CNN, would be an important consideration.
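
One rough input to that trade-off is model size. The sketch below estimates the parameter count of the ViT defined in Section 2 by hand, using the hyperparameters from that cell. It assumes that Keras `MultiHeadAttention` uses biased query, key, value, and output projections with the value dimension equal to `key_dim` (its default), and all helper names are hypothetical. Parameter count is only a proxy: it says nothing about activation memory or latency on a particular device.

```python
IMAGE_SIZE, PATCH_SIZE, CHANNELS = 32, 8, 3
PROJECTION_DIM, NUM_HEADS, KEY_DIM = 64, 4, 64
TRANSFORMER_UNITS = [PROJECTION_DIM * 2, PROJECTION_DIM]
TRANSFORMER_LAYERS = 4
MLP_HEAD_UNITS = [2048, 1024]
NUM_CLASSES = 10


def dense(in_units, out_units):
    """Weights plus biases of a fully connected layer."""
    return in_units * out_units + out_units


def mlp(in_units, hidden_units):
    """Total parameters of a stack of dense layers, and its output width."""
    total = 0
    for units in hidden_units:
        total += dense(in_units, units)
        in_units = units
    return total, in_units


def layer_norm(dim):
    """Scale and offset vectors of a LayerNormalization layer."""
    return 2 * dim


def attention(dim, heads, key_dim):
    """Query, key, and value projections plus the output projection."""
    qkv = 3 * (dim * heads * key_dim + heads * key_dim)
    out = heads * key_dim * dim + dim
    return qkv + out


num_patches = (IMAGE_SIZE // PATCH_SIZE) ** 2
patch_dim = PATCH_SIZE * PATCH_SIZE * CHANNELS

# Patch projection plus the learned position embedding table.
encoder = dense(patch_dim, PROJECTION_DIM) + num_patches * PROJECTION_DIM

# One transformer block: two layer norms, attention, and a two-layer MLP.
block_mlp, _ = mlp(PROJECTION_DIM, TRANSFORMER_UNITS)
block = (2 * layer_norm(PROJECTION_DIM)
         + attention(PROJECTION_DIM, NUM_HEADS, KEY_DIM)
         + block_mlp)

# Final layer norm, MLP head on the flattened tokens, and the logits layer.
head_mlp, head_out = mlp(num_patches * PROJECTION_DIM, MLP_HEAD_UNITS)
head = layer_norm(PROJECTION_DIM) + head_mlp + dense(head_out, NUM_CLASSES)

total = encoder + TRANSFORMER_LAYERS * block + head
print(num_patches, patch_dim, block, total)
```

Under these assumptions the model has roughly 4.6 million parameters, and most of them sit in the MLP head that follows `Flatten` rather than in the four transformer blocks. The result can be checked against `vit_classifier.summary()`.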