# Tomato Leaf Disease Classification using CNN (TensorFlow)

This notebook demonstrates how to classify tomato leaf diseases with a Convolutional Neural Network (CNN) in TensorFlow. It uses the [PlantVillage dataset](https://www.tensorflow.org/datasets/catalog/plant_village) from TensorFlow Datasets (TFDS), filtered to its tomato classes, which cover healthy and diseased tomato leaves.

In [None]:
# Install required libraries (uncomment to run inside the notebook)
# %pip install tensorflow tensorflow-datasets matplotlib

## 1. Load Dataset

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt

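# Load the full PlantVillage dataset (TFDS provides a single 'train' split)
train_ds, ds_info = tfds.load('plant_village', split='train', with_info=True, as_supervised=True)

# Get the label names and keep the tomato classes
labels = ds_info.features['label'].names
tomato_labels = [l for l in labels if l.lower().startswith('tomato_')]
print('Tomato disease labels:', tomato_labels)

# Original label IDs of the tomato classes within the full label list
tomato_ids = tf.constant([labels.index(l) for l in tomato_labels], dtype=tf.int64)

# Keep only tomato images
def filter_tomato(image, label):
    return tf.reduce_any(tf.equal(label, tomato_ids))

# Remap original IDs to 0..len(tomato_labels)-1 so that they index
# tomato_labels and match the model's output units
def remap_label(image, label):
    return image, tf.argmax(tf.cast(tf.equal(tomato_ids, label), tf.int32))

tomato_train_ds = train_ds.filter(filter_tomato).map(remap_label)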
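
Sparse categorical cross-entropy expects labels in the range `0..num_classes-1`, so tomato label IDs taken from the full PlantVillage label list have to be mapped to contiguous indices. A minimal pure-Python sketch of that mapping follows; `all_labels` is a made-up list in the style of PlantVillage names, not the real TFDS label list.

```python
# Hypothetical label list (illustrative only, not the real TFDS list).
all_labels = [
    "Apple___healthy",
    "Potato___Late_blight",
    "Tomato___Early_blight",
    "Tomato___Late_blight",
    "Tomato___healthy",
]

# Keep the tomato classes, preserving their order in the full list.
tomato_names = [name for name in all_labels if name.lower().startswith("tomato_")]

# Map each original ID to a contiguous ID in 0..len(tomato_names)-1.
original_to_contiguous = {
    all_labels.index(name): new_id for new_id, name in enumerate(tomato_names)
}
print(original_to_contiguous)  # {2: 0, 3: 1, 4: 2}

# A remapped ID indexes tomato_names directly.
original_id = 3
print(tomato_names[original_to_contiguous[original_id]])  # Tomato___Late_blight
```

Without this step, an original ID such as 3 would fall outside a three-class output layer and outside `tomato_names`.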

## 2. Preprocess Data

In [None]:
IMG_SIZE = 128
BATCH_SIZE = 32

def preprocess(image, label):
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    image = tf.cast(image, tf.float32) / 255.0
    return image, label

# Resize and normalize only; shuffling and batching happen after the train/validation split
tomato_train_ds = tomato_train_ds.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)

## 3. Visualize Sample Images

In [None]:
plt.figure(figsize=(10, 8))
# Take one batch of 8 images; batch_labels avoids shadowing the `labels` list
for images, batch_labels in tomato_train_ds.batch(8).take(1):
    for i in range(8):
        plt.subplot(2, 4, i+1)
        plt.imshow(images[i])
        plt.title(tomato_labels[batch_labels[i].numpy()])
        plt.axis('off')
plt.tight_layout()
plt.show()

## 4. Split Dataset into Train/Validation

In [None]:
# Note: TFDS PlantVillage provides only a 'train' split, so we split manually.
# Count the tomato images with one full pass (the filtered size is not known in advance)
total_count = sum(1 for _ in tomato_train_ds)
val_count = int(0.2 * total_count)

# Split before shuffling: the unshuffled order is the same on every pass
# (TFDS does not shuffle files by default), so take/skip give disjoint sets.
# Shuffling first would reshuffle each epoch and leak validation images into training.
val_ds = tomato_train_ds.take(val_count).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
train_ds = tomato_train_ds.skip(val_count).shuffle(1000).batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
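
`take` and `skip` only produce disjoint subsets when the upstream order is the same on every pass. A `shuffle` placed before them reshuffles on each iteration by default, which lets validation elements drift into the training set across epochs. The toy sketch below uses `tf.data.Dataset.range(10)` as a stand-in for the image dataset and splits before shuffling; the names `toy_ds` and `toy_val_count` are illustrative.

```python
import tensorflow as tf

# Toy stand-in for the image dataset: the integers 0..9 in a fixed order.
toy_ds = tf.data.Dataset.range(10)
toy_val_count = 2

# Split first, while the element order is still deterministic.
toy_val = toy_ds.take(toy_val_count)
# Shuffle only the training part; reshuffling each epoch is then harmless.
toy_train = toy_ds.skip(toy_val_count).shuffle(8)

val_items = {int(x) for x in toy_val}
for epoch in range(3):
    train_items = {int(x) for x in toy_train}
    # The training set never contains a validation element, in any epoch.
    assert val_items.isdisjoint(train_items)

print(sorted(val_items))  # [0, 1]
```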

## 5. Build CNN Model

In [None]:
from tensorflow.keras import layers, models

num_classes = len(tomato_labels)

model = models.Sequential([
    # Explicit Input layer; recent Keras versions warn about input_shape on the first layer
    layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
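
The shapes and parameter counts reported by `model.summary()` follow from simple arithmetic: each 3x3 convolution with the default `'valid'` padding shrinks the side length by 2, and each 2x2 max-pool halves it (rounding down). The sketch below reproduces that calculation in plain Python, assuming ten tomato classes (`assumed_classes` is a placeholder; the notebook derives the real value from `len(tomato_labels)`).

```python
def conv_out(size, kernel=3):
    # 'valid' padding with stride 1, the Keras Conv2D defaults
    return size - kernel + 1

def pool_out(size, pool=2):
    # MaxPooling2D with pool size 2 and stride 2 floors odd sizes
    return size // pool

size, channels = 128, 3  # IMG_SIZE and RGB channels
total_params = 0
for filters in (32, 64, 128):
    total_params += (3 * 3 * channels + 1) * filters  # kernel weights + one bias per filter
    size = pool_out(conv_out(size))
    channels = filters
    print(f"after conv+pool with {filters} filters: {size}x{size}x{channels}")

flat = size * size * channels
assumed_classes = 10  # assumption; use len(tomato_labels) in practice
total_params += (flat + 1) * 128              # Dense(128)
total_params += (128 + 1) * assumed_classes   # output layer
print(flat, total_params)  # 25088 3305930
```

Most of the parameters sit in the first dense layer, which is why the flattened feature map size matters so much for model size.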

## 6. Train the Model

In [None]:
EPOCHS = 10
history = model.fit(train_ds, validation_data=val_ds, epochs=EPOCHS)
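
A fixed epoch count can stop too early or keep training after validation loss has stopped improving. One option is Keras's built-in `EarlyStopping` callback, sketched below; `patience=3` and the epoch cap in the commented usage are illustrative assumptions, not tuned settings.

```python
import tensorflow as tf

# Stop when val_loss has not improved for 3 consecutive epochs (assumed value)
# and roll back to the weights from the best epoch.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=3,
    restore_best_weights=True,
)

# Possible use with the model and datasets from this notebook
# (commented out so the snippet runs on its own):
# history = model.fit(train_ds, validation_data=val_ds,
#                     epochs=30, callbacks=[early_stop])
```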

## 7. Evaluate Performance

In [None]:
# Plot training history
plt.figure(figsize=(12,5))
plt.subplot(1,2,1)
plt.plot(history.history['accuracy'], label='Train Acc')
plt.plot(history.history['val_accuracy'], label='Val Acc')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.title('Accuracy')

plt.subplot(1,2,2)
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Val Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.title('Loss')
plt.show()
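
Accuracy curves do not show which classes get confused with each other. A minimal NumPy sketch of a confusion matrix with per-class recall follows; `y_true` and `y_pred` are made-up stand-ins for the labels collected from `val_ds` and the arg-max of the model's predictions.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # Unbuffered add so repeated (true, pred) pairs are all counted
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Made-up labels and predictions for three classes
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(cm)
print(per_class_recall)
```

A low recall for one row points to a disease class that the model often mistakes for another.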

## 8. Discuss Observations

- **Model Performance:** Compare the accuracy and loss curves above. High training and validation accuracy that stay close together indicate that the CNN classifies tomato leaf diseases effectively on this dataset.
- **Overfitting:** If validation accuracy plateaus or validation loss rises while training accuracy keeps improving, the model is overfitting, and stronger regularization (higher dropout, data augmentation) may be needed.
- **Next Steps:** Test with real-world images, try deeper architectures (ResNet, EfficientNet), and apply augmentation for robustness.
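
The augmentation mentioned in the points above could be sketched with Keras preprocessing layers. The choice of layers and the rotation factor are assumptions for illustration, and `dummy_batch` stands in for a batch of preprocessed leaf images.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Random horizontal flips and small rotations (assumed settings, not tuned).
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
])

# Dummy batch standing in for preprocessed images scaled to [0, 1].
dummy_batch = tf.random.uniform((4, 128, 128, 3))
# The random layers are active only when training=True.
augmented = augment(dummy_batch, training=True)
print(augmented.shape)

# Possible use on the training set only (validation data stays untouched):
# train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))
```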

**This workflow can be adapted for other crops, fruits, or diseases by changing the dataset or label filters.**
