### Step 0: Import Kaggle dataset into Colab
This cell downloads the PlantVillage dataset split (70/15/15) directly from Kaggle using `kagglehub`.  
It ensures that the dataset is available in the Colab environment for training and testing.  
⚠️ Note: Colab’s environment differs from Kaggle’s, so some libraries may not be preinstalled.


In [None]:
# Run this cell first: it downloads the Kaggle dataset and stores its local path,
# which Step 3 uses to locate the train/valid/test folders.
# Note: this environment differs from Kaggle's Python environment, so some
# libraries used by the notebook may be missing.
import kagglehub
rabieoudghiri_plantvillage_split_70_15_15_path = kagglehub.dataset_download('rabieoudghiri/plantvillage-split-70-15-15')

print('Data source import complete.')


Downloading from https://www.kaggle.com/api/v1/datasets/download/rabieoudghiri/plantvillage-split-70-15-15?dataset_version_number=1...


100%|██████████| 485M/485M [00:24<00:00, 20.4MB/s]

Extracting files...





Data source import complete.


In [None]:
!pip install tensorflow



### Step 1: Import libraries and set random seed
In this step, we import the necessary libraries for building and training our deep learning model.  
We use TensorFlow/Keras for model creation, ResNet50 as the base architecture, and utility modules for preprocessing images.  
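We also set a random seed to make runs more repeatable: NumPy and TensorFlow random operations start from the same state each time the notebook is run, although some GPU operations can still introduce small differences between runs.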


In [None]:
# Import necessary libraries
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
from tensorflow.keras.optimizers import Adam
import numpy as np

# Set the seed for reproducibility
seed = 42
np.random.seed(seed)
tf.random.set_seed(seed)


### Step 2: Define dataset directories
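We specify the paths to the training, validation, and test sets.  
These directories contain the PlantVillage leaf images split into 70% training, 15% validation, and 15% test.  
The paths below are where the dataset lives inside a Kaggle notebook; when running in Colab, Step 3 replaces them with the path returned by `kagglehub`.  
Keras will later use these paths to load images and generate batches for model training and evaluation.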


In [None]:
# Define dataset directories
train_dir = '/kaggle/input/plantvillage-split-70-15-15/plantvillage_split/train'
val_dir = '/kaggle/input/plantvillage-split-70-15-15/plantvillage_split/valid'
test_dir = '/kaggle/input/plantvillage-split-70-15-15/plantvillage_split/test'
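`flow_from_directory` expects each split directory to contain one subfolder per class. As a quick sanity check, a small helper along these lines can count the images in each class folder. This is a sketch only: the function name `count_images_per_class` and the list of file extensions are assumptions for illustration, not part of the original notebook.

```python
import os

# Assumed set of image extensions; adjust to match the dataset's files.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp')

def count_images_per_class(split_dir):
    """Return {class_name: image_count} for one split directory."""
    counts = {}
    for class_name in sorted(os.listdir(split_dir)):
        class_path = os.path.join(split_dir, class_name)
        if not os.path.isdir(class_path):
            continue  # skip stray files at the top level of the split
        counts[class_name] = sum(
            1 for name in os.listdir(class_path)
            if name.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts
```

Call it as `count_images_per_class(train_dir)` once `train_dir` points at the downloaded data; the number of keys should match the class count Keras reports.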


### Step 3: Prepare data generators
We redefine the dataset directories using the path downloaded from KaggleHub.  
Then, we create `ImageDataGenerator` objects for training, validation, and testing.  
These generators handle preprocessing (using ResNet50’s `preprocess_input`) and automatically load images from the dataset folders.  
Finally, we build data loaders (`flow_from_directory`) that will feed batches of images into the model during training and evaluation.
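To make the preprocessing step concrete, here is a rough NumPy sketch of what ResNet50's `preprocess_input` does in its default mode: it reorders the channels from RGB to BGR and subtracts the ImageNet per-channel means, without rescaling to [0, 1]. The function name `resnet50_style_preprocess` is hypothetical, and the sketch is an approximation for illustration rather than the library's own code.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by "caffe"-style preprocessing.
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def resnet50_style_preprocess(rgb_batch):
    """Approximate preprocessing for an RGB batch of shape (N, H, W, 3) in [0, 255]."""
    bgr = rgb_batch[..., ::-1].astype(np.float32)  # RGB -> BGR
    return bgr - IMAGENET_MEAN_BGR                 # zero-center each channel

# A tiny all-white "image": every channel starts at 255.
batch = np.full((1, 2, 2, 3), 255.0, dtype=np.float32)
out = resnet50_style_preprocess(batch)
print(out.shape)     # shape is unchanged
print(out[0, 0, 0])  # 255 minus each channel mean
```

Because the pretrained weights were trained on inputs prepared this way, the same function is applied to the training, validation, and test generators.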


In [None]:
from tensorflow.keras.applications.resnet50 import preprocess_input
import os # Import the os module for path manipulation

# Redefine dataset directories with the correct path obtained from kagglehub
base_data_path = rabieoudghiri_plantvillage_split_70_15_15_path
train_dir = os.path.join(base_data_path, 'plantvillage_split', 'train')
val_dir = os.path.join(base_data_path, 'plantvillage_split', 'valid')
test_dir = os.path.join(base_data_path, 'plantvillage_split', 'test')

train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
val_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

train_generator = train_datagen.flow_from_directory(train_dir, target_size=(224,224), batch_size=32, class_mode='categorical')
val_generator = val_datagen.flow_from_directory(val_dir, target_size=(224,224), batch_size=32, class_mode='categorical')
test_generator = test_datagen.flow_from_directory(test_dir, target_size=(224,224), batch_size=32, class_mode='categorical')

Found 37997 images belonging to 38 classes.
Found 8129 images belonging to 38 classes.
Found 8180 images belonging to 38 classes.
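The counts reported above can be checked against the intended 70/15/15 split, and they also determine how many batches each generator yields per pass. The arithmetic below is a small worked example using only the numbers printed by `flow_from_directory` and the batch size of 32.

```python
import math

split_sizes = {'train': 37997, 'valid': 8129, 'test': 8180}
batch_size = 32
total = sum(split_sizes.values())

# The last batch of a split may be smaller than batch_size, hence ceil.
steps_per_split = {name: math.ceil(n / batch_size) for name, n in split_sizes.items()}

for name, n in split_sizes.items():
    print(f"{name}: {n} images, {n / total:.1%} of total, {steps_per_split[name]} batches")
```

The 1188 training batches and 256 test batches match the step counts shown in the progress bars during training and evaluation.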


### Step 4: Build the transfer learning model
We use ResNet50 (pretrained on ImageNet) as the base model for feature extraction.  
The top (fully connected) layers are removed, and we freeze the base layers so they are not retrained.  
On top of ResNet50, we add custom layers:
- Global Average Pooling to reduce feature maps into a single vector.
- Dense layers (128 and 64 units) with ReLU activation for learning task-specific features.
- Final Dense layer with softmax activation to classify images into the PlantVillage disease categories.


In [None]:
# Load ResNet50 base model
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224,224,3))
base_model.trainable = False  # freeze base layers

# Add custom layers
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(128, activation='relu')(x)
x = Dense(64, activation='relu')(x)
predictions = Dense(train_generator.num_classes, activation='softmax')(x)

# Build model
model = Model(inputs=base_model.input, outputs=predictions)


Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
[1m94765736/94765736[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 0us/step
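Since the base model is frozen, only the three Dense layers are trained. Their size can be worked out by hand, assuming the pooled ResNet50 feature vector has 2048 channels and the dataset has 38 classes (as reported by the generators). The helper `dense_params` is a hypothetical name used only for this calculation.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

feature_dim = 2048  # channels after GlobalAveragePooling2D on ResNet50 features
num_classes = 38    # from "Found ... images belonging to 38 classes."

head = [(feature_dim, 128), (128, 64), (64, num_classes)]
trainable = sum(dense_params(i, u) for i, u in head)
print(trainable)  # 272998
```

Calling `model.summary()` should report a trainable parameter count consistent with this figure, with the ResNet50 weights listed as non-trainable.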


### Step 5: Compile the model
We compile the model by specifying:
- **Optimizer:** Adam with a learning rate of 0.0001 (`1e-4`), a conservative value that trades some training speed for more stable updates.
- **Loss function:** Categorical crossentropy, suitable for multi-class classification problems.
- **Metrics:** Accuracy, to monitor how well the model is performing during training and evaluation.


In [None]:
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
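To show what the loss and metric measure, here is a minimal NumPy sketch of categorical crossentropy and accuracy on two made-up predictions over three classes. The arrays are invented for illustration, and the clipping constant `eps` is an assumption that stands in for the numerical safeguards a real implementation uses.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss: -sum(y_true * log(y_pred)) over the class axis."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# One-hot labels and softmax-like predictions (illustrative values only).
y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
y_pred = np.array([[0.1, 0.7, 0.2], [0.3, 0.4, 0.3]])

losses = categorical_crossentropy(y_true, y_pred)
accuracy = np.mean(np.argmax(y_pred, axis=-1) == np.argmax(y_true, axis=-1))
print(losses, accuracy)
```

The first sample is classified correctly with moderate confidence and gets a small loss; the second is misclassified and gets a larger one, so the loss keeps providing a training signal even when accuracy alone would not distinguish near misses from confident errors.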


### Step 6: Train the model
We train the model using the training data generator for 20 epochs.  
During training, the model learns to classify leaf images into disease categories.  
We also provide the validation generator so the model’s performance can be monitored on unseen data after each epoch.  
The training history (loss and accuracy values) will be stored in the `history` object for later visualization.


In [None]:
history = model.fit(train_generator,
                    epochs=20,
                    validation_data=val_generator)


Epoch 1/20
1188/1188 ━━━━━━━━━━━━━━━━━━━━ 145s 111ms/step - accuracy: 0.5584 - loss: 1.8523 - val_accuracy: 0.8945 - val_loss: 0.4256
Epoch 2/20
1188/1188 ━━━━━━━━━━━━━━━━━━━━ 129s 106ms/step - accuracy: 0.9132 - loss: 0.3468 - val_accuracy: 0.9363 - val_loss: 0.2389
Epoch 3/20
1188/1188 ━━━━━━━━━━━━━━━━━━━━ 125s 105ms/step - accuracy: 0.9466 - loss: 0.2042 - val_accuracy: 0.9519 - val_loss: 0.1820
Epoch 4/20
1188/1188 ━━━━━━━━━━━━━━━━━━━━ 126s 106ms/step - accuracy: 0.9626 - loss: 0.1471 - val_accuracy: 0.9546 - val_loss: 0.1570
Epoch 5/20
1188/1188 ━━━━━━━━━━━━━━━━━━━━ 142s 106ms/step - accuracy: 0.9715 - loss: 0.1131 - val_accuracy: 0.9592 - val_loss: 0.1410
Epoch 6/20
1188/1188 ━━━━━━━━━━━━━━━━━━━━ 125s 106ms/step - accuracy: 0.9767 - loss: 0.0909 - val_accuracy: 0.9612 - val_loss:
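The per-epoch values are also available as `history.history`, a dict of lists keyed by metric name. The sketch below summarises such a dict; the helper `best_epoch` is a hypothetical name, and the numbers are copied from the first five complete epochs of the log above rather than read from a live `history` object.

```python
def best_epoch(history_dict, metric='val_loss'):
    """Return (1-based epoch, value) where the given metric is lowest."""
    values = history_dict[metric]
    best_index = min(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]

# Values transcribed from epochs 1-5 of the training log.
logged = {
    'accuracy':     [0.5584, 0.9132, 0.9466, 0.9626, 0.9715],
    'val_accuracy': [0.8945, 0.9363, 0.9519, 0.9546, 0.9592],
    'val_loss':     [0.4256, 0.2389, 0.1820, 0.1570, 0.1410],
}

epoch, value = best_epoch(logged)
# Positive gap means training accuracy is ahead of validation accuracy.
gaps = [t - v for t, v in zip(logged['accuracy'], logged['val_accuracy'])]
print(epoch, value)
print([f"{g:+.4f}" for g in gaps])
```

Over these five epochs the validation loss is still falling, while training accuracy overtakes validation accuracy from epoch 4 onward, which is the kind of trend worth watching for signs of overfitting in later epochs.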

### Step 7: Evaluate the model on the test set
After training, we evaluate the model using the test dataset.  
This step measures how well the model generalizes to completely unseen data.  
We calculate the test loss and test accuracy, then print the accuracy score.


In [None]:
test_loss, test_acc = model.evaluate(test_generator)
print(f"Test accuracy: {test_acc:.4f}")


256/256 ━━━━━━━━━━━━━━━━━━━━ 26s 103ms/step - accuracy: 0.9780 - loss: 0.0794
Test accuracy: 0.9741
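A single accuracy figure can hide classes that perform poorly. The sketch below computes accuracy per class from two label arrays; the helper name `per_class_accuracy` and the toy labels are assumptions for illustration. With the real model, the labels would come from `test_generator.classes` and `np.argmax(model.predict(test_generator), axis=1)`, which only line up if the test generator is created with `shuffle=False` (the default is to shuffle).

```python
import numpy as np

def per_class_accuracy(true_labels, predicted_labels, num_classes):
    """Fraction of samples of each class that were predicted correctly."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    result = {}
    for c in range(num_classes):
        mask = true_labels == c
        if mask.any():  # skip classes with no samples
            result[c] = float(np.mean(predicted_labels[mask] == c))
    return result

# Toy labels for three classes (not real model output).
true = [0, 0, 1, 1, 1, 2]
pred = [0, 1, 1, 1, 0, 2]
scores = per_class_accuracy(true, pred, num_classes=3)
print(scores)
```

Here overall accuracy is 4 out of 6, yet class 0 is right only half the time, which the per-class view makes visible.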


In [None]:
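# Save the trained model (HDF5 format) to the Colab session's local disk.
# Files under /content are lost when the runtime is reset.
model.save('/content/plant_disease_resnet50.h5')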




In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Save model to Google Drive
model.save('/content/drive/MyDrive/plant_disease_resnet50.h5')




Mounted at /content/drive
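The saved model outputs class indices, not class names, so it can be useful to store the index-to-name mapping alongside it. The sketch below writes such a mapping as JSON. The function name `save_class_names`, the file name, and the example dict are all hypothetical; in the notebook the real mapping would come from `train_generator.class_indices`.

```python
import json
import os
import tempfile

def save_class_names(class_indices, path):
    """Write an index -> class-name mapping as JSON and return it."""
    index_to_name = {str(index): name for name, index in class_indices.items()}
    with open(path, 'w') as handle:
        json.dump(index_to_name, handle, indent=2)
    return index_to_name

# Hypothetical stand-in for train_generator.class_indices.
example_indices = {'class_a': 0, 'class_b': 1, 'class_c': 2}
mapping_path = os.path.join(tempfile.gettempdir(), 'class_names.json')
mapping = save_class_names(example_indices, mapping_path)
print(mapping)
```

In Colab the path could instead point into the mounted Drive folder so the mapping is kept next to the `.h5` file.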
