# Cassava Leaf Disease Classification using EfficientNetB3

## 📖 Introduction

**Cassava** is a critical staple crop for millions of people in Africa, Asia, and Latin America, providing a vital source of calories. However, its cultivation is severely threatened by various diseases, which can lead to significant yield losses and jeopardize food security. Accurate and early detection of these diseases is crucial for effective management and control.

This notebook tackles the **Cassava Leaf Disease Classification** challenge. The primary goal is to develop a machine learning model capable of accurately identifying the type of disease present on a cassava plant, or classifying it as healthy, based on an image of its leaf.

---

### 🌿 The Diseases

The dataset focuses on the following five categories:
1.  **Cassava Bacterial Blight (CBB)**
2.  **Cassava Brown Streak Disease (CBSD)**
3.  **Cassava Green Mottle (CGM)**
4.  **Cassava Mosaic Disease (CMD)**
5.  **Healthy**

---

### 🎯 Our Approach

We will employ a powerful deep learning technique called **transfer learning**. Specifically, we will fine-tune a pre-trained **EfficientNetB3** model, a state-of-the-art convolutional neural network (CNN), on the cassava leaf dataset. To accelerate the training process, we'll leverage Google's Tensor Processing Units (TPUs).

**This notebook focuses on the training phase and saving the model. The inference (prediction generation) will be handled in a separate notebook.**  
[Cassava Leaf inference with TTA](https://www.kaggle.com/code/amirmohamadaskari/cassava-leaf-inference-tta-with-loading-my-model)

The workflow in *this training notebook* includes:

-   **Setting up the Environment**: Configuring the TPU strategy for distributed training.
-   **Data Preprocessing**: Creating an efficient data pipeline using TFRecords.
-   **Data Augmentation**: Applying various image transformations to enhance model robustness and prevent overfitting.
-   **Model Building**: Constructing a custom classifier on top of the EfficientNetB3 base.
-   **Two-Phase Training**:
    1.  Training only the classifier head.
    2.  Fine-tuning the entire model with a low learning rate.
-   **Model Saving**: Saving the trained `final_model_cassava.keras` to this notebook's output, making it available for use in an inference notebook (or via [Cassava Leaf Model](https://www.kaggle.com/models/amirmohamadaskari/cassava-leaf-model)).

## 🛠️ 1. Initial Setup and Imports

Let's begin by importing the essential libraries for our project. We'll need `numpy` and `pandas` for data manipulation, `tensorflow` for building and training our deep learning model, `cv2`, `matplotlib`, and `seaborn` for image processing and visualization, and `os`, `json`, and `random` for file handling and seeding.

In [None]:
import os
import random
import numpy as np
import pandas as pd
import tensorflow as tf
import cv2
import matplotlib.pyplot as plt
import json
import seaborn as sns
from tensorflow.keras import mixed_precision
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.regularizers import l2

## 🌱 2. Ensuring Reproducibility

To make our experiments reproducible, it's crucial to set a global seed. Seeding makes processes that involve randomness (such as weight initialization, data shuffling, and augmentation) repeatable from run to run. The function below sets the seed for Python's `random` module, `NumPy`, and `TensorFlow`. Note that the data pipeline later disables deterministic ordering for speed, so results can still vary slightly between runs.

In [None]:
def seed_everything(SEED=28):
    # Set the seed of Python's random module
    random.seed(SEED)
    
    # Set the NumPy random seed
    np.random.seed(SEED)
    
    # Set the TensorFlow random seed
    tf.random.set_seed(SEED)

    print(f"Global seed set to {SEED} 🌱")

In [None]:
seed_everything()
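
As a small illustration of what seeding buys us, the sketch below uses only Python's standard `random` module (the helper name `draw_numbers` is made up for this example). Re-seeding with the same value reproduces the same sequence; the same idea applies to NumPy and TensorFlow, although some accelerator operations may still be non-deterministic.

```python
import random

def draw_numbers(seed, n=3):
    # Re-seed the generator, then draw n pseudo-random integers
    random.seed(seed)
    return [random.randint(0, 99) for _ in range(n)]

first_run = draw_numbers(28)
second_run = draw_numbers(28)

# Same seed, same sequence
print(first_run == second_run)
```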

## 🚀 3. Hardware Accelerator Configuration

Training deep learning models can be computationally intensive. To speed up this process, we'll use a hardware accelerator like a TPU (Tensor Processing Unit) or GPU (Graphics Processing Unit). The following code detects the available hardware and sets up the appropriate TensorFlow distribution strategy. 

- **TPUStrategy**: For training on TPUs.
- **MirroredStrategy**: For training on multiple GPUs on a single machine.
- **Default Strategy**: For training on a single GPU or CPU.

The number of `REPLICAS` indicates how many parallel processing units are available, which is essential for scaling our batch size and distributing the training workload.

In [None]:
print("Available devices:")
for device in tf.config.list_logical_devices():
    print(device.name, device.device_type)

In [None]:
import tensorflow as tf

def get_strategy():
    """
    Detects and returns the best TensorFlow distribution strategy.
    - TPUStrategy for TPU(s)
    - MirroredStrategy for GPU(s)
    - Default strategy for CPU
    """
    try:
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu= 'local')
        tf.config.experimental_connect_to_cluster(tpu)
        tf.tpu.experimental.initialize_tpu_system(tpu)
        strategy = tf.distribute.TPUStrategy(tpu)
        print("Using TPU strategy:", type(strategy).__name__)
    except ValueError:
        # If TPU is not found, try GPU
        physical_devices = tf.config.list_physical_devices('GPU')
        if physical_devices:
            strategy = tf.distribute.MirroredStrategy()
            print("Using GPU strategy:", type(strategy).__name__)
        else:
            strategy = tf.distribute.get_strategy()  # default CPU
            print("No TPU/GPU found. Using default strategy:", type(strategy).__name__)
    except Exception as e:
        print("Failed to initialize strategy:", e)
        strategy = tf.distribute.get_strategy()
        print("Using fallback strategy:", type(strategy).__name__)

    print("REPLICAS:", strategy.num_replicas_in_sync)
    return strategy

# Call this function once at the beginning of your script
strategy = get_strategy()

In [None]:
print("REPLICAS:", strategy.num_replicas_in_sync)
print("TensorFlow version:", tf.__version__)

## ⚡ 4. Mixed Precision Training

To further optimize training, we enable mixed precision: most computations run in 16-bit floating point while variables are kept in 32-bit. On modern GPUs and TPUs this can significantly speed up computation and reduce memory usage without a substantial loss in model accuracy. The cell below enables the `mixed_float16` policy only when a GPU is detected; otherwise the default `float32` policy stays in effect.
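
To see why some values are kept in 32-bit, here is a minimal sketch that round-trips Python floats through IEEE 754 half precision using the standard `struct` module (the helper `to_float16` is a name invented for this example). Very small numbers, such as tiny gradients, can underflow to zero in float16, which is why Keras pairs `mixed_float16` with float32 variables and loss scaling.

```python
import struct

def to_float16(value):
    # Round-trip a Python float through IEEE 754 half precision
    return struct.unpack('<e', struct.pack('<e', value))[0]

# 0.1 is only approximately representable in float16
print(to_float16(0.1))

# 1e-8 is below the smallest float16 subnormal and rounds to 0.0
print(to_float16(1e-8))
```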

In [None]:
# List all physical devices labeled as 'GPU'
gpus = tf.config.list_physical_devices('GPU')

if gpus:
    print(f"✅ GPU detected: {gpus}")
    # float16 compute with float32 variables
    mixed_precision.set_global_policy('mixed_float16')
else:
    # On TPU or CPU the default float32 policy is left unchanged
    print("❌ No GPU found. Keeping the default float32 policy.")


## 📂 5. Data Loading and Exploration

Now, let's define the paths to our data files. We need the directory containing the training images, the CSV file with image labels, and the JSON file that maps numerical labels to their corresponding disease names.

In [None]:
DATA_DIR = '/kaggle/input/cassava-leaf-disease-classification'
TRAIN_DIR = os.path.join(DATA_DIR, 'train_images')
CSV_PATH = os.path.join(DATA_DIR, 'train.csv')
LABEL_PATH = os.path.join(DATA_DIR, 'label_num_to_disease_map.json')

In [None]:
# Let's have a look at some of the image file names
print(os.listdir(TRAIN_DIR)[:5])

In [None]:
# Read one image to check its dimensions
img_files = os.listdir(TRAIN_DIR)
img_path = os.path.join(TRAIN_DIR, img_files[0])
img = cv2.imread(img_path)
print(img.shape)

In [None]:
# Load the label-to-disease name mapping from the JSON file
with open(LABEL_PATH, 'r') as f:
    label_map = json.load(f)

### Loading the Training DataFrame
We load the `train.csv` file into a pandas DataFrame. This file contains the `image_id` and its corresponding `label`. We'll convert the labels to string type for easier handling.

In [None]:
df = pd.read_csv(CSV_PATH)
# Convert the 'label' column to string type for consistency
df['label'] = df['label'].astype(str)
print(df.tail())
print(df.shape)

### Analyzing Class Distribution
Understanding the distribution of classes is a vital step in any classification problem. It helps us identify if there is a class imbalance, which can affect the model's performance. A significant imbalance might require special techniques like class weighting or over/under-sampling.
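
This notebook does not apply class weighting, but as a rough sketch of the idea, the snippet below computes the common "balanced" heuristic, `total / (num_classes * count)`. The counts are made-up toy numbers, not the real dataset statistics, and `balanced_class_weights` is a hypothetical helper.

```python
def balanced_class_weights(counts):
    # weight_c = total / (num_classes * count_c)
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}

# Made-up counts for illustration only (not the Cassava dataset)
toy_counts = {'0': 100, '1': 200, '2': 200, '3': 1000, '4': 500}
weights = balanced_class_weights(toy_counts)

for label, weight in weights.items():
    print(label, round(weight, 2))
```

Rare classes receive weights above 1 and frequent classes receive weights below 1, so each class contributes more evenly to the loss.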

In [None]:
# Get the count of images for each class
df['label'].value_counts()

In [None]:
# Visualize the class distribution using a bar chart
plt.figure(figsize=(25,8))
ax=sns.countplot(x=df["label"],palette="viridis",order=df['label'].value_counts().index)
# Add labels on top of the bars for clarity
for p in ax.containers:
    ax.bar_label(p, fontsize=20, color='black', padding=5);

### Visualizing Sample Images
To get a better feel for the dataset, let's visualize one sample image from each of the five categories. This helps us understand the visual characteristics of healthy leaves and leaves affected by different diseases.

In [None]:
def show_images_from_each_category(df, label_map, rows= 1, images_per_class= 1):
    classes = df['label'].unique()
    num_classes = len(classes)
    columns = images_per_class
    
    plt.figure(figsize= (10* rows,10* columns))
    
    for i, c in enumerate(classes):
        # Get random samples for the current class
        class_samples = df[df['label'] == c].sample(images_per_class)
        for j ,(_, row) in enumerate(class_samples.iterrows()):
            # Construct the full image path
            img_path = os.path.join(TRAIN_DIR, row['image_id'])
            img = cv2.imread(img_path)
            # Convert image from BGR (OpenCV default) to RGB for correct display with Matplotlib
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

            # Get the disease name from the label map
            label_name = label_map.get(row['label'], row['label'])

            # Create a subplot for each image
            plt.subplot(num_classes, columns, i* columns + j +1)
            plt.imshow(img)
            plt.title(f'{label_name}')
            plt.axis('off')

    plt.tight_layout()
    plt.show()

In [None]:
show_images_from_each_category(df, label_map, rows= 1, images_per_class= 1)

## ⚙️ 6. Data Pipeline with TFRecords

For efficient data handling, especially with large datasets and TPUs, we will use the **TFRecord** format. TFRecord is a binary file format that stores a sequence of protocol buffer messages, which is highly optimized for reading data in TensorFlow.

First, we'll locate the directory containing our pre-processed TFRecord files.

In [None]:
tfrecord_dir = '/kaggle/input/cassava-leaf-disease-classification/train_tfrecords'
print(os.listdir(tfrecord_dir))

### Configuration Parameters
Here, we define several key parameters for our data pipeline and model. 

- `AUTO`: An instruction for TensorFlow to automatically tune the level of parallelism for data pipeline operations.
- `IMAGE_SIZE`: The dimensions to which all images will be resized. EfficientNetB3 performs well with larger image sizes.
- `NUM_CLASSES`: The number of distinct categories in our classification task.
- `BATCH_SIZE`: The total number of samples processed in one forward/backward pass. We calculate the **global batch size** by multiplying the per-replica batch size by the number of available replicas (TPU cores or GPUs). This ensures that the model sees the same total number of images per step, regardless of the distribution strategy.

In [None]:
AUTO = tf.data.AUTOTUNE
IMAGE_SIZE = (512, 512)
BATCH_SIZE_PER_REPLICA = 8
NUM_CLASSES = 5
BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
print(f'Global Batch size: {BATCH_SIZE}')
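
To make the scaling concrete, here is the same arithmetic in isolation for a few replica counts. The replica counts are illustrative assumptions (for example, 8 would correspond to an 8-core TPU), and `global_batch_size` is a helper defined only for this sketch.

```python
def global_batch_size(per_replica, num_replicas):
    # Each replica processes `per_replica` samples per step
    return per_replica * num_replicas

for replicas in (1, 2, 8):
    print(f"{replicas} replica(s) -> global batch of {global_batch_size(8, replicas)}")
```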

### Decoding TFRecord Examples
This function is responsible for parsing a single example from a TFRecord file. It reads the JPEG-encoded image bytes and the integer target label, decodes the image, resizes it to our standard `IMAGE_SIZE`, and finally one-hot encodes the label. One-hot encoding converts the integer label (e.g., 3) into a binary vector (e.g., `[0, 0, 0, 1, 0]`), which is the required format for categorical cross-entropy loss.

In [None]:
def decode_example(example):
    # Define the structure of the features in the TFRecord file
    feature_description = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'target': tf.io.FixedLenFeature([], tf.int64),
    }
    # Parse the input `example` protocol buffer using the feature description
    example = tf.io.parse_single_example(example, feature_description)
    
    # Decode the JPEG-encoded image string to a tensor
    image = tf.image.decode_jpeg(example['image'], channels=3)
    # Resize the image to the desired dimensions
    image = tf.image.resize(image, IMAGE_SIZE)
    
    # Get the integer label
    label_int = example['target']
    
    # One-hot encode the label for the model
    label = tf.one_hot(label_int, NUM_CLASSES)
    
    return image, label
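
For readers new to one-hot encoding, this plain-Python sketch mirrors what `tf.one_hot` does for a single label (the function `one_hot` here is a stand-in written for illustration, not the TensorFlow op).

```python
def one_hot(label, num_classes):
    # 1.0 at the label's index, 0.0 everywhere else
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

encoded = one_hot(3, 5)
print(encoded)  # [0.0, 0.0, 0.0, 1.0, 0.0]
```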

### Preprocessing Input for EfficientNet
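Pre-trained models usually expect their input to be preprocessed in a specific way (e.g., pixel values scaled to a certain range). For the Keras EfficientNet models this scaling is built into the model itself, which expects float pixel values in the `[0, 255]` range, so `preprocess_input` from `tf.keras.applications.efficientnet` is a pass-through. We still call it so the pipeline stays explicit and follows the same pattern as other Keras applications.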

In [None]:
from tensorflow.keras.applications.efficientnet import preprocess_input

def preprocess(image, label):
    # Apply the specific preprocessing required by the EfficientNet model
    image = preprocess_input(image)
    return image, label

### 🎨 Data Augmentation
Data augmentation is a powerful technique to increase the diversity of the training data without collecting new samples. By applying random transformations like rotations, flips, and zooms to the training images, we can make our model more robust and less prone to overfitting. We define a `Sequential` model that will apply these augmentations on the fly during training.

In [None]:
data_augmentation = tf.keras.Sequential([
    # Geometric Transformations
    tf.keras.layers.RandomRotation(40/ 360), # Rotate by up to ±40 degrees (factor is a fraction of a full turn)
    tf.keras.layers.RandomTranslation(0.2, 0.2), # Shift by up to 20% vertically and horizontally
    tf.keras.layers.RandomZoom(0.2, 0.2), # Zoom in or out by up to 20%
    tf.keras.layers.RandomFlip('horizontal'), # Randomly flip images horizontally
    tf.keras.layers.RandomFlip('vertical') # Randomly flip images vertically
], name="data_augmentation")
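
The rotation factor is expressed as a fraction of a full turn (2π radians). As a quick check of the arithmetic, this sketch converts the factor `40 / 360` into a degree range; `rotation_range_degrees` is a helper invented for this example.

```python
import math

def rotation_range_degrees(factor):
    # A single float factor means angles are drawn from [-factor, +factor] * 2*pi
    max_angle = factor * 2 * math.pi  # radians
    return -math.degrees(max_angle), math.degrees(max_angle)

low, high = rotation_range_degrees(40 / 360)
print(round(low, 6), round(high, 6))
```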

In [None]:
# Define the augmentation function that will be mapped to the dataset
def apply_augmentation(image, label):
    # Keras augmentation layers expect a batch of images.
    # We add a batch dimension, apply the augmentation, and then remove it.
    image = tf.expand_dims(image, 0) # Add batch dimension
    # training=True keeps the random layers active outside of model.fit
    image = data_augmentation(image, training=True)
    image = tf.squeeze(image, 0)      # Remove batch dimension
    return image, label

### Assembling the Data Pipeline
This function brings everything together to create our final `tf.data.Dataset` object. It reads the TFRecord files, decodes and preprocesses the data, and applies augmentation (only to the training set). Key steps include:

- `.with_options(ignore_order)`: Disables deterministic order to improve performance.
- `.map()`: Applies our decoding, preprocessing, and augmentation functions in parallel.
- `.shuffle()`: Shuffles the training data to ensure the model doesn't learn from the order of examples.
- `.batch()`: Groups the data into batches.
- `.prefetch()`: Prepares subsequent batches while the current one is being processed, which helps to prevent data pipeline bottlenecks.
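
To build intuition for the buffer size passed to `.shuffle()`, here is a simplified pure-Python model of a buffered shuffle. It is a sketch of the general idea (fill a buffer, emit a random element, refill from the stream), not TensorFlow's actual implementation, and `buffered_shuffle` is a name made up for this example.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and replace it with the new item
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain the remaining elements in random order
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(10), buffer_size=3, seed=0))
print(shuffled)
```

A small buffer only mixes nearby elements, while a buffer as large as the dataset gives a full shuffle. With `buffer_size=1` the order is unchanged.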

In [None]:
def load_dataset(filenames, is_training=True):
    ignore_order = tf.data.Options()
    ignore_order.deterministic = False  # allow out-of-order reads for performance
    # Create a dataset from the TFRecord files
    dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads= AUTO)
    # Disable deterministic order for better performance
    dataset = dataset.with_options(ignore_order)
    # Decode each example in the dataset
    dataset = dataset.map(decode_example, num_parallel_calls= AUTO)
    # Preprocess the images for the model
    dataset = dataset.map(preprocess, num_parallel_calls= AUTO)
    # Apply augmentations and shuffling only to the training set
    if is_training:
        dataset = dataset.shuffle(8196)
        dataset = dataset.map(apply_augmentation, num_parallel_calls= AUTO)
    # Batch the dataset and prefetch for performance
    dataset = dataset.batch(BATCH_SIZE, drop_remainder= True).prefetch(AUTO)
    return dataset

### Creating Training and Validation Datasets
We'll split our TFRecord files into a training set and a validation set. A common split is 80% for training and 20% for validation. The validation set is crucial for monitoring the model's performance on unseen data during training and for tuning hyperparameters.

In [None]:
# Get a list of all TFRecord files
all_files = sorted(tf.io.gfile.glob(os.path.join(tfrecord_dir, '*.tfrec')))

# Split by file: roughly 80% train / 20% val (the +1 rounds the training share up)
split_index = int(0.8 * len(all_files)) + 1
train_files = all_files[:split_index]
val_files = all_files[split_index:]
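
Because the split is made per file rather than per image, the actual ratio depends on how many TFRecord files exist. The sketch below repeats the same index arithmetic on hypothetical file lists (the shard names and counts are invented for illustration).

```python
def split_files(files, train_fraction=0.8):
    # Same rule as above: truncate, then add one file to the training side
    split_index = int(train_fraction * len(files)) + 1
    return files[:split_index], files[split_index:]

# Hypothetical shard names, not the real dataset files
ten_files = [f"shard_{i:02d}.tfrec" for i in range(10)]
sixteen_files = [f"shard_{i:02d}.tfrec" for i in range(16)]

train_10, val_10 = split_files(ten_files)
train_16, val_16 = split_files(sixteen_files)
print(len(train_10), len(val_10))  # 9 1
print(len(train_16), len(val_16))  # 13 3
```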

In [None]:
# Create the dataset objects using our pipeline function
train_dataset = load_dataset(train_files, is_training=True)
val_dataset = load_dataset(val_files, is_training=False)
print('Train and validation set created successfully!')

### Calculating Steps Per Epoch
To properly train our model, we need to know how many batches of data constitute one full epoch. We calculate `steps_per_epoch` for the training set and `validation_steps` for the validation set. This is done by counting the total number of images in each set and dividing by the global batch size.

In [None]:
# Helper function to count the total number of examples in a set of TFRecord files
def count_total_examples(tfrecord_files):
    count = 0
    for fname in tfrecord_files:
        count += sum(1 for _ in tf.data.TFRecordDataset(fname))
    return count

In [None]:
# Count the number of images in the training and validation sets
num_train_images = count_total_examples(train_files)
num_val_images = count_total_examples(val_files)

# Calculate the number of steps (batches) per epoch
steps_per_epoch = num_train_images // BATCH_SIZE
validation_steps = num_val_images // BATCH_SIZE

print(f'Batch steps on training: {steps_per_epoch}\nSteps on validation: {validation_steps}')
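
Since the division is an integer (floor) division and the pipeline uses `drop_remainder=True`, a few examples at the end of each pass are not counted as a full step. This sketch shows the effect with made-up numbers; `steps_for` is a helper written only for this example.

```python
def steps_for(num_examples, batch_size):
    # Full batches only; the leftover examples do not form a step
    steps = num_examples // batch_size
    leftover = num_examples - steps * batch_size
    return steps, leftover

steps, leftover = steps_for(1000, 64)  # made-up example count
print(steps, leftover)  # 15 40
```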

### Verifying the Data Pipeline
Let's inspect a single batch from our training dataset to ensure that the images and labels have the correct shapes and data types. This is a good sanity check before starting the training process.

In [None]:
# Take one batch from the training dataset
for images, labels in train_dataset.take(1):
    # Print the shape of the image and label tensors
    print("Image batch shape:", images.shape)
    print("Label batch shape:", labels.shape)
    # Print an example one-hot encoded label
    print("Label example (one-hot):", labels[0].numpy())

## 🧠 7. Model Building with Transfer Learning

We will now construct our model using **transfer learning**. We'll use **EfficientNetB3**, a powerful and efficient model pre-trained on the large ImageNet dataset. The idea is to leverage the features learned by this model (like edges, textures, and shapes) and adapt them to our specific task of classifying cassava leaf diseases.

Our model architecture will consist of:
1.  **The Base Model**: The pre-trained `EfficientNetB3` with its top classification layer removed (`include_top=False`). We'll initially freeze its weights so they don't change during the first phase of training.
2.  **A Custom Classifier Head**: We will add our own layers on top of the base model:
    - `GlobalAveragePooling2D`: To reduce each feature map from the base model to a single value by averaging, yielding a flat feature vector.
    - `Dense` layer with `relu` activation: A hidden layer to learn more complex patterns.
    - `Dropout`: A regularization technique to prevent overfitting.
    - `Dense` output layer with `softmax` activation: To produce a probability distribution over the 5 classes.
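
To illustrate what global average pooling does to the base model's output, here is a tiny pure-Python version operating on a nested list shaped `[height][width][channels]`. It is a conceptual sketch with toy values, not the Keras layer itself.

```python
def global_average_pool(feature_map):
    # feature_map[h][w][c] -> one mean value per channel c
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    totals = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value
    return [t / (height * width) for t in totals]

# A toy 2x2 feature map with 2 channels
toy_map = [[[1.0, 10.0], [2.0, 20.0]],
           [[3.0, 30.0], [4.0, 40.0]]]
pooled = global_average_pool(toy_map)
print(pooled)  # [2.5, 25.0]
```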

In [None]:
from tensorflow.keras.regularizers import l2
def cassava_model(IMAGE_SIZE):
    
    # Load the EfficientNetB3 model, pre-trained on ImageNet
    base_model = tf.keras.applications.EfficientNetB3(
        weights= 'imagenet',
        include_top= False, # Do not include the final ImageNet classifier layer
        input_shape= IMAGE_SIZE + (3,)
    )
    
    # Freeze the weights of the base model. We will only train the new classifier head initially.
    base_model.trainable = False
        
    # Define the model input
    inputs = tf.keras.layers.Input(shape= IMAGE_SIZE + (3, ))
    # Pass the inputs through the base model
    eff = base_model(inputs)
    # Add our custom classifier head
    avg = tf.keras.layers.GlobalAveragePooling2D()(eff)
    fc = tf.keras.layers.Dense(256, activation= 'relu', kernel_regularizer= l2(0.001))(avg)
    dropout = tf.keras.layers.Dropout(0.3)(fc)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation= 'softmax')(dropout)
    
    # Create the final model
    model = tf.keras.Model(inputs= inputs, outputs= outputs)
    # Print a summary of the model architecture
    model.summary()

    return model

## 🏋️ 8. Training Phase 1: Training the Head

In the first phase of training, we only train the weights of the custom classifier head we added. The weights of the EfficientNetB3 base model remain frozen. This allows the new layers to learn to interpret the features extracted by the base model for our specific dataset without disrupting the valuable pre-trained knowledge.

### Callbacks
We will use several callbacks to manage the training process:
- `ModelCheckpoint`: Saves the model with the best validation loss.
- `EarlyStopping`: Stops training if the validation loss doesn't improve for a set number of epochs, preventing overfitting.
- `ReduceLROnPlateau`: Reduces the learning rate if the training plateaus, helping the model to find a better minimum.
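
The patience logic behind early stopping can be sketched in a few lines. This is a simplified simulation (no `min_delta`, no baseline) run on a made-up validation-loss curve, and `simulate_early_stopping` is a hypothetical helper rather than part of Keras.

```python
def simulate_early_stopping(val_losses, patience):
    best = float('inf')
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

toy_losses = [0.90, 0.70, 0.72, 0.71, 0.73, 0.60]  # made-up values
stopped_at, best_epoch = simulate_early_stopping(toy_losses, patience=3)
print(stopped_at, best_epoch)  # 5 2
```

Training stops after epoch 5 because epochs 3 to 5 never beat epoch 2, so the later improvement at epoch 6 is never reached. With `restore_best_weights=True`, the weights from the best epoch are the ones kept.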

In [None]:
# Save the best model based on validation loss
checkpoint_cb = ModelCheckpoint(
    '/kaggle/working/initial_model_cassava.keras', # File to save the best model
    monitor='val_loss',
    save_best_only=True,
    mode='min' # We want to minimize loss
)

# Stop training if validation loss doesn't improve for 3 epochs
early_stopping_cb = EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True # This is great, it restores the weights from the best epoch
)

# Reduce learning rate when learning plateaus
reduce_lr_cb = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.2,
    patience=2,
    min_lr=1e-6
)

callbacks = [checkpoint_cb, early_stopping_cb, reduce_lr_cb]
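
As a worked example of how `factor` and `min_lr` interact, this sketch applies successive reductions to the Phase 1 starting rate of `3e-4`. It only models the update rule `max(lr * factor, min_lr)`; when reductions actually trigger depends on the validation loss, and `lr_after_reductions` is a name invented here.

```python
def lr_after_reductions(initial_lr, factor, min_lr, num_reductions):
    lr = initial_lr
    history = [lr]
    for _ in range(num_reductions):
        # Each plateau multiplies the rate by `factor`, floored at `min_lr`
        lr = max(lr * factor, min_lr)
        history.append(lr)
    return history

schedule = lr_after_reductions(3e-4, 0.2, 1e-6, 5)
for step, lr in enumerate(schedule):
    print(step, f"{lr:.2e}")
```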

### Compiling and Fitting the Model
We compile the model within the `strategy.scope()` to ensure it's distributed across the available TPUs/GPUs. We use the `Adam` optimizer, `CategoricalCrossentropy` loss with **label smoothing** (a regularization technique that prevents the model from becoming overconfident), and track `accuracy` as our performance metric. Then, we start the training process.
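
To show what label smoothing does to a target vector, here is the usual formula, `y * (1 - s) + s / K`, applied in plain Python with the smoothing value `0.01` used in this notebook. The function `smooth_labels` is written only for illustration.

```python
def smooth_labels(one_hot, smoothing):
    # Move a little probability mass from the true class to all classes
    k = len(one_hot)
    return [y * (1.0 - smoothing) + smoothing / k for y in one_hot]

smoothed = smooth_labels([0, 0, 0, 1, 0], 0.01)
print([round(v, 3) for v in smoothed])  # [0.002, 0.002, 0.002, 0.992, 0.002]
```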

In [None]:
with strategy.scope():
    # Build the model
    model = cassava_model(IMAGE_SIZE)
    # Compile the model with optimizer, loss, and metrics
    model.compile(optimizer = tf.keras.optimizers.Adam(3e-4), 
                  loss= tf.keras.losses.CategoricalCrossentropy(label_smoothing= 0.01), 
                  metrics= ['accuracy'],
                  steps_per_execution= 32 if isinstance(strategy, tf.distribute.TPUStrategy) else 1)

# Set the number of epochs for this training phase
initial_epoch = 15
print("--- Starting Phase 1: Training Head Classifier ---")
# Fit the model to the training data
history = model.fit(train_dataset.repeat(), 
                    validation_data= val_dataset.repeat(),
                    epochs= initial_epoch,
                    callbacks= callbacks,
                    steps_per_epoch= steps_per_epoch,
                    validation_steps= validation_steps)

In [None]:
# Check the history object from your training run
val_loss_history = history.history['val_loss']
best_val_loss = min(val_loss_history)
best_epoch = val_loss_history.index(best_val_loss) + 1

print(f"Lowest Validation Loss: {best_val_loss:.4f} at Epoch {best_epoch}")
print(f"Final Validation Loss: {val_loss_history[-1]:.4f} at Epoch {len(val_loss_history)}")

## 🎨 9. Training Phase 2: Fine-Tuning

After the classifier head has been trained and has converged, we can move to the fine-tuning phase. Here, we **unfreeze** the entire base model (or some of its top layers) and continue training with a **very low learning rate**. 

This allows the model to make small adjustments to the pre-trained weights, adapting them more closely to the specifics of the cassava leaf dataset. Using a low learning rate is critical to avoid destroying the valuable features learned during pre-training.

In [None]:
# Define a new set of callbacks for the fine-tuning phase
fn_checkpoint_cb = ModelCheckpoint(
    '/kaggle/working/final_model_cassava.keras', # File to save the best model
    monitor='val_loss',
    save_best_only=True,
    mode='min' # We want to minimize loss
)

# Stop training if validation loss doesn't improve for 4 epochs
early_stopping_cb = EarlyStopping(
    monitor='val_loss',
    patience=4,
    restore_best_weights=True # This is great, it restores the weights from the best epoch
)

# Reduce learning rate when learning plateaus
reduce_lr_cb = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.2,
    patience=2,
    min_lr=1e-7
)

callbacks = [fn_checkpoint_cb, early_stopping_cb, reduce_lr_cb]

In [None]:
with strategy.scope():

    # Unfreeze the base model to make it trainable
    base_model = model.get_layer('efficientnetb3')
    base_model.trainable = True

    # Re-compile the model with a much lower learning rate for fine-tuning
    model.compile(
        optimizer=tf.keras.optimizers.Adam(3e-5),
        loss=tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.01),
        metrics=['accuracy'],
        steps_per_execution= 32 if isinstance(strategy, tf.distribute.TPUStrategy) else 1
    )

print("--- Starting Phase 2: Fine-tuning the entire model ---")
final_epochs = initial_epoch + 50
# Continue training the model
history_final = model.fit(
    train_dataset.repeat(),
    validation_data= val_dataset.repeat(),
    epochs= final_epochs,
    initial_epoch= initial_epoch, # Start from the epoch number where the first phase left off
    callbacks= callbacks,
    steps_per_epoch= steps_per_epoch,
    validation_steps= validation_steps
)
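
The epoch bookkeeping for the second phase can be confusing, so here is the arithmetic on its own: `epochs` is the final epoch index, not the number of additional epochs. The helper `phase_two_epochs` is hypothetical.

```python
def phase_two_epochs(initial_epoch, extra_epochs):
    final_epochs = initial_epoch + extra_epochs
    first_displayed = initial_epoch + 1  # Keras logs epochs starting from 1
    max_new_epochs = final_epochs - initial_epoch
    return first_displayed, final_epochs, max_new_epochs

first, last, budget = phase_two_epochs(15, 50)
print(first, last, budget)  # 16 65 50
```

With these settings, fine-tuning is logged as epochs 16 through 65 at most, and early stopping may end it sooner.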

## ✨ End of Training Notebook: What's Next?

Congratulations! You've successfully completed the training phase of your model. This notebook's primary purpose was to **train your deep learning model (e.g., EfficientNet)** using the provided dataset and accelerators like TPUs or GPUs.

---

### Your Accomplishments in This Notebook:

* **Model Training:** You've trained `final_model_cassava.keras`, learning patterns and features from the large training dataset.
* **Model Saving:** Crucially, the trained model has been **saved to this notebook's output directory** (`/kaggle/working/`). When you save a version of this notebook with "Save Version" (especially with "Save & Run All"), this saved model becomes a persistent asset.

---

### Moving to Inference: The Next Step

For Kaggle competitions, particularly those with hidden test sets or long inference times, it's best practice to separate training from prediction. Your trained model is now ready for the **Inference Notebook**.

**Here's the planned workflow for making your final submission:**

1.  **Create a New Inference Notebook:** Start a fresh Kaggle notebook. This notebook will be solely for making predictions.
2.  **Add This Notebook's Output as Input:** In the new Inference Notebook, go to the "Add Data" section. You'll find the output of *this* training notebook (which includes `final_model_cassava.keras`) and add it as an input. This makes your trained model available for use.
    * Alternatively, you can also use the model directly from Kaggle Models: [Cassava Leaf Model](https://www.kaggle.com/models/amirmohamadaskari/cassava-leaf-model).
3.  **Load the Model:** In the Inference Notebook, you'll load `final_model_cassava.keras` from its path within `/kaggle/input/` (e.g., `/kaggle/input/your-training-notebook-output-name/final_model_cassava.keras`).
4.  **Perform Inference & Create Submission File:** The Inference Notebook will then load the competition's test data, use your loaded model to make predictions, and generate the `submission.csv` file in the correct format.
5.  **Submit!** Once the Inference Notebook successfully runs and creates `submission.csv`, you can submit it to the competition leaderboard.
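
The model-loading step could look roughly like the sketch below. Since the exact input folder name depends on how the output is attached, the code searches under `/kaggle/input` instead of hard-coding a path; `find_model_file` and `load_trained_model` are hypothetical helpers, and the search root is an assumption.

```python
from pathlib import Path

def find_model_file(root, filename="final_model_cassava.keras"):
    # Search recursively; return the first match or None
    matches = sorted(Path(root).rglob(filename))
    return matches[0] if matches else None

def load_trained_model(root="/kaggle/input"):
    # TensorFlow is imported lazily so the path lookup can be tried without it
    import tensorflow as tf
    model_path = find_model_file(root)
    if model_path is None:
        raise FileNotFoundError(f"final_model_cassava.keras not found under {root}")
    return tf.keras.models.load_model(str(model_path))

# In the inference notebook (assumed usage):
# model = load_trained_model()
```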

This two-notebook approach keeps your training and inference workflows clean, efficient, and compliant with Kaggle's submission system. Good luck with your submission! 🚀

Inference with test-time augmentation (TTA) is done in this notebook:  
[Cassava Leaf inference with TTA](https://www.kaggle.com/code/amirmohamadaskari/cassava-leaf-inference-tta-with-loading-my-model)  
**Note**: For simplicity, and because I didn't have access to this notebook's output versions as an input to the inference notebook, I downloaded the trained model and uploaded it again under Kaggle Models. The inference notebook therefore loads the model from Kaggle Models instead of adding this notebook's output as input. Separating training from inference in this way is nonetheless a standard approach: it avoids wasting time on training with a CPU or GPU in competitions like Cassava, where using a TPU was banned. :)