## Automated Diabetic Retinopathy Screening – Phase 4: Final Evaluation & Packaging

### 🎯 **Objective**
This final notebook takes our best-performing model from the experimentation phase, `best_model.keras`, and uses it to generate the submission file for the unlabeled Kaggle test set. It also serves as the conclusion to our Proof-of-Concept, providing an executive summary and a clear, data-driven roadmap for future development.

---

### 🧾 **Business Rationale (Senior BA Perspective)**
This notebook represents the final deliverable of our technical PoC. Having successfully identified and optimized a winning architecture in the previous phase, our task now is to package this asset for its final test run and to formally conclude the project.

The submission file generated here (`Final_Submission.csv`) is the tangible output that fulfills the project's technical requirements. More importantly, the final summary and future roadmap will be the key strategic documents we use to make a business case for "Project MedVision: Phase 2," outlining the steps needed to transition this successful prototype into a production-ready, clinical-grade solution.


In [5]:
# SETUP, LIBRARIES, AND DATA LOADING

import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import os
from tqdm import tqdm

# --- Configuration ---
# The model file we saved from our previous notebook
MODEL_PATH = "best_model.keras"

# Paths for the unlabeled test data
TEST_IMG_DIR = "Data/aptos2019-blindness-detection/test_images"
TEST_CSV_PATH = "Data/aptos2019-blindness-detection/test.csv"

# Model parameters (must match the settings used during training)
IMG_SIZE = 224
BATCH_SIZE = 32 # We can use a larger batch size for inference if memory allows

# --- Load Test Metadata ---
try:
    test_df = pd.read_csv(TEST_CSV_PATH)
    # Construct the full filepath for each test image
    test_df['filepath'] = test_df['id_code'].apply(lambda x: os.path.join(TEST_IMG_DIR, f"{x}.png"))
    print(f"✅ Successfully loaded test metadata for {len(test_df)} images.")
except FileNotFoundError:
    print(f"❌ ERROR: Could not find the test.csv file at {TEST_CSV_PATH}")
    test_df = pd.DataFrame()

if not test_df.empty:
    print("\nTest data ready for prediction.")

✅ Successfully loaded test metadata for 1928 images.

Test data ready for prediction.


### Path Verification and Data Generator Setup

Before creating the generator, it's crucial to verify that our file paths are correct. The following "Sanity Check" will print the current working directory and check if the first image file exists. If this check fails, you may need to adjust the `TEST_IMG_DIR` variable above.

In [6]:
if not test_df.empty:
    # --- Sanity Check ---
    print("--- Path Sanity Check ---")
    print(f"Current Working Directory: {os.getcwd()}")

    # Check the first file path
    first_filepath = test_df['filepath'].iloc[0]
    print(f"Checking for file at: {first_filepath}")

    if os.path.exists(first_filepath):
        print("✅ Sanity check passed: First image found successfully.")
    else:
        print("❌ SANITY CHECK FAILED: Could not find the first image.")
        print("Please check your 'TEST_IMG_DIR' path and your script's working directory.")
    print("---------------------------\n")


    # --- Create Generator ---
    # Create a generator for the TEST data (only rescaling)
    test_datagen = ImageDataGenerator(rescale=1./255)

    test_generator = test_datagen.flow_from_dataframe(
        dataframe=test_df,
        x_col='filepath',
        y_col=None, # No labels for the test set
        target_size=(IMG_SIZE, IMG_SIZE),
        batch_size=BATCH_SIZE,
        class_mode=None, # Crucial for unlabeled data
        shuffle=False # DO NOT shuffle test data
    )
    print("✅ Test data generator created.")

--- Path Sanity Check ---
Current Working Directory: D:\__Monica Documents\PyCharm Repo\ML Evolution Lab Repo\01_Projects\Diabetic Retinopathy Detection
Checking for file at: Data/aptos2019-blindness-detection/test_images\0005cfc8afb6.png
✅ Sanity check passed: First image found successfully.
---------------------------

Found 1928 validated image filenames.
✅ Test data generator created.
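
The sanity check above inspects only the first file path. The generator's "Found 1928 validated image filenames" line matches the 1928 rows loaded from `test.csv`, which suggests every file was found, but an explicit check makes that guarantee visible. The following is a minimal sketch of such a check; the helper name `find_missing_files` is hypothetical and not part of the original notebook.

```python
import os
from typing import Iterable, List


def find_missing_files(filepaths: Iterable[str]) -> List[str]:
    """Return the subset of paths that do not point to an existing file."""
    return [path for path in filepaths if not os.path.isfile(path)]


# Hypothetical usage with the notebook's DataFrame (assumes `test_df` exists):
# missing = find_missing_files(test_df['filepath'])
# print(f"{len(missing)} of {len(test_df)} test images are missing.")
```

An empty result means every row in `test_df` has a matching image on disk, so the predictions returned later should line up one-to-one with the `id_code` column.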


### Load Model and Make Predictions

With our paths verified, we will now load our best-performing model and use it to predict the diagnosis for each image in the test set.

In [7]:
if not test_df.empty and test_generator.n > 0:
    # 1. Load the best saved model
    try:
        print(f"Loading the best model from '{MODEL_PATH}'...")
        best_model = load_model(MODEL_PATH)
        print("✅ Model loaded successfully.")
    except Exception as e:
        print(f"❌ ERROR: Could not load the model. Ensure '{MODEL_PATH}' exists. Error: {e}")
        best_model = None

    # 2. Make predictions on the test data
    if best_model is not None:
        print("\nMaking predictions on the test data. This may take a few minutes...")

        y_pred_probs = best_model.predict(test_generator, steps=len(test_generator))
        y_pred = np.argmax(y_pred_probs, axis=1)

        print("✅ Predictions complete.")
        print(f"Sample predictions (first 5): {y_pred[:5]}")
else:
    print("Skipping prediction because the test generator is empty.")

Loading the best model from 'best_model.keras'...
✅ Model loaded successfully.

Making predictions on the test data. This may take a few minutes...


61/61 ━━━━━━━━━━━━━━━━━━━━ 91s 1s/step
✅ Predictions complete.
Sample predictions (first 5): [1 3 2 2 3]
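
To illustrate how `np.argmax(y_pred_probs, axis=1)` turns model outputs into grades, here is a small worked example. It assumes the model emits one score per severity grade (0 to 4, as in the APTOS 2019 labels); the probability values are made up for illustration and are not taken from the actual model.

```python
import numpy as np

# Made-up probability rows for three images over five grades (0-4).
example_probs = np.array([
    [0.05, 0.70, 0.15, 0.05, 0.05],
    [0.10, 0.10, 0.20, 0.50, 0.10],
    [0.60, 0.10, 0.10, 0.10, 0.10],
])

# argmax along axis=1 picks the most probable grade for each row (image).
example_pred = np.argmax(example_probs, axis=1)

# bincount tallies how many images fall into each grade; minlength=5 keeps
# grades with zero predictions in the result.
grade_counts = np.bincount(example_pred, minlength=5)

print(example_pred)  # [1 3 0]
print(grade_counts)  # [1 1 0 1 0]
```

Running the same `np.bincount` call on the real `y_pred` would give a quick view of the predicted grade distribution, which can help spot a model that collapses onto one or two classes.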


### Create and Save Submission File

This is the final operational step. We create a new DataFrame containing the image IDs from the test set alongside the model's predictions, then save it to `Final_Submission.csv` with the `id_code` and `diagnosis` columns required by the Kaggle competition.

In [8]:
SUBMISSION_CSV_PATH = "Data/aptos2019-blindness-detection/Final_Submission.csv"

# Guard against missing predictions and against a length mismatch, which would
# occur if the generator had skipped any invalid image filenames.
if 'y_pred' in locals() and len(y_pred) == len(test_df):
    # 1. Create a new DataFrame for the submission
    submission_df = pd.DataFrame({
        'id_code': test_df['id_code'],
        'diagnosis': y_pred
    })

    # 2. Save the DataFrame to a CSV file
    # `index=False` is crucial to prevent pandas from writing row numbers
    submission_df.to_csv(SUBMISSION_CSV_PATH, index=False)

    print(f"\n✅ Submission file created successfully at: {SUBMISSION_CSV_PATH}")
    print("Sample of the submission file:")
    print(submission_df.head())
else:
    print("\nSkipping submission file creation: predictions are missing or do not match the test set size.")


✅ Submission file created successfully at: Data/aptos2019-blindness-detection/Final_Submission.csv
Sample of the submission file:
        id_code  diagnosis
0  0005cfc8afb6          1
1  003f0afdcd15          3
2  006efc72b638          2
3  00836aaacf06          2
4  009245722fa4          3
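
Before uploading, it can be worth checking the submission against the expected format. The sketch below is one possible set of checks, assuming the two-column `id_code`/`diagnosis` layout shown above and integer grades from 0 to 4; the helper name `validate_submission` is hypothetical and not part of the original notebook.

```python
import pandas as pd

VALID_GRADES = [0, 1, 2, 3, 4]  # Assumed label range for the APTOS 2019 grades.


def validate_submission(submission: pd.DataFrame, expected_ids: pd.Series) -> list:
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    if list(submission.columns) != ["id_code", "diagnosis"]:
        problems.append(f"Unexpected columns: {list(submission.columns)}")
        return problems  # The remaining checks rely on these two columns.
    if len(submission) != len(expected_ids):
        problems.append(f"Row count {len(submission)} != expected {len(expected_ids)}")
    if submission["id_code"].duplicated().any():
        problems.append("Duplicate id_code values found")
    if set(submission["id_code"]) != set(expected_ids):
        problems.append("id_code values do not match the test set")
    if not submission["diagnosis"].isin(VALID_GRADES).all():
        problems.append("diagnosis contains values outside 0-4")
    return problems


# Hypothetical usage with the notebook's objects:
# issues = validate_submission(submission_df, test_df["id_code"])
# print(issues or "No problems found.")
```

Returning a list of problems instead of raising on the first failure lets all format issues surface in a single pass.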
