<a href="https://colab.research.google.com/github/DinurakshanRavichandran/Visio-Glance/blob/Pre-Processed-Datasets-NLP/unified_eye_disease_detection_corrected.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Unified Eye Disease Detection Model
This notebook builds a machine learning pipeline that predicts one of six eye diseases (glaucoma, cataract, diabetic retinopathy, CNV, DME, or drusen) from per-disease symptom datasets.

In [8]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


01) Concatenating the 6 datasets.

In [16]:
import pandas as pd

# Map each preprocessed CSV file (stored on Google Drive) to its disease label
datasets = {
              "/content/drive/MyDrive/PROJECT 29/FINAL MODEL/preprocessed_glaucoma_dataset.csv" : "Glaucoma",
              "/content/drive/MyDrive/PROJECT 29/FINAL MODEL/Preprocessed_Cataract_Dataset.csv" : "Cataract",
              "/content/drive/MyDrive/PROJECT 29/FINAL MODEL/Preprocessed_Diabetic_Retinopathy_Dataset.csv" : "Diabetic Retinopathy",
              "/content/drive/MyDrive/PROJECT 29/FINAL MODEL/Preprocessed_CNV_Detection_Dataset.csv" : "CNV",
              "/content/drive/MyDrive/PROJECT 29/FINAL MODEL/Preprocessed_DME_Dataset.csv" : "DME",
              "/content/drive/MyDrive/PROJECT 29/FINAL MODEL/Preprocessed_Drusen_Dataset.csv" : "Drusen"
}

# Load and label each dataset
dataframes = []
for file, disease in datasets.items():
    df = pd.read_csv(file)  # Load dataset
    df["Disease_Label"] = disease  # Add disease label
    dataframes.append(df)

# Concatenate all datasets
merged_df = pd.concat(dataframes, ignore_index=True)

# Columns that exist in only some datasets are NaN after concatenation; fill them with zero
merged_df = merged_df.fillna(0)

# Save the merged dataset
merged_df.to_csv("/content/drive/MyDrive/PROJECT 29/FINAL MODEL/merged_dataset.csv", index=False)

# Display dataset shape and preview
print(f"Final dataset shape: {merged_df.shape}")
print(merged_df.head())


Final dataset shape: (73624, 112)
   Age  Intraocular Pressure (IOP)  Cup-to-Disc Ratio (CDR)  Pachymetry  \
0   69                       19.46                     0.42      541.51   
1   69                       18.39                     0.72      552.77   
2   67                       23.65                     0.72      573.65   
3   23                       18.04                     0.61      590.67   
4   21                       15.87                     0.30      588.41   

   Diagnosis  Visual Symptom_vomiting  Visual Symptom_nausea  \
0          1                      0.0                    1.0   
1          1                      0.0                    0.0   
2          1                      0.0                    0.0   
3          1                      0.0                    1.0   
4          1                      0.0                    0.0   

   Visual Symptom_eye pain  Visual Symptom_vision loss  \
0                      1.0                         0.0   
[output truncated]
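
The concatenation step above relies on how `pd.concat` aligns columns: a column present in only some of the input frames is filled with NaN for the rows that lack it, and `fillna(0)` then replaces those gaps. The following is a minimal sketch of that behaviour on two toy frames. The column names and values are hypothetical and are not taken from the actual datasets.

```python
import pandas as pd

# Two toy frames standing in for per-disease datasets (hypothetical columns and values).
glaucoma = pd.DataFrame({"Age": [69, 67], "Symptom_eye pain": [1, 0]})
cataract = pd.DataFrame({"Age": [72], "Symptom_cloudy vision": [1]})

# Label each frame the same way the notebook does.
glaucoma["Disease_Label"] = "Glaucoma"
cataract["Disease_Label"] = "Cataract"

# Concatenation takes the union of all columns.
merged = pd.concat([glaucoma, cataract], ignore_index=True)

# Count the NaN values introduced for columns a source frame did not have.
nan_counts = merged.isna().sum()
print(nan_counts)

# Replace the gaps with zero, as in the notebook.
filled = merged.fillna(0)
print(filled)
```

One consequence worth keeping in mind: after filling, a 0 in a symptom column can mean either "symptom absent" or "symptom never recorded for this dataset", and the two cases are no longer distinguishable.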

02) Multi-Class Classification (Detect Specific Disease)

In [17]:
from sklearn.preprocessing import LabelEncoder

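# Encode Disease_Label into numerical values (0-5).
# Note: this overwrites the "Diagnosis" column already present in the merged data (see the preview above).
label_encoder = LabelEncoder()
merged_df["Diagnosis"] = label_encoder.fit_transform(merged_df["Disease_Label"])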

# Drop the original text label column
merged_df.drop(columns=["Disease_Label"], inplace=True)

# Save the updated dataset (relative path: written to the Colab working directory, not to Drive)
merged_df.to_csv("merged_multi_class_classification.csv", index=False)

# Print class mapping
print("Class Mapping:", dict(enumerate(label_encoder.classes_)))
print("Multi-class classification dataset prepared!")


Class Mapping: {0: 'CNV', 1: 'Cataract', 2: 'DME', 3: 'Diabetic Retinopathy', 4: 'Drusen', 5: 'Glaucoma'}
Multi-class classification dataset prepared!
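
The class mapping printed above follows from how `LabelEncoder` assigns codes: it sorts the unique labels and numbers them in that order. String sorting places uppercase letters before lowercase ones, which is why 'CNV' comes before 'Cataract'. The sketch below reproduces the mapping on the six label names and shows how `inverse_transform` can turn numeric codes back into disease names. The `predicted_codes` list is a hypothetical stand-in for model output, not a result from this notebook.

```python
from sklearn.preprocessing import LabelEncoder

# The six disease labels, in the order the datasets are loaded.
labels = ["Glaucoma", "Cataract", "Diabetic Retinopathy", "CNV", "DME", "Drusen"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# classes_ holds the sorted unique labels; the index of each label is its code.
class_mapping = dict(enumerate(encoder.classes_))
print("Class Mapping:", class_mapping)

# Hypothetical model predictions, expressed as numeric codes.
predicted_codes = [5, 0, 3]

# Map the codes back to human-readable disease names.
predicted_names = encoder.inverse_transform(predicted_codes)
print(list(predicted_names))
```

Keeping the fitted encoder around (or saving `classes_` alongside the model) is one way to make sure predictions are decoded with the same mapping that was used for training.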
