<a href="https://colab.research.google.com/github/Sonia-Mokhtari/Deep-Learning/blob/sp25_KenParWay_data01/sp25_GhaMokNeg_data01_prep.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **A. Introduction & Dataset Overview**

### **Dataset Name and Description**
The **OASIS-1** dataset consists of **T1-weighted MRI** scans from **416 participants**, including both **cognitively normal** individuals and individuals with **Alzheimer's disease**. About **100 participants** over the age of 60 have a clinical diagnosis of Alzheimer's disease, ranging from very mild to moderate. The metadata file used in this notebook lists 414 OASIS-1 scans (314 cognitively normal, 100 Alzheimer's disease). The dataset helps researchers study brain structure and its relationship to cognitive disorders.

The scans are stored in **NIfTI (.nii.gz)** format, which represents the brain as a **3D volume of voxels** (volumetric pixels: small cubes, each holding the image intensity at one point in 3D space). Because the data is volumetric, it can be sliced along three orthogonal anatomical planes for visualization:

**_Axial View:_** A top-down view, slicing the brain horizontally.

**_Sagittal View:_** A side view, dividing the brain into left and right halves.

**_Coronal View:_** A front-facing view, dividing the brain into front (anterior) and back (posterior) portions.
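As a minimal sketch of how the three planes relate to the voxel array, the hypothetical helper `middle_slices` below takes the central slice along each axis of a 3D NumPy volume. It assumes the array axes run left-right, posterior-anterior, inferior-superior (RAS-style); the actual OASIS-1 files may be stored in a different orientation, so check the image affine before trusting the plane labels.

```python
import numpy as np

def middle_slices(volume):
    """Return the central sagittal, coronal and axial 2D slices of a 3D volume.

    Assumption: axes are ordered (left-right, posterior-anterior,
    inferior-superior). Verify the orientation of a real NIfTI file
    (e.g. from its affine) before relying on these plane names.
    """
    x, y, z = (s // 2 for s in volume.shape)
    sagittal = volume[x, :, :]  # fixed left-right index
    coronal = volume[:, y, :]   # fixed front-back index
    axial = volume[:, :, z]     # fixed bottom-top index
    return sagittal, coronal, axial

# Synthetic stand-in with the same shape as a preprocessed OASIS-1 volume
vol = np.zeros((113, 137, 113), dtype=np.float32)
sag, cor, ax = middle_slices(vol)
print(sag.shape, cor.shape, ax.shape)  # (137, 113) (113, 113) (113, 137)
```

Each slice is an ordinary 2D array, so it can be displayed directly with an image-plotting function.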

Additionally, the dataset includes metadata for each participant, which provides important information to support analysis. Key metadata fields include:

**_Participant ID:_** A unique identifier for each individual.

**_Age:_** The participant's age at the time of the scan.

**_Sex:_** Indicates if the participant is male or female.

**_Clinical Diagnosis:_** Either `cognitively_normal` or `Alzheimers_disease`.

**_T1 File Name:_** The relative path (`t1_local_path`) of the corresponding MRI scan.

**_Scanner Info:_** Manufacturer and model of the MRI machine used.

### **Machine Learning Problem Definition**
In this project, the goal is to solve a **classification problem** using deep learning. The task involves predicting whether a participant is **cognitively normal** or has **Alzheimer's disease** based on their MRI scans.

**_Input:_** T1-weighted MRI scans from the dataset, represented as 3D images.

**_Output:_** A binary label: `cognitively_normal` or `Alzheimers_disease`.

**_Objective:_** Train a neural network that can identify key structural differences in the brain to make accurate predictions.

This problem matters for early diagnosis and treatment planning: detecting Alzheimer's disease earlier gives patients and clinicians more time for intervention and care planning.

## **B. Data Loading & Cleaning**

In [None]:
# Load data from Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Define paths to the dataset files
data_dir = "/content/drive/MyDrive/Deep Learning Projects/Data/OASIS-1"

# Load Metadata
import pandas as pd
metadata_path = "/content/drive/MyDrive/Deep Learning Projects/Data/metadata.csv"
metadata = pd.read_csv(metadata_path)

# Display first few rows
metadata.head()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Unnamed: 0,t1_local_path,split,study,participant_id,session_id,age,sex,clinical_diagnosis,scanner_manufacturer,scanner_model,field_strength,image_quality_rating,total_intracranial_volume,radiata_id
0,DLBS/sub-0028326/ses-01/anat/msub-0028326_ses-...,train,DLBS,28326,1,66,male,cognitively_normal,Philips,Achieva,3T,2.114547,1518.74,1
1,DLBS/sub-0028327/ses-01/anat/msub-0028327_ses-...,train,DLBS,28327,1,73,female,cognitively_normal,Philips,Achieva,3T,2.534683,1325.9,2
2,DLBS/sub-0028328/ses-01/anat/msub-0028328_ses-...,train,DLBS,28328,1,30,male,cognitively_normal,Philips,Achieva,3T,2.295338,1638.25,3
3,DLBS/sub-0028331/ses-01/anat/msub-0028331_ses-...,train,DLBS,28331,1,77,female,cognitively_normal,Philips,Achieva,3T,1.955475,1329.82,6
4,DLBS/sub-0028332/ses-01/anat/msub-0028332_ses-...,train,DLBS,28332,1,71,male,cognitively_normal,Philips,Achieva,3T,2.668323,1545.15,7


In [None]:
# Filter Metadata
# Display unique values in the study column
print('Unique values in the study column in metadata:', metadata['study'].unique())

# Filter Metadata for OASIS-1
oasis1_metadata= metadata[metadata['study']=='OASIS-1']

# Display unique values in the clinical_diagnosis column for OASIS-1 only
print('Unique values in the clinical_diagnosis	column in oasis1_metadata:', oasis1_metadata['clinical_diagnosis'].unique())

# Filter Metadata for Alzheimers_disease within OASIS-1
alzheimer1_metadata=oasis1_metadata[oasis1_metadata['clinical_diagnosis']=='Alzheimers_disease']

# Display how many cognitively normal and Alzheimer's disease cases are present in OASIS-1
print(oasis1_metadata['clinical_diagnosis'].value_counts())

# Display first few rows of oasis1_metadata
oasis1_metadata.head()

Unique values in the study column in metadata: ['DLBS' 'IXI' 'NKI-RS' 'OASIS-1' 'OASIS-2']
Unique values in the clinical_diagnosis	column in oasis1_metadata: ['cognitively_normal' 'Alzheimers_disease']
clinical_diagnosis
cognitively_normal    314
Alzheimers_disease    100
Name: count, dtype: int64


Unnamed: 0,t1_local_path,split,study,participant_id,session_id,age,sex,clinical_diagnosis,scanner_manufacturer,scanner_model,field_strength,image_quality_rating,total_intracranial_volume,radiata_id
2430,OASIS-1/sub-OASIS10001/ses-M000/anat/msub-OASI...,train,OASIS-1,OASIS10001,M000,74,female,cognitively_normal,Siemens,Vision,1.5T,1.8382,1263.92,3196
2431,OASIS-1/sub-OASIS10002/ses-M000/anat/msub-OASI...,train,OASIS-1,OASIS10002,M000,55,female,cognitively_normal,Siemens,Vision,1.5T,1.866688,1150.82,3197
2432,OASIS-1/sub-OASIS10003/ses-M000/anat/msub-OASI...,train,OASIS-1,OASIS10003,M000,73,female,Alzheimers_disease,Siemens,Vision,1.5T,1.852032,1386.11,3198
2433,OASIS-1/sub-OASIS10005/ses-M000/anat/msub-OASI...,train,OASIS-1,OASIS10005,M000,18,male,cognitively_normal,Siemens,Vision,1.5T,1.852903,1666.84,3200
2434,OASIS-1/sub-OASIS10009/ses-M000/anat/msub-OASI...,train,OASIS-1,OASIS10009,M000,20,female,cognitively_normal,Siemens,Vision,1.5T,1.846645,1509.45,3203


In [None]:
import os

# Define the root directory for MRI files
mri_root_dir = "/content/drive/MyDrive/Deep Learning Projects/Data"

# Work on an explicit copy so that adding columns does not trigger pandas' SettingWithCopyWarning
oasis1_metadata = oasis1_metadata.copy()

# Create full MRI paths
oasis1_metadata.loc[:, 'full_mri_path'] = oasis1_metadata['t1_local_path'].apply(lambda x: os.path.join(mri_root_dir, x))

# Check if MRI files exist
oasis1_metadata.loc[:, "mri_exists"] = oasis1_metadata["full_mri_path"].apply(os.path.exists)

# Print summary
print(oasis1_metadata["mri_exists"].value_counts())
print("MRI paths that are missing:", (~oasis1_metadata["mri_exists"]).sum())

# Display oasis1_metadata
oasis1_metadata.head()

mri_exists
True    414
Name: count, dtype: int64
MRI paths that are missing: 0


Unnamed: 0,t1_local_path,split,study,participant_id,session_id,age,sex,clinical_diagnosis,scanner_manufacturer,scanner_model,field_strength,image_quality_rating,total_intracranial_volume,radiata_id,full_mri_path,mri_exists
2430,OASIS-1/sub-OASIS10001/ses-M000/anat/msub-OASI...,train,OASIS-1,OASIS10001,M000,74,female,cognitively_normal,Siemens,Vision,1.5T,1.8382,1263.92,3196,/content/drive/MyDrive/Deep Learning Projects/...,True
2431,OASIS-1/sub-OASIS10002/ses-M000/anat/msub-OASI...,train,OASIS-1,OASIS10002,M000,55,female,cognitively_normal,Siemens,Vision,1.5T,1.866688,1150.82,3197,/content/drive/MyDrive/Deep Learning Projects/...,True
2432,OASIS-1/sub-OASIS10003/ses-M000/anat/msub-OASI...,train,OASIS-1,OASIS10003,M000,73,female,Alzheimers_disease,Siemens,Vision,1.5T,1.852032,1386.11,3198,/content/drive/MyDrive/Deep Learning Projects/...,True
2433,OASIS-1/sub-OASIS10005/ses-M000/anat/msub-OASI...,train,OASIS-1,OASIS10005,M000,18,male,cognitively_normal,Siemens,Vision,1.5T,1.852903,1666.84,3200,/content/drive/MyDrive/Deep Learning Projects/...,True
2434,OASIS-1/sub-OASIS10009/ses-M000/anat/msub-OASI...,train,OASIS-1,OASIS10009,M000,20,female,cognitively_normal,Siemens,Vision,1.5T,1.846645,1509.45,3203,/content/drive/MyDrive/Deep Learning Projects/...,True


In [None]:
# Install nibabel
!pip install nibabel



In [None]:
import nibabel as nib
import numpy as np

# Load and normalize MRI images
def load_mri_image(image_path):
    """Load an MRI image and return it as a min-max normalized NumPy array."""
    try:
        img = nib.load(image_path)  # Load the NIfTI file
        data = img.get_fdata()      # Voxel data as a float64 NumPy array

        # Apply min-max normalization per volume
        data_min = data.min()
        data_max = data.max()

        if data_min < data_max:
            data = (data - data_min) / (data_max - data_min)  # Scale to [0, 1]
        else:
            data = np.zeros_like(data)  # Constant volume: set all voxels to 0
        return data

    # Print an error message if loading fails (e.g., file not found, corrupted file)
    except Exception as e:
        print(f"Error loading {image_path}: {e}")
        return None

# Load MRI images and store them in a new column
oasis1_metadata.loc[:, "mri_image"] = oasis1_metadata["full_mri_path"].apply(load_mri_image)

# Display first few rows of oasis1_metadata
oasis1_metadata.head()


Unnamed: 0,t1_local_path,split,study,participant_id,session_id,age,sex,clinical_diagnosis,scanner_manufacturer,scanner_model,field_strength,image_quality_rating,total_intracranial_volume,radiata_id,full_mri_path,mri_exists,mri_image
2430,OASIS-1/sub-OASIS10001/ses-M000/anat/msub-OASI...,train,OASIS-1,OASIS10001,M000,74,female,cognitively_normal,Siemens,Vision,1.5T,1.8382,1263.92,3196,/content/drive/MyDrive/Deep Learning Projects/...,True,"[[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0..."
2431,OASIS-1/sub-OASIS10002/ses-M000/anat/msub-OASI...,train,OASIS-1,OASIS10002,M000,55,female,cognitively_normal,Siemens,Vision,1.5T,1.866688,1150.82,3197,/content/drive/MyDrive/Deep Learning Projects/...,True,"[[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0..."
2432,OASIS-1/sub-OASIS10003/ses-M000/anat/msub-OASI...,train,OASIS-1,OASIS10003,M000,73,female,Alzheimers_disease,Siemens,Vision,1.5T,1.852032,1386.11,3198,/content/drive/MyDrive/Deep Learning Projects/...,True,"[[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0..."
2433,OASIS-1/sub-OASIS10005/ses-M000/anat/msub-OASI...,train,OASIS-1,OASIS10005,M000,18,male,cognitively_normal,Siemens,Vision,1.5T,1.852903,1666.84,3200,/content/drive/MyDrive/Deep Learning Projects/...,True,"[[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0..."
2434,OASIS-1/sub-OASIS10009/ses-M000/anat/msub-OASI...,train,OASIS-1,OASIS10009,M000,20,female,cognitively_normal,Siemens,Vision,1.5T,1.846645,1509.45,3203,/content/drive/MyDrive/Deep Learning Projects/...,True,"[[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0..."
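Loading every scan into a DataFrame column keeps all volumes in memory at once. The arithmetic below is a rough sketch that uses the volume shape (113, 137, 113) reported at the end of this notebook and the 414 scans found above. It estimates about 5.4 GiB with the float64 arrays that `get_fdata()` returns by default, and about 2.7 GiB if the data were requested as float32 (for example with `img.get_fdata(dtype=np.float32)`).

```python
import numpy as np

# Shape of one preprocessed OASIS-1 volume, as printed later in this notebook
shape = (113, 137, 113)
n_volumes = 414  # scans with an existing MRI file

voxels = int(np.prod(shape))
for dtype in (np.float64, np.float32):
    bytes_per_volume = voxels * np.dtype(dtype).itemsize
    total_gib = bytes_per_volume * n_volumes / 2**30
    print(f"{np.dtype(dtype).name}: {bytes_per_volume / 2**20:.1f} MiB per volume, "
          f"{total_gib:.2f} GiB for all {n_volumes}")
```

This ignores DataFrame overhead and any extra copies made later (for example when converting to tensors), so actual usage can be higher.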


In [None]:
# Cleaning and Preprocessing
# Check for missing values in the OASIS-1 metadata
missing_values = oasis1_metadata.isnull().sum()
print(f'Missing values in the OASIS-1:\n{missing_values}')

# Check data types in the OASIS-1 metadata
print(f'Data type in the OASIS-1: \n{oasis1_metadata.dtypes}')

# Display unique values in the defined list of columns
column_list = ['split', 'study', 'session_id', 'sex', 'clinical_diagnosis',
               'scanner_manufacturer', 'scanner_model', 'field_strength']
for column in column_list:
    unique_values = oasis1_metadata[column].unique()
    print(f"Unique values in column '{column}': {unique_values}")

# Keep a lookup from the original string `participant_id` to `mri_image`
# before the IDs are converted to integers below
mri_image_mapping = oasis1_metadata.set_index("participant_id")["mri_image"].to_dict()

# Drop columns not used as features: identifiers (study, session_id, radiata_id),
# scanner details, and the file-path helper columns
unnecessary_columns = ['study', 'session_id', 'scanner_manufacturer',
                       'scanner_model', 't1_local_path',
                       'radiata_id', 'field_strength', 'mri_exists', 'full_mri_path']
oasis1_metadata = oasis1_metadata.drop(unnecessary_columns, axis=1)

# Remove the "OASIS" prefix from participant_id and store it as an integer ID
oasis1_metadata['participant_id'] = oasis1_metadata['participant_id'].str.replace('OASIS', '').astype(int)

# Label encode the 'clinical_diagnosis' and 'sex' columns, one encoder per column
# so each mapping stays available via classes_. Codes follow sorted label order:
# Alzheimers_disease → 0, cognitively_normal → 1; female → 0, male → 1.
from sklearn.preprocessing import LabelEncoder
diagnosis_encoder = LabelEncoder()
sex_encoder = LabelEncoder()
oasis1_metadata['clinical_diagnosis'] = diagnosis_encoder.fit_transform(oasis1_metadata['clinical_diagnosis'])
oasis1_metadata['sex'] = sex_encoder.fit_transform(oasis1_metadata['sex'])

oasis1_metadata.head()

Missing values in the OASIS-1:
t1_local_path                0
split                        0
study                        0
participant_id               0
session_id                   0
age                          0
sex                          0
clinical_diagnosis           0
scanner_manufacturer         0
scanner_model                0
field_strength               0
image_quality_rating         0
total_intracranial_volume    0
radiata_id                   0
full_mri_path                0
mri_exists                   0
mri_image                    0
dtype: int64
Data type in the OASIS-1: 
t1_local_path                 object
split                         object
study                         object
participant_id                object
session_id                    object
age                            int64
sex                           object
clinical_diagnosis            object
scanner_manufacturer          object
scanner_model                 object
field_strength                ob

Unnamed: 0,split,participant_id,age,sex,clinical_diagnosis,image_quality_rating,total_intracranial_volume,mri_image
2430,train,10001,74,0,1,1.8382,1263.92,"[[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0..."
2431,train,10002,55,0,1,1.866688,1150.82,"[[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0..."
2432,train,10003,73,0,0,1.852032,1386.11,"[[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0..."
2433,train,10005,18,1,1,1.852903,1666.84,"[[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0..."
2434,train,10009,20,0,1,1.846645,1509.45,"[[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0..."
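`LabelEncoder` numbers classes in sorted order of the unique labels, which is why participant 10003 (Alzheimer's disease) is encoded as 0 in the table above. The plain-Python sketch below reproduces that mapping without scikit-learn; `sorted_label_mapping` is a hypothetical helper for illustration. The practical consequence is that class 0 is the Alzheimer's (minority) class in the SMOTE step that follows.

```python
def sorted_label_mapping(values):
    """Map each distinct label to its index in sorted order (as LabelEncoder does)."""
    return {label: code for code, label in enumerate(sorted(set(values)))}

# Uppercase 'A' sorts before lowercase 'c', so Alzheimers_disease gets code 0
diagnosis_map = sorted_label_mapping(["cognitively_normal", "Alzheimers_disease"])
sex_map = sorted_label_mapping(["female", "male"])
print(diagnosis_map)  # {'Alzheimers_disease': 0, 'cognitively_normal': 1}
print(sex_map)        # {'female': 0, 'male': 1}
```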


In [None]:
import numpy as np

def detect_outliers_iqr(df):
    outliers_dict = {}

    # Select only numerical columns
    numerical_columns = df.select_dtypes(include=[np.number]).columns

    for column in numerical_columns:
        Q1 = df[column].quantile(0.25)  # First quartile (25%)
        Q3 = df[column].quantile(0.75)  # Third quartile (75%)
        IQR = Q3 - Q1  # Interquartile Range
        lower_bound = Q1 - 1.5 * IQR  # Lower bound
        upper_bound = Q3 + 1.5 * IQR  # Upper bound

        # Find outliers
        outliers = df[(df[column] < lower_bound) | (df[column] > upper_bound)]
        num_outliers = outliers.shape[0]  # Count outliers

        # Store in dictionary
        outliers_dict[column] = num_outliers

        print(f"Column: {column} → Outliers Detected: {num_outliers}")

    return outliers_dict

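# Run the function on all numerical columns.
# Note: the counts for participant_id, sex and clinical_diagnosis are not meaningful
# outliers (ID and binary columns); e.g. all 100 Alzheimer's cases (label 0)
# fall outside the zero-width IQR of clinical_diagnosis.
outliers_summary = detect_outliers_iqr(oasis1_metadata)

# Clip image_quality_rating at its 99th percentile
upper_limit = oasis1_metadata["image_quality_rating"].quantile(0.99)
oasis1_metadata["image_quality_rating"] = oasis1_metadata["image_quality_rating"].clip(upper=upper_limit)

# Cap extreme outliers in total_intracranial_volume at the 1.5 * IQR fences used by
# detect_outliers_iqr (clipping at Q1/Q3 themselves would flatten every value
# outside the middle 50% of the data)
Q1 = oasis1_metadata["total_intracranial_volume"].quantile(0.25)
Q3 = oasis1_metadata["total_intracranial_volume"].quantile(0.75)
IQR = Q3 - Q1
oasis1_metadata["total_intracranial_volume"] = oasis1_metadata["total_intracranial_volume"].clip(lower=Q1 - 1.5 * IQR, upper=Q3 + 1.5 * IQR)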

print(oasis1_metadata["image_quality_rating"].describe())
print(oasis1_metadata["total_intracranial_volume"].describe())

Column: participant_id → Outliers Detected: 0
Column: age → Outliers Detected: 0
Column: sex → Outliers Detected: 0
Column: clinical_diagnosis → Outliers Detected: 100
Column: image_quality_rating → Outliers Detected: 40
Column: total_intracranial_volume → Outliers Detected: 4
count    414.000000
mean       1.860240
std        0.048602
min        1.835361
25%        1.842263
50%        1.847598
75%        1.855458
max        2.154923
Name: image_quality_rating, dtype: float64
count     414.000000
mean     1437.077029
std        72.457374
min      1347.975000
25%      1348.076250
50%      1437.580000
75%      1526.474375
max      1526.502500
Name: total_intracranial_volume, dtype: float64
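In the `describe()` output above, the min and 25% values of total_intracranial_volume nearly coincide, as do the 75% and max values, which is the signature of clipping to the quartiles themselves. For contrast, the NumPy sketch below (hypothetical helper `clip_to_iqr_fences`, toy data) clips to the Tukey fences Q1 - 1.5·IQR and Q3 + 1.5·IQR, the same rule `detect_outliers_iqr` uses to count outliers.

```python
import numpy as np

def clip_to_iqr_fences(values, k=1.5):
    """Clip values to the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])  # linear interpolation, like pandas quantile
    iqr = q3 - q1
    return np.clip(values, q1 - k * iqr, q3 + k * iqr)

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
print(clip_to_iqr_fences(data))                      # only the extreme value changes
print(np.clip(data, *np.percentile(data, [25, 75])))  # quartile clipping alters four values
```

On this toy data the fences are -1.5 and 8.5, so only 100 is capped (to 8.5), whereas clipping to the quartiles (2.25 and 4.75) changes four of the six values.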


In [None]:
# Apply SMOTE to the numerical features of oasis1_metadata
# Balance the training set only (not validation/test)
from imblearn.over_sampling import SMOTE

# Split data on 'split' column
train_data=oasis1_metadata[oasis1_metadata['split']=='train']
val_data=oasis1_metadata[oasis1_metadata['split']=='validation']
test_data=oasis1_metadata[oasis1_metadata['split']=='test']

# Drop 'split' column since it is not needed anymore
train_data=train_data.drop('split', axis=1)
val_data=val_data.drop('split', axis=1)
test_data=test_data.drop('split', axis=1)

# Store participant_id and mri_image BEFORE applying SMOTE
train_participant_ids = train_data["participant_id"].values
train_mri_images = train_data["mri_image"].values

# Separate features and labels in the training set
X_train=train_data.drop(columns=['participant_id', 'mri_image', 'clinical_diagnosis'], axis=1)
y_train=train_data['clinical_diagnosis']

# Apply SMOTE only on the training set
smote=SMOTE(sampling_strategy=0.5, random_state=42)   # Oversample class 0 (Alzheimer's) to 50% of class 1
X_train_resampled, y_train_resampled=smote.fit_resample(X_train, y_train)

# With pandas input, fit_resample returns pandas objects, with the original rows first
# and the synthetic rows appended; rebuild a single training DataFrame from them
train_data_balanced = pd.DataFrame(X_train_resampled, columns=X_train.columns)
train_data_balanced['clinical_diagnosis'] = y_train_resampled

# Print class distribution in the training set before and after oversampling
print('Class distribution in train set before SMOTE:')
print(y_train.value_counts())
print('Class distribution in train set after SMOTE:')
print(y_train_resampled.value_counts())

Class distribution in train set before SMOTE:
clinical_diagnosis
1    252
0     88
Name: count, dtype: int64
Class distribution in train set after SMOTE:
clinical_diagnosis
1    252
0    126
Name: count, dtype: int64


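Because the OASIS-1 training set is only moderately imbalanced (88 Alzheimer's vs. 252 cognitively normal, about 1:2.9), fully balancing it to 252:252 would require 164 synthetic samples, more than the number of real minority cases, which can hurt generalization. Setting `sampling_strategy=0.5` instead raises class 0 to 126 samples (50% of class 1), adding only 38 synthetic rows.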
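These counts follow from how imbalanced-learn defines a float `sampling_strategy` for a binary problem: the desired ratio of minority to majority samples after resampling. The pure-Python sketch below (hypothetical helper `smote_samples_to_add`) applies that definition to the training counts printed above.

```python
def smote_samples_to_add(n_minority, n_majority, ratio):
    """Synthetic samples needed so that n_minority / n_majority reaches ratio.

    The target minority count is ratio * n_majority. The result is clamped at 0
    here for simplicity; imbalanced-learn may instead reject a ratio that would
    require removing minority samples.
    """
    target = int(ratio * n_majority)
    return max(target - n_minority, 0)

print(smote_samples_to_add(88, 252, 0.5))  # 38  -> 126 Alzheimer's rows
print(smote_samples_to_add(88, 252, 1.0))  # 164 -> 252 Alzheimer's rows
```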

In [None]:
# Compute the number of synthetic samples created by SMOTE (all of them belong to class 0)
num_synthetic_samples = len(X_train_resampled) - len(train_participant_ids)

# Synthetic rows have no real scan, so borrow the ID and MRI image of a randomly chosen
# real minority-class (class 0) training participant. Drawing one index per synthetic
# row keeps each borrowed ID paired with its own image.
rng = np.random.default_rng(42)
minority_idx = np.flatnonzero(y_train.to_numpy() == 0)
if num_synthetic_samples > 0:
    borrowed_idx = rng.choice(minority_idx, size=num_synthetic_samples, replace=True)
else:
    borrowed_idx = np.array([], dtype=int)
synthetic_participant_ids = train_participant_ids[borrowed_idx]
synthetic_mri_images = train_mri_images[borrowed_idx]

# Append the borrowed IDs and images after the original rows (SMOTE keeps originals first)
train_data_balanced["participant_id"] = np.concatenate([train_participant_ids, synthetic_participant_ids])
train_data_balanced["mri_image"] = np.concatenate([train_mri_images, synthetic_mri_images])

# Print a sample of the restored participant IDs
print(train_data_balanced["participant_id"].head())

0    10001
1    10002
2    10003
3    10005
4    10009
Name: participant_id, dtype: int64
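SMOTE creates each synthetic row by interpolating between a minority-class sample and one of its nearest minority-class neighbours, x_new = x_i + λ(x_nn - x_i) with λ drawn uniformly from [0, 1). The sketch below shows only that interpolation step on two hypothetical feature rows (neighbour search omitted). It illustrates two caveats of the cell above: the MRI volume attached to a synthetic row is borrowed from a real participant rather than generated from the interpolated features, and a binary column such as `sex` can take fractional values. imbalanced-learn's `SMOTENC`, which treats categorical columns separately, may be worth considering for the latter.

```python
import numpy as np

def smote_like_sample(x_i, x_neighbor, rng):
    """Create one synthetic point on the segment between x_i and a neighbour."""
    lam = rng.random()  # uniform in [0, 1)
    return x_i + lam * (x_neighbor - x_i)

rng = np.random.default_rng(0)
# Hypothetical rows: (age, sex, image_quality_rating, total_intracranial_volume)
a = np.array([74.0, 0.0, 1.84, 1264.0])
b = np.array([80.0, 1.0, 1.86, 1400.0])
new = smote_like_sample(a, b, rng)
print(new)  # every feature lies between a and b; sex is generally fractional
```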


In [None]:
# Standardize the numerical features of the oasis1_metadata splits
from sklearn.preprocessing import StandardScaler

# Initialize the scaler
scaler = StandardScaler()

# Select numerical features (excluding 'participant_id', 'mri_image' and the 'clinical_diagnosis' label)
numerical_features = train_data_balanced.drop(columns=['participant_id', 'mri_image', 'clinical_diagnosis'])
feature_columns = numerical_features.columns

# Fit the scaler on the training set only, then transform every split
X_train_scaled = scaler.fit_transform(numerical_features)
X_val_scaled = scaler.transform(val_data[feature_columns])
X_test_scaled = scaler.transform(test_data[feature_columns])

# Convert back to DataFrames, keeping each split's original index so the column
# assignments below align row by row (a fresh 0..n-1 index would not match
# val_data/test_data and would fill the added columns with NaN)
X_train_scaled = pd.DataFrame(X_train_scaled, columns=feature_columns, index=train_data_balanced.index)
X_val_scaled = pd.DataFrame(X_val_scaled, columns=feature_columns, index=val_data.index)
X_test_scaled = pd.DataFrame(X_test_scaled, columns=feature_columns, index=test_data.index)

# Add `participant_id`, `mri_image`, and `clinical_diagnosis` back
X_train_scaled['participant_id'] = train_data_balanced['participant_id']
X_train_scaled['mri_image'] = train_data_balanced['mri_image']
X_train_scaled['clinical_diagnosis'] = train_data_balanced['clinical_diagnosis']

X_val_scaled['participant_id']=val_data['participant_id']
X_val_scaled['mri_image']=val_data['mri_image']
X_val_scaled['clinical_diagnosis']=val_data['clinical_diagnosis']

X_test_scaled['participant_id']=test_data['participant_id']
X_test_scaled['mri_image']=test_data['mri_image']
X_test_scaled['clinical_diagnosis']=test_data['clinical_diagnosis']

print(X_train_scaled.dtypes)
print(X_val_scaled.dtypes)
print(X_test_scaled.dtypes)

age                          float64
sex                          float64
image_quality_rating         float64
total_intracranial_volume    float64
participant_id                 int64
mri_image                     object
clinical_diagnosis             int64
dtype: object
age                          float64
sex                          float64
image_quality_rating         float64
total_intracranial_volume    float64
participant_id               float64
mri_image                     object
clinical_diagnosis           float64
dtype: object
age                          float64
sex                          float64
image_quality_rating         float64
total_intracranial_volume    float64
participant_id               float64
mri_image                     object
clinical_diagnosis           float64
dtype: object
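`StandardScaler` maps each feature to z = (x - μ) / σ, where μ and σ (the population standard deviation) are computed on the training rows only and then reused for validation and test. The NumPy sketch below (hypothetical helpers, toy data) mirrors that fit/transform split. Note that here the training statistics come from the SMOTE-augmented training set, so the synthetic rows also influence μ and σ.

```python
import numpy as np

def fit_standardizer(train):
    """Return per-column mean and population std (ddof=0) of the training rows."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)               # ddof=0, as StandardScaler uses
    std = np.where(std == 0, 1.0, std)    # leave constant columns unscaled
    return mean, std

def standardize(x, mean, std):
    """Apply training-set statistics to any split."""
    return (x - mean) / std

train = np.array([[60.0, 1400.0], [70.0, 1500.0], [80.0, 1600.0]])
val = np.array([[70.0, 1450.0]])
mean, std = fit_standardizer(train)
print(standardize(train, mean, std))  # each column has mean 0 and std 1
print(standardize(val, mean, std))    # scaled with the training mean and std
```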


## **C. Convert Dataset into Tensor Format**

In [None]:
import torch

# Convert Train Set to PyTorch Tensors (tabular features only)
X_train_np=X_train_scaled.drop(columns=['participant_id', 'mri_image', 'clinical_diagnosis']).to_numpy()
y_train_np=train_data_balanced['clinical_diagnosis'].to_numpy()
X_train_tensor = torch.tensor(X_train_np, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train_np, dtype=torch.long)

# Convert Validation Set to PyTorch Tensors
X_val_np= X_val_scaled.drop(columns=['participant_id', 'mri_image', 'clinical_diagnosis']).to_numpy()
y_val_np=val_data['clinical_diagnosis'].to_numpy()
X_val_tensor=torch.tensor(X_val_np, dtype=torch.float32)
y_val_tensor=torch.tensor(y_val_np, dtype=torch.long)

# Convert Test Set to PyTorch Tensors
X_test_np=X_test_scaled.drop(columns=['participant_id', 'mri_image', 'clinical_diagnosis']).to_numpy()
y_test_np=test_data['clinical_diagnosis'].to_numpy()
X_test_tensor=torch.tensor(X_test_np, dtype=torch.float32)
y_test_tensor=torch.tensor(y_test_np, dtype=torch.long)

# Print tensor shape
print(X_train_tensor.shape)
print(y_train_tensor.shape)
print(X_val_tensor.shape)
print(y_val_tensor.shape)
print(X_test_tensor.shape)
print(y_test_tensor.shape)

torch.Size([378, 4])
torch.Size([378])
torch.Size([38, 4])
torch.Size([38])
torch.Size([36, 4])
torch.Size([36])


In [None]:
# Convert MRI images in oasis1_metadata to PyTorch tensors (the split DataFrames above still hold NumPy arrays)
oasis1_metadata['mri_image']=oasis1_metadata['mri_image'].apply(
    lambda x: torch.tensor(x, dtype=torch.float32) if x is not None else None
)

# Print shape of a sample image
sample_image=oasis1_metadata.iloc[0]['mri_image']
print('Sample MRI image shape:', sample_image.shape if sample_image is not None else 'No image loaded')

Sample MRI image shape: torch.Size([113, 137, 113])
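3D convolutional layers such as `torch.nn.Conv3d` expect input shaped (batch, channels, depth, height, width), while each volume here is (113, 137, 113) with no channel axis. Below is a minimal NumPy sketch (hypothetical helper `stack_volumes`) that adds the singleton channel axis when batching NumPy volumes, for example those in the split DataFrames' `mri_image` columns; `torch.from_numpy` can then wrap the result.

```python
import numpy as np

def stack_volumes(volumes):
    """Stack 3D volumes into a (N, 1, D, H, W) float32 batch for a 3D CNN."""
    batch = np.stack([np.asarray(v, dtype=np.float32) for v in volumes])
    return batch[:, np.newaxis, ...]  # insert the channel axis at position 1

vols = [np.zeros((113, 137, 113)), np.ones((113, 137, 113))]
batch = stack_volumes(vols)
print(batch.shape, batch.dtype)  # (2, 1, 113, 137, 113) float32
```

Stacking a few volumes per batch (for example inside a `Dataset`/`DataLoader`) rather than all 414 at once keeps memory use manageable.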
