<a href="https://colab.research.google.com/github/Pukar33/CBEAS-Project/blob/main/Week3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
#Import all necessary libraries
import numpy as np
import pandas as pd
import nibabel as nib
from matplotlib import pyplot as plt
import os
from glob import glob
from google.colab import drive
from ipywidgets import interact, IntSlider

In [None]:
# Mount the drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# ***Part 1: Files Extraction***

*You have a folder called OASIS-3 Dataset Part 1 in Google Drive. It contains image files and FreeSurfer-generated label files for 25 patients, with patient indices ranging from 30001 to 30025. The image files are T1-weighted (T1w) scans. Every patient has a T1w image file in zipped NIfTI format (.nii.gz) and is supposed to have a corresponding label file. FreeSurfer writes label files in .mgz format; they can be found in the mri subfolder of the FreeSurfer data subfolder inside each patient folder.*

## **Task 1: Extract the file paths of all image files and label files in the given dataset. How many image files and label files can be found?**

In [None]:
#Create a function to extract .nii.gz image file paths and aseg.mgz label file paths
def extract_files(directory):
  Loc_Image_Files=sorted(glob(f'{directory}/**/*.nii.gz*',recursive=True))
  Label_Files_Path=sorted(glob(f'{directory}/**/aseg.mgz*',recursive=True))
  # Path component 7 is used as the patient identifier; this index depends on the exact Drive mount path
  Patient_Index={path.split('/')[7] for path in Label_Files_Path}
  Image_Files_Path=[]
  previous_patient=None
  for path in Loc_Image_Files:
    patient=path.split('/')[7]
    # Keep only the first image of each patient, and only if that patient has a label file
    if patient!=previous_patient and patient in Patient_Index:
      Image_Files_Path.append(path)
    previous_patient=patient
  return Image_Files_Path,Label_Files_Path
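
Task 1 also asks how many image and label files the dataset contains. A minimal sketch of one way to count them is given here; `count_files` is a hypothetical helper, and the temporary folder tree with made-up subfolder names only stands in for the real Drive folder.

```python
import os
import tempfile
from glob import glob

def count_files(directory):
    """Count .nii.gz image files and aseg.mgz label files anywhere under directory."""
    image_files = glob(os.path.join(directory, '**', '*.nii.gz'), recursive=True)
    label_files = glob(os.path.join(directory, '**', 'aseg.mgz'), recursive=True)
    return len(image_files), len(label_files)

# Demo on a temporary tree (hypothetical layout, not the real OASIS-3 folder).
with tempfile.TemporaryDirectory() as root:
    for rel in ['p1/anat/a_T1w.nii.gz', 'p1/anat/b_T1w.nii.gz',
                'p1/fs/mri/aseg.mgz', 'p2/anat/a_T1w.nii.gz']:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, 'w').close()  # empty placeholder file
    n_images, n_labels = count_files(root)

print(n_images, n_labels)
```

On the Drive dataset the same helper could be called as `count_files(directory)` once `directory` points at the dataset folder.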

In [None]:
#Create a function to save image and label files in two different folders
def save2folder(Image_Files_Path,Label_Files_Path):
  save_img_directory='/content/drive/MyDrive/Datasets/OASIS Dataset/Image Folder'
  save_label_directory='/content/drive/MyDrive/Datasets/OASIS Dataset/Label Folder'
  # Make sure both output folders exist before writing to them
  os.makedirs(save_img_directory,exist_ok=True)
  os.makedirs(save_label_directory,exist_ok=True)
  for items in Image_Files_Path:
    img=nib.load(items)
    img_data=img.get_fdata()
    save_img=nib.Nifti1Image(dataobj=img_data,affine=img.affine)
    # The image file name is built from path components 8 and 9
    file_name=os.path.join(save_img_directory,items.split('/')[8]+'_'+items.split('/')[9]+'.nii.gz')
    nib.save(save_img,filename=file_name)
  for items in Label_Files_Path:
    # nibabel loads .mgz files directly; saving as Nifti1Image converts them to .nii.gz
    img=nib.load(items)
    img_data=img.get_fdata()
    save_img=nib.Nifti1Image(dataobj=img_data,affine=img.affine)
    # The label file name is built from path component 8
    file_name=os.path.join(save_label_directory,items.split('/')[8]+'.nii.gz')
    nib.save(save_img,filename=file_name)
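
The functions above pick path components by absolute index (`split('/')[7]`, `[8]`, `[9]`), which only works for this exact mount path. As a hedged alternative, the sketch below takes the components relative to the dataset root instead. `relative_parts` is a hypothetical helper, and the folder names `level0` to `level3` are placeholders, not the real dataset layout.

```python
from pathlib import PurePosixPath

def relative_parts(file_path, root):
    """Return the path components of file_path that lie below root."""
    return PurePosixPath(file_path).relative_to(PurePosixPath(root)).parts

# Hypothetical example path; only the root matches the notebook.
root = '/content/drive/MyDrive/Datasets/OASIS3_DataSet_Part1'
example = root + '/level0/level1/level2/level3/scan_T1w.nii.gz'
parts = relative_parts(example, root)
print(parts)

# For this root, absolute index 7 corresponds to relative index 1,
# index 8 to relative index 2, and index 9 to relative index 3.
print(parts[1] == example.split('/')[7])
```

With relative indices the same code keeps working if the dataset folder is moved or mounted somewhere else.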

In [None]:
#Create a function to create a patient dataframe with arguments as Image folder path and Label Folder Path
def create_patient_dataframe(img_directory,label_directory):
  Image_Path=sorted(glob(f'{img_directory}/*.nii.gz'))
  Label_Path=sorted(glob(f'{label_directory}/*.nii.gz'))
  Pat_Data = []
  # zip pairs files by sorted position, so both folders must hold matching files in the same order
  for img_file,label_file in zip(Image_Path,Label_Path):
    img = nib.load(img_file)
    label = nib.load(label_file)
    Pat_Data.append({
        'Image Path': img_file,
        'Label Path': label_file,
        # img.shape reads the shape from the header without loading the voxel data
        'Image Shape': img.shape,
        'Label Shape': label.shape,
        'Image Orientation': nib.aff2axcodes(img.affine),
        'Label Orientation': nib.aff2axcodes(label.affine),
        'Image Voxel Size': img.header.get_zooms(),
        'Label Voxel Size': label.header.get_zooms()
    })
  Patient_Dataframe = pd.DataFrame(Pat_Data)
  return Patient_Dataframe
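
Because `zip(Image_Path,Label_Path)` pairs files by sorted position, one missing image would shift every later pair. A minimal sketch of pairing by file name instead is shown below. It assumes the naming used in `save2folder` (label `<id>.nii.gz`, image `<id>_<something>.nii.gz`) and that no id is itself followed by an underscore in another id; `pair_by_prefix` and the file names are hypothetical.

```python
import os

def pair_by_prefix(image_paths, label_paths):
    """Pair each label file with the first image whose name starts with '<label stem>_'."""
    suffix = '.nii.gz'
    pairs, unmatched = [], []
    for label in sorted(label_paths):
        stem = os.path.basename(label)[:-len(suffix)]
        matches = [img for img in sorted(image_paths)
                   if os.path.basename(img).startswith(stem + '_')]
        if matches:
            pairs.append((matches[0], label))
        else:
            unmatched.append(label)  # label without a corresponding image
    return pairs, unmatched

# Hypothetical file names: the image for idB is missing.
images = ['Image Folder/idA_x.nii.gz', 'Image Folder/idC_x.nii.gz']
labels = ['Label Folder/idA.nii.gz', 'Label Folder/idB.nii.gz', 'Label Folder/idC.nii.gz']
pairs, unmatched = pair_by_prefix(images, labels)
print(pairs)
print(unmatched)
```

With plain `zip`, the image for `idC` would have been paired with the label for `idB`; here `idB` is reported as unmatched instead.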

In [None]:
# To view the data using pandas
directory='/content/drive/MyDrive/Datasets/OASIS3_DataSet_Part1'
img_directory='/content/drive/MyDrive/Datasets/OASIS Dataset/Image Folder'
label_directory='/content/drive/MyDrive/Datasets/OASIS Dataset/Label Folder'

os.makedirs(img_directory,exist_ok=True)
os.makedirs(label_directory,exist_ok=True)

Image_Files_Path,Label_Files_Path=extract_files(directory)
save2folder(Image_Files_Path,Label_Files_Path)
Pat_Data=create_patient_dataframe(img_directory,label_directory)
Pat_Data

| | Image Path | Label Path | Image Shape | Label Shape | Image Orientation | Label Orientation | Image Voxel Size | Label Voxel Size |
|---|---|---|---|---|---|---|---|---|
| 0 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 256, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (0.9999988, 1.0, 1.0) | (1.0, 0.99999994, 0.99999976) |
| 1 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 240, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (1.1999978, 1.0546874, 1.0546875) | (1.0000001, 1.0, 1.0000001) |
| 2 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 256, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (0.9999997, 1.0, 1.0) | (1.0, 0.99999994, 1.0000001) |
| 3 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 240, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (1.2000046, 1.0546875, 1.0546874) | (1.0, 1.0, 1.0) |
| 4 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 240, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (1.2000046, 1.0546875, 1.0546875) | (1.0, 1.0, 1.0) |
| 5 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 256, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (1.0000027, 0.99999994, 1.0) | (1.0, 1.0, 1.0000001) |
| 6 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 256, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (0.99999845, 1.0, 1.0) | (1.0000001, 1.0000001, 1.0000001) |
| 7 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 240, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (1.2000046, 1.0546875, 1.0546875) | (1.0, 1.0, 1.0) |
| 8 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 256, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (1.0000005, 1.0, 1.0) | (0.9999998, 0.9999999, 1.0) |
| 9 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 240, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (1.2000046, 1.0546875, 1.0546875) | (1.0, 1.0, 1.0) |


In [None]:
# To visualize the data
def display_image(slice_index,patient_index):
  img=nib.load(Pat_Data['Image Path'].iloc[patient_index])
  img_data=img.get_fdata()
  print(img_data.shape)
  # Show one slice along the third axis using matplotlib's 'gray' colormap
  plt.imshow(img_data[:,:,slice_index],cmap='gray')
  plt.show()

# Bound the patient slider by the number of rows currently in the dataframe
interact(display_image,slice_index=IntSlider(min=0,max=255,value=100,step=1),patient_index=IntSlider(min=0,max=len(Pat_Data)-1,value=0,step=1))

interactive(children=(IntSlider(value=100, description='slice_index', max=255), IntSlider(value=0, description…

## **Task 2: Remove any redundant image files for any patient and also remove any image files for any patient that doesn't have corresponding label files.**

*Note: You may observe that the image files for patient indices 11, 18 and 20 are wrongly encoded with LAS orientation. These image files should be removed along with their corresponding label files.*
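
The orientation codes in the dataframe come from `nib.aff2axcodes`. To illustrate what codes such as RAS, LAS and LIA mean, here is a simplified plain-Python sketch. It is not nibabel's implementation: `simple_axcodes` is a hypothetical helper that assumes each voxel axis points mostly along one world axis, and the three affines are made-up examples.

```python
def simple_axcodes(affine):
    """Simplified orientation codes for a 4x4 affine given as nested lists.

    For each voxel axis (a column of the upper-left 3x3 block), pick the world
    axis with the largest absolute component and use its sign to choose the letter.
    """
    labels = (('L', 'R'), ('P', 'A'), ('I', 'S'))  # (negative, positive) per world axis
    codes = []
    for col in range(3):
        column = [affine[row][col] for row in range(3)]
        world_axis = max(range(3), key=lambda row: abs(column[row]))
        positive = column[world_axis] > 0
        codes.append(labels[world_axis][1 if positive else 0])
    return tuple(codes)

# Made-up affines: identity, flipped first axis, and a FreeSurfer-style axis permutation.
ras = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
las = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
lia = [[-1, 0, 0, 0], [0, 0, 1, 0], [0, -1, 0, 0], [0, 0, 0, 1]]
print(simple_axcodes(ras), simple_axcodes(las), simple_axcodes(lia))
```

The only difference between the RAS and LAS examples is the sign of the first column, which is why a left-right flip is easy to miss by eye and worth checking from the affine.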

In [None]:
#Removing wrongly encoded image and corresponding label files
for i in range(0,len(Pat_Data)):
  if Pat_Data['Image Orientation'].iloc[i]!=('R','A','S'):
    os.remove(Pat_Data['Image Path'].iloc[i])
    os.remove(Pat_Data['Label Path'].iloc[i])
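
Since `os.remove` cannot be undone, it can help to list the files first and delete them only after checking the list. The sketch below shows one way to do that under stated assumptions: `find_misoriented` and `remove_pairs` are hypothetical helpers, and the records are plain dictionaries with the same keys as the dataframe columns, backed by empty placeholder files.

```python
import os
import tempfile

def find_misoriented(records, expected=('R', 'A', 'S')):
    """Return (image_path, label_path) pairs whose image orientation differs from expected."""
    return [(r['Image Path'], r['Label Path'])
            for r in records if tuple(r['Image Orientation']) != expected]

def remove_pairs(pairs, dry_run=True):
    """Print the files that would be removed; delete them only when dry_run is False."""
    for image_path, label_path in pairs:
        print('remove:', image_path, '|', label_path)
        if not dry_run:
            os.remove(image_path)
            os.remove(label_path)

# Demo with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    records = []
    for name, orientation in [('a', ('R', 'A', 'S')), ('b', ('L', 'A', 'S'))]:
        img = os.path.join(tmp, name + '_img.nii.gz')
        lab = os.path.join(tmp, name + '_lab.nii.gz')
        for p in (img, lab):
            open(p, 'w').close()
        records.append({'Image Path': img, 'Label Path': lab,
                        'Image Orientation': orientation})
    to_remove = find_misoriented(records)
    remove_pairs(to_remove, dry_run=True)      # nothing is deleted yet
    remaining_after_dry_run = sorted(os.listdir(tmp))
    remove_pairs(to_remove, dry_run=False)     # now the pair is deleted
    remaining = sorted(os.listdir(tmp))

print(remaining)
```

With the notebook's dataframe, the records could be obtained with `Pat_Data.to_dict('records')`.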


In [None]:
#Now visualize the dataframe once wrongly encoded image and label files are removed.
Pat_Data=create_patient_dataframe(img_directory,label_directory)
Pat_Data


| | Image Path | Label Path | Image Shape | Label Shape | Image Orientation | Label Orientation | Image Voxel Size | Label Voxel Size |
|---|---|---|---|---|---|---|---|---|
| 0 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 256, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (0.9999988, 1.0, 1.0) | (1.0, 0.99999994, 0.99999976) |
| 1 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 240, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (1.1999978, 1.0546874, 1.0546875) | (1.0000001, 1.0, 1.0000001) |
| 2 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 256, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (0.9999997, 1.0, 1.0) | (1.0, 0.99999994, 1.0000001) |
| 3 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 240, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (1.2000046, 1.0546875, 1.0546874) | (1.0, 1.0, 1.0) |
| 4 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 240, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (1.2000046, 1.0546875, 1.0546875) | (1.0, 1.0, 1.0) |
| 5 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 256, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (1.0000027, 0.99999994, 1.0) | (1.0, 1.0, 1.0000001) |
| 6 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 256, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (0.99999845, 1.0, 1.0) | (1.0000001, 1.0000001, 1.0000001) |
| 7 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 240, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (1.2000046, 1.0546875, 1.0546875) | (1.0, 1.0, 1.0) |
| 8 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 256, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (1.0000005, 1.0, 1.0) | (0.9999998, 0.9999999, 1.0) |
| 9 | /content/drive/MyDrive/Datasets/OASIS Dataset/... | /content/drive/MyDrive/Datasets/OASIS Dataset/... | (176, 240, 256) | (256, 256, 256) | (R, A, S) | (L, I, A) | (1.2000046, 1.0546875, 1.0546875) | (1.0, 1.0, 1.0) |


*Note: You should end up with 23 image files and 23 label files. Hint: Use the glob library to work with file paths. The nibabel library can load .mgz files as well.*

# ***Part 2: Dataset Correction***

## **Task 1: Inspect all image files to check whether they are correctly encoded. Create a pandas dataframe with the following columns:**


*   **Image File Path**
*   **Image Voxel Size**
*   **Image Resolution**
*   **Image Orientation**
*   **Label File Path**
*   **Label Voxel Size**
*   **Label Resolution**
*   **Label Orientation**





In [None]:
# Performed above

## **Task 2: Use ipywidgets to inspect the image files for inconsistencies such as wrong image orientation. Remove any image files that are wrongly encoded, along with their corresponding label files.**

In [None]:
# Performed above

# ***Part 3: Folder Organisation***

*After Parts 1 and 2, create two folders in Google Drive: Image Folder, holding all valid image files (.nii.gz), and Label Folder, holding all valid label files (aseg.mgz), since these two are what we eventually need. Do this step in code, by reading the image data and saving it under a chosen file name, not by copying and pasting files between folders in Google Drive. Each file name should include the patient_index and session_number for identification. Label files with the .mgz extension should also be converted to NIfTI format (.nii.gz). After this step you can delete the OASIS3_Dataset_Part1 folder and retain just the two folders, Image Folder and Label Folder. This reduces the storage burden on the drive by removing unnecessary files and folders. Save all image files and label files in zipped format.*
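
Before deleting the original dataset folder, it may be useful to see how much storage each folder actually uses. A minimal sketch, assuming a hypothetical helper `folder_size_bytes` and a temporary folder with dummy files in place of the Drive folders:

```python
import os
import tempfile

def folder_size_bytes(directory):
    """Sum the sizes of all files under directory, including subfolders."""
    total = 0
    for current_dir, _subdirs, files in os.walk(directory):
        for name in files:
            total += os.path.getsize(os.path.join(current_dir, name))
    return total

# Demo: two dummy files of 1000 and 500 bytes.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    with open(os.path.join(tmp, 'a.bin'), 'wb') as f:
        f.write(b'\0' * 1000)
    with open(os.path.join(tmp, 'sub', 'b.bin'), 'wb') as f:
        f.write(b'\0' * 500)
    size = folder_size_bytes(tmp)

print(size)
```

Comparing the result for the original dataset folder with the results for Image Folder and Label Folder gives a rough idea of the space saved.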

# **Important Note: You may want to create a function that executes all the parts above so that it can be reused for other parts of the dataset.**
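
One possible shape for such a function is sketched below. `run_pipeline` is a hypothetical wrapper that receives the three steps as callables, so the demo can run with stand-in functions instead of real image files.

```python
import os
import tempfile

def run_pipeline(source_dir, img_dir, label_dir, extract, save, summarize):
    """Run extraction, saving and summarizing in order and return the summary.

    extract, save and summarize are callables supplied by the caller.
    """
    os.makedirs(img_dir, exist_ok=True)
    os.makedirs(label_dir, exist_ok=True)
    image_paths, label_paths = extract(source_dir)
    save(image_paths, label_paths)
    return summarize(img_dir, label_dir)

# Demo with stand-in callables that only record what they were given.
calls = []
with tempfile.TemporaryDirectory() as tmp:
    result = run_pipeline(
        source_dir=os.path.join(tmp, 'raw'),
        img_dir=os.path.join(tmp, 'Image Folder'),
        label_dir=os.path.join(tmp, 'Label Folder'),
        extract=lambda d: (['img1'], ['lab1']),
        save=lambda imgs, labs: calls.append((imgs, labs)),
        summarize=lambda i, l: {'images': os.path.basename(i),
                                'labels': os.path.basename(l)},
    )

print(result)
```

In this notebook the call could look like `run_pipeline(directory, img_directory, label_directory, extract_files, save2folder, create_patient_dataframe)`. Note that `save2folder` as written uses its own hard-coded output folders, so they would have to match `img_directory` and `label_directory`.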