# Training BIANCA Automatically

This notebook documents how to train BIANCA and provides a script that repeats the process automatically, without manually sorting, renaming, or adjusting any files.

Before proceeding to the code part, it is essential to understand the steps that will be required for training BIANCA.

Following the training practical [https://open.win.ox.ac.uk/pages/fslcourse/practicals/seg_struc/index.html#bianca], we need the following files for each subject, regardless of how many subjects there are:
- Flair Brain Image
- Flair Brain Lesion Mask
- T1 Brain Image
- Flair Brain to MNI for .mat
- T1 Brain to Flair Brain

## Prerequisites:

To start, we need the **Flair Brain Image**, **Flair Brain Lesion Mask**, and **T1 Brain Image** for each subject. 
Once we have those, we can derive the remaining files through registration and resampling. 

The code blocks in this Jupyter Notebook also expect a single folder that contains the **Flair Brain Image**, **Flair Brain Lesion Mask**, and **T1 Brain Image** files, named with the following convention and structure:

**ms_0xx_tpy_z.nii.gz**
(x = subject number, y = timepoint, z = file name)
- As an example:
    - ms_002_tp1_flair_brain.nii.gz 
        - (having 'flair_brain' in the name is necessary)
    - ms_002_tp1_lesion_mask.nii.gz 
        - (having 'mask' in the name is necessary)
    - ms_002_tp1_T1_brain.nii.gz    
        - (having 'T1' in the name is necessary)
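
The naming convention above can be checked before any processing starts. The following is a minimal sketch, not part of the original notebook: the regular expression and the helper name `parse_name` are assumptions made for illustration, and the keyword checks mirror the ones used later when the files are sorted into lists.

```python
import re

# Assumed pattern for the convention ms_0xx_tpy_z.nii.gz (three-digit subject number)
NAME_PATTERN = re.compile(r"^(ms_\d{3}_tp\d+)_(.+)\.nii\.gz$")

def parse_name(file_name):
    """Return (subject_id, kind), or None when the name does not follow the convention."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        return None
    subject_id, suffix = match.groups()
    # Same keywords, in the same order, as the sorting cell of this notebook
    if "flair_brain" in suffix:
        kind = "flair"
    elif "mask" in suffix:
        kind = "mask"
    elif "T1" in suffix:
        kind = "t1"
    else:
        kind = "other"
    return subject_id, kind

print(parse_name("ms_002_tp1_flair_brain.nii.gz"))
print(parse_name("ms_002_tp1_lesion_mask.nii.gz"))
print(parse_name(".DS_Store"))
```

Files that return `None` (such as `.DS_Store`) can simply be skipped.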

### An example of a folder I created for the test run is as follows:


<img src="notebook_images/file_structure.png" alt="Drawing" style="width: 800px;"/>


## First Step: Resampling all Images

BIANCA requires all the **Flair Brain Image(s)** to have the same dimensions. We therefore find the most frequently occurring dimensions and resample every other image to match them.

### Getting the dimensions for all the flair brain Images

#### First, we get the folder path where all the images are stored

In [1]:
from tkinter import Tk, filedialog

root = Tk() # Create the root Tk window needed by the file dialog
root.withdraw() # Hide the small empty tkinter window
root.attributes('-topmost', True) # Keep the dialog above all other windows
open_file = filedialog.askdirectory() # Returns the selected folder path as str
print(open_file) 

/Users/zunairviqar/Desktop/MRI_Work/Jupyter Notebooks/test_data


#### Ensuring the correct folder is selected by printing out the names of all the files within the folder

In [2]:
import os
files = os.listdir(open_file) # returns list
print(files)

['.DS_Store', 'ms_003_tp1_flair_brain.nii.gz', 'MNI152_2mm_brain.nii.gz', 'ms_002_tp1_flair_lesion_mask.nii.gz', 'Training-2022-04-25', 'ms_002_tp1_T1_brain.nii.gz', 'ms_002_tp1_flair_brain.nii.gz', 'ms_003_tp1_T1_brain.nii.gz']


#### Second, we create 3 different lists, and then store the files for the Flair Brain, Lesion Mask and T1 Brain images respectively

In [3]:
flairs = []
masks = []
t1s = []

for file in files:
#     print(file)
    if 'flair_brain' in file:
        flairs.append(file)
    elif 'mask' in file:
        masks.append(file)
    elif 'T1' in file:
        t1s.append(file)

In [4]:
print("All the Flair Brain Images are:")
print(flairs)
print()
print("All the Lesion Mask Images are:")
print(masks)
print()
print("All the T1 Brain Images are:")
print(t1s)

All the Flair Brain Images are:
['ms_003_tp1_flair_brain.nii.gz', 'ms_002_tp1_flair_brain.nii.gz']

All the Lesion Mask Images are:
['ms_002_tp1_flair_lesion_mask.nii.gz']

All the T1 Brain Images are:
['ms_002_tp1_T1_brain.nii.gz', 'ms_003_tp1_T1_brain.nii.gz']


### Checking the most frequently occurring dimension

In [5]:
dimensions = []
# Escape spaces so the path can be used inside a shell command
directory_1 = open_file.replace(' ', '\\ ')

def check_dimensions(file_name, printvals):
    """Return (dim1, dim2, dim3) of an image as reported by fslinfo."""
    p = os.popen(
        f"cd {directory_1} && fslinfo {file_name}")
    output = p.read()
    p.close()
    # Flatten the tab-separated "key value" lines into [key, value, key, value, ...]
    output = output.replace("\t", "\n")
    file_info = [item for item in output.split("\n") if item != '']
    # Positions 3, 5 and 7 hold the values that follow dim1, dim2 and dim3
    dim1 = file_info[3]
    dim2 = file_info[5]
    dim3 = file_info[7]
    if printvals:
        print("Dimensions of: ", file_name)
        print(dim1,dim2,dim3)
    return (dim1, dim2, dim3)

for i in range(len(flairs)):
    dimensions.append(check_dimensions(flairs[i], True))

Dimensions of:  ms_003_tp1_flair_brain.nii.gz
280 320 32
Dimensions of:  ms_002_tp1_flair_brain.nii.gz
256 256 46
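
The function above reads the dimensions by position in the `fslinfo` output. A slightly more defensive variant looks the values up by key instead. This is only a sketch: the helper names `parse_fslinfo` and `dims_from_fslinfo` are hypothetical, and the sample text is an illustrative stand-in shaped like tab-separated `key value` lines, not output captured from a real run.

```python
def parse_fslinfo(output):
    """Turn 'key value' lines into a dictionary of strings."""
    info = {}
    for line in output.splitlines():
        parts = line.split()  # splits on tabs and spaces alike
        if len(parts) >= 2:
            info[parts[0]] = parts[1]
    return info

def dims_from_fslinfo(output):
    """Return (dim1, dim2, dim3), looked up by name rather than by position."""
    info = parse_fslinfo(output)
    return info["dim1"], info["dim2"], info["dim3"]

# Illustrative stand-in for the text returned by p.read()
sample = "data_type\tINT16\ndim1\t\t256\ndim2\t\t256\ndim3\t\t46\ndim4\t\t1\n"
print(dims_from_fslinfo(sample))
```

Looking values up by key raises a `KeyError` when a field is missing, which is easier to diagnose than a silently wrong index.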


In [6]:
def most_frequent(List):
    return max(set(List), key = List.count)
 
most_freq = most_frequent(dimensions)
print("The Most Frequently Occurring Dimension is: ")
print(most_freq[0],"x", most_freq[1],"x", most_freq[2])

The Most Frequently Occurring Dimension is: 
256 x 256 x 46
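
The same selection can be written with `collections.Counter`, which also reports how many images share the winning dimensions. This is a sketch with a hypothetical helper name and made-up example data. One design point: when two dimensions are equally common (as with the two test images here), `max(set(...))` depends on set iteration order, whereas `Counter.most_common` keeps the first one encountered.

```python
from collections import Counter

def most_frequent_dimension(dimensions):
    """Return (dimension, count) for the most common entry in the list."""
    # most_common(1) returns a one-element list of (value, count)
    value, count = Counter(dimensions).most_common(1)[0]
    return value, count

# Made-up example data: three images, two of which share their dimensions
dims = [("280", "320", "32"), ("256", "256", "46"), ("256", "256", "46")]
winner, count = most_frequent_dimension(dims)
print(winner, count)
```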


### Picking an Image with the most frequently occurring dimension for reference purposes

#### Index for the most frequently occurring dimension

In [7]:
ind = dimensions.index(most_freq)

#### Getting the file name at that index from the *flairs* list

In [8]:
ref_img = flairs[ind]
print(ref_img)

ms_002_tp1_flair_brain.nii.gz


### Create a subdirectory 'Training-{DATE}' to store all the resampled files, which shall be used for training purposes

In [9]:
from datetime import date
today = date.today()
directory_name = 'Training-' + str(today)

In [10]:
path = os.path.join(open_file, directory_name)
try:
    os.mkdir(path)
except OSError as error:
    print(error)

[Errno 17] File exists: '/Users/zunairviqar/Desktop/MRI_Work/Jupyter Notebooks/test_data/Training-2022-04-25'
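
The `File exists` message above appears whenever the notebook is re-run on the same day. If that message is not wanted, `os.makedirs` with `exist_ok=True` creates the folder only when it is missing. A minimal sketch, using a temporary directory as a stand-in for the selected data folder and a hypothetical helper name:

```python
import os
import tempfile
from datetime import date

def make_training_dir(base_dir):
    """Create base_dir/Training-<today> if needed and return its path."""
    path = os.path.join(base_dir, "Training-" + str(date.today()))
    os.makedirs(path, exist_ok=True)  # no error when the folder already exists
    return path

# Temporary stand-in for the folder chosen in the file dialog
with tempfile.TemporaryDirectory() as base:
    first = make_training_dir(base)
    second = make_training_dir(base)  # second call is a silent no-op
    created = os.path.isdir(first)
print(first == second, created)
```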


### Within the subdirectory, we create one subdirectory for each subject

In [11]:
folder_names = []

for file in flairs:
    # The subject folder name is the first three underscore-separated parts, e.g. ms_003_tp1
    folder_name = "_".join(file.split('_')[0:3])
    print(folder_name)
    folder_names.append(folder_name)
    
    path = os.path.join(open_file, directory_name, folder_name)
    try:
        os.mkdir(path)
    except OSError as error:
        print(error)

ms_003_tp1
[Errno 17] File exists: '/Users/zunairviqar/Desktop/MRI_Work/Jupyter Notebooks/test_data/Training-2022-04-25/ms_003_tp1'
ms_002_tp1
[Errno 17] File exists: '/Users/zunairviqar/Desktop/MRI_Work/Jupyter Notebooks/test_data/Training-2022-04-25/ms_002_tp1'


### Now, we should essentially have a directory structure like the following:

<img src="notebook_images/new_folders.png" alt="Drawing" style="width: 700px;"/>


### Using the reference Image, we iterate through all the flair brain images with different dimensions and resample them to be in the same dimensions, and then store them into their respective subdirectory

#### We do the same for the corresponding Lesion Masks to those Flair Brain Images since they should be resampled into the same dimensions as their corresponding flairs

#### However, one of the subjects is reserved for testing and has no existing flair lesion mask. Since this can be any subject, we ask the user for the subject number and timepoint of the test subject

In [12]:
training_folder = input("Please enter the Test Subject as ms_0xx_tpx (example: ms_003_tp1)")

Please enter the Test Subject as ms_0xx_tpx (example: ms_003_tp1)ms_003_tp1


In [13]:
print("The Test Subject is: ", training_folder)

The Test Subject is:  ms_003_tp1
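
Because the test subject is typed in by hand, a typo would only surface much later in the pipeline. One option is to check the input against the subjects that actually have a flair image. This is a sketch: `validate_test_subject` is a hypothetical helper, and the file list is copied from the example folder above.

```python
def validate_test_subject(test_subject, flair_files):
    """Return the cleaned subject ID, or raise ValueError if it is unknown."""
    # Same ID extraction as the rest of the notebook: first three '_' parts
    subjects = {"_".join(name.split("_")[0:3]) for name in flair_files}
    cleaned = test_subject.strip()
    if cleaned not in subjects:
        raise ValueError(f"{cleaned!r} is not one of {sorted(subjects)}")
    return cleaned

flair_files = ["ms_003_tp1_flair_brain.nii.gz", "ms_002_tp1_flair_brain.nii.gz"]
print(validate_test_subject(" ms_003_tp1 ", flair_files))
```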


In [14]:
for file in flairs:
    
    folder_name = "_".join(file.split('_')[0:3])
    
    # For all the images that have a dimension other than the most frequently occurring dimension
    if check_dimensions(file, False) != check_dimensions(ref_img, False):
        print("RESAMPLING....")
        print("FILE TO BE RESAMPLED: ", file)
        print()
                
        # Resampling the Flair Brain Image
        p = os.popen(
            f"cd {directory_1} && flirt -in {file} -ref {ref_img} -out {directory_name}/{folder_name}/{folder_name}_flair_brain.nii.gz -omat {directory_name}/{folder_name}/resampling.mat -interp nearestneighbour -dof 6")
        if p:
            output = p.read()
            print(output)
        
        if(folder_name != training_folder):
            
            # Find the lesion mask that belongs to the same subject
            mask_file = None
            for mask in masks:
                if (folder_name in mask):
                    print("Mask to be Resampled: ",mask)
                    mask_file = mask
                    break

            if mask_file is None:
                # Avoids reusing a mask left over from a previous subject
                print("No lesion mask found for:", folder_name)
            else:
                # Resampling the Flair Lesion Mask with the transform saved above
                p = os.popen(
                    f"cd {directory_1} && flirt -in {mask_file} -ref {ref_img} -out {directory_name}/{folder_name}/{folder_name}_flair_lesion_mask.nii.gz -applyxfm -init {directory_name}/{folder_name}/resampling.mat -interp nearestneighbour -dof 6")
                if p:
                    output = p.read()
                    print(output)

        # Deleting the Resampling.mat file since it is of no use to us        
        p = os.popen(
            f"cd {directory_1} && rm {directory_name}/{folder_name}/resampling.mat")
        if p:
            output = p.read()
            print(output)
    
    # If the file does not need to be resampled, we simply copy it into the subject's training directory
    else:    
        print("COPYING...")
        print("Copying the file:", file)
        # Copying the Flair Brain Images        
        p = os.popen(
            f"cd {directory_1} && cp {file} {directory_name}/{folder_name}/{folder_name}_flair_brain.nii.gz")
        if p:
            output = p.read()
            print(output)  
        
        if(folder_name != training_folder):
            
            # Find the lesion mask that belongs to the same subject
            mask_file = None
            for mask in masks:
                if (folder_name in mask):
                    print("Mask to be Copied: ",mask)
                    mask_file = mask
                    break

            if mask_file is None:
                # Avoids reusing a mask left over from a previous subject
                print("No lesion mask found for:", folder_name)
            else:
                print("Copying the file:", mask_file)
                # Copying the Flair Lesion Mask
                p = os.popen(
                    f"cd {directory_1} && cp {mask_file} {directory_name}/{folder_name}/{folder_name}_flair_lesion_mask.nii.gz")
                if p:
                    output = p.read()
                    print(output)
            
            
    

RESAMPLING....
FILE TO BE RESAMPLED:  ms_003_tp1_flair_brain.nii.gz


COPYING...
Copying the file: ms_002_tp1_flair_brain.nii.gz

Mask to be Copied:  ms_002_tp1_flair_lesion_mask.nii.gz
Copying the file: ms_002_tp1_flair_lesion_mask.nii.gz
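
The cell above builds shell strings and relies on escaped spaces in the folder path. An alternative is to pass the command to `subprocess.run` as a list of arguments with `cwd` set to the data folder, so no escaping is needed and a failing command raises an error. The sketch below only builds and prints the command. The helper names are hypothetical, and the `flirt` flags are the ones already used in the resampling cell.

```python
import subprocess

def flirt_resample_cmd(in_file, ref_file, out_file, mat_file):
    """Argument list for the resampling call used in the cell above."""
    return ["flirt", "-in", in_file, "-ref", ref_file, "-out", out_file,
            "-omat", mat_file, "-interp", "nearestneighbour", "-dof", "6"]

def run_in(directory, cmd):
    """Run cmd inside directory; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(cmd, cwd=directory, capture_output=True,
                            text=True, check=True)
    return result.stdout

cmd = flirt_resample_cmd(
    "ms_003_tp1_flair_brain.nii.gz",
    "ms_002_tp1_flair_brain.nii.gz",
    "Training-2022-04-25/ms_003_tp1/ms_003_tp1_flair_brain.nii.gz",
    "Training-2022-04-25/ms_003_tp1/resampling.mat",
)
print(cmd)
```

With FSL installed, `run_in(open_file, cmd)` would execute the command from the data folder, using the unescaped path returned by the file dialog.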



## Second Step: Flair Brain to MNI for .mat

Now that all the images have the same dimensions, we can extract the .mat file needed to train BIANCA by registering each flair brain image to the MNI reference image.

#### As a prerequisite to this step, we need the MNI Space reference file so that we can register the resampled flair brains to MNI Space and retrieve the .mat file

<img src="MNI_file.png" alt="Drawing" style="width: 500px;"/>


### Since the location of this file is not fixed, we can go ahead and select the file path for the MNI Space file by running the code block below:

In [15]:
from tkinter import Tk, filedialog

root = Tk() # Create the root Tk window needed by the file dialog
root.withdraw() # Hide the small empty tkinter window
root.attributes('-topmost', True) # Keep the dialog above all other windows
mni_file = filedialog.askopenfilename() # Returns the selected file path as str

# Escape spaces so the path can be used inside a shell command
mni_directory = mni_file.replace(' ', '\\ ')
print(mni_directory) 


/Users/zunairviqar/Desktop/MRI_Work/Jupyter\ Notebooks/test_data/MNI152_2mm_brain.nii.gz
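
Escaping spaces by hand covers the path shown above, but not other characters the shell treats specially, such as quotes. The standard library's `shlex.quote` handles the general case. A minimal sketch, using the example path from this notebook:

```python
import shlex

path = "/Users/zunairviqar/Desktop/MRI_Work/Jupyter Notebooks/test_data/MNI152_2mm_brain.nii.gz"

# quote() wraps the path so the shell sees it as one argument
quoted = shlex.quote(path)
print(quoted)

# split() parses the command the way a POSIX shell would
parsed = shlex.split(f"flirt -ref {quoted}")
print(parsed)
```

The quoted string can be dropped into the f-string commands in place of the manually escaped path.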


### Then, we actually move forward and apply the registration command to transform the Flair Brain Image to the MNI Space and extract the .mat file

In [16]:
for file in flairs:
    
    folder_name = "_".join(file.split('_')[0:3])
    
    # Registering the FLAIR Brain Image to MNI Space
    print(f"File being Registered is:  {folder_name}_flair_brain.nii.gz")
    p = os.popen(
        f"cd {directory_1}/{directory_name} && flirt -in {folder_name}/{folder_name}_flair_brain.nii.gz -ref {mni_directory} -out {folder_name}/{folder_name}_flair_brain_to_MNI.nii.gz -omat {folder_name}/{folder_name}_flair_brain_to_MNI.mat -bins 256 -cost normcorr -searchrx -180 180 -searchry -180 180 -searchrz -180 180 -dof 7 -interp nearestneighbour")
    if p:
        output = p.read()
        print(output)

File being Registered is:  ms_003_tp1_flair_brain.nii.gz

File being Registered is:  ms_002_tp1_flair_brain.nii.gz
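
Since `flirt` prints nothing on success, it can be worth confirming that each `.mat` file really holds a transformation. FLIRT writes the matrix as a 4x4 table of plain-text numbers, so it can be read without extra libraries. The sketch below uses a hypothetical helper and a stand-in identity matrix written to a temporary folder, not a real registration result.

```python
import os
import tempfile

def read_flirt_matrix(mat_path):
    """Read a plain-text 4x4 matrix and return it as a list of rows."""
    with open(mat_path) as handle:
        rows = [[float(v) for v in line.split()] for line in handle if line.strip()]
    if len(rows) != 4 or any(len(row) != 4 for row in rows):
        raise ValueError(f"{mat_path} does not hold a 4x4 matrix")
    return rows

# Stand-in file with an identity transform, for demonstration only
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo_flair_brain_to_MNI.mat")
    with open(demo, "w") as handle:
        handle.write("1 0 0 0  \n0 1 0 0  \n0 0 1 0  \n0 0 0 1  \n")
    matrix = read_flirt_matrix(demo)
print(matrix)
```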



## Third Step: T1 Brain to Flair Brain

Following a similar approach to the Second Step, we now register the T1 Brain Image of each subject onto that subject's Flair Brain Image. The T1 Images (stored in the master directory) are registered to the resampled versions of the Flair Brain Images (stored in their respective subdirectories).

As a demonstration, the grey highlighted file ***'ms_003_tp1_T1_brain.nii.gz'*** in the leftmost column will be registered to the blue highlighted file ***'ms_003_tp1_flair_brain.nii.gz'*** in the rightmost column.

<img src="t1_flair.png" alt="Drawing" style="width: 1000px;"/>


### Now, we actually move forward and apply the registration command to transform the T1 Brain Image to the Resampled Flair Brain Image and obtain the T1 Brain to Flair Brain Image

In [17]:
for file in t1s:
    
    folder_name = "_".join(file.split('_')[0:3])
    
    # Registering the T1 Brain Image to the resampled Flair Brain Image
    print("Registering the file: ",file)
    p = os.popen(
        f"cd {directory_1} && flirt -in {file} -ref {directory_name}/{folder_name}/{folder_name}_flair_brain.nii.gz -out {directory_name}/{folder_name}/{folder_name}_t1_to_flair.nii.gz -omat {directory_name}/{folder_name}/{folder_name}_t1_to_flair.mat -bins 256 -cost normcorr -searchrx -180 180 -searchry -180 180 -searchrz -180 180 -dof 7 -interp nearestneighbour")
    if p:
        output = p.read()
        print(output)

Registering the file:  ms_002_tp1_T1_brain.nii.gz

Registering the file:  ms_003_tp1_T1_brain.nii.gz
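
Before generating the masterfile, it may help to confirm that every subject folder contains the files produced so far. The following sketch uses hypothetical helper names and a temporary folder with one deliberately incomplete subject. The expected file names follow the output names used in the cells above, and the test subject is assumed to have no lesion mask.

```python
import os
import tempfile

def expected_files(subject, has_mask=True):
    """File names the previous steps should have produced for one subject."""
    names = [f"{subject}_flair_brain.nii.gz",
             f"{subject}_t1_to_flair.nii.gz",
             f"{subject}_flair_brain_to_MNI.mat"]
    if has_mask:
        names.append(f"{subject}_flair_lesion_mask.nii.gz")
    return names

def missing_outputs(training_dir, subjects, test_subject):
    """Return the relative paths of all expected files that do not exist."""
    missing = []
    for subject in subjects:
        for name in expected_files(subject, has_mask=(subject != test_subject)):
            if not os.path.isfile(os.path.join(training_dir, subject, name)):
                missing.append(os.path.join(subject, name))
    return missing

# Stand-in folder: only the flair image exists for ms_002_tp1
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "ms_002_tp1"))
    open(os.path.join(tmp, "ms_002_tp1", "ms_002_tp1_flair_brain.nii.gz"), "w").close()
    report = missing_outputs(tmp, ["ms_002_tp1"], "ms_003_tp1")
print(report)
```

An empty list means every expected file is in place.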



## So far, we should have the following files and a similar file structure:

<img src="all_file_structure.png" alt="Drawing" style="width: 800px;"/>

## Fourth Step: Generating the Masterfile

Once we have all the files, we create a masterfile.txt that lists, for each subject, the four files highlighted in the previous screenshot.

In [18]:
def masterfile_line(subject):
    # One row: flair brain, T1 registered to flair, flair-to-MNI matrix, lesion mask
    return (f"{subject}/{subject}_flair_brain.nii.gz {subject}/{subject}_t1_to_flair.nii.gz "
            f"{subject}/{subject}_flair_brain_to_MNI.mat {subject}/{subject}_flair_lesion_mask.nii.gz")

# The with-block closes the file even if an error occurs while writing
with open(f"{open_file}/{directory_name}/masterfile.txt", "w") as f:

    for file in flairs:

        folder_name = "_".join(file.split('_')[0:3])

        if(folder_name != training_folder):
            print(masterfile_line(folder_name))
            f.write(masterfile_line(folder_name) + "\n")

    # Writing the test subject at the very end to correspond with the Training Command (in the upcoming code blocks)
    print(masterfile_line(training_folder))
    f.write(masterfile_line(training_folder) + "\n")

ms_002_tp1/ms_002_tp1_flair_brain.nii.gz ms_002_tp1/ms_002_tp1_t1_to_flair.nii.gz ms_002_tp1/ms_002_tp1_flair_brain_to_MNI.mat ms_002_tp1/ms_002_tp1_flair_lesion_mask.nii.gz
ms_003_tp1/ms_003_tp1_flair_brain.nii.gz ms_003_tp1/ms_003_tp1_t1_to_flair.nii.gz ms_003_tp1/ms_003_tp1_flair_brain_to_MNI.mat ms_003_tp1/ms_003_tp1_flair_lesion_mask.nii.gz


### The file should look similar to the following:

Each subject should have its own entry, one per line.

<img src="notebook_images/masterfile_img.png" alt="Drawing" style="width: 1000px;"/>
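
A quick way to catch a malformed masterfile is to count the columns on each line. This is a sketch with a hypothetical helper. The sample text reuses the two rows printed above, plus one deliberately broken row.

```python
def check_masterfile(text, columns=4):
    """Return (line_number, field_count) for every line with the wrong number of fields."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != columns:
            problems.append((number, len(fields)))
    return problems

good = (
    "ms_002_tp1/ms_002_tp1_flair_brain.nii.gz ms_002_tp1/ms_002_tp1_t1_to_flair.nii.gz "
    "ms_002_tp1/ms_002_tp1_flair_brain_to_MNI.mat ms_002_tp1/ms_002_tp1_flair_lesion_mask.nii.gz\n"
    "ms_003_tp1/ms_003_tp1_flair_brain.nii.gz ms_003_tp1/ms_003_tp1_t1_to_flair.nii.gz "
    "ms_003_tp1/ms_003_tp1_flair_brain_to_MNI.mat ms_003_tp1/ms_003_tp1_flair_lesion_mask.nii.gz\n"
)
# Deliberately broken third row with only two fields
bad = good + "ms_004_tp1/ms_004_tp1_flair_brain.nii.gz ms_004_tp1/ms_004_tp1_t1_to_flair.nii.gz\n"

print(check_masterfile(good))
print(check_masterfile(bad))
```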

## Fifth Step: Training BIANCA

Now that we have all the data needed to train BIANCA, and it is listed in the masterfile, we can go ahead and train BIANCA.

In [22]:
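# One 0-based index per subject listed in masterfile.txt
training_nums = list(range(len(flairs)))

# The command refers to masterfile rows starting from 1, so shift every index by one
t_nums = ",".join(str(elem + 1) for elem in training_nums)
print(t_nums)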

1,2


In [21]:
print(
    f"cd {directory_1}/{directory_name} && \
    bianca --singlefile=masterfile.txt --trainingnums={t_nums} --labelfeaturenum=4 \
    --querysubjectnum={len(training_nums)} --brainmaskfeaturenum=1 --featuresubset=1,2 --matfeaturenum=3 \
    --trainingpts=2000 --nonlespts=10000 --selectpts=noborder -o {training_folder}/bianca_output \
    --saveclassifierdata=mytraining -v")

cd /Users/zunairviqar/Desktop/MRI_Work/Jupyter\ Notebooks/test_data/Training-2022-04-25 &&     bianca --singlefile=masterfile.txt --trainingnums=1,2 --labelfeaturenum=4     --querysubjectnum=2 --brainmaskfeaturenum=1 --featuresubset=1,2 --matfeaturenum=3     --trainingpts=2000 --nonlespts=10000 --selectpts=noborder -o ms_003_tp1/bianca_output     --saveclassifierdata=mytraining -v
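
The numeric options in the command refer to the columns of masterfile.txt as written in the Fourth Step: column 1 is the flair brain (also used as the brain mask), column 2 the T1 registered to flair, column 3 the flair-to-MNI matrix, and column 4 the lesion mask. The query subject is the last row because the test subject was written last. The sketch below assembles the same options as an argument list. The helper name is hypothetical, and the flags and values are copied from the command above.

```python
def bianca_args(n_subjects, query_row, output_path):
    """Argument list matching the training command printed above."""
    training_nums = ",".join(str(i) for i in range(1, n_subjects + 1))
    return [
        "bianca",
        "--singlefile=masterfile.txt",
        f"--trainingnums={training_nums}",    # masterfile rows used for training
        "--labelfeaturenum=4",                # column 4: lesion mask
        f"--querysubjectnum={query_row}",     # row of the test subject
        "--brainmaskfeaturenum=1",            # column 1: flair brain
        "--featuresubset=1,2",                # columns 1 and 2: flair and T1
        "--matfeaturenum=3",                  # column 3: flair-to-MNI matrix
        "--trainingpts=2000",
        "--nonlespts=10000",
        "--selectpts=noborder",
        "-o", output_path,
        "--saveclassifierdata=mytraining",
        "-v",
    ]

args = bianca_args(2, 2, "ms_003_tp1/bianca_output")
print(" ".join(args))
```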


In [None]:
p = os.popen(
    f"cd {directory_1}/{directory_name} && \
    bianca --singlefile=masterfile.txt --trainingnums={t_nums} --labelfeaturenum=4 \
    --querysubjectnum={len(training_nums)} --brainmaskfeaturenum=1 --featuresubset=1,2 --matfeaturenum=3 \
    --trainingpts=2000 --nonlespts=10000 --selectpts=noborder -o {training_folder}/bianca_output \
    --saveclassifierdata=mytraining -v")
if p:
    output = p.read()
    print(output)