## Extract a sub-section
We extract sub-sections (patches) because training on a full MRI scan would be too memory intensive to be practical.

In [1]:
import numpy as np
import tensorflow.keras as keras
import pandas as pd

In [2]:
# simple one dimensional "image" to extract from
image = np.array([10,11,12,13,14,15])
image

array([10, 11, 12, 13, 14, 15])

In [3]:
# Compute the length of the "image"
image_length = image.shape[0]
image_length

6

## Sub-sections
Define a "patch size" that sets the size of the sub-section we want to extract. A real MRI volume needs a patch size in three dimensions; here, we only need to define it in one dimension.

In [4]:
# patch length, which will be the size of your extracted sub-section
patch_length = 3

To extract a patch of length patch_length, we first define the index at which the patch starts.


In [5]:
# start index
start_i = 0

In [6]:
# Compute the end index from your start index and patch length
print(f"start index {start_i}")
end_i = start_i + patch_length
print(f"end index {end_i}")

# Extract a sub-section from your "image"
sub_section = image[start_i: end_i]
print("output patch length: ", len(sub_section))
print("output patch array: ", sub_section)

# Add one to your start index
start_i += 1

start index 0
end index 3
output patch length:  3
output patch array:  [10 11 12]


The neural network expects a particular sub-section size and will not accept inputs of other dimensions. Because we will choose start indices randomly, we need to make sure the random number generator only produces start indices that leave room for a full patch before the edge of the image.
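
To see why the edges matter, the short sketch below reuses the toy image from above with a start index that is too close to the right edge. The value start_i = 5 is chosen only for illustration. NumPy does not raise an error here; it returns a shorter slice, which would no longer match the size the network expects.

```python
import numpy as np

image = np.array([10, 11, 12, 13, 14, 15])
patch_length = 3

# Illustrative start index that is too close to the right edge
start_i = 5
end_i = start_i + patch_length  # 8, which is past the end of the image

# NumPy slicing clips at the array boundary instead of raising an error
truncated = image[start_i: end_i]
print("truncated patch: ", truncated)
print("truncated patch length: ", len(truncated))

# A simple guard: a patch is valid only if it has the full length
is_valid = len(truncated) == patch_length
print("valid patch: ", is_valid)
```

Checking the patch length after slicing is one option; restricting the start index to the valid range before slicing avoids the problem altogether.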


In [7]:
# Set the start index to 3 to extract a valid patch
start_i = 3
print(f"start index {start_i}")
end_i = start_i + patch_length
print(f"end index {end_i}")
sub_section = image[start_i: end_i]
print("output patch array: ", sub_section)

start index 3
end index 6
output patch array:  [13 14 15]


In [9]:
# Print the largest valid value for the start index
print(f"The largest start index for which "
      f"a sub section is still valid is "
      f"{image_length - patch_length}")

The largest start index for which a sub section is still valid is 3


In [10]:
# print the range of valid start indices
print("The range of valid start indices is:")

# Compute valid start indices; note that range() excludes the upper bound
valid_start_i = list(range(image_length - patch_length + 1))
print(valid_start_i)

The range of valid start indices is:
[0, 1, 2, 3]


## Random selection of start indices
We need to randomly select a valid integer for the start index in each of the three dimensions. To do this, follow the logic above to identify the valid start indices, then select randomly from that range.


In [11]:
# Random start index; note that np.random.randint() excludes the upper bound
start_i = np.random.randint(image_length - patch_length + 1)
print(f"randomly selected start index {start_i}")

randomly selected start index 3


In [12]:
# Randomly select multiple start indices in a loop
for _ in range(10):
    start_i = np.random.randint(image_length - patch_length + 1)
    print(f"randomly selected start index {start_i}")

randomly selected start index 0
randomly selected start index 2
randomly selected start index 1
randomly selected start index 0
randomly selected start index 0
randomly selected start index 1
randomly selected start index 2
randomly selected start index 3
randomly selected start index 1
randomly selected start index 1
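
The same logic can be applied once per axis for a three-dimensional volume. The sketch below is only an illustration under stated assumptions: the helper name random_patch_start, the toy volume shape (6, 5, 4), and the patch shape (3, 2, 2) are all hypothetical and not part of the project code. It uses NumPy's Generator.integers(), which excludes the upper bound by default, just like np.random.randint().

```python
import numpy as np


def random_patch_start(volume_shape, patch_shape, rng):
    """Pick one valid start index per axis (hypothetical helper)."""
    return tuple(
        # Valid start indices along an axis are 0 .. dim - patch (inclusive)
        int(rng.integers(0, dim - patch + 1))
        for dim, patch in zip(volume_shape, patch_shape)
    )


rng = np.random.default_rng(0)

# Toy 3D "volume" and patch shape, chosen only for illustration
volume = np.arange(6 * 5 * 4).reshape(6, 5, 4)
patch_shape = (3, 2, 2)

start_x, start_y, start_z = random_patch_start(volume.shape, patch_shape, rng)

# Slice the patch out of the volume along all three axes
patch = volume[
    start_x: start_x + patch_shape[0],
    start_y: start_y + patch_shape[1],
    start_z: start_z + patch_shape[2],
]

print("randomly selected start indices: ", (start_x, start_y, start_z))
print("output patch shape: ", patch.shape)
```

Because every start index is drawn from the valid range for its axis, the extracted patch always has exactly the requested shape.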


## Background Ratio
We also need to compute the background ratio: the fraction of a patch that is background rather than edema or tumorous regions. The MRI labels use the following categories:

0: background
1: edema
2: non-enhancing tumor
3: enhancing tumor

In [15]:
# A straightforward approach to get the background ratio is
# to count the number of 0's and divide by the patch length
patch_labels = np.random.randint(0, 4, 16)

bgrd_ratio = np.count_nonzero(patch_labels == 0) / len(patch_labels)
print("using np.count_nonzero(): ", bgrd_ratio)

bgrd_ratio = len(np.where(patch_labels == 0)[0]) / len(patch_labels)
print("using np.where(): ", bgrd_ratio)

using np.count_nonzero():  0.1875
using np.where():  0.1875


In [16]:
# However, we'll use our label array to train a neural network
# so we can opt to compute the ratio a bit later after we do some preprocessing. 
# First, we convert the label's categories into one-hot format so it can be used to train the model

patch_labels_one_hot = keras.utils.to_categorical(patch_labels, num_classes=4)
print(patch_labels_one_hot)

[[0. 1. 0. 0.]
 [0. 0. 0. 1.]
 [0. 1. 0. 0.]
 [0. 0. 0. 1.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 0. 1.]
 [0. 0. 0. 1.]
 [1. 0. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]
 [1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]]


In [17]:


pd.DataFrame(patch_labels_one_hot, columns=['background', 'edema', 'non-enhancing tumor', 'enhancing tumor'])

   background  edema  non-enhancing tumor  enhancing tumor
0         0.0    1.0                  0.0              0.0
1         0.0    0.0                  0.0              1.0
2         0.0    1.0                  0.0              0.0
3         0.0    0.0                  0.0              1.0
4         0.0    1.0                  0.0              0.0
5         0.0    0.0                  1.0              0.0
6         1.0    0.0                  0.0              0.0
7         0.0    1.0                  0.0              0.0
8         0.0    0.0                  0.0              1.0
9         0.0    0.0                  0.0              1.0


In [18]:
# What we're interested in is the first column because that 
# indicates if the element is part of the background
# In this case, 1 = background, 0 = non-background

print("background column: ", patch_labels_one_hot[:,0])

background column:  [0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 0. 0.]


In [19]:
# We can compute the background ratio by counting the number of 1's
# in that column and dividing by the length of the patch

bgrd_ratio = np.sum(patch_labels_one_hot[:, 0]) / len(patch_labels)
print("using one-hot column: ", bgrd_ratio)

using one-hot column:  0.1875
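
The same calculation carries over to a three-dimensional label patch. The sketch below is a minimal illustration, assuming a toy label patch of shape (4, 4, 2) and using np.eye() indexing as a NumPy-only stand-in for keras.utils.to_categorical(); neither of these choices comes from the project code. The background channel is the first entry along the last axis, and the denominator is the total number of voxels in the patch.

```python
import numpy as np

num_classes = 4
rng = np.random.default_rng(0)

# Toy 3D label patch with categories 0..3 (shape chosen for illustration)
label_patch = rng.integers(0, num_classes, size=(4, 4, 2))

# One-hot encode with NumPy only: row i of the identity matrix
# is the one-hot vector for class i
one_hot = np.eye(num_classes)[label_patch]  # shape (4, 4, 2, 4)

# Background ratio from the one-hot background channel
bgrd_ratio_one_hot = np.sum(one_hot[..., 0]) / label_patch.size

# Cross-check against a direct count of zeros in the label patch
bgrd_ratio_direct = np.count_nonzero(label_patch == 0) / label_patch.size

print("one-hot shape: ", one_hot.shape)
print("using one-hot channel: ", bgrd_ratio_one_hot)
print("using np.count_nonzero(): ", bgrd_ratio_direct)
```

Both approaches give the same value, so the ratio can be computed either before or after the labels are converted to one-hot format.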


# Now we will build the U-Net model for our project