Q1:

An image made of pixels is naturally represented as a two-dimensional array. Two candidate structures are nested Python lists and NumPy arrays. Nested lists are expensive for an image as large as the one given (100,000 x 100,000 pixels), because a list can hold elements of different types and therefore stores a reference to a separate object for every element instead of a compact block of values. In our case each pixel takes one of only two values: "Black" for the blob region (the parasite body or the dye region) and "White" for everything outside it. This maps directly onto boolean values, with "Black" as 1 (True) and "White" as 0 (False). So we can represent each image as a NumPy array, and more specifically as a boolean array rather than an integer array, since a boolean array uses less space (1 byte per element, versus 8 bytes per element for a 64-bit integer array).
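
As a rough check of the storage argument above, the sketch below compares the per-element size of the two dtypes and estimates the footprint of a 100,000 x 100,000 image without allocating it. The use of `np.int64` for the integer case and the `np.packbits` step (8 pixels per byte) are assumptions added for illustration; they are not part of the answer above.

```python
import numpy as np

# Bytes per element for the two dtypes discussed above
bool_bytes = np.dtype(np.bool_).itemsize   # 1 byte
int_bytes = np.dtype(np.int64).itemsize    # 8 bytes

# Estimated footprint of a 100,000 x 100,000 image (computed, not allocated)
side = 100_000
n_pixels = side * side
print("bool array :", n_pixels * bool_bytes / 1e9, "GB")
print("int64 array:", n_pixels * int_bytes / 1e9, "GB")

# Same comparison on a small image that is actually allocated
small_bool = np.zeros((1000, 1000), dtype=np.bool_)
small_int = np.zeros((1000, 1000), dtype=np.int64)
print(small_bool.nbytes, small_int.nbytes)

# Optional (assumption): pack 8 boolean pixels into each byte
packed = np.packbits(small_bool, axis=None)
print(packed.nbytes)
```

Packing trades memory for an extra unpacking step (`np.unpackbits`) before any pixel-wise arithmetic, so it may only be worth it when the image does not otherwise fit in memory.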

Q2:

Below is the code to generate simulated images using the data structure chosen above, NumPy arrays.

In [13]:
import numpy as np

# Image resolution in pixels, i.e. number of rows and columns
pixelRows = 1000
pixelCols = 1000

# Function to generate fake simulated image data
def gen_fakeImagePixelData(p, imgType):
    # Total number of pixels in the fake image data
    N = pixelRows * pixelCols
    # Draw a random count "K" between "p * N" and "N" (inclusive), so that
    # at least a fraction "p" of the pixels are "1"s (parasite image) or
    # "0"s (dye image). This guarantees, for example, that at least "25%"
    # of the parasite image is parasite body.
    # "p * N" is a float, so cast it to give randint integer bounds.
    K = np.random.randint(int(p * N), N + 1)
    if imgType == "parasite":
        # Parasite image: at least a fraction p of the pixels are "1"s
        # "True" represents "1"s and "False" represents "0"s
        randPixels = np.array([True] * K + [False] * (N - K))
    else:
        # Dye image: at least a fraction p of the pixels are "0"s
        # "True" represents "1"s and "False" represents "0"s
        randPixels = np.array([False] * K + [True] * (N - K))
    # Shuffling to randomize the image data generated
    np.random.shuffle(randPixels)
    # Re-shaping the 1D vector into the required 2D matrix
    fakeImgPixels = np.reshape(randPixels, (pixelRows, pixelCols))

    return fakeImgPixels
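

The generator above can be exercised on a small image to confirm the guarantee it is meant to give. The following is only a sketch under stated assumptions: `make_mask` is a hypothetical, self-contained variant that fills a preallocated boolean array instead of building Python lists, and it uses NumPy's `default_rng` with a fixed seed so the run is repeatable.

```python
import numpy as np

def make_mask(rows, cols, p, majority_value, rng):
    """Hypothetical helper: boolean image in which at least a fraction p
    of the pixels equals majority_value."""
    n = rows * cols
    # Number of pixels set to majority_value, drawn from [ceil(p * n), n]
    k = int(rng.integers(int(np.ceil(p * n)), n + 1))
    flat = np.full(n, not majority_value, dtype=np.bool_)
    flat[:k] = majority_value
    # Shuffle so the majority pixels are scattered over the image
    rng.shuffle(flat)
    return flat.reshape(rows, cols)

rng = np.random.default_rng(0)
parasite = make_mask(200, 200, 0.25, True, rng)   # at least 25% True
dye = make_mask(200, 200, 0.905, False, rng)      # at least 90.5% False

print(parasite.dtype, parasite.shape)
print("parasite fraction:", parasite.mean())
print("dye fraction:", dye.mean())
```

Because at least 90.5% of the dye image is False, its dye fraction can never exceed 9.5%.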


Q3:

Below is the code to decide whether the parasite in a generated pair of images has cancer, and to compute the percentage of parasites with cancer across the whole sample collection.

In [14]:
# Function to calculate the percentage of parasites which have cancer
def findCancerPercetage():
    parasiteCancer = False

    # Obtain fake parasite image
    parasitePixel = gen_fakeImagePixelData(p = 0.25, imgType = "parasite")
    # Count the number of 1s to calculate the parasite body to the image ratio
    parasitePixelOnesCount = np.count_nonzero(parasitePixel)
    # Parasite body percentage
    parasiteBodyPercentage = parasitePixelOnesCount/(pixelRows * pixelCols)

    # Obtain fake dye image
    dyePixel = gen_fakeImagePixelData(p = 0.905, imgType = "dye")

    # Obtain the cancer regions inside the parasite body, 
    # so here we use "Hadamard Product" of both the images as it gives us 
    # the dye present only inside the parasite body and discards the 
    # dye leaked outside the body, as outer regions are 0s.
    combinedPixelImage = np.multiply(parasitePixel, dyePixel)
    # Count number of 1s to obtain the dye present inside the parasite body
    cancerPixelOnesCount = np.count_nonzero(combinedPixelImage)
    # Cancer percentage inside the parasite body
    cancerPercentage = cancerPixelOnesCount / parasitePixelOnesCount

    # Check if the parasite has cancer
    # If the dye % in the parasite body is > 10% then it has cancer, else no
    if cancerPercentage > 0.1:
        parasiteCancer = True
    else:
        parasiteCancer = False

    return parasiteCancer

# Initialize the counter for the number of parasites that have cancer
cancerPercentageCounter = 0
# Number of samples being analyzed
samples = 1000
# Loop over the samples and count the parasites that have cancer
for i in range(samples):
    if findCancerPercetage():
        # Increment the counter by 1 if the parasite has cancer
        cancerPercentageCounter = cancerPercentageCounter + 1

# Calculate the percentage
cancerPercentage = ((cancerPercentageCounter / samples) * 100)
print("Percentage of parasites which have cancer:", cancerPercentage, "%")


Percentage of parasites which have cancer: 0.0 %
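
To make the Hadamard-product step in the code above concrete, here is a small sketch on two hand-made 3 x 3 masks. The pixel values are made up for illustration and are not taken from the assignment data. It also shows that, for boolean arrays, the element-wise product gives the same result as a logical AND.

```python
import numpy as np

# Tiny hand-made masks (hypothetical values, for illustration only)
parasite = np.array([[1, 1, 0],
                     [1, 1, 0],
                     [0, 0, 0]], dtype=np.bool_)
dye = np.array([[1, 0, 1],
                [0, 0, 1],
                [0, 0, 0]], dtype=np.bool_)

# Element-wise (Hadamard) product keeps only dye that lies inside the body
inside = np.multiply(parasite, dye)
# For boolean arrays this is the same as a logical AND
same = np.logical_and(parasite, dye)
print(np.array_equal(inside, same))  # True

body = np.count_nonzero(parasite)        # 4 body pixels
dye_inside = np.count_nonzero(inside)    # 1 dye pixel inside the body
fraction = dye_inside / body             # 0.25
has_cancer = fraction > 0.1
print(body, dye_inside, fraction, has_cancer)  # 4 1 0.25 True
```

The two dye pixels in the right-hand column fall outside the body and are discarded by the product, which is the behaviour the comments in the code describe.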


Q4:

Considering the large size of the image data (100,000 x 100,000 pixels), processing it on the CPU alone would take a long time. To speed up the computation we can store the images as PyTorch tensors and process them on the GPU. Below is the code using tensors and the GPU.

For the example I considered, 1000 samples of 1000 x 1000 pixel images, computing the "Percentage of parasites which have cancer" took approximately 2 min 27 s with NumPy arrays, whereas with tensors on the GPU the computation finished in less than a second.
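
Timings like the ones quoted above could be collected with a small harness along the following lines. This is a sketch only: `time_call` and `one_sample` are hypothetical names, and the workload is a NumPy stand-in rather than the exact code of either version.

```python
import time
import numpy as np

def time_call(fn, repeats=3):
    """Hypothetical helper: best wall-clock time of fn() over a few runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def one_sample(rows=1000, cols=1000):
    # Stand-in workload: one random boolean mask and a count of its True pixels
    rng = np.random.default_rng()
    mask = rng.random((rows, cols)) >= 0.25
    return np.count_nonzero(mask)

elapsed = time_call(one_sample)
print(f"best of 3: {elapsed:.4f} s")
```

When the timed function runs on the GPU, CUDA kernels are launched asynchronously, so calling `torch.cuda.synchronize()` before reading the clock helps ensure the measured time covers the whole computation.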

In [17]:
import torch
import numpy as np

# Using GPU for faster processing
cuda = torch.device('cuda')

# Image resolution in pixels, i.e. number of rows and columns
pixelRows = 1000
pixelCols = 1000

# Function to generate fake simulated image data
def gen_fakeImagePixelDataTensor():
    # Generate a 2D boolean tensor whose elements are True or False,
    # representing either the parasite body or the surrounding space.
    # Each pixel is True when its uniform random value is >= 0.25, so about
    # "75%" of the image is parasite body, which satisfies the requirement
    # that at least "25%" of the image is the parasite body.
    parasitePixel = torch.rand((pixelRows, pixelCols), device = cuda) >= 0.25

    # Generate a 2D boolean tensor whose elements are True or False,
    # representing either a dye or a no-dye region.
    # Here we generate a dye image with approximately 9.9% of the region dyed
    dyePixel = torch.rand((pixelRows, pixelCols), device = cuda) > 0.90095

    return (parasitePixel, dyePixel)

# Function to calculate the percentage of parasites which have cancer
def findCancerPercetageTensor():
    parasiteCancer = False

    # Obtain fake parasite and dye image
    parasitePixel, dyePixel = gen_fakeImagePixelDataTensor()

    # Count the number of 1s to calculate the parasite body to the image ratio
    parasitePixelOnesCount = parasitePixel.eq(1).sum()
    # Parasite body percentage
    parasiteBodyPercentage = parasitePixelOnesCount/(pixelRows * pixelCols)

    # Obtain the cancer regions inside the parasite body, 
    # so here we use "Hadamard Product" of both the images as it gives us 
    # the dye present only inside the parasite body and discards the 
    # dye leaked outside the body, as outer regions are 0s.
    combinedPixelImage = torch.mul(parasitePixel, dyePixel)
    # Count number of 1s to obtain the dye present inside the parasite body
    cancerPixelOnesCount = combinedPixelImage.eq(1).sum()
    # Cancer percentage inside the parasite body
    cancerPercentage = cancerPixelOnesCount / parasitePixelOnesCount

    # Check if the parasite has cancer
    # If the dye % in the parasite body is > 10% then it has cancer, else no
    if cancerPercentage > 0.1:
        parasiteCancer = True
    else:
        parasiteCancer = False

    return parasiteCancer

# Initialize the counter for the number of parasites that have cancer
cancerPercentageCounter = 0
# Number of samples being analyzed
samples = 1000
# Loop over the samples and count the parasites that have cancer
for i in range(samples):
    if findCancerPercetageTensor():
        # Increment the counter by 1 if the parasite has cancer
        cancerPercentageCounter = cancerPercentageCounter + 1

# Calculate the percentage
cancerPercentage = ((cancerPercentageCounter / samples) * 100)
print("Percentage of parasites which have cancer:", cancerPercentage, "%")


Percentage of parasites which have cancer: 0.1 %
