# Secondary Segmentation

This notebook explores the next set of segmentation options.

## Author: Alexander Goudemond, Student Number: 219030365

# Imports


In [1]:
from os import getcwd, walk, mkdir, stat, remove
from os import sep # used later on, in a function, to print directory contents
from os.path import exists, basename, join

from shutil import copyfile

from PIL.Image import fromarray
import cv2

import matplotlib.pyplot as plt
import numpy as np

# Directories for the Processed Data-Sets

This section of the notebook creates the directories that will hold the images.

The folder order of the data-set is important: it contains manually segmented and manually tracked pictures, which we do not plan on processing. We need to generate the new data-set without altering this information.

A first option to consider is generating a list of all the folder paths that hold our images.

This is quite simple, thankfully:

In [2]:
def get_directories(startPath):
    location_array = []
    acceptable_folders = ["\\01", "\\02", "SEG", "TRA"]

    for root, dirs, files in walk(startPath):
        # skip this folder
        if ("OriginalZipped" in root):
            continue

        elif (root[ -3 : ] not in acceptable_folders):
            continue

        location_array.append(root)
    
    return location_array
###
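The function above relies on Windows-style backslashes in the path strings. As a minimal sketch, assuming the same four leaf folder names, the same filter can be written against the final path component so that it also works with other separators. The name `get_directories_portable` is hypothetical and not part of the original notebook; sorting `dirs` in place is an added assumption that makes the traversal order explicit.

```python
import os

LEAF_FOLDERS = {"01", "02", "SEG", "TRA"}  # same folder names as the cell above

def get_directories_portable(start_path):
    """Collect image folders by their final path component (hypothetical helper)."""
    locations = []
    for root, dirs, _files in os.walk(start_path):
        # prune the zipped originals and fix the traversal order in place
        dirs[:] = sorted(d for d in dirs if "OriginalZipped" not in d)
        if os.path.basename(root) in LEAF_FOLDERS:
            locations.append(root)
    return locations
```

Because `os.walk` visits folders top-down by default, editing `dirs` in place controls which sub-folders are visited and in what order.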

In [3]:
current_directory = getcwd()
desired_directory = "..\\..\\Comp700_Processed_DataSets_1"

In [4]:
path = (current_directory + "\\" + desired_directory)
location_array = get_directories(path)

In [5]:
# first 10
print( location_array[0:10] ) 
print("Number of folders:", len( location_array ) ) 

['c:\\Users\\G5\\Documents\\GitHub\\COMP700\\..\\..\\Comp700_Processed_DataSets_1\\BF-C2DL-HSC\\BF-C2DL-HSC\\01', 'c:\\Users\\G5\\Documents\\GitHub\\COMP700\\..\\..\\Comp700_Processed_DataSets_1\\BF-C2DL-HSC\\BF-C2DL-HSC\\01_GT\\SEG', 'c:\\Users\\G5\\Documents\\GitHub\\COMP700\\..\\..\\Comp700_Processed_DataSets_1\\BF-C2DL-HSC\\BF-C2DL-HSC\\01_GT\\TRA', 'c:\\Users\\G5\\Documents\\GitHub\\COMP700\\..\\..\\Comp700_Processed_DataSets_1\\BF-C2DL-HSC\\BF-C2DL-HSC\\01_ST\\SEG', 'c:\\Users\\G5\\Documents\\GitHub\\COMP700\\..\\..\\Comp700_Processed_DataSets_1\\BF-C2DL-HSC\\BF-C2DL-HSC\\02', 'c:\\Users\\G5\\Documents\\GitHub\\COMP700\\..\\..\\Comp700_Processed_DataSets_1\\BF-C2DL-HSC\\BF-C2DL-HSC\\02_GT\\SEG', 'c:\\Users\\G5\\Documents\\GitHub\\COMP700\\..\\..\\Comp700_Processed_DataSets_1\\BF-C2DL-HSC\\BF-C2DL-HSC\\02_GT\\TRA', 'c:\\Users\\G5\\Documents\\GitHub\\COMP700\\..\\..\\Comp700_Processed_DataSets_1\\BF-C2DL-HSC\\BF-C2DL-HSC\\02_ST\\SEG', 'c:\\Users\\G5\\Documents\\GitHub\\COMP700\\..\

Great! We can use that variable to generate the locations for our segmented images! We just need to replace the keyword "Comp700_Processed_DataSets_1" with our desired folder name, and everything else will follow nicely!

We can further improve the folder readability though, by dropping everything that comes before "Comp700_Processed_DataSets_1":

In [6]:
def cut_string_array(position, array):
    new_array = []

    for item in array:
        new_array.append( item[position : ])
    
    return new_array
###

In [7]:
position = len(current_directory + "\\..\\..\\")
# print(position)

reduced_location_array = cut_string_array(position, location_array)


In [8]:
# first 10
print(reduced_location_array[ 0 : 10])
print()
print("Number of folders:", len( reduced_location_array ) ) 

['Comp700_Processed_DataSets_1\\BF-C2DL-HSC\\BF-C2DL-HSC\\01', 'Comp700_Processed_DataSets_1\\BF-C2DL-HSC\\BF-C2DL-HSC\\01_GT\\SEG', 'Comp700_Processed_DataSets_1\\BF-C2DL-HSC\\BF-C2DL-HSC\\01_GT\\TRA', 'Comp700_Processed_DataSets_1\\BF-C2DL-HSC\\BF-C2DL-HSC\\01_ST\\SEG', 'Comp700_Processed_DataSets_1\\BF-C2DL-HSC\\BF-C2DL-HSC\\02', 'Comp700_Processed_DataSets_1\\BF-C2DL-HSC\\BF-C2DL-HSC\\02_GT\\SEG', 'Comp700_Processed_DataSets_1\\BF-C2DL-HSC\\BF-C2DL-HSC\\02_GT\\TRA', 'Comp700_Processed_DataSets_1\\BF-C2DL-HSC\\BF-C2DL-HSC\\02_ST\\SEG', 'Comp700_Processed_DataSets_1\\BF-C2DL-HSC (1)\\BF-C2DL-HSC (1)\\01', 'Comp700_Processed_DataSets_1\\BF-C2DL-HSC (1)\\BF-C2DL-HSC (1)\\02']

Number of folders: 96
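Cutting by string length works as long as every path starts with exactly the same prefix. A minimal sketch of an alternative, assuming we only need each path relative to a base folder, is to let `os.path.relpath` do the trimming. The helper name `relative_to_base` is hypothetical.

```python
import os

def relative_to_base(paths, base):
    """Return each path relative to base; '..' segments are resolved first."""
    return [os.path.relpath(p, start=base) for p in paths]

# example with made-up folder names
repo = os.path.join(os.sep, "work", "repo")
base = os.path.join(repo, "..", "..")
paths = [os.path.join(repo, "..", "..", "Data", "01")]
reduced = relative_to_base(paths, base)
```

Here `reduced` holds the single entry `Data` joined with `01`, regardless of how many `..` segments the input contained.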


Let's modify the keyword now to our destination folder:

In [9]:
def replace_part_of_array(key_word, new_word, array):
    new_array = []
    temp = ""

    for item in array:
        temp = item.replace(key_word, new_word)
        new_array.append(temp)
    
    return new_array
###

In [10]:
desired_locations = replace_part_of_array("Comp700_Processed_DataSets_1", "Comp700_Segmented", reduced_location_array)


In [11]:
print( desired_locations[0:10] )
print("Number of folders:", len( desired_locations ) ) 

['Comp700_Segmented\\BF-C2DL-HSC\\BF-C2DL-HSC\\01', 'Comp700_Segmented\\BF-C2DL-HSC\\BF-C2DL-HSC\\01_GT\\SEG', 'Comp700_Segmented\\BF-C2DL-HSC\\BF-C2DL-HSC\\01_GT\\TRA', 'Comp700_Segmented\\BF-C2DL-HSC\\BF-C2DL-HSC\\01_ST\\SEG', 'Comp700_Segmented\\BF-C2DL-HSC\\BF-C2DL-HSC\\02', 'Comp700_Segmented\\BF-C2DL-HSC\\BF-C2DL-HSC\\02_GT\\SEG', 'Comp700_Segmented\\BF-C2DL-HSC\\BF-C2DL-HSC\\02_GT\\TRA', 'Comp700_Segmented\\BF-C2DL-HSC\\BF-C2DL-HSC\\02_ST\\SEG', 'Comp700_Segmented\\BF-C2DL-HSC (1)\\BF-C2DL-HSC (1)\\01', 'Comp700_Segmented\\BF-C2DL-HSC (1)\\BF-C2DL-HSC (1)\\02']
Number of folders: 96


Okay! We now have a variable containing the folder locations! We can now define some functions that create every directory that does not exist yet:

In [12]:
# create directory for work we create
def tryMakeDirectory(current_directory, destination_directory):
    try:
        # join comes from os.path
        mkdir( join(current_directory, destination_directory) )
    except FileExistsError:
        # print("Folder already exists!")
        pass
    except OSError as error:
        # e.g. missing parent folder or no permission; report the cause
        print("Unknown Error Encountered...", error)
###

def createBulkDirectories(current_directory, array):
    sub_folders = []
    path = "..\\..\\"

    for item in array:
        sub_folders = item.split("\\")
        # print(sub_folders)

        for folder in sub_folders:
            path += folder
            tryMakeDirectory(current_directory, path)
            path += "\\"
        
        # reset
        path = "..\\..\\"

    print("Done!")
###

In [13]:
createBulkDirectories(current_directory, desired_locations)

Done!
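The two functions above build each folder one level at a time. As a minimal sketch of the same idea, `os.makedirs` with `exist_ok=True` creates all missing parent folders in one call and ignores folders that already exist. The name `create_bulk_directories` and the returned list are hypothetical additions for illustration.

```python
import os

def create_bulk_directories(base_directory, relative_folders):
    """Create every folder (and its parents) under base_directory."""
    created = []
    for folder in relative_folders:
        target = os.path.join(base_directory, folder)
        os.makedirs(target, exist_ok=True)  # no error if it already exists
        created.append(target)
    return created
```

Calling it twice with the same list is safe, which suits a notebook cell that may be re-run.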


# Data-Set Segmentation

This section of the notebook focusses on processing the entire data-set, following the methods found in 005 for 1103_10 and 1103_11.

We are going to take advantage of the Thresholding found with OpenCV - specifically the threshold type value of 17, which is passed as the last argument of `cv2.threshold`. Let's create a function to do that processing for us:
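To see what the value 17 selects, it helps to split it into its bit flags. The sketch below copies the relevant constants by hand, on the assumption that they match OpenCV's `ThresholdTypes` enum (with OpenCV installed one would write `cv2.THRESH_BINARY_INV | cv2.THRESH_TRIANGLE` instead). The helper `decode_threshold_type` is hypothetical.

```python
# values assumed to mirror OpenCV's ThresholdTypes enum
THRESH_BINARY_INV = 1
THRESH_MASK = 7        # low bits hold the basic threshold type
THRESH_OTSU = 8        # flag: pick the threshold with Otsu's method
THRESH_TRIANGLE = 16   # flag: pick the threshold with the triangle method

def decode_threshold_type(value):
    """Split a combined threshold type into (base type, otsu?, triangle?)."""
    base = value & THRESH_MASK
    use_otsu = bool(value & THRESH_OTSU)
    use_triangle = bool(value & THRESH_TRIANGLE)
    return base, use_otsu, use_triangle

base, use_otsu, use_triangle = decode_threshold_type(17)
```

Under that assumption, 17 is an inverted binary threshold whose cut-off is chosen automatically by the triangle method, which would explain why the threshold argument can be left at 0.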

In [14]:
def opencvThresh(img, value=17):
    ret, thresh = cv2.threshold(img, 0, 255, value)

    return thresh
###

# used to make the segmented values visible, by saving via matplotlib
def getImage(filePath):
    img = plt.imread(filePath) 
    plt.imsave("temp.jpg", img, cmap="gray") # desired colourmap for us
    img = cv2.imread( "temp.jpg", cv2.IMREAD_GRAYSCALE)

    return img
###

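# thresholds every raw image in original_dataset and writes it to the matching
# folder in location_array; "man_" files are carried over without thresholding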
def bulkProcess(current_directory, original_dataset, location_array):
    kernel = np.ones((3,3), np.uint8)
    counter = 0
    valid_folders = ["01", "02", "SEG", "TRA"]

    name = "segmented_"

    # go to the original_dataset
    path = walk(current_directory + "\\" + original_dataset)
    
    print("Starting...")

    for root, dirs, files in path:
        # skip zipped files
        if ("OriginalZipped" in root):
            continue
        # end loop because locations exhausted
        elif (counter >= len(location_array)):
            break

        # print(root)

        for item in files:
            # manual info, simply copy as is
            if ("man_" in item):
                # print("Counter:", counter)
                img_path = current_directory + "\\..\\..\\" + location_array[counter] + "\\" +  item
                # print(img_path)

                # handle text files
                if (".txt" in item):
                    copyfile(root + "\\" + item, img_path)
                else:
                    # print("EISH")
                    img = getImage(root + "\\" + item)
                    cv2.imwrite(img_path, img)

                
            # stop working, zipped files found
            elif (".zip" in item):
                break
            else:
                # print("Nope")

                img = getImage(root + "\\" + item)

                processed_pic = opencvThresh(img)

                # print("Counter:", counter)
                img_path = current_directory + "\\..\\..\\" + location_array[counter] + "\\" + name + item
                # print(img_path)

                cv2.imwrite(img_path, processed_pic)

            # remove later
            # break
        
        # update counter
        if (basename(root) in valid_folders):
            counter += 1
    
    # remove at end
    if (exists("temp.jpg")):
        remove("temp.jpg")
    
    print("Finished...")
### 

def getFileQuantities(path):
    count = 0
    size_array = []
    valid_folders = ["01", "02", "SEG", "TRA"]

    for root, dirs, files in walk(path):
        count = len(files)  # number of files directly inside this folder
        
        if (basename(root) in valid_folders):
            size_array.append(count)
    
    return size_array
###
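`getImage` makes faint label values visible by saving through matplotlib and reading the temporary JPEG back. A possible in-memory alternative, sketched below, is a plain min-max rescale to 8 bits with NumPy. This is an assumption about what the round trip achieves, not a drop-in replacement: JPEG compression is lossy, so the pixel values will not match exactly. The name `rescale_to_uint8` is hypothetical.

```python
import numpy as np

def rescale_to_uint8(img):
    """Stretch the value range of img to 0..255 and return a uint8 array."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # flat image: nothing to stretch
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round((img - lo) / (hi - lo) * 255).astype(np.uint8)

# small label image with made-up values
labels = np.array([[0, 1], [2, 5]], dtype=np.uint16)
visible = rescale_to_uint8(labels)
```

The result can be passed straight to a thresholding step without touching the disk.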

In [15]:
original_sizes = getFileQuantities( current_directory + "\\" + "..\\..\\Comp700_DataSets" )

In [16]:
original_sizes

[1764,
 49,
 1765,
 1764,
 1764,
 8,
 1765,
 1764,
 1763,
 1763,
 1375,
 50,
 1377,
 1376,
 1376,
 50,
 1377,
 1376,
 1376,
 1375,
 84,
 9,
 85,
 84,
 84,
 9,
 85,
 84,
 115,
 115,
 30,
 8,
 31,
 30,
 5,
 31,
 30,
 30,
 48,
 18,
 49,
 48,
 48,
 33,
 49,
 48,
 48,
 48,
 92,
 30,
 93,
 92,
 92,
 20,
 93,
 92,
 92,
 92,
 65,
 65,
 66,
 150,
 150,
 151,
 110,
 138,
 92,
 28,
 93,
 92,
 92,
 8,
 93,
 92,
 92,
 92,
 115,
 15,
 116,
 115,
 115,
 19,
 116,
 115,
 115,
 115,
 300,
 2,
 301,
 300,
 300,
 2,
 301,
 300,
 300,
 300]

In [17]:
segmented_sizes = getFileQuantities( current_directory + "\\" + "..\\..\\Comp700_Segmented" )

segmented_sizes

[0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0]

In [20]:
if (original_sizes == segmented_sizes):
    print("True")
else:
    print("False")
    print("\nGenerating now")
    bulkProcess(current_directory, "..\\..\\Comp700_Processed_DataSets_1", desired_locations)

False

Generating now
Starting...
Finished...
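Comparing two plain lists only tells us whether something differs, not where. As a minimal sketch, assuming both data-sets share the same relative folder layout, the counts can be keyed by relative folder path so that mismatching folders are named. Both helper names are hypothetical.

```python
import os

LEAF_FOLDERS = {"01", "02", "SEG", "TRA"}  # same folder names as getFileQuantities

def count_files_by_folder(start_path):
    """Map each leaf folder (relative to start_path) to its file count."""
    counts = {}
    for root, _dirs, files in os.walk(start_path):
        if os.path.basename(root) in LEAF_FOLDERS:
            counts[os.path.relpath(root, start_path)] = len(files)
    return counts

def folders_that_differ(source_counts, target_counts):
    """List folders whose file count in target does not match source."""
    return sorted(
        folder for folder in source_counts
        if source_counts[folder] != target_counts.get(folder, 0)
    )
```

An empty result from `folders_that_differ` would mean every folder already holds the expected number of files, so the bulk processing step could be skipped.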
