# Analysis of the data set

The aim is to determine the maximum number of objects with distinct identification numbers that occur in any mask of the data set.

The annotations are masks used for semantic segmentation; they are the targets needed to train a neural network. Each mask marks the objects in its original image. To distinguish the objects, every object in a mask carries its own identification number (class number), numbered consecutively from 1. The background is assigned the number 0.
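
The numbering scheme can be illustrated with a small hand-made array. This is only a sketch: the toy mask below is an assumption for illustration and is not taken from the data set.

```python
import numpy as np

# Toy mask (assumed layout): 0 = background, 1..3 = object IDs.
mask = np.array([
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [3, 3, 0, 0],
])

# All values present in the mask, without the background.
object_ids = [int(v) for v in np.unique(mask) if v != 0]

# Binary mask of a single object, here the object with ID 2.
object_2 = (mask == 2)

print(object_ids)
print(int(object_2.sum()))
```

Comparing the mask with a single ID, as in `mask == 2`, isolates one object, which is how the individual objects of a mask can be told apart.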

Object classification by means of a neural network (semantic instance segmentation) requires the maximum number of output channels as a parameter. The mask objects provide this information.
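
How the largest ID could translate into a channel count is sketched below. It assumes one output channel per object ID plus one channel for the background, which may differ from the network actually used in this project; the toy mask is also an assumption.

```python
import numpy as np

# Toy mask (assumed): 0 = background, 1..3 = object IDs.
mask = np.array([
    [0, 1, 1],
    [2, 0, 3],
])

# Assumption: one output channel per object ID plus one for the background.
num_channels = int(mask.max()) + 1

# One-hot encoding: channel k is 1 where the mask holds ID k.
one_hot = np.eye(num_channels, dtype=np.uint8)[mask]

print(num_channels)
print(one_hot.shape)
```

With this encoding every pixel is set in exactly one channel, so the channel count has to cover the largest ID found anywhere in the data set.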

In this section of the project, the mask images are analyzed to determine the maximum number of possible object types.

## Libraries, modules, classes

In [None]:
# library, module
from skimage.color import rgb2gray
import numpy as np
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
from scipy import ndimage
import pathlib
import glob
import os
from skimage.io import imread
import importlib # for reloading libraries
from skimage.transform import resize
# --------------------------
from configuration_QU import Path   as PATH   # paths and file names

## Constants

In [None]:
# enables verbose output
VERBOSE=True

## Commands to download the dataset

In [None]:
if not os.path.exists('Warwick QU Dataset (Released 2016_07_08)'):
    !wget https://warwick.ac.uk/fac/sci/dcs/research/tia/glascontest/download/warwick_qu_dataset_released_2016_07_08.zip
    !unzip warwick_qu_dataset_released_2016_07_08.zip     

## Create file names as a list
A path for the images and a path for the associated masks are selected.
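
The `get_filenames` method belongs to the project's own `configuration_QU` module and is not shown here. As a rough sketch of the same idea, the files could be collected and paired with `pathlib` alone. The helper names `list_files` and `pair_images_and_masks` are hypothetical, and the demo assumes that a mask is named after its image with the suffix `_anno`, as the filter patterns suggest.

```python
import tempfile
from pathlib import Path


def list_files(directory, pattern):
    """Hypothetical stand-in for get_filenames: sorted paths matching a glob pattern."""
    return sorted(Path(directory).glob(pattern))


def pair_images_and_masks(image_dir, mask_dir, image_pattern, mask_pattern):
    """Pair each image with the mask named '<image stem>_anno<suffix>'."""
    masks = {m.name: m for m in list_files(mask_dir, mask_pattern)}
    pairs = []
    for img in list_files(image_dir, image_pattern):
        mask_name = f"{img.stem}_anno{img.suffix}"
        if mask_name in masks:
            pairs.append((img, masks[mask_name]))
    return pairs


# Demo on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["train_01.bmp", "train_01_anno.bmp",
                 "train_02.bmp", "train_02_anno.bmp", "testA_01.bmp"]:
        (Path(tmp) / name).touch()
    pairs = pair_images_and_masks(tmp, tmp, "train_??.bmp", "train_??_anno.bmp")
    pair_names = [(img.name, mask.name) for img, mask in pairs]

print(pair_names)
```

Pairing by name rather than by list position guards against a missing mask silently shifting all later pairs.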

In [None]:
path=PATH() # an instance of the class is required to call its methods
path_images=path.trainimages
path_masks=path.trainmasks
imgfilter= ['train_??.bmp',       'testA_??.bmp',     'testB_??.bmp']
maskfilter=['train_??_anno.bmp', 'testA_??_anno.bmp', 'testB_??_anno.bmp']
dataset=0
# input and target files -TRAIN[0], TESTA[1], TESTB[2]
image_filenames = path.get_filenames(path_images, dateifilter= imgfilter[dataset],  sort=True)
mask_filenames  = path.get_filenames(path_masks,  dateifilter= maskfilter[dataset], sort=True)
if VERBOSE:
    print(path_images)
    print(path_masks)

In [None]:
print('Number of image-mask pairs: ',len(image_filenames),' : ', len(mask_filenames))
if VERBOSE:
    for idx in range(len(image_filenames)):
        # print(os.path.basename(images_names[idx]),' | ', os.path.basename(targets_names[idx]))
        print(idx, ' | ', os.path.basename(image_filenames[idx]),'\t\t-> ', os.path.basename(mask_filenames[idx]))

## Loading the images and the associated masks from the directories

In [None]:
# read images and store them in memory
images = [imread(img_name) for img_name in image_filenames]
masks = [imread(mask_name) for mask_name in mask_filenames]

In [None]:
if VERBOSE:    
    print('path_images: ', path_images)
    print('path_masks : ', path_masks)
    # print(targets_names)
    print('Number of images: inputs {}, targets {} '.format(len(images), len(masks)))
    print('Shape of origin: inputs {}, targets {} '.format(images[0].shape, masks[0].shape))

## Determination of the number of objects in a mask

In [None]:
# largest object ID in each mask
max_idx = [mask.max() for mask in masks]
print(max_idx)

In [None]:
max_idx_all=max(max_idx)
idx_of_max= np.argmax(max_idx)
print('Largest object ID (maximum number of classes): {} [mask index {}]'.
      format(max_idx_all, idx_of_max))
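
The largest ID in a mask equals its object count only if the IDs really are consecutive. A quick cross-check, sketched here on assumed toy masks rather than on the data set, compares the largest ID with the number of distinct non-zero IDs. The helper `object_id_summary` is hypothetical.

```python
import numpy as np


def object_id_summary(mask):
    """Return (largest ID, number of distinct non-zero IDs) for one mask."""
    ids = np.unique(mask)
    ids = ids[ids != 0]  # drop the background
    max_id = int(ids.max()) if ids.size else 0
    return max_id, int(ids.size)


# Toy masks (assumed): the second one skips ID 2, the third is background only.
toy_masks = [
    np.array([[0, 1], [2, 2]]),
    np.array([[0, 3], [1, 0]]),
    np.zeros((2, 2), dtype=np.uint8),
]

summaries = [object_id_summary(m) for m in toy_masks]
gaps = [i for i, (max_id, count) in enumerate(summaries) if max_id != count]

print(summaries)
print(gaps)
```

An empty `gaps` list on the real masks would support using the largest ID as the object count.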

## Visualization

#### Histograms

In [None]:
from MODULE.JH.visualize import Show as SHOW
show=SHOW(experiment='', figsize=(60,70), fontsize=16)

In [None]:
# absolute frequency of every object ID, one histogram per mask
histogramm=[show.histogramm(mask, f'(7-1)_Summ_histogram_{maskfilter[dataset][:5]}',
                            ordinate='Frequency H', relativ=False)
            for mask in masks]

In [None]:
# relative frequency of the object IDs in the mask with the largest ID
histogramm=show.histogramm(masks[idx_of_max], f'(7-1)_Histogramm_{maskfilter[dataset][:5]}_({idx_of_max})',
                            ordinate='Relative frequency h', relativ=True)
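
The `show.histogramm` helper comes from the project's own `MODULE.JH.visualize` module, so its internals are not visible here. The quantities it plots can presumably be reproduced with NumPy alone. The sketch below uses an assumed toy mask and is not the helper's actual implementation.

```python
import numpy as np

# Toy mask (assumed): 0 = background, 1 and 2 = object IDs.
mask = np.array([
    [0, 0, 1],
    [2, 2, 2],
])

# Absolute frequency H: number of pixels per ID (index = ID).
absolute = np.bincount(mask.ravel())

# Relative frequency h: share of all pixels per ID.
relative = absolute / absolute.sum()

print(absolute)
print(relative)
```

The index of each entry is the object ID, so the length of `absolute` minus one is the largest ID in the mask.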

#### Images and masks

In [None]:
show=SHOW(experiment='', figsize=(40,18), fontsize=40)
idx_list=[10, 20, 32]
path_set= f'(7-1)_images_masks_listset_{maskfilter[dataset][:5]}_{str(idx_list)}.png'
listset=list([images, masks])
titles=list(['Image', 'Actual masks'])
show.list_set(idx_list, listset,titles, path=path_set)

In [None]:
print('END')