# 5-01 Analysis of the data set

The aim is to determine the maximum number of objects with distinct identification numbers that occur in any mask of the data set.

The annotations are masks used for semantic segmentation; they are the targets needed to train a neural network. Each mask marks objects relative to its original image. To distinguish the objects, each mask object carries an object-specific identification number (class number), numbered consecutively starting at 1. The background is assigned the number 0.
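
The numbering scheme described above can be illustrated on a small made-up mask. The array `toy_mask` and its values are assumptions for illustration only and do not come from the data set.

```python
import numpy as np

# Toy 4x5 mask (made-up values): 0 = background, 1..3 = object IDs
toy_mask = np.array([
    [0, 0, 1, 1, 0],
    [0, 2, 2, 1, 0],
    [0, 2, 2, 0, 3],
    [0, 0, 0, 0, 3],
], dtype=np.uint8)

# IDs present in the mask and the number of pixels carrying each ID
ids, pixel_counts = np.unique(toy_mask, return_counts=True)
object_ids = ids[ids != 0]  # drop the background ID

print('IDs in mask :', ids.tolist())
print('Object IDs  :', object_ids.tolist())
print('Pixel counts:', dict(zip(ids.tolist(), pixel_counts.tolist())))
```

With consecutive numbering, the largest ID in a mask equals the number of objects it contains.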

Object classification with a neural network (semantic instance segmentation) requires the maximum number of output channels as a parameter. The mask objects supply this information.
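
As a rough sketch of how the largest object ID could translate into a channel count, the example below one-hot encodes a made-up mask. The helper `one_hot_mask` is hypothetical, and reserving channel 0 for the background (hence `max_id + 1` channels) is an assumption; a network that treats the background differently may need a different count.

```python
import numpy as np

def one_hot_mask(mask, num_channels):
    """Turn an (H, W) integer mask into an (H, W, num_channels) one-hot array."""
    return np.eye(num_channels, dtype=np.uint8)[mask]

# Made-up mask with background (0) and two objects (1, 2)
toy_mask = np.array([[0, 1, 1],
                     [2, 2, 0]])

max_id = int(toy_mask.max())
num_channels = max_id + 1  # assumption: channel 0 is reserved for the background
encoded = one_hot_mask(toy_mask, num_channels)

print('Largest object ID:', max_id)
print('Encoded shape    :', encoded.shape)
```

Each pixel then has exactly one active channel, namely the one whose index equals its ID.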

In this section of the project, the mask images are analyzed to determine the maximum number of possible object types.

## Libraries, modules, classes

In [None]:
import numpy as np
import os
from skimage.io import imread
# JH - Jonas Heinke
from configuration_CRAG import Path   as PATH # Paths and filenames
from MODULE.JH.visualize import Show as SHOW

## Constants

In [None]:
# for display only
VERBOSE=True

## 1. Generate file names as a list
A path for the images and a path for the associated masks are selected.
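
The `get_filenames` method of the project's `Path` class is not shown in this notebook. A minimal stand-in, assuming it simply collects the files that match a pattern and optionally sorts them, could look like the following; `list_files` is a hypothetical name and not part of the project.

```python
import glob
import os
import tempfile

def list_files(directory, pattern='*.png', sort=True):
    """Hypothetical stand-in for path.get_filenames: return matching file paths."""
    filenames = glob.glob(os.path.join(directory, pattern))
    return sorted(filenames) if sort else filenames

# Demo on a temporary directory with empty placeholder files
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ('b.png', 'a.png', 'notes.txt'):
        open(os.path.join(tmp_dir, name), 'w').close()
    found = [os.path.basename(f) for f in list_files(tmp_dir)]

print(found)
```

Sorting both lists the same way is what keeps each image aligned with its mask, provided the file names in the two directories correspond.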

In [None]:
path=PATH() # Instance of the class required for method call
path_images=[path.trainimages, path.testimages]
path_masks= [path.trainmasks, path.testmasks]
dataset=0
# input and target files
image_filenames  = path.get_filenames(path_images[dataset] ,dateifilter= '*.png', sort=True)
mask_filenames = path.get_filenames(path_masks[dataset], dateifilter= '*.png', sort=True)

In [None]:
print('Number of image-mask pairs: ', len(image_filenames), ' : ', len(mask_filenames))
if VERBOSE:
    for idx in range(len(image_filenames)):
        print(idx, ' | ', os.path.basename(image_filenames[idx]),'\t-> ', os.path.basename(mask_filenames[idx]))

## 2. Loading the images and the associated masks from the directories

In [None]:
# read images and store them in memory
images = [imread(img_name) for img_name in image_filenames]
masks = [imread(mask_name) for mask_name in mask_filenames]

In [None]:
if VERBOSE:    
    print('path_images: ', path_images)
    print('path_masks : ', path_masks)
    print('Number of images: inputs {}, targets {} '.format(len(images), len(masks)))
    print('Shape of origin: inputs {}, targets {} '.format(images[0].shape, masks[0].shape))

## 3. Determination of the number of objects in a mask
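
The largest ID in a mask equals its object count only if the numbering has no gaps, as stated in the introduction. The sketch below checks this on made-up masks (`toy_masks` is an assumption for illustration) and then picks the overall maximum in the same way as the cells that follow.

```python
import numpy as np

# Three made-up masks; the second one skips ID 2
toy_masks = [
    np.array([[0, 1], [2, 3]]),
    np.array([[0, 1], [3, 3]]),
    np.array([[0, 0], [0, 1]]),
]

for i, mask in enumerate(toy_masks):
    max_id = int(mask.max())
    n_objects = len(np.unique(mask[mask != 0]))  # distinct non-background IDs
    consecutive = (n_objects == max_id)
    print(f'mask {i}: max ID = {max_id}, distinct objects = {n_objects}, '
          f'consecutive = {consecutive}')

max_ids = [int(m.max()) for m in toy_masks]
overall_max = max(max_ids)
first_idx = int(np.argmax(max_ids))  # index of the first mask reaching the maximum
print('Overall maximum:', overall_max, 'at index', first_idx)
```

Note that `np.argmax` reports only the first mask that reaches the maximum, even if several masks share it.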

In [None]:
# largest object ID found in each mask
max_idx = [mask.max() for mask in masks]
if VERBOSE:
    print([f'{max_idx[i]}[{i}]' for i in range(len(max_idx))])

In [None]:
max_idx_all=max(max_idx)
idx_of_max= np.argmax(max_idx)
print('Maximum number of classes or number of object IDs:\n Max= {}[idx={}]'.
      format(max_idx_all, idx_of_max))

## 4. Visualization

#### Histograms
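
The `histogramm` method of the project's `Show` class is not shown here. As a rough sketch of the quantities such a histogram presumably displays, the absolute frequency H and the relative frequency h per object ID can be computed with NumPy; `toy_mask` is made up for illustration.

```python
import numpy as np

# Made-up mask: 0 = background, 1 and 2 = object IDs
toy_mask = np.array([[0, 0, 1, 1],
                     [0, 2, 2, 1],
                     [0, 2, 2, 0]])

absolute = np.bincount(toy_mask.ravel())  # H: number of pixels per ID
relative = absolute / absolute.sum()      # h: share of pixels per ID

for obj_id, (H, h) in enumerate(zip(absolute, relative)):
    print(f'ID {obj_id}: H = {H}, h = {h:.3f}')
```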

In [None]:
#+ from MODULE.JH.visualize import Show as SHOW
show=SHOW(experiment='', figsize=(60,70), fontsize=16)

In [None]:
# one histogram of absolute frequencies per mask
histogramm = [show.histogramm(mask, f'(5-1)_Sum_histogram__{dataset}',
                              ordinate='Frequency H', relativ=False)
              for mask in masks]

In [None]:
# relative frequencies for the mask with the largest object ID
histogramm = show.histogramm(masks[idx_of_max], f'(5-1)_histogram_{dataset}_({idx_of_max})',
                             ordinate='Relative frequency h', relativ=True)

#### Images and masks

In [None]:
show = SHOW(experiment='', figsize=(40, 20), fontsize=40)
# indices of the image-mask pairs to display
idx_list = [14, 2, 165]
path_set = f'(5-1)_images_masks_listset_{dataset}_{idx_list}.png'
listset = [images, masks]
titles = ['Image', 'Actual mask']
show.list_set(idx_list, listset, titles, path=path_set)

In [None]:
print('END')