# Heerlen-Aachen Annotated Steel Microstructure Dataset 

This notebook contains the accompanying code for the Heerlen-Aachen Annotated Steel Microstructure Dataset. 
The dataset was created by the Center for Actionable Research of Open Universiteit (CAROU) and the Steel Institute of RWTH Aachen University.

The dataset contains expert-annotated steel microstructures that are visible in microscopy images. Two levels of annotation are provided. The first is the points of interest (POIs): the coordinates of points that lie within specific microstructures. The second is the polygons: the segments drawn around the boundaries of these structures.
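
As a small illustration of the two annotation levels, the sketch below builds one POI and one polygon with Shapely and checks that the point lies inside the polygon. The coordinates are made up for the example and are not taken from the dataset.

```python
from shapely.geometry import Point, Polygon

# Hypothetical annotation pair (illustrative pixel coordinates, not dataset values)
poi = Point(50, 40)                                           # point of interest inside a structure
segment = Polygon([(10, 10), (90, 10), (90, 70), (10, 70)])   # polygon around its boundary

poi_inside = segment.contains(poi)   # True when the POI lies strictly inside the polygon
segment_area = segment.area          # area in square pixels
print(poi_inside, segment_area)
```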

This particular notebook guides the reader through the creation of a computer-vision baseline model that is able to automatically detect segments (contours) around the indicated blobs. It also includes the code that measures the accuracy of the models using the Intersection-over-Union (IOU) metric. The purpose is to exemplify and guide the training of more sophisticated machine learning segmentation models and to show how to measure their accuracy.

Contact: Deniz Iren, PhD (deniz.iren@ou.nl | deniziren@gmail.com)

In [10]:
## IMAGE FILES
path_folder_images_png = 'D:/Aachen Steel Image/UPLOAD/PNG/'
path_folder_images_tiff = 'D:/Aachen Steel Image/UPLOAD/TIFF/'

## STEEL SAMPLE METADATA
path_file_metadata = "nature_scidata_steel_metadata.csv"

## POI ANNOTATIONS
path_file_annotations_POIcsv = "nature_scidata_heerlen_aachen_steel_annotations_POI.csv"

## POLYGON ANNOTATIONS 
path_file_annotations_POIs_polygons_pickle = 'nature_scidata_heerlen_aachen_steel_annotations_polygon.pickle'
path_file_annotations_POIs_polygons = 'nature_scidata_heerlen_aachen_steel_annotations_polygon.csv'

## MORPHOLOGICAL FEATURES
path_file_annotations_morphology = 'nature_scidata_heerlen_aachen_steel_morph.pickle'

## CALCULATED CONTOURS
path_file_AnnotationsPOIsAndPolygonsShapely = 'nature_scidata_dfPOIPolygonContourShapely.pickle'
path_file_Evaluations = 'nature_scidata_dfEvaluation.pickle'

___
## Contour Detection around POIs

This part of the code automatically detects contours around the expert-annotated POIs. The purpose of this approach is to create a baseline method for contour detection. 
The baseline method does not provide perfect results. A significant number of segments are missing: the baseline could not detect a contour for 2,044 of the 8,909 annotated POIs.

Any usable machine learning method must outperform this baseline.

In [11]:
import pandas as pd 
import pickle

with open(path_file_annotations_morphology, 'rb') as handle:
    dfPOIPolyShapely = pickle.load(handle)

In [12]:
dfPOIPolyShapely.head()
dfPOIPolyShapely.columns

Index(['image_url', 'point', 'polygon', 'point_shapely', 'poly_shapely',
       'polygon_area', 'polygon_area_metric', 'polygon_perimeter',
       'polygon_perimeter_metric', 'aspect_ratio', 'height', 'width',
       'polygon_compactness', 'rotation_angle_polygon'],
      dtype='object')

In [14]:
dfPOIPolyShapely[['polygon_area', 'polygon_area_metric', 'polygon_perimeter',
       'polygon_perimeter_metric', 'aspect_ratio', 'height', 'width',
       'polygon_compactness', 'rotation_angle_polygon']].describe()

| | polygon_area | polygon_area_metric | polygon_perimeter | polygon_perimeter_metric | aspect_ratio | height | width | polygon_compactness | rotation_angle_polygon |
|---|---|---|---|---|---|---|---|---|---|
| count | 8909.0 | 8909.0 | 8909.0 | 8909.0 | 8909.0 | 8909.0 | 8909.0 | 8909.0 | 8909.0 |
| mean | 11135.883611 | 1.927958 | 562.622393 | 7.402926 | 1.030346 | 139.830466 | 155.323942 | 0.456448 | 92.834373 |
| std | 13058.812088 | 2.260875 | 392.687224 | 5.166937 | 0.545767 | 83.531001 | 97.519387 | 0.163137 | 49.552412 |
| min | 324.0 | 0.056094 | 69.229688 | 0.910917 | 0.119403 | 21.0 | 17.0 | 0.057774 | 0.172059 |
| 25% | 4341.5 | 0.751645 | 321.600438 | 4.231585 | 0.656051 | 85.0 | 91.0 | 0.331784 | 54.865807 |
| 50% | 7248.0 | 1.254848 | 451.723946 | 5.943736 | 0.90873 | 118.0 | 129.0 | 0.44843 | 92.862405 |
| 75% | 13083.5 | 2.265149 | 677.270318 | 8.911452 | 1.255102 | 171.0 | 189.0 | 0.574021 | 131.227716 |
| max | 205015.0 | 35.494287 | 4477.596563 | 58.915744 | 6.481481 | 766.0 | 857.0 | 0.925462 | 180.0 |


In [15]:
dir = path_folder_images_png
from shapely.geometry import Polygon, Point
import pandas as pd
from carou.annotation.spatial.display import processImage

def processImagesAndFindContours(dir, dfPolyPoi, settings = []):
    '''Takes the directory of the images and the dataframe of points and polygons as inputs. 
        Traverses the dataframe and reads all images listed as URLs in there. 
        Processes the image.
        Finds Contours on the processed image.
        Returns a list with one row per POI: [image_url, point, polygon, point_shapely, poly_shapely, contour_polygon_shapely] '''
    import cv2
    
    ## Default settings. 
    if len(settings) == 0: 
        histogramEqualization='NONE'
        gaussianBlur=True
        gaussianBlurKernelSize=(11,11)
        erodeIterations=4
        dilateIterations=2
    else:
        histogramEqualization = settings[0]
        erodeIterations = settings[1]
        dilateIterations = settings[2]
        gaussianBlur = settings[3]
        gaussianBlurKernelSize = settings[4]
    
    listOfAll = []
    for index, row in dfPolyPoi.iterrows():
        img_name = row['image_url']
        baz_imge_cv = cv2.imread(dir + img_name, 0)
        processedImage = processImage(baz_imge_cv.copy(),histogramEqualization, gaussianBlur, gaussianBlurKernelSize, erodeIterations, dilateIterations)
        
        #ret_img, point_shapely, poly_shapely, contour_polygon_shapely = findContourAroundPoint(baz_imge_cv, processedImage, row['point'], row['polygon'])
        #listOfAll.append([img_name, row['original_url'], row['point'], row['polygon'], point_shapely, poly_shapely, contour_polygon_shapely])
        ret_img, contour_polygon_shapely = findContourAroundPoint(baz_imge_cv, processedImage, row['point'])
        listOfAll.append([img_name, row['point'], row['polygon'], row['point_shapely'], row['poly_shapely'], contour_polygon_shapely])

    return listOfAll

def findContourAroundPoint(base_image, hayStackImage, point_tuple):
    '''Takes a base_image, hayStackImage, and a point tuple. 
        Finds the contours on hayStackImage (which is the enhanced, preprocessed image). Draws the matching contours on a copy of base_image. 
        Returns the annotated image and the detected contour as a Shapely polygon (None if no contour contains the point)'''
    from shapely.geometry import Polygon, Point
    import cv2
    import numpy as np
    
    
    #point_shapely = Point(point_tuple)
    #print(point_tuple, point_tuple[0], point_tuple[1], point_shapely)
    #poly_shapely = Polygon(poly_tuple_list)
    #if poly_shapely.is_valid:
    #    poly_shapely = poly_shapely
    #else:
    #    poly_shapely = poly_shapely.buffer(0) 

    contours, hierarchy = cv2.findContours(hayStackImage,cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    
    ret_img = base_image.copy()
    contour_polygon_shapely = None
    for cx in contours:

        dist= cv2.pointPolygonTest(cx, point_tuple, False)
        if dist > 0:  # if the point is inside the contour
            ret_img = cv2.drawContours(ret_img, [cx], 0, (0,255,0), 2)

            myCont = np.squeeze(cx)
            myPoly = Polygon(myCont)

            if myPoly.is_valid:
                contour_polygon_shapely = myPoly
            else:
                contour_polygon_shapely = myPoly.buffer(0)
        
    #return ret_img, point_shapely, poly_shapely, contour_polygon_shapely
    return ret_img, contour_polygon_shapely

In [16]:
## Create Shapely representation of expert-annotated 'point' and 'polygon'. Then use the baseline automated contour detector to draw segments around the objects that contain the 'point'

dir = path_folder_images_png
settings = ['EQUALIZE', 6, 6, True, (7, 7)] # Optimal settings

dfPOIPolyShapely_sub = dfPOIPolyShapely.copy()
dfPOIPolyShapely_sub.head()

dfPOIPolyShapely_sub['contour_shapely'] = None
dfPOIPolyShapely_sub.head()
calculatedList = processImagesAndFindContours(dir, dfPOIPolyShapely_sub, settings)

len(calculatedList)

dfShapelyPOIPolyContours = pd.DataFrame(calculatedList, columns=['image_url', 'point', 'polygon', 'point_shapely', 'poly_shapely', 'contour_polygon_shapely'])
print(len(dfShapelyPOIPolyContours))
dfShapelyPOIPolyContours.head()

8909


| | image_url | point | polygon | point_shapely | poly_shapely | contour_polygon_shapely |
|---|---|---|---|---|---|---|
| 0 | IMG_01457.png | (627, 319) | [(720, 288), (725, 294), (731, 300), (737, 302... | POINT (627 319) | POLYGON ((720 288, 725 294, 731 300, 737 302, ... | POLYGON ((661 286, 660 287, 660 288, 660 289, ... |
| 1 | IMG_01457.png | (820, 288) | [(823, 268), (835, 271), (856, 276), (898, 281... | POINT (820 288) | POLYGON ((823 268, 835 271, 856 276, 898 281, ... | POLYGON ((796 249, 795 250, 794 250, 793 250, ... |
| 2 | IMG_01457.png | (881, 523) | [(871, 542), (859, 527), (848, 526), (858, 509... | POINT (881 523) | POLYGON ((871 542, 859 527, 848 526, 858 509, ... | POLYGON ((861 510, 860 511, 859 512, 858 513, ... |
| 3 | IMG_01457.png | (477, 353) | [(427, 358), (479, 342), (522, 329), (523, 342... | POINT (477 353) | POLYGON ((427 358, 479 342, 522 329, 523 342, ... | POLYGON ((492 336, 491 337, 490 337, 489 338, ... |
| 4 | IMG_01457.png | (520, 737) | [(511, 717), (556, 721), (556, 729), (536, 734... | POINT (520 737) | POLYGON ((511 717, 556 721, 556 729, 536 734, ... | POLYGON ((521 714, 520 715, 520 716, 519 717, ... |


In [26]:
dfShapelyPOIPolyContours.columns


Index(['image_url', 'point', 'polygon', 'point_shapely', 'poly_shapely',
       'contour_polygon_shapely'],
      dtype='object')

In [27]:
## Dump the new dataframe as pickle. Note that this cell is active: executing it overwrites the file at path_file_AnnotationsPOIsAndPolygonsShapely.
import pickle
with open(path_file_AnnotationsPOIsAndPolygonsShapely, 'wb') as handle:
    pickle.dump(dfShapelyPOIPolyContours, handle, protocol=pickle.HIGHEST_PROTOCOL)
print('Done')

Done


### Evaluation Method for Models 

This section shows how to evaluate future machine-learning segmentation models. Here, we evaluate the baseline method using Intersection-over-Union (IOU).
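
For two regions A and B, IOU is the area of their intersection divided by the area of their union. The sketch below computes it directly with Shapely on two overlapping squares. It is a minimal illustration, not the `carou` implementation used in the next cell, and the helper name `iou` is hypothetical. A missing prediction scores 0, matching the convention this notebook applies to undetected contours.

```python
from shapely.geometry import box

def iou(predicted, gold):
    """Intersection-over-Union of two Shapely polygons; a missing prediction scores 0."""
    if predicted is None:
        return 0.0
    union_area = predicted.union(gold).area
    if union_area == 0:
        return 0.0
    return predicted.intersection(gold).area / union_area

gold = box(0, 0, 2, 2)        # 2x2 square, area 4
predicted = box(1, 1, 3, 3)   # same size, shifted by one unit in x and y

score = iou(predicted, gold)  # overlap area 1, union area 4 + 4 - 1 = 7
print(score)
```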

In [18]:
def calculateIOU(dfShapely):
    from shapely.geometry import Polygon, Point
    from carou.annotation.spatial import calculation

    retList = []
    for index, row in dfShapely.iterrows(): 
        goldPolygon = row['poly_shapely']
        contour = row['contour_polygon_shapely']
        dice, goldArea, contourArea = 0, 0, 0
        if contour is not None:
            if goldPolygon.is_valid and contour.is_valid:
                goldArea, contourArea, intersectionArea, unionArea, dice = calculation.polygonCompareAgainstGold(contour, goldPolygon)
            else:
                print("invalid shapely object: ", goldPolygon, contour)
                goldArea, x_contourArea, x_intersectionArea, x_unionArea, x_dice = calculation.polygonCompareAgainstGold(goldPolygon, goldPolygon)
                dice = 0
                
        else:
            goldArea, x_contourArea, x_intersectionArea, x_unionArea, x_dice = calculation.polygonCompareAgainstGold(goldPolygon, goldPolygon)
            dice = 0
            
        retList.append([row['image_url'], row['point'], row['polygon'], row['point_shapely'], row['poly_shapely'], row['contour_polygon_shapely'], goldArea, contourArea, dice])
    dfReturn = pd.DataFrame(retList, columns = ['image_url', 'point', 'polygon', 'point_shapely', 'poly_shapely', 'contour_polygon_shapely', 'area_poly', 'area_contour', 'IOU'])
    return dfReturn


In [None]:
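def validatePolygon(p):
    '''Returns p if it is a valid Shapely polygon; otherwise returns its buffer(0) repair.'''
    if p.is_valid:
        poly_shapely = p
    else:
        poly_shapely = p.buffer(0) 
    return poly_shapely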
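
The `buffer(0)` call is a common Shapely repair for invalid polygons. The sketch below shows it on a self-intersecting "bow-tie" ring with made-up coordinates. The repair can change the shape, so it may be worth comparing areas before and after when the result feeds into an IOU calculation.

```python
from shapely.geometry import Polygon

# A "bow-tie" ring crosses itself, so Shapely reports it as invalid
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
repaired = bowtie.buffer(0)   # zero-width buffer rebuilds a valid geometry

print(bowtie.is_valid, repaired.is_valid)
```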

In [19]:
dfEvaluation = calculateIOU(dfShapelyPOIPolyContours)

In [21]:
dfEvaluation.head()

| | image_url | point | polygon | point_shapely | poly_shapely | contour_polygon_shapely | area_poly | area_contour | IOU |
|---|---|---|---|---|---|---|---|---|---|
| 0 | IMG_01457.png | (627, 319) | [(720, 288), (725, 294), (731, 300), (737, 302... | POINT (627 319) | POLYGON ((720 288, 725 294, 731 300, 737 302, ... | POLYGON ((661 286, 660 287, 660 288, 660 289, ... | 8300.0 | 6258.0 | 0.647352 |
| 1 | IMG_01457.png | (820, 288) | [(823, 268), (835, 271), (856, 276), (898, 281... | POINT (820 288) | POLYGON ((823 268, 835 271, 856 276, 898 281, ... | POLYGON ((796 249, 795 250, 794 250, 793 250, ... | 11433.5 | 8274.0 | 0.499188 |
| 2 | IMG_01457.png | (881, 523) | [(871, 542), (859, 527), (848, 526), (858, 509... | POINT (881 523) | POLYGON ((871 542, 859 527, 848 526, 858 509, ... | POLYGON ((861 510, 860 511, 859 512, 858 513, ... | 1303.5 | 2656.5 | 0.415329 |
| 3 | IMG_01457.png | (477, 353) | [(427, 358), (479, 342), (522, 329), (523, 342... | POINT (477 353) | POLYGON ((427 358, 479 342, 522 329, 523 342, ... | POLYGON ((492 336, 491 337, 490 337, 489 338, ... | 1648.0 | 979.5 | 0.525188 |
| 4 | IMG_01457.png | (520, 737) | [(511, 717), (556, 721), (556, 729), (536, 734... | POINT (520 737) | POLYGON ((511 717, 556 721, 556 729, 536 734, ... | POLYGON ((521 714, 520 715, 520 716, 519 717, ... | 1656.5 | 1293.5 | 0.676211 |


In [22]:
# Instances where shapely objects are equal to None
print(dfShapelyPOIPolyContours.point_shapely.isna().sum(), dfShapelyPOIPolyContours.poly_shapely.isna().sum(), dfShapelyPOIPolyContours.contour_polygon_shapely.isna().sum())

0 0 2044


In [23]:
# The mean IOU (a candidate measure for the performance of a segmentation model)
dfEvaluation.IOU.mean()

0.35779133503094535
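
Because `calculateIOU` assigns an IOU of 0 to every POI without a detected contour, the mean above mixes detection failures with segmentation quality. One way to separate the two, sketched here on toy values rather than the real evaluation results, is to report the detection rate alongside the mean IOU over detected contours only. The column name `contour_found` is hypothetical.

```python
import pandas as pd

# Toy IoU scores (illustrative only); 0.0 marks POIs for which no contour was found
toy = pd.DataFrame({
    "contour_found": [True, True, False, True, False],
    "IOU": [0.6, 0.5, 0.0, 0.7, 0.0],
})

mean_all = toy["IOU"].mean()                               # misses count as 0
mean_found = toy.loc[toy["contour_found"], "IOU"].mean()   # detected contours only
detection_rate = toy["contour_found"].mean()               # share of POIs with a contour
print(mean_all, mean_found, detection_rate)
```

On the real data, the same split could be derived from `dfEvaluation['contour_polygon_shapely'].notna()`.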

In [25]:
## Dump the new dataframe as pickle. The following code is commented-out to avoid execution by mistake. Please, uncomment the code statement below and execute if needed.
import pickle
#with open(path_file_Evaluations, 'wb') as handle:
#    pickle.dump(dfEvaluation, handle, protocol=pickle.HIGHEST_PROTOCOL)
print('Done')

Done


___