Osnabrück University - Computer Vision (Winter Term 2016/17) - Prof. Dr.-Ing. G. Heidemann, Ulf Krumnack

# Exercise Sheet 05: Segmentation 2

## Introduction

This week's sheet should be solved and handed in before the end of **Sunday, December 11, 2016**. If you need help (and Google and other resources were not enough), feel free to contact your group's designated tutor or whomever of us you run into first. Please upload your results to your group's Stud.IP folder.

## Assignment 0: Math recap (Periodic functions) [0 Points]

This exercise is supposed to be very easy, does not give any points, and is voluntary. There will be a similar exercise on every sheet. It is intended to revise some basic mathematical notions that are assumed throughout this class and to allow you to check whether you are comfortable with them. Usually you should have no problem answering these questions offhand, but if you feel unsure, this is a good time to look them up again. You are always welcome to discuss questions with the tutors or in the practice session. Also, if you have a (math) topic you would like to recap, please let us know.

**a)** What are periodic functions? Can you provide a definition?

**b)** What are *amplitude*, *frequency*, *wavelength*, and *phase* of a sine function? How can you change these properties?

**c)** How are sine and cosine defined for complex arguments? In what sense does this generalize the real case?
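
As a small illustration of b) and c) (a sketch, not an official solution): a general sine wave can be written as `A * sin(2*pi*f*x + phi)` with amplitude `A`, frequency `f`, wavelength `1/f` and phase `phi`, and sine and cosine extend to complex arguments through Euler's formula. The helper names `sinusoid`, `complex_sin` and `complex_cos` are made up for this example.

```python
import cmath
import math


def sinusoid(x, amplitude=1.0, frequency=1.0, phase=0.0):
    """Sine wave A * sin(2*pi*f*x + phi); its wavelength (period) is 1/f."""
    return amplitude * math.sin(2 * math.pi * frequency * x + phase)


def complex_sin(z):
    """Sine for complex z via Euler's formula: (e^{iz} - e^{-iz}) / (2i)."""
    return (cmath.exp(1j * z) - cmath.exp(-1j * z)) / 2j


def complex_cos(z):
    """Cosine for complex z via Euler's formula: (e^{iz} + e^{-iz}) / 2."""
    return (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2
```

For a real argument the imaginary part of `complex_sin` vanishes and the value agrees with `math.sin`, which is the sense in which the complex definition generalizes the real one.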

## Assignment 1: Multiscale representation [8 Points]

**a)** What is a *scale space*? What is a multiscale representation? What is the motivation for a multiscale strategy in the context of segmentation?

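Multiscale representations are based on the idea that image structures can be meaningful at different scales. The image is therefore represented at several resolutions, obtained by repeatedly smoothing and downscaling the original. Segmentation in this context relies on homogeneity on multiple scales: coarse levels suppress noise and fine detail, and the result can then be refined on the finer levels. A scale space generalizes the discrete levels of the Gaussian/Laplacian pyramid to a continuous scale parameter (for example the standard deviation of the Gaussian smoothing kernel).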
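
To make the idea of a scale space concrete, here is a minimal sketch that samples it at a few values of sigma. It uses only NumPy, and the helper names (`gaussian_kernel`, `gaussian_blur`, `scale_space`), the truncation of the kernel at roughly 3 sigma and the edge padding are choices made for this example, not taken from the lecture.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    return kernel / kernel.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    # Repeat the border pixels so that the output keeps the input size.
    padded = np.pad(np.asarray(img, dtype=float), radius, mode='edge')
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode='valid')
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode='valid')


def scale_space(img, sigmas):
    """Stack of smoothed copies of img, one slice per value of sigma."""
    return np.stack([gaussian_blur(img, s) for s in sigmas])
```

Calling `scale_space(img, [1, 2, 4])` returns an array of shape `(3, height, width)`. Unlike a pyramid, every slice keeps the full resolution; only the amount of smoothing changes.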

**b)** What is the *Gaussian pyramid*? How does the **reduce** operation work? Explain in your own words what low pass filtering is and why it should be used when building the pyramid. Implement the **reduce** operation and generate a figure similar to the one on (CV-07 slide 32).

We take an image and reduce it repeatedly. Each reduction replaces a block of four pixels (2x2) on the current level with one pixel on the next level. This halves the width and height of the image, so that only a quarter of the pixels remain. Stacking these levels on top of each other yields the pyramid. It is called a Gaussian pyramid if the image is smoothed with a Gaussian filter before each reduction. Low pass filtering removes noise and high frequencies (fine detail). This is needed because the reduced image has only half the sampling rate: frequencies that it can no longer represent would otherwise show up as aliasing artifacts.

In [None]:
import numpy as np
from scipy import misc
import matplotlib.pyplot as plt
from scipy.ndimage.filters import gaussian_filter
img = misc.imread('mermaid.png', mode='F')
sigma = 1
filtered = gaussian_filter(img,sigma)
scales = 3

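# Work on a copy: `pyramid_image = filtered` would only create a second name
# for the same array, so the loop would overwrite the pixels it still reads.
pyramid_image = filtered.copy()
rows, cols = filtered.shape

for rowIndex in range(0, rows-1, 2):
    for colIndex in range(0, cols-1, 2):
        # Sum 4-square region.
        pixelSum = filtered[rowIndex,colIndex] + filtered[rowIndex+1,colIndex] + filtered[rowIndex,colIndex+1] + filtered[rowIndex+1,colIndex+1]

        # Write the averaged pixel into the lower left quadrant (integer division
        # keeps the indices valid in Python 3).
        pyramid_image[rows//2 + rowIndex//2, colIndex//2] = pixelSum/4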

plt.gray()
plt.imshow(img)
plt.show()
plt.imshow(pyramid_image)
plt.title('Sigma = '+str(sigma))
plt.show()
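
The reduce step can also be written without explicit loops and applied repeatedly to obtain several pyramid levels. The following is a sketch under stated assumptions: `reduce_once` and `build_pyramid` are hypothetical helper names, and the 2x2 block average acts only as a crude (box) low pass filter. For a Gaussian pyramid, each level would additionally be smoothed, for instance with `gaussian_filter`, before it is reduced.

```python
import numpy as np


def reduce_once(img):
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    # Drop a trailing row/column if the size is odd.
    rows, cols = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    # blocks[i, a, j, b] is the pixel img[2*i + a, 2*j + b].
    blocks = img[:rows, :cols].reshape(rows // 2, 2, cols // 2, 2)
    return blocks.mean(axis=(1, 3))


def build_pyramid(img, levels):
    """List of images, starting with img, each half the size of the previous."""
    pyramid = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        pyramid.append(reduce_once(pyramid[-1]))
    return pyramid
```

With `levels=3`, an image of 17x32 pixels yields levels of size 17x32, 8x16 and 4x8.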

**c)** What is the **expand** operation? Why can the **reduce** operation not be inverted? Implement the **expand** operation and generate an image similar to the one on (CV-07 slide 34).

Expand is the counterpart of reduce: it rebuilds the next larger level from a smaller one. Reduce cannot be inverted because it discards information: each pixel of the smaller level is an average of four pixels, and many different 2x2 blocks share the same average. Expand can therefore only rescale the image to the larger size and interpolate the missing pixels, which yields a blurred version of the original level.

In [None]:
import numpy as np
from scipy import misc
import matplotlib.pyplot as plt
from scipy.ndimage.filters import gaussian_filter

img = misc.imread('mermaid.png', mode='F')

# Reduce once by averaging 2x2 blocks of the low pass filtered image, so that
# this cell does not depend on variables left over from the previous cell.
sigma = 1
filtered = gaussian_filter(img, sigma)
rows, cols = filtered.shape[0]//2*2, filtered.shape[1]//2*2
reduced = (filtered[0:rows:2, 0:cols:2] + filtered[1:rows:2, 0:cols:2] +
           filtered[0:rows:2, 1:cols:2] + filtered[1:rows:2, 1:cols:2]) / 4

# Expansion: every pixel of the reduced image becomes a 2x2 block. The missing
# pixels are interpolated from the neighbors to the right and below.
padded = np.pad(reduced, ((0, 1), (0, 1)), mode='edge')  # Repeat last row/column.
center = padded[:-1, :-1]
pyramid_image = np.empty((rows, cols))
pyramid_image[0::2, 0::2] = center                            # Center.
pyramid_image[0::2, 1::2] = (center + padded[:-1, 1:]) / 2    # Right.
pyramid_image[1::2, 0::2] = (center + padded[1:, :-1]) / 2    # Bottom.
pyramid_image[1::2, 1::2] = (center + padded[1:, 1:]) / 2     # Bottom right.

plt.gray()
plt.imshow(filtered)
plt.show()
plt.imshow(pyramid_image)
plt.show()
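
Why reduce cannot be inverted can be checked on a tiny example. The sketch below assumes a reduce that averages 2x2 blocks and an expand that simply copies each pixel into a 2x2 block; `reduce_once` and `expand_once` are names chosen for this illustration.

```python
import numpy as np


def reduce_once(img):
    """Average non-overlapping 2x2 blocks (even image size assumed)."""
    blocks = img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2)
    return blocks.mean(axis=(1, 3))


def expand_once(img):
    """Double the size by copying every pixel into a 2x2 block."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)


checkerboard = np.array([[0.0, 1.0], [1.0, 0.0]])
flat = np.full((2, 2), 0.5)

# Two different images collapse to the same reduced image ...
print(reduce_once(checkerboard), reduce_once(flat))
# ... so expanding the reduced image can recover at most one of them.
print(expand_once(reduce_once(checkerboard)))
```

Both inputs reduce to the single value 0.5, and expanding it gives the flat image. The checkerboard pattern is lost for good.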

**d)** What is the *Laplacian pyramid*? What is it used for? Compute the Laplacian pyramid and generate an image similar to the one on (CV-07 slide 36).

In [None]:
import numpy as np
from scipy import misc
import matplotlib.pyplot as plt

img = misc.imread('mermaid.png', mode='F')

pyramid_image = img # change this!
# FIXME: put your code here!

plt.gray()
plt.imshow(pyramid_image)
plt.show()
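
One possible way to approach d), given here as a hedged sketch rather than a reference solution: each level of the Laplacian pyramid stores the difference between a Gaussian pyramid level and the expanded next smaller level, and the coarsest level is kept as it is. The simple `reduce_once` and `expand_once` helpers below (block averaging and pixel copying) are assumptions standing in for the filtered reduce and expand operations of the lecture, and the image size is assumed to be divisible by `2**(levels - 1)`.

```python
import numpy as np


def reduce_once(img):
    """Average non-overlapping 2x2 blocks (even image size assumed)."""
    blocks = img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2)
    return blocks.mean(axis=(1, 3))


def expand_once(img):
    """Double the size by copying every pixel into a 2x2 block."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)


def laplacian_pyramid(img, levels):
    """Difference images L_i = G_i - expand(G_{i+1}), plus the coarsest G."""
    current = np.asarray(img, dtype=float)
    pyramid = []
    for _ in range(levels - 1):
        smaller = reduce_once(current)
        pyramid.append(current - expand_once(smaller))
        current = smaller
    pyramid.append(current)  # Low resolution residual.
    return pyramid


def reconstruct(pyramid):
    """Invert laplacian_pyramid by expanding and adding level by level."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = expand_once(current) + detail
    return current
```

Because every level stores exactly what expand cannot restore, `reconstruct(laplacian_pyramid(img, n))` returns the original image up to rounding. The difference images mostly contain values close to zero.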

## Assignment 2: Texture Segmentation [6 Points]

**a)** What is texture? Try to define it in your own words. Can there be a standard definition? What problems do you expect for texture based segmentation? 

**b)** What is a co-occurrence matrix? How can it be used to characterize texture?

**c)** Implement a function to compute the co-occurrence matrix of an image (patch). Apply it and compare your results to (CV-07 slide 54).

In [None]:
import numpy as np
from scipy import misc
import matplotlib.pyplot as plt

img = misc.imread('mermaid.png', mode='L')

def get_patch(img, x, y, size = 40):
    """
    Extract a rectangular patch from an image and mark it in the original image.
    """
    result = img[x:x+size,y:y+size].copy()
    img[x:x+size, [y,y+1,y+size,y+size+1]] = 0
    img[[x,x+1,x+size,x+size+1], y:y+size] = 0
    return result

patches = []
patches.append(get_patch(img, 50,130))
patches.append(get_patch(img, 110,80))
patches.append(get_patch(img, 260,340))
patches.append(get_patch(img, 310,110))
patches.append(get_patch(img, 100,440))


def cooccurrence(img, dx = 1, dy = 1):
    """
    Compute a co-occurrence matrix for the given image.
    
    Args:
        img          the grayscale image (uint8)
        dx,dy        the offset between the two reference points

    Returns:
        matrix       the co-occurrence matrix
    """
    # Start from zeros: np.empty would return uninitialized memory.
    matrix = np.zeros((256,256))
    # FIXME: put your code here!

    return matrix


plt.figure(figsize=(12, 12))
plt.gray()
plt.imshow(img)
plt.show()


plt.figure(figsize=(12, 12))
i = 0
for p in patches:
    plt.subplot(len(patches),3,i+1); plt.axis('off'); plt.imshow(p)
    plt.subplot(len(patches),3,i+2); plt.imshow(cooccurrence(p,1,0))
    plt.subplot(len(patches),3,i+3); plt.imshow(cooccurrence(p,0,1))
    i += 3
plt.show()
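
A minimal sketch of how such a co-occurrence matrix could be counted with NumPy. It is not the reference solution, and it makes two assumptions that are not fixed by the sheet: `dx` is an offset along the columns and `dy` along the rows, and the offset is smaller than the patch. The function name `cooccurrence_sketch` and the `levels` parameter are introduced only for this example.

```python
import numpy as np


def cooccurrence_sketch(img, dx=1, dy=1, levels=256):
    """Count how often gray value g1 at (row, col) meets g2 at (row+dy, col+dx)."""
    img = np.asarray(img, dtype=np.intp)
    rows, cols = img.shape
    # Two overlapping windows: `first` holds the reference pixels, `second`
    # the pixels displaced by (dy, dx). Negative offsets work as well.
    first = img[max(0, -dy):rows - max(0, dy), max(0, -dx):cols - max(0, dx)]
    second = img[max(0, dy):rows - max(0, -dy), max(0, dx):cols - max(0, -dx)]
    matrix = np.zeros((levels, levels))
    # np.add.at accumulates correctly even when an index pair repeats.
    np.add.at(matrix, (first.ravel(), second.ravel()), 1)
    return matrix
```

Entry `matrix[g1, g2]` then holds the number of pixel pairs with gray value `g1` at the reference position and `g2` at the displaced position. Dividing by `matrix.sum()` turns the counts into relative frequencies.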

## Assignment 3: Region merging [6 Points]

Implement the *region merging* algorithm (CV-07 slide 39) and apply it to the image `segments.png` (or some part of it). Use a simple *homogeneity condition*, e.g. that the maximal difference between gray values in a segment is not larger than a given threshold.

In [None]:
# FIXME: put your code here!
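
The following sketch shows one way region merging could be organized. It is not necessarily the algorithm from the slide. It assumes that every pixel starts as its own region, that 4-connected neighbors are tested in scan order, and that two regions are merged when the gray value range of their union does not exceed `threshold`. The union-find bookkeeping and the name `region_merging` are choices made for this example.

```python
import numpy as np


def region_merging(img, threshold):
    """Label image from greedy merging of 4-connected neighboring regions."""
    img = np.asarray(img, dtype=float)
    rows, cols = img.shape
    parent = np.arange(rows * cols)      # Union-find forest, one node per pixel.
    low = img.ravel().copy()             # Minimum gray value of each region.
    high = img.ravel().copy()            # Maximum gray value of each region.

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def try_merge(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        merged_low = min(low[ra], low[rb])
        merged_high = max(high[ra], high[rb])
        # Homogeneity condition: gray value range of the union is small enough.
        if merged_high - merged_low <= threshold:
            parent[rb] = ra
            low[ra], high[ra] = merged_low, merged_high

    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                try_merge(i, i + 1)      # Right neighbor.
            if r + 1 < rows:
                try_merge(i, i + cols)   # Bottom neighbor.

    roots = np.array([find(i) for i in range(rows * cols)])
    # Renumber the root ids to consecutive labels 0, 1, 2, ...
    _, labels = np.unique(roots, return_inverse=True)
    return labels.reshape(rows, cols)
```

A single pass over all neighbor pairs is enough here, because the gray value range of a region can only grow: a pair that fails the test once would fail it later as well. The result does depend on the scan order, and the pure Python loops are slow, so trying this on a part of `segments.png` first is advisable.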
