# Homework 03 - Binary Leaves

Contact: David C. Schedl (david.schedl@fh-hagenberg.at)

Note: this is the starter pack for the **Digital Imaging / Computer Vision** homework. You do not need to use the exact same template and can start from scratch as well!
Using regular Python files (.py) is also possible.

# Task 
<a name="Task-A" id="Task-A"> </a>

The goal of this assignment is to use binary image processing to describe and identify leaves. 
You can use binary-region properties such as area, perimeter, circularity, central moments, and Hu moments to describe the leaves.
The `binary_leaves` dataset contains multiple images of 5 different leaf types: 
- Japanese maple,
- Chinese cinnamon*,
- ginkgo, maidenhair tree,
- Chinese tulip tree*, and
- tangerine.

All binary images have the same size; the leaves, however, are rotated and scaled slightly differently. 
Furthermore, there is natural variation in the leaf shapes, which makes the task more challenging.

Try to come up with a good description/threshold for each leaf type and evaluate how good that description is by answering the following questions: 
- How well can you distinguish between the different leaf types? 
- Which leaves are easy to distinguish? Which are hard, and why is that?
- How many different leaf types can you distinguish?
- What are the numbers (e.g., what percentage of leaves is correctly classified)?
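
To answer the last question with numbers, you could compare your predicted labels against the true labels. The snippet below is only a minimal sketch: the function `evaluate` and the toy label lists are made up for illustration and are not part of the starter pack.

```python
import numpy as np

def evaluate(true_labels, predicted_labels):
    """Return overall accuracy, per-class accuracy, and a confusion matrix.

    Hypothetical helper: classes are taken from the true labels only.
    """
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    classes = np.unique(true_labels)  # sorted class labels
    # confusion[i, j] counts images of class i that were predicted as class j
    confusion = np.zeros((len(classes), len(classes)), dtype=int)
    for i, t in enumerate(classes):
        for j, p in enumerate(classes):
            confusion[i, j] = np.sum((true_labels == t) & (predicted_labels == p))
    accuracy = float(np.mean(true_labels == predicted_labels))
    per_class = {str(c): float(confusion[i, i] / confusion[i].sum())
                 for i, c in enumerate(classes)}
    return accuracy, per_class, confusion

# toy example with made-up labels (not results from the dataset)
true_labels = ['0', '0', '0', '2', '2', '4', '4', '4']
predicted_labels = ['0', '0', '2', '2', '2', '4', '0', '4']
accuracy, per_class, confusion = evaluate(true_labels, predicted_labels)
print(f"accuracy: {accuracy:.1%}")
print(per_class)
print(confusion)
```

The off-diagonal entries of the confusion matrix show which leaf types get mixed up, which helps with the "which are hard, and why" question.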

You don't need to use all 5 leaf types. 
Start with fewer types and add more once you feel confident.

**Hint(s):** 
- You can use all the code that we used for binary images as a basis.
- For simplicity, start with only 2 or 3 leaf types and then optionally extend to 5 (leave out the leaves marked with *). 
- When combining multiple descriptions, be careful with ranges and scaling!
- If you want to use Hu moments, remember that the later moments (hu_3, hu_4, ...) are very sensitive to noise (maybe don't use them).
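
The hint about ranges and scaling can be illustrated with a small sketch. It assumes you describe each leaf by a feature vector (here area and circularity) and classify by the nearest class mean; this is just one possible approach, not the required one. The helper names `standardize` and `nearest_centroid_predict` and all feature values are made up for illustration.

```python
import numpy as np

def standardize(features, mean=None, std=None):
    """Scale each feature column to zero mean and unit standard deviation."""
    features = np.asarray(features, dtype=float)
    if mean is None:
        mean = features.mean(axis=0)
    if std is None:
        std = features.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant columns
    return (features - mean) / std, mean, std

def nearest_centroid_predict(train_x, train_y, test_x):
    """Assign each test row to the class with the closest mean feature vector."""
    train_y = np.asarray(train_y)
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    # Euclidean distances with shape (number of test rows, number of classes)
    dists = np.linalg.norm(test_x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

# made-up feature rows: [area in pixels, circularity]
train_x = np.array([[9000.0, 0.30], [9500.0, 0.35], [9200.0, 0.80], [8800.0, 0.85]])
train_y = np.array(['0', '0', '4', '4'])
test_x = np.array([[9400.0, 0.78]])  # circularity clearly resembles class '4'

# without scaling, the area (large numbers) dominates the distance
unscaled_pred = nearest_centroid_predict(train_x, train_y, test_x)

# with scaling, both features contribute comparably
train_scaled, mean, std = standardize(train_x)
test_scaled, _, _ = standardize(test_x, mean, std)
scaled_pred = nearest_centroid_predict(train_scaled, train_y, test_scaled)

print("unscaled:", unscaled_pred, "scaled:", scaled_pred)
```

Note that the test features are scaled with the mean and standard deviation of the training features, so both sets live in the same coordinate system.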



# Setup

First, let's import some useful libraries. 
Then we'll download the binary images into the `binary_leaves` folder. 

In [None]:
import os
import cv2 # openCV
import numpy as np
import matplotlib.pyplot as plt
import json
import pandas as pd # nice tables in python


!curl -LJO "https://raw.githubusercontent.com/Digital-Media/cv_data/main/binary_leaves.zip" --silent
import zipfile
with zipfile.ZipFile("binary_leaves.zip", 'r') as zip_ref:
    zip_ref.extractall(".")

## Loading the data

Below you can find the code to load and display the data.
For each leaf, you have a binary image and a label (0 to 4) indicating the leaf type.


In [None]:
# Load binary leaf images and labels

# load label to name mapping from json file
with open('binary_leaves/labels.json') as f:
    label_to_name = dict(json.load(f))
#print(label_to_name)

# load images and labels
images = []
labels = []
file_names = []
for label, name in label_to_name.items():
    for file in os.listdir(f'binary_leaves/{label}'):
        image = (cv2.imread(f'binary_leaves/{label}/{file}', cv2.IMREAD_GRAYSCALE)>0).astype(np.uint8)
        images.append(image)
        labels.append(label)
        file_names.append(file)

# print simple statistics
print(f'number of images: {len(images)}')

# show an example image for each class
plt.figure(figsize=(15, 5))
# init the random number generator
np.random.seed(42)
N = 3 # number of images from the same class to show
for label, name in label_to_name.items():
    for n in range(N):
        plt.subplot(N, len(label_to_name), n*len(label_to_name) +(int(label)+1))
        # random sample from images with the same label
        idx = np.random.choice(np.where(np.array(labels)==label)[0])
        image = images[idx]
        plt.imshow(image, cmap='gray'), plt.axis('off')
        if n == 0:
            plt.title(f"{name[:15]} ({label})")
plt.show()

## Working with the dataset

Below is a simple example that shows you how to work with the dataset.
It computes properties for one leaf of each class. 

In [None]:

def compute_properties(img, hu_log=True):
    """Compute properties of a binary image.
    Args:
        img (np.array): binary image
        hu_log (bool): if True, compute the log of the Hu moments
    Returns:
        dict: dictionary with properties area, perimeter, circularity, and hu moments (hu_0, hu_1, ... hu_6)
    """
    # get binary regions of binary image and compute their properties (area, BBs, centroid)
    retval, labels, stats, centroids = cv2.connectedComponentsWithStats(img)
    assert len(stats) == 2  # exactly two regions: background (0) and foreground (1)
    # compute the contour perimeter 
    perimeter = cv2.arcLength(cv2.findContours((labels==1).astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[0][0], True)
    circularity = 4.0*np.pi*stats[1,4]/ (.95*perimeter)**2

    simple_props = {'area': stats[1,4], 'perimeter': perimeter, 'circularity': circularity}

    # compute Hu moments 
    hu_moments = cv2.HuMoments(cv2.moments((labels==1).astype(np.uint8))).flatten()
    if hu_log:
        hu_moments = np.sign(hu_moments) * np.log(np.abs(hu_moments)) # log is only defined for positive values, thus use abs
    hu_props = {'hu_'+str(i): hu_moments[i] for i in range(len(hu_moments))}

    return dict( **simple_props, **hu_props )



# compute properties for one example image per class
np.random.seed(123) # init the random number generator
props = {}
for label, name in label_to_name.items():
    # random sample from images with the same label
    idx = np.random.choice(np.where(np.array(labels)==label)[0])
    image = images[idx]

    props[f"{name} ({file_names[idx]})"] = compute_properties(image)

# make a pandas table with all properties (one column per example leaf)
pd.options.display.float_format = "{:.3f}".format
df = pd.DataFrame(props)

# show the table
df
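
To find a description/threshold, it may help to collect the properties of all images in one table and compare them per class. The following is a rough sketch under assumptions: `build_feature_table` is a hypothetical helper, and the property dictionaries are made-up stand-ins for the output of `compute_properties`.

```python
import pandas as pd

def build_feature_table(property_dicts, labels):
    """Build a table with one row per image, one column per property, plus 'label'."""
    table = pd.DataFrame(property_dicts)
    table['label'] = list(labels)
    return table

# made-up property dictionaries standing in for compute_properties(image) results
example_props = [
    {'area': 9000, 'circularity': 0.30},
    {'area': 9500, 'circularity': 0.35},
    {'area': 9200, 'circularity': 0.80},
    {'area': 8800, 'circularity': 0.85},
]
example_labels = ['0', '0', '4', '4']
table = build_feature_table(example_props, example_labels)

# per-class statistics show which properties separate the classes
print(table.groupby('label').agg(['mean', 'min', 'max']))

# simple two-class threshold: midpoint between the class means of one property
class_means = table.groupby('label')['circularity'].mean()
threshold = class_means.mean()
predicted = table['circularity'].gt(threshold).map({True: '4', False: '0'})
print(f"threshold: {threshold:.3f}")
print(f"correctly classified: {(predicted == table['label']).mean():.1%}")
```

In the notebook, the table could be built from the real data with `build_feature_table([compute_properties(img) for img in images], labels)`.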

## Further comments/hints:
*   You do not need to come up with super efficient implementations! It is mostly about understanding the topic and the problem.
*   Think about the problem, solve it, and evaluate your solutions on the test images.
*   Summarize your ideas and solutions in the report! 


**Have fun!** 😸
