[![Fixel Algorithms](https://fixelalgorithms.co/images/CCExt.png)](https://fixelalgorithms.gitlab.io)

# Image Processing with Python

## Global Thresholding

> Notebook by:
> - Royi Avital RoyiAvital@fixelalgorithms.com

## Revision History

| Version | Date       | User        | Content / Changes                                                  |
|---------|------------|-------------|--------------------------------------------------------------------|
| 0.1.000 | 03/10/2023 | Royi Avital | First version                                                      |

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FixelAlgorithmsTeam/FixelCourses/blob/master/ImageProcessingPython/0002SciKitImageBasics.ipynb)

In [None]:
# Import Packages

# General Tools
import numpy as np
import scipy as sp
import pandas as pd

from numba import jit, njit

# Image Processing
import skimage as ski

# Machine Learning


# Miscellaneous
import os
from platform import python_version
import random
import timeit

# Typing
from typing import Callable, List, Tuple

# Visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

# Jupyter
from IPython import get_ipython
from IPython.display import Image, display
from ipywidgets import Dropdown, FloatRangeSlider, FloatSlider, interact, IntSlider, Layout

## Notations

* <font color='red'>(**?**)</font> Question to answer interactively.
* <font color='blue'>(**!**)</font> Simple task to add code for the notebook.
* <font color='green'>(**@**)</font> Optional / Extra self practice.
* <font color='brown'>(**#**)</font> Note / Useful resource / Food for thought.

### Code Exercise

 - Single line fill

 ```python
 valToFill = ???
 ```

 - Multi Line to Fill (At least one)

 ```python
 # You need to start writing
 ????
 ```

 - Section to Fill

```python
#===========================Fill This===========================#
# 1. Explanation about what to do.
# !! Remarks to follow / take under consideration.
mX = ???

???
#===============================================================#
```

In [None]:
# Configuration
# %matplotlib inline

seedNum = 512
np.random.seed(seedNum)
random.seed(seedNum)

# sns.set_theme() #>! Apply SeaBorn theme

runInGoogleColab = 'google.colab' in str(get_ipython())

In [None]:
# Constants



In [None]:
# Fixel Algorithms Packages


In [None]:
# General Auxiliary Functions

def PlotThrImage( mI: np.ndarray, valThr: float ) -> None:

    mB = mI >= valThr

    hF, vHA = plt.subplots(nrows = 1, ncols = 2, figsize = (14, 6))
    vHA = vHA.flat
    
    hA = vHA[0]
    hA.imshow(mI, cmap = 'gray')
    hA.set_title('Input Image')

    hA = vHA[1]
    hA.imshow(mB, cmap = 'gray')
    hA.set_title(f'Binarized Image, T = {valThr:0.3f}')

    return
    


## Image Thresholding

Image Thresholding classifies each pixel by comparing its value to one or more thresholds.  
Given a set of thresholds $\mathcal{T} = \left\{ {T}_{1}, {T}_{2}, \ldots \right\}$, it partitions the image into $L = \left| \mathcal{T} \right| + 1$ labels.

The case of a single threshold is called _Binarization_.

There are many approaches for calculating the threshold.  
This notebook focuses on the binary case with a single global threshold.

* <font color='brown'>(**#**)</font> Binarization is one of the most common preprocessing steps for _Optical Character Recognition_ (OCR).
* <font color='brown'>(**#**)</font> Commonly, the threshold is obtained by optimizing an objective function with respect to the threshold value.
* <font color='brown'>(**#**)</font> The [Document Image Binarization](https://dib.cin.ufpe.br) and [DIBCO](https://users.iit.demokritos.gr/~bgat/DIBCO2009/benchmark) are common data sets for the task.
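
The partition into $L = \left| \mathcal{T} \right| + 1$ labels can be sketched with `np.digitize()`. The image `mI` and the thresholds `vT` below are hypothetical values chosen for illustration only:

```python
import numpy as np

# Hypothetical 3x3 image with values in [0, 1]
mI = np.array([[0.05, 0.20, 0.35],
               [0.50, 0.65, 0.80],
               [0.10, 0.45, 0.95]])

# Two thresholds (assumed values) partition the image into three labels
vT = np.array([0.3, 0.6])

# For each pixel, count the thresholds which are <= the pixel value
mL = np.digitize(mI, vT)

numLabels = len(vT) + 1 #<! L = |T| + 1
print(mL)

# A single threshold reduces to binarization: same mask as `mI >= 0.5`
mB = np.digitize(mI, [0.5]).astype(bool)
```

With a single threshold the labels are `0` and `1`, which matches the `mI >= valThr` convention used by `PlotThrImage()`.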

In [None]:
# Parameters

# imgUrl = 'https://i.imgur.com/b8bBmtT.png' #<! From DIBCO data set
imgUrl = 'https://i.postimg.cc/8PQCVSQK/148749752-88e0661f-4356-45f5-b1b1-bc34cd872164.png' #<! From DIBCO data set

numBins = 200

## Generate Data

This section loads the image used to evaluate the results.  

In [None]:
# Load / Generate Data

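mI = ski.io.imread(imgUrl) #<! Read the image from the URL
mI = ski.util.img_as_float64(mI) #<! Convert to float in the range [0, 1]
mI = np.mean(mI, axis = 2) #<! Average the channels into a single channel image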


In [None]:
# Display the Data

hF, hA = plt.subplots(figsize = (8, 6))
hA.imshow(mI, cmap = 'gray')
hA.set_title('Input Image');

## Global Image Thresholding

The approach of _Global Image Thresholding_ applies the **same** threshold to all pixels.

* <font color='brown'>(**#**)</font> By definition, a global threshold cannot adapt to local variations in the image.
* <font color='brown'>(**#**)</font> Global methods are popular when the compute resources are limited.
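
The limitation on local variations can be illustrated with a synthetic example. The sketch below assumes a made up page (`mBg`, `mInk` are hypothetical) where the illumination changes across the image, so the dark pixels on the bright side overlap in value with the background on the dark side:

```python
import numpy as np

# Hypothetical synthetic page: illumination rises from left to right
numRows, numCols = 64, 128
vLight = np.linspace(0.4, 1.0, numCols)
mBg    = np.tile(vLight, (numRows, 1))

# A horizontal "ink" stroke which reflects half of the local illumination
mInk = np.zeros((numRows, numCols), dtype = bool)
mInk[24:40, :] = True
mI = np.where(mInk, 0.5 * mBg, mBg)

# Count misclassified pixels for each global threshold on a grid.
# `mI >= valT` marks background, so agreeing with `mInk` is an error.
vT   = np.linspace(0.0, 1.0, 101)
vErr = np.array([np.sum((mI >= valT) == mInk) for valT in vT])

print(f'Lowest error: {vErr.min()} pixels out of {mI.size}, at T = {vT[np.argmin(vErr)]:0.2f}')
```

In this setup the ink values span $\left[ 0.2, 0.5 \right]$ while the background spans $\left[ 0.4, 1 \right]$, so no single threshold on the grid separates them without errors.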


### Manual Method

Set the threshold manually.

In [None]:
# Manual Global Image Thresholding

hPlotThrImage = lambda valThr: PlotThrImage(mI, valThr)

valThrSlider = FloatSlider(value = 0.5, min = 0, max = 1, step = 0.01, 
                                  description = 'Threshold Value:', continuous_update = False, 
                                  orientation = 'horizontal', readout = True, readout_format = '0.2f', 
                                  layout = Layout(width = '20%'), style = {'description_width': 'initial'})
interact(hPlotThrImage, valThr = valThrSlider)

plt.show()


### Otsu Threshold

The [Otsu Method](https://en.wikipedia.org/wiki/Otsu%27s_method) is based on finding the threshold which maximizes the ratio between the _inter cluster variance_ (${\sigma}_{b}^{2}$) and _intra cluster variance_ (${\sigma}_{w}^{2}$):

$$ \arg \max_{T} \frac{{\sigma}_{b}^{2} \left( T \right)}{{\sigma}_{w}^{2} \left( T \right)} $$

Since the total variance is the sum of the two, ${\sigma}^{2} = {\sigma}_{w}^{2} \left( T \right) + {\sigma}_{b}^{2} \left( T \right)$, and it does not depend on $T$, maximizing the ratio is equivalent to maximizing the _inter cluster variance_: $\arg \max_{T} {\sigma}_{b}^{2} \left( T \right)$.  
Finding the optimal threshold, $T$, is a 1D optimization problem.
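
The variance decomposition can be checked numerically. The following is a minimal sketch on hypothetical bimodal samples; the helper `ClassVariances()` is an illustrative name, not part of the notebook:

```python
import numpy as np

# Hypothetical bimodal samples (two Gaussian clusters)
rng = np.random.default_rng(0)
vX  = np.concatenate((rng.normal(0.3, 0.05, 500), rng.normal(0.7, 0.08, 300)))

def ClassVariances( vX: np.ndarray, valThr: float ) -> tuple:
    # Returns the intra (within) and inter (between) class variances
    vM   = vX < valThr #<! Class 0 mask
    w0   = np.mean(vM)
    w1   = 1.0 - w0
    mu   = np.mean(vX)
    mu0  = np.mean(vX[vM])
    mu1  = np.mean(vX[~vM])
    varW = w0 * np.var(vX[vM]) + w1 * np.var(vX[~vM])
    varB = w0 * np.square(mu0 - mu) + w1 * np.square(mu1 - mu)
    return varW, varB

for valThr in (0.4, 0.5, 0.6):
    varW, varB = ClassVariances(vX, valThr)
    # The sum matches the total variance for any threshold
    print(f'T = {valThr:0.1f}: intra + inter = {varW + varB:0.6f}, total = {np.var(vX):0.6f}')
```

Since the sum is constant, any threshold which increases the inter class variance decreases the intra class variance by the same amount.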

Given a histogram of probabilities over $L$ levels, where $p \left( i \right)$ is the probability of the $i$-th value, and a threshold $T$, then:

$$ {w}_{0} = \sum_{i = 0}^{T - 1} p \left( i \right), \, {w}_{1} = \sum_{i = T}^{L - 1} p \left( i \right), \, {\mu}_{0} = \frac{1}{ {w}_{0} } \sum_{i = 0}^{T - 1} i p \left( i \right), \, {\mu}_{1} = \frac{1}{ {w}_{1} } \sum_{i = T}^{L - 1} i p \left( i \right), \, \mu = \sum_{i = 0}^{L - 1} i p \left( i \right) $$

The inter class variance can be derived as:

$$ {\sigma}_{b}^{2} = {w}_{0} {\left( {\mu}_{0} - \mu \right)}^{2} + {w}_{1} {\left( {\mu}_{1} - \mu \right)}^{2} = {w}_{0} {w}_{1} {\left( {\mu}_{1} - {\mu}_{0} \right)}^{2} $$

Since ${w}_{0} + {w}_{1} = 1$ and $\mu = {w}_{0} {\mu}_{0} + {w}_{1} {\mu}_{1}$, it can be written using the quantities of the first class only:

$$ {\sigma}_{b}^{2} = \frac{{w}_{0}}{1 - {w}_{0}} {\left( {\mu}_{0} - \mu \right)}^{2} = \frac{1}{ {w}_{0} \left( 1 - {w}_{0} \right) } {\left( \sum_{i = 0}^{T - 1} i p \left( i \right) - {w}_{0} \mu \right)}^{2} $$

* <font color='brown'>(**#**)</font> The motivation for formulating it as a function of a single class is to enable a simple vectorized calculation on 1D arrays (cumulative sums).
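
The equivalence between the single class form and the direct definition can be verified on a small example. This is a sketch assuming a hypothetical 6 level histogram `vP`; the helper `VarBDirect()` is an illustrative name:

```python
import numpy as np

# Hypothetical probability histogram over 6 levels (sums to 1)
vP = np.array([0.10, 0.25, 0.15, 0.05, 0.30, 0.15])
vI = np.arange(len(vP))
mu = np.sum(vI * vP) #<! Total mean

# Single class form, all thresholds at once.
# Entry k matches T = k + 1 (class 0 holds the levels 0, ..., k).
# The last entry is dropped since w0 = 1 leaves class 1 empty.
vW0   = np.cumsum(vP)[:-1]
vS0   = np.cumsum(vI * vP)[:-1]
vVarB = np.square(vS0 - vW0 * mu) / (vW0 * (1.0 - vW0))

def VarBDirect( T: int ) -> float:
    # Direct definition: w0 * w1 * (mu1 - mu0)^2
    w0  = np.sum(vP[:T])
    w1  = np.sum(vP[T:])
    mu0 = np.sum(vI[:T] * vP[:T]) / w0
    mu1 = np.sum(vI[T:] * vP[T:]) / w1
    return w0 * w1 * np.square(mu1 - mu0)

vVarBRef = np.array([VarBDirect(T) for T in range(1, len(vP))])

print(np.allclose(vVarB, vVarBRef))
```

The vectorized form needs two cumulative sums, while the direct form recomputes the class statistics for each threshold.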

This section implements the Otsu Method.

In [None]:
# Otsu Method

#===========================Fill This===========================#
# 1. Implement the Otsu Method as a function.
# 2. The input is a histogram `vH` which is calculated at the grid `vG`.
# !! Each value of `vG` represents the center of the bin of the histogram.
# !! Numba can be used.

def OtsuMethod( vH: np.ndarray, vG: np.ndarray ) -> float:    
    """
    Compute the Otsu threshold on a grid of points using a histogram vector.
    
    Parameters:
    -----------
    vH : np.ndarray
        The histogram vector of the image, either counts or probabilities.
        It is normalized internally to sum to 1.
    vG : np.ndarray
        The grid of bin centers, within the intensity range [0, 1], matching `vH`.
        
    Returns:
    --------
    valThr : float
        The optimal threshold value in the range [0, 1] that maximizes the inter class variance.

    Remarks:
    --------
     - The input histogram can be based on counts or probabilities.
     - The optimal threshold also minimizes the intra class variance.
    """
    
    # Normalize the histogram so that it represents probabilities
    vH = vH / np.sum(vH) #<! Does not mutate the input
    
    # Compute cumulative sums for probabilities (Weights) and means
    vCdf     = np.cumsum(vH) #<! CDF (w_0)
    vCumMean = np.cumsum(vH * vG) #<! μ_0 * w_0
    
    # Total mean
    μ = vCumMean[-1]
    
    # Numerator and denominator of the inter class variance for each threshold
    vNum = np.square(vCumMean - μ * vCdf) #<! Numerator
    vDen = vCdf * (1 - vCdf) #<! Denominator
    
    # Avoid division by zero
    vDen[vDen == 0] = 1e9 #<! Large number
    
    # Inter class variance for each threshold
    vσb = vNum / vDen
    
    # The optimal threshold is where the between class variance is maximized
    idxThr = np.argmax(vσb)
    valThr = vG[idxThr]

    return valThr

#===============================================================#

In [None]:
# Image Histogram

vH, vG = ski.exposure.histogram(mI, nbins = numBins, normalize = True)

In [None]:
# Calculate the Threshold

valThr = OtsuMethod(vH, vG)


In [None]:
# Display Result

PlotThrImage(mI, valThr)

* <font color='red'>(**?**)</font> Will a higher value of `numBins` improve the results, in the sense of the Otsu method objective?  
  Consider the type of the input image.

<!-- There is no point with more bins than the number of levels in the image. -->

* <font color='red'>(**?**)</font> Consider a very large image with float values. How could you use [`scipy.optimize.minimize_scalar`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize_scalar.html)?

<!-- Solve with crude binning and use the function for finetuning given a starting point. -->