<a href="https://colab.research.google.com/github/HamidrezaKmK/2times2048/blob/master/src/preprocessing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Preprocessing


Preprocessing is a standard step when working with panoramic images. In this
section, we highlight the importance of tuning the dynamic range of panoramic
images and propose a method for unifying the dynamic range of input images.

The following figure shows a side-by-side comparison of images from the
Dr. Shahab dataset and the Brazilian UFBA dataset.

![UFBA dataset vs Dr. Shahab dataset.](https://github.com/zlab-foss/dental/blob/master/figs/UFBAvsDrShahab.png?raw=1)

The following histogram compares the dynamic range of the two datasets.
500 random images are extracted from both datasets, and the distributions of
their pixel intensities are plotted against each other.

![Comparing the histograms.](https://github.com/zlab-foss/dental/blob/master/figs/HistogramComparison.png?raw=1)

As the histogram shows, the Dr. Shahab dataset spans a wider range of
pixel intensities, whereas the UFBA dataset is confined to a narrower range.
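
The comparison above can be reproduced in spirit without access to either dataset. The following is a minimal sketch on synthetic data: `wide_images` and `narrow_images` are hypothetical stand-ins (not the real Dr. Shahab or UFBA images), and `pooled_histogram` and `occupied_range` are helper names introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two datasets: one uses the full 8-bit range,
# the other is confined to a narrow band of intensities.
wide_images = [rng.integers(0, 256, size=(64, 64), dtype=np.uint8) for _ in range(5)]
narrow_images = [rng.integers(60, 181, size=(64, 64), dtype=np.uint8) for _ in range(5)]

def pooled_histogram(images):
    """Return the normalized 256-bin intensity histogram pooled over all images."""
    counts = np.zeros(256, dtype=np.int64)
    for img in images:
        # bincount expects a 1-D array, so flatten each image first
        counts += np.bincount(img.ravel(), minlength=256)
    return counts / counts.sum()

def occupied_range(hist):
    """Return the lowest and highest intensity that actually occur."""
    nonzero = np.flatnonzero(hist)
    return int(nonzero[0]), int(nonzero[-1])

hist_wide = pooled_histogram(wide_images)
hist_narrow = pooled_histogram(narrow_images)
print("wide range:  ", occupied_range(hist_wide))
print("narrow range:", occupied_range(hist_narrow))
```

Pooling the counts over all sampled images before normalizing gives one distribution per dataset, which is what a dataset-level histogram comparison needs.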

### Manual image tuning

In this section, we tune images from the Dr. Shahab dataset using the
following parameters in GIMP:
- Exposure
- Brightness
- Contrast
- Gamma
- Pixel intensity mapping

The left image is an overexposed image from the Dr. Shahab dataset
(drsh_2021_03_381) in which the root edges of teeth no. 41 and 47 are either
overexposed or faded. After changing the gamma factor to 0.37 (right image),
the edges are better defined, and the root of tooth 41 can be distinguished
from its gingival surroundings.

<p float="left">
<img src="https://github.com/zlab-foss/dental/blob/master/figs/over_exposed_drsh_2021_03_381.jpeg?raw=1" width="500" />
<img src="https://github.com/zlab-foss/dental/blob/master/figs/over_exposed_gamma_manipulated_drsh_2021_03_381.jpeg?raw=1" width="500" />
</p>
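
The gamma adjustment can be approximated in code with a lookup table. The sketch below assumes the convention `out = 255 * (in / 255) ** (1 / gamma)`, under which a gamma below 1 darkens the mid-tones; the text does not state which convention the GIMP control uses, so treat this as an assumption. `apply_gamma` is a hypothetical helper, and the input is a synthetic strip rather than the actual radiograph.

```python
import numpy as np

def apply_gamma(img, gamma):
    """Apply gamma correction to an 8-bit grayscale image via a lookup table.

    Assumed convention: out = 255 * (in / 255) ** (1 / gamma),
    so gamma < 1 darkens the mid-tones and gamma > 1 brightens them.
    """
    levels = np.arange(256) / 255.0
    lut = np.round(255.0 * levels ** (1.0 / gamma)).astype(np.uint8)
    # Indexing the table with the image maps every pixel at once
    return lut[img]

# Synthetic "overexposed" strip: most pixels sit near the top of the range.
overexposed = np.array([[180, 200, 220, 240, 255]], dtype=np.uint8)
corrected = apply_gamma(overexposed, gamma=0.37)
print(corrected)
```

Under this convention the bright levels are spread over a wider range after correction, which is one plausible reason edges in overexposed regions become easier to see.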

The following photos show the effect of changing brightness, contrast, and
exposure on a single image. The leftmost image is a sample from the Dr. Shahab
dataset; the second is produced by increasing the brightness to 50%, the third
by increasing the contrast by 35%, and the last by reducing the exposure by
0.42. All of these changes were made manually in GIMP. Even such subtle
changes can greatly affect how we perceive the image, and models may therefore
be able to use such adjustments to their advantage.

<p float="left">
<img src="https://github.com/zlab-foss/dental/blob/master/figs/test_drsh_2021_03_385.jpeg?raw=1" width="230" />
<img src="https://github.com/zlab-foss/dental/blob/master/figs/test_brightness_50p_drsh_2021_03_385.jpeg?raw=1" width="230" />
<img src="https://github.com/zlab-foss/dental/blob/master/figs/test_contrast35_drsh_2021_03_385.jpeg?raw=1" width="230" />
<img src="https://github.com/zlab-foss/dental/blob/master/figs/test_exposure_042_drsh_2021_03_385.jpeg?raw=1" width="230" />
</p>
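
For readers without GIMP, the three adjustments can be imitated with simple point operations. This is only a sketch: the formulas below (additive offset, scaling about mid-gray, multiplication by a power of two) are common textbook approximations and are not claimed to match GIMP's internal formulas, and the parameter values 50, 1.35, and -0.42 are assumed translations of the settings mentioned above. All function names are hypothetical.

```python
import numpy as np

def adjust_brightness(img, offset):
    """Add a constant offset (in intensity levels) to every pixel."""
    return np.clip(img.astype(np.float64) + offset, 0, 255).astype(np.uint8)

def adjust_contrast(img, factor):
    """Scale intensities about mid-gray (127.5); factor > 1 increases contrast."""
    out = (img.astype(np.float64) - 127.5) * factor + 127.5
    return np.clip(np.round(out), 0, 255).astype(np.uint8)

def adjust_exposure(img, stops):
    """Multiply intensities by 2**stops; negative stops darken the image."""
    out = img.astype(np.float64) * (2.0 ** stops)
    return np.clip(np.round(out), 0, 255).astype(np.uint8)

# Synthetic gradient used in place of a real radiograph
sample = np.array([[0, 64, 128, 192, 255]], dtype=np.uint8)
brighter = adjust_brightness(sample, 50)
more_contrast = adjust_contrast(sample, 1.35)
darker = adjust_exposure(sample, -0.42)
print(brighter, more_contrast, darker, sep="\n")
```

All three are monotone maps on intensities, so each can also be expressed as a 256-entry lookup table.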

We can also map each pixel intensity directly to another one. The following
figure does so using a pixel mapping curve:

<p float="left">
<img src="https://github.com/zlab-foss/dental/blob/master/figs/test_drsh_2021_03_385.jpeg?raw=1" width="500"/>
<img src="https://github.com/zlab-foss/dental/blob/master/figs/change_curve.png?raw=1" width="500"/>
</p>
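
A mapping curve of this kind can be expressed as a 256-entry lookup table. The sketch below builds the table by piecewise-linear interpolation between a few control points; the control points are hypothetical (they are not read off the figure), `curve_to_lut` is a name introduced here, and a curves tool may well use a smooth spline instead of straight segments.

```python
import numpy as np

def curve_to_lut(control_in, control_out):
    """Build a 256-entry lookup table by linear interpolation between control points."""
    levels = np.arange(256)
    lut = np.interp(levels, control_in, control_out)
    return np.clip(np.round(lut), 0, 255).astype(np.uint8)

# Hypothetical S-shaped curve: darken the shadows, brighten the highlights
control_in = [0, 64, 192, 255]
control_out = [0, 32, 224, 255]
lut = curve_to_lut(control_in, control_out)

# Apply the curve to a synthetic gradient by indexing the table with the image
img = np.array([[0, 64, 128, 192, 255]], dtype=np.uint8)
mapped = lut[img]
print(mapped)
```

Because the table is indexed by the image itself, applying the curve costs one array lookup regardless of how the curve was designed.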

## Unifying dynamic ranges

We aim to obtain a unified image space, which we denote by $\Gamma$.
Assume that a model is trained to perform semantic segmentation and
analysis on images in $\Gamma$; for example, $\Gamma$ could be the set of
images in the UFBA dataset. Images from other datasets may have different
dynamic ranges, and the model might fail at the semantic segmentation task
on images coming from a different device. We therefore aim to create a
model $M$ that receives an image from an arbitrary device and maps it to
$\Gamma$.

### Histogram equalization
In the remainder, we explore different ways of addressing this problem.
A simple way is to map the pixel intensities of one dataset onto those of
another, a technique based on **histogram equalization** (when the target is
another dataset's distribution, it is usually called histogram matching or
histogram specification). Assume we want a mapping from images in $\Gamma'$
to $\Gamma$. We approximate the cumulative distribution function (CDF)
of pixel intensities, ranging from 0 to 255, for each of $\Gamma$
and $\Gamma'$, and refer to these CDFs as $F$ and $F'$, respectively.
An intensity $x$ from $\Gamma'$ is then mapped to $F^{-1} \circ F'(x)$, that
is, to the intensity in $\Gamma$ that sits at the same quantile.
With this simple method we may obtain images similar to those in $\Gamma$
from $\Gamma'$.
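
Since the cells that follow need access to the private image folders, here is a self-contained sketch of the same idea on synthetic data. It maps each source level to a target level whose target CDF value reaches the source CDF value at that level. The `source` and `target` images are hypothetical stand-ins, `intensity_cdf` and `matching_lut` are names introduced here, and `np.searchsorted` picks the smallest qualifying target level, which is a simplification of the nearest-value rule used later in this notebook.

```python
import numpy as np

def intensity_cdf(images):
    """Return the empirical CDF (length 256) of intensities pooled over all images."""
    counts = np.zeros(256, dtype=np.float64)
    for img in images:
        counts += np.bincount(img.ravel(), minlength=256)
    return np.cumsum(counts) / counts.sum()

def matching_lut(cdf_source, cdf_target):
    """Map each source level to the smallest target level whose CDF reaches it."""
    lut = np.searchsorted(cdf_target, cdf_source, side="left")
    return np.clip(lut, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
# Source images use the full range; target images live in the band 80..160
source = [rng.integers(0, 256, size=(128, 128), dtype=np.uint8) for _ in range(4)]
target = [rng.integers(80, 161, size=(128, 128), dtype=np.uint8) for _ in range(4)]

lut = matching_lut(intensity_cdf(source), intensity_cdf(target))
matched = lut[source[0]]
print(matched.min(), matched.max())
```

After the mapping, the source image only contains intensities that occur in the target set, which is the sense in which its dynamic range has been unified.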

In [9]:
### YOU SHOULD HAVE ACCESS TO THE DATABASE TO RUN THE FOLLOWING CODES ###

# Mount on drive for google colab use
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [6]:
!ls

drive  sample_data


The following code randomly reads up to 200 samples from the Dr. Shahab dataset and up to 200 samples from the UFBA dataset, then displays one sample from each.

In [10]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
import math
import os
import random

# Change to a directory containing Dr.Shahab images:
path_to_dr_shahab = '/content/drive/MyDrive/WORK/Dr Shahab/2021_01/' 

shahab_images = os.listdir(path_to_dr_shahab)
# Change to a directory containing the UFBA images
path_to_brazil = '/content/drive/MyDrive/WORK/UFBA/all_images/'

brazil_images = os.listdir(path_to_brazil)

samples_shahab = []
samples_brazil = []
sample_size_shahab = min(len(shahab_images), 200)
sample_size_brazil = min(len(brazil_images), 200)

for x in random.sample(shahab_images, sample_size_shahab):
  samples_shahab.append(cv2.imread(path_to_dr_shahab + x, cv2.IMREAD_GRAYSCALE))
for x in random.sample(brazil_images, sample_size_brazil):
  samples_brazil.append(cv2.imread(path_to_brazil + x, cv2.IMREAD_GRAYSCALE))

plt.imshow(samples_shahab[0], cmap='gray')
plt.show()
plt.imshow(samples_brazil[0], cmap='gray')
plt.show()


FileNotFoundError: ignored

Run the following code to compute the CDF of pixel intensities for both random samples.

In [None]:
from tqdm import tqdm

def get_cdf(list_of_images):
  """
  Takes a list of grayscale uint8 images and returns the CDF of their
  pooled pixel-intensity distribution as an array of length 256.
  """
  hist = np.zeros(256, dtype='float64')
  for img in tqdm(list_of_images):
    # bincount needs a 1-D array; minlength guarantees one bin per intensity
    hist += np.bincount(img.ravel(), minlength=256)
  hist /= hist.sum()
  return np.cumsum(hist)

In [None]:
cdf_brazil = get_cdf(samples_brazil)
cdf_shahab = get_cdf(samples_shahab)

Next, we compute the intensity mapping between the two datasets using the method described above.

In [None]:
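# shahab_to_brazil[i] is the UFBA intensity whose CDF value is closest to
# the CDF value of intensity i in the Dr. Shahab samples.
shahab_to_brazil = np.zeros(256, dtype=np.uint8)
pnt = 0  # two-pointer scan: both CDFs are non-decreasing
for i in range(256):
  # advance to the first UFBA level whose CDF reaches cdf_shahab[i]
  while pnt < 256 and cdf_brazil[pnt] < cdf_shahab[i]:
    pnt += 1
  # choose the closer of pnt - 1 and pnt, staying within 0..255
  if pnt == 256 or (pnt > 0 and abs(cdf_brazil[pnt-1] - cdf_shahab[i]) <
                    abs(cdf_brazil[pnt] - cdf_shahab[i])):
    shahab_to_brazil[i] = pnt - 1
  else:
    shahab_to_brazil[i] = pnt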

Finally, we display an image from the Dr. Shahab dataset and a random image from the UFBA dataset; we then map each pixel of the first image to obtain a dynamic range similar to that of UFBA.

In [None]:
img = samples_shahab[0]
plt.imshow(img, cmap='gray')
plt.show()
# pick one UFBA image at random as a visual reference
img_standard = random.choice(samples_brazil)
plt.imshow(img_standard, cmap='gray')
plt.show()

def convert(mapping, img_input):
  """Applies a 256-entry intensity mapping to every pixel of img_input."""
  # indexing the lookup table with the image maps all pixels at once
  return np.asarray(mapping)[img_input].astype(img_input.dtype)

img_standardized = convert(shahab_to_brazil, img)
plt.imshow(img_standardized, cmap='gray')
plt.show()