# Extract more information from images 🖼️

## TL;DR: Remap the colors of the images in a smart way (see [🐳&🐬 - 🪄 Enhanced dataset [Ensemble of tricks]](https://www.kaggle.com/wolfy73/enhanced-dataset-ensemble-of-tricks) to add it to your pipeline)
It is now well established that much of the work in this competition lies in the data processing side of the pipeline. Models are known to work better on higher-resolution images (in my opinion because many of the features useful for identification are small relative to the image). Numerous methods have also been devised to crop the images and maximize the effective resolution available for the task.<br>

However, I believe this is still not enough and that we can do better.

In [None]:
import numpy as np
import pandas as pd
import os
import cv2
import matplotlib.pyplot as plt

In [None]:
BASE_PATH = "../input/happywhale-enhanced-dataset-v1"
data = pd.read_csv(os.path.join(BASE_PATH, "train.csv"))

# The initial idea 💡
The idea for this notebook first came to me when I realized that some details on the individuals are only noticeable in some pictures. I hypothesized that we could enhance these details by changing the contrast of the images.

In [None]:
image = cv2.imread(os.path.join(BASE_PATH, "train_images", data["image"].iloc[0]))[:, :, ::-1]
plt.figure(figsize=(2*3, 1*3))

plt.subplot(1, 2, 1)
plt.imshow(image)
plt.title("Original image")
plt.axis("off")

alpha = 3
beta = -50
image = np.clip(alpha*image.astype(np.float32) + beta, 0, 255).astype(np.uint8)
plt.subplot(1, 2, 2)
plt.imshow(image)
plt.title("More contrast, less brightness")
plt.axis("off")
plt.tight_layout()
plt.savefig('contrast.png')
plt.show()

Now we start noticing some more details:
<p style="text-align:center">
<img style="display:inline-block" src="https://i.imgur.com/KBgLGxJ.png"/>
</p>
<br>
Great! So now we just have to increase the contrast on all images of the dataset and we are done... Right?

In [None]:
def plot_images_contrast(batch, row=2, col=2, alpha=3, beta=-50, base_path=os.path.join(BASE_PATH, "train_images")):
    """
        Copied and adapted from https://www.kaggle.com/awsaf49/happywhale-data-distribution
    """
    plt.figure(figsize=(col*3, row*3))
    for i in range(row*col):
        plt.subplot(row, col, i+1)
        img = cv2.imread(os.path.join(base_path, batch["image"].iloc[i]))
        # cv2.imread returns None for unreadable files, so check before using img.shape
        if img is None:
            continue
        max_size = 400
        if img.shape[0] > max_size or img.shape[1] > max_size:
            factor = max(img.shape[0] / max_size, img.shape[1] / max_size)
            img = cv2.resize(img, (int(round(img.shape[1] / factor)), int(round(img.shape[0] / factor))))
        img = img[:, :, ::-1]
        img = np.clip(alpha*img.astype(np.float32) + beta, 0, 255).astype(np.uint8)
        plt.imshow(img)
        if "species" in batch:
            plt.title(batch["species"].iloc[i])
        plt.axis('off')
    plt.tight_layout()
    plt.show()

In [None]:
plot_images_contrast(data, row=4, col=4)

Well... Not really...<br>
To understand what is going on, we have to take a step back.

# A bit of theory 📘
To grasp the principle behind contrast, we must look at the pixel histogram of an image. Let's show an example in greyscale for simplicity:

In [None]:
image = cv2.imread(os.path.join(BASE_PATH, "train_images", data["image"].iloc[1]), cv2.IMREAD_GRAYSCALE)
plt.imshow(np.repeat(image, 3).reshape((*image.shape, 3)))
plt.title("Original image")
plt.axis("off")
plt.show()
plt.title("Histogram of the pixels of the image")
plt.hist(image.ravel(), bins=256)
plt.show()

This histogram shows the number of pixels having a given value. For this particular histogram, we see two sorts of "spikes":
1. One around ~50 intensity, which would correspond to the pixels of the whale
2. A second one between ~150 and ~200 which would correspond to the rest of the image (i.e. the water)

Let's see how the histogram evolves when we increase the contrast:

In [None]:
alpha = 3.0
beta = 0.0
image_more_contrast = np.clip(alpha*image.astype(np.float32) + beta, 0, 255).astype(np.uint8)
plt.imshow(np.repeat(image_more_contrast, 3).reshape((*image_more_contrast.shape, 3)))
plt.title("Contrast increased")
plt.axis("off")
plt.show()
plt.title("Histogram of the pixels of the image")
# I removed the fully white pixels otherwise we couldn't see the rest of the distribution
plt.hist(image_more_contrast[image_more_contrast < 255].ravel(), bins=256)
plt.show()

We see that the image now consists mainly of white pixels (I removed them from the histogram for clarity). Looking at the histogram, the "whale" spike now covers almost the whole range of pixel values, while the "water" spike has been turned into white pixels.<br>

So if we see more details, it is because the pixels of the whale are now spread across a wider range of intensities than before.<br>

## Intermediate conclusion
*To extract more information from the images, we have to spread the pixels of the individual across a wider range of intensities.*<br>

But as we can see, we can't apply the same transformation to every image, because on some of them the pixels we are interested in will be sent into the white region, where nothing is distinguishable at all...<br>
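
To make this concrete, here is a minimal sketch on synthetic data. The two arrays and the helpers `adjust_contrast` and `saturated_fraction` are made up for illustration and do not come from the dataset. It applies the same `alpha` and `beta` to a dark subject and a bright subject, then measures how many pixels end up clipped at 255:

```python
import numpy as np

def adjust_contrast(image, alpha, beta):
    """Linear contrast/brightness change followed by clipping to [0, 255]."""
    return np.clip(alpha * image.astype(np.float32) + beta, 0, 255).astype(np.uint8)

def saturated_fraction(image):
    """Fraction of pixels that ended up fully white (value 255)."""
    return float(np.mean(image == 255))

rng = np.random.default_rng(0)
# Hypothetical subjects: one dark (values 30..69), one bright (values 150..199)
dark_subject = rng.integers(30, 70, size=(64, 64), dtype=np.uint8)
bright_subject = rng.integers(150, 200, size=(64, 64), dtype=np.uint8)

for name, img in [("dark", dark_subject), ("bright", bright_subject)]:
    out = adjust_contrast(img, alpha=3.0, beta=0.0)
    print(name, "range:", out.min(), "-", out.max(), "saturated:", saturated_fraction(out))
```

With `alpha = 3`, the dark subject is stretched over a wider range, while every pixel of the bright subject is at least 450 before clipping and therefore becomes white.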

# What's the plan then? 🗺️
Now we have several possibilities, some more complicated than others:
1. If we have access to a pixel segmentation model, we could build a histogram of only the pixels we are interested in and transform the image such that those pixels cover the full range of values
2. Maximize the information in the entire image. This would be less effective, but at least it does not require anything more than a few lines of code
3. A bit of both: if, for example, we have a way to produce bounding boxes for the dataset, we can crop the images such that *most* of the pixels belong to the individual we want to identify, and then apply a global information-maximizing method

As some effort has been made to create proper bounding box models for this competition, we can try the 3rd option, which is probably the best balance. But if you, dear reader, by any chance come across a segmentation model that works on this data, you will probably get much better results 😉
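
As a rough sketch of options 1 and 3, the mapping can be computed from the pixels of interest only and then applied to the whole image. Everything here is an assumption made for illustration: the function names `remap_with_mask` and `bbox_to_mask`, the `(x0, y0, x1, y1)` box format, and the synthetic image. The mask would come from a segmentation model (option 1) or from a bounding box (option 3):

```python
import numpy as np

def remap_with_mask(image, mask):
    """Remap a greyscale uint8 image using the CDF of the masked pixels only.

    `mask` is a boolean array with the same shape as `image`; True marks the
    pixels of interest.
    """
    hist = np.bincount(image[mask].ravel(), minlength=256)
    cdf = np.cumsum(hist) / max(hist.sum(), 1)
    lut = np.round(255 * cdf).astype(np.uint8)  # one output value per input value
    return lut[image]

def bbox_to_mask(shape, bbox):
    """Turn a hypothetical (x0, y0, x1, y1) box into a boolean mask."""
    x0, y0, x1, y1 = bbox
    mask = np.zeros(shape, dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask

rng = np.random.default_rng(0)
# Synthetic scene: bright "water" with a dark "individual" in the middle
img = rng.integers(180, 200, size=(64, 64), dtype=np.uint8)
img[16:48, 16:48] = rng.integers(40, 60, size=(32, 32), dtype=np.uint8)

mask = bbox_to_mask(img.shape, (16, 16, 48, 48))
out = remap_with_mask(img, mask)
print("subject range before:", img[mask].min(), "-", img[mask].max())
print("subject range after: ", out[mask].min(), "-", out[mask].max())
```

In this toy case the subject is stretched over most of the available range, and the water, which is brighter than every subject pixel, is sent to white. That is the intended trade-off: the range is spent on the individual, not on the background.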

# Maximizing information in an image 👨‍🔬
For those who know a little bit about information theory: we want to remap the pixel values such that their distribution maximizes its entropy (i.e. follows a uniform distribution).<br>

For those who don't know about information theory:<br>
Remember the histogram I mentioned earlier? We want it to look like this:

In [None]:
y = np.ones(256) / 256
plt.title("Uniform distribution")
plt.plot(y)
plt.fill_between(np.arange(256), 0, y)
plt.show()
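
To connect the two explanations, here is a small sketch that measures the Shannon entropy of a pixel histogram. The helper `histogram_entropy` and both test images are hypothetical and built only for this illustration. A perfectly flat histogram over 256 levels reaches the maximum of 8 bits, while an image using only a narrow band of values carries less:

```python
import numpy as np

def histogram_entropy(image):
    """Shannon entropy (in bits) of the pixel-value histogram of a uint8 image."""
    counts = np.bincount(image.ravel(), minlength=256)
    probs = counts[counts > 0] / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))

# Every value 0..255 appears exactly 4 times: a perfectly uniform histogram
uniform_image = np.repeat(np.arange(256, dtype=np.uint8), 4).reshape(32, 32)
# Only 16 distinct values (100..115), 64 pixels each: a narrow histogram
narrow_image = np.repeat(np.arange(100, 116, dtype=np.uint8), 64).reshape(32, 32)

print(histogram_entropy(uniform_image))  # 8 bits, the maximum for 256 levels
print(histogram_entropy(narrow_image))   # 4 bits
```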

## How?
### A bit of probability 📈
* Let $p$ be the value of a pixel in an image and $p^{\prime}$ be the value of the same pixel in the transformed image
* Let $X$ be a random variable representing the intensity of a random pixel within the image
* Let $\mathcal{P}$ be the set of pixels in the original image and $n$ the number of pixels it contains

The transformation we are looking for is:
$$
\begin{aligned}
    p^{\prime} &= 255 \times \mathbb{P}\left(X \leq p\right) \\
               &= 255 \times \frac{1}{n}\sum_{x \in \mathcal{P}}\mathbb{1}_{x \leq p}
\end{aligned}
$$

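We can prove that this formula has the property I mentioned. Write $F_X(p) = \mathbb{P}\left(X \leq p\right) = \frac{1}{n}\sum_{x \in \mathcal{P}}\mathbb{1}_{x \leq p}$ for the cumulative distribution function of the original intensities, so that the transformed intensity of a random pixel is $p^{\prime} = 255 \times F_X(X)$. Then, for $0 \leq k \leq 255$:
$$
\begin{aligned}
    \mathbb{P}\left(p^{\prime} \leq k\right) &= \mathbb{P}\left[255 \times \frac{1}{n}\sum_{x \in \mathcal{P}}\mathbb{1}_{x \leq X} \leq k\right] \\
    &= \mathbb{P}\left[\frac{1}{n}\sum_{x \in \mathcal{P}}\mathbb{1}_{x \leq X} \leq \frac{k}{255}\right]
\end{aligned}
$$
Note that $\frac{1}{n}\sum_{x \in \mathcal{P}}\mathbb{1}_{x \leq X} = F_X(X) \sim \mathcal{U}(0, 1)$, provided the pixel values are (approximately) continuous, i.e. there are few ties. This is the probability integral transform: $\mathbb{P}\left(F_X(X) \leq u\right) = \mathbb{P}\left(X \leq F_X^{-1}(u)\right) = F_X\left(F_X^{-1}(u)\right) = u$. This gives us:
$$
    \mathbb{P}\left(p^{\prime} \leq k\right) = \mathbb{P}\left[\frac{1}{n}\sum_{x \in \mathcal{P}}\mathbb{1}_{x \leq X} \leq \frac{k}{255}\right] = \frac{k}{255}
$$
Thus, if we denote by $F(x)$ the cumulative distribution function of $p^{\prime}$ and by $f(x)$ its density, we have:
$$
\begin{aligned}
    F(x) &= \frac{x}{255} \\
    f(x) &= \frac{dF}{dx} = \frac{1}{255}
\end{aligned}
$$
By identification, we have shown that:
$$
    p^{\prime} \sim \mathcal{U}(0, 255)
$$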
We can now derive the following algorithm:

In [None]:
def remap_channel(image):
    # Add noise in [-0.5, 0.5) to break ties, then argsort to rank the pixels while keeping spatial information
    ids_sorted = np.argsort((image + np.random.random(image.shape) - 0.5).ravel())
    # Evenly spaced shades of grey (0..255), one per pixel
    values = np.floor(np.linspace(0.0, 256.0, num=len(ids_sorted), endpoint=False)).astype(np.uint8)
    # Write the shades into a new array, in the rank order of the original pixels,
    # so that the result looks like the original image and the input is not modified
    remapped = np.empty(len(ids_sorted), dtype=np.uint8)
    remapped[ids_sorted] = values
    return remapped.reshape(image.shape)

def remap_colors(image):
    """
        The remapping is equivalent to creating an image with n shades of grey and moving the pixels in it such that it looks like the original image
    """
    if len(image.shape) == 2:
        return remap_channel(image)
    # Work on a copy so the caller's array is left untouched
    image = image.copy()
    for channel in range(3):
        image[:, :, channel] = remap_channel(image[:, :, channel])
    return image
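
For comparison, the formula $p^{\prime} = 255 \times \mathbb{P}\left(X \leq p\right)$ can also be applied directly through a 256-entry lookup table, which is the classic histogram equalization. This is a sketch of that alternative and not the method used in this notebook; `remap_channel_cdf` is a name I made up. It is deterministic and needs no noise, but identical input values stay identical, so the output histogram is only approximately flat:

```python
import numpy as np

def remap_channel_cdf(channel):
    """Apply p' = 255 * P(X <= p) to a non-empty uint8 channel via a lookup table."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = np.cumsum(hist) / channel.size          # empirical P(X <= p) for p = 0..255
    lut = np.round(255 * cdf).astype(np.uint8)    # transformed value for each input value
    return lut[channel]

rng = np.random.default_rng(0)
# Synthetic low-contrast channel using only the values 40..89
channel = rng.integers(40, 90, size=(128, 128), dtype=np.uint8)
remapped = remap_channel_cdf(channel)
print(channel.min(), channel.max(), "->", remapped.min(), remapped.max())
```

The argsort version gives a flatter histogram because the noise breaks ties, at the cost of being random and of sorting every pixel.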

🚨 I am not sure yet how the same principle should be applied to color images. My intuition was to perform the mapping on each color channel independently, but there is no theoretical reason for this, nor evidence that it is the best way to handle them...
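
One alternative that could be compared against the per-channel mapping is to remap only the brightness and keep the ratios between R, G and B. The sketch below is an assumption on my side, not something validated on this competition: `remap_luminance` is a hypothetical helper, it uses the Rec. 601 luma weights, and it relies on a lookup-table remapping where the notebook uses the argsort version:

```python
import numpy as np

def remap_luminance(image_rgb):
    """Equalize the luma of an RGB uint8 image and rescale its channels accordingly."""
    rgb = image_rgb.astype(np.float32)
    # Rec. 601 luma weights (an assumption, other brightness definitions would work too)
    luma = rgb @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    luma_u8 = np.clip(np.round(luma), 0, 255).astype(np.uint8)
    hist = np.bincount(luma_u8.ravel(), minlength=256)
    cdf = np.cumsum(hist) / luma_u8.size
    new_luma = 255.0 * cdf[luma_u8]
    # Same gain on R, G and B for each pixel, so the channel ratios are kept
    # (except where a channel gets clipped at 255)
    gain = new_luma / np.maximum(luma, 1.0)
    return np.clip(np.round(rgb * gain[..., None]), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
# Synthetic low-contrast color image
img = rng.integers(60, 120, size=(16, 16, 3), dtype=np.uint8)
out = remap_luminance(img)
print(img.min(), img.max(), "->", out.min(), out.max())
```

Whether this works better than the per-channel mapping for identification is an open question that only experiments can answer.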

## Test this on real images 🖼️
Let's see another example image:

In [None]:
image = cv2.imread(os.path.join(BASE_PATH, "train_images", data["image"].iloc[16]))[:, :, ::-1]
plt.imshow(image)
plt.title("Original image")
plt.axis("off")
plt.show()
plt.title("Histogram of the pixels of the original image")
plt.hist(image.ravel(), bins=256)
plt.show()
image = remap_colors(image)
plt.title("Transformed image")
plt.imshow(image)
plt.axis("off")
plt.show()
plt.title("Histogram of the pixels of the transformed image")
plt.hist(image.ravel(), bins=256)
plt.show()

Amazing! We have successfully remapped the colors to get a uniform histogram, and the result looks good visually.<br>
Let's try to apply it to more images now:

In [None]:
def plot_images_color_remap(batch, row=2, col=2, base_path=os.path.join(BASE_PATH, "train_images")):
    """
        Copied and adapted from https://www.kaggle.com/awsaf49/happywhale-data-distribution
    """
    plt.figure(figsize=(col*3, row*3))
    for i in range(row*col):
        plt.subplot(row, col, i+1)
        img = cv2.imread(os.path.join(base_path, batch["image"].iloc[i]))
        # cv2.imread returns None for unreadable files, so check before using img.shape
        if img is None:
            continue
        max_size = 400
        if img.shape[0] > max_size or img.shape[1] > max_size:
            factor = max(img.shape[0] / max_size, img.shape[1] / max_size)
            img = cv2.resize(img, (int(round(img.shape[1] / factor)), int(round(img.shape[0] / factor))))
        img = img[:, :, ::-1]
        img = remap_colors(img)
        plt.imshow(img)
        if "species" in batch:
            plt.title(batch["species"].iloc[i])
        plt.axis('off')
    plt.tight_layout()
    plt.show()

In [None]:
plot_images_color_remap(data, row=4, col=4)

I consider it good enough to give it a try in my pipeline now!

# Limitations 🚨
I would like, though, to point out a number of limitations:
1. The less water in the image, the better. Ideally, we would only take into account the pixels of the individual when computing the remapping, which is why I talked about using a segmentation model earlier
2. Some information is lost, such as the overall colors of the image, so we could try pipelines that incorporate both the initial image and the remapped one
3. To work, this implementation needs the pixel values within the original image to be unique. To achieve this, I add random noise, but it could be better to resize the image with an interpolation method such as cubic interpolation, without re-thresholding the result back to integers. I am currently not aware of an efficient Python implementation of a resizing algorithm that outputs pixel values as floating-point numbers...
4. How to deal properly with color images is still unresolved. I will probably settle this question through experiments
5. If you are using this trick with a pretrained model, keep in mind that such models expect "natural" images, and that the transformed images do not come from the distribution of "natural" images, so I'm not sure what will happen
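
As an illustration of limitation 2, one way to keep both versions is to stack them along the channel axis. This is only a sketch under assumptions: the helpers `equalize_channel` and `stack_original_and_remapped` are hypothetical, a lookup-table remapping stands in for `remap_colors`, and the model's first layer would have to accept 6 input channels:

```python
import numpy as np

def equalize_channel(channel):
    """Lookup-table remapping of a non-empty uint8 channel (stand-in for remap_channel)."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = np.cumsum(hist) / channel.size
    return np.round(255 * cdf).astype(np.uint8)[channel]

def stack_original_and_remapped(image_rgb):
    """Return an (H, W, 6) array: the 3 original channels followed by the 3 remapped ones."""
    remapped = np.stack([equalize_channel(image_rgb[..., c]) for c in range(3)], axis=-1)
    return np.concatenate([image_rgb, remapped], axis=-1)

rng = np.random.default_rng(0)
img = rng.integers(60, 120, size=(32, 32, 3), dtype=np.uint8)
stacked = stack_original_and_remapped(img)
print(stacked.shape, stacked.dtype)
```

The original colors stay available in the first three channels, and the enhanced details are in the last three.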

# Recolor the dataset 🎨
Now you may want to incorporate this trick into your pipeline. The complete code can be found in this notebook: [🐳&🐬 - 🪄 Enhanced dataset [Ensemble of tricks]](https://www.kaggle.com/wolfy73/enhanced-dataset-ensemble-of-tricks)

# Conclusion 🤷
We found a method that enhances the contrast of images and maximizes the information they carry, and that you can integrate into your pipeline. Some limitations and open questions remain, and I will try to improve the method in future versions.

👍 If you found this notebook helpful or learned something, please consider giving it an upvote, and if you disagree with the content, I'll be pleased to discuss it with you in the comments.

😊 Happy Kaggling everyone !

![](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Thats_all_folks.svg/2560px-Thats_all_folks.svg.png)