From https://www.kaggle.com/apacheco/shades-of-gray-color-constancy

# Introduction

The paper [Improving dermoscopy image classification using color constancy](https://ieeexplore.ieee.org/abstract/document/6866131/) shows that applying a color compensation technique, which reduces the influence of the acquisition setup on the color features extracted from the images, improves skin cancer classification performance.

In the ISIC 2019 challenge, the top three approaches in both tasks [[1]](https://isic-challenge-stade.s3.amazonaws.com/99bdfa5c-4b6b-4c3c-94c0-f614e6a05bc4/method_description.pdf?AWSAccessKeyId=AKIA2FPBP3II4S6KTWEU&Signature=3myZOh3ZfEdZ5UFO8Z1DGmelRrk%3D&Expires=1593068545) [[2]](https://isic-challenge-stade.s3.amazonaws.com/9e2e7c9c-480c-48dc-a452-c1dd577cc2b2/ISIC2019-paper-0816.pdf?AWSAccessKeyId=AKIA2FPBP3II4S6KTWEU&Signature=Up3vDSfqGwmf%2FS6nKDOlNSmKZug%3D&Expires=1593068545) [[3]](https://isic-challenge-stade.s3.amazonaws.com/f6d46ceb-bf66-42ff-8b22-49562aefd4b8/ISIC_2019.pdf?AWSAccessKeyId=AKIA2FPBP3II4S6KTWEU&Signature=3XwGMDlkwcusfCwZ1Nk%2Fw5IFwUY%3D&Expires=1593068545) applied the Shades of Gray algorithm [[4]](https://pdfs.semanticscholar.org/acf3/6cdadfec869f136602ea41cad8b07e3f8ddb.pdf) as their color constancy method to improve their performance.
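For reference, the Shades of Gray correction can be written as below. The notation is ours and was chosen to mirror the `shade_of_gray_cc` function further down:

- $I_c(x)$ is the value of channel $c$ at pixel $x$.
- $N$ is the number of pixels.
- $p$ is the `power` argument.

The illuminant estimate is normalized to unit length, and each channel is then divided by $\sqrt{3}\,\hat{e}_c$:

$$
e_c = \left( \frac{1}{N} \sum_{x} I_c(x)^{p} \right)^{1/p}, \qquad
\hat{e}_c = \frac{e_c}{\sqrt{e_R^2 + e_G^2 + e_B^2}}, \qquad
I'_c(x) = \frac{I_c(x)}{\sqrt{3}\,\hat{e}_c}
$$

Under a neutral illuminant ($e_R = e_G = e_B$), every $\hat{e}_c$ equals $1/\sqrt{3}$. All gains are then 1, and the image is left unchanged.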

The goal of this notebook is to apply this algorithm to the current dataset and to raise some discussion about the method.

## Import Libraries

In [1]:
import cv2
import numpy as np
from glob import glob
from tqdm import tqdm
import matplotlib.pyplot as plt
import os

The function below was originally designed by [LincolnZjx](https://github.com/LincolnZjx/ISIC_2018_Classification) for the ISIC 2018 challenge.

Edit: As [Andrew Anikin](https://www.kaggle.com/andrewanikin) pointed out in the comments, we should include `img = np.clip(img, a_min=0, a_max=255)`. Without it, values above 255 overflow when the image is cast back to `uint8`, which produces spurious red, yellow, purple, etc. pixels.
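Why can values exceed 255 at all? The channel with the weakest illuminant estimate receives a gain greater than 1, so its brightest pixels are pushed past the 8-bit range.

The sketch below shows this on a toy image. It assumes a hypothetical helper, `sog_correct`, that repeats the arithmetic of `shade_of_gray_cc` but leaves out the gamma step and the final clip.

```python
import numpy as np


def sog_correct(img, power=6):
    """Hypothetical helper: Shades of Gray correction WITHOUT gamma and WITHOUT clipping."""
    img = img.astype(np.float32)
    # Per-channel power mean, as in shade_of_gray_cc
    illum = np.power(np.mean(np.power(img, power), axis=(0, 1)), 1 / power)
    illum = illum / np.linalg.norm(illum)    # unit-length illuminant estimate
    return img * (1 / (illum * np.sqrt(3)))  # apply per-channel gains


# Hypothetical 10x10 image: channels 0 and 2 are uniform at 200,
# channel 1 is dark (10) except for a single bright pixel (200)
img = np.full((10, 10, 3), 200, dtype=np.uint8)
img[..., 1] = 10
img[0, 0, 1] = 200

corrected = sog_correct(img)
print(corrected.max())  # well above 255: the dark channel received a gain > 1
safe = np.clip(corrected, 0, 255).astype(np.uint8)
print(safe.max())       # 255
```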

In [2]:
def shade_of_gray_cc(img, power=6, gamma=None):
    """
    img (numpy array): the original image with format of (h, w, c)
    power (int): the order p of the Minkowski norm, 6 is used in reference paper
    gamma (float): the value of gamma correction, 2.2 is used in reference paper
    """
    img_dtype = img.dtype

    # Optional gamma correction through a 256-entry lookup table
    if gamma is not None:
        img = img.astype('uint8')
        look_up_table = np.zeros((256, 1), dtype='uint8')
        for i in range(256):
            look_up_table[i][0] = 255 * pow(i / 255, 1 / gamma)
        img = cv2.LUT(img, look_up_table)

    img = img.astype('float32')
    # Illuminant estimate: per-channel Minkowski (power) mean of the pixel values
    img_power = np.power(img, power)
    rgb_vec = np.power(np.mean(img_power, (0, 1)), 1 / power)
    # Normalize the estimate to unit length
    rgb_norm = np.sqrt(np.sum(np.power(rgb_vec, 2.0)))
    rgb_vec = rgb_vec / rgb_norm
    # Per-channel gains that map the estimated illuminant to neutral gray
    rgb_vec = 1 / (rgb_vec * np.sqrt(3))
    img = np.multiply(img, rgb_vec)

    # Andrew Anikin's suggestion: clip to [0, 255] before casting back
    img = np.clip(img, a_min=0, a_max=255)

    return img.astype(img_dtype)
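The `power` argument selects a member of a family of estimators:

- With p = 1, the estimate reduces to the channel mean (the Gray World assumption).
- As p grows, the estimate approaches the channel maximum (the max-RGB / White Patch assumption).

The sketch below shows this on a toy image. It isolates the estimation step in a hypothetical helper, `illuminant_estimate`, and switches to float64 (our choice) so that p = 32 stays finite for 8-bit values.

```python
import numpy as np


def illuminant_estimate(img, power):
    """Hypothetical helper: per-channel power mean, the estimation step of shade_of_gray_cc."""
    img = img.astype(np.float64)  # float64 keeps 255**32 (about 1e77) finite
    return np.power(np.mean(np.power(img, power), axis=(0, 1)), 1 / power)


# Hypothetical 2x2 image with 3 channels
img = np.array([[[10, 50, 200], [20, 60, 100]],
                [[30, 70, 150], [40, 80, 250]]], dtype=np.uint8)

print(illuminant_estimate(img, 1))  # channel means (Gray World): 25, 65, 175
print(img.max(axis=(0, 1)))         # channel maxima (max-RGB): 40, 80, 250
for p in (6, 32):
    # The estimate grows with p and approaches the channel maxima
    print(p, illuminant_estimate(img, p))
```

p = 6, the value used in the reference paper, sits between these two extremes.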

Test the method by displaying random images with and without color constancy side by side.
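A minimal sketch of such a comparison, assuming matplotlib and images loaded with OpenCV (BGR channel order). `plot_before_after` is a hypothetical helper.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_before_after(pairs):
    """Hypothetical helper: plot (original, corrected) pairs side by side, one pair per row.

    pairs: list of (original_bgr, corrected_bgr) uint8 arrays, e.g. an image read
    with cv2.imread and the output of shade_of_gray_cc on that image.
    """
    fig, axes = plt.subplots(len(pairs), 2, figsize=(8, 4 * len(pairs)), squeeze=False)
    for row, (original, corrected) in zip(axes, pairs):
        # OpenCV stores channels as BGR; reversing the last axis gives RGB for imshow
        row[0].imshow(original[..., ::-1])
        row[0].set_title("Original")
        row[1].imshow(corrected[..., ::-1])
        row[1].set_title("Shades of Gray")
        for ax in row:
            ax.axis("off")
    fig.tight_layout()
    return fig


# Usage with a synthetic placeholder image (swap in real image pairs)
demo = np.full((32, 32, 3), 128, dtype=np.uint8)
fig = plot_before_after([(demo, demo)])
```

To pick random images, the pairs could be built from something like `random.sample(train_mel_paths, 3)`. Each path is read with `cv2.imread`, and the result is passed through `shade_of_gray_cc`.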

## Applying the color constancy method to the whole dataset

In [3]:
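def apply_cc(img_paths, output_folder_path, resize=None):
    """
    img_paths (list of str): paths of the images to process
    output_folder_path (str): folder where the corrected JPEGs are written
    resize (tuple or None): target size as (width, height), the order cv2.resize expects
    """
    # makedirs also creates missing parents (e.g. 'cc_train/'), unlike os.mkdir
    os.makedirs(output_folder_path, exist_ok=True)

    with tqdm(total=len(img_paths), ascii=True, ncols=100) as t:
        for img_path in img_paths:
            img_name = os.path.basename(img_path)
            img_ = cv2.imread(img_path, cv2.IMREAD_COLOR)
            if resize is not None:
                # Pass interpolation by keyword: the third positional argument of cv2.resize is dst
                img_ = cv2.resize(img_, resize, interpolation=cv2.INTER_AREA)
            np_img = shade_of_gray_cc(img_)
            out_name = os.path.splitext(img_name)[0] + '.jpg'
            cv2.imwrite(os.path.join(output_folder_path, out_name), np_img)
            t.update()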

### Train set

In [4]:
train_akiec_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/train/akiec/*.jpg')
train_bcc_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/train/bcc/*.jpg')
train_bkl_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/train/bkl/*.jpg')
train_df_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/train/df/*.jpg')
train_mel_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/train/mel/*.jpg')
train_vasc_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/train/vasc/*.jpg')

In [5]:
apply_cc(train_akiec_paths,'cc_train/akiec/', (224,224))
apply_cc(train_bcc_paths,'cc_train/bcc/', (224,224))
apply_cc(train_bkl_paths,'cc_train/bkl/', (224,224))
apply_cc(train_df_paths,'cc_train/df/', (224,224))
apply_cc(train_mel_paths,'cc_train/mel/', (224,224))
apply_cc(train_vasc_paths,'cc_train/vasc/', (224,224))

100%|#############################################################| 205/205 [00:02<00:00, 75.69it/s]
100%|#############################################################| 323/323 [00:04<00:00, 78.36it/s]
100%|#############################################################| 692/692 [00:09<00:00, 73.93it/s]
100%|###############################################################| 72/72 [00:00<00:00, 74.42it/s]
100%|#############################################################| 700/700 [00:09<00:00, 73.74it/s]
100%|###############################################################| 88/88 [00:01<00:00, 76.02it/s]


### Validation set

In [6]:
val_akiec_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/val/akiec/*.jpg')
val_bcc_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/val/bcc/*.jpg')
val_bkl_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/val/bkl/*.jpg')
val_df_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/val/df/*.jpg')
val_mel_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/val/mel/*.jpg')
val_vasc_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/val/vasc/*.jpg')

In [7]:
apply_cc(val_akiec_paths,'cc_val/akiec/', (224,224))
apply_cc(val_bcc_paths,'cc_val/bcc/', (224,224))
apply_cc(val_bkl_paths,'cc_val/bkl/', (224,224))
apply_cc(val_df_paths,'cc_val/df/', (224,224))
apply_cc(val_mel_paths,'cc_val/mel/', (224,224))
apply_cc(val_vasc_paths,'cc_val/vasc/', (224,224))

100%|###############################################################| 89/89 [00:01<00:00, 74.14it/s]
100%|#############################################################| 139/139 [00:01<00:00, 76.04it/s]
100%|#############################################################| 297/297 [00:04<00:00, 73.84it/s]
100%|###############################################################| 31/31 [00:00<00:00, 74.15it/s]
100%|#############################################################| 301/301 [00:04<00:00, 72.79it/s]
100%|###############################################################| 39/39 [00:00<00:00, 75.56it/s]


### Test set

In [8]:
test_akiec_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/test/akiec/*.jpg')
test_bcc_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/test/bcc/*.jpg')
test_bkl_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/test/bkl/*.jpg')
test_df_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/test/df/*.jpg')
test_mel_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/test/mel/*.jpg')
test_vasc_paths = glob('/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split/test/vasc/*.jpg')

In [9]:
apply_cc(test_akiec_paths,'cc_test/akiec/', (224,224))
apply_cc(test_bcc_paths,'cc_test/bcc/', (224,224))
apply_cc(test_bkl_paths,'cc_test/bkl/', (224,224))
apply_cc(test_df_paths,'cc_test/df/', (224,224))
apply_cc(test_mel_paths,'cc_test/mel/', (224,224))
apply_cc(test_vasc_paths,'cc_test/vasc/', (224,224))

100%|###############################################################| 33/33 [00:00<00:00, 77.61it/s]
100%|###############################################################| 52/52 [00:00<00:00, 75.28it/s]
100%|#############################################################| 110/110 [00:01<00:00, 71.53it/s]
100%|###############################################################| 12/12 [00:00<00:00, 75.59it/s]
100%|#############################################################| 112/112 [00:01<00:00, 73.25it/s]
100%|###############################################################| 15/15 [00:00<00:00, 74.38it/s]
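The train, validation and test cells repeat the same six `glob`/`apply_cc` pairs. The sketch below generates them in a loop instead. It assumes the folder layout `<root>/<split>/<class>/*.jpg` and the `cc_<split>/<class>/` output naming used above. `build_cc_jobs` is a hypothetical helper.

```python
import os


def build_cc_jobs(data_root, splits=("train", "val", "test"),
                  classes=("akiec", "bcc", "bkl", "df", "mel", "vasc")):
    """Hypothetical helper: (glob pattern, output folder) pairs for every split/class."""
    jobs = []
    for split in splits:
        for cls in classes:
            pattern = os.path.join(data_root, split, cls, "*.jpg")
            output_folder = f"cc_{split}/{cls}/"
            jobs.append((pattern, output_folder))
    return jobs


# Usage in the notebook (glob and apply_cc as defined above):
# for pattern, output_folder in build_cc_jobs(
#         '/Users/waranthornchansawang/Documents/GitHub/6_classes_HAM10000_split'):
#     apply_cc(glob(pattern), output_folder, (224, 224))
```

Keeping the dataset root in one place also means that moving the data only requires changing a single path.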


**That's all folks!**

I hope it was useful for you!