# Compare two methods of assessing skin tone

This notebook demonstrates how to use the **preprocess** and **detection** functions to implement two methods of assessing skin tone in black & white photos, and then how to use **conf_matrix** and **mse** to assess which one worked better.

### Import packages

First, import relevant dependencies:

In [1]:
%cd ..
%cd ..
import os
import sys
import pandas as pd
import numpy as np
import cv2
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
from tonelocator.tonelocator import detection, conf_matrix
from tonelocator.tonelocator.colorizer import colorizer

/Users/elizabethpelletier/tonelocator
/Users/elizabethpelletier


### Create array of image file names and list of photos

Next, create an array that lists the file names of all the photos you want to run the tonelocator complexion detection methods on. 

Replace the value of `examplefolder` below with the path to the folder your photos are in.

In [6]:
#TODO: replace with the correct folder and a set of unprocessed photos
examplefolder = 'tonelocator/example/photos/examples_unprocessed'
imgpaths = []
imgnames = []
for file in os.listdir(examplefolder):
    if not file.startswith('.'):
        imgpaths.append(os.path.join(examplefolder, file))
        imgnames.append(file)
imgpaths = np.array(imgpaths)
imgnames = np.array(imgnames)
print(imgpaths)
print(imgnames)

['tonelocator/example/photos/examples_unprocessed/a_01.jpg'
 'tonelocator/example/photos/examples_unprocessed/a_03.jpg'
 'tonelocator/example/photos/examples_unprocessed/a_02.jpg'
 'tonelocator/example/photos/examples_unprocessed/a_05.jpg'
 'tonelocator/example/photos/examples_unprocessed/a_04.jpg']
['a_01.jpg' 'a_03.jpg' 'a_02.jpg' 'a_05.jpg' 'a_04.jpg']
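
`os.listdir` returns entries in arbitrary order, which is why `a_03.jpg` comes before `a_02.jpg` in the output above. If any later comparison relies on row order rather than on file names, a sorted listing makes the order predictable. Here is a minimal sketch, assuming a hypothetical helper named `list_images` that is not part of tonelocator:

```python
import os

def list_images(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Return (paths, names) for image files in `folder`, sorted by file name.

    Hypothetical helper for illustration; hidden files and files with
    other extensions are skipped.
    """
    names = sorted(
        name for name in os.listdir(folder)
        if not name.startswith(".") and name.lower().endswith(extensions)
    )
    paths = [os.path.join(folder, name) for name in names]
    return paths, names
```

The same helper could be reused for the `color`, `grayscale`, and `colorized` folders, so that all listings share one order.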


### Pre-process photos

Next, pre-process the photos into three sets of files: 
1. the color photos that we'll use to detect the "true" color composition of the photos 
2. the grayscale photos that we'll use to test our B&W Monk Scale detection methods on
3. colorized versions of the grayscale photos that we'll use to test B&W Monk Scale detection method 2

The **preprocess** module crops the photos and has an option to convert them to grayscale. The **colorizer** module is used to colorize B&W photos. 
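
To make the two preprocessing steps concrete, here is a generic sketch of a center crop and a grayscale conversion in NumPy. It is not the **preprocess** module's implementation, which may crop differently (for example, around a detected face). The function names `center_crop` and `to_grayscale` are hypothetical, and the channel order assumes an image loaded with OpenCV (BGR).

```python
import numpy as np

def center_crop(image, height, width):
    """Crop an H x W x C array to (height, width) around its center."""
    h, w = image.shape[:2]
    top = max((h - height) // 2, 0)
    left = max((w - width) // 2, 0)
    return image[top:top + height, left:left + width]

def to_grayscale(image_bgr):
    """Convert a BGR uint8 image to one channel using BT.601 luma weights."""
    weights = np.array([0.114, 0.587, 0.299])  # B, G, R order (assumed)
    gray = image_bgr.astype(np.float64) @ weights
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)
```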

In [3]:
# Create subfolders to contain preprocessed color photos,
# B&W versions of those photos, and colorized versions
# of the B&W photos
folder = 'tonelocator/example/photos'
!mkdir -p $folder/color
!mkdir -p $folder/grayscale
!mkdir -p $folder/colorized


In [15]:
# Preprocess the color images and save one set in color and one set in B&W.
# TODO: run preprocess on each photo in imgpaths and save the results to the
# 'color' and 'grayscale' folders. Until then, this cell assumes both folders
# are already populated and only collects the file paths and names.

imgpaths_gray = []
imgnames_gray = []
for file in os.listdir(folder + '/grayscale'):
    if not file.startswith('.'):
        imgpaths_gray.append(os.path.join(folder + '/grayscale', file))
        imgnames_gray.append(file)
imgpaths_gray = np.array(imgpaths_gray)
print(imgpaths_gray)
print(imgnames_gray)

imgpaths_color = []
imgnames_color = []
for file in os.listdir(folder + '/color'):
    if not file.startswith('.'):
        imgpaths_color.append(os.path.join(folder + '/color', file))
        imgnames_color.append(file)
imgpaths_color = np.array(imgpaths_color)
imgnames_color = np.array(imgnames_color)
print(imgpaths_color)
print(imgnames_color)

['tonelocator/example/photos/grayscale/a_01.jpg'
 'tonelocator/example/photos/grayscale/a_03.jpg'
 'tonelocator/example/photos/grayscale/a_02.jpg'
 'tonelocator/example/photos/grayscale/a_05.jpg'
 'tonelocator/example/photos/grayscale/a_04.jpg']
['a_01.jpg', 'a_03.jpg', 'a_02.jpg', 'a_05.jpg', 'a_04.jpg']
['tonelocator/example/photos/color/a_01.jpg'
 'tonelocator/example/photos/color/a_03.jpg'
 'tonelocator/example/photos/color/a_02.jpg'
 'tonelocator/example/photos/color/a_05.jpg'
 'tonelocator/example/photos/color/a_04.jpg']
['a_01.jpg' 'a_03.jpg' 'a_02.jpg' 'a_05.jpg' 'a_04.jpg']


In [16]:
# Colorize each grayscale photo and save it in the 'colorized' output folder
for pic in imgnames:
    imgpath = os.path.join(folder, 'grayscale', pic)
    outpath = os.path.join(folder, 'colorized', pic)
    print(outpath)
    image, colorized = colorizer.colorize_image(imgpath)
    print('done with colorize' + pic)
    cv2.imwrite(outpath, colorized)

imgpaths_colorized = []
imgnames_colorized = []
for file in os.listdir(folder + '/colorized'):
    if not file.startswith('.'):
        imgpaths_colorized.append(os.path.join(folder + '/colorized', file))
        imgnames_colorized.append(file)
imgpaths_colorized = np.array(imgpaths_colorized)
imgnames_colorized = np.array(imgnames_colorized)
print(imgpaths_colorized)
print(imgnames_colorized)

tonelocator/example/photos/colorized/a_01.jpg
done with colorizea_01.jpg
tonelocator/example/photos/colorized/a_03.jpg
done with colorizea_03.jpg
tonelocator/example/photos/colorized/a_02.jpg
done with colorizea_02.jpg
tonelocator/example/photos/colorized/a_05.jpg
done with colorizea_05.jpg
tonelocator/example/photos/colorized/a_04.jpg
done with colorizea_04.jpg
['tonelocator/example/photos/colorized/a_01.jpg'
 'tonelocator/example/photos/colorized/a_03.jpg'
 'tonelocator/example/photos/colorized/a_02.jpg'
 'tonelocator/example/photos/colorized/a_05.jpg'
 'tonelocator/example/photos/colorized/a_04.jpg']
['a_01.jpg' 'a_03.jpg' 'a_02.jpg' 'a_05.jpg' 'a_04.jpg']


### Detect Monk scale based on color photos - the 'true' result we'll compare our detection methods to

Next, use the **complexion_detection** function to detect the color composition of the color photos. This is the 'true' result we'll use as a baseline to compare against the two methods of detecting complexion from B&W photos. 

In [32]:
true_results = np.array([detection.complexion_detection(i, rounding_places=2, grayscale=False) 
                                  for i in imgpaths_color])
true_df = pd.DataFrame(true_results)
true_df['picid'] = imgnames_color
# TODO: fix picid to be a numeric identifier
print(true_df)
true_df.columns

     0    1    2     3     4     5     6     7     8    9     picid
0  0.0  0.0  0.0  0.00  0.06  0.02  0.02  0.02  0.00  0.0  a_01.jpg
1  0.0  0.0  0.0  0.01  0.12  0.02  0.03  0.01  0.01  0.0  a_03.jpg
2  0.0  0.0  0.0  0.01  0.18  0.03  0.02  0.01  0.00  0.0  a_02.jpg
3  0.0  0.0  0.2  0.03  0.05  0.01  0.00  0.00  0.00  0.0  a_05.jpg
4  0.0  0.0  0.0  0.00  0.05  0.03  0.04  0.02  0.01  0.0  a_04.jpg


Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 'picid'], dtype='object')
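
The TODO in the cell above asks for a numeric `picid`. One possible approach, sketched here with a hypothetical helper `numeric_picid`, is to take the first run of digits in each file name. This assumes names like `a_01.jpg`, where the digits alone identify the photo; names that differ only in their prefix (say `a_01.jpg` and `b_01.jpg`) would collide.

```python
import re

def numeric_picid(filename):
    """Return the first run of digits in a file name as an int."""
    match = re.search(r"\d+", filename)
    if match is None:
        raise ValueError(f"no digits found in {filename!r}")
    return int(match.group())

picids = [numeric_picid(name) for name in ["a_01.jpg", "a_03.jpg", "a_02.jpg"]]
print(picids)  # [1, 3, 2]
```

In the notebook this could be applied as `true_df['picid'] = [numeric_picid(n) for n in imgnames_color]`, and likewise for the two prediction data frames.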

### METHOD 1: Detect Monk scale based on B&W versions of color photos and reference to B&W Monk scale

In [20]:
m1_results = np.array([detection.complexion_detection(i, 
                                                      rounding_places=2,
                                                     grayscale=True) 
                                  for i in imgpaths_gray])
m1pred_df = pd.DataFrame(m1_results)
m1pred_df['picid'] = imgnames_gray
print(m1pred_df)

     0    1     2     3     4     5     6     7     8     9     picid
0  0.0  0.0  0.00  0.00  0.21  0.11  0.08  0.06  0.03  0.01  a_01.jpg
1  0.0  0.0  0.07  0.06  0.20  0.08  0.12  0.23  0.07  0.05  a_03.jpg
2  0.0  0.0  0.02  0.04  0.35  0.21  0.15  0.10  0.03  0.01  a_02.jpg
3  0.0  0.0  0.30  0.06  0.10  0.09  0.11  0.12  0.08  0.02  a_05.jpg
4  0.0  0.0  0.00  0.01  0.18  0.27  0.13  0.10  0.09  0.04  a_04.jpg


### METHOD 2: Detect Monk scale based on colorized photo

In [21]:
m2_results = np.array([detection.complexion_detection(i, 
                                                      rounding_places=2,
                                                     grayscale=False) 
                                  for i in imgpaths_colorized])
m2pred_df = pd.DataFrame(m2_results)
m2pred_df['picid'] = imgnames_colorized
print(m2pred_df)

     0    1     2     3     4     5     6     7     8    9     picid
0  0.0  0.0  0.00  0.00  0.13  0.02  0.01  0.00  0.00  0.0  a_01.jpg
1  0.0  0.0  0.00  0.01  0.14  0.02  0.02  0.03  0.00  0.0  a_03.jpg
2  0.0  0.0  0.00  0.01  0.22  0.07  0.03  0.01  0.00  0.0  a_02.jpg
3  0.0  0.0  0.02  0.01  0.06  0.01  0.01  0.01  0.01  0.0  a_05.jpg
4  0.0  0.0  0.00  0.00  0.11  0.04  0.05  0.02  0.01  0.0  a_04.jpg


### Compare effectiveness of each method

First, create a confusion matrix for each method. 

In [24]:
conf_matrix.conf_matrix(true=true_df, pred=m1pred_df).plot()

ValueError: true needs column named 0
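
The call above raised an error, so no matrix was produced in this run. As an illustration of what such a comparison can look like, here is a generic sketch that assigns each photo its dominant bin (the column with the largest value) and cross-tabulates true against predicted bins. It is not the implementation of `conf_matrix.conf_matrix`, whose expected input format this notebook does not show; the function name `dominant_bin_confusion` is hypothetical.

```python
import numpy as np

def dominant_bin_confusion(true, pred, n_bins=10):
    """Cross-tabulate the dominant bin per photo: rows = true, columns = predicted."""
    true_bins = np.asarray(true, dtype=float).argmax(axis=1)
    pred_bins = np.asarray(pred, dtype=float).argmax(axis=1)
    matrix = np.zeros((n_bins, n_bins), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat
    np.add.at(matrix, (true_bins, pred_bins), 1)
    return matrix
```

With the data frames in this notebook, the numeric part can be extracted first, for example `true_df.drop(columns='picid').to_numpy()`, after making sure the rows of both frames refer to the same photos in the same order.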

In [None]:
conf_matrix.conf_matrix(true=true_df, pred=m2pred_df).plot()

Calculate the MSE for each method:

In [None]:
print(mse(true=true_df, pred=m1pred_df, bybin=False))
print(mse(true=true_df, pred=m2pred_df, bybin=False))
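
For reference, mean squared error over a photos-by-bins table can be written in a few lines of NumPy. The sketch below is an assumption about what the comparison computes, not the package's `mse` function; in particular, the `by_bin` argument is only a guess at what `bybin` means. The sample rows are the values for `a_01.jpg` and `a_03.jpg` copied from the tables printed earlier.

```python
import numpy as np

def mse_sketch(true, pred, by_bin=False):
    """Mean squared error between two (photos x bins) arrays.

    by_bin=True returns one value per bin (column) instead of a single
    overall value; this is an assumed reading of the `bybin` flag.
    """
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    if true.shape != pred.shape:
        raise ValueError("true and pred must have the same shape")
    squared_error = (true - pred) ** 2
    return squared_error.mean(axis=0) if by_bin else squared_error.mean()

# Rows for a_01.jpg and a_03.jpg from the printed tables
true_rows = [[0.0, 0.0, 0.00, 0.00, 0.06, 0.02, 0.02, 0.02, 0.00, 0.00],
             [0.0, 0.0, 0.00, 0.01, 0.12, 0.02, 0.03, 0.01, 0.01, 0.00]]
m1_rows = [[0.0, 0.0, 0.00, 0.00, 0.21, 0.11, 0.08, 0.06, 0.03, 0.01],
           [0.0, 0.0, 0.07, 0.06, 0.20, 0.08, 0.12, 0.23, 0.07, 0.05]]
m2_rows = [[0.0, 0.0, 0.00, 0.00, 0.13, 0.02, 0.01, 0.00, 0.00, 0.00],
           [0.0, 0.0, 0.00, 0.01, 0.14, 0.02, 0.02, 0.03, 0.00, 0.00]]

print(mse_sketch(true_rows, m1_rows))
print(mse_sketch(true_rows, m2_rows))
```

When starting from the data frames, drop the `picid` column first (for example `true_df.drop(columns='picid')`) so that only the ten numeric bin columns enter the calculation.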

The MSE is higher for X method and the confusion matrix suggests the fit is better for X method (a lower MSE means the predictions sit closer to the color-photo baseline). This is probably because XYZ.