# Compare two methods of assessing skin tone

This notebook demonstrates how to use the **preprocess** and **detection** functions to implement two methods of assessing skin tone in black-and-white (B&W) photos, and then how to use **conf_matrix** and **mse** to assess which one worked better. (TODO: list any other fit modules used.)

### Import packages

First, import relevant dependencies:

In [None]:
import os
import sys
import pandas as pd
import numpy as np
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
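# NOTE: mse is called in the final comparison cell. Importing it from the
# package top level is an assumption; adjust the path if it lives elsewhere.
from tonelocator import preprocess, detection, colorizer, conf_matrix, mse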

### Create array of image file names

Next, create an array of file paths for all the photos you want to run the tonelocator complexion detection methods on.

Replace the value of `folder` below with the path to the folder that contains your photos.

In [None]:
#TODO: replace with the correct folder and a set of unprocessed photos
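folder = '../tonelocator/data/practice_set'
# Keep only regular files so that any subfolders inside `folder` are skipped;
# sort so the order is reproducible across runs
arr = np.array(sorted(
    os.path.join(folder, file)
    for file in os.listdir(folder)
    if os.path.isfile(os.path.join(folder, file))
))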

### Pre-process photos

Next, pre-process the photos into three sets of files: 
1. the color photos that we'll use to detect the "true" color composition of the photos 
2. the grayscale photos that we'll use to test our B&W Monk Scale detection methods on
3. colorized versions of the grayscale photos that we'll use to test B&W Monk Scale detection method 2

The **preprocess** module crops the photos and has an option to convert them to grayscale. The **colorizer** module is used to colorize B&W photos. 

In [None]:
# Create subfolders to contain preprocessed color photos,
# B&W versions of those photos, and colorized versions
# of the B&W photos
folder = '../tonelocator/data/practice_set'
for sub in ('color', 'grayscale', 'colorized'):
    # exist_ok avoids an error when this cell is re-run
    os.makedirs(os.path.join(folder, sub), exist_ok=True)

# TODO: add code to save set of preprocessed pics in respective folders (color & grayscale)
# Pseudocode placeholder, left commented out so the cell runs:
#   preprocess(i) for i in arr
#   etc

# TODO: add colorizer code

# RETURN: three arrays of the paths to the color, grayscale, and colorized photos:

# arr_color
# arr_grayscale
# arr_colorized
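
The grayscale step can be illustrated without tonelocator. The sketch below is an assumption about what a typical color-to-grayscale conversion does, not a description of how **preprocess** works internally: it applies the ITU-R BT.601 luma weights to each RGB pixel. The function name `to_grayscale` is hypothetical.

```python
def to_grayscale(pixels):
    """Convert rows of (R, G, B) tuples (0-255) to rows of gray levels (0-255).

    Uses the ITU-R BT.601 luma weights; other libraries may use
    different weights or rounding.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in pixels
    ]


# A 1x3 "image": pure red, pure green, pure blue
image = [[(255, 0, 0), (0, 255, 0), (0, 0, 255)]]
gray = to_grayscale(image)
print(gray)
```

Green contributes most to perceived brightness under these weights, so the green pixel maps to the lightest gray of the three.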


### Detect Monk scale based on color photos - the 'true' result we'll compare our detection methods to

Next, use the **complexion_detection** function to detect the color composition of the color photos. This is the 'true' result we'll use as a baseline to compare against the two methods of detecting complexion from B&W photos. 

In [None]:
true_results = np.array([
    detection.complexion_detection(i, rounding_places=2)
    for i in arr_color
])
true_df = pd.DataFrame(true_results)
true_df['picid'] = arr_color
# TODO: fix picid to be a numeric identifier

### METHOD 1: Detect Monk scale based on B&W versions of color photos and reference to B&W Monk scale

In [None]:
m1_results = np.array([
    detection.complexion_detection(i, rounding_places=2, grayscale=True)
    for i in arr_grayscale
])
m1pred_df = pd.DataFrame(m1_results)
m1pred_df['picid'] = arr_color
# TODO: fix picid to be a numeric identifier

### METHOD 2: Detect Monk scale based on colorized photo

In [None]:
m2_results = np.array([
    detection.complexion_detection(i, rounding_places=2, grayscale=False)
    for i in arr_colorized
])
m2pred_df = pd.DataFrame(m2_results)
m2pred_df['picid'] = arr_colorized
# TODO: fix picid to be a numeric identifier
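
One possible way to address the `picid` TODOs is to derive a numeric identifier from each photo's base file name, so that the color, grayscale, and colorized versions of the same photo share an ID. The sketch below assumes the three versions keep the same base file name in their respective subfolders. That naming scheme, the example paths, and the helpers `make_picids` and `picid_for` are assumptions for illustration, not part of tonelocator.

```python
import os


def make_picids(paths):
    """Map each file's base name (without extension) to a numeric ID.

    IDs are assigned in sorted order of base name, so the same photo gets
    the same ID regardless of which subfolder its path points to.
    """
    stems = sorted({os.path.splitext(os.path.basename(p))[0] for p in paths})
    return {stem: idx for idx, stem in enumerate(stems)}


def picid_for(path, ids):
    """Look up the numeric ID for one file path."""
    return ids[os.path.splitext(os.path.basename(path))[0]]


# Hypothetical file names, for illustration only
color = ['set/color/b.jpg', 'set/color/a.jpg']
colorized = ['set/colorized/a.png', 'set/colorized/b.png']

ids = make_picids(color)
color_ids = [picid_for(p, ids) for p in color]
colorized_ids = [picid_for(p, ids) for p in colorized]
print(color_ids)
print(colorized_ids)
```

Lists built this way could be assigned to the `picid` column of each DataFrame in place of the file paths, which would make rows for the same photo comparable across the three result sets.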

### Compare effectiveness of each method

First, create a confusion matrix for each method. 

In [None]:
conf_matrix(true=true_df, pred=m1pred_df).plot()

In [None]:
conf_matrix(true=true_df, pred=m2pred_df).plot()
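
As background, a confusion matrix counts how often each true category was assigned to each predicted category; counts on the diagonal are correct predictions. The sketch below builds one from two lists of labels using only the standard library. It is independent of tonelocator's **conf_matrix**, whose inputs and internals are not shown here, and the integer labels are made-up stand-ins for Monk scale categories.

```python
from collections import Counter


def confusion_counts(true_labels, pred_labels, categories):
    """Return a matrix where entry [i][j] counts items whose true label is
    categories[i] and whose predicted label is categories[j]."""
    pairs = Counter(zip(true_labels, pred_labels))
    return [[pairs[(t, p)] for p in categories] for t in categories]


# Made-up labels for five photos
true_labels = [1, 2, 2, 3, 3]
pred_labels = [1, 2, 3, 3, 1]

matrix = confusion_counts(true_labels, pred_labels, categories=[1, 2, 3])
for row in matrix:
    print(row)

# Correct predictions lie on the diagonal
n_correct = sum(matrix[i][i] for i in range(len(matrix)))
print(n_correct)
```

When comparing the two methods, the one whose matrix concentrates more of its counts on or near the diagonal is the closer match to the true result.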

Calculate the MSE for each method:

In [None]:
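# Label each value so the two methods are easy to tell apart in the output
print('Method 1 MSE:', mse(true=true_df, pred=m1pred_df, bybin=False))
print('Method 2 MSE:', mse(true=true_df, pred=m2pred_df, bybin=False))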
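
For reference, the mean squared error averages the squared differences between true and predicted values, so lower values indicate a closer match. The sketch below computes it for plain lists of numbers. It is not tonelocator's **mse** (whose handling of DataFrames and of the `bybin` option is not shown here), the function name `mean_squared_error` is hypothetical, and the values are made up.

```python
def mean_squared_error(true_values, pred_values):
    """Average of squared differences between paired values."""
    if len(true_values) != len(pred_values):
        raise ValueError('inputs must have the same length')
    if not true_values:
        raise ValueError('inputs must not be empty')
    squared = [(t - p) ** 2 for t, p in zip(true_values, pred_values)]
    return sum(squared) / len(squared)


true_values = [1, 2, 3, 4]
pred_a = [1, 2, 4, 4]  # one value off by 1
pred_b = [3, 4, 5, 6]  # every value off by 2

mse_a = mean_squared_error(true_values, pred_a)
mse_b = mean_squared_error(true_values, pred_b)
print(mse_a)
print(mse_b)
```

Because the differences are squared, a few large misses raise the score more than many small ones.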

A lower MSE indicates a closer match to the 'true' result. The MSE is higher for method X, and the confusion matrix suggests the fit is better for method X. This is probably because XYZ.