<h1>Part 2b - Data Acquisition and Exploration for Piece Classification</h1>

<h2>Overview</h2>

This section demonstrates the data labeling GUI for squares. The squares are created by splitting a chessboard into eighths width- and height-wise. The chessboards themselves come from screenshots captured from YouTube, which were labeled in the process covered in Part 2a.
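The splitting step described above can be sketched as follows. This is a minimal illustration, not the code in gcb_utils: the function names `split_board_into_squares` and `square_fname` are hypothetical, the board is assumed to be already cropped and loaded as a NumPy array, and the file name pattern is inferred from the square file names listed later in this section.

```python
import numpy as np


def split_board_into_squares(board, n=8):
    """Split a cropped board image (H x W [x channels]) into n * n squares.

    Returns a dict keyed by (row, col), with row 0 at the top of the image.
    Any leftover pixels from integer division are dropped at the right and
    bottom edges.
    """
    height, width = board.shape[:2]
    sq_h, sq_w = height // n, width // n
    squares = {}
    for row in range(n):
        for col in range(n):
            squares[(row, col)] = board[
                row * sq_h:(row + 1) * sq_h,
                col * sq_w:(col + 1) * sq_w,
            ]
    return squares


def square_fname(screenshot_stem, row, col):
    """Hypothetical helper: build a square file name in the observed pattern."""
    return f"{screenshot_stem}_brd_squ_R{row}_C{col}.png"


# Stand-in for a real board crop: a blank 400 x 400 RGB image.
board = np.zeros((400, 400, 3), dtype=np.uint8)
squares = split_board_into_squares(board)
print(len(squares), squares[(0, 0)].shape)
```

With a real screenshot, each returned array would be written to disk under the corresponding file name before labeling.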

This is also a tool I wrote to simplify the labeling process. The interface is written in tkinter for cross-platform support. For now, it uses a csv file to store each square's content: square color (not deemed necessary at this time), piece color and piece type. There is also a column for whether a human checked the labeling.

After pointing out some guiding principles and showing a basic function of the labeler, I will conclude with some basic counts.

<h2>A Guide to Square GUI Design</h2>

**The methodology aims to lessen the cognitive load of the labeler. <br>
  A natural next step is to keep the end user away from dealing with the csv file and manually entering piece type and color in it.**

To this end, the user can/should:
<ul>
    <li>Label a square within the GUI with a mouse click. Radio buttons are used for their toggle property.</li>
    <li>Use minimum clicks I: the default state for a square is set to E(mpty), since at most half the squares can be occupied in a chess game.</li>
    <li>Use minimum clicks II: a button is added to confirm "Human Check" wholesale for the screen.</li>
    <li>Label sets of squares in succession.</li>
    <li>Move to the next image (or exit) without worrying about whether the work was saved.</li>
</ul>
<br>
    
The implementation is in gcb_utils/gcb_utils.py
        

<br>Now, let's briefly explore the process.
      

In [3]:
#import packages
import gcb_utils.gcb_utils as gcb_utils
import pandas as pd

<h2>GUI Demo in Pictures</h2>
(The squares shown are from chess games on Chess.com's YouTube channel.)

In [None]:
SQ_LABEL_COLUMNS = ['fname', 'SqColor-BWE', 'PcColor-BWE', 'PcType-PRNBQKE','HumCheck-YN'] 

def run_sq_label_update():
    gcb_utils.insert_data_fnames(
        'data/raw/squares',
        'data/model/sq_labels.csv',
        SQ_LABEL_COLUMNS,
        update_fn=gcb_utils.square_insert_default_values,
        update_fn_kwargs={
            'label_cols': ['SqColor-BWE', 'PcColor-BWE', 'PcType-PRNBQKE'],
            'hum_check_col': ['HumCheck-YN'],
        },
    )
    gcb_utils.update_sq_labels('data/raw/squares', 'data/model/sq_labels.csv') 

run_sq_label_update()

#please uncomment the below for documentation (help() prints on its own, so no print() is needed)
#help(gcb_utils.insert_data_fnames)
#help(gcb_utils.square_insert_default_values)
#help(gcb_utils.update_sq_labels)

<h3>State Before Labeling</h3>
Note that the defaults for piece color and type are E(mpty) and the default for human check is N(o). At this point, Make Estimates does not work yet. A plan for the future is to connect the Piece Identification model to it for better default values.
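To make the default state concrete, here is a minimal pandas sketch of how new square files could be registered with default labels. It is not the implementation of `gcb_utils.square_insert_default_values`: the function `add_default_rows` is hypothetical, and using E as the default for square color as well is an assumption (the text above only states the defaults for piece color, piece type and human check).

```python
import pandas as pd

SQ_LABEL_COLUMNS = ['fname', 'SqColor-BWE', 'PcColor-BWE', 'PcType-PRNBQKE', 'HumCheck-YN']

# Defaults per the text: E(mpty) piece color and type, N(o) human check.
# Assumption: square color also defaults to E.
DEFAULT_LABELS = {
    'SqColor-BWE': 'E',
    'PcColor-BWE': 'E',
    'PcType-PRNBQKE': 'E',
    'HumCheck-YN': 'N',
}


def add_default_rows(labels_df, fnames):
    """Append one default-labeled row per file name not yet in labels_df.

    Rows that already exist (including human-checked ones) are left untouched.
    """
    known = set(labels_df['fname'])
    new_rows = pd.DataFrame(
        [{'fname': fname, **DEFAULT_LABELS} for fname in fnames if fname not in known],
        columns=SQ_LABEL_COLUMNS,
    )
    return pd.concat([labels_df, new_rows], ignore_index=True)


# One square is already labeled; a second, new square gets the defaults.
labels = pd.DataFrame(
    [['shot_brd_squ_R0_C0.png', 'W', 'B', 'R', 'Y']],
    columns=SQ_LABEL_COLUMNS,
)
labels = add_default_rows(labels, ['shot_brd_squ_R0_C0.png', 'shot_brd_squ_R0_C1.png'])
print(labels)
```

Keeping existing rows as they are means the update can be rerun safely whenever new screenshots are added.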


![Alt_text](z_markdown_jpgs/SquareLabeling-UnlabeledSquares.png)

<h3>State After Labeling</h3>


![Alt_text](z_markdown_jpgs/SquareLabeling-LabeledSquares.png)

<h2>Note on Labeling Screenshots - Performance / Use</h2>

With my methodology, I was able to label around 5600 squares in about 6 hours.

Because more than half of the labeled squares are empty, let's take a look at the breakdown by piece. This will help us understand biases in the class distribution and provide remedies if necessary.


In [54]:
square_csv_full_path = 'data/model/sq_labels.csv'
squares_df = pd.read_csv(square_csv_full_path)
squares_df.head()

| | fname | SqColor-BWE | PcColor-BWE | PcType-PRNBQKE | HumCheck-YN |
|---|---|---|---|---|---|
| 0 | ScreenShot2021-09-30at5-07-40PM_brd_squ_R0_C0.png | W | B | R | Y |
| 1 | ScreenShot2021-09-30at5-07-40PM_brd_squ_R0_C1.png | B | B | N | Y |
| 2 | ScreenShot2021-09-30at5-07-40PM_brd_squ_R0_C2.png | W | B | B | Y |
| 3 | ScreenShot2021-09-30at5-07-40PM_brd_squ_R0_C3.png | B | B | Q | Y |
| 4 | ScreenShot2021-09-30at5-07-40PM_brd_squ_R0_C4.png | W | B | K | Y |


In [56]:
hc_yes = squares_df['HumCheck-YN']=='Y'
pc_col_notE = squares_df['PcType-PRNBQKE'] != 'E'  # not applied below: the table keeps E(mpty) squares
target_cols = ['fname','PcColor-BWE', 'PcType-PRNBQKE']

square_counts_df = squares_df[hc_yes][target_cols].pivot_table(
        columns = ['PcType-PRNBQKE'],
        index = ['PcColor-BWE'], 
        values = ['fname'],
        aggfunc = 'count',
        margins = True, 
        fill_value = '-')

print('Count of Pieces by Color and Type - includes E(mpty) squares')
square_counts_df





Count of Pieces by Color and Type - includes E(mpty) squares


Values are counts of fname; rows are PcColor-BWE, columns are PcType-PRNBQKE.

| PcColor-BWE | B | E | K | N | P | Q | R | All |
|---|---|---|---|---|---|---|---|---|
| B | 117.0 | - | 89.0 | 112.0 | 531.0 | 65.0 | 145.0 | 1059 |
| E | - | 3525.0 | - | - | - | - | - | 3525 |
| W | 118.0 | - | 85.0 | 114.0 | 531.0 | 67.0 | 143.0 | 1058 |
| All | 235 | 3525 | 174 | 226 | 1062 | 132 | 288 | 5642 |


From the above, we can see that, as expected, Empty squares make up more than half (\~62.5%) of the sample, creating a bias. In aggregate, next are Pawns (\~19%), followed by Rooks (\~5%), Bishops (\~4%), kNights (\~4%), Kings (\~3%) and Queens (\~2.3%).   <br>

However, the "In aggregate" statement above comes with a potentially big caveat. Since black and white seem balanced, the share of each piece type should be roughly halved if one were to classify across both piece color and type. <br>
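The arithmetic behind these shares can be checked with a short sketch. The counts are copied from the table above; the variable names are illustrative only, and the dashes in the table are treated as zero counts.

```python
import pandas as pd

# Counts from the pivot table above (rows: piece color, columns: piece type).
counts = pd.DataFrame(
    {
        'B': [117, 0, 118],
        'E': [0, 3525, 0],
        'K': [89, 0, 85],
        'N': [112, 0, 114],
        'P': [531, 0, 531],
        'Q': [65, 0, 67],
        'R': [145, 0, 143],
    },
    index=['B', 'E', 'W'],
)

total = counts.to_numpy().sum()

# Share of each piece type with both colors pooled (the "in aggregate" view).
type_share = (counts.sum(axis=0) / total * 100).round(1)

# Share of each (color, type) class, which is what matters when the
# classifier must predict piece color and piece type jointly.
class_share = (counts / total * 100).round(1)

print(type_share.sort_values(ascending=False))
print(class_share.loc[['B', 'W']])
```

Because the black and white counts are nearly equal, each per-class share comes out at roughly half of the pooled share for that piece type.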