<h1>Part 2a - Data Acquisition Tooling and Exploration for Chessboard Detection</h1>

<h2>Overview</h2>

This section demonstrates the GUI I wrote to label chessboards in screenshots captured from YouTube.

The tool simplifies the labeling process. The interface is written in tkinter so that it runs across platforms. For now, it keeps the screenshot bounding box data in a csv file.

After laying out some guiding principles and showing the basic workflow of the labeler, we will explore some examples that highlight a few idiosyncrasies and conclude with some basic counts.

<h2>A Guide to Screenshot GUI Design</h2>

**The methodology aims to lessen the cognitive load on the labeler. <br>
  A natural next step is to keep the end user away from the csv file and from manually handling the file names and bounding box pixels in it.**

To this end, the user should be able to:
    <li>Label a chessboard object within the GUI with a mouse drag</li>
    <li>Label multiple objects</li>
    <li>Delete any unwanted labels</li>
    <li>Label screenshots in succession</li>
    <li>Move to the next image (or exit) without worrying about whether the work was saved</li>
     <br>
    
The implementation is in gcb_utils/gcb_utils.py
        

<br>Now, let's briefly explore the process.
      

In [2]:
#import packages
import gcb_utils.gcb_utils as gcb_utils

In [None]:

screenshot_data_path = 'data/raw/screenshots'
screenshot_labels_fname = 'data/model/screenshot_boundboxes.csv'
SCREENSHOT_LABEL_COLUMNS = ['fname', 'height_pxl', 'width_pxl', 'label', 'x_min_pxl', 'y_min_pxl', 'x_max_pxl', 'y_max_pxl', 'HumCheck-YN']
update_fn = gcb_utils.screenshot_height_width_update




def run_screenshot_label_update():
    gcb_utils.insert_data_fnames(
        screenshot_data_path,
        screenshot_labels_fname,
        SCREENSHOT_LABEL_COLUMNS,
        update_fn=update_fn,
        update_fn_kwargs={'screenshot_path': screenshot_data_path},
    )
    gcb_utils.update_screenshot_labels(screenshot_data_path, screenshot_labels_fname)

run_screenshot_label_update()

#uncomment the lines below to see the documentation
#(help() prints the text itself; wrapping it in print() would also print "None")
#help(gcb_utils.insert_data_fnames)
#help(gcb_utils.screenshot_height_width_update)
#help(gcb_utils.update_screenshot_labels)
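The first step of the cell above, registering screenshots in the labels csv before any box has been drawn, can be sketched as follows. This is a minimal illustration using only the standard library, not the code in gcb_utils: the helper name `add_missing_fnames` is hypothetical, and it assumes that an unlabeled row carries only the file name. Empty fields like these are what a reader such as pandas would show as NaN.

```python
import csv
import io

# Column names copied from the cell above.
COLUMNS = ['fname', 'height_pxl', 'width_pxl', 'label',
           'x_min_pxl', 'y_min_pxl', 'x_max_pxl', 'y_max_pxl', 'HumCheck-YN']


def add_missing_fnames(csv_text, fnames):
    """Return csv text with one unlabeled row appended per new file name.

    Hypothetical helper for illustration; not part of gcb_utils.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    known = {row['fname'] for row in rows}
    for fname in sorted(fnames):
        if fname not in known:
            # Only the file name is filled in; every other field stays empty.
            rows.append({'fname': fname})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval='',
                            lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# One labeled screenshot already on file, one new screenshot on disk.
existing = ','.join(COLUMNS) + '\n' + 'a.png,1800,2880,board,10,20,300,400,Y\n'
updated = add_missing_fnames(existing, ['a.png', 'b.png'])
print(updated)
```

Running the helper again on its own output leaves the text unchanged, so rows that are already labeled are never overwritten.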

<h3>State Before Labeling</h3>
Note that all bounding box pixel values are NaN before labeling. <br>
Labeling is a click-drag-release sequence.

![Alt_text](z_markdown_jpgs/BoardLabeling-Empty.png)
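The click-drag-release sequence can be sketched with three tkinter canvas bindings. This is an illustration under stated assumptions rather than the GUI in gcb_utils: the names `drag_to_box`, `attach_drag_handlers`, and `demo` are hypothetical, and the sketch assumes the screenshot is drawn on the canvas at 1:1 scale so that canvas coordinates equal image pixels.

```python
def drag_to_box(press, release, img_width, img_height):
    """Turn a press point and a release point into (x_min, y_min, x_max, y_max)."""
    (x0, y0), (x1, y1) = press, release
    # Clamp both points to the image so a drag that leaves the canvas stays valid.
    x0, x1 = (max(0, min(x, img_width)) for x in (x0, x1))
    y0, y1 = (max(0, min(y, img_height)) for y in (y0, y1))
    # Order the corners so the box is the same whichever way the user dragged.
    return min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)


def attach_drag_handlers(canvas, img_width, img_height, on_box):
    """Bind press, motion, and release on a tkinter Canvas; call on_box(box) on release."""
    state = {'press': None, 'rect': None}

    def on_press(event):
        state['press'] = (event.x, event.y)
        state['rect'] = canvas.create_rectangle(
            event.x, event.y, event.x, event.y, outline='red')

    def on_motion(event):
        if state['rect'] is not None:
            # Stretch the preview rectangle to follow the mouse.
            canvas.coords(state['rect'], *state['press'], event.x, event.y)

    def on_release(event):
        if state['press'] is not None:
            on_box(drag_to_box(state['press'], (event.x, event.y),
                               img_width, img_height))
            state['press'] = None
            state['rect'] = None  # the drawn rectangle stays on the canvas

    canvas.bind('<ButtonPress-1>', on_press)
    canvas.bind('<B1-Motion>', on_motion)
    canvas.bind('<ButtonRelease-1>', on_release)


def demo():
    """Open an empty canvas and print each box drawn (needs a display)."""
    import tkinter as tk
    root = tk.Tk()
    canvas = tk.Canvas(root, width=640, height=400)
    canvas.pack()
    attach_drag_handlers(canvas, 640, 400, print)
    root.mainloop()
```

Keeping the coordinate logic in `drag_to_box`, separate from the event handlers, means a drag from bottom-right to top-left yields the same box as the reverse, and the logic can be checked without opening a window.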

<h3>Add First Label</h3>
Here, note the updated bounding box values.

![](z_markdown_jpgs/BoardLabeling-AddFirstBoard.png)

<h3>Add Final Label</h3>
The GUI also lets you add multiple labels or delete any mistaken ones.

![](z_markdown_jpgs/BoardLabeling-AddFinalBoard.png)

<h2>Labeling Smaller Chessboards: An Example</h2>

Some of my screenshots also contain smaller boards: earlier screenshots and frames from a prior point in the video that appear within the image. For completeness, I labeled those as well. (The screenshot shown is also from Chess.com's YouTube channel.)


![Alt_text](z_markdown_jpgs/BoardLabeling-SmallBoards.png)

<h2>Notes on Screenshots and Preparing Labels for Training</h2>
    
<h3> Labeling Speed with the GUI</h3>

<li>With this methodology, I was able to label around 80 screenshots in approximately 2-3 hours.
    <li>At the time, this gave me confidence that if I needed more data, I could get it quickly. <br><br>
        
<h3>Information on Screenshots and Screenshot Size Statistics</h3>

<li>I captured the screenshots on macOS.
<li>Each screenshot is 2880x1800 pixels (width x height) in png format. The average file size is 3.8 MB (min/max/std: 1.6/7.2/1.3 MB). <br><br>
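File size statistics of this kind can be gathered with a few lines of standard-library Python. The sketch below is not taken from gcb_utils: `png_size_stats` is a hypothetical helper, and it assumes 1 MB = 10^6 bytes and a sample standard deviation, either of which may differ from how the figures above were computed.

```python
import statistics
from pathlib import Path


def png_size_stats(folder):
    """Return count, mean, min, max, and std of png file sizes in MB."""
    sizes_mb = [p.stat().st_size / 1e6 for p in sorted(Path(folder).glob('*.png'))]
    if not sizes_mb:
        raise ValueError(f'no png files found in {folder}')
    return {
        'count': len(sizes_mb),
        'mean': statistics.mean(sizes_mb),
        'min': min(sizes_mb),
        'max': max(sizes_mb),
        # stdev needs at least two values; report 0.0 for a single file.
        'std': statistics.stdev(sizes_mb) if len(sizes_mb) > 1 else 0.0,
    }
```

It could be called as `png_size_stats('data/raw/screenshots')`, using the screenshot directory defined earlier in this notebook.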

    
    
<h3>Preparing Screenshots for Training: YAML File for Directories and Classes + yolov5 Labeling Format for Bounding Boxes + Train/Validation/Test Split</h3>
    <li>yolov5 requires a .yaml file that indicates the directories for training and validation images. The file must also state the number of classes in the dataset along with their names.
    <li>In addition to the image directories and the class mapping, a label (class + bounding box) must be available for each image in the yolov5 format. I chose to store this information in a separate file for each image.
    <li>gcb_utils.prepare_scr_input_for_yolov5 serves this purpose. It creates the yaml file and a label file for each image in yolov5 format [e.g. the bounding box representation must be normalized by image width and height].
    <li>The train/validation/test split is done by another function: gcb_utils.split_train_valid_test. It randomly assigns the source images to train/validation/test sets at the indicated ratios and transfers them to their respective directories at the indicated size, along with their normalized label files.
     <li>Please refer to gcb_utils.py for more help/information/functionality and gbc_utils_sample_runs.py for sample runs.
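To make the format concrete, here is a minimal sketch of the two artifacts described above: one label line per box, written as class index, box center, and box size, all normalized by image width and height, and the dataset yaml with `train`, `val`, `nc`, and `names` keys. The helper names `to_yolo_line` and `dataset_yaml` and the directory paths in the example are hypothetical; the actual logic lives in gcb_utils.prepare_scr_input_for_yolov5 and may differ.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel bounding box into one yolov5 label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f'{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}'


def dataset_yaml(train_dir, val_dir, class_names):
    """Build the text of a yolov5 dataset yaml file."""
    names = ', '.join(f"'{name}'" for name in class_names)
    return (f'train: {train_dir}\n'
            f'val: {val_dir}\n'
            f'nc: {len(class_names)}\n'
            f'names: [{names}]\n')


# A board covering the central half of a 2880x1800 screenshot.
label_line = to_yolo_line(0, 720, 450, 2160, 1350, 2880, 1800)
print(label_line)  # 0 0.500000 0.500000 0.500000 0.500000

# Directory paths below are placeholders, not the project's actual layout.
yaml_text = dataset_yaml('data/model/train/images', 'data/model/valid/images',
                         ['chessboard'])
print(yaml_text)
```

An image with several boards gets one such line per board in its label file.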

In [7]:
#uncomment the lines below to see the documentation
#(help() prints the text itself; wrapping it in print() would also print "None")
#help(gcb_utils.prepare_scr_input_for_yolov5)
#help(gcb_utils.split_train_valid_test)
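The random split itself can be sketched as below. This is an illustration only: `split_fnames`, its default ratios, and the fixed seed are assumptions, and the sketch leaves out the file transfer and resizing that gcb_utils.split_train_valid_test is described as doing.

```python
import random


def split_fnames(fnames, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle file names and split them into train/valid/test lists."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError('ratios must sum to 1')
    shuffled = sorted(fnames)              # sort first so the seed fully decides the order
    random.Random(seed).shuffle(shuffled)  # local generator; global random state untouched
    n_train = round(len(shuffled) * ratios[0])
    n_valid = round(len(shuffled) * ratios[1])
    return {
        'train': shuffled[:n_train],
        'valid': shuffled[n_train:n_train + n_valid],
        'test': shuffled[n_train + n_valid:],  # whatever remains
    }


# 80 placeholder file names, matching the rough dataset size mentioned earlier.
splits = split_fnames([f'screenshot_{i:03d}.png' for i in range(80)])
print({name: len(files) for name, files in splits.items()})
```

Each image's label file would need to move together with the image so that the pairs stay matched.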