# Data Processing

Data processing is split into two parts: input processing, then output processing.

Input processing consists mainly of:
1. Selecting the most interesting columns

## Loading the input data and creating Checkboxes

We use checkboxes to simplify column selection. The following cells load the complete data, 
build the list of columns, and create all the checkboxes.

First, as usual, we change the working directory to the root of the project. The working directory should be 
something like 'xxx\Roll Wear Project'.

In [1]:
from utils_notebooks import move_current_path_up
move_current_path_up(n_times=2)
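
The source of `move_current_path_up` is not shown in this notebook. As a rough sketch of what such a helper might do, the hypothetical function `move_up` below (the name and behaviour are assumptions, not the project's actual implementation) walks `n_times` levels up from the current working directory and switches to that folder:

```python
import os
from pathlib import Path


def move_up(n_times: int = 1) -> Path:
    """Hypothetical stand-in for move_current_path_up.

    Changes the working directory to the folder n_times levels above
    the current one and returns the new working directory.
    """
    target = Path.cwd()
    for _ in range(n_times):
        # Path.parent of a filesystem root is the root itself,
        # so this never climbs past the top of the drive.
        target = target.parent
    os.chdir(target)
    return target
```

If the real helper behaves like this sketch, re-running the cell moves up two more levels each time, so it is worth checking the working directory (for example with `Path.cwd()`) before loading any data.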

We load the complete data.

In [2]:
import pandas as pd

input_df = pd.read_hdf('Data/notebooks_data/wear_center.h5', key='inputs')
columns_names = input_df.columns

We create the list of checkboxes from the list of columns.

In [3]:
import IPython.display
import ipywidgets as widgets

checkbox_list = []

supplier_list, family_list, lineup_list = [], [], []
supplier_description, family_description, lineup_description = 'Roll Supplier', 'Strip Families', 'Line Up'

# For each column of the DataFrame, we create a single checkbox,
# except for suppliers, families and lineups, which each get one checkbox for the whole category
for col_name in columns_names:
    if 'supplier' in col_name:
        supplier_list.append(col_name)
    elif 'family' in col_name:
        family_list.append(col_name)
    elif 'lineup' in col_name:
        lineup_list.append(col_name)
    else:
        checkbox_list.append(widgets.Checkbox(description=col_name))
    
checkbox_list.append(widgets.Checkbox(description=supplier_description))
checkbox_list.append(widgets.Checkbox(description=family_description))
checkbox_list.append(widgets.Checkbox(description=lineup_description))

# We add all the checkboxes to one grid container
column_selection_gridbox = widgets.GridBox(checkbox_list, layout=widgets.Layout(grid_template_columns="repeat(3, 300px)"))

# We create the button and output 
button = widgets.Button(description='validate selection and process Inputs')
out = widgets.Output()

# We put all those elements in one final container
final_container = widgets.VBox([column_selection_gridbox, button, out])
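
The grouping step in the cell above can be tried without any widgets. The following is a minimal sketch of the same idea in plain Python; the function `group_columns` and every column name except 'STRIP CODE' are made up for illustration and are not taken from the real dataset:

```python
def group_columns(column_names, keywords=("supplier", "family", "lineup")):
    """Split column names into standalone columns and keyword groups.

    A column goes into the group of the first keyword its name contains.
    The match is case-sensitive, like the `in` tests used in the notebook.
    """
    groups = {keyword: [] for keyword in keywords}
    singles = []
    for name in column_names:
        for keyword in keywords:
            if keyword in name:
                groups[keyword].append(name)
                break
        else:
            # No keyword matched: this column gets its own checkbox.
            singles.append(name)
    return singles, groups


# Hypothetical column names, for illustration only.
example_columns = ["STRIP CODE", "supplier_A", "supplier_B",
                   "family_1", "lineup_3", "F6 WIDTH"]
singles, groups = group_columns(example_columns)
print(singles)
print(groups)
```

With this split, the number of checkboxes is `len(singles)` plus one per group, which keeps the grid short even when a category is spread over many columns.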

## Creating the selection function and interactive button

We create the selection function, which checks which checkboxes are selected and returns a DataFrame containing only 
the corresponding columns. 

In [4]:
def columns_selection():
    """ This function gets the selected columns in the checkboxes and returns the new DataFrame with selected columns 
    
    :return: DataFrame with only selected columns
    """
    new_columns = []
    # We go through all the checkboxes and test whether each one is checked
    for box in checkbox_list:
        if box.value:
            # If the CheckBox is selected we add the corresponding column name to the new list
            name = box.description
            # If the name corresponds to one of the grouped lists, we add that whole list
            if name == supplier_description:
                new_columns.extend(supplier_list)
            elif name == family_description:
                new_columns.extend(family_list)
            elif name == lineup_description:
                new_columns.extend(lineup_list)
            # Otherwise, we add the checkbox name
            else:
                new_columns.append(name)
                
    return input_df[new_columns]
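
To see the expansion logic in isolation, here is a small sketch that replaces the checkboxes with a plain dictionary of label to checked state. The helper `expand_selection` and the sample column names are assumptions made for this example; only the three group labels come from the notebook:

```python
def expand_selection(checked, groups):
    """Turn checked labels into a flat list of column names.

    checked: dict mapping a checkbox label to True/False.
    groups:  dict mapping a group label to the columns it stands for.
    """
    selected = []
    for label, is_checked in checked.items():
        if not is_checked:
            continue
        # A group label expands to all its columns;
        # any other label is itself a column name.
        selected.extend(groups.get(label, [label]))
    return selected


# Hypothetical data, for illustration only.
groups = {
    "Roll Supplier": ["supplier_A", "supplier_B"],
    "Strip Families": ["family_1"],
    "Line Up": ["lineup_3"],
}
checked = {
    "STRIP CODE": True,
    "F6 WIDTH": False,
    "Roll Supplier": True,
    "Line Up": False,
}
columns = expand_selection(checked, groups)
print(columns)
```

The dictionary lookup plays the role of the `if`/`elif` chain in `columns_selection`: adding a new grouped category then only requires a new entry in `groups`.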

Finally, we create the function that is called when the button is clicked, and link it to the previously 
created button.

In [5]:
def on_button_clicked(_):
    # Everything printed inside this block is shown in the output widget
    with out:
        # Remove the message left by a previous click
        IPython.display.clear_output()

        selected_input_df: pd.DataFrame = columns_selection()
        if len(selected_input_df.columns) != 0:
            selected_input_df.to_hdf('Data/notebooks_data/wear_center_preprocessed.h5', key='inputs')

            print('Columns_selected and saved in file.\nSelected columns are : [%r]'
                  % ', '.join(selected_input_df.columns))
        else:
            print('No columns were selected, nothing has been done')

# Link the button to the callback with the button's on_click method
button.on_click(on_button_clicked)

## The columns selection GUI (Graphical User Interface)

Finally, we display the GUI, which is used to select the columns.

In [6]:
final_container

VBox(children=(GridBox(children=(Checkbox(value=False, description='STRIP CODE'), Checkbox(value=False, descri…