# 0.8 Hierarchical Learning

## Boilerplate

The following subsections are largely boilerplate code, so skip around as needed.

### Jupyter Extensions

Load [watermark](https://github.com/rasbt/watermark) to see the state of the machine and environment that's running the notebook. To make sense of the options, take a look at the [usage](https://github.com/rasbt/watermark#usage) section of the readme.

In [1]:
# Load `watermark` extension
%load_ext watermark
# Display the status of the machine and packages. Add more as necessary.
%watermark -v -n -m -g -b -t -p numpy,matplotlib,seaborn,tensorflow

Tue May 28 2019 22:28:57 

CPython 3.6.8
IPython 7.3.0

numpy 1.16.2
matplotlib 3.0.3
seaborn 0.9.0
tensorflow 1.12.0

compiler   : GCC 7.3.0
system     : Linux
release    : 4.4.0-130-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 12
interpreter: 64bit
Git hash   : b05a537249072d1ffeee1d2927e868a4c995b283
Git branch : master


Load [autoreload](https://ipython.org/ipython-doc/3/config/extensions/autoreload.html), which reloads modules marked with `%aimport` before each cell is executed.

Running `%autoreload 2` instead reloads every module *except* those excluded with `%aimport -<module>`.

In [2]:
# Load `autoreload` extension
%load_ext autoreload
# Set autoreload behavior
%autoreload 1

Load `matplotlib` in one of the more `jupyter`-friendly [rich-output modes](https://ipython.readthedocs.io/en/stable/interactive/plotting.html). Some options (that may or may not have worked) are `inline`, `notebook`, and `gtk`.

In [3]:
# Set the matplotlib mode.
%matplotlib inline

### Imports

Static imports that shouldn't necessarily change throughout the notebook.

In [4]:
# Standard library imports
import logging
import os
from pathlib import Path
from copy import deepcopy
from pprint import pprint

# Third party
import IPython as ipy
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from pstar import pdict

Local imports that may or may not be autoreloaded. This section contains things that will likely have to be re-imported multiple times, and have additions or subtractions made throughout the project.

In [5]:
# Utility functions
%aimport leabratf.utils
from leabratf.utils import setup_logging
%aimport leabratf.constants
from leabratf.constants import DIR_DATA_PROC

### Initial Setup

Set [seaborn defaults](https://seaborn.pydata.org/generated/seaborn.set.html) for matplotlib.

In [6]:
sns.set()
sns.set_context("notebook")

Set up the logger configuration to something more useful than baseline. Creates log files for the different log levels in the `logs` directory.

See `logging.yml` for the exact logging configuration.

In [7]:
# Run base logger setup
setup_logging()
# Define a logger object
logger = logging.getLogger('leabratf')

## Task Definitions

### Constants

In [12]:
N_COLORS = 5
N_SHAPES = 4

### Phase Colors and Shapes

In [8]:
# All the colors and shapes
all_colors = [0, 1, 2, 3, 4]
all_shapes = [1, 2, 3, 4]

# Phase A
phase_a_colors = [0, 1, 2]
phase_a_shapes = [1, 2]

# Phase B
phase_b_colors = [0, 1, 2]
phase_b_shapes = [3, 4]

# Phase C
phase_c_colors = [3, 4]
phase_c_shapes = [3, 4]

# Each color corresponds to a particular horizontal line.
#   Colors are not sampled uniformly.
# Each shape corresponds to a particular vertical line.
# Each (color, shape) combination corresponds to one of the actions 1-4.
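
To make the comments above concrete, here is a minimal sketch of a single stimulus on the `N_COLORS` by `N_SHAPES` grid: the color selects a row (horizontal line) and the shape selects a column (vertical line). The helper name `single_stimulus` is hypothetical and not part of `leabratf`. It assumes colors are 0-based row indices and shapes are 1-based labels, as in the lists defined above.

```python
import numpy as np

N_COLORS = 5  # number of rows, as in the Constants cell
N_SHAPES = 4  # number of columns, as in the Constants cell


def single_stimulus(color, shape):
    """Return an (N_COLORS, N_SHAPES) grid with one row and one column set.

    Hypothetical helper: `color` is a 0-based row index and `shape` is a
    1-based shape label.
    """
    grid = np.zeros((N_COLORS, N_SHAPES))
    grid[color, :] = 1      # horizontal line for the color
    grid[:, shape - 1] = 1  # vertical line for the shape
    return grid


example = single_stimulus(color=2, shape=1)
print(example)
```

The row and the column overlap in exactly one cell, so each stimulus contains `N_COLORS + N_SHAPES - 1` ones.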

### Action Mapping


In [13]:
import leabratf.tasks.combinatorics.default_configuration as config
from leabratf.utils import as_list, flatten

def generate_labels(n_samples=1,
                    slots=config.slots,
                    size=config.size,
                    dims=config.dims,
                    n_lines=config.n_lines,
                    line_stats=None):
    """Returns an array of labels to construct the data from.

    Parameters
    ----------
    n_samples : int, optional
        Number of samples to return.

    slots : int, optional
        Number of slots per sample.

    size : int, optional
        Size of the nxn matrix to use for the task.

    dims : int, optional
        Number of dimensions for the task.

    n_lines : int, optional
        Number of lines to draw for each slot.

    line_stats : list or None, optional
        Sampling weights for the ``size * dims`` elements. They are
        normalized to sum to 1.

    Returns
    -------
    arg_ones : np.ndarray of shape ``(n_samples, slots, n_lines)``
        Indices in ``range(size * dims)`` of the lines drawn for each slot.

    Raises
    ------
    ValueError
        If ``n_lines`` is not less than ``size * dims``.
    """
    # Ensure `n_lines` is an int
    n_lines = int(n_lines)
    # This will be useful going forward
    n_idx = size * dims
    # It must be less than the number of available indices
    if n_lines >= n_idx:
        raise ValueError('n_lines must be less than size * dims.')

    # Get the default value for `line_stats` if it is None
    if line_stats is None:
        # Check if these are default conditions, ie n_idx is what ``config``
        # would specify them to be.
        if size == config.size and dims == config.dims:
            line_stats = config.line_stats
        # Otherwise, generate a uniform distribution with the appropriate length
        else:
            line_stats = [1] * size * dims

    # Normalize `line_stats` to sum to 1 if it isn't already
    line_stats = flatten(line_stats)
    line_stats = np.array(line_stats) / sum(line_stats)
        
    # For every slot of every sample, draw `n_lines` distinct indices weighted
    # by `line_stats`, then reshape to `(n_samples, slots, n_lines)`.
    arg_ones = np.array([np.random.choice(range(n_idx),
                                          n_lines,
                                          replace=False,
                                          p=line_stats)
                         for _ in range(n_samples * slots)]).reshape(
                                 (n_samples, slots, n_lines))
    return arg_ones

generate_labels(2)

array([[[1, 2],
        [0, 7],
        [6, 7],
        [6, 4]],

       [[7, 1],
        [6, 8],
        [6, 0],
        [5, 2]]])
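
The function returns line indices rather than filled arrays. One way to turn them into multi-hot labels is sketched below. The helper `indices_to_multi_hot` is hypothetical, and `n_idx=9` is an assumption chosen only because the largest index in the output above is 8; the real value is `size * dims` from the configuration.

```python
import numpy as np


def indices_to_multi_hot(indices, n_idx):
    """Convert indices of shape (..., n_lines) to multi-hot (..., n_idx)."""
    indices = np.asarray(indices)
    multi_hot = np.zeros(indices.shape[:-1] + (n_idx,))
    # Write a 1 at every listed index along the last axis
    np.put_along_axis(multi_hot, indices, 1.0, axis=-1)
    return multi_hot


# Indices laid out as (n_samples, slots, n_lines), like the output above
arg_ones = np.array([[[1, 2], [0, 7]],
                     [[7, 1], [6, 8]]])
labels = indices_to_multi_hot(arg_ones, n_idx=9)  # n_idx=9 is assumed
print(labels.shape)
```

Because the indices are drawn without replacement, every slot in the result contains exactly `n_lines` ones.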

In [69]:
def phase_a_data(n_samples=100,
                 colors=phase_a_colors,
                 shapes=phase_a_shapes,
                ):
    # One-hot color samples of shape (n_samples, N_COLORS, 1). The weights
    # assume three colors, as in `phase_a_colors`.
    color_choices = np.eye(N_COLORS)[np.random.choice(
        colors,
        size=n_samples,
        replace=True,
        p=[.25, .25, .5],
    )].reshape((n_samples, N_COLORS, 1))
    # One-hot shape samples of shape (n_samples, N_SHAPES, 1). Shape labels
    # are 1-based, so shift them to 0-based indices.
    shape_choices = np.eye(N_SHAPES)[np.random.choice(
        [s - 1 for s in shapes],
        size=n_samples,
        replace=True,
        p=[.5, .5],
    )].reshape((n_samples, N_SHAPES, 1))

    # Repeat each color across the columns to draw a horizontal line
    color_array = np.tile(color_choices, N_SHAPES)
    # Repeat each shape across the rows to draw a vertical line
    shape_array = np.transpose(
        np.tile(shape_choices, N_COLORS),
        [0, 2, 1])

    # Overlay both lines: shape (n_samples, N_COLORS, N_SHAPES)
    x_data = np.maximum(color_array, shape_array)
    return x_data

phase_a_data(10)
    

array([[[1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 1., 1., 1.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.]],

       [[1., 0., 0., 0.],
        [1., 1., 1., 1.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.]],

       [[1., 1., 1., 1.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.]],

       [[0., 1., 0., 0.],
        [1., 1., 1., 1.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.]],

       [[1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 1., 1., 1.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.]],

       [[0., 1., 0., 0.],
        [0., 1., 0., 0.],
        [1., 1., 1., 1.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.]],

       [[0., 1., 0., 0.],
        [0., 1., 0., 0.],
        [1., 1., 1., 1.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.]],

       [[0., 1., 0., 0.],
        [1., 1., 1., 1.],
        [0., 1., 0., 0.]