# Purpose
1. Check the bias distribution in a feedforward OS model without cleanup and see whether it is mostly negative.
2. If the bias is related to a node's probability of activation, the two should be correlated: nodes with a low activation probability should have a more negative bias.
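
As a rough illustration of why such a link might be expected, here is a minimal sketch. It assumes sigmoid units with zero net input, which is an assumption for illustration only and not something this notebook confirms about the model. `bias_for_activation_probability` is a hypothetical helper, not part of the codebase. Under that assumption, the bias that reproduces a mean activation `p` is `logit(p)`, which is negative for `p < 0.5` and more negative as `p` shrinks.

```python
import numpy as np

def bias_for_activation_probability(p):
    """Return b such that sigmoid(b) == p.

    Hypothetical helper: assumes a sigmoid unit that receives no other input.
    """
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

# Made-up activation probabilities, from rare to half of the time
p = np.array([0.01, 0.1, 0.5])
b = bias_for_activation_probability(p)

# Rarer units need a more negative bias; p = 0.5 needs no bias at all
print(b)
```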

In [None]:
import os
os.chdir('/home/jupyter/tf')

import troubleshooting
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

In [None]:
code_names = ('OS_ff', 'Refrac3_local', 'Refrac3_after6')

name_map = {
    'OS_ff': 'OS feedforward with no cleanup',
    'Refrac3_local': 'Original triangle with 2-12 tick error injection',
    'Refrac3_after6': 'Triangle with 7-12 tick error injection'
}


In [None]:
def plot_inputs_sem(code_name):
    """Plot bias and input density"""

    d = troubleshooting.Diagnosis(code_name)
    d.eval('train_r100', task='triangle', epoch=d.cfg.total_number_of_epoch)

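    # Approximate input to each semantic unit when every sending unit sits at
    # its initial activation of 0.5: 0.5 * column sums of the weight matrix
    # (a "lazy matmul" with a constant 0.5 vector).
    data = {
        'ss1': 0.5 * np.sum(d.get_weight(name="w_ss"), axis=0),
        'cs1': 0.5 * np.sum(d.get_weight(name="w_cs"), axis=0),
        'os1': 0.5 * np.sum(d.get_weight(name="w_hos_hs"), axis=0),
        'bias1': d.get_weight(name="bias_s"),  # bias is taken as is
    }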

    df = pd.DataFrame.from_dict(data)

    fig, ax = plt.subplots(1, 4, figsize=(15,6))
    for i, k in enumerate(data.keys()):
        df[k].plot.density(ax=ax[i], title=k)

    fig.suptitle(name_map[code_name])

    return df

In [None]:
df_og = plot_inputs_sem('Refrac3_local')

- For inputs, we multiply the weights by the initialized state, 0.5. With real representations the activation will be much lower, which will lead to a lower input.
- For bias, it is taken as is.
- Bias mean is slightly positive!?
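
The `0.5 * np.sum(w, axis=0)` shortcut used for the inputs should be equivalent to multiplying a constant 0.5 activation vector by the weight matrix. A minimal sketch with a made-up weight matrix follows. The shape and values are illustrative, and it assumes weights are stored as `(n_inputs, n_outputs)`, which matches the `axis=0` sum but is not verified against the model code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weight matrix: 4 sending units, 3 receiving units
w = rng.normal(size=(4, 3))

# Every sending unit at its initial activation of 0.5
act = np.full(4, 0.5)

full_input = act @ w                  # explicit matmul
lazy_input = 0.5 * np.sum(w, axis=0)  # shortcut: scale the column sums

print(np.allclose(full_input, lazy_input))
```

The shortcut only holds while all sending units share the same activation. With real (sparse) representations the explicit matmul would be needed.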

In [None]:
df_a6 = plot_inputs_sem('Refrac3_after6')

With error injection at ticks 7-12, the bias mean is near zero.

In [None]:
df_ff = plot_inputs_sem('OS_ff')

- bias in OS feedforward is almost all negative

# Probability of activation per slot in semantic

In [None]:
import data_wrangling
data = data_wrangling.MyData()

Normalized log word frequency is the log10 frequency of each word (after adding 1) divided by the maximum log10 frequency over all words:

$\Huge{swf_i = \frac{\log_{10}(wf_i + 1)}{\max_k \log_{10}(wf_k + 1)}}$

Weighted activation probability of each node:

$\Huge{p_j = \frac{1}{n} \sum_{i=1}^{n} swf_i \cdot act_{ij}}$

where i is the word index, j is the unit index, and n is the number of words.
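
To make the two definitions concrete, here is a small worked example on made-up data (three words, two units). It reads the weighted probability as the mean over words of `swf_i * act_ij`, and uses broadcasting in place of tiling. All values are illustrative assumptions, not taken from the training set.

```python
import numpy as np

# Made-up raw word frequencies for 3 words
wf = np.array([0, 9, 99])

# Made-up binary semantic representations: rows are words, columns are units
act = np.array([[1, 0],
                [1, 1],
                [0, 1]], dtype=float)

# Normalized log10 frequency per word
log_wf = np.log10(wf + 1)
swf = log_wf / log_wf.max()

# Frequency-weighted activation probability per unit:
# scale each word's row by its swf, then average over words
p_weighted = np.mean(swf[:, None] * act, axis=0)

# Unweighted counterpart: plain mean activation per unit
p_unweighted = act.mean(axis=0)

print(p_weighted, p_unweighted)
```

Both units are active in two of the three words, so their unweighted probabilities match. The weighted value is higher for the second unit because it is active in the more frequent words.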

In [None]:
# Frequency weighting
wf = np.log10(data.df_train.wf + 1)
swf = wf / wf.max()

# Tile (copy) to fit all 2446 units
wf_tile = np.transpose(np.tile(swf, reps=(2446, 1)))

unweighted_p = data.np_representations['sem']
weighted_activation = unweighted_p * wf_tile

# Probability of activation in each node
mean_unweighted_p = np.mean(unweighted_p, axis=0)
mean_weighted_p = np.mean(weighted_activation, axis=0)

def pad_sq(x: np.ndarray) -> np.ndarray:
    """Zero-pad the 2446-unit vector to 2500 units and reshape it to (50, 50)."""
    assert len(x) == 2446
    return np.concatenate([x, np.zeros((54,))]).reshape(50, 50)



## Unweighted activation probability

In [None]:
plt.imshow(pad_sq(mean_unweighted_p), cmap='hot')
plt.colorbar()
plt.title('Unweighted probability of activation per semantic node')

- Sparse
- High p nodes exist

In [None]:
# 10 nodes having unweighted p > 0.1
mean_unweighted_p[mean_unweighted_p>0.1]

## Weighted by word frequency

In [None]:
plt.imshow(pad_sq(mean_weighted_p), cmap='hot')
plt.colorbar()
plt.title('Log frequency weighted probability of activation per semantic node')

In [None]:
# Bias in OS feedforward model
bias_ff_pad_sq = pad_sq(df_ff.bias1)
plt.imshow(bias_ff_pad_sq, cmap='hot')
plt.colorbar()
plt.title('Bias in OS feedforward model')

# Correlation between bias and node activation probability

In [None]:
df_units = pd.DataFrame({
    "bias_original": df_og.bias1,
    "bias_after6": df_a6.bias1,
    "bias_ff": df_ff.bias1,
    "unit_act_p_unweighted": mean_unweighted_p,
    "unit_act_p_weighted": mean_weighted_p
})

df_units.corr(method='pearson')

- Correlation magnitude is small...
- In the feedforward model it is negative!?
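
One possible follow-up check: because activation probabilities are sparse and heavily skewed, Pearson correlation may understate a monotonic but non-linear relation, and a rank correlation can be compared alongside it. The sketch below uses made-up values and a made-up log relation purely to show the two `DataFrame.corr` methods side by side. It says nothing about what the real `df_units` would show.

```python
import numpy as np
import pandas as pd

# Made-up, heavily skewed activation probabilities
p = np.array([0.001, 0.002, 0.005, 0.01, 0.05, 0.4])

# Made-up bias that rises monotonically but non-linearly with p
bias = np.log(p)

toy = pd.DataFrame({"bias": bias, "unit_act_p": p})

pearson = toy.corr(method="pearson").loc["bias", "unit_act_p"]
spearman = toy.corr(method="spearman").loc["bias", "unit_act_p"]

# Spearman works on ranks, so a perfectly monotonic relation scores 1
print(f"pearson={pearson:.3f}, spearman={spearman:.3f}")
```

On the real data the same comparison would be `df_units.corr(method='spearman')` next to the Pearson table.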