## Init and Run GMM

In [1]:
%%javascript
Jupyter.utils.load_extensions('tdb_ext/main')

In [2]:
#this sets the backend to jupyter/ipython that (i think) displays
#     images directly. anyway, it prevents the matplotlib framework
#     python error that is my least favorite thing eeeevvvveeeer.
%matplotlib notebook

import sys
import os
os.chdir('/Users/azane/GitRepo/spider') #TODO just make actual modules?
sys.path.append("./scripts27")
sys.path.append("./scripts27/gauss_mix")

import gmix_model as gmix
import numpy as np
import tdb as tdb
import tensorflow as tf
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import gmix_sample_mixture as smpl
import graph_NPZ as graph_highD

In [3]:
def remove_nan_rows(x, y):
    #hstack negated isnan checks
    b = ~(np.hstack((np.isnan(x), np.isnan(y))))
    #get rows where all columns are True (not nan)
    b = b.all(axis=1)
    
    return x[b], y[b]
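
A quick check of `remove_nan_rows` on toy data. The arrays below are made up for illustration, and the function body is repeated only so the snippet runs on its own. Note that both inputs are assumed to be 2-D, since `np.hstack` joins them column-wise.

```python
import numpy as np

def remove_nan_rows(x, y):
    # same logic as the cell above: keep rows with no NaN in x or y
    b = ~(np.hstack((np.isnan(x), np.isnan(y))))
    b = b.all(axis=1)
    return x[b], y[b]

# toy data, made up for illustration
toy_x = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, 5.0]])
toy_y = np.array([[0.1], [0.2], [np.nan]])

# row 1 has a NaN in x and row 2 has a NaN in y, so only row 0 survives
clean_x, clean_y = remove_nan_rows(toy_x, toy_y)
```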

In [4]:
#read in training and test data
s_x, s_t = gmix.get_xt_from_npz('data/spi_gmix_train.npz', True)
t_x, t_t = gmix.get_xt_from_npz('data/spi_gmix_test.npz', True)

#train for only some dimensions.
xDims = np.array([0,1,2,-1]) #muscle, muscle, balance, sensor, time
s_x = s_x[:,xDims]
t_x = t_x[:,xDims]

s_x, s_t = remove_nan_rows(s_x, s_t)
t_x, t_t = remove_nan_rows(t_x, t_t)

#TEMP
#expand target dimension so variance can be happy.
scaleOut = 100
s_t *= scaleOut
t_t *= scaleOut

### Note on Scaling and Variance Saturation:

   ```GaussianMixtureModel``` needs to handle the variance scaling. The question is, should it scale up from tanh and *then* calculate loss? Or should it keep everything within the tanh range, and then scale up only for outputs?
   
   If the actual range is large, then it would be more accurate and safer to scale up to the range. But if the range is small, as in the spider example, it would actually be safer to keep it at tanh. I think if it can be determined that the tanh range can accurately represent the means and variances, then it's best to keep it there. Otherwise, we might need to expand everything to a middle-man range where loss can be calculated, and then expand to the actual range on formula retrieval.
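
To make the second option concrete, here is a minimal sketch of keeping targets inside the tanh range for the loss and expanding only on retrieval. None of this is in `GaussianMixtureModel`; the function names and the `margin` parameter are hypothetical, and the targets are assumed not to be constant. The one fixed fact is the arithmetic: a mean maps back linearly, while a variance scales with the square of the scale factor.

```python
import numpy as np

def fit_target_scaler(t, margin=0.9):
    """Find the affine map that puts targets inside (-margin, margin).

    `margin` < 1 is an assumption that keeps the extremes away from
    the saturated ends of tanh.
    """
    lo = np.min(t, axis=0)
    hi = np.max(t, axis=0)
    center = (hi + lo) / 2.0
    half_range = (hi - lo) / (2.0 * margin)
    return center, half_range

def to_tanh_range(t, center, half_range):
    # scale targets down before the loss is calculated
    return (t - center) / half_range

def mean_to_actual(mu, center, half_range):
    # means map back through the same affine transform
    return center + half_range * mu

def var_to_actual(var, half_range):
    # variances scale with the square of the scale factor
    return half_range ** 2 * var

# toy targets roughly in the range plotted later in this notebook
toy_t = np.array([[-0.03], [0.0], [0.03]])
center, half_range = fit_target_scaler(toy_t)
scaled_t = to_tanh_range(toy_t, center, half_range)
```

With this arrangement the `scaleOut` multiplier would not be needed on the training data, at the cost of rescaling means and variances wherever the trained parameters are consumed.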

In [5]:
#create gmm with data
#draw a fresh seed each run, but keep it in `seed` so a run can be reproduced.
seed = np.random.randint(100000)
np.random.seed(seed)
gmm = gmix.GaussianMixtureModel(np.copy(s_x), np.copy(s_t),
                                np.copy(t_x), np.copy(t_t),
                               numGaussianComponents=15, hiddenLayerSize=20,
                               learningRate=1e-3) #0.005 worked for 2d

### Thoughts on Hyperparameters

  * The larger the hidden layers, the more equivalent representations of a globally best solution exist. Thus, a lower learning rate can be afforded, as there are more routes out of local minima.
  * Small hidden layers may require a larger learning rate so training can jump out of local minima.
  * Examining the mixing coefficient averages reveals whether some gaussian components are going unused. The number of unused components should be minimized.
 
### Hyperparameters as Variables

   * We need an **intelligent learning rate**. It should guess whether it is stuck in a local minimum or homing in on a good solution. If it thinks it is stuck, set the learning rate high to jump out; if it is working on a good solution, keep the learning rate low to stay on track.
      * The loss function needs to be scaled by the number of samples (for example, averaged rather than summed); otherwise we'll see steeper gradients for larger sample batches.
   * The **number of gaussian components** can be selected based on how many are being used, and how much. Changing this during training would require restructuring the network, however, and preserving training across the restructuring may be impossible.
      * In other words, this may slow down training considerably, but the reduced complexity would vastly speed up execution.
   * It may be worth spawning **a number of networks** working on the same problem. This is a good way to determine whether a **local or global** solution has been found: if independently initialized networks settle on the same solution, it is more likely to be global.
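
The learning-rate idea above can be sketched as a simple plateau test. This is only one possible heuristic, not something `gmix_model` implements; the function names, the window size, the tolerance, and the two rates are all hypothetical placeholders to tune.

```python
import numpy as np

def per_sample_loss(sample_losses):
    # average instead of sum, so gradient size does not grow with batch size
    return np.mean(sample_losses)

def choose_learning_rate(loss_history, low_rate=1e-3, high_rate=1e-2,
                         window=5, tol=1e-3):
    """Return high_rate if the loss looks stuck, low_rate otherwise.

    Compares the mean loss of the last `window` reports against the
    `window` reports before them.
    """
    if len(loss_history) < 2 * window:
        return low_rate  # not enough history to judge yet
    recent = np.mean(loss_history[-window:])
    previous = np.mean(loss_history[-2 * window:-window])
    # relative improvement; the floor avoids dividing by zero
    improvement = (previous - recent) / max(abs(previous), 1e-12)
    if improvement < tol:
        return high_rate  # plateau or getting worse: try to jump out
    return low_rate       # still improving: stay on track

# toy loss curves, made up for illustration
flat_losses = [1.0] * 10
improving_losses = [float(v) for v in range(10, 0, -1)]
rate_when_flat = choose_learning_rate(flat_losses)
rate_when_improving = choose_learning_rate(improving_losses)
```

A plateau test like this cannot tell a local minimum from a global one; it only reports that progress has stalled, which is why comparing several independently initialized networks remains useful.
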

#### ```numGaussianComponents``` Hyperparameter
If the network is initialized with a large number of gaussian components, there are more chances for a mean to start close to the correct values. The most relevant components (or parts of those components; sampling can occur given mixing coefficients) can then be selected, and a data set can be built to train only the network layer for the means; that is, to train the output activations of the means. After the output activations of the means are trained, normal training can resume considering the full GMM.

## Train Step

In [41]:
%%capture
#d will be a dictionary of evaluated tensors under their standard name.
sessions = 20

sessionIterations = 100
runTimes = sessions*sessionIterations

gmm.train(iterations=runTimes, testBatchSize=1000,
          trainBatchSize=3000, reportEvery=sessionIterations)



## Visualization

In [44]:
m, v, u = gmm.get_xmvu()

#play with stuff
#v = np.ones((t_t.shape))*0.01

x, y = smpl.sample_mixture(t_x, m, v, u) #set to gmm sample

xCols = [1,2,-1]
yLow = -0.03*scaleOut
yHigh = 0.03*scaleOut

#get figure
fig, _ = graph_highD.graph3x1y(x, y, xCols=xCols,
                      yLow=yLow, yHigh=yHigh,
                      sbpltLoc=211, numPoints=1000)
#use the previous figure
graph_highD.graph3x1y(s_x, s_t, xCols=xCols,
                      yLow=yLow, yHigh=yHigh, fig=fig,
                      sbpltLoc=212, numPoints=1000)
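
`smpl.sample_mixture` is not shown in this notebook, so as a rough picture of what drawing from a fitted mixture involves, here is a minimal sketch for a 1-D target. It is not the actual implementation. It assumes `mix` and `mean` have shape (N, K), that `var` broadcasts to (N, K), and that each row of `mix` sums to 1; the function name is hypothetical.

```python
import numpy as np

def sample_gaussian_mixture(mix, var, mean, rng):
    """Draw one target value per row from a 1-D gaussian mixture."""
    n, k = mix.shape
    # pick a component per row by inverting the cumulative mixing weights
    cdf = np.cumsum(mix, axis=1)
    cdf = cdf / cdf[:, -1:]  # renormalize to guard against rounding
    r = rng.uniform(size=(n, 1))
    comp = np.minimum((r > cdf).sum(axis=1), k - 1)
    rows = np.arange(n)
    # sample from the chosen component's gaussian
    std = np.sqrt(np.broadcast_to(var, (n, k))[rows, comp])
    return mean[rows, comp] + std * rng.standard_normal(n)

# toy parameters, made up for illustration: all weight on the last component
rng = np.random.RandomState(0)
toy_mix = np.tile([0.0, 0.0, 1.0], (5, 1))
toy_mean = np.tile([-5.0, 0.0, 5.0], (5, 1))
toy_var = np.full((5, 1), 1e-12)
toy_samples = sample_gaussian_mixture(toy_mix, toy_var, toy_mean, rng)
```

Because each draw commits to a single component, any component with a non-trivial mixing coefficient in a region where it fits poorly will show up as scattered points in the sampled plot.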

## Debugging

In [45]:
evalStr = [
    'calc_agg_grad_w1',
    'calc_agg_grad_b1',
    'calc_agg_grad_w2',
    'calc_agg_grad_b2',
    'calc_agg_grad_w3',
    'calc_agg_grad_b3',

    'v',
    'm',
    
    'w1',
    'w2',
    'w3',
    'b1',
    'b2',
    'b3'
    ]
d = gmm.get_evals(evalStr)

In [46]:
#save parameters of trained network for use by the spider brain.
#TODO fix variance scaling, otherwise,
#   the spider will need to rescale the output.
np.savez('data/spi_gmm_wb.npz',
         w1=d['w1'],
         w2=d['w2'],
         w3=d['w3'],
         b1=d['b1'],
         b2=d['b2'],
         b3=d['b3']
        )

In [10]:
%%capture
#print each aggregated gradient under its name.
for name in ['calc_agg_grad_w1', 'calc_agg_grad_b1',
             'calc_agg_grad_w2', 'calc_agg_grad_b2',
             'calc_agg_grad_w3', 'calc_agg_grad_b3']:
    print name
    print d[name]

In [47]:
print d['v']
print np.mean(d['v'])

[[ 0.47956634]
 [ 0.44731617]
 [ 0.31900981]
 ..., 
 [ 0.18411332]
 [ 0.19727083]
 [ 0.21820399]]
0.168724


In [12]:
print np.mean(d['m'], 0)
print np.max(d['m'], 0)
print np.min(d['m'], 0)

[ 0.05314204  0.0904423   0.04779665  0.06963902  0.07524298  0.0609627
  0.07716006  0.08456244  0.06270435  0.06910693  0.06239186  0.06995435
  0.05773248  0.06231488  0.05684701]
[ 0.11496955  0.19471961  0.11017855  0.16878057  0.16479446  0.15127546
  0.15141183  0.17052777  0.12337659  0.17438205  0.1124177   0.14675713
  0.12191352  0.15784413  0.18477817]
[ 0.01846081  0.01284278  0.0160036   0.01311039  0.01314539  0.01776297
  0.01994157  0.01717548  0.01786521  0.01679872  0.02452451  0.01897383
  0.01785818  0.01560597  0.01376266]


### Note on Mixing Coefficients
I have yet to see a mixing coefficient much below 0.1. This tells me something may be awry, and it may be the cause of many of the stray points.
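
One way to put numbers on component usage is to compare each component's average mixing coefficient with the uniform share 1/K. With 15 components that share is 1/15 ≈ 0.067, and the column means printed above (about 0.048 to 0.090) sum to roughly 1, which is consistent with `d['m']` holding per-sample mixing coefficients. The sketch below assumes that shape, (N, K); the function name and the `threshold_fraction` cutoff are hypothetical.

```python
import numpy as np

def component_usage(mix, threshold_fraction=0.5):
    """Flag components whose average mixing coefficient is well below uniform.

    mix: array of shape (N, K), one row of mixing coefficients per sample.
    """
    mean_share = np.mean(mix, axis=0)
    uniform_share = 1.0 / mix.shape[1]
    underused = np.flatnonzero(mean_share < threshold_fraction * uniform_share)
    return mean_share, uniform_share, underused

# toy coefficients, made up for illustration
toy_mix = np.array([[0.50, 0.45, 0.05],
                    [0.50, 0.45, 0.05]])
mean_share, uniform_share, underused = component_usage(toy_mix)
```

In the notebook this would be called as `component_usage(d['m'])`. An empty `underused` result would mean every component keeps a share close to uniform, so none is ever switched off.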

In [13]:
print d['calc_agg_grad_w1']*gmm.learningRate

[[ -5.08141471e-04  -9.37364312e-05  -2.85662856e-04   2.83353234e-04
    3.86066997e-04   3.42515908e-04   6.71011396e-04   6.39747363e-04
    8.88504946e-05  -4.05519764e-04  -1.46168086e-03   2.65100880e-05
   -6.12867239e-04  -1.93101514e-04   1.68259867e-04  -3.21105850e-04
    2.17349327e-04   1.23808728e-04   8.80030158e-04  -5.16059459e-04]
 [  8.60669068e-04   2.27946715e-04   3.16140300e-04  -1.15959498e-03
   -1.10112247e-04  -1.45411026e-03  -6.14168646e-04   1.51275087e-03
   -1.41247001e-03   1.35937799e-03   6.11135736e-04   1.04572647e-03
    2.18207424e-04   2.09775841e-04   2.00235238e-03  -1.46688323e-03
    2.31269471e-04   1.08094566e-04  -4.81867843e-04   4.20850905e-04]
 [ -4.81804134e-04   1.69261002e-05  -3.76444688e-04   1.30146043e-04
    2.52861122e-04   6.83020859e-04   8.37785425e-04   1.91859042e-04
    3.31604533e-04  -1.79427676e-04  -7.67519989e-04  -3.28905473e-04
   -4.76842106e-04  -2.44983792e-04  -3.31875461e-04   2.05940727e-04
    4.06062114e-04