## Init and Run GMM

In [1]:
%%javascript
Jupyter.utils.load_extensions('tdb_ext/main')


In [2]:
#this sets the matplotlib backend to the interactive notebook backend,
#     which displays figures directly in the notebook. it also avoids the
#     macOS "python is not installed as a framework" matplotlib error,
#     my least favorite error.
%matplotlib notebook

import sys
import os
os.chdir('/Users/azane/GitRepo/spider') #TODO just make actual modules?
sys.path.append("./scripts27")
sys.path.append("./scripts27/gauss_mix")

import gmix_model as gmix
import numpy as np
import tdb as tdb
import tensorflow as tf
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import gmix_sample_mixture as smpl
import graph_NPZ as graph_highD
import spider_solution_explorers as sexp

In [3]:
def remove_nan_rows(x, y):
    #stack the isnan checks for x and y side by side, then negate (True = not nan)
    b = ~(np.hstack((np.isnan(x), np.isnan(y))))
    #get rows where all columns are True (not nan)
    b = b.all(axis=1)
    
    return x[b], y[b]
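
A quick check of `remove_nan_rows` on made-up toy arrays: a row is kept only if neither `x` nor `y` has a NaN in it. The function assumes both arrays are 2-D with the same number of rows; a 1-D `y` would make `np.hstack` fail because the inputs would have different numbers of dimensions.

```python
import numpy as np

def remove_nan_rows(x, y):
    #stack the isnan checks for x and y side by side, then negate (True = not nan)
    b = ~(np.hstack((np.isnan(x), np.isnan(y))))
    #keep rows where every column is non-nan
    b = b.all(axis=1)
    return x[b], y[b]

#toy data: row 1 has a nan in x, row 2 has a nan in y
x = np.array([[0.1, 0.2],
              [np.nan, 0.4],
              [0.5, 0.6],
              [0.7, 0.8]])
y = np.array([[1.0],
              [2.0],
              [np.nan],
              [4.0]])

x_clean, y_clean = remove_nan_rows(x, y)
print(x_clean.shape)  # (2, 2)
print(y_clean.shape)  # (2, 1)
```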

In [4]:
#read in training and test data
s_x, s_t = gmix.get_xt_from_npz('data/spi_data.npz', True)
#NOTE: the test set is currently read from the same file as the training set.
#t_x, t_t = gmix.get_xt_from_npz('data/spi_gmix_test.npz', True)
t_x, t_t = gmix.get_xt_from_npz('data/spi_data.npz', True)

#train for only some dimensions.
#xDims = np.array([0,1,2,-1]) #muscle, muscle, balance, time
#s_x = s_x[:,xDims]
#t_x = t_x[:,xDims]

s_x, s_t = remove_nan_rows(s_x, s_t)
t_x, t_t = remove_nan_rows(t_x, t_t)

#TEMP
#expand the target range so the variance is not saturated.
#scaleOut = 100
#s_t *= scaleOut
#t_t *= scaleOut

### Note on Scaling and Variance Saturation

`GaussianMixtureModel` needs to handle variance scaling. The question is whether it should scale up from the tanh range and *then* calculate the loss, or keep everything within the tanh range and scale up only the outputs.

If the actual range is large, it would be more accurate and safer to scale up to that range. But if the range is small, as in the spider example, it would actually be safer to stay in the tanh range. I think that if it can be determined that the tanh range can accurately represent the means and variances, it's best to keep them there. Otherwise, we might need to expand everything to an intermediate range where the loss can be calculated, and then expand to the actual range when the formula is retrieved.
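
To make the trade-off concrete, here is a minimal sketch, assuming the target range is mapped affinely from the tanh range [-1, 1] to [lo, hi]; the helpers `tanh_to_range` and `gauss_nll` are hypothetical, not part of `gmix_model`. Under y = a*z + b, a Gaussian's mean maps to a*mu + b and its variance to a**2 * var, and the negative log-likelihood only shifts by the constant log|a|. So in exact arithmetic both choices share the same optimum; the difference between them is numerical, e.g. whether the variances the network can actually output (including any floor or saturation) cover the spread of the targets.

```python
import numpy as np

def tanh_to_range(mu, var, lo, hi):
    """Hypothetical helper: map Gaussian params from [-1, 1] to [lo, hi].

    Uses y = a*z + b with a = (hi - lo)/2 and b = (hi + lo)/2, so the mean
    maps affinely and the variance scales by a**2.
    """
    a = (hi - lo) / 2.0
    b = (hi + lo) / 2.0
    return a * mu + b, a ** 2 * var

def gauss_nll(t, mu, var):
    """Negative log-likelihood of t under a 1-D Gaussian N(mu, var)."""
    return 0.5 * np.log(2.0 * np.pi * var) + (t - mu) ** 2 / (2.0 * var)

#a target, mean and variance expressed in the tanh range
z, mu_z, var_z = 0.2, 0.1, 0.04
lo, hi = -4.0, 2.0   #made-up target range

mu_y, var_y = tanh_to_range(mu_z, var_z, lo, hi)
y, _ = tanh_to_range(z, 0.0, lo, hi)   #map the target itself (variance unused)

a = (hi - lo) / 2.0
shift = gauss_nll(y, mu_y, var_y) - gauss_nll(z, mu_z, var_z)
print(np.isclose(shift, np.log(a)))  # True
```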

### GMM Init

In [5]:
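#create gmm with data
#seed with a random value, but print it so a good run can be reproduced.
#  (this only seeds numpy's RNG, not tensorflow's.)
seed = np.random.randint(100000)
print 'seed:', seed
np.random.seed(seed)
gmm = gmix.GaussianMixtureModel(np.copy(s_x), np.copy(s_t),
                                np.copy(t_x), np.copy(t_t),
                                numGaussianComponents=15, hiddenLayerSize=20,
                                learningRate=1e-2) #0.005 worked for 2d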

### ExplorerHQ Init

In [6]:
explorerHQ = sexp.ExplorerHQ(numExplorers=3,
                             
                             xRange=gmm.inRange, sRange=gmm.outRange,
                             #note that because the explorers use the
                             #  refDict of the net being trained, they
                             #  see the updated weights automatically,
                             #  so there is no need to call the updater.
                             forwardRD=gmm.get_refDict(),
                             
                             certainty_func=sexp.gmm_bigI,
                             expectation_func=sexp.gmm_expectation,
                             parameter_update_func=sexp.gmm_p_updater,
                             
                             sensorGoal=np.array([-1.7]),
                             modifiers=dict(C=.2, T=.1, S=2.))

### Thoughts on Hyperparameters

  * Larger hidden layers can represent the globally best solution in more ways, so a slower learning rate can be afforded: there are more routes out of local minima.
  * Small hidden layers may require a larger learning rate to jump out of local minima.
  * Averaging the mixing coefficients over samples reveals whether some Gaussian components are going unused. The number of such components should be minimized.
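
A minimal sketch of that check, assuming the mixing coefficients come as an array of shape `(numSamples, numGaussianComponents)` with rows summing to 1 (the per-component averages printed in the Debugging section do sum to about 1). The helper name `underused_components` and its default threshold, half the uniform weight 1/K, are arbitrary choices for illustration.

```python
import numpy as np

def underused_components(mix, threshold=None):
    """Flag mixture components with a small average mixing coefficient.

    mix: array of shape (numSamples, numComponents); each row sums to 1.
    threshold: defaults to half the uniform weight, 0.5/numComponents
               (an arbitrary choice for illustration).
    Returns (indices of flagged components, per-component averages).
    """
    avg = mix.mean(axis=0)
    if threshold is None:
        threshold = 0.5 / mix.shape[1]
    return np.flatnonzero(avg < threshold), avg

#toy mixing coefficients for 3 samples and 4 components
mix = np.array([[0.50, 0.30, 0.18, 0.02],
                [0.40, 0.40, 0.15, 0.05],
                [0.60, 0.20, 0.17, 0.03]])
flagged, avg = underused_components(mix)
print(flagged)  # [3]
```

With the trained model, something like `underused_components(d['m'])` would list candidate components to drop.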
 
### Hyperparameters as Variables

   * We need an **intelligent learning rate**. It should guess whether training is stuck in a local minimum or homing in on a good solution. If it thinks it's stuck, it should raise the learning rate to jump out; if it's working on a good solution, it should keep the learning rate low to stay on track.
      * The loss should be normalized by the number of samples (e.g. a mean rather than a sum); otherwise larger batches produce steeper gradients.
   * The **number of Gaussian components** can be selected based on how many components are used, and how heavily. Changing it during training would require restructuring the network, however, and preserving training progress across a restructuring may be impossible.
      * In other words, this may slow down training considerably, but the reduced complexity would make execution much faster.
   * It may be worth spawning **a number of networks** working on the same problem. This is a good way to determine whether a **local or global** solution has been found.

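
The batch-size point can be checked directly. In this sketch (plain NumPy, a one-parameter model `y = w * x` fitted with squared error, chosen only for illustration), duplicating the batch doubles the gradient of the summed loss, while the gradient of the mean loss is unchanged. Normalizing the loss by the number of samples keeps the effective step size independent of `trainBatchSize`.

```python
import numpy as np

def grads(w, x, t):
    """Gradients w.r.t. w of the summed and the mean squared error
    for the toy model y = w * x."""
    resid = w * x - t
    g_sum = np.sum(2.0 * resid * x)    #d/dw of sum((w*x - t)**2)
    g_mean = g_sum / x.shape[0]        #d/dw of mean((w*x - t)**2)
    return g_sum, g_mean

rng = np.random.RandomState(0)
x = rng.randn(100)
t = 3.0 * x + 0.1 * rng.randn(100)

#the same data, but a batch twice as large
x2, t2 = np.concatenate([x, x]), np.concatenate([t, t])

g_sum1, g_mean1 = grads(0.5, x, t)
g_sum2, g_mean2 = grads(0.5, x2, t2)
print(np.isclose(g_sum2, 2 * g_sum1))  # True: summed loss grows with batch size
print(np.isclose(g_mean2, g_mean1))    # True: mean loss does not
```
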
#### `numGaussianComponents` Hyperparameter
If the network is initialized with a large number of Gaussian components, there are more chances for a mean to start close to the correct values. The most relevant components (or parts of those components; sampling can occur given the mixing coefficients) can then be selected, and a data set can be built to train only the network layer for the means, that is, the output activations of the means. Once those are trained, normal training of the full GMM can resume.
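
One possible reading of the selection step, sketched under assumptions: for 1-D targets, compute each component's responsibility for each sample, gamma_nk proportional to pi_nk * N(t_n | mu_nk, var_nk), and assign each sample to its most responsible component. The (sample, assigned component, target) triples would then form the data set for training only the means' output activations. The function name is hypothetical, and the shapes assume `(numSamples, numGaussianComponents)` arrays for the mixing coefficients, means and variances.

```python
import numpy as np

def responsibilities(t, mix, mu, var):
    """Posterior probability that each component generated each target.

    t: (N, 1) targets; mix, mu, var: (N, K) per-sample mixture parameters.
    Computed in log space for numerical stability.
    """
    log_p = (np.log(mix)
             - 0.5 * np.log(2.0 * np.pi * var)
             - (t - mu) ** 2 / (2.0 * var))
    log_p -= log_p.max(axis=1, keepdims=True)   #stabilize before exp
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

#toy example: 2 samples, 3 components with equal weights and variances
t = np.array([[0.9], [-0.6]])
mix = np.full((2, 3), 1.0 / 3.0)
mu = np.array([[-1.0, 0.0, 1.0],
               [-1.0, 0.0, 1.0]])
var = np.full((2, 3), 0.1)

gamma = responsibilities(t, mix, mu, var)
best = gamma.argmax(axis=1)   #component assigned to each sample
print(best)  # [2 0]
```

Pairing `best` with `t[:, 0]` gives, per sample, which mean output should be trained toward which target.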

# Train Step

In [26]:
%%capture
#train for sessions*sessionIterations steps, reporting every sessionIterations steps.
sessions = 3

sessionIterations = 100
runTimes = sessions*sessionIterations

gmm.train(iterations=runTimes, testBatchSize=1000,
          trainBatchSize=3000, reportEvery=sessionIterations)

In [27]:
#get trained data.
#d will be a dictionary of evaluated tensors under their standard names.
evalStr = [
    'calc_agg_grad_w1',
    'calc_agg_grad_b1',
    'calc_agg_grad_w2',
    'calc_agg_grad_b2',
    'calc_agg_grad_w3',
    'calc_agg_grad_b3',

    'v',
    'm',
    
    'w1',
    'w2',
    'w3',
    'b1',
    'b2',
    'b3'
    ]
d = gmm.get_evals(evalStr)

# Visualization

In [38]:
#get mvu from test vals
m, v, u = gmm.get_xmvu()

#send weights to explorerHQ
explorerHQ.update_params(w1=d['w1'],
                         w2=d['w2'],
                         w3=d['w3'],
                         b1=d['b1'],
                         b2=d['b2'],
                         b3=d['b3'])

#get point value calculations
_, pv_v, pv_c, pv_t, pv_s, pv_tests = explorerHQ.graph_space(s_x)
#expand to 2d for graphing reqs.
pv_v = np.expand_dims(pv_v, 1)

#get gmm estimations
_, y_smpl = smpl.sample_mixture(s_x, m, v, u) #set to gmm sample
_, y = smpl.mixture_expectation(s_x, m, v, u) #set to gmm expectation

xCols = [1,2,-1]
yLow = None  #-0.03*scaleOut
yHigh = None  #0.03*scaleOut

modifiers: 
{'C': 0.12571524300793382, 'S': 0.76053290110100413, 'T': 0.11375185589106206}
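
For reference, a minimal sketch of what an expectation and a sample from the mixture mean for 1-D targets: the expectation is sum_k pi_k * mu_k, and a sample is drawn by first picking a component k with probability pi_k and then drawing from N(mu_k, var_k). This is a stand-in written for illustration, not the code in `gmix_sample_mixture`; it assumes per-sample arrays of shape `(numSamples, numGaussianComponents)`, and the argument order of the real `smpl` functions differs.

```python
import numpy as np

def mixture_expectation(mix, mu):
    """E[t | x] = sum_k mix_k * mu_k, one value per sample -> shape (N, 1)."""
    return np.sum(mix * mu, axis=1, keepdims=True)

def sample_mixture(mix, mu, var, rng):
    """Draw one target per sample: pick k ~ mix, then t ~ N(mu_k, var_k)."""
    n, k = mix.shape
    #inverse-CDF sampling of the component index for each row
    u = rng.rand(n, 1)
    comp = (np.cumsum(mix, axis=1) < u).sum(axis=1)
    comp = np.minimum(comp, k - 1)   #guard against rows summing to slightly < 1
    rows = np.arange(n)
    draw = rng.randn(n) * np.sqrt(var[rows, comp]) + mu[rows, comp]
    return draw[:, None]

#toy bimodal case: every sample has modes at -1 and +1
rng = np.random.RandomState(0)
mix = np.array([[0.25, 0.75]] * 1000)
mu = np.array([[-1.0, 1.0]] * 1000)
var = np.full((1000, 2), 0.01)

y = mixture_expectation(mix, mu)        #every row is 0.5
y_smpl = sample_mixture(mix, mu, var, rng)
```

Note that the expectation (0.5 here) can fall between the modes, where the data never lie; for multimodal targets this is why the expectation and sample plots below can look quite different.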


In [39]:
#actual data
fig, _ = graph_highD.graph3x1y(s_x, y, xCols=xCols,
                      yLow=yLow, yHigh=yHigh,
                      sbpltLoc=311, numPoints=700)
graph_highD.graph3x1y(s_x, y_smpl, xCols=xCols,
                      yLow=yLow, yHigh=yHigh, fig=fig,
                      sbpltLoc=312, numPoints=700)
graph_highD.graph3x1y(s_x, s_t, xCols=xCols,
                      yLow=yLow, yHigh=yHigh, fig=fig,
                      sbpltLoc=313, numPoints=700)
fig.suptitle('Data: Expectation, Sample, Actual')

[Figure: Data: Expectation, Sample, Actual]

In [40]:
#point value
fig, _ = graph_highD.graph3x1y(s_x, pv_v, xCols=xCols,
                      sbpltLoc=221, numPoints=500)
graph_highD.graph3x1y(s_x, pv_c, xCols=xCols, fig=fig,
                      sbpltLoc=222, numPoints=500)
graph_highD.graph3x1y(s_x, pv_t, xCols=xCols, fig=fig,
                      sbpltLoc=223, numPoints=500)
graph_highD.graph3x1y(s_x, pv_s, xCols=xCols, fig=fig,
                      sbpltLoc=224, numPoints=500)
fig.suptitle('Point Value: Value, Certainty, Time, Sensor')

[Figure: Point Value: Value, Certainty, Time, Sensor]

In [41]:
#expectation and sensor value
fig, _ = graph_highD.graph3x1y(s_x, y, xCols=xCols,
                      sbpltLoc=211, numPoints=1000)
graph_highD.graph3x1y(s_x, pv_s, xCols=xCols, fig=fig,
                      sbpltLoc=212, numPoints=1000)
fig.suptitle('Expectation and Sensor Value')



[Figure: Expectation and Sensor Value]

In [42]:
#sample and certainty value
fig, _ = graph_highD.graph3x1y(s_x, y_smpl, xCols=xCols,
                      sbpltLoc=211, numPoints=1000)
graph_highD.graph3x1y(s_x, pv_c, xCols=xCols, fig=fig,
                      yHigh=None, yLow=None,
                      sbpltLoc=212, numPoints=1000)
fig.suptitle('Sample and Certainty Value')

[Figure: Sample and Certainty Value]

## Debugging

In [14]:
print pv_tests[3]

[[ 0.63129508 -0.19704221  0.40208349 -0.4567216  -0.40090248 -0.28985363
  -0.04868295 -0.22283645  0.12633494  0.3161763  -0.20192023 -0.13954423
  -0.09997731 -0.35804236 -0.07423001  0.35432756  0.13813242  0.67565888
   0.20967464 -0.47752631]
 [ 0.09923327 -0.13083693  0.12049382 -0.12166163 -0.18326369 -0.44545907
   0.07668111  0.73097444  0.47858286  0.12535675  0.4828153   0.51680535
  -0.00930588 -0.76497763  0.11653093 -0.32332134  0.98129904  0.28698811
  -0.70221233 -0.8020246 ]
 [ 0.09680201 -0.03822198 -0.29717103 -0.49720341 -0.50030583 -0.14539152
   0.17635    -0.00481532 -0.45830926  0.32613158  0.16875471 -0.26496398
  -0.02604711  0.46587023 -0.69762951  0.00311021 -0.42356208  0.12212422
   0.15115985 -0.69138235]
 [ 0.68756098 -0.08818178 -0.04225502 -0.30519801 -1.02301311 -0.07334306
   0.2179869   0.27692306 -0.64333075 -0.15543325  1.01591337 -0.22828121
   0.09282777  0.92014664 -0.00832414  0.04071184  0.73224217  0.26557416
  -0.02953959  0.00809513]]


In [15]:
print pv_tests[7].shape
print pv_tests[8].shape
print pv_tests[9].shape
print pv_tests[9].mean()
print pv_tests[9].max()
print pv_tests[9].min()

(5000, 1)
(5000, 1)
(5000, 1)
0.0576336
0.20392
0.00569649


In [16]:
print "errDen"
print pv_tests[0].shape
print pv_tests[0]
print
print "errNum"
print pv_tests[1].shape
print pv_tests[1]
print
print "sensorVal"
print pv_tests[2].shape
print pv_tests[2]

err = pv_tests[1]/pv_tests[0]

print
print "error"
print err
print np.mean(err)

print

errDen
(1, 1)
[[ 13.92373753]]

errNum
(5000, 1)
[[ 0.38188589]
 [ 0.36589146]
 [ 0.37728176]
 ..., 
 [ 1.5422827 ]
 [ 1.63692689]
 [ 1.73605001]]

sensorVal
(5000, 1)
[[-1.08203089]
 [-1.09511042]
 [-1.08576739]
 ..., 
 [-0.45811328]
 [-0.42057562]
 [-0.38240752]]

error
[[ 0.02742697]
 [ 0.02627825]
 [ 0.0270963 ]
 ..., 
 [ 0.11076643]
 [ 0.11756375]
 [ 0.12468276]]
0.188461



In [17]:
print pv_tests[1]

[[ 0.38188589]
 [ 0.36589146]
 [ 0.37728176]
 ..., 
 [ 1.5422827 ]
 [ 1.63692689]
 [ 1.73605001]]


In [18]:
print explorerHQ._sRange
print explorerHQ._sRange*np.array([[-1.,1.]])

[[-2.11224604  1.61920631]]
[[ 2.11224604  1.61920631]]


#### Convert this cell to code to write the weights and biases (wb)
```python

#save parameters of trained network for use by the spider brain.
#TODO fix variance scaling; otherwise the spider will need
#  to rescale the output.
print d['w1'].shape
print s_x.shape
np.savez('data/spi_gmm_wb.npz',
         w1=d['w1'],
         w2=d['w2'],
         w3=d['w3'],
         b1=d['b1'],
         b2=d['b2'],
         b3=d['b3']
        )
```
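
When the spider brain reads these parameters back, `np.load` on an `.npz` file returns a lazy, dict-like `NpzFile` keyed by the keyword names passed to `np.savez`. A minimal round-trip check, assuming the six keys used above; it writes to a temporary file with made-up shapes rather than to `data/spi_gmm_wb.npz`.

```python
import os
import tempfile
import numpy as np

KEYS = ('w1', 'w2', 'w3', 'b1', 'b2', 'b3')

#made-up shapes standing in for the trained parameters
rng = np.random.RandomState(0)
params = {'w1': rng.randn(4, 20), 'b1': rng.randn(20),
          'w2': rng.randn(20, 20), 'b2': rng.randn(20),
          'w3': rng.randn(20, 30), 'b3': rng.randn(30)}

path = os.path.join(tempfile.mkdtemp(), 'wb.npz')
np.savez(path, **params)

#copy everything into a plain dict so the file handle can be closed
with np.load(path) as data:
    loaded = {k: data[k] for k in KEYS}

print(all(np.array_equal(params[k], loaded[k]) for k in KEYS))  # True
```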

In [19]:
%%capture
print 'calc_agg_grad_w1'
print d['calc_agg_grad_w1']
print 'calc_agg_grad_b1'
print d['calc_agg_grad_b1']
print 'calc_agg_grad_w2'
print d['calc_agg_grad_w2']
print 'calc_agg_grad_b2'
print d['calc_agg_grad_b2']
print 'calc_agg_grad_w3'
print d['calc_agg_grad_w3']
print 'calc_agg_grad_b3'
print d['calc_agg_grad_b3']

In [20]:
print d['v']
print np.mean(d['v'])

[[ 0.1131371 ]
 [ 0.11408822]
 [ 0.11305103]
 ..., 
 [ 0.1002355 ]
 [ 0.10215706]
 [ 0.10454674]]
0.156225


In [21]:
print np.mean(d['m'], 0)
print np.max(d['m'], 0)
print np.min(d['m'], 0)

[ 0.07364285  0.05303255  0.11785071  0.05326494  0.04374653  0.10141068
  0.03589891  0.04088121  0.06738541  0.03489024  0.09170999  0.07677877
  0.04179003  0.05390102  0.11381634]
[ 0.13549156  0.15810624  0.20376851  0.13096568  0.13728796  0.1713599
  0.11396219  0.10460725  0.14785974  0.10068955  0.18484756  0.12677944
  0.11463881  0.15666929  0.20375629]
[ 0.02112789  0.01929844  0.01676596  0.01913163  0.0201099   0.01765483
  0.01374101  0.01936497  0.01987297  0.01567168  0.01862783  0.02263645
  0.01859967  0.01602668  0.01446973]


### Note on Mixing Coefficients
I have yet to see a mixing coefficient much below .1. This suggests that something may be awry, and it is likely the cause of many stray points.
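
A rough way to quantify this concern, assuming the rows of `d['m']` are per-sample mixing coefficients summing to 1: the probability that a sample is drawn from something other than that sample's dominant component is 1 - max_k pi_k. If the coefficients never get close to zero, this probability stays large even where the data are unimodal, which would be consistent with stray points in the sample plots. The helper name is hypothetical, and this is only a diagnostic: a non-dominant component is not necessarily wrong where the data really are multimodal.

```python
import numpy as np

def off_dominant_prob(mix):
    """Per-sample probability of drawing from a non-dominant component.

    mix: (numSamples, numComponents) mixing coefficients, rows summing to 1.
    """
    return 1.0 - mix.max(axis=1)

#toy coefficients for 3 samples and 3 components
mix = np.array([[0.70, 0.20, 0.10],
                [0.50, 0.25, 0.25],
                [0.98, 0.01, 0.01]])
p = off_dominant_prob(mix)
print(p)
```

On the trained model, `off_dominant_prob(d['m']).mean()` would give the average over the test batch.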

In [22]:
print d['calc_agg_grad_w1']*gmm.learningRate

[[  3.14363162e-04  -1.70603814e-03  -2.17387525e-04   6.74501425e-05
   -2.03228628e-04   5.40507201e-04   3.96100048e-04   2.66640418e-04
    1.00341204e-04  -6.55442185e-04   1.31384621e-03   2.09545924e-05
    5.52349025e-04  -5.16940723e-04  -6.49124384e-04   4.05757339e-04
   -1.15607178e-03   5.17660927e-04  -5.26175776e-04   4.84978256e-04]
 [  9.36943921e-04  -6.82451064e-04  -2.03730748e-03   1.33799549e-04
    6.05729947e-05   1.71804870e-03   6.54104806e-04   1.78420189e-04
    7.81710260e-04   4.12818394e-04   2.77537823e-04   9.13603231e-04
   -5.79181069e-04  -6.77624485e-05   6.31625589e-04  -9.09046212e-04
   -1.56095484e-04  -5.32527629e-05   2.50747020e-04   5.06331795e-04]
 [  6.18830032e-04  -1.93409377e-03  -4.63950797e-04   6.03901281e-04
   -6.46972738e-04  -1.46891776e-04   8.90152878e-04   6.29563525e-04
    6.88545813e-04  -3.95537878e-04   1.79284997e-03   5.58499363e-04
    1.98984612e-03   1.27971274e-04  -7.31369480e-04   2.03053863e-03
   -1.21401506e-03