# BIDMach: parameter tuning

In this notebook we'll tune model parameters automatically using grid search.

In [19]:
import BIDMat.{CMat,CSMat,DMat,Dict,IDict,FMat,FND,GDMat,GMat,GIMat,GSDMat,GSMat,HMat,Image,IMat,Mat,SMat,SBMat,SDMat}
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._
import BIDMat.Solvers._
import BIDMat.Plotting._
import BIDMach.Learner
import BIDMach.models.{FM,GLM,KMeans,KMeansw,ICA,LDA,LDAgibbs,NMF,RandomForest,SFA}
import BIDMach.datasources.{MatSource,FileSource,SFileSource}
import BIDMach.mixins.{CosineSim,Perplexity,Top,L1Regularizer,L2Regularizer}
import BIDMach.updaters.{ADAGrad,Batch,BatchNorm,IncMult,IncNorm,Telescoping}
import BIDMach.causal.{IPTW}

Mat.checkMKL
Mat.checkCUDA
Mat.plotInline = true
if (Mat.hasCUDA > 0) GPUmem

1 CUDA device found, CUDA version 7.0


(0.15258427,1843126272,12079398912)

## Dataset: Reuters RCV1 V2

The dataset is the widely used Reuters news article dataset RCV1 V2. This dataset and several others are loaded by running the script <code>getdata.sh</code> from the BIDMach/scripts directory. The data include both train and test subsets, and train and test labels (cats). 

In [20]:
var dir = "../data/rcv1/"             // adjust to point to the BIDMach/data/rcv1 directory
tic
val train = loadSMat(dir+"docs.smat.lz4")
val cats = loadFMat(dir+"cats.fmat.lz4")
val test = loadSMat(dir+"testdocs.smat.lz4")
val tcats = loadFMat(dir+"testcats.fmat.lz4")
toc



1.337

First, let's enumerate some candidate values for the learning rate (lrate) and the time exponent of the optimizer (texp):

In [21]:
val lrates = col(0.03f, 0.1f, 0.3f, 1f)        // 4 values
val texps = col(0.3f, 0.4f, 0.5f, 0.6f, 0.7f)  // 5 values



  0.30000
  0.40000
  0.50000
  0.60000
  0.70000


The next step is to enumerate all pairs of parameters. For now we can do this with the Kronecker product operator (kron, written ⊗); eventually this will be a custom function:

In [22]:
val lrateparams = ones(texps.nrows, 1) ⊗ lrates
val texpparams = texps ⊗ ones(lrates.nrows,1)
lrateparams \ texpparams



  0.030000   0.30000
   0.10000   0.30000
   0.30000   0.30000
         1   0.30000
  0.030000   0.40000
   0.10000   0.40000
   0.30000   0.40000
         1   0.40000
        ..        ..
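The ordering of the rows above can be reproduced without any matrix library. The following is a minimal plain-Scala sketch, not BIDMach code; the names `lrateGrid`, `texpGrid` and `pairs` are made up for illustration. It shows that the two Kronecker products amount to a nested loop in which the learning rate varies fastest.

```scala
// Plain Scala sketch (no BIDMach needed) of the pair ordering shown above.
// lrateGrid, texpGrid and pairs are hypothetical names used only here.
val lrateGrid = Seq(0.03f, 0.1f, 0.3f, 1f)        // same values as lrates
val texpGrid  = Seq(0.3f, 0.4f, 0.5f, 0.6f, 0.7f) // same values as texps

// Outer loop over texp, inner loop over lrate, so lrate varies fastest.
val pairs: Seq[(Float, Float)] =
  for (t <- texpGrid; l <- lrateGrid) yield (l, t)

// Print the first few (lrate, texp) pairs in grid order.
pairs.take(5).foreach { case (l, t) => println(s"lrate=$l texp=$t") }
```

Row `i` of the grid therefore uses learning rate index `i % 4` and time exponent index `i / 4`.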


Next we create the learner, a GLM with a logistic link:

In [23]:
val (mm, opts) = GLM.learner(train, cats, GLM.logistic)



BIDMach.models.GLM$LearnOptions@2220a5ce

To keep things simple, we'll focus on just one category and train many models for it. The "targmap" option specifies a mapping from the base categories to the model categories. We'll map category 6 (zero-based column index) to every one of our models:

In [24]:
val nparams = lrateparams.length
val targmap = zeros(nparams, 103)
targmap(?,6) = 1



   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0...
   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0...
   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0...
   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0...
   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0...
   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0...
   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0...
   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0...
  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..
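To see what this mapping does, consider how the labels of a single document are transformed when they are multiplied by targmap. The sketch below is plain Scala rather than BIDMach, and the document labels in it are invented for illustration; all names are hypothetical.

```scala
// Plain Scala sketch of multiplying one document's label column by targmap.
val nModels = 20   // 4 learning rates x 5 time exponents
val nCats   = 103  // number of base categories, as in zeros(nparams, 103)

// Row i selects the base category model i is trained on: column 6 for every row.
val targmapRows: Array[Array[Float]] =
  Array.tabulate(nModels, nCats)((_, c) => if (c == 6) 1f else 0f)

// Hypothetical label column for one document: it belongs to categories 6 and 40.
val docLabels: Array[Float] =
  Array.tabulate(nCats)(c => if (c == 6 || c == 40) 1f else 0f)

// Matrix-vector product: each model's target is the document's label for category 6.
val modelTargets: Array[Float] =
  targmapRows.map(row => row.zip(docLabels).map { case (w, y) => w * y }.sum)
```

All 20 models receive the same target, so any difference in their final accuracy comes from the learning rate and time exponent alone.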


In [25]:
opts.targmap = targmap
opts.lrate = lrateparams
opts.texp = texpparams



  0.30000
  0.30000
  0.30000
  0.30000
  0.40000
  0.40000
  0.40000
  0.40000
       ..


In [26]:
mm.train

corpus perplexity=5582.125391
pass= 0
 2.00%, ll=-0.69315, gf=14.628, secs=0.0, GB=0.02, MB/s=1108.15, GPUmem=0.139997
16.00%, ll=-0.40496, gf=15.181, secs=0.1, GB=0.13, MB/s=1020.17, GPUmem=0.139997
30.00%, ll=-0.37984, gf=15.109, secs=0.2, GB=0.25, MB/s=1004.31, GPUmem=0.139997
44.00%, ll=-0.31457, gf=15.080, secs=0.4, GB=0.36, MB/s=999.76, GPUmem=0.139997
58.00%, ll=-0.34047, gf=15.112, secs=0.5, GB=0.48, MB/s=1001.03, GPUmem=0.139997
72.00%, ll=-0.23351, gf=15.118, secs=0.6, GB=0.59, MB/s=999.33, GPUmem=0.139997
87.00%, ll=-0.28147, gf=15.059, secs=0.7, GB=0.70, MB/s=995.33, GPUmem=0.139997
100.00%, ll=-0.23005, gf=14.928, secs=0.8, GB=0.81, MB/s=983.12, GPUmem=0.139736
pass= 1
 2.00%, ll=-0.28089, gf=14.921, secs=0.8, GB=0.83, MB/s=985.95, GPUmem=0.139736
16.00%, ll=-0.22614, gf=10.039, secs=1.4, GB=0.94, MB/s=663.01, GPUmem=0.139736
30.00%, ll=-0.28112, gf=10.423, secs=1.5, GB=1.05, MB/s=687.93, GPUmem=0.139736
44.00%, ll=-0.27817, gf=10.757, secs=1.6, GB=1.17, MB/s=709.90, GPUme

In [27]:
val (pp, popts) = GLM.predictor(mm.model, test)



BIDMach.models.GLM$PredOptions@43fc3b6f

And invoke the predict method on the predictor:

In [28]:
pp.predict
val preds = FMat(pp.preds(0))

corpus perplexity=65579.335560
Predicting


java.lang.RuntimeException: dimensions mismatch (20 103), (256 772)



Although ll values are printed above, they are not meaningful (there is no target to compare the prediction with). 

We can now compare the accuracy of predictions (preds matrix) with ground truth (the tcats matrix). 

In [29]:
val vcats = targmap * tcats                                          // create some virtual cats
val lls = mean(ln(1e-7f + vcats ∘ preds + (1-vcats) ∘ (1-preds)),2)  // actual logistic likelihood
mean(lls)



java.lang.NoClassDefFoundError: Could not initialize class 
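The expression used for lls is the mean log-likelihood of 0/1 labels under the predicted probabilities, with a small constant added to avoid taking the log of zero. A minimal plain-Scala sketch of the same formula for a single model follows; `meanLogLik` is a hypothetical helper, not a BIDMach function.

```scala
// Mean log-likelihood of 0/1 labels under predicted probabilities.
// meanLogLik is a hypothetical helper written for this illustration.
def meanLogLik(labels: Seq[Float], probs: Seq[Float], eps: Float = 1e-7f): Double = {
  require(labels.length == probs.length && labels.nonEmpty, "need matching, non-empty inputs")
  val terms = labels.zip(probs).map { case (y, p) =>
    // For y = 1 this is log(p); for y = 0 it is log(1 - p); eps guards against log(0).
    math.log(eps + y * p + (1 - y) * (1 - p))
  }
  terms.sum / terms.length
}

// Confident correct predictions score closer to zero than confident wrong ones.
val goodScore = meanLogLik(Seq(1f, 0f), Seq(0.9f, 0.1f))
val badScore  = meanLogLik(Seq(1f, 0f), Seq(0.1f, 0.9f))
```

Values closer to zero indicate a better fit, which is why the likelihood can be used to rank the models.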



A more thorough measure is ROC area:

In [12]:
val rocs = roc2(preds, vcats, 1-vcats, 100)   // Compute ROC curves for all categories



        0        0        0        0        0        0        0        0...
  0.84498  0.83089  0.70786  0.68339  0.83812  0.83970  0.76794  0.72687...
  0.88921  0.88448  0.82171  0.80113  0.88726  0.88800  0.85490  0.82922...
  0.91925  0.91368  0.87920  0.86826  0.91730  0.91980  0.89681  0.88874...
  0.93529  0.93241  0.90859  0.90460  0.93325  0.93492  0.92101  0.91841...
  0.94632  0.94252  0.93065  0.92722  0.94484  0.94548  0.93427  0.93325...
  0.95299  0.95031  0.94252  0.93677  0.95216  0.95281  0.94669  0.94261...
  0.95800  0.95698  0.95160  0.94474  0.95744  0.95782  0.95337  0.94854...
       ..       ..       ..       ..       ..       ..       ..       ..


In [13]:
plot(rocs)



ptolemy.plot.Plot[,0,0,484x239,layout=java.awt.FlowLayout,alignmentX=0.0,alignmentY=0.0,border=,flags=16777225,maximumSize=,minimumSize=,preferredSize=]

In [14]:
val aucs = mean(rocs)



0.97690,0.97633,0.97336,0.97058,0.97659,0.97681,0.97469,0.97249,0.97606,0.97700,0.97553,0.97189,0.97517,0.97694,0.97613,0.97292,0.97389,0.97664,0.97639,0.97353
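The cell above takes the column means of rocs as the area under each ROC curve. As an independent cross-check on small data, AUC can also be computed directly as the fraction of positive/negative pairs that are ranked correctly. The sketch below is plain Scala and is not how roc2 works internally; `aucByPairs` is a hypothetical helper.

```scala
// Pairwise AUC: the probability that a random positive outscores a random negative.
// aucByPairs is a hypothetical helper; it is quadratic, so use it on small samples only.
def aucByPairs(scores: Seq[Double], labels: Seq[Int]): Double = {
  val pos = scores.zip(labels).collect { case (s, 1) => s }
  val neg = scores.zip(labels).collect { case (s, 0) => s }
  require(pos.nonEmpty && neg.nonEmpty, "need at least one example of each class")
  // Count 1 for a correctly ordered pair, 0.5 for a tie, 0 otherwise.
  val wins = pos.flatMap { p =>
    neg.map(n => if (p > n) 1.0 else if (p == n) 0.5 else 0.0)
  }
  wins.sum / (pos.length.toLong * neg.length)
}

// One of the four positive/negative pairs is misordered in this toy example.
val toyAuc = aucByPairs(Seq(0.9, 0.2, 0.8, 0.1), Seq(1, 1, 0, 0))
```

An AUC of 1.0 means every positive is ranked above every negative, and 0.5 corresponds to random ordering.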

The maxi2 function returns the maximum value together with its index.

In [15]:
val (bestv, besti) = maxi2(aucs)



9

And using the best index we can find the optimal parameters:

In [16]:
texpparams(besti) \ lrateparams(besti)



0.50000,0.10000
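The best index can also be decoded by hand from the way the grid was enumerated. This plain-Scala sketch copies the AUC values printed above and recovers the same pair; the names `aucValues`, `lrateValues` and `texpValues` are made up for illustration.

```scala
// AUC values copied from the aucs output above, one per grid point.
val aucValues = Seq(
  0.97690, 0.97633, 0.97336, 0.97058, 0.97659, 0.97681, 0.97469, 0.97249,
  0.97606, 0.97700, 0.97553, 0.97189, 0.97517, 0.97694, 0.97613, 0.97292,
  0.97389, 0.97664, 0.97639, 0.97353)

val lrateValues = Seq(0.03f, 0.1f, 0.3f, 1f)        // same values as lrates
val texpValues  = Seq(0.3f, 0.4f, 0.5f, 0.6f, 0.7f) // same values as texps

// Largest AUC and its position in the flattened grid.
val (bestAuc, bestIdx) = aucValues.zipWithIndex.maxBy(_._1)

// lrate varies fastest in the grid, so it is the remainder; texp is the quotient.
val bestLrate = lrateValues(bestIdx % lrateValues.length)
val bestTexp  = texpValues(bestIdx / lrateValues.length)
```

Note that several grid points are within 0.0001 of the best AUC, so the choice of optimum is not sharply determined on this dataset.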

> Write the optimal values in the cell below:

<b>Note:</b> although our parameters lie on a rectangular grid, we could have enumerated any sequence of pairs, and we could have searched over more parameters. The learner infrastructure supports more intelligent model optimization (e.g. Bayesian methods). 
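As one example of a non-grid sequence, the pairs could be drawn at random. The sketch below is plain Scala with a fixed seed; the sampling ranges simply reuse the end points of the grid above, and drawing the learning rate on a log scale is an assumption made for illustration, not something this notebook prescribes. All names are hypothetical.

```scala
import scala.util.Random

// Fixed seed so the sampled sequence is repeatable.
val rng = new Random(42)

// Sample log-uniformly between lo and hi (both must be positive).
def logUniform(lo: Double, hi: Double): Double =
  math.exp(math.log(lo) + rng.nextDouble() * (math.log(hi) - math.log(lo)))

// 20 random (lrate, texp) pairs covering the same ranges as the grid above.
val randomPairs: Seq[(Double, Double)] =
  Seq.fill(20)((logUniform(0.03, 1.0), 0.3 + 0.4 * rng.nextDouble()))

// The two columns that would play the role of lrateparams and texpparams.
val randomLrates = randomPairs.map(_._1)
val randomTexps  = randomPairs.map(_._2)
```

The rest of the workflow (targmap with one row per pair, training, scoring by AUC) would stay the same, since it only depends on the number of pairs.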