In [79]:
using Mocha

In [80]:
train_filename = "cc_train.txt"
test_filename = "cc_test.txt"
exp_dir = "snapshots/cc_net_2/"

"snapshots/cc_net_2/"

The data layer is the network's input layer. The source is a text file that lists the path(s) of your HDF5 file(s). The batch size is the number of samples in each mini-batch, and shuffle randomizes the order of the input data, which generally helps training.

In [81]:
data_layer = HDF5DataLayer(name="train-data", source="cc_data/txt/$train_filename",
                           batch_size=64, shuffle=true)

Mocha.HDF5DataLayer(train-data)
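
To make the batch size and shuffle options concrete, here is a minimal plain-Julia sketch of splitting a dataset into shuffled mini-batches. It does not use Mocha and is not how the HDF5 data layer is implemented; the helper name minibatch_indices and the sample count of 200 are made up for illustration.

```julia
using Random

# Hypothetical helper: split sample indices 1:n into mini-batches,
# optionally shuffling the order first. The last batch may be smaller.
function minibatch_indices(n::Int, batch_size::Int; do_shuffle::Bool=false,
                           rng::AbstractRNG=Random.default_rng())
    order = do_shuffle ? randperm(rng, n) : collect(1:n)
    return [order[i:min(i + batch_size - 1, n)] for i in 1:batch_size:n]
end

batches = minibatch_indices(200, 64; do_shuffle=true, rng=MersenneTwister(0))
println(length.(batches))   # sizes of the resulting batches
```

With shuffling on, each batch mixes samples from across the whole file instead of following the order in which they were stored.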

This network is just a handful of inner product layers that demonstrate different activation functions, output dimensions, and regularization.

In [82]:
# Two ReLU layers with 256 outputs each
fc1_layer = InnerProductLayer(name="ip1", output_dim=256,
                              neuron=Neurons.ReLU(), bottoms=[:data], tops=[:ip1])
fc2_layer = InnerProductLayer(name="ip2", output_dim=256,
                              neuron=Neurons.ReLU(), bottoms=[:ip1], tops=[:ip2])

# ReLU layer with L1 regularization on its weights
fc3_layer = InnerProductLayer(name="ip3", output_dim=64,
                              neuron=Neurons.ReLU(), weight_regu=L1Regu(1),
                              bottoms=[:ip2], tops=[:ip3])

# Two Tanh layers that narrow the representation to 32 outputs
fc4_layer = InnerProductLayer(name="ip4", output_dim=64,
                              neuron=Neurons.Tanh(), bottoms=[:ip3], tops=[:ip4])

fc5_layer = InnerProductLayer(name="ip5", output_dim=32,
                              neuron=Neurons.Tanh(), bottoms=[:ip4], tops=[:ip5])

Mocha.InnerProductLayer(ip5)
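
As a rough picture of what each inner product layer computes, the plain-Julia sketch below applies an affine map followed by an element-wise activation, and evaluates an L1 penalty on the weights. The tiny 3-input, 2-output weights are made up, and the penalty assumes the layer's L1 coefficient is simply multiplied by a global coefficient; this is an illustration, not Mocha's internal code.

```julia
relu(x) = max(x, zero(x))

# One inner product layer: W is (input_dim x output_dim), x is a feature vector,
# f is the activation applied element-wise.
inner_product(W, b, x, f) = f.(W' * x .+ b)

# L1 penalty on the weights, scaled by a coefficient (assumed: local * global).
l1_penalty(W, coef) = coef * sum(abs, W)

x = [1.0, 2.0, 0.5]
W = [0.5 -1.0; 0.25 0.5; -1.0 -2.0]   # 3 inputs, 2 outputs (made-up values)
b = [0.1, -0.1]

h_relu = inner_product(W, b, x, relu)   # negative pre-activations are clipped to 0
h_tanh = inner_product(W, b, x, tanh)   # outputs are squashed into (-1, 1)
penalty = l1_penalty(W, 1 * 0.0005)
println(h_relu, " ", h_tanh, " ", penalty)
```

ReLU keeps positive values and zeroes the rest, while Tanh keeps the sign but bounds the magnitude, which is the main difference between the two groups of layers above.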

The final layer is also an inner product layer, but without an activation function. It is the classification layer, so its output dimension equals the number of classes (two here).

In [83]:
fc6_layer = InnerProductLayer(name="ip6", output_dim=2,
                              bottoms=[:ip5], tops=[:ip6])

Mocha.InnerProductLayer(ip6)

During training, dropout randomly "turns off" a fraction of the neurons on each pass, controlled by ratio. This acts as a form of regularization and can help prevent overfitting.

In [84]:
drop_input  = DropoutLayer(name="drop_in", bottoms=[:data], ratio=0.2)
drop_fc1 = DropoutLayer(name="drop_fc1", bottoms=[:ip1], ratio=0.5)

Mocha.DropoutLayer(drop_fc1)
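
The idea behind dropout can be sketched in a few lines of plain Julia. This sketch assumes the common "inverted dropout" convention, where surviving activations are rescaled by 1/(1 - ratio) so their expected value is unchanged; whether and how Mocha rescales is not shown in this notebook, so treat the scaling as an assumption.

```julia
using Random

# Inverted dropout (assumed convention): zero each entry with probability `ratio`
# and rescale the survivors by 1/(1 - ratio).
function dropout(x::AbstractVector{<:Real}, ratio::Real;
                 rng::AbstractRNG=Random.default_rng())
    keep = rand(rng, length(x)) .>= ratio
    return x .* keep ./ (1 - ratio)
end

x = ones(10_000)
y = dropout(x, 0.5; rng=MersenneTwister(1))
println(count(iszero, y) / length(y))   # fraction dropped, close to the ratio
println(sum(y) / length(y))             # mean activation, close to the original
```

At test time no units are dropped, which is why the dropout layers are only part of the training net.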

The loss layer defines our loss function. We use softmax loss here because we are predicting a class label. Other loss functions are available, and you can define your own; for example, the familiar square loss is available for regression problems.

In [85]:
loss_layer = SoftmaxLossLayer(name="loss", bottoms=[:ip6,:label])

Mocha.SoftmaxLossLayer(loss)
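
For intuition, the softmax loss for a single sample can be written in plain Julia as below. This is a sketch rather than Mocha's implementation: it uses 1-based class indices for convenience (Mocha's label convention may differ), and the example scores are made up.

```julia
# Numerically stable softmax over a vector of class scores.
function softmax(z::AbstractVector{<:Real})
    e = exp.(z .- maximum(z))
    return e ./ sum(e)
end

# Cross-entropy loss for one sample; `label` is a 1-based class index in this sketch.
softmax_loss(z, label::Int) = -log(softmax(z)[label])

scores = [2.0, 0.0]                       # made-up outputs of a two-class final layer
p = softmax(scores)
loss_correct = softmax_loss(scores, 1)    # true class has the higher score
loss_wrong   = softmax_loss(scores, 2)    # true class has the lower score
loss_uniform = softmax_loss([0.0, 0.0], 1)
println(p, " ", loss_correct, " ", loss_wrong, " ", loss_uniform)
```

A two-class model that assigns equal scores to both classes has a loss of log(2), about 0.693, which is a useful baseline when reading obj_val in the training log.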

In [86]:
backend = CPUBackend()
init(backend)

In [87]:
common_layers = [fc1_layer, fc2_layer, fc3_layer, fc4_layer, fc5_layer, fc6_layer]
drop_layers = [drop_input, drop_fc1] ;

In [88]:
net = Net("cc-train", backend, [data_layer, common_layers..., drop_layers..., loss_layer]) ;

[2018-10-15 12:07:27 | info | Mocha]: Constructing net cc-train on Mocha.CPUBackend...
[2018-10-15 12:07:27 | info | Mocha]: Topological sorting 10 layers...
[2018-10-15 12:07:27 | info | Mocha]: Setup layers...
[2018-10-15 12:07:27 | info | Mocha]: Network constructed!


In [89]:
method = SGD()
params = make_solver_parameters(method, max_iter=10000, regu_coef=0.0005,
    mom_policy=MomPolicy.Fixed(0.9),             # constant momentum
    lr_policy=LRPolicy.Inv(0.03, 0.0001, 0.75),  # decaying learning rate
    load_from=exp_dir)                           # resume from a snapshot here, if any

Dict{Symbol,Any} with 5 entries:
  :lr_policy  => Mocha.LRPolicy.Inv(0.03, 0.0001, 0.75)
  :max_iter   => 10000
  :load_from  => "snapshots/cc_net_2/"
  :regu_coef  => 0.0005
  :mom_policy => Mocha.MomPolicy.Fixed(0.9)
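
To see what the learning rate policy does over the run, the sketch below evaluates an inverse-decay schedule in plain Julia. It assumes the three arguments of LRPolicy.Inv are (base_lr, gamma, power) and that the rate at a given iteration is base_lr * (1 + gamma * iter)^(-power); check the Mocha documentation before relying on this.

```julia
# Inverse-decay schedule (assumed form): lr(iter) = base_lr * (1 + gamma * iter)^(-power)
inv_lr(base_lr, gamma, power, iter) = base_lr * (1 + gamma * iter)^(-power)

for iter in (0, 1000, 5000, 10000)
    println(iter, " => ", inv_lr(0.03, 0.0001, 0.75, iter))
end
```

Under this assumption the rate starts at 0.03 and decays smoothly, ending a little above half its initial value at iteration 10000.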

In [90]:
solver = Solver(method, params)

Mocha.Solver{Mocha.SGD}(Mocha.SGD(), Dict{Symbol,Any}(Pair{Symbol,Any}(:lr_policy, Mocha.LRPolicy.Inv(0.03, 0.0001, 0.75)),Pair{Symbol,Any}(:max_iter, 10000),Pair{Symbol,Any}(:load_from, "snapshots/cc_net_2/"),Pair{Symbol,Any}(:regu_coef, 0.0005),Pair{Symbol,Any}(:mom_policy, Mocha.MomPolicy.Fixed(0.9))), Mocha.CoffeeLounge("", 1, :merge, Dict{AbstractString,Dict{Int64,AbstractFloat}}(), Mocha.CoffeeBreak[], true, 2, 0))

In [91]:
setup_coffee_lounge(solver, save_into=joinpath(exp_dir, "statistics.jld"), every_n_iter=1000)

:merge

In [92]:
add_coffee_break(solver, TrainingSummary(), every_n_iter=100)
add_coffee_break(solver, Snapshot(exp_dir), every_n_iter=500)

2-element Array{Mocha.CoffeeBreak,1}:
 Mocha.CoffeeBreak(Mocha.TrainingSummary(Any[:iter, :obj_val]), 100, 0)
 Mocha.CoffeeBreak(Mocha.Snapshot("snapshots/cc_net_2/"), 500, 0)      

In [93]:
data_layer_test = HDF5DataLayer(name="test-data", source="cc_data/txt/$test_filename", batch_size=100)
acc_layer = AccuracyLayer(name="test-accuracy", bottoms=[:ip6, :label])
test_net = Net("cc-test", backend, [data_layer_test, common_layers..., acc_layer]) ;

[2018-10-15 12:07:45 | info | Mocha]: Constructing net cc-test on Mocha.CPUBackend...
[2018-10-15 12:07:45 | info | Mocha]: Topological sorting 8 layers...
[2018-10-15 12:07:45 | info | Mocha]: Setup layers...
[2018-10-15 12:07:45 | info | Mocha]: Network constructed!
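
The accuracy reported by the test net is the fraction of samples whose highest-scoring class matches the label. A plain-Julia sketch of that computation follows; the scores and labels are made up, and the 0-based label convention is an assumption made for this example.

```julia
# Accuracy: fraction of samples whose highest-scoring class equals the label.
# scores is (num_classes x num_samples); labels are 0-based class indices (assumed).
function accuracy(scores::AbstractMatrix{<:Real}, labels::AbstractVector{<:Integer})
    predicted = [argmax(view(scores, :, j)) - 1 for j in 1:size(scores, 2)]
    return sum(predicted .== labels) / length(labels)
end

scores = [2.0 0.1 0.3 1.5;
          1.0 0.9 0.2 2.5]      # made-up final-layer outputs for 4 samples
labels = [0, 1, 1, 1]
acc = accuracy(scores, labels)
println(acc)
```

Note that accuracy only needs the raw scores of the final layer, since applying softmax does not change which class has the highest value.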


In [94]:
add_coffee_break(solver, ValidationPerformance(test_net), every_n_iter=1000);

In [95]:
@time solve(solver, net)

destroy(net)
destroy(test_net)
shutdown(backend)

[2018-10-15 12:07:48 | info | Mocha]: Snapshot directory snapshots/cc_net_2/ already exists
[2018-10-15 12:07:48 | info | Mocha]:  TRAIN iter=000000 obj_val=0.75138515
[2018-10-15 12:07:48 | info | Mocha]: Saving snapshot to snapshot-000000.jld...
[2018-10-15 12:07:49 | info | Mocha]: 
[2018-10-15 12:07:49 | info | Mocha]: ## Performance on Validation Set after 0 iterations
[2018-10-15 12:07:49 | info | Mocha]: ---------------------------------------------------------
[2018-10-15 12:07:49 | info | Mocha]:   Accuracy (avg over 6000) = 68.3000%
[2018-10-15 12:07:49 | info | Mocha]: ---------------------------------------------------------
[2018-10-15 12:07:49 | info | Mocha]: 
[2018-10-15 12:07:49 | info | Mocha]:  TRAIN iter=000100 obj_val=0.48257792
[2018-10-15 12:07:50 | info | Mocha]:  TRAIN iter=000200 obj_val=0.58705467
[2018-10-15 12:07:50 | info | Mocha]:  TRAIN iter=000300 obj_val=0.54265344
[2018-10-15 12:07:50 | info | Mocha]:  TRAIN iter=000400 obj_val=0.59441322
[2018-10-15 

[2018-10-15 12:08:18 | info | Mocha]:  TRAIN iter=006000 obj_val=0.50470620
[2018-10-15 12:08:18 | info | Mocha]: Saving snapshot to snapshot-006000.jld...
[2018-10-15 12:08:18 | info | Mocha]: 
[2018-10-15 12:08:18 | info | Mocha]: ## Performance on Validation Set after 6000 iterations
[2018-10-15 12:08:18 | info | Mocha]: ---------------------------------------------------------
[2018-10-15 12:08:18 | info | Mocha]:   Accuracy (avg over 6000) = 77.7667%
[2018-10-15 12:08:18 | info | Mocha]: ---------------------------------------------------------
[2018-10-15 12:08:18 | info | Mocha]: 
[2018-10-15 12:08:18 | info | Mocha]:  TRAIN iter=006100 obj_val=0.58126116
[2018-10-15 12:08:19 | info | Mocha]:  TRAIN iter=006200 obj_val=0.58992499
[2018-10-15 12:08:19 | info | Mocha]:  TRAIN iter=006300 obj_val=0.48664626
[2018-10-15 12:08:20 | info | Mocha]:  TRAIN iter=006400 obj_val=0.50643069
[2018-10-15 12:08:20 | info | Mocha]:  TRAIN iter=006500 obj_val=0.40633318
[2018-10-15 12:08:20 | in

Dict{AbstractString,Array{Mocha.AbstractParameter,1}} with 0 entries

In [20]:
# Write the network structure to a Graphviz dot file
open("net.dot", "w") do out
    net2dot(out, net)
end