[WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, ANN, BDN algorithm to MLlib #3222

witgo · 2014-11-12T09:16:24Z

Activation function

Gradient descent

AdaDelta
AdaGrad
L-BFGS
Momentum
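
For readers unfamiliar with two of the update rules listed above, here is a minimal sketch of the textbook AdaGrad and classical momentum steps on plain arrays. It is not the code in this patch: `UpdaterSketch`, `adaGradStep`, and `momentumStep` are hypothetical names, and the patch's own `AdaGradUpdater` and `MomentumUpdater` may use different formulas or extra parameters.

```scala
// Illustrative only: textbook update rules, not this PR's updaters.
object UpdaterSketch {

  /** One AdaGrad step. Returns (updated weights, updated sum of squared gradients). */
  def adaGradStep(
      weights: Array[Double],
      grad: Array[Double],
      sumSq: Array[Double],
      learningRate: Double,
      epsilon: Double = 1e-8): (Array[Double], Array[Double]) = {
    require(weights.length == grad.length && grad.length == sumSq.length, "length mismatch")
    // Accumulate squared gradients per coordinate.
    val newSumSq = sumSq.zip(grad).map { case (s, g) => s + g * g }
    // Scale each coordinate's step by the root of its accumulated squared gradient.
    val newWeights = weights.indices.map { i =>
      weights(i) - learningRate * grad(i) / (math.sqrt(newSumSq(i)) + epsilon)
    }.toArray
    (newWeights, newSumSq)
  }

  /** One classical momentum step. Returns (updated weights, updated velocity). */
  def momentumStep(
      weights: Array[Double],
      grad: Array[Double],
      velocity: Array[Double],
      learningRate: Double,
      momentum: Double): (Array[Double], Array[Double]) = {
    require(weights.length == grad.length && grad.length == velocity.length, "length mismatch")
    // The velocity keeps a decayed sum of past steps.
    val newVelocity = velocity.zip(grad).map { case (v, g) => momentum * v - learningRate * g }
    val newWeights = weights.zip(newVelocity).map { case (w, v) => w + v }
    (newWeights, newVelocity)
  }
}
```

Both functions return new arrays rather than mutating their inputs, which keeps the sketch easy to test; a real updater would likely update in place for speed.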

Regularization method

Dropout
L2
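
As a rough illustration of the two regularizers listed above, the sketch below shows inverted dropout on a vector of activations and an L2 (weight cost) term added to a gradient. This is an assumption-laden sketch, not the patch's implementation: `RegularizationSketch`, `dropout`, and `l2Gradient` are hypothetical names, and the patch may instead rescale weights at test time as in the original dropout paper.

```scala
import scala.util.Random

// Illustrative only: common textbook forms, not this PR's code.
object RegularizationSketch {

  /** Inverted dropout: zero each activation with probability dropRate and
    * divide the survivors by the keep probability so the expected value is unchanged. */
  def dropout(activations: Array[Double], dropRate: Double, rng: Random): Array[Double] = {
    require(dropRate >= 0.0 && dropRate < 1.0, "dropRate must be in [0, 1)")
    val keep = 1.0 - dropRate
    activations.map(a => if (rng.nextDouble() < keep) a / keep else 0.0)
  }

  /** Adds the L2 penalty gradient, weightCost * w, to an existing gradient. */
  def l2Gradient(
      grad: Array[Double],
      weights: Array[Double],
      weightCost: Double): Array[Double] = {
    require(grad.length == weights.length, "length mismatch")
    grad.zip(weights).map { case (g, w) => g + weightCost * w }
  }
}
```

With a weight cost of 0.0, as in the experiments reported in this description, `l2Gradient` returns the gradient unchanged.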

Experimental Results

mnist dataset

MLP

network structure: 784 * 500 * 10; Gradient descent: learning Rate: 0.1; Weight cost 0.0; Dropout rate: 0.2, fraction: 0.05; Training data: 5000; Test data: 5000; AdaGrad(rho 0.99 epsilon 0.01 gamma 0.1 momentum 0.9)

Iterations	Test error
1000	4.86 %

DBN

network structure: 784 * 300 * 300 * 500 * 10; Gradient descent: AdaGrad; learning Rate: 0.05; Weight cost 0.0; Dropout rate: 0.5, miniBatch: 300; Training data: 60000; Test data: 10000

Iterations	Test error
6000	1.96 %
20000	1.70 %

REFERENCES

J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Ng, "Large scale distributed deep networks", in NIPS, 2012.
M. D. Zeiler, "ADADELTA: An Adaptive Learning Rate Method", CoRR, abs/1212.5701, 2012.
G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors", CoRR, abs/1207.0580, 2012.
N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting", Journal of Machine Learning Research, vol. 15, pp. 1929–1958, 2014.

SparkQA · 2014-11-12T09:17:41Z

Test build #23257 has started for PR 3222 at commit 8ced3e8.

This patch merges cleanly.

SparkQA · 2014-11-12T09:17:46Z

Test build #23257 has finished for PR 3222 at commit 8ced3e8.

This patch fails RAT tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class RBM(val numVisible: Int, val numHidden: Int, scale: Double = 0.01D)
- case class MinstItem(label: Int, data: Array[Int])
- class MinstDatasetReader(labelsFile: String, imagesFile: String)

AmplabJenkins · 2014-11-12T09:17:47Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23257/
Test FAILed.

SparkQA · 2014-11-12T09:25:30Z

Test build #23258 has started for PR 3222 at commit c5e4324.

This patch merges cleanly.

SparkQA · 2014-11-12T10:47:48Z

Test build #23258 has finished for PR 3222 at commit c5e4324.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class RBM(val numVisible: Int, val numHidden: Int, scale: Double = 0.01D)
- case class MinstItem(label: Int, data: Array[Int])
- class MinstDatasetReader(labelsFile: String, imagesFile: String)

AmplabJenkins · 2014-11-12T10:47:52Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23258/
Test PASSed.

witgo · 2014-11-14T09:29:05Z

Jenkins, retest this please.

SparkQA · 2014-11-14T09:29:58Z

Test build #23363 has started for PR 3222 at commit 1e4fa3b.

This patch merges cleanly.

SparkQA · 2014-11-14T09:35:12Z

Test build #23364 has started for PR 3222 at commit 1e4fa3b.

This patch merges cleanly.

SparkQA · 2014-11-14T10:54:28Z

Test build #23363 has finished for PR 3222 at commit 1e4fa3b.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class NN(val topology: Array[Int], scale: Double = 0.01D)
- class RBM(val numVisible: Int, val numHidden: Int, scale: Double = 0.01D)

AmplabJenkins · 2014-11-14T10:54:32Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23363/
Test PASSed.

SparkQA · 2014-11-14T11:00:28Z

Test build #23364 has finished for PR 3222 at commit 1e4fa3b.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class NN(val topology: Array[Int], scale: Double = 0.01D)
- class RBM(val numVisible: Int, val numHidden: Int, scale: Double = 0.01D)

AmplabJenkins · 2014-11-14T11:00:32Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23364/
Test PASSed.

Myasuka · 2014-11-15T12:16:39Z

Why are there no comments or a README in your code?

witgo · 2014-11-15T13:43:50Z

Sorry, this patch is still a work in progress. I will add comments and documentation later.
BTW, my English is poor. We can communicate by email, which is more efficient.

SparkQA · 2014-11-16T14:45:03Z

Test build #23440 has started for PR 3222 at commit 5f1c8a0.

This patch merges cleanly.

debasish83 · 2014-11-16T15:00:42Z

@witgo is your neural net model an RDD or an array?

witgo · 2014-11-16T15:28:01Z

Currently, the neural net model is stored in a matrix. The model can support a 10000 * 500 * 100 three-layer neural network and a 100000 * 1000 two-layer neural network.
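
To put those sizes in perspective, the following sketch estimates the memory taken by the dense weight matrices of a fully connected network. It assumes one 8-byte `Double` per weight and ignores biases and JVM object overhead; the comment above does not state the element type or layout, so treat `ModelSizeSketch` and `weightBytes` as hypothetical helpers rather than anything in the patch.

```scala
// Illustrative only: back-of-the-envelope size of the weight matrices.
object ModelSizeSketch {

  /** Bytes needed to store the weights between consecutive layers as Doubles. */
  def weightBytes(topology: Seq[Int]): Long = {
    require(topology.length >= 2, "need at least an input and an output layer")
    // One (in x out) matrix per pair of adjacent layers.
    val weights = topology.zip(topology.tail).map { case (in, out) => in.toLong * out }.sum
    weights * 8L // assumed: 8 bytes per Double
  }
}
```

Under these assumptions, `weightBytes(Seq(10000, 500, 100))` is 40,400,000 bytes (about 40 MB) and `weightBytes(Seq(100000, 1000))` is 800,000,000 bytes (about 800 MB).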

SparkQA · 2014-11-16T15:50:26Z

Test build #23440 has finished for PR 3222 at commit 5f1c8a0.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class DBN(val stackedRBM: StackedRBM, val nn: NN)
- class RBM(val numVisible: Int, val numHidden: Int, scale: Double = 0.001D)
- class StackedRBM(val innerRBMs: Array[RBM])
- case class MinstItem(label: Int, data: Array[Int])
- class MinstDatasetReader(labelsFile: String, imagesFile: String)

AmplabJenkins · 2014-11-16T15:50:29Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23440/
Test FAILed.

witgo · 2014-11-16T16:08:59Z

mllib/src/main/scala/org/apache/spark/mllib/neuralNetwork/DBN.scala

+    weightCost: Double,
+    learningRate: Double): DBN = {
+    StackedRBM.train(data.map(_._1), batchSize, numIteration, dbn.stackedRBM,
+      fraction, momentum, weightCost, learningRate, dbn.stackedRBM.numLayer - 1)


Should the last layer also be trained?

I think it depends on the problem. The last layer usually changes with the task (classification, regression, etc.), so it may not need to be trained during pretraining as long as it is trained during fine-tuning.

I see, thanks.

SparkQA · 2014-11-20T15:04:56Z

Test build #23674 has started for PR 3222 at commit af8fbb3.

This patch merges cleanly.

SparkQA · 2015-01-10T16:42:39Z

Test build #25361 has started for PR 3222 at commit 9492dea.

This patch merges cleanly.

SparkQA · 2015-01-10T17:50:23Z

Test build #25361 has finished for PR 3222 at commit 9492dea.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class AdaGradUpdater(
- class DBN(val stackedRBM: StackedRBM)
- class MLP(
- class MomentumUpdater(val momentum: Double) extends Updater
- class RBM(
- class StackedRBM(val innerRBMs: Array[RBM])
- case class MinstItem(label: Int, data: Array[Int])
- class MinstDatasetReader(labelsFile: String, imagesFile: String)
- protected class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String]

AmplabJenkins · 2015-01-10T17:50:27Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25361/
Test PASSed.

musicx · 2015-01-22T08:05:51Z

Hi @witgo, where can I find your email? I would like to communicate in Chinese.

witgo · 2015-01-22T08:08:17Z

witgo#qq.com

SparkQA · 2015-01-27T06:02:46Z

Test build #26145 has started for PR 3222 at commit 612c7bd.

This patch merges cleanly.

SparkQA · 2015-01-27T07:12:11Z

Test build #26145 has finished for PR 3222 at commit 612c7bd.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class AdaGradUpdater(
- class DBN(val stackedRBM: StackedRBM)
- class MLP(
- class MomentumUpdater(val momentum: Double) extends Updater
- class RBM(
- class StackedRBM(val innerRBMs: Array[RBM])
- case class MinstItem(label: Int, data: Array[Int])
- class MinstDatasetReader(labelsFile: String, imagesFile: String)

AmplabJenkins · 2015-01-27T07:12:15Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26145/
Test PASSed.

SparkQA · 2015-01-28T05:22:44Z

Test build #26207 has started for PR 3222 at commit de47aaf.

This patch merges cleanly.

SparkQA · 2015-01-28T06:32:03Z

Test build #26207 has finished for PR 3222 at commit de47aaf.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class AdaGradUpdater(
- class DBN(val stackedRBM: StackedRBM)
- class MLP(
- class MomentumUpdater(val momentum: Double) extends Updater
- class RBM(
- class StackedRBM(val innerRBMs: Array[RBM])
- case class MinstItem(label: Int, data: Array[Int])
- class MinstDatasetReader(labelsFile: String, imagesFile: String)

AmplabJenkins · 2015-01-28T06:32:07Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26207/
Test PASSed.

srowen · 2015-03-22T17:12:02Z

So, this hasn't been touched in a couple months, and doesn't merge. It overlaps with some existing functionality in MLlib, and some other works in progress. It's really a big-bang change that dumps a lot of new code and I'm not sure it's been argued that all of this belongs in MLlib. Some of this functionality might belong in the new API as a transformation. I think this should be closed, at this point, in favor of collaborating on the other ANN implementation or reintroducing some of this in much smaller changes.

witgo · 2015-03-23T01:44:14Z

Well, I have to close it

witgo force-pushed the rbm branch from 8ced3e8 to c5e4324 Compare November 12, 2014 09:18

witgo changed the title ~~[WIP][SPARK-4251][MLLIB]Add Restricted Boltzmann machine(RBM) algorithm to MLlib~~ [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, ANN,BDN algorithm to MLlib Nov 14, 2014

witgo changed the title ~~[WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, ANN,BDN algorithm to MLlib~~ [WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, ANN, BDN algorithm to MLlib Nov 14, 2014

witgo force-pushed the rbm branch 2 times, most recently from c78b421 to 1e4fa3b Compare November 14, 2014 09:23

witgo force-pushed the rbm branch 2 times, most recently from 9855fe1 to 5f1c8a0 Compare November 16, 2014 14:41

witgo reviewed Nov 16, 2014

witgo force-pushed the rbm branch from 5f1c8a0 to af8fbb3 Compare November 20, 2014 14:58

witgo force-pushed the rbm branch from 38efd6d to 9492dea Compare January 10, 2015 16:41

witgo force-pushed the rbm branch from 9492dea to 612c7bd Compare January 27, 2015 05:58

witgo added 12 commits January 28, 2015 13:20

Add RBM, ANN and BDN algorithm to MLlib (13132ab)
assert -> require (5b43818)
Fix initializeDBN (c590e8f)
mins -> mnist (2396e40)
Refactoring Layer (165ff28)
epsilon default 0.2 (aa654f3)
Refactoring code (c07be5e)
Add StackedAutoEncoder (fc41695)
revert Add StackedAutoEncoder (ad26d32)
Refactoring code 2 (4c38776)
fix AdaGrad formula error (42fd3b8)
use Matrix.transpose (de47aaf)

witgo force-pushed the rbm branch from 612c7bd to de47aaf Compare January 28, 2015 05:20

debasish83 mentioned this pull request Mar 17, 2015

[MLLIB] [spark-2352] Implementation of an Artificial Neural Network (ANN) #1290

Closed

witgo closed this Mar 23, 2015

Review comments on mllib/src/main/scala/org/apache/spark/mllib/neuralNetwork/DBN.scala, in order: witgo (Nov 16, 2014), Lewuathe (Nov 19, 2014), witgo (Nov 19, 2014).
