[WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, ANN, BDN algorithm to MLlib #3222

Closed
wants to merge 12 commits into from

witgo
Contributor

@witgo witgo commented Nov 12, 2014

Activation function

  • softmax
  • sigmoid
  • tanh
  • softplus
  • ReLU
  • Noisy ReLU
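
For reference, the deterministic activations listed above can be sketched in a few lines of Scala. This is an illustrative sketch only, not the code from this patch: the object name `Activations` and its method names are assumptions, and Noisy ReLU is left out because it needs a random noise source whose variance the description does not specify.

```scala
// Illustrative definitions only; names are hypothetical, not taken from the patch.
object Activations {
  // Logistic sigmoid: maps any real input into (0, 1).
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // Hyperbolic tangent: maps any real input into (-1, 1).
  def tanh(x: Double): Double = java.lang.Math.tanh(x)

  // Softplus, log(1 + e^x), written so that large |x| does not overflow.
  def softplus(x: Double): Double =
    math.max(x, 0.0) + java.lang.Math.log1p(math.exp(-math.abs(x)))

  // Rectified linear unit.
  def relu(x: Double): Double = math.max(0.0, x)

  // Softmax over a vector; subtracting the maximum keeps exp() in range.
  def softmax(xs: Array[Double]): Array[Double] = {
    val m = xs.max
    val exps = xs.map(x => math.exp(x - m))
    val total = exps.sum
    exps.map(_ / total)
  }
}
```

Softmax is the only one of these that acts on a whole layer at once, which is why it is typically used for the output layer of a classifier.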

Gradient descent

  • AdaDelta
  • AdaGrad
  • L-BFGS
  • Momentum
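
As a point of reference for the AdaGrad entry, here is a minimal sketch of the textbook per-coordinate update. The class name `AdaGradSketch` and its parameters are hypothetical; the patch's own `AdaGradUpdater` appears to take further settings (rho, gamma, momentum, as listed in the experimental results) that are not modelled here.

```scala
// Textbook AdaGrad, assuming a dense weight vector; not the patch's AdaGradUpdater.
final class AdaGradSketch(size: Int, learningRate: Double, epsilon: Double = 1e-8) {
  // Running sum of squared gradients, one entry per weight.
  private val sumSquares = new Array[Double](size)

  // Updates `weights` in place from one gradient evaluation.
  def step(weights: Array[Double], grad: Array[Double]): Unit = {
    require(weights.length == size && grad.length == size, "dimension mismatch")
    var i = 0
    while (i < size) {
      sumSquares(i) += grad(i) * grad(i)
      // The effective step shrinks as squared gradients accumulate.
      weights(i) -= learningRate * grad(i) / (math.sqrt(sumSquares(i)) + epsilon)
      i += 1
    }
  }
}
```

Because the accumulator only grows, repeated gradients of the same size produce progressively smaller steps on that coordinate.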

Regularization method

  • Dropout
  • L2
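
To make the Dropout entry concrete, the sketch below shows one common variant, inverted dropout, applied to a vector of activations. It is an assumption that this variant matches the patch (the description does not say whether scaling happens at training or at test time), and the object name `DropoutSketch` is hypothetical.

```scala
import scala.util.Random

// Inverted dropout sketch; whether the patch scales at train or test time is not stated.
object DropoutSketch {
  // Zeroes each activation with probability `rate` and rescales the survivors
  // by 1 / (1 - rate) so the expected activation is unchanged.
  def apply(activations: Array[Double], rate: Double, rng: Random): Array[Double] = {
    require(rate >= 0.0 && rate < 1.0, "rate must be in [0, 1)")
    val keep = 1.0 - rate
    activations.map(a => if (rng.nextDouble() < keep) a / keep else 0.0)
  }
}
```

With this variant no rescaling is needed at prediction time; the layer is simply used without the mask.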

Experimental Results

MNIST dataset

MLP

Network structure: 784 * 500 * 10; gradient descent: AdaGrad (rho 0.99, epsilon 0.01, gamma 0.1, momentum 0.9); learning rate: 0.1; weight cost: 0.0; dropout rate: 0.2; fraction: 0.05; training data: 5000; test data: 5000

Iterations    Test error
1000          4.86%

DBN

Network structure: 784 * 300 * 300 * 500 * 10; gradient descent: AdaGrad; learning rate: 0.05; weight cost: 0.0; dropout rate: 0.5; miniBatch: 300; training data: 60000; test data: 10000

Iterations    Test error
6000          1.96%
20000         1.70%

@SparkQA

SparkQA commented Nov 12, 2014

Test build #23257 has started for PR 3222 at commit 8ced3e8.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 12, 2014

Test build #23257 has finished for PR 3222 at commit 8ced3e8.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RBM(val numVisible: Int, val numHidden: Int, scale: Double = 0.01D)
    • case class MinstItem(label: Int, data: Array[Int])
    • class MinstDatasetReader(labelsFile: String, imagesFile: String)

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23257/
Test FAILed.

@SparkQA

SparkQA commented Nov 12, 2014

Test build #23258 has started for PR 3222 at commit c5e4324.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 12, 2014

Test build #23258 has finished for PR 3222 at commit c5e4324.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class RBM(val numVisible: Int, val numHidden: Int, scale: Double = 0.01D)
    • case class MinstItem(label: Int, data: Array[Int])
    • class MinstDatasetReader(labelsFile: String, imagesFile: String)

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23258/
Test PASSed.

witgo changed the title from "[WIP][SPARK-4251][MLLIB]Add Restricted Boltzmann machine(RBM) algorithm to MLlib" to "[WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, ANN,BDN algorithm to MLlib" on Nov 14, 2014
witgo changed the title from "[WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, ANN,BDN algorithm to MLlib" to "[WIP][SPARK-4251][SPARK-2352][MLLIB]Add RBM, ANN, BDN algorithm to MLlib" on Nov 14, 2014

@witgo
Contributor Author

witgo commented Nov 14, 2014

Jenkins, retest this please.

@SparkQA

SparkQA commented Nov 14, 2014

Test build #23363 has started for PR 3222 at commit 1e4fa3b.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 14, 2014

Test build #23364 has started for PR 3222 at commit 1e4fa3b.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 14, 2014

Test build #23363 has finished for PR 3222 at commit 1e4fa3b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class NN(val topology: Array[Int], scale: Double = 0.01D)
    • class RBM(val numVisible: Int, val numHidden: Int, scale: Double = 0.01D)

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23363/
Test PASSed.

@SparkQA

SparkQA commented Nov 14, 2014

Test build #23364 has finished for PR 3222 at commit 1e4fa3b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class NN(val topology: Array[Int], scale: Double = 0.01D)
    • class RBM(val numVisible: Int, val numHidden: Int, scale: Double = 0.01D)

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23364/
Test PASSed.

@Myasuka
Member

Myasuka commented Nov 15, 2014

Why are there no code comments or a README in your code?

@witgo
Contributor Author

witgo commented Nov 15, 2014

Sorry, this patch is still a work in progress. I will add code comments and documentation later.
BTW, my English is poor. We can communicate by email, which would be more efficient.

@SparkQA

SparkQA commented Nov 16, 2014

Test build #23440 has started for PR 3222 at commit 5f1c8a0.

  • This patch merges cleanly.

@debasish83

@witgo is your neural net model an RDD or an array?

@witgo
Contributor Author

witgo commented Nov 16, 2014

Currently the neural net model is stored in a matrix. It can support a three-layer neural network of size 10000 * 500 * 100 and a two-layer neural network of size 100000 * 1000.
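
A rough size estimate shows why a local matrix can hold these topologies. The helper below is a sketch under stated assumptions: dense weights stored as 8-byte doubles and one bias per non-input unit. The patch may store things differently, and the object name `ModelSize` is hypothetical.

```scala
// Back-of-the-envelope sizing; assumes dense Double weights plus one bias per non-input unit.
object ModelSize {
  // Total weights and biases of a fully connected network with the given layer sizes.
  def parameterCount(topology: Seq[Int]): Long = {
    require(topology.length >= 2, "need at least an input and an output layer")
    topology.sliding(2).map { case Seq(in, out) => in.toLong * out + out }.sum
  }

  // Memory footprint in megabytes (10^6 bytes), at 8 bytes per parameter.
  def megabytes(topology: Seq[Int]): Double = parameterCount(topology) * 8.0 / 1e6
}
```

Under these assumptions, `ModelSize.parameterCount(Seq(10000, 500, 100))` is 5,050,600 parameters (about 40 MB) and `ModelSize.parameterCount(Seq(100000, 1000))` is 100,001,000 parameters (about 800 MB), so both fit in the memory of a single driver or executor.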

@SparkQA

SparkQA commented Nov 16, 2014

Test build #23440 has finished for PR 3222 at commit 5f1c8a0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class DBN(val stackedRBM: StackedRBM, val nn: NN)
    • class RBM(val numVisible: Int, val numHidden: Int, scale: Double = 0.001D)
    • class StackedRBM(val innerRBMs: Array[RBM])
    • case class MinstItem(label: Int, data: Array[Int])
    • class MinstDatasetReader(labelsFile: String, imagesFile: String)

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23440/
Test FAILed.

weightCost: Double,
learningRate: Double): DBN = {
StackedRBM.train(data.map(_._1), batchSize, numIteration, dbn.stackedRBM,
fraction, momentum, weightCost, learningRate, dbn.stackedRBM.numLayer - 1)

Contributor Author

Should the last layer also be trained?

Contributor

I think it depends on the problem. The last layer usually changes depending on whether the task is classification, regression, etc. It might not need to be pre-trained, as long as it is trained during fine-tuning.
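
The idea in this reply can be sketched schematically: pre-train every layer of the stack except the last, and leave the last one to supervised fine-tuning. The sketch below is not the patch's `StackedRBM.train`; the `Layer` case class, the `pretrain` function, and the reading of the final `numLayer - 1` argument as "number of layers to pre-train" are all assumptions made for illustration.

```scala
// Schematic only: marks which layers would be pre-trained, without any real training.
object PretrainSketch {
  // Hypothetical stand-in for one RBM layer: its sizes and whether it was pre-trained.
  final case class Layer(numVisible: Int, numHidden: Int, pretrained: Boolean = false)

  // Greedily "pre-trains" the first `toLayer` layers in order; later layers are
  // returned untouched so that fine-tuning alone determines their weights.
  def pretrain(layers: Vector[Layer], toLayer: Int): Vector[Layer] = {
    require(toLayer >= 0 && toLayer <= layers.length, "toLayer out of range")
    layers.zipWithIndex.map { case (layer, i) =>
      if (i < toLayer) layer.copy(pretrained = true) else layer
    }
  }
}
```

For a 784 * 300 * 300 * 500 * 10 network viewed as four stacked layers, calling `pretrain(stack, stack.length - 1)` would mark the first three layers and skip the 500 * 10 output layer.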

Contributor Author

I see, Thanks.

@SparkQA

SparkQA commented Nov 20, 2014

Test build #23674 has started for PR 3222 at commit af8fbb3.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 10, 2015

Test build #25361 has started for PR 3222 at commit 9492dea.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 10, 2015

Test build #25361 has finished for PR 3222 at commit 9492dea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class AdaGradUpdater(
    • class DBN(val stackedRBM: StackedRBM)
    • class MLP(
    • class MomentumUpdater(val momentum: Double) extends Updater
    • class RBM(
    • class StackedRBM(val innerRBMs: Array[RBM])
    • case class MinstItem(label: Int, data: Array[Int])
    • class MinstDatasetReader(labelsFile: String, imagesFile: String)
    • protected class CaseInsensitiveMap(map: Map[String, String]) extends Map[String, String]

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25361/
Test PASSed.

@musicx

musicx commented Jan 22, 2015

Hi @witgo, where can I find your email? I would like to communicate in Chinese.

@witgo
Contributor Author

witgo commented Jan 22, 2015

witgo#qq.com

@SparkQA

SparkQA commented Jan 27, 2015

Test build #26145 has started for PR 3222 at commit 612c7bd.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 27, 2015

Test build #26145 has finished for PR 3222 at commit 612c7bd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class AdaGradUpdater(
    • class DBN(val stackedRBM: StackedRBM)
    • class MLP(
    • class MomentumUpdater(val momentum: Double) extends Updater
    • class RBM(
    • class StackedRBM(val innerRBMs: Array[RBM])
    • case class MinstItem(label: Int, data: Array[Int])
    • class MinstDatasetReader(labelsFile: String, imagesFile: String)

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26145/
Test PASSed.

@SparkQA

SparkQA commented Jan 28, 2015

Test build #26207 has started for PR 3222 at commit de47aaf.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 28, 2015

Test build #26207 has finished for PR 3222 at commit de47aaf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class AdaGradUpdater(
    • class DBN(val stackedRBM: StackedRBM)
    • class MLP(
    • class MomentumUpdater(val momentum: Double) extends Updater
    • class RBM(
    • class StackedRBM(val innerRBMs: Array[RBM])
    • case class MinstItem(label: Int, data: Array[Int])
    • class MinstDatasetReader(labelsFile: String, imagesFile: String)

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26207/
Test PASSed.

@srowen
Member

srowen commented Mar 22, 2015

So, this hasn't been touched in a couple months, and doesn't merge. It overlaps with some existing functionality in MLlib, and some other works in progress. It's really a big-bang change that dumps a lot of new code and I'm not sure it's been argued that all of this belongs in MLlib. Some of this functionality might belong in the new API as a transformation. I think this should be closed, at this point, in favor of collaborating on the other ANN implementation or reintroducing some of this in much smaller changes.

@witgo
Contributor Author

witgo commented Mar 23, 2015

Well, I have to close it

@witgo witgo closed this Mar 23, 2015