[MLLIB] [spark-2352] Implementation of an Artificial Neural Network (ANN) #1290

Closed
wants to merge 143 commits

@bgreeven
bgreeven commented Jul 3, 2014

The code contains a multi-layer ANN implementation, with a variable number of inputs, outputs and hidden nodes. It takes as input an RDD of vector pairs, corresponding to the training set with inputs and outputs.

In addition to two automated tests, an example program is included, which also provides a graphical representation.

@mengxr
Contributor
mengxr commented Jul 3, 2014

@bgreeven Please add [MLLIB] to your PR, following https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark . It makes it easier for people who want to search MLlib's PRs. Thanks!

@mengxr
Contributor
mengxr commented Jul 3, 2014

Jenkins, test this please.

@bgreeven bgreeven changed the title from [spark-2352] Implementation of an 1-hidden layer Artificial Neural Network (ANN) to [MLLIB] [spark-2352] Implementation of an 1-hidden layer Artificial Neural Network (ANN) Jul 4, 2014
@mengxr
Contributor
mengxr commented Jul 16, 2014

Jenkins, retest this please.

@SparkQA
SparkQA commented Jul 16, 2014

QA tests have started for PR 1290. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16708/consoleFull

@SparkQA
SparkQA commented Jul 16, 2014

QA results for PR 1290:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
abstract class GeneralizedSteepestDescendModel(val weights: Vector )
trait ANN {
class LeastSquaresGradientANN( noInp: Integer, noHid: Integer, noOut: Integer, b: Double ) extends Gradient with ANN {
class ANNUpdater extends Updater {
class ParallelANN (

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16708/consoleFull

@mburke13

@bgreeven Are you continuing work on this pull request so that it passes all unit tests?

@bgreeven

Hi Matthew,

Sure, I can. I was on holiday during the last two weeks, but now back in office. I'll update the code this week.

Best regards,

Bert


@bgreeven

I updated the two source files to comply with "sbt/sbt scalastyle". Maybe retry the unit tests with the new modifications?

@mengxr
Contributor
mengxr commented Jul 30, 2014

Jenkins, add to whitelist.

@mengxr
Contributor
mengxr commented Jul 30, 2014

Jenkins, test this please.

@mengxr
Contributor
mengxr commented Jul 30, 2014

@bgreeven Jenkins will be automatically triggered for future updates.

@SparkQA
SparkQA commented Jul 30, 2014

QA tests have started for PR 1290. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17412/consoleFull

@SparkQA
SparkQA commented Jul 30, 2014

QA results for PR 1290:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
abstract class GeneralizedSteepestDescendModel(val weights: Vector )
trait ANN {
class LeastSquaresGradientANN(
class ANNUpdater extends Updater {
class ParallelANN (

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17412/consoleFull

@SparkQA
SparkQA commented Jul 30, 2014

QA tests have started for PR 1290. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17440/consoleFull

@SparkQA
SparkQA commented Jul 30, 2014

QA results for PR 1290:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
abstract class GeneralizedSteepestDescendModel(val weights: Vector )
trait ANN {
class LeastSquaresGradientANN(
class ANNUpdater extends Updater {
class ParallelANN (

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17440/consoleFull

@mengxr
Contributor
mengxr commented Aug 1, 2014

@bgreeven The filename mllib/src/main/scala/org/apache/spark/mllib/ann/GeneralizedSteepestDescendAlgorithm doesn't have .scala extension.

@bgreeven
bgreeven commented Aug 1, 2014

Thanks a lot! I have added the extension now.

@SparkQA
SparkQA commented Aug 1, 2014

QA tests have started for PR 1290. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17649/consoleFull

@SparkQA
SparkQA commented Aug 1, 2014

QA results for PR 1290:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
abstract class GeneralizedSteepestDescendModel(val weights: Vector )
trait ANN {
class LeastSquaresGradientANN(
class ANNUpdater extends Updater {
class ParallelANN (

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17649/consoleFull

@SparkQA
SparkQA commented Aug 1, 2014

QA tests have started for PR 1290. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17665/consoleFull

@SparkQA
SparkQA commented Aug 1, 2014

QA results for PR 1290:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
abstract class GeneralizedSteepestDescendModel(val weights: Vector )
trait ANN {
class LeastSquaresGradientANN(
class ANNUpdater extends Updater {
class ParallelANN (

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17665/consoleFull

@hunggpham

Hi Bert,

I want to try your ANN on Spark but could not find it in the latest clone. It's probably not there yet, despite the successful test and merge messages above (10 days ago). How can I get a copy of your ANN code and try it out?

Thanks,
Hung Pham

@debasish83

Hung,
You can merge this repository into your Spark fork and you should be able to see the code.

@debasish83

SteepestDescend should be SteepestDescent !

@bgreeven

SteepestDescend -> SteepestDescent can be changed. Thanks for noticing.

Hung Pham, did it work out for you now?

@hunggpham

Yes, I forked your repository and can see the code now. One question:
I don't see back-propagation code. Will that be added later? Thanks.

@bgreeven

The ANN uses the existing GradientDescent from mllib.optimization for back-propagation. It uses the gradient from the new LeastSquaresGradientANN class, and updates using the new ANNUpdater class.

This line in ANNUpdater.compute is the backbone of the back propagation:

brzAxpy(-thisIterStepSize, gradient.toBreeze, brzWeights)
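
For context, a minimal sketch of what such an update step looks like, following MLlib's Updater contract (illustrative only; simpleUpdate is a made-up name and the PR's actual ANNUpdater may differ in details such as step-size scaling):

    import breeze.linalg.{axpy => brzAxpy, DenseVector => BDV}
    import org.apache.spark.mllib.linalg.{Vector, Vectors}

    // weights := weights - (stepSize / sqrt(iter)) * gradient, with no regularization term
    def simpleUpdate(weights: Vector, gradient: Vector, stepSize: Double, iter: Int): (Vector, Double) = {
      val thisIterStepSize = stepSize / math.sqrt(iter)
      val brzWeights = new BDV[Double](weights.toArray.clone())
      brzAxpy(-thisIterStepSize, new BDV[Double](gradient.toArray), brzWeights)
      (Vectors.dense(brzWeights.toArray), 0.0) // second element is the regularization value (0 here)
    }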

@hunggpham

I finally see the back-propagation code in the two for loops inside LeastSquaresGradientANN that calculate the gradient, which is then used to update the weights by ANNUpdater.

Thanks, Bert.


@avulanov
Contributor

@bgreeven I've tried to train ParallelANNWithSGD with 3 layers (1000x500x18), numIterations 1000, stepSize 1. My dataset has ~2000 instances, 1000 features and 18 classes. After 17 hours it hadn't finished and I killed the Spark process. I think there are some performance issues. I'll try to look at your code, but without comments it will be challenging :)

@bgreeven

Thanks for your feedback. I'll write some documentation, and also add some comments. I'll try with similarly sized data.

The internal data structure of the weights (and gradient) has a dimension of (1001*500)+(501*18) = 509518 floats. The weights are stored in a non-sparse vector, sometimes converted to Breeze. There may be an issue with that for this size of data. It should be possible though, so it is worth having a look at how to fix it.
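
As a quick sanity check of that count, a throwaway snippet (not part of the PR; weightCount is a hypothetical helper) that computes the weight-vector length for a fully connected topology with one bias node per non-output layer:

    // (inputs + 1 bias) * hidden + (hidden + 1 bias) * outputs
    def weightCount(topology: Array[Int]): Int =
      topology.sliding(2).map { case Array(in, out) => (in + 1) * out }.sum

    weightCount(Array(1000, 500, 18)) // (1001 * 500) + (501 * 18) = 509518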

@avulanov
Contributor

@bgreeven I've looked at your code and the algorithm seems to be implemented correctly, to the best of my knowledge. Probably the copying of the array of weights harms the performance. I played with a single-threaded implementation of a perceptron in Scala and it works fine for my size of data (i.e. it takes around a few minutes).

@SparkQA
SparkQA commented Aug 21, 2014

QA tests have started for PR 1290 at commit 767071f.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Aug 21, 2014

QA tests have finished for PR 1290 at commit 767071f.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class GeneralizedSteepestDescentModel(val weights: Vector )
    • trait ANN
    • class LeastSquaresGradientANN(
    • class ANNUpdater extends Updater
@bgreeven

I have updated the code. Indeed the LeastSquaresGradientANN.compute function was the culprit.

I removed the Breeze instructions and replaced them with simple Array[Double] operations. I think the removal of the Breeze subvector extraction helps in particular.

In addition, some values can be re-used and don't need to be re-calculated in every loop iteration of the LeastSquaresGradientANN.compute function, so I have changed the loop order and moved some computations up the loop hierarchy.

There is a considerable speed-up. It now works well with a data set of 1898 items (970 training, 928 test) and an ANN with 256 input, 128 hidden and 26 output nodes (a letter classifier).
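
To illustrate the kind of change described above (this is not the PR's actual code; the flat row-major weight layout with a trailing bias entry per hidden node is an assumption), computing the hidden activations by indexing directly into the weight array instead of slicing Breeze subvectors looks roughly like this:

    // Assumed layout: weights(j * (nIn + 1) + i) is the weight from input i to hidden node j,
    // and weights(j * (nIn + 1) + nIn) is the bias of hidden node j.
    def hiddenActivations(weights: Array[Double], x: Array[Double], nIn: Int, nHid: Int): Array[Double] = {
      val out = new Array[Double](nHid)
      var j = 0
      while (j < nHid) {
        val base = j * (nIn + 1)
        var sum = weights(base + nIn) // bias term
        var i = 0
        while (i < nIn) {
          sum += weights(base + i) * x(i)
          i += 1
        }
        out(j) = 1.0 / (1.0 + math.exp(-sum)) // sigmoid activation
        j += 1
      }
      out
    }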

@avulanov
Contributor

@bgreeven In the meantime I did another implementation of a neural network classifier with an arbitrary number of hidden layers. It uses GradientDescent to train on RDD[LabeledPoint] like the other classifiers in mllib, has unit tests, and works on my example 2000x500x18 with ~2000 instances. It doesn't have bias units though. Would you like us to join efforts? I can make a pull request into your branch. My implementation is below: https://github.com/avulanov/spark/blob/neuralnetwork/mllib/src/main/scala/org/apache/spark/mllib/classification/NeuralNetwork.scala (branch https://github.com/avulanov/spark/tree/neuralnetwork)

@SparkQA
SparkQA commented Aug 22, 2014

QA tests have started for PR 1290 at commit 3d79aa6.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Aug 22, 2014

QA tests have started for PR 1290 at commit a1b6a7f.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Aug 22, 2014

QA tests have started for PR 1290 at commit 78dde56.

  • This patch merges cleanly.
@bgreeven

Added documentation

@SparkQA
SparkQA commented Aug 22, 2014

QA tests have started for PR 1290 at commit 95a88c6.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Aug 22, 2014

QA tests have started for PR 1290 at commit 9bb9766.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Aug 22, 2014

QA tests have finished for PR 1290 at commit 3d79aa6.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • The 'ParallelANN' class is the main class of the ANN. This class uses a trait 'ANN', which includes functions for calculating the hidden layer ('computeHidden') and calculation of the output ('computeValues'). The output of 'computeHidden' includes the bias node in the hidden layer, such that it does not need to handle the hidden bias node differently.
    • The input of the training function is an RDD with (input/output) training pairs, each input and output being stored as a 'Vector'. The training function returns a variable of from class 'ParallelANNModel', as described below.
    • The 'ParallelANN' class implements a Artificial Neural Network (ANN), using the stochastic gradient descent method. It takes as input an RDD of input/output values of type 'Vector', and returns an object of type 'ParallelANNModel' containing the parameters of the trained ANN. The 'ParallelANNModel' object can also be used to calculate results after training.
    • abstract class GeneralizedSteepestDescentModel(val weights: Vector )
    • trait ANN
    • class LeastSquaresGradientANN(
    • class ANNUpdater extends Updater
@SparkQA
SparkQA commented Aug 22, 2014

QA tests have finished for PR 1290 at commit a1b6a7f.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • The 'ParallelANN' class is the main class of the ANN. This class uses a trait 'ANN', which includes functions for calculating the hidden layer ('computeHidden') and calculation of the output ('computeValues'). The output of 'computeHidden' includes the bias node in the hidden layer, such that it does not need to handle the hidden bias node differently.
    • The input of the training function is an RDD with (input/output) training pairs, each input and output being stored as a 'Vector'. The training function returns a variable of from class 'ParallelANNModel', as described below.
    • The 'ParallelANN' class implements a Artificial Neural Network (ANN), using the stochastic gradient descent method. It takes as input an RDD of input/output values of type 'Vector', and returns an object of type 'ParallelANNModel' containing the parameters of the trained ANN. The 'ParallelANNModel' object can also be used to calculate results after training.
    • abstract class GeneralizedSteepestDescentModel(val weights: Vector )
    • trait ANN
    • class LeastSquaresGradientANN(
    • class ANNUpdater extends Updater
@SparkQA
SparkQA commented Aug 22, 2014

QA tests have finished for PR 1290 at commit 78dde56.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • The 'ParallelANN' class is the main class of the ANN. This class uses a trait 'ANN', which includes functions for calculating the hidden layer ('computeHidden') and calculation of the output ('computeValues'). The output of 'computeHidden' includes the bias node in the hidden layer, such that it does not need to handle the hidden bias node differently.
    • The input of the training function is an RDD with (input/output) training pairs, each input and output being stored as a 'Vector'. The training function returns a variable of from class 'ParallelANNModel', as described below.
    • The 'ParallelANN' class implements a Artificial Neural Network (ANN), using the stochastic gradient descent method. It takes as input an RDD of input/output values of type 'Vector', and returns an object of type 'ParallelANNModel' containing the parameters of the trained ANN. The 'ParallelANNModel' object can also be used to calculate results after training.
    • abstract class GeneralizedSteepestDescentModel(val weights: Vector )
    • trait ANN
    • class LeastSquaresGradientANN(
    • class ANNUpdater extends Updater
@SparkQA
SparkQA commented Aug 22, 2014

QA tests have finished for PR 1290 at commit 95a88c6.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • The 'ParallelANN' class is the main class of the ANN. This class uses a trait 'ANN', which includes functions for calculating the hidden layer ('computeHidden') and calculation of the output ('computeValues'). The output of 'computeHidden' includes the bias node in the hidden layer, such that it does not need to handle the hidden bias node differently.
    • The input of the training function is an RDD with (input/output) training pairs, each input and output being stored as a 'Vector'. The training function returns a variable of from class 'ParallelANNModel', as described below.
    • The 'ParallelANN' class implements a Artificial Neural Network (ANN), using the stochastic gradient descent method. It takes as input an RDD of input/output values of type 'Vector', and returns an object of type 'ParallelANNModel' containing the parameters of the trained ANN. The 'ParallelANNModel' object can also be used to calculate results after training.
    • public final class JavaDecisionTree
    • abstract class GeneralizedSteepestDescentModel(val weights: Vector )
    • trait ANN
    • class LeastSquaresGradientANN(
    • class ANNUpdater extends Updater
@SparkQA
SparkQA commented Aug 22, 2014

QA tests have finished for PR 1290 at commit 9bb9766.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • The 'ParallelANN' class is the main class of the ANN. This class uses a trait 'ANN', which includes functions for calculating the hidden layer ('computeHidden') and calculation of the output ('computeValues'). The output of 'computeHidden' includes the bias node in the hidden layer, such that it does not need to handle the hidden bias node differently.
    • The input of the training function is an RDD with (input/output) training pairs, each input and output being stored as a 'Vector'. The training function returns a variable of from class 'ParallelANNModel', as described below.
    • The 'ParallelANN' class implements a Artificial Neural Network (ANN), using the stochastic gradient descent method. It takes as input an RDD of input/output values of type 'Vector', and returns an object of type 'ParallelANNModel' containing the parameters of the trained ANN. The 'ParallelANNModel' object can also be used to calculate results after training.
    • abstract class GeneralizedSteepestDescentModel(val weights: Vector )
    • trait ANN
    • class LeastSquaresGradientANN(
    • class ANNUpdater extends Updater
@bgreeven

Joining efforts / cooperation is always good of course. :-)

Let me have a closer look at your code first, and see how it differs from mine. I'll try it with my data and see its outcome, usability and speed.

Since the performance optimisation, my code also works with 1024 input nodes, 512 hidden nodes and 26 output nodes. I still need to play a bit more to find the optimal parameters for this particular problem though.

@avulanov
Contributor

@bgreeven ok! I am going to replace the loops with vector operations, which should give a speed-up

@avulanov
Contributor

@bgreeven I've replaced the loops with breeze matrix-vector operations; it works much faster (~4x faster in my tests). https://github.com/avulanov/spark/blob/neuralnetwork/mllib/src/main/scala/org/apache/spark/mllib/classification/NeuralNetwork.scala

@SparkQA
SparkQA commented Sep 2, 2014

QA tests have started for PR 1290 at commit 8d60379.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Sep 2, 2014

QA tests have finished for PR 1290 at commit 8d60379.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class GeneralizedModel(val weights: Vector )
    • trait ANN
    • class LeastSquaresGradientANN(
    • class ANNUpdater extends Updater
@SparkQA
SparkQA commented Sep 2, 2014

QA tests have started for PR 1290 at commit f12165a.

  • This patch merges cleanly.
@bgreeven
bgreeven commented Sep 2, 2014

Now updated such that the code supports true back-propagation.

Thanks to Alexander Ulanov (avulanov) for implementing true back-propagation in his repository first. This code borrows extensively from his code, and uses the same back-propagation algorithm (save for using arrays rather than matrices/vectors) and "layers" vector (here called "topology").

Looking forward to continuing our collaboration!

@SparkQA
SparkQA commented Sep 2, 2014

QA tests have finished for PR 1290 at commit f12165a.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class GeneralizedModel(val weights: Vector )
    • class ParallelANN(
    • class LeastSquaresGradientANN(
    • class ANNUpdater extends Updater
@elyast
Contributor
elyast commented Sep 7, 2014

Hi @avulanov I ran your NeuralNetworkSuite (from your fork, neuralnetwork branch); however, it fails randomly. Are you sure you have implemented it correctly?

By the way, have you thought about regularization?

@bgreeven I checked your code as well, and I see a lot of explicit iteration. I would recommend replacing it with breeze vector operations, since that will be much faster (the same is true for Matlab or R code).

Let me know what you think and whether I can help with anything.

The last thing to consider is factoring out the activation function, so that instead of the sigmoid one can use softmax, tanh or others.

@avulanov
Contributor
avulanov commented Sep 7, 2014

@elyast I am playing with various settings right now; try a higher learning rate or more iterations.
I also thought that breeze would be faster, but according to our tests gradient computation in breeze is on par with bgreeven's implementation, and there is additional overhead from rolling/unrolling matrices to/from the weight vector. Moreover, using native BLAS/LAPACK through breeze doesn't speed anything up. Interestingly, this corresponds to the graphs in https://github.com/fommil/netlib-java. I will be happy if you prove the opposite. That said, the code with breeze is easier to read.
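
For readers unfamiliar with the rolling/unrolling mentioned above, a rough sketch of the idea (illustrative only; the out x (in + 1) layer shape with a bias column is an assumption, not necessarily the layout used in either implementation):

    import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV}

    // The optimizer works on one flat weight vector, so each gradient computation
    // reshapes slices of it into per-layer matrices (and the gradient back again).
    def unroll(weights: BDV[Double], topology: Array[Int]): Array[BDM[Double]] = {
      var offset = 0
      topology.sliding(2).map { case Array(in, out) =>
        val size = (in + 1) * out
        val layer = new BDM[Double](out, in + 1, weights(offset until (offset + size)).toArray)
        offset += size
        layer
      }.toArray
    }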

@mengxr
Contributor
mengxr commented Sep 8, 2014

@bgreeven Thanks for contributing ANN! I made a quick scan over the code. Some high-level comments:

  1. The user guide is for normal users and it should focus on how to use ANN. If we want to leave some notes for developers, we can append a section at the end.
  2. We don't ask users to treat unit tests as demos or examples. Instead, we put a short code snippet in the user guide and put a complete example under examples/.
  3. GeneralizedModel and GeneralizedAlgorithm are definitely out of the scope of this PR and they should not live under mllib.ann. We can make a separate JIRA to discuss the APIs. Could you remove them in this PR?
  4. predict outputs the prediction for the first node. Would the first node be the only special node? How about having predict(v) output the full prediction and predict(v, i) output the prediction to the i-th node?
  5. Could you try to use LBFGS instead of GradientDescent?
  6. Please replace for loops with while loops. The latter is faster in Scala (see the sketch after this list).
  7. Please follow the Spark Code Style Guide and update the style, e.g.,
    a. remove space after ( and before )
    b. add ScalaDoc for all public classes and methods
    c. line width should be smaller than 100 chars (in both main and test)
    d. some verification code is left in comments; please find a way to move it to unit tests
    e. organize imports into groups and order them alphabetically within each group
    f. do not add return or ; unless they are necessary
  8. Please use existing unit tests as templates. For example, please rename TestParallelANN to ParallelANNSuite and use LocalSparkContext and FunSuite for Spark setup and test assertions. Remove main() from unit tests.
  9. Is Parallel necessary in the name ParallelANN?
  10. What methods and classes are public? Could they be private or package private? Please generate the API doc and check the public ones.
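
A tiny illustration of item 6 above (not taken from the PR; dot is just an example function):

    // A dot product written with a while loop, the form preferred in performance-critical Scala code;
    // a for loop over a Range desugars into closure calls and is measurably slower in hot paths.
    def dot(a: Array[Double], b: Array[Double]): Double = {
      var sum = 0.0
      var i = 0
      while (i < a.length) {
        sum += a(i) * b(i)
        i += 1
      }
      sum
    }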

Given the size of the PR, I hope we can split it into multiple PRs. My recommendation would be

  1. Remove user guide, GeneralizedAlgorithm, and GeneralizedModel and focus on the implementation and unit tests.
  2. Add user guide and example code.
  3. Discuss interface for generalized algorithm and model.

I suggest removing the user guide because we may need to go through multiple iterations for the code review of the implementation. It is not necessary to update the user guide with each update.

@bgreeven
bgreeven commented Sep 9, 2014

Thanks for your feedback. Your points are very helpful indeed.

Here is my response:

  1. The user guide is for normal users and it should focus on how to use ANN. If we want to leave some notes for developers, we can append a section at the end.
    [bgreeven]: Sure. I think the user guide needs a lot of revision anyway, but as you said, it is better to wait until the code is more stable to update the user guide.
  2. We don't ask users to treat unit tests as demos or examples. Instead, we put a short code snippet in the user guide and put a complete example under examples/.
    [bgreeven]: OK, I'll see how to convert the demo into a unit test.
  3. GeneralizedModel and GeneralizedAlgorithm are definitely out of the scope of this PR and they should not live under mllib.ann. We can make a separate JIRA to discuss the APIs. Could you remove them in this PR?
  4. predict outputs the prediction for the first node. Would the first node be the only special node? How about having predict(v) output the full prediction and predict(v, i) output the prediction to the i-th node?

[bgreeven]: I certainly understand your concerns on points 3 and 4. My reason for adding GeneralizedModel and GeneralizedAlgorithm was that I see more uses for ANNs than classification alone. A LabeledPoint implementation would restrict the output to essentially a one-dimensional value. If you want to learn e.g. a multidimensional function (such as in the demo), then you need something more general than LabeledPoint.

The architecture of taking only the first element of an output vector is for legacy reasons. GeneralizedLinearModel (on which GeneralizedModel was modelled) as well as ClassificationModel only output a one-dimensional value, hence I made the interface of predict(v) the same and created a separate function predictV(v) to output the multidimensional result.

I think we can indeed open a second JIRA to discuss this, since I think there can also be other uses for multidimensional output than just classification.

  1. Could you try to use LBFGS instead of GradientDescent?
    [bgreeven] Tried it, and that works too. Actually, I would like to make the code more flexible, to allow for replacing the optimisation function. There is a lot of research in (parallelisation of) training ANNs, so the future may bring better optimisation strategies, and it should be easy to plug those into the existing code.
  2. Please replace for loops by while loops. The latter is faster in Scala.
    [bgreeven] Makes sense. Will do so.
  3. Please follow the Spark Code Style Guide (https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide) and update the style, e.g., a. remove space after ( and before ), b. add ScalaDoc for all public classes and methods, c. line width should be smaller than 100 chars (in both main and test), d. some verification code is left in comments, please find a way to move it to unit tests, e. organize imports into groups and order them alphabetically within each group, f. do not add return or ; unless they are necessary
    [bgreeven] OK, I can do that. By the way, it seems that the Spark Code Style Guide is missing some rules. I would be happy to volunteer to expand the Style Guide, also since "sbt/sbt scalastyle" enforces some rules (such as mandatory spaces before and after '+') that are not mentioned in the Style Guide.
  4. Please use existing unit tests as templates. For example, please rename TestParallelANN to ParallelANNSuite and use LocalSparkContext and FunSuite for Spark setup and test assertions. Remove main() from unit tests.
    [bgreeven] OK, I will look at this and see how to convert the demo to a unit test.
  5. Is Parallel necessary in the name ParallelANN?
    [bgreeven] Not really. Better naming is desirable indeed.
  6. What methods and classes are public? Could they be private or package private? Please generate the API doc and check the public ones.
    [bgreeven] Yes, I found out about this too. Some classes and methods need to be made public, as they currently cannot be accessed from outside. Maybe adding a Scala object as an interface (as is done in Alexander's code) is indeed better.
@SparkQA
SparkQA commented Sep 9, 2014

QA tests have started for PR 1290 at commit 62e915d.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Sep 9, 2014

QA tests have finished for PR 1290 at commit 62e915d.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Sep 9, 2014

QA tests have started for PR 1290 at commit 98b427a.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Sep 9, 2014

QA tests have finished for PR 1290 at commit 98b427a.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@mengxr
Contributor
mengxr commented Sep 10, 2014

test this please

@SparkQA
SparkQA commented Sep 10, 2014

QA tests have started for PR 1290 at commit 770ea2a.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Sep 10, 2014

QA tests have finished for PR 1290 at commit 770ea2a.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Sep 10, 2014

QA tests have started for PR 1290 at commit 770ea2a.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Sep 10, 2014

QA tests have finished for PR 1290 at commit 770ea2a.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Sep 12, 2014

QA tests have started for PR 1290 at commit 770ea2a.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Sep 12, 2014

QA tests have finished for PR 1290 at commit 770ea2a.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Sep 15, 2014

QA tests have started for PR 1290 at commit 3a72fae.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Sep 15, 2014

QA tests have finished for PR 1290 at commit 3a72fae.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Sep 17, 2014

QA tests have started for PR 1290 at commit f1af6cf.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Sep 17, 2014

QA tests have finished for PR 1290 at commit f1af6cf.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Sep 22, 2014

QA tests have started for PR 1290 at commit 8acd799.

  • This patch merges cleanly.
@bgreeven

Changed the optimiser to LBFGS. It works much faster, but has the disadvantage (due to the increased convergence speed per iteration) that it also starts to exhibit overfitting earlier (after far fewer iterations).

@bgreeven

I also needed to change the demo, as the fast convergence no longer gives an interesting convergence graph. I moved the demo to the examples directory, but we can consider whether we want to keep it at all.

@SparkQA
SparkQA commented Sep 22, 2014

QA tests have finished for PR 1290 at commit 8acd799.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class OutputCanvas2D(wd: Int, ht: Int) extends Canvas
    • class OutputFrame2D( title: String ) extends Frame( title )
    • class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas
    • class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title)
@SparkQA
SparkQA commented Sep 23, 2014

QA tests have started for PR 1290 at commit b3531d6.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Sep 23, 2014

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20682/

@SparkQA
SparkQA commented Sep 23, 2014

QA tests have started for PR 1290 at commit a28aa4a.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Sep 23, 2014

QA tests have finished for PR 1290 at commit b3531d6.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Sep 23, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20683/

@SparkQA
SparkQA commented Sep 23, 2014

QA tests have finished for PR 1290 at commit a28aa4a.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class OutputCanvas2D(wd: Int, ht: Int) extends Canvas
    • class OutputFrame2D( title: String ) extends Frame( title )
    • class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas
    • class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title)
@SparkQA
SparkQA commented Sep 23, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20684/

@SparkQA
SparkQA commented Sep 25, 2014

QA tests have started for PR 1290 at commit 5b91bba.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Sep 25, 2014

QA tests have started for PR 1290 at commit 0db8951.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Sep 25, 2014

QA tests have finished for PR 1290 at commit 0db8951.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class OutputCanvas2D(wd: Int, ht: Int) extends Canvas
    • class OutputFrame2D( title: String ) extends Frame( title )
    • class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas
    • class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title)
@SparkQA
SparkQA commented Sep 25, 2014

QA tests have started for PR 1290 at commit d4a692c.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Sep 25, 2014

QA tests have finished for PR 1290 at commit 5b91bba.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class OutputCanvas2D(wd: Int, ht: Int) extends Canvas
    • class OutputFrame2D( title: String ) extends Frame( title )
    • class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas
    • class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title)
@SparkQA
SparkQA commented Sep 25, 2014

QA tests have finished for PR 1290 at commit d4a692c.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Sep 26, 2014

QA tests have started for PR 1290 at commit 19d2faa.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Sep 26, 2014

QA tests have finished for PR 1290 at commit 19d2faa.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class OutputCanvas2D(wd: Int, ht: Int) extends Canvas
    • class OutputFrame2D( title: String ) extends Frame( title )
    • class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas
    • class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title)
@SparkQA
SparkQA commented Sep 26, 2014

QA tests have started for PR 1290 at commit aaf3162.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Sep 26, 2014

QA tests have finished for PR 1290 at commit aaf3162.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class OutputCanvas2D(wd: Int, ht: Int) extends Canvas
    • class OutputFrame2D( title: String ) extends Frame( title )
    • class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas
    • class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title)
@SparkQA
SparkQA commented Sep 28, 2014

QA tests have started for PR 1290 at commit 804c07a.

  • This patch merges cleanly.
@SparkQA
SparkQA commented Sep 28, 2014

QA tests have finished for PR 1290 at commit 804c07a.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class OutputCanvas2D(wd: Int, ht: Int) extends Canvas
    • class OutputFrame2D( title: String ) extends Frame( title )
    • class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas
    • class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title)
    • class IndexedRecordToJavaConverter extends Converter[IndexedRecord, JMap[String, Any]]
    • class AvroWrapperToJavaConverter extends Converter[Any, Any]
@SparkQA
SparkQA commented Oct 28, 2014

Test build #22359 has finished for PR 1290 at commit 6d167c5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
@SparkQA
SparkQA commented Oct 29, 2014

Test build #22385 has finished for PR 1290 at commit 79c433e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class OutputCanvas2D(wd: Int, ht: Int) extends Canvas
    • class OutputFrame2D( title: String ) extends Frame( title )
    • class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas
    • class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title)
@SparkQA
SparkQA commented Jan 24, 2015

Test build #26036 has finished for PR 1290 at commit d18e9b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class OutputCanvas2D(wd: Int, ht: Int) extends Canvas
    • class OutputFrame2D( title: String ) extends Frame( title )
    • class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas
    • class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title)
    • trait ANNClassifierHelper
@SparkQA
SparkQA commented Feb 3, 2015

Test build #26555 has finished for PR 1290 at commit 5de5bad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class OutputCanvas2D(wd: Int, ht: Int) extends Canvas
    • class OutputFrame2D( title: String ) extends Frame( title )
    • class OutputCanvas3D(wd: Int, ht: Int, shadowFrac: Double) extends Canvas
    • class OutputFrame3D(title: String, shadowFrac: Double) extends Frame(title)
    • trait ANNClassifierHelper
@avulanov
Contributor
avulanov commented Feb 3, 2015

There are a few PRs related to different types of artificial neural networks. It makes sense to come up with common interfaces, as suggested by @jkbradley and @mengxr, reuse the code and make it easily extensible for new related PRs. Currently, we are refactoring the interface of this PR for this purpose. There is a JIRA (https://issues.apache.org/jira/browse/SPARK-5575) to track ideas.

@Zehao
Zehao commented Mar 16, 2015

Hi, are you still working on this PR? I have a few questions about the latest code from https://github.com/bgreeven/spark .

  1. I ran the code and found that the parameter gradient in ANNUpdater.compute was all zeros during each iteration, and so it didn't change the weights. Does that matter? I don't fully understand the difference between ANNLeastSquaresGradient.compute and ANNUpdater.compute yet, but ANNLeastSquaresGradient.compute does change the weights.
  2. Can you tell me how the bias nodes in the model work?
@avulanov
Contributor

Hi, @Zehao
The Gradient class computes the gradient delta dG; the Updater class updates the weights, simply W(i+1) = W(i) - r*dG, where r is the learning rate. The bias b of a layer works as follows: y = f(A*x + b), where f is the activation function and x is the layer input. A and b are the layer parameters, or weights.
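
A small illustration of those formulas (hypothetical code, not this PR's or the ann-interface API): one layer's forward pass y = f(A*x + b) and a single steepest-descent weight update:

    import breeze.linalg.{DenseMatrix => BDM, DenseVector => BDV}
    import breeze.numerics.sigmoid

    // Forward pass of one layer with weight matrix A, bias b and sigmoid activation f
    def layerForward(A: BDM[Double], b: BDV[Double], x: BDV[Double]): BDV[Double] =
      sigmoid(A * x + b)

    // One update step on a flat weight vector: W(i+1) = W(i) - r * dG
    def sgdStep(w: BDV[Double], dG: BDV[Double], r: Double): BDV[Double] =
      w - (dG * r)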

We are currently working on the more generic interface for the artificial neural networks, which should be easily extensible with the other layer and network types: https://github.com/avulanov/spark/tree/ann-interface

@debasish83

@avulanov could you please point me to a stable branch that I can experiment with? I am focused on collaborative filtering and have implemented various matrix factorization formulations (quadratic and nonlinear forms):
https://issues.apache.org/jira/browse/SPARK-2426
https://issues.apache.org/jira/browse/SPARK-6323
Sparse Coding and PLSA are very useful for feature extraction, but there are papers where neural nets (autoencoders) have beaten both of them.

I want to start experimenting with the autoencoder variants of neural net. Specifically I will be focused on these 3 aspects:

  1. Distributing the neural net gradient calculation (I think you have already distributed the gradient calculation, which is exactly what I want). If not, we should do it on GraphX, similar to the LDA architecture.
  2. A block coordinate descent solver running on the master, with BFGS/OWLQN as the inner solver.
  3. Using L1 regularization in place of the dropout heuristic to automatically select interesting features and make the model sparse.
@debasish83

Also, how is #3222 different? I am confused about which one is the better starting point for an autoencoder...

@avulanov
Contributor

@debasish83 Sounds interesting!

  1. Gradient calculation is distributed by means of data, i.e. each worker calculates a delta gradient on its part of the data. However, it would be interesting to distribute the model for cases when it does not fit into a worker's memory.
  2. If you have a different solver, you can try using it for artificial neural networks by replacing the current one (which is LBFGS).
  3. If you wish to select features, do you plan to use L1 on the bottom layer? In that case you might need to add it to the code yourself.

This branch is relatively stable; I've done a lot of testing (see earlier posts), so you can experiment with it. There is a separate branch with a stacked autoencoder (based on the code from this branch), which I've also tested a lot: https://github.com/avulanov/spark/tree/autoencoder

The main goal of the PR you mentioned is DBN and RBM. Our plan with @witgo and @mengxr is to develop a more generic and scalable interface for artificial neural networks and then port the existing code.

@debasish83

@avulanov If we want an auto-encoder that works at the scale of matrix factorization and LDA, we cannot assume that the model fits into worker memory. Even for a decent 10M x 3M auto-encoder with 10K hidden units we need 10M x 10K + 3M x 10K model memory (most likely sparse)... If the model fits into worker memory, the solver fits into master memory and there is no need for either distributed gradient calculation or a coordinate-descent-based solver... Does @witgo's branch also aggregate gradients in the same way and assume the model will fit on every node? The idea of using L1 is basically to decrease the distributed model memory...

@avulanov
Contributor

@debasish83 The current implementation is intended for more typical uses of artificial neural networks, although it would be an interesting task to implement a distributed model in general, not only for the sparse case. Are you going to build a recommender system with artificial neural networks? Could you please elaborate on this or point me to the relevant papers?

Even if the model fits into memory, there is still a need for a distributed gradient if your data is big. You can process each part of your data on a separate worker simultaneously and then do an update. This will not give a linear speed-up, but it is still worth doing. That's how it is implemented in MLlib's optimizers, and also both in this branch and in GuoQiang's.

@debasish83

@avulanov I like Jeff Dean's first paper... for that you need to distribute the neural net and do a block-based solve... Another one that I saw recently, which improves upon collaborative filtering with matrix factorization/PLSA, is https://www.cs.toronto.edu/~amnih/papers/rbmcf.pdf. Although that is an RBM, I want to start with something simple like Jeff's paper (deep neural net using ReLU)... If there is a JIRA, please let me know and we can discuss further there...

Also I agree with the distributed gradient part...we do that in all mllib algorithms...

@witgo
Contributor
witgo commented Mar 18, 2015

MLlib's existing infrastructure (BSP) is not suitable for large-scale distributed neural nets.
There is a JIRA about the parameter server:
SPARK-4590

@debasish83

@witgo do you want to use the parameter server for gradient calculation as well, or would you like to do it using GraphX (following something similar to the LDA architecture)? See Figure 1 from Jeff Dean's paper Large Scale Distributed Deep Networks.

@witgo
Contributor
witgo commented Mar 18, 2015

Parameter server + matrix calculations is usually more common.
The performance of matrix calculations is better.

@debasish83

I am not sure how to implement Figure 1 using matrices, but using GraphX the gradient calculation is just a forward-backward pass on the distributed graph... Also, I think it is easier to add more layers using GraphX... Making the graph balanced is an issue, but the matrix alternative is not very clear to me... If you agree, I will work on @avulanov's ANN PR and try to come up with a GraphX-based design to compute the gradient on every edge of this distributed graph... I will take the layer information as input...

@witgo
Contributor
witgo commented Mar 18, 2015

This seems to be worth a try. @avulanov what do you think?

@debasish83

Yup, and once we have the gradients on the edges of the graph, a simple block coordinate descent solver using AdaGrad should be a good start.

@avulanov
Contributor

@debasish83 This seems worth a try, although I need to better understand the details. Could you elaborate on how you are going to distribute the network layers on the graph, and what would be the main overhead during computation? Did you think about both data and model parallelization for different layers, as Krizhevsky proposed in http://arxiv.org/abs/1404.5997? It would be great to have that instead of separate implementations. Also, the artificial neural network primitives we are working on right now could be reused. We might want to discuss this by phone. @mengxr what do you think?

@debasish83

@avulanov thanks for the reference... I am more interested in the fully connected layer and the model-parallel part... the convolution is data-parallel and the current PRs will address it well... Let's discuss it in a few weeks... I will ping...

I am studying LDA's GraphX implementation... LDA has 2 partitions (documents and words), but I am considering 1 partition which holds the features, and we add 10K nodes in another partition of this graph (the hidden layer)... Now the issue is that every data point needs to be passed over this graph, but assuming the data is sparse (which is often the case for these models), can we do something? Why won't the join between this bipartite graph and the data work? If the data is not sparse and has 3M columns, then I don't think anything will work...

@avulanov
Contributor

@debasish83 Sounds interesting! I think this might also work with non-fully-connected networks, though those can be parallelized by data, as you mentioned. You might also want to consider using BIDMat, since it has optimizations for computations on sparse data.

@debasish83

Yeah, if it can be parallelized by data it's best to do that and not do any GraphX joins, because with GraphX the painful thing is balancing the graph, and most of the time that step needs more work than the rest :-(

@debasish83

@witgo there are a lot of useful building blocks in your RBM PR... are you planning to consolidate them in this PR?

@witgo
Contributor
witgo commented Mar 23, 2015

The latest code is in the Sent2vec branch (WIP).
This branch includes two new classes:
Sentence2vec: a sentence classification model with a convolutional-pooling structure
SentenceClassifier: a latent semantic model with a convolutional-pooling structure

@avulanov
Contributor

@witgo Thanks for the links! I hope we'll soon be able to merge it with the basic interface/implementation from https://github.com/avulanov/spark/tree/ann-interface (ann-interface-gemm, ann-dropout). I am finishing performance testing right now.

@hhbyyh
Contributor
hhbyyh commented Apr 9, 2015

@debasish83 I'm interested in the GraphX-based NN and implemented a prototype. However, I found that both feed-forward and back-propagation are unacceptably slow due to aggregateMessages and joinVertices, which seem to be inevitable, and this gets worse as the layer count increases.

Is there any other graph-based attempt available, or maybe some suggestions? Thanks.

@debasish83

Google... Jeff Dean did that in his paper :-) Most likely we need a detailed analysis... Can you point me to your code? Or once I implement it, I can give more feedback... I did not get time to code it yet... The issue here is to build a nearly balanced graph... without that you won't get good runtime.

@debasish83

@hhbyyh https://github.com/hhbyyh/NeuralNetwork ? let me look into it and I will provide feedback...

@avulanov
Contributor
avulanov commented May 4, 2015

Just wanted to give an update on the further development of this topic. We're finishing testing of the new artificial neural network implementation; there are a few issues though. Features: logistic and softmax activation functions, mean squared and cross-entropy errors, extensible interfaces, batch computations, BLAS-optimized operations with memory reuse, and support for the GradientDescent and LBFGS optimizers. https://github.com/avulanov/spark/tree/ann-interface-gemm

@hhbyyh
Contributor
hhbyyh commented May 6, 2015

Hi @avulanov, the new version sounds great. I've tried the implementation in this PR and its scalability looks great. If possible, can you please share some rough information about the performance improvement from the new implementation (e.g. 10% better than the current PR)? Thanks a lot.

@avulanov
Contributor
avulanov commented May 6, 2015

@hhbyyh The new version allocates the memory needed for storing model parameters and intermediate results only once and then reuses it. The actual speed should be up to several times faster, depending on the batch size and number of iterations. The time per iteration is comparable to Caffe with the same settings, because most of the time is spent in BLAS dgemm (matrix-matrix multiplication). Also, the new version is much less likely to hit the garbage collector. I will probably plot some comparisons later.

@hhbyyh
Contributor
hhbyyh commented May 6, 2015

Thanks for the quick reply. Great stuff.

@loachli
loachli commented May 7, 2015

@avulanov I looked through the new version and it includes a lot of new functions. Don't you use the back-propagation algorithm for the ANN? Have you done any comparison tests? Is there any documentation for this new version?

@avulanov
Contributor
avulanov commented May 7, 2015

@loachli Back-propagation is implemented in a general form. I've done some benchmarks, though I didn't have time to publish them; docs are in progress...

@avulanov
Contributor
avulanov commented May 8, 2015

I did a small test to compare the performance of the new implementation (https://github.com/avulanov/spark/tree/ann-interface-gemm) with the current one in this branch.

  • Two cluster configurations (native BLAS not used, Java implementation only)

  • C1: 8 machines (Xeon 3.3GHz 4 cores, 16GB RAM) with 7 workers total,

  • C2: 6 machines (2x Xeon 2.6GHz 6 cores, 32GB RAM) with 5 workers total

  • mnist8m dataset, persist in memory

  • Network topology 784x10 (no hidden layer = logistic regression)

  • LBFGS optimizer, 40 steps (epochs), tolerance 1e-4, batch size = 100

  • Accuracy on mnist test set: 0.9076

    Name           C1, time, hh:mm:ss   C2, time, hh:mm:ss
    Total time     00:03:53             00:02:29
    Avg step time  00:00:06             00:00:04

Code (for the new version https://github.com/avulanov/spark/tree/ann-interface-gemm):

import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.ann.{FeedForwardTopology, FeedForwardTrainer}
import org.apache.spark.mllib.classification.ANNClassifier
// Training data (mnist8m) and test data (mnist, 784 features), both cached
val mnist = MLUtils.loadLibSVMFile(sc, "hdfs://my.net:9000/input/mnist8m.scale").persist
val mnist784 = MLUtils.loadLibSVMFile(sc, "hdfs://my.net:9000/input/mnist.scale.t.784").persist
// 784x10 topology: no hidden layer, i.e. logistic regression
val topology = FeedForwardTopology.multiLayerPerceptron(Array[Int](784, 10), false)
val trainer = new FeedForwardTrainer(topology, 784, 10).setBatchSize(100)
trainer.LBFGSOptimizer.setNumIterations(40).setConvergenceTol(1e-4)
val model40 = new ANNClassifier(trainer).train(mnist)
val predictionAndLabels = mnist784.map(lp => (model40.predict(lp.features), lp.label))
val accuracy = predictionAndLabels.map { case (p, l) => if (p == l) 1 else 0 }.sum() / predictionAndLabels.count()

@hhbyyh
Contributor
hhbyyh commented May 9, 2015

Hi @avulanov, if the data is still available, can you please share how the performance compares to the version in the current PR? Thanks.

@avulanov
Contributor

@hhbyyh I plan to do it, though the main motivation behind the new version is an extensible interface that allows adding more features (dropout, RBMs and DBNs are in progress), although I also made a big improvement in the efficiency of the code compared to the previous one.

@avulanov
Contributor

I did another small test to compare the performance of the new implementation (https://github.com/avulanov/spark/tree/ann-interface-gemm) with the current one in this branch.

  • Cluster configuration (OpenBLAS is used)

  • 6 machines (Xeon 3.3GHz 4 cores, 16GB RAM) with 5 workers total,

  • mnist8m dataset, persist in memory

  • Network topology 784x10 (no hidden layer = logistic regression)

  • LBFGS optimizer, 40 steps (epochs), tolerance 1e-4, batch size = 100

  • Average time on 3 runs

    Implementation                time, hh:mm:ss   Accuracy
    Current ANN, no hidden        00:03:09         0.9076
    New ANN, no hidden            00:02:16         0.9076
    LogisticRegressionWithLBFGS*  00:04:11         0.9087

*Does not take advantage of batch computations and native BLAS

@wangzk
wangzk commented May 13, 2015

Can I still use gradient descent as the optimizer?
I looked through the code and found that the optimizer is set to LBFGS by default, and there is no code for a gradient descent optimizer.
If I want to use gradient descent as the optimizer, how should I change the code? By replacing
private val optimizer = new LBFGS(gradient, updater)
with private val optimizer = new GradientDescent(gradient, updater)?

Thank you for your reply :)

@wangzk
wangzk commented May 13, 2015

@avulanov I ran your test code with the SGD optimizer and found that there may be something wrong with the output. Your test code tries to train an XOR classifier, but with SGD it failed (with LBFGS it is OK).
I got the following result:

(prediction, label)
(0.4442522868286226,0.0)
(0.4341183288415796,1.0)
(0.5305968528789123,1.0)
(0.5141176170143846,0.0)

I use the following test code to train with SGD:

    val initialWeights = FeedForwardModel(topology, 23124).weights()
    val trainer = new FeedForwardTrainer(topology, 2, 1)
    trainer.setWeights(initialWeights)
    trainer.SGDOptimizer.setNumIterations(20).setStepSize(0.1)
    val model = trainer.train(rddData)

I have changed the iteration number and step size, but it did not help.

PS.
Sorry to bother you. I tried another set of parameters:

    trainer.SGDOptimizer.setNumIterations(200).setStepSize(10)

Now I can get

(prediction, label)
(0.0688974445285474,0.0)
(0.91866703132045,1.0)
(0.9268285473334462,1.0)
(0.0846430143248306,0.0)

Can you give me some advice on parameter setting when I try to train with SGD?

@avulanov
Contributor

@wangzk The optimizer is set in the trainer: trainer.LBFGSOptimizer or trainer.SGDOptimizer.

Below are my suggestions with regards to the parameters, most of which are based on my experience rather than strong theoretical assumptions.

LBFGS usually converges faster (i.e. needs far fewer iterations) than batch gradient descent because the former is a quasi-Newton method. Also, when the time needed to make an iteration over the whole data (i.e. an epoch) is small, LBFGS usually converges faster (fewer iterations and less time) than SGD. I would suggest using LBFGS for smaller data or simpler models.

However, it has been shown that for larger data SGD is superior, because the time needed for its convergence does not depend on the size of the data, as opposed to batch methods such as LBFGS (http://papers.nips.cc/paper/3323-the-tradeoffs-of-large-scale-learning.pdf).

With regard to the SGD parameters, the number of iterations is hard to pick. Usually one uses a validation set instead: training is stopped when the model's accuracy on this set reaches a satisfactory value or starts decreasing. Another rule of thumb is that the number of iterations should not be smaller than the size of the data; otherwise there is a risk of skipping a lot of samples during training.

A rule-of-thumb step size for SGD to start with is 0.03, which can also be tuned with a validation set, though there are more interesting strategies in which the step size decreases at each iteration.

There might be some confusion between SGDOptimizer.setMiniBatchFraction and trainer.batchSize. The former is the minibatch setting for SGD, i.e. the fraction of data samples used in one iteration. The latter is the size of the batch for data processing, where data samples are stacked into a matrix to take advantage of faster matrix-matrix operations in BLAS. You might want these two parameters to produce equally sized data batches.

However, it has been shown that increasing the minibatch size leads to slower convergence (http://www.cs.cmu.edu/~muli/file/minibatch_sgd.pdf). At the same time, batch processing makes it possible to process more samples per second, so it is worth finding a balance between them, for example with a validation set. A good minibatch size to start with is between 100 and 1000, keeping in mind that miniBatchFraction is minibatch / data.size.

Also, I would like to mention that the "stochastic" part of SGD in MLlib is implemented with RDD.sample, which might be too expensive to perform on each iteration, especially for larger data.
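As a rough illustration of aligning the two settings (a sketch only; the layer sizes, iteration count and data names are made up for the example):

    // Make the SGD minibatch match the BLAS processing batch suggested above.
    val batchSize = 100                                     // samples stacked per matrix-matrix operation
    val dataSize = rddData.count()                          // total number of training samples
    val miniBatchFraction = batchSize.toDouble / dataSize   // fraction of samples per SGD iteration

    val trainer = new FeedForwardTrainer(topology, 2, 1).setBatchSize(batchSize)
    trainer.SGDOptimizer
      .setNumIterations(2000)
      .setStepSize(0.03)                                    // rule-of-thumb starting value
      .setMiniBatchFraction(miniBatchFraction)
    val model = trainer.train(rddData)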

@wangzk
wangzk commented May 14, 2015

@avulanov Thank you for your detailed advice on the parameters. Your advice makes me more clear about how to tune them. Thank you again!

@avulanov
Contributor

@wangzk you are welcome!

Added a dropout (http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf) implementation to the new version. It is implemented as a different type of network topology. Example of use:

    val inputDropoutProb = 0.5
    val layerDropoutProb = 0.5
    val topology = FeedForwardTopology.multiLayerPerceptron(Array[Int](780, 32, 10), true)
    val dropoutTopology = new DropoutTopology(topology.layers, inputDropoutProb, layerDropoutProb)
    val trainer = new FeedForwardTrainer(dropoutTopology, 780, 10).setBatchSize(100)
    trainer.LBFGSOptimizer.setConvergenceTol(1e-4).setNumIterations(100)
    val model = new ANNClassifier(trainer).train(train)

@loachli
loachli commented May 18, 2015

@avulanov I use a small fraction of the mnist data (100 samples for training, 100 for testing). When I use Topology.multiLayerPerceptron(Array[Int](...), false), the accuracy is 0.68. When I use it with a hidden layer added, the accuracy is 0.52. That is to say, adding hidden layers might decrease the accuracy.
Could you explain the design idea behind deltas in the function FeedForwardModel.computeGradient? I do not understand the following code:

    deltas(L) = new BDM[Double](0, 0)
    deltas(L - 1) = newE
    for (i <- (L - 2) to (0, -1)) {
      deltas(i) = layerModels(i + 1).prevDelta(deltas(i + 1), outputs(i + 1))
    }

@avulanov
Contributor

@loachli Do you set enough iterations? Adding hidden layers requires more iterations. Since I do not know exactly which settings you use, could you try the example from #1290 (comment)?

With regard to the back-propagation code, I implemented the chain rule of back propagation. A good theoretical explanation can be found here: http://www.slideshare.net/kuwajima/cnnbp. It is written in a general form, so there is no need to change it when new layers are introduced. The code assumes that there is one functional layer at the top that provides an error function. The error is then back-propagated through all layers and the deltas are computed (the code you quoted). The deltas are later used to compute the gradient of the weights.
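For reference, the generic chain-rule recursion that this code follows (standard back-propagation notation; the symbols below are not identifiers from the PR):

    % The top-layer delta comes from the derivative of the error function E;
    % each earlier delta is obtained from the next layer's delta, then the weight gradient follows.
    \delta^{(L)} = \nabla_{a^{(L)}} E \circ f'(z^{(L)}), \qquad
    \delta^{(l)} = (W^{(l+1)})^{\top} \delta^{(l+1)} \circ f'(z^{(l)}), \qquad
    \frac{\partial E}{\partial W^{(l)}} = \delta^{(l)} (a^{(l-1)})^{\top}

where a^(l) are the layer outputs, z^(l) the pre-activations, and f the activation function.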

@loachli
loachli commented May 19, 2015

@avulanov I use the same settings as #1290 (comment); the accuracy is 0.68. The small data set might cause the decreased accuracy with hidden layers. I am on a business trip and will test your new code with big data tomorrow.

@loachli
loachli commented May 20, 2015

@avulanov I did some tests; the results are as follows.
Machines:
3 machines, 1 master, 2 slaves;
Each machine: 24 cores with hyper-threading (CPU 2099.943 MHz), 96 GB RAM
Spark settings:
spark.executor.memory=40G
spark.cores.max=40
Data size:
mnist dataset:
training data count: 60000
test data count: 10000

| Topology | Iterations | Tolerance | Train time (s) | Accuracy |
| --- | --- | --- | --- | --- |
| (784, 10) | 40 | 1e-4 | 35 | 0.9179 |
| (784, 100, 10) | 40 | 1e-4 | 127 | 0.7587 |
| (784, 100, 10) | 120 | 1e-4 | 341 | 0.7948 |

@avulanov
Contributor

@loachli Looks good! If you run more iterations for the configurations with hidden layers, they will converge to an accuracy of about 0.95. I suggest using batches (of 100), plugging in native BLAS, and persisting the training set to speed up the computations. Also, with a dataset of this size you might not take advantage of the several nodes you have, because all the data might reside on one node.
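A minimal sketch of those suggestions, assuming the FeedForwardTrainer / FeedForwardTopology API shown earlier in this thread (the RDD name and layer sizes are illustrative):

    import org.apache.spark.storage.StorageLevel

    // Keep the training set in memory so each optimizer iteration does not re-read it.
    val train = trainData.persist(StorageLevel.MEMORY_ONLY)

    val topology = FeedForwardTopology.multiLayerPerceptron(Array[Int](784, 100, 10), true)
    val trainer = new FeedForwardTrainer(topology, 784, 10).setBatchSize(100)  // stack 100 samples per BLAS call
    trainer.LBFGSOptimizer.setConvergenceTol(1e-4).setNumIterations(120)       // hidden layers need more iterations
    val model = trainer.train(train)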

@avulanov
Contributor

@witgo Thank you, this is great! We might want to think about documenting all this.

@Mageswaran1989
Contributor

Hi All,
Could you please point to a book that can be used as a reference for a thorough understanding of this implementation?

@loachli
loachli commented May 25, 2015

@Mageswaran1989 You can get the idea of back propagation from http://neuralnetworksanddeeplearning.com/chap2.html. A book reference would be "Artificial Intelligence: A Guide to Intelligent Systems". You can get the idea behind avulanov's implementation from http://www.slideshare.net/kuwajima/cnnbp

@hntd187
hntd187 commented Jun 9, 2015

Hi everyone,

My colleagues and I are very interested in this implementation and have read most of the discussion here. Currently we have this code in its own Maven project, which we compile and add as a Maven dependency alongside Spark in order to integrate it with our code. For some initial experiments this has worked fine: all the tests pass, and the Spark interfaces work with this algorithm from Java. We were curious, though, why this hasn't been merged yet. This pull request has been open for over a year, and there doesn't seem to be a compelling reason not to merge it. Could anyone shed some light on why it is still unmerged?

@sirolf2009

@hntd187 I second this. I've used this for a school project and it's working like a charm. Installing it on servers proved more difficult, because on every server I had to pull the branch and compile it myself, rather than taking the latest version from the website.

@avulanov
Contributor
avulanov commented Jun 9, 2015

@hntd187 @sirolf2009 Thank you for your interest in the project! We are currently discussing merging with @mengxr from Databricks. There are two requirements for this implementation to be merged: 1) an extensible interface, so it will be easy to implement new types of networks, and 2) speed and scalability: our implementation has to be comparable in speed to state-of-the-art single-node implementations such as Caffe, and it must also scale well as nodes are added to the cluster. We have addressed both requirements in https://github.com/avulanov/spark/tree/ann-interface-gemm; however, we still need to perform scalability testing on AWS. This is ongoing work.

@hntd187
hntd187 commented Jun 9, 2015

@avulanov If it would help speed this up, I can test or benchmark on some EC2 instances we have, which run on Mesos. If you want to suggest a general dataset to use, we could work something out.

@avulanov
Contributor

@hntd187 It would be good to discuss this. Currently I plan to use mnist8m and the 6-layer network 784-2500-2000-1500-1000-500-10, which is the best fully connected configuration for mnist from http://yann.lecun.com/exdb/mnist/. However, I am still looking for a more modern dataset, probably with more features, and a corresponding configuration. Are you aware of any?
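For reference, constructing that benchmark topology with the interface shown earlier in this thread would look roughly like this (a sketch; the optimizer settings and trainData are illustrative):

    // 6-layer fully connected network for mnist8m: 784-2500-2000-1500-1000-500-10.
    val layers = Array[Int](784, 2500, 2000, 1500, 1000, 500, 10)
    val topology = FeedForwardTopology.multiLayerPerceptron(layers, true)
    val trainer = new FeedForwardTrainer(topology, 784, 10).setBatchSize(100)
    trainer.SGDOptimizer.setNumIterations(1).setMiniBatchFraction(1.0).setStepSize(0.03)  // one epoch of batch GD
    val model = trainer.train(trainData)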

@hntd187
hntd187 commented Jun 12, 2015

@avulanov To be perfectly honest, does the "modern-ness" of the dataset really matter? This dataset has been a standard in this area for a long time, so it seems perfectly reasonable to use it: most people working in this area would recognize the data and know roughly how to compare results to their own implementation.

@avulanov
Contributor

@hntd187 This is true; however, it seems that "modern" datasets tend to have more features, so the 784 features of mnist might seem too few these days. Anyway, the basic idea of the benchmark is as follows: compare the performance of Caffe (in both CPU and GPU mode) and this implementation with different numbers of nodes (workers) for Spark. Performance should be measured in samples processed per second. Here comes another problem: the data formats supported by Spark and Caffe do not intersect. I can convert mnist8m (libsvm) to HDF5 for Caffe; however, it will have a different size, which means Caffe will read a different amount of data from disk. Do you have an idea how to handle this problem?

@hntd187
hntd187 commented Jun 12, 2015

@avulanov Can Spark even read an HDF5 file, or would we have to write that as well? While I can't donate any professional time to this conversion problem, I may be able to assist if we wanted to write a conversion independently. I suppose the problem is that even if we get HDF5 and run it in Caffe, how would we get Spark to use it? Reading a bit online, the consensus seems to be to use the pyhdf5 library to read the files in and flatMap them into RDDs, but that would be horribly inefficient on a large dataset, and we'd be shooting ourselves in the foot trying to make that scale. So I think our best bet, if we want to compare against Caffe, is either to get Caffe to read another format or to add HDF5 reading capability to Spark, via either a hack or an actual contribution. The first option is not ideal; the second is obviously more time consuming.

@thvasilo thvasilo added a commit to thvasilo/flink that referenced this pull request Jun 12, 2015
@thvasilo thvasilo Port of Spark ANN implementation to Flink.
Original code: apache/spark#1290
9be3827
@avulanov
Contributor

@hntd187 Thanks for the suggestion; it seems that implementing an HDF5 reader for Spark is the most reasonable option. I need to think about what the minimum viable implementation would be.

@thvasilo You should consider using the latest version, https://github.com/avulanov/spark/tree/ann-interface-gemm, and also the DBN from https://github.com/witgo/spark/tree/ann-interface-gemm-dbn

@hntd187
hntd187 commented Jun 12, 2015

@avulanov Would you like to split some of this work up or do you want to tackle this alone?

@avulanov
Contributor

@hntd187 Any help is really appreciated. We can split it into two functions: read and write. A good place to implement them is MLUtils, as saveAsHDF5 and loadHDF5.
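A sketch of what those proposed helpers might look like (hypothetical signatures shown in a standalone object for illustration; neither function exists in Spark yet, and the eventual home would be MLUtils as discussed above):

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    object HDF5Utils {
      // Hypothetical: read a dataset of labeled samples from an HDF5 file into an RDD.
      def loadHDF5(sc: SparkContext, path: String, dataset: String): RDD[LabeledPoint] = ???

      // Hypothetical: write labeled samples to a dataset in an HDF5 file.
      def saveAsHDF5(data: RDD[LabeledPoint], path: String, dataset: String): Unit = ???
    }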

@hntd187
hntd187 commented Jun 17, 2015

@avulanov How about I take the read and you take the write?

In an ideal world we should be able to take the implementation from here https://github.com/h5py/h5py and load it into some form of RDD.

Here are the Java tools for HDF5, http://www.hdfgroup.org/products/java/release/download.html, which are the bindings for the file format; hopefully, given the implementations out there, this should be pretty straightforward.

@hntd187
hntd187 commented Jun 18, 2015

@avulanov Also, we're going to have to add a dependency on the HDF5 library. I think this should be handled the way netlib is handled, with the user having to enable a profile when building Spark. So normally it wouldn't be available, but if you build with it you can use it. I'll update the POM to account for that.

@avulanov
Contributor

@hntd187 Thanks for the links.

I am not sure that the presence of the HDF5 library should be handled at the compilation step, because there will be no fallback for the functions we are going to implement, as is the case for netlib (it falls back to a Java implementation if you don't include the JNI binaries). Let's continue our discussion here: https://issues.apache.org/jira/browse/SPARK-8449

@Myasuka
Myasuka commented Jul 16, 2015

Hi @avulanov, I have forked your ann-benchmark repository (https://github.com/avulanov/ann-benchmark/blob/master/spark/spark.scala). I am a little confused about the mini-batch training. It seems that batchSize in val trainer = new FeedForwardTrainer(topology, 780, 10).setBatchSize(batchSize) means the size of the sub-block matrices you group the original input matrix into, and setMiniBatchFraction(1.0) in trainer.SGDOptimizer.setNumIterations(numIterations).setMiniBatchFraction(1.0).setStepSize(0.03) means you actually use full-batch gradient descent rather than mini-batch gradient descent. Does it perform well on the mnist8m data? Maybe you can share the training parameters in detail, such as layer units, mini-batch size, step size and so on.

@avulanov
Contributor

@Myasuka Thank you for your interest in the benchmark. Its goal is to measure the scalability of my implementation and to compare its efficiency with other tools, such as Caffe. I measure the time needed for one epoch of batch gradient descent on a large network with ~12M parameters. I don't measure the convergence rate or the accuracy, because they are very use-case specific and don't directly show how scalable a particular machine learning tool is. The benchmark could be improved, though, and I am working on it, so thank you for your suggestion.

@Myasuka
Myasuka commented Jul 17, 2015

@avulanov I tried to run a test on the mnist data with the SGD optimizer; however, I cannot reproduce the result in #1290 (comment). I use a (780, 10) topology and set batchSize to 1000, miniBatchFraction to 1.0, and numIterations to 500, but the accuracy is only 75%; if I set miniBatchFraction to 0.1, the accuracy still stays at 75%. Would you please share your training parameters in detail so that I can raise the accuracy to 90%?

@avulanov
Contributor

@Myasuka LBFGS was used in the mentioned experiment. SGD needs more iterations to converge in this case.

@bnoreus
bnoreus commented Jul 24, 2015

Hey guys,

I want to start off by saying thank you for this piece of code. The ANN has been working beautifully so far. I have one question, though: when I run the training on a dataset I imported from AWS S3, the logging says "Opening s3://blabla.txt for reading" over and over again. I interpret this as the program opening the S3 file many times instead of just once. Is this true? Wouldn't it be much faster if the file was only opened once?

@avulanov
Contributor

@bnoreus Thank you for your feedback. This code does not implement any file-related operations; it works with RDDs only. I assume the logging comes from another piece of code you are using.

@asfgit asfgit pushed a commit that referenced this pull request Jul 31, 2015
@avulanov @mengxr avulanov + mengxr [SPARK-9471] [ML] Multilayer Perceptron
This pull request contains the following feature for ML:
   - Multilayer Perceptron classifier

This implementation is based on our initial pull request with bgreeven: #1290 and inspired by very insightful suggestions from mengxr and witgo (I would like to thank all other people from the mentioned thread for useful discussions). The original code was extensively tested and benchmarked. Since then, I've addressed two main requirements that prevented the code from merging into the main branch:
   - Extensible interface, so it will be easy to implement new types of networks
     - Main building blocks are traits `Layer` and `LayerModel`. They are used for constructing layers of ANN. New layers can be added by extending the `Layer` and `LayerModel` traits. These traits are private in this release in order to save path to improve them based on community feedback
     - Back propagation is implemented in general form, so there is no need to change it (optimization algorithm) when new layers are implemented
   - Speed and scalability: this implementation has to be comparable in terms of speed to the state of the art single node implementations.
     - The developed benchmark for large ANN shows that the proposed code is on par with C++ CPU implementation and scales nicely with the number of workers. Details can be found here: https://github.com/avulanov/ann-benchmark

   - DBN and RBM by witgo https://github.com/witgo/spark/tree/ann-interface-gemm-dbn
   - Dropout https://github.com/avulanov/spark/tree/ann-interface-gemm

mengxr and dbtsai kindly agreed to perform code review.

Author: Alexander Ulanov <nashb@yandex.ru>
Author: Bert Greevenbosch <opensrc@bertgreevenbosch.nl>

Closes #7621 from avulanov/SPARK-2352-ann and squashes the following commits:

4806b6f [Alexander Ulanov] Addressing reviewers comments.
a7e7951 [Alexander Ulanov] Default blockSize: 100. Added documentation to blockSize parameter and DataStacker class
f69bb3d [Alexander Ulanov] Addressing reviewers comments.
374bea6 [Alexander Ulanov] Moving ANN to ML package. GradientDescent constructor is now spark private.
43b0ae2 [Alexander Ulanov] Addressing reviewers comments. Adding multiclass test.
9d18469 [Alexander Ulanov] Addressing reviewers comments: unnecessary copy of data in predict
35125ab [Alexander Ulanov] Style fix in tests
e191301 [Alexander Ulanov] Apache header
a226133 [Alexander Ulanov] Multilayer Perceptron regressor and classifier
6add4ed
@mengxr
Contributor
mengxr commented Jul 31, 2015

@bgreeven We recently merged #7621 from @avulanov . Under the hood, it contains the ANN implementation based on this PR. Additional features should come in follow-up PRs. So do you mind closing this PR for now? We can move the discussion to the JIRA page on individual features. Thanks a lot for your contribution and everyone for the discussion!

@markgrover markgrover pushed a commit to markgrover/spark that referenced this pull request Jul 31, 2015
@avulanov avulanov + Mark Grover [SPARK-9471] [ML] Multilayer Perceptron
30fe47f
@AmplabJenkins

Merged build finished. Test FAILed.

@dennishuo dennishuo added a commit to dennishuo/spark that referenced this pull request Aug 7, 2015
@avulanov @dennishuo avulanov + dennishuo [SPARK-9471] [ML] Multilayer Perceptron
50a4a53
@08s011003

@bgreeven Hi, I tried to train a model using this implementation and got a weird outcome in the LBFGS output, as follows:
15/08/11 08:52:24 INFO optimize.LBFGS: Step Size: 1.000
15/08/11 08:52:24 INFO optimize.LBFGS: Val and Grad Norm: 5.77126e+07 (rel: 6.22e-08) 3.59731
15/08/11 08:52:24 INFO optimize.LBFGS: Step Size: 1.000
15/08/11 08:52:24 INFO optimize.LBFGS: Val and Grad Norm: 5.77126e+07 (rel: 3.17e-08) 1.77486
15/08/11 08:52:24 INFO optimize.LBFGS: Step Size: 1.000
15/08/11 08:52:24 INFO optimize.LBFGS: Val and Grad Norm: 5.77126e+07 (rel: 1.56e-08) 0.885332
15/08/11 08:52:24 INFO optimize.LBFGS: Step Size: 1.000
15/08/11 08:52:24 INFO optimize.LBFGS: Val and Grad Norm: 5.77126e+07 (rel: 7.91e-09) 0.442205

I launch the model training with: var model = ArtificialNeuralNetwork.train(trainData, Array(2, 3), 5000, 1e-8)

The problem is that the training process iterates only a few steps before returning. Obviously the error on the validation set is too large to meet expectations. What is the problem?

@asfgit asfgit pushed a commit that closed this pull request Aug 11, 2015
@mengxr mengxr Closes #1290
Closes #4934
423cdfd
@asfgit asfgit closed this in 423cdfd Aug 11, 2015
@mengxr
Contributor
mengxr commented Aug 11, 2015

I closed this PR. We can use Apache JIRA to continue discussion on individual issues.
