
[SPARK-10408] [ML] Implement stacked autoencoder #13621

Closed
wants to merge 2 commits

Conversation

avulanov
Contributor

What changes were proposed in this pull request?

Implement stacked autoencoder

  • Based on ml.ann Layer and LossFunction
  • Implement two loss functions, EmptyLayerWithSquaredError and SigmoidLayerWithSquaredError, to handle inputs in (-inf, +inf) and [0, 1], respectively
  • Implement greedy layer-wise training
  • Provide encoder and decoder (see the usage sketch below)
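
For context, here is a minimal usage sketch of the proposed API. Only the `StackedAutoencoder` class name is confirmed by this PR (see the SparkQA output below); the package location and setter names are assumptions modeled on standard Spark ML Estimator conventions.

```scala
// Hypothetical usage sketch: setter names are assumptions,
// following standard Spark ML Estimator/Model conventions.
import org.apache.spark.sql.DataFrame

val autoencoder = new StackedAutoencoder()  // class name confirmed by the test output
  .setLayers(Array(784, 300, 100))          // assumed param: layer sizes, trained greedily layer by layer
  .setInputCol("features")
  .setOutputCol("encoded")

val model = autoencoder.fit(train)               // train: DataFrame with a "features" vector column
val encoded: DataFrame = model.transform(train)  // encoder: features -> compressed representation
// Per the description, the model also provides a decoder (encoded -> reconstruction).
```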

How was this patch tested?

Provide unit tests

  • Gradient correctness of the new loss functions (a generic check is sketched below)
  • Correct reconstruction of the original data by encoding and decoding (based on Berkeley's CS182)
  • Successful pre-training of a deep network with 6 hidden layers
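
A gradient-correctness test of this kind is typically a central finite-difference check against the analytic gradient. A generic sketch of the idea (illustrative only, not the PR's actual test code):

```scala
// Compare an analytic gradient against a central finite-difference estimate.
// `loss` and `grad` stand in for a loss function's value and gradient.
def checkGradient(
    loss: Array[Double] => Double,
    grad: Array[Double] => Array[Double],
    x: Array[Double],
    eps: Double = 1e-6,
    tol: Double = 1e-4): Boolean = {
  val analytic = grad(x)
  x.indices.forall { i =>
    val xPlus = x.clone();  xPlus(i)  += eps
    val xMinus = x.clone(); xMinus(i) -= eps
    val numeric = (loss(xPlus) - loss(xMinus)) / (2 * eps)  // central difference
    math.abs(numeric - analytic(i)) <= tol * math.max(1.0, math.abs(numeric))
  }
}
```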

@SparkQA

SparkQA commented Jun 12, 2016

Test build #60350 has finished for PR 13621 at commit adc81ba.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StackedAutoencoder(override val uid: String)

@SparkQA

SparkQA commented Jun 12, 2016

Test build #60351 has finished for PR 13621 at commit b3f5539.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@avulanov
Contributor Author

@mengxr @jkbradley could you take a look?

@sethah
Contributor

sethah commented Aug 3, 2016

@avulanov

I used this implementation to run a simple single-layer autoencoder on the MNIST dataset. I also used Keras/Theano to implement the same autoencoder and ran it on the MNIST data. With Spark, I got very poor results. First, here are the results of encoding/decoding using Keras with a cross-entropy loss function on the output and sigmoid activations.

[image: Keras reconstructions with sigmoid activations and cross-entropy loss]

The implementation in this patch yielded very similar results.

[image: reconstructions from this patch's implementation]

Finally, here is the Keras implementation using ReLU activations.

[image: Keras reconstructions with ReLU activations]

It appears the sigmoid activations are saturating during training and preventing the algorithm from learning. If you have any thoughts/suggestions to improve these results I'd really appreciate it.
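
(Background, for readers: the logistic sigmoid's derivative is σ'(x) = σ(x)(1 − σ(x)), which peaks at 0.25 at x = 0 and decays toward zero for large |x|, so saturated units pass back almost no gradient.)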

Does it make sense to add another algorithm based on MLP/NN when the current functionality is so limited? If the autoencoder is not useful with only sigmoid activations, I'd vote for focusing on adding new activations before adding another algorithm. I'm not an expert here, so I would really appreciate your thoughts. Thanks!

@avulanov
Contributor Author

avulanov commented Aug 4, 2016

@sethah Thank you for posting the results of your experiment! It looks interesting. It is hard to say how well it works without numerical results for a particular application, such as the classification error rate. Could you compute the classification error rate on the MNIST test data with and without autoencoder pre-training in Spark? I did this a while ago for a network with two hidden layers of 300 and 100 neurons. Autoencoder pre-training improved over standard training and reached the error rate reported at http://yann.lecun.com/exdb/mnist/. Another useful application of the autoencoder is unsupervised learning. In that case, it would be interesting to compare the losses of the sigmoid and ReLU autoencoders on the validation set. Would you mind checking this?

The autoencoder is also used to pre-train deep networks that otherwise do not converge due to the vanishing gradient problem. There is an example of this use case in the unit tests.
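
A sketch of that pre-train-then-fine-tune workflow follows. `MultilayerPerceptronClassifier` and its `initialWeights` param are existing Spark ML API (2.0+); the `StackedAutoencoder` call and the shape of its exposed weights are assumptions.

```scala
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

val layers = Array(784, 300, 100, 10)  // input, two hidden layers (300 and 100), 10 classes

// 1) Greedy unsupervised pre-training of the hidden layers (hypothetical API):
// val pretrained = new StackedAutoencoder().setLayers(layers.init).fit(unlabeled)

// 2) Supervised fine-tuning, seeding the classifier with the pre-trained weights.
//    How the autoencoder model would expose its stacked weights as a single
//    Vector is an assumption; setInitialWeights itself is real Spark ML API.
val mlp = new MultilayerPerceptronClassifier()
  .setLayers(layers)
  .setMaxIter(100)
// .setInitialWeights(pretrained.weights)

val model = mlp.fit(labeled)  // labeled: DataFrame with "features" and "label" columns
```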

@sethah
Contributor

sethah commented Aug 4, 2016

I realize now that I was a bit unclear. The results above are from training a single-layer autoencoder and using it to reconstruct the original data. I used an encoding layer of 32 neurons, so the results above are generated by 1) encoding the 784-dimensional input to a 32-dimensional code and 2) decoding the 32-dimensional vector back to 784 dimensions. I will try to work on getting some specific numbers and do pre-training. For now, I wanted to point out that we get poor performance with sigmoid units and to discuss where the short-term focus for deep learning in Spark should be.

@MLnick
Contributor

MLnick commented Aug 8, 2016

cc @JeremyNixon also

@avulanov
Contributor Author

avulanov commented Sep 9, 2016

Added this feature to the Spark scalable-deeplearning package. @sethah Could you take a look? Also, it would be great to add ReLU as you suggested. This package is intended for new features that have not yet been merged into Spark ML or that are too experimental to merge.

@avulanov avulanov changed the title [SPARK-2623] [ML] Implement stacked autoencoder [SPARK-10408] [ML] Implement stacked autoencoder Sep 13, 2016
@JeremyNixon
Contributor

I ran the Keras experiment, with the code up at [GitHub link] if anyone wants to build on this or replicate it.

Running Seth’s example on the training data set, I was able to get the results below.

[screenshot: results of running Seth's example, 2017-05-11]

I agree that we should add modern activation functions. More importantly, we should add improved optimizers and a modular API to make this valuable to real users.

I’m going to do a code review here and at scalable-deeplearning in the next few days, regardless of the decision we make around this. I think these improvements (activation functions, optimizers) should be part of a flexible, modular library if we want to give users a modern experience.

@SparkQA

SparkQA commented Jul 24, 2019

Test build #108120 has finished for PR 13621 at commit b3f5539.

  • This patch fails R style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@srowen srowen closed this Sep 16, 2019