[SPARK-10408] [ML] Implement stacked autoencoder #13621
Conversation
- Based on the ml.ann Layer and LossFunction abstractions
- Implements two new loss functions, EmptyLayerWithSquaredError and SigmoidLayerWithSquaredError, to handle inputs in (-inf, +inf) and [0, 1] respectively
- Implements greedy layer-wise training
- Provides an encoder and a decoder
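The greedy layer-wise training listed above can be illustrated outside of Spark. Below is a minimal NumPy sketch of the idea, not the PR's ml.ann implementation: every function name, the tied-weight choice, and the hyperparameters are my own assumptions for illustration. Each layer is trained as a small autoencoder on the codes produced by the previous layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder_layer(x, n_hidden, lr=0.1, epochs=200):
    """Train one tied-weight autoencoder layer (sigmoid encoder, linear
    decoder, squared error) by plain batch gradient descent."""
    n_in = x.shape[1]
    w = rng.normal(0.0, 0.1, (n_in, n_hidden))
    for _ in range(epochs):
        h = sigmoid(x @ w)            # encode
        x_hat = h @ w.T               # decode (tied weights)
        err = x_hat - x
        # Gradient of 0.5*||x_hat - x||^2 w.r.t. w through both paths:
        grad_h = err @ w                              # decoder -> hidden
        grad_w = x.T @ (grad_h * h * (1 - h))         # encoder path
        grad_w += err.T @ h                           # decoder path
        w -= lr * grad_w / x.shape[0]
    return w

def stacked_pretrain(x, layer_sizes):
    """Greedy layer-wise pretraining: each new layer is trained on the
    codes produced by the already-trained layers below it."""
    weights = []
    h = x
    for n_hidden in layer_sizes:
        w = train_autoencoder_layer(h, n_hidden)
        weights.append(w)
        h = sigmoid(h @ w)            # codes become the next layer's input
    return weights

x = rng.random((64, 16))              # toy data in [0, 1]
ws = stacked_pretrain(x, [8, 4])
print([w.shape for w in ws])          # [(16, 8), (8, 4)]
```

The pretrained weights would then initialize a deep network (or the decoder stack, run in reverse, reconstructs the input), which is the use case the PR targets.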
Test build #60350 has finished for PR 13621 at commit
Test build #60351 has finished for PR 13621 at commit
@mengxr @jkbradley Could you take a look?
I used this implementation to run a simple single-layer autoencoder on the MNIST dataset. I also implemented the same autoencoder in Keras/Theano and ran it on the MNIST data. With Spark, I got very poor results.

First, here are the results of encode/decode using Keras with a cross-entropy loss function on the output and sigmoid activations. The implementation in this patch yielded very similar results. Finally, here is the Keras implementation using ReLU activations. It appears the sigmoid activations are saturating during training and preventing the algorithm from learning. If you have any thoughts or suggestions to improve these results, I'd really appreciate it.

Does it make sense to add another algorithm based on MLP/NN when the current functionality is so limited? If the autoencoder library is not useful with only sigmoid activations, I'd vote for focusing on adding new activations before another algorithm. I'm not an expert here, so I would really appreciate your thoughts. Thanks!
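The saturation effect described above is easy to verify numerically: the sigmoid's derivative, σ'(x) = σ(x)(1 − σ(x)), peaks at 0.25 and collapses toward zero for large |x|, so saturated units pass back almost no gradient during training. A quick standalone check (not part of the patch):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))    # 0.25 -- the maximum, at the center
print(sigmoid_grad(5.0))    # ~0.0066 -- already tiny
print(sigmoid_grad(10.0))   # ~4.5e-05 -- effectively no gradient
```

This is why ReLU (whose gradient is 1 for all positive inputs) avoids the problem for active units.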
@sethah Thank you for posting the results of your experiment! It looks interesting. It is hard to say how well it works without numerical results for a particular application, such as the classification error rate. Could you compute the classification error rate on the MNIST test data with and without autoencoder pre-training in Spark? I did this a while ago for a network with two hidden layers of 300 and 100 neurons. Autoencoder pre-training improved over standard training and reached the error rate reported in http://yann.lecun.com/exdb/mnist/.

Another useful application of autoencoders is unsupervised learning. In that case, it would be interesting to compare the losses of the sigmoid and ReLU autoencoders on a validation set. Would you mind checking this? Autoencoders are also used to pre-train deep networks that otherwise do not converge due to the vanishing gradient problem. There is an example of this use case in the unit tests.
I realize now that I was a bit unclear. The results above are from training a single-layer autoencoder and using it to reconstruct the original data. I used an encoding layer of 32 neurons, so the results above were generated by 1) encoding the 784-dimensional input to a 32-dimensional code and 2) decoding the 32-dimensional vector back to 784 dimensions. I will work on getting some specific numbers and doing pre-training. For now, I wanted to point out that we get poor performance with sigmoid units and to discuss where the short-term focus for deep learning in Spark should be.
cc @JeremyNixon also |
Added this feature to the Spark scalable-deeplearning package. @sethah Could you take a look? Also, it would be great to add ReLU as you suggested. This package is intended for new features that have not yet been merged into Spark ML or that are too experimental to be merged.
I ran the Keras experiment with code up at [GitHub link] if anyone wants to build on this or replicate it. Running Seth's example on the training data set, I was able to get the results below. I agree that we should add modern activation functions. More importantly, we should add improved optimizers and a modular API to make this valuable to real users. I'm going to do a code review here and at scalable-deeplearning in the next few days regardless of the decision we make around this. I think that these improvements (activation functions, optimizers) should be part of a flexible, modular library if we want to give users a modern experience.
Test build #108120 has finished for PR 13621 at commit
What changes were proposed in this pull request?

Implement a stacked autoencoder, including two new loss functions, EmptyLayerWithSquaredError and SigmoidLayerWithSquaredError, to handle inputs in (-inf, +inf) and [0, 1] respectively.

How was this patch tested?

Unit tests are provided.
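As an aside, the need for two output losses can be seen with a toy check: a sigmoid output layer is bounded to (0, 1), so squared error against a target outside that interval can never reach zero, while a linear (identity) output layer can match any real target. This is a standalone illustration of that range mismatch, not Spark code:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

target = 3.0  # a target outside the sigmoid's (0, 1) range

# Even a fully saturated sigmoid output (~1.0) leaves a large,
# irreducible squared error against this target.
err_sigmoid = (sigmoid(10.0) - target) ** 2
print(err_sigmoid)          # ~4.0

# A linear (identity) output can hit the target exactly.
err_linear = (3.0 - target) ** 2
print(err_linear)           # 0.0
```

This is consistent with the split above: the sigmoid-based loss for data in [0, 1], and the empty (identity) layer with squared error for unbounded data.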