diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md new file mode 100644 index 000000000..74b741374 --- /dev/null +++ b/.github/ISSUE_TEMPLATE.md @@ -0,0 +1,9 @@ +## Due Date +*To be completed by:* YYYY-MM-DD + + +## Description +*Write a short description of what needs to be done.* + +## Assignees +*Please ensure you have assigned at least one person to this issue. Include any authors and reviewers required.* diff --git a/_includes/sidebar.html b/_includes/sidebar.html index 3b822b013..25bb89992 100644 --- a/_includes/sidebar.html +++ b/_includes/sidebar.html @@ -31,6 +31,7 @@
-Eclipse Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J brings AIAI to business environments for use on distributed GPUs and CPUs.
+Eclipse Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J brings AI to business environments for use on distributed GPUs and CPUs.
Skymind is its commercial support arm, bundling Deeplearning4j and other libraries such as TensorFlow and Keras in the Skymind Intelligence Layer (Community Edition), a deep learning environment that gives developers an easy, fast way to train and deploy AI models. SKIL CE is free and downloadable here. SKIL acts as a bridge between Python data science environments and the JVM.
-Deeplearning4j aims to be cutting-edge plug and play, more convention than configuration, which allows for fast prototyping for non-researchers. DL4J is customizable at scale. Released under the Apache 2.0 license, all derivatives of DL4J belong to their authors. DL4J can import neural net models from most major frameworks via Keras, including TensorFlow, Caffe and Theano, bridging the gap between the Python ecosystem and the JVM with a cross-team toolkit for data scientists, data engineers and DevOps. Keras is employed as Deeplearning4j's Python API. Skymind is the second-largest contributor to Keras after Google, and offers commercial support for Keras. Machine learning models are served in production with Skymind's machine learning server.
+Deeplearning4j aims to be cutting-edge plug and play, more convention than configuration, which allows for fast prototyping for data scientists, machine-learning practitioners and software engineers. DL4J is customizable at scale. Released under the Apache 2.0 license, all derivatives of DL4J belong to their authors. DL4J can import neural net models from most major frameworks via Keras, including TensorFlow, Caffe and Theano, bridging the gap between the Python ecosystem and the JVM with a cross-team toolkit for data scientists, data engineers and DevOps. Keras is Deeplearning4j's Python API. Skymind is the second-largest contributor to Keras after Google, and offers commercial support for Keras. Machine learning models are served in production with Skymind's machine learning server.
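As a rough illustration of the Keras import path just mentioned, loading a model saved from Python with DL4J's Keras model-import module might look like the sketch below. This is illustrative only: the file name is hypothetical, and exact class names depend on the DL4J version.

```java
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;

public class KerasImportSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical path: an HDF5 file saved in Python with model.save("my_model.h5").
        String modelPath = "my_model.h5";

        // Import the Keras Sequential model (architecture + weights) as a DL4J network.
        MultiLayerNetwork network = KerasModelImport.importKerasSequentialModelAndWeights(modelPath);

        // The imported network can now be trained further or served on the JVM.
        System.out.println(network.summary());
    }
}
```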
+
+Many machine learning vendors, ranging from Google to startups such as DataRobot and H2O.ai, claim that they can automate machine learning. That sounds great! Then you, the hiring manager, won't need to go chasing after data science talent whose skills you can't judge in a bidding war you can't win. You'll just automate all those skills away.
+
+The problem is, the skills that data scientists possess are hard to automate, and people who seek to buy automated AI should be aware of exactly what can and cannot be automated with present technology. Data scientists perform many tasks. While automating some of those tasks may lighten their workload, unless you can automate all of their tasks, they are still necessary, and that scarce talent will remain a chokepoint that hinders the implementation of machine learning in many organizations.
+
+## What Can We Automate in Machine Learning?
+
+I mentioned that data scientists *tune* algorithms. When you tune a complex machine (and these algorithms are just mathematical and symbolic machines), you usually have several knobs to turn. It's kind of like cooking something with several ingredients. To produce the right taste, to tune your dish as it were, those ingredients should be added in proper proportion to one another, just as you might add twice as much [buttermilk as you do butter to a biscuit recipe](https://www.marthastewart.com/349650/biscuits). The idea is, the right proportions matter.
+
+A data scientist frequently operates without a "recipe", and must tune knobs in combination with each other to explore which combination works. In this case, "working" means tuning an algorithm until it is able to learn efficiently from the data it is given to train upon.
+
+### Hyperparameter Optimization
+
+In data science, the knobs on an algorithm are called hyperparameters, and data scientists perform "hyperparameter search" as they test different combinations of those hyperparameters, different ratios between their ingredients.
+
+Hyperparameter search can be automated. [Eclipse Arbiter](https://github.com/deeplearning4j/arbiter) is a hyperparameter optimization library designed to automate hyperparameter tuning for deep neural net training. It is the equivalent of Google's Vizier, or the open-source Python library Spearmint. Arbiter is part of the Deeplearning4j framework. Some startups, like [SigOpt](https://sigopt.com/), are focused solely on hyperparameter optimization.
+
+You can search for the best combination of hyperparameters with different kinds of search algorithms, like grid search, random search and Bayesian methods; a minimal random-search sketch appears below.
+
+### Algorithm Selection
+
+One thing that AI vendors will do is run the same data through several algorithms whose hyperparameters are set by default, to determine which algorithm can learn best on your data. At the end of the contest, they select the winner. Visualizing these algorithmic beauty contests is a dramatic way to show the work being done. However, it has its limits, notably in the range of algorithms that are chosen to run in any given race, and how well they are tuned.
+
+### Limited Use Cases on the Happy Path
+
+AI vendors can be smart about the algorithms they select only if they have some knowledge of the problem that is being solved, and the data that is being used to train the algorithm. In many real-world situations, lengthy data exploration and some domain-specific knowledge are necessary to select the right algorithms.
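Before returning to the happy path, here is the minimal random-search sketch promised above, in plain Java. It is illustrative only: `trainAndScore` is a hypothetical stand-in for whatever training-and-evaluation routine your framework provides (libraries like Arbiter wrap this loop with smarter strategies), and the two knobs shown, learning rate and layer size, are just example hyperparameters.

```java
import java.util.Random;

public class RandomSearchSketch {

    // Hypothetical stand-in for a real training-and-evaluation routine:
    // train a model with these hyperparameters and return a validation
    // score (higher is better). Here we fake a score so the sketch runs.
    static double trainAndScore(double learningRate, int layerSize) {
        return -Math.pow(Math.log10(learningRate) + 2, 2)
                - Math.abs(layerSize - 128) / 64.0;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double bestScore = Double.NEGATIVE_INFINITY;
        double bestLearningRate = 0.0;
        int bestLayerSize = 0;

        for (int trial = 0; trial < 20; trial++) {
            // Turn the "knobs" at random: learning rate log-uniform in
            // [1e-4, 1e-1], layer size uniform in [16, 256].
            double learningRate = Math.pow(10, -4 + 3 * rng.nextDouble());
            int layerSize = 16 + rng.nextInt(241);

            double score = trainAndScore(learningRate, layerSize);
            if (score > bestScore) {
                bestScore = score;
                bestLearningRate = learningRate;
                bestLayerSize = layerSize;
            }
        }
        System.out.printf("Best: learningRate=%.5f, layerSize=%d, score=%.3f%n",
                bestLearningRate, bestLayerSize, bestScore);
    }
}
```

Grid search would replace the random draws with nested loops over fixed values; Bayesian methods replace them with a model that proposes the next combination to try.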
+
+In the world of automated machine learning, we pretend that data exploration and domain knowledge don't matter. We can only do that for a few limited use cases. In software, this is called the [happy path](https://en.wikipedia.org/wiki/Happy_path), or the use case where everything goes as we expect it to. Automated machine learning has a narrow happy path; that is, it's easy to step off the path and get into trouble.
+
+For example, it's easy to automate machine learning for a simple use case like scoring your leads in Salesforce to predict the likelihood that you will close a sale. That's because the schema of the data -- the things you know about your customers -- is constrained by Salesforce software and fairly standardized across sales teams. An automated machine learning solution focused on lead scoring can make strong assumptions about the type of data you will feed it.
+
+But companies need machine learning for more than lead scoring. Their use cases differ, and so does their data. In those cases, it can be hard to offer a pre-baked solution. Data pipelines, also known as ETL, are often the stage of the AI workflow that requires the most human attention. The real world is messy, and data, which represents that world, is usually messy, too. Most datasets need to be explored, cleaned and otherwise pre-processed before that data can be fruitfully used to train a machine-learning algorithm. That cleaning and exploration often requires expert humans.
+
+### Professional Services
+
+Those companies have two choices: they can hire their own data scientists or rely on professional services from consulting firms. Every major public cloud vendor has introduced machine-learning solutions teams in an attempt to close the talent gap and make machine learning more available to potential users of their clouds. The major consultancies, from Accenture to Bain, have hired teams of data scientists to build solutions for their clients. Even automated machine-learning startups like DataRobot offer "Customer-facing Data Scientists".
+
+So a lot of the time, AI vendors that sell automated machine learning are really "automating" those tasks with humans; that is, they're allowing their clients to outsource the talent that their clients can't otherwise get access to. This is because the tasks and decisions involved in building AI solutions are many, varied and complex, and the technology does not yet exist to automate all of them. That's not automation, it's services. We should call it what it is and recognize that building machine-learning solutions often requires the refined judgment of experts, combined with automation for a few narrow tasks in a larger AI workflow.
+
+### Transfer Learning and Pre-Trained Models
+
+Machine learning models start out dumb and get smart by being exposed to data that they "train" on. Training involves making guesses about the data, measuring the error in their guesses, and correcting themselves until they make more accurate guesses. Machine learning algorithms train on data to produce an accurate "model" of the data. A trained, accurate model of the data is one that is capable of producing good predictions when it is fed new data that resembles what it trained on. For the purposes of this discussion, imagine a model as a black box that performs a mathematical operation on data to make a prediction about it. The data goes into the model, the prediction comes out; e.g. feed an image of one of your friends into the model, and it will predict the name of the friend in the image.
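To make that black-box picture concrete, here is a tiny Java sketch with invented names: a trained model is just a function from input to prediction, whatever the math inside happens to be.

```java
import java.util.function.Function;

// A trained model, viewed as a black box: data in, prediction out.
// Types and names here are invented for illustration.
interface Model<I, O> extends Function<I, O> {
    default O predict(I input) {
        return apply(input);
    }
}

public class BlackBoxDemo {
    public static void main(String[] args) {
        // A stand-in "face recognition model": internally this would be
        // a neural net; to the caller it is just input -> prediction.
        Model<String, String> faceRecognizer = imagePath ->
                imagePath.contains("alice") ? "Alice" : "Unknown";

        // Feed an image of a friend in, get a name out.
        System.out.println(faceRecognizer.predict("photos/alice_beach.jpg")); // Alice
    }
}
```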
+ +Sometimes, you can train a machine-learning model on one set of data, and then use it for another, slightly different set of data later. This only works when the two datasets resemble each other. For example, most photographs have certain characteristics in common. If you train a machine-learning model on, say, celebrity faces, it will learn what humans look like, and with just a little extra learning, you could teach it to transfer what it knows to photographs of your family and friends, whom it has never seen before. Using a pre-trained model could save you the cost of training your own over thousands of hours on distributed GPUs, an expensive proposition. + +Pre-trained machine-learning models that gain some knowledge of the world are useful in computer vision, and widely available. Some well-known pre-trained computer vision models include AlexNet, LeNet, VGG16, YOLO and Inception. [Those pre-trained computer vision models are available here](https://github.com/deeplearning4j/deeplearning4j/tree/master/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model). Google's [Cloud AutoML](https://cloud.google.com/automl/) relies on transfer learning, among other methods, to support its claim that it has "automated machine learning." diff --git a/convolutionalnetwork.md b/convolutionalnetwork.md index df9ebb47d..443cab30a 100755 --- a/convolutionalnetwork.md +++ b/convolutionalnetwork.md @@ -1,24 +1,24 @@ --- -title: A Beginner's Guide to Deep Convolutional Networks (CNNs) +title: A Beginner's Guide to Deep Convolutional Neural Networks (CNNs) layout: default redirect_from: convolutionalnets --- -# A Beginner's Guide to Deep Convolutional Networks (CNNs) +# A Beginner's Guide to Deep Convolutional Neural Networks (CNNs) Contents -* Deep Convolutional Network Introduction +* Deep Convolutional Neural Network Introduction * Images Are 4-D Tensors? -* Convolutional Net Definition -* How Deep Convolutional Networks Work +* Convolutional Neural Network Definition +* How Deep Convolutional Neural Networks Work * Maxpooling/Downsampling * Just Show Me the Code * More CNN Resources -## Introduction to Deep Convolutional Networks +## Introduction to Deep Convolutional Neural Networks -Convolutional networks are deep artificial neural networks that are used primarily to classify images (e.g. name what they see), cluster them by similarity (photo search), and perform object recognition within scenes. They are algorithms that can identify faces, individuals, street signs, tumors, platypuses and many other aspects of visual data. +Convolutional neural networks are deep artificial neural networks that are used primarily to classify images (e.g. name what they see), cluster them by similarity (photo search), and perform object recognition within scenes. They are algorithms that can identify faces, individuals, street signs, tumors, platypuses and many other aspects of visual data. Convolutional networks perform optical character recognition (OCR) to digitize text and make natural-language processing possible on analog and hand-written documents, where the images are symbols to be transcribed. CNNs can also be applied to sound when it is represented visually as a spectrogram. More recently, convolutional networks have been applied directly to [text analytics](http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/) as well as graph data with [graph convolutional networks](./graphanalytics). 
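Since pre-trained convolutional networks like VGG16 are the most common starting point for the transfer learning described earlier, here is a hedged sketch of that workflow using Deeplearning4j's model zoo and transfer-learning API. Exact builder and method names vary between DL4J versions, so treat this as an outline rather than copy-paste code; `numClasses` and the `"fc2"`/`"predictions"` layer names follow the VGG16 examples in the DL4J documentation.

```java
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.transferlearning.FineTuneConfiguration;
import org.deeplearning4j.nn.transferlearning.TransferLearning;
import org.deeplearning4j.zoo.PretrainedType;
import org.deeplearning4j.zoo.ZooModel;
import org.deeplearning4j.zoo.model.VGG16;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Nesterovs;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class TransferLearningSketch {
    public static void main(String[] args) throws Exception {
        int numClasses = 5; // e.g. five friends' faces

        // Download VGG16 with weights pre-trained on ImageNet.
        ZooModel zooModel = VGG16.builder().build();
        ComputationGraph vgg16 = (ComputationGraph) zooModel.initPretrained(PretrainedType.IMAGENET);

        FineTuneConfiguration fineTune = new FineTuneConfiguration.Builder()
                .updater(new Nesterovs(5e-5))
                .seed(123)
                .build();

        // Freeze everything up to "fc2" and swap in a new output layer,
        // so only the final classifier is retrained on the new dataset.
        ComputationGraph transferred = new TransferLearning.GraphBuilder(vgg16)
                .fineTuneConfiguration(fineTune)
                .setFeatureExtractor("fc2")
                .removeVertexKeepConnections("predictions")
                .addLayer("predictions",
                        new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                                .nIn(4096).nOut(numClasses)
                                .activation(Activation.SOFTMAX)
                                .build(),
                        "fc2")
                .build();

        // transferred.fit(yourDataSetIterator) would then train only the new head.
        System.out.println(transferred.summary());
    }
}
```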
@@ -30,7 +30,7 @@ The efficacy of convolutional nets (ConvNets or CNNs) in image recognition is on

## Images Are 4-D Tensors?

-Convolutional nets ingest and process images as tensors, and tensors are matrices of numbers with additional dimensions.
+Convolutional neural networks ingest and process images as tensors, and tensors are matrices of numbers with additional dimensions.

They can be hard to visualize, so let’s approach them by analogy. A scalar is just a number, such as 7; a vector is a list of numbers (e.g., `[7,8,9]`); and a matrix is a rectangular grid of numbers occupying several rows and columns like a spreadsheet. Geometrically, if a scalar is a zero-dimensional point, then a vector is a one-dimensional line, a matrix is a two-dimensional plane, a stack of matrices is a three-dimensional cube, and when each element of those matrices has a stack of *feature maps* attached to it, you enter the fourth dimension. For reference, here’s a 2 x 2 matrix:

@@ -181,7 +181,7 @@ All Deeplearning4j [examples of convolutional networks are available here](https

## Other Resources

-To see DL4J convolutional networks in action, please run our [examples](https://github.com/deeplearning4j/dl4j-examples/tree/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/convolution/) after following the instructions on the [Quickstart page](./quickstart).
+To see DL4J convolutional neural networks in action, please run our [examples](https://github.com/deeplearning4j/dl4j-examples/tree/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/convolution/) after following the instructions on the [Quickstart page](./quickstart).

Skymind wraps NVIDIA's cuDNN and integrates with OpenCV. Our convolutional nets run on distributed GPUs using Spark, making them among the fastest in the world. You can learn how to build an [image recognition web app with VGG16 here](./build_vgg_webapp) and how to [deploy CNNs to Android here](./android).

diff --git a/java-ai.md b/java-ai.md
index 5292ebb08..04fab56d2 100644
--- a/java-ai.md
+++ b/java-ai.md
@@ -1,17 +1,27 @@
---
-title: Artificial Intelligence (AI) for Java
+title: Artificial Intelligence (AI) and Machine Learning for Java
layout: default
---

-# Artificial Intelligence (AI) for Java
+# Artificial Intelligence (AI) and Machine Learning for Java
+
+## Why Use Java for AI?
+
+And more broadly, why should you use JVM languages like Java, [Scala](./scala-ai.html), Clojure or Kotlin to build AI and machine-learning solutions?
+
+Java is the [most widely used programming language in the world](https://www.tiobe.com/tiobe-index/). Large organizations in the public and private sector have enormous Java code bases, and rely heavily on the JVM as a compute environment. In particular, much of the open-source big data stack is written for the JVM. This includes [Apache Hadoop](http://hadoop.apache.org/) for distributed data management; [Apache Spark](./spark) as a distributed run-time for fast ETL; [Apache Kafka](https://kafka.apache.org/) as a message queue; [ElasticSearch](https://www.elastic.co/), [Apache Lucene](https://lucene.apache.org/) and [Apache Solr](http://lucene.apache.org/solr/) for search; and [Apache Cassandra](http://cassandra.apache.org/) for data storage, to name a few.
+
+Since access to data is a prerequisite to building AI and machine-learning solutions, AI tools need to integrate well with those technologies. AI starts with the data you gather. That's why the AI and machine-learning tooling you choose is crucial.
The right tools solve a lot of integration problems (many data science projects fail when prototypes can't integrate with the production stack), and they will accelerate the digital transformation of many of the world's businesses.
+
+"Accelerating digital transformation" sounds like a bunch of empty buzzwords, so let's paraphrase it. Choosing the right machine learning tools allows you to produce more accurate predictions about your data while using your existing technology stack, and those predictions will allow you to make better decisions for your business. Those predictions might be the basis of a cool new product (self-piloting drones) or lead to big cost savings. We have listed the most important machine-learning tools written in Java below.

## Deep Learning & Neural Networks

Deep learning usually refers to deep artificial neural networks. [Neural networks](https://deeplearning4j.org/neuralnet-overview) are a type of machine learning algorithm loosely modeled on the neurons in the human brain. Deep neural nets involve stacking several neural nets on top of each other to enable a feature hierarchy for more accurate classification and prediction. Deep learning is the state of the art in most tasks of machine perception, involving classification, clustering and prediction applied to raw sensory data.

-### Deeplearning4j
+### Eclipse Deeplearning4j

-[Deeplearning4j](deeplearning4j.org) is the most widely used open source deep learning library for Java and the JVM. It also has a Scala API and uses Keras as its Python API for neural network configuration. The official website provides many tutorials and simple theoretical explanations for deep learning and neural networks.
+[Eclipse Deeplearning4j](https://deeplearning4j.org/) is the most widely used open source deep learning library for Java and the JVM. It includes [multilayer perceptrons](./multilayerperceptron), [convolutional neural networks (CNNs) for image and text classification](./convolutionalnetwork), [recurrent neural networks such as LSTMs for text and time series data](./lstm), and [various autoencoders like VAEs and GANs](./generative-adversarial-network). Its auto-differentiation library, SameDiff, allows developers to create any neural network. It has a Scala API and uses Keras as its optional Python API. The official website provides many tutorials and simple theoretical explanations for deep learning and neural networks. Deeplearning4j includes machine-learning algorithms such as logistic regression and k-nearest neighbors.

@@ -53,7 +63,7 @@ Machine learning encompasses a wide range of algorithms that are able to adapt t

### SMILE

-[SMILE](https://github.com/haifengl/smile) stands for Statistical and Machine Intelligence Learning Engine. SMILE was create by Haifeng Lee, and provides fast, scalable machine learning for Java.
+[SMILE](https://github.com/haifengl/smile) stands for Statistical and Machine Intelligence Learning Engine. SMILE was created by Haifeng Li, and provides fast, scalable machine learning for Java. SMILE uses ND4J to perform scientific computing for large-scale tensor manipulations. It includes algorithms such as support vector machines (SVMs), [decision trees](./decision-tree), [random forests](./random-forest) and gradient boosting, among others.
### SINGA @@ -69,7 +79,7 @@ Machine learning encompasses a wide range of algorithms that are able to adapt t ### Weka -[Weka](http://www.cs.waikato.ac.nz/ml/weka/) is a collection of machine learning algorithms that can be applied directly to a dataset, through the Weka GUI or API. The WEKA community is large, providing various tutorials for Weka and machine learning itself. +[Weka](http://www.cs.waikato.ac.nz/ml/weka/) is a collection of machine learning algorithms that can be applied directly to a dataset, through the Weka GUI or API. The WEKA community is large, providing various tutorials for Weka and machine learning itself. WEKA uses Deeplearning4j for its neural network implementation. ### MOA (Massive On-line Analysis) [MOA (Massive On-line Analysis)](https://moa.cms.waikato.ac.nz/) is for mining data streams. @@ -101,12 +111,12 @@ All machine learning libraries depend on some form of scientific computing. For people just getting started with deep learning, the following tutorials and videos provide an easy entrance to the fundamental ideas of feedforward networks: * [Introduction to Deep Neural Networks](./neuralnet-overview.html) -* [Convolutional Networks for Image Recognition](./convolutionalnets.html) +* [Convolutional Neural Networks (CNNs) for Image Recognition](./convolutionalnetwork.html) * [Recurrent Networks and LSTMs](./lstm.html) * [Generative Adversarial Networks (GANs)](/generative-adversarial-network.html) * [Deep Reinforcement Learning](./deepreinforcementlearning.html) * [Symbolic Reasoning and Deep Learning](./symbolicreasoning.html) -* [Graph Data and Deep Learning](./graphdata.html) +* [Graph Data Analytics and Deep Learning](./graphanalytics.html) * [Word2vec and Natural-Language Processing](./word2vec.html) * [MNIST for Beginners](./mnist-for-beginners.html) * [Restricted Boltzmann Machines](./restrictedboltzmannmachine.html) @@ -114,5 +124,5 @@ For people just getting started with deep learning, the following tutorials and * [Glossary of Deep-Learning and Neural-Net Terms](./glossary.html) * [Deeplearning4j Examples via Quickstart](./quickstart.html) * [Artificial Intelligence (AI) for Scala](./scala-ai.html) -* [Inference: Machine Learning Model Server](./modelserver.html) +* [Inference: Machine Learning Model Server](./machine-learning-server.html) * [Multilayer Perceptron (MLPs) for Classification](./multilayerperceptron.html) diff --git a/lstm.md b/lstm.md index 15039aa6d..cd28060c1 100644 --- a/lstm.md +++ b/lstm.md @@ -231,6 +231,7 @@ Here are a few ideas to keep in mind when manually optimizing hyperparameters fo * [Introduction to Decision Trees](./decision-tree.html) * [Introduction to Random Forests](./random-forest.html) * [Open Datasets for Machine Learning](./opendata.html) +* [Deep Learning on Apache Spark](./spark.html) * [AI vs. Machine Learning vs. Deep Learning](./ai-machinelearning-deeplearning.html) * [Inference in Production: Machine Learning Model Server](./machine-learning-server.html) diff --git a/opendata.md b/opendata.md index 8da5b73fa..082ed20fb 100644 --- a/opendata.md +++ b/opendata.md @@ -1,9 +1,9 @@ --- -title: Open Datasets for Deep Learning +title: Open Datasets for Deep Learning & Machine Learning layout: default --- -# Open Data for Deep Learning +# Open Data for Deep Learning & Machine Learning Here you'll find an organized list of interesting, high-quality datasets for machine learning research. 
We welcome your contributions for [curating this list](https://github.com/deeplearning4j/deeplearning4j/blob/gh-pages/opendata.md)! You can find other lists of such datasets [on Wikipedia](https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research), for example.

@@ -19,10 +19,13 @@ Here you'll find an organized list of interesting, high-quality datasets for mac

* [Open Data Monitor](https://opendatamonitor.eu/)
* [Quandl Data Portal](https://www.quandl.com/)
* [Mut1ny Face/Head segmentation dataset](http://www.mut1ny.com/face-headsegmentation-dataset)
+
+
+
## Natural-Image Datasets

* [MNIST: handwritten digits](http://yann.lecun.com/exdb/mnist/): The most commonly used sanity check. Dataset of 28x28, centered, B&W handwritten digits. It is an easy task, and just because something works on MNIST doesn’t mean it will work on harder problems.

diff --git a/releasenotes.md b/releasenotes.md
index 1e7bc219a..4c40d7880 100644
--- a/releasenotes.md
+++ b/releasenotes.md
@@ -37,7 +37,7 @@ layout: default

- Layers (new and enhanced)
-  - Added Yolo2OutputLayer CNN layer for object detection ([Link](https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/objdetect/Yolo2OutputLayer.java)). See also DataVec's [ObjectDetectionRecordReader](https://github.com/deeplearning4j/DataVec/blob/master/datavec-data/datavec-data-image/src/main/java/org/datavec/image/recordreader/objdetect/ObjectDetectionRecordReader.java)
+  - Added Yolo2OutputLayer CNN layer for object detection ([Link](https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/objdetect/Yolo2OutputLayer.java)). See also DataVec's [ObjectDetectionRecordReader](https://github.com/deeplearning4j/DataVec/blob/master/datavec-data/datavec-data-image/src/main/java/org/datavec/image/recordreader/objdetect/ObjectDetectionRecordReader.java)
  - Adds support for 'no bias' layers via ```hasBias(boolean)``` config (DenseLayer, EmbeddingLayer, OutputLayer, RnnOutputLayer, CenterLossOutputLayer, ConvolutionLayer, Convolution1DLayer). EmbeddingLayer now defaults to no bias ([Link](https://github.com/deeplearning4j/deeplearning4j/pull/3882))
  - Adds support for dilated convolutions (aka 'atrous' convolutions) - ConvolutionLayer, SubsamplingLayer, and 1D versions thereof. ([Link](https://github.com/deeplearning4j/deeplearning4j/pull/3922))
  - Added Upsampling2D layer, Upsampling1D layer ([Link](https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Upsampling2D.java), [Link](https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/layers/Upsampling1D.java))
@@ -59,7 +59,7 @@ layout: default
- Added ISchedule interface; added Exponential, Inverse, Map, Poly, Sigmoid and Step schedule implementations ([Link](https://github.com/deeplearning4j/nd4j/tree/master/nd4j-backends/nd4j-api-parent/nd4j-api/src/main/java/org/nd4j/linalg/schedule))
- Added support for both iteration-based and epoch-based schedules via ISchedule. Also added support for custom (user defined) schedules
- Learning rate schedules are configured on the updaters, via the ```.updater(IUpdater)``` method
-- Added dropout API (IDropout - previously dropout was available but not a class); added Dropout, AlphaDropout (for use with self-normalizing NNs), GaussianDropout (multiplicative), GaussianNoise (additive).
Added support for custom dropout types ([Link](https://github.com/deeplearning4j/deeplearning4j/tree/master/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/dropout)) - Added support for dropout schedules via ISchedule interface ([Link](https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/dropout/Dropout.java#L64)) - Added weight/parameter noise API (IWeightNoise interface); added DropConnect and WeightNoise (additive/multiplicative Gaussian noise) implementations ([Link](https://github.com/deeplearning4j/deeplearning4j/tree/master/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/weightnoise)); dropconnect and dropout can now be used simultaneously - Adds layer configuration alias ```.units(int)``` equivalent to ```.nOut(int)``` ([Link](https://github.com/deeplearning4j/deeplearning4j/pull/3900)) @@ -69,12 +69,12 @@ layout: default - MultiLayerNetwork, ComputationGraph and layerwise trainable layers now track the number of epochs ([Link](https://github.com/deeplearning4j/deeplearning4j/pull/3957)) - Added deeplearning4j-ui-standalone module: uber-jar for easy launching of UI server (usage: ```java -jar deeplearning4j-ui-standalone-1.0.0-alpha.jar -p 9124 -r true -f c:/UIStorage.bin```) - Weight initializations: - - Added ```.weightInit(Distribution)``` convenience/overload (previously: required ```.weightInit(WeightInit.DISTRIBUTION).dist(Distribution)```) ([Link](https://github.com/deeplearning4j/deeplearning4j/commit/45cbb6efc2ad015397b4fdf5eac9d1e9dc70ac9c)) + - Added ```.weightInit(Distribution)``` convenience/overload (previously: required ```.weightInit(WeightInit.DISTRIBUTION).dist(Distribution)```) ([Link](https://github.com/deeplearning4j/deeplearning4j/commit/45cbb6efc2ad015397b4fdf5eac9d1e9dc70ac9c)) - WeightInit.NORMAL (for self-normalizing neural networks) ([Link](https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/weights/WeightInit.java)) - Ones, Identity weight initialization ([Link](https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/weights/WeightInit.java)) - Added new distributions (LogNormalDistribution, TruncatedNormalDistribution, OrthogonalDistribution, ConstantDistribution) which can be used for weight initialization ([Link](https://github.com/deeplearning4j/deeplearning4j/tree/master/deeplearning4j-nn/src/main/java/org/deeplearning4j/nn/conf/distribution)) - RNNs: Added ability to specify weight initialization for recurrent weights separately to "input" weights ([Link](https://github.com/deeplearning4j/deeplearning4j/pull/4579)) -- Added layer alias: Convolution2D (ConvolutionLayer), Pooling1D (Subsampling1DLayer), Pooling2D (SubsamplingLayer) ([Link](https://github.com/deeplearning4j/deeplearning4j/pull/4026)) +- Added layer alias: Convolution2D (ConvolutionLayer), Pooling1D (Subsampling1DLayer), Pooling2D (SubsamplingLayer) ([Link](https://github.com/deeplearning4j/deeplearning4j/pull/4026)) - Added Spark IteratorUtils - wraps a RecordReaderMultiDataSetIterator for use in Spark network training ([Link](https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-scaleout/spark/dl4j-spark/src/main/java/org/deeplearning4j/spark/datavec/iterator/IteratorUtils.java)) - CuDNN-supporting layers (ConvolutionLayer, etc) now warn the user if using CUDA without CuDNN ([Link](https://github.com/deeplearning4j/deeplearning4j/pull/4039)) - Binary 
cross entropy (LossBinaryXENT) now implements clipping (1e-5 to (1 - 1e-5) by default) to avoid numerical underflow/NaNs ([Link](https://github.com/deeplearning4j/nd4j/pull/2121)) @@ -99,7 +99,7 @@ layout: default - Added additional score functions for early stopping (ROC metrics, full set of Evaluation/Regression metrics, etc) ([Link](https://github.com/deeplearning4j/deeplearning4j/pull/4630)) - Added additional ROC and ROCMultiClass evaluation overloads for MultiLayerNetwork and ComputationGraph ([Link](https://github.com/deeplearning4j/deeplearning4j/pull/4642)) - Clarified Evaluation.stats() output to refer to "Predictions" instead of "Examples" (former is more correct for RNNs) ([Link](https://github.com/deeplearning4j/deeplearning4j/issues/4674)) -- EarlyStoppingConfiguration now supports ```Supplier
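To illustrate two of the configuration changes listed in these notes (learning-rate schedules set on the updater via the ISchedule interface, and 'no bias' layers via ```hasBias(boolean)```), here is a hedged sketch against the 1.0.0-alpha-era API. Treat the layer sizes and schedule values as illustrative; method names may differ slightly in other releases.

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;
import org.nd4j.linalg.schedule.ExponentialSchedule;
import org.nd4j.linalg.schedule.ScheduleType;

public class ScheduleConfigSketch {
    public static void main(String[] args) {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(123)
                // Learning rate schedule configured on the updater, per the notes
                // above: start at 0.01 and decay exponentially each epoch.
                .updater(new Adam(new ExponentialSchedule(ScheduleType.EPOCH, 0.01, 0.9)))
                .list()
                // 'no bias' layer support via hasBias(boolean)
                .layer(0, new DenseLayer.Builder().nIn(784).nOut(128)
                        .hasBias(false)
                        .activation(Activation.RELU)
                        .build())
                .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(128).nOut(10)
                        .activation(Activation.SOFTMAX)
                        .build())
                .build();

        System.out.println(conf.toJson());
    }
}
```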