
new pages and splits

1 parent bdcc8e8 commit bfa7a6af65dbf7774c045ab0bedfa278b6f0a70c @karpathy karpathy committed Feb 11, 2015
Showing with 76 additions and 74 deletions.
  1. +17 −0 convnet-tips.md
  2. +10 −51 convolutional-networks.md
  3. +10 −23 index.html
  4. +25 −0 transfer-learning.md
  5. +14 −0 understanding-cnn.md
@@ -0,0 +1,17 @@
+---
+layout: page
+permalink: /convnet-tips/
+---
+
+<a name='overfitting'></a>
+### Addressing Overfitting
+
+#### Data Augmentation
+
+- Flip the training images horizontally (mirror them left-right)
+- Sample random crops / scales from the original image
+- Jitter the colors (a sketch of all three follows this list)
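A minimal numpy sketch of these three augmentations (the crop size and jitter magnitude are illustrative assumptions, not values prescribed by the notes):

```python
import numpy as np

def augment(img, crop=224, rng=np.random):
    """Randomly flip, crop, and color-jitter one HxWx3 float image."""
    H, W, _ = img.shape
    # horizontal flip (mirror left-right) with probability 0.5
    if rng.rand() < 0.5:
        img = img[:, ::-1, :]
    # random crop of size crop x crop
    y, x = rng.randint(0, H - crop + 1), rng.randint(0, W - crop + 1)
    img = img[y:y + crop, x:x + crop, :]
    # jitter the colors by slightly rescaling each channel
    return img * (1.0 + 0.1 * rng.randn(1, 1, 3))
```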
+
+#### Dropout
+
+- Dropout is just as effective for CONV layers. People usually apply less dropout right before early CONV layers, since those stages have far fewer parameters than later stages of the network (e.g. the fully-connected layers). A sketch follows.
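A sketch of (inverted) dropout applied with stage-dependent strength; the 0.1/0.5 rates and activation shapes below are illustrative assumptions:

```python
import numpy as np

def dropout(x, p):
    """Inverted dropout: drop units with probability p, rescale so test time needs no change."""
    mask = (np.random.rand(*x.shape) >= p) / (1.0 - p)
    return x * mask

conv_act = np.random.randn(64, 112, 112)  # stand-in for an early CONV activation volume
fc_act = np.random.randn(4096)            # stand-in for a fully-connected activation vector

conv_act = dropout(conv_act, p=0.1)  # early CONV stage: few parameters, light dropout
fc_act = dropout(fc_act, p=0.5)      # FC stage: many parameters, stronger dropout
```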
@@ -14,13 +14,12 @@ Table of Contents:
- [Normalization Layer](#norm)
- [Fully-Connected Layer](#fc)
- [ConvNet Architectures](#architectures)
-- [Visualizing ConvNets](#vis)
-- [Transfer Learning](#transfer) (pre-training, fine-tuning)
-- [Controlling Overfitting](#overfitting) (data augmentations, dropout)
+ - [Layer Patterns](#layerpat)
+ - [Layer Sizing Patterns](#layersizepat)
+ - [Case Studies](#case) (AlexNet / ZFNet / VGGNet)
+ - [Computational Considerations](#comp)
- [Additional References](#add)
-
-
## Convolutional Neural Networks (CNNs / ConvNets)
Convolutional Neural Networks are very similar to ordinary Neural Networks: they are made up of neurons that receive some inputs, perform a dot product with their weight vector, and follow the dot product with a non-linearity. The whole network still expresses a single differentiable score function, from the image on one end to class scores at the other. They still have a loss function (e.g. SVM/Softmax) on the last fully-connected layer, and all the tips/tricks we developed for learning regular Neural Networks still apply.
@@ -192,6 +191,7 @@ Neurons in a fully connected layer have full connections to all activations in t
We have seen that Convolutional Networks are commonly made up of only three layer types: CONV, POOL (we assume Max pool unless stated otherwise) and FC (short for fully-connected). We will also explicitly write the RELU activation function as a layer, which applies an elementwise non-linearity. In this section we discuss how these are commonly stacked together to form entire ConvNets.
+<a name='layerpat'></a>
#### Layer Patterns
The most common form of a ConvNet architecture stacks a few CONV-RELU layers, follows them with POOL layers, and repeats this pattern until the image has been merged spatially to a small size. At some point, it is common to transition to fully-connected layers. The last fully-connected layer holds the output, such as the class scores. In other words, the most common ConvNet architecture follows the pattern:
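The pattern in question is `INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC`. As a toy sketch of how it expands (treating the optional POOL as always present):

```python
def convnet_pattern(N, M, K):
    """Expand INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC."""
    layers = ["INPUT"]
    for _ in range(M):
        layers += ["CONV", "RELU"] * N + ["POOL"]
    layers += ["FC", "RELU"] * K + ["FC"]  # last FC holds the output, e.g. class scores
    return layers

print(convnet_pattern(N=2, M=3, K=2))
# ['INPUT', 'CONV', 'RELU', 'CONV', 'RELU', 'POOL', ..., 'FC', 'RELU', 'FC', 'RELU', 'FC']
```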
@@ -206,6 +206,7 @@ where the `*` indicates repetition, and the `POOL?` indicates an optional poolin
*Prefer a stack of small filter CONV to one large receptive field CONV layer*. Suppose that you stack three 3x3 CONV layers on top of each other (with non-linearities in between, of course). In this arrangement, each neuron on the first CONV layer has a 3x3 view of the input volume. A neuron on the second CONV layer has a 3x3 view of the first CONV layer, and hence by extension a 5x5 view of the input volume. Similarly, a neuron on the third CONV layer has a 3x3 view of the second CONV layer, and hence a 7x7 view of the input volume. Suppose that instead of these three layers of 3x3 CONV, we only wanted to use a single CONV layer with 7x7 receptive fields. These neurons would have a receptive field over the input volume of identical spatial extent (7x7), but with several disadvantages. First, these neurons would be computing a linear function over the input, while the stack of three CONV layers contains non-linearities that make its features more expressive. Second, if we suppose that all the volumes have \\(C\\) channels, then it can be seen that the single 7x7 CONV layer would contain \\(C \times (7 \times 7 \times C) = 49 C^2\\) parameters, while the three 3x3 CONV layers would only contain \\(3 \times (C \times (3 \times 3 \times C)) = 27 C^2\\) parameters. Intuitively, stacking CONV layers with tiny filters as opposed to having one CONV layer with big filters allows us to express more powerful features of the input, with fewer parameters. As a practical disadvantage, we might need more memory to hold all the intermediate CONV layer results if we plan to do backpropagation.
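A quick check of the parameter arithmetic above (biases ignored, as in the text; `C = 64` is an arbitrary example):

```python
C = 64                              # example channel count
single_7x7 = C * (7 * 7 * C)        # 49*C^2 = 200704 parameters
three_3x3 = 3 * (C * (3 * 3 * C))   # 27*C^2 = 110592 parameters
print(single_7x7, three_3x3)
```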
+<a name='layersizepat'></a>
#### Layer Sizing Patterns
Until now we've omitted mention of the common hyperparameters used in each of the layers in a ConvNet. We will first state the common rules of thumb for sizing the architectures, and then follow the rules with a discussion of the notation:
@@ -224,64 +225,22 @@ The **pool layers** are in charge of downsampling the spatial dimensions of the
*Compromising based on memory constraints.* In some cases (especially early in the ConvNet architecture), the amount of memory can build up very quickly with the rules of thumb presented above. For example, filtering a 224x224x3 image with three 3x3 CONV layers with 64 filters each and padding 1 would create three activation volumes of size [224x224x64]. This amounts to a total of about 10 million activations, or 36MB of memory (per image). Since GPUs are often bottlenecked by memory, it may be necessary to compromise. In practice, people prefer to make the compromise at only the first CONV layer of the network. For example, one compromise might be to use a first CONV layer with filter sizes of 7x7 and stride of 2 (as seen in the ZF net). As another example, AlexNet uses filter sizes of 11x11 and a stride of 4.
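The memory arithmetic from this paragraph, worked out in Python (assuming float32 activations):

```python
H, W, F, num_layers = 224, 224, 64, 3
activations = num_layers * H * W * F   # 9,633,792 -- about 10 million activations
print(activations * 4 / 1024 ** 2)     # ~36.75 MB per image (forward activations only)
```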
+<a name='case'></a>
#### Case Studies
- AlexNet
-- Zeiler&Fergus (ZF) Net
+- Zeiler & Fergus (ZF) Net
- VGGNet
+<a name='comp'></a>
#### Computational Considerations
- Add up the sizes of all the volumes and multiply by 4 to get the number of bytes (assuming float32). Multiply by two, because we need storage for both the forward pass (activations) and the backward pass (gradients).
- Add the number of bytes needed to store the parameters.
- Divide the number of bytes by 1024 to get KB, by 1024 again to get MB, and by 1024 once more to get GB.
- Most GPUs currently have about 4GB of memory, some 6GB, and up to 12GB. Use a minibatch size that maxes out the memory on your GPU. Remember that smaller batch sizes likely need smaller learning rates. (A back-of-the-envelope sketch follows this list.)
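A sketch of this bookkeeping; the helper and the numbers plugged in are illustrative assumptions, not a standard API:

```python
def memory_estimate_gb(volume_sizes, num_params, batch_size):
    """Rough GPU memory estimate: activations (x2 for forward + backward) plus parameters."""
    act_bytes = sum(volume_sizes) * 4 * 2 * batch_size  # float32, forward and backward
    param_bytes = num_params * 4
    return (act_bytes + param_bytes) / 1024 ** 3

# e.g. three [224x224x64] volumes, ~60 million parameters, minibatch of 64
vols = [224 * 224 * 64] * 3
print(memory_estimate_gb(vols, num_params=60_000_000, batch_size=64))  # ~4.8 GB
```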
-<a name='vis'></a>
-### Visualizing ConvNets
-
-- DeconvNets to visualize source of neuron firings
-- Occluding parts of the image
-- Visualizing the data gradient
-- Reconstructing images based on given ConvNet codes
-- Fooling ConvNets
-
-<a name='transfer'></a>
-### Transfer Learning
-
-In practice, very few people train an entire Convolutional Network from scratch (with random initialization), because it is relatively rare to have a dataset of sufficient size. Instead, it is common to pretrain a ConvNet on a very large dataset (e.g. ImageNet, which contains 1.2 million images with 1000 categories), and then use the ConvNet either as an initialization or a fixed feature extractor for the task of interest. The three major Transfer Learning scenarios look as follows:
-
-- **ConvNet as fixed feature extractor**. Take a ConvNet pretrained on ImageNet, remove the last fully-connected layer (this layer's outputs are the 1000 class scores for the original ImageNet task), then treat the rest of the ConvNet as a fixed feature extractor for the new dataset. In an AlexNet, this would compute a 4096-D vector for every image, containing the activations of the hidden layer immediately before the classifier. We call these features **CNN codes**. It is important for performance that these codes are ReLUd (i.e. thresholded at zero) if they were also thresholded during the training of the ConvNet on ImageNet (as is usually the case). Once you extract the 4096-D codes for all images, train a linear classifier (e.g. Linear SVM or Softmax classifier) for the new dataset (a schematic sketch follows this list).
-- **Fine-tuning the ConvNet**. The second strategy is to not only replace and retrain the classifier on top of the ConvNet on the new dataset, but to also fine-tune the weights of the pretrained network by continuing the backpropagation. It is possible to fine-tune all the layers of the ConvNet, or it's possible to keep some of the earlier layers fixed (due to overfitting concerns) and only fine-tune some higher-level portion of the network. This is motivated by the observation that the earlier features of a ConvNet contain more generic features (e.g. edge detectors or color blob detectors) that should be useful for many tasks, but later layers of the ConvNet become progressively more specific to the details of the classes contained in the original dataset. In the case of ImageNet, for example, which contains many dog breeds, a significant portion of the representational power of the ConvNet may be devoted to features that are specific to differentiating between dog breeds.
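A schematic sketch of the fixed-feature-extractor workflow; `pretrained_forward` is a hypothetical stand-in for running the truncated pretrained ConvNet up to its 4096-D hidden layer:

```python
import numpy as np
from sklearn.svm import LinearSVC  # any linear classifier would do

def extract_cnn_codes(images, pretrained_forward):
    """Compute the CNN codes for each image."""
    codes = np.array([pretrained_forward(img) for img in images])
    return np.maximum(codes, 0)  # keep the ReLU thresholding used during pretraining

# codes = extract_cnn_codes(train_images, pretrained_forward)
# clf = LinearSVC().fit(codes, train_labels)
```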
-
-**Pretrained models**. Since modern ConvNets take 2-3 weeks to train across multiple GPUs on ImageNet, it is common to see people release their final ConvNet checkpoints for the benefit of others who can use the networks for fine-tuning. For example, the Caffe library has a [Model Zoo](https://github.com/BVLC/caffe/wiki/Model-Zoo) where people share their network weights.
-
-**When and how to fine-tune?** How do you decide what type of transfer learning you should perform on a new dataset? This is a function of several factors, but the two most important ones are the size of the new dataset (small or big), and its similarity to the original dataset (e.g. ImageNet-like in terms of the content of images and the classes, or very different, such as microscope images). Keeping in mind that ConvNet features are more generic in early layers and more original-dataset-specific in later layers, here are some common rules of thumb for navigating the 4 major scenarios:
-
-1. *New dataset is small and similar to original dataset*. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns. Since the data is similar to the original data, we expect higher-level features in the ConvNet to be relevant to this dataset as well. Hence, the best idea might be to train a linear classifier on the CNN codes.
-2. *New dataset is large and similar to the original dataset*. Since we have more data, we can have more confidence that we won't overfit if we were to try to fine-tune through the full network.
-3. *New dataset is small but very different from the original dataset*. Since the data is small, it is likely best to only train a linear classifier. Since the dataset is very different, it might not be best to train the classifier from the top of the network, which contains more dataset-specific features. Instead, it might work better to train the SVM classifier on activations from somewhere earlier in the network.
-4. *New dataset is large and very different from the original dataset*. Since the dataset is very large, we may expect that we can afford to train a ConvNet from scratch. However, in practice it is very often still beneficial to initialize with weights from a pretrained model. In this case, we would have enough data and confidence to fine-tune through the entire network. (These four rules of thumb are condensed in the sketch below.)
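The four scenarios as a toy lookup function (purely illustrative):

```python
def transfer_strategy(dataset_is_large, similar_to_original):
    """Map the two key factors to a rough transfer-learning recipe."""
    if not dataset_is_large and similar_to_original:
        return "train a linear classifier on the CNN codes"
    if dataset_is_large and similar_to_original:
        return "fine-tune through the full network"
    if not dataset_is_large and not similar_to_original:
        return "train a linear classifier on activations from earlier in the network"
    return "fine-tune the entire network, initialized from pretrained weights"
```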
-
-**Practical advice**.
-
-- *Constraints from pretrained models*. Note that if you wish to use a pretrained network, you may be slightly constrained in terms of the architecture you can use for your new dataset. For example, you can't arbitrarily take out Conv layers from the pretrained network. However, some changes are straightforward: due to parameter sharing, you can easily run a pretrained network on images of different spatial size. This is clearly evident in the case of Conv/Pool layers because their forward function is independent of the input volume spatial size (as long as the strides "fit"). In the case of FC layers, this still holds true because FC layers can be converted to a Convolutional Layer: for example, in an AlexNet, the final pooling volume before the first FC layer is of size [6x6x512]. Therefore, the FC layer looking at this volume is equivalent to a Convolutional Layer with receptive field size 6x6, applied with padding of 0 (a numpy sketch of this equivalence follows this list).
-- *Learning rates*. It's common to use a smaller learning rate for ConvNet weights that are being fine-tuned, in comparison to the (randomly-initialized) weights for the new linear classifier that computes the class scores of your new dataset. This is because we expect that the ConvNet weights are relatively good, so we don't wish to distort them too quickly and too much (especially while the new Linear Classifier above them is being trained from random initialization).
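A numpy sketch of the FC-to-CONV equivalence from the first bullet, using the [6x6x512] example (8 outputs instead of AlexNet's 4096 to keep it small; the weights are random because only the shapes matter here):

```python
import numpy as np

pool5 = np.random.randn(6, 6, 512)      # final pooling volume before the first FC layer
K = 8                                   # number of FC outputs (4096 in AlexNet)
W_fc = np.random.randn(K, 6 * 6 * 512)

fc_out = W_fc.dot(pool5.reshape(-1))    # ordinary FC forward pass

# the same weights viewed as K CONV filters of size 6x6x512, applied with padding 0:
W_conv = W_fc.reshape(K, 6, 6, 512)
conv_out = np.array([np.sum(f * pool5) for f in W_conv])

print(np.allclose(fc_out, conv_out))    # True: identical outputs
```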
-
-<a name='overfitting'></a>
-### Addressing Overfitting
-
-#### Data Augmentation
-
-- Flip the training images horizontally (mirror them left-right)
-- Sample random crops / scales from the original image
-- Jitter the colors
-
-#### Dropout
-
-- Dropout is just as effective for Conv layers. Usually people apply less dropout right before early conv layers since there are not that many parameters there compared to later stages of the network (e.g. the fully connected layers).
-
<a name='add'></a>
### Additional Resources
-- [DeepLearning.net tutorial](http://deeplearning.net/tutorial/lenet.html)
+- [DeepLearning.net tutorial](http://deeplearning.net/tutorial/lenet.html)
@@ -132,49 +132,36 @@
</div>
<div class="materials-item notyet">
- <a href="">
- Understanding and visualizing Convolutional Neural Networks
- </a>
- </div>
-
- <div class="materials-item notyet">
- <a href="">
- Transfer learning and fine-tuning Convolutional Neural Networks
+ <a href="understanding-cnn/">
+ Understanding and Visualizing Convolutional Neural Networks
</a>
</div>
- <div class="materials-item notyet">
- <a href="">
- ConvNet Tips and tricks: squeezing out the last few percent
+ <div class="materials-item">
+ <a href="transfer-learning/">
+ Transfer Learning and Fine-tuning Convolutional Neural Networks
</a>
</div>
- <div class="module-header">Module 3: CNN In the wild</div>
-
<div class="materials-item notyet">
- <a href="">
- ImageNet challenge
+ <a href="convnet-tips/">
+ ConvNet Tips and Tricks: squeezing out the last few percent
</a>
</div>
- <div class="materials-item notyet">
- <a href="">
- Other visual recognition tasks: localization, detection
- </a>
- </div>
+ <div class="module-header">Module 3: ConvNets in the wild</div>
<div class="materials-item notyet">
<a href="">
- Beyond the ImageNet challenge
+ Other Visual Recognition Tasks: Localization, Detection, Segmentation
</a>
</div>
<div class="materials-item notyet">
<a href="">
- Convolutional Neural Networks in practice: Caffe
+ ConvNets in Practice: Distributed Training, GPU bottlenecks, Libraries
</a>
</div>
-
</div>
</div>