
fix image links

1 parent 540ef6c commit 2d4348b1482cefdb4f165afc1cb9b0295f703f60 @myungsub myungsub committed Mar 28, 2016
@@ -3,7 +3,7 @@ title: CS231n Convolutional Neural Networks for Visual Recognition
description: "Stanford CS231n: Convolutional Neural Networks for Visual Recognition course material translation site"
baseurl: "/cs231n"
-url: ""
+url: ""
twitter_username: kjw6612
github_username: aikorea
@@ -23,5 +23,5 @@
ga('send', 'pageview');
@@ -21,7 +21,7 @@ Console" button. It will direct you to a signup page which looks like the
<div class='fig figcenter fighighlight'>
- <img src='/assets/aws-signup.png'>
+ <img src='{{site.baseurl}}/assets/aws-signup.png'>
Select the "I am a new user" checkbox, click the "Sign in using our secure
@@ -34,13 +34,13 @@ click on "Sign In to the Console", and this time sign in using your username and
<div class='fig figcenter fighighlight'>
- <img src='/assets/aws-signin.png'>
+ <img src='{{site.baseurl}}/assets/aws-signin.png'>
Once you have signed in, you will be greeted by a page like this:
<div class='fig figcenter fighighlight'>
- <img src='/assets/aws-homepage.png'>
+ <img src='{{site.baseurl}}/assets/aws-homepage.png'>
Make sure that the region information on the top right is set to N. California.
@@ -55,14 +55,14 @@ Next, click on the EC2 link (first link under the Compute category). You will go
to a dashboard page like this:
<div class='fig figcenter fighighlight'>
- <img src='/assets/ec2-dashboard.png'>
+ <img src='{{site.baseurl}}/assets/ec2-dashboard.png'>
Click the blue "Launch Instance" button, and you will be redirected to a page
like the following:
<div class='fig figcenter fighighlight'>
- <img src='/assets/ami-selection.png'>
+ <img src='{{site.baseurl}}/assets/ami-selection.png'>
Click on the "Community AMIs" link on the left sidebar, and search for "cs231n"
@@ -71,19 +71,19 @@ in the search box. You should be able to see the AMI
AMI, and continue to the next step to choose your instance type.
<div class='fig figcenter fighighlight'>
- <img src='/assets/community-AMIs.png'>
+ <img src='{{site.baseurl}}/assets/community-AMIs.png'>
Choose the instance type `g2.2xlarge`, and click on "Review and Launch".
<div class='fig figcenter fighighlight'>
- <img src='/assets/instance-selection.png'>
+ <img src='{{site.baseurl}}/assets/instance-selection.png'>
In the next page, click on Launch.
<div class='fig figcenter fighighlight'>
- <img src='/assets/launch-screen.png'>
+ <img src='{{site.baseurl}}/assets/launch-screen.png'>
You will then be prompted to create or use an existing key-pair. If you already
@@ -94,11 +94,11 @@ somewhere that you won't accidentally delete. Remember that there is **NO WAY**
to get to your instance if you lose your key.
<div class='fig figcenter fighighlight'>
- <img src='/assets/key-pair.png'>
+ <img src='{{site.baseurl}}/assets/key-pair.png'>
<div class='fig figcenter fighighlight'>
- <img src='/assets/key-pair-create.png'>
+ <img src='{{site.baseurl}}/assets/key-pair-create.png'>
Once you download your key, you should change the permissions of the key to
@@ -113,15 +113,15 @@ After this is done, click on "Launch Instances", and you should see a screen
showing that your instances are launching:
<div class='fig figcenter fighighlight'>
- <img src='/assets/launching-screen.png'>
+ <img src='{{site.baseurl}}/assets/launching-screen.png'>
Click on "View Instances" to see your instance state. It should change to
"Running" and "2/2 status checks passed" as shown below within some time. You
are now ready to ssh into the instance.
<div class='fig figcenter fighighlight'>
- <img src='/assets/instances-page.png'>
+ <img src='{{site.baseurl}}/assets/instances-page.png'>
First, note down the Public IP of the instance from the instance listing. Then,
@@ -24,7 +24,7 @@ permalink: /classification/
For example, in the image below an image classification model takes a single image and assigns probabilities to 4 labels, *{cat, dog, hat, mug}*. As shown in the image, keep in mind that to a computer an image is represented as one large 3-dimensional array of numbers. In this example, the cat image is 248 pixels wide, 400 pixels tall, and has three color channels Red,Green,Blue (or RGB for short). Therefore, the image consists of 248 x 400 x 3 numbers, or a total of 297,600 numbers. Each number is an integer that ranges from 0 (black) to 255 (white). Our task is to turn this quarter of a million numbers into a single label, such as *"cat"*.
<div class="fig figcenter fighighlight">
- <img src="/assets/classify.png">
+ <img src="{{site.baseurl}}/assets/classify.png">
<div class="figcaption">The task in Image Classification is to predict a single label (or a distribution over labels as shown here to indicate our confidence) for a given image. Images are 3-dimensional arrays of integers from 0 to 255, of size Width x Height x 3. The 3 represents the three color channels Red, Green, Blue.</div>
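As a rough illustration of the "image as an array of numbers" view described above, here is a minimal sketch in NumPy. The array is filled with random values as a hypothetical stand-in for the cat image (the real pixel data is not part of these notes), and it assumes the common NumPy layout of height x width x channels:

```python
import numpy as np

# Hypothetical stand-in for the 248-wide, 400-tall RGB cat image.
# NumPy image arrays are conventionally laid out as (height, width, channels).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(400, 248, 3), dtype=np.uint8)

# Total count of numbers the classifier has to turn into a single label.
num_values = image.size  # 400 * 248 * 3

# Many simple classifiers first flatten the image into one long vector.
flat = image.reshape(-1)

print(image.shape, num_values, flat.shape)
```

Every entry is an integer between 0 and 255, and flattening does not change the values, only how they are indexed.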
@@ -41,14 +41,14 @@ For example, in the image below an image classification model takes a single ima
A good image classification model must be invariant to the cross product of all these variations, while simultaneously retaining sensitivity to the inter-class variations.
<div class="fig figcenter fighighlight">
- <img src="/assets/challenges.jpeg">
+ <img src="{{site.baseurl}}/assets/challenges.jpeg">
<div class="figcaption"></div>
**Data-driven approach**. How might we go about writing an algorithm that can classify images into distinct categories? Unlike writing an algorithm for, for example, sorting a list of numbers, it is not obvious how one might write an algorithm for identifying cats in images. Therefore, instead of trying to specify what every one of the categories of interest look like directly in code, the approach that we will take is not unlike one you would take with a child: we're going to provide the computer with many examples of each class and then develop learning algorithms that look at these examples and learn about the visual appearance of each class. This approach is referred to as a *data-driven approach*, since it relies on first accumulating a *training dataset* of labeled images. Here is an example of what such a dataset might look like:
<div class="fig figcenter fighighlight">
- <img src="/assets/trainset.jpg">
+ <img src="{{site.baseurl}}/assets/trainset.jpg">
<div class="figcaption">An example training set for four visual categories. In practice we may have thousands of categories and hundreds of thousands of images for each category.</div>
@@ -65,7 +65,7 @@ As our first approach, we will develop what we call a **Nearest Neighbor Classif
**Example image classification dataset: CIFAR-10.** One popular toy image classification dataset is the <a href="">CIFAR-10 dataset</a>. This dataset consists of 60,000 tiny images that are 32 pixels high and wide. Each image is labeled with one of 10 classes (for example *"airplane, automobile, bird, etc"*). These 60,000 images are partitioned into a training set of 50,000 images and a test set of 10,000 images. In the image below you can see 10 random example images from each one of the 10 classes:
<div class="fig figcenter fighighlight">
- <img src="/assets/nn.jpg">
+ <img src="{{site.baseurl}}/assets/nn.jpg">
<div class="figcaption">Left: Example images from the <a href="">CIFAR-10 dataset</a>. Right: first column shows a few test images and next to each we show the top 10 nearest neighbors in the training set according to pixel-wise difference.</div>
@@ -80,7 +80,7 @@ $$
Where the sum is taken over all pixels. Here is the procedure visualized:
<div class="fig figcenter fighighlight">
- <img src="/assets/nneg.jpeg">
+ <img src="{{site.baseurl}}/assets/nneg.jpeg">
<div class="figcaption">An example of using pixel-wise differences to compare two images with L1 distance (for one color channel in this example). Two images are subtracted elementwise and then all differences are added up to a single number. If two images are identical the result will be zero. But if the images are very different the result will be large.</div>
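The L1 comparison can be sketched in a few lines of NumPy. The function name `l1_distance` and the two tiny 2x2 "images" are made up for illustration. The cast to a signed integer type is an assumption about how the pixels are stored (unsigned 8-bit), where a plain subtraction would wrap around instead of going negative:

```python
import numpy as np

def l1_distance(a, b):
    """Sum of absolute pixel-wise differences between two images."""
    # Cast to a signed type so that uint8 subtraction cannot wrap around.
    a = a.astype(np.int64)
    b = b.astype(np.int64)
    return int(np.abs(a - b).sum())

# Two hypothetical single-channel 2x2 images.
img1 = np.array([[10, 20], [30, 40]], dtype=np.uint8)
img2 = np.array([[12, 15], [30, 100]], dtype=np.uint8)

print(l1_distance(img1, img2))  # |10-12| + |20-15| + |30-30| + |40-100|
print(l1_distance(img1, img1))  # identical images give zero
```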
@@ -161,7 +161,7 @@ Note that I included the `np.sqrt` call above, but in a practical nearest neighb
You may have noticed that it is strange to only use the label of the nearest image when we wish to make a prediction. Indeed, it is almost always the case that one can do better by using what's called a **k-Nearest Neighbor Classifier**. The idea is very simple: instead of finding the single closest image in the training set, we will find the top **k** closest images, and have them vote on the label of the test image. In particular, when *k = 1*, we recover the Nearest Neighbor classifier. Intuitively, higher values of **k** have a smoothing effect that makes the classifier more resistant to outliers:
<div class="fig figcenter fighighlight">
- <img src="/assets/knn.jpeg">
+ <img src="{{site.baseurl}}/assets/knn.jpeg">
<div class="figcaption">An example of the difference between Nearest Neighbor and a 5-Nearest Neighbor classifier, using 2-dimensional points and 3 classes (red, blue, green). The colored regions show the <b>decision boundaries</b> induced by the classifier with an L2 distance. The white regions show points that are ambiguously classified (i.e. class votes are tied for at least two classes). Notice that in the case of a NN classifier, outlier datapoints (e.g. green point in the middle of a cloud of blue points) create small islands of likely incorrect predictions, while the 5-NN classifier smooths over these irregularities, likely leading to better <b>generalization</b> on the test data (not shown). Also note that the gray regions in the 5-NN image are caused by ties in the votes among the nearest neighbors (e.g. 2 neighbors are red, next two neighbors are blue, last neighbor is green).</div>
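To make the voting idea concrete, here is a minimal sketch of a k-Nearest Neighbor prediction on 2-dimensional points. The helper `knn_predict` and the toy data (a green outlier sitting inside a cloud of blue points) are illustrative assumptions, not code from the course. Ties are broken in favor of the label that appears first among the nearest neighbors, which is only one of several reasonable choices:

```python
from collections import Counter

import numpy as np

def knn_predict(train_X, train_y, x, k=1):
    """Predict the label of point x by majority vote among its k nearest neighbors (L2)."""
    dists = np.sqrt(((train_X - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists, kind="stable")[:k]
    votes = Counter(train_y[int(i)] for i in nearest)
    # most_common keeps first-seen order among equal counts, so ties go to the closer label.
    return votes.most_common(1)[0][0]

# Toy training set: four blue points, one green outlier in their middle,
# and two more green points far away.
train_X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [5, 5], [5, 6]], dtype=float)
train_y = ["blue", "blue", "blue", "blue", "green", "green", "green"]

query = np.array([0.6, 0.6])
pred_1nn = knn_predict(train_X, train_y, query, k=1)  # follows the outlier
pred_5nn = knn_predict(train_X, train_y, query, k=5)  # outvoted by the blue points
print(pred_1nn, pred_5nn)
```

With k = 1 the query inherits the outlier's label, while with k = 5 the four surrounding blue points outvote it, which is the smoothing effect described above.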
@@ -212,7 +212,7 @@ By the end of this procedure, we could plot a graph that shows which values of *
In cases where the size of your training data (and therefore also the validation data) might be small, people sometimes use a more sophisticated technique for hyperparameter tuning called **cross-validation**. Working with our previous example, the idea is that instead of arbitrarily picking the first 1000 datapoints to be the validation set and the rest to be the training set, you can get a better and less noisy estimate of how well a certain value of *k* works by iterating over different validation sets and averaging the performance across these. For example, in 5-fold cross-validation, we would split the training data into 5 equal folds, use 4 of them for training, and 1 for validation. We would then iterate over which fold is the validation fold, evaluate the performance, and finally average the performance across the different folds.
<div class="fig figleft fighighlight">
- <img src="/assets/cvplot.png">
+ <img src="{{site.baseurl}}/assets/cvplot.png">
<div class="figcaption">Example of a 5-fold cross-validation run for the parameter <b>k</b>. For each value of <b>k</b> we train on 4 folds and evaluate on the 5th. Hence, for each <b>k</b> we receive 5 accuracies on the validation fold (accuracy is the y-axis, each result is a point). The trend line is drawn through the average of the results for each <b>k</b> and the error bars indicate the standard deviation. Note that in this particular case, the cross-validation suggests that a value of about <b>k</b> = 7 works best on this particular dataset (corresponding to the peak in the plot). If we used more than 5 folds, we might expect to see a smoother (i.e. less noisy) curve.</div>
<div style="clear:both"></div>
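The 5-fold procedure can be sketched as follows. Everything here is an illustrative assumption: the helpers `knn_accuracy` and `cross_validate`, the synthetic two-cluster data standing in for real images, and the candidate values of *k*. It is a sketch of the idea rather than the code used to produce the plot:

```python
import numpy as np

def knn_accuracy(train_X, train_y, val_X, val_y, k):
    """Fraction of validation points whose k-NN majority vote matches the true label."""
    correct = 0
    for x, y in zip(val_X, val_y):
        dists = ((train_X - x) ** 2).sum(axis=1)  # squared L2 keeps the same ordering
        nearest = np.argsort(dists, kind="stable")[:k]
        pred = np.bincount(train_y[nearest]).argmax()
        correct += int(pred == y)
    return correct / len(val_y)

def cross_validate(X, y, k, num_folds=5):
    """Return one validation accuracy per fold for a given k."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    accuracies = []
    for i in range(num_folds):
        # Fold i is held out for validation; the remaining folds form the training set.
        train_X = np.concatenate(X_folds[:i] + X_folds[i + 1:])
        train_y = np.concatenate(y_folds[:i] + y_folds[i + 1:])
        accuracies.append(knn_accuracy(train_X, train_y, X_folds[i], y_folds[i], k))
    return accuracies

# Synthetic data: two well-separated 2-D clusters with integer labels 0 and 1.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0.0, 1.0, (50, 2)), rng.normal(6.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]

results = {k: cross_validate(X, y, k) for k in (1, 3, 5, 7)}
for k, accs in results.items():
    print(k, np.mean(accs), np.std(accs))
```

The mean and standard deviation printed per *k* correspond to the trend line and error bars in a plot like the one above.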
@@ -221,7 +221,7 @@ In cases where the size of your training data (and therefore also the validation
**In practice**. In practice, people prefer to avoid cross-validation in favor of having a single validation split, since cross-validation can be computationally expensive. People tend to use between 50% and 90% of the training data for training and the rest for validation. However, this depends on multiple factors: for example, if the number of hyperparameters is large, you may prefer to use bigger validation splits. If the number of examples in the validation set is small (perhaps only a few hundred or so), it is safer to use cross-validation. Typical choices in practice are 3-fold, 5-fold, or 10-fold cross-validation.
<div class="fig figcenter fighighlight">
- <img src="/assets/crossval.jpeg">
+ <img src="{{site.baseurl}}/assets/crossval.jpeg">
<div class="figcaption">Common data splits. A training and test set is given. The training set is split into folds (for example 5 folds here). The folds 1-4 become the training set. One fold (e.g. fold 5 here in yellow) is denoted as the Validation fold and is used to tune the hyperparameters. Cross-validation goes a step further and iterates over the choice of which fold is the validation fold, separately from 1-5. This would be referred to as 5-fold cross-validation. In the very end once the model is trained and all the best hyperparameters were determined, the model is evaluated a single time on the test data (red).</div>
@@ -235,15 +235,15 @@ As an aside, the computational complexity of the Nearest Neighbor classifier is
The Nearest Neighbor Classifier may sometimes be a good choice in some settings (especially if the data is low-dimensional), but it is rarely appropriate for use in practical image classification settings. One problem is that images are high-dimensional objects (i.e. they often contain many pixels), and distances over high-dimensional spaces can be very counter-intuitive. The image below illustrates the point that the pixel-based L2 similarities we developed above are very different from perceptual similarities:
<div class="fig figcenter fighighlight">
- <img src="/assets/samenorm.png">
+ <img src="{{site.baseurl}}/assets/samenorm.png">
<div class="figcaption">Pixel-based distances on high-dimensional data (and images especially) can be very unintuitive. An original image (left) and three other images next to it that are all equally far away from it based on L2 pixel distance. Clearly, the pixel-wise distance does not correspond at all to perceptual or semantic similarity.</div>
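A small numerical sketch shows how very different changes to an image can sit at exactly the same L2 distance from it. The random 32x32 "image", the three perturbations, and the target distance of 100 are all assumptions chosen for illustration; they are not the images in the figure:

```python
import numpy as np

rng = np.random.default_rng(0)
original = rng.uniform(50, 200, size=(32, 32))  # hypothetical grayscale image
target = 100.0  # L2 distance that every altered image should have

def rescale(delta, norm):
    """Scale a perturbation so that its L2 norm equals `norm`."""
    return delta * (norm / np.linalg.norm(delta))

# 1) Uniform brightening of every pixel.
brighter = original + rescale(np.ones_like(original), target)

# 2) Random noise spread over the whole image.
noisy = original + rescale(rng.normal(size=original.shape), target)

# 3) A strong change confined to one 8x8 corner patch.
patch = np.zeros_like(original)
patch[:8, :8] = 1.0
patched = original + rescale(patch, target)

distances = [float(np.linalg.norm(img - original)) for img in (brighter, noisy, patched)]
print(distances)
```

All three altered images are equally far from the original in L2 terms, even though a person would judge a slight brightening, global noise, and a damaged corner very differently.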
Here is one more visualization to convince you that using pixel differences to compare images is inadequate. We can use a visualization technique called <a href="">t-SNE</a> to take the CIFAR-10 images and embed them in two dimensions so that their (local) pairwise distances are best preserved. In this visualization, images that are shown nearby are considered to be very near according to the L2 pixelwise distance we developed above:
<div class="fig figcenter fighighlight">
- <img src="/assets/pixels_embed_cifar10.jpg">
- <div class="figcaption">CIFAR-10 images embedded in two dimensions with t-SNE. Images that are nearby on this image are considered to be close based on the L2 pixel distance. Notice the strong effect of background rather than semantic class differences. Click <a href="/assets/pixels_embed_cifar10_big.jpg">here</a> for a bigger version of this visualization.</div>
+ <img src="{{site.baseurl}}/assets/pixels_embed_cifar10.jpg">
+ <div class="figcaption">CIFAR-10 images embedded in two dimensions with t-SNE. Images that are nearby on this image are considered to be close based on the L2 pixel distance. Notice the strong effect of background rather than semantic class differences. Click <a href="{{site.baseurl}}/assets/pixels_embed_cifar10_big.jpg">here</a> for a bigger version of this visualization.</div>
In particular, note that images that are nearby each other are much more a function of the general color distribution of the images, or the type of background rather than their semantic identity. For example, a dog can be seen very near a frog since both happen to be on white background. Ideally we would like images of all of the 10 classes to form their own clusters, so that images of the same class are nearby to each other regardless of irrelevant characteristics and variations (such as the background). However, to get this property we will have to go beyond raw pixels.