-
Notifications
You must be signed in to change notification settings - Fork 0
/
2015-11-23-ipython_notes_on_understanding_deep_image_representations_by_inverting_them-en.ipynb
267 lines (267 loc) · 9.04 KB
/
2015-11-23-ipython_notes_on_understanding_deep_image_representations_by_inverting_them-en.ipynb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Link to blog post**: http://cesarsalgado.com/ipython_notes_on_understanding_deep_image_representations_by_inverting_them"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Paper title**: Understanding Deep Image Representations by Inverting Them\n",
"\n",
"**Authors**: Aravindh Mahendran, Andrea Vedaldi\n",
"\n",
"**Link**: http://arxiv.org/abs/1412.0035"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<sub>**Warning**: The code in this post is just for illustrative purposes. It isn't runnable.</sub>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This paper tries to reconstruct an image from its representation as a way of visualizing its information. The proposed method is applied to representations obtained from CNN, HOG or DSIFT. It tries to find a reconstructed image xr by minimizing the Euclidean distance between the representation of xr and that of the original image. A regularizer is also added to the loss to be minimized to produce more natural looking images. The paper uses SGD with momentum to perform the minimization. Let start focusing on CNNs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Let's suppose we are using AlexNet and the representation is obtained in its 3rd fully connected (fc) layer.\n",
"# h is the representation of image x given by the activations of the 3rd fc layer of the CNN.\n",
"h = cnn_fc3(x)\n",
"# The paper wants to find one image xr such that cnn_fc3(xr) is close to h \n",
"xr = reconstruct(h, cnn_fc3)\n",
"hr = cnn_fc3(xr)\n",
"sum((hr-h)**2) < epsilon == True # means that the two hs are close."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One of the desired properties of CNNs trained for object recognition is that they are invariant to many transformations of the input. That also means that they are not invertible functions. So there are many input images that will produce the same representation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# There are many pairs xi and xj such that:\n",
"np.all(cnn_fc3(xi) == cnn_fc3(xj)) == True"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some of these images will probably not even look natural as the following paper on adversarial examples show: [Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images](http://arxiv.org/abs/1412.1897)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But the current paper wants only to find natural looking images so it incorporates an image \"prior\" to the reconstruction method. The way it incorporates the prior is the main novelty of the paper."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The problem of finding the reconstruction is posed as an optimization problem of trying to minimize the Euclidian distance between hr and h:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"xr = argmin( lambda x: sum( ((cnn_fc3(x)-h)**2) ) )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But we also need to restrict the possible xr to look natural. So the paper adds also a \"prior\" as a regularizer to the loss equation:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"xr = argmin( lambda x: sum( ((cnn_fc3(x)-h)**2) + regularizer(x) ) )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Actually, the paper uses an equation a bit different from the above to make the guessing of the initial hyper-parameter easier. See section **Balancing the different terms** for more information.\n",
"\n",
"The regularizer term is defined as the sum of two sub-terms: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# this regularizer is one of the main paper's contribution.\n",
"def regularizer(x):\n",
" mult1 = 2.6*1e8\n",
" mult2 = 5\n",
" return mult1*six_norm(x) + mult2*total_variation(x)\n",
"\n",
"def six_norm(x):\n",
" return sum(x**6)\n",
"\n",
"# this is the finite-difference approx. of the total variation\n",
"def total_variation(x):\n",
" beta = 2\n",
" result = 0.0\n",
" # suppose x is properly zero padded.\n",
" for i in xrange(x.shape[0]):\n",
" for j in xrange(x.shape[1]):\n",
" result += ( (x[i,j+1]-x[i,j])**2 + (x[i+1,j]-x[i,j])**2 )**(beta/2)\n",
" return result"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The optimization problem is then solved by using SGD with momentum as all the parts of the loss function are differentiable on x (including off course cnn_fc3).\n",
"\n",
"To visualize the representation of HOG and DSIFT the authors first build these methods using the CNN layers so it becomes easy to differentiate on the input. The algorithm then proceeds as before."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Experiments"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The authors tried to reconstruct the images from other layers of the CNN too. The image below, that was extracted from the paper, shows the reconstruction of an image from the representation of different layers of the used CNN."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![notebook_image](imgs/understanding_deep_image_representations_by_inverting_them__figure_6.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It can be seen that the first layers preserve almost all information of the image and the last layers loses information about the location of the object, for instance, but retain information about some discriminative features of the objects."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I hope papers like this one ends the myth that Neural Nets are black box algorithms. There are many works that try to visualize and understand what is happening inside them CNNs. Below I list some of them:\n",
"\n",
"- [Visualizing and Understanding Convolutional Networks](http://arxiv.org/abs/1311.2901)\n",
"\n",
"- [Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps](http://arxiv.org/abs/1312.6034)\n",
"\n",
"- [Understanding Neural Networks Through Deep Visualization](http://yosinski.com/media/papers/Yosinski__2015__ICML_DL__Understanding_Neural_Networks_Through_Deep_Visualization__.pdf)\n",
"\n",
"The last paper from above has also made available their code: https://github.com/yosinski/deep-visualization-toolbox"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I read this paper, because I started reading [A Neural Algorithm of Artistic Style](http://arxiv.org/abs/1508.06576) and I thought I needed first to read some of its references. So probably my next notes will be about the above paper and the following former work by the same authors: [Texture Synthesis Using Convolutional Neural Networks](arxiv.org/abs/1505.07376)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The Fine Print"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<sub>These notes are similar in spirit to https://twitter.com/hugo_larochelle notes that he announces in his Twitter. The only difference is that from time to time I will also write notes in Jupyter Notebook and insert some small code samples to understand better the concepts exposed in the papers. The code is not meant to reproduce the paper and sometimes may not even be runnable as I pretend not to spend too much time on every paper I read. Sometimes I may wish to dig deeper into a paper and try to reproduce the results, but then I will tag the blog post in a different section of my blog.</sub>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}