atom.xml
3345 lines (3141 loc) · 927 KB
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>A Stellar Hiker</title>
<link href="/atom.xml" rel="self"/>
<link href="http://conglang.github.io/"/>
<updated>2018-09-03T15:10:30.108Z</updated>
<id>http://conglang.github.io/</id>
<author>
<name>聪</name>
</author>
<generator uri="http://hexo.io/">Hexo</generator>
<entry>
<title>Paper Notes: Low Resolution Face Recognition Using a Two-Branch Deep Convolutional Neural Network Architecture</title>
<link href="http://conglang.github.io/2018/09/02/essay-two-branch-dcnn/"/>
<id>http://conglang.github.io/2018/09/02/essay-two-branch-dcnn/</id>
<published>2018-09-01T16:09:45.000Z</published>
<updated>2018-09-03T15:10:30.108Z</updated>
<content type="html"><h2 id="introduction"><a class="markdownIt-Anchor" href="#introduction"></a> Introduction</h2>
<p><img src="/img/lr_3_methods.png" alt="Three general approaches for low resolution face recognition"><br>
When the probe face image is low-resolution (LR) but the gallery (training) images are high-resolution (HR), there are three common approaches:</p>
<ol>
<li>Down-sample the gallery images to low resolution as well. This discards useful information.</li>
<li>Generate an HR image from the LR probe and run recognition on it. These methods tend to optimize the visual quality of the generated image rather than recognition performance. [10]-[13]</li>
<li>Map both the LR probe and the HR gallery images into a common space where matching pairs are close together. [14]-[17] This is the approach the paper takes.</li>
</ol>
<p>The key contribution is a nonlinear transformation from the LR and HR domains into a common space, realized by two deep CNNs.<br>
Because one branch contains a super-resolution CNN, the model can also reconstruct an HR image from the LR input.<br>
The objective function is the distance between the transformed LR and HR images in the common space.<br>
The dataset is FERET.<br>
The model has a small memory footprint.</p>
<h2 id="method"><a class="markdownIt-Anchor" href="#method"></a> Method</h2>
<p><img src="/img/arch_two_branch_dcnn.png" alt="Architecture of two deep convolutional neural networks in two branches"><br>
Training set: pairs of LR and HR images of the same person captured under different conditions.</p>
<h3 id="networks-architecture"><a class="markdownIt-Anchor" href="#networks-architecture"></a> Networks Architecture</h3>
<p>VGGnet: 13 CONV + 3 FC layers.</p>
<ul>
<li>HR images -&gt; common space: FECNN (feature extraction convolutional neural network), which maps a 224x224 image to a 4096-dimensional feature vector. It is VGGnet with the last 2 FC layers removed.</li>
<li>LR images -&gt; common space: SRFECNN = SRnet (super-resolution net) + FECNN, likewise mapping a 224x224 image to a 4096-dimensional feature vector.</li>
</ul>
<p>With those two FC layers removed, the model has fewer parameters than VGGnet and fits in memory.<br>
<img src="/img/srfecnn_weights.png" alt="SRFECNN weights"></p>
<h3 id="common-subspace-learning"><a class="markdownIt-Anchor" href="#common-subspace-learning"></a> Common Subspace Learning</h3>
<p>3 steps:</p>
<ol>
<li>Take a VGGnet trained on a face dataset and drop its last two FC layers, which serve the classification task specifically. The result is the pre-trained FECNN.</li>
<li>Train the SRnet of the bottom branch on a dataset of high- and low-resolution face image pairs.</li>
<li>Merge SRnet and FECNN, then feed the merged branch a training set of LR/HR image pairs of the same persons.</li>
</ol>
<p>The top (HR) branch is kept fixed; only the bottom (LR) branch's FECNN and SRnet are trained.<br>
The distance between the LR and HR features of the same subject is the error, backpropagated into the bottom branch net (both FECNN and SRnet).</p>
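<p>The error signal of step 3 can be sketched as follows. This is a hypothetical, minimal Python version assuming a mean squared L2 distance between the paired 4096-dimensional features; the paper's exact distance function and training details may differ:</p>

```python
def common_space_loss(hr_feats, lr_feats):
    """Mean squared Euclidean distance between paired HR and LR
    feature vectors in the common space. In training, this error is
    backpropagated only into the bottom (SRnet + FECNN) branch;
    the top HR branch stays frozen."""
    batch = len(hr_feats)
    total = 0.0
    for hr, lr in zip(hr_feats, lr_feats):
        total += sum((h - l) ** 2 for h, l in zip(hr, lr))
    return total / batch
```

<p>Identical feature pairs give zero loss; training pushes the LR branch's output toward the frozen HR branch's output.</p>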
<p><img src="/img/srfecnn_config.png" alt="Configurations with different super-resolution modules"></p>
<h3 id="reconstruct-input-image"><a class="markdownIt-Anchor" href="#reconstruct-input-image"></a> Reconstruct Input Image</h3>
<p>This is the SRnet output. SRnet's main role appears to be keeping very low-resolution inputs from degrading performance too badly.<br>
The reconstructed images are not visually impressive; image quality is sacrificed for better recognition performance.<br>
<img src="/img/face_diff_config.png" alt="Reconstructed Faces by different configurations"></p>
<h3 id="datasets"><a class="markdownIt-Anchor" href="#datasets"></a> Datasets</h3>
<p><img src="/img/srfecnn_datasets.png" alt="Datasets Used"></p>
<h2 id="ref"><a class="markdownIt-Anchor" href="#ref"></a> Ref</h2>
<p><a href="https://arxiv.org/abs/1706.06247" target="_blank" rel="external">https://arxiv.org/abs/1706.06247</a></p>
</content>
<summary type="html">
Paper notes: Low Resolution Face Recognition Using a Two-Branch Deep Convolutional Neural Network Architecture
</summary>
<category term="ML & DL" scheme="http://conglang.github.io/categories/ML-DL/"/>
<category term="Deep Learning" scheme="http://conglang.github.io/tags/Deep-Learning/"/>
<category term="Essay" scheme="http://conglang.github.io/tags/Essay/"/>
<category term="Face Recognition" scheme="http://conglang.github.io/tags/Face-Recognition/"/>
</entry>
<entry>
<title>Numerical Computation</title>
<link href="http://conglang.github.io/2018/08/05/numerical-computation/"/>
<id>http://conglang.github.io/2018/08/05/numerical-computation/</id>
<published>2018-08-05T12:51:00.000Z</published>
<updated>2018-08-05T16:01:48.000Z</updated>
<content type="html"><p>Notes on Chapter 4 of the Deep Learning book.</p>
<h2 id="深度学习-第4章-数值计算"><a class="markdownIt-Anchor" href="#深度学习-第4章-数值计算"></a> Deep Learning, Chapter 4: Numerical Computation</h2>
<p>Machine learning algorithms usually require a great deal of numerical computation. This typically means algorithms that solve mathematical problems by iteratively updating an estimate of the solution, rather than analytically deriving a formula for the correct answer. Common operations include optimization (finding the parameter value that minimizes or maximizes a function) and solving systems of linear equations. Even just evaluating a function of real numbers is difficult on a digital computer, because real numbers cannot be represented exactly in a finite amount of memory.</p>
<h3 id="上溢和下溢"><a class="markdownIt-Anchor" href="#上溢和下溢"></a> Overflow and Underflow</h3>
<p>Accumulated rounding error can cause large mistakes. Developers of the underlying libraries should keep numerical issues in mind when implementing deep learning algorithms.</p>
<ul>
<li>Underflow<br>
Underflow occurs when numbers near zero are rounded to zero. Many functions behave qualitatively differently when their argument is exactly zero rather than a small positive number; for example, we usually want to avoid dividing by zero or taking the logarithm of zero, since the next operation would produce a not-a-number value.</li>
<li>Overflow<br>
Overflow occurs when numbers of large magnitude are approximated as \(\infty\) or \(-\infty\). Further arithmetic usually turns these infinities into not-a-number values.</li>
</ul>
<p>Example: softmax<br>
It is commonly used to predict the probabilities associated with a multinoulli distribution.</p>
<p>\[ \mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_{j=1}^n \exp(x_j)} \]</p>
<p>Suppose every \(x_i\) equals some constant \(c\). Analytically, every output should then be \(\frac{1}{n}\). But if \(c\) is a large positive number, \(\exp(c)\) overflows and the expression is undefined; if \(c\) is a very negative number, \(\exp(c)\) underflows, the denominator becomes zero, and the expression is again undefined.<br>
The fix is to compute \(\mathrm{softmax}(z)\) with \(z = x - \max_i x_i\). Adding or subtracting a scalar from the input vector does not change the value of softmax. Subtracting \(\max_i x_i\) makes the largest argument to exp equal to 0, which rules out overflow; likewise, at least one term in the denominator equals 1, which rules out underflow there.</p>
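<p>A minimal Python sketch of this stabilized computation (the function name is illustrative):</p>

```python
import math

def stable_softmax(xs):
    """Softmax evaluated on z = x - max(x): the largest exponent is
    exp(0) = 1, so exp cannot overflow, and the denominator is at
    least 1, so underflow cannot make it zero."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

<p>With an input like <code>[1000.0, 1000.0]</code>, the naive form would overflow in <code>exp</code>, while this version returns 0.5 for both entries.</p>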
<h3 id="病态条件"><a class="markdownIt-Anchor" href="#病态条件"></a> Poor Conditioning</h3>
<p>The condition number measures how rapidly a function changes in response to small changes in its input. Functions that change rapidly when their input is perturbed slightly are problematic for scientific computation, because rounding error in the input can cause large changes in the output.<br>
Consider the function \(f(x) = A^{-1}x\). When \(A \in \mathbb{R}^{n \times n}\) has an eigenvalue decomposition, its condition number is</p>
<p>\[ \max_{i,j} \left| \frac{\lambda_i}{\lambda_j} \right| \]</p>
<p>This is the ratio of the magnitudes of the largest and smallest eigenvalues. When this number is large, matrix inversion is particularly sensitive to error in the input.</p>
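<p>A tiny Python illustration of the formula. The eigenvalues are supplied directly (an assumption made to keep the example dependency-free; in practice they would come from an eigendecomposition routine):</p>

```python
def condition_number(eigenvalues):
    """max_{i,j} |lambda_i / lambda_j|: the ratio of the largest to
    the smallest eigenvalue magnitude (assumes none are zero)."""
    mags = [abs(v) for v in eigenvalues]
    return max(mags) / min(mags)

# A matrix with eigenvalues 100 and 0.01 has condition number 1e4:
# inverting it amplifies relative input error by roughly that factor.
```
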
<h3 id="基于梯度的优化方法"><a class="markdownIt-Anchor" href="#基于梯度的优化方法"></a> Gradient-Based Optimization</h3>
<p>Most deep learning algorithms involve some form of optimization: the task of changing x to minimize or maximize some function f(x). By convention, most optimization problems are phrased as minimizing f(x).<br>
The value of x that minimizes or maximizes the function is usually marked with a superscript <code>*</code>, as in \(x^* = \arg\min f(x)\).</p>
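<p>A one-dimensional sketch of such a minimization, using the plain gradient-descent update \(x' = x - \epsilon f'(x)\). The step size and iteration count here are arbitrary choices for illustration:</p>

```python
def gradient_descent(grad, x0, lr=0.1, steps=200):
    """Repeatedly step against the derivative: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# f(x) = (x - 3)^2 has derivative 2 * (x - 3) and minimizer x* = 3;
# starting from x0 = 0, the iterates converge toward 3.
```
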
<p>梯度下降 gradient descent 就是将 x 往导数的反方向移动一小步来减少 f(x)。<br>
In multiple dimensions we use partial derivatives. The gradient generalizes the derivative to functions of a vector: the gradient of f is the vector containing all of its partial derivatives, denoted <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi mathvariant="normal">▽</mi><mi>x</mi></msub><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">\triangledown_x f(x)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathrm">▽</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit">x</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathit" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span></span></span></span>. The i-th element of the gradient is the partial derivative of f with respect to <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>x</mi><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">x_i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit">i</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span>.<br>
Gradient descent proposes the new point <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msup><mi>x</mi><mrow><mi mathvariant="normal">′</mi></mrow></msup><mo>=</mo><mi>x</mi><mo>−</mo><mi>ϵ</mi><msub><mi mathvariant="normal">▽</mi><mi>x</mi></msub><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">x&#x27; = x - \epsilon \triangledown_x f(x)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.751892em;"></span><span class="strut bottom" style="height:1.001892em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord mathrm">′</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mrel">=</span><span class="mord mathit">x</span><span class="mbin">−</span><span class="mord mathit">ϵ</span><span class="mord"><span class="mord mathrm">▽</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit">x</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathit" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span></span></span></span>, where <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>ϵ</mi></mrow><annotation encoding="application/x-tex">\epsilon</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">ϵ</span></span></span></span> is the learning rate.<br>
<img src="/img/numeric_comp_gradient_descent_example.png" alt="Gradient descent using the derivative"></p>
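<p>As an illustration of the update rule above, a minimal sketch in Python (the function f(x) = x² and all names here are my own, not from the text):</p>

```python
# Minimal gradient descent on f(x) = x^2, whose derivative is f'(x) = 2x.
# f, grad_f, and epsilon are illustrative names, not from the original notes.

def grad_f(x):
    return 2.0 * x  # derivative of x^2

x = 3.0          # starting point
epsilon = 0.1    # learning rate
for _ in range(100):
    x = x - epsilon * grad_f(x)  # x' = x - eps * grad f(x)

print(abs(x) < 1e-6)  # x has converged very close to the minimizer 0
```

<p>Each step shrinks x by a constant factor (1 - 2ε), so the iterate converges geometrically to the minimum at 0.</p>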
<p>Types of critical points<br>
<img src="/img/numeric_comp_critical_point.png" alt="Types of critical points"></p>
<p>Approximate minimization<br>
<img src="/img/numeric_comp_minimum_approx.png" alt="Approximate minimization"></p>
<h4 id="梯度之上jacobian-和-hessian-矩阵"><a class="markdownIt-Anchor" href="#梯度之上jacobian-和-hessian-矩阵"></a> Beyond the Gradient: Jacobian and Hessian Matrices</h4>
<p>For a function whose input and output are both vectors, collecting all of its partial derivatives into one matrix gives the <strong>Jacobian matrix</strong>. Specifically, if we have a function <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>f</mi><mo>:</mo><msup><mi>R</mi><mi>m</mi></msup><mo>→</mo><msup><mi>R</mi><mi>n</mi></msup></mrow><annotation encoding="application/x-tex">f: R^m \rightarrow R^n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.8888799999999999em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.10764em;">f</span><span class="mrel">:</span><span class="mord"><span class="mord mathit" style="margin-right:0.00773em;">R</span><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord mathit">m</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mrel">→</span><span class="mord"><span class="mord mathit" style="margin-right:0.00773em;">R</span><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord mathit">n</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span>, the Jacobian matrix <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>J</mi><mo>∈</mo><msup><mi>R</mi><mrow><mi>n</mi><mo>×</mo><mi>m</mi></mrow></msup></mrow><annotation encoding="application/x-tex">J \in R^{n \times m}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.771331em;"></span><span class="strut bottom" style="height:0.810431em;vertical-align:-0.0391em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.09618em;">J</span><span class="mrel">∈</span><span class="mord"><span class="mord mathit" style="margin-right:0.00773em;">R</span><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mord mathit">n</span><span class="mbin">×</span><span class="mord mathit">m</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span> of <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>f</mi></mrow><annotation encoding="application/x-tex">f</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.8888799999999999em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.10764em;">f</span></span></span></span> is defined as:</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>J</mi><mrow><mi>i</mi><mo separator="true">,</mo><mi>j</mi></mrow></msub><mo>=</mo><mfrac><mrow><mi mathvariant="normal">∂</mi></mrow><mrow><mi mathvariant="normal">∂</mi><msub><mi>x</mi><mi>j</mi></msub></mrow></mfrac><mi>f</mi><mo>(</mo><mi>x</mi><msub><mo>)</mo><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">J_{i,j} = \frac{\partial}{\partial x_j} f(x)_i
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.37144em;"></span><span class="strut bottom" style="height:2.343548em;vertical-align:-0.972108em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.09618em;">J</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.09618em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.6859999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathrm" style="margin-right:0.05556em;">∂</span><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span 
style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathrm" style="margin-right:0.05556em;">∂</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mord mathit" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose"><span class="mclose">)</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit">i</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span></span></p>
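<p>To make the definition concrete, a small sketch (the function f(x, y) = (xy, x + y) and all names are my own): entry J[i][j] is the partial derivative of output i with respect to input j, which can be verified by finite differences:</p>

```python
# Jacobian of f: R^2 -> R^2, f(x, y) = (x*y, x + y).
# Analytically J = [[y, x], [1, 1]], since J[i][j] = d f_i / d x_j.

def f(x, y):
    return (x * y, x + y)

def jacobian(x, y):
    return [[y, x],
            [1.0, 1.0]]

# Finite-difference check of the first column at (x, y) = (2, 3).
h = 1e-6
x, y = 2.0, 3.0
num_col0 = [(a - b) / h for a, b in zip(f(x + h, y), f(x, y))]  # d/dx column
J = jacobian(x, y)
print(abs(num_col0[0] - J[0][0]) < 1e-4)  # numeric slope matches J[0][0] = y
```
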
<p>The second derivative tells us how the first derivative changes as the input varies. It indicates whether a gradient-descent step based only on gradient information will produce as large an improvement as we expect. The second derivative is a measure of curvature.</p>
<ul>
<li>Second derivative zero: there is no curvature; the function is a flat line whose value can be predicted from the gradient alone. If we take a descent step of size <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>ϵ</mi></mrow><annotation encoding="application/x-tex">\epsilon</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">ϵ</span></span></span></span> along the negative gradient, then when the gradient is 1 the cost function decreases by exactly <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>ϵ</mi></mrow><annotation encoding="application/x-tex">\epsilon</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">ϵ</span></span></span></span>.</li>
<li>Second derivative negative: the curve bends downward (is concave), so the cost function decreases by more than <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>ϵ</mi></mrow><annotation encoding="application/x-tex">\epsilon</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">ϵ</span></span></span></span>.</li>
<li>Second derivative positive: the curve bends upward (is convex), so the cost function decreases by less than <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>ϵ</mi></mrow><annotation encoding="application/x-tex">\epsilon</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">ϵ</span></span></span></span>.</li>
</ul>
<p><img src="/img/numeric_comp_second_derivative.png" alt="The second derivative determines a function's curvature"></p>
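<p>The three cases can be checked numerically; a sketch with three functions of my own choosing, each with derivative 1 at x = 0, comparing the actual decrease against the first-order prediction ε:</p>

```python
# Three 1-D functions with derivative 1 at x = 0 and second derivative
# 0, +1, and -1 respectively (illustrative choices, not from the text).
eps = 0.1

f_flat    = lambda x: x                 # f'' = 0
f_convex  = lambda x: x + 0.5 * x**2    # f'' = +1
f_concave = lambda x: x - 0.5 * x**2    # f'' = -1

x0 = 0.0
step = x0 - eps * 1.0  # gradient is 1 at x0, so step = -eps

dec_flat    = f_flat(x0)    - f_flat(step)
dec_convex  = f_convex(x0)  - f_convex(step)
dec_concave = f_concave(x0) - f_concave(step)

print(abs(dec_flat - eps) < 1e-12)  # no curvature: decrease equals eps
print(dec_convex < eps)             # positive curvature: less than eps
print(dec_concave > eps)            # negative curvature: more than eps
```
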
<p>When the function has a multi-dimensional input, there are many second derivatives; collected into a matrix, they form the <strong>Hessian matrix</strong>, defined as:</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>H</mi><mo>(</mo><mi>f</mi><mo>)</mo><mo>(</mo><mi>x</mi><msub><mo>)</mo><mrow><mi>i</mi><mo separator="true">,</mo><mi>j</mi></mrow></msub><mo>=</mo><mfrac><mrow><msup><mi mathvariant="normal">∂</mi><mn>2</mn></msup></mrow><mrow><mi mathvariant="normal">∂</mi><msub><mi>x</mi><mi>i</mi></msub><mi mathvariant="normal">∂</mi><msub><mi>x</mi><mi>j</mi></msub></mrow></mfrac><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">H(f)(x)_{i,j} = \frac{\partial^2}{\partial x_i \partial x_j} f(x)
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.491108em;"></span><span class="strut bottom" style="height:2.463216em;vertical-align:-0.972108em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.08125em;">H</span><span class="mopen">(</span><span class="mord mathit" style="margin-right:0.10764em;">f</span><span class="mclose">)</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose"><span class="mclose">)</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">i</span><span class="mpunct">,</span><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.6859999999999999em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathrm" style="margin-right:0.05556em;">∂</span><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit">i</span></span></span><span 
class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathrm" style="margin-right:0.05556em;">∂</span><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit" style="margin-right:0.05724em;">j</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathrm" style="margin-right:0.05556em;">∂</span><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord mathrm">2</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mord mathit" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord 
mathit">x</span><span class="mclose">)</span></span></span></span></span></p>
<p>The Hessian is equivalent to the Jacobian matrix of the gradient.</p>
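<p>A sketch of this identity with an example of my own, f(x, y) = x²y: numerically differentiating each component of the gradient (i.e. taking the Jacobian of the gradient) reproduces the analytic Hessian:</p>

```python
# Hessian of f(x, y) = x^2 * y as the Jacobian of its gradient.
# grad f = (2xy, x^2); analytic Hessian H = [[2y, 2x], [2x, 0]].

def grad(x, y):
    return (2 * x * y, x * x)

def hessian_numeric(x, y, h=1e-6):
    g0x, g0y = grad(x, y)
    g1x, g1y = grad(x + h, y)   # perturb x
    g2x, g2y = grad(x, y + h)   # perturb y
    return [[(g1x - g0x) / h, (g2x - g0x) / h],
            [(g1y - g0y) / h, (g2y - g0y) / h]]

H = hessian_numeric(2.0, 3.0)   # analytic value: [[6, 4], [4, 0]]
print(abs(H[0][0] - 6) < 1e-3 and abs(H[0][1] - 4) < 1e-3)
```

<p>Note that H[0][1] and H[1][0] agree, reflecting the symmetry of the Hessian when the second partial derivatives are continuous.</p>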
<p>The (directional) second derivative tells us how well a gradient-descent step can be expected to perform. We make a second-order Taylor series approximation of the function <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">f(x)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span></span></span></span> around the current point <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msup><mi>x</mi><mrow><mo>(</mo><mn>0</mn><mo>)</mo></mrow></msup></mrow><annotation encoding="application/x-tex">x^{(0)}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.8879999999999999em;"></span><span class="strut bottom" style="height:0.8879999999999999em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">0</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span>:</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>≈</mo><mi>f</mi><mo>(</mo><msup><mi>x</mi><mrow><mo>(</mo><mn>0</mn><mo>)</mo></mrow></msup><mo>)</mo><mo>+</mo><mo>(</mo><mi>x</mi><mo>−</mo><msup><mi>x</mi><mrow><mo>(</mo><mn>0</mn><mo>)</mo></mrow></msup><msup><mo>)</mo><mi>T</mi></msup><mi>g</mi><mo>+</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac><mo>(</mo><mi>x</mi><mo>−</mo><msup><mi>x</mi><mrow><mo>(</mo><mn>0</mn><mo>)</mo></mrow></msup><msup><mo>)</mo><mi>T</mi></msup><mi>H</mi><mo>(</mo><mi>x</mi><mo>−</mo><msup><mi>x</mi><mrow><mo>(</mo><mn>0</mn><mo>)</mo></mrow></msup><mo>)</mo></mrow><annotation encoding="application/x-tex">f(x) \approx f(x^{(0)}) + (x - x^{(0)})^T g + \frac{1}{2}(x - x^{(0)})^T H (x - x^{(0)})
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.32144em;"></span><span class="strut bottom" style="height:2.00744em;vertical-align:-0.686em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span><span class="mrel">≈</span><span class="mord mathit" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">0</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose">)</span><span class="mbin">+</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mbin">−</span><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">0</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span 
style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">T</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathit" style="margin-right:0.03588em;">g</span><span class="mbin">+</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathrm">2</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathrm">1</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mbin">−</span><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span 
class="mopen">(</span><span class="mord mathrm">0</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose"><span class="mclose">)</span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">T</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathit" style="margin-right:0.08125em;">H</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mbin">−</span><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">0</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose">)</span></span></span></span></span></p>
<p>where <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>g</mi></mrow><annotation encoding="application/x-tex">g</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.03588em;">g</span></span></span></span> is the gradient and <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>H</mi></mrow><annotation encoding="application/x-tex">H</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.08125em;">H</span></span></span></span> is the Hessian at the point <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msup><mi>x</mi><mrow><mo>(</mo><mn>0</mn><mo>)</mo></mrow></msup></mrow><annotation encoding="application/x-tex">x^{(0)}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.8879999999999999em;"></span><span class="strut bottom" style="height:0.8879999999999999em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">0</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span></span></span></span>. If we use learning rate <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>ϵ</mi></mrow><annotation encoding="application/x-tex">\epsilon</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">ϵ</span></span></span></span>, the new point <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>x</mi></mrow><annotation encoding="application/x-tex">x</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">x</span></span></span></span> will be <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msup><mi>x</mi><mrow><mo>(</mo><mn>0</mn><mo>)</mo></mrow></msup><mo>−</mo><mi>ϵ</mi><mi>g</mi></mrow><annotation encoding="application/x-tex">x^{(0)} - \epsilon g</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.8879999999999999em;"></span><span class="strut bottom" style="height:1.0824399999999998em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">0</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mbin">−</span><span class="mord mathit">ϵ</span><span class="mord mathit" style="margin-right:0.03588em;">g</span></span></span></span>. Substituting this into the approximation above, we obtain:</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>f</mi><mo>(</mo><msup><mi>x</mi><mrow><mo>(</mo><mn>0</mn><mo>)</mo></mrow></msup><mo>−</mo><mi>ϵ</mi><mi>g</mi><mo>)</mo><mo>≈</mo><mi>f</mi><mo>(</mo><msup><mi>x</mi><mrow><mo>(</mo><mn>0</mn><mo>)</mo></mrow></msup><mo>)</mo><mo>−</mo><mi>ϵ</mi><msup><mi>g</mi><mi>T</mi></msup><mi>g</mi><mo>+</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac><msup><mi>ϵ</mi><mn>2</mn></msup><msup><mi>g</mi><mi>T</mi></msup><mi>H</mi><mi>g</mi></mrow><annotation encoding="application/x-tex">f(x^{(0)} - \epsilon g) \approx f(x^{(0)}) - \epsilon g^T g + \frac{1}{2} \epsilon^2 g^T H g
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.32144em;"></span><span class="strut bottom" style="height:2.00744em;vertical-align:-0.686em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">0</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mbin">−</span><span class="mord mathit">ϵ</span><span class="mord mathit" style="margin-right:0.03588em;">g</span><span class="mclose">)</span><span class="mrel">≈</span><span class="mord mathit" style="margin-right:0.10764em;">f</span><span class="mopen">(</span><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">0</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose">)</span><span class="mbin">−</span><span class="mord mathit">ϵ</span><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">g</span><span class="vlist"><span 
style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">T</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathit" style="margin-right:0.03588em;">g</span><span class="mbin">+</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathrm">2</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathrm">1</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mord"><span class="mord mathit">ϵ</span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord mathrm">2</span></span></span><span 
class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">g</span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">T</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathit" style="margin-right:0.08125em;">H</span><span class="mord mathit" style="margin-right:0.03588em;">g</span></span></span></span></span></p>
<p>This expression has three terms: the original value of the function, the expected improvement due to the slope of the function, and the correction due to the curvature of the function. When the last term is too large, a gradient descent step can actually move uphill.<br>
When <span class="katex"><span class="katex-mathml"><math><semantics><mrow><msup><mi>g</mi><mi>T</mi></msup><mi>H</mi><mi>g</mi></mrow><annotation encoding="application/x-tex">g^T H g</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.8413309999999999em;"></span><span class="strut bottom" style="height:1.035771em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">g</span><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">T</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathit" style="margin-right:0.08125em;">H</span><span class="mord mathit" style="margin-right:0.03588em;">g</span></span></span></span> is positive, minimizing the approximate Taylor series with respect to the step size yields the optimal step size:</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msup><mi>ϵ</mi><mo>∗</mo></msup><mo>=</mo><mfrac><mrow><msup><mi>g</mi><mi>T</mi></msup><mi>g</mi></mrow><mrow><msup><mi>g</mi><mi>T</mi></msup><mi>H</mi><mi>g</mi></mrow></mfrac></mrow><annotation encoding="application/x-tex">\epsilon^* = \frac{g^Tg}{g^T H g}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.5183309999999999em;"></span><span class="strut bottom" style="height:2.398771em;vertical-align:-0.8804400000000001em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit">ϵ</span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord">∗</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.6860000000000002em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">g</span><span class="vlist"><span style="top:-0.289em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit" style="margin-right:0.13889em;">T</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathit" style="margin-right:0.08125em;">H</span><span class="mord mathit" style="margin-right:0.03588em;">g</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span 
class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.03588em;">g</span><span class="vlist"><span style="top:-0.363em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">T</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathit" style="margin-right:0.03588em;">g</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span></span></span></p>
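<p>The optimal step size can be checked numerically. A minimal NumPy sketch for a quadratic objective (the matrix <code>A</code> and the point <code>x0</code> below are illustrative choices, not from the text):</p>

```python
import numpy as np

# Quadratic test function f(x) = 0.5 * x^T A x, so gradient = A x and Hessian = A.
A = np.array([[3.0, 0.0],
              [0.0, 1.0]])
f = lambda x: 0.5 * x @ A @ x

x0 = np.array([1.0, 2.0])
g = A @ x0          # gradient at x0
H = A               # Hessian (constant for a quadratic)

# Optimal step size from the second-order Taylor expansion: g^T g / (g^T H g).
eps_star = (g @ g) / (g @ H @ g)

# One gradient-descent step with the optimal step size.
x1 = x0 - eps_star * g
assert f(x1) < f(x0)
```

<p>For a quadratic, the second-order Taylor expansion is exact, so this step size is truly optimal along the gradient direction.</p>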
<p>The eigenvalues of the Hessian thus determine the scale of the learning rate.</p>
<p>The second derivative can also be used to determine whether a critical point is a local maximum, a local minimum, or a saddle point. At a critical point, f’(x) = 0.</p>
<ul>
<li>
<p>f’’(x) &gt; 0, or the Hessian is positive definite (all eigenvalues are positive)<br>
f’(x) increases as we move to the right and decreases as we move to the left. Local minimum.</p>
</li>
<li>
<p>f’’(x) &lt; 0, or the Hessian is negative definite<br>
Local maximum.</p>
</li>
<li>
<p>f’’(x) = 0<br>
Inconclusive. x may be a saddle point or part of a flat region.</p>
</li>
<li>
<p>First-order optimization algorithms<br>
Optimization algorithms that use only gradient information, such as gradient descent.</p>
</li>
<li>
<p>Second-order optimization algorithms<br>
Optimization algorithms that use the Hessian matrix, such as Newton’s method.</p>
</li>
</ul>
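<p>The eigenvalue test above can be sketched as follows (the <code>classify_critical_point</code> helper is hypothetical, written only for illustration):</p>

```python
import numpy as np

def classify_critical_point(H, tol=1e-8):
    """Classify a critical point from the eigenvalues of the Hessian H."""
    eig = np.linalg.eigvalsh(H)           # H is symmetric, so eigenvalues are real
    if np.all(eig > tol):
        return "local minimum"            # positive definite
    if np.all(eig < -tol):
        return "local maximum"            # negative definite
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point"             # mixed signs
    return "inconclusive"                 # some eigenvalue is (near) zero

# f(x, y) = x^2 - y^2 has a saddle at the origin; its Hessian is diag(2, -2).
H_saddle = np.array([[2.0, 0.0], [0.0, -2.0]])
print(classify_critical_point(H_saddle))   # saddle point
```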
<p>Gradient descent cannot exploit the curvature information contained in the Hessian matrix.<br>
<img src="/img/numeric_comp_gradient_descent_ignore_hessian.png" alt="Gradient descent cannot exploit the curvature information contained in the Hessian matrix"></p>
<p>Convex optimization algorithms apply only to convex functions, i.e., functions whose Hessian is positive semidefinite everywhere. Such functions behave well because they have no saddle points and all of their local minima are necessarily global minima.<br>
However, most problems in deep learning are hard to express in convex form; convex optimization is used only as a subroutine within some deep learning algorithms.</p>
<h3 id="约束优化"><a class="markdownIt-Anchor" href="#约束优化"></a> Constrained Optimization</h3>
<p>Optimization over a restricted domain is called constrained optimization.<br>
A constrained problem can be converted into an unconstrained one whose solution yields the solution of the original problem; see <a href="/2018/03/15/ml-lagrange">Lagrange multipliers</a>.</p>
<p>Ref:<br>
[1] Deep Learning</p>
</content>
<summary type="html">
Numerical computation.
</summary>
<category term="Math" scheme="http://conglang.github.io/categories/Math/"/>
<category term="Math" scheme="http://conglang.github.io/tags/Math/"/>
</entry>
<entry>
<title>Calculus Basics</title>
<link href="http://conglang.github.io/2018/08/05/calculus-intuition/"/>
<id>http://conglang.github.io/2018/08/05/calculus-intuition/</id>
<published>2018-08-05T12:47:02.000Z</published>
<updated>2019-04-23T16:54:35.315Z</updated>
<content type="html"><p>todo<br>
For now this is only a cheat sheet collecting common formulas and concepts.</p>
<h2 id="calculus"><a class="markdownIt-Anchor" href="#calculus"></a> Calculus</h2>
<h3 id="目录"><a class="markdownIt-Anchor" href="#目录"></a> Contents</h3>
<ul>
<li>Trigonometric Formulas</li>
<li>Differentiation Formulas</li>
<li>Integration Formulas</li>
<li>Formulas and Theorems</li>
</ul>
<h3 id="文件"><a class="markdownIt-Anchor" href="#文件"></a> Files</h3>
<p><a href="/img/Final_Notes_for_AB_and_BC.pdf">Final Notes for AB and BC</a></p>
</content>
<summary type="html">
Calculus intuition.
</summary>
<category term="Math" scheme="http://conglang.github.io/categories/Math/"/>
<category term="Math" scheme="http://conglang.github.io/tags/Math/"/>
<category term="Calculus" scheme="http://conglang.github.io/tags/Calculus/"/>
</entry>
<entry>
<title>Probability and Statistics Intuition</title>
<link href="http://conglang.github.io/2018/08/04/probability-statistics-intuition/"/>
<id>http://conglang.github.io/2018/08/04/probability-statistics-intuition/</id>
<published>2018-08-04T11:36:29.000Z</published>
<updated>2019-04-24T13:09:30.426Z</updated>
<content type="html"><p>Notes compiled from cs229, Microsoft’s Data Science Orientation course on edX, Chapter 3 of the Deep Learning book, and some material from Introduction to Probability.</p>
<h2 id="cs229-review-of-probability-theory"><a class="markdownIt-Anchor" href="#cs229-review-of-probability-theory"></a> cs229 - Review of Probability Theory</h2>
<p>Key topics in probability theory.</p>
<h3 id="目录"><a class="markdownIt-Anchor" href="#目录"></a> Contents</h3>
<ol>
<li>Elements of probability<br>
a. Conditional probability and independence</li>
<li>Random variables<br>
a. Cumulative distribution functions<br>
b. Probability mass functions<br>
c. Probability density functions<br>
d. Expectation<br>
e. Variance<br>
f. Some common random variables</li>
<li>Two random variables<br>
a. Joint and marginal distributions<br>
b. Joint and marginal probability mass functions<br>
c. Joint and marginal probability density functions<br>
d. Conditional distributions<br>
e. Bayes’s rule<br>
f. Independence<br>
g. Expectation and covariance</li>
<li>Multiple random variables<br>
a. Basic properties<br>
b. Random vectors<br>
c. The multivariate Gaussian distribution</li>
</ol>
<h3 id="文件"><a class="markdownIt-Anchor" href="#文件"></a> Files</h3>
<p><a href="/img/cs229-prob.pdf">Review of Probability Theory</a></p>
<h2 id="microsoft-statistical-insights"><a class="markdownIt-Anchor" href="#microsoft-statistical-insights"></a> Microsoft - Statistical Insights</h2>
<p>Key topics in statistics.</p>
<h3 id="目录-2"><a class="markdownIt-Anchor" href="#目录-2"></a> Contents</h3>
<ol>
<li>What is a Variable?<br>
Why is this important?</li>
<li>Population vs. Sample<br>
Why is this important?</li>
<li>Measures of Central Tendency</li>
<li>Measures of Variability<br>
Why is this important?</li>
<li>Hypothesis Testing<br>
Why Don’t We “Accept” the Null Hypothesis?</li>
<li>Measures of Association: Correlation Coefficients<br>
How to Interpret a Correlation Coefficient<br>
How to Calculate a Correlation Coefficient<br>
Rules of Thumb for Correlations</li>
<li>Comparative Measures: One Sample t-Test<br>
How to Calculate a One Sample t-Test Statistic</li>
<li>Comparative Measures: Two Sample t-Test<br>
How to Calculate a Two Sample t-Test Statistic</li>
<li>Comparative Measures: Paired Sample t-Test<br>
How to Calculate a Paired Sample t-Test Statistic</li>
<li>Comparative Measures: Analysis of Variance (ANOVA)<br>
Why You Shouldn’t Run Multiple t-Tests<br>
How to Calculate a One-Way ANOVA<br>
How to Calculate a Two-Way ANOVA</li>
<li>Predictive Measures: Linear Regression<br>
How to Calculate a Regression</li>
</ol>
<h3 id="文件-2"><a class="markdownIt-Anchor" href="#文件-2"></a> Files</h3>
<p><a href="/img/Data_Science_101_Statistics_Overview.pdf">Statistical Insights</a></p>
<h2 id="深度学习-第3章-概率与信息论"><a class="markdownIt-Anchor" href="#深度学习-第3章-概率与信息论"></a> 深度学习 - 第3章 概率与信息论</h2>
<p>The probability theory needed for deep learning.</p>
<h3 id="为什么要使用概率"><a class="markdownIt-Anchor" href="#为什么要使用概率"></a> Why Use Probability</h3>
<p>Because machine learning must usually deal with uncertain quantities, and sometimes with stochastic (non-deterministic) quantities.<br>
Uncertainty has three possible sources:</p>
<ul>
<li>Inherent stochasticity in the system being modeled.</li>
<li>Incomplete observability.</li>
<li>Incomplete modeling.</li>
</ul>
<p>Probability tied directly to the frequency with which events occur is frequentist probability; probability that expresses a degree of certainty is Bayesian probability.</p>
<h3 id="随机变量-概率分布-边缘概率-条件概率-条件概率的链式法则-独立性和条件独立性-期望方差和协方差"><a class="markdownIt-Anchor" href="#随机变量-概率分布-边缘概率-条件概率-条件概率的链式法则-独立性和条件独立性-期望方差和协方差"></a> Random Variables, Probability Distributions, Marginal Probability, Conditional Probability, the Chain Rule of Conditional Probability, Independence and Conditional Independence, Expectation, Variance, and Covariance</h3>
<p>A random variable is just a description of the possible states; it must be paired with a probability distribution that specifies how likely each state is.<br>
The covariance, in some sense, gives the strength of the linear dependence between two variables as well as their scales. The diagonal elements of the covariance matrix are the variances.</p>
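<p>A quick numerical illustration of these covariance facts (the simulated data below is an arbitrary example):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
# Two linearly related variables: y = 2x + small noise.
x = rng.normal(size=1000)
y = 2 * x + rng.normal(scale=0.1, size=1000)

C = np.cov(np.stack([x, y]))   # 2x2 covariance matrix

# The diagonal entries are exactly the (sample) variances of x and y.
assert np.allclose(C[0, 0], np.var(x, ddof=1))
assert np.allclose(C[1, 1], np.var(y, ddof=1))
# The off-diagonal entry is positive because x and y increase together.
assert C[0, 1] > 0
```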
<h3 id="常用概率分布"><a class="markdownIt-Anchor" href="#常用概率分布"></a> Common Probability Distributions</h3>
<p>When we lack prior knowledge about what form a distribution over the reals should take, the normal distribution is a good default choice, for two reasons:</p>
<ul>
<li>Many distributions we wish to model are truly close to normal. The central limit theorem shows that the sum of many independent random variables is approximately normally distributed. In practice, many complicated systems can therefore be modeled successfully as normally distributed noise, even when the system can be decomposed into more structured parts.</li>
<li>Among all possible probability distributions with the same variance, the normal distribution encodes the maximum amount of uncertainty over the reals. We can thus think of it as the distribution that injects the least amount of prior knowledge into a model.</li>
</ul>
<p>A Gaussian mixture model is a universal approximator of densities: any smooth probability density can be approximated to arbitrary precision by a Gaussian mixture model with enough components.</p>
<h3 id="常用函数的有用性质-贝叶斯规则"><a class="markdownIt-Anchor" href="#常用函数的有用性质-贝叶斯规则"></a> Useful Properties of Common Functions, Bayes’ Rule</h3>
<blockquote>
<p>logistic sigmoid</p>
</blockquote>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>σ</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>=</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mn>1</mn><mo>+</mo><mi>e</mi><mi>x</mi><mi>p</mi><mo>(</mo><mo>−</mo><mi>x</mi><mo>)</mo></mrow></mfrac></mrow><annotation encoding="application/x-tex">\sigma(x) = \frac{1}{1 + exp(-x)}
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.32144em;"></span><span class="strut bottom" style="height:2.25744em;vertical-align:-0.936em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.03588em;">σ</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathrm">1</span><span class="mbin">+</span><span class="mord mathit">e</span><span class="mord mathit">x</span><span class="mord mathit">p</span><span class="mopen">(</span><span class="mord">−</span><span class="mord mathit">x</span><span class="mclose">)</span></span></span></span><span style="top:-0.2300000000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathrm">1</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span></span></span></span></span></p>
<p>Commonly used to produce the parameter <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>ϕ</mi></mrow><annotation encoding="application/x-tex">\phi</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.8888799999999999em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit">ϕ</span></span></span></span> of a Bernoulli distribution, because its range is (0, 1), which lies within the valid range of values for that parameter.</p>
<blockquote>
<p>softplus</p>
</blockquote>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>ζ</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>=</mo><mi>l</mi><mi>o</mi><mi>g</mi><mo>(</mo><mn>1</mn><mo>+</mo><mi>e</mi><mi>x</mi><mi>p</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>)</mo></mrow><annotation encoding="application/x-tex">\zeta(x) = log(1 + exp(x))
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.07378em;">ζ</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span><span class="mrel">=</span><span class="mord mathit" style="margin-right:0.01968em;">l</span><span class="mord mathit">o</span><span class="mord mathit" style="margin-right:0.03588em;">g</span><span class="mopen">(</span><span class="mord mathrm">1</span><span class="mbin">+</span><span class="mord mathit">e</span><span class="mord mathit">x</span><span class="mord mathit">p</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span><span class="mclose">)</span></span></span></span></span></p>
<p>Used to produce the <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>β</mi></mrow><annotation encoding="application/x-tex">\beta</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.8888799999999999em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.05278em;">β</span></span></span></span> and <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>α</mi></mrow><annotation encoding="application/x-tex">\alpha</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.0037em;">α</span></span></span></span> parameters of a normal distribution, because its range is <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mo>(</mo><mn>0</mn><mo separator="true">,</mo><mi mathvariant="normal">∞</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">(0, \infty)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mopen">(</span><span class="mord mathrm">0</span><span class="mpunct">,</span><span class="mord mathrm">∞</span><span class="mclose">)</span></span></span></span>.<br>
<img src="/img/stat_softplus.png" alt="the softplus function"></p>
<blockquote>
<p>Some useful properties</p>
</blockquote>
<p><img src="/img/stat_ss_property.png" alt="Some useful properties"></p>
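<p>The definitions of the two functions and a couple of the identities among these properties can be checked numerically; a minimal sketch:</p>

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    """Softplus: log(1 + exp(x)); log1p is slightly more accurate."""
    return np.log1p(np.exp(x))

x = np.linspace(-10, 10, 101)

# zeta(x) - zeta(-x) = x
assert np.allclose(softplus(x) - softplus(-x), x)
# 1 - sigma(x) = sigma(-x)
assert np.allclose(1 - sigmoid(x), sigmoid(-x))
# log sigma(x) = -zeta(-x)
assert np.allclose(np.log(sigmoid(x)), -softplus(-x))
```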
<h3 id="信息论"><a class="markdownIt-Anchor" href="#信息论"></a> Information Theory</h3>
<p>Information theory is a branch of applied mathematics concerned with quantifying how much information a signal contains. The book mainly uses a few key ideas from information theory to characterize probability distributions or to quantify the similarity between them.<br>
The basic intuition of information theory is that learning that an unlikely event has occurred is more informative than learning that a very likely event has occurred.</p>
<p>The self-information of an event <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>x</mi><mo>=</mo><mi>x</mi></mrow><annotation encoding="application/x-tex">x = x</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">x</span><span class="mrel">=</span><span class="mord mathit">x</span></span></span></span> is defined as:</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>I</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>=</mo><mo>−</mo><mi>log</mi><mi>P</mi><mo>(</mo><mi>x</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">I(x) = -\log P(x)
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.07847em;">I</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span><span class="mrel">=</span><span class="mord">−</span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span></span></span></span></span></p>
<blockquote>
<p>Shannon entropy</p>
</blockquote>
<p>Quantifies the total amount of uncertainty in an entire probability distribution: it is the expected amount of information produced by events drawn from that distribution.</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>H</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>=</mo><msub><mi>E</mi><mrow><mo>∼</mo><mi>P</mi></mrow></msub><mo>[</mo><mi>I</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>]</mo><mo>=</mo><mo>−</mo><msub><mi>E</mi><mrow><mi>x</mi><mo>∼</mo><mi>P</mi></mrow></msub><mo>[</mo><mi>l</mi><mi>o</mi><mi>g</mi><mi>P</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>]</mo></mrow><annotation encoding="application/x-tex">H(x) = E_{\sim P}[I(x)] = -E_{x \sim P}[log P(x)]
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.08125em;">H</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span><span class="mrel">=</span><span class="mord"><span class="mord mathit" style="margin-right:0.05764em;">E</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.05764em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mrel">∼</span><span class="mord mathit" style="margin-right:0.13889em;">P</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mopen">[</span><span class="mord mathit" style="margin-right:0.07847em;">I</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span><span class="mclose">]</span><span class="mrel">=</span><span class="mord">−</span><span class="mord"><span class="mord mathit" style="margin-right:0.05764em;">E</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.05764em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">x</span><span class="mrel">∼</span><span class="mord mathit" style="margin-right:0.13889em;">P</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mopen">[</span><span 
class="mord mathit" style="margin-right:0.01968em;">l</span><span class="mord mathit">o</span><span class="mord mathit" style="margin-right:0.03588em;">g</span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span><span class="mclose">]</span></span></span></span></span></p>
<p>Distributions that are nearly deterministic (where the outcome is nearly certain) have low entropy; distributions that are close to uniform have high entropy.<br>
<img src="/img/stat_shannon_entropy.png" alt="Shannon entropy"></p>
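<p>A minimal sketch of self-information and Shannon entropy for a discrete distribution (the helper functions are illustrative; entropy is measured in nats):</p>

```python
import numpy as np

def self_information(p):
    """I(x) = -log P(x), in nats."""
    return -np.log(p)

def entropy(probs):
    """Shannon entropy H = -sum p log p (terms with p == 0 contribute 0)."""
    probs = np.asarray(probs, dtype=float)
    nz = probs[probs > 0]
    return float(-(nz * np.log(nz)).sum())

# An unlikely event carries more information than a likely one.
assert self_information(0.1) > self_information(0.9)
# A fair coin is maximally uncertain; a biased coin has lower entropy.
assert entropy([0.5, 0.5]) > entropy([0.9, 0.1])
# A deterministic distribution has zero entropy.
assert entropy([1.0, 0.0]) == 0.0
```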
<blockquote>
<p>Kullback-Leibler (KL) divergence</p>
</blockquote>
<p>For a single random variable <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>x</mi></mrow><annotation encoding="application/x-tex">x</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.43056em;"></span><span class="strut bottom" style="height:0.43056em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">x</span></span></span></span> with two separate probability distributions <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>P</mi><mo>(</mo><mi>X</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">P(X)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathit" style="margin-right:0.07847em;">X</span><span class="mclose">)</span></span></span></span> and <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>Q</mi><mo>(</mo><mi>x</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">Q(x)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit">Q</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span></span></span></span>, the KL divergence measures how different the two distributions are.</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><msub><mi>D</mi><mrow><mi>K</mi><mi>L</mi></mrow></msub><mo>(</mo><mi>P</mi><mo>∥</mo><mi>Q</mi><mo>)</mo><mo>=</mo><msub><mi>E</mi><mrow><mi>x</mi><mo>∼</mo><mi>P</mi></mrow></msub><mo>[</mo><mi>log</mi><mfrac><mrow><mi>P</mi><mo>(</mo><mi>x</mi><mo>)</mo></mrow><mrow><mi>Q</mi><mo>(</mo><mi>x</mi><mo>)</mo></mrow></mfrac><mo>]</mo><mo>=</mo><msub><mi>E</mi><mrow><mi>x</mi><mo>∼</mo><mi>P</mi></mrow></msub><mo>[</mo><mi>log</mi><mi>P</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>−</mo><mi>log</mi><mi>Q</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>]</mo></mrow><annotation encoding="application/x-tex">D_{KL}(P \parallel Q) = E_{x \sim P} [\log \frac{P(x)}{Q(x)}] = E_{x \sim P} [\log P(x) - \log Q(x)]
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.427em;"></span><span class="strut bottom" style="height:2.363em;vertical-align:-0.936em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord"><span class="mord mathit" style="margin-right:0.02778em;">D</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.02778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit" style="margin-right:0.07153em;">K</span><span class="mord mathit">L</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mopen">(</span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mrel">∥</span><span class="mord mathit">Q</span><span class="mclose">)</span><span class="mrel">=</span><span class="mord"><span class="mord mathit" style="margin-right:0.05764em;">E</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.05764em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">x</span><span class="mrel">∼</span><span class="mord mathit" style="margin-right:0.13889em;">P</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mopen">[</span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped 
nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathit">Q</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span></span></span></span><span style="top:-0.2300000000000001em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mclose">]</span><span class="mrel">=</span><span class="mord"><span class="mord mathit" style="margin-right:0.05764em;">E</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.05764em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">x</span><span class="mrel">∼</span><span class="mord mathit" style="margin-right:0.13889em;">P</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mopen">[</span><span 
class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span><span class="mbin">−</span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathit">Q</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span><span class="mclose">]</span></span></span></span></span></p>
<p>For discrete variables, the KL divergence measures the extra amount of information needed when we use a code designed to minimize the length of messages drawn from probability distribution <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>Q</mi></mrow><annotation encoding="application/x-tex">Q</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.8777699999999999em;vertical-align:-0.19444em;"></span><span class="base textstyle uncramped"><span class="mord mathit">Q</span></span></span></span> to send a message containing symbols drawn from probability distribution <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>P</mi></mrow><annotation encoding="application/x-tex">P</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span></span></span></span>.</p>
<p>Properties of the KL divergence:<br>
It is non-negative, and it equals 0 if and only if P and Q are the same distribution (for discrete variables) or equal almost everywhere (for continuous variables).<br>
The KL divergence is not symmetric.<br>
<img src="/img/stat_kl.png" alt="KL divergence"></p>
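<p>A minimal sketch of the discrete KL divergence and the properties above (the <code>kl</code> helper and the example distributions are illustrative):</p>

```python
import numpy as np

def kl(p, q):
    """D_KL(P || Q) = sum p * (log p - log q) for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                      # terms with p == 0 contribute 0
    return float(np.sum(p[mask] * (np.log(p[mask]) - np.log(q[mask]))))

p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]

assert kl(p, q) >= 0                  # non-negative
assert kl(p, p) == 0                  # zero iff the distributions match
assert kl(p, q) != kl(q, p)           # not symmetric
```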
<blockquote>
<p>Cross-entropy</p>
</blockquote>
<p>Closely related to the KL divergence. Minimizing the cross-entropy with respect to Q is equivalent to minimizing the KL divergence with respect to Q, since the omitted H(P) term does not depend on Q. todo</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>H</mi><mo>(</mo><mi>P</mi><mo separator="true">,</mo><mi>Q</mi><mo>)</mo><mo>=</mo><mi>H</mi><mo>(</mo><mi>P</mi><mo>)</mo><mo>+</mo><msub><mi>D</mi><mrow><mi>K</mi><mi>L</mi></mrow></msub><mo>(</mo><mi>P</mi><mo>∥</mo><mi>Q</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">H(P,Q) = H(P) + D_{KL}(P \parallel Q)
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.08125em;">H</span><span class="mopen">(</span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mpunct">,</span><span class="mord mathit">Q</span><span class="mclose">)</span><span class="mrel">=</span><span class="mord mathit" style="margin-right:0.08125em;">H</span><span class="mopen">(</span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mclose">)</span><span class="mbin">+</span><span class="mord"><span class="mord mathit" style="margin-right:0.02778em;">D</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.02778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit" style="margin-right:0.07153em;">K</span><span class="mord mathit">L</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mopen">(</span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mrel">∥</span><span class="mord mathit">Q</span><span class="mclose">)</span></span></span></span></span></p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>H</mi><mo>(</mo><mi>P</mi><mo separator="true">,</mo><mi>Q</mi><mo>)</mo><mo>=</mo><mo>−</mo><msub><mi>E</mi><mrow><mi>x</mi><mo>∼</mo><mi>P</mi></mrow></msub><mi>log</mi><mi>Q</mi><mo>(</mo><mi>x</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">H(P,Q) = - E_{x \sim P} \log Q(x)
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord mathit" style="margin-right:0.08125em;">H</span><span class="mopen">(</span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mpunct">,</span><span class="mord mathit">Q</span><span class="mclose">)</span><span class="mrel">=</span><span class="mord">−</span><span class="mord"><span class="mord mathit" style="margin-right:0.05764em;">E</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.05764em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span class="mord mathit">x</span><span class="mrel">∼</span><span class="mord mathit" style="margin-right:0.13889em;">P</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mop">lo<span style="margin-right:0.01389em;">g</span></span><span class="mord mathit">Q</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span></span></span></span></span></p>
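<p>As a numerical sanity check of the identity above (a minimal sketch; the two discrete distributions below are made-up examples):</p>

```python
import math

P = [0.5, 0.3, 0.2]  # true distribution
Q = [0.4, 0.4, 0.2]  # model distribution

# H(P) = -sum p log p   (entropy of P)
H_P = -sum(p * math.log(p) for p in P)
# D_KL(P || Q) = sum p log(p / q)
D_KL = sum(p * math.log(p / q) for p, q in zip(P, Q))
# H(P, Q) = -E_{x~P} log Q(x)   (cross-entropy)
H_PQ = -sum(p * math.log(q) for p, q in zip(P, Q))

# The identity H(P, Q) = H(P) + D_KL(P || Q) holds numerically
assert abs(H_PQ - (H_P + D_KL)) < 1e-12
```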
<h3 id="结构化概率模型"><a class="markdownIt-Anchor" href="#结构化概率模型"></a> Structured Probabilistic Models</h3>
<p>A graph is used to represent how a probability distribution factorizes; it is simply a particular way of describing a distribution.<br>
The graph may be directed or undirected. Each node corresponds to a random variable, and an edge connecting two random variables means the distribution can be expressed through a direct interaction between those two variables.<br>
Directed:</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>p</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>=</mo><msub><mo>∏</mo><mi>i</mi></msub><mi>p</mi><mo>(</mo><msub><mi>x</mi><mi>i</mi></msub><mi mathvariant="normal">∣</mi><msub><mi>P</mi><mrow><mi>a</mi><mi>G</mi></mrow></msub><mo>(</mo><msub><mi>x</mi><mi>i</mi></msub><mo>)</mo><mo>)</mo></mrow><annotation encoding="application/x-tex">p(x) = \prod_i p(x_i | P_{aG}(x_i))
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.0500050000000003em;"></span><span class="strut bottom" style="height:2.327674em;vertical-align:-1.277669em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord mathit">p</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span><span class="mrel">=</span><span class="mop op-limits"><span class="vlist"><span style="top:1.1776689999999999em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit">i</span></span></span><span style="top:-0.000005000000000143778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span><span class="op-symbol large-op mop">∏</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathit">p</span><span class="mopen">(</span><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit">i</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord mathrm">∣</span><span class="mord"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.13889em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord scriptstyle cramped"><span 
class="mord mathit">a</span><span class="mord mathit">G</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathit">x</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit">i</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose">)</span><span class="mclose">)</span></span></span></span></span></p>
<p><img src="/img/stat_struct_directed.png" alt=""></p>
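<p>The directed factorization can be illustrated on a three-variable chain a → b → c, where each factor conditions only on its parents (a sketch; the conditional probability tables are made-up numbers):</p>

```python
from itertools import product

# Conditional probability tables for the chain a -> b -> c
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def joint(a, b, c):
    # p(a, b, c) = p(a) p(b|a) p(c|b): each factor conditions only on parents
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

total = sum(joint(a, b, c) for a, b, c in product([0, 1], repeat=3))
assert abs(total - 1.0) < 1e-12  # a valid factorization sums to 1
```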
<p>Undirected:</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>p</mi><mo>(</mo><mi>x</mi><mo>)</mo><mo>=</mo><mfrac><mrow><mn>1</mn></mrow><mrow><mi>Z</mi></mrow></mfrac><msub><mo>∏</mo><mi>i</mi></msub><msup><mi>ϕ</mi><mrow><mo>(</mo><mi>i</mi><mo>)</mo></mrow></msup><mo>(</mo><msup><mi>C</mi><mrow><mo>(</mo><mi>i</mi><mo>)</mo></mrow></msup><mo>)</mo></mrow><annotation encoding="application/x-tex">p(x) = \frac{1}{Z} \prod_i \phi^{(i)} (C^{(i)})
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:1.32144em;"></span><span class="strut bottom" style="height:2.599109em;vertical-align:-1.277669em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord mathit">p</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mclose">)</span><span class="mrel">=</span><span class="mord reset-textstyle displaystyle textstyle uncramped"><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span><span class="mfrac"><span class="vlist"><span style="top:0.686em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle cramped"><span class="mord textstyle cramped"><span class="mord mathit" style="margin-right:0.07153em;">Z</span></span></span></span><span style="top:-0.22999999999999998em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped frac-line"></span></span><span style="top:-0.677em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle textstyle uncramped"><span class="mord textstyle uncramped"><span class="mord mathrm">1</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="sizing reset-size5 size5 reset-textstyle textstyle uncramped nulldelimiter"></span></span><span class="mop op-limits"><span class="vlist"><span style="top:1.1776689999999999em;margin-left:0em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit">i</span></span></span><span style="top:-0.000005000000000143778em;"><span 
class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span><span class="op-symbol large-op mop">∏</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mord"><span class="mord mathit">ϕ</span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mopen">(</span><span class="mord mathit">i</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord mathit" style="margin-right:0.07153em;">C</span><span class="vlist"><span style="top:-0.413em;margin-right:0.05em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle uncramped"><span class="mord scriptstyle uncramped"><span class="mopen">(</span><span class="mord mathit">i</span><span class="mclose">)</span></span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose">)</span></span></span></span></span></p>
<p><img src="/img/stat_struct_undirected.png" alt=""></p>
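<p>For the undirected case, the partition function Z can be computed by brute force on a tiny model (a sketch; the two clique potentials below, over cliques (x1, x2) and (x2, x3), are made-up numbers):</p>

```python
from itertools import product

# Clique potentials; their values need not be probabilities
phi_12 = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
phi_23 = {(0, 0): 1.5, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

def unnorm(x1, x2, x3):
    # Unnormalized measure: product of clique potentials
    return phi_12[(x1, x2)] * phi_23[(x2, x3)]

# Z normalizes the product of potentials into a distribution
Z = sum(unnorm(*x) for x in product([0, 1], repeat=3))

def p(x1, x2, x3):
    return unnorm(x1, x2, x3) / Z

assert abs(sum(p(*x) for x in product([0, 1], repeat=3)) - 1.0) < 1e-12
```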
<h2 id="概率导论"><a class="markdownIt-Anchor" href="#概率导论"></a> Introduction to Probability</h2>
<p>(in progress)</p>
<p>Ref:<br>
[1] cs229<br>
[2] edX - Microsoft Professional Program Certificate in Data Science - Data Science Orientation<br>
[3] Introduction to Probability</p>
</content>
<summary type="html">
Intuition for probability theory and statistics.
</summary>
<category term="Math" scheme="http://conglang.github.io/categories/Math/"/>
<category term="Math" scheme="http://conglang.github.io/tags/Math/"/>
<category term="Probability" scheme="http://conglang.github.io/tags/Probability/"/>
<category term="Statistics" scheme="http://conglang.github.io/tags/Statistics/"/>
</entry>
<entry>
<title>Papers: the DeepID Series</title>
<link href="http://conglang.github.io/2018/08/01/essay-deepid/"/>
<id>http://conglang.github.io/2018/08/01/essay-deepid/</id>
<published>2018-08-01T13:12:09.000Z</published>
<updated>2018-08-02T14:45:26.000Z</updated>
<content type="html"><h2 id="introduction"><a class="markdownIt-Anchor" href="#introduction"></a> Introduction</h2>
<p>DeepID has evolved through several versions.</p>
<ul>
<li>DeepID1: Deep Learning Face Representation from Predicting 10,000 Classes</li>
<li>DeepID2: Deep Learning Face Representation by Joint Identification-Verification</li>
<li>DeepID2+: Deeply Learned Face Representations Are Sparse, Selective, and Robust</li>
<li>DeepID3: Face Recognition with Very Deep Neural Networks</li>
</ul>
<p>The role of the convolutional neural network in DeepID is to learn features: an image is fed in and mapped to a 160-dimensional vector. Any off-the-shelf classifier can then be applied on top of this 160-d vector to obtain the result.</p>
<p>The main lever for improving the DeepID algorithm is enlarging the dataset.</p>
<p><img src="/img/n2-process.png" alt="Whole Process"><br>
In the pipeline above, DeepID can be replaced with traditional feature-extraction algorithms such as HoG or LBP, and the classifier can be any machine-learning classifier: SVM, Joint Bayesian, logistic regression, a neural network, and so on.</p>
<p>When an external dataset is introduced, training proceeds as follows. First, the external dataset is split 4:1; the 4-part is used to train DeepID, with the 1-part serving as DeepID's validation set. Then the 1-part is used to train the classifier. The split is necessary because training both stages on the same data easily leads to overfitting.</p>
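<p>The split described above can be sketched as follows (the 4:1 ratio is from the text; the shuffling, seed, and function name are illustrative assumptions):</p>

```python
import random

def split_dataset(samples, seed=0):
    """Split an external dataset 4:1: the larger part trains DeepID,
    the smaller part validates DeepID and then trains the classifier."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]

deepid_train, classifier_train = split_dataset(list(range(100)))
assert len(deepid_train) == 80 and len(classifier_train) == 20
# The two stages never see the same samples, which avoids overfitting
assert not set(deepid_train) & set(classifier_train)
```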
<h2 id="deepid1"><a class="markdownIt-Anchor" href="#deepid1"></a> DeepID1</h2>
<blockquote>
<p>face patches ---- ConvNet ----&gt; high-level features of the last hidden layer<br>
features ---- joint Bayesian or neural network ----&gt; face verification</p>
</blockquote>
<p>DeepID features -&gt; the last hidden layer of each ConvNet (160-d)<br>
200+ ConvNets (each ConvNet corresponds to one patch)<br>
<img src="/img/deepid1_feature_extraction_process.png" alt="Feature Extraction Process"></p>
<h3 id="deep-convnets"><a class="markdownIt-Anchor" href="#deep-convnets"></a> Deep ConvNets</h3>
<p><img src="/img/deepid1_convnet_structure.png" alt="Convnet Structure"></p>
<p>Note that the penultimate layer, the DeepID feature layer, is connected to both convolutional layer 4 and max-pooling layer 3. This reduces information loss by taking both local and global features into account.</p>
<p>The last hidden layer of DeepID is fully connected to both the third and fourth convolutional layers (after max-pooling) such that it sees multi-scale features. This is critical to feature learning because after successive down-sampling along the cascade, the fourth convolutional layer contains too few neurons and becomes the bottleneck for information propagation.</p>
<h3 id="feature-extraction"><a class="markdownIt-Anchor" href="#feature-extraction"></a> Feature Extraction</h3>
<p><img src="/img/deepid1_face_regions.png" alt="Face Regions"><br>
Face images are preprocessed by alignment and patch extraction.</p>
<ul>
<li>Faces are globally aligned by similarity transformation according to the two eye centers and the mid-point of the two mouth corners.</li>
<li>Features are extracted from 60 face patches with ten regions, three scales, and RGB or gray channels.</li>
</ul>
<h3 id="face-verification"><a class="markdownIt-Anchor" href="#face-verification"></a> Face Verification</h3>
<h4 id="joint-bayesian"><a class="markdownIt-Anchor" href="#joint-bayesian"></a> Joint Bayesian</h4>
<p><img src="/img/deepid1_joint_bayesian.png" alt="Joint Bayesian for Face Verification"></p>
<h4 id="neural-network"><a class="markdownIt-Anchor" href="#neural-network"></a> Neural Network</h4>
<p><img src="/img/deepid1_neural_network_for_face_verification.png" alt="Neural Network for Face Verification"><br>
Input layer: 60 groups, each has [<code>2 (a patch pair) * 160 (d features of a convnet) * 2 (patch and its horizontally flipped counterpart)</code>]<br>
Features in the same group are highly correlated.</p>
<h3 id="experiments"><a class="markdownIt-Anchor" href="#experiments"></a> Experiments</h3>
<ul>
<li>A ConvNet using multi-scale patches outperforms one using a single whole-face patch.</li>
<li>DeepID's own classification error rate oscillates between 40% and 60%. Although high, this does not matter: DeepID is used to learn features, so its own classification error rate is not the concern.</li>
<li>Using the last softmax layer of the DeepID network as the feature representation performs poorly.</li>
<li>As the number of identities in the DeepID training set grows, both DeepID's own classification accuracy and the LFW verification accuracy increase.</li>
</ul>
<h2 id="deepid2"><a class="markdownIt-Anchor" href="#deepid2"></a> DeepID2</h2>
<blockquote>
<p>face identification signal + face verification signal</p>
</blockquote>
<p>In DeepID1, the final softmax layer of the convolutional network uses logistic regression as the objective, i.e. the face identification signal.<br>
DeepID2 adds a face verification signal to the objective, and the two signals are combined with a weighted sum.</p>
<h3 id="identification-verification-guided-deep-feature-learning"><a class="markdownIt-Anchor" href="#identification-verification-guided-deep-feature-learning"></a> Identification-Verification Guided Deep Feature Learning</h3>
<p><img src="/img/deepid2_convnet.png" alt="The ConvNet structure for DeepID2 extraction"><br>
<span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>f</mi><mo>=</mo><mi>C</mi><mi>o</mi><mi>n</mi><mi>v</mi><mo>(</mo><mi>x</mi><mo separator="true">,</mo><msub><mi>θ</mi><mi>c</mi></msub><mo>)</mo></mrow><annotation encoding="application/x-tex">f = Conv(x, \theta_c)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.10764em;">f</span><span class="mrel">=</span><span class="mord mathit" style="margin-right:0.07153em;">C</span><span class="mord mathit">o</span><span class="mord mathit">n</span><span class="mord mathit" style="margin-right:0.03588em;">v</span><span class="mopen">(</span><span class="mord mathit">x</span><span class="mpunct">,</span><span class="mord"><span class="mord mathit" style="margin-right:0.02778em;">θ</span><span class="vlist"><span style="top:0.15em;margin-right:0.05em;margin-left:-0.02778em;"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span><span class="reset-textstyle scriptstyle cramped"><span class="mord mathit">c</span></span></span><span class="baseline-fix"><span class="fontsize-ensurer reset-size5 size5"><span style="font-size:0em;"></span></span></span></span></span><span class="mclose">)</span></span></span></span><br>
x is the input face patch, f is the DeepID2 vector, and θc denotes the ConvNet parameters to be learned.</p>
<p>Two supervisory signals:</p>
<ul>
<li>face identification signal<br>
Classifies each face image into one of n different identities.<br>
Implemented as a softmax classifier.<br>
<img src="/img/deepid2_identification_signal.png" alt="Formula of Face Identification Signal"><br>
f is the DeepID2 vector, t is the target class, θ denotes the softmax layer parameters, p is the target probability distribution, and p hat is the predicted probability distribution.</li>
<li>face verification signal<br>
Encourages DeepID2 vectors extracted from faces of the same identity to be similar.<br>
Regularizes DeepID2 to reduce intra-personal variation; it can be based on the L1/L2 norm or cosine similarity.<br>
<img src="/img/deepid2_verification_signal.png" alt="Formula of Face Verification Signal"><br>
f1 and f2 are the DeepID2 vectors of two images; y = 1 means the same identity, so the L2 distance is minimized; y = -1 means different identities, so the distance is pushed beyond the margin m.</li>
</ul>
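<p>The verification signal described above can be sketched as an L2-based contrastive loss (the 1/2 scaling and the default margin value are illustrative conventions, not taken from the paper figure):</p>

```python
import math

def verif_loss(f1, f2, same_identity, margin=1.0):
    """Verification signal: pull same-identity embeddings together,
    push different-identity embeddings at least `margin` apart."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))
    if same_identity:           # y = 1: minimize the L2 distance
        return 0.5 * d ** 2
    # y = -1: only penalize pairs closer than the margin
    return 0.5 * max(0.0, margin - d) ** 2

assert verif_loss([0.0, 0.0], [0.0, 0.0], True) == 0.0
assert verif_loss([0.0, 0.0], [2.0, 0.0], False) == 0.0   # farther than margin
assert verif_loss([0.0, 0.0], [0.5, 0.0], False) > 0.0    # inside the margin
```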
<p>Because computing the verification signal requires two samples, the training procedure of the convolutional network changes: instead of splitting all the data into small batches, two samples are drawn at random at each iteration. The training procedure is as follows:<br>
<img src="/img/deepid2_training.png" alt="The DeepID2 learning algorithm"><br>
During training, lambda is the weight of the verification signal. It is adjusted dynamically, with the policy of minimizing the verification error on the most recent training samples.</p>
<h3 id="experiments-2"><a class="markdownIt-Anchor" href="#experiments-2"></a> Experiments</h3>
<p>First, the SDM algorithm detects 21 landmarks on each face. Based on these landmarks, combined with variations in position, scale, channel, and horizontal flipping, 400 patches are generated per face and trained with 200 CNNs; each horizontally flipped patch is trained together with its original. This yields a 400 × 160-dimensional vector.</p>
<p>This feature dimension is too high, so feature selection is required. Unlike the earlier DeepID, which applied PCA directly, DeepID2 first selects patches: a forward-backward greedy algorithm picks the 25 most effective patches, reducing the feature to a 25 × 160-dimensional vector, which PCA then reduces to 180 dimensions before it is fed into the Joint Bayesian model for classification.</p>
<p>DeepID2 still uses CelebFaces+ as its external dataset, but first splits it into CelebFaces+A (8192 identities) and CelebFaces+B (1985 identities). First, DeepID2 is trained on CelebFaces+A, with CelebFaces+B as the validation set. Next, CelebFaces+B is split into two parts of 1485 and 500 identities for feature selection, choosing the 25 patches. Finally, the Joint Bayesian model is trained on all of CelebFaces+B and tested on LFW. On top of this, an ensemble was built by running patch selection seven times: the first pass selects the 25 most effective patches, the second selects 25 more from the remainder, and so on. The seven resulting Joint Bayesian models are fused with an SVM, reaching a final accuracy of 99.15%.</p>
<p>The 25 selected patches are shown below:<br>
<img src="/img/deepid2_patches.png" alt="Patches selected for feature extraction"></p>
<p>Tuning lambda balances the identification and verification signals; lambda = 0.05 turns out to be best. Inter-class and intra-class variances were computed using the method from LDA, with the following result:<br>
<img src="/img/deepid2_variance_compare.png" alt="Variance Compare"></p>
<p>At lambda = 0.05, the inter-class variance is almost unchanged while the intra-class variance drops substantially. This preserves separability between identities while reducing variation within an identity. With lambda = infinity, i.e. the verification signal only, both the inter-class and intra-class variances become very small, which hurts the final classification.</p>
<ul>
<li>The more identities in the DeepID training set, the higher the final verification accuracy.</li>
<li>Experiments with different verification signals, including L1, L2, and cosine, show that the L2 norm works best.</li>
</ul>
<h2 id="deepid2-2"><a class="markdownIt-Anchor" href="#deepid2-2"></a> DeepID2+</h2>
<blockquote>
<p>Compared with DeepID2, DeepID2+ adds supervisory signals to the early layers and increases the dimension of the hidden representation.<br>
In DeepID2+, the authors discover some nice properties of the neural network: sparsity, selectivity, and robustness.</p>
</blockquote>
<ul>
<li>Sparsity<br>
Neural activations are moderately sparse; this property is strong enough that even after binarization, the features still achieve good recognition accuracy.</li>
<li>Selectivity<br>
Higher-layer units are selective for identity: for images of the same person, some units stay consistently excited or consistently inhibited.</li>
<li>Robustness<br>
The output of DeepID2+ is highly robust to occlusion.</li>
</ul>
<h3 id="deepid2-nets"><a class="markdownIt-Anchor" href="#deepid2-nets"></a> DeepID2+ Nets</h3>
<p>Compared with DeepID2, there are three changes.</p>
<ul>
<li>The DeepID layer grows from 160 to 512 dimensions.</li>
<li>The training set merges the CelebFaces+ and WDRef datasets: 12,000 identities and 290,000 images in total.</li>
<li>The DeepID layer is connected not only to the third and fourth max-pooling layers but also to the first and second max-pooling layers.</li>
</ul>
<p><img src="/img/deepid2p_neural_net.png" alt="DeepID2+ Neural Net"><br>
<img src="/img/deepid2p_net.png" alt="DeepID2+ Net"><br>
The network is trained with joint face identification-verification supervisory signals.</p>
<h3 id="moderate-sparsity-of-neural-activations"><a class="markdownIt-Anchor" href="#moderate-sparsity-of-neural-activations"></a> Moderate Sparsity of Neural Activations</h3>
<ul>
<li>Sparsity for each image<br>
An image activates roughly half of the neurons, which makes faces of different identities more distinguishable.</li>
<li>Sparsity for each neuron<br>
A neuron is activated by roughly half of the images, which gives it greater discriminative power.</li>
</ul>
<p><img src="/img/deepid2p_sparsity_hist.png" alt="Sparsity"></p>
<p>Activation patterns are more important than precise activation values, so the final 512-dimensional vector was binarized with a threshold; the resulting drop in accuracy is limited.<br>
<img src="/img/deepid2p_binary_better.png" alt="Comparison of the Original and the Binary"><br>
Binary features also save storage and compute, making image search faster.</p>
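<p>The binarization step can be sketched as simple thresholding of the activations (the zero threshold and the 4-d vectors here are illustrative assumptions; the paper works with 512-d features):</p>

```python
def binarize(features, threshold=0.0):
    # Keep only the activation pattern: 1 if the unit fires above threshold
    return [1 if f > threshold else 0 for f in features]

def hamming(a, b):
    # Distance between binary codes: cheap to store and compare
    return sum(x != y for x, y in zip(a, b))

code1 = binarize([0.7, -0.2, 1.3, 0.0])
code2 = binarize([0.5, 0.1, -0.4, 0.0])
assert code1 == [1, 0, 1, 0]
assert code2 == [1, 1, 0, 0]
assert hamming(code1, code2) == 2
```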
<h3 id="selectiveness-on-identities-and-attributes"><a class="markdownIt-Anchor" href="#selectiveness-on-identities-and-attributes"></a> Selectiveness on Identities and Attributes</h3>
<p>There exist individual neurons that, with a simple threshold, achieve 97% accuracy for a particular identity. Different neurons are strongly discriminative for particular people, ethnicities, or age groups, depending on whether they are in an excited or inhibited state.</p>
<h3 id="robustness-of-deepid-features"><a class="markdownIt-Anchor" href="#robustness-of-deepid-features"></a> Robustness of DeepID+ Features</h3>
<p>Even though the training data contain no occluded samples, DeepID2+ turns out to be highly robust to occlusion.<br>
Faces were occluded at multiple scales in two ways: first, occluding from the bottom up, covering 10%-70% of the face; second, placing black blocks at random positions, with block sizes ranging from 10×10 to 70×70.<br>
<img src="/img/deepid2p_occluded_image.png" alt="Occluded Images"></p>
<p>The conclusion: with occlusion under 20%, or block sizes under 30×30, the verification accuracy of DeepID2+'s output vectors is almost unchanged.<br>
<img src="/img/deepid2p_occlusion_ratio.png" alt="Occlusion Ratio"><br>
<img src="/img/deepid2p_occlusion_block.png" alt="Occlusion Block"></p>
<h2 id="deepid3"><a class="markdownIt-Anchor" href="#deepid3"></a> DeepID3</h2>
<blockquote>
<p>Explore 2 very deep neural network architectures.<br>
Stacked convolution in VGG net<br>
Inception layers in GoogLeNet</p>
</blockquote>
<h3 id="deepid3-net"><a class="markdownIt-Anchor" href="#deepid3-net"></a> DeepID3 Net</h3>
<p><img src="/img/deepid3_neural_net.png" alt="DeepID3 Net"><br>
DeepID3 shows no clear advantage over DeepID2+.</p>
<p>Ref:<br>
[1] <a href="https://www.researchgate.net/publication/283749931_Deep_Learning_Face_Representation_from_Predicting_10000_Classes" target="_blank" rel="external">https://www.researchgate.net/publication/283749931_Deep_Learning_Face_Representation_from_Predicting_10000_Classes</a><br>
[2] <a href="https://arxiv.org/abs/1406.4773" target="_blank" rel="external">https://arxiv.org/abs/1406.4773</a><br>
[3] <a href="https://arxiv.org/abs/1412.1265" target="_blank" rel="external">https://arxiv.org/abs/1412.1265</a><br>
[4] <a href="https://arxiv.org/abs/1502.00873" target="_blank" rel="external">https://arxiv.org/abs/1502.00873</a><br>
[5] <a href="https://blog.csdn.net/stdcoutzyx/article/details/42091205" target="_blank" rel="external">https://blog.csdn.net/stdcoutzyx/article/details/42091205</a></p>
</content>
<summary type="html">
Papers: the DeepID series
</summary>
<category term="ML & DL" scheme="http://conglang.github.io/categories/ML-DL/"/>
<category term="Deep Learning" scheme="http://conglang.github.io/tags/Deep-Learning/"/>
<category term="Essay" scheme="http://conglang.github.io/tags/Essay/"/>
<category term="Face Recognition" scheme="http://conglang.github.io/tags/Face-Recognition/"/>
</entry>
<entry>
<title>Paper: FaceNet - A Unified Embedding for Face Recognition and Clustering</title>
<link href="http://conglang.github.io/2018/07/31/essay-facenet/"/>
<id>http://conglang.github.io/2018/07/31/essay-facenet/</id>
<published>2018-07-31T12:49:21.000Z</published>
<updated>2018-07-31T16:10:32.000Z</updated>
<content type="html"><h2 id="introduction"><a class="markdownIt-Anchor" href="#introduction"></a> Introduction</h2>
<p>Core idea:</p>
<blockquote>
<p>Face Image -&gt; 128-D Embedding (End to End)<br>
Euclidean distance between Embeddings = Measure of face similarity<br>
Triplet Loss = minimize sum(max(||A - P||² - ||A - N||² + α, 0)); how P and N are chosen matters a great deal</p>
</blockquote>
<p>With embeddings, face recognition, verification, and clustering become routine tasks: comparisons of distances between embeddings.<br>
The input images are tight crops of the face area, with no 2D or 3D alignment.</p>
<h2 id="triplet-loss"><a class="markdownIt-Anchor" href="#triplet-loss"></a> Triplet Loss</h2>
<p>Why not use softmax?</p>
<ul>
<li>Usually in supervised learning we have a fixed number of classes and train the network using the softmax cross entropy loss. However in some cases we need to be able to have a variable number of classes. In face recognition for instance, we need to be able to compare two unknown faces and say whether they are from the same person or not.</li>
</ul>
<p>Triplet loss tries to enforce a margin between each pair of faces from one person and all other faces, somewhat like the margin in an SVM.</p>
<p>triplets of embeddings:</p>
<ul>
<li>an anchor</li>
<li>a positive of the same class as the anchor</li>
<li>a negative of a different class</li>
</ul>
<p><img src="/img/triplet_loss.png" alt="Triplet Loss"><br>
<img src="/img/facenet_triplet_loss.png" alt="Triplet Loss"><br>
Formula:<br>
<img src="/img/facenet_triplet_loss_formula.png" alt="Triplet Loss"><br>
That is, for some distance on the embedding space <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>d</mi></mrow><annotation encoding="application/x-tex">d</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.69444em;"></span><span class="strut bottom" style="height:0.69444em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit">d</span></span></span></span>, the loss of a triplet <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mo>(</mo><mi>a</mi><mo separator="true">,</mo><mi>p</mi><mo separator="true">,</mo><mi>n</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">(a,p,n)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mopen">(</span><span class="mord mathit">a</span><span class="mpunct">,</span><span class="mord mathit">p</span><span class="mpunct">,</span><span class="mord mathit">n</span><span class="mclose">)</span></span></span></span> is:</p>
<p><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>L</mi><mo>=</mo><mi>max</mi><mo>(</mo><mi>d</mi><mo>(</mo><mi>a</mi><mo separator="true">,</mo><mi>p</mi><mo>)</mo><mo>−</mo><mi>d</mi><mo>(</mo><mi>a</mi><mo separator="true">,</mo><mi>n</mi><mo>)</mo><mo>+</mo><mi>m</mi><mi>a</mi><mi>r</mi><mi>g</mi><mi>i</mi><mi>n</mi><mo separator="true">,</mo><mn>0</mn><mo>)</mo></mrow><annotation encoding="application/x-tex">L = \max(d(a,p) - d(a,n) + margin, 0)
</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base displaystyle textstyle uncramped"><span class="mord mathit">L</span><span class="mrel">=</span><span class="mop">max</span><span class="mopen">(</span><span class="mord mathit">d</span><span class="mopen">(</span><span class="mord mathit">a</span><span class="mpunct">,</span><span class="mord mathit">p</span><span class="mclose">)</span><span class="mbin">−</span><span class="mord mathit">d</span><span class="mopen">(</span><span class="mord mathit">a</span><span class="mpunct">,</span><span class="mord mathit">n</span><span class="mclose">)</span><span class="mbin">+</span><span class="mord mathit">m</span><span class="mord mathit">a</span><span class="mord mathit" style="margin-right:0.02778em;">r</span><span class="mord mathit" style="margin-right:0.03588em;">g</span><span class="mord mathit">i</span><span class="mord mathit">n</span><span class="mpunct">,</span><span class="mord mathrm">0</span><span class="mclose">)</span></span></span></span></span></p>
<p><img src="/img/facenet_triplet_loss_function.png" alt="Triplet Loss Function"></p>
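<p>The loss can be written directly from the formula above (a sketch using squared Euclidean distance; the margin value is illustrative):</p>

```python
def sq_dist(u, v):
    # Squared Euclidean distance between two embeddings
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # L = max(d(a, p) - d(a, n) + margin, 0)
    return max(sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin, 0.0)

a, p, n = [0.0, 0.0], [0.1, 0.0], [1.0, 0.0]
assert triplet_loss(a, p, n) == 0.0   # easy triplet: loss already zero
assert triplet_loss(a, n, p) > 0.0    # hard triplet: negative closer than positive
```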
<p>Triplets should be hard triplets, i.e. examples that violate Equation 1; only these contribute to training the model and give fast convergence.</p>
<h2 id="triplet-selection-and-training-procedure"><a class="markdownIt-Anchor" href="#triplet-selection-and-training-procedure"></a> Triplet Selection and Training Procedure</h2>
<p>Three categories of triplets:</p>
<ul>
<li>Easy Triplets<br>
triplets which have a loss of 0, because <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>d</mi><mo>(</mo><mi>a</mi><mo separator="true">,</mo><mi>p</mi><mo>)</mo><mo>+</mo><mi>m</mi><mi>a</mi><mi>r</mi><mi>g</mi><mi>i</mi><mi>n</mi><mo>&lt;</mo><mi>d</mi><mo>(</mo><mi>a</mi><mo separator="true">,</mo><mi>n</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">d(a,p) + margin \lt d(a,n)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit">d</span><span class="mopen">(</span><span class="mord mathit">a</span><span class="mpunct">,</span><span class="mord mathit">p</span><span class="mclose">)</span><span class="mbin">+</span><span class="mord mathit">m</span><span class="mord mathit">a</span><span class="mord mathit" style="margin-right:0.02778em;">r</span><span class="mord mathit" style="margin-right:0.03588em;">g</span><span class="mord mathit">i</span><span class="mord mathit">n</span><span class="mrel">&lt;</span><span class="mord mathit">d</span><span class="mopen">(</span><span class="mord mathit">a</span><span class="mpunct">,</span><span class="mord mathit">n</span><span class="mclose">)</span></span></span></span></li>
<li>Hard Triplets<br>
triplets where the negative is closer to the anchor than the positive, i.e. <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>d</mi><mo>(</mo><mi>a</mi><mo separator="true">,</mo><mi>n</mi><mo>)</mo><mo>&lt;</mo><mi>d</mi><mo>(</mo><mi>a</mi><mo separator="true">,</mo><mi>p</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">d(a,n) \lt d(a,p)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit">d</span><span class="mopen">(</span><span class="mord mathit">a</span><span class="mpunct">,</span><span class="mord mathit">n</span><span class="mclose">)</span><span class="mrel">&lt;</span><span class="mord mathit">d</span><span class="mopen">(</span><span class="mord mathit">a</span><span class="mpunct">,</span><span class="mord mathit">p</span><span class="mclose">)</span></span></span></span></li>
<li>Semi-hard Triplet<br>
triplets where the negative is not closer to the anchor than the positive, but which still have positive loss: <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>d</mi><mo>(</mo><mi>a</mi><mo separator="true">,</mo><mi>p</mi><mo>)</mo><mo>&lt;</mo><mi>d</mi><mo>(</mo><mi>a</mi><mo separator="true">,</mo><mi>n</mi><mo>)</mo><mo>&lt;</mo><mi>d</mi><mo>(</mo><mi>a</mi><mo separator="true">,</mo><mi>p</mi><mo>)</mo><mo>+</mo><mi>m</mi><mi>a</mi><mi>r</mi><mi>g</mi><mi>i</mi><mi>n</mi></mrow><annotation encoding="application/x-tex">d(a,p) \lt d(a,n) \lt d(a,p) + margin</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit">d</span><span class="mopen">(</span><span class="mord mathit">a</span><span class="mpunct">,</span><span class="mord mathit">p</span><span class="mclose">)</span><span class="mrel">&lt;</span><span class="mord mathit">d</span><span class="mopen">(</span><span class="mord mathit">a</span><span class="mpunct">,</span><span class="mord mathit">n</span><span class="mclose">)</span><span class="mrel">&lt;</span><span class="mord mathit">d</span><span class="mopen">(</span><span class="mord mathit">a</span><span class="mpunct">,</span><span class="mord mathit">p</span><span class="mclose">)</span><span class="mbin">+</span><span class="mord mathit">m</span><span class="mord mathit">a</span><span class="mord mathit" style="margin-right:0.02778em;">r</span><span class="mord mathit" style="margin-right:0.03588em;">g</span><span class="mord mathit">i</span><span class="mord mathit">n</span></span></span></span></li>
</ul>
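<p>The three categories above can be sketched as a small helper (a hypothetical function, not from the paper; distances and margin are in the same units as the loss):</p>

```python
def triplet_category(d_ap, d_an, margin=0.2):
    """Classify a triplet by its anchor-positive / anchor-negative distances."""
    if d_an < d_ap:
        return "hard"        # negative is closer to the anchor than the positive
    if d_an < d_ap + margin:
        return "semi-hard"   # correct ordering, but the loss is still positive
    return "easy"            # d(a,n) >= d(a,p) + margin, zero loss
```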
<p><img src="/img/triplets.png" alt="Categories of Negatives"></p>
<p>When selecting triplets, we want the hard positive $$argmax_{x_i^p} \parallel f(x_i^a) - f(x_i^p) \parallel_2^2$$ and the hard negative $$argmin_{x_i^n} \parallel f(x_i^a) - f(x_i^n) \parallel_2^2$$.<br>
However, computing these over the whole training set is infeasible, and outliers and mislabelled images would dominate the selection.</p>
<p>The paper picks a random semi-hard negative for every anchor-positive pair, and trains on these triplets.</p>
<p>There are two ways out:</p>
<ul>
<li>Offline Triplet Mining<br>
Generate triplets offline every n steps, using the most recent network checkpoint and computing the argmin and argmax on a subset of the data.<br>
Not efficient enough.</li>
<li>Online Triplet Mining<br>
Generate triplets online. This can be done by selecting the hard positive/negative exemplars from within a mini-batch.</li>
</ul>
<h3 id="online-generation"><a class="markdownIt-Anchor" href="#online-generation"></a> Online Generation</h3>
<p>In online mining, we have computed a batch of <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>B</mi></mrow><annotation encoding="application/x-tex">B</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.05017em;">B</span></span></span></span> embeddings from a batch of <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>B</mi></mrow><annotation encoding="application/x-tex">B</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.05017em;">B</span></span></span></span> inputs.<br>
A valid triplet (i, j, k) is one where i and j belong to the same person and k does not.</p>
<p>Suppose that you have a batch of faces as input of size <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>B</mi><mo>=</mo><mi>P</mi><mi>K</mi></mrow><annotation encoding="application/x-tex">B = PK</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.05017em;">B</span><span class="mrel">=</span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mord mathit" style="margin-right:0.07153em;">K</span></span></span></span>, composed of <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>P</mi></mrow><annotation encoding="application/x-tex">P</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span></span></span></span> different persons with <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>K</mi></mrow><annotation encoding="application/x-tex">K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.07153em;">K</span></span></span></span> images each. 
A typical value is <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>K</mi><mo>=</mo><mn>4</mn></mrow><annotation encoding="application/x-tex">K=4</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.07153em;">K</span><span class="mrel">=</span><span class="mord mathrm">4</span></span></span></span>. There are two strategies:</p>
<ul>
<li>Batch All<br>
select all the valid triplets, and average the loss on the hard and semi-hard triplets.
<ul>
<li>a crucial point here is not to take the easy triplets (those with loss 0) into account, as averaging over them would make the overall loss very small.</li>
<li>this produces a total of <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>P</mi><mi>K</mi><mo>(</mo><mi>K</mi><mo>−</mo><mn>1</mn><mo>)</mo><mo>(</mo><mi>P</mi><mi>K</mi><mo>−</mo><mi>K</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">PK(K-1)(PK-K)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mord mathit" style="margin-right:0.07153em;">K</span><span class="mopen">(</span><span class="mord mathit" style="margin-right:0.07153em;">K</span><span class="mbin">−</span><span class="mord mathrm">1</span><span class="mclose">)</span><span class="mopen">(</span><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mord mathit" style="margin-right:0.07153em;">K</span><span class="mbin">−</span><span class="mord mathit" style="margin-right:0.07153em;">K</span><span class="mclose">)</span></span></span></span> triplets (<span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>P</mi><mi>K</mi></mrow><annotation encoding="application/x-tex">PK</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mord mathit" style="margin-right:0.07153em;">K</span></span></span></span> anchors, <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>K</mi><mo>−</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">K-1</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" 
style="height:0.68333em;"></span><span class="strut bottom" style="height:0.76666em;vertical-align:-0.08333em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.07153em;">K</span><span class="mbin">−</span><span class="mord mathrm">1</span></span></span></span> possible positives per anchor, <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>P</mi><mi>K</mi><mo>−</mo><mi>K</mi></mrow><annotation encoding="application/x-tex">PK-K</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.76666em;vertical-align:-0.08333em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mord mathit" style="margin-right:0.07153em;">K</span><span class="mbin">−</span><span class="mord mathit" style="margin-right:0.07153em;">K</span></span></span></span> possible negatives).</li>
</ul>
</li>
<li>Batch Hard (better)<br>
for each anchor, select the hardest positive (biggest distance <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>d</mi><mo>(</mo><mi>a</mi><mo separator="true">,</mo><mi>p</mi><mo>)</mo></mrow><annotation encoding="application/x-tex">d(a,p)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.75em;"></span><span class="strut bottom" style="height:1em;vertical-align:-0.25em;"></span><span class="base textstyle uncramped"><span class="mord mathit">d</span><span class="mopen">(</span><span class="mord mathit">a</span><span class="mpunct">,</span><span class="mord mathit">p</span><span class="mclose">)</span></span></span></span>) and the hardest negative among the batch.
<ul>
<li>this produces <span class="katex"><span class="katex-mathml"><math><semantics><mrow><mi>P</mi><mi>K</mi></mrow><annotation encoding="application/x-tex">PK</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="strut" style="height:0.68333em;"></span><span class="strut bottom" style="height:0.68333em;vertical-align:0em;"></span><span class="base textstyle uncramped"><span class="mord mathit" style="margin-right:0.13889em;">P</span><span class="mord mathit" style="margin-right:0.07153em;">K</span></span></span></span> triplets.</li>
<li>the selected triplets are the hardest among the batch.</li>
</ul>
</li>
</ul>
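<p>The Batch Hard strategy can be sketched in a few lines of NumPy (<code>batch_hard_triplet_loss</code> is a hypothetical name, and the sketch assumes every label appears at least twice in the batch):</p>

```python
import numpy as np

def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Batch Hard: for each anchor, pick the hardest positive and negative in the batch."""
    dot = embeddings @ embeddings.T
    sq = np.diag(dot)
    # Pairwise squared Euclidean distances, clipped at 0 for numerical safety
    dist = np.maximum(sq[:, None] - 2 * dot + sq[None, :], 0.0)

    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(len(labels), dtype=bool)  # same label, excluding self
    hardest_pos = np.where(pos_mask, dist, -np.inf).max(axis=1)  # biggest d(a, p)
    hardest_neg = np.where(~same, dist, np.inf).min(axis=1)      # smallest d(a, n)

    # One triplet per anchor: PK triplets in total
    return np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean()
```

<p>With well-separated clusters the loss is 0; with collapsed embeddings it equals the margin.</p>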
<h2 id="model-architecture"><a class="markdownIt-Anchor" href="#model-architecture"></a> Model Architecture</h2>
<p><img src="/img/facenet_model_structure.png" alt="Model Structure"><br>
Train the CNN using Stochastic Gradient Descent (SGD) with standard backprop and AdaGrad.</p>
<p>Two architectures are used: a Zeiler&amp;Fergus based model and a GoogLeNet style Inception model.</p>
<p>Their practical differences lie in the number of parameters and FLOPS. Which model to use depends on the application:</p>
<ul>
<li>Model running in a datacenter can have many parameters and require a large number of FLOPS.</li>
<li>Model running on a mobile phone needs to have few parameters, so that it can fit into memory.</li>
</ul>
<h3 id="zeilerfergus-based-model"><a class="markdownIt-Anchor" href="#zeilerfergus-based-model"></a> Zeiler&amp;Fergus based Model</h3>
<p>Per image</p>
<ul>
<li>140 million parameters</li>
<li>1.6 billion FLOPS</li>
</ul>
<p><img src="/img/facenet_model1.jpg" alt="Zeiler&amp;Fergus based Model"></p>
<h3 id="googlenet-style-inception-model"><a class="markdownIt-Anchor" href="#googlenet-style-inception-model"></a> GoogLeNet style Inception Model</h3>
<p>Per image</p>
<ul>
<li>6.6M - 7.5M parameters</li>
<li>500M - 1.6B FLOPS</li>
</ul>
<p><img src="/img/facenet_model2.jpg" alt="GoogLeNet style Inception Model"></p>
<h2 id="experiments"><a class="markdownIt-Anchor" href="#experiments"></a> Experiments</h2>
<h3 id="flops-vs-accuracy-trade-off"><a class="markdownIt-Anchor" href="#flops-vs-accuracy-trade-off"></a> FLOPS vs. Accuracy Trade-off</h3>
<p>Note: no obvious correlation between model parameters and accuracy is apparent.<br>
<img src="/img/facenet_network_architectures.png" alt="Network Architectures"><br>
<img src="/img/facenet_network_roc.jpg" alt="Network Architectures"><br>
<img src="/img/facenet_flop_accuracy_tradeoff.jpg" alt="FLOPS vs. Accuracy trade-off"></p>
<h3 id="sensitivity-to-image-quality"><a class="markdownIt-Anchor" href="#sensitivity-to-image-quality"></a> Sensitivity to Image Quality</h3>
<p><img src="/img/face_image_quality.jpg" alt="Image Quality"></p>
<h3 id="embedding-dimensionality"><a class="markdownIt-Anchor" href="#embedding-dimensionality"></a> Embedding Dimensionality</h3>
<p><img src="/img/facenet_embedding_dimensionality.jpg" alt="Embedding Dimensionality"></p>
<h3 id="amouint-of-training-data"><a class="markdownIt-Anchor" href="#amouint-of-training-data"></a> Amount of Training Data</h3>
<p><img src="/img/facenet_training_data_size.jpg" alt="Training Data Size"></p>
<h2 id="summary"><a class="markdownIt-Anchor" href="#summary"></a> Summary</h2>
<p>Strengths:</p>
<ul>
<li>Directly learns an embedding into a Euclidean space for face verification.</li>
<li>Does not require much alignment, only a tight crop around the face area.</li>
</ul>
<p>Future work:</p>
<ul>
<li>Better understanding of the error cases;</li>
<li>Further improving the model;</li>
<li>Reducing model size and CPU requirements;</li>
<li>Reduce the currently extremely long training time.</li>
</ul>
<h2 id="code"><a class="markdownIt-Anchor" href="#code"></a> Code</h2>
<p>I implemented a version by following the paper.<br>
<a href="https://github.com/Conglang/DeepOps/tree/master/facenet_face_recognition" target="_blank" rel="external">FaceNet Face Recognition</a></p>
<p>Ref:<br>
[1] <a href="https://arxiv.org/abs/1503.03832" target="_blank" rel="external">https://arxiv.org/abs/1503.03832</a><br>
[2] <a href="https://omoindrot.github.io/triplet-loss" target="_blank" rel="external">https://omoindrot.github.io/triplet-loss</a></p>
</content>
<summary type="html">
Paper notes: FaceNet - A Unified Embedding for Face Recognition and Clustering
</summary>
<category term="ML & DL" scheme="http://conglang.github.io/categories/ML-DL/"/>
<category term="Deep Learning" scheme="http://conglang.github.io/tags/Deep-Learning/"/>
<category term="Essay" scheme="http://conglang.github.io/tags/Essay/"/>
<category term="Face Recognition" scheme="http://conglang.github.io/tags/Face-Recognition/"/>
</entry>
<entry>
<title>Face Recognition</title>
<link href="http://conglang.github.io/2018/07/30/face-recognition/"/>
<id>http://conglang.github.io/2018/07/30/face-recognition/</id>
<published>2018-07-30T15:18:35.000Z</published>
<updated>2018-09-02T16:37:15.144Z</updated>
<content type="html"><p>doing</p>
<h2 id="face-recognition"><a class="markdownIt-Anchor" href="#face-recognition"></a> Face Recognition</h2>
<p>Detection -&gt; Alignment(~= landmark localization) -&gt; Recognition</p>
<h2 id="-recognition"><a class="markdownIt-Anchor" href="#-recognition"></a> -&gt; Recognition:</h2>
<p>The paper “Deep Face Recognition - A Survey” sketches the overall landscape of the face recognition field. Its contents are roughly as follows:</p>
<blockquote>
<ul>
<li>Background Concepts and Terminology</li>
<li>Components of Face Recognition
<ul>
<li>Data Preprocessing</li>
<li>Deep Feature Extraction<br>
Network Architecture<br>
Loss Function<br>
Similarity Comparison</li>
</ul>
</li>
<li>Databases of Face Recognition</li>
<li>Real-World Scenes
<ul>
<li>Cross-factor FR</li>
<li>Heterogeneous FR</li>
<li>Multiple (or single) media FR</li>
<li>FR in industry</li>
</ul>
</li>
</ul>
</blockquote>
<p>For details, see the post <a href="/2018/07/07/essay-deep-face-recognition-survey">Deep Face Recognition - A Survey</a>.</p>
<h2 id="-deep-feature-extraction"><a class="markdownIt-Anchor" href="#-deep-feature-extraction"></a> -&gt; -&gt; Deep Feature Extraction</h2>
<ul>
<li><a href="/2018/07/07/essay-deep-face-recognition-survey/#background-concepts-and-terminology">Survey</a></li>
</ul>
<h3 id="network-architecture"><a class="markdownIt-Anchor" href="#network-architecture"></a> Network Architecture</h3>
<p>There are two main approaches to face recognition.</p>
<ul>
<li>One casts recognition directly as image classification, with one class per person (multiple photos each); representative methods include DeepFace and DeepID.</li>
<li>The other casts recognition as a metric learning problem: feature learning pulls photos of the same person close together and pushes photos of different people apart; representative methods include DeepID2 and FaceNet.</li>
</ul>
<p>A few are analyzed in detail:</p>
<ul>
<li>Image classification approach
<ul>
<li>DeepFace</li>
<li><a href="/2018/08/01/essay-deepid/">DeepID</a></li>
</ul>
</li>
<li>Metric learning approach
<ul>
<li><a href="/2018/07/31/essay-facenet/">FaceNet</a></li>
</ul>
</li>
</ul>
<h3 id="real-world-scenes"><a class="markdownIt-Anchor" href="#real-world-scenes"></a> Real-World Scenes</h3>
<ul>
<li>Cross-Pose Face Recognition</li>
<li>Cross-Age Face Recognition</li>
<li>Makeup Face Recognition</li>
<li>NIR-VIS Face Recognition</li>
<li>Low-Resolution Face Recognition
<ul>
<li><a href="/2018/09/02/essay-two-branch-dcnn/">Two-Branch DCNN</a></li>
</ul>
</li>
<li>Photo-Sketch Face Recognition</li>
<li>Low-Shot Face Recognition</li>
<li>Set/Template-Based Face Recognition</li>
<li>Video Face Recognition</li>
</ul>
<h3 id="industry-concerns"><a class="markdownIt-Anchor" href="#industry-concerns"></a> Industry Concerns</h3>
<ul>
<li>3D Face Recognition</li>
<li>Face Anti-spoofing</li>
<li>Face Recognition for Mobile Devices</li>
</ul>
<h2 id="ref"><a class="markdownIt-Anchor" href="#ref"></a> Ref</h2>
<p>[1] <a href="https://tech.meituan.com/deep_learning_image_recognition.html" target="_blank" rel="external">https://tech.meituan.com/deep_learning_image_recognition.html</a><br>
[2] <a href="https://arxiv.org/pdf/1804.06655.pdf" target="_blank" rel="external">https://arxiv.org/pdf/1804.06655.pdf</a></p>
</content>
<summary type="html">
Index page for face recognition.
</summary>
<category term="ML & DL" scheme="http://conglang.github.io/categories/ML-DL/"/>
<category term="Index Page" scheme="http://conglang.github.io/tags/Index-Page/"/>
<category term="Deep Learning" scheme="http://conglang.github.io/tags/Deep-Learning/"/>
</entry>
<entry>
<title>Data Cleaning</title>
<link href="http://conglang.github.io/2018/07/21/ml-data-cleaning/"/>
<id>http://conglang.github.io/2018/07/21/ml-data-cleaning/</id>
<published>2018-07-21T07:26:48.000Z</published>
<updated>2018-08-12T06:48:57.000Z</updated>
<content type="html"><p>Data Cleaning and Transformation</p>
<ul>
<li>Missing and repeated values.</li>
<li>Cleaning outliers and errors.</li>
<li>Categorical to Numeric.</li>
<li>Scaling Data.</li>
</ul>
<h2 id="missing-and-repeated-values"><a class="markdownIt-Anchor" href="#missing-and-repeated-values"></a> Missing and Repeated Values</h2>
<p>Missing values and repeated values are common.<br>
Many ML algorithms don’t deal with missing values.<br>
Repeated values bias results.</p>
<h3 id="treating-missing-values"><a class="markdownIt-Anchor" href="#treating-missing-values"></a> Treating Missing Values</h3>
<ul>
<li>If a value is an invalid placeholder, set it to NaN with <code>df.loc[df[col]=='?', col] = np.nan</code> or replace it with another value.
<ol>
<li>The choice of method to fill NaN depends on the situation:<br>
a sentinel such as -999 or -1<br>
mean or median<br>
reconstruct the value<br>
interpolate values<br>
forward fill<br>
backward fill<br>
impute</li>
<li>Binary feature “isnull” can be beneficial.</li>
<li>In general, avoid filling NaNs before feature generation.</li>
<li>XGBoost can handle NaN.</li>
</ol>
</li>
<li>Columns that are mostly empty can simply be dropped: <code>df.drop(drop_list, axis = 1, inplace = True)</code></li>
<li>Rows with missing values can be dropped directly: <code>df.dropna(axis=0, inplace = True)</code></li>
<li>Obviously uninformative fields, such as an id, can be removed.</li>
</ul>
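<p>A minimal sketch of the placeholder-to-NaN-to-median flow described above, on a hypothetical toy frame (the column name and values are made up):</p>

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": ["25", "?", "40"]})
# Replace the '?' placeholder with NaN, convert to numeric, then impute
df.loc[df["age"] == "?", "age"] = np.nan
df["age"] = pd.to_numeric(df["age"])
df["age_isnull"] = df["age"].isnull()           # optional binary "isnull" feature
df["age"] = df["age"].fillna(df["age"].median())
```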
<h3 id="treating-repeated-values"><a class="markdownIt-Anchor" href="#treating-repeated-values"></a> Treating Repeated Values</h3>
<p>Check whether any rows satisfy <code>traintest.nunique(axis = 1) == 1</code>.<br>
<code>df.drop_duplicates(subset = '', inplace = True)</code></p>
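<p>For example, on a hypothetical toy frame (the column names are made up):</p>

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 2, 3], "val": [10, 20, 20, 30]})
# Drop rows that repeat on the chosen subset of columns, keeping the first
df = df.drop_duplicates(subset=["id", "val"])
```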
<h2 id="outliers"><a class="markdownIt-Anchor" href="#outliers"></a> Outliers</h2>
<p><strong>Visualizing Outliers</strong><br>
Scatter plot matrix helps validate outliers.<br>
<code>pandas.tools.plotting.scatter_matrix</code></p>
<p><strong>Removing Outliers</strong><br>
<code>frame1 = frame1[(frame1['Col1'] &gt; 40.0) &amp; (frame1['Col2'] &lt; 30.0) &amp; (frame1['Col3'] &gt; 3.0)]</code></p>
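<p>Hard-coded thresholds like the above work when the cutoffs are known; a common alternative (an assumption on my part, not from the course material) is to keep only values inside a percentile range:</p>

```python
import pandas as pd

df = pd.DataFrame({"price": list(range(100)) + [10_000]})  # one obvious outlier
# Keep rows between the 1st and 99th percentiles of the column
lo, hi = df["price"].quantile([0.01, 0.99])
clean = df[df["price"].between(lo, hi)]
```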
<h2 id="others"><a class="markdownIt-Anchor" href="#others"></a> Others</h2>
<p>see <a href="/2018/07/19/ml-feature-extraction/#numeric-feature">this</a>.</p>
<h2 id="ref"><a class="markdownIt-Anchor" href="#ref"></a> Ref</h2>
<p>[1] edX - Data Science Essentials</p>
</content>
<summary type="html">
Data cleaning.
</summary>
<category term="ML & DL" scheme="http://conglang.github.io/categories/ML-DL/"/>
<category term="Machine Learning" scheme="http://conglang.github.io/tags/Machine-Learning/"/>
</entry>
<entry>
<title>Exploratory Data Analysis</title>
<link href="http://conglang.github.io/2018/07/19/ml-exploratory-data-analysis/"/>
<id>http://conglang.github.io/2018/07/19/ml-exploratory-data-analysis/</id>
<published>2018-07-19T14:36:49.000Z</published>
<updated>2018-08-12T06:49:41.000Z</updated>
<content type="html"><p>Do EDA first. Do not immediately dig into modelling.</p>
<ul>
<li>Get domain knowledge<br>
It helps to deeper understand the problem.</li>
<li>Check if the data is intuitive<br>
And agrees with domain knowledge.</li>
<li>Understand how the data was generated<br>
As it is crucial to set up a proper validation.</li>
</ul>
<h2 id="data-overview"><a class="markdownIt-Anchor" href="#data-overview"></a> Data Overview</h2>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">df.dtypes</span><br><span class="line">df.info()</span><br><span class="line">x.value_counts()</span><br><span class="line">x.isnull()</span><br><span class="line">df.head()</span><br><span class="line">df.shape</span><br></pre></td></tr></table></figure>
<h2 id="visualization-explained"><a class="markdownIt-Anchor" href="#visualization-explained"></a> Visualization Explained</h2>
<p><img src="/img/ml_plots_explained.png" alt="Image Loading"><br>
<img src="/img/ml_python_plotting.png" alt="Image Loading"><br>
<img src="/img/ml_pandas_ploting.png" alt="Image Loading"><br>
<img src="/img/ml_df_plot_type.png" alt="Image Loading"><br>
<img src="/img/ml_pandas_plot_options.png" alt="Image Loading"><br>
<img src="/img/ml_boxplot_explained.png" alt="Image Loading"></p>
<h2 id="visualization"><a class="markdownIt-Anchor" href="#visualization"></a> Visualization</h2>
<!-- Visualization tools to...
+ Explore individual features
Histogram
Plot (index vs value)
Statistics
+ Explore feature relations
+ Pairs
Scatter plot, scatter matrix
Corrplot
+ Groups
Corrplot + clustering
Plot (index vs feature statistics) -->
<h3 id="statistics"><a class="markdownIt-Anchor" href="#statistics"></a> statistics</h3>
<h4 id="statistics-2"><a class="markdownIt-Anchor" href="#statistics-2"></a> statistics</h4>
<p><img src="/img/ml_feature_statistics.png" alt="Image Loading"></p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">df.describe()</span><br><span class="line">x.mean()</span><br><span class="line">x.var()</span><br></pre></td></tr></table></figure>
<h4 id="boxplot-and-histogram"><a class="markdownIt-Anchor" href="#boxplot-and-histogram"></a> boxplot and histogram</h4>
<p><img src="/img/ml_feature_boxplot_histogram.png" alt="Image Loading"></p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">plotstats</span><span class="params">(df, col)</span>:</span></span><br><span class="line"> <span class="keyword">import</span> matplotlib.pyplot <span class="keyword">as</span> plt</span><br><span class="line"> <span class="comment">## Setup for ploting two charts one over the other</span></span><br><span class="line"> fig, ax = plt.subplots(<span class="number">2</span>, <span class="number">1</span>, figsize = (<span class="number">12</span>, <span class="number">8</span>))</span><br><span class="line"> <span class="comment">## First a box plot</span></span><br><span class="line"> df.dropna().boxplot(col, ax = ax[<span class="number">0</span>], vert = <span class="keyword">False</span>, return_type = <span class="string">'dict'</span>)</span><br><span class="line"> <span class="comment">## Plot the histogram</span></span><br><span class="line"> temp = df[col].as_matrix()</span><br><span class="line"> ax[<span class="number">1</span>].hist(temp, bins = <span class="number">30</span>, alpha = <span class="number">0.7</span>)</span><br><span class="line"> plt.ylabel(<span class="string">'Number of Cars'</span>)</span><br><span class="line"> plt.xlabel(col)</span><br><span class="line"> <span class="keyword">return</span> [col]</span><br></pre></td></tr></table></figure>
<h3 id="bar-plot-the-categorical-features"><a class="markdownIt-Anchor" href="#bar-plot-the-categorical-features"></a> Bar Plot the Categorical Features</h3>
<p>Proportions matter more than absolute counts.<br>
<img src="/img/ml_barplot_categ.png" alt="Image Loading"></p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">## Plot categorical variables as bar plots</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">income_barplot</span><span class="params">(df)</span>:</span></span><br><span class="line"> <span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"> <span class="keyword">import</span> matplotlib.pyplot <span class="keyword">as</span> plt</span><br><span class="line"></span><br><span class="line"> cols = df.columns.tolist()[:<span class="number">-1</span>]</span><br><span class="line"> <span class="keyword">for</span> col <span class="keyword">in</span> cols:</span><br><span class="line"> <span class="keyword">if</span>(df.ix[:, col].dtype <span class="keyword">not</span> <span class="keyword">in</span> [np.int64, np.int32, np.float64]):</span><br><span class="line"> temp1 = df.ix[df[<span class="string">'income'</span>] == <span class="string">'&lt;=50K'</span>, col].value_counts()</span><br><span class="line"> temp0 = df.ix[df[<span class="string">'income'</span>] == <span class="string">'&gt;50K'</span>, 
col].value_counts()</span><br><span class="line"></span><br><span class="line"> ylim = [<span class="number">0</span>, max(max(temp1), max(temp0))]</span><br><span class="line"> fig = plt.figure(figsize = (<span class="number">12</span>, <span class="number">6</span>))</span><br><span class="line"> fig.clf()</span><br><span class="line"> ax1 = fig.add_subplot(<span class="number">1</span>, <span class="number">2</span>, <span class="number">1</span>)</span><br><span class="line"> ax0 = fig.add_subplot(<span class="number">1</span>, <span class="number">2</span>, <span class="number">2</span>)</span><br><span class="line"> temp1.plot(kind = <span class="string">'bar'</span>, ax = ax1, ylim = ylim)</span><br><span class="line"> ax1.set_title(<span class="string">'Values of '</span> + col + <span class="string">'\n for income &lt;= 50K'</span>)</span><br><span class="line"> temp0.plot(kind = <span class="string">'bar'</span>, ax = ax0, ylim = ylim)</span><br><span class="line"> ax0.set_title(<span class="string">'Values of '</span> + col + <span class="string">'\n for income &gt; 50K'</span>)</span><br><span class="line"> <span class="keyword">return</span>(<span class="string">'Done'</span>)</span><br><span class="line"></span><br><span class="line">income_barplot(income)</span><br></pre></td></tr></table></figure>
<h3 id="box-plot-the-numeric-features-conditioned-on-the-label-value"><a class="markdownIt-Anchor" href="#box-plot-the-numeric-features-conditioned-on-the-label-value"></a> Box Plot the Numeric Features, Conditioned on the Label Value.</h3>
<p><img src="/img/ml_barplot_numeric_condition.png" alt="Image Loading"></p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">## Plot categorical variables as box plots</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">income_boxplot</span><span class="params">(df)</span>:</span></span><br><span class="line"> <span class="keyword">import</span> numpy <span class="keyword">as</span> np</span><br><span class="line"> <span class="keyword">import</span> matplotlib.pyplot <span class="keyword">as</span> plt</span><br><span class="line"></span><br><span class="line"> cols = df.columns.tolist()[:<span class="number">-1</span>]</span><br><span class="line"> <span class="keyword">for</span> col <span class="keyword">in</span> cols:</span><br><span class="line"> <span class="keyword">if</span>(df[col].dtype <span class="keyword">in</span> [np.int64, np.int32, np.float64]):</span><br><span class="line"> fig = plt.figure(figsize = (<span class="number">6</span>, <span class="number">6</span>))</span><br><span class="line"> fig.clf()</span><br><span class="line"> ax = fig.gca()</span><br><span class="line"> df.boxplot(column = [col], ax = ax, by = [<span class="string">'income'</span>])</span><br><span class="line"> <span class="keyword">return</span> (<span class="string">'Done'</span>)</span><br><span class="line"></span><br><span class="line">income_boxplot(income)</span><br></pre></td></tr></table></figure>
<h3 id="pair-wise-scatter-plot"><a class="markdownIt-Anchor" href="#pair-wise-scatter-plot"></a> Pair-Wise Scatter Plot</h3>
<p>A quick look at the relationships between columns (mainly useful for regression problems). Use seaborn's pairplot.<br>
<img src="/img/ml_pair_wise_scatter_plot.png" alt="Image Loading"></p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> seaborn <span class="keyword">as</span> sns</span><br><span class="line">num_cols = [<span class="string">'length'</span>, <span class="string">'curb-weight'</span>, <span class="string">'engine-size'</span>, <span class="string">'horsepower'</span>, <span class="string">'city-mpg'</span>, <span class="string">'price'</span>, <span class="string">'fuel-type'</span>]</span><br><span class="line">sns.pairplot(auto_price[num_cols], size = <span class="number">2</span>)</span><br></pre></td></tr></table></figure>
<h3 id="conditioned-histograms"><a class="markdownIt-Anchor" href="#conditioned-histograms"></a> Conditioned Histograms</h3>
<p>Typically a histogram of a numeric value conditioned on a categorical variable.<br>
<img src="/img/ml_cond_hists.png" alt="Image Loading"></p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">## Function to plot conditioned histograms</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">cond_hists</span><span class="params">(df, plot_cols, grid_col)</span>:</span></span><br><span class="line"> <span class="keyword">import</span> matplotlib.pyplot <span class="keyword">as</span> plt</span><br><span class="line"> <span class="keyword">import</span> seaborn <span class="keyword">as</span> sns</span><br><span class="line"> <span class="comment">## Loop over the list of columns</span></span><br><span class="line"> <span class="keyword">for</span> col <span class="keyword">in</span> plot_cols:</span><br><span class="line"> grid1 = sns.FacetGrid(df, col = grid_col)</span><br><span class="line"> grid1.map(plt.hist, col, alpha = <span class="number">.7</span>)</span><br><span class="line"> <span class="keyword">return</span> grid_col</span><br><span class="line"></span><br><span class="line"><span class="comment">## Define columns for making a conditioned histogram</span></span><br><span class="line">plot_cols = [<span class="string">'length'</span>, <span class="string">'curb-weight'</span>, <span class="string">'engine-size'</span>, <span class="string">'city-mpg'</span>, <span class="string">'price'</span>]</span><br><span class="line">cond_hists(auto_price, plot_cols, <span class="string">'drive-wheels'</span>)</span><br></pre></td></tr></table></figure>
<h3 id="conditioned-box-plot"><a class="markdownIt-Anchor" href="#conditioned-box-plot"></a> Conditioned Box Plot</h3>
<p><img src="/img/ml_cond_boxplot.png" alt="Image Loading"></p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">## Create boxplots of data</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">auto_boxplot</span><span class="params">(df, plot_cols, by)</span>:</span></span><br><span class="line"> <span class="keyword">import</span> matplotlib.pyplot <span class="keyword">as</span> plt</span><br><span class="line"> <span class="keyword">for</span> col <span class="keyword">in</span> plot_cols:</span><br><span class="line"> fig = plt.figure(figsize = (<span class="number">9</span>, <span class="number">6</span>))</span><br><span class="line"> ax = fig.gca()</span><br><span class="line"> df.boxplot(column = col, by = by, ax = ax)</span><br><span class="line"> ax.set_title(<span class="string">'Box plots of '</span> + col + <span class="string">' by '</span> + by)</span><br><span class="line"> ax.set_ylabel(col)</span><br><span class="line"> <span class="keyword">return</span> by</span><br><span class="line"></span><br><span class="line">auto_boxplot(auto_price, plot_cols2, <span class="string">'drive-wheels'</span>)</span><br></pre></td></tr></table></figure>
<h3 id="scatter-plot"><a class="markdownIt-Anchor" href="#scatter-plot"></a> Scatter Plot</h3>
<p>By using color, a third dimension of information can be shown on a two-dimensional plot.<br>
<img src="/img/ml_scatterplot.png" alt="Image Loading"></p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">## Create scatter plot</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">auto_scatter</span><span class="params">(df, plot_cols)</span>:</span></span><br><span class="line"> <span class="keyword">import</span> matplotlib.pyplot <span class="keyword">as</span> plt</span><br><span class="line"> <span class="keyword">for</span> col <span class="keyword">in</span> plot_cols:</span><br><span class="line"> fig = plt.figure(figsize = (<span class="number">8</span>, <span class="number">8</span>))</span><br><span class="line"> ax = fig.gca()</span><br><span class="line"> temp1 = df.loc[df[<span class="string">'fuel-type'</span>] == <span class="string">'gas'</span>]</span><br><span class="line"> temp2 = df.loc[df[<span class="string">'fuel-type'</span>] == <span class="string">'diesel'</span>]</span><br><span class="line"> <span class="keyword">if</span> temp1.shape[<span class="number">0</span>] &gt; <span class="number">0</span>:</span><br><span class="line"> temp1.plot(kind = <span class="string">'scatter'</span>, x = col, y = <span class="string">'price'</span>, ax = ax, color = <span class="string">'DarkBlue'</span>)</span><br><span class="line"> <span class="keyword">if</span> temp2.shape[<span class="number">0</span>] &gt; <span class="number">0</span>:</span><br><span class="line"> temp2.plot(kind = <span class="string">'scatter'</span>, x = col, y = <span class="string">'price'</span>, ax = ax, color = <span class="string">'Red'</span>)</span><br><span class="line"> ax.set_title(<span class="string">'Scatter plot of price vs. '</span> + col)</span><br><span class="line"> <span class="keyword">return</span> plot_cols</span><br><span class="line"></span><br><span class="line"><span class="comment">## Define columns for making scatter plots</span></span><br><span class="line">plot_cols = [<span class="string">'length'</span>, <span class="string">'curb-weight'</span>, <span class="string">'engine-size'</span>, <span class="string">'city-mpg'</span>]</span><br><span class="line">auto_scatter(auto_price, plot_cols)</span><br></pre></td></tr></table></figure>
<h3 id="conditioned-scatterplot"><a class="markdownIt-Anchor" href="#conditioned-scatterplot"></a> Conditioned Scatterplot</h3>
<p>These plots condition on several variables at once, so they are harder to interpret.<br>
<img src="/img/ml_cond_scatterplot.png" alt="Image Loading"></p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">cond_plot</span><span class="params">(cols)</span>:</span></span><br><span class="line"> <span class="keyword">import</span> IPython.html.widgets</span><br><span class="line"> <span class="keyword">import</span> seaborn <span class="keyword">as</span> sns</span><br><span class="line"> <span class="keyword">for</span> col <span class="keyword">in</span> cols:</span><br><span class="line"> g = sns.FacetGrid(auto_price, col = <span class="string">'num-cylinders'</span>, row = <span class="string">'body-style'</span>, hue = <span class="string">'fuel-type'</span>, palette = <span class="string">'Set2'</span>, margin_titles = <span class="keyword">True</span>)</span><br><span class="line"> g.map(sns.regplot, col, <span class="string">'price'</span>, fit_reg = <span class="keyword">False</span>)</span><br><span class="line"></span><br><span class="line">cond_plot(plot_cols3)</span><br></pre></td></tr></table></figure>
<h3 id="t-test"><a class="markdownIt-Anchor" href="#t-test"></a> t-test</h3>
<p>(For two sets of values from closely related sources, such as the heights of mothers and daughters, a t-test can be used to check whether the two means differ significantly.) statsmodels.stats.weightstats is used to compute the two-sided t statistic.<br>
<img src="/img/ml_t_test.png" alt="Image Loading"><br>
<img src="/img/ml_t_test_explain.png" alt="Image Loading"><br>
<img src="/img/ml_t_test_code.png" alt="Image Loading"></p>
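<p>The images above show the course's own code; the following is a minimal sketch of the same idea with statsmodels.stats.weightstats. The mother/daughter height data here is synthetic, generated purely for illustration.</p>

```python
## Two-sided t-test for a difference in means, as described above.
## NOTE: the height samples below are synthetic (illustration only).
import numpy as np
from statsmodels.stats import weightstats as ws

np.random.seed(42)
mothers = np.random.normal(loc=160.0, scale=6.0, size=100)
daughters = np.random.normal(loc=165.0, scale=6.0, size=100)

## ttest_ind returns the t statistic, the two-sided p-value,
## and the degrees of freedom (n1 + n2 - 2 with pooled variance)
t_stat, p_value, dof = ws.ttest_ind(mothers, daughters, alternative='two-sided')
print('t = %.3f, p-value = %.4f, df = %.0f' % (t_stat, p_value, dof))

## At the 5% level, reject the null hypothesis of equal means
significant = p_value < 0.05
print('Significant difference in means:', significant)
```

<p>With a 5 cm gap between the group means and a standard deviation of 6, the test comfortably rejects the null hypothesis of equal means at the 5% level.</p>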
<!--
### Explore individual features
#### Feature Statistics
#### Histogram
![Image Loading](/img/ml_histogram.png)
#### Plot
![Image Loading](/img/ml_plot.png)
![Image Loading](/img/ml_scatter.png)
#### Other
![Image Loading](/img/ml_other_eda_visual.png)
### Explore feature relations: pairs/groups
![Image Loading](/img/ml_scatter_multi.png)
![Image Loading](/img/ml_scatter_matrix.png)
![Image Loading](/img/ml_matshow.png)
![Image Loading](/img/ml_matshow_group.png)
![Image Loading](/img/ml_plot_group.png)