<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title>CS144-Lab0</title>
<url>/2023/09/04/CS144-Lab0/</url>
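<content><![CDATA[<p>This post follows the <a href="https://cs144.github.io/assignments/check0.pdf">lab handout</a>.</p>
<p>CS144 Lab0 has three parts:</p>
<ul>
<li>Part one is installing and using the VM</li>
<li>Part two is trying out network programs such as telnet</li>
<li>Part three is writing a network program on top of the OS socket library and implementing a simple ByteStream</li>
</ul>
<p>Part one can be skipped.</p>
<p>Part two introduces <strong>telnet</strong> and <strong>netcat</strong>. <strong>telnet</strong> opens a connection and lets you speak different protocols over it by hand, while <strong>netcat</strong> can act as either end (client or server) of an end-to-end connection.</p>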
<p>First, run</p>
<figure class="highlight sh"><table><tr><td class="code"><pre><span class="line">netcat -v -l -p 9091</span><br></pre></td></tr></table></figure>
<p>to open a listening socket on port 9091, then in another terminal run</p>
<figure class="highlight sh"><table><tr><td class="code"><pre><span class="line">telnet localhost 9091</span><br></pre></td></tr></table></figure>
<p>to connect to that port. The netcat window then shows the connection information.</p>
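<p>The third part is the core of the lab. It uses Linux sockets to build a TCP-based program that connects to a web server and fetches a page.</p>
<p>A few points to watch out for:</p>
<ul>
<li>In HTTP, every line must end with <code>\r\n</code></li>
<li>Do not omit <code>Connection: close</code>; otherwise the process keeps waiting indefinitely</li>
</ul>
<p>The code is as follows:</p>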
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">get_URL</span><span class="params">( <span class="type">const</span> string& host, <span class="type">const</span> string& path )</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> TCPSocket sock1;</span><br><span class="line"> Address addr = <span class="built_in">Address</span>( host, <span class="string">"http"</span> );</span><br><span class="line"> sock1.<span class="built_in">connect</span>( addr );</span><br><span class="line"> sock1.<span class="built_in">write</span>( <span class="string">"GET "</span> + path + <span class="string">" "</span> + <span class="string">"HTTP/1.1\r\nHost: "</span> + host + <span class="string">"\r\nConnection: close\r\n\r\n"</span> );</span><br><span class="line"> <span class="keyword">while</span> ( <span class="number">1</span> ) {</span><br><span class="line"> string recv;</span><br><span class="line"> sock1.<span class="built_in">read</span>( recv );</span><br><span class="line"> cout << recv;</span><br><span class="line"> <span class="keyword">if</span> ( sock1.<span class="built_in">eof</span>() )</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> }</span><br><span class="line"> sock1.<span class="built_in">close</span>();</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
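<p>The ByteStream part implements reading and writing of data. I built the buffer on <code>std::queue</code>. The main difficulty was the peek function: after consulting code online, I found that the string_view has to be initialized as in the code below:</p>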
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function">string_view <span class="title">Reader::peek</span><span class="params">()</span> <span class="type">const</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="keyword">return</span> { &buffer.<span class="built_in">front</span>(), <span class="number">1</span> };</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<h3 id="优化部分">Optimization</h3>
<p>Use string_view and move to get move semantics:</p>
<p>Two queues hold the data and the views that refer to it:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line">std::queue<std::string_view> buffer;</span><br><span class="line">std::queue<std::string> buffer_actual;</span><br></pre></td></tr></table></figure>
<p>Reader's pop then has to handle two cases:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">Reader::pop</span><span class="params">( <span class="type">uint64_t</span> len )</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> bytesPopped += len;</span><br><span class="line"> <span class="keyword">for</span> ( <span class="type">unsigned</span> i = <span class="number">0</span>; i < len; ) {</span><br><span class="line"> <span class="keyword">if</span> ( buffer.<span class="built_in">front</span>().<span class="built_in">size</span>() > len - i ) {</span><br><span class="line"> buffer.<span class="built_in">front</span>() = buffer.<span class="built_in">front</span>().<span class="built_in">substr</span>( len - i );</span><br><span class="line"> i = len;</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> i += buffer.<span class="built_in">front</span>().<span class="built_in">size</span>();</span><br><span class="line"> buffer.<span class="built_in">pop</span>();</span><br><span class="line"> buffer_actual.<span class="built_in">pop</span>();</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">while</span> ( !buffer.<span class="built_in">empty</span>() && buffer.<span class="built_in">front</span>().<span class="built_in">empty</span>() )</span><br><span class="line"> buffer.<span class="built_in">pop</span>();</span><br><span class="line"> bytesBuffered -= len;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>Result after the optimization: <img src="/images/CS144_Lab0.png" alt="img" /></p>
]]></content>
<categories>
<category>Network</category>
</categories>
<tags>
<tag>Network</tag>
<tag>Algorithm</tag>
<tag>CS144</tag>
</tags>
</entry>
<entry>
<title>CS144-Lab2</title>
<url>/2023/09/12/CS144-Lab2/</url>
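<content><![CDATA[<p>The main task of CS144 Lab2 is to implement a TCP receiver. In TCP, every endpoint plays two roles, <strong>Sender</strong> and <strong>Receiver</strong>; this lab is about the latter.</p>
<p>The receiver has several jobs:</p>
<ul>
<li>Receive data from the sender</li>
<li>Reassemble that data (already done in Lab1)</li>
<li>Decide whether to send the <strong>Acknowledgement</strong> and <strong>Flow-Control</strong> information back</li>
</ul>
<p>Note that the <strong>Acknowledgement</strong> is the index of the next byte the receiver needs, and <strong>Flow-Control</strong> is how much data the receiver is willing to accept.</p>
<h2 id="转换64位和32位的seqnos">Converting between 64-bit and 32-bit seqnos</h2>
<p>A 64-bit index is so large that it can be treated as never overflowing, but a 32-bit index covers only 4 GiB, so it can run out. In the TCP header the seqno is a 32-bit field: to save space, every position in the sequence is addressed with 32 bits.</p>
<p>This leads to a few TCP mechanisms:</p>
<ul>
<li>Once a 32-bit sequence number reaches <span class="math inline">\(2^{32} - 1\)</span>, the sequence number of the next byte wraps around to 0.</li>
<li>To improve robustness and avoid confusion with old segments from earlier connections between the same endpoints, TCP tries to make sequence numbers hard to guess and unlikely to repeat. The sequence numbers of a stream therefore do not start at zero. The first sequence number in the stream is a random 32-bit number called the Initial Sequence Number (<span class="math inline">\(ISN\)</span>). This is the sequence number that represents the "zero point", or <span class="math inline">\(SYN\)</span> (beginning of stream). The remaining sequence numbers behave as usual: the first byte of data has sequence number <span class="math inline">\(ISN + 1\mod 2^{32}\)</span>, the second byte has <span class="math inline">\(ISN + 2\mod 2^{32}\)</span>, and so on.</li>
<li>(Quoted directly from the handout.) The logical beginning and ending each occupy one sequence number: In addition to ensuring the receipt of all bytes of data, TCP makes sure that the beginning and ending of the stream are received reliably. Thus, in TCP the SYN (beginning-of-stream) and FIN (end-of-stream) control flags are assigned sequence numbers. Each of these occupies one sequence number. (The sequence number occupied by the SYN flag is the ISN.) Each byte of data in the stream also occupies one sequence number. Keep in mind that SYN and FIN aren’t part of the stream itself and aren’t “bytes”; they represent the beginning and ending of the byte stream itself.</li>
</ul>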
<p>In short, we need to implement a <code>Wrap32</code> class that performs these conversions. The basic code is: <figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function">Wrap32 <span class="title">Wrap32::wrap</span><span class="params">( <span class="type">uint64_t</span> n, Wrap32 zero_point )</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="keyword">return</span> zero_point + n;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">uint64_t</span> <span class="title">Wrap32::unwrap</span><span class="params">( Wrap32 zero_point, <span class="type">uint64_t</span> checkpoint )</span> <span class="type">const</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="type">uint64_t</span> cycle = <span class="number">1ll</span> << <span class="number">32</span>;</span><br><span class="line"> <span class="type">uint64_t</span> n_cycle = checkpoint / cycle;</span><br><span class="line"> <span class="type">uint64_t</span> diff = raw_value_ - zero_point.raw_value_;</span><br><span class="line"> <span class="type">uint64_t</span> upper = ( n_cycle + <span class="number">1ll</span> ) * cycle + diff;</span><br><span class="line"> <span class="type">uint64_t</span> middle = n_cycle * cycle + diff;</span><br><span class="line"> <span class="type">uint64_t</span> lower = ( n_cycle - <span class="number">1ll</span> ) * cycle + diff;</span><br><span class="line"> <span class="keyword">if</span> ( ( ( n_cycle == <span class="number">0</span> && cycle <= diff ) || n_cycle != <span class="number">0</span> ) && checkpoint <= ( lower + middle ) / <span class="number">2</span> )</span><br><span class="line"> <span class="keyword">return</span> lower;</span><br><span class="line"> <span class="keyword">if</span> ( checkpoint <= ( middle + upper ) / <span class="number">2</span> )</span><br><span 
class="line"> <span class="keyword">return</span> middle;</span><br><span class="line"> <span class="keyword">else</span></span><br><span class="line"> <span class="keyword">return</span> upper;</span><br><span class="line">}</span><br></pre></td></tr></table></figure></p>
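<p>To be honest, I spent a long time debugging here, mainly because I had not considered the case <span class="math inline">\(lower < 0\)</span>: when the checkpoint lies in the first cycle, the unsigned computation of lower underflows.</p>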
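<p>As a cross-check, the same conversion can be written without a separate lower candidate, which sidesteps the underflow. The following is a minimal sketch, assuming free functions on raw <code>uint32_t</code> values rather than the lab's <code>Wrap32</code> class; the names <code>wrap_raw</code> and <code>unwrap_raw</code> are hypothetical.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Wrap32::wrap: absolute index -> 32-bit seqno.
uint32_t wrap_raw(uint64_t n, uint32_t zero_point)
{
  return zero_point + static_cast<uint32_t>(n); // unsigned arithmetic wraps mod 2^32
}

// Hypothetical stand-in for Wrap32::unwrap: pick the absolute index that wraps
// to `seqno` and lies closest to `checkpoint`.
uint64_t unwrap_raw(uint32_t seqno, uint32_t zero_point, uint64_t checkpoint)
{
  constexpr uint64_t cycle = 1ULL << 32;
  // Offset of seqno from the zero point, reduced mod 2^32.
  const uint64_t diff = static_cast<uint32_t>(seqno - zero_point);
  // Candidate that shares the upper 32 bits with the checkpoint.
  const uint64_t middle = (checkpoint & ~(cycle - 1)) | diff;

  uint64_t result = middle;
  if (middle > checkpoint && middle >= cycle && middle - checkpoint > cycle / 2) {
    result = middle - cycle; // one cycle down is closer, and it cannot underflow
  } else if (middle < checkpoint && checkpoint - middle > cycle / 2) {
    result = middle + cycle; // one cycle up is closer (overflow near 2^64 is ignored here)
  }
  assert(wrap_raw(result, zero_point) == seqno); // every candidate wraps back to seqno
  return result;
}
```

<p>For example, with a zero point of 0 and a checkpoint of <span class="math inline">\(2^{32} - 1\)</span>, the seqno 1 unwraps to <span class="math inline">\(2^{32} + 1\)</span> rather than 1, because that candidate is closer to the checkpoint.</p>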
<p>Next is the receiver code: <figure class="highlight c++"><table><tr><td class="code"><pre><span class="line">TCPReceiver::<span class="built_in">TCPReceiver</span>() : <span class="built_in">ISN</span>( <span class="literal">nullopt</span> ), <span class="built_in">FIN</span>( <span class="literal">false</span> ) {}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">TCPReceiver::receive</span><span class="params">( TCPSenderMessage message, Reassembler& reassembler, Writer& inbound_stream )</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="keyword">if</span> ( message.SYN )</span><br><span class="line"> ISN = message.seqno;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> ( !ISN.<span class="built_in">has_value</span>() )</span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> ( message.FIN )</span><br><span class="line"> FIN = <span class="literal">true</span>;</span><br><span class="line"></span><br><span class="line"> reassembler.<span class="built_in">insert</span>( message.seqno.<span class="built_in">unwrap</span>( ISN.<span class="built_in">value</span>(), reassembler.<span class="built_in">bytes_pending</span>() ) + message.SYN - <span class="number">1ll</span>,</span><br><span class="line"> message.payload,</span><br><span class="line"> message.FIN,</span><br><span class="line"> inbound_stream );</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function">TCPReceiverMessage <span class="title">TCPReceiver::send</span><span class="params">( <span class="type">const</span> Writer& inbound_stream )</span> <span class="type">const</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span 
class="line"> (<span class="type">void</span>)inbound_stream;</span><br><span class="line"> TCPReceiverMessage ret;</span><br><span class="line"> <span class="keyword">if</span> ( !ISN.<span class="built_in">has_value</span>() )</span><br><span class="line"> ret.ackno = <span class="literal">nullopt</span>;</span><br><span class="line"> <span class="keyword">else</span></span><br><span class="line"> <span class="comment">// +1 for the SYN flag, and finish only when FIN flag reached and stream is closed.</span></span><br><span class="line"> ret.ackno</span><br><span class="line"> = Wrap32::<span class="built_in">wrap</span>( inbound_stream.<span class="built_in">bytes_pushed</span>() + <span class="number">1</span> + ( FIN && inbound_stream.<span class="built_in">is_closed</span>() ), ISN.<span class="built_in">value</span>() );</span><br><span class="line"></span><br><span class="line"> ret.window_size = <span class="built_in">min</span>( inbound_stream.<span class="built_in">available_capacity</span>(), (<span class="type">uint64_t</span>)UINT16_MAX );</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> ret;</span><br><span class="line">}</span><br></pre></td></tr></table></figure></p>
<p>The logic is simple; the only care needed is in handling <span class="math inline">\(SYN\)</span> and <span class="math inline">\(FIN\)</span>.</p>
]]></content>
<categories>
<category>Network</category>
</categories>
<tags>
<tag>Network</tag>
<tag>CS144</tag>
</tags>
</entry>
<entry>
<title>CS144-Lab3</title>
<url>/2023/09/17/CS144-Lab3/</url>
<content><![CDATA[<p>Lab3 picks up where Lab2 left off and implements the <strong>Sender</strong> role. The sender's main responsibilities are:</p>
<ul>
<li>Keep track of the receiver's window (acknos and window size)</li>
<li>Fill the window when possible, by reading from the <em>ByteStream</em>, creating new TCP segments (including <em>SYN</em> and <em>FIN</em> flags if needed), and sending them.</li>
<li>Keep track of which segments have been sent but not yet acknowledged by the receiver — we call these “<strong>outstanding</strong>” segments</li>
<li>Re-send <strong>outstanding</strong> segments if <strong>enough time passes</strong> since they were sent, and they haven’t been acknowledged yet</li>
</ul>
<blockquote>
<p>Why am I doing this? The basic principle is to send whatever the receiver will allow us to send (filling the window), and keep retransmitting until the receiver acknowledges each segment. This is called “automatic repeat request” (ARQ).</p>
</blockquote>
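<h2 id="那么tcpsender是怎么时候知道一段segment丢失了呢">How does the TCPSender know that a segment has been lost?</h2>
<p>The sender keeps track of every outstanding segment until it receives the receiver's ackno for it. If a segment has been outstanding for too long, it has to be sent again.</p>
<p>There are, of course, rules about what "outstanding for too long" means, but Lab3 does not ask us to deal with tricky cases or hair-splitting over wording (that is left to Lab4).</p>
<p>A few key points:</p>
<ul>
<li>The sender's <strong>tick</strong> function is the only time-related function you may use. Any other call to the CPU or OS for the time is forbidden.</li>
<li>The sender is configured with a <strong>retransmission timeout (RTO)</strong>. This is how long we wait before resending a segment.</li>
<li>We have to implement the retransmission timer ourselves, <strong>based on tick</strong>.</li>
<li>Whenever a segment containing data is sent, start the timer if it is not already running.</li>
<li>When all outstanding data has been acknowledged, stop the timer.</li>
</ul>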
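<p>A few words on the RTO and the retransmission timer. First, whenever a segment carrying data is sent, the timer must be running. If the timer has expired when tick is called:</p>
<ul>
<li>Retransmit the outstanding segment with the lowest seqno.</li>
<li>If the window size is nonzero:
<ul>
<li>Increment the retransmission count (it is reset to zero when the timer stops).</li>
<li>Double the RTO (<code>RTO *= 2</code>). This backoff adapts the retransmission rate to the traffic on the network.</li>
</ul></li>
<li>Reset the timer and start it again.</li>
</ul>
<p>Besides that, handling <span class="math inline">\(FIN\)</span> is also a bit messy, so take care when implementing it.</p>
<p>Lab4 and Lab5 are relatively simple, so I will not write them up. One implements a NetworkInterface covering IP/Ethernet and ARP, the other implements a Router's forwarding table; neither takes much thought.</p>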
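<p>The timer rules above can be sketched as a small class driven only by <code>tick</code>. This is a minimal sketch, not the lab's reference solution: the class name <code>RetransmissionTimer</code> and all of its members are hypothetical, and restoring the initial RTO when new data is acknowledged follows the lab handout rather than the notes above.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper; time only advances through tick().
class RetransmissionTimer
{
public:
  explicit RetransmissionTimer(uint64_t initial_rto_ms)
    : initial_rto_ms_(initial_rto_ms), rto_ms_(initial_rto_ms)
  {
    assert(initial_rto_ms > 0);
  }

  // Called when a segment carrying data is sent.
  void start_if_stopped()
  {
    if (!running_) {
      running_ = true;
      elapsed_ms_ = 0;
    }
  }

  // Called when all outstanding data has been acknowledged.
  void stop()
  {
    running_ = false;
    elapsed_ms_ = 0;
  }

  // Called when an ackno acknowledges new data: restore the initial RTO,
  // clear the retransmission count, and restart the countdown.
  void on_new_ack()
  {
    rto_ms_ = initial_rto_ms_;
    retransmissions_ = 0;
    elapsed_ms_ = 0;
  }

  // Advance time. Returns true when the timer expired, in which case the
  // caller should resend the outstanding segment with the lowest seqno.
  bool tick(uint64_t ms_since_last_tick, bool window_nonzero)
  {
    if (!running_)
      return false;
    elapsed_ms_ += ms_since_last_tick;
    if (elapsed_ms_ < rto_ms_)
      return false;
    if (window_nonzero) {
      ++retransmissions_;
      rto_ms_ *= 2; // exponential backoff
    }
    elapsed_ms_ = 0; // reset and keep running
    return true;
  }

  uint64_t rto_ms() const { return rto_ms_; }
  uint64_t consecutive_retransmissions() const { return retransmissions_; }

private:
  uint64_t initial_rto_ms_;
  uint64_t rto_ms_;
  uint64_t elapsed_ms_ = 0;
  uint64_t retransmissions_ = 0;
  bool running_ = false;
};
```

<p>Keeping the timer separate from the sender keeps the sender's <code>tick</code> short: it forwards the elapsed time and resends one segment whenever the timer reports an expiry.</p>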
]]></content>
<categories>
<category>Network</category>
</categories>
<tags>
<tag>Network</tag>
<tag>CS144</tag>
</tags>
</entry>
<entry>
<title>Cache Performance Analysis</title>
<url>/2023/08/07/Cache-Performance-Analysis/</url>
<content><![CDATA[<h2 id="some-concepts">Some Concepts</h2>
<p>AMAT: average memory access time. <span class="math inline">\(AMAT = t_{hit} + rate_{miss} \times penalty_{miss}\)</span></p>
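<p>As a quick worked example of the formula, here is a small sketch. The function name <code>amat</code> and the sample numbers are assumptions chosen for illustration.</p>

```cpp
#include <cassert>

// AMAT = hit time + miss rate * miss penalty.
// hit_time and miss_penalty must use the same unit (for example cycles);
// miss_rate is a fraction between 0 and 1.
double amat(double hit_time, double miss_rate, double miss_penalty)
{
  assert(miss_rate >= 0.0 && miss_rate <= 1.0);
  return hit_time + miss_rate * miss_penalty;
}
```

<p>With a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty, <code>amat(1.0, 0.05, 100.0)</code> comes to about 6 cycles.</p>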
<h2 id="cache-miss">Cache Miss</h2>
<p>Sources of Cache Misses:</p>
<ul>
<li>Compulsory: (Like cold start, process migration, 1st reference)</li>
<li>Capacity</li>
<li>Conflict (Collision)</li>
</ul>
<p>The Design Solutions:</p>
<ul>
<li>Compulsory:
<ul>
<li>Increase block size</li>
</ul></li>
<li>Capacity:
<ul>
<li>Increase cache size</li>
</ul></li>
<li>Conflict:
<ul>
<li>Increase associativity (may increase hit-time)</li>
</ul></li>
</ul>
<h2 id="miss-penalty">Miss Penalty</h2>
<p>Factors:</p>
<ul>
<li>How big is your memory architecture</li>
<li>How big is your block size</li>
</ul>
<h2 id="multiple-cache-levels">Multiple Cache Levels</h2>
<p>To minimize AMAT, we need to tune the type and parameters of the cache, but it is hard to reduce hit time, miss rate, and miss penalty all at once.</p>
<p>Multiple cache levels address this.</p>
<p>In general, L1 focuses on low hit time, while L2 and L3 focus on low miss rate. However, such a design also incurs a large write-back cost.</p>
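<p>The benefit of a second level shows up when the L1 miss penalty is expanded into an L2 access. The following is a minimal sketch, assuming the usual two-level expansion of AMAT with a local L2 miss rate; the function name <code>amat_two_level</code> and the sample numbers are illustrative.</p>

```cpp
#include <cassert>

// Two-level AMAT: an L1 miss costs an L2 access, and an L2 miss costs a
// main-memory access on top of that.
// l2_local_miss_rate is the fraction of L2 accesses (that is, L1 misses)
// that also miss in L2.
double amat_two_level(double l1_hit_time, double l1_miss_rate,
                      double l2_hit_time, double l2_local_miss_rate,
                      double memory_penalty)
{
  assert(l1_miss_rate >= 0.0 && l1_miss_rate <= 1.0);
  assert(l2_local_miss_rate >= 0.0 && l2_local_miss_rate <= 1.0);
  const double l1_miss_penalty = l2_hit_time + l2_local_miss_rate * memory_penalty;
  return l1_hit_time + l1_miss_rate * l1_miss_penalty;
}
```

<p>With a 1-cycle L1, a 5% L1 miss rate, a 10-cycle L2 with a 20% local miss rate, and a 100-cycle memory penalty, the result is about 2.5 cycles, compared with about 6 cycles if every L1 miss went straight to memory.</p>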
<h2 id="the-cache-design-space">The Cache Design Space</h2>
<ul>
<li>Cache parameters</li>
<li>Policy choices (Write, Replacement)</li>
<li>Optimal choice is a compromise</li>
<li>Simplicity often wins</li>
</ul>
]]></content>
<categories>
<category>Architecture</category>
</categories>
<tags>
<tag>Architecture</tag>
<tag>CS61C</tag>
</tags>
</entry>
<entry>
<title>CS144-Lab1</title>
<url>/2023/09/12/CS144-Lab1/</url>
<content><![CDATA[<p>The main task of CS144 Lab1 is to implement the <strong>Reassembler</strong> for <strong>TCP</strong>. Its main interface is: <figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">Reassembler::insert</span><span class="params">( <span class="type">uint64_t</span> first_index, string data, <span class="type">bool</span> is_last_substring, Writer& output )</span></span>;</span><br></pre></td></tr></table></figure> Here <code>first_index</code> is the logical index of the data, that is, the position in the stream of its first byte (not the order in which it arrived); <code>is_last_substring</code> marks whether this piece is the last one; and <code>output</code> is the output byte stream.</p>
<p>A TCP byte stream generally consists of the following parts:</p>
<p><code>| popped data | unpopped-and-pushed data | arrived-and-unpushed data |</code></p>
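<p><strong>popped data</strong> has already been read by the Reader. <strong>unpopped-and-pushed data</strong> has been written into the buffer by the Writer but not yet read by the Reader. <strong>arrived-and-unpushed data</strong> has been received from the network but is non-contiguous and has not yet been assembled and passed to the Writer; it is what this lab works on.</p>
<p>By the nature of computer networks, pieces of data can arrive out of order, can overlap one another, and can cover data that has already been pushed. The Reassembler has to resolve all of this and provide a <strong>reliable stream service (Reliable Flow)</strong>.</p>
<h2 id="设计思路">Design</h2>
<p>My basic idea is to keep the <strong>arrived-and-unpushed</strong> data in a buffer that works like a char array, and to record the index intervals <span class="math inline">\([l,r]\)</span> of the data that has arrived in a <code>map<int,int></code>. When new data arrives, its interval is merged with the stored ones, which is exactly the classic <strong>Insert Interval</strong> problem.</p>
<p>The rest of the logic is fairly simple:</p>
<ul>
<li>If <code>first_index</code> is the first index of the arrived-and-unpushed region, push the data immediately</li>
<li>Empty data is skipped, but if it carries the last-substring flag, the writer must be closed</li>
<li>After part of the data has been pushed, the remaining data in buf has to be shifted forward (room for optimization?)</li>
</ul>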
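<p>The interval bookkeeping can be sketched in isolation from the byte buffer. This is a minimal sketch of the Insert Interval idea, not the code used in this post: the alias <code>IntervalMap</code> and the function <code>insert_interval</code> are hypothetical, intervals are closed, and indices near the maximum <code>uint64_t</code> value are assumed not to occur.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Disjoint closed intervals [l, r], stored as map[l] = r.
using IntervalMap = std::map<uint64_t, uint64_t>;

// Insert [l, r], merging it with every stored interval that overlaps it
// or touches it (for example [0,1] and [2,4] become [0,4]).
void insert_interval(IntervalMap& intervals, uint64_t l, uint64_t r)
{
  assert(l <= r);
  // First stored interval that starts strictly after l.
  auto it = intervals.upper_bound(l);

  // The interval just before it starts at or before l and may reach [l, r].
  if (it != intervals.begin()) {
    auto prev = std::prev(it);
    if (prev->second + 1 >= l) {
      l = prev->first;
      r = std::max(r, prev->second);
      it = intervals.erase(prev); // erase returns the following element
    }
  }

  // Absorb every following interval that starts at or before r + 1.
  while (it != intervals.end() && it->first <= r + 1) {
    r = std::max(r, it->second);
    it = intervals.erase(it);
  }

  intervals[l] = r;
}
```

<p>Because each stored interval is erased at most once, the merging work is amortized over the inserts that created those intervals, and the lookup itself is logarithmic in the number of stored intervals.</p>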
<p>The Reassembler's members are: <figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">private</span>:</span><br><span class="line"> std::map<<span class="type">uint64_t</span>, <span class="type">uint64_t</span>> buffer;</span><br><span class="line"> std::string buf;</span><br><span class="line"> <span class="type">uint64_t</span> end_index;</span><br><span class="line"> <span class="type">uint64_t</span> pending;</span><br></pre></td></tr></table></figure></p>
<p>The implementation is: <figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">Reassembler::insert</span><span class="params">( <span class="type">uint64_t</span> first_index, string data, <span class="type">bool</span> is_last_substring, Writer& output )</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="keyword">if</span> ( buf.<span class="built_in">empty</span>() )</span><br><span class="line"> buf.<span class="built_in">resize</span>( output.<span class="built_in">capacity</span>() );</span><br><span class="line"></span><br><span class="line"> <span class="type">uint64_t</span> bias_push = output.<span class="built_in">bytes_pushed</span>();</span><br><span class="line"> <span class="type">uint64_t</span> insert_l = <span class="built_in">max</span>( output.<span class="built_in">bytes_pushed</span>(), first_index );</span><br><span class="line"> <span class="type">uint64_t</span> insert_r = <span class="built_in">min</span>( first_index + data.<span class="built_in">size</span>() - <span class="number">1</span>, output.<span class="built_in">available_capacity</span>() + bias_push - <span class="number">1</span> );</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> ( is_last_substring )</span><br><span class="line"> end_index = first_index + data.<span class="built_in">size</span>() - <span class="number">1</span>;</span><br><span class="line"></span><br><span 
class="line"> <span class="keyword">if</span> ( data.<span class="built_in">empty</span>() && is_last_substring ) {</span><br><span class="line"> output.<span class="built_in">close</span>();</span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> ( insert_l - first_index >= data.<span class="built_in">size</span>() )</span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"> <span class="keyword">if</span> ( insert_l > insert_r )</span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> ( <span class="type">uint64_t</span> i = insert_l; i <= insert_r; i++ ) {</span><br><span class="line"> buf[i - bias_push] = data[i - first_index];</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="type">bool</span> changed = <span class="literal">true</span>;</span><br><span class="line"> <span class="keyword">while</span> ( changed && !buffer.<span class="built_in">empty</span>() ) {</span><br><span class="line"> changed = <span class="literal">false</span>;</span><br><span class="line"> <span class="keyword">auto</span> upper = buffer.<span class="built_in">lower_bound</span>( insert_l );</span><br><span class="line"></span><br><span class="line"> <span class="comment">// upper.first >= l, compare [l,r] with [uf, us]</span></span><br><span class="line"> <span class="keyword">if</span> ( upper != buffer.<span class="built_in">end</span>() && insert_r + <span class="number">1</span> >= upper->first ) {</span><br><span class="line"></span><br><span class="line"> insert_r = <span class="built_in">max</span>( upper->second, insert_r );</span><br><span class="line"> pending -= upper->second - upper->first + <span class="number">1</span>;</span><br><span 
class="line"></span><br><span class="line"> buffer.<span class="built_in">erase</span>( upper );</span><br><span class="line"> changed = <span class="literal">true</span>;</span><br><span class="line"> <span class="keyword">continue</span>;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> ( upper == buffer.<span class="built_in">begin</span>() || buffer.<span class="built_in">empty</span>() )</span><br><span class="line"> <span class="keyword">break</span>;</span><br><span class="line"> <span class="keyword">auto</span> lower = --upper;</span><br><span class="line"></span><br><span class="line"> <span class="comment">// lower.first < l, compare [lf, ls] with [l, r]</span></span><br><span class="line"> <span class="keyword">if</span> ( lower != buffer.<span class="built_in">end</span>() && lower->second + <span class="number">1</span> >= insert_l ) {</span><br><span class="line"></span><br><span class="line"> insert_l = lower->first;</span><br><span class="line"> insert_r = <span class="built_in">max</span>( lower->second, insert_r );</span><br><span class="line"> pending -= lower->second - lower->first + <span class="number">1</span>;</span><br><span class="line"></span><br><span class="line"> buffer.<span class="built_in">erase</span>( lower );</span><br><span class="line"> changed = <span class="literal">true</span>;</span><br><span class="line"> <span class="keyword">continue</span>;</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> ( insert_l == output.<span class="built_in">bytes_pushed</span>() ) {</span><br><span class="line"> <span class="type">uint64_t</span> old_bias = output.<span class="built_in">bytes_pushed</span>();</span><br><span class="line"> output.<span class="built_in">push</span>( buf.<span class="built_in">substr</span>( insert_l - output.<span 
class="built_in">bytes_pushed</span>(), insert_r - insert_l + <span class="number">1</span> ) );</span><br><span class="line"> bias_push = output.<span class="built_in">bytes_pushed</span>();</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> ( insert_r == end_index ) {</span><br><span class="line"> output.<span class="built_in">close</span>();</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">for</span> ( <span class="keyword">auto</span> it = buffer.<span class="built_in">begin</span>(); it != buffer.<span class="built_in">end</span>(); it++ )</span><br><span class="line"> <span class="keyword">for</span> ( <span class="type">uint64_t</span> i = it->first; i <= it->second; ++i )</span><br><span class="line"> buf[i - bias_push] = buf[i - old_bias];</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> buffer[insert_l] = insert_r;</span><br><span class="line"> pending += insert_r - insert_l + <span class="number">1</span>;</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">uint64_t</span> <span class="title">Reassembler::bytes_pending</span><span class="params">()</span> <span class="type">const</span></span></span><br><span class="line"><span class="function"></span>{</span><br><span class="line"> <span class="keyword">return</span> pending;</span><br><span class="line">}</span><br></pre></td></tr></table></figure></p>
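<p>Most of the code matches the Insert Interval problem. The final throughput is 1.78 Gbit/s, which leaves some room for optimization.</p>
<p>I think storing the pieces as strings first and merging them at the end would improve performance considerably.</p>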
]]></content>
<categories>
<category>Network</category>
</categories>
<tags>
<tag>Network</tag>
<tag>Algorithm</tag>
<tag>CS144</tag>
</tags>
</entry>
<entry>
<title>Different Caches</title>
<url>/2023/08/07/Different-Caches/</url>
<content><![CDATA[<h2 id="fully-associative-caches">Fully Associative Caches</h2>
<p>This is the basic cache implementation, so the details are omitted here. Note: the width of the offset field is determined by the block size.</p>
<h2 id="direct-mapped-caches">Direct Mapped Caches</h2>
<p>For a plain fully associative cache, the address breaks down into <code>[Tag | 31 ~ X bits] [Offset | X-1 ~ 0 bits]</code>, so a lookup has to check the tag of every slot.</p>
<p>So we design a direct-mapped cache, inspired by the hash table. Now the address breaks down into <code>[Tag | 31 ~ X bits] [Index | X-1 ~ Y bits] [Offset | Y-1 ~ 0 bits]</code>. The <strong>Index</strong> serves as the hash code of the address, and the <strong>Tag</strong> as an identifier.</p>
<p>For a cache with a <em>write-back policy</em>, every slot has:</p>
<ul>
<li>Block of data</li>
<li>Index field (implicit: it selects the slot and is not stored in it)</li>
<li>Tag field of address as identifier</li>
<li>Valid bit</li>
<li>Dirty bit</li>
<li>No replacement management bit (each address maps to exactly one slot)</li>
</ul>
<p>For example, for address 10010010, we can break it down into:</p>
<ul>
<li>Tag: 1001</li>
<li>Index: 001</li>
<li>Offset: 0</li>
</ul>
<p>Then look up something like <code>cache[Index] + Offset</code> to find the available data.</p>
<p>This design also has worst cases. Since multiple addresses map to the same slot, consider the memory accesses 00000010, 00010010, 00000010, 00010010, ... Both addresses share index 001 but have different tags, so they keep evicting each other and every access misses.</p>
<p>A fully associative cache, by contrast, misses only twice.</p>
<p>The advantage of <strong>direct-mapped</strong> over <strong>fully associative</strong> is its fast lookup: only one tag has to be compared.</p>
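<p>The address split and the worst-case access pattern can be replayed with a small simulation. This is only a sketch under assumptions taken from the 8-bit example: 1 offset bit, 3 index bits, and LRU replacement for the fully associative cache. The function names are made up for illustration.</p>

```python
from collections import OrderedDict

OFFSET_BITS = 1  # assumed: 2-byte blocks, matching the 8-bit example
INDEX_BITS = 3   # assumed: 8 slots


def split_address(addr):
    """Split an 8-bit address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def direct_mapped_misses(accesses):
    """Count misses when each block can live in exactly one slot."""
    slots = {}  # index -> tag currently stored in that slot
    misses = 0
    for addr in accesses:
        tag, index, _ = split_address(addr)
        if slots.get(index) != tag:
            misses += 1
            slots[index] = tag  # evict whatever was there
    return misses


def fully_associative_misses(accesses, num_slots=8):
    """Count misses when a block can live in any slot (LRU eviction)."""
    blocks = OrderedDict()  # block number -> None, ordered by recency
    misses = 0
    for addr in accesses:
        block = addr >> OFFSET_BITS
        if block in blocks:
            blocks.move_to_end(block)
        else:
            misses += 1
            if len(blocks) == num_slots:
                blocks.popitem(last=False)  # drop the least recently used
            blocks[block] = None
    return misses


trace = [0b00000010, 0b00010010] * 4
print(split_address(0b10010010))
print(direct_mapped_misses(trace), fully_associative_misses(trace))
```

<p>With this trace the direct-mapped cache misses on all eight accesses, while the fully associative one misses only on the first two.</p>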
<h2 id="set-associative-caches">Set Associative Caches</h2>
<p><strong>N-way set-associative</strong>: divide the cache into sets, each of which consists of N slots.</p>
<ul>
<li>Memory block maps to a set determined by <strong>Index</strong> field and is placed in any of the N slots of that set.</li>
<li>Call <span class="math inline">\(N\)</span> the associativity.</li>
<li>The replacement policy applies within every set.</li>
</ul>
<p>From my perspective, a set-associative cache is simply an in-between of the former two.</p>
<p>Fully associative requires 0 index bits. Direct-mapped requires max index bits. Set-associative requires somewhere in-between.</p>
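<p>The relation between associativity and index bits can be computed directly. A minimal sketch, assuming the number of slots and the associativity are both powers of two (<code>index_bits</code> is a name made up for this example):</p>

```python
def index_bits(num_slots, ways):
    """Number of index bits for a cache with num_slots slots, ways slots per set."""
    num_sets = num_slots // ways
    # For a power of two, bit_length() - 1 is log2.
    return num_sets.bit_length() - 1


# An 8-slot cache at different associativities:
for ways in (1, 2, 4, 8):
    print(ways, index_bits(8, ways))
```

<p>With 8 slots, 1-way (direct-mapped) needs 3 index bits, 2-way needs 2, and 8-way (fully associative) needs 0.</p>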
<p>Here is a screenshot from CS61C: <img src="/images/Set-Associative.png" alt="img" /></p>
<p>As you can see, it's just the combination of Direct-Mapped and Fully-Associative.</p>
]]></content>
<categories>
<category>Architecture</category>
</categories>
<tags>
<tag>Architecture</tag>
<tag>CS61C</tag>
</tags>
</entry>
<entry>
<title>Direct Memory Access Mechanism</title>
<url>/2023/08/07/Direct-Memory-Access/</url>
<content><![CDATA[<p>DMA serves as a real solution to the I/O problem.</p>
<ul>
<li>Device controller transfers data directly to/from memory without involving the processor.</li>
<li>The processor is interrupted only once per page (a large unit), when the transfer is complete.</li>
</ul>
<p>The incoming procedure:</p>
<ul>
<li>Receive interrupt from device</li>
<li>CPU takes interrupt, begins transfer (instructs DMA to place data at certain address)</li>
<li>Device/DMA engine handle the transfer (CPU is free to execute other things)</li>
<li>Upon completion, Device/DMA engine interrupt the CPU again</li>
</ul>
<p>The outgoing procedure:</p>
<ul>
<li>CPU decides to initiate transfer, confirms that external device is ready.</li>
<li>CPU begins transfer (instructs the DMA engine that data is available at a certain address)</li>
<li>Device/DMA engine handle the transfer (CPU is free to execute other things)</li>
<li>Device/DMA engine interrupt the CPU again to signal completion</li>
</ul>
<h2 id="cache-coherency">Cache-coherency</h2>
<p>DMA writes to memory, which can leave the cache incoherent with memory. We can treat the DMA engine as another processor core; coherency between cores is a problem that most modern multiprocessors have already solved.</p>
<h2 id="dma-and-cpu-sharing-memory">DMA and CPU Sharing Memory</h2>
<h3 id="cycle-stealing-mode">Cycle Stealing mode</h3>
<ul>
<li>DMA Engine transfers a byte, releases control, then repeats</li>
</ul>
<h3 id="transparent-mode-maybe-best">Transparent Mode (Maybe best)</h3>
<ul>
<li>DMA transfer only occurs when CPU is not using the system bus</li>
</ul>
]]></content>
<categories>
<category>Architecture</category>
</categories>
<tags>
<tag>Architecture</tag>
<tag>CS61C</tag>
</tags>
</entry>
<entry>
<title>Looking into ELF Symbol Table</title>
<url>/2023/10/26/ELF-Symbol-Table/</url>
<content><![CDATA[<h2 id="introduction">Introduction</h2>
<p>ELF, the Executable and Linking Format, is the universal binary format on Linux. As its name suggests, executables and linkable files on Linux are in ELF format. An ELF file consists of an ELF header, followed by a program header table or a section header table, or both. The two tables describe the rest of the file.</p>
<p>The header file <elf.h> defines the format of ELF files and related C structures.</p>
<span id="more"></span>
<h2 id="top-view">Top-View</h2>
<figure class="highlight c"><table><tr><td class="code"><pre><span class="line">| -------------- |</span><br><span class="line">| ELF Header |</span><br><span class="line">| -------------- |</span><br><span class="line">| Program Header |</span><br><span class="line">| Table |</span><br><span class="line">| -------------- |</span><br><span class="line">| Section Header |</span><br><span class="line">| Table |</span><br><span class="line">| -------------- |</span><br><span class="line">| .......... |</span><br><span class="line">| .......... |</span><br><span class="line">| .......... |</span><br><span class="line">| -------------- |</span><br><span class="line">| Symbol Table |</span><br><span class="line">| Section |</span><br><span class="line">| -------------- |</span><br><span class="line">| String Table |</span><br><span class="line">| Section |</span><br><span class="line">| -------------- |</span><br><span class="line"></span><br></pre></td></tr></table></figure>
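<p>The ELF header looks like the following. The 64-bit variant <code>Elf64_Ehdr</code> is shown here; <code>Elf32_Ehdr</code> has the same fields with 32-bit address and offset types:</p>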
<figure class="highlight c"><table><tr><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">struct</span></span></span><br><span class="line"><span class="class">{</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">char</span> e_ident[EI_NIDENT]; <span class="comment">/* Magic number and other info */</span></span><br><span class="line"> Elf64_Half e_type; <span class="comment">/* Object file type */</span></span><br><span class="line"> Elf64_Half e_machine; <span class="comment">/* Architecture */</span></span><br><span class="line"> Elf64_Word e_version; <span class="comment">/* Object file version */</span></span><br><span class="line"> Elf64_Addr e_entry; <span class="comment">/* Entry point virtual address */</span></span><br><span class="line"> Elf64_Off e_phoff; <span class="comment">/* Program header table file offset */</span></span><br><span class="line"> Elf64_Off e_shoff; <span class="comment">/* Section header table file offset */</span></span><br><span class="line"> Elf64_Word e_flags; <span class="comment">/* Processor-specific flags */</span></span><br><span class="line"> Elf64_Half e_ehsize; <span class="comment">/* ELF header size in bytes */</span></span><br><span class="line"> Elf64_Half e_phentsize; <span class="comment">/* Program header table entry size */</span></span><br><span class="line"> Elf64_Half e_phnum; <span class="comment">/* Program header table entry count */</span></span><br><span class="line"> Elf64_Half e_shentsize; <span class="comment">/* Section header table entry size */</span></span><br><span class="line"> Elf64_Half e_shnum; <span class="comment">/* Section header table entry count */</span></span><br><span class="line"> Elf64_Half e_shstrndx; <span class="comment">/* Section header string table index */</span></span><br><span class="line">} Elf64_Ehdr;</span><br></pre></td></tr></table></figure>
<p><strong>e_shoff</strong> gives the offset of the <strong>section header table</strong> from the <strong>beginning of the file</strong>. The section header table consists of consecutive section header entries.<br />
<strong>e_phoff</strong> gives the offset of the <strong>program header table</strong> from the <strong>beginning of the file</strong>.</p>
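<p>To make the roles of these offsets concrete, here is a minimal sketch that unpacks an <code>Elf64_Ehdr</code> with Python's <code>struct</code> module and derives the file offset of every section header. It assumes a 64-bit little-endian file; the helper names are made up for this example.</p>

```python
import struct

# Layout of Elf64_Ehdr for a little-endian file: 16-byte e_ident, then the
# Half/Word/Addr/Off fields in declaration order (64 bytes in total).
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)


def parse_elf64_header(data):
    """Return the ELF header fields as a dict (64-bit little-endian only)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[4] is EI_CLASS (2 = ELFCLASS64), e_ident[5] is EI_DATA (1 = LSB).
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    return dict(zip(FIELDS, ELF64_EHDR.unpack_from(data)))


def section_header_offsets(header):
    """File offset of each entry in the section header table."""
    return [
        header["e_shoff"] + i * header["e_shentsize"]
        for i in range(header["e_shnum"])
    ]
```

<p>On a real file one would pass the first 64 bytes, for example <code>parse_elf64_header(open(path, "rb").read(64))</code>, where <code>path</code> is any 64-bit ELF binary.</p>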
<h3 id="section-header">Section Header</h3>
<p>A file's section header table lets one locate all the file's sections. From <strong>e_shoff</strong> we can reach the table of section headers. And <strong>e_shnum</strong> holds the number of entries the section header table contains.</p>
<p>A section header table index is a subscript into this array. Some section header table indices are reserved: the initial entry and the indices between <strong>SHN_LORESERVE</strong> and <strong>SHN_HIRESERVE</strong>. The initial entry is used in ELF extensions for <strong>e_phnum</strong>, <strong>e_shnum</strong>, and <strong>e_shstrndx</strong>; in other cases, each field in the initial entry is set to zero. An object file does not have sections for these special indices.</p>
<p>For details about these special indices, see also <code>man 5 elf</code>.</p>
<p>The section header has the following structure:</p>
<figure class="highlight c"><table><tr><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">struct</span></span></span><br><span class="line"><span class="class">{</span></span><br><span class="line"> Elf32_Word sh_name; <span class="comment">/* Section name (string tbl index) */</span></span><br><span class="line"> Elf32_Word sh_type; <span class="comment">/* Section type */</span></span><br><span class="line"> Elf32_Word sh_flags; <span class="comment">/* Section flags */</span></span><br><span class="line"> Elf32_Addr sh_addr; <span class="comment">/* Section virtual addr at execution */</span></span><br><span class="line"> Elf32_Off sh_offset; <span class="comment">/* Section file offset */</span></span><br><span class="line"> Elf32_Word sh_size; <span class="comment">/* Section size in bytes */</span></span><br><span class="line"> Elf32_Word sh_link; <span class="comment">/* Link to another section */</span></span><br><span class="line"> Elf32_Word sh_info; <span class="comment">/* Additional section information */</span></span><br><span class="line"> Elf32_Word sh_addralign; <span class="comment">/* Section alignment */</span></span><br><span class="line"> Elf32_Word sh_entsize; <span class="comment">/* Entry size if section holds table */</span></span><br><span class="line">} Elf32_Shdr;</span><br></pre></td></tr></table></figure>
<p><strong>sh_name</strong>: the <em>index</em> of the <em>section name</em> in the <em>Section Header String Table</em>.<br />
<strong>sh_type</strong>: the section type. The values I'm interested in are:</p>
<ul>
<li><strong>SHT_NULL</strong>: Marks the section header as inactive.</li>
<li><strong>SHT_SYMTAB</strong>: Symbol Table, for link editing and dynamic linking.</li>
<li><strong>SHT_DYNSYM</strong>: Dynamic Symbol Table, holds a minimal set of dynamic linking symbols.</li>
<li><strong>SHT_STRTAB</strong>: String Table. An object file may have multiple string sections.</li>
</ul>
<p><strong>sh_offset</strong>: works like the offsets above, giving the offset of the section from the beginning of the file.<br />
<strong>sh_link</strong>: holds a section header table index link, whose interpretation depends on the section type. For a symbol table, it is the section index of the String Table section that holds the symbol <strong>names</strong>.</p>
<h3 id="elf-symbol-table">ELF Symbol Table</h3>
<p>ELF Symbol Table consists of <strong>consecutive</strong> entries.<br />
The structure of the ELF symbol table entry is like:</p>
<figure class="highlight c"><table><tr><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">struct</span></span></span><br><span class="line"><span class="class">{</span></span><br><span class="line"> Elf32_Word st_name; <span class="comment">/* Symbol name (string tbl index) */</span></span><br><span class="line"> Elf32_Addr st_value; <span class="comment">/* Symbol value */</span></span><br><span class="line"> Elf32_Word st_size; <span class="comment">/* Symbol size */</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">char</span> st_info; <span class="comment">/* Symbol type and binding */</span></span><br><span class="line"> <span class="type">unsigned</span> <span class="type">char</span> st_other; <span class="comment">/* Symbol visibility */</span></span><br><span class="line"> Elf32_Section st_shndx; <span class="comment">/* Section index */</span></span><br><span class="line">} Elf32_Sym;</span><br></pre></td></tr></table></figure>
<p>As the comment shows, <strong>st_name</strong> is an index (a byte offset) into the <em>String Table</em>. The section index of that <em>String Table</em> is held in the <strong>sh_link</strong> of the symbol table's section header. With both, we can get the function/variable name easily.</p>
<p><strong>st_info</strong>: consists of two fields, Bind and Type. We focus on the latter for now, which can be extracted with <code>ELF32_ST_TYPE(info)</code>.</p>
<p>Symbol types (<code>STT_*</code>) and bindings (<code>STB_*</code>) mainly include:</p>
<ul>
<li><strong>STT_OBJECT</strong>: A data object. (Such as <em>C</em> variable)</li>
<li><strong>STT_FUNC</strong>: A function or other executable code.</li>
<li><strong>STT_SECTION</strong>: A section, for relocation.</li>
<li><strong>STT_FILE</strong>: The name of the source file.</li>
<li><strong>STB_LOCAL</strong>: Local symbols are not visible outside the object file containing their definition. Local symbols of the same name may exist in multiple files without interfering with each other.</li>
<li><strong>STB_GLOBAL</strong>: Global symbols are visible to all object files being combined. One file's definition of a global symbol will satisfy another file's undefined reference to the same symbol.</li>
<li><strong>STB_WEAK</strong>: Weak symbols resemble global symbols, but their definitions have lower precedence.</li>
</ul>
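<p>The two fields packed into <strong>st_info</strong> can be separated with a shift and a mask, which is what the <code>ELF32_ST_BIND</code> and <code>ELF32_ST_TYPE</code> macros in <code>elf.h</code> do. A small sketch in Python (the function and dictionary names are made up for illustration; the numeric values are the ones defined in <code>elf.h</code>):</p>

```python
STB_NAMES = {0: "STB_LOCAL", 1: "STB_GLOBAL", 2: "STB_WEAK"}
STT_NAMES = {
    0: "STT_NOTYPE", 1: "STT_OBJECT", 2: "STT_FUNC",
    3: "STT_SECTION", 4: "STT_FILE",
}


def st_bind(info):
    """Upper 4 bits of st_info: the symbol binding."""
    return info >> 4


def st_type(info):
    """Lower 4 bits of st_info: the symbol type."""
    return info & 0xF


def describe(info):
    """Human-readable (binding, type) pair for an st_info byte."""
    return (
        STB_NAMES.get(st_bind(info), "unknown"),
        STT_NAMES.get(st_type(info), "unknown"),
    )


print(describe(0x12))  # a global function symbol
```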
<h3 id="how-to-find-string-in-string-table">How to Find String in String Table?</h3>
<p>A String Table is a sequence of null-terminated strings stored back to back. An index into the String Table (such as <strong>st_name</strong> or <strong>sh_name</strong>) is a byte offset from the start of the table; the string runs from that offset up to the next null byte.</p>
<p>For example, a String Table may look like this:</p>
<figure class="highlight c"><table><tr><td class="code"><pre><span class="line"><span class="string">"\0hello\0world\0xxxxxxxxx"</span></span><br></pre></td></tr></table></figure>
<p>The string at offset 0 is always empty. The string at offset 1 is "hello", and the one at offset 7 is "world". Because the index is a byte offset, there is no need to traverse the preceding strings: just read from the offset up to the next null byte. Loading the whole string table into memory once speeds up the rest of the ELF analysis.</p>
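<p>In the ELF format, a string table index such as <strong>st_name</strong> is a byte offset into the table, so a lookup reads from that offset up to the next null byte. A minimal sketch, where <code>strtab_lookup</code> is a hypothetical helper and the table is already loaded into memory as <code>bytes</code>:</p>

```python
def strtab_lookup(strtab, offset):
    """Return the null-terminated string that starts at byte `offset`."""
    end = strtab.index(b"\x00", offset)  # position of the terminating null
    return strtab[offset:end].decode("ascii")


strtab = b"\x00hello\x00world\x00"
print(repr(strtab_lookup(strtab, 0)))
print(strtab_lookup(strtab, 1))
print(strtab_lookup(strtab, 7))
```

<p>An offset may also point into the middle of a string (offset 3 here yields "llo"), which lets linkers share suffixes between names.</p>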
<h3 id="some-tools">Some Tools</h3>
<p><em>readelf</em> displays the contents of an ELF file through various options.</p>
<p>A short cheatsheet:</p>
<figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">readelf -h <span class="comment"># show elf header</span></span><br><span class="line">readelf -l <span class="comment"># show program headers, or segments</span></span><br><span class="line">readelf -S <span class="comment"># show section headers</span></span><br><span class="line">readelf -g <span class="comment"># show section groups</span></span><br><span class="line">readelf -s <span class="comment"># show symbols</span></span><br></pre></td></tr></table></figure>
<h2 id="reference">Reference</h2>
<ul>
<li><strong><em>Linux ELF Manual</em></strong></li>
</ul>
]]></content>
<categories>
<category>OS</category>
</categories>
<tags>
<tag>ELF</tag>
<tag>Linux</tag>
</tags>
</entry>
<entry>
<title>How to debug LLVM ?</title>
<url>/2023/10/17/How-to-debug-LLVM/</url>
<content><![CDATA[<h2 id="abstract">Abstract</h2>
<p>Debugging remains a significant topic in software engineering. In system software such as a <em>compiler</em>, it is especially difficult to pinpoint the root cause and solve the related problems.<br />
Based on my recent work on <em>LLVM</em>, I'd like to share some experience.</p>
<span id="more"></span>
<h2 id="classification-of-bugs">Classification of Bugs</h2>
<p>Bugs in Compiler field can be mainly classified into <em>Crash</em>, <em>Mis-compilation</em>, and <em>Missed Optimizations</em>.</p>
<p>For example, there may be a C file triggering one of these bugs.</p>
<p>If it crashes clang, which is the easiest case to pinpoint, clang dumps a stack trace. With the stack trace, we can tell whether it is a frontend, middle-end or backend problem.</p>
<p>If someone reports a mis-compiled assembly file, we have to reduce the input first (use <em>llvm-reduce</em> if it is LLVM-IR) and then validate each stage of the compilation: the AST, the LLVM-IR after each pass, and the assembly after every backend step. In this way, we can pinpoint which module caused the mis-compilation.</p>
<p>Missed optimizations are handled much like mis-compilations. However, it is harder to decide whether a case is a <em>helpful or real</em> missed optimization. Some kinds of missed optimizations contribute nothing to real-world performance, and fuzzers often generate such cases:</p>
<ul>
<li><p>Too large IR. For this kind of IR, passes like CSE, GVN and DSE only fold it partially because of cost and compile-time limits.</p></li>
<li><p>No real motivation. Optimizations in LLVM are designed mostly for real-world applications. For this reason, some nonsensical missed cases are not considered at all unless they turn out to be a recurring pattern.</p></li>
<li><p>Hard to debug. Complex test cases always need reduction before the responsible module can be located precisely.</p></li>
<li><p>Won't fix. Optimal optimization is an unsolvable problem in general, and there are always things a compiler cannot fully do, such as eliminating all common expressions or simplifying all expressions. Most optimizations in LLVM are <strong>heuristic</strong> or based on <strong>experience</strong>, which means LLVM can't handle all cases.</p></li>
</ul>
<h2 id="some-tools">Some Tools</h2>
<ul>
<li><p><em>llc/opt --print-before=[crash pass] [ir]</em>: Dumps the IR right before the pass that crashes. The dump goes to stderr, so append for example <code>2> dump.txt</code> to write it to a file.</p></li>
<li><p><em>opt -O2 -print-before-all / opt -O2 -print-after-all</em>: Dump the IR before/after every pass.</p></li>
<li><p><em>llvm-reduce [ir] --test=test.sh</em>: <em>llvm-reduce</em> is an IR reduction tool based on <em>delta debugging</em>. It keeps a reduction if <em>test.sh</em> returns 0 (0 means the input is still interesting). It makes the IR easier to analyze.</p></li>
</ul>
<hr />
<h2 id="my-workflow">My Workflow</h2>
<p>When I come across an IR file that crashes clang/opt, I first look at the stack trace. The stack trace usually indicates which function or class exposes the error.</p>
<!-- picture1 -->
<p>Here we assume the error is exposed by an optimization pass <em>op1</em>. Since the dump is written to stderr, we enter:</p>
<figure class="highlight sh"><table><tr><td class="code"><pre><span class="line">opt --print-before=op1 -O2 -S [ir-file] 2> [ir-before-op1.ll]</span><br></pre></td></tr></table></figure>
<p>Or if it's a C file, we enter:</p>
<figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">clang -mllvm -print-before=instcombine xxx.c -O2 -g0 2> xxx.ll</span><br></pre></td></tr></table></figure>
<p><code>O2</code> can be replaced by whichever optimization level triggers the problem. The output serves as a reproducer. Then we run only <em>op1</em> to reproduce the failure:</p>
<figure class="highlight sh"><table><tr><td class="code"><pre><span class="line">opt --passes=op1 -S [ir-file]</span><br></pre></td></tr></table></figure>
<p>If we reproduce it successfully, we move on to reducing it:</p>
<p>Write a <code>test.sh</code></p>
<figure class="highlight bash"><table><tr><td class="code"><pre><span class="line"><span class="meta">#!/bin/bash</span></span><br><span class="line"><span class="comment"># Crash messages are printed to stderr, so merge it into the pipe</span></span><br><span class="line">opt --passes=op1 -S <span class="variable">$1</span> 2>&1 | grep <span class="string">"something related to error"</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># For missed optimization, write a testcase and check it with FileCheck</span></span><br><span class="line"><span class="comment"># FileCheck $1 | grep "something related"</span></span><br></pre></td></tr></table></figure>
<p>And launch <em>llvm-reduce</em>:</p>
<figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">llvm-reduce [ir-file] --<span class="built_in">test</span>=test.sh</span><br></pre></td></tr></table></figure>
<p>Finally we get a <em>reduced.ll</em>, which makes the problem much easier to analyze.</p>
<h2 id="how-to-get-command-line-arguments-and-ir-dump-file-when-bootstrapping">How to Get Command-Line Arguments and IR-dump File when bootstrapping</h2>
<p>Refer to <a href="https://discourse.llvm.org/t/how-to-reproduce-the-bug-and-get-the-exact-ir-before-crash-during-bootstrapping/74032">discourse</a></p>
<h2 id="reference">Reference</h2>
<ul>
<li><a href="https://clangbuiltlinux.github.io/llvm-dev-conf-2020/nick/debugging_llvm.html">Debugging llvm</a></li>
<li><a href="https://www.npopov.com/2023/10/22/How-to-reduce-LLVM-crashes.html">Nikic's Blog</a></li>
</ul>
]]></content>
<categories>
<category>LLVM</category>
</categories>
<tags>
<tag>Compiler</tag>
<tag>LLVM</tag>
<tag>OpenSource</tag>
</tags>
</entry>
<entry>
<title>Introduction to GPU</title>
<url>/2023/10/31/Introduction-To-GPU/</url>
<content><![CDATA[<h2 id="introduction">Introduction</h2>
<p>The GPU, or Graphics Processing Unit, was initially designed to accelerate image rendering, for example in video games. Thanks to its high performance in parallel computation, it has become a great processor for accelerating DL/ML training.</p>
<span id="more"></span>
<p>Unlike a CPU, a GPU consists of numerous computational units, long pipelines and dedicated video memory, which gives it an advantage in parallel computation and a disadvantage in handling complex control logic.</p>
<p><img src="/images/CPU-GPU.webp" /></p>
<h2 id="computational-unitscores">Computational units(cores)</h2>
<p>Overall, a CPU's computational units are fast but few, while a GPU's are slow but numerous. The CPU's speed comes from its high clock frequency and smart execution, such as out-of-order execution and sophisticated branch prediction.</p>
<p>A GPU core, by contrast, mostly handles simple, regular arithmetic such as <code>fmuladd</code> instructions. Beyond scalar floats, modern GPUs can also operate on more complicated types such as tensors (<em>tensor cores</em>).</p>
<p>SIMD exists not only in CPUs but also in GPUs. Applying the same operation to different data is what makes GPUs fast at parallel work like matrix multiplication.</p>
<h2 id="memory">Memory</h2>
<p>A GPU's memory is much smaller than a CPU's, and GPU caches behave somewhat differently from a CPU's L1 and L2.</p>
<p>A parallel <code>reduce</code> requires multiple cores to share memory. But it is hard and expensive for thousands of cores to share one memory segment, so the cores are divided into multiple groups called <em>Streaming Multiprocessors</em> (SMs).</p>
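<p>To see why <code>reduce</code> needs shared memory, here is a sequential Python sketch of a tree-based reduction. It is not GPU code: the list <code>shared</code> merely stands in for the memory shared by one group of cores, and each pass of the inner loop corresponds to work that separate threads could do at the same time. The function name is made up for this example.</p>

```python
def tree_reduce(values):
    """Sum a non-empty sequence in log2(n) rounds of pairwise additions."""
    shared = list(values)  # stands in for the memory shared by one group
    n = len(shared)
    stride = 1
    rounds = 0
    while stride < n:
        # The additions in one round touch disjoint elements, so on a GPU
        # each of them could be handled by a different thread.
        for i in range(0, n - stride, 2 * stride):
            shared[i] += shared[i + stride]
        stride *= 2
        rounds += 1
    return shared[0], rounds


print(tree_reduce(range(1, 9)))
```

<p>Eight values are summed in three rounds instead of seven sequential additions, but every round reads partial sums written by other threads in the previous round, which is only cheap if those threads share fast memory.</p>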
<p>Each SM contains INT32, FP32 and other types of cores. So how do they cooperate?</p>
<p>In TU102, every 4 SMs share a segment of L1 cache, and all cores share the L2 cache. As on a CPU, after missing in L1 a core tries L2 and then global memory (GMEM). Note that how L1 is shared is controlled by software (the programmer), not hardware, whereas L2 and GMEM are managed by hardware. Besides, cores can also share data in registers.</p>
<p>The basic idea is that every thread holds registers for its temporary results, and a register can only be accessed by its own thread (or by threads in the same warp/group).</p>
<h2 id="references">References</h2>
<p><a href="https://zhuanlan.zhihu.com/p/598173226">Clarence's Zhihu</a><br />
<a href="https://medium.com/codex/understanding-the-architecture-of-a-gpu-d5d2d2e8978b">Understanding the architecture of a GPU</a></p>
]]></content>
<categories>
<category>Architecture</category>
</categories>
<tags>
<tag>Architecture</tag>
<tag>GPU</tag>
<tag>HPC</tag>
</tags>
</entry>
<entry>
<title>My First Contribution to LLVM</title>
<url>/2023/06/30/LLVM-First-Contribution/</url>
<content><![CDATA[<blockquote>
<p>Update 2023-09-21: LLVM patches have fully migrated to GitHub PRs, so the Phabricator-related steps in this post are <strong>outdated</strong>.</p>
</blockquote>
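<h2 id="为什么要参与llvm的开源">Why Contribute to LLVM?</h2>
<p>I have long been especially interested in compiler backends, and I once used <strong>LLVM</strong> as the backend for AOT compilation of my own language. That left me curious about LLVM's internals, so I decided to learn <strong>LLVM</strong>, and how compiler optimization works, by contributing code to it.</p>
<p>So I followed an article by an LLVM member: <a href="https://developers.redhat.com/articles/2022/12/20/how-contribute-llvm#implementing_the_transform">How to contribute to llvm?</a></p>
<p>What follows is my whole process, from building to submitting a patch.</p>
<h3 id="编译">Building</h3>
<p>To contribute code to LLVM, you first need to be able to build LLVM locally.</p>
<p>So we first clone the LLVM git repository, or fork <strong>llvm-project</strong> and clone the fork. The two differ little; following the usual GitHub open-source habit, I chose the latter.</p>
<p>After cloning, we start building. Note that because of build-speed limits, a <strong>Release</strong> build is generally recommended; otherwise a single build-and-link can take hours. Here is a cmake template:</p>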
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line">cmake -GNinja -Bbuild -Hllvm \</span><br><span class="line"> -DLLVM_ENABLE_PROJECTS="clang" \</span><br><span class="line"> -DLLVM_TARGETS_TO_BUILD="all" \</span><br><span class="line"> -DCMAKE_BUILD_TYPE=Release \</span><br><span class="line"> -DLLVM_ENABLE_ASSERTIONS=true \</span><br><span class="line"> -DLLVM_CCACHE_BUILD=true \</span><br><span class="line"> -DLLVM_USE_LINKER=lld</span><br></pre></td></tr></table></figure>
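<p>Debug output can be enabled with the <code>-debug</code> flag, and you can print from the relevant code location with <code>errs() << something</code>.</p>
<p>Ninja builds relatively fast, so here are the shell commands for building and testing:</p>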
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line"><span class="meta prompt_"># </span><span class="language-bash">Build LLVM</span></span><br><span class="line">ninja -Cbuild</span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">Run all LLVM tests</span></span><br><span class="line">ninja -Cbuild check-llvm</span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">Run tests <span class="keyword">in</span> a specific directory.</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">-v will <span class="built_in">print</span> additional information <span class="keyword">for</span> failures.</span></span><br><span class="line">build/bin/llvm-lit -v llvm/test/Transforms/InstCombine</span><br></pre></td></tr></table></figure>
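<h3 id="选issue">Picking an Issue</h3>
<p>As a newcomer to LLVM, I was unlikely to slay a dragon on day one, so I picked a simple task. <span id="more"></span> llvm-project contains many subprojects: LLVM itself, the Clang compiler, the LLD linker, the libc++ standard library, and many others. Even within LLVM itself there are different areas, mainly the middle-end optimizer, which works on the LLVM intermediate representation (IR), and the backends, which turn IR into machine code.</p>
<p>I know the middle end better, and middle-end optimization code has many corner cases that can be solved with a few simple lines of code. So this post focuses on <strong>InstCombine</strong>, a middle-end IR optimization, and the issue I picked is an <a href="https://github.com/llvm/llvm-project/issues?q=is%3Aopen+is%3Aissue+label%3Allvm%3Ainstcombine">InstCombine Issue</a>. Of course, LLVM has many other approachable issues, such as the <a href="https://github.com/llvm/llvm-project/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22">good first issues</a> and issues in Clang, Flang, clang-tidy, clang-format and other projects.</p>
<p>Here I will walk through one of my LLVM contributions: <a href="https://reviews.llvm.org/D154126/new/">D154126</a></p>
<p>The related <a href="https://github.com/llvm/llvm-project/issues/62586">Issue</a>.</p>
<h3 id="问题分析">Problem Analysis</h3>
<p>The problem reported in this issue: the optimization of <code>(a > b) | (a < b)</code> fails when <code>b == 0</code>.</p>
<p>In general, <code>(a > b) | (a < b)</code> is folded into <code>ZExt(a != b)</code>. The corresponding LLVM-IR is:</p>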
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">define i32 @src(i32 %A, i32 %B) {</span><br><span class="line">%1:</span><br><span class="line"> %2 = icmp sgt i32 %A, %B</span><br><span class="line"> %3 = zext i1 %2 to i32</span><br><span class="line"> %4 = icmp slt i32 %A, %B</span><br><span class="line"> %5 = zext i1 %4 to i32</span><br><span class="line"> %6 = or i32 %3, %5</span><br><span class="line"> ret i32 %6</span><br><span class="line">}</span><br><span class="line">=></span><br><span class="line">define i32 @tgt(i32 %A, i32 %B) {</span><br><span class="line">%1:</span><br><span class="line"> %2 = icmp ne i32 %A, %B</span><br><span class="line"> %3 = zext i1 %2 to i32</span><br><span class="line"> ret i32 %3</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>But for the <code>b == 0</code> case, InstCombine produces: <figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">define i32 @src(i32 %A) {</span><br><span class="line">%1:</span><br><span class="line"> %2 = icmp sgt i32 %A, 0</span><br><span class="line"> %3 = zext i1 %2 to i32</span><br><span class="line"> %4 = lshr i32 %A, 31</span><br><span class="line"> %5 = or i32 %4, %3</span><br><span class="line"> ret i32 %5</span><br><span class="line">}</span><br><span class="line">=></span><br><span class="line">define i32 @tgt(i32 %A) {</span><br><span class="line">%1:</span><br><span class="line"> %2 = icmp sgt i32 %A, 0</span><br><span class="line"> %3 = zext i1 %2 to i32 </span><br><span class="line"> %4 = lshr i32 %A, 31</span><br><span class="line"> %5 = or i32 %3, %4</span><br><span class="line"> ret i32 %5</span><br><span class="line">}</span><br></pre></td></tr></table></figure> In other words, in this case <code>A < 0</code> has already been rewritten as <code>A >> 31</code> (a logical shift right, <code>lshr</code>), which breaks the <strong>pattern matching</strong> that handles <code>A < B | A > B</code>.</p>
<p>Before analyzing how to fix this, let's look at the special rules for submitting patches to LLVM's middle-end optimizer.</p>
<p>An LLVM patch consists of two parts. The first part is the tests showing the missed optimization <strong>before</strong> the implementation; the second is the implementation itself together with the tests updated after applying it. Splitting the patch this way has two benefits:</p>
<ol type="1">
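<li>Comparing the tests before and after makes the effect of your optimization easy to see.</li>
<li>The tests can be submitted as a separate patch, which is a simple way to increase LLVM's test coverage.</li>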
</ol>
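<p>In addition, before submitting the patch you have to prove that your optimization is correct.</p>
<h4 id="证明transform的正确性">Proving the Transform Correct</h4>
<p>Generally, we use <a href="https://github.com/AliveToolkit/alive2">alive2</a> to verify that a transformation between two pieces of <strong>LLVM-IR</strong> is correct; there is also an <a href="https://alive2.llvm.org/ce/">online</a> version. The alive2 result for this issue is:</p>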
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">define i32 @src(i32 %0) {</span><br><span class="line">%1:</span><br><span class="line"> %2 = icmp sgt i32 %0, 0</span><br><span class="line"> %3 = zext i1 %2 to i32</span><br><span class="line"> %4 = lshr i32 %0, 31</span><br><span class="line"> %5 = or i32 %4, %3</span><br><span class="line"> ret i32 %5</span><br><span class="line">}</span><br><span class="line">=></span><br><span class="line">define i32 @tgt(i32 %0) {</span><br><span class="line">%1:</span><br><span class="line"> %2 = icmp ne i32 %0, 0</span><br><span class="line"> %3 = zext i1 %2 to i32</span><br><span class="line"> ret i32 %3</span><br><span class="line">}</span><br><span class="line">Transformation seems to be correct!</span><br></pre></td></tr></table></figure>
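<p>Although <strong>alive2</strong> is a very important tool for ensuring the correctness of LLVM transforms, note that it can produce <strong>false negative</strong> results (that is, it sometimes claims that an incorrect transform is correct). This usually happens in the context of loop optimizations and generally does not affect <strong>InstCombine</strong> optimizations.</p>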
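<p>For a fold this small, an exhaustive check over a narrow bit width is a cheap sanity test alongside alive2. The sketch below is only an illustration, not part of the actual patch: it models the source pattern <code>zext(x > 0) | lshr(x, shift)</code> and the target <code>zext(x != 0)</code> on Python integers and tries every signed value of a given width. All function names are made up.</p>

```python
def src(x, bits, shift):
    """zext(icmp sgt x, 0) | lshr(x, shift) for a signed `bits`-wide x."""
    unsigned = x & ((1 << bits) - 1)  # reinterpret as unsigned for lshr
    return int(x > 0) | (unsigned >> shift)


def tgt(x):
    """zext(icmp ne x, 0)."""
    return int(x != 0)


def find_counterexample(bits, shift):
    """First signed value where src and tgt disagree, or None if none exists."""
    for x in range(-(1 << (bits - 1)), 1 << (bits - 1)):
        if src(x, bits, shift) != tgt(x):
            return x
    return None


print(find_counterexample(8, 7))  # shift by bits - 1: the sign bit
print(find_counterexample(8, 6))  # any other shift amount
```

<p>Shifting by <code>bits - 1</code> yields no counterexample, while shifting by one bit less fails immediately, which is exactly the kind of negative case the tests need to cover.</p>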
<h4 id="测试">Tests</h4>
<p>Before writing the implementation, we need to build all the test cases.</p>
<p>First, the basic test case that should be transformed successfully: <figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">define i32 @icmp_slt_0_or_icmp_sgt_0_i32(i32 %x) {</span><br><span class="line">; CHECK-LABEL: @icmp_slt_0_or_icmp_sgt_0_i32(</span><br><span class="line">; CHECK-NEXT: [[B:%.*]] = icmp sgt i32 [[X:%.*]], 0</span><br><span class="line">; CHECK-NEXT: [[X_LOBIT:%.*]] = lshr i32 [[X]], 31</span><br><span class="line">; CHECK-NEXT: [[D:%.*]] = zext i1 [[B]] to i32</span><br><span class="line">; CHECK-NEXT: [[E:%.*]] = or i32 [[X_LOBIT]], [[D]]</span><br><span class="line">; CHECK-NEXT: ret i32 [[E]]</span><br><span class="line">;</span><br><span class="line"> %A = icmp slt i32 %x, 0</span><br><span class="line"> %B = icmp sgt i32 %x, 0</span><br><span class="line"> %C = zext i1 %A to i32</span><br><span class="line"> %D = zext i1 %B to i32</span><br><span class="line"> %E = or i32 %C, %D</span><br><span class="line"> ret i32 %E</span><br><span class="line">}</span><br></pre></td></tr></table></figure></p>
<p>Note that <strong>CHECK-LABEL</strong> is followed by the function name of the test case, and <strong>CHECK-NEXT</strong> by the IR expected after the transform; if the expectation is not met, the test run reports a failure. The test here reflects the state before the optimization is implemented, so the <strong>CHECK</strong> lines are naturally unoptimized too. The <strong>CHECK</strong> lines need not be typed by hand; an LLVM script can generate them automatically:</p>
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line">llvm/utils/update_test_checks.py --opt-bin build/bin/opt \</span><br><span class="line"> llvm/test/Transforms/InstCombine/and-or-icmps.ll</span><br></pre></td></tr></table></figure>
<p>This script runs <strong>InstCombine</strong> on every test case in <code>and-or-icmps.ll</code> and inserts the resulting IR into the file as <strong>CHECK</strong> lines.</p>
<p>The test case above only covers the basic i32 type, so we also add an i64 variant:</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">define i64 @icmp_slt_0_or_icmp_sgt_0_i64(i64 %x) {</span><br><span class="line"> %A = icmp slt i64 %x, 0</span><br><span class="line"> %B = icmp sgt i64 %x, 0</span><br><span class="line"> %C = zext i1 %A to i64</span><br><span class="line"> %D = zext i1 %B to i64</span><br><span class="line"> %E = or i64 %C, %D</span><br><span class="line"> ret i64 %E</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>Beyond that, we need some negative tests (for example, changing the shift amount, or turning the greater-than into a less-than) to make sure the transform does not fire where it should not. One example: <figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">define i64 @icmp_slt_0_or_icmp_sgt_0_i64_fail2(i64 %x) {</span><br><span class="line">; CHECK-LABEL: @icmp_slt_0_or_icmp_sgt_0_i64_fail2(</span><br><span class="line">; CHECK-NEXT: [[B:%.*]] = icmp sgt i64 [[X:%.*]], 0</span><br><span class="line">; CHECK-NEXT: [[C:%.*]] = lshr i64 [[X]], 62</span><br><span class="line">; CHECK-NEXT: [[D:%.*]] = zext i1 [[B]] to i64</span><br><span class="line">; CHECK-NEXT: [[E:%.*]] = or i64 [[C]], [[D]]</span><br><span class="line">; CHECK-NEXT: ret i64 [[E]]</span><br><span class="line">;</span><br><span class="line"> %B = icmp sgt i64 %x, 0</span><br><span class="line"> %C = lshr i64 %x, 62</span><br><span class="line"> %D = zext i1 %B to i64</span><br><span class="line"> %E = or i64 %C, %D</span><br><span class="line"> ret i64 %E</span><br><span class="line">}</span><br></pre></td></tr></table></figure></p>
<p>Finally, we should probably also cover vector types, with a test like this:</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">define <2 x i64> @icmp_slt_0_or_icmp_sgt_0_i64x2(<2 x i64> %x) {</span><br><span class="line"> %A = icmp slt <2 x i64> %x, <i64 0,i64 0></span><br><span class="line"> %B = icmp sgt <2 x i64> %x, <i64 0,i64 0></span><br><span class="line"> %C = zext <2 x i1> %A to <2 x i64></span><br><span class="line"> %D = zext <2 x i1> %B to <2 x i64></span><br><span class="line"> %E = or <2 x i64> %C, %D</span><br><span class="line"> ret <2 x i64> %E</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
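<p>Once these test cases are done, we make a commit.</p>
<h4 id="实现">Implementation</h4>
<p>Now we finally reach the implementation. Before implementing anything, we need to do some analysis and debugging.</p>
<p>Here we debug with <code>build/bin/opt -passes=instcombine -S -debug src.ll</code> and insert print statements into various functions, using the output to locate the code responsible for the optimization.</p>
<p>After some investigation, we find the reason the <code>b == 0</code> case is not optimized: the <strong>transformZExtICmp</strong> function in <strong>InstCombineCasts.cpp</strong> turns <code>ZExt(a < 0)</code> into the logical right shift <code>a >> 31</code>.</p>
<p>The function that optimizes <code>a < b | a > b</code>, <strong>foldAndOrOfICmpsUsingRanges</strong>, does not recognize a form like <code>a >> 31</code>, so it cannot optimize it. Since I am not entirely clear about the order in which InstCombine applies its folds, I chose to add a match for <code>Zext(a > 0) | a >> 31</code> in <strong>foldCastedBitwiseLogic</strong> and perform the corresponding fold there. The code is as follows:</p>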
<figure class="highlight cpp"><table><tr><td class="code"><pre><span class="line"><span class="comment">// ( A >> (X - 1) ) | ((A > 0) zext to iX), where >> is a logical shift</span></span><br><span class="line"><span class="comment">// <=> A < 0 | A > 0</span></span><br><span class="line"><span class="comment">// <=> (A != 0) zext to iX</span></span><br><span class="line">Value *A;</span><br><span class="line">ICmpInst::Predicate Pred;</span><br><span class="line"></span><br><span class="line"><span class="keyword">auto</span> MatchOrZExtICmp = [&](Value *Op0, Value *Op1) -> <span class="type">bool</span> {</span><br><span class="line"><span class="keyword">return</span> <span class="built_in">match</span>(Op0, <span class="built_in">m_LShr</span>(<span class="built_in">m_Value</span>(A), <span class="built_in">m_SpecificInt</span>(Op0-><span class="built_in">getType</span>()-><span class="built_in">getScalarSizeInBits</span>() - <span class="number">1</span>))) &&</span><br><span class="line"> <span class="built_in">match</span>(Op1, <span class="built_in">m_ZExt</span>(<span class="built_in">m_ICmp</span>(Pred, <span class="built_in">m_Specific</span>(A), <span class="built_in">m_Zero</span>())));</span><br><span class="line">};</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> (LogicOpc == Instruction::Or &&</span><br><span class="line"> (<span class="built_in">MatchOrZExtICmp</span>(Op0, Op1) || <span class="built_in">MatchOrZExtICmp</span>(Op1, Op0)) &&</span><br><span class="line"> Pred == ICmpInst::ICMP_SGT) {</span><br><span class="line"> Value *Cmp =</span><br><span class="line"> Builder.<span class="built_in">CreateICmpNE</span>(A, Constant::<span class="built_in">getNullValue</span>(A-><span class="built_in">getType</span>()));</span><br><span class="line"> <span class="keyword">return</span> <span class="keyword">new</span> <span class="built_in">ZExtInst</span>(Cmp, A-><span class="built_in">getType</span>());</span><br><span 
class="line">}</span><br></pre></td></tr></table></figure>
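<p>Here we define a lambda, <code>MatchOrZExtICmp</code>, that matches the logical right shift and the zext; <code>Op0</code> and <code>Op1</code> are the two operands of the <code>or</code>.</p>
<p>Functions and classes such as <code>match</code> and <code>m_ZExt</code> come from LLVM's <strong>PatternMatch</strong> library. It provides a set of functions and template classes for matching specific LLVM IR patterns; <code>m_SpecificInt</code>, for example, matches a specific integer or a vector whose elements are all that integer (a <strong>splat vector</strong>).</p>
<p>Note that <code>getScalarSizeInBits</code> returns the bit width of the integer for integer types, and the bit width of the element type for vectors.</p>
<p>With the implementation in place, we need to update the test cases again to confirm the effect of the optimization, so we rerun:</p>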
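<p>To make the shape of the match concrete without depending on LLVM, here is a minimal sketch on a toy expression tree. Everything in it (<code>Node</code>, <code>matchSignBitShift</code>, <code>matchZExtSgtZero</code>, <code>matchesFold</code>) is a hypothetical stand-in invented for illustration; it mirrors the structure of <code>MatchOrZExtICmp</code> but is not the PatternMatch API.</p>

```cpp
// Toy IR node; a stand-in for llvm::Value, not the real class.
struct Node {
  enum Kind { Arg, Const, LShr, SGT, ZExt, Or };
  Kind kind;
  long value;       // used by Const only
  const Node *lhs;  // first operand (the only one for ZExt)
  const Node *rhs;  // second operand
  Node(Kind k, long v = 0, const Node *l = nullptr, const Node *r = nullptr)
      : kind(k), value(v), lhs(l), rhs(r) {}
};

// Matches lshr(A, bits - 1) and binds A on success.
inline bool matchSignBitShift(const Node *n, unsigned bits, const Node *&a) {
  if (n->kind != Node::LShr || n->rhs->kind != Node::Const ||
      n->rhs->value != static_cast<long>(bits) - 1)
    return false;
  a = n->lhs;
  return true;
}

// Matches zext(icmp sgt A, 0) for an already bound A.
inline bool matchZExtSgtZero(const Node *n, const Node *a) {
  if (n->kind != Node::ZExt || n->lhs->kind != Node::SGT)
    return false;
  const Node *cmp = n->lhs;
  return cmp->lhs == a && cmp->rhs->kind == Node::Const && cmp->rhs->value == 0;
}

// Tries both operand orders of the `or`, as the lambda is called twice above.
inline bool matchesFold(const Node *orNode, unsigned bits) {
  if (orNode->kind != Node::Or)
    return false;
  const Node *a = nullptr;
  return (matchSignBitShift(orNode->lhs, bits, a) &&
          matchZExtSgtZero(orNode->rhs, a)) ||
         (matchSignBitShift(orNode->rhs, bits, a) &&
          matchZExtSgtZero(orNode->lhs, a));
}
```

<p>The key design point carried over from the real code is that the shift amount is compared against the bit width minus one, so a shift by any other amount (as in the negative test) is rejected.</p>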
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line">llvm/utils/update_test_checks.py --opt-bin build/bin/opt \</span><br><span class="line"> llvm/test/Transforms/InstCombine/and-or-icmps.ll</span><br></pre></td></tr></table></figure>
<p>We can now see that the <strong>CHECK</strong> lines of the positive test have changed:</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">define i32 @icmp_slt_0_or_icmp_sgt_0_i32(i32 %x) {</span><br><span class="line">; CHECK-LABEL: @icmp_slt_0_or_icmp_sgt_0_i32(</span><br><span class="line">; CHECK-NEXT: [[TMP1:%.*]] = icmp ne i32 [[X:%.*]], 0</span><br><span class="line">; CHECK-NEXT: [[E:%.*]] = zext i1 [[TMP1]] to i32</span><br><span class="line">; CHECK-NEXT: ret i32 [[E]]</span><br><span class="line">;</span><br><span class="line"> %A = icmp slt i32 %x, 0</span><br><span class="line"> %B = icmp sgt i32 %x, 0</span><br><span class="line"> %C = zext i1 %A to i32</span><br><span class="line"> %D = zext i1 %B to i32</span><br><span class="line"> %E = or i32 %C, %D</span><br><span class="line"> ret i32 %E</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
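<p>The changes in the other test cases also match our expectations, so we commit again.</p>
<p>We can now move on to submitting the patch.</p>
<h3 id="提交patch">Submitting the Patch</h3>
<p>We now have two <strong>commits</strong>, and the following commands generate the patch files for the tests and the implementation.</p>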
<figure class="highlight shell"><table><tr><td class="code"><pre><span class="line">git show -U99999 HEAD^ > patch_test</span><br><span class="line">git show -U99999 > patch_transform</span><br></pre></td></tr></table></figure>
<p>At the time of writing, LLVM does not accept GitHub PRs; patches can only be submitted on <a href="https://reviews.llvm.org/">Phabricator</a>. So I registered a Phabricator account and uploaded my two patches separately via <a href="https://reviews.llvm.org/differential/diff/create/">Create Diff</a>.</p>
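<p>For the format of the patch title and summary, see the reference article linked at the beginning of this post. A machine-translated and adapted excerpt follows:</p>
<blockquote>
<p>Choose a meaningful patch title and summary. For our running example, the first patch might look like this:</p>
<p>title: [InstCombine] Add tests for (A > 0) | (A < 0) -> zext (A != 0) fold (NFC)</p>
<p>summary: Tests for an upcoming (A > 0) | (A < 0) -> zext (A != 0) fold.</p>
<p>reviewer: (see below)</p>
<p>The second patch might look like this:</p>
<p>title: [InstCombine] Transform (A > 0) | (A < 0) -> zext (A != 0) fold</p>
<p>summary: [InstCombine] Transform (A > 0) | (A < 0) -> zext (A != 0) fold</p>
<p>This extends foldCastedBitwiseLogic to handle the similar cases.</p>
<p>......your analysis......</p>
<p>It's proved by alive-tv: <strong>link</strong></p>
<p>Depends on DNNNNNN (put the ID of the first patch here).</p>
<p>reviewer: (see below)</p>
<p>A few things are worth highlighting:</p>
<p>The title should start with a [Category] tag. Usually you can simply use the name of the file you are modifying. For example, changes to InstCombine are usually tagged [InstCombine].</p>
<p>Patches with non-functional changes (such as test additions) usually carry an NFC tag somewhere in the title.</p>
<p>If you have any alive2 proofs, include them in the patch summary.</p>
<p>You can use "Depends on DNNNNNN" to create stacked patches. You can also add a "child revision" after the fact to achieve the same thing.</p>
</blockquote>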
<hr />
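<p>What we still need are <strong>Reviewers</strong>. In LLVM, the patch author is responsible for choosing appropriate reviewers. Someone may find your patch based on its title (which is why the category tag matters so much), but it is best to specify suitable reviewers from the start.</p>
<p>LLVM does have a CODE_OWNERS.txt file that lists the code owners for different areas, but unfortunately it tends to be outdated and incomplete. A better way to find reviewers is to look at the Git history of the files you are modifying and add a few people who recently committed to them or recently reviewed diff revisions for them.</p>
<p>For InstCombine, the main reviewer is spatel, but the history will also turn up several other candidates (for example nikic and goldstein.w.n).</p>
<p>After submitting the patch, it is time to wait for review. For a change this simple, someone will usually pick it up quickly. If you have not received a response within a week, post a "ping" comment, and repeat once a week. For InstCombine it is quite unusual to wait several weeks for a review, but it can happen if your change touches an area nobody has actively worked on for a long time. Just keep pinging.</p>
<p>Finally, once the patch is approved, reviewers will usually assume that you already have commit access and let you land the change yourself. If that is not the case, follow up with a comment such as "I don't have commit access, can you please land this for me? Please use 'Your Name <a href="mailto:your@email" class="email">your@email</a>' for the commit". The last part is important, because Phabricator loses the author information of the patch and the committer has to add it back.</p>
<p>If you plan to contribute to LLVM on any kind of regular basis, requesting commit access is recommended. The bar for this is very low, so you can ask early. The test pre-commit workflow is much more convenient if you do not have to create stacked reviews.</p>
<p>Lastly, a few words about CI: patches on Phabricator are run through "pre-merge" testing. These results can be helpful, especially if you have not run the full test suite locally. Unfortunately, these test runs are somewhat flaky, so if you see failures with no obvious relation to your patch, you can usually ignore them.</p>
<p>Once the patch is committed, it will run on the much broader set of "buildbots", which run tests on many different architectures and in many different configurations. These are also fairly flaky, so the same applies: if you receive a buildbot failure email that looks unrelated to your patch, there is no need to worry. If it does turn out to be your fault, the buildbot owner will let you know.</p>
<h3 id="总结">Summary</h3>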
<blockquote>
<p>Summary translated from the reference article</p>
</blockquote>
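<p>The LLVM contribution process has some unusual aspects that set it apart from other open-source projects. Part of this is the use of Phabricator rather than GitHub for reviews, but most of the differences revolve around a strong emphasis on correctness: starting with correctness proofs, continuing with the test pre-commit workflow, and ending with what is often a very large ratio of tests to code changes.</p>
<p>I hope this article is helpful to people who want to get into LLVM development, but I want to reiterate that you do not have to get everything "right" the first time, and people will be happy to help if you run into problems. The beginner category on Discourse and the Discord chat are good places to ask questions.</p>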
<blockquote>
<p>My own summary</p>
</blockquote>
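<p>Contributing to a large open-source project for the first time was a special experience. Through the ongoing discussion with reviewers, I also gained a deeper understanding of how LLVM is organized. I hope that after reading this post, readers will take a more active part in open source as well.</p>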
]]></content>
<categories>
<category>LLVM</category>
</categories>
<tags>
<tag>Compiler</tag>
<tag>LLVM</tag>
<tag>OpenSource</tag>
</tags>
</entry>
<entry>
<title>LLVM Documentation</title>
<url>/2023/10/30/LLVM-Docs/</url>
<content><![CDATA[<p><a href="https://llvm.org/docs/LangRef.html">LangRef</a><br />
<a href="https://llvm.org/docs/LoopTerminology.html">Loop Terminology</a><br />
<a href="https://llvm.org/docs/MemorySSA.html">MemorySSA</a><br />
<a href="https://llvm.org/docs/Reference.html">Reference Guide</a><br />
<a href="https://llvm.org/docs/Passes.html">Current Passes</a><br />
<span id="more"></span></p>
]]></content>
<categories>
<category>LLVM</category>
</categories>
<tags>
<tag>Compiler</tag>
<tag>LLVM</tag>
<tag>OpenSource</tag>
</tags>
</entry>
<entry>
<title>LLVM Source Code Analysis: EarlyCSE</title>
<url>/2023/10/08/LLVM-Source-Analysis-EarlyCSE/</url>
<content><![CDATA[<h2 id="abstract">Abstract</h2>
<p><strong>Common sub-expression elimination (CSE)</strong> is an important compiler optimization, closely related to partial redundancy elimination.<br />
CSE eliminates expressions that are identical or semantically equivalent to an earlier one, taking into account operator properties such as commutativity and associativity.<br />
LLVM's <strong>EarlyCSE</strong> pass is one implementation of CSE. The "Early" in <strong>EarlyCSE</strong> signals that the pass is simple and fast, so it can be run at any stage of the pipeline that needs it.</p>
<span id="more"></span>
<h2 id="a-top-down-view">A Top-Down View</h2>
<p>EarlyCSE visits every BasicBlock exactly once, in depth-first order over the dominator tree. This guarantees that every expression recorded while visiting earlier blocks on the current path <strong>dominates</strong> the expressions in the block being visited.</p>
<p>In addition, EarlyCSE tags every node (BasicBlock) with a generation number for memory instructions, because memory operations in LLVM are not in SSA form and have to be tracked by other means. Every time we reach a merge point (the current BB has more than one predecessor), the generation is incremented by one.</p>
<blockquote>
<p>If this block has a single predecessor, then the predecessor is the parent of the domtree node and all of the live out memory values are still current in this block. If this block has multiple predecessors, then they could have invalidated the live-out memory values of our parent value. For now, just be conservative and invalidate memory if this block has multiple predecessors.</p>
</blockquote>
<p>Then, in the <code>processNode</code> function, we handle the most important case, the one covered by "SimpleValue". We maintain a hash table called "AvailableValues". When we encounter an instruction, we look up its hash value in this table. If there is no entry with that hash, we insert the instruction. Otherwise, we check whether the instructions with the same hash are actually equivalent; if they are, we replace the later instruction with the earlier one, which sits higher in the dom-tree.</p>
<p>This handles most SSA values. Memory operations are discussed later.</p>
<h2 id="how-is-the-available-values-maintained">How are the available values maintained?</h2>
<p>While walking the dom-tree in DFS order, EarlyCSE maintains a scoped map together with an explicit stack (emulating the call stack of a recursive traversal). On entering a <em>BB</em>, it pushes a node onto the stack and inserts the hashes of the relevant instructions in that <em>BB</em>. On leaving the <em>BB</em>, it pops the node and erases those entries again.</p>
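<p>The scoping behaviour can be sketched in a few lines of standalone C++. This is a simplified illustration under stated assumptions, not EarlyCSE's code: <code>DomNode</code>, <code>ScopedTable</code> and <code>countRedundant</code> are hypothetical names, expressions are represented as plain strings, and the traversal is recursive rather than using an explicit stack.</p>

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Toy dominator-tree node: each block lists expression keys such as "add a b".
struct DomNode {
  std::vector<std::string> exprs;
  std::vector<DomNode *> children;
};

// Minimal scoped table: keys inserted inside a scope vanish when it closes.
class ScopedTable {
  std::unordered_set<std::string> live;
  std::vector<std::vector<std::string>> scopes; // keys inserted per open scope

public:
  void enterScope() { scopes.emplace_back(); }

  void exitScope() {
    for (const std::string &key : scopes.back())
      live.erase(key);
    scopes.pop_back();
  }

  // Returns true if the key was already available, i.e. the expression is
  // redundant; otherwise records it in the innermost scope.
  bool lookupOrInsert(const std::string &key) {
    if (live.count(key))
      return true;
    live.insert(key);
    scopes.back().push_back(key);
    return false;
  }
};

// Walks the dominator tree depth-first and counts redundant expressions.
inline int countRedundant(const DomNode *node, ScopedTable &table) {
  table.enterScope();
  int redundant = 0;
  for (const std::string &e : node->exprs)
    if (table.lookupOrInsert(e))
      ++redundant;
  for (const DomNode *child : node->children)
    redundant += countRedundant(child, table);
  table.exitScope();
  return redundant;
}
```

<p>Because a child's entries are erased when its scope closes, an expression computed in one sibling is never reused in another sibling, which matches the domination requirement described above.</p>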
<h2 id="how-is-lookup-implemented">How is lookup implemented?</h2>
<p>Let's take a look at <code>getHashValueImpl</code> of SimpleValue:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function"><span class="type">static</span> <span class="type">unsigned</span> <span class="title">getHashValueImpl</span><span class="params">(SimpleValue Val)</span> </span>{</span><br><span class="line"> Instruction *Inst = Val.Inst;</span><br><span class="line"> <span class="comment">// Hash in all of the operands as pointers.</span></span><br><span class="line"> <span class="keyword">if</span> (BinaryOperator *BinOp = <span class="built_in">dyn_cast</span><BinaryOperator>(Inst)) {</span><br><span class="line"> Value *LHS = BinOp-><span class="built_in">getOperand</span>(<span class="number">0</span>);</span><br><span class="line"> Value *RHS = BinOp-><span class="built_in">getOperand</span>(<span class="number">1</span>);</span><br><span class="line"> <span class="keyword">if</span> (BinOp-><span class="built_in">isCommutative</span>() && BinOp-><span class="built_in">getOperand</span>(<span class="number">0</span>) > BinOp-><span class="built_in">getOperand</span>(<span class="number">1</span>))</span><br><span class="line"> std::<span class="built_in">swap</span>(LHS, RHS);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> <span class="built_in">hash_combine</span>(BinOp-><span class="built_in">getOpcode</span>(), LHS, RHS);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (CmpInst *CI = <span class="built_in">dyn_cast</span><CmpInst>(Inst)) {</span><br><span class="line"> <span class="comment">// Compares can be commuted by swapping the comparands and</span></span><br><span class="line"> <span class="comment">// updating the predicate. 
Choose the form that has the</span></span><br><span class="line"> <span class="comment">// comparands in sorted order, or in the case of a tie, the</span></span><br><span class="line"> <span class="comment">// one with the lower predicate.</span></span><br><span class="line"> Value *LHS = CI-><span class="built_in">getOperand</span>(<span class="number">0</span>);</span><br><span class="line"> Value *RHS = CI-><span class="built_in">getOperand</span>(<span class="number">1</span>);</span><br><span class="line"> CmpInst::Predicate Pred = CI-><span class="built_in">getPredicate</span>();</span><br><span class="line"> CmpInst::Predicate SwappedPred = CI-><span class="built_in">getSwappedPredicate</span>();</span><br><span class="line"> <span class="keyword">if</span> (std::<span class="built_in">tie</span>(LHS, Pred) > std::<span class="built_in">tie</span>(RHS, SwappedPred)) {</span><br><span class="line"> std::<span class="built_in">swap</span>(LHS, RHS);</span><br><span class="line"> Pred = SwappedPred;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> <span class="built_in">hash_combine</span>(Inst-><span class="built_in">getOpcode</span>(), Pred, LHS, RHS);</span><br><span class="line"> }</span><br><span class="line"> ....</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>As we can see, <span class="math inline">\(hash(binop) = hash(opcode, lhs, rhs)\)</span>, where <span class="math inline">\(lhs\)</span> is the pointer to the left operand and <span class="math inline">\(rhs\)</span> is the pointer to the right operand (swapped into a canonical order for commutative operators). In other words, a single pass only eliminates instructions whose operands are the very same <strong>references/pointers</strong>, not merely operands that happen to compute equal results.</p>
<p>Because of the DFS order over the dom-tree, two identical <span class="math inline">\(op(a,b)\)</span> in BB1 and BB2 can only be merged when BB1 dominates BB2 or BB2 dominates BB1. <em>GVN</em> can handle the remaining cases thanks to its <em>RPO</em> iteration order, at a higher <strong>cost</strong>.</p>
<p>Besides, IR flags such as <code>nsw, nuw</code> are ignored when hashing and comparing, so two instructions that differ only in these flags can still be matched.</p>
<p>With such a simple implementation, EarlyCSE is <strong>cheap</strong>, running in <span class="math inline">\(O(n)\)</span> time, but <strong>less effective</strong> than <em>GVN</em>.</p>
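<p>The canonical-ordering trick for commutative operators can be shown in isolation. The following is a minimal sketch with hypothetical names (<code>BinOp</code>, <code>canonicalOperands</code>, <code>hashBinOp</code>, <code>sameBinOp</code>) and a simple hand-rolled hash combination; it does not use LLVM's <code>hash_combine</code>.</p>

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Toy binary operation: operands are identified by address, like SSA values.
struct BinOp {
  int opcode;
  const void *lhs;
  const void *rhs;
  bool commutative;
};

// Puts the operands of a commutative operation into a canonical order so that
// op(a, b) and op(b, a) look the same to the hash and to the comparison.
inline std::pair<const void *, const void *> canonicalOperands(const BinOp &op) {
  const void *l = op.lhs;
  const void *r = op.rhs;
  if (op.commutative && std::less<const void *>()(r, l))
    std::swap(l, r);
  return std::make_pair(l, r);
}

// Combines the opcode and the canonicalized operand pointers into one hash.
inline std::size_t hashBinOp(const BinOp &op) {
  std::pair<const void *, const void *> ops = canonicalOperands(op);
  std::size_t h = std::hash<int>()(op.opcode);
  h = h * 31 + std::hash<const void *>()(ops.first);
  h = h * 31 + std::hash<const void *>()(ops.second);
  return h;
}

// Equality that agrees with hashBinOp: same opcode, same canonical operands.
inline bool sameBinOp(const BinOp &a, const BinOp &b) {
  return a.opcode == b.opcode && canonicalOperands(a) == canonicalOperands(b);
}
```

<p>Note that equality is still decided on operand identity: two different values that happen to hold the same number are not considered equal here, just as described above.</p>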
<h2 id="ignorecombine-ir-flag">Ignore/Combine IR flag</h2>
<p>When hashing instructions, flags such as <code>nsw, nuw</code> are always ignored. For <strong>memory instructions</strong>, however, properties such as the matching id and atomicity are stored alongside the recorded value:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line">AvailableLoads.<span class="built_in">insert</span>(MemInst.<span class="built_in">getPointerOperand</span>(),</span><br><span class="line"> <span class="built_in">LoadValue</span>(&Inst, CurrentGeneration,</span><br><span class="line"> MemInst.<span class="built_in">getMatchingId</span>(),</span><br><span class="line"> MemInst.<span class="built_in">isAtomic</span>(),</span><br><span class="line"> MemInst.<span class="built_in">isLoad</span>()));</span><br></pre></td></tr></table></figure>
<h2 id="memory-cse">Memory CSE</h2>
<p>EarlyCSE eliminates memory operations mostly with the help of the <em>MemorySSA</em> analysis, and it also tracks a <strong>generation</strong> number for the memory state. The generation is a counter that is bumped whenever previously observed memory values may no longer be valid, for example at a block with multiple predecessors or at a volatile or ordered load.</p>
<p>If the generations of two memory operations differ, we cannot assume that they observe the same memory state, since the memory values live out of the parent could have been invalidated by one of multiple predecessors.</p>
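<p>The generation check can be illustrated with a small standalone model. This is a sketch under assumptions, not the pass's data structure: <code>LoadRecord</code> and <code>ToyLoadTracker</code> are hypothetical names, pointers are modelled as strings, and loaded values as plain integers.</p>

```cpp
#include <string>
#include <unordered_map>

// A remembered load: the value it produced and the generation it was seen in.
struct LoadRecord {
  int value;
  unsigned generation;
};

class ToyLoadTracker {
  std::unordered_map<std::string, LoadRecord> available; // keyed by pointer name
  unsigned currentGeneration = 0;

public:
  // Anything that may invalidate memory (a possible write, a merge point)
  // starts a new generation.
  void clobber() { ++currentGeneration; }

  // Remembers the value loaded from `ptr` in the current generation.
  void recordLoad(const std::string &ptr, int value) {
    available[ptr] = LoadRecord{value, currentGeneration};
  }

  // A previous load may be reused only if no clobber has happened since.
  bool canReuse(const std::string &ptr, int &value) const {
    auto it = available.find(ptr);
    if (it == available.end() || it->second.generation != currentGeneration)
      return false;
    value = it->second.value;
    return true;
  }
};
```

<p>The counter makes invalidation cheap: instead of clearing the table on every possible write, stale entries are simply rejected at lookup time.</p>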
<p>In <code>processNode</code> function, EarlyCSE handles some trivial dead store elimination.</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="comment">/// LastStore - Keep track of the last non-volatile store that we saw... for</span></span><br><span class="line"><span class="comment">/// as long as there in no instruction that reads memory. If we see a store</span></span><br><span class="line"><span class="comment">/// to the same location, we delete the dead store. This zaps trivial dead</span></span><br><span class="line"><span class="comment">/// stores which can occur in bitfield code among other things.</span></span><br><span class="line">Instruction *LastStore = <span class="literal">nullptr</span>;</span><br></pre></td></tr></table></figure>
<p>For non-trivial memory operations, EarlyCSE applies specific methods. Let's take a look at its implementation after lookup:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="function">ParseMemoryInst <span class="title">MemInst</span><span class="params">(&Inst, TTI)</span></span>;</span><br><span class="line"><span class="comment">// If this is a non-volatile load, process it.</span></span><br><span class="line"><span class="keyword">if</span> (MemInst.<span class="built_in">isValid</span>() && MemInst.<span class="built_in">isLoad</span>()) {</span><br><span class="line"> <span class="keyword">if</span> (MemInst.<span class="built_in">isVolatile</span>() || !MemInst.<span class="built_in">isUnordered</span>()) {</span><br><span class="line"> LastStore = <span class="literal">nullptr</span>;</span><br><span class="line"> ++CurrentGeneration;</span><br><span class="line"> }</span><br></pre></td></tr></table></figure>
<p>Here we drop the last store, because a volatile or ordered memory operation means that store can no longer be eliminated.</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"></span><br><span class="line"><span class="keyword">if</span> (MemInst.<span class="built_in">isInvariantLoad</span>()) {</span><br><span class="line"> <span class="comment">// If we pass an invariant load, we know that memory location is</span></span><br><span class="line"> <span class="comment">// indefinitely constant from the moment of first dereferenceability.</span></span><br><span class="line"> <span class="comment">// We conservatively treat the invariant_load as that moment. If we</span></span><br><span class="line"> <span class="comment">// pass a invariant load after already establishing a scope, don't</span></span><br><span class="line"> <span class="comment">// restart it since we want to preserve the earliest point seen.</span></span><br><span class="line"> <span class="keyword">auto</span> MemLoc = MemoryLocation::<span class="built_in">get</span>(&Inst);</span><br><span class="line"> <span class="keyword">if</span> (!AvailableInvariants.<span class="built_in">count</span>(MemLoc))</span><br><span class="line"> AvailableInvariants.<span class="built_in">insert</span>(MemLoc, CurrentGeneration);</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>For an invariant load, the contents of its <em>memory location</em> stay <em>invariant</em> from then on, so we keep the earliest such load to maximize its scope.</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="comment">// If we have an available version of this load, and if it is the right</span></span><br><span class="line"><span class="comment">// generation or the load is known to be from an invariant location,</span></span><br><span class="line"><span class="comment">// replace this instruction.</span></span><br><span class="line"><span class="comment">//</span></span><br><span class="line"><span class="comment">// If either the dominating load or the current load are invariant, then</span></span><br><span class="line"><span class="comment">// we can assume the current load loads the same value as the dominating</span></span><br><span class="line"><span class="comment">// load.</span></span><br><span class="line">LoadValue InVal = AvailableLoads.<span class="built_in">lookup</span>(MemInst.<span class="built_in">getPointerOperand</span>());</span><br><span class="line"><span class="keyword">if</span> (Value *Op = <span class="built_in">getMatchingValue</span>(InVal, MemInst, CurrentGeneration)) {</span><br><span class="line"> <span class="comment">// Something related to debug information</span></span><br><span class="line"> <span class="keyword">if</span> (InVal.IsLoad)</span><br><span class="line"> <span class="keyword">if</span> (<span class="keyword">auto</span> *I = <span class="built_in">dyn_cast</span><Instruction>(Op))</span><br><span class="line"> <span class="built_in">combineMetadataForCSE</span>(I, &Inst, <span class="literal">false</span>);</span><br><span class="line"> <span class="keyword">if</span> (!Inst.<span class="built_in">use_empty</span>())</span><br><span class="line"> Inst.<span class="built_in">replaceAllUsesWith</span>(Op);</span><br><span class="line"> <span class="comment">// Something related to updating analysis and debug information</span></span><br><span class="line"> Inst.<span class="built_in">eraseFromParent</span>();</span><br><span 
class="line"> Changed = <span class="literal">true</span>;</span><br><span class="line"> ++NumCSELoad;</span><br><span class="line"> <span class="keyword">continue</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>This is similar to the SimpleValue case, except that the matching value is obtained with the help of <em>MemorySSA</em>.</p>
<h2 id="difference-between-gvn-and-earlycse">Difference between GVN and EarlyCSE</h2>
<p>To be continued</p>
]]></content>
<categories>
<category>LLVM</category>
</categories>
<tags>
<tag>Compiler</tag>
<tag>LLVM</tag>
<tag>OpenSource</tag>
</tags>
</entry>
<entry>
<title>LLVM Source Code Analysis: Interval Analysis</title>
<url>/2023/09/21/LLVM-Source-Analysis-Interval/</url>
<content><![CDATA[<h2 id="abstract">Abstract</h2>
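<p>This is my first blog post dedicated to reading LLVM source code. I have recently been studying compiler optimization with the Whale Book (Muchnick's <em>Advanced Compiler Design and Implementation</em>), so this series is a good chance to combine theory and practice.</p>
<p>Interval analysis is a form of control flow analysis, often used as the basis for other optimizations such as LoopUnroll.</p>
<p>Let us start with the code of the Interval class. In compiler theory, an interval is generally a set of nodes in which every <span class="math inline">\(Node \ne Head\)</span> satisfies <span class="math inline">\(Pred(Node) \subset Interval\)</span>:</p>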
<span id="more"></span>
<h2 id="interval-类">The Interval Class</h2>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">Interval</span> {</span><br><span class="line"> <span class="comment">/// HeaderNode - The header BasicBlock, which dominates all BasicBlocks in this</span></span><br><span class="line"> <span class="comment">/// interval. Also, any loops in this interval must go through the HeaderNode.</span></span><br><span class="line"> <span class="comment">///</span></span><br><span class="line"> BasicBlock *HeaderNode;</span><br></pre></td></tr></table></figure>
<p>Here the HeaderNode dominates every BasicBlock (node) in the interval and serves as the representative of the interval.</p>
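<p>The defining property of an interval (every node other than the header has all of its predecessors inside the interval) can be checked on a toy CFG. The sketch below is only an illustration: <code>PredMap</code> and <code>isValidInterval</code> are hypothetical names, and blocks are modelled as integer ids instead of <code>BasicBlock*</code>.</p>

```cpp
#include <map>
#include <set>
#include <vector>

// Toy CFG: block id -> list of predecessor block ids.
using PredMap = std::map<int, std::vector<int>>;

// Checks that `header` is in `interval` and that every other block in
// `interval` has all of its predecessors inside `interval`.
inline bool isValidInterval(const PredMap &preds, const std::set<int> &interval,
                            int header) {
  if (!interval.count(header))
    return false;
  for (int block : interval) {
    if (block == header)
      continue; // the header may be entered from outside
    PredMap::const_iterator it = preds.find(block);
    if (it == preds.end())
      continue; // no recorded predecessors
    for (int p : it->second)
      if (!interval.count(p))
        return false;
  }
  return true;
}
```

<p>For the diamond-shaped CFG 0 → {1, 2} → 3, the set {0, 1, 2, 3} with header 0 passes the check, while {1, 3} with header 1 fails because block 3 also has predecessor 2 outside the set.</p>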
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"></span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"><span class="function"><span class="keyword">inline</span> <span class="title">Interval</span><span class="params">(BasicBlock *Header)</span> : HeaderNode(Header) {</span></span><br><span class="line"> Nodes.<span class="built_in">push_back</span>(Header);</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">inline</span> BasicBlock *<span class="title">getHeaderNode</span><span class="params">()</span> <span class="type">const</span> </span>{ <span class="keyword">return</span> HeaderNode; }</span><br><span class="line"></span><br><span class="line"><span class="comment">/// Nodes - The basic blocks in this interval.</span></span><br><span class="line">std::vector<BasicBlock*> Nodes;</span><br></pre></td></tr></table></figure>
<p>The constructor and some basic definitions. Nodes stores every BasicBlock in the interval.</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="comment">/// Successors - List of BasicBlocks that are reachable directly from nodes in</span></span><br><span class="line"><span class="comment">/// this interval, but are not in the interval themselves.</span></span><br><span class="line"><span class="comment">/// These nodes necessarily must be header nodes for other intervals.</span></span><br><span class="line">std::vector<BasicBlock*> Successors;</span><br></pre></td></tr></table></figure>
<p>Successors contains all nodes outside the interval that can be reached <strong>directly</strong> <strong>from</strong> a node inside it.</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="comment">/// Predecessors - List of BasicBlocks that have this Interval's header block</span></span><br><span class="line"><span class="comment">/// as one of their successors.</span></span><br><span class="line">std::vector<BasicBlock*> Predecessors;</span><br></pre></td></tr></table></figure>
<p>Predecessors, in turn, contains every node that satisfies <span class="math inline">\(Head \in Succ(Node)\)</span>.</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="comment">/// contains - Find out if a basic block is in this interval</span></span><br><span class="line"><span class="function"><span class="keyword">inline</span> <span class="type">bool</span> <span class="title">contains</span><span class="params">(BasicBlock *BB)</span> <span class="type">const</span> </span>{</span><br><span class="line"> <span class="keyword">for</span> (BasicBlock *Node : Nodes)</span><br><span class="line"> <span class="keyword">if</span> (Node == BB)</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">true</span>;</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span>;</span><br><span class="line"> <span class="comment">// I don't want the dependency on <algorithm></span></span><br><span class="line"> <span class="comment">//return find(Nodes.begin(), Nodes.end(), BB) != Nodes.end();</span></span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="comment">/// isSuccessor - find out if a basic block is a successor of this Interval</span></span><br><span class="line"><span class="function"><span class="keyword">inline</span> <span class="type">bool</span> <span class="title">isSuccessor</span><span class="params">(BasicBlock *BB)</span> <span class="type">const</span> </span>{</span><br><span class="line"> <span class="keyword">for</span> (BasicBlock *Successor : Successors)</span><br><span class="line"> <span class="keyword">if</span> (Successor == BB)</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">true</span>;</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span>;</span><br><span class="line"> <span class="comment">// I don't want the dependency on <algorithm></span></span><br><span class="line"> <span class="comment">//return 
find(Successors.begin(), Successors.end(), BB) != Successors.end();</span></span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="comment">/// Equality operator. It is only valid to compare two intervals from the</span></span><br><span class="line"><span class="comment">/// same partition, because of this, all we have to check is the header node</span></span><br><span class="line"><span class="comment">/// for equality.</span></span><br><span class="line"><span class="keyword">inline</span> <span class="type">bool</span> <span class="keyword">operator</span>==(<span class="type">const</span> Interval &I) <span class="type">const</span> {</span><br><span class="line"> <span class="keyword">return</span> HeaderNode == I.HeaderNode;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
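<p>These are simple enough that they need no further explanation.</p>
<h2 id="interval-partition-类">The IntervalPartition Class</h2>
<p>Next come the key pieces, IntervalPartition and IntervalIterator, which form the core of the algorithm:</p>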
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">IntervalPartition</span> : <span class="keyword">public</span> FunctionPass {</span><br><span class="line"> <span class="keyword">using</span> IntervalMapTy = std::map<BasicBlock *, Interval *>;</span><br><span class="line"> IntervalMapTy IntervalMap;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">using</span> IntervalListTy = std::vector<Interval *>;</span><br><span class="line"> Interval *RootInterval = <span class="literal">nullptr</span>;</span><br><span class="line"> std::vector<Interval *> Intervals;</span><br></pre></td></tr></table></figure>
<p>The storage here also matches the theory: a root interval, the collection of all intervals, and a map from each BasicBlock to its Interval.</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="comment">// addIntervalToPartition - Add an interval to the internal list of intervals,</span></span><br><span class="line"><span class="comment">// and then add mappings from all of the basic blocks in the interval to the</span></span><br><span class="line"><span class="comment">// interval itself (in the IntervalMap).</span></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">IntervalPartition::addIntervalToPartition</span><span class="params">(Interval *I)</span> </span>{</span><br><span class="line"> Intervals.<span class="built_in">push_back</span>(I);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Add mappings for all of the basic blocks in I to the IntervalPartition</span></span><br><span class="line"> <span class="keyword">for</span> (Interval::node_iterator It = I->Nodes.<span class="built_in">begin</span>(), End = I->Nodes.<span class="built_in">end</span>();</span><br><span class="line"> It != End; ++It)</span><br><span class="line"> IntervalMap.<span class="built_in">insert</span>(std::<span class="built_in">make_pair</span>(*It, I));</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>This function appends the interval to Intervals and records, for every BasicBlock it contains, the mapping from that block to the interval.</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="comment">// updatePredecessors - Interval generation only sets the successor fields of</span></span><br><span class="line"><span class="comment">// the interval data structures. After interval generation is complete,</span></span><br><span class="line"><span class="comment">// run through all of the intervals and propagate successor info as</span></span><br><span class="line"><span class="comment">// predecessor info.</span></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">IntervalPartition::updatePredecessors</span><span class="params">(Interval *Int)</span> </span>{</span><br><span class="line"> BasicBlock *Header = Int-><span class="built_in">getHeaderNode</span>();</span><br><span class="line"> <span class="keyword">for</span> (BasicBlock *Successor : Int->Successors)</span><br><span class="line"> <span class="built_in">getBlockInterval</span>(Successor)->Predecessors.<span class="built_in">push_back</span>(Header);</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>Interval generation only fills in each Interval's Successors, so the corresponding Predecessors have to be filled in here.</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="comment">// IntervalPartition ctor - Build the first level interval partition for the</span></span><br><span class="line"><span class="comment">// specified function...</span></span><br><span class="line"><span class="function"><span class="type">bool</span> <span class="title">IntervalPartition::runOnFunction</span><span class="params">(Function &F)</span> </span>{</span><br><span class="line"> <span class="comment">// Pass false to intervals_begin because we take ownership of it's memory</span></span><br><span class="line"> function_interval_iterator I = <span class="built_in">intervals_begin</span>(&F, <span class="literal">false</span>);</span><br><span class="line"> <span class="built_in">assert</span>(I != <span class="built_in">intervals_end</span>(&F) && <span class="string">"No intervals in function!?!?!"</span>);</span><br><span class="line"></span><br><span class="line"> <span class="built_in">addIntervalToPartition</span>(RootInterval = *I);</span><br><span class="line"></span><br><span class="line"> ++I; <span class="comment">// After the first one...</span></span><br><span class="line"></span><br><span class="line"> <span class="comment">// Add the rest of the intervals to the partition.</span></span><br><span class="line"> <span class="keyword">for</span> (function_interval_iterator E = <span class="built_in">intervals_end</span>(&F); I != E; ++I)</span><br><span class="line"> <span class="built_in">addIntervalToPartition</span>(*I);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Now that we know all of the successor information, propagate this to the</span></span><br><span class="line"> <span class="comment">// predecessors for each block.</span></span><br><span class="line"> <span class="keyword">for</span> (Interval *I : Intervals)</span><br><span class="line"> <span 
class="built_in">updatePredecessors</span>(I);</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span>;</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
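<p>The function is decomposed one Interval at a time. From the algorithm we know that once an Interval is complete, its Successors serve as the headers for the remaining Intervals; finally the Preds are updated, which completes the partition of the whole function.</p>
<h2 id="interval-iterator-类">The Interval Iterator class</h2>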
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">template</span><<span class="keyword">class</span> <span class="title class_">NodeTy</span>, <span class="keyword">class</span> <span class="title class_">OrigContainer_t</span>, <span class="keyword">class</span> <span class="title class_">GT</span> = GraphTraits<NodeTy *>,</span><br><span class="line"> <span class="keyword">class</span> IGT = GraphTraits<Inverse<NodeTy *>>></span><br><span class="line"><span class="keyword">class</span> IntervalIterator {</span><br><span class="line"> std::vector<std::pair<Interval *, <span class="keyword">typename</span> Interval::succ_iterator>> IntStack;</span><br><span class="line"> std::set<BasicBlock *> Visited;</span><br><span class="line"> OrigContainer_t *OrigContainer;</span><br><span class="line"> <span class="type">bool</span> IOwnMem; <span class="comment">// If True, delete intervals when done with them</span></span><br><span class="line"> <span class="comment">// See file header for conditions of use</span></span><br></pre></td></tr></table></figure>
<p>This is the iterator's data layout. The templates need not be analyzed for now: simply read NodeTy as BasicBlock and OrigContainer as Function.</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="comment">// ProcessInterval - This method is used during the construction of the</span></span><br><span class="line"><span class="comment">// interval graph. It walks through the source graph, recursively creating</span></span><br><span class="line"><span class="comment">// an interval per invocation until the entire graph is covered. This uses</span></span><br><span class="line"><span class="comment">// the ProcessNode method to add all of the nodes to the interval.</span></span><br><span class="line"><span class="comment">//</span></span><br><span class="line"><span class="comment">// This method is templated because it may operate on two different source</span></span><br><span class="line"><span class="comment">// graphs: a basic block graph, or a preexisting interval graph.</span></span><br><span class="line"><span class="function"><span class="type">bool</span> <span class="title">ProcessInterval</span><span class="params">(NodeTy *Node)</span> </span>{</span><br><span class="line"> BasicBlock *Header = <span class="built_in">getNodeHeader</span>(Node);</span><br><span class="line"> <span class="keyword">if</span> (!Visited.<span class="built_in">insert</span>(Header).second)</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span>;</span><br><span class="line"></span><br><span class="line"> Interval *Int = <span class="keyword">new</span> <span class="built_in">Interval</span>(Header);</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Check all of our successors to see if they are in the interval...</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">typename</span> GT::ChildIteratorType I = GT::<span class="built_in">child_begin</span>(Node),</span><br><span class="line"> E = GT::<span class="built_in">child_end</span>(Node); I != E; 
++I)</span><br><span class="line"> <span class="built_in">ProcessNode</span>(Int, <span class="built_in">getSourceGraphNode</span>(OrigContainer, *I));</span><br><span class="line"></span><br><span class="line"> IntStack.<span class="built_in">push_back</span>(std::<span class="built_in">make_pair</span>(Int, <span class="built_in">succ_begin</span>(Int)));</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">true</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="comment">// ProcessNode - This method is called by ProcessInterval to add nodes to the</span></span><br><span class="line"><span class="comment">// interval being constructed, and it is also called recursively as it walks</span></span><br><span class="line"><span class="comment">// the source graph. A node is added to the current interval only if all of</span></span><br><span class="line"><span class="comment">// its predecessors are already in the graph. 
This also takes care of keeping</span></span><br><span class="line"><span class="comment">// the successor set of an interval up to date.</span></span><br><span class="line"><span class="comment">//</span></span><br><span class="line"><span class="comment">// This method is templated because it may operate on two different source</span></span><br><span class="line"><span class="comment">// graphs: a basic block graph, or a preexisting interval graph.</span></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">ProcessNode</span><span class="params">(Interval *Int, NodeTy *Node)</span> </span>{</span><br><span class="line"> <span class="built_in">assert</span>(Int && <span class="string">"Null interval == bad!"</span>);</span><br><span class="line"> <span class="built_in">assert</span>(Node && <span class="string">"Null Node == bad!"</span>);</span><br><span class="line"></span><br><span class="line"> BasicBlock *NodeHeader = <span class="built_in">getNodeHeader</span>(Node);</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (Visited.<span class="built_in">count</span>(NodeHeader)) { <span class="comment">// Node already been visited?</span></span><br><span class="line"> <span class="keyword">if</span> (Int-><span class="built_in">contains</span>(NodeHeader)) { <span class="comment">// Already in this interval...</span></span><br><span class="line"> <span class="keyword">return</span>;</span><br><span class="line"> } <span class="keyword">else</span> { <span class="comment">// In other interval, add as successor</span></span><br><span class="line"> <span class="keyword">if</span> (!Int-><span class="built_in">isSuccessor</span>(NodeHeader)) <span class="comment">// Add only if not already in set</span></span><br><span class="line"> Int->Successors.<span class="built_in">push_back</span>(NodeHeader);</span><br><span class="line"> }</span><br><span class="line"> } <span 
class="keyword">else</span> { <span class="comment">// Otherwise, not in interval yet</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">typename</span> IGT::ChildIteratorType I = IGT::<span class="built_in">child_begin</span>(Node),</span><br><span class="line"> E = IGT::<span class="built_in">child_end</span>(Node); I != E; ++I) {</span><br><span class="line"> <span class="keyword">if</span> (!Int-><span class="built_in">contains</span>(*I)) { <span class="comment">// If pred not in interval, we can't be</span></span><br><span class="line"> <span class="keyword">if</span> (!Int-><span class="built_in">isSuccessor</span>(NodeHeader)) <span class="comment">// Add only if not already in set</span></span><br><span class="line"> Int->Successors.<span class="built_in">push_back</span>(NodeHeader);</span><br><span class="line"> <span class="keyword">return</span>; <span class="comment">// See you later</span></span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// If we get here, then all of the predecessors of BB are in the interval</span></span><br><span class="line"> <span class="comment">// already. In this case, we must add BB to the interval!</span></span><br><span class="line"> <span class="built_in">addNodeToInterval</span>(Int, Node);</span><br><span class="line"> Visited.<span class="built_in">insert</span>(NodeHeader); <span class="comment">// The node has now been visited!</span></span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (Int-><span class="built_in">isSuccessor</span>(NodeHeader)) {</span><br><span class="line"> <span class="comment">// If we were in the successor list from before... 
remove from succ list</span></span><br><span class="line"> llvm::<span class="built_in">erase_value</span>(Int->Successors, NodeHeader);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="comment">// Now that we have discovered that Node is in the interval, perhaps some</span></span><br><span class="line"> <span class="comment">// of its successors are as well?</span></span><br><span class="line"> <span class="keyword">for</span> (<span class="keyword">typename</span> GT::ChildIteratorType It = GT::<span class="built_in">child_begin</span>(Node),</span><br><span class="line"> End = GT::<span class="built_in">child_end</span>(Node); It != End; ++It)</span><br><span class="line"> <span class="built_in">ProcessNode</span>(Int, <span class="built_in">getSourceGraphNode</span>(OrigContainer, *It));</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
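<p>This is the core of the algorithm. The first function, <code>ProcessInterval</code>, takes Node as a header and starts building the Interval headed by it (by calling the second function, <code>ProcessNode</code>). It then pushes the new Interval, together with its Successors iterator, onto the stack. Each call to <code>operator ++()</code> walks the Successors of the Intervals on IntStack and tries each of them as the header of a new Interval, which amounts to a <strong>graph search at the Interval level</strong>. Also note that if the Node has already been visited, the function returns false, meaning that no new Interval was found this time.</p>
<p>Next comes the graph algorithm inside a single Interval, that is, the logic of the second function, <code>ProcessNode</code>.</p>
<p>If the Node has already been visited:</p>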
<ul>
<li>If the Interval already contains this Node, this search step ends.</li>
<li>Otherwise the Node belongs to another Interval, which makes it one of this Interval's Successors.</li>
</ul>
<p>If the Node has not been visited:</p>
<ul>
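<li>If the reverse search finds a Pred(Node) that is not in the Interval, not all of Node's Preds have been collected yet, so the header cannot be shown to dominate Node. Return for now and let the Preds be collected first (a BasicBlock-level search; the Node will be met again later, hence "see you later").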
<ul>
<li>If Node is not yet recorded as a Successor, add it for now; it is removed again later. <span class="math inline">\((1)\)</span></li>
</ul></li>
</ul>
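<p>Otherwise all of its Preds are in the Interval, so Node is added to the Interval. If Node was recorded as a Successor earlier, it is removed now; this corresponds to case <span class="math inline">\((1)\)</span>.</p>
<p>Finally the search continues into the child nodes until every eligible Node has been added. Note that only the Successors of each Interval are maintained here; the Predecessors are left unset, which is why the IntervalPartition class calls <code>updatePredecessors(I)</code>.</p>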
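<p>The case analysis above can be sketched on a toy graph. This is a minimal illustration, not LLVM code: nodes are plain integers, and <code>ToyInterval</code>, <code>toyPartition</code> and the helpers in <code>toy_detail</code> are hypothetical names introduced only for this example.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy model, not LLVM code: nodes are ints, succ[u] lists the successors of u.
struct ToyInterval {
  int header;
  std::vector<int> nodes;       // members of the interval, header first
  std::vector<int> successors;  // headers of intervals reachable from this one
};

namespace toy_detail {
inline bool has(const std::vector<int>& v, int x) {
  return std::find(v.begin(), v.end(), x) != v.end();
}

// Follows the same case analysis as the ProcessNode walkthrough.
inline void processNode(const std::vector<std::vector<int>>& succ,
                        const std::vector<std::vector<int>>& pred,
                        std::set<int>& visited, ToyInterval& in, int node) {
  if (visited.count(node)) {
    // Already placed: either in this interval (nothing to do) or in another.
    if (!has(in.nodes, node) && !has(in.successors, node))
      in.successors.push_back(node);
    return;
  }
  for (int p : pred[node]) {
    if (!has(in.nodes, p)) {  // a predecessor is outside: "see you later"
      if (!has(in.successors, node)) in.successors.push_back(node);
      return;
    }
  }
  // All predecessors are inside, so the node joins the interval.
  in.nodes.push_back(node);
  visited.insert(node);
  in.successors.erase(
      std::remove(in.successors.begin(), in.successors.end(), node),
      in.successors.end());  // undo case (1) if it happened earlier
  for (int s : succ[node]) processNode(succ, pred, visited, in, s);
}
}  // namespace toy_detail

// Partition the graph into intervals, starting from `entry`.
inline std::vector<ToyInterval> toyPartition(
    const std::vector<std::vector<int>>& succ, int entry = 0) {
  assert(entry >= 0 && entry < static_cast<int>(succ.size()));
  std::vector<std::vector<int>> pred(succ.size());
  for (int u = 0; u < static_cast<int>(succ.size()); ++u)
    for (int v : succ[u]) pred[v].push_back(u);

  std::set<int> visited;
  std::vector<ToyInterval> result;
  std::vector<int> pending{entry};  // candidate headers, used as a stack
  while (!pending.empty()) {
    int h = pending.back();
    pending.pop_back();
    if (!visited.insert(h).second) continue;  // already placed
    ToyInterval in{h, {h}, {}};
    for (int s : succ[h]) toy_detail::processNode(succ, pred, visited, in, s);
    for (int s : in.successors) pending.push_back(s);
    result.push_back(in);
  }
  return result;
}
```

<p>On a diamond graph (0→1, 0→2, 1→3, 2→3) node 3 is first parked as a Successor and later absorbed, which is exactly case (1); on a graph with a loop (0→1, 1→2, 2→1, 2→3) the loop header 1 starts a second interval.</p>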
]]></content>
<categories>
<category>LLVM</category>
</categories>
<tags>
<tag>Compiler</tag>
<tag>LLVM</tag>
<tag>OpenSource</tag>
</tags>
</entry>
<entry>
<title>LeetCode 42: Trapping Rain Water (Solution)</title>
<url>/2023/08/23/LeetCode-42/</url>
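<content><![CDATA[<p>Problem description: <img src="/images/leetcode42.png" alt="img" /></p>
<p>Basic idea:</p>
<p>For each cell index <span class="math inline">\(x\)</span>, its capacity <span class="math inline">\(c(x)\)</span> is determined by the tallest cell to its left and the tallest cell to its right. In other words, let:</p>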
<p><span class="math display">\[t(x) = \min(\max_{y<x}{\{h(y)\}} , \max_{y>x}{\{h(y)\}})\]</span></p>
<p>Then</p>
<p><span class="math display">\[
c(x) =
\begin{cases}
t(x) - h(x), & \text{if } t(x) > h(x) \\
0, & \text{otherwise}
\end{cases}
\]</span></p>
<p>This leads to the following code:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">Solution</span> {</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="function"><span class="type">int</span> <span class="title">trap</span><span class="params">(vector<<span class="type">int</span>> &height)</span> </span>{</span><br><span class="line"> <span class="type">int</span> n = height.<span class="built_in">size</span>();</span><br><span class="line"> <span class="type">int</span> *maxLessThan = <span class="keyword">new</span> <span class="type">int</span>[n];</span><br><span class="line"> <span class="type">int</span> *maxGreaterThan = <span class="keyword">new</span> <span class="type">int</span>[n];</span><br><span class="line"> maxLessThan[<span class="number">0</span>] = <span class="number">0</span>;</span><br><span class="line"> maxGreaterThan[n - <span class="number">1</span>] = <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line"> <span class="type">int</span> curMax = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">1</span>; i < n; ++i) {</span><br><span class="line"> <span class="keyword">if</span> (height[i - <span class="number">1</span>] > curMax)</span><br><span class="line"> curMax = height[i - <span class="number">1</span>];</span><br><span class="line"> maxLessThan[i] = curMax;</span><br><span class="line"> }</span><br><span class="line"> curMax = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = n - <span class="number">2</span>; i >= <span class="number">0</span>; --i) {</span><br><span class="line"> <span class="keyword">if</span> (height[i + <span class="number">1</span>] > curMax)</span><br><span class="line"> curMax = 
height[i + <span class="number">1</span>];</span><br><span class="line"> maxGreaterThan[i] = curMax;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="type">int</span> ret = <span class="number">0</span>;</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i < n; ++i) {</span><br><span class="line"> <span class="type">int</span> t = std::<span class="built_in">min</span>(maxLessThan[i], maxGreaterThan[i]);</span><br><span class="line"> <span class="type">int</span> capa = t > height[i] ? t - height[i] : <span class="number">0</span>;</span><br><span class="line"> ret += capa;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="keyword">return</span> ret;</span><br><span class="line"> }</span><br><span class="line">};</span><br></pre></td></tr></table></figure>
<p>There is also a two-pointer method based on the same idea, which is not covered in detail here.</p>
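<p>For reference, the two-pointer variant can be sketched as follows. This is the commonly used formulation rather than code from this post; the function name <code>trapTwoPointers</code> is made up for the example. It keeps the running maxima from both ends and always advances the side with the lower wall, since that side's water level is already fixed by its own running maximum.</p>

```cpp
#include <algorithm>
#include <vector>

// Two-pointer sketch: O(n) time, O(1) extra space.
int trapTwoPointers(const std::vector<int>& height) {
  if (height.empty()) return 0;
  int left = 0;
  int right = static_cast<int>(height.size()) - 1;
  int leftMax = 0, rightMax = 0, water = 0;
  while (left < right) {
    leftMax = std::max(leftMax, height[left]);
    rightMax = std::max(rightMax, height[right]);
    if (height[left] < height[right]) {
      // The right side is at least as tall, so leftMax bounds the water here.
      water += leftMax - height[left];
      ++left;
    } else {
      // Symmetric case: rightMax bounds the water at `right`.
      water += rightMax - height[right];
      --right;
    }
  }
  return water;
}
```

<p>Compared with the array-based version, this avoids the two auxiliary arrays, so nothing has to be allocated or freed.</p>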
]]></content>
<categories>
<category>LeetCode</category>
</categories>
<tags>
<tag>Algorithm</tag>
<tag>LeetCode</tag>
</tags>
</entry>
<entry>
<title>LeetCode 44: Wildcard Matching (Solution)</title>
<url>/2023/08/23/LeetCode-44/</url>
<content><![CDATA[<p>Problem description: <img src="/images/leetcode44.png" alt="img" /></p>
<h2 id="动态规划">Dynamic programming</h2>
<p>A simple idea:</p>
<p>Use dynamic programming and let <span class="math inline">\(dp[i][j]\)</span> denote <strong>whether <span class="math inline">\(s[0..i]\)</span> matches <span class="math inline">\(p[0..j]\)</span></strong>, that is, whether the first i characters of s match the first j characters of p.</p>
<p>The initial state is:</p>
<p><span class="math display">\[dp[0][0] = true\]</span></p>
<p>A string of length greater than 0 can never be matched by an empty pattern, so set:</p>
<p><span class="math display">\[dp[i][0] = false, 0 < i \le sn\]</span></p>
<p>Likewise, an empty string can only be matched by a pattern that consists <strong>entirely of asterisks</strong>, such as <code>***</code>, so set:</p>
<p><span class="math display">\[dp[0][j] = dp[0][j-1] \quad \wedge \quad p[j-1] = '*'\]</span></p>
<p>The state transition is then:</p>
<p><span class="math display">\[
\begin{equation*} % the * suppresses equation numbering
\begin{split}
dp[i][j] =
& dp[i][j - 1] \wedge p[j - 1] = '*' \quad \vee \\
& dp[i - 1][j] \wedge p[j - 1] = '*' \quad \vee \\
& dp[i - 1][j - 1] \wedge (s[i - 1] = p[j - 1] \vee p[j - 1] = '?' \vee p[j - 1] = '*')
\end{split}
\end{equation*}
\]</span></p>
<p>This leads to the following code:</p>
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="type">bool</span> dp[<span class="number">2001</span>][<span class="number">2001</span>];</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">Solution</span> {</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line"> <span class="function"><span class="type">bool</span> <span class="title">isMatch</span><span class="params">(string s, string p)</span> </span>{</span><br><span class="line"> <span class="keyword">if</span> (s.<span class="built_in">length</span>() == <span class="number">0</span> &&</span><br><span class="line"> std::<span class="built_in">all_of</span>(p.<span class="built_in">begin</span>(), p.<span class="built_in">end</span>(), [](<span class="type">char</span> c) { <span class="keyword">return</span> c == <span class="string">'*'</span>; }))</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">true</span>;</span><br><span class="line"> <span class="keyword">if</span> (p.<span class="built_in">length</span>() == <span class="number">0</span>)</span><br><span class="line"> <span class="keyword">return</span> <span class="literal">false</span>;</span><br><span class="line"></span><br><span class="line"> <span class="type">int</span> sn = s.<span class="built_in">length</span>();</span><br><span class="line"> <span class="type">int</span> pn = p.<span class="built_in">length</span>();</span><br><span class="line"> dp[<span class="number">0</span>][<span class="number">0</span>] = <span class="number">1</span>;</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">1</span>; i <= pn; ++i) {</span><br><span class="line"> dp[<span class="number">0</span>][i] = std::<span class="built_in">all_of</span>(p.<span class="built_in">begin</span>(), p.<span 
class="built_in">begin</span>() + i,</span><br><span class="line"> [](<span class="type">char</span> c) { <span class="keyword">return</span> c == <span class="string">'*'</span>; });</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">1</span>; i <= sn; ++i) {</span><br><span class="line"> dp[i][<span class="number">0</span>] = <span class="literal">false</span>;</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">1</span>; i <= sn; ++i) {</span><br><span class="line"> <span class="keyword">for</span> (<span class="type">int</span> j = <span class="number">1</span>; j <= pn; ++j) {</span><br><span class="line"></span><br><span class="line"> <span class="type">bool</span> a = dp[i][j] =</span><br><span class="line"> (dp[i][j - <span class="number">1</span>] && p[j - <span class="number">1</span>] == <span class="string">'*'</span>) ||</span><br><span class="line"> (dp[i - <span class="number">1</span>][j] && p[j - <span class="number">1</span>] == <span class="string">'*'</span>) ||</span><br><span class="line"> (dp[i - <span class="number">1</span>][j - <span class="number">1</span>] &&</span><br><span class="line"> (s[i - <span class="number">1</span>] == p[j - <span class="number">1</span>] || p[j - <span class="number">1</span>] == <span class="string">'?'</span> || p[j - <span class="number">1</span>] == <span class="string">'*'</span>));</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> <span class="keyword">return</span> dp[sn][pn];</span><br><span class="line"> }</span><br><span class="line">};</span><br></pre></td></tr></table></figure>
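<p>The time complexity is <span class="math inline">\(O(mn)\)</span> and the space complexity is <span class="math inline">\(O(mn)\)</span>, where m and n are the lengths of s and p.</p>
<h2 id="贪心leetcode-题解">Greedy (LeetCode solution)</h2>
<p>The bottleneck of the previous method is the way it handles the asterisk <span class="math inline">\(*\)</span>: dynamic programming enumerates every case. Since the asterisk is a "universal" matching character, several consecutive asterisks are equivalent to a single one. But what about asterisks that are not consecutive?</p>
<p>Take <span class="math inline">\(p = * abcd *\)</span> as an example. p matches every string that contains the substring abcd, so it is enough to try every position of s as a starting position and check whether the substring there equals abcd. This brute-force method takes O(mn) time, the same as dynamic programming, but needs no extra space.</p>
<p>What if p = *abcd*efgh*i*? Clearly p matches every string in which the substrings abcd, efgh and i occur in this order. For any string s, we first search by brute force for the earliest occurrence of abcd, then, starting from the position right after it, for the earliest occurrence of efgh, and finally for i; this decides whether s matches p. "Greedily" taking the earliest occurrence is intuitive: if a substring occurs several times in s, choosing the earliest position leaves the most room for the following substrings to be found.</p>
<p>Therefore, if the pattern p has the form <span class="math display">\[* u_1 * u_2 * u_3 * \cdots * u_x *\]</span>, that is, strings (possibly empty) alternate with asterisks and both the first and the last character are asterisks, we can design the greedy brute-force matching algorithm below. Its essence is: if we can find <span class="math inline">\(u_1\)</span> in s first and then <span class="math inline">\(u_2, u_3, \cdots, u_x\)</span> in order, then s matches the pattern p. The pseudocode is as follows:</p>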
<figure class="highlight c++"><table><tr><td class="code"><pre><span class="line"><span class="comment">// 我们用 sIndex 和 pIndex 表示当前遍历到 s 和 p 的位置</span></span><br><span class="line"><span class="comment">// 此时我们正在 s 中寻找某个 u_i</span></span><br><span class="line"><span class="comment">// 其在 s 和 p 中的起始位置为 sRecord 和 pRecord</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// sIndex 和 sRecord 的初始值为 0</span></span><br><span class="line"><span class="comment">// 即我们从字符串 s 的首位开始匹配</span></span><br><span class="line">sIndex = sRecord = <span class="number">0</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// pIndex 和 pRecord 的初始值为 1</span></span><br><span class="line"><span class="comment">// 这是因为模式 p 的首位是星号,那么 u_1 的起始位置为 1</span></span><br><span class="line">pIndex = pRecord = <span class="number">1</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">while</span> sIndex < s.length <span class="keyword">and</span> pIndex < p.length <span class="keyword">do</span></span><br><span class="line"> <span class="keyword">if</span> p[pIndex] == <span class="string">'*'</span> then</span><br><span class="line"> <span class="comment">// 如果遇到星号,说明找到了 u_i,开始寻找 u_i+1</span></span><br><span class="line"> pIndex += <span class="number">1</span></span><br><span class="line"> <span class="comment">// 记录下起始位置</span></span><br><span class="line"> sRecord = sIndex</span><br><span class="line"> pRecord = pIndex</span><br><span class="line"> <span class="keyword">else</span> <span class="keyword">if</span> <span class="built_in">match</span>(s[sIndex], p[pIndex]) then</span><br><span class="line"> <span class="comment">// 如果两个字符可以匹配,就继续寻找 u_i 的下一个字符</span></span><br><span class="line"> sIndex += <span class="number">1</span></span><br><span class="line"> pIndex += <span class="number">1</span></span><br><span class="line"> <span class="keyword">else</span> <span 
class="keyword">if</span> sRecord + <span class="number">1</span> < s.length then</span><br><span class="line"> <span class="comment">// 如果两个字符不匹配,那么需要重新寻找 u_i</span></span><br><span class="line"> <span class="comment">// 枚举下一个 s 中的起始位置</span></span><br><span class="line"> sRecord += <span class="number">1</span></span><br><span class="line"> sIndex = sRecord</span><br><span class="line"> pIndex = pRecord</span><br><span class="line"> <span class="keyword">else</span></span><br><span class="line"> <span class="comment">// 如果不匹配并且下一个起始位置不存在,那么匹配失败</span></span><br><span class="line"> <span class="keyword">return</span> False</span><br><span class="line"> end <span class="keyword">if</span></span><br><span class="line">end <span class="keyword">while</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// 由于 p 的最后一个字符是星号,那么 s 未匹配完,那么没有关系</span></span><br><span class="line"><span class="comment">// 但如果 p 没有匹配完,那么 p 剩余的字符必须都是星号</span></span><br><span class="line"><span class="keyword">return</span> <span class="built_in">all</span>(p[pIndex] ~ p[p.length - <span class="number">1</span>] == <span class="string">'*'</span>)</span><br></pre></td></tr></table></figure>
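<p>There are also some special cases, for example patterns that do not begin or end with an asterisk; they are omitted here. Time complexity: <span class="math inline">\(O(mn)\)</span> in the worst case and <span class="math inline">\(O(m\log{n})\)</span> on average. For the detailed analysis see the paper <a href="https://arxiv.org/abs/1407.0950">On the Average-case Complexity of Pattern Matching with Wildcards</a>. Note that the analysis in the paper applies to a single <span class="math inline">\(u_i\)</span>, i.e. a pattern containing only lowercase letters and question marks; this problem corresponds to several such patterns in sequence. As this goes beyond interview difficulty, it is not discussed further here.</p>
<p>Space complexity: O(1)</p>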
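<p>A runnable version of the greedy idea, including patterns that do not start or end with an asterisk, might look like the sketch below. It is the widely used "remember the last star and backtrack" formulation, not a transcription of the pseudocode above; <code>isMatchGreedy</code>, <code>starP</code> and <code>starS</code> are names chosen for this example.</p>

```cpp
#include <cstddef>
#include <string>

// Greedy wildcard matching: '?' matches one character, '*' matches any run.
bool isMatchGreedy(const std::string& s, const std::string& p) {
  const std::size_t npos = std::string::npos;
  std::size_t si = 0, pi = 0;
  std::size_t starP = npos;  // index of the most recent '*' seen in p
  std::size_t starS = 0;     // index in s where that '*' started matching
  while (si < s.size()) {
    if (pi < p.size() && (p[pi] == '?' || p[pi] == s[si])) {
      ++si;  // ordinary character match: advance both
      ++pi;
    } else if (pi < p.size() && p[pi] == '*') {
      starP = pi++;  // record the star; first try letting it match nothing
      starS = si;
    } else if (starP != npos) {
      pi = starP + 1;  // mismatch: let the last star absorb one more char
      si = ++starS;
    } else {
      return false;  // mismatch and no star to fall back on
    }
  }
  // s is consumed; whatever is left of p must be all asterisks.
  while (pi < p.size() && p[pi] == '*') ++pi;
  return pi == p.size();
}
```

<p>Only the most recent asterisk is ever revisited, which is the same "earliest occurrence" argument as above: once a later asterisk has been reached, the segments before it never need to be re-placed.</p>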
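<h2 id="此外leetcode-官方题解">Further notes (official LeetCode solution)</h2>
<p>In the greedy method, each sub-pattern <span class="math inline">\(u_i\)</span>, which is delimited by asterisks and contains only lowercase letters and question marks, is located in the original string by brute-force matching. This can be optimized further by using an Aho-Corasick (AC) automaton instead of brute force. The AC automaton is already competition-level material, and this problem additionally requires storing some extra information in the automaton to complete the match, so this approach is far beyond interview difficulty. Only the reference lecture notes are given here, <a href="http://www.cs.cmu.edu/~ab/CMU/Week%2010-%20Strings%20Search/print04.pdf">Set Matching and Aho-Corasick Algorithm</a>:</p>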
<ul>
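<li><p>The first 6 pages of the notes introduce the trie;</p></li>
<li><p>Pages 7-19 introduce the AC automaton, which is built on top of the trie;</p></li>
<li><p>Pages 20-23 introduce a wildcard matching algorithm based on the AC automaton, where the wildcard <span class="math inline">\(\phi\)</span> is the question mark of this problem.</p></li>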
</ul>
<p>Interested readers may want to work through them.</p>
]]></content>
<categories>
<category>LeetCode</category>
</categories>
<tags>
<tag>Algorithm</tag>
<tag>LeetCode</tag>
</tags>
</entry>
<entry>
<title>LeetCode 89: Gray Code</title>
<url>/2023/09/15/LeetCode-89/</url>
<content><![CDATA[<p>The problem is as follows:</p>
<p>An n-bit Gray code sequence is a sequence of <span class="math inline">\(2^n\)</span> integers where:</p>
<p>Every integer is in the inclusive range <span class="math inline">\([0, 2^n - 1]\)</span>, and:</p>
<ul>
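<li>The first integer is 0</li>
<li>An integer appears no more than once in the sequence</li>
<li>The binary representations of every pair of adjacent integers differ by exactly one bit, and</li>
<li>The binary representations of the first and last integers differ by exactly one bit</li>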
</ul>
<p>Given an integer n, return any valid n-bit Gray code sequence.</p>
<blockquote>
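<p>Honestly, I underestimated this at first and went straight for brute-force search. That turned out not to work, so I had to refer to the official solution.</p>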
</blockquote>
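<h2 id="方法一">Method 1</h2>
<p>We can use induction and go from <span class="math inline">\(n-1\)</span> to <span class="math inline">\(n\)</span>. Let <span class="math inline">\(G_n\)</span> be an <span class="math inline">\(n\)</span>-bit Gray code sequence; we derive <span class="math inline">\(G_n\)</span> from <span class="math inline">\(G_{n-1}\)</span>.</p>
<p>First reverse <span class="math inline">\(G_{n-1}\)</span> and set bit <span class="math inline">\(n-1\)</span> (counting from 0) of every element to 1, which gives <span class="math inline">\(G_{n-1}^T\)</span>. Then concatenate <span class="math inline">\(G_{n-1}\)</span> and <span class="math inline">\(G_{n-1}^T\)</span> to obtain the desired result.</p>
<p>Why does this work? Every number in <span class="math inline">\(G_{n-1}^T\)</span> differs from its counterpart in <span class="math inline">\(G_{n-1}\)</span> in <strong>exactly one</strong> bit, the newly set one. Moreover <span class="math inline">\(G_{n-1}\)</span> is a permutation of <span class="math inline">\([0,2^{n-1}-1]\)</span> and <span class="math inline">\(G_{n-1}^T\)</span> is a permutation of <span class="math inline">\([2^{n-1}, 2^{n}-1]\)</span>, so together they form a permutation of <span class="math inline">\([0,2^n-1]\)</span>. Within each half, adjacent elements still differ in exactly one bit. Because of the reversal, the last element of <span class="math inline">\(G_{n-1}\)</span> and the first element of <span class="math inline">\(G_{n-1}^T\)</span> differ only in bit <span class="math inline">\(n-1\)</span>, and the same holds for the first element of <span class="math inline">\(G_{n-1}\)</span> and the last element of <span class="math inline">\(G_{n-1}^T\)</span>.</p>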
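<p>A short sketch of this construction, written iteratively: the sequence built so far is traversed in reverse and each element gets the next higher bit set before being appended (the standard "reflect and prefix" construction). The function name <code>grayCodeReflect</code> is made up for this example.</p>

```cpp
#include <vector>

// Builds an n-bit Gray code by repeated reflection, starting from {0}.
std::vector<int> grayCodeReflect(int n) {
  std::vector<int> g{0};
  for (int bit = 0; bit < n; ++bit) {
    const int size = static_cast<int>(g.size());
    // Walk the existing half backwards and set the new high bit.
    for (int j = size - 1; j >= 0; --j) {
      const int reflected = g[j] | (1 << bit);
      g.push_back(reflected);
    }
  }
  return g;
}
```

<p>For n = 2 this yields 0, 1, 3, 2, and for n = 3 it yields 0, 1, 3, 2, 6, 7, 5, 4.</p>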
<h2 id="方法二">Method 2</h2>
<p>This method is purely a matter of spotting a pattern, shown below: <img src="/images/leetcode89.png" alt="a" /></p>
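<p>The figure is not reproduced in text form here. Assuming the pattern it shows is the usual closed form, in which the i-th Gray code equals i XOR (i shifted right by one bit), a minimal sketch would be the following; the function name <code>grayCodeFormula</code> is hypothetical.</p>

```cpp
#include <vector>

// Assumed closed form: the i-th n-bit Gray code is i ^ (i >> 1).
std::vector<int> grayCodeFormula(int n) {
  const int count = 1 << n;  // 2^n codes
  std::vector<int> g;
  g.reserve(count);
  for (int i = 0; i < count; ++i) {
    g.push_back(i ^ (i >> 1));
  }
  return g;
}
```

<p>For small n this produces the same sequence as the reflection construction of Method 1, for example 0, 1, 3, 2 for n = 2.</p>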
]]></content>
<categories>
<category>LeetCode</category>
</categories>
<tags>
<tag>Algorithm</tag>
<tag>LeetCode</tag>
</tags>
</entry>
<entry>
<title>Linear Algebra 4.3 -- Least Squares Approximations</title>
<url>/2023/08/28/Linear-Algebra-4-3-Least-Squares/</url>
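<content><![CDATA[<p>The basic method of linear regression is <strong>least squares approximation</strong>; this note records the ideas behind it.</p>
<p>In practice <span class="math inline">\(Ax=b\)</span> usually has no solution. One typical reason is that there are more equations than unknowns (<span class="math inline">\(m>n\)</span>), and n columns span only a small part of the m-dimensional space. In other words, <span class="math inline">\(\boldsymbol{b}\)</span> usually lies outside <span class="math inline">\(C(A)\)</span>. This is where the projection material from the previous section helps.</p>
<p>To state the result first: just as with projection, the basic equation is still <span class="math display">\[A^TA\boldsymbol{\hat{x}}=A^T\boldsymbol{b}\]</span></p>
<p>The goal is to make the error ( <span class="math inline">\(\boldsymbol{Ax-b}\)</span> ) as small as possible, and the problem can be approached from three directions:</p>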
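<h4 id="几何方向">Geometric view</h4>
<p>For a vector <span class="math inline">\(\boldsymbol{b}\)</span>, the point of the plane/subspace formed by all <span class="math inline">\(A\boldsymbol{x}\)</span> that is closest to it is its projection <span class="math inline">\(\boldsymbol{p}\)</span>. The error <span class="math inline">\(\boldsymbol{e = b - p}\)</span> is then as small as possible, and <span class="math inline">\(\boldsymbol{p}\)</span> is the vector in the subspace that best approximates <span class="math inline">\(\boldsymbol{b}\)</span>.</p>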
<h4 id="代数方向">Algebraic view</h4>
<p>Every vector <span class="math inline">\(\boldsymbol{b}\)</span> can be split into two parts: <span class="math inline">\(\boldsymbol{p}\)</span>, which lies in <span class="math inline">\(C(A)\)</span>, and <span class="math inline">\(\boldsymbol{e}\)</span>, which is orthogonal to <span class="math inline">\(C(A)\)</span>.</p>
<p><span class="math inline">\(A\boldsymbol{x = b = p + e}\)</span> is not solvable,</p>
<p>whereas <span class="math inline">\(A\boldsymbol{\hat{x} = p}\)</span> is solvable.</p>
<p>The solution of the latter leaves the smallest possible error <span class="math inline">\(\boldsymbol{e}\)</span>. Here is why it is the smallest:</p>
<p>We have the <strong>squared length for any <span class="math inline">\(x\)</span></strong>: <span class="math inline">\(||Ax - b||^2 = ||Ax-p||^2 + ||e||^2\)</span></p>
<p>Choosing <span class="math inline">\(x = \hat{x}\)</span> reduces <span class="math inline">\(||Ax-p||^2\)</span> to <span class="math inline">\(0\)</span>, so <span class="math inline">\(||Ax - b||^2\)</span> cannot be made any smaller.</p>
<h4 id="微积分方向">Calculus view</h4>
<p>As an example, fit the line <span class="math inline">\(C + Dt\)</span> to three sample points <span class="math inline">\((0,6), (1,0), (2,0)\)</span>. Then:</p>
<p><span class="math display">\[
A=\left [ \begin{matrix}
1& 0 \\
1& 1 \\
1& 2 \\
\end{matrix} \right ] ,
\boldsymbol{x} = \left [ \begin{matrix}
C \\
D \\
\end{matrix} \right ] ,
\boldsymbol{b} = \left [ \begin{matrix}
6 \\
0 \\
0 \\
\end{matrix} \right ]
\]</span></p>
<p>To minimize <span class="math inline">\(E = ||Ax-b||^2\)</span> we need: <span class="math display">\[\frac{\partial E}{\partial C} = 0, \quad \frac{\partial E}{\partial D} = 0\]</span></p>
<p>In fact, after simplification these two equations are exactly <span class="math inline">\(A^TA\hat{x}=A^Tb\)</span>.</p>
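<p>As a check, the computation can be carried out explicitly for the three sample points above (a worked sketch; every number follows directly from the <span class="math inline">\(A\)</span> and <span class="math inline">\(\boldsymbol{b}\)</span> already given):</p>
<p><span class="math display">\[
E = (C-6)^2 + (C+D)^2 + (C+2D)^2
\]</span></p>
<p><span class="math display">\[
\frac{\partial E}{\partial C} = 6C + 6D - 12 = 0, \quad \frac{\partial E}{\partial D} = 6C + 10D = 0
\]</span></p>
<p>Dividing both equations by 2 gives <span class="math inline">\(3C + 3D = 6\)</span> and <span class="math inline">\(3C + 5D = 0\)</span>, which is the same system as:</p>
<p><span class="math display">\[
A^TA\boldsymbol{\hat{x}} = \left [ \begin{matrix}
3& 3 \\
3& 5 \\
\end{matrix} \right ]
\left [ \begin{matrix}
C \\
D \\
\end{matrix} \right ] =
\left [ \begin{matrix}
6 \\
0 \\
\end{matrix} \right ] = A^T\boldsymbol{b}
\]</span></p>
<p>Solving gives <span class="math inline">\(C = 5, D = -3\)</span>, so the best line is <span class="math inline">\(5 - 3t\)</span>, with <span class="math inline">\(\boldsymbol{p} = (5, 2, -1)\)</span> and <span class="math inline">\(\boldsymbol{e} = \boldsymbol{b} - \boldsymbol{p} = (1, -2, 1)\)</span>. One can verify that <span class="math inline">\(A^T\boldsymbol{e} = \boldsymbol{0}\)</span>, as the geometric picture requires.</p>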
]]></content>
<categories>
<category>Linear Algebra</category>
</categories>
<tags>
<tag>Math</tag>
<tag>Linear Algebra</tag>
</tags>
</entry>
<entry>
<title>Linear Algebra 4.2 -- Projection</title>
<url>/2023/08/28/Linear-Algebra-Projection/</url>
<content><![CDATA[<p>The projection of <span class="math inline">\(\boldsymbol{b}\)</span> onto a subspace <span class="math inline">\(C(A)\)</span> is computed by:</p>
<p><span class="math display">\[
\boldsymbol{p} = P\boldsymbol{b}
\]</span></p>
<p>where <span class="math inline">\(P\)</span> is called the <strong>projection matrix</strong>. Why a projection amounts to multiplication by a matrix follows from how the projection is computed.</p>
<p>Here are the reasoning steps:</p>
<p>Let's imagine projecting <span class="math inline">\(\boldsymbol{b}\)</span> onto a plane <span class="math inline">\(C(A)\)</span>, producing the projection <span class="math inline">\(\boldsymbol{p}\)</span>. Since <span class="math inline">\(\boldsymbol{p}\)</span> is in <span class="math inline">\(C(A)\)</span>, it can be written as <span class="math inline">\(A\boldsymbol{\hat{x}}\)</span>. Our <strong>goal</strong> is to find <span class="math inline">\(\boldsymbol{\hat{x}}\)</span>.</p>
<p>Let <span class="math inline">\(\boldsymbol{e = b - A\hat{x}}\)</span> be the error vector. Only when <span class="math inline">\(\boldsymbol{e}\)</span> is <strong>perpendicular</strong> to the subspace can we say that <span class="math inline">\(\boldsymbol{p = b - e}\)</span> is the projection.</p>
<p>Since <span class="math inline">\(\boldsymbol{e}\)</span> is perpendicular to <span class="math inline">\(C(A)\)</span>, it is perpendicular to every column of <span class="math inline">\(A\)</span>, which gives: <span class="math display">\[A^T(\boldsymbol{b}-A\boldsymbol{\hat{x}}) = \boldsymbol{0}\]</span> or <span class="math display">\[A^TA\boldsymbol{\hat{x}} = A^T\boldsymbol{b}\]</span></p>
<p>The symmetric matrix <span class="math inline">\(A^TA\)</span> is invertible if and only if the columns <span class="math inline">\(\boldsymbol{a}\)</span> of <span class="math inline">\(A\)</span> are <strong>independent</strong>. Then, <span class="math display">\[\boldsymbol{p} = A\boldsymbol{\hat{x}}=A(A^TA)^{-1}A^T\boldsymbol{b}\]</span></p>
<p>Here <span class="math inline">\(A(A^TA)^{-1}A^T\)</span> is a matrix, which we name the <strong>projection matrix</strong>. You might be tempted to split <span class="math inline">\((A^TA)^{-1}\)</span> into <span class="math inline">\(A^{-1}(A^{T})^{-1}\)</span>; however, when <span class="math inline">\(A\)</span> is rectangular, it has no inverse.</p>
<p>When <span class="math inline">\(A\)</span> is square and invertible, <span class="math inline">\(N(A)\)</span> and <span class="math inline">\(N(A^T)\)</span> contain only the <strong>zero</strong> vector, so <span class="math inline">\(A^T\boldsymbol{e} = 0 \rightarrow \boldsymbol{e=0, b=p}\)</span>: every <span class="math inline">\(\boldsymbol{b}\)</span> is its own projection, and <span class="math inline">\(P = \boldsymbol{I}\)</span>.</p>
<h4 id="why-the-symmetric-matrix-ata-is-invertible-if-and-only-if-boldsymbolas-in-a-are-independent">Why is the symmetric matrix <span class="math inline">\(A^TA\)</span> invertible if and only if the columns <span class="math inline">\(\boldsymbol{a}\)</span> of <span class="math inline">\(A\)</span> are <strong>independent</strong>?</h4>
<p><span class="math display">\[A^TAx = 0 \Longleftrightarrow Ax = 0\]</span></p>
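<p>The direction <span class="math inline">\(\Leftarrow\)</span> is immediate. For <span class="math inline">\(\Rightarrow\)</span>, if <span class="math inline">\(A^TAx = 0\)</span> then <span class="math inline">\(x^TA^TAx = ||Ax||^2 = 0\)</span>, so <span class="math inline">\(Ax = 0\)</span>. Thus <span class="math inline">\(A^TA\)</span> has the same nullspace as <span class="math inline">\(A\)</span>. The columns of <span class="math inline">\(A\)</span> are independent (that is, <span class="math inline">\(N(A)\)</span> contains only the zero vector) <strong>if and only if</strong> the square matrix <span class="math inline">\(A^TA\)</span> is invertible.</p>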
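<p>As a concrete illustration (the matrix and vector below are chosen here for the example and are not part of the derivation above), take <span class="math inline">\(A\)</span> with the independent columns <span class="math inline">\((1,1,1)\)</span> and <span class="math inline">\((0,1,2)\)</span>, and <span class="math inline">\(\boldsymbol{b} = (6,0,0)\)</span>. Then:</p>
<p><span class="math display">\[
A^TA = \left [ \begin{matrix}
3& 3 \\
3& 5 \\
\end{matrix} \right ] , \quad
(A^TA)^{-1} = \frac{1}{6} \left [ \begin{matrix}
5& -3 \\
-3& 3 \\
\end{matrix} \right ] , \quad
P = A(A^TA)^{-1}A^T = \frac{1}{6} \left [ \begin{matrix}
5& 2& -1 \\
2& 2& 2 \\
-1& 2& 5 \\
\end{matrix} \right ]
\]</span></p>
<p>This gives <span class="math inline">\(\boldsymbol{p} = P\boldsymbol{b} = (5, 2, -1)\)</span> and <span class="math inline">\(\boldsymbol{e} = \boldsymbol{b} - \boldsymbol{p} = (1, -2, 1)\)</span>, and indeed <span class="math inline">\(A^T\boldsymbol{e} = \boldsymbol{0}\)</span>, so <span class="math inline">\(\boldsymbol{e}\)</span> is perpendicular to both columns of <span class="math inline">\(A\)</span>.</p>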
]]></content>
<categories>
<category>Linear Algebra</category>
</categories>
<tags>
<tag>Math</tag>
<tag>Linear Algebra</tag>
</tags>
</entry>
<entry>
<title>Common Mathematica Commands for Calculus</title>
<url>/2023/02/06/Mathematica%E5%BE%AE%E7%A7%AF%E5%88%86%E5%B8%B8%E7%94%A8%E5%91%BD%E4%BB%A4/</url>
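<content><![CDATA[<p>As a first-year student I was swamped by math homework every day, so, clever as I am, I decided to bring in the math tool Mathematica to deal with it.</p>
<p>I obtained Mathematica (mma) through my university (南大) email account and installed it, together with its dependencies, on Ubuntu.</p>
<p>Below are a few templates for computing limits, derivatives, and integrals.</p>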
<span id="more"></span>
<h2 id="极限limit">Limit</h2>
<p>We want the limit of the following expression:</p>
<p><span class="math inline">\(\text{Assume} \quad f'(a)=\sqrt{2}, \quad f''(a)=2\)</span></p>
<p><span class="math inline">\(\lim_{x \to a} \left( \frac{1}{f(x)-f(a)} - \frac{1}{(x-a)f'(x)} \right)\)</span></p>
<p>In mma we can enter the following code (note that assumptions are written as equations with <code>==</code>, not as assignments with <code>=</code>):</p>
<figure class="highlight mathematica"><table><tr><td class="code"><pre><span class="line"><span class="built_in">Limit</span><span class="punctuation">[</span><span class="number">1</span><span class="operator">/</span><span class="punctuation">(</span><span class="variable">f</span><span class="punctuation">[</span><span class="variable">x</span><span class="punctuation">]</span> <span class="operator">-</span> <span class="variable">f</span><span class="punctuation">[</span><span class="variable">a</span><span class="punctuation">]</span><span class="punctuation">)</span> <span class="operator">-</span> <span class="number">1</span><span class="operator">/</span><span class="punctuation">(</span><span class="punctuation">(</span><span class="variable">x</span> <span class="operator">-</span> <span class="variable">a</span><span class="punctuation">)</span> <span class="variable">f</span><span class="operator">'</span><span class="punctuation">[</span><span class="variable">x</span><span class="punctuation">]</span><span class="punctuation">)</span><span class="operator">,</span> <span class="variable">x</span> <span class="operator">-></span> <span class="variable">a</span><span class="operator">,</span></span><br><span class="line"> <span class="built_in">Assumptions</span> <span class="operator">-></span> <span class="punctuation">{</span><span class="built_in">D</span><span class="punctuation">[</span><span class="variable">f</span><span class="punctuation">[</span><span class="variable">a</span><span class="punctuation">]</span><span class="operator">,</span> <span class="variable">a</span><span class="punctuation">]</span> <span class="operator">==</span> <span class="built_in">Sqrt</span><span class="punctuation">[</span><span class="number">2</span><span class="punctuation">]</span><span class="operator">,</span> <span class="built_in">D</span><span class="punctuation">[</span><span class="built_in">D</span><span class="punctuation">[</span><span class="variable">f</span><span class="punctuation">[</span><span class="variable">a</span><span class="punctuation">]</span><span class="operator">,</span> <span class="variable">a</span><span class="punctuation">]</span><span class="operator">,</span> <span class="variable">a</span><span class="punctuation">]</span> <span class="operator">==</span> <span class="number">2</span><span class="punctuation">}</span><span class="punctuation">]</span></span><br></pre></td></tr></table></figure>
<hr />
<h2 id="微分导数derivative">Derivative</h2>
<p>We want the derivative of the following function:</p>
<p><span class="math inline">\(f(x)=\sin{x}^{\sin{x}}+\ln{\int_0^x{\sqrt{\tan{x}}dx}}\)</span></p>
<p>In mma we can enter the following code (the pattern variable is written <code>x_</code>, and function application uses square brackets):</p>
<figure class="highlight mathematica"><table><tr><td class="code"><pre><span class="line"><span class="variable">f</span><span class="punctuation">[</span><span class="type">x_</span><span class="punctuation">]</span><span class="operator">=...</span></span><br><span class="line"><span class="built_in">D</span><span class="punctuation">[</span><span class="variable">f</span><span class="punctuation">[</span><span class="variable">x</span><span class="punctuation">]</span><span class="operator">,</span><span class="variable">x</span><span class="punctuation">]</span></span><br></pre></td></tr></table></figure>
<hr />
<h2 id="积分定积分integration">Integration (indefinite and definite)</h2>
<p>We want to evaluate the following integrals:</p>
<p><span class="math inline">\(\int{\frac{1}{\cos^2{x}}dx}\)</span></p>
<p><span class="math inline">\(\int_0^{\pi/2}{\frac{1}{\cos^2{x}}dx}\)</span></p>
<p>We can enter the following in mma, one line for each integral:</p>
<figure class="highlight mathematica"><table><tr><td class="code"><pre><span class="line"><span class="built_in">Integrate</span><span class="punctuation">[</span><span class="number">1</span><span class="operator">/</span><span class="punctuation">(</span><span class="built_in">Cos</span><span class="punctuation">[</span><span class="variable">x</span><span class="punctuation">]</span><span class="operator">^</span><span class="number">2</span><span class="punctuation">)</span><span class="operator">,</span><span class="variable">x</span><span class="punctuation">]</span></span><br><span class="line"><span class="built_in">Integrate</span><span class="punctuation">[</span><span class="number">1</span><span class="operator">/</span><span class="punctuation">(</span><span class="built_in">Cos</span><span class="punctuation">[</span><span class="variable">x</span><span class="punctuation">]</span><span class="operator">^</span><span class="number">2</span><span class="punctuation">)</span><span class="operator">,</span><span class="punctuation">{</span><span class="variable">x</span><span class="operator">,</span><span class="number">0</span><span class="operator">,</span><span class="built_in">Pi</span><span class="operator">/</span><span class="number">2</span><span class="punctuation">}</span><span class="punctuation">]</span></span><br></pre></td></tr></table></figure>
<hr />
]]></content>
<categories>
<category>Math</category>
</categories>
<tags>
<tag>Math</tag>
<tag>Mathematica</tag>
</tags>
</entry>
<entry>
<title>Common Neovim Configuration (1)</title>
<url>/2023/02/06/Neovim%E5%B8%B8%E7%94%A8%E9%85%8D%E7%BD%AE-1/</url>
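<content><![CDATA[<p>Chinese-language material on the Neovim API is very scarce online, so I have collected some of it here.</p>
<p>If you are comfortable reading English, you can run <code>:h lua-guide</code> to open Neovim's own English documentation for the Lua API.</p>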
<span id="more"></span>
<h2 id="neovims-lua-api">Neovim's Lua API</h2>
<ul>
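<li><p><code>vim.keymap.set(mode, from_keys, to_expr, opts)</code></p>
<p><em>Purpose: create a key mapping</em></p>
<p><strong>mode</strong>: type <strong>string</strong> (or a table of strings), the mode(s) the mapping applies in: "n" for normal, "i" for insert, "v" for visual</p>
<p><strong>from_keys</strong>: type <strong>string</strong>, the key sequence being mapped</p>
<p><strong>to_expr</strong>: type <strong>string</strong> or <strong>function</strong>, what the mapping expands to: a key sequence or Vim expression given as a string, or a Lua function</p>
<p><strong>opts</strong>: type <strong>table</strong>, optional settings for the mapping</p></li>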
</ul>
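<p>A minimal sketch of the two common forms of <code>to_expr</code> described above. The key choices (<code>Y</code>, <code>gh</code>) and the descriptions are illustrative assumptions, not recommendations:</p>
<figure class="highlight lua"><table><tr><td class="code"><pre><span class="line"><span class="comment">-- to_expr as a string of keys: make Y yank to the end of the line</span></span><br><span class="line">vim.keymap.set(<span class="string">"n"</span>, <span class="string">"Y"</span>, <span class="string">"y$"</span>, { desc = <span class="string">"Yank to end of line"</span> })</span><br><span class="line"></span><br><span class="line"><span class="comment">-- to_expr as a Lua function, registered for normal and visual mode at once</span></span><br><span class="line">vim.keymap.set({ <span class="string">"n"</span>, <span class="string">"v"</span> }, <span class="string">"gh"</span>, <span class="keyword">function</span>()</span><br><span class="line">  vim.cmd(<span class="string">"nohlsearch"</span>) <span class="comment">-- clear the current search highlight</span></span><br><span class="line"><span class="keyword">end</span>, { silent = <span class="literal">true</span>, desc = <span class="string">"Clear search highlight"</span> })</span><br></pre></td></tr></table></figure>
<p>Passing a table as <strong>mode</strong> avoids repeating the call once per mode.</p>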
<hr />
<ul>
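<li><p><code>vim.api.nvim_create_user_command(commandName, expr, opts)</code></p>
<p><em>Purpose: create a user command</em></p>
<p><strong>commandName</strong>: type <strong>string</strong>, the command name (must start with an uppercase letter)</p>
<p><strong>expr</strong>: type <strong>string</strong> or <strong>function</strong>, what the command runs: a command string or a Lua function</p>
<p><strong>opts</strong>: type <strong>table</strong>, command attributes such as <code>nargs</code> and <code>desc</code>; this argument is required, so pass <code>{}</code> if no attributes are needed</p></li>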
</ul>
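<p>A minimal sketch of a user command backed by a Lua function. The command name <code>Greet</code> and its behavior are hypothetical and chosen only for illustration:</p>
<figure class="highlight lua"><table><tr><td class="code"><pre><span class="line"><span class="comment">-- Hypothetical command: ":Greet world" shows "Hello, world"</span></span><br><span class="line">vim.api.nvim_create_user_command(<span class="string">"Greet"</span>, <span class="keyword">function</span>(opts)</span><br><span class="line">  <span class="comment">-- opts.args holds the raw argument string typed after the command name</span></span><br><span class="line">  vim.notify(<span class="string">"Hello, "</span> .. opts.args)</span><br><span class="line"><span class="keyword">end</span>, { nargs = <span class="number">1</span>, desc = <span class="string">"Greet someone (illustrative)"</span> })</span><br></pre></td></tr></table></figure>
<p>The Lua function receives a single table describing the invocation; <code>nargs = 1</code> makes Neovim require exactly one argument.</p>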
<hr />
<ul>
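<li><p><code>vim.api.nvim_create_autocmd(event, opts)</code></p>
<p><em>Purpose: create an autocommand</em></p>
<p><strong>event</strong>: type <strong>string</strong> (or a table of strings), the event name(s) that trigger the autocommand, e.g. "BufEnter"</p>
<p><strong>opts</strong>: type <strong>table</strong>, the settings, including:</p>
<ul>
<li><p><strong>pattern</strong>: the pattern to match (for most events, a file-name pattern)</p></li>
<li><p><strong>callback</strong>: the function to call when the event fires, either a Lua function or the name of a Vimscript function; to run a Vim command instead, use the <strong>command</strong> key</p></li>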
</ul></li>
</ul>
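<p>A minimal sketch of an autocommand with a Lua callback. The group name <code>DemoLuaIndent</code> and the indentation values are illustrative assumptions; putting the autocommand in a group created with <code>clear = true</code> keeps it from being registered twice when the config is re-sourced:</p>
<figure class="highlight lua"><table><tr><td class="code"><pre><span class="line"><span class="comment">-- Create (or reset) a group so re-sourcing the config does not duplicate the autocmd</span></span><br><span class="line"><span class="keyword">local</span> group = vim.api.nvim_create_augroup(<span class="string">"DemoLuaIndent"</span>, { clear = <span class="literal">true</span> })</span><br><span class="line"></span><br><span class="line">vim.api.nvim_create_autocmd(<span class="string">"FileType"</span>, {</span><br><span class="line">  group = group,</span><br><span class="line">  pattern = <span class="string">"lua"</span>, <span class="comment">-- for FileType events the pattern is matched against the filetype</span></span><br><span class="line">  callback = <span class="keyword">function</span>()</span><br><span class="line">    <span class="comment">-- buffer-local options for the buffer that triggered the event</span></span><br><span class="line">    vim.bo.shiftwidth = <span class="number">2</span></span><br><span class="line">    vim.bo.expandtab = <span class="literal">true</span></span><br><span class="line">  <span class="keyword">end</span>,</span><br><span class="line">})</span><br></pre></td></tr></table></figure>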
<hr />
]]></content>
<categories>
<category>Tools</category>
</categories>
<tags>
<tag>Vim</tag>
</tags>
</entry>
<entry>
<title>Common Neovim Configuration (2)</title>
<url>/2023/02/08/Neovim%E5%B8%B8%E7%94%A8%E9%85%8D%E7%BD%AE-2/</url>
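<content><![CDATA[<h3 id="使用-lua-配置-neovim并设置自己的-workflow">Configuring Neovim with Lua and setting up your own workflow</h3>
<h4 id="结合命令行工具">Integrating command-line tools</h4>
<p>While coding I often need git, but I don't want to keep typing commands in a shell.</p>
<p>So I use ToggleTerm to embed the command-line tool lazygit into Neovim.</p>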
<figure class="highlight lua"><table><tr><td class="code"><pre><span class="line"><span class="keyword">local</span> Terminal = <span class="built_in">require</span>(<span class="string">'toggleterm.terminal'</span>).Terminal</span><br><span class="line"></span><br><span class="line"><span class="keyword">local</span> lazygit = Terminal:new({ cmd = <span class="string">"lazygit"</span>, direction = <span class="string">'float'</span>, hidden = <span class="literal">true</span> })</span><br><span class="line"><span class="keyword">local</span> top = Terminal:new({ cmd = <span class="string">"top"</span>, direction = <span class="string">'float'</span>, hidden = <span class="literal">true</span> })</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">-- lazygit</span></span><br><span class="line">vim.api.nvim_create_user_command(<span class="string">"LazyGit"</span>,</span><br><span class="line"> <span class="function"><span class="keyword">function</span><span class="params">()</span></span></span><br><span class="line"> lazygit:toggle()</span><br><span class="line"> <span class="keyword">end</span>,</span><br><span class="line"> { nargs = <span class="number">0</span> })</span><br><span class="line"></span><br><span class="line"><span class="comment">-- top</span></span><br><span class="line">vim.api.nvim_create_user_command(<span class="string">"Top"</span>,</span><br><span class="line"> <span class="function"><span class="keyword">function</span><span class="params">()</span></span></span><br><span class="line"> top:toggle()</span><br><span class="line"> <span class="keyword">end</span>,</span><br><span class="line"> { nargs = <span class="number">0</span> })</span><br></pre></td></tr></table></figure>
<span id="more"></span>
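<p>Similarly, the command-line tool trans can be used for translation, with the result displayed through Neovim's API. The snippet relies on helpers (<code>async</code>, <code>utils</code>, and the <code>basic</code> module) that are defined elsewhere and not shown here.</p>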
<figure class="highlight lua"><table><tr><td class="code"><pre><span class="line"><span class="keyword">local</span> <span class="function"><span class="keyword">function</span> <span class="title">translate_terminal</span><span class="params">()</span></span></span><br><span class="line"> <span class="keyword">local</span> mode = vim.api.nvim_get_mode()[<span class="string">'mode'</span>]</span><br><span class="line"> <span class="keyword">local</span> to_translate</span><br><span class="line"> <span class="keyword">if</span> mode == <span class="string">'n'</span> <span class="keyword">then</span></span><br><span class="line"> to_translate = vim.fn.expand(<span class="string">'<cword>'</span>)</span><br><span class="line"> <span class="keyword">elseif</span> mode == <span class="string">'v'</span> <span class="keyword">then</span></span><br><span class="line"> to_translate = <span class="built_in">require</span>(<span class="string">'basic'</span>).get_visual_selection()</span><br><span class="line"> <span class="keyword">end</span></span><br><span class="line"> <span class="keyword">local</span> command = <span class="built_in">string</span>.<span class="built_in">format</span>(<span class="string">'trans "%s"'</span>, to_translate)</span><br><span class="line"></span><br><span class="line"> async.run(<span class="function"><span class="keyword">function</span><span class="params">()</span></span></span><br><span class="line"> <span class="keyword">local</span> translated_content = vim.fn.systemlist(command)</span><br><span class="line"> utils.show_term_content(translated_content)</span><br><span class="line"> <span class="keyword">end</span>)</span><br><span class="line"><span class="keyword">end</span></span><br><span class="line"></span><br></pre></td></tr></table></figure>
<h4 id="设置-layout">Setting up the layout</h4>
<figure class="highlight lua"><table><tr><td class="code"><pre><span class="line">vim.api.nvim_create_user_command(</span><br><span class="line">  <span class="string">"BufferDelete"</span>,</span><br><span class="line">  <span class="function"><span class="keyword">function</span><span class="params">()</span></span></span><br><span class="line">    <span class="comment">-- "%:p" expands to the full path of the current file</span></span><br><span class="line">    <span class="comment">---@diagnostic disable-next-line: missing-parameter</span></span><br><span class="line">    <span class="keyword">local</span> file_exists = vim.fn.filereadable(vim.fn.expand(<span class="string">"%:p"</span>))</span><br><span class="line">    <span class="keyword">local</span> modified = vim.api.nvim_buf_get_option(<span class="number">0</span>, <span class="string">"modified"</span>)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> file_exists == <span class="number">0</span> <span class="keyword">and</span> modified <span class="keyword">then</span></span><br><span class="line">      <span class="keyword">local</span> user_choice = vim.fn.<span class="built_in">input</span>(</span><br><span class="line">        <span class="string">"The file is not saved, whether to force delete? Press enter or input [y/n]:"</span>)</span><br><span class="line">      <span class="keyword">if</span> user_choice == <span class="string">"y"</span> <span class="keyword">or</span> <span class="built_in">string</span>.<span class="built_in">len</span>(user_choice) == <span class="number">0</span> <span class="keyword">then</span></span><br><span class="line">        vim.cmd(<span class="string">"bd!"</span>)</span><br><span class="line">      <span class="keyword">end</span></span><br><span class="line">      <span class="keyword">return</span></span><br><span class="line">    <span class="keyword">end</span></span><br><span class="line"></span><br><span class="line">    <span class="keyword">local</span> force = <span class="keyword">not</span> vim.bo.buflisted <span class="keyword">or</span> vim.bo.buftype == <span class="string">"nofile"</span></span><br><span class="line"></span><br><span class="line">    vim.cmd(force <span class="keyword">and</span> <span class="string">"bd!"</span> <span class="keyword">or</span> <span class="built_in">string</span>.<span class="built_in">format</span>(<span class="string">"bp | bd! %s"</span>, vim.api.nvim_get_current_buf()))</span><br><span class="line">  <span class="keyword">end</span>,</span><br><span class="line">  { desc = <span class="string">"Delete the current Buffer while maintaining the window layout"</span> })</span><br></pre></td></tr></table></figure>
<h4 id="在-neovim-中编辑-hexo-blog">Editing a Hexo blog in Neovim</h4>
<figure class="highlight lua"><table><tr><td class="code"><pre><span class="line"><span class="keyword">local</span> blog_path = <span class="string">"~/Documents/Hexo-Blog"</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">local</span> <span class="function"><span class="keyword">function</span> <span class="title">blogNew</span><span class="params">(input)</span></span></span><br><span class="line"> vim.api.nvim_set_current_dir(blog_path)</span><br><span class="line"> <span class="built_in">require</span>(<span class="string">'nvim-tree.api'</span>).tree.change_root(blog_path)</span><br><span class="line"> <span class="keyword">local</span> <span class="built_in">output</span> = vim.fn.system(<span class="string">"hexo n "</span> .. <span class="string">'\"'</span> .. <span class="built_in">input</span>.args .. <span class="string">'\"'</span>)</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (vim.v.shell_error == <span class="number">0</span>) <span class="keyword">then</span></span><br><span class="line"> <span class="keyword">local</span> <span class="built_in">path</span> = <span class="built_in">string</span>.<span class="built_in">sub</span>(<span class="built_in">output</span>, <span class="built_in">string</span>.<span class="built_in">find</span>(<span class="built_in">output</span>, <span class="string">'~'</span>, <span class="number">1</span>, <span class="literal">true</span>), <span class="number">-1</span>)</span><br><span class="line"> vim.cmd(<span class="string">":e "</span> .. <span class="built_in">path</span>)</span><br><span class="line"> <span class="keyword">else</span></span><br><span class="line"> vim.notify(<span class="string">"Failed creating new blog post"</span> .. 
<span class="built_in">input</span>.args, <span class="string">"error"</span>)</span><br><span class="line"> <span class="keyword">end</span></span><br><span class="line"><span class="keyword">end</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">local</span> <span class="function"><span class="keyword">function</span> <span class="title">blogNewDraft</span><span class="params">(input)</span></span></span><br><span class="line"> vim.api.nvim_set_current_dir(blog_path)</span><br><span class="line"> <span class="built_in">require</span>(<span class="string">'nvim-tree.api'</span>).tree.change_root(blog_path)</span><br><span class="line"> <span class="keyword">local</span> <span class="built_in">output</span> = vim.fn.system(<span class="string">"hexo new draft "</span> .. <span class="string">'\"'</span> .. <span class="built_in">input</span>.args .. <span class="string">'\"'</span>)</span><br><span class="line"></span><br><span class="line"> <span class="keyword">if</span> (vim.v.shell_error == <span class="number">0</span>) <span class="keyword">then</span></span><br><span class="line"> <span class="keyword">local</span> <span class="built_in">path</span> = <span class="built_in">string</span>.<span class="built_in">sub</span>(<span class="built_in">output</span>, <span class="built_in">string</span>.<span class="built_in">find</span>(<span class="built_in">output</span>, <span class="string">'~'</span>, <span class="number">1</span>, <span class="literal">true</span>), <span class="number">-1</span>)</span><br><span class="line"> vim.cmd(<span class="string">":e "</span> .. <span class="built_in">path</span>)</span><br><span class="line"> <span class="keyword">else</span></span><br><span class="line"> vim.notify(<span class="string">"Failed creating new blog post"</span> .. 
<span class="built_in">input</span>.args, <span class="string">"error"</span>)</span><br><span class="line"> <span class="keyword">end</span></span><br><span class="line"><span class="keyword">end</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">local</span> <span class="function"><span class="keyword">function</span> <span class="title">blogGenerateAndDeploy</span><span class="params">()</span></span></span><br><span class="line"> vim.api.nvim_set_current_dir(blog_path)</span><br><span class="line"> <span class="keyword">if</span> (<span class="built_in">os</span>.<span class="built_in">execute</span>(<span class="string">"hexo g && hexo s"</span>)) <span class="keyword">then</span></span><br><span class="line"> vim.notify(<span class="string">"Deploy the blog successfully"</span>, <span class="string">"info"</span>)</span><br><span class="line"> <span class="keyword">else</span></span><br><span class="line"> vim.notify(<span class="string">"Deployment of blog failed"</span>, <span class="string">"error"</span>)</span><br><span class="line"> <span class="keyword">end</span></span><br><span class="line"><span class="keyword">end</span></span><br><span class="line"></span><br></pre></td></tr></table></figure>
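<p>One caveat about the exit-status check, offered as a hedged sketch rather than a drop-in replacement: on LuaJIT-based Neovim builds that follow Lua 5.1 semantics, <code>os.execute</code> returns a numeric status, and every number (including a non-zero exit code) is truthy in Lua, so an <code>if os.execute(...)</code> test may never reach its failure branch. The variant below assumes that situation and checks <code>vim.v.shell_error</code> instead, as the two functions above already do. The name <code>blogGenerate</code> is hypothetical, and the sketch runs only <code>hexo g</code>:</p>
<figure class="highlight lua"><table><tr><td class="code"><pre><span class="line"><span class="comment">-- Same directory as above; expand() resolves the leading "~" explicitly</span></span><br><span class="line"><span class="keyword">local</span> blog_path = vim.fn.expand(<span class="string">"~/Documents/Hexo-Blog"</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment">-- Hypothetical helper: generate the site and report success or failure</span></span><br><span class="line"><span class="keyword">local</span> <span class="keyword">function</span> blogGenerate()</span><br><span class="line">  vim.api.nvim_set_current_dir(blog_path)</span><br><span class="line">  <span class="keyword">local</span> output = vim.fn.system(<span class="string">"hexo g"</span>)</span><br><span class="line"></span><br><span class="line">  <span class="comment">-- vim.v.shell_error holds the exit code of the last vim.fn.system() call</span></span><br><span class="line">  <span class="keyword">if</span> vim.v.shell_error == <span class="number">0</span> <span class="keyword">then</span></span><br><span class="line">    vim.notify(<span class="string">"Generated the blog successfully"</span>, vim.log.levels.INFO)</span><br><span class="line">  <span class="keyword">else</span></span><br><span class="line">    vim.notify(<span class="string">"Generating the blog failed:\n"</span> .. output, vim.log.levels.ERROR)</span><br><span class="line">  <span class="keyword">end</span></span><br><span class="line"><span class="keyword">end</span></span><br></pre></td></tr></table></figure>
<p>The sketch also passes <code>vim.log.levels</code> constants to <code>vim.notify</code>, which is the form the built-in implementation compares against.</p>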
<h4 id="取消下一行注释">Disabling automatic comment continuation on new lines</h4>
<figure class="highlight lua"><table><tr><td class="code"><pre><span class="line"><span class="comment">-- avoid comment when enter the new line</span></span><br><span class="line">vim.api.nvim_create_autocmd({ <span class="string">"BufEnter"</span> }, {</span><br><span class="line"> pattern = <span class="string">"*"</span>,</span><br><span class="line"> callback = <span class="function"><span class="keyword">function</span><span class="params">()</span></span></span><br><span class="line"> vim.opt.formatoptions = vim.opt.formatoptions - { <span class="string">"c"</span>, <span class="string">"r"</span>, <span class="string">"o"</span> }</span><br><span class="line"> <span class="keyword">end</span>,</span><br><span class="line">})</span><br></pre></td></tr></table></figure>
]]></content>
<categories>
<category>Tools</category>
</categories>
<tags>
<tag>Vim</tag>
</tags>
</entry>
<entry>
<title>Common Neovim Configuration (3) (clangd & CMake)</title>
<url>/2023/03/01/Neovim%E5%B8%B8%E7%94%A8%E9%85%8D%E7%BD%AE-3-Clangd---CMake/</url>
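<content><![CDATA[<p>When developing C/C++ in Neovim, we often use <strong>clangd</strong> as the <strong>LSP</strong> server to provide language services such as syntax highlighting and refactoring.</p>
<p>clangd's ability to infer macros automatically is also very useful, and it works even better together with <strong>CMake</strong> (for example, it picks up macros defined through CMake and resolves the include paths configured in CMake).</p>
<p>Below is a brief way to integrate clangd with CMake.</p>
<p>In general, <strong>clangd</strong> automatically picks up the <strong>compile_commands.json</strong> generated by <strong>CMake</strong> and uses it to resolve headers and analyze macros.</p>
<p>However, compile_commands.json is not generated by default, so we use the following command to have CMake generate it:</p>
<figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">cmake . -DCMAKE_EXPORT_COMPILE_COMMANDS=ON</span><br></pre></td></tr></table></figure>
<p>Here <em><code>-DCMAKE_EXPORT_COMPILE_COMMANDS=ON</code></em> is the flag that exports the compile commands. (A bare <code>-G</code> with no generator name after it would swallow the next argument, so it is omitted here and CMake uses its default generator.)</p>
<p>So I usually create a build.sh in the project directory to build the project:</p>
<figure class="highlight bash"><table><tr><td class="code"><pre><span class="line">cmake . -DCMAKE_EXPORT_COMPILE_COMMANDS=ON</span><br><span class="line">make</span><br></pre></td></tr></table></figure>
<p>To build, just run build.sh.</p>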
]]></content>
<categories>
<category>Tools</category>
</categories>
<tags>
<tag>Vim</tag>
</tags>
</entry>
<entry>
<title>OSPF (Open Shortest Path First) & BGP (Border Gateway Protocol)</title>
<url>/2023/08/19/OSPF_BGP/</url>
<content><![CDATA[<h2 id="making-routing-scalable">Making routing scalable</h2>
<p>Here are some concepts to note:</p>
<p>scale: billions of destinations:</p>
<ul>
<li>can't store all destinations in routing tables.</li>
<li>routing table exchange would swamp links.</li>
</ul>
<p>administrative autonomy:</p>
<ul>
<li>Internet: a network of networks</li>
<li>each network admin may want to control routing in its own network</li>
</ul>
<h2 id="approach-to-scalable-routing">Approach to scalable routing</h2>
<p>We aggregate routers into regions known as "autonomous systems" (AS, a.k.a. "domains").</p>
<p><strong>Intra-AS (intra-domain)</strong> routing is routing among routers within the same AS (network).</p>
<ul>
<li>all routers in AS must run same intra-domain protocol.</li>
<li>routers in different AS can run different intra-domain protocols.</li>
<li>gateway router: at edge of its own AS, has link(s) to router(s) in other AS'es</li>
</ul>
<p><strong>Inter-AS (inter-domain)</strong> routing is routing among AS'es; it is performed by the gateway routers.</p>
<p>Both determine the entries for destinations in a router's forwarding table: the former for destinations <em>within</em> the AS, the latter for <em>external</em> destinations. The most common intra-AS routing protocols are:</p>
<ul>
<li>RIP (Routing Information Protocol), which is no longer widely used.</li>
<li>OSPF (Open Shortest Path First), which includes classic <strong>link-state</strong> routing.</li>
<li>EIGRP (Enhanced Interior Gateway Routing Protocol), which is distance-vector (<strong>DV</strong>) based.</li>
</ul>
<h2 id="ospf">OSPF</h2>
<p>OSPF is an intra-domain routing protocol.</p>
<ul>
<li><p>open: publicly available</p></li>
<li><p>classic link-state:</p>
<ul>
<li>each router floods OSPF link-state advertisements (directly over IP) to all other routers in entire AS.</li>
<li>multiple link cost metrics are possible: bandwidth, delay.</li>
<li>global (has full topology)</li>
</ul>