-
Notifications
You must be signed in to change notification settings - Fork 4
/
JamesSergeyRodrigoDefcon2012.txt
1832 lines (1439 loc) · 73.5 KB
/
JamesSergeyRodrigoDefcon2012.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
Overwriting the Exception Handling Cache Pointer - Dwarf Oriented Programming
Version 1.0
James Oakley (Electron)
Sergey Bratus (@SergeyBratus)
Rodrigo Branco (@BSDaemon)
"I'll never stop to be amazed by the amount
of effort people put into not understanding things"
Mark Dowd
------[ Index
0 - Abstract
1 - Introduction
1.1 - Paper structure
2 - Concepts and Additions
2.1 - Call Frame Information
2.1.1 - Dwarf Tables
2.1.2 - Dwarf Registers
2.1.3 - Dwarf Rules
2.2 - DWARF Expressions
2.3 - Exception Handlers
2.4 - Exception Process
2.4.1 - Context initialization
2.4.2 - Stack unwinding using DWARF tables
2.4.3 - Overall algorithm
2.4.4 - Callgraph of the relevant gcc/libunwind code
2.5 - Playground
3 - The Techniques and Comparisons
3.0 - Katana and Dwarfscript
3.1 - Dwarf Payloads
3.2 - .eh_frame abusing techniques
3.2.1 - Using a write4 to overwrite the cache pointer
3.2.2 - Using a writeN to overwrite the .eh_frame
3.2.3 - Shellcode execution to replace .eh_frame: Avoiding
the usage of decoders
3.2.4 - Ret-into-lib to remap the .eh_frame pages as writable
3.3 - ROP x .eh_frame
4 - Future
5 - Acknowledgements
6 - References
7 - Downloadable Materials
------[ 0 - Abstract
This paper describes a new technique for abusing the DWARF exception
handling architecture used by the GCC tool chain. This technique can
be used to exploit vulnerabilities in programs compiled with or linked
to exception-enabled parts. Exception handling information is stored
in bytecode format, executed by a virtual machine during the course of
exception unwinding and handling. We show how a malicious attacker
could gain control of those structures and inject bytecode for
malicious purposes. This virtual machine is actually Turing-complete,
which means that it can be made to run arbitrary attacker logic.
------[ 1 - Introduction
This paper demonstrates how to exploit exception handling mechanisms based on
DWARF. For a detailed analysis describing construction of trojaned binaries,
please refer to [1]. We try here to extend the aforementioned paper, but for
completeness we also explain all the technical details of the exception
handling mechanisms. Some experience with DWARF is recommend for fully
understanding of this paper.
In this paper, we are going to discuss the standard format used in many
Unix-based systems to represent programs on disk and their respective
process image in memory, the ELF format [2].
An ELF binary is organized using various header information and an array of
named sections. Each section contains data used in a specific part of the
process lifecycle. For programs dependent on exceptions (such as the ones
created using C++) the format must provide a way for the exception handling
mechanism to occur.
On Windows-like systems (such as ReactOS and Windows XP) the information is
stored on the stack [3]. Exploitation of such mechanisms on Windows is very
common, and ways to avoid such exploitation have been exhaustively
researched [4] [5].
On Linux and other Unix-like systems, exception handling information is
stored in ELF sections using a standardized format called Dwarf (Debugging
with Attributed Records Format) [6]. Exploitation of Dwarf-based exception
handling mechanisms had never been discussed before, and this discussion is the
intent of the article. We would like to stress here that we are going to
discuss Dwarf-based EXCEPTION HANDLING, not Dwarf-based Debugging Data.
When the attacker gains control over the Dwarf data, he can perform
sophisticated computations unhindered by standard protections such as
non-executable memory and ASLR. As we will demonstrate in this article,
non-executable memory is completely useless against this technique. ASLR
is still problematic, and the conditions necessary to bypass it are similar to
the conditions needed to bypass it using other techniques (such as memleaks).
Nonetheless, since we need other memory area information to be leaked, it
increases possibility of bypass in occasions when you don't control the
information being leaked.
There are two main attack vectors for abusing Dwarf-based exception handling:
- Dwarf Trojans -> The attacker has complete control over the binary,
and thus creates a malicious Dwarf section. Since the antivirus
companies are currently unaware of the technique, it cannot be
detected using signatures. This attack is discussed in [1].
- Dwarf Injection -> The attacker has an arbitrary write in a running
program, allowing him to overwrite data. This ability is used to
overwrite the Dwarf portion of an executable (it requires multiple
writes and a writable Dwarf section map) or the information used to
locate the Dwarf data (our biggest contribution). After the
injection, the attacker just needs to force the vulnerable program to
throw an exception, and the crafted Dwarf data will be executed.
In [1], the authors demonstrated that, due to the flexibility and extensibility
of the Dwarf-based exception handling, accommodating present and future stack
unwinding, saved registers restoration logic and bytecode support, generic
computations are possible. Among other things, these computations are
capable of performing memory reading and register reading, and as such make
full use of the symbol information. To avoid including a huge list of
information for unwinding, the bytecode represents the symbolic memory
operations in efficient ways, thus allowing us to pack a lot of functionality
in small sections of data (a great feature for exploiting vulnerabilities with
size constraints). We created a dynamic linker in less than 200 bytes, a
size that fits within the typical .eh_frame section length.
---[ 1.1 - Paper structure
The paper is structured in a different way than [1] due to the focus being
only on the exploitation part of the problem.
In section 2 we describe the Dwarf-based exception handling mechanism,
detailing its internals and comparing it with other code execution mechanisms.
We present the tools created to manipulate Dwarf data in this section (attached
to this article).
In section 3 we discuss the exploitation mechanisms, using different conditions
in order to generalize the ideas. We give examples of vulnerable programs,
discuss which information the attacker needs to be able to locate, and explain
how to locate the information.
------[ 2 - Concepts and Additions
In this chapter we discuss the Dwarf-based exception handling mechanism in
detail.
We define the models used for computation and show other similar techniques
used during exploitation.
The technique presented in this article is important due to the wide use
of gcc-like compilers (LLVM also uses Dwarf-based exception handling, and
Clang C++ compiler is known to be nearly fully binary compatible with gcc) on
different Unix-based OSes (Linux, BSD, Solaris, Darwin, Cygwin) and also on
Windows-based machines (gcc on Windows uses setjmp/longjmp (SJLJ) [7]
exception handling [8], but Dwarf-based exception handling is also supported
except for the GUI callbacks - gcc on Windows does not use SEH, and gcc 4.4
only supports Dwarf [9]). The exception-handling data formats and processes are
not C++ specific and are also applicable to other gcc-compiled languages
supporting exceptions.
For discussing details we choose C++ code, compiled with GCC on Linux
x86_64 architecture.
Traditional exploitation techniques focus either on introducing new code to
be executed directly [10] or on manipulating the flow of execution of
existing code [11] [12] [13]. In translation to ELF terms, the attention
is highly focused on the executable sections of the target program and the
libraries it loads (or can be forced to load).
Besides the main computation (traditionally the target of exploitation) in
a modern ELF file there are over 30 auxiliary sections, which contain data
and code controlling various stages of the process life-cycle: loading,
dynamic linking, relocation, process tear down, exception handling.
The automaton responsible for performing all these tasks uses the relevant
ELF sections as input. Previous work like locreate [14] already demonstrated
the potential for borrowing logic from auxiliary tasks that build a process
image (in locreate's case, relocation). The logic of these tasks is driven
by specialized data sections. Changing these sections will cause the
auxiliary tasks to do more; possibly, perform arbitrary computations.
To verify if your compiler is using Dwarf-based exception handling we
recommend the following steps (in our example, under Solaris):
------------------------ Checking if Dwarf is used ------------------------
Establish what version of Solaris we're on
$ uname -a
SunOS boson 5.11 snv_111b i86pc i386 i86pc Solaris
$ cat /etc/release
OpenSolaris 2009.06 snv_111b X86
Copyright 2009 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 07 May 2009
Show a simple C++ program with an exception
$ cat hello.cpp
#include <iostream>
int main(int argc,char** argv)
{
try
{
std::cout<<"Hello world\n";
throw 1;
}
catch(int a)
{
std::cout<<"caught int "<<a<<"\n";
}
}
$ g++ -o hello hello.cpp
$ ./hello
Hello world
caught int 1
Examine the ELF structure of that program
$ readelf -S ./hello
There are 33 section headers, starting at offset 0x2620:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk
Inf Al
[ 0] NULL 00000000 000000 000000 00 0
0 0
[ 1] .interp PROGBITS 080500f4 0000f4 000011 00 A 0
0 1
[ 2] .SUNW_cap LOOS+ffffff5 08050108 000108 000010 08 A 0
0 4
[ 3] .hash HASH 08050118 000118 000178 04 A 5
0 4
[ 4] .SUNW_ldynsym LOOS+ffffff3 08050290 000290 000100 10 A 6
10 4
[ 5] .dynsym DYNSYM 08050390 000390 0002d0 10 A 6
1 4
[ 6] .dynstr STRTAB 08050660 000660 0005e2 00 AS 0
0 1
[ 7] .SUNW_version VERNEED 08050c44 000c44 000050 01 A 6
2 4
[ 8] .SUNW_versym VERSYM 08050c94 000c94 00005a 02 A 5
0 4
[ 9] .SUNW_dynsymsort LOOS+ffffff1 08050cf0 000cf0 000058 04 A 4
0 4
[10] .SUNW_reloc REL 08050d48 000d48 000038 08 A 5
0 4
[11] .rel.plt REL 08050d80 000d80 0000a0 08 AI 5
c 4
[12] .plt PROGBITS 08050e20 000e20 000150 10 AX 0
0 4
[13] .text PROGBITS 08050f70 000f70 000484 00 AX 0
0 4
[14] .init PROGBITS 08051400 001400 00001d 00 AX 0
0 16
[15] .fini PROGBITS 08051420 001420 000018 00 AX 0
0 16
[16] .rodata PROGBITS 08051438 001438 00001f 00 A 0
0 4
[17] .got PROGBITS 08061458 001458 000068 04 WA 0
0 4
[18] .dynamic DYNAMIC 080614c0 0014c0 000170 08 WA 6
0 4
[19] .data PROGBITS 08061630 001630 000058 00 WA 0
0 8
[20] .bssf PROGBITS 08061688 001688 000000 00 WA 0
0 1
[21] .picdata PROGBITS 08061688 001688 000000 00 WA 0
0 1
[22] .ctors PROGBITS 08061688 001688 00000c 00 WA 0
0 4
[23] .dtors PROGBITS 08061694 001694 00000c 00 WA 0
0 4
[24] .eh_frame PROGBITS 080616a0 0016a0 000128 00 WA 0
0 4
[25] .jcr PROGBITS 080617c8 0017c8 000004 00 WA 0
0 4
[26] .data.rel.local PROGBITS 080617cc 0017cc 000004 00 WA 0
0 4
[27] .gcc_except_table PROGBITS 080617d0 0017d0 000020 00 WA 0
0 4
[28] .bss NOBITS 080617f0 0017f0 0000b4 00 WA 0
0 8
[29] .symtab SYMTAB 00000000 0017f0 000720 10 30
46 4
[30] .strtab STRTAB 00000000 001f10 00048e 00 S 0
0 1
[31] .comment PROGBITS 00000000 00239e 000166 00 0
0 1
[32] .shstrtab STRTAB 00000000 002504 00011c 00 S 0
0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
$ readelf --debug-dump=frames ./hello
The section .eh_frame contains:
00000000 00000018 00000000 CIE
Version: 1
Augmentation: "zPL"
Code alignment factor: 1
Data alignment factor: -4
Return address column: 8
Augmentation data: 00 60 0f 05 08 00
DW_CFA_def_cfa: r4 ofs 4
DW_CFA_offset: r8 at cfa-4
0000001c 00000028 00000020 FDE cie=00000000 pc=0805114c..08051247
Augmentation data: 00 00 00 00
continues...
We can see that Dwarf-related information is present in the binary
(DW_CFA_*). We explain it below.
---------------------------------------------------------------------------
---[ 2.1 - Call Frame Information
This paper is about exploiting auxiliary computations inside a program.
The benefits for the attacker are due to new opportunities while exploiting a
given vulnerability and are highly dependent on the vulnerability specifics.
Section 3 will focus on the advantages for each scenario.
In this section we explain how we can manipulate the automaton used by the
exception-handling process, which is a Turing-complete machine, by providing
crafted contents for the .eh_frame and .gcc_except_table ELF sections.
We will discuss how the exception handling process works as implemented by gcc
and partially standardized by the Linux Standards Base [15] and x86_64 ABI
[16].
To handle an exception, the stack must be unwound. Walking the call
stack following return address pointers to find all call frames is not
efficient and does not provide execution restoration (due to the loss
of the register state - functions contain assembly instructions before
the return to restore callee-saved registers before returning to the
caller, and these instructions are not going to be executed when there
is an interrupt). It is thus a requisite that the information necessary
to restore registers at the time of an unexpected procedure termination
(exception thrown from within the procedure) is present at the time of
the exception handling.
It is thus a requisite that the information necessary to restore registers at
the time of an unexpected procedure termination (exception thrown from
within the procedure) is present at the time of the exception handling.
Debuggers also require this information in order to display backtraces,
examine local variables, and so on. That is why DWARF Call-Frame Information
section of the Dwarf standard [6] has been adopted to encode the unwinding
information (there are minor changes in areas such as pointer encoding
[16]).
GCC does not care which version of DWARF a program was compiled against (the
DWARF standard provides call frame information with a version number field,
and in .eh_frame this version number is always set to 1) unless necessary to
resolve the layout of a structure of behavior which conflicts across standards
(we use version 4 for this paper).
As we mentioned, the information utilized by debuggers and the exception
handling mechanism of the C++ runtime lies within a bunch of ELF sections:
.eh_frame, .debug_frame, .gcc_except_table and many more, but this article
deals with the first two (other DWARF related sections deal with type
information, mappings from source code lines to ASM instructions and so
on). The layout of .eh_frame and .debug_frame is similar. Whatever said
for the first also applies to the latter unless otherwise stated.
Quoting [25]: "The .eh_frame section shall contain 1 or more Call
Frame Information (CFI) records. The number of records present shall be
determined by size of the section as contained in the section header. Each
CFI record contains a Common Information Entry (CIE) record followed
by 1 or more Frame Description Entry (FDE) records. Both CIEs and FDEs
shall be aligned to an addressing unit sized boundary". This leads us
to the following ASCII diagram that depicts the relationship between
the various structures.
.eh_frame section
.-----------------------.
| CFI | CFI = Call Frame Information
| .---------------. |
| | CIE | | CIE = Common Information Entry
| '---------------' |
| | FDE | | FDE = Frame Description Entry
| | .-------. | |
| | | *LSDA | | | LSDA = Language Specific Data Area
| | '-------' | |
| '---------------' |
| | FDE | |
| | .-------. | |
| | | *LSDA | | |
| | '-------' | |
| '---------------' |
| | | |
| | ... | |
| '---------------' |
| | FDE | |
| | .-------. | |
| | | *LSDA | | |
| | '-------' | |
| '---------------' |
'-----------------------'
FDE is a structure that holds information for a range of instruction
pointer values and is usually used to represent debug information for
a procedure. Nevertheless, certain compiler optimizations may result
in one procedure being assigned several FDEs. For example, optimizing
compilers usually split the code in "hot" and "cold" paths that are
not contiguous. The "hot" path may be assigned a FDE separate from the
"cold" path. If an exception is thrown in function "foo()", or in case
a debugger wants to examine its debug information in order to unwind its
stack frame, the FDE for "foo()" is looked up and parsed.
A CIE (Common Information Entry) holds information common to several
FDEs. For this purpose, each FDE structure contains a pointer to its
parent CIE. Those that have experience with C compiler internals, may
think of a CIE as the debugging information common to a 'translation unit'
of the target program [27]. A CIE indicates the start of a CFI logical
block. That is, a CFI (Call Frame Information) record doesn't have a
header of its own - The term CIE is only used as a logical separation
of a CIE along with its children FDEs.
Last but not least, LSDA stands for Language Specific Data Area. Each FDE
contains a pointer to a LSDA but it may be NULL indicating the absence
of specific information for the FDE in question. The layout of the LSDAs
is architecture specific but its something we're not dealing with in
this article. On x86 and x86_64 (and probably others), LSDAs contain a
landing-pad table in a special encoded format. The landing-pads indicate
the address of an exception handler for a given set of instruction
pointer values. That is, if an exception happens at 0xdeadbeef, then
the landing-pad table will have an entry for that EIP that will encode
the address of the exception handler as well as its type information.
---[ 2.1.1 - Dwarf Tables
The unwinding information describes a large table. The rows of the table
correspond to machine instructions in the program text, and the columns
correspond to registers and Canonical Frame Addresses (CFA).
CFA is internally used by the Dwarf to express the canonical address for
the call frame on the stack.
The information in the rows is sufficient to restore the machine state (the
values of the registers and the CFA) for every instruction within the previous
call frame.
Here is a simplified example of what a Dwarf table might look like:
---------------------------------------------------------------------------
PC (eip) || CFA || ebp || (ebx) || eax || return address
---------------------------------------------------------------------------
0xf000f000 || rsp+16 || *(cfa-16) || || || *(cfa-8)
0xf000f001 || rsp+16 || *(cfa-16) || || || *(cfa-8)
0xf000f002 || rsp+16 || *(cfa-16) || || eax=edi || *(cfa-8)
... || ... || ... || ... || ... || ...
0xf000f00a || rsp+16 || *(cfa-16) || *(cfa-24) || eax=edi || *(cfa-8)
---------------------------------------------------------------------------
The Dwarf specification largely discusses compression techniques to provide
sufficient information at run-time to build parts of the table without the
full information.
The aforementioned compression required by the Dwarf tables is performed
using the concept of the Frame Description Entities (FDE) and Dwarf
instructions.
An FDE corresponds to a logical block of the program .text and describes
how unwinding may be done from within that block. Information common to
many FDEs is stored in the Common Information Entity (CIE). The .eh_frame
uses information from the versions 2 and 3 and does not include details
added to this structure in version 4 of the standard.
The FDE structure looks like this:
length
CIE_pointer
initial_location
address_range
LSDA pointer (see section 2.3)
instructions
padding
Whereas the CIE structure contains:
length
CIE_id
version
augmentation (string) (see section 2.3)
address_size
segment_size
code_alignment_factor
data_alignment_factor
return_address_register
initial_instructions
padding
The instructions in the FDE either specify column rules (registers) to
apply to all cells in that column from the initial_location to the end of
the procedure (unless a different rule is specified for the same
column/register in the sequence) or change the initial_location (thus
influencing how the other instructions affect the process).
---[ 2.1.2 - Dwarf Registers
An arbitrary number of registers is permitted, identified by their number.
To map the Dwarf registers with the architecture-specific hardware
registers we need the ABI information. Some Dwarf registers do not map
directly to a hardware register (such as the return-address in x86
architecture).
---[ 2.1.3 - Dwarf Rules
The Dwarf tables contain in each cell a rule detailing how the contents of
the register will be restored for the previous call frame. There are
several rules supported [6].
Registers are restored usually from another register or from a memory
location accessible through an offset from the CFA.
---[ 2.2 - Dwarf Expressions
Dwarf expressions were introduced in the version 3 of the Dwarf standard to
give flexibility in the way a register could be restored, permitting it to be
restored to the value computed by a Dwarf expression.
The expressions consist of one or more Dwarf operations (instructions) and
are evaluated by a stack machine with instructions operating on the top items
on the stack.
Since the standard does not specify the format of the stack items, gcc uses
architecture word-sized objects.
All basic operations used for numerical computations are available:
- Pushing values onto the stack
- Arithmetic operations
- Bitwise operations
- Stack manipulation
Other instructions are available to dereference memory addresses and to
obtain values from Dwarf registers calculated as part of the unwinding
process.
This functionality allows to restore registers after some arithmetics using
values either in memory locations or in registers. It also extends the
Dwarf rules in the sense that you are now capable of using absolute addresses
and not only stack-relative ones.
The expressions also include conditional operations and conditional branch
instructions: all the conditions required are met to consider the Dwarf
automaton being Turing-complete.
---[ 2.3 - Exception Handlers
Since Dwarf is a debugging format, its specification does not provide any
mechanism for halting the unwinding process.
Every CIE includes an augmentation string, which is implementation-defined
(not defined by the standard).
This augmentation string is used to control the set of previously agreed upon
augmentations to the CIE and FDE structures being used. For our chosen case of
Linux on x86_64, the augmentations are well-defined [15] [16].
The augmentation defines a language-specific data area (LSDA) and
personality routine associated with every FDE (defined as pointers).
During the unwinding of an FDE, the process calls the personality routine
associated to this FDE. The routine interprets the LSDA and determines if
a handler for the exception has been found [22].
LSDA contents are not defined by any standard, and the same program may use
completely different formats, as they will be read by separate personality
routines (libstdc++-v3/libsupc++/eh_personality.cc).
For details of the format, the gcc commented assembly code generated with
gcc -fverbose-asm -dA is recommended.
In an ELF binary, the section .gcc_except_table contains an array of LSDA.
The text region described by an FDE is divided into call sites defined by the
LSDA, with each call site corresponding to a code within a try block (C++
terminology) and with a pointer to a chain of C++ typeinfo descriptors
(again, C++ terminology). These objects are then used by the personality
routine to determine whether the exception can be handled in the current
frame:
gcc_except_table | LSDA 0
----------------- ----------------
LSDA 0 | -------->| Header -----> | LpStart encoding
LSDA 1 | | Call Site Table -| | LpStart
... | | Action Table---| | | TType format
LSDA n | | Type Table---| | | | TTBase
------------------------------------| | | | Call Site Format
| ----------------------------| | | Call Site Table size
| | | ----------------------
| | ----------------------- |
| | |----->| Call Site Record 0 -> | call site position
| | | Call Site Record 1 | call site length
| | | ... | landing pad position
| | | Call Site Record n | first action
| | ------------------- ----------------------
| |
| |----> | action 0
| | action 1
| | ...
| | action n
| -----------
|
|----> | typeid 0
| typeid 1
| ...
| typeid n
-----------
Arrows in the graph denote not pointers but a detailed layout of the
object in question.
There is a separate LSDA for each try/catch block. The LSDA is made
up of the following parts:
| HEADER |
| --LPStart enc |
| --LPStart |
| --TType format |
| --TTBase |
| --CallSiteFormat |
| --CallSiteTableSize |
| CALL SITE TABLE |
| --CallSiteRecord1 |
| ----call site position |
| ----call site length |
| ----landing pad position |
| ----first action |
| --CallSiteRecord2 |
| .... |
| --CallSiteRecordn |
| ACTION TABLE |
| --Action1 |
| ----type filter |
| ----offset to next action |
| --Action2 |
| .... |
| --ActionN |
| TYPE TABLE |
| --typeinfoN |
| .... |
| --typeinfo2 |
| --typeinfo1 |
In more detail:
*** header
+ LPStart encoding
Note that gcc frequently writes it to be DW_PE_omit.
+ LPStart
Landing pad start pointer. Self-relative offset to beginning of
landing-pad code for this code fragment. Landing pad fields in
call-site table entries are relative to this. The lower 4 bits
are not part of this pointer and have a special meaning:
+ 0000: There is a type table pointer
+ 0001: There is no type table pointer
+ TTypeFormat
This seems to be the encoding of the entries in the type table
(DW_PE_* values).
+ TTBase offset
Self-relative offset to the *end* of the type table. The type
table goes from bottom to top, like a heap. This appears to
always be ULEB128, although this is not specified by any
documentation.
+ CallSiteFormat
The format (DW_PE_*) of values in entries in the call site table
+ CallSiteTableSize
Number of bytes in the call site table. We believe that this
value is encoded according to the value of CallSiteFormat. It
seems to generally be uleb128.
*** call site table
The call site table consists of any number of
entries. Conceptually, each entry corresponds to a range of text
addresses and specifies, for that range, where the exception
handler (landing pad) begins and what types of thrown objects the
handler can deal with. This information is encoded by an entry
with the following fields (their format governed by
CallSiteFormat).
+ CallSiteStart
The text address of the beginning of this call-site, relative to
the start of the procedure. It is possible that this should
actually be LPStart-relative. Since gcc generally omits LPStart,
it is difficult to tell. Note that this differs from the
documentation.
+ CallSiteRange
Length of this call site in bytes.
+ LandingPad
The address of the handler, relative to the start of the
procedure. It is possible that this should
actually be LPStart-relative. Since gcc generally omits LPStart,
it is difficult to tell.
+ ActionRecordPtr
Byte-offset into the action table plus one. A value of 0
indicates no actions. When evaluating whether execution should
be passed to the landing pad, the action chain starting at this
offset can be examined.
*** action table
The action table can be used as a filter to see what types can be
"caught" by a handler. Each action record contains two values. The
documentation indicates that they are both LEB-encoded.
+ TypeFilter
This is used to match the type of a thrown exception to a type
that can be handled. It is an index into the type table
(starting from the bottom), which has more information about the
type. Note that it is an index, not a byte-offset.
Unfortunately, although this seems to be a 1-based index, on
occasion it has a value of 0. The documentation indicates that this
means "match all". A comment in
libstdc++-v3/libsupc++/eh_personality.cc line 581 of gcc 4.5.2
indicates that it may just apply to cleanups and not
handlers.
+ NextRecordPtr
Self-relative byte offset to the next action record. If 0 then
there is no next one.
*** types table
Table of type entries indexed from the bottom up. The format of
each entry is described by TTypeFormat in the header.
** asm
Some of the call sites exist for the purposes of dealing with
re-thrown exceptions or exceptions thrown within a catch block.
The beginning of the normal catch handler has conditional jumps.
---[ 2.4 - Exception Process
Exception handling is a mazy process. Although the C++ standards define
exceptions, their semantics and how they should be handled, no reference
is made regarding the actual implementation details (obviously this is not
what a standard should do). Consequently, each compiler suite implements
its very own way of handling exceptions, which may also differ depending
on the operating system. In this article we will be focusing on platforms
that use GNU compiler toolchain and its associated runtime. Most of the
code snippets presented below were directly ripped from the GCC source
code and libunwind, so, our analysis applies on various operating systems.
Analyzing the way exceptions are handled involves two components: gcc
and its libstdc++/libsup++, as well as libunwind by HP. The first is
famous for its large codebase, while the latter for its architectural
support which makes code reading quite difficult. When an exception is
thrown, control is passed from the libsup++ runtime to libunwind which
is responsible for extracting, parsing and even caching the DWARF data
for later use. C++, being an object oriented language, performs many
magic operations transparently, without the explicit intervention of the
programmer. Examples that fall into this category are the constructors,
the destructors and of course, the exception handlers. In the end, the
whole world is about bits and bytes, everything is translated in ASM and
no high level object view is taken into account. Without further ado,
we start with a very simple C++ code that defines one try/catch block
and throws an exception. The relevant code snippet is shown below.
Note: Libunwind is capable of performing both local and remote
unwinding. The first refers to a process unwinding itself while the
latter involves frame inspection of a process A from another process
B. Additionally, libunwind is capable of handling unwind information
for either statically compiled languages as C or C++, or dynamically
compiled (interpreted) languages as Java or Python [23]. In this
article we are mostly interested in _local_ unwinding for _statically_
compiled languages, and precisely for C++.
---------------------- Exception Sample Code ------------------------------
#include <iostream>
using namespace std;
int main() {
try {
throw(1);
}
catch(int e) {
cout << "Hello Phrack! " << e << endl;
}
return 0;
}
---------------------------------------------------------------------------
Most programmers perceive exceptions as being asynchronous events,
kinda like POSIX signals, while they are not. Leaf functions calling
"throw()", like "main()" in the example above, do nothing more than
calling "__cxa_allocate_exception()" and then "__cxa_throw()". Remember
that exceptions are classes, so the first is used to allocate space
via "malloc()" (or via a set of buffers located in .bss if that fails,
check "gcc-xxx/libstd++v3/libsupc++/eh_alloc.c:84"), while the latter
is used to traverse the frames until a handler for that exception is
eventually located. It is now apparent that no asynchronous process is
involved. Throwing an exception means traversing the frames and calling
a special function, nothing more, nothing less, yet the process of doing
so, is kind of complex.
As a proof, consider the following ASM snippet that corresponds to the
simple .cpp file above (the disassembly comes from a x86_64 machine
running Linux).
---------------------------------------------------------------------------
Dump of assembler code for function main:
...
<+9>: mov $0x4,%edi # std::size_t thrown_size
# Allocates a new "__cxa_refcounted_exception" followed by 4 bytes; we
# do a "throw(1)", 1 being an "int" occupies 4 bytes.
<+14>: callq 0x400930 <__cxa_allocate_exception@plt>
...
<+25>: mov $0x0,%edx # void (*dest) (void *)
<+30>: mov $0x6013c0,%esi # std::type_info *tinfo
<+35>: mov %rax,%rdi # void *obj
<+38>: callq 0x400940 <__cxa_throw@plt>
---------------------------------------------------------------------------
The return value of "__cxa_allocate_exception()" returns a pointer to a
"struct __cxa_refcounted_exception". This structure holds a reference
count, as well as a "__cxa_exception" object, a structure used to
represent all exceptions. A copy of the thrown object, an "int" in
our example, follows the "exc" variable shown below. Obviously, since
exceptions are objects thrown that also need to be cought, the copy of
the thrown object is later used for the "catching" process.
---------------------------------------------------------------------------
struct __cxa_refcounted_exception
{
_Atomic_word referenceCount;
__cxa_exception exc;
};
---------------------------------------------------------------------------
.------ Pointer returned by __cxa_refcounted_exception()
|
v
---+----------------+-----+--------+---
... | referenceCount | exc | object | ...
---+----------------+-----+--------+---
Control is then passed to "__cxa_throw()" that does the hard work of (a)
initializing the current context (i.e. current value of registers) (b)
traversing the stack until a proper handler for the exception is found.
The following sections analyze those two tasks in more detail.
----[ 2.4.1 - Context initialization
The following figure represents part of the callgraph starting at
"__cxa_throw()". The, not necessarily leaf, functions "unw_getcontext()"
and "unw_init_local()" are responsible for the proper initialization of
the current context. They may deal with OS specific structures in order
to reshape them in machine independent libunwind objects.
---------------------------------------------------------------------------
__cxa_throw()
[gcc-xxx/libstdc++v3/libsup++/eh_throw.cc:61]
_Unwind_RaiseException()
[libunwind-xxx/src/unwind/RaiseException.c:29]
_Unwind_InitContext() [MACRO]
[libunwind-xxx/src/unwind/unwind-internal.h:52]
unw_getcontext()
[libunwind-xxx/src/x86_64/getcontext.S:37]
unw_init_local()
[libunwind-xxx/src/x86_64/Ginit_local.c:42]
---------------------------------------------------------------------------
As it was previously mentioned, stack traversal takes place with the help
of special DWARF tables. For the frame unwinding to begin one must know
the current line in the aforementioned table, that is the instruction
pointer value corresponding to the place where the exception was
raised. The current value of EIP, along with a bunch of other registers,
are stored in a buffer by "unw_getcontext()", a function coded in ASM,
defined in "libunwind-1.0.1/src/x86_64/getcontext.S:37".
Note: On certain platforms (e.g. IA-64) running Linux or FreeBSD,
"unw_getcontext()" [24] ends up calling "getcontext(3)". The latter
returns context information in a structure called "ucontext_t" which
has different members depending on the operating system, one of them
being the register state. Nevertheless, "unw_getcontext()" needs to
provide a more highlevel view of the registers. That's why figuring
out how it works, involves messing up with a bunch of libunwind
macros which take care of the underlying OS.
"unw_init_local()", among others, initializes a, so called, "struct
cursor" (check "libunwind-xxx/include/tdep-x86_64/libunwind_i.h:77"). This
structure holds the current state of the unwinder, including the
DWARF state. At this point of execution no DWARF related data from the ELF
file have been analyzed yet. The only pieces of available information
come from "getcontext(3)". The various bookkeeping structures are still
zeroed out.
----[ 2.4.2 - Stack unwinding using DWARF tables
Once the unwinding state has been initialized, the exception handling
mechanism continues by calling "dl_iterate_phdr(3)" to walk through
the list of loaded shared objects. For each DSO, libunwind attempts to
locate an .eh_frame ([25], [26]) or a .debug_frame (see DWARF specs)
segment. With the initial EIP value fetched via "unw_getcontext()", the
current FDE is looked up by checking if the first lies within the FDE's
"initial_location" and "address_range" members that specify the range
of instruction pointer values for which the FDE in question applies
(the DSO's load address is taken into account). While trying to locate
the appropriate FDE, libunwind parses its parent CIE and may also store
a pointer to the corresponding LSDA, if one exists.
In section 2.1.1, we mention that the information required to restore the
register state of a previous frame, are encoded in the current frame's
DWARF table. This DWARF table is encoded in a form of instructions,
like a standard ASM program. The instructions are executed using a simple
stack machine that, when halted, will return the values of the previous
frame's registers, including the instruction pointer value at the parent
call site. The instruction stream for a given FDE can be located by
following the FDE header field called "instructions".
For example, executing a set of such instructions for some random FDE
may yield the following semantics:
---------------------------------------------------------------------------
RAX: stays the same
RBX: its value results by dereferencing the stack pointer plus some
offset
RCX: its value is the result of executing DWARF expression at address
0xdeadbeef
RDX: its value is stored in the address returned by evaluating the
DWARF expression at 0xdeadbabe
...
RIP: its value results by dereferencing CFA (Cannonical Frame Address)
minus 8
---------------------------------------------------------------------------
The value of a register sometimes has to be recovered by evaluating a
DWARF expression. Obviously, once an attacker is able to inject
arbitrary DWARF data, he can force certain registers take values by
executing arbitrary DWARF expressions. This process gives control,
not only over EIP, but on all general purpose registers used by the
target architecture. Think of it as having an interpreter (Python?) on
every running process!
Once the instructions for the current FDE have been processed, the
unwinder's context is updated with the newly calculated register
values. Consequently, the new instruction pointer value points to the
parent call site. The personality routine (see next section) is called
and if an appropriate exception handler is found, it is dispatched,
otherwise, the whole process starts over from paragraph one.
Note: As we have previously stated frame unwiding is a difficult process. An
unwinder has to take care of several complexities, one of them being
the signal frames and their trampolines. The interested reader must
take a look at the appropriate kernel sources for the format of signal
frames as well as the signal trampoline code that is mostly interesting
those studying the grsecurity code.
----[ 2.4.3 - Overall algorithm
A) Using "getcontenxt(3)" or the equivalent ASM code, initiliaze the
current values of all registers including the frame pointer and the
instruction pointer
B) Use "dl_iterate_phdr(3)" to walk through all loaded shared objects.
Load .eh_frame and/or .debug_frame depending on the configuration
options
C) If .eh_frame