//! In-memory representation of compiled machine code, with labels and fixups to
//! refer to those labels. Handles constant-pool island insertion and also
//! veneer insertion for out-of-range jumps.
//!
//! This code exists to solve three problems:
//!
//! - Branch targets for forward branches are not known until later, when we
//! emit code in a single pass through the instruction structs.
//!
//! - On many architectures, address references or offsets have limited range.
//! For example, on AArch64, conditional branches can only target code +/- 1MB
//! from the branch itself.
//!
//! - The lowering of control flow from the CFG-with-edges produced by
//! [BlockLoweringOrder](super::BlockLoweringOrder), combined with many empty
//! edge blocks when the register allocator does not need to insert any
//! spills/reloads/moves in edge blocks, results in many suboptimal branch
//! patterns. The lowering also pays no attention to block order, and so
//! two-target conditional forms (cond-br followed by uncond-br) can often be
//! avoided because one of the targets is the fallthrough. There are several
//! cases here where we can simplify to use fewer branches.
//!
//! This "buffer" implements a single-pass code emission strategy (with a later
//! "fixup" pass, but only through recorded fixups, not all instructions). The
//! basic idea is:
//!
//! - Emit branches as they are, including two-target (cond/uncond) compound
//! forms, but with zero offsets and optimistically assuming the target will be
//! in range. Record the "fixup" for later. Targets are denoted instead by
//! symbolic "labels" that are then bound to certain offsets in the buffer as
//! we emit code. (Nominally, there is a label at the start of every basic
//! block.)
//!
//! - As we do this, track the offset in the buffer at which the first label
//! reference "goes out of range". We call this the "deadline". If we reach the
//! deadline and we still have not bound the label to which an unresolved branch
//! refers, we have a problem!
//!
//! - To solve this problem, we emit "islands" full of "veneers". An island is
//! simply a chunk of code inserted in the middle of the code actually produced
//! by the emitter (e.g., vcode iterating over instruction structs). The emitter
//! has some awareness of this: it either asks for an island between blocks, so
//! it is not accidentally executed, or else it emits a branch around the island
//! when all other options fail (see `Inst::EmitIsland` meta-instruction).
//!
//! - A "veneer" is an instruction (or sequence of instructions) in an "island"
//! that implements a longer-range reference to a label. The idea is that, for
//! example, a branch with a limited range can branch to a "veneer" instead,
//! which is simply a branch in a form that can use a longer-range reference. On
//! AArch64, for example, conditionals have a +/- 1 MB range, but a conditional
//! can branch to an unconditional branch which has a +/- 128 MB range. Hence, a
//! conditional branch's label reference can be fixed up with a "veneer" to
//! achieve a longer range.
//!
//! - To implement all of this, we require the backend to provide a `LabelUse`
//! type that implements a trait. This is nominally an enum that records one of
//! several kinds of references to an offset in code -- basically, a relocation
//! type -- and will usually correspond to different instruction formats. The
//! `LabelUse` implementation specifies the maximum range, how to patch in the
//! actual label location when known, and how to generate a veneer to extend the
//! range.
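//!
//! As a concrete flavor, a simplified, hypothetical two-variant sketch in the
//! spirit of the AArch64 backend is below; the real trait is
//! `MachInstLabelUse` and carries more methods (negative range, veneer sizes,
//! and so on):
//!
//! ```ignore
//! #[derive(Clone, Copy, Debug, PartialEq, Eq)]
//! enum LabelUse {
//!     /// 19-bit PC-relative offset in a conditional branch: +/- 1 MB range.
//!     Branch19,
//!     /// 26-bit PC-relative offset in an unconditional branch: +/- 128 MB range.
//!     Branch26,
//! }
//!
//! impl LabelUse {
//!     /// Maximum distance this reference kind can reach forward.
//!     fn max_pos_range(self) -> CodeOffset {
//!         match self {
//!             LabelUse::Branch19 => (1 << 20) - 1,
//!             LabelUse::Branch26 => (1 << 27) - 1,
//!         }
//!     }
//!
//!     /// Patch the already-emitted instruction at `use_offset` now that the
//!     /// label's location (`label_offset`) is known.
//!     fn patch(self, buffer: &mut [u8], use_offset: CodeOffset, label_offset: CodeOffset) {
//!         // Encode `label_offset - use_offset` into the branch's offset field.
//!     }
//!
//!     /// Whether a veneer can extend this reference's range: a conditional
//!     /// Branch19 can branch to an unconditional Branch26 veneer.
//!     fn supports_veneer(self) -> bool {
//!         matches!(self, LabelUse::Branch19)
//!     }
//! }
//! ```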
//!
//! That satisfies label references, but we still may have suboptimal branch
//! patterns. To clean up the branches, we do a simple "peephole"-style
//! optimization on the fly. To do so, the emitter (e.g., `Inst::emit()`)
//! informs the buffer of branches in the code and, in the case of conditionals,
//! the code that would have been emitted to invert this branch's condition. We
//! track the "latest branches": these are branches that are contiguous up to
//! the current offset. (If any code is emitted after a branch, that branch or
//! run of contiguous branches is no longer "latest".) The latest branches are
//! those that we can edit by simply truncating the buffer and doing something
//! else instead.
//!
//! To optimize branches, we implement several simple rules, and try to apply
//! them to the "latest branches" when possible:
//!
//! - A branch with a label target, when that label is bound to the ending
//! offset of the branch (the fallthrough location), can be removed altogether,
//! because the branch would have no effect.
//!
//! - An unconditional branch that starts at a label location, and branches to
//! another label, results in a "label alias": all references to the label bound
//! *to* this branch instruction are instead resolved to the *target* of the
//! branch instruction. This effectively removes empty blocks that just
//! unconditionally branch to the next block. We call this "branch threading".
//!
//! - A conditional followed by an unconditional, when the conditional branches
//! to the unconditional's fallthrough, results in (i) the truncation of the
//! unconditional, (ii) the inversion of the condition's condition, and (iii)
//! replacement of the conditional's target (using the original target of the
//! unconditional). This is a fancy way of saying "we can flip a two-target
//! conditional branch's taken/not-taken targets if it works better with our
//! fallthrough". To make this work, the emitter actually gives the buffer
//! *both* forms of every conditional branch: the true form is emitted into the
//! buffer, and the "inverted" machine-code bytes are provided as part of the
//! branch-fixup metadata.
//!
//! - An unconditional branch B preceded by another unconditional branch P, when
//! B's label(s) have been redirected to target(B), can be removed entirely. This
//! is an extension of the branch-threading optimization, and is valid because
//! if we know there
//! will be no fallthrough into this branch instruction (the prior instruction
//! is an unconditional jump), and if we know we have successfully redirected
//! all labels, then this branch instruction is unreachable. Note that this
//! works because the redirection happens before the label is ever resolved
//! (fixups happen at island emission time, at which point latest-branches are
//! cleared, or at the end of emission), so we are sure to catch and redirect
//! all possible paths to this instruction.
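//!
//! As a small worked example of "branch threading": suppose `label1` is bound
//! at the start of an unconditional branch at the buffer tail, and nothing
//! falls through into it. Then every use of `label1` can be resolved as if it
//! were `label2`, and the branch itself becomes removable (offsets here are
//! illustrative):
//!
//! ```text
//!   label1:        ; label1 becomes an alias of label2
//!     b label2     ; removable once no direct uses of label1 remain
//!   ...
//!   label2:
//!     ...
//! ```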
//!
//! # Branch-optimization Correctness
//!
//! The branch-optimization mechanism depends on a few data structures with
//! invariants, which are always held outside the scope of top-level public
//! methods:
//!
//! - The latest-branches list. Each entry describes a span of the buffer
//! (start/end offsets), the label target, the corresponding fixup-list entry
//! index, and the bytes (must be the same length) for the inverted form, if
//! conditional. The list of labels that are bound to the start-offset of this
//! branch is *complete* (if any label has a resolved offset equal to `start`
//! and is not an alias, it must appear in this list) and *precise* (no label
//! in this list can be bound to another offset). No label in this list should
//! be an alias. No two branch ranges can overlap, and branches are in
//! ascending-offset order.
//!
//! - The labels-at-tail list. This contains all MachLabels that have been bound
//! to (whose resolved offsets are equal to) the tail offset of the buffer.
//! No label in this list should be an alias.
//!
//! - The label_offsets array, containing the bound offset of a label or
//! UNKNOWN. No label can be bound at an offset greater than the current
//! buffer tail.
//!
//! - The label_aliases array, containing another label to which a label is
//! bound or UNKNOWN. A label's resolved offset is the resolved offset
//! of the label it is aliased to, if this is set.
//!
//! We argue below, at each method, how the invariants in these data structures
//! are maintained (grep for "Post-invariant").
//!
//! Given these invariants, we argue why each optimization preserves execution
//! semantics below (grep for "Preserves execution semantics").
//!
//! # Avoiding Quadratic Behavior
//!
//! There are two cases where we've had to take some care to avoid
//! quadratic worst-case behavior:
//!
//! - The "labels at this branch" list can grow unboundedly if the
//! code generator binds many labels at one location. If the count
//! gets too high (defined by the `LABEL_LIST_THRESHOLD` constant), we
//! simply abort an optimization early in a way that is always correct
//! but is conservative.
//!
//! - The fixup list can interact with island emission to create
//! "quadratic island behavior". In a little more detail, one can hit
//! this behavior by having some pending fixups (forward label
//! references) with long-range label-use kinds, and some others
//! with shorter-range references that nonetheless still are pending
//! long enough to trigger island generation. In such a case, we
//! process the fixup list, generate veneers to extend some forward
//! references' ranges, but leave the other (longer-range) ones
//! alone. The way this was implemented put them back on a list and
//! resulted in quadratic behavior.
//!
//! To avoid this, fixups are split into two lists: one "pending" list and one
//! final list. The pending list is kept around for handling fixups related to
//! branches so it can be edited/truncated. When an island is reached, which
//! starts processing fixups, all pending fixups are flushed into the final
//! list. The final list is a `BinaryHeap` which enables fixup processing to
//! only process those which are required during island emission, deferring
//! all longer-range fixups to later.
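//!
//! # Example
//!
//! A simplified sketch of how a backend drives this buffer during emission;
//! the loop structure and helper names (`block_order`, `insts_in`,
//! `next_block_size_estimate`) are illustrative, not the exact driver code:
//!
//! ```ignore
//! let mut buffer = MachBuffer::<Inst>::new();
//! buffer.reserve_labels_for_blocks(num_blocks);
//! for block in block_order {
//!     buffer.bind_label(MachLabel::from_block(block), ctrl_plane);
//!     for inst in insts_in(block) {
//!         inst.emit(&mut buffer, &emit_info, &mut emit_state);
//!     }
//!     // Between blocks, emit an island if a pending fixup would go out of
//!     // range while emitting the next block.
//!     if buffer.island_needed(next_block_size_estimate) {
//!         buffer.emit_island(next_block_size_estimate, ctrl_plane);
//!     }
//! }
//! let finalized = buffer.finish(&constants, ctrl_plane);
//! ```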
use crate::binemit::{Addend, CodeOffset, Reloc};
use crate::ir::function::FunctionParameters;
use crate::ir::{ExternalName, RelSourceLoc, SourceLoc, TrapCode};
use crate::isa::unwind::UnwindInst;
use crate::machinst::{
BlockIndex, MachInstLabelUse, TextSectionBuilder, VCodeConstant, VCodeConstants, VCodeInst,
};
use crate::trace;
use crate::{ir, MachInstEmitState};
use crate::{timing, VCodeConstantData};
use cranelift_control::ControlPlane;
use cranelift_entity::{entity_impl, PrimaryMap};
use smallvec::SmallVec;
use std::cmp::Ordering;
use std::collections::BinaryHeap;
use std::mem;
use std::string::String;
use std::vec::Vec;
#[cfg(feature = "enable-serde")]
use serde::{Deserialize, Serialize};
#[cfg(feature = "enable-serde")]
pub trait CompilePhase {
type MachSrcLocType: for<'a> Deserialize<'a> + Serialize + core::fmt::Debug + PartialEq + Clone;
type SourceLocType: for<'a> Deserialize<'a> + Serialize + core::fmt::Debug + PartialEq + Clone;
}
#[cfg(not(feature = "enable-serde"))]
pub trait CompilePhase {
type MachSrcLocType: core::fmt::Debug + PartialEq + Clone;
type SourceLocType: core::fmt::Debug + PartialEq + Clone;
}
/// Status of a compiled artifact that needs patching before being used.
#[derive(Clone, Debug, PartialEq)]
#[cfg_attr(feature = "enable-serde", derive(Serialize, Deserialize))]
pub struct Stencil;
/// Status of a compiled artifact ready to use.
#[derive(Clone, Debug, PartialEq)]
pub struct Final;
impl CompilePhase for Stencil {
type MachSrcLocType = MachSrcLoc<Stencil>;
type SourceLocType = RelSourceLoc;
}
impl CompilePhase for Final {
type MachSrcLocType = MachSrcLoc<Final>;
type SourceLocType = SourceLoc;
}
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ForceVeneers {
Yes,
No,
}
/// A buffer of output to be produced, fixed up, and then emitted to a CodeSink
/// in bulk.
///
/// This struct uses `SmallVec`s to support small-ish function bodies without
/// any heap allocation. As such, it will be several kilobytes large. This is
/// likely fine as long as it is stack-allocated for function emission then
/// thrown away; but beware if many buffer objects are retained persistently.
pub struct MachBuffer<I: VCodeInst> {
/// The buffer contents, as raw bytes.
data: SmallVec<[u8; 1024]>,
/// Any relocations referring to this code. Note that only *external*
/// relocations are tracked here; references to labels within the buffer are
/// resolved before emission.
relocs: SmallVec<[MachReloc; 16]>,
/// Any trap records referring to this code.
traps: SmallVec<[MachTrap; 16]>,
/// Any call site records referring to this code.
call_sites: SmallVec<[MachCallSite; 16]>,
/// Any source location mappings referring to this code.
srclocs: SmallVec<[MachSrcLoc<Stencil>; 64]>,
/// Any user stack maps for this code.
///
/// Each entry is an `(offset, span, stack_map)` triple. Entries are sorted
/// by code offset, and each stack map covers `span` bytes on the stack.
user_stack_maps: SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]>,
/// Any unwind info at a given location.
unwind_info: SmallVec<[(CodeOffset, UnwindInst); 8]>,
/// The current source location in progress (after `start_srcloc()` and
/// before `end_srcloc()`). This is a (start_offset, src_loc) tuple.
cur_srcloc: Option<(CodeOffset, RelSourceLoc)>,
/// Known label offsets; `UNKNOWN_LABEL_OFFSET` if unknown.
label_offsets: SmallVec<[CodeOffset; 16]>,
/// Label aliases: when one label points to an unconditional jump, and that
/// jump points to another label, we can redirect references to the first
/// label immediately to the second.
///
/// Invariant: we don't have label-alias cycles. We ensure this by,
/// before setting label A to alias label B, resolving B's alias
/// target (iteratively until a non-aliased label); if B is already
/// aliased to A, then we cannot alias A back to B.
label_aliases: SmallVec<[MachLabel; 16]>,
/// Constants that must be emitted at some point.
pending_constants: SmallVec<[VCodeConstant; 16]>,
/// Byte size of all constants in `pending_constants`.
pending_constants_size: CodeOffset,
/// Traps that must be emitted at some point.
pending_traps: SmallVec<[MachLabelTrap; 16]>,
/// Fixups that haven't yet been flushed into `fixup_records` below and may
/// be related to branches that are chomped. These all get added to
/// `fixup_records` during island emission.
pending_fixup_records: SmallVec<[MachLabelFixup<I>; 16]>,
/// The nearest upcoming deadline for entries in `pending_fixup_records`.
pending_fixup_deadline: CodeOffset,
/// Fixups that must be performed after all code is emitted.
fixup_records: BinaryHeap<MachLabelFixup<I>>,
/// Latest branches, to facilitate in-place editing for better fallthrough
/// behavior and empty-block removal.
latest_branches: SmallVec<[MachBranch; 4]>,
/// All labels at the current offset (emission tail). This is lazily
/// cleared: it is actually accurate as long as the current offset is
/// `labels_at_tail_off`, but if `cur_offset()` has grown larger, it should
/// be considered as empty.
///
/// For correctness, this *must* be complete (i.e., the vector must contain
/// all labels whose offsets are resolved to the current tail), because we
/// rely on it to update labels when we truncate branches.
labels_at_tail: SmallVec<[MachLabel; 4]>,
/// The last offset at which `labels_at_tail` is valid. It is conceptually
/// always describing the tail of the buffer, but we do not clear
/// `labels_at_tail` eagerly when the tail grows, rather we lazily clear it
/// when the offset has grown past this (`labels_at_tail_off`) point.
/// Always <= `cur_offset()`.
labels_at_tail_off: CodeOffset,
/// Metadata about all constants that this function has access to.
///
/// This records the size/alignment of all constants (not the actual data)
/// along with the last available label generated for the constant. This map
/// is consulted when constants are referred to and the label assigned to a
/// constant may change over time as well.
constants: PrimaryMap<VCodeConstant, MachBufferConstant>,
/// All recorded usages of constants as pairs of the constant and where the
/// constant needs to be placed within `self.data`. Note that the same
/// constant may appear in this array multiple times if it was emitted
/// multiple times.
used_constants: SmallVec<[(VCodeConstant, CodeOffset); 4]>,
/// Indicates when a patchable region is currently open, to guard that it's
/// not possible to nest patchable regions.
open_patchable: bool,
}
impl MachBufferFinalized<Stencil> {
/// Get a finalized machine buffer by applying the function's base source location.
pub fn apply_base_srcloc(self, base_srcloc: SourceLoc) -> MachBufferFinalized<Final> {
MachBufferFinalized {
data: self.data,
relocs: self.relocs,
traps: self.traps,
call_sites: self.call_sites,
srclocs: self
.srclocs
.into_iter()
.map(|srcloc| srcloc.apply_base_srcloc(base_srcloc))
.collect(),
user_stack_maps: self.user_stack_maps,
unwind_info: self.unwind_info,
alignment: self.alignment,
}
}
}
/// A `MachBuffer` once emission is completed: holds generated code and records,
/// without fixups. This allows the type to be independent of the backend.
#[derive(PartialEq, Debug, Clone)]
#[cfg_attr(
feature = "enable-serde",
derive(serde_derive::Serialize, serde_derive::Deserialize)
)]
pub struct MachBufferFinalized<T: CompilePhase> {
/// The buffer contents, as raw bytes.
pub(crate) data: SmallVec<[u8; 1024]>,
/// Any relocations referring to this code. Note that only *external*
/// relocations are tracked here; references to labels within the buffer are
/// resolved before emission.
pub(crate) relocs: SmallVec<[FinalizedMachReloc; 16]>,
/// Any trap records referring to this code.
pub(crate) traps: SmallVec<[MachTrap; 16]>,
/// Any call site records referring to this code.
pub(crate) call_sites: SmallVec<[MachCallSite; 16]>,
/// Any source location mappings referring to this code.
pub(crate) srclocs: SmallVec<[T::MachSrcLocType; 64]>,
/// Any user stack maps for this code.
///
/// Each entry is an `(offset, span, stack_map)` triple. Entries are sorted
/// by code offset, and each stack map covers `span` bytes on the stack.
pub(crate) user_stack_maps: SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]>,
/// Any unwind info at a given location.
pub unwind_info: SmallVec<[(CodeOffset, UnwindInst); 8]>,
/// The required alignment of this buffer.
pub alignment: u32,
}
const UNKNOWN_LABEL_OFFSET: CodeOffset = 0xffff_ffff;
const UNKNOWN_LABEL: MachLabel = MachLabel(0xffff_ffff);
/// Threshold on max length of `labels_at_this_branch` list to avoid
/// unbounded quadratic behavior (see comment below at use-site).
const LABEL_LIST_THRESHOLD: usize = 100;
/// A label refers to some offset in a `MachBuffer`. It may not be resolved at
/// the point at which it is used by emitted code; the buffer records "fixups"
/// for references to the label, and will come back and patch the code
/// appropriately when the label's location is eventually known.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct MachLabel(u32);
entity_impl!(MachLabel);
impl MachLabel {
/// Get a label for a block. (The first N MachLabels are always reserved for
/// the N blocks in the vcode.)
pub fn from_block(bindex: BlockIndex) -> MachLabel {
MachLabel(bindex.index() as u32)
}
/// Get the numeric label index.
pub fn get(self) -> u32 {
self.0
}
/// Creates a string representing this label, for convenience.
pub fn to_string(&self) -> String {
format!("label{}", self.0)
}
}
impl Default for MachLabel {
fn default() -> Self {
UNKNOWN_LABEL
}
}
/// Represents the beginning of an editable region in the [`MachBuffer`], while code emission is
/// still occurring. An [`OpenPatchRegion`] is closed by [`MachBuffer::end_patchable`], consuming
/// the [`OpenPatchRegion`] token in the process.
pub struct OpenPatchRegion(usize);
/// A region in the [`MachBuffer`] code buffer that can be edited prior to finalization. An example
/// of where you might want to use this is for patching instructions that mention constants that
/// won't be known until later: [`MachBuffer::start_patchable`] can be used to begin the patchable
/// region, instructions can be emitted with placeholder constants, and the [`PatchRegion`] token
/// can be produced by [`MachBuffer::end_patchable`]. Once the values of those constants are known,
/// the [`PatchRegion::patch`] function can be used to get a mutable buffer to the instruction
/// bytes, and the uses of those constants can be updated directly.
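///
/// A sketch of that flow (the placeholder-emission helper, `imm_offset`, and
/// `real_value` below are hypothetical):
///
/// ```ignore
/// let open = buffer.start_patchable();
/// emit_load_with_placeholder_imm(&mut buffer); // instruction with a dummy constant
/// let region = buffer.end_patchable(open);
/// // ... later, once the real constant is known ...
/// let bytes = region.patch(&mut buffer);
/// bytes[imm_offset..imm_offset + 4].copy_from_slice(&real_value.to_le_bytes());
/// ```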
pub struct PatchRegion {
range: std::ops::Range<usize>,
}
impl PatchRegion {
/// Consume the patch region to yield a mutable slice of the [`MachBuffer`] data buffer.
pub fn patch<I: VCodeInst>(self, buffer: &mut MachBuffer<I>) -> &mut [u8] {
&mut buffer.data[self.range]
}
}
impl<I: VCodeInst> MachBuffer<I> {
/// Create a new, empty `MachBuffer`.
pub fn new() -> MachBuffer<I> {
MachBuffer {
data: SmallVec::new(),
relocs: SmallVec::new(),
traps: SmallVec::new(),
call_sites: SmallVec::new(),
srclocs: SmallVec::new(),
user_stack_maps: SmallVec::new(),
unwind_info: SmallVec::new(),
cur_srcloc: None,
label_offsets: SmallVec::new(),
label_aliases: SmallVec::new(),
pending_constants: SmallVec::new(),
pending_constants_size: 0,
pending_traps: SmallVec::new(),
pending_fixup_records: SmallVec::new(),
pending_fixup_deadline: u32::MAX,
fixup_records: Default::default(),
latest_branches: SmallVec::new(),
labels_at_tail: SmallVec::new(),
labels_at_tail_off: 0,
constants: Default::default(),
used_constants: Default::default(),
open_patchable: false,
}
}
/// Current offset from start of buffer.
pub fn cur_offset(&self) -> CodeOffset {
self.data.len() as CodeOffset
}
/// Add a byte.
pub fn put1(&mut self, value: u8) {
self.data.push(value);
// Post-invariant: conceptual-labels_at_tail contains a complete and
// precise list of labels bound at `cur_offset()`. We have advanced
// `cur_offset()`, hence if it had been equal to `labels_at_tail_off`
// before, it is not anymore (and it cannot become equal, because
// `labels_at_tail_off` is always <= `cur_offset()`). Thus the list is
// conceptually empty (even though it is only lazily cleared). No labels
// can be bound at this new offset (by invariant on `label_offsets`).
// Hence the invariant holds.
}
/// Add 2 bytes.
pub fn put2(&mut self, value: u16) {
let bytes = value.to_le_bytes();
self.data.extend_from_slice(&bytes[..]);
// Post-invariant: as for `put1()`.
}
/// Add 4 bytes.
pub fn put4(&mut self, value: u32) {
let bytes = value.to_le_bytes();
self.data.extend_from_slice(&bytes[..]);
// Post-invariant: as for `put1()`.
}
/// Add 8 bytes.
pub fn put8(&mut self, value: u64) {
let bytes = value.to_le_bytes();
self.data.extend_from_slice(&bytes[..]);
// Post-invariant: as for `put1()`.
}
/// Add a slice of bytes.
pub fn put_data(&mut self, data: &[u8]) {
self.data.extend_from_slice(data);
// Post-invariant: as for `put1()`.
}
/// Reserve appended space and return a mutable slice referring to it.
pub fn get_appended_space(&mut self, len: usize) -> &mut [u8] {
let off = self.data.len();
let new_len = self.data.len() + len;
self.data.resize(new_len, 0);
&mut self.data[off..]
// Post-invariant: as for `put1()`.
}
/// Align up to the given alignment.
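///
/// For example, with `cur_offset() == 6`, `align_to(8)` appends two zero
/// bytes so that the next emitted byte lands at offset 8.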
pub fn align_to(&mut self, align_to: CodeOffset) {
trace!("MachBuffer: align to {}", align_to);
assert!(
align_to.is_power_of_two(),
"{align_to} is not a power of two"
);
while self.cur_offset() & (align_to - 1) != 0 {
self.put1(0);
}
// Post-invariant: as for `put1()`.
}
/// Begin a region of patchable code. There is one requirement for the
/// code that is emitted: It must not introduce any instructions that
/// could be chomped (branches are an example of this). In other words,
/// you must not call [`MachBuffer::add_cond_branch`] or
/// [`MachBuffer::add_uncond_branch`] between calls to this method and
/// [`MachBuffer::end_patchable`].
pub fn start_patchable(&mut self) -> OpenPatchRegion {
assert!(!self.open_patchable, "Patchable regions may not be nested");
self.open_patchable = true;
OpenPatchRegion(usize::try_from(self.cur_offset()).unwrap())
}
/// End a region of patchable code, yielding a [`PatchRegion`] value that
/// can be consumed later to produce a one-off mutable slice to the
/// associated region of the data buffer.
pub fn end_patchable(&mut self, open: OpenPatchRegion) -> PatchRegion {
// No need to assert the state of `open_patchable` here, as we take
// ownership of the only `OpenPatchRegion` value.
self.open_patchable = false;
let end = usize::try_from(self.cur_offset()).unwrap();
PatchRegion { range: open.0..end }
}
/// Allocate a `Label` to refer to some offset. May not be bound to a fixed
/// offset yet.
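///
/// A label's typical lifecycle in the forward-reference case (a sketch;
/// `kind` stands for some backend-specific `I::LabelUse` variant):
///
/// ```ignore
/// let label = buffer.get_label();                 // allocate, unbound
/// let off = buffer.cur_offset();
/// buffer.use_label_at_offset(off, label, kind);   // record a fixup
/// // ... emit the referencing instruction's bytes, then more code ...
/// buffer.bind_label(label, ctrl_plane);           // bind; fixup patched later
/// ```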
pub fn get_label(&mut self) -> MachLabel {
let l = self.label_offsets.len() as u32;
self.label_offsets.push(UNKNOWN_LABEL_OFFSET);
self.label_aliases.push(UNKNOWN_LABEL);
trace!("MachBuffer: new label -> {:?}", MachLabel(l));
MachLabel(l)
// Post-invariant: the only mutation is to add a new label; it has no
// bound offset yet, so it trivially satisfies all invariants.
}
/// Reserve the first N MachLabels for blocks.
pub fn reserve_labels_for_blocks(&mut self, blocks: usize) {
trace!("MachBuffer: first {} labels are for blocks", blocks);
debug_assert!(self.label_offsets.is_empty());
self.label_offsets.resize(blocks, UNKNOWN_LABEL_OFFSET);
self.label_aliases.resize(blocks, UNKNOWN_LABEL);
// Post-invariant: as for `get_label()`.
}
/// Registers metadata in this `MachBuffer` about the `constants` provided.
///
/// This will record the size/alignment of all constants which will prepare
/// them for emission later on.
pub fn register_constants(&mut self, constants: &VCodeConstants) {
for (c, val) in constants.iter() {
self.register_constant(&c, val);
}
}
/// Similar to [`MachBuffer::register_constants`] but registers a
/// single constant metadata. This function is useful in
/// situations where not all constants are known at the time of
/// emission.
pub fn register_constant(&mut self, constant: &VCodeConstant, data: &VCodeConstantData) {
let c2 = self.constants.push(MachBufferConstant {
upcoming_label: None,
align: data.alignment(),
size: data.as_slice().len(),
});
assert_eq!(*constant, c2);
}
/// Completes constant emission by iterating over `self.used_constants` and
/// filling in the "holes" with the constant values provided by `constants`.
///
/// Returns the alignment required for this entire buffer. Alignment starts
/// at the ISA's minimum function alignment and can be increased due to
/// constant requirements.
fn finish_constants(&mut self, constants: &VCodeConstants) -> u32 {
let mut alignment = I::function_alignment().minimum;
for (constant, offset) in mem::take(&mut self.used_constants) {
let constant = constants.get(constant);
let data = constant.as_slice();
self.data[offset as usize..][..data.len()].copy_from_slice(data);
alignment = constant.alignment().max(alignment);
}
alignment
}
/// Returns a label that can be used to refer to the `constant` provided.
///
/// This will automatically defer a new constant to be emitted for
/// `constant` if it has not been previously emitted. Note that this
/// function may return a different label for the same constant at
/// different points in time. The label is valid to use only from the
/// current location; the MachBuffer takes care to emit the same constant
/// multiple times if needed so the constant is always in range.
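///
/// A sketch of the intended use from an emitter (the load-emission helper
/// and the label-use kind below are hypothetical):
///
/// ```ignore
/// let label = sink.get_label_for_constant(constant);
/// let off = sink.cur_offset();
/// sink.use_label_at_offset(off, label, LabelUse::PCRelLoad);
/// emit_pc_relative_load(sink, dst); // offset field patched when label resolves
/// ```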
pub fn get_label_for_constant(&mut self, constant: VCodeConstant) -> MachLabel {
let MachBufferConstant {
align,
size,
upcoming_label,
} = self.constants[constant];
if let Some(label) = upcoming_label {
return label;
}
let label = self.get_label();
trace!(
"defer constant: eventually emit {size} bytes aligned \
to {align} at label {label:?}",
);
self.pending_constants.push(constant);
self.pending_constants_size += size as u32;
self.constants[constant].upcoming_label = Some(label);
label
}
/// Bind a label to the current offset. A label can only be bound once.
pub fn bind_label(&mut self, label: MachLabel, ctrl_plane: &mut ControlPlane) {
trace!(
"MachBuffer: bind label {:?} at offset {}",
label,
self.cur_offset()
);
debug_assert_eq!(self.label_offsets[label.0 as usize], UNKNOWN_LABEL_OFFSET);
debug_assert_eq!(self.label_aliases[label.0 as usize], UNKNOWN_LABEL);
let offset = self.cur_offset();
self.label_offsets[label.0 as usize] = offset;
self.lazily_clear_labels_at_tail();
self.labels_at_tail.push(label);
// Invariants hold: bound offset of label is <= cur_offset (in fact it
// is equal). If the `labels_at_tail` list was complete and precise
// before, it is still, because we have bound this label to the current
// offset and added it to the list (which contains all labels at the
// current offset).
self.optimize_branches(ctrl_plane);
// Post-invariant: by `optimize_branches()` (see argument there).
}
/// Lazily clear `labels_at_tail` if the tail offset has moved beyond the
/// offset that it applies to.
fn lazily_clear_labels_at_tail(&mut self) {
let offset = self.cur_offset();
if offset > self.labels_at_tail_off {
self.labels_at_tail_off = offset;
self.labels_at_tail.clear();
}
// Post-invariant: either labels_at_tail_off was at cur_offset, and
// state is untouched, or was less than cur_offset, in which case the
// labels_at_tail list was conceptually empty, and is now actually
// empty.
}
/// Resolve a label to an offset, if known. May return `UNKNOWN_LABEL_OFFSET`.
pub(crate) fn resolve_label_offset(&self, mut label: MachLabel) -> CodeOffset {
let mut iters = 0;
while self.label_aliases[label.0 as usize] != UNKNOWN_LABEL {
label = self.label_aliases[label.0 as usize];
// To protect against an infinite loop (despite our assurances to
// ourselves that the invariants make this impossible), assert out
// after 1M iterations. The number of basic blocks is limited
// in most contexts anyway so this should be impossible to hit with
// a legitimate input.
iters += 1;
assert!(iters < 1_000_000, "Unexpected cycle in label aliases");
}
self.label_offsets[label.0 as usize]
// Post-invariant: no mutations.
}
/// Emit a reference to the given label with the given reference type (i.e.,
/// branch-instruction format) at the current offset. This is like a
/// relocation, but handled internally.
///
/// This can be called before the branch is actually emitted; fixups will
/// not happen until an island is emitted or the buffer is finished.
pub fn use_label_at_offset(&mut self, offset: CodeOffset, label: MachLabel, kind: I::LabelUse) {
trace!(
"MachBuffer: use_label_at_offset: offset {} label {:?} kind {:?}",
offset,
label,
kind
);
// Add the fixup, and update the worst-case island size based on a
// veneer for this label use.
let fixup = MachLabelFixup {
label,
offset,
kind,
};
self.pending_fixup_deadline = self.pending_fixup_deadline.min(fixup.deadline());
self.pending_fixup_records.push(fixup);
// Post-invariant: no mutations to branches/labels data structures.
}
/// Inform the buffer of an unconditional branch at the given offset,
/// targeting the given label. May be used to optimize branches.
/// The last added label-use must correspond to this branch.
/// This must be called when the current offset is equal to `start`; i.e.,
/// before actually emitting the branch. This implies that for a branch that
/// uses a label and is eligible for optimizations by the MachBuffer, the
/// proper sequence is:
///
/// - Call `use_label_at_offset()` to emit the fixup record.
/// - Call `add_uncond_branch()` to make note of the branch.
/// - Emit the bytes for the branch's machine code.
///
/// Additional requirement: no labels may be bound between `start` and `end`
/// (exclusive on both ends).
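///
/// In code, that sequence looks roughly like the following (the encoding
/// helper and the 4-byte branch size are illustrative, in the spirit of a
/// fixed-width ISA):
///
/// ```ignore
/// let off = sink.cur_offset();
/// sink.use_label_at_offset(off, target, LabelUse::Branch26); // fixup record
/// sink.add_uncond_branch(off, off + 4, target);              // note the branch
/// sink.put4(encode_uncond_branch(0)); // bytes; offset patched later
/// ```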
pub fn add_uncond_branch(&mut self, start: CodeOffset, end: CodeOffset, target: MachLabel) {
debug_assert!(
!self.open_patchable,
"Branch instruction inserted within a patchable region"
);
assert!(self.cur_offset() == start);
debug_assert!(end > start);
assert!(!self.pending_fixup_records.is_empty());
let fixup = self.pending_fixup_records.len() - 1;
self.lazily_clear_labels_at_tail();
self.latest_branches.push(MachBranch {
start,
end,
target,
fixup,
inverted: None,
labels_at_this_branch: self.labels_at_tail.clone(),
});
// Post-invariant: we asserted branch start is current tail; the list of
// labels at branch is cloned from list of labels at current tail.
}
/// Inform the buffer of a conditional branch at the given offset,
/// targeting the given label. May be used to optimize branches.
/// The last added label-use must correspond to this branch.
///
/// Additional requirement: no labels may be bound between `start` and `end`
/// (exclusive on both ends).
pub fn add_cond_branch(
&mut self,
start: CodeOffset,
end: CodeOffset,
target: MachLabel,
inverted: &[u8],
) {
debug_assert!(
!self.open_patchable,
"Branch instruction inserted within a patchable region"
);
assert!(self.cur_offset() == start);
debug_assert!(end > start);
assert!(!self.pending_fixup_records.is_empty());
debug_assert!(
inverted.len() == (end - start) as usize,
"branch length = {}, but inverted length = {}",
end - start,
inverted.len()
);
let fixup = self.pending_fixup_records.len() - 1;
let inverted = Some(SmallVec::from(inverted));
self.lazily_clear_labels_at_tail();
self.latest_branches.push(MachBranch {
start,
end,
target,
fixup,
inverted,
labels_at_this_branch: self.labels_at_tail.clone(),
});
// Post-invariant: we asserted branch start is current tail; labels at
// branch list is cloned from list of labels at current tail.
}
fn truncate_last_branch(&mut self) {
debug_assert!(
!self.open_patchable,
"Branch instruction truncated within a patchable region"
);
self.lazily_clear_labels_at_tail();
// Invariants hold at this point.
let b = self.latest_branches.pop().unwrap();
assert!(b.end == self.cur_offset());
// State:
// [PRE CODE]
// Offset b.start, b.labels_at_this_branch:
// [BRANCH CODE]
// cur_off, self.labels_at_tail -->
// (end of buffer)
self.data.truncate(b.start as usize);
self.pending_fixup_records.truncate(b.fixup);
while let Some(last_srcloc) = self.srclocs.last_mut() {
if last_srcloc.end <= b.start {
break;
}
if last_srcloc.start < b.start {
last_srcloc.end = b.start;
break;
}
self.srclocs.pop();
}
// State:
// [PRE CODE]
// cur_off, Offset b.start, b.labels_at_this_branch:
// (end of buffer)
//
// self.labels_at_tail --> (past end of buffer)
let cur_off = self.cur_offset();
self.labels_at_tail_off = cur_off;
// State:
// [PRE CODE]
// cur_off, Offset b.start, b.labels_at_this_branch,
// self.labels_at_tail:
// (end of buffer)
//
// resolve_label_offset(l) for l in labels_at_tail:
// (past end of buffer)
trace!(
"truncate_last_branch: truncated {:?}; off now {}",
b,
cur_off
);
// Fix up resolved label offsets for labels at tail.
for &l in &self.labels_at_tail {
self.label_offsets[l.0 as usize] = cur_off;
}
// Old labels_at_this_branch are now at cur_off.
self.labels_at_tail
.extend(b.labels_at_this_branch.into_iter());
// Post-invariant: this operation is defined to truncate the buffer,
// which moves cur_off backward, and to move labels at the end of the
// buffer back to the start-of-branch offset.
//
// latest_branches satisfies all invariants:
// - it has no branches past the end of the buffer (branches are in
// order, we removed the last one, and we truncated the buffer to just
// before the start of that branch)
// - no labels were moved to lower offsets than the (new) cur_off, so
// the labels_at_this_branch list for any other branch need not change.
//
// labels_at_tail satisfies all invariants:
// - all labels that were at the tail after the truncated branch are
// moved backward to just before the branch, which becomes the new tail;
// thus every element in the list should remain (ensured by `.extend()`
// above).
// - all labels that refer to the new tail, which is the start-offset of
// the truncated branch, must be present. The `labels_at_this_branch`
// list in the truncated branch's record is a complete and precise list
// of exactly these labels; we append these to labels_at_tail.
// - labels_at_tail_off is at cur_off after truncation occurs, so the
// list is valid (not to be lazily cleared).
//
// The stated operation was performed:
// - For each label at the end of the buffer prior to this method, it
// now resolves to the new (truncated) end of the buffer: it must have
// been in `labels_at_tail` (this list is precise and complete, and
// the tail was at the end of the truncated branch on entry), and we
// iterate over this list and set `label_offsets` to the new tail.
// None of these labels could have been an alias (by invariant), so
// `label_offsets` is authoritative for each.
// - No other labels will be past the end of the buffer, because of the
// requirement that no labels be bound to the middle of branch ranges
// (see comments to `add_{cond,uncond}_branch()`).
// - The buffer is truncated to just before the last branch, and the
// fixup record referring to that last branch is removed.
}
/// Performs various optimizations on branches pointing at the current label.
pub fn optimize_branches(&mut self, ctrl_plane: &mut ControlPlane) {
if ctrl_plane.get_decision() {
return;
}
self.lazily_clear_labels_at_tail();
// Invariants valid at this point.
trace!(
"enter optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}",
self.latest_branches,
self.labels_at_tail,
self.pending_fixup_records
);
// We continue to munch on branches at the tail of the buffer until no
// more rules apply. Note that the loop only continues if a branch is
// actually truncated (or if labels are redirected away from a branch),
// so this always makes progress.
while let Some(b) = self.latest_branches.last() {
let cur_off = self.cur_offset();
trace!("optimize_branches: last branch {:?} at off {}", b, cur_off);
// If there has been any code emission since the end of the last branch or
// label definition, then there's nothing we can edit (because we
// don't move code once placed, only back up and overwrite), so
// clear the records and finish.
if b.end < cur_off {
break;
}
// If the "labels at this branch" list on this branch is
// longer than a threshold, don't do any simplification,
// and let the branch remain to separate those labels from
// the current tail. This avoids quadratic behavior (see
// #3468): otherwise, if a long string of "goto next;
// next:" patterns are emitted, all of the labels will
// coalesce into a long list of aliases for the current
// buffer tail. We must track all aliases of the current
// tail for correctness, but we are also allowed to skip
// optimization (removal) of any branch, so we take the
// escape hatch here and let it stand. In effect this
// "spreads" the many thousands of labels in the
// pathological case among an actual (harmless but
// suboptimal) instruction once per N labels.
if b.labels_at_this_branch.len() > LABEL_LIST_THRESHOLD {
break;
}
// Invariant: we are looking at a branch that ends at the tail of
// the buffer.
// For any branch, conditional or unconditional:
// - If the target is a label at the current offset, then remove
// the conditional branch, and reset all labels that targeted
// the current offset (end of branch) to the truncated
// end-of-code.
//
// Preserves execution semantics: a branch to its own fallthrough
// address is equivalent to a no-op; in both cases, nextPC is the
// fallthrough.
if self.resolve_label_offset(b.target) == cur_off {
trace!("branch with target == cur off; truncating");
self.truncate_last_branch();
continue;
}
// If latest is an unconditional branch:
//
// - If the branch's target is not its own start address, then for
// each label at the start of branch, make the label an alias of the
// branch target, and remove the label from the "labels at this
// branch" list.
//
// - Preserves execution semantics: an unconditional branch's
// only effect is to set PC to a new PC; this change simply
// collapses one step in the step-semantics.
//
// - Post-invariant: the labels that were bound to the start of
// this branch become aliases, so they must not be present in any
// labels-at-this-branch list or the labels-at-tail list. The
// labels are removed from the latest-branch record's
// labels-at-this-branch list, and are never placed in the
// labels-at-tail list. Furthermore, it is correct that they are