Merged
32 changes: 32 additions & 0 deletions src/coreclr/jit/async.cpp
@@ -1,6 +1,38 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

//
// This file implements the transformation of C# async methods into state
// machines. The transformation takes place late in the JIT pipeline, when most
// optimizations have already been performed, right before lowering.
//
// The transformation performs the following key operations:
//
// 1. Each async call becomes a suspension point where execution can pause and
// return to the caller, accompanied by a resumption point where execution can
// continue when the awaited operation completes.
//
// 2. When suspending at a suspension point, a continuation object is created that contains:
// - All live local variables
// - State number to identify which await is being resumed
// - Return value from the awaited operation (filled in by the callee later)
// - Exception information if an exception occurred
// - Resumption function pointer
// - Flags containing additional information
//
// 3. The method entry is modified to include dispatch logic that checks for an
// incoming continuation and jumps to the appropriate resumption point.
//
// 4. Special handling is included for:
// - Exception propagation across await boundaries
// - Return value management for different types (primitives, references, structs)
// - Tiered compilation and On-Stack Replacement (OSR)
// - Optimized state capture based on variable liveness analysis
//
// The transformation ensures that the semantics of the original async method are
// preserved while enabling efficient suspension and resumption of execution.
//
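The suspension/resumption machinery described above can be sketched in plain C++. This is a minimal, hypothetical model (the struct layout, field names, and dispatch shape are illustrative only, not the actual runtime's continuation layout):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch of the data a suspension point captures.
struct Continuation
{
    uint32_t         stateNum;   // identifies which await is being resumed
    std::vector<int> liveLocals; // live local variables at the suspension point
    int              result;     // return value filled in by the callee later
    bool             hasException;
};

// Models the dispatch logic inserted at method entry: with no incoming
// continuation, execution starts from the beginning; otherwise it jumps to
// the resumption point matching the state number.
int RunStateMachine(const Continuation* cont)
{
    uint32_t state = (cont != nullptr) ? cont->stateNum : 0;
    int      sum   = (cont != nullptr) ? cont->liveLocals[0] : 0;
    switch (state)
    {
        case 0:
            sum += 1; // work before the first await
            // falling through models resuming past the await point
        case 1:
            sum += (cont != nullptr && state == 1) ? cont->result : 0;
            break;
    }
    return sum;
}
```

A fresh invocation passes no continuation and runs from the top; a resumption passes a continuation carrying the state number, restored locals, and the awaited result.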

#include "jitpch.h"
#include "jitstd/algorithm.h"
#include "async.h"
24 changes: 15 additions & 9 deletions src/coreclr/jit/inductionvariableopts.cpp
@@ -1,11 +1,12 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

//
// This file contains code to optimize induction variables in loops based on
// scalar evolution analysis (see scev.h and scev.cpp for more information
// about the scalar evolution analysis).
//
// Currently the following optimizations are done:
// Currently the following optimizations are implemented:
//
// IV widening:
// This widens primary induction variables from 32 bits into 64 bits. This is
@@ -37,14 +38,19 @@
// single instruction, bypassing the need to do a separate comparison with a
// bound.
//
// Strength reduction (disabled):
// This changes the stride of primary IVs in a loop to avoid more expensive
// multiplications inside the loop. Commonly the primary IVs are only used
// for indexing memory at some element size, which can end up with these
// multiplications.
//
// Strength reduction frequently relies on reversing the loop to remove the
// last non-multiplied use of the primary IV.
// Strength reduction:
// Strength reduction identifies cases where all uses of a primary IV compute
// a common derived value. Commonly this happens when indexing memory at some
// element size, resulting in multiplications. It introduces a new primary IV
// that directly computes this derived value, avoiding the need for the
// original primary IV and its associated calculations. The optimization
// handles GC pointers carefully, ensuring all accesses remain within managed
// objects.
//
// Unused IV removal:
// This removes induction variables that are only used for self-updates with
// no external uses. This commonly happens after other IV optimizations have
// replaced all meaningful uses of an IV with a different, more efficient IV.
//
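The strength reduction case described above can be illustrated with a source-level C++ analogy (the JIT performs this on its IR, not on source; the functions here are illustrative only):

```cpp
#include <cstddef>

// Before strength reduction: the primary IV 'i' is multiplied by the
// element size on every iteration to form the byte offset.
long SumMul(const int* data, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += *(const int*)((const char*)data + i * sizeof(int));
    return sum;
}

// After strength reduction: a new primary IV directly computes the derived
// address, replacing the per-iteration multiplication with an addition.
long SumStrided(const int* data, size_t n)
{
    const int* p   = data;
    const int* end = data + n;
    long       sum = 0;
    for (; p != end; p++)
        sum += *p;
    return sum;
}
```

After the new IV takes over, the original index `i` has no remaining uses and becomes a candidate for the unused IV removal pass.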

#include "jitpch.h"
31 changes: 31 additions & 0 deletions src/coreclr/jit/promotion.cpp
@@ -1,6 +1,37 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

//
// Physical promotion is an optimization where struct fields accessed as
// LCL_FLD nodes are promoted to individual primitive-typed local variables
// accessed as LCL_VAR, allowing register allocation and removing unnecessary
// memory operations.
//
// Key components:
//
// 1. Candidate Identification:
// - Identifies struct locals that aren't already promoted and aren't address-exposed
// - Analyzes access patterns to determine which fields are good promotion candidates
// - Uses weighted cost models to balance performance and code size and to take PGO
// data into account
//
// 2. Field Promotion:
// - Creates primitive-typed replacement locals for selected fields
// - Records which parts of the struct remain unpromoted
//
// 3. Access Transformation:
// - Transforms local field accesses to use promoted field variables
// - Decomposes struct stores and copies to operate on the primitive fields
// - Handles call argument passing and returns with field lists where appropriate
// - Tracks whether the promoted fields or the original struct hold the fresher value
// - Inserts read-backs when the struct field is fresher than the promoted local
// - Inserts write-backs when the promoted local is fresher than the struct field
// - Ensures proper state across basic block boundaries and exception flow
//
// The transformation carefully handles OSR locals, parameters, and call arguments,
// while maintaining correct behavior for exception handling and control flow.
//
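A source-level analogy for the access transformation in step 3 (the real pass rewrites JIT IR, not C++ source; names here are hypothetical):

```cpp
struct Pair
{
    int a;
    int b;
};

// Unpromoted form: fields are read and written through the struct in memory.
int SumFieldsUnpromoted(Pair p)
{
    p.a += 1;
    return p.a + p.b;
}

// After physical promotion (conceptually): each accessed field lives in its
// own primitive-typed local, which can be enregistered.
int SumFieldsPromoted(Pair p)
{
    int fieldA = p.a; // replacement local for p.a
    int fieldB = p.b; // replacement local for p.b
    fieldA += 1;
    return fieldA + fieldB;
}
```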

#include "jitpch.h"
#include "promotion.h"
#include "jitstd/algorithm.h"
27 changes: 27 additions & 0 deletions src/coreclr/jit/promotiondecomposition.cpp
@@ -1,6 +1,33 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

//
// This file provides the machinery to decompose stores and initializations
// involving physically promoted structs into stores/initialization involving
// individual fields.
//
// Key components include:
//
// 1. DecompositionStatementList
// - Collects statement trees during decomposition
// - Converts them to a single comma tree at the end
//
// 2. DecompositionPlan
// - Plans the decomposition of block operations
// - Manages mappings between source and destination replacements
// - Supports both copies between structs and initializations
// - Creates specialized access plans for remainders (unpromoted parts)
//
// 3. Field-by-field copying and initialization
// - Determines optimal order and strategy for field operations
// - Handles cases where replacements partially overlap
// - Optimizes GC pointer handling to minimize write barriers
// - Special cases primitive fields when possible
//
// This works in coordination with the ReplaceVisitor from promotion.cpp to
// transform IR after physical promotion decisions have been made.
//
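Field-by-field decomposition of a struct copy can be sketched at the source level as follows (a hypothetical example: which fields are promoted and which form the remainder is decided by the promotion pass, not fixed as shown here):

```cpp
#include <cstring>

struct Vec
{
    int    x;
    int    y;
    double pad;
};

// A whole-struct copy, as written before decomposition.
void CopyWhole(Vec& dst, const Vec& src)
{
    dst = src;
}

// The decomposed form: promoted fields become primitive stores, and the
// unpromoted remainder ('pad' in this sketch) is copied as a block.
void CopyDecomposed(Vec& dst, const Vec& src)
{
    dst.x = src.x; // primitive store for promoted field x
    dst.y = src.y; // primitive store for promoted field y
    std::memcpy(&dst.pad, &src.pad, sizeof(double)); // remainder copy
}
```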

#include "jitpch.h"
#include "promotion.h"
#include "jitstd/algorithm.h"
84 changes: 36 additions & 48 deletions src/coreclr/jit/promotionliveness.cpp
@@ -4,6 +4,42 @@
#include "jitpch.h"
#include "promotion.h"

//
// This file implements a specialized liveness analysis for physically promoted struct fields
// and remainders. Unlike standard JIT liveness analysis, it focuses on accurately tracking
// which fields are live at specific program points to optimize physically promoted struct operations.
//
// Key characteristics:
//
// 1. Separate Bit Vectors:
// - Maintains its own liveness bit vectors separate from the main compiler's bbLiveIn/bbLiveOut
// - Uses "dense" indices: bit vectors only contain entries for the remainder and replacement
// fields of physically promoted structs (allocating 1 + num_fields indices per local)
// - Does not update BasicBlock::bbLiveIn or other standard liveness storage, as this would
// require allocating regular tracked indices (lvVarIndex) for all new fields
//
// 2. Liveness Representation:
// - Writes liveness into IR using normal GTF_VAR_DEATH flags
// - Important: After liveness is computed but before replacement phase completes,
// GTF_VAR_DEATH semantics temporarily differ from the rest of the JIT
// (e.g., "LCL_FLD int V16 [+8] (last use)" indicates that specific field is dying,
// not the whole variable)
// - For struct uses that can indicate deaths of multiple fields or remainder parts,
// maintains side information accessed via GetDeathsForStructLocal()
//
// 3. Analysis Process:
// - Single-pass dataflow computation (no DCE iterations, unlike other liveness passes)
// - Handles QMark nodes specially for conditional execution
// - Accounts for implicit exception flow
// - Distinguishes between full definitions and partial definitions
//
// The liveness information is critical for:
// - Avoiding creation of dead stores (especially to remainders, which the SSA liveness
// pass handles very conservatively as partial definitions)
// - Marking replacement fields with proper liveness flags for subsequent compiler phases
// - Optimizing read-back operations by determining when they're unnecessary
//
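The per-block step of the single-pass dataflow computation follows the standard backward-liveness transfer function, applied over the dense field/remainder indices. A minimal sketch, with one bit per tracked index (illustrative only; the actual pass uses the JIT's bit-vector machinery):

```cpp
#include <cstdint>

// Backward liveness transfer function for one block:
// liveIn = use | (liveOut & ~def)
// 'use' holds indices read before any full definition in the block;
// 'def' holds indices fully defined in the block.
uint32_t ComputeLiveIn(uint32_t use, uint32_t def, uint32_t liveOut)
{
    return use | (liveOut & ~def);
}
```

Because there are no DCE iterations, the fixed point over all blocks is computed once rather than re-run after stores are removed.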

struct BasicBlockLiveness
{
// Variables used before a full definition.
Expand All @@ -22,54 +58,6 @@ struct BasicBlockLiveness
// Run:
// Compute liveness information pertaining the promoted structs.
//
// Remarks:
// For each promoted aggregate we compute the liveness for its remainder and
// all of its fields. Unlike regular liveness we currently do not do any DCE
// here and so only do the dataflow computation once.
//
// The liveness information is written into the IR using the normal
// GTF_VAR_DEATH flag. Note that the semantics of GTF_VAR_DEATH differs from
// the rest of the JIT for a short while between the liveness is computed and
// the replacement phase has run: in particular, after this liveness pass you
// may see a node like:
//
// LCL_FLD int V16 tmp9 [+8] (last use)
//
// that indicates that this particular field (or the remainder if it wasn't
// promoted) is dying, not that V16 itself is dying. After replacement has
// run the semantics align with the rest of the JIT: in the promoted case V16
// [+8] would be replaced by its promoted field local, and in the remainder
// case all non-remainder uses of V16 would also be.
//
// There is one catch which is struct uses of the local. These can indicate
// deaths of multiple fields and also the remainder, so this information is
// stored on the side. PromotionLiveness::GetDeathsForStructLocal is used to
// query this information.
//
// The liveness information is used by decomposition to avoid creating dead
// stores, and also to mark the replacement field uses/defs with proper
// up-to-date liveness information to be used by future phases (forward sub
// and morph, as of writing this). It is also used to avoid creating
// unnecessary read-backs; this is mostly just a TP optimization as future
// liveness passes would be expected to DCE these anyway.
//
// Avoiding the creation of dead stores to the remainder is especially
// important as these otherwise would often end up looking like partial
// definitions, and the other liveness passes handle partial definitions very
// conservatively and are not able to DCE them.
//
// Unlike the other liveness passes we keep the per-block liveness
// information on the side and we do not update BasicBlock::bbLiveIn et al.
// This relies on downstream phases not requiring/wanting to use per-basic
// block live-in/live-out/var-use/var-def sets. To be able to update these we
// would need to give the new locals "regular" tracked indices (i.e. allocate
// a lvVarIndex).
//
// The indices allocated and used internally within the liveness computation
// are "dense" in the sense that the bit vectors only have indices for
// remainders and the replacement fields introduced by this pass. In other
// words, we allocate 1 + num_fields indices for each promoted struct local.
//
void PromotionLiveness::Run()
{
m_structLclToTrackedIndex = new (m_compiler, CMK_Promotion) unsigned[m_compiler->lvaCount]{};
44 changes: 15 additions & 29 deletions src/coreclr/jit/scev.cpp
@@ -1,12 +1,13 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

//
// This file contains code to analyze how the value of induction variables
// evolve (scalar evolution analysis), and to turn them into the SCEV IR
// defined in scev.h. The analysis is inspired by "Michael Wolfe. 1992. Beyond
// induction variables." and also by LLVM's scalar evolution analysis.
//
// The main idea of scalar evolution nalysis is to give a closed form
// The main idea of scalar evolution analysis is to give a closed form
// describing the value of tree nodes inside loops even when taking into
// account that they are changing on each loop iteration. This is useful for
// optimizations that want to reason about values of IR nodes inside loops,
@@ -28,34 +29,19 @@
// describes its value (possibly taking its evolution into account). Note that
// SCEV nodes are immutable and the values they represent are _not_
// flow-dependent; that is, they don't exist at a specific location inside the
// loop, even though some particular tree node gave rise to that SCEV node. The
// analysis itself _is_ flow-dependent and guarantees that the Scev* returned
// describes the value that corresponds to what the tree node computes at its
// specific location. However, it would be perfectly legal for two trees at
// different locations in the loop to analyze to the same SCEV node (even
// potentially returning the same pointer). For example, in theory "i" and "j"
// in the following loop would both be represented by the same add recurrence
// <L, 0, 1>, and the analysis could even return the same Scev* for both of
// them, even if it does not today:
//
// int i = 0;
// while (true)
// {
// i++;
// ...
// int j = i - 1;
// }
//
// Actually materializing the value of a SCEV node back into tree IR is not
// implemented yet, but generally would depend on the availability of tree
// nodes that compute the dependent values at the point where the IR is to be
// materialized.
//
// Besides the add recurrences the analysis itself is generally a
// straightforward translation from JIT IR into the SCEV IR. Creating the add
// recurrences requires paying attention to the structure of PHIs, and
// disambiguating the values coming from outside the loop and the values coming
// from the backedges.
// loop, even though some particular tree node gave rise to that SCEV node.
//
// The SCEV analysis is capable of:
//
// 1. Identifying both direct and indirect induction variables
// 2. Simplifying complex expressions involving induction variables
// 3. Determining when recurrences won't overflow during loop execution
// 4. Computing exact trip counts for countable loops
// 5. Converting SCEV expressions back to JIT IR and value numbers
//
// Understanding the relationship between values across iterations enables
// many loop optimizations, including strength reduction, loop reversal,
// and IV widening, which are implemented in inductionvariableopts.cpp.
//
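The core SCEV object is the add recurrence: a triple of loop, start value, and step, conventionally written <L, start, step>. Its value at the start of iteration i has the closed form start + step * i. A minimal sketch (the type name and fields are illustrative, not the JIT's actual SCEV node types):

```cpp
#include <cstdint>

// Sketch of an add recurrence <L, start, step>.
struct AddRec
{
    int64_t start; // value on loop entry
    int64_t step;  // amount added on each backedge

    // Closed form for the value at the start of the given iteration.
    int64_t Evaluate(int64_t iteration) const
    {
        return start + step * iteration;
    }
};
```

For example, a counter `i` initialized to 0 and incremented by 1 is <L, 0, 1>, while the byte offset `i * 4 + 8` derived from it is the add recurrence <L, 8, 4>.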

#include "jitpch.h"