Skip to content

[MemProf] Emit remarks when hinting allocations not needing cloning #141859

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 28, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 8 additions & 1 deletion llvm/include/llvm/Analysis/MemoryProfileInfo.h
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,9 @@
#include <map>

namespace llvm {

class OptimizationRemarkEmitter;

namespace memprof {

/// Return the allocation type for a given set of memory profile values.
Expand Down Expand Up @@ -85,6 +88,10 @@ class CallStackTrie {
// The allocation's leaf stack id.
uint64_t AllocStackId = 0;

// If the client provides a remarks emitter object, we will emit remarks on
// allocations for which we apply non-context sensitive allocation hints.
OptimizationRemarkEmitter *ORE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to switch all the other messages to ORE too in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which messages? I wasn't planning to switch e.g. the matching and hinted byte reporting messages that are more for compiler developers at this point


void deleteTrieNode(CallStackTrieNode *Node) {
if (!Node)
return;
Expand All @@ -111,7 +118,7 @@ class CallStackTrie {
uint64_t &ColdBytes);

public:
CallStackTrie() = default;
CallStackTrie(OptimizationRemarkEmitter *ORE = nullptr) : ORE(ORE) {}
~CallStackTrie() { deleteTrieNode(Alloc); }

bool empty() const { return Alloc == nullptr; }
Expand Down
4 changes: 3 additions & 1 deletion llvm/include/llvm/Transforms/Utils/Cloning.h
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ class Instruction;
class Loop;
class LoopInfo;
class Module;
class OptimizationRemarkEmitter;
class PGOContextualProfile;
class ProfileSummaryInfo;
class ReturnInst;
Expand Down Expand Up @@ -314,7 +315,8 @@ InlineResult InlineFunction(CallBase &CB, InlineFunctionInfo &IFI,
bool MergeAttributes = false,
AAResults *CalleeAAR = nullptr,
bool InsertLifetime = true,
Function *ForwardVarArgsTo = nullptr);
Function *ForwardVarArgsTo = nullptr,
OptimizationRemarkEmitter *ORE = nullptr);

/// Same as above, but it will update the contextual profile. If the contextual
/// profile is invalid (i.e. not loaded because it is not present), it defaults
Expand Down
18 changes: 10 additions & 8 deletions llvm/lib/Analysis/MemoryProfileInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
//===----------------------------------------------------------------------===//

#include "llvm/Analysis/MemoryProfileInfo.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/IR/Constants.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Compiler.h"
Expand Down Expand Up @@ -145,13 +146,6 @@ std::string llvm::memprof::getAllocTypeAttributeString(AllocationType Type) {
llvm_unreachable("invalid alloc type");
}

static void addAllocTypeAttribute(LLVMContext &Ctx, CallBase *CI,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was only one call to this so I inlined it for simplicity and so we have easier access to the AllocTypeString for the remark.

AllocationType AllocType) {
auto AllocTypeString = getAllocTypeAttributeString(AllocType);
auto A = llvm::Attribute::get(Ctx, "memprof", AllocTypeString);
CI->addFnAttr(A);
}

bool llvm::memprof::hasSingleAllocType(uint8_t AllocTypes) {
const unsigned NumAllocTypes = llvm::popcount(AllocTypes);
assert(NumAllocTypes != 0);
Expand Down Expand Up @@ -475,7 +469,9 @@ bool CallStackTrie::buildMIBNodes(CallStackTrieNode *Node, LLVMContext &Ctx,

void CallStackTrie::addSingleAllocTypeAttribute(CallBase *CI, AllocationType AT,
StringRef Descriptor) {
addAllocTypeAttribute(CI->getContext(), CI, AT);
auto AllocTypeString = getAllocTypeAttributeString(AT);
auto A = llvm::Attribute::get(CI->getContext(), "memprof", AllocTypeString);
CI->addFnAttr(A);
if (MemProfReportHintedSizes) {
std::vector<ContextTotalSize> ContextSizeInfo;
collectContextSizeInfo(Alloc, ContextSizeInfo);
Expand All @@ -485,6 +481,12 @@ void CallStackTrie::addSingleAllocTypeAttribute(CallBase *CI, AllocationType AT,
<< getAllocTypeAttributeString(AT) << ": " << TotalSize << "\n";
}
}
if (ORE)
ORE->emit(OptimizationRemark(DEBUG_TYPE, "MemprofAttribute", CI)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we associate this annotation back to what was in the profile? Is it possible to associate a non-context sensitive hint to (a set of) context ids?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are for the user so have the allocation file and line like other remarks. I wasn't planning to emit context ids which are more for compiler developers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. here is an example of the equivalent message emitted during cloning:

; REMARKS: call in clone _Z3barv.memprof.1 marked with memprof allocation attribute cold

<< ore::NV("AllocationCall", CI) << " in function "
<< ore::NV("Caller", CI->getFunction())
<< " marked with memprof allocation attribute "
<< ore::NV("Attribute", AllocTypeString));
}

// Build and attach the minimal necessary MIB metadata. If the alloc has a
Expand Down
7 changes: 4 additions & 3 deletions llvm/lib/Transforms/IPO/Inliner.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -382,9 +382,10 @@ PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
&FAM.getResult<BlockFrequencyAnalysis>(*(CB->getCaller())),
&FAM.getResult<BlockFrequencyAnalysis>(Callee));

InlineResult IR =
InlineFunction(*CB, IFI, /*MergeAttributes=*/true,
&FAM.getResult<AAManager>(*CB->getCaller()));
InlineResult IR = InlineFunction(
*CB, IFI, /*MergeAttributes=*/true,
&FAM.getResult<AAManager>(*CB->getCaller()), true, nullptr,
&FAM.getResult<OptimizationRemarkEmitterAnalysis>(*CB->getCaller()));
if (!IR.isSuccess()) {
Advice->recordUnsuccessfulInlining(IR);
continue;
Expand Down
9 changes: 6 additions & 3 deletions llvm/lib/Transforms/Instrumentation/MemProfiler.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
#include "llvm/ADT/StringRef.h"
#include "llvm/Analysis/MemoryBuiltins.h"
#include "llvm/Analysis/MemoryProfileInfo.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Constant.h"
Expand Down Expand Up @@ -968,7 +969,8 @@ readMemprof(Module &M, Function &F, IndexedInstrProfReader *MemProfReader,
const TargetLibraryInfo &TLI,
std::map<uint64_t, AllocMatchInfo> &FullStackIdToAllocMatchInfo,
std::set<std::vector<uint64_t>> &MatchedCallSites,
DenseMap<uint64_t, LocToLocMap> &UndriftMaps) {
DenseMap<uint64_t, LocToLocMap> &UndriftMaps,
OptimizationRemarkEmitter &ORE) {
auto &Ctx = M.getContext();
// Previously we used getIRPGOFuncName() here. If F is local linkage,
// getIRPGOFuncName() returns FuncName with prefix 'FileName;'. But
Expand Down Expand Up @@ -1135,7 +1137,7 @@ readMemprof(Module &M, Function &F, IndexedInstrProfReader *MemProfReader,
// We may match this instruction's location list to multiple MIB
// contexts. Add them to a Trie specialized for trimming the contexts to
// the minimal needed to disambiguate contexts with unique behavior.
CallStackTrie AllocTrie;
CallStackTrie AllocTrie(&ORE);
uint64_t TotalSize = 0;
uint64_t TotalColdSize = 0;
for (auto *AllocInfo : AllocInfoIter->second) {
Expand Down Expand Up @@ -1282,8 +1284,9 @@ PreservedAnalyses MemProfUsePass::run(Module &M, ModuleAnalysisManager &AM) {
continue;

const TargetLibraryInfo &TLI = FAM.getResult<TargetLibraryAnalysis>(F);
auto &ORE = FAM.getResult<OptimizationRemarkEmitterAnalysis>(F);
readMemprof(M, F, MemProfReader.get(), TLI, FullStackIdToAllocMatchInfo,
MatchedCallSites, UndriftMaps);
MatchedCallSites, UndriftMaps, ORE);
}

if (ClPrintMemProfMatchInfo) {
Expand Down
22 changes: 13 additions & 9 deletions llvm/lib/Transforms/Utils/InlineFunction.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -828,12 +828,13 @@ static void removeCallsiteMetadata(CallBase *Call) {
}

static void updateMemprofMetadata(CallBase *CI,
const std::vector<Metadata *> &MIBList) {
const std::vector<Metadata *> &MIBList,
OptimizationRemarkEmitter *ORE) {
assert(!MIBList.empty());
// Remove existing memprof, which will either be replaced or may not be needed
// if we are able to use a single allocation type function attribute.
removeMemProfMetadata(CI);
CallStackTrie CallStack;
CallStackTrie CallStack(ORE);
for (Metadata *MIB : MIBList)
CallStack.addCallStack(cast<MDNode>(MIB));
bool MemprofMDAttached = CallStack.buildAndAttachMIBMetadata(CI);
Expand All @@ -848,7 +849,8 @@ static void updateMemprofMetadata(CallBase *CI,
// the call that was inlined.
static void propagateMemProfHelper(const CallBase *OrigCall,
CallBase *ClonedCall,
MDNode *InlinedCallsiteMD) {
MDNode *InlinedCallsiteMD,
OptimizationRemarkEmitter *ORE) {
MDNode *OrigCallsiteMD = ClonedCall->getMetadata(LLVMContext::MD_callsite);
MDNode *ClonedCallsiteMD = nullptr;
// Check if the call originally had callsite metadata, and update it for the
Expand Down Expand Up @@ -891,7 +893,7 @@ static void propagateMemProfHelper(const CallBase *OrigCall,
return;
}
if (NewMIBList.size() < OrigMemProfMD->getNumOperands())
updateMemprofMetadata(ClonedCall, NewMIBList);
updateMemprofMetadata(ClonedCall, NewMIBList, ORE);
}

// Update memprof related metadata (!memprof and !callsite) based on the
Expand All @@ -902,7 +904,8 @@ static void propagateMemProfHelper(const CallBase *OrigCall,
static void
propagateMemProfMetadata(Function *Callee, CallBase &CB,
bool ContainsMemProfMetadata,
const ValueMap<const Value *, WeakTrackingVH> &VMap) {
const ValueMap<const Value *, WeakTrackingVH> &VMap,
OptimizationRemarkEmitter *ORE) {
MDNode *CallsiteMD = CB.getMetadata(LLVMContext::MD_callsite);
// Only need to update if the inlined callsite had callsite metadata, or if
// there was any memprof metadata inlined.
Expand All @@ -925,7 +928,7 @@ propagateMemProfMetadata(Function *Callee, CallBase &CB,
removeCallsiteMetadata(ClonedCall);
continue;
}
propagateMemProfHelper(OrigCall, ClonedCall, CallsiteMD);
propagateMemProfHelper(OrigCall, ClonedCall, CallsiteMD, ORE);
}
}

Expand Down Expand Up @@ -2473,7 +2476,8 @@ llvm::InlineResult llvm::InlineFunction(CallBase &CB, InlineFunctionInfo &IFI,
bool MergeAttributes,
AAResults *CalleeAAR,
bool InsertLifetime,
Function *ForwardVarArgsTo) {
Function *ForwardVarArgsTo,
OptimizationRemarkEmitter *ORE) {
assert(CB.getParent() && CB.getFunction() && "Instruction not in function!");

// FIXME: we don't inline callbr yet.
Expand Down Expand Up @@ -2807,8 +2811,8 @@ llvm::InlineResult llvm::InlineFunction(CallBase &CB, InlineFunctionInfo &IFI,
// inlined function which use the same param.
AddParamAndFnBasicAttributes(CB, VMap, InlinedFunctionInfo);

propagateMemProfMetadata(CalledFunc, CB,
InlinedFunctionInfo.ContainsMemProfMetadata, VMap);
propagateMemProfMetadata(
CalledFunc, CB, InlinedFunctionInfo.ContainsMemProfMetadata, VMap, ORE);

// Propagate metadata on the callsite if necessary.
PropagateCallSiteMetadata(CB, FirstNewBlock, Caller->end());
Expand Down
7 changes: 6 additions & 1 deletion llvm/test/Transforms/Inline/memprof_inline.ll
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,12 @@
;; }


; RUN: opt -passes=inline %s -S | FileCheck %s
;; Also check that remarks are emitted when the allocations are hinted.
; RUN: opt -passes=inline -pass-remarks=memory-profile-info %s -S 2>&1 | FileCheck %s

; CHECK: remark: memprof_inline.cc:5:10: call in function _Z4foo2v marked with memprof allocation attribute cold
; CHECK: remark: memprof_inline.cc:5:10: call in function main marked with memprof allocation attribute notcold
; CHECK: remark: memprof_inline.cc:5:10: call in function main marked with memprof allocation attribute cold

; ModuleID = 'memprof_inline.cc'
source_filename = "memprof_inline.cc"
Expand Down
7 changes: 5 additions & 2 deletions llvm/test/Transforms/PGOProfile/memprof.ll
Original file line number Diff line number Diff line change
Expand Up @@ -64,8 +64,9 @@
; RUN: opt < %s -passes='pgo-instr-use,memprof-use<profile-filename=%t.pgomemprofdata>' -pgo-test-profile-file=%t.pgomemprofdata -pgo-warn-missing-function -S 2>&1 | FileCheck %s --check-prefixes=MEMPROF,ALL,PGO

;; Check that the total sizes are reported if requested. A message should be
;; emitted for the pruned context.
; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-report-hinted-sizes 2>&1 | FileCheck %s --check-prefixes=TOTALSIZESSINGLE,TOTALSIZES,TOTALSIZENOKEEPALL
;; emitted for the pruned context. Also check that remarks are emitted for the
;; allocations hinted without context sensitivity.
; RUN: opt < %s -passes='memprof-use<profile-filename=%t.memprofdata>' -pgo-warn-missing-function -S -memprof-report-hinted-sizes -pass-remarks=memory-profile-info 2>&1 | FileCheck %s --check-prefixes=TOTALSIZESSINGLE,TOTALSIZES,TOTALSIZENOKEEPALL,REMARKSINGLE

;; Check that the total sizes are reported if requested, and prevent pruning
;; via -memprof-keep-all-not-cold-contexts.
Expand Down Expand Up @@ -386,7 +387,9 @@ for.end: ; preds = %for.cond
; TOTALSIZESTHRESH60: Total size for full allocation context hash 18254812774972004394 and dominant alloc type cold: 10
; TOTALSIZESTHRESH60: Total size for full allocation context hash 1093248920606587996 and dominant alloc type cold: 10
; TOTALSIZESSINGLE: Total size for full allocation context hash 6792096022461663180 and single alloc type notcold: 10
; REMARKSINGLE: remark: memprof.cc:25:13: call in function main marked with memprof allocation attribute notcold
; TOTALSIZESSINGLE: Total size for full allocation context hash 15737101490731057601 and single alloc type cold: 10
; REMARKSINGLE: remark: memprof.cc:26:13: call in function main marked with memprof allocation attribute cold
;; For context sensitive allocations the full context hash and size in bytes
;; are in separate metadata nodes included on the MIB metadata.
; TOTALSIZES: !"cold", ![[CONTEXT1:[0-9]+]]}
Expand Down
Loading