Skip to content

Conversation

@jorickert
Copy link

No description provided.

teresajohnson and others added 30 commits January 14, 2025 13:46
…llvm#122946)

The ThinLTO index bitcode writer uses a helper forEachSummary to manage
preparation and writing of summaries needed for each distributed index
file. For alias summaries, it invokes the provided callback for the
aliasee as well, as we at least need to produce a value id for the
alias's summary. However, all summary generation for the aliasee itself
should be skipped on calls when IsAliasee is true. We invoke the
callback again if that value's summary is to be written as well.

We were asserting in debug mode when invoking collectMemProfCallStacks,
because a given stack id index was not in the StackIdIndicesToIndex
map. It was not added because the forEachSummary invocation that records
these ids in the map (invoked from the IndexBitcodeWriter constructor)
was correctly skipping this handling when invoked for aliasees. We need
the same guard in the invocation that calls collectMemProfCallStacks.

Note that this doesn't cause any real problems in a non-asserts build
as the missing map lookup will return the default 0 value from the map,
which isn't used since we don't actually write the corresponding
summary.
…m#120300)

The first API is clang_visitCXXBaseClasses: this allows visiting the
base classes without going through the generic child visitor (which is
awkward, and doesn't work for template instantiations).

The second API is clang_getOffsetOfBase; this allows computing the
offset of a base in the class layout, the same way
clang_Cursor_getOffsetOfField computes the offset of a field.

Also, add a Python binding for the existing function
clang_isVirtualBase.
Currently we fail to detect the case where BTC + 1 wraps, i.e. the
vector trip count is 0, In those cases, the minimum iteration count
check will fail, and the vector code will never be executed.

Explicitly check for this condition in computeMaxVF and avoid trying to
vectorize alltogether.

Note that a number of tests needed to be updated, because the vector
loop would never be executed given the input IR.

Fixes llvm#122558.
…lvm#122801)

For MinGW environments, the regular C/C++ toolchains usually use "w64"
for the vendor field in triples, while Rust toolchains usually use "pc"
in the vendor field.

The differences in the vendor field have no bearing on whether the IR is
compatible on this platform. (This probably goes for most other OSes as
well, but limiting the scope of the change to the specific case.)

Add a unit test for the isCompatibleWith, including some existing test
cases found in existing tests.
…t phase (llvm#121991)" (llvm#122775)"

This reverts commit a39aaf3 and
63d3bd6.

Due to test failures on MacOSX.
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect Source to be nonnull.
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect DeclOrVector to be nonnull.
…#122856)

Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect Storage to be nonnull.
…e database (llvm#120348)

A change list may include files that are not part of the compile
database, which can cause clang-tidy to fail (e.g., due to missing
included headers). To prevent false negatives, we should allow to skip
processing these files.
Preparation for llvm#121124 

This PR provides tests added into
[PR](llvm#121124) that add
selection patterns for instruction `v_sat_pk`, in order to specify the
change of the tests before and after the commit.

Pre-commit tests PR for llvm#121124 : Add selection patterns for instruction
`v_sat_pk`
…normalize` (llvm#122935)

Currently, the output of `Triple::normalize` can vary depending on how the
`Triple` object is constructed, producing a 3-field, 4-field, or even 5-field
string. However, there is no way to control the format of the output, as all
forms are considered canonical according to the LangRef.

This lack of control can be inconvenient when a specific format is required. To
address this, this PR introduces an argument to specify the desired format (3,
4, or 5 identifiers), with the default set to none to maintain the current
behavior. If the requested format requires more components than are available in
the actual `Data`, `"unknown"` is appended as needed.
Introduce a new op to get the device address from a host symbol. This
simplify the current conversion and this is also in preparation for some
legalization work that need to be done in cuf kernel and cuf kernel
launch similar to
llvm#122802
This PR removes the redundant checks for the `supported` variable, as
it's guaranteed to be true.
…ter handlers (llvm#122555)

This PR addresses part of the feedback provided in llvm#115808.
Change the inheritance of class VPWidenSelectRecipe to class
VPRecipeWithIRFlags, which allows recipe of the select to pass the
fastmath flags.The patch of llvm#119847 will add the fastmath flag to for
recipe
These missing continuations were causing commands in this testcase to fail.
Adds new convenience methods findDefinedSymbolByName, findExternalSymbolByName
and findAbsoluteSymbolByName to the LinkGraph class. These should be used to
find symbols of the given types by name.

COFFLinkGraphBuilder and MachOPlatform are updated to take advantage of the
new methods.
…s. NFC.

On platfarms where some relocations for eh-frame sections are implicit (e.g.
MachO/x86-64) EHFrameEdgeFixer is responsible for adding edges for the
implicit relocations.
llvm#122418)

Also clarify the comment above DependentNameType::getIdentifier()
Lots of code around LLDB was directly accessing the target's section
load list. This NFC patch makes the section load list private so the
Target class can access it, but everyone else now uses accessor
functions. This allows us to control the resolving of addresses and will
allow for functionality in LLDB which can lazily resolve addresses in
JIT plug-ins with a future patch.
While this likely won't impact the final SASS, it makes for more compact
PTX.
jofrn and others added 27 commits January 15, 2025 06:56
…ets (llvm#122251)

APInt will fail when given a negative offset. SelectScratchSVAddr
utilizes this function and can be given a negative offset as well, so
this change modifies it to use APSInt instead.
Currently, trailing comments get mixed up:

```
struct Foo {
  int a; // This one is the cool field
         // within the struct.
  int b;
};
```

becomes:

```
struct Foo {
  int b; // This one is the cool field
         // within the struct.
  int a;
};
```

This should be:

```
struct Foo {
  int b;
  int a; // This one is the cool field
         // within the struct.
};
```
…AlignedInLoop (llvm#96752)

Currently when we encounter a negative step in the induction
variable isDereferenceableAndAlignedInLoop bails out because
the element size is signed greater than the step. This patch
adds support for negative steps in cases where we detect the
start address for the load is of the form base + offset. In
this case the address decrements in each iteration so we need
to calculate the access size differently. I have done this by
caling getStartAndEndForAccess from LoopAccessAnalysis.cpp.

The motivation for this patch comes from PR llvm#88385 where a
reviewer requested reusing isDereferenceableAndAlignedInLoop,
but that PR itself does support reverse loops.

The changed test in LoopVectorize/X86/load-deref-pred.ll now
passes because previously we were calculating the total access
size incorrectly, whereas now it is 412 bytes and fits
perfectly into the alloca.
Fixes llvm#37901

This behavior is consistent with GCC
…pattern (llvm#120421)

Relates to (but isn't dependent on) llvm#120420.

This allows alias analysis o the intrinsic of the same quality as for
the libcall, which we want in order to move LoopIdiomRecognize over to
selecting the intrinsic.
…ate` ops (llvm#122866)

This PR generalizes a fix that we implemented previously for
`omp.wsloop`s. The fix makes sure the pre-condtion that the `alloca`
block has a single successor whenever we inline delayed privatizers is
respected. I simply moved the fix to `allocatePrivateVars` so that it
kicks in for any op not just `omp.wsloop`.

This handles a bug uncovered by [a
test](https://github.com/OpenMP-Validation-and-Verification/OpenMP_VV/blob/master/tests/4.5/target_simd/test_target_simd_safelen.F90)
in the OpenMP_VV test suite.
…s() (llvm#123042)

This would allow tools that don't use the real file system to use this
function.
While at it, move a test that calls the IndVarSimplify pass into the
IndVarSimplify directory.
…ass (NFC) (llvm#122836)

This refactor prepares for further ARM64X hybrid support, where these
helpers will need to work with either the native or EC symbol table
based on context.
…123043)

This might be a copy/paste error. I don't think this an issue in
practice as the builtins/intrinsics are only legal with identical vector
element types.
…m#122338)

Adds tests for scalable vectors in:

  * "vector-sink.mlir".

This test file exercises patterns included in
`populateSinkVectorOpsPatterns`:

  * `ReorderElementwiseOpsOnBroadcast`,
  * `ReorderCastOpsOnBroadcast`,
  * `ReorderElementwiseOpsOnTranspose`.

This PR focuses on adding tests for the latter two patterns
(`ReorderCastOpsOnBroadcast` and `ReorderElementwiseOpsOnTranspose`).

Tests for `ReorderElementwiseOpsOnBroadcast` were added in llvm#102286. Please
note that in PR llvm#102856, I renamed:

  * `populateSinkVectorBroadcastPatterns`, to
  * `populateSinkVectorOpsPatterns`.
llvm#122144)

…ariables

Patch adds a flag to control zero initialization of global variables
without default initialization. The default is to zero initialize.
…lvm#121949)

Intrinsic module procedures ieee_get_modes, ieee_set_modes,
ieee_get_status, and ieee_set_status store and retrieve opaque data
values whose size varies by machine and OS environment. These data
values are usually, but not always small. Their sizes are not directly
known in a cross compilation environment. Address this issue by
implementing two mechanisms for processing these data values.
Environments that use typical small data sizes can access storage
defined at compile time. When this is not valid, data storage of any
size can be allocated at runtime.
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect P to be nonnull.
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect Data to be nonnull.
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:

  // FIXME: Replace the uses of is(), get() and dyn_cast() with
  //        isa<T>, cast<T> and the llvm::dyn_cast<T>

Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect EnumUnderlying to be nonnull.
CUID is needed by CUDA/HIP for supporting accessing static device
variables in host function.

Currently CUID is only supported by the old driver for CUDA/HIP. The new
driver does not support it, which causes CUDA/HIP programs using static
device variables in host functions to fail with the new driver for
CUDA/HIP.

This patch refactors the CUID support in the old driver so that CUID is
supported by both the old and the new drivers for CUDA/HIP.
Signals 1-32 are matching the default UNIX platform.

There are platform specific ones above 32.
Base automatically changed from bump_to_f09db6a3 to bump_to_392622d0 April 14, 2025 07:58
@jorickert jorickert merged commit 6bf8653 into bump_to_392622d0 Apr 14, 2025
6 checks passed
@jorickert jorickert deleted the bump_to_3986cffe branch April 14, 2025 07:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.