74 changes: 74 additions & 0 deletions main/acle.md
@@ -471,6 +471,7 @@ Armv8.4-A [[ARMARMv84]](#ARMARMv84). Support is added for the Dot Product intrin

* Added support for modal 8-bit floating point matrix multiply-accumulate widening intrinsics.
* Added support for 16-bit floating point matrix multiply-accumulate widening intrinsics.
* Added support for range prefetch intrinsics when `__ARM_FEATURE_RPRFM` is defined.

### References

@@ -3613,6 +3614,79 @@ values.
| KEEP | 0 | Temporal fetch of the addressed location (that is, allocate in cache normally) |
| STRM | 1 | Streaming fetch of the addressed location (that is, memory used only once) |

The following intrinsics are also available when `__ARM_FEATURE_RPRFM` is defined:


Is the expectation for the builtin to fail compilation when this feature is not present, or to be treated as a NOP? The existing `__pld` documentation suggests the latter.

I guess it comes down to whether the builtin provides useful information that can be used even if RPRFM is not available (e.g. by emitting other prefetch instructions, or non-temporal accesses).

I mention this because if there is only a weak link to the underlying instruction, we won't want to use a feature macro, but rather something that just says the builtin is available, like `__ARM_PREFIX_RANGE`.

Contributor Author


I think the existing documentation above the `__pld` builtin applies here too: the range builtin should not fail when the feature is not present, and is treated as a NOP.


``` c
void __pldx_range(/*constant*/ unsigned int /*access_kind*/,
                  /*constant*/ unsigned int /*retention_policy*/,
                  /*constant*/ size_t /*reuse_distance*/,
                  /*constant*/ signed int /*stride*/,
                  /*constant*/ unsigned int /*count*/,
                  /*constant*/ signed int /*length*/,
                  void const volatile *addr);
```

Generates a data prefetch instruction for a range of addresses starting from a
given base address. Locations within the specified address ranges are prefetched
into one or more caches. This intrinsic allows the specification of the
expected access kind (read or write), the data retention policy (temporal or
streaming) and the reuse distance, stride, count and length metadata values.

The access kind and data retention policy arguments must each be one of the
following values.

| **Access Kind** | **Value** | **Summary** |
| --------------- | --------- | ---------------------------------------- |
| PLD | 0 | Fetch the addressed location for reading |
| PST | 1 | Fetch the addressed location for writing |

| **Retention Policy** | **Value** | **Summary** |
| -------------------- | --------- | -------------------------------------------------------------------------- |
| KEEP | 0 | Temporal fetch of the addressed location (that is, allocate in cache normally) |
| STRM | 1 | Streaming fetch of the addressed location (that is, memory used only once) |

The table below describes the ranges of the reuse distance, stride, count and length arguments.

| **Metadata**   | **Range**           | **Summary** |
| -------------- | ------------------- | ----------- |
| Reuse Distance |                     | Maximum number of bytes to be accessed before executing the next RPRFM instruction that specifies the same range. All values are rounded up to the nearest power of 2 in the range 32KiB to 512MiB. Values exceeding the maximum of 512MiB are represented by 0, indicating that the distance is not known. Note: this value is ignored if a streaming prefetch is specified. |
| Stride         | [-2MiB, +2MiB)      | Number of bytes to advance the block address by after `Length` bytes have been accessed. Note: this value is ignored if `Count` is 1. |
| Count          | [1, 65536]          | Number of blocks to be accessed. |
| Length         | [-2MiB, +2MiB)      | Number of contiguous bytes to be accessed. |
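As an illustration of how the metadata arguments fit together, the sketch below prefetches a 64 x 64 tile of a row-major `float` matrix before summing it. Each tile row is one block (`Length` = 256 bytes), consecutive rows are `Stride` = 4096 bytes apart, and `Count` = 64 blocks are requested. Because the tile is read only once, it uses PLD with the STRM retention policy, so the reuse distance argument is ignored. This is a minimal sketch under stated assumptions: the matrix dimensions and `sum_tile` are hypothetical, and it assumes a toolchain whose `<arm_acle.h>` provides `__pldx_range` with the argument order declared above. The call is guarded by `__ARM_FEATURE_RPRFM` as a conservative choice, although the review discussion above suggests the intrinsic may instead be accepted as a NOP when the feature is absent.

```c
#include <stddef.h>
#if defined(__ARM_FEATURE_RPRFM)
#include <arm_acle.h> /* assumed to declare __pldx_range as specified above */
#endif

/* Hypothetical layout: row-major matrix, 1024 floats (4096 bytes) per row. */
#define MATRIX_COLS 1024
#define TILE_ROWS   64
#define TILE_COLS   64

/* Sum a TILE_ROWS x TILE_COLS tile whose top-left element is at `tile`,
   after requesting a prefetch of every row of the tile. */
float sum_tile(const float *tile)
{
#if defined(__ARM_FEATURE_RPRFM)
    /* All metadata arguments must be compile-time constants. */
    __pldx_range(0,                           /* access kind: PLD (read)    */
                 1,                           /* retention: STRM (use once) */
                 0,                           /* reuse distance: ignored    */
                 MATRIX_COLS * sizeof(float), /* stride: one matrix row     */
                 TILE_ROWS,                   /* count: one block per row   */
                 TILE_COLS * sizeof(float),   /* length: one tile row       */
                 tile);
#endif
    float sum = 0.0f;
    for (size_t r = 0; r < TILE_ROWS; ++r)
        for (size_t c = 0; c < TILE_COLS; ++c)
            sum += tile[r * MATRIX_COLS + c];
    return sum;
}
```

The prefetch has no effect on program semantics, so `sum_tile` returns the same result whether or not the hint is emitted.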

``` c
void __pld_range(/*constant*/ unsigned int /*access_kind*/,
                 /*constant*/ unsigned int /*retention_policy*/,
                 unsigned long /*metadata*/,
                 void const volatile *addr);
```

Generates a data prefetch instruction for a range of addresses starting from a
given base address. Locations within the specified address ranges are prefetched
into one or more caches. The access kind and retention policy arguments can
have the same values as in `__pldx_range`. The bits of the metadata argument
are interpreted as follows:

| **Metadata**   | **Bits** | **Range**       | **Summary** |
| -------------- | -------- | --------------- | ----------- |
| Length         | 21-0     | [-2MiB, +2MiB)  | Signed integer representing the number of contiguous bytes to be accessed. |
| Count          | 37-22    | [0, 65535]      | Unsigned integer representing the number of blocks of data to be accessed, minus 1. |
| Stride         | 59-38    | [-2MiB, +2MiB)  | Signed integer representing the number of bytes to advance the block address by after `Length` bytes have been accessed. This value is ignored if `Count` is 0. |
| Reuse Distance | 63-60    | [0, 15]         | Indicates the maximum number of bytes to be accessed before executing the next RPRFM instruction that specifies the same range. Values 1 (512MiB) to 15 (32KiB) encode decreasing powers of two; 0 indicates that the distance is not known. |
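The metadata word for `__pld_range` can be assembled from the fields in the table above. The sketch below uses two hypothetical helpers (`encode_reuse_distance` and `pld_range_metadata` are illustrative names, not part of ACLE): the first rounds a reuse distance in bytes up to a power of two in the 32KiB to 512MiB range, as described for `__pldx_range`, and maps it to the 4-bit code; the second packs `Length`, `Count`, `Stride` and `Reuse Distance` into the documented bit positions. The final function assumes `<arm_acle.h>` declares `__pld_range` as above, and is only compiled when `__ARM_FEATURE_RPRFM` is defined.

```c
#include <stdint.h>
#if defined(__ARM_FEATURE_RPRFM)
#include <arm_acle.h> /* assumed to declare __pld_range as specified above */
#endif

/* Hypothetical helper: map a reuse distance in bytes to the 4-bit field.
   Values are rounded up to a power of two in [32KiB, 512MiB];
   1 encodes 512MiB, 15 encodes 32KiB, 0 means "not known". */
static unsigned encode_reuse_distance(uint64_t bytes)
{
    if (bytes > (UINT64_C(1) << 29))     /* above 512MiB: not known */
        return 0;
    unsigned shift = 15;                   /* start at 32KiB */
    while ((UINT64_C(1) << shift) < bytes) /* round up to a power of two */
        ++shift;
    return 30 - shift;                     /* 2^29 -> 1, ..., 2^15 -> 15 */
}

/* Hypothetical helper: pack the fields into the metadata word.
   length, stride: signed, in [-2MiB, +2MiB) (22-bit two's complement).
   count: number of blocks, in [1, 65536]; stored as count - 1. */
static uint64_t pld_range_metadata(int32_t length, uint32_t count,
                                   int32_t stride, unsigned reuse_code)
{
    uint64_t md = 0;
    md |= (uint64_t)((uint32_t)length & 0x3FFFFFu);       /* bits 21-0  */
    md |= (uint64_t)((count - 1u) & 0xFFFFu) << 22;       /* bits 37-22 */
    md |= (uint64_t)((uint32_t)stride & 0x3FFFFFu) << 38; /* bits 59-38 */
    md |= (uint64_t)(reuse_code & 0xFu) << 60;            /* bits 63-60 */
    return md;
}

#if defined(__ARM_FEATURE_RPRFM)
/* Prefetch `rows` blocks of `row_bytes` bytes, `pitch` bytes apart, for
   reading. The 1MiB reuse distance is a hypothetical example value. */
void prefetch_rows(const void *base, int32_t row_bytes, uint32_t rows,
                   int32_t pitch)
{
    uint64_t md = pld_range_metadata(row_bytes, rows, pitch,
                                     encode_reuse_distance(UINT64_C(1) << 20));
    __pld_range(0 /* PLD */, 0 /* KEEP */, md, base);
}
#endif
```

Unlike `__pldx_range`, the metadata argument of `__pld_range` is not marked as constant, which is what allows it to be computed at run time as in `prefetch_rows`.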

### Instruction prefetch

``` c