66 changes: 65 additions & 1 deletion .wordlist.txt
Original file line number Diff line number Diff line change
@@ -4120,4 +4120,68 @@ pyproject
toml
virtualenv
mebibytes
syscalls
syscalls
ArchSpecificLibrary
Asahi
AsmSource
AutoEncoder
Avx
BuildCommand
BuildYourOwnKernel
CPPLibRecommend
CPPLibVersion
CPPStdCodes
CompilerSpecific
ConfigGuess
ConfigurationInfo
CrossCompile
DefineOtherArch
Denoises
DiT
Drozd
FlatBuffers
GolangInlineAsm
GolangIntrinsic
GolangLinkLibrary
HostCpuDetection
IncompatibleHeaderFile
InlineAsm
JavaJar
JavaPom
JavaSource
NoEquivalentInlineAsm
NoEquivalentIntrinsic
OldCrt
OpenAnolis
PreprocessorError
PythonInlineAsm
PythonIntrinsic
PythonLinkLibrary
PythonPackage
RustInlineAsm
RustIntrinsic
RustLinkLibrary
SentencePiece
SignedChar
Submodule
TUI
Wix’s
audiogen
bazelbuild
centos
cmdline
deadsnakes
flatbuffers
libmagic
litert
mv
ngrok’s
pagesize
runfinch
spiece
subcommand
subgenre
submodule
subword
techcrunch
transformative
@@ -63,7 +63,7 @@ Start by creating the Create Cosmos DB account and database:
* Account Name: provide a unique name (for example, armiotcosmosdb).
* Availability Zones: disable.
* Region: choose the same region as your IoT Hub and Stream Analytics job.
* Select servleress as capacity mode.
* Select serverless as capacity mode.
* Apply Free Tier Discount: apply
* Check Limit total account throughput.
![img17 alt-text#center](figures/17.png)
@@ -89,7 +89,7 @@ You can use the provided script to convert the Conditioners submodule:
python3 ./scripts/export_conditioners.py --model_config "$WORKSPACE/model_config.json" --ckpt_path "$WORKSPACE/model.ckpt"
```

After successful conversion, you now have a `tflite_conditioners` directory containing models with different precisions (e.g., float16, float32).
After successful conversion, you now have a `tflite_conditioners` directory containing models with different precision (e.g., float16, float32).

You will be using the float32.tflite model for on-device inference.

@@ -14,13 +14,13 @@ SPE integrates sampling directly into the CPU pipeline, triggering on individual

This enables software developers to tune user-space software for characteristics such as memory latency and cache accesses. Importantly, cache statistics are enabled with the Linux Perf cache-to-cache (C2C) utility.

Please refer to the [Arm SPE whitepaper](https://developer.arm.com/documentation/109429/latest/) for more details.
Please refer to the [Arm SPE white paper](https://developer.arm.com/documentation/109429/latest/) for more details.

In this Learning Path, you will use SPE and Perf C2C to diagnose a cache issue for an application running on a Neoverse server.

## False sharing within the cache

Even when two threads touch entirely separate variables, modern processors move data in fixed-size cache lines (nominally 64-bytes). If those distinct variables happen to occupy bytes within the same line, every time one thread writes its variable the core’s cache must gain exclusive ownership of the whole line, forcing the other core’s copy to be invalidated. The second thread, still working on its own variable, then triggers a coherence miss to fetch the line back, and the ping-pong pattern repeats. Please see the illustration below, taken from the Arm SPE whitepaper, for a visual explanation.
Even when two threads touch entirely separate variables, modern processors move data in fixed-size cache lines (nominally 64-bytes). If those distinct variables happen to occupy bytes within the same line, every time one thread writes its variable the core’s cache must gain exclusive ownership of the whole line, forcing the other core’s copy to be invalidated. The second thread, still working on its own variable, then triggers a coherence miss to fetch the line back, and the ping-pong pattern repeats. Please see the illustration below, taken from the Arm SPE white paper, for a visual explanation.

![false_sharing_diagram](./false_sharing_diagram.png)

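The cache-line arithmetic behind false sharing can be sketched in Python with `ctypes`. This is a minimal illustration, not code from the Learning Path. It assumes the nominal 64-byte line size mentioned above, uses two hypothetical 8-byte counters (`SharedCounters`, `PaddedCounters`), and treats field offsets as relative to a line-aligned base address. Packing the counters side by side puts both in line 0, while padding the first one out to a full line moves the second into line 1.

```python
import ctypes

# Assumed cache-line size: the nominal 64 bytes described in the text.
CACHE_LINE = 64


class SharedCounters(ctypes.Structure):
    # Hypothetical per-thread counters packed next to each other
    # (offsets 0 and 8), so both fall in the same 64-byte line.
    _fields_ = [("a", ctypes.c_int64), ("b", ctypes.c_int64)]


class PaddedCounters(ctypes.Structure):
    # Padding after "a" pushes "b" to offset 64, so each counter starts
    # in its own line, assuming the struct itself is line-aligned.
    _fields_ = [
        ("a", ctypes.c_int64),
        ("_pad", ctypes.c_char * (CACHE_LINE - ctypes.sizeof(ctypes.c_int64))),
        ("b", ctypes.c_int64),
    ]


def line_index(offset, line_size=CACHE_LINE):
    """Index of the cache line holding byte `offset`, relative to a line-aligned base."""
    return offset // line_size


def shares_line(struct_type, field_a, field_b, line_size=CACHE_LINE):
    """True if the first bytes of the two fields land in the same cache line."""
    off_a = getattr(struct_type, field_a).offset
    off_b = getattr(struct_type, field_b).offset
    return line_index(off_a, line_size) == line_index(off_b, line_size)


if __name__ == "__main__":
    for cls in (SharedCounters, PaddedCounters):
        print(cls.__name__, cls.b.offset, shares_line(cls, "a", "b"))
```

Running the script prints `SharedCounters 8 True` and then `PaddedCounters 64 False`. Whether two fields really share a line at run time also depends on where the allocator places the struct; in C or C++ the usual approach is to align each per-thread field to the line size (for example with `alignas(64)`) rather than rely on manual padding alone.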