Single-Core Optimization

This series of lectures is about code optimization on a single core.

We start from scratch, reviewing some basic features of the architecture of modern CPUs after a brief historical overview of the reasons that led to them.
We then introduce some general concepts about what “optimization” is, settle some common terms, and discuss the usage of the compiler. Code examples related to memory aliasing are here.
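As a minimal sketch of why memory aliasing matters to the compiler (this is not one of the course examples, and the function names are hypothetical), compare a routine whose pointers may overlap with one where `restrict` promises that they do not:

```c
#include <stddef.h>

/* Without restrict the compiler must assume that out may overlap a or b,
   so a store to out[i] could change values it has yet to read. This can
   prevent it from keeping values in registers or vectorizing the loop. */
void add_may_alias(float *out, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* With restrict the programmer promises that the three arrays do not
   overlap. Breaking that promise is undefined behaviour. */
void add_no_alias(float *restrict out, const float *restrict a,
                  const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Both functions give the same result on non-overlapping arrays; whether the second one is actually faster depends on the compiler and on the optimization flags, so it is worth checking the generated code or measuring.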
The running and memory model in the *niX environment is discussed in some detail, with a focus on stack and heap memory. Code examples related to this topic are in this folder.
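The difference between the two memory regions can be sketched as follows. This is an illustrative example with hypothetical function names, not taken from the course folder:

```c
#include <stdlib.h>

/* The buffer has automatic storage: it lives in this function's stack
   frame and is released automatically when the function returns. */
long sum_on_stack(void)
{
    int buf[64];
    long s = 0;
    for (int i = 0; i < 64; i++) {
        buf[i] = i;
        s += buf[i];
    }
    return s;
}

/* The buffer is requested from the heap at run time, its size need not
   be known at compile time, and it must be released explicitly.
   Returns -1 if the allocation fails. */
long sum_on_heap(size_t n)
{
    int *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        buf[i] = (int)i;
        s += buf[i];
    }
    free(buf);
    return s;
}
```

The stack is typically limited to a few megabytes, so large or variable-size buffers usually belong on the heap.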
Some more detailed facts about memory allocation are presented. Code examples related to the topic are in this folder.
Basic precautions and simple habits for writing code that avoids trivial performance killers are discussed. Examples are here.
Optimization concepts are divided into four main chapters:
1. cache usage (slides, examples)
2. conditional branching (slides, examples)
3. how to exploit pipelines (i.e. ILP; slides, examples)
4. simple techniques for loops and prefetching of data (slides, examples on loops, examples on prefetching)

Some material on how to use the debugger and how to profile code is also provided, with some examples on debugging and on profiling.

What is required for the exam

In general, you will not be asked for anything at the assembly level. The assembly-level examples and discussions presented in the slides are there only to help you better understand what happens behind the scenes.

Architecture

For the final oral exam, the candidate is required to have a firm understanding of modern CPU architecture: what out-of-order execution is about, the pipeline concept, the cache hierarchy and how a typical cache works, branch prediction, and the concept of conditional branching and its impact.

Running model and memory allocation

The candidate must know the fundamental traits of the running model and must definitely have a clear idea of what the stack and the heap are. The main concepts of how memory allocation works and of what memory alignment is must also be well understood.
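To make the notion of alignment concrete, here is a small sketch that requests heap memory aligned to a 64-byte boundary using the C11 function `aligned_alloc`. The 64-byte value is an assumption (a common cache-line size), and the helper names are hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for n doubles starting at a 64-byte boundary.
   64 is an assumed cache-line size, not a universal constant. */
double *alloc_aligned_doubles(size_t n)
{
    const size_t alignment = 64;
    size_t bytes = n * sizeof(double);
    /* aligned_alloc expects the size to be a multiple of the alignment,
       so round the request up to the next multiple of 64. */
    bytes = (bytes + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, bytes);
}

/* An address is aligned to `alignment` if it is an exact multiple of it. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

Memory obtained this way is released with the usual `free`.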

Optimization / cache

Basically, all the material presented in the slides should be mastered by the candidate. “Cache resonance” and the exact details of cache mapping are required only to get the maximum grade; a simpler, general understanding of the concept is sufficient even for high marks.
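A classic illustration of cache-friendly versus cache-unfriendly access is the traversal order of a two-dimensional array. The sketch below is not taken from the slides; the matrix size `N` and the function names are arbitrary choices:

```c
#include <stddef.h>

#define N 256   /* arbitrary size chosen for illustration */

/* C stores 2D arrays row by row, so the inner loop over j touches
   adjacent addresses and uses every element of each cache line. */
double sum_row_major(double m[N][N])
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

/* Here the inner loop jumps N * sizeof(double) bytes at every step, so
   each access may land on a different cache line. */
double sum_col_major(double m[N][N])
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += m[i][j];
    return s;
}
```

The two functions visit the same elements; for matrices larger than the cache the first order is usually faster, but the size of the gap depends on the machine and should be measured.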

Optimization / branches

The candidate must know what a conditional branch is and what its impact on performance is. They must have a clear idea of the basic techniques to avoid conditional branches when possible and, in any case, of how to ease the compiler’s job in generating performant code.
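One simple way to avoid a data-dependent branch is to use the result of a comparison as a number. The following sketch (hypothetical function names, not from the course examples) counts the elements above a threshold in both styles:

```c
#include <stddef.h>

/* Branchy version: on unpredictable data the if may be mispredicted
   often, and each misprediction flushes the pipeline. */
size_t count_above_branchy(const int *v, size_t n, int threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] > threshold)
            count++;
    return count;
}

/* Branchless version: the comparison evaluates to 0 or 1 and is added
   unconditionally, so there is no data-dependent jump in the loop body. */
size_t count_above_branchless(const int *v, size_t n, int threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(v[i] > threshold);
    return count;
}
```

An optimizing compiler may already turn the first form into the second, or into conditional moves, so the benefit should be verified by measurement rather than assumed.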

Optimization / pipelines

The candidate must be familiar with the concepts of pipeline and ILP, and with how exploiting them “translates” into coding in a general sense.
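One way this translates into code is to break long dependency chains. In the sketch below (an illustration with hypothetical names, not the course's own example), a reduction is split over four independent accumulators so that several additions can be in flight at once:

```c
#include <stddef.h>

/* Every addition depends on the previous one, so the loop cannot run
   faster than the latency of one floating-point add per element. */
double sum_single(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four independent partial sums form four separate dependency chains
   that the CPU can overlap in its pipelines. */
double sum_four_accumulators(const double *v, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)   /* remainder when n is not a multiple of 4 */
        s0 += v[i];
    return (s0 + s1) + (s2 + s3);
}
```

Because floating-point addition is not associative, the two versions may differ in the last digits on general data. That is also why the compiler does not apply this transformation by itself unless it is explicitly allowed to reorder floating-point operations.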

Optimization / loops and prefetching

Some basic concepts and techniques on how to write loops are presented. We require that they are all clear to the candidate, as well as the concept of “prefetching” data in loops (direct prefetching is not a must-have).
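For completeness, explicit prefetching can be sketched as follows. The example assumes GCC or Clang, whose `__builtin_prefetch` extension is used here; the function name and the prefetch distance are hypothetical and would need tuning on a real machine:

```c
#include <stddef.h>

#define PREFETCH_DISTANCE 8   /* hypothetical value, to be tuned by measurement */

/* Sum data[] in the order given by idx[]. The addresses are irregular,
   so the hardware prefetcher cannot easily guess them; we ask for the
   element needed a few iterations ahead while working on the current one. */
double gather_sum(const double *data, const size_t *idx, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        /* Arguments: address, 0 = prefetch for reading, 1 = low temporal locality. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&data[idx[i + PREFETCH_DISTANCE]], 0, 1);
#endif
        s += data[idx[i]];
    }
    return s;
}
```

A prefetch is only a hint and never changes the result. On regular, sequential accesses the hardware prefetcher usually does the job already, and an explicit prefetch may even slow the loop down.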

Further readings (optional, but useful)
