Instruction execution: Fetch - Decode - RF read - Execute (finish branch) - next PC

Pipeline: use all stages at the same time, each working on a different instruction.
- limitation: IC [ISA, compiler], CPI [= 1 in the best case], CT [critical logic path in the slowest stage]

Metrics
· performance = 1 / latency
· speedup = latency_old / latency_new (> 1: faster; < 1: slower)

Control hazard: the fetch stage does not know the next PC.
- Problem: performance: CPI ↑ by stalling; energy: α is small
- Speculative execution: predict based on prior behavior, detect mispredictions later
- importance: [illegible]
- E.g. 1-bit local predictor: predict taken if T[A % 2^n] == 1, else not taken (NT); update with T[A % 2^n] = taken ? 1 : 0
- Misspeculation: CPI ↑, pipeline flushing
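The 1-bit local predictor in the notes can be sketched in a few lines. This is a minimal illustration, not a model of any real processor: the class name `OneBitPredictor`, the helper `count_mispredictions`, the default table size, and the initial "not taken" state are all assumptions made for the example.

```python
class OneBitPredictor:
    """1-bit local branch predictor: one bit per entry, indexed by A % 2^n."""

    def __init__(self, index_bits: int = 4) -> None:
        self.size = 1 << index_bits      # 2^n table entries
        self.table = [0] * self.size     # assumed initial state: predict not taken

    def _index(self, pc: int) -> int:
        return pc % self.size            # T[A % 2^n]

    def predict(self, pc: int) -> bool:
        """Return True to predict taken, False to predict not taken."""
        return self.table[self._index(pc)] == 1

    def update(self, pc: int, taken: bool) -> None:
        """Remember only the most recent outcome of the branch."""
        self.table[self._index(pc)] = 1 if taken else 0


def count_mispredictions(predictor, pc, outcomes):
    """Run one branch through the predictor and count wrong guesses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop branch: taken 9 times, then falls through, executed twice.
loop_outcomes = ([True] * 9 + [False]) * 2
print(count_mispredictions(OneBitPredictor(), 0x40, loop_outcomes))
```

With one bit of state, each execution of the loop costs two mispredictions: one on entry and one on exit. Branches whose addresses agree in the low n bits share an entry and can disturb each other.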

· IC is determined by the non-speculative instructions [compiler]; CPI: cycles wasted; P_idle

· Energy: α ≈ 1 since all stages are busy; CPI ↑: wasted energy; worse than non-speculative

Key Equations
Bandwidth: work per time (bytes/second, instructions/second, etc.)
- e.g. writing 10 MB at 10 MB/s takes 1 s
- latency = work / bandwidth

Data hazard: forwarding, connections that bypass the register file

x86 micro-ops (µops): decouple instructions from execution for flexibility; x86 decode decomposes instructions into µops, which go to the decode queue

More performance:

(A) Deeper pipeline: split stages (e.g. Decode), so CT ↓ but CPI ↑ [remainder illegible]
· Impact: ET (CT ↓, CPI ↑), Energy (α ↑: extra switching)

Amdahl's Law: optimizations do not (generally) uniformly affect the entire program.
- x_i: fraction of execution time affected by optimization i; S_i: speedup of optimization i
- $S_{total} = \dfrac{1}{\frac{x_1}{S_1} + \frac{x_2}{S_2} + \dots + \frac{x_n}{S_n} + (1 - x_1 - x_2 - \dots - x_n)}$
- corollaries: make the common case fast; once it is optimized, an uncommon case can become the new common case
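Amdahl's Law is easy to check numerically. The sketch below assumes the optimized fractions do not overlap; the function name `amdahl_speedup` and the sample numbers are made up for illustration.

```python
def amdahl_speedup(parts):
    """Total speedup for a list of (x_i, S_i) pairs.

    x_i is the fraction of the original execution time affected by
    optimization i, and S_i is the speedup of that fraction.
    """
    total_fraction = sum(x for x, _ in parts)
    if total_fraction > 1.0 + 1e-12:
        raise ValueError("fractions x_i must sum to at most 1")
    unaffected = 1.0 - total_fraction
    new_time = unaffected + sum(x / s for x, s in parts)
    return 1.0 / new_time


# Speeding up half of the program by 2x gives only about 1.33x overall.
print(amdahl_speedup([(0.5, 2.0)]))

# Even an enormous speedup on 90% of the program is capped near 10x.
print(amdahl_speedup([(0.9, 1e12)]))
```

The second call shows why the unoptimized remainder dominates: the 10% that is left untouched bounds the total speedup at 1 / 0.1 = 10.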

(B) Wider pipeline: two-wide (limited by data dependences)
· Impact: ET ↓ (CPI ↓), Energy ↑ (α ↑, bigger register file)
· wider: more hazards, idle slots, overhead; long-latency complications

CPU Performance Equation
- Execution time: $ET = IC \times CPI \times CT = \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}}$
- CPI: average cycles per instruction; CT = 1 / frequency (G: 10^9, M: 10^6); IC: dynamic instruction count

A) Reduce instruction count: algorithm / compiler (common sub-expression elimination, constant propagation)
· -O0: no optimization, variables on the stack, lots of loads/stores
· -O1: fewer loads/stores, fewer movs, nothing on the stack
· inline -> constant propagation -> loop unrolling -> partial evaluation (substitute computation)

Out-of-Order Execution
A) Data dependency
· RAW: true dependency; an instruction cannot issue until its inputs have been produced
· WAW & WAR: false dependencies; no data flows, but A must execute before B
· critical path (CP): longest sequence of RAW-, WAW-, and/or WAR-dependent instructions
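The performance equation can be turned into a small calculator. This is a sketch with made-up numbers; the function names `execution_time` and `speedup` are hypothetical, and speedup is taken as old latency divided by new latency.

```python
def execution_time(ic: float, cpi: float, clock_hz: float) -> float:
    """ET = IC x CPI x CT, where CT = 1 / clock frequency."""
    cycle_time = 1.0 / clock_hz
    return ic * cpi * cycle_time


def speedup(latency_old: float, latency_new: float) -> float:
    """Greater than 1 means the new version is faster."""
    return latency_old / latency_new


# Hypothetical program: 1e9 dynamic instructions, CPI of 1.5, 2 GHz clock.
baseline = execution_time(ic=1e9, cpi=1.5, clock_hz=2e9)

# Assume a compiler optimization removes 20% of the instructions.
optimized = execution_time(ic=0.8e9, cpi=1.5, clock_hz=2e9)

print(baseline, optimized, speedup(baseline, optimized))
```

Because the three factors multiply, an optimization only helps if it lowers one factor without raising another by more. Fewer instructions at a higher CPI can leave execution time unchanged.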

· factors: program, input, instruction set
· formulas: $CPI = CP / IC$, average $ILP = IC / CP = 1 / CPI$, $ET = IC \times CPI \times CT = IC \times \frac{CP}{IC} \times CT = CP \times CT$

B) Improve CPI: CPI = total cycles / IC = average latency of an instruction
· effective latency: an instruction's contribution to CP; only RAW constrains
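The relation between critical path, CPI, and ILP can be illustrated on a tiny instruction stream. The sketch assumes unit latency for every instruction, unlimited hardware, and that only RAW dependences constrain execution. The function name and the four-instruction program are invented for the example.

```python
def critical_path_length(instrs):
    """Length of the longest RAW dependence chain, in cycles.

    instrs is a list of (dest, [sources]) in program order. Each
    instruction is assumed to take one cycle.
    """
    depth_of = {}   # register -> chain depth of its most recent writer
    longest = 0
    for dest, srcs in instrs:
        depth = 1 + max((depth_of.get(s, 0) for s in srcs), default=0)
        depth_of[dest] = depth
        longest = max(longest, depth)
    return longest


program = [
    ("r1", ["r0"]),        # depth 1
    ("r2", ["r1"]),        # depth 2, RAW on r1
    ("r3", ["r0"]),        # depth 1, independent of the chain
    ("r4", ["r2", "r3"]),  # depth 3, RAW on r2 and r3
]

ic = len(program)
cp = critical_path_length(program)
print("CP =", cp, "CPI =", cp / ic, "average ILP =", ic / cp)
```

Here IC = 4 and CP = 3, so the best achievable CPI is 0.75 and the average ILP is about 1.33.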

B) Register renaming: the RAT maps architectural -> physical registers; more parallelism
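A register alias table (RAT) can be sketched as a dictionary from architectural to physical register names. This is a simplified illustration: it assumes an unbounded supply of physical registers and never frees one, which real hardware cannot do. The function name `rename` and the `pN` naming are hypothetical.

```python
from itertools import count


def rename(instrs):
    """Rename (dest, [sources]) instructions using a RAT.

    Sources are looked up before the destination is remapped, so an
    instruction that reads and writes the same register sees the old value.
    """
    rat = {}            # architectural register -> physical register
    fresh = count()     # assumed unbounded physical register supply

    def lookup(arch):
        if arch not in rat:
            rat[arch] = f"p{next(fresh)}"
        return rat[arch]

    renamed = []
    for dest, srcs in instrs:
        phys_srcs = [lookup(s) for s in srcs]
        rat[dest] = f"p{next(fresh)}"   # every write gets a new register
        renamed.append((rat[dest], phys_srcs))
    return renamed


program = [
    ("r1", ["r2"]),   # reads r2
    ("r2", ["r3"]),   # WAR on r2 with the first instruction
    ("r1", ["r2"]),   # WAW on r1 with the first, RAW on r2 with the second
]
for line in rename(program):
    print(line)
```

After renaming, the two writes to `r1` target different physical registers and the write to `r2` no longer clashes with the earlier read, so only the RAW dependence between the second and third instructions remains.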

· ILP: examine a larger region of the instruction stream -> more parallelism; only RAW constrains
· Program: [illegible]/memory, larger CPI
· Inputs: exercise different parts of the program
· Compiler: decides which instructions are used, e.g. -O0 vs. -O1

C) Out-of-order issue (Tomasulo's algorithm)

· inputs: physical registers
· Scheduler: issues instructions to ports [pipelines]
· reorder buffer: [partly illegible] all prior instructions, register value available

C) Affect CT: processor design (minimum cycle time = maximum delay along the critical path), manufacturing variation (transistor characteristics), software policy

(3) Power and Energy Consumption
A) Power = Energy / time, in J/s = Watts; $P = V^2 f \alpha C + P_{idle}$ (V: voltage, f: frequency, α: activity factor, C: capacitance)
- P increases with area (C) and switching factor (α)
- P_idle: idle power consumption, proportional to the area of the chip
- V and f are linearly related, so $P \propto f^3 \alpha C + P_{idle}$: f ↑, P ↑
B) Energy = Power × Time = $(V^2 f \alpha C + P_{idle}) \times (CPI \times IC \times CT)$
- clock frequency ↑: E ↑, P ↑, ET ↓

Benchmark: well-defined computation used to quantify system characteristics
· Branch prediction [remainder illegible]

Cache
· # block offset bits = log2(block size in bytes)
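The power and energy relations above can be checked with a short calculation. All numbers below are invented for illustration, and the function names `total_power` and `energy` are hypothetical. The sketch assumes the simple model P = V²·f·α·C + P_idle with CT = 1 / f.

```python
def total_power(v, f, alpha, c, p_idle):
    """P = V^2 * f * alpha * C + P_idle, in watts."""
    return v ** 2 * f * alpha * c + p_idle


def energy(v, f, alpha, c, p_idle, ic, cpi):
    """E = P * ET, with ET = IC * CPI * CT and CT = 1 / f, in joules."""
    execution_time = ic * cpi / f
    return total_power(v, f, alpha, c, p_idle) * execution_time


# Hypothetical chip: 1.0 V, 2 GHz, activity factor 0.1, 1 nF, 0.5 W idle.
p = total_power(v=1.0, f=2e9, alpha=0.1, c=1e-9, p_idle=0.5)
e = energy(v=1.0, f=2e9, alpha=0.1, c=1e-9, p_idle=0.5, ic=1e9, cpi=1.0)
print(p, "W", e, "J")

# If V scales linearly with f, doubling f multiplies dynamic power by 8.
ratio = total_power(2.0, 4e9, 0.1, 1e-9, 0.0) / total_power(1.0, 2e9, 0.1, 1e-9, 0.0)
print(ratio)
```

The last two lines show the cubic dependence on frequency once voltage is scaled with it, which is why modest frequency increases can be expensive in power.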

· # entries = cache size / block size
· # index bits = log2(# entries)

Benchmark types
· Micro: machine-specific aspect

· Macro: standard applications using standard inputs; characteristics [illegible]
· application-specific program: captures compiler/architecture [illegible]
· SPEC: [illegible], open-source, all in C/C++

Cache
· address = tag | index | offset
· # tag bits = # address bits - # index bits - # offset bits
· # index bits = log2(# entries)
· block size: data transferred on a miss; larger block size => fewer misses (when accesses have spatial locality)
· localities: temporal = near in time, the same data is accessed again very soon; spatial = near in space, the next access is close to the last access
· misses:
  - compulsory: first access to the data
  - capacity: a fully-associative cache of the same size would also miss
  - conflict: indexing collisions; a fully-associative cache of the same size would hit
· Set-associative index: see later
· write miss: write-allocate (bring the block into the cache), write no-allocate (modify the lower level only)

x86 Assembly
· Hardware - ISA (instruction specification) - software
· registers (32-bit / 64-bit): eax / rax accumulator; ebx / rbx base; ecx / rcx counter
· suffixes: b, s, w, l, q = 8, 16, 16, 32, 64 bits
· instruction src, dest; %reg: register, $imm: immediate, label: label
· %eax: register value; (%eax): Mem[R[eax]]; n(%eax): Mem[R[eax] + n]; n(%eax, %ebx, s): Mem[R[eax] + s * R[ebx] + n]
· compare: cmpl S1, S2 sets flags from S2 - S1; testl S1, S2 sets flags from S1 & S2
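The tag/index/offset formulas can be exercised on a concrete address. The sketch assumes a direct-mapped cache whose size and block size are powers of two. The 32 KiB cache, 64-byte blocks, 32-bit addresses, and the function names are example choices, not values from the notes.

```python
def cache_geometry(cache_size: int, block_size: int, address_bits: int = 32):
    """Return (# entries, # tag bits, # index bits, # offset bits)."""
    entries = cache_size // block_size
    offset_bits = block_size.bit_length() - 1   # log2 for a power of two
    index_bits = entries.bit_length() - 1       # log2 for a power of two
    tag_bits = address_bits - index_bits - offset_bits
    return entries, tag_bits, index_bits, offset_bits


def split_address(addr: int, index_bits: int, offset_bits: int):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


entries, tag_bits, index_bits, offset_bits = cache_geometry(32 * 1024, 64)
print(entries, tag_bits, index_bits, offset_bits)

tag, index, offset = split_address(0x12345678, index_bits, offset_bits)
print(hex(tag), hex(index), hex(offset))
```

Two addresses with the same index but different tags map to the same entry, which is the source of the conflict misses listed above.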

· write hit: write-through (tell the lower-level cache what changed), write-back (mark the block as dirty)
· high locality: miss (write-allocate), hit (write-back)
· low locality: miss (write no-allocate), hit (write-through)
· Prefetching: [partly illegible] stream buffer; energy: [illegible]; PREFETCH: insert your own prefetches

· x86 flags: ZF: 1 if the operands are equal; SF: 1 if S2 < S1
· Function call: ret: return to the return address; leave: restore the stack pointer
· CISC: complex instructions, human readable, easy to add features [x86, VAX]
· RISC: compiler generated [Alpha, MIPS, SPARC]; fixed width, [illegible]
· x86: decoded into RISC-like instructions for execution

C1: write-through, write-allocate; C2: write-back, write-allocate; C2/memory: write-through



· scheduler (ALUs process µops), two memory units; fetch [illegible]; each thread has its own RAT and load/store queues; small overhead
· Meltdown: read any memory mapped into the address space
  - relies on: aggressive speculation past an access that causes an exception, the memory permission check in the TLB [illegible], high-resolution timers
· Spectre: read any memory in the [illegible]
  - relies on: aggressive speculation past branches, out-of-order execution, high-resolution timers