In [12]:
%load_ext autoreload
%autoreload 2
from notebook import *
# If you see a warning about NUMEXPR_MAX_THREADS being set incorrectly, don't worry. It's not a problem.

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Branches and branch prediction

## Why do we have "branches" in code?

Consider the following code snippet. How does the compiler translate it into instructions?

In [3]:
render_code("loop.c", show="loop0")

In [5]:
! gcc -S -O0 loop.c
render_code("loop.s", show=["loop0","LFE23"])

In [6]:
! gcc -S -O2 loop.c
render_code("loop.s", show=["loop1","LFE24"])

In [5]:
! gcc -S -O2 loop.c
render_code("loop.s", show=["loop2","LFE25"])

## Sorting and branch miss rates

Do you remember this?

In [13]:
compare([do_render_code("arraySort.cpp",show=["//START","//END"]),do_render_code("calculate_sum.c", show="calculate_sum")])

In [14]:
! lscpu | grep 'Model name'

Model name:                           13th Gen Intel(R) Core(TM) i7-13700


In [15]:
! make clean; make EXTRA_OPTS=-DCOUNT_SORTING; sleep 2
! echo "size,iterations,sorted,IC,Cycles,CPI,CT,ET,L1_dcache_miss_rate,L1_dcache_misses,L1_dcache_accesses,branches,branch_misses" > stats.csv
! echo -n "131072,1000,0," >> stats.csv
! ./arraySort 131072 1000 0
! echo -n "131072,1000,1," >> stats.csv
! ./arraySort 131072 1000 1

rm -f madd arraySort *.o
gcc -g -DHAVE_LINUX_PERF_EVENT_H -O0 -DCOUNT_SORTING -I/nfshome/htseng/courses/CS203/demo/branch  -o calculate_sum.o -c calculate_sum.c
gcc -g -DHAVE_LINUX_PERF_EVENT_H -O3 -DCOUNT_SORTING -I/nfshome/htseng/courses/CS203/demo/branch  -o perfstats.o -c perfstats.c
perfstats.c: In function ‘change_cpufrequnecy’:
  115 |     cpu = sched_getcpu();
      |           ^~~~~~~~~~~~
      |           SYS_getcpu
g++ -O3 -DHAVE_LINUX_PERF_EVENT_H -DCOUNT_SORTING arraySort.cpp perfstats.o calculate_sum.o -o arraySort
arraySort.cpp: In function ‘int main(int, char**)’:
   53 |     sprintf(preamble,"");
      |                      ^~
sum = 64494148
sum = 64494148


In [16]:
display_df_mono(render_csv("stats.csv"))

Unnamed: 0,index,size,iterations,sorted,IC,Cycles,CPI,CT,ET,L1_dcache_miss_rate,L1_dcache_misses,L1_dcache_accesses,branches,branch_misses
0,0,131072,1000,0,1500089369,1560707085,1.040409,0.197792,0.308695,0.007012,8216848,1171809488,261041614,35149848
1,1,131072,1000,1,1522559118,375015986,0.246306,0.19684,0.073818,0.006986,8260554,1182512878,266006670,957531


In [70]:
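# Misprediction rate = branch_misses / branches for the unsorted case.
# These counts differ slightly from the table above; they appear to come from a different run.
miss_prediction_rate_without_sorting = 35372534/262712331
print(miss_prediction_rate_without_sorting)
# 1560707085 is the Cycles value of the sorted=0 row, i.e. the run without sorting.
cycles_without_sorting = 1560707085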

0.134643599961054
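The same ratio can be computed for both rows of the table above. This is a minimal sketch with the `branches` and `branch_misses` values copied by hand from the stats output; the dictionary layout and variable names are illustrative and not part of the course's `notebook` helpers.

```python
# Counter values copied from the stats table (size 131072, 1000 iterations).
runs = {
    "unsorted": {"branches": 261041614, "branch_misses": 35149848},
    "sorted": {"branches": 266006670, "branch_misses": 957531},
}

# Misprediction rate = mispredicted branches / all executed branches.
miss_rates = {
    name: counters["branch_misses"] / counters["branches"]
    for name, counters in runs.items()
}

for name, rate in miss_rates.items():
    print(f"{name}: {rate:.2%} of branches mispredicted")
```

With these numbers the unsorted run mispredicts roughly 13% of its branches, while the sorted run stays well under 1%.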


What did we learn?

With the data sorted, the CPI is ??? times smaller (sorting included).

With the data sorted, the ET is ??? times shorter (sorting included).
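One way to work out the two blanks is to divide the unsorted values by the sorted ones. The sketch below copies IC, CPI, CT, and ET from the table above; the variable names are illustrative, and the assumption that CT is reported in nanoseconds is inferred from the numbers, not stated in the output.

```python
# Values copied from the stats table (size 131072, 1000 iterations, sorting counted).
unsorted_run = {"IC": 1500089369, "CPI": 1.040409, "CT": 0.197792, "ET": 0.308695}
sorted_run = {"IC": 1522559118, "CPI": 0.246306, "CT": 0.196840, "ET": 0.073818}

# How many times smaller the CPI is, and how many times shorter the ET is, with sorted data.
cpi_ratio = unsorted_run["CPI"] / sorted_run["CPI"]
et_speedup = unsorted_run["ET"] / sorted_run["ET"]

# Sanity check of ET = IC * CPI * CT, assuming CT is in nanoseconds.
et_estimate = unsorted_run["IC"] * unsorted_run["CPI"] * unsorted_run["CT"] * 1e-9

print(f"CPI ratio (unsorted / sorted): {cpi_ratio:.2f}")
print(f"ET speedup with sorted data:   {et_speedup:.2f}")
print(f"ET estimated from IC*CPI*CT:   {et_estimate:.6f} s")
```

Because IC and CT are nearly identical in the two runs, the ET speedup tracks the CPI ratio closely; both come out a little above 4.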

What's the cost of branch misses?

Let's exclude the sorting part and do it again.

In [17]:
! make clean; make
! sleep 2
! echo "size,iterations,sorted,IC,Cycles,CPI,CT,ET,L1_dcache_miss_rate,L1_dcache_misses,L1_dcache_accesses,branches,branch_misses" > stats.csv
! echo -n "262144,1000,0," >> stats.csv
! ./arraySort 262144 1000 0
! echo -n "262144,1000,1," >> stats.csv
! ./arraySort 262144 1000 1
display_df_mono(render_csv("stats.csv"))

rm -f madd arraySort *.o
gcc -g -DHAVE_LINUX_PERF_EVENT_H -O0  -I/nfshome/htseng/courses/CS203/demo/branch  -o calculate_sum.o -c calculate_sum.c
gcc -g -DHAVE_LINUX_PERF_EVENT_H -O3  -I/nfshome/htseng/courses/CS203/demo/branch  -o perfstats.o -c perfstats.c
perfstats.c: In function ‘change_cpufrequnecy’:
  115 |     cpu = sched_getcpu();
      |           ^~~~~~~~~~~~
      |           SYS_getcpu
g++ -O3 -DHAVE_LINUX_PERF_EVENT_H  arraySort.cpp perfstats.o calculate_sum.o -o arraySort
arraySort.cpp: In function ‘int main(int, char**)’:
   53 |     sprintf(preamble,"");
      |                      ^~
sum = 127161197
sum = 127161197


Unnamed: 0,index,size,iterations,sorted,IC,Cycles,CPI,CT,ET,L1_dcache_miss_rate,L1_dcache_misses,L1_dcache_accesses,branches,branch_misses
0,0,262144,1000,0,3015678570,3013721621,0.999351,0.196537,0.592308,0.00427,10048477,2353311459,525147647,67979099
1,1,262144,1000,1,3011998253,707433688,0.234872,0.196626,0.1391,0.006609,15543808,2351953168,524510162,4686


In [18]:
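# Extra branch misses and extra cycles of the unsorted run relative to the sorted run.
Diff_Misses = 67979099-4686
Diff_Cycles = 3013721621-707433688
# Approximate cost of one branch miss, in cycles.
print(Diff_Cycles/Diff_Misses)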

33.92876571070941
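The calculation above can be wrapped in a small helper so it is easy to repeat for other runs. This is a sketch under a simplifying assumption: the two runs execute nearly the same instructions, so all extra cycles are attributed to the extra branch misses and other effects (such as the different L1 miss counts) are ignored. The function name and parameters are hypothetical.

```python
def cycles_per_branch_miss(cycles_unsorted, cycles_sorted, misses_unsorted, misses_sorted):
    """Estimate the cycle cost of one branch miss from an unsorted and a sorted run."""
    extra_cycles = cycles_unsorted - cycles_sorted
    extra_misses = misses_unsorted - misses_sorted
    if extra_misses <= 0:
        raise ValueError("the unsorted run must have more branch misses than the sorted run")
    return extra_cycles / extra_misses


# Cycles and branch_misses copied from the table above (size 262144, sorting excluded).
penalty = cycles_per_branch_miss(
    cycles_unsorted=3013721621,
    cycles_sorted=707433688,
    misses_unsorted=67979099,
    misses_sorted=4686,
)
print(f"about {penalty:.1f} cycles per branch miss")
```

The estimate is an upper bound in spirit: anything else that slows the unsorted run is folded into the per-miss cost.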


Let's try a different processor!

In [19]:
! make clean; make
! sleep 2
! echo "size,iterations,sorted,IC,Cycles,CPI,CT,ET,L1_dcache_miss_rate,L1_dcache_misses,L1_dcache_accesses,branches,branch_misses" > stats.csv
! ssh htseng@blissey 'make -C /nfshome/htseng/courses/CSE142/demo/branch/ clean all ; lscpu|grep "Model name"'
! echo -n "262144,1000,0," >> stats.csv
! ssh htseng@blissey "cd /nfshome/htseng/courses/CSE142/demo/branch; ./arraySort 262144 1000 0"
! echo -n "262144,1000,1," >> stats.csv
! ssh htseng@blissey "cd /nfshome/htseng/courses/CSE142/demo/branch; ./arraySort 262144 1000 1"
display_df_mono(render_csv("stats.csv"))

rm -f madd arraySort *.o
gcc -g -DHAVE_LINUX_PERF_EVENT_H -O0  -I/nfshome/htseng/courses/CS203/demo/branch  -o calculate_sum.o -c calculate_sum.c
gcc -g -DHAVE_LINUX_PERF_EVENT_H -O3  -I/nfshome/htseng/courses/CS203/demo/branch  -o perfstats.o -c perfstats.c
perfstats.c: In function ‘change_cpufrequnecy’:
  115 |     cpu = sched_getcpu();
      |           ^~~~~~~~~~~~
      |           SYS_getcpu
g++ -O3 -DHAVE_LINUX_PERF_EVENT_H  arraySort.cpp perfstats.o calculate_sum.o -o arraySort
arraySort.cpp: In function ‘int main(int, char**)’:
   53 |     sprintf(preamble,"");
      |                      ^~
make: Entering directory '/nfshome/htseng/courses/CSE142/demo/branch'
rm -f madd arraySort *.o
gcc -g -DHAVE_LINUX_PERF_EVENT_H -O0  -I/nfshome/htseng  -o calculate_sum.o -c calculate_sum.c
gcc -g -DHAVE_LINUX_PERF_EVENT_H -O3  -I/nfsho

Unnamed: 0,index,size,iterations,sorted,IC,Cycles,CPI,CT,ET,L1_dcache_miss_rate,L1_dcache_misses,L1_dcache_accesses,branches,branch_misses
0,0,262144,1000,0,262144,1000,1,,,,,,,


In [33]:
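# These counter values do not appear in the output above; presumably they come from a separate run on the other machine.
Diff_Misses = 49159652-57646
Diff_Cycles = 1941551579-789330059
# Approximate cost of one branch miss on that processor, in cycles.
print(Diff_Cycles/Diff_Misses)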

23.46587469359195


In [35]:
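# Weighted-average CPI: 90% of instructions take 1 cycle, 10% take 24 cycles
# (24 is roughly the per-miss cost measured above).
new_CPI = 0.9*1+0.1*24
print(new_CPI)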

3.3000000000000003
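The weighted average above generalizes to any mix of fast and penalized instructions. The sketch below is one way to explore it; the function name and parameters are hypothetical, and the model assumes every instruction costs either the base CPI or the full penalty, with nothing in between.

```python
def effective_cpi(base_cpi, penalized_fraction, penalty_cycles):
    """Average CPI when a fraction of instructions costs penalty_cycles and the rest base_cpi."""
    if not 0.0 <= penalized_fraction <= 1.0:
        raise ValueError("penalized_fraction must be between 0 and 1")
    return (1.0 - penalized_fraction) * base_cpi + penalized_fraction * penalty_cycles


# Reproduces the calculation above: base CPI of 1, 10% of instructions at 24 cycles.
new_cpi = effective_cpi(base_cpi=1, penalized_fraction=0.1, penalty_cycles=24)
print(new_cpi)

# How the average CPI grows as the penalized fraction grows.
for fraction in (0.0, 0.01, 0.05, 0.1, 0.2):
    print(f"{fraction:>5.0%} penalized -> CPI {effective_cpi(1, fraction, 24):.2f}")
```

Even a small penalized fraction dominates the average: at 5% the 24-cycle instructions already contribute more cycles than all the others combined.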
