## Exercise: Use LIKWID to Count FLOPs

First, let's check that LIKWID is working. The following cells should run without errors and print the performance groups that LIKWID supports on this machine.

In [1]:
using LIKWID

In [2]:
PerfMon.supported_groups()

Dict{String, LIKWID.GroupInfoCompact} with 18 entries:
  "L2CACHE"  => L2CACHE => L2 cache miss rate/ratio (experimental)
  "MEM2"     => MEM2 => Main memory bandwidth in MBytes/s (channels 4-7)
  "NUMA"     => NUMA => L2 cache bandwidth in MBytes/s (experimental)
  "BRANCH"   => BRANCH => Branch prediction miss rate/ratio
  "FLOPS_SP" => FLOPS_SP => Single Precision MFLOP/s
  "DIVIDE"   => DIVIDE => Divide unit information
  "CPI"      => CPI => Cycles per instruction
  "L2"       => L2 => L2 cache bandwidth in MBytes/s (experimental)
  "L3"       => L3 => L3 cache bandwidth in MBytes/s
  "L3CACHE"  => L3CACHE => L3 cache miss rate/ratio (experimental)
  "CACHE"    => CACHE => Data cache miss rate/ratio
  "ICACHE"   => ICACHE => Instruction cache miss rate/ratio
  "TLB"      => TLB => TLB miss rate/ratio
  "CLOCK"    => CLOCK => Cycles per instruction
  "FLOPS_DP" => FLOPS_DP => Double Precision MFLOP/s
  "ENERGY"   => ENERGY => Power and Energy consumption
  "MEM1"     => MEM1 => Mai

Great, you're set up!

**You can find the instructions for this exercise/tutorial here:**   
https://juliaperf.github.io/LIKWID.jl/dev/tutorials/counting_flops/

In [3]:
# ...Your code goes here...

In [4]:
# In-place daxpy: z = a * x + y, computed elementwise via broadcasting
daxpy!(z, a, x, y) = z .= a .* x .+ y

const N = 10_000
const a = 3.141
const x = rand(N)
const y = rand(N)
const z = zeros(N)

# Call once up front so that JIT compilation is not part of the later measurement
daxpy!(z, a, x, y);

In [5]:
metrics, events = @perfmon "FLOPS_DP" daxpy!(z, a, x, y);


Group: [0m[1mFLOPS_DP[22m
┌───────────────────────────┬──────────┐
│[1m                     Event [0m│[1m Thread 1 [0m│
├───────────────────────────┼──────────┤
│          ACTUAL_CPU_CLOCK │  88296.0 │
│             MAX_CPU_CLOCK │  65097.0 │
│      RETIRED_INSTRUCTIONS │  10412.0 │
│       CPU_CLOCKS_UNHALTED │  39886.0 │
│ RETIRED_SSE_AVX_FLOPS_ALL │  20000.0 │
│                     MERGE │      0.0 │
└───────────────────────────┴──────────┘
┌──────────────────────┬────────────┐
│               Metric │   Thread 1 │
├──────────────────────┼────────────┤
│  Runtime (RDTSC) [s] │    1.04e-5 │
│ Runtime unhalted [s] │ 3.60393e-5 │
│          Clock [MHz] │    3323.11 │
│                  CPI │    3.83077 │
│         DP [MFLOP/s] │    1923.07 │
└──────────────────────┴────────────┘


In [6]:
function count_FLOPs(N)
    a = 3.141
    x = rand(N)
    y = rand(N)
    z = zeros(N)
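    # Measure the kernel with the FLOPS_DP group; suppress the printed tables
    metrics, _ = perfmon(() -> daxpy!(z, a, x, y), "FLOPS_DP"; print=false)
    # Metrics of the first (here: only) measured thread
    flops_per_second = first(metrics["FLOPS_DP"])["DP [MFLOP/s]"] * 1e6
    runtime = first(metrics["FLOPS_DP"])["Runtime (RDTSC) [s]"]
    # Rate times duration gives the total number of FLOPs
    return round(Int, flops_per_second * runtime)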
end

count_FLOPs (generic function with 1 method)
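
As a quick sanity check on the rate-times-runtime approach, the rounded values printed by the `@perfmon` run above (`DP [MFLOP/s]` and `Runtime (RDTSC) [s]`) can be multiplied by hand:

$$
1923.07 \times 10^{6}\ \tfrac{\text{FLOP}}{\text{s}} \times 1.04 \times 10^{-5}\ \text{s} \approx 19\,999.9 \approx 20\,000\ \text{FLOP}
$$

This agrees with the `RETIRED_SSE_AVX_FLOPS_ALL` count of 20000.0 in the event table. The small gap is presumably due to the rounding of the displayed values.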

In [7]:
NFLOPs_expected(N) = 2 * N

NFLOPs_expected (generic function with 1 method)
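
The factor of 2 reflects one multiplication and one addition per element. As a minimal sketch of that reasoning, the loop below tallies the operations by hand. The helper `daxpy_counted!` and the `*_demo` names are hypothetical and exist only for this illustration; they are not part of LIKWID.jl.

```julia
# Hypothetical helper for illustration; not part of LIKWID.jl.
# Scalar-loop daxpy that tallies floating-point operations by hand.
function daxpy_counted!(z, a, x, y)
    nflops = 0
    for i in eachindex(z, x, y)
        z[i] = a * x[i] + y[i]  # one multiplication and one addition
        nflops += 2
    end
    return nflops
end

n_demo = 1_000                               # arbitrary demo size
x_demo, y_demo = rand(n_demo), rand(n_demo)
z_demo = zeros(n_demo)

nflops_demo = daxpy_counted!(z_demo, 3.141, x_demo, y_demo)
@assert nflops_demo == 2 * n_demo            # same as the 2 * N formula
@assert z_demo ≈ 3.141 .* x_demo .+ y_demo   # same result as the broadcast version
```

This counts the arithmetic as written in the source. How a hardware counter attributes vectorized or fused instructions depends on the CPU and the event being measured.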

In [8]:
count_FLOPs(N)

20000

In [9]:
count_FLOPs(2 * N) == NFLOPs_expected(2 * N)

true