[WIP] Auto-generated wrappers + pylikwid features #10

Merged
merged 69 commits into from
Sep 20, 2021

carstenbauer
Member

@carstenbauer carstenbauer commented Sep 5, 2021

In this WIP PR I try to work towards the goal of reaching feature parity with pylikwid (#5).

pylikwid repository / README

  • add Clang.jl auto-generated wrappers of likwid.h (LibLikwid.jl).
    • GPU / nvmon missing
  • Marker API
  • Topology
    • CPU
      • (Improve printing)
    • GPU
  • NUMA
    • (Improve printing)
  • Affinity
    • printing
  • Timer
  • Temperature
  • Power/Energy
  • Configuration
  • Access module
  • Performance monitoring
    • CPU
    • GPU (wrapped but, on the LIKWID side, not fully functional yet)
  • Frequency related function (not mentioned on pylikwid page?)
  • Update Readme
  • Put things into modules(?)

Closes #5
Closes #2

@carstenbauer
Member Author

carstenbauer commented Sep 6, 2021

Disclaimer: this is my first time wrapping a moderately complicated C library (beyond simple function calls etc.) 😄

C struct:

typedef struct {
    uint32_t numHWThreads; /*!< \brief Amount of active HW threads in the system (e.g. in cpuset) */
    uint32_t activeHWThreads; /*!< \brief Amount of HW threads in the system and length of \a threadPool */
    uint32_t numSockets; /*!< \brief Amount of CPU sockets/packages in the system */
    uint32_t numDies; /*!< \brief Amount of CPU dies in the system */
    uint32_t numCoresPerSocket; /*!< \brief Amount of physical cores in one CPU socket/package */
    uint32_t numThreadsPerCore; /*!< \brief Amount of HW threads in one physical CPU core */
    uint32_t numCacheLevels; /*!< \brief Amount of caches for each HW thread and length of \a cacheLevels */
    HWThread* threadPool; /*!< \brief List of all HW thread descriptions */
    CacheLevel*  cacheLevels; /*!< \brief List of all caches in the hierarchy */
    struct treeNode* topologyTree; /*!< \brief Anchor for a tree structure describing the system topology */
} CpuTopology;

Julia struct:

struct CpuTopology
    numHWThreads::UInt32
    activeHWThreads::UInt32
    numSockets::UInt32
    numDies::UInt32
    numCoresPerSocket::UInt32
    numThreadsPerCore::UInt32
    numCacheLevels::UInt32
    threadPool::Ptr{HWThread}
    cacheLevels::Ptr{CacheLevel}
    topologyTree::Ptr{treeNode}
end

Should be correct, no? However, we get incorrect numbers when ccall((:get_cpuTopology, liblikwid), Ptr{CpuTopology}, ()) |> unsafe_load:

julia> LIKWID.init_topology();

julia> LIKWID.get_cpu_topology() # essentially ccall + unsafe_load
Dict{String, Any} with 7 entries:
  "numSockets"        => 2
  "numHWThreads"      => 40
  "numCoresPerSocket" => 1
  "numCacheLevels"    => 20422880
  "activeHWThreads"   => 40
  "numDies"           => 20
  "numThreadsPerCore" => 3

Note that the Python counterpart in pylikwid gets the correct numbers:

>>> d["numSockets"]
2
>>> d["numHWThreads"]
40
>>> d["numCoresPerSocket"]
20
>>> d["numCacheLevels"]
3
>>> d["activeHWThreads"]
40
>>> d["numThreadsPerCore"]
1

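What's going on? @vchuravy could you give this a quick look when you find some time? Once I've figured this one out, the rest should (hopefully) be similar.

Never mind... numDies was only added in the most recent LIKWID version, and on "my" supercomputer we only have LIKWID 5.1.0. The Julia struct therefore has one UInt32 field more than the C struct in the installed library, so every field after numSockets is read one slot too late: numDies shows the real numCoresPerSocket (20), numCoresPerSocket the real numThreadsPerCore (1), numThreadsPerCore the real numCacheLevels (3), and numCacheLevels picks up bytes of the threadPool pointer. (Of course, it is essentially the only change to the entire API and I stumble across it right away 😄)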
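
The field shift can be reproduced without LIKWID installed. The following is a minimal sketch in Python's ctypes (standard library), which plays the same role here as ccall + unsafe_load in Julia. It assumes that the 5.1.0 layout is simply the struct quoted above without numDies, as the comment suggests. The class names CpuTopologyOld and CpuTopologyNew, the helper misread, and the pointer value are made up for illustration; the pointer value was picked so that its low 32 bits match the garbage number in the output above.

```python
import ctypes

U32 = ctypes.c_uint32
PTR = ctypes.c_void_p  # stands in for HWThread*, CacheLevel*, struct treeNode*


class CpuTopologyOld(ctypes.Structure):
    """Assumed layout of the installed library (LIKWID 5.1.0): no numDies."""
    _fields_ = [
        ("numHWThreads", U32),
        ("activeHWThreads", U32),
        ("numSockets", U32),
        ("numCoresPerSocket", U32),
        ("numThreadsPerCore", U32),
        ("numCacheLevels", U32),
        ("threadPool", PTR),
        ("cacheLevels", PTR),
        ("topologyTree", PTR),
    ]


class CpuTopologyNew(ctypes.Structure):
    """Layout of the newer header the wrapper was generated from."""
    _fields_ = [
        ("numHWThreads", U32),
        ("activeHWThreads", U32),
        ("numSockets", U32),
        ("numDies", U32),  # the one extra field
        ("numCoresPerSocket", U32),
        ("numThreadsPerCore", U32),
        ("numCacheLevels", U32),
        ("threadPool", PTR),
        ("cacheLevels", PTR),
        ("topologyTree", PTR),
    ]


def misread(old):
    """Reinterpret the bytes of an old-layout struct using the new layout."""
    # Pad with zeros because the new struct is larger than the old one.
    raw = bytes(old).ljust(ctypes.sizeof(CpuTopologyNew), b"\0")
    return CpuTopologyNew.from_buffer_copy(raw)


# Values the library actually holds (taken from the pylikwid output).
actual = CpuTopologyOld(
    numHWThreads=40, activeHWThreads=40, numSockets=2,
    numCoresPerSocket=20, numThreadsPerCore=1, numCacheLevels=3,
    threadPool=20422880,  # hypothetical address, not a real pointer
)

seen = misread(actual)
for name, _ in CpuTopologyNew._fields_[:7]:
    print(f"{name:18} {getattr(seen, name)}")
```

On a little-endian machine the loop prints the same seven values as the faulty Julia output: the first three fields are correct, and everything from numDies onward shows the value of its successor in the old layout.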

@carstenbauer
Member Author

carstenbauer commented Sep 6, 2021

Note for future self, current output:

julia> LIKWID.get_cpu_info()
LIKWID.CpuInfo
├ family: 6
├ model: 85
├ stepping: 4
├ vendor: 0
├ part: 0
├ clock: 0
├ turbo: true
├ osname: Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
├ name: Intel Skylake SP processor
├ short_name: skylakeX
├ features: FP ACPI MMX SSE SSE2 HTT TM RDTSCP MONITOR VMX EIST TM2 SSSE FMA SSE4.1 SSE4.2 AES AVX RDRAND HLE AVX2 RTM AVX512 RDSEED SSE3 
├ isIntel: true
├ architecture: x86_64
├ supportUncore: false
├ supportClientmem: false
├ featureFlags: 4328456191
├ perf_version: 4
├ perf_num_ctr: 8
├ perf_width_ctr: 48
└ perf_num_fixed_ctr: 3

julia> LIKWID.get_cpu_topology()
LIKWID.CpuTopology
├ numHWThreads: 40
├ activeHWThreads: 40
├ numSockets: 2
├ numCoresPerSocket: 20
├ numThreadsPerCore: 1
├ numCacheLevels: 3
├ threadPool: ... (40 elements)
└ cacheLevels: ... (3 elements)

julia> LIKWID.get_numa_topology().nodes[2]
LIKWID.NumaNode
├ id: 1
├ totalMemory: 94.48 GB
├ freeMemory: 85.26 GB
├ numberOfProcessors: 20
├ processors: [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
├ numberOfDistances: 2
└ distances: [21, 10]
  • (Dict-like printing alignment)

@codecov-commenter

codecov-commenter commented Sep 7, 2021

Codecov Report

Merging #10 (8465c05) into main (09e3807) will decrease coverage by 5.62%.
The diff coverage is 43.67%.


@@            Coverage Diff             @@
##             main      #10      +/-   ##
==========================================
- Coverage   50.00%   44.37%   -5.63%     
==========================================
  Files           2       19      +17     
  Lines          22     1386    +1364     
==========================================
+ Hits           11      615     +604     
- Misses         11      771     +760     
Impacted Files           Coverage Δ
src/LIKWID.jl            0.00% <0.00%> (-62.50%) ⬇️
src/frequency.jl         0.00% <0.00%> (ø)
src/marker_gpu.jl        0.00% <0.00%> (ø)
src/nvmon.jl             0.00% <0.00%> (ø)
src/topology_gpu.jl      0.00% <0.00%> (ø)
src/LibLikwid.jl         26.81% <26.81%> (ø)
src/types.jl             32.50% <32.50%> (ø)
src/power.jl             60.00% <60.00%> (ø)
src/affinity.jl          75.67% <75.67%> (ø)
src/misc.jl              77.96% <77.96%> (ø)
... and 27 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 09e3807...8465c05.

@carstenbauer
Member Author

@vchuravy I'd like to merge this to then set up documentation (Documenter.jl) directly on master. Any objections?

@carstenbauer carstenbauer mentioned this pull request Sep 20, 2021
@carstenbauer
Member Author

carstenbauer commented Sep 20, 2021

Given that this PR is a strict (significant) improvement, I'll take the liberty of merging it so that I can continue to work on things. Let's discuss / improve (and, if necessary, revert) things in separate issues / PRs. (I won't tag a new release soon.)

@carstenbauer carstenbauer merged commit 6525277 into main Sep 20, 2021
@carstenbauer carstenbauer deleted the cb/wrappers branch September 20, 2021 10:57
Successfully merging this pull request may close these issues.

  • Implement features of pylikwid
  • Marker API for NVIDIA GPUs
2 participants