/*
* Copyright © 2009 CNRS
* Copyright © 2009-2023 Inria. All rights reserved.
* Copyright © 2009-2013 Université Bordeaux
* Copyright © 2009-2020 Cisco Systems, Inc. All rights reserved.
* Copyright © 2020 Hewlett Packard Enterprise. All rights reserved.
* See COPYING in top-level directory.
*/
/*! \mainpage Hardware Locality
<h1 class="sub">Portable abstraction of hierarchical architectures for high-performance computing</h1>
<hr>
<br>
\htmlonly
<div class="section" id="toc">
\endhtmlonly
\section toc Table of Contents
<ul>
<li> Introduction
<ul>
<li> \ref overview
<li> \ref cli_examples
<li> \ref interface
<li> \ref bugs
<li> \ref history
</ul>
<li> Chapters
<ul>
<li> \ref installation
<li> \ref termsanddefs
<li> \ref tools
<li> \ref envvar
<li> \ref cpu_mem_bind
<li> \ref iodevices
<li> \ref miscobjs
<li> \ref attributes
<li> \ref topoattrs
<li> \ref xml
<li> \ref synthetic
<li> \ref interoperability
<li> \ref threadsafety
<li> \ref plugins
<li> \ref embed
<li> \ref faq
<li> \ref upgrade_to_api_2x
</ul>
</ul>
\htmlonly
This page contains all the <i>Introduction</i> sections.
Chapters are also available from the <i>Related Pages</i> tab above.
\endhtmlonly
\htmlonly
</div><div class="section" id="overview">
\endhtmlonly
\section overview hwloc Overview
The Hardware Locality (hwloc) software project aims at easing the process
of discovering hardware resources in parallel architectures.
It offers command-line tools and a C API for consulting these
resources, their locality, attributes, and interconnection.
hwloc primarily aims at helping high-performance computing (HPC)
applications, but is also applicable to any project seeking to exploit
code and/or data locality on modern computing platforms.
hwloc provides command line tools and a C API to obtain the
hierarchical map of key computing elements within a node, such as: NUMA memory
nodes, shared caches, processor packages, dies and cores,
processing units (logical processors or "threads")
and even I/O devices.
hwloc also gathers various attributes such as
cache and memory information, and is portable across a variety of
different operating systems and platforms.
hwloc supports the following operating systems:
<ul>
<li>Linux (with knowledge of cgroups and cpusets, memory targets/initiators, etc.)
on all supported hardware, including Intel Xeon Phi, ScaleMP vSMP,
and NumaScale NumaConnect.</li>
<li>Solaris (with support for processor sets and logical domains)</li>
<li>AIX</li>
<li>Darwin / OS X</li>
<li>FreeBSD and its variants (such as kFreeBSD/GNU)</li>
<li>NetBSD</li>
<li>HP-UX</li>
<li>Microsoft Windows</li>
</ul>
Since it uses standard Operating System information, hwloc's support is mostly
independent of the processor type (x86, powerpc, ...) and just relies on the
Operating System support. The main exception is BSD operating systems (NetBSD, FreeBSD, etc.)
because they do not provide topology information, hence hwloc uses an x86-only CPUID-based
backend (which can be used for other OSes too, see the \ref plugins section).
To check whether hwloc works on a particular machine, just try to build it
and run <tt>lstopo</tt> or <tt>lstopo-no-graphics</tt>. If some things do not look right
(e.g. bogus or missing cache information), see \ref bugs.
hwloc only reports the number of processors on unsupported operating
systems; no topology information is available.
For development and debugging purposes, hwloc also offers the ability to
work on "fake" topologies:
<ul>
<li> Symmetrical tree of resources generated from a list of level arities,
see \ref synthetic.</li>
<li> Remote machine simulation through the gathering of topology as XML files,
see \ref xml.</li>
</ul>
hwloc can display the topology in a human-readable format, either in
graphical mode (X11), or by exporting in one of several different
formats, including: plain text, LaTeX tikzpicture, PDF, PNG, and FIG (see \ref cli_examples
below). Note that some of the export formats require additional
support libraries.
hwloc offers a programming interface for manipulating topologies and
objects. It also brings a powerful CPU bitmap API that is used to
describe the location of topology objects on physical/logical processors. See
the \ref interface below. It may also be used to bind applications
onto certain cores or memory nodes. Several utility programs are also
provided to ease command-line manipulation of topology objects,
binding of processes, and so on.
Bindings for several other languages are available from the
<a href="https://www.open-mpi.org/projects/hwloc/#language_bindings">project website</a>.
\htmlonly
</div><div class="section" id="cli_examples">
\endhtmlonly
\section cli_examples Command-line Examples
On a 4-package 2-core machine with hyper-threading, the \c lstopo tool
may show the following graphical output:
\image html dudley.png
\image latex dudley.png "" width=\textwidth
Here's the equivalent output in textual form:
\verbatim
Machine
  NUMANode L#0 (P#0)
  Package L#0 + L3 L#0 (4096KB)
    L2 L#0 (1024KB) + L1 L#0 (16KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (1024KB) + L1 L#1 (16KB) + Core L#1
      PU L#2 (P#4)
      PU L#3 (P#12)
  Package L#1 + L3 L#1 (4096KB)
    L2 L#2 (1024KB) + L1 L#2 (16KB) + Core L#2
      PU L#4 (P#1)
      PU L#5 (P#9)
    L2 L#3 (1024KB) + L1 L#3 (16KB) + Core L#3
      PU L#6 (P#5)
      PU L#7 (P#13)
  Package L#2 + L3 L#2 (4096KB)
    L2 L#4 (1024KB) + L1 L#4 (16KB) + Core L#4
      PU L#8 (P#2)
      PU L#9 (P#10)
    L2 L#5 (1024KB) + L1 L#5 (16KB) + Core L#5
      PU L#10 (P#6)
      PU L#11 (P#14)
  Package L#3 + L3 L#3 (4096KB)
    L2 L#6 (1024KB) + L1 L#6 (16KB) + Core L#6
      PU L#12 (P#3)
      PU L#13 (P#11)
    L2 L#7 (1024KB) + L1 L#7 (16KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#15)
\endverbatim
Note that there is also an equivalent output in XML that is meant for
exporting/importing topologies, but it is hardly readable by humans
(see \ref xml for details).
On a 4-package 2-core Opteron NUMA machine
(with two cores disallowed by the administrator),
the \c lstopo tool may show the following graphical output
(with <tt>\--disallowed</tt> for displaying disallowed objects):
\image html hagrid.png
\image latex hagrid.png "" width=\textwidth
Here's the equivalent output in textual form:
\verbatim
Machine (32GB total)
  Package L#0
    NUMANode L#0 (P#0 8190MB)
    L2 L#0 (1024KB) + L1 L#0 (64KB) + Core L#0 + PU L#0 (P#0)
    L2 L#1 (1024KB) + L1 L#1 (64KB) + Core L#1 + PU L#1 (P#1)
  Package L#1
    NUMANode L#1 (P#1 8192MB)
    L2 L#2 (1024KB) + L1 L#2 (64KB) + Core L#2 + PU L#2 (P#2)
    L2 L#3 (1024KB) + L1 L#3 (64KB) + Core L#3 + PU L#3 (P#3)
  Package L#2
    NUMANode L#2 (P#2 8192MB)
    L2 L#4 (1024KB) + L1 L#4 (64KB) + Core L#4 + PU L#4 (P#4)
    L2 L#5 (1024KB) + L1 L#5 (64KB) + Core L#5 + PU L#5 (P#5)
  Package L#3
    NUMANode L#3 (P#3 8192MB)
    L2 L#6 (1024KB) + L1 L#6 (64KB) + Core L#6 + PU L#6 (P#6)
    L2 L#7 (1024KB) + L1 L#7 (64KB) + Core L#7 + PU L#7 (P#7)
\endverbatim
On a 2-package quad-core Xeon (pre-Nehalem, with 2 dual-core dies in
each package):
\image html emmett.png
\image latex emmett.png "" width=\textwidth
Here's the same output in textual form:
\verbatim
Machine (total 16GB)
  NUMANode L#0 (P#0 16GB)
  Package L#0
    L2 L#0 (4096KB)
      L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L1 L#1 (32KB) + Core L#1 + PU L#1 (P#4)
    L2 L#1 (4096KB)
      L1 L#2 (32KB) + Core L#2 + PU L#2 (P#2)
      L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
  Package L#1
    L2 L#2 (4096KB)
      L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
      L1 L#5 (32KB) + Core L#5 + PU L#5 (P#5)
    L2 L#3 (4096KB)
      L1 L#6 (32KB) + Core L#6 + PU L#6 (P#3)
      L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)
\endverbatim
\htmlonly
</div><div class="section" id="interface">
\endhtmlonly
\section interface Programming Interface
The basic interface is available in hwloc.h.
Some higher-level functions are available in hwloc/helper.h to reduce
the need to manually manipulate objects and follow links between them.
Documentation for all these is provided later in this document.
Developers may also want to look at hwloc/inlines.h which contains the
actual inline code of some hwloc.h routines, and at this document,
which provides good higher-level topology traversal examples.
To precisely define the vocabulary used by hwloc, a \ref termsanddefs
section is available and should probably be read first.
Each hwloc object contains a cpuset describing the list of processing
units that it contains. These bitmaps may be used for
\ref hwlocality_cpubinding and \ref hwlocality_membinding.
hwloc offers an extensive
bitmap manipulation interface in hwloc/bitmap.h.
Moreover, hwloc also comes with additional helpers for
interoperability with several commonly used environments.
See the \ref interoperability section for details.
The complete API documentation is available in a full set of HTML
pages, man pages, and self-contained PDF files (formatted for both
US letter and A4 formats) in the source tarball in
doc/doxygen-doc/.
<strong>NOTE:</strong> If you are building the documentation from a
Git clone, you will need to have Doxygen and pdflatex
installed -- the documentation will be built during the normal "make"
process. The documentation is installed during "make install" to
$prefix/share/doc/hwloc/ and your system's default man page tree (under
$prefix, of course).
\subsection portability Portability
Operating Systems have varying support for CPU and memory binding:
some provide interfaces for all kinds of CPU and memory binding,
others provide interfaces for only a limited number of kinds,
and some do not provide any binding interface
at all. In such cases, hwloc's binding functions simply return the ENOSYS error
(Function not implemented), meaning that the underlying Operating System
does not provide any interface for them. \ref hwlocality_cpubinding and
\ref hwlocality_membinding provide more information on which hwloc binding functions
should be preferred because interfaces for them are usually available on the
supported Operating Systems.
Similarly, the ability to report topology information varies from
one platform to another.
As shown in \ref cli_examples, hwloc can obtain information on a wide
variety of hardware topologies. However, some platforms and/or
operating system versions will only report a subset of this
information. For example, on a PPC64-based system with 8 cores
(each with 2 hardware threads) running a default 2.6.18-based kernel
from RHEL 5.4, hwloc is only able to glean information about NUMA
nodes and processor units (PUs). No information about caches,
packages, or cores is available.
Here's the graphical output from lstopo on this platform when
Simultaneous Multi-Threading (SMT) is enabled:
\image html ppc64-with-smt.png
\image latex ppc64-with-smt.png "" width=\textwidth
And here's the graphical output from lstopo on this platform when SMT is
disabled:
\image html ppc64-without-smt.png
\image latex ppc64-without-smt.png "" width=.5\textwidth
Notice that hwloc only sees half the PUs when SMT is disabled.
PU L#6, for example, seems to change location from NUMA node #0 to #1.
In reality, no PUs "moved" -- they were simply re-numbered when hwloc
only saw half as many (see also Logical index in \ref termsanddefs_indexes).
Hence, PU L#6 in the SMT-disabled picture probably corresponds to
PU L#12 in the SMT-enabled picture.
This same "PUs have disappeared" effect can be seen on other platforms
-- even platforms / OSs that provide much more information than the
above PPC64 system. This is an unfortunate side-effect of how
operating systems report information to hwloc.
Note that after upgrading the Linux kernel on the same PPC64 system
mentioned above to 2.6.34, hwloc is able to discover all the topology
information. The following picture shows the entire topology layout
when SMT is enabled:
\image html ppc64-full-with-smt.png
\image latex ppc64-full-with-smt.png "" width=\textwidth
Developers using the hwloc API or XML output for portable applications
should therefore be extremely careful not to make any assumptions
about the structure of data that is returned. For example, per the
above reported PPC topology, it is not safe to assume that PUs will
always be descendants of cores.
Additionally, future hardware may insert new topology elements that
are not available in this version of hwloc. Long-lived applications
that are meant to span multiple different hardware platforms should
also be careful about making structure assumptions. For example,
a new element may someday exist between a core and a PU.
\subsection interface_example API Example
The following small C example (available in the source tree as <tt>doc/examples/hwloc-hello.c</tt>)
prints the topology of the machine and performs some thread and memory binding.
More examples are available in the doc/examples/ directory of the source
tree.
\include examples/hwloc-hello.c
hwloc installs a \c pkg-config file that provides the relevant compiler
and linker flags. For example, it can be used as follows to compile
applications that use the hwloc library (assuming GNU Make):
\verbatim
CFLAGS += $(shell pkg-config --cflags hwloc)
LDLIBS += $(shell pkg-config --libs hwloc)
hwloc-hello: hwloc-hello.c
	$(CC) hwloc-hello.c $(CFLAGS) -o hwloc-hello $(LDLIBS)
\endverbatim
On a machine with 2 processor packages -- each of
which has two processing cores -- the output from running \c
hwloc-hello could be something like the following:
\verbatim
shell$ ./hwloc-hello
*** Objects at level 0
Index 0: Machine
*** Objects at level 1
Index 0: Package#0
Index 1: Package#1
*** Objects at level 2
Index 0: Core#0
Index 1: Core#1
Index 2: Core#3
Index 3: Core#2
*** Objects at level 3
Index 0: PU#0
Index 1: PU#1
Index 2: PU#2
Index 3: PU#3
*** Printing overall tree
Machine
  Package#0
    Core#0
      PU#0
    Core#1
      PU#1
  Package#1
    Core#3
      PU#2
    Core#2
      PU#3
*** 2 package(s)
*** Logical processor 0 has 0 caches totaling 0KB
shell$
\endverbatim
\htmlonly
</div><div class="section" id="bugs">
\endhtmlonly
\section bugs Questions and Bugs
Bugs should be reported in the tracker
(https://github.com/open-mpi/hwloc/issues).
Opening a new issue automatically displays lots of hints about
how to debug and report issues.
Questions may be sent to the users or developers mailing lists
(https://www.open-mpi.org/community/lists/hwloc.php).
There is also a <tt>\#hwloc</tt> IRC channel on Libera Chat (<tt>irc.libera.chat</tt>).
\htmlonly
</div><div class="section" id="history">
\endhtmlonly
\section history History / Credits
hwloc is the evolution and merger of the libtopology project and the Portable
Linux Processor Affinity (PLPA) (https://www.open-mpi.org/projects/plpa/)
project. Because of functional and ideological overlap, these two code bases
and ideas were merged and released under the name "hwloc" as an Open MPI
sub-project.
libtopology was initially developed by the Inria Runtime Team-Project.
PLPA was initially developed by
the Open MPI development team as a sub-project. Both are now deprecated
in favor of hwloc, which is distributed as an Open MPI sub-project.
\htmlonly
</div>
\endhtmlonly
\page installation Installation
hwloc (https://www.open-mpi.org/projects/hwloc/) is available under the
BSD license. It is hosted as a sub-project of the overall Open MPI
project (https://www.open-mpi.org/). Note that hwloc does not require
any functionality from Open MPI -- it is a wholly separate (and much
smaller!) project and code base. It just happens to be hosted as part
of the overall Open MPI project.
\htmlonly
</div><div class="section" id="basic_installation">
\endhtmlonly
\section basic_installation Basic Installation
Installation is the fairly common GNU-based process:
\verbatim
shell$ ./configure --prefix=...
shell$ make
shell$ make install
\endverbatim
The hwloc command-line tool "lstopo" produces human-readable topology
maps, as mentioned above.
Running the "lstopo" tool is a good way to check graphically
whether hwloc properly detected the architecture of your node.
\htmlonly
</div><div class="section" id="optional_dependencies">
\endhtmlonly
\section optional_dependencies Optional Dependencies
lstopo may also export graphics to the SVG and "fig" file formats.
Support for PDF, Postscript, and PNG exporting is provided in "lstopo" if
the "Cairo" development package (usually <tt>cairo-devel</tt> or
<tt>libcairo2-dev</tt>) can be found when hwloc
is configured and built.
<br>
The hwloc core may also benefit from the following development packages:
<ul>
<li>libpciaccess for full I/O device discovery
(<tt>libpciaccess-devel</tt> or <tt>libpciaccess-dev</tt> package).
On Linux, PCI discovery may still be performed (without vendor/device names)
even if libpciaccess cannot be used.
</li>
<li>AMD or NVIDIA OpenCL implementations for OpenCL device discovery.
</li>
<li>the NVIDIA CUDA Toolkit for CUDA device discovery.
See \ref faq_cuda_build.
</li>
<li>the NVIDIA Management Library (NVML) for NVML device discovery.
It is included in CUDA since version 8.0.
Older NVML releases were available within the NVIDIA GPU Deployment Kit
from https://developer.nvidia.com/gpu-deployment-kit .
</li>
<li>the NV-CONTROL X extension library (NVCtrl) for NVIDIA display discovery.
The relevant development package is usually <tt>libXNVCtrl-devel</tt>
or <tt>libxnvctrl-dev</tt>.
It is also available within nvidia-settings from ftp://download.nvidia.com/XFree86/nvidia-settings/
and https://github.com/NVIDIA/nvidia-settings/ .
</li>
<li>the AMD ROCm SMI library for RSMI device discovery.
The relevant development package is usually <tt>rocm-smi-lib64</tt>
or <tt>librocm-smi-dev</tt>.
See \ref faq_rocm_build.
</li>
<li>the oneAPI Level Zero library.
The relevant development package is usually <tt>level-zero-dev</tt>
or <tt>level-zero-devel</tt>.
</li>
<li>libxml2 for full XML import/export support (otherwise, the
internal minimalistic parser will only be able to import
XML files that were exported by the same hwloc release).
See \ref xml for details.
The relevant development package is usually <tt>libxml2-devel</tt>
or <tt>libxml2-dev</tt>.
</li>
<li>libudev on Linux for easier discovery of OS device information
(otherwise hwloc will try to manually parse udev raw files).
The relevant development package is usually <tt>libudev-devel</tt>
or <tt>libudev-dev</tt>.
</li>
<li>libtool's ltdl library for dynamic plugin loading if the native dlopen cannot be used.
The relevant development package is usually <tt>libtool-ltdl-devel</tt>
or <tt>libltdl-dev</tt>.
</li>
</ul>
PCI and XML support may be statically built inside the main hwloc
library, or as separate dynamically-loaded plugins (see the
\ref plugins section).
Also note that if you install supplemental libraries in non-standard
locations, hwloc's configure script may not be able to find them
without some help. You may need to specify additional CPPFLAGS,
LDFLAGS, or PKG_CONFIG_PATH values on the configure command line.
For example, if libpciaccess was installed into /opt/pciaccess,
hwloc's configure script may not find it by default. Try adding
PKG_CONFIG_PATH to the ./configure command line, like this:
\verbatim
./configure PKG_CONFIG_PATH=/opt/pciaccess/lib/pkgconfig ...
\endverbatim
Note that because of the possibility of GPL taint, the
<tt>pciutils</tt> library <tt>libpci</tt> will not be used (remember
that hwloc is BSD-licensed).
\htmlonly
</div><div class="section" id="gitclone_installation">
\endhtmlonly
\section gitclone_installation Installing from a Git clone
Additionally, the code can be directly cloned from Git:
\verbatim
shell$ git clone https://github.com/open-mpi/hwloc.git
shell$ cd hwloc
shell$ ./autogen.sh
\endverbatim
Note that GNU Autoconf >=2.63, Automake >=1.11 and Libtool >=2.2.6 are
required when building from a Git clone.
Nightly development snapshots are available on the web site;
they can be configured and built without any need for Git
or GNU Autotools.
\page termsanddefs Terms and Definitions
\htmlonly
<div class="section" id="termsanddefs_objects">
\endhtmlonly
\section termsanddefs_objects Objects
<dl>
<dt>Object</dt>
<dd>An interesting part of the system, such as a Core, an L2Cache,
a NUMA memory node, etc. The different types detected by hwloc are
detailed in the ::hwloc_obj_type_t enumeration.
There are four kinds of Objects: Memory (NUMA nodes and Memory-side caches), I/O (Bridges, PCI and OS devices),
Misc, and Normal (everything else, including Machine, Package, Die, Core, PU, CPU Caches, etc.).
Normal and Memory objects have (non-NULL) CPU sets and nodesets, while I/O and Misc don't.
Objects are topologically sorted by locality (CPU and node sets)
into a tree (see \ref termsanddefs_tree).
</dd>
<dt>Processing Unit (PU)</dt>
<dd>The smallest processing element that can be represented by a hwloc
object. It may be a single-core processor, a core of a multicore
processor, or a single thread in an SMT processor
(also sometimes called "Logical processor",
not to be confused with "Logical index of a processor").
hwloc's PU acronym stands for Processing Unit.
</dd>
<dt>Package</dt>
<dd>A processor Package is the physical package that usually gets
inserted into a socket on the motherboard.
It is also often called a physical processor or a CPU even if these
names bring confusion with respect to cores and processing units.
A processor package usually contains multiple cores
(and may also be composed of multiple dies).
hwloc Package objects were called Sockets up to hwloc 1.10.
</dd>
<dt>NUMA Node</dt>
<dd>
An object that contains memory that is directly and byte-accessible
to the host processors.
It is usually close to some cores as specified by its CPU set.
Hence it is attached as a memory child of the object that groups
those cores together, for instance a Package object with 4 Core children
(see \ref termsanddefs_tree).
</dd>
<dt>Memory-side Cache</dt>
<dd>
A cache in front of a specific memory region (e.g. a range of physical addresses).
It caches all accesses to that region without caring about which core issued the request.
This is the opposite of usual CPU caches where only accesses from the local cores
are cached, without caring about the target memory.
In hwloc, memory-side caches are memory objects placed between their local CPU objects
(parent) and the target NUMA node memory (child).
</dd>
</dl>
\htmlonly
</div><div class="section" id="termsanddefs_indexes">
\endhtmlonly
\section termsanddefs_indexes Indexes and Sets
<dl>
<dt>OS or physical index</dt>
<dd>The index that the operating system (OS) uses to identify the
object. This may be completely arbitrary, non-unique, non-contiguous, not
representative of logical proximity, and may depend on the BIOS
configuration. That is why hwloc almost never uses them, only in the default
lstopo output (<tt>P\#x</tt>) and cpuset masks.
See also \ref faq_indexes.</dd>
<dt>Logical index</dt>
<dd>Index to uniquely identify objects of the same type and depth,
automatically computed by hwloc according to the topology. It expresses
logical proximity in a generic way, i.e. objects which have adjacent logical
indexes are adjacent in the topology. That is why hwloc almost always uses
it in its API. Logical indexes can be shown (as
<tt>L\#x</tt>) by <tt>lstopo</tt> thanks to the <tt>-l</tt> option. This index
is always linear and in
the range [0, num_objs_same_type_same_level-1]. Think of it as "cousin
rank". The ordering is based on topology first, and then on OS CPU numbers,
so it is stable across everything except firmware CPU renumbering.
"Logical index" should not be confused with "Logical processor". A "Logical
processor" (which in hwloc we rather call "processing unit" to avoid the
confusion) has both a physical index (as chosen arbitrarily by BIOS/OS) and a logical
index (as computed according to logical proximity by hwloc).
See also \ref faq_indexes.</dd>
<dt>CPU set</dt>
<dd>The set of processing units (PU) logically included in an object
(if it makes sense). They are always expressed using physical
processor numbers (as announced by the OS). They are implemented as the
::hwloc_bitmap_t opaque structure. hwloc CPU sets are just masks; they
do \em not have any relation with an actual operating system binding notion such as
Linux cpusets.
I/O and Misc objects do not have CPU sets while all Normal and Memory objects have non-NULL CPU sets.</dd>
<dt>Node set</dt>
<dd>The set of NUMA memory nodes logically included in an object
(if it makes sense). They are always expressed using physical node
numbers (as announced by the OS). They are implemented with the
::hwloc_bitmap_t opaque structure.
I/O and Misc objects do not have Node sets while all Normal and Memory objects have non-NULL nodesets.</dd>
<dt>Bitmap</dt>
<dd>A possibly-infinite set of bits used for describing sets of objects
such as CPUs (CPU sets) or memory nodes (Node sets). They are implemented
with the ::hwloc_bitmap_t opaque structure.
</dd>
</dl>
\htmlonly
</div><div class="section" id="termsanddefs_tree">
\endhtmlonly
\section termsanddefs_tree Hierarchy, Tree and Levels
<dl>
<dt>Parent object</dt>
<dd>The object logically containing the current object, for example
because its CPU set includes the CPU set of the current object.
All objects have a non-NULL parent, except the root of the topology (Machine object).
</dd>
<dt>Ancestor object</dt>
<dd>The parent object, or its own parent, and so on.</dd>
<dt>Children object(s)</dt>
<dd>The object (or objects) contained in the current object because
their CPU set is included in the CPU set of the current object.
Each object may also contain separated lists for Memory, I/O and Misc object children.
</dd>
<dt>Arity</dt>
<dd>The number of normal children of an object.
There are also specific arities for Memory, I/O and Misc children.
</dd>
<dt>Sibling objects</dt>
<dd>Objects in the same children list: they are all normal
children of the same parent, or all Memory children of
the same parent, or all I/O children, or all Misc children.
They usually have the same type (and hence are cousins, as well),
but they may not if the topology is asymmetric.
</dd>
<dt>Sibling rank</dt>
<dd>Index that uniquely identifies objects which have
the same parent; it is always in the range [0, arity-1]
(respectively memory_arity, io_arity or misc_arity for Memory, I/O
and Misc children of a parent).</dd>
<dt>Cousin objects</dt>
<dd>Objects of the same type (and depth) as the current object,
even if they do not have the same parent.</dd>
<dt>Level</dt>
<dd>Set of objects of the same type and depth. All these objects
are cousins.
Memory, I/O and Misc objects also have their own specific levels and (virtual) depth.
</dd>
<dt>Depth</dt>
<dd>Nesting level in the object tree, starting from the root object.
If the topology is symmetric, the depth of a child is equal to the
parent depth plus one, and an object depth is also equal to the number
of parent/child links between the root object and the given object.
If the topology is asymmetric, the difference between some parent
and child depths may be larger than one when some intermediate levels
(for instance groups) are missing in only some parts of the machine.
The depth of the Machine object is always 0 since it is always the
root of the topology.
The depth of PU objects is equal to the number of levels in the topology
minus one.
Memory, I/O and Misc objects also have their own specific levels and depth.
</dd>
</dl>
The following diagram can help to understand the vocabulary of the relationships
by showing the example of a machine with two dual core packages (with no
hardware threads); thus, a topology with 5 levels. Each box with rounded corners
corresponds to one ::hwloc_obj_t, containing the values of the different integer
fields (depth, logical_index, etc.), and arrows show which other ::hwloc_obj_t
its pointers (first_child, parent, etc.) point to.
The topology always starts with a Machine object as root (depth 0)
and ends with PU objects at the bottom (depth 4 here).
Objects of the same level (cousins) are listed in red boxes and linked
with red arrows.
Children of the same parent (siblings) are linked with blue arrows.
The L2 cache of the last core is intentionally missing to show how asymmetric topologies are handled.
See \ref faq_asymmetric for more information about such strange topologies.
\image html diagram.png
\image latex diagram.eps "" width=\textwidth
It should be noted that for PU objects, the logical index -- as
computed linearly by hwloc -- is not the same as the OS index.
The NUMA node is on the side because it is not part of the main tree
but rather attached to the object that corresponds to its locality
(the entire machine here, hence the root object).
It is attached as a <i>Memory</i> child (in green) and has a virtual depth (negative).
It could also have siblings if there were multiple local NUMA nodes,
or cousins if other NUMA nodes were attached somewhere else in the machine.
I/O or Misc objects could be attached in a similar manner.
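To make the depth and sibling rank definitions concrete, the following is a
minimal sketch of the asymmetric part of the example machine, written with a
hypothetical <tt>toy_obj</tt> structure rather than the real hwloc object
structure. It assumes that the five levels are ordered Machine, Package, L2,
Core and PU, and it only models the root, the second package and its children.
All names starting with <tt>toy_</tt> are illustrative and are not part of the
hwloc API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object for illustration; this is NOT hwloc's object structure. */
struct toy_obj {
    const char *type;       /* object type name */
    int depth;              /* level index, shared by all cousins */
    unsigned sibling_rank;  /* index among the children of the parent */
    struct toy_obj *parent; /* NULL for the root */
};

/* Indexes into the array filled by toy_build(). */
enum {
    TOY_MACHINE, TOY_PACKAGE, TOY_L2, TOY_CORE_WITH_L2,
    TOY_CORE_NO_L2, TOY_PU_WITH_L2, TOY_PU_NO_L2, TOY_COUNT
};

static void toy_set(struct toy_obj *o, const char *type, int depth,
                    unsigned sibling_rank, struct toy_obj *parent)
{
    o->type = type;
    o->depth = depth;
    o->sibling_rank = sibling_rank;
    o->parent = parent;
}

/* Fill objs[] with the root and the second package of the example machine. */
static void toy_build(struct toy_obj objs[TOY_COUNT])
{
    toy_set(&objs[TOY_MACHINE], "Machine", 0, 0, NULL);
    toy_set(&objs[TOY_PACKAGE], "Package", 1, 1, &objs[TOY_MACHINE]);
    toy_set(&objs[TOY_L2], "L2", 2, 0, &objs[TOY_PACKAGE]);
    toy_set(&objs[TOY_CORE_WITH_L2], "Core", 3, 0, &objs[TOY_L2]);
    /* The last core has no L2: it is a direct child of the package,
     * hence a sibling of the L2 cache of the other core. */
    toy_set(&objs[TOY_CORE_NO_L2], "Core", 3, 1, &objs[TOY_PACKAGE]);
    toy_set(&objs[TOY_PU_WITH_L2], "PU", 4, 0, &objs[TOY_CORE_WITH_L2]);
    toy_set(&objs[TOY_PU_NO_L2], "PU", 4, 0, &objs[TOY_CORE_NO_L2]);
}

/* Count the parent/child links between obj and the root. */
static int toy_links_to_root(const struct toy_obj *obj)
{
    int links = 0;
    assert(obj != NULL);
    while (obj->parent != NULL) {
        obj = obj->parent;
        links++;
    }
    return links;
}

/* Difference between the depth of obj and the depth of its parent. */
static int toy_depth_gap(const struct toy_obj *obj)
{
    assert(obj != NULL && obj->parent != NULL);
    return obj->depth - obj->parent->depth;
}
```

In this model, the core that has an L2 cache is three links away from the root
and has depth 3. The core without an L2 cache keeps depth 3, since it is a
cousin of the other cores, but it is only two links away from the root, so
<tt>toy_depth_gap()</tt> returns 2 for it.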
\page tools Command-Line Tools
\htmlonly
<div class="section">
\endhtmlonly
hwloc comes with an extensive C programming interface and several
command line utilities. Each of them is fully documented in its own
manual page; the following is a summary of the available command line
tools.
\htmlonly
</div><div class="section" id="cli_lstopo">
\endhtmlonly
\section cli_lstopo lstopo and lstopo-no-graphics
lstopo (also known as hwloc-ls) displays the
hierarchical topology map of the current system. The output may be
graphical, ascii-art or textual, and can also be exported to numerous file
formats such as PDF, PNG, XML, and others.
Advanced graphical outputs require the "Cairo" development package
(usually <tt>cairo-devel</tt> or <tt>libcairo2-dev</tt>).
lstopo and lstopo-no-graphics accept the same command-line options.
However, graphical outputs are only available in lstopo.
Textual outputs (those that do not depend on heavy external libraries
such as Cairo) are supported in both lstopo and lstopo-no-graphics.
This command can also display the processes currently bound to a part
of the machine (via the <tt>\--ps</tt> option).
Note that lstopo can read XML files and/or alternate chroot
filesystems and display topological maps representing those systems
(e.g., use lstopo to output an XML file on one system, and then use
lstopo to read in that XML file and display it on a different system).
\htmlonly
</div><div class="section" id="cli_hwloc_bind">
\endhtmlonly
\section cli_hwloc_bind hwloc-bind
hwloc-bind binds processes to specific hardware objects through a
flexible syntax. A simple example is binding an executable to
specific cores (or packages or bitmaps or ...). The hwloc-bind(1) man
page provides much more detail on what is possible.
hwloc-bind can also be used to retrieve the current process' binding,
or retrieve the last CPU(s) where a process ran,
or operate on memory binding.
Just like hwloc-calc, the input locations given to hwloc-bind may be
either objects or cpusets (bitmaps as reported by hwloc-calc or hwloc-distrib).
\htmlonly
</div><div class="section" id="cli_hwloc_calc">
\endhtmlonly
\section cli_hwloc_calc hwloc-calc
hwloc-calc is hwloc's Swiss Army Knife command-line tool for converting things.
The input may be either objects or cpusets (bitmaps as reported by another hwloc-calc instance or by hwloc-distrib),
that may be combined by addition, intersection or subtraction.
The output may be expressed as:
<ul>
<li>a cpuset bitmap: This compact opaque representation of objects is useful for shell scripts etc.
It may be passed to hwloc command-line tools such as hwloc-calc or hwloc-bind,
or to hwloc command-line options such as <tt>lstopo \--restrict</tt>.</li>
<li>a nodeset bitmap: Another opaque representation that represents memory locality more precisely,
especially if some NUMA nodes are CPU-less or if multiple NUMA nodes are local to the same CPUs.</li>
<li>the number of equivalent hwloc objects of a specific type, or the list of their indexes.
This is useful for iterating over all similar objects (for instance all cores) within a given
part of a platform.</li>
<li>a hierarchical description of objects,
for instance a thread index within a core within a package.
This gives a better view of the actual location of an object.</li>
</ul>
Moreover, input and/or output may use either physical/OS object
indexes or hwloc's logical object indexes.
This eases cooperation with external tools such as taskset or numactl
by exporting hwloc specifications into lists of processor or NUMA node
physical indexes.
See also \ref faq_indexes.
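The way cpusets combine by addition, intersection or subtraction can be
sketched with plain bit operations. The example below is a simplified model,
not hwloc's bitmap implementation: it assumes a hypothetical 32-bit
<tt>toy_cpuset</tt> type where bit <em>i</em> stands for the PU with index
<em>i</em>, and all <tt>toy_</tt> functions are illustrative names. The
formatting helper produces a hexadecimal string that resembles the bitmap
strings printed by the hwloc tools for small sets.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical cpuset: bit i is set when the PU with index i is in the set. */
typedef uint32_t toy_cpuset;

/* Addition: PUs that are in a or in b. */
static toy_cpuset toy_add(toy_cpuset a, toy_cpuset b)
{
    return a | b;
}

/* Intersection: PUs that are both in a and in b. */
static toy_cpuset toy_intersect(toy_cpuset a, toy_cpuset b)
{
    return a & b;
}

/* Subtraction: PUs that are in a but not in b. */
static toy_cpuset toy_subtract(toy_cpuset a, toy_cpuset b)
{
    return a & ~b;
}

/* Number of PUs in the set. */
static unsigned toy_weight(toy_cpuset set)
{
    unsigned count = 0;
    while (set != 0) {
        set &= set - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Write the set as "0x" followed by 8 hexadecimal digits.
 * Returns the snprintf() result. */
static int toy_format(char *buf, size_t len, toy_cpuset set)
{
    assert(buf != NULL);
    return snprintf(buf, len, "0x%08" PRIx32, set);
}
```

For instance, if one package covers PUs 0-1 (<tt>0x3</tt>) and another covers
PUs 2-3 (<tt>0xc</tt>), adding them yields <tt>0xf</tt>, which
<tt>toy_format()</tt> writes as <tt>0x0000000f</tt>, and subtracting the first
PU from that result yields <tt>0xe</tt>.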
\htmlonly
</div><div class="section" id="cli_hwloc_info">
\endhtmlonly
\section cli_hwloc_info hwloc-info
hwloc-info dumps information about the given objects, as well as all their specific attributes.
It is intended to be used with tools such as grep for filtering
certain attribute lines.
When no object is specified, or when <tt>\--topology</tt> is passed,
hwloc-info prints a summary of the topology.
When <tt>\--support</tt> is passed, hwloc-info lists the supported
features for the topology.
\htmlonly
</div><div class="section" id="cli_hwloc_distrib">
\endhtmlonly
\section cli_hwloc_distrib hwloc-distrib
hwloc-distrib generates a set of cpuset bitmaps that are uniformly
distributed across the machine for the given number of processes.
These strings may be passed to hwloc-bind to launch processes that are
properly distributed across the machine, which helps maximize their
memory bandwidth.
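The idea of a uniform distribution can be illustrated on a flat machine. The
sketch below is a deliberate simplification and not the algorithm used by
hwloc-distrib, which takes the hierarchical topology into account. It assumes
a hypothetical helper <tt>toy_distrib()</tt> that splits PUs numbered from 0
into contiguous, nearly equal chunks and returns the chunk of one process as a
32-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mask of the PUs given to process `rank` when
 * `nb_procs` processes are spread over `nb_pus` PUs numbered 0..nb_pus-1.
 * Requires 1 <= nb_procs <= nb_pus <= 32 and rank < nb_procs. */
static uint32_t toy_distrib(unsigned nb_pus, unsigned nb_procs, unsigned rank)
{
    unsigned first, last, pu;
    uint32_t set = 0;

    assert(nb_procs >= 1 && nb_procs <= nb_pus && nb_pus <= 32);
    assert(rank < nb_procs);

    /* Process `rank` gets the PUs in the range [first, last). */
    first = rank * nb_pus / nb_procs;
    last = (rank + 1) * nb_pus / nb_procs;
    for (pu = first; pu < last; pu++)
        set |= (uint32_t)1 << pu;
    return set;
}
```

With four PUs and two processes, this model returns <tt>0x3</tt> for the first
process and <tt>0xc</tt> for the second one, so that each process gets one half
of the machine.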
\htmlonly
</div><div class="section" id="cli_hwloc_ps">
\endhtmlonly
\section cli_hwloc_ps hwloc-ps
hwloc-ps is a tool to display the bindings of processes that are
currently running on the local machine. By default, hwloc-ps only
lists processes that are bound; unbound processes (and Linux kernel
threads) are not displayed.
\htmlonly
</div><div class="section" id="cli_hwloc_annotate">
\endhtmlonly
\section cli_hwloc_annotate hwloc-annotate
hwloc-annotate may modify object (and topology) attributes such as string information
(see \ref attributes_info for details) or Misc children objects.
It may also add distances, memory attributes, etc. to the topology.
It reads an input topology from an XML file and outputs
the annotated topology as another XML file.
\htmlonly
</div><div class="section" id="cli_hwloc_diffpatchcompress">
\endhtmlonly
\section cli_hwloc_diffpatchcompress hwloc-diff, hwloc-patch and hwloc-compress-dir
hwloc-diff computes the difference between two topologies
and outputs it to another XML file.
hwloc-patch reads such a difference file and applies it to
another topology.
hwloc-compress-dir compresses an entire directory of XML
files by using hwloc-diff to save the differences between
topologies instead of entire topologies.
\htmlonly
</div><div class="section" id="cli_hwloc_dump_hwdata">
\endhtmlonly
\section cli_hwloc_dump_hwdata hwloc-dump-hwdata
hwloc-dump-hwdata is a Linux and x86-specific tool that dumps
(during boot, as a privileged user) some topology and locality information
from raw hardware files (SMBIOS and ACPI tables) to human-readable
and world-accessible files that the hwloc library will later reuse.
It is currently only used on Intel Xeon Phi processor platforms.
See \ref faq_knl_dump.
See <tt>HWLOC_DUMPED_HWDATA_DIR</tt> in \ref envvar for details
about the location of dumped files.
\htmlonly
</div><div class="section" id="cli_hwloc_gather">
\endhtmlonly
\section cli_hwloc_gather hwloc-gather-topology and hwloc-gather-cpuid
hwloc-gather-topology is a Linux-specific tool that saves the
relevant topology files of the current machine into a tarball
(and the corresponding lstopo outputs).
hwloc-gather-cpuid is an x86-specific tool that dumps the
result of CPUID instructions on the current machine into
a directory.