[PATCH CATERPILLAR v4] Code instrumentation with PAPI library #406

Closed

@bogdanPricope bogdanPricope commented Jan 19, 2018

ODP API instrumentation with PAPI (Performance API) library (http://icl.cs.utk.edu/papi)

The "instrum" example uses library preloading, symbol overloading and PAPI to collect performance counters (such as data cache misses or conditional branch counts) for the execution of ODP API calls. The counters are saved in CSV format and can be plotted with Excel or a similar tool.
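
The measurement idea described above can be sketched roughly as follows. This is not the code from this PR: `instr_measure`, `instr_read_fn`, `fake_read` and `fake_api` are hypothetical names, and the counter source is a plain callback standing in for a PAPI read so that the sketch builds without PAPI. In a preload library, the wrapped call would typically be the real symbol, resolved at run time (for example with `dlsym(RTLD_NEXT, ...)`).

```c
#include <assert.h>
#include <stddef.h>

#define INSTR_MAX_EVENTS 8

/* Hypothetical counter source: stores n running counter values.
 * A PAPI-based build would read its event set here instead. */
typedef void (*instr_read_fn)(long long *values, size_t n);

/* Run call(arg) and record, per event, how far each counter advanced
 * while the call executed. The wrapped call's return value is passed
 * through unchanged so the wrapper stays transparent to the caller. */
int instr_measure(instr_read_fn read_counters, size_t n,
                  int (*call)(void *), void *arg, long long *delta)
{
    long long before[INSTR_MAX_EVENTS];
    long long after[INSTR_MAX_EVENTS];
    size_t i;
    int ret;

    assert(n <= INSTR_MAX_EVENTS);

    read_counters(before, n);
    ret = call(arg);
    read_counters(after, n);

    for (i = 0; i < n; i++)
        delta[i] = after[i] - before[i];

    return ret;
}

/* Stand-ins for demonstration only (not part of ODP or PAPI). */
long long fake_counters[2] = { 100, 200 };

void fake_read(long long *values, size_t n)
{
    size_t i;

    for (i = 0; i < n && i < 2; i++)
        values[i] = fake_counters[i];
}

int fake_api(void *arg)
{
    (void)arg;
    fake_counters[0] += 5;   /* pretend 5 conditional branches were counted */
    fake_counters[1] += 10;  /* pretend 10 cache misses were counted */
    return 0;
}
```

Calling `instr_measure(fake_read, 2, fake_api, NULL, delta)` fills `delta` with the per-event differences, which is the kind of row that would then be written to the CSV file.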

Build:
./bootstrap
./configure --with-papi-path=<path to papi install> --with-code-instrum-profile=<ddf|scheduler|all>
make clean
make

Configure the path to the PAPI install:
--with-papi-path=<path to papi install>

Configure instrumented ODP API:
--with-code-instrum-profile=<ddf|scheduler|all>
Default: all

Storage directory:
export ODP_INSTRUM_STORE_DIR=<folder to store csv files>
Default directory: /tmp
e.g.
export ODP_INSTRUM_STORE_DIR=/home/bopi/Work/linaro/store
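
A minimal sketch of how the storage directory setting could be resolved, assuming the behaviour described above (use `ODP_INSTRUM_STORE_DIR` when set, otherwise `/tmp`). The function names and the file name `profile.csv` are hypothetical, since the thread does not say how the CSV files are named. The caller is expected to pass in `getenv("ODP_INSTRUM_STORE_DIR")`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return the directory to use: env_value if it is set and non-empty,
 * otherwise the documented default /tmp. */
const char *instr_store_dir(const char *env_value)
{
    return (env_value != NULL && env_value[0] != '\0') ? env_value : "/tmp";
}

/* Build "<dir>/<file_name>" into out.
 * Returns 0 on success, -1 if the result did not fit in out_len bytes. */
int instr_csv_path(const char *env_value, const char *file_name,
                   char *out, size_t out_len)
{
    const char *dir = instr_store_dir(env_value);
    size_t dir_len = strlen(dir);   /* at least 1: dir is never empty */
    /* Avoid a doubled '/' when the directory already ends with one. */
    const char *sep = (dir[dir_len - 1] == '/') ? "" : "/";
    int n;

    assert(file_name != NULL && out != NULL && out_len > 0);

    n = snprintf(out, out_len, "%s%s%s", dir, sep, file_name);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Taking the environment value as a parameter, rather than calling `getenv()` inside the helper, keeps the default-handling logic easy to test.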

Configure PAPI events set:
export ODP_INSTRUM_PAPI_EVENTS=<comma-separated list of PAPI events>
Default: PAPI_BR_CN,PAPI_L2_DCM
e.g.
export ODP_INSTRUM_PAPI_EVENTS=PAPI_BR_CN,PAPI_L2_DCM,PAPI_BR_UCN
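
One way the comma-separated event list could be split, sketched under assumptions: the default string is the one given above, while the function name, the limits `INSTR_MAX_EVENTS` and `INSTR_EVENT_NAME_LEN`, and the choice to skip empty or over-long fields are all hypothetical. The caller would pass in `getenv("ODP_INSTRUM_PAPI_EVENTS")`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define INSTR_MAX_EVENTS      8   /* hypothetical limit */
#define INSTR_EVENT_NAME_LEN  32  /* hypothetical limit, includes the NUL */
#define INSTR_DEFAULT_EVENTS  "PAPI_BR_CN,PAPI_L2_DCM"

/* Split a comma-separated event list into names[].
 * A NULL or empty env_value selects the default list.
 * Returns the number of names stored (at most max_names). */
size_t instr_parse_events(const char *env_value,
                          char names[][INSTR_EVENT_NAME_LEN],
                          size_t max_names)
{
    const char *p = (env_value != NULL && env_value[0] != '\0')
                        ? env_value : INSTR_DEFAULT_EVENTS;
    size_t count = 0;

    assert(names != NULL);

    while (*p != '\0' && count < max_names) {
        size_t len = strcspn(p, ",");   /* length of the current field */

        /* Skip empty fields ("A,,B") and names that would not fit. */
        if (len > 0 && len < INSTR_EVENT_NAME_LEN) {
            memcpy(names[count], p, len);
            names[count][len] = '\0';
            count++;
        }
        p += len;
        if (*p == ',')
            p++;                        /* step over the separator */
    }
    return count;
}
```

In a PAPI-based build, each name would then presumably be converted to an event code (PAPI provides `PAPI_event_name_to_code()` for this) before being added to the event set.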

Set load library path:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path to odp libs>:<path to papi libs>
e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/bopi/Work/linaro/odp/lib/.libs:/home/bopi/Work/linaro/papi_inst/lib

Run:
LD_PRELOAD=libinstrum.so.0.0.0 ./example/generator/.libs/odp_generator -I eth1 -m r -c 0x8

@muvarov muvarov changed the title Code instrumentation with PAPI library [PATCH CATERPILLAR v1] Code instrumentation with PAPI library Jan 19, 2018
Contributor

@Bill-Fischofer-Linaro Bill-Fischofer-Linaro left a comment

This looks very promising. General comments:

  1. We should update the DEPENDENCIES file to include details about what's required to use PAPI.

  2. The Travis file needs to be updated to include at least one run that enables this code so that it's actually compiled. We can't merge in code that is never compiled.

code_instrumentation=yes],[])

AC_SUBST([PAPI_PATH])
AM_CONDITIONAL([CODE_INSTRUM], [test x$code_instrumentation = xyes ])
Contributor

The PAPI enable/disable status should also be configured in the configure summary.

Contributor Author

Ok.

@@ -13,4 +13,8 @@ SUBDIRS = classifier \
ddf_ifs \
ddf_app

if CODE_INSTRUM
Contributor

How does this get enabled/tested?

Contributor Author

In 'example/m4/configure.m4' there is this conditional:
AM_CONDITIONAL([CODE_INSTRUM], [test x$code_instrumentation = xyes])

It is enabled when the PAPI path is set at configure time: --with-papi-path=DIR
We have similar arrangements for different pktios (netmap, etc.).


Could you please support bare --with-papi, so that one can use system-provided PAPI?

Contributor Author

Yes, '--with-papi' will be added.

@muvarov muvarov changed the title [PATCH CATERPILLAR v1] Code instrumentation with PAPI library [PATCH CATERPILLAR v2] Code instrumentation with PAPI library Jan 25, 2018
@@ -13,4 +13,8 @@ AM_CFLAGS = \
-I$(top_srcdir)/platform/@with_platform@/arch/@ARCH_DIR@ \
-I$(top_builddir)/include

if CODE_INSTRUM
AM_LDFLAGS = -L$(LIB) -lssl -lcrypto -latomic

I suggest picking up the dynamic-examples-tests PR into your branch and then basing this code on top of it.

Contributor Author

No need at this point (it can be done at master -> caterpillar merge time). Most likely, #390 will get into Caterpillar before this PR.

$(srcdir)/papi_cnt.h \
$(srcdir)/init.h \
$(srcdir)/drv.h \
$(srcdir)/sched.h

Are these headers used by anybody else? If not, you can just add them to _SOURCES. Also there is no need to specify $(srcdir) here.

Contributor Author

I kind of expected to have *.h in noinst_HEADERS and *.c in _SOURCES.

$ ./configure --prefix=<papi_install_dir>
$ make clean
$ make
$ make install

Isn't apt-get install libpapi-dev / yum install / etc. enough?

Contributor Author

I don't know. I was not able to get it on my Ubuntu install. You should get something like 'libpfm4 libpapi5.4 libpapi-dev' or newer.
Also, different Ubuntu releases ship different PAPI versions, and I don't know whether they are backward compatible.

The Bionic Beaver (active development) papi trunk series
5.6.0-1 release (universe) 2018-01-26
The Artful Aardvark (current stable release) papi trunk series
5.5.1-2 release (universe) 2017-05-09
The Xenial Xerus (supported) papi trunk series
5.4.3-2 release (universe) 2016-04-19
The Trusty Tahr (supported)
5.3.0-3 release (universe) 2014-02-05

@@ -202,6 +202,83 @@ Prerequisites for building the OpenDataPlane (ODP) API
1024MB of memory:
$ sudo ODP_PKTIO_DPDK_PARAMS="-m 1024" ./test/performance/odp_l2fwd -i 0 -c 1

3.5 Code instrumentation with PAPI library (optional)

As we are not enabling instrumentation in generic library code, maybe this text should go to example/instrum/README with a note here that we support PAPI instrumentation?

Contributor Author

I expect to have more configure-time options in the future (symbol visibility, etc.) that will affect how the generic library code is built.

lumag commented Jan 26, 2018

Also this PR should be applicable to master. Could you please reopen it against master?

@bogdanPricope
Contributor Author

@Bill-Fischofer-Linaro I suspect we cannot really run PAPI in Travis because of PAPI limitations when running in a VM.
http://icl.cs.utk.edu/projects/papi/wiki/PAPITopics:PAPI_on_Virtualization_Platforms

See this change: bogdanPricope@e902b14

See this result:
https://travis-ci.org/bogdanPricope/odp/jobs/333705357

It seems we can display available counters (PAPI events) but we cannot monitor them.

@Bill-Fischofer-Linaro
Contributor

@bogdanPricope It looks like you're getting output in the Raw Log:

$ sudo $HOME/papi-install/bin/papi_avail
Available PAPI preset and user defined events plus hardware information.
--------------------------------------------------------------------------------
PAPI version             : 5.6.0.0
Operating system         : Linux 4.4.0-51-generic
Vendor string and code   : GenuineIntel (1, 0x1)
Model string and code    : Intel(R) Xeon(R) CPU @ 2.60GHz (45, 0x2d)
CPU revision             : 7.000000
CPUID                    : Family/Model/Stepping 6/45/7, 0x06/0x2d/0x07
CPU Max MHz              : 2600
CPU Min MHz              : 2600
Total cores              : 2
SMT threads per core     : 2
Cores per socket         : 1
Sockets                  : 1
Cores per NUMA region    : 2
NUMA regions             : 1
Running in a VM          : yes
VM Vendor                : KVMKVMKVM
Number Hardware Counters : 11
Max Multiplex Counters   : 384
Fast counter read (rdpmc): no
--------------------------------------------------------------------------------

================================================================================
  PAPI Preset Events
================================================================================
    Name        Code    Avail Deriv Description (Note)
PAPI_L1_DCM  0x80000000  Yes   No   Level 1 data cache misses
PAPI_L1_ICM  0x80000001  Yes   No   Level 1 instruction cache misses
PAPI_L2_DCM  0x80000002  Yes   Yes  Level 2 data cache misses
PAPI_L2_ICM  0x80000003  Yes   No   Level 2 instruction cache misses
PAPI_L3_DCM  0x80000004  No    No   Level 3 data cache misses
PAPI_L3_ICM  0x80000005  No    No   Level 3 instruction cache misses
PAPI_L1_TCM  0x80000006  Yes   Yes  Level 1 cache misses
PAPI_L2_TCM  0x80000007  Yes   No   Level 2 cache misses
PAPI_L3_TCM  0x80000008  Yes   No   Level 3 cache misses
PAPI_CA_SNP  0x80000009  No    No   Requests for a snoop
PAPI_CA_SHR  0x8000000a  No    No   Requests for exclusive access to shared cache line
PAPI_CA_CLN  0x8000000b  No    No   Requests for exclusive access to clean cache line
PAPI_CA_INV  0x8000000c  No    No   Requests for cache line invalidation
PAPI_CA_ITV  0x8000000d  No    No   Requests for cache line intervention
PAPI_L3_LDM  0x8000000e  No    No   Level 3 load misses
PAPI_L3_STM  0x8000000f  No    No   Level 3 store misses
PAPI_BRU_IDL 0x80000010  No    No   Cycles branch units are idle
PAPI_FXU_IDL 0x80000011  No    No   Cycles integer units are idle
PAPI_FPU_IDL 0x80000012  No    No   Cycles floating point units are idle
PAPI_LSU_IDL 0x80000013  No    No   Cycles load/store units are idle
PAPI_TLB_DM  0x80000014  Yes   Yes  Data translation lookaside buffer misses
PAPI_TLB_IM  0x80000015  Yes   No   Instruction translation lookaside buffer misses
PAPI_TLB_TL  0x80000016  No    No   Total translation lookaside buffer misses
PAPI_L1_LDM  0x80000017  Yes   No   Level 1 load misses
PAPI_L1_STM  0x80000018  Yes   No   Level 1 store misses
PAPI_L2_LDM  0x80000019  No    No   Level 2 load misses
PAPI_L2_STM  0x8000001a  Yes   No   Level 2 store misses
PAPI_BTAC_M  0x8000001b  No    No   Branch target address cache misses
PAPI_PRF_DM  0x8000001c  No    No   Data prefetch cache misses
PAPI_L3_DCH  0x8000001d  No    No   Level 3 data cache hits
PAPI_TLB_SD  0x8000001e  No    No   Translation lookaside buffer shootdowns
PAPI_CSR_FAL 0x8000001f  No    No   Failed store conditional instructions
PAPI_CSR_SUC 0x80000020  No    No   Successful store conditional instructions
PAPI_CSR_TOT 0x80000021  No    No   Total store conditional instructions
PAPI_MEM_SCY 0x80000022  No    No   Cycles Stalled Waiting for memory accesses
PAPI_MEM_RCY 0x80000023  No    No   Cycles Stalled Waiting for memory Reads
PAPI_MEM_WCY 0x80000024  No    No   Cycles Stalled Waiting for memory writes
PAPI_STL_ICY 0x80000025  Yes   No   Cycles with no instruction issue
PAPI_FUL_ICY 0x80000026  No    No   Cycles with maximum instruction issue
PAPI_STL_CCY 0x80000027  No    No   Cycles with no instructions completed
PAPI_FUL_CCY 0x80000028  No    No   Cycles with maximum instructions completed
PAPI_HW_INT  0x80000029  No    No   Hardware interrupts
PAPI_BR_UCN  0x8000002a  Yes   Yes  Unconditional branch instructions
PAPI_BR_CN   0x8000002b  Yes   No   Conditional branch instructions
PAPI_BR_TKN  0x8000002c  Yes   Yes  Conditional branch instructions taken
PAPI_BR_NTK  0x8000002d  Yes   No   Conditional branch instructions not taken
PAPI_BR_MSP  0x8000002e  Yes   No   Conditional branch instructions mispredicted
PAPI_BR_PRC  0x8000002f  Yes   Yes  Conditional branch instructions correctly predicted
PAPI_FMA_INS 0x80000030  No    No   FMA instructions completed
PAPI_TOT_IIS 0x80000031  No    No   Instructions issued
PAPI_TOT_INS 0x80000032  Yes   No   Instructions completed
PAPI_INT_INS 0x80000033  No    No   Integer instructions
PAPI_FP_INS  0x80000034  Yes   Yes  Floating point instructions
PAPI_LD_INS  0x80000035  Yes   No   Load instructions
PAPI_SR_INS  0x80000036  Yes   No   Store instructions
PAPI_BR_INS  0x80000037  Yes   No   Branch instructions
PAPI_VEC_INS 0x80000038  No    No   Vector/SIMD instructions (could include integer)
PAPI_RES_STL 0x80000039  No    No   Cycles stalled on any resource
PAPI_FP_STAL 0x8000003a  No    No   Cycles the FP unit(s) are stalled
PAPI_TOT_CYC 0x8000003b  Yes   No   Total cycles
PAPI_LST_INS 0x8000003c  No    No   Load/store instructions completed
PAPI_SYC_INS 0x8000003d  No    No   Synchronization instructions completed
PAPI_L1_DCH  0x8000003e  No    No   Level 1 data cache hits
PAPI_L2_DCH  0x8000003f  Yes   Yes  Level 2 data cache hits
PAPI_L1_DCA  0x80000040  No    No   Level 1 data cache accesses
PAPI_L2_DCA  0x80000041  Yes   No   Level 2 data cache accesses
PAPI_L3_DCA  0x80000042  Yes   Yes  Level 3 data cache accesses
PAPI_L1_DCR  0x80000043  No    No   Level 1 data cache reads
PAPI_L2_DCR  0x80000044  Yes   No   Level 2 data cache reads
PAPI_L3_DCR  0x80000045  Yes   No   Level 3 data cache reads
PAPI_L1_DCW  0x80000046  No    No   Level 1 data cache writes
PAPI_L2_DCW  0x80000047  Yes   No   Level 2 data cache writes
PAPI_L3_DCW  0x80000048  Yes   No   Level 3 data cache writes
PAPI_L1_ICH  0x80000049  No    No   Level 1 instruction cache hits
PAPI_L2_ICH  0x8000004a  Yes   No   Level 2 instruction cache hits
PAPI_L3_ICH  0x8000004b  No    No   Level 3 instruction cache hits
PAPI_L1_ICA  0x8000004c  No    No   Level 1 instruction cache accesses
PAPI_L2_ICA  0x8000004d  Yes   No   Level 2 instruction cache accesses
PAPI_L3_ICA  0x8000004e  Yes   No   Level 3 instruction cache accesses
PAPI_L1_ICR  0x8000004f  No    No   Level 1 instruction cache reads
PAPI_L2_ICR  0x80000050  Yes   No   Level 2 instruction cache reads
PAPI_L3_ICR  0x80000051  Yes   No   Level 3 instruction cache reads
PAPI_L1_ICW  0x80000052  No    No   Level 1 instruction cache writes
PAPI_L2_ICW  0x80000053  No    No   Level 2 instruction cache writes
PAPI_L3_ICW  0x80000054  No    No   Level 3 instruction cache writes
PAPI_L1_TCH  0x80000055  No    No   Level 1 total cache hits
PAPI_L2_TCH  0x80000056  No    No   Level 2 total cache hits
PAPI_L3_TCH  0x80000057  No    No   Level 3 total cache hits
PAPI_L1_TCA  0x80000058  No    No   Level 1 total cache accesses
PAPI_L2_TCA  0x80000059  Yes   Yes  Level 2 total cache accesses
PAPI_L3_TCA  0x8000005a  Yes   No   Level 3 total cache accesses
PAPI_L1_TCR  0x8000005b  No    No   Level 1 total cache reads
PAPI_L2_TCR  0x8000005c  Yes   Yes  Level 2 total cache reads
PAPI_L3_TCR  0x8000005d  Yes   Yes  Level 3 total cache reads
PAPI_L1_TCW  0x8000005e  No    No   Level 1 total cache writes
PAPI_L2_TCW  0x8000005f  Yes   No   Level 2 total cache writes
PAPI_L3_TCW  0x80000060  Yes   No   Level 3 total cache writes
PAPI_FML_INS 0x80000061  No    No   Floating point multiply instructions
PAPI_FAD_INS 0x80000062  No    No   Floating point add instructions
PAPI_FDV_INS 0x80000063  Yes   No   Floating point divide instructions
PAPI_FSQ_INS 0x80000064  No    No   Floating point square root instructions
PAPI_FNV_INS 0x80000065  No    No   Floating point inverse instructions
PAPI_FP_OPS  0x80000066  Yes   Yes  Floating point operations
PAPI_SP_OPS  0x80000067  Yes   Yes  Floating point operations; optimized to count scaled single precision vector operations
PAPI_DP_OPS  0x80000068  Yes   Yes  Floating point operations; optimized to count scaled double precision vector operations
PAPI_VEC_SP  0x80000069  Yes   Yes  Single precision vector/SIMD instructions
PAPI_VEC_DP  0x8000006a  Yes   Yes  Double precision vector/SIMD instructions
PAPI_REF_CYC 0x8000006b  Yes   No   Reference clock cycles
--------------------------------------------------------------------------------
Of 108 possible events, 50 are available, of which 17 are derived.


The command "sudo $HOME/papi-install/bin/papi_avail" exited with 0.
$ sudo LD_LIBRARY_PATH="/usr/local/lib:$HOME/odp-papi-install/lib:$HOME/papi-install/lib:$LD_LIBRARY_PATH" LD_PRELOAD=libinstrum.so.0.0.0 $HOME/odp-papi-install/bin/odp_hello -c 1 -n 5
Setup Wrappers
HW time counter freq: 2600003840 hz

PKTIO: initialized ipc interface.
PKTIO: initialized loop interface.
PKTIO: initialized socket mmsg,use export ODP_PKTIO_DISABLE_SOCKET_MMSG=1 to disable.
PKTIO: initialized socket mmap, use export ODP_PKTIO_DISABLE_SOCKET_MMAP=1 to disable.
PKTIO: initialized pcap interface.
PAPI_add_events error: -7
Hello world from CPU 0!
Hello world from CPU 0!
Hello world from CPU 0!
Hello world from CPU 0!
Hello world from CPU 0!
Teardown Wrappers
Teardown Wrappers

The command "sudo LD_LIBRARY_PATH="/usr/local/lib:$HOME/odp-papi-install/lib:$HOME/papi-install/lib:$LD_LIBRARY_PATH" LD_PRELOAD=libinstrum.so.0.0.0 $HOME/odp-papi-install/bin/odp_hello -c 1 -n 5" exited with 0.
store build cache
changes detected, packing new archive
.
uploading archive

Done. Your build exited with 0.

Were you referring to the PAPI_add_events error: -7 message?

@bogdanPricope
Contributor Author

@Bill-Fischofer-Linaro That is the output of the PAPI tool ‘papi_avail’. It looks fine and is no surprise: the ‘instrum’ library is also able to initialize PAPI (PAPI_library_init()) and validate the availability of the requested PAPI events (PAPI_query_event()), along with other PAPI calls.

My concern is that PAPI_add_events() returns PAPI_ENOEVNT (-7):

https://linux.die.net/man/3/papi_add_events

“PAPI_ENOEVNT
The PAPI preset is not available on the underlying hardware.”

My best guess is that this is related to the following note:
‘However, virtualization does add a hardware abstraction layer. This prevents PAPI from directly reading the hardware PMU (Performance Monitoring Unit).

VMware and KVM both provide a virtual PMU given your configuration meets the requirements. This allows PAPI to function identically on the virtual machine guest operating system as on bare metal. Requirements are listed below.’ (http://icl.cs.utk.edu/projects/papi/wiki/PAPITopics:PAPI_on_Virtualization_Platforms)

My understanding is that the Travis test runs under KVM, where the requirements are:

‘Host must be running Linux Kernel 3.3 or higher.
Qemu-kvm must be version 1.2.5 or higher.
Intel CPU
VM must be booted with -cpu host’
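
A CI job could use a check like the following to decide whether to skip counter-based tests. This is a sketch under assumptions, not something proposed in this thread: on x86 Linux guests the `flags` line of /proc/cpuinfo carries a `hypervisor` token, and the helper below only tests for that token in a line the caller has already read. The function names are hypothetical. The check says nothing about whether the hypervisor exposes a virtual PMU, so it is only a coarse guard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if token appears as a whole whitespace-separated word in
 * flags, otherwise 0. Substring matches do not count. */
int flags_has_token(const char *flags, const char *token)
{
    const char *p = flags;
    size_t tlen;

    assert(flags != NULL && token != NULL);
    tlen = strlen(token);

    while (*p != '\0') {
        size_t wlen;

        p += strspn(p, " \t\n");      /* skip separators */
        wlen = strcspn(p, " \t\n");   /* length of the next word */
        if (tlen > 0 && wlen == tlen && strncmp(p, token, tlen) == 0)
            return 1;
        p += wlen;
    }
    return 0;
}

/* flags_line is the "flags : ..." line taken from /proc/cpuinfo. */
int cpu_flags_indicate_vm(const char *flags_line)
{
    return flags_has_token(flags_line, "hypervisor");
}
```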

@bogdanPricope
Contributor Author

@lumag There is no decision to integrate this into Tiger Moth (master). I would be happy if LNG people tried it and reported whether it is useful.

@muvarov muvarov changed the title [PATCH CATERPILLAR v2] Code instrumentation with PAPI library [PATCH CATERPILLAR v3] Code instrumentation with PAPI library Jan 29, 2018
lumag commented Jan 29, 2018

@bogdanPricope Merging patches from caterpillar branch to master will take time. We can benefit from PAPI there before merging it. Also it would be nice to have performance measurements during merge period.

@Bill-Fischofer-Linaro
Contributor

I concur with @lumag's view that this can be helpful in guiding further merge/tuning decisions. Regarding KVM, do we know if we can get the prereq levels/configuration in Travis? Even with some limitations, it would seem limited PMU access would be beneficial. Something to discuss during tomorrow's call.

Bogdan Pricope added 2 commits January 31, 2018 10:00
Add configuration options to enable code instrumentation and set
PAPI installation folder.

Signed-off-by: Bogdan Pricope <bogdan.pricope@linaro.org>
Add instrumentation library as odp example.

Signed-off-by: Bogdan Pricope <bogdan.pricope@linaro.org>
Bogdan Pricope added 5 commits January 31, 2018 13:52
Use low level PAPI API to get performance counters.
Exemplify on some ODP APIs.

Signed-off-by: Bogdan Pricope <bogdan.pricope@linaro.org>
Configure the set of papi counters to be acquired via an
environment variable.

Signed-off-by: Bogdan Pricope <bogdan.pricope@linaro.org>
Add configure time option to trim API set to be instrumented.

Signed-off-by: Bogdan Pricope <bogdan.pricope@linaro.org>
Describe configuration and requirements of code instrumentation
library.

Signed-off-by: Bogdan Pricope <bogdan.pricope@linaro.org>
Add test to validate ODP build with papi library.

Signed-off-by: Bogdan Pricope <bogdan.pricope@linaro.org>
@muvarov muvarov changed the title [PATCH CATERPILLAR v3] Code instrumentation with PAPI library [PATCH CATERPILLAR v4] Code instrumentation with PAPI library Jan 31, 2018
@bogdanPricope bogdanPricope reopened this Feb 1, 2018
@bogdanPricope
Contributor Author

Moved to master branch: #443

@bogdanPricope bogdanPricope deleted the cat_benchmark_pr branch August 21, 2018 06:06