# Hotspot Calculations for HEP

Author: Brian Gravelle (gravelle@cs.uoregon.edu)


All of this uses the TAU Commander (taucmdr) Python libraries from ParaTools:
http://taucommander.paratools.com/


## Imports

In [1]:
from utilities import *
from metrics import *

## Getting the data from the TAU profiles

You need a .tau directory in the same directory as this notebook, containing all the relevant profile data and the XML files that TAU Commander keeps there.

The next cell returns a dict whose keys are the metric names and whose values are pandas DataFrames. It also prints the list of metrics.

The functions that import the data are in utilities.py.
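A minimal sketch of that layout, built from made-up numbers so it runs without a `.tau` directory. The `METADATA` value here is a placeholder (its real type is whatever `get_pandas()` stores), and `metric_names` and `top_rows` are hypothetical helpers, not part of `utilities.py`. It illustrates two points: skip `METADATA` when looping over metrics, and sorting a metric's DataFrame ranks individual (rank, context, thread, region) rows, not whole regions.

```python
import pandas as pd

# Toy stand-in for the dict returned by get_pandas(): one DataFrame per
# metric, indexed by (rank, context, thread, region), plus a METADATA entry.
# All values are invented for illustration.
index = pd.MultiIndex.from_tuples(
    [(0, 0, 0, "[SUMMARY] foo()"),
     (0, 0, 1, "[SUMMARY] foo()"),
     (0, 0, 0, "[SUMMARY] bar()")],
    names=["rank", "context", "thread", "region"],
)
columns = ["Calls", "Subcalls", "Exclusive", "Inclusive", "ProfileCalls"]
toy_intervals = {
    "PAPI_TOT_INS": pd.DataFrame(
        [[1, 0, 100, 150, 0],
         [1, 0, 120, 130, 0],
         [2, 1, 50, 400, 0]],
        index=index, columns=columns),
    "METADATA": {"CPU Type": "example CPU"},  # placeholder, not a DataFrame
}


def metric_names(intervals):
    """Measured metric names, skipping the METADATA entry."""
    return sorted(k for k in intervals if k != "METADATA")


def top_rows(intervals, metric, n=10):
    """The n rows with the largest Inclusive value (one row per
    rank/context/thread/region), like the expression ending the loading cell."""
    df = intervals[metric]
    return df.sort_values(by="Inclusive", ascending=False)[["Inclusive"]].head(n)


print(metric_names(toy_intervals))  # ['PAPI_TOT_INS']
print(top_rows(toy_intervals, "PAPI_TOT_INS", n=2))
```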

Notes:
* Make sure that application and experiment are set correctly.
    * Run 'tau dash' in the project folder to check the options.
* ALL trials from the experiment are read, but only one set of results is kept per metric.
* Metadata about the machine and the run is stored under the key METADATA.

Available experiments:
* multi - data based on a toy version of the program.
* realistic  - data based on a run of the program with the TT35PU... input file, 10 threads, and 100 events
* event_scaling_TT35  - data based on a run of the program with the TT35PU... input file, 10 threads, and events ranging from 10 to 100
* TT70  - data based on a run of the program with the TT70PU... input file, 10 threads, and 100 events
* event_scaling_TT70  - data based on a run of the program with the TT70PU... input file, 10 threads, and events ranging from 10 to 100
* Note that there is not yet a function that properly loads the data from the event-scaling runs.
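Since a wrong application or experiment name is the easiest mistake to make here, a small check can fail early and list what actually exists. This is a hypothetical helper (`experiment_path` is not in `utilities.py`); it only assumes the `.tau/<application>/<experiment>/` layout that the loading cell below builds by string concatenation.

```python
import os


def experiment_path(application, experiment, root=".tau"):
    """Return root/application/experiment/ (with a trailing separator), or
    raise ValueError listing the experiments found under root/application."""
    app_dir = os.path.join(root, application)
    path = os.path.join(app_dir, experiment) + os.sep
    if not os.path.isdir(path):
        if os.path.isdir(app_dir):
            available = sorted(os.listdir(app_dir))
        else:
            available = []
        raise ValueError("experiment %r not found in %s; available: %s"
                         % (experiment, app_dir, available))
    return path
```

Usage would be `path = experiment_path("mictest_sampling", "TTbar70")` followed by `get_pandas(path)`.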


In [2]:
application = "mictest_sampling"
experiment  = "TTbar70"

path = ".tau/" + application + "/" + experiment + "/"
expr_intervals = get_pandas(path)

#level_inds = {'trial': 0, 'rank': 1, 'context': 2, 'thread': 3, 'region': 4}
print(expr_intervals.keys())
print("")
print(expr_intervals['PAPI_TOT_INS'].columns)
print("")
print(expr_intervals['PAPI_TOT_INS'].index.names)
print("")
print_metadata(expr_intervals)

expr_intervals['PAPI_TOT_INS'].sort_values(by='Inclusive',ascending=False)[["Inclusive"]].head(10)

['PAPI_NATIVE_UOPS_RETIRED:SCALAR_SIMD', 'PAPI_L2_TCA', 'PAPI_NATIVE_LLC_MISSES', 'PAPI_RES_STL', 'PAPI_L2_TCM', 'PAPI_TOT_INS', 'PAPI_NATIVE_UOPS_RETIRED:PACKED_SIMD', 'PAPI_LST_INS', 'PAPI_NATIVE_LLC_REFERENCES', 'PAPI_L1_TCM', 'PAPI_TOT_CYC', 'METADATA']

Index([u'Calls', u'Subcalls', u'Exclusive', u'Inclusive', u'ProfileCalls'], dtype='object')

[u'rank', u'context', u'thread', u'region']

TAU_MAX_THREADS                                    512
TAU_CUDA_BINARY_EXE                                None
TAU_MEASURE_TAU                                    off
Memory Size                                        115370324 kB
TAU_TRACK_SIGNALS                                  off
TAU_TRACK_IO_PARAMS                                off
CPU MHz                                            1003.601
Local Time                                         2018-02-17T23:58:48-08:00
CPU Type                                           Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz
TAU_OUTPUT_CUDA_CSV               

| rank | context | thread | region | Inclusive |
|---|---|---|---|---|
| 0 | 0 | 0 | `[SUMMARY] Event::clean_cms_seedtracks()` | 2518060658 |
| 0 | 0 | 0 | `[SUMMARY] std::__detail::_Mod_range_hashing::operator()(unsigned long, unsigned long) const` | 2039378190 |
| 0 | 0 | 5 | `[SUMMARY] syscall` | 1780321980 |
| 0 | 0 | 6 | `[SUMMARY] syscall` | 1734039953 |
| 0 | 0 | 7 | `[SUMMARY] syscall` | 1642961484 |
| 0 | 0 | 4 | `[SUMMARY] syscall` | 1582326153 |
| 0 | 0 | 1 | `[SUMMARY] syscall` | 1529560730 |
| 0 | 0 | 3 | `[SUMMARY] syscall` | 1523641615 |
| 0 | 0 | 9 | `[SUMMARY] syscall` | 1476745736 |
| 0 | 0 | 2 | `[SUMMARY] syscall` | 1451258013 |


In [None]:
application = "mictest_sampling"
experiment  = "TTbar70"

path = ".tau/" + application + "/" + experiment + "/"
e = get_pandas_multi_run(path)

In [14]:
print(len(e['PAPI_TOT_INS']))

e['PAPI_TOT_INS'][0].sort_values(by='Inclusive',ascending=False)[["Inclusive"]].head(10)

1


| rank | context | thread | region | Inclusive |
|---|---|---|---|---|
| 0 | 0 | 0 | `[SUMMARY] Event::clean_cms_seedtracks()` | 2518060658 |
| 0 | 0 | 0 | `[SUMMARY] std::__detail::_Mod_range_hashing::operator()(unsigned long, unsigned long) const` | 2039378190 |
| 0 | 0 | 5 | `[SUMMARY] syscall` | 1780321980 |
| 0 | 0 | 6 | `[SUMMARY] syscall` | 1734039953 |
| 0 | 0 | 7 | `[SUMMARY] syscall` | 1642961484 |
| 0 | 0 | 4 | `[SUMMARY] syscall` | 1582326153 |
| 0 | 0 | 1 | `[SUMMARY] syscall` | 1529560730 |
| 0 | 0 | 3 | `[SUMMARY] syscall` | 1523641615 |
| 0 | 0 | 9 | `[SUMMARY] syscall` | 1476745736 |
| 0 | 0 | 2 | `[SUMMARY] syscall` | 1451258013 |
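`get_pandas_multi_run` appears to return, for each metric, a list with one DataFrame per trial: this experiment has a single trial, so the cell prints `1` and indexes `[0]`. If an experiment had several trials, one way to summarize them is to stack the frames under a new `trial` level and average each row across trials. A sketch, assuming all trial frames share the same index names; `combine_trials` is a hypothetical helper.

```python
import pandas as pd


def combine_trials(frames, how="mean"):
    """Stack per-trial DataFrames under a new outer 'trial' index level, then
    aggregate each (rank, context, thread, region) row across trials."""
    stacked = pd.concat(frames, keys=list(range(len(frames))), names=["trial"])
    row_levels = [name for name in stacked.index.names if name != "trial"]
    return stacked.groupby(level=row_levels).agg(how)


# Made-up example: two trials of one region measured on two threads.
idx = pd.MultiIndex.from_tuples(
    [(0, 0, 0, "[SUMMARY] foo()"), (0, 0, 1, "[SUMMARY] foo()")],
    names=["rank", "context", "thread", "region"])
trial0 = pd.DataFrame({"Inclusive": [100.0, 200.0]}, index=idx)
trial1 = pd.DataFrame({"Inclusive": [300.0, 400.0]}, index=idx)
print(combine_trials([trial0, trial1]))
```

In the notebook this would be called as `combine_trials(e['PAPI_TOT_INS'])`; passing `how="max"` or `how="min"` gives a different summary.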


### Importing Scaling Data
No scaling data is available yet, so this cell is commented out and should not be used.

In [4]:
# scale_intervals = get_pandas_scaling('.tau/mictest_sampling/scaling/')

#level_inds = {'trial': 0, 'rank': 1, 'context': 2, 'thread': 3, 'region': 4}
# print(scale_intervals.keys())

# print(scale_intervals[10].keys())
# print(scale_intervals[10]['PAPI_TOT_INS'].columns)
# print(scale_intervals[10]['PAPI_TOT_INS'].index.names)

# scale_intervals[10]['PAPI_TOT_INS'][["Inclusive"]].head(10)
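There is no scaling data yet, but the commented-out cell suggests that `get_pandas_scaling` would return a dict keyed by event count (e.g. `scale_intervals[10]`), each value being a metric dictionary like `expr_intervals`. Under that assumption, a per-region scaling table could be sketched as below. `scaling_table` is hypothetical, and summing `Inclusive` over all ranks, contexts, and threads is one choice among several.

```python
import pandas as pd


def scaling_table(scale_intervals, metric):
    """One row per region, one column per event count; each cell is the
    Inclusive value summed over all ranks/contexts/threads.
    Assumes scale_intervals maps event count -> {metric name -> DataFrame}
    and that each DataFrame has a 'region' index level."""
    columns = {}
    for events in sorted(scale_intervals):
        df = scale_intervals[events][metric]
        columns[events] = df["Inclusive"].groupby(level="region").sum()
    return pd.DataFrame(columns)


# Made-up example with runs of 10 and 100 events.
idx = pd.MultiIndex.from_tuples(
    [(0, 0, 0, "foo"), (0, 0, 1, "foo"), (0, 0, 0, "bar")],
    names=["rank", "context", "thread", "region"])
toy_scaling = {
    10: {"PAPI_TOT_INS": pd.DataFrame({"Inclusive": [1.0, 2.0, 5.0]}, index=idx)},
    100: {"PAPI_TOT_INS": pd.DataFrame({"Inclusive": [10.0, 20.0, 50.0]}, index=idx)},
}
print(scaling_table(toy_scaling, "PAPI_TOT_INS"))
```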

## Adding metrics

These functions add derived metrics to the dictionary.

TODO: figure out
* a scale function
* separating threads
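`add_CPI` adds `DERIVED_CPI`, i.e. cycles per instruction (`PAPI_TOT_CYC / PAPI_TOT_INS`). Its actual code is in `metrics.py`; the sketch below shows one way such a ratio metric could be computed, assuming both inputs share the same index. `add_ratio_metric` is hypothetical. It divides only the `Exclusive` and `Inclusive` columns and keeps the index unchanged, whereas the real `add_CPI` evidently differs (its output below has fractional `Calls` and no `rank` level).

```python
import pandas as pd


def add_ratio_metric(metrics, numerator, denominator, name,
                     columns=("Exclusive", "Inclusive")):
    """metrics[name] = metrics[numerator] / metrics[denominator] for the given
    columns, aligned on the shared index. Returns False if an input metric is
    missing, mirroring the generated add_* functions."""
    if numerator not in metrics or denominator not in metrics:
        print("ERROR adding %s to metric dictionary" % name)
        return False
    cols = list(columns)
    metrics[name] = metrics[numerator][cols] / metrics[denominator][cols]
    return True


# Made-up example: CPI = cycles / instructions.
idx = pd.MultiIndex.from_tuples(
    [(0, 0, 0, "[SUMMARY] foo()"), (0, 0, 1, "[SUMMARY] foo()")],
    names=["rank", "context", "thread", "region"])
toy_metrics = {
    "PAPI_TOT_CYC": pd.DataFrame({"Exclusive": [400.0, 300.0],
                                  "Inclusive": [800.0, 600.0]}, index=idx),
    "PAPI_TOT_INS": pd.DataFrame({"Exclusive": [200.0, 300.0],
                                  "Inclusive": [400.0, 300.0]}, index=idx),
}
add_ratio_metric(toy_metrics, "PAPI_TOT_CYC", "PAPI_TOT_INS", "DERIVED_CPI")
print(toy_metrics["DERIVED_CPI"])
```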

In [5]:
add_CPI(expr_intervals)
print(expr_intervals.keys())

expr_intervals['DERIVED_CPI'].head(10)

['PAPI_NATIVE_UOPS_RETIRED:SCALAR_SIMD', 'PAPI_L2_TCA', 'PAPI_NATIVE_LLC_MISSES', 'PAPI_RES_STL', 'PAPI_L2_TCM', 'PAPI_TOT_INS', 'PAPI_NATIVE_UOPS_RETIRED:PACKED_SIMD', 'DERIVED_CPI', 'PAPI_LST_INS', 'PAPI_NATIVE_LLC_REFERENCES', 'PAPI_L1_TCM', 'PAPI_TOT_CYC', 'METADATA']


| context | thread | region | Calls | Exclusive | Inclusive | ProfileCalls | Subcalls |
|---|---|---|---|---|---|---|---|
| 0 | 0 | `[SUMMARY] (anonymous namespace)::MultHelixProp(Matriplex::Matriplex<float, 6, 6, 8> const&, Matriplex::MatriplexSym<float, 6, 8> const&, Matriplex::Matriplex<float, 6, 6, 8>&)` | 2.0 | 5.410783 | 5.410783 | NaN | NaN |
| 0 | 0 | `[SUMMARY] (anonymous namespace)::MultHelixPropEndcap(Matriplex::Matriplex<float, 6, 6, 8> const&, Matriplex::MatriplexSym<float, 6, 8> const&, Matriplex::Matriplex<float, 6, 6, 8>&)` | 1.333333 | 2.828229 | 2.828229 | NaN | NaN |
| 0 | 0 | `[SUMMARY] (anonymous namespace)::MultHelixPropTransp(Matriplex::Matriplex<float, 6, 6, 8> const&, Matriplex::Matriplex<float, 6, 6, 8> const&, Matriplex::MatriplexSym<float, 6, 8>&)` | 0.333333 | 0.863261 | 0.863261 | NaN | NaN |
| 0 | 0 | `[SUMMARY] (anonymous namespace)::MultHelixPropTranspEndcap(Matriplex::Matriplex<float, 6, 6, 8> const&, Matriplex::Matriplex<float, 6, 6, 8> const&, Matriplex::MatriplexSym<float, 6, 8>&)` | 1.0 | 2.951659 | 2.951659 | NaN | NaN |
| 0 | 0 | `[SUMMARY] (anonymous namespace)::sortCandListByHitsChi2(MkFinder::IdxChi2List const&, MkFinder::IdxChi2List const&)` | 0.666667 | 1.388162 | 1.388162 | NaN | NaN |
| 0 | 0 | `[SUMMARY] CandCloner::ProcessSeedRange(int, int)` | 3.0 | 8.086814 | 8.086814 | NaN | NaN |
| 0 | 0 | `[SUMMARY] Event::clean_cms_seedtracks()` | 0.90099 | 1.611285 | 1.611285 | NaN | NaN |
| 0 | 0 | `[SUMMARY] FMA(__m256 const&, __m256 const&, __m256 const&)` | 1.0 | 1.615924 | 1.615924 | NaN | NaN |
| 0 | 0 | `[SUMMARY] Hit::Hit()` | 1.0 | 1.43134 | 1.43134 | NaN | NaN |
| 0 | 0 | `[SUMMARY] Hit::mcHitID() const` | 1.0 | 2.519181 | 2.519181 | NaN | NaN |


### Metric Generation

gen_metric generates the boilerplate of a metric-adding function. To add a new metric:
* list the metrics you will use
* provide a name for the new metric
* paste the generated code into metrics.py
* implement the math

In [6]:
print(gen_metric(['PAPI_NATIVE_UOPS_RETIRED_PACKED_SIMD', 'PAPI_L1_TCM'], "VECTOR_PER_MISS"))

def add_VECTOR_PER_MISS(metrics):
    if (not metrics.has_key(PAPI_NATIVE_UOPS_RETIRED_PACKED_SIMD)):
        print 'ERROR adding VECTOR_PER_MISS to metric dictionary'
        return False
    a0 = metrics[PAPI_NATIVE_UOPS_RETIRED_PACKED_SIMD].copy()
    a0.index = a0.index.droplevel()
    u0 = a0.unstack()
    if (not metrics.has_key(PAPI_L1_TCM)):
        print 'ERROR adding VECTOR_PER_MISS to metric dictionary'
        return False
    a1 = metrics[PAPI_L1_TCM].copy()
    a1.index = a1.index.droplevel()
    u1 = a1.unstack()
    metrics[VECTOR_PER_MISS] = "PLEASE IMPLEMENT THIS PART"

    return True
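One way to fill in the `PLEASE IMPLEMENT THIS PART` line is to divide packed-SIMD uops by total L1 misses. Two cautions, both worth checking against `metrics.py`: the dictionary key printed earlier is `PAPI_NATIVE_UOPS_RETIRED:PACKED_SIMD` (with a colon), while the `gen_metric` call above uses an underscore, so whatever name the generated code looks up must resolve to the colon form; and regions with zero L1 misses would otherwise produce `inf`. This sketch uses string keys directly, keeps the template's `droplevel()` (which removes the outermost, `rank`, level), and skips its `unstack()` step.

```python
import numpy as np
import pandas as pd

PACKED = "PAPI_NATIVE_UOPS_RETIRED:PACKED_SIMD"  # key as printed by expr_intervals.keys()
L1_MISS = "PAPI_L1_TCM"


def add_VECTOR_PER_MISS(metrics):
    """Packed-SIMD uops retired per L1 miss for each (context, thread, region).
    Rows with zero misses become NaN instead of inf."""
    for key in (PACKED, L1_MISS):
        if key not in metrics:
            print("ERROR adding VECTOR_PER_MISS to metric dictionary")
            return False
    a0 = metrics[PACKED][["Exclusive", "Inclusive"]].copy()
    a0.index = a0.index.droplevel()  # drop the rank level, as the template does
    a1 = metrics[L1_MISS][["Exclusive", "Inclusive"]].copy()
    a1.index = a1.index.droplevel()
    ratio = a0 / a1
    metrics["VECTOR_PER_MISS"] = ratio.replace([np.inf, -np.inf], np.nan)
    return True


# Made-up example; the second thread has no L1 misses.
idx = pd.MultiIndex.from_tuples(
    [(0, 0, 0, "[SUMMARY] foo()"), (0, 0, 1, "[SUMMARY] foo()")],
    names=["rank", "context", "thread", "region"])
toy_metrics = {
    PACKED: pd.DataFrame({"Exclusive": [50.0, 10.0], "Inclusive": [100.0, 10.0]}, index=idx),
    L1_MISS: pd.DataFrame({"Exclusive": [5.0, 0.0], "Inclusive": [10.0, 0.0]}, index=idx),
}
add_VECTOR_PER_MISS(toy_metrics)
print(toy_metrics["VECTOR_PER_MISS"])
```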





## Interesting bits

This is where the hotspots are actually calculated.
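`hotspots()` lives in `utilities.py` and its exact aggregation isn't shown here. As a rough sketch of the idea, ranking regions by a metric after collapsing ranks, contexts, and threads could look like this. `top_hotspots` and `print_hotspots` are hypothetical, and the choice of `sum` versus `max` over threads is an assumption; the real function may aggregate differently.

```python
import pandas as pd


def top_hotspots(df, n=10, column="Inclusive", how="sum"):
    """Collapse rank/context/thread with `how` ("sum", "max", ...) and
    return the n regions with the largest value, largest first."""
    per_region = df[column].groupby(level="region").agg(how)
    return per_region.nlargest(n)


def print_hotspots(df, n=10, **kwargs):
    """Print the ranking in a format similar to the output below."""
    for i, (region, value) in enumerate(top_hotspots(df, n, **kwargs).items(), 1):
        print("%d: %s  (%s)" % (i, region, value))


# Made-up per-thread values for three regions.
toy_df = pd.DataFrame(
    {"Inclusive": [5.0, 6.0, 9.0, 2.0]},
    index=pd.MultiIndex.from_tuples(
        [(0, 0, 0, "[SUMMARY] syscall"),
         (0, 0, 1, "[SUMMARY] syscall"),
         (0, 0, 0, "[SUMMARY] Event::clean_cms_seedtracks()"),
         (0, 0, 1, "[SUMMARY] _int_malloc")],
        names=["rank", "context", "thread", "region"]))
print_hotspots(toy_df, n=2)
```

Summing favors regions that run on every thread (here `syscall`), while `how="max"` favors the single heaviest thread.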

In [7]:
# levels: 0=trial, 1=node, 2=context, 3=thread, 4=region name -- deprecated
# levels: 0=rank, 1=context, 2=thread, 3=region name -- deprecated

       
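n = 10  # number of hotspots to list

def get_hotspots(metric):
    print('selected metric: %s\n' % metric)
    # rank all regions, including library functions
    hotspots(expr_intervals[metric], n, 1)

    print('='*80)

    # rank again after filtering out library functions
    filtered_dfs = filter_libs_out(expr_intervals[metric])
    hotspots(filtered_dfs, n, 1)
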
get_hotspots('PAPI_TOT_CYC')



selected metric: PAPI_TOT_CYC

Hotspot Analysis Summary
The code regions with largest inclusive time are: 
1: [SUMMARY] Event::clean_cms_seedtracks()  (4057314016)
2: [SUMMARY] syscall  (3188799261)
3: [SUMMARY] std::__detail::_Mod_range_hashing::operator()(unsigned long, unsigned long) const  (2713533683)
4: [SUMMARY] _int_malloc  (2526790408)
5: [SUMMARY] MkFinder::SelectHitIndices(LayerOfHits const&, int)  (2215543058)
6: [SUMMARY] __svml_sincosf8_l9  (2126595723)
7: [SUMMARY] _int_free  (2027594045)
8: [SUMMARY] __GI___libc_malloc  (1814165933)
9: [SUMMARY] UNRESOLVED /storage/packages/intel/vtune_amplifier_xe_2017.5.0.526192/lib64/libstdc++.so.6.0.20 (1706600094)
10: [SUMMARY] __intel_mic_avx512f_memset  (1666113549)
Hotspot Analysis Summary
The code regions with largest inclusive time are: 
1: [SUMMARY] Event::clean_cms_seedtracks()  (4057314016)
2: [SUMMARY] _int_malloc  (2526790408)
3: [SUMMARY] MkFinder::SelectHitIndices(LayerOfHits const&, int)  (2215543058)
4: [SUMMARY] __sv
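`filter_libs_out()` (also in `utilities.py`) removes library regions before the second ranking. Its rule set isn't shown, and it evidently keeps some runtime functions (`_int_malloc` still appears in the filtered list above). A sketch of name-based filtering with a hypothetical, deliberately short pattern list; `drop_library_regions` and `LIB_PATTERNS` are illustrative only.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical patterns for illustration; NOT the rules used by filter_libs_out().
LIB_PATTERNS = [
    r"^\[SUMMARY\] syscall$",     # the system-call wrapper
    r"\bstd::",                   # C++ standard library internals
    r"^\[SUMMARY\] UNRESOLVED ",  # addresses TAU could not resolve to a function
]


def drop_library_regions(df, patterns=LIB_PATTERNS):
    """Return df without the rows whose 'region' name matches any pattern."""
    combined = re.compile("|".join("(?:%s)" % p for p in patterns))
    regions = df.index.get_level_values("region")
    keep = np.array([combined.search(r) is None for r in regions], dtype=bool)
    return df[keep]


toy_df = pd.DataFrame(
    {"Inclusive": [3.0, 2.0, 4.0]},
    index=pd.MultiIndex.from_tuples(
        [(0, 0, 0, "[SUMMARY] syscall"),
         (0, 0, 0, "[SUMMARY] std::__detail::_Mod_range_hashing::operator()(unsigned long, unsigned long) const"),
         (0, 0, 0, "[SUMMARY] Event::clean_cms_seedtracks()")],
        names=["rank", "context", "thread", "region"]))
print(drop_library_regions(toy_df))
```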

In [8]:
get_hotspots('DERIVED_CPI')

selected metric: DERIVED_CPI

Hotspot Analysis Summary
The code regions with largest inclusive time are: 
1: [SUMMARY] _ZN9__gnu_cxx13new_allocatorIN8MkFinder11IdxChi2ListEE9constructIS2_IRKS2_EEEvPT_DpOT0_  (213.944798734)
2: [SUMMARY] CandCloner::add_cand(int, MkFinder::IdxChi2List const&)  (167.911727744)
3: [SUMMARY] CandCloner::ProcessSeedRange(int, int)  (152.986385326)
4: [SUMMARY] __gnu_cxx::new_allocator<std::pair<int, int> >::allocate(unsigned long, void const*)  (67.0419350112)
5: [SUMMARY] Matriplex::Matriplex<float, 6, 1, 8>::CopyIn(int, float const*)  (51.053994364)
6: [SUMMARY] Matriplex::Matriplex<float, 6, 1, 8>::CopyOut(int, float*) const  (36.4479128446)
7: [SUMMARY] Matriplex::Matriplex<float, 6, 1, 8>::operator=(Matriplex::Matriplex<float, 6, 1, 8> const&)  (29.0821664127)
8: [SUMMARY] Matriplex::Matriplex<float, 1, 1, 8>::operator()(int, int, int)  (18.3002056245)
9: [SUMMARY] Matriplex::Matriplex<float, 3, 1, 8>::ConstAt(int, int, int) const  (15.5222161589)
10: 

In [9]:
add_L1_missrate(expr_intervals)
get_hotspots('DERIVED_L1_MISSRATE')

selected metric: DERIVED_L1_MISSRATE

Hotspot Analysis Summary
The code regions with largest inclusive time are: 
1: [SUMMARY] __GI___sched_yield  (21.2451109411)
2: [SUMMARY] (anonymous namespace)::MultHelixPropTransp(Matriplex::Matriplex<float, 6, 6, 8> const&, Matriplex::Matriplex<float, 6, 6, 8> const&, Matriplex::MatriplexSym<float, 6, 8>&)  (9.00434493368)
3: [SUMMARY] Matriplex::Matriplex<int, 1, 1, 8>::operator()(int, int, int) const  (5.63956221722)
4: [SUMMARY] std::sqrt(float)  (1.65987949413)
5: [SUMMARY] LayerInfo::is_within_r_sensitive_region(float, float) const  (0.393683020968)
6: [SUMMARY] CandCloner::add_cand(int, MkFinder::IdxChi2List const&)  (0.344553093074)
7: [SUMMARY] propagateHelixToRMPlex(Matriplex::MatriplexSym<float, 6, 8> const&, Matriplex::Matriplex<float, 6, 1, 8> const&, Matriplex::Matriplex<int, 1, 1, 8> const&, Matriplex::Matriplex<float, 3, 1, 8> const&, Matriplex::MatriplexSym<float, 6, 8>&, Matriplex::Matriplex<float, 6, 1, 8>&, int, bool)  (0.26626

In [10]:
get_hotspots('PAPI_NATIVE_UOPS_RETIRED:SCALAR_SIMD')

selected metric: PAPI_NATIVE_UOPS_RETIRED:SCALAR_SIMD

Hotspot Analysis Summary
The code regions with largest inclusive time are: 
1: [SUMMARY] Event::clean_cms_seedtracks()  (329464998)
2: [SUMMARY] syscall  (80975056)
3: [SUMMARY] _int_malloc  (61421041)
4: [SUMMARY] std::__detail::_Mod_range_hashing::operator()(unsigned long, unsigned long) const  (41359694)
5: [SUMMARY] std::_Hashtable<int, std::pair<int const, int>, std::allocator<std::pair<int const, int> >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_rehash_aux(unsigned long, std::integral_constant<bool, true>)  (39915075)
6: [SUMMARY] __GI___libc_malloc  (37976506)
7: [SUMMARY] std::_Hashtable<int, std::pair<int const, int>, std::allocator<std::pair<int const, int> >, std::__detail::_Select1st, std::equal_to<int>, std::hash<int>, std::__detail: