# Hotspot Calculations for HEP Codes

Author: Brain Gravelle (gravelle@cs.uoregon.edu)


This notebook uses the taucmdr Python libraries from ParaTools (TAU Commander):
http://taucommander.paratools.com/


## Imports

In [1]:
from utilities import *
from metrics import *

## Getting the data from the tau profiles.

You need a `.tau` directory in the same directory as this notebook, containing all the relevant profile data and the XML files that TAU Commander stores there.

The result of the next cell is a dict whose keys are the metric names and whose values are pandas DataFrames. The cell also prints the list of metrics.

The functions that import the data are found in `utilities.py`.

Note:
* make sure that `application` and `experiment` are defined correctly
    * use `tau dash` in the project folder to check the available options
* ALL trials from the experiment will be read, but only one set of results is kept per metric
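
To make the data layout concrete, here is a minimal sketch that builds a toy dictionary in the same shape as the one printed by the next cell: one DataFrame per metric, indexed by `rank`, `context`, `thread` and `region`, with the columns `Calls`, `Subcalls`, `Exclusive`, `Inclusive` and `ProfileCalls`. The region names and all values are made up for illustration, and `toy_intervals` is a hypothetical stand-in for what `get_pandas` returns.

```python
import pandas as pd

# Hypothetical toy rows: (rank, context, thread, region)
index = pd.MultiIndex.from_tuples(
    [
        (0, 0, 0, "[SUMMARY] foo"),
        (0, 0, 0, "[SUMMARY] bar"),
        (0, 0, 1, "[SUMMARY] foo"),
    ],
    names=["rank", "context", "thread", "region"],
)

# Same column names as the real DataFrames; the numbers are invented.
toy = pd.DataFrame(
    {
        "Calls": [1, 2, 1],
        "Subcalls": [0, 0, 0],
        "Exclusive": [50, 300, 120],
        "Inclusive": [80, 300, 150],
        "ProfileCalls": [0, 0, 0],
    },
    index=index,
)

# One DataFrame per metric name, as described above.
toy_intervals = {"PAPI_TOT_INS": toy}

# Same query as in the notebook cell: largest inclusive counts first.
top = (
    toy_intervals["PAPI_TOT_INS"]
    .sort_values(by="Inclusive", ascending=False)[["Inclusive"]]
    .head(2)
)
print(top)
```

Because the index carries the thread and region, the same region can appear once per thread, which is why `syscall` shows up on several rows of the real output.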

In [11]:
application = "mictest_sampling"
experiment  = "multi"

path = ".tau/" + application + "/" + experiment + "/"
expr_intervals = get_pandas(path)

#level_inds = {'trial': 0, 'rank': 1, 'context': 2, 'thread': 3, 'region': 4}
print(expr_intervals.keys())
print(expr_intervals['PAPI_TOT_INS'].columns)
print(expr_intervals['PAPI_TOT_INS'].index.names)

expr_intervals['PAPI_TOT_INS'].sort_values(by='Inclusive',ascending=False)[["Inclusive"]].head(10)


['PAPI_NATIVE_UOPS_RETIRED:SCALAR_SIMD', 'PAPI_L2_TCA', 'PAPI_NATIVE_LLC_MISSES', 'PAPI_RES_STL', 'PAPI_L2_TCM', 'PAPI_TOT_INS', 'PAPI_NATIVE_UOPS_RETIRED:PACKED_SIMD', 'PAPI_LST_INS', 'PAPI_L1_TCM', 'PAPI_TOT_CYC', 'PAPI_NATIVE_LLC_REFERENCES']
Index([u'Calls', u'Subcalls', u'Exclusive', u'Inclusive', u'ProfileCalls'], dtype='object')
[u'rank', u'context', u'thread', u'region']


rank,context,thread,region,Inclusive
0,0,2,[SUMMARY] syscall,818676966
0,0,0,[SUMMARY] __read_nocancel,814210144
0,0,5,[SUMMARY] syscall,753760296
0,0,0,"[SUMMARY] make_validation_tree(char const*, std::vector<Track, std::allocator<Track> >&, std::vector<Track, std::allocator<Track> >&)",709041871
0,0,8,[SUMMARY] syscall,596480719
0,0,7,[SUMMARY] syscall,577221647
0,0,1,[SUMMARY] syscall,560194742
0,0,6,[SUMMARY] syscall,557631241
0,0,9,[SUMMARY] syscall,414965719
0,0,3,[SUMMARY] syscall,405237366


### Importing Scaling Data
No scaling data is available yet, so do not use this cell; it is left commented out.

In [12]:
# scale_intervals = get_pandas_scaling('.tau/mictest_sampling/scaling/')

#level_inds = {'trial': 0, 'rank': 1, 'context': 2, 'thread': 3, 'region': 4}
# print(scale_intervals.keys())

# print(scale_intervals[10].keys())
# print(scale_intervals[10]['PAPI_TOT_INS'].columns)
# print(scale_intervals[10]['PAPI_TOT_INS'].index.names)

# scale_intervals[10]['PAPI_TOT_INS'][["Inclusive"]].head(10)

## Adding metrics

TODO: work out how to combine metrics
  - `region` identifies where each sample was taken
  - use the region to match rows when dividing one metric by another
  - keep threads separate
  - after that, a derived metric should be plain arithmetic
  - trials need to be ignored when combining

These functions add derived metrics to the dictionary.
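
As an illustration of the "just do math" idea, the sketch below computes cycles per instruction by dividing one toy DataFrame by another. It is not the `add_CPI` implementation from `metrics.py`, which is not shown here; the helper `toy_frame` and all numbers are assumptions. The point is that pandas aligns rows on the full (rank, context, thread, region) index, so threads stay separate and a region missing from one metric yields `NaN` rather than a wrong ratio.

```python
import pandas as pd

LEVELS = ["rank", "context", "thread", "region"]

def toy_frame(rows):
    """Hypothetical helper: rows are (rank, context, thread, region, excl, incl)."""
    index = pd.MultiIndex.from_tuples([r[:4] for r in rows], names=LEVELS)
    return pd.DataFrame(
        {"Exclusive": [r[4] for r in rows], "Inclusive": [r[5] for r in rows]},
        index=index,
    )

# Invented counter values standing in for PAPI_TOT_CYC and PAPI_TOT_INS.
cycles = toy_frame([
    (0, 0, 0, "foo", 300.0, 600.0),
    (0, 0, 0, "bar", 100.0, 100.0),
    (0, 0, 1, "foo", 90.0, 90.0),   # sampled on thread 1 for cycles only
])
instructions = toy_frame([
    (0, 0, 0, "foo", 200.0, 300.0),
    (0, 0, 0, "bar", 400.0, 400.0),
])

# Element-wise division; rows are matched by index, not by position.
cpi = cycles / instructions
print(cpi)
```

Dividing whole frames applies the ratio to every shared column, which would explain why `Calls` in the `DERIVED_CPI` output holds ratios instead of counts.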

In [13]:
add_CPI(expr_intervals)
print(expr_intervals.keys())

expr_intervals['DERIVED_CPI'].head(10)

['PAPI_NATIVE_UOPS_RETIRED:SCALAR_SIMD', 'PAPI_L2_TCA', 'PAPI_NATIVE_LLC_MISSES', 'PAPI_RES_STL', 'PAPI_L2_TCM', 'PAPI_TOT_INS', 'PAPI_NATIVE_UOPS_RETIRED:PACKED_SIMD', 'DERIVED_CPI', 'PAPI_LST_INS', 'PAPI_L1_TCM', 'PAPI_TOT_CYC', 'PAPI_NATIVE_LLC_REFERENCES']


context,thread,region,Calls,Exclusive,Inclusive,ProfileCalls,Subcalls
0,0,"[SUMMARY] (anonymous namespace)::MultHelixProp(Matriplex::Matriplex<float, 6, 6, 16> const&, Matriplex::MatriplexSym<float, 6, 16> const&, Matriplex::Matriplex<float, 6, 6, 16>&)",1.0,1.260023,1.260023,,
0,0,[SUMMARY] Event::resetLayerHitMap(bool),1.3,1.55595,1.55595,,
0,0,[SUMMARY] Hit::Hit(),1.5,2.055617,2.055617,,
0,0,"[SUMMARY] Matriplex::MatriplexSym<float, 3, 16>::SlurpIn(char const*, __m512i&)",1.0,1.33131,1.33131,,
0,0,"[SUMMARY] Matriplex::MatriplexSym<float, 6, 16>::CopyOut(int, float*) const",1.0,1.160391,1.160391,,
0,0,"[SUMMARY] Matriplex::MatriplexSym<float, 6, 16>::Subtract(Matriplex::MatriplexSym<float, 6, 16> const&, Matriplex::MatriplexSym<float, 6, 16> const&)",1.0,1.190664,1.190664,,
0,0,"[SUMMARY] MkFitter::SlurpInTracksAndHits(std::vector<Track, std::allocator<Track> > const&, std::vector<std::vector<Hit, std::allocator<Hit> >, std::allocator<std::vector<Hit, std::allocator<Hit> > > > const&, int, int)",1.0,1.156051,1.156051,,
0,0,"[SUMMARY] ROOT::Math::SVector<float, 3u>::SVector()",0.2,0.252146,0.252146,,
0,0,"[SUMMARY] ROOT::Math::SVector<float, 6u>::SVector()",0.875,1.129869,1.129869,,
0,0,[SUMMARY] __intel_mic_avx512f_memcpy,1.0,1.410934,1.410934,,


### Metric Generation

`gen_metric` generates the boilerplate of a metric-adding function. To use it:
* list the metrics you will use
* provide a name for the new metric
* paste the output into `metrics.py`
* implement the calculation itself
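
For orientation, this is one way a filled-in metric function could look once the calculation is implemented. It is a hypothetical sketch, not the code in `metrics.py`: the name `add_vector_per_miss`, the toy data, and the choice of a plain row-by-row ratio are all assumptions. It uses quoted string keys (assuming the dictionary is indexed the way the printed key list suggests, with a colon in the SIMD counter name) and `in` instead of `has_key`, and it leaves out the `droplevel` and `unstack` steps of the generated skeleton.

```python
import pandas as pd

VECTOR_KEY = "PAPI_NATIVE_UOPS_RETIRED:PACKED_SIMD"
MISS_KEY = "PAPI_L1_TCM"

def add_vector_per_miss(metrics):
    """Hypothetical filled-in metric function; not the code in metrics.py."""
    # Guard: both input metrics must be present in the dictionary.
    for key in (VECTOR_KEY, MISS_KEY):
        if key not in metrics:
            print("ERROR adding VECTOR_PER_MISS to metric dictionary")
            return False
    vector_uops = metrics[VECTOR_KEY].copy()
    l1_misses = metrics[MISS_KEY].copy()
    # The "math bit": packed SIMD uops per L1 miss, matched row by row.
    metrics["VECTOR_PER_MISS"] = vector_uops / l1_misses
    return True

# Invented toy data with a single index level to keep the example short.
regions = pd.Index(["foo", "bar"], name="region")
toy_metrics = {
    VECTOR_KEY: pd.DataFrame({"Exclusive": [400.0, 90.0]}, index=regions),
    MISS_KEY: pd.DataFrame({"Exclusive": [100.0, 30.0]}, index=regions),
}
ok = add_vector_per_miss(toy_metrics)
print(ok)
print(toy_metrics["VECTOR_PER_MISS"])
```

Returning `False` instead of raising keeps the behaviour of the generated skeleton, so a missing counter does not stop the rest of the notebook.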

In [14]:
print(gen_metric(['PAPI_NATIVE_UOPS_RETIRED_PACKED_SIMD', 'PAPI_L1_TCM'], "VECTOR_PER_MISS"))

def add_VECTOR_PER_MISS(metrics):
	if (not metrics.has_key(PAPI_NATIVE_UOPS_RETIRED_PACKED_SIMD)):
		print 'ERROR adding VECTOR_PER_MISS to metric dictionary'
		return False	a0 = metrics[PAPI_NATIVE_UOPS_RETIRED_PACKED_SIMD].copy()
	a0.index = a0.index.droplevel()
	u0 = a0.unstack()
	if (not metrics.has_key(PAPI_L1_TCM)):
		print 'ERROR adding VECTOR_PER_MISS to metric dictionary'
		return False	a1 = metrics[PAPI_L1_TCM].copy()
	a1.index = a1.index.droplevel()
	u1 = a1.unstack()
	metrics[VECTOR_PER_MISS] = "PLEASE IMPLEMENT THIS PART"

	return True





## Interesting bits

This is where the hotspots are actually calculated.
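
The summaries printed by the cells in this section rank code regions by standard deviation. A minimal sketch of that idea follows; it is not the `hotspots` function from `utilities.py`, whose implementation and third argument are not shown here. The function name `top_regions_by_std`, the choice of the `Exclusive` column, and the toy values are assumptions.

```python
import pandas as pd

def top_regions_by_std(frame, n, column="Exclusive"):
    """Hypothetical ranking: spread of one column across threads, per region."""
    per_region = frame[column].groupby(level="region").std()
    return per_region.sort_values(ascending=False).head(n)

# Invented values: each region sampled on two threads.
index = pd.MultiIndex.from_tuples(
    [
        (0, 0, 0, "foo"), (0, 0, 1, "foo"),
        (0, 0, 0, "bar"), (0, 0, 1, "bar"),
        (0, 0, 0, "baz"), (0, 0, 1, "baz"),
    ],
    names=["rank", "context", "thread", "region"],
)
toy = pd.DataFrame({"Exclusive": [10.0, 30.0, 5.0, 5.0, 100.0, 0.0]}, index=index)

top = top_regions_by_std(toy, 2)
print(top)
```

A region with a large spread is one where threads do very different amounts of work, which can point to load imbalance rather than to a uniformly expensive function.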

In [15]:
# levels: 0=trial, 1=node, 2=context, 3=thread, 4=region name -- deprecated
# levels: 0=rank, 1=context, 2=thread, 3=region name -- deprecated

       
n=10
def get_hotspots(metric):
    print('selected metric: %s\n' %metric)
    hotspots(expr_intervals[metric], n, 1)

    print('='*80)

    filtered_dfs = filter_libs_out(expr_intervals[metric])
    hotspots(filtered_dfs, n, 1)
    
get_hotspots('PAPI_TOT_CYC')



selected metric: PAPI_TOT_CYC

Hotspot Analysis Summary
The code regions with largest standard deviation are: 
1: [SUMMARY] __read_nocancel  (1301223522)
2: [SUMMARY] syscall  (1055965029)
3: [SUMMARY] std::vector<Hit, std::allocator<Hit> >::size() const  (486234193)
4: [SUMMARY] Event::resetLayerHitMap(bool)  (448552160)
5: [SUMMARY] make_validation_tree(char const*, std::vector<Track, std::allocator<Track> >&, std::vector<Track, std::allocator<Track> >&)  (425212633)
6: [SUMMARY] ROOT::Math::SVector<float, 6u>::SVector()  (247798970)
7: [SUMMARY] void _INTERNAL1f3c31c2::helixAtRFromIterativeCCS_impl<Matriplex::Matriplex<float, 6, 1, 16>, Matriplex::Matriplex<int, 1, 1, 16>, Matriplex::Matriplex<float, 6, 1, 16>, Matriplex::Matriplex<float, 1, 1, 16>, Matriplex::Matriplex<float, 6, 6, 16> >(Matriplex::Matriplex<float, 6, 1, 16> const&, Matriplex::Matriplex<int, 1, 1, 16> const&, Matriplex::Matriplex<float, 6, 1, 16>&, Matriplex::Matriplex<float, 1, 1, 16> const&, Matriplex::Matriplex<

In [16]:
get_hotspots('DERIVED_CPI')

selected metric: DERIVED_CPI

Hotspot Analysis Summary
The code regions with largest standard deviation are: 
1: [SUMMARY] Matriplex::MatriplexSym<float, 3, 16>::SlurpIn(char const*, __m512i&)  (7.18975553377)
2: [SUMMARY] Matriplex::Matriplex<float, 6, 1, 16>::CopyOut(int, float*) const  (6.6489529719)
3: [SUMMARY] Matriplex::MatriplexSym<float, 6, 16>::Subtract(Matriplex::MatriplexSym<float, 6, 16> const&, Matriplex::MatriplexSym<float, 6, 16> const&)  (5.45265132461)
4: [SUMMARY] std::sqrt(float)  (5.33063161979)
5: [SUMMARY] MkFitter::SlurpInTracksAndHits(std::vector<Track, std::allocator<Track> > const&, std::vector<std::vector<Hit, std::allocator<Hit> >, std::allocator<std::vector<Hit, std::allocator<Hit> > > > const&, int, int)  (5.16927036744)
6: [SUMMARY] Matriplex::Matriplex<int, 1, 1, 16>::operator()(int, int, int)  (4.65733899159)
7: [SUMMARY] (anonymous namespace)::MultHelixProp(Matriplex::Matriplex<float, 6, 6, 16> const&, Matriplex::MatriplexSym<float, 6, 16> const&, Mat

In [17]:
add_L1_missrate(expr_intervals)
get_hotspots('DERIVED_L1_MISSRATE')

selected metric: DERIVED_L1_MISSRATE

Hotspot Analysis Summary
The code regions with largest standard deviation are: 
1: [SUMMARY] void _INTERNAL1f3c31c2::helixAtRFromIterativeCCS_impl<Matriplex::Matriplex<float, 6, 1, 16>, Matriplex::Matriplex<int, 1, 1, 16>, Matriplex::Matriplex<float, 6, 1, 16>, Matriplex::Matriplex<float, 1, 1, 16>, Matriplex::Matriplex<float, 6, 6, 16> >(Matriplex::Matriplex<float, 6, 1, 16> const&, Matriplex::Matriplex<int, 1, 1, 16> const&, Matriplex::Matriplex<float, 6, 1, 16>&, Matriplex::Matriplex<float, 1, 1, 16> const&, Matriplex::Matriplex<float, 6, 6, 16>&, int, int, int, bool)  (0.224730558282)
2: [SUMMARY] _INTERNAL_27_______src_tbb_scheduler_cpp_dbc24cd9::__TBB_machine_pause(int)  (0.0612775331484)
3: [SUMMARY] __intel_mic_avx512f_memset  (0.0539007582363)
4: [SUMMARY] Matriplex::MatriplexSym<float, 3, 16>::SlurpIn(char const*, __m512i&)  (0.0532490464803)
5: [SUMMARY] __intel_mic_avx512f_memcpy  (0.0464728354125)
6: [SUMMARY] Matriplex::MatriplexSym<f

In [18]:
get_hotspots('PAPI_NATIVE_UOPS_RETIRED:SCALAR_SIMD')

selected metric: PAPI_NATIVE_UOPS_RETIRED:SCALAR_SIMD

Hotspot Analysis Summary
The code regions with largest standard deviation are: 
1: [SUMMARY] __read_nocancel  (336101904)
2: [SUMMARY] Event::resetLayerHitMap(bool)  (5005640)
3: [SUMMARY] make_validation_tree(char const*, std::vector<Track, std::allocator<Track> >&, std::vector<Track, std::allocator<Track> >&)  (4877635)
4: [SUMMARY] void std::__uninitialized_default_n_1<false>::__uninit_default_n<Hit*, unsigned long>(Hit*, unsigned long)  (4655578)
5: [SUMMARY] std::vector<Hit, std::allocator<Hit> >::size() const  (4235558)
6: [SUMMARY] ROOT::Math::SVector<float, 3u>::SVector()  (2695292)
7: [SUMMARY] std::vector<HitID, std::allocator<HitID> >::size() const  (1925263)
8: [SUMMARY] _IO_file_xsgetn  (1925188)
9: [SUMMARY] Matriplex::MatriplexSym<float, 3, 16>::SlurpIn(char const*, __m512i&)  (1540212)
10: [SUMMARY] ROOT::Math::SMatrix<float, 6u, 6u, ROOT::Math::MatRepSym<float, 6u> >::SMatrix()  (1540149)
Hotspot Analysis Summary
T