# Notebook to verify counters and stuff

Author: Brain Gravelle (gravelle@cs.uoregon.edu)


This notebook uses the taucmdr Python libraries from ParaTools:
http://taucommander.paratools.com/


## Imports
This section imports the necessary libraries and the metrics.py and utilities.py files, and sets up the window.

In [13]:
# A couple of scripts to set the environment and import data from a .tau set of results
from utilities import *
from metrics import *
# Plotting, notebook settings:
%matplotlib inline  
#plt.rcParams.update({'font.size': 16})
import numbers
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
pd.set_option('display.float_format', lambda x: '%.2e' % x)
pd.set_option('display.max_columns',100)
pd.set_option('max_colwidth', 70)

## Getting Data

TAU Commander uses TAU to run the application and measure it using runtime sampling techniques (similar to Intel VTune). Many customization options are available. For example, we may consider each function regardless of calling context, or we may decide to enable callpath profiling to see each context separately.

From the talapas_scaling application, the following experiments are available. They were run on Talapas (with 28-thread Broadwell processors) using the build-ce (realistic) option for mkFit. The first six experiments use the --num-thr option to set the thread count, which is intended to thread the work within each event. The last two add the --num-ev-thr option to set the number of event threads, so that all threads process events in parallel and each event is handled by a single thread.
* manual_scaling_Large_talapas		
* manual_scaling_Large_talapas_fullnode	
* manual_scaling_TTbar70_talapas		
* manual_scaling_TTbar70_talapas_fullnode
* manual_scaling_TTbar35_talapas
* manual_scaling_TTbar35_talapas_fullnode
* ev_thr_scaling_Large_talapas
* ev_thr_scaling_Large_talapas_fullnode

The cori_scaling application additionally provides the experiment listed below. It was run on the KNL nodes of NERSC's Cori with the default memory settings (quad: one NUMA domain; cache: MCDRAM as a direct-mapped cache). See http://www.nersc.gov/users/computational-systems/cori/running-jobs/advanced-running-jobs-options/ for more information on the KNL modes. Like the talapas_scaling experiments, it uses the build-ce option and threading within each event.
* manual_scaling_TTbar35


### Importing Scaling Data
Here we import the data. The cell below currently selects the L2 experiment of the cache_test application; the Talapas and Cori selections are kept as commented-out alternatives. Note that this cell can take 10 minutes or more to run; please go enjoy a coffee while you wait.

In [14]:
# application = "talapas_scaling"
# experiment  = "manual_scaling_TTbar70_talapas"
# experiment  = "manual_scaling_Large_talapas"
# experiment = "ev_thr_scaling_Large_talapas"

# application = "cori_scaling"
# experiment  = "manual_scaling_TTbar35"

application = "cache_test"
experiment  = "L2"

path  = ".tau/" + application + "/" + experiment + "/"
# note that this function takes a long time to run, so only rerun if you must
metric_data  = get_pandas_scaling(path, callpaths=True)
# metric_data = remove_erroneous_threads(metric_data,  [1, 8, 16, 32, 48, 56])
# metric_data = remove_erroneous_threads(metric_data,  [1, 8, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256])
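
Because the import is slow, it may help to confirm that the profile directory exists before calling get_pandas_scaling. The following is a minimal sketch, assuming the .tau/application/experiment/ layout used in the cell above. The name `profile_path` is a hypothetical helper for illustration and is not part of utilities.py.

```python
import os

def profile_path(application, experiment, root=".tau"):
    """Hypothetical helper: build the profile directory path and report
    whether it exists on disk."""
    path = os.path.join(root, application, experiment) + os.sep
    return path, os.path.isdir(path)

path, exists = profile_path("cache_test", "L2")
if not exists:
    # Avoid starting a long-running import against a missing directory.
    print("No profile directory at %s; skipping the long import" % path)
```

If the directory is present, `path` can be passed to get_pandas_scaling exactly as in the cell above.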

#### A list of metrics

In [15]:

add_metric_to_scaling_data(metric_data, add_L2_missrate)
print_available_metrics(metric_data,True)
print(metric_data.keys())

PAPI_L2_TCA
DERIVED_L2_MISSRATE
PAPI_L2_TCM
[16]
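
The definition of add_L2_missrate lives in metrics.py and is not shown in this notebook. A common definition of a cache miss rate is total misses divided by total accesses, so the sketch below assumes DERIVED_L2_MISSRATE = PAPI_L2_TCM / PAPI_L2_TCA. Both `l2_miss_rate` and `is_plausible` are hypothetical stand-ins, not the functions from metrics.py. The sample values are the application-level totals for thread 0 that appear in the PAPI_L2_TCM and PAPI_L2_TCA tables later in this notebook.

```python
def l2_miss_rate(tcm, tca):
    """Assumed definition: misses / accesses.
    Returns None when there are no recorded accesses."""
    if tca == 0:
        return None
    return float(tcm) / tca

def is_plausible(rate):
    """Under the assumed definition, a rate should fall in [0, 1]."""
    return rate is not None and 0.0 <= rate <= 1.0

# Thread 0, ".TAU application": TCM = 1.23e8, TCA = 3.11e8
rate = l2_miss_rate(1.23e8, 3.11e8)
print("%.3f plausible=%s" % (rate, is_plausible(rate)))
```

If the assumed definition holds, a derived value above 1 would fail this check. With sampled counters, that may point to too few samples in a region and not to real cache behavior.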


#### Metric metadata

In [16]:
#print_metadata(metric_data[1])

In [57]:
L2_data = select_metric_from_scaling(metric_data, 'DERIVED_L2_MISSRATE')
L2_MR = filter_libs_out(L2_data[16]).sort_values(by='Inclusive',ascending=False)[["Inclusive"]]
L2_MR.head(200)

thread,region,Inclusive
1,"[SUMMARY] Matriplex::MatriplexSym<float, 6, 8>::operator=(Matriplex::MatriplexSym<float, 6, 8> const&)",1.27e+02
1,"[SUMMARY] .TAU application => [CONTEXT] .TAU application => [SAMPLE] Matriplex::MatriplexSym<float, 6, 8>::operator=(Matriplex::MatriplexSym<float, 6, 8> const&)",1.27e+02
13,[SUMMARY] .TAU application => [CONTEXT] .TAU application => [SAMPLE] __lll_lock_wait_private,3.13e+01
13,[SUMMARY] __lll_lock_wait_private,3.13e+01
12,"[SUMMARY] .TAU application => [CONTEXT] .TAU application => [SAMPLE] RadixSort::Sort(float const*, unsigned int)",2.62e+01
12,"[SUMMARY] RadixSort::Sort(float const*, unsigned int)",2.62e+01
11,"[SUMMARY] ROOT::Math::SVector<float, 6u>::SVector(ROOT::Math::SVector<float, 6u> const&)",2.26e+01
11,"[SUMMARY] .TAU application => [CONTEXT] .TAU application => [SAMPLE] ROOT::Math::SVector<float, 6u>::SVector(ROOT::Math::SVector<float, 6u> const&)",2.26e+01
9,"[SUMMARY] .TAU application => [CONTEXT] .TAU application => [SAMPLE] helixAtZ(Matriplex::Matriplex<float, 6, 1, 8> const&, Matriplex::Matriplex<int, 1, 1, 8> const&, Matriplex::Matriplex<float, 6, 1, 8>&, Matriplex::Matriplex<float, 1, 1, 8> const&, Matriplex::Matriplex<float, 6, 6, 8>&, int, bool)",2.24e+01
9,"[SUMMARY] helixAtZ(Matriplex::Matriplex<float, 6, 1, 8> const&, Matriplex::Matriplex<int, 1, 1, 8> const&, Matriplex::Matriplex<float, 6, 1, 8>&, Matriplex::Matriplex<float, 1, 1, 8> const&, Matriplex::Matriplex<float, 6, 6, 8>&, int, bool)",2.24e+01
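
Each region in the table above appears twice because callpath profiling is on: once as a flat [SUMMARY] entry and once as a chain ending in a [SAMPLE] entry. The sketch below keeps only the flat entries. It assumes that chains are always joined by " => " as in this output, and `is_flat_region` is a hypothetical helper, not part of utilities.py.

```python
def is_flat_region(region):
    """True for entries without a callpath chain (no ' => ' separator)."""
    return " => " not in region

# Region names copied from the table above.
regions = [
    "[SUMMARY] __lll_lock_wait_private",
    "[SUMMARY] .TAU application => [CONTEXT] .TAU application"
    " => [SAMPLE] __lll_lock_wait_private",
    "[SUMMARY] RadixSort::Sort(float const*, unsigned int)",
]

flat = [r for r in regions if is_flat_region(r)]
print(flat)
```

On a DataFrame indexed like the ones in this notebook, a similar mask could be built with `~df.index.get_level_values('region').str.contains(" => ", regex=False)`.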


In [20]:
TCA_data = select_metric_from_scaling(metric_data, 'PAPI_L2_TCA')
TCA = filter_libs_out(TCA_data[16]).sort_values(by='Inclusive',ascending=False)[["Inclusive"]]
TCA.head(10)

context,thread,region,Inclusive
0,0,[SUMMARY] .TAU application,311000000.0
0,0,[SUMMARY] .TAU application => [CONTEXT] .TAU application,311000000.0
0,0,[SUMMARY] __lll_lock_wait_private,94900000.0
0,0,[SUMMARY] .TAU application => [CONTEXT] .TAU application => [SAMPLE] __lll_lock_wait_private,94900000.0
0,0,[SUMMARY] .TAU application => [CONTEXT] .TAU application => [SAMPLE] __GI_madvise,94900000.0
0,0,[SUMMARY] __GI_madvise,94900000.0
0,6,[SUMMARY] .TAU application,27400000.0
0,6,[SUMMARY] .TAU application => [CONTEXT] .TAU application,27400000.0
0,14,[SUMMARY] .TAU application => [CONTEXT] .TAU application,27400000.0
0,14,[SUMMARY] .TAU application,27400000.0


In [24]:
TCM_data = select_metric_from_scaling(metric_data, 'PAPI_L2_TCM')
TCM = filter_libs_out(TCM_data[16]).sort_values(by='Inclusive',ascending=False)[["Inclusive"]]
TCM.head(10)

context,thread,region,Inclusive
0,0,[SUMMARY] .TAU application,123000000.0
0,0,[SUMMARY] .TAU application => [CONTEXT] .TAU application,123000000.0
0,7,[SUMMARY] .TAU application,13800000.0
0,7,[SUMMARY] .TAU application => [CONTEXT] .TAU application,13800000.0
0,13,[SUMMARY] .TAU application,13500000.0
0,13,[SUMMARY] .TAU application => [CONTEXT] .TAU application,13500000.0
0,11,[SUMMARY] .TAU application => [CONTEXT] .TAU application,13500000.0
0,11,[SUMMARY] .TAU application,13500000.0
0,5,[SUMMARY] .TAU application => [CONTEXT] .TAU application,13300000.0
0,5,[SUMMARY] .TAU application,13300000.0


In [48]:
TCM_data[16][TCM_data[16].index.get_level_values('region').str.contains("__lll_lock_wait_private")]['Inclusive']

context  thread  region                                                                                         
0        1       [SUMMARY] .TAU application  => [CONTEXT] .TAU application  => [SAMPLE] __lll_lock_wait_private    6.92e+03
                 [SUMMARY] __lll_lock_wait_private                                                                 6.92e+03
         2       [SUMMARY] .TAU application  => [CONTEXT] .TAU application  => [SAMPLE] __lll_lock_wait_private    2.40e+04
                 [SUMMARY] __lll_lock_wait_private                                                                 2.40e+04
         5       [SUMMARY] .TAU application  => [CONTEXT] .TAU application  => [SAMPLE] __lll_lock_wait_private    3.69e+04
                 [SUMMARY] __lll_lock_wait_private                                                                 3.69e+04
         6       [SUMMARY] .TAU application  => [CONTEXT] .TAU application  => [SAMPLE] __lll_lock_wait_private    2.24e+04
                 [S

In [53]:
TCA_data[16][TCA_data[16].index.get_level_values('region').str.contains(r"applyMaterialEffects\(Matriplex::Matriplex")]['Inclusive']

context  thread  region                                                                                                                                                                                                                                                                   
0        2       [SUMMARY] .TAU application  => [CONTEXT] .TAU application  => [SAMPLE] applyMaterialEffects(Matriplex::Matriplex<float, 1, 1, 8> const&, Matriplex::Matriplex<float, 1, 1, 8> const&, Matriplex::MatriplexSym<float, 6, 8>&, Matriplex::Matriplex<float, 6, 1, 8>&, int)    6.70e+05
                 [SUMMARY] applyMaterialEffects(Matriplex::Matriplex<float, 1, 1, 8> const&, Matriplex::Matriplex<float, 1, 1, 8> const&, Matriplex::MatriplexSym<float, 6, 8>&, Matriplex::Matriplex<float, 6, 1, 8>&, int)                                                                 6.70e+05
         4       [SUMMARY] .TAU application  => [CONTEXT] .TAU application  => [SAMPLE] applyMaterialEffects(Matriplex::Matriplex

In [54]:
L2_data[16][L2_data[16].index.get_level_values('region').str.contains(r"applyMaterialEffects\(Matriplex::Matriplex")]['Inclusive']

thread  region                                                                                                                                                                                                                                                                   
4       [SUMMARY] .TAU application  => [CONTEXT] .TAU application  => [SAMPLE] applyMaterialEffects(Matriplex::Matriplex<float, 1, 1, 8> const&, Matriplex::Matriplex<float, 1, 1, 8> const&, Matriplex::MatriplexSym<float, 6, 8>&, Matriplex::Matriplex<float, 6, 1, 8>&, int)    7.18e-01
        [SUMMARY] applyMaterialEffects(Matriplex::Matriplex<float, 1, 1, 8> const&, Matriplex::Matriplex<float, 1, 1, 8> const&, Matriplex::MatriplexSym<float, 6, 8>&, Matriplex::Matriplex<float, 6, 1, 8>&, int)                                                                 7.18e-01
6       [SUMMARY] .TAU application  => [CONTEXT] .TAU application  => [SAMPLE] applyMaterialEffects(Matriplex::Matriplex<float, 1, 1, 8> const&, Matriplex::
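
The lookups above escape the opening parenthesis by hand because str.contains treats its pattern as a regular expression by default. C++ signatures contain many regex metacharacters such as `(`, `*`, and `&`, so building the pattern with `re.escape` may be less error-prone. The following is a small sketch on plain strings, using region names taken from the tables in this notebook; `literal_pattern` is a hypothetical helper.

```python
import re

def literal_pattern(text):
    """Compile a regex that matches `text` literally, metacharacters included."""
    return re.compile(re.escape(text))

regions = [
    "[SUMMARY] RadixSort::Sort(float const*, unsigned int)",
    "[SUMMARY] __lll_lock_wait_private",
]

# The '(' and '*' would otherwise be read as regex syntax.
pat = literal_pattern("RadixSort::Sort(float const*")
hits = [r for r in regions if pat.search(r)]
print(hits)
```

With pandas, passing `regex=False` to `str.contains` is another way to get a literal substring match without escaping.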