# Profiling BatchFlow code

A profile is a set of statistics that describes how often and for how long various parts of the program executed.
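
As a minimal illustration of what such statistics look like, the sketch below profiles a toy function with Python's standard `cProfile` and `pstats` modules. The functions `inner` and `outer` are made up for this example and are not part of BatchFlow; the column names `ncalls`, `tottime` and `cumtime` are the same ones that appear in the tables later in this notebook.

```python
import cProfile
import pstats


def inner(n):
    """Do some busy work so that the profiler has something to measure."""
    return sum(i * i for i in range(n))


def outer():
    """Call `inner` several times."""
    return [inner(10_000) for _ in range(5)]


profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()

stats = pstats.Stats(profiler)

# `stats.stats` maps (file_name, line_number, function_name) to
# (primitive calls, total calls, tottime, cumtime, callers)
calls = {}
for (_, _, func_name), (_, ncalls, tottime, cumtime, _) in stats.stats.items():
    calls[func_name] = {'ncalls': ncalls, 'tottime': tottime, 'cumtime': cumtime}

print(calls['inner'])  # how often and for how long `inner` ran
print(calls['outer'])  # `cumtime` of `outer` includes the time spent in `inner`
```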

This notebook shows how to profile two parts of BatchFlow: pipelines and models.

In [1]:
import sys

sys.path.append("../../..")
from batchflow import B, V, W
from batchflow.opensets import MNIST
from batchflow.models.torch import ResNet18

In [2]:
dataset = MNIST()

To collect information about model training times (both on CPU and GPU), set the `profile` option in the model configuration to `True`:

In [3]:
model_config = {
    'inputs/labels/classes': 10,
    'loss': 'ce',
    'profile': True,
}

In [4]:
pipeline = (dataset.train.p
                .init_variable('loss_history', [])
                .to_array(channels='first', dtype='float32')
                .multiply(multiplier=1/255., preserve_type=False)
                .init_model('dynamic', ResNet18,
                            'resnet', config=model_config)
                .train_model('resnet',
                             B.images, B.labels,
                             fetches='loss',
                             save_to=V('loss_history', mode='a'))
                )

To gather statistics about how long each action takes, set `profile` to `True` in the `run` call:

In [5]:
BATCH_SIZE = 64
N_ITERS = 50

pipeline.run(BATCH_SIZE, n_iters=N_ITERS, bar=True, profile=True,
                   bar_desc=W(V('loss_history')[-1].format('Loss is {:7.7}')))

Loss is 0.1197789: 100%|██████████| 50/50 [00:57<00:00,  1.02s/it]


<batchflow.batchflow.pipeline.Pipeline at 0x7efbb0ed8fd0>

# Pipeline profiling

The pipeline now has a `profile_info` attribute: a `DataFrame` that holds the collected information:

In [6]:
pipeline.profile_info.head()

| action | id | iter | total_time | pipeline_time | ncalls | tottime | cumtime | batch_id | start_time |
|---|---|---|---|---|---|---|---|---|---|
| to_array | `<built-in method _abc._abc_instancecheck>::/home/tsimfer/anaconda3/lib/python3.7/abc.py::137::__instancecheck__` | 1 | 0.108264 | 0.101738 | 64 | 0.000214 | 0.001107 | 139619470154496 | 1582117000.0 |
| to_array | `<built-in method _abc._abc_subclasscheck>::/home/tsimfer/anaconda3/lib/python3.7/abc.py::141::__subclasscheck__` | 1 | 0.108264 | 0.101738 | 57 | 0.00073 | 0.000887 | 139619470154496 | 1582117000.0 |
| to_array | `<built-in method _operator.index>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/numpy/core/numeric.py::1557::normalize_axis_tuple` | 1 | 0.108264 | 0.101738 | 128 | 6.7e-05 | 6.7e-05 | 139619470154496 | 1582117000.0 |
| to_array | `<built-in method posix.sched_getaffinity>::../../../batchflow/batchflow/decorators.py::18::_workers_count` | 1 | 0.108264 | 0.101738 | 1 | 1.4e-05 | 1.4e-05 | 139619470154496 | 1582117000.0 |
| to_array | `<built-in method builtins.any>::../../../batchflow/batchflow/decorators.py::86::any_action_failed` | 1 | 0.108264 | 0.101738 | 1 | 8e-06 | 3.1e-05 | 139619470154496 | 1582117000.0 |


Note that the table contains detailed information about the exact methods called inside each action. This is a lot of data, and it can give a precise picture of which parts of the code are the bottlenecks.

Columns of the `profile_info`:
- `action`, `iter`, `batch_id` and `start_time` are self-explanatory
- `id` identifies the exact method in great detail: it is a concatenation of `method_name`, `file_name`, `line_number` and `callee`
- `total_time` is the time taken by an action
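- `pipeline_time` differs from `total_time` by the time spent processing the profiling table at each iteration; in the outputs shown in this notebook, `total_time` is the slightly larger of the two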
- `tottime` is the time taken by a method inside an action
- `cumtime` is the time taken by a method and all of the methods called inside it
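
Since `id` is a `::`-separated concatenation, it can be split back into its parts when only the file or the method name is of interest. The sketch below assumes exactly the four-part layout described above; `split_profile_id` is a hypothetical helper written for this example, not a BatchFlow function.

```python
def split_profile_id(profile_id):
    """Split an `id` string into the four parts listed above.

    `split_profile_id` is a hypothetical helper written for this example;
    it is not part of BatchFlow.
    """
    # Split from the right so that a `::` inside the method name, if any, is kept intact
    method_name, file_name, line_number, callee = profile_id.rsplit('::', 3)
    return {
        'method_name': method_name,
        'file_name': file_name,
        'line_number': int(line_number),
        'callee': callee,
    }


# An `id` copied from the `profile_info` output shown earlier
example_id = ('<built-in method posix.sched_getaffinity>'
              '::../../../batchflow/batchflow/decorators.py::18::_workers_count')
parts = split_profile_id(example_id)
print(parts)
```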


More often than not, though, we don't need such granularity. The pipeline method `show_profile_info` makes some handy aggregations:

**Note:** by default, results are sorted by `total_time` or `tottime`, depending on the level of detail.

In [7]:
# timings for each action
pipeline.show_profile_info(per_iter=False, detailed=False)

| action | total_time (sum) | total_time (mean) | total_time (max) | pipeline_time (sum) | pipeline_time (mean) | pipeline_time (max) |
|---|---|---|---|---|---|---|
| train_model | 51.89039 | 1.037808 | 1.643089 | 51.642231 | 1.032845 | 1.635717 |
| to_array | 2.209763 | 0.044195 | 0.109565 | 1.967972 | 0.039359 | 0.101738 |
| multiply | 1.330352 | 0.026607 | 0.052029 | 1.167702 | 0.023354 | 0.049151 |
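
To see at a glance which action dominates, the summed `total_time` values can be turned into shares of the overall time. This is plain arithmetic on the numbers printed above, not a BatchFlow feature.

```python
# Summed `total_time` per action, copied from the table above (seconds)
total_time_sum = {
    'train_model': 51.89039,
    'to_array': 2.209763,
    'multiply': 1.330352,
}

overall = sum(total_time_sum.values())
shares = {action: value / overall for action, value in total_time_sum.items()}

# Print the actions from the largest share to the smallest
for action, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f'{action:<12} {share:6.1%}')
```

With these numbers, `train_model` accounts for well over 90% of the measured time, so the preprocessing actions are not the bottleneck in this run.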


In [8]:
# for each action, show the 2 most frequently called methods (sorted by maximum `ncalls`)
pipeline.show_profile_info(per_iter=False, detailed=True, sortby=('ncalls', 'max'), limit=2)

| action | id | ncalls (sum) | ncalls (mean) | ncalls (max) | tottime (sum) | tottime (mean) | tottime (max) | cumtime (sum) | cumtime (mean) | cumtime (max) |
|---|---|---|---|---|---|---|---|---|---|---|
| multiply | `<built-in method builtins.isinstance>::../../../batchflow/batchflow/components.py::105::find_in_index` | 409600 | 8192.0 | 8192 | 0.040543 | 0.000811 | 0.001351 | 0.040543 | 0.000811 | 0.001351 |
| multiply | `<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index` | 204800 | 4096.0 | 4096 | 0.094183 | 0.001884 | 0.003194 | 0.094183 | 0.001884 | 0.003194 |
| to_array | `<built-in method builtins.isinstance>::../../../batchflow/batchflow/components.py::105::find_in_index` | 409600 | 8192.0 | 8192 | 0.058232 | 0.001165 | 0.003015 | 0.058232 | 0.001165 | 0.003015 |
| to_array | `<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index` | 204800 | 4096.0 | 4096 | 0.138339 | 0.002767 | 0.008334 | 0.138339 | 0.002767 | 0.008334 |
| train_model | `<method 'append' of 'list' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::641::parse_cpu_trace` | 1082936 | 21658.72 | 21686 | 0.101479 | 0.00203 | 0.004265 | 0.101479 | 0.00203 | 0.004265 |
| train_model | `<lambda>::~::0::<method 'sort' of 'list' objects>` | 541468 | 10829.36 | 10843 | 0.068769 | 0.001375 | 0.003867 | 0.068769 | 0.001375 | 0.003867 |


In [9]:
# timings for each action for each iter
pipeline.show_profile_info(per_iter=True, detailed=False)

| iter | action | total_time | pipeline_time | batch_id |
|---|---|---|---|---|
| 1 | train_model | 1.643089 | 1.635717 | 139619470154496 |
| 1 | to_array | 0.108264 | 0.101738 | 139619470154496 |
| 1 | multiply | 0.021203 | 0.018426 | 139619470154496 |
| 2 | train_model | 1.248317 | 1.231082 | 139619470155168 |
| 2 | to_array | 0.033515 | 0.027055 | 139619470155168 |
| 2 | multiply | 0.029224 | 0.023848 | 139619470155168 |
| 3 | train_model | 0.945223 | 0.940927 | 139619456921048 |
| 3 | to_array | 0.059107 | 0.055046 | 139619456921048 |
| 3 | multiply | 0.030329 | 0.027060 | 139619456921048 |
| 4 | train_model | 1.127812 | 1.123748 | 139619470246688 |
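
The per-iteration view and the per-action summary (`sum`, `mean`, `max`) are related by a group-and-aggregate step. The sketch below reproduces that step in plain Python on the first three iterations from the table above. It illustrates the idea only and is not BatchFlow's implementation; because it covers three iterations rather than the whole run, its results differ from the full summary.

```python
from collections import defaultdict
from statistics import mean

# (iter, action, total_time) rows copied from the per-iteration table above
rows = [
    (1, 'train_model', 1.643089), (1, 'to_array', 0.108264), (1, 'multiply', 0.021203),
    (2, 'train_model', 1.248317), (2, 'to_array', 0.033515), (2, 'multiply', 0.029224),
    (3, 'train_model', 0.945223), (3, 'to_array', 0.059107), (3, 'multiply', 0.030329),
]

# Group the timings by action, dropping the iteration number
by_action = defaultdict(list)
for _, action, total_time in rows:
    by_action[action].append(total_time)

# Aggregate every group into the same three statistics as the summary table
summary = {
    action: {'sum': sum(times), 'mean': mean(times), 'max': max(times)}
    for action, times in by_action.items()
}

for action, aggregates in summary.items():
    print(action, aggregates)
```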


In [10]:
# for each iter and each action, show the 3 slowest methods, based on `tottime`
pipeline.show_profile_info(per_iter=True, detailed=True, sortby='tottime', limit=3)

| iter | action | id | ncalls | tottime | cumtime |
|---|---|---|---|---|---|
| 1 | multiply | `find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp>` | 4096 | 0.009526 | 0.011940 |
| 1 | multiply | `<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos` | 64 | 0.001797 | 0.013737 |
| 1 | multiply | `<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index` | 4096 | 0.001663 | 0.001663 |
| 1 | to_array | `find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp>` | 4096 | 0.044399 | 0.055748 |
| 1 | to_array | `<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos` | 64 | 0.008959 | 0.064707 |
| 1 | to_array | `<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index` | 4096 | 0.008334 | 0.008334 |
| 1 | train_model | `<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward` | 143 | 0.568831 | 0.568831 |
| 1 | train_model | `<method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward` | 1 | 0.547273 | 0.547273 |
| 1 | train_model | `<built-in method batch_norm>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py::1643::batch_norm` | 143 | 0.155990 | 0.155990 |
| 2 | multiply | `find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp>` | 4096 | 0.010454 | 0.013116 |
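
The effect of sorting by a column and keeping `limit` rows per action can be sketched in plain Python. The example below uses some of the iteration 1 rows from the table above and keeps the two largest `tottime` values per action. `top_methods` is a hypothetical helper for this illustration; it does not show how `show_profile_info` is implemented.

```python
import heapq
from collections import defaultdict

# (action, method, tottime) rows for iteration 1, copied from the table above;
# method names are shortened to the first part of each `id`
rows = [
    ('multiply', 'find_in_index', 0.009526),
    ('multiply', '<listcomp>', 0.001797),
    ('multiply', '<built-in method numpy.where>', 0.001663),
    ('train_model', '<built-in method conv2d>', 0.568831),
    ('train_model', "<method 'run_backward' of 'torch._C._EngineBase' objects>", 0.547273),
    ('train_model', '<built-in method batch_norm>', 0.155990),
]


def top_methods(rows, limit):
    """Keep the `limit` rows with the largest `tottime` for every action.

    `top_methods` is a hypothetical helper for this example, not a BatchFlow function.
    """
    grouped = defaultdict(list)
    for action, method, tottime in rows:
        grouped[action].append((method, tottime))
    return {
        action: heapq.nlargest(limit, methods, key=lambda item: item[1])
        for action, methods in grouped.items()
    }


top = top_methods(rows, limit=2)
for action, methods in top.items():
    print(action, methods)
```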


# Model profiling

In [11]:
model = pipeline.m('resnet')

The model has an `info` property that, unsurprisingly, shows a lot of interesting details about the model itself and the training process:

In [12]:
model.info


##### Config:
{'benchmark': True,
 'body': {'encoder': {'blocks': {'base': <class 'batchflow.models.torch.blocks.ResBlock'>,
                                 'bottleneck': False,
                                 'downsample': [False, True, True, True],
                                 'filters': [64, 128, 256, 512],
                                 'layout': 'cnacn',
                                 'n_reps': [2, 2, 2, 2],
                                 'se': False},
                      'downsample': {'layout': 'p',
                                     'pool_size': 2,
                                     'pool_strides': 2},
                      'num_stages': 4,
                      'order': ['skip', 'block']}},
 'common': {'data_format': 'channels_first'},
 'decay': None,
 'device': None,
 'head': {'classes': 10,
          'dropout_rate': 0.4,
          'filters': 10,
          'layout': 'Vdf',
          'target_shape': (64,),
          'units': 10},
 'initial_block': {'filters'

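As with the pipeline, the model has a `profile_info` attribute as well as a `show_profile_info` method. Depending on the type of device used (`CPU` or `GPU`), the reported timings may differ; in the outputs below, all timing columns carry a `CPU_` prefix.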

In [13]:
# one row for every operation inside the model; limited to 5 rows
model.show_profile_info(per_iter=False, limit=5)

| name | ncalls (sum) | ncalls (mean) | ncalls (max) | CPU_tottime (sum) | CPU_tottime (mean) | CPU_tottime (max) | CPU_cumtime (sum) | CPU_cumtime (mean) | CPU_cumtime (max) | CPU_tottime_avg (sum) | CPU_tottime_avg (mean) | CPU_tottime_avg (max) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mkldnn_convolution_backward | 1000 | 20.0 | 20 | 25.109348 | 0.502187 | 0.75024 | 25.109348 | 0.502187 | 0.75024 | 1.255467 | 0.025109 | 0.037512 |
| mkldnn_convolution | 1000 | 20.0 | 20 | 4.376594 | 0.087532 | 0.143996 | 4.376594 | 0.087532 | 0.143996 | 0.21883 | 0.004377 | 0.0072 |
| sqrt | 3038 | 62.0 | 62 | 2.756476 | 0.056255 | 0.079583 | 2.756476 | 0.056255 | 0.079583 | 0.044459 | 0.000907 | 0.001284 |
| native_batch_norm_backward | 1000 | 20.0 | 20 | 1.811546 | 0.036231 | 0.067658 | 1.811546 | 0.036231 | 0.067658 | 0.090577 | 0.001812 | 0.003383 |
| add_ | 10114 | 202.28 | 206 | 1.40872 | 0.028174 | 0.047941 | 1.40872 | 0.028174 | 0.047941 | 0.006846 | 0.000137 | 0.000233 |


In [14]:
# for each iteration show 3 of the slowest operations
model.show_profile_info(per_iter=True, limit=3)

| iter | name | ncalls | CPU_tottime | CPU_cumtime | CPU_tottime_avg |
|---|---|---|---|---|---|
| 0 | mkldnn_convolution_backward | 20 | 0.486976 | 0.486976 | 0.024349 |
| 0 | mkldnn_convolution | 20 | 0.091332 | 0.091332 | 0.004567 |
| 0 | native_batch_norm_backward | 20 | 0.039047 | 0.039047 | 0.001952 |
| 1 | mkldnn_convolution_backward | 20 | 0.450588 | 0.450588 | 0.022529 |
| 1 | mkldnn_convolution | 20 | 0.084945 | 0.084945 | 0.004247 |
| 1 | sqrt | 62 | 0.079583 | 0.079583 | 0.001284 |
| 2 | mkldnn_convolution_backward | 20 | 0.481878 | 0.481878 | 0.024094 |
| 2 | mkldnn_convolution | 20 | 0.081743 | 0.081743 | 0.004087 |
| 2 | sqrt | 62 | 0.061165 | 0.061165 | 0.000987 |
| 3 | mkldnn_convolution_backward | 20 | 0.497236 | 0.497236 | 0.024862 |
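
In the rows shown above, `CPU_tottime_avg` appears to equal `CPU_tottime` divided by `ncalls`. This is an observation from the printed numbers, not a statement about how BatchFlow computes the column; the short check below verifies it for a few rows.

```python
# (name, ncalls, CPU_tottime, CPU_tottime_avg) rows copied from the table above
rows = [
    ('mkldnn_convolution_backward', 20, 0.486976, 0.024349),
    ('mkldnn_convolution', 20, 0.091332, 0.004567),
    ('native_batch_norm_backward', 20, 0.039047, 0.001952),
    ('sqrt', 62, 0.079583, 0.001284),
]

# The table values are rounded to 6 decimals, so compare with a small tolerance
TOLERANCE = 1e-6

for name, ncalls, tottime, tottime_avg in rows:
    per_call = tottime / ncalls
    matches = abs(per_call - tottime_avg) < TOLERANCE
    print(f'{name:<28} {per_call:.6f} {tottime_avg:.6f} {matches}')
```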
