<div class="alert alert-info" style="background-color:#006a79; color:white; padding:0px 10px; border-radius:5px;"><h2 style='margin:10px 5px'>cProfile in Python</h2>
</div>


cProfile is a built-in Python module for profiling. It is one of the most commonly used Python profilers.
It gives:
* The total run time of the entire code.
* The time taken by each individual function. This lets you compare them and find which parts need optimization.
* The number of times each function is called.

In [1]:
import cProfile

cProfile provides a simple ```run()``` function which is sufficient for most cases. The syntax is ```cProfile.run(statement, filename=None, sort=-1)```.

In [2]:
cProfile.run("20+10")

         3 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}




```ncalls:``` is the number of calls made. When there are two numbers (for example, 11/9), the function recursed. The first value is the total number of calls and the second value is the number of primitive (non-recursive) calls.

```tottime:``` is the total time spent in the given function (excluding time spent in calls to sub-functions).

```percall:``` (first column) is the quotient of tottime divided by ncalls.

```cumtime:``` is the cumulative time spent in this function and all sub-functions. This figure is accurate even for recursive functions.

```percall:``` (second column) is the quotient of cumtime divided by primitive calls.

```filename:lineno(function):``` provides the file name, line number, and name of each function.
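
To see the two-number ```ncalls``` form in practice, here is a minimal sketch that profiles a naive recursive function. The function ```fib``` is a hypothetical example that is not part of the original notebook, and ```get_stats_profile()``` assumes Python 3.9 or later.

```python
import cProfile
import pstats


def fib(n):
    """Naive recursive Fibonacci, used only to trigger recursive calls."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


profiler = cProfile.Profile()
result = profiler.runcall(fib, 5)  # profile a single top-level call

# get_stats_profile() exposes the same columns that print_stats() shows
fib_stats = pstats.Stats(profiler).get_stats_profile().func_profiles["fib"]
print(result)            # value returned by fib(5)
print(fib_stats.ncalls)  # "total calls/primitive calls" for fib
```

Because ```fib``` is entered once from outside and calls itself for every other invocation, the second number in ```ncalls``` stays at 1 while the first counts every call.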

The ```run()``` function can accept two more arguments: a ```filename``` to write the results to a file instead of stdout, and a ```sort``` argument that specifies how the output should be sorted.

Common sort keys are ```cumulative``` (for cumulative time), ```time``` (for total time), and ```calls``` (for number of calls).

If you pass a filename, the results are saved in a binary format that is not human-readable. In that case, use the ```pstats.Stats``` class to load and format the results.
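
As a minimal sketch of that workflow, the snippet below saves the stats of a small statement to a file and loads them back with ```pstats.Stats```. The file name ```profile_output.prof``` and the profiled statement are assumptions chosen only for illustration.

```python
import cProfile
import os
import pstats
import tempfile

# Hypothetical output location; any writable path works
stats_path = os.path.join(tempfile.mkdtemp(), "profile_output.prof")

# With a filename, run() saves the raw stats instead of printing a report
cProfile.run("sum(range(100000))", filename=stats_path)

# Load the saved file, sort it, and print at most five entries
saved = pstats.Stats(stats_path)
saved.sort_stats("cumulative").print_stats(5)
```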

**Profiling code that calls other functions**

In [4]:
def create_array():
  arr=[]
  for i in range(0,400000):
    arr.append(i)
def print_statement():
  print('Array created successfully')

def main():
  create_array()
  print_statement()
if __name__ == '__main__':
    cProfile.run('main()')

Array created successfully
         400031 function calls in 0.375 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.234    0.234    0.369    0.369 3911947080.py:1(create_array)
        1    0.000    0.000    0.000    0.000 3911947080.py:5(print_statement)
        1    0.006    0.006    0.375    0.375 3911947080.py:8(main)
        1    0.000    0.000    0.375    0.375 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 iostream.py:202(schedule)
        2    0.000    0.000    0.000    0.000 iostream.py:429(_is_master_process)
        2    0.000    0.000    0.000    0.000 iostream.py:448(_schedule_flush)
        2    0.000    0.000    0.000    0.000 iostream.py:518(write)
        1    0.000    0.000    0.000    0.000 iostream.py:90(_event_pipe)
        1    0.000    0.000    0.000    0.000 socket.py:543(send)
        1    0.000    0.000    0.000    0.000 threading.py:1102(_wait_for_tstate_lock)
   

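This output shows that ```create_array()``` is where most of the time is spent: 0.369 of the 0.375 seconds (cumtime). Of that, 0.234 seconds (tottime) is spent in the ```for i in range(0,400000)``` loop itself, and the remainder goes to the functions it calls (the ```arr.append(i)``` calls).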

**Profile class of cProfile**

Although ```cProfile.run()``` is sufficient in most cases, the ```Profile``` class of cProfile gives you more precise control over what is profiled.

In [5]:
# How to use Profile class of cProfile
def create_array():
  arr=[]
  for i in range(0,400000):
    arr.append(i)
    
def print_statement():
  print('Array created successfully')

def main():
  create_array()
  print_statement()
    
if __name__ == '__main__':
    import cProfile, pstats
    profiler = cProfile.Profile()
    profiler.enable()
    main()
    profiler.disable()
    stats = pstats.Stats(profiler).sort_stats('ncalls')
    stats.print_stats()

Array created successfully
         400029 function calls in 0.222 seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   400000    0.081    0.000    0.081    0.000 {method 'append' of 'list' objects}
        2    0.000    0.000    0.000    0.000 C:\Users\Kashish.sukhwani\AppData\Local\Programs\Python\Python310\lib\site-packages\ipykernel\iostream.py:429(_is_master_process)
        2    0.000    0.000    0.000    0.000 C:\Users\Kashish.sukhwani\AppData\Local\Programs\Python\Python310\lib\site-packages\ipykernel\iostream.py:448(_schedule_flush)
        2    0.000    0.000    0.000    0.000 C:\Users\Kashish.sukhwani\AppData\Local\Programs\Python\Python310\lib\site-packages\ipykernel\iostream.py:518(write)
        2    0.000    0.000    0.000    0.000 {built-in method nt.getpid}
        2    0.000    0.000    0.000    0.000 {method '__exit__' of '_thread.RLock' objects}
        2    0.000    0.000    0.000    0.000 {method 'write' of


The ```pstats``` module can be used to manipulate the results collected by the profiler object. First, create a statistics object from the profiler with ```stats = pstats.Stats(profiler)```.
Next, sort the output by the number of calls with the ```sort_stats()``` method, as shown above. Finally, call the ```print_stats()``` method of the Stats object to print the output.

In [6]:
# Sort output by Cumulative time
if __name__ == '__main__':
    import cProfile, pstats
    profiler = cProfile.Profile()
    profiler.enable()
    main()
    profiler.disable()
    stats = pstats.Stats(profiler).sort_stats('cumtime')
    stats.print_stats()

Array created successfully
         400029 function calls in 0.252 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.004    0.004    0.252    0.252 C:\Users\Kashish.sukhwani\AppData\Local\Temp\ipykernel_14340\1392519505.py:10(main)
        1    0.155    0.155    0.247    0.247 C:\Users\Kashish.sukhwani\AppData\Local\Temp\ipykernel_14340\1392519505.py:2(create_array)
   400000    0.092    0.000    0.092    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 C:\Users\Kashish.sukhwani\AppData\Local\Temp\ipykernel_14340\1392519505.py:7(print_statement)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
        2    0.000    0.000    0.000    0.000 C:\Users\Kashish.sukhwani\AppData\Local\Programs\Python\Python310\lib\site-packages\ipykernel\iostream.py:518(write)
        2    0.000    0.000    0.000    0.000 C:\Users\Kashish.sukhwani\AppData\Local\Pro

```enable()``` Start collecting profiling data. Only in cProfile.

```disable()``` Stop collecting profiling data. Only in cProfile.

```create_stats()``` Stop collecting profiling data and record the results internally as the current profile.

```print_stats(sort=-1)``` Create a Stats object based on the current profile and print the results to stdout.

```dump_stats(filename)``` Write the results of the current profile to filename.

```run(cmd)``` Profile the cmd via exec().

```runctx(cmd, globals, locals)``` Profile the cmd via exec() with the specified global and local environment.

```runcall(func, /, *args, **kwargs)``` Profile func(*args, **kwargs).
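
A minimal sketch of ```runcall()``` and ```runctx()``` from the list above. The function ```square_sum``` and the namespace dictionaries are hypothetical and serve only as an illustration.

```python
import cProfile
import pstats


def square_sum(n):
    """Hypothetical workload: sum of the squares below n."""
    return sum(i * i for i in range(n))


# runcall() profiles one call and passes the return value through
profiler = cProfile.Profile()
total = profiler.runcall(square_sum, 10)

# runctx() executes a statement with explicit global and local namespaces
local_ns = {}
profiler.runctx("out = square_sum(n)", {"square_sum": square_sum, "n": 10}, local_ns)

# Both runs were collected by the same profiler object
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

```runcall()``` is convenient when the function is already available as an object, while ```runctx()``` is useful when the statement needs names that are not defined in the ```__main__``` namespace.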



Use the ```dump_stats()``` method to save the results to a file by providing its path.

Use ```strip_dirs()``` to remove all leading path information from file names.

In [8]:
# Remove dir names
stats.strip_dirs()
stats.print_stats()

         400029 function calls in 0.252 seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 threading.py:553(is_set)
        1    0.000    0.000    0.000    0.000 threading.py:1102(_wait_for_tstate_lock)
        1    0.000    0.000    0.000    0.000 threading.py:1169(is_alive)
        1    0.000    0.000    0.000    0.000 1392519505.py:7(print_statement)
        1    0.004    0.004    0.252    0.252 1392519505.py:10(main)
        1    0.000    0.000    0.000    0.000 socket.py:543(send)
        1    0.155    0.155    0.247    0.247 1392519505.py:2(create_array)
        1    0.000    0.000    0.000    0.000 iostream.py:90(_event_pipe)
        1    0.000    0.000    0.000    0.000 iostream.py:202(schedule)
        2    0.000    0.000    0.000    0.000 iostream.py:429(_is_master_process)
        2    0.000    0.000    0.000    0.000 iostream.py:448(_schedule_flush)
        2    0.0

<pstats.Stats at 0x292aa38b3a0>

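Compare this output with the previous one: the listing order is now “random”. This is because ```strip_dirs()``` discards the sort order; after a strip operation, the object is treated as if it had just been initialized and loaded. Call ```sort_stats()``` again after ```strip_dirs()``` to restore an ordering.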
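
The effect of the call order can be sketched as follows. The function ```build_list``` is a hypothetical workload, and the reports are written to ```io.StringIO``` streams so that they can be compared.

```python
import cProfile
import io
import pstats


def build_list(n):
    """Hypothetical workload used only for this illustration."""
    return [i for i in range(n)]


profiler = cProfile.Profile()
profiler.runcall(build_list, 1000)

# Sorting first and stripping afterwards discards the ordering
unsorted_out = io.StringIO()
pstats.Stats(profiler, stream=unsorted_out).sort_stats("cumulative").strip_dirs().print_stats()

# Stripping first and sorting afterwards keeps it
sorted_out = io.StringIO()
pstats.Stats(profiler, stream=sorted_out).strip_dirs().sort_stats("cumulative").print_stats()

print(sorted_out.getvalue())
```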

In [4]:
import cProfile, pstats, io
from pstats import SortKey

pr = cProfile.Profile()
pr.enable()
for i in range(10):
    print(i*i)
pr.disable()
s = io.StringIO()
sortby = SortKey.CUMULATIVE
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.strip_dirs()
ps.print_stats()
print(s.getvalue())

0
1
4
9
16
25
36
49
64
81
         323 function calls in 0.000 seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000    0.000    0.000 contextlib.py:82(__init__)
        2    0.000    0.000    0.000    0.000 contextlib.py:108(__enter__)
        2    0.000    0.000    0.000    0.000 contextlib.py:117(__exit__)
        2    0.000    0.000    0.000    0.000 contextlib.py:238(helper)
       21    0.000    0.000    0.000    0.000 threading.py:513(is_set)
       21    0.000    0.000    0.000    0.000 threading.py:1017(_wait_for_tstate_lock)
       21    0.000    0.000    0.000    0.000 threading.py:1071(is_alive)
        2    0.000    0.000    0.000    0.000 traitlets.py:533(get)
        2    0.000    0.000    0.000    0.000 traitlets.py:564(__get__)
        2    0.000    0.000    0.000    0.000 ipstruct.py:125(__getattr__)
        2    0.000    0.000    0.000    0.000 codeop.py:140(__call__)
        4  

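```SortKey.CUMULATIVE``` sorts by the cumulative time spent in a function. The sorting criterion can be a SortKey enum (added in Python 3.7) or a string (i.e. using 'cumulative' instead of SortKey.CUMULATIVE is also valid). Note that the cell above calls ```strip_dirs()``` after ```sort_stats()```, which discards the ordering; this is why the report says "Random listing order was used". Finally, the results are written to the ```io.StringIO``` stream and printed to the standard output.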

In [5]:
import cProfile, pstats, io

def profile(fnc):
    
    """A decorator that uses cProfile to profile a function"""
    
    def inner(*args, **kwargs):        
        pr = cProfile.Profile()
        pr.enable()
        retval = fnc(*args, **kwargs)
        pr.disable()
        s = io.StringIO()
        sortby = 'cumulative'
        ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
        ps.print_stats()
        print(s.getvalue())
        return retval
    return inner

In [6]:
import time
import random
@profile
def get_winning_numbers():
    random.seed()
    for i in range (0,10):
        time.sleep(1) 
        yield random.randint(1,10)

random.seed()
my_number = random.randint(1,10)
print ("my number is " + str(my_number))

for winning_number in get_winning_numbers():
    print(winning_number)
    if my_number == winning_number:
        print ("you win!")
        break

my number is 8
         1 function calls in 0.000 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



8
you win!
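
The report above shows only one function call because ```get_winning_numbers()``` is a generator function: calling it returns a generator object immediately, and the body (including the ```time.sleep(1)``` calls) only runs later, while the ```for``` loop iterates and the profiler is already disabled. A minimal sketch of one possible workaround is shown below, assuming it is acceptable to enable the profiler around each ```next()``` step. The names ```profile_generator``` and ```count_up``` are hypothetical.

```python
import cProfile
import functools
import io
import pstats
import time


def profile_generator(fnc):
    """Hypothetical decorator: profile the body of a generator function."""
    @functools.wraps(fnc)
    def inner(*args, **kwargs):
        pr = cProfile.Profile()
        gen = fnc(*args, **kwargs)
        try:
            while True:
                pr.enable()  # profile only while the wrapped generator runs
                try:
                    value = next(gen)
                except StopIteration:
                    break
                finally:
                    pr.disable()
                yield value
        finally:
            # Print the report once the generator is exhausted or closed
            s = io.StringIO()
            pstats.Stats(pr, stream=s).sort_stats("cumulative").print_stats()
            print(s.getvalue())
    return inner


@profile_generator
def count_up(limit):
    for i in range(limit):
        time.sleep(0.01)
        yield i


values = list(count_up(3))
```

With this variant, the time spent in ```time.sleep()``` inside the generator body shows up in the report, while the code that consumes the values is not profiled.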


**Visualize Profiling**

One of the best tools currently available for visualizing the data collected by the cProfile module is SnakeViz.

In [9]:
# Installing the module
!pip install snakeviz

Collecting snakeviz
  Downloading snakeviz-2.1.1-py2.py3-none-any.whl (282 kB)
     ------------------------------------ 282.1/282.1 kB 414.4 kB/s eta 0:00:00
Installing collected packages: snakeviz
Successfully installed snakeviz-2.1.1





In [11]:
import random
# Simple function to print messages 
def print_msg():
    for i in range(10):
        print("Program completed")
# Generate random data
def generate():
    data = [random.randint(0, 99) for p in range(0, 1000)]
    return data
# Function to search 
def search_function(data):
    for i in data:
        if i in [100,200,300,400,500]:
            print("success")
def main():
    data=generate()
    search_function(data)
    print_msg()
%load_ext snakeviz
%snakeviz main()

The snakeviz extension is already loaded. To reload it, use:
  %reload_ext snakeviz
Program completed
Program completed
Program completed
Program completed
Program completed
Program completed
Program completed
Program completed
Program completed
Program completed
 
*** Profile stats marshalled to file 'C:\\Users\\KASHIS~1.SUK\\AppData\\Local\\Temp\\tmpexqwtsi8'.
Embedding SnakeViz in this document...


**Profiling a Linear Regression Model from scikit-learn**

Regression is very commonly used in predictive modeling. The code below is a standard linear regression example using the sklearn library. Let’s print the profiling report for this code.

In [12]:
# Function performing linear regression on diabetes dataset
def regression():
    import numpy as np
    from sklearn import datasets, linear_model
    from sklearn.metrics import mean_squared_error, r2_score
    # Load the diabetes dataset
    diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)

    # Use only one feature
    diabetes_X = diabetes_X[:, np.newaxis, 2]
    
    # Split the data into training/testing sets
    diabetes_X_train = diabetes_X[:-20]
    diabetes_X_test = diabetes_X[-20:]
    
    # Split the targets into training/testing sets
    diabetes_y_train = diabetes_y[:-20]
    diabetes_y_test = diabetes_y[-20:]
    
    # Create linear regression object
    regr = linear_model.LinearRegression()
    
    # Train the model using the training sets
    regr.fit(diabetes_X_train, diabetes_y_train)
    
    # Make predictions using the testing set
    diabetes_y_pred = regr.predict(diabetes_X_test)
    
# Initialize profile class and call regression() function
profiler = cProfile.Profile()
profiler.enable()
regression()
profiler.disable()
stats = pstats.Stats(profiler).sort_stats('tottime')

# Print the stats report
stats.print_stats()


         606182 function calls (596812 primitive calls) in 48.307 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      129   35.500    0.275   35.501    0.275 {built-in method _imp.create_dynamic}
      641    8.565    0.013    8.565    0.013 {built-in method io.open_code}
     2674    1.474    0.001    1.474    0.001 {built-in method nt.stat}
      641    0.283    0.000    0.283    0.000 {built-in method marshal.loads}
      644    0.199    0.000    0.199    0.000 {method '__exit__' of '_io._IOBase' objects}
        2    0.148    0.074    0.148    0.074 {built-in method _ctypes.LoadLibrary}
     4328    0.139    0.000    0.205    0.000 <frozen importlib._bootstrap_external>:96(_path_join)
      678    0.101    0.000    0.101    0.000 {method 'read' of '_io.BufferedReader' objects}
    900/2    0.080    0.000   48.309   24.155 {built-in method builtins.exec}
      399    0.069    0.000    0.192    0.000 C:\Users\Kashish.su

<pstats.Stats at 0x292b878cc40>

In [13]:
%load_ext snakeviz
%snakeviz regression()

The snakeviz extension is already loaded. To reload it, use:
  %reload_ext snakeviz
 
*** Profile stats marshalled to file 'C:\\Users\\KASHIS~1.SUK\\AppData\\Local\\Temp\\tmph83u2lf2'.
Embedding SnakeViz in this document...
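
The ```tottime``` report for ```regression()``` above is long, and its top entries (```_imp.create_dynamic```, ```io.open_code```, ```nt.stat```) look like import machinery, presumably because the imports inside ```regression()``` run for the first time while the profiler is active. One way to focus such a report is to pass restrictions to ```print_stats()```: an integer limits the number of rows and a string is treated as a regular expression matched against the ```filename:lineno(function)``` column. The sketch below uses hypothetical stand-in functions (```slow_step```, ```fast_step```, ```pipeline```) rather than the scikit-learn code.

```python
import cProfile
import io
import pstats


def slow_step(n):
    """Hypothetical expensive step."""
    return sum(i * i for i in range(n))


def fast_step(n):
    """Hypothetical cheap step."""
    return n + 1


def pipeline():
    return slow_step(50000) + fast_step(1)


profiler = cProfile.Profile()
profiler.runcall(pipeline)

report = io.StringIO()
stats = pstats.Stats(profiler, stream=report).sort_stats("tottime")

# Keep only the rows whose name matches the regular expression;
# stats.print_stats(10) would instead keep the first ten rows
stats.print_stats("slow_step")
print(report.getvalue())
```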


<div class="alert alert-info" style="background-color:#006a79; color:white; padding:0px 10px; border-radius:5px;"><h2 style='margin:10px 5px'>Line Profiler</h2>
</div>


line_profiler measures the time that individual lines of code take to execute.
It helps us monitor a small function line by line to find which lines take the most time, so that we can then decide which optimization technique to use.

In [14]:
pip install line_profiler

Collecting line_profiler
  Downloading line_profiler-4.0.3-cp310-cp310-win_amd64.whl (83 kB)
     -------------------------------------- 83.2/83.2 kB 311.9 kB/s eta 0:00:00
Installing collected packages: line_profiler
Successfully installed line_profiler-4.0.3
Note: you may need to restart the kernel to use updated packages.





In [15]:
import line_profiler
%load_ext line_profiler

In [18]:
from line_profiler import LineProfiler
import random

def do_stuff(numbers):
    s = sum(numbers)
    l = [numbers[i]/43 for i in range(len(numbers))]
    m = ['hello'+str(numbers[i]) for i in range(len(numbers))]

numbers = [random.randint(1,100) for i in range(1000)]
lp = LineProfiler()
lp_wrapper = lp(do_stuff)
lp_wrapper(numbers)
lp.print_stats()

Timer unit: 1e-07 s

Total time: 0.0018304 s

Could not find file C:\Users\Kashish.sukhwani\AppData\Local\Temp\ipykernel_14340\1333878777.py
Are you sure you are running this program from the same directory
that you ran the profiler from?
Continuing without the function's contents.

Line #      Hits         Time  Per Hit   % Time  Line Contents
     4                                           
     5         1        184.0    184.0      1.0  
     6         1       6954.0   6954.0     38.0  
     7         1      11166.0  11166.0     61.0  



**Adding Additional Functions to Profile**

We can add additional functions to be profiled as well. If a function calls a second function and you only wrap the calling function, you will only see the profile results for the calling function.

In [19]:
from line_profiler import LineProfiler
import random

def do_other_stuff(numbers):
    s = sum(numbers)

def do_stuff(numbers):
    do_other_stuff(numbers)
    l = [numbers[i]/43 for i in range(len(numbers))]
    m = ['hello'+str(numbers[i]) for i in range(len(numbers))]

numbers = [random.randint(1,100) for i in range(1000)]
lp = LineProfiler()
lp_wrapper = lp(do_stuff)
lp_wrapper(numbers)
lp.print_stats()

Timer unit: 1e-07 s

Total time: 0.0009557 s

Could not find file C:\Users\Kashish.sukhwani\AppData\Local\Temp\ipykernel_14340\610329918.py
Are you sure you are running this program from the same directory
that you ran the profiler from?
Continuing without the function's contents.

Line #      Hits         Time  Per Hit   % Time  Line Contents
     7                                           
     8         1        124.0    124.0      1.3  
     9         1       3689.0   3689.0     38.6  
    10         1       5744.0   5744.0     60.1  



The output above only covers the calling function. To profile the called function as well, register it with ```add_function()``` like this:

In [20]:
from line_profiler import LineProfiler
import random

def do_other_stuff(numbers):
    s = sum(numbers)

def do_stuff(numbers):
    do_other_stuff(numbers)
    l = [numbers[i]/43 for i in range(len(numbers))]
    m = ['hello'+str(numbers[i]) for i in range(len(numbers))]

numbers = [random.randint(1,100) for i in range(1000)]
lp = LineProfiler()
lp.add_function(do_other_stuff)   # add additional function to profile
lp_wrapper = lp(do_stuff)
lp_wrapper(numbers)
lp.print_stats()

Timer unit: 1e-07 s

Total time: 1.8e-05 s

Could not find file C:\Users\Kashish.sukhwani\AppData\Local\Temp\ipykernel_14340\991343514.py
Are you sure you are running this program from the same directory
that you ran the profiler from?
Continuing without the function's contents.

Line #      Hits         Time  Per Hit   % Time  Line Contents
     4                                           
     5         1        180.0    180.0    100.0  

Total time: 0.0016124 s

Could not find file C:\Users\Kashish.sukhwani\AppData\Local\Temp\ipykernel_14340\991343514.py
Are you sure you are running this program from the same directory
that you ran the profiler from?
Continuing without the function's contents.

Line #      Hits         Time  Per Hit   % Time  Line Contents
     7                                           
     8         1        229.0    229.0      1.4  
     9         1       6794.0   6794.0     42.1  
    10         1       9101.0   9101.0     56.4  

