# Profilers  

27.4. The Python Profilers

https://docs.python.org/3/library/profile.html

### Contents

This notebook covers two ways to perform profiling:

* **A  Rapidly Perform Profiling**


* **B  Profiling in Python Module**

then gives methods for

* **C Improving the performance**

and finally covers

* **D Property and Decorator**



## 27.4.1. Introduction to the profilers

**A profile is a set of statistics** that describes

*  **how often** and for **how long** various **parts** of the program **executed**.



These statistics can be `formatted` into **reports** via the **pstats** module.

The Python standard library provides **two** different implementations of the same profiling interface:

* **cProfile** 

* **profile** 

Both provide deterministic profiling of Python programs. 

**1 cProfile** 

`cProfile` is recommended for most users.

It’s a `C` extension with reasonable `overhead` that makes it suitable for profiling long-running programs. 
  
**2 profile**

`profile` is a pure `Python` module whose interface is imitated by cProfile, but it adds `significant overhead` to profiled programs.
   
If you’re trying to extend the profiler in some way, the task might be `easier` with this module. 
   

### Profilers: Performance analysis of Python programs.

* **cProfile or profile**: collect the `raw` profiling data  

* **pstats**: `manipulates` and `prints` the data in a `raw` profiling results file



### A  `Rapidly` Perform Profiling

#### 27.4.2. Instant User’s Manual: `rapidly` perform profiling

This section is provided for users that “don’t want to read the manual.” 

It provides a very `brief` overview, and allows a user to rapidly perform profiling on an existing application.

The most basic starting point in the profile modules is `run()`:

```python
   cProfile.run(command, filename=None, sort=-1)
```
It takes **a string statement** (`command`) as its argument, executes it, and prints a **report** of the time spent in the functions called while running the statement. The `filename` and `sort` arguments are optional.
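The following is a minimal sketch of the `sort` parameter. The statement string is an arbitrary example chosen for illustration and does not come from the original notebook. `run()` executes the string in the `__main__` namespace and prints its report to standard output. Here that output is captured with `contextlib.redirect_stdout` so it can be inspected afterwards.

```python
import contextlib
import cProfile
import io

# The statement is a string; cProfile.run() executes it in the __main__ namespace,
# so it should only use builtins or names that already exist there.
statement = "sorted(str(i) for i in range(10000))"   # arbitrary example workload

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):          # capture the report that run() prints
    cProfile.run(statement, sort="cumulative")    # order the report by cumulative time

report = buffer.getvalue()
# the summary line has the form "<N> function calls ... in <T> seconds"
summary = next(line for line in report.splitlines() if "function calls" in line)
print(summary.strip())
```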

### Example: profiling the function `fib(n)`

This `recursive` version of a `Fibonacci` sequence calculator is especially useful for demonstrating the profiler because its `performance` can be `improved significantly`. 


In [None]:
import cProfile

def fib(n):
    # https://en.wikipedia.org/wiki/Fibonacci_number
    # http://en.literateprograms.org/Fibonacci_numbers_(Python)
    if n == 0 or n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

def fib_seq(n):
    
    seq = [ ]
    if n > 0:
        seq.extend(fib_seq(n-1))
    seq.append(fib(n))
    
    return seq
# Profiling
cProfile.run('print(fib_seq(10))')

#### <strong style="color:blue">1 The first line</strong>

**523 function calls (71 primitive calls) in 0.001 seconds**
   
* indicates that:
     
   * 523 calls were monitored.
   
   * Of those calls, 71 were **primitive**, meaning that the call was not induced via **recursion**. 

#### <strong style="color:blue">2 The next line</strong>

**`ordered` by: `standard name`** 
   
* indicates that the text string in the `far-right column`, `filename:lineno(function)`, was used to `sort` the output. 

**The column headings include:**
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3    0.000    0.000    0.000    0.000 iostream.py:93(_event_pipe)
        3    0.000    0.000    0.000    0.000 socket.py:333(send)
```
**`ordered` by `filename:lineno(function)` column**

#### <strong style="color:blue">3 The column headings in details</strong> 

* **ncalls**: the number of calls to the given function
   
```
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
453/11    0.000    0.000    0.000    0.000 <ipython-input-3-50b9aa81fe35>:3(fib)
```
   When the column holds `two` numbers, such as **453/11**, the function **recursed**:
     
   the `second` value, **11**, is the number of `primitive` calls, and the `first` value, **453**, is the `total` number of calls. 
     
   Note that when the function does `not recurse`, these two values are the same, and only a `single figure` is printed:

```
  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3    0.000    0.000    0.000    0.000 iostream.py:93(_event_pipe)
```

* **tottime**: the `total` time spent in the given function itself, **excluding** time spent in calls to `sub-functions`.


* **percall**: the `quotient` of **tottime** divided by **ncalls** (`tottime/ncalls`), i.e. the average time per call spent in the function itself.


* **cumtime**: the `cumulative` time spent in **this `and` all subfunctions** (from invocation till exit). 
  

* **percall**: the `quotient` of **cumtime** divided by **primitive calls** (`cumtime/primitive calls`), i.e. the average time per primitive call spent in the function and all its subfunctions.


* **filename:lineno(function)**: the file name, line number, and function name that identify each function.
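The sketch below rebuilds the `ncalls` and `percall` columns for `fib` and `fib_seq` from the raw numbers. It relies on `Stats.stats`, a dictionary attribute that `print_stats()` reads but that is not one of the documented methods, so treat it as an implementation detail. Each key is a `(filename, lineno, funcname)` tuple. Each value is a tuple `(cc, nc, tt, ct, callers)`: primitive calls, total calls, tottime, cumtime, and caller information. `Profile.runcall()` profiles a single function call, so no string statement is needed.

```python
import cProfile
import pstats

def fib(n):
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

def fib_seq(n):
    seq = []
    if n > 0:
        seq.extend(fib_seq(n - 1))
    seq.append(fib(n))
    return seq

pr = cProfile.Profile()
pr.runcall(fib_seq, 10)          # profile one call of fib_seq(10)
st = pstats.Stats(pr)

# st.stats maps (filename, lineno, funcname) -> (cc, nc, tt, ct, callers)
rows = {}
for (filename, lineno, funcname), (cc, nc, tt, ct, callers) in st.stats.items():
    if funcname in ("fib", "fib_seq"):
        rows[funcname] = {
            "ncalls": str(nc) if nc == cc else f"{nc}/{cc}",  # total/primitive
            "percall_tottime": tt / nc,   # first percall column: tottime / ncalls
            "percall_cumtime": ct / cc,   # second percall column: cumtime / primitive calls
        }

for name, row in rows.items():
    print(name, row["ncalls"])
```

With the recursive `fib` above, this reports `453/11` for `fib` and `11/1` for `fib_seq`. These are the same counts as the notebook output.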


## Saving and analyzing the results
  
### 1 Save the results to a file
 
  * Pass a filename as the second argument of the `run()` function:

In [None]:
import cProfile

#cProfile.run('re.compile("foo|bar|stats")')

# 'fib_stats': the filename passed to run(); the raw statistics are saved to it instead of being printed
cProfile.run('print(fib_seq(10))','fib_stats')

In [None]:
!dir fib_*
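The same kind of raw file can also be produced without a string statement. This sketch uses a `cProfile.Profile` object: `dump_stats()` writes the raw statistics and `pstats.Stats()` reads them back. It writes to a temporary directory so the example is self-contained. From a shell, `python -m cProfile -o fib_stats myscript.py` does the same for a whole script; `myscript.py` is a hypothetical file name.

```python
import cProfile
import os
import pstats
import tempfile

def fib(n):
    # same recursion as above: fib(0) == fib(1) == 1
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

pr = cProfile.Profile()
pr.runcall(fib, 15)                        # profile a single call

# dump_stats() writes the raw (marshal-based) statistics file,
# the same kind of file that cProfile.run(statement, filename) creates
path = os.path.join(tempfile.mkdtemp(), "fib_stats")
pr.dump_stats(path)

# pstats.Stats accepts a file name and loads the saved statistics back
loaded = pstats.Stats(path)
loaded.strip_dirs().sort_stats("cumulative").print_stats(5)
```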

### 2 pstats: `manipulating` and `printing` the data saved into a profile `results` file

**pstats**: saving and working with statistics
   
Reports of the raw profiling data from `run()` can be processed separately with the `Stats` class from `pstats`: **pstats.Stats**

The **pstats.Stats** class provides a variety of methods for manipulating and printing the data saved into a profile results file:  
  

* **strip_dirs()**: removes the extraneous path from all the module names. 

* **sort_stats()**: sorts all the entries by the given key(s); for example, `sort_stats('stdname')` sorts them by the standard <b>module/line/name</b> string that is printed.

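```
 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3    0.000    0.000    0.000    0.000 socket.py:333(send)

    sort key 'filename' (or 'module') -> socket.py
    sort key 'line'                   -> 333
    sort key 'name'                   -> send
    sort key 'stdname'                -> socket.py:333(send), the whole string

    numeric keys (kept for backward compatibility):
    -1 -> 'stdname', 0 -> 'calls', 1 -> 'time', 2 -> 'cumulative'
```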


* **print_stats()**: prints out the statistics; optional arguments restrict which lines are printed.

In [None]:
import pstats

p = pstats.Stats('fib_stats')

# 1 print raw: entries are listed in arbitrary order ("Random listing order was used")
#p.print_stats()

# 2 ordered by the standard name string, filename:lineno(function)
#   (the numeric key -1 means the same as 'stdname')
#p.sort_stats('stdname').print_stats()

# 3 remove the extraneous path from all module names, then sort by function name
p.strip_dirs().sort_stats('name').print_stats()

### The following are some interesting calls to experiment with:

### 1  understand `what algorithms` are taking time.

This sorts the profile by `cumulative time` in a function, and then only prints `the ten most` significant lines.

* `cumulative` time -> shows `which algorithms` are taking time (`cumulative`: time spent in the given function and all subfunctions).


In [None]:
p.sort_stats('cumulative').print_stats(10)


### 2 Find which `functions` were looping a lot and taking a lot of time

This sorts according to the time spent within **each function** itself, and then prints the statistics for the top ten functions.
   
* `time` (internal time, the **tottime** column) -> the total time spent in the `given function` itself

In [None]:
p.sort_stats('time').print_stats(10)

### 3 Sort all the statistics by file name

and then print out statistics only for the lines whose name matches **fib** (the string argument is treated as a regular expression).


In [None]:
p.sort_stats('filename').print_stats('fib')

### 4 Sort by two keys and restrict the printout

This line sorts statistics with

* a **primary** key: internal time (tottime)
 
* a **secondary** key: cumulative time

and then prints out some of the statistics. 

To be specific, the `list` is first `culled down` to **50% `(.5)`** of its original `size`, then `only lines` containing **fib** are kept, and that sub-sub-list is printed.

In [None]:
p.sort_stats('time', 'cumulative').print_stats(.5,'fib')
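Since Python 3.7, `pstats.SortKey` offers enum constants for the sort keys, which guards against typos in key strings. The sketch below repeats the call above with `SortKey.TIME` and `SortKey.CUMULATIVE`. It prints the report into an `io.StringIO` buffer (the `stream` argument of `pstats.Stats`) so that the output can be inspected as a string. It profiles `fib_seq(20)` so that `fib` clearly dominates the internal time.

```python
import cProfile
import io
import pstats
from pstats import SortKey   # string-valued enum of sort keys (Python 3.7+)

def fib(n):
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

def fib_seq(n):
    seq = []
    if n > 0:
        seq.extend(fib_seq(n - 1))
    seq.append(fib(n))
    return seq

pr = cProfile.Profile()
pr.runcall(fib_seq, 20)

out = io.StringIO()
p = pstats.Stats(pr, stream=out)
# same as p.sort_stats('time', 'cumulative').print_stats(.5, 'fib')
p.strip_dirs().sort_stats(SortKey.TIME, SortKey.CUMULATIVE).print_stats(.5, "fib")

report = out.getvalue()
print(report)
```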

## B Profiling in Python Module

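Instead of `cProfile.run()`, which takes a string statement, this section creates a `cProfile.Profile` object and brackets the code to profile with its `enable()` and `disable()` methods:
```python
   pr = cProfile.Profile()
   pr.enable()
   # ... code to profile ...
   pr.disable()
```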

**Example**: the `IAPWS97` class from the third-party `iapws` package (`iapws.iapws97`)

### 1 Writing profiling data to a `file`

In [None]:
import cProfile
import pstats

from iapws.iapws97 import IAPWS97

p1 = 16.10
t1 = 535.10
p2=3.56
t2=315

# 1 profiling 
pr = cProfile.Profile()

pr.enable()

steamin = IAPWS97(P=p1, T=t1)
steamout = IAPWS97(P=p2, T=t2)

pr.disable()

# 2 write the profiling report to a text file
filename = "iapws97_stats"
sortby = 'cumulative'

# the with-block closes the file even if printing raises an error
with open(filename, "w", encoding="utf-8") as profilingdatafile:
    ps = pstats.Stats(pr, stream=profilingdatafile).sort_stats(sortby)
    ps.print_stats()

In [None]:
%load iapws97_stats

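Since Python 3.8, `cProfile.Profile` can also be used as a context manager. It calls `enable()` on entry and `disable()` on exit, so the profiler is switched off even if the profiled code raises. This sketch replaces the third-party `iapws` calls with the recursive `fib`, redefined here so the block is self-contained. It writes the report to a file in the temporary directory; the file name `fib_stats.txt` is an arbitrary choice.

```python
import cProfile
import io
import os
import pstats
import tempfile

def fib(n):
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

# enable() on entering the block, disable() on leaving it (Python 3.8+)
with cProfile.Profile() as pr:
    fib(18)

buffer = io.StringIO()
pstats.Stats(pr, stream=buffer).sort_stats("cumulative").print_stats(5)  # top 5 lines
report = buffer.getvalue()

# the with-block closes the file even if writing fails
path = os.path.join(tempfile.gettempdir(), "fib_stats.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(report)

print(report)
```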

### 2 Writing profiling data to an in-memory text stream


### 6.2. io — Core tools for working with streams

https://docs.python.org/3/library/io.html
    
The **io** module provides Python’s main facilities for dealing with `various types of I/O`.

There are three main types of I/O: 

* text I/O: expects and produces `str` objects.

* binary I/O (also called buffered I/O): expects bytes-like objects and produces `bytes` objects.

* raw I/O (also called unbuffered I/O): is generally used as a low-level building block for binary and text streams.

**In-memory text streams** are also available as **StringIO** objects:   

In [None]:
import io

f = io.StringIO("some initial text data") # f:In-memory text streams 

f.getvalue()

#### Using `io.StringIO()` to write profiling data to an in-memory text stream

In [None]:
import cProfile
import pstats

import io

from iapws.iapws97 import IAPWS97

p1 = 16.10
t1 = 535.10
p2=3.56
t2=315
# 1 profiling 
pr = cProfile.Profile()

pr.enable()

steamin = IAPWS97(P=p1, T=t1)
steamout = IAPWS97(P=p2, T=t2)

pr.disable()

# 2 profiling data in  In-memory text stream: profilingdata 

profilingdata = io.StringIO() # 1 : In-memory text streams:profilingdata 

sortby = 'cumulative'

ps = pstats.Stats(pr, stream=profilingdata).sort_stats(sortby) # 2 : Stats in In-memory text streams:profilingdata 

ps.print_stats()

print(profilingdata.getvalue()) # 3: get In-memory text streams

# 3 In-memory text stream to file
filename = "iapws97_stats_memory_text"
with open(filename, "w", encoding="utf-8") as datafile:
    print(profilingdata.getvalue(), file=datafile) # write the in-memory text to the file

In [None]:
%load iapws97_stats_memory_text


## C Improve the Performance: memoization
 
```cProfile.run('print(fib_seq(20))')```

The standard report format shows a summary and then details for each function executed.

In [None]:
import cProfile

# Profiling
cProfile.run('print(fib_seq(20))')

* **fib_seq(10)**
```
513 function calls (61 primitive calls) in 0.000 seconds
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     11/1    0.000    0.000    0.000    0.000 <ipython-input-13-040ebd920202>:11(fib_seq)
   453/11    0.000    0.000    0.000    0.000 <ipython-input-13-040ebd920202>:3(fib)
```
* **fib_seq(20)**
```
  57381 function calls (91 primitive calls) in 0.022 seconds
  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     21/1    0.000    0.000    0.022    0.022 <ipython-input-13-040ebd920202>:11(fib_seq)
 57291/21    0.022    0.000    0.022    0.001 <ipython-input-13-040ebd920202>:3(fib)
```

**NOTE:**

**fib_seq(20):57381 function calls `>>>`fib_seq(10):513 function calls** 

* 1 the amount of function calls `increases` **exponentially** for increasing `values` of **n**

**57381 function calls >>> 91 primitive calls** 
 
* 2 `most of the time` here is spent calling `fib()` repeatedly. 


**Because** the function recomputes values that it has **already** calculated, **again and again**:

```
f(4)->f(3),f(2)

f(3)->f(2),f(1)
```
![recursion_without_cache](./img/recursion_without_cache.png)
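The growth can be made precise with a short recurrence. Let $C(n)$ be the number of calls that `fib(n)` makes, itself included. Using this notebook's convention `fib(0) = fib(1) = 1`:

$$
C(0) = C(1) = 1, \qquad C(n) = 1 + C(n-1) + C(n-2) \quad (n \ge 2)
$$

Substituting $C(k) = 2\,\mathrm{fib}(k) - 1$ into the right-hand side gives $1 + 2\,\mathrm{fib}(n-1) - 1 + 2\,\mathrm{fib}(n-2) - 1 = 2\,\mathrm{fib}(n) - 1$. By induction, $C(n) = 2\,\mathrm{fib}(n) - 1$, so the call count grows like the Fibonacci numbers themselves, roughly by a factor of $\varphi \approx 1.618$ per step. `fib_seq(n)` calls `fib(k)` once for every $k = 0, \dots, n$, so the total number of `fib` calls is

$$
\sum_{k=0}^{n} C(k) = 2\sum_{k=0}^{n} \mathrm{fib}(k) - (n+1).
$$

This gives $2 \cdot 232 - 11 = 453$ for $n = 10$ and $2 \cdot 28656 - 21 = 57291$ for $n = 20$, matching the `ncalls` column of `fib` in the profiles above.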

### Using `memoization` to speed up recursive algorithms

The easy way to optimize this would be to 

* cache the value  of fib(n) in a **dictionary** 

```python
{n:the value of fib(n)}
```

* then, check whether that value of **n** has been computed previously. 

  * If it has, return its value from the `dictionary`.
  

  * If not, `call the function fib(n)` and store the result in the dictionary. 
  
This is **memoization**.

* https://en.wikipedia.org/wiki/Memoization

* http://avinashv.net/2008/04/python-decorators-syntactic-sugar/

In [None]:
class memoize:
    """
      from http://avinashv.net/2008/04/python-decorators-syntactic-sugar/
    """
    def __init__(self, function):
        self.function = function
       
        # a dictionary, `self.memoized`, that acts as our cache
        self.memoized = {} # key:value->n: the value of fib(n)

    def __call__(self, *args):
        try:
            return self.memoized[args]   
        except KeyError:
            self.memoized[args] = self.function(*args) # add new key:value to the dict
            return self.memoized[args]

### `__call__`

`__call__` implements the built-in **function call** operator: for an instance `f`, the call `f(1, 2, 3)` runs `f.__call__(1, 2, 3)`.

In [None]:
class foo:
    def __init__(self,*args):
        print('__init__', *args)
        
    def __call__(self, *args):
        print('__call__',*args)
        return args

f = foo(4,5,6)
# __call__ built-in function call operator.
f(1,2,3) 

The class keeps a `dictionary`, **self.memoized**, that acts as our **cache**, and uses **exception handling**: looking up a key that doesn't exist in a dictionary raises **KeyError**, which is the signal to compute the value and store it.

This class is **generalized** and will work for **any recursive function** that could benefit from `memoization`, as long as its positional arguments are hashable (they form the dictionary key).
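The standard library packages the same idea as `functools.lru_cache`. With `maxsize=None` the cache is unbounded, and Python 3.9 adds `functools.cache` as a shorthand. The sketch below uses it as a swapped-in alternative to the hand-written `memoize` class. The global `calls` counter is added only to count how often the function body really runs.

```python
import functools

calls = 0   # how many times the body of fib() actually runs

@functools.lru_cache(maxsize=None)   # unbounded cache keyed on the arguments
def fib(n):
    global calls
    calls += 1
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)   # recursive calls also go through the cache

def fib_seq(n):
    seq = []
    if n > 0:
        seq.extend(fib_seq(n - 1))
    seq.append(fib(n))
    return seq

result = fib_seq(20)
print(result[-1], calls)
print(fib.cache_info())
```

For `fib_seq(20)`, the body runs 21 times, once for each `n` from 0 to 20. `cache_info()` reports 38 hits and 21 misses.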

### The Memoize Decorator

**Decorators** offer a convenient way to apply **memoization**.

We can add **a memoize decorator** to `reduce` the number of `recursive` calls.

In [None]:
import cProfile

@memoize
def fib(n):
    # https://en.wikipedia.org/wiki/Fibonacci_number
    # http://en.literateprograms.org/Fibonacci_numbers_(Python)
    if n == 0 or n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)

def fib_seq(n):
    
    seq = [ ]
    if n > 0:
        seq.extend(fib_seq(n-1))
    seq.append(fib(n))
    
    return seq

# title
print('memoized Fibonacci \n','=' * 80)

# Profiling
cProfile.run('print(fib_seq(20))')

```
170 function calls (112 primitive calls) in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    59/21    0.000    0.000    0.000    0.000 <ipython-input-16-2cf0dfdc9e5d>:11(__call__)
     21/1    0.000    0.000    0.000    0.000 <ipython-input-17-d7b926d0b05b>:11(fib_seq)
       21    0.000    0.000    0.000    0.000 <ipython-input-17-d7b926d0b05b>:3(fib)
```

Memoization has **a big impact on the performance** of this function:

```
57381 function calls (91 primitive calls) in 0.022 seconds
 
170 function calls (112 primitive calls) in 0.000 seconds
```

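By remembering the <b>Fibonacci</b> value at each level, we can `avoid` most of the `recursion`.

The **ncalls** count for **fib()** is a single number, which shows that it `never` recurses: every recursive call goes through the memoize wrapper (`__call__`), which calls the undecorated `fib()` only on a cache miss.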
```
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
21    0.000    0.000    0.000    0.000 <ipython-input-17-d7b926d0b05b>:3(fib)
```


In [None]:
%timeit fib(20) 

### Decorators

A `decorator` is any `callable` Python object that is used to `modify` the definition of 

* **function**


* **method** 


* **class** 


A decorator is passed the **original object** being defined and `returns` a **modified object**, which is then bound to the original name.


In [None]:
# square sum
def square_sum(a, b):
    return a**2 + b**2

# square diff
def square_diff(a, b):
    return a**2 - b**2

print(square_sum(3, 4))
print(square_diff(3, 4))

#### Adding input printing

##### 1 Modify the function definitions directly

In [None]:
# modify: print input

# square sum
def square_sum(a, b):
    print("input", a, b)
    return a**2 + b**2

#  square diff
def square_diff(a, b):
    print("input", a, b)
    return a**2 - b**2


print(square_sum(3, 4))
print(square_diff(3, 4))


##### 2 Using a decorator

In [None]:
def printinput(func):
    
    def new_func(a, b):
        print("input", a, b) # add print input to the original func
        return func(a, b)
    
    return new_func

# square sum
@printinput
def square_sum(a, b):
    return a**2 + b**2

# square diff
@printinput
def square_diff(a, b):
    return a**2 - b**2

print(square_sum(3, 4))
print(square_diff(3, 4))
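The `printinput` decorator above only fits functions that take exactly two positional arguments. A common generalization, sketched below, forwards `*args` and `**kwargs`. It also uses `functools.wraps` so that the wrapper keeps the original function's `__name__` and `__doc__`. The second function is decorated by hand to show that `@printinput` is shorthand for `square_diff = printinput(square_diff)`.

```python
import functools

def printinput(func):
    """Decorator that prints the arguments of every call before forwarding it."""
    @functools.wraps(func)             # copy __name__, __doc__, ... onto the wrapper
    def new_func(*args, **kwargs):     # accept any signature, not just (a, b)
        print("input", args, kwargs)
        return func(*args, **kwargs)
    return new_func

@printinput
def square_sum(a, b):
    """Return a**2 + b**2."""
    return a**2 + b**2

# the @ syntax is shorthand for rebinding the name:
def square_diff(a, b):
    return a**2 - b**2
square_diff = printinput(square_diff)

print(square_sum(3, 4))          # prints the input line, then 25
print(square_diff(a=3, b=4))     # keyword arguments are forwarded too; result -7
print(square_sum.__name__)       # 'square_sum', kept by functools.wraps
```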

## D Property and Decorator 
 
### Data encapsulation 

**Data encapsulation**: the bundling of data with the `methods` that `operate` on these data.

These `methods` are of course the 
    
* **getter** for `retrieving` the data and 
    
* **setter**  for `changing` the data

[8.1.2 Using Classes to Keep Track of Students and Faculty](./Unit3-1-08_CLASSES_AND_OBJECT-ORIENTED_PROGRAMMING.ipynb#8.1.2-Using-Classes-to-Keep-Track-of-Students-and-Faculty)

   * (2) One can then access information about these instances using the `methods` associated with them, e.g. the last name via **him.getLastName()**, not **him.lastName**
    



### property class

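Python's built-in **property** lets a class route attribute access through getter, setter, and deleter methods, so callers can write `obj.x` instead of `obj.getx()`. This makes the life of an object-oriented programmer much simpler.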

* https://docs.python.org/3/library/functions.html#property
```python
class property(fget=None, fset=None, fdel=None, doc=None)
```
Return a property attribute.

  * **fget**: a function for getting an attribute value.
  * **fset**: a function for setting an attribute value.
  * **fdel**:  a function for deleting an attribute value. 
  * **doc**: a docstring for the attribute.

A typical use is to define a managed attribute **x**:

In [1]:
class Cproperty:
    def __init__(self):
        self._x = None

    def getx(self):
        return self._x

    def setx(self, value):
        self._x = value

    def delx(self):
        del self._x
    
    x = property(getx, setx, delx, "I'm the 'x' property.")

In [2]:
c1=Cproperty()

c1.setx(1)

xvalue=c1.getx()

print(xvalue)

#print(c1.getx())
#print(c1._x)

#c1.delx()
#print(c1.getx())

1


In [3]:
c1=Cproperty()

c1.x=2

print(c1.x)

#del c1.x
print(c1.x)

2
2


#### Private Variables

A **single underscore(_)** before a name is used to specify that the name is to be **treated** as **`private` by a programmer.** 

It’s a `convention`, so that the next person (or yourself) using your code knows that a name starting with **`_`** is for `internal` use.


In [4]:
c1._x

2

**The preferred way, through the property:**

In [5]:
c1.x

2


> Reference Python Doc: 9.6. Private Variables https://docs.python.org/tutorial/classes.html#tut-private
>
>“Private” instance variables that cannot be accessed except from inside an object don’t exist in Python. However, there is a convention that is followed by most Python code: a name prefixed with an underscore (e.g. _spam) should be treated as a non-public part of the API (whether it is a function, a method or a data member). It should be considered an implementation detail and subject to change without notice.

#### property()  as a decorator

This makes it possible to create **read-only** `properties` easily using **property()** as **a decorator**.

A property object has `getter`, `setter`, and `deleter` methods usable as `decorators` that create a copy of the property with the corresponding accessor function set to the decorated function. 

This is best explained with an example:

In [None]:
class Cproperty:
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @x.deleter
    def x(self):
        del self._x

In [None]:
c1=Cproperty()
c1.x=2
print(c1.x)

#del c1.x
#print(c1.x)
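The text above mentions **read-only** properties, but the `Cproperty` example defines a setter. Here is a minimal sketch with a hypothetical `Circle` class. `radius` has a validating setter, while `area` has only a getter, so assigning to `area` raises `AttributeError`.

```python
import math

class Circle:
    """Hypothetical example: radius is read-write, area is read-only."""

    def __init__(self, radius):
        self.radius = radius           # goes through the radius setter below

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("radius must be non-negative")
        self._radius = value

    @property
    def area(self):                    # no setter defined -> read-only attribute
        return math.pi * self._radius ** 2

c = Circle(2.0)
print(c.area)
try:
    c.area = 10.0                      # assigning a property without a setter fails
except AttributeError as exc:
    print("AttributeError:", exc)
```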

#### Example: The class `node`

The class `node` in [PyRankine](https://github.com/PySEE/PyRankine)

In [None]:
import seuif97 as if97

class node(object):
    """water properties"""
    
    def __init__(self):
        self.p = None
        self.t = None
        self.x = None
        self.h = None
        self.s = None
        self.v = None
 
    def pt(self):
        self.h = if97.pt2h(self.p, self.t)
        self.s = if97.pt2s(self.p, self.t)
        self.v = if97.pt2v(self.p, self.t)
        self.x = if97.pt2x(self.p, self.t)

    def __str__(self):
        result=('{:6.3f}\t {:6.2f}\t {:7.2f}\t {:5.2f} \t {:6.3f}\t {:5.3}'.format
                (self.p,self.t,self.h,self.s,self.v, self.x))
        return  result       
   

In [None]:
n1=node()
n1.p=16
n1.t=535

n1.pt()
print(n1)

# t changed 
n1.t=600

n1.pt()
# t changed: without the n1.pt() call above, print(n1) would still show the old h, s, ...

print(n1)


##### Using @property decorator to modify the class `node`

`@<name>.setter` defines the method that sets an attribute value; here, setting `p` or `t` also recalculates the state.


In [None]:
import seuif97 as if97

class node(object):
    """water properties"""
    
    def __init__(self):
        self._p = None
        self._t = None
        self._x = None
        self._h = None
        self._s = None
        self._v = None
    
    @property
    def p(self):
        return self._p

    @p.setter # setting P value.
    def p(self, value):
        self._p = value
        # calState
        self.calState()

    @property
    def t(self):
        return self._t

    @t.setter # setting t value.
    def t(self, value):
        self._t = value
        # calState
        self.calState()    
    
    @property
    def h(self):
        return self._h
    
    @property
    def s(self):
        return self._s
    
    @property
    def x(self):
        return self._x
    
    def pt(self):
        self._h = if97.pt2h(self._p, self._t)
        self._s = if97.pt2s(self._p, self._t)
        self._v = if97.pt2v(self._p, self._t)
        self._x = if97.pt2x(self._p, self._t)

   
    def calState(self):
        if self._p is not None and self._t is not None:
            self.pt()

    def __str__(self):
        result=('{:6.3f}\t {:6.2f}\t {:7.2f}\t {:5.2f} \t {:6.3f}\t {:5.3}'.format
                (self._p,self._t,self._h,self._s,self._v, self._x))
        return  result       
   

In [None]:
n1=node()
n1.p=16.
n1.t=535
print(n1)
print(n1.h)

# t changed 
n1.t=600
# got the right values(h,s,..) of (p,t)
print(n1)
print(n1.h)

## Further Reading 

### Python 3 Module of the Week:

profile and pstats — Performance Analysis

https://pymotw.com/3/profile/index.html  

