# Efficient Python Code

In this lecture we will study ways to time, profile and speed up Python code.

## Timing and profiling code

Progress bars with `tqdm`:

In [1]:
from time import sleep

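from tqdm import tqdm, trange
# notebook widget version; the old tqdm.tqdm_notebook alias is deprecated
from tqdm.notebook import tqdm as tqdm_notebook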

In [2]:
for i in tqdm(range(10)):
    sleep(.1)

100%|██████████| 10/10 [00:01<00:00,  9.83it/s]


In [3]:
def generator():
    for i in range(10):
        yield i

In [4]:
for i in tqdm(generator()):
    sleep(.1)

10it [00:01,  9.85it/s]


In [5]:
for i in tqdm(generator(), total=10):
    sleep(.1)

100%|██████████| 10/10 [00:01<00:00,  9.85it/s]
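
The two generator runs differ because tqdm can show a percentage only when it knows the total. It tries `len()` on the iterable: a `range` has a length, but a generator does not, so tqdm falls back to counting iterations (`10it`) unless `total` is passed. A quick standard-library check of that difference (the generator is the same one defined above):

```python
from collections.abc import Sized


def generator():
    # Yields 0..9 lazily; the total is not known up front
    for i in range(10):
        yield i


r = range(10)
g = generator()

print(len(r))                 # a range knows its length
print(isinstance(g, Sized))   # a generator does not implement __len__

try:
    len(g)
except TypeError as exc:
    print("len() failed:", exc)
```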


In [6]:
for i in trange(10):
    sleep(.1)

100%|██████████| 10/10 [00:01<00:00,  9.85it/s]


In [7]:
for i in tqdm_notebook(range(10)):
    sleep(.1)


In [8]:
import pandas as pd

In [9]:
tqdm.pandas()


In [10]:
amazon_fires = pd.read_csv('data/amazon.csv', encoding='latin1')

In [11]:
amazon_fires.head()

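| Unnamed: 0 | Ano | Estado | Mês | Número | Período |
|---|---|---|---|---|---|
| 0 | 1998 | Acre | Janeiro | 0.0 | 01-01-1998 |
| 1 | 1999 | Acre | Janeiro | 0.0 | 01-01-1999 |
| 2 | 2000 | Acre | Janeiro | 0.0 | 01-01-2000 |
| 3 | 2001 | Acre | Janeiro | 0.0 | 01-01-2001 |
| 4 | 2002 | Acre | Janeiro | 0.0 | 01-01-2002 |

(The column names are Portuguese: Ano = year, Estado = state, Mês = month, Número = number of fires, Período = period, given as a date.)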


In [12]:
amazon_fires.progress_apply(lambda x: '{} {}'.format(x['Estado'], x['Mês']), axis='columns')

100%|██████████| 6454/6454 [00:00<00:00, 13858.32it/s]


0             Acre Janeiro
1             Acre Janeiro
2             Acre Janeiro
3             Acre Janeiro
4             Acre Janeiro
               ...        
6449    Tocantins Dezembro
6450    Tocantins Dezembro
6451    Tocantins Dezembro
6452    Tocantins Dezembro
6453    Tocantins Dezembro
Length: 6454, dtype: object

## Timing code

Magic commands (or magic functions) are one of the main enhancements IPython offers over the standard Python shell. They solve common problems in interactive data analysis, and some of them also control the behaviour of IPython itself. Line magics start with `%` and apply to a single line; cell magics start with `%%` and apply to the whole cell.

In [13]:
def good_practice(rand_array):
    bigger_than_fifties = [*rand_array[rand_array > 50]] # using masking, broadcasting and unpacking over an np.array
    return bigger_than_fifties

def bad_practice(rand_list):
    bigger_than_fifties = []
    for i in range(len(rand_list)):
        if rand_list[i] > 50:
            bigger_than_fifties.append(rand_list[i])
    return bigger_than_fifties

In [14]:
import numpy as np

rand_array = np.random.randint(100, size=1000)
rand_list = [*rand_array]
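
Note that `[*rand_array]` produces a list of NumPy scalar objects (such as `numpy.int64`), not built-in Python `int`s, and comparing NumPy scalars inside a Python loop is typically slower than comparing plain ints. So `bad_practice(rand_list)` pays some extra overhead on top of the loop itself. A quick check, using `ndarray.tolist()` as the conversion to Python ints:

```python
import numpy as np

rand_array = np.random.randint(100, size=1000)

# Unpacking keeps NumPy scalar objects (e.g. numpy.int64)
unpacked = [*rand_array]
# .tolist() converts every element to a built-in Python int
converted = rand_array.tolist()

print(type(unpacked[0]))
print(type(converted[0]))
```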

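The magic commands `%time` and `%timeit`: `%time` runs the statement once and reports CPU and wall-clock time, while `%timeit` runs it in a loop, repeats that several times (7 runs by default) and reports the mean and standard deviation per loop. The `-n` option sets the number of loops per run.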

In [15]:
%time a = bad_practice(rand_list)

CPU times: user 571 µs, sys: 51 µs, total: 622 µs
Wall time: 628 µs
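
Magics exist only inside IPython. In a plain Python script, a rough equivalent of `%time` can be built from the standard library: `time.perf_counter()` for wall-clock time and `time.process_time()` for the CPU time of the current process. A minimal sketch; the helper `time_once` is hypothetical, and it reports total CPU time instead of splitting user and sys time as `%time` does:

```python
import time


def bad_practice(rand_list):
    # Same index-based loop as in the notebook
    bigger_than_fifties = []
    for i in range(len(rand_list)):
        if rand_list[i] > 50:
            bigger_than_fifties.append(rand_list[i])
    return bigger_than_fifties


def time_once(func, *args):
    """Run func once and report CPU and wall-clock time (hypothetical helper)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    print(f"CPU time: {cpu * 1e6:.0f} µs, Wall time: {wall * 1e6:.0f} µs")
    return result, wall, cpu


data = list(range(100))
result, wall, cpu = time_once(bad_practice, data)
```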


In [16]:
%timeit a = bad_practice(rand_list)

491 µs ± 4.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
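
Outside IPython, the standard-library `timeit` module does roughly what `%timeit` does. This sketch, assuming a pure-Python list instead of the NumPy-derived `rand_list`, repeats the measurement 7 times with 100 calls each and prints mean ± standard deviation in the same format:

```python
import random
import statistics
import timeit


def bad_practice(rand_list):
    # Same index-based loop as in the notebook
    bigger_than_fifties = []
    for i in range(len(rand_list)):
        if rand_list[i] > 50:
            bigger_than_fifties.append(rand_list[i])
    return bigger_than_fifties


# Pure-Python stand-in for rand_list: 1000 random ints in [0, 99]
rand_list = [random.randint(0, 99) for _ in range(1000)]

loops = 100
# 7 runs; each run calls the function `loops` times and returns its total seconds
runs = timeit.repeat(lambda: bad_practice(rand_list), number=loops, repeat=7)
per_loop = [t / loops for t in runs]  # seconds per single call

mean = statistics.mean(per_loop)
std = statistics.stdev(per_loop)
print(f"{mean * 1e6:.0f} µs ± {std * 1e6:.2f} µs per loop "
      f"(mean ± std. dev. of 7 runs, {loops} loops each)")
```

The `timeit` documentation suggests looking at the minimum of the repeated timings rather than the mean, since higher values are usually caused by other processes interfering rather than by the code itself.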


In [17]:
%time a = good_practice(rand_array)

CPU times: user 455 µs, sys: 15 µs, total: 470 µs
Wall time: 395 µs


In [18]:
%timeit -n 100 a = good_practice(rand_array)

107 µs ± 5.97 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [19]:
%%time

a=bad_practice(rand_list)
b=good_practice(rand_array)

CPU times: user 1.05 ms, sys: 0 ns, total: 1.05 ms
Wall time: 993 µs


In [20]:
%%timeit

a=bad_practice(rand_list)
b=good_practice(rand_array)

583 µs ± 4.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [21]:
%%timeit -n 100

a=bad_practice(rand_list)
b=good_practice(rand_array)

599 µs ± 22.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [22]:
print(good_practice)

<function good_practice at 0x7fbc2d2cf830>


In [23]:
# conda install -c anaconda line_profiler 

In [24]:
%prun bad_practice(rand_list)

 

In [25]:
%prun good_practice(rand_array)
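
`%prun` runs the statement under Python's built-in profiler (cProfile), and in a notebook its report opens in the pager rather than as cell output. The same kind of report can be produced in a plain script with `cProfile` and `pstats`; this sketch profiles a pure-Python copy of `bad_practice` and keeps the report in a string:

```python
import cProfile
import io
import pstats
import random


def bad_practice(rand_list):
    # Same index-based loop as in the notebook
    bigger_than_fifties = []
    for i in range(len(rand_list)):
        if rand_list[i] > 50:
            bigger_than_fifties.append(rand_list[i])
    return bigger_than_fifties


rand_list = [random.randint(0, 99) for _ in range(1000)]

profiler = cProfile.Profile()
profiler.enable()
result = bad_practice(rand_list)
profiler.disable()

# Write the report into a string buffer, sorted by cumulative time, top 10 entries
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```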

 

In [26]:
%load_ext line_profiler

In [27]:
# Without -f no function is registered for line-by-line profiling, so nothing useful is reported
%lprun good_practice(rand_array)

In [28]:
%lprun -f good_practice good_practice(rand_array)

In [29]:
%lprun -f bad_practice bad_practice(rand_list)

## Memory profiling

In [30]:
# conda install -c anaconda memory_profiler 

In [31]:
%load_ext memory_profiler

In [32]:
%memit good_practice(rand_array)

peak memory: 111.41 MiB, increment: 0.29 MiB


In [33]:
%memit bad_practice(rand_list)

peak memory: 111.41 MiB, increment: 0.00 MiB


In [34]:
%%memit

a=bad_practice(rand_list)
b=good_practice(rand_array)

peak memory: 111.67 MiB, increment: 0.01 MiB
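
A different, standard-library technique is `tracemalloc`. Unlike `%memit`, which reports the memory usage of the whole process, `tracemalloc` counts only the memory blocks Python allocates after tracing starts, so its numbers are not directly comparable with the MiB figures above. A minimal sketch on pure-Python data:

```python
import random
import tracemalloc


def bad_practice(rand_list):
    # Same index-based loop as in the notebook
    bigger_than_fifties = []
    for i in range(len(rand_list)):
        if rand_list[i] > 50:
            bigger_than_fifties.append(rand_list[i])
    return bigger_than_fifties


rand_list = [random.randint(0, 99) for _ in range(1000)]

tracemalloc.start()
result = bad_practice(rand_list)
# (current, peak): bytes allocated now, and the maximum reached since start()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 1024:.1f} KiB, peak: {peak / 1024:.1f} KiB")
```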


In [35]:
%mprun -f good_practice good_practice(rand_array)

ERROR: Could not find file <ipython-input-13-0718842debb0>
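NOTE: %mprun can only be used on functions defined in physical files, and not in the IPython environment.

As the note says, `%mprun` needs the function's source file, so we write both functions to a module with the `%%file` cell magic and import them from there.
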
In [36]:
%%file practices.py

def good_practice(rand_array):
    bigger_than_fifties = [*rand_array[rand_array > 50]] # using masking, broadcasting and unpacking over an np.array
    return bigger_than_fifties

def bad_practice(rand_list):
    bigger_than_fifties = []
    for i in range(len(rand_list)):
        if rand_list[i] > 50:
            bigger_than_fifties.append(rand_list[i])
    return bigger_than_fifties

Overwriting practices.py


In [37]:
from practices import good_practice, bad_practice

In [40]:
%mprun -f good_practice good_practice(rand_array)




In [41]:
%mprun -f bad_practice bad_practice(rand_list)




How to optimize? Avoid explicit Python loops and element-by-element conditions wherever a vectorized or built-in alternative exists:

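1. Prefer vectorized NumPy array operations and pandas column operations, plus the standard-library `itertools` and `collections` modules. (`DataFrame.apply` is convenient, but with `axis='columns'` it still calls a Python function once per row.)
2. Try list comprehensions instead of building lists with `append` in a loop.
3. Write better loops: iterate over items instead of indices, and move repeated work out of the loop body.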
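
A sketch of points 2 and 3, plus `collections.Counter` from point 1, on pure-Python data. The names `index_loop`, `item_loop` and `comprehension` are made up for this illustration; actual speedups depend on the Python version and the machine, so the timings are printed rather than assumed:

```python
import random
import timeit
from collections import Counter

rand_list = [random.randint(0, 99) for _ in range(1000)]


def index_loop(values):
    # Original style: index lookups on every iteration
    out = []
    for i in range(len(values)):
        if values[i] > 50:
            out.append(values[i])
    return out


def item_loop(values):
    # "Better loop": iterate over items and look up the append method once
    out = []
    append = out.append
    for v in values:
        if v > 50:
            append(v)
    return out


def comprehension(values):
    # Same filter as a list comprehension
    return [v for v in values if v > 50]


assert index_loop(rand_list) == item_loop(rand_list) == comprehension(rand_list)

for func in (index_loop, item_loop, comprehension):
    best = min(timeit.repeat(lambda: func(rand_list), number=200, repeat=5)) / 200
    print(f"{func.__name__:>13}: {best * 1e6:.1f} µs per call")

# collections.Counter replaces a hand-written counting loop
manual_counts = {}
for v in rand_list:
    manual_counts[v] = manual_counts.get(v, 0) + 1
counts = Counter(rand_list)
assert dict(counts) == manual_counts
print(counts.most_common(3))  # the three most frequent values and their counts
```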

## Numba

Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN.

You don't need to replace the Python interpreter, run a separate compilation step, or even have a C/C++ compiler installed. Just apply one of the Numba decorators to your Python function, and Numba does the rest. 

In [43]:
from numba import jit
import random

In [44]:
@jit(nopython=True)
def monte_carlo_pi(nsamples):
    acc = 0
    for i in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples

In [49]:
%%time

monte_carlo_pi(100000000)

CPU times: user 2.57 s, sys: 15.4 ms, total: 2.59 s
Wall time: 2.57 s


3.14158088
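
The `%%time` measurement above includes Numba's one-off compilation, which happens on the first call with a given argument type; later calls reuse the compiled code. Timing two consecutive calls makes this visible. This sketch uses `numba.njit` (shorthand for `jit(nopython=True)`) and, as an assumption for environments without Numba, falls back to a no-op decorator, in which case both calls run as plain, much slower Python:

```python
import random
import time

try:
    from numba import njit  # njit is shorthand for jit(nopython=True)
except ImportError:
    # Assumption: Numba may be missing; fall back to a no-op decorator
    def njit(func):
        return func


@njit
def monte_carlo_pi(nsamples):
    acc = 0
    for _ in range(nsamples):
        x = random.random()
        y = random.random()
        if (x ** 2 + y ** 2) < 1.0:
            acc += 1
    return 4.0 * acc / nsamples


# With Numba, the first call compiles for an integer argument;
# the second call reuses the compiled machine code
for label in ("first call", "second call"):
    start = time.perf_counter()
    estimate = monte_carlo_pi(1_000_000)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s, estimate = {estimate:.4f}")
```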

In [50]:
# NumPy's uniform accepts a size argument; the stdlib random.uniform does not
x = np.random.uniform(size=100000000)
y = np.random.uniform(size=100000000)
4.0 * np.mean(x ** 2 + y ** 2 < 1.0)
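
A vectorized NumPy estimate of pi trades loops for memory: two arrays of 10^8 float64 values need about 1.6 GB. A sketch that keeps the vectorized computation but processes the samples in fixed-size chunks so memory stays bounded; the function name `monte_carlo_pi_numpy` and the `chunk_size` default are illustrative choices, and it uses NumPy's `Generator` API (`np.random.default_rng`) rather than the legacy `np.random.uniform`:

```python
import numpy as np


def monte_carlo_pi_numpy(nsamples, chunk_size=10_000_000, seed=None):
    """Vectorized Monte Carlo estimate of pi, processed in chunks to bound memory."""
    rng = np.random.default_rng(seed)
    inside = 0
    remaining = nsamples
    while remaining > 0:
        n = min(chunk_size, remaining)
        x = rng.random(n)  # uniform samples in [0, 1)
        y = rng.random(n)
        # Count the points that fall inside the quarter circle
        inside += np.count_nonzero(x ** 2 + y ** 2 < 1.0)
        remaining -= n
    return 4.0 * inside / nsamples


print(monte_carlo_pi_numpy(10_000_000, seed=0))
```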