# Writing Efficient Python Code
Instructor: Logan Thomas, [course link](https://learn.datacamp.com/courses/writing-efficient-python-code)  
Note Taker: Paris Zhang on Thu, Aug 13, 2020

Course contents:
1. Foundations for efficiencies
2. Timing and profiling code (`%timeit`,`%%timeit`,`%lprun`,`%mprun`)
3. Gaining efficiencies (`zip()`, `itertools`, `collections`, sets; prefer `map()`, `np.array()`, `set` > list comprehension > loop)
4. Basic pandas optimizations (`.iterrows()`, `.itertuples()`, `.apply()`, NumPy arrays)

## Chapter 1 - Foundations for efficiencies
### 1.1 Intro to Pythonic Code

Writing efficient Python code:
1. Minimal completion time (fast runtime)
2. Minimal resource consumption (small memory footprint)

In [None]:
# Non-Pythonic
doubled_numbers = []
for i in range(len(numbers)):
    doubled_numbers.append(numbers[i] * 2)

# Pythonic
doubled_numbers = [x * 2 for x in numbers]

**The Zen of Python by Tim Peters**

In [1]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


### 1.2 - Building with built-ins

The Python Standard Library
- Python 3.6 Standard Library
  + Part of every standard Python installation
- Built-in **types**
  + `list`, `tuple`, `set`, `dict`, and others
- Built-in **functions**
  + `print()`, `len()`, `range()`, `round()`, `enumerate()`, `map()`, `zip()`, and others
- Built-in **modules**
  + `os`, `sys`, `itertools`, `collections`, `math`, and others

Built-in function:
1. `range(start, stop, step)`
  + Unpacking into a list using `*`

In [4]:
# Create a new list of odd numbers from 1 to 11 by unpacking a range object
nums_list2 = [*range(1,12,2)]
print(nums_list2)

[1, 3, 5, 7, 9, 11]
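
The `range(start, stop, step)` signature above has two details worth checking by hand: `stop` is exclusive, and `step` may be negative. A minimal sketch (the second range is an extra illustration, not from the course):

```python
# range(start, stop, step): stop is exclusive, step may be negative
odds = [*range(1, 12, 2)]          # same call as in the cell above
evens_desc = [*range(10, 0, -2)]   # count down by twos; 0 is excluded

print(odds)        # [1, 3, 5, 7, 9, 11]
print(evens_desc)  # [10, 8, 6, 4, 2]

# A range stores only start/stop/step, so its length is known without building a list
print(len(range(1, 12, 2)))  # 6
```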


2. `enumerate()`

In [5]:
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

# Rewrite the for loop to use enumerate
indexed_names = []
for i,name in enumerate(names):
    index_name = (i,name)
    indexed_names.append(index_name) 
print(indexed_names,"\n")

# Rewrite the above for loop using list comprehension
indexed_names_comp = [(i,name) for i,name in enumerate(names)]
print(indexed_names_comp,"\n")

# Unpack an enumerate object with a starting index of one
indexed_names_unpack = [*enumerate(names,1)]
print(indexed_names_unpack)

[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')] 

[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')] 

[(1, 'Jerry'), (2, 'Kramer'), (3, 'Elaine'), (4, 'George'), (5, 'Newman')]


3. `map()`
  + Applies a function over an object
  + `map()` with `lambda` (anonymous function)

In [2]:
nums = [1.5, 2.3, 3.4, 4.6, 5.0]
rnd_nums = map(round, nums)
print(list(rnd_nums))

[2, 2, 3, 5, 5]


In [6]:
sqrd_nums = map(lambda x: x ** 2, nums)
print(list(sqrd_nums))

[2.25, 5.289999999999999, 11.559999999999999, 21.159999999999997, 25.0]


In [8]:
# for loop
names_uppercase = []

for name in names:
  names_uppercase.append(name.upper())
print(names_uppercase,"\n")

# Using map()
names_map  = map(str.upper, names)
# Unpack names_map into a list
names_uppercase = [*names_map]
print(names_uppercase)

['JERRY', 'KRAMER', 'ELAINE', 'GEORGE', 'NEWMAN'] 

['JERRY', 'KRAMER', 'ELAINE', 'GEORGE', 'NEWMAN']
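
One detail behind the unpacking step: `map()` returns a lazy iterator, so the function is only applied when the object is consumed, and it can be consumed only once. A small sketch using the same `names` list:

```python
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

names_map = map(str.upper, names)   # nothing is computed yet
first_pass = [*names_map]           # unpacking applies str.upper to every item
second_pass = [*names_map]          # the iterator is now exhausted

print(first_pass)   # ['JERRY', 'KRAMER', 'ELAINE', 'GEORGE', 'NEWMAN']
print(second_pass)  # []
```

If the results are needed more than once, store the unpacked list rather than the map object.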


### 1.3 - Numerical Python, NumPy

Advantages over list:
* Homogeneous (only one type of element)
* Broadcasting

In [9]:
# Python lists don't support broadcasting
nums = [-2, -1, 0, 1, 2]
nums ** 2

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

In [10]:
# For loop (inefficient option)
sqrd_nums = []
for num in nums:
    sqrd_nums.append(num ** 2)
print(sqrd_nums)

# List comprehension (better option but not best)
sqrd_nums = [num ** 2 for num in nums]
print(sqrd_nums)

# NumPy array broadcasting
import numpy as np

nums_np = np.array([-2, -1, 0, 1, 2])
nums_np ** 2

[4, 1, 0, 1, 4]
[4, 1, 0, 1, 4]


array([4, 1, 0, 1, 4])
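
The homogeneity point can be seen directly from the `dtype` attribute. In this sketch (the sample values are made up), a single float in the input makes NumPy store every element as a float:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])   # one float forces every element to float

print(ints.dtype)    # an integer dtype (the exact width is platform dependent)
print(mixed.dtype)   # float64
print(mixed)         # [1.  2.5 3. ]
```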

* Indexing
  + list `nums[0][1]` = NumPy array `nums_np[0,1]`
  + list select a column `[row[0] for row in nums2]` = NumPy array `nums_np[:,0]`
* NumPy array boolean indexing
  + No boolean indexing for lists
  + A list comprehension works but is less efficient

In [12]:
nums = [-2, -1, 0, 1, 2]
nums_np = np.array(nums)
print(nums_np[nums_np > 0])
print("\n")

# For loop (inefficient option)
pos = []
for num in nums:
    if num > 0:
        pos.append(num)
print(pos)

# List comprehension (better option but not best)
pos = [num for num in nums if num > 0]
print(pos)

[1 2]


[1, 2]
[1, 2]
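
The 2-D indexing equivalences listed above can be checked with a small example. The data below is hypothetical (`nums2` and `nums2_np` are illustration names only):

```python
import numpy as np

# Hypothetical 2-D data, used only for illustration
nums2 = [[1, 2, 3],
         [4, 5, 6]]
nums2_np = np.array(nums2)

# Single element: chained indexing for lists, one [row, col] index for arrays
print(nums2[0][1], nums2_np[0, 1])   # 2 2

# First column: a comprehension for lists, a slice for arrays
print([row[0] for row in nums2])     # [1, 4]
print(nums2_np[:, 0])                # [1 4]
```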


In [14]:
# Inspect the source code of DataCamp function welcome_guest()
# import inspect (built-in module)
# source = inspect.getsource(welcome_guest)
# print(source)

def welcome_guest(guest_and_time):
    """
    Returns a welcome string for the guest_and_time tuple.
    
    Args:
        guest_and_time (tuple): The guest and time tuple to create
            a welcome string for.
            
    Returns:
        welcome_string (str): A string welcoming the guest to Festivus.
        'Welcome to Festivus {guest}... You're {time} min late.'
    
    """
    guest = guest_and_time[0]
    arrival_time = guest_and_time[1]
    welcome_string = "Welcome to Festivus {}... You're {} min late.".format(guest,arrival_time)
    return welcome_string

In [15]:
arrival_times = [*range(10,60,10)]

arrival_times_np = np.array(arrival_times)
new_times = arrival_times_np - 3

# Use list comprehension and enumerate to pair guests to new times
guest_arrivals = [(names[i],time) for i,time in enumerate(new_times)]

# Map the welcome_guest function to each (guest,time) pair
welcome_map = map(welcome_guest, guest_arrivals)

guest_welcomes = [*welcome_map]
print(*guest_welcomes, sep='\n')

Welcome to Festivus Jerry... You're 7 min late.
Welcome to Festivus Kramer... You're 17 min late.
Welcome to Festivus Elaine... You're 27 min late.
Welcome to Festivus George... You're 37 min late.
Welcome to Festivus Newman... You're 47 min late.


## Chapter 2 - Timing and Profiling Code

### 2.1 - Examining runtime
1. Calculate runtime with the IPython magic command `%timeit`
2. See all available magic commands with `%lsmagic`
3. Documentation: [link](https://ipython.readthedocs.io/en/stable/interactive/magics.html)

In [17]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%

| symbol | name | seconds |
|--------|------|---------|
| ns | nanosecond | $10^{-9}$ |
| µs (us) | microsecond | $10^{-6}$ |
| ms | millisecond | $10^{-3}$ |
| s | second | $10^0$ |

* Using `%timeit` (line magic) or `%%timeit` (cell magic)
  + `-r` number of runs
  + `-n` number of loops
  + Default is **7 runs**; the number of loops is chosen automatically (100000 in the first example below)

In [18]:
import numpy as np

%timeit rand_nums = np.random.rand(1000)

8.77 µs ± 216 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [20]:
# Set number of runs to 2 (-r2)
# Set number of loops to 10 (-n10)
%timeit -r2 -n10 rand_nums = np.random.rand(1000)

The slowest run took 5.62 times longer than the fastest. This could mean that an intermediate result is being cached.
47.4 µs ± 33.1 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)


In [25]:
%%timeit -r3 -n10 # A cell magic must be on the first line of the cell
nums = []
for x in range(10):
    nums.append(x)

1.63 µs ± 89.2 ns per loop (mean ± std. dev. of 3 runs, 10 loops each)
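
The magic commands only work inside IPython or Jupyter. In a plain script, the standard-library `timeit` module gives a comparable measurement. A rough sketch, where `repeat` and `number` play the roles of `-r` and `-n`:

```python
import timeit

# Statement to time, equivalent to the body of the %%timeit cell above
stmt = """
nums = []
for x in range(10):
    nums.append(x)
"""

# repeat=3 and number=10 correspond to -r3 and -n10
totals = timeit.repeat(stmt, repeat=3, number=10)

# Each entry is the total time for 10 loops, so divide to get per-loop time
per_loop = [t / 10 for t in totals]
print(len(per_loop))   # 3
print(min(per_loop))   # best per-loop time in seconds (varies by machine)
```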


* Saving and inspecting the output of `%timeit` with the `-o` flag

In [26]:
times = %timeit -o rand_nums = np.random.rand(1000)

8.74 µs ± 167 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [28]:
times.timings

[8.832129589991383e-06,
 8.579272820006736e-06,
 9.00303211001301e-06,
 8.658767759989132e-06,
 8.501389940010994e-06,
 8.721082629999727e-06,
 8.910087019994535e-06]

In [29]:
times.best

8.501389940010994e-06

In [30]:
times.worst

9.00303211001301e-06
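
The saved timings are plain floats in seconds, so the summary line printed by `%timeit` can be reproduced by hand. This sketch copies the seven values from `times.timings` above; using the population standard deviation is an assumption that happens to match the printed `± 167 ns`:

```python
from statistics import mean, pstdev

# The seven per-loop timings (in seconds) copied from times.timings above
timings = [
    8.832129589991383e-06,
    8.579272820006736e-06,
    9.00303211001301e-06,
    8.658767759989132e-06,
    8.501389940010994e-06,
    8.721082629999727e-06,
    8.910087019994535e-06,
]

print(min(timings))                      # 8.501389940010994e-06, same as times.best
print(max(timings))                      # 9.00303211001301e-06, same as times.worst
print(round(mean(timings) * 1e6, 2))     # 8.74 (microseconds)
print(round(pstdev(timings) * 1e9))      # 167 (nanoseconds)
```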

| data structure | formal name | literal syntax |
|--------|------|---------|
| list | `list()` | `[]` |
| dictionary | `dict()` | `{}` |
| tuple | `tuple()` | `()` |

* Comparing times: formal vs. literal syntax

In [31]:
f_time = %timeit -o -r5 -n100 formal_dict = dict()

137 ns ± 9.39 ns per loop (mean ± std. dev. of 5 runs, 100 loops each)


In [32]:
l_time = %timeit -o -r5 -n100 literal_dict = {}

62.2 ns ± 2.61 ns per loop (mean ± std. dev. of 5 runs, 100 loops each)


In [36]:
diff = (f_time.average - l_time.average) * (10**9)
print('l_time better than f_time by {} ns'.format(diff))

l_time better than f_time by 74.97800470446236 ns


In [37]:
%timeit -r5 -n100 formal_dict = dict()

169 ns ± 3.79 ns per loop (mean ± std. dev. of 5 runs, 100 loops each)


In [38]:
%timeit -r5 -n100 literal_dict = {}

63.1 ns ± 3.21 ns per loop (mean ± std. dev. of 5 runs, 100 loops each)
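
One likely reason for the gap, at least in CPython, is that the literal is built by a dedicated bytecode instruction while `dict()` needs a name lookup followed by a function call. The sketch below uses the standard `dis` module to list instruction names; `opnames` is a hypothetical helper, and the exact instruction lists differ between Python versions:

```python
import dis

def opnames(source):
    """Return the bytecode instruction names for a source snippet."""
    return [ins.opname for ins in dis.get_instructions(source)]

literal_ops = opnames("{}")
formal_ops = opnames("dict()")

# The literal uses BUILD_MAP; the formal name loads `dict` and then calls it
print(literal_ops)
print(formal_ops)
```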


### 2.2 - Code profiling for runtime

Code profiling:
1. Line-by-line analyses
2. Package used: `line_profiler` - install by `pip install line_profiler`

In [39]:
heroes = ['Batman','Superman','Wonder Woman']
hts = np.array([188.0, 191.0, 183.0])
wts = np.array([ 95.0, 101.0, 74.0])

In [41]:
def convert_units(heroes, heights, weights):
    new_hts = [ht * 0.39370 for ht in heights]
    new_wts = [wt * 2.20462 for wt in weights]
    
    hero_data = {}
    
    for i,hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])
    
    return hero_data

convert_units(heroes, hts, wts)

{'Batman': (74.01559999999999, 209.4389),
 'Superman': (75.19669999999999, 222.66661999999997),
 'Wonder Woman': (72.0471, 163.14188)}

In [42]:
%timeit convert_units(heroes, hts, wts)

3.27 µs ± 99.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [43]:
%load_ext line_profiler
%lprun -f convert_units convert_units(heroes, hts, wts)
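
A line-by-line profile is mainly useful for deciding which lines to rewrite. As one possible rewrite (an assumption, not something recorded in these notes), the two list comprehensions can be replaced with array broadcasting, since `hts` and `wts` are already NumPy arrays. `convert_units_broadcast` is a hypothetical name:

```python
import numpy as np

heroes = ['Batman', 'Superman', 'Wonder Woman']
hts = np.array([188.0, 191.0, 183.0])
wts = np.array([95.0, 101.0, 74.0])

def convert_units_broadcast(heroes, heights, weights):
    """Hypothetical variant of convert_units that uses array broadcasting."""
    new_hts = heights * 0.39370   # cm -> inches, all heroes at once
    new_wts = weights * 2.20462   # kg -> pounds, all heroes at once
    return {hero: (new_hts[i], new_wts[i]) for i, hero in enumerate(heroes)}

hero_data = convert_units_broadcast(heroes, hts, wts)
print(hero_data['Batman'])
```

Running `%lprun` on both versions would show whether the change helps for a given input size; with only three heroes the difference may be small.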

### 2.3 - Code profiling for memory usage

* Similar to `line_profiler`, `memory_profiler` has to be loaded
* `%mprun` only profiles functions that are defined in a file and imported; it cannot profile a function defined directly in the IPython session

Syntax:

In [None]:
%load_ext memory_profiler
from bmi_arrays import calc_bmi_arrays

%mprun -f calc_bmi_arrays calc_bmi_arrays(sample_indices,hts,wts)

**In the course's two profiled examples (list comprehensions versus NumPy array operations), the NumPy version ran significantly faster, while memory usage was about the same for both functions.**
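
For a quick size check without `memory_profiler`, the standard-library `sys.getsizeof` reports the size of a single object. This is a rough sketch, not a replacement for line-by-line memory profiling, and the numbers depend on the Python and NumPy versions:

```python
import sys
import numpy as np

nums_list = [*range(1000)]
nums_np = np.array(range(1000))

# getsizeof measures the container itself; for a list that excludes
# the int objects the list points to
list_bytes = sys.getsizeof(nums_list)
array_bytes = sys.getsizeof(nums_np)

print(list_bytes, array_bytes)   # exact values vary by version and platform
print(nums_np.nbytes)            # bytes used by the array's data buffer alone
```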


## Chapter 3 - Gaining Efficiencies

### 3.1 - Efficiently combining, counting, and iterating
1. Combining objects with `zip()`; the result is a lazy zip object, so unpack it with `*` to get a list: `[*zip(names[:5], types[:3])]`
2. Counting with `Counter()`
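
In the `zip()` expression above the two slices have different lengths. `zip()` stops at the shortest input, so only three pairs come back. A sketch with made-up sample values for `names` and `types`:

```python
names = ['Bulbasaur', 'Charmander', 'Squirtle', 'Caterpie', 'Pidgey']  # sample values
types = ['Grass', 'Fire', 'Water']                                      # sample values

pairs = [*zip(names[:5], types[:3])]
print(pairs)
# [('Bulbasaur', 'Grass'), ('Charmander', 'Fire'), ('Squirtle', 'Water')]

# zip(*pairs) reverses the pairing
paired_names, paired_types = zip(*pairs)
print(paired_names)   # ('Bulbasaur', 'Charmander', 'Squirtle')
```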

The collections module:
* Notable:
  + `namedtuple`: tuple subclasses with named fields
  + `deque`: list-like container with fast appends and pops
  + **`Counter`**: dict for counting hashable objects
  + `OrderedDict`: dict that retains order of entries
  + `defaultdict`: dict that calls a factory function to supply missing values

In [47]:
# Counting with loop
# Each Pokémon's type (720 total)
poke_types = ['Grass','Dark','Fire','Fire','Bug', 'Fire', 'Ghost', 'Grass', 'Water']
type_counts = {}
for poke_type in poke_types:
    if poke_type not in type_counts:
        type_counts[poke_type] = 1
    else:
        type_counts[poke_type] += 1
print(type_counts)

{'Grass': 2, 'Dark': 1, 'Fire': 3, 'Bug': 1, 'Ghost': 1, 'Water': 1}


In [48]:
# collections.Counter()
from collections import Counter

type_counts = Counter(poke_types)
print(type_counts) # Displayed in descending order of count

Counter({'Fire': 3, 'Grass': 2, 'Dark': 1, 'Bug': 1, 'Ghost': 1, 'Water': 1})
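
Two follow-ups on the same data, as a sketch: `Counter.most_common()` returns the top entries directly, and `defaultdict(int)` from the list above removes the `if ... not in` branch when a manual loop is still wanted.

```python
from collections import Counter, defaultdict

poke_types = ['Grass', 'Dark', 'Fire', 'Fire', 'Bug', 'Fire', 'Ghost', 'Grass', 'Water']

type_counts = Counter(poke_types)
print(type_counts.most_common(2))   # [('Fire', 3), ('Grass', 2)]
print(type_counts['Dragon'])        # 0, missing keys count as zero

# defaultdict(int) supplies 0 for a missing key, so no membership check is needed
manual_counts = defaultdict(int)
for poke_type in poke_types:
    manual_counts[poke_type] += 1

print(dict(manual_counts) == dict(type_counts))   # True
```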


3. Combinations using `combinations` from `itertools`

The `itertools` module
* Notable:
  + Infinite iterators: `count`, `cycle`, `repeat`
  + Finite iterators: `accumulate`, `chain`, `zip_longest`, etc.
  + Combination generators: `product`, `permutations`, **`combinations`**

In [49]:
# Combinations with loop
combos = []
for x in poke_types:
    for y in poke_types:
        if x == y:
            continue
        if ((x,y) not in combos) and ((y,x) not in combos):
            combos.append((x,y))
print(combos)

[('Grass', 'Dark'), ('Grass', 'Fire'), ('Grass', 'Bug'), ('Grass', 'Ghost'), ('Grass', 'Water'), ('Dark', 'Fire'), ('Dark', 'Bug'), ('Dark', 'Ghost'), ('Dark', 'Water'), ('Fire', 'Bug'), ('Fire', 'Ghost'), ('Fire', 'Water'), ('Bug', 'Ghost'), ('Bug', 'Water'), ('Ghost', 'Water')]


In [50]:
# itertools.combinations()
from itertools import combinations

combos_obj = combinations(poke_types, 2)
print(type(combos_obj))

<class 'itertools.combinations'>


In [51]:
combos = [*combos_obj]
print(combos)

[('Grass', 'Dark'), ('Grass', 'Fire'), ('Grass', 'Fire'), ('Grass', 'Bug'), ('Grass', 'Fire'), ('Grass', 'Ghost'), ('Grass', 'Grass'), ('Grass', 'Water'), ('Dark', 'Fire'), ('Dark', 'Fire'), ('Dark', 'Bug'), ('Dark', 'Fire'), ('Dark', 'Ghost'), ('Dark', 'Grass'), ('Dark', 'Water'), ('Fire', 'Fire'), ('Fire', 'Bug'), ('Fire', 'Fire'), ('Fire', 'Ghost'), ('Fire', 'Grass'), ('Fire', 'Water'), ('Fire', 'Bug'), ('Fire', 'Fire'), ('Fire', 'Ghost'), ('Fire', 'Grass'), ('Fire', 'Water'), ('Bug', 'Fire'), ('Bug', 'Ghost'), ('Bug', 'Grass'), ('Bug', 'Water'), ('Fire', 'Ghost'), ('Fire', 'Grass'), ('Fire', 'Water'), ('Ghost', 'Grass'), ('Ghost', 'Water'), ('Grass', 'Water')]


In [46]:
pokemon = ['Geodude', 'Cubone', 'Lickitung', 'Persian', 'Diglett']

from itertools import combinations

combos_obj = combinations(pokemon, 2)
print(type(combos_obj), '\n')

# Convert combos_obj to a list by unpacking
combos_2 = [*combos_obj]
print(combos_2, '\n')

# Collect all possible combinations of 4 Pokémon directly into a list
combos_4 = [*combinations(pokemon,4)]
print(combos_4)

<class 'itertools.combinations'> 

[('Geodude', 'Cubone'), ('Geodude', 'Lickitung'), ('Geodude', 'Persian'), ('Geodude', 'Diglett'), ('Cubone', 'Lickitung'), ('Cubone', 'Persian'), ('Cubone', 'Diglett'), ('Lickitung', 'Persian'), ('Lickitung', 'Diglett'), ('Persian', 'Diglett')] 

[('Geodude', 'Cubone', 'Lickitung', 'Persian'), ('Geodude', 'Cubone', 'Lickitung', 'Diglett'), ('Geodude', 'Cubone', 'Persian', 'Diglett'), ('Geodude', 'Lickitung', 'Persian', 'Diglett'), ('Cubone', 'Lickitung', 'Persian', 'Diglett')]
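
The nested loop and `combinations()` gave different results for `poke_types` earlier because `combinations()` treats elements by position, not by value, so repeated types yield repeated pairs. A sketch that checks the counts with `math.comb` (Python 3.8+) and removes duplicates first:

```python
from itertools import combinations
from math import comb

poke_types = ['Grass', 'Dark', 'Fire', 'Fire', 'Bug', 'Fire', 'Ghost', 'Grass', 'Water']

# combinations() works on positions, so repeated values produce repeated pairs
all_pairs = [*combinations(poke_types, 2)]
print(len(all_pairs), comb(len(poke_types), 2))   # 36 36

# Deduplicate first to get pairs of distinct types, as the nested loop does
unique_types = sorted(set(poke_types))
unique_pairs = [*combinations(unique_types, 2)]
print(len(unique_pairs))   # 15
```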


### 3.2 - Set theory

* Branch of Mathematics applied to collections of objects
  + i.e., `set`s
* Python has built-in `set` datatype with accompanying methods:
  + `intersection()` : all elements that are in both sets
  + `difference()` : all elements in one set but not the other
  + `symmetric_difference()` : all elements in exactly one set
  + `union()` : all elements that are in either set
* Fast membership testing
  + Check if a value exists in a sequence or not
  + Using the `in` operator

1. Object comparisons with loops vs. sets
  + Use cases

In [52]:
list_a = ['Bulbasaur','Charmander','Squirtle']
list_b = ['Caterpie','Pidgey','Squirtle']

# With loops
in_common = []

for pokemon_a in list_a:
    for pokemon_b in list_b:
        if pokemon_a == pokemon_b:
            in_common.append(pokemon_a)

print(in_common)

['Squirtle']


In [54]:
set_a = set(list_a)
set_b = set(list_b)

set_a.intersection(set_b) # items in both sets

{'Squirtle'}

In [55]:
set_a.difference(set_b) # items in the 1st set but not in the 2nd set

{'Bulbasaur', 'Charmander'}

In [56]:
set_a.symmetric_difference(set_b) # items in exactly one of the two sets

{'Bulbasaur', 'Caterpie', 'Charmander', 'Pidgey'}

In [57]:
set_a.union(set_b) # items in either set

{'Bulbasaur', 'Caterpie', 'Charmander', 'Pidgey', 'Squirtle'}
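
Each of the four methods also has an operator form when both operands are sets. A short sketch confirming the equivalences on the same data:

```python
set_a = {'Bulbasaur', 'Charmander', 'Squirtle'}
set_b = {'Caterpie', 'Pidgey', 'Squirtle'}

print(set_a & set_b == set_a.intersection(set_b))           # True
print(set_a - set_b == set_a.difference(set_b))             # True
print(set_a ^ set_b == set_a.symmetric_difference(set_b))   # True
print(set_a | set_b == set_a.union(set_b))                  # True
```

The methods accept any iterable as an argument, while the operators require sets on both sides.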

  + Efficiency gained with set theory

In [58]:
%%timeit -r5 -n100
in_common = []

for pokemon_a in list_a:
    for pokemon_b in list_b:
        if pokemon_a == pokemon_b:
            in_common.append(pokemon_a)

737 ns ± 15.8 ns per loop (mean ± std. dev. of 5 runs, 100 loops each)


In [59]:
%timeit -r5 -n100 in_common = set_a.intersection(set_b)

203 ns ± 9.5 ns per loop (mean ± std. dev. of 5 runs, 100 loops each)


2. Membership testing with sets

In [60]:
# The same 720 total Pokémon in each data structure
names_list = ['Abomasnow','Abra','Absol']
names_tuple = ('Abomasnow','Abra','Absol')
names_set = {'Abomasnow','Abra','Absol'}

%timeit 'Zubat' in names_list

56.7 ns ± 1.33 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [61]:
%timeit 'Zubat' in names_tuple

58.3 ns ± 0.653 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [62]:
%timeit 'Zubat' in names_set

30.5 ns ± 0.763 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
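
With only three names the gap is modest. The difference grows with size because a list or tuple is scanned element by element, while a set uses a hash lookup. A sketch with hypothetical larger collections, using the `timeit` module so it also runs outside IPython:

```python
import timeit

# Hypothetical larger collections so the difference is easier to see
values_list = [f'name_{i}' for i in range(100_000)]
values_set = set(values_list)

# Both answer the same question...
print('Zubat' in values_list, 'Zubat' in values_set)   # False False

# ...but the list scans every element while the set does a hash lookup
list_time = timeit.timeit(lambda: 'Zubat' in values_list, number=20)
set_time = timeit.timeit(lambda: 'Zubat' in values_set, number=20)
print(list_time, set_time)   # timings vary by machine
```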


3. Uniques with sets

In [63]:
# 720 Pokémon primary types corresponding to each Pokémon
primary_types = ['Grass', 'Psychic', 'Dark', 'Bug']

unique_types = []

for prim_type in primary_types:
    if prim_type not in unique_types:
        unique_types.append(prim_type)

print(unique_types)

['Grass', 'Psychic', 'Dark', 'Bug']


In [64]:
unique_types_set = set(primary_types)
print(unique_types_set)

{'Grass', 'Psychic', 'Bug', 'Dark'}
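
As the output shows, a set does not keep the original order. When order matters, one common alternative (not from the course) is `dict.fromkeys()`, which keeps the first occurrence of each value in insertion order on Python 3.7+. The sample list below adds repeats for illustration:

```python
primary_types = ['Grass', 'Psychic', 'Dark', 'Bug', 'Grass', 'Dark']  # sample with repeats

# Dict keys are unique and keep insertion order
unique_in_order = list(dict.fromkeys(primary_types))
print(unique_in_order)   # ['Grass', 'Psychic', 'Dark', 'Bug']
```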


### 3.3 - Eliminating for loops
`for`, `while`, nested loops

1. Eliminating loops with built-ins

In [65]:
# List of HP, Attack, Defense, Speed
poke_stats = [
[90, 92, 75, 60],
[25, 20, 15, 90],
[65, 130, 60, 75],
]

# For loop approach
totals = []
for row in poke_stats:
    totals.append(sum(row))

# List comprehension
totals_comp = [sum(row) for row in poke_stats]

# Built-in map() function
totals_map = [*map(sum, poke_stats)]

In [66]:
%%timeit
totals = []
for row in poke_stats:
    totals.append(sum(row))

556 ns ± 18.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [67]:
%timeit totals_comp = [sum(row) for row in poke_stats]

592 ns ± 53.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [68]:
%timeit totals_map = [*map(sum, poke_stats)]

459 ns ± 8.21 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


2. Eliminating loops with built-in modules

In [69]:
poke_types = ['Bug', 'Fire', 'Ghost', 'Grass', 'Water']

# Nested for loop approach
combos = []
for x in poke_types:
    for y in poke_types:
        if x == y:
            continue
        if ((x,y) not in combos) and ((y,x) not in combos):
            combos.append((x,y))

# Built-in module approach
from itertools import combinations
combos2 = [*combinations(poke_types, 2)]

3. Eliminate loops with NumPy

In [73]:
import numpy as np

avgs = []
for row in poke_stats:
    avg = np.mean(row)
    avgs.append(avg)
print(avgs)

[79.25, 37.5, 82.5]


In [76]:
poke_stats = np.array(poke_stats)
avgs_np = poke_stats.mean(axis=1)
print(avgs_np)

[79.25 37.5  82.5 ]


In [77]:
%%timeit
avgs = []
for row in poke_stats:
    avg = np.mean(row)
    avgs.append(avg)

24.1 µs ± 662 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [78]:
%timeit avgs = poke_stats.mean(axis=1)

7.97 µs ± 258 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


### 3.4 - Writing better for loops

1. Moving calculations above a loop

In [79]:
import numpy as np

names = ['Absol', 'Aron', 'Jynx', 'Natu', 'Onix']
attacks = np.array([130, 70, 50, 50, 45])

for pokemon,attack in zip(names, attacks):
    total_attack_avg = attacks.mean()
    if attack > total_attack_avg:
        print(
            "{}'s attack: {} > average: {}!".format(pokemon, attack, total_attack_avg)
        )

Absol's attack: 130 > average: 69.0!
Aron's attack: 70 > average: 69.0!


In [80]:
# Calculate total average once (outside the loop)
total_attack_avg = attacks.mean() ## Saves half runtime

for pokemon,attack in zip(names, attacks):
    if attack > total_attack_avg:
        print(
            "{}'s attack: {} > average: {}!".format(pokemon, attack, total_attack_avg)
        )

Absol's attack: 130 > average: 69.0!
Aron's attack: 70 > average: 69.0!
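
If only the names are needed, the loop can be dropped entirely with a boolean mask, in the spirit of section 3.3. A sketch, assuming `names` is converted to a NumPy array first:

```python
import numpy as np

names = np.array(['Absol', 'Aron', 'Jynx', 'Natu', 'Onix'])
attacks = np.array([130, 70, 50, 50, 45])

# The mean is computed once, then compared against every attack at once
above_avg = attacks > attacks.mean()
print(names[above_avg])   # ['Absol' 'Aron']
```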


2. Holistic conversion

In [83]:
names = ['Pikachu','Squirtle','Articuno']
legend_status = [False, False, True]
generations = [1, 1, 1]

poke_data = []

for poke_tuple in zip(names, legend_status, generations):
    poke_list = list(poke_tuple)
    poke_data.append(poke_list)
    print(poke_data)

[['Pikachu', False, 1]]
[['Pikachu', False, 1], ['Squirtle', False, 1]]
[['Pikachu', False, 1], ['Squirtle', False, 1], ['Articuno', True, 1]]


In [84]:
poke_data_tuples = []

for poke_tuple in zip(names, legend_status, generations):
    poke_data_tuples.append(poke_tuple)

poke_data = [*map(list, poke_data_tuples)]
print(poke_data)

[['Pikachu', False, 1], ['Squirtle', False, 1], ['Articuno', True, 1]]


In [85]:
%%timeit

poke_data = []
for poke_tuple in zip(names, legend_status, generations):
    poke_list = list(poke_tuple)
    poke_data.append(poke_list)

918 ns ± 31.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [86]:
%%timeit

poke_data_tuples = []

for poke_tuple in zip(names, legend_status, generations):
    poke_data_tuples.append(poke_tuple)

poke_data = [*map(list, poke_data_tuples)]

1.08 µs ± 35.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
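
In the timings above the two-step version is not faster, since it still loops to collect the tuples. A sketch of a fully holistic conversion passes the zip object straight to `map()`, with no explicit loop. Whether it is faster on a given machine would need to be timed:

```python
names = ['Pikachu', 'Squirtle', 'Articuno']
legend_status = [False, False, True]
generations = [1, 1, 1]

# Convert every zipped tuple to a list in one pass, with no intermediate list
poke_data = [*map(list, zip(names, legend_status, generations))]
print(poke_data)
# [['Pikachu', False, 1], ['Squirtle', False, 1], ['Articuno', True, 1]]
```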


* Pokémon example

In [88]:
from itertools import combinations

pokemon_types = [
    'Bug', 'Dark', 'Dragon', 'Electric', 'Fairy',
    'Fighting','Fire', 'Flying', 'Ghost', 'Grass',
    'Ground', 'Ice', 'Normal', 'Poison', 'Psychic',
    'Rock', 'Steel', 'Water'
]

# Collect all possible pairs using combinations()
possible_pairs = [*combinations(pokemon_types, 2)]

enumerated_tuples = []

# Add a line to append each enumerated_pair_tuple to the empty list above
for i,pair in enumerate(possible_pairs, 1):
    enumerated_pair_tuple = (i,) + pair
    enumerated_tuples.append(enumerated_pair_tuple)

# Convert all tuples in enumerated_tuples to a list
enumerated_pairs = [*map(list, enumerated_tuples)]
print(enumerated_pairs[:5])

[[1, 'Bug', 'Dark'], [2, 'Bug', 'Dragon'], [3, 'Bug', 'Electric'], [4, 'Bug', 'Fairy'], [5, 'Bug', 'Fighting']]


## Chapter 4 - Basic pandas optimization

### 4.1 - Intro to pandas DataFrame iterations
* Iterating with `.iloc[]`
* Iterating with `.iterrows()`
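
A minimal sketch of both approaches. The small `baseball_df` below is a hypothetical stand-in for the course DataFrame, and only the `RS` and `RA` column names come from these notes:

```python
import pandas as pd

# Hypothetical stand-in for the course's baseball_df
baseball_df = pd.DataFrame({
    'Team': ['ARI', 'ATL', 'BAL'],
    'RS': [734, 700, 712],
    'RA': [688, 600, 705],
})

# .iloc: look up each row by its position
diffs_iloc = []
for i in range(len(baseball_df)):
    row = baseball_df.iloc[i]
    diffs_iloc.append(row['RS'] - row['RA'])

# .iterrows(): yields (index, row Series) pairs
diffs_iterrows = []
for i, row in baseball_df.iterrows():
    diffs_iterrows.append(row['RS'] - row['RA'])

print(diffs_iloc == diffs_iterrows)   # True
```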

### 4.2 - Another iterator method: `.itertuples()`
* Often performs better than `.iterrows()`
* Access fields with `[]` in `.iterrows()` (e.g. `row['RS']`), but with `.` in `.itertuples()` (e.g. `row.RS`)
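
A sketch of `.itertuples()` on a hypothetical stand-in DataFrame (only the `RS` and `RA` column names come from these notes). Each row is a namedtuple, so fields are read as attributes, and the row label is available as `Index`:

```python
import pandas as pd

# Hypothetical stand-in for the course's baseball_df
baseball_df = pd.DataFrame({
    'Team': ['ARI', 'ATL', 'BAL'],
    'RS': [734, 700, 712],
    'RA': [688, 600, 705],
})

run_diffs = []
for row in baseball_df.itertuples():
    # Attribute access instead of row['RS']
    run_diffs.append(row.RS - row.RA)

first = next(baseball_df.itertuples())
print(first.Index, first.Team, first.RS, first.RA)
print(run_diffs)   # [46, 100, 7]
```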

### 4.3 - `pandas` alternative to looping
* `.apply()` with `axis=0` applies the function to each column, and `axis=1` applies it to each row

In [None]:
def calc_run_diff(runs_scored, runs_allowed):
    run_diff = runs_scored - runs_allowed
    return run_diff

baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1
)
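
To see the two `axis` values side by side, here is a runnable sketch on a hypothetical stand-in for `baseball_df` (the values are made up):

```python
import pandas as pd

def calc_run_diff(runs_scored, runs_allowed):
    run_diff = runs_scored - runs_allowed
    return run_diff

# Hypothetical stand-in for the course's baseball_df
baseball_df = pd.DataFrame({
    'Team': ['ARI', 'ATL', 'BAL'],
    'RS': [734, 700, 712],
    'RA': [688, 600, 705],
})

# axis=1: the function receives one row at a time
run_diffs_apply = baseball_df.apply(
    lambda row: calc_run_diff(row['RS'], row['RA']),
    axis=1
)

# axis=0: the function receives one column at a time
col_totals = baseball_df[['RS', 'RA']].apply(lambda col: col.sum(), axis=0)

print(run_diffs_apply.tolist())   # [46, 100, 7]
print(col_totals.tolist())        # [2146, 1993]
```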

### 4.4 - Optimal pandas iterating
* `pandas` is built on `numpy`, which in turn runs on Python.
* Utilize the power of vector calculation.

Using NumPy arrays was the **fastest** approach, followed by the `.itertuples()` approach, and the `.apply()` approach was slowest.

In [None]:
win_perc_preds_loop = []

# Use a loop and .itertuples() to collect each row's predicted win percentage
for row in baseball_df.itertuples():
    runs_scored = row.RS
    runs_allowed = row.RA
    win_perc_pred = predict_win_perc(runs_scored, runs_allowed)
    win_perc_preds_loop.append(win_perc_pred)

# Apply predict_win_perc to each row of the DataFrame
win_perc_preds_apply = baseball_df.apply(
    lambda row: predict_win_perc(row['RS'], row['RA']), axis=1
)

# Calculate the win percentage predictions using NumPy arrays
win_perc_preds_np = predict_win_perc(baseball_df['RS'].values, baseball_df['RA'].values)
baseball_df['WP_preds'] = win_perc_preds_np
print(baseball_df.head())
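
`predict_win_perc` is not defined in these notes, so the cell above will not run on its own. The sketch below fills it in with an assumed definition (the Pythagorean expectation, rounded to two decimals) and a hypothetical stand-in DataFrame, only to show that the loop and the array call produce the same values:

```python
import numpy as np
import pandas as pd

def predict_win_perc(RS, RA):
    """Assumed definition (Pythagorean expectation); not taken from these notes."""
    return np.round(RS ** 2 / (RS ** 2 + RA ** 2), 2)

# Hypothetical stand-in for the course's baseball_df
baseball_df = pd.DataFrame({'RS': [734, 700, 712], 'RA': [688, 600, 705]})

# Row by row with .itertuples()
preds_loop = [predict_win_perc(row.RS, row.RA) for row in baseball_df.itertuples()]

# All rows at once on the underlying NumPy arrays
preds_np = predict_win_perc(baseball_df['RS'].values, baseball_df['RA'].values)

print(np.allclose(preds_loop, preds_np))   # True
```

The array version works because every operation inside the function (`**`, `/`, `np.round`) is applied element by element to whole arrays.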