# Efficient Python Code Methods

## Definition of Efficient Code
* Minimal completion time (fast runtime: reduce latency)
* Minimal resource consumption (small memory usage: reduce overhead)
* Follows _Pythonic_ idioms

## Best Practices

### List Comprehension instead of loops

In [4]:
# Best Practice 1
# Loop over the contents of a list instead of looping with an index

names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']
bad_list = []
better_list = []

# Bad practice: index-based loop (keeps only names with 6 or more letters)
for i in range(len(names)):
    if len(names[i]) >= 6:
        bad_list.append(names[i])
print(bad_list)

# Good practice: loop directly over the contents
for name in names:
    if len(name) >= 6:
        better_list.append(name)
print(better_list)

# Best practice (list comprehension)
best_list = [name for name in names if len(name) >= 6]
print(best_list)

['Kramer', 'Elaine', 'George', 'Newman']
['Kramer', 'Elaine', 'George', 'Newman']
['Kramer', 'Elaine', 'George', 'Newman']
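
Outside a notebook, the loop and the comprehension can be compared with the standard-library `timeit` module. The sketch below is only illustrative: the helper names `with_loop` and `with_comprehension` are made up for this example, and the timings depend on the machine and Python version, so no particular result is implied.

```python
import timeit

names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

def with_loop():
    # Explicit loop with append (hypothetical helper for this comparison)
    result = []
    for name in names:
        if len(name) >= 6:
            result.append(name)
    return result

def with_comprehension():
    # Same filter expressed as a list comprehension
    return [name for name in names if len(name) >= 6]

# Both approaches must return the same list for the comparison to be fair
assert with_loop() == with_comprehension()

# Total time in seconds for 100,000 calls of each function
loop_time = timeit.timeit(with_loop, number=100_000)
comp_time = timeit.timeit(with_comprehension, number=100_000)
print(f'loop: {loop_time:.3f} s, comprehension: {comp_time:.3f} s')
```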


### Built-in objects and unpacking objects: range()

In [5]:
# Creating lists with objects and unpacking them
nums = range(6) # creates a range object
nums_list = list(nums) # Convert nums to a list
print(nums_list)

# Create a new list of odd numbers from 1 to 11 by unpacking a range object
nums_list2 = [*range(1,12,2)]
print(nums_list2)

[0, 1, 2, 3, 4, 5]
[1, 3, 5, 7, 9, 11]
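
A range object computes its values on demand instead of storing them, which ties back to the "minimal resource consumption" goal. The following sketch compares the size of a range with the size of the list built from it; the exact byte counts vary by platform, so only the relative sizes matter here.

```python
import sys

nums_range = range(1_000_000)   # lazy: stores only start, stop, and step
nums_list = list(nums_range)    # materializes one million integers

# The range object stays small no matter how many values it represents
print(sys.getsizeof(nums_range))
print(sys.getsizeof(nums_list))

# Indexing, len(), and membership tests work without building a list
print(nums_range[10], len(nums_range), 999_999 in nums_range)
```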


### Built-in objects and unpacking objects: enumerate()

In [11]:
# enumerate() returns an enumerate object: an iterator that yields tuples,
# each containing an index and the corresponding value from the iterable
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

# Rewrite the for loop to use enumerate
indexed_names = []
for i,name in enumerate(names):
    index_name = (i,name)
    indexed_names.append(index_name) 
print(indexed_names)

# Rewrite the above for loop using list comprehension
indexed_names_comp = [(i,name) for i,name in enumerate(names)]
print(indexed_names_comp)

# Unpack an enumerate object with a starting index of one instead of zero
indexed_names_unpack = [*enumerate(names, start=1)]
print(indexed_names_unpack)

[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]
[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]
[(1, 'Jerry'), (2, 'Kramer'), (3, 'Elaine'), (4, 'George'), (5, 'Newman')]


### Built-in objects and unpacking objects: map()

In [4]:
# The map() function is used to apply a function on all elements of a specified iterable and return a map object
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']
names_uppercase_1 = []
names_uppercase_2 = []

# Bad practice
for name in names:
    names_uppercase_1.append(name.upper())
print(names_uppercase_1)

# Best practice
names_map = map(str.upper, names) # map applies str.upper to each element of names and returns a map object
names_uppercase_2 = [*names_map] # Unpack the map object into a list
print(names_uppercase_2)

['JERRY', 'KRAMER', 'ELAINE', 'GEORGE', 'NEWMAN']
['JERRY', 'KRAMER', 'ELAINE', 'GEORGE', 'NEWMAN']
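
One detail worth keeping in mind: a map object is a single-pass iterator, so it can be unpacked only once. A minimal sketch of that behavior (the variable names `first_pass` and `second_pass` are just for illustration):

```python
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

names_map = map(str.upper, names)

first_pass = [*names_map]    # consumes the iterator
second_pass = [*names_map]   # nothing left to yield

print(first_pass)
print(second_pass)
```

If the results are needed more than once, store the unpacked list rather than the map object.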


### Built-in objects and unpacking objects: zip()

In [3]:
# The zip() function pairs the elements of two lists at matching indexes into an iterable of tuples
names = ['caio', 'fofo', 'muzy']
ages = ['87','86','88']

# Combine names and ages
combination_of_tuples = [*zip(names, ages)] # zip() pairs elements from the 2 lists into tuples; * unpacks them into a list
print(combination_of_tuples)
print(type(combination_of_tuples))
print('----')
print(combination_of_tuples[0])
print(type(combination_of_tuples[0]))



[('caio', '87'), ('fofo', '86'), ('muzy', '88')]
<class 'list'>
----
('caio', '87')
<class 'tuple'>


### Combining, Counting, and Iterating 
Built-in modules:
* Collections: Counter()
* Itertools: combinations()

In [55]:
# Combining with zip()
# Efficient methods for counting, combining, and iterating
names = ['caio', 'fofo', 'muzy']
ages = ['87','86','88']

# Combine names and ages
combination = [*zip(names, ages)] # pair elements by position with zip and unpack them into a list of tuples

print(*combination[:3], sep='\n') # unpack the list so each tuple prints on its own line
print('----')
# Combine 2 items from names and 3 items from ages
differing_lengths = [*zip(names[:2], ages[:3])] # zip stops as soon as the shortest iterable is exhausted

print(*differing_lengths, sep='\n')

('caio', '87')
('fofo', '86')
('muzy', '88')
----
('caio', '87')
('fofo', '86')
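
When dropping the unmatched items is not wanted, `itertools.zip_longest` from the standard library pads the shorter iterable instead. A short sketch, assuming `None` is an acceptable placeholder for the missing age:

```python
from itertools import zip_longest

names = ['caio', 'fofo', 'muzy']
ages = ['87', '86']   # one age missing on purpose

# zip() stops at the shortest iterable
truncated = [*zip(names, ages)]

# zip_longest() keeps going and fills the gaps with fillvalue
padded = [*zip_longest(names, ages, fillvalue=None)]

print(*truncated, sep='\n')
print('----')
print(*padded, sep='\n')
```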


In [59]:
# Combining ( combinations from itertools )

# Import combinations from itertools
from itertools import combinations

pokemon = ['Geodude', 'Cubone', 'Lickitung', 'Persian', 'Diglett']

# Create a combinations object (an iterator of tuples) with pairs of Pokémon
combos_obj = combinations(pokemon, 2)
print(type(combos_obj), '\n')

# Convert combos_obj to a list by unpacking
combos_2 = [*combos_obj]
print(combos_2, '\n')

<class 'itertools.combinations'> 

[('Geodude', 'Cubone'), ('Geodude', 'Lickitung'), ('Geodude', 'Persian'), ('Geodude', 'Diglett'), ('Cubone', 'Lickitung'), ('Cubone', 'Persian'), ('Cubone', 'Diglett'), ('Lickitung', 'Persian'), ('Lickitung', 'Diglett'), ('Persian', 'Diglett')] 
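
As a sanity check, the number of pairs should equal "5 choose 2". The sketch below verifies that with `math.comb` (available from Python 3.8) and also shows that `combinations()` treats a pair as unordered, so the reversed pair never appears.

```python
from itertools import combinations
from math import comb

pokemon = ['Geodude', 'Cubone', 'Lickitung', 'Persian', 'Diglett']

pairs = [*combinations(pokemon, 2)]

# 5 items taken 2 at a time
expected_count = comb(len(pokemon), 2)
print(len(pairs), expected_count)

# Order does not matter: only one orientation of each pair is produced
print(('Geodude', 'Cubone') in pairs)
print(('Cubone', 'Geodude') in pairs)
```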



In [57]:
# Counting ( Counter() )
from collections import Counter
primary_types = ['ab','ab', 'ab', 'ab', 'bb', 'bb', 'cb','cb']

# Collect the count of primary types
type_count = Counter(primary_types)
print(type_count, '\n')

# Use a list comprehension to get each name's starting letter (names is defined in the zip() cell above)
starting_letters = [name[0] for name in names]

# Collect the count of names for each starting letter
starting_letters_count = Counter(starting_letters)
print(starting_letters_count)

Counter({'ab': 4, 'bb': 2, 'cb': 2}) 

Counter({'c': 1, 'f': 1, 'm': 1})
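
A Counter also offers a few conveniences beyond a plain dict. The sketch below uses `most_common()` to rank the categories and shows that looking up a key that was never counted returns 0 instead of raising a `KeyError`; the key `'zz'` is an arbitrary example.

```python
from collections import Counter

primary_types = ['ab', 'ab', 'ab', 'ab', 'bb', 'bb', 'cb', 'cb']
type_count = Counter(primary_types)

# The single most frequent category as a list of (value, count) tuples
top_type = type_count.most_common(1)
print(top_type)

# Missing keys count as zero
missing = type_count['zz']
print(missing)
```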


## Examining Runtime (Optimizing Code)

In [29]:
# IPython Magic Commands 
%lsmagic


Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %code_wrap  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %mamba  %man  %matplotlib  %micromamba  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%code_wrap  %%debug  %%file  %%html  %%javascript  %%js  %

In [39]:
# Time to generate 1000 random numbers between 0 and 1
import numpy as np

%timeit rand_nums = np.random.rand(1000)

6.17 µs ± 223 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [40]:
# %timeit reports a mean and standard deviation over several runs, so the number of runs (-r) and loops per run (-n) can be set explicitly
# -r: number of runs used to estimate the runtime
# -n: number of times to execute the code in each run
# Let's do 2 runs with 10 loops each in order to estimate runtime
%timeit -r2 -n10 rand_nums = np.random.rand(1000)


The slowest run took 32.09 times longer than the fastest. This could mean that an intermediate result is being cached.
105 µs ± 98.4 µs per loop (mean ± std. dev. of 2 runs, 10 loops each)
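
The `%timeit` magic exists only in IPython and Jupyter. In a plain Python script, a roughly equivalent measurement can be sketched with the standard-library `timeit` module, where `repeat` plays the role of `-r` and `number` the role of `-n`. The function `make_random_numbers` is a hypothetical stand-in that uses `random` instead of NumPy so the example has no dependencies.

```python
import random
import timeit

def make_random_numbers():
    # Stand-in workload: 1000 random floats between 0 and 1
    return [random.random() for _ in range(1000)]

loops_per_run = 10

# 2 runs of 10 loops each; each entry is the total time of one run in seconds
run_totals = timeit.repeat(make_random_numbers, repeat=2, number=loops_per_run)

# Convert the per-run totals into per-loop times
per_loop = [total / loops_per_run for total in run_totals]
print(per_loop)
```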


In [45]:
# Save the output into a variable by using "-o"
times = %timeit -o rand_nums = np.random.rand(1000)

print(times)
print(times.best)
print(times.worst)

5.99 µs ± 43.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
5.99 µs ± 43.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
5.916518209996866e-06
6.059531249993597e-06


In [42]:
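%%timeit
# Multiple vs. single lines of code: the cell magic %%timeit times the whole cell.
# It must be the very first line of the cell. Placing a comment above it raises
# "UsageError: Line magic function `%%timeit` not found."
nums = []
for x in range(1000):
    nums.append(x)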


In [4]:
# Compare creation times for data structures built with formal names vs. literal syntax (shorthand for creating a data structure)
formal_list = list()
formal_dict = dict()
formal_tuple = tuple()

literal_list = []
literal_dict = {}
literal_tuple = ()

formal_time = %timeit -o formal_list = list()
literal_time = %timeit -o literal_list = []
print(formal_time.average - literal_time.average)

64.3 ns ± 2.73 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
16.4 ns ± 0.776 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
4.7860790504273614e-08
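
One plausible explanation for the gap is visible in the bytecode: the literal compiles to a dedicated list-building instruction, while the formal name requires looking up `list` and then calling it. The sketch below lists the instruction names with the standard-library `dis` module; the helper `opnames` is made up for this example, and the exact instruction names differ between Python versions.

```python
import dis

def opnames(source):
    # Compile a single expression and return the names of its bytecode instructions
    code = compile(source, '<example>', 'eval')
    return [instruction.opname for instruction in dis.get_instructions(code)]

literal_ops = opnames('[]')
formal_ops = opnames('list()')

print(literal_ops)   # includes BUILD_LIST
print(formal_ops)    # includes a name lookup followed by a call instruction
```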


### Eliminating Loops (Zen of Python: "Flat is better than Nested")

In [19]:
#Bad practice
# A for loop keeps the names whose age is below 30 and records each name's number of characters
names = ['caio', 'pedro', 'fofo']
ages = [36, 25, 37]
name_lengths_loop = []
for name,age in zip(names, ages):
    if age < 30:
        name_length = len(name)
        name_tuple = (name, name_length) # avoid naming a variable "tuple": it would shadow the built-in
        name_lengths_loop.append(name_tuple)
print(name_lengths_loop)

#Best practice
# Eliminate the above for loop using list comprehension and the map() function
my_names = [name for name,age in zip(names, ages) if age < 30]

# Create a map object that stores the name lengths
name_lengths_map = map(len, my_names)

# Combine my_names and name_lengths_map into a list
names_and_name_lengths = [*zip(my_names, name_lengths_map)]
print(names_and_name_lengths)

[('pedro', 5)]
[('pedro', 5)]
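
For a case this small, the filter and the length calculation can also be folded into a single list comprehension. This is an alternative sketch, not a claim that it is faster than the `map()` version; timing both with `%timeit` would be the way to find out.

```python
names = ['caio', 'pedro', 'fofo']
ages = [36, 25, 37]

# Filter by age and build the (name, length) tuple in one pass
names_and_lengths = [(name, len(name)) for name, age in zip(names, ages) if age < 30]
print(names_and_lengths)
```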


### Making Loops More Efficient When They Are Unavoidable
* Lists
* DataFrames (Iterating over rows or columns: iterrows(), itertuples(), apply())
* DataFrames (Vectorizing operations, i.e., performing a calculation on all elements of an object at once)

### Let's start with Lists.

In [7]:
# For Lists
# Calculating the percentage of each category in a list of categories
categories_list = ['a','a', 'a', 'a', 'b', 'b', 'c','c']

# Import Counter
from collections import Counter

# Collect the count of each category
cat_counts = Counter(categories_list) # returns a Counter (a dict subclass)

# Improve for loop by doing the total count calculation outside of it 
total_count = len(categories_list)

print('Total counts: {}'.format(total_count))
for cat,count in cat_counts.items():
    cat_percent = round(count / total_count * 100, 2)
    print('category {}: count = {:3} percentage = {}'
          .format(cat, count, cat_percent))

Total counts: 8
category a: count =   4 percentage = 50.0
category b: count =   2 percentage = 25.0
category c: count =   2 percentage = 25.0


In [29]:
import numpy as np
import pandas as pd

data = {
  "wins": [10, 20, 30],
  "total_games": [60, 60, 60]
}
test_df = pd.DataFrame(data)
print('Dataframe created:')
print(test_df)
print('---')

def calc_win_perc(wins, games_played):
  win_perc = wins / games_played
  return np.round(win_perc,2)

def calc_diff(games_played, wins):
  diffs = games_played - wins
  return diffs

Dataframe created:
   wins  total_games
0    10           60
1    20           60
2    30           60
---


### Now, let's move to DataFrames.

In [30]:
import pandas as pd
import numpy as np

# For DataFrames

# 1) iterrows()
# .iterrows() yields each DataFrame row as an (index, Series) tuple.
# In "for i, row in df.iterrows()", i is the row's index label and row is a pandas Series holding that row's values, keyed by column name

#Bad Practice
win_perc_list = []
test_df_bad = test_df.copy()
for i in range(len(test_df_bad)):
    row = test_df_bad.iloc[i]
    wins = row['wins']
    games_played = row['total_games']
    win_perc = calc_win_perc(wins, games_played)
    win_perc_list.append(win_perc)
test_df_bad['perc_wins'] = win_perc_list
print(test_df_bad)
print('---')

#Best Practice 1: use iterrows() to avoid indexing into the DataFrame with a positional counter
win_perc_list = []
test_df_best_1 = test_df.copy()

####
# Notice that iterrows() yields each row as a tuple whose second element is a pandas Series:
print('Iterrows returns each row as a tuple:')
for row_tuple in test_df_best_1.iterrows():
    print(row_tuple)
    print(type(row_tuple[1])) # type of the tuple's second element
print('---')


for i, row in test_df_best_1.iterrows():
    wins = row['wins']
    games_played = row['total_games']    
    win_perc = calc_win_perc(wins, games_played)
    win_perc_list.append(win_perc)
test_df_best_1['perc_wins'] = win_perc_list


# 2) itertuples()
# It is faster than iterrows() and also avoids indexing into the DataFrame with a positional counter
# itertuples() is faster because each row is returned as a namedtuple rather than a pandas Series
# The namedtuple's fields are named "Index" plus the column names,
# so values are read with dot notation (row.wins) rather than row['wins']. This is called "attribute lookup".

#Best Practice 2: use itertuples()
win_perc_list = []
test_df_best_2 = test_df.copy()
for row in test_df_best_2.itertuples():
    wins = row.wins
    games_played = row.total_games
    win_perc = calc_win_perc(wins, games_played)
    win_perc_list.append(win_perc)
test_df_best_2['perc_wins'] = win_perc_list

# 3) apply()
# Takes a function as input (for example, a lambda function) and applies it along an axis of the DataFrame (axis=0 passes each column to the function, axis=1 passes each row)
diff_apply = test_df.apply(lambda row: calc_diff(row['total_games'], row['wins']), axis=1) # the difference is computed row by row
test_df['diff'] = diff_apply
print(test_df)


   wins  total_games  perc_wins
0    10           60       0.17
1    20           60       0.33
2    30           60       0.50
---
Iterrows returns each row as a tuple:
(0, wins           10
total_games    60
Name: 0, dtype: int64)
<class 'pandas.core.series.Series'>
(1, wins           20
total_games    60
Name: 1, dtype: int64)
<class 'pandas.core.series.Series'>
(2, wins           30
total_games    60
Name: 2, dtype: int64)
<class 'pandas.core.series.Series'>
---
   wins  total_games  diff
0    10           60    50
1    20           60    40
2    30           60    30
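
Attribute lookup on an `itertuples()` row can be illustrated without pandas by using `collections.namedtuple` directly. In the sketch below, `Row` is a hypothetical stand-in for the namedtuples that `itertuples()` yields (an `Index` field followed by the column names), and the values mirror the DataFrame above.

```python
from collections import namedtuple

# Hypothetical stand-in for the rows produced by DataFrame.itertuples()
Row = namedtuple('Row', ['Index', 'wins', 'total_games'])
rows = [Row(0, 10, 60), Row(1, 20, 60), Row(2, 30, 60)]

# Fields are read with dot notation (attribute lookup)
win_percs = [round(row.wins / row.total_games, 2) for row in rows]
print(win_percs)

# Positional access still works because a namedtuple is a tuple
first_row_wins = rows[0][1]
print(first_row_wins == rows[0].wins)
```

String keys such as `row['wins']` do not work on a namedtuple, which is why the loop body changes when switching from `iterrows()` to `itertuples()`.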


### NumPy instead of Python Lists
* Broadcasting and vectorization

Broadcasting allows NumPy to perform operations on arrays of different shapes by implicitly adjusting their shapes to make them compatible, while vectorizing enables efficient element-wise operations on entire arrays or matrices without explicit looping.
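
A small sketch of both ideas, assuming NumPy is installed; the `offsets` array is an invented example. Multiplying by a scalar is a vectorized element-wise operation, and adding a one-dimensional array of shape (3,) to a two-dimensional array of shape (2, 3) relies on broadcasting to apply the same offsets to every row.

```python
import numpy as np

nums_array = np.array([[1, 2, 3],
                       [4, 5, 6]])        # shape (2, 3)

# Vectorized: every element is doubled without an explicit loop
doubled = nums_array * 2
print(doubled)

# Broadcasting: the (3,) array is stretched across both rows
offsets = np.array([10, 20, 30])
shifted = nums_array + offsets
print(shifted)
```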

In [35]:
# Lists are more verbose than NumPy arrays
import numpy as np

# Creating list and indexing
nums_list = [ [1, 2, 3], 
         [4, 5, 6] ]
print(nums_list[0][0])

# Creating array (from a list) and indexing
nums_array = np.array(nums_list)
print(nums_array[0,0])

# Return all values from a column with a list
test_list = [row[0] for row in nums_list] # nested lists have no column slicing, so a list comprehension (or loop) is needed to pull out a column
print(test_list)
print(type(test_list))

# Return all values from a column with an array
test_array = nums_array[:,0]
print(test_array)
print(type(nums_array[:,0]))

# We can access the values within a DataFrame column as a NumPy array by using the .values attribute
print(type(test_df['total_games'].values))
test_df['diff'] = test_df['total_games'].values - test_df['wins'].values
print(test_df)

1
1
[1, 4]
<class 'list'>
[1 4]
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
   wins  total_games  diff
0    10           60    50
1    20           60    40
2    30           60    30
