## Python Programming
* Skill Track Review of key items prior to taking next certification (Data Science)
* Hopefully more of a review but let's crack on

<br>

### (1 of 6) - Writing Efficient Python Code
#### Defining Efficient
* In the context of this course, efficient refers to code that satisfies two key concepts. 
* **First**, efficient code is fast and has a small latency between execution and returning a result. 
* **Second**, efficient code allocates resources skillfully and isn't subjected to unnecessary overhead. 

Although your definition of fast runtime and small memory usage may depend on the task at hand, the goal of writing efficient code is still to reduce both `latency` and `overhead`. For the remainder of this course, we'll be exploring how to write Python code that runs quickly and has little memory overhead. 

#### Defining Pythonic
* We've defined what is meant by efficient code, but it is also important to note that this course focuses on writing efficient code using Python. 
* Python is a language that prides itself on code readability, and thus, it comes with its own set of idioms and best practices. 
* Writing Python code the way it was intended is often referred to as Pythonic code. 
	* This means the code that you write follows the best practices and guiding principles of Python. 
	* Pythonic code tends to be less verbose and easier to interpret. 
* Although Python supports code that doesn't follow its guiding principles, this type of code tends to run slower. 
	* As an example, look at the non-Pythonic code below. 
	* Not only is this code more verbose than the Pythonic version, it takes longer to run. 
	* We'll take a closer look at why this is the case later on in the course, but for now, the main take away here is that Pythonic code is efficient code!

#### Non-Pythonic
```python
numbers = [1, 2, 3, 4]  # sample input (assumed for illustration)
doubled_numbers = []
for i in range(len(numbers)):
    doubled_numbers.append(numbers[i] * 2)
```
#### Pythonic (List Comprehension)
```python
doubled_numbers = [x * 2 for x in numbers]
```

![Screen Shot 2023-03-03 at 9.11.21 AM](Screen%20Shot%202023-03-03%20at%209.11.21%20AM.png)

* Writing efficient Python code minimizes runtime and memory usage while also following the idioms in the Zen of Python.
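
The claim that the Pythonic version also runs faster can be checked with the standard library's `timeit` module. This is a minimal sketch: the sample data and the two helper function names are assumptions made for illustration, and the measured times will differ from machine to machine.

```python
import timeit

numbers = list(range(1_000))  # hypothetical sample data


def double_with_index_loop(values):
    """Non-Pythonic: index-based loop with repeated append calls."""
    doubled = []
    for i in range(len(values)):
        doubled.append(values[i] * 2)
    return doubled


def double_with_comprehension(values):
    """Pythonic: a single list comprehension."""
    return [x * 2 for x in values]


# Both approaches must give the same result before timing means anything.
assert double_with_index_loop(numbers) == double_with_comprehension(numbers)

# Total time in seconds for 1,000 calls of each version.
loop_time = timeit.timeit(lambda: double_with_index_loop(numbers), number=1_000)
comp_time = timeit.timeit(lambda: double_with_comprehension(numbers), number=1_000)

print(f"index loop:    {loop_time:.4f} s")
print(f"comprehension: {comp_time:.4f} s")
```

On most CPython builds the comprehension comes out ahead, but treat the numbers as a measurement to repeat rather than a guarantee.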

#### First Exercises Below

```python
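# Guest list (the same names used in the later exercises)
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

# Print the list created using the Non-Pythonic approach
i = 0
new_list = []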
while i < len(names):
    if len(names[i]) >= 6:
        new_list.append(names[i])
    i += 1
print(new_list)

# Print the list created by looping over the contents of names : More Pythonic
better_list = []
for name in names:
    if len(name) >= 6:
        better_list.append(name)
print(better_list)


# Print the list created by using list comprehension : Most Pythonic
best_list = [name for name in names if len(name) >= 6]
print(best_list)
```

### Building w/built-ins
* Python 3.6 Standard Library
	* Part of every standard Python installation
* Built-in types
	* list, tuple, set, dict, and others
* Built-in functions
	* print(), len(), range(), round(), enumerate(), map(), zip()
* Built-in modules
	* os, sys, itertools, collections, math, and others    

Ex : map()
![Screen Shot 2023-03-03 at 9.19.52 AM](Screen%20Shot%202023-03-03%20at%209.19.52%20AM.png)
```python
# Square all numbers in a list with an anonymous lambda function
nums = [1, 2, 3, 4]
sqrd_numbs = map(lambda x: x ** 2, nums)
# map() returns a map object, so convert it to a list before printing
print(list(sqrd_numbs))  # [1, 4, 9, 16]
```
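
One detail worth keeping in mind with `map()` is that it returns a lazy iterator, not a list. The short sketch below (variable names are illustrative) shows that the values are only produced when the map object is consumed, and that it can be consumed only once.

```python
nums = [1, 2, 3, 4]

squared = map(lambda x: x ** 2, nums)  # nothing is computed yet
first_pass = list(squared)             # consumes the iterator
second_pass = list(squared)            # the iterator is already exhausted

print(first_pass)   # [1, 4, 9, 16]
print(second_pass)  # []
```

If the results are needed more than once, store them in a list (or unpack with `[*...]`) right away.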

In [1]:
```python
# Create a range object that goes from 0 to 5
nums = range(6)
print(nums)

# Convert nums to a list
nums_list = list(nums)
print(nums_list)

# Create a new list of odd numbers from 1 to 11 by unpacking a range object with the star character
nums_list2 = [*range(1, 12, 2)]
print(nums_list2)
```

```
range(0, 6)
[0, 1, 2, 3, 4, 5]
[1, 3, 5, 7, 9, 11]
```


In [2]:
```python
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']
# Rewrite the for loop to use enumerate
indexed_names = []
for i, name in enumerate(names):
    index_name = (i, name)
    indexed_names.append(index_name)
print(indexed_names)

# Rewrite the above for loop using list comprehension
indexed_names_comp = [(i, name) for i, name in enumerate(names)]
print(indexed_names_comp)

# Unpack an enumerate object with a starting index of one
indexed_names_unpack = [*enumerate(names, 1)]
print(indexed_names_unpack)
```

```
[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]
[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]
[(1, 'Jerry'), (2, 'Kramer'), (3, 'Elaine'), (4, 'George'), (5, 'Newman')]
```


#### Built-in practice: map()
In this exercise, you'll practice using Python's built-in map() function to apply a function to every element of an object. Let's look at a list of party guests:

In [3]:
```python
# Use map to apply str.upper to each element in names
names_map = map(str.upper, names)
print(type(names_map))

# Unpack names_map into a list
names_uppercase = [*names_map]

print(names_uppercase)
```

```
<class 'map'>
['JERRY', 'KRAMER', 'ELAINE', 'GEORGE', 'NEWMAN']
```


<br>

### NumPy Arrays
* NumPy arrays provide a fast and memory-efficient alternative to Python lists. Typically, we import NumPy as `np` and use `np.array()` to create a NumPy array.
* NumPy arrays are **homogeneous**, which means that they must contain elements of the same type. We can see the type of each element using the `.dtype` attribute. If we create an array from a mixture of types, NumPy converts every element to a single common type so that the array stays homogeneous.
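
A minimal sketch of what happens when an array is built from mixed types, assuming NumPy is installed. The sample values are made up for illustration.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])  # one float among integers

# All-integer input keeps an integer dtype (the exact width depends on the platform).
print(ints.dtype)

# Mixed input is converted to one common type: every element becomes a float.
print(mixed.dtype)  # float64
print(mixed)        # [1.  2.5 3. ]
```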

### NumPy Array Broadcasting
* NumPy arrays vectorize operations, so they are performed on all elements of an object at once. This allows us to efficiently perform calculations over entire arrays.


### NumPy array indexing
* Indexing capabilities are advantageous
* With two-dimensional arrays and lists, the advantages of arrays are clear. To return the second item of the first row of a two-dimensional object, the array syntax is `[0, 1]`, whereas a nested list needs chained indexing, `[0][1]`.
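
The difference between list and array indexing can be sketched as follows, assuming NumPy is installed. The variable names and sample values are hypothetical.

```python
import numpy as np

nums2 = [[1, 2, 3], [4, 5, 6]]  # hypothetical sample data
nums2_np = np.array(nums2)

# Second item of the first row
print(nums2[0][1])     # 2 (list: chained indexing)
print(nums2_np[0, 1])  # 2 (array: one pair of brackets)

# First column: a list needs a loop, an array only needs a slice
first_col_list = [row[0] for row in nums2]
first_col_np = nums2_np[:, 0]
print(first_col_list)  # [1, 4]
print(first_col_np)    # [1 4]
```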


### NumPy array boolean indexing
```python
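import numpy as np

nums = [-2, -1, 0, 1, 2]
nums_np = np.array(nums)

# Boolean indexing: the comparison returns a boolean mask
nums_np > 0
# array([False, False, False,  True,  True])

# Use the mask to keep only the positive values
nums_np[nums_np > 0]
# array([1, 2])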
```

In [4]:
```python
# Practice w/NumPy Arrays
import numpy as np

nums = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print(nums, '\n')

# Print second row of nums
print(nums[1, :], '\n')

# Print all elements of nums that are greater than six
print(nums[nums > 6], '\n')

# Double every element of nums
nums_dbl = nums * 2
print(nums_dbl, '\n')

# Replace the third column in nums with a new column that adds 1 to each item in the original column.
nums[:, 2] = nums[:, 2] + 1
print(nums)
```

```
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]

[ 6  7  8  9 10]

[ 7  8  9 10]

[[ 2  4  6  8 10]
 [12 14 16 18 20]]

[[ 1  2  4  4  5]
 [ 6  7  9  9 10]]
```


* A numpy array contains homogeneous data types (which reduces memory consumption) and provides the ability to apply operations on all elements through broadcasting.

#### Putting The Above Together Exercise

In [5]:
```python
# Create a list of arrival times (10 to 50, in increments of 10) by unpacking a range object
arrival_times = [*range(10, 60, 10)]
print(arrival_times, '\n')

# You realize your clock is three minutes fast. Convert the arrival_times list into a numpy array (called arrival_times_np) and use NumPy broadcasting to subtract three minutes from each arrival time
arrival_times_np = np.array(arrival_times)
new_times = arrival_times_np - 3
print(new_times, '\n')

# Use list comprehension with enumerate() to pair each guest in the names list to their updated arrival time in the new_times array
guest_arrivals = [(names[i], time) for i, time in enumerate(new_times)]
print(guest_arrivals)
```

```
[10, 20, 30, 40, 50]

[ 7 17 27 37 47]

[('Jerry', 7), ('Kramer', 17), ('Elaine', 27), ('George', 37), ('Newman', 47)]
```


In [6]:
```python
# welcome_guest function (the def is preloaded in the exercises, so write your own)
# Output : "Welcome to Festivus (Name)... You're () min late."
def welcome_guest(guest_list):
    """
    Overview : Takes a list of (name, minutes_late) tuples and builds a welcome message for each guest
    Arguments : guest_list, a list of 2-item tuples
    Returns : List of strings containing each guest's name and minutes late
    """
    guest_late_times = [f"Welcome to Festivus {guest[0]}... You're {guest[1]} min late." for guest in guest_list]
    return guest_late_times

seinfeld_tardiness = welcome_guest(guest_arrivals)
print(*seinfeld_tardiness, sep='\n')
```

```
Welcome to Festivus Jerry... You're 7 min late.
Welcome to Festivus Kramer... You're 17 min late.
Welcome to Festivus Elaine... You're 27 min late.
Welcome to Festivus George... You're 37 min late.
Welcome to Festivus Newman... You're 47 min late.
```


<br>

## Examining runtime
* Comparing the runtimes of two pieces of code that effectively do the same thing allows us to pick the one with the better performance. By gathering and analyzing runtimes, we can be sure to implement the code that is fastest and thus more efficient.
* Using %timeit
	* Consider this example: we want to inspect the runtime for selecting 1,000 random numbers between zero and one using NumPy's `np.random.rand()` function. Using %timeit just requires adding the magic command before the line of code we want to analyze. That's it!
```python
import numpy as np
rand_nums = np.random.rand(1000)
# Timing with %timeit (IPython magic command)
%timeit rand_nums = np.random.rand(1000)

# 8.61 µs ± 69.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
* We also see that multiple runs and loops were generated. 
	* **%timeit** runs through the provided code multiple times to estimate the code's execution time. 
	* This provides a more accurate representation of the actual runtime rather than relying on just one iteration to calculate the runtime. 
	* The mean and standard deviation displayed in the output are a summary of the runtime across the multiple runs.
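
Outside IPython, the runs-and-loops idea can be approximated with the standard library's `timeit.repeat`. This is only a sketch of the concept, not how `%timeit` is implemented: the `RUNS` and `LOOPS` values and the pure-Python stand-in for `np.random.rand(1000)` are assumptions chosen to keep the example dependency-free.

```python
import random
import statistics
import timeit

RUNS = 5     # plays the role of %timeit's -r option
LOOPS = 200  # plays the role of %timeit's -n option


def make_random_numbers():
    """Draw 1,000 random floats between zero and one (standard-library stand-in)."""
    return [random.random() for _ in range(1000)]


# timeit.repeat returns one total time per run; each run calls the function LOOPS times.
run_totals = timeit.repeat(make_random_numbers, repeat=RUNS, number=LOOPS)

# Convert each run's total into a per-loop time, then summarize across runs.
per_loop = [total / LOOPS for total in run_totals]
mean_us = statistics.mean(per_loop) * 1e6
std_us = statistics.stdev(per_loop) * 1e6

print(f"{mean_us:.2f} µs ± {std_us:.2f} µs per loop "
      f"(mean ± std. dev. of {RUNS} runs, {LOOPS} loops each)")
```

Averaging over several runs smooths out one-off delays, which is why a single timed iteration is a poor estimate.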

#### Specifying numbers of runs/loops
```python
# Set number of runs to 2, number of loops to 10
%timeit -r2 -n10 rand_nums = np.random.rand(1000)
```

![Screen Shot 2023-03-03 at 10.19.58 AM](Screen%20Shot%202023-03-03%20at%2010.19.58%20AM.png)


In [7]:
```python
# Create a list of integers (0-50) using list comprehension
nums_list_comp = [num for num in range(51)]
print(nums_list_comp, '\n')

# Create a list of integers (0-50) by unpacking range
nums_unpack = [*range(51)]
print(nums_unpack)
```

```
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
```


In [8]:
```python
# Use %timeit within your IPython console (i.e. not within the script.py window) to compare the runtimes for creating a list of integers from 0 to 50 using list comprehension vs. unpacking the range object.
%timeit lc = [num for num in range(51)]
```

```
2.25 µs ± 6.66 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```


In [9]:
```python
%timeit upro = [*range(51)]
```

```
862 ns ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```


* Although list comprehension is a useful and powerful tool, sometimes unpacking an object can save time and look a little cleaner.

In [10]:
```python
# Create an empty list called formal_list using the formal name (list()).
formal_list = list()
print(formal_list)

# Create an empty list called literal_list using the literal syntax ([]).
literal_list = []
print(literal_list)

# Print out the type of formal_list
print(type(formal_list))

# Print out the type of literal_list
print(type(literal_list))
```

```
[]
[]
<class 'list'>
<class 'list'>
```


In [11]:
```python
%timeit list()
```

```
73.4 ns ± 0.162 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```


In [12]:
```python
%timeit []
```

```
40.6 ns ± 0.0839 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```


* The approach with the smallest runtime is the fastest.
* Using Python's literal syntax to define a data structure can speed up your runtime. 
* Consider using the literal syntaxes (like `[]` instead of `list()`, `{}` instead of `dict()`, or `()` instead of `tuple()`), where applicable, to gain some speed
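
One way to see why the literal tends to be faster is to look at the bytecode with the standard library's `dis` module. The helper name `opnames` is made up for this sketch, and exact instruction names differ between CPython versions, so no full listing is shown.

```python
import dis


def opnames(expression):
    """Return the bytecode instruction names for a single expression."""
    code = compile(expression, "<example>", "eval")
    return [instruction.opname for instruction in dis.get_instructions(code)]


literal_ops = opnames("[]")
formal_ops = opnames("list()")

# The literal builds the list directly with a BUILD_LIST instruction.
print(literal_ops)

# The formal name has to look up `list` and then call it.
print(formal_ops)
```

The extra name lookup and function call are the likely source of the gap measured by `%timeit`.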

![Screen Shot 2023-03-03 at 10.32.25 AM](Screen%20Shot%202023-03-03%20at%2010.32.25%20AM.png)
* You used %%timeit (_cell magic mode_) to time multiple lines of code. Converting the wts list into a NumPy array and taking advantage of NumPy array broadcasting saved you some time! 
* Moving forward, remember that you can use %timeit to gather runtime for a single line of code (_line magic mode_) and %%timeit to get the runtime for multiple lines of code.

### Code Profiling for runtime
We've covered how to time our code using the magic command %timeit, which works well with bite-sized code. But, what if we wanted to time a large code base or see the line-by-line runtimes within a function? In this lesson, we'll cover a concept called code profiling that allows us to analyze code more efficiently.
* Package used: `line_profiler`
* Load it into the IPython session: `%load_ext line_profiler`
* Magic command for line-by-line times: `%lprun -f`
	* To profile a function, name it after `-f` and then give the call to run: `%lprun -f convert_units convert_units(heroes, hts, wts)`

#### Steps for using %lprun
```python
def convert_units(heroes, heights, weights):

    new_hts = [ht * 0.39370  for ht in heights]
    new_wts = [wt * 2.20462  for wt in weights]

    hero_data = {}

    for i,hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])

    return hero_data
```
* Suppose you have a list of superheroes (named heroes) along with each hero's height (in centimeters) and weight (in kilograms) loaded as NumPy arrays (named hts and wts respectively).

* **What are the necessary steps you need to take in order to profile the convert_units() function acting on your superheroes data if you'd like to see line-by-line runtimes?**

1. Use %load_ext line_profiler to load the line_profiler within your IPython session.
2. Use %lprun -f convert_units convert_units(heroes, hts, wts) to get line-by-line runtimes.
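
When IPython or `line_profiler` is not available, the standard library's `cProfile` offers a coarser alternative. Note that this is a different technique: it reports time per function call, not per line. The three sample heroes are the first entries of the dataset used in these notes, passed as plain lists to keep the sketch dependency-free.

```python
import cProfile
import io
import pstats


def convert_units(heroes, heights, weights):
    new_hts = [ht * 0.39370 for ht in heights]
    new_wts = [wt * 2.20462 for wt in weights]

    hero_data = {}
    for i, hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])
    return hero_data


# Small sample (plain lists here; the notes use NumPy arrays)
heroes = ['A-Bomb', 'Abe Sapien', 'Abin Sur']
hts = [203.0, 191.0, 185.0]
wts = [441.0, 65.0, 90.0]

# Run the function under the profiler and keep its return value
profiler = cProfile.Profile()
hero_data = profiler.runcall(convert_units, heroes, hts, wts)

# Write the five most expensive entries (by cumulative time) into a string
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```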

<br>

### Using %lprun: spot bottlenecks
Profiling a function allows you to dig deeper into the function's source code and potentially spot bottlenecks. When you see certain lines of code taking up the majority of the function's runtime, it is an indication that you may want to deploy a different, more efficient technique.

In [13]:
def convert_units(heroes, heights, weights):
    new_hts = [ht * 0.39370  for ht in heights]
    new_wts = [wt * 2.20462  for wt in weights]

    hero_data = {}

    for i,hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])

    return hero_data

heroes = ['A-Bomb', 'Abe Sapien', 'Abin Sur', 'Abomination', 'Absorbing Man', 'Adam Strange', 'Agent 13', 'Agent Bob', 'Agent Zero', 'Air-Walker', 'Ajax', 'Alan Scott', 'Alfred Pennyworth', 'Alien', 'Amazo', 'Ammo', 'Angel', 'Angel Dust', 'Angel Salvadore', 'Animal Man', 'Annihilus', 'Ant-Man', 'Ant-Man II', 'Anti-Venom', 'Apocalypse', 'Aqualad', 'Aquaman', 'Arachne', 'Archangel', 'Arclight', 'Ardina', 'Ares', 'Ariel', 'Armor', 'Atlas', 'Atom', 'Atom Girl', 'Atom II', 'Aurora', 'Azazel', 'Bane', 'Banshee', 'Bantam', 'Batgirl', 'Batgirl IV', 'Batgirl VI', 'Batman', 'Batman II', 'Battlestar', 'Beak', 'Beast', 'Beast Boy', 'Beta Ray Bill', 'Big Barda', 'Big Man', 'Binary', 'Bishop', 'Bizarro', 'Black Adam', 'Black Bolt', 'Black Canary', 'Black Cat', 'Black Knight III', 'Black Lightning', 'Black Mamba', 'Black Manta', 'Black Panther', 'Black Widow', 'Black Widow II', 'Blackout', 'Blackwing', 'Blackwulf', 'Blade', 'Bling!', 'Blink', 'Blizzard II', 'Blob', 'Bloodaxe', 'Blue Beetle II', 'Boom-Boom', 'Booster Gold', 'Box III', 'Brainiac', 'Brainiac 5', 'Brother Voodoo', 'Buffy', 'Bullseye', 'Bumblebee', 'Cable', 'Callisto', 'Cannonball', 'Captain America', 'Captain Atom', 'Captain Britain', 'Captain Mar-vell', 'Captain Marvel', 'Captain Marvel II', 'Carnage', 'Cat', 'Catwoman', 'Cecilia Reyes', 'Century', 'Chamber', 'Changeling', 'Cheetah', 'Cheetah II', 'Cheetah III', 'Chromos', 'Citizen Steel', 'Cloak', 'Clock King', 'Colossus', 'Copycat', 'Corsair', 'Cottonmouth', 'Crimson Dynamo', 'Crystal', 'Cyborg', 'Cyclops', 'Cypher', 'Dagger', 'Daredevil', 'Darkhawk', 'Darkseid', 'Darkstar', 'Darth Vader', 'Dash', 'Dazzler', 'Deadman', 'Deadpool', 'Deadshot', 'Deathlok', 'Deathstroke', 'Demogoblin', 'Destroyer', 'Diamondback', 'Doc Samson', 'Doctor Doom', 'Doctor Doom II', 'Doctor Fate', 'Doctor Octopus', 'Doctor Strange', 'Domino', 'Donna Troy', 'Doomsday', 'Doppelganger', 'Drax the Destroyer', 'Elastigirl', 'Electro', 'Elektra', 'Elongated Man', 'Emma Frost', 'Enchantress', 
'Etrigan', 'Evil Deadpool', 'Evilhawk', 'Exodus', 'Fabian Cortez', 'Falcon', 'Feral', 'Fin Fang Foom', 'Firebird', 'Firelord', 'Firestar', 'Firestorm', 'Flash', 'Flash II', 'Flash III', 'Flash IV', 'Forge', 'Franklin Richards', 'Franklin Storm', 'Frenzy', 'Frigga', 'Galactus', 'Gambit', 'Gamora', 'Genesis', 'Ghost Rider', 'Giganta', 'Gladiator', 'Goblin Queen', 'Goku', 'Goliath IV', 'Gorilla Grodd', 'Granny Goodness', 'Gravity', 'Green Arrow', 'Green Goblin', 'Green Goblin II', 'Green Goblin III', 'Green Goblin IV', 'Groot', 'Guy Gardner', 'Hal Jordan', 'Han Solo', 'Harley Quinn', 'Havok', 'Hawk', 'Hawkeye', 'Hawkeye II', 'Hawkgirl', 'Hawkman', 'Hawkwoman', 'Hawkwoman III', 'Heat Wave', 'Hela', 'Hellboy', 'Hellcat', 'Hellstorm', 'Hercules', 'Hobgoblin', 'Hope Summers', 'Howard the Duck', 'Hulk', 'Human Torch', 'Huntress', 'Husk', 'Hybrid', 'Hydro-Man', 'Hyperion', 'Iceman', 'Impulse', 'Ink', 'Invisible Woman', 'Iron Fist', 'Iron Man', 'Jack of Hearts', 'Jack-Jack', 'James T. Kirk', 'Jean Grey', 'Jennifer Kale', 'Jessica Jones', 'Jigsaw', 'John Stewart', 'John Wraith', 'Joker', 'Jolt', 'Jubilee', 'Juggernaut', 'Justice', 'Kang', 'Karate Kid', 'Killer Croc', 'Kilowog', 'Kingpin', 'Klaw', 'Kraven II', 'Kraven the Hunter', 'Krypto', 'Kyle Rayner', 'Lady Deathstrike', 'Leader', 'Legion', 'Lex Luthor', 'Light Lass', 'Lightning Lad', 'Lightning Lord', 'Living Brain', 'Lizard', 'Lobo', 'Loki', 'Longshot', 'Luke Cage', 'Luke Skywalker', 'Mach-IV', 'Machine Man', 'Magneto', 'Man-Thing', 'Man-Wolf', 'Mandarin', 'Mantis', 'Martian Manhunter', 'Marvel Girl', 'Master Brood', 'Maverick', 'Maxima', 'Medusa', 'Meltdown', 'Mephisto', 'Mera', 'Metallo', 'Metamorpho', 'Metron', 'Micro Lad', 'Mimic', 'Miss Martian', 'Mister Fantastic', 'Mister Freeze', 'Mister Sinister', 'Mockingbird', 'MODOK', 'Molten Man', 'Monarch', 'Moon Knight', 'Moonstone', 'Morlun', 'Morph', 'Moses Magnum', 'Mr Immortal', 'Mr Incredible', 'Ms Marvel II', 'Multiple Man', 'Mysterio', 'Mystique', 'Namor', 'Namora', 
'Namorita', 'Naruto Uzumaki', 'Nebula', 'Nick Fury', 'Nightcrawler', 'Nightwing', 'Northstar', 'Nova', 'Odin', 'Omega Red', 'Omniscient', 'One Punch Man', 'Onslaught', 'Oracle', 'Paul Blart', 'Penance II', 'Penguin', 'Phantom Girl', 'Phoenix', 'Plantman', 'Plastic Man', 'Plastique', 'Poison Ivy', 'Polaris', 'Power Girl', 'Predator', 'Professor X', 'Professor Zoom', 'Psylocke', 'Punisher', 'Purple Man', 'Pyro', 'Question', 'Quicksilver', 'Quill', "Ra's Al Ghul", 'Raven', 'Ray', 'Razor-Fist II', 'Red Arrow', 'Red Hood', 'Red Hulk', 'Red Robin', 'Red Skull', 'Red Tornado', 'Rhino', 'Rick Flag', 'Ripcord', 'Robin', 'Robin II', 'Robin III', 'Robin V', 'Rocket Raccoon', 'Rogue', 'Ronin', 'Rorschach', 'Sabretooth', 'Sage', 'Sandman', 'Sasquatch', 'Scarecrow', 'Scarlet Spider', 'Scarlet Spider II', 'Scarlet Witch', 'Scorpion', 'Sentry', 'Shadow King', 'Shadow Lass', 'Shadowcat', 'Shang-Chi', 'Shatterstar', 'She-Hulk', 'She-Thing', 'Shocker', 'Shriek', 'Sif', 'Silver Surfer', 'Silverclaw', 'Sinestro', 'Siren', 'Siryn', 'Skaar', 'Snowbird', 'Solomon Grundy', 'Songbird', 'Space Ghost', 'Spawn', 'Spider-Girl', 'Spider-Gwen', 'Spider-Man', 'Spider-Woman', 'Spider-Woman III', 'Spider-Woman IV', 'Spock', 'Spyke', 'Star-Lord', 'Starfire', 'Stargirl', 'Static', 'Steel', 'Steppenwolf', 'Storm', 'Sunspot', 'Superboy', 'Superboy-Prime', 'Supergirl', 'Superman', 'Swarm', 'Synch', 'T-1000', 'Taskmaster', 'Tempest', 'Thanos', 'The Comedian', 'Thing', 'Thor', 'Thor Girl', 'Thunderbird', 'Thunderbird III', 'Thunderstrike', 'Thundra', 'Tiger Shark', 'Tigra', 'Tinkerer', 'Toad', 'Toxin', 'Trickster', 'Triplicate Girl', 'Triton', 'Two-Face', 'Ultragirl', 'Ultron', 'Utgard-Loki', 'Vagabond', 'Valerie Hart', 'Valkyrie', 'Vanisher', 'Vegeta', 'Venom', 'Venom II', 'Venom III', 'Vertigo II', 'Vibe', 'Vindicator', 'Violet Parr', 'Vision', 'Vision II', 'Vixen', 'Vulture', 'Walrus', 'War Machine', 'Warbird', 'Warlock', 'Warp', 'Warpath', 'Wasp', 'White Queen', 'Winter Soldier', 'Wiz Kid', 
'Wolfsbane', 'Wolverine', 'Wonder Girl', 'Wonder Man', 'Wonder Woman', 'Wyatt Wingfoot', 'X-23', 'X-Man', 'Yellow Claw', 'Yellowjacket', 'Yellowjacket II', 'Yoda', 'Zatanna', 'Zoom']

hts = np.array([203. , 191. , 185. , 203. , 193. , 185. , 173. , 178. , 191. ,
       188. , 193. , 180. , 178. , 244. , 257. , 188. , 183. , 165. ,
       163. , 183. , 180. , 211. , 183. , 229. , 213. , 178. , 185. ,
       175. , 183. , 173. , 193. , 185. , 165. , 163. , 183. , 178. ,
       168. , 183. , 180. , 183. , 203. , 183. , 165. , 170. , 165. ,
       168. , 188. , 178. , 198. , 175. , 180. , 173. , 201. , 188. ,
       165. , 180. , 198. , 191. , 191. , 188. , 165. , 178. , 183. ,
       185. , 170. , 188. , 183. , 170. , 170. , 191. , 185. , 188. ,
       188. , 168. , 165. , 175. , 178. , 218. , 183. , 165. , 196. ,
       193. , 198. , 170. , 183. , 157. , 183. , 170. , 203. , 175. ,
       183. , 188. , 193. , 198. , 188. , 180. , 175. , 185. , 173. ,
       175. , 170. , 201. , 175. , 180. , 163. , 170. , 175. , 185. ,
       183. , 226. , 178. , 226. , 183. , 191. , 183. , 180. , 168. ,
       198. , 191. , 175. , 165. , 183. , 185. , 267. , 168. , 198. ,
       122. , 173. , 183. , 188. , 185. , 193. , 193. , 185. , 188. ,
       193. , 198. , 201. , 201. , 188. , 175. , 188. , 173. , 175. ,
       244. , 196. , 193. , 168. , 180. , 175. , 185. , 178. , 168. ,
       193. , 188. , 191. , 183. , 196. , 188. , 175. , 975. , 165. ,
       193. , 173. , 188. , 180. , 183. , 183. , 157. , 183. , 142. ,
       188. , 211. , 180. , 876. , 185. , 183. , 185. , 188. ,  62.5,
       198. , 168. , 175. , 183. , 198. , 178. , 178. , 188. , 180. ,
       178. , 183. , 178. , 701. , 188. , 188. , 183. , 170. , 183. ,
       185. , 191. , 165. , 175. , 185. , 175. , 170. , 180. , 213. ,
       259. , 173. , 185. , 196. , 180. , 168. ,  79. , 244. , 178. ,
       180. , 170. , 175. , 188. , 183. , 173. , 170. , 180. , 168. ,
       180. , 198. , 155. ,  71. , 178. , 168. , 168. , 170. , 188. ,
       185. , 183. , 196. , 165. , 165. , 287. , 178. , 191. , 173. ,
       244. , 234. , 201. , 188. , 191. , 183. ,  64. , 180. , 175. ,
       178. , 175. , 188. , 165. , 155. , 191. , 198. , 203. , 229. ,
       193. , 188. , 198. , 168. , 180. , 183. , 188. , 213. , 188. ,
       188. , 168. , 201. , 170. , 183. , 193. , 180. , 180. , 165. ,
       198. , 175. , 196. , 185. , 185. , 183. , 188. , 178. , 185. ,
       183. , 196. , 175. , 366. , 196. , 193. , 188. , 180. , 188. ,
       178. , 175. , 188. , 201. , 173. , 180. , 180. , 178. , 188. ,
       180. , 168. , 168. , 185. , 185. , 175. , 178. , 180. , 185. ,
       206. , 211. , 180. , 175. , 305. , 178. , 170. , 183. , 157. ,
       168. , 168. , 183. , 185. , 168. , 168. , 170. , 180. , 213. ,
       183. , 180. , 180. , 183. , 180. , 178. , 188. , 183. , 163. ,
       193. , 165. , 178. , 191. , 180. , 183. , 213. , 165. , 188. ,
       185. , 196. , 185. , 180. , 178. , 183. , 165. , 137. , 122. ,
       173. , 191. , 168. , 198. , 170. , 185. , 305. , 183. , 178. ,
       193. , 170. , 211. , 188. , 185. , 173. , 168. , 178. , 191. ,
       201. , 183. , 175. , 173. , 188. , 193. , 157. , 201. , 175. ,
       168. , 198. , 178. , 279. , 165. , 188. , 211. , 170. , 165. ,
       178. , 178. , 173. , 178. , 185. , 183. , 188. , 193. , 165. ,
       170. , 201. , 183. , 180. , 173. , 170. , 180. , 165. , 191. ,
       196. , 180. , 183. , 188. , 163. , 201. , 188. , 183. , 198. ,
       175. , 185. , 175. , 198. , 218. , 185. , 178. , 163. , 175. ,
       188. , 183. , 168. , 188. , 183. , 168. , 206. ,  15.2, 168. ,
       175. , 191. , 165. , 168. , 191. , 175. , 229. , 168. , 178. ,
       165. , 137. , 191. , 191. , 175. , 180. , 183. , 185. , 180. ,
       188. , 173. , 218. , 163. , 178. , 175. , 140. , 366. , 160. ,
       165. , 188. , 183. , 196. , 155. , 175. , 188. , 183. , 165. ,
        66. , 170. , 185. ])

wts = np.array([441.,  65.,  90., 441., 122.,  88.,  61.,  81., 104., 108.,  90.,
        90.,  72., 169., 173., 101.,  68.,  57.,  54.,  83.,  90., 122.,
        86., 358., 135., 106., 146.,  63.,  68.,  57.,  98., 270.,  59.,
        50., 101.,  68.,  54.,  81.,  63.,  67., 180.,  77.,  54.,  57.,
        52.,  61.,  95.,  79., 133.,  63., 181.,  68., 216., 135.,  71.,
        54., 124., 155., 113.,  95.,  58.,  54.,  86.,  90.,  52.,  92.,
        90.,  59.,  61., 104.,  86.,  88.,  97.,  68.,  56.,  77., 230.,
       495.,  86.,  55.,  97., 110., 135.,  61.,  99.,  52.,  90.,  59.,
       158.,  74.,  81., 108.,  90., 116., 108.,  74.,  74.,  86.,  61.,
        61.,  62.,  97.,  63.,  81.,  50.,  55.,  54.,  86., 170.,  70.,
        78., 225.,  67.,  79.,  99., 104.,  50., 173.,  88.,  68.,  52.,
        90.,  81., 817.,  56., 135.,  27.,  52.,  90.,  95.,  91., 178.,
       101.,  95., 383.,  90., 171., 187., 132.,  89., 110.,  81.,  54.,
        63., 412., 104., 306.,  56.,  74.,  59.,  80.,  65.,  57., 203.,
        95., 106.,  88.,  96., 108.,  50.,  18.,  56.,  99.,  56.,  91.,
        81.,  88.,  86.,  52.,  81.,  45.,  92., 104., 167.,  16.,  81.,
        77.,  86.,  99., 630., 268.,  50.,  62.,  90., 270., 115.,  79.,
        88.,  83.,  77.,  88.,  79.,   4.,  95.,  90.,  79.,  63.,  79.,
        89., 104.,  57.,  61.,  88.,  54.,  65.,  81., 225., 158.,  61.,
        81., 146.,  83.,  48.,  18., 630.,  77.,  59.,  58.,  77., 119.,
       207.,  65.,  65.,  81.,  54.,  79., 191.,  79.,  14.,  77.,  52.,
        55.,  56., 113.,  90.,  88.,  86.,  49.,  52., 855.,  81., 104.,
        72., 356., 324., 203.,  97.,  99., 106.,  18.,  79.,  58.,  63.,
        59.,  95.,  54.,  65.,  95., 360., 230., 288., 236.,  36., 191.,
        77.,  79., 383.,  86., 225.,  90.,  97.,  52., 135.,  56.,  81.,
       110.,  72.,  59.,  54., 140.,  72.,  90.,  90.,  86.,  77., 101.,
        61.,  81.,  86., 128.,  61., 338., 248.,  90., 101.,  59.,  79.,
        79.,  72.,  70., 158.,  61.,  70.,  79.,  54., 125.,  85., 101.,
        54.,  83.,  99.,  88.,  79.,  83.,  86., 293., 191.,  65.,  69.,
       405.,  59., 117.,  89.,  79.,  54.,  52.,  87.,  80.,  55.,  50.,
        52.,  81., 234.,  86.,  81.,  70.,  90.,  74.,  68.,  83.,  79.,
        56.,  97.,  50.,  70., 117.,  83.,  81., 630.,  56., 108., 146.,
       320.,  85.,  72.,  79., 101.,  56.,  38.,  25.,  54., 104.,  63.,
       171.,  61., 203., 900.,  63.,  74., 113.,  59., 310.,  87., 149.,
        54.,  50.,  79.,  88., 315., 153.,  79.,  52., 191., 101.,  50.,
        92.,  72.,  52., 180.,  49., 437.,  65., 113., 405.,  54.,  56.,
        74.,  59.,  55.,  58.,  81.,  83.,  79.,  71.,  62.,  63., 131.,
        91.,  57.,  77.,  68.,  77.,  54., 101.,  47.,  74., 146.,  99.,
        54., 443., 101., 225., 288., 143., 101.,  74., 288., 158., 203.,
        81.,  54.,  76.,  97.,  81.,  59.,  86.,  82., 105., 331.,  58.,
        54.,  56., 214.,  79.,  73., 117.,  50., 334.,  52.,  71.,  54.,
        41., 135., 135.,  63.,  79., 162.,  95.,  54., 108.,  67., 158.,
        50.,  65., 117.,  39., 473., 135.,  51., 171.,  74., 117.,  50.,
        61.,  95.,  83.,  52.,  17.,  57.,  81.])

In [14]:
```python
pip install line_profiler
```

```
Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.
```


In [15]:
```python
%load_ext line_profiler
```

In [16]:
```python
%lprun -f convert_units convert_units(heroes, hts, wts)
```

```
Timer unit: 1e-09 s

Total time: 0.000390303 s
File: /tmp/ipykernel_247/137672715.py
Function: convert_units at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def convert_units(heroes, heights, weights):
     2         1      79472.0  79472.0     20.4      new_hts = [ht * 0.39370  for ht in heights]
     3         1      66851.0  66851.0     17.1      new_wts = [wt * 2.20462  for wt in weights]
     4
     5         1        250.0    250.0      0.1      hero_data = {}
     6
     7       480     103398.0    215.4     26.5      for i,hero in enumerate(heroes):
     8       480     140132.0    291.9     35.9          hero_data[hero] = (new_hts[i], new_wts[i])
     9
    10         1        200.0    200.0      0.1      return hero_data
```

* What percentage of time is spent on the new_hts list comprehension line of code relative to the total amount of time spent in the convert_units() function?
	* About 20% in the run above (see the % Time column: 20.4%); the exact figure varies between runs

### Using %lprun: fix the bottleneck
In the previous exercise, you profiled the convert_units() function and saw that the new_hts list comprehension could be a potential bottleneck. Did you notice that the new_wts list comprehension also accounted for a similar percentage of the runtime? This is an indication that you may want to create the new_hts and new_wts objects using a different technique.

Since the height and weight of each hero is stored in a numpy array, you can use array broadcasting rather than list comprehension to convert the heights and weights. 

In [17]:
```python
def convert_units_broadcast(heroes, heights, weights):

    # Array broadcasting instead of list comprehension
    new_hts = heights * 0.39370
    new_wts = weights * 2.20462

    hero_data = {}

    for i,hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])

    return hero_data
```

In [18]:
```python
%lprun -f convert_units_broadcast convert_units_broadcast(heroes, hts, wts)
```

```
Timer unit: 1e-09 s

Total time: 0.000285247 s
File: /tmp/ipykernel_247/2887860108.py
Function: convert_units_broadcast at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def convert_units_broadcast(heroes, heights, weights):
     2
     3                                               # Array broadcasting instead of list comprehension
     4         1      39351.0  39351.0     13.8      new_hts = heights * 0.39370
     5         1       3790.0   3790.0      1.3      new_wts = weights * 2.20462
     6
     7         1        230.0    230.0      0.1      hero_data = {}
     8
     9       480      81901.0    170.6     28.7      for i,hero in enumerate(heroes):
    10       480     159815.0    332.9     56.0          hero_data[hero] = (new_hts[i], new_wts[i])
    11
```

* By profiling the **convert_units()** function, you were able to see that using list comprehension was not the most efficient solution for creating the new_hts and new_wts objects. 
* You also saw that using array broadcasting in the convert_units_broadcast() function dramatically decreased the percentage of time spent executing these lines of code. 
	* You may have noticed that your function still takes a while to iterate through the for loop. Don't worry; you'll cover how to make this loop more efficient in a later chapter
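
As a preview, one possible way to avoid indexing inside the loop is to pair the sequences with the built-in `zip()`. This is a sketch and not necessarily the approach the course takes later; the function name `convert_units_zip` and the small sample data are assumptions for illustration.

```python
def convert_units_loop(heroes, heights, weights):
    """Original pattern: enumerate plus index lookups inside the loop."""
    new_hts = [ht * 0.39370 for ht in heights]
    new_wts = [wt * 2.20462 for wt in weights]
    hero_data = {}
    for i, hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])
    return hero_data


def convert_units_zip(heroes, heights, weights):
    """Same output, built by walking the three sequences in lockstep."""
    new_hts = [ht * 0.39370 for ht in heights]
    new_wts = [wt * 2.20462 for wt in weights]
    return {hero: (ht, wt) for hero, ht, wt in zip(heroes, new_hts, new_wts)}


# Small sample taken from the start of the dataset in these notes
sample_heroes = ['A-Bomb', 'Abe Sapien', 'Abin Sur']
sample_hts = [203.0, 191.0, 185.0]
sample_wts = [441.0, 65.0, 90.0]

loop_result = convert_units_loop(sample_heroes, sample_hts, sample_wts)
zip_result = convert_units_zip(sample_heroes, sample_hts, sample_wts)
print(loop_result == zip_result)  # True
```

Whether this is actually faster on the full dataset should be confirmed with `%timeit` or `%lprun` rather than assumed.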

<br>

### Code Profiling for Memory Usage
![Screen Shot 2023-03-03 at 3.11.15 PM](Screen%20Shot%202023-03-03%20at%203.11.15%20PM.png)
* Quick and Dirty but limited in only showing memory alloaction for single object memory size
* Package : `memory_profiler`
	* pip install memory_profiler
	* %load_ext memory_profiler
	* %mprun -f convert_units(heroes, hts, wts)
	* Remember that using %mprun requires one additional step compared to using %lprun (i.e., you need to import the function in order to use %mprun on it) 
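
The "quick and dirty" single-object check mentioned above is presumably `sys.getsizeof()` from the standard library (an assumption, since the screenshot isn't reproduced in these notes). A minimal sketch with hypothetical data, showing that it measures one object at a time:

```python
import sys
from array import array

# Hypothetical data: the same 1,000 integers stored two different ways
nums_list = list(range(1000))
nums_array = array('q', range(1000))  # typed array of 8-byte signed integers

# sys.getsizeof() reports the size of a single object in bytes.
# For a list, that is the list's own storage (its pointers),
# not the integer objects those pointers refer to.
list_size = sys.getsizeof(nums_list)
array_size = sys.getsizeof(nums_array)

print(f'list : {list_size} bytes')
print(f'array: {array_size} bytes')
```

Because it inspects only one object, it can't give the line-by-line picture that %mprun provides.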

* Load the memory_profiler package into your IPython session.
* Import calc_bmi_lists from bmi_lists.
* Once you've completed the above steps, use %mprun to profile the calc_bmi_lists() function acting on your superheroes data.

```python
from bmi_lists import calc_bmi_lists
%load_ext memory_profiler
%mprun -f calc_bmi_lists calc_bmi_lists(sample_indices, hts, wts)
```
Filename: /tmp/tmptte7kozn/bmi_lists.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
     1    114.1 MiB    114.1 MiB           1   def calc_bmi_lists(sample_indices, hts, wts):
     2                                         
     3                                             # Gather sample heights and weights as lists
     4    114.1 MiB      0.0 MiB       25003       s_hts = [hts[i] for i in sample_indices]
     5    114.1 MiB      0.0 MiB       25003       s_wts = [wts[i] for i in sample_indices]
     6                                         
     7                                             # Convert heights from cm to m and square with list comprehension
     8    114.1 MiB      0.0 MiB       25003       s_hts_m_sqr = [(ht / 100) ** 2 for ht in s_hts]
     9                                         
    10                                             # Calculate BMIs as a list with list comprehension
    11    114.3 MiB      0.2 MiB       25003       bmis = [s_wts[i] / s_hts_m_sqr[i] for i in range(len(sample_indices))]
    12                                         
    13    114.3 MiB      0.0 MiB           1       return bmis
    
* **How much memory do the list comprehension lines of code consume in the calc_bmi_lists() function**?
* Using a list comprehension approach allocates anywhere from **0.1 MiB to 2 MiB** of memory to calculate your BMIs.
	* If you run %mprun multiple times within your session, you may notice that the Increment column reports 0.0 MiB for all lines of code. This is due to a limitation with the magic command. After running %mprun once, the memory allocation analyzed previously is taken into account for all consecutive runs and %mprun will start from the place the first run left off.


```python
# Use get_publisher_heroes() to gather Star Wars heroes
star_wars_heroes = get_publisher_heroes(heroes, publishers, 'George Lucas')

print(star_wars_heroes)
print(type(star_wars_heroes))

# Use get_publisher_heroes_np() to gather Star Wars heroes
star_wars_heroes_np = get_publisher_heroes_np(heroes, publishers, 'George Lucas')

print(star_wars_heroes_np)
print(type(star_wars_heroes_np))

['Darth Vader', 'Han Solo', 'Luke Skywalker', 'Yoda']
<class 'list'>
['Darth Vader' 'Han Solo' 'Luke Skywalker' 'Yoda']
<class 'numpy.ndarray'>
```

```python
%load_ext line_profiler
%lprun -f get_publisher_heroes get_publisher_heroes(heroes, publishers, 'George Lucas')
```
Timer unit: 1e-06 s

Total time: 0.000426 s
File: <ipython-input-1-5a6bc05c1c55>
Function: get_publisher_heroes at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           def get_publisher_heroes(heroes, publishers, desired_publisher):
     2                                           
     3         1          2.0      2.0      0.5      desired_heroes = []
     4                                           
     5       481        207.0      0.4     48.6      for i,pub in enumerate(publishers):
     6       480        194.0      0.4     45.5          if pub == desired_publisher:
     7         4         22.0      5.5      5.2              desired_heroes.append(heroes[i])
     8                                           
     9         1          1.0      1.0      0.2      return desired_heroes

```python
In [1]:
%load_ext line_profiler
In [2]:
%lprun -f get_publisher_heroes_np get_publisher_heroes_np(heroes, publishers, 'George Lucas')
```
Timer unit: 1e-06 s

Total time: 0.00026 s
File: <ipython-input-1-5a6bc05c1c55>
Function: get_publisher_heroes_np at line 12

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    12                                           def get_publisher_heroes_np(heroes, publishers, desired_publisher):
    13                                           
    14         1        178.0    178.0     68.5      heroes_np = np.array(heroes)
    15         1         51.0     51.0     19.6      pubs_np = np.array(publishers)
    16                                           
    17         1         31.0     31.0     11.9      desired_heroes = heroes_np[pubs_np == desired_publisher]
    18                                           
    19         1          0.0      0.0      0.0      return desired_heroes
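
The two functions profiled above can be read straight out of the `Line Contents` columns. A runnable sketch of both, using a small hypothetical dataset in place of the course's 480 heroes:

```python
import numpy as np

def get_publisher_heroes(heroes, publishers, desired_publisher):
    """Loop version, as shown in the first profiler output."""
    desired_heroes = []
    for i, pub in enumerate(publishers):
        if pub == desired_publisher:
            desired_heroes.append(heroes[i])
    return desired_heroes

def get_publisher_heroes_np(heroes, publishers, desired_publisher):
    """NumPy version: boolean-mask indexing replaces the loop."""
    heroes_np = np.array(heroes)
    pubs_np = np.array(publishers)
    desired_heroes = heroes_np[pubs_np == desired_publisher]
    return desired_heroes

# Hypothetical mini dataset for illustration only
heroes = ['Darth Vader', 'Batman', 'Yoda', 'Spider-Man']
publishers = ['George Lucas', 'DC Comics', 'George Lucas', 'Marvel Comics']

loop_result = get_publisher_heroes(heroes, publishers, 'George Lucas')
np_result = get_publisher_heroes_np(heroes, publishers, 'George Lucas')

print(loop_result, type(loop_result))
print(np_result, type(np_result))
```

In the profile above, most of the NumPy version's time goes to the two `np.array()` conversions, so the gain should be larger when the data is already stored as arrays.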

<br>

## Gaining efficiencies
* This chapter covers more complex efficiency tips and tricks. You'll learn a few useful built-in modules for writing efficient code and practice using set theory. You'll then learn about looping patterns in Python and how to make them more efficient.

![Screen Shot 2023-03-03 at 3.49.01 PM](Screen%20Shot%202023-03-03%20at%203.49.01%20PM.png)
* Counting w/Loop
	* Our Pokémon dataset describes 720 characters. 
	* We'd like to create a dictionary where each key is a Pokémon type, and each value is the count of characters that belong to that type. 
	* Using a standard dictionary approach, we have to instantiate an empty output dictionary. 
	* Then, we iterate over the poke_types list and check whether or not each poke_type exists within the type_counts dictionary. 
	* If the poke_type is not in the dictionary, we create a new key and initialize its count value as one. 
	* If the poke_type is already in the dictionary, we update the count by one.

* collections.Counter()
	* Using **Counter** is a more efficient approach
	* Counter returns a Counter dictionary of key-value pairs. When printed, it's ordered by highest to lowest counts. 
	* If we compared runtimes, we'd see that using Counter takes half the time of the standard dictionary approach
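
The two counting approaches described above can be sketched side by side. The `poke_types` list here is a small hypothetical sample, not the 720-row course dataset:

```python
from collections import Counter

# Hypothetical sample of Pokémon types
poke_types = ['Grass', 'Psychic', 'Dark', 'Bug', 'Grass', 'Bug', 'Grass']

# Standard dictionary approach: check for the key, then create or update it
type_counts = {}
for poke_type in poke_types:
    if poke_type not in type_counts:
        type_counts[poke_type] = 1
    else:
        type_counts[poke_type] += 1

# Counter approach: one call does the same bookkeeping
type_counts_counter = Counter(poke_types)

print(type_counts)
print(type_counts_counter)
```

Both produce the same key-value pairs; the Counter version is shorter and prints its entries from highest to lowest count.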

#### Combining Pokémon names and types
Three lists have been loaded into your session from a dataset that contains 720 Pokémon:

* The names list contains the names of each Pokémon.
* The primary_types list contains the corresponding primary type of each Pokémon.
* The secondary_types list contains the corresponding secondary type of each Pokémon (nan if the Pokémon has only one type).

```python
# Combine the names list and the primary_types list into a new list object (called names_type1).
names_type1 = [*zip(names, primary_types)]
print(*names_type1[:5], sep='\n')
('Abomasnow', 'Grass')
('Abra', 'Psychic')
('Absol', 'Dark')
('Accelgor', 'Bug')
('Aerodactyl', 'Rock')
```

```python
# Combine all three lists together
names_types = [*zip(names, primary_types, secondary_types)]

print(*names_types[:5], sep='\n')
('Abomasnow', 'Grass', 'Ice')
('Abra', 'Psychic', nan)
('Absol', 'Dark', nan)
('Accelgor', 'Bug', nan)
('Aerodactyl', 'Rock', 'Flying')
```

```python
# Use zip() to combine the first five items from names and the first three items from primary_types
differing_lengths = [*zip(names[:5], primary_types[:3])]

print(*differing_lengths, sep='\n')
    ('Abomasnow', 'Grass')
    ('Abra', 'Psychic')
    ('Absol', 'Dark')
```
* Did you notice that if you provide zip() with objects of differing lengths, it only combines items until the shortest object is exhausted?
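
If the truncation is not what you want, `itertools.zip_longest` from the standard library pads the shorter input instead. This is not part of the course exercise; it is a sketch that assumes padding with `None` is acceptable:

```python
from itertools import zip_longest

names = ['Abomasnow', 'Abra', 'Absol', 'Accelgor', 'Aerodactyl']
primary_types = ['Grass', 'Psychic', 'Dark']

# zip() stops at the shortest input
truncated = [*zip(names, primary_types)]

# zip_longest() keeps going and fills the gaps with fillvalue
padded = [*zip_longest(names, primary_types, fillvalue=None)]

print(len(truncated), len(padded))
print(padded[-1])
```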

#### Counting Pokémon from a sample
A sample of 500 Pokémon has been generated, and three lists from this sample have been loaded into your session:

* The names list contains the names of each Pokémon in the sample.
* The primary_types list contains the corresponding primary type of each Pokémon in the sample.
* The generations list contains the corresponding generation of each Pokémon in the sample.

```python
# Collect the count of primary types
type_count = Counter(primary_types)
print(type_count, '\n')

# Collect the count of generations
gen_count = Counter(generations)
print(gen_count, '\n')

# Use list comprehension to get each Pokémon's starting letter
starting_letters = [name[0] for name in names]

# Collect the count of Pokémon for each starting_letter
starting_letters_count = Counter(starting_letters)
print(starting_letters_count)

Counter({'Water': 66, 'Normal': 64, 'Bug': 51, 'Grass': 47, 'Psychic': 31, 'Rock': 29, 'Fire': 27, 'Electric': 25, 'Ground': 23, 'Fighting': 23, 'Poison': 22, 'Steel': 18, 'Ice': 16, 'Fairy': 16, 'Dragon': 16, 'Ghost': 13, 'Dark': 13}) 

Counter({5: 122, 3: 103, 1: 99, 4: 78, 2: 51, 6: 47}) 

Counter({'S': 83, 'C': 46, 'D': 33, 'M': 32, 'L': 29, 'G': 29, 'B': 28, 'P': 23, 'A': 22, 'K': 20, 'E': 19, 'W': 19, 'T': 19, 'F': 18, 'H': 15, 'R': 14, 'N': 13, 'V': 10, 'Z': 8, 'J': 7, 'I': 4, 'O': 3, 'Y': 3, 'U': 2, 'X': 1})
```

#### Combinations of Pokémon
Ash, a Pokémon trainer, encounters a group of five Pokémon. These Pokémon have been loaded into a list within your session (called pokemon) and printed into the console for your convenience.

Ash would like to try to catch some of these Pokémon, but his Pokédex can only store two Pokémon at a time. Let's use combinations from the itertools module to see what the possible pairs of Pokémon are that Ash could catch.

```python
# Import combinations from itertools
from itertools import combinations

# Create a combination object with pairs of Pokémon
combos_obj = combinations(pokemon, 2)
print(type(combos_obj), '\n')

# Convert combos_obj to a list by unpacking
combos_2 = [*combos_obj]
print(combos_2, '\n')

# Collect all possible combinations of 4 Pokémon directly into a list
combos_4 = [*combinations(pokemon, 4)]
print(combos_4)

<script.py> output:
    <class 'itertools.combinations'> 
    
    [('Geodude', 'Cubone'), ('Geodude', 'Lickitung'), ('Geodude', 'Persian'), ('Geodude', 'Diglett'), ('Cubone', 'Lickitung'), ('Cubone', 'Persian'), ('Cubone', 'Diglett'), ('Lickitung', 'Persian'), ('Lickitung', 'Diglett'), ('Persian', 'Diglett')] 
    
    [('Geodude', 'Cubone', 'Lickitung', 'Persian'), ('Geodude', 'Cubone', 'Lickitung', 'Diglett'), ('Geodude', 'Cubone', 'Persian', 'Diglett'), ('Geodude', 'Lickitung', 'Persian', 'Diglett'), ('Cubone', 'Lickitung', 'Persian', 'Diglett')]

```
* combinations() allows you to specify any size of combinations by passing an integer as the second argument
* Ash has 10 possible combinations when his Pokédex can store only two Pokémon, and 5 possible combinations when it can store four
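
The counts of 10 and 5 can be checked against the binomial coefficient. A short sketch using `math.comb` (Python 3.8+), which is not used in the course exercise:

```python
from itertools import combinations
from math import comb

pokemon = ['Geodude', 'Cubone', 'Lickitung', 'Persian', 'Diglett']

# Number of tuples actually generated by combinations()
n_pairs = len([*combinations(pokemon, 2)])
n_quads = len([*combinations(pokemon, 4)])

# "5 choose 2" and "5 choose 4" computed directly
print(n_pairs, comb(len(pokemon), 2))
print(n_quads, comb(len(pokemon), 4))
```

`comb()` gives the count without building the tuples, which matters once the input list grows.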

### Set Theory
* The main takeaway is that when we'd like to compare objects multiple times and in different ways, we should consider storing our data in sets to leverage these elegant and efficient methods.

![Screen Shot 2023-03-03 at 4.16.45 PM](Screen%20Shot%202023-03-03%20at%204.16.45%20PM.png)


#### Comparing Pokédexes
Two Pokémon trainers, Ash and Misty, would like to compare their individual collections of Pokémon. Let's see what Pokémon they have in common and what Pokémon Ash has that Misty does not.

In [19]:
ash_pokedex = ['Pikachu', 'Bulbasaur', 'Koffing', 'Spearow', 'Vulpix', 'Wigglytuff', 'Zubat', 'Rattata', 'Psyduck', 'Squirtle']
misty_pokedex = ['Krabby', 'Horsea', 'Slowbro', 'Tentacool', 'Vaporeon', 'Magikarp', 'Poliwag', 'Starmie', 'Psyduck', 'Squirtle']
brock_pokedex = ['Onix','Geodude','Zubat','Golem','Vulpix','Tauros','Kabutops','Omastar','Machop','Dugtrio']

In [20]:
# https://www.w3schools.com/python/python_ref_set.asp (Set Methods)

# Convert both lists to sets
ash_set = set(ash_pokedex)
misty_set = set(misty_pokedex)

# Find the Pokémon that exist in both sets
both = ash_set.intersection(misty_set)
print(both)

# Find the Pokémon that Ash has and Misty does not have
ash_only = ash_set.difference(misty_set)
print(ash_only)

# Find the Pokémon that are in only one set (not both)
unique_to_set = ash_set.symmetric_difference(misty_set)
print(unique_to_set)

{'Squirtle', 'Psyduck'}
{'Bulbasaur', 'Pikachu', 'Wigglytuff', 'Rattata', 'Vulpix', 'Zubat', 'Koffing', 'Spearow'}
{'Bulbasaur', 'Vaporeon', 'Pikachu', 'Magikarp', 'Wigglytuff', 'Rattata', 'Vulpix', 'Koffing', 'Starmie', 'Poliwag', 'Krabby', 'Zubat', 'Horsea', 'Tentacool', 'Spearow', 'Slowbro'}


In [21]:
# Convert Brock's Pokédex to a set
brock_pokedex_set = set(brock_pokedex)
print(brock_pokedex_set, '\n')

# Check if 'Psyduck' is in Ash's Pokédex list (ash_pokedex) and if 'Psyduck' is in Brock's Pokédex set (brock_pokedex_set).
print('Psyduck' in ash_pokedex)
print('Psyduck' in brock_pokedex_set, '\n')

# Check if Machop is in Ash's list and Brock's set
print('Machop' in ash_pokedex)
print('Machop' in brock_pokedex_set, '\n')

{'Dugtrio', 'Kabutops', 'Vulpix', 'Tauros', 'Geodude', 'Onix', 'Machop', 'Zubat', 'Omastar', 'Golem'} 

True
False 

False
True 



In [22]:
%timeit 'Machop' in ash_pokedex

162 ns ± 0.745 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [23]:
%timeit 'Machop' in brock_pokedex_set

36.5 ns ± 0.0855 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


* Membership testing is much faster when you use sets. Did you notice that using a set for membership testing is faster than using a list regardless of whether the item you are checking is in the set? Checking for 'Psyduck' (which was not in Brock's set) is still faster than checking for 'Psyduck' in Ash's list!
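
Outside IPython, the same comparison can be run with the standard-library `timeit` module. A sketch with a hypothetical 10,000-item collection, made larger than the Pokédex lists so the gap is easier to see:

```python
import timeit

# Hypothetical collection stored both as a list and as a set
items_list = [f'pokemon_{i}' for i in range(10_000)]
items_set = set(items_list)

# 'Machop' is absent, so the list has to be scanned from start to end
missing = 'Machop'

list_time = timeit.timeit(lambda: missing in items_list, number=200)
set_time = timeit.timeit(lambda: missing in items_set, number=200)

print(f'list: {list_time:.6f} s for 200 lookups')
print(f'set : {set_time:.6f} s for 200 lookups')
```

A list lookup compares against items one by one, while a set lookup hashes the value, so the difference widens as the collection grows.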

#### Gathering unique Pokémon
A sample of 500 Pokémon has been created with replacement (meaning a Pokémon could be selected more than once and duplicates exist within the sample).

Three lists have been loaded into your session:

* The names list contains the names of each Pokémon in the sample.
* The primary_types list contains the corresponding primary type of each Pokémon in the sample.
* The generations list contains the corresponding generation of each Pokémon in the sample.

```python
# Use find_unique_items() to collect unique Pokémon names
uniq_names_func = find_unique_items(names)
print(len(uniq_names_func))

# Convert the names list to a set to collect unique Pokémon names
uniq_names_set = set(names)
print(len(uniq_names_set))

# Check that both unique collections are equivalent
print(sorted(uniq_names_func) == sorted(uniq_names_set))

# Use the best approach to collect unique primary types and generations
uniq_types = set(primary_types) 
uniq_gens = set(generations)
print(uniq_types, uniq_gens, sep='\n') 
<script.py> output:
    368
    368
    True
    {'Grass', 'Dark', 'Dragon', 'Bug', 'Electric', 'Poison', 'Water', 'Normal', 'Ground', 'Rock', 'Steel', 'Ice', 'Fire', 'Psychic', 'Fighting', 'Ghost', 'Fairy'}
    {1, 2, 3, 4, 5, 6}
```


* Using a set data type to collect unique values is much faster than using a for loop (like in the find_unique_items() function). Since a set is defined as a collection of distinct elements, it is an efficient way to collect unique items from an existing object.
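
The body of `find_unique_items()` isn't shown in these notes. A hypothetical loop-based version, written only to illustrate what the set replaces, might look like this:

```python
def find_unique_items(data):
    """Hypothetical stand-in: the course's actual implementation is not shown."""
    uniq_items = []
    for item in data:
        # Each membership test scans the growing list of unique items
        if item not in uniq_items:
            uniq_items.append(item)
    return uniq_items

# Hypothetical sample with duplicates
names = ['Abra', 'Absol', 'Abra', 'Aipom', 'Absol']

uniq_loop = find_unique_items(names)
uniq_set = set(names)

print(sorted(uniq_loop) == sorted(uniq_set))
```

The loop version keeps first-seen order; the set version does not, which is why the comparison sorts both sides.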

### Eliminating Loops

#### Gathering Pokémon without a loop
A list containing 720 Pokémon has been loaded into your session as poke_names. Another list containing each Pokémon's corresponding generation has been loaded as poke_gens.

A for loop has been created to filter the Pokémon that belong to generation one or two, and collect the number of letters in each Pokémon's name:

```python
gen1_gen2_name_lengths_loop = []

for name,gen in zip(poke_names, poke_gens):
    if gen < 3:
        name_length = len(name)
        poke_tuple = (name, name_length)
        gen1_gen2_name_lengths_loop.append(poke_tuple)
```

```python
# quick idea of the two list objects in use
# poke_names[:10] = ['Abomasnow', 'Abra', 'Absol', 'Accelgor', 'Aerodactyl', 'Aggron', 'Aipom', 'Alakazam', 'Alomomola', 'Altaria']
# poke_gens[:10] = [4, 1, 3, 5, 1, 3, 2, 1, 5, 3]

# Collect Pokémon that belong to generation 1 or generation 2
gen1_gen2_pokemon = [name for name, gen in zip(poke_names, poke_gens) if gen < 3]

# Create a map object that stores the name lengths
name_lengths_map = map(len, gen1_gen2_pokemon)

# Combine gen1_gen2_pokemon and name_lengths_map into a list
gen1_gen2_name_lengths = [*zip(gen1_gen2_pokemon, name_lengths_map)]

print(gen1_gen2_name_lengths_loop[:5])
print(gen1_gen2_name_lengths[:5])

[('Abra', 4), ('Aerodactyl', 10), ('Aipom', 5), ('Alakazam', 8), ('Ampharos', 8)]
[('Abra', 4), ('Aerodactyl', 10), ('Aipom', 5), ('Alakazam', 8), ('Ampharos', 8)]

# To eliminate the intermediate objects entirely, use a single list comprehension
[(name, len(name)) for name,gen in zip(poke_names, poke_gens) if gen < 3]
```

#### Pokémon totals and averages without a loop
A list of 720 Pokémon has been loaded into your session called names. Each Pokémon's corresponding statistics has been loaded as a NumPy array called stats. Each row of stats corresponds to a Pokémon in names and each column represents an individual Pokémon stat (HP, Attack, Defense, Special Attack, Special Defense, and Speed respectively.)

You want to gather each Pokémon's total stat value (i.e., the sum of each row in stats) and each Pokémon's average stat value (i.e., the mean of each row in stats) so that you can find the strongest Pokémon.

```python
# Below for loop was written to collect these values
poke_list = []

for pokemon,row in zip(names, stats):
    total_stats = np.sum(row)
    avg_stats = np.mean(row)
    poke_list.append((pokemon, total_stats, avg_stats))
```
* output for poke_list subset
	* poke_list[:2] : [('Abomasnow', 494, 82.33333333333333), ('Abra', 310, 51.666666666666664)] 
	* we see the matching 494 for the row sum total below 

* Replace the above for loop using NumPy:
```python
# Create a total stats array (total_stats_np) using the .sum() method .. remember the axis!
# stats is a NumPy array with a shape of (720, 6) (720 rows and 6 columns); we want to sum across each row
total_stats_np = stats.sum(axis=1)

# Create an average stats array (each row average)
avg_stats_np = stats.mean(axis=1)

# Combine names, total_stats_np, and avg_stats_np into a list
poke_list_np = [*zip(names, total_stats_np, avg_stats_np)]

print(poke_list_np == poke_list, '\n')
print(poke_list_np[:3])
print(poke_list[:3], '\n')
top_3 = sorted(poke_list_np, key=lambda x: x[1], reverse=True)[:3]
print('3 strongest Pokémon:\n{}'.format(top_3))

# Printed output of the statements above. Note the sorted() key: x[1] is each tuple's total stat value, so the tuples are sorted by total in descending order
True 

[('Abomasnow', 494, 82.33333333333333), ('Abra', 310, 51.666666666666664), ('Absol', 465, 77.5)]
[('Abomasnow', 494, 82.33333333333333), ('Abra', 310, 51.666666666666664), ('Absol', 465, 77.5)] 

3 strongest Pokémon:
[('GroudonPrimal Groudon', 770, 128.33333333333334), ('KyogrePrimal Kyogre', 770, 128.33333333333334), ('Arceus', 720, 120.0)]
```


In [24]:
stats_min = np.array([[ 90,  92,  75,  92,  85,  60],
       [ 25,  20,  15, 105,  55,  90]])

stats_min.sum(), stats_min.sum(axis=1), stats_min.sum(axis=0) 
# Quick refresher on the NumPy sum options
# .sum() totals the entire array (regardless of shape), .sum(axis=1) sums across each row, .sum(axis=0) sums down each column

(804, array([494, 310]), array([115, 112,  90, 197, 140, 150]))

### Writing Better Loops
* One-Time Calculations
* Holistic Approach
* Both approaches apply when a loop converts data types or relies on a single summary value
	* One-time calculation: compute the summary value once, above the loop, instead of recomputing it on every iteration
	* Holistic conversion: collect the loop's results first, then convert them all at once afterwards (for example with `map`) rather than converting inside the loop


#### One-time calculation loop
A list of integers that represents each Pokémon's generation has been loaded into your session called generations. You'd like to gather the counts of each generation and determine what percentage each generation accounts for out of the total count of integers.

The below loop was written to accomplish this task:
```python
for gen,count in gen_counts.items():
    total_count = len(generations)
    gen_percent = round(count / total_count * 100, 2)
    print(
      'generation {}: count = {:3} percentage = {}'
      .format(gen, count, gen_percent)
    )
```

* Let's make this loop more efficient by moving a one-time calculation outside the loop

```python
# Import Counter
from collections import Counter

# Collect the count of each generation
gen_counts = Counter(generations)

# Improve for loop by moving one calculation above the loop
total_count = len(generations)

for gen,count in gen_counts.items():
    gen_percent = round(count / total_count * 100, 2)
    print('generation {}: count = {:3} percentage = {}'
          .format(gen, count, gen_percent))

<script.py> output:
    generation 4: count = 112 percentage = 15.56
    generation 1: count = 151 percentage = 20.97
    generation 3: count = 136 percentage = 18.89
    generation 5: count = 154 percentage = 21.39
    generation 2: count =  99 percentage = 13.75
    generation 6: count =  68 percentage = 9.44

print(gen_counts)
Counter({5: 154, 1: 151, 3: 136, 4: 112, 2: 99, 6: 68})

sum(gen_counts.values())
Out[4]:
720
# gen_counts (a Counter) supplies the per-generation counts iterated by the loop above; like any other dict, its values can also be summed to recover the total
```

#### Holistic conversion loop
A list of all possible Pokémon types has been loaded into your session as pokemon_types. It's been printed in the console for convenience.
* Possible Pokémon types: ['Bug', 'Dark', 'Dragon', 'Electric', 'Fairy', 'Fighting', 'Fire', 'Flying', 'Ghost', 'Grass', 'Ground', 'Ice', 'Normal', 'Poison', 'Psychic', 'Rock', 'Steel', 'Water']

You'd like to gather all the possible pairs of Pokémon types. You want to store each of these pairs in an individual list with an enumerated index as the first element of each list. This allows you to see the total number of possible pairs and provides an indexed label for each pair.

The below loop was written to accomplish this task:
```python
enumerated_pairs = []

for i,pair in enumerate(possible_pairs, 1):
    enumerated_pair_tuple = (i,) + pair
    enumerated_pair_list = list(enumerated_pair_tuple)
    enumerated_pairs.append(enumerated_pair_list)
```

```python
from itertools import combinations

# Collect all possible pairs using combinations() : Pokémon types (each pair has 2 Pokémon types).
possible_pairs = [*combinations(pokemon_types, 2)]
# Create an empty list called enumerated_tuples
enumerated_tuples = []
# Append each enumerate_pair_tuple to the empty list above 
for i, pair in enumerate(possible_pairs, 1):
    enumerated_pair_tuple = (i,) + pair
    enumerated_tuples.append(enumerated_pair_tuple)

# Convert all tuples in enumerated_tuples to a list
enumerated_pairs = [*map(list, enumerated_tuples)]
print(enumerated_pairs[:5])
[[1, 'Bug', 'Dark'], [2, 'Bug', 'Dragon'], [3, 'Bug', 'Electric'], [4, 'Bug', 'Fairy'], [5, 'Bug', 'Fighting']]
```
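
The remaining loop can also be folded into one list comprehension. This goes a step beyond the course exercise, so treat it as an optional sketch rather than the intended solution:

```python
from itertools import combinations

pokemon_types = ['Bug', 'Dark', 'Dragon', 'Electric', 'Fairy', 'Fighting',
                 'Fire', 'Flying', 'Ghost', 'Grass', 'Ground', 'Ice',
                 'Normal', 'Poison', 'Psychic', 'Rock', 'Steel', 'Water']

# Build each [index, type_a, type_b] list directly, numbering from 1
enumerated_pairs = [
    [i, *pair]
    for i, pair in enumerate(combinations(pokemon_types, 2), 1)
]

print(len(enumerated_pairs))
print(enumerated_pairs[:2])
```

Unpacking `pair` inside a list literal builds the list directly, so no intermediate tuple or separate conversion step is needed.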

#### Bringing it all together: Pokémon z-scores
A list of 720 Pokémon has been loaded into your session as names. Each Pokémon's corresponding Health Points is stored in a NumPy array called hps. You want to analyze the Health Points using the z-score to see how many standard deviations each Pokémon's HP is from the mean of all HPs.

The below code was written to calculate the HP z-score for each Pokémon and gather the Pokémon with the highest HPs based on their z-scores:

```python
poke_zscores = []
for name,hp in zip(names, hps):
    hp_avg = hps.mean()
    hp_std = hps.std()
    z_score = (hp - hp_avg)/hp_std
    poke_zscores.append((name, hp, z_score))
    
highest_hp_pokemon = []

for name,hp,zscore in poke_zscores:
    if zscore > 2:
        highest_hp_pokemon.append((name, hp, zscore))
```

* Adjusted
```python
# Calculate the total HP avg and total HP standard deviation
hp_avg = hps.mean()
hp_std = hps.std()

# Use NumPy to eliminate the previous for loop
z_scores = (hps - hp_avg)/hp_std

# Combine names, hps, and z_scores
poke_zscores2 = [*zip(names, hps, z_scores)]
print(*poke_zscores2[:3], sep='\n')

# Use list comprehension with the same logic as the highest_hp_pokemon code block
highest_hp_pokemon2 = [(name, hp, z_score) for name,hp,z_score in poke_zscores2 if z_score > 2]
print(*highest_hp_pokemon2[:3], sep='\n')

# first print out
('Abomasnow', 80.0, 0.46797638117739043)
('Abra', 60.0, -0.3271693284337512)
('Absol', 131.0, 2.4955979406858013)

# second print out
('Absol', 131.0, 2.4955979406858013)
('Bonsly', 127.0, 2.3365687987635733)
('Caterpie', 122.0, 2.137782371360788)
```
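
The final filter can also be expressed as a NumPy boolean mask. The sketch below uses hypothetical data and assumes `names` is converted to an array first (in the exercise it is a plain list):

```python
import numpy as np

# Hypothetical data: six Pokémon, one with an unusually high HP
names = np.array(['A', 'B', 'C', 'D', 'E', 'F'])
hps = np.array([50.0, 55.0, 60.0, 45.0, 52.0, 200.0])

# Vectorized z-scores, same formula as above
z_scores = (hps - hps.mean()) / hps.std()

# Boolean mask selects every position where the z-score exceeds 2
mask = z_scores > 2
highest_hp = [*zip(names[mask], hps[mask], z_scores[mask])]

print(highest_hp)
```

The same mask indexes all three arrays, so the name, HP and z-score stay aligned.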



## Basic Pandas Optimizations
* Iterating with .iterrows()
	* In the video, we discussed that .iterrows() returns each DataFrame row as a tuple of (index, pandas Series) pairs. But, what does this mean? Let's explore with a few coding exercises.

In [2]:
# pit_df = season stats for the Pittsburgh Pirates (MLB)
import pandas as pd
pit_df = pd.DataFrame({
    'Team': ['PIT' for x in range(5)],
    'League': ['NL' for x in range(5)],
    'Year': [2012, 2011, 2010, 2009, 2008],
    'RS' : [651, 610, 587, 636, 735],
    'RA' : [674, 712, 866, 768, 884],
    'W': [79, 72, 57, 62, 67],
    'G': [162, 162, 162, 161, 162],
    'Playoffs': [0 for x in range(5)]
})
display(pit_df)

  Team League  Year   RS   RA   W    G  Playoffs
0  PIT     NL  2012  651  674  79  162         0
1  PIT     NL  2011  610  712  72  162         0
2  PIT     NL  2010  587  866  57  162         0
3  PIT     NL  2009  636  768  62  161         0
4  PIT     NL  2008  735  884  67  162         0


In [3]:
# loop over pit_df and print each row
for i, row in pit_df.iterrows():
    print(row, '\n')

Team         PIT
League        NL
Year        2012
RS           651
RA           674
W             79
G            162
Playoffs       0
Name: 0, dtype: object 

Team         PIT
League        NL
Year        2011
RS           610
RA           712
W             72
G            162
Playoffs       0
Name: 1, dtype: object 

Team         PIT
League        NL
Year        2010
RS           587
RA           866
W             57
G            162
Playoffs       0
Name: 2, dtype: object 

Team         PIT
League        NL
Year        2009
RS           636
RA           768
W             62
G            161
Playoffs       0
Name: 3, dtype: object 

Team         PIT
League        NL
Year        2008
RS           735
RA           884
W             67
G            162
Playoffs       0
Name: 4, dtype: object 



In [4]:
# Run differentials with .iterrows()
# Create an empty list to store run differentials
run_diffs = []

# Write a for loop and collect runs allowed and runs scored for each row
for i,row in pit_df.iterrows():
    runs_scored = row['RS']
    runs_allowed = row['RA']
    
    # get differential
    run_diff = runs_scored - runs_allowed
    
    # Append each run differential to the output list
    run_diffs.append(run_diff)

pit_df['RD'] = run_diffs
print(pit_df)

  Team League  Year   RS   RA   W    G  Playoffs   RD
0  PIT     NL  2012  651  674  79  162         0  -23
1  PIT     NL  2011  610  712  72  162         0 -102
2  PIT     NL  2010  587  866  57  162         0 -279
3  PIT     NL  2009  636  768  62  161         0 -132
4  PIT     NL  2008  735  884  67  162         0 -149
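
To make the "(index, Series)" description concrete, the sketch below pulls a single pair out of `.iterrows()`. It also computes the same run differentials with plain column subtraction. That vectorized alternative is not covered at this point in the notes and is included only for comparison:

```python
import pandas as pd

# Subset of the Pirates data shown above
pit_df = pd.DataFrame({
    'Year': [2012, 2011, 2010, 2009, 2008],
    'RS': [651, 610, 587, 636, 735],
    'RA': [674, 712, 866, 768, 884],
})

# .iterrows() yields (index, Series) pairs; grab just the first one
first_index, first_row = next(pit_df.iterrows())
print(first_index, type(first_row))

# Column-wise subtraction produces every run differential in one step
pit_df['RD'] = pit_df['RS'] - pit_df['RA']
print(pit_df)
```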


### Iterating with .itertuples()
Remember, .itertuples() returns each DataFrame row as a special data type called a namedtuple. You can look up an attribute within a namedtuple with a special syntax. Let's practice working with namedtuples.
* .itertuples() generally runs far faster than .iterrows()
* https://stackoverflow.com/questions/19112398/getting-list-of-lists-into-pandas-dataframe
	* A list of lists can also be passed straight to pd.DataFrame()
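
The "special syntax" for namedtuples is attribute access with a dot. A standard-library sketch, using a hypothetical `Row` type as a stand-in for what `.itertuples()` yields:

```python
from collections import namedtuple

# Hypothetical stand-in for one row produced by .itertuples()
Row = namedtuple('Row', ['Index', 'Team', 'Year', 'W'])
row = Row(Index=0, Team='TEX', Year=2012, W=93)

# Fields are read by attribute name...
print(row.Year)

# ...or by position, like any tuple
print(row[2])

# Square-bracket lookup by name (row['Year']) is not supported,
# because a namedtuple is still a tuple and only accepts integer indices
```

This is why the loops below use `row.Year` and `row.W`, whereas the `.iterrows()` loops use `row['RS']`.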


```python
# Build a DataFrame from a list of row lists (rangers_df itself is defined in the next cell)
rangers_df_mock = pd.DataFrame(rangers_df.values.tolist())
print(rangers_df_mock.head())
     0   1     2    3    4   5    6  7
0  TEX  AL  2012  808  707  93  162  1
1  TEX  AL  2011  855  677  96  162  1
2  TEX  AL  2010  787  687  90  162  1
3  TEX  AL  2009  784  740  87  162  0
4  TEX  AL  2008  901  967  79  162  0
```

In [7]:
rangers_df = pd.DataFrame({
    'Team': ['TEX' for x in range(37)],
    'League': ['AL' for x in range(37)],
    'Year': [2012,2011,2010,2009,2008,2007,2006,2005,2004,2003,2002,2001,2000,1999,1998,1997,1996,
              1993,1992,1991,1990,1989,1988,1987,1986,1985,1984,1983,1982,1980,1979,1978,1977,1976,1975,1974,1973],
    'RS' :  [808,855,787,784,901,816,835,865,860,826,843,890,848,945,940,807,928,835,682,829,676,
              695,637,823,771,617,656,639,590,756,750,692,767,616,714,690,619],
    'RA' : [707, 677, 687, 740, 967, 844, 784, 858, 794, 969, 882, 968, 974, 859, 871, 823, 799, 751, 753, 814, 696, 714, 735, 849,               743, 785, 714, 609, 749, 752, 698, 632, 657, 652, 733, 698, 844],
    'W' : [93, 96, 90, 87, 79, 75, 80, 79, 89, 71, 72, 73, 71, 95, 88, 77, 90, 86, 77, 85, 83, 83, 70, 75, 87, 62, 69, 77, 64, 76,                 83, 87, 94, 76, 79, 83, 57],
    'G' : [162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 162, 163, 162, 162, 162, 162, 162, 161, 162,                 162, 161, 161, 163, 162, 163, 162, 162, 162, 162, 162, 161, 162],
    'Playoffs': [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
})
display(rangers_df.head())
display(rangers_df.describe())

  Team League  Year   RS   RA   W    G  Playoffs
0  TEX     AL  2012  808  707  93  162         1
1  TEX     AL  2011  855  677  96  162         1
2  TEX     AL  2010  787  687  90  162         1
3  TEX     AL  2009  784  740  87  162         0
4  TEX     AL  2008  901  967  79  162         0


              Year          RS          RA          W           G  Playoffs
count         37.0        37.0        37.0       37.0        37.0      37.0
mean   1992.702703  772.756757  777.864865  79.945946  161.972973  0.162162
std      12.004316  100.474542   96.902919   9.356952    0.440106  0.373684
min         1973.0       590.0       609.0       57.0       161.0       0.0
25%         1983.0       690.0       707.0       75.0       162.0       0.0
50%         1992.0       787.0       752.0       79.0       162.0       0.0
75%         2003.0       843.0       844.0       87.0       162.0       0.0
max         2012.0       945.0       974.0       96.0       163.0       1.0


In [9]:
for row in rangers_df.itertuples():
    print(row)
    break

Pandas(Index=0, Team='TEX', League='AL', Year=2012, RS=808, RA=707, W=93, G=162, Playoffs=1)


In [10]:
# Loop over the DataFrame and print each row's Index, Year and Wins (W)
for row in rangers_df.itertuples():
  i = row.Index
  year = row.Year
  wins = row.W 
  print(i, year, wins)
  break

0 2012 93


In [12]:
# Print only the rows where the Rangers made the playoffs
for row in rangers_df.itertuples():
    i = row.Index 
    year = row.Year
    wins = row.W 
    # Check if the rangers made Playoffs and print the index, year, and wins if so
    if row.Playoffs == 1:
        print(i, year, wins)

0 2012 93
1 2011 96
2 2010 90
13 1999 95
14 1998 88
16 1996 90


```python
run_diffs = []

# Loop over the DataFrame and calculate each row's run differential
for row in yankees_df.itertuples():
    
    runs_scored = row.RS
    runs_allowed = row.RA

    run_diff = calc_run_diff(runs_scored, runs_allowed)
    
    run_diffs.append(run_diff)

# Append new column
yankees_df['RD'] = run_diffs
print(yankees_df[['Year', 'RD']].sort_values('RD', ascending=False))

    Year   RD
14  1998  309
1   2011  210
15  1997  203
10  2002  200
5   2007  191
32  1977  180
25  1985  179
13  1999  169
45  1963  167
2   2010  166
6   2006  163
3   2009  162
9   2003  161
29  1980  158
33  1976  155
31  1978  153
44  1964  153
46  1962  137
0   2012  136
7   2005   97
34  1975   93
11  2001   91
8   2004   89
16  1996   84
26  1984   79
38  1970   68
27  1983   67
4   2008   62
30  1979   62
17  1993   60
24  1986   59
12  2000   57
35  1974   48
36  1973   31
23  1987   30
22  1988   24
37  1971    7
43  1965    7
40  1968    5
42  1966   -1
28  1982   -7
18  1992  -13
39  1969  -25
21  1989  -94
41  1967  -99
19  1991 -103
20  1990 -146
```
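
For reference, here is a minimal, self-contained sketch of the same loop. `calc_run_diff` is not defined in this part of the notes, so the version below is an assumption (runs scored minus runs allowed), and `sample_df` is a small stand-in built from the five 2012 rows of `baseball_df` printed later in these notes, not the real `yankees_df`.

```python
import pandas as pd

# Assumed helper: run differential = runs scored - runs allowed
def calc_run_diff(runs_scored, runs_allowed):
    return runs_scored - runs_allowed

# Stand-in data (hypothetical name sample_df), five 2012 rows from baseball_df
sample_df = pd.DataFrame({
    'Team': ['ARI', 'ATL', 'BAL', 'BOS', 'CHC'],
    'RS': [734, 700, 712, 734, 613],
    'RA': [688, 600, 705, 806, 759],
})

# Same pattern as the loop above: .itertuples() yields one namedtuple per row
run_diffs = [calc_run_diff(row.RS, row.RA) for row in sample_df.itertuples()]

# Under the assumed definition, plain column arithmetic gives the same values
vectorized = sample_df['RS'] - sample_df['RA']

sample_df['RD'] = run_diffs
print(sample_df.sort_values('RD', ascending=False))
```

If the helper really is a simple subtraction, the column arithmetic on the `vectorized` line avoids the Python-level loop entirely.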

### Pandas Alternative to looping
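* pandas `.apply()` method
	* Takes a function and applies it to each column or each row of a DataFrame
	* Takes an `axis` argument (0 applies the function to each column and is the default; 1 applies it to each row)
	* Typically quicker than looping with `.iterrows()`, but a row-wise `.apply()` still calls the function once per row; in the comparison at the end of this section, `.itertuples()` was faster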
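
A minimal sketch of how the `axis` argument changes what `.apply()` passes to the function. The small DataFrame is a stand-in built from the five 2012 rows of `baseball_df` printed later in these notes; it is not one of the course DataFrames.

```python
import pandas as pd

# Stand-in data: RS/RA for five 2012 teams (hypothetical name df)
df = pd.DataFrame(
    {'RS': [734, 700, 712, 734, 613], 'RA': [688, 600, 705, 806, 759]},
    index=['ARI', 'ATL', 'BAL', 'BOS', 'CHC'],
)

# axis=0: the function receives each column as a Series -> one result per column
col_totals = df.apply(sum, axis=0)

# axis=1: the function receives each row as a Series -> one result per row
row_totals = df.apply(sum, axis=1)

print(col_totals)
print(row_totals)
```

With `axis=0` the result is indexed by column name (`RS`, `RA`); with `axis=1` it is indexed by the DataFrame's row labels.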

#### Analyzing Stats w/Apply

```python
# Apply sum() to each column of rays_df to collect the sum of each column
stat_totals = rays_df.apply(sum, axis=0)
print(stat_totals)
RS          3783
RA          3265
W            458
Playoffs       3
dtype: int64

# Apply sum() to each row (axis=1) of rays_df, using only the 'RS' and 'RA' columns
total_runs_scored = rays_df[['RS', 'RA']].apply(sum, axis=1)
print(total_runs_scored)
2012    1274
2011    1321
2010    1451
2009    1557
2008    1445

# Convert numeric playoffs to text with a lambda applied to each row (axis=1)
textual_playoffs = rays_df.apply(lambda row: 'Yes' if row['Playoffs'] == 1 else 'No', axis=1)
print(textual_playoffs)
2012     No
2011    Yes
2010    Yes
2009     No
2008    Yes
dtype: object
```

```python
# Display the first five rows of the DataFrame
print(dbacks_df.head())

# Create a win percentage Series 
win_percs = dbacks_df.apply(lambda row: calc_win_perc(row['W'], row['G']), axis=1)
print(win_percs, '\n')

# Append a new column to dbacks_df
dbacks_df['WP'] = win_percs
print(dbacks_df, '\n')

# Display dbacks_df where WP is greater than or equal to 0.50
print(dbacks_df[dbacks_df['WP'] >= 0.50])

    Team League  Year   RS   RA    W    G  Playoffs    WP
    0   ARI     NL  2012  734  688   81  162         0  0.50
    1   ARI     NL  2011  731  662   94  162         1  0.58
    4   ARI     NL  2008  720  706   82  162         0  0.51
    5   ARI     NL  2007  712  732   90  162         1  0.56
    9   ARI     NL  2003  717  685   84  162         0  0.52
    10  ARI     NL  2002  819  674   98  162         1  0.60
    11  ARI     NL  2001  818  677   92  162         1  0.57
    12  ARI     NL  2000  792  754   85  162         0  0.52
    13  ARI     NL  1999  908  676  100  162         1  0.62
```
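
The block above relies on `calc_win_perc`, which is not defined in this part of the notes. The sketch below assumes it is wins divided by games, rounded to two decimals; that assumption reproduces the `WP` values printed above for the three rows used here. `sample_df` is a hypothetical stand-in, not the real `dbacks_df`.

```python
import numpy as np
import pandas as pd

# Assumed helper: win percentage = wins / games played, rounded to 2 decimals
def calc_win_perc(wins, games_played):
    return np.round(wins / games_played, 2)

# Three ARI seasons copied from the output above
sample_df = pd.DataFrame({
    'Year': [2012, 2011, 1999],
    'W': [81, 94, 100],
    'G': [162, 162, 162],
})

# Row-wise apply, same pattern as the block above
sample_df['WP'] = sample_df.apply(
    lambda row: calc_win_perc(row['W'], row['G']), axis=1
)

# >= keeps the 81-81 season (WP == 0.50); a strict > would drop it
at_least_500 = sample_df[sample_df['WP'] >= 0.50]
above_500 = sample_df[sample_df['WP'] > 0.50]

print(at_least_500)
print(above_500)
```

The choice between `>=` and `>` matters here because the 2012 season sits exactly on the 0.50 boundary.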

### Bring it All Together
* Let's compare the approaches you've learned to calculate a predicted win percentage for each season (or row) in your DataFrame.

```python
win_perc_preds_loop = []

# Use a loop and .itertuples() to collect each row's predicted win percentage
for row in baseball_df.itertuples():
    runs_scored = row.RS
    runs_allowed = row.RA
    win_perc_pred = predict_win_perc(runs_scored, runs_allowed)
    win_perc_preds_loop.append(win_perc_pred)

# Apply predict_win_perc to each row of the DataFrame
win_perc_preds_apply = baseball_df.apply(lambda row: predict_win_perc(row['RS'], row['RA']), axis=1)

# Calculate the win percentage predictions using NumPy arrays
win_perc_preds_np = predict_win_perc(baseball_df['RS'].values, baseball_df['RA'].values)
baseball_df['WP_preds'] = win_perc_preds_np
print(baseball_df.head())

# Output:
      Team League  Year   RS   RA   W    G  Playoffs    WP  WP_preds
    0  ARI     NL  2012  734  688  81  162         0  0.50      0.53
    1  ATL     NL  2012  700  600  94  162         1  0.58      0.58
    2  BAL     AL  2012  712  705  93  162         1  0.57      0.50
    3  BOS     AL  2012  734  806  69  162         0  0.43      0.45
    4  CHC     NL  2012  613  759  61  162         0  0.38      0.39
```

* Using NumPy arrays was the fastest approach because `predict_win_perc` runs once over whole arrays instead of once per row; the `.itertuples()` loop came next, and `.apply()` was the slowest.
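
One way to check this ordering yourself is sketched below with the standard-library `timeit` module. Two things here are assumptions: `predict_win_perc` is written as the Pythagorean expectation (RS² / (RS² + RA²), rounded to two decimals), which matches the `WP_preds` values printed above, and `baseball_df` is a synthetic stand-in made by repeating the five rows shown above. Absolute timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed helper (Pythagorean expectation), consistent with WP_preds above
def predict_win_perc(RS, RA):
    prediction = RS ** 2 / (RS ** 2 + RA ** 2)
    return np.round(prediction, 2)

# Synthetic stand-in for baseball_df: five known rows repeated 200 times
base = pd.DataFrame({
    'RS': [734, 700, 712, 734, 613],
    'RA': [688, 600, 705, 806, 759],
})
baseball_df = pd.concat([base] * 200, ignore_index=True)

def with_itertuples():
    return [predict_win_perc(row.RS, row.RA) for row in baseball_df.itertuples()]

def with_apply():
    return baseball_df.apply(
        lambda row: predict_win_perc(row['RS'], row['RA']), axis=1
    )

def with_numpy():
    return predict_win_perc(baseball_df['RS'].values, baseball_df['RA'].values)

# Confirm the three approaches agree before comparing their speed
assert np.allclose(with_itertuples(), with_numpy())
assert np.allclose(with_apply().values, with_numpy())

for func in (with_itertuples, with_apply, with_numpy):
    seconds = timeit.timeit(func, number=5)
    print(f'{func.__name__}: {seconds:.4f} s for 5 runs')
```

Checking that the results match first guards against timing an approach that is fast only because it computes something different.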