<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#List-Comprehensions" data-toc-modified-id="List-Comprehensions-1">List Comprehensions</a></span></li><li><span><a href="#Learning-Outcomes" data-toc-modified-id="Learning-Outcomes-2">Learning Outcomes</a></span></li><li><span><a href="#List-comprehensions-for-accumulator-pattern" data-toc-modified-id="List-comprehensions-for-accumulator-pattern-3">List comprehensions for accumulator pattern</a></span></li><li><span><a href="#List-comprehensions-with-conditional-logic" data-toc-modified-id="List-comprehensions-with-conditional-logic-4">List comprehensions with conditional logic</a></span></li><li><span><a href="#Criteria-for-using-list-comprehensions" data-toc-modified-id="Criteria-for-using-list-comprehensions-5">Criteria for using list comprehensions</a></span></li><li><span><a href="#When-not-use-list-comprehensions" data-toc-modified-id="When-not-use-list-comprehensions-6">When not use list comprehensions</a></span></li><li><span><a href="#List-comprehensions-in-Data-Science-to-process-data" data-toc-modified-id="List-comprehensions-in-Data-Science-to-process-data-7">List comprehensions in Data Science to process data</a></span></li><li><span><a href="#Python-has-many-types-of-comprehensions" data-toc-modified-id="Python-has-many-types-of-comprehensions-8">Python has many types of comprehensions</a></span></li><li><span><a href="#Takeaways" data-toc-modified-id="Takeaways-9">Takeaways</a></span></li><li><span><a href="#Bonus-Material" data-toc-modified-id="Bonus-Material-10">Bonus Material</a></span></li></ul></div>

<center><h2>List Comprehensions</h2></center>

<center><h2>Learning Outcomes</h2></center>

__By the end of this session, you should be able to__:

- Write a list comprehension to elegantly process data.
- Explain when to use, and when not to use, a list comprehension.


List comprehensions for accumulator pattern
----

In [33]:
# Create a list of random numbers

from random import random

values = []
for _ in range(5):
    values.append(random())
values

[0.035828544052667,
 0.43959450755507634,
 0.5244983656093891,
 0.2388040541960864,
 0.10632814115613165]


There must be a better way!

If we need to build up a list programmatically, we can use a list comprehension.

Our mantra - "Let Python do the work!"

List comprehensions are often shortened to "list comps".
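The general shape is `[expression for item in iterable]`. It replaces the "create an empty list, loop, append" accumulator pattern. Here is a minimal side-by-side sketch that uses deterministic squares instead of `random()` so the two results can be compared directly:

```python
# Accumulator pattern: start empty, loop, append
squares = []
for n in range(5):
    squares.append(n ** 2)

# The same list built by a list comprehension
squares_comp = [n ** 2 for n in range(5)]

print(squares == squares_comp)  # True
print(squares_comp)             # [0, 1, 4, 9, 16]
```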

In [34]:
# Create a list of random numbers

values = [random() for _ in range(5)]
values

[0.9048828236673391,
 0.06685606262775556,
 0.9093827144659158,
 0.3849028576059278,
 0.07307132628437607]

List comprehensions with conditional logic
-----

In [35]:
# Create a list of random numbers that is filtered / thresholded

from random import random

values = [random() for _ in range(5)]

threshold = .35
results = []
for v in values:
    if v < threshold:
        results.append(v)
        
results

[0.3084351253375509,
 0.33970280438570977,
 0.27751789328215715,
 0.24924206759781842]

In [36]:
values = [random() for _ in range(5)]
results = [value for value in values if value < threshold]
results

[0.03475076494026519]
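The `if` in a list comp can appear in two places, and each does something different. An `if` after the `for` *filters* items out. A conditional expression (`a if cond else b`) before the `for` *transforms* every item and keeps them all. A small sketch with fixed values (chosen here for illustration) shows the difference:

```python
threshold = 0.35
values = [0.1, 0.5, 0.3, 0.9]

# Filter: `if` after the `for` drops items that fail the test
below = [v for v in values if v < threshold]

# Transform: conditional expression before the `for` keeps every item
clipped = [v if v < threshold else threshold for v in values]

print(below)    # [0.1, 0.3]
print(clipped)  # [0.1, 0.35, 0.3, 0.35]
```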

Criteria for using list comprehensions
-----

1. The output type is a list.
1. The logic is straightforward. Typically, it can be expressed in a single English sentence.
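As an illustration of the "single English sentence" test, the sentence "keep the words longer than three letters, in upper case" maps almost word for word onto a comprehension. The word list is made up for this example:

```python
words = ["list", "comps", "are", "fun", "and", "useful"]

# "keep the words longer than three letters, in upper case"
long_upper = [w.upper() for w in words if len(w) > 3]

print(long_upper)  # ['LIST', 'COMPS', 'USEFUL']
```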



When not use list comprehensions
-----

- Output is not a list.
- Complex logic.
- You need to step through intermediate state to debug.

When in doubt, use a for loop.
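A running total is one case where a for loop is clearer. Each output depends on state carried over from the previous iteration, and a comprehension has no natural place to keep that state. The sketch below uses a hypothetical helper, `running_total`, to show the loop version:

```python
def running_total(nums):
    """Return cumulative sums; each value depends on the one before it."""
    totals = []
    total = 0
    for n in nums:
        total += n           # state carried across iterations
        totals.append(total)
    return totals

print(running_total([3, 1, 4, 1, 5]))  # [3, 4, 8, 9, 14]
```

For this particular task, the standard library's `itertools.accumulate` also does the job without an explicit loop.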

List comprehensions in Data Science to process data
-------

List comprehensions are very useful for data munging.


In [37]:
# Get the diagonal of a matrix (without using NumPy)

m = [[1, 2, 3], 
     [4, 5, 6], 
     [7, 8, 9]]

diag = [m[i][i] for i, _ in enumerate(m)]

assert diag == [1, 5, 9]
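Nested comprehensions handle other common matrix chores without NumPy. In the flattening example, multiple `for` clauses read left to right in the same order as the equivalent nested loops. The transpose builds one inner list per column. This sketch reuses the same 3×3 matrix:

```python
m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# Flatten: outer loop over rows, inner loop over items in each row
flat = [x for row in m for x in row]

# Transpose: for each column index j, collect row[j] from every row
transposed = [[row[j] for row in m] for j in range(len(m[0]))]

print(flat)        # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(transposed)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```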

Comprehensions are also very useful in data science for data cleaning.

In [38]:
# Remove all None values (aka null)
nums = [None, 1, None, 3, 12, None] 

nums = [n for n in nums if n is not None]

nums

[1, 3, 12]
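The same pattern extends to messy text fields. The sketch below (with invented sample data) drops missing and blank entries and normalizes the rest in one pass. Putting the `is not None` check first matters: `and` short-circuits, so `.strip()` is never called on `None`.

```python
raw = ["  Alice ", "", "BOB", None, "  ", "carol"]

# Drop None and whitespace-only strings, then trim and lower-case the rest
cleaned = [s.strip().lower() for s in raw if s is not None and s.strip()]

print(cleaned)  # ['alice', 'bob', 'carol']
```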

__Normalize numbers__

Write a function that linearly rescales a list of numeric values to the range [0, 1], one type of [normalization](https://en.wikipedia.org/wiki/Normalization_(statistics)).

Use this formula:

$$ X' = \frac{X - X_{min}}{X_{max} - X_{min}} $$

Given that the inputs and outputs are both lists, a list comprehension is appropriate.

This is also called min-max feature scaling. In data science, we often take raw values and rescale them so they can be used as features.
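As a worked check of the formula, take the integers $-3$ through $4$ used in the cell below, so $X_{min} = -3$ and $X_{max} = 4$. For $X = 0$:

$$ X' = \frac{0 - (-3)}{4 - (-3)} = \frac{3}{7} \approx 0.43 $$

The smallest value maps to $0$ and the largest to $1$, so every output lies in $[0, 1]$.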

In [39]:
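from typing import List, Sequence

def normalize(nums: Sequence[float]) -> List[float]:
    "Linearly rescale nums to [0, 1] with min-max scaling."
    # Compute min and max once, rather than once per element
    # Assumes the values are not all equal (otherwise hi - lo is 0)
    lo, hi = min(nums), max(nums)
    return [(n - lo) / (hi - lo) for n in nums]

nums = list(range(-3, 5))

normalize(nums)

[0.0,
 0.14285714285714285,
 0.2857142857142857,
 0.42857142857142855,
 0.5714285714285714,
 0.7142857142857143,
 0.8571428571428571,
 1.0]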
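Min-max scaling divides by $X_{max} - X_{min}$, which is zero when every value is identical. One way to guard against that is sketched below with a hypothetical `min_max_scale` helper. Mapping a constant input to all zeros is a convention assumed for this example, not the only reasonable choice; raising an error is another option.

```python
from typing import List, Sequence

def min_max_scale(nums: Sequence[float]) -> List[float]:
    """Rescale to [0, 1]; a constant input maps to all zeros (assumed convention)."""
    lo, hi = min(nums), max(nums)
    span = hi - lo
    if span == 0:
        # Avoid ZeroDivisionError when all values are equal
        return [0.0 for _ in nums]
    return [(n - lo) / span for n in nums]

print(min_max_scale([2, 4, 6]))  # [0.0, 0.5, 1.0]
print(min_max_scale([5, 5, 5]))  # [0.0, 0.0, 0.0]
```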

Python has many types of comprehensions
------

Comprehension types:

- list
- set
- dict
- generator expressions (the lazy version, which produces items on demand instead of building a list)

Comprehensions are an easy way to make a complete pass over the data, for example to clean it and improve its quality.
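The other comprehension types use the same `for ... in ... if ...` syntax with different brackets. A set comprehension uses `{}` and removes duplicates. A dict comprehension uses `{key: value}`. A generator expression uses `()` and can be passed straight to a function such as `sum` without building an intermediate list. The word list here is invented for illustration:

```python
words = ["apple", "banana", "avocado", "cherry", "apple"]

# Set comprehension: unique first letters (set display order may vary)
first_letters = {w[0] for w in words}

# Dict comprehension: word -> length (the duplicate "apple" key is overwritten)
lengths = {w: len(w) for w in words}

# Generator expression: lazily yields lengths; sum() consumes them one at a time
total_chars = sum(len(w) for w in words)

print(first_letters)  # {'a', 'b', 'c'} in some order
print(lengths)        # {'apple': 5, 'banana': 6, 'avocado': 7, 'cherry': 6}
print(total_chars)    # 29
```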

<center><h2>Takeaways</h2></center>

- List comprehensions are useful ways to process data.
- List comprehensions are a good choice when the output is a list and the logic is straightforward.
- However, do not overuse them. If the logic becomes complex or error-prone, use a for loop instead.

Bonus Material
-----

In [41]:
# Python 3 keeps a list comp's loop variable out of the surrounding namespace

In [42]:
reset -fs

In [43]:
whos

Interactive namespace is empty.


In [44]:
names = 'Mingming Stephanie Lina  Matthew Sunny'.split()
names = [name.lower() for name in names]

In [45]:
whos

Variable   Type    Data/Info
----------------------------
names      list    n=5


☝ no `name` variable in the namespace
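The same behavior can be checked in plain Python without the IPython `reset`/`whos` magics. A regular for loop rebinds its loop variable in the enclosing scope. A comprehension's loop variable stays separate, so an outer variable with the same name is left untouched. The two functions below are hypothetical helpers written for this check:

```python
def after_for_loop():
    name = "outer"
    for name in ["Mingming", "Lina"]:
        pass  # the loop rebinds `name` in this function's scope
    return name

def after_comprehension():
    name = "outer"
    # this `name` is local to the comprehension and does not overwrite the outer one
    lowered = [name.lower() for name in ["Mingming", "Lina"]]
    return name, lowered

print(after_for_loop())       # Lina
print(after_comprehension())  # ('outer', ['mingming', 'lina'])
```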