## Comprehensions

Using comprehensions is often a way both to make code more compact and to shift our focus from the "how" to the "what". A comprehension is an expression that uses the same keywords as loop and conditional blocks, but inverts their order to focus on the data rather than on the procedure.

Simply changing the form of an expression can make a surprisingly large difference in how we reason about code and how easy it is to understand. The ternary operator (Python's conditional expression) performs a similar restructuring of our focus, using the same keywords in a different order.
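As a small illustration of that restructuring (the variable names are just for this sketch), here is the same decision written first as an `if`/`else` statement and then as a conditional expression:

```python
n = 7  # sample value for this sketch

# Statement form: the keywords lead, and the values are buried in the branches.
if n % 2 == 0:
    parity = "even"
else:
    parity = "odd"

# Conditional expression: the same `if`/`else` keywords, reordered so
# the values come first and the condition sits in the middle.
parity_expr = "even" if n % 2 == 0 else "odd"

print(parity, parity_expr)  # odd odd
```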

### List Comprehensions

A list comprehension creates a new list from an existing iterable (such as a list or a `range`) based on defined logic.

#### Unconditional Comprehensions

In [4]:
doubled_numbers = []

for n in range(1, 12, 2):
    doubled_numbers.append(n * 2)

print(f"{doubled_numbers = }")

doubled_numbers = [2, 6, 10, 14, 18, 22]


In [15]:
%%timeit

# Original
doubled_numbers = []
for n in range(1, 99_99_999, 2):
    doubled_numbers.append(n * 2)

308 ms ± 9.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [6]:
doubled_numbers = [n * 2 for n in range(1, 99_99_999, 2)]
print(doubled_numbers[100:105])

[402, 406, 410, 414, 418]


In [10]:
# For readability, a long comprehension can be split across lines
doubled_numbers = [n * 2
                   for n in range(1, 99, 2)]
print(doubled_numbers[1:5])

[6, 10, 14, 18]


In [16]:
%%timeit

# List comprehension
doubled_numbers = [n * 2 for n in range(1, 99_99_999, 2)]  # n = 1, 3, 5, 7, 9, 11, ...

291 ms ± 13.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


#### Conditional Comprehensions

In [14]:
doubled_odds = []

for n in range(1, 12):
    if n % 2 == 1:
        doubled_odds.append(n * 2)
        
print(doubled_odds)

[2, 6, 10, 14, 18, 22]


In [15]:
doubled_odds = [n * 2 for n in range(1, 12) if n % 2 == 1]
print(doubled_odds)

[2, 6, 10, 14, 18, 22]


In [2]:
doubled_odds = [n * 2
                for n in range(1, 12)
                if n % 2 == 1]
print(doubled_odds)

[2, 6, 10, 14, 18, 22]


**!!!! Tip !!!!** 

* Copy the variable assignment for the new empty list (`doubled_odds = []`), keeping only the opening bracket
    `doubled_odds = [`
* Copy the expression that was being `append`-ed to the new list
    `n * 2`
* Copy the `for` loop line, excluding the final `:`
    `for n in numbers`
* Copy the `if` statement line, also without the `:`
    `if n % 2 == 1`
* Close the list comprehension with `]`
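Following those steps on a small, hypothetical `numbers` list gives the comprehension below; the final comparison confirms it matches the original loop.

```python
numbers = [1, 2, 3, 4, 5, 6, 7]  # hypothetical sample input

# Original loop
doubled_odds_loop = []
for n in numbers:
    if n % 2 == 1:
        doubled_odds_loop.append(n * 2)

# The same pieces, reordered into a comprehension:
# "doubled_odds = [" + "n * 2" + "for n in numbers" + "if n % 2 == 1" + "]"
doubled_odds = [n * 2 for n in numbers if n % 2 == 1]

print(doubled_odds)                       # [2, 6, 10, 14]
print(doubled_odds == doubled_odds_loop)  # True
```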

In [17]:
%%timeit
# FROM
numbers = range(9999999)

doubled_odds = []
for n in numbers:
    if n % 2 == 1:
        doubled_odds.append(n * 2)

901 ms ± 23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [18]:
%%timeit
# TO
numbers = range(9999999)

doubled_odds = [n * 2 for n in numbers if n % 2 == 1]

751 ms ± 11.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


#### Nested `if` statements in `for` loop

When a loop body contains nested `if-else` conditions, it is often better to convert them into a single conditional expression, as discussed in the `if` section.

In [19]:
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0]

lst = []
for v in l:
    if v == 0:
        lst.append('Zero')
    else:
        if v % 2 == 0:
            lst.append('even')
        else:
            lst.append('odd')

print(lst)

['odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'Zero']


```python
# Converting the nested `if-else` block into a single-line expression.
if v == 0:
    lst.append('Zero')
else:
    if v % 2 == 0:
        lst.append('even')
    else:
        lst.append('odd')

# can be treated as (pseudocode, not valid Python)
if v == 0:
    lst.append('Zero')
else:
    # For us, the inner block produces a single value.
    (if v % 2 == 0:
        lst.append('even')
    else:
        lst.append('odd'))

# So this can be converted to the following
if v == 0:
    lst.append('Zero')
else:
    # The inner block is now a single conditional expression.
    lst.append('even' if v % 2 == 0 else 'odd')   # idea: x = a if b > 10 else c

# Repeating the same step on the outer if-else gives one expression
lst.append('Zero' if v == 0 else 'even' if v % 2 == 0 else 'odd')
```
So the final solution is:

In [20]:
# The nested if statements can be replaced by
# a single conditional expression, as shown below

l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0]

lst = []
for v in l:
    lst.append('Zero' if v == 0 else 'even' if v % 2 == 0 else 'odd')

print(lst)

['odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'Zero']


We can treat `"zero" if v == 0 else "even" if v % 2 == 0 else "odd"` as a single value (refer to Chapter 03 - Control Flow), so it can be used directly as the expression part of a list comprehension.
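To see why the chained form is a single value: Python groups a chained conditional expression from the right, so it is equivalent to the explicitly parenthesized version below. A minimal check, reusing the same sample list `l`:

```python
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0]

chained = ["zero" if v == 0 else "even" if v % 2 == 0 else "odd" for v in l]
# The inner conditional is evaluated as one value inside the outer `else` branch.
grouped = ["zero" if v == 0 else ("even" if v % 2 == 0 else "odd") for v in l]

print(chained == grouped)  # True
```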

In [20]:
lst = ["zero" if v == 0 else "even" if v % 2 == 0 else "odd" for v in l]
print(lst)

['odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'odd', 'even', 'zero']


In [13]:
print(['yes' if v == 1 else 'no' if v == 2 else 'idle' for v in l])

['yes', 'no', 'idle', 'idle', 'idle', 'idle', 'idle', 'idle', 'idle', 'idle', 'idle']


In [3]:
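# First attempt (buggy): the recursive call returns the same `result` list,
# so `append` inserts `result` into itself (printed as [...]).
def flatten_list_new(lst, result=None):
    """Flattens a nested list
        >>> flatten_list_new([[1, 2, [3, 4] ], [5, 6], 7])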
        [1, 2, 3, 4, 5, 6, 7]
    """
    if result is None:
        result = []

    for x in lst:
        if isinstance(x, list):
            result.append(flatten_list_new(x, result))
        else:
            result.append(x)

    return result
lst = [[1, 2, [3, [4]] ], [5, 6], 7]
  
print(flatten_list_new(lst))

[1, 2, 3, 4, [...], [...], [...], 5, 6, [...], 7]


In [4]:
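# !!! Still not working !!!
# A list comprehension produces exactly one output item per input item,
# so it cannot splice the items of a sub-list into the outer list.
def flatten_list_new(lst, result=None):
    """Flattens a nested list
        >>> flatten_list_new([[1, 2, [3, 4] ], [5, 6], 7])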
        [1, 2, 3, 4, 5, 6, 7]
    """
    if result is None:
        result = []

    result = [flatten_list_new(x, result) if isinstance(x, list) else x  for x in lst]

    return result
lst = [[1, 2, [3, [4]] ], [5, 6], 7]
  
print(flatten_list_new(lst))

[[1, 2, [3, [4]]], [5, 6], 7]


In [5]:
newlist = []
input_list = [1,2, [2,[3]],3,[3,[[4],5]]]

def convertHetrogenousList(hetroList):
    newlist = []
    if type(hetroList) is int:
        newlist.append(hetroList)
    elif type(hetroList) is list:
        for items in hetroList:
            newlist.extend(convertHetrogenousList(items))
    return newlist

newlist = convertHetrogenousList(input_list)
print(newlist)

[1, 2, 2, 3, 3, 3, 4, 5]


In [6]:
from collections.abc import Iterable

newlist = []
input_list = [1, 2, [2, [3]], 3, [3, [[4], 5]]]

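# This version preserves the nesting: `append` adds each recursive result as
# one item (`extend` would be needed to flatten). Strings are also Iterable,
# so a string element would recurse until a RecursionError.
def convertHetrogenousList(hetroList):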
    newlist = []
    if isinstance(hetroList, Iterable):
        for items in hetroList:
            if isinstance(items, Iterable):
                newlist.append(convertHetrogenousList(items))
            else:
                newlist.append(items)
    else:
        return hetroList
    return newlist

newlist = convertHetrogenousList(input_list)
print(newlist)

[1, 2, [2, [3]], 3, [3, [[4], 5]]]


In [108]:
isinstance(1, Iterable)

False
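One caveat with an `Iterable` check: strings are iterable too, and iterating over a one-character string yields that same string again, so a recursive flatten based only on `Iterable` never bottoms out on string elements. A quick check, plus a common guard (the helper name `is_container` is hypothetical, introduced only for this sketch):

```python
from collections.abc import Iterable

print(isinstance(1, Iterable))      # False
print(isinstance("abc", Iterable))  # True: strings are iterable
print(list("a"))                    # ['a']: a 1-char string yields itself

# Hypothetical helper: treat str/bytes as leaves rather than containers.
def is_container(value):
    return isinstance(value, Iterable) and not isinstance(value, (str, bytes))

print(is_container([1, 2]), is_container("abc"))  # True False
```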

In [20]:
### TODO Can we redirect sys.stdout to a list

In [9]:
# Original code

lst = []
n = 10
b = 5

for a in range(n):
    if a % 2 == 0:
        for x in range(a, b):
            lst.append(x)

print(lst)

[0, 1, 2, 3, 4, 2, 3, 4, 4]


In [15]:
lst = [x for a in range(n) if a % 2 == 0 for x in range(a, b)]
print(lst)

[0, 1, 2, 3, 4, 2, 3, 4, 4]


In [21]:
# solution

n = 10
b = 5
lst = [x for a in range(n) if a % 2 == 0 for x in range(a, b)]

print(lst)

[0, 1, 2, 3, 4, 2, 3, 4, 4]
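The rule behind the solution above: the `for`/`if` clauses of a comprehension appear in the same order as the nested loop lines they replace. A small sketch with made-up ranges:

```python
# Clauses read left to right in the same order as nested loops.
pairs = [(row, col) for row in range(2) for col in range(3)]

pairs_loop = []
for row in range(2):          # first `for` clause = outer loop
    for col in range(3):      # second `for` clause = inner loop
        pairs_loop.append((row, col))

print(pairs)                # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(pairs == pairs_loop)  # True
```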


In [7]:
%%time

import os

file_list = []

for path, _, files in os.walk(".."):
    for f in files:
        if f.endswith(".py"):
            file_list.append(os.path.join(path, f))

# print(len(file_list))
# Folder: C:\temp\test   File: dummy.py
# Windows: os.path.join -> C:\temp\test\dummy.py (Windows & ReactOS)
# POSIX: os.path.join -> C:\temp\test/dummy.py (*nix: Linux/macOS or *BSD)

CPU times: user 10.9 ms, sys: 16.9 ms, total: 27.8 ms
Wall time: 106 ms


In [11]:
%%time

import os

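file_list = []

for path, _, files in os.walk(".."):
    # `+=` keeps the files from every directory; assigning `file_list = [...]`
    # here would keep only the files from the last directory visited.
    file_list += [os.path.join(path, f) for f in files if f.endswith(".py")]

# print(len(file_list))
# Folder: C:\temp\test   File: dummy.py
# Windows: os.path.join -> C:\temp\test\dummy.py (Windows & ReactOS)
# POSIX: os.path.join -> C:\temp\test/dummy.py (*nix: Linux/macOS or *BSD)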

CPU times: user 9.68 ms, sys: 3.62 ms, total: 13.3 ms
Wall time: 13 ms


In [25]:
import os

file_list = [os.path.join(path, f)
                 for path, _, files in os.walk("..")
                     for f in files if f.endswith(".py")]

print(len(file_list))

3936


In [7]:
%%time

import os

folder = ".."
file_list = [os.path.join(path, f) 
             for path, _, files in os.walk(folder) 
                 for f in files if f.endswith(".py")]

CPU times: user 19.9 ms, sys: 12.9 ms, total: 32.8 ms
Wall time: 32.4 ms


In [8]:
%%time
restFiles = []

for path, _, files in os.walk(r".."):
    if "code" in path:
        for f in files:
            if f.endswith(".py"):
                restFiles.append(os.path.join(path, f))
print(len(restFiles))

1525
CPU times: user 26.6 ms, sys: 24.4 ms, total: 51 ms
Wall time: 56.1 ms


In [9]:
%%time
restFiles = [os.path.join(path, f) 
                 for path, _, files in os.walk("..") 
                     if "code" in path
                         for f in files 
                             if f.endswith(".py")]
print(len(restFiles))

1525
CPU times: user 17.7 ms, sys: 0 ns, total: 17.7 ms
Wall time: 18.5 ms


In [30]:
# Exercise: flatten the nested list below
lst = [1, 2, [2, 3], 3, [3, [[4], 5]]]
expected = [1, 2, 2, 3, 3, 3, 4, 5]
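One possible solution, sketched under the assumption that only `list` objects count as nested (tuples, strings and other iterables are kept as single items): a recursive function whose body is a single nested comprehension. Wrapping each scalar in a one-item list lets the inner `for` loop over both cases, which is exactly what the earlier "one output per input" attempt was missing.

```python
def flatten(items):
    """Recursively flatten nested lists (sketch: only `list` counts as nested)."""
    return [
        leaf
        for item in items
        # Recurse into sub-lists; wrap scalars in a one-item list so both
        # branches give the inner `for` something to iterate over.
        for leaf in (flatten(item) if isinstance(item, list) else [item])
    ]

lst = [1, 2, [2, 3], 3, [3, [[4], 5]]]
print(flatten(lst))  # [1, 2, 2, 3, 3, 3, 4, 5]
```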

In [12]:
matrix = []

for row_idx in range(0, 3):
    itmList = []
    for item_idx in range(0, 3):
        if item_idx == row_idx:
            itmList.append(1)
        else:
            itmList.append(0)
    matrix.append(itmList)
    
print(matrix)

[[1, 0, 0], [0, 1, 0], [0, 0, 1]]


In [33]:
%%time

matrix = []
for row_idx in range(0, 3):
    itmList = []
    for item_idx in range(0, 3):
        if item_idx == row_idx:
            itmList.append(1)
        else:
            itmList.append(0)
    matrix.append(itmList)


CPU times: user 12 µs, sys: 1 µs, total: 13 µs
Wall time: 14.1 µs


In [29]:
# Optimization One: Simplifying the inner loop.

matrix = []
for row_idx in range(0, 3):
    itmList = []
    for item_idx in range(0, 3):
        itmList.append(1 if item_idx == row_idx else 0)
    matrix.append(itmList)

print(matrix)

[[1, 0, 0], [0, 1, 0], [0, 0, 1]]


In [31]:
# Optimization Two: replace the inner loop with a list comprehension.

matrix = []
for row_idx in range(0, 3):
    itmList = [1 if item_idx == row_idx else 0 for item_idx in range(0, 3)]
    matrix.append(itmList)

print(matrix)

[[1, 0, 0], [0, 1, 0], [0, 0, 1]]


In [34]:
# Optimization Three: append the comprehension directly.

matrix = []

for row_idx in range(0, 3):
    matrix.append([1 if item_idx == row_idx else 0 for item_idx in range(0, 3)])

print(matrix)

[[1, 0, 0], [0, 1, 0], [0, 0, 1]]


In [33]:
# Final Optimization

matrix = [[1 if item_idx == row_idx else 0 for item_idx in range(0, 3)]
          for row_idx in range(0, 3)]
print(matrix)

[[1, 0, 0], [0, 1, 0], [0, 0, 1]]
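A related reason to build nested lists with a comprehension: multiplying a list of lists repeats references to the *same* inner list, while an inner comprehension (or expression) evaluated once per row creates independent rows. A short sketch:

```python
shared = [[0] * 3] * 3                      # three references to ONE inner list
independent = [[0] * 3 for _ in range(3)]   # three separate inner lists

shared[0][0] = 1
independent[0][0] = 1

print(shared)       # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
print(independent)  # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```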


In [38]:
# Review of `set`: first build a list that contains duplicates
lst = [1, 2, 34, 4, 5]
print(lst)
lst.append(2)
lst.extend((2, 3, 4, 5, 1))
print(lst)

[1, 2, 34, 4, 5]
[1, 2, 34, 4, 5, 2, 2, 3, 4, 5, 1]


In [40]:
# Set is a collection of unique elements.

l = set(lst)
print(l)

{1, 2, 34, 4, 5, 3}


### Set Comprehensions

Set comprehensions construct sets using the same principles as list comprehensions; the only differences are that the result is a set (so duplicates are removed) and curly braces **"{ }"** are used instead of square brackets **"[ ]"**.

In [12]:
names = ['aaLok', 'Manish', 'AalOK', 'Manish', 'Gupta', 'Johri', 'Rahul']

names_using_list_comprehension = [name[0].upper() + name[1:].lower() for name in names if len(name) > 1 ]

names_using_set_comprehension = {name[0].upper() + name[1:].lower() for name in names if len(name) > 1 }
print("List:", names_using_list_comprehension)
print("Set:", names_using_set_comprehension)

List: ['Aalok', 'Manish', 'Aalok', 'Manish', 'Gupta', 'Johri', 'Rahul']
Set: {'Gupta', 'Rahul', 'Johri', 'Manish', 'Aalok'}
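Two small notes on the example above, as a sketch: for these names, `str.capitalize()` gives the same result as `name[0].upper() + name[1:].lower()`, and because a set has no defined order, `sorted()` is the usual way to print it predictably.

```python
names = ['aaLok', 'Manish', 'AalOK', 'Manish', 'Gupta', 'Johri', 'Rahul']

# str.capitalize() upper-cases the first character and lower-cases the rest.
capitalized = {name.capitalize() for name in names}

# Sets are unordered; sort when a predictable display order is needed.
print(sorted(capitalized))  # ['Aalok', 'Gupta', 'Johri', 'Manish', 'Rahul']
```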


### Dictionary Comprehensions

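Dictionary comprehensions build a dictionary from a `key: value` expression followed by a `for` clause, enclosed in `{ }`. The examples below first swap the keys and values of a dictionary. Then they consolidate a dictionary so that the result has only lower-case keys: if both the lower- and upper-case forms of a key are found in the original dictionary, their values are added.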

In [22]:
original = {'a':10, 'b': 34, 'A': 7, 'Z':3, "z": 199, 'c': 10}
# Original example: swap keys and values.
# Original code, which we will rewrite as a dictionary comprehension.

flipped = {}
for key, value in original.items():
    flipped[value] = key
    
print(flipped)

{10: 'c', 34: 'b', 7: 'A', 3: 'Z', 199: 'z'}


In [21]:
original = {'a':10, 'b': 34, 'A': 7, 'Z':3, "z": 199, 'c': 10}

# variable = { `key` : `val` `for loop`}
flipped = {value : key for key, value in original.items()}

print(flipped)

{10: 'c', 34: 'b', 7: 'A', 3: 'Z', 199: 'z'}
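Note that flipping a dictionary is lossy when values repeat, which is why the outputs above have one entry fewer than `original`. A quick check:

```python
original = {'a': 10, 'b': 34, 'A': 7, 'Z': 3, 'z': 199, 'c': 10}
flipped = {value: key for key, value in original.items()}

# 'a' and 'c' share the value 10, so the later key ('c') overwrites 'a'.
print(len(original), len(flipped))  # 6 5
print(flipped[10])                  # c
```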


In [16]:
original = {'a':10, 'b': 34, 'A': 7, 'Z':3, "z": 199, 'c': 10}
# Original example: swap keys and values.
# Original code, which we will rewrite as a dictionary comprehension.

flipped = {}
for key in original:
    flipped[original[key]] = key
    
print(flipped)

{10: 'c', 34: 'b', 7: 'A', 3: 'Z', 199: 'z'}


In [17]:
flipped = {original[key]: key for key in original}
print(flipped)

{10: 'c', 34: 'b', 7: 'A', 3: 'Z', 199: 'z'}


In [18]:
original = {'a': 10, 'b': 34, 'A': 7, 'Z': 3, 'z': 199}

In [21]:
# In this example, we create a dict whose keys are the lower-case keys of
# `original` and whose values are the sums of all values for the same key,
# ignoring case (upper or lower). So the new dict will be {'a': 17, 'b': ....}
mcase_freq = {}

for k in original:
    # `get` returns the default `0` when a key is missing, which does not
    # affect the sum but avoids a `KeyError`.
    mcase_freq[k.lower()] = original.get(k.lower(), 0) + original.get(k.upper(), 0)

print("\n", mcase_freq)


 {'a': 17, 'b': 34, 'z': 202}


In [23]:
# variable = { `key` : `val` `for loop`}
# This is not optimized: the value is computed again for the other-case key.
# Say we have two key/value pairs `a: 10` and `A: 7`;
# the current logic processes both `a` and `A`, which is redundant.

mcase_frequency = {k.lower() : original.get(k.lower(), 0) + original.get(k.upper(), 0) for k in original}

print(mcase_frequency)

{'a': 17, 'b': 34, 'z': 202}


In [24]:
original = {'a':10, 'b': 34, 'A': 7, 'Z':3, "z": 199, 'c': 10}

# A slightly optimized solution: skip keys whose lower-case form is already processed

mcase_freq = {}

for k in original:
    if k.lower() not in mcase_freq:
        mcase_freq[k.lower()] = original.get(k.lower(), 0) + original.get(k.upper(), 0)
    
print(mcase_freq)

{'a': 17, 'b': 34, 'z': 202, 'c': 10}


```python
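# When not to use a single-line `if`: a conditional expression requires an `else`.
# (The `try`/`except` does not help: a SyntaxError is raised before the code runs.)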
original = {'a':10, 'b': 34, 'A': 7, 'Z':3, "z": 199, 'c': 10}
mcase_freq = {}
try:
    for k in original:
        mcase_freq[k.lower()] = original.get(k.lower(), 0) + original.get(k.upper(), 0) if k.lower() not in mcase_freq

    print(mcase_freq)
except Exception as e:
    print(e)
```

**Output:**

```python
 Input In [25]
    mcase_freq[k.lower()] = original.get(k.lower(), 0) + original.get(k.upper(), 0) if k.lower() not in mcase_freq
                            ^
SyntaxError: expected 'else' after 'if' expression
```
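The rule behind that error, as a short sketch: inside a comprehension, an `if ... else` placed *before* the `for` is a conditional expression (one output per item, `else` required), while an `if` placed *after* the `for` is a filter (no `else`, non-matching items are dropped).

```python
values = [1, 5, 2, 8]

# Conditional expression: `if ... else` BEFORE the `for`, one output per item.
labels = ["big" if v > 3 else "small" for v in values]

# Filter clause: `if` AFTER the `for`, no `else`, drops non-matching items.
big_only = [v for v in values if v > 3]

print(labels)    # ['small', 'big', 'small', 'big']
print(big_only)  # [5, 8]
```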


In [54]:
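# Just like a list comprehension. Note: `mcase_freq` is not updated while the
# comprehension runs, so a filter like `if k.lower() not in mcase_freq` would
# always be True. Deduplicating the lower-case keys first avoids the redundant
# work (`dict.fromkeys` keeps the first-seen order).
mcase_freq = {k: original.get(k, 0) + original.get(k.upper(), 0)
              for k in dict.fromkeys(key.lower() for key in original)}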
print(mcase_freq)

{'a': 17, 'b': 34, 'z': 202, 'c': 10}


`map` can also take an anonymous, inline function defined with `lambda` instead of a named function. The parameters of the lambda are defined to the left of the colon and the function body to the right of it. The result of evaluating the body is returned implicitly.
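A minimal sketch of `map` with a `lambda`, next to the equivalent list comprehension (the sample names are just for illustration):

```python
names = ["Mayank", "Manish", "Aalok"]

# `lambda name: name.upper()`: parameter left of the colon, body right of it;
# the body's value is returned implicitly.
upper_with_map = list(map(lambda name: name.upper(), names))

# The equivalent list comprehension puts the expression up front.
upper_with_comprehension = [name.upper() for name in names]

print(upper_with_map)                              # ['MAYANK', 'MANISH', 'AALOK']
print(upper_with_map == upper_with_comprehension)  # True
```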

The imperative (non-functional) code below takes a list of real names and pairs each one with a randomly assigned code name.

In [45]:
# Bad implementation, carried over from Java/C++/C style.
# Creates a dictionary from two lists, one for keys and another for values;
# the values are randomized using `random.shuffle`.

import random

names_dict = {}

names = ["Mr. K.V. Pauly", "Manish", "Aalok", "Roshan Musheer"]  # Keys
code_names = ['Mr. Normal', 'Mr. God', 'Mr. Cool', 'The Big Boss']  # Values

random.shuffle(code_names)

# NEVER EVER USE THIS LOGIC  `range(len(names))` is a bad idea.
for i in range(len(names)):
    names_dict[names[i]] = code_names[i] 

print(names_dict)

{'Mr. K.V. Pauly': 'Mr. Normal', 'Manish': 'Mr. Cool', 'Aalok': 'Mr. God', 'Roshan Musheer': 'The Big Boss'}


In [46]:
named_dict = {names[i] : code_names[i] for i in range(len(names))}
print(named_dict)

{'Mr. K.V. Pauly': 'Mr. Normal', 'Manish': 'Mr. Cool', 'Aalok': 'Mr. God', 'Roshan Musheer': 'The Big Boss'}


In [34]:
# Better implementation using `enumerate`. Note that both lists are the same size.
import random

names_dict = {}
names = ["Mayank", "Manish", "Aalok", "Roshan Musheer"]
code_names = ['Mr. Normal', 'Mr. God', 'Mr. Cool', 'The Big Boss']

random.shuffle(code_names)

for index, name in enumerate(names):
    names_dict[name] = code_names[index] 
        
print(names_dict)

{'Mayank': 'The Big Boss', 'Manish': 'Mr. Cool', 'Aalok': 'Mr. God', 'Roshan Musheer': 'Mr. Normal'}


In [35]:
# Better implementation using Dictionary Comprehension
# First thing to find is what is the key and what is the value.

import random

names_dict = {}
names = ["Mayank", "Manish", "Aalok", "Roshan Musheer"]
code_names = ['Mr. Normal', 'Mr. God', 'Mr. Cool', 'The Big Boss']

random.shuffle(code_names)

names_dict = {name : code_names[i]  for i, name in enumerate(names)}
print(names_dict)

{'Mayank': 'The Big Boss', 'Manish': 'Mr. Normal', 'Aalok': 'Mr. Cool', 'Roshan Musheer': 'Mr. God'}


In [36]:
# best implementation

import random

names_dict = {}
names = ["Mayank", "Manish", "Aalok", "Roshan Musheer"]
code_names = ['Mr. Normal', 'Mr. God', 'Mr. Cool', 'The Big Boss']

random.shuffle(code_names)

names_dict = dict(zip(names, code_names))

print(names_dict)

{'Mayank': 'Mr. God', 'Manish': 'The Big Boss', 'Aalok': 'Mr. Cool', 'Roshan Musheer': 'Mr. Normal'}
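The "same size" note matters for `zip`: it stops at the shortest input without complaint. A sketch with a deliberately short, hypothetical `code_names` list; on Python 3.10 and later, `zip(..., strict=True)` raises `ValueError` on a length mismatch instead.

```python
import sys

names = ["Mayank", "Manish", "Aalok"]
code_names = ["Mr. Normal", "Mr. God"]  # hypothetical: one value short

# zip() stops at the shortest input, silently dropping "Aalok".
print(dict(zip(names, code_names)))  # {'Mayank': 'Mr. Normal', 'Manish': 'Mr. God'}

# Python 3.10+: strict=True turns the silent truncation into an error.
if sys.version_info >= (3, 10):
    try:
        dict(zip(names, code_names, strict=True))
    except ValueError as error:
        print("ValueError:", error)
```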


In [37]:
names = list(range(10000))
code_names = list(range(10000))
random.shuffle(code_names)

In [38]:
%%timeit

import random

names_dict = {}

for index, name in enumerate(names):
    names_dict[name] = code_names[index] 

1.54 ms ± 400 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [40]:
%%timeit

import random

names_dict = {name : code_names[i]  for i, name in enumerate(names)}

1.37 ms ± 149 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [41]:
%%timeit
import random

names_dict = dict(zip(names, code_names))

905 µs ± 89.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [None]:
# Merge a list of dicts into a single dict (for repeated keys, the last value wins)
ld = [{'a': 10, 'b': 20}, {'p': 10, 'u': 100}]
print(dict([kv for d in ld for kv in d.items()]))
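The same merge can be written directly as a dictionary comprehension, which avoids building the intermediate list of pairs:

```python
ld = [{'a': 10, 'b': 20}, {'p': 10, 'u': 100}]

# The outer `for` walks the dicts; the inner `for` unpacks each key/value pair.
merged = {key: value for d in ld for key, value in d.items()}
print(merged)  # {'a': 10, 'b': 20, 'p': 10, 'u': 100}
```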

### Generator Comprehension

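A generator comprehension (officially called a *generator expression*) is written with parentheses "( )" instead of square brackets. Otherwise, the syntax and the way it works are like a list comprehension, but it returns a generator instead of a list: values are produced lazily, one at a time, as the generator is iterated.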
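A small sketch of that lazy behavior: values are computed only when requested, and a generator can be consumed only once.

```python
squares = (x**2 for x in range(5))  # nothing is computed yet

print(next(squares))  # 0
print(next(squares))  # 1
print(list(squares))  # [4, 9, 16]  (only the remaining items)
print(list(squares))  # []  (the generator is now exhausted)

# When a generator expression is the only argument to a call,
# the extra parentheses can be dropped:
total = sum(x**2 for x in range(5))
print(total)  # 30
```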

In [49]:
%%timeit
# Generator comprehension: only creates the generator object; no squares are computed yet

x = (x**2 for x in range(20000))

754 ns ± 23.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [50]:
%%timeit
# List comprehension

x = [x**2 for x in range(20000)]

1.47 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [51]:
%%timeit
# Creating the generator is instant even for a huge range, because no values are computed yet
x = (x**2 for x in range(999999999999999999999))
# print(x)

945 ns ± 78.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [53]:
%%timeit

x = (x**2 for x in range(20000))

for a in x:
    pass

1.96 ms ± 172 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [54]:
%%timeit
# This takes slightly less overall time than the generator equivalent,
# but the generator starts yielding values immediately and uses less memory.

x = [x**2 for x in range(20000)]

for a in x:
    pass

1.55 ms ± 91.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
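To make the memory side of that trade-off concrete, here is a rough comparison using `sys.getsizeof`, which measures only the container object itself (not the integers it refers to). Exact byte counts vary between Python versions and platforms, so no specific numbers are claimed here.

```python
import sys

squares_gen = (x**2 for x in range(20000))
squares_list = [x**2 for x in range(20000)]

# The generator object stays small because it stores no results;
# the list holds references to all 20,000 values.
print(sys.getsizeof(squares_gen), sys.getsizeof(squares_list))

# Both produce the same values when iterated.
print(sum(squares_gen) == sum(squares_list))  # True
```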


#### Summary
When struggling to write a comprehension, don't panic. Start with a `for` loop first and copy-paste your way into a comprehension. Also try to convert any `if-else` structure into a single conditional expression.

Any for loop that looks like this:

In [23]:
def condition_based_on(itm):
    return itm % 2 == 0

old_things = range(2, 20, 3)
new_things = []
for ITEM in old_things:
    if condition_based_on(ITEM):
        new_things.append(ITEM)
print(new_things)

[2, 8, 14]


Can be rewritten into a list comprehension like this:

In [24]:
new_things = [ITEM for ITEM in old_things if condition_based_on(ITEM)]
print(new_things)

[2, 8, 14]


**NOTE**

If you can nudge a for loop until it looks like the ones above, you can rewrite it as a list comprehension.