# List Comprehension in Python

## Transforming Lists in Python

### Using for Loops

In [1]:
squares = []

for number in range(10):
    squares.append(number * number)
print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


### Using map Objects

In [2]:
prices = [0.86, 9.4, 0.35, 5, 8.7]
TAX_RATE = .67
def get_price_tax(price):
    return price * (1 + TAX_RATE)

final_prices = map(get_price_tax, prices)
print(final_prices)

print(list(final_prices))

<map object at 0x110296c10>
[1.4362, 15.698, 0.5844999999999999, 8.35, 14.528999999999998]


### Leveraging List Comprehensions

In [3]:
squares = [number * number for number in range(10)]
print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


new_list = [expression for member in iterable]

Every list comprehension in Python includes three elements:

1. expression is the member itself, a call to a method, or any other valid expression that returns a value. In the example above, the expression number * number is the square of the member value.
2. member is the object or value in the list or iterable. In the example above, the member value is number.
3. iterable is a list, set, sequence, generator, or any other object that can return its elements one at a time. In the example above, the iterable is range(10).
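To make the three parts concrete, here is a small sketch that uses each kind of expression the list above mentions: the member itself, a method call, and an arbitrary expression. The words list is hypothetical. The sketch also builds one result with a for loop to show that a comprehension is shorthand for the append pattern from the first example.

```python
# A hypothetical list of words to transform
words = ["apple", "Banana", "cherry"]

# expression = the member itself (a shallow copy of the list)
copied = [word for word in words]

# expression = a method call on the member
upper = [word.upper() for word in words]

# expression = any other expression that returns a value
lengths = [len(word) * 2 for word in words]

# The same result as `upper`, built with an explicit loop
upper_loop = []
for word in words:
    upper_loop.append(word.upper())

print(copied)               # ['apple', 'Banana', 'cherry']
print(upper)                # ['APPLE', 'BANANA', 'CHERRY']
print(lengths)              # [10, 12, 12]
print(upper == upper_loop)  # True
```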

Because the expression requirement is so flexible, a list comprehension in Python works well in many places where you would use map(). You can rewrite the pricing example with its own list comprehension:


In [4]:
prices = [2.3, 5.7, 6.78, 34.6]
TAX_RATE = .65
def get_price_with_tax(price):
    return price * (1 + TAX_RATE)

final_prices = [get_price_with_tax(price) for price in prices]

print(final_prices)

[3.7949999999999995, 9.405, 11.187, 57.089999999999996]


The main difference between this implementation and map() is that the list comprehension builds and returns a list right away, while map() returns a lazy map object that you have to pass to list() to see all of its values.
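One practical consequence of that difference: a map object is a lazy iterator, so it computes values only when you iterate over it and can be consumed only once, while the list from a comprehension can be reused as often as you like. The sketch below redefines get_price_with_tax() and the prices from the cell above so that it runs on its own.

```python
TAX_RATE = .65

def get_price_with_tax(price):
    return price * (1 + TAX_RATE)

prices = [2.3, 5.7, 6.78, 34.6]

# map() returns a lazy map object; nothing is computed yet
lazy_prices = map(get_price_with_tax, prices)
first_pass = list(lazy_prices)   # consumes the iterator
second_pass = list(lazy_prices)  # already exhausted, so this is empty

# A list comprehension builds the whole list immediately
eager_prices = [get_price_with_tax(price) for price in prices]

print(len(first_pass), len(second_pass))  # 4 0
print(eager_prices == first_pass)         # True
```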

### Filter Values From a List

new_list = [expression for member in iterable if conditional]

In [5]:
sentence = "the rocket came back from mars"
[char for char in sentence if char in "aeiou"]


['e', 'o', 'e', 'a', 'e', 'a', 'o', 'a']

In [6]:
sentence = "The rocket, who was named Ted, came back from Mars because he missed his friends."

def is_consonant(letter):
    vowels = "aeiou"
    return letter.isalpha() and letter.lower() not in vowels

[char for char in sentence if is_consonant(char)]



['T', 'h', 'r', 'c', 'k', 't', 'w', 'h', 'w', 's', 'n', 'm', 'd', 'T', 'd',
 'c', 'm', 'b', 'c', 'k', 'f', 'r', 'm', 'M', 'r', 's', 'b', 'c', 's', 'h',
 'm', 's', 's', 'd', 'h', 's', 'f', 'r', 'n', 'd', 's']

In [7]:
sentence = ("The rocket, who was nAmed TEd, came back from Mars because he missed his friends.")

def remove_vowels(letter):
    # True for any character that is not a vowel, so the comprehension keeps it
    vowels = "aeiou"
    return letter.lower() not in vowels
    # To also drop spaces and punctuation, use:
    # return letter.isalpha() and letter.lower() not in vowels

x = [char for char in sentence if remove_vowels(char)]
x = "".join(x)
print(x)

Th rckt, wh ws nmd Td, cm bck frm Mrs bcs h mssd hs frnds.


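In these two examples, you create more complex filters, is_consonant() and remove_vowels(), and call them in the condition of your list comprehension. Note that you pass the member value char as an argument to each function. In the second example, "".join() turns the filtered list of characters back into a string.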
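A filtering comprehension is equivalent to a for loop with an if statement inside it. The sketch below uses the same is_consonant() logic on a shorter, hypothetical sentence, checks that both forms produce the same list, and joins the result back into a string.

```python
def is_consonant(letter):
    vowels = "aeiou"
    return letter.isalpha() and letter.lower() not in vowels

sentence = "Ted came back"  # hypothetical shorter input

# Comprehension form: keep only members for which the condition is true
consonants = [char for char in sentence if is_consonant(char)]

# Equivalent loop form
consonants_loop = []
for char in sentence:
    if is_consonant(char):
        consonants_loop.append(char)

print(consonants)                     # ['T', 'd', 'c', 'm', 'b', 'c', 'k']
print(consonants == consonants_loop)  # True
print("".join(consonants))            # Tdcmbck
```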

You can place the conditional at the end of the comprehension for basic filtering, but what if you want to change a member value instead of filtering it out? In that case, place the conditional at the beginning of the comprehension, as part of the expression itself. You can do so by taking advantage of the conditional expression:



new_list = [true_expr if conditional else false_expr for member in iterable]

By placing the conditional logic at the beginning of a list comprehension, you can use conditional logic to select from multiple possible output options. For example, if you have a list of prices, then you may want to replace negative prices with 0 and leave the positive values unchanged:

In [8]:
original_prices = [1.25, -9.45, 4, 6.75, -3.87]
[price if price > 0 else 0 for price in original_prices]

[1.25, 0, 4, 6.75, 0]

Here, your expression is a conditional expression, price if price > 0 else 0. This tells Python to output the value of price if the number is positive, but to use 0 otherwise. If this seems overwhelming, then it may be helpful to view the conditional logic as its own function:

In [9]:
def get_price(price):
    return price if price > 0 else 0

[get_price(price) for price in original_prices]

[1.25, 0, 4, 6.75, 0]
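To see the difference between the two positions for if, compare them on the same prices. With if at the end, negative prices are dropped and the list gets shorter; with if ... else at the start, every price produces an output and negative ones are replaced. The "cheap"/"pricey" labels and the threshold of 5 are hypothetical, chosen only to show that both forms can be combined.

```python
original_prices = [1.25, -9.45, 4, 6.75, -3.87]

# Condition at the end: filters members out
positive_only = [price for price in original_prices if price > 0]

# Conditional expression at the start: transforms every member
clipped = [price if price > 0 else 0 for price in original_prices]

print(positive_only)  # [1.25, 4, 6.75]
print(clipped)        # [1.25, 0, 4, 6.75, 0]

# Both forms combined: filter out negatives, then label the rest
labels = ["cheap" if price < 5 else "pricey"
          for price in original_prices if price > 0]
print(labels)         # ['cheap', 'cheap', 'pricey']
```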

### Remove Duplicates With Set and Dictionary Comprehensions

While the list comprehension in Python is a common tool, you can also create set and dictionary comprehensions. A set comprehension is almost exactly the same as a list comprehension in Python. The difference is that set comprehensions make sure the output contains no duplicates. You can create a set comprehension by using curly braces instead of brackets:

In [10]:
quote = "life, uh, finds a way"
{char for char in quote if char in "aeiou"}

{'a', 'e', 'i', 'u'}

Your set comprehension outputs all the unique vowels that it found in quote. Unlike lists, sets don’t guarantee that items will be stored in any particular order, which is why the output doesn’t follow the order of quote: it begins with a, even though the first vowel in quote is i. (Jupyter also sorts set elements when displaying them, which is a display convenience rather than a property of sets.) Dictionary comprehensions are similar, with the additional requirement of defining a key:

In [11]:
{number: number * number for number in range(10)}

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}

To create the dictionary, you use curly braces ({}) as well as a key-value pair (number: number * number) in your expression.
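Both kinds of comprehension drop duplicates, which is what this section’s title refers to. The sketch below compares a list comprehension with a set comprehension on the same quote, and shows that when a dictionary comprehension produces the same key twice, the later value replaces the earlier one. Because sets have no guaranteed order, it uses sorted() for a predictable result; the cities and temps lists are hypothetical.

```python
quote = "life, uh, finds a way"

# List comprehension: keeps every matching vowel, repeats included
vowel_list = [char for char in quote if char in "aeiou"]
# Set comprehension: keeps each distinct vowel once
vowel_set = {char for char in quote if char in "aeiou"}

print(vowel_list)         # ['i', 'e', 'u', 'i', 'a', 'a']
print(sorted(vowel_set))  # ['a', 'e', 'i', 'u']

# Hypothetical readings where "Harare" appears twice
cities = ["Harare", "Bulawayo", "Harare"]
temps = [24, 27, 26]

# Dictionary keys are unique: the second "Harare" overwrites the first
latest = {city: temp for city, temp in zip(cities, temps)}
print(latest)  # {'Harare': 26, 'Bulawayo': 27}

# Note: {} on its own is an empty dict; use set() for an empty set
print(type({}).__name__, type(set()).__name__)  # dict set
```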

### Assign Values With the Walrus Operator

Python 3.8 introduced the assignment expression, also known as the walrus operator. To understand how you can use it, consider the following example.

Say you need to make twenty requests to an API that will return temperature data, and you only want to keep results that are 100 degrees Fahrenheit or higher. Assume that each request will return different data. In this case, the pattern expression for member in iterable if conditional provides no way for the conditional to assign data to a variable that the expression can access.

You need the temperature in both the expression and the conditional, so this is a challenge. The walrus operator (:=) solves this problem. It lets you evaluate an expression and assign its result to a variable at the same time. The following example shows how this works, using get_weather_data() to generate fake weather data:

In [12]:
import random

def get_weather_data():
    return random.randrange(90,110)

[temp for _ in range(20) if (temp := get_weather_data()) >= 100]

[106, 105, 104, 100, 106, 103, 102, 109]

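Note that the walrus operator needs to be in the conditional part of your comprehension: for each member, the condition is evaluated before the expression, so temp must be assigned there to be available when the expression runs. You won’t often need to use an assignment expression inside a list comprehension, but it’s a useful tool to have at your disposal when necessary.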
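Because random data makes the output unpredictable, the sketch below replaces get_weather_data() with a fixed list of hypothetical readings, served one at a time by a hypothetical helper, make_source(). It also shows the alternative without the walrus operator: the condition and the expression each call the data source, so every kept member costs two requests and the value kept is not the one that was tested. Assignment expressions require Python 3.8 or later.

```python
# Hypothetical fixed readings standing in for the API responses
readings = [98, 104, 101, 95, 110, 99]

def make_source(values):
    """Hypothetical helper: return a function that yields the next reading per call."""
    iterator = iter(values)
    return lambda: next(iterator)

# With the walrus operator: one call per member, value reused in the output
get_reading = make_source(readings)
hot = [temp for _ in range(6) if (temp := get_reading()) >= 100]
print(hot)  # [104, 101, 110]

# Without it: the condition tests one reading, the expression fetches another,
# so 104 is tested but 101 is kept, and 110 is tested but 99 is kept
get_reading = make_source(readings)
mismatched = [get_reading() for _ in range(4) if get_reading() >= 100]
print(mismatched)  # [101, 99]
```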

### Watch Out for Nested Comprehensions

You can nest comprehensions to create combinations of lists, dictionaries, and sets within a collection. For example, say a climate laboratory is tracking the high temperature in four different cities for the first week of June. A convenient data structure for storing this data is a Python list nested within a dictionary. You can create the data using nested comprehensions:

In [13]:
cities = ["Harare", "Bulawayo", "Masvingo", "Gweru"]
{city: [0 for _ in range(7)] for city in cities}

{'Harare': [0, 0, 0, 0, 0, 0, 0],
 'Bulawayo': [0, 0, 0, 0, 0, 0, 0],
 'Masvingo': [0, 0, 0, 0, 0, 0, 0],
 'Gweru': [0, 0, 0, 0, 0, 0, 0]}

You create the outer dictionary with a dictionary comprehension. The expression is a key-value pair whose value is yet another comprehension. This code quickly generates a list of seven placeholder values, one per day, for each city in cities.
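One reason to use an inner comprehension here is that each city gets its own independent list. The sketch below contrasts this with dict.fromkeys(), which stores the same list object under every key, so a change for one city shows up for all of them. The reading of 31 is hypothetical.

```python
cities = ["Harare", "Bulawayo", "Masvingo", "Gweru"]

# Nested comprehension: a fresh list is built for every city
weekly = {city: [0 for _ in range(7)] for city in cities}
weekly["Harare"][0] = 31          # record the first day's high for Harare only
print(weekly["Bulawayo"][0])      # 0

# dict.fromkeys() reuses one list object for every key
shared = dict.fromkeys(cities, [0] * 7)
shared["Harare"][0] = 31
print(shared["Bulawayo"][0])      # 31, because every key shares the same list
```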

Nested lists are a common way to create matrices, which you’ll often use for mathematical purposes. Take a look at the code block below:

In [14]:
[[number for number in range(5)] for _ in range(6)]

[[0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4],
 [0, 1, 2, 3, 4]]

The outer list comprehension [... for _ in range(6)] creates six rows, while the inner list comprehension [number for number in range(5)] fills each of these rows with values.

So far, the purpose of each nested comprehension is pretty intuitive. However, there are other situations, such as flattening lists, where the logic arguably makes your code more confusing. Take this example, which uses a nested list comprehension to flatten a matrix:

In [15]:
matrix = [
    [0,0,0],
    [1,1,1],
    [2,2,2],
]

[number for row in matrix for number in row]

[0, 0, 0, 1, 1, 1, 2, 2, 2]

The code to flatten the matrix is concise, but it may not be so intuitive to understand how it works. On the other hand, if you used for loops to flatten the same matrix, then your code would be much more straightforward to understand:

In [16]:
matrix = [
    [0,0,0],
    [1,1,1],
    [2,2,2],
]

flat = []
for row in matrix:
    for number in row:
        flat.append(number)

print(flat)

[0, 0, 0, 1, 1, 1, 2, 2, 2]
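A helpful way to read the comprehension version is that its for clauses appear in the same order as the nested loops above: the outer loop first, the inner loop second, with the appended value moved to the front. The sketch below annotates that correspondence and shows that an if clause goes where the matching if statement would sit inside the loops.

```python
matrix = [
    [0, 0, 0],
    [1, 1, 1],
    [2, 2, 2],
]

# for row in matrix:           -> first for clause (outer loop)
#     for number in row:       -> second for clause (inner loop)
#         flat.append(number)  -> expression, moved to the front
flat = [number for row in matrix for number in row]

# An if clause sits where the matching if statement would go
flat_nonzero = [number for row in matrix for number in row if number != 0]

print(flat)          # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(flat_nonzero)  # [1, 1, 1, 2, 2, 2]
```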


### Choose Generators for Large Datasets

A list comprehension in Python works by loading the entire output list into memory. For small or even medium-sized lists, this is generally fine. If you want to sum the squares of the first one thousand integers, then a list comprehension will solve this problem admirably:

In [32]:
sum([number * number for number in range(1000)])

332833500

But what if you wanted to sum the squares of the first billion integers? With a list comprehension, Python would first have to build a list of one billion integers, which likely needs more memory than your machine has available. If you tried it, your computer could become unresponsive, slow to a crawl, or even crash.

When the size of a list becomes problematic, it’s often helpful to use a generator instead of a list comprehension. A generator doesn’t create a single, large data structure in memory; instead, it returns an iterator that produces values one at a time. Your code can ask for the next value as many times as necessary, or until the sequence is exhausted, while only storing a single value at a time.

If you sum the first billion squares with a generator, then your program will likely run for a while, but it shouldn’t cause your computer to freeze. In the example below, you use a generator:

In [17]:
# Commented out because it takes a long time to finish; uncomment to try it.
# sum(number * number for number in range(1_000_000_000))

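You can tell this is a generator expression because it isn’t inside brackets or curly braces. Generator expressions are normally written inside parentheses, but when one is the only argument to a function call, as it is here with sum(), the call’s own parentheses are enough.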

The example above still requires a lot of work, but it performs the operations lazily. Because of lazy evaluation, your code only calculates values when they’re explicitly requested. After the generator yields a value, sum() adds it to the running total, discards it, and asks for the next value. This cycle repeats until the generator is exhausted, which keeps the memory footprint small.
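To see lazy evaluation directly, you can pull values from a generator by hand with next() and compare the size of the generator object with an equivalent list using sys.getsizeof(). The exact byte counts depend on the Python version and platform, and getsizeof() measures only the container itself, not the integers it refers to, so treat the printed sizes as a rough comparison. The range of 100,000 is an arbitrary size chosen to keep the sketch quick.

```python
import sys

# A generator expression: values are computed only when requested
squares_gen = (number * number for number in range(100_000))
print(next(squares_gen))  # 0
print(next(squares_gen))  # 1
print(next(squares_gen))  # 4

# A list comprehension computes and stores every value up front
squares_list = [number * number for number in range(100_000)]

# Sizes of the container objects only (not the int objects they reference)
print(sys.getsizeof(squares_gen))   # small, independent of the range length
print(sys.getsizeof(squares_list))  # grows with the number of elements

# sum() pulls the remaining values one at a time; 0, 1, and 4 are already used
print(sum(squares_gen) == sum(squares_list) - (0 + 1 + 4))  # True
```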

The map() function also operates lazily, meaning memory won’t be an issue if you choose to use it in this case:

In [None]:
# Also slow to finish; uncomment to try it.
# sum(map(lambda number: number * number, range(1_000_000_000)))

### Profile to Optimize Performance

So, which approach is faster? Should you use list comprehensions or one of their alternatives? Rather than adhere to a single rule that’s true in all cases, it’s more useful to ask yourself whether or not performance matters in your specific circumstance. If not, then it’s usually best to choose whatever approach leads to the cleanest code!

If you’re in a scenario where performance is important, then it’s typically best to profile different approaches and listen to the data. The timeit module from the standard library is useful for timing how long chunks of code take to run. You can use timeit to compare the runtime of map(), for loops, and list comprehensions:

In [18]:
import random
import timeit

TAX_RATE = .08
PRICES = [random.randrange(100) for _ in range(100_000)]

def get_price(price):
    return price * (1 + TAX_RATE)

def get_prices_with_map():
    return list(map(get_price, PRICES))

def get_prices_with_comprehension():
    return [get_price(price) for price in PRICES]

def get_prices_with_loop():
    prices = []
    for price in PRICES:
        prices.append(get_price(price))
    return prices

# Total seconds for 100 calls of each function
a = timeit.timeit(get_prices_with_map, number=100)
b = timeit.timeit(get_prices_with_comprehension, number=100)
c = timeit.timeit(get_prices_with_loop, number=100)

print(a)  # map()
print(b)  # list comprehension
print(c)  # for loop

1.2152199129999985
1.6325756929999784
2.290300233000039


Here, you define three functions that each use a different approach for creating a list. Then, you tell timeit to run each of those functions 100 times, and timeit returns the total time it took to run those 100 executions.

In this run, the biggest difference is between the loop-based approach and map(), with the loop taking roughly 90 percent longer to execute (about 2.3 seconds versus 1.2 seconds). Exact timings will vary from machine to machine and from run to run. Whether or not this matters depends on the needs of your application.
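A single timeit run can be skewed by other activity on the machine. A common refinement, sketched below, is timeit.repeat(), which repeats the whole measurement several times; the minimum of those totals is usually the most stable figure to compare. The smaller PRICES list and the repeat and number values are arbitrary choices made to keep the sketch quick, not recommended settings.

```python
import random
import timeit

TAX_RATE = .08
PRICES = [random.randrange(100) for _ in range(10_000)]

def get_price(price):
    return price * (1 + TAX_RATE)

def get_prices_with_map():
    return list(map(get_price, PRICES))

def get_prices_with_comprehension():
    return [get_price(price) for price in PRICES]

def get_prices_with_loop():
    prices = []
    for price in PRICES:
        prices.append(get_price(price))
    return prices

# Sanity check: all three approaches must produce the same list
assert get_prices_with_map() == get_prices_with_comprehension() == get_prices_with_loop()

results = {}
for func in (get_prices_with_map, get_prices_with_comprehension, get_prices_with_loop):
    # 5 independent measurements, each timing 10 calls of func
    timings = timeit.repeat(func, repeat=5, number=10)
    results[func.__name__] = min(timings)

# Print the approaches from fastest to slowest
for name, best in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: {best:.4f} s")
```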