# List Comprehensions


Lesson Goals

    Learn about the concept of combining storage with iteration.
    Learn how list comprehensions combine storage and iteration in an efficient way.
    Learn how to incorporate conditional logic into list comprehensions.
    Learn how to construct list comprehensions with multiple for loops.
    Learn how to add conditional logic to multi-loop list comprehensions.
    Learn some use cases for list comprehensions.

Introduction

As you continue improving your Python programming skills, you will encounter useful ways to combine some of the basic concepts you learned earlier in the program. In this lesson, we will learn about list comprehensions. List comprehensions combine the concepts of storage in data structures, iteration, and even conditional logic into an efficient form.

# Combining Storage and Iteration

The main idea behind list comprehensions is iterative storage. In some of the previous lessons, we have encountered scenarios where we had to create an empty list and iterate through a for loop appending a value to the list at the end of each iteration.

In [1]:
lst = []

for i in range(10):
    lst.append(i)

print(lst)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In this example, we created an empty list and for every number in the range, we appended that number to the list as we encountered it.


# List Comprehensions

A list comprehension streamlines that logic into a single line of Python code. Instead of creating an empty list and filling it with the append method, we write the loop inside square brackets, and the values it produces are collected into a new list. Below is what the previous example looks like as a list comprehension.

In [2]:
lst = [i for i in range(10)]
print(lst)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


As you can see, we get the same result from less code, which is generally a good thing. Inside the square brackets, the expression before the for keyword (here just i) is evaluated once for each value in the range, and each result becomes an element of the list.
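The expression before the for keyword does not have to be the loop variable itself. As a small sketch (the names squares and labels are illustrative and do not come from the lesson), any expression involving the loop variable can be used:

```python
# The expression before "for" can transform each value, not just repeat it.
squares = [i ** 2 for i in range(5)]
print(squares)  # [0, 1, 4, 9, 16]

# Any expression works, including function calls and string concatenation.
labels = ["item_" + str(i) for i in range(3)]
print(labels)  # ['item_0', 'item_1', 'item_2']
```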



# Adding Conditions to List Comprehensions

We can also incorporate conditional statements into our list comprehensions. A condition is placed after the for loop it applies to, and it excludes any value that does not satisfy it. The example below returns every value in a range, but only if the value is greater than or equal to 5.

In [3]:
lst = [i for i in range(20) if i >= 5]
print(lst)

[5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
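For comparison, the filtered comprehension above corresponds to a regular loop with an if statement inside it. The sketch below uses the illustrative names loop_result and comp_result to show that both forms build the same list:

```python
# Loop form: the if statement sits inside the for loop.
loop_result = []
for i in range(20):
    if i >= 5:
        loop_result.append(i)

# Comprehension form: the same condition follows the for clause.
comp_result = [i for i in range(20) if i >= 5]

print(loop_result == comp_result)  # True
```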


# Multiple For-Loops in List Comprehensions

Sometimes we encounter more complex situations where multiple loops are required. For example, suppose you had some nested lists whose values you wanted to unpack into a single list. Using regular lists and for loops, we would do that as follows.

In [4]:
lst_lst = [[1,2,3], [4,5,6], [7,8,9]]

lst = []

for x in lst_lst:
    for y in x:
        lst.append(y)

print(lst)

[1, 2, 3, 4, 5, 6, 7, 8, 9]


List comprehensions handle these cases compactly as well. To construct a list comprehension with multiple for loops, we write the expression for the final result first, followed by the series of loops. This may seem a little confusing because we are asking up front for the result that comes at the end of the last loop, but after writing them for a while, you will find that it helps you think about the result you want and then how to get it.

In [5]:
lst_lst = [[1,2,3], [4,5,6], [7,8,9]]

lst = [y for x in lst_lst for y in x]
print(lst)

[1, 2, 3, 4, 5, 6, 7, 8, 9]


Note that the order of the for loops within the list comprehension is exactly the same as the order in which you would write them as regular for loops. You may be tempted to write it backwards ([y for y in x for x in lst_lst]), but that will not return the result we want: the first loop tries to iterate over x before x has been assigned, so Python raises a NameError, or silently reuses an x left over from earlier code. So just remember that the for loop order is the same within a list comprehension as it would be outside the list comprehension.
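The difference between the two orderings can be checked directly. The sketch below assumes a fresh session in which no variable named x already exists; the names flat and backwards are illustrative:

```python
lst_lst = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Correct order: outer loop first, inner loop second.
flat = [y for x in lst_lst for y in x]
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Backwards order: "for y in x" is evaluated first, before x exists.
try:
    backwards = [y for y in x for x in lst_lst]
except NameError as err:
    backwards = None
    print("NameError:", err)
```

If a variable named x happens to exist already (for example, left over from an earlier for loop), the backwards version runs without an error but produces a different list, which is harder to notice.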


# Adding Conditions to Multi-Loop List Comprehensions

List comprehensions can handle quite a bit of complexity. We can even incorporate conditional logic into list comprehensions that contain multiple for loops. For example, suppose we had a list of lists such as the one in the example below and we wanted to extract the even numbers that appear in nested lists with fewer than 4 elements. We could achieve this with a list comprehension containing two for loops and two conditional statements as follows.

In [6]:
lst_lst = [[1,2,3,4,5], [6,7,8], [9,10]]

lst = [y for x in lst_lst if len(x) < 4 for y in x if y % 2 == 0]
print(lst)

[6, 8, 10]


Note that the condition for each for loop comes directly after the for loop it applies to. This ability to handle complex logic with such efficiency makes list comprehensions a very powerful tool to have in your arsenal.
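Written out as regular loops, each condition becomes an if statement nested directly under the for loop it belongs to. The following sketch (the names evens and comp are illustrative) shows the expanded form alongside the comprehension:

```python
lst_lst = [[1, 2, 3, 4, 5], [6, 7, 8], [9, 10]]

evens = []
for x in lst_lst:
    if len(x) < 4:          # condition on the outer loop
        for y in x:
            if y % 2 == 0:  # condition on the inner loop
                evens.append(y)

# The comprehension lists the same clauses in the same top-to-bottom order.
comp = [y for x in lst_lst if len(x) < 4 for y in x if y % 2 == 0]

print(evens)          # [6, 8, 10]
print(evens == comp)  # True
```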



# Practical Uses for List Comprehensions


# Reading Multiple Files

One use case where list comprehensions come in handy is when data is split across multiple files. For example, suppose we had a data directory that contained several CSV files (among other files), each with the same columns for separate groups or divisions. We could use a list comprehension with an endswith('.csv') condition to get a list of just the CSV files in that directory. We could then use another list comprehension to have Pandas read each of those files, and the pd.concat function to combine them all into a single data set that we can analyze, as follows.

In [5]:
import os
import pandas as pd

file_list = [f for f in os.listdir('../data') if f.endswith('.csv')]
data_sets = [pd.read_csv(os.path.join('../data', f)) for f in file_list]
data = pd.concat(data_sets, axis=0)
print(data)

          0         1         2         3         4         5         6  \
0  0.734751  0.195362  0.734309  0.598184  0.763433  0.263434  0.868066   
1  0.772607  0.445391  0.249642  0.787922  0.598583  0.827238  0.624126   
2  0.226428  0.268764  0.694262  0.622335  0.063843  0.122683  0.815625   
3  0.362748  0.495430  0.113876  0.594149  0.612522  0.625204  0.864050   
4  0.033415  0.340433  0.464971  0.363737  0.025815  0.434129  0.415163   
0  0.276827  0.260054  0.942397  0.113187  0.781355  0.475740  0.152061   
1  0.995885  0.158381  0.244274  0.962163  0.651900  0.930665  0.577190   
2  0.641917  0.821055  0.392437  0.782617  0.510762  0.428320  0.017324   
3  0.806532  0.569258  0.148175  0.809987  0.459632  0.735762  0.730664   
4  0.311185  0.501165  0.365979  0.782807  0.776795  0.797199  0.791946   
0  0.948664  0.215285  0.918270  0.599951  0.755120  0.971609  0.103190   
1  0.163236  0.803926  0.916655  0.775234  0.644890  0.701362  0.910208   
2  0.934136  0.031410  0.

We did all this with three lines of Python code, which is pretty amazing!
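The file-filtering step can be tried without any real data. The sketch below builds a temporary directory containing made-up file names (east.csv, west.csv, and notes.txt are placeholders, not files from the lesson) and applies the same endswith condition. Because os.listdir returns names in arbitrary order, the list is wrapped in sorted() so that the result is reproducible:

```python
import os
import tempfile

# Build a throwaway directory with a mix of file types (hypothetical names).
with tempfile.TemporaryDirectory() as data_dir:
    for name in ["east.csv", "west.csv", "notes.txt"]:
        with open(os.path.join(data_dir, name), "w") as handle:
            handle.write("placeholder\n")

    # Keep only the CSV files; sorted() gives a stable, alphabetical order.
    csv_files = sorted([f for f in os.listdir(data_dir) if f.endswith(".csv")])

print(csv_files)  # ['east.csv', 'west.csv']
```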



# Selecting Data Frame Columns Based on Conditions

Another use case is selecting data frame columns based on a condition that they have in common. The example below reads our vehicles data file (mtcars.csv), uses the select_dtypes method to keep only the numeric columns, and then uses a list comprehension to return just the columns that have a mean greater than 15.

In [12]:
data = pd.read_csv('../mtcars.csv')

# select_dtypes is the public pandas API for filtering columns by dtype
numeric_data = data.select_dtypes(include='number')
selected_columns = [col for col in numeric_data if data[col].mean() > 15]
print(selected_columns)

['mpg', 'disp', 'hp', 'qsec']
