# Lists and list comprehension

This notebook introduces one of Python's most useful features: *list comprehension*, which lets you iterate concisely without writing an explicit loop.

In [1]:
import pandas as pd

### Lists

We've seen lists a few times throughout this workshop, but we haven't really talked much about them.

A **List** is an object that can hold multiple values. 

A list can be defined using `[]`, with each value separated by a comma.

For example, below we create a list of four integers.

In [2]:
x = [2, 5, 18, 22]

In [3]:
x

[2, 5, 18, 22]

Since a list is an object in Python, we can check its type using the `type()` function:

In [4]:
type(x)

list

You can also ask how many entries are in a list using the `len()` function.

In [5]:
len(x)

4

Lists can contain multiple identical entries.

In [63]:
y = [1, 1, 2, 2, 2, 3]
y

[1, 1, 2, 2, 2, 3]

### Lists containing different types of objects

Lists can contain entries of multiple types too. For example, below we create a list of four values: a string, an integer, a float, and a boolean.

In [6]:
['hello', 3, 5.2, True]

['hello', 3, 5.2, True]

Lists can contain complex objects, in addition to single values. The following list contains three entries:

1. A list of two strings `['a', 'b']`

2. The integer `1`

3. A pandas Series containing two values: `apple` and `banana`

In [7]:
[['a', 'b'], 1, pd.Series(["apple", "banana"])]

[['a', 'b'],
 1,
 0     apple
 1    banana
 dtype: object]
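
A small sketch may help show that a nested object counts as a single entry of the outer list. The name `nested` is made up for this illustration, and the pandas Series is swapped for a plain string so the snippet runs on its own.

```python
# `nested` is a hypothetical name used only for this illustration
nested = [['a', 'b'], 1, 'apple']

# the inner list counts as one entry, so the outer list has three entries
n_entries = len(nested)
print(n_entries)  # 3

# each entry keeps its own type
for entry in nested:
    print(type(entry).__name__)  # list, then int, then str
```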

### Manipulating lists

What do you think the following code will do?

In [8]:
[1, 2, 3] + [3, 3, 3]

[1, 2, 3, 3, 3, 3]

You may have assumed that it would add each entry in each list element-wise (which is what would happen if we were adding pandas Series objects together), but instead it has *concatenated* the two lists together (which is similar to the behavior of string values).

That is, **lists are _not_ vectorized**.

Based on this information, what do you think will happen when we run the following code?

In [9]:
[1, 2, 3] * 3

[1, 2, 3, 1, 2, 3, 1, 2, 3]
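
To get the element-wise behavior you might have expected, each pair of values has to be combined explicitly. The sketch below is one way to do that, using a plain `for` loop and the built-in `zip()` function. The names `left`, `right`, and `sums` are made up for this illustration.

```python
left = [1, 2, 3]
right = [3, 3, 3]

# + concatenates the two lists
concatenated = left + right  # [1, 2, 3, 3, 3, 3]

# * repeats the list
repeated = left * 3  # [1, 2, 3, 1, 2, 3, 1, 2, 3]

# element-wise addition: zip() pairs up the entries, and we add each pair
sums = []
for left_value, right_value in zip(left, right):
    sums.append(left_value + right_value)

print(sums)  # [4, 5, 6]
```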

### Slicing/subsetting lists

Similar to how we extract values from a pandas Series or DataFrame, we can extract values from a list using the `[]` operator.

In [10]:
x

[2, 5, 18, 22]

In [11]:
# extract the third entry from x (remember that Python starts counting at 0)
x[2]

18

How might you try to extract multiple values from `x`?

In [12]:
# first guess
x[2, 3]

TypeError: list indices must be integers or slices, not tuple

In [None]:
# second guess 
x[[2, 3]]

TypeError: list indices must be integers or slices, not list

Unfortunately, you cannot extract multiple values from a list by passing several indices to the `[]` operator. Instead, you can place a *slice* inside the brackets to extract a range of values.

A slice uses the sequencing syntax `start:stop`. Recall that the `stop` value is not included in the output.

In [13]:
# Extract the first two entries from x
x[0:2]

[2, 5]
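
A slice can only describe a regular range of positions. For an arbitrary set of positions, such as the first and last entries, one option is to look up each index in turn. The sketch below does this with a list comprehension, which is covered later in this notebook. The names `values`, `positions`, and `picked` are hypothetical, and `values` is a fresh list holding the same numbers as `x`.

```python
values = [2, 5, 18, 22]

# the positions we want to extract (first and last entries)
positions = [0, 3]

# look up each position one at a time and collect the results
picked = [values[i] for i in positions]
print(picked)  # [2, 22]
```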

Leaving the `stop` index blank extracts all values from the `start` index to the end of the list.

In [14]:
# extract all entries except the first entry
x[1:]

[5, 18, 22]

Leaving the `start` index blank extracts all values from the beginning of the list up to (but not including) the `stop` index:

In [15]:
# Extract the first two entries from x
x[:2]

[2, 5]

You can also use the `start:stop:step` syntax to take every `step`-th value from the list.

In [16]:
# Extract every second entry starting from the first entry (index 0) and stopping before the fourth entry (index 3)
x[0:3:2]

[2, 18]

In [17]:
# Extract every second entry starting from the first entry (index 0)
# start:stop:step -- leaving stop blank will go to the end, i.e., start::step
x[0::2]

[2, 18]
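
The slicing rules above can be collected into one sketch. It uses a separate, made-up list called `letters` so that `x` is left untouched, and the comments show the value each slice evaluates to.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

first_three = letters[0:3]      # ['a', 'b', 'c'] (index 3 is excluded)
also_first_three = letters[:3]  # a blank start means "from the beginning"
from_fourth = letters[3:]       # ['d', 'e', 'f'] (blank stop means "to the end")
every_second = letters[::2]     # ['a', 'c', 'e']
middle_step = letters[1:5:2]    # ['b', 'd'] (indices 1 and 3)

# slicing returns a new list, so the original still has all six entries
print(len(letters))  # 6
```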

### Negative indexing

In [18]:
# look at x again
x

[2, 5, 18, 22]

What do you think the following code will return?

In [19]:
x[-2]

18

Negative indexing involves indexing from the *end* of the list. So `x[-1]` gives the last value in the list, `x[-2]` gives the second-to-last value in the list, and so on.

The following code will return the list in reverse order. Can you explain why?

In [20]:
x[::-1]

[22, 18, 5, 2]
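
One way to read a negative index is that Python adds the length of the list to it, so `x[-k]` refers to the same entry as `x[len(x) - k]`. A negative `step` walks through the list backwards, which is why `[::-1]` returns the entries in reverse order. The sketch below checks both statements on a made-up list called `nums`.

```python
nums = [2, 5, 18, 22]

last = nums[-1]                     # 22
same_as_last = nums[len(nums) - 1]  # 22, the same entry as nums[-1]
second_to_last = nums[-2]           # 18

# a step of -1 moves backwards through the whole list
reversed_copy = nums[::-1]          # [22, 18, 5, 2]

# the built-in reversed() gives the same order; list() collects it into a list
also_reversed = list(reversed(nums))
print(reversed_copy == also_reversed)  # True
```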

### Updating values with slicing

In [21]:
# take a look at x
x

[2, 5, 18, 22]

In [24]:
# replace the second and third entries with apple and banana
x[1:3] = ['apple', 'banana']

In [23]:
# take a look at x now
x

[2, 'apple', 'banana', 22]
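
Two related details are worth a quick sketch: a single entry can be replaced by assigning to one index, and the replacement in a slice assignment does not need to have the same length as the slice it replaces. The list name `fruit_mix` is made up for this illustration.

```python
fruit_mix = [2, 5, 18, 22]

# replace a single entry by assigning to its index
fruit_mix[0] = 'kiwi'
print(fruit_mix)  # ['kiwi', 5, 18, 22]

# replace two entries (indices 1 and 2) with three new entries
fruit_mix[1:3] = ['apple', 'banana', 'cherry']
print(fruit_mix)  # ['kiwi', 'apple', 'banana', 'cherry', 22]

# the list has grown from four entries to five
print(len(fruit_mix))  # 5
```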

### Exercise

Consider the following list:

In [26]:
# define a list called `a`
a = [3, 1, 4, 9, 10, 3]

Write some code to do the following:

1. Extract the fifth entry in the list

1. Extract the last four entries in the list

1. Extract the first four entries in the list

1. Extract the third-last entry in the list 

1. Extract every second entry in the list, starting from the second entry

1. Return the list with its entries in reverse order

In [29]:
# extract the fifth entry
a[4]

10

In [31]:
# extract the last four entries in the list
a[-4:]
# or a[2:6] -- but this is less generalizable

[4, 9, 10, 3]

In [35]:
# extract the first four entries in the list
a[:4]

[3, 1, 4, 9]

In [36]:
# extract the third last entry in the list
a[-3]

9

In [37]:
# extract every second entry starting from the second entry
a[1::2]

[1, 9, 3]

In [38]:
# return the list entries in reverse order
a[::-1]

[3, 10, 9, 4, 1, 3]

### List comprehension

Recall that if we wanted to add `1` to each value in `a`, the following code would not work (because lists are not vectorized):

In [39]:
a + 1

TypeError: can only concatenate list (not "int") to list

One way to conduct element-wise operations on a list is to use list comprehension (another is to use a `for` loop).

In [None]:
# list comprehension for adding 1 to each value in a
[i + 1 for i in a]

The general syntax for list comprehension is: 

`[expression for item in list]`

where `expression` is the operation you want to conduct on each item in the list.

In [40]:
# list comprehension for squaring each value in a
[i**2 for i in a]

[9, 1, 16, 81, 100, 9]
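
For comparison, here is a sketch of the same squaring operation written as a `for` loop. The loop needs an empty list and an explicit `.append()` call, while the list comprehension does the same work in one line. The names `squares_loop` and `squares_comp` are hypothetical, and `a` is redefined so the snippet runs on its own.

```python
a = [3, 1, 4, 9, 10, 3]

# for-loop version: start with an empty list and append each result
squares_loop = []
for i in a:
    squares_loop.append(i**2)

# list comprehension version: the same operation in one line
squares_comp = [i**2 for i in a]

print(squares_loop == squares_comp)  # True
```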

### Exercise

Use list comprehension to add the text `', Australia'` to the end of each string in the following list:

In [70]:
my_list = ['Sydney', 'Melbourne', 'Brisbane', 'Adelaide', 'Perth']

In [72]:
# solution: append ', Australia' to each city name
[city + ', Australia' for city in my_list]

['Sydney, Australia',
 'Melbourne, Australia',
 'Brisbane, Australia',
 'Adelaide, Australia',
 'Perth, Australia']

### Gapminder list comprehension example

Let's look at a more interesting example.

In [50]:
gapminder = pd.read_csv('data/gapminder.csv')
gapminder_clean = gapminder.copy()

Suppose that I wanted to change all of the column names to uppercase.

In [51]:
# print out the original columns
gapminder_clean.columns

Index(['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap'], dtype='object')

In [52]:
# I could do this manually, but that's not very efficient
gapminder_clean.columns = ['COUNTRY', 'CONTINENT', 'YEAR', 'LIFEEXP', 'POP', 'GDPPERCAP']
gapminder_clean

Unnamed: 0,COUNTRY,CONTINENT,YEAR,LIFEEXP,POP,GDPPERCAP
0,Afghanistan,Asia,1952,28.801,8425333,779.445314
1,Afghanistan,Asia,1957,30.332,9240934,820.853030
2,Afghanistan,Asia,1962,31.997,10267083,853.100710
3,Afghanistan,Asia,1967,34.020,11537966,836.197138
4,Afghanistan,Asia,1972,36.088,13079460,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,Africa,1987,62.351,9216418,706.157306
1700,Zimbabwe,Africa,1992,60.377,10704340,693.420786
1701,Zimbabwe,Africa,1997,46.809,11404948,792.449960
1702,Zimbabwe,Africa,2002,39.989,11926563,672.038623


In [53]:
# reset to the original columns
gapminder_clean = gapminder.copy()
gapminder_clean

Unnamed: 0,country,continent,year,lifeExp,pop,gdpPercap
0,Afghanistan,Asia,1952,28.801,8425333,779.445314
1,Afghanistan,Asia,1957,30.332,9240934,820.853030
2,Afghanistan,Asia,1962,31.997,10267083,853.100710
3,Afghanistan,Asia,1967,34.020,11537966,836.197138
4,Afghanistan,Asia,1972,36.088,13079460,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,Africa,1987,62.351,9216418,706.157306
1700,Zimbabwe,Africa,1992,60.377,10704340,693.420786
1701,Zimbabwe,Africa,1997,46.809,11404948,792.449960
1702,Zimbabwe,Africa,2002,39.989,11926563,672.038623


Instead I could use list comprehension together with the `.upper()` string method to do this in one line of code.

In [55]:
# use a list comprehension to create a list of upper case column names
[name.upper() for name in gapminder_clean.columns]

['COUNTRY', 'CONTINENT', 'YEAR', 'LIFEEXP', 'POP', 'GDPPERCAP']

In [57]:
# update the column names with the upper case names
gapminder_clean.columns = [name.upper() for name in gapminder_clean.columns]
gapminder_clean

Unnamed: 0,COUNTRY,CONTINENT,YEAR,LIFEEXP,POP,GDPPERCAP
0,Afghanistan,Asia,1952,28.801,8425333,779.445314
1,Afghanistan,Asia,1957,30.332,9240934,820.853030
2,Afghanistan,Asia,1962,31.997,10267083,853.100710
3,Afghanistan,Asia,1967,34.020,11537966,836.197138
4,Afghanistan,Asia,1972,36.088,13079460,739.981106
...,...,...,...,...,...,...
1699,Zimbabwe,Africa,1987,62.351,9216418,706.157306
1700,Zimbabwe,Africa,1992,60.377,10704340,693.420786
1701,Zimbabwe,Africa,1997,46.809,11404948,792.449960
1702,Zimbabwe,Africa,2002,39.989,11926563,672.038623


### Exercise

Using list comprehension, write some code to extract the first four letters of each country name in the country column of gapminder (advanced challenge: remove duplicated country names).

In [69]:
[country[0:4] for country in gapminder['country'].drop_duplicates()]

['Afgh',
 'Alba',
 'Alge',
 'Ango',
 'Arge',
 'Aust',
 'Aust',
 'Bahr',
 'Bang',
 'Belg',
 'Beni',
 'Boli',
 'Bosn',
 'Bots',
 'Braz',
 'Bulg',
 'Burk',
 'Buru',
 'Camb',
 'Came',
 'Cana',
 'Cent',
 'Chad',
 'Chil',
 'Chin',
 'Colo',
 'Como',
 'Cong',
 'Cong',
 'Cost',
 'Cote',
 'Croa',
 'Cuba',
 'Czec',
 'Denm',
 'Djib',
 'Domi',
 'Ecua',
 'Egyp',
 'El S',
 'Equa',
 'Erit',
 'Ethi',
 'Finl',
 'Fran',
 'Gabo',
 'Gamb',
 'Germ',
 'Ghan',
 'Gree',
 'Guat',
 'Guin',
 'Guin',
 'Hait',
 'Hond',
 'Hong',
 'Hung',
 'Icel',
 'Indi',
 'Indo',
 'Iran',
 'Iraq',
 'Irel',
 'Isra',
 'Ital',
 'Jama',
 'Japa',
 'Jord',
 'Keny',
 'Kore',
 'Kore',
 'Kuwa',
 'Leba',
 'Leso',
 'Libe',
 'Liby',
 'Mada',
 'Mala',
 'Mala',
 'Mali',
 'Maur',
 'Maur',
 'Mexi',
 'Mong',
 'Mont',
 'Moro',
 'Moza',
 'Myan',
 'Nami',
 'Nepa',
 'Neth',
 'New ',
 'Nica',
 'Nige',
 'Nige',
 'Norw',
 'Oman',
 'Paki',
 'Pana',
 'Para',
 'Peru',
 'Phil',
 'Pola',
 'Port',
 'Puer',
 'Reun',
 'Roma',
 'Rwan',
 'Sao ',
 'Saud',
 'Sene',
 