
## Basics of Lists and Iteration in Python

Anthony Gatti (Email: <mailto:anthony.j.gatti@gmail.com>)

This is a presentation given to the Ascension Python User's Group on March 21, 2017. This is a general introduction to lists and iteration over lists in Python. Please consult the [Python documentation](https://docs.python.org/3/index.html) for this and much more information - it is far superior and more thorough than what is below (and was written by the people who created this stuff).

The best book on this topic (and many others) I have found is [Fluent Python](http://shop.oreilly.com/product/0636920032519.do) by Luciano Ramalho. The classic on the topic is [The Python Cookbook](http://shop.oreilly.com/product/9780596001674.do) by Alex Martelli and David Ascher.

This [Jupyter notebook](http://jupyter.org/) runs Python 3.5.

## 1. Lists

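Lists are the bedrock data structure in Python. They play a role similar to arrays in C or Java: an ordered sequence whose elements are accessed by position. Unlike those arrays, a Python list can grow or shrink and can hold objects of different types. There is a lot of subtlety in how this is implemented, but it is intended to be hidden from the end user.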

### List Basics

Lists can contain any type of object, including integers, strings, floats, etc. They can also contain a mix of data types.

In [1]:
numbers = [3,5,2,8,5,7]
names = ['Joe','Susy','Carl','Michael','Lucy','Erin']
mixture = [3,'Sarah',2.52]

Lists are indexed starting at 0; you can access a specific element of a list by number as below. If you choose an index that is too large for the length of the list, you will get an *IndexError*.

In [2]:
print(numbers[4])
print(numbers[7])

5


IndexError: list index out of range

Lists have a length, which is obtained using the built-in `len` function. Note that because lists are indexed starting at 0, the maximum index of a list is `len(the_list)-1`. (For more on why it may be a good idea to have indices start at 0, check out [this short essay](http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD831.html) from E.W. Dijkstra).

In [3]:
print(numbers)
print(len(numbers))
print(numbers[len(numbers)-1])

[3, 5, 2, 8, 5, 7]
6
7


### Mutable vs. Immutable

Lists are **mutable**, which means the contents of a specific list can change. Mutable vs. immutable is a very important distinction in data structures in Python. Tuples are an example of a data structure that is immutable. Here is an example of the difference between mutable and immutable.

In [4]:
# Create list, mutate, print.
mutable_list = [8,6,7,5,3,0,9]
mutable_list[1] = -2
print(mutable_list)

[8, -2, 7, 5, 3, 0, 9]


In [5]:
# Create tuple, try to mutate.
immutable_type = (8,6,7,5,3,0,9)
immutable_type[1] = -2

TypeError: 'tuple' object does not support item assignment
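
One practical consequence of mutability is worth a quick sketch (the variable names below are illustrative only): assigning a list to a second name does not copy it, so a change made through either name is visible through both. Calling `list(...)` on it makes an independent shallow copy.

```python
# Two names bound to the same list object.
original = [8, 6, 7, 5, 3, 0, 9]
alias = original            # no copy is made here
alias[1] = -2               # mutates the one shared list

# An independent (shallow) copy is unaffected by later changes.
copy_of_original = list(original)
original[0] = 100

print(original)             # [100, -2, 7, 5, 3, 0, 9]
print(alias is original)    # True
print(copy_of_original)     # [8, -2, 7, 5, 3, 0, 9]
```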

### List Methods

Lists have a built-in set of _methods_ that can be called to operate on a particular list. Many of these methods work _in place_: they modify the existing list rather than creating a new one, so there's no need to assign the result to a variable. A full list of these methods can be found [here](https://docs.python.org/3/tutorial/datastructures.html). Please note that depending on your use case, other data structures such as `deque` may be more efficient for some of these operations.

In [6]:
new_numbers = [1,6,3,7,6,8,2,0,19,4]
new_numbers.append(17); print(new_numbers) # Append to the list.

[1, 6, 3, 7, 6, 8, 2, 0, 19, 4, 17]


In [7]:
new_numbers.extend([12,3,1]); print(new_numbers) # Extend the list - usually faster than appending items one at a time.

[1, 6, 3, 7, 6, 8, 2, 0, 19, 4, 17, 12, 3, 1]


In [8]:
new_numbers = new_numbers + [7,8,9]; print(new_numbers) # Concatenation - builds a new list.

[1, 6, 3, 7, 6, 8, 2, 0, 19, 4, 17, 12, 3, 1, 7, 8, 9]


In [9]:
new_numbers.pop(); print(new_numbers) # Pop from the list.
new_numbers.pop(0); print(new_numbers) # Pop from the front of the list - better to use deque.

[1, 6, 3, 7, 6, 8, 2, 0, 19, 4, 17, 12, 3, 1, 7, 8]
[6, 3, 7, 6, 8, 2, 0, 19, 4, 17, 12, 3, 1, 7, 8]
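
Popping from the front of a list forces every remaining element to shift over. As a rough sketch of the `deque` alternative mentioned above, `collections.deque` from the standard library can add and remove items at both ends cheaply. The names `queue`, `first`, and `last` are just for illustration.

```python
from collections import deque

queue = deque([1, 6, 3, 7])
queue.appendleft(0)         # add at the front
first = queue.popleft()     # remove from the front
last = queue.pop()          # remove from the back

print(first, last, list(queue))   # 0 7 [1, 6, 3]
```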


Sorting uses [timsort](https://en.wikipedia.org/wiki/Timsort), a sorting algorithm originally created for Python. The `sort` method sorts the list in place, so there is no need to assign the result to a new list. To leave the original list untouched and get a new sorted list back, use the built-in `sorted` function instead.

In [10]:
new_numbers.sort(); print('new_numbers:', new_numbers)
sorted_list = sorted(new_numbers); print('sorted_list:', sorted_list)

new_numbers: [0, 1, 2, 3, 3, 4, 6, 6, 7, 7, 8, 8, 12, 17, 19]
sorted_list: [0, 1, 2, 3, 3, 4, 6, 6, 7, 7, 8, 8, 12, 17, 19]
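
The difference between the `sort` method and the built-in `sorted` function is easy to miss when the list is already sorted, as it is above. Here is a small sketch with a hypothetical `scores` list that shows which one changes the original.

```python
scores = [19, 4, 17, 0]

ordered = sorted(scores)    # returns a new list
print(scores)               # [19, 4, 17, 0]  (unchanged)
print(ordered)              # [0, 4, 17, 19]

result = scores.sort()      # sorts in place and returns None
print(result)               # None
print(scores)               # [0, 4, 17, 19]

# Both forms accept reverse=True (and a key= function).
descending = sorted(scores, reverse=True)
print(descending)           # [19, 17, 4, 0]
```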


We can also pass a list to built-in functions to compute its sum, max, and min.

In [11]:
print('sum of the list:', sum(new_numbers))
print('max of the list:', max(new_numbers))
print('min of the list:', min(new_numbers))

sum of the list: 103
max of the list: 19
min of the list: 0


### List Slicing

List slicing is an easy way to extract elements from a list. The syntax is `a[start:stop:step]`. Remember that lists are indexed starting at 0, and the element at the stop index is not included.

We can slice the whole list:

In [12]:
my_list = [9,8,7,6,5,4,3,2,1]

print(my_list[0:])
print(my_list[0:11]) # Note no index error.

[9, 8, 7, 6, 5, 4, 3, 2, 1]
[9, 8, 7, 6, 5, 4, 3, 2, 1]


Or we can select elements we want, by a certain step.

In [13]:
my_list[0:6:2]

[9, 7, 5]

We can go backwards, and slice with negative indices in a number of different ways.

In [14]:
my_list[-1]

1

In [15]:
my_list[:-1]

[9, 8, 7, 6, 5, 4, 3, 2]

In [16]:
my_list[-1:]

[1]

Finally, we can reverse the list.

In [17]:
my_list[::-1]

[1, 2, 3, 4, 5, 6, 7, 8, 9]
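
All of the slices above follow the same start, stop, step pattern, with any omitted part falling back to a default. A short recap sketch, using a hypothetical list `letters`; note that a slice always builds a new list.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

print(letters[1:4])      # ['b', 'c', 'd']   start and stop
print(letters[:3])       # ['a', 'b', 'c']   start defaults to 0
print(letters[-2:])      # ['e', 'f']        stop defaults to the end
print(letters[1::2])     # ['b', 'd', 'f']   every second element
print(letters[4:1:-1])   # ['e', 'd', 'c']   negative step walks backwards

# A full slice is a common way to make a shallow copy.
shallow_copy = letters[:]
print(shallow_copy == letters, shallow_copy is letters)   # True False
```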

## 2. Iteration over lists.

In Python, iteration happens differently from most languages.

In languages like C, C++, or Java, iteration would look something like this (C below):
```c
int array[10], array_size;
array_size = sizeof(array)/sizeof(int);

for(int i = 0; i < array_size; i++) {
    do_something(array[i]);
}
```

Note that in this iteration, **we loop over array elements by their index.** The Python equivalent is below - this is the _naive_ way to iterate in Python.

In [18]:
def do_something(a):
    print(a+1)
    
for i in range(len(numbers)):
    do_something(numbers[i])

4
6
3
9
6
8


**This is highly non-Pythonic - don't do it this way!** (unless you really have to). The proper way is as follows:

In [19]:
print(numbers)
for number in numbers:
    do_something(number)

[3, 5, 2, 8, 5, 7]
4
6
3
9
6
8


Many, many objects in Python have iteration built-in. This is a slightly nuanced topic, and much more detail can be found [here](http://stackoverflow.com/questions/9884132/what-exactly-are-pythons-iterator-iterable-and-iteration-protocols). In general, we can loop over any object whose class defines an `__iter__` method that returns an iterator, that is, an object with a `__next__` method (which covers a lot of things):

In [20]:
for char in 'abc':
    print(char + ' is a letter.')

a is a letter.
b is a letter.
c is a letter.


In [21]:
for ministry in ('TXAUS','TNNAS','MIDET'):
    print(ministry[0:2])

TX
TN
MI


In [22]:
for element in {'Ohio State': 30, 'Michigan': 24}:
    print(element)

Ohio State
Michigan


In [23]:
string = 'Michigan has beaten Ohio State twice since 2000.'
iterString = iter(string)
print(iterString)
print(next(iterString))

<str_iterator object at 0x0000000004350898>
M


In [24]:
for char in iterString:
    print(char)
    if char == 'g':
        break

i
c
h
i
g
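
To make the `__iter__` / `__next__` machinery a little more concrete, here is a minimal sketch of a hand-written iterator. The `Countdown` class is hypothetical and exists only for illustration; it is not part of Python.

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Raising StopIteration tells the for loop to stop.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


for n in Countdown(3):
    print(n)                # prints 3, then 2, then 1

# Roughly what a for loop does behind the scenes:
it = iter([10, 20])
print(next(it))             # 10
print(next(it))             # 20
# One more next(it) would raise StopIteration.
```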


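Need to iterate over numbers? There's `range` for that! (Strictly speaking, `range` in Python 3 is a lazy sequence rather than a generator. Note that `range(0,5)` can be written simply as `range(5)` - also, observe that 5 is not printed).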

In [25]:
for i in range(0,5):
    print(i)

0
1
2
3
4
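
`range` also accepts a step, and in Python 3 it produces its values on demand instead of building a list up front. A brief sketch (the name `evens` is illustrative):

```python
evens = range(0, 10, 2)         # start, stop, step

print(len(evens))               # 5
print(evens[1])                 # 2   (indexing works without building a list)
print(8 in evens)               # True
print(list(evens))              # [0, 2, 4, 6, 8]
print(list(range(5, 0, -1)))    # [5, 4, 3, 2, 1]
```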


## 3. List Comprehensions

List comprehensions are one of the coolest features of Python. They make looping over list elements fun, easy, and concise. See [here](https://en.wikipedia.org/wiki/List_comprehension) for more details on list comprehensions and how they are implemented in other languages such as Haskell.

To understand list comprehensions, let's start by taking a list of names, sorting them, and extracting the first letter.

In [26]:
starting_list = ['Eda','Zach','Jason','Amy','Ryan','Anthony','Randy']
starting_list.sort()

Now let's write a for loop, starting with an empty list and appending item by item.

In [27]:
first_letter = []
for name in starting_list:
    first_letter.append(name[0])
print(first_letter)

['A', 'A', 'E', 'J', 'R', 'R', 'Z']


This works, but it is also non-Pythonic. Instead, let's use a list comprehension:

In [28]:
first_letter = [s[0] for s in starting_list]
print(first_letter)

['A', 'A', 'E', 'J', 'R', 'R', 'Z']


We can include logic at the end of the list comprehension to filter items we don't want.

In [29]:
new_first_letter = [s[0] for s in starting_list if s[0] != 'R']
print(new_first_letter)

['A', 'A', 'E', 'J', 'Z']
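
Where the `if` goes matters. The sketch below, using a hypothetical `names` list, contrasts a trailing `if` (which filters items out) with a conditional expression before the `for` (which keeps every item but changes some of them).

```python
names = ['Amy', 'Anthony', 'Eda', 'Jason', 'Randy', 'Ryan', 'Zach']

# Filter: the trailing `if` drops names longer than four letters.
short_names = [name for name in names if len(name) <= 4]
print(short_names)   # ['Amy', 'Eda', 'Ryan', 'Zach']

# Transform: the conditional expression keeps every name, abbreviating long ones.
labels = [name if len(name) <= 4 else name[:3] + '.' for name in names]
print(labels)        # ['Amy', 'Ant.', 'Eda', 'Jas.', 'Ran.', 'Ryan', 'Zach']
```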


Let's time these two approaches on a much larger input.

In [34]:
import datetime, random, string

# Create lists of random five-letter words (a small one to print, a large one to time).
less_words = [''.join(random.choice(string.ascii_uppercase) for _ in range(5)) for _ in range(10)]
print(less_words)

words = [''.join(random.choice(string.ascii_uppercase) for _ in range(5)) for _ in range(1000000)]

['VVDPM', 'SZQFB', 'RYKMP', 'CTKTM', 'QTRIE', 'DDLPX', 'QSPHE', 'YRGXK', 'JTBOA', 'IUGOU']


In [35]:
start = datetime.datetime.now()

first_letters_for = []
for word in words:
    first_letters_for.append(word[0])

end = datetime.datetime.now()

print('time taken:', end - start)

time taken: 0:00:00.241200


In [36]:
start = datetime.datetime.now()

first_letters_lc = [word[0] for word in words]

end = datetime.datetime.now()

print('time taken:', end - start)

time taken: 0:00:00.162800
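
A single wall-clock measurement like the ones above can be noisy. As an alternative sketch, the standard library's `timeit` module runs a function many times and reports the total. The helper functions and the small stand-in data set below are assumptions made for illustration, and the numbers will vary from machine to machine.

```python
import timeit

# Small stand-in data set (not the million random words used above).
words = ['ABCDE', 'FGHIJ', 'KLMNO'] * 1000

def with_loop():
    result = []
    for word in words:
        result.append(word[0])
    return result

def with_comprehension():
    return [word[0] for word in words]

# Total seconds for 100 runs of each approach.
loop_time = timeit.timeit(with_loop, number=100)
comp_time = timeit.timeit(with_comprehension, number=100)

print('for loop:          ', loop_time)
print('list comprehension:', comp_time)
```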


## 4. Functions to help out.

Python provides us a number of nice functions to aid in iteration. Let's examine some here.

If you want a counter to go along with your list, you can use enumerate:

In [37]:
my_list = ['D','A','R','Q']
for x, y in enumerate(my_list):
    print(x,y)

0 D
1 A
2 R
3 Q
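
`enumerate` counts from 0 by default, but it takes an optional `start` argument when a different first number is wanted. A small sketch with a hypothetical `letters` list:

```python
letters = ['D', 'A', 'R', 'Q']

for position, letter in enumerate(letters, start=1):
    print(position, letter)     # 1 D, 2 A, 3 R, 4 Q (one pair per line)

# enumerate is lazy; wrap it in list() to see the (index, item) tuples.
numbered = list(enumerate(letters, start=1))
print(numbered)                 # [(1, 'D'), (2, 'A'), (3, 'R'), (4, 'Q')]
```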


The zip function iterates over two lists with elements combined pairwise:

In [38]:
second_list = [1,2,1,6]
for x, y in zip(my_list, second_list):
    print(x,y)

D 1
A 2
R 1
Q 6
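
One thing to watch for: `zip` stops as soon as the shortest input runs out, without raising an error. The sketch below uses hypothetical lists of unequal length and shows `itertools.zip_longest` as one way to keep the leftover items.

```python
from itertools import zip_longest

letters = ['D', 'A', 'R', 'Q']
counts = [1, 2, 1]              # one element short on purpose

pairs = list(zip(letters, counts))
print(pairs)                    # [('D', 1), ('A', 2), ('R', 1)]

# zip_longest pads the shorter input instead of truncating.
padded = list(zip_longest(letters, counts, fillvalue=0))
print(padded)                   # [('D', 1), ('A', 2), ('R', 1), ('Q', 0)]

# Pairs from zip can be fed straight into dict().
lookup = dict(zip(letters, counts))
print(lookup['A'])              # 2
```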


`itertools.product` yields the Cartesian product of two (or more) iterables. This avoids writing nested for loops.

In [39]:
first_list = [1,2,3]
second_list = ['red','green']

# Nasty double for-loop
for f in first_list:
    for s in second_list:
        print(f,s)

1 red
1 green
2 red
2 green
3 red
3 green


In [40]:
import itertools

for x, y in itertools.product(first_list, second_list):
    print(x,y)

1 red
1 green
2 red
2 green
3 red
3 green
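
`itertools.product` hands back tuples lazily, in the same order the nested loops would visit them. A quick sketch to check that, with hypothetical lists `sizes` and `colors`:

```python
import itertools

sizes = [1, 2]
colors = ['red', 'green']

pairs = list(itertools.product(sizes, colors))
print(pairs)            # [(1, 'red'), (1, 'green'), (2, 'red'), (2, 'green')]

# The same pairs from a list comprehension with two for clauses.
nested = [(s, c) for s in sizes for c in colors]
print(pairs == nested)  # True
```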


Let's take a list of students in a classroom and assign them number IDs in alphabetical order.

In [None]:
students = ['Smith, Joe', 'Doe, John', 'Q, Suzie', 'Underwood, Frank', 'Eisenhower, Dwight', 'Anthony, Susan B.']
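
The notebook stops before showing a solution, so what follows is only one possible sketch, not necessarily the approach the presenter had in mind. It combines `sorted` and `enumerate` from earlier sections; starting the IDs at 1 and the name `student_ids` are assumptions.

```python
students = ['Smith, Joe', 'Doe, John', 'Q, Suzie', 'Underwood, Frank',
            'Eisenhower, Dwight', 'Anthony, Susan B.']

# Sort alphabetically (names are "Last, First"), then number from 1.
student_ids = list(enumerate(sorted(students), start=1))

for student_id, name in student_ids:
    print(student_id, name)
```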