# Lists, for loops, and iterators

In this lecture, we introduce `list`s, one of the most essential Python data structures. We will also introduce a new flow control statement, the `for` loop, which is much more commonly used in Python programming than the `while` loop. Finally, we will give a very high-level overview of Python iterators.

As always, this reading will be more useful if you actively interact with the code blocks. Try to predict and understand what each block of code is doing, and then check your understanding by executing the code (and editing it, if you like).

## Introduction to lists

There are several ways to represent a sequence of values in Python; for now, we'll just use *lists*. Lists are ordered sequences of arbitrary *items*, of the same or different types. To define a list, put a comma-separated sequence of Python expressions between two square brackets:

In [30]:
herbs = ["basil", "thyme", "sage", "oregano"] # a list of 4 strings
primes = [2, 3, 5, 7, 11] # a list of 5 integers
useless_list = ["lint", -43.6, primes] # a list with three elements (a string, a float, and a list)

The simplest way to access a particular item in a list is via its numerical **index**: its position in the list, counting up from **zero**. Use square-bracket notation to access items by index:

In [31]:
print("I used " + herbs[0] + " and " + herbs[3] + " in my pizza sauce.")

I used basil and oregano in my pizza sauce.


You can assign a value to a particular list item in the same way:

In [32]:
herbs[2] = "rosemary" # Replace "sage" with "rosemary" as the third item in the list of herbs
print(herbs)

['basil', 'thyme', 'rosemary', 'oregano']


Python provides a wide array of methods for accessing and manipulating lists. For now, we'll see just four useful techniques.

First, you can check whether a particular item appears somewhere in a list by using the `in` keyword. If `x` is a list and `item` is a Python value, the expression `item in x` evaluates to `True` if the item appears somewhere in the list, and evaluates to `False` otherwise.

In [33]:
print("basil" in herbs) # evaluates to True
print("marjoram" in herbs) # evaluates to False

True
False


Next, just as with strings, you can use the `+` operator to concatenate two lists together.

In [34]:
print([0,1,2] + [3,4,5]) # Concatenate two lists together with +

[0, 1, 2, 3, 4, 5]


The `len` function is also useful. If `x` is a list, then `len(x)` gives its length, i.e., the number of items in `x`.

In [35]:
print(len([0,1,2])) # There are three items
print(len([0,1,2] + [0,1,2])) # There are six items (repeated items count separately)

3
6


Finally, we mention the `append` method. If you want to tack an additional item (say, `item`) onto the end of a list `x`, you can use the method `x.append(item)` to modify `x` by putting the item at the end of the list.

In [36]:
primes = [2, 3, 5, 7, 11]
print("There are "+str(len(primes))+" items in our list of primes") # len(primes) is an integer
primes.append(13) # the next prime after 11 is 13
primes.append(17)
print(primes)
print("Now there are "+str(len(primes))+" items in our list of primes")

There are 5 items in our list of primes
[2, 3, 5, 7, 11, 13, 17]
Now there are 7 items in our list of primes
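
It is worth contrasting `append` with the `+` operator from earlier. The following is a minimal sketch (the names `evens` and `more_evens` are made up for this illustration): `+` builds a new list and leaves the original alone, whereas `append` changes the list it is called on.

```python
evens = [2, 4, 6]

more_evens = evens + [8]  # + creates a new list; evens itself is unchanged
print(evens)              # [2, 4, 6]
print(more_evens)         # [2, 4, 6, 8]

evens.append(8)           # append modifies evens itself
print(evens)              # [2, 4, 6, 8]
```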


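Because a list can hold any Python value, it can even contain itself. When printing such a list, Python displays the self-reference as `[...]`:

In [38]: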
x=[1]
x.append(x)
print(x[1])

[1, [...]]


## For loops

In typical Python programming, `while` loops are used much *less* frequently than another kind of loop: the `for` loop. Recall that a `while` loop repeats the execution of a block of code until a Boolean expression evaluates to `False`. In order to perform common tasks like iterating over the first `n` integers using a `while` loop, we must introduce a variable to track which iteration we are currently performing, and manually increment it. A `for` loop makes this task, and many other common programming tasks, much more convenient.

The simplest use of a `for` loop is to repeat a block of code for every item on a list.

In [39]:
students = ["Alice", "Bob", "Carol"]

for student in students:
    print("Hello, " + student)

Hello, Alice
Hello, Bob
Hello, Carol


In the above block of code, a `for` loop is used to iterate over each item on the list `students`, and to print a greeting for each item on the list. The `for` loop iteratively sets the variable `student` equal to each item on the list `students`, in order, and then executes the indented block of code with that value of the variable `student`. So in the first iteration, `student` is assigned the value `"Alice"`, etc. Once the end of the list is reached, execution moves to any (non-indented) statements following the `for` loop.

This simple and readable code should be contrasted with how we would have to achieve the same effect using a `while` loop:

In [40]:
# This is a worse way of achieving the same thing as the for loop above
# Don't do this, it's less readable and more error-prone

i = 0
while i < len(students):
    print("Hello, " + students[i])
    i += 1

Hello, Alice
Hello, Bob
Hello, Carol


Just as with `while` loops, if you want to exit a `for` loop early you can use a `break` statement.
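
As a minimal sketch of `break` inside a `for` loop (the names `numbers` and `checked` are made up for this illustration), the following loop records each number it examines and stops as soon as it meets a negative one:

```python
numbers = [4, 8, -3, 15, -16]

checked = []  # the numbers the loop actually looked at
for number in numbers:
    checked.append(number)
    if number < 0:
        print("Found a negative number: " + str(number))
        break  # exit the loop immediately; 15 and -16 are never examined
print(checked)
```

Running this should print the message for `-3` and then the list `[4, 8, -3]`, since the loop ends before reaching the last two items.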

It may also happen that at some point during execution of a `for` or `while` loop, we wish to skip ahead to the next iteration through a loop. This is accomplished with a `continue` statement. When a `continue` statement is encountered within a loop, execution immediately returns to the very beginning of the loop; in the case of a `while` loop, the loop's Boolean condition is evaluated once again, and in the case of a `for` loop, the next item in the list or iterable sequence is processed.

In [41]:
words = ["I", "uh", "really", "um", "uh", "like", "um", "math", "class."]

sentence = ""
for word in words:
    if word in ["uh", "um"]:
        # Avoiding saying "uh" and "um"
        continue
    sentence += " " + word
print(sentence)

 I really like math class.
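
The example above uses `continue` in a `for` loop. Here is a hedged sketch of the `while` case, assuming you have already met the remainder operator `%` (the name `odd_numbers` is made up for this illustration). Notice that the counter is incremented *before* the `continue`; if the increment came after it, the even values would skip the increment and the loop would never end.

```python
odd_numbers = []
i = 0
while i < 10:
    i += 1  # increment first, so that continue cannot skip this step
    if i % 2 == 0:
        continue  # even number: jump back and re-evaluate the condition i < 10
    odd_numbers.append(i)
print(odd_numbers)  # [1, 3, 5, 7, 9]
```

This kind of bookkeeping is one more reason to prefer `for` loops when they fit the task.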


## Iterable types

In fact, a `for` loop can be used to iterate over arbitrary *iterable* structures in Python, of which lists are just one example. Another example of an iterable Python type is a string. Iterating over a string using a `for` loop will give you access to each character, one at a time. For example, the following loop counts the number of times the letter 'e' occurs in a sentence:

In [42]:
sentence = "The letter 'e' is the most common letter."

e_count = 0
for character in sentence:
    if character == 'e':
        e_count += 1
    
print("The sentence has " + str(e_count) + " occurrences of 'e'.")

The sentence has 7 occurrences of 'e'.


### A brief remark on strings

Actually, strings (type `str`) and `list`s have more in common than being iterable structures. Just as with lists, you can use the square-bracket index notation to access a specific character in a string:

In [43]:
mascot = "Ollie the Owl"

first_letter = mascot[0]
print(first_letter)

O


Similarly, one can use `len` to get the length of a string (as the number of characters in the string).

In [44]:
print(len(mascot))

13


You can also use the `in` keyword with strings, but it works somewhat differently than with lists. For strings `x` and `y`, the expression `x in y` evaluates to `True` if and only if `x` appears as a contiguous substring of `y`.

In [45]:
print("I" in "team")
print("tea" in "team")
print("tm" in "team")

False
True
False


However, while we can assign new values to list items using square-bracket notation, it is not possible to do the same with strings. In Python parlance, we say `list`s are *mutable* (their entries can be changed) while strings are *immutable*.

In [46]:
courses = ["Math 15a", "Math 16b"]
courses[0] = "Math 22a" # You can use square-bracket notation to change the items on a list
course = "Math 22a"
course[7] = "b" # Error! You cannot change an existing string, you can only make new strings

TypeError: 'str' object does not support item assignment
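
To get the effect we wanted, we have to build a new string instead. The following is only a sketch using the tools from this lecture (the names `new_course` and `i` are made up for this illustration); Python offers shorter ways to do this that we have not covered.

```python
course = "Math 22a"

new_course = ""  # start with an empty string and build the result character by character
i = 0            # index of the current character
for character in course:
    if i == 7:
        new_course += "b"        # use "b" in place of the character at index 7
    else:
        new_course += character  # keep every other character as it is
    i += 1
print(new_course)  # Math 22b
print(course)      # Math 22a (the original string is untouched)
```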

### Ranges

Perhaps the most common structure used with a `for` loop is a `range` object. A `range` object is an efficient representation of a sequence of (usually consecutive) integers, that behaves just like a list when you iterate over it.
- `range(k)` represents the integers `0, 1, 2, ..., k-1` (notice that the sequence ends at `k-1` rather than `k`!)
- `range(a, b)` represents the integers `a, a+1, a+2, ..., b-1`
- `range(a, b, i)` represents, for a positive step `i`, the integers `a, a+i, a+2i, ..., c`, where `c` is the largest integer of the form `a + ki` that is less than `b`

For example, the following loop sums up the integers $0$ to $9$:

In [50]:
n = 0
for i in range(10):
    n += i # add i to n (i.e., n = n + i)
print(n)

45
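
To see concretely which integers each of the three forms of `range` represents, one option is to convert the range to a list with the built-in `list` function. This is a small sketch; the variable names are made up for this illustration.

```python
first_five = list(range(5))           # range(k): starts at 0, stops before k
two_to_five = list(range(2, 6))       # range(a, b): starts at a, stops before b
every_third = list(range(1, 10, 3))   # range(a, b, i): steps by i, stops before b

print(first_five)   # [0, 1, 2, 3, 4]
print(two_to_five)  # [2, 3, 4, 5]
print(every_third)  # [1, 4, 7]
```

In the last line, the next value would be `10`, which is not less than `10`, so the sequence stops at `7`.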


I strongly encourage you to prefer iterating directly over the elements of a list, rather than over list indices using `range`. Compare the following two `for` loops, which accomplish the same thing. The first is to be preferred:

In [48]:
primes = [2,3,5,7]
for p in primes: # iterating directly over primes
    print(p) # so clear and concise! prints 2, then 3, then 5, then 7

for i in range(len(primes)): # i is an index, not a prime... Avoid this!
    print(primes[i]) # ugly and unnecessary indexing into primes. Avoid this!

2
3
5
7
2
3
5
7


## Optional material for advanced students

Python includes many language features that can make your life easier, if you are prepared to first learn additional syntax. I describe two such features here, for use with `for` loops.

I encouraged you above to prefer iterating directly over the list elements, rather than using `range` to iterate over list indices. But sometimes we need the indices, for example if we want to modify the entries in a list. In such cases, we can use the `enumerate` built-in function. It takes as an argument an iterable object (like a list), and returns an iterator over pairs `(i, x)` where `i` is the index of item `x`.

In [51]:
words = ["Various", "kInDs", "OF", "capitalization"]
for i, word in enumerate(words): # iterate over both indices and items of a list
    # i is the index of word in the list words
    words[i] = word.lower() # replace the ith entry with a lower-cased version of the same
print(words)

['various', 'kinds', 'of', 'capitalization']
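
If you would like to look at the pairs that `enumerate` produces, one way is to collect them into a list with the built-in `list` function. This is a small sketch; the shortened list of herbs and the name `pairs` are made up for this illustration.

```python
herbs = ["basil", "thyme", "sage"]

pairs = list(enumerate(herbs))  # collect the (index, item) pairs so we can print them
print(pairs)  # [(0, 'basil'), (1, 'thyme'), (2, 'sage')]
```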


Another use for indices when iterating is if you want to iterate over multiple lists at the same time. For example:

In [52]:
states = ["Massachusetts", "Rhode Island", "Vermont"]
capitals = ["Boston", "Providence", "Montpelier"]
for i, state in enumerate(states):
    capital = capitals[i] # get the item from capitals with the same index as the current state
    print(capital + " is the capital of " + state)

Boston is the capital of Massachusetts
Providence is the capital of Rhode Island
Montpelier is the capital of Vermont


But in such cases, when iterating over multiple lists with corresponding entries, a better solution is to use the `zip` built-in function, which does exactly that: iterates simultaneously over multiple lists (with corresponding entries).

In [53]:
for state, capital in zip(states, capitals):
    print(capital + " is the capital of " + state)

Boston is the capital of Massachusetts
Providence is the capital of Rhode Island
Montpelier is the capital of Vermont


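`zip` is not limited to two lists; it accepts any number of iterables. Here we add a third list and iterate over all three at once:

In [57]: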
beauty=[0,1,12]

In [58]:
list(zip(states, capitals, beauty))

[('Massachusetts', 'Boston', 0),
 ('Rhode Island', 'Providence', 1),
 ('Vermont', 'Montpelier', 12)]

In [59]:
for state, capital, score in zip(states, capitals, beauty): # distinct loop names, so the lists states and capitals are not overwritten
    print(capital + " is the capital of " + state + " and the beauty is " + str(score))

Boston is the capital of Massachusetts and the beauty is 0
Providence is the capital of Rhode Island and the beauty is 1
Montpelier is the capital of Vermont and the beauty is 12
