# Working with lists (and strings)


_Practical Python for Linguistics and the Humanities -- Alexis Dimitriadis_

## Contents


**[1. Lists: The basics](#1.-Lists:-The-basics)**  
&nbsp;&nbsp;&nbsp;&nbsp;
  [1.1 Lists are "containers" that have multiple objects](#1.1-Lists-are-"containers"-that-have-multiple-objects)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [1.2 Lists of strings](#1.2-Lists-of-strings)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [1.3 Joining together a list of strings](#1.3-Joining-together-a-list-of-strings)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [1.4 Printing lists of strings "nicely"](#1.4-Printing-lists-of-strings-"nicely")  
&nbsp;&nbsp;&nbsp;&nbsp;
  [1.5 Extracting sublists](#1.5-Extracting-sublists)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [1.6 Modifying lists](#1.6-Modifying-lists)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [1.7 Copies and references](#1.7-Copies-and-references)  

**[2. Better flow control](#2.-Better-flow-control)**  
&nbsp;&nbsp;&nbsp;&nbsp;
  [2.1 Looping over a list](#2.1-Looping-over-a-list)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [2.2 Using if-else statements](#2.2-Using-if-else-statements)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [2.3 Looping without a counter](#2.3-Looping-without-a-counter)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [2.4 Collecting results](#2.4-Collecting-results)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [2.5 We can still count when we need to](#2.5-We-can-still-count-when-we-need-to)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [2.6 The magic behind `range()`](#2.6-The-magic-behind-range%28%29)  

**[3. More on functions](#3.-More-on-functions)**  
&nbsp;&nbsp;&nbsp;&nbsp;
  [3.1 Optional arguments to functions](#3.1-Optional-arguments-to-functions)  
&nbsp;&nbsp;&nbsp;&nbsp;
  [3.2 Named arguments ("keyword arguments")](#3.2-Named-arguments-%28"keyword-arguments"%29)  

**[4. Your turn: Working with lists of strings](#4.-Your-turn:-Working-with-lists-of-strings)**  

**[5. Function practice](#5.-Function-practice)**  

**[6. Review: What we learned](#6.-Review:-What-we-learned)**  


Lists are described in [Chapter 10][1] of Think Python, which you should **study ahead of this notebook.** Because the text covered a whole lot of ground, we revisit some important points now. Run the code samples and make sure you understand what they do. Sections 4 and 5 are devoted to exercises.

[1]: http://greenteapress.com/thinkpython2/html/thinkpython2011.html

## 1. Lists: The basics

### 1.1 Lists are "containers" that have multiple objects

We create a list with square brackets. The function `len()` tells us how many elements a list contains.

In [None]:
primes = [ 2, 3, 5, 7, 11, 13, 17 ]
print("I have", len(primes), "prime numbers")

We retrieve list elements by numeric position ("index"),
also indicated by square brackets.  List indexes start from zero!

In [None]:
print(primes[0], primes[1])

In a list of length 7, the greatest index is 6:

In [None]:
print(primes[6])
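
Asking for an index that does not exist is an error. Here is a small sketch of what happens. It uses `try`/`except`, which we have not covered yet, so that the cell keeps running after the error. The name `small_primes` is chosen just for this example, so that it doesn't overwrite `primes`.

```python
small_primes = [2, 3, 5, 7, 11, 13, 17]   # seven elements: indexes 0 to 6

try:
    print(small_primes[7])                # there is no index 7
except IndexError as err:
    print("Error:", err)                  # prints: Error: list index out of range
```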

We can append more elements to a list:

In [None]:
primes.append(19)
print("Now I have", len(primes), "prime numbers")

As with strings, negative indices have a special meaning:
They count backwards from the end of the list.

In [None]:
print("My last prime is", primes[-1])
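
One way to think about negative indexes: `-1` is shorthand for "the last index", which is `len(list) - 1`, and `-2` is the one before it. A minimal sketch, again using a separate list so that `primes` is left alone:

```python
small_primes = [2, 3, 5, 7, 11, 13, 17, 19]
last = len(small_primes) - 1                     # index of the last element: 7

# A negative index -k refers to the same element as len(list) - k
print(small_primes[-1], small_primes[last])      # 19 19
print(small_primes[-2], small_primes[last - 1])  # 17 17
```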

### 1.2 Lists of strings

A list can contain anything, or any mix of things (including other lists):

In [None]:
mix = [ 3, 4, 5, "text", primes ]
print(mix)     # Note the brackets around the sublist

In [None]:
print(mix[-1]) # Last element: the entire list "primes"

To split a long string into a list of (almost) words, use the
method `split()`:

In [None]:
storyquote = """King Arthur: The Lady of the Lake, her arm clad
in the purest shimmering samite held aloft Excalibur
from the bosom of the water, signifying by divine
providence that I, Arthur, was to carry Excalibur.
THAT is why I am your king.
"""

qwords = storyquote.split()
print(qwords)

Note that the string is broken up on whitespace (spaces and newlines). Punctuation goes with the word it is attached to.
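
If the attached punctuation gets in the way, one option is the string method `strip()`. This is a sketch of one approach, not the only one. Given a string of characters, `strip()` removes any of them from both ends of a string but leaves the middle alone:

```python
raw = "Lake, Arthur: THAT king."
pieces = raw.split()
print(pieces)                    # ['Lake,', 'Arthur:', 'THAT', 'king.']

# strip() removes any of the listed characters from both ends of a string
print(pieces[0].strip(",.:"))    # Lake
print(pieces[1].strip(",.:"))    # Arthur
print("I,Arthur".strip(","))     # I,Arthur  (the inner comma is kept)
```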

Strings are _a little_ like lists: They are sequences of letters.
Indexing a string gives us letters, indexing a list gives us
list elements (strings in this case):

In [None]:
print(qwords[1])    # Word at index 1: 'Arthur:'

print(storyquote[1])    # At index 1 of the string: WHAT WILL THIS PRINT?

Counting also works with the appropriate units: Elements for lists, but characters for strings.

In [None]:
print("Characters:", len(storyquote), "(string)")
print("Words:     ", len(qwords), "(list of strings)")

### 1.3 Joining together a list of strings

The method `join()` is the opposite of `split()`: 

- `split()` breaks up a string into a list of smaller strings, throwing away the separator (which, by default, is any amount of whitespace).
- `join()` combines a list of strings into a single string, adding a separator. 

The following will join together the elements of `qwords`, adding a space between them:

In [None]:
joined = " ".join(qwords)
print(joined)

The syntax is a little convoluted: `join()` is a method of the "separator" string that will be inserted between our list elements, and our list is its _argument._ The reason for this puzzling choice is that a list can contain any type of object, but only lists of strings can be joined into a single string. 

Naturally, we can use other separator strings, even the empty string (which simply sticks the list elements together):

In [None]:
fruit = [ "apple", "banana", "cherry", "date" ]
print(" and ".join(fruit)) 
print("".join(fruit))
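
Because only strings can be joined, `join()` raises a `TypeError` if the list contains numbers. A small sketch of the problem and one fix: convert each element with `str()` first. The fix uses a `for` loop, which is explained in Section 2. The names `counts` and `as_text` are just for this example.

```python
counts = [3, 4, 5]

try:
    print(" + ".join(counts))        # fails: the elements are numbers
except TypeError as err:
    print("Error:", err)

# Convert each number to a string first (using a loop, see Section 2)
as_text = []
for c in counts:
    as_text.append(str(c))
print(" + ".join(as_text))           # 3 + 4 + 5
```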

We can also specify a custom separator for `split()`, but this is needed less often.

In [None]:
print("a:b:c:d".split(":"))
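
Note that with an explicit separator, `split()` behaves differently from the default. A run of spaces no longer counts as a single separator. Every single occurrence counts, and other whitespace such as newlines is left alone. A short comparison:

```python
text = "one  two\nthree"      # two spaces, then a newline

print(text.split())           # ['one', 'two', 'three']
print(text.split(" "))        # ['one', '', 'two\nthree']
```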

### Your turn:

Split up the string `groceries` into a list of words, then join them back into a new string that has a comma and a space between words.

In [None]:
groceries = "bananas apples chocolate spaghetti liver"

In [None]:
# YOUR SOLUTION:


### 1.4 Printing lists of strings "nicely"

Note that when we ask `print` to print out a list of strings, Python shows us a detailed representation of the list (the so-called "`repr`" form), with all the brackets, commas and quotes we would need to define the list in a Python program. 

In [None]:
print(qwords)

When we work with text, this is usually not what we want: we just want to see the words themselves. The method `join()` gives us one easy way to do this.  (Another way, which we will see presently, is to use a loop.)

In [None]:
print(" ".join(qwords))

When you are asked to print a list of words "nicely", you should always arrange for the text to be shown without the brackets and other Python syntax.

### 1.5 Extracting sublists

List "slices" work just like string slices.
The following gives us a sublist starting at index 0
and stopping just **before** index 5 (i.e., indexes 0 to 4).

In [None]:
sublist = qwords[0:5]
print(sublist)

A missing start means 0, a missing end means the end of the list:

In [None]:
print(qwords[:13])  # Print words at indices 0-12 only
print(qwords[13:])  # Print words from index 13 on
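
Negative indexes work in slices too. Unlike plain indexing, a slice whose end lies past the end of the list is not an error: you simply get whatever elements exist. A small sketch on a throwaway list, `items`:

```python
items = "a b c d e f".split()

print(items[-3:])     # ['d', 'e', 'f']        the last three elements
print(items[2:100])   # ['c', 'd', 'e', 'f']   an end past the list is fine
```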

There's a lot more to using indices: see the readings and the Python
documentation.

### Your turn:

Split the string `groceries` into a list of words, then use slicing to create a new list, `tasty`, that contains all except the last word. Print out your result.

In [None]:
groceries = "bananas apples chocolate spaghetti liver"

In [None]:
# YOUR SOLUTION:


A list slice is a list, even if it contains just one element. Look carefully at the following expressions. Can you predict their output before you run the code? Execute the code cell, then look carefully at the result.

In [None]:
print(qwords[1])
print(qwords[1:2])
print(qwords[1:1])

### 1.6 Modifying lists

A list is a collection of objects, and we can modify individual elements with the assignment operator.

In [None]:
qwords[1] = "Fred:"
print(qwords)

We can also delete elements outright, using (among other ways) the operator `del`.

In [None]:
del qwords[0]
del qwords[-10:]  # Delete the last ten elements
print(qwords)

Our list `qwords` is now a lot shorter. Run the last block of code again, and you'll see it get even shorter. Note that the string `storyquote`, from which we split the list, has not been affected:

In [None]:
print(storyquote)

### Your turn:

Redefine `qwords` so that it once again contains the full list of words, by splitting `storyquote` again.

In [None]:
# YOUR SOLUTION:


### 1.7 Copies and references

Let's try to make a copy of `qwords` so we can play with it. Study the following code, then run it **once** and inspect the results:

In [None]:
print("Original:", qwords)

copy = qwords
copy[1] = "Fred:"
del copy[5:]
print("copy, now:  ", copy)
print("qwords, now:", qwords)

If you are not surprised, you should be: **Both the copy and the "original" were modified.** The reason is this: Assignment in Python doesn't actually copy objects (of any kind). It merely creates another name for the original object. (Technically: Another _reference_ to the original object.) For objects like lists whose parts can be modified, this can have very surprising effects. 
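
The operator `is` checks whether two names refer to the very same object, so it lets us confirm directly that assignment did not copy anything. A minimal sketch with a short list; `original` and `alias` are names chosen for this example:

```python
original = ["King", "Arthur:", "The"]
alias = original              # no copying: a second name for the same list

print(alias is original)      # True: both names refer to one object
alias.append("Lady")
print(original)               # ['King', 'Arthur:', 'The', 'Lady']
```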

**Seeing is believing!** To make all this absolutely clear, check out [this great code visualization][1] on the site `pythontutor.com`. Use the **Forward** button to go through our code step by step. When you are done, you can use one of the links `Edit code` or `Live programming` (under the source code window) to modify the code or paste your own to explore further.

[1]: http://pythontutor.com/visualize.html#code=qwords%20%3D%20%5B'King',%20'Arthur%3A',%20'The',%20'Lady',%20'of',%20'the',%20'Lake,',%20'her',%20'arm',%20'clad',%20'in',%20'the',%20'purest',%20'...'%5D%0A%0Acopy%20%3D%20qwords%0Acopy%5B1%5D%20%3D%20%22Fred%3A%22%0Adel%20copy%5B5%3A%5D&cumulative=false&curInstr=0&heapPrimitives=false&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false

So how do we actually make a copy? Lists have a `copy()` method for just this purpose.

In [None]:
qwords = storyquote.split()
copy = qwords.copy()
del copy[5:]
print("In qwords:", len(qwords))
print("In copy:", len(copy))
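
For reference, here is a sketch of three common ways to get a real copy: the `copy()` method, a slice that covers the whole list, and the built-in `list()`. Each one gives a new list with equal contents, which `is` confirms is a different object:

```python
words = ["King", "Arthur:", "The", "Lady"]

copy_a = words.copy()     # the copy() method
copy_b = words[:]         # a slice covering the whole list is a new list
copy_c = list(words)      # list() builds a new list from any sequence

print(copy_a == words, copy_a is words)   # True False
copy_b[0] = "Fred"
print(words[0])                           # King: the original is unchanged
```

All three make a "shallow" copy. If the list contains other lists (like `mix` in Section 1.2), the inner lists are still shared between the original and the copy.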

There are other ways to copy lists, but **the best approach is to simply avoid modifying lists:** Instead of deleting unwanted elements from a list, for example, it is usually simpler and faster to build a new list that contains just the elements we want. This is the approach we'll usually follow; you will not need to delete anything in the rest of this notebook.

## 2. Better flow control

### 2.1 Looping over a list

Here's a simple recipe for doing something with each word in our list. This particular loop prints "umm" before each word in our word list. Examine the code, then read the explanation below.

In [None]:
n = 0
while n < len(qwords):
    print("umm", qwords[n], end=" ")
    n = n + 1
print()
print("Done")

Note how the two calls to `print` are written. The argument `end=" "` in the first call tells `print` to add a space after its output, instead of the usual newline. In effect, it prevents `print` from adding a newline between words. When the loop is done, the second `print()` call adds a newline to terminate the line. (I've added a final `print("Done")` call to make the result visible.)
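
`print()` has a second, related argument called `sep`. It replaces the space that `print` normally puts *between* its arguments, while `end` replaces the newline it puts after them. A short sketch:

```python
print("umm", "King", "Arthur")             # umm King Arthur
print("umm", "King", "Arthur", sep="_")    # umm_King_Arthur
print("no newline here", end=" / ")
print("same line")                         # no newline here / same line
```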

We will keep using `end=" "` in the examples that follow.

### 2.2 Using if-else statements

We can use `if`-`else` statements with a loop to pick out particular words, instead of printing everything. Can you tell what the following does? (The variable `qwords` should still contain the words from our earlier quote.)

In [None]:
n = 0
while n < len(qwords):
    if 'a' in qwords[n]:
        print(qwords[n], end=" ")
    n = n + 1
print()

Here is a more elaborate example. **Try to deduce what it will do before you run the code.**

In [None]:
n = 0
while n < len(qwords):
    word = qwords[n]
    if word.startswith('s'):
        print(word, end=" ")
    elif word.endswith(','):
        print(',', end=" ")
    else:
        print("*", end=" ")
    n = n + 1
print()

### 2.3 Looping without a counter

In [None]:
n = 0
while n < len(qwords):
    if len(qwords[n]) <= 3:
        print(qwords[n])
    n = n + 1

Using a for-loop, we can rewrite the above like this:

In [None]:
for w in qwords:   # Set w to each element in turn
    if len(w) <= 3:
        print(w)

The syntax of the `for` loop is: `for <variable> in <sequence>:`, followed by an indented block of code (the "body" of the loop). Schematically:
```
for x in <sequence>:
    <do stuff with x>
```
(Note that the `if` statement in our example is not required! It's just something we put in the body of this loop.) In contrast to the `while` loop, there is no counter and nothing to increment. The indented block, the "body" of the loop, will be executed once for each element of the list `<sequence>`, with the loop variable set to that element. Then the loop will exit. Because `for`-loops are so much simpler, you should prefer them whenever you are visiting every element of a sequence.
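
The `<sequence>` does not have to be a list. A string is also a sequence, of characters, so a `for` loop visits it one character at a time. A small sketch that counts the vowels in one word; the variable names are just for this example:

```python
vowels = 0
for letter in "Excalibur":      # a string is a sequence of characters
    if letter in "aeiouAEIOU":
        vowels = vowels + 1
print("Vowels:", vowels)        # Vowels: 4
```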

### Your turn:

Use a `for`-loop to print "umm" before each word in our word list. (Separate the words with spaces, as usual.) 

In [None]:
# YOUR SOLUTION:


### 2.4 Collecting results

Instead of simply printing out the short words in our list, it is more useful to collect them into a new list. We can do this by creating an empty list, then using its `append()` method (remember it?) to add words to it:

In [None]:
shortwords = []
for w in qwords:
    if len(w) <= 3:
        shortwords.append(w)
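
The list we build doesn't have to hold the original elements. We can just as well append something *computed* from each one. A sketch that collects word lengths, using a short word list named `some_words` for this example:

```python
some_words = "King Arthur: The Lady of the Lake,".split()

lengths = []                    # start with an empty list
for w in some_words:
    lengths.append(len(w))      # collect a value computed from each word
print(lengths)                  # [4, 7, 3, 4, 2, 3, 5]
```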

### Your turn:

Copy the above block of code and modify it so that the list `shortwords` is printed out every time it grows. (I.e., only print it out after a short word has been found and added).

In [None]:
# YOUR SOLUTION:


### 2.5 We can still count when we need to

What if we just want to do something 10 times?

In [None]:
for n in range(10):
    print(n, "do something")


The function `range(10)` returns the ten numbers from 0 to 9. As with slices, the number at the end of the range is not included in the generated sequence. `range()` can also be called with two arguments, in which case the sequence starts with the first number instead of with zero.

In [None]:
for n in range(5, 10):
    print(n, "Mississippi")

We can also change the "step" between successive values of the range:

In [None]:
for num in range(10, 20, 2):
    print(num, end=" ")
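
The step can also be negative, which counts down. As always, the stop value is excluded. Converting a small range to a list with `list()` is a convenient way to see exactly which numbers it contains:

```python
print(list(range(10, 0, -3)))   # [10, 7, 4, 1]
print(list(range(10, 20, 2)))   # [10, 12, 14, 16, 18]
```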

Incidentally, string and list slices also allow a step argument, as a third index. E.g., to select every third letter from the first fifteen letters of a string:

In [None]:
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
print(letters[0:15:3])
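
If you leave out the start and end, the step applies to the whole sequence. A negative step walks backwards, so `[::-1]` is a common idiom for reversing a string or a list. A short sketch:

```python
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

print(alphabet[::5])       # AFKPUZ  (every fifth letter of the whole string)
print(alphabet[::-1])      # ZYXWVUTSRQPONMLKJIHGFEDCBA  (backwards)
print([1, 2, 3, 4][::-1])  # [4, 3, 2, 1]  (the same works for lists)
```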

### 2.6 The magic behind `range()`

Although `range()` behaves like a list, you should be aware (because you'll find out anyway) that it is not really a list:

In [None]:
x = range(10, 15)
# List-like
print(x[3])
print("x has", len(x), "elements")

# Not list-like:
print(type(x))
print(x)

`range()` gives us a special `range` object, specifically designed to handle large number ranges. (It is often loosely called an "iterator"; strictly speaking, it is a "lazy" sequence that works out each number only when it is needed.) Because it doesn't actually build a list of all the numbers, it can work quickly even if we specify enormous limits. Of course, if necessary we can convert the range to a real list (which will be as large as it needs to be).

In [None]:
x = range(100, 10000000000000000000, 350)
print(x)
print("Number of elements in range x:", len(x))
print(x[50000000])

# Convert a (not huge) range to a list:
numbers = list(range(200))
print(numbers)
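
Checking whether a number belongs to a range is also quick, because Python can work it out from the start, stop and step instead of looking at every number. A small sketch with a range far too big to ever turn into a list:

```python
multiples = range(0, 10**18, 7)   # every multiple of 7 below 10**18

print(len(multiples))             # computed from start, stop and step
print(700 in multiples)           # True
print(701 in multiples)           # False
```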

Python makes wide use of iterators, often behind the scenes. We'll learn more as we encounter them.

## 3. More on functions

### 3.1 Optional arguments to functions

We now return to functions. If you are hazy on defining and using your own functions, review the topic now (in the earlier notebook) before you go on.

Functions can have one or more _optional_ arguments: we can call the function without giving a value for them. In Python, the motivation for optional arguments is the principle "don't state the obvious". Optional arguments have a predefined "default" value, which they automatically receive whenever we call a function without one of its optional arguments.

Here's our old blackboard punishment function. Since the typical punishment is to write something a hundred times, we can make this number the default:

In [None]:
def write_on_blackboard(message, howmany=100):
    for n in range(0, howmany):
        print(n+1, message)

# This will run with `howmany=20`
write_on_blackboard("Practice makes perfect", 20)

# This will run with `howmany=100`
write_on_blackboard("I will not skateboard in class")


We can define functions that take any number of arguments,
obligatory or optional. Optional arguments must come after obligatory ones -- otherwise we wouldn't be able to omit them!

In [None]:
def greet(name, greeting="Hello", times=1):
    for n in range(times):
        print(greeting, name+"!")

greet("Alexis")
greet("Bob", "Welcome,", 3)
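
Optional arguments work the same way for functions that `return` a value. As a sketch, here is a hypothetical helper, `join_words()`. It is not a built-in; the name is made up for this example. It wraps `join()` from Section 1.3 and uses a space as the default separator:

```python
def join_words(words, separator=" "):
    """Join a list of strings, with a space between them unless told otherwise."""
    return separator.join(words)

fruit_list = ["apple", "banana", "cherry"]
print(join_words(fruit_list))          # apple banana cherry
print(join_words(fruit_list, ", "))    # apple, banana, cherry
```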

### 3.2 Named arguments ("keyword arguments")

We usually keep track of arguments by their position: first, second, etc.
But sometimes (especially with functions that have several optional arguments,
or simply lots of arguments), it is convenient to identify an argument
by name ("keyword arguments"):

In [None]:
greet("Alexis", times=3)

For the functions we define ourselves, no special command is necessary: Python lets us identify function arguments by the name they were given when the function was defined.
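
Keyword arguments can be given in any order, and they can follow ordinary positional arguments. (The `end=" "` we passed to `print` in Section 2.1 is a keyword argument too.) A sketch that repeats the `greet()` definition from Section 3.1 so the block runs on its own:

```python
def greet(name, greeting="Hello", times=1):
    for n in range(times):
        print(greeting, name + "!")

greet(times=2, name="Bob")        # keyword arguments may come in any order
greet("Carol", greeting="Hi")     # positional arguments first, then keywords
```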

## 4. Your turn: Working with lists of strings

We will now review what we just learned, and practice working with a short text. Corpus searches of all sorts are just more elaborate versions of this kind of manipulation.

**Task 1:** Copy and paste the following text, and assign it to a string variable named `dialog`. Be sure to preserve the newlines (i.e., don't edit it so that the whole thing is one huge line).

<pre>
DENNIS: [interrupting] Listen -- strange women lying in 
        ponds distributin' swords is no basis for a system of
        government.  Supreme executive power derives from a mandate 
        from the masses, not from some farcical aquatic ceremony.
ARTHUR: Be quiet!
DENNIS: Well you can't expect to wield supreme executive power
        just 'cause some watery tart threw a sword at you!
ARTHUR: Shut up! 
        [...]
DENNIS: HELP! HELP! I'm being repressed!
</pre>

In [None]:
# YOUR SOLUTION:


**Task 2:** Split the string `dialog` into a list of words, and use a loop to print it one word per line. Also report how many words there are in `dialog`.

In [None]:
# YOUR SOLUTION:


**Task 3:** Use another loop to print only words of four letters or more, as running text (_not_ one word per line).

In [None]:
# YOUR SOLUTION:


**Task 4:** Print only every fifth word, again as running text. You could use the modulo operator `%` (if `n` is an integer, `n % 5` is zero exactly when `n` is divisible by 5), or any other method.
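
To get a feel for `%`, here are the remainders of the first few integers after division by 5. Notice that the pattern repeats, and that the remainder is 0 for 0, 5, 10, and so on. This only illustrates the operator; it is not a solution to the task.

```python
remainders = []
for n in range(12):
    remainders.append(n % 5)      # remainder after dividing n by 5
print(remainders)                 # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1]
```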

In [None]:
# YOUR SOLUTION:


**Task 5:** Use the method `append()` to build a list of all words in `dialog` that have four letters or more. 

In [None]:
# YOUR SOLUTION:


**Task 6:** Build a list of all words in `dialog` that begin with "s" or "S".

In [None]:
# YOUR SOLUTION:


**Task 7:** Using your list of all words in `dialog`, create a string that contains the **even-numbered** words, i.e. the words at index 0, 2, 4, etc. Separate the words with spaces.

In [None]:
# YOUR SOLUTION:


-----------------------------------

## 5. Function practice

The Fibonacci numbers are a sequence where each
number F<sub>n</sub> is the sum of the two previous numbers, starting with 1, 1:

    1, 1, 2, 3, 5, 8, 13, 21, 34, ...

**Task 8:** Write a loop that prints out all Fibonacci numbers smaller than a million.  
HINT: Use two or more variables.

In [None]:
# YOUR SOLUTION:


**Task 9:** Write a **function** `fib(n)` that will calculate (and return) the n-th Fibonacci number, where:

`fib(0) = fib(1) = 1`, and so `fib(2) = 2`, `fib(5) = 8`, etc.

In [None]:
# YOUR SOLUTION:


## 6. Review: What we learned

### Things you'll need to know by heart

Here are some essential skills from today's notebook. If you have to look them up in the future, you won't be able to progress. So make sure they roll off your fingers.

* The syntax for defining a list. Example: Define a list `demo` containing (in that order) the numbers 1, 3, 4, 20, and the strings `"hello"` and `"goodbye"`.

* The function that gives us the number of elements in a list.

* How to split a string into a list of words, and how to join the list of words back together.

* How to keep the output of several `print` statements on the same line.

* The syntax for indexing and simple slicing (and the meaning of these terms). Examples: The index of the last element of a list (regardless of length). The first element? The next-to-last element? The last ten elements?

### Your turn:

Test your memory by trying out the skills in the above checklist. Go back and review the relevant sections until you feel that you can handle such tasks without needing to look up anything.

### Things you should remember you saw

Here are some things you will not need as often. Make sure you remember that they exist, and where you saw them; then you can always google them or come back to this notebook when you need them.

* There are ways to make a real copy of a list.

* How to "slice" off part of a list into a smaller list. Examples: Slice off the first 3 elements of the list; or the last 4 elements. 

* How to define a function with optional arguments.

* `range()` can do a lot more than give us the numbers from 0 to `N-1`.

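* `range()` objects and Python's "iterators" produce their values on demand. `range()` even behaves like a list in many ways, but neither one is a real list.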