# Operating on time and space

So far, we've seen that we can represent the natural world with data and models, and can operate on single instances of things. For example, we can represent a rock as a P-wave velocity and a bulk density, then multiply those together to get acoustic impedance. 

But we could do these things on the back of an envelope. The whole point of computers is to process large amounts of data, usually representing measurements made over time or space. In this chapter, we'll meet loops, especially `for` loops, which let us do just this.

We'll also meet conditional statements, the `if... else...` block, and see how to control exactly what gets processed, and how, according to various rules or conditions. 

At the end of the chapter, we'll also find out how to read data from a text file. This will set us well on our way to writing full-blown programs that do real work.

Let's get started!

## `for` loops

Let's start with some made-up data resembling a pair of well logs.

In [32]:
vp = [2300, 2400, 2500, 2300, 2600]
rho = [2.45, 2.35, 2.45, 2.55, 2.80]

The velocity log is in units of m/s. The density 'log' appears to be in units of g/cm<sup>3</sup>. We'd like to convert it to SI units of kg/m<sup>3</sup> by multiplying all of the values by 1000. 

To do this, we'll use a `for` loop. It visits each value in turn, multiplies it by 1000, and appends the result to a new list. First, we'll create an empty list to hold the new values:

In [34]:
rho_si = []

In [35]:
for value in rho:
    value_si = 1000 * value
    rho_si.append(value_si)

In [36]:
rho_si

[2450.0, 2350.0, 2450.0, 2550.0, 2800.0]

The basic pattern of a `for` loop is always the same. You can read the first line like, "for each thing in these things", where "these things" is some sequence of items. The loop body describes what to do with each "thing". When there are no more instructions, the loop continues to the next item. If there are no more items, the loop terminates and the program continues.
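
The loop body doesn't have to build a list. As a minimal sketch of the same pattern, here's a loop that accumulates a running total in order to compute a mean velocity. The names `total` and `vp_mean` are ours, chosen just for this illustration.

```python
# Made-up velocity log from earlier in the chapter (m/s).
vp = [2300, 2400, 2500, 2300, 2600]

# Start the running total at zero, then add each value in turn.
total = 0
for v in vp:
    total = total + v

# The mean is the total divided by the number of items.
vp_mean = total / len(vp)
print(vp_mean)
```

When the loop runs out of items, `total` holds the sum of the list and the program carries on to the division.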

Here are some more examples:

In [39]:
for n in [1,2,3,4,5]:
    print(n**2)

1
4
9
16
25


In [37]:
for character in "Sandstone":
    print(character)

S
a
n
d
s
t
o
n
e


In [40]:
for word in ["Sand", "Silt", "Mud"]:
    for character in word:
        print(word, character)

Sand S
Sand a
Sand n
Sand d
Silt S
Silt i
Silt l
Silt t
Mud M
Mud u
Mud d


In [62]:
samples = ['A10', 'B10', 'C22', 'D20']
for i, sample in enumerate(samples):
    print(f"{i+1}. {sample}")

1. A10
2. B10
3. C22
4. D20


The function `enumerate()` works like a counter. It gives you access to the index of each item as you traverse the list (or whatever kind of collection it is). It yields a tuple of `(index, item)` on each pass through the loop, and we use the simultaneous assignment trick to unpack these.
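
To see those tuples directly, we can turn the result of `enumerate()` into a list. The sketch below also uses the optional `start` argument, which should let us avoid writing `i+1` when we want to count from 1.

```python
samples = ['A10', 'B10', 'C22', 'D20']

# Each item produced by enumerate() is an (index, item) tuple.
pairs = list(enumerate(samples))
print(pairs)

# The optional start argument sets the first index.
for i, sample in enumerate(samples, start=1):
    print(f"{i}. {sample}")
```

The second loop prints the same numbered list as the example above.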

## List comprehension

Look again at the pattern in our first example in the previous section:

- Make an empty list.
- Loop over a collection (an 'iterable').
- Transform each item to make a new thing.
- Append the new things to the new list.

This pattern crops up so often that there's a more concise way to do it — the **list comprehension**. These follow an analogous pattern, but expressed more tersely:

- Transform items in this sequence and make a new list. 

Here's the example from the previous section again:

In [41]:
rho_si = []
for value in rho:
    value_si = 1000 * value
    rho_si.append(value_si)

After which `rho_si` contains the transformed values:

In [42]:
rho_si

[2450.0, 2350.0, 2450.0, 2550.0, 2800.0]

Let's rewrite this loop as a list comprehension:

In [44]:
rho_si = [1000*value for value in rho]

We end up with the same thing as before:

In [45]:
rho_si

[2450.0, 2350.0, 2450.0, 2550.0, 2800.0]
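
The same translation works for any simple transformation. As a further sketch, here is the velocity log converted from m/s to km/s both ways. The names `vp_km_loop` and `vp_km` are made up for this example.

```python
vp = [2300, 2400, 2500, 2300, 2600]

# Loop version: empty list, loop, transform, append.
vp_km_loop = []
for v in vp:
    vp_km_loop.append(v / 1000)

# Comprehension version: the same transformation in one expression.
vp_km = [v / 1000 for v in vp]

# Check that the two approaches agree.
print(vp_km == vp_km_loop)
```

Which form to use is mostly a matter of readability: comprehensions suit short transformations, while an explicit loop is often clearer when the body needs several steps.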

### Exercise

Turn these loops into list comprehensions:

In [47]:
squares = []
for n in range(20):
    squares.append(n**2)

In [None]:
# SOLUTION
[n**2 for n in range(20)]

In [50]:
names = ['William Smith', 'Mary Anning', 'Steve Gould']
short = []
for name in names:
    initial = name[0]
    surname = name.split()[-1]
    short.append(surname + initial)

In [None]:
# SOLUTION
[name.split()[-1] + name[0] for name in names]

## Looping over two lists: `zip`

Often, we would like to combine the data from two lists. Perhaps we'd like to compute the acoustic impedance from our velocity and density logs. (We will need acoustic impedance to compute acoustic reflectivity and thus the synthetic seismic response.)

Some computer languages would achieve this with the use of a counter. Using the function `enumerate()` which we met earlier, this might look like:

In [57]:
impedance = []
for i, v in enumerate(vp):
    imp = v * rho_si[i]
    impedance.append(imp)

Let's look at what we made:

In [63]:
impedance

[5635000.0, 5640000.0, 6125000.0, 5865000.0, 7280000.0]

We don't need to do this in Python, however. Even with the eminently Pythonic `enumerate` trick, we can do better. The function `zip()` will take two or more sequences and match them, item for item, until it reaches the end of the shortest sequence:

In [65]:
for t in zip("Sandstone", "Mudstone"):
    print(t)

('S', 'M')
('a', 'u')
('n', 'd')
('d', 's')
('s', 't')
('t', 'o')
('o', 'n')
('n', 'e')


It didn't return a tuple containing the last character of `'Sandstone'` because it ran out of `'Mudstone'`. 
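
If silently dropping the leftover items is not what you want, the standard library's `itertools.zip_longest()` is one alternative. The sketch below compares the two; the fill value `'-'` is an arbitrary choice for illustration.

```python
from itertools import zip_longest

a = "Sandstone"   # 9 characters
b = "Mudstone"    # 8 characters

# zip() stops at the end of the shorter sequence.
pairs = list(zip(a, b))
print(len(pairs))

# zip_longest() carries on to the end of the longer one,
# padding the shorter sequence with fillvalue.
padded = list(zip_longest(a, b, fillvalue='-'))
print(padded[-1])
```

For well logs of equal length, like `vp` and `rho_si`, the difference doesn't arise, so plain `zip()` is all we need here.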

Let's apply this to our impedance loop, again using the simultaneous assignment trick to name the `vp` values `v` and the `rho_si` values `r`:

In [66]:
impedance = []
for v, r in zip(vp, rho_si):
    imp = v * r
    impedance.append(imp)

In [68]:
impedance

[5635000.0, 5640000.0, 6125000.0, 5865000.0, 7280000.0]

### Exercise

Write this new loop as a list comprehension:

In [69]:
# SOLUTION
impedance = [v * r for v, r in zip(vp, rho_si)]
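
We mentioned that impedance is the input to reflectivity. As a hedged sketch, and assuming the usual normal-incidence formula (lower minus upper impedance, divided by their sum), we could pair each layer with the one below it using `zip()` and list slices. The names `upper`, `lower`, `rc` and `reflectivity` are ours, for illustration only.

```python
vp = [2300, 2400, 2500, 2300, 2600]                # m/s
rho_si = [2450.0, 2350.0, 2450.0, 2550.0, 2800.0]  # kg/m^3

impedance = [v * r for v, r in zip(vp, rho_si)]

# Pair each layer with the one beneath it: every item but the last,
# zipped with every item but the first.
reflectivity = []
for upper, lower in zip(impedance[:-1], impedance[1:]):
    rc = (lower - upper) / (lower + upper)
    reflectivity.append(rc)

print(len(reflectivity))
```

There is one reflection coefficient per interface, so the result has one fewer item than the impedance list.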

## Making decisions with `if`

We very often want to make certain operations in a program depend on the outcomes of other operations. For example, we would only want to transform the units of a measurement if the measurement is not already in those units. Let's do this for our SI unit transformation, only transforming numbers that are less than 10 and therefore likely in the cgs system.

In [75]:
transformed = []
for value in rho:
    if value < 10:
        value *= 1000
    transformed.append(value)
transformed

[2450.0, 2350.0, 2450.0, 2550.0, 2800.0]

Note that, for numbers, `value *= 1000` is equivalent to `value = value * 1000`; it just avoids a bit of repetition.
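
One thing worth checking is what happens to the original list. In this minimal sketch, reassigning `value` inside the loop only changes what the name `value` refers to; the numbers stored in `rho` are left alone.

```python
rho = [2.45, 2.35, 2.45, 2.55, 2.80]

transformed = []
for value in rho:
    if value < 10:
        value *= 1000   # Rebinds the name 'value'; rho itself is untouched.
    transformed.append(value)

print(rho)          # The original list.
print(transformed)  # The new list.
```

This is why we collect the results in a new list rather than expecting the loop to change `rho` in place.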

If we run the same code on the `vp` data, the transformation is not applied and we get:

    [2300, 2400, 2500, 2300, 2600]
    
As well as telling the code what to do if a condition is true, we'd often like to specify what to do if it's false as well. In the previous example, we didn't specify anything, so the program did nothing when `value` was equal to or greater than 10. To provide a specific path for when the test fails, we use `else`:

In [141]:
transformed = []
for value in rho:
    if value < 10:
        value *= 1000
    else:
        value = value
    transformed.append(value)
transformed

[2450.0, 2350.0, 2450.0, 2550.0, 2800.0]

This doesn't do anything different; it just makes the implicit 'do nothing' explicit.
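
An `if... else` choice can also be written inside a list comprehension, using Python's conditional expression, which has the form `A if test else B`. This is a sketch only; the list `mixed` is made up to show both branches being used.

```python
# A made-up list mixing g/cm3 values and kg/m3 values.
mixed = [2.45, 2350.0, 2.55, 2800.0]

# Multiply small values by 1000; pass the others through unchanged.
converted = [value * 1000 if value < 10 else value for value in mixed]
print(converted)
```

Here the `else` is required, because every item needs some value in the new list.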

If we have multiple things to test for, we can simply add more `if` blocks. However, if the conditions need to be applied in a mutually exclusive way, then we can use `if... elif... else`, where `elif` means "else if" and simply applies a new conditional:

In [145]:
rho[0] = -0.1

transformed = []
for value in rho:
    if value < 0:
        value = -1
    elif value < 10:
        value *= 1000
    else:
        value = value
    transformed.append(value)
transformed

[-1, 2350.0, 2450.0, 2550.0, 2800.0]
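
To see why the mutually exclusive form matters, compare it with two separate `if` blocks applied to the same negative value. The names `separate` and `chained` are just labels for this sketch.

```python
value = -0.1

# Two independent if blocks: both tests are evaluated, one after the other.
separate = value
if separate < 0:
    separate = -1
if separate < 10:
    separate *= 1000

# if... elif: the second test is skipped once the first one succeeds.
chained = value
if chained < 0:
    chained = -1
elif chained < 10:
    chained *= 1000

print(separate, chained)
```

With separate blocks, the flag value `-1` also passes the second test and gets multiplied by 1000, which is probably not what we intended.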

## `break` and `continue`

Sometimes we'd like to short-circuit the loop. For example, when reading a file, we might need to skip lines in the file that look like comments. Suppose we have a file with the following contents:

    # B-41 Formation tops
    # Depths in metres
    WyanDot FM,858.62158
    DAWSON CANYON FM,985.11358
    LOGAN CANYON FM,1157.0207
    Upper MISSISAUGA FM, -2246.9856m
    Lower MISSISAUGA FM,3190.6464
    Base O-Marker,2472.561
    Abenaki,-999.25
    pay_sand_1,
    TD,-999.25
    % END OF DATA
    
Those first two lines clearly need to be treated differently from the other lines in the file. For now, we'll ignore them. 

The last line is garbage; let's just stop once we've processed anything called `'TD'`.

Let's start by reading all the lines in the file into a list of strings, where each string is one line. We do this by making a file-reading 'context' with the keyword `with`. This simply gives us a block in which we can open the file and read from it, and Python's context manager will take care of closing the file when we're done. The `'r'` tells the function `open()` to open the file for reading.

In [164]:
with open('../data/B-41_tops.txt', 'r') as f:
    data = f.readlines()
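
If you'd like to convince yourself that the file really is closed afterwards, here's a self-contained sketch. It writes a small throwaway file first, so the path and contents are made up for illustration, and then inspects the file object's `closed` attribute inside and outside the `with` block.

```python
import os
import tempfile

# Write a small throwaway file so the example stands alone.
path = os.path.join(tempfile.mkdtemp(), 'example_tops.txt')
with open(path, 'w') as f:
    f.write("# A comment\nTop A,100.0\nTop B,200.0\n")

# Read it back, one string per line.
with open(path, 'r') as f:
    lines = f.readlines()
    print(f.closed)   # Inside the block the file is still open.

print(f.closed)       # After the block, Python has closed it for us.
print(lines)
```

Notice that each string in `lines` keeps its trailing newline character, just as the strings in `data` do.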

Now we have a list of strings, so we can use a `for` loop to process it. We'll use `continue` to skip the rest of the loop body when a line starts with the character `'#'`, and `break` to leave the loop altogether once we've processed `'TD'`.

In [165]:
tops = {}
for line in data:
    if line.startswith('#'):
        continue
    name, depth = line.split(',')
    tops[name] = depth
    if name == 'TD':
        break
tops

{'Abenaki': '-999.25\n',
 'Base O-Marker': '2472.561\n',
 'DAWSON CANYON FM': '985.11358\n',
 'LOGAN CANYON FM': '1157.0207\n',
 'Lower MISSISAUGA FM': '3190.6464\n',
 'TD': '-999.25\n',
 'Upper MISSISAUGA FM': ' -2246.9856m\n',
 'WyanDot FM': '858.62158\n',
 'pay_sand_1': '\n'}

### Exercise

Modify the file reader to produce better results. For example, try to do the following:

- Remove the new line characters (`'\n'`) from the ends of the lines.
- Make sure the numbers are represented as numbers, not strings.
- Skip tops with null depth values, represented by `-999.25`.
- Regularize the names a bit, so they aren't so heterogeneous.
- Fix anything else that looks like it might cause a problem.

In [167]:
# SOLUTION

# We'll use some parameters instead of hard-wiring these options.
fix_case = True
delimiter = ','
comment = '#'
null = -999.25

# Process the list of lines:
tops = {}
for line in data:

    # Skip comment rows.
    if line.startswith(comment):
        continue

    # Assign names to elements.
    name, dstr = line.split(delimiter)

    if fix_case:
        name = name.title()
        
    # Stop processing if we reached TD.
    if name.lower() == 'td':
        break

    # Remove whitespace, then any trailing unit letters (e.g. 'm' or 'ft').
    dstr = dstr.strip().lower().rstrip('mft')

    # Skip NULL entries.
    if (not dstr) or (dstr == str(null)):
        continue

    # Correct for other negative values.
    depth = float(dstr)
    if depth < 0:
        depth *= -1
        print('Changed depth: {}'.format(name))

    tops[name] = depth

tops

Changed depth: Upper Missisauga Fm


{'Base O-Marker': 2472.561,
 'Dawson Canyon Fm': 985.11358,
 'Logan Canyon Fm': 1157.0207,
 'Lower Missisauga Fm': 3190.6464,
 'Upper Missisauga Fm': 2246.9856,
 'Wyandot Fm': 858.62158}
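
The string cleaning in that solution is worth a closer look. In particular, `rstrip('mft')` does not remove the suffix `'mft'`; it removes any run of the characters `m`, `f` or `t` from the right-hand end. Here's a minimal sketch applied to one awkward value from the file:

```python
raw = ' -2246.9856m\n'

# strip() with no argument removes leading and trailing whitespace,
# including the newline.
cleaned = raw.strip()

# rstrip('mft') treats its argument as a set of characters to remove
# from the right-hand end, so it handles 'm' and 'ft' alike.
number_text = cleaned.rstrip('mft')

# Only now is the text safe to convert to a number.
depth = float(number_text)
print(depth)
```

The sign is still negative at this point, which is why the solution goes on to check for negative depths separately.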

## `while`

The `for` loop is guaranteed to run a finite number of times, because it only loops over some existing collection of items. Sometimes you instead want to keep going until some condition is met; then you need `while` instead of `for`.

In practice, these occasions are few and far between. One pattern that does come up is waiting for a certain kind of input from a user:

In [55]:
while True:
    n = input("Please enter a number between 0 and 1: ")
    if 0 < float(n) < 1:
        break

print(n, "met the condition.")

Please enter a number between 0 and 1: 2
Please enter a number between 0 and 1: 43
Please enter a number between 0 and 1: 0.12
0.12 met the condition.


Notice how the condition in this case is, by definition, always true. We then test for our desired condition inside the loop, and break out of the loop explicitly if it is true.

Another way to do this would be to put a negative version of our test in the loop definition. In other words, we're saying, "keep doing this as long as our condition is not met". This makes the loop neater, but we have to predefine the variable `n`, which feels like an aesthetic compromise. 

In [56]:
n = 0
while not (0 < float(n) < 1):
    n = input("Please enter a number between 0 and 1: ")

print(n, "met the condition.")

Please enter a number between 0 and 1: 3
Please enter a number between 0 and 1: 2
Please enter a number between 0 and 1: -4
Please enter a number between 0 and 1: 2.3
Please enter a number between 0 and 1: -100
Please enter a number between 0 and 1: 1.2
Please enter a number between 0 and 1: 0.23
0.23 met the condition.
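
Because `input()` needs someone at the keyboard, it's awkward to experiment with. As a sketch, we can stand in for the user with a list of canned responses and count how many times the loop runs. The names `responses` and `attempts` are made up, and `next()` here simply fetches the following canned response.

```python
# Canned responses standing in for what a user might type.
responses = iter(['2', '43', '0.12'])

attempts = 0
n = 0
while not (0 < float(n) < 1):
    n = next(responses)   # In the real loop this would be input(...).
    attempts += 1

print(n, "met the condition after", attempts, "attempts.")
```

Unlike a `for` loop, nothing here guarantees the loop will end: if no response ever satisfied the test, this sketch would fail when the canned responses ran out, and the real version would keep asking forever.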


## Turing complete

The tools we have met in this chapter, loops and conditional statements, will enable us to control the 'flow' of our programs. Indeed, with loops and conditionals, along with the tools we have met so far (variables, operators, and so on), we now have everything we need to write *any* computer program! Congratulations on attaining Turing completeness!

Next we'll look at another important data structure — the dictionary. 