# Geological abstractions

You've seen how to represent individual numbers and strings — and we'll use these concepts a lot in our programming. But you also need to be able to represent more complex things, like grains, or rocks, or maps, or models of the subsurface, or other geological entities. 

In general, we'd like to be able to make collections of data that represent these things and let us use them in our programs. Finding or making such collections is part of the challenge, and the fun, of programming.

On a fundamental level, 'geocomputing' is just this — the search for useful models of the earth in code and data.

In our attempts to make useful models, we're going to meet the following Python types:

- `list`: mutable ordered collections
- `tuple`: immutable ordered collections
- `dict`: mappings, short for 'dictionary'
- `set`: unordered collections of unique elements

We'll look at lists and tuples in this chapter, then continue with dicts and sets in the next. Let's get started!

## `list`

Lists in Python are one-dimensional, ordered containers whose elements may be any Python objects (numbers, strings, other lists, or a mixture of things). Lists are *mutable* — they can be changed in place — and have methods for adding and removing elements to and from themselves.

The syntax to define a list is comma-separated values surrounded by square brackets (`[]`). The square brackets are a syntactic hint that lists are indexable.
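As a small illustration that list elements can be any Python objects, here is a hypothetical description of a single rock sample. The layout (lithology, Vp, density, porosity readings) and the values are made up for this sketch.

```python
# A hypothetical rock sample; the values are illustrative only.
# [lithology, Vp in m/s, density in g/cm3, two porosity readings]
sample = ['sandstone', 2400, 2.45, [0.21, 0.18]]

print(len(sample))      # 4 elements
print(type(sample[0]))  # <class 'str'>
print(type(sample[3]))  # <class 'list'>: a list nested inside the list
```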

Let's imagine a stratigraphic series of six rocks, a sequence (in the non-technical sense). We'll define a list of P-wave velocities representing these rocks:

In [1]:
vp = [2100, 2400, 2300, 2500, 2350, 2750]
vp

[2100, 2400, 2300, 2500, 2350, 2750]

As with strings, we can index into the list with square brackets:

In [2]:
vp[0]

2100

Separating two or more index expressions with commas gives us a tuple of those elements:

In [3]:
vp[1], vp[-1]

(2400, 2750)

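Slicing with a colon returns a new list containing a range of elements; here, everything from index 2 to the end:

In [4]: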
vp[2:]

[2300, 2500, 2350, 2750]
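Negative indices count back from the end, so `vp[-1]` is the same element as `vp[len(vp) - 1]`. This minimal check uses a new name, `velocities`, with the same values, so the notebook's `vp` is left alone. It also shows what happens when an index runs off the end of the list.

```python
velocities = [2100, 2400, 2300, 2500, 2350, 2750]

n = len(velocities)
print(n)                                     # 6
print(velocities[-1] == velocities[n - 1])   # True
print(velocities[-2] == velocities[n - 2])   # True

# Valid indices run from 0 to n - 1, so index n is out of range.
try:
    velocities[n]
except IndexError as err:
    print('IndexError:', err)   # IndexError: list index out of range
```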

Unlike strings, lists are **mutable**, which means I can change an element in place:

In [5]:
print(vp[0])
vp[0] = 2150
print(vp[0])

2100
2150


We can add another 'layer' to the sequence of rocks:

In [6]:
vp.append(3000)
vp

[2150, 2400, 2300, 2500, 2350, 2750, 3000]
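Lists have several other methods for adding and removing elements. This sketch uses a separate list, `layers`, so that `vp` keeps the values shown in the rest of the chapter. Note the difference between `append()`, which adds its argument as a single element, and `extend()`, which adds each element of an iterable.

```python
layers = [2150, 2400, 2300]

layers.append([2500, 2350])   # adds ONE element: the whole list
print(layers)                 # [2150, 2400, 2300, [2500, 2350]]

layers.pop()                  # removes (and returns) the last element
layers.extend([2500, 2350])   # adds each element separately
print(layers)                 # [2150, 2400, 2300, 2500, 2350]

layers.insert(0, 1800)        # put 1800 at index 0
layers.remove(2300)           # remove the first element equal to 2300
print(layers)                 # [1800, 2150, 2400, 2500, 2350]
```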

We can also concatenate lists:

In [7]:
vp + [4000, 5000]

[2150, 2400, 2300, 2500, 2350, 2750, 3000, 4000, 5000]

Note that concatenation builds and returns a new list; it didn't change the list we made:

In [8]:
vp

[2150, 2400, 2300, 2500, 2350, 2750, 3000]
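To keep the result of a concatenation, assign it to a name. A minimal sketch, again with a separate list so `vp` is untouched:

```python
base = [2150, 2400]

longer = base + [4000, 5000]   # builds a new list
print(base)     # [2150, 2400]: unchanged
print(longer)   # [2150, 2400, 4000, 5000]

base = base + [4000, 5000]     # rebind the name to the new list
print(base)     # [2150, 2400, 4000, 5000]
```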

### Exercise: what is the expected output of the following commands?

- `vp[:3]`
- `vp[-2:]`
- `vp[1:-1]`
- `vp[1::2]`

### Exercise: more indexing practice

In [9]:
periods = ['Cambrian (\uA792)', 'Ordovician (O)',  'Silurian (S)', 
           'Devonian (D)', 'Carboniferous (C)', 'Permian (P)',
           'Triassic (T)', 'Jurassic (J)',  'Cretaceous (K)', 
           'Palaeogene (Pg)', 'Neogene (N)', 'Quaternary (Q)']

- return the string `Triassic (T)` 
- return just the word `Jurassic`
- return the abbreviation `Pg`

Try to do these things without explicitly using the positions of the characters. (Hint: use the string method `find()`.)

## Nested `list`

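Lists can contain anything, including other lists. Below, each element is itself a list holding a period's name and a list of its start and end ages in millions of years (Ma). This works, but at some point things get out of hand. There's often a better way.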

In [10]:
# Each element: [name, [start age, end age]], ages in Ma.
periods = [["Cambrian (\uA792)", [541, 485]], ["Ordovician (O)", [485, 444]],
           ["Silurian (S)", [444, 419]], ["Devonian (D)", [419, 359]],
           ["Carboniferous (C)", [359, 299]], ["Permian (P)", [299, 252]],
           ["Triassic (T)", [252, 201]], ["Jurassic (J)", [201, 145]],
           ["Cretaceous (K)", [145, 66]], ["Palaeogene (Pg)", [66, 23]],
           ["Neogene (N)", [23, 2.6]], ["Quaternary (Q)", [2.6, 0]],
          ]
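Here is a sketch of why nested lists become unwieldy: getting at a single number takes several levels of indexing, and the meaning of each index lives only in your head. It uses a shortened list, `some_periods`, in the same `[name, [start, end]]` layout as above, with ages in Ma.

```python
# Three entries in the same [name, [start, end]] layout; ages in Ma.
some_periods = [["Permian (P)", [299, 252]],
                ["Triassic (T)", [252, 201]],
                ["Jurassic (J)", [201, 145]]]

name = some_periods[1][0]        # 'Triassic (T)'
start = some_periods[1][1][0]    # 252, but which index means what?
end = some_periods[1][1][1]      # 201
print(name, start - end)         # Triassic (T) 51
```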

## Pointers and gotchas

Let's say we'd like to make another list, using `vp` as a starting point:

In [11]:
vp_new = vp

Let's change the first element of the new list:

In [12]:
vp_new[0] = 1486

vp_new

[1486, 2400, 2300, 2500, 2350, 2750, 3000]

A reminder of what `vp` contains:

In [13]:
vp

[1486, 2400, 2300, 2500, 2350, 2750, 3000]

Wait a minute! We changed `vp[0]` to `2150`... why is it now `1486`? 

This is one of the perils of high-level programming languages. Most of the time, the machinations of memory and computation are hidden away... but we can't forget them, especially with mutable types.

The problem is that the command `vp_new = vp` didn't copy the actual list that resides in memory into a new bit of memory. It only made a new name and pointed it at the same bit of memory `vp` was pointed at. So both names are references to the same thing.
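We can confirm that two names refer to the same object with the `is` operator, which compares identity rather than value; `id()` gives the same answer. This sketch uses new names so it doesn't disturb `vp`:

```python
original = [2150, 2400, 2300]
alias = original                   # no copy: a second name for the same list

print(alias is original)           # True: same object
print(id(alias) == id(original))   # True: same identity

alias[0] = 1486
print(original)                    # [1486, 2400, 2300]
```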

So, how do we make a copy? It depends, because there are two kinds of copy: shallow and deep. A shallow copy makes a new container object, but its elements are references to the same objects the original contains. A deep copy, on the other hand, recursively copies the object and everything it contains.
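The difference only shows up when the list contains mutable objects. In this sketch, `layers` is a hypothetical list of `[Vp, density]` pairs; mutating one of the inner lists is visible through the shallow copy but not through the deep copy.

```python
import copy

# Hypothetical nested data: each inner list is [Vp, density].
layers = [[2400, 2.3], [2750, 2.5]]

shallow = layers.copy()        # new outer list, same inner lists
deep = copy.deepcopy(layers)   # new outer list AND new inner lists

layers[0][0] = 9999            # mutate an inner list

print(shallow[0])              # [9999, 2.3]: shares the inner list
print(deep[0])                 # [2400, 2.3]: an independent copy
print(shallow is layers)       # False: the outer lists are different objects
```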

This is the best way to make a shallow copy:

In [14]:
vp_new = vp.copy()

There are other popular ways to make a shallow copy. You'll often see people write `vp_new = vp[:]`, which achieves the same thing, but we find it a little cryptic. You might also see `vp_new = list(vp)`, which is also not all that obvious. It's better to use the crystal-clear `vp.copy()`.
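A quick check that the three idioms produce equal lists that are nonetheless separate objects:

```python
original = [2150, 2400, 2300]

copies = [original.copy(), original[:], list(original)]
for c in copies:
    print(c == original, c is original)   # True False, three times
```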

To make a deep copy, which is always the safest thing to do if your lists contain mutable objects (such as other lists), you'll need the `copy` module, which is part of Python's standard library and therefore always available. In our case the list only contains numbers, so a deep copy has no advantage, but here's how we would do it:

In [15]:
import copy
vp_new = copy.deepcopy(vp)

Now that we have a copy, we can change the first element of `vp` back to what it was:

In [16]:
vp[0] = 2150

...and check that `vp_new` did not change:

In [17]:
vp_new

[1486, 2400, 2300, 2500, 2350, 2750, 3000]

It worked! The names `vp` and `vp_new` are now pointing at different objects in memory.

This behaviour isn't limited to lists: assignment never copies an object, whatever its type. It matters most for mutable types, such as the dictionaries we'll meet in the next chapter, and for tuples (coming up next) that contain mutable elements:

In [18]:
a = {1: 'a', 2: 'b'}
b = a
b[2] = 'c'
print(a)

{1: 'a', 2: 'c'}
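Dictionaries also have a `copy()` method that makes a shallow copy, so the fix mirrors the one for lists. A minimal sketch:

```python
a = {1: 'a', 2: 'b'}
b = a.copy()    # shallow copy: a new dict object
b[2] = 'c'

print(a)        # {1: 'a', 2: 'b'}: unchanged
print(b)        # {1: 'a', 2: 'c'}
```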


## `tuple`

*Tuples* are a lot like lists, but they are *immutable*: once created, they cannot be changed. They behave almost exactly like lists, except that you cannot add, remove, or replace any of their elements. There are no `append()` or `extend()` methods, and operators such as `+=` build a new tuple rather than changing the existing one.

Tuples also differ from lists in their syntax. They are so central to how Python works that tuples are defined by commas. You'll often see tuples surrounded by parentheses, but the parentheses only group expressions or make the code more readable; they don't define the tuple. The empty tuple, `()`, is the one exception.

In [19]:
a = (1, 2, 3, 4)  # a length-4 tuple
b = (42,)         # a length-1 tuple, defined by the comma
c = (42)          # not a tuple, just the number 42
d = ()            # the empty tuple: no commas, no elements
e = 42, 1         # a length-2 tuple, no parentheses needed
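Checking the type of each name confirms which ones are really tuples:

```python
a = (1, 2, 3, 4)
b = (42,)
c = (42)
d = ()
e = 42, 1

for name, value in [('a', a), ('b', b), ('c', c), ('d', d), ('e', e)]:
    print(name, type(value).__name__, value)
# a tuple (1, 2, 3, 4)
# b tuple (42,)
# c int 42
# d tuple ()
# e tuple (42, 1)
```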

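A trailing comma turns whatever precedes it into a one-element tuple, even when that is itself an empty tuple:

In [20]: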
(),

((),)

Because tuples are immutable, trying to assign to an element raises an error:

In [21]:
a[2] = 5

TypeError: 'tuple' object does not support item assignment
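Operations that look as if they modify a tuple actually build a new one and rebind the name. A small check with `is`:

```python
t = (1, 2, 3)
before = t

t += (4,)            # builds a new tuple and rebinds the name t
print(t)             # (1, 2, 3, 4)
print(before)        # (1, 2, 3): the original tuple is untouched
print(t is before)   # False
```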

You can concatenate tuples in the same way as lists, but be careful about the order of operations: `+` is evaluated before the commas that build a tuple. This is where parentheses come in handy:

In [22]:
(1,2)+(3,4)

(1, 2, 3, 4)
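To see the order of operations at work, compare the same numbers with and without parentheses:

```python
without_parens = 1, 2 + 3, 4    # 2 + 3 is evaluated first
with_parens = (1, 2) + (3, 4)   # two tuples concatenated

print(without_parens)   # (1, 5, 4)
print(with_parens)      # (1, 2, 3, 4)
```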

Note that even though tuples are immutable, they may have mutable elements. Suppose we have a list embedded in a tuple. The list can be modified in place, even though it can't be removed from the tuple or replaced wholesale:

In [23]:
x = 1.0, [2, 4], 16
x[1].append(8)
x

(1.0, [2, 4, 8], 16)
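The distinction is between mutating the list and replacing the tuple's element. A sketch of both:

```python
x = 1.0, [2, 4], 16

x[1].append(8)              # allowed: mutates the list inside the tuple
try:
    x[1] = [2, 4, 8, 16]    # not allowed: would replace a tuple element
except TypeError as err:
    print('TypeError:', err)

print(x)                    # (1.0, [2, 4, 8], 16)
```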

In this way, the same gotcha that lists have can also apply to tuples:

In [24]:
a = [1]
b = [2]
c = (a, b)  # Immutable... but the contents are mutable!
d = c       # Just a pointer to the object `c` points to.
a[0] = 100
print(c)
print(d)

([100], [2])
([100], [2])
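A deep copy protects against this too, because `copy.deepcopy()` copies the tuple and the lists inside it. A minimal sketch:

```python
import copy

a = [1]
b = [2]
c = (a, b)
snapshot = copy.deepcopy(c)   # copies the tuple AND the lists it contains

a[0] = 100
print(c)          # ([100], [2])
print(snapshot)   # ([1], [2]): unaffected
```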


## Collector's items

Now you can make collections of things! These collections will be important in almost every Python program we write. If they haven't sunk in yet, don't worry — you will get plenty of practice with these data structures.

In the next chapter, we'll look at another important container — dictionaries — which will also become very familiar over time. We'll also meet sets, which have some special properties of their own.