# Intro to Python - For Video

We are going to work on the basics for working with some well data over the course of this and following tutorials.  At the end of this, you should be able to read in a text file with log data, manipulate it to some extent, and then save it back to disk.

We will start by reading in a file containing some sample data, and then work on that data in a variety of ways. For now, do not focus too much on how the actual file is read; we _will_ come back to it. Just treat it as a black box for the moment. Explaining the syntax needs a little more context about concepts that have not been explained yet (because none have been explained yet!).

In [None]:
with open('../data/well_log.txt', 'r') as f:
    log = f.readlines()

log

We are going to start with that first group of values, which are the thicknesses of each horizon.

In order to do that, we need to extract them from `log`, whatever that is. Python has a built-in **_function_** named `type` to check. We use a function by _**calling**_ it with parentheses and giving zero or more **_arguments_** that the function will do things with. Here, `type` is the function, and `log` is the argument passed to it when we call it:

In [None]:
type(log)

We can see that `type` has _**returned**_ the type of `log`: `list`. A list is an ordered sequence made up of a number of elements. Lists are recognisable as `[]` surrounding a number of elements separated by commas. If you need to make a list explicitly, the general pattern is as follows:

In [None]:
['a', 'b', 'c']

A useful thing to know is how long a list is:

In [None]:
len(log)

This means that `log` is a list made up of three elements (in this case, each element is a line in our input file). We can extract a specific element from our `log` list using the following syntax:

In [None]:
log[0]

The value in the `[]` is the index of a given element, starting from `0`. What is the type of `log[0]`?

In [None]:
type(log[0])

`str` is the basic type for strings, which can be thought of as lists of characters, with matching `'` or `"` at each end: `'This is a string'` and `"so is this!"`. Strings have a number of useful **_methods_** (which are functions attached to an object and called on it with a dot, for example `'abc'.upper()`, rather than standalone functions such as `len`). We will first save our `log[0]` to something more convenient to type, using the **_assignment_** operator `=`.

In [None]:
layers = log[0]

Assignment is silent, which means that we do not get any feedback when it works. (If it failed for some reason, we would get an error, in Python termed an _**Exception**_. You will soon become familiar with these.) We now have a `str` named `layers`:

In [None]:
layers

One of the most useful string methods is `split`, which allows us to break a string into a list, based on a specific substring. In this case, we can see that each element is separated by a `,`. There is also `\n` at the end. This is a newline character, and we can remove it using `strip`. There are a number of other useful string methods that we will look at later, but for now we will just use these.

In [None]:
layers.strip()

In [None]:
layers.split(',')

Why do we still have the `\n` in the second result there? Strings are "immutable", which means that they cannot be changed once created. The various string methods return _copies_ of the original string, so `strip` did not alter `layers` itself. We can also chain these methods together if we want to use multiple methods. The result will be assigned the name `layers_list`.

In [None]:
layers_list = layers.strip().split(',')
layers_list
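
As a minimal sketch of the same idea on made-up data (the string `raw` below is hypothetical, not taken from the log file), we can check that `strip` and `split` each return a new object and leave the original string untouched:

```python
# Hypothetical line of text, standing in for one line of the well-log file.
raw = '10,20.5,3\n'

stripped = raw.strip()       # a new string without the trailing newline
parts = stripped.split(',')  # a new list of substrings

print(repr(raw))       # the original still ends with the newline
print(parts)

# Splitting without stripping keeps the newline on the last element.
unstripped_parts = raw.split(',')
print(unstripped_parts)
```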

This has now given us a list of what look like numbers.

<hr>

We can access elements as already shown, where the first element has index `0`:

In [None]:
layers_list[0]

Accessing multiple elements uses the following pattern:

In [None]:
layers_list[3:7]

Note that the value (`1.3`) at the second index (`7`) is not included. `list[a:b]` will fetch the elements at index `a` up to, but not including index `b`. If we want the last element, we can use `list[-1]`:

In [None]:
layers_list[-1]

We can also take elements out of the list at regular intervals by adding a third value, termed the 'stride'. In this case, we will start at the end of the list and then stride backwards to every second element until we get to index 4:

In [None]:
layers_list[-1:4:-2]
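
To make the indexing rules easier to follow, here is a small sketch on a hypothetical list of letters, where the value at each position is easy to predict:

```python
# Hypothetical list; 'a' is at index 0 and 'h' is at index 7.
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

middle = letters[3:7]          # indices 3, 4, 5, 6 (7 is excluded)
last = letters[-1]             # negative indices count from the end
backwards = letters[-1:4:-2]   # start at the end, step back by 2, stop before index 4

print(middle)
print(last)
print(backwards)
```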

Adding elements to the end of a list can be done using the `append` method, while `pop` will remove a given item, by default from the end of the list, and give you the value back. `insert` will add elements before a given index, while `extend` will combine two lists together. The first two are much more common, but you may find uses for the second pair.

In [None]:
layers_list.append(13)
layers_list

In [None]:
layers_list.pop()

In [None]:
layers_list
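
The four list methods mentioned above can be sketched together on a hypothetical list (the values are made up for illustration):

```python
# Hypothetical list of thickness strings.
horizons = ['10', '20', '30']

horizons.append('40')            # add one element to the end
removed = horizons.pop()         # remove and return the last element
horizons.insert(1, '15')         # add '15' before index 1
horizons.extend(['50', '60'])    # add every element of another list
first = horizons.pop(0)          # pop can also take an index

print(removed, first)
print(horizons)
```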

An important property of lists is that elements can be changed, if we need to change them. This means that lists are _**mutable**_. We do this by assigning a value to a given index in the list. (Recall we said that `str`s are lists of characters? This next cell does not work on a string, because `str`s are **_immutable_**. There are other mutable and immutable types that we will look at later.)

In [None]:
layers_list[1] = '32'
layers_list

It is worth noting that elements in a list can be anything (although this is usually confusing, and we will have better methods of dealing with heterogeneous data later):

In [None]:
[123, 4242.21, "This is a string", "The next element is a function:", print]

Since we have numbers, we might want to do something like add them together:

In [None]:
layers_list[0] + layers_list[1]

That does not look at all right. What is the type of each element in `layers_list`?

In [None]:
type(layers_list[0]), type(layers_list[1])

These are strings, which do not add the way we might expect: `+` joins them end to end instead. We need to convert the elements to a proper number of some kind. Python gives us two main numeric types: integers (`int`) and decimals (`float`, short for floating point). There is also support for `complex` numbers (of the form `(1 + 1j)`), if you need those. (An aside: these basic numbers are immutable.)

We can make a string that looks like a number a proper number like this:

In [None]:
int('42'), float('42'), int(float('3.8')), float('3.8')

Note that `float` gives us a decimal representation, even when given an integer. `int` cannot handle strings that look like decimal numbers, and if given a `float`, simply truncates the number by removing everything after the decimal point. (`int('3.8')` raises an Exception, specifically a `ValueError`.) We can now add the thickness of the first two horizons together, or do other mathematical operations:

Normal mathematical operators are as follows:

| Exponentiation | Integer division | Modulo (remainder) | Multiplication | Division | Addition | Subtraction | Equality |
|----------------|------------------|--------------------|----------------|----------|----------|-------------|----------|
| `**`           | `//`             | `%`                | `*`            | `/`      | `+`      | `-`         | `==`     |

Note in particular that equality is checked with `==`, **not** `=`, which we have used for assignment already.
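
A short sketch of the conversions and operators described above, using made-up values rather than the log data:

```python
# Hypothetical string that looks like a decimal number.
text = '3.8'

as_float = float(text)    # 3.8
as_int = int(as_float)    # truncates towards zero, giving 3

# int() cannot parse a decimal-looking string directly.
try:
    int(text)
    conversion_failed = False
except ValueError as err:
    conversion_failed = True
    print(f'Could not convert: {err}')

a, b = 7, 2
print(a ** b, a // b, a % b, a / b)
print(a == b)             # comparison, not assignment

# Strings "add" by joining, which is why conversion matters.
joined = '1' + '2'
print(joined)
```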

In [None]:
int(layers_list[1]) + int(layers_list[0])

In [None]:
int(layers_list[1]) - int(layers_list[0])

In [None]:
int(layers_list[1]) * int(layers_list[0])

In [None]:
int(layers_list[1]) / int(layers_list[0])

Note that division gives us a `float` automatically. If we want integer division we need to use `//` instead. `%` will get us the remainder as well:

In [None]:
int(layers_list[1]) // int(layers_list[0]), int(layers_list[1]) % int(layers_list[0])

Something worth noting is how to deal with null data. Python has a built-in value `None`, whose type is `NoneType`. If you have missing data or similar, this should be what you use, not special numbers like -999.25 or -1 or similar.

In [None]:
None, type(None)

For some mathematical things we need more than just the built-in functions. We might need or want to have some trigonometric functions, for example. In order to use these, we need to `import` the relevant library. We have a few ways of doing this, depending on exactly what we want to do.

In [None]:
from math import sin
sin(3)

In [None]:
import math
math.sin(3)

In [None]:
import math as mm
mm.sin(3)

Each of these gets us access to a `sin` function, with some differences:

- The first only imports the `sin` function specifically. If we already have something named `sin`, then it will overwrite it.
- The second imports all of the functions in the `math` module, which means that we need to access `sin` as a function in the `math` module with `math.sin`. It is safer to do this, since it is unlikely that we have anything with that name already.
- The last is commonly used for making shorter names for very commonly used modules, since instead of writing `math.sin` we can just use `mm.sin`. Some common examples of this **_aliasing_** that you will come across are `import numpy as np`, `import matplotlib.pyplot as plt`, and `import pandas as pd`. These three in particular are the standard way of importing those specific libraries.
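
The overwriting behaviour in the first bullet can be sketched as follows. The string assigned to `sin` is a made-up placeholder, purely to show what happens to an existing name:

```python
# A name we defined ourselves (hypothetical placeholder value).
sin = 'a name we defined ourselves'

from math import sin      # replaces our earlier value with the function
bare_result = sin(0.0)

import math
import math as mm

same_module = (mm == math)      # the alias refers to the same module
math_result = math.sin(0.0)     # the namespaced form never clashed with our name

print(bare_result, math_result, same_module)
```

Using `math.sin` or an alias such as `mm.sin` avoids this kind of clash, at the cost of slightly more typing.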

<hr>

Working with each element in a list is somewhat tedious, so let us look at how to step through the list and make changes to each element as we go.

In Python, this is done with the pattern `for item in items:` followed by lines indented with four spaces:

In [None]:
for layer in layers_list:
    print(type(layer))
    print(layer)

A very common pattern is to `append` items to an empty list (`[]`) and use `for ... in ...:` to obtain each item.

In [None]:
layers_numbers = []
for layer in layers_list:
    layers_numbers.append(float(layer))

layers_numbers

We can achieve the same result with slightly more compact syntax using a "list comprehension":

In [None]:
layers_numbers_lc = [float(layer) for layer in layers_list]
layers_numbers_lc

In [None]:
layers_numbers == layers_numbers_lc

The return from that last cell is a type that we have not seen before. It is a `bool`, which is either `True` or `False`. We can use these to selectively run a block of code.

Boolean values can be obtained in a number of ways. Many functions or methods will return either True or False. Comparisons also return a `bool`:

| Equal to | Not equal to | Less than | Greater than | Less than or equal | Greater than or equal |
|----------|--------------|-----------|--------------|--------------------|-----------------------|
|   `==`   |     `!=`     |    `<`    |      `>`     |        `<=`        |          `>=`         |

Values of unrelated types are generally not equal (that is, something like `1 == '1'` is `False`), although numbers compare by value, so `1 == 1.0` is `True`. If you want to know if something is the same object as another, then you should use `is` and `is not`.

Some objects contain others (for example lists), and membership within a collection can be tested with `in`.

We can also link such expressions together:

| Operation | Result                               |
|-----------|--------------------------------------|
| x or y    | if x is false, then y, else x        |
| x and y   | if x is false, then x, else y        |
| not x     | if x is false, then True, else False |

In [None]:
layers_numbers[0] < layers_numbers[1]

In [None]:
layers_numbers[1] < layers_numbers[2]

In [None]:
layers_numbers[0] < layers_numbers[1] and layers_numbers[1] < layers_numbers[2]

In [None]:
layers_numbers[0] < layers_numbers[1] or layers_numbers[1] < layers_numbers[2]
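
Here is a small sketch of these comparisons and boolean operators on hypothetical values, including the difference between `==` and `is`, and the fact that `and`/`or` return one of their operands:

```python
# Hypothetical thicknesses.
a, b, c = 10.0, 25.0, 5.0

both = a < b and b < c      # second comparison is False
either = a < b or b < c     # first comparison is True
negated = not a < b

# Equality compares values; `is` checks for the very same object.
x = [1, 2]
y = [1, 2]
equal_values = (x == y)
same_object = (x is y)

# Membership test.
found = 25.0 in [a, b, c]

# `or` and `and` follow the rules in the table above.
fallback = 0 or 'fallback'
short_circuit = '' and 'unused'

print(both, either, negated)
print(equal_values, same_object, found)
print(repr(fallback), repr(short_circuit))
print(1 == 1.0, 1 == '1')
```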

<hr>

Using a new structure, `if`, we can decide whether or not to run a given block of code. `if <condition>:` will only run the indented block that follows if the `<condition>` evaluates to `True`, skipping over the indented code if it evaluates to `False`:

In [None]:
if layers_numbers[0] < layers_numbers[1]:
    print('First thickness is less than second')

In [None]:
if layers_numbers[1] < layers_numbers[2]:
    print('Second thickness is less than third')

The second cell never `print`s because `layers_numbers[1] < layers_numbers[2]` is False. We can put a catch-all at the end of this block, with `else:`. Note that this does not look at any condition; if none of the statements above are run, then this indented block will be run instead.

In [None]:
if layers_numbers[1] < layers_numbers[2]:
    print('Second thickness is less than third')
else:
    print('Second thickness is greater than or equal to third')

Multiple `if` statements are each checked independently, so every block whose condition is met will run:

In [None]:
if layers_numbers[0] > layers_numbers[1]:
    print('First thickness is greater than second')
if layers_numbers[0] < 15:
    print('First thickness is less than 15')
if layers_numbers[0] > 5:
    print('First thickness is greater than 5')
else:
    print('None of these were True.')

If we only want the _first_ block that has a condition evaluate to True to be run, then we can use the `elif <condition>:` (for "else if") instead:

In [None]:
if layers_numbers[0] > layers_numbers[1]:
    print('First thickness is greater than second')
elif layers_numbers[0] < 15:
    print('First thickness is less than 15')
elif layers_numbers[0] > 5:
    print('First thickness is greater than 5')
else:
    print('None of these were True.')

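Notice that the `else` block did not run in either case. In the first cell the `else` belongs only to the final `if`, so it was skipped because that last condition was true; in the second, a block earlier in the `if`/`elif` chain had already run. We can use this to treat different elements in an iterable differently: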

In [None]:
for layer in layers_numbers:
    if layer < 15:
        print(layer)
    else:
        print('Thick layer')
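
To see the difference between an `if`/`elif`/`else` chain and a series of separate `if` statements side by side, here is a sketch on hypothetical thicknesses. The labels and thresholds are made up for illustration:

```python
# Hypothetical thicknesses in metres.
thicknesses = [3.0, 12.0, 40.0]

# One chain: exactly one branch runs per value.
chained = []
for t in thicknesses:
    if t < 5:
        chained.append('thin')
    elif t < 15:
        chained.append('medium')
    else:
        chained.append('thick')

# Separate ifs: every matching branch runs.
independent = []
for t in thicknesses:
    if t < 5:
        independent.append('under 5')
    if t < 15:
        independent.append('under 15')

print(chained)
print(independent)
```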

We can also control the loop based on the value of given elements, using the `break` and `continue` statements:

In [None]:
for layer in layers_numbers:
    if layer > 35:
        print('Thickness is >35m, stopping loop.')
        break
    print(layer)

In [None]:
for layer in layers_numbers:
    if layer > 35:
        print('Thickness is >35m, skipping value.')
        continue
    print(layer)

As these examples show, `break` ends the loop entirely, while `continue` skips the rest of the current iteration and moves on to the next element in the iterable.
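
The same contrast can be sketched by collecting values into lists instead of printing them, again on made-up thicknesses:

```python
# Hypothetical thicknesses in metres; 40.0 is the only value above 35.
thicknesses = [10.0, 12.0, 40.0, 8.0]

# break: nothing after the first thick layer is processed.
until_break = []
for t in thicknesses:
    if t > 35:
        break
    until_break.append(t)

# continue: only the thick layer itself is skipped.
without_thick = []
for t in thicknesses:
    if t > 35:
        continue
    without_thick.append(t)

print(until_break)
print(without_thick)
```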

<hr>

If we are working specifically with numbers, then `numpy` offers us some conveniences over the built-in libraries, so let us take a look at those in particular. First we need to import numpy, which is invariably done as follows:

In [None]:
import numpy as np

What we are most interested in is a new type: numpy's `array`. We can make and use one in almost any situation where our data consists solely of numbers. First, we will convert our `list` to an `array` by type-casting it:

In [None]:
arr_layers = np.array(layers_numbers)
print(type(arr_layers))
arr_layers

This is very similar to the list, but has a wealth of additional methods to make working with large numerical datasets easier. If we give it a `list` of `int`s, they will stay as `int`s, but if there is one `float`, all of the values will be converted:

In [None]:
np.array([1, 2, 3, 4, 5, 6])

In [None]:
np.array([1, 2, 3, 4, 5, 5.5])

Some common `array` attributes and methods that you may find useful (note that `shape` is an attribute, so it takes no parentheses):

In [None]:
arr_layers.shape

In [None]:
arr_layers.max(), arr_layers.min(), arr_layers.mean()

In [None]:
arr_layers.sum()

Some operations are provided as functions in the numpy module, rather than as methods of a given array object. There are a few like this, so check the [documentation](https://numpy.org/doc/stable/) if you are looking for something specific. Some that are worth noting are the following:

In [None]:
np.nanmin(arr_layers), np.nanmax(arr_layers), np.median(arr_layers)

Where numpy arrays are really useful is in doing a mathematical operation on the entire array. For example, trying to multiply or add to all the values in a list requires accessing each value and performing multiplication or addition on it. In numpy that process is as simple as:

In [None]:
arr_layers

In [None]:
arr_layers * 2

In [None]:
arr_layers + 2
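
For contrast, this is roughly what the same operations look like on a plain Python list. It is a sketch on hypothetical values and deliberately uses no numpy:

```python
# Hypothetical plain list of numbers.
values = [1.0, 2.5, 4.0]

# On a list, * repeats the list rather than multiplying each element.
repeated = values * 2

# Element-wise arithmetic needs an explicit loop or comprehension.
doubled = [v * 2 for v in values]
shifted = [v + 2 for v in values]

# Adding a number to a list is an error.
try:
    values + 2
    add_failed = False
except TypeError as err:
    add_failed = True
    print(f'Lists do not support this: {err}')

print(repeated)
print(doubled)
print(shifted)
```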

We can select specific elements from an array in the same sort of way as we do in a list:

In [None]:
arr_layers[3]

In [None]:
arr_layers[:5]

In [None]:
arr_layers[-1::-2]

We can also easily create a boolean array by checking if some condition is true or false for each element:

In [None]:
arr_layers > 15

If we have a boolean array, we can select only the values that match that condition or that do not match:

In [None]:
arr_layers[arr_layers > 15]

This gives us a good grounding in dealing with (lists of) numbers, along with accessing specific items, doing some basic mathematics, and deciding whether we want to do certain operations based on the value of something. There is also an introduction to some useful numpy things. 

<hr>

We should swing back and look further at some of the things that we can do with strings.

In [None]:
log

The last item in the list is made up of strings, so we will use that further. We can start by `split`ting it as before:

In [None]:
lithologies = log[-1].split(',')
lithologies

Here are some useful string methods (there are others, of course). Note that all of these return a copy of the string, and do not change the original string. If we want to use the new copy, then it would need to be assigned to a name (which could be the same one).

In [None]:
lithologies[1].lower()

In [None]:
lithologies[1].title()

In [None]:
lithologies[1].replace('ST', 'stone')

In [None]:
lithologies[1].find('ST')

In [None]:
'sand' + 'stone'

In [None]:
lithologies[1] * 3

Strings are immutable, but are otherwise very similar to lists. We can extract specific characters out of a string using a very similar syntax to accessing elements in lists:

In [None]:
s = 'sandstone\tphi:\t0.3\n\t\tGR:\t32'
print(s)
s[4:9]

Because strings are immutable, `s[4] = 'S'` will raise an exception (a `TypeError`).
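
A minimal sketch of that failure, and of one way around it by building a new string from slices (the name `word` is just for illustration):

```python
word = 'sandstone'

# Item assignment is not allowed on a str.
try:
    word[4] = 'S'
    assignment_failed = False
except TypeError as err:
    assignment_failed = True
    print(f'Cannot modify a string in place: {err}')

# Instead, build a new string from the pieces we want to keep.
tail = word[4:9]
capitalised = word[:4] + 'S' + word[5:]

print(tail)
print(capitalised)
```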

Of course, we can combine string methods with things like `for` and `if` that we have seen already:

In [None]:
for lith in lithologies:
    if not lith.startswith('S'):
        print('Not shale, slate or sandstone:')
    print(lith)

We can insert values into strings in two ways: the `.format` method or f-strings. The latter are more compact, and there is no particular reason not to use them unless you are working on legacy code. Either way, we can put expressions or variables directly into a string.

In [None]:
for layer in layers_numbers:
    print(f'This layer is {layer}m thick.')

In [None]:
for layer in layers_numbers:
    print('This layer is {}m thick.'.format(layer))

We can also set the number of decimal places to display using either approach:

In [None]:
1/3

In [None]:
f'We can set decimal points in {1/3:.2f}.'

In [None]:
'We can set decimal points in {:.2f}.'.format(1/3)
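
A few more format specifications, sketched on a hypothetical `porosity` value. The part after the colon follows Python's format specification mini-language:

```python
# Hypothetical value to format.
porosity = 1 / 3

two_places = f'{porosity:.2f}'          # fixed number of decimal places
as_percent = f'{porosity:.1%}'          # multiply by 100 and add a % sign
padded = f'{porosity:8.3f}'             # total width of 8, right-aligned
with_format = '{:.2f}'.format(porosity) # same spec with .format

print(two_places)
print(as_percent)
print(repr(padded))
print(with_format)
```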

As with some of the other stuff we have seen, we can also chain methods:

In [None]:
'sandstone'.replace('sand', 'silt').upper()

There are other string methods (or methods that work on strings), but these cover many use-cases already.

<hr>

We can now deal with the three different lines that we originally read in, but we will want to look at them holistically as well. First, we need to convert each element of `log` (recall that these are strings) into a list. The layer thicknesses (`log[0]`) and the lithologies (`log[2]`) are already dealt with, so we only need to convert the densities (`log[1]`).

In [None]:
log

In [None]:
layers_numbers

In [None]:
densities = log[1].split(',')

In [None]:
densities_numbers = [float(density) for density in densities]
densities_numbers

In [None]:
lithologies

Now that we have all three, we can work with all three lists using `zip`. This returns the nth element of each iterable that it is passed, stopping when an iterable runs out of elements. In order to use it, we need to look at how to assign multiple values at once:

In [None]:
x, y, z = ['X1', 'Y1', 'Z1']
print(x)
print(y)
print(z)

If the number of names being assigned to is the same as the number of values, this works without a hitch. Python will raise an exception (a `ValueError`) if there is an imbalance. This is useful, because `zip` is going to give us the nth element from any iterables passed, so we can name each of them easily, rather than dealing with each triplet.
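
The imbalance mentioned above can be sketched like this, collecting the error messages rather than letting the exceptions stop the code:

```python
errors = []

# Too many values for the names on the left.
try:
    p, q = ['X1', 'Y1', 'Z1']
except ValueError as err:
    errors.append(str(err))

# Too few values for the names on the left.
try:
    p, q, r = ['X1', 'Y1']
except ValueError as err:
    errors.append(str(err))

for message in errors:
    print(message)
```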

In [None]:
for elem in zip(layers_numbers, densities_numbers, lithologies):
    print(elem)

In [None]:
for layer, den, lith in zip(layers_numbers, densities_numbers, lithologies):
    print(f'This {lith} layer is {layer}m thick, with a density of {den}g/cm^3.')
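
One thing to watch for is that `zip` stops at the shortest iterable without complaining. A small sketch with hypothetical lists of unequal length:

```python
# Hypothetical data: one density value is missing.
thicknesses = [10.0, 12.0, 40.0]
densities = [2.3, 2.5]

pairs = list(zip(thicknesses, densities))
print(pairs)

# A simple length check before zipping catches the mismatch.
lengths_match = len(thicknesses) == len(densities)
print(lengths_match)
```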

Somewhat similar is the `enumerate` function. This returns `n` and the value of the nth element of an iterable. It may be useful if you need some sort of count as you do things to the elements for some reason.

In [None]:
for count, layer in enumerate(layers_numbers):
    print(f'The item at index {count} has a value of {layer}.')

`enumerate` only takes a single iterable, but we can give it the result of a `zip` if we need it.

In [None]:
for count, value in enumerate(zip(layers_numbers, densities_numbers, lithologies)):
    print(f'The item at index {count} has a value of {value}.')
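
If we want names for the individual values rather than a single `value`, the unpacking can be nested with parentheses. This sketch uses made-up lists and also shows the optional `start` argument of `enumerate`:

```python
# Hypothetical data.
thicknesses = [10.0, 12.0]
liths = ['sandstone', 'shale']

lines = []
for number, (thick, lith) in enumerate(zip(thicknesses, liths), start=1):
    lines.append(f'Layer {number}: {lith}, {thick}m')

for line in lines:
    print(line)
```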

We can see that each item produced by `zip` is similar to a list, but different in some important ways.

In [None]:
type(value)

A `tuple` is a comma-separated sequence, like a list, but is immutable. It is represented surrounded by `()`, although this is not strictly needed when creating a tuple:

In [None]:
t = 1, 2, 'a', 'b'
type(t), t

In [None]:
t = (1, 2, 'a', 'b')
type(t), t

We can access elements in the same way as we have seen with lists and strings, although there are not many other methods associated with tuples, because of their immutability.

In [None]:
value[2]
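
A brief sketch of tuple behaviour on a hypothetical `layer` tuple: indexing and unpacking work as for lists, assignment to an element fails, and a one-element tuple needs a trailing comma:

```python
# Hypothetical (thickness, density, lithology) triplet.
layer = (10.0, 2.3, 'sandstone')

lith = layer[2]
thick, den, name = layer       # unpacking works as with lists

# Tuples are immutable, so item assignment fails.
try:
    layer[0] = 11.0
    assignment_failed = False
except TypeError as err:
    assignment_failed = True
    print(f'Cannot modify a tuple in place: {err}')

# Build a new tuple instead.
updated = (11.0,) + layer[1:]

single = (42,)        # the comma makes the tuple
not_a_tuple = (42)    # just the int 42 in parentheses

print(lith, updated)
print(type(single), type(not_a_tuple))
```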

<hr>

## Conclusion

This should set you up well with the basics of Python, with a relatively straightforward geoscience-based example. The rest of Agile's course will go into far more detail, but you have enough to be useful already.

<hr /><img src="https://avatars1.githubusercontent.com/u/1692321?v=3&s=200" style="float:center" width="40px" /><p><center>© 2020 <a href="http://www.agilegeoscience.com/">Agile Geoscience</a> - <a href="http://www.apache.org/licenses/LICENSE-2.0.html">Apache License 2.0</a></center></p>