# Intro to Python - For Video

Over the course of this and the following tutorials, we will work through the basics of handling well data. We will start by looking at how to work with some of the basic types of data that we expect to encounter.

In the sciences, much of our data is numerical, so we will start there.

## 1: Names and Maths

We can start with some numbers representing the thicknesses of a set of rock layers. Integers are referred to as `int`s, and decimal (floating-point) numbers as `float`s.

For the most part, we can use either when we are doing basic maths:

In [1]:
20 + 29.5

49.5

In [2]:
30 - 20

10

This gets inconvenient quickly if we need to refer to the same thickness repeatedly. We can "save" a thickness by assigning it to a name:

In [3]:
thickness1 = 20
thickness2 = 29.5
thickness3 = 30
thickness4 = 17

We can now use the name anywhere we like, and Python will look up the number that it refers to in memory.

In [4]:
thickness1 + thickness2

49.5

In [5]:
thickness2 - thickness1

9.5

Names also make it clear which values are our data and which are other values that we use in calculations but that are not actually data.

In [6]:
thickness4 / 10     # Division will get us a float by default.
                    # Integer division looks like: thickness // 10

1.7

In [7]:
2**3 # exponentiation is done like this.

8
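
The comment in the division cell mentions integer division. As a small sketch (reusing `thickness4` from above), here is `//` next to the related `%` (remainder) operator and the built-in `divmod`, which returns both results at once:

```python
thickness4 = 17

# Floor division: how many whole 10s fit into 17.
whole_tens = thickness4 // 10          # 1
# Modulo: the remainder left over.
remainder = thickness4 % 10            # 7
# divmod gives both results at once, as a pair.
quotient, rest = divmod(thickness4, 10)
print(whole_tens, remainder, quotient, rest)   # 1 7 1 7

# // rounds down (towards minus infinity), which matters for negative numbers.
print(-17 // 10)                       # -2
```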

For some maths operations, we need additional functions, which we can import from the `math` module.

In [8]:
import math

In [9]:
math.ceil(thickness3 / 10)

3

In [10]:
math.sin(2)

0.9092974268256817
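
One thing worth knowing about the `math` module: its trigonometric functions work in radians, so `math.sin(2)` above is the sine of 2 radians, not 2 degrees. A minimal sketch of converting first, using a made-up dip angle of 30 degrees purely for illustration:

```python
import math

# math.sin expects radians; 2 radians is roughly 114.6 degrees.
print(math.degrees(2))

# Convert a (hypothetical) dip angle from degrees to radians before using it.
dip_degrees = 30
dip_radians = math.radians(dip_degrees)
print(math.sin(dip_radians))   # very close to 0.5 (floating-point rounding)

# A few other commonly used functions and constants.
print(math.floor(29.5), math.sqrt(16), math.pi)   # 29 4.0 3.141592653589793
```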

We can also assign the results of a calculation to a new name:

In [11]:
proportion = thickness1 / thickness2
proportion

0.6779661016949152

We can also compare values using the usual comparison operators (greater than, less than, greater than or equal to, less than or equal to, equal to, not equal to). Each comparison gives either `True` or `False`:

In [12]:
thickness1 > thickness2  # Greater than

False

In [13]:
thickness2 >= thickness4  # Greater than or equal to

True

In [14]:
thickness1 == thickness2  # Equal to

False

In [15]:
thickness1 != thickness2  # Not equal to

True
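
Comparisons can also be chained and combined with `and`, `or` and `not`. The sketch below reuses three of the thicknesses from above; the last two lines show why `==` is risky with floats, since decimal fractions such as 0.1 are stored only approximately:

```python
import math

thickness1 = 20
thickness2 = 29.5
thickness4 = 17

# Chained comparison: reads as "thickness4 < thickness1 and thickness1 < thickness2".
print(thickness4 < thickness1 < thickness2)   # True

# and / or / not combine the results of several comparisons.
print(thickness1 > 10 and thickness2 > 10)    # True
print(thickness1 > 25 or thickness4 > 25)     # False
print(not thickness1 == thickness2)           # True

# Floats are stored approximately, so exact equality can surprise us.
print(0.1 + 0.2 == 0.3)                       # False
print(math.isclose(0.1 + 0.2, 0.3))           # True
```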

## 2: Flow control 1: `if ... elif ... else`

Anything that evaluates to either `True` or `False` can be used to decide whether or not a block of code runs. We do so as follows:

In [16]:
if thickness1 > thickness4:
    print('Thickness1 is bigger.')

Thickness1 is bigger.


We may want to check more than one thing:

In [17]:
if thickness1 < thickness2:
    print('Layer 1 is thinner than layer 2.')
if thickness1 < thickness3:
    print('Layer 1 is thinner than layer 3.')
if thickness1 < thickness4:
    print('Layer 1 is thinner than layer 4.')

Layer 1 is thinner than layer 2.
Layer 1 is thinner than layer 3.


If we only want the block for the first condition that evaluates to `True` to run, we can use `elif` instead:

In [18]:
if thickness1 < thickness2:
    print('Layer 1 is thinner than layer 2.')
elif thickness1 < thickness3:
    print('Layer 1 is thinner than layer 3.')
elif thickness1 < thickness4:
    print('Layer 1 is thinner than layer 4.')

Layer 1 is thinner than layer 2.


We can also add a final `else` block, which runs only if none of the preceding `if` or `elif` conditions are true:

In [19]:
if thickness1 == thickness2:
    print('Layer 1 is as thick as layer 2.')
elif thickness1 == thickness3:
    print('Layer 1 is as thick as layer 3.')
elif thickness1 == thickness4:
    print('Layer 1 is as thick as layer 4.')
else:
    print('None of the layers are equally thick.')

None of the layers are equally thick.
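
Because only the first `True` branch runs, the order of the conditions matters. A small sketch that labels a layer by its thickness; the 30 m and 20 m cut-offs are made up for illustration. If the `>= 20` test came first, a 35 m layer would also be labelled 'medium':

```python
thickness2 = 29.5

# Cut-offs in metres are made up for illustration.
# Test the largest cut-off first: only the first True branch runs.
if thickness2 >= 30:
    label = 'thick'
elif thickness2 >= 20:
    label = 'medium'
else:
    label = 'thin'

print(f'Layer 2 is {label}.')   # Layer 2 is medium.
```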


## 3: Lists part 1

For something like a subsurface log, we need a way to work with more than one value. Lists are a simple data structure that we can use to hold a collection of values.

In [20]:
thicknesses = [20, 30, 15, 12, 25, 32]
thicknesses

[20, 30, 15, 12, 25, 32]

We can access a given element in the list by its index: the first element has the index `0`, and the last can be accessed with the index `-1`. Elements in a list can also be assigned new values, if desired.

In [21]:
thicknesses[0]

20

In [22]:
thicknesses[-1]

32

In [23]:
print(thicknesses)
thicknesses[0] = 21.2
print(thicknesses)

[20, 30, 15, 12, 25, 32]
[21.2, 30, 15, 12, 25, 32]


We can also grab multiple elements at once by slicing with `start:stop`. Note that the element at the `stop` index is not included in the selection.

In [24]:
thicknesses[2:5]

[15, 12, 25]

The built-in `len` function tells us how many elements a list contains:

In [25]:
len(thicknesses)

6
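
A few more slicing patterns, sketched on the original `thicknesses` values: either end of a slice can be left out, negative indices count from the end, and a third number sets the step:

```python
thicknesses = [20, 30, 15, 12, 25, 32]

print(thicknesses[:2])    # [20, 30]      start omitted: from the beginning
print(thicknesses[3:])    # [12, 25, 32]  stop omitted: to the end
print(thicknesses[-2:])   # [25, 32]      the last two elements
print(thicknesses[::2])   # [20, 15, 25]  every second element

# A slice holds stop - start elements (when both are in range).
print(len(thicknesses[2:5]))   # 3
```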

Unfortunately, lists do not support maths on every element at once in the way that single numbers do:

In [26]:
thicknesses[2:5] / 10

TypeError: unsupported operand type(s) for /: 'list' and 'int'

One possibility is to use `numpy`'s `array`, which is similar to a list, but with some additional superpowers. We will come back to arrays (and lists) again, but here is a simple example of how we can use `arrays` to scale our thickness values.

In [27]:
import numpy as np

arr_thicknesses = np.array(thicknesses)

In [28]:
arr_thicknesses[2:5] / 10

array([1.5, 1.2, 2.5])

In [29]:
arr_thicknesses.sum()

135.2

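For comparison, adding values one by one by hand quickly becomes tedious. (This cell adds the three named thicknesses from earlier to the last three values of the list, so its total differs from the array sum above.)

In [30]: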
thickness1 + thickness2 + thickness3 + 12 + 25 + 32

148.5

In [31]:
arr_thicknesses.max()

32.0
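
Arrays have many more element-wise tools. As a sketch, assuming the layers are stacked from the surface downwards starting at 0 m, `np.cumsum` turns layer thicknesses into the depth to the base of each layer, and comparing an array with a number gives a True/False mask that selects matching elements:

```python
import numpy as np

# The thicknesses after the first element was changed to 21.2 above.
arr_thicknesses = np.array([21.2, 30, 15, 12, 25, 32])

# Running total: depth to the base of each layer, if the top layer starts at 0 m.
base_depths = np.cumsum(arr_thicknesses)
print(base_depths)

# The comparison is element-wise and returns an array of True/False values...
is_thick = arr_thicknesses > 20
# ...which can be used to pick out only the matching elements.
thick_layers = arr_thicknesses[is_thick]
print(thick_layers)

print(arr_thicknesses.mean())
```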

## 4: Strings

Numbers are not the only basic data type that we care about. Many labels, such as rock types, are text, which Python stores as strings (`str`).

In [32]:
'Sandstone\tΦ=0.23'

'Sandstone\tΦ=0.23'

The way the notebook displays a string can be a little messy, especially if the string contains special characters such as `\n` (a new line) and `\t` (a tab). `print`ing the string often makes it look neater:

In [33]:
print('Sandstone\tΦ=0.23')

Sandstone	Φ=0.23
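
The backslash in `\t` starts an escape sequence: two typed characters that stand for one special character. A short sketch of the difference, including a raw string (prefixed with `r`), in which backslashes are kept literally:

```python
sandstone = 'Sandstone\tΦ=0.23'

# '\t' is written with two characters but is a single tab character.
print(len('\t'))         # 1

# In a raw string the backslash is kept, so r'\t' is a backslash followed by 't'.
print(len(r'\t'))        # 2

# repr() gives the form the notebook displays; print() shows the actual tab.
print(repr(sandstone))   # 'Sandstone\tΦ=0.23'
print(sandstone)
```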


In [34]:
sandstone = 'Sandstone\tΦ=0.23'

We can combine two or more strings with the `+` operator.

In [35]:
print(sandstone + '\n\t\t𝜌=2.4')

Sandstone	Φ=0.23
		𝜌=2.4


Strings have a number of potentially useful methods associated with them, for example:

In [36]:
sandstone.upper()

'SANDSTONE\tΦ=0.23'

In [37]:
sandstone.isalpha() # checks if a string is only alphabetic characters.

False

In [38]:
'\n' in sandstone # checks if a given character (in this case a new line) is in a string.

False

In [39]:
'  \nExtra   space.   '.strip() # Remove whitespace from the start and end of a string; spaces inside it are kept.

'Extra   space.'

In [40]:
sandstone.startswith('Sand')

True

In [41]:
sandstone.split('\t')

['Sandstone', 'Φ=0.23']
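
For strings made of a label and a value, `partition` is a handy alternative to `split`: it splits on the first occurrence of the separator only, and always returns three parts (before, separator, after). A sketch, also showing `replace`:

```python
sandstone = 'Sandstone\tΦ=0.23'

# partition returns (before, separator, after); '_' is a conventional name
# for a value we do not need.
rock, _, measurement = sandstone.partition('\t')
name, _, value = measurement.partition('=')
print(rock, name, value)                 # Sandstone Φ 0.23

# replace swaps every occurrence of one substring for another.
print(sandstone.replace('\t', ': '))     # Sandstone: Φ=0.23
```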

One way of thinking about strings is as a list of characters. We can access individual characters in the same way as we access elements in a list.

In [42]:
sandstone[8]

'e'

In [43]:
sandstone[:4]

'Sand'

Note that we can use anything that evaluates to an `int` as a slice index, for example the position returned by `find`:

In [44]:
print(sandstone.find('\t'))
sandstone[:sandstone.find('\t')]

9

'Sandstone'

In [45]:
long_sandstone = sandstone + '\n\t\t𝜌=2.4'

In [46]:
long_sandstone[:long_sandstone.find('\t', 10)]

'Sandstone\tΦ=0.23\n'
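
One caution about slicing with `find`: if the substring is missing, `find` returns `-1`, which is itself a valid index, so the slice silently drops the last character instead of failing. A sketch of the problem, and of `index`, which raises a `ValueError` instead:

```python
sandstone = 'Sandstone\tΦ=0.23'

# There is no new line in this string, so find returns -1...
position = sandstone.find('\n')
print(position)                     # -1

# ...and slicing up to -1 quietly loses the final character.
print(repr(sandstone[:position]))   # 'Sandstone\tΦ=0.2'

# index works like find but raises an error if the substring is missing.
try:
    sandstone.index('\n')
except ValueError:
    print('No new line found.')
```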

A very useful method when dealing with strings is `split`, which breaks a string into a list of strings wherever a given separator occurs. The result is an ordinary list, so it can be indexed and sliced like any other.

In [47]:
long_sandstone.split('\t')

['Sandstone', 'Φ=0.23\n', '', '𝜌=2.4']

In [48]:
phi = long_sandstone.split('\t')[1]

We may want to insert values into a string, for example to make an automated report. There are two common ways of doing this: joining strings with `+`, and f-strings.

In [49]:
print("The Φ of the sandstone is " + phi.split('=')[1] + '.')

The Φ of the sandstone is 0.23
.


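The stray line break is there because `phi` still ends with the `\n` character. The f-string version below removes it with `strip`:

In [50]: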
print(f'The Φ of the sandstone is {phi.split("=")[1].strip()}.')

The Φ of the sandstone is 0.23.
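
f-strings can also control how a value is displayed, by adding a format specification after a colon inside the braces. A few sketches, with made-up values:

```python
phi_value = 0.23
depth = 50

print(f'Φ = {phi_value:.3f}')     # Φ = 0.230   (three decimal places)
print(f'Φ = {phi_value:.0%}')     # Φ = 23%     (as a percentage)
print(f'Depth: {depth:>5} m')     # right-aligned in a field 5 characters wide
```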


It is worth noting that a number in a string is still a string, so we need to convert it to a number explicitly if we want to do maths with it:

In [78]:
phi.split('=')[1]

'0.23\n'

In [79]:
type(phi.split('=')[1])

str

In [80]:
float(phi.split('=')[1])

0.23

In [81]:
type(float(phi.split('=')[1]))

float
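
`float` raises a `ValueError` if the string is not a number (surrounding whitespace such as `\n` is fine). A sketch of handling that with `try ... except`, falling back to splitting off the label first; this fallback assumes a `label=value` layout like the one above:

```python
text = 'Φ=0.23'

try:
    # float() ignores surrounding whitespace, but not the 'Φ=' label.
    value = float(text)
except ValueError:
    # Fall back to taking the part after '=' (assumes a 'label=value' string).
    value = float(text.split('=')[1])

print(value)   # 0.23
```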

## 5: Lists part 2

Lists can contain anything, including other lists, so we could make a simple log with them by using one list for each property (there are better ways, which we will come back to later).

In [51]:
depths = [10, 20, 30, 40, 50, 60, 70]
densities = [2.3, 2.3, 2.5, 2.5, 2.35, 2.35, 3.2]
rocks = ['Sandstone', 'Sandstone', 'Shale', 'Shale', 'Sandstone', 'Sandstone', 'Granite']

In [52]:
log = [rocks, depths, densities]
log

[['Sandstone',
  'Sandstone',
  'Shale',
  'Shale',
  'Sandstone',
  'Sandstone',
  'Granite'],
 [10, 20, 30, 40, 50, 60, 70],
 [2.3, 2.3, 2.5, 2.5, 2.35, 2.35, 3.2]]

In [53]:
print(f'At {log[1][4]}m is {log[0][4].lower()} with a density of {log[2][4]}.')

At 50m is sandstone with a density of 2.35.


We can make this less brittle by using a variable to hold the index, so that we extract the same position from each sub-list:

In [54]:
index = 4
print(f'At {log[1][index]}m is {log[0][index].lower()} with a density of {log[2][index]}.')
index = 5
print(f'At {log[1][index]}m is {log[0][index].lower()} with a density of {log[2][index]}.')
index = 6
print(f'At {log[1][index]}m is {log[0][index].lower()} with a density of {log[2][index]}.')

At 50m is sandstone with a density of 2.35.
At 60m is sandstone with a density of 2.35.
At 70m is granite with a density of 3.2.


Again, we will come back to this when we have a better data structure.

## 6: Flow control 2: `for ... in ...:`

As the last example shows, we often have to do the same thing over and over again. Doing it by copying and pasting is likely to lead to mistakes. It is better to let the computer do it for us:

In [55]:
for index in range(7):
    print(f'At {log[1][index]}m is {log[0][index].lower()} with a density of {log[2][index]}.')

At 10m is sandstone with a density of 2.3.
At 20m is sandstone with a density of 2.3.
At 30m is shale with a density of 2.5.
At 40m is shale with a density of 2.5.
At 50m is sandstone with a density of 2.35.
At 60m is sandstone with a density of 2.35.
At 70m is granite with a density of 3.2.


We may not want to do the same thing to every item. We can check whether each item meets a condition, and act only on the ones that do:

In [56]:
for index in range(7):
    if log[2][index] < 2.5:
        print(f'At {log[1][index]}m is {log[0][index].lower()} with a density of {log[2][index]}.')

At 10m is sandstone with a density of 2.3.
At 20m is sandstone with a density of 2.3.
At 50m is sandstone with a density of 2.35.
At 60m is sandstone with a density of 2.35.


In [57]:
for index in range(7):
    if log[2][index] < 2.5:
        print(f'At {log[1][index]}m is {log[0][index].lower()} with a density of {log[2][index]}.')
    else:
        print('Overly dense.')

At 10m is sandstone with a density of 2.3.
At 20m is sandstone with a density of 2.3.
Overly dense.
Overly dense.
At 50m is sandstone with a density of 2.35.
At 60m is sandstone with a density of 2.35.
Overly dense.


In Python, if we have an iterable sequence (like a list or array), we can use a `for` loop to take each item in the sequence in turn, without needing an index:

In [58]:
for depth in depths:
    print(depth)

10
20
30
40
50
60
70
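
Looping directly over items also works for several lists at once. The built-in `zip` pairs up the items at the same position in each list, so this sketch prints the same sentences as the `range(7)` loop above without any indexing. Note that `zip` stops at the end of the shortest list without warning:

```python
depths = [10, 20, 30, 40, 50, 60, 70]
densities = [2.3, 2.3, 2.5, 2.5, 2.35, 2.35, 3.2]
rocks = ['Sandstone', 'Sandstone', 'Shale', 'Shale', 'Sandstone', 'Sandstone', 'Granite']

# Each pass through the loop gets one depth, one rock and one density.
for depth, rock, density in zip(depths, rocks, densities):
    print(f'At {depth}m is {rock.lower()} with a density of {density}.')
```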


We can also use the `continue` and `break` keywords to control how we step through the list.

- `break` stops the loop entirely, so no further items in the sequence are processed.
- `continue` skips the rest of the current pass through the loop and moves on to the next item in the sequence.

The difference can be seen as follows:

In [59]:
for depth in depths:
    if depth == 50:
        print('Depth is 50m, stopping loop.')
        break
    else:
        print(depth)

10
20
30
40
Depth is 50m, stopping loop.


In [60]:
for depth in depths:
    if depth == 50:
        print('Depth is 50m, skipping.')
        continue
    else:
        print(f'{depth}m')

10m
20m
30m
40m
Depth is 50m, skipping.
60m
70m


A common pattern for building up a list is to use a loop to `append` values to an initially empty list. (The formula here is purely illustrative and is not a real density-porosity relationship.)

In [61]:
porosities = []
for density in densities:
    porosities.append(density**1.3 * 5)

porosities

[14.764413207060604,
 14.764413207060604,
 16.45477755417797,
 16.45477755417797,
 15.183022138909266,
 15.183022138909266,
 22.68114505868613]
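
The same list can be built more compactly with a list comprehension, which puts the loop inside square brackets. A sketch reproducing the loop above, plus a comprehension with an `if` filter:

```python
densities = [2.3, 2.3, 2.5, 2.5, 2.35, 2.35, 3.2]

# Equivalent to the append loop: one value per density, in the same order.
porosities = [density**1.3 * 5 for density in densities]

# An 'if' at the end keeps only the items that meet the condition.
light_densities = [density for density in densities if density < 2.5]
print(light_densities)   # [2.3, 2.3, 2.35, 2.35]
```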

## 7: Dictionaries

The list-of-lists structure that we used for the log above was rather brittle: it was easy to pick the wrong list, and so get the wrong value for a depth. It is also difficult to see what each list is meant to represent.

Dictionaries offer a means of fixing these issues by storing `key: value` pairs, so that each list can be looked up by a descriptive name.

In [62]:
log = {
    'rock_type': rocks,
    'depth': depths,
    'density': densities,
}
log

{'rock_type': ['Sandstone',
  'Sandstone',
  'Shale',
  'Shale',
  'Sandstone',
  'Sandstone',
  'Granite'],
 'depth': [10, 20, 30, 40, 50, 60, 70],
 'density': [2.3, 2.3, 2.5, 2.5, 2.35, 2.35, 3.2]}

These lists can now be accessed by key, instead of by remembering the order of the lists:

In [63]:
log['depth']

[10, 20, 30, 40, 50, 60, 70]

In [64]:
log.get('density')

[2.3, 2.3, 2.5, 2.5, 2.35, 2.35, 3.2]

If the key does not exist, `dict[key]` raises a `KeyError`, whereas `get` returns `None` instead, or a default value if we supply one:

In [65]:
log.get('porosity', 'Key not found!')

'Key not found!'
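
Two other ways of dealing with keys that may be missing: the `in` keyword checks whether a key exists, and a `try ... except KeyError` block catches the error raised by square-bracket access. A sketch on a cut-down, hypothetical version of the log:

```python
# A shortened, hypothetical log, just for this example.
log = {'depth': [10, 20, 30], 'density': [2.3, 2.3, 2.5]}

# 'in' tests whether a key exists (it checks keys, not values).
print('density' in log)     # True
print('porosity' in log)    # False

# Without a default, get returns None for a missing key.
print(log.get('porosity'))  # None

# Square brackets raise a KeyError, which we can catch ourselves.
try:
    log['porosity']
except KeyError:
    print('No porosity in this log.')
```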

We can also access the keys, the values, or both as list-like objects:

In [66]:
log.keys()

dict_keys(['rock_type', 'depth', 'density'])

In [67]:
log.values()

dict_values([['Sandstone', 'Sandstone', 'Shale', 'Shale', 'Sandstone', 'Sandstone', 'Granite'], [10, 20, 30, 40, 50, 60, 70], [2.3, 2.3, 2.5, 2.5, 2.35, 2.35, 3.2]])

In [68]:
log.items()

dict_items([('rock_type', ['Sandstone', 'Sandstone', 'Shale', 'Shale', 'Sandstone', 'Sandstone', 'Granite']), ('depth', [10, 20, 30, 40, 50, 60, 70]), ('density', [2.3, 2.3, 2.5, 2.5, 2.35, 2.35, 3.2])])

By converting the result of `dict.items` into a list, we can see that each item is a `(key, value)` pair, which we can assign to two names at once (this is called unpacking):

In [69]:
print(list(log.items())[1])
key, value = list(log.items())[1]
print(f'key:\t{key}')
print(f'value:\t{value}')

('depth', [10, 20, 30, 40, 50, 60, 70])
key:	depth
value:	[10, 20, 30, 40, 50, 60, 70]


We can use this simultaneous assignment (unpacking) to loop through all the keys and values in a dictionary at once:

In [70]:
for key, value in log.items():
    print(f'{key} is {value}')

rock_type is ['Sandstone', 'Sandstone', 'Shale', 'Shale', 'Sandstone', 'Sandstone', 'Granite']
depth is [10, 20, 30, 40, 50, 60, 70]
density is [2.3, 2.3, 2.5, 2.5, 2.35, 2.35, 3.2]


Adding a new key:value pair to an existing dictionary is as simple as assigning a value to a key:

In [71]:
log['porosities'] = porosities
log

{'rock_type': ['Sandstone',
  'Sandstone',
  'Shale',
  'Shale',
  'Sandstone',
  'Sandstone',
  'Granite'],
 'depth': [10, 20, 30, 40, 50, 60, 70],
 'density': [2.3, 2.3, 2.5, 2.5, 2.35, 2.35, 3.2],
 'porosities': [14.764413207060604,
  14.764413207060604,
  16.45477755417797,
  16.45477755417797,
  15.183022138909266,
  15.183022138909266,
  22.68114505868613]}

It might be more useful to make each depth a key, with the values at that depth stored under it. The `enumerate` function steps through an iterable while also giving us a count (starting from 0) of how far through the iterable we are. This allows us to make a dictionary from any number of equally sized lists:

In [72]:
depth_log = {}
for count, depth in enumerate(depths):
    depth_log[depth] = {'density': densities[count], 'rock_type': rocks[count]}

depth_log

{10: {'density': 2.3, 'rock_type': 'Sandstone'},
 20: {'density': 2.3, 'rock_type': 'Sandstone'},
 30: {'density': 2.5, 'rock_type': 'Shale'},
 40: {'density': 2.5, 'rock_type': 'Shale'},
 50: {'density': 2.35, 'rock_type': 'Sandstone'},
 60: {'density': 2.35, 'rock_type': 'Sandstone'},
 70: {'density': 3.2, 'rock_type': 'Granite'}}
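
As an alternative sketch, `zip` can step through the three lists together, and a dictionary comprehension (the dictionary version of a list comprehension) builds the same `depth_log` without needing a count:

```python
depths = [10, 20, 30, 40, 50, 60, 70]
densities = [2.3, 2.3, 2.5, 2.5, 2.35, 2.35, 3.2]
rocks = ['Sandstone', 'Sandstone', 'Shale', 'Shale', 'Sandstone', 'Sandstone', 'Granite']

# One key: value pair per depth, built from the three lists in step.
depth_log = {
    depth: {'density': density, 'rock_type': rock}
    for depth, density, rock in zip(depths, densities, rocks)
}

print(depth_log[30])   # {'density': 2.5, 'rock_type': 'Shale'}
```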

We can now get the values associated with a given depth by looking up that depth. Because the result is itself a dictionary, we can ask it for a specific property, such as the density, in the same way:

In [73]:
depth_log.get(10)

{'density': 2.3, 'rock_type': 'Sandstone'}

In [74]:
depth_log.get(10).get('density')

2.3
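
Chaining `get` calls has a catch: for a depth that is not in the log, the first `get` returns `None`, and `None` has no `get` method, so the second call raises an `AttributeError`. A sketch of one common workaround, passing an empty dictionary as the default (shown on a shortened log):

```python
# A shortened copy of depth_log, just for this example.
depth_log = {
    10: {'density': 2.3, 'rock_type': 'Sandstone'},
    20: {'density': 2.3, 'rock_type': 'Sandstone'},
}

# There is no entry at 15 m, so get returns the empty dictionary we supplied,
# and asking that for 'density' gives None instead of an error.
print(depth_log.get(15, {}).get('density'))   # None
print(depth_log.get(10, {}).get('density'))   # 2.3
```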

We could get all of the densities:

In [75]:
for value in depth_log.values():
    print(value.get('density'))

2.3
2.3
2.5
2.5
2.35
2.35
3.2


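In the same way, we can add our porosities. This relies on dictionaries keeping their keys in insertion order (guaranteed since Python 3.7), so `depth_log.values()` comes out in the same order as `depths`, and therefore as `porosities`: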

In [76]:
for count, value in enumerate(depth_log.values()):
    value['porosity'] = porosities[count]

In [77]:
for value in depth_log.values():
    print(value)

{'density': 2.3, 'rock_type': 'Sandstone', 'porosity': 14.764413207060604}
{'density': 2.3, 'rock_type': 'Sandstone', 'porosity': 14.764413207060604}
{'density': 2.5, 'rock_type': 'Shale', 'porosity': 16.45477755417797}
{'density': 2.5, 'rock_type': 'Shale', 'porosity': 16.45477755417797}
{'density': 2.35, 'rock_type': 'Sandstone', 'porosity': 15.183022138909266}
{'density': 2.35, 'rock_type': 'Sandstone', 'porosity': 15.183022138909266}
{'density': 3.2, 'rock_type': 'Granite', 'porosity': 22.68114505868613}
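
With everything stored by depth, queries combine naturally with the loop-and-append pattern from earlier. A sketch that collects the depths logged as sandstone, on a copy of the log without the porosities:

```python
depth_log = {
    10: {'density': 2.3, 'rock_type': 'Sandstone'},
    20: {'density': 2.3, 'rock_type': 'Sandstone'},
    30: {'density': 2.5, 'rock_type': 'Shale'},
    40: {'density': 2.5, 'rock_type': 'Shale'},
    50: {'density': 2.35, 'rock_type': 'Sandstone'},
    60: {'density': 2.35, 'rock_type': 'Sandstone'},
    70: {'density': 3.2, 'rock_type': 'Granite'},
}

sandstone_depths = []
for depth, values in depth_log.items():
    # Keep the depth only if the rock at that depth is sandstone.
    if values['rock_type'] == 'Sandstone':
        sandstone_depths.append(depth)

print(sandstone_depths)   # [10, 20, 50, 60]
```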


## Conclusion

This should set you up well with the basics of Python, using a simple geoscience-based example. The rest of Agile's course will go into far more detail, but you already know enough to be useful.

<hr /><img src="https://avatars1.githubusercontent.com/u/1692321?v=3&s=200" style="float:center" width="40px" /><p><center>© 2020 <a href="http://www.agilegeoscience.com/">Agile Geoscience</a> - <a href="http://www.apache.org/licenses/LICENSE-2.0.html">Apache License 2.0</a></center></p>