# Intro to Python

We are going to work on the basics of working with some well data over the course of this and the following tutorials. By the end of this one, you should be able to read in a text file with log data, manipulate it to some extent, and then save it back to disk.

#### Instructor's note:
This notebook assumes that there has been an initial introduction to Python on the command line, covering at least basic assignment and maths. It might be good to gloss over the context manager for now, but it is good practice to use it regardless.

## Loading our data and an initial look

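We will come back to reading and writing data from disk in more detail later in the week, but this is the minimal way to read a file in safely: the `with` block (a *context manager*) closes the file for us when the block ends, even if something goes wrong inside it.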

In [None]:
with open('../data/well_data.txt', 'r') as f:
    data = f.readlines()

We can then take a look at what our data is:

In [None]:
data

In [None]:
type(data)

We can access individual elements of our list by their position (index):

In [None]:
data[0]

Notice that the index starts at `0`. We can get the length of a list (and many other objects) with the `len` function:

In [None]:
len(data)
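If you want to try the reading pattern without the course data, here is a minimal sketch that writes a tiny made-up file to a temporary folder (the name `example.txt` and its contents are invented for illustration) and reads it back the same way. It also shows that `readlines` keeps the newline at the end of each line, and that the `with` block has closed the file once it ends.

```python
import os
import tempfile

# Write a small, made-up file so this example does not depend on the data folder.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'w') as f:
    f.write('10;30;15\n2.3;2.4;2.5\n')

# Read it back with the same pattern as above.
with open(path, 'r') as f:
    lines = f.readlines()

print(lines)       # ['10;30;15\n', '2.3;2.4;2.5\n']: each line keeps its '\n'
print(len(lines))  # 2
print(f.closed)    # True: the file was closed when the `with` block ended
```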

### Exercise

* What is the data type of the first element in our `data` list? Use the built-in `type` function to find out.
* Read the `GDR_clmag_part.XYZ` file in the `data` folder and assign it the name `mag_data`. (This is a subset of the data available [here](https://pubs.usgs.gov/of/2006/1204/german_mag.html) from the USGS.)
* How many rows are in the file?
* What is the `type` of the first element in `mag_data`?

In [None]:
# The type of the first element in `data`.


In [None]:
# Read in '../data/GDR_clmag_part.XYZ'.
with open #
    mag_data = #

In [None]:
# How many rows in `mag_data`?


In [None]:
# The type of the first element in `mag_data`.


In [None]:
with open('../data/GDR_clmag_part.XYZ', 'r') as f:
    mag_data = f.readlines()

In [None]:
len(mag_data)

In [None]:
mag_data

The way that we are reading data in gets us a list of strings. So, let us take a look at strings, and what we can do with them.

## Strings in detail

* Creating strings
* `print` and special characters (`\n`, `\t`)
* Indexing
* Membership and operations
* String methods (`strip`, `lower`, `islower`, `find`, `replace`)
* Formatting strings: f-strings and `.format`

In [None]:
type('sandstone')

In [None]:
type("sandstone")

In [None]:
"sandstone's"

In [None]:
s = 'sandstone\tphi:\t0.3\n\t\tGR:\t32'
s

In [None]:
print(s)

In [None]:
len(s)

#### Indexing and Slicing

In [None]:
s[4]

In [None]:
# This does _not_ work: strings are immutable, so they cannot be changed in place.
# Might want to compare to a list later instead?
s[4] = 'S'

In [None]:
s[:9]

In [None]:
s[-2:]
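A few more indexing and slicing patterns on the same string `s`, as a small sketch. Because strings are immutable, "changing" a character really means building a new string out of slices.

```python
s = 'sandstone\tphi:\t0.3\n\t\tGR:\t32'

first, last = s[0], s[-1]   # 's' and '2'
sand = s[:4]                # 'sand': the stop index (4) is not included
stone = s[4:9]              # 'stone'
backwards = s[:9][::-1]     # a step of -1 reverses: 'enotsdnas'

# Strings are immutable, so build a new string instead of assigning to s[4].
new_s = s[:4] + 'S' + s[5:]
print(first, last, sand, stone, backwards)
print(new_s[:9])            # 'sandStone'
```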

#### Membership and operations

In [None]:
'sand' in s

In [None]:
'=' * 10

In [None]:
'Hello ' + 'world' + '!'

In [None]:
'The value is ' + str(243.21) + ' Nm.'

In [None]:
' '.join(['Hello', 'World'])

#### String methods

In [None]:
'  a string    surrounded by spaces   \n   '.strip()

In [None]:
s.lower()

In [None]:
s.islower()

In [None]:
s.find('GR')

In [None]:
s.startswith('Sand')

In [None]:
s.replace('sand', 'silt')

Methods can be 'chained':

In [None]:
s.replace('sand', 'silt').upper()
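String methods are what we will use to pick apart the lines from our files. Here is a short sketch on a made-up, header-style line (the line itself is invented for illustration):

```python
line = '  GR;32;API\n'   # made-up example line, as readlines() might return it

fields = line.strip().split(';')   # remove surrounding whitespace, then split
print(fields)                      # ['GR', '32', 'API']

print(line.find('32'))     # 5: index where '32' starts
print(line.find('clay'))   # -1: find() returns -1 when there is no match

clean = line.strip().lower().replace(';', ' ')   # chained methods
print(clean)               # gr 32 api
```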

### Formatting strings

We can insert values into strings in two ways: with the `.format` method or with f-strings. f-strings (available since Python 3.6) are more compact, and there is no particular reason not to use them unless you are working on legacy code. They let us put variables, or whole expressions, directly inside the curly braces of a string.

In [None]:
poro = 0.15

'The porosity is ' + poro  # Raises a TypeError: we cannot add a str and a float.

In [None]:
print('The porosity is ' + str(poro))
print(f'The porosity is {poro}')
print('The porosity is {}'.format(poro))

The format specification mini-language is documented here: https://docs.python.org/3/library/string.html#formatspec
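A few format specifications from the mini-language, as a sketch (the values are made up). The part after the colon controls how the value is displayed, and it works the same way in f-strings and in `.format`.

```python
poro = 0.15
depth = 1234.5678

pct = f'{poro:.1%}'                  # percentage, 1 decimal place: '15.0%'
two_dp = f'{depth:.2f}'              # fixed point, 2 decimal places: '1234.57'
padded = f'{depth:10.1f}'            # right-aligned in 10 characters: '    1234.6'
zeros = f'{12:03d}'                  # integer padded with zeros: '012'
old_style = '{:.3f}'.format(depth)   # the same kind of spec with .format: '1234.568'

print(pct, two_dp, padded, zeros, old_style)
```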

### Exercise

- Use the `math` or `numpy` module and an f-string to print $\mathrm{e}$ (the base of the natural logarithm) to 3 decimal places. 
- Change `'JURASSIC*PERIOD\n'` to lower case.
- Change the `'*'` to a space, change everything to title case, and remove the new line. Do this in a single expression if you can.

In [None]:
import math

In [None]:
# Print e in an f-string with 3 decimal places.


In [None]:
# Change the following string to lower case.
s = 'JURASSIC*PERIOD\n'


In [None]:
# Change the '*' to a space, change everything to title case, and remove the new line character.


## Lists in detail

Our data was loaded as a list, so let us look at how lists behave and what they can do.

* Creating lists
* Indexing
* Slicing
* Modifying list elements
* List methods (append, pop, insert, count, index)
* Things to be aware of

In [None]:
# creation of a new list:
a_list = [0, 1, 2, 3, 4, 5]

# creation of a list from a string:
thicknesses = data[0].split()[0].split(';')

**Indexing and slicing**

Either of the above lists should work to show the basics below:

In [None]:
thicknesses[0]

In [None]:
len(thicknesses)

In [None]:
thicknesses[-1]

In [None]:
thicknesses[3:6]

In [None]:
thicknesses[::2]

In [None]:
thicknesses

#### Modifying Lists

In [None]:
thicknesses[4] = '12'

In [None]:
thicknesses

#### List Methods

Useful methods to cover:

```
.index()
.count()
.append()
.extend()
.pop()
```
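A quick sketch of these methods (plus `insert`) on a short, made-up list of lithology codes in the style of our data. Note that `append` adds a single item, while `extend` adds each item from another list.

```python
liths = ['SH', 'SST', 'SLT', 'SH']

print(liths.index('SLT'))    # 2: position of the first match
print(liths.count('SH'))     # 2: number of matches

liths.append('LST')          # add one item to the end
liths.extend(['SH', 'SST'])  # add each item of another list to the end
last = liths.pop()           # remove and return the last item: 'SST'
first = liths.pop(0)         # remove and return the item at index 0: 'SH'
liths.insert(1, 'INT')       # insert 'INT' before index 1

print(liths)  # ['SST', 'INT', 'SLT', 'SH', 'LST', 'SH']
```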

### Things to be aware of with lists:
#### Heterogeneous Lists

In [None]:
mixed_list = [0, 1, 2.5, thicknesses, 'sandstone', [100,200,300]]

#### Mutability gotcha:

In [None]:
a = [1, 2, 3, 4]
b = a

In [None]:
b[0] = 10
b

In [None]:
a

In [None]:
b = a.copy()

In [None]:
b[0] = 1
b

In [None]:
a
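One more thing to be aware of: `.copy()` makes a *shallow* copy. If a list contains other lists, the inner lists are still shared between the original and the copy. The standard-library function `copy.deepcopy` copies those too. A minimal sketch:

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested.copy()        # new outer list, but the same inner lists
deep = copy.deepcopy(nested)   # new outer list and new inner lists

shallow[0][0] = 100            # changes an inner list shared with `nested`

print(nested)   # [[100, 2], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
```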

### Make sure that the following have been done to get the data ready for later:
Also, make sure that `len(thicknesses)` is 10.

In [None]:
data_thickness = data[0].split()[0]
data_thickness.split(';')

In [None]:
thicknesses = data_thickness.split(';')

In [None]:
thicknesses[4] = '12'

### Optional Exercise

* In the `mag_data` that we loaded previously:
    * How many of the items in the list are header rows, starting with `/`? <a title="Slice into the `mag_data`. There are not more than 20 header rows."><b>HINT</b></a>
    * How many footer rows, also starting with `/`, are there? <a title="Slice into the `mag_data`. There are not more than 10 footer rows."><b>HINT</b></a>
    * Remove any header and footer items from `mag_data` and name the result `cleaned_mag_data`. You should have 52135 elements in your list.
    * Save any header data to a new list `mag_header`.

In [None]:
# 13 header rows:
mag_data[:20]

In [None]:
# 2 footer rows:
mag_data[-10:]
# mag_data[-1:-11:-1]  # also shows the last 10 rows, but in reverse order

In [None]:
cleaned_mag_data = #

In [None]:
# Either repeated
# mag_data.pop(0) and mag_data.pop()
# or
cleaned_mag_data = mag_data[13:-2]
# The second is nicer if you have to drop a bunch of things.
cleaned_mag_data[:15]

In [None]:
cleaned_mag_data[-10:]

In [None]:
# If you did the above correctly, this cell will have no output.
assert len(cleaned_mag_data) == 52135

In [None]:
mag_header = #

In [None]:
mag_header = mag_data[:13]
mag_header

## Looping: `for` ... `in` ...`:`

* Motivation
* Basic pattern
* Making new lists
* List comprehensions

In [None]:
thicknesses

In [None]:
type(thicknesses[0])

We can change the type of each element in a list by _typecasting_.

In [None]:
int('230'), float('35.2'), str(234), bool(1)

In [None]:
int(thicknesses[0])
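Typecasting has a few gotchas worth knowing before we loop over our data. This sketch uses made-up values:

```python
total = int('230') + 1         # 231: arithmetic works once the string is an int
whole = int(float('35.2'))     # 35: int() drops the fractional part of a float

try:
    int('35.2')                # a decimal string cannot go straight to int
except ValueError as err:
    print('ValueError:', err)

flag = bool('False')           # True: any non-empty string counts as True
print(total, whole, flag)
```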

#### Basic Pattern

In [None]:
for item in thicknesses:
    print(item)
    print(type(item))

#### Making new lists

This is a very common pattern:

In [None]:
thickness_ints = []
for item in thicknesses:
    thickness_ints.append(int(item))
    
thickness_ints

In [None]:
for item in thickness_ints:
    print(type(item))

#### Plotting taster

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.plot(thickness_ints)

#### List comprehensions

In [None]:
# List comprehension instead
[int(n) for n in thicknesses]
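The comprehension does the same job as the `for` loop above, and it can also include an `if` to filter items. A sketch using the first few thickness strings from our well:

```python
thicknesses = ['10', '30', '15', '45', '12']

# Loop version
thick_loop = []
for t in thicknesses:
    thick_loop.append(int(t))

# Equivalent list comprehension
thick_comp = [int(t) for t in thicknesses]

# A comprehension with a filter: keep only layers thicker than 20
thick_over_20 = [int(t) for t in thicknesses if int(t) > 20]

print(thick_loop == thick_comp)   # True
print(thick_over_20)              # [30, 45]
```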

### Exercise

Rearrange the following lines to loop over a list of files and gather the second part of the file names &mdash; the months &mdash; into a new list, then print the new list.

When the code runs, it should produce:

    ['Jan', 'Mar', 'Jun', 'Jun']


In [None]:
print(months)
files = ['MH_Jan-18.png', 'MH_Mar-18.png', 'MH_Jun-18.png', 'MH_Jun-17.png']
months.append(month)
month = file.split('_')[1].split('-')[0]
for file in files:
months = []

In [None]:
files = ['MH_Jan-18.png', 'MH_Mar-18.png', 'MH_Jun-18.png', 'MH_Jun-17.png']
months = []
for file in files:
    month = file.split('_')[1].split('-')[0]
    months.append(month)
print(months)

### Exercise

* Create lists of the data from the strings stored in `data[1]` and `data[2]`, giving them appropriate names (`densities` and `lithologies`).
* Typecast the elements in the `densities` list to an appropriate type, rather than strings. Save the results as a new list (`density_floats`).
* Optional: If you are feeling comfortable with list comprehensions, do the typecasting of the densities into `float` values using a list comprehension instead of a `for` loop.
* Optional: Calculate the depth of the top of each layer in our dataset by subtracting the thickness of the previous layer, starting at 0. <a title="You will need to keep track of the depth (starting at 0) with a separate name and change the value as you iterate through the loop."><b>HINT</b></a>

The last one is definitely a stretch exercise, unless you are already comfortable with programming in other languages, so do not feel bad if you are struggling.

In [None]:
# Create a list of the data.
data[1]

In [None]:
data[2]

In [None]:
densities = data[1].split()[0].split(';')
lithologies = data[2].split()[0].split(';')
densities

In [None]:
lithologies

In [None]:
# Cast the density values into a list of floats.
density_floats = #

In [None]:
density_floats = []
for den in densities:
    density_floats.append(float(den))
    
density_floats

In [None]:
# Do the typecasting as a list comprehension.


In [None]:
[float(den) for den in densities]

In [None]:
# Calculate the depth of each row.
current_depth = 0
depths = []
# for ... in ...:

In [None]:
# Build this up slowly.
curr_depth = 0
depths = []
for layer_thickness in thickness_ints:
    depths.append(curr_depth)
    #curr_depth = curr_depth - layer_thickness
    curr_depth -= layer_thickness  # Show the line above first.
depths

In [None]:
plt.plot(thickness_ints, depths)

We now have the following lists from our well:

In [None]:
print(thickness_ints)
print(density_floats)
print(lithologies)
print(depths)

Make sure to give the clean versions of these to students:

In [None]:
thickness_ints = [10, 30, 15, 45, 12, 11, 21, 1, 12, 19]
density_floats = [2.3, 2.4, 2.5, 2.5, 2.45, 2.3, 2.55, -999.25, 2.55, 2.6]
lithologies = ['SH', 'SST', 'SLT', 'SH', 'SST', 'INT', 'SH', 'SST', 'SLT', 'SH']
depths = [0, -10, -40, -55, -100, -112, -123, -144, -145, -157]


- Optional: Our `mag_data` contains some header information that we loaded earlier. Print the header rows using string methods in a `for` loop. We only want the header data, not any footers that may be present. <a title="str.startswith is a useful method that may help."><b>HINT</b></a>
- Optional: Save just the header data as a list named `mag_header`.
- Optional: Save the non-header data as a list named `cleaned_mag_data`.

If you are feeling comfortable so far, the last three questions should be attainable with what we have shown. If you are feeling less comfortable, we will help you with solutions soon.

In [None]:
# Print the header lines in `mag_data`.




In [None]:
mag_header = # YOUR CODE HERE

In [None]:
cleaned_mag_data = #

In [None]:
for line in mag_data:
    if line.startswith('/'):
        print(line.strip())
    else: # We want the break, because otherwise we catch the footers in this dataset as well.
        break

In [None]:
mag_header = []
for line in mag_data:
    if line.startswith('/'):
        mag_header.append(line)
    else:
        break
mag_header

## Mathematics

We covered much of this in the console, but just as revision:


```python
2 + 3  # Addition
2 - 3  # Subtraction
2 * 3  # Multiplication
2 / 3  # Division
11 // 2  # Floor division
11 % 2  # Modulo (remainder)
2**3  # Exponentiation
```
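Because our depths are negative, it is worth seeing how floor division and modulo behave with negative numbers: `//` always rounds down (towards minus infinity), not towards zero. A short sketch:

```python
print(11 / 2)          # 5.5: true division always gives a float
print(11 // 2)         # 5
print(-11 // 2)        # -6: rounds down, not towards zero
print(-11 % 2)         # 1: the remainder takes the sign of the divisor
print(divmod(-11, 2))  # (-6, 1): floor division and remainder together
```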

If we want to do more complex maths than covered above, then we will need to `import` a library with additional functions. `import math` is the option from the standard library. This gives us access to a number of other functions, such as:

```python
import math

math.sqrt(2)  # square root
math.floor(2.5)  # largest integer less than or equal to the value (2 in this case)
math.ceil(2.3)  # smallest integer greater than or equal to the value (3 in this case)
math.cos(2)  # cosine of 2 radians. Other trig functions are available.

`help(math)` will give you a list of everything.

In [None]:
import math

### Optional Exercise

1. What is the log<sub>10</sub> of 7e6? (You will need to use the `math` library, or `numpy` if you prefer.)
1. What is the $\tan$ of $\pi / 4$ radians?
1. What is $\sin^2(45^\circ)$?
1. Given $\phi_0 = 0.3$, $c = 0.005$, and $z = 157$, what is $\phi = \phi_{0}\,\operatorname{e}^{-cz}$? <a title="There is a function, math.exp, that might be useful."><b>HINT</b></a>

In [None]:
phi_zero = 0.3
c = 0.005
z = 157

phi = # your code here

### Booleans

`bool`s are either `True` or `False`. These can be very useful, most obviously for selectively running particular blocks of code.

Boolean values can be obtained in a number of ways. Many functions or methods will return either `True` or `False`. Comparisons also return a `bool`:

| Equal to | Not equal to | Less than | Greater than | Less than or equal | Greater than or equal |
|----------|--------------|-----------|--------------|--------------------|-----------------------|
|   `==`   |     `!=`     |    `<`    |      `>`     |        `<=`        |          `>=`         |

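Values of different types are usually not equal: something like `1 == '1'` is `False`. Numbers are the exception, since they compare by value, so `1 == 1.0` is `True`. If you want to know whether something is the same *object* as another, rather than just equal to it, use `is` and `is not`.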

Some objects contain others (for example lists), and membership within a collection can be tested with `in`, which gives a `True` or `False`.
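A small sketch of the difference between `==` (same value) and `is` (same object), plus `in` for membership:

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)    # True: equal contents
print(a is b)    # False: two separate list objects
print(a is c)    # True: two names for one object

print(1 == '1', 1 == 1.0)       # False True: numbers compare by value
print('SST' in ['SH', 'SST'])   # True

x = None
print(x is None)  # True: `is None` is the usual way to check for None
```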

We can also link expressions that are True or False together in a few ways:

| Operation   | Result                                                            |
|-------------|-------------------------------------------------------------------|
| a **or** b  | True if either a or b is true                                     |
| a **and** b | False if either a or b is false,<br>True if both a and b are true |
| **not** a   | True if a is false, else False                                    |

In some cases (notably with NumPy arrays) `&` and `|` are used instead of `and` and `or`. On plain integers, `&` and `|` are *bitwise* operators that work on the individual 1s and 0s of a number's binary form; NumPy uses them for elementwise logic on whole arrays, where `and` and `or` do not work. In most other cases you will want `and` and `or`.
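A sketch of why NumPy needs `&` rather than `and`, using made-up porosity values. Note the parentheses around each comparison: `&` binds more tightly than `>` and `<`.

```python
import numpy as np

poro = np.array([0.05, 0.12, 0.25, 0.31])   # made-up porosities

# Elementwise "and": the parentheses matter, because & binds more tightly than >.
mask = (poro > 0.1) & (poro < 0.3)
print(mask)   # True only for 0.12 and 0.25

# `and` needs a single True/False, so it fails on multi-element arrays.
try:
    (poro > 0.1) and (poro < 0.3)
except ValueError as err:
    print('ValueError:', err)

# On plain integers, & and | work bit by bit: 6 is 110 and 3 is 011 in binary.
print(6 & 3, 6 | 3)   # 2 7
```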

#### Truthiness

Some things are considered to be "truthy" (and will count as `True`) while others are "falsey" (counting as `False`). Examples of things that are falsey are the following:
* `0`
* `0.0`
* `None` and `False`
* empty collections (such as an empty list `[]`, and empty versions of the other data structures that we will cover in this notebook but have not seen yet),
* empty strings (`''` or `""`).

Most other things will be truthy.

Here is a simple example, but play around with more:

```python
e_list = []

if e_list:
    print('True!')
else:
    print('False!')
    
f_list = [0]

if f_list:
    print('True!')
else:
    print('False!')
```

## `if` statements

One of the most common places to see booleans is in `if` statements. These allow for different blocks of code to be run depending on the result of a check.

* Basic pattern
* `if` ... `else`
* `if` ... `elif` ... `else` - mutually exclusive options
* Combined with `for` ... `in` ... `:` to control iterations
    - `break`, `continue`

In [None]:
string = '12300 ft'

In [None]:
# Build up to this to show the basic pattern.

if 'm' in string:
    units = 'm'
elif 'ft' in string:
    units = 'ft'
else:
    units = None
    
print(units)
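Note that `'m' in string` is a loose test: any string containing the letter m anywhere (for example `'formation top'`) would be labelled as metres. A sketch of a stricter check that only looks at the last word, assuming every string is non-empty (the test strings are made up):

```python
strings = ['12300 ft', '3750 m', 'formation top']
found = []

for string in strings:
    last_word = string.split()[-1]   # only look at the final word
    if last_word == 'm':
        units = 'm'
    elif last_word == 'ft':
        units = 'ft'
    else:
        units = None
    found.append(units)

print(found)   # ['ft', 'm', None]
```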

### `if` in `for` loops; `break` and `continue`

In [None]:
fixed_dens = []
for den in density_floats:
    if den == -999.25:
        den = None
    fixed_dens.append(den)
    
fixed_dens

In [None]:
plt.plot(fixed_dens, depths)

In [None]:
for den in density_floats:
    if den == -999.25:
        print('Density is -999.25, ending loop.')
        break
    print(den)

In [None]:
for den in density_floats:
    if den == -999.25:
        print('Density is -999.25, skipping.')
        continue
    print(den)

### Exercise
 
* We previously loaded a `xyz` file into memory (as `mag_data`). Loop over the string in each row containing a `lon`, `lat` and `resmag` and `split` the row into a list. Typecast each value into a number (the latitude and longitude should be `float`s, residual magnetic anomaly (`resmag`) can be `int`). Then `append` each number to its corresponding list. When you are done, you should have three lists, each with 52135 elements. Note: You may need to remove the headers and footers first. <a title="Try this with a smaller subset of the data, or `break` after dealing with the first row."><b>HINT</b></a> <a title="If you need to trim your data, you can either slice only the data out, or use string methods to skip non-data rows."><b>HINT 2</b></a>
* Use the given `plt.scatter` to create a scatter plot of our `resmags` at their `x`, `y` coordinates.
* Add an `if` statement inside your loop to only store rows where the `resmag` is positive and replot the resulting lists.

In [None]:
# If we skipped the exercise earlier, then we need to create `cleaned_mag_data`:
cleaned_mag_data = mag_data[13:-2]

# We can also do it with a for loop and string methods.

In [None]:
lats = []
lons = []
resmags = []

# for ... in ...:

In [None]:
# This probably needs some build up first.

lons = []
lats = []
resmags = []

for row in cleaned_mag_data:
    row = row.split()
    lons.append(float(row[0]))
    lats.append(float(row[1]))
    resmags.append(int(row[2]))

In [None]:
# As before, this should give no output if the above has worked correctly.
assert len(lats) == len(lons) == len(resmags) == 52135

In [None]:
plt.scatter(lons, lats, c=resmags, s=2)
plt.colorbar()

In [None]:
# Keep only values where `resmag` is positive
pos_lons = []
pos_lats = []
pos_resmags = []

# for ... in ...:

In [None]:
pos_lons = []
pos_lats = []
pos_resmags = []

for row in cleaned_mag_data:
    row = row.split()
    mag = int(row[2])
    if mag > 0:
        pos_lons.append(float(row[0]))
        pos_lats.append(float(row[1]))
        pos_resmags.append(mag)

In [None]:
plt.scatter(pos_lons, pos_lats, c=pos_resmags, s=2)
plt.colorbar()

## Maths using `numpy`

NumPy makes a number of things easier when working with numerical data.

* Creating a numpy ndarray.
* Elementwise mathematics
* Multi-dimensional indexing

In [None]:
import numpy as np

#### Creating ndarrays

In [None]:
np.array([1, 2, 3, 4])
np.array(depths)

In [None]:
depths_arr = np.array(depths)
type(depths_arr)

In [None]:
np.arange(0, 100, 10)

In [None]:
np.linspace(0, 100, 10)
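`arange` and `linspace` look similar but are specified differently: `arange` takes a step size and excludes the stop value, while `linspace` takes the number of points and includes the stop value by default. A small sketch:

```python
import numpy as np

a = np.arange(0, 100, 10)     # start, stop, step: the stop value (100) is excluded
b = np.linspace(0, 100, 10)   # start, stop, number of points: 100 is included
c = np.linspace(0, 100, 11)   # 11 points gives a spacing of exactly 10

print(a)             # 0, 10, ..., 90 (10 values)
print(b[1] - b[0])   # spacing of 100 / 9, about 11.11
print(c)             # 0., 10., ..., 100. (11 values)
```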

#### Elementwise mathematics
_Maths on collections with numpy compared to using loops_

Turcotte and Schubert in _Geodynamics_ (2014) state that over a large depth range the porosity-depth relationship can be described by an exponential equation:

$$\phi = \phi_{0}\,\operatorname{e}^{-cz}$$

where $\phi$ is porosity, $\phi_{0}$ is some initial porosity, $\mathrm{e}$ is Euler's number (the base of the natural logarithm), $z$ is depth, and $c$ is some constant. Note that our depths are negative, so we will not use `-c` in our implementation.
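As a worked check of this sign convention, take $\phi_0 = 0.3$ and $c = 0.005$ (the values used below) and the second depth in our well, $z = -10$. Because $z$ is already negative, we multiply by $+c$:

$$\phi = \phi_{0}\,\operatorname{e}^{cz} = 0.3\,\operatorname{e}^{0.005 \times (-10)} = 0.3\,\operatorname{e}^{-0.05} \approx 0.3 \times 0.9512 \approx 0.2854$$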

We can do it in a standard loop like this:
```python
porosities = []
for depth in depths:
    porosities.append(phi_zero * math.exp(c * depth))
```

This is much more direct when using numpy:

In [None]:
phi_zero = 0.3
c = 0.005

In [None]:
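print(depths * 2)      # a list: the whole list is repeated, not doubled
print(depths_arr * 2)  # an ndarray: each element is multiplied by 2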

In [None]:
porosities_arr = phi_zero * np.exp(c * depths_arr)
porosities_arr

We can also plot this new array:

In [None]:
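# Build the loop-based list from above so that we can compare the two approaches.
porosities = []
for depth in depths:
    porosities.append(phi_zero * math.exp(c * depth))
plt.plot(porosities, depths, 'o-')
plt.plot(porosities_arr, depths, 'o-', alpha=0.5)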

#### Booleans and NumPy arrays

We will only do this if it comes up. It is covered in the matplotlib notebook anyway.

In [None]:
bool_arr = np.array([True, True, False, False, True])
my_arr = np.array([1, 2, 3, 4, 5])

In [None]:
my_arr[bool_arr]

In [None]:
porosities_arr

In [None]:
porosities_arr > 0.15

In [None]:
porosities_arr[porosities_arr > 0.15]

In [None]:
resmags_arr = np.array(resmags)
lats_arr = np.array(lats)
lons_arr = np.array(lons)

In [None]:
resmags_arr

In [None]:
resmags_arr >= 0

In [None]:
mask = resmags_arr >= 0
mask

In [None]:
lats_arr

In [None]:
lats_arr[mask]

In [None]:
lats_arr.size, lats_arr[mask].size

In [None]:
plt.scatter(lons_arr[mask], lats_arr[mask], c=resmags_arr[mask], s=2)

#### Multi-dimensional arrays in NumPy

In [None]:
before = np.loadtxt('../data/st-helens_before.txt')
after = np.loadtxt('../data/st-helens_after.txt')

In [None]:
before.shape, before.size, before.ndim

In [None]:
before[10:100].shape, before[10:100]

In [None]:
plt.imshow(before[10:100])

In [None]:
before[:, 0]

In [None]:
before[:, 0].shape

In [None]:
plt.imshow(before[::10, ::10])

In [None]:
plt.imshow(before - after)

In [None]:
plt.plot(before[:, 160])
plt.plot(after[:, 160])
plt.plot(before[:, 160] - after[:, 160])
plt.axhline(0, alpha=0.3, c='k')
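The same indexing rules work on any 2D array. A self-contained sketch with a small made-up grid, so you can check each result by eye:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)   # rows: 0 to 3, 4 to 7, 8 to 11

print(grid.shape, grid.ndim, grid.size)   # (3, 4) 2 12
print(grid[1])           # second row: 4, 5, 6, 7
print(grid[:, 2])        # third column: 2, 6, 10
print(grid[::2, ::2])    # every other row and column: 0, 2, 8 and 10
print(grid[1, 2])        # a single value at row 1, column 2: 6
```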

## Tuples

We will be brief about these. In essence, tuples are immutable lists: once created, their elements cannot be changed.

In [None]:
coords = (-33.5, 21.0)  # the lat and lon of our well.
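A short sketch of the main things to know about tuples: unpacking, immutability, and the trailing comma needed for a one-element tuple. The coordinates are the same (latitude, longitude) pair used for our well later in this notebook.

```python
coords = (-33.5, 21.0)   # (latitude, longitude) of our well

lat, lon = coords        # tuple unpacking: one name per element
print(lat, lon)          # -33.5 21.0

try:
    coords[0] = -34.0    # like strings, tuples cannot be changed in place
except TypeError as err:
    print('TypeError:', err)

single = (157,)          # a one-element tuple needs a trailing comma
not_a_tuple = (157)      # without it, this is just the int 157
print(type(single), type(not_a_tuple))
```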

## Sets

These are less common data structures, but when they are the right tool, they are fantastic. We are not going to cover them in detail, but it is worth seeing them. The official documentation describes them well:

> A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference.

We can create sets from existing objects like `list`s or `array`s, or from scratch:
```python
>>> basin1_lith = 'dolerite shale sandstone sandstone dolerite'.split()
>>> basin1_lith_set = set(basin1_lith)
>>> basin1_lith_set
{'dolerite', 'sandstone', 'shale'}
>>> basin2_lith_set = {'dolerite', 'limestone', 'mudstone', 'dolomite', 'mudstone', 'dolerite', 'dolerite'}
>>> basin2_lith_set
{'dolerite', 'dolomite', 'limestone', 'mudstone'}
```

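Notice that the values in a set are unique: duplicates are removed. Because sets are unordered, the order in which the values are displayed may differ from the above. We can only put hashable objects into a set; in practice this means immutable objects such as strings, ints, floats, and tuples.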

Once we have sets, then we can compare them in various ways:

* `intersection` (only what is in both sets)
* `union` (combination of both sets)
* `difference` (items in the first set but not in the second)
* `symmetric_difference` (items in exactly one of the two sets)

Note that sets are mutable, so you can add or remove items from them.
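These operations also have operator forms (`&`, `|`, `-`, `^`). A sketch using the two basins from above; sets have no order, so we sort before printing to get predictable output:

```python
basin1 = {'dolerite', 'sandstone', 'shale'}
basin2 = {'dolerite', 'dolomite', 'limestone', 'mudstone'}

both = basin1 & basin2          # same as basin1.intersection(basin2)
either = basin1 | basin2        # same as basin1.union(basin2)
only_1 = basin1 - basin2        # same as basin1.difference(basin2)
exactly_one = basin1 ^ basin2   # same as basin1.symmetric_difference(basin2)

print(sorted(both))                    # ['dolerite']
print(sorted(only_1))                  # ['sandstone', 'shale']
print(len(either), len(exactly_one))   # 6 5

basin1.add('siltstone')   # sets are mutable
basin1.discard('shale')   # discard() does not complain if the item is missing
print(sorted(basin1))     # ['dolerite', 'sandstone', 'siltstone']
```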

Below is some code to test some set operations for yourself:

```python
import numpy as np
import matplotlib.pyplot as plt

arr_40 = np.random.randint(20, 40, size=40)
arr_20 = np.random.randint(0, 30, size=40)

plt.hist(arr_40, alpha=0.5, bins=20)
plt.hist(arr_20, alpha=0.5, bins=30)
plt.show()

set_40 = set(arr_40)
set_20 = set(arr_20)
set_35 = {36, 37, 39, 36, 39, 38, 37, 39, 38, 39, 36, 36, 35, 39, 35}

# Things to try:
set_20.intersection(set_40)
set_20.difference(set_40)
set_20.isdisjoint(set_35)
set_35.issubset(set_40)
```

## Dictionaries

- Not a sequence, but a database-like mapping of key-value pairs
- Forming with `{...}`
- Keys and values
- Retrieving a value using a key
- Adding keys
- Deleting keys with `pop`
- `in`
- `get` (a bit like a database)
- `update` with `{k: v}`, or just do `dict[k] = v`
- Iterating over a dictionary

The data that we currently have from our well is the following:

In [None]:
print(coords)
print(thickness_ints)
print(density_floats)
print(lithologies)
print(depths)
print(porosities)

In [None]:
# you can easily share these with students:
coords = (-33.5, 21.0)
thickness_ints = [10, 30, 15, 45, 12, 11, 21, 1, 12, 19, 42, 33]
density_floats = [2.3, 2.4, 2.5, 2.5, 2.45, 2.3, 2.55, -999.25, 2.55, 2.6]
lithologies = ['SH', 'SST', 'SLT', 'SH', 'SST', 'INT', 'SH', 'SST', 'SLT', 'SH']
depths = [0, -10, -40, -55, -100, -112, -123, -144, -145, -157, -176, -218]
porosities = [0.3, 0.2853688273502142, 0.24561922592339452, 0.22787163696749052, 0.18195919791379003, 0.17136271915464446, 0.16219226859279498, 0.1460256767879915, 0.14529737068660872, 0.13683591053569175, 0.12443487350447441, 0.10086494811202]

We can store these in a single collection:

In [None]:
data_dict = {'location': coords,
             'thicknesses': thickness_ints,
             'density': density_floats,
             'lithology': lithologies,
             'depths': depths,
             'porosities': porosities,
            }

In [None]:
data_dict

In [None]:
data_dict['depths']

In [None]:
data_dict.get('depths')

In [None]:
data_dict.get('processor', 'Anonymous')

In [None]:
# adding K:V pair directly
data_dict['processor'] = 'Martin'
data_dict

In [None]:
# adding K:V pair with dictionary in `.update`
data_dict.update({'processor': 'Rob'})

In [None]:
data_dict.pop('processor')

**Finding things**

In [None]:
'porosity' in data_dict

In [None]:
data_dict.keys()

In [None]:
data_dict.values()

In [None]:
data_dict.items()

In [None]:
data_dict
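A sketch of what happens with a missing key, using a small made-up dictionary: square brackets raise a `KeyError`, while `get` and `pop` can fall back to a default.

```python
well = {'depths': [0, -10, -40], 'lithology': ['SH', 'SST', 'SLT']}

print(well.get('porosity'))       # None: a missing key gives None, not an error
print(well.get('porosity', []))   # []: or a default of our choosing

try:
    well['porosity']              # square brackets raise on a missing key
except KeyError as err:
    print('KeyError:', err)

removed = well.pop('lithology')        # remove a key and return its value
print(removed)                         # ['SH', 'SST', 'SLT']
print(list(well))                      # ['depths']
print(well.pop('lithology', 'gone'))   # gone: a default avoids a KeyError
```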

#### Looping for a dictionary

In [None]:
# Dictionaries are iterable over their keys with no special effort:
for key in data_dict:  # the same as `for key in data_dict.keys():`
    print(key)

In [None]:
for value in data_dict.values():
    print(type(value))

In [None]:
for key, value in data_dict.items():
    print(key + ' has value: {}'.format(value))

## Exercise

Using our `data_dict`:
* Print the following, filling in the values from our `data_dict`: <a title="Note that we want a longitude of '21', not '21.0'."><b>HINT</b></a>
      'Our well is located at -33.5 S, 21 E.'
* The well is in the 'Western' prospect. Add this 'prospect' to the `dict`.
* One of the items in the `lithology` list is `INT`. Change it to `SLT`.
* Optional: Some of our keys are singular. It might be better to have the keys for lists as plurals. Change the `lithology` and `density` keys to be `lithologies` and `densities`. The old keys should not be in `data_dict` when you are done.

In [None]:
# Print 'Our well is located at -33.5 S, 21 E.', filling in the values from the dictionary.


In [None]:
lat, lon = data_dict['location']
print(f'Our well is located at {lat} S, {lon:.0f} E.')
print('Our well is located at {} S, {:.0f} E.'.format(*data_dict['location'])) # This is nice.

In [None]:
# Add a prospect.


In [None]:
data_dict['prospect'] = 'Western'

In [None]:
# Change 'INT' to 'SLT' in `data_dict`'s lithology list.


In [None]:
new_liths = []
for lith in data_dict['lithology']:
    if lith == 'INT':
        lith = 'SLT'
    new_liths.append(lith)
data_dict['lithology'] = new_liths

# This will also work:
# data_dict['lithology'][5] = 'SLT'
# data_dict['lithology']

# data_dict.get('lithology').index('INT') to find the index is a better approach generally.

In [None]:
# Make the keys plural





In [None]:
plurals = {'lithology': 'lithologies', 'density': 'densities'}
for k, v in plurals.items():
    data_dict[v] = data_dict.pop(k)

In [None]:
data_dict

### Stretch/ Homework Exercise

We will leave this one for now, but it is a nice use case for how you might create a dictionary from a list of strings. Definitely come back and try it on your own. When you do, if you get stuck, please feel free to ask for help.

* Convert the `mag_header` list to a dictionary (`mag_dict`). You will need to loop over the list and decide which of the rows have information that makes sense to keep, as well as obtaining keys and values from those list elements. You do not need to try and keep all the rows. Also make sure that the keys are lower-case and contain `_` instead of any spaces. <a title="Look for rows with `:` in them."><b>HINT</b></a>
* Add the `lats`, `lons` and `resmags` as values to this dictionary, with appropriate keys (`lats`, `lons`, `resmags`). If you prefer, add them as `numpy` arrays instead of lists.

In [None]:
mag_header

In [None]:
mag_dict = {}

# for ... in ...:

In [None]:
for line in mag_header:
    if ':' in line:
        print(line.split(': '))

In [None]:
# As usual, build this one up.
mag_dict = {}

for line in mag_header:
    if ':' in line:
        part1, part2 = line.split(': ')
        part2 = part2.strip()
        try: # This can be skipped, but it might be nice to show.
            part2 = int(part2)
        except ValueError:
            pass
        part1 = part1.lower().replace('/ ', '').replace(' ', '_')
        mag_dict[part1] = part2
        
mag_dict

In [None]:
to_add = 'lats lons resmags'.split()

for key, value in zip(to_add, [lats, lons, resmags]):
    mag_dict[key] = value

In [None]:
mag_dict

## Writing back out

We have now loaded our data and processed it (and added some new things to it!). To keep our results, we will write them back to disk using the simplest possible approach (we will come back to reading and writing files in more detail later in the week).

We do this in much the same way as we read our data in:

In [None]:
with open('../data/processed_well.txt', 'w') as f:
    for k, v in data_dict.items():
        f.write(f'{v} # {k}\n')
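Everything written this way becomes plain text: when the file is read back, the lists and tuples come back as strings, not as Python objects. A self-contained sketch that writes a cut-down, made-up `data_dict` to a temporary file (rather than `../data/`) and reads it back:

```python
import os
import tempfile

data_dict = {'location': (-33.5, 21.0), 'lithology': ['SH', 'SST']}

path = os.path.join(tempfile.mkdtemp(), 'processed_well.txt')
with open(path, 'w') as f:
    for k, v in data_dict.items():
        f.write(f'{v} # {k}\n')

with open(path, 'r') as f:
    lines = f.readlines()

print(lines)
# ['(-33.5, 21.0) # location\n', "['SH', 'SST'] # lithology\n"]
print(type(lines[0]))   # <class 'str'>: the tuple came back as text
```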

### Saving our aeromag data

We will also save our plot to disk, using `plt.savefig()`.

In [None]:
maxx = 920  # absolute maximum value in our data, so that the colorbar is centred on 0.
plt.scatter(mag_dict['lons'], mag_dict['lats'], c=mag_dict['resmags'], s=2,
            cmap='RdBu', vmin=-maxx, vmax=maxx)
plt.xlabel('Longitude [°E]')
plt.ylabel('Latitude [°N]')
plt.colorbar(label='Residual Magnetic Anomaly [nT]')
plt.savefig('../data/afghan_aeromag.png', dpi=300)
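Rather than typing the colour limit by hand, it can be computed from the data. A sketch with NumPy, using a few made-up values in place of our `resmags`:

```python
import numpy as np

resmags = [-450, 120, 920, -30, 610]     # made-up values standing in for our data
maxx = np.abs(np.array(resmags)).max()   # largest magnitude, positive or negative
print(maxx)   # 920

# Passing vmin=-maxx and vmax=maxx to plt.scatter puts 0 in the middle of the colour scale.
```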

We have now covered the basics of Python's syntax. While you can do a lot with what we have shown here, we will dive into some useful additional concepts over the rest of the week, which will make it easier for you to be productive.

---

## Other great places to pick up Python:

- [Learn X in Y minutes](https://learnxinyminutes.com/docs/python3/) — If you just want to get cracking.
- [Stavros](https://www.stavros.io/tutorials/python/) — If you want to know a bit more.
- [Robert Johansson's lectures](Lecture-1-Introduction-to-Python-Programming.ipynb)
- [Tutorials Point](http://www.tutorialspoint.com/python/python_quick_guide.htm) — Another option.
- [Codecademy](https://www.codecademy.com/learn/learn-python-3) — A more sedate pace.
- [Udacity Intro to Computer Science](https://www.udacity.com/course/intro-to-computer-science--cs101) — Fantastic but a serious undertaking.
- [All the tutorials!](https://wiki.python.org/moin/BeginnersGuide/Programmers)

**WARNING** There's still some Python 2 around. Keep away from it if you can! Python 2 reached its end of life in January 2020 and no longer receives updates, and there are hardly any libraries now that have not made the switch to Python 3.

----

## Python is...

- Not just a scripting language.
- Interpreted, not compiled ahead of time (CPython compiles your code to bytecode, which it then interprets).
- Strongly typed — types are enforced.
- Dynamically, implicitly typed — you don't have to declare variables.
- Case sensitive — var and VAR are two different variables.
- Object-oriented — everything is an object.
- Supportive of functional and procedural styles.

<hr />

<div>
<img src="https://avatars1.githubusercontent.com/u/1692321?s=50"><p style="text-align:center">© Agile Geoscience 2021</p>
</div>