# Logs and other 1D data

Wireline logs, and 1D data in general, are a fundamental data type for subsurface scientists and engineers.

The data themselves are usually some physical measurement: conductivity of rock, speed of sound, natural radiation counts, volume of fluid, and so on.

The data points also have a set of 'coordinates' in time or space. For a wireline log, the coordinates are depths; for production data, they are timestamps. Often, there are multiple sets of coordinates to worry about, such as MD, TVDSS, and TWT for wireline logs.

A lot of our work as scientists and engineers comes down to wrangling data like this. Let's look at some.

## Data from the F3 dataset

Let's start off by loading some well data. To do this, we'll use the `Well` class from [welly](https://code.agilescientific.com/welly/) and its `from_las()` method:

💡 Encourage students to write comments in code blocks.

In [None]:
from welly import Well

url = 'https://geocomp.s3.amazonaws.com/data/F02-1_logs.las'

w = Well.from_las(url)

dt = list(w.data['DT'].values)

dt[:5]

We have now loaded some data. We made an **assignment**, which means giving a name (`dt`) to a data structure (a `list` full of numbers in this case), so we can now inspect `dt`:

In [None]:
dt

As this is a very long log (how long in fact?), we'll just **slice** into the `list` to grab the first 10 values to learn about this data structure, the `list`:

In [None]:
len(dt)

In [None]:
# Slicing
dt[0:10]

In [None]:
# Saving the slice
dt_ten = dt[0:10]
dt_ten

Notice that, for now, there are no coordinates, only the data.
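
As a rough sketch of what 'data plus coordinates' could look like with plain lists, the built-in `zip()` pairs each value with a depth. The numbers below are made up for illustration; they are not read from the LAS file.

```python
# Hypothetical values for illustration only; not taken from the well.
depths = [100.0, 100.5, 101.0, 101.5]   # assumed depths, one per sample
values = [150.2, 151.0, 149.7, 148.9]   # assumed log readings

# zip() pairs the items of the two lists position by position.
pairs = list(zip(depths, values))
print(pairs[0])    # (100.0, 150.2)
print(len(pairs))  # 4
```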

This thing now exists in memory, with the name `dt_ten` pointing at it. We can ask to see it:

In [None]:
dt_ten

But we can't do 'mathy' things with it in a particularly easy way:

In [None]:
dt_ten + 10 

# This raises a TypeError: Python cannot add a number to a list.
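
To look at the error without stopping the notebook, one option is to catch it. This is only a sketch: `numbers` is a small stand-in for `dt_ten`, and `try` / `except` is not otherwise covered here.

```python
numbers = [1.0, 2.0, 3.0]  # stand-in for dt_ten

try:
    numbers + 10           # list + int is not defined
except TypeError as err:
    message = str(err)
    print(f'TypeError: {message}')
```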

Later on we'll meet NumPy and see how its `ndarray` data structure can help us with this.

We can plot it though! We need a library for this, because plotting is not built into the core of Python. (Most things aren't; the core of Python just contains a few fundamental tools.)

In [None]:
import matplotlib.pyplot as plt

plt.plot(dt_ten, '*-')

In [None]:
plt.plot(dt)

Another handy plot:

In [None]:
# And let's look at the whole data set too
_ = plt.hist(dt)

### EXERCISE

- Make a plot of the `dt` log only from index `4000` up to index `4100`.

In [None]:
# YOUR CODE HERE



In [None]:
plt.plot(dt[4000:4100])

## What can `list` do?

This data structure has three important features:

- Its instances are sequences with concepts like length, membership, and iterability.
- Its instances are ordered collections that we can index and slice elements and subsequences from.
- Its instances have 'methods' attached to them, which are functions that access their data.

Let's explore!

In [None]:
dt_ten

In [None]:
# We'll talk about iterability in a minute.

# Length.
len(dt_ten)

In [None]:
# Membership.
42 in dt_ten

In [None]:
# Sortability.
sorted(dt_ten)  # Returns a copy, unlike list.sort() method.

In [None]:
# There's a built-in function to sum collections.
sum(dt_ten)

In [None]:
# Indexing.
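dt_ten[0], dt_ten[3], dt_ten[-1]  # Explain why -1: negative indices count from the end.
# dt_ten[11] would raise an IndexError, because the list only has 10 items.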

In [None]:
# Slicing.
dt_ten[5:8]    # 8 - 5 = 3 items.
dt_ten[:10]    # The first 10.
dt_ten[1:]     # All except the first.
dt_ten[:-1]    # All except the last
dt_ten[:10:2]  # Every other of first ten

In [None]:
# Assignment by index (slice assignment exists too, but we won't cover it here).
dt_ten[0] = 100
dt_ten

# This CHANGES item 0, does not insert new item at 0 (method for that).
# Mutability. Be careful with it, often better to make a new thing.
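
A small sketch of why mutability needs care: assigning a list to a second name does not copy it, so a change made through one name shows up under the other. The names `original`, `alias` and `duplicate` are just for illustration.

```python
original = [10, 20, 30]

alias = original             # A second name for the SAME list.
duplicate = original.copy()  # A new, independent list (a shallow copy).

alias[0] = 100               # Changes the list both names point at.

print(original)   # [100, 20, 30]
print(duplicate)  # [10, 20, 30]
print(alias is original, duplicate is original)  # True False
```

This is one reason it is often safer to build a new list than to modify an existing one in place.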

In [None]:
# Methods.
dt_ten.index(100)

In [None]:
# Append is easily the most useful; works in place (lists are mutable).
dt_ten.append(50)
dt_ten

### EXERCISE

For this exercise, use `dt_ten`.
- Omit the first and last value from `dt_ten`, **assign** the result to `dt_8`.
- In `dt_8`, remove the last value and save it as `last` (this should no longer be `50`)
- Print every third value in `dt_8`.

In [None]:
# YOUR CODE HERE



In [None]:
dt_8 = dt_ten[1:-1]

In [None]:
last = dt_8.pop()
assert last != 50

In [None]:
print(dt_8[::3])

## Iterability

Often we'd like to step over the items in a collection, transforming them or somehow using them as we go. For example, we might collect the transformed versions in a new list ("remove the endings from these filenames") or perhaps we'll loop over a list of URLs, making web requests to each one in turn and collecting the results.

In [None]:
# Iterability.
# Earlier on I tried to add 10 to the DT log.

# This is a `for` loop. Notice:
#  - No counters or increments.
#  - We get each `n` from `dt_ten` in turn.
for n in dt_ten:
    print(n + 10)

In [None]:
# dt_ten hasn't changed:
dt_ten

In [None]:
# `n` was an actual assignment happening for each step in `dt_ten`:
n

In [None]:
# To 'save' the transformed numbers, I need to collect them in a new list.
dt_out = []
for n in dt_ten:
    dt_out.append(n + 10)
dt_out

### EXERCISE

Rearrange the following lines of code to make a list of depths using the datum as the new start level. Pay attention to the indentation.

In [None]:
for depth in depths:
print(adjusted_depths)
adjusted_depths = []
depths = [48.0, 63.0, 70.5, 78.0, 86.75, 100.5, 109.25, 111.75, 120.5, 120.5, 138.0, 140.5, 151.75]
adjusted_depths.append(depth - datum)
datum = 34.8

In [None]:
depths = [48.0, 63.0, 70.5, 78.0, 86.75, 100.5, 109.25, 111.75, 120.5, 120.5, 138.0, 140.5, 151.75]
adjusted_depths = []
datum = 34.8
for depth in depths:
    adjusted_depths.append(depth - datum)
    
print(adjusted_depths)

## `if` statements

One of the most common places to see booleans is in `if` statements. These allow for different blocks of code to be run depending on the result of a check.

* Basic pattern
* `if` ... `else`
* `if` ... `elif` ... `else` - mutually exclusive options
* Combined with `for` ... `in` ... `:` to control iterations
    - `break`, `continue`

Let's look at an example that works out the units from a depth string:

In [None]:
# build this up

depth = '2034 ft'

if 'f' in depth.lower():
    units = 'ft'
elif 'm' in depth.lower():
    units = 'm'
else:
    units = None

units

### EXERCISE

Rearrange the following lines of code to print 'Shallow' (<400), 'Medium' (>=400 and <800), or 'Deep' (>=800), based on `depth`. The code is all there; it just needs to be rearranged and indented correctly.

In [None]:
elif depth < 800:
print('Medium')
print('Shallow')
depth = 383
else:
if depth < 400:
print('Deep')

In [None]:
depth = 383
if depth < 400:
    print('Shallow')
elif depth < 800:
    print('Medium')
else:
    print('Deep')

## `break` and `continue`

The `dt` curve we loaded has no missing data, but if we load the `gr` curve we can see that the first few values are all `nan`, which stands for "not a number":

In [None]:
gr = list(w.data['GR'].values)
gr[:20]

So now, if we want to iterate over these values, we need some control at each step of the iteration to check whether the value is valid. This is where `break` and `continue` come in. First we'll use `numpy` to identify those `nan` values. (We could also use the `math` library, but as `numpy` is the go-to library for scientific computing in Python, we might as well get familiar with it.)

In [None]:
import numpy as np

# np.isnan returns a boolean value: True or False
np.isnan(42), np.isnan(gr[0])
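
For comparison, here is a sketch of the `math` alternative mentioned above. The `value` below is a NaN made by hand, standing in for `gr[0]`.

```python
import math

value = float('nan')       # a hand-made NaN, standing in for gr[0]

print(math.isnan(value))   # True
print(math.isnan(42))      # False
print(value == value)      # False: NaN is not equal to anything, even itself
```

That last line is why we test with `isnan()` rather than with `==`.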

In [None]:
# build this up
import numpy as np

print(f'Len of all data: {len(gr)}')

gr_clean = []
for g in gr:
    if np.isnan(g):
        print(f'Skipping value: {g}')
        continue
    else:
        gr_clean.append(g)
        
print(f'Len of cleaned data: {len(gr_clean)}')

In [None]:
print(f'Len of all data: {len(gr)}')

for idx, g in enumerate(gr):
    if np.isnan(g):
        print(f'Bad value `{g}` at idx `{idx}`, interrupting.')
        break
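
The same pattern can be turned around to find where the good data starts. This is a sketch on made-up readings: `enumerate()` yields an `(index, value)` pair at each step, and `break` stops the loop at the first valid value.

```python
import math

readings = [float('nan'), float('nan'), 75.5, 80.1]  # made-up values

first_valid = None
for idx, r in enumerate(readings):
    if not math.isnan(r):
        first_valid = idx
        break   # stop at the first good value

print(first_valid)  # 2
```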

### EXERCISE

- First print each value in `gr_clean`.
- Next print only the first 15 values of `gr_clean`.
- Then modify your code again to only print values strictly smaller than `2`.
- Next add a condition to multiply values that are greater or equal to `2` by `10`, and print them.
- Finally, add a print statement to log that the loop is over.

In [None]:
# YOUR CODE HERE



In [None]:
for g in gr_clean[:15]:
    if g < 2:
        print(g)
    else:
        print(g * 10)

print('Loop finished.')

### Booleans

`bool`s are either `True` or `False`. These can be very useful, most obviously for selectively running particular blocks of code.

Boolean values can be obtained in a number of ways. Many functions or methods will return either `True` or `False`. Comparisons also return a `bool`:

| Equal to | Not equal to | Less than | Greater than | Less than or equal | Greater than or equal |
|----------|--------------|-----------|--------------|--------------------|-----------------------|
|   `==`   |     `!=`     |    `<`    |      `>`     |        `<=`        |          `>=`         |

Values of different types are usually not equal (for example, `1 == '1'` is `False`), although numbers are compared by value, so `1 == 1.0` is `True`. If you want to know if something is the same object as another, then you should use `is` and `is not`.
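
A short sketch of the difference between equality (`==`) and identity (`is`), using throwaway lists:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # same contents, but a separate object
c = a           # another name for the same object

print(a == b)    # True
print(a is b)    # False
print(a is c)    # True
print(1 == 1.0)  # True
print(1 == '1')  # False
```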

Some objects contain others (for example lists), and membership within a collection can be tested with `in`, which gives a `True` or `False`.

We can also link expressions that are True or False together in a few ways:

| Operation | Result |
|-----------|--------|
| a **or** b | True if either a or b is true |
| a **and** b | False if either a or b is false,<br>True if both a and b are true |
| **not** a | True if a is false, else False |
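
As a quick sketch of these operators in use, with hypothetical values for `depth` and `gr_value`:

```python
depth = 450.0     # hypothetical depth
gr_value = 85.0   # hypothetical gamma-ray reading

in_zone = depth >= 400 and depth < 600     # both must be true
flagged = gr_value > 100 or depth > 1000   # either one is enough

print(in_zone)      # True
print(flagged)      # False
print(not flagged)  # True
```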

In some cases (notably with NumPy arrays) `&` and `|` are used instead of `and` and `or`. `&` and `|` are bitwise operators: on integers they work at the level of individual 1s and 0s, and on NumPy arrays they are applied element by element. In most cases you will want `and` and `or` instead.
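
Here is a minimal sketch of the NumPy case, using a small made-up array. Note the parentheses around each comparison: `&` and `|` bind more tightly than `<` and `>`, so leaving them out changes the meaning.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])   # made-up numbers

# Element-wise AND: True where both comparisons hold.
mask = (values > 1) & (values < 5)

# Element-wise OR: True where at least one comparison holds.
either = (values < 2) | (values > 4)

print(mask)
print(either)
```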

#### Truthiness

Some things are considered to be "truthy" (and will count as `True`) while others are "falsey" (counting as `False`). Examples of things that are falsey are the following:
* `0`
* `0.0`
* empty collections (such as an empty list `[]`, and empty versions of the other data structures that we will cover in this notebook but have not seen yet),
* empty strings (`''` or `""`).

Most other things will be truthy.

Here is a simple example, but play around with more:

```python
e_list = []

if e_list:
    print('True!')
else:
    print('False!')
    
f_list = [0]

if f_list:
    print('True!')
else:
    print('False!')
```

## Comprehensions

There's an optional extra bit of Python syntax that can sometimes help write more readable code. Any time you're doing some transformation on a collection like this, you can write it as a 'comprehension'. Let's try it on our short `dt_ten` list:

In [None]:
dt_out = [n + 10 for n in dt_ten]
dt_out

If you find that harder to read than the `for` loop, just ignore it. You'll love it one day, but it can wait!
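
A comprehension can also filter as it goes, by adding an `if` at the end. The sketch below drops `nan` values from a made-up list, much as the `continue` loop did for `gr`; it uses `math.isnan` so that it runs on its own.

```python
import math

readings = [float('nan'), 120.0, 80.0, float('nan'), 90.0]  # made-up values

# Keep only the items for which the condition is True.
clean = [r for r in readings if not math.isnan(r)]
print(clean)  # [120.0, 80.0, 90.0]
```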

### EXERCISE

 - Create a list of numbers.
 - Make a new list that contains only the second half of your list.
 - Can you sort your list from largest to smallest?
 - Find the sum of the squares of the numbers in your list.
 - Append three new numbers to your list. Can you do it in one step? <a title="You might need to Google how to concatenate lists in Python.">HINT</a>

In [None]:
# YOUR CODE HERE



In [None]:
# Possible solutions here.
my_list = [1, 4, 5, 7, 3, 47, 65, 51, 11, 52]
print(f'my_list: {my_list}')
my_range = np.random.randint(low=10, high=100, size=10)
print(f'my_range: {my_range}')
print(f'2nd half of list: {my_list[len(my_list)//2:]}')
print(f'reverse sorted list: {sorted(my_list, reverse=True)}')
sum_squares = sum([n**2 for n in my_list])
print(f'Sum of squares: {sum_squares}')
my_list.extend([345, 987, -999])
print(f'extended list: {my_list}')

## Math on a `list` vs `np.ndarray`

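Remember that we could not add a number to a `list`. A NumPy array handles this element by element, and multiplication behaves differently too: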

In [None]:
my_list + 10

In [None]:
np.array(my_list) + 10

In [None]:
my_list * 2

In [None]:
np.array(my_list) * 2
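
To make the contrast explicit, here is a sketch on a tiny list called `short` (a made-up name). For a `list`, `+` means concatenation and `*` means repetition; for an array, both are arithmetic applied to every element.

```python
import numpy as np

short = [1, 2, 3]

joined = short + [10]            # concatenation
repeated = short * 2             # repetition
added = np.array(short) + 10     # element-wise addition
doubled = np.array(short) * 2    # element-wise multiplication

print(joined)    # [1, 2, 3, 10]
print(repeated)  # [1, 2, 3, 1, 2, 3]
print(added)
print(doubled)
```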

## Indexing and slicing `np.ndarray`

It is important to realize that everything you have learned about indexing and slicing on a `list` can be applied to a `np.ndarray`. Let's make one to illustrate:

In [None]:
arr = np.random.randint(low=10, high=100, size=50)
arr

In [None]:
arr[0], arr[-1]

In [None]:
arr[10:40:2]

In [None]:
tops = arr[:-1]
tops

In [None]:
bases = arr[1:]
bases

In [None]:
plt.plot(tops - bases)
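
Subtracting two offset slices like this gives the difference between neighbouring samples. As a sketch on a small made-up array, `np.diff` computes the same thing with the opposite sign convention (each item minus the one before it):

```python
import numpy as np

arr = np.array([10, 15, 12, 20])   # small made-up array

tops = arr[:-1]    # all except the last
bases = arr[1:]    # all except the first

print(bases - tops)   # each item minus the previous one
print(np.diff(arr))   # the same values
print(tops - bases)   # the same values with the sign flipped
```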

## Handling `nan` values without a loop

When we wrote a loop to handle the `nan` values in `gr`, we still needed `numpy` (or the `math` library) to identify them. There is a shorter way to achieve the same result using `numpy` alone. For this we need `gr` to be a `np.ndarray` rather than a `list`, so we convert it with `np.array()`:

In [None]:
gr_arr = np.array(gr)
gr_arr[:25]

In [None]:
# Use numpy to filter out the nans
gr_arr = gr_arr[~np.isnan(gr_arr)]
print(type(gr_arr))
gr_arr[:15]

## Boolean array

What is going on in `gr_arr[~np.isnan(gr_arr)]`?
Let's break it down to understand it:

In [None]:
empty_arr = np.array([])
empty_arr

In [None]:
test_arr   = np.array([1, 2, 3, 4, 5])
test_bools = np.array([True, True, False, True, False])
test_arr, test_bools

In [None]:
# contiguous slicing
test_arr[0:3]

In [None]:
# boolean indexing: keep only the items where the boolean array is True
test_arr[test_bools]

In [None]:
# adding nan values
test_gr = np.array([np.nan, 120, 80, 90, np.nan, 115, 90])
test_gr

In [None]:
np.isnan(test_gr[0])

In [None]:
np.isnan(test_gr)

In [None]:
test_gr[np.isnan(test_gr)]

In [None]:
~np.isnan(test_gr)

In [None]:
test_gr[~np.isnan(test_gr)]
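
A boolean array is useful for more than selection. As a sketch on the same `test_gr` values: `True` counts as 1 when summed, so the mask can count the bad values, and `np.nanmean` is a NumPy function that skips `nan` values for us.

```python
import numpy as np

test_gr = np.array([np.nan, 120, 80, 90, np.nan, 115, 90])

is_bad = np.isnan(test_gr)     # boolean array, True where the value is nan
n_bad = int(is_bad.sum())      # True counts as 1, False as 0
print(n_bad)                   # 2

mean_clean = test_gr[~is_bad].mean()   # mean of the valid values only
print(mean_clean)
print(np.nanmean(test_gr))             # same result, ignoring nans
```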

## Wrapping up

So we've loaded a well log, modified it, looped over it, and controlled the flow of our Python program. Let's finish by simply plotting it using `welly`:

In [None]:
w.data['GR'].plot()

## NEXT

To continue and do more meaningful processing on data like this:

- We're going to need one more data type (dictionaries).
- We're going to have to learn how to write our own functions.
- We'll practise making our own plots.
- We'll look at file I/O.

There's plenty more Python to learn!