# Working with 2D data

We have seen that `list`s and NumPy `ndarray`s work well for 1D data like logs, but we will often want to work with 2D data of some kind. In this notebook we will explore how to work with something like a horizon, using NumPy.

Let us start by loading our data. We are using a library named `pooch` to download them to a specific location on our hard drive. If we come back to this, the files will only be downloaded again if they have changed on the server or been deleted.

In [None]:
import pooch

spot = pooch.create(path='../data', base_url="https://geocomp.s3.amazonaws.com/data/",
                    registry={"F3_8-bit_int.sgy": "md5:cbde973eb6606da843f40aedf07793e4",
                              "F3_horizon.npy": "md5:9ba4f498ba3e2dfebeaa739aeac68d04"})

fname = spot.fetch("F3_horizon.npy")

We are also going to import `numpy` and `matplotlib` here, since we will use them throughout the notebook:

In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
horizon = np.load(fname)
horizon

This is _raster_ data, where each cell in a grid has a value associated with it. We cannot see all of the individual values above, but we can easily plot them using matplotlib's `imshow` function:

In [None]:
plt.imshow(horizon)

The `imshow` function is a very useful one, and has a fair few options to control exactly how the result looks. We are going to start using a slightly different way of creating plots, using the so-called _object-oriented_ interface. This is the way recommended in the matplotlib documentation, and it is what we will use from here on:

In [None]:
fig, ax = plt.subplots()

ax.imshow(horizon)

This looks very much like what we had before, but it gives us more flexibility than calling `plt.imshow` directly, especially if we want more than one plot on the same figure. We will see more of this as we continue.
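As a minimal sketch of that flexibility, the cell below draws two images side by side on one figure. It uses a small synthetic array (`demo`, a hypothetical stand-in for `horizon`) so that it runs without downloading anything.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for the horizon: a smooth 2D surface (an assumption, not the real data).
rows, cols = np.mgrid[0:60, 0:80]
demo = np.sin(rows / 10) + np.cos(cols / 15)

# One figure, two axes: each axes object is controlled independently.
fig, axs = plt.subplots(ncols=2, figsize=(10, 4))

axs[0].imshow(demo)
axs[0].set_title('Full array')

axs[1].imshow(demo[:30, 40:])
axs[1].set_title('Top-right corner')
```

Because every call goes through a specific axes object, there is no ambiguity about which panel a title or image belongs to.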

## 2D NumPy `ndarray`s

Let us take a look at the `horizon` object more closely and see what we can do with it:

In [None]:
type(horizon)

In [None]:
horizon.shape

In [None]:
horizon.ndim

In [None]:
len(horizon), horizon.size

In [None]:
horizon.min(), horizon.max(), horizon.mean(), horizon.std()

In [None]:
np.min(horizon), np.max(horizon), np.mean(horizon), np.std(horizon)

We can plot the distribution using `hist` as we saw in the previous notebook, but we need to make sure that the ndarray is one-dimensional (otherwise it will plot hundreds of histograms, rather than one histogram of the entire horizon). The easiest way is to use the `ravel` method:

In [None]:
horizon.ravel().shape, horizon.ravel().ndim

In [None]:
horizon.ravel()

In [None]:
fig, ax = plt.subplots()
ax.hist(horizon.ravel(), bins='auto')
ax.axvline(horizon.mean(), color='red')

Instead of `ravel`, we could also use `reshape`. This is more general: it accepts any target shape, as long as the total number of elements stays the same. Passing `-1` for one dimension tells NumPy to infer its size from the size of the array and the remaining dimensions. Note that `reshape(1, -1)` still returns a 2D array, with a single row, so we take its first element to get the 1D data:

In [None]:
horizon.reshape(1, -1).size, horizon.reshape(1, -1).shape, horizon.reshape(1, -1)

In [None]:
fig, ax = plt.subplots()
ax.hist(horizon.reshape(1, -1)[0], bins='auto')
ax.axvline(horizon.mean(), color='red')
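To make the differences between these flattening options concrete, here is a small sketch on a made-up 2×3 array (`small` is hypothetical, not part of the dataset). It also shows `flatten`, a related method not used above, which always returns a copy.

```python
import numpy as np

small = np.arange(6).reshape(2, 3)   # a tiny 2D array: [[0, 1, 2], [3, 4, 5]]

flat_ravel = small.ravel()           # 1D; a view of the same data when possible
flat_reshape = small.reshape(-1)     # 1D; -1 is inferred as 6
flat_copy = small.flatten()          # 1D; always a copy
row_2d = small.reshape(1, -1)        # still 2D, with shape (1, 6)

print(flat_ravel.shape, flat_reshape.shape, flat_copy.shape, row_2d.shape)
print(row_2d[0].shape)               # taking the first row gives a 1D array
```

For plotting a histogram, any of the 1D results would do; `reshape(-1)` avoids the extra indexing step needed with `reshape(1, -1)`.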

### Indexing and Slicing

We have already seen how to index and slice in one dimension. This is slightly different in two dimensions:

In [None]:
horizon

In [None]:
horizon[0]  # the first row: index 0 along the first dimension

In [None]:
horizon[0, 3]

In [None]:
horizon[0, :10]

In [None]:
horizon[:10, 0]

In [None]:
horizon[:10, 0:5]

It is important to note that slicing gives us back a view into the array rather than a copy. The view is itself an `ndarray`, so we can use the same attributes and methods that we saw above:

In [None]:
horizon[:10, 0:5].size, horizon[:10, 0:5].shape

In [None]:
horizon[:10, 0:5].min(), horizon[:10, 0:5].max(), horizon[:10, 0:5].mean()
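One practical consequence of getting a view is that writing to the slice also changes the original array. The sketch below shows this on a small made-up array (`grid` is a hypothetical stand-in for `horizon`), and uses `.copy()` when an independent array is wanted.

```python
import numpy as np

grid = np.zeros((4, 5))              # small made-up array for illustration

window = grid[:2, :3]                # basic slicing returns a view
window[:] = 1.0                      # writing to the view changes `grid` too

independent = grid[:2, :3].copy()    # an explicit copy owns its own data
independent[:] = 9.0                 # `grid` is unaffected by this

print(grid.sum())                    # 6.0: only the six cells written through the view
print(np.shares_memory(grid, window), np.shares_memory(grid, independent))
```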

We can also show this more visually:

In [None]:
fig, ax = plt.subplots()

ax.imshow(horizon[:300, 450:])

In [None]:
fig, ax = plt.subplots()

ax.imshow(horizon[::30, ::25])

In [None]:
fig, ax = plt.subplots()

ax.imshow(horizon[500:600:5, 800::5])

### Boolean Indexing

We have seen how this can work for 1D data, and we can do the same sort of thing for 2D:

In [None]:
horizon > 0.6

In [None]:
b_horizon = horizon > 0.6
b_horizon.shape, horizon.shape

In [None]:
fig, ax = plt.subplots()

im = ax.imshow(b_horizon)
plt.colorbar(im)

Having a boolean numpy array can let us do things to only part of our array, such as masking data:

In [None]:
masked = horizon.copy()
masked[horizon > 0.6] = np.nan

In [None]:
fig, ax = plt.subplots()
im = ax.imshow(masked)
plt.colorbar(im)
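Masking with `np.nan` has two side effects worth knowing about: it only works on floating-point arrays, and ordinary reductions such as `mean` then return `nan`. The sketch below uses made-up values (not the real horizon) and NumPy's NaN-aware functions.

```python
import numpy as np

values = np.array([[0.2, 0.7],
                   [0.5, 0.9]])      # made-up float data for illustration

masked_demo = values.copy()
masked_demo[values > 0.6] = np.nan   # works because the dtype is float

print(masked_demo.mean())            # nan: ordinary reductions propagate NaN
print(np.nanmean(masked_demo))       # about 0.35: the mean of the remaining 0.2 and 0.5
print(np.isnan(masked_demo).sum())   # 2: the number of masked cells
```

`np.nanmin`, `np.nanmax` and `np.nanstd` follow the same pattern.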

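Combining boolean tests on NumPy arrays uses a different syntax from combining ordinary Python booleans. We need to use the bitwise operators rather than the keywords we introduced previously, and each comparison must be wrapped in parentheses, as in `(horizon > 0.4) & (horizon < 0.6)`, because the bitwise operators bind more tightly than comparisons. Python has no `xor` keyword; it is listed in the table only to name the operation: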

| Standard Operator | Bitwise Equivalent |                        Description                       |
|:-----------------:|:------------------:|:--------------------------------------------------------:|
|         or        |         \|         |     True if element matches either of two conditions     |
|        and        |          &         |      True if element matches both of two conditions      |
|        xor        |          ^         | True if element matches in exactly one of two conditions |
|        not        |          ~         |                   Invert boolean array                   |
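The operators in the table can be sketched on a small made-up array (`z` is hypothetical). Using `and` or `or` between two boolean arrays raises a `ValueError`, which is why the bitwise forms are needed.

```python
import numpy as np

z = np.array([[0.1, 0.4, 0.7],
              [0.5, 0.8, 0.3]])      # made-up values for illustration

# Parentheses are required: & and | bind more tightly than < and >.
in_band = (z > 0.3) & (z < 0.75)     # both conditions hold
outside = ~in_band                   # the inverse of the mask
either = (z < 0.2) | (z > 0.75)      # at least one condition holds
exactly_one = (z > 0.3) ^ (z > 0.6)  # exactly one condition holds

print(in_band)
print(z[in_band])                    # 1D array of the selected values
```

Indexing with a boolean mask, as in the last line, always returns a 1D array of the matching values.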

### Exercise

1. Any ideas?

In [None]:
# TODO: things

## More `imshow` options

`imshow` is the standard way to view raster data using matplotlib. It has a number of options to facilitate that, so let us take a look at a couple of them.

By default, `imshow` puts the origin (0, 0) at the top left, but we want it at the bottom left. We can do that by passing the argument `origin='lower'` to `imshow`. We can also change the colour map with `cmap`:

In [None]:
fig, ax = plt.subplots()

im = ax.imshow(horizon, cmap='Greys',
               origin='lower',
              )
plt.colorbar(im, ax=ax)

Colour maps are reversible by adding `_r` at the end of the name:

In [None]:
fig, ax = plt.subplots()

im = ax.imshow(horizon, cmap='Greys_r',
               origin='lower',
              )
plt.colorbar(im, ax=ax)

We can also make the plots aware of real-world co-ordinates, such as inline/xline numbers. We can do this with the `extent` argument, which is a list (or similar data structure) with elements in the order left, right, bottom, top. Note that this is not a true georeferencing of the data: it simply assigns co-ordinate values to the edges of the image in a rectangular reference frame. If you require more than that, look into libraries like GDAL or xarray.

In [None]:
fig, ax = plt.subplots(figsize=(10, 8))

im = ax.imshow(horizon, cmap='Greys_r',
               origin='lower',
               extent=[300, 1250, 100, 750], # we need to get these from somewhere else
              )
plt.colorbar(im, ax=ax, shrink=0.7)
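If the co-ordinates are available as one value per row and one per column, the `extent` list can be computed rather than typed in. The sketch below is only an illustration: `cell_extent` is a hypothetical helper, the co-ordinate ranges are made up, and it assumes regular spacing with co-ordinates that refer to cell centres. Since `extent` describes the outer edges of the image, the helper pads by half a cell on each side.

```python
import numpy as np

def cell_extent(col_coords, row_coords):
    """Return [left, right, bottom, top] from regularly spaced cell-centre coordinates.

    Hypothetical helper: pads by half a cell so each centre sits mid-pixel.
    """
    x = np.asarray(col_coords, dtype=float)
    y = np.asarray(row_coords, dtype=float)
    dx = x[1] - x[0]
    dy = y[1] - y[0]
    edges = [x[0] - dx / 2, x[-1] + dx / 2, y[0] - dy / 2, y[-1] + dy / 2]
    return [float(v) for v in edges]

# Made-up coordinates: one value per column (x) and one per row (y).
col_coords = np.arange(300, 1251)
row_coords = np.arange(100, 751)

extent = cell_extent(col_coords, row_coords)
print(extent)   # [299.5, 1250.5, 99.5, 750.5]
```

The result could then be passed as `ax.imshow(data, origin='lower', extent=extent)`, provided the array has one row per entry in `row_coords` and one column per entry in `col_coords`.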

We can also change the interpolation used. See [the docs](https://matplotlib.org/stable/gallery/images_contours_and_fields/interpolation_methods.html) for an overview.

In [None]:
fig, axs = plt.subplots(figsize=(12, 8), ncols=2)

im1 = axs[0].imshow(horizon[100:150, 650:700],
               origin='lower',
               interpolation='none',
              )
axs[0].set_title('No Interpolation')

im2 = axs[1].imshow(horizon[100:150, 650:700],
               origin='lower',
               interpolation='bicubic',
              )
axs[1].set_title('Bicubic Interpolation')

## More dimensions

The above concepts work for more than two dimensions. NumPy can handle up to 32 dimensions (64 from NumPy 2.0 onwards), which is hopefully enough for your use case!

We can take a look at a 3D survey, for example:

In [None]:
import segyio

# Load some seismic
fname = spot.fetch('F3_8-bit_int.sgy')

with segyio.open(fname) as s:
    vol = segyio.cube(s)

In [None]:
type(vol)

In [None]:
vol.shape

In [None]:
fig, ax = plt.subplots()

ax.imshow(vol[:, :, 300])

In [None]:
fig, ax = plt.subplots()

ax.imshow(vol[..., 300])
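The two cells above produce the same image because `...` (the `Ellipsis` object) stands for as many full slices as are needed to complete the index. A minimal sketch on a tiny synthetic array (`cube` is hypothetical, not the seismic volume):

```python
import numpy as np

cube = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # tiny synthetic "volume"

a = cube[:, :, 1]     # explicit full slices on the first two axes
b = cube[..., 1]      # Ellipsis fills in the missing full slices
c = cube[0]           # same as cube[0, :, :] and cube[0, ...]

print(a.shape, c.shape)        # (2, 3) (3, 4)
print(np.array_equal(a, b))    # True
```

Indexing a single position along one axis always drops that axis, which is why each result here is 2D and can be passed straight to `imshow`.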

### Exercise

- Try plotting a vertical section through the data. You'll need to think about indexing into `vol`. It might look a little strange, but we can talk about how to fix that, so do not get too hung-up on it.
- Can you make a histogram of the amplitudes? Tip: Use only one slice of the data and use the `ravel()` method on it to change it into a 1D array. If there are NaNs in the data, you may need to deal with them.

In [None]:
fig, ax = plt.subplots(figsize=(6, 10))
ax.imshow(vol[200, :, :].T)

In [None]:
# The main trick is that you have to flatten the array:
fig, ax = plt.subplots(figsize=(15, 3))
n, bins, _ = ax.hist(vol[200, :, :].ravel(), bins=127, range=(-127, 127))
# ax.set_yscale('log', nonpositive='clip')
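If a slice does contain NaNs, one option is to filter them out before computing the histogram. The sketch below is an assumption-laden illustration: it uses synthetic amplitudes with a few NaNs inserted by hand, and `np.histogram` rather than `ax.hist` so that it runs without plotting.

```python
import numpy as np

rng = np.random.default_rng(0)
section = rng.normal(0, 30, size=(50, 40))   # synthetic amplitudes, not the F3 data
section[0, :5] = np.nan                      # pretend a few samples are missing

flat = section.ravel()                       # flatten to 1D
finite = flat[np.isfinite(flat)]             # drop NaN and infinite values

# Same binning as the solution above; values outside the range are ignored.
counts, edges = np.histogram(finite, bins=127, range=(-127, 127))
print(finite.size, counts.sum())
```

The filtered array `finite` could equally be passed to `ax.hist` in place of the raw slice.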