<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Useful-links:" data-toc-modified-id="Useful-links:-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Useful links:</a></span></li><li><span><a href="#Files" data-toc-modified-id="Files-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Files</a></span><ul class="toc-item"><li><span><a href="#Writing-to-files" data-toc-modified-id="Writing-to-files-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Writing to files</a></span></li><li><span><a href="#Reading-from-files" data-toc-modified-id="Reading-from-files-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Reading from files</a></span></li></ul></li><li><span><a href="#Plotting-and-Visualization" data-toc-modified-id="Plotting-and-Visualization-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Plotting and Visualization</a></span></li><li><span><a href="#Arrays" data-toc-modified-id="Arrays-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Arrays</a></span><ul class="toc-item"><li><span><a href="#Creating-arrays" data-toc-modified-id="Creating-arrays-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Creating arrays</a></span></li><li><span><a href="#Indexing-and-slicing" data-toc-modified-id="Indexing-and-slicing-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Indexing and slicing</a></span></li><li><span><a href="#Array-reshaping" data-toc-modified-id="Array-reshaping-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>Array reshaping</a></span></li></ul></li><li><span><a href="#Array-Datatypes" data-toc-modified-id="Array-Datatypes-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Array Datatypes</a></span></li><li><span><a href="#Array-Arithmetic" data-toc-modified-id="Array-Arithmetic-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Array Arithmetic</a></span><ul class="toc-item"><li><span><a href="#Element-by-Element" data-toc-modified-id="Element-by-Element-6.1"><span 
class="toc-item-num">6.1&nbsp;&nbsp;</span>Element-by-Element</a></span></li><li><span><a href="#Matrix-multiplication" data-toc-modified-id="Matrix-multiplication-6.2"><span class="toc-item-num">6.2&nbsp;&nbsp;</span>Matrix multiplication</a></span></li></ul></li><li><span><a href="#An-Example" data-toc-modified-id="An-Example-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>An Example</a></span></li><li><span><a href="#More-on-File-reading-and-writing:" data-toc-modified-id="More-on-File-reading-and-writing:-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>More on File reading and writing:</a></span><ul class="toc-item"><li><span><a href="#Saving-&amp;-Loading-numpy-arrays" data-toc-modified-id="Saving-&amp;-Loading-numpy-arrays-8.1"><span class="toc-item-num">8.1&nbsp;&nbsp;</span>Saving &amp; Loading numpy arrays</a></span></li><li><span><a href="#Reading-Binary-files" data-toc-modified-id="Reading-Binary-files-8.2"><span class="toc-item-num">8.2&nbsp;&nbsp;</span>Reading Binary files</a></span></li></ul></li></ul></div>

by Lutz Gross, The University of Queensland, Australia
<a href="mailto:l.gross@uq.edu.au">l.gross@uq.edu.au</a>
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.

# Useful links:

- [numpy documentation](https://numpy.org)
- [matplotlib documentation](https://matplotlib.org)
- [scipy documentation](https://scipy.org)


# Files

Here comes a quick introduction on how to write to and read from files.

## Writing to files

To open a file for reading or writing use the `open` function. `open()`
returns a file object and is most commonly used with two arguments: `open(filename, mode)`, where `mode` takes the values:
- `w` opens the file for writing. An existing file with the same name will be erased.
- `a` opens the file for appending; any data written to the file is automatically added to the end.
- `r` opens the file for reading only.

By default text mode `t` is used, which means that you read and write strings from and to the file, which are encoded in a specific encoding. `b` appended to the mode opens the file in binary mode: now the data is read and written in the form of bytes objects.

It is good practice to close a file as soon as it is no longer needed.

We want to write some code that writes the *Ricker* wavelet over a period of
`length` and for a given frequency `f` to the file `myRicker.csv` in the comma-separated-value (CSV) format. The time is incremented by `dt`.

In [5]:
length=0.128
f=25
dt=0.001

In [6]:
import math
def getRicker(t, f=25):
    """
    return the value of the Ricker wavelet at time t for peak frequency f
    """
    r = (1.0 - 2.0*(math.pi**2)*(f**2)*(t**2)) * math.exp(-(math.pi**2)*(f**2)*(t**2))
    return r

You can download/open the file ['myRicker.csv'](myRicker.csv).
**Notice**: There is an extra new line character `\n` at the end of the string in the `write` statement. This makes sure that separate rows can be identified in the file.
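
A minimal sketch of how such a file could be written. It assumes that each row holds the time and the wavelet value separated by a comma and that the time axis is centred at zero; both choices are assumptions made for illustration.

```python
import math

def getRicker(t, f=25):
    """Ricker wavelet at time t for peak frequency f (same formula as above)."""
    a = (math.pi * f * t) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)

length = 0.128
f = 25
dt = 0.001
n = int(round(length / dt))  # number of time steps

# 'with' closes the file automatically, even if an error occurs
with open("myRicker.csv", "w") as outfile:
    for i in range(n + 1):
        t = i * dt - length / 2          # assumed: time axis centred at zero
        outfile.write("%g, %g\n" % (t, getRicker(t, f)))  # '\n' ends the row
```

Using `with` is an alternative to calling `close()` explicitly: the file is closed as soon as the block is left.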

## Reading from files

Now we want to read this back. First we need to open the file for reading:

We then can read the entire file as a string:

Don't forget to close the file:

Or more brutally:

In [11]:
infile=open("myRicker.csv", 'r')
del infile

Sometimes it is easier to read the file line by line, where the end of a line is marked by `'\n'` (on Linux). `n` counts the number of lines read.

Now we want to split each line to get the time and corresponding wavelet value. The numbers are 
collected as floats in two lists `times` and `ricker`:

Notice that the end of the file is reached when the line read is empty (`len(line) == 0`). Then the loop is exited.
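
The line-by-line loop described above can be sketched as follows. To keep the sketch runnable on its own it first writes a tiny file `sample.csv` with placeholder values (a hypothetical file, not the real wavelet data).

```python
# write a tiny three-row sample file so that the sketch runs on its own
# (the values are placeholders, not the real wavelet)
with open("sample.csv", "w") as outfile:
    outfile.write("0.0, 1.0\n0.001, 0.5\n0.002, -0.25\n")

times, ricker = [], []
n = 0
infile = open("sample.csv", "r")
line = infile.readline()
while len(line) > 0:          # an empty string signals the end of the file
    t, r = line.split(",")    # split the row at the comma
    times.append(float(t))    # float() ignores surrounding blanks and '\n'
    ricker.append(float(r))
    n += 1
    line = infile.readline()
infile.close()
print(n, times, ricker)
```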

# Plotting and Visualization

We would like to plot the Ricker wavelet. 
The `matplotlib` library provides a convenient, flexible and powerful tool for visualization at least for 2D data sets. Here we can give only a very brief introduction with more functionality being presented as the course evolves. 
For a comprehensive documentation and list of examples we refer to the [matplotlib web page](https://matplotlib.org).

Here we use the `matplotlib.pyplot` library, which is a collection of command style functions, but there 
is also a more general API which gives richer functionality:

It is very easy to plot the data points we have read:

We can also plot this as a function rather than just as data points:

Let's do this properly with a legend, axis labels and a title:

Now we want to save the plot to the file `"ricker.png"`:

Notice that the file extension defines the file format. 
Click [here](ricker.png) to see the file `ricker.png`.
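
A minimal sketch of the plotting steps described in this section. The non-interactive `Agg` backend, the axis labels and the units are assumptions made so that the sketch runs as a plain script; in the notebook an interactive backend would normally be used.

```python
import math
import matplotlib
matplotlib.use("Agg")   # assumed: non-interactive backend for running as a script
import matplotlib.pyplot as plt

f = 25
times = [i * 0.001 - 0.064 for i in range(129)]
ricker = [(1 - 2 * (math.pi * f * t) ** 2) * math.exp(-(math.pi * f * t) ** 2)
          for t in times]

plt.figure()
plt.plot(times, ricker, label="Ricker, f=25")   # line plot with a legend entry
plt.xlabel("time")                               # assumed axis labels
plt.ylabel("amplitude")
plt.title("Ricker wavelet")
plt.legend()
plt.savefig("ricker.png")   # the extension .png selects the file format
```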

Let's take a quick look at an image:

# Arrays

Dealing with multidimensional arrays is a key functionality in scientific computing. 
The `numpy` module is the core library for this. It provides a rich set of tools to create and manipulate
arrays, including tools for signal processing and linear algebra. 
`numpy` works seamlessly with the `scipy` module providing further numerical capabilities 
and with `matplotlib` as introduced before. 

In this section we give an introduction to `numpy` and also to some more `matplotlib` capabilities.

## Creating arrays

We can initialize `numpy` arrays from python lists:

The type of `a` is `numpy.ndarray`

numpy arrays have a shape:

We can print entries (also called elements) of the array analogously to lists:

Arrays can have a higher rank:

So `b` can be read as having two rows and three columns. 
Elements are accessed by a double index:

`numpy` also provides several functions to create arrays, for instance:

Create an array of zeros - here of shape `(2,2)`:

Or an array of ones - try shape `(3,2)`:

There is an easy way to create a 3x3 identity matrix:

How about a 2x2 random matrix:
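
The array creation steps of this subsection can be sketched as follows. The concrete list values are placeholders chosen for illustration; only the shapes are taken from the text.

```python
import numpy as np

a = np.array([1, 2, 3])                  # rank-1 array from a list
b = np.array([[1, 2, 3], [4, 5, 6]])     # rank-2 array: 2 rows, 3 columns
print(type(a), a.shape, b.shape)
print(a[0], b[1, 2])                     # single index and double index

z = np.zeros((2, 2))                     # all zeros
o = np.ones((3, 2))                      # all ones
I = np.eye(3)                            # 3x3 identity matrix
r = np.random.random((2, 2))             # uniform random values in [0, 1)
```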

## Indexing and slicing

One would expect a convenient way to access elements in an array. Indeed, arrays support slicing:

In [32]:
c = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

Get the second row of `c`. Again counting rows starts with zero:

You can get rows 1 and 2 in one go:

The same works for columns. Let's try the second column:

and the second and third column together:  

Slicing as well as indexing can be used to modify specific entries in an array. 
For instance multiplying the second row of `c` by -1:

or setting the columns 1 and 3 to 101:
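
A short sketch of the slicing operations described above. The modifications are applied to a copy `e` (a name introduced here for illustration) so that `c` itself stays unchanged.

```python
import numpy as np

c = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

row1 = c[1]            # second row
rows12 = c[1:3]        # rows 1 and 2
col1 = c[:, 1]         # second column
cols12 = c[:, 1:3]     # second and third column

e = c.copy()           # work on a copy so that c stays unchanged
e[1] *= -1             # multiply the second row by -1
e[:, [1, 3]] = 101     # set columns 1 and 3 to 101
print(e)
```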

Boolean array indexing allows picking arbitrary elements of an array based on a mask for instance to 
get the elements out of an array that satisfy some conditions: 

In [40]:
d = np.array([[1,2], [3, 4], [5, 6]])

Here we create a mask of those values in `d` that have a value larger than 3:

This mask can be used to gather all entries where the mask is True:

Notice that the array returned is one-dimensional (shape `(3,)`).
This can be done in one line without the extra `mask`:

Masks can also be used to assign new values to entries that meet a condition. For instance, to set 
all values less than or equal to three to zero:
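
The mask operations described above can be sketched as:

```python
import numpy as np

d = np.array([[1, 2], [3, 4], [5, 6]])
mask = d > 3            # boolean array of the same shape as d
picked = d[mask]        # 1D array of the entries where mask is True
same = d[d > 3]         # the same without the extra name
d[d <= 3] = 0           # assign through a mask
print(mask, picked, d)
```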

## Array reshaping

Transposing an array is done by using the T attribute of an array object:

In [46]:
d = np.array([[1,2], [3, 4], [5, 6]])

Push array into a new shape:

**Warning**: reshaping does not create a copy, just a different view of the same data (whenever the memory layout allows it)!
Let's test this. `s` is a reshaped version of `d` with shape `(6,)`: 

`s[1]` and `d[0,1]` have the same value:

In [53]:
s[1], d[0,1]

(2, 2)

Now we change the second entry of `s` to -1:

and `d[0,1]` has also changed:
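
A minimal sketch of the view behaviour, including an explicit copy for the case where the original array should stay untouched (`s2` is a name introduced here for illustration):

```python
import numpy as np

d = np.array([[1, 2], [3, 4], [5, 6]])
dT = d.T                 # transposed array of shape (2, 3)
s = d.reshape((6,))      # a view here, because d is contiguous in memory
s[1] = -1                # writes through to d
print(d[0, 1], np.shares_memory(s, d))

s2 = d.reshape((6,)).copy()   # an independent copy
s2[0] = 99                    # d is not affected
```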

# Array Datatypes

Every numpy array has elements of the same type. Numpy tries to infer a suitable data type when you create an array. In general, all functions creating arrays have an optional parameter to explicitly specify the data type the user wants. More on types can be found [here](https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html).

Here all entries in the list of the np.array are integers so the resulting array is integer:

but it is sufficient to change one of the list entries to a float type to make the created array of type float:

You can also enforce the result to be of a specific type regardless of the list entries using the `dtype` argument:

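`dtype` can be any standard Python type such as `int`, `float` or `complex` (the former aliases `np.int`, `np.float` and `np.complex` were removed in NumPy 1.24), and it can also be given in string form, for instance `'i'`, `'f'` and `'D'` (a single `'c'` denotes a character type, not complex). But there are more types available and you can also construct your own, see [dtype](https://numpy.org/doc/stable/reference/arrays.dtypes.html).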
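
A short sketch of how the data type is inferred or enforced. The list values are placeholders; the exact integer width (for instance 32 or 64 bit) depends on the platform.

```python
import numpy as np

ai = np.array([1, 2, 3])                 # all integers -> integer array
af = np.array([1, 2.0, 3])               # one float -> float array
ac = np.array([1, 2, 3], dtype=complex)  # type enforced by dtype
a4 = np.array([1, 2, 3], dtype="f4")     # 32 bit float given as a string
print(ai.dtype, af.dtype, ac.dtype, a4.dtype)
```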

# Array Arithmetic

## Element-by-Element

As basic mathematical functions operate element-wise on arrays there is no need to loop over the entries. This is computationally more efficient and allows for clearer code.

Here are some examples:

In [59]:
x = np.array([[1,2],[3,4]], dtype=float)
y = np.array([[5,6],[7,8]], dtype=float)

This is the element-by-element sum:

This is the element-by-element difference:

Just flip the sign:

Of course there is also element-by-element multiplication and division:

This can also be done in place to save memory and compute time but `x` is changing now:

Element-by-element square root (and other unary operations) requires a `numpy` function:
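
The element-by-element operations of this subsection can be sketched as follows (the result names are introduced here for illustration):

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=float)
y = np.array([[5, 6], [7, 8]], dtype=float)

total = x + y        # element-by-element sum
diff = x - y         # element-by-element difference
neg = -x             # flip the sign
prod = x * y         # element-by-element product
quot = x / y         # element-by-element division
root = np.sqrt(x)    # element-by-element square root
x += y               # in place: x itself is changed
print(x)
```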

## Matrix multiplication

The `dot` function is used to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices. Here are some examples: 

In [66]:
X = np.array([[1,2],[3,4]])
Y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])

Dot product of two vectors returning a scalar:

Matrix vector product of matrix `X` and vector `v`: 

This is the same as applying the inner product between the rows 0 and 1 of `X` and the vector `v`:

There is also a matrix-matrix product:
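
A sketch of the three uses of `dot` with the arrays defined above (the result names are introduced here for illustration):

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])
Y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])

vw = v.dot(w)                                      # inner product of two vectors
Xv = X.dot(v)                                      # matrix-vector product
rows = np.array([X[0].dot(v), X[1].dot(v)])        # the same, row by row
XY = X.dot(Y)                                      # matrix-matrix product
print(vw, Xv, XY)
```

The operator `@` is an equivalent spelling, for instance `X @ v`.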

# An Example

To illustrate how to work with `numpy` in a practical application we would like to interpolate some
scattered gravity Bouguer anomaly data to a rectangular grid. The data are given in a CSV file so we need to 
first read the CSV file, define a grid and then interpolate the scattered data to the grid. As the range covered 
by the data is not rectangular we need to remove improper data before writing the interpolated data back to a CSV file. Of course we also want to plot the interpolated data.  

Let's get started:

`numpy` provides a variety of functions to directly load data files into arrays. 
The `loadtxt` function is commonly used to load CSV files. Here we load the
CSV file [gravity_tasmania.csv](./Data/gravity_tasmania.csv) into the 
array `gdata`. The data set contains measurements of vertical gravity anomalies at various stations
across Tasmania. It has three columns, namely the easting and northing of the station 
and the measured and corrected gravity in mGal. The columns are separated by commas 
and the file contains comments marked by `#`:

The argument `unpack=True` has the effect that the columns are returned as a tuple with three 1D arrays.  
Let's get some info from `x`:
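
A minimal sketch of such a `loadtxt` call. Instead of the real file it reads a small in-memory stand-in with placeholder numbers (not real measurements); the variable names `x`, `y` and `grav` follow the text.

```python
import io
import numpy as np

# hypothetical stand-in for the CSV file: placeholder numbers, not real data
sample = io.StringIO(
    "# easting, northing, gravity\n"
    "1000.0, 2000.0, -10.5\n"
    "1500.0, 2500.0, -12.0\n"
    "2000.0, 3000.0, -9.25\n"
)
# comment lines starting with '#' are skipped, columns are split at ','
x, y, grav = np.loadtxt(sample, delimiter=",", comments="#", unpack=True)
print(x.shape, x.min(), x.max())
```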

We plot the position of the stations as a scatter plot:

We would like to plot the gravity distribution as recorded at the stations. For this
we interpolate the data from the point observations to a regular grid. First 
we create the grid lines `xi` and `yi` along the x and y axis using `linspace`, which
returns evenly spaced numbers over a specified interval:

Then these grid line positions are combined to actual grid node positions using `meshgrid`:
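
The two steps can be sketched as follows. The station coordinates and the numbers of grid lines are placeholders chosen for illustration.

```python
import numpy as np

# placeholder station coordinates, standing in for the arrays x and y
x = np.array([0.0, 400.0, 1000.0])
y = np.array([0.0, 300.0, 500.0])

xi = np.linspace(x.min(), x.max(), 5)   # 5 grid lines along the x axis
yi = np.linspace(y.min(), y.max(), 3)   # 3 grid lines along the y axis
xi, yi = np.meshgrid(xi, yi)            # now 2D arrays of node coordinates
print(xi.shape, yi.shape)
```

With the default settings of `meshgrid` the first array index runs along the y axis and the second along the x axis.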

In [77]:
plt.figure()
plt.scatter(xi*1e-3, yi*1e-3, s=4)
plt.xlabel('easting [km]')
plt.ylabel('northing [km]')
plt.grid(True)
plt.title("Grid Points for Interpolation")

Text(0.5, 1.0, 'Grid Points for Interpolation')

Now we use the `griddata` function of `scipy` to interpolate the scattered data `grav` at `(x,y)` to the 
grid nodes `(xi, yi)`:
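
A minimal sketch of a `griddata` call on synthetic data. The station positions and values are placeholders, and `method="linear"` is an assumption; the method used for the gravity data is not stated here.

```python
import numpy as np
from scipy.interpolate import griddata

# placeholder stations: corners and centre of the unit square
x = np.array([0.0, 1.0, 0.0, 1.0, 0.5])
y = np.array([0.0, 0.0, 1.0, 1.0, 0.5])
grav = 2.0 * x + 3.0 * y                 # synthetic values, not measurements

xi, yi = np.meshgrid(np.linspace(-0.5, 1.0, 4), np.linspace(0.0, 1.0, 3))
# grid nodes outside the area covered by the stations are set to nan
gravi = griddata((x, y), grav, (xi, yi), method="linear")
print(gravi)
```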

Now we can plot the interpolated data `gravi` using contours:

What happens in the uncoloured area?

Now we want to write the interpolated data `gravi`, but only those data that have been interpolated (not set to `nan`). So we first create a mask to mark these entries:

Then we put the entries in `xi`, `yi` and `gravi` that correspond to non-nan entries in `gravi` into a new array `interpolated_data`. `axis=-1` makes the stacking work column-wise. 

And then we write them to a file ['test.out'](test.out):
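
The masking, stacking and writing steps can be sketched as follows. The small grid and its values are placeholders, and the comma as delimiter is an assumption.

```python
import numpy as np

# small placeholder grid with two nodes outside the data range
xi, yi = np.meshgrid(np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0]))
gravi = np.array([[np.nan, 1.5, 2.5], [0.5, np.nan, 3.5]])

mask = np.logical_not(np.isnan(gravi))        # True where a value exists
# one row per valid node with the columns x, y and gravity
interpolated_data = np.stack((xi[mask], yi[mask], gravi[mask]), axis=-1)
np.savetxt("test.out", interpolated_data, delimiter=",")
```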

# More on File reading and writing:

## Saving & Loading numpy arrays

Sometimes one wants to save `numpy` arrays for later use, for instance if the computational work to 
create them was quite intense. We test this with two random arrays.

And a second array:

The [`savez`](https://numpy.org/doc/stable/reference/generated/numpy.savez.html#numpy.savez) function of `numpy` allows for fast dumping of arrays to a file (here `example`) where arrays can be labeled for easier retrieval (here `"samples"` and `"table"`):

Using the [`load`](https://numpy.org/doc/stable/reference/generated/numpy.load.html) function 
the arrays can be reloaded:

The arrays can then be retrieved using keywords (here `"samples"` and `"table"`):
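
A minimal sketch of the round trip. The array shapes are placeholders; note that `savez` appends the extension `.npz` to the file name.

```python
import numpy as np

a = np.random.random((5,))       # first random array
b = np.random.random((3, 4))     # second random array

np.savez("example", samples=a, table=b)   # written to example.npz
data = np.load("example.npz")
a2 = data["samples"]             # retrieve by keyword
b2 = data["table"]
data.close()
```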

## Reading Binary files

`numpy` also provides a mechanism to read (and write) arrays as binary files. 
Here we read the "./Data/marmousi.bin" file, which gives a seismic velocity distribution, using the `fromfile` function of `numpy`. The values are given as 32 bit floating point numbers. This needs to be specified by the argument `dtype="f4"`:


The velocity data need to be reshaped as a $384 \times 122$ array. Notice that this 
information is not obtained from the file itself:
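
A sketch of reading and reshaping with `fromfile`. To keep it runnable it first writes a synthetic binary file `synthetic.bin` (a hypothetical stand-in with placeholder values, not the velocity data).

```python
import numpy as np

# synthetic stand-in for the binary file: 384*122 values as 32 bit floats
np.arange(384 * 122, dtype="f4").tofile("synthetic.bin")

velocity = np.fromfile("synthetic.bin", dtype="f4")   # flat 1D array
velocity = velocity.reshape((384, 122))               # shape must be known
print(velocity.shape, velocity.dtype)
```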

We can now plot the `velocity` data using `plt.imshow`, assuming a horizontal extent of $4600m$ and a depth of $1500m$: