# Tutorial/Assignment 1:

In [None]:
# Helper function for indexing. 
# Run this cell, but don't worry about understanding how it works for the purpose of this tutorial.

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatch

def plot_fancy_index(arr, indexed, fontsize=20):
    # arr is the original 2D array; each value must equal its flat index (e.g. built with np.arange)
    # indexed contains the values that will be highlighted
    try:
        iter(indexed)
    except TypeError:
        indexed = [indexed]

    nrows, ncols = arr.shape
    fig = plt.figure(figsize=(8, 8))
    ax = plt.gca()
    for v in np.nditer(arr):
        # Draw the rectangles: row = value // ncols, column = value % ncols
        y, x = -(v // ncols), v % ncols
        if v in indexed:
            # If it is in indexed highlight it in color C1
            ax.add_artist(mpatch.Rectangle((x, y), 1, -1, fc='C1', ec='k'))
        else:
            ax.add_artist(mpatch.Rectangle((x, y), 1, -1, ec='k'))
        ax.annotate(f'{v}\n({-y}, {x})', (x + 0.5, y - 0.5), color='w', weight='bold',
                    fontsize=fontsize, ha='center', va='center')

    # Bottom axis: positive column indices (axis 1)
    ax.set_xlim((0, ncols))
    ax.set_xticks(np.arange(ncols))
    ax.set_xticklabels(range(ncols), fontsize=16)
    ax.set_xlabel(r'Axis 1 $\rightarrow$', fontsize=20)

    # Left axis: positive row indices (axis 0)
    ax.set_ylim((-nrows, 0))
    ax.set_yticks(-np.arange(nrows))
    ax.set_yticklabels(range(nrows), fontsize=16)
    ax.set_ylabel(r'$\leftarrow$ Axis 0', fontsize=20)

    # Remove tick marks from left and bottom axes
    ax.tick_params(axis='both', which='both', length=0)

    # Top axis: negative column indices
    ax2 = ax.twiny()
    ax2.set_xlim((0, ncols))
    ax2.set_xticks(range(1, ncols + 1))
    ax2.set_xticklabels(-np.arange(ncols, 0, -1), fontsize=16)
    ax2.tick_params(axis='both', which='both', length=0)

    # Right axis: negative row indices (bottom row is -1)
    ax3 = ax.twinx()
    ax3.set_ylim((0, nrows))
    ax3.set_yticks(range(nrows))
    ax3.set_yticklabels(-np.arange(1, nrows + 1), fontsize=16)
    ax3.tick_params(axis='both', which='both', length=0)

    plt.show()

## Load the numpy library

Remember to alias it with a recognizable name.
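A minimal example using `np`, the alias that the code cells in this notebook already assume:

```python
# Import NumPy under its conventional alias
import numpy as np

print(np.__version__)
```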

## Load data using genfromtxt

Using the NumPy documentation https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html
find a way to split the data directly into x and y arrays without the intermediate `data` array used in the lecture.

In [None]:
# This is the way it was shown in the lecture. 
data = np.genfromtxt('AtmWtAgt.csv', delimiter=',', skip_header=1)
x, y = data[:,0], data[:,1]
x, y

In [None]:
# Better way to do it
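One possible approach, sketched with a tiny made-up CSV standing in for `AtmWtAgt.csv` (its real contents are not shown here): the `unpack=True` keyword of `genfromtxt` transposes the result, so the two columns can be assigned straight to `x` and `y`.

```python
import io
import numpy as np

# Hypothetical stand-in for AtmWtAgt.csv: a header line followed by two numeric columns
csv_text = io.StringIO("x,y\n1.0,2.0\n2.0,4.5\n3.0,5.5\n")

# unpack=True transposes the loaded array, so each column unpacks into its own variable
x, y = np.genfromtxt(csv_text, delimiter=',', skip_header=1, unpack=True)
print(x)
print(y)
```

With the real file this would presumably be `x, y = np.genfromtxt('AtmWtAgt.csv', delimiter=',', skip_header=1, unpack=True)`, assuming it has the same layout as in the lecture cell.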

## Statistics

Print the mean and sample standard deviation of both arrays. Be careful to read the documentation for `std` to understand which standard deviation it is calculating, and whether or not you have to include an extra keyword parameter.

- https://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html
- https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html#numpy.mean

Write your own mean function using the `sum` function and the `shape` of the array.

Check that it matches the output of the NumPy `mean` function.
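A sketch of the statistics steps on small illustrative arrays (substitute the loaded `x` and `y`). The key detail is that `np.std` defaults to `ddof=0`, the population standard deviation; `ddof=1` divides by `n - 1` and gives the sample standard deviation. The function name `my_mean` is just an illustrative choice.

```python
import numpy as np

# Illustrative data (hypothetical); substitute the x and y arrays loaded above
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.5, 5.5, 8.0])

# ddof=1 gives the sample standard deviation (divide by n - 1 instead of n)
for name, arr in (('x', x), ('y', y)):
    print(name, np.mean(arr), np.std(arr, ddof=1))

def my_mean(arr):
    """Mean of a 1-D array: sum of the elements divided by the number of elements."""
    return np.sum(arr) / arr.shape[0]

# Compare with np.isclose rather than == because these are floating point values
print(np.isclose(my_mean(x), np.mean(x)), np.isclose(my_mean(y), np.mean(y)))
```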

## Merging arrays

Combine the x and y arrays from the previous steps into one data array using the `hstack` function:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.hstack.html#numpy.hstack

Now try it with the `vstack` function:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.vstack.html#numpy.vstack

With the newly merged array, turn it back into a dataset that has x and y as columns instead of rows.

Print the number of dimensions, shape, number of elements, and data type for this new array.
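A short sketch of the merging steps on small illustrative arrays. Note that `hstack` joins two 1-D arrays end to end, while `vstack` puts them in separate rows; transposing the `vstack` result with `.T` makes x and y columns.

```python
import numpy as np

# Illustrative data (hypothetical)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.5, 5.5])

h = np.hstack((x, y))   # 1-D: the two arrays end to end, shape (6,)
v = np.vstack((x, y))   # 2-D: x is row 0, y is row 1, shape (2, 3)

# Transposing swaps rows and columns, so x and y become columns, shape (3, 2)
data = v.T
print(data.ndim, data.shape, data.size, data.dtype)
```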

## Broadcasting

Starting from the newly merged data, multiply the x column by 10 and the y column by 20 with broadcasting, and store the result in a new variable.

Using the new array, take every third element of the second column (y) and set it to zero.

Print the new array to confirm that the operation worked correctly.
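A sketch of both broadcasting steps on a hypothetical two-column array. A shape `(2,)` array of factors broadcasts across every row of a shape `(n, 2)` array. "Every third element" is read here as starting from the first element (`::3`); use `2::3` if you read it as starting from the third.

```python
import numpy as np

# Hypothetical two-column data: x in column 0, y in column 1
data = np.column_stack((np.arange(1.0, 8.0), np.arange(11.0, 18.0)))

# Shape (2,) broadcasts against shape (7, 2): every row is multiplied by [10, 20]
scaled = data * np.array([10, 20])

# Step-3 slice over the rows, column 1 only: rows 0, 3 and 6 of y become zero
scaled[::3, 1] = 0
print(scaled)
```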

## Finding specific elements

Use `argmin` to find the index of the smallest element in each row (either in x or y): https://docs.scipy.org/doc/numpy/reference/generated/numpy.argmin.html#numpy.argmin

Use the array returned by `argmin` to print the corresponding elements from the new array. You will need to use a for loop.
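A sketch on a small hypothetical array. Passing `axis=1` makes `argmin` work row by row, so the result holds 0 where x is smaller and 1 where y is; on a tie it returns the first occurrence.

```python
import numpy as np

# Hypothetical "new array" with x in column 0 and y in column 1
new = np.array([[10.0, 240.0],
                [30.0, 20.0],
                [5.0, 5.0]])

# One column index per row: which of x or y is smallest in that row
idx = np.argmin(new, axis=1)
print(idx)

smallest = []
for row, col in enumerate(idx):
    smallest.append(new[row, col])
    print(row, new[row, col])
```

Without a loop, the same elements can be picked out with `new[np.arange(len(idx)), idx]`.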

## Modifying values based on condition

Before continuing, save a copy of the data in a new variable: https://docs.scipy.org/doc/numpy/reference/generated/numpy.copy.html

For the data in the first column (`d[:, 0]`, where `d` is your data array), use the `where` function to set all values that are below the mean of that column to the negative of their value.

https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html

Print the result and convince yourself it is correct by comparing to the original array and the value of the mean.
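A sketch on a small hypothetical array `d`. `np.where(condition, a, b)` takes elements from `a` where the condition is true and from `b` elsewhere, so only values below the mean get negated; the copy keeps the original for comparison.

```python
import numpy as np

d = np.array([[1.0, 5.0], [2.0, 6.0], [6.0, 7.0]])
original = np.copy(d)   # keep an untouched copy for comparison

col_mean = d[:, 0].mean()
# Negate the first-column values that are below the column mean, keep the rest
d[:, 0] = np.where(d[:, 0] < col_mean, -d[:, 0], d[:, 0])
print(col_mean)
print(original)
print(d)
```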

## Copy vs view (optional)

There is a subtlety in the way that NumPy does the indexing that can on rare occasions lead to seemingly odd behaviour. Read the documentation here to find out more: https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html
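A small illustration of the subtlety: basic slicing returns a view that shares memory with the original, while fancy (integer-array) indexing returns a copy. `np.shares_memory` confirms which is which.

```python
import numpy as np

a = np.arange(6)

s = a[1:4]          # basic slicing: a view onto a
f = a[[1, 2, 3]]    # fancy indexing: an independent copy

s[0] = 100          # writes through to a[1]
f[1] = 200          # leaves a unchanged

print(a)
print(np.shares_memory(a, s), np.shares_memory(a, f))  # True False
```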

## Save the data

Save the new array (with the zeros) to a new file. Make sure to use a comma for the delimiter.

Load the file into a new array and print it to convince yourself that saving the file worked.
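A sketch of the save/load round trip using `np.savetxt` and `np.loadtxt` with a comma delimiter. The array values and the temporary file location are hypothetical; in the notebook you would save your own array to a file name of your choice.

```python
import os
import tempfile
import numpy as np

# Hypothetical array standing in for the one with the zeros
scaled = np.array([[0.0, 0.0], [20.0, 240.0], [30.0, 260.0]])

# Hypothetical output location; any writable path works
path = os.path.join(tempfile.mkdtemp(), 'scaled.csv')

# savetxt writes one row per line; delimiter=',' makes it a CSV file
np.savetxt(path, scaled, delimiter=',')

# Read it back with the same delimiter and compare
reloaded = np.loadtxt(path, delimiter=',')
print(reloaded)
print(np.allclose(reloaded, scaled))
```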

## Linear algebra

The linear algebra functions for numpy are in the `linalg` module. To access it use

```
np.linalg.*
```

where the `*` represents the name of the function you want to use.

A full list of the functions can be found at https://docs.scipy.org/doc/numpy/reference/routines.linalg.html.

In [None]:
# Starting with the following matrix:
M = np.array([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]])
M

Take the inverse of the matrix and store it in a variable.

Multiply the matrix by its inverse using matrix multiplication (`@` or `np.dot`), not the element-wise `*` operator.

Check that this is close to the identity matrix by using the identity creation function in numpy: 
https://docs.scipy.org/doc/numpy/reference/generated/numpy.eye.html

Also use the `allclose` function (remember that we should not compare floating point values directly):
https://docs.scipy.org/doc/numpy/reference/generated/numpy.allclose.html

Get the eigenvalues and eigenvectors of the matrix, `M`, and store them in separate variables. Print the eigenvalues and eigenvectors.
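A sketch of the linear algebra steps for the matrix `M` above. `@` performs the matrix product, `np.allclose` compares against `np.eye(3)` with a tolerance, and `np.linalg.eig` returns the eigenvalues and a matrix whose columns are the eigenvectors.

```python
import numpy as np

M = np.array([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]])

M_inv = np.linalg.inv(M)
product = M @ M_inv          # matrix product, not element-wise

# Compare to the identity with a tolerance instead of ==
print(product)
print(np.allclose(product, np.eye(3)))

eigenvalues, eigenvectors = np.linalg.eig(M)
print(eigenvalues)
print(eigenvectors)   # column i is the eigenvector for eigenvalues[i]
```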

## More indexing

First we create an array and reshape it (store the array in a variable called `array`). Note the use of the `-1` which allows NumPy to calculate the missing dimension for you. 

In [None]:
array = np.arange(64).reshape((-1,16))
array

Change the reshape command above to create an array with 16 rows.

Change the reshape command to create a square matrix.
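One way to do both reshapes: fix one dimension and let `-1` infer the other from the 64 elements.

```python
import numpy as np

a = np.arange(64)

tall = a.reshape((16, -1))   # 16 rows; NumPy infers 64 / 16 = 4 columns
square = a.reshape((8, -1))  # 8 x 8, since 8 * 8 = 64
print(tall.shape, square.shape)  # (16, 4) (8, 8)
```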

Extract the diagonal from the array using the indexing methods described in the lecture. The `arange` function and `array.shape` can help you build the indexes automatically. The `plot_fancy_index` function has been set up to help visualize your efforts.

In [None]:
indexed = array  # Change this line to see the effects
plot_fancy_index(array, indexed, fontsize=16)
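A sketch using integer-array indexing on the 8 x 8 square version of `array`: pairing row index `i` with column index `i` selects the diagonal.

```python
import numpy as np

array = np.arange(64).reshape((8, 8))

# Pair row index i with column index i for i = 0..n-1
n = array.shape[0]
diag = array[np.arange(n), np.arange(n)]
print(diag)  # [ 0  9 18 27 36 45 54 63]
```

Passing `diag` as `indexed` to `plot_fancy_index` should highlight the diagonal; `np.diag(array)` gives the same values for checking.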

Read about array masking and use it in combination with the NumPy `eye` function to get the diagonal elements of the array. You will have to specify the data type of the mask as `bool`.

In [None]:
indexed = array  # Change this line to see the effects
plot_fancy_index(array, indexed, fontsize=16)
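A sketch of the masking approach: `np.eye` with `dtype=bool` is `True` only on the diagonal, and indexing with a boolean mask returns the selected elements as a 1-D array in row-major order.

```python
import numpy as np

array = np.arange(64).reshape((8, 8))

# Boolean identity matrix: True on the diagonal, False elsewhere
mask = np.eye(array.shape[0], dtype=bool)
diag = array[mask]
print(diag)
```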

**Hard Challenge**

Use array masking to extract only the elements whose value starts with a 3, or starts with a 4 and ends with a digit larger than 6.

Hints:

- Transforming the array of integers into an array of strings would help get values starting with a particular number.

- You will need to look up `defchararray` functions such as: https://docs.scipy.org/doc/numpy-1.16.0/reference/generated/numpy.core.defchararray.startswith.html#numpy.core.defchararray.startswith (the same functions are available as `np.char`, e.g. `np.char.startswith`).

- You will also need to look up the bitwise boolean operators: https://wiki.python.org/moin/BitwiseOperators

In [None]:
indexed = array  # Put your masks here
plot_fancy_index(array, indexed, fontsize=16)
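One possible solution, sketched for the 8 x 8 array. It uses `np.char.startswith` for the leading digit; for the trailing digit it swaps in a numeric test (`array % 10`, valid for non-negative integers) instead of a string function.

```python
import numpy as np

array = np.arange(64).reshape((8, 8))

# Convert to strings so the leading digit can be tested
as_str = array.astype(str)

starts_3 = np.char.startswith(as_str, '3')
starts_4 = np.char.startswith(as_str, '4')
# For non-negative integers the last digit is the remainder mod 10
ends_above_6 = array % 10 > 6

# & and | combine boolean arrays element-wise; they bind more tightly than
# comparisons such as >, so each comparison is computed separately above
mask = starts_3 | (starts_4 & ends_above_6)
print(array[mask])
```

The selected values (`array[mask]`) can then be passed as `indexed` to `plot_fancy_index`.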

## More array creation

Equally spaced arrays are often used when plotting. Look at the `linspace` function and create 21 equally spaced points between 0 and 10.

https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html
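A minimal sketch: `linspace` includes both endpoints by default, so 21 points give 20 intervals of 10 / 20 = 0.5.

```python
import numpy as np

points = np.linspace(0, 10, 21)  # 0.0, 0.5, 1.0, ..., 10.0
print(points)
```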

Use the `logspace` function to create 90 equally spaced points (in log space) between 1e-6 and 1e3 excluding the endpoint, and reshape the array so it has each decade in a row.

https://docs.scipy.org/doc/numpy/reference/generated/numpy.logspace.html#numpy.logspace
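A sketch of one approach. `logspace` takes the exponents (base 10 by default), so the limits are `-6` and `3` rather than `1e-6` and `1e3`. With `endpoint=False`, 90 points over 9 decades give an exponent step of 0.1, i.e. 10 points per decade.

```python
import numpy as np

# Exponents from -6 up to, but excluding, 3
points = np.logspace(-6, 3, 90, endpoint=False)

# 9 decades x 10 points each; row i starts at 10**(-6 + i)
decades = points.reshape((9, 10))
print(decades[:, 0])
```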

## Extra reading (optional)

NumPy has extra modules for computing discrete Fourier transforms and generating pseudo-random numbers. The documentation is linked below in case you wish to explore these further.

- https://docs.scipy.org/doc/numpy/reference/routines.fft.html
- https://docs.scipy.org/doc/numpy/reference/routines.random.html