# NumPy
-------------------------------------------------------------------

Kushal Keshavamurthy Raviprakash

kushalkr2992@gmail.com

This notebook is a part of the [Python for Earth and Atmospheric Sciences](https://github.com/Kushalkr/Python_for_Earth_and_Atmospheric_Sciences) workshop.

Disclaimer: Most of the material in this notebook is derived from [Lectures on Scientific Computing with Python](http://github.com/jrjohansson/scientific-python-lectures) by Robert Johansson.

In [None]:
%matplotlib inline 
import matplotlib.pyplot as plt # Don't worry about these two lines;
                                # they will be explained in a later lecture.

## Introduction
-------------------------------------------------------------------

**NumPy** stands for **N**umerical **Py**thon. <img height="100" src="images/numpy.png" style="float: right" />

It is part of what is called the **Sci**entific **P**ython stack, or **SciPy** stack for short. The SciPy stack is a collection of packages, such as NumPy, SciPy, matplotlib, SymPy and pandas, that work well together and are used extensively in the scientific community.

The `numpy` package (module) is an extension module for Python and is used in almost all numerical computation done with Python.

The `numpy` package provides vector, matrix and higher-dimensional data structures that are optimized for performance.

As mentioned in the last lecture, `numpy` is a package. It is not part of the standard Python installation and has to be imported.

We can import the `numpy` module using any of the methods mentioned below:

```py
import numpy
```
Functions are then called with the syntax `numpy.array()`, etc.
```py
from numpy import *
```
No dot (`.`) notation is needed in this case; for example, `array()` suffices.
```py
import numpy as np
```
The first form is the safest, but typing `numpy.` every time is tedious. The third form is a common compromise: functions are called with the shorter syntax `np.array()`.

The first and third forms are the preferred ways of importing the `numpy` module (or, for that matter, any module) because they leave no room for name conflicts.

For example, if you import both the `math` and the `numpy` modules using the second form, there is a conflict over which definition of the sine function (`sin`) is used, since both modules define one. Whichever module is imported last silently replaces the other's definition.
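A small sketch of this shadowing. To keep it short it imports the single name `sin` explicitly from each module; a star import does the same thing for every public name at once. The second part shows why it matters: `math.sin` only accepts a single number, while `np.sin` works on whole arrays.

```python
import math
import numpy as np

# Importing the same name from two modules: the second import silently
# replaces the first, which is what happens with two star imports as well
from math import sin
from numpy import sin

print(sin is np.sin)    # True: numpy's sin shadowed math's sin
print(sin is math.sin)  # False

# The namespaced forms leave no doubt about which function runs
angles = np.array([0.0, np.pi / 2])
print(np.sin(angles))   # numpy's sin works elementwise on an array
try:
    math.sin(angles)    # math.sin only accepts a single number
except TypeError as err:
    print("math.sin failed:", err)
```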

Let's jump into working with the methods and structures in the `numpy` module. 

Before doing anything, we need to import the `numpy` module.

In [None]:
import numpy as np # importing the numpy module
np.set_printoptions(precision=3) # print floating point numbers with 3 digits after the decimal point

Vectors, matrices and multi-dimensional data are all called **arrays** in NumPy.

In [None]:
np.__version__

## Creating NumPy arrays
-------------------------------------------------------------------

You can create numpy arrays in a number of ways. Some of the methods are:
* By converting lists and tuples
* Using array creation functions defined in the `numpy` module. Ex.: `np.linspace()`, `np.arange()`, etc.
* Reading data from files and then converting it to array form.

### Converting lists and tuples

To create a vector or matrix, use the `np.array()` function with the list as the argument.

In [None]:
v = np.array([1, 2, 3, 4, 5])

v

A matrix is created by providing a nested list or tuple as the argument.

In [None]:
M = np.array([[1, 2],[3, 4]])
M
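The examples above use lists only. As a sketch, a tuple works the same way, NumPy picks a common element type (`dtype`) when the input mixes integers and floats, and the `dtype` argument (not used elsewhere in this notebook) requests a specific type.

```python
import numpy as np

t = np.array((1.0, 2.0, 3.0))                   # a tuple works just like a list
mixed = np.array([1, 2.5, 3])                   # ints and floats: everything becomes float64
c = np.array([[1, 2], [3, 4]], dtype=complex)   # the element type can also be requested

print(t.dtype, mixed.dtype, c.dtype)            # float64 float64 complex128
```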

The vector `v` and the matrix `M` both have the same type: `ndarray`.

In [None]:
type(v), type(M)

We can see the **size** (number of elements in the entire array) and **shape** (number of elements along each dimension) of an array using its `size` and `shape` attributes, respectively.

In [None]:
print(v.size, M.size)

In [None]:
print(v.shape, M.shape)
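The same attributes work for arrays of any dimension. This sketch uses a 3-D array and additionally shows the `ndim` attribute and the `reshape()` method, which are not covered elsewhere in this notebook.

```python
import numpy as np

T = np.zeros((2, 3, 4))       # a 3-D array: 2 blocks, each with 3 rows and 4 columns
print(T.shape)                # (2, 3, 4)
print(T.ndim)                 # 3, the number of dimensions (len(T.shape))
print(T.size)                 # 24, the product of the shape entries
print(T.reshape(4, 6).shape)  # (4, 6): the same 24 elements in a new shape
```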

The arrays we created look very similar to lists. Why don't we use lists directly?

The main reason is that lists are very general: because of dynamic typing, a single list can contain elements of any type, so every operation on a list has to check the type of each element as it goes. As a result, implementing efficient numerical functions for lists is difficult.

NumPy arrays are statically typed, meaning all the elements of an array have the same type (the array's `dtype`), fixed when the array is created. This allows us to implement optimized functions in compiled languages such as C and Fortran.
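A quick illustration of the difference: a list keeps whatever objects it is given, while an array converts everything to its single `dtype`. The exact default integer type depends on the platform, so the sketch only checks that it is an integer type.

```python
import numpy as np

mixed_list = [1, "two", 3.0]                 # a list can hold an int, a str and a float
print([type(item).__name__ for item in mixed_list])  # ['int', 'str', 'float']

arr = np.array([1, 2, 3])                    # one dtype shared by every element
print(arr.dtype)                             # an integer type, typically int64
arr[0] = 9.7                                 # the float is converted to the array's dtype
print(arr)                                   # [9 2 3]: the fractional part is dropped
```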

### Using array generating functions

#### `np.arange()`

In [None]:
a = np.arange(0, 1, 0.1) # arguments: start, stop (excluded), step
# Similar to the built-in range() function, except that it also accepts floats.
print(a)
a = np.arange(0, 100, 10)
print(a)

#### `np.linspace()` and `np.logspace()`

In [None]:
b = np.linspace(0, 1, 10) # arguments: start, stop (included), number of points
print(b)
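The two functions differ in what you specify: `np.arange()` takes a step and excludes the stop value, while `np.linspace()` takes a number of points and includes the stop value by default. Note that `np.linspace(0, 1, 10)` above therefore has a spacing of 1/9, not 0.1. A sketch comparing them:

```python
import numpy as np

stepped = np.arange(0, 1, 0.1)     # fixed step; the stop value 1 is excluded
counted = np.linspace(0, 1, 11)    # fixed number of points; the stop value 1 is included

print(len(stepped), len(counted))  # 10 11
print(counted[1] - counted[0])     # the spacing is (stop - start) / (num - 1)

# endpoint=False leaves out the stop value and gives the same grid as arange
print(np.allclose(np.linspace(0, 1, 10, endpoint=False), stepped))  # True
```

When the step is a float, `np.linspace()` is usually the safer choice, because the number of points is stated explicitly instead of being derived from a floating point division.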

In [None]:
b = np.logspace(0, 10, 11, base=10) # 11 points from 10**0 to 10**10
b

#### `np.mgrid[]` (similar to `meshgrid()`/`ndgrid()` in MATLAB)

In [None]:
x, y = np.mgrid[0:5, 0:5] # x holds each element's row index, y its column index

In [None]:
x

In [None]:
y
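`np.mgrid` is indexed with square brackets rather than called. A sketch relating it to `np.meshgrid()` (not used elsewhere in this notebook): `np.mgrid` uses matrix (`"ij"`) indexing, whereas the default `"xy"` indexing of `np.meshgrid()` matches MATLAB's `meshgrid()` and gives the transposed grids.

```python
import numpy as np

x, y = np.mgrid[0:5, 0:5]              # integer slices behave like range(0, 5) on each axis
xm, ym = np.meshgrid(np.arange(5), np.arange(5), indexing="ij")
print(np.array_equal(x, xm), np.array_equal(y, ym))  # True True

# A complex step such as 11j means "this many points, stop included", like linspace
xf, yf = np.mgrid[0:1:11j, 0:1:11j]
print(xf.shape)                        # (11, 11)

# The default indexing="xy" of np.meshgrid gives the transpose of the mgrid result
xx, yy = np.meshgrid(np.arange(5), np.arange(5))
print(np.array_equal(xx, x.T))         # True
```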

#### `np.diag()`

In [None]:
np.diag([1, 2, 3])

#### `np.zeros()` and `np.ones()`

In [None]:
np.zeros((3,3))

In [None]:
np.ones((5,5))
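Both functions return floats by default. As a sketch, the `dtype` argument changes that, and the related functions `np.eye()` (identity matrix) and `np.zeros_like()` (zeros shaped like an existing array), which are not used elsewhere in this notebook, cover two other common cases.

```python
import numpy as np

print(np.zeros((2, 3)).dtype)                 # float64 is the default element type
print(np.ones((2, 3), dtype=int))             # the dtype argument asks for integers instead

I = np.eye(3)                                 # 3x3 identity matrix (float64)
print(np.array_equal(I, np.diag([1, 1, 1])))  # True: same values as the np.diag version

Z = np.zeros_like(I)                          # zeros with the same shape and dtype as I
print(Z.shape, Z.dtype)                       # (3, 3) float64
```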

### Reading from files

We will read in data from the file `sample.txt` and put the data into an array.

In [None]:
data = np.loadtxt('data/sample.txt', delimiter=',')

In [None]:
print(data.dtype)
print(data.shape)
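The contents of `data/sample.txt` are not shown here, so this sketch uses a small, made-up two-column text held in memory with `io.StringIO`; `np.loadtxt()` accepts such file-like objects as well as file names. It also shows `np.savetxt()`, the matching function for writing an array out as text.

```python
import io
import numpy as np

# Hypothetical stand-in for data/sample.txt: two comma-separated columns of numbers
text = io.StringIO("0.0,1.0\n0.5,0.8\n1.0,0.0\n")
arr = np.loadtxt(text, delimiter=",")
print(arr.dtype, arr.shape)   # float64 (3, 2)
print(arr[:, 0])              # the first column

# np.savetxt writes an array back out as delimited text
out = io.StringIO()
np.savetxt(out, arr, delimiter=",", fmt="%.2f")
print(out.getvalue())
```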

Let me show you what the data looks like. Don't be daunted by the unfamiliar plotting code; you will understand it in the next lecture.

In [None]:
fig, ax = plt.subplots(figsize=(6,6))
ax.plot(data[:,0], data[:,1])
ax.axis('equal')

## Manipulating NumPy arrays
-------------------------------------------------------------------

### Indexing

Indexing of elements in an array is done by providing indices within square brackets.

In [None]:
# Definition of the vector a.
a = np.linspace(0,20, 21, endpoint=True)
a

In [None]:
# Definition of the matrix M.
M = np.random.randint(0, 100, size=(5, 5)).astype(np.float64) # random integers in [0, 100), converted to floats
print(M)
print(M.dtype)

In [None]:
a[4] # a is a vector and therefore has only one dimension

In [None]:
M[1,0] # M is a matrix (2-dimensional array), so it requires 2 indices

Omitting the indices altogether gives the entire array.

In [None]:
M

Omitting one of the indices gives a whole slice along the remaining *axis*: a single index on a matrix returns an entire row.

In [None]:
M[1] # Single index like so will give the second row of M

The same can be achieved by using `:` in place of an index, which selects every element along that axis. This is more flexible, since it also lets you select a column.

In [None]:
M[:,1] # second column of M

In [None]:
M[1,:] #2nd row of M

We can modify the value of an array through assignment.

In [None]:
M[0,0] = 0. # Assign 0. to the element in the first row and first column.
M

This also works for rows and columns.

In [None]:
M[:,1] = -100. # Assign -100. to all rows in the second column
M

Slices can also take a step (stride), using the syntax `start:stop:step`.

In [None]:
M[::2, ::2] # every second row and every second column
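A few more slicing forms on a 1-D array, as a sketch. The second half shows a point worth knowing before assigning through slices: a basic slice is a *view* that shares memory with the original array, so writing to it changes the original, while `.copy()` gives an independent array.

```python
import numpy as np

A = np.arange(10)
print(A[2:8:2])    # [2 4 6]: start 2, stop 8 (excluded), step 2
print(A[::-1])     # a negative step reverses the array
print(A[-3:])      # [7 8 9]: negative indices count from the end

# A basic slice is a view: writing to it changes the original array
s = A[:3]
s[:] = -1
print(A[:4])       # [-1 -1 -1  3]

# .copy() makes an independent array
c = A[:3].copy()
c[:] = 0
print(A[:3])       # still [-1 -1 -1]
```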

### Fancy indexing

Fancy indexing is the name used when a list or an array is used in place of an index value to select elements of an array.

In [None]:
row_indices = [1, 2, 4] # row numbers
M[row_indices] 

In [None]:
col_indices = [2, 4, -1] # column numbers
M[row_indices, col_indices] # the lists are paired up: M[1,2], M[2,4], M[4,-1]
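Since `M` is random, here is a sketch on a deterministic 5x5 matrix with the same index lists. Two index lists are paired element by element; to get every row/column combination (a sub-matrix) instead, `np.ix_()` (not used elsewhere in this notebook) builds the right index arrays. Unlike a basic slice, fancy indexing returns a copy.

```python
import numpy as np

M = np.arange(25).reshape(5, 5)   # deterministic 5x5 matrix for illustration
rows = [1, 2, 4]
cols = [2, 4, -1]                 # -1 is the last column, i.e. the same as 4 here

print(M[rows, cols])              # paired: M[1,2], M[2,4], M[4,-1]
print(M[np.ix_(rows, cols)])      # every combination: a 3x3 sub-matrix

# Fancy indexing returns a copy, so modifying the result leaves M untouched
sub = M[rows]
sub[:] = 0
print(M[1])                       # row 1 of M is unchanged
```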

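You can also mask out certain values using booleans: indexing with a boolean list or array of the same length keeps only the elements where the mask is `True`.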

In [None]:
b = np.array([x for x in range(5)]) # This is called a list comprehension which is then converted to an array
b

In [None]:
row_mask = [False, True, True, False, False] # a list of booleans masking unwanted elements
b[row_mask]

In [None]:
row_mask = np.array([0,1,1,0,0], dtype=bool) # another way of defining a mask
b[row_mask]

You can use masks to conditionally select elements.

In [None]:
mask = M < 0 # Operates on individual elements and returns True or False for each element.
print(mask,"\n\n", M) # \n is an escape sequence for newline.

In [None]:
M[mask]
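Conditions can be combined elementwise with `&` (and), `|` (or) and `~` (not). Python's `and`/`or` keywords do not work elementwise on arrays, and each comparison needs its own parentheses because `&` binds more tightly than `<`. A mask can also be used on the left of an assignment. A sketch:

```python
import numpy as np

x = np.arange(10)
in_range = (x > 2) & (x < 7)   # both conditions must hold
print(x[in_range])             # [3 4 5 6]
print(x[~in_range])            # [0 1 2 7 8 9]

y = x.astype(float)
y[y > 6] = np.nan              # assign to the masked elements only
print(np.count_nonzero(np.isnan(y)))  # 3
```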

### Functions for finding position indices and extracting data

#### `np.where()`

In [None]:
indices = np.where(mask)
indices

In [None]:
M[indices] # fancy indexing with the tuple (row indices, column indices) returned by np.where
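A sketch on a small fixed matrix, showing what `np.where()` returns (one index array per dimension), the related `np.argwhere()` that lists the same positions as (row, column) pairs, and the three-argument form of `np.where()`, which chooses elementwise between two values.

```python
import numpy as np

M = np.array([[ 3., -1.],
              [-2.,  5.]])
rows, cols = np.where(M < 0)       # one index array per dimension
print(rows, cols)                  # [0 1] [1 0]
print(np.argwhere(M < 0))          # the same positions as (row, col) pairs

# With three arguments np.where picks elementwise: 0.0 where the condition holds, M elsewhere
clipped = np.where(M < 0, 0.0, M)
print(clipped)
```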

#### `np.diag()`

Earlier, the `np.diag()` function was used to create diagonal matrices. The same function can also retrieve the main diagonal and the off-diagonals of a matrix.

In [None]:
np.diag(M)

In [None]:
np.diag(M,-1) # 1st sub-diagonal below the main diagonal

In [None]:
np.diag(M,1) # 1st super-diagonal, just above the main diagonal
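With a deterministic matrix the offsets are easy to check. As a sketch: given a 2-D array, `np.diag()` extracts a diagonal; given a 1-D array, the same offset argument `k` places the values on that diagonal. `np.trace()` (not used elsewhere in this notebook) is the sum of the main diagonal.

```python
import numpy as np

A = np.arange(9).reshape(3, 3)   # [[0 1 2], [3 4 5], [6 7 8]]
print(np.diag(A))                # [0 4 8]  main diagonal
print(np.diag(A, -1))            # [3 7]    just below it
print(np.diag(A, 1))             # [1 5]    just above it

# From a 1-D input, k=1 builds a 3x3 matrix with 1 and 2 just above the main diagonal
print(np.diag([1, 2], k=1))
print(np.trace(A) == np.diag(A).sum())  # True
```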

There are many other functions such as `np.take()` and `np.choose()` that have not been discussed here.

## Linear Algebra
-------------------------------------------------------------------

If you are going to be dealing with numerical calculations, it is better to vectorize your code, since vectorization allows efficient computation and faster performance. Vectorizing means expressing a computation as operations on whole arrays (elementwise arithmetic, matrix-vector multiplication, matrix-matrix multiplication, etc.) instead of explicit Python loops over individual elements.
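A minimal sketch of the idea: the same sum of squares written as a Python loop and as a single vectorized call. The helper function names are made up for this illustration, and the timings printed by `timeit` depend on the machine, so no numbers are quoted; on typical setups the vectorized version is much faster.

```python
import timeit
import numpy as np

x = np.random.rand(100_000)

def sum_of_squares_loop(values):
    """Explicit Python loop: one interpreted iteration per element."""
    total = 0.0
    for value in values:
        total += value * value
    return total

def sum_of_squares_vectorized(values):
    """Vectorized: the loop runs inside NumPy's compiled code."""
    return np.dot(values, values)

print(np.isclose(sum_of_squares_loop(x), sum_of_squares_vectorized(x)))  # True
print("loop:      ", timeit.timeit(lambda: sum_of_squares_loop(x), number=5))
print("vectorized:", timeit.timeit(lambda: sum_of_squares_vectorized(x), number=5))
```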

### Scalar-Array operations

Basic arithmetic operators can be used to add, subtract, multiply and divide arrays by scalars.

In [None]:
v = np.arange(5) # [0, 1, 2, 3, 4]

For `ndarray` objects, these operations are performed elementwise.

In [None]:
v * 2

In [None]:
v + 2

Let's create a matrix `A`.

In [None]:
A = np.array([[ i + j*5 for i in range(5)] for j in range(5)]) # List comprehension way of creating matrix A
A

In [None]:
A + 1

In [None]:
A * 2
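Arithmetic between two arrays is elementwise as well, which is easy to confuse with matrix multiplication. A sketch: `A * A` squares each entry, and multiplying a matrix by a vector whose length matches the number of columns scales each column (NumPy *broadcasts* the vector across the rows).

```python
import numpy as np

A = np.array([[i + j * 5 for i in range(5)] for j in range(5)])
v = np.arange(5)

print(A * A)      # elementwise: each entry squared, NOT the matrix product
print(A * v)      # v is broadcast across the rows: column k is multiplied by v[k]
print(np.array_equal(A * A, A ** 2))  # True
```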

### Matrix Algebra

Matrix multiplication is something that is done very efficiently with `numpy`. There are two ways to compute matrix-matrix, matrix-vector and vector-vector (inner) products.

#### 1. Using the `np.dot()` function

In [None]:
np.dot(A,A)

In [None]:
np.dot(A,v)

In [None]:
np.dot(v,v)

#### 2. Casting array objects to the `matrix` type (no longer recommended by NumPy)

In [None]:
M = np.matrix(A)
v = np.matrix(v).T

In [None]:
v

In [None]:
M

In [None]:
M * M

In [None]:
M * v

In [None]:
v + M * v
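The NumPy documentation states that the `matrix` class is no longer recommended, even for linear algebra, and may be removed in the future. Since Python 3.5, the `@` operator performs matrix multiplication on ordinary arrays. A sketch showing that it gives the same results as `np.dot()` and the `matrix` approach (creating `np.matrix` objects may print a deprecation warning):

```python
import numpy as np

A = np.array([[i + j * 5 for i in range(5)] for j in range(5)])
v = np.arange(5)

print(np.array_equal(A @ A, np.dot(A, A)))                              # True
print(np.array_equal(A @ A, np.asarray(np.matrix(A) * np.matrix(A))))   # True
print(A @ v)      # matrix-vector product, returned as a 1-D array
print(v @ v)      # inner product: 0 + 1 + 4 + 9 + 16 = 30
```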

## Basic Data Processing
-------------------------------------------------------------------

Let us see some basic statistical functions that we can use to perform our data analysis.

In [None]:
data = np.random.randn(1000,1000) # standard normally distributed data: 1000 rows and 1000 columns
data.size

In [None]:
np.max(data) # largest value in the data

In [None]:
np.min(data) # smallest value in the data

In [None]:
np.mean(data) # Mean value of the given sample

In [None]:
np.std(data) # Standard deviation of the sample.

In [None]:
np.var(data) # Variance of the sample

In [None]:
np.sum(data, axis=1) # sum along each row, giving one value per row (1000 values)
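The `axis` argument is easier to see on a small array: it names the axis that is collapsed. As a sketch, the same statistics also exist as array methods (e.g. `X.mean()`), and `np.std()`/`np.var()` divide by N by default; the `ddof=1` argument (not used above) divides by N - 1 instead, giving the usual sample estimate.

```python
import numpy as np

X = np.array([[1., 2., 3.],
              [4., 5., 6.]])

print(np.sum(X))            # 21.0: every element
print(np.sum(X, axis=0))    # [5. 7. 9.]: collapses the rows, one value per column
print(np.sum(X, axis=1))    # [ 6. 15.]: collapses the columns, one value per row
print(X.mean(axis=1))       # [2. 5.]: the method form of np.mean

# Variance of [1, 2, 3]: divide by N (default) versus N - 1 (ddof=1)
print(np.var(X[0]), np.var(X[0], ddof=1))  # 2/3 and 1.0
```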

## Further Reading
-------------------------------------------------------------------

* [Official NumPy Tutorial](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)
* [SciPy Lecture Notes](http://www.scipy-lectures.org/intro/numpy/index.html)
* [Scientific Python Lectures - J R Johansson](http://nbviewer.jupyter.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-2-Numpy.ipynb)
* [Scientific Python Book](https://hplgit.github.io/primer.html/doc/pub/half/book.pdf) (For Advanced users)