<center>
<table>
  <tr>
    <td><img src="http://www.nasa.gov/sites/all/themes/custom/nasatwo/images/nasa-logo.svg" width="100"/> </td>
     <td><img src="https://github.com/astg606/py_materials/blob/master/logos/ASTG_logo.png?raw=true" width="80"/> </td>
     <td> <img src="https://www.nccs.nasa.gov/sites/default/files/NCCS_Logo_0.png" width="130"/> </td>
    </tr>
</table>
</center>

        
<center>
<h1><font color= "blue" size="+3">ASTG Python Courses</font></h1>
</center>

---

<CENTER>
<H1>
    <font color="red">Introduction to Numpy</font>
</H1>
</CENTER>

## <font color='red'> Useful References </font>

- <a href="https://numpy.org/devdocs/user/quickstart.html">Numpy Quick Tutorial</a>
- <a href="https://www.python-course.eu/numpy.php">Numpy Tutorial</a>
- <a href="https://nbviewer.jupyter.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-2-Numpy.ipynb">Numpy - multidimensional data arrays</a>
- <a href="http://mathesaurus.sourceforge.net/idl-numpy.html"> Numpy for IDL users</a>
- <a href="https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html">NumPy for Matlab users</a>
- <a href="http://mathesaurus.sourceforge.net/r-numpy.html">Numpy for R users</a>
- <a href="https://www.machinelearningplus.com/python/101-numpy-exercises-python/">101 NumPy Exercises for Data Analysis (Python)</a>
- [Python NumPy Tutorial: An Applied Introduction for Beginners](https://www.learndatasci.com/tutorials/applied-introduction-to-numpy-python-tutorial/)
- Harris, C.R., Millman, K.J., van der Walt, S.J. et al., <a href="https://www.nature.com/articles/s41586-020-2649-2">Array programming with NumPy</a>, Nature **585**, 357–362 (2020). https://doi.org/10.1038/s41586-020-2649-2

## <font color='red'> What is Numpy?</font>

- NumPy is an open-source Python library that provides a multidimensional array object along with array-aware functions that operate on it.
- Efficient array computing in Python: operates on in-memory arrays. 
- The critical thing to know is that Python-level `for` loops are slow compared with NumPy's compiled array operations, so use array operations as much as possible.
- It is implemented in C and Fortran so when calculations are vectorized (formulated with vectors and matrices), performance is very good.

**NumPy Array Concepts**

- Data structure and its associated metadata fields.
- Indexing an array with slices and steps. 
- Indexing an array with masks, scalar coordinates or other arrays, so that it returns a ‘copy’ of the original data. 
- Vectorization (process of performing the same operation in the same way for each element in an array) efficiently applies operations to groups of elements.
- Broadcasting in the multiplication of two-dimensional arrays. 
- Reduction operations act along one or more axes. 
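A minimal sketch of several of these concepts on one small array. The array `a` and the values used are illustrative choices, not taken from the paper.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# Slicing with a step: every other column (a view, no copy)
every_other = a[:, ::2]          # [[0, 2], [3, 5]]

# Boolean mask: returns a copy of the selected elements
evens = a[a % 2 == 0]            # [0, 2, 4]

# Vectorization: one expression applied to every element, no Python loop
squares = a ** 2                 # [[0, 1, 4], [9, 16, 25]]

# Broadcasting: a (2, 3) array combined with a (3,) array;
# the row is reused for each row of `a`
row = np.array([10, 20, 30])
shifted = a + row                # [[10, 21, 32], [13, 24, 35]]

# Reductions along an axis
col_sums = a.sum(axis=0)         # [3, 5, 7]
row_sums = a.sum(axis=1)         # [3, 12]

print(every_other, evens, squares, shifted, col_sums, row_sums, sep="\n")
```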

**NumPy: Foundation of the Scientific Python Ecosystem**

- NumPy provides a foundation on which other data science packages are built.
- NumPy underpins almost every Python library that does scientific or numerical computation, including SciPy, Matplotlib, Pandas, Scikit-Learn and Scikit-Image.

![numpy](https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41586-020-2649-2/MediaObjects/41586_2020_2649_Fig2_HTML.png?as=webp)
Image Source: [NumPy is the base of the scientific Python ecosystem](https://www.nature.com/articles/s41586-020-2649-2/figures/2)


![use_numpy](https://i0.wp.com/techvidvan.com/tutorials/wp-content/uploads/sites/2/2020/07/Uses-of-NumPy-1.jpg?ssl=1)
Image Source: techvidvan.com

## <font color='red'> Making Numpy Arrays</font>

- A NumPy array is a data structure that efficiently stores and accesses multidimensional arrays.
- Each NumPy Array object has 2 components:
   1. The raw array data (data buffer) stored in a single contiguous (continuous) block of memory.
   2. Metadata used to interpret the data stored there, notably the `data type` (integer, float, etc.), the `shape` (number of dimensions and the size of each dimension), the `start` of the data within the data buffer, the `strides` (separation in bytes between elements along each dimension), the `byte order` of the data (which may not be the native byte order), the `size in bytes of each basic data element`, and the `array ordering` (C-order or Fortran-order).
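Most of these metadata fields are exposed as array attributes. A small sketch, using an `int32` array (an arbitrary choice) so that the strides are easy to check by hand:

```python
import numpy as np

a = np.arange(12, dtype=np.int32).reshape(3, 4)

print(a.dtype)                  # int32: how each element is interpreted
print(a.shape)                  # (3, 4)
print(a.itemsize)               # 4 bytes per element
print(a.strides)                # (16, 4): bytes to step to the next row / column
print(a.flags['C_CONTIGUOUS'])  # True: row-major (C-order) layout

# The transpose reuses the same buffer with swapped shape and strides
t = a.T
print(t.strides)                # (4, 16)
print(t.flags['F_CONTIGUOUS'])  # True: the same buffer, seen in Fortran order
```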


![fig_array](https://i.stack.imgur.com/EeBUb.png)
Image Source: [https://i.stack.imgur.com/EeBUb.png](https://i.stack.imgur.com/EeBUb.png)

This arrangement allows for very flexible use of arrays. For instance:
- We can modify the metadata to change the interpretation of the array buffer.
- Changing the byteorder of the array is a simple change involving no rearrangement of the data.
- The shape of the array can be changed very easily without changing anything in the data buffer or any data copying at all.
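A minimal sketch of the first and third points: reshaping and reinterpreting the byte order both produce new array objects over the same buffer. `np.shares_memory` is used only to confirm that no copy was made.

```python
import numpy as np

a = np.arange(6, dtype=np.int16)

# Reshaping returns a view: new shape metadata, same data buffer
b = a.reshape(2, 3)
b[0, 0] = 99
print(a[0])                      # 99: both names see the same memory

# Reinterpret the same bytes with the opposite byte order (no data copied)
swapped = a.view(a.dtype.newbyteorder())
print(swapped[1])                # 256: the two bytes of 1 (0x0001), read swapped
print(np.shares_memory(a, swapped))  # True
```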

**The fact that items are stored contiguously in memory allows NumPy to take advantage of vectorized instructions of modern CPUs. For example, multiple consecutive floating-point numbers can be loaded into 128-, 256-, or 512-bit registers for vectorized arithmetic computations implemented as CPU instructions.**

![fig_array2](https://ipython-books.github.io/pages/chapter04_optimization/images/layout.png)
Image Source:  IPython Cookbook, Second Edition, by Cyrille Rossant

First, we import the appropriate modules into our namespace.

In [1]:
import numpy as np

## <font color='red'> Creating Numpy Arrays from Lists</font>

In [None]:
my_list = [1, 2, 3, 5]
print(type(my_list))
print(my_list)

In [None]:
np_array = np.array(my_list)
print(type(np_array))
print(np_array)

Elements of a one-dimensional array are accessed with the same syntax as a list:

In [None]:
my_list[0]

In [None]:
np_array[0]

In [None]:
np_array[2:]

---

### <font color="blue"> Exercise</font>
How do you access the final element in the `np_array` array?

<p>

<details><summary><b>Click here to access the solution</b></summary>
<p>


```python
np_array[-1]
```

</p>
</details>

---

`numpy.ndarray`:

- Describes the collection of items of the same type. 
    - Items in the collection can be accessed using a zero-based index.
- Every item in an `ndarray` occupies a memory block of the same size. 
- Each element in an `ndarray` is interpreted according to the array's data-type object (called `dtype`).
- A single item extracted from an `ndarray` (by indexing) is represented by a Python object of one of the array scalar types.

```python
numpy.array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0)
```

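| PARAMETER | DESCRIPTION |
| ---: | :--- |
| `object` | Any array-like input: a list, tuple, nested sequence, another array, or an object exposing the array interface. A `dict` or `set` is not unpacked; it becomes a 0-d array of dtype `object`. |
| `dtype` | Data type of the array elements. If `None` (default), the type is inferred from the data. |
| `copy` | If `True` (default), the data is always copied. In NumPy 2.x, `copy=None` copies only when needed and `copy=False` raises an error if a copy would be required. |
| `order` | Memory layout of the result: `'C'` (row-major), `'F'` (column-major), `'A'` (Fortran order if the input is Fortran-contiguous, otherwise C order), or `'K'` (keep the input's layout as closely as possible; the default). |
| `subok` | If `False` (default), the result is always a base-class `ndarray`; if `True`, subclasses of `ndarray` are passed through. |
| `ndmin` | Minimum number of dimensions of the result; ones are prepended to the shape as needed. |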
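A short sketch of some of these parameters in use; the input values are arbitrary:

```python
import numpy as np

# dtype: force the element type instead of inferring it
f = np.array([1, 2, 3], dtype=np.float32)
print(f.dtype)                     # float32

# ndmin: ones are prepended to the shape as needed
r = np.array([1, 2, 3], ndmin=2)
print(r.shape)                     # (1, 3)

# order: request a column-major (Fortran-order) layout
fo = np.array([[1, 2], [3, 4]], order='F')
print(fo.flags['F_CONTIGUOUS'])    # True

# copy=True (default): the new array does not share memory with the input
src = np.array([1, 2, 3])
dup = np.array(src)
dup[0] = 100
print(src[0])                      # 1

# np.asarray avoids the copy when the input is already a suitable array
print(np.asarray(src) is src)      # True

# A set is not unpacked: it becomes a 0-d array of dtype object
s = np.array({1, 2})
print(s.shape, s.dtype)            # () object
```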


### <font color='blue'> Difference between List and Array </font>

We can change the last element of our list:

In [None]:
my_list[-1] ='adding a string'
print(my_list)

But the same cannot be done with an integer array: NumPy tries to convert the string to the array's `dtype` and raises an error:

In [None]:
np_array[-1] ='adding a string'

Create a 2d array from a list of lists:

In [None]:
my_list = [[0,1,2], [3,4,5], [6,7,8]]
my_arr2d = np.array(my_list)
print(my_arr2d)

## <font color="red">Data Types in Numpy</font>

- You may specify the data type by setting the `dtype` argument. 
- Some of the most commonly used numpy dtypes are: `float`, `int`, `bool`, `str` and `object`.
- To control the memory allocations you may choose to use one of `float32`, `float64`, `int8`, `int16` or `int32`.

Here are some of the scalar data types.

| Data type | Description |
| :--- | :--- |
| `bool_` | Boolean (`True` or `False`) |
| `intc` | Same as C `int` |
| `intp` | Integer used for indexing (same size as a pointer) |
| `int8` | Byte (-128 to 127) |
| `int16` | Integer (-32768 to 32767) |
| `int32` | Integer (-2147483648 to 2147483647) |
| `int64` | Integer (-9223372036854775808 to 9223372036854775807) |
| `uint8` | Unsigned integer (0 to 255) |
| `uint16` | Unsigned integer (0 to 65535) |
| `uint32` | Unsigned integer (0 to 4294967295) |
| `uint64` | Unsigned integer (0 to 18446744073709551615) |
| `float16` | Half-precision float |
| `float32` | Single-precision float |
| `float64` | Double-precision float |
| `complex64` | Complex number represented by two 32-bit floats |
| `complex128` | Complex number represented by two 64-bit floats |
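The ranges in this table can be checked with `np.iinfo` (integer types) and `np.finfo` (floating-point types). A quick sketch:

```python
import numpy as np

# Print the representable range of a few integer types
for t in (np.int8, np.int16, np.uint8, np.uint16):
    info = np.iinfo(t)
    print(f"{info.dtype}: {info.min} to {info.max}")

print(np.finfo(np.float32).bits)        # 32
print(np.dtype(np.complex64).itemsize)  # 8 bytes: two 32-bit floats
```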

Create a `float` 2d array:

In [None]:
my_arr2d_f = np.array(my_list, dtype='float')
print(my_arr2d_f)
print(f"Type: {my_arr2d_f.dtype}")

Convert to `int` datatype:

In [None]:
a_i = my_arr2d_f.astype('int')
print(a_i)
print(f"Type: {a_i.dtype}")

Convert to `int` then to `str` datatype:

In [None]:
a_s = my_arr2d_f.astype('int').astype('str')
print(a_s)
print(f"Type: {a_s.dtype}")

- Unlike the items of a list, all items of a Numpy array must have the same data type. 
- If you are uncertain about what datatype your array will hold or if you want to hold characters and numbers in the same array, you can set the `dtype` as `object`.

Create a `boolean` array:

In [None]:
my_arr1d_b = np.array([1, 0, 10], dtype='bool')
print(my_arr1d_b)
print(f"Type: {my_arr1d_b.dtype}")

Create an `object` array to hold numbers as well as strings:

In [None]:
my_arr1d_obj = np.array([1, 'a'], dtype='object')
print(my_arr1d_obj)
print(f"Type: {my_arr1d_obj.dtype}")

You can always convert an array back to a python list using `tolist()`.

In [None]:
from_array_to_list = my_arr1d_obj.tolist()
print(from_array_to_list)

### <font color="blue">Important </font>
- Numpy arrays are **statically typed** and **homogeneous**. The type of the elements is determined when the array is created.
- Numpy arrays are memory efficient: a Numpy array holding the same numbers occupies much less space than a Python list.
- <font color="red">Unlike a list, a Numpy array cannot be appended to in place; functions such as `np.append` return a new array</font>. 
- In contrast, lists can contain elements of arbitrary type.
- Numpy arrays support vectorized operations, while lists do not.
- Because of the static typing, fast implementations of mathematical operations such as multiplication and addition of Numpy arrays can be written in a compiled language (C and Fortran are used).
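A short sketch of the point about size: `np.append` allocates a new array and copies the data, so appending inside a loop copies the data each time. When the final size is known, preallocating (for example with `np.zeros`) is usually preferable.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)            # allocates a new, larger array and copies the data

print(a)                       # [1 2 3]: the original is unchanged
print(b)                       # [1 2 3 4]
print(np.shares_memory(a, b))  # False
```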

Compared to lists, Numpy arrays are convenient as they have the following three features:

- Less memory requirement
- Faster processing
- Convenience of use for mathematical operations (due to presence of compatible built-in functions).

#### `dtype`

- The information about the type of an array is contained in its `dtype` attribute, which describes the element type, including the size of each item in bytes (`dtype.itemsize`).
- **Once an array has been created, its `dtype` is fixed and it can only store elements of the same type.**

    
In this example the `dtype` is integer, so a floating-point number stored in the array is automatically converted to an integer (the fractional part is dropped):

In [None]:
np_arr = np.array([10, 20, 123123])

In [None]:
np_arr.dtype

In [None]:
np_arr[-1] = 1.234
np_arr

In [None]:
np_arr.dtype

In [None]:
np_arr
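The conversion truncates toward zero rather than rounding, so round explicitly first if that is what you want. A small sketch with arbitrary values:

```python
import numpy as np

np_arr = np.array([10, 20, 30])   # integer dtype
np_arr[0] = 1.99                  # stored as 1: the fraction is truncated
np_arr[1] = -1.99                 # stored as -1: truncation is toward zero
np_arr[2] = np.rint(2.7)          # rounded explicitly before storing: 3

print(np_arr, np_arr.dtype)
```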

Why is a homogeneous data type required for arrays? 
- **Less memory**
- **Speed**

In [None]:
n = 50000
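x = list(range(n))  # Python list
y = np.arange(n)    # Numpy array

Memory:

In [None]:
import sys
# Approximate list footprint: the list's pointer array plus each int object
size_list = sys.getsizeof(x) + sum(sys.getsizeof(e) for e in x)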
size_npArray = y.nbytes
print(f"Size of the list (bytes):        {size_list}")
print(f"Size of the Numpy array (bytes): {size_npArray}")
print(f"Size ratio:                      {size_list/size_npArray}")

Speed:

In [None]:
time_list = %timeit -o [e**2 for e in x]

In [None]:
time_numpy = %timeit -o y**2

In [None]:
print(f"Speedup: {time_list.best/time_numpy.best}")

## <font color='red'>Array Creation from Functions</font>

There are three different ways to create Numpy arrays:

* Conversion from other Python structures like lists (see above)
* Using Numpy functions
* Using special library functions

### <font color="blue">Using Numpy Functions</font>

The function `ones` creates an array filled with ones:

In [None]:
b = np.ones((3,2))
print(b)
print(b.shape)

The function `ones_like` returns an array of ones with the same shape and type as a given array.

In [None]:
bo = np.ones_like(b)
print(bo.shape)
print(bo)

The function `zeros` creates an array filled with zeros.

In [None]:
# integer values
c = np.zeros((1,3), int)
print(c)
print(type(c))
print(c.dtype)

In [None]:
# complex numbers
d = np.zeros(3, complex)
print(d)
print(d.dtype)

The function `zeros_like` returns an array of zeros with the same shape and type as a given array.

In [None]:
bz = np.zeros_like(b)
print(bz.shape)
print(bz)

The `eye` function lets you create a $n \times n$ array with the diagonal 1s and the other entries 0.

In [None]:
a = np.eye(5)
print(a)

The `empty` function creates an array without initializing its entries, so its initial content is arbitrary and depends on the state of the memory.

In [None]:
a = np.empty((2,3))
print(a)

The function `empty_like` returns a new, uninitialized array with the same shape and type as a given array.

In [None]:
be = np.empty_like(b)
print(be.shape)
print(be)

The `full` function creates an array of a given shape filled with the given value.

In [None]:
a = np.full((2,2), 3)
print(a)

The function `full_like` returns a full array with the same shape and type as a given array.

In [None]:
bf = np.full_like(b, 7)
print(bf.shape)
print(bf)

The `linspace` function creates linearly-spaced grids, with a fixed number of points and including both ends of the specified interval.

`linspace(a, b, n)` generates `n` uniformly spaced coordinates, starting with `a` and ending with `b`.

In [None]:
x = np.linspace(-5, 5, 11)
print(x)
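Two optional `linspace` parameters are often handy: `retstep`, which also returns the spacing, and `endpoint`, which can exclude the end value. A minimal sketch:

```python
import numpy as np

# retstep=True also returns the spacing between points
pts, step = np.linspace(0, 1, 5, retstep=True)
print(pts)         # 0, 0.25, 0.5, 0.75, 1
print(step)        # 0.25

# endpoint=False excludes the end value, giving a half-open grid
half_open = np.linspace(0, 1, 4, endpoint=False)
print(half_open)   # 0, 0.25, 0.5, 0.75
```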

The function `logspace` returns numbers spaced evenly on a logarithmic scale: the sequence starts at $base^{start}$ and ends at $base^{stop}$, with a default `base` value of 10.

In [None]:
x = np.logspace(0, 2, 11)
print(x)
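In other words, `logspace` raises `base` to linearly spaced exponents. A small sketch checking that equivalence and using a different `base`:

```python
import numpy as np

start, stop, num = 0, 2, 5
a = np.logspace(start, stop, num)
# Same values as raising the base (10 by default) to linearly spaced exponents
b = 10.0 ** np.linspace(start, stop, num)
print(np.allclose(a, b))   # True

# A different base: 2**0, 2**1, 2**2, 2**3
p = np.logspace(0, 3, 4, base=2)
print(p)                   # [1. 2. 4. 8.]
```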

The function `arange` is the Numpy equivalent of `range`.

`arange([start,] stop[, step])`, where `start` defaults to 0, `step` defaults to 1, and `stop` is excluded.

In [None]:
x = np.arange(-5, 5, 1, float)   # upper limit 5 is not included!!
print(x)
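With integer arguments, the number of elements is `ceil((stop - start) / step)`. For non-integer steps, floating-point rounding can make that count hard to predict, which is why the NumPy documentation suggests `linspace` in that case. A small sketch of the integer case:

```python
import math
import numpy as np

print(np.arange(5))          # [0 1 2 3 4]: start defaults to 0
print(np.arange(0, 10, 3))   # [0 3 6 9]: stop is never included

# Number of elements for integer arguments: ceil((stop - start) / step)
start, stop, step = 0, 10, 3
print(len(np.arange(start, stop, step)) == math.ceil((stop - start) / step))  # True
```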

#### <font color='red'> Example</font>: compute the square of a list of numbers

In [None]:
n = int(1e6)

Using a Python loop over `range`:

In [None]:
time_list = %timeit -o for i in range(n): i**2

Looping over an `arange` array element by element:

In [None]:
time_numpy1 = %timeit -o for i in np.arange(n): i**2

Best way of using `arange` (vectorization):

In [None]:
time_numpy2 = %timeit -o np.arange(n) **2

In [None]:
print(f"Speedup-1: {time_list.best/time_numpy1.best}")
print(f"Speedup-2: {time_list.best/time_numpy2.best}")

---

### <font color="blue"> Exercise</font>

Write a function that:
* Takes as arguments two positive integers `n` and `m`
* Returns two Numpy arrays:
   - An array of `n` uniformly spaced elements from 1 to $10^m$.
   - An array of `n` elements logarithmically spaced from 1 to $10^m$.
   
   
<p>
<details><summary><b>Click here to access the solution</b></summary>
<p>


```python
def array_creation(n, m):
    uni_array = np.linspace(1, 10**m, n)
    log_array = np.logspace(0, m, n)
    return uni_array, log_array
```

</p>
</details>

#### Other Useful Functions

| Function | Description |
| :--- | :--- |
| `geomspace()` | Return numbers spaced evenly on a log scale (a geometric progression), given the start and end values |
| `copy()` | Return an array copy of the given object |
| `diag()` | Extract a diagonal or construct a diagonal array |
| `frombuffer()` | Interpret a buffer as a 1-D array |
| `fromfile()` | Construct an array from data in a text or binary file |
| `bmat()` | Build a matrix object from a string, nested sequence, or array |
| `mat()` | Interpret the input as a matrix (removed in NumPy 2.0; use `asmatrix()`) |
| `vander()` | Generate a Vandermonde matrix |
| `triu()` | Upper triangle of an array |
| `tril()` | Lower triangle of an array |
| `tri()` | An array with ones at and below the given diagonal and zeros elsewhere |
| `diagflat()` | Create a two-dimensional array with the flattened input as a diagonal |
| `fromfunction()` | Construct an array by executing a function over each coordinate |
| `meshgrid()` | Return coordinate matrices from coordinate vectors |
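A short sketch of a few of these functions; the inputs are arbitrary small examples:

```python
import numpy as np

# geomspace: endpoints given directly (logspace takes exponents instead)
g = np.geomspace(1, 1000, 4)             # approximately [1, 10, 100, 1000]

# meshgrid: coordinate matrices from two coordinate vectors
X, Y = np.meshgrid(np.array([1, 2, 3]), np.array([10, 20]))
print(X)   # [[1 2 3]
           #  [1 2 3]]
print(Y)   # [[10 10 10]
           #  [20 20 20]]

m = np.arange(1, 10).reshape(3, 3)
print(np.triu(m))             # zeros below the main diagonal
print(np.tril(m))             # zeros above the main diagonal
print(np.diagflat([1, 2]))    # [[1 0]
                              #  [0 2]]
```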

Initializing an array from a Python function: use `fromfunction()`

In [None]:
def my_func(i, j):
    """
      Function that takes as arguments two integers
      and returns a number.
    """
    return (i+1)*(j+4-i)

In [None]:
# Make 3x6 array where a[i,j] = my_func(i,j):
a = np.fromfunction(my_func, (3,6))
print(a)
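Note that `fromfunction` does not call the function once per element: it calls it once, passing arrays of coordinates (one per axis, of floating-point type unless `dtype` is given). The function must therefore work element-wise on arrays, as `my_func` does. A small sketch with a hypothetical `add_coords` function and an integer `dtype`:

```python
import numpy as np

calls = []

def add_coords(i, j):
    calls.append(1)          # record how many times the function is called
    return i + j             # works element-wise on the coordinate arrays

a = np.fromfunction(add_coords, (2, 3), dtype=int)
print(a)            # [[0 1 2]
                    #  [1 2 3]]
print(len(calls))   # 1: a single call with whole coordinate arrays
```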

---

### <font color="blue"> Exercise</font>
Use the array initialization from a function method to create a 5x5 identity matrix.
<p>
<p>

<details><summary><b>Click here to access the solution</b></summary>
<p>


```python
def identity_func(i, j):
    return i==j

a = np.fromfunction(identity_func, (5,5)).astype(int)
```

</p>
</details>

---


### <font color="blue"> Using Library Functions</font>

- You can also use special library functions to create arrays.
- For example, to create an array filled with random values between 0 and 1, use the `np.random.rand` function.
- This is particularly useful for problems where you need a random state to get started.

Uniform random numbers in [0, 1) with shape (2, 3):

In [None]:
print(np.random.rand(2,3))

Samples from the standard normal distribution (`mean=0`, `variance=1`) with shape (2, 3):

In [None]:
print(np.random.randn(2,3))

Random integers in [0, 10) with shape (2, 3):

In [None]:
print(np.random.randint(0, 10, size=[2,3]))

Uniform random numbers in [0, 1) with shape (2, 3), passing the shape as the `size` argument:

In [None]:
print(np.random.random(size=[2,3]))

Pick 10 items from a given list, with equal probability:

In [None]:
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10))  

Pick 10 items from a given list with a predefined probability `p`:

In [None]:
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], 
                       size=10, 
                       p=[0.3, .1, 0.1, 0.4, 0.1]))  # picks more o's

Create 10 random integers in [0, 10), with a fixed seed so that the result is reproducible:

In [None]:
np.random.seed(100)
arr_rand = np.random.randint(0, 10, size=10)
print(arr_rand)

Get the unique items and their counts:

In [None]:
uniqs, counts = np.unique(arr_rand, return_counts=True)
print("Unique items : ", uniqs)
print("Counts       : ", counts)
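The functions above use NumPy's legacy global random state. NumPy also provides the `Generator` interface, created with `np.random.default_rng`, which the NumPy documentation recommends for new code. A minimal sketch of equivalents of the calls above (the seed 100 is arbitrary, and a `Generator` produces a different stream from `np.random.seed(100)`):

```python
import numpy as np

rng = np.random.default_rng(seed=100)

u = rng.random((2, 3))                    # uniform in [0, 1)
z = rng.standard_normal((2, 3))           # mean 0, variance 1
k = rng.integers(0, 10, size=(2, 3))      # integers in [0, 10)
v = rng.choice(['a', 'e', 'i', 'o', 'u'], size=10,
               p=[0.3, 0.1, 0.1, 0.4, 0.1])

# The same seed reproduces the same stream
rng2 = np.random.default_rng(seed=100)
print(np.array_equal(rng2.random((2, 3)), u))   # True
```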

* Creating and populating a Numpy array is the first step to using Numpy to perform fast numeric array computations. 
* Armed with different tools for creating arrays, you are now well set to perform basic array operations.

---

### <font color="blue"> Exercise</font>

Write a function that:
- Takes as argument a Numpy array, and 
- Prints the entries that appear more than once.

<p>
<p>

<details><summary><b>Click here to access the solution</b></summary>
<p>


```python
def print_repeating_entries(np_array):
    """
       Print the entries that appear more than once
       in a Numpy array.
    """
    uniqs, counts = np.unique(np_array, return_counts=True)
    for c, u in zip(counts, uniqs):
        if c > 1:
            print(f"The entry {u} occurs {c} times")
```

</p>
</details>

---

## <font color='red'>Changing Array Dimension</font>
- `reshape` changes the shape of an array (including its number of dimensions) while keeping the same total number of elements.
- `flatten` converts a multi-dimensional array into a flat 1d array, and only into that shape.

In [None]:
a = np.array([0, 1.2, 4, -9.1, 5, 8])
print(f"Initial shape: {a.shape}")

Turn a into a $2 \times 3$ array:

In [None]:
a.shape = (2, 3) 
print(a.size)
print(f"First shape change: {a.shape}")

Turn a into a vector of length 6 again:

In [None]:
a.shape = (a.size,) 
print(f"Second shape change: {a.shape}")

Same effect as setting `a.shape`:

In [None]:
a = a.reshape(2, 3) 
print(f"Third shape change: {a.shape}")

![fig_reshape](https://backtobazics.com/wp-content/uploads/2018/08/numpy-reshape-examples.jpg)
Image Source: backtobazics.com

**There are two popular ways to implement flattening:**

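- `flatten()`: returns a new 1d array that is a copy of the parent. Changes to the new array do not affect the parent.
- `ravel()`: returns a view of the parent array whenever possible (a copy is made only if needed). Changes to the new array then affect the parent as well, but it is more memory efficient because no data is copied.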

Changing the flattened array does not change the parent:

In [None]:
b = a.flatten()
print(f"Flattened array: {b}")

b[0] = 100 
print(a)

Changing the raveled array changes the parent also.

In [None]:
c = a.ravel()  
c[0] = 101        # changing c changes a also
print(a)
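`np.shares_memory` makes the difference visible. Note that `ravel()` returns a view only when it can; in this minimal sketch, the transposed array is not C-contiguous, which forces a copy:

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)

print(np.shares_memory(a, a.flatten()))   # False: always a copy
print(np.shares_memory(a, a.ravel()))     # True: a view of the contiguous data

# The transpose is not C-contiguous, so ravel() must copy here
print(np.shares_memory(a, a.T.ravel()))   # False
```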

### <font color="blue"> Exercise</font>
Reshape the array below into an 8x9 array.

```python
my_array = np.linspace(0, 50, 72)
```

<p>
<p>

<details><summary><b>Click here to access the solution</b></summary>
<p>


```python
my_array = my_array.reshape(8,9)
```

</p>
</details>

## <font color='red'>Indexing with other Arrays</font>: Array Masking

* Arrays allow for a more sophisticated kind of indexing which is very powerful: array masking. 
* You can index an array with another array, and in particular with an array of boolean values. 
* This is particularly useful to extract information from an array that matches a certain condition.

Fancy indexing is the name for when an array or list is used in place of an index:

In [2]:
A = np.array([[n+m*10 for n in range(5)] for m in range(6)])

In [3]:
A.shape

(6, 5)

In [4]:
A

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44],
       [50, 51, 52, 53, 54]])

Select specific rows by passing a list of indices:

In [5]:
row_indices = [1, 2, 3]
A[row_indices]

array([[10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34]])

We can also pass column indices:

In [6]:
col_indices = [1, 2, -1]
A[:, col_indices]

array([[ 1,  2,  4],
       [11, 12, 14],
       [21, 22, 24],
       [31, 32, 34],
       [41, 42, 44],
       [51, 52, 54]])

We can combine the row and column indices in a pair-wise selection, which picks `A[1, 1]`, `A[2, 2]` and `A[3, -1]`:

In [7]:
A[row_indices, col_indices]

array([11, 22, 34])
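To select the submatrix made of *every* combination of the chosen rows and columns rather than pairs, `np.ix_` builds the index arrays for you. A minimal sketch that rebuilds the array `A` from above:

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(6)])
row_indices = [1, 2, 3]
col_indices = [1, 2, -1]

# Pair-wise: one element per (row, column) pair
print(A[row_indices, col_indices])           # [11 22 34]

# np.ix_ builds an open mesh, selecting every row/column combination
print(A[np.ix_(row_indices, col_indices)])
# [[11 12 14]
#  [21 22 24]
#  [31 32 34]]
```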

We can also use index masks: 

- If the index mask is a Numpy array of data type `bool`, then an element is selected (`True`) or not (`False`) depending on the value of the index mask at the position of each element.

In [8]:
B = np.array([n for n in range(5)])
B

array([0, 1, 2, 3, 4])

In [9]:
row_mask = np.array([True, False, True, False, False])
row_mask

array([ True, False,  True, False, False])

In [10]:
B[row_mask]

array([0, 2])

We can also use the formulation:

In [11]:
row_mask = np.array([1, 0, 1, 0, 0], dtype=bool)
B[row_mask]

array([0, 2])

#### Masking

- A **mask** is an array that has the exact same shape as your data, but instead of your values, it holds Boolean values: either `True` or `False`. 
- You can use this mask array to index into your data array in nonlinear and complex ways. It will return all of the elements where the Boolean array has a True value.


This feature is very useful to conditionally select elements from an array, using for example comparison operators:

In [12]:
x = np.arange(0, 10, 0.5)
x

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,
       6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])

In [13]:
mask = (5 < x) & (x < 7.5)  # element-wise logical AND of the two conditions
mask

array([False, False, False, False, False, False, False, False, False,
       False, False,  True,  True,  True,  True, False, False, False,
       False, False])

In [14]:
x[mask]

array([5.5, 6. , 6.5, 7. ])
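For boolean arrays, the element-wise operators `&` (and), `|` (or) and `~` (not) are the usual way to combine conditions. The parentheses are required because these operators bind more tightly than comparisons, and Python's `and`/`or` do not work element-wise on arrays. A short sketch on the same `x`:

```python
import numpy as np

x = np.arange(0, 10, 0.5)

inside = (x > 5) & (x < 7.5)      # logical AND, element-wise
outside = (x < 1) | (x > 9)       # logical OR
print(x[inside])                  # 5.5, 6, 6.5, 7
print(x[outside])                 # 0, 0.5, 9.5
print(x[~inside].size)            # 16: ~ negates the mask
print(np.count_nonzero(inside))   # 4 elements satisfy the condition
```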

### <font color='blue'>Functions for Extracting Data from Arrays and Creating Arrays</font>

**`where()`**

The index mask can be converted to position index using the `where()` function:

In [15]:
indices = np.where(mask)
indices

(array([11, 12, 13, 14]),)

Indexing with these positions is equivalent to the fancy indexing `x[mask]`:

In [16]:
x[indices]

array([5.5, 6. , 6.5, 7. ])
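`where()` also has a three-argument form that picks, element by element, from one of two values depending on the condition; with a single argument it behaves like `np.nonzero`. A small sketch with arbitrary values:

```python
import numpy as np

v = np.array([1, 6, 3, 8])

# Keep values above 5 and replace the others with 0
print(np.where(v > 5, v, 0))      # [0 6 0 8]

# With one argument, np.where(cond) is equivalent to np.nonzero(cond)
print(np.nonzero(v > 5))          # (array([1, 3]),)
```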

**`diag()`**

With the `diag(v, k=0)` function we can also extract the diagonal and subdiagonals of an array:
- The default value of `k` is 0. 
- Use `k>0` for diagonals above the main diagonal.
- Use `k<0` for diagonals below the main diagonal.

In [17]:
A

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34],
       [40, 41, 42, 43, 44],
       [50, 51, 52, 53, 54]])

In [18]:
np.diag(A)

array([ 0, 11, 22, 33, 44])

For the diagonal below the main diagonal:

In [19]:
np.diag(A, -1)

array([10, 21, 32, 43, 54])
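For a diagonal above the main one, use a positive `k`. Given a 1-D input, `diag` works the other way around and builds a 2-D array. A minimal sketch that rebuilds `A`:

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(6)])

print(np.diag(A, 1))          # [ 1 12 23 34]: first diagonal above the main one

# Given a 1-D array, diag builds a 2-D array with it on the diagonal
print(np.diag([1, 2, 3]))
# [[1 0 0]
#  [0 2 0]
#  [0 0 3]]
```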

**`take()`**

The `take` function is similar to fancy indexing described above:

In [20]:
v2 = np.arange(-3,3)
v2

array([-3, -2, -1,  0,  1,  2])

In [21]:
row_indices = [1, 3, 5]
v2[row_indices] 

array([-2,  0,  2])

In [22]:
v2.take(row_indices)

array([-2,  0,  2])

`take` also works on lists and other objects:

In [23]:
np.take([-3, -2, -1,  0,  1,  2], row_indices)

array([-2,  0,  2])

**`choose()`**

Constructs an array from an index array and a set of arrays to choose from:

```python
# Conceptually, for an index array `a` and a sequence of choice arrays `c`:
np.choose(a, c) == np.array([c[a[I]][I] for I in np.ndindex(a.shape)])
```

In [24]:
which = [1, 0, 1, 0]
choices = [[-2, -3, -4, -5], [5, 6, 7, 8]]

print(choices)

[[-2, -3, -4, -5], [5, 6, 7, 8]]


In [25]:
np.choose(which, choices)

array([ 5, -3,  7, -5])

- `which[0]` is 1, so the first element of the result is the first element of `choices[1]`, namely 5. 
- `which[1]` is 0, so the second element is the second element of `choices[0]`, namely -3.
- `which[2]` is 1, so the third element is the third element of `choices[1]`, namely 7. 
- `which[3]` is 0, so the fourth element is the fourth element of `choices[0]`, namely -5.
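With only two choice arrays, `choose` gives the same result as a three-argument `np.where` on the index array. A short sketch checking this on the example above:

```python
import numpy as np

which = np.array([1, 0, 1, 0])
choices = [[-2, -3, -4, -5], [5, 6, 7, 8]]

picked = np.choose(which, choices)
# With two choices, take from choices[1] where which == 1, else from choices[0]
same = np.where(which == 1, choices[1], choices[0])
print(picked, same)   # [ 5 -3  7 -5] [ 5 -3  7 -5]
```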

### <font color='blue'>Representing Missing and Infinite Values</font>
- Undefined or missing values can be represented using:
   - `np.nan` (not a number), or
   - `np.inf` (represents an infinite value).
- NumPy follows the IEEE 754 floating-point standard, so `np.nan` is distinct from infinity, and neither can be stored in an integer array.
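A few consequences of the IEEE 754 rules, in a short sketch:

```python
import numpy as np

print(np.nan == np.nan)        # False: NaN never compares equal, even to itself
print(np.isnan(np.nan))        # True: use isnan to test for NaN
print(np.isinf(-np.inf))       # True: isinf detects both signs of infinity
print(np.inf - np.inf)         # nan: some operations on infinities give NaN

ints = np.array([1, 2, 3])
try:
    ints[0] = np.nan           # integer arrays have no NaN representation
    stored_nan = True
except ValueError as err:
    stored_nan = False
    print("ValueError:", err)
```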

Consider the Numpy array:

In [34]:
a = np.array([0, 1.2, 4, -9.1, 5, 8]).reshape(2,3)
print(a)

[[ 0.   1.2  4. ]
 [-9.1  5.   8. ]]


We can compute a few summary statistics of the array:

In [27]:
print(f"Sum:  {a.sum()}")
print(f"Mean: {a.mean()}")
print(f"STD:  {a.std()}")

Sum:  9.100000000000001
Mean: 1.5166666666666668
STD:  5.407530757091345


Insert a `nan` and an `inf`:

In [28]:
a[1,1] = np.nan  # not a number
a[1,2] = np.inf  # infinite
print(a)

[[ 0.   1.2  4. ]
 [-9.1  nan  inf]]


What happens if we compute the basic statistics now?

In [29]:
print(f"Sum:  {a.sum()}")
print(f"Mean: {a.mean()}")
print(f"STD:  {a.std()}")

Sum:  nan
Mean: nan
STD:  nan


- That is not what we want: a single `NaN` makes every statistic `NaN`.
- We need to exclude the `NaN` (and infinite) values from the calculations.

To check for `NaN` values, we can use the `isnan()` function; `isinf()` does the same for infinite values:

In [31]:
np.isnan(a) 

array([[False, False, False],
       [False,  True, False]])

In [32]:
np.isinf(a)

array([[False, False, False],
       [False, False,  True]])

Create a mask and replace `NaN` and `inf` with -999:

In [35]:
missing_bool = np.isnan(a) | np.isinf(a)
print(f"Mask: {missing_bool}")

a[missing_bool] = -999  
print(f"a: {a}")

Mask: [[False False False]
 [False  True  True]]
a: [[   0.     1.2    4. ]
 [  -9.1 -999.  -999. ]]


**`ma.masked_where` Function**

- Mask an array where a condition is met.

```python
np.ma.masked_where(condition, arr, copy=True)
```

- `condition`: masking condition
- `arr`: Numpy array to mask.
- `copy`: If True (default) make a copy of `arr` in the result. If False modify `arr` in place and return a view.
- Returns the result of masking `arr` where condition is `True`.

In [36]:
b = np.array([0, 1.2, 4, -999.0, 5, -999.0]).reshape(2,3)
print(f"b = {b}")

b = [[   0.     1.2    4. ]
 [-999.     5.  -999. ]]


Replace `-999.0` with `NaN`:

In [37]:
b[b == -999.0] = np.nan
print(f"b (with nan) = \n {b}")

b (with nan) = 
 [[0.  1.2 4. ]
 [nan 5.  nan]]


Mask the `NaN` using the `ma.masked_where` function:

In [39]:
b_new = np.ma.masked_where(np.isnan(b), b)

print(f"b (with mask) = \n {b_new}")

b (with mask) = 
 [[0.0 1.2 4.0]
 [-- 5.0 --]]


You can obtain statistical information on the array:

In [40]:
print(f"Sum:  {b_new.sum()}")
print(f"Mean: {b_new.mean()}")
print(f"STD:  {b_new.std()}")

Sum:  10.2
Mean: 2.55
STD:  2.026696819951124


- The masked array has nearly all of the methods that a Numpy array has, and a few special ones of its own. 
- For example, to find out how many unmasked values it contains, there is the `count` method:

In [41]:
print(f"b has {b_new.count()} unmasked values")

b has 4 unmasked values


To extract a Numpy array containing only the unmasked values, use the `compressed` method:

In [42]:
print("unmasked values are: {}".format(b_new.compressed()))

unmasked values are: [0.  1.2 4.  5. ]


To obtain the mask array:

In [43]:
print(np.ma.getmaskarray(b_new))

[[False False False]
 [ True False  True]]
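Two related helpers: `np.ma.masked_invalid` masks both `NaN` and infinite entries in one call, and the `filled` method converts a masked array back to a regular Numpy array with the masked entries replaced. A short sketch with arbitrary values:

```python
import numpy as np

b = np.array([[0.0, 1.2, 4.0], [np.nan, 5.0, np.inf]])

# masked_invalid masks both NaN and infinite entries
b_masked = np.ma.masked_invalid(b)
print(b_masked.count())          # 4 valid entries

# filled() returns a regular ndarray with masked entries replaced
restored = b_masked.filled(-999.0)
print(restored)
```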


### <font color="blue"> Exercise</font>

Write a function that:
- Takes an arbitrary array in which a few elements have the value -999.0, and
- Returns a new array where -999.0 are replaced by Numpy NaN.

---

<p>
<p>
<p>
<p>

<details><summary><b>Click here to access the solution</b></summary>
<p>


```python
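def assign_nan(a):
    # Work on a float copy: NaN cannot be stored in an integer array,
    # and the caller's array is left unchanged.
    a = np.array(a, dtype=float)
    a[a == -999.0] = np.nan
    return a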
```

</p>
</details>

### <font color="green">Application</font>

#### Dealing with the valid range, fill value, scale factor and offset

In [83]:
a = np.linspace(-4.0, 4.0, 10)
b = np.linspace(-2, 1.0, 5)
data = np.array([[n+m*10.0 for n in a] for m in b])

In [84]:
data.shape

(5, 10)

In [85]:
data

array([[-24.        , -23.11111111, -22.22222222, -21.33333333,
        -20.44444444, -19.55555556, -18.66666667, -17.77777778,
        -16.88888889, -16.        ],
       [-16.5       , -15.61111111, -14.72222222, -13.83333333,
        -12.94444444, -12.05555556, -11.16666667, -10.27777778,
         -9.38888889,  -8.5       ],
       [ -9.        ,  -8.11111111,  -7.22222222,  -6.33333333,
         -5.44444444,  -4.55555556,  -3.66666667,  -2.77777778,
         -1.88888889,  -1.        ],
       [ -1.5       ,  -0.61111111,   0.27777778,   1.16666667,
          2.05555556,   2.94444444,   3.83333333,   4.72222222,
          5.61111111,   6.5       ],
       [  6.        ,   6.88888889,   7.77777778,   8.66666667,
          9.55555556,  10.44444444,  11.33333333,  12.22222222,
         13.11111111,  14.        ]])

In [86]:
print(f" Min value: {data.min()} \n Max value: {data.max()}")

 Min value: -24.0 
 Max value: 14.0


Assume the following:

- The valid range is: `[-10, 12]`
- The fill value is: `-9.0`
- The scale factor is: `0.15`
- The offset is: `0.85`

How do we restore the array `data`?

In [87]:
valid_min = -10.0
valid_max = 12.0
_FillValue = -9.0
scale_factor = 0.15
add_offset = 0.85

- All the values outside the valid range will be set to `NaN`.
- All the values equal to the fill value will be set to `NaN`.

In [88]:
invalid = np.logical_or(data > valid_max, data < valid_min)
invalid = np.logical_or(invalid, data == _FillValue)
data[invalid] = np.nan
data

array([[        nan,         nan,         nan,         nan,         nan,
                nan,         nan,         nan,         nan,         nan],
       [        nan,         nan,         nan,         nan,         nan,
                nan,         nan,         nan, -9.38888889, -8.5       ],
       [        nan, -8.11111111, -7.22222222, -6.33333333, -5.44444444,
        -4.55555556, -3.66666667, -2.77777778, -1.88888889, -1.        ],
       [-1.5       , -0.61111111,  0.27777778,  1.16666667,  2.05555556,
         2.94444444,  3.83333333,  4.72222222,  5.61111111,  6.5       ],
       [ 6.        ,  6.88888889,  7.77777778,  8.66666667,  9.55555556,
        10.44444444, 11.33333333,         nan,         nan,         nan]])

In [89]:
print(f" Min value: {data.min()} \n Max value: {data.max()}")

 Min value: nan 
 Max value: nan
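Reductions such as `min` and `max` propagate `NaN`. NumPy provides NaN-ignoring counterparts (`np.nanmin`, `np.nanmax`, `np.nanmean`, `np.nansum`, and others). A small sketch with made-up values:

```python
import numpy as np

data = np.array([[np.nan, -9.4, -8.5], [2.0, 11.3, np.nan]])

print(data.min(), data.max())            # nan nan: NaN propagates
print(np.nanmin(data), np.nanmax(data))  # -9.4 11.3: NaN entries ignored
print(np.nanmean(data))                  # mean of the four valid values
```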


We can now apply the `scale_factor` and `add_offset`:

In [90]:
data = (data - add_offset) * scale_factor 
data = np.ma.masked_array(data, np.isnan(data))

In [91]:
data

masked_array(
  data=[[--, --, --, --, --, --, --, --, --, --],
        [--, --, --, --, --, --, --, --, -1.5358333333333334,
         -1.4024999999999999],
        [--, -1.3441666666666665, -1.2108333333333334, -1.0775,
         -0.9441666666666666, -0.8108333333333332, -0.6775,
         -0.5441666666666668, -0.4108333333333334, -0.2775],
        [-0.3525, -0.21916666666666668, -0.08583333333333334,
         0.04749999999999998, 0.1808333333333333, 0.31416666666666665,
         0.44749999999999995, 0.5808333333333332, 0.7141666666666666,
         0.8475],
        [0.7725000000000001, 0.9058333333333334, 1.0391666666666666,
         1.1724999999999999, 1.3058333333333334, 1.4391666666666667,
         1.5724999999999998, --, --, --]],
  mask=[[ True,  True,  True,  True,  True,  True,  True,  True,  True,
          True],
        [ True,  True,  True,  True,  True,  True,  True,  True, False,
         False],
        [ True, False, False, False, False, False, False, False, False,
      

In [92]:
print(f" Min value: {data.min()} \n Max value: {data.max()}")

 Min value: -1.5358333333333334 
 Max value: 1.5724999999999998
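The cells above apply the attributes as `(data - add_offset) * scale_factor`. For comparison, the netCDF Climate and Forecast (CF) conventions define unpacking as `unpacked = packed * scale_factor + add_offset`. A minimal sketch of that convention, combining the valid-range and fill-value checks with `np.ma.masked_where`; the packed values below are hypothetical:

```python
import numpy as np

# Hypothetical packed values and attributes (illustrative only)
packed = np.array([-20.0, -9.0, 0.0, 4.0, 15.0])
valid_min, valid_max = -10.0, 12.0
_FillValue = -9.0
scale_factor, add_offset = 0.15, 0.85

# Mask entries outside the valid range or equal to the fill value
invalid = (packed < valid_min) | (packed > valid_max) | (packed == _FillValue)
masked = np.ma.masked_where(invalid, packed)

# CF-convention unpacking; masked entries stay masked
unpacked = masked * scale_factor + add_offset
print(unpacked)
```

Working with a masked array from the start avoids the `NaN` step entirely, and `unpacked.min()` / `unpacked.max()` then ignore the masked entries.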


## <font color='red'>Array Inspection and Indexing</font>

We want to know:
* Whether it is a 1D, 2D, or higher-dimensional array (`ndim`)
* How many items are present in each dimension (`shape`)
* What its data type is (`dtype`)
* The total number of items in it (`size`)
* Samples of the first few items in the array (through indexing)

In [None]:
a = np.array([0, 1.2, 4, -9.1, 5, 8]).reshape(2,3)

Determine the shape: `shape`

In [None]:
print('Shape: ', a.shape)

Determine the data type of the entries: `dtype`

In [None]:
print('Datatype: ', a.dtype)

Determine the number of entries: `size`

In [None]:
print('Size: ', a.size)

Determine the number of dimensions: `ndim`

In [None]:
print('Num Dimensions: ', a.ndim)

Determine the number of bytes per entry: `itemsize`

In [None]:
print('Num bytes per element: ', a.itemsize)

Determine the total number of bytes: `nbytes`

In [None]:
print('Num bytes: ', a.nbytes)
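These attributes are related: `size` is the product of the entries of `shape`, and `nbytes` equals `size * itemsize`. A short check with the same array `a`, which gets NumPy's default `float64` dtype because it contains floating-point values:

```python
import numpy as np

a = np.array([0, 1.2, 4, -9.1, 5, 8]).reshape(2, 3)

print(a.shape)     # (2, 3)
print(a.dtype)     # float64
print(a.size)      # 6
print(a.ndim)      # 2
print(a.itemsize)  # 8 bytes per float64 element
print(a.nbytes)    # 48, i.e. size * itemsize

# size is the product of the shape, nbytes is size * itemsize
assert a.size == int(np.prod(a.shape))
assert a.nbytes == a.size * a.itemsize
```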

## <font color='red'>Array Slicing</font>

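* Slicing is specified with the colon operator `:` in the form `from:to`, with the `from` index before the colon and the `to` index after it. An optional third value gives the step, as in `from:to:step`.
* The slice starts at the `from` index and ends one item before the `to` index. Omitted values default to the start of the axis, the end of the axis, and a step of 1.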
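A short 1D illustration of the `from:to:step` form on an arbitrary array, including negative values, which count from the end of the axis:

```python
import numpy as np

x = np.arange(10)      # [0 1 2 3 4 5 6 7 8 9]

print(x[2:7])          # [2 3 4 5 6]: the 'to' index 7 is excluded
print(x[:3])           # [0 1 2]: 'from' defaults to 0
print(x[7:])           # [7 8 9]: 'to' defaults to the end
print(x[1:8:3])        # [1 4 7]: every 3rd item
print(x[-3:])          # [7 8 9]: negative indices count from the end
print(x[::-1])         # [9 8 7 6 5 4 3 2 1 0]: a negative step reverses
```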

In [None]:
a = np.linspace(0, 35, 36)

Reshape the array:

In [None]:
a.shape = (6,6)
print(a)

```python
a[i,j] for i=1,2 and j=0,2,4
```

In [None]:
print(a[1:3,:-1:2])   

```python
a[i,j] for i=0,3 and j=2,4
```

In [None]:
print(a[::3,2:-1:2])   

![fig_sl](https://media.geeksforgeeks.org/wp-content/uploads/Numpy1.jpg)
Image Source: geeksforgeeks.org/numpy-indexing/

### Slice and Copy

- To achieve high performance, assignments in Python usually do not copy the underlying objects.
- This matters, for example, when objects are passed between functions: it avoids excessive memory copying when it is not necessary (the function receives a reference to the same object, not a copy).
- With `a` as a NumPy array, a basic slice such as `a[:]` or `a[1,:]` is a *view*: a new array object that refers to the same data as `a`.

In [None]:
a = np.linspace(0, 29, 30)
a.shape = (5,6)
print("a = {}".format(a))

Extract the 2nd row of `a` (a view), then modify it:

In [None]:
b = a[1,:]
print("a[1,1] before: {}".format(a[1,1]))
b[1] = 2
print("a[1,1] after: {}".format(a[1,1]))

Use the `copy` method, which copies the underlying data, so that changes to `b` do not affect `a`:

In [None]:
b = a[1,:].copy()
print("a[1,1] before: {}".format(a[1,1]))
b[1] = 7777     # b and a are two different arrays now
print("a[1,1] after: {}".format(a[1,1]))
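`np.shares_memory` offers a quick way to check whether two arrays share data. A small sketch on an arbitrary array, comparing a basic slice, an explicit copy, and indexing with a list of indices ("fancy" indexing), which always returns a copy:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # arbitrary example array

row_view = a[1, :]          # basic slicing returns a view
row_copy = a[1, :].copy()   # explicit copy of the data
fancy = a[[0, 2], :]        # indexing with a list returns a copy

print(np.shares_memory(a, row_view))  # True
print(np.shares_memory(a, row_copy))  # False
print(np.shares_memory(a, fancy))     # False

row_view[0] = -1            # writes through the view into a[1, 0]
print(a[1, 0])              # -1
```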

- You can also use `np.copyto(c, a)` to copy the content of `a` into an existing array `c`.
- `a` must be broadcastable to the shape of `c`; usually the two arrays have the same shape.

In [None]:
c = np.zeros_like(a)
print("a = {}".format(a))
print("c = {}".format(c))

In [None]:
np.copyto(c, a)

In [None]:
print("a = {}".format(a))
print("c = {}".format(c))

---

### <font color="blue"> Exercise</font>

Consider the array:

    my_array = np.arange(64).reshape(8,8)
    
Use array slicing to extract only entries with even values.

---

<p>
<p>
<p>
<p>

<details><summary><b>Click here to access the solution</b></summary>
<p>


```python
my_array = np.arange(64).reshape(8,8)
my_array[:,::2]    
```

</p>
</details>

## <font color='red'>Array Computations</font>

Consider the operation:

In [None]:
b = 3*a - 1    # a is array, b becomes array

The above operation generates a temporary array:

* **Step 1:** tb = 3*a
* **Step 2:** b = tb - 1

**As far as possible, we want to avoid the creation of temporary arrays to limit memory usage and to decrease the computational time associated with array computations.**

### <font color="blue">Array Broadcasting</font>

Array broadcasting is how NumPy performs element-wise operations on arrays of different shapes: the smaller array is virtually stretched to match the larger one, without actually copying data. Broadcasting follows a strict set of rules to determine how the two arrays interact:

- **Rule 1**: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
- **Rule 2**: If the shapes of the two arrays do not match in some dimension, the array whose size is 1 in that dimension is stretched to match the other shape.
- **Rule 3**: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.

![fig_broad](https://scipy-lectures.org/_images/numpy_broadcasting.png)
Image Source: scipy-lectures.org
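A minimal sketch of the three rules on arbitrary arrays: a column of shape `(3, 1)` combined with a 1D array of shape `(4,)` broadcasts to `(3, 4)`, while shapes `(3,)` and `(4,)` cannot be broadcast together:

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,)

# Rule 1: row's shape (4,) is padded to (1, 4)
# Rule 2: (3, 1) and (1, 4) are both stretched to (3, 4)
grid = col * 10 + row
print(grid.shape)   # (3, 4)
print(grid)
# [[ 0  1  2  3]
#  [10 11 12 13]
#  [20 21 22 23]]

# Rule 3: sizes 3 and 4 disagree and neither is 1
try:
    np.arange(3) + np.arange(4)
    incompatible_ok = True
except ValueError as err:
    incompatible_ok = False
    print("ValueError:", err)
```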

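### <font color='blue'>In-Place Array Arithmetic</font>
* In-place operations modify an existing array and do not create temporary arrays.

In [None]:
b = a      # b is another name for the same array, so a is modified as well
b *= 3     # or np.multiply(b, 3, out=b)
b -= 1     # or np.subtract(b, 1, out=b)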

In-place operations:

```python
   a *= 3.0     # multiply a's elements by 3
   a -= 1.0     # subtract 1 from each element
   a /= 3.0     # divide each element by 3
   a += 1.0     # add 1 to each element
   a **= 2.0    # square all elements
```
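A sketch of what "in place" means in practice, with arbitrary values: the array object is reused (the `out=` argument of a ufunc does the same as the augmented assignment), while a regular expression allocates a new array. In-place operations also keep the array's dtype, so an in-place true division on an integer array is rejected:

```python
import numpy as np

a = np.ones(5)
original = a                 # a second name for the same array object

np.multiply(a, 3.0, out=a)   # equivalent to a *= 3.0
a -= 1.0                     # in place: no new array is allocated
same_after_inplace = a is original
print(same_after_inplace)    # True
print(a)                     # [2. 2. 2. 2. 2.]

a = a * 2.0                  # regular operation: allocates a new array
same_after_regular = a is original
print(same_after_regular)    # False

# In-place operations keep the dtype, so this integer division fails
b = np.arange(5)
try:
    b /= 2
    int_div_failed = False
except TypeError as err:     # NumPy raises a subclass of TypeError here
    int_div_failed = True
    print("TypeError:", err)
```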

#### Example

In [None]:
import numpy as np

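def regular_ops(a):
    a = 0.0*a      # allocates a new array; the caller's array is unchanged

def inplace_ops(a):
    a *= 0.0       # overwrites the caller's array; no temporary is created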
    
n = 100000000
a = np.zeros(n)

In [None]:
time_reg = %timeit -o regular_ops(a)

In [None]:
time_inp = %timeit -o inplace_ops(a)

In [None]:
print("Speedup: {}".format(time_reg.best/time_inp.best))

### <font color='blue'>Math Functions and Array Arguments</font>

##### Trigonometric functions

In [None]:
b = np.linspace(1.0, 15.5, 21)
print("b: ", b)

In [None]:
c = np.sin(b)    
c = np.arcsin(c) 
c = np.sinh(b)

##### Functions for rounding

`around(arr, decimals)`: rounds values to the desired precision.

`decimals` is the number of decimal places to round to; the default is 0. If it is negative, rounding happens to the left of the decimal point (for example, `decimals=-1` rounds to the nearest multiple of 10).

In [None]:
print(np.around(b))    
print(np.around(b, 2))
print(np.around(b, -1)) 
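A few more cases with arbitrary values: `decimals=-2` rounds to the nearest hundred, and values exactly halfway between two candidates are rounded to the nearest even value, as documented for `np.around`:

```python
import numpy as np

x = np.array([1234.5678, 0.123])   # arbitrary example values

print(np.around(x, 2))    # two decimal places
print(np.around(x, -2))   # nearest hundred

# Halfway cases go to the nearest even value
print(np.around([0.5, 1.5, 2.5, 3.5]))   # [0. 2. 2. 4.]
```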

`floor(arr)`: returns the largest integer not greater than the input value.

In [None]:
print(np.floor(b))

`ceil(arr)`: returns the smallest integer not less than the input value.

In [None]:
print(np.ceil(b))
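For negative inputs, `floor` and `ceil` move in opposite directions, and both return floating-point arrays rather than integers. A sketch with arbitrary values, also showing `np.trunc` (rounding toward zero) for comparison:

```python
import numpy as np

v = np.array([-1.5, -0.2, 0.2, 1.5, 2.0])   # arbitrary example values

print(np.floor(v))   # rounds toward -infinity
print(np.ceil(v))    # rounds toward +infinity
print(np.trunc(v))   # rounds toward zero

# The results are floats; convert explicitly if integers are needed
print(np.floor(v).astype(int))
```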

##### Exponential and Logarithmic Functions

In [None]:
c = b**2.5 
c = np.log(b)
c = np.exp(b)
c = np.log2(b)
c = np.sqrt(b)

##### A Few Statistical Functions

In [None]:
a = np.array([[3,7,5],[8,4,3],[2,4,9]]) 
print("a: ", a)

`amin()` and `amax()` return the minimum and the maximum from the elements in the given array along the specified axis.

In [None]:
print("amin axis=0:", np.amin(a,0))
print("amin axis=1:", np.amin(a,1))
print("amax:       ", np.amax(a))
print("amax axis=1:", np.amax(a,1))

The `ptp()` function returns the range (maximum-minimum) of values along an axis.

In [None]:
print("ptp:      ", np.ptp(a)) 
print("ptp axis=1", np.ptp(a, axis = 1)) 
print("ptp axis=0", np.ptp(a, axis = 0)) 
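A sketch making the `axis` convention explicit for the same `a`: `axis=0` collapses the rows (one value per column), `axis=1` collapses the columns (one value per row), and `ptp` is simply `max - min` along that axis:

```python
import numpy as np

a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])

# axis=0 reduces over rows: one result per column
print(np.amin(a, axis=0))   # [2 4 3]
# axis=1 reduces over columns: one result per row
print(np.amin(a, axis=1))   # [3 3 2]
# ptp is max - min along the chosen axis
print(np.ptp(a, axis=1))    # [4 5 7]
print(np.array_equal(np.ptp(a, axis=0), a.max(axis=0) - a.min(axis=0)))  # True
```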

`mean()` returns the arithmetic mean of elements in the array.

`median()` is defined as the value separating the higher half of a data sample from the lower half.

In [None]:
print("a:      ", a)
print("Mean:   ", a.mean(), np.mean(a))
print("StDev:  ", a.std(), np.std(a))
print("Median: ", np.median(a))

In [None]:
print("Trapezoidal integration: ", np.trapz(b))  # np.trapz is deprecated in NumPy 2.0 in favor of np.trapezoid
print("First differences (b[i+1] - b[i]): ", np.diff(b))
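`np.trapz(b)` assumes unit spacing between samples, and `np.diff` returns plain differences `b[i+1] - b[i]` rather than a derivative. A sketch with an arbitrary test function `y = x**2` that writes the trapezoidal rule out by hand and divides `np.diff(y)` by the spacing `dx` to approximate `dy/dx`:

```python
import numpy as np

x = np.linspace(0.0, 2.0, 201)   # sample points with spacing dx = 0.01
y = x**2                          # arbitrary test function
dx = x[1] - x[0]

# Trapezoidal rule: dx * (y[0]/2 + y[1] + ... + y[n-2] + y[n-1]/2)
integral = dx * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)
print(integral)                   # close to the exact value 8/3

# np.diff gives y[i+1] - y[i]; dividing by dx approximates dy/dx
dydx = np.diff(y) / dx
print(dydx[:3])                   # close to 2*x at the interval midpoints
```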

### <font color='blue'>NumPy Matrices</font>

- NumPy provides a special matrix type, `np.matrix`, a subclass of `ndarray` for which binary operators perform linear algebra operations (for example, `*` is matrix multiplication instead of element-wise multiplication).
- You may see it used in some existing code instead of `np.array`.
- NumPy matrices are strictly 2-dimensional, while NumPy arrays can have any number of dimensions.

Use NumPy arrays (`np.array`) instead:
- They are the standard vector/matrix/tensor type of NumPy. Many NumPy functions return arrays, not matrices.
- There is a clear distinction between element-wise operations and linear algebra operations.
- You can have standard vectors or row/column vectors if you like.

The NumPy documentation no longer recommends `np.matrix`, even for linear algebra, and states that the class may be removed in the future.

In [None]:
x1 = np.array([1, 2, 3], float)
print("x1: ", x1)
print("Type of x1: ", type(x1))

Row vector:

In [None]:
x2 = np.matrix(x1)               
print("x2: ", x2)                   
print("Type of x2: ", type(x2))

Column vector:

In [None]:
x3 = np.asmatrix(x1).transpose()   # np.mat was removed in NumPy 2.0; np.asmatrix works in all versions
print("x3: ", x3)
print("Type of x3: ", type(x3))

Identity matrix:

In [None]:
A = np.eye(3)
print(A)

Turn array to matrix:

In [None]:
B = np.asmatrix(A)
print(B)

Vector-matrix product:

In [None]:
y2 = x2*B  
print(y2)

Matrix-vector product:

In [None]:
y3 = B*x3
print(y3)

Element-wise multiplication:

In [None]:
a = np.array([[1,2],[3,4]])
print("a = ", a)
print("a*a  = ", a*a)

Matrix multiplication:

In [None]:
m = np.asmatrix(a)
print("m*m  = ", m*m)
print("dot  = ", np.dot(a, a)) # matrix multiplication with NumPy arrays
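With plain arrays, Python's `@` operator (PEP 465) performs matrix multiplication, so `np.matrix` is not needed to write linear-algebra expressions. A small sketch with the same `a` and an arbitrary vector `v`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
v = np.array([1, 1])   # arbitrary example vector

print(a * a)   # element-wise product
print(a @ a)   # matrix product, same as np.dot(a, a)
print(a @ v)   # matrix-vector product: [3 7]
```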

### <font color='blue'> Stacking and Repeating Arrays</font>

Using the following functions, we can create larger vectors and matrices from smaller ones:

- `repeat`: Repeat elements of an array.
- `tile`: Construct an array by repeating the original array a given number of times.
- `vstack`: Stack arrays in sequence vertically (row wise).
- `hstack`: Stack arrays in sequence horizontally (column wise).
- `concatenate`: Join a sequence of arrays along an existing axis.

**`tile` and `repeat`**

In [None]:
a = np.array([[1, 2], [3, 4]])
a

Repeat each element 3 times:

In [None]:
np.repeat(a, 3)

Tile the array 3 times:

In [None]:
np.tile(a, 3)
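Without an `axis` argument, `np.repeat` flattens the array first, whereas `np.tile` keeps the 2D structure and repeats the whole block. A sketch of both, with and without an explicit axis or repetition pattern:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

print(np.repeat(a, 3))           # no axis: flattened -> [1 1 1 2 2 2 3 3 3 4 4 4]
print(np.repeat(a, 2, axis=0))   # each row repeated twice
print(np.tile(a, 3))             # block repeated 3 times along the last axis
print(np.tile(a, (2, 1)))        # 2 copies of the block stacked vertically
```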

**`concatenate`**

In [None]:
b = np.array([[5, 6]])

In [None]:
np.concatenate((a, b), axis=0)

In [None]:
np.concatenate((a, b.T), axis=1)

**`hstack` and `vstack`**

In [None]:
np.vstack((a, b))

In [None]:
np.hstack((a, b.T))
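For 2D inputs, `vstack` and `hstack` behave like `concatenate` along `axis=0` and `axis=1`. The shapes must agree along every other axis, which is why `b` (shape `(1, 2)`) is transposed before being added as a column. A quick check with the same `a` and `b`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6]])           # shape (1, 2)

v = np.concatenate((a, b), axis=0)     # add a row: shape (3, 2)
h = np.concatenate((a, b.T), axis=1)   # add a column: shape (2, 3)

print(v.shape, h.shape)                        # (3, 2) (2, 3)
print(np.array_equal(v, np.vstack((a, b))))    # True
print(np.array_equal(h, np.hstack((a, b.T))))  # True
```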

### <font color='blue'> Universal Functions and Loops</font>

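* Universal functions (ufuncs) such as `np.multiply`, and the array operators that call them (`*`, `+`, ...), run much faster than explicit Python `for` loops, which should be avoided whenever possible.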

In [None]:
def mat_mult_intrinsic(a,b):
    return a * b    # vectorized element-wise product (not a matrix product)

def mat_mult_loops(a,b):
    c = np.empty(a.shape)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            c[i,j] = a[i,j] * b[i,j]
    return c

In [None]:
N = 800
A = np.random.random((N,N))
B = np.random.random((N,N))

In [None]:
time_loop = %timeit -o mat_mult_loops(A,B)

In [None]:
time_int = %timeit -o mat_mult_intrinsic(A,B)

In [None]:
print("Speedup: {}".format(time_loop.best/time_int.best))
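Before trusting a speedup, it is worth confirming that both versions compute the same thing. A sketch that reuses the two functions on smaller random arrays (the size 50 and the seed are arbitrary):

```python
import numpy as np

def mat_mult_intrinsic(a, b):
    return a * b          # vectorized element-wise product

def mat_mult_loops(a, b):
    c = np.empty(a.shape)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            c[i, j] = a[i, j] * b[i, j]
    return c

rng = np.random.default_rng(0)   # arbitrary seed for reproducibility
A = rng.random((50, 50))
B = rng.random((50, 50))

same = np.allclose(mat_mult_loops(A, B), mat_mult_intrinsic(A, B))
print(same)   # True
```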

# <font color='red'>Reading and writing arrays to disk </font>

Numpy lets you read and write arrays into files in a number of ways. In order to use these tools well, it is critical to understand the difference between a text and a binary file containing numerical data.
In a text file, the number &pi; could be written as "3.141592653589793", for example: a string of digits that a human can read, with in this case 15 decimal digits. In contrast, that same number written to a binary file as a 64-bit float would be encoded as 8 bytes that are not readable by a human but which contain the exact same data that the variable `pi` had in the computer's memory. <P>

The tradeoffs between the two modes are thus:
<UL>
<LI> <B>Text mode</B>: occupies more space, precision can be lost (if not all digits are written to disk), but is readable and editable by hand with a text editor. Can only be used for one- and two-dimensional arrays.
<LI> <B>Binary mode</B>: compact and exact representation of the data in memory, can't be read or edited by hand. Arrays of any size and dimensionality can be saved and read without loss of information.
</UL>

First, let's see how to read and write arrays in text mode. The `np.savetxt` function saves an array to a text file, with options to control the precision, separators and even adding a header:

In [None]:
arr = np.arange(10).reshape(2, 5)
print(arr)                           
np.savetxt('test.out', arr)

In [None]:
!cat test.out

And this same type of file can then be read with the matching `np.loadtxt` function:

In [None]:
arr2 = np.loadtxt('test.out')
print(arr2)
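A sketch of the formatting options mentioned above, writing to an in-memory `io.StringIO` buffer instead of a file so nothing is left on disk: `fmt` sets the number format, `delimiter` the column separator, and `header` a comment line (prefixed with `# `) that `np.loadtxt` skips when reading back:

```python
import io
import numpy as np

arr = np.arange(10).reshape(2, 5)

buf = io.StringIO()   # in-memory stand-in for a text file
np.savetxt(buf, arr, fmt="%d", delimiter=",", header="c0,c1,c2,c3,c4")
print(buf.getvalue())
# # c0,c1,c2,c3,c4
# 0,1,2,3,4
# 5,6,7,8,9

buf.seek(0)                              # rewind before reading
back = np.loadtxt(buf, delimiter=",")    # lines starting with '#' are skipped
print(np.array_equal(back, arr))         # True
```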

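You can also use the function `np.genfromtxt`, which can handle missing values. Here the string written for `0` is declared as a missing value, so with `usemask=True` that entry is returned masked: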

In [None]:
arr3 = np.genfromtxt('test.out', 
                     missing_values='0.000000000000000000e+00', 
                     usemask=True)
print(arr3)
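By default, `np.genfromtxt` also treats empty fields as missing: they are filled with `nan` for floating-point data, or masked when `usemask=True`. A sketch on a hypothetical comma-separated text with two empty fields:

```python
import io
import numpy as np

text = "1,2,\n4,,6"   # hypothetical CSV content with two empty fields

filled = np.genfromtxt(io.StringIO(text), delimiter=",")
print(filled)         # empty fields become nan

masked = np.genfromtxt(io.StringIO(text), delimiter=",", usemask=True)
print(masked)         # empty fields are masked
print(masked.mask)
```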

### <font color="blue"> Exercise</font>

Check the Global Land-Ocean Temperature Index webpage:

<a href="http://data.giss.nasa.gov/gistemp/graphs_v3/Fig.A2.txt"> http://data.giss.nasa.gov/gistemp/graphs_v3/Fig.A2.txt</a>

We want to use Numpy and Matplotlib to write a code that reads the above dataset and reproduces the <a href="https://scied.ucar.edu/global-annual-mean-surface-temperature-change">figure</a>.

For binary data, Numpy provides two routines:

   + `np.save`: saves a single array to a file with `.npy` extension
   + `np.savez`: can be used to save a group of arrays into a single file with `.npz` extension. 
   
The files created with these routines can then be read with the `np.load` function.

In [None]:
np.save('test.npy', arr)
# Now we read this back
arr_loaded = np.load('test.npy')

print(arr)
print(arr_loaded)

print(arr_loaded.dtype)

# Let's see if any element is non-zero in the difference.
# A value of True would be a problem.
print ('Any differences?', np.any(arr - arr_loaded))

Now let us see how `np.savez_compressed`, the compressed variant of `np.savez`, works:

In [None]:
np.savez_compressed('test.npz', first=arr, second=arr2)
arrays = np.load('test.npz')
arrays.files

The object returned by `np.load` from an `.npz` file works like a dictionary:

In [None]:
a=arrays['first']
b=arrays['second']
print('a = ', a)
print('b = ', b)
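The object returned by `np.load` for an `.npz` file keeps the underlying file open until it is closed; using it as a context manager handles that automatically. A self-contained round trip in a temporary directory (the file and array names are arbitrary):

```python
import os
import tempfile
import numpy as np

x = np.arange(6).reshape(2, 3)
y = np.linspace(0.0, 1.0, 5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bundle.npz")
    np.savez_compressed(path, x=x, y=y)   # keyword names become the keys

    # The file is closed when the with-block ends
    with np.load(path) as data:
        names = sorted(data.files)
        x_back = data["x"]
        y_back = data["y"]

print(names)                                                   # ['x', 'y']
print(np.array_equal(x, x_back), np.array_equal(y, y_back))    # True True
```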

* The `.npz` format is a very convenient way to package a group of related arrays that pertain to a specific problem into a single file, compactly and without loss of information.
* At some point, however, the complexity of your dataset may be such that the optimal approach is to use one of the standard formats in scientific data processing that have been designed to handle complex datasets, such as NetCDF or HDF5.