<br>

<center>

# $NumPy$

</center>

<br>

---

[NumPy](http://www.numpy.org/) is the fundamental library for scientific computing in Python. It provides a *high-performance, homogeneous, multidimensional* array object called the `ndarray`, along with an assortment of high-level functions for fast operations on these arrays, such as:



* mathematical and logical routines
* shape manipulation
* sorting and selecting data
* basic linear algebra and statistical operations
* random number generation
* broadcasting, which lets operations combine arrays of different shapes

> Almost every data analysis or machine learning package in the PyData ecosystem (most notably `pandas` and `scikit-learn`) uses NumPy ndarrays under the hood.

Alright, let's jump into some code. Conventionally, we start by importing the module under the alias `np`:

```python
import numpy as np
```

---
# NumPy Arrays

- A NumPy array is a homogeneous collection of elements (usually *numeric*), indexed by a tuple of non-negative integers.
- In NumPy the dimensions are called *axes*.
- The number of axes is known as the *rank* of the array and is available as the `ndim` attribute. For example, a 2D array has a rank of 2.
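To make the terms above concrete, here is a minimal sketch that inspects the axes of a small array. The array name `grid` is only an illustration.

```python
import numpy as np

# A 2D array with 2 rows and 3 columns (the name `grid` is illustrative)
grid = np.array([[1, 2, 3], [4, 5, 6]])

print(grid.ndim)   # 2: the number of axes (the rank)
print(grid.shape)  # (2, 3): the length of each axis
print(grid[1, 2])  # 6: indexed by a tuple of integers, counting from 0
```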

Arrays can be of different *types*:

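- Common types include `int`, `float`, `string`, `object`, and `bool`. The type of an array is stored in its `dtype` attribute.

- Type conversion can be achieved using `.astype()`, which returns a new array and leaves the original unchanged.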
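As a rough illustration of type conversion, the sketch below converts a small float array to other types. The array names are hypothetical, and the exact integer width produced by `astype(int)` depends on the platform.

```python
import numpy as np

# Illustrative float array
prices = np.array([1.9, 2.5, 3.1])
print(prices.dtype)             # float64

# Converting to int drops the fractional part (no rounding)
as_int = prices.astype(int)
print(as_int.tolist())          # [1, 2, 3]

# Converting integers to bool: 0 becomes False, anything else True
flags = np.array([0, 1, 2]).astype(bool)
print(flags.tolist())           # [False, True, True]

# The original array keeps its type
print(prices.dtype)             # float64
```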

<br>

---
# Creating an `array`

Arrays can be **created** by

  * calling `np.array()` on Python lists or nested lists, optionally followed by `.reshape()`

  * using special functions like `np.zeros(), np.ones(), np.eye()`

  * using sequence generators like `np.linspace(), np.arange()`

  * using random number generators like `np.random.randint(), np.random.randn()`

You can read about other methods of array creation [in the documentation](http://docs.scipy.org/doc/numpy/user/basics.creation.html#arrays-creation).

```python
# Create a 1D array from a Python list
In [9]: ndarr_1 = np.array([100, 3, 19, 75, 43])
In [10]: ndarr_1
Out[10]: array([100,   3,  19,  75,  43])

# Create a 2D array from a Python list-of-lists    
In [11]: ndarr_2 = np.array([[2, 4, 6], [3, 5, 7]])
In [12]: ndarr_2
Out[12]: 
array([[2, 4, 6],
       [3, 5, 7]])

# Create a 2D array by reshaping a 1D array
In [14]: ndarr_3 = np.arange(20).reshape(4, 5)
In [15]: ndarr_3
Out[15]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

# Create a 3x3 array filled with random integers
In [16]: arr_4 = np.random.randint(0, 100, 9).reshape(3, 3)
In [17]: arr_4
Out[17]: 
array([[66, 87, 41],
       [49, 41, 33],
       [ 5, 57, 78]])

# Create a 4x4 array filled with random floats drawn uniformly from [0, 1)
In [26]: arr_5 = np.random.rand(4, 4)
In [27]: arr_5
Out[27]: 
array([[ 0.35827069,  0.0687782 ,  0.16042925,  0.40275736],
       [ 0.8666519 ,  0.66905765,  0.05632903,  0.41828043],
       [ 0.12067091,  0.34682221,  0.68698091,  0.87333071],
       [ 0.72627949,  0.56991059,  0.80397042,  0.50586414]])       
```       
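The session above does not exercise the special functions or `np.linspace()` from the list. A minimal sketch of those, with illustrative variable names, might look like this:

```python
import numpy as np

zeros = np.zeros((2, 3))       # 2x3 array filled with 0.0
ones = np.ones(4)              # 1D array of four 1.0 values
identity = np.eye(3)           # 3x3 identity matrix

# 5 evenly spaced points from 0 to 1, both endpoints included
evenly = np.linspace(0, 1, 5)
print(evenly.tolist())         # [0.0, 0.25, 0.5, 0.75, 1.0]

# Start at 2, stop before 11, step by 2
steps = np.arange(2, 11, 2)
print(steps.tolist())          # [2, 4, 6, 8, 10]
```

Note that `np.linspace()` takes the number of points and includes the endpoint, while `np.arange()` takes a step size and excludes the stop value.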

<br>

## Run the code cells below and try different ways of array creation

<br>

# Array Subsetting

Subsetting means extracting one or more elements from an array.

From a 2D array, for example, we can extract:
- single or multiple rows
- single or multiple columns
- a single value from a certain position
- diagonals
- elements that satisfy a logical condition

---

Arrays can be **indexed** or **subsetted** with the `[]` square-bracket accessor, using

* **integer slices**, as we've seen for Python lists, but with one slice per axis (rows, columns), since ndarrays may be multidimensional
  
* **boolean subsetting**, to select the elements of an array that satisfy some condition

    * Done by passing an array of booleans with the *same shape* as the array being subsetted (or the same length as the axis being subsetted), typically the result of a logical operation on that array
      
    * Returns the elements of the array where the condition is `True`

---

**Example 1 - Integer slices**

```python
# Using one of the arrays we created above
In [42]: ndarr_3
Out[42]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

# Subset one element
In [43]: ndarr_3[0, 0]
Out[43]: 0

# Subset the first row    
In [44]: ndarr_3[0, :]
Out[44]: array([0, 1, 2, 3, 4])

# Subset the first column    
In [45]: ndarr_3[:, 0]
Out[45]: array([ 0,  5, 10, 15])

# Subset the 2nd and 3rd rows
In [46]: ndarr_3[1:3, :]
Out[46]: 
array([[ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

# Subset the 3rd and 4th columns
In [47]: ndarr_3[:, 2:4]
Out[47]: 
array([[ 2,  3],
       [ 7,  8],
       [12, 13],
       [17, 18]])

# Subset both rows and columns (2nd and 3rd rows and columns)
In [48]: ndarr_3[1:3, 1:3]
Out[48]: 
array([[ 6,  7],
       [11, 12]])
```

**Example 2 - Boolean Subsetting**

```python
# Create an array of numbers
In [57]: arr_6 = np.arange(15)
In [58]: arr_6
Out[58]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

# Logical operations on arrays produce Boolean arrays    
In [59]: arr_6 % 2 == 0
Out[59]: 
array([ True, False,  True, False,  True, False,  True, False,  True,
       False,  True, False,  True, False,  True], dtype=bool)

# Boolean array to subset even numbers 
In [60]: even_bool = (arr_6 % 2 == 0)

# Boolean subsetting
In [61]: arr_6[even_bool]
Out[61]: array([ 0,  2,  4,  6,  8, 10, 12, 14])
```
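Diagonals and conditions on 2D arrays appear in the list of things we can extract but not in the examples. The following is a small sketch of both, rebuilding the same 4x5 array as `ndarr_3` under the illustrative name `grid`:

```python
import numpy as np

grid = np.arange(20).reshape(4, 5)

# Main diagonal: elements whose row index equals their column index
main_diag = grid.diagonal()
print(main_diag.tolist())      # [0, 6, 12, 18]

# A boolean mask with the same shape as grid returns a flat (1D) result
big = grid[grid > 15]
print(big.tolist())            # [16, 17, 18, 19]

# A 1D mask along the row axis keeps whole rows:
# here, rows whose first element is even
even_rows = grid[grid[:, 0] % 2 == 0, :]
print(even_rows.tolist())      # [[0, 1, 2, 3, 4], [10, 11, 12, 13, 14]]
```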

<br>

Run the code cells below:

<br>

---
# Mathematical Operations on NumPy arrays

These are carried out **element-wise** and can be specified:

* using mathematical operators like `+, -, *, /`
* using NumPy functions like `np.add(), np.subtract(), np.multiply(), np.divide()`

```python
# Create two new arrays
In [73]: x = np.array([[2, 5],[9, 6]], dtype=np.float64)
In [74]: y = np.array([[8, 4],[3, 7]], dtype=np.float64)
In [75]: x
Out[75]: 
array([[ 2.,  5.],
       [ 9.,  6.]])
In [76]: y
Out[76]: 
array([[ 8.,  4.],
       [ 3.,  7.]])

# Add
In [77]: x + y 
Out[77]: 
array([[ 10.,   9.],
       [ 12.,  13.]])
# Subtract
In [78]: x - y
Out[78]: 
array([[-6.,  1.],
       [ 6., -1.]])
# Multiply
In [79]: x * y
Out[79]: 
array([[ 16.,  20.],
       [ 27.,  42.]])
# Divide 
In [81]: x / y
Out[81]: 
array([[ 0.25      ,  1.25      ],
       [ 3.        ,  0.85714286]])       
```
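The session above uses only the operators. As a quick check, this sketch confirms that the function forms give the same results, and adds a brief look at broadcasting, where a scalar or a 1D row is combined with each row of a 2D array. The arrays `x` and `y` are the same as above; the row `[100, 200]` is made up for illustration.

```python
import numpy as np

x = np.array([[2, 5], [9, 6]], dtype=np.float64)
y = np.array([[8, 4], [3, 7]], dtype=np.float64)

# Function forms match the operators element for element
print(np.array_equal(np.add(x, y), x + y))        # True
print(np.array_equal(np.subtract(x, y), x - y))   # True
print(np.array_equal(np.multiply(x, y), x * y))   # True
print(np.array_equal(np.divide(x, y), x / y))     # True

# Broadcasting: the scalar is applied to every element
scaled = x * 10
print(scaled.tolist())         # [[20.0, 50.0], [90.0, 60.0]]

# Broadcasting: the 1D row is added to each row of x
shifted = x + np.array([100, 200])
print(shifted.tolist())        # [[102.0, 205.0], [109.0, 206.0]]
```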

<br>

Run the code below:

<br>

---
# Array Attributes and Methods

- These include the following (assume we have an array object called *ndarr*)

<img src="./images/numpy_1.png">

- As with the methods and attributes of other Python objects, these can be explored using the built-in help. We will see some of these attributes and methods again with pandas' `Series` and `DataFrame` objects.

A few examples

- `.any(), .all()` work on boolean arrays and return `True` if any element is `True` or all elements are `True`, respectively

- `.argmax()` returns the **location** (index) of the largest value

- `.argsort()` returns an **array of locations** (indices) that would sort the array in ascending order

- `.clip(ll, ul)` replaces values higher than `ul` with `ul` and values lower than `ll` with `ll`

- `.mean(), .std()` find the mean and standard deviation of a numeric array
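Here is a short sketch of these methods on the same values as `ndarr_1` from earlier, under the name `ndarr`:

```python
import numpy as np

ndarr = np.array([100, 3, 19, 75, 43])

print((ndarr > 50).any())        # True: at least one value exceeds 50
print((ndarr > 50).all())        # False: not every value exceeds 50

print(ndarr.argmax())            # 0: index of the largest value (100)
print(ndarr.argsort().tolist())  # [1, 2, 4, 3, 0]: indices in ascending order of value

# Values below 10 become 10, values above 80 become 80
print(ndarr.clip(10, 80).tolist())  # [80, 10, 19, 75, 43]

print(ndarr.mean())              # 48.0
print(ndarr.std())               # about 35.56 (population standard deviation by default)
```

Indexing the array with the result of `.argsort()`, as in `ndarr[ndarr.argsort()]`, gives the sorted values.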

<br>

Try the code below: