# <center>**NumPy (Numerical Python)**</center>
| <div style=font-size:1.2em>Some tools NumPy provides:</div> | <div style=font-size:1.2em>Data analysis applications:</div> |
|:-----------------------------------------------------------------|:----------------------------------------------------------------|
|Array-oriented arithmetic operations with broadcasting capabilities| Fast data munging and cleaning, subsetting and filtering, transforming, among other kinds of operations |
|Multidimensional arrays| Array algorithms like sorting, unique, and set operations |
|Efficient and fast mathematical functions| Descriptive statistics and aggregating/summarizing data |
|Reading/writing array data to disk| Data alignment for merging and joining heterogeneous datasets |
|Linear algebra, random number generation, and Fourier transform capabilities| Expressing conditional logic as array expressions instead of Python `for` loops |
|C API for connecting NumPy with libraries written in C, C++, or FORTRAN| Group-wise data manipulation|

### **ndarray: A multidimensional array object**

In [1]:
import numpy as np
import numpy.random

rng= np.random.default_rng(seed=12345)

data= np.ones((2,4))
data

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.]])

<br>
Arithmetic operations (each one returns a new array; the original array is left unchanged)

In [3]:
data*12

array([[12., 12., 12., 12.],
       [12., 12., 12., 12.]])

In [4]:
data - 3

array([[-2., -2., -2., -2.],
       [-2., -2., -2., -2.]])

In [7]:
data / 2

array([[0.5, 0.5, 0.5, 0.5],
       [0.5, 0.5, 0.5, 0.5]])

In [8]:
data * 8

array([[8., 8., 8., 8.],
       [8., 8., 8., 8.]])

In [9]:
data + data

array([[2., 2., 2., 2.],
       [2., 2., 2., 2.]])

In [19]:
data == data

array([[ True,  True,  True,  True],
       [ True,  True,  True,  True]])
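
The cells above only combine the array with scalars or with itself. As a minimal sketch of broadcasting with arrays of different shapes (the names `row`, `scaled`, and `shifted` are illustrative, not part of the original notes):

```python
import numpy as np

data = np.ones((2, 4))

# A scalar is broadcast to every element; the result is a new array.
scaled = data * 12

# A 1D array of length 4 is broadcast across both rows of the 2x4 array.
row = np.array([0.0, 1.0, 2.0, 3.0])
shifted = data + row

print(scaled)
print(shifted)
print(data)  # the original array still holds ones
```

Because every operation allocates its result, `data` can be reused safely after any of these expressions.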

### Ways to create ndarrays 
- array  ------------------> `np.array(sequence)`
- asarray  ----------------> `np.asarray(sequence)` *(does not copy if the input is already an ndarray)*
- zeros  ------------------> `np.zeros((2, 3))` *(the shape is passed as a tuple)*
- ones  -------------------> `np.ones((2, 3))`
- empty  -----------------> `np.empty((2, 3))` *(contents are uninitialized)*
- `_like` variants  ---------> `np.zeros_like(array), np.ones_like(array), np.empty_like(array)`
- full  ---------------------> `np.full(5, fill_value= 3)`
- arange  -----------------> `np.arange(2, 10)`
- sequence-like objects -> `list, tuple, etc` *(valid inputs for `np.array` and `np.asarray`)*
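
A short sketch of the constructors listed above, to show the shape-as-tuple convention and the no-copy behaviour of `asarray`. The variable names are illustrative only:

```python
import numpy as np

from_list = np.array([[1, 2, 3], [4, 5, 6]])  # nested list -> 2D array
same = np.asarray(from_list)                  # already an ndarray: no copy is made
zeros = np.zeros((2, 3))                      # shape passed as a single tuple
ones = np.ones((2, 3))
like = np.zeros_like(from_list)               # same shape and dtype as from_list
full = np.full(5, fill_value=3)
steps = np.arange(2, 10)                      # 2, 3, ..., 9 (stop is exclusive)

print(same is from_list)
print(zeros.shape, like.dtype == from_list.dtype)
print(full, steps)
```

Passing the dimensions as separate arguments, as in `np.ones(2, 3)`, raises a `TypeError` because the second positional argument is interpreted as the dtype.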

### dtypes: data types for ndarrays
|    Type |Type Code    | Description |
|-----------|-------------|-----------|
| int8, uint8 | i1, u1| Signed and unsigned 8-bit (1 byte) integer types |
| int16, uint16 | i2, u2| Signed and unsigned 16-bit integer types |
| int32, uint32| i4, u4| Signed and unsigned 32-bit integer types |
| int64, uint64 | i8, u8| Signed and unsigned 64-bit integer types |
| float16 | f2| Half-precision floating point|
| float32 | f4 *or* f| Standard single-precision floating-point; compatible with C float type|
| float64 | f8 *or* d| Standard double-precision floating-point; compatible with C double and Python float object |
| float128 | f16 *or* g| Extended-precision floating-point|
| complex64 | c8 | Complex number represented by two 32-bit floats |
| complex128 | c16 | Complex number represented by two 64-bit floats |
| complex256 | c32 | Complex number represented by two 128-bit floats |
| bool | ?| Boolean type storing `True` and `False` values|
| object| O | Python object type; a value can be any Python type|
| bytes_ (`string_` before NumPy 2.0) | S | Fixed-length ASCII string type (1 byte per character); for example, <br> to create a string data type with length 10, use `S10`|
| str_ (`unicode_` before NumPy 2.0) | U | Fixed-length Unicode type (number of bytes platform specific); <br> same specification semantics as `bytes_` (e.g., `U10`)|


### Casting dtypes

In [3]:
data.dtype

dtype('float64')

In [8]:
int_array= np.arange(10, dtype= 'i8')
int_array.dtype

dtype('int64')

In [13]:
int_data= data.astype(int_array.dtype)
int_data.dtype

dtype('int64')

In [16]:
int_data.astype(float).dtype

dtype('float64')
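
Casting can change values, not just the dtype label. A minimal sketch, assuming small illustrative arrays (`floats` and `numeric_strings` are not from the original notes):

```python
import numpy as np

floats = np.array([3.7, -1.2, 0.5])
ints = floats.astype(np.int64)          # the fractional part is discarded
print(ints)

numeric_strings = np.array(["1.25", "-9.6", "42"])
parsed = numeric_strings.astype(float)  # strings that represent numbers are parsed
print(parsed)

# By default astype returns a new array, so floats is untouched.
print(floats.dtype, ints.dtype)
```

Converting a string that does not represent a number this way raises a `ValueError`.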

### Indexing and slicing
Basic indexing and slicing return a view, not a copy. Assigning through a slice therefore overwrites the data of the original array, even when the slice has been bound to another variable. To avoid this, call the `copy()` method on the slice.

One-dimensional arrays:

In [26]:
data= np.arange(10)
data[2:5]

array([2, 3, 4])

In [28]:
data[6:]= 1
data

array([0, 1, 2, 3, 4, 5, 1, 1, 1, 1])

In [30]:
data_slice= data[2:4]
data_slice[:]= 4
data_slice

array([4, 4])

In [31]:
data

array([0, 1, 4, 4, 4, 5, 1, 1, 1, 1])
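
The cells above show the view writing through to `data`. As a sketch of the `copy()` alternative (the names `view` and `independent` are illustrative):

```python
import numpy as np

data = np.arange(10)

view = data[2:4]                # basic slice: shares memory with data
independent = data[2:4].copy()  # explicit copy: owns its own memory

independent[:] = 99             # does not touch data
print(data)

view[:] = 4                     # writes through to data
print(data)

print(np.shares_memory(data, view), np.shares_memory(data, independent))
```

`np.shares_memory` is a convenient way to check whether two arrays are backed by the same buffer.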

<br>Two-dimensional arrays:

In [41]:
array2d= rng.standard_normal((4, 4))
array2d

array([[ 0.06114402,  0.0709146 ,  0.43365454,  0.27748366],
       [ 0.53025239,  0.53672097,  0.61835001, -0.79501746],
       [ 0.30003095, -1.60270159,  0.26679883, -1.26162378],
       [-0.07127081,  0.47404973, -0.41485376,  0.0977165 ]])

In [43]:
array2d[2]

array([ 0.30003095, -1.60270159,  0.26679883, -1.26162378])

In [39]:
array2d[1][3]; array2d[1, 3]  # both forms select row 1, column 3

np.float64(0.2575577684128723)

<br>
When slicing 2D arrays, the first slice selects along axis 0 (rows) and the second along axis 1 (columns).

In [42]:
array2d[:2, 1:3]

array([[0.0709146 , 0.43365454],
       [0.53672097, 0.61835001]])

### Boolean indexing

In [47]:
names= np.array(["Joe", "Will", "Steven", "Steven", "Joe", "Will", "Joe"])
info= np.array([[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2], [-12, -4], [3, 4]])

In [54]:
names == "Steven"

array([False, False,  True,  True, False, False, False])

In [58]:
~(names == "Steven")  # '~' negates the condition

array([ True,  True, False, False,  True,  True,  True])

In [48]:
info[names == "Joe"]

array([[4, 7],
       [1, 2],
       [3, 4]])

In [51]:
info[names == "Will", 1:]

array([[ 2],
       [-4]])

In [56]:
info[names == "Will", 1]

array([ 2, -4])

In [61]:
mask= (names == "Steven") | (names == "Joe")
info[mask]

array([[ 4,  7],
       [-5,  6],
       [ 0,  0],
       [ 1,  2],
       [ 3,  4]])

**Tip:** The Python keywords `and` and `or` do not work with Boolean arrays; use `&` and `|` instead.
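
A small sketch of the tip above, reusing the `names` array from this section. The `raised` flag is only there to make the failure visible:

```python
import numpy as np

names = np.array(["Joe", "Will", "Steven", "Steven", "Joe", "Will", "Joe"])

# Element-wise operators: | (or), & (and), ~ (not). The parentheses are
# required because these operators bind more tightly than ==.
either = (names == "Steven") | (names == "Joe")
neither = ~either

try:
    (names == "Steven") or (names == "Joe")
    raised = False
except ValueError:
    # The truth value of an array with several elements is ambiguous.
    raised = True

print(either)
print(neither)
print(raised)
```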

### Fancy indexing

In [62]:
arr= np.ones((8,4))
for i in range(8):
    arr[i]= i
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

In [63]:
arr[[2,0,5,4]]

array([[2., 2., 2., 2.],
       [0., 0., 0., 0.],
       [5., 5., 5., 5.],
       [4., 4., 4., 4.]])

In [64]:
arr[[-2,-5,-1]]

array([[6., 6., 6., 6.],
       [3., 3., 3., 3.],
       [7., 7., 7., 7.]])

In [67]:
arr= np.arange(32).reshape((8, 4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [68]:
arr[[1, 5, 7, 2], [0, 3, 1, 2]]

array([ 4, 23, 29, 10])

In [71]:
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
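
Unlike slicing, fancy indexing copies the selected data. The sketch below checks that, and also shows `np.ix_` as one possible alternative to the chained selection above (the names `chained` and `meshed` are illustrative):

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

# Chained selection: pick rows first, then reorder the columns.
chained = arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

# np.ix_ builds an open mesh so the same block is selected in one step.
meshed = arr[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
print(np.array_equal(chained, meshed))

# The result is a copy, so modifying it leaves arr intact.
meshed[0, 0] = -1
print(arr[1, 0])
```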

### Transposing arrays and swapping axes

In [72]:
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [82]:
arr.T; arr.transpose(); arr.swapaxes(0, 1)

array([[ 0,  4,  8, 12, 16, 20, 24, 28],
       [ 1,  5,  9, 13, 17, 21, 25, 29],
       [ 2,  6, 10, 14, 18, 22, 26, 30],
       [ 3,  7, 11, 15, 19, 23, 27, 31]])

### **Pseudorandom Number Generator**

To create a random number generator with a specific seed, do the following:

In [85]:
rng= np.random.default_rng(seed=12345)
type(rng)

numpy.random._generator.Generator

In [89]:
arr= rng.uniform(size=(1, 10))
arr

array([[0.31675834, 0.79736546, 0.67625467, 0.39110955, 0.33281393,
        0.59830875, 0.18673419, 0.67275604, 0.94180287, 0.24824571]])
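
A seeded generator is reproducible, but it keeps advancing with every call, which is why re-running cells out of order gives different draws. A minimal sketch, assuming two independent generators `rng_a` and `rng_b` created only for this illustration:

```python
import numpy as np

rng_a = np.random.default_rng(seed=12345)
rng_b = np.random.default_rng(seed=12345)

# Same seed and same sequence of calls -> identical draws.
first_a = rng_a.standard_normal(3)
first_b = rng_b.standard_normal(3)
print(np.array_equal(first_a, first_b))

# The generator state has advanced, so the next draw is different.
second_a = rng_a.standard_normal(3)
print(np.array_equal(first_a, second_a))

# Integer draws come from the same object; high is exclusive.
dice = rng_a.integers(low=1, high=7, size=5)
print(dice)
```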

### **Universal Functions: Fast Element-Wise Array Functions**

Universal functions, or *ufuncs*, fall into two groups:
- *Unary* ufuncs, which transform a single array element by element (e.g. `abs`, `fabs`, `sqrt`, `exp`, `isnan`, `modf`)
- *Binary* ufuncs, which take two arrays and return a single new array (e.g. `add`, `maximum`, `fmax`, `greater`, `logical_and`, `mod`)
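
A brief sketch of both groups on small hand-written arrays (`values` and `other` are illustrative, not taken from the cells in this notebook):

```python
import numpy as np

values = np.array([-2.5, 0.0, 1.75, 4.0])
other = np.array([1.0, -1.0, 2.0, 3.0])

# Unary ufuncs act on one array.
magnitudes = np.abs(values)
frac, whole = np.modf(values)  # modf returns two arrays: fractional and integral parts

# Binary ufuncs combine two arrays element by element.
larger = np.maximum(values, other)
total = np.add(values, other)

print(magnitudes)
print(frac, whole)
print(larger)
print(total)
```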

In [101]:
arr= arr.reshape((5,2))
data= data.reshape((5,2))
np.square(arr)

array([[0.10033585, 0.63579167],
       [0.45732038, 0.15296668],
       [0.11076511, 0.35797336],
       [0.03486966, 0.45260069],
       [0.88699264, 0.06162593]])

In [100]:
np.maximum(arr, data)

array([[0.31675834, 1.        ],
       [4.        , 4.        ],
       [4.        , 5.        ],
       [1.        , 1.        ],
       [1.        , 1.        ]])

### Expressing conditional logic as array operations
The `np.where()` function works as a vectorized version of the ternary expression `x if cond else y`.

In [111]:
cond= np.array([True, False, True, True, False])
xarr= np.ones(5)
yarr= np.ones(5) * 2
for i in range(5):
    xarr[i]+= (i+1)/10
    yarr[i]+= (i+1)/10

xarr

array([1.1, 1.2, 1.3, 1.4, 1.5])

In [108]:
yarr

array([2.1, 2.2, 2.3, 2.4, 2.5])

In [112]:
result= np.where(cond, xarr, yarr)
result

array([1.1, 2.2, 1.3, 1.4, 2.5])
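
The second and third arguments of `np.where` do not have to be arrays; scalars are broadcast. A minimal sketch on a random 3x3 array created just for this example:

```python
import numpy as np

rng = np.random.default_rng(seed=12345)
arr = rng.standard_normal((3, 3))

# Replace positive values with 1 and everything else with -1.
signs = np.where(arr > 0, 1, -1)

# Mix a scalar with the original values: cap positives at 1.0, keep the rest.
capped = np.where(arr > 0, 1.0, arr)

print(signs)
print(capped)
```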

### Mathematical and statistical methods
*Basic array statistical and mathematical methods*

| Method         	| Description                                                                      	|
|-------------------|-----------------------------------------------------------------------------------|
| `sum`            	| Sum of all elements in the array or along an axis; zero-length arrays have sum 0 	|
| `mean`           	| Arithmetic mean; invalid (returns NaN) on zero-length arrays                     	|
| `std, var`       	| Standard deviation and variance, respectively                                    	|
| `min, max`       	| Minimum and maximum                                                              	|
| `argmin, argmax` 	| Indices of minimum and maximum elements, respectively                            	|
| `cumsum`         	| Cumulative sum of elements starting from 0                                       	|
| `cumprod`       	| Cumulative product of elements starting from 1                                   	|
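
As a sketch of how the methods in the table behave with and without the `axis` argument, using a small 3x3 array chosen for easy checking:

```python
import numpy as np

arr = np.array([[0, 1, 2],
                [3, 4, 5],
                [6, 7, 8]])

print(arr.sum())            # 36, over every element
print(arr.sum(axis=0))      # column totals: 9, 12, 15
print(arr.mean(axis=1))     # row means: 1, 4, 7
print(arr.argmax())         # 8, an index into the flattened array
print(arr.cumsum(axis=0))   # running sum down each column
print(arr.cumprod(axis=1))  # running product along each row
```

With `axis=0` the computation runs down the rows (one result per column); with `axis=1` it runs across the columns (one result per row).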

### Methods for boolean arrays
Boolean values are coerced to 1 (True) or 0 (False).

In [10]:
arr= rng.standard_normal(18)
arr > 0

array([False, False, False,  True,  True, False,  True,  True,  True,
       False, False,  True, False, False, False, False,  True, False])

In [11]:
(arr > 0).sum()  # Number of positive values

np.int64(7)

In [12]:
(arr <= 0).sum()  # Number of non-positive values (zeros included)

np.int64(11)

In [14]:
bools= np.array(arr < 0)

bools.any()  # Is there one or more True values?

np.True_

In [15]:
bools.all()  # Are all values True?

np.False_

**Tip:** To sort an array, use the function `np.sort()`, which returns a sorted copy, or the instance method `sort()`, which sorts in place.
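
A short sketch of the difference between the two sorting options in the tip, on illustrative arrays:

```python
import numpy as np

arr = np.array([5, -1, 3, 0])

sorted_copy = np.sort(arr)  # new sorted array; arr keeps its order
print(sorted_copy, arr)

arr.sort()                  # sorts arr in place and returns None
print(arr)

grid = np.array([[3, 1, 2],
                 [9, 7, 8]])
grid.sort(axis=1)           # sort each row independently
print(grid)
```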

### Unique and other set methods

| Method            | Description                                                                        |
|-------------------|------------------------------------------------------------------------------------|
| unique(x)         | Compute the sorted, unique elements in x                                           |
| intersect1d(x, y) | Compute the sorted, common elements in x and y                                     |
| union1d(x, y)     | Compute the sorted union of elements                                               |
| in1d(x, y)        | Compute a Boolean array indicating whether each element of x is contained in y (deprecated since NumPy 2.0 in favor of `isin`) |
| setdiff1d(x, y)   | Set difference, elements in x that are not in y                                    |
| setxor1d(x, y)    | Set symmetric differences; elements that are in either of the arrays, but not both |

**Tip:** The `unique()` function returns the data already sorted.
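
The set functions from the table, applied to two small illustrative arrays. The membership test uses `np.isin`, which takes the same two arguments as `in1d`:

```python
import numpy as np

x = np.array([3, 1, 2, 3, 1])
y = np.array([2, 3, 5])

print(np.unique(x))          # sorted unique values: 1, 2, 3
print(np.intersect1d(x, y))  # values present in both: 2, 3
print(np.union1d(x, y))      # values present in either: 1, 2, 3, 5
print(np.isin(x, y))         # membership test for each element of x
print(np.setdiff1d(x, y))    # in x but not in y: 1
print(np.setxor1d(x, y))     # in exactly one of the arrays: 1, 5
```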

### **File Input and Output with Arrays**
Saving array data:

In [19]:
data= rng.standard_normal(10)
data2= rng.standard_normal(10)

np.save("important_info.npy", data)  # Saves one array
np.savez('important_info2', a= data, b= data2)  # Saves multiple arrays
np.savez_compressed('compressed_info.npz', a= data, b= data2)  # Compress and save multiple arrays

Load data from file:

In [21]:
np.load("important_info.npy")

array([-0.34143629,  1.58453379,  0.28224121,  0.90954639,  0.39507157,
       -0.66937652,  1.55536898, -1.23813903, -1.19617346, -0.42914951])

In [25]:
arrl= np.load("important_info2.npz")
arrl

NpzFile 'important_info2.npz' with keys: a, b

In [27]:
arrl['a']

array([-0.34143629,  1.58453379,  0.28224121,  0.90954639,  0.39507157,
       -0.66937652,  1.55536898, -1.23813903, -1.19617346, -0.42914951])

In [28]:
arrl['b']

array([-0.72965989, -0.5574689 , -0.59995306,  0.9868272 ,  0.05419468,
        0.35190744, -1.58796951, -0.84695135,  1.08457026, -1.20382665])
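
A self-contained round trip, as a sketch: it writes to a temporary directory instead of the working directory, and the file name `example_arrays.npz` is hypothetical. Opening the archive in a `with` block closes the underlying file once the arrays have been read.

```python
import tempfile
from pathlib import Path

import numpy as np

rng = np.random.default_rng(seed=12345)
a = rng.standard_normal(10)
b = rng.standard_normal(10)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example_arrays.npz"  # hypothetical file name
    np.savez(path, a=a, b=b)

    # NpzFile loads arrays lazily by key; the with-block closes the file.
    with np.load(path) as archive:
        keys = sorted(archive.files)
        a_loaded = archive["a"]
        b_loaded = archive["b"]

print(keys)
print(np.array_equal(a, a_loaded), np.array_equal(b, b_loaded))
```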

### **Linear Algebra**
*Commonly used `numpy.linalg` functions*
| Function | Description                                                                                                                                              |
|----------|----------------------------------------------------------------------------------------------------------------------------------------------------------|
| diag     | Return the diagonal (or off-diagonal) elements of a square matrix as 1D array,<br>or convert a 1D array into a square matrix with zeros on the off-diagonal |
| dot      | Matrix multiplication                                                                                                                                    |
| trace    | Compute the sum of diagonal elements                                                                                                                     |
| det      | Compute the matrix determinant                                                                                                                           |
| eig      | Compute the eigenvalues and eigenvectors of a square matrix                                                                                              |
| inv      | Compute the inverse of a square matrix                                                                                                                   |
| pinv     | Compute the Moore-Penrose pseudoinverse of a matrix                                                                                                      |
| qr       | Compute the QR decomposition                                                                                                                             |
| svd      | Compute the singular value decomposition (SVD)                                                                                                           |
| solve    | Solve the linear system Ax = b for x, where A is a square matrix                                                                                         |
| lstsq    | Compute the least-squares solution to Ax = b                                                                                                             |

**Tip:** The infix operator `@` also performs matrix multiplication; for 1D and 2D arrays it gives the same result as `dot`.

In [32]:
arr.T @ arr;  np.dot(arr.T, arr)

np.float64(17.92271369355219)
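
The cell above multiplies a 1D array by itself, which yields a scalar dot product. As a sketch of a few functions from the table on a 2x2 system (the matrix `A` and vector `b` are made up for the example):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

x = np.linalg.solve(A, b)  # solves A @ x = b
A_inv = np.linalg.inv(A)

print(x)                                  # approximately [2, 3]
print(np.allclose(A @ x, b))
print(np.allclose(A @ A_inv, np.eye(2)))
print(np.linalg.det(A))                   # approximately 5
print(np.trace(A), np.diag(A))
```

For solving linear systems, `solve` is generally preferable to computing `inv(A) @ b` explicitly.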