## Section 5a: Introduction to NumPy

Two main libraries are used for data analysis in Python: NumPy and Pandas. This notebook covers NumPy; Pandas is covered in the next notebook.

Most people use NumPy for its arrays. NumPy arrays are very similar to Python lists, but they come with many built-in functions that make them far more flexible than a plain Python list.

### Section 5a.1 Creation of NumPy Arrays

#### Section 5a.1.1 Creation through a list

The creation of a NumPy array can be through a list:

In [None]:
import numpy as np
np.array([2, 5, 3, 1, 4])

array([2, 5, 3, 1, 4])

All elements in a NumPy array must have the same type. If any element is a floating-point (decimal) number, every element is cast to a floating-point number.

In [None]:
np.array([5.555, 1, 2, 3])

array([5.555, 1.   , 2.   , 3.   ])

As the array above shows, all the numbers have been cast to floating-point numbers.

The element type of the array can be set explicitly through the dtype argument.

In [None]:
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)
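
As a small illustration of the casting rule above, the following sketch inspects the `dtype` of an array and converts an existing array with `astype`. The variable names are illustrative only, and the `float64` default assumes a typical NumPy installation.

```python
import numpy as np

# Mixing integers with one float upcasts every element to a float dtype.
mixed = np.array([5.555, 1, 2, 3])
print(mixed.dtype)        # float64

# An explicit dtype overrides the inferred one.
as_float32 = np.array([1, 2, 3, 4], dtype='float32')
print(as_float32.dtype)   # float32

# astype returns a new array; converting float to int drops the fractional part.
truncated = mixed.astype(int)
print(truncated.tolist())  # [5, 1, 2, 3]
```

Note that `astype` leaves `mixed` unchanged and returns a converted copy.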

#### Section 5a.1.2 Creation through a function


In [None]:
np.zeros(5, dtype=float)

array([0., 0., 0., 0., 0.])

In [None]:
np.ones((2, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [None]:
np.full((2, 5), 1.23)

array([[1.23, 1.23, 1.23, 1.23, 1.23],
       [1.23, 1.23, 1.23, 1.23, 1.23]])

#### Section 5a.1.3 Creation through range


Create an array of values from 0 up to 20 (exclusive), where the difference between consecutive numbers is 2.

In [None]:
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

Create an array of 10 values evenly spaced between 0 and 18 (both inclusive).


In [None]:
np.linspace(0, 18, 10)

array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14., 16., 18.])
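
The two range functions differ mainly in what you specify (a step versus a count) and in whether the stop value is included. The sketch below compares them; the variable names are illustrative only.

```python
import numpy as np

# arange takes a step size and excludes the stop value.
stepped = np.arange(0, 20, 2)

# linspace takes a number of points and includes the stop value by default.
spaced = np.linspace(0, 18, 10)

print(len(stepped), len(spaced))              # 10 10
print(stepped.dtype.kind, spaced.dtype.kind)  # i f  (integers vs floats)
print(np.array_equal(stepped, spaced))        # True (same values, different dtype)

# endpoint=False makes linspace exclude the stop value, like arange.
open_ended = np.linspace(0, 20, 10, endpoint=False)
print(np.array_equal(open_ended, stepped))    # True
```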

#### Section 5a.1.4 Creation through random function

Create a 2 by 5 array of random values between 0 and 1.

In [None]:
np.random.random((2, 5))

array([[0.57351231, 0.42966476, 0.71038471, 0.51620601, 0.11888219],
       [0.94873952, 0.31726818, 0.74798847, 0.80668331, 0.68830432]])

If you need values in a different range, for example between 3 and 9:

In [None]:
upper = 9
lower = 3

np.random.random((2, 5)) * (upper - lower) + lower

array([[4.34133936, 7.47302281, 5.19628002, 7.79213455, 7.97856028],
       [3.65980181, 6.64480274, 8.08396002, 3.36039985, 4.50406058]])
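
The rescaling above can also be done in one call with `np.random.uniform`, which takes the lower and upper bounds directly. The sketch below checks both approaches; the seed value is arbitrary and only there to make the run repeatable.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, used only so the sketch is repeatable
lower, upper = 3, 9

# Rescale samples from [0, 1) to the range between lower and upper.
scaled = np.random.random((2, 5)) * (upper - lower) + lower

# np.random.uniform draws from the requested range directly.
direct = np.random.uniform(lower, upper, (2, 5))

print(scaled.min() >= lower and scaled.max() <= upper)  # True
print(direct.min() >= lower and direct.max() <= upper)  # True
print(direct.shape)                                     # (2, 5)
```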

Create a 2 by 5 array of values drawn from a normal distribution centered at 3 with a standard deviation of 9.

In [None]:
np.random.normal(3, 9, (2, 5))

array([[  3.91521554,   7.7991163 , -10.54817637,   6.55411922,
         14.65402889],
       [ -2.31619832,  15.92887649,  -8.16846235,   9.7639678 ,
          9.81054729]])

Create a 2 by 5 array of random integers from 0 to 10 (exclusive).

In [None]:
np.random.randint(0, 10, (2, 5))

array([[9, 6, 4, 4, 3],
       [9, 1, 9, 8, 8]])

Create an identity matrix.

In [None]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

### Section 5a.2 NumPy Arrays Fundamentals

#### Section 5a.2.1 NumPy Arrays Indexing

In [None]:
np.random.seed(5)

np_array0 = np.random.randint(10, size=5)
np_array1 = np.random.randint(10, size=(2, 5))
np_array2 = np.random.randint(10, size=(2, 5, 3))

In [None]:
print(f"Number of Dimensions: {np_array2.ndim}")
print(f"The shape of the array: {np_array2.shape}")
print(f"The number of entries in the array (2 * 5 * 3): {np_array2.size}")

Number of Dimensions: 3
The shape of the array: (2, 5, 3)
The number of entries in the array (2 * 5 * 3): 30
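
The three attributes are related: `ndim` is the length of `shape`, and `size` is the product of the entries in `shape`. A minimal sketch, using a zero-filled array of the same shape as a stand-in:

```python
import numpy as np

# Stand-in array with the same shape as np_array2.
arr = np.zeros((2, 5, 3))

print(arr.ndim)                         # 3
print(arr.shape)                        # (2, 5, 3)
print(arr.size)                         # 30
print(arr.ndim == len(arr.shape))       # True
print(arr.size == np.prod(arr.shape))   # True
```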


In [None]:
np_array0

array([3, 6, 6, 0, 9])

In [None]:
np_array0[0]

3

In [None]:
np_array1

array([[8, 4, 7, 0, 0],
       [7, 1, 5, 7, 0]])

In [None]:
np_array1[0, 0]

8

In [None]:
np_array1[0, 1]

4

In [None]:
np_array1[1, 0]

7

In [None]:
np_array1[1, -8]

IndexError: index -8 is out of bounds for axis 1 with size 5
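
Negative indices count from the end of an axis, and an index outside the valid range raises an `IndexError`. The sketch below writes out the values of `np_array1` explicitly (under the illustrative name `grid`) so that it does not depend on the random seed.

```python
import numpy as np

# Same values as np_array1, written out so the sketch is self-contained.
grid = np.array([[8, 4, 7, 0, 0],
                 [7, 1, 5, 7, 0]])

print(grid[1, 0])    # 7  (second row, first column)
print(grid[1, -1])   # 0  (second row, last column)
print(grid[0, -5])   # 8  (-5 is the first of five columns)

# Axis 1 has only 5 entries, so -8 is out of bounds.
try:
    grid[1, -8]
except IndexError as err:
    print("IndexError:", err)
```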

#### Section 5a.2.2 NumPy Arrays Slicing

The colon (:) character is used to access a slice of the array. The slice notation has three sections:

```
some_array[start:stop:step]
```

In [None]:
np_array0

array([3, 6, 6, 0, 9])

In [None]:
np_array0[:2]

array([3, 6])

Every other element, starting at index 0. Hence we get indexes 0, 2 and 4.

In [None]:
np_array0[0::2]

array([3, 6, 9])

Reverse the elements in the array:

In [None]:
np_array0[::-1]

array([9, 0, 6, 6, 3])

In [None]:
np_array0[::-2]

array([9, 6, 3])

Every other element in reverse order, starting from index 3:

In [None]:
np_array0[3::-2]

array([0, 6])

Recall that np_array1 looks like the following:

In [None]:
np_array1

array([[8, 4, 7, 0, 0],
       [7, 1, 5, 7, 0]])

In [None]:
np_array1[:2, :3]

array([[8, 4, 7],
       [7, 1, 5]])

Every other column (the row slice :3 asks for up to three rows, so it returns both available rows):

In [None]:
np_array1[:3, ::2]

array([[8, 7, 0],
       [7, 5, 0]])
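
One point worth knowing when slicing: a basic slice of a NumPy array is a view onto the same data, not a copy, so writing to the slice also changes the original. A minimal sketch, with `grid` as an illustrative stand-in for `np_array1`:

```python
import numpy as np

grid = np.array([[8, 4, 7, 0, 0],
                 [7, 1, 5, 7, 0]])

# A basic slice shares memory with the original array.
view = grid[:, ::2]
view[0, 0] = -1
print(grid[0, 0])                     # -1 (the original changed too)
print(np.shares_memory(view, grid))   # True

# .copy() gives an independent array.
independent = grid[:, ::2].copy()
independent[0, 0] = 100
print(grid[0, 0])                     # -1 (the original is unaffected)
```

Call `.copy()` whenever the slice needs to be modified without touching the source array.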

### Section 5a.3 Combining NumPy Arrays

In [None]:
x = np_array0
y = np_array0[::-1]

In [None]:
np.concatenate([x, y])

array([3, 6, 6, 0, 9, 9, 0, 6, 6, 3])

In [None]:
np.hstack([x, y])

array([3, 6, 6, 0, 9, 9, 0, 6, 6, 3])

In [None]:
np.vstack([x, y])

array([[3, 6, 6, 0, 9],
       [9, 0, 6, 6, 3]])

In [None]:
np_array1

array([[8, 4, 7, 0, 0],
       [7, 1, 5, 7, 0]])

In [None]:
np.concatenate([np_array1, np_array1])

array([[8, 4, 7, 0, 0],
       [7, 1, 5, 7, 0],
       [8, 4, 7, 0, 0],
       [7, 1, 5, 7, 0]])

In [None]:
np.concatenate([np_array1, np_array1], axis=1)

array([[8, 4, 7, 0, 0, 8, 4, 7, 0, 0],
       [7, 1, 5, 7, 0, 7, 1, 5, 7, 0]])

In [None]:
np.vstack([np_array0, np_array1])

array([[3, 6, 6, 0, 9],
       [8, 4, 7, 0, 0],
       [7, 1, 5, 7, 0]])

In [None]:
y = np.array([[99],
              [99]])
np.hstack([np_array1, y])

array([[ 8,  4,  7,  0,  0, 99],
       [ 7,  1,  5,  7,  0, 99]])
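
When combining arrays, every dimension except the one being joined must match. The sketch below checks the resulting shapes and shows the error raised on a mismatch; `a` and `column` are illustrative names.

```python
import numpy as np

a = np.array([[8, 4, 7, 0, 0],
              [7, 1, 5, 7, 0]])
column = np.array([[99],
                   [99]])

# axis=0 stacks rows, axis=1 stacks columns.
print(np.concatenate([a, a], axis=0).shape)   # (4, 5)
print(np.concatenate([a, a], axis=1).shape)   # (2, 10)

# hstack works here because both arrays have 2 rows.
print(np.hstack([a, column]).shape)           # (2, 6)

# vstack fails because the column counts differ (5 vs 1).
try:
    np.vstack([a, column])
except ValueError as err:
    print("ValueError:", err)
```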

### Section 5a.4 NumPy Functions

In [None]:
np_array4 = np.arange(1, 5)

In [None]:
print("np_array4     =", np_array4)
print("np_array4 + 5 =", np_array4 + 5)
print("np_array4 - 5 =", np_array4 - 5)
print("np_array4 * 2 =", np_array4 * 2)
print("np_array4 / 2 =", np_array4 / 2)
print("np_array4 // 2 =", np_array4 // 2)  # floor division

np_array4     = [1 2 3 4]
np_array4 + 5 = [6 7 8 9]
np_array4 - 5 = [-4 -3 -2 -1]
np_array4 * 2 = [2 4 6 8]
np_array4 / 2 = [0.5 1.  1.5 2. ]
np_array4 // 2 = [0 1 1 2]
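
The same operators also work element by element between two arrays of the same shape, which replaces an explicit Python loop. A short sketch with illustrative names:

```python
import numpy as np

values = np.arange(1, 5)                 # [1 2 3 4]
weights = np.array([10, 20, 30, 40])

# Arithmetic between arrays is applied element by element.
print((values + weights).tolist())   # [11, 22, 33, 44]
print((values * weights).tolist())   # [10, 40, 90, 160]

# The vectorized form gives the same result as a list comprehension.
looped = np.array([v * 2 for v in values])
print(np.array_equal(looped, values * 2))   # True
```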


In [None]:
print("np_array4     =", np_array4)
print("e^np_array4   =", np.exp(np_array4))
print("2^np_array4   =", np.exp2(np_array4))
print("3^np_array4   =", np.power(3, np_array4))

np_array4     = [1 2 3 4]
e^np_array4   = [ 2.71828183  7.3890561  20.08553692 54.59815003]
2^np_array4   = [ 2.  4.  8. 16.]
3^np_array4   = [ 3  9 27 81]


In [None]:
print("np_array4        =", np_array4)
print("ln(np_array4)    =", np.log(np_array4))
print("log2(np_array4)  =", np.log2(np_array4))
print("log10(np_array4) =", np.log10(np_array4))

np_array4        = [1 2 3 4]
ln(np_array4)    = [0.         0.69314718 1.09861229 1.38629436]
log2(np_array4)  = [0.        1.        1.5849625 2.       ]
log10(np_array4) = [0.         0.30103    0.47712125 0.60205999]


In [None]:
big_array = np.random.rand(1000000)
%timeit sum(big_array)
%timeit np.sum(big_array)

88.3 ms ± 11.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
382 µs ± 15.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [None]:
%timeit min(big_array)
%timeit np.min(big_array)

79.9 ms ± 25.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
439 µs ± 1.99 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
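
The timings above compare Python's built-in functions with their NumPy counterparts on the same data. On a one-dimensional array both give the same answer, up to floating-point rounding in the case of the sum. A quick check, using a smaller array and an arbitrary seed:

```python
import numpy as np

np.random.seed(0)  # arbitrary seed for repeatability
sample = np.random.rand(1000)

# The sums can differ in the last few digits, so compare with a tolerance.
print(np.isclose(sum(sample), np.sum(sample)))   # True

# The minimum is an element of the array, so it matches exactly.
print(min(sample) == np.min(sample))             # True

# The method form is equivalent to the function form.
print(np.sum(sample) == sample.sum())            # True
```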


In [None]:
np_array1

array([[8, 4, 7, 0, 0],
       [7, 1, 5, 7, 0]])

In [None]:
np_array1.shape

(2, 5)

In [None]:
np_array1.min(axis=0)

array([7, 1, 5, 0, 0])

In [None]:
np_array1.min(axis=1)

array([0, 0])

In [None]:
np_array1.min()

0
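
The `axis` argument names the dimension that is collapsed: `axis=0` aggregates down each column, `axis=1` aggregates across each row, and omitting it aggregates over the whole array. The sketch below applies this to the values of `np_array1`, written out under the illustrative name `grid`.

```python
import numpy as np

grid = np.array([[8, 4, 7, 0, 0],
                 [7, 1, 5, 7, 0]])

# axis=0 collapses the rows: one result per column.
print(grid.min(axis=0).tolist())   # [7, 1, 5, 0, 0]
print(grid.sum(axis=0).tolist())   # [15, 5, 12, 7, 0]

# axis=1 collapses the columns: one result per row.
print(grid.min(axis=1).tolist())   # [0, 0]
print(grid.sum(axis=1).tolist())   # [19, 20]

# No axis: a single result for the whole array.
print(grid.sum())                  # 39
```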

The following table provides a list of useful aggregation functions available in NumPy:

|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |
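
The NaN-safe versions in the table ignore missing values, whereas the plain versions return `nan` as soon as one is present. A minimal sketch with made-up readings:

```python
import numpy as np

# Illustrative data with one missing value.
readings = np.array([1.0, 2.0, np.nan, 4.0])

print(np.sum(readings))       # nan
print(np.nansum(readings))    # 7.0
print(np.max(readings))       # nan
print(np.nanmax(readings))    # 4.0
print(np.nanmean(readings))   # mean of 1, 2 and 4
```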

### Section 5a.5 Sorting Functions

In [None]:
medium_array = np.random.rand(10000)

In [None]:
import numpy as np

def selection_sort(x):
    # Sorts x in place: at step i, swap the smallest remaining value into position i.
    for i in range(len(x)):
        swap = i + np.argmin(x[i:])
        (x[i], x[swap]) = (x[swap], x[i])
    return x

In [None]:
%timeit np.sort(big_array)

107 ms ± 6.87 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [None]:
%timeit selection_sort(medium_array)

103 ms ± 19.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
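
Note that the two timings above are not on equal inputs: `np.sort` is run on `big_array` (1,000,000 elements) while `selection_sort` is run on `medium_array` (10,000 elements). Also, `selection_sort` modifies its argument in place, whereas `np.sort` returns a sorted copy. The sketch below repeats the function so it is self-contained and checks both points on a small array with an arbitrary seed.

```python
import numpy as np

def selection_sort(x):
    # Same algorithm as above: swap the smallest remaining value into position i.
    for i in range(len(x)):
        swap = i + np.argmin(x[i:])
        (x[i], x[swap]) = (x[swap], x[i])
    return x

np.random.seed(0)  # arbitrary seed for repeatability
data = np.random.rand(100)
original = data.copy()

# Sort a copy so that data itself stays untouched.
result = selection_sort(data.copy())
print(np.array_equal(result, np.sort(original)))   # True
print(np.array_equal(data, original))              # True

# Without the copy, the input array itself ends up sorted.
selection_sort(data)
print(np.array_equal(data, np.sort(original)))     # True
```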


In [None]:
np.sort(np_array1, axis=0)

array([[7, 1, 5, 0, 0],
       [8, 4, 7, 7, 0]])

In [None]:
np.sort(np_array1, axis=1)

array([[0, 0, 4, 7, 8],
       [0, 1, 5, 7, 7]])

### Section 5a.6 Boolean Indexing

In [None]:
np_array1

array([[8, 4, 7, 0, 0],
       [7, 1, 5, 7, 0]])

In [None]:
np_array1 < 3

array([[False, False, False,  True,  True],
       [False,  True, False, False,  True]])

In [None]:
np_array1[np_array1 < 3]

array([0, 0, 1, 0])

In [None]:
np.argwhere(np_array1 < 3)

array([[0, 3],
       [0, 4],
       [1, 1],
       [1, 4]])
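
Boolean masks can be combined with `&` (and) and `|` (or), with each condition wrapped in parentheses, and `np.where` can replace the matching entries rather than extract them. A short sketch on the values of `np_array1`, written out under the illustrative name `grid`:

```python
import numpy as np

grid = np.array([[8, 4, 7, 0, 0],
                 [7, 1, 5, 7, 0]])

# Combine two conditions; the parentheses are required.
mask = (grid > 0) & (grid < 5)
print(grid[mask].tolist())           # [4, 1]

# Count how many entries satisfy a condition.
print(np.count_nonzero(grid < 3))    # 4

# Replace entries below 3 with -1 and keep the rest.
replaced = np.where(grid < 3, -1, grid)
print(replaced.tolist())             # [[8, 4, 7, -1, -1], [7, -1, 5, 7, -1]]
```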

https://www.hackerrank.com/2023-python-quiz04/