# Introduction to NumPy

## What is NumPy?

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

## Why NumPy?

Put simply, NumPy is fast at numerical operations. This is largely because its core routines are written in C and compiled ahead of time.

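To speed up calculations, NumPy relies on vectorisation: an operation is applied to a whole array at once in compiled code rather than element by element in a Python loop, which becomes slow on large datasets. Broadcasting extends this to arrays of different but compatible shapes.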
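As a rough illustration of the point above, the sketch below doubles an array with a Python loop and with a vectorised expression, then adds a row to a 2 x 3 array via broadcasting. The variable names and sample values are made up for this example.

```python
import numpy as np

# Hypothetical sample data, used only for illustration.
values = np.arange(5)  # array([0, 1, 2, 3, 4])

# Loop version: Python handles one element at a time.
doubled_loop = np.array([v * 2 for v in values])

# Vectorised version: the multiplication runs inside NumPy's compiled code.
doubled_vectorised = values * 2

# Broadcasting: the 1-D row is applied to each row of the 2-D array.
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
row = np.array([10, 20, 30])
shifted = grid + row

print(doubled_vectorised)  # [0 2 4 6 8]
print(shifted)
```

Both versions produce the same values; the vectorised form is usually the faster one on large arrays, though the exact gain depends on the data and the machine.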

Finally, NumPy is the backbone of other scientific Python packages, such as pandas, SciPy and scikit-learn.

## NumPy DataTypes and Attributes

In [1]:
# --- Import NumPy and pandas (needed for later on)
import numpy as np
import pandas as pd

In [2]:
# --- NumPy uses ndarray (n-dimensional array) for its main datatype.
# --- Create a simple one-dimensional array, also called a vector.
# --- Note: This has a shape of (3,): a single axis holding three elements, not one row by three columns:
sample_array_1 = np.array([1,2,3])
sample_array_1

array([1, 2, 3])

In [3]:
# --- Create a two-dimensional array.
# --- Note 1: This has a shape of 2,3 (two rows, three columns).
# --- Note 2: As there is a float in the array, all the numbers will be converted to float:
sample_array_2 = np.array([[1, 2.0, 3.3],
                           [4, 5, 6.5]])
sample_array_2

array([[1. , 2. , 3.3],
       [4. , 5. , 6.5]])

In [4]:
# --- Create a multi-dimensional array.
# --- Note: This has a shape of 2, 3, 3 (two matrices deep, three rows and three columns per matrix):
sample_array_3 = np.array([[[1, 2, 3],
                            [4, 5, 6],
                            [7, 8, 9]],
                           [[10, 11, 12],
                            [13, 14, 15],
                            [16, 17, 18]]])
sample_array_3

array([[[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9]],

       [[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]]])

In [5]:
# --- Show the shape and size of each sample array:
print(f"sample array 1. shape: {sample_array_1.shape}, size: {sample_array_1.size}")
print(f"sample array 2. shape: {sample_array_2.shape}, size: {sample_array_2.size}")
print(f"sample array 3. shape: {sample_array_3.shape}, size: {sample_array_3.size}")

sample array 1. shape: (3,), size: 3
sample array 2. shape: (2, 3), size: 6
sample array 3. shape: (2, 3, 3), size: 18


In [6]:
# --- Show the number of dimensions for each sample array:
sample_array_1.ndim, sample_array_2.ndim, sample_array_3.ndim

(1, 2, 3)
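Alongside `shape`, `size` and `ndim`, every ndarray has a `dtype` attribute describing the type of its elements. The sketch below rebuilds two small arrays under new, illustrative names and inspects them; the exact integer dtype is platform-dependent, so it is not stated as a fixed value.

```python
import numpy as np

# Illustrative arrays mirroring the samples above.
int_array = np.array([1, 2, 3])
mixed_array = np.array([[1, 2.0, 3.3],
                        [4, 5, 6.5]])

# type() reports the container class; .dtype reports the element type.
print(type(int_array))    # <class 'numpy.ndarray'>
print(int_array.dtype)    # a platform-dependent integer type, often int64
print(mixed_array.dtype)  # float64
```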

In [7]:
# --- Create a pandas dataframe from an ndarray:
sample_df_2 = pd.DataFrame(sample_array_2)
sample_df_2


     0    1    2
0  1.0  2.0  3.3
1  4.0  5.0  6.5
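The conversion also works in the other direction. As a minimal sketch, the example below gives the DataFrame column labels and then recovers the ndarray with `to_numpy()`. The labels "a", "b" and "c" are hypothetical names chosen for this example.

```python
import numpy as np
import pandas as pd

array_2d = np.array([[1, 2.0, 3.3],
                     [4, 5, 6.5]])

# Column labels "a", "b", "c" are hypothetical names chosen for this example.
labelled_df = pd.DataFrame(array_2d, columns=["a", "b", "c"])

# to_numpy() converts the DataFrame back to an ndarray.
round_trip = labelled_df.to_numpy()

print(labelled_df)
print(np.array_equal(round_trip, array_2d))  # True
```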


## Creating NumPy Arrays

In [8]:
# --- Create a 2 x 3 ndarray with values of 1 using the ones function.
# --- Note: The default datatype for each 1 is float64 so they will be 1. instead.
# --- You can change that with dtype = int:
ones = np.ones(shape=(2, 3))
ones

array([[1., 1., 1.],
       [1., 1., 1.]])

In [10]:
# --- Create a 2 x 3 ndarray with values of 0 using the zeros function.
# --- Note: The default datatype for each 0 is float64 so they will be 0. instead.
# --- You can change that with dtype = int:
zeros = np.zeros(shape=(2, 3))
zeros

array([[0., 0., 0.],
       [0., 0., 0.]])
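The `dtype` argument mentioned in the comments above can be sketched as follows; the variable names are illustrative.

```python
import numpy as np

# Passing dtype=int yields integer elements instead of the float64 default.
int_ones = np.ones(shape=(2, 3), dtype=int)
int_zeros = np.zeros(shape=(2, 3), dtype=int)

print(int_ones)
print(int_zeros)
```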

In [12]:
# --- Create an ndarray with a range starting at 0, up to (but not including) 10, in steps of 2:
range_array = np.arange(0, 10, 2)
range_array

array([0, 2, 4, 6, 8])
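Note that 10 itself is missing from the result: `np.arange` excludes the stop value, much like Python's built-in `range`. A small sketch, with illustrative names, of how to include it:

```python
import numpy as np

evens_exclusive = np.arange(0, 10, 2)  # stops before 10
evens_inclusive = np.arange(0, 11, 2)  # moving stop past 10 includes it

print(evens_exclusive)  # [0 2 4 6 8]
print(evens_inclusive)  # [ 0  2  4  6  8 10]
```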

In [14]:
# --- Create an ndarray with three rows of five random integers each, drawn from 0 to 99 (high is exclusive):
random_array = np.random.randint(low = 0, high = 100, size = (3, 5))
print(random_array)
print(f"\nrandom_array size: {random_array.size}\nrandom_array shape: {random_array.shape}")


[[ 8 85 69 68 49]
 [83 93 35 23 70]
 [29 96 45 57 71]]

random_array size: 15
random_array shape: (3, 5)


Note: NumPy random numbers are pseudo-random. They are produced by a deterministic algorithm, so they look random to us but are fully determined by the generator's starting state.

You can fix that starting state with the `np.random.seed()` function, so that the generator begins from the same point every time.

By default the seed is None, which means the generator is initialised from an unpredictable source (such as operating-system entropy), so each run produces different numbers.

If you pass a value to the seed function, the generator restarts from the same state, and the random functions called after it return the same sequence of numbers each time:

In [15]:
# --- np.random.seed() reseeds the generator and returns None, which is what gets printed:
print(np.random.seed())

None


In [43]:
# --- Set the seed value to None and generate a random array of ints.
np.random.seed(seed=None)
random_array_seed_1 = np.random.randint(0, 10, size=(3, 5))
print(f"random_array_seed_1\n{random_array_seed_1}")

# --- Set the seed value to 1 and generate an array with random ints:
np.random.seed(seed=1)
random_array_seed_2 = np.random.randint(0, 10, size=(3, 5))
print(f"\nrandom_array_seed_2\n{random_array_seed_2}")

# --- The result should be this each time:
# [[5 8 9 5 0]
#  [0 1 7 6 9]
#  [2 4 5 2 4]]

# --- Note: np.random.seed() sets NumPy's global random state. Each later random call advances that state,
# --- so re-seed in any Jupyter cell that must reproduce the same numbers.

random_array_seed_1
[[2 6 5 6 4]
 [1 5 0 9 2]
 [4 0 9 7 9]]

random_array_seed_2
[[5 8 9 5 0]
 [0 1 7 6 9]
 [2 4 5 2 4]]
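To make the seeding behaviour concrete, the sketch below re-seeds the global generator and checks that the same array comes back. It also shows `np.random.default_rng`, the newer Generator interface, as an alternative that keeps the state in an object instead of globally. This is an illustration under the stated assumptions, not part of the original notebook; all variable names are made up.

```python
import numpy as np

# Legacy global state: seed, draw twice, then re-seed and draw again.
np.random.seed(1)
first_draw = np.random.randint(0, 10, size=(3, 5))
second_draw = np.random.randint(0, 10, size=(3, 5))  # continues the same stream

np.random.seed(1)  # reset the global generator to the same starting point
repeat_draw = np.random.randint(0, 10, size=(3, 5))

print(np.array_equal(first_draw, repeat_draw))  # True

# Generator API: each generator object carries its own state.
rng_a = np.random.default_rng(1)
rng_b = np.random.default_rng(1)
draw_a = rng_a.integers(0, 10, size=(3, 5))
draw_b = rng_b.integers(0, 10, size=(3, 5))

print(np.array_equal(draw_a, draw_b))  # True
```

Because each generator object holds its own state, two parts of a program can draw reproducible numbers without interfering with each other.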
