## 0. Importing NumPy
To get started using NumPy, the first step is to import it.

The most common way (and the one you should use) is to import NumPy under the abbreviation np.

If you see the letters np used anywhere in machine learning or data science, it's probably referring to the NumPy library.

In [30]:
import numpy as np

# Check the version
print(np.__version__)

1.26.4



## 1. DataTypes and attributes

    Note: It's important to remember that the main type in NumPy is ndarray. Even seemingly different kinds of arrays are still ndarrays, so an operation you do on one array will work on another.



In [31]:
# 1-dimensional array, also referred to as a vector
a1 = np.array([1, 2, 3])

# 2-dimensional array, also referred to as matrix
a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])

# 3-dimensional array, loosely also called a matrix (more precisely, an n-dimensional array)
a3 = np.array([
    [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ],
    [
        [10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]
    ]
])


In [32]:
a1.shape

(3,)

In [33]:
a1.ndim, a1.size, a1.dtype, type(a1)

(1, 3, dtype('int64'), numpy.ndarray)

In [34]:
a2.shape, a2.ndim, a2.dtype, a2.size, type(a2)

((2, 3), 2, dtype('float64'), 6, numpy.ndarray)

In [35]:
a3.shape, a3.ndim, a3.dtype, a3.size, type(a3)

((2, 3, 3), 3, dtype('int64'), 18, numpy.ndarray)
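
The attributes above are related to each other, which the following minimal sketch tries to illustrate: `ndim` is the length of `shape`, and `size` is the product of the entries in `shape`. It also shows the note's point that every array is an `ndarray`, so the same operation applies to all of them. Building `a3` with `np.arange(...).reshape(...)` is a shortcut assumed here for brevity, not how the cells above create it.

```python
import numpy as np

# Same values as the arrays above (a3 is built with arange/reshape for brevity)
a1 = np.array([1, 2, 3])
a2 = np.array([[1, 2.0, 3.3], [4, 5, 6.5]])
a3 = np.arange(1, 19).reshape(2, 3, 3)

for arr in (a1, a2, a3):
    # Every array is an ndarray, whatever its number of dimensions
    print(type(arr).__name__, arr.shape, arr.ndim, arr.size)
    # ndim is the length of shape; size is the product of the shape entries
    print(arr.ndim == len(arr.shape), arr.size == int(np.prod(arr.shape)))

# The same operation works on all three arrays
doubled = [arr * 2 for arr in (a1, a2, a3)]
print([d.shape for d in doubled])  # [(3,), (2, 3), (2, 3, 3)]
```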

In [36]:
a1

array([1, 2, 3])

In [37]:
a2

array([[1. , 2. , 3.3],
       [4. , 5. , 6.5]])

In [38]:
a3

array([[[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9]],

       [[10, 11, 12],
        [13, 14, 15],
        [16, 17, 18]]])

### Anatomy of an array

![](anatomy_ndarrays.png)

Key terms:

    Array - A list of numbers, can be multi-dimensional.
    Scalar - A single number (e.g. 7).
    Vector - A list of numbers with 1 dimension (e.g. np.array([1, 2, 3])).
    Matrix - A (usually) multi-dimensional list of numbers (e.g. np.array([[1, 2, 3], [4, 5, 6]])).
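
As a rough illustration of the key terms, the sketch below checks the number of dimensions of a scalar, a vector and a matrix. Wrapping the scalar in `np.array(7)` to get a 0-dimensional array is an assumption made here so that all three can be inspected the same way.

```python
import numpy as np

scalar = np.array(7)                       # a single number, 0 dimensions
vector = np.array([1, 2, 3])               # 1 dimension
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # 2 dimensions

print(scalar.ndim, scalar.shape)  # 0 ()
print(vector.ndim, vector.shape)  # 1 (3,)
print(matrix.ndim, matrix.shape)  # 2 (2, 3)
```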



### pandas DataFrame out of NumPy arrays

This example shows how NumPy serves as the backbone of many other libraries.


In [39]:
import pandas as pd

# np.random.randint(10, size=(5, 3)) generates a 2D array (matrix) of random integers.
# 10: the exclusive upper bound, so the values range from 0 to 9.
# size=(5, 3): the shape of the array, 5 rows and 3 columns.
df = pd.DataFrame(np.random.randint(10, size=(5, 3)), columns=['a', 'b', 'c'])

df

   a  b  c
0  7  0  0
1  2  5  2
2  9  8  8
3  6  4  5
4  3  3  0


In [40]:
a2

array([[1. , 2. , 3.3],
       [4. , 5. , 6.5]])

In [41]:
df2 = pd.DataFrame(a2)
df2

     0    1    2
0  1.0  2.0  3.3
1  4.0  5.0  6.5
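
To make the link between the two libraries a little more concrete, here is a small sketch that wraps an array in a DataFrame and then converts it back with `DataFrame.to_numpy()`. The column names `x`, `y` and `z` are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])

# Wrap the array in a DataFrame (column names are illustrative only)
df2 = pd.DataFrame(a2, columns=["x", "y", "z"])

# Convert back to a NumPy array
back = df2.to_numpy()

print(type(back).__name__)       # ndarray
print(np.array_equal(a2, back))  # True
```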



## 2. Creating arrays

    np.array()
    np.ones()
    np.zeros()
    np.random.rand(5, 3)
    np.random.randint(10, size=5)
    np.random.seed() - pseudo random numbers
    Searching the documentation example (finding np.unique() and using it)



In [42]:
# Create a simple array
simple_array = np.array([1, 2, 3])
simple_array

array([1, 2, 3])

In [43]:
simple_array = np.array((1, 2, 3))
simple_array, simple_array.dtype

(array([1, 2, 3]), dtype('int64'))

In [44]:
ones = np.ones((3, 2), dtype="int64")

In [45]:
ones

array([[1, 1],
       [1, 1],
       [1, 1]])

In [46]:
# np.ones() defaults to 'float64', but dtype="int64" was passed above, so this is 'int64'
ones.dtype

dtype('int64')

In [47]:
# You can change the datatype with .astype()
ones.astype(float)

array([[1., 1.],
       [1., 1.],
       [1., 1.]])
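
The sketch below puts the dtype behaviour side by side: the default dtype of `np.ones()`, an explicitly requested dtype, and a conversion with `.astype()`. Note that `.astype()` returns a new array and leaves the original untouched. The variable names are illustrative.

```python
import numpy as np

default_ones = np.ones((3, 2))             # no dtype given
int_ones = np.ones((3, 2), dtype="int64")  # dtype requested explicitly
float_ones = int_ones.astype(float)        # returns a converted copy

print(default_ones.dtype)  # float64
print(int_ones.dtype)      # int64
print(float_ones.dtype)    # float64
print(int_ones.dtype)      # int64 (the original array is unchanged)
```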

In [48]:
# Create an array of zeros
zeros = np.zeros((5, 3, 3))

In [49]:
zeros

array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]]])

In [50]:
zeros.dtype

dtype('float64')

In [51]:
# Create an array within a range of values: np.arange(start, stop, step), stop is exclusive
range_array = np.arange(3, 50, 5)
range_array

array([ 3,  8, 13, 18, 23, 28, 33, 38, 43, 48])
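
A short sketch of how the `stop` argument behaves may help here: `np.arange(start, stop, step)` never includes `stop` itself. The second call, with `stop=48`, is added purely for comparison.

```python
import numpy as np

range_array = np.arange(3, 50, 5)
print(range_array[-1], len(range_array))  # 48 10

# With stop=48 the value 48 itself is excluded, so the array ends at 43
shorter = np.arange(3, 48, 5)
print(shorter[-1], len(shorter))  # 43 9
```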

In [52]:
# Random array of integers from 5 (inclusive) to 20 (exclusive)
random_array = np.random.randint(5, 20, size=(3, 3, 3))
random_array

array([[[ 8, 16, 15],
        [17, 18, 18],
        [14, 14, 13]],

       [[15, 11, 16],
        [16, 14, 14],
        [18,  8, 18]],

       [[19, 18,  5],
        [ 6, 15, 19],
        [15,  9,  6]]])
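
The bounds of `np.random.randint(low, high, size=...)` can be checked directly, as in this small sketch: `low` is inclusive and `high` is exclusive, so every value here falls between 5 and 19.

```python
import numpy as np

random_array = np.random.randint(5, 20, size=(3, 3, 3))

print(random_array.shape)       # (3, 3, 3)
print(random_array.min() >= 5)  # True, low is inclusive
print(random_array.max() < 20)  # True, high is exclusive
```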

In [53]:
# Random array of floats (between 0 & 1)
np.random.random((5, 3))

array([[0.83661883, 0.18121882, 0.99865275],
       [0.48758687, 0.98556494, 0.77196029],
       [0.4556777 , 0.63514104, 0.3029561 ],
       [0.1398238 , 0.37217316, 0.44780167],
       [0.84533846, 0.47629431, 0.11237489]])

In [54]:
# Random 5x3 array of floats (between 0 & 1), similar to above
np.random.rand(5, 3)

array([[0.03756598, 0.30052629, 0.05182854],
       [0.76905432, 0.45950258, 0.52247767],
       [0.40416158, 0.28026448, 0.87496897],
       [0.33349382, 0.85628134, 0.84304451],
       [0.31220776, 0.64020401, 0.63979651]])

In [55]:
np.random.rand(5, 3)

array([[0.89894926, 0.96754838, 0.17147725],
       [0.34488074, 0.11109061, 0.9939489 ],
       [0.42284727, 0.17059783, 0.8409322 ],
       [0.54925726, 0.14800714, 0.82796277],
       [0.41067789, 0.24157159, 0.34177647]])
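
The two float functions above differ mainly in how the shape is passed, which this sketch tries to make explicit: `np.random.random()` takes the shape as a single tuple, while `np.random.rand()` takes each dimension as a separate argument. Both draw from the half-open interval [0, 1). The variable names are illustrative.

```python
import numpy as np

from_tuple = np.random.random((5, 3))  # shape passed as one tuple
from_args = np.random.rand(5, 3)       # shape passed as separate arguments

print(from_tuple.shape, from_args.shape)  # (5, 3) (5, 3)

# Values are at least 0 and strictly below 1
print(from_tuple.min() >= 0, from_tuple.max() < 1)  # True True
print(from_args.min() >= 0, from_args.max() < 1)    # True True
```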



NumPy uses pseudo-random numbers: they look random but are actually predetermined by the generator's starting state.

For consistency, you might want the random numbers you generate to be the same across experiments.

To do this, you can use np.random.seed().

This tells NumPy, "Hey, I want you to create random numbers, but keep them aligned with the seed."

Let's see it.


In [56]:
# Set random seed to 0
np.random.seed(0)

# Make 'random' numbers
np.random.randint(10, size=(5, 3))

array([[5, 0, 3],
       [3, 7, 9],
       [3, 5, 2],
       [4, 7, 6],
       [8, 8, 1]])



With np.random.seed() set, every time you run the cell above, the same random numbers will be generated.

What if np.random.seed() wasn't set?

Every time you run the cell below, a new set of numbers will appear.


In [57]:
# Make more random numbers
np.random.randint(10, size=(5, 3))

array([[6, 7, 7],
       [8, 1, 5],
       [9, 8, 9],
       [4, 3, 0],
       [3, 5, 0]])





In [58]:
# Make more random numbers
np.random.randint(10, size=(5, 3))

array([[2, 3, 8],
       [1, 3, 3],
       [3, 7, 0],
       [1, 9, 9],
       [0, 4, 7]])
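
The behaviour of the last three cells can be condensed into one sketch: after seeding, each call advances the generator, so consecutive calls differ, and seeding again with the same value restarts the sequence. The final lines use `np.random.default_rng()`, which is not part of the cells above; it is shown only as an alternative that keeps the seed in a local generator object.

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(10, size=(5, 3))
second = np.random.randint(10, size=(5, 3))  # generator state has advanced

np.random.seed(0)  # same seed again restarts the sequence
repeat = np.random.randint(10, size=(5, 3))

print(np.array_equal(first, repeat))  # True
print(np.array_equal(first, second))  # False for the seed-0 values shown above

# Alternative (not used in the cells above): a local Generator object
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)
same = np.array_equal(rng_a.integers(10, size=(5, 3)),
                      rng_b.integers(10, size=(5, 3)))
print(same)  # True
```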



Let's see it in action again. We'll stay consistent and set the random seed to 0.
