# Introduction to NumPy

In [1]:
import numpy as np

NumPy has several data types, but its main one is `ndarray`, which stands for n-dimensional array.

In [3]:
one_dimensional = np.array([1,2,3])
type(one_dimensional)

numpy.ndarray

In [12]:
two_dimensional = np.array([
    [12.12, 22, 44],
    [33, 66.44, 99]
])

In [13]:
three_dimensional = np.array([
    [
        [15.5, 20, 44],
        [34, 23, 78.2],
        [43, 64.3, 76]
    ],
    [
        [24, 55, 87],
        [80, 69, 32],
        [12, 15, 19]
    ]
])

An array can have any number of dimensions, but they are usually named as follows:
1. 1D = Vector
2. 2D = Matrix
3. 3D and above = n-dimensional array (sometimes called a tensor)

In [14]:
one_dimensional.shape, two_dimensional.shape, three_dimensional.shape

((3,), (2, 3), (2, 3, 3))

We can check how many dimensions an array has using the `ndim` attribute.

In [15]:
one_dimensional.ndim, two_dimensional.ndim, three_dimensional.ndim

(1, 2, 3)

In [16]:
one_dimensional.dtype, two_dimensional.dtype, three_dimensional.dtype

(dtype('int64'), dtype('float64'), dtype('float64'))
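The dtypes above differ because every element of an array shares one dtype, and NumPy picks a type that can hold all the values it was given. The following is a minimal sketch of that behaviour; the variable names are made up for illustration, and the exact integer dtype depends on the platform.

```python
import numpy as np

# Only integers: NumPy picks an integer dtype (platform dependent, often int64).
ints = np.array([1, 2, 3])

# One float among the integers: every element is stored as float64.
mixed = np.array([1, 2.5, 3])

# The dtype can also be requested explicitly when creating the array.
as_float32 = np.array([1, 2, 3], dtype=np.float32)

print(ints.dtype, mixed.dtype, as_float32.dtype)
```

This is why `two_dimensional` and `three_dimensional` report `float64` even though most of their values were written as integers.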

In [17]:
one_dimensional.size, two_dimensional.size, three_dimensional.size

(3, 6, 18)
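The `shape`, `ndim` and `size` attributes are related: `ndim` is the length of `shape`, and `size` is the product of the entries in `shape`. A small sketch, using a hypothetical array named `arr`, checks this:

```python
import numpy as np

# Hypothetical example array with the same shape as three_dimensional.
arr = np.zeros((2, 3, 3))

# ndim counts the axes, so it equals the length of the shape tuple.
dims_match = arr.ndim == len(arr.shape)

# size counts the elements, so it equals the product of the axis lengths.
size_match = arr.size == 2 * 3 * 3

print(arr.shape, arr.ndim, arr.size)
print(dims_match, size_match)
```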

Pandas is built on top of NumPy, so we can create Pandas data frames from NumPy arrays. Much of machine learning comes down to finding patterns in data like this, which is ultimately stored as NumPy arrays. In that sense, ML finds patterns in arrays (matrices).

In [18]:
import pandas as pd

In [46]:
data_frame = pd.DataFrame(two_dimensional)
data_frame

       0      1     2
0  12.12  22.00  44.0
1  33.00  66.44  99.0
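The conversion also works in the other direction. As a rough sketch, the snippet below builds a data frame with column labels and then gets the values back as an array with `to_numpy()`; the labels `a`, `b` and `c` are made up for this example.

```python
import numpy as np
import pandas as pd

values = np.array([
    [12.12, 22, 44],
    [33, 66.44, 99]
])

# Column names are hypothetical labels chosen for this sketch.
frame = pd.DataFrame(values, columns=["a", "b", "c"])

# to_numpy() returns the data frame's values as an ndarray again.
round_trip = frame.to_numpy()

print(type(round_trip), round_trip.shape)
```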


## Creating Arrays

In [26]:
one_d = np.array([1,2,3])
one_d

array([1, 2, 3])

Sometimes we just want to create an array with a given shape or a given range of values. We can do this using functions such as `ones`, `zeros`, and `arange`.

In [30]:
ones = np.ones((2,3,3), dtype=str)
ones

array([[['1', '1', '1'],
        ['1', '1', '1'],
        ['1', '1', '1']],

       [['1', '1', '1'],
        ['1', '1', '1'],
        ['1', '1', '1']]], dtype='<U1')

In [31]:
array_from_range = np.arange(0, 20, 3)
array_from_range

array([ 0,  3,  6,  9, 12, 15, 18])
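`zeros` is not shown above, so here is a short sketch of it alongside two more `arange` calls. The variable names are illustrative only.

```python
import numpy as np

# zeros works like ones: pass the shape as a tuple. The default dtype is float64.
zeros = np.zeros((2, 3))

# arange(start, stop, step): start is included, stop is excluded.
evens = np.arange(0, 10, 2)

# With a single argument, arange counts from 0 up to (but not including) it.
first_five = np.arange(5)

print(zeros.shape, zeros.dtype)
print(evens)
print(first_five)
```

Note that the stop value is never part of the result, which is why `np.arange(0, 20, 3)` above ends at 18.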

In [33]:
array_with_random_ints = np.random.randint(1, 100, (5,4))
array_with_random_ints

array([[95, 57, 25, 73],
       [36, 89, 66, 65],
       [27, 39, 85, 20],
       [81, 51, 76, 13],
       [ 2, 81, 41, 21]])

In [38]:
array_with_random_floats = np.random.random((6,3))
array_with_random_floats

array([[0.41788665, 0.9515176 , 0.16452052],
       [0.20130019, 0.66642472, 0.94649291],
       [0.84080878, 0.0437964 , 0.44402054],
       [0.4341976 , 0.14303076, 0.74665228],
       [0.57422132, 0.77845534, 0.2051858 ],
       [0.06423967, 0.28807034, 0.74105615]])
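Both functions draw from a half-open range: `randint(low, high, size)` includes `low` but excludes `high`, and `random(size)` returns floats in the interval [0.0, 1.0). The sketch below checks this on a larger sample; the sample size of 1000 is an arbitrary choice.

```python
import numpy as np

# low (1) is included, high (100) is excluded, so values fall in 1..99.
ints = np.random.randint(1, 100, (1000,))

# Floats are drawn from the half-open interval [0.0, 1.0).
floats = np.random.random((1000,))

print(ints.min() >= 1, ints.max() <= 99)
print(floats.min() >= 0.0, floats.max() < 1.0)
```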

Given all this randomness, we might want to share our notebook and let people rerun the cells and get the same random values we got. NumPy's random numbers are not truly random but pseudo-random: a deterministic algorithm produces them from a starting value called the `seed`. If we set the seed ourselves with `np.random.seed`, NumPy starts from that seed rather than picking a different one every time.

In [41]:
np.random.seed(5)
random_ints = np.random.randint(1, 100, (5,4))
random_ints

array([[79, 62, 17, 74],
       [ 9, 63, 28, 31],
       [81,  8, 77, 16],
       [54, 81, 28, 45],
       [78, 76, 66, 48]])

As you can see, the seeded cell above produces the same array every time it is rerun, while the unseeded cell below produces a different array on every rerun.

In [44]:
random_ints = np.random.randint(1, 100, (5,4))
random_ints

array([[86,  8, 17, 95],
       [15, 91, 32, 10],
       [39, 48, 17,  6],
       [35, 46, 60, 25],
       [14, 32, 33, 77]])
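To make the effect of the seed explicit, the sketch below sets the same seed twice and compares the results. It also shows the `Generator` interface (`np.random.default_rng`), which recent NumPy versions recommend over the global `np.random.seed`; using it here is a suggestion and not something the cells above rely on.

```python
import numpy as np

# Setting the same seed before each call restarts the same sequence.
np.random.seed(5)
first = np.random.randint(1, 100, (5, 4))

np.random.seed(5)
second = np.random.randint(1, 100, (5, 4))

print(np.array_equal(first, second))

# Alternative: independent generator objects, each created with its own seed.
rng_a = np.random.default_rng(5)
rng_b = np.random.default_rng(5)
gen_first = rng_a.integers(1, 100, (5, 4))
gen_second = rng_b.integers(1, 100, (5, 4))

print(np.array_equal(gen_first, gen_second))
```

A generator object keeps its state to itself, so seeding one part of a notebook does not silently change the random values drawn elsewhere.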