
In [None]:
import numpy as np

### Creating Arrays

In [None]:
a = np.array([10, 20, 30])
b = np.array([1, 77, 2, 3])

#### Multidimensional Arrays

In [None]:
a = np.array([
 [10, 20, 30],
 [40, 50, 60]
], dtype=int)
print(a[1][2])
a

60


array([[10, 20, 30],
       [40, 50, 60]])

#### Zeros and Ones

In [None]:
a = np.zeros((3, 3))
print(a)
b = np.ones((2, 3, 4, 2))

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


#### Empty and Random

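The function empty creates an array without initializing its values at all, so it contains whatever data happened to be in that memory before (as the output below shows).
This makes it a little faster than zeros, but also more dangerous to use, since you need to assign every value yourself before reading it.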


In [None]:
a = np.empty((4,4))
a

array([[4.68049947e-310, 1.03977794e-312, 1.01855798e-312,
        9.54898106e-313],
       [1.01855798e-312, 1.03977794e-312, 1.23075756e-312,
        1.06099790e-312],
       [1.12465777e-312, 9.76118064e-313, 1.08221785e-312,
        1.12465777e-312],
       [1.01855798e-312, 4.44659081e-322, 0.00000000e+000,
        0.00000000e+000]])
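A common pattern is to use `np.empty` only when every element will be overwritten right away, as in this minimal sketch. The name `buf` and the fill values are illustrative.

```python
import numpy as np

buf = np.empty((4, 4))      # contents are arbitrary until we write to them

# Overwrite every element before reading anything from the array.
for i in range(buf.shape[0]):
    buf[i, :] = i           # row i is filled with the value i

print(buf)

# If not every element is guaranteed to be written, zeros is the safer choice.
safe = np.zeros((4, 4))
```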

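To fill an array with random values, use the function random from the module np.random. That is why random appears twice: np.random is the module and np.random.random is the function inside it, which returns floats drawn uniformly from the half-open interval [0.0, 1.0). Writing np.random((2, 3)) alone would try to call the module itself, which raises a TypeError.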

In [None]:
b = np.random.random((2,3))
b

array([[0.8050872 , 0.90205914, 0.03840028],
       [0.10183105, 0.90584835, 0.60926322]])
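Since NumPy 1.17, the documentation recommends creating a random number generator with `np.random.default_rng` and calling methods on it instead of using the module-level functions. A minimal sketch; the seed `42` and the name `r` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, for reproducible results
r = rng.random((2, 3))                # floats drawn uniformly from [0.0, 1.0)
print(r)

# A second generator with the same seed reproduces exactly the same values.
again = np.random.default_rng(seed=42).random((2, 3))
print(np.array_equal(r, again))       # True
```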

#### Ranges

Instead of just filling arrays with the same value, we can create sequences of values by specifying their boundaries. For this, we can use two different functions, namely arange and linspace.

In [None]:
a = np.arange(10, 50, 5)
# arange(start, stop, step) returns an array of values from start up to, but
# not including, stop, spaced by step (step defaults to 1).
a

array([10, 15, 20, 25, 30, 35, 40, 45])

By using linspace, we also create an array that goes from a minimum value to a
maximum value. But instead of specifying the step size, we specify the
number of values that we want in our array. They are spread evenly, so each
value has the same distance to its neighbors, and by default both boundaries are included.

In [None]:
b = np.linspace(0, 100, 11)
# Here, we want to create an array that ranges from 0 to 100 and contains 11
# elements. This fits smoothly with a difference of 10 between consecutive
# numbers, so the result looks like this:
b

array([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.])
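The difference between the two functions is easiest to see side by side: `arange` fixes the step and excludes the stop value, while `linspace` fixes the number of values and includes the stop value unless `endpoint=False` is passed. For non-integer steps, the NumPy documentation suggests preferring `linspace`, because floating-point rounding can make the length of an `arange` result hard to predict. The variable names below are illustrative.

```python
import numpy as np

# arange: you choose the step; the stop value is excluded.
steps = np.arange(0, 1, 0.25)                      # 0.0, 0.25, 0.5, 0.75

# linspace: you choose the count; the stop value is included by default.
points = np.linspace(0, 1, 5)                      # 0.0, 0.25, 0.5, 0.75, 1.0

# endpoint=False excludes the stop value, so the spacing becomes 1/5.
half_open = np.linspace(0, 1, 5, endpoint=False)   # 0.0, 0.2, 0.4, 0.6, 0.8

print(steps, points, half_open, sep="\n")
```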

#### Not a Number (NaN)

There is a special value in NumPy that represents values that are not
numbers. It is called NaN, stands for Not a Number, and is available as
`np.nan`. We mainly use it as a placeholder that marks a value as missing
at that place.


When importing large datasets into our application, some data will often
be missing. Instead of setting these values to zero or some other number, which
would distort calculations such as the mean, we can set them to NaN and then filter them out.
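A short sketch of how missing values can be detected and handled; the array `data` is made up for illustration. Note that NaN is a floating-point value, so it can only be stored in float arrays.

```python
import numpy as np

# A small measurement series with one missing value marked as NaN.
data = np.array([1.0, np.nan, 3.0])

print(np.nan == np.nan)  # False: NaN never compares equal, even to itself
mask = np.isnan(data)    # True where a value is missing: False, True, False
clean = data[~mask]      # keep only the values that are not NaN

print(np.mean(data))     # nan: ordinary aggregates propagate NaN
print(np.nanmean(data))  # 2.0: the nan* variants ignore missing values
print(clean)
```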

#### Attributes of Arrays

In [None]:
a.shape  # Returns the shape of the array, e.g. (3, 3) or (3, 4, 7)

(8,)

In [None]:
a.ndim  # Returns how many dimensions our array has

1

In [None]:
a.size  # Returns the number of elements in the array

8

In [None]:
a.dtype  # Returns the data type of the values in the array

dtype('int64')
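The attributes are related to each other: `size` is the product of the entries of `shape`, and `ndim` is its length. A quick sketch with a 3-D array (the name `cube` is illustrative):

```python
import numpy as np

cube = np.zeros((3, 4, 7))  # illustrative array with the (3, 4, 7) shape mentioned above
print(cube.shape)   # (3, 4, 7)
print(cube.ndim)    # 3
print(cube.size)    # 84, i.e. 3 * 4 * 7
print(cube.dtype)   # float64, the default dtype of np.zeros

# astype returns a converted copy with a different dtype.
ints = cube.astype(np.int32)
print(ints.dtype)   # int32
```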

### Mathematical Operations

Now that we know how to create an array and what attributes it has, let’s
take a look at how to work with arrays. For this, we will start out with basic
mathematical operations.

In [None]:
a = np.array([
 [1,4,2],
 [8,8,2]])

In [None]:
a + 2

array([[ 3,  6,  4],
       [10, 10,  4]])

In [None]:
a - 2

array([[-1,  2,  0],
       [ 6,  6,  0]])

In [None]:
a * 2

array([[ 2,  8,  4],
       [16, 16,  4]])

In [None]:
a / 2

array([[0.5, 2. , 1. ],
       [4. , 4. , 1. ]])

When we perform basic arithmetic operations like addition, subtraction,
multiplication and division between an array and a scalar, the operation is applied
to every single element of the array. But what happens when we apply these operations to two arrays?

In [None]:
a

array([[1, 4, 2],
       [8, 8, 2]])

In [None]:
b = np.array([[1,2,3]])

In [None]:
c = np.array([[1], 
              [2]])

In [None]:
d = np.array([[1, 2, 3],
              [3, 2, 1]])

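In order to apply these operations to two arrays, we need to take care of the
shapes. They don’t have to be the same, but they must be compatible: NumPy
compares the shapes starting from the last dimension, and two dimensions are
compatible when they are equal or when one of them is 1. A dimension of size 1
is then stretched (broadcast) to match the other one, and the operation is again
applied element by element. For example, b has shape (1, 3), so its single row
is added to each row of a, which has shape (2, 3).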


In [None]:
print(a, "+")
print(b, "=")
a + b

[[1 4 2]
 [8 8 2]] +
[[1 2 3]] =


array([[ 2,  6,  5],
       [ 9, 10,  5]])

In [None]:
print(a, "+")
print(c, "=")
a + c

[[1 4 2]
 [8 8 2]] +
[[1]
 [2]] =


array([[ 2,  5,  3],
       [10, 10,  4]])

And of course it also works when the shapes match exactly. The only
problem arises when the shapes cannot be broadcast together, for example (2, 3)
and (2, 2): there is no reasonable way of performing the operation, and NumPy raises a ValueError.
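A sketch of both cases, reusing the values of `a` and `d` from above and a hypothetical array `e`. The shapes `(2, 3)` and `(2, 2)` fail the compatibility check in their last dimension, because 3 and 2 differ and neither is 1.

```python
import numpy as np

a = np.array([[1, 4, 2],
              [8, 8, 2]])   # shape (2, 3)
d = np.array([[1, 2, 3],
              [3, 2, 1]])   # shape (2, 3): identical shapes, plain element-wise addition
print(a + d)                # rows: 2, 6, 5 and 11, 10, 3

e = np.array([[1, 2],
              [3, 4]])      # hypothetical array of shape (2, 2)
failed = False
try:
    a + e                   # last dimensions are 3 and 2, and neither is 1
except ValueError as err:
    failed = True
    print("ValueError:", err)
```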

#### Mathematical Functions

In [None]:
np.exp(a)  # Takes e to the power of each value

array([[2.71828183e+00, 5.45981500e+01, 7.38905610e+00],
       [2.98095799e+03, 2.98095799e+03, 7.38905610e+00]])

In [None]:
np.sin(a)  # Returns the sine of each value

array([[ 0.84147098, -0.7568025 ,  0.90929743],
       [ 0.98935825,  0.98935825,  0.90929743]])

In [None]:
np.cos(a)

array([[ 0.54030231, -0.65364362, -0.41614684],
       [-0.14550003, -0.14550003, -0.41614684]])

In [None]:
np.tan(a) 

array([[ 1.55740772,  1.15782128, -2.18503986],
       [-6.79971146, -6.79971146, -2.18503986]])

In [None]:
np.log(a)

array([[0.        , 1.38629436, 0.69314718],
       [2.07944154, 2.07944154, 0.69314718]])

In [None]:
np.sqrt(a)

array([[1.        , 2.        , 1.41421356],
       [2.82842712, 2.82842712, 1.41421356]])

### Aggregate Functions

In [None]:
a

array([[1, 4, 2],
       [8, 8, 2]])

In [None]:
a.sum()

25

In [None]:
a.min()

1

In [None]:
a.max()

8

In [None]:
a.mean()

4.166666666666667

In [None]:
np.median(a)

3.0

In [None]:
b = np.array([2, 4, 9, 10, 12])
np.quantile(b, 0.25)

4.0

In [None]:
b = np.array([2,2,3,3,4,4,4,5,6,6,6,7,7,8,8])
np.quantile(b, 0.25)

3.5
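With its default linear interpolation, `np.quantile` sorts the data, computes the fractional position `q * (n - 1)`, and interpolates between the two neighboring sorted values. The helper `linear_quantile` below is a hypothetical, hand-written sketch of that rule, useful for checking results by hand: for the 15 values above, the 0.25 quantile sits at position 3.5, halfway between the sorted values 3 and 4.

```python
import numpy as np

def linear_quantile(values, q):
    """Hand-rolled sketch of the default (linear) quantile rule."""
    s = np.sort(values)
    pos = q * (len(s) - 1)          # fractional position in the sorted data
    lower = int(np.floor(pos))
    upper = int(np.ceil(pos))
    # Interpolate linearly between the two neighboring sorted values.
    return s[lower] + (pos - lower) * (s[upper] - s[lower])

b = np.array([2, 2, 3, 3, 4, 4, 4, 5, 6, 6, 6, 7, 7, 8, 8])
print(linear_quantile(b, 0.25), np.quantile(b, 0.25))  # 3.5 3.5
print(linear_quantile(b, 0.75), np.quantile(b, 0.75))  # 6.5 6.5
```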

In [None]:
np.std(a)  # Returns the standard deviation of the values in the array

2.852873794770615
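These aggregate functions also accept an `axis` argument. Without it, they reduce over every element, as above; with `axis=0` they collapse the rows and return one value per column, and with `axis=1` they return one value per row. A sketch using the same values of `a`:

```python
import numpy as np

a = np.array([[1, 4, 2],
              [8, 8, 2]])

col_sums = a.sum(axis=0)    # one value per column: 9, 12, 4
row_sums = a.sum(axis=1)    # one value per row: 7, 18
row_max = a.max(axis=1)     # largest value in each row: 4, 8
col_means = a.mean(axis=0)  # mean of each column: 4.5, 6.0, 2.0
print(col_sums, row_sums, row_max, col_means)
```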

### Manipulating Arrays

NumPy offers us numerous ways to manipulate the data in
our arrays. Here, we are going to take a quick look at the most important
functions and categories of functions.
If you just want to change a single value, however, you can use the same
indexing as with lists.


In [None]:
a = np.array([[4, 2, 9],
              [8, 3, 2]])
print(a[1][2])
a[1][2] = 7
a[1][2]

2


7
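NumPy arrays also accept a single index tuple such as `m[1, 2]`, which avoids creating the intermediate row `m[1]` first, as well as slices that select whole rows or columns. A short sketch with an illustrative array `m`:

```python
import numpy as np

m = np.array([[4, 2, 9],
              [8, 3, 2]])

# m[1, 2] selects row 1, column 2 in a single step; it is equivalent to m[1][2].
m[1, 2] = 7
print(m[1, 2])   # 7

# Slices select whole rows or columns.
print(m[:, 0])   # first column: 4, 8

# Assigning a scalar to a slice sets every element in it.
m[0, :] = 0
print(m)         # first row is now all zeros; second row is 8, 3, 7
```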

#### Shape Manipulation Functions

One of the most important and helpful types of functions are the shape
manipulation functions. These allow us to restructure our arrays without
changing their values.

In [None]:
a.reshape(3, 2)  # Returns an array with the same values structured in a different shape

array([[4, 2],
       [9, 8],
       [3, 7]])
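Because `reshape` only reorganizes existing values, the new shape must contain the same number of elements as the old one. One dimension may be given as `-1`, in which case NumPy infers it. A minimal sketch with an illustrative array `v`:

```python
import numpy as np

v = np.arange(6)            # 0, 1, 2, 3, 4, 5 with shape (6,)
print(v.reshape(2, 3))      # explicit target shape
print(v.reshape(3, -1))     # -1 lets NumPy infer the second dimension, giving (3, 2)

# The number of elements must stay the same: 6 values cannot fill a 4x2 array.
fits = True
try:
    v.reshape(4, 2)
except ValueError:
    fits = False
print(fits)                 # False
```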

In [None]:
a.flatten()  # Returns a flattened 1-D copy of the array

array([4, 2, 9, 8, 3, 7])

In [None]:
a.ravel()  # Like flatten, but returns a view of the original data when possible instead of always copying

array([4, 2, 9, 8, 3, 7])
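Whether `ravel` returned a view can be checked with `np.shares_memory`. In this sketch (with an illustrative array `m` holding the same values as `a`), the array is stored contiguously in memory, so `ravel` returns a view and writing to its result also changes the original, while the result of `flatten` is independent.

```python
import numpy as np

m = np.array([[4, 2, 9],
              [8, 3, 7]])

flat_copy = m.flatten()   # always a new array
flat_view = m.ravel()     # a view here, because m is stored contiguously

print(np.shares_memory(m, flat_copy))  # False
print(np.shares_memory(m, flat_view))  # True

flat_view[0] = 100        # writing through the view changes m as well
print(m[0, 0])            # 100
flat_copy[1] = -1         # the copy is independent; m is unchanged
print(m[0, 1])            # 2
```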

In [None]:
a.T  # Transpose: rows become columns and vice versa

array([[4, 8],
       [2, 3],
       [9, 7]])

In [None]:
a.flat  # Not a function but an attribute: an iterator over the flattened array

<numpy.flatiter at 0x562910a06000>

#### Joining Functions

In [None]:
a

array([[4, 2, 9],
       [8, 3, 7]])

In [None]:
b = np.array([[1, 2, 3]])  # Redefine b, since it was overwritten in the quantile examples
b

array([[1, 2, 3]])

In [None]:
np.concatenate((a, b))  # The arrays are passed as one tuple, hence the double parentheses

array([[4, 2, 9],
       [8, 3, 7],
       [1, 2, 3]])

In [None]:
np.stack((a[0], b[0]))  # Joins multiple arrays along a new axis

array([[4, 2, 9],
       [1, 2, 3]])

In [None]:
np.vstack((a,b))  # Stacks the arrays vertically (row-wise)

array([[4, 2, 9],
       [8, 3, 7],
       [1, 2, 3]])

What concatenate does is join the arrays along an existing axis (axis 0 by
default) by simply appending one onto the other. Stack, on the other hand,
creates an additional axis that separates the initial arrays, so all of them must have the same shape.
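To make the difference concrete, compare the shapes produced from two 1-D arrays `x` and `y` (illustrative names). The last part shows that `concatenate` can also join along `axis=1`, as long as the other dimensions match.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

joined = np.concatenate((x, y))     # shape (6,): appended along the existing axis
stacked = np.stack((x, y))          # shape (2, 3): a new first axis separates x and y
columns = np.stack((x, y), axis=1)  # shape (3, 2): the new axis placed last instead
print(joined.shape, stacked.shape, columns.shape)

# concatenate can also join 2-D arrays side by side along axis=1.
left = np.array([[1], [2]])
right = np.array([[3, 4], [5, 6]])
side_by_side = np.concatenate((left, right), axis=1)
print(side_by_side)                 # rows: 1, 3, 4 and 2, 5, 6
```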


#### Splitting Functions

In [None]:
a = np.array([
 [10, 20, 30],
 [40, 50, 60],
 [70, 80, 90],
 [100, 110, 120]])

In [None]:
np.split(a, 2)  # Splits one array into multiple arrays

[array([[10, 20, 30],
        [40, 50, 60]]), array([[ 70,  80,  90],
        [100, 110, 120]])]

In [None]:
np.hsplit(a, 3)  # Splits one array into multiple arrays horizontally (column-wise)

[array([[ 10],
        [ 40],
        [ 70],
        [100]]), array([[ 20],
        [ 50],
        [ 80],
        [110]]), array([[ 30],
        [ 60],
        [ 90],
        [120]])]

In [None]:
np.vsplit(a, 4) # Splits one array into multiple arrays vertically (row-wise)

[array([[10, 20, 30]]),
 array([[40, 50, 60]]),
 array([[70, 80, 90]]),
 array([[100, 110, 120]])]

#### Adding and Removing

In [None]:
np.resize(a, (5, 5))  # Returns a resized copy; extra space is filled by repeating the elements of a in order

array([[ 10,  20,  30,  40,  50],
       [ 60,  70,  80,  90, 100],
       [110, 120,  10,  20,  30],
       [ 40,  50,  60,  70,  80],
       [ 90, 100, 110, 120,  10]])

In [None]:
np.append(a, 0)  # Without an axis, the array is flattened before appending

array([ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120,   0])

In [None]:
np.append(a, [1, 2, 3])

array([ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100, 110, 120,   1,
         2,   3])
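As the outputs above show, `np.append` flattens the array when no `axis` is given. To keep the 2-D structure, pass `axis`; the appended values then need a compatible shape. Note that `np.append` always returns a new array and leaves the original unchanged. A sketch with an illustrative array `grid`:

```python
import numpy as np

grid = np.array([[10, 20, 30],
                 [40, 50, 60]])

# Without axis, the result is flattened to 1-D.
flat = np.append(grid, 0)
print(flat.shape)    # (7,)

# With axis=0, the appended values must form rows of matching width.
taller = np.append(grid, [[70, 80, 90]], axis=0)
print(taller.shape)  # (3, 3)

# With axis=1, they must form columns of matching height.
wider = np.append(grid, [[0], [0]], axis=1)
print(wider.shape)   # (2, 4)
```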

In [None]:
np.insert(a, [0, 1, -1, -2], -100)  # Without an axis, a is flattened and -100 is inserted before each given index

array([-100,   10, -100,   20,   30,   40,   50,   60,   70,   80,   90,
        100, -100,  110, -100,  120])

In [None]:
np.delete(a, 2, axis=1)  # Removes index 2 along axis 1, i.e. the third column

array([[ 10,  20],
       [ 40,  50],
       [ 70,  80],
       [100, 110]])

### Loading and Saving Arrays

Now, last but not least, we are going to talk about loading and saving
NumPy arrays. For this, we can use NumPy's own binary format (.npy) or CSV files.

#### NumPy Format

In [None]:
# np.save('myarray.npy', a) 

Notice that you don’t have to type the file ending .npy yourself: when you pass
a filename without it, np.save appends .npy automatically. In this example, we just write it out for clarity.

Now, in order to load the array into our script again, we will need the load
function.

In [None]:
# a = np.load('myarray.npy')
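A round trip through a temporary directory shows that the `.npy` format preserves both the values and the dtype, and that `np.save` adds the extension when it is missing. The directory comes from Python's `tempfile` module so the sketch leaves no files behind; the array `arr` is illustrative.

```python
import os
import tempfile

import numpy as np

arr = np.array([[10, 20, 30],
                [40, 50, 60]])

with tempfile.TemporaryDirectory() as folder:
    # Passing a name without an extension: np.save appends ".npy" itself.
    np.save(os.path.join(folder, "myarray"), arr)
    names = os.listdir(folder)
    print(names)  # ['myarray.npy']

    loaded = np.load(os.path.join(folder, "myarray.npy"))

print(np.array_equal(arr, loaded), loaded.dtype == arr.dtype)  # True True
```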

#### CSV Format

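As already mentioned, we can also save our NumPy arrays as CSV files,
which are plain text files with comma-separated values. For this, we use the function
savetxt. Note that savetxt separates values with spaces by default, so we pass
delimiter=',' to get actual commas, and the same delimiter to loadtxt when reading
the file back. By default, values are written in floating-point notation and read
back as floats. savetxt only handles 1-D and 2-D arrays.


In [None]:
# np.savetxt('myarray.csv', a, delimiter=',')

In [None]:
# a = np.loadtxt('myarray.csv', delimiter=',')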
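A minimal CSV round trip, again using a temporary directory from `tempfile` and an illustrative array `arr`. Passing `fmt="%d"` writes the integers without the default scientific notation, and `dtype=int` makes `loadtxt` return integers instead of floats.

```python
import os
import tempfile

import numpy as np

arr = np.array([[10, 20, 30],
                [40, 50, 60]])

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "myarray.csv")
    # delimiter="," makes the file comma-separated; fmt="%d" writes plain integers.
    np.savetxt(path, arr, delimiter=",", fmt="%d")
    with open(path) as f:
        text = f.read()
    print(text)  # two lines: 10,20,30 and 40,50,60

    # loadtxt returns floats unless another dtype is requested.
    loaded = np.loadtxt(path, delimiter=",", dtype=int)

print(np.array_equal(arr, loaded))  # True
```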