-----
# Numerical Python (NumPy) - Part 1
-----

*NumPy* is the fundamental package for numeric computing with Python. It is widely used in the data science community because it works efficiently with arrays (another iterable data structure) and matrices. It also provides powerful ways to create, store, and manipulate data, which lets it integrate quickly and seamlessly with a wide variety of databases. NumPy is also the foundation of Pandas, a high-performance, data-centric package that we will learn later in the course.

When studying NumPy, we will talk about creating arrays with certain data types, manipulating arrays, selecting elements from arrays, and loading datasets into arrays. Such functions are useful for manipulating data and understanding the functionalities of other common Python data packages.

In [2]:
import numpy as np

## Creating Arrays

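An array is a data structure that stores a collection of items. Like lists, arrays are ordered, mutable (you can change their items in place), displayed in square brackets, and able to store non-unique items. Unlike lists, an array has a fixed size once it is created, so adding or removing items produces a new array.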

So, what's the difference between arrays and lists?

(1) **Arrays need to be declared** using a NumPy function such as `np.array()`; lists don't.

(2) **Arrays can store data very compactly** and are more efficient for storing large amounts of data.

(3) **Arrays are great for numerical operations**; lists cannot directly handle math operations.
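
As a rough illustration of point (2), here is a minimal sketch that assumes 64-bit integers (the variable name `arr` is only for illustration). An array keeps its raw values in one contiguous buffer, so the size of its data is simply the number of items times the size of one item.

```python
import numpy as np

# One thousand 64-bit integers stored in a single contiguous buffer.
arr = np.arange(1000, dtype=np.int64)

print(arr.itemsize)  # bytes used by each item
print(arr.nbytes)    # total bytes used by the data (items * itemsize)
```

A Python list, by contrast, stores references to separate integer objects, which generally costs more memory per item.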



To create an array, you can create a list and convert it to a numpy array...

In [None]:
mylist = [1, 2, 3]
x = np.array(mylist)
x

... Or just pass in a list directly

In [None]:
y = np.array([4, 5, 6])
y

To create a "multidimensional" array (such as a matrix), we can pass in a list of lists.

In [3]:
m = np.array([[7, 8, 9], [10, 11, 12]])
m

array([[ 7,  8,  9],
       [10, 11, 12]])

To find the number of dimensions in an array, we use the `ndim` attribute.

In [4]:
m.ndim

2

To find out the length of each dimension in the array (the number of rows and columns in a 2D array / matrix), we can use the `shape` attribute.

In [5]:
m.shape

(2, 3)

Finally, we can use `dtype` to check the type of items in the array.

In [6]:
c = np.array([2.2, 5, 1.1])
c.dtype

dtype('float64')

Let's look at the data in `c`:

In [7]:
c

array([2.2, 5. , 1.1])

Note that NumPy automatically converts integers, like 5, up to floats, since there is no loss of precision. NumPy will try to give you the best data type possible while keeping the array homogeneous, which means every item in the array has the same type.
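
The conversion rule above can be sketched as follows. The variable names are illustrative, and the exact integer type (for example `int64` or `int32`) depends on your platform, so no specific output is assumed for it.

```python
import numpy as np

ints = np.array([1, 2, 3])                 # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])              # one float -> every item becomes float64
forced = np.array([1, 2, 3], dtype=float)  # request a dtype explicitly

print(ints.dtype)
print(mixed.dtype)
print(forced)
```

Passing `dtype` explicitly is useful when you want floats from the start, even if the input values happen to be whole numbers.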

We can also create a sequence of numbers in an array with the `arange()` function. The first argument is the *starting bound*, the second argument is the *ending bound* (which is not included), and the third argument is the *difference between consecutive numbers*.

In [8]:
n = np.arange(0, 30, 2) # start at 0 count up by 2, stop before 30
n

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

Note that NumPy's `np.arange()` function is very similar to Python's built-in `range()`. The main difference is that `np.arange()` is a NumPy function that returns an `ndarray` (which stands for n-dimensional array), whereas Python's `range()` is actually a class, so calling it returns a range object.

While both serve the same purpose, two practical differences are worth highlighting:

  1. If you need to work with floating-point numbers (say, iterating with a step of 0.5), you can only do that with `np.arange()`, as Python's `range()` only works with integers.

  2. `np.arange()` is typically faster when you need the whole sequence at once, because of the way the `np.ndarray` class is implemented and works under the hood. However, `range()` produces its numbers on demand instead of storing them all, so if you are concerned about memory and can work with integers, `range()` is the better choice.
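
A small sketch of the first difference (the names `halves` and `range_error` are illustrative only):

```python
import numpy as np

# A float step is fine for np.arange().
halves = np.arange(0, 2, 0.5)
print(halves)

# The built-in range() only accepts integers, so a float step raises TypeError.
try:
    range(0, 2, 0.5)
    range_error = None
except TypeError as exc:
    range_error = exc
    print("range() failed:", exc)
```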

### Useful Functions (for creating arrays)

1. `reshape` returns an array with the same data but with a new shape. (Note that here `reshape` is called as a method of the array; NumPy also provides an equivalent `np.reshape()` function.)

In [9]:
n = n.reshape(3, 5) # reshape array to be 3x5
n

array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])

2. `linspace` returns evenly spaced numbers over a specified interval.

In [15]:
o = np.linspace(0, 4, 9) # returns 9 evenly spaced values from 0 to 4
o

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])

3. `resize` changes the shape and size of an array in place.

(The difference between `.reshape()` and `.resize()` is that `.reshape()` leaves the original array's shape unchanged and returns a reshaped array, whereas `.resize()` returns nothing and directly changes the original array.)

In [11]:
o.resize(3, 3)
o

array([[0. , 0.5, 1. ],
       [1.5, 2. , 2.5],
       [3. , 3.5, 4. ]])
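
To make the contrast between `.reshape()` and `.resize()` concrete, here is a minimal sketch using small throwaway arrays (the names `a`, `b`, `c`, and `result` are illustrative):

```python
import numpy as np

a = np.arange(6)
b = a.reshape(2, 3)      # returns a reshaped array; `a` keeps its own shape
print(a.shape, b.shape)

c = np.arange(6)
result = c.resize(2, 3)  # changes `c` in place and returns None
print(result, c.shape)
```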

Sometimes we know the shape of an array that we want to create, but not what we want to be in it. NumPy offers several functions to create arrays with initial placeholders, such as zeros or ones.

4. `ones` returns a new array of given shape and type, filled with ones.

In [12]:
np.ones((3, 2))

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

5. `zeros` returns a new array of given shape and type, filled with zeros.



In [13]:
np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

6. `random.rand()` returns an array of the given shape filled with random numbers drawn uniformly from the interval [0, 1).

In [16]:
np.random.rand(2,3)

array([[0.25886987, 0.30521197, 0.01322924],
       [0.62400233, 0.37402184, 0.27072583]])

Note: You'll see `zeros`, `ones`, and `rand` used quite often to create example arrays, especially in Stack Overflow posts and other forums.
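
Because `rand` gives different numbers on every run, examples are often made repeatable by fixing a seed first. The sketch below uses `np.random.seed()`, which is not covered elsewhere in this section; the seed value `0` is an arbitrary choice.

```python
import numpy as np

np.random.seed(0)            # fix the seed so the numbers are repeatable
first = np.random.rand(2, 3)

np.random.seed(0)            # same seed again -> same numbers again
second = np.random.rand(2, 3)

print(np.array_equal(first, second))
```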

7. `eye` returns a 2-D array with ones on the diagonal and zeros elsewhere, also known as an "identity matrix".

In [17]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

8. `diag` extracts a diagonal or constructs a diagonal array.

In [18]:
x = np.array([2, 5, 3, 5])

np.diag(x)

array([[2, 0, 0, 0],
       [0, 5, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 5]])

9. Create an array with a repeated sequence by repeating a list with `*` before converting it (or see `np.tile`).

In [20]:
np.array([1, 2, 3] * 3)

array([1, 2, 3, 1, 2, 3, 1, 2, 3])

10. `repeat` repeats elements of an array.

In [21]:
np.repeat([1, 2, 3], 3)

array([1, 1, 1, 2, 2, 2, 3, 3, 3])
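
Since `np.tile` is only mentioned in passing above, here is a short side-by-side sketch of `tile` and `repeat` (variable names are illustrative):

```python
import numpy as np

tiled = np.tile([1, 2, 3], 3)       # repeats the whole sequence three times
repeated = np.repeat([1, 2, 3], 3)  # repeats each element three times in turn

print(tiled)
print(repeated)
```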

### Combining Arrays

Say we have 2 two-dimensional arrays as follows, and wish to combine them into a single two-dimensional array.

In [22]:
p = np.ones([2, 3], int)
print(p)

r = np.random.rand(2,3)
print(r)

[[1 1 1]
 [1 1 1]]
[[0.57952484 0.07027974 0.3974206 ]
 [0.55875023 0.01255678 0.55301728]]


Use `vstack` to stack arrays in sequence vertically (row wise).

In [23]:
np.vstack([p, r])

array([[1.        , 1.        , 1.        ],
       [1.        , 1.        , 1.        ],
       [0.57952484, 0.07027974, 0.3974206 ],
       [0.55875023, 0.01255678, 0.55301728]])

Use `hstack` to stack arrays in sequence horizontally (column wise).

In [24]:
np.hstack([p, r])

array([[1.        , 1.        , 1.        , 0.57952484, 0.07027974,
        0.3974206 ],
       [1.        , 1.        , 1.        , 0.55875023, 0.01255678,
        0.55301728]])
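
For two-dimensional arrays like these, the same results can also be obtained with `np.concatenate` and an explicit `axis`. This function is not used elsewhere in this section, and the array `q` below is a stand-in chosen so the output is predictable.

```python
import numpy as np

p = np.ones((2, 3), dtype=int)
q = np.zeros((2, 3), dtype=int)

rows = np.concatenate([p, q], axis=0)  # stacks along rows, like np.vstack([p, q])
cols = np.concatenate([p, q], axis=1)  # stacks along columns, like np.hstack([p, q])

print(rows.shape)
print(cols.shape)
```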

## Array Operations
We can do many things on arrays, such as mathematical manipulation (addition, subtraction, square, exponents) as well as use boolean arrays. We can also do matrix manipulation such as product, transpose, inverse, and so forth.

### Mathematical Manipulation

Unlike lists, arrays can directly handle math operations.

For example, let's create a couple of arrays. Now, let's use `+`, `-`, `*`, `/` and `**` to perform element-wise addition, subtraction, multiplication, division and power.

In [25]:
x = np.array([1,2,3])
y = np.array([4,5,6])

print(x)
print(y)
print(x + y) # elementwise addition     [1 2 3] + [4 5 6] = [5  7  9]
print(x - y) # elementwise subtraction  [1 2 3] - [4 5 6] = [-3 -3 -3]

[1 2 3]
[4 5 6]
[5 7 9]
[-3 -3 -3]


In [26]:
print(x * y) # elementwise multiplication  [1 2 3] * [4 5 6] = [4  10  18]
print(x / y) # elementwise division        [1 2 3] / [4 5 6] = [0.25  0.4  0.5]

[ 4 10 18]
[0.25 0.4  0.5 ]


In [27]:
print(x**y) # elementwise power  [1 2 3] ** [4 5 6] = [1**4 2**5 3**6] = [1 32 729]

[  1  32 729]


Also unlike lists, arrays can handle math manipulations with scalars.

Let's create a list and an array and see how an array handles `+` and `*` vs how we know lists handle them:

In [28]:
x = [1,2,3]
y = np.array([1,2,3])

print(x+[3]) # for lists, + can only concatenate another list
print(y+[3]) # for arrays, + is elementwise, whether the other operand is a scalar, list, or array

[1, 2, 3, 3]
[4 5 6]


In [29]:
print(x*3) # * is equivalent to repeat in lists
print(y*3) # * is elementwise multiplication in arrays

[1, 2, 3, 1, 2, 3, 1, 2, 3]
[3 6 9]


Arrays can also handle `-`, `/`, and `**` with scalars, while lists would just raise a `TypeError`.
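
A brief sketch of those three operators with a scalar, plus the error a list produces (the name `list_error` is illustrative):

```python
import numpy as np

y = np.array([1, 2, 3])
print(y - 1)   # subtract 1 from every item
print(y / 2)   # divide every item by 2
print(y ** 2)  # square every item

# The same operation on a plain list is not defined.
try:
    [1, 2, 3] - 1
    list_error = None
except TypeError as exc:
    list_error = exc
    print("list failed:", exc)
```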

### Math Functions

NumPy also has many built in math functions that can be performed on arrays.

Let's look at examples of the commonly used ones:

In [30]:
a = np.array([-4, -2, 1, 3, 5])

In [31]:
a.sum()

3

In [32]:
a.max()

5

In [34]:
a.min()

-4

In [33]:
a.mean()

0.6

In [35]:
a.std()

3.2619012860600183

NumPy also has other functions, `argmax` and `argmin`, that return the index of the maximum and minimum values in the array.

Remember, indexing always starts at 0.

In [36]:
a.argmax()

4

In [37]:
a.argmin()

0
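
Two of the functions above are worth unpacking. By default, `std` computes the population standard deviation, that is, the square root of the mean squared deviation from the mean. And because `argmax` returns a position, it can be used to index back into the array. A minimal sketch (the name `manual_std` is illustrative):

```python
import numpy as np

a = np.array([-4, -2, 1, 3, 5])

# Population standard deviation computed by hand, next to a.std().
manual_std = np.sqrt(((a - a.mean()) ** 2).mean())
print(manual_std, a.std())

# The index from argmax points at the same value that max returns.
print(a[a.argmax()], a.max())
```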

### Matrix Functions

Besides elementwise manipulation, it is important to know that numpy supports matrix manipulation as well.

Let's look at matrix product. If we want to do elementwise product, we use the "*" sign.

In [38]:
A = np.array([[1,1],[0,1]])
B = np.array([[2,0],[3,4]])

print(A*B)

[[2 0]
 [0 4]]


If we want the dot product (or, for two-dimensional arrays, the matrix product), we can use the `@` operator or the `.dot()` method.

$\begin{bmatrix}x_1 & x_2 & x_3\end{bmatrix}
\cdot
\begin{bmatrix}y_1 \\ y_2 \\ y_3\end{bmatrix}
= x_1 y_1 + x_2 y_2 + x_3 y_3$

In [39]:
x = np.array([1,2,3])
y = np.array([4,5,6])

print(x@y) # dot product  1*4 + 2*5 + 3*6

print(x.dot(y)) #same result

32
32
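
The same `@` operator performs a full matrix product on two-dimensional arrays. As a sketch, here are the arrays `A` and `B` from the elementwise example, multiplied both ways (the names `elementwise` and `matrix` are illustrative):

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])

elementwise = A * B  # multiplies matching entries
matrix = A @ B       # combines rows of A with columns of B

print(elementwise)
print(matrix)
```

For instance, the top-left entry of `A @ B` is `1*2 + 1*3 = 5`, whereas the top-left entry of `A * B` is just `1*2 = 2`.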


Let's look at transposing arrays. Transposing permutes the dimensions of the array, i.e. switches the rows and columns.

In [40]:
# Let's create a two-dimensional array where the second row is the squared values of the first
z = np.array([y, y**2])
z

array([[ 4,  5,  6],
       [16, 25, 36]])

As we saw earlier, we can use `.shape` to find the size of each dimension of an array.

The shape of array `z` is `(2,3)` before transposing.

In [41]:
z.shape

(2, 3)

You can transpose using `.T` (i.e. row 1 becomes col 1, row 2 becomes col 2, etc.).

In [42]:
z.T

array([[ 4, 16],
       [ 5, 25],
       [ 6, 36]])

You can see that the number of rows has swapped with the number of columns.

In [43]:
z.T.shape

(3, 2)
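
One place where the swapped shape matters is the matrix product: a `(2, 3)` array can be multiplied by its `(3, 2)` transpose because the inner dimensions match. A minimal sketch, rebuilding `z` as above (the name `product` is illustrative):

```python
import numpy as np

y = np.array([4, 5, 6])
z = np.array([y, y**2])  # shape (2, 3)

product = z @ z.T        # (2, 3) @ (3, 2) -> (2, 2)
print(product.shape)
print(product)
```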

This is just a taste of how NumPy deals with matrix manipulation. You don't have to worry about complex matrix operations for this course, but it's important to know that NumPy is the underpinning of scientific computing libraries in Python, and that it is capable of doing both element-wise operations and matrix-level operations.