# 2. NumPy

**NumPy** (Numerical Python) is an open source Python library that’s used in almost every field of science and engineering. It’s the universal standard for working with numerical data in Python, and it’s at the core of the scientific Python and PyData ecosystems. NumPy users include everyone from beginning coders to experienced researchers doing state-of-the-art scientific and industrial research and development. The NumPy API is used extensively in Pandas, SciPy, Matplotlib, scikit-learn, scikit-image and most other data science and scientific Python packages.

The **NumPy** library contains multidimensional array and matrix data structures (you’ll find more information about this in later sections). It provides ndarray, a homogeneous n-dimensional array object, with methods to efficiently operate on it. **NumPy** can be used to perform a wide variety of mathematical operations on arrays. It adds powerful data structures to Python that guarantee efficient calculations with arrays and matrices and it supplies an enormous library of high-level mathematical functions that operate on these arrays and matrices.

## 2.1. How to use NumPy

The **import** statement loads a library so that your code can use it.

To access NumPy and its functions, **import** it in your Python code like this:

In [1]:
import numpy as np

We shorten the imported name to `np` so that code using NumPy is easier to read. This is a widely adopted convention, so following it makes your code more readable for everyone working on it. We recommend always using `import numpy as np`.

## 2.2. NumPy array

### 2.2.1. What’s the difference between a Python list and a NumPy array?

NumPy gives you an enormous range of fast and efficient ways of creating arrays and manipulating numerical data inside them. While a **Python list** can contain different data types within a single list, all of the elements in a **NumPy array** should be homogeneous. The mathematical operations that are meant to be performed on arrays would be extremely inefficient if the arrays weren’t homogeneous.
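
A small sketch of this difference, using example values chosen only for illustration: a list keeps each element's own type, while `np.array()` converts the elements to one common type, and `*` behaves differently on the two containers.

```python
import numpy as np

mixed_list = [1, 2.5, 3]            # a list keeps each element's own type
mixed_array = np.array(mixed_list)  # NumPy converts everything to one common dtype

print([type(v).__name__ for v in mixed_list])  # ['int', 'float', 'int']
print(mixed_array.dtype)                        # float64

# Arithmetic also differs: * repeats a list but scales an array element-wise
print([1, 2, 3] * 2)            # [1, 2, 3, 1, 2, 3]
print(np.array([1, 2, 3]) * 2)  # [2 4 6]
```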

**Why use NumPy?**

**NumPy arrays** are faster and more compact than Python lists. Because every element has the same type, an array stores its data in less memory than an equivalent list, and it is convenient to use. NumPy also provides a mechanism for specifying the data types, which allows the code to be optimized even further.
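
As a rough illustration of how the data type affects memory use, the sketch below stores the same 1000 integers with two different dtypes. The sizes and the variable names are example choices, and the smaller type only works here because every value fits in it.

```python
import numpy as np

values = np.arange(1000, dtype=np.int64)  # 8 bytes per element
small = values.astype(np.int16)           # 2 bytes per element; all values fit in int16

print(values.itemsize, values.nbytes)  # 8 8000
print(small.itemsize, small.nbytes)    # 2 2000
```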

### 2.2.2. What is an array?

An **array** is a central data structure of the NumPy library. An array is a grid of values and it contains information about the raw data, how to locate an element, and how to interpret an element. It has a grid of elements that can be indexed in various ways. The elements are all of the same type, referred to as the array `dtype`.

An array can be indexed by a tuple of nonnegative integers, by booleans, by another array, or by integers. The `rank` of the array is the number of dimensions. The `shape` of the array is a tuple of integers giving the size of the array along each dimension.

One way we can initialize NumPy arrays is from Python lists, using nested lists for two- or higher-dimensional data.

For example:

In [2]:
a = np.array([1, 2, 3, 4, 5, 6])

or

In [3]:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

We can access the elements in the array using **square brackets**. When you’re accessing elements, remember that indexing in NumPy starts at 0. That means that if you want to access the **first element** in your array, you’ll be accessing **element “0”**. For the two-dimensional array `a` defined above, element 0 is the first row.

In [4]:
print(a[0])

[1 2 3 4]
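
To reach a single value in a two-dimensional array, you can give one index per axis. The following sketch repeats the array from above and shows a few such lookups:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(a[0])      # first row: [1 2 3 4]
print(a[0, 1])   # row 0, column 1: 2
print(a[2, -1])  # last element of the last row: 12
```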


### 2.2.3. ndarray

`ndarray` is shorthand for N-dimensional array. An N-dimensional array is simply an array with any number of dimensions (N = 1, 2, 3, ...).
The NumPy `ndarray` class is used to represent both matrices and vectors. A **vector** is an array with a single dimension (there’s no difference between row and column vectors), while a **matrix** refers to an array with two dimensions.

**What are the attributes of an array?**

An `array` is usually a fixed-size container of items of the same type and size. The number of dimensions and items in an array is defined by its **shape**. The **shape** of an array is a tuple of non-negative integers that specify the sizes of each dimension.

In NumPy, dimensions are called **axes**. This means that if you have a 2D array that looks like this:

In [5]:
a = np.array([[0., 0., 0.],
              [1., 1., 1.]])
print(a)

[[0. 0. 0.]
 [1. 1. 1.]]


Your array has **2 axes**. The first axis has a length of 2 and the second axis has a length of 3.

Just like in other Python container objects, the contents of an array can be accessed and modified by **indexing** or **slicing** the array.

### 2.2.4. How to create a basic array

This section covers `np.array()`, `np.zeros()`, `np.ones()`, `np.empty()`, `np.arange()`, `np.linspace()`

To create a NumPy array, you can use the function `np.array()`.

All you need to do to create a simple array is pass a list to it. If you choose to, you can also specify the type of data in your list.

In [6]:
import numpy as np

a = np.array([1, 2, 3])
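
The optional data type mentioned above is passed through the `dtype` argument. A minimal sketch, with `ints` and `floats` as example names:

```python
import numpy as np

ints = np.array([1, 2, 3])                      # dtype inferred from the list (an integer type)
floats = np.array([1, 2, 3], dtype=np.float64)  # dtype requested explicitly

print(ints.dtype.kind)  # i (signed integer)
print(floats.dtype)     # float64
print(floats)           # [1. 2. 3.]
```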

You can visualize your array this way:

![](https://numpy.org/doc/stable/_images/np_array.png)

Besides creating an array from a sequence of elements, you can easily create an array filled with `0`s using `np.zeros()`:

In [7]:
np.zeros(2)

array([0., 0.])

Or an array filled with `1`s using `np.ones(2)`:

In [8]:
np.ones(2)

array([1., 1.])

Or even an empty array! The function `np.empty()` creates an array without initializing its entries, so the initial content is arbitrary and depends on the state of the memory. The values you see may differ from the output shown here.

In [9]:
# Create an empty array with 2 elements
np.empty(2)

array([1., 1.])
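
Because the contents of an array created with `np.empty()` should not be relied on, a common pattern is to assign every element before reading any of them. A short sketch, with `buf` as an example name:

```python
import numpy as np

buf = np.empty((2, 3))       # shape and dtype are set, values are whatever was in memory
print(buf.shape, buf.dtype)  # (2, 3) float64

buf[:] = 0.0                 # assign every element before reading from the array
print(buf)
```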

You can create an array with a range of elements using `np.arange()`. Called with a single argument `N`, it returns the integers from 0 up to, but not including, `N`:

In [10]:
np.arange(4)

array([0, 1, 2, 3])

And even an array that contains a range of evenly spaced values. To do this, you specify the **first number**, the **last number** (which is excluded from the result), and the **step size**.

In [11]:
# first number = 1
# last number = 9 (exclusive: arange stops before reaching 9)
# step size = 2
np.arange(1, 9, 2)
# 1, 1+2, 1+2+2, 1+2+2+2

array([1, 3, 5, 7])

You can also use `np.linspace()` to create an array with values that are spaced linearly in a specified interval:

In [12]:
# Divide [0, 10] into four equal intervals (five evenly spaced points)
np.linspace(0, 10, num=5)

array([ 0. ,  2.5,  5. ,  7.5, 10. ])
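
Unlike `np.arange()`, `np.linspace()` includes the last number by default and takes the number of points rather than the step size. The sketch below uses the `retstep` and `endpoint` parameters to make that visible:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points
points, step = np.linspace(0, 10, num=5, retstep=True)
print(points)  # [ 0.   2.5  5.   7.5 10. ]
print(step)    # 2.5

# endpoint=False leaves out the stop value, like arange does
print(np.linspace(0, 10, num=5, endpoint=False))  # [0. 2. 4. 6. 8.]
```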

### 2.2.5. Adding, removing, and sorting elements

This section covers `np.sort()`, `np.concatenate()`

Sorting an array is simple with `np.sort()`. You can specify the axis, kind, and order when you call the function.

If you start with this array:

In [13]:
arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])

You can quickly sort the numbers in ascending order with:

In [14]:
np.sort(arr)

array([1, 2, 3, 4, 5, 6, 7, 8])
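
Two details are worth seeing in code: `np.sort()` returns a sorted copy and leaves the original untouched, and the `axis` argument controls the direction of sorting in a 2D array. The array `grid` below is an example value introduced for this sketch.

```python
import numpy as np

arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])

sorted_copy = np.sort(arr)  # returns a new sorted array
print(arr)                  # original is unchanged: [2 1 5 3 7 4 6 8]
print(sorted_copy[::-1])    # descending order via a reversed slice: [8 7 6 5 4 3 2 1]

grid = np.array([[3, 1, 2], [9, 7, 8]])
print(np.sort(grid, axis=1))  # sort within each row
# [[1 2 3]
#  [7 8 9]]
```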

If you start with these arrays:

In [15]:
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

You can concatenate them with `np.concatenate()`.

In [16]:
np.concatenate((a, b))

array([1, 2, 3, 4, 5, 6, 7, 8])

Or, if you start with these arrays:

In [17]:
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6]])

You can concatenate them with:

In [18]:
# axis=0 joins along the first axis: the row of y is appended below the rows of x
np.concatenate((x, y), axis=0)

# axis=1 joins along columns instead, but it needs arrays with the same number of rows

array([[1, 2],
       [3, 4],
       [5, 6]])
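
Concatenating along `axis=1` only works when the arrays have the same number of rows. The sketch below transposes `y` to make the shapes compatible, then shows the error raised without the transpose:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])  # shape (2, 2)
y = np.array([[5, 6]])          # shape (1, 2)

# y.T has shape (2, 1), so its row count matches x and axis=1 works
side_by_side = np.concatenate((x, y.T), axis=1)
print(side_by_side)
# [[1 2 5]
#  [3 4 6]]

# Without the transpose, the row counts differ (2 vs 1) and NumPy raises ValueError
try:
    np.concatenate((x, y), axis=1)
except ValueError as exc:
    print("cannot concatenate:", exc)
```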

### 2.2.6. How do you know the shape and size of an array?

This section covers `ndarray.ndim`, `ndarray.size`, `ndarray.shape`

`ndarray.ndim` will tell you the number of axes, or dimensions, of the array.

`ndarray.size` will tell you the total number of elements of the array. This is the product of the elements of the array’s shape.

`ndarray.shape` will display a tuple of integers that indicate the number of elements stored along each dimension of the array. If, for example, you have a 2-D array with 2 rows and 3 columns, the shape of your array is `(2, 3)`.

For example, if you create this array:

In [19]:
array_example = np.array([[[0, 1, 2, 3],
                           [4, 5, 6, 7]],

                          [[0, 1, 2, 3],
                           [4, 5, 6, 7]],

                          [[0, 1, 2, 3],
                           [4, 5, 6, 7]]])

To find the number of dimensions of the array, run:

In [20]:
array_example.ndim

3

To find the total number of elements in the array, run:

In [21]:
array_example.size

24

And to find the shape of your array, run:

In [22]:
array_example.shape

(3, 2, 4)
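
The three attributes are related: `ndim` is the length of `shape`, and `size` is the product of the entries of `shape`. A quick check, using an array of zeros with the same shape as the example:

```python
import math
import numpy as np

array_example = np.zeros((3, 2, 4))

print(len(array_example.shape) == array_example.ndim)        # True
print(math.prod(array_example.shape) == array_example.size)  # True: 3 * 2 * 4 = 24
print(len(array_example))                                    # 3, the length of the first axis
```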

### 2.2.7. Can you reshape an array?

This section covers `arr.reshape()`

**Yes!**

Using `arr.reshape()` will give a new shape to an array without changing the data. Just remember that when you use the reshape method, the array you want to produce needs to have the **same number of elements** as the original array. If you start with an array with 12 elements, you’ll need to make sure that your new array also has a total of 12 elements.

If you start with this array:

In [23]:
a = np.arange(6)
print(a)

[0 1 2 3 4 5]


You can use `reshape()` to reshape your array. For example, you can reshape this array to an array with three rows and two columns:

In [24]:
# shape (6,) -> shape (3, 2)
b = a.reshape(3, 2)
print(b)

[[0 1]
 [2 3]
 [4 5]]
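
Two more behaviors of `reshape()` are sketched below: passing `-1` lets NumPy infer one dimension from the number of elements, and asking for a shape with a different number of elements raises a `ValueError`.

```python
import numpy as np

a = np.arange(6)

inferred = a.reshape(2, -1)  # -1 is inferred as 3 because 2 * 3 = 6
print(inferred.shape)        # (2, 3)

try:
    a.reshape(4, 2)          # 4 * 2 = 8 elements, but a has only 6
except ValueError as exc:
    print("reshape failed:", exc)
```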


### 2.2.8. Indexing and slicing

You can **index** and **slice** NumPy arrays in the same ways you can slice Python lists.

In [25]:
data = np.array([1, 2, 3])

In [26]:
# Remember: indexing starts at 0
data[1]

2

In [27]:
data[0:2]

array([1, 2])

In [28]:
data[1:]

array([2, 3])

In [29]:
data[-2:]

array([2, 3])

You can visualize it this way:

![](https://numpy.org/doc/stable/_images/np_indexing.png)

You may want to take a section of your array or specific array elements to use in further analysis or additional operations. To do that, you’ll need to subset, slice, and/or index your arrays.
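
For a two-dimensional array, you can slice each axis separately by separating the slices with a comma. A short sketch with an example 3x4 array:

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

print(a[:2, 1:3])  # first two rows, columns 1 and 2
# [[2 3]
#  [6 7]]
print(a[:, 0])     # every row, first column: [1 5 9]
print(a[1])        # the whole second row: [5 6 7 8]
```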

If you want to select values from your array that fulfill certain **conditions**, it’s straightforward with NumPy.

For example, if you start with this array:

In [30]:
a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

You can easily print all of the values in the array that are less than 5.

In [31]:
print(a[a < 5])

[1 2 3 4]


You can also select, for example, numbers that are equal to or greater than 5, and use that condition to index an array.

In [32]:
five_up = (a >= 5)
print(a[five_up])

# same as a[a >= 5]

[ 5  6  7  8  9 10 11 12]


You can select elements that are divisible by 2:

In [33]:
divisible_by_2 = a[a%2==0]
print(divisible_by_2)

[ 2  4  6  8 10 12]


Or you can select elements that satisfy two conditions using the `&` (and) and `|` (or) operators:

In [34]:
# & is the element-wise 'and' operation
c = a[(a > 2) & (a < 11)]
print(c)

[ 3  4  5  6  7  8  9 10]
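
The `|` operator works the same way and keeps the elements that satisfy at least one of the conditions. Each condition needs its own parentheses because `&` and `|` bind more tightly than the comparison operators.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

# | is the element-wise 'or' operation
d = a[(a < 3) | (a > 10)]
print(d)  # [ 1  2 11 12]
```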


## 2.3. Array operations

### 2.3.1. Basic array operations

This section covers addition, subtraction, multiplication, division, and more

Once you’ve created your arrays, you can start to work with them. Let’s say, for example, that you’ve created two arrays, one called **“data”** and one called **“ones”**.

![](https://numpy.org/doc/stable/_images/np_array_dataones.png)

In [35]:
data = np.array([1, 2])
ones = np.ones(2, dtype=int)

You can add the arrays together with the plus sign.

In [36]:
data + ones

array([2, 3])

![](https://numpy.org/doc/stable/_images/np_data_plus_ones.png)

You can, of course, do more than just addition!

In [37]:
data - ones

array([0, 1])

In [38]:
data * data

array([1, 4])

In [39]:
data / data

array([1., 1.])

![](https://numpy.org/doc/stable/_images/np_sub_mult_divide.png)

Basic operations are simple with NumPy. If you want to find the sum of the elements in an array, you’d use `sum()`. This works for 1D arrays, 2D arrays, and arrays in higher dimensions.

In [40]:
a = np.array([1, 2, 3, 4])
a.sum() # 1+2+3+4=10

10

To add the rows or the columns in a 2D array, you would specify the axis.

If you start with this array:

In [41]:
b = np.array([[1, 1], [2, 2]])

You can sum over the axis of rows (adding the rows together, which gives one total per column) with:

In [42]:
b.sum(axis=0)

array([3, 3])

You can sum over the axis of columns (adding the columns together, which gives one total per row) with:

In [43]:
b.sum(axis=1)

array([2, 4])
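
The effect of `axis` is easier to see with a non-square array, because the two results then have different lengths. The array `m` below is an example value chosen for this sketch.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])  # 2 rows, 3 columns

print(m.sum(axis=0))  # one total per column: [5 7 9]
print(m.sum(axis=1))  # one total per row: [ 6 15]
print(m.sum())        # total of all elements: 21
```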

### 2.3.2. Broadcasting

There are times when you might want to carry out an **operation between an array and a single number** (also called an operation between a vector and a scalar) or between arrays of two different sizes. For example, your array (we’ll call it **“data”**) might contain information about distance in miles but you want to convert the information to kilometers. You can perform this operation with:

In [44]:
data = np.array([1.0, 2.0])
data * 1.6

array([1.6, 3.2])

![](https://numpy.org/doc/stable/_images/np_multiply_broadcasting.png)

NumPy understands that the multiplication should happen with each cell. That concept is called **broadcasting**. **Broadcasting** is a mechanism that allows NumPy to perform operations on arrays of different shapes. The dimensions of your arrays must be compatible: for example, when the dimensions of both arrays are equal or when one of them is 1. If the dimensions are not compatible, you will get a `ValueError`.
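
The same rule applies between arrays, not only between an array and a scalar. The sketch below, with example arrays `m`, `row`, and `col`, shows two compatible combinations and one incompatible one:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)  # [[0 1 2], [3 4 5]], shape (2, 3)
row = np.array([10, 20, 30])    # shape (3,): matches the last dimension of m
col = np.array([[100], [200]])  # shape (2, 1): the dimension of size 1 is stretched

print(m + row)
# [[10 21 32]
#  [13 24 35]]
print(m + col)
# [[100 101 102]
#  [203 204 205]]

try:
    m + np.array([1, 2])        # shape (2,) does not match the last dimension (3)
except ValueError as exc:
    print("not compatible:", exc)
```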

### 2.3.3. More useful array operations

This section covers **maximum, minimum, sum, mean, product, standard deviation**

NumPy also performs aggregation functions. In addition to `min`, `max`, and `sum`, you can easily run `mean` to get the average, `prod` to get the result of multiplying the elements together, `std` to get the standard deviation, and more.

In [45]:
data = np.array([1.0, 2.0, 3.0])

In [46]:
data.max()

3.0

In [47]:
data.min()

1.0

In [48]:
data.sum()

6.0

![](https://numpy.org/doc/stable/_images/np_aggregation.png)
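
The `mean`, `prod`, and `std` methods mentioned above are called the same way. Note that `std` computes the population standard deviation by default; the `ddof=1` argument shown at the end gives the sample standard deviation instead.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0])

print(data.mean())       # 2.0
print(data.prod())       # 6.0
print(data.std())        # approximately 0.8165 (divides by n)
print(data.std(ddof=1))  # approximately 1.0 (divides by n - 1)
```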

Let’s start with this array, called “a”:

In [49]:
a = np.array([[0.45053314, 0.17296777, 0.34376245, 0.5510652],
              [0.54627315, 0.05093587, 0.40067661, 0.55645993],
              [0.12697628, 0.82485143, 0.26590556, 0.56917101]])

It’s very common to want to aggregate along a **row** or **column**. By default, every NumPy aggregation function will return the aggregate of the entire array. To find the sum or the minimum of the elements in your array, run:

In [50]:
a.sum()

4.8595784

Or:

In [51]:
a.min()

0.05093587

You can specify on which axis you want the aggregation function to be computed. For example, you can find the minimum value within each **column** by specifying `axis=0`.

In [52]:
a.min(axis=0)

array([0.12697628, 0.05093587, 0.26590556, 0.5510652 ])

The four values listed above correspond to the number of columns in your array. With a four-column array, you will get four values as your result.
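
Specifying `axis=1` instead aggregates within each row. A small sketch with an example 2x3 array, chosen so that the two results are easy to check by eye:

```python
import numpy as np

scores = np.array([[4, 9, 2],
                   [7, 1, 8]])

print(scores.min(axis=0))  # minimum of each column: [4 1 2]
print(scores.min(axis=1))  # minimum of each row: [2 1]
print(scores.max(axis=1))  # maximum of each row: [9 8]
```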

### 2.3.4. Transposing and reshaping a matrix

This section covers `arr.reshape()`, `arr.transpose()`, `arr.T`

It’s common to need to **transpose** your matrices. NumPy arrays have the property `T` that allows you to transpose a matrix.

![](https://numpy.org/doc/stable/_images/np_transposing_reshaping.png)

In [53]:
data = np.array([[1,2],[3,4],[5,6]])
print(data)

[[1 2]
 [3 4]
 [5 6]]


In [54]:
print(data.T)

[[1 3 5]
 [2 4 6]]


You can also use `.transpose()` to reverse or change the axes of an array according to the values you specify.

In [55]:
print(data.transpose())

[[1 3 5]
 [2 4 6]]
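
For arrays with more than two dimensions, `.transpose()` accepts the new order of the axes. In this sketch, `cube` is an example 3D array; the shapes show how the axes move.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)

print(cube.transpose().shape)         # all axes reversed: (4, 3, 2)
print(cube.transpose(1, 0, 2).shape)  # first two axes swapped: (3, 2, 4)

# The element at [1, 2, 3] moves to [2, 1, 3] after swapping the first two axes
print(cube[1, 2, 3] == cube.transpose(1, 0, 2)[2, 1, 3])  # True
```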


You may also need to **switch the dimensions** of a matrix. This can happen when, for example, you have a model that expects a certain input shape that is different from your dataset. This is where the `reshape` method can be useful. You simply need to pass in the new dimensions that you want for the matrix.

In [56]:
data.reshape(2, 3)

array([[1, 2, 3],
       [4, 5, 6]])

In [57]:
data.reshape(3, 2)

array([[1, 2],
       [3, 4],
       [5, 6]])

![](https://numpy.org/doc/stable/_images/np_reshape.png)
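
Reshaping and transposing can produce the same shape but not the same arrangement of values: `reshape` keeps the elements in their original reading order, while `T` swaps rows and columns. A short comparison:

```python
import numpy as np

data = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)

print(data.reshape(2, 3))  # same reading order, new row length
# [[1 2 3]
#  [4 5 6]]
print(data.T)              # rows become columns
# [[1 3 5]
#  [2 4 6]]
```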

### 2.3.5. NumPy linear algebra

NumPy provides many functions to support linear algebra operations.

Let's get back to the `*` operation we learned above. The `*` operation is an **element-wise** operation, which means that it multiplies the corresponding elements of the two arrays.

`[[1, 2], [3, 4]] * [[10, 20], [30, 40]]` --> `[[1*10, 2*20], [3*30, 4*40]]` = `[[10, 40], [90, 160]]`

In [58]:
x = np.array([[1,2],[3,4]])
y = np.array([[10,20],[30,40]])

In [59]:
print(x*y)

[[ 10  40]
 [ 90 160]]


However, there are also operations for linear algebra, such as matrix multiplication. We can do matrix multiplication using `.dot()`:

In [60]:
print(x.dot(y))

[[ 70 100]
 [150 220]]


Alternatively, you can use `np.matmul()`:

In [61]:
print(np.matmul(x,y))

[[ 70 100]
 [150 220]]
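
Python also offers the `@` operator for matrix multiplication, which gives the same result as `np.matmul()` for these 2D arrays:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[10, 20], [30, 40]])

print(x @ y)
# [[ 70 100]
#  [150 220]]
print(np.array_equal(x @ y, np.matmul(x, y)))  # True
```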


If the matrix is invertible, we can get its inverse using `np.linalg.inv()`:

In [62]:
print(np.linalg.inv(x))

[[-2.   1. ]
 [ 1.5 -0.5]]


In [63]:
x.dot(np.linalg.inv(x))

array([[1.0000000e+00, 0.0000000e+00],
       [8.8817842e-16, 1.0000000e+00]])
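
The tiny nonzero entry in the result above comes from floating-point rounding, so the product is only approximately the identity matrix. One way to check such results is `np.allclose()`, which compares within a tolerance. The sketch also shows the error raised for a matrix that has no inverse (`singular` is an example matrix).

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
product = x @ np.linalg.inv(x)

# An exact comparison can fail because of rounding; allclose allows a small tolerance
print(np.allclose(product, np.eye(2)))  # True

singular = np.array([[1, 2], [2, 4]])   # second row is twice the first
try:
    np.linalg.inv(singular)
except np.linalg.LinAlgError as exc:
    print("no inverse:", exc)
```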

### 2.3.6. Working with mathematical formulas

The ease of implementing mathematical formulas that work on arrays is one of the things that make NumPy so widely used in the scientific Python community.

For example, this is the **mean square error** formula (a central formula used in supervised machine learning models that deal with regression):

![](https://numpy.org/doc/stable/_images/np_MSE_formula.png)
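
In case the image does not render, the formula can be written out as follows. Here $n$ is the number of samples, and $\hat{y}_i$ and $y_i$ are symbols chosen here for the $i$-th prediction and label (the image may use different names).

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2
```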

Implementing this formula is simple and straightforward in NumPy:

![](https://numpy.org/doc/stable/_images/np_MSE_implementation.png)

What makes this work so well is that `predictions` and `labels` can contain one or a thousand values. They only need to be the same size.

You can visualize it this way:

![](https://numpy.org/doc/stable/_images/np_mse_viz1.png)

In this example, both the predictions and labels vectors contain three values, meaning `n` has a value of three. After we carry out subtractions the values in the vector are squared. Then NumPy sums the values, and your result is the error value for that prediction and a score for the quality of the model.

![](https://numpy.org/doc/stable/_images/np_mse_viz2.png)
![](https://numpy.org/doc/stable/_images/np_MSE_explanation2.png)

In [64]:
predictions = np.array([1,1,1])
labels = np.array([1,2,3])

In [65]:
error = (1/3) * np.sum(np.square(predictions - labels))
print(error)

1.6666666666666665


In [66]:
error = np.mean(np.square(predictions - labels))
print(error)

1.6666666666666667
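
If you need the formula in several places, it can be wrapped in a small function. The helper `mse` below is a hypothetical name introduced for this sketch, not a NumPy function, and the shape check is an added assumption about how mismatched inputs should be handled.

```python
import numpy as np

def mse(predictions, labels):
    """Mean square error of two equally shaped arrays (hypothetical helper)."""
    predictions = np.asarray(predictions, dtype=float)
    labels = np.asarray(labels, dtype=float)
    if predictions.shape != labels.shape:
        raise ValueError("predictions and labels must have the same shape")
    return np.mean(np.square(predictions - labels))

print(mse([1, 1, 1], [1, 2, 3]))  # approximately 1.667, i.e. (0 + 1 + 4) / 3
```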


## 2.4. Exercises

1. Define `data` as a NumPy array holding the matrix
[[1,2],
 [3,4],
 [5,6]]

1-1. Get a NumPy array in which every element of `data` is multiplied by **3**

1-2. What is the shape of `data`? How many rows and columns?

1-3. Reshape `data` to shape (2,3)

1-4. Output the transpose of `data`

2. Output a sequence of 5 equally spaced numbers in the range 0 to 100 (both inclusive)

(The output should be [0, 25, 50, 75, 100])

3. Given two NumPy arrays `a` and `b` as follows, output the result of multiplying the two matrices

In [67]:
a = np.array([[1,2,3],
              [4,5,6],
              [7,8,9]])

b = np.array([[2,3,4],
              [5,6,7],
              [8,9,10]])

3-1. Define a new variable named `c` and assign it the output of Exercise #3

In [68]:
c = ...  # replace ... with your answer to Exercise 3

3-2. Find the minimum and maximum value of elements in `c`

3-3. Calculate the sum of elements in `c`

4. Solve the equation:

x_1 + 2x_2 = 8

3x_1 + 4x_2 = 22

What are the values of x_1 and x_2?

Can we solve it using matrix notation and NumPy operations? (x = [x_1, x_2])

In [None]:
# Hint
a = np.array([[1,2],
              [3,4]])

y = np.array([8,22])
