<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/Python-Notebook-Banners/Examples.png"  style="display: block; margin-left: auto; margin-right: auto;";/>
</div>

# Examples: Introduction to NumPy
© ExploreAI Academy

 In this train, we'll learn about NumPy and some of its basic operations. We'll define what an array is, learn how to create, access and modify an array, and explore useful functions on arrays.

## Learning Objectives
* Understand the NumPy Python library and its basic functionality.
* Know how to load, manipulate, and analyse data using NumPy.


NumPy is a Python package for loading, analysing, and storing numerical data. It provides high-performance, multidimensional array objects and numerical computing tools, and it is the core library for scientific computing in Python ([see the full documentation](https://numpy.org/)).

### What is an array?
In NumPy, an array is the fundamental data structure, designed for efficient storage and manipulation of numerical data. Although it shares some similarities with Python lists, there are key differences that make NumPy arrays more suitable for data analysis and scientific computing. Some of these key differences are:
- Lists can contain heterogeneous data types (combinations of `str`, `int`, even `list`), whilst NumPy arrays can only store values of the same data type. This restriction allows NumPy to optimise storage and computation, as operations on homogeneously-typed elements are more efficient.
- NumPy arrays can be thought of as a grid of values and can be multi-dimensional. This multidimensional nature makes them extremely versatile for a wide range of applications, from simple lists of values (1D) to images (2D) or even higher-dimensional data.
- NumPy arrays are stored more efficiently than Python lists. Lists in Python are essentially arrays of pointers to objects, with each pointer also requiring additional space. In contrast, NumPy arrays directly store data in adjacent blocks of memory. This compact and continuous storage reduces the memory footprint and improves cache utilisation, which is critical for performance in large-scale computations.
- One of the most powerful features of NumPy arrays is their support for vectorised operations. This means that operations on arrays, such as addition, multiplication, and complex mathematical functions, are applied to every element of the array in a single expression, without an explicit Python loop. This is not only more syntactically convenient but also significantly faster than looping over elements, as is common with Python lists.
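
The differences listed above can be seen in a minimal sketch. The names `prices_list`, `prices_array`, and `mixed` are illustrative only and do not appear elsewhere in this train.

```python
import numpy as np

# A plain Python list and the equivalent NumPy array (illustrative values).
prices_list = [10, 20, 30]
prices_array = np.array(prices_list)

# Doubling every value in a list needs a loop or a comprehension...
doubled_list = [price * 2 for price in prices_list]

# ...whereas the array operation is applied element-wise in one expression.
doubled_array = prices_array * 2

# Mixing integers and floats: NumPy converts all values to one common type.
mixed = np.array([1, 2.5, 3])

print(doubled_list)   # [20, 40, 60]
print(doubled_array)  # [20 40 60]
print(mixed.dtype)    # float64
```

Note that `prices_list * 2` would not double the values; for a list it repeats the list instead.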


### Creating a NumPy array
If we want to work with any NumPy objects or functions we first need to import the NumPy library.


In [None]:
import numpy as np

### Example 1

To create a NumPy array, we use the `np.array()` function. We pass it a list (or a list of lists) and can optionally specify the data type of the elements with the `dtype` argument. Let's look at an example:

In [None]:
# Create an array by passing in a list of lists.
ratings = np.array([[94, 89, 63, 45], [93, 92, 48, 23], [92, 94, 56, 98]])

ratings

array([[94, 89, 63, 45],
       [93, 92, 48, 23],
       [92, 94, 56, 98]])

We can see that there are 3 rows and 4 columns in this array. We can inspect the shape of the array by using its `shape` attribute. This will return a tuple of integers giving the size of the array along each dimension.

In [None]:
ratings.shape

(3, 4)
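
The optional data type mentioned above can be set through the `dtype` argument of `np.array()`. The following is a small sketch; the name `ratings_float` and the shortened 2 x 2 data are assumptions made for illustration.

```python
import numpy as np

# Ask NumPy to store the values as floats instead of integers.
ratings_float = np.array([[94, 89], [93, 92]], dtype=float)

print(ratings_float.dtype)  # float64
print(ratings_float.shape)  # (2, 2)
print(ratings_float.ndim)   # 2
```

When `dtype` is omitted, NumPy infers the type from the data. For a list of Python integers this is an integer type whose exact size depends on the platform.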

### Example 2

We can also create arrays using the following functions:

* `np.ones()` - array of ones
* `np.zeros()` - array of zeros
* `np.random.random()` - array of random floats in the interval [0, 1)

Let's look at some examples:

In [None]:
# Create an array of ones - pass in shape as a tuple.
np.ones((3,4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [None]:
# Create an array of zeros - pass in shape as a tuple.
np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [None]:
# Create an array of random values - pass in shape as a tuple.
np.random.random((3, 3))

array([[0.96558396, 0.74917461, 0.90458072],
       [0.95481794, 0.55636968, 0.55671106],
       [0.13687977, 0.32180334, 0.72262061]])
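
Because the values from `np.random.random()` change on every run, the output above will differ from yours. As a hedged sketch, fixing the seed with `np.random.seed()` makes the draws repeatable, and `dtype` can be passed to `np.ones()` or `np.zeros()` to avoid the default floats. The names `int_ones`, `first_draw`, and `second_draw` are illustrative.

```python
import numpy as np

# np.ones() and np.zeros() return floats by default; dtype changes that.
int_ones = np.ones((2, 3), dtype=int)

# Fixing the seed makes the "random" values repeatable between runs.
np.random.seed(0)
first_draw = np.random.random((3, 3))

np.random.seed(0)
second_draw = np.random.random((3, 3))

print(int_ones)
print(np.array_equal(first_draw, second_draw))  # True
```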

### Accessing NumPy arrays
NumPy offers several ways to index into arrays. Here, we'll work through slicing and Boolean array indexing.

### Example 3

Similar to Python lists, NumPy arrays can be sliced. Since arrays may be multi-dimensional, we specify a slice for each dimension of the array, with the slices separated by commas. For a 2-D array, the first dimension is the vertical axis (rows) while the second dimension is the horizontal axis (columns). As with lists, indexing starts at 0 and the end index of a slice is excluded.

For a 2-D array:

* `array[vertical index, horizontal index]` - for one element
* `array[vertical start:vertical end, horizontal start:horizontal end]` - for more than one element

Let's look at a few examples:

In [None]:
# Whole array.
ratings

array([[94, 89, 63, 45],
       [93, 92, 48, 23],
       [92, 94, 56, 98]])

In [None]:
# Select top left element.
ratings[0, 0]

94

In [None]:
# Select first row.
ratings[0, :]

array([94, 89, 63, 45])

In [None]:
# Select first column.
ratings[:, 0]

array([94, 93, 92])

In [None]:
# Select first two rows and first two columns.
ratings[0:2, 0:2]

array([[94, 89],
       [93, 92]])
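
The same slicing rules also work with negative indices and open-ended slices, as with Python lists. The sketch below recreates `ratings` so that it runs on its own; the names `last_row`, `last_two_columns`, and `inner_block` are illustrative.

```python
import numpy as np

ratings = np.array([[94, 89, 63, 45], [93, 92, 48, 23], [92, 94, 56, 98]])

# Last row: a negative index counts from the end.
last_row = ratings[-1, :]

# Last two columns of every row.
last_two_columns = ratings[:, -2:]

# Rows 1 onwards, columns 1 and 2 (the end index 3 is excluded).
inner_block = ratings[1:, 1:3]

print(last_row)          # [92 94 56 98]
print(last_two_columns)
print(inner_block)
```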

### Example 4

Boolean array indexing lets us pick out a selection of elements from an array. This type of indexing is often used to select the elements of an array that satisfy a specific condition. The syntax is as follows:

* `array[condition]`

Let's look at an example:

In [None]:
# Select all values greater than 90.
ratings[ratings > 90]

array([94, 93, 92, 92, 94, 98])
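
To see how this works, it can help to look at the condition on its own: it produces a Boolean array of the same shape, which is then used to pick elements. The sketch below recreates `ratings` so that it runs on its own; `mask` and `mid_range` are illustrative names. Conditions are combined with `&` (and) and `|` (or), with each condition wrapped in parentheses.

```python
import numpy as np

ratings = np.array([[94, 89, 63, 45], [93, 92, 48, 23], [92, 94, 56, 98]])

# The condition itself is an array of True/False values.
mask = ratings > 90
print(mask)

# Counting True values gives the number of matching elements.
print(mask.sum())  # 6

# Combine two conditions: values above 50 and below 90.
mid_range = ratings[(ratings > 50) & (ratings < 90)]
print(mid_range)   # [89 63 56]
```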

### Modifying NumPy arrays
We will now look at how to add elements to an array, followed by how to remove elements from an array.

### Example 5
Adding elements can be done by using the `np.append()` function. It returns a new array with the elements added to the end along the given axis; the original array is left unchanged. Let's look at an example:

In [None]:
# Append an extra row - note that axis=0.
ratings_extra_row = np.append(ratings, [[92, 88, 78, 55]], axis=0)

ratings_extra_row

array([[94, 89, 63, 45],
       [93, 92, 48, 23],
       [92, 94, 56, 98],
       [92, 88, 78, 55]])
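
A column can be appended in the same way by using `axis=1`. The sketch below recreates `ratings` so that it runs on its own; the new column values and the names `ratings_extra_col` and `flattened` are made up for illustration. It also shows that omitting `axis` flattens the result to one dimension.

```python
import numpy as np

ratings = np.array([[94, 89, 63, 45], [93, 92, 48, 23], [92, 94, 56, 98]])

# Append an extra column - one new value per row, with axis=1.
ratings_extra_col = np.append(ratings, [[70], [71], [72]], axis=1)
print(ratings_extra_col.shape)  # (3, 5)

# Without an axis, both inputs are flattened before appending.
flattened = np.append(ratings, [1, 2])
print(flattened.shape)          # (14,)

# The original array is not modified.
print(ratings.shape)            # (3, 4)
```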

### Example 6
Deleting elements can be done by using the `np.delete()` function. It returns a new array without the elements at the specified indices; the original array is left unchanged. Let's look at an example:

In [None]:
# Delete the 3rd row - note that axis=0.
ratings_del_row = np.delete(ratings, [2], axis=0)

ratings_del_row

array([[94, 89, 63, 45],
       [93, 92, 48, 23]])
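
Columns can be deleted in the same way by using `axis=1`, and several indices can be passed at once. The sketch below recreates `ratings` so that it runs on its own; `ratings_del_col` and `middle_row_only` are illustrative names.

```python
import numpy as np

ratings = np.array([[94, 89, 63, 45], [93, 92, 48, 23], [92, 94, 56, 98]])

# Delete the 2nd column (index 1) - note that axis=1.
ratings_del_col = np.delete(ratings, 1, axis=1)
print(ratings_del_col)

# Delete the 1st and 3rd rows in one call.
middle_row_only = np.delete(ratings, [0, 2], axis=0)
print(middle_row_only)  # [[93 92 48 23]]

# The original array still has all of its rows and columns.
print(ratings.shape)    # (3, 4)
```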

### Functions on NumPy arrays
NumPy arrays provide various methods for summarising their values. We will look at a few of them, namely:

* `array.sum()`
* `array.min()`
* `array.max()`
* `array.argmax()`
* `array.argmin()`

### Example 7
Each of these accepts an optional `axis` argument. With `axis=1`, the operation runs along each row and returns one value per row; with `axis=0`, it runs down each column and returns one value per column. This allows us to sum the rows and columns separately. Let's look at some examples:

In [None]:
# Sum of all elements in array.
ratings.sum()

887

In [None]:
# Sum of each row.
ratings.sum(axis=1)

array([291, 256, 340])

In [None]:
# Sum of each column.
ratings.sum(axis=0)

array([279, 275, 167, 166])

In [None]:
# Min of each row.
ratings.min(axis=1)

array([45, 23, 56])
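
`max()` follows the same pattern as `sum()` and `min()`. The sketch below recreates `ratings` so that it runs on its own.

```python
import numpy as np

ratings = np.array([[94, 89, 63, 45], [93, 92, 48, 23], [92, 94, 56, 98]])

# Largest and smallest values in the whole array.
print(ratings.max())        # 98
print(ratings.min())        # 23

# Max of each row.
print(ratings.max(axis=1))  # [94 93 98]

# Max of each column.
print(ratings.max(axis=0))  # [94 94 63 98]
```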

### Example 8
`argmax` and `argmin` are functions in NumPy used to find the indices of the maximum and minimum values, respectively, in an array.

* `np.argmax(array, axis=None)`: This function returns the indices of the maximum value along an axis. If the axis is not specified, `argmax` will return the index of the maximum value in the flattened array. If the axis is specified (e.g., 0 for columns, 1 for rows in a 2D array), it will return the indices of the maximum values along that axis.
* `np.argmin(array, axis=None)`: Similarly, `argmin` finds the indices of the minimum values. Without specifying an axis, it returns the index of the minimum value in the entire array. If an axis is specified, it returns the indices of the minimum values along that axis.

In [None]:
# Finds the index of the maximum value in the entire array
overall_max_index = np.argmax(ratings)

In [None]:
# Finds the index of the minimum value along each column
min_indices_per_column = np.argmin(ratings, axis=0)
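
The two cells above store their results without displaying them. The sketch below recreates `ratings` and prints the values. It also uses `np.unravel_index()`, which is not covered in this train, to convert the flattened index back into a row and column position.

```python
import numpy as np

ratings = np.array([[94, 89, 63, 45], [93, 92, 48, 23], [92, 94, 56, 98]])

# Index of the maximum value (98), counted in the flattened array.
overall_max_index = np.argmax(ratings)
print(overall_max_index)       # 11

# Convert the flat index into (row, column) for the array's shape.
row, col = np.unravel_index(overall_max_index, ratings.shape)
print(int(row), int(col))      # 2 3

# Row index of the minimum value in each column.
min_indices_per_column = np.argmin(ratings, axis=0)
print(min_indices_per_column)  # [2 0 1 1]
```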

### When to use NumPy arrays

A NumPy array is a data structure which stores multiple values, all of the same data type, and can be multi-dimensional. We should use a NumPy array if all of the following statements hold:

* We have multi-dimensional data
* All entries are of the same data type

NumPy arrays are also preferred over lists of lists due to their efficiency and functionality.
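
As a rough illustration of the functionality difference, here is the same column work done with a list of lists and with a NumPy array. The names `ratings_list`, `first_column_list`, and `column_sums_list` are illustrative.

```python
import numpy as np

ratings_list = [[94, 89, 63, 45], [93, 92, 48, 23], [92, 94, 56, 98]]

# List of lists: columns must be rebuilt row by row.
first_column_list = [row[0] for row in ratings_list]
column_sums_list = [sum(column) for column in zip(*ratings_list)]

# NumPy array: columns are available directly through indexing and axis.
ratings = np.array(ratings_list)
first_column_array = ratings[:, 0]
column_sums_array = ratings.sum(axis=0)

print(first_column_list, column_sums_list)
print(first_column_array, column_sums_array)
```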

## Additional resources

- [Basic NumPy Functionality](https://numpy.org/doc/stable/user/absolute_beginners.html#what-is-an-array)
- [NumPy package home page](https://numpy.org)

#  

<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px";/>
</div>