# How to write efficient code

In this notebook, you will learn about
- Slicing
- Advanced indexing
- Broadcasting
- How to convert a 1D array into a 2D array (how to add a new axis to an array)
- Views/copies

---

## What is efficient code?

NumPy was created to make scientific computing in Python both possible and fast. Although its high-level Python syntax makes it accessible and easy to learn, the core of NumPy is well-optimized C code. Let's see how to take full advantage of this efficient implementation.

We will continue using the example of [Notebook 1](01_Intro.ipynb):

In [None]:
import numpy as np
import pandas as pd

quality_of_life = pd.read_csv('../data/quality_of_life_index.csv')
quality_index = np.array(quality_of_life['Quality of Life Index'])
quality_cost_pollution = np.array(quality_of_life[['Quality of Life Index', 'Cost of Living Index', 'Pollution Index']]) 

## Slicing

NumPy allows you to select items in an array not only individually, but also as a group. For example, you can take a *slice* (a sub-array) of a NumPy array using the same slicing syntax as you would use with Python lists, extended to N dimensions. To select the first 5 entries of our quality of life array, we can do

In [None]:
top_quality = quality_index[0:5]
print(top_quality)

Note the shape of the result:

In [None]:
top_quality.shape

Consider now our 2-d array

In [None]:
quality_cost_pollution

If we want to select the first 5 rows of this 2-d array, we can use the following syntax:

In [None]:
quality_cost_pollution[0:5, :]

(Note that the colon `:` means we made no explicit choice of indices for this axis, so in this case all columns are included in the result.)

If instead we wanted to choose the first two columns, with all rows, we would do

In [None]:
quality_cost_pollution[:, 0:2]

**Note** You may use slicing to set values in the array, but (unlike with lists) you can never grow the array using slicing. To do that, you need to create a new array of the appropriate size and copy the data into it.
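As a minimal sketch of the note above (the values in `scores` are hypothetical stand-ins, not taken from the dataset), assigning to a slice overwrites elements in place, while getting a larger array means building a new one, here with `np.concatenate`:

```python
import numpy as np

# Hypothetical stand-in values, not the real quality of life data.
scores = np.array([190.0, 185.5, 180.2, 175.0, 170.3])

# Assigning to a slice overwrites elements in place; the shape is unchanged.
scores[0:2] = [200.0, 195.0]
print(scores.shape)

# Assigning more values than the slice can hold raises an error
# instead of growing the array (a Python list would grow here).
try:
    scores[0:2] = [1.0, 2.0, 3.0]
except ValueError as err:
    print("Cannot grow via slicing:", err)

# To get a larger array, build a new one and copy the data into it.
extended = np.concatenate([scores, np.array([165.0, 160.0])])
print(extended.shape)
```

`np.concatenate` allocates a new array and copies both inputs into it, so `scores` itself keeps its original size.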

## Advanced indexing

In addition to selecting elements with integer or tuple indices, NumPy implements *advanced indexing* techniques, allowing us to use integer or boolean arrays as indices. For example, suppose we want to select all elements in our `quality_index` array above a certain value, say 200. We can use

In [None]:
quality_index[quality_index > 200]

Similarly, let's say we want to select only the values greater than or equal to the array average. We can do this by

In [None]:
quality_index[quality_index >= np.mean(quality_index)]

Note that it is also possible to select elements from an array using another array (or a list, or tuple). For example:

In [None]:
top_quality = quality_index[0:5]
print(top_quality)

In [None]:
top_quality[[1, 1, 2, 3]]

**Note** Advanced indexing always returns a copy of the data (in contrast to basic slicing, which returns a view). We will talk about copies and views later.
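The difference can be checked on a small example. The sketch below uses a hypothetical `values` array and `np.shares_memory` to compare integer-array indexing, boolean-mask indexing and basic slicing:

```python
import numpy as np

# Hypothetical data for illustration only.
values = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

picked = values[[1, 1, 2, 3]]    # integer-array indexing (index 1 is repeated)
masked = values[values > 25.0]   # boolean-mask indexing
sliced = values[1:4]             # basic slicing

picked[0] = -1.0   # changes only the copy
sliced[0] = -2.0   # writes through to values[1]

print(picked)
print(values)
print(np.shares_memory(values, picked))  # copy: no shared memory
print(np.shares_memory(values, sliced))  # view: shared memory
```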

## Broadcasting

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.

NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays must have exactly the same shape, as in the following example:

In [None]:
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])
a * b

NumPy’s broadcasting rule relaxes this requirement when the arrays’ shapes meet certain conditions. The simplest broadcasting example occurs when an array and a scalar value are combined in an operation:

In [None]:
a = np.array([1.0, 2.0, 3.0])
b = 2.0
a * b

We can think of the scalar `b` being stretched during the arithmetic operation into an array with the same shape as `a`. The new elements in `b`, as shown in the figure below, are simply copies of the original scalar.

!["A scalar is broadcast to match the shape of the 1-d array it is being multiplied to."](https://numpy.org/devdocs/_images/broadcasting_1.svg)
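Broadcasting also applies between arrays of different dimensions. NumPy compares shapes starting from the trailing axis, and two axes are compatible when they are equal or one of them is 1. The sketch below uses a small hypothetical `table` (rows standing in for cities, columns for indices) and subtracts each column's mean from every row:

```python
import numpy as np

# Hypothetical table: 3 rows (cities) x 2 columns (indices).
table = np.array([[190.0, 120.0],
                  [180.0,  80.0],
                  [170.0, 100.0]])

column_means = table.mean(axis=0)   # shape (2,)

# Shapes (3, 2) and (2,) are compatible: the 1-d array is treated as
# shape (1, 2) and stretched across the 3 rows without an explicit loop.
centered = table - column_means

print(column_means)
print(centered)
```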

*TODO: think of an example involving the current application.*

### Exercises

## How to reshape, flatten and increase the dimensions of an array

You can use `np.newaxis` and `np.expand_dims` to increase the dimensions of your existing array.

Using `np.newaxis` will increase the dimensions of your array by one dimension when used once. This means that a 1D array will become a 2D array, a 2D array will become a 3D array, and so on.

In [None]:
# TODO: example 
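One possible example, using a hypothetical 1D array `a`: `np.newaxis` is placed inside the index at the position where the new axis should appear, and `np.expand_dims` takes that position as its `axis` argument.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])   # shape (6,)

# np.newaxis inserts a new axis of length 1 at the position where it appears.
row = a[np.newaxis, :]   # shape (1, 6): a single row
col = a[:, np.newaxis]   # shape (6, 1): a single column

# np.expand_dims gives the same result, with the position passed as `axis`.
row2 = np.expand_dims(a, axis=0)   # shape (1, 6)
col2 = np.expand_dims(a, axis=1)   # shape (6, 1)

print(row.shape, col.shape)
print(row2.shape, col2.shape)
```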

There are two popular ways to flatten an array: the methods `.flatten()` and `.ravel()`. The primary difference between the two is that the array returned by `ravel()` is, whenever possible, a reference to the parent array (i.e., a “view”), whereas `flatten()` always returns a copy. This means that changes to an array returned by `ravel()` can affect the parent array as well. Because `ravel()` avoids copying when it can, it is memory efficient.
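A short sketch of the difference, calling both as methods on a hypothetical 2D array `grid`:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

flat_copy = grid.flatten()   # always a new array with its own data
flat_view = grid.ravel()     # a view here, because grid is contiguous in memory

flat_copy[0] = 100   # grid is unchanged
flat_view[1] = 200   # grid[0, 1] changes as well

print(grid)
print(np.shares_memory(grid, flat_copy), np.shares_memory(grid, flat_view))
```

When the data cannot be exposed as a flat view (for example, for some non-contiguous arrays), `ravel()` falls back to returning a copy.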

## Views and copies

Behind the scenes, a NumPy array consists of two parts: the data buffer, a block of memory holding the actual data elements, and the metadata, which contains information about the data buffer. The metadata includes the data type, strides and other important information that helps manipulate the ndarray easily.
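The metadata can be inspected directly. In the sketch below (the array `m` is a hypothetical example), a strided slice reuses the same data buffer and only changes the shape and strides:

```python
import numpy as np

# Hypothetical 3x4 array of 8-byte integers.
m = np.arange(12, dtype=np.int64).reshape(3, 4)

print(m.dtype)     # data type of each element
print(m.shape)     # number of elements along each axis
print(m.strides)   # bytes to step along each axis

# Taking every other column changes only the metadata:
# the column stride doubles, and no data is copied.
every_other = m[:, ::2]
print(every_other.shape, every_other.strides)
print(np.shares_memory(m, every_other))
```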

Because of the way NumPy is built, it is often possible to access the same data buffer through a new array without duplicating the data: we call this a *view*. When this is not possible, for example when we need to increase the number of elements of an array, a *copy* is made. Copies take more space in memory and can impact performance for large datasets, so avoid them when they are not needed.

You don't need to understand all the details, but you should be aware that basic indexing creates views, while advanced indexing creates copies - which is why it should not be overused.
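To make the memory cost concrete, the sketch below (with a hypothetical array `data`) compares a slice with an explicit `.copy()` of the same slice, using the `flags.owndata` and `nbytes` attributes:

```python
import numpy as np

data = np.arange(1_000_000, dtype=np.float64)

window = data[:500_000]            # basic slice: a view on data's buffer
snapshot = data[:500_000].copy()   # explicit copy: a new buffer is allocated

print(window.flags.owndata)    # the view does not own its data
print(snapshot.flags.owndata)  # the copy does
print(snapshot.nbytes)         # extra bytes allocated for the copy

snapshot[0] = -1.0   # data is untouched
window[1] = -2.0     # writes through to data[1]
print(data[:3])
```

An explicit `.copy()` is still the right tool when the result must stay independent of later changes to the original array.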

The `base` attribute of the ndarray makes it easy to tell if an array is a view or a copy. The `base` attribute of a view returns the original array, while for a copy it returns `None`.

In [None]:
top_quality.base is quality_index  # top_quality is a view of quality_index

In [None]:
top_quality[[1, 1, 2, 3]].base is quality_index  # when advanced indexing is used, we create a copy

---

## Read more

- [Indexing on ndarrays](https://numpy.org/devdocs/user/basics.indexing.html)
- [Broadcasting](https://numpy.org/devdocs/user/basics.broadcasting.html)
- [Copies and Views]()

## Next

Go to [Notebook 3: Vectorization](03_Vectorization.ipynb).