#  Numerical Python - NumPy


**NumPy** (short for *Numerical Python*) is a core Python library for high-performance numerical and scientific computing. It introduces the **ndarray**, a multi-dimensional array object that stores **elements of the same data type** in a compact, efficient manner. Because its array operations are implemented in compiled code (C and Fortran), NumPy is significantly faster than Python's built-in data structures, such as lists, especially when working with large datasets.

#### Key Features and Capabilities:

- **Multi-dimensional Arrays (ndarray)**: NumPy’s core data structure, which enables storage and manipulation of data in one or more dimensions.

- **Mathematical Operations**: Supports element-wise computations, linear algebra, statistics, and more complex numerical calculations.

- **Broadcasting**: Simplifies operations on arrays with different shapes, enhancing code flexibility and efficiency.

- **Integration**: Works seamlessly with popular libraries like SciPy (for scientific computing), Matplotlib (for plotting), and Pandas (for data manipulation).

## Data Structure of the NumPy Library


The central data structure in NumPy is the **ndarray**, short for *n-dimensional array*. This structure allows for efficient storage and manipulation of large volumes of numerical data in Python. If you are working with numeric data of a consistent type, it is strongly recommended to use ndarray instead of Python’s built-in data structures like lists, as ndarray is optimized for performance and memory efficiency.

**Comparison between ndarray and Python lists:**

- NumPy arrays (ndarray) are **more compact and efficient** in terms of both memory usage and computational speed.  Python lists, while flexible, are not optimized for numerical operations, as they can store mixed data types and require more overhead.

- When performing mathematical operations, NumPy uses **vectorized computations**, allowing it to process entire arrays without the need for explicit loops in Python. This is possible because NumPy is implemented in C and Fortran, making it significantly faster.

- An ndarray has a **fixed size** once it is created. To change the size, you create a new array and copy the data, unlike Python lists, which are dynamic and can grow or shrink in place. While this may seem restrictive, it allows for more memory-efficient handling of large datasets.
  
<div style="padding: 10px; border-left: 6px solid #FFA756; border-radius: 4px;">
  <strong>Conclusion:</strong> You should <strong>always prefer the ndarray over the Python list if you have numeric data of the same data type</strong>.
</div>
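To make the comparison concrete, the following minimal sketch squares the same numbers once with a Python list and once with an ndarray. The variable names are illustrative, and the reported sizes assume the 64-bit integer dtype chosen in the example.

```python
import numpy as np

values = list(range(1000))                # plain Python list
arr = np.array(values, dtype=np.int64)    # same data as an ndarray

# Python list: each element is handled by an explicit Python-level loop
squared_list = [v * v for v in values]

# ndarray: one vectorized expression, the loop runs in compiled code
squared_arr = arr * arr

print(squared_arr[:5])            # first five results
print(arr.itemsize, arr.nbytes)   # bytes per element and bytes of the whole data buffer
```

Both approaches produce the same numbers; the ndarray version needs no explicit loop and stores every element in the same fixed number of bytes.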

## Data Types within NumPy Library


The data type of a NumPy array is usually specified with the keyword **dtype**. The full list of choices can be found in the [documentation](https://numpy.org/devdocs/user/basics.types.html); commonly used types are the following:

| Data type  | Description |
|:-----------|:------------|
| bool_      | Boolean (True or False) stored as a byte |
| int_       | Default integer type (same as C long ; normally either int64 or int32 ) |
| intc       | Identical to C int (normally int32 or int64 ) |
| intp       | Integer used for indexing (same as C ssize_t ; normally either int32 or int64 ) |
| int8       | Byte (–128 to 127) |
| int16      | Integer (–32768 to 32767) |
| int32      | Integer (–2147483648 to 2147483647) |
| int64      | Integer (–9223372036854775808 to 9223372036854775807) |
| uint8      | Unsigned integer (0 to 255) |
| uint16     | Unsigned integer (0 to 65535) |
| uint32     | Unsigned integer (0 to 4294967295) |
| uint64     | Unsigned integer (0 to 18446744073709551615) |
| float_     | Shorthand for float64 (removed in NumPy 2.0; use float64 instead) |
| float16    | Half-precision float: sign bit, 5 bits exponent, 10 bits mantissa |
| float32    | Single-precision float: sign bit, 8 bits exponent, 23 bits mantissa |
| float64    | Double-precision float: sign bit, 11 bits exponent, 52 bits mantissa |
| complex_   | Shorthand for complex128 (removed in NumPy 2.0; use complex128 instead) |
| complex64  | Complex number, represented by two 32-bit floats |
| complex128 | Complex number, represented by two 64-bit floats |

For more specialized data types like date/time or custom user-defined types, NumPy provides additional options. If you're dealing with non-numeric Python objects, you can use `dtype=object`. However, this sacrifices much of NumPy's efficiency, though it can be useful when working with strings or other generic Python data types.
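The following short sketch shows how a few of the types from the table behave in practice. The array values are chosen only for illustration.

```python
import numpy as np

# Value range of an integer type
info = np.iinfo(np.int8)
print(info.min, info.max)

# Python floats are stored as float64 by default
x = np.array([1.7, 2.2, -3.9])
print(x.dtype)

# Converting to an integer type drops the fractional part (truncation toward zero)
y = x.astype(np.int32)
print(y, y.dtype)

# A smaller dtype uses fewer bytes per element
small = np.array([1, 2, 3], dtype=np.float32)
print(small.itemsize, x.itemsize)
```

Note that `astype()` returns a new array; the original array `x` keeps its dtype.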

## Getting Familiar with the NumPy Library

### Import NumPy

Now you will be working with a Python package. You start by importing the NumPy library into your Python script under the short name `np`:

In [None]:
import numpy as np

<div style="padding: 10px; border-left: 6px solid #2196F3; border-radius: 4px;">
  <strong>Note:</strong> When using the online version of this Jupyter Notebook, the code has already been run. However, if you want to use the notebook in a code editor (e.g., VS Code), you might need to take some additional steps.
</div>

If you run the notebook in an editor of your choice, you might encounter an error message like <code>ModuleNotFoundError: No module named 'numpy'</code>. This simply means that you need to install NumPy by following the steps outlined in the <strong>Getting Started</strong> documentation.

### Creating a NumPy Array


NumPy's primary data structure is the ndarray (n-dimensional array). You can create NumPy arrays using various methods. The ones we will be exploring are the arrays created from:
- Python Lists
- NumPy functions

#### Creating an Array using a **Python list**

In [4]:
# Creating a Python list
a = [0,2,5,4,4]
print("a =", a, ", type =", type(a))

# Creating a NumPy ndarray with the same data (and automatic data type)
b = np.array(a)
print("b =", b, ", type =", type(b), ", dtype =", b.dtype)

a = [0, 2, 5, 4, 4] , type = <class 'list'>
b = [0 2 5 4 4] , type = <class 'numpy.ndarray'> , dtype = int32


In [5]:
# Creating the same array with a specific data type
b = np.array(a, dtype = np.int32)
print("b =", b, ", type =", type(b), ", dtype =", b.dtype)

c = np.array(a, dtype = np.float64)
print("c =", c, ", type =", type(c), ", dtype =", c.dtype)

b = [0 2 5 4 4] , type = <class 'numpy.ndarray'> , dtype = int32
c = [0. 2. 5. 4. 4.] , type = <class 'numpy.ndarray'> , dtype = float64


In [6]:
# Creating two-dimensional arrays from a Python list of lists
a = [[1,2,3],[4,5,6],[7,7,8],[9,8,11]]
print("a =",a,", type =",type(a))

# The property `shape` tells you the size of each array dimension.
b = np.array(a)
print("b =", b, ", type =", type(b), ", dtype =", b.dtype, ", shape =", b.shape)

a = [[1, 2, 3], [4, 5, 6], [7, 7, 8], [9, 8, 11]] , type = <class 'list'>
b = [[ 1  2  3]
 [ 4  5  6]
 [ 7  7  8]
 [ 9  8 11]] , type = <class 'numpy.ndarray'> , dtype = int32 , shape = (4, 3)


In [7]:
# We can also construct three-dimensional arrays from a Python list of lists of lists
a = [
        [
            [0,1],
            [2,3],
            [4,5]
        ],
        [
            [6,7],
            [8,9],
            [10,11]
        ]
    ]
print("a =", a, ", type =", type(a))

b = np.array(a)
print("\nb =", b, ", type =", type(b), ", dtype =", b.dtype, ", shape =", b.shape)

a = [[[0, 1], [2, 3], [4, 5]], [[6, 7], [8, 9], [10, 11]]] , type = <class 'list'>

b = [[[ 0  1]
  [ 2  3]
  [ 4  5]]

 [[ 6  7]
  [ 8  9]
  [10 11]]] , type = <class 'numpy.ndarray'> , dtype = int32 , shape = (2, 3, 2)


#### Creating an Array using **NumPy Functions** 

Commonly used NumPy functions for creating arrays include `np.zeros()`, `np.ones()`, `np.full()`, `np.arange()`, `np.linspace()`, and `np.random.rand()`, which are explained below.

In [8]:
# Creating a one-dimensional ndarray of zeros
a = np.zeros(6)
print(a)

[0. 0. 0. 0. 0. 0.]


In [9]:
# Creating a multi-dimensional ndarray of zeros with a selected data type
a = np.zeros((3,4,2), dtype = np.int64)

# The tuple (3, 4, 2) specifies the shape of the array: 3 blocks, each containing 4 rows, and each row holding 2 values (columns). The array is therefore 3-dimensional: 3x4x2.
print(a)
print(a.shape)

[[[0 0]
  [0 0]
  [0 0]
  [0 0]]

 [[0 0]
  [0 0]
  [0 0]
  [0 0]]

 [[0 0]
  [0 0]
  [0 0]
  [0 0]]]
(3, 4, 2)


Data in the multi-dimensional array created above can be accessed in different ways.

In [10]:
# Accessing a specific element.
print(a[2,3,0])

0


In [11]:
# Accessing a specific 2D slice (or block) of the array.
b = a[2]
print(b)
print(b.shape)

[[0 0]
 [0 0]
 [0 0]
 [0 0]]
(4, 2)


In [12]:
# Changing the data in a 2D slice (or block)
b[:] = 4
print(b)


[[4 4]
 [4 4]
 [4 4]
 [4 4]]


In [13]:
# Creating a 3x4 ndarray of integer zeros
a = np.zeros((3,4), dtype = np.int64)
print(a)

[[0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]]


Similar to `np.zeros()` which creates an array of 0's, `np.ones()` can be used to create an array of ones.

In [14]:
# Creation of a 3x4 ndarray of ones:
a = np.ones([3,4])
print(a)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]


`np.full()` creates arrays of any dimension with all entries filled with the same specified value:

In [15]:
a = np.full((3,4), 3.14)
print(a)

[[3.14 3.14 3.14 3.14]
 [3.14 3.14 3.14 3.14]
 [3.14 3.14 3.14 3.14]]


We can also fill an array with a range of values using `np.arange()`. In one dimension, this is similar to Python's built-in `range(start, stop, step)`, but non-integer values are allowed as well.

In [16]:
a = np.arange(10)
print("a  =", a)

b = np.arange(2,10)
print("b  =", b)

c = np.arange(2,10,3)
print("c  =", c)

d = np.arange(0.2, 0.9, 0.1)
print("d  =", d)

a  = [0 1 2 3 4 5 6 7 8 9]
b  = [2 3 4 5 6 7 8 9]
c  = [2 5 8]
d  = [0.2 0.3 0.4 0.5 0.6 0.7 0.8]


With the `np.arange()` function it is also possible to enforce a data type for the array.

In [17]:
e = np.arange(2, 10, 3 , dtype=np.float64)
print("e =", e)

e = [2. 5. 8.]


`np.linspace()` is a similar function; here, however, you specify the start point, the end point, and the number of points.

In [18]:
a = np.linspace(3, 20, 6)
print("a  =", a)

a  = [ 3.   6.4  9.8 13.2 16.6 20. ]


In [19]:
# This excludes the endpoint, but still results in 6 entries
a = np.linspace(3, 20, 6, endpoint=False)
print("a  =", a)

a  = [ 3.          5.83333333  8.66666667 11.5        14.33333333 17.16666667]

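When the step is a floating-point number, the number of elements returned by `np.arange()` can depend on rounding, so `np.linspace()` is often the safer choice when the number of points matters. Here is a small sketch (values chosen for illustration) using the `retstep=True` option, which also returns the spacing between points:

```python
import numpy as np

# Ask for exactly 8 evenly spaced points from 0.2 to 0.9 (endpoint included)
points, step = np.linspace(0.2, 0.9, 8, retstep=True)

print(points)
print("number of points =", len(points))
print("step =", step)
```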

We can also create randomly filled ndarrays.

In [20]:
# Random values between 0 and 1:
a = np.random.rand(3,4)
print(a)
print(a.dtype)

[[0.44899349 0.00448995 0.05580094 0.39015861]
 [0.03178908 0.50361075 0.4542743  0.42206629]
 [0.69776288 0.804962   0.34934558 0.76552197]]
float64


In [21]:
# Random integer ndarray with values between 0 and 9
a = np.random.randint(0,10,(5,6))
print(a)
print(a.dtype)

[[1 3 0 7 5 4]
 [3 2 3 7 6 6]
 [8 4 9 8 7 0]
 [2 1 1 5 4 2]
 [1 8 3 7 6 6]]
int32

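The random values above change on every run. If reproducible numbers are needed, one option is NumPy's `Generator` interface, created with `np.random.default_rng()`. The seed value 42 and the variable names below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)            # seeded generator; 42 is arbitrary

u = rng.random((3, 4))                     # floats in the half-open interval [0, 1)
k = rng.integers(0, 10, size=(5, 6))       # integers from 0 to 9 (upper bound excluded)

print(u.shape, u.dtype)
print(k.shape, k.min() >= 0, k.max() <= 9)

# A generator created with the same seed reproduces the same numbers
rng_again = np.random.default_rng(42)
print(np.array_equal(u, rng_again.random((3, 4))))
```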

The `len()` function returns the length of the first axis of a NumPy array, i.e. the number of rows of a 2D array.

In [22]:
print(a)
print('len =',len(a))

[[1 3 0 7 5 4]
 [3 2 3 7 6 6]
 [8 4 9 8 7 0]
 [2 1 1 5 4 2]
 [1 8 3 7 6 6]]
len = 5


The dimensions of an ndarray are stored in an attribute called `shape`.

In [23]:
print(a)
print("shape =", a.shape)

[[1 3 0 7 5 4]
 [3 2 3 7 6 6]
 [8 4 9 8 7 0]
 [2 1 1 5 4 2]
 [1 8 3 7 6 6]]
shape = (5, 6)

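The difference between `len()` and `shape` becomes clearer with more than two dimensions. The sketch below uses an illustrative 3D array and also shows the related attributes `ndim` and `size`:

```python
import numpy as np

a = np.zeros((5, 6, 2))

print(len(a))      # length of the first axis only
print(a.shape)     # length of every axis
print(a.ndim)      # number of axes
print(a.size)      # total number of elements
```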

In fact, you can rearrange the data by `reshaping` the array:

In [24]:
a = np.arange(30)
print(a)

b = a.reshape(5,6)
print()
print(b)

c = b.reshape(5,3,2)
print()
print(c)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29]

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]]

[[[ 0  1]
  [ 2  3]
  [ 4  5]]

 [[ 6  7]
  [ 8  9]
  [10 11]]

 [[12 13]
  [14 15]
  [16 17]]

 [[18 19]
  [20 21]
  [22 23]]

 [[24 25]
  [26 27]
  [28 29]]]


The `flat` attribute is an iterator over all elements of the array in 1D order. Alternatively, you can use the method `flatten()` to obtain a copy of the multidimensional array collapsed into a 1D array.

In [25]:
print(list(b.flat))
print(b.flatten())

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29]

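`flatten()` always returns a copy. A related method, `ravel()`, returns a view whenever the memory layout allows it (as it does for the array below), so writing to its result can change the original. A minimal sketch with illustrative names:

```python
import numpy as np

b = np.arange(6).reshape(2, 3)

flat_copy = b.flatten()    # independent copy
flat_copy[0] = 99
print(b[0, 0])             # b is unchanged

flat_view = b.ravel()      # view, because b is stored contiguously
flat_view[1] = -1
print(b[0, 1])             # b has changed

print(np.shares_memory(b, flat_view), np.shares_memory(b, flat_copy))
```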

An ndarray cannot be extended in place the way a Python list can with `append()`. However, we can create a new, combined ndarray.

In [26]:
a = np.zeros([3, 4])
print("a =\n", a)

b = np.ones([3, 1])
print("b =\n", b)
print()

# create a copy with added column at the end:
c = np.c_[a, b]
print("c =\n", c)
print()

# we can combine as many columns as we want:
d = np.c_[a, b, b, b]
print("d =\n", d)

a =
 [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
b =
 [[1.]
 [1.]
 [1.]]

c =
 [[0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 1.]]

d =
 [[0. 0. 0. 0. 1. 1. 1.]
 [0. 0. 0. 0. 1. 1. 1.]
 [0. 0. 0. 0. 1. 1. 1.]]


In general, this process is called **concatenation**. Instead of `np.c_` you can also use `np.concatenate()`, where the `axis` argument selects the direction in which the arrays are joined.

In [27]:
a = np.zeros([3,4])
print("a =\n",a)

b = np.ones([3,1])
print("b =\n",b)
print()

c = np.concatenate((a, b), axis=1)
print("c =\n",c)
print()

a =
 [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
b =
 [[1.]
 [1.]
 [1.]]

c =
 [[0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 1.]]



In [28]:
a = np.zeros([6,4])
print("a =\n",a)

b = np.ones([3,4])
print("b =\n",b)
print()

c = np.concatenate((a, b), axis=0)
print("c =\n",c)
print()

a =
 [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
b =
 [[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]

c =
 [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]


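For the two most common cases, NumPy also offers the convenience functions `np.hstack()` (join side by side) and `np.vstack()` (join on top of each other). The sketch below uses illustrative array names and shows that each call returns a new array while the inputs stay unchanged:

```python
import numpy as np

a = np.zeros((3, 4))
col = np.ones((3, 1))
rows = np.ones((2, 4))

wide = np.hstack((a, col))    # same result as np.concatenate((a, col), axis=1)
tall = np.vstack((a, rows))   # same result as np.concatenate((a, rows), axis=0)

print(wide.shape)
print(tall.shape)
print(a.shape)                # the original array is untouched
```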

We have seen how to create a NumPy array from a list of data. The reverse process is also possible.

In [29]:
# Create an ndarray from a list
a = np.array([0,1,3,4,5])
print(a,type(a))
print()

# Create a list from an ndarray:
b = a.tolist()
print(b,type(b))

[0 1 3 4 5] <class 'numpy.ndarray'>

[0, 1, 3, 4, 5] <class 'list'>


In [30]:
# This also works for lists of lists:
a = np.array([[0,1],[2,3]])
print(a,type(a))
print()

b = a.tolist()
print(b,type(b))

[[0 1]
 [2 3]] <class 'numpy.ndarray'>

[[0, 1], [2, 3]] <class 'list'>


### File I/O and NumPy

NumPy ndarray data can be written to files and read from files. In principle there are [three options](https://docs.scipy.org/doc/numpy/reference/routines.io.html):
- Read/write text files (e.g. csv, csv.gz)
- Read/write NumPy format files (npy, npz)
- Read/write raw binary files

We only touch this topic briefly here - focussing only on the first option.

In [31]:
# Create a random array
a = np.random.randint(0,10,(3,4))
print(a)

[[2 6 4 8]
 [7 1 7 6]
 [2 5 7 8]]


In [32]:
# Export dat file:
fname = "test.dat"
print("Writing file", fname)
np.savetxt(fname, a)

Writing file test.dat


By default, `np.savetxt()` writes numbers in floating-point (scientific) notation, using the format `'%.18e'`. The array contains integers, but each value is stored as a float (e.g., `7.000000000000000000e+00` instead of `7`).

If you want to save the integers without floating-point format, you can specify the format using the fmt argument in `np.savetxt()`:

In [33]:
np.savetxt(fname, a, fmt='%d')  # %d for integer format

In general, when no manipulation will be performed in a file (adding, deleting, modifying), the file should be opened in *read* mode.

In [34]:
# Reading a file as string
with open(fname, "r") as f:
    print(f.read())

2 6 4 8
7 1 7 6
2 5 7 8



When working with data, `.csv` (comma-separated values) is a common file format. NumPy arrays can also be saved in this format by specifying the delimiter.

In [35]:
# Export csv file, now with comma
fname = "test.csv"
print("Writing file", fname)
np.savetxt(fname, a, delimiter=',' , header="x,y,z,value")

Writing file test.csv


Each column of your data can have different formatting.

In [36]:
fname = "test.csv"
print("Writing file", fname)
np.savetxt(fname, a, delimiter=',', header="x,y,z,value", fmt=['%.3f','%.2f','%.4e','%d'])

Writing file test.csv


In [37]:
# Reading the file
with open(fname, "r") as f:
    print(f.read())

# x,y,z,value
2.000,6.00,4.0000e+00,8
7.000,1.00,7.0000e+00,6
2.000,5.00,7.0000e+00,8



If the filename ends with ".gz", it is automatically compressed:

In [38]:
fname = "test.csv.gz"
print("Writing file", fname)
np.savetxt(fname, a, delimiter=',', header="x,y,z,value")

Writing file test.csv.gz


We can now read any of the files created above into an ndarray.

In [39]:
# We can now read any of the above written files into an ndarray:
fname = "test.csv.gz"
print("Reading file", fname)
a = np.genfromtxt(fname, delimiter=',', skip_header=1)
print(a)

Reading file test.csv.gz
[[2. 6. 4. 8.]
 [7. 1. 7. 6.]
 [2. 5. 7. 8.]]


You can also select the data type.

In [40]:
fname = "test.csv"
print("Reading file", fname)
a = np.genfromtxt(fname, delimiter=',', skip_header=1, dtype=np.int32)
print(a)

Reading file test.csv
[[2 6 4 8]
 [7 1 7 6]
 [2 5 7 8]]

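Text files do not record the data type, which is why the dtype has to be chosen again when reading. For comparison, here is a brief sketch of the second option listed above, NumPy's binary `.npy` format, which stores shape and dtype together with the data. The file name is illustrative, and a temporary directory is used so that no files are left behind.

```python
import os
import tempfile

import numpy as np

a = np.array([[2, 6, 4, 8], [7, 1, 7, 6], [2, 5, 7, 8]], dtype=np.int32)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test.npy")   # illustrative file name
    np.save(path, a)                          # write a binary .npy file
    restored = np.load(path)                  # read it back into memory

print(restored)
print(restored.dtype, restored.shape)
```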

### Indexing  


Elements of an ndarray can directly be accessed by specifying the position inside the `[ ]` operator:

In [41]:
# Creating a random array:
a = np.random.randint(0,10,(3,4))
print(a)

[[5 7 4 7]
 [9 6 0 1]
 [5 2 2 9]]


Now, we will see how to access and modify a specific element.

In [42]:
# Selecting a specific element at position (2,1):
print(a)
print(a[2,1])

# Changing a specific element:
a[2,1] = 11

print()
print(a)

[[5 7 4 7]
 [9 6 0 1]
 [5 2 2 9]]
2

[[ 5  7  4  7]
 [ 9  6  0  1]
 [ 5 11  2  9]]


The elements of an array can also be accessed through its *flat* version.

In [43]:
# Create a flat ndarray from the array
print(list(a.flat))

# Pick the element at index 6 (the 7th element) from the flat data:
x = a.flat[6]       # 'flat' is an attribute that gives 1D access to the array's elements
print(x)

[5, 7, 4, 7, 9, 6, 0, 1, 5, 11, 2, 9]
0


With *nditer* we can iterate over all elements, similar to `for x in list`

In [44]:
for x in np.nditer(a):
    print(x)

5
7
4
7
9
6
0
1
5
11
2
9


Iteration over rows instead of elements can also be done.

In [45]:
for x in a:
    print(x)

[5 7 4 7]
[9 6 0 1]
[ 5 11  2  9]


NumPy has an equivalent to Python's `enumerate()` called `ndenumerate()`, which yields the full index tuple together with each element.

In [46]:
print(a)
print()

for i,x in np.ndenumerate(a):
    print(i,x)

[[ 5  7  4  7]
 [ 9  6  0  1]
 [ 5 11  2  9]]

(0, 0) 5
(0, 1) 7
(0, 2) 4
(0, 3) 7
(1, 0) 9
(1, 1) 6
(1, 2) 0
(1, 3) 1
(2, 0) 5
(2, 1) 11
(2, 2) 2
(2, 3) 9


Python's `enumerate()` also works for numpy arrays, but with a different result.

In [47]:
print(a)
print()

for i,x in enumerate(a):
    print(i,x)

[[ 5  7  4  7]
 [ 9  6  0  1]
 [ 5 11  2  9]]

0 [5 7 4 7]
1 [9 6 0 1]
2 [ 5 11  2  9]


We can loop over all index tuples by the *ndindex* function.

In [48]:
print(a)
print()

for i in np.ndindex(a.shape):
    print(i, a[i])

[[ 5  7  4  7]
 [ 9  6  0  1]
 [ 5 11  2  9]]

(0, 0) 5
(0, 1) 7
(0, 2) 4
(0, 3) 7
(1, 0) 9
(1, 1) 6
(1, 2) 0
(1, 3) 1
(2, 0) 5
(2, 1) 11
(2, 2) 2
(2, 3) 9


### Slicing


Slicing in NumPy is a powerful feature that allows you to create sub-arrays from an existing array. You can slice NumPy arrays similarly to Python lists, but it’s even more powerful due to its ability to handle multi-dimensional arrays.

When you slice a NumPy array, the result is a view of the original array. This means that the sliced array and the original array share the same data in memory. Changes to one will affect the other.
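If an independent sub-array is needed, an explicit copy can be made with `copy()`. The sketch below (illustrative names and values) contrasts a view with a copy and uses `np.shares_memory()` to check which of them shares data with the original:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

view = a[:, 1:3]          # slice: shares memory with a
copy = a[:, 1:3].copy()   # explicit copy: independent data

view[0, 0] = -1           # also changes a[0, 1]
copy[0, 1] = -2           # does not change a

print(a)
print(np.shares_memory(a, view), np.shares_memory(a, copy))
```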

In [49]:
# Creating a random array
a = np.random.randint(0,10,(3,4))
print(a)

[[7 4 2 3]
 [3 6 0 9]
 [6 4 6 0]]


In [50]:
# Taking a slice
s = a[:]
print(s)

[[7 4 2 3]
 [3 6 0 9]
 [6 4 6 0]]


In [51]:
# Now let's change a
a[0,0] = 11
print(a)

print(s)

[[11  4  2  3]
 [ 3  6  0  9]
 [ 6  4  6  0]]
[[11  4  2  3]
 [ 3  6  0  9]
 [ 6  4  6  0]]


In [52]:
# Now let's change s:
s[-1,-1] = 100
print('a =\n', a)
print()
print('s =\n', s)

a =
 [[ 11   4   2   3]
 [  3   6   0   9]
 [  6   4   6 100]]

s =
 [[ 11   4   2   3]
 [  3   6   0   9]
 [  6   4   6 100]]


Row selection

In [53]:
# Creating a fresh random array
a = np.random.randint(0,10,(3,4))
print(a)

[[9 3 5 8]
 [1 8 2 3]
 [3 6 8 2]]


In [54]:
print(a)

# select row 0:
print()
print(a[0])

# select last row:
print()
print(a[-1])

[[9 3 5 8]
 [1 8 2 3]
 [3 6 8 2]]

[9 3 5 8]

[3 6 8 2]


In [55]:
print(a)

# Now let's change the data in a row, here to a homogeneous value
# This is an example of NumPy's "broadcasting" (see below)
a[1] = 11

print()
print(a)

[[9 3 5 8]
 [1 8 2 3]
 [3 6 8 2]]

[[ 9  3  5  8]
 [11 11 11 11]
 [ 3  6  8  2]]


In [56]:
print(a)

# Now let's change the data in a row, here to values of a list:
a[1] = [1,2,3,4]

print()
print(a)

[[ 9  3  5  8]
 [11 11 11 11]
 [ 3  6  8  2]]

[[9 3 5 8]
 [1 2 3 4]
 [3 6 8 2]]


In [57]:
# We can use the same indexing as for ranges: start, stop (excluded), step
b = np.arange(20) + 100
print(b)

print(b[3:12])   # elements at indices 3 to 11
print(b[3:12:3]) # elements at indices 3 to 11 with step size 3
print(b[2::2])   # all elements starting from index 2 with step size 2

[100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
 118 119]
[103 104 105 106 107 108 109 110 111]
[103 106 109]
[102 104 106 108 110 112 114 116 118]


In [58]:
# Since this creates a slice, we can change the sub array data:
print(b)
print()

b[3:12] = -1
print(b)
print()

b[::3] = 99
print(b)
print()

[100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
 118 119]

[100 101 102  -1  -1  -1  -1  -1  -1  -1  -1  -1 112 113 114 115 116 117
 118 119]

[ 99 101 102  99  -1  -1  99  -1  -1  99  -1  -1  99 113 114  99 116 117
  99 119]



For ndarrays with more than one dimension, we can select data by using the `[ ]` operator with comma-separated selections for each dimension (called **axis** in NumPy terminology).

In the context of indexing and slicing, a sole colon `:` represents the selection of _all_ entries of an axis. We can use this to select a column:

In [59]:
# create fresh random array:
a = np.random.randint(0, 10, (3,4))
print(a)

[[2 9 5 1]
 [5 1 9 5]
 [8 2 2 1]]


In [60]:
# Column selection:

print(a)

# select column 2:
print()
print(a[:,2])

# select last column:
print()
print(a[:,-1])

[[2 9 5 1]
 [5 1 9 5]
 [8 2 2 1]]

[5 9 2]

[1 5 1]


In [61]:
print(a)

# Now let's change the data in a column, here to values of a list:
a[:,1] = [9,8,9]

print()
print(a)

[[2 9 5 1]
 [5 1 9 5]
 [8 2 2 1]]

[[2 9 5 1]
 [5 8 9 5]
 [8 9 2 1]]


In [62]:
print(a)
print("shape =",a.shape)

# We can also combine index selection by ranges:
print()
b = a[1:]

print(b)
print("shape =",b.shape)
print()

c = a[1:3,2:4]
print(c)
print("shape =",c.shape)

[[2 9 5 1]
 [5 8 9 5]
 [8 9 2 1]]
shape = (3, 4)

[[5 8 9 5]
 [8 9 2 1]]
shape = (2, 4)

[[9 5]
 [2 1]]
shape = (2, 2)


We can replace whole sub arrays by slicing:

In [63]:
print(a)
print()

# replace a data slice by a constant value:
a[1:3, 2:4] = 30

print()
print(a)

[[2 9 5 1]
 [5 8 9 5]
 [8 9 2 1]]


[[ 2  9  5  1]
 [ 5  8 30 30]
 [ 8  9 30 30]]


In [64]:
print(a)
print()

# replace a data slice by values from another array:
a[1:3,2:4] = np.array([[9,9],[8,8]])

print()
print(a)

[[ 2  9  5  1]
 [ 5  8 30 30]
 [ 8  9 30 30]]


[[2 9 5 1]
 [5 8 9 9]
 [8 9 8 8]]


### Fancy indexing

The term _fancy indexing_ describes ways of selecting elements other than slicing, e.g. direct element selection via lists of indices. Unlike slicing, fancy indexing returns a _copy_ of the selected data, not a _view_ (a reference to the original data). Modifying the result therefore does not update the original ndarray (we will see an exception below).

Fancy indexing is nevertheless very practical in many applications, and it is considerably faster than loop-wise element access.

In [65]:
# First let's create fresh random array:
a = np.random.randint(0,10,(3,4))
print(a)

[[2 6 8 6]
 [5 6 6 7]
 [3 6 7 6]]


In [66]:
print(a)
print()

# We now use a list as index specifications for our slice:
print(a[1,[1,3]])

[[2 6 8 6]
 [5 6 6 7]
 [3 6 7 6]]

[6 7]


In [67]:
print('a =\n', a)
print()

# We cannot change sub array data via fancy indexing:
b = a[1,[1,2]]
print('b =', b)
b[:] = 13 # without the [:] we would simply overwrite the variable b
print(b)

print()
print('a =\n', a)

a =
 [[2 6 8 6]
 [5 6 6 7]
 [3 6 7 6]]

b = [6 6]
[13 13]

a =
 [[2 6 8 6]
 [5 6 6 7]
 [3 6 7 6]]


In [68]:
print('a =\n', a)
print()

# for a slice, this works:
b = a[1,1:3]
print('b =', b)
b[:] = 13
print('b =', b)

print()
print('a =\n', a)

a =
 [[2 6 8 6]
 [5 6 6 7]
 [3 6 7 6]]

b = [6 6]
b = [13 13]

a =
 [[ 2  6  8  6]
 [ 5 13 13  7]
 [ 3  6  7  6]]


In [69]:
print(a)
print()

# What would you expect to happen here:
a[1,[1,3]] = 100

print()
print(a)

[[ 2  6  8  6]
 [ 5 13 13  7]
 [ 3  6  7  6]]


[[  2   6   8   6]
 [  5 100  13 100]
 [  3   6   7   6]]


This works because the assignment `a[...] = value` is handled directly by the array itself (through its `__setitem__` method), which writes the value into the selected positions of `a`. No intermediate copy of the selection is created.

<div style="padding: 10px; border-left: 6px solid #2196F3; border-radius: 4px;">
  <strong>Note:</strong> In general it is good advice to <b>be careful when changing data with fancy indexing -  it only works for direct assignment</b>.
</div>
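A small sketch of what "direct assignment" means in practice (array values are illustrative): assigning to `a[...]` with a fancy index updates `a`, while chaining a second index onto a fancy-indexed result only modifies a temporary copy.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Direct assignment: modifies a in place
a[[0, 2], 1] = 100
print(a)

# Chained indexing: a[[0, 2]] is a copy, so this assignment is lost
a[[0, 2]][:, 0] = -1
print(a[:, 0])

# The fancy-indexed result does not share memory with a
picked = a[[0, 2]]
print(np.shares_memory(a, picked))
```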

We can also use list indexing in multidimensional arrays:

In [70]:
print(a)
print()

# this pairs up (0,0), (2,2), (2,3):
a[[0,2,2],[0,2,3]] = 100
print(a)

[[  2   6   8   6]
 [  5 100  13 100]
 [  3   6   7   6]]

[[100   6   8   6]
 [  5 100  13 100]
 [  3   6 100 100]]


Another way to access elements in an ndarray is **boolean indexing**:

In [71]:
print(a)
print()

# select by a bool list:
s = [True, True, False, True]
print(s)
print(a[0,s])
print()

# this works for bool ndarrays as well:
t = np.array(s)
print(t)
print(a[0,t])

[[100   6   8   6]
 [  5 100  13 100]
 [  3   6 100 100]]

[True, True, False, True]
[100   6   6]

[ True  True False  True]
[100   6   6]


We can get such selections directly from conditions:

In [72]:
print(a)
print()

# select elements greater than 3:
s = a > 3
print(s)
print()

# the elements > 3 as a 1D array
# (in general they could be located anywhere, hence the 1D result)
# (masked arrays from the module numpy.ma can help keep track of positions; not covered here):
print(a[s])
print()

# we can change these elements:
a[s] = 100
print(a)

[[100   6   8   6]
 [  5 100  13 100]
 [  3   6 100 100]]

[[ True  True  True  True]
 [ True  True  True  True]
 [False  True  True  True]]

[100   6   8   6   5 100  13 100   6 100 100]

[[100 100 100 100]
 [100 100 100 100]
 [  3 100 100 100]]


In [73]:
a[ (a > 1) & (a < 3) ] = 8
print(a)

[[100 100 100 100]
 [100 100 100 100]
 [  3 100 100 100]]


#### _Index arrays_
We have now seen fancy indexing by lists for each axis. This of course also works for numpy arrays with integer data:

In [74]:
# create fresh random array:
a = np.random.randint(0,10,(3,4))
print(a)

[[4 8 1 7]
 [8 5 8 0]
 [1 0 1 5]]


In [75]:
# use index arrays - this will pair up (0,2), (0,1), (1,-1):
print(a[np.array([0,0,1]), np.array([2,1,-1])])

[1 8 0]


We have also seen how to invoke boolean indexing by a condition:

In [76]:
print(a[a < 5])

[4 1 0 1 0 1]


As an alternative, we can obtain the index arrays for a condition by using *np.where*:

In [77]:
print("a =", a)

# this returns a tuple of index arrays, one for each axis:
w = np.where(a < 5)
print()
print("w =", w)

# as seen above, these tuples of index arrays can directly be
# used for element access:
print()
print("a[w] =", a[w])

a = [[4 8 1 7]
 [8 5 8 0]
 [1 0 1 5]]

w = (array([0, 0, 1, 2, 2, 2], dtype=int64), array([0, 2, 3, 0, 1, 2], dtype=int64))

a[w] = [4 1 0 1 0 1]


Using `np.where()`, you can also specify the values that should be returned where the condition is True or False, respectively:

In [78]:
print("a =", a)
print()

s = np.where(a < 5, 1, 0)
print("s =",s)
print()

k = np.where(a < 5, 10 *a, 0)
print("k =",k)
print()

b = np.arange(12).reshape(a.shape)
print("b =", b)
l = np.where(a < 5, 10 * a, b)
print("l =",l)

a = [[4 8 1 7]
 [8 5 8 0]
 [1 0 1 5]]

s = [[1 0 1 0]
 [0 0 0 1]
 [1 1 1 0]]

k = [[40  0 10  0]
 [ 0  0  0  0]
 [10  0 10  0]]

b = [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
l = [[40  1 10  3]
 [ 4  5  6  0]
 [10  0 10 11]]


The function `np.where()` returns the tuple *(axis0_indices, axis1_indices, ...)* which can be inconvenient in some applications.

In [79]:
print("a =", a)
print()

w = np.where(a > 5)
print("w =", w)
print()

a = [[4 8 1 7]
 [8 5 8 0]
 [1 0 1 5]]

w = (array([0, 0, 1, 1], dtype=int64), array([1, 3, 0, 2], dtype=int64))



Instead of `np.where()` we can also use `np.argwhere()`. This returns a 2D array in which each row holds the indices of one element for which the condition is fulfilled.

In [80]:
print("a =", a)
print()

aw = np.argwhere(a > 5)
print("aw =", aw)

print()
for (i, j) in aw:
    print(i,j,":",a[i, j])

a = [[4 8 1 7]
 [8 5 8 0]
 [1 0 1 5]]

aw = [[0 1]
 [0 3]
 [1 0]
 [1 2]]

0 1 : 8
0 3 : 7
1 0 : 8
1 2 : 8

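The two functions carry the same information in different layouts. As a minimal sketch (reusing the array values from above, with illustrative variable names), stacking the index arrays from `np.where()` as columns gives the result of `np.argwhere()`:

```python
import numpy as np

a = np.array([[4, 8, 1, 7],
              [8, 5, 8, 0],
              [1, 0, 1, 5]])

rows, cols = np.where(a > 5)           # one index array per axis
pairs = np.column_stack((rows, cols))  # one (row, col) pair per match

print(pairs)
print(np.array_equal(pairs, np.argwhere(a > 5)))

# The tuple form is the one that can be used directly for indexing
print(a[rows, cols])
```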

## Array Operations


Calculations on NumPy ndarrays can be combined with boolean indexing, so that an operation is applied only to the selected elements in a single step.

In [81]:
print(a)
print()

# here we take the square of all elements greater than 3 and smaller than 9:
a[ (a > 3) & (a < 9) ] **= 2

print(a)

[[4 8 1 7]
 [8 5 8 0]
 [1 0 1 5]]

[[16 64  1 49]
 [64 25 64  0]
 [ 1  0  1 25]]


All standard math operations can be performed **element-wise** on ndarrays.

In [82]:
# create fresh random arrays:
a = np.random.randint(0,10,(3,4))
print("a =\n",a)

b = np.random.randint(0,10,(3,4))
print("b =\n",b)

a =
 [[4 8 7 4]
 [7 1 9 1]
 [1 0 4 6]]
b =
 [[6 4 8 2]
 [1 5 0 3]
 [4 0 3 9]]


In [83]:
c = a + b
print("c =\n",c)

c =
 [[10 12 15  6]
 [ 8  6  9  4]
 [ 5  0  7 15]]


In [84]:
print("a =\n",a)
print("b =\n",b)
print()

c = a * b
print("c =\n",c)

a =
 [[4 8 7 4]
 [7 1 9 1]
 [1 0 4 6]]
b =
 [[6 4 8 2]
 [1 5 0 3]
 [4 0 3 9]]

c =
 [[24 32 56  8]
 [ 7  5  0  3]
 [ 4  0 12 54]]


Similar rules apply to slices

In [85]:
print("a =\n",a)
print("b =\n",b)
print()

c = a.copy()
print("c =\n",c)
print()

# add column 0 of b to column 1 of c:
c[:,1] += b[:,0]
print("c =\n",c)

a =
 [[4 8 7 4]
 [7 1 9 1]
 [1 0 4 6]]
b =
 [[6 4 8 2]
 [1 5 0 3]
 [4 0 3 9]]

c =
 [[4 8 7 4]
 [7 1 9 1]
 [1 0 4 6]]

c =
 [[ 4 14  7  4]
 [ 7  2  9  1]
 [ 1  4  4  6]]


Mathematical functions like `np.sin()`, `np.cos()`, `np.exp()`, etc. can be used element-wise on ndarrays.

In [86]:
print("a =\n",a)
print()

b = np.sin(a / 2 / np.pi)
print("b =\n",b)
print()

c = np.exp(-a)
print("c =\n",c)
print()

d = np.sqrt(a)
print("d =\n",d)

a =
 [[4 8 7 4]
 [7 1 9 1]
 [1 0 4 6]]

b =
 [[0.59448077 0.95605566 0.89750747 0.59448077]
 [0.89750747 0.15848389 0.99043774 0.15848389]
 [0.15848389 0.         0.59448077 0.81627311]]

c =
 [[1.83156389e-02 3.35462628e-04 9.11881966e-04 1.83156389e-02]
 [9.11881966e-04 3.67879441e-01 1.23409804e-04 3.67879441e-01]
 [3.67879441e-01 1.00000000e+00 1.83156389e-02 2.47875218e-03]]

d =
 [[2.         2.82842712 2.64575131 2.        ]
 [2.64575131 1.         3.         1.        ]
 [1.         0.         2.         2.44948974]]


NumPy provides functions like `np.sum()`, `np.mean()`, `np.max()`, `np.min()`, etc. for aggregating data in arrays. The optional `axis` argument selects the axis along which the values are aggregated.

In [87]:
print(a)
print()

# The element sum:
s = a.sum()
print(s)
print()

# The element sum along axis 0:
s = a.sum(axis=0)
print(s)
print()

# The element sum along axis 1:
s = a.sum(axis=1)
print(s)

[[4 8 7 4]
 [7 1 9 1]
 [1 0 4 6]]

52

[12  9 20 11]

[23 18 11]


In [88]:
print(a)
print()

# the mean of the elements:
print(a.mean())
print(np.mean(a))
print()

# the mean row:
print(a.mean(axis=0))
print(np.mean(a, axis=0))
print()

# the mean col:
print(a.mean(axis=1))
print(np.mean(a, axis=1))

[[4 8 7 4]
 [7 1 9 1]
 [1 0 4 6]]

4.333333333333333
4.333333333333333

[4.         3.         6.66666667 3.66666667]
[4.         3.         6.66666667 3.66666667]

[5.75 4.5  2.75]
[5.75 4.5  2.75]


In [89]:
print(a)
print()

# the max element:
print(a.max())
print(np.max(a))
print()

# the min element:
print(a.min())
print(np.min(a))

[[4 8 7 4]
 [7 1 9 1]
 [1 0 4 6]]

9
9

0
0


In [90]:
print(a)
print()

# the position of the (first) max element:
i = a.argmax()
print(i)
print()

print(a.flat[i])

[[4 8 7 4]
 [7 1 9 1]
 [1 0 4 6]]

6

9


In [91]:
print(a)
print()

# For each column, the row index of its max element (search along axis 0)
i = a.argmax(axis=0)
print(i)
print()

# For each row, the column index of its max element (search along axis 1)
i = a.argmax(axis=1)
print(i)

[[4 8 7 4]
 [7 1 9 1]
 [1 0 4 6]]

[1 0 1 2]

[1 2 3]

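Two helpers are often useful together with these aggregations. The sketch below (same array values as above, illustrative variable names) converts the flat position returned by `argmax()` into a row and column index with `np.unravel_index()`, and uses `keepdims=True` so that the row sums can be combined with the original array again:

```python
import numpy as np

a = np.array([[4, 8, 7, 4],
              [7, 1, 9, 1],
              [1, 0, 4, 6]])

# Convert the flat position from argmax() into a (row, column) index
i = a.argmax()
row, col = np.unravel_index(i, a.shape)
print(i, "->", (row, col), "value:", a[row, col])

# keepdims=True keeps the reduced axis with length 1, giving shape (3, 1)
row_sums = a.sum(axis=1, keepdims=True)
print(row_sums.shape)

# Divide every row by its own sum, so that each row adds up to 1
fractions = a / row_sums
print(fractions.sum(axis=1))
```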

NumPy arrays also support linear algebra operations such as matrix transposition, dot products, and cross products.

In [92]:
# Create fresh random arrays
a = np.random.randint(0,10,(3,4))
print("a =\n",a,": shape =",a.shape)
b = np.random.randint(0,10,(3,4))
print("b =\n",b,": shape =",b.shape)

a =
 [[5 9 4 1]
 [4 3 0 5]
 [3 9 5 6]] : shape = (3, 4)
b =
 [[1 6 4 0]
 [8 8 9 4]
 [4 3 6 7]] : shape = (3, 4)


In [93]:
# The transpose is available as the attribute `T`:
c = b.T
print("c =\n",c,": shape =",c.shape)

c =
 [[1 8 4]
 [6 8 3]
 [4 9 6]
 [0 4 7]] : shape = (4, 3)


In [94]:
# Dot product
d = np.dot(a,c)
print("d =", d,": shape =",d.shape)

d = [[ 75 152  78]
 [ 22  76  60]
 [ 77 165 111]] : shape = (3, 3)


In [95]:
# create random vectors:
v = np.random.randint(0,10,3)
print("v =",v)

w = np.random.randint(0,10,3)
print("w =",w)
print()

# Cross product
x = np.cross(v, w)
print("x =",x)

v = [9 2 6]
w = [6 1 8]

x = [ 10 -36  -3]

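For 2D arrays, the `@` operator gives the same matrix product as `np.dot()`, whereas `*` multiplies element by element. A short sketch reusing the values of `a` and `b` from above:

```python
import numpy as np

a = np.array([[5, 9, 4, 1],
              [4, 3, 0, 5],
              [3, 9, 5, 6]])
b = np.array([[1, 6, 4, 0],
              [8, 8, 9, 4],
              [4, 3, 6, 7]])

d = a @ b.T          # matrix product: shapes (3, 4) and (4, 3) give (3, 3)
e = a * b            # element-wise product: shape stays (3, 4)

print(d)
print(np.array_equal(d, np.dot(a, b.T)))
print(e.shape)
```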

## NumPy array with Broadcasting

NumPy **broadcasting** is a powerful mechanism that allows NumPy to perform operations on arrays of different shapes in a way that avoids making explicit copies of the data. It automatically expands the smaller array to match the shape of the larger array, allowing element-wise operations between arrays of different shapes without writing extra code for looping.

When operating on two arrays, NumPy compares their shapes element by element, starting from the last dimension and applies these rules:

- If the dimensions are equal: The arrays can be operated on element-wise.
- If one of the dimensions is 1: The smaller array will be "stretched" to match the larger array along that dimension.
- If the shapes are not compatible: Broadcasting will raise a ValueError because the arrays cannot be broadcasted together.
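The rules above can be checked on shapes alone. In this minimal sketch (shapes chosen for illustration), the first pair of arrays is compatible and the second is not. An array with fewer dimensions is treated as if its shape had leading 1s.

```python
import numpy as np

a = np.zeros((2, 3, 4))
b = np.ones((3, 1))      # compared from the right: (3, 1) against (3, 4)

# Last axis: 1 vs 4, so b is stretched; next axis: 3 vs 3, equal;
# b has no third axis, so it is treated as length 1 there
print((a + b).shape)

# Incompatible shapes: the last dimensions are 4 and 3, and neither is 1
try:
    np.zeros((3, 4)) + np.ones(3)
    compatible = True
except ValueError as err:
    compatible = False
    print("ValueError:", err)
```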

In [96]:
# create fresh random array:
a = np.random.randint(0,10,(3,4))
print(a)
print()

print(a + 30)

[[6 1 4 3]
 [2 6 7 5]
 [3 4 1 4]]

[[36 31 34 33]
 [32 36 37 35]
 [33 34 31 34]]


Another example is adding a one-dimensional array to each row of a matrix.

In [97]:
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
print("x =\n", x)
print("v =\n", v)
print()

# Add v to each row of x using broadcasting
y = x + v  
print("y =\n", y) 

x =
 [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
v =
 [1 0 1]

y =
 [[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


We can also multiply an array by factors that depend on its first and last axes. To do this, we reshape the factor array so that it broadcasts along the middle axis:

In [98]:
# create fresh random arrays:
a = np.random.randint(0,20,(2,3,4))
print("a =", a)
print("\nShape a:", a.shape)

f = np.array([[2,2,1,2], [1,1,2,2]])
print("f =", f)
print("\nShape f:", f.shape)

a = [[[ 4 17 10 12]
  [11  0  7  3]
  [17 10 18  0]]

 [[ 2 14  3 14]
  [18  8  7  3]
  [ 6  4  9 16]]]

Shape a: (2, 3, 4)
f = [[2 2 1 2]
 [1 1 2 2]]

Shape f: (2, 4)


In [99]:
# Reshape f, inserting a new axis of length 1 in the middle:
x = a * f.reshape(2,1,4)
print(x)

[[[ 8 34 10 24]
  [22  0  7  6]
  [34 20 18  0]]

 [[ 2 14  6 28]
  [18  8 14  6]
  [ 6  4 18 32]]]


An alternative is to insert `None` (an alias for `np.newaxis`) into the index expression, which adds the new axis of length 1:

In [100]:
x = a * f[:, None, :] 
print()
print("x =", x)


x = [[[ 8 34 10 24]
  [22  0  7  6]
  [34 20 18  0]]

 [[ 2 14  6 28]
  [18  8 14  6]
  [ 6  4 18 32]]]
