# What is NumPy?

NumPy is a Python library used for working with arrays.

It also has functions for working in the domain of linear algebra, Fourier transform, and matrices.

NumPy stands for Numerical Python.

# Why Use NumPy?

In Python we have lists that serve the purpose of arrays, but they are slow to process.

NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.

In [3]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)

[1 2 3 4 5]


To create an array, we can pass a list, tuple or any array-like object to the array() function.

In [4]:
arr = np.array((1, 2, 3, 4, 5, 6))
print(arr)

[1 2 3 4 5 6]
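Other array-like objects work the same way. This is a minimal sketch, not an exhaustive list of accepted inputs: it builds arrays from a range, a list of tuples, and an existing array. Passing an existing array to np.array() makes a copy by default.

```python
import numpy as np

# A range object is array-like: each item becomes an element
from_range = np.array(range(5))

# A list of tuples becomes a 2-D array, one row per tuple
from_tuples = np.array([(1, 2), (3, 4)])

# Passing an existing array creates a new array (a copy by default)
original = np.array([1, 2, 3])
duplicate = np.array(original)
duplicate[0] = 99

print(from_range)         # [0 1 2 3 4]
print(from_tuples.shape)  # (2, 2)
print(original)           # [1 2 3]: unchanged by the edit to duplicate
```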


# Dimensions in Arrays

A dimension in arrays is one level of array depth (nested arrays). 

# 0-D Arrays

0-D arrays, or scalars, are the elements in an array. Each value in an array is a 0-D array.

In [5]:
arr = np.array(42)
print(arr)

42
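A 0-D array has no axes at all, so its shape is an empty tuple. The short sketch below checks this; item() is used here only to show the value converted back to a plain Python number.

```python
import numpy as np

scalar_arr = np.array(42)

print(scalar_arr.ndim)    # 0: no dimensions at all
print(scalar_arr.shape)   # (): the shape tuple is empty
print(scalar_arr.item())  # 42, returned as a plain Python int
```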


# 1-D Arrays

An array that has 0-D arrays as its elements is called a uni-dimensional or 1-D array.

These are the most common and basic arrays.

In [6]:
arr = np.array([1, 2, 3, 4, 5])
print(arr)

[1 2 3 4 5]


# 2-D Arrays

An array that has 1-D arrays as its elements is called a 2-D array. These are often used to represent matrices or 2nd-order tensors.

NumPy has a whole submodule dedicated to linear algebra, including matrix operations, called numpy.linalg.

In [7]:
# Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)

[[1 2 3]
 [4 5 6]]


# 3-D arrays

An array that has 2-D arrays (matrices) as its elements is called a 3-D array.

These are often used to represent a 3rd order tensor.

In [8]:
# Create a 3-D array with two 2-D arrays, both containing
# two arrays with the values 1,2,3 and 4,5,6
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)

[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]


# Check Number of Dimensions

NumPy arrays provide the ndim attribute, which returns an integer that tells us how many dimensions an array has.

In [9]:
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

0
1
2
3


# Higher Dimension Arrays

An array can have any number of dimensions.

When the array is created, you can define the number of dimensions using the ndmin argument.

In [10]:
arr = np.array([1, 2, 3, 4], ndmin=5)

print(arr)
print(f"Number of dimensions: {arr.ndim}")

[[[[[1 2 3 4]]]]]
Number of dimensions: 5


In this array, the innermost dimension (5th dim) has 4 elements, the 4th dim has 1 element that is the vector, the 3rd dim has 1 element that is the matrix with the vector, the 2nd dim has 1 element that is the 3D array and 1st dim has 1 element that is a 4D array.
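One way to check the description above is to index through the outer dimensions one at a time. Each of the four outer dimensions has exactly one element, so index 0 is the only valid choice until only the vector remains. A minimal sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4], ndmin=5)

# Peel off one size-1 outer dimension per step
sub = arr
while sub.ndim > 1:
    sub = sub[0]
    print(sub.ndim, sub.shape)

# Prints:
# 4 (1, 1, 1, 4)
# 3 (1, 1, 4)
# 2 (1, 4)
# 1 (4,)
print(sub)   # [1 2 3 4]
```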

# Access Array Elements

Array indexing is the same as accessing an array element.

You can access an array element by referring to its index number.

In [11]:
arr = np.array([1, 2, 3, 4])
print(arr[0])

1


# Access 2-D Arrays

To access elements from 2-D arrays, we can use comma-separated integers: the first gives the position along the first dimension, and the second gives the position along the second dimension.

Think of 2-D arrays like a table with rows and columns, where the first index selects the row and the second index selects the column.

In [12]:
# Access the element on the first row, second column
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(f"Second element of the first row: {arr[0, 1]}")

Second element of the first row: 2
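A few more index combinations on the same array, as a sketch: the first integer picks the row, the second picks the column, and a single integer returns a whole row.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr[1, 4])   # 10: second row, fifth column
print(arr[0])      # [1 2 3 4 5]: the whole first row
print(arr[0][1])   # 2: the same element as arr[0, 1], indexed in two steps
```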


# Access 3-D Arrays

To access elements from 3-D arrays, we can use comma-separated integers, one index per dimension, going from the outermost dimension to the innermost.

In [13]:
# Access the third element of the second array of the first array
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(f"Third element of the second array of the first array: {arr[0, 1, 2]}")

Third element of the second array of the first array: 6
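To see why arr[0, 1, 2] is 6, it can help to apply the three indexes one at a time. A sketch:

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

first = arr[0]          # the first 2-D array: [[1 2 3] [4 5 6]]
second_row = first[1]   # its second 1-D array: [4 5 6]
value = second_row[2]   # the third element of that array: 6

print(first)
print(second_row)
print(value)

# A single comma-separated index reaches the same element in one step
print(arr[0, 1, 2] == value)   # True
```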


# Negative Indexing

Use negative indexing to access an array from the end.

In [15]:
# Print the last element from the 2nd dim
arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(f"Last element from the 2nd dim: {arr[-1, -1]}")

Last element from the 2nd dim: 10


# Data Types

By default, Python has these data types:
- strings: used to represent text data
- integer: used to represent integer numbers
- float: used to represent real numbers
- boolean: used to represent True or False
- complex: used to represent complex numbers

# Data Types in NumPy

NumPy has some extra data types, and refers to data types with one-character codes, like i for integers, u for unsigned integers, etc.

Below is a list of the data type kinds in NumPy and the character used to represent each one (this is the character reported by a dtype's kind attribute).

- i : integer
- b : boolean
- u : unsigned integer
- f : float
- c : complex float
- m : timedelta
- M : datetime
- O : object
- S : string
- U : unicode string
- V : fixed chunk of memory for other type ( void )
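These characters are what a dtype reports through its kind attribute. The sketch below builds one small array per kind to show the mapping; the day unit ('D') used for the datetime and timedelta examples, and the 4-byte void size, are arbitrary illustrative choices.

```python
import numpy as np

examples = {
    "i": np.array([1, 2]),
    "b": np.array([True, False]),
    "u": np.array([1, 2], dtype=np.uint8),
    "f": np.array([1.5, 2.5]),
    "c": np.array([1 + 2j]),
    "m": np.array([1, 2], dtype="timedelta64[D]"),
    "M": np.array(["2024-01-01"], dtype="datetime64[D]"),
    "O": np.array([1, "a"], dtype=object),
    "S": np.array([b"abc"]),
    "U": np.array(["abc"]),
    "V": np.zeros(1, dtype="V4"),
}

for expected_kind, arr in examples.items():
    # dtype.kind is the one-character code from the list above
    print(expected_kind, arr.dtype, arr.dtype.kind)
```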

# Checking the Data Type of an Array

The NumPy array object has a property called dtype that returns the data type of the array.

In [16]:
arr = np.array([1, 2, 3, 4])
print(arr.dtype)

int64


In [17]:
arr = np.array(['apple', 'banana', 'cherry'])
print(arr.dtype)

<U6


# Creating Arrays with a Defined Data Type

We use the array() function to create arrays. This function takes an optional dtype argument that lets us define the expected data type of the array elements:

In [18]:
arr = np.array([1, 2, 3, 4], dtype='S')
print(arr)
print(arr.dtype)

[b'1' b'2' b'3' b'4']
|S1


For i, u, f, S, and U we can define the size as well: in bytes for i, u, and f, and in characters for S and U.

In [19]:
arr = np.array([1, 2, 3, 4], dtype='i4')
print(arr)
print(arr.dtype)

[1 2 3 4]
int32
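The itemsize attribute, which gives the number of bytes per element, makes the size visible. The sketch also shows that a value which cannot be converted to the requested type raises a ValueError when the array is created.

```python
import numpy as np

ints = np.array([1, 2, 3, 4], dtype="i4")
print(ints.itemsize)                  # 4 bytes per element

floats = np.array([1, 2], dtype="f2")
print(floats.dtype, floats.itemsize)  # float16 2

# For U the number counts characters; each character takes 4 bytes
words = np.array(["apple", "kiwi"], dtype="U6")
print(words.itemsize)                 # 24

# 'a' cannot be parsed as an integer, so creation fails
conversion_failed = False
try:
    np.array(["a", "2", "3"], dtype="i")
except ValueError as err:
    conversion_failed = True
    print("ValueError:", err)
```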


# Converting Data Type on Existing Arrays

The best way to change the data type of an existing array is to make a copy of the array with the astype() method.

The astype() method creates a copy of the array and lets you specify the data type as a parameter.

The data type can be specified using a string, like 'f' for float or 'i' for integer, or you can use a Python type directly, like float for float and int for integer.

In [None]:
# Change data type from float to integer by using 'i' as parameter value:
arr = np.array([1.1, 2.1, 3.1])
newarr = arr.astype('i')
print(newarr)
print(newarr.dtype)

[1 2 3]
int32


In [21]:
# Change data type from float to integer by using int as parameter value:
arr = np.array([1.1, 2.1, 3.1])
newarr = arr.astype(int)
print(newarr)
print(newarr.dtype)

[1 2 3]
int64


In [23]:
# Change data type from integer to boolean:
arr = np.array([1, 0, 3])
newarr = arr.astype(bool)
print(newarr)
print(newarr.dtype)

[ True False  True]
bool
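Two details are worth checking: converting floats to integers with astype() truncates toward zero rather than rounding, and the result is a separate copy. A small sketch; calling np.round() first is one option if rounding is what you want.

```python
import numpy as np

arr = np.array([1.9, -1.9, 2.5])
as_int = arr.astype(int)
print(as_int)    # [ 1 -1  2]: the fractional part is dropped

# astype() returns a new array, so changing it leaves arr untouched
as_int[0] = 100
print(arr)       # [ 1.9 -1.9  2.5]

# Round first if rounding is the intent; halves go to the nearest even value
rounded = np.round(arr).astype(int)
print(rounded)   # [ 2 -2  2]
```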


# The Difference Between Copy and View

The main difference between a copy and a view of an array is that the copy is a new array, while the view is just a view of the original array.

The copy owns the data: changes made to the copy do not affect the original array, and changes made to the original array do not affect the copy.

The view does not own the data: changes made to the view affect the original array, and changes made to the original array affect the view.

# Copy and View Examples

In [None]:
# Make a copy, change the original array, and display both arrays:
arr = np.array([1, 2, 3, 4, 5])
x = arr.copy()
arr[0] = 42
print(arr)
print(x)

[42  2  3  4  5]
[1 2 3 4 5]


In [25]:
# Make a view, change the original array, and display both arrays:
arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
arr[0] = 42
print(arr)
print(x)

[42  2  3  4  5]
[42  2  3  4  5]


In [26]:
# Make a view, change the view, and display both arrays:
arr = np.array([1, 2, 3, 4, 5])
x = arr.view()
x[0] = 31

print(arr)
print(x)

[31  2  3  4  5]
[31  2  3  4  5]


# Check if Array Owns its Data

Every NumPy array has the attribute base that returns None if the array owns the data.

Otherwise, the base attribute refers to the original object.

In [27]:
# Print the value of the base attribute to check if an array owns its data or not:

arr = np.array([1, 2, 3, 4, 5])

x = arr.copy()
y = arr.view()

print(x.base)
print(y.base)

None
[1 2 3 4 5]
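Views are not only created by view(): basic slicing also returns a view, so the slice shares data with the original and its base points back to it. The sketch below uses np.shares_memory() to confirm this.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
part = arr[1:4]          # basic slicing returns a view

part[0] = 20             # the write goes through to arr
print(arr)               # [ 1 20  3  4  5]
print(part.base is arr)  # True: part does not own its data

print(np.shares_memory(arr, part))             # True
print(np.shares_memory(arr, arr[1:4].copy()))  # False
```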


# Shape of an Array

The shape of an array is the number of elements in each dimension.

# Get the Shape of an Array

NumPy arrays have an attribute called shape that returns a tuple in which each entry is the number of elements along the corresponding dimension.

In [28]:
# Print the shape of a 2-D array

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr.shape)

(2, 4)


In [29]:
# Create an array with 5 dimensions using ndmin with a vector of values 1,2,3,4, and verify that the last dimension has size 4:

arr = np.array([1, 2, 3, 4], ndmin=5)

print(arr)
print(f"Shape of array: {arr.shape}")

[[[[[1 2 3 4]]]]]
Shape of array: (1, 1, 1, 1, 4)
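The shape tuple is tied to two other attributes: its length equals ndim, and the product of its entries equals size, the total number of elements. A quick check:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(arr.shape)           # (2, 4)
print(len(arr.shape))      # 2, the same as arr.ndim
print(arr.size)            # 8
print(np.prod(arr.shape))  # 8, the same as arr.size
```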


# Reshaping Arrays

Reshaping means changing the shape of an array.

The shape of an array is the number of elements in each dimension.

By reshaping, we can add or remove dimensions or change the number of elements in each dimension.

# Reshape from 1-D to 2-D

In [30]:
# Convert the following 1-D array with 12 elements into a 2-D array.
# The outermost dimension will have 4 arrays, each with 3 elements:

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(4, 3)

print(newarr)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
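Two properties of reshape() are shown in this sketch: the total number of elements must stay the same, otherwise a ValueError is raised, and for a contiguous array like this one the result is a view of the original data (its base is the original array), not a copy.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(4, 3)
print(newarr.base is arr)   # True: a view that shares arr's data
newarr[0, 0] = 100
print(arr[0])               # 100

# 12 elements cannot fill a 5 x 3 array (15 slots)
reshape_failed = False
try:
    arr.reshape(5, 3)
except ValueError as err:
    reshape_failed = True
    print("ValueError:", err)
```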


# Reshape from 1-D to 3-D

In [31]:
# Convert the following 1-D array with 12 elements into a 3-D array.
# The outermost dimension will have 2 arrays that contain 3 arrays, each with 2 elements:

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(2, 3, 2)
print(newarr)

[[[ 1  2]
  [ 3  4]
  [ 5  6]]

 [[ 7  8]
  [ 9 10]
  [11 12]]]


# Unknown Dimension

You are allowed to have one "unknown" dimension.

This means that you do not have to specify an exact number for one of the dimensions in the reshape() method.

Pass -1 as the value, and NumPy will calculate this number for you.

In [32]:
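# Convert the 1-D array with 12 elements into a 3-D array of shape (2, 2, 3):
# NumPy works out the last dimension as 12 / (2 * 2) = 3

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

newarr = arr.reshape(2, 2, -1)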

print(newarr)

[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]]
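With -1, NumPy divides the total number of elements by the product of the known dimensions. Only one dimension may be -1, and the division must come out even. A sketch of both the working and the failing cases:

```python
import numpy as np

arr = np.arange(1, 13)            # the same 12 values, 1 through 12

print(arr.reshape(3, -1).shape)   # (3, 4): 12 / 3 = 4
print(arr.reshape(-1, 6).shape)   # (2, 6): 12 / 6 = 2

rejected = []
for bad_shape in [(-1, -1), (5, -1)]:
    try:
        arr.reshape(bad_shape)
    except ValueError as err:
        # (-1, -1): more than one unknown; (5, -1): 12 is not divisible by 5
        rejected.append(bad_shape)
        print(bad_shape, "ValueError:", err)
```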


# Flattening the arrays

Flattening means converting a multidimensional array into a 1D array.

We can use reshape(-1) to do this.

In [33]:
# Convert the array into a 1D array:

arr = np.array([[1, 2, 3], [4, 5, 6]])

newarr = arr.reshape(-1)

print(newarr)

[1 2 3 4 5 6]
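Besides reshape(-1), NumPy arrays have flatten() and ravel() methods. flatten() always returns a copy, while ravel() returns a view when it can, as this sketch shows by writing to both results:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = arr.flatten()   # always a new array
flat_view = arr.ravel()     # a view here, since arr is contiguous

flat_copy[0] = 100          # does not touch arr
flat_view[1] = 200          # writes through to arr
print(arr.tolist())         # [[1, 200, 3], [4, 5, 6]]
```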


# Iterating Arrays

In [34]:
arr = np.array([1, 2, 3])

for x in arr:
    print(x)

1
2
3


# Iterating 2-D Arrays

When iterating over a 2-D array, a for loop goes through the rows.

In [35]:
arr = np.array([[1, 2, 3], [4, 5, 6]])

for x in arr:
    print(x)

[1 2 3]
[4 5 6]


To get the actual values (the scalars), we have to iterate over the arrays in each dimension.

In [36]:
for x in arr:
    for y in x:
        print(y)

1
2
3
4
5
6


# Iterating Arrays Using nditer()

The function nditer() is a helper function that can be used for anything from very basic to very advanced iterations. It solves some basic issues we face when iterating; let's go through them with examples.

# Iterating on Each Scalar Element

With basic for loops, iterating through each scalar of an n-dimensional array requires n nested loops, which becomes difficult to write for arrays with very high dimensionality.

In [37]:
# Iterate through a 3-D array

arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

for x in np.nditer(arr):
    print(x)

1
2
3
4
5
6
7
8
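For comparison, here is a sketch of the same 3-D traversal written with nested for loops, one per dimension. For this array both approaches visit the elements in the same row-major order.

```python
import numpy as np

arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# Three dimensions need three nested loops
nested = []
for matrix in arr:
    for row in matrix:
        for value in row:
            nested.append(int(value))

# nditer() needs a single loop regardless of ndim
flat = [int(x) for x in np.nditer(arr)]

print(nested)          # [1, 2, 3, 4, 5, 6, 7, 8]
print(nested == flat)  # True
```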


# Iterating Array with Different Data Types

We can use the op_dtypes argument and pass it the expected data type to change the data type of elements while iterating.

NumPy does not change the data type of the element in place (where the element is in the array), so it needs some extra space to perform this conversion. That extra space is called a buffer, and to enable it in nditer() we pass flags=['buffered'].

In [None]:
# Iterating through the array as a string
arr = np.array([1, 2, 3])

for x in np.nditer(arr, flags=['buffered'], op_dtypes=['S']):
    print(x)

np.bytes_(b'1')
np.bytes_(b'2')
np.bytes_(b'3')


# Iterating With Different Step Size


In [39]:
# Iterate through every scalar element, skipping 1 element

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

for x in np.nditer(arr[:, ::2]):
    print(x)

1
3
5
7


# Enumerated Iteration Using ndenumerate()

Enumeration means mentioning the sequence number of things, one by one.

Sometimes we need the corresponding index of each element while iterating; the ndenumerate() function can be used for those use cases.

In [40]:
arr = np.array([1, 2, 3])

for idx, x in np.ndenumerate(arr):
    print(idx, x)

(0,) 1
(1,) 2
(2,) 3


In [41]:
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

for idx, x in np.ndenumerate(arr):
    print(idx, x)

(0, 0) 1
(0, 1) 2
(0, 2) 3
(0, 3) 4
(1, 0) 5
(1, 1) 6
(1, 2) 7
(1, 3) 8


# Joining NumPy Arrays

We can pass a sequence of arrays that we want to join to the concatenate() function, along with the axis. If the axis is not explicitly passed, it is taken as 0.

In [44]:
# Join two arrays

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.concatenate((arr1, arr2))

print(arr)

[1 2 3 4 5 6]


In [49]:
# Join two 2-D arrays along axis=1 (side by side, so each row gets longer)

arr1 = np.array([[1, 2], [3, 4]])

arr2 = np.array([[5, 6], [7, 8]])

arr = np.concatenate((arr1, arr2), axis=1)

print(arr)

[[1 2 5 6]
 [3 4 7 8]]
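With the default axis=0, the same two arrays are joined vertically instead, adding rows. The sizes along every other axis must match, as this sketch shows with a wider hypothetical array named wide:

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

by_rows = np.concatenate((arr1, arr2))          # axis=0 by default
print(by_rows.shape)                            # (4, 2)

by_cols = np.concatenate((arr1, arr2), axis=1)
print(by_cols.shape)                            # (2, 4)

# A (2, 2) and a (2, 3) array can be joined along axis=1, but not axis=0
wide = np.array([[9, 9, 9], [9, 9, 9]])
print(np.concatenate((arr1, wide), axis=1).shape)   # (2, 5)

axis0_failed = False
try:
    np.concatenate((arr1, wide), axis=0)
except ValueError as err:
    axis0_failed = True
    print("ValueError:", err)
```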


# Joining Arrays Using Stack Functions

Stacking is similar to concatenation; the only difference is that stacking is done along a new axis.

For example, stacking two 1-D arrays along a new second axis (axis=1) pairs up their elements, so each original array becomes a column of the resulting 2-D array.

We pass a sequence of arrays that we want to join to the stack() function along with the axis. If the axis is not explicitly passed, it is taken as 0.

In [50]:
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.stack((arr1, arr2), axis=1)

print(arr)

[[1 4]
 [2 5]
 [3 6]]
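The difference between concatenating and stacking shows up in the result's shape: concatenate() keeps the number of dimensions, while stack() adds one. A sketch using the same two 1-D arrays:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

print(np.concatenate((arr1, arr2)).shape)    # (6,): still 1-D
print(np.stack((arr1, arr2)).shape)          # (2, 3): new axis 0
print(np.stack((arr1, arr2), axis=1).shape)  # (3, 2): new axis 1
```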


# Stacking Along Rows

NumPy provides a helper function, hstack(), to stack arrays horizontally (along rows). For 1-D arrays this joins them end to end.

In [51]:
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.hstack((arr1, arr2))

print(arr)

[1 2 3 4 5 6]


# Stacking Along Columns

NumPy provides a helper function, vstack(), to stack arrays vertically (along columns), so each 1-D input becomes a row.

In [52]:
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.vstack((arr1, arr2))

print(arr)

[[1 2 3]
 [4 5 6]]


# Stacking Along Height (depth)

NumPy provides a helper function, dstack(), to stack along height, which is the same as depth (the third axis).

In [53]:
arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.dstack((arr1, arr2))

print(arr)

[[[1 4]
  [2 5]
  [3 6]]]
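For the same two 1-D inputs, the three helpers produce arrays with one, two and three dimensions respectively. Comparing shapes is a quick way to keep them apart:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

print(np.hstack((arr1, arr2)).shape)  # (6,)
print(np.vstack((arr1, arr2)).shape)  # (2, 3)
print(np.dstack((arr1, arr2)).shape)  # (1, 3, 2)
```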


# Splitting NumPy Arrays

Splitting is the reverse operation of joining.

Joining merges multiple arrays into one, and splitting breaks one array into multiple arrays.

We use array_split() for splitting arrays: we pass it the array we want to split and the number of splits.

In [54]:
# Split the array in 3 parts

arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 3)

print(newarr)

[array([1, 2]), array([3, 4]), array([5, 6])]


The return value is a list containing three arrays.

If the number of elements does not divide evenly into the requested number of parts, array_split() adjusts the sizes so that the sub-arrays at the end are one element shorter.

In [55]:
# Split the array into 4 parts

arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 4)

print(newarr)

[array([1, 2]), array([3, 4]), array([5]), array([6])]
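NumPy also has a split() function. It takes the same arguments but requires an equal division, so the 4-way split above makes it raise a ValueError, whereas array_split() simply gives the first parts one extra element. A sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# 6 elements into 4 parts: the first 6 % 4 = 2 parts get one extra element
parts = np.array_split(arr, 4)
print([len(p) for p in parts])   # [2, 2, 1, 1]

# split() works when the division is even...
print(len(np.split(arr, 3)))     # 3

# ...but refuses an uneven one
split_failed = False
try:
    np.split(arr, 4)
except ValueError as err:
    split_failed = True
    print("ValueError:", err)
```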


# Split Into Arrays

The return value of array_split() is a list containing each of the splits as an array.

If you split an array into 3 arrays, you can access them from the result just like any array element.

In [56]:
# Access the split arrays

arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 3)

print(newarr[0])
print(newarr[1])
print(newarr[2])

[1 2]
[3 4]
[5 6]


# Splitting 2-D Arrays

Use the same syntax when splitting 2-D arrays.

Use the array_split() method, pass in the array you want to split and the number of splits you want to do.

In [57]:
# Split the 2-D array into three 2-D arrays

arr = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10], [11, 12]])

newarr = np.array_split(arr, 3)

print(newarr)

[array([[1, 2],
       [3, 4]]), array([[5, 6],
       [7, 8]]), array([[ 9, 10],
       [11, 12]])]


The example above returns three 2-D arrays.

Let's look at another example, this time each element in the 2-D array contains 3 elements.

In [58]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])

newarr = np.array_split(arr, 3)

print(newarr)

[array([[1, 2, 3],
       [4, 5, 6]]), array([[ 7,  8,  9],
       [10, 11, 12]]), array([[13, 14, 15],
       [16, 17, 18]])]


The example below returns three 2-D arrays, but this time the split is along the columns (axis=1).

In [59]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])

newarr = np.array_split(arr, 3, axis=1)

print(newarr)

[array([[ 1],
       [ 4],
       [ 7],
       [10],
       [13],
       [16]]), array([[ 2],
       [ 5],
       [ 8],
       [11],
       [14],
       [17]]), array([[ 3],
       [ 6],
       [ 9],
       [12],
       [15],
       [18]])]
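NumPy also provides hsplit() and vsplit(), which split along axis 1 and axis 0 respectively. Like split(), they require an even division. A sketch that checks hsplit() against the array_split() call above:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],
                [10, 11, 12], [13, 14, 15], [16, 17, 18]])

by_cols = np.hsplit(arr, 3)   # here the same as np.array_split(arr, 3, axis=1)
by_rows = np.vsplit(arr, 3)   # here the same as np.array_split(arr, 3, axis=0)

print([p.shape for p in by_cols])  # [(6, 1), (6, 1), (6, 1)]
print([p.shape for p in by_rows])  # [(2, 3), (2, 3), (2, 3)]
```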


# Searching Arrays

You can search an array for a certain value, and return the indexes that get a match.

To search an array, use the where() method.

In [60]:
# Find the indexes where the value is 4

arr = np.array([1, 2, 3, 4, 5, 4, 4])

x = np.where(arr == 4)

print(x)

(array([3, 5, 6]),)


In [61]:
# Find the indexes where the values are even

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])

x = np.where(arr % 2 == 0)

print(x)

(array([1, 3, 5, 7]),)
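The result of where() is a tuple with one index array per dimension, so it can be used directly to pull out the matching values. For a 2-D array the tuple holds row indexes and column indexes. A sketch, where grid is a hypothetical example array:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
even_idx = np.where(arr % 2 == 0)
print(arr[even_idx])   # [2 4 6 8]

grid = np.array([[1, 4], [4, 2]])
rows, cols = np.where(grid == 4)
print(rows, cols)      # [0 1] [1 0]
```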


# Search Sorted

There is a function called searchsorted() which performs a binary search in the array and returns the index where the specified value would be inserted to maintain the sort order.

searchsorted() assumes that the array it searches is sorted in ascending order.

In [62]:
# Find the indexes where the value 7 should be inserted

arr = np.array([6, 7, 8, 9])

x = np.searchsorted(arr, 7)

print(x)

1


# Search From the Right Side

By default the leftmost index is returned, but we can pass side='right' to return the rightmost index instead.

In [63]:
# Find the indexes where the value 7 should be inserted, starting from the right.

arr = np.array([6, 7, 8, 9])

x = np.searchsorted(arr, 7, side='right')

print(x)

2


# Multiple Values

To search for more than one value, use an array with the specified values.

In [64]:
# Find the indexes where the values 2, 4, and 6 should be inserted.

arr = np.array([1, 3, 5, 7])

x = np.searchsorted(arr, [2, 4, 6])

print(x)

[1 2 3]
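searchsorted() is a binary search over a sorted array, the same idea as Python's standard bisect module. As a sketch, the side='left' and side='right' results can be checked against bisect_left() and bisect_right() on an array that contains a repeated value, where the two sides differ:

```python
import bisect
import numpy as np

arr = np.array([1, 3, 3, 5, 7])
values = [0, 3, 4, 8]

left = np.searchsorted(arr, values)                 # side='left' is the default
right = np.searchsorted(arr, values, side="right")
print(left)    # [0 1 3 5]
print(right)   # [0 3 3 5]

# The same positions from Python's bisect module
as_list = arr.tolist()
print([bisect.bisect_left(as_list, v) for v in values])   # [0, 1, 3, 5]
print([bisect.bisect_right(as_list, v) for v in values])  # [0, 3, 3, 5]
```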


# Sorting Arrays

Sorting means putting elements in an ordered sequence.

Ordered sequence is any sequence that has an order corresponding to elements, like numeric or alphabetical, ascending or descending.

NumPy has a function called sort() that will sort a specified array.

In [65]:
arr = np.array([3, 2, 0, 1])

print(np.sort(arr))

[0 1 2 3]


np.sort() returns a sorted copy of the array, leaving the original array unchanged.

You can sort arrays of strings, or any other data type.

In [66]:
# Sort the array alphabetically

arr = np.array(['banana', 'cherry', 'apple'])

print(np.sort(arr))

['apple' 'banana' 'cherry']


In [67]:
# Sort a boolean array

arr = np.array([True, False, True])

print(np.sort(arr))

[False  True  True]


# Sorting a 2-D Array

If you use np.sort() on a 2-D array, each row is sorted independently, because sorting happens along the last axis by default.

In [68]:
arr = np.array([[3, 2, 4], [5, 0, 1]])

print(np.sort(arr))

[[2 3 4]
 [0 1 5]]
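The axis parameter changes which axis np.sort() works along: axis=0 sorts each column, and axis=None flattens the array first. The ndarray sort() method sorts the array in place instead of returning a copy. A sketch:

```python
import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

by_column = np.sort(arr, axis=0)     # each column sorted
flattened = np.sort(arr, axis=None)  # flattened first, then sorted
print(by_column.tolist())   # [[3, 0, 1], [5, 2, 4]]
print(flattened.tolist())   # [0, 1, 2, 3, 4, 5]

arr.sort()                  # sorts arr itself along the last axis; returns None
print(arr.tolist())         # [[2, 3, 4], [0, 1, 5]]
```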


# Filtering Arrays

Getting some elements out of an existing array and creating a new array out of them is called filtering.

In NumPy, you can filter an array using a boolean index list.

A boolean index list is a list of booleans corresponding to indexes in the array.

If the value at an index is True that element is contained in the filtered array, if the value at that index is False that element is excluded from the filtered array.

In [69]:
# Create an array from the elements on index 0 and 2

arr = np.array([41, 42, 43, 44])

x = [True, False, True, False]

newarr = arr[x]

print(newarr)

[41 43]


# Creating the Filter Array

In the example above we hard-coded the True and False values, but the common use is to create a filter array based on conditions.

In [70]:
# Create a filter array that will return only values higher than 42

arr = np.array([41, 42, 43, 44])

filter_arr = []

for element in arr:
    if element > 42:
        filter_arr.append(True)
    else:
        filter_arr.append(False)

newarr = arr[filter_arr]

print(filter_arr)
print(newarr)

[False, False, True, True]
[43 44]


# Creating Filter Directly From Array

The above example is quite a common task in NumPy, and NumPy provides a nicer way to tackle it.

We can apply the condition to the array directly instead of to each element in a loop. The comparison is evaluated element by element and returns a boolean array that can be used as the filter.

In [71]:
# Create a filter array that will return only values higher than 42

arr = np.array([41, 42, 43, 44])

filter_arr = arr > 42

new_arr = arr[filter_arr]

print(filter_arr)
print(new_arr)

[False False  True  True]
[43 44]
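Conditions can be combined with the element-wise operators & (and), | (or) and ~ (not). Each comparison needs its own parentheses, because these operators bind more tightly than comparisons such as > or ==. A sketch on the same array:

```python
import numpy as np

arr = np.array([41, 42, 43, 44])

# Values greater than 41 that are also even
both = arr[(arr > 41) & (arr % 2 == 0)]
print(both)      # [42 44]

# Values equal to 41 or greater than 43
either = arr[(arr == 41) | (arr > 43)]
print(either)    # [41 44]

# Everything not greater than 42
not_big = arr[~(arr > 42)]
print(not_big)   # [41 42]
```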
