### NumPy: Efficient Numerical Computing in Python

NumPy, short for Numerical Python, is a powerful library for working with numeric data in Python. Its core feature is the NumPy array, an efficient data structure for storing homogeneous, multi-dimensional data. To use NumPy’s functions and methods, we first import the library and then call the array() function to define a NumPy array:

In [4]:
import numpy as np

# Creating a NumPy array using the array() function
numpy_array = np.array([[1, 2], [3, 4]])
print(numpy_array)

# Output:
# [[1 2]
#  [3 4]]

# Checking the type of the array
print(type(numpy_array))  # Output: <class 'numpy.ndarray'>

[[1 2]
 [3 4]]
<class 'numpy.ndarray'>


The array() function generates an object of type numpy.ndarray, which is the fundamental data structure in NumPy.

### Why Use NumPy Arrays?

NumPy arrays, like lists, tuples, and Pandas DataFrames, can store and manipulate data. However, NumPy is particularly preferred for its efficiency, especially when working with large datasets.

### Memory Efficiency
A NumPy array stores elements of a single data type in one contiguous block of memory, which makes it more memory-efficient than a list. A Python list, in contrast, stores references to separate Python objects, which can be of any type and can live anywhere in memory, and each of those objects carries its own overhead.

The following example compares the size reported by `__sizeof__()` for a tuple, a list, a NumPy array, and a Pandas DataFrame:




In [8]:
import pandas as pd

# Creating a tuple, list, NumPy array, and Pandas DataFrame with 1000 elements
tuple_ex = tuple(range(1000))
list_ex = list(range(1000))
numpy_ex = np.array(range(1000))
pandas_df = pd.DataFrame(range(1000))

# Checking memory usage
print("Memory usage:")
print("Tuple:", tuple_ex.__sizeof__(), "bytes")
print("List:", list_ex.__sizeof__(), "bytes")
print("Pandas DataFrame:", pandas_df.__sizeof__(), "bytes")
print("NumPy array:", numpy_ex.__sizeof__(), "bytes")

Memory usage:
Tuple: 8024 bytes
List: 8040 bytes
Pandas DataFrame: 8132 bytes
NumPy array: 8112 bytes


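At first glance the four sizes look almost identical. The reason is that `__sizeof__()` counts only the container itself: for a tuple or a list, that is the block of 8-byte references, not the integer objects those references point to. The NumPy figure, by contrast, already includes the actual data (1000 integers of 8 bytes each, plus a small header). Once the element objects are counted as well, a list or tuple takes several times more memory than the NumPy array.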
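
To see what the container sizes leave out, the sketch below adds the size of every integer object a list refers to and compares the total with the array's raw data buffer. It assumes 64-bit integers (hence the explicit dtype), and the list total is only an estimate, since CPython can share small integer objects between containers.

```python
import sys
import numpy as np

n = 1000
list_ex = list(range(n))
numpy_ex = np.arange(n, dtype=np.int64)  # explicit dtype so the size is platform-independent

# getsizeof on a list counts the list object and its block of references only
list_container = sys.getsizeof(list_ex)

# Add the size of each int object the references point to (an estimate)
list_total = list_container + sum(sys.getsizeof(x) for x in list_ex)

# nbytes is the size of the raw data buffer: number of elements * itemsize
array_data = numpy_ex.nbytes

print("List container only:      ", list_container, "bytes")
print("List plus its int objects:", list_total, "bytes")
print("NumPy data buffer:        ", array_data, "bytes")
```

The exact list figures vary with the Python version, so they are not shown here; the array buffer is always n times the item size.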

### Data Type Consistency
NumPy’s memory efficiency is maximized only when the array contains homogeneous data types. Mixing data types increases memory usage, as shown below:



In [6]:
# Homogeneous NumPy array (all integers)
numpy_homogeneous = np.array([[1, 2], [3, 3]])
print("Size of homogeneous NumPy array:", numpy_homogeneous.__sizeof__(), "bytes")

# Introducing a string into the array (heterogeneous data)
numpy_heterogeneous = np.array([[1, '2'], [3, 3]])
print("Size of heterogeneous NumPy array:", numpy_heterogeneous.__sizeof__(), "bytes")

Size of homogeneous NumPy array: 160 bytes
Size of heterogeneous NumPy array: 464 bytes


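The second array consumes more than twice the memory of the first. A NumPy array cannot actually hold mixed types: when a string is present, NumPy converts every element to a common fixed-width string type, and each element then needs far more space than an 8-byte integer. In the run above, the extra 304 bytes correspond to 84 bytes per element instead of 8.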
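
The conversion can be observed directly by inspecting the dtype and itemsize attributes of both arrays. This is a minimal sketch; the exact string width NumPy picks depends on the platform's default integer type, so no fixed output is shown.

```python
import numpy as np

homogeneous = np.array([[1, 2], [3, 3]])
mixed = np.array([[1, '2'], [3, 3]])

# dtype is the single element type; itemsize is the number of bytes per element
print(homogeneous.dtype, homogeneous.itemsize)
print(mixed.dtype, mixed.itemsize)

# Every integer was converted to a string to fit the common dtype
print(mixed)
```

Note that arithmetic such as multiplying the mixed array by 2 no longer works as numeric multiplication, because its elements are now strings.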

### Lists vs. NumPy Arrays
In contrast, the size of a Python list itself does not depend on whether its elements are homogeneous or heterogeneous, because the list stores only a reference to each element:

In [9]:
# Homogeneous list
list_homogeneous = [1, 2, 3, 4]
print("Size of homogeneous list:", list_homogeneous.__sizeof__(), "bytes")

# Heterogeneous list (mixed data types)
list_heterogeneous = [1, '2', 3, 4]
print("Size of heterogeneous list:", list_heterogeneous.__sizeof__(), "bytes")

Size of homogeneous list: 72 bytes
Size of heterogeneous list: 72 bytes


### Key points about NumPy

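* NumPy arrays store numeric data more compactly than lists or tuples, and with slightly less overhead than a Pandas DataFrame.
* This efficiency depends on homogeneous data. Mixing data types forces NumPy to convert every element to a common, often larger, type, which negates the advantage.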
* For small datasets, memory differences may be negligible. However, for large datasets, NumPy’s efficiency becomes significant.
* Unlike NumPy arrays, lists do not benefit from data homogeneity.
* For optimal performance, NumPy arrays should be used primarily for large-scale numerical computations involving homogeneous data.

### NumPy Arrays Are Fast!!
This is the most important point about NumPy arrays and the focus of this lecture. You'll want to take advantage of this. NumPy arrays enable significantly faster mathematical computations than other data structures, for several key reasons:

* Since NumPy arrays store homogeneous data in a contiguous memory block, data retrieval is quicker, which enhances computational speed.
* NumPy supports vectorized operations, eliminating the need for slow Python for-loops. A single array expression is executed over the whole array in compiled code rather than being interpreted one element at a time.
* NumPy's core is implemented in C, so its inner loops run at the speed of a compiled low-level language, much faster than pure Python code. We won't worry about the Big-O, but just know that it is fast!

The following example demonstrates the performance advantage of NumPy for numerical computations.

Example: Multiplying all whole numbers up to 1 million by 2 and comparing the time taken using different data structures.

In [18]:
import numpy as np
import pandas as pd
import time as tm

# Multiplication using a list
start_time = tm.time()
list_ex = list(range(1000000))
a = [x * 2 for x in list_ex]
print("Time taken to multiply numbers in a list =", tm.time() - start_time)

# Multiplication using a tuple
start_time = tm.time()
tuple_ex = tuple(range(1000000))
a = tuple(x * 2 for x in tuple_ex)
print("Time taken to multiply numbers in a tuple =", tm.time() - start_time)

# Multiplication using a Pandas DataFrame
start_time = tm.time()
df_ex = pd.DataFrame(range(1000000))
a = df_ex * 2
print("Time taken to multiply numbers in a Pandas DataFrame =", tm.time() - start_time)

# Multiplication using a NumPy array
start_time = tm.time()
numpy_ex = np.arange(1000000)
a = numpy_ex * 2
print("Time taken to multiply numbers in a NumPy array =", tm.time() - start_time)

Time taken to multiply numbers in a list = 0.038156986236572266
Time taken to multiply numbers in a tuple = 0.03747200965881348
Time taken to multiply numbers in a Pandas DataFrame = 0.0072021484375
Time taken to multiply numbers in a NumPy array = 0.0016620159149169922
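
The timings above include the cost of building each data structure. As a rough alternative, the sketch below uses the standard-library timeit module to time only the multiplication, building the data once beforehand. The variable names are illustrative, and the measured times depend on the machine, so none are shown.

```python
import timeit
import numpy as np

n = 1_000_000
list_ex = list(range(n))
numpy_ex = np.arange(n)

# Time only the multiplication; the data is built once, outside the timed code.
# Taking the minimum of several runs reduces noise from other processes.
list_time = min(timeit.repeat(lambda: [x * 2 for x in list_ex], number=1, repeat=5))
numpy_time = min(timeit.repeat(lambda: numpy_ex * 2, number=1, repeat=5))

print(f"List comprehension: {list_time:.5f} s")
print(f"NumPy array:        {numpy_time:.5f} s")

# Both approaches compute the same values
assert [x * 2 for x in list_ex[:5]] == (numpy_ex * 2)[:5].tolist()
```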


### Working with the NumPy Array

Let’s define a NumPy array:



In [21]:
numpy_ex = np.array([[1, 2, 3], [4, 5, 6]])
numpy_ex
#Output:
#
#array([[1, 2, 3],
#       [4, 5, 6]])


array([[1, 2, 3],
       [4, 5, 6]])

The attributes of numpy_ex can be viewed by typing numpy_ex, followed by a period (.), and pressing the tab key.

Here are some important attributes of a NumPy array:

* **ndim**
Indicates the number of dimensions (or axes) of the array.

* **shape**
A tuple that represents the size of the array along each dimension. For a matrix with n rows and m columns, the shape is (n, m). The number of dimensions, or rank, is the length of this tuple.

* **size**
The total number of elements in the array, which is the product of the elements in the shape tuple.

* **dtype**
Describes the type of the elements in the array. You can create or specify the dtype using standard Python types, or use the types provided by NumPy, such as bool_, int_, float32, complex64, etc.




In [30]:

print("ndim:", numpy_ex.ndim)
print("shape:", numpy_ex.shape)
print("size", numpy_ex.size)
print("dtype", numpy_ex.dtype)




ndim: 2
shape: (2, 3)
size 6
dtype int64
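
The relationships between these attributes, and the ways of choosing a dtype, can be checked with a short sketch. The array names as_float32, as_python_float, and as_bool are illustrative only.

```python
import math
import numpy as np

numpy_ex = np.array([[1, 2, 3], [4, 5, 6]])

# ndim is the length of the shape tuple; size is the product of its entries
assert numpy_ex.ndim == len(numpy_ex.shape)
assert numpy_ex.size == math.prod(numpy_ex.shape)

# dtype can be set at creation with a NumPy type or a standard Python type
as_float32 = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
as_python_float = np.array([1, 2, 3], dtype=float)  # Python float maps to float64

# astype returns a converted copy; the original array is unchanged
as_bool = numpy_ex.astype(bool)

print(as_float32.dtype, as_python_float.dtype, as_bool.dtype)
# float32 float64 bool
```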


* **T**
This attribute returns the transpose of the NumPy array, which is often used to make 2D arrays compatible for matrix multiplication.

For matrix multiplication to occur, the number of columns in the first matrix must equal the number of rows in the second. For example:



In [37]:
matrix_to_multiply = np.array([[1, 2, 1], [0, 1, 0]])
print(matrix_to_multiply.shape)
print(numpy_ex.shape)
#Output:
#matrix_to_multiply.shape -> (2, 3)
#numpy_ex.shape -> (2, 3)


(2, 3)
(2, 3)


The shapes of the two matrices are not compatible for multiplication because the number of columns in the first matrix (3) does not match the number of rows in the second matrix (2). However, transposing one of the matrices makes their shapes align. To perform matrix multiplication, use the **.dot** method:



In [45]:
matrix_to_multiply_transpose = matrix_to_multiply.T

print(matrix_to_multiply_transpose.shape)  #Output (3, 2)

#Now, these matrices are compatible for multiplication, 
#and the result will depend on the multiplication order:

# Matrix multiplication with the transposed matrix first
print (matrix_to_multiply_transpose.dot(numpy_ex))
#Output:
#
#array([[1, 2, 3],
#       [6, 9, 12],
#       [1, 2, 3]])

# Matrix multiplication with numpy_ex first
print(numpy_ex.dot(matrix_to_multiply_transpose))
#Output:
#
#array([[8, 2],
#       [20, 5]])
#
# The resulting matrix shape is (rows of the first matrix, columns of the second matrix).
# The order in which the matrices are multiplied depends on the problem requirements.



(3, 2)
[[ 1  2  3]
 [ 6  9 12]
 [ 1  2  3]]
[[ 8  2]
 [20  5]]
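
For 2D arrays, the @ operator and np.matmul are equivalent alternatives to .dot. The sketch below repeats one of the products above with each of them and also shows the error raised when the transpose is left out.

```python
import numpy as np

numpy_ex = np.array([[1, 2, 3], [4, 5, 6]])
matrix_to_multiply = np.array([[1, 2, 1], [0, 1, 0]])

# For 2D arrays, .dot, the @ operator, and np.matmul give the same result
product_dot = numpy_ex.dot(matrix_to_multiply.T)
product_at = numpy_ex @ matrix_to_multiply.T
product_matmul = np.matmul(numpy_ex, matrix_to_multiply.T)
print(product_at)

# Without the transpose, the (2, 3) and (2, 3) shapes do not line up
try:
    numpy_ex @ matrix_to_multiply
except ValueError as err:
    print("ValueError:", err)
```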


###  Arithmetic Operations

NumPy arrays support arithmetic operators like +, -, *, and so on. You can perform arithmetic operations between an array and a scalar or between two arrays with the same shape, in which case the operation is applied element by element. Arrays with different shapes can be combined only when their shapes are compatible under the broadcasting rules described in the Broadcasting section.

Here are some examples of arithmetic operations on arrays:



In [47]:
# Defining two arrays of the same shape
arr1 = np.array([[1, 2, 3, 4], 
                 [5, 6, 7, 8], 
                 [9, 1, 2, 3]])
arr2 = np.array([[11, 12, 13, 14], 
                 [15, 16, 17, 18], 
                 [19, 11, 12, 13]])

# Element-wise summation of arrays
arr3= arr1 + arr2
print(arr3)

#Output:
#
#array([[12, 14, 16, 18],
#       [20, 22, 24, 26],
#       [28, 12, 14, 16]])


# Element-wise subtraction
print(arr2 - arr1)

#Output:
#
#array([[10, 10, 10, 10],
#       [10, 10, 10, 10],
#       [10, 10, 10, 10]])


# Adding a scalar to an array adds the scalar to each element of the array
print(arr1 + 3)

#Output:
#
#array([[ 4,  5,  6,  7],
#       [ 8,  9, 10, 11],
#       [12,  4,  5,  6]])

# Dividing an array by a scalar divides all elements by the scalar
print(arr1 / 2)
#Output:
#
#array([[0.5, 1. , 1.5, 2. ],
#       [2.5, 3. , 3.5, 4. ],
#       [4.5, 0.5, 1. , 1.5]])

# Element-wise multiplication
print(arr1 * arr2)

#Output:
#array([[ 11,  24,  39,  56],
#       [ 75,  96, 119, 144],
#       [171,  11,  24,  39]])

# Modulus operator with scalar
print (arr1 % 4)
#Output:
#
#array([[1, 2, 3, 0],
#       [1, 2, 3, 0],
#       [1, 1, 2, 3]])

[[12 14 16 18]
 [20 22 24 26]
 [28 12 14 16]]
[[10 10 10 10]
 [10 10 10 10]
 [10 10 10 10]]
[[ 4  5  6  7]
 [ 8  9 10 11]
 [12  4  5  6]]
[[0.5 1.  1.5 2. ]
 [2.5 3.  3.5 4. ]
 [4.5 0.5 1.  1.5]]
[[ 11  24  39  56]
 [ 75  96 119 144]
 [171  11  24  39]]
[[1 2 3 0]
 [1 2 3 0]
 [1 1 2 3]]


### Broadcasting
Broadcasting enables arithmetic operations between arrays with different dimensions, provided their shapes are compatible.

NumPy’s documentation describes broadcasting as follows:

Broadcasting refers to how NumPy handles arrays of different shapes during arithmetic operations. Given certain conditions, the smaller array is expanded across the larger one to match its shape. This technique allows vectorized operations, ensuring computations are performed in C rather than Python, improving efficiency while avoiding unnecessary data duplication.

The following example demonstrates broadcasting:

In [51]:
arr1 = np.array([[1, 2, 3, 4], 
                 [5, 6, 7, 8], 
                 [9, 1, 2, 3]])
arr2 = np.array([4, 5, 6, 7])

arr1 + arr2

array([[ 5,  7,  9, 11],
       [ 9, 11, 13, 15],
       [13,  6,  8, 10]])

Here, arr2 has shape (4,), while arr1 has shape (3,4). NumPy expands arr2 across the rows of arr1, matching its shape without physically replicating data. This enhances performance while minimizing memory usage.

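Broadcasting works here because both arrays have the same rightmost dimension (4). Since arr2 has fewer dimensions than arr1, NumPy treats its shape as (1, 4) by adding a leading dimension of size 1, and a dimension of size 1 can be stretched to match any size, in this case the 3 rows of arr1.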
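
NumPy exposes this expansion through np.broadcast_to, which makes it possible to inspect the expanded array. The sketch below suggests that the expanded rows are not copies: the stride along the first axis is 0, so every row reads the same memory.

```python
import numpy as np

arr1 = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 1, 2, 3]])
arr2 = np.array([4, 5, 6, 7])

# broadcast_to returns a read-only view of arr2 with the shape of arr1
expanded = np.broadcast_to(arr2, arr1.shape)
print(expanded.shape)       # (3, 4)

# A stride of 0 along the first axis means every row reuses the same memory
print(expanded.strides[0])  # 0

# Adding the explicit view gives the same result as the implicit broadcast
assert np.array_equal(arr1 + expanded, arr1 + arr2)
```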

NumPy follows specific rules to determine if broadcasting is possible:

“When performing operations on two arrays, NumPy compares their shapes element-wise, starting from the rightmost dimension and moving left. Two dimensions are considered compatible if:

* They are equal, or
* One of them is 1”

If arr2 had shape (3,) instead of (4,), broadcasting would fail, because the rightmost dimensions of arr1 and arr2 (4 and 3) are neither equal nor 1.

In [54]:
# Defining arr2 with shape (3,)
arr2 = np.array([4, 5, 6])

# Broadcasting fails due to shape mismatch
arr1 + arr2

ValueError: operands could not be broadcast together with shapes (3,4) (3,) 
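
If the intent is to add one value to each row of arr1 (an assumption about what the three values mean), the array can be reshaped to (3, 1) so that the size-1 dimension is stretched across the columns. A minimal sketch:

```python
import numpy as np

arr1 = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 1, 2, 3]])
arr2 = np.array([4, 5, 6])

# Shape (3,) is matched against the last axis of (3, 4), and 3 != 4, so it fails
try:
    arr1 + arr2
except ValueError as err:
    print("ValueError:", err)

# Reshaped to (3, 1), the size-1 axis is stretched across the 4 columns
column = arr2[:, np.newaxis]
result = arr1 + column
print(column.shape)  # (3, 1)
print(result)
```

Each row of arr1 now has its own value added: 4 to the first row, 5 to the second, and 6 to the third.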

### Comparison
NumPy arrays support comparison operations such as ==, !=, and >, returning boolean arrays as results.

In [59]:
arr1 = np.array([[1, 2, 3], [3, 4, 5]])
arr2 = np.array([[2, 2, 3], [1, 2, 5]])

print(arr1 == arr2)


print(arr1 != arr2)

print(arr1>arr2)




[[False  True  True]
 [False False  True]]
[[ True False False]
 [ True  True False]]
[[False False False]
 [ True  True False]]


In [61]:
#Array comparisons are often used to count matching elements 
#between two arrays. Since True evaluates to 1 and False to 0 
#in arithmetic operations, we can use the .sum() method:

print((arr1 == arr2).sum())

3
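
A few related operations build on the same boolean arrays. The sketch below counts matches with np.count_nonzero, uses a comparison result to select elements, and reduces a whole-array comparison to a single value with np.array_equal.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [3, 4, 5]])
arr2 = np.array([[2, 2, 3], [1, 2, 5]])

matches = arr1 == arr2

# count_nonzero counts the True entries, the same value as matches.sum()
print(np.count_nonzero(matches))   # 3

# A boolean array can also index the original array (boolean masking)
print(arr1[arr1 > arr2])           # [3 4]

# array_equal collapses the element-wise comparison into a single True/False
print(np.array_equal(arr1, arr2))  # False
```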
