# CHAPTER 5 NUMPY FOR DATA ANALYSIS 

## 5.1 INTRODUCTION TO NUMPY AND ITS DATA STRUCTURES 

NumPy is a powerful library in Python for numerical computations and data manipulation. It provides a wide range of tools for working with arrays and matrices of numerical data, making it an essential tool for data analysis. In this section, we will explore the basics of NumPy, including its array and matrix data structures.

NumPy provides a powerful data structure called a ndarray (n-dimensional array), which is used to store and manipulate large arrays of homogeneous data (data of the same type, such as integers or floating point values). NumPy arrays are more efficient and convenient to work with compared to Python's built-in list or tuple data structures.

### Creating NumPy Arrays

There are several ways to create NumPy arrays, such as by using the array() function, by using a list or tuple, or by using built-in NumPy functions like zeros(), ones(), and arange(). For example, the following code creates a 1-dimensional array of integers from 0 to 9:


In [1]:
import numpy as np
array_1d = np.arange(10)
print(array_1d)

[0 1 2 3 4 5 6 7 8 9]


Here's a Python program that demonstrates the use of different numpy array placeholders:

In [6]:
import numpy as np

# Zeros
zeros_arr = np.zeros(5)
print("Zeros Array:")
print(zeros_arr)

# Ones
ones_arr = np.ones((2,3))
print("Ones Array:")
print(ones_arr)

# Arange
arange_arr = np.arange(5)
print("Arange Array:")
print(arange_arr)

# Linspace
linspace_arr = np.linspace(0, 1, 5)
print("Linspace Array:")
print(linspace_arr)

# Full
full_arr = np.full((2,2), 7)
print("Full Array:")
print(full_arr)

# Eye
eye_arr = np.eye(3)
print("Eye Array:")
print(eye_arr)

# Random
random_arr = np.random.random((2,3))
print("Random Array:")
print(random_arr)

# Empty
empty_arr = np.empty((2,2))
print("Empty Array:")
print(empty_arr)


Zeros Array:
[0. 0. 0. 0. 0.]
Ones Array:
[[1. 1. 1.]
 [1. 1. 1.]]
Arange Array:
[0 1 2 3 4]
Linspace Array:
[0.   0.25 0.5  0.75 1.  ]
Full Array:
[[7 7]
 [7 7]]
Eye Array:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
Random Array:
[[0.42247426 0.30512348 0.16945615]
 [0.89307503 0.59665746 0.03105797]]
Empty Array:
[[0.25 0.5 ]
 [0.75 1.  ]]


### Creating Multidimensional Arrays

NumPy also allows us to create multidimensional arrays, such as 2-dimensional (matrices) and 3-dimensional arrays. We can create multidimensional arrays by passing a list of lists or a tuple of tuples to the array() function. 

here's an example program that generates arrays of various dimensions using NumPy:

In [5]:
import numpy as np

# 1D array
a1 = np.array([3, 2])

# 2D array
a2 = np.array([[1,0, 1], [3, 4, 1]])

# 3D array
a3 = np.array([[[1, 7, 9], [5, 9, 3]], [[7, 9, 9]]])

# 4D array
a4 = np.array([[[[1, 2], [3, 4]], [[5, 6], [7, 8]]], [[[9, 10], [11, 12]], [[13, 14], [15, 16]]]])

# 5D array
a5 = np.array([[[[[1, 2], [3, 4]], [[5, 6], [7, 8]]], [[[9, 10], [11, 12]], [[13, 14], [15, 16]]]],
               [[[[17, 18], [19, 20]], [[21, 22], [23, 24]]], [[[25, 26], [27, 28]], [[29, 30], [31, 32]]]]])

# Print the arrays
print("1D Array:")
print(a1)
print("2D Array:")
print(a2)
print("3D Array:")
print(a3)
print("4D Array:")
print(a4)
print("5D Array:")
print(a5)


1D Array:
[3 2]
2D Array:
[[1 0 1]
 [3 4 1]]
3D Array:
[list([[1, 7, 9], [5, 9, 3]]) list([[7, 9, 9]])]
4D Array:
[[[[ 1  2]
   [ 3  4]]

  [[ 5  6]
   [ 7  8]]]


 [[[ 9 10]
   [11 12]]

  [[13 14]
   [15 16]]]]
5D Array:
[[[[[ 1  2]
    [ 3  4]]

   [[ 5  6]
    [ 7  8]]]


  [[[ 9 10]
    [11 12]]

   [[13 14]
    [15 16]]]]



 [[[[17 18]
    [19 20]]

   [[21 22]
    [23 24]]]


  [[[25 26]
    [27 28]]

   [[29 30]
    [31 32]]]]]


  a3 = np.array([[[1, 7, 9], [5, 9, 3]], [[7, 9, 9]]])


### Data Types in NumPy Arrays

NumPy arrays have a fixed data type, and all elements of an array must be of the same type. The data type of an array can be accessed using the dtype attribute. For example, the following code creates an array of floating-point numbers and prints its data type:


In [3]:
import numpy as np
array_float = np.array([1.0, 2.0, 3.0])
print(array_float.dtype)

float64


In [7]:
import numpy as np 
a = np.array([1, 2, 3, 4, 5], dtype=int) 

In [8]:
print(a.dtype) 

int32


In [9]:
import numpy as np 
a = np.array([1, 2, 3, 4, 5], dtype=int) 
b = a.astype(float) 

## 5.2 MANIPULATING NUMPY ARRAYS 

In the previous section, we introduced NumPy and its data structures, including the ndarray. In this section, we will explore the various ways to manipulate NumPy arrays, including indexing, slicing, and reshaping.

### Indexing and Slicing

Indexing and slicing are powerful features in NumPy that allow you to access and manipulate specific elements or subarrays of an array. Indexing in NumPy is similar to indexing in Python lists, where you can access an element of an array using its index. For example, the following code accesses the first element of an array:


In [10]:
import numpy as np
array = np.array([1, 2, 3, 4, 5])
print(array[0])  # Output: 1

1


In [12]:
import numpy as np
array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(array_2d)
print(array_2d[1, 2])  # Output: 6

[[1 2 3]
 [4 5 6]
 [7 8 9]]
6


#### Slicing

Slicing, on the other hand, allows you to access a subarray of an array. The syntax for slicing is similar to indexing, but with a colon (:) to indicate the start and end of the slice. For example, the following code creates a slice of the first three elements of an array:

In [13]:
import numpy as np
array = np.array([1, 2, 3, 4, 5])
slice = array[0:3]
print(slice)  # Output: [1 2 3]

[1 2 3]


### Reshaping

Reshaping is the process of changing the shape or layout of an array without changing the data. NumPy provides several functions for reshaping arrays, including reshape(), ravel(), and flatten().

#### reshape()

The reshape() function allows you to change the shape of an array by specifying the number of rows and columns of the new shape. For example, the following code reshapes a 1D array with 4 elements into a 2D array with 2 rows and 2 columns:


In [19]:
import numpy as np
array = np.array([1, 2, 3, 4])
array = array.reshape(2, 2)
print(array) 

[[1 2]
 [3 4]]


#### ravel()

The ravel() function returns a 1D array with all the elements of the input array. It is equivalent to reshaping the array with -1 as one of the dimensions, which tells NumPy to infer the correct size based on the other dimensions.

In [23]:
import numpy as np
array = np.array([[1, 2, 3], [4, 5, 6]])
raveled = array.ravel()
print(raveled)  # Output: [1 2 3 4 5 6]

[1 2 3 4 5 6]


#### flatten()

The flatten() function also returns a 1D array with all the elements of the input array, but it creates a new copy of the data, rather than returning a view of the original array.


In [24]:
import numpy as np
array = np.array([[1, 2, 3], [4, 5, 6]])
flattened = array.flatten()
print(flattened)  #

[1 2 3 4 5 6]


#### Transposing

The transpose() function returns a new array with the axes transposed. For example, the following code transposes a 2D array:


In [26]:
import numpy as np
array = np.array([[1, 2, 3], [4, 5, 6]])
print("Before Transpose")
print(array) 
transposed = array.transpose()
print("\n After Transpose")
print(transposed) 

Before Transpose
[[1 2 3]
 [4 5 6]]

 After Transpose
[[1 4]
 [2 5]
 [3 6]]


#### Swapping Axes:

The swapaxes() function allows you to swap two axes of an array. For example, the following code swaps the first and second axes of a 3D array:


In [28]:
import numpy as np
array = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print("Before Swapping axes")
print(array) 

swapped = array.swapaxes(0, 1)
print("\n After Swapping axes")
print(swapped) 


Before Swapping axes
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]

 After Swapping axes
[[[1 2]
  [5 6]]

 [[3 4]
  [7 8]]]


In [29]:
import numpy as np
arr = np.array([[1, 2], [3, 4], [5, 6]])
print("Before Swapping axes")
print(arr) 

arr_swapped = arr.swapaxes(0, 1)
print("\n After Swapping axes")
print(arr_swapped)

Before Swapping axes
[[1 2]
 [3 4]
 [5 6]]

 After Swapping axes
[[1 3 5]
 [2 4 6]]


## 5.3 BROADCASTING 

Broadcasting is important concepts in NumPy that allow for efficient and flexible manipulation of arrays. Broadcasting refers to the ability of NumPy to perform mathematical operations on arrays of different shapes, and advanced array manipulation refers to techniques for manipulating arrays in sophisticated ways.

Broadcasting is a powerful feature of NumPy that allows for mathematical operations to be performed on arrays of different shapes. It works by "stretching" or "copying" the smaller array to match the shape of the larger array, before performing the operation.

For example, the following code adds a scalar value to each element of an array:


In [30]:
import numpy as np
a = np.array([1, 2, 3])
b = 2
c = a + b
print(c)  # Output: [3 4 5]5.3 BROADCASTING 
Broadcasting is important concepts in NumPy that allow for efficient and flexible manipulation of arrays. Broadcasting refers to the ability of NumPy to perform mathematical operations on arrays of different shapes, and advanced array manipulation refers to techniques for manipulating arrays in sophisticated ways.
Broadcasting is a powerful feature of NumPy that allows for mathematical operations to be performed on arrays of different shapes. It works by "stretching" or "copying" the smaller array to match the shape of the larger array, before performing the operation.
For example, the following code adds a scalar value to each element of an array:


[3 4 5]


In [31]:
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([1, 2, 3])
c = a * b
print(c)

[[ 1  4  9]
 [ 4 10 18]]


## 5.4 MATHEMATICAL OPERATIONS AND LINEAR ALGEBRA WITH NUMPY 

NumPy is a powerful library in Python for numerical computations and data manipulation. One of its key features is its support for mathematical operations and linear algebra. In this section, we will explore the various mathematical operations and linear algebra functions that are available in NumPy, and how they can be used in data analysis tasks.

### Arithmetic operations:

NumPy provides a wide range of functions for performing arithmetic operations on arrays, such as addition, subtraction, multiplication, and division. These operations can be performed element-wise on arrays of the same shape, and will return an array with the same shape as the input arrays.

For example, the following code performs element-wise operation of two arrays:


In [32]:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
sum1 = a + b
print(sum1)  # Output: [5 7 9]

cross = a * b
print(cross)  # Output: [4 10 18]

subtract1 = a - b
print(subtract1)  # Output: [-3 -3 -3]

div1 = a / b
print(div1)  # Output: [0.25 0.4  0.5]


[5 7 9]
[ 4 10 18]
[-3 -3 -3]
[0.25 0.4  0.5 ]


### Linear Algebra

Linear algebra is an essential part of data science and machine learning, and Numpy offers powerful tools for working with linear algebra. In this section, we will discuss the basics of linear algebra in Numpy and provide a coding example to illustrate the concepts.

Numpy provides a set of linear algebra functions that can be used to perform various operations such as matrix multiplication, inversion, decomposition, and many more. These functions are part of the linalg module in Numpy and are designed to work efficiently with large arrays and matrices.

To perform linear algebra operations in Numpy, we first need to create an array or matrix. We can create an array in Numpy using the array() function or create a matrix using the mat() function. Once we have an array or matrix, we can use the various functions provided by the linalg module to perform the desired operation.


In [33]:
import numpy as np

# Define matrices
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])

# Matrix multiplication
C = np.dot(A, B)
print("Matrix Multiplication:")
print(C)

# Matrix inversion
D = np.linalg.inv(A)
print("\nMatrix Inversion:")
print(D)

# Eigen decomposition
E, F = np.linalg.eig(A)
print("\nEigen Decomposition:")
print("Eigenvalues:\n", E)
print("Eigenvectors:\n", F)

# Singular Value Decomposition (SVD)
G, H, I = np.linalg.svd(A)
print("\nSingular Value Decomposition:")
print("Left Singular Vectors:\n", G)
print("Singular Values:\n", H)
print("Right Singular Vectors:\n", I)

# Matrix trace
J = np.trace(A)
print("\nMatrix Trace:")
print(J)

# Determinant of a matrix
K = np.linalg.det(A)
print("\nDeterminant of a Matrix:")
print(K)

# Solving linear equations
x = np.array([[1], [2], [3]])
y = np.linalg.solve(A, x)
print("\nSolving Linear Equations:")
print(y)


Matrix Multiplication:
[[ 30  24  18]
 [ 84  69  54]
 [138 114  90]]

Matrix Inversion:
[[ 3.15251974e+15 -6.30503948e+15  3.15251974e+15]
 [-6.30503948e+15  1.26100790e+16 -6.30503948e+15]
 [ 3.15251974e+15 -6.30503948e+15  3.15251974e+15]]

Eigen Decomposition:
Eigenvalues:
 [ 1.61168440e+01 -1.11684397e+00 -3.38433605e-16]
Eigenvectors:
 [[-0.23197069 -0.78583024  0.40824829]
 [-0.52532209 -0.08675134 -0.81649658]
 [-0.8186735   0.61232756  0.40824829]]

Singular Value Decomposition:
Left Singular Vectors:
 [[-0.21483724  0.88723069  0.40824829]
 [-0.52058739  0.24964395 -0.81649658]
 [-0.82633754 -0.38794278  0.40824829]]
Singular Values:
 [1.68481034e+01 1.06836951e+00 3.33475287e-16]
Right Singular Vectors:
 [[-0.47967118 -0.57236779 -0.66506441]
 [-0.77669099 -0.07568647  0.62531805]
 [-0.40824829  0.81649658 -0.40824829]]

Matrix Trace:
15

Determinant of a Matrix:
-9.51619735392994e-16

Solving Linear Equations:
[[-0.23333333]
 [ 0.46666667]
 [ 0.1       ]]


#### Eigenvalue and eigenvector

The eigenvalue and eigenvector of a matrix are important concepts in linear algebra. Eigenvalues are scalars that represent the amount by which a matrix stretches or shrinks a vector, and eigenvectors are vectors that are stretched or shrunk by a matrix.

NumPy provides the linalg.eig() function, which can be used to compute the eigenvalues and eigenvectors of a matrix.


In [35]:
import numpy as np
a = np.array([[1, 2], [3, 4]])
eigenvalues, eigenvectors = np.linalg.eig(a)
print("Eigenvalues: ", eigenvalues) # Output: Eigenvalues: [-0.37228132 5.37228132]
print("Eigenvectors: ", eigenvectors) # Output: Eigenvectors: [[-0.82456484 -0.41597356]
# [ 0.56576746 -0.90937671]]

Eigenvalues:  [-0.37228132  5.37228132]
Eigenvectors:  [[-0.82456484 -0.41597356]
 [ 0.56576746 -0.90937671]]


#### Matrix Decomposition:

Matrix decomposition is a technique for breaking a matrix down into simpler matrices. NumPy provides functions for several types of matrix decomposition, including the linalg.svd() function for singular value decomposition, and the linalg.qr() function for QR decomposition.

For example, the following code performs singular value decomposition on a matrix:


In [36]:
import numpy as np
a = np.array([[1, 2], [3, 4], [5, 6]])
U, s, V = np.linalg.svd(a)
print("U: ", U)
print("S: ", s)
print("V: ", V)

U:  [[-0.2298477   0.88346102  0.40824829]
 [-0.52474482  0.24078249 -0.81649658]
 [-0.81964194 -0.40189603  0.40824829]]
S:  [9.52551809 0.51430058]
V:  [[-0.61962948 -0.78489445]
 [-0.78489445  0.61962948]]


## 5.5 RANDOM SAMPLING & PROBABILITY DISTRIBUTIONS 

Random sampling and probability distributions are important concepts in data analysis that allow for the generation of random numbers and the modeling of random processes. NumPy provides several functions for generating random numbers and working with probability distributions, which are useful for tasks such as statistical modeling, simulation, and machine learning.

### Random Sampling

NumPy provides several functions for generating random numbers, including the random.rand() function for generating random floats in a given shape, and the random.randint() function for generating random integers in a given shape and range.

For example, the following code generates an array of 5 random floats between 0 and 1:


In [37]:
import numpy as np
np.random.rand(5)

array([0.29458009, 0.25169281, 0.69680856, 0.21127407, 0.7576471 ])

In [38]:
np.random.randint(0, 10, 5)

array([0, 8, 9, 2, 9])

## 5.6 USE OF NUMPY IN DATA ANALYIS 

Here are some real-life use cases of NumPy in data analysis.


In [48]:
import numpy as np

np.random.seed(42)

# Generate random age data for 500 people between 18 and 65
age = np.random.randint(18, 66, size=500)

# Generate random height data for 500 people between 4'6" and 7'0"
height_inches = np.random.randint(54, 84, size=500)
height_feet = height_inches / 12.0

# Generate random weight data for 500 people between 100 and 300 lbs
weight = np.random.randint(100, 301, size=500)

# Calculate BMI using the formula: weight (kg) / height (m)^2
height_meters = height_inches * 0.0254
weight_kg = weight * 0.453592
bmi = weight_kg / (height_meters ** 2)

# Print the mean, median, and standard deviation of age and BMI
print("Age:\n mean={:.2f}, \n median={:.2f},\n std={:.2f}".format(np.mean(age), np.median(age), np.std(age)))
print("BMI: \n mean={:.2f}, \n median={:.2f}, \n std={:.2f}".format(np.mean(bmi), np.median(bmi), np.std(bmi)))


Age:
 mean=41.98, 
 median=43.00,
 std=13.79
BMI: 
 mean=30.71, 
 median=28.50, 
 std=11.97
