# Detailed Introduction to NumPy for Data Science

NumPy (Numerical Python) is the foundational package for numerical computing in Python. It provides the N-dimensional array (`ndarray`) and a large set of functions that operate on arrays efficiently. Understanding NumPy is crucial for data science because libraries such as pandas and scikit-learn are built on it, and others such as TensorFlow interoperate with its arrays.

## 1. Importing NumPy

We start by importing the NumPy library. It is standard practice to import NumPy as `np` to keep your code concise and clear.

In [1]:
import numpy as np

## 2. Creating NumPy Arrays

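NumPy arrays (`ndarray` objects) are the core of the NumPy library. They resemble Python lists, but every element shares one data type and the data is stored in a compact block of memory, which lets NumPy apply operations to whole arrays at once and handle large datasets efficiently.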
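
To make the contrast with lists concrete, here is a minimal sketch. The variable names are illustrative and not part of the notebook's cells.

```python
import numpy as np

py_list = [1, 2, 3, 4]
np_array = np.array(py_list)

# Multiplying a list by 2 repeats the list; it does not double the values.
repeated_list = py_list * 2

# Doubling list values needs an explicit loop or comprehension.
doubled_list = [x * 2 for x in py_list]

# Multiplying an array by 2 doubles every element in one vectorized step.
doubled_array = np_array * 2

print("Repeated list:", repeated_list)
print("Doubled list: ", doubled_list)
print("Doubled array:", doubled_array)
print("Array dtype:  ", np_array.dtype)  # an integer type; the exact width depends on the platform
```

The array version avoids the Python-level loop, which is where most of NumPy's speed advantage on large data comes from.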

### 2.1. Creating Arrays from Lists
You can create a NumPy array from a Python list using the `np.array()` function.

In [2]:
# Creating a NumPy array from a Python list
array_from_list = np.array([1, 2, 3, 4, 5])
print("Array from list:", array_from_list)

Array from list: [1 2 3 4 5]


### 2.2. Arrays with Specific Values
NumPy provides functions to create arrays filled with zeros, ones, a range of values, or random numbers. These are useful for initializing data structures.

In [3]:
# Creating an array of zeros
zeros_array = np.zeros((3, 3))  # 3x3 matrix of zeros
print("Zeros Array:\n", zeros_array)

# Creating an array of ones
ones_array = np.ones((2, 4))  # 2x4 matrix of ones
print("Ones Array:\n", ones_array)

# Creating an array with a specific range
range_array = np.arange(0, 10, 2)  # Array with values [0, 2, 4, 6, 8]
print("Range Array:", range_array)

# Creating an array with random numbers
random_array = np.random.random((2, 2))  # 2x2 matrix of uniform random numbers in [0, 1); values differ on each run
print("Random Array:\n", random_array)

Zeros Array:
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
Ones Array:
 [[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
Range Array: [0 2 4 6 8]
Random Array:
 [[0.97498663 0.3175737 ]
 [0.49158296 0.17231358]]


### 2.3. Array of Evenly Spaced Numbers
The `np.linspace` function creates an array of evenly spaced numbers over a specified range. This is particularly useful for generating values for plotting graphs.

In [4]:
# Creating an array of 10 evenly spaced numbers between 0 and 1
linspace_array = np.linspace(0, 1, 10)
print("Linspace Array:", linspace_array)

Linspace Array: [0.         0.11111111 0.22222222 0.33333333 0.44444444 0.55555556
 0.66666667 0.77777778 0.88888889 1.        ]
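
Unlike `np.arange`, which takes a step size, `np.linspace` takes the number of points and includes both endpoints by default. The sketch below uses the `retstep` and `endpoint` parameters of `np.linspace` to show the difference; the variable names are illustrative.

```python
import numpy as np

# Five points from 0 to 1, both endpoints included; retstep=True also returns the spacing.
points, step = np.linspace(0, 1, 5, retstep=True)
print("Points:", points)  # values 0, 0.25, 0.5, 0.75, 1
print("Step:", step)      # spacing of 0.25

# endpoint=False leaves out the stop value, so the spacing becomes 1 / 5.
half_open = np.linspace(0, 1, 5, endpoint=False)
print("Half-open:", half_open)  # values 0, 0.2, 0.4, 0.6, 0.8
```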


## 3. Array Attributes

NumPy arrays have several attributes that are useful for understanding the properties of the array.

### 3.1. Array Shape
The shape of an array is a tuple that gives the size of the array along each dimension.

In [5]:
array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Shape of the array:", array.shape)

Shape of the array: (3, 3)


### 3.2. Number of Dimensions
The `ndim` attribute returns the number of dimensions (axes) of the array.

In [6]:
print("Number of dimensions:", array.ndim)

Number of dimensions: 2


### 3.3. Array Size
The `size` attribute returns the total number of elements in the array.

In [7]:
print("Size of the array:", array.size)

Size of the array: 9
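
The three attributes are related: `ndim` is the length of `shape`, and `size` is the product of the entries in `shape`. A small sketch on a non-square array (illustrative name `table`) makes this easier to see:

```python
import numpy as np

table = np.array([[1, 2, 3], [4, 5, 6]])  # 2 rows, 3 columns

print("Shape:", table.shape)  # (2, 3)
print("ndim equals len(shape):", table.ndim == len(table.shape))
print("size equals rows * columns:", table.size == table.shape[0] * table.shape[1])
```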


## 4. Reshaping Arrays

Reshaping arrays is a common operation, especially when preparing data for machine learning models. The `reshape` method changes the shape of an array without changing its data; the new shape must hold exactly the same number of elements.

In [8]:
# Reshaping a 1D array to a 2D array
reshaped_array = np.arange(12).reshape((3, 4))  # 3x4 matrix
print("Reshaped Array (3x4):\n", reshaped_array)

Reshaped Array (3x4):
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
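
Two details of `reshape` are worth a quick sketch: passing `-1` for one dimension lets NumPy infer it from the element count, and a shape that does not match the element count raises a `ValueError`. The names below are illustrative.

```python
import numpy as np

values = np.arange(12)

# -1 asks NumPy to infer this dimension: 12 elements / 2 rows = 6 columns.
inferred = values.reshape((2, -1))
print("Inferred shape:", inferred.shape)  # (2, 6)

# 5 * 3 = 15 does not equal 12, so this reshape is rejected.
error_message = None
try:
    values.reshape((5, 3))
except ValueError as err:
    error_message = str(err)
print("Reshape failed:", error_message)
```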


### 4.1. Reshaping for Machine Learning
Reshaping is particularly useful in machine learning where you need to transform data into a shape suitable for training models, such as converting 1D arrays into 2D feature matrices.

In [9]:
# Example: Reshaping data for machine learning
data = np.arange(24).reshape((6, 4))  # 6 samples, 4 features each
print("Data reshaped for machine learning:\n", data)

Data reshaped for machine learning:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
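
The 1D-to-2D conversion mentioned above often takes the form `reshape(-1, 1)`, which turns a flat array of samples into a single-column feature matrix. Many estimator APIs, scikit-learn's among them, expect input of shape `(n_samples, n_features)`. The data below is hypothetical.

```python
import numpy as np

# Hypothetical measurements: one feature observed for four samples.
feature = np.array([1.5, 2.0, 3.5, 4.0])
print("Original shape:", feature.shape)  # (4,)

# One column, with the number of rows inferred from the data.
feature_matrix = feature.reshape(-1, 1)
print("Feature matrix shape:", feature_matrix.shape)  # (4, 1)
```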


## 5. Indexing and Slicing Arrays

Indexing and slicing let you read or modify specific elements or subarrays within a larger array. Keep in mind that a basic slice is a view of the original data, so assigning to a slice also changes the original array.

### 5.1. Basic Indexing
You can access individual elements in a NumPy array by specifying their indices.

In [10]:
# Accessing elements
element = array[1, 2]  # Element at row index 1, column index 2 (value 6)
print("Element at index [1, 2]:", element)

Element at index [1, 2]: 6


### 5.2. Slicing
Slicing allows you to extract subarrays from an array by specifying a range of indices. This is useful for working with subsets of data.

In [11]:
# Slicing to get a subarray
subarray = array[0:2, 1:3]  # Subarray from rows 0 to 1 and columns 1 to 2
print("Subarray:\n", subarray)

# Another example: selecting all rows and specific columns
col_slice = array[:, 1:3]  # All rows, columns 1 to 2
print("Column Slice (all rows, cols 1 to 2):\n", col_slice)

Subarray:
 [[2 3]
 [5 6]]
Column Slice (all rows, cols 1 to 2):
 [[2 3]
 [5 6]
 [8 9]]
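
A basic slice does not copy data; it is a view onto the original array. The following sketch, using an illustrative array named `grid`, shows that writing through a slice changes the source, and that `.copy()` avoids this.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A basic slice is a view: writing to it writes to grid.
window = grid[0:2, 1:3]
window[0, 0] = 99
print("grid[0, 1] after writing through the slice:", grid[0, 1])  # 99

# An explicit copy is independent of grid.
independent = grid[0:2, 1:3].copy()
independent[0, 0] = -1
print("grid[0, 1] after writing to the copy:", grid[0, 1])  # still 99
```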


### 5.3. Advanced Indexing

NumPy also supports advanced indexing. This allows you to use arrays of indices to access multiple elements at once. It's particularly useful for selecting specific elements without using a loop.

In [12]:
# Using a list of indices to access specific elements
indexed_elements = array[[0, 1], [1, 2]]  # Elements (0, 1) and (1, 2)
print("Indexed Elements:", indexed_elements)

Indexed Elements: [2 6]
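
Two further points about index arrays, sketched here with an illustrative `grid`: a single index list selects whole rows in the order given, and the result of advanced indexing is a copy rather than a view.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Two index lists are paired element by element: positions (0, 1) and (1, 2).
paired = grid[[0, 1], [1, 2]]
print("Paired elements:", paired)  # [2 6]

# A single index list picks whole rows, here row 2 followed by row 0.
rows = grid[[2, 0]]
print("Selected rows:\n", rows)

# The result is a copy, so changing it leaves grid untouched.
rows[0, 0] = -1
print("grid[2, 0] is unchanged:", grid[2, 0])  # 7
```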


## 6. Array Operations

NumPy supports a wide range of operations on arrays, including element-wise operations, aggregation functions, and more. These operations are highly optimized and can handle large datasets efficiently.

### 6.1. Element-wise Operations

Element-wise operations are applied to each element in the array. This includes arithmetic operations like addition, subtraction, multiplication, and division, as well as mathematical functions like square root, exponential, and logarithm.

In [13]:
# Element-wise addition
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
sum_array = array1 + array2  # [5, 7, 9]
print("Element-wise Sum:", sum_array)

# Element-wise multiplication
prod_array = array1 * array2  # [4, 10, 18]
print("Element-wise Product:", prod_array)

# Using mathematical functions
sqrt_array = np.sqrt(array1)  # [1.0, 1.414, 1.732]
print("Square Root of array1:", sqrt_array)

Element-wise Sum: [5 7 9]
Element-wise Product: [ 4 10 18]
Square Root of array1: [1.         1.41421356 1.73205081]


### 6.2. Aggregation Functions

Aggregation functions perform a computation that reduces the dimension of the array by summarizing data. Common aggregation functions include `sum`, `mean`, `max`, and `min`.

In [14]:
array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Summing all elements
total_sum = array.sum()  # 45
print("Total Sum:", total_sum)

# Finding the maximum value
max_value = array.max()  # 9
print("Maximum Value:", max_value)

# Computing the mean
mean_value = array.mean()  # 5.0
print("Mean Value:", mean_value)

Total Sum: 45
Maximum Value: 9
Mean Value: 5.0


### 6.3. Axis-Based Operations

Many aggregation functions can be applied along a specific axis of the array. This is useful for operations like summing rows or columns in a matrix.

In [15]:
# Sum along the rows (axis 1)
row_sum = array.sum(axis=1)  # [6, 15, 24]
print("Sum of Rows:", row_sum)

# Mean along the columns (axis 0)
col_mean = array.mean(axis=0)  # [4.0, 5.0, 6.0]
print("Mean of Columns:", col_mean)

Sum of Rows: [ 6 15 24]
Mean of Columns: [4. 5. 6.]
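
A helpful way to read the `axis` argument is that it names the axis that gets collapsed. The sketch below uses a non-square matrix so the shapes make this visible; `keepdims=True` is a parameter of the reduction methods that keeps the collapsed axis with length 1.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

# axis=0 collapses the rows, leaving one value per column.
col_sums = matrix.sum(axis=0)
print("Column sums:", col_sums, col_sums.shape)  # [5 7 9] (3,)

# axis=1 collapses the columns, leaving one value per row.
row_sums = matrix.sum(axis=1)
print("Row sums:", row_sums, row_sums.shape)  # [ 6 15] (2,)

# keepdims=True keeps the collapsed axis as length 1, giving shape (2, 1).
row_sums_2d = matrix.sum(axis=1, keepdims=True)
print("Row sums with keepdims, shape:", row_sums_2d.shape)
```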


## 7. Boolean Indexing

Boolean indexing is a powerful feature that allows you to filter arrays based on conditions. This is useful for selecting data that meets certain criteria.

In [16]:
# Boolean indexing example
array = np.array([1, 2, 3, 4, 5, 6])

# Selecting elements greater than 3
greater_than_three = array[array > 3]  # [4, 5, 6]
print("Elements greater than 3:", greater_than_three)

Elements greater than 3: [4 5 6]
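
The comparison `array > 3` itself produces a Boolean array (a mask), and masks can be combined. A short sketch with illustrative names follows; note that conditions are joined with `&` and `|` and each one needs its own parentheses, since Python's `and`/`or` do not work element-wise.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])

# The mask is an array of True/False values, one per element.
mask = values > 3
print("Mask:", mask)

# Combine conditions with & (and) and | (or).
in_range = values[(values > 1) & (values < 5)]
print("Between 1 and 5 (exclusive):", in_range)  # [2 3 4]

outside = values[(values < 2) | (values > 5)]
print("Below 2 or above 5:", outside)  # [1 6]

# A mask can also select the elements to assign to.
capped = values.copy()
capped[capped > 4] = 4
print("Capped at 4:", capped)  # [1 2 3 4 4 4]
```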


## 8. Combining and Splitting Arrays

You can combine multiple arrays into one and split a single array into multiple smaller arrays. This is useful for data manipulation and organization.

### 8.1. Concatenating Arrays

You can concatenate arrays along different axes using `np.concatenate`, `np.vstack` (vertical stack), and `np.hstack` (horizontal stack).

In [17]:
# Concatenating arrays vertically (axis 0)
array1 = np.array([[1, 2], [3, 4]])
array2 = np.array([[5, 6]])
vertical_stack = np.vstack((array1, array2))
print("Vertical Stack:\n", vertical_stack)

# Concatenating arrays horizontally (axis 1)
horizontal_stack = np.hstack((array1, array2.T))  # Transpose array2 for horizontal stack
print("Horizontal Stack:\n", horizontal_stack)

Vertical Stack:
 [[1 2]
 [3 4]
 [5 6]]
Horizontal Stack:
 [[1 2 5]
 [3 4 6]]
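
The same results can be produced with `np.concatenate` and an explicit `axis` argument. The sketch below, with illustrative names, checks that it matches `np.vstack` and `np.hstack` for 2D inputs.

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])
bottom = np.array([[5, 6]])

# axis=0 appends rows; the column counts must match.
stacked_rows = np.concatenate((top, bottom), axis=0)
print("Rows appended:\n", stacked_rows)

# axis=1 appends columns; the row counts must match, hence the transpose.
stacked_cols = np.concatenate((top, bottom.T), axis=1)
print("Columns appended:\n", stacked_cols)

print("Matches vstack:", np.array_equal(stacked_rows, np.vstack((top, bottom))))
print("Matches hstack:", np.array_equal(stacked_cols, np.hstack((top, bottom.T))))
```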


### 8.2. Splitting Arrays

You can split an array into multiple smaller arrays using `np.split`, `np.vsplit` (vertical split), and `np.hsplit` (horizontal split).

In [18]:
# Splitting an array into 3 sub-arrays along axis 1
array = np.arange(9).reshape(3, 3)
split_arrays = np.split(array, 3, axis=1)
print("Original Array:\n", array)
print("Split Arrays:")
for i, sub_array in enumerate(split_arrays):
    print(f"Sub-array {i+1}:\n", sub_array)

Original Array:
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
Split Arrays:
Sub-array 1:
 [[0]
 [3]
 [6]]
Sub-array 2:
 [[1]
 [4]
 [7]]
Sub-array 3:
 [[2]
 [5]
 [8]]
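
For completeness, here is a small sketch of `np.hsplit` and `np.vsplit`. Passing an integer asks for that many equal parts, while passing a list of indices gives the positions at which to cut. The array `grid` is illustrative.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

# Split the 4 columns into 2 equal halves of shape (3, 2).
left, right = np.hsplit(grid, 2)
print("Left half:\n", left)
print("Right half:\n", right)

# Cut before row index 1: the first row, then the remaining two rows.
first_row, rest = np.vsplit(grid, [1])
print("First row shape:", first_row.shape)  # (1, 4)
print("Remaining rows shape:", rest.shape)  # (2, 4)
```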


## 9. Array Broadcasting

Broadcasting lets NumPy apply element-wise operations to arrays of different shapes by virtually stretching the smaller operand to match the larger one, without copying its data. The simplest case is combining an array with a scalar.

In [19]:
# Broadcasting example
array = np.array([1, 2, 3])
scalar = 2

# Adding a scalar to an array
broadcasted_array = array + scalar  # [3, 4, 5]
print("Broadcasted Array:", broadcasted_array)

# Broadcasting with a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
scalar_2d = 10
broadcasted_array_2d = array_2d + scalar_2d  # Adding 10 to each element
print("Broadcasted 2D Array:\n", broadcasted_array_2d)

Broadcasted Array: [3 4 5]
Broadcasted 2D Array:
 [[11 12 13]
 [14 15 16]]
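
Broadcasting also applies between arrays, not just between an array and a scalar. NumPy compares shapes from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The sketch below uses illustrative names.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])               # shape (3,)
column = np.array([[100], [200]])          # shape (2, 1)

# (2, 3) + (3,): the row is added to every row of the matrix.
plus_row = matrix + row
print("Matrix + row:\n", plus_row)

# (2, 3) + (2, 1): each column entry is added across its matrix row.
plus_column = matrix + column
print("Matrix + column:\n", plus_column)

# (3,) + (2, 1): both operands are stretched to shape (2, 3).
outer_sum = row + column
print("Row + column:\n", outer_sum)

# (2, 3) + (2,): trailing dimensions 3 and 2 are incompatible.
error_message = None
try:
    matrix + np.array([1, 2])
except ValueError as err:
    error_message = str(err)
print("Incompatible shapes:", error_message)
```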


## 10. Array Copying and Views

In NumPy, arrays can be copied or viewed. A copy creates a new array with its own data, while a view creates a new array object that references the original data. Modifications to a view will affect the original array, while modifications to a copy will not.

In [20]:
# Creating an original array
original_array = np.array([1, 2, 3, 4])

# Creating a copy
array_copy = original_array.copy()
array_copy[0] = 100  # Modify copy
print("Original Array:", original_array)
print("Modified Copy:", array_copy)

# Creating a view
array_view = original_array.view()
array_view[0] = 200  # Modify view
print("Original Array after modifying view:", original_array)
print("View:", array_view)

Original Array: [1 2 3 4]
Modified Copy: [100   2   3   4]
Original Array after modifying view: [200   2   3   4]
View: [200   2   3   4]
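
When it is unclear whether two arrays share data, `np.shares_memory` and the `base` attribute can help. This is a minimal sketch with illustrative names.

```python
import numpy as np

original = np.array([1, 2, 3, 4])
copied = original.copy()
viewed = original.view()

# A copy has its own buffer; a view reuses the original's buffer.
print("Copy shares memory:", np.shares_memory(original, copied))  # False
print("View shares memory:", np.shares_memory(original, viewed))  # True

# base points at the array that owns the data, or is None for an owner.
print("View's base is the original:", viewed.base is original)  # True
print("Copy owns its data:", copied.base is None)               # True
```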


## 11. Mathematical Functions

NumPy provides a wide range of mathematical functions that can be applied element-wise to arrays. These functions include trigonometric functions, exponentiation, logarithms, and more.

In [21]:
# Applying mathematical functions
array = np.array([1, 4, 9, 16])

# Square root
sqrt_array = np.sqrt(array)  # [1.0, 2.0, 3.0, 4.0]
print("Square Root:", sqrt_array)

# Exponential
exp_array = np.exp(array)  # approx. [2.718, 54.598, 8103.084, 8886110.521]
print("Exponential:", exp_array)

# Logarithm
log_array = np.log(array)  # [0.0, 1.386, 2.197, 2.773]
print("Logarithm:", log_array)

Square Root: [1. 2. 3. 4.]
Exponential: [2.71828183e+00 5.45981500e+01 8.10308393e+03 8.88611052e+06]
Logarithm: [0.         1.38629436 2.19722458 2.77258872]
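
The trigonometric functions mentioned above work the same way and take angles in radians. Because results are floating-point, values such as `sin(pi)` come out as tiny nonzero numbers, so the sketch below compares with `np.allclose` instead of exact equality. Names are illustrative.

```python
import numpy as np

angles = np.array([0, np.pi / 2, np.pi])  # radians

sines = np.sin(angles)
cosines = np.cos(angles)
print("Sine:", sines)      # close to 0, 1, 0
print("Cosine:", cosines)  # close to 1, 0, -1

# Compare floating-point results with a tolerance, not with ==.
print("Sines as expected:", np.allclose(sines, [0, 1, 0]))
print("Cosines as expected:", np.allclose(cosines, [1, 0, -1]))

# exp and log are inverses of each other, up to rounding.
values = np.array([1.0, 4.0, 9.0, 16.0])
print("log(exp(x)) recovers x:", np.allclose(np.log(np.exp(values)), values))
```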


## 12. Linear Algebra Operations

NumPy includes several functions for performing linear algebra operations such as matrix multiplication, determinants, and inverses. These are essential for many applications in data science and machine learning.

In [22]:
# Linear algebra operations
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# Matrix multiplication
matrix_product = np.dot(matrix1, matrix2)
print("Matrix Product:\n", matrix_product)

# Determinant
det_matrix1 = np.linalg.det(matrix1)
print("Determinant of matrix1:", det_matrix1)

# Inverse
inverse_matrix1 = np.linalg.inv(matrix1)
print("Inverse of matrix1:\n", inverse_matrix1)

Matrix Product:
 [[19 22]
 [43 50]]
Determinant of matrix1: -2.0000000000000004
Inverse of matrix1:
 [[-2.   1. ]
 [ 1.5 -0.5]]
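
Two follow-ups may be useful here. The `@` operator is an equivalent way to write matrix multiplication for 2D arrays, and floating-point results such as the determinant above are best checked with a tolerance. The sketch reuses the same matrices.

```python
import numpy as np

matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# For 2D arrays, @ gives the same result as np.dot.
same_product = np.array_equal(matrix1 @ matrix2, np.dot(matrix1, matrix2))
print("@ matches np.dot:", same_product)

# A matrix times its inverse should be the identity, up to rounding.
identity_check = matrix1 @ np.linalg.inv(matrix1)
print("Inverse is correct:", np.allclose(identity_check, np.eye(2)))

# The determinant is -2 mathematically; compare with a tolerance.
det_is_minus_two = np.isclose(np.linalg.det(matrix1), -2.0)
print("Determinant is about -2:", det_is_minus_two)
```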


## Conclusion

This notebook introduced NumPy for data manipulation. It covered array creation, array attributes, reshaping, indexing and slicing, element-wise and aggregation operations, Boolean indexing, combining and splitting arrays, broadcasting, copies and views, mathematical functions, and linear algebra. Proficiency with these ideas is essential for efficient data manipulation and analysis in Python.