# NumPy + Vectorized Programming

- When we move into data science, ML, and DL, we switch to NumPy arrays and operations because datasets get much larger, and plain Python lists and loops become slow and memory-hungry
- Vectorization means applying an operation to a whole array at once instead of looping over its elements in Python; the loop runs inside NumPy's compiled code, which is much faster

In [1]:
import numpy as np # Common shorthand used
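
To make the speed difference concrete, here is a minimal sketch that computes a sum of squares twice: once with a Python loop and once as a single array expression. The workload size and variable names are assumptions chosen for illustration, and the timings will vary by machine.

```python
import time
import numpy as np

# Hypothetical workload: sum of squares of 200,000 numbers
n = 200_000
values = np.arange(n, dtype=np.float64)

# Pure-Python loop: the interpreter handles one element at a time
start = time.perf_counter()
loop_total = 0.0
for v in values:
    loop_total += v * v
loop_seconds = time.perf_counter() - start

# Vectorized: a single expression, the loop runs inside NumPy's compiled code
start = time.perf_counter()
vector_total = np.sum(values * values)
vector_seconds = time.perf_counter() - start

print(f"loop:       {loop_seconds:.4f} s")
print(f"vectorized: {vector_seconds:.4f} s")
print("same result:", np.isclose(loop_total, vector_total))
```

Both versions compute the same number. Only the place where the loop runs changes.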

## Basic Numpy Structures

- NumPy arrays are similar to Python lists, but they support many more operations, which makes modeling and data science much easier
- Arrays are homogeneous (all elements must be the same type) and more memory-efficient than Python lists
- Arrays can be multi-dimensional (vectors (1D), matrices (2D), tensors (nD))
- They have a fixed size after creation (unlike Python lists, which can grow dynamically)
- NumPy arrays provide many built-in methods for mathematical operations, statistical functions, and data manipulation
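
The properties listed above can be checked directly. The following is a small sketch, with illustrative variable names, that shows type coercion, a rough memory comparison against a Python list (the list figure assumes CPython), and how "growing" an array actually produces a new one.

```python
import sys
import numpy as np

# Homogeneous: mixing ints and floats upcasts every element to one dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Memory: an int64 array stores 8 bytes per element in one contiguous buffer
numbers = list(range(1000))
arr = np.arange(1000, dtype=np.int64)
array_bytes = arr.nbytes
# Rough list cost: the list object plus each separate int object it points to
list_bytes = sys.getsizeof(numbers) + sum(sys.getsizeof(x) for x in numbers)
print("array bytes:", array_bytes)
print("list bytes (approx.):", list_bytes)

# Fixed size: np.append does not grow the array, it returns a new one
original = np.array([1, 2, 3])
grown = np.append(original, 4)
print(original.shape, grown.shape)  # (3,) (4,)
```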

In [4]:
# Create a 1D array (vector) - useful for storing sequences, time series, or feature vectors
vector = np.array([1, 2, 3, 4, 5])
print("1D array (vector):", vector)
print("Shape:", vector.shape)  # Will show (5,)

# Create a 2D array (matrix) - useful for tabular data, images (grayscale), or feature matrices
matrix = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])
print("\n2D array (matrix):\n", matrix)
print("Shape:", matrix.shape)  # Will show (3, 3)

# Create a 3D array - useful for RGB images, video data, or multiple samples of 2D data
tensor = np.array([[[1, 2], [3, 4]],
                    [[5, 6], [7, 8]],
                    [[9, 10], [11, 12]]])
print("\n3D array (tensor):\n", tensor)
print("Shape:", tensor.shape)  # Will show (3, 2, 2)

1D array (vector): [1 2 3 4 5]
Shape: (5,)

2D array (matrix):
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Shape: (3, 3)

3D array (tensor):
 [[[ 1  2]
  [ 3  4]]

 [[ 5  6]
  [ 7  8]]

 [[ 9 10]
  [11 12]]]
Shape: (3, 2, 2)


## Other types of Arrays

Here are other kinds of arrays commonly used in data science:
- Zeros and ones arrays are commonly used to initialize "empty" arrays (remember that array sizes are fixed)
  - Zeros are used for initializing biases and adding padding
  - Ones are used when you need to multiply without effect or to initialize scaling arrays
- Arange and linspace create arrays that span a specific range
  - Arange creates a range array with a specific step between numbers (the stop value is excluded)
  - Linspace creates a range array with a specific number of elements (the stop value is included by default)
- Eye and diag are useful for creating identity matrices and diagonal (scaling) matrices
- You can use ndim, shape, size, and dtype to learn more about the structure of a specific array

In [5]:
# Common array creation functions
zeros = np.zeros((3, 3))  # Create array of zeros - useful for initializing arrays
ones = np.ones((2, 4))    # Create array of ones - useful for masks or initialization
range_array = np.arange(0, 10, 0.1) # Array of evenly spaced numbers with a step size (start, stop, step)
linspace_array = np.linspace(0, 10, 12) # Array of evenly spaced numbers with a fixed number of samples (start, stop, num samples)

print("\nArray of zeros:\n", zeros)
print("\nArray of ones:\n", ones)
print("\nRange array:\n", range_array)
print("\nLinspace array:\n", linspace_array)

# Identity matrix - useful for creating matrices with ones on the diagonal
identity_matrix = np.eye(4)  # 4x4 identity matrix
print("\nIdentity matrix:\n", identity_matrix)

# Create a diagonal matrix from a 1D array
diagonal_array = np.array([1, 2, 3, 4])
diagonal_matrix = np.diag(diagonal_array)
print("\nDiagonal matrix from array:\n", diagonal_matrix)

# Extract the diagonal from a matrix
extracted_diagonal = np.diag(matrix)
print("\nExtracted diagonal from matrix:", extracted_diagonal)


# In addition to shape, arrays expose other attributes that describe their structure
dimensions = tensor.ndim # Number of dimensions
elements = tensor.size # Total number of elements
itemsize = tensor.itemsize # Size of each element in bytes
dtype = tensor.dtype # Data type of the array

print(f"Dimensions: {dimensions}")
print(f"Elements: {elements}")
print(f"Item size: {itemsize} bytes")


Array of zeros:
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

Array of ones:
 [[1. 1. 1. 1.]
 [1. 1. 1. 1.]]

Range array:
 [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.  1.1 1.2 1.3 1.4 1.5 1.6 1.7
 1.8 1.9 2.  2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.  3.1 3.2 3.3 3.4 3.5
 3.6 3.7 3.8 3.9 4.  4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.  5.1 5.2 5.3
 5.4 5.5 5.6 5.7 5.8 5.9 6.  6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 7.  7.1
 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8.  8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9
 9.  9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9]

Linspace array:
 [ 0.          0.90909091  1.81818182  2.72727273  3.63636364  4.54545455
  5.45454545  6.36363636  7.27272727  8.18181818  9.09090909 10.        ]

Identity matrix:
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

Diagonal matrix from array:
 [[1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]
 [0 0 0 4]]

Extracted diagonal from matrix: [1 5 9]
Dimensions: 3
Elements: 12
Item size: 8 bytes
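
The difference between arange and linspace is easiest to see on a small range. This is a minimal sketch with illustrative values. The retstep argument is an optional linspace parameter that also returns the spacing it computed.

```python
import numpy as np

# arange: fixed step, the stop value is excluded
stepped = np.arange(0, 1, 0.25)
print(stepped.tolist())  # [0.0, 0.25, 0.5, 0.75]

# linspace: fixed number of points, the stop value is included by default
spaced = np.linspace(0, 1, 5)
print(spaced.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]

# retstep=True also returns the spacing linspace chose
points, step = np.linspace(0, 10, 12, retstep=True)
print(len(points), step)  # 12 points, spacing of 10/11
```

With a non-integer step, floating-point rounding can make the length of an arange result hard to predict, so linspace is usually the safer choice when the number of points matters.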


## Random Array Generation

- In a lot of cases we create arrays and matrices initialized with random numbers to kick off a training process or to pass in test data
- Even though the values are random, we usually set a random seed so that the same numbers are generated on every run, which makes results reproducible
- The type of random array you use will vary based on what kind of data you're sampling or what kind of weights you're initializing

In [11]:
# Setting a seed for random number generation (Very important for ML)
np.random.seed(42) # Any fixed integer works; 42 is a common choice

# Random Array of floats between [0,1)
random_array = np.random.rand(3, 3) # shape
print(random_array)

# Random Array of floats from a normal distribution
random_array = np.random.randn(3, 3) # shape
print(random_array)

# Random Array of integers between a range
random_array = np.random.randint(0, 10, (3, 3)) # shape
print(random_array)

# Random Array of booleans
random_array = np.random.choice([True, False], size=(3, 3)) # shape
print(random_array)

# Random Array of floats between [0,1) (like rand, but the shape is passed as a tuple)
random_array = np.random.random(size=(3, 3)) # shape
print(random_array)

[[0.37454012 0.95071431 0.73199394]
 [0.59865848 0.15601864 0.15599452]
 [0.05808361 0.86617615 0.60111501]]
[[-0.58087813 -0.52516981 -0.57138017]
 [-0.92408284 -2.61254901  0.95036968]
 [ 0.81644508 -1.523876   -0.42804606]]
[[2 6 3]
 [8 2 4]
 [2 6 4]]
[[0 0 0]
 [0 0 0]
 [0 0 0]]
[[ True  True False]
 [False  True False]
 [False False False]]
[[0.30461377 0.09767211 0.68423303]
 [0.44015249 0.12203823 0.49517691]
 [0.03438852 0.9093204  0.25877998]]
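
The effect of a seed can be verified by reseeding and drawing again. The sketch below also shows NumPy's Generator interface (np.random.default_rng), which is an alternative to the global np.random.seed call used above, not something the original cell relies on. Variable names are illustrative.

```python
import numpy as np

# Global seed: reseeding with the same value replays the same stream
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
second = np.random.rand(3)
print(np.array_equal(first, second))  # True

# Generator API: each generator object carries its own independent state
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
uniform = rng_a.random((3, 3))                 # floats in [0, 1)
normal = rng_a.standard_normal((3, 3))         # standard normal samples
integers = rng_a.integers(0, 10, size=(3, 3))  # ints in [0, 10)

# A second generator with the same seed starts with the same draws
same_start = np.array_equal(uniform, rng_b.random((3, 3)))
print(same_start)  # True
```

Passing a generator object around explicitly avoids hidden global state, which can matter when several parts of a program draw random numbers.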


## Array Operations 

- You don't need to remember all these functions, but there are a lot of ways to manipulate data and perform standard math in a vectorized, quick manner
- NumPy provides efficient implementations of basic arithmetic operations (+, -, *, /)
- Advanced mathematical functions are available (sqrt, exp, log)
- Statistical operations like mean, median, min, max are built-in

In [None]:
# Basic arithmetic operations
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Addition
c = a + b  # Element-wise addition
c = np.add(a, b)  # Same as a + b

# Subtraction
c = a - b  # Element-wise subtraction
c = np.subtract(a, b)  # Same as a - b

# Multiplication
c = a * b  # Element-wise multiplication
c = np.multiply(a, b)  # Same as a * b

# Division
c = a / b  # Element-wise division
c = np.divide(a, b)  # Same as a / b

# Floor division
c = a // b  # Element-wise floor division (rounds down to the nearest integer)
c = np.floor_divide(a, b)  # Same as a // b

# Exponentiation
c = a ** 2  # Element-wise squaring
c = np.square(a)  # Same as a ** 2
c = a ** b  # Element-wise power
c = np.power(a, b)  # Same as a ** b

# Other mathematical operations
c = np.sqrt(a)  # Element-wise square root
c = np.exp(a)  # Element-wise exponential (e^x)
c = np.log(a)  # Element-wise natural logarithm
c = np.log10(a)  # Element-wise base-10 logarithm

# Statistical operations
a = np.array([1, 2, 3, 4, 5])
mean_val = np.mean(a)  # Mean of all elements
median_val = np.median(a)  # Median of all elements
max_val = np.max(a)  # Maximum value
min_val = np.min(a)  # Minimum value
sum_val = np.sum(a)  # Sum of all elements

# Index operations
max_idx = np.argmax(a)  # Index of maximum value
min_idx = np.argmin(a)  # Index of minimum value

# Statistical measures
std_val = np.std(a)  # Standard deviation
var_val = np.var(a)  # Variance

# Matrix operations
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C = np.dot(A, B)  # Matrix multiplication
C = A @ B  # Also matrix multiplication

# Transpose
C = A.T  # Transpose of A
C = np.transpose(A)  # Same as A.T

# Aggregation along axes
a = np.array([[1, 2, 3], [4, 5, 6]])
row_sums = np.sum(a, axis=1)  # Sum of each row
col_sums = np.sum(a, axis=0)  # Sum of each column
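
Two points from the cell above are easy to mix up: `*` versus `@`, and which direction an axis argument collapses. The following sketch reuses the same small matrices and prints the results. The keepdims step at the end is an addition for illustration.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B  # multiplies matching entries
matmul = A @ B       # rows of A dotted with columns of B
print(elementwise.tolist())  # [[5, 12], [21, 32]]
print(matmul.tolist())       # [[19, 22], [43, 50]]

data = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
col_sums = data.sum(axis=0)  # collapses the row axis, shape (3,)
row_sums = data.sum(axis=1)  # collapses the column axis, shape (2,)
print(col_sums.tolist())  # [5, 7, 9]
print(row_sums.tolist())  # [6, 15]

# keepdims=True keeps the collapsed axis with length 1, so the result
# lines up with the original array in later arithmetic
row_sums_2d = data.sum(axis=1, keepdims=True)  # shape (2, 1)
shares = data / row_sums_2d  # each row now sums to 1
print(shares.sum(axis=1).tolist())
```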


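## Accessing, Broadcasting, and Vectorized Operations

- Indexing and slicing select single elements, rows, columns, or blocks of an array
- Boolean masks and index arrays (fancy indexing) select or overwrite elements that match a condition or sit at specific positions
- Broadcasting lets arrays of different but compatible shapes be combined without copying data by hand (for example, adding a vector to every row of a matrix)
- Vectorized operations replace explicit Python loops with whole-array expressions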

In [13]:
# Accessing and Slicing Arrays
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Basic indexing
element = a[0, 0]  # First element
last = a[2, 3]     # Last element

# Slicing
rows_subset = a[1:]        # Rows 1 to end
columns_subset = a[:, 1:3] # All rows, columns 1 to 2
block = a[1:3, 2:]         # Rows 1 to 2, columns 2 to end

# Boolean indexing (common in ML for filtering data)
mask = a > 5
filtered = a[mask]  # Returns all elements > 5

# Replace values where mask is True with 0
a[mask] = 0  # Sets all elements > 5 to 0 (modifies a in place)

# Fancy indexing (common in ML for batch processing)
row_indices = np.array([0, 2])
col_indices = np.array([1, 3])
selected = a[row_indices, col_indices]  # Select (0,1) and (2,3) elements; (2,3) is 0 after the masking above

# Broadcasting
# Example 1: Adding a scalar to an array
a = np.array([[1, 2, 3], [4, 5, 6]])
result1 = a + 10  # Add 10 to each element

# Example 2: Adding a vector to each row of a matrix
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([10, 20, 30])
result2 = a + b  # b is broadcast to match a's shape

# Example 3: Adding a column vector to a matrix
c = np.array([[10], [20], [30]])
result3 = a + c  # c is broadcast across columns

# Vectorized Operations (common in ML/DL)
# Element-wise operations (no loops needed)
x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
element_mult = x * y  # Element-wise multiplication

# Sigmoid function (common in neural networks)
z = np.array([-2, -1, 0, 1, 2])
sigmoid = 1 / (1 + np.exp(-z))  # Vectorized sigmoid calculation

# Batch-normalization-style standardization (common in deep learning):
# zero mean and unit variance per feature, without the learnable scale and shift
batch = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
batch_mean = np.mean(batch, axis=0)
batch_std = np.std(batch, axis=0)
normalized_batch = (batch - batch_mean) / batch_std  # Each column (feature) standardized across the batch

# One-hot encoding (common in ML for categorical data)
labels = np.array([0, 2, 1, 0])
one_hot = np.eye(3)[labels]  # Convert to one-hot vectors
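
Broadcasting works when the shapes, compared from the last dimension backwards, are either equal or one of them is 1. The sketch below walks through the compatible cases from the cell above and one incompatible case, then adds a small constant to the standard deviation so a constant feature does not cause a division by zero. The eps value and the example batch are assumptions for illustration.

```python
import numpy as np

matrix = np.arange(1, 10).reshape(3, 3)  # shape (3, 3)
row = np.array([10, 20, 30])             # shape (3,)
col = row.reshape(3, 1)                  # shape (3, 1)

by_row = matrix + row  # row is added to every row of matrix
by_col = matrix + col  # col is added to every column of matrix
outer = col + row      # (3, 1) + (3,) stretches both, giving (3, 3)
print(by_row[0].tolist())  # [11, 22, 33]
print(by_col[0].tolist())  # [11, 12, 13]
print(outer.shape)         # (3, 3)

# Incompatible shapes: trailing dimensions 3 and 2 differ and neither is 1
try:
    matrix + np.array([1, 2])
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("cannot broadcast:", err)

# Standardization with a guard against zero standard deviation
eps = 1e-8  # assumed small constant, not taken from the original cell
batch = np.array([[1.0, 5.0], [3.0, 5.0]])  # second feature is constant
normalized = (batch - batch.mean(axis=0)) / (batch.std(axis=0) + eps)
print(np.isfinite(normalized).all())  # True
```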

