### **`Understanding NumPy`**

- NumPy (Numerical Python) is the foundation library for scientific computing in Python.
- It provides an N-dimensional array object and tools for working with these arrays.

- Think of NumPy as the engine that powers most data science libraries: pandas uses NumPy arrays internally, scikit-learn expects NumPy arrays for machine learning, and matplotlib uses NumPy for plotting.

- NumPy operations are implemented in C, which often makes them 10-100x faster than equivalent pure-Python loops (the exact speedup depends on the operation and the array size).
- NumPy arrays store data more compactly than Python lists.
- It performs operations on entire arrays without writing loops (vectorization).
- Foundation for pandas, scikit-learn, matplotlib, and more.

In [2]:
# import all necessary libraries

import numpy as np
import matplotlib.pyplot as plt
import time

# check NumPy version
print("NumPy version:", np.__version__)

# Display settings for cleaner output (floats print with 1 decimal place from here on)
np.set_printoptions(precision=1, suppress=True)

NumPy version: 2.3.3


**Creating NumPy Arrays**

In [3]:
# creating arrays from Python lists
# 1D array: a simple sequence of numbers
arr1d = np.array([1, 2, 3, 4, 5])

# 2D array: think of this as a matrix or table with rows and columns
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6]])

# 3D array: like a stack of 2D arrays - useful for images, time series, etc.
arr3d = np.array([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]]
])

print("1D array: ", arr1d)
print("2D array:\n ", arr2d)
print("3D array:\n ", arr3d)


1D array:  [1 2 3 4 5]
2D array:
  [[1 2 3]
 [4 5 6]]
3D array:
  [[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]


When you pass a nested list to `np.array()`, NumPy automatically determines the dimensions. The 1D array is like a single row, 2D is like a spreadsheet, and 3D is like multiple spreadsheets stacked together.

**Creating Special Arrays in Numpy**


In [4]:
# creating arrays filled with zeros - useful for initializing arrays
# Shape (3, 4) means 3 rows and 4 columns
zeros = np.zeros((3, 4))

# creating arrays filled with ones - often used as starting points
ones = np.ones((2, 3, 4))  # 3D array: 2 layers, 3 rows, 4 columns

# Empty array - faster than zeros/ones, but its contents are uninitialized (arbitrary values)
# use when you will immediately fill the array with real data
empty = np.empty((2, 3))

print("Zeros Array (3x4): \n", zeros)
print("Ones array:\n ", ones)
print("Ones array shape:\n ", ones.shape)
print("Empty array (contains random values): \n", empty)


Zeros Array (3x4): 
 [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
Ones array:
  [[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]
Ones array shape:
  (2, 3, 4)
Empty array (contains random values): 
 [[-0. -0.  0.]
 [ 0.  0.  0.]]


`zeros()` and `ones()` create initialized arrays of a given shape. `empty()` skips initialization, so it is the fastest of the three but holds whatever values were already in memory; only use it when you will immediately overwrite the contents.

In [5]:
# Range arrays - like Python's range() but more powerful
range_arr = np.arange(0, 10, 2)  # start, stop (exclusive), step: [0, 2, 4, 6, 8]
print("Range array: ", range_arr)

# linearly spaced arrays - divide a range into equal parts
# From 0 to 5 with exactly 50 points (including endpoints)
linspace_arr = np.linspace(0, 5, 50)
print("Linspace array: ", linspace_arr)

# Logarithmically spaced arrays - useful for scientific data
# From 10^0 to 10^2 (1 to 100) with 5 points
logspace_arr = np.logspace(0, 2, 5)
print("Logspace array: ", logspace_arr)

Range array:  [0 2 4 6 8]
Linspace array:  [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.  1.1 1.2 1.3 1.4 1.5 1.6 1.7
 1.8 1.9 2.  2.1 2.2 2.3 2.4 2.6 2.7 2.8 2.9 3.  3.1 3.2 3.3 3.4 3.5 3.6
 3.7 3.8 3.9 4.  4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5. ]
Logspace array:  [  1.    3.2  10.   31.6 100. ]
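
The difference between `arange` (you choose the step, the stop value is excluded) and `linspace` (you choose the number of points, the stop value is included by default) is easiest to see on a small range. The following is a minimal sketch; the variable names are illustrative and not part of the notebook above.

```python
import numpy as np

# arange: you choose the step; the stop value is excluded
by_step = np.arange(0, 1, 0.25)          # 0.0, 0.25, 0.5, 0.75

# linspace: you choose the number of points; the stop value is included
by_count = np.linspace(0, 1, 5)          # 0.0, 0.25, 0.5, 0.75, 1.0

# endpoint=False makes linspace exclude the stop value, like arange
half_open = np.linspace(0, 1, 4, endpoint=False)

print(by_step)
print(by_count)
print(np.allclose(by_step, half_open))   # True
```

For non-integer steps, `linspace` is usually the safer choice because the number of points is explicit and does not depend on floating-point rounding of the step.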


In [6]:
# Identity matrix - diagonal of ones, zeros elsewhere
# Essential for linear algebra operations

identity = np.eye(4) # 4x4 identity matrix

# diagonal matrix - put the given values on the main diagonal
diagonal = np.diag([1, 2, 3, 4, 5])

# array filled with a specific value
full_arr = np.full((3, 3), 7)  # 3 x 3 array filled with 7

print("Identity matrix: \n", identity)
print("\nDiagonal matrix: \n", diagonal)
print("\nFull array (filled with 7):\n", full_arr)

Identity matrix: 
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

Diagonal matrix: 
 [[1 0 0 0 0]
 [0 2 0 0 0]
 [0 0 3 0 0]
 [0 0 0 4 0]
 [0 0 0 0 5]]

Full array (filled with 7):
 [[7 7 7]
 [7 7 7]
 [7 7 7]]


**Numpy Data Types**
- Understanding data types is crucial for memory efficiency and numerical precision

In [7]:
# Explicit data types- control memory usage and precision
int_arr = np.array([1, 2, 3], dtype=np.int32)              # 32-bit integers
float_arr = np.array([1, 2, 3], dtype=np.float64)          # 64-bit floats (double precision)
bool_arr = np.array([True, False, True], dtype=np.bool_)   # boolean values

# Type conversion- change dtype of existing array
converted = int_arr.astype(np.float32) #convert to 32-bit float

print("Integer array dtype: ", int_arr.dtype)
print("float array dtype: ", float_arr.dtype) 
print("Boolean array dtype: ", bool_arr.dtype)
print("Converted array dtype: ", converted.dtype)

# Memory usage comparison
print(f"int32 uses {int_arr.itemsize} bytes per element")
print(f"float64 uses {float_arr.itemsize} bytes per element")

Integer array dtype:  int32
float array dtype:  float64
Boolean array dtype:  bool
Converted array dtype:  float32
int32 uses 4 bytes per element
float64 uses 8 bytes per element


- int8: 1 byte, range -128 to 127
- int32: 4 bytes, range about ±2.1 billion (-2,147,483,648 to 2,147,483,647)
- float32: 4 bytes, ~7 decimal digits precision
- float64: 8 bytes, ~15-16 decimal digits precision

- Choose smaller types to save memory, larger types for range and precision.
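
The trade-off in the list above can be checked directly. The sketch below (with illustrative variable names) queries the int8 range with `np.iinfo`, shows that integer arrays wrap around silently when a result does not fit, and compares memory use for two float types.

```python
import numpy as np

# Valid integer ranges can be queried instead of memorized
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)   # -128 127

# Integer arrays wrap around silently when a result does not fit
small = np.array([127], dtype=np.int8)
one = np.array([1], dtype=np.int8)
wrapped = small + one
print(wrapped)                                         # [-128]

# Halving the item size halves the memory for the same number of elements
n = 1_000_000
bytes_f64 = np.zeros(n, dtype=np.float64).nbytes
bytes_f32 = np.zeros(n, dtype=np.float32).nbytes
print(bytes_f64, bytes_f32)                            # 8000000 4000000
```

The silent wrap-around is the main risk of picking a type that is too small: no error is raised, so the range of the data should be known before downcasting.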

**Array Properties & Attributes**

- Understanding array properties helps you work effectively with your data and debug issues.

In [8]:
# Create a sample 3D array for demonstration
# Think of this as 3 layers, each with 4 rows and 5 columns
arr = np.random.randn(3, 4, 5)
print(arr)
# Shape: The dimensions of the array (layers, rows, columns)
print("Shape:", arr.shape)           # Output: (3, 4, 5)

# Size: Total number of elements (3 × 4 × 5 = 60)
print("Size:", arr.size)             

# Ndim: Number of dimensions (3D in this case)
print("Ndim:", arr.ndim)             

# Dtype: Data type of elements
print("Dtype:", arr.dtype)           # float64: randn returns 64-bit floats

# Itemsize: Memory size of each element in bytes
print("Itemsize:", arr.itemsize)     # 8 bytes for float64

# Total memory usage in bytes
print("Memory usage:", arr.nbytes, "bytes")  # size × itemsize
print("Memory usage:", arr.nbytes / 1024, "KB")  # Convert to KB

[[[-0.2 -0.4  0.7 -0.7  1.5]
  [-0.4  0.   0.1  0.6 -0.2]
  [ 1.4  1.   0.3 -0.5  0.6]
  [ 0.6  0.3  0.4  0.4  0.5]]

 [[ 0.4 -0.6 -1.   0.2 -0.4]
  [ 0.9  1.2  0.2  2.  -0.5]
  [-0.7  0.6  0.2 -0.5 -0.3]
  [-0.1  0.1 -1.9  0.8  0.8]]

 [[ 0.8  0.1 -2.3  0.5 -1. ]
  [-0.6  0.1  1.   0.5 -0.5]
  [-2.5  0.4  0.5 -0.6  1.1]
  [ 0.7 -0.1 -0.9 -0.2 -0.3]]]
Shape: (3, 4, 5)
Size: 60
Ndim: 3
Dtype: float64
Itemsize: 8
Memory usage: 480 bytes
Memory usage: 0.46875 KB


**Array Indexing & Slicing**

 **Basic Indexing - Accessing Individual Elements**

- NumPy indexing is similar to Python lists but more powerful for multi-dimensional arrays

In [9]:
# 1D array indexing- similar to Python lists
arr1d = np.array([10,20,30, 40, 50])

print("First element:", arr1d[0])     # Index 0: 10
print("Last element:", arr1d[-1])     # Negative indexing: 50  
print("Slice [1:4]:", arr1d[1:4])     # Elements 1, 2, 3: [20, 30, 40]
print("Every 2nd element:", arr1d[::2])  # Step of 2: [10, 30, 50]


First element: 10
Last element: 50
Slice [1:4]: [20 30 40]
Every 2nd element: [10 30 50]


Negative indices count from the end (-1 is last element). Slicing uses [start:stop:step] where stop is exclusive.

In [10]:
# 2D array indexing - row and column access
arr2d = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

# Access specific element : [row, column]
print("Element at row 1, column 2: ", arr2d[1, 2])

# access entire rows or columns
print("First row: ", arr2d[0, :])  # all columns of row 0
print("Second column: ", arr2d[:, 1])  # all rows of column 1

# Subarray slicing: [row_start:row_end, col_start:col_end]
print("Subarray (rows 1-2, cols 1-2):\n", arr2d[1:3, 1:3])


Element at row 1, column 2:  7
First row:  [1 2 3 4]
Second column:  [ 2  6 10]
Subarray (rows 1-2, cols 1-2):
 [[ 6  7]
 [10 11]]


The comma separates dimensions. `:` means "all elements along this dimension". Basic slicing returns a view of the original data, not a copy, so modifying the slice also modifies the original array.
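
Because a slice is a view, writing to it changes the original array. The sketch below (names are illustrative) contrasts a plain slice with an explicit `.copy()` and uses `np.shares_memory` to confirm which one shares data with the source.

```python
import numpy as np

original = np.array([[1, 2, 3, 4],
                     [5, 6, 7, 8],
                     [9, 10, 11, 12]])

view = original[0, :]                 # basic slice: shares memory with `original`
independent = original[0, :].copy()   # explicit copy: separate memory

view[0] = 100                         # writes through to `original`
independent[1] = -1                   # does not affect `original`

print(original[0])                              # [100   2   3   4]
print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False
```

Calling `.copy()` is the usual way to get a slice that can be modified safely.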

**Advanced Indexing - Powerful Selection Methods**

In [11]:
# Fancy indexing - use arrays of indices to select elements
arr = np.array([10, 20, 30, 40, 50])
indices = np.array([0, 2, 4])  # Select elements at positions 0, 2, 4
print("Fancy indexing:", arr[indices])         # [10, 30, 50]

# This is much more flexible than simple slicing
random_indices = np.array([4, 1, 3, 1])  # Can repeat and reorder
print("Random order:", arr[random_indices])   # [50, 20, 40, 20]

Fancy indexing: [10 30 50]
Random order: [50 20 40 20]


Fancy indexing lets you select elements in any order, repeat elements, and select non-contiguous elements, which is very useful for data sampling and reordering. Unlike basic slicing, it always returns a copy of the data.

In [12]:
# 2D fancy indexing - select specific row/column combinations
arr2d = np.arange(12).reshape(3, 4)  # 3x4 array: [[0,1,2,3], [4,5,6,7], [8,9,10,11]]
print("Original 2D array:\n", arr2d)

# Select elements at (row, col) pairs: (0,1) and (2,3)
rows = np.array([0, 2])
cols = np.array([1, 3])
print("Elements at (0,1) and (2,3):", arr2d[rows, cols])  # [1, 11]

# Select entire rows using fancy indexing
selected_rows = arr2d[[0, 2], :]  # Rows 0 and 2, all columns
print("Selected rows:\n", selected_rows)

Original 2D array:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Elements at (0,1) and (2,3): [ 1 11]
Selected rows:
 [[ 0  1  2  3]
 [ 8  9 10 11]]


When you provide arrays for both dimensions, NumPy pairs them element-wise. This is different from slicing, which creates a rectangular subarray.
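
To get the rectangular block (every combination of the chosen rows and columns) with index lists, `np.ix_` can be used. This is a small sketch with an illustrative array name, not something the cells above rely on.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

# Paired index arrays pick individual elements: (0, 1) and (2, 3)
pairs = grid[[0, 2], [1, 3]]

# np.ix_ builds an open mesh, selecting every combination of the given rows and columns
block = grid[np.ix_([0, 2], [1, 3])]

print(pairs)   # [ 1 11]
print(block)   # [[ 1  3]
               #  [ 9 11]]
```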

**Array Reshaping & Manipulation**

- Reshaping changes how the same data is arranged into rows, columns, and layers without changing the actual values.


In [13]:
# Start with a 1D array
arr = np.arange(12)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print("Original 1D array:", arr)

# Reshape to 2D: 3 rows × 4 columns
reshaped_2d = arr.reshape(3, 4)
print("Reshaped to 3x4:\n", reshaped_2d)

# Reshape to 3D: 2 layers × 2 rows × 3 columns  
reshaped_3d = arr.reshape(2, 2, 3)
print("Reshaped to 2x2x3:\n", reshaped_3d)

# Use -1 to let NumPy calculate one dimension automatically
auto_reshape = arr.reshape(4, -1)  # 4 rows, NumPy calculates columns
print("Auto-reshaped to 4x?:\n", auto_reshape)

Original 1D array: [ 0  1  2  3  4  5  6  7  8  9 10 11]
Reshaped to 3x4:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Reshaped to 2x2x3:
 [[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]]
Auto-reshaped to 4x?:
 [[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]


The total number of elements must remain the same (12 in this case). Using -1 tells NumPy to calculate that dimension automatically. Reshaping returns a view when possible, not a copy.
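
The three points above (inferred dimension, fixed element count, view semantics) can be checked in a few lines. This is a minimal sketch with illustrative names.

```python
import numpy as np

arr = np.arange(12)

# -1 is resolved from the total size: 12 elements / 2 / 3 = 2
inferred = arr.reshape(2, 3, -1)
print(inferred.shape)                  # (2, 3, 2)

# A shape that cannot hold exactly 12 elements is rejected
try:
    arr.reshape(5, -1)
except ValueError as exc:
    print("ValueError:", exc)

# For a contiguous array, the reshaped result is a view on the same data
grid = arr.reshape(3, 4)
grid[0, 0] = 99
print(arr[0])                          # 99
```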

In [14]:
# Flattening - convert multi-dimensional array to 1D
arr2d = np.array([[1, 2, 3], [4, 5, 6]])

# flatten() always returns a copy
flattened = arr2d.flatten()                 
print("Flattened (copy):", flattened)

# ravel() returns a view if possible (faster, memory efficient)
ravel = arr2d.ravel()                       
print("Ravel (view if possible):", ravel)

# Demonstrate the difference
arr2d[0, 0] = 999
print("After modifying original:")
print("Flattened (unchanged):", flattened)  # Copy is independent
print("Ravel (changed):", ravel)            # View reflects changes

Flattened (copy): [1 2 3 4 5 6]
Ravel (view if possible): [1 2 3 4 5 6]
After modifying original:
Flattened (unchanged): [1 2 3 4 5 6]
Ravel (changed): [999   2   3   4   5   6]


Use ravel() when you don't need to modify the flattened array independently. Use flatten() when you need a separate copy that won't be affected by changes to the original.
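
The "if possible" in `ravel()` matters: when the elements are not laid out in row-major order in memory, `ravel()` has to copy. The sketch below illustrates this with a transposed array (variable names are illustrative).

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6]])

# Contiguous array: ravel() can return a view
flat_view = arr2d.ravel()
print(np.shares_memory(arr2d, flat_view))             # True

# Transposed array: its elements are not in row-major order in memory,
# so ravel() has to make a copy
flat_from_transpose = arr2d.T.ravel()
print(flat_from_transpose)                            # [1 4 2 5 3 6]
print(np.shares_memory(arr2d, flat_from_transpose))   # False
```

When code depends on whether the result aliases the original, checking with `np.shares_memory` or calling `flatten()` explicitly avoids surprises.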

**Transposing and Swapping Axes**

 - Transposing is essential for matrix operations and changing data orientation

In [15]:
# 2D transposition- flip rows and columns
arr2d = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

print("Original shape:", arr2d.shape)  # (2, 3)
print("Original: \n", arr2d)

print("Transposed shape: ", arr2d.T.shape)
print("Transposed:\n ", arr2d.T)

# Alternative transpose methods
print("\nTranspose method:\n", arr2d.transpose())

Original shape: (2, 3)
Original: 
 [[1 2 3]
 [4 5 6]]
Transposed shape:  (3, 2)
Transposed:
  [[1 4]
 [2 5]
 [3 6]]

Transpose method:
 [[1 4]
 [2 5]
 [3 6]]


Transposing swaps rows and columns. This is crucial for matrix multiplication, where the inner dimensions must match, and when you need to change data orientation (e.g., from samples×features to features×samples).
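
As an illustration of why the transpose matters for matrix multiplication, the sketch below multiplies a (2, 3) array by its own transpose using the `@` operator. The name `X` and the samples/features reading are assumptions made for the example.

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3): 2 samples, 3 features

# (2, 3) @ (3, 2) -> (2, 2): the inner dimensions (3 and 3) must match
gram = X @ X.T
print(gram)                        # [[14 32]
                                   #  [32 77]]

# (3, 2) @ (2, 3) -> (3, 3)
print((X.T @ X).shape)             # (3, 3)

# .T is a view, so no data is copied
print(np.shares_memory(X, X.T))    # True
```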

In [16]:
# Higher-dimensional transposition
arr3d = np.arange(24).reshape(2, 3, 4)  # 2 layers, 3 rows, 4 columns
print("Original 3D array:\n", arr3d)

# specify new axis order: (axis 0, axis 1, axis 2) -> (axis 2, axis 0, axis 1)
transposed_3d = arr3d.transpose(2, 0, 1)
print("Transposed to 3D shape:\n", transposed_3d.shape)

# another way to rearrange axes: move axis 0 to the last position
moved = np.moveaxis(arr3d, 0, -1)
print("Moveaxis result shape: ", moved.shape)

Original 3D array:
 [[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
Transposed to 3D shape:
 (4, 2, 3)
Moveaxis result shape:  (3, 4, 2)


For 3D+ arrays, you specify the new order of axes. This is useful for rearranging data to match what different algorithms or visualization tools expect.

**Concatenating and Splitting Arrays**

- Combining and dividing arrays is fundamental for data manipulation

In [17]:
# concatenation - joining arrays along existing axes
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

concat_rows = np.concatenate([arr1, arr2], axis=0)  # stack vertically (add rows)
concat_cols = np.concatenate([arr1, arr2], axis=1)  # stack horizontally (add columns)

print("Original arrays:")
print("Array 1:\n", arr1)
print("Array 2:\n", arr2)
print("Concatenated vertically (axis=0):\n", concat_rows)
print("Concatenated horizontally (axis=1):\n", concat_cols)


Original arrays:
Array 1:
 [[1 2]
 [3 4]]
Array 2:
 [[5 6]
 [7 8]]
Concatenated vertically (axis=0):
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]
Concatenated horizontally (axis=1):
 [[1 2 5 6]
 [3 4 7 8]]


`axis=0` means along rows (vertical stacking), `axis=1` means along columns (horizontal stacking). Arrays must have compatible shapes along the non-concatenated dimensions.

In [18]:
# convenient stacking functions
vstack_result = np.vstack([arr1, arr2])   # same as concatenate with axis=0
hstack_result = np.hstack([arr1, arr2])   # same as concatenate with axis=1 (for 2D arrays)
dstack_result = np.dstack([arr1, arr2])   # stack along depth (3rd dimension)

print("vstack (vertical):\n", vstack_result)
print("hstack (horizontal):\n", hstack_result)
print("dstack shape:", dstack_result.shape)  # Creates 3D array

# splitting arrays - opposite of concatenation
arr = np.arange(12).reshape(3, 4)
split_arrays = np.split(arr, 3, axis = 0)

print("Original array for splitting: \n", arr)
print("Split into 3 parts along rows: ")

for i, split_part in enumerate(split_arrays):
    print(f"Part{i}: \n", split_part)

vstack (vertical):
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]
hstack (horizontal):
 [[1 2 5 6]
 [3 4 7 8]]
dstack shape: (2, 2, 2)
Original array for splitting: 
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Split into 3 parts along rows: 
Part0: 
 [[0 1 2 3]]
Part1: 
 [[4 5 6 7]]
Part2: 
 [[ 8  9 10 11]]


Stacking functions are shortcuts for concatenation. `np.split` divides an array into equal parts and raises an error if the axis cannot be divided evenly; it is useful for creating training/validation sets or processing data in chunks.
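
For splits that are not equal, two options are worth knowing: passing explicit split positions to `np.split`, or using `np.array_split`, which tolerates uneven sizes. The sketch below uses a toy array and illustrative names (`train`, `valid`).

```python
import numpy as np

data = np.arange(10)

# Split at explicit positions: first 8 elements, then the rest
train, valid = np.split(data, [8])
print(train)    # [0 1 2 3 4 5 6 7]
print(valid)    # [8 9]

# array_split tolerates sizes that do not divide evenly
chunks = np.array_split(data, 3)
print([len(c) for c in chunks])   # [4, 3, 3]

# split with an integer count requires an even division
try:
    np.split(data, 3)
except ValueError as exc:
    print("ValueError:", exc)
```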

**Mathematical Operations**


**Element-wise Operations - The Power of Vectorization**

- NumPy's biggest advantage is performing operations on entire arrays without writing loops.

In [19]:
# Basic arithmetic operations work element-by-element
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([10, 20, 30, 40])

print("Array 1:", arr1)
print("Array 2:", arr2)

# All operations happen element-wise automatically
print("Addition:", arr1 + arr2)               # [11, 22, 33, 44]
print("Subtraction:", arr2 - arr1)            # [9, 18, 27, 36]
print("Multiplication:", arr1 * arr2)         # [10, 40, 90, 160]
print("Division:", arr2 / arr1)               # [10, 10, 10, 10]
print("Power:", arr1 ** 2)                    # [1, 4, 9, 16]
print("Modulo:", arr2 % 3)                    # [1, 2, 0, 1]

Array 1: [1 2 3 4]
Array 2: [10 20 30 40]
Addition: [11 22 33 44]
Subtraction: [ 9 18 27 36]
Multiplication: [ 10  40  90 160]
Division: [10. 10. 10. 10.]
Power: [ 1  4  9 16]
Modulo: [1 2 0 1]


Vectorized operations like these are much faster than Python loops because the looping happens in optimized C code. Each operation applies to corresponding elements of the two arrays.
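
The speed difference can be measured rather than taken on faith. The following rough sketch times a Python-level loop against the equivalent vectorized expression using `time.perf_counter`; the array size is an arbitrary choice, and the measured ratio will vary by machine and NumPy version.

```python
import time
import numpy as np

values = np.arange(200_000, dtype=np.float64)

# Pure Python: one interpreted multiplication per element
start = time.perf_counter()
loop_result = [v * 2.0 for v in values]
loop_seconds = time.perf_counter() - start

# Vectorized: a single call that loops in compiled code
start = time.perf_counter()
vector_result = values * 2.0
vector_seconds = time.perf_counter() - start

print(f"Loop:       {loop_seconds:.4f} s")
print(f"Vectorized: {vector_seconds:.4f} s")
print("Same result:", np.allclose(loop_result, vector_result))
```

For more reliable numbers, each measurement should be repeated several times (for example with the `timeit` module) instead of relying on a single run.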

In [20]:
# operations with scalars - broadcasting in action
print("Scalar Operations: ")
print("Add 10 to all elements:", arr1 + 10)         # [11, 12, 13, 14]
print("Multiply all by 3:", arr1 * 3)               # [3, 6, 9, 12]
print("Divide all by 2:", arr1 / 2)                 # [0.5, 1, 1.5, 2]

# Compound operations
result = (arr1 + 5) * 2 - 1                         # ((arr1 + 5) * 2) - 1
print("Compound operation (arr1 + 5) * 2 - 1:", result)

Scalar Operations: 
Add 10 to all elements: [11 12 13 14]
Multiply all by 3: [ 3  6  9 12]
Divide all by 2: [0.5 1.  1.5 2. ]
Compound operation (arr1 + 5) * 2 - 1: [11 13 15 17]


When you operate on arrays with scalars, the scalar is automatically "broadcast" to match the array shape. This is much more readable and efficient than manual loops.

**Mathematical Functions - Beyond Basic Arithmetic**

- NumPy provides vectorized versions of most mathematical functions.

In [21]:
arr = np.array([1, 4, 9, 16, 25])
print("Original array:", arr)

# square roots and powers
print("Square root: ", np.sqrt(arr))
print("Square: ", np.square(arr))
print("Cube root: ", np.cbrt(arr))

# Exponential and logarithmic functions
small_arr = np.array([1, 2, 3])
print("Exponential:", np.exp(small_arr))
print("Natural  log:", np.log(arr))
print("Log base 10:", np.log10(arr))
print("Log base 2:", np.log2(arr))

Original array: [ 1  4  9 16 25]
Square root:  [1. 2. 3. 4. 5.]
Square:  [  1  16  81 256 625]
Cube root:  [1.  1.6 2.1 2.5 2.9]
Exponential: [ 2.7  7.4 20.1]
Natural  log: [0.  1.4 2.2 2.8 3.2]
Log base 10: [0.  0.6 1.  1.2 1.4]
Log base 2: [0.  2.  3.2 4.  4.6]


These functions are much faster than applying Python's `math` functions in a loop. They also handle edge cases differently: `np.log(0)` returns `-inf` with a `RuntimeWarning`, whereas `math.log(0)` raises a `ValueError`.
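
The log-of-zero edge case can be reproduced as follows. This sketch uses `np.errstate` to silence the warning locally; whether to ignore, warn, or raise is a choice that depends on the application.

```python
import math
import numpy as np

values = np.array([0.0, 1.0, 10.0])

# NumPy returns -inf for log(0) and emits a RuntimeWarning;
# errstate silences the warning inside this block only
with np.errstate(divide="ignore"):
    logs = np.log(values)
print(logs)                     # the first element is -inf

# The standard library raises an exception instead
try:
    math.log(0.0)
except ValueError as exc:
    print("math.log raised:", exc)
```

Using `np.errstate(divide="raise")` instead would turn the NumPy warning into an exception, which can be preferable when a zero input indicates a data problem.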

In [22]:
# trigonometric functions - essential for signal processing and geometry
angles = np.array([0, np.pi/4, np.pi/2, np.pi])
print("Angles (radians):", angles)
print("Sine:", np.sin(angles))
print("Cosine:", np.cos(angles))
print("Tangent:", np.tan(angles))

# convert degrees to radians and back
degrees = np.array([0, 45, 90, 180])
radians = np.deg2rad(degrees)
degree = np.rad2deg(radians)
print("Degrees to radians:", radians)
print("Radians to degree:", degree)

Angles (radians): [0.  0.8 1.6 3.1]
Sine: [0.  0.7 1.  0. ]
Cosine: [ 1.   0.7  0.  -1. ]
Tangent: [ 0.0e+00  1.0e+00  1.6e+16 -1.2e-16]
Degrees to radians: [0.  0.8 1.6 3.1]
Radians to degree: [  0.  45.  90. 180.]


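Trigonometric functions expect angles in radians. Use `deg2rad()` and `rad2deg()` for conversions. Because π/2 cannot be represented exactly in floating point, `np.tan(np.pi/2)` returns a very large number (about 1.6e16) rather than infinity. These functions are essential for signal processing, computer graphics, and physics simulations.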

In [23]:
# rounding and comparison functions

decimals = np.array([1.234, 5.678, 9.999, -2.345])
print("Original decimals: ", decimals)
print("Round to 2 places: ", np.round(decimals, 2))
print("Floor (round down): ", np.floor(decimals))
print("Ceiling (round up):", np.ceil(decimals))      # [2, 6, 10, -2]
print("Truncate (toward zero):", np.trunc(decimals))

# absolute values and sign
print("Absolute values:", np.abs(decimals))
print("sign (-1, 0, or 1): ", np.sign(decimals))

Original decimals:  [ 1.2  5.7 10.  -2.3]
Round to 2 places:  [ 1.2  5.7 10.  -2.4]
Floor (round down):  [ 1.  5.  9. -3.]
Ceiling (round up): [ 2.  6. 10. -2.]
Truncate (toward zero): [ 1.  5.  9. -2.]
Absolute values: [ 1.2  5.7 10.   2.3]
sign (-1, 0, or 1):  [ 1.  1.  1. -1.]


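Different rounding functions serve different purposes. `floor()` always rounds down (toward negative infinity), `ceil()` always rounds up, `trunc()` removes the decimal part (rounds toward zero), and `round()` rounds to the nearest value, with exact halves going to the nearest even value.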
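
Two details are easy to miss: `np.round` sends exact halves to the nearest even value, and `floor` and `trunc` only disagree for negative numbers. A short sketch with illustrative inputs:

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])

# np.round sends exact halves to the nearest even integer
rounded = np.round(halves)
print(rounded)               # [0. 2. 2. 4.]

# floor and trunc only differ for negative numbers
negatives = np.array([-2.7, -1.2])
floored = np.floor(negatives)
truncated = np.trunc(negatives)
print(floored)               # [-3. -2.]
print(truncated)             # [-2. -1.]
```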

**Aggregate Functions - Summarizing Your Data**

- Aggregate functions reduce arrays to summary statistics

In [24]:
# create a 2D array for demonstration
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
print("sample array: \n", arr)

# Aggregation across entire array
print("Sum of all elements:", np.sum(arr))        # 45
print("Mean of all elements:", np.mean(arr))      # 5.0
print("Standard deviation:", np.std(arr))         # 2.58
print("Minimum value:", np.min(arr))              # 1
print("Maximum value:", np.max(arr))              # 9


sample array: 
 [[1 2 3]
 [4 5 6]
 [7 8 9]]
Sum of all elements: 45
Mean of all elements: 5.0
Standard deviation: 2.581988897471611
Minimum value: 1
Maximum value: 9


When you don't specify an axis, these functions operate on the flattened array, giving you a single summary value for the entire dataset.

In [None]:
# Axis-specific aggregation

print("Sum along axis 0 (columns):", np.sum(arr, axis=0))  # [12, 15, 18]
print("Sum along axis 1 (rows):", np.sum(arr, axis=1))     # [6, 15, 24]

print("Mean along axis 0:", np.mean(arr, axis=0))          # [4, 5, 6]
print("Mean along axis 1:", np.mean(arr, axis=1))          # [2, 5, 8]

# Finding positions of extreme values
print("Position of max (flattened):", np.argmax(arr))      # 8 (element 9 at position 8)
print("Position of max along axis 0:", np.argmax(arr, axis=0))  # [2, 2, 2]
print("Position of max along axis 1:", np.argmax(arr, axis=1))  # [2, 2, 2]

Sum along axis 0 (columns): [12 15 18]
Sum along axis 1 (rows): [ 6 15 24]
Mean along axis 0: [4. 5. 6.]
Mean along axis 1: [2. 5. 8.]
Position of max (flattened): 8
Position of max along axis 0: [2 2 2]
Position of max along axis 1: [2 2 2]


- `axis=0`: the operation runs "down" the rows, collapsing them to give one result per column
- `axis=1`: the operation runs "across" the columns, collapsing them to give one result per row
- `argmax`/`argmin` return indices, not values
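
With a square 3×3 array the two axes are easy to confuse, because both results have three entries. The sketch below uses a non-square array (an illustrative choice) so the result shapes make the axis behavior visible, and shows how `np.unravel_index` turns a flat `argmax` position into a (row, column) pair.

```python
import numpy as np

table = np.arange(6).reshape(2, 3)   # [[0 1 2]
                                     #  [3 4 5]]

# axis=0 collapses the 2 rows -> one value per column, shape (3,)
col_sums = table.sum(axis=0)
print(col_sums)                      # [3 5 7]

# axis=1 collapses the 3 columns -> one value per row, shape (2,)
row_sums = table.sum(axis=1)
print(row_sums)                      # [ 3 12]

# argmax without an axis returns a flat index; unravel_index converts it to (row, col)
flat_index = np.argmax(table)
row, col = np.unravel_index(flat_index, table.shape)
print(flat_index, (int(row), int(col)))   # 5 (1, 2)
```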

**Broadcasting**
- Broadcasting is NumPy's way of performing operations on arrays with different shapes without explicitly reshaping them. This is one of NumPy's most powerful features.

In [28]:
# Broadcasting examples - automatic expansion of arrays for element-wise operations
scalar = 5
arr1d = np.array([1, 2, 3, 4])
arr2d = np.array([[10], [20], [30]])  # Column vector

print("Scalar:", scalar)
print("1D array:", arr1d)  
print("2D array (column vector):\n", arr2d)

# Scalar broadcasts to any shape
result1 = scalar + arr1d
print("Scalar + 1D array:", result1)         # [6, 7, 8, 9]

# 2D + 1D broadcasting
result2 = arr2d + arr1d
print("2D + 1D broadcasting:\n", result2)

Scalar: 5
1D array: [1 2 3 4]
2D array (column vector):
 [[10]
 [20]
 [30]]
Scalar + 1D array: [6 7 8 9]
2D + 1D broadcasting:
 [[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


- Broadcasting follows these rules:

 - Shapes are compared starting from the trailing (rightmost) dimensions
 - Two dimensions are compatible if they are equal or one of them is 1
 - Missing leading dimensions are treated as 1
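
These rules can be tested without creating any arrays by using `np.broadcast_shapes` (available in NumPy 1.20 and later), which returns the resulting shape or raises `ValueError` when the shapes are incompatible. A minimal sketch:

```python
import numpy as np

# (3, 1) with (4,): (4,) is treated as (1, 4), so the result is (3, 4)
shape_a = np.broadcast_shapes((3, 1), (4,))
print(shape_a)                                   # (3, 4)

# (2, 3, 4) with (3, 1): (3, 1) is treated as (1, 3, 1)
shape_b = np.broadcast_shapes((2, 3, 4), (3, 1))
print(shape_b)                                   # (2, 3, 4)

# (3,) with (4,): the trailing dimensions 3 and 4 differ and neither is 1
try:
    np.broadcast_shapes((3,), (4,))
except ValueError as exc:
    print("Incompatible:", exc)
```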

In [30]:
# visualizing broadcasting step by step
a = np.arange(4).reshape(4, 1)
b = np.arange(5).reshape(1, 5)

print("Array a (4x1):\n", a)
print("Array b (1x5):\n", b)

# broadcasting to create a 4 x 5 array
broadcasted = a + b
print("Broadcasted result (4x5):\n", broadcasted)
print("Shape of broadcasted result:", broadcasted.shape)


Array a (4x1):
 [[0]
 [1]
 [2]
 [3]]
Array b (1x5):
 [[0 1 2 3 4]]
Broadcasted result (4x5):
 [[0 1 2 3 4]
 [1 2 3 4 5]
 [2 3 4 5 6]
 [3 4 5 6 7]]
Shape of broadcasted result: (4, 5)


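Array `a` is broadcast horizontally (its single column is repeated across 5 columns) and array `b` is broadcast vertically (its single row is repeated down 4 rows). This produces all pairwise combinations without first building expanded copies of the inputs.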

In [38]:
arr = np.array([1, 2, 3])

print("Original Shape:", arr.shape)

# Convert to column vector
column_vector = arr[:, np.newaxis]  # Same as arr.reshape(-1, 1)
print("Column vector shape:", column_vector.shape)   # (3, 1)
print("Column vector:\n", column_vector)

# Convert to row vector (usually not needed - 1D arrays broadcast as rows)
row_vector = arr[np.newaxis, :]                     # Same as arr.reshape(1, -1)
print("Row vector shape:", row_vector.shape)        # (1, 3)
print("row vector:\n", row_vector)

Original Shape: (3,)
Column vector shape: (3, 1)
Column vector:
 [[1]
 [2]
 [3]]
Row vector shape: (1, 3)
row vector:
 [[1 2 3]]


`np.newaxis` is an alias for `None` and adds a new axis of length 1. This gives you explicit control over broadcasting behavior.
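
A typical use of this explicit control is building a table of all pairwise differences from a single 1D array. The sketch below is an illustrative example; the name `points` is an assumption.

```python
import numpy as np

points = np.array([1, 2, 3])

# (3, 1) - (1, 3) broadcasts to (3, 3): entry [i, j] is points[i] - points[j]
differences = points[:, np.newaxis] - points[np.newaxis, :]
print(differences)
# [[ 0 -1 -2]
#  [ 1  0 -1]
#  [ 2  1  0]]

# None works identically because np.newaxis is None
print(np.newaxis is None)    # True
```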

**Common Broadcasting Patterns**

In [47]:
# Pattern 1: Centering data by subtracting column means
data = np.random.randn(5, 3)  # 5 samples, 3 features
print("Original data shape:", data.shape)
print("Original data:\n", data)

# calculating mean for each column
column_mean = np.mean(data, axis=0)
print("Column means:", column_mean)

# Subtract mean from each column (broadcasting!)
centered_data = data - column_mean
print("Centered data:\n", centered_data)
print("New column means (should be ~0):", np.mean(centered_data, axis=0))
   

Original data shape: (5, 3)
Original data:
 [[ 0.1  1.8  1. ]
 [ 0.9 -1.3 -0.9]
 [-0.8 -0.8 -0.2]
 [-0.6  0.   0.7]
 [-0.3 -0.2 -0.5]]
Column means: [-0.1 -0.1  0. ]
Centered data:
 [[ 0.2  1.9  1. ]
 [ 1.1 -1.2 -1. ]
 [-0.7 -0.7 -0.2]
 [-0.5  0.1  0.7]
 [-0.1 -0.1 -0.5]]
New column means (should be ~0): [0. 0. 0.]


This is a common preprocessing step in machine learning. The column means broadcast across all rows automatically.

In [54]:
# Pattern 2: Normalizing by row sums (useful for probabilities)
data = np.random.rand(4, 3)  # Random data
print("Random data:\n", data)

# Calculate row and column sums, keeping the reduced axis as size 1
row_sums = np.sum(data, axis=1, keepdims=True)     # Shape: (4, 1)
column_sums = np.sum(data, axis=0, keepdims=True)  # Shape: (1, 3)
print("Row sums shape:", row_sums.shape)
print("Column sums shape:", column_sums.shape)

print("Row sums:\n", row_sums)

print("Column sums:\n", column_sums)


Random data:
 [[0.1 0.1 0.8]
 [0.1 0.1 0.6]
 [0.6 0.7 0.3]
 [0.8 0.5 0.5]]
Row sums shape: (4, 1)
Column sums shape: (1, 3)
Row sums:
 [[1.1]
 [0.9]
 [1.5]
 [1.8]]
Column sums:
 [[1.6 1.5 2.1]]


`keepdims=True` preserves the dimension as size 1, making broadcasting explicit and avoiding shape errors.
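
The normalization step itself is a single division once the row sums have shape (4, 1). The sketch below uses a small fixed array instead of random data (an assumption made so the result is reproducible) and also shows the shape error that occurs without `keepdims=True`.

```python
import numpy as np

counts = np.array([[1.0, 1.0, 2.0],
                   [2.0, 2.0, 4.0],
                   [6.0, 1.0, 1.0],
                   [1.0, 0.0, 1.0]])      # shape (4, 3)

# keepdims=True gives shape (4, 1), which broadcasts against (4, 3)
row_totals = counts.sum(axis=1, keepdims=True)
probabilities = counts / row_totals
print(probabilities.sum(axis=1))          # [1. 1. 1. 1.]

# Without keepdims the sums have shape (4,), which does not line up with (4, 3)
try:
    counts / counts.sum(axis=1)
except ValueError as exc:
    print("ValueError:", exc)
```

Each row of `probabilities` now sums to 1, which is the form expected when rows represent probability distributions.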