### **Understanding NumPy**

- NumPy (Numerical Python) is the foundational library for scientific computing in Python.
- It provides an N-dimensional array object (`ndarray`) and tools for working with these arrays.

- Think of NumPy as the engine that powers most Python data science libraries: pandas uses NumPy arrays internally, scikit-learn expects NumPy arrays as input, and matplotlib uses NumPy arrays for plotting.

- NumPy's core operations are implemented in C, which typically makes them 10-100x faster than equivalent pure Python loops.
- NumPy arrays store data more compactly than Python lists: one contiguous block of fixed-size values instead of pointers to separate Python objects.
- It performs operations on entire arrays without explicit Python loops (vectorization).
- Foundation for pandas, scikit-learn, matplotlib, and more.
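To make the speed and memory claims above concrete, here is a small sketch comparing a pure Python loop with a vectorized NumPy expression. The array size (one million) is an arbitrary choice, the printed timings will vary by machine, and the list memory figure is only an approximation built from `sys.getsizeof`, so treat the numbers as illustrative rather than a benchmark.

```python
import sys
import time

import numpy as np

n = 1_000_000  # arbitrary size chosen for the demo
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Pure Python: squares each element one at a time in the interpreter
start = time.perf_counter()
py_squares = [x * x for x in py_list]
loop_seconds = time.perf_counter() - start

# NumPy: one vectorized expression; the loop runs inside compiled code
start = time.perf_counter()
np_squares = np_arr * np_arr
vec_seconds = time.perf_counter() - start

print(f"Python loop: {loop_seconds:.4f} s, NumPy vectorized: {vec_seconds:.4f} s")

# Memory: the list holds pointers to separate int objects,
# while the array holds raw 8-byte integers in one contiguous buffer
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print(f"List: ~{list_bytes / 1e6:.1f} MB, array: {np_arr.nbytes / 1e6:.1f} MB")
```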

In [17]:
# import all necessary libraries

import numpy as np
import matplotlib.pyplot as plt
import time

# check numpy version
print("Numpy version:", np.__version__)

# Display settings for cleaner output
np.set_printoptions(precision=1, suppress=True)

Numpy version: 2.3.2


**Creating NumPy Arrays**

In [18]:
# creating arrays from python lists
# 1D array: a simple sequence of numbers
arr1d = np.array([1, 2, 3, 4, 5])

# 2D array: think of this as a matrix or table with rows and columns
arr2d = np.array([[1, 2, 3],
                  [4, 5, 6]])

# 3D array: like a stack of 2D arrays, useful for images, time series, etc.
arr3d = np.array([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]]
])

print("1D array: ", arr1d)
print("2D array:\n ", arr2d)
print("3D array:\n ", arr3d)


1D array:  [1 2 3 4 5]
2D array:
  [[1 2 3]
 [4 5 6]]
3D array:
  [[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]


When you pass a nested list to `np.array()`, NumPy infers the dimensions from how deeply the lists are nested. A 1D array is like a single row, a 2D array is like a spreadsheet, and a 3D array is like several spreadsheets stacked together.

**Creating Special Arrays in NumPy**


In [19]:
# Arrays filled with zeros: useful for initializing arrays
# Shape (3, 4) means 3 rows and 4 columns
zeros = np.zeros((3, 4))

# Arrays filled with ones: often used as starting points
ones = np.ones((2, 3, 4))  # 3D array: 2 layers, 3 rows, 4 columns

# Empty array: slightly faster than zeros/ones because it skips initialization,
# so it holds whatever values were already in memory.
# Use it only when you will immediately fill the array with real data.
empty = np.empty((2, 3))

print("Zeros Array (3x4): \n", zeros)
print("Ones array (2x3x4):\n ", ones)
print("Ones array shape:\n ", ones.shape)
print("Empty array (contains random values): \n", empty)


Zeros Array (3x4): 
 [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
Ones array (2x3x4):
  [[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]
Ones array shape:
  (2, 3, 4)
Empty array (contains random values): 
 [[0. 0. 0.]
 [0. 0. 0.]]


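`zeros()` and `ones()` allocate an array of the requested shape and fill every element with 0 or 1 (as `float64` by default, hence the `0.` and `1.` in the output). `empty()` skips the fill step, so it is slightly faster, but its contents are whatever happened to be in memory (here they happened to be zeros, which is not guaranteed). Only use it when you will immediately overwrite every element.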
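A short sketch of the patterns described above: passing `dtype` to get integer zeros, the safe way to use `empty()` (fill every slot before reading), and `np.full_like`, one of the `*_like` helpers (alongside `zeros_like`, `ones_like` and `empty_like`) that copy the shape and dtype of an existing array. The array sizes are arbitrary.

```python
import numpy as np

# zeros()/ones() default to float64; pass dtype to get integers instead
int_zeros = np.zeros((2, 3), dtype=np.int32)
print(int_zeros.dtype)  # int32

# Safe use of empty(): allocate, then overwrite every element before reading
squares = np.empty(5)
for i in range(5):
    squares[i] = i ** 2
print(squares)

# full_like copies the shape and dtype of an existing array
template = np.arange(6).reshape(2, 3)
filled = np.full_like(template, 7)
print(filled.shape, filled.dtype)
```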

In [20]:
# Range arrays - like Python's range() but more powerful
range_arr = np.arange(0, 10, 2) #Start, Stop, Step: [0, 2, 4, 6, 8]
print("Range array: ", range_arr)

# Linearly spaced arrays: divide a range into equal parts
# From 0 to 5 with exactly 50 points (both endpoints included).
# The true step is 5/49 (about 0.102); precision=1 rounds the printed values.
linspace_arr = np.linspace(0, 5, 50)
print("Linspace array: ", linspace_arr)

# Logarithmically spaced arrays - useful for scientific data
# From 10^0 to 10^2 (1 to 100) with 5 points
logspace_arr = np.logspace(0, 2, 5)
print("Logspace array: ", logspace_arr)

Range array:  [0 2 4 6 8]
Linspace array:  [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.  1.1 1.2 1.3 1.4 1.5 1.6 1.7
 1.8 1.9 2.  2.1 2.2 2.3 2.4 2.6 2.7 2.8 2.9 3.  3.1 3.2 3.3 3.4 3.5 3.6
 3.7 3.8 3.9 4.  4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5. ]
Logspace array:  [  1.    3.2  10.   31.6 100. ]
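Because `set_printoptions(precision=1)` rounds the display, the linspace output above looks like steps of 0.1 (with an apparent jump from 2.4 to 2.6). The sketch below checks the real spacing with `retstep=True`, shows how `endpoint=False` changes it, and contrasts the inclusive stop of `linspace` with the exclusive stop of `arange`.

```python
import numpy as np

# retstep=True also returns the spacing between samples
samples, step = np.linspace(0, 5, 50, retstep=True)
print("Number of points:", samples.size)
print("True step:", step)  # (5 - 0) / (50 - 1), about 0.102

# endpoint=False drops the stop value, so the step becomes 5 / 50 = 0.1
samples_open, step_open = np.linspace(0, 5, 50, endpoint=False, retstep=True)
print("Step with endpoint=False:", step_open)

# arange excludes stop, like range(); linspace includes it by default
print(np.arange(0, 5, 1))    # 5 is not included
print(np.linspace(0, 5, 6))  # 5 is included
```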


In [21]:
# Identity matrix - diagonal of ones, zeros elsewhere
# Essential for linear algebra operations

identity = np.eye(4) # 4x4 identity matrix

# Diagonal matrix: place the values on the main diagonal
diagonal = np.diag([1, 2, 3, 4, 5])

# array filled with a specific value
full_arr = np.full((3, 3), 7) # 3 x 3 array filled with 7

print("Identity matrix: \n", identity)
print("\nDiagonal matrix: \n", diagonal)
print("\nFull array (filled with 7):\n", full_arr)

Identity matrix: 
 [[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

Diagonal matrix: 
 [[1 0 0 0 0]
 [0 2 0 0 0]
 [0 0 3 0 0]
 [0 0 0 4 0]
 [0 0 0 0 5]]

Full array (filled with 7):
 [[7 7 7]
 [7 7 7]
 [7 7 7]]
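Two properties worth seeing in code: multiplying by the identity matrix leaves a matrix unchanged, and `np.diag` works in both directions (a 1D input builds a diagonal matrix, a 2D input extracts the main diagonal). The small matrix `A` is an arbitrary example.

```python
import numpy as np

A = np.array([[2, 1], [3, 4]])
I = np.eye(2)

# A @ I == A for any square A of matching size
print(np.array_equal(A @ I, A))  # True

# 2D input -> extract the main diagonal; 1D input -> build a diagonal matrix
print(np.diag(A))            # [2 4]
print(np.diag(np.diag(A)))   # 2x2 matrix with 2 and 4 on the diagonal
```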


**NumPy Data Types**
- Understanding data types is crucial for memory efficiency and numerical precision.

In [22]:
# Explicit data types: control memory usage and precision
int_arr = np.array([1, 2, 3], dtype=np.int32)             # 32-bit integers
float_arr = np.array([1, 2, 3], dtype=np.float64)         # 64-bit floats (double precision)
bool_arr = np.array([True, False, True], dtype=np.bool_)  # boolean values

# Type conversion: astype() returns a new array with the requested dtype
converted = int_arr.astype(np.float32)  # convert to 32-bit float

print("Integer array dtype: ", int_arr.dtype)
print("Float array dtype: ", float_arr.dtype)
print("Boolean array dtype: ", bool_arr.dtype)
print("Converted array dtype: ", converted.dtype)

# Memory usage comparison
print(f"int32 uses {int_arr.itemsize} bytes per element")
print(f"float64 uses {float_arr.itemsize} bytes per element")

Integer array dtype:  int32
Float array dtype:  float64
Boolean array dtype:  bool
Converted array dtype:  float32
int32 uses 4 bytes per element
float64 uses 8 bytes per element


- int8: 1 byte, range -128 to 127
- int32: 4 bytes, range -2,147,483,648 to 2,147,483,647 (about ±2.1 billion)
- float32: 4 bytes, ~7 significant decimal digits
- float64: 8 bytes, ~15-16 significant decimal digits

- Choose smaller types to save memory, and larger types when you need a wider range or more precision.
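Rather than memorizing the table above, you can query the limits with `np.iinfo` and `np.finfo`. Note that `finfo(...).precision` is a conservative (rounded-down) digit count, so it reports 6 and 15 where the list says ~7 and ~15-16. The sketch also shows why choosing a small type has a cost: integer array arithmetic wraps around silently on overflow. The one-million-element size in the memory loop is arbitrary.

```python
import numpy as np

# Exact integer ranges and approximate float precision
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)    # -128 127
print(np.iinfo(np.int32).min, np.iinfo(np.int32).max)  # -2147483648 2147483647
print(np.finfo(np.float32).precision, np.finfo(np.float64).precision)  # 6 15

# Integer array arithmetic wraps around silently when it overflows
small = np.array([127], dtype=np.int8)
print(small + np.array([1], dtype=np.int8))  # [-128]

# Memory cost of one million elements in each type
for dt in (np.int8, np.int32, np.float32, np.float64):
    print(np.dtype(dt).name, np.zeros(1_000_000, dtype=dt).nbytes, "bytes")
```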

**Array Properties & Attributes**

- Understanding array properties helps you work effectively with your data and debug issues.

In [None]:
# Create a sample 3D array for demonstration
# Think of this as 3 layers, each with 4 rows and 5 columns
arr = np.random.randn(3, 4, 5)
print(arr)
# Shape: The dimensions of the array (layers, rows, columns)
print("Shape:", arr.shape)           # Output: (3, 4, 5)

# Size: Total number of elements (3 × 4 × 5 = 60)
print("Size:", arr.size)             

# Ndim: Number of dimensions (3D in this case)
print("Ndim:", arr.ndim)             

# Dtype: Data type of elements
print("Dtype:", arr.dtype)           # float64 (randn always returns float64)

# Itemsize: Memory size of each element in bytes
print("Itemsize:", arr.itemsize)     # 8 bytes for float64

# Total memory usage in bytes
print("Memory usage:", arr.nbytes, "bytes")  # size × itemsize
print("Memory usage:", arr.nbytes / 1024, "KB")  # Convert to KB
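The cell above uses the legacy `np.random.randn`. The sketch below swaps in the newer Generator API (`np.random.default_rng`), which takes a seed so the demo is reproducible, and highlights a common confusion: `len()` reports only the length of the first axis, while `.size` counts every element.

```python
import numpy as np

# A seeded Generator makes the random data reproducible
rng = np.random.default_rng(seed=0)
arr = rng.standard_normal((3, 4, 5))

# len() counts only the first axis; .size counts every element
print("len(arr):", len(arr))   # 3
print("arr.size:", arr.size)   # 60
print("arr.shape[0]:", arr.shape[0])

# nbytes is exactly size * itemsize
print(arr.nbytes == arr.size * arr.itemsize)  # True
```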

**Array Indexing & Slicing**

**Basic Indexing - Accessing Individual Elements**

- NumPy indexing works like Python list indexing for 1D arrays and extends it with comma-separated indices for multi-dimensional arrays.

In [27]:
# 1D array indexing- similar to Python lists
arr1d = np.array([10, 20, 30, 40, 50])

print("First element:", arr1d[0])     # Index 0: 10
print("Last element:", arr1d[-1])     # Negative indexing: 50  
print("Slice [1:4]:", arr1d[1:4])     # Indices 1, 2, 3 (4 is excluded): [20, 30, 40]
print("Every 2nd element:", arr1d[::2])  # Step of 2: [10, 30, 50]


First element: 10
Last element: 50
Slice [1:4]: [20 30 40]
Every 2nd element: [10 30 50]


Negative indices count from the end (-1 is the last element). Slicing uses `[start:stop:step]`, where `stop` is exclusive.
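A few more slice forms on the same 1D array, showing what happens when `start` or `stop` is omitted and how a negative `step` walks backwards:

```python
import numpy as np

arr1d = np.array([10, 20, 30, 40, 50])

print(arr1d[:3])     # omitted start defaults to 0: [10 20 30]
print(arr1d[2:])     # omitted stop runs to the end: [30 40 50]
print(arr1d[-2:])    # last two elements: [40 50]
print(arr1d[::-1])   # negative step reverses: [50 40 30 20 10]
print(arr1d[1:4:2])  # start 1, stop 4 (exclusive), step 2: [20 40]
```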

In [34]:
# 2D array indexing - row and column access
arr2d = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

# Access specific element : [row, column]
print("Element at row 1, column 2: ", arr2d[1, 2])

# Access entire rows or columns
print("First row: ", arr2d[0, :])     # all columns of row 0
print("Last column: ", arr2d[:, -1])  # all rows of the last column (index -1)

# Subarray slicing: [row_start:row_end, col_start:col_end] (end indices exclusive)
print("Subarray (rows 1-2, cols 1-2):\n", arr2d[1:3, 1:3])


Element at row 1, column 2:  7
First row:  [1 2 3 4]
Last column:  [ 4  8 12]
Subarray (rows 1-2, cols 1-2):
 [[ 6  7]
 [10 11]]


The comma separates dimensions, and `:` means "all elements along this dimension". Basic slicing always returns a view of the original data, not a copy, so modifying a slice also modifies the original array.
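A minimal sketch of the view behavior just described, using `np.shares_memory` to confirm that a slice and its source use the same buffer, and `.copy()` to get an independent subarray:

```python
import numpy as np

arr2d = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

block = arr2d[1:3, 1:3]                 # basic slice -> view
print(np.shares_memory(block, arr2d))   # True

block[0, 0] = 100                       # writes through to arr2d
print(arr2d[1, 1])                      # 100

# Call .copy() when you need an independent subarray
safe = arr2d[1:3, 1:3].copy()
safe[0, 0] = -1
print(arr2d[1, 1])                      # still 100
```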

**Advanced Indexing - Powerful Selection Methods**

In [35]:
# Fancy indexing - use arrays of indices to select elements
arr = np.array([10, 20, 30, 40, 50])
indices = np.array([0, 2, 4])  # Select elements at positions 0, 2, 4
print("Fancy indexing:", arr[indices])         # [10, 30, 50]

# Index arrays are more flexible than slices: indices can repeat and appear in any order
custom_order = np.array([4, 1, 3, 1])
print("Custom order:", arr[custom_order])   # [50, 20, 40, 20]

Fancy indexing: [10 30 50]
Custom order: [50 20 40 20]


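For a 2D array, when you provide an index array for each dimension, NumPy pairs them element-wise: `arr2d[[0, 2], [1, 3]]` selects the elements at (0, 1) and (2, 3), not a 2x2 block. This is different from slicing, which always selects a rectangular subarray. Fancy indexing also returns a copy rather than a view.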
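The cell above only shows 1D fancy indexing, so here is a sketch of the 2D case: paired index arrays pick individual elements, while `np.ix_` turns the same row and column lists into a rectangular block. The `arr2d` values reuse the 3x4 example from the basic indexing cell.

```python
import numpy as np

arr2d = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

rows = np.array([0, 2])
cols = np.array([1, 3])

# Paired element-wise: picks (0, 1) and (2, 3)
print(arr2d[rows, cols])  # [ 2 12]

# np.ix_ builds an open mesh, so every listed row meets every listed column
print(arr2d[np.ix_(rows, cols)])  # 2x2 block with values 2, 4, 10, 12

# Unlike basic slicing, fancy indexing returns a copy
picked = arr2d[rows, cols]
print(np.shares_memory(picked, arr2d))  # False
```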

**Array Reshaping & Manipulation**

- Reshaping changes the shape of an array (how its elements are grouped into dimensions) without changing the values or their order.


In [40]:
# Start with a 1D array
arr = np.arange(12)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print("Original 1D array:", arr)

# Reshape to 2D: 3 rows × 4 columns
reshaped_2d = arr.reshape(3, 4)
print("Reshaped to 3x4:\n", reshaped_2d)

# Reshape to 3D: 2 layers × 2 rows × 3 columns  
reshaped_3d = arr.reshape(2, 2, 3)
print("Reshaped to 2x2x3:\n", reshaped_3d)

# Use -1 to let NumPy calculate one dimension automatically
auto_reshape = arr.reshape(4, -1)  # 4 rows, NumPy calculates columns
print("Auto-reshaped to 4x?:\n", auto_reshape)

Original 1D array: [ 0  1  2  3  4  5  6  7  8  9 10 11]
Reshaped to 3x4:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Reshaped to 2x2x3:
 [[[ 0  1  2]
  [ 3  4  5]]

 [[ 6  7  8]
  [ 9 10 11]]]
Auto-reshaped to 4x?:
 [[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]


The total number of elements must stay the same (12 in this case). Passing -1 for one dimension tells NumPy to compute it from the others. Reshaping returns a view when possible, not a copy.
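Both rules can be checked directly: a shape whose element count differs from 12 raises `ValueError`, and a reshape of a contiguous array is a view, so writes through it show up in the original.

```python
import numpy as np

arr = np.arange(12)

# The new shape must multiply out to the same element count (12)
try:
    arr.reshape(5, 3)  # 15 != 12
except ValueError as err:
    print("Cannot reshape:", err)

# Reshaping a contiguous array gives a view that shares memory
grid = arr.reshape(3, 4)
grid[0, 0] = -1
print(arr[0])                       # -1
print(np.shares_memory(grid, arr))  # True
```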

In [46]:
# Flattening - convert multi-dimensional array to 1D
arr2d = np.array([[1, 2, 3], [4, 5, 6]])

# flatten() always returns a copy
flattened = arr2d.flatten()                 
print("Flattened (copy):", flattened)

# ravel() returns a view if possible (faster, memory efficient)
ravel = arr2d.ravel()                       
print("Ravel (view if possible):", ravel)

# Demonstrate the difference
arr2d[0, 0] = 999
print("After modifying original:")
print("Flattened (unchanged):", flattened)  # Copy is independent
print("Ravel (changed):", ravel)            # View reflects changes

Flattened (copy): [1 2 3 4 5 6]
Ravel (view if possible): [1 2 3 4 5 6]
After modifying original:
Flattened (unchanged): [1 2 3 4 5 6]
Ravel (changed): [999   2   3   4   5   6]


Use `ravel()` when you only need to read the flattened data: it avoids a copy when it can, but changes to the original may then show up in the result. Use `flatten()` when you need a separate copy that won't be affected by changes to the original.
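"View if possible" has a practical limit: `ravel()` reads elements in C (row-major) order by default, so if the array's memory is not laid out that way, as with a transpose, it has to copy. A small sketch:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6]])

# C-contiguous array: ravel() returns a view
print(np.shares_memory(arr2d.ravel(), arr2d))  # True

# The transpose is not C-contiguous, so ravel() must copy
transposed = arr2d.T
print(transposed.flags["C_CONTIGUOUS"])             # False
print(np.shares_memory(transposed.ravel(), arr2d))  # False
print(transposed.ravel())                           # [1 4 2 5 3 6]
```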

**Transposing and Swapping Axes**

- Transposing is essential for matrix operations and changing data orientation.

In [52]:
# 2D transposition- flip rows and columns
arr2d = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

print("Original shape:", arr2d.shape)  # (2, 3)
print("Original: \n", arr2d)

print("Transposed shape: ", arr2d.T.shape)
print("Transposed:\n ", arr2d.T)

# Alternative transpose methods
print("\nTranspose method:\n", arr2d.transpose())

Original shape: (2, 3)
Original: 
 [[1 2 3]
 [4 5 6]]
Transposed shape:  (3, 2)
Transposed:
  [[1 4]
 [2 5]
 [3 6]]

Transpose method:
 [[1 4]
 [2 5]
 [3 6]]


Transposing swaps rows and columns. This is crucial for matrix multiplication and whenever you need to change data orientation (e.g., from samples × features to features × samples).
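A sketch of the samples × features case, using a hypothetical 4 × 3 data matrix `X`: the transpose is a view (no data is copied), and `X.T @ X` contracts over the samples axis to give a features × features matrix.

```python
import numpy as np

# Hypothetical data: 4 samples (rows) x 3 features (columns)
X = np.arange(12, dtype=float).reshape(4, 3)

print(X.T.shape)                 # (3, 4): features x samples

# .T is a view on the same data
print(np.shares_memory(X.T, X))  # True

# Contracting over samples gives a features x features matrix
gram = X.T @ X
print(gram.shape)                # (3, 3)
```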

In [59]:
# Higher-dimensional transposition
arr3d = np.arange(24).reshape(2, 3, 4)  # 2 layers, 3 rows, 4 columns
print(f"Original array, shape {arr3d.shape}:\n", arr3d)

# Specify the new axis order: (axis0, axis1, axis2) -> (axis2, axis0, axis1)
transposed_3d = arr3d.transpose(2, 0, 1)
print(f"Transposed array, shape {transposed_3d.shape}:\n", transposed_3d)


Original array, shape (2, 3, 4):
 [[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
Transposed array, shape (4, 2, 3):
 [[[ 0  4  8]
  [12 16 20]]

 [[ 1  5  9]
  [13 17 21]]

 [[ 2  6 10]
  [14 18 22]]

 [[ 3  7 11]
  [15 19 23]]]
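The section heading also mentions swapping axes. NumPy offers two shortcuts for common reorderings: `np.swapaxes` exchanges exactly two axes, and `np.moveaxis` moves one axis while keeping the others in order. This sketch reuses the same `arange(24)` array and checks that moving the last axis to the front matches `transpose(2, 0, 1)`.

```python
import numpy as np

arr3d = np.arange(24).reshape(2, 3, 4)

# swapaxes exchanges exactly two axes
print(np.swapaxes(arr3d, 0, 2).shape)  # (4, 3, 2)

# moveaxis moves one axis and keeps the rest in their original order
moved = np.moveaxis(arr3d, -1, 0)      # last axis to the front
print(moved.shape)                     # (4, 2, 3)

# Same result as transpose(2, 0, 1)
print(np.array_equal(moved, arr3d.transpose(2, 0, 1)))  # True
```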
