### **`Understanding NumPy`**

- NumPy (short for Numerical Python) is the foundation library for scientific computing in Python.
- It provides a powerful N-dimensional array object and tools for working with these arrays.
- It underpins most data science libraries: Pandas uses NumPy arrays internally, Scikit-learn expects NumPy arrays for machine learning, and Matplotlib uses NumPy for plotting.

- NumPy operations are implemented in C, which often makes them 10-100x faster than equivalent pure-Python loops (the exact speedup depends on the operation and array size)
- NumPy arrays store data more compactly than Python lists
- Vectorization: Perform operations on entire arrays without writing loops
- Broadcasting: Work with arrays of different shapes seamlessly
- Foundation for Pandas, Scikit-learn, Matplotlib and more
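
The speed, vectorization, and shape-handling points above can be sketched roughly as follows. The array size and variable names are illustrative assumptions, and the measured times will vary by machine, so treat the timing as a rough indication rather than a benchmark.

```python
import time
import numpy as np

n = 1_000_000  # illustrative size, chosen only for this sketch
values = list(range(n))
arr = np.arange(n, dtype=np.int64)  # int64 so the squares cannot overflow

# Pure Python: square every element with an explicit loop
start = time.perf_counter()
squares_list = [v * v for v in values]
loop_seconds = time.perf_counter() - start

# NumPy: one vectorized expression, no explicit loop
start = time.perf_counter()
squares_arr = arr * arr
vector_seconds = time.perf_counter() - start

print(f"List comprehension: {loop_seconds:.4f} s")
print(f"Vectorized NumPy:   {vector_seconds:.4f} s")

# Arrays of different shapes: a (3, 1) column plus a (4,) row gives a (3, 4) table
column = np.array([[0], [10], [20]])
row = np.array([1, 2, 3, 4])
table = column + row
print(table)
```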

In [2]:
# import all necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import time

# Check NumPy version
print(f"NumPy version: {np.__version__}")

# Display settings for cleaner output
np.set_printoptions(precision=2, suppress=True)

NumPy version: 2.3.2


**Creating NumPy Arrays**

In [None]:
# Creating arrays from Python lists
# 1D array: A simple sequence of numbers
arr1d = np.array([1, 2, 3, 4, 5])

# 2D array: Think of this as a matrix or table with rows and columns
arr2d = np.array([[1, 2, 3], 
         [4, 5, 6]])

# 3D array: Like a stack of 2D arrays - useful for images, time series
arr3d = np.array([[[1, 2], [3, 4]], 
         [[5, 6], [7, 8]]])

print("1D array:", arr1d)
print("2D array:\n", arr2d)
print("3D array:\n", arr3d)

When you pass a nested list to `np.array()`, NumPy automatically determines the dimensions from the nesting depth. A 1D array is like a single row, a 2D array is like a spreadsheet, and a 3D array is like multiple spreadsheets stacked together.
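
As a small sketch of how nesting depth maps to dimensions, the snippet below inspects `ndim` and `shape` for three arrays. The variable names are illustrative and not part of the cells above.

```python
import numpy as np

flat = np.array([1, 2, 3])                               # one level of nesting
table = np.array([[1, 2, 3], [4, 5, 6]])                 # two levels
stack = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])   # three levels

for name, a in [("flat", flat), ("table", table), ("stack", stack)]:
    print(f"{name}: ndim={a.ndim}, shape={a.shape}")
# flat: ndim=1, shape=(3,)
# table: ndim=2, shape=(2, 3)
# stack: ndim=3, shape=(2, 2, 2)
```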

**Creating Special Arrays in NumPy**

In [None]:
# Creating arrays filled with zeros - useful for initializing arrays 
# Shape (3, 4) means 3 rows and 4 columns
zeros = np.zeros((3, 4))
print("Zeros array (3x4):\n", zeros)


# Creating arrays filled with ones - often used as starting points 
ones = np.ones((2, 3, 4)) # 3D array: 2 layers, 3 rows, 4 columns
print("Ones array:\n", ones)
print("Ones array shape:", ones.shape)

# Empty array - faster than zeros/ones but contains arbitrary (uninitialized) values
# Use when you will immediately fill the array with real data
empty = np.empty((2, 2))
print("Empty array (contains arbitrary values):\n", empty)

`zeros()` and `ones()` are convenient ways to create arrays of a specific shape with known contents. `empty()` is the fastest because it skips initialization, but it contains garbage values, so only use it when you will immediately overwrite the contents.
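
One way to use `empty()` safely is to preallocate and then overwrite every element before reading anything back. The sketch below assumes a small 3x4 result and a made-up fill rule purely for illustration.

```python
import numpy as np

n_rows, n_cols = 3, 4
result = np.empty((n_rows, n_cols))  # contents are arbitrary at this point

# Overwrite every row before reading any value
for i in range(n_rows):
    result[i, :] = np.arange(n_cols) * (i + 1)

print(result)

# zeros_like copies the shape and dtype of an existing array
template = np.zeros_like(result)
print(template.shape, template.dtype)
```

If there is any chance that part of the array is read before being written, `np.zeros` is the safer choice.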

In [62]:
# Range arrays - like Python's range, but returns an array and supports float steps

range_arr = np.arange(0, 15, 3) # Start Stop Step : [0, 3, 6, 9, 12]
print("Range array:", range_arr)

# Linearly spaced arrays - divide a range into equal parts
# From 0 to 1 with exactly 5 points (including endpoints)
linspace_arr = np.linspace(0, 1, 5)
print("Linspace array:", linspace_arr)

# Logarithmically spaced arrays - useful for scientific data
# From 10^0 to 10^2 (1 to 100) with 5 points
logspace_arr = np.logspace(0, 2, 5)
print("Logspace array:", logspace_arr)

Range array: [ 0  3  6  9 12]
Linspace array: [0.   0.25 0.5  0.75 1.  ]
Logspace array: [  1.     3.16  10.    31.62 100.  ]
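
The difference between `arange` and `linspace` comes down to whether you fix the step or the number of points. A brief sketch, with illustrative variable names:

```python
import numpy as np

# arange: you choose the step; the stop value is excluded
stepped = np.arange(0, 1, 0.25)                  # 0, 0.25, 0.5, 0.75

# linspace: you choose the number of points; the stop value is included by default
spaced = np.linspace(0, 1, 5)                    # 0, 0.25, 0.5, 0.75, 1

# endpoint=False drops the stop value and adjusts the spacing
open_ended = np.linspace(0, 1, 5, endpoint=False)  # 0, 0.2, 0.4, 0.6, 0.8

# logspace(a, b, n) matches 10 ** linspace(a, b, n)
same = np.allclose(np.logspace(0, 2, 5), 10 ** np.linspace(0, 2, 5))

print(stepped, spaced, open_ended, same, sep="\n")
```

With non-integer steps, `linspace` is usually the more predictable choice because floating-point rounding can change how many elements `arange` produces.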


In [6]:
# Identity matrix - essential for linear algebra operations
identity = np.eye(9)
print("Identity matrix:\n", identity)

# Diagonal matrix - put values on the diagonal
diagonal = np.diag([1, 2, 3, 4])
print("Diagonal matrix:\n", diagonal)

# Array filled with a specific value
full_arr = np.full((3, 4), 8)
print("Full array (filled with 8):\n", full_arr)

Identity matrix:
 [[1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1.]]
Diagonal matrix:
 [[1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]
 [0 0 0 4]]
Full array (filled with 8):
 [[8 8 8 8]
 [8 8 8 8]
 [8 8 8 8]]
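
A short sketch of why the identity matrix matters, plus two details of `np.diag` and `np.full` that are easy to miss. The matrix `a` is a made-up example.

```python
import numpy as np

a = np.array([[2.0, 1.0],
              [0.0, 3.0]])
identity = np.eye(2)

# Multiplying by the identity matrix leaves a matrix unchanged
unchanged = np.array_equal(identity @ a, a)
print("I @ a equals a:", unchanged)

# Given a 2D array, np.diag extracts the diagonal instead of building a matrix
diagonal_values = np.diag(a)
print("Diagonal of a:", diagonal_values)

# np.full infers its dtype from the fill value unless dtype is passed explicitly
int_filled = np.full((2, 2), 8)
float_filled = np.full((2, 2), 8, dtype=np.float64)
print(int_filled.dtype, float_filled.dtype)
```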


**NumPy Data Types**

- Understanding data types is crucial for memory efficiency and numerical precision.

In [None]:
# Explicit data types - control memory usage and precision
int_arr = np.array([1, 2, 3, 4], dtype=np.int32) # 32-bit integers
float_arr = np.array([1, 2, 3], dtype=np.float64) 
bool_arr = np.array([True, False, True], dtype=np.bool_)

# Type conversion - change dtype of existing array
converted = int_arr.astype(np.float32) # convert to 32-bit float

print("Integer array dtype:", int_arr.dtype)
print("Float array dtype:", float_arr.dtype)
print("Boolean array dtype:", bool_arr.dtype)
print("Converted array dtype:", converted.dtype)

# Memory usage comparison
print(f"int32 uses {int_arr.itemsize} bytes per element")
print(f"float64 uses {float_arr.itemsize} bytes per element")

Integer array dtype: int32
Float array dtype: float64
Boolean array dtype: bool
Converted array dtype: float32
int32 uses 4 bytes per element
float64 uses 8 bytes per element


- int8: 1 byte, range -128 to 127
- int32: 4 bytes, range about ±2.1 billion (-2,147,483,648 to 2,147,483,647)
- float32: 4 bytes, ~7 decimal digits precision
- float64: 8 bytes, ~15 decimal digits precision

- Choose smaller types to save memory, larger types for precision
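
The ranges and sizes listed above can be checked directly with `np.iinfo` and `nbytes`. This is a minimal sketch; the one-million-element array is an arbitrary illustrative size.

```python
import numpy as np

# Integer limits for the types listed above
for dtype in (np.int8, np.int32):
    info = np.iinfo(dtype)
    print(f"{dtype.__name__}: {info.min} to {info.max}")

# Halving the element size halves the memory footprint
big = np.ones(1_000_000, dtype=np.float64)
small = big.astype(np.float32)
print(big.nbytes, "bytes vs", small.nbytes, "bytes")

# The cost of float32: it cannot represent every integer above 2**24
collapsed = np.float32(16_777_217) == np.float32(16_777_216)
print("16777217 and 16777216 are equal in float32:", collapsed)
```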

**Array Properties & Attributes**

- Understanding array properties helps you work effectively with your data and debug issues.

In [14]:
# Create a sample 3D array for demonstration
# Think of this as 3 layers, each with 4 rows and 5 columns
arr = np.random.randn(3, 4, 5)

# Shape: The dimensions of the array (layers, rows, columns)
print("Shape:", arr.shape)           # Output: (3, 4, 5)

# Size: Total number of elements (3 × 4 × 5 = 60)
print("Size:", arr.size)

# Ndim: Number of dimensions (3D in this case)
print("Ndim:", arr.ndim)

# Dtype: Data type of elements
print("Dtype:", arr.dtype)           # float64 for np.random.randn

# Itemsize: Memory size of each element in bytes
print("Itemsize:", arr.itemsize)     # 8 bytes for float64

# Total memory usage in bytes
print("Memory usage:", arr.nbytes, "bytes")  # size × itemsize
print("Memory usage:", arr.nbytes / 1024, "KB")  # Convert to KB

Shape: (3, 4, 5)
Size: 60
Ndim: 3
Dtype: float64
Itemsize: 8
Memory usage: 480 bytes
Memory usage: 0.46875 KB
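
The attributes above are related to one another, which makes them handy for sanity checks. A small sketch, using a zero-filled array instead of random data so the result is reproducible:

```python
import numpy as np

arr = np.zeros((3, 4, 5))  # float64 by default

layers, rows, cols = arr.shape     # shape is a plain tuple and can be unpacked
print(layers, rows, cols)          # 3 4 5

print(arr.size == layers * rows * cols)        # True: size is the product of the shape
print(arr.nbytes == arr.size * arr.itemsize)   # True: 60 * 8 = 480 bytes
print(len(arr))                                # 3: len() reports only the first dimension
```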


**Array Indexing & Slicing**

 **Basic Indexing - Accessing Individual Elements**

- NumPy indexing is similar to Python lists but more powerful for multi-dimensional arrays

In [None]:
# 1D array indexing - similar to Python lists
arr1d = np.array([20, 25, 30, 35, 40])

print("First element:", arr1d[0])     # Index 0: 20
print("Last element:", arr1d[-1])     # Negative indexing: 40
print("Slice [1:4]:", arr1d[1:4])     # Elements 1, 2, 3: [25, 30, 35]
print("Every 2nd element:", arr1d[::2])  # Step of 2: [20, 30, 40]

Negative indices count from the end (-1 is last element). Slicing uses [start:stop:step] where stop is exclusive.
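
One difference from Python lists is worth a quick sketch: a basic NumPy slice is a view onto the original data, not a copy. The variable names `window` and `independent` are illustrative.

```python
import numpy as np

arr1d = np.array([20, 25, 30, 35, 40])

print(arr1d[::-1])   # negative step reverses: [40 35 30 25 20]
print(arr1d[-2:])    # last two elements: [35 40]

# Writing through a slice changes the original array
window = arr1d[1:4]
window[0] = -1
print(arr1d)         # [20 -1 30 35 40]

# Call .copy() when you need independent data
independent = arr1d[1:4].copy()
independent[0] = 0
print(arr1d)         # still [20 -1 30 35 40]
```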

In [20]:
# 2D array indexing - row and column access
arr2d = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

# Access specific element: [row, column]
print("Element at row 1, column 2:", arr2d[1, 2])        # 7

# Access entire rows or columns
print("First row:", arr2d[0, :])               # All columns of row 0
print("Second column:", arr2d[:, 1])           # All rows of column 1

# Subarray slicing: [row_start:row_end, col_start:col_end] (end indices are exclusive)
print("Subarray (rows 1-2, cols 1-3):\n", arr2d[1:3, 1:4])

Element at row 1, column 2: 7
First row: [1 2 3 4]
Second column: [ 2  6 10]
Subarray (rows 1-2, cols 1-3):
 [[ 6  7  8]
 [10 11 12]]
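
When indexing 2D arrays, it helps to know that an integer index drops a dimension while a slice keeps it. A minimal sketch on the same 3x4 layout:

```python
import numpy as np

arr2d = np.array([[1, 2, 3, 4],
                  [5, 6, 7, 8],
                  [9, 10, 11, 12]])

as_1d = arr2d[:, 1]      # integer index: result is 1D
as_2d = arr2d[:, 1:2]    # slice of width 1: result stays 2D

print(as_1d.shape)       # (3,)
print(as_2d.shape)       # (3, 1)
```

The 2D form is often what you want when the result feeds into code that expects a column of a table rather than a flat sequence.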


**Advanced Indexing - Powerful Selection Methods**

In [None]:
# Fancy indexing - use arrays of indices to select elements
arr = np.array([10, 20, 30, 40, 50])
indices = np.array([0, 2, 4])  # Select elements at positions 0, 2, 4
print("Fancy indexing:", arr[indices])         # [10, 30, 50]

# This is much more flexible than simple slicing
random_indices = np.array([4, 1, 3, 1])  # Can repeat and reorder
print("Random order:", arr[random_indices])   # [50, 20, 40, 20]

Fancy indexing lets you select elements in any order, repeat elements, and select non-contiguous elements. Very useful for data sampling and reordering.
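
The sampling and reordering uses mentioned above might look like the sketch below. The `scores` data and the seed are made up for illustration; `np.argsort` and `Generator.choice` are standard NumPy calls.

```python
import numpy as np

scores = np.array([72, 95, 58, 88, 64])   # hypothetical data

# Reordering: argsort returns the indices that sort the array
order = np.argsort(scores)
print(scores[order])                      # [58 64 72 88 95]

# Sampling: draw 3 distinct positions, then select them with fancy indexing
rng = np.random.default_rng(seed=0)
sample_idx = rng.choice(len(scores), size=3, replace=False)
sample = scores[sample_idx]
print(sample)

# Unlike slices, fancy indexing returns a copy, so the original is untouched
sample[:] = 0
print(scores)                             # [72 95 58 88 64]
```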

In [24]:
# 2D fancy indexing - select specific row/column combinations
arr2d = np.arange(12).reshape(3, 4)  # 3x4 array: [[0,1,2,3], [4,5,6,7], [8,9,10,11]]
print("Original 2D array:\n", arr2d)

# Select elements at (row, col) pairs: (0,1) and (2,3)
rows = np.array([0, 2])
cols = np.array([1, 3])
print("Elements at (0,1) and (2,3):", arr2d[rows, cols])  # [1, 11]

# Select entire rows using fancy indexing
selected_rows = arr2d[[0, 2], :]  # Rows 0 and 2, all columns
print("Selected rows:\n", selected_rows)

Original 2D array:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
Elements at (0,1) and (2,3): [ 1 11]
Selected rows:
 [[ 0  1  2  3]
 [ 8  9 10 11]]
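
Passing two index arrays selects paired (row, col) positions, not a rectangular block. If the goal is every combination of the chosen rows and columns, `np.ix_` is one option, sketched here on the same 3x4 array:

```python
import numpy as np

arr2d = np.arange(12).reshape(3, 4)

# Paired indexing: elements at (0, 1) and (2, 3)
pairs = arr2d[[0, 2], [1, 3]]
print(pairs)             # [ 1 11]

# np.ix_ builds an open mesh: rows 0 and 2 crossed with columns 1 and 3
block = arr2d[np.ix_([0, 2], [1, 3])]
print(block)
# [[ 1  3]
#  [ 9 11]]
```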


**Array Reshaping & Manipulation**

- Reshaping changes the shape used to index the same data; the values and their order are unchanged.

In [None]:
# Start with a 1D array
arr = np.arange(12)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print("Original 1D array:", arr)

# Reshape to 2D: 3 rows × 4 columns
reshaped_2d = arr.reshape(3, 4)
print("Reshaped to 3x4:\n", reshaped_2d)

# Reshape to 3D: 2 layers × 2 rows × 3 columns
reshaped_3d = arr.reshape(2, 2, 3)
print("Reshaped to 2x2x3:\n", reshaped_3d)

# Use -1 to let NumPy calculate one dimension automatically
auto_reshape = arr.reshape(4, -1)  # 4 rows, NumPy calculates 3 columns
print("Auto-reshaped to 4x?:\n", auto_reshape)

The total number of elements must remain the same (12 in this case). Using -1 tells NumPy to calculate that dimension automatically, and only one dimension may be -1. Reshaping returns a view when possible, not a copy.
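
These three rules can be checked in a few lines. This is a sketch with illustrative names; `np.shares_memory` is used here to confirm that the reshaped array is a view.

```python
import numpy as np

arr = np.arange(12)

# -1 is resolved from the total size: 12 / 3 = 4 columns
grid = arr.reshape(3, -1)
print(grid.shape)                     # (3, 4)

# The reshaped array is a view: writing to it changes the original
print(np.shares_memory(arr, grid))    # True
grid[0, 0] = 100
print(arr[0])                         # 100

# A shape that does not match the element count raises ValueError
try:
    arr.reshape(5, -1)                # 12 is not divisible by 5
    mismatch_rejected = False
except ValueError as exc:
    mismatch_rejected = True
    print("ValueError:", exc)
```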

In [31]:
# Flattening - convert multi-dimensional array to 1D
arr2d = np.array([[1, 2, 3], [4, 5, 6]])

# flatten() always returns a copy
flattened = arr2d.flatten()                 
print("Flattened (copy):", flattened)

# ravel() returns a view if possible (faster, memory efficient)
ravel = arr2d.ravel()                       
print("Ravel (view if possible):", ravel)

# Demonstrate the difference
arr2d[0, 0] = 999
print("After modifying original:")
print("Flattened (unchanged):", flattened)  # Copy is independent
print("Ravel (changed):", ravel)            # View reflects changes

Flattened (copy): [1 2 3 4 5 6]
Ravel (view if possible): [1 2 3 4 5 6]
After modifying original:
Flattened (unchanged): [1 2 3 4 5 6]
Ravel (changed): [999   2   3   4   5   6]
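
The "if possible" qualifier on `ravel()` matters: when the elements cannot be read in order without rearranging them, as with a transposed array, `ravel()` falls back to a copy. A small sketch using `np.shares_memory` to check:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6]])

# Contiguous array: ravel() is a view, flatten() is always a copy
print(np.shares_memory(arr2d, arr2d.ravel()))     # True
print(np.shares_memory(arr2d, arr2d.flatten()))   # False

# Transposed array: the row-major order no longer matches memory, so ravel() copies
transposed_flat = arr2d.T.ravel()
print(transposed_flat)                            # [1 4 2 5 3 6]
print(np.shares_memory(arr2d, transposed_flat))   # False
```

Because of this, code that needs a guaranteed copy should call `flatten()` or `.copy()` and not rely on `ravel()` behaving one way or the other.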
