# Complete NumPy Tutorial

NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.

## Why NumPy?
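- **Performance**: NumPy's core operations run in compiled C loops, so element-wise work on large arrays is typically much faster than an equivalent pure-Python loop
- **Memory Efficiency**: NumPy arrays store elements of a single type in one contiguous buffer, so they typically use less memory than Python lists holding the same values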
- **Broadcasting**: Powerful mechanism for performing operations on arrays of different shapes
- **Foundation**: Core library for data science libraries like Pandas, SciPy, Scikit-learn, and TensorFlow
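
The memory claim above can be checked with a rough estimate. The sketch below is illustrative only: it assumes CPython, and the list figure overcounts slightly because small integers are shared objects.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, so count both parts.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# nbytes counts only the raw data buffer: n elements * 8 bytes each.
array_bytes = np_arr.nbytes

print("List (approx. bytes):", list_bytes)
print("Array data (bytes):", array_bytes)  # 8000
```

The exact list figure varies by Python version; the array figure is always `size * itemsize`.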

In [None]:
# Import NumPy
import numpy as np

# Check NumPy version
print(f"NumPy version: {np.__version__}")

## 1. Creating NumPy Arrays

NumPy arrays can be created in multiple ways. Let's explore the most common methods.

In [None]:
# 1.1 From Python lists
arr_1d = np.array([1, 2, 3, 4, 5])
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
arr_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print("1D Array:", arr_1d)
print("\n2D Array:\n", arr_2d)
print("\n3D Array:\n", arr_3d)

In [None]:
# 1.2 Using built-in functions
zeros = np.zeros((3, 4))              # Array of zeros
ones = np.ones((2, 3))                # Array of ones
empty = np.empty((2, 2))              # Empty array (uninitialized)
full = np.full((3, 3), 7)             # Array filled with 7
eye = np.eye(4)                       # Identity matrix

print("Zeros:\n", zeros)
print("\nOnes:\n", ones)
print("\nFull (7s):\n", full)
print("\nIdentity Matrix:\n", eye)

In [None]:
# 1.3 Using range functions
arange_arr = np.arange(0, 10, 2)           # Similar to Python's range
linspace_arr = np.linspace(0, 1, 5)        # 5 evenly spaced values between 0 and 1
logspace_arr = np.logspace(0, 2, 5)        # 5 values logarithmically spaced

print("Arange (0 to 10, step 2):", arange_arr)
print("Linspace (0 to 1, 5 values):", linspace_arr)
print("Logspace (10^0 to 10^2, 5 values):", logspace_arr)

## 2. Array Attributes

Understanding array properties is crucial for working with NumPy effectively.

In [None]:
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print("Array:\n", arr)
print("\nShape (dimensions):", arr.shape)        # (rows, columns)
print("Number of dimensions:", arr.ndim)         # 2 for 2D array
print("Size (total elements):", arr.size)        # 12 elements
print("Data type:", arr.dtype)                   # int32 or int64
print("Item size (bytes):", arr.itemsize)        # 4 or 8 bytes
print("Total bytes:", arr.nbytes)                # size * itemsize
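
Because `nbytes` is `size * itemsize`, the chosen `dtype` directly controls memory use. A minimal sketch, with illustrative variable names:

```python
import numpy as np

values = [1, 2, 3, 4]

as_int8 = np.array(values, dtype=np.int8)        # 1 byte per element
as_float64 = np.array(values, dtype=np.float64)  # 8 bytes per element

print(as_int8.dtype, as_int8.nbytes)        # int8 4
print(as_float64.dtype, as_float64.nbytes)  # float64 32

# astype returns a converted copy; the original array is unchanged.
converted = as_float64.astype(np.float32)
print(converted.dtype, converted.itemsize)  # float32 4
```

Smaller types save memory but have a narrower range, so pick one that can hold every value you expect.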

## 3. Array Indexing and Slicing

NumPy provides powerful indexing and slicing capabilities similar to Python lists but extended to multiple dimensions.

In [None]:
# 3.1 Basic indexing (0-based)
arr = np.array([10, 20, 30, 40, 50])
print("Original array:", arr)
print("First element:", arr[0])
print("Last element:", arr[-1])
print("Third element:", arr[2])

# 3.2 Slicing [start:stop:step]
print("\nSlicing examples:")
print("arr[1:4]:", arr[1:4])          # Elements from index 1 to 3
print("arr[:3]:", arr[:3])            # First 3 elements
print("arr[2:]:", arr[2:])            # From index 2 to end
print("arr[::2]:", arr[::2])          # Every second element
print("arr[::-1]:", arr[::-1])        # Reverse the array

In [None]:
# 3.3 Multi-dimensional indexing
arr_2d = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])

print("2D Array:\n", arr_2d)
print("\nElement at row 1, column 2:", arr_2d[1, 2])    # 7
print("First row:", arr_2d[0])                          # [1, 2, 3, 4]
print("Second column:", arr_2d[:, 1])                   # [2, 6, 10]
print("\nFirst 2 rows, last 2 columns:\n", arr_2d[:2, 2:])
print("\nEvery other row and column:\n", arr_2d[::2, ::2])

In [None]:
# 3.4 Boolean indexing
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Create boolean mask
mask = arr > 5
print("Array:", arr)
print("Mask (arr > 5):", mask)
print("Elements > 5:", arr[mask])

# Direct boolean indexing
print("\nElements divisible by 3:", arr[arr % 3 == 0])
print("Elements between 3 and 8:", arr[(arr >= 3) & (arr <= 8)])
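
Boolean masks can also be used for assignment and counting. The sketch below uses illustrative names (`clipped`, `n_even`, `odd_values`):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Assigning through a mask modifies the selected elements in place.
clipped = arr.copy()
clipped[clipped > 5] = 5

# Count how many elements satisfy a condition.
n_even = np.count_nonzero(arr % 2 == 0)

# Negate a mask with ~ (the `not` keyword does not work element-wise).
odd_values = arr[~(arr % 2 == 0)]

print("Clipped at 5:", clipped)
print("Number of even values:", n_even)  # 5
print("Odd values:", odd_values)
```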

In [None]:
# 3.5 Fancy indexing (array indexing)
arr = np.array([10, 20, 30, 40, 50, 60])
indices = np.array([0, 2, 4])

print("Array:", arr)
print("Indices:", indices)
print("Selected elements:", arr[indices])

# 2D fancy indexing
arr_2d = np.array([[1, 2], [3, 4], [5, 6]])
rows = np.array([0, 1, 2])
cols = np.array([1, 0, 1])
print("\n2D Array:\n", arr_2d)
print("Fancy indexed elements:", arr_2d[rows, cols])  # [2, 3, 6]

## 4. Array Operations

NumPy supports element-wise operations and vectorized computations.

In [None]:
# 4.1 Arithmetic operations (element-wise)
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

print("a:", a)
print("b:", b)
print("\nAddition (a + b):", a + b)
print("Subtraction (a - b):", a - b)
print("Multiplication (a * b):", a * b)
print("Division (a / b):", a / b)
print("Power (a ** 2):", a ** 2)
print("Modulo (b % 3):", b % 3)

In [None]:
# 4.2 Universal functions (ufuncs)
arr = np.array([0, 30, 45, 60, 90])

print("Array (degrees):", arr)
print("\nSine:", np.sin(np.radians(arr)))
print("Cosine:", np.cos(np.radians(arr)))
print("Tangent:", np.tan(np.radians(arr)))  # tan(90°) prints as a huge finite number (~1.6e16), not inf

# Other ufuncs
arr2 = np.array([1, 4, 9, 16, 25])
print("\nSquare root:", np.sqrt(arr2))
print("Exponential:", np.exp([1, 2, 3]))
print("Logarithm:", np.log([1, np.e, np.e**2]))

In [None]:
# 4.3 Aggregation functions
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

print("Array:\n", arr)
print("\nSum of all elements:", np.sum(arr))
print("Sum along axis 0 (columns):", np.sum(arr, axis=0))
print("Sum along axis 1 (rows):", np.sum(arr, axis=1))

print("\nMean:", np.mean(arr))
print("Median:", np.median(arr))
print("Standard deviation:", np.std(arr))
print("Variance:", np.var(arr))
print("Min:", np.min(arr))
print("Max:", np.max(arr))
print("Argmin (index of min):", np.argmin(arr))
print("Argmax (index of max):", np.argmax(arr))
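
The `axis` argument names the dimension that is collapsed, which is a common source of confusion. The sketch below also shows `keepdims` and how to turn the flat index returned by `argmax` into a (row, column) pair:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

col_sums = arr.sum(axis=0)                 # collapses rows, shape (3,)
row_sums = arr.sum(axis=1, keepdims=True)  # keeps the axis, shape (3, 1)

# keepdims makes the result broadcast cleanly against the original array.
row_fractions = arr / row_sums  # each row now sums to 1

# argmax on a 2D array returns an index into the flattened array.
flat_index = np.argmax(arr)
row, col = np.unravel_index(flat_index, arr.shape)

print("Column sums:", col_sums)        # [12 15 18]
print("Row sums shape:", row_sums.shape)
print("Max is at row", row, "column", col)
```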

## 5. Array Manipulation

Reshaping, stacking, splitting, and transposing arrays.

In [None]:
# 5.1 Reshaping
arr = np.arange(12)
print("Original 1D array:", arr)

# Reshape to different dimensions
reshaped_2d = arr.reshape(3, 4)
reshaped_3d = arr.reshape(2, 3, 2)

print("\nReshaped to 3x4:\n", reshaped_2d)
print("\nReshaped to 2x3x2:\n", reshaped_3d)

# Flatten back to 1D
print("\nFlattened:", reshaped_2d.flatten())
print("Raveled:", reshaped_2d.ravel())  # Similar to flatten but returns view when possible
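
Two details of reshaping are worth a quick check: one dimension can be given as `-1` and inferred from the size, and `flatten` always copies while `ravel` returns a view when it can. A small sketch using `np.shares_memory` to tell them apart:

```python
import numpy as np

arr = np.arange(12)

# -1 asks NumPy to infer that dimension: 12 elements / 3 rows = 4 columns.
auto = arr.reshape(3, -1)
print("Inferred shape:", auto.shape)  # (3, 4)

flat_copy = auto.flatten()  # always a new array
flat_view = auto.ravel()    # a view here, because the data is contiguous

print("flatten shares memory:", np.shares_memory(arr, flat_copy))  # False
print("ravel shares memory:", np.shares_memory(arr, flat_view))    # True
```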

In [None]:
# 5.2 Transposing
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print("Original array (2x3):\n", arr)
print("\nTransposed (3x2):\n", arr.T)
print("\nTransposed using transpose():\n", np.transpose(arr))

# For multi-dimensional arrays, specify axis order
arr_3d = np.arange(24).reshape(2, 3, 4)
print("\n3D array shape:", arr_3d.shape)
print("Transposed 3D shape:", np.transpose(arr_3d, (2, 0, 1)).shape)

In [None]:
# 5.3 Stacking and concatenating
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print("Array a:\n", a)
print("\nArray b:\n", b)

# Vertical stacking (row-wise)
print("\nVertical stack (vstack):\n", np.vstack((a, b)))

# Horizontal stacking (column-wise)
print("\nHorizontal stack (hstack):\n", np.hstack((a, b)))

# Concatenate along specific axis
print("\nConcatenate axis=0:\n", np.concatenate((a, b), axis=0))
print("\nConcatenate axis=1:\n", np.concatenate((a, b), axis=1))

In [None]:
# 5.4 Splitting arrays
arr = np.arange(16).reshape(4, 4)
print("Original array:\n", arr)

# Horizontal split (column-wise)
h_split = np.hsplit(arr, 2)  # Split into 2 equal parts
print("\nHorizontal split (2 parts):")
for i, sub_arr in enumerate(h_split):
    print(f"Part {i+1}:\n{sub_arr}")

# Vertical split (row-wise)
v_split = np.vsplit(arr, 2)
print("\nVertical split (2 parts):")
for i, sub_arr in enumerate(v_split):
    print(f"Part {i+1}:\n{sub_arr}")

## 6. Broadcasting

Broadcasting lets NumPy operate on arrays of different shapes by virtually stretching any dimension of size 1 to match the other array, without copying data.

In [None]:
# Broadcasting examples
# Example 1: Scalar with array
arr = np.array([1, 2, 3, 4])
print("Array:", arr)
print("Array + 10:", arr + 10)  # Scalar is broadcast to match array shape

# Example 2: 1D array with 2D array
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
arr_1d = np.array([10, 20, 30])

print("\n2D Array:\n", arr_2d)
print("1D Array:", arr_1d)
print("\n2D + 1D (row-wise broadcast):\n", arr_2d + arr_1d)

In [None]:
# Broadcasting rules demonstration
# Rule: shapes are compared dimension by dimension, starting from the last (trailing) one
# Two dimensions are compatible when they are equal OR one of them is 1;
# a missing leading dimension is treated as 1

# Example with column vector
col_vector = np.array([[1], [2], [3]])  # shape (3, 1)
row_vector = np.array([10, 20, 30])      # shape (3,)

print("Column vector (3,1):\n", col_vector)
print("\nRow vector (3,):", row_vector)
print("\nBroadcast multiplication:\n", col_vector * row_vector)

# Creating a multiplication table using broadcasting
x = np.arange(1, 6).reshape(5, 1)
y = np.arange(1, 6)
multiplication_table = x * y
print("\nMultiplication table (5x5):\n", multiplication_table)
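
The compatibility rule can be tested without building any arrays. The sketch below assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available; the variable names are illustrative.

```python
import numpy as np

# (3, 1) with (3,): the trailing dimensions are 1 and 3, so this is compatible.
ok_shape = np.broadcast_shapes((3, 1), (3,))
print("Result shape:", ok_shape)  # (3, 3)

# (3, 4) with (3,): the trailing dimensions are 4 and 3, so this is incompatible.
try:
    np.broadcast_shapes((3, 4), (3,))
    failed = False
except ValueError as exc:
    failed = True
    print("Cannot broadcast:", exc)

# Adding a trailing axis turns (3,) into (3, 1), which does broadcast with (3, 4).
fixed = np.ones((3, 4)) + np.arange(3)[:, np.newaxis]
print("Fixed shape:", fixed.shape)  # (3, 4)
```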

## 7. Linear Algebra

NumPy provides comprehensive linear algebra operations through `numpy.linalg`.

In [None]:
# 7.1 Matrix operations
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print("Matrix A:\n", A)
print("\nMatrix B:\n", B)

# Matrix multiplication (dot product)
print("\nMatrix multiplication (A @ B):\n", A @ B)
print("\nUsing np.dot:\n", np.dot(A, B))

# Element-wise multiplication
print("\nElement-wise multiplication (A * B):\n", A * B)

# Matrix power
print("\nA^2 (matrix power):\n", np.linalg.matrix_power(A, 2))

In [None]:
# 7.2 Matrix properties
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 10]])

print("Matrix A:\n", A)

# Determinant
det = np.linalg.det(A)
print(f"\nDeterminant: {det:.2f}")

# Trace (sum of diagonal elements)
trace = np.trace(A)
print(f"Trace: {trace}")

# Rank
rank = np.linalg.matrix_rank(A)
print(f"Rank: {rank}")

# Inverse (exists only when the determinant is nonzero; compare floats with a tolerance)
if not np.isclose(det, 0):
    inv = np.linalg.inv(A)
    print("\nInverse:\n", inv)
    print("\nVerify (A @ A_inv should be identity):\n", A @ inv)

In [None]:
# 7.3 Eigenvalues and eigenvectors
A = np.array([[4, -2],
              [1, 1]])

print("Matrix A:\n", A)

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print("\nEigenvalues:", eigenvalues)
print("\nEigenvectors:\n", eigenvectors)

# Verify: A @ v = λ * v
for i in range(len(eigenvalues)):
    lambda_i = eigenvalues[i]
    v_i = eigenvectors[:, i]
    print(f"\nVerification for eigenvalue {lambda_i:.2f}:")
    print("A @ v =", A @ v_i)
    print("λ * v =", lambda_i * v_i)
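
The same verification can be done for all eigenpairs at once. This sketch relies on broadcasting to scale each eigenvector column by its eigenvalue and uses `np.allclose` to allow for floating-point error:

```python
import numpy as np

A = np.array([[4, -2],
              [1, 1]])

eigenvalues, eigenvectors = np.linalg.eig(A)

# Column i of `eigenvectors` pairs with eigenvalues[i].
# Multiplying by the 1D eigenvalues array scales each column separately.
lhs = A @ eigenvectors
rhs = eigenvectors * eigenvalues

print("All eigenpairs verified:", np.allclose(lhs, rhs))  # True
```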

In [None]:
# 7.4 Solving linear systems
# Solve: Ax = b

A = np.array([[3, 1],
              [1, 2]])
b = np.array([9, 8])

print("Coefficient matrix A:\n", A)
print("Constants vector b:", b)

# Solve for x
x = np.linalg.solve(A, b)
print("\nSolution x:", x)

# Verify: A @ x should equal b
print("\nVerification (A @ x):", A @ x)
print("Expected (b):", b)

## 8. Random Number Generation

NumPy's random module provides tools for generating random numbers and sampling.

In [None]:
# 8.1 Basic random number generation
# Set seed for reproducibility
np.random.seed(42)

# Random floats between 0 and 1
rand_floats = np.random.rand(3, 3)
print("Random floats (0 to 1):\n", rand_floats)

# Random integers
rand_ints = np.random.randint(1, 100, size=(3, 4))
print("\nRandom integers (1 to 99):\n", rand_ints)

# Random floats from standard normal distribution
rand_normal = np.random.randn(3, 3)
print("\nRandom normal distribution:\n", rand_normal)

# Random choice from array
choices = np.random.choice([10, 20, 30, 40, 50], size=5)
print("\nRandom choices:", choices)

In [None]:
# 8.2 Statistical distributions
# Normal (Gaussian) distribution
normal = np.random.normal(loc=0, scale=1, size=1000)  # mean=0, std=1
print("Normal distribution - Mean:", normal.mean(), "Std:", normal.std())

# Uniform distribution
uniform = np.random.uniform(low=0, high=10, size=1000)
print("Uniform distribution - Min:", uniform.min(), "Max:", uniform.max())

# Binomial distribution
binomial = np.random.binomial(n=10, p=0.5, size=1000)
print("Binomial distribution - Mean:", binomial.mean())

# Poisson distribution
poisson = np.random.poisson(lam=5, size=1000)
print("Poisson distribution - Mean:", poisson.mean())

In [None]:
# 8.3 Shuffling and permutations
arr = np.arange(10)
print("Original array:", arr)

# Shuffle in-place
np.random.shuffle(arr)
print("Shuffled:", arr)

# Create a permutation (doesn't modify original)
arr2 = np.arange(10)
permuted = np.random.permutation(arr2)
print("\nOriginal:", arr2)
print("Permuted:", permuted)

# Random sampling without replacement
sample = np.random.choice(np.arange(100), size=5, replace=False)
print("\nRandom sample (no replacement):", sample)
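
The functions above use NumPy's legacy global random state. The NumPy documentation recommends the `Generator` API for new code. The sketch below shows rough equivalents of the earlier calls; the seed value 42 and the variable names are illustrative.

```python
import numpy as np

# A Generator carries its own state instead of relying on a global seed.
rng = np.random.default_rng(42)

floats = rng.random((3, 3))                # uniform floats in [0, 1)
ints = rng.integers(1, 100, size=(3, 4))   # integers from 1 to 99 (high is exclusive)
normal = rng.standard_normal((3, 3))       # standard normal samples
sample = rng.choice(np.arange(100), size=5, replace=False)
shuffled = rng.permutation(10)             # shuffled copy of 0..9

print("Floats:\n", floats)
print("Integers:\n", ints)
print("Sample without replacement:", sample)
print("Permutation:", shuffled)
```

Two generators created with the same seed produce the same sequence, so results stay reproducible without touching global state.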

## 9. Advanced Topics

In [None]:
# 9.1 Vectorization for performance
import time

# Using loops (slow)
def sum_with_loop(n):
    result = 0
    for i in range(n):
        result += i
    return result

# Using NumPy (fast)
def sum_with_numpy(n):
    # Force int64 so the sum cannot overflow where the default integer is 32-bit
    return np.sum(np.arange(n, dtype=np.int64))

n = 1_000_000

# Time loop version (perf_counter is a high-resolution clock meant for timing)
start = time.perf_counter()
loop_result = sum_with_loop(n)
loop_time = time.perf_counter() - start

# Time NumPy version
start = time.perf_counter()
numpy_result = sum_with_numpy(n)
numpy_time = time.perf_counter() - start

print(f"Loop time: {loop_time:.6f} seconds")
print(f"NumPy time: {numpy_time:.6f} seconds")
print(f"NumPy is {loop_time/numpy_time:.2f}x faster!")
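
A single timed run can be noisy. One alternative, sketched here with the standard-library `timeit` module, repeats each measurement and keeps the best run. The smaller `n` and the repeat counts are arbitrary choices for illustration.

```python
import timeit
import numpy as np

n = 100_000

def sum_with_loop(n):
    result = 0
    for i in range(n):
        result += i
    return result

def sum_with_numpy(n):
    return int(np.arange(n, dtype=np.int64).sum())

# Both versions must agree with the closed form n * (n - 1) / 2.
expected = n * (n - 1) // 2
print("Results agree:", sum_with_loop(n) == sum_with_numpy(n) == expected)  # True

# Take the fastest of 3 repeats, each running the function 5 times.
runs = 5
loop_time = min(timeit.repeat(lambda: sum_with_loop(n), number=runs, repeat=3)) / runs
numpy_time = min(timeit.repeat(lambda: sum_with_numpy(n), number=runs, repeat=3)) / runs

print(f"Loop:  {loop_time:.6f} s per call")
print(f"NumPy: {numpy_time:.6f} s per call")
```

The measured speedup depends on the machine and on `n`, so treat any single number as indicative only.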

In [None]:
# 9.2 Memory views and copies
arr = np.array([1, 2, 3, 4, 5])

# View (shares memory with original)
view = arr[1:4]
print("Original:", arr)
print("View:", view)

# Modify view
view[0] = 999
print("\nAfter modifying view:")
print("Original:", arr)  # Original is also modified!
print("View:", view)

# Copy (independent)
arr2 = np.array([1, 2, 3, 4, 5])
copy = arr2[1:4].copy()
copy[0] = 999
print("\nAfter modifying copy:")
print("Original:", arr2)  # Original is unchanged
print("Copy:", copy)
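
When it is unclear whether an operation produced a view or a copy, `np.shares_memory` and the `.base` attribute can answer the question. A short sketch comparing basic slicing with fancy and boolean indexing:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

sliced = arr[1:4]         # basic slicing: a view
fancy = arr[[1, 2, 3]]    # fancy indexing: a copy
masked = arr[arr > 1]     # boolean indexing: a copy

print("Slice shares memory:", np.shares_memory(arr, sliced))   # True
print("Fancy shares memory:", np.shares_memory(arr, fancy))    # False
print("Mask shares memory:", np.shares_memory(arr, masked))    # False

# A view records the array it was taken from in .base.
print("sliced.base is arr:", sliced.base is arr)  # True
```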

In [None]:
# 9.3 Structured arrays (records)
# Create structured array with named fields
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
people = np.array([
    ('Alice', 25, 55.5),
    ('Bob', 30, 75.2),
    ('Charlie', 35, 80.0)
], dtype=dt)

print("Structured array:\n", people)
print("\nNames:", people['name'])
print("Ages:", people['age'])
print("Weights:", people['weight'])

# Access individual record
print("\nFirst person:", people[0])
print("Bob's age:", people[1]['age'])
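
Structured arrays can also be sorted and filtered by field. The sketch below reuses the same dtype with the rows in a different order so that the sort is visible:

```python
import numpy as np

dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
people = np.array([
    ('Charlie', 35, 80.0),
    ('Alice', 25, 55.5),
    ('Bob', 30, 75.2)
], dtype=dt)

# Sort whole records by one field.
by_age = np.sort(people, order='age')
print("Sorted by age:", by_age['name'])

# Filter records with a boolean mask built from a field.
over_28 = people[people['age'] > 28]
print("Older than 28:", over_28['name'])
```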

In [None]:
# 9.4 File I/O
# Save and load NumPy arrays

# Save single array
arr = np.array([[1, 2, 3], [4, 5, 6]])
np.save('my_array.npy', arr)
print("Array saved to 'my_array.npy'")

# Load array
loaded = np.load('my_array.npy')
print("Loaded array:\n", loaded)

# Save multiple arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
np.savez('multiple_arrays.npz', first=arr1, second=arr2)
print("\nMultiple arrays saved to 'multiple_arrays.npz'")

# Load multiple arrays
data = np.load('multiple_arrays.npz')
print("First array:", data['first'])
print("Second array:", data['second'])

# Save to text file (CSV-like)
np.savetxt('array.txt', arr, delimiter=',', fmt='%d')
print("\nArray saved to 'array.txt'")

# Load from text file (loadtxt returns float64 by default; pass dtype=int to keep integers)
loaded_txt = np.loadtxt('array.txt', delimiter=',')
print("Loaded from text:\n", loaded_txt)
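
The cell above writes files into the current working directory. The following sketch keeps the round trip in a temporary directory instead and opens the `.npz` archive with a context manager so it is closed promptly. The file name `arrays.npz` is an arbitrary example.

```python
import tempfile
from pathlib import Path

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4.0, 5.0, 6.0])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "arrays.npz"
    np.savez(path, first=arr1, second=arr2)

    # np.load on an .npz file returns an archive object; close it when done.
    with np.load(path) as data:
        names = sorted(data.files)  # names of the stored arrays
        first = data["first"]       # arrays are read into memory on access
        second = data["second"]

print("Stored names:", names)  # ['first', 'second']
print("First:", first, first.dtype)
print("Second:", second, second.dtype)
```

Unlike the text format, the binary formats preserve each array's dtype.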

In [None]:
# 9.5 Advanced indexing techniques
# np.where - conditional selection
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
result = np.where(arr > 5, arr * 2, arr)  # If > 5, multiply by 2, else keep original
print("Original:", arr)
print("Conditional result:", result)

# np.select - multiple conditions
conditions = [arr < 3, (arr >= 3) & (arr < 7), arr >= 7]
choices = ['small', 'medium', 'large']
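# Pass a string default: the implicit default of 0 cannot be combined with
# string choices and raises a TypeError in recent NumPy versions
categories = np.select(conditions, choices, default='unknown')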
print("\nCategories:", categories)

# np.take - extract elements by indices
indices = [0, 2, 4, 8]
taken = np.take(arr, indices)
print("\nTaken elements:", taken)

# np.put - replace elements at indices
arr_copy = arr.copy()
np.put(arr_copy, [1, 3, 5], [99, 99, 99])
print("\nAfter put:", arr_copy)

In [None]:
# 9.6 Set operations
a = np.array([1, 2, 3, 4, 5])
b = np.array([4, 5, 6, 7, 8])

print("Array a:", a)
print("Array b:", b)

print("\nUnique values in a:", np.unique(a))
print("Union:", np.union1d(a, b))
print("Intersection:", np.intersect1d(a, b))
print("Set difference (a - b):", np.setdiff1d(a, b))
print("Symmetric difference:", np.setxor1d(a, b))

# Check membership
print("\nIs 3 in a?", np.isin(3, a))
print("Which elements of a are in b?", np.isin(a, b))
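
`np.unique` is more informative on arrays with repeated values, where it can also report how often each value occurs. A small sketch with made-up data:

```python
import numpy as np

values = np.array([3, 1, 2, 3, 3, 1])

# return_counts=True also returns the number of occurrences of each unique value.
unique_vals, counts = np.unique(values, return_counts=True)

print("Unique values:", unique_vals)  # [1 2 3]
print("Counts:", counts)              # [2 1 3]

# The most frequent value is the one with the largest count.
most_common = unique_vals[np.argmax(counts)]
print("Most common value:", most_common)  # 3
```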

## 10. Practical Examples

Real-world applications combining multiple NumPy concepts.

In [None]:
# Example 1: Image manipulation (grayscale image as 2D array)
# Simulate a 10x10 grayscale image
image = np.random.randint(0, 256, size=(10, 10), dtype=np.uint8)
print("Original image (10x10):\n", image)

# Apply threshold
threshold = 128
binary_image = np.where(image > threshold, 255, 0).astype(np.uint8)  # keep the image dtype
print("\nBinary image (threshold=128):\n", binary_image)

# Flip image
flipped = np.flipud(image)  # Flip upside down
print("\nFlipped image:\n", flipped)

# Rotate 90 degrees
rotated = np.rot90(image)
print("\nRotated 90°:\n", rotated)

In [None]:
# Example 2: Statistics on dataset
# Simulate student scores
np.random.seed(42)
scores = np.random.randint(50, 100, size=(30, 5))  # 30 students, 5 subjects

print("Student scores (30 students, 5 subjects):")
print(scores[:5])  # Show first 5 students

# Calculate statistics
avg_per_student = np.mean(scores, axis=1)
avg_per_subject = np.mean(scores, axis=0)

print(f"\nAverage score per student (first 5): {avg_per_student[:5]}")
print(f"Average score per subject: {avg_per_subject}")

# Find top 5 students
top_5_indices = np.argsort(avg_per_student)[-5:][::-1]
print(f"\nTop 5 students (indices): {top_5_indices}")
print(f"Their average scores: {avg_per_student[top_5_indices]}")

# Find students who failed (< 60 in any subject)
failed_students = np.any(scores < 60, axis=1)
print(f"\nNumber of students who failed at least one subject: {np.sum(failed_students)}")

In [None]:
# Example 3: Polynomial fitting
# Generate noisy data
x = np.linspace(0, 10, 50)
y_true = 2 * x**2 + 3 * x + 5
noise = np.random.normal(0, 20, 50)
y_noisy = y_true + noise

print("X values (first 5):", x[:5])
print("Y values (first 5):", y_noisy[:5])

# Fit polynomial (degree 2)
coefficients = np.polyfit(x, y_noisy, deg=2)
print(f"\nFitted coefficients: {coefficients}")
print("True coefficients: [2, 3, 5]")

# Generate predictions
y_pred = np.polyval(coefficients, x)

# Calculate R-squared
ss_res = np.sum((y_noisy - y_pred) ** 2)
ss_tot = np.sum((y_noisy - np.mean(y_noisy)) ** 2)
r_squared = 1 - (ss_res / ss_tot)
print(f"\nR-squared: {r_squared:.4f}")
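
NumPy also offers the newer `numpy.polynomial` package for the same task. One difference to watch for is coefficient order: `np.polyfit` returns the highest degree first, while `Polynomial` stores the lowest degree first. The sketch below uses noise-free data so the recovered coefficients are easy to compare.

```python
import numpy as np
from numpy.polynomial import Polynomial

x = np.linspace(0, 10, 50)
y = 2 * x**2 + 3 * x + 5

# Polynomial.fit works in a scaled domain internally;
# convert() maps the coefficients back to the original x values.
fitted = Polynomial.fit(x, y, deg=2)
coef_low_to_high = fitted.convert().coef

legacy_high_to_low = np.polyfit(x, y, deg=2)

print("Polynomial.fit (constant term first):", coef_low_to_high)   # approx. [5, 3, 2]
print("np.polyfit (highest degree first):", legacy_high_to_low)    # approx. [2, 3, 5]

# The fitted object can be called directly to make predictions.
y_pred = fitted(x)
```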

In [None]:
# Example 4: Moving average (time series smoothing)
# Generate time series data
t = np.arange(100)  # named t to avoid shadowing the time module imported in 9.1
signal = np.sin(t / 10) + np.random.normal(0, 0.2, 100)

# Calculate moving average
window = 5
moving_avg = np.convolve(signal, np.ones(window)/window, mode='valid')

print(f"Original signal length: {len(signal)}")
print(f"Moving average length: {len(moving_avg)}")
print(f"\nFirst 10 original values: {signal[:10]}")
print(f"First 10 smoothed values: {moving_avg[:10]}")
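
With `mode='valid'`, the output has `len(signal) - window + 1` elements, and each one is the mean of one full window. The sketch below checks this on a simple ramp signal chosen to make the values easy to verify by hand:

```python
import numpy as np

signal = np.arange(10, dtype=float)  # 0.0, 1.0, ..., 9.0
window = 5

moving_avg = np.convolve(signal, np.ones(window) / window, mode='valid')

print("Output length:", len(moving_avg))  # 10 - 5 + 1 = 6
print("Moving average:", moving_avg)

# The first output equals the mean of the first full window.
print("Matches window mean:", np.isclose(moving_avg[0], signal[:window].mean()))  # True
```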

In [None]:
# Example 5: Distance matrix (pairwise distances)
# Points in 2D space
points = np.random.rand(5, 2) * 10  # 5 points with (x, y) coordinates

print("Points:\n", points)

# Calculate pairwise Euclidean distances
# Using broadcasting
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
distances = np.sqrt(np.sum(diff**2, axis=2))

print("\nDistance matrix (5x5):")
print(distances)
print("\nDistance from point 0 to all points:", distances[0])
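
A distance matrix should be symmetric with zeros on the diagonal, which gives a quick way to check the broadcasting step. This sketch uses `np.linalg.norm` for the last step and a seeded generator (seed 0 is arbitrary) so the points are reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((5, 2)) * 10

# Shapes: (5, 1, 2) - (1, 5, 2) broadcasts to (5, 5, 2).
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]

# Taking the norm over the last axis is equivalent to sqrt(sum(diff**2, axis=2)).
distances = np.linalg.norm(diff, axis=2)

print("Shape:", distances.shape)                              # (5, 5)
print("Symmetric:", np.allclose(distances, distances.T))      # True
print("Zero diagonal:", np.allclose(np.diag(distances), 0))   # True
```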

## Summary and Best Practices

### Key Takeaways:
1. **Prefer vectorized operations** over Python loops for better performance
2. **Be aware of views vs copies** to avoid unexpected behavior
3. **Use broadcasting** to write concise and efficient code
4. **Choose appropriate data types** to optimize memory usage
5. **Leverage NumPy's built-in functions** instead of reinventing the wheel

### Common Pitfalls:
- Modifying arrays through views unintentionally
- Using Python loops instead of vectorized operations
- Not understanding the `axis` parameter in aggregation functions
- Forgetting that basic slicing returns views, while boolean and fancy indexing return copies

### Next Steps:
- Explore **SciPy** for advanced scientific computing
- Learn **Pandas** for data manipulation and analysis
- Study **Matplotlib** for data visualization
- Practice with real datasets from Kaggle or UCI ML Repository

### Resources:
- [NumPy Documentation](https://numpy.org/doc/)
- [NumPy User Guide](https://numpy.org/doc/stable/user/index.html)
- [NumPy for MATLAB Users](https://numpy.org/doc/stable/user/numpy-for-matlab-users.html)