# Module 2.1: NumPy in Depth

Welcome to the data analysis module! We'll start with a deeper look at **NumPy**, the fundamental package for numerical computing in Python. While we used it for basic linear algebra and stats, its true power lies in its core data structure: the **N-dimensional array (`ndarray`)**. 🔢

**Why is NumPy so important?**
* **Performance:** NumPy's core operations run in compiled C code over contiguous, typed memory, so they are typically far faster than equivalent loops over standard Python lists for mathematical tasks.
* **Convenience:** It allows you to perform complex array operations (like filtering, transforming, and aggregating) in a single, readable line of code.
* **Foundation:** Many other data science libraries, including Pandas, are built on top of NumPy.
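
To make the performance point concrete, here is a minimal sketch that computes a sum of squares twice, once with a plain Python list and once with a NumPy array, and times both. The array size and repetition count are arbitrary choices for illustration, and the timings will vary by machine, so treat the printed numbers as indicative only.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size chosen for illustration
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Same computation two ways: the sum of squares of 0..n-1.
list_result = sum(x * x for x in py_list)
numpy_result = int(np.sum(np_arr * np_arr))

# Time each approach over 10 repetitions; exact numbers depend on the machine.
list_time = timeit.timeit(lambda: sum(x * x for x in py_list), number=10)
numpy_time = timeit.timeit(lambda: np.sum(np_arr * np_arr), number=10)

print(f"Results match: {list_result == numpy_result}")
print(f"Python list: {list_time:.4f} s, NumPy: {numpy_time:.4f} s")
```

Note that the NumPy version also expresses the whole computation as a single array expression, with no explicit loop.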

**Goal of this Notebook:**
We'll go beyond `np.array()` and learn the most common and powerful NumPy features:

1.  Creating NumPy Arrays efficiently.
2.  Indexing and Slicing (selecting specific data).
3.  Conditional Selection (Boolean Indexing).
4.  Applying Universal Functions (`ufuncs`).

In [None]:
import numpy as np

## 1. Creating NumPy Arrays

Besides creating arrays from Python lists, NumPy provides several handy functions for generating arrays from scratch.

In [None]:
# Create an array of a specific range of numbers
# np.arange(start, stop, step)
range_array = np.arange(0, 10, 2) # From 0 up to (but not including) 10, in steps of 2
print(f"Array from arange: {range_array}\n")

# Create an array of zeros or ones
# Useful for initializing arrays before you fill them with data
zeros_array = np.zeros((2, 3)) # A 2x3 matrix of zeros
print(f"Zeros array:\n{zeros_array}\n")

# Create an array of evenly spaced numbers over a specified interval
# np.linspace(start, stop, num_points)
linspace_array = np.linspace(0, 10, 5) # 5 points from 0 to 10 (inclusive)
print(f"Linspace array: {linspace_array}\n")

# Create an identity matrix
identity_matrix = np.eye(3)
print(f"Identity matrix:\n{identity_matrix}\n")

# Create an array of random numbers
random_array = np.random.rand(3, 2) # A 3x2 matrix of random values in [0, 1): 0 included, 1 excluded
print(f"Random array:\n{random_array}")
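
A few details of these creation functions are easy to miss, so the sketch below spells them out: `arange` excludes its stop value while `linspace` includes it, `zeros` accepts a `dtype` argument, and a seeded generator makes random arrays reproducible. The seed value 42 and the variable names are arbitrary choices for illustration. `np.random.default_rng` is NumPy's newer random interface and is used here as an alternative to `np.random.rand`, not as a replacement for the cell above.

```python
import numpy as np

# arange excludes the stop value; linspace includes it by default.
evens = np.arange(0, 10, 2)
points = np.linspace(0, 10, 5)
print(evens)   # [0 2 4 6 8]
print(points)  # values 0, 2.5, 5, 7.5, 10

# zeros produces floats by default; pass dtype to get integers instead.
int_zeros = np.zeros((2, 3), dtype=int)
print(int_zeros.dtype)

# Two generators created with the same seed produce the same values in [0, 1).
rng = np.random.default_rng(seed=42)
sample_a = rng.random((3, 2))
sample_b = np.random.default_rng(seed=42).random((3, 2))
print(np.array_equal(sample_a, sample_b))  # True
```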

## 2. Indexing and Slicing

This is the process of selecting and retrieving subsets of data from a NumPy array. The syntax is powerful and concise.

In [None]:
# Let's create a sample 2D array (a matrix)
matrix = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
print(f"Original Matrix:\n{matrix}\n")

# Get a single element [row, column]
element = matrix[1, 2] # Row index 1, column index 2 (zero-based): second row, third column
print(f"Element at [1, 2]: {element}\n")

# Get an entire row
row = matrix[0] # Get the first row
print(f"First row: {row}\n")

# Slicing: Get a sub-matrix
# syntax: array[start_row:end_row, start_col:end_col]
sub_matrix = matrix[:2, 1:] # First 2 rows, from column 1 to the end
print(f"Sub-matrix (top right):\n{sub_matrix}")
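
The same slicing syntax covers a few more common cases, sketched below with the same sample matrix: selecting a whole column, indexing from the end with negative numbers, and the fact that a basic slice is a view onto the original data rather than a copy. The variable names are illustrative only.

```python
import numpy as np

matrix = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

# A bare colon selects everything along that axis: all rows of column 1.
middle_column = matrix[:, 1]
print(middle_column)  # [20 50 80]

# Negative indices count from the end: last row, last column.
print(matrix[-1, -1])  # 90

# Basic slices are views, so writing to the slice changes the original.
view = matrix[:2, 1:]
view[0, 0] = 99
print(matrix[0, 1])  # 99

# Call .copy() when an independent array is needed.
independent = matrix[:2, 1:].copy()
independent[0, 0] = -1
print(matrix[0, 1])  # still 99
```

The view behaviour avoids copying large arrays, but it is a frequent source of surprises when a slice is modified in place.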

## 3. Conditional Selection (Boolean Indexing)

This is one of NumPy's most powerful features. It lets you select elements from an array based on a condition.

In [None]:
arr = np.arange(1, 11) # Array from 1 to 10
print(f"Original array: {arr}\n")

# First, we create a boolean array based on a condition
bool_arr = arr > 5
print(f"Boolean mask (arr > 5): {bool_arr}\n")

# Now, we use this boolean array to select elements from the original array
# It will only return the elements where the boolean mask is True
filtered_arr = arr[bool_arr]
print(f"Elements greater than 5: {filtered_arr}\n")

# You can also do this in a single line
print(f"Elements less than 4: {arr[arr < 4]}")
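
Masks can also be combined and used for assignment. The sketch below reuses the same 1 to 10 array. The element-wise operators are `&`, `|`, and `~`, and each comparison needs its own parentheses because these operators bind more tightly than `>` or `<`. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(1, 11)

# Combine conditions with & (and) and | (or); parenthesize each condition.
between = arr[(arr > 3) & (arr < 8)]
print(between)  # [4 5 6 7]

outside = arr[(arr < 3) | (arr > 8)]
print(outside)  # [ 1  2  9 10]

# ~ negates a mask: keep the elements that are not even.
odd = arr[~(arr % 2 == 0)]
print(odd)  # [1 3 5 7 9]

# A mask can also select elements for assignment: cap values above 5 at 5.
capped = arr.copy()
capped[capped > 5] = 5
print(capped)  # [1 2 3 4 5 5 5 5 5 5]
```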

## 4. Universal Functions (ufuncs)

NumPy universal functions (`ufuncs`) operate on `ndarray`s element by element. Because the per-element loop runs in compiled code rather than in Python, they are typically much faster than an explicit Python loop.


In [None]:
arr = np.arange(1, 6)
print(f"Original array: {arr}\n")

# Basic arithmetic ufuncs
print(f"Array + 100: {arr + 100}")
print(f"Array squared: {arr ** 2}\n")

# Mathematical ufuncs
print(f"Square root of array: {np.sqrt(arr)}")
print(f"Sine of array: {np.sin(arr)}")
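
To show what "element by element" means in practice, this sketch checks a ufunc against an explicit Python loop, shows that the `+` operator maps to the `np.add` ufunc, and applies a binary ufunc across two arrays. The second array `other` is made up for illustration.

```python
import math

import numpy as np

arr = np.arange(1, 6)

# A unary ufunc applies the same operation to every element.
roots = np.sqrt(arr)
loop_roots = [math.sqrt(x) for x in arr]
print(np.allclose(roots, loop_roots))  # True

# Operators map to binary ufuncs: arr + 100 is the same as np.add(arr, 100).
print(np.array_equal(arr + 100, np.add(arr, 100)))  # True

# Binary ufuncs also work element-wise across two arrays of the same shape.
other = np.array([5, 4, 3, 2, 1])
largest = np.maximum(arr, other)
print(largest)  # [5 4 3 4 5]
```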

## ✅ What's Next?

You now have a solid grasp of NumPy's core functionality. This understanding is the perfect springboard for learning **Pandas**, one of the most widely used data analysis libraries in Python.

In the next notebook, we will introduce the Pandas **Series** and **DataFrame**, the workhorses of data manipulation.