## Advanced NumPy

**Course:** EE6201 – Power Systems Lab | **Instructor:** V. Seshadri Sravan Kumar | **IIT Hyderabad**

This notebook contains lecture notes and examples on **Broadcasting, Vectorization, and Linear Algebra**. Some examples in this notebook are **adopted or adapted from publicly available resources**.

---

### Broadcasting

Broadcasting allows NumPy to perform arithmetic operations on arrays of different shapes without explicitly copying data. Instead of manually expanding a smaller array to match a larger one (which wastes memory), NumPy automatically "stretches" the smaller array to be compatible with the larger one.

**How does broadcasting work?**
When NumPy operates on two arrays, it compares their shapes dimension by dimension, starting from the rightmost (trailing) dimension.
1. Dimension Alignment: If the arrays do not have the same number of dimensions (ndim), NumPy prepends 1s to the left of the smaller shape until both have the same ndim.
2. Compatibility Check: Two dimensions are compatible if either condition below holds; otherwise NumPy raises a `ValueError`:
   * They are equal, OR
   * One of them is 1.
3. The "Stretch": If a dimension is 1, NumPy treats the data as if it were copied along that dimension to match the larger array's size (without actually using extra memory).
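
The three steps above can be traced on a small case. The sketch below is only illustrative (the array names `a`, `b`, and `b_view` are not from the original notes); it uses `np.broadcast_to` to show that the stretched array is a view whose stride along the repeated axis is 0, so no data is copied.

```python
import numpy as np

a = np.ones((4, 3))                    # shape (4, 3)
b = np.array([10.0, 20.0, 30.0])       # shape (3,)

# Step 1: (3,) is treated as (1, 3).
# Step 2: 4 vs 1 and 3 vs 3 are both compatible pairs.
result = a + b
print(result.shape)                    # (4, 3)

# Step 3: the "stretched" array is a view of b, not a copy.
# A stride of 0 along axis 0 means every row reads the same memory.
b_view = np.broadcast_to(b, (4, 3))
print(b_view.strides[0])               # 0

# Incompatible trailing dimensions (3 vs 4) raise a ValueError.
try:
    a + np.ones(4)
    compatible = True
except ValueError:
    compatible = False
print(compatible)                      # False
```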

#### Common Scenarios & Examples
A. **1D Array and a Scalar:** The simplest form of broadcasting. The scalar is broadcast across every element of the vector.

B. **2D Array and a Scalar:** The scalar is broadcast across every row and every column of the matrix.

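C. **2D Array and a 1D Vector:** If the shapes are $(M, N)$ and $(N,)$ (or $(1, N)$), the vector is broadcast across every row. If the shapes are $(M, N)$ and $(M, 1)$, the column vector is broadcast across every column. A 1D vector of length $M$ has to be reshaped to $(M, 1)$ first, because a shape of $(M,)$ is aligned with the last axis.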
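
A minimal sketch of scenarios B and C on made-up data (the names `i_meas`, `col_offset`, and `row_gain` are hypothetical and chosen only for illustration):

```python
import numpy as np

i_meas = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])       # shape (2, 3)

# B. Scalar: applied to every element.
scaled = i_meas * 2.0

# C. Vector of shape (3,): one value per column,
#    repeated down every row.
col_offset = np.array([0.1, 0.2, 0.3])
shifted = i_meas + col_offset

# C. Column vector of shape (2, 1): one value per row,
#    repeated across every column.
row_gain = np.array([[10.0], [100.0]])
gained = i_meas * row_gain

print(shifted)
print(gained)
```

In each case the result keeps the shape of the larger operand, here $(2, 3)$.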

##### Exercise 1

**Scenario:** You have a $3 \times 3$ matrix representing voltage measurements for 3 Phases over 3 different time intervals. You need to apply different levels of calibration.

*Task:* 
1. Define a $3 \times 3$ matrix `V_meas` using `np.arange(1, 10).reshape(3, 3)`.
2. Scalar: Add a global noise offset of 0.05 to the entire matrix.
3. Vector (Row-wise): Subtract a phase-specific correction `[0.1, 0.2, 0.3]` from the matrix. (The same correction vector is applied to every row, so each phase/column gets its own correction.)
4. Vector (Column-wise): Multiply the matrix by a time-specific scaling factor `[[10], [20], [30]]`. (Each time interval/row gets a different multiplier.)

In [3]:
import numpy as np

# Step 1: Initialize 3x3 Matrix
V_meas = np.arange(1, 10).reshape(3, 3)

# Step 2: 2D + Scalar (Global Offset)
V_global = ...

# Step 3: 2D - 1D Vector (Phase Correction)
# Shapes (3, 3) and (3,)
phase_corr = np.array([0.1, 0.2, 0.3])
V_phase_adj = ...

# Step 4: 2D * Column Vector (Time Scaling)
# Shapes (3, 3) and (3, 1)
time_scale = np.array([[10], [20], [30]])
V_time_adj = ...

print("Original Matrix:\n", V_meas)
print("\nAfter Phase Correction (Row-wise):\n", V_phase_adj)
print("\nAfter Time Scaling (Column-wise):\n", V_time_adj)

Original Matrix:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

After Phase Correction (Row-wise):
 Ellipsis

After Time Scaling (Column-wise):
 Ellipsis


---
#### Sorting and Reorganizing Data
NumPy provides highly optimized algorithms (like Quicksort and Mergesort) to rearrange data. When dealing with multidimensional arrays, the `axis` parameter is crucial (`np.sort` defaults to `axis=-1`, the last axis):

* `axis=0`: Sorts elements within each column (vertically).
* `axis=1`: Sorts elements within each row (horizontally).
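
The effect of `axis` is easiest to see on a small matrix. This is an illustrative sketch; the array `x` is made up.

```python
import numpy as np

x = np.array([[3, 1, 2],
              [9, 7, 8],
              [6, 4, 5]])

by_col = np.sort(x, axis=0)   # each column sorted top to bottom
by_row = np.sort(x, axis=1)   # each row sorted left to right

print(by_col)
# [[3 1 2]
#  [6 4 5]
#  [9 7 8]]
print(by_row)
# [[1 2 3]
#  [7 8 9]
#  [4 5 6]]

# np.sort returns a copy; x itself is unchanged.
# With no axis given, the last axis is used (here the same as axis=1).
default_sorted = np.sort(x)
```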

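**The "Arg" Methods: Why use them?** Functions like `argsort` and `argpartition` do not return the sorted values themselves; they return the indices that would sort the array. Those indices can then be used to reorder other, related arrays (such as IDs or labels) in exactly the same way.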

| Function | Description | Key Options / Arguments |
|----------|------------|--------------------------|
| `np.sort(x)` | Returns a sorted copy of an array. | `axis`, `kind` ('quicksort', 'mergesort') |
| `x.sort()` | Performs an in-place sort (modifies original array, returns None). | `axis` |
| `np.flip(x)` | Reverses the order of elements along the given axis. | `axis` |
| `np.argsort(x)` | Returns the indices that would sort the array. | `axis`, `kind` |
| `np.take_along_axis(arr, indices, axis)` | Uses an index array (e.g., from `argsort`) to pick values from `arr` along the given axis. | `indices`, `axis` |
| `np.partition(x, k)` | Returns a rearranged copy in which the k-th element is in its sorted position. All smaller elements are to its left and all larger ones to its right, in no guaranteed order. | `kth`, `axis` |
| `np.argpartition(x, k)` | Returns the indices that would partition the array. | `kth`, `axis` |
| `np.split()` | Splits an array into multiple sub-arrays. With an integer argument, the sections must be of equal size (otherwise a `ValueError` is raised). | `indices_or_sections`, `axis` |
| `np.array_split()` | Similar to `split`, but allows unequal sub-arrays (no error if division isn't even). | `indices_or_sections`, `axis` |
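
A short sketch of several functions from the table, using hypothetical line-loading data (`line_ids` and `loading` are invented for this illustration and do not come from the exercise data):

```python
import numpy as np

line_ids = np.array([11, 12, 13, 14, 15, 16])
loading = np.array([72.0, 95.0, 40.0, 88.0, 61.0, 99.0])   # percent

# Indices that sort `loading` in ascending order.
order = np.argsort(loading)
print(order)                        # [2 4 0 3 1 5]

# The same indices reorder the related ID array.
ids_by_loading = line_ids[order]
print(ids_by_loading)               # [13 15 11 14 12 16]

# take_along_axis applies the index array along a chosen axis.
sorted_loading = np.take_along_axis(loading, order, axis=0)

# Reverse the order to list the heaviest-loaded lines first.
heaviest_first = np.flip(order)

# Indices of the 2 largest values without a full sort
# (their relative order is not guaranteed).
top2_idx = np.argpartition(loading, -2)[-2:]

# split needs an even division; array_split tolerates a remainder.
parts_even = np.split(loading, 3)           # three arrays of length 2
parts_uneven = np.array_split(loading, 4)   # lengths 2, 2, 1, 1
```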

##### Example 2
**Scenario:** You have a list of Bus IDs and their corresponding voltage magnitudes ($V_{pu}$). You need to find the 2 lowest voltages to trigger a "Low Voltage" alarm, but you must keep the Bus IDs associated with the values.

*Task:*
1. Create a 1D array `bus_ids = [101, 102, 103, 104, 105]`.
2. Create a 1D array `v_mags = [0.98, 0.92, 1.04, 0.89, 1.01]`.
3. Use `np.argsort` on `v_mags` to find the indices of the voltages from lowest to highest.
4. Use those indices to print the `bus_ids` in order of their voltage (worst to best).
5. Use `np.partition` to find the 2 lowest voltages without sorting the entire dataset.

In [5]:
bus_ids = np.array([101, 102, 103, 104, 105])
v_mags = np.array([0.98, 0.92, 1.04, 0.89, 1.01])

# Step 1: Get indices that would sort v_mags
sort_indices = ...

# Step 2: Reorder Bus IDs based on those indices
sorted_buses = bus_ids[...]

# Step 3: Use take_along_axis to get sorted voltages (alternative to index-array indexing)
sorted_v = ...

# Step 4: Partition to find the 2 smallest voltages quickly
# (The first 2 elements will be the smallest, but not necessarily sorted)
partitioned_v = ...

print("Buses sorted by Voltage (Weakest First):", sorted_buses)
print("Sorted Voltages:", sorted_v)
print("Two smallest voltages (unsorted):", ...)

Buses sorted by Voltage (Weakest First): [101 102 103 104 105]
Sorted Voltages: Ellipsis
Two smallest voltages (unsorted): Ellipsis


---
#### Stacking and Concatenation
While NumPy provides several ways to join arrays, it is important to remember that NumPy arrays have a fixed size.

**Performance:** Every time you use `hstack`, `vstack`, or `concatenate`, NumPy creates a brand new array in memory and copies all the data from the old arrays into it.

**Best Practice:** If you know the final size of your dataset, it is much more efficient to pre-allocate an array with `np.zeros()` (or `np.empty()`) and fill it slice by slice than to stack arrays inside a loop.
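
The two approaches can be compared side by side. In this sketch `read_snapshot` is a hypothetical stand-in for whatever produces one row of data per step, and the sizes are arbitrary.

```python
import numpy as np

n_steps, n_buses = 5, 3      # hypothetical sizes


def read_snapshot(t):
    """Stand-in data source: one voltage per bus at step t."""
    return (1.0 + 0.01 * t) * np.ones(n_buses)


# Discouraged: each vstack call allocates a new array
# and copies every row collected so far.
grown = np.empty((0, n_buses))
for t in range(n_steps):
    grown = np.vstack([grown, read_snapshot(t)])

# Preferred: allocate once, then fill one row (slice) per step.
prealloc = np.zeros((n_steps, n_buses))
for t in range(n_steps):
    prealloc[t, :] = read_snapshot(t)

print(grown.shape, prealloc.shape)   # (5, 3) (5, 3)
```

Both loops produce the same matrix; the difference is only in how many allocations and copies happen along the way.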

| Function | Description | Axis Significance |
|----------|------------|-------------------|
| `np.concatenate()` | Joins a sequence of arrays along an existing axis. | `axis` is optional; the default is `axis=0`. |
| `np.vstack()` | Stacks arrays vertically (row-wise). | Equivalent to concatenation along `axis=0`; 1D arrays of length N are first treated as shape (1, N). |
| `np.hstack()` | Stacks arrays horizontally (column-wise). | Equivalent to concatenation along `axis=1` for 2D or higher arrays; 1D arrays are joined end to end along `axis=0`. |
| `np.stack()` | Joins arrays along a new axis. | Increases the dimensions (e.g., stacking 2D matrices into a 3D block). |

**The Significance of `axis`:** When joining or stacking arrays, the `axis` parameter tells NumPy **in which direction the "growth" should happen**:

- **axis = 0 (Rows):** Stacks data **on top of each other**.  
- **axis = 1 (Columns):** Stacks data **side-by-side**.  
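
The direction of growth shows up directly in the result shapes. The arrays `a`, `b`, and `u` below are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])      # shape (2, 2)
b = np.array([[5, 6], [7, 8]])      # shape (2, 2)

rows = np.concatenate([a, b], axis=0)   # grows downward   -> (4, 2)
cols = np.concatenate([a, b], axis=1)   # grows sideways   -> (2, 4)
block = np.stack([a, b], axis=0)        # new leading axis -> (2, 2, 2)

print(rows.shape, cols.shape, block.shape)   # (4, 2) (2, 4) (2, 2, 2)

# 1D inputs behave differently: hstack joins them end to end,
# while vstack treats each one as a row.
u = np.array([1, 2, 3])
print(np.hstack([u, u]).shape)   # (6,)
print(np.vstack([u, u]).shape)   # (2, 3)
```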

##### Example: Building a System Snapshot
**Scenario:** You have two separate datasets: one for Phase A voltages and one for Phase B voltages of the same 4 buses. You need to combine them into a single matrix.

*Task:*
1. Create two 1D arrays: `phase_A = [1.0, 1.01, 0.99, 1.0]` and `phase_B = [0.98, 0.97, 0.99, 1.0]`.
2. Use `np.vstack` to combine them so that each row is a phase.
3. Use `np.hstack` (after reshaping) or `np.concatenate` to combine them so that each row is a bus and each column is a phase.
4. Use `np.stack` to see how it creates a new dimension.

In [7]:
phase_A = np.array([1.0, 1.01, 0.99, 1.0])
phase_B = np.array([0.98, 0.97, 0.99, 1.0])

# Step 1: Vertical Stack (Phase A as Row 0, Phase B as Row 1)
v_stacked = ...

# Step 2: Horizontal Combination (Bus 1: [PhA, PhB], Bus 2: [PhA, PhB]...)
# Hint: You may need to reshape the 1D arrays to (4,1) first
h_stacked = ...

# Step 3: New Axis Stack
# With the default axis=0, two (4,) arrays give shape (2, 4); axis=1 would give (4, 2)
new_stack = ...

print("Vertically Stacked (Phase-wise):\n", v_stacked)
print("\nHorizontally Stacked (Bus-wise):\n", h_stacked)
print("\nNew Axis Stack Shape:", ...)

Vertically Stacked (Phase-wise):
 Ellipsis

Horizontally Stacked (Bus-wise):
 Ellipsis

New Axis Stack Shape: Ellipsis


---
### Searching in NumPy
Searching allows you to find the locations (indices) of elements that satisfy a specific condition. This is often more useful than just getting the values because the indices allow you to cross-reference multiple datasets (e.g., finding the name of a bus that has an undervoltage condition).

| Function | Description | Typical Use Case |
|----------|------------|-----------------|
| `np.nonzero(x)` | Returns a tuple of index arrays (one per dimension) locating the non-zero elements. | Finding active lines in a connectivity matrix. |
| `np.where(condition)` | Returns the indices where the condition is True, in the same tuple form. Equivalent to `np.nonzero(condition)`. | Identifying buses where V < 0.9 pu. |
| `np.where(cond, x, y)` | Vectorized If-Else: Returns an array where elements are taken from `x` if `cond` is True, and from `y` if False. | Replacing fault values with a nominal 1.0 pu value. |
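
A brief sketch of the three forms on hypothetical line-flow data (`line_names`, `flow_mw`, `limit_mw`, and `adj` are invented for this illustration):

```python
import numpy as np

line_names = np.array(["L1", "L2", "L3", "L4", "L5"])
flow_mw = np.array([120.0, 0.0, 310.0, 0.0, 275.0])
limit_mw = 250.0

# nonzero returns a tuple with one index array per dimension,
# so a 1D input is unpacked from a 1-element tuple.
(in_service,) = np.nonzero(flow_mw)
print(in_service)                  # [0 2 4]

# where(condition) returns the same kind of tuple.
(overloaded,) = np.where(flow_mw > limit_mw)
print(line_names[overloaded])      # ['L3' 'L5']

# Three-argument form: element-wise if/else.
capped = np.where(flow_mw > limit_mw, limit_mw, flow_mw)
print(capped)                      # [120.   0. 250.   0. 250.]

# For a 2D array the tuple holds (row_indices, col_indices).
adj = np.array([[0, 1],
                [1, 0]])
rows, cols = np.nonzero(adj)
```

Because the indices come back as arrays, they can be used directly to index any other array that describes the same elements, as with `line_names` here.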

##### Example: Automated Fault Correction

**Scenario:** You are monitoring a small 5-bus system. You have a vector of voltage measurements, but some sensors are malfunctioning and reporting 0.0 or extremely low values.

*Task:*
1. Create a voltage array `v_mags = [1.02, 0.0, 0.98, 0.03, 1.01]`.
2. Use `np.where(condition)` to find the indices of the buses that are reporting values below $0.5$ pu.
3. Use `np.where(cond, x, y)` to create a new "Clean" array where:
   - If the voltage is $> 0.5$, keep the original value.
   - If the voltage is $\le 0.5$, replace it with the nominal value of $1.0$.
4. Use `np.nonzero()` to find which buses are currently "Live".

In [8]:
v_mags = np.array([1.02, 0.0, 0.98, 0.03, 1.01])

# Step 1: Find indices of fault buses (< 0.5 pu)
fault_indices = ...

# Step 2: Create a corrected array 
# If v_mags <= 0.5, set to 1.0, else keep v_mags
v_corrected = ...

# Step 3: Find indices of non-zero measurements
live_buses = ...

print("Indices of faulty sensors:", fault_indices)
print("Corrected Voltage Vector:", v_corrected)
print("Indices of buses with non-zero readings:", live_buses)

Indices of faulty sensors: Ellipsis
Corrected Voltage Vector: Ellipsis
Indices of buses with non-zero readings: Ellipsis


---
#### Finding Unique Elements: `np.unique`

Unlike the standard Python `set()`, `np.unique` works directly on multidimensional arrays, returns its result in sorted order, and can return metadata about the occurrences of each element.

| Parameter        | Description | Syntax Example | Return Values |
|-----------------|------------|----------------|---------------|
| `axis`          | The axis to operate on. If `None` (default), the array is flattened. If `0`, it finds unique rows; if `1`, unique columns. | `np.unique(arr, axis=0)` | Array of unique elements along the specified axis |
| `return_index`  | If `True`, returns the indices of the first occurrences of the unique values in the original array. | `np.unique(arr, return_index=True)` | Tuple: `(unique_array, indices_of_first_occurrences)` |
| `return_inverse`| If `True`, returns the indices to reconstruct the original array from the unique values. | `np.unique(arr, return_inverse=True)` | Tuple: `(unique_array, inverse_indices)` |
| `return_counts` | If `True`, returns the number of times each unique value appears. | `np.unique(arr, return_counts=True)` | Tuple: `(unique_array, counts_array)` |
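
The `return_inverse` and `axis` options are the least intuitive entries in the table, so here is a small sketch on made-up data (`bus_zone` and `branches` are hypothetical):

```python
import numpy as np

bus_zone = np.array([3, 1, 3, 2, 1, 3])     # zone label per bus

# `inverse` gives, for each original element, its position
# in the sorted array of unique values.
zones, inverse = np.unique(bus_zone, return_inverse=True)
print(zones)      # [1 2 3]
print(inverse)    # [2 0 2 1 0 2]

# Indexing the unique values with `inverse` rebuilds the original.
rebuilt = zones[inverse]

# axis=0 treats each row as one item and drops duplicate rows.
branches = np.array([[1, 2],
                     [2, 3],
                     [1, 2]])
unique_rows = np.unique(branches, axis=0)
print(unique_rows)
# [[1 2]
#  [2 3]]
```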

##### Example

**Scenario:** You have a list of equipment codes representing transformers (T), circuit breakers (B), and isolators (I) across various bays in a substation. You need to identify the unique types of equipment and determine which one is most common.

*Task*:
1. Create a 1D array `equipment_codes = ['T', 'B', 'B', 'I', 'T', 'B', 'T', 'B']`.
2. Use `np.unique` to find the unique equipment types.
3. Use the `return_counts` parameter to find how many of each type exist.
4. Use the `return_index` parameter to find the position of the first instance of each equipment type.

In [10]:
equipment_codes = np.array(['T', 'B', 'B', 'I', 'T', 'B', 'T', 'B'])

# Step 1: Find unique values and their frequencies
# Hint: You can unpack multiple return values: val, count = np.unique(...)
unique_vals, counts = ...

# Step 2: Find unique values and their first occurrence indices
_, first_indices = ...

print("Unique Equipment Types:", ...)
print("Equipment Counts:", ...)
print("First Appearance Indices:", ...)

# Bonus: Which equipment is most frequent?
# Hint: Use unique_vals[np.argmax(counts)]
most_frequent = ...
print("Most frequent equipment:", ...)


#### Statistical Binning: `np.histogram`

Binning groups a continuous dataset into discrete intervals (bins) so that you can see its underlying distribution.

*Ways to Specify Bins*
1. Integer: `bins=10` — NumPy automatically creates 10 equal-width bins between the minimum and maximum value.
2. Sequence: `bins=[0, 0.9, 1.1, 1.5]` — You manually define the edges of the bins. This is useful for grouping "Undervoltage," "Normal," and "Overvoltage" regions.

| Parameter | Description | Syntax Example | Return Values |
|-----------|------------|----------------|---------------|
| `bins`   | Defines the edges of the bins. Can be an integer (number of bins) or a sequence of bin edges. | `np.histogram(data, bins=5)` | `(counts, bin_edges)` |
| `range`  | The lower and upper range of the bins when `bins` is an integer. Values outside this range are ignored. It has no effect when `bins` is an explicit sequence of edges. | `np.histogram(data, range=(0.9, 1.1))` | `(counts, bin_edges)` |
| `density` | If `True`, normalizes the result so that the area under the histogram (value × bin width, summed over all bins) equals 1, i.e., it returns a probability density. | `np.histogram(data, density=True)` | `(density_values, bin_edges)` |
| `weights` | An array of weights of the same shape as `data`. Used for weighted histograms. | `np.histogram(data, weights=w_array)` | `(weighted_counts, bin_edges)` |
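
The parameters above can be tried on a small deterministic dataset. This is a sketch with made-up values (`data`, `edges`, and `w` are hypothetical), chosen so that the counts are easy to verify by hand.

```python
import numpy as np

data = np.array([0.5, 1.5, 1.5, 2.5, 3.5, 3.5, 3.5, 4.0])
edges = [0, 2, 4]

# Explicit edges: bins are [0, 2) and [2, 4].
# Only the last bin includes its right edge, so 4.0 is counted.
counts, bin_edges = np.histogram(data, bins=edges)
print(counts)          # [3 5]

# density=True returns a density; multiplying by the bin width
# gives the fraction of samples in each bin.
density, _ = np.histogram(data, bins=edges, density=True)
prob = density * np.diff(bin_edges)
print(prob)            # [0.375 0.625]

# Integer bins with a range: 2 equal bins on [1, 3];
# samples outside that interval are ignored.
counts_rng, edges_rng = np.histogram(data, bins=2, range=(1, 3))
print(counts_rng)      # [2 1]

# weights: each sample contributes its weight instead of 1.
w = np.arange(1.0, 9.0)
weighted, _ = np.histogram(data, bins=edges, weights=w)
print(weighted)        # [ 6. 30.]
```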

##### Example
**Scenario:** You have a dataset of 100 voltage readings. You need to categorize how many readings fall into specific "Quality Zones": Critical Low ($<0.9$), Normal ($0.9$ to $1.1$), and Critical High ($>1.1$).

*Task:*
1. Generate a sample dataset: `v_data = np.random.normal(1.0, 0.1, 100)`.
2. Use `np.histogram` with specific bin edges: `[0, 0.9, 1.1, 1.5]`.
3. Use the `range` parameter to ignore any extreme outliers below $0.5$ or above $1.5$.
4. Observe the return values: `counts` (the number of readings in each zone) and `bin_edges`.

In [11]:
# Step 1: Simulated Voltage Data
np.random.seed(42)
v_data = np.random.normal(1.0, 0.1, 100)

# Step 2: Binning into 3 specific zones
# Zone 1: [0, 0.9), Zone 2: [0.9, 1.1), Zone 3: [1.1, 1.5] (only the last bin includes its right edge)
zones = [0, 0.9, 1.1, 1.5]
counts, bin_edges = ...

# Step 3: Using Density for Probability
# density=True returns a probability density; multiply it by the bin width
# to get the probability of being in the "Normal" zone
prob_density, _ = ...

print("Counts per Zone (Critical Low, Normal, Critical High):", counts)
print("Bin Edges used:", bin_edges)
