### Fancy Indexing (Advanced Practice)

This notebook contains **advanced** NumPy fancy-indexing problems **with solutions**.

**Best practices used here:**
- Each task is self-contained and reproducible.
- Solutions include correctness checks (`assert`) and short explanations.
- We prefer vectorized NumPy operations over Python loops.

> Tip: Fancy indexing always returns a **copy**, whereas basic slicing returns a view. For in-place accumulation with repeated indices, use an unbuffered ufunc method such as `np.add.at` or `np.maximum.at`.
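
One way to see the copy-versus-view distinction from the tip is `np.shares_memory`. The sketch below is illustrative only; the names `base`, `view`, and `copy` are not used elsewhere in this notebook.

```python
import numpy as np

base = np.arange(10)

# Basic slicing returns a view: writes through the view reach `base`.
view = base[2:5]
view[0] = -1

# Fancy indexing returns a copy: writes to the copy leave `base` untouched.
copy = base[[2, 3, 4]]
copy[1] = 99

print(base[2], base[3])               # -1 3
print(np.shares_memory(base, view))   # True
print(np.shares_memory(base, copy))   # False
```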

In [1]:
import numpy as np

np.set_printoptions(edgeitems=6, linewidth=120)

rng = np.random.default_rng(42)

def check(name, got, expected):
    """Small helper for readable asserts."""
    assert np.array_equal(got, expected), f"{name} failed.\nGot:\n{got}\nExpected:\n{expected}"
    return True

## Problem 1 — Pairwise selection (row/col) from a matrix

You are given a 2D array `A` and two 1D arrays `rows` and `cols` of the same length.

**Task:** Select the elements `(rows[i], cols[i])` for all `i` (i.e., pairwise / zipped indexing).

**Goal:** Return a 1D array `picked` of length `len(rows)`.

In [2]:
# Setup
A = np.arange(1, 26).reshape(5, 5)
rows = np.array([0, 1, 3, 4])
cols = np.array([4, 2, 0, 3])

# TODO: create `picked`
# picked = ...

A, rows, cols

(array([[ 1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10],
        [11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20],
        [21, 22, 23, 24, 25]]),
 array([0, 1, 3, 4]),
 array([4, 2, 0, 3]))

In [3]:
# Solution
picked = A[rows, cols]

# Checks
expected = np.array([A[0, 4], A[1, 2], A[3, 0], A[4, 3]])
check("Problem 1", picked, expected)

picked

array([ 5,  8, 16, 24])
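
As a cross-check, pairwise selection should agree with an explicit loop over zipped index pairs and with a lookup through flat (row-major) indices. This is a small sketch under the same setup as above; `looped`, `flat`, and `via_flat` are illustrative names.

```python
import numpy as np

A = np.arange(1, 26).reshape(5, 5)
rows = np.array([0, 1, 3, 4])
cols = np.array([4, 2, 0, 3])

# Reference: explicit Python loop over zipped (row, col) pairs.
looped = np.array([A[r, c] for r, c in zip(rows, cols)])

# Same selection through flat indices: flat = row * n_cols + col.
flat = np.ravel_multi_index((rows, cols), A.shape)
via_flat = A.ravel()[flat]

assert np.array_equal(A[rows, cols], looped)
assert np.array_equal(A[rows, cols], via_flat)
```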

## Problem 2 — Cross-product selection with `np.ix_`

Given `A` and lists of `row_idx` and `col_idx`, extract the **submatrix** formed by *all combinations* of those rows and columns.

**Task:** Create `sub` with shape `(len(row_idx), len(col_idx))`.

> Warning: `A[row_idx, col_idx]` does *pairwise* selection if both are 1D arrays of the same shape. For a Cartesian product, use `np.ix_`.

In [4]:
# Setup
A = np.arange(1, 1 + 6*8).reshape(6, 8)
row_idx = np.array([0, 2, 5])
col_idx = np.array([1, 3, 7])

# TODO: create `sub` (Cartesian product)
# sub = ...

A

array([[ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16],
       [17, 18, 19, 20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29, 30, 31, 32],
       [33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48]])

In [5]:
# Solution
sub = A[np.ix_(row_idx, col_idx)]

# Checks
expected = np.vstack([A[r, col_idx] for r in row_idx])
check("Problem 2", sub, expected)

sub

array([[ 2,  4,  8],
       [18, 20, 24],
       [42, 44, 48]])
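
To make the warning above concrete, the sketch below contrasts pairwise and Cartesian selection on the same inputs. It also shows what `np.ix_` appears to do under the hood: it builds index arrays of shapes `(3, 1)` and `(1, 3)` that broadcast against each other. The names `pairwise`, `open_mesh`, `cartesian`, and `manual` are illustrative.

```python
import numpy as np

A = np.arange(1, 1 + 6 * 8).reshape(6, 8)
row_idx = np.array([0, 2, 5])
col_idx = np.array([1, 3, 7])

pairwise = A[row_idx, col_idx]          # shape (3,): elements (0,1), (2,3), (5,7)
open_mesh = np.ix_(row_idx, col_idx)    # tuple of index arrays, shapes (3, 1) and (1, 3)
cartesian = A[open_mesh]                # shape (3, 3)
manual = A[row_idx[:, None], col_idx]   # same result via explicit broadcasting

print(pairwise)                         # [ 2 20 48], the diagonal of `cartesian`
print([ix.shape for ix in open_mesh])   # [(3, 1), (1, 3)]
```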

## Problem 3 — Gather `k` columns per row (ragged-style) using `take_along_axis`

You have a matrix `X` and, for each row, a set of `k` column indices to pick.

**Task:** Build `picked` with shape `(n_rows, k)` such that:
- `picked[i, j] = X[i, idx[i, j]]`

**Constraint:** Do it without Python loops.

In [6]:
# Setup
X = rng.integers(0, 100, size=(5, 7))
idx = rng.integers(0, X.shape[1], size=(5, 3))  # 3 columns per row

# TODO: create `picked`
# picked = ...

X, idx

(array([[ 8, 77, 65, 43, 43, 85,  8],
        [69, 20,  9, 52, 97, 73, 76],
        [71, 78, 51, 12, 83, 45, 50],
        [37, 18, 92, 78, 64, 40, 82],
        [54, 44, 45, 22,  9, 55, 88]]),
 array([[0, 6, 5],
        [1, 4, 1],
        [5, 4, 2],
        [0, 6, 3],
        [6, 4, 5]]))

In [7]:
# Solution
# take_along_axis picks X[i, idx[i, j]] for every (i, j) along axis=1
picked = np.take_along_axis(X, idx, axis=1)

# Checks (slow reference with Python loop)
expected = np.array([[X[i, idx[i, j]] for j in range(idx.shape[1])] for i in range(X.shape[0])])
check("Problem 3", picked, expected)

picked

array([[ 8,  8, 85],
       [20, 97, 20],
       [45, 83, 51],
       [37, 82, 78],
       [88,  9, 55]])
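
An alternative that avoids `take_along_axis` is to pair a column vector of row numbers with the index matrix, so that broadcasting lines up each row with its own column indices. The sketch below uses separate demo data (`demo_rng`, `X_demo`, `idx_demo` are illustrative names) so the notebook's `rng` state is not affected.

```python
import numpy as np

demo_rng = np.random.default_rng(0)
X_demo = demo_rng.integers(0, 100, size=(5, 7))
idx_demo = demo_rng.integers(0, X_demo.shape[1], size=(5, 3))

# Row numbers as a (5, 1) column broadcast against the (5, 3) index matrix.
row_ids = np.arange(X_demo.shape[0])[:, None]
picked_broadcast = X_demo[row_ids, idx_demo]

assert np.array_equal(picked_broadcast, np.take_along_axis(X_demo, idx_demo, axis=1))
```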

## Problem 4 — Top-`k` per row (indices + values) with fancy indexing

Given a 2D array `S`, find the **top-`k` largest** elements in each row.

**Tasks:**
1. Create `top_idx` of shape `(n_rows, k)` containing the column indices of the top `k` values per row.
2. Create `top_vals` of shape `(n_rows, k)` containing the corresponding values.

**Notes:**
- If you need *sorted* top-`k` (descending), sort the `k` values per row.
- Use `argpartition` to find the top `k` without fully sorting each row, then sort only those `k` candidates if order matters.

In [8]:
# Setup
S = rng.normal(size=(6, 10))
k = 3

# TODO:
# top_idx = ...
# top_vals = ...

S[:2]

array([[-0.35213355,  0.53230919,  0.36544406,  0.41273261,  0.430821  ,  2.1416476 , -0.40641502, -0.51224273,
        -0.81377273,  0.61597942],
       [ 1.12897229, -0.11394746, -0.84015648, -0.82448122,  0.65059279,  0.74325417,  0.54315427, -0.66550971,
         0.23216132,  0.11668581]])

In [9]:
# Solution
# 1) Get indices of k largest per row (unordered among the k)
cand_idx = np.argpartition(S, -k, axis=1)[:, -k:]

# 2) Gather candidate values
cand_vals = np.take_along_axis(S, cand_idx, axis=1)

# 3) Sort those k values descending within each row
order = np.argsort(cand_vals, axis=1)[:, ::-1]
top_idx = np.take_along_axis(cand_idx, order, axis=1)
top_vals = np.take_along_axis(cand_vals, order, axis=1)

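# Checks: values match a full descending sort of each row, and indices point at those values
expected_vals = np.sort(S, axis=1)[:, ::-1][:, :k]
assert np.allclose(top_vals, expected_vals)
assert np.array_equal(np.take_along_axis(S, top_idx, axis=1), top_vals)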

top_idx, top_vals

(array([[5, 9, 1],
        [0, 5, 4],
        [1, 3, 6],
        [2, 4, 9],
        [3, 0, 8],
        [0, 3, 5]]),
 array([[2.1416476 , 0.61597942, 0.53230919],
        [1.12897229, 0.74325417, 0.65059279],
        [0.87142878, 0.67891356, 0.63128823],
        [1.49494131, 0.96827835, 0.71122658],
        [0.85797588, 0.79334724, 0.49716074],
        [0.69048535, 0.62559039, 0.45677524]]))
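
The three steps above can be wrapped in a small helper. This is a sketch, not a library function: `topk_per_row` is a hypothetical name, and the demo matrix has no ties, so the result is unambiguous.

```python
import numpy as np

def topk_per_row(scores, k):
    """Return (indices, values) of the k largest entries per row, sorted descending."""
    # Partition so the k largest entries of each row occupy the last k slots (unordered).
    cand = np.argpartition(scores, -k, axis=1)[:, -k:]
    cand_vals = np.take_along_axis(scores, cand, axis=1)
    # Sort only the k candidates, largest first.
    order = np.argsort(-cand_vals, axis=1)
    return (np.take_along_axis(cand, order, axis=1),
            np.take_along_axis(cand_vals, order, axis=1))

demo = np.array([[0.1, 0.9, 0.4, 0.7],
                 [5.0, 2.0, 8.0, 1.0]])
demo_idx, demo_vals = topk_per_row(demo, 2)
print(demo_idx)   # [[1 3], [2 0]]
print(demo_vals)  # [[0.9 0.7], [8. 5.]]
```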

## Problem 5 — Assignment pitfalls with repeated indices (+ correct accumulation)

Fancy-index assignment like `a[idx] += 1` is buffered: when `idx` contains duplicates, each repeated position is incremented only once, not once per occurrence.

Given `a` and `idx`, you want to **increment** counts at positions in `idx` (including duplicates).

**Task:** Produce `counts` such that each occurrence of an index increments that position.

> Best practice: use `np.add.at` for unbuffered in-place accumulation with repeated indices.
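
The difference is easy to observe side by side. In this sketch, `hits`, `buffered`, and `unbuffered` are illustrative names chosen so they do not clash with the problem's variables.

```python
import numpy as np

hits = np.array([1, 1, 1, 3, 3, 6])

buffered = np.zeros(8, dtype=int)
buffered[hits] += 1              # duplicates collapse: each position gains at most 1

unbuffered = np.zeros(8, dtype=int)
np.add.at(unbuffered, hits, 1)   # every occurrence is applied

print(buffered)    # [0 1 0 1 0 0 1 0]
print(unbuffered)  # [0 3 0 2 0 0 1 0]
```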

In [10]:
# Setup
a = np.zeros(8, dtype=int)
idx = np.array([1, 1, 1, 3, 3, 6])

# TODO: create `counts`, starting from an array of zeros
# counts = ...

a, idx

(array([0, 0, 0, 0, 0, 0, 0, 0]), array([1, 1, 1, 3, 3, 6]))

In [11]:
# Solution
counts = np.zeros_like(a)
np.add.at(counts, idx, 1)

# Checks
expected = np.array([0, 3, 0, 2, 0, 0, 1, 0])
check("Problem 5", counts, expected)

counts

array([0, 3, 0, 2, 0, 0, 1, 0])
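
For the specific case of counting non-negative integer indices, `np.bincount` is a possible alternative to `np.add.at`. The sketch below assumes the output length is known in advance (`size` is an illustrative name) and passes it as `minlength` so trailing zero counts are kept.

```python
import numpy as np

size = 8
hits = np.array([1, 1, 1, 3, 3, 6])

# bincount counts occurrences of each non-negative integer;
# minlength pads the result up to the target array length.
counts_bc = np.bincount(hits, minlength=size)
print(counts_bc)  # [0 3 0 2 0 0 1 0]
```

Unlike `np.add.at`, this builds a new array instead of updating an existing one in place.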

## Problem 6 — Mask + fancy indexing to extract structured blocks

You have a feature matrix `F` of shape `(n_samples, n_features)` and a boolean mask selecting a subset of samples.
You also have an index list selecting a subset of features.

**Task:** Extract `block = F[mask][:, feat_idx]` in a single expression, and return:
- `block`
- `row_positions`: the integer indices of selected rows in the original array (useful for bookkeeping)

> Best practice: `np.flatnonzero(mask)` gives row indices directly.

In [12]:
# Setup
F = rng.integers(0, 50, size=(10, 6))
mask = F[:, 0] % 2 == 0          # keep rows where first feature is even
feat_idx = np.array([5, 2, 2, 0]) # note: includes a repeated feature index

# TODO:
# block = ...
# row_positions = ...

F

array([[34, 23, 35,  8, 45, 25],
       [46,  7, 24, 34, 24, 22],
       [ 8, 19, 11, 15, 34, 31],
       [30, 18, 47,  4, 17,  5],
       [16, 48, 18, 45, 24, 34],
       [22, 13, 38, 48, 13, 38],
       [13, 35, 39, 22, 36, 13],
       [ 3,  4, 22, 45,  6, 22],
       [35, 10, 36, 15, 40, 28],
       [27,  8, 23, 42,  0, 37]])

In [13]:
# Solution
row_positions = np.flatnonzero(mask)
block = F[row_positions][:, feat_idx]

# Checks
expected = np.array([F[i, feat_idx] for i in row_positions])
check("Problem 6 (block)", block, expected)
check("Problem 6 (rows)", row_positions, np.where(mask)[0])

row_positions, block

(array([0, 1, 2, 3, 4, 5]),
 array([[25, 35, 35, 34],
        [22, 24, 24, 46],
        [31, 11, 11,  8],
        [ 5, 47, 47, 30],
        [34, 18, 18, 16],
        [38, 38, 38, 22]]))
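
The same block can be written in a few equivalent ways. The sketch below compares chained indexing with a single indexing step via `np.ix_` (which accepts a boolean mask for a dimension) and with explicit broadcasting. It uses small demo arrays; `F_demo`, `mask_demo`, and `feat_demo` are illustrative names.

```python
import numpy as np

F_demo = np.arange(24).reshape(4, 6)
mask_demo = np.array([True, False, True, True])
feat_demo = np.array([5, 2, 2, 0])

# Chained: two indexing steps, with an intermediate copy of the selected rows.
chained = F_demo[mask_demo][:, feat_demo]

# Single indexing step: np.ix_ converts the boolean mask to row indices.
one_step = F_demo[np.ix_(mask_demo, feat_demo)]

# Explicit broadcasting: (n_rows, 1) row indices against (n_feats,) column indices.
rows_demo = np.flatnonzero(mask_demo)
broadcast = F_demo[rows_demo[:, None], feat_demo]

assert np.array_equal(chained, one_step)
assert np.array_equal(chained, broadcast)
```

Note that `F[mask, feat_idx]` is not equivalent: it attempts pairwise selection between the selected rows and `feat_idx`.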

## Problem 7 — Clip out-of-range indices safely, then gather

Sometimes you receive index arrays that contain negative values (which NumPy would otherwise interpret as counting from the end) or values beyond the last valid index (which raise `IndexError`).

Given `x` and `idx`, build `safe_picked` such that:
- Indices < 0 are treated as 0
- Indices >= len(x) are treated as len(x)-1
- Then gather `x[safe_idx]`

**Task:** Do it with vectorized NumPy operations (no loops).

In [14]:
# Setup
x = np.array([10, 20, 30, 40, 50])
idx = np.array([-3, 0, 2, 9, 4, -1])

# TODO:
# safe_idx = ...
# safe_picked = ...

x, idx

(array([10, 20, 30, 40, 50]), array([-3,  0,  2,  9,  4, -1]))

In [15]:
# Solution
safe_idx = np.clip(idx, 0, x.size - 1)
safe_picked = x[safe_idx]

# Checks
expected = np.array([10, 10, 30, 50, 50, 10])
check("Problem 7", safe_picked, expected)

safe_idx, safe_picked

(array([0, 0, 2, 4, 4, 0]), array([10, 10, 30, 50, 50, 10]))
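
Clipping and gathering can also be done in one call with `np.take(..., mode="clip")`, which clamps out-of-range indices and, in that mode, treats negative indices as 0 as well. The sketch below also shows why some safeguard is needed: plain indexing raises on the out-of-range value. `x_demo`, `idx_demo`, and `clipped` are illustrative names.

```python
import numpy as np

x_demo = np.array([10, 20, 30, 40, 50])
idx_demo = np.array([-3, 0, 2, 9, 4, -1])

# Plain fancy indexing fails because 9 is out of bounds for length 5.
try:
    x_demo[idx_demo]
    raised = False
except IndexError:
    raised = True

# mode="clip" clamps every index into [0, len - 1] before gathering.
clipped = np.take(x_demo, idx_demo, mode="clip")
print(raised)   # True
print(clipped)  # [10 10 30 50 50 10]
```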

## Problem 8 — Scatter into a 2D grid with repeated coordinates (use `np.add.at`)

You have a list of `(row, col)` coordinates and values to add into a 2D grid. Coordinates may repeat.

**Task:** Create `grid` and accumulate all contributions.

**Inputs:**
- `coords`: array of shape `(m, 2)` storing `(r, c)`
- `vals`: array of shape `(m,)`

> Best practice: `np.add.at(grid, (rows, cols), vals)` handles repeated positions correctly.

In [16]:
# Setup
H, W = 4, 5
coords = np.array([
    [0, 1],
    [0, 1],
    [2, 3],
    [3, 0],
    [2, 3],
    [2, 4],
])
vals = np.array([5, 7, 2, 9, 4, 1])

# TODO:
# grid = ...

coords, vals

(array([[0, 1],
        [0, 1],
        [2, 3],
        [3, 0],
        [2, 3],
        [2, 4]]),
 array([5, 7, 2, 9, 4, 1]))

In [17]:
# Solution
grid = np.zeros((H, W), dtype=int)
r = coords[:, 0]
c = coords[:, 1]
np.add.at(grid, (r, c), vals)

# Checks
expected = np.array([
    [0, 12, 0, 0, 0],
    [0,  0, 0, 0, 0],
    [0,  0, 0, 6, 1],
    [9,  0, 0, 0, 0],
])
check("Problem 8", grid, expected)

grid

array([[ 0, 12,  0,  0,  0],
       [ 0,  0,  0,  0,  0],
       [ 0,  0,  0,  6,  1],
       [ 9,  0,  0,  0,  0]])
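
A possible alternative for additive scatter is to flatten the coordinates with `np.ravel_multi_index` and sum with a weighted `np.bincount`. This is a sketch under the same setup as above; `flat` and `grid_bc` are illustrative names. With `weights`, `np.bincount` returns floats, so the result is cast back to the dtype of `vals`.

```python
import numpy as np

H, W = 4, 5
coords = np.array([[0, 1], [0, 1], [2, 3], [3, 0], [2, 3], [2, 4]])
vals = np.array([5, 7, 2, 9, 4, 1])

# Convert (row, col) pairs to flat row-major positions: row * W + col.
flat = np.ravel_multi_index((coords[:, 0], coords[:, 1]), (H, W))

# Sum the weights landing on each flat position, then restore the 2D shape.
grid_bc = np.bincount(flat, weights=vals, minlength=H * W).reshape(H, W)
grid_bc = grid_bc.astype(vals.dtype)
print(grid_bc)
```

This approach only covers summation; for other reductions such as a running maximum, the unbuffered `np.maximum.at` style remains the more direct fit.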

## Wrap-up

Key patterns you practiced:
- Pairwise selection: `A[rows, cols]`
- Cartesian submatrix selection: `A[np.ix_(rows, cols)]`
- Per-row gathers: `np.take_along_axis`
- Top-k per row: `argpartition` + `take_along_axis` + `argsort`
- Correct repeated-index updates: `np.add.at`
- Boolean mask to row indices: `np.flatnonzero`
- Safe indices: `np.clip`
