Write a Python function to generate random subsets of a given dataset. The function should take in a 2D numpy array X, a 1D numpy array y, an integer n_subsets, and a boolean replacements. It should return a list of n_subsets random subsets of the dataset, where each subset is a tuple of (X_subset, y_subset). If replacements is True, the subsets should be created with replacements; otherwise, without replacements.
`replacements=True` means duplicate indices are allowed within a subset.
The subsets do not necessarily all have the same size.

In [8]:
import numpy as np

def get_random_subsets(X, y, n_subsets, replacements):
    n, m = X.shape  # n samples, m features
    np.random.seed(42)  # fixed seed so the output is reproducible
    # Full-size subsets when sampling with replacement, half-size otherwise.
    subset_size = n if replacements else n // 2
    # One row of indices per subset. With replace=False and subset_size == n,
    # np.random.choice(n, subset_size, replace=False) behaves like a shuffle.
    idx = np.array([np.random.choice(n, subset_size, replace=replacements) for _ in range(n_subsets)])
    # X[idx] selects rows of X in the order specified by the indices in idx.
    return [(X[idx][i].tolist(), y[idx][i].tolist()) for i in range(n_subsets)]
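
The comment about `np.random.choice` behaving like a shuffle can be checked with a small sketch. The variable names and the seed below are illustrative only and are not part of the function above.

```python
import numpy as np

n = 5
np.random.seed(0)  # arbitrary seed, chosen only for repeatability

# Drawing all n indices without replacement yields each index exactly once,
# i.e. a random permutation of 0..n-1.
perm = np.random.choice(n, n, replace=False)
is_permutation = sorted(perm.tolist()) == list(range(n))
print(is_permutation)  # True

# With replacement, every index is still in range, but repeats are allowed.
boot = np.random.choice(n, n, replace=True)
in_range = bool(((boot >= 0) & (boot < n)).all())
print(in_range)  # True
```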


You are absolutely correct to bring this up, and thank you for clarifying! Let's go through the code and its behavior again step by step.

---

### **How `X[idx]` Works in This Context**

In the code:
```python
idx = np.array([np.random.choice(n, subset_size, replace=replacements) for _ in range(n_subsets)])
```

- `idx` is a **2D array**, where each row contains indices generated by `np.random.choice`.
- When `X[idx]` is used, it applies **advanced indexing**.

---

### **What Happens with `X[idx]` in Advanced Indexing**
For a 2D array `X` and a 2D array `idx`:
- **`X[idx]` treats each row of `idx` as a separate set of row indices** into the first axis of `X`.
- The output is a **3D array of shape `(n_subsets, subset_size, number of columns of X)`**: one block of rows per row in `idx`.

So, **`X[idx][i]`** selects the rows of `X` based on the indices in the `i`-th row of `idx`.
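
As a quick check of the indexing behavior described above, here is a minimal sketch. The array contents are made up for illustration, and `stacked` and `same` are hypothetical names.

```python
import numpy as np

X = np.arange(10).reshape(5, 2)      # 5 samples, 2 features (made-up data)
idx = np.array([[1, 4],
                [3, 1],
                [1, 0]])             # 3 subsets, 2 row indices each

stacked = X[idx]                     # integer-array indexing on the first axis
print(stacked.shape)                 # (3, 2, 2): (n_subsets, subset_size, n_features)

# The i-th block equals indexing X with the i-th row of idx.
same = all(np.array_equal(stacked[i], X[idx[i]]) for i in range(len(idx)))
print(same)                          # True
```

Since `X[idx][i]` and `X[idx[i]]` give the same rows, either form can be used to pull out a single subset.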

---

### **Code Walkthrough with Explanation**

Given:

```python
X = np.array([[1, 2],
              [3, 4],
              [5, 6],
              [7, 8],
              [9, 10]])
y = np.array([1, 2, 3, 4, 5])
n_subsets = 3
replacements = False
```

1. **Step 1: Generate `idx`**:
   ```python
   np.random.seed(42)
   subset_size = n // 2  # replacements=False and n = 5, so subset_size is 5 // 2 = 2
   idx = np.array([np.random.choice(5, 2, replace=False) for _ in range(3)])
   ```
   Example output for `idx`:
   ```
   [[1, 4],
    [3, 1],
    [1, 0]]
   ```

2. **Step 2: Use `X[idx]`**:
   When `X[idx]` is applied:
   - For the first subset (`idx[0] = [1, 4]`), it selects rows `X[1]` and `X[4]`.
   - For the second subset (`idx[1] = [3, 1]`), it selects rows `X[3]` and `X[1]`.
   - For the third subset (`idx[2] = [1, 0]`), it selects rows `X[1]` and `X[0]`.

   Example:
   ```
   X[idx] = [
       [[3, 4], [9, 10]],  # Rows corresponding to [1, 4]
       [[7, 8], [3, 4]],   # Rows corresponding to [3, 1]
       [[3, 4], [1, 2]]    # Rows corresponding to [1, 0]
   ]
   ```
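
The same index array is applied to `y`, which keeps each label paired with its row. A small sketch using the example indices from Step 1 (hard-coded here rather than drawn at random):

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([1, 2, 3, 4, 5])
idx = np.array([[1, 4], [3, 1], [1, 0]])  # example indices from Step 1

X_sub = X[idx]   # shape (3, 2, 2)
y_sub = y[idx]   # shape (3, 2)
print(y_sub.tolist())  # [[2, 5], [4, 2], [2, 1]]

# In this toy data the second feature of each row is 2 * label,
# so alignment between rows and labels can be verified directly.
aligned = np.array_equal(X_sub[..., 1], 2 * y_sub)
print(aligned)  # True
```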



In [9]:
X = np.array([[1, 2],
                  [3, 4],
                  [5, 6],
                  [7, 8],
                  [9, 10]])
y = np.array([1, 2, 3, 4, 5])
n_subsets = 3
replacements = False
get_random_subsets(X, y, n_subsets, replacements)


[([[3, 4], [9, 10]], [2, 5]),
 ([[7, 8], [3, 4]], [4, 2]),
 ([[3, 4], [1, 2]], [2, 1])]

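The function generates three random subsets of the dataset without replacement. Because `replacements=False`, each subset contains `n // 2 = 2` of the 5 samples (40%). No sample is repeated within a subset, although the same sample can appear in more than one subset: row `[3, 4]` appears in all three here.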
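
For comparison, the two modes can be exercised side by side. The sketch below is a hypothetical variant, `sample_subsets`, that swaps the global `np.random.seed` call for a local `np.random.default_rng` generator. It is not the function above, and its drawn indices will differ from the output shown, but the subset sizes follow the same rule.

```python
import numpy as np

def sample_subsets(X, y, n_subsets, replacements, seed=42):
    """Hypothetical variant of get_random_subsets using a local Generator."""
    rng = np.random.default_rng(seed)  # leaves NumPy's global random state untouched
    n = X.shape[0]
    subset_size = n if replacements else n // 2
    subsets = []
    for _ in range(n_subsets):
        rows = rng.choice(n, size=subset_size, replace=replacements)
        subsets.append((X[rows].tolist(), y[rows].tolist()))
    return subsets

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
y = np.array([1, 2, 3, 4, 5])

no_repl = sample_subsets(X, y, 3, False)
with_repl = sample_subsets(X, y, 3, True)

print([len(ys) for _, ys in no_repl])    # [2, 2, 2]
print([len(ys) for _, ys in with_repl])  # [5, 5, 5]
# Labels are unique in this data, so no repeats means no duplicate indices.
print(all(len(set(ys)) == len(ys) for _, ys in no_repl))  # True
```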

