Merge pull request #177 from asmeurer/docs-fixes

More docs fixes
Quansight-Labs · May 20, 2024 · 0ba06d3
2 parents d6c87eb + aa8b264
commit 0ba06d3

Showing 7 changed files with 68 additions and 67 deletions.
diff --git a/docs/indexing-guide/index.md b/docs/indexing-guide/index.md
@@ -39,7 +39,7 @@ for each of the remaining index types, the basic indices:
 [ellipses](multidimensional-indices/ellipses.md), and
 [newaxis](multidimensional-indices/newaxis.md); and the advanced indices:
 [integer arrays](multidimensional-indices/integer-arrays.md) and [boolean
-arrays](multidimensional-indices/boolean-arrays.md).
+arrays](multidimensional-indices/boolean-arrays.md) (i.e., masks).
 
 Finally, a page on [other topics relevant to indexing](other-topics.md) covers
 a set of miscellaneous topics about NumPy arrays that are useful for

diff --git a/docs/indexing-guide/multidimensional-indices/boolean-arrays.md b/docs/indexing-guide/multidimensional-indices/boolean-arrays.md
@@ -156,7 +156,7 @@ It's important to not be fooled by this way of constructing a mask. Even
 though the *expression* `(a > 0) & (a % 2 == 1)` depends on `a`, the resulting
 *array itself* does not---it is just an array of booleans. **Boolean array
 indexing, as with [all other types of indexing](../intro.md), does not depend
-on the values of the array, only in the positions of its elements.**
+on the values of the array, only on the positions of its elements.**
 
 This distinction might feel overly pedantic, but it matters once you realize
 that a mask created with one array can be used on another array, so long as it
@@ -186,10 +186,10 @@ both. -->
    >>> plt.scatter(x, y, marker=',', s=1)
    <matplotlib.collections.PathCollection object at ...>
 
-If we want to show only those x values that are positive, we could easily do
-this by modifying the ``linspace`` call that created ``x``. But what if we
-want to show only those ``y`` values that are positive? The only way to do
-this is to select them using a mask:
+If we want to show only those :math:`x` values that are positive, we could
+easily do this by modifying the ``linspace`` call that created ``x``. But what
+if we want to show only those :math:`y` values that are positive? The only way
+to do this is to select them using a mask:
 
 .. plot::
    :context: close-figs
@@ -359,19 +359,21 @@ Masking a subset of dimension is not as common as masking the entire array
 "array of subarrays". For instance, suppose we have a video with 1920 x 1080
 pixels and 500 frames. This might be represented as an array of shape `(500,
 1080, 1920, 3)`, where the final dimension, 3, represents the 3 RGB color
-values of a pixel. We can think of this array as 500 `(1080, 1920, 3)`
-"frames". Or as 500 x 1080 x 1920 3-tuple "pixels". Or we could slice along
-the last dimension and think of it as 3 `(500, 1080, 1920)` video "channels",
-one for each primary color.
+values of a pixel. We can think of this array as 500 different 1080 &times;
+1920 &times; 3 "frames". Or as a 500 &times; 1080 &times; 1920 array of
+3-tuple "pixels". Or we could slice along the last dimension and think of it
+as three 500 &times; 1080 &times; 1920 video "channels", one for each primary
+color.
 
 In each case, we imagine that our array is really an array (or a stack or
 batch) of subarrays, where some of our dimensions are the "stacking"
 dimensions and some of them are the array dimensions. This way of thinking is
 also common when doing linear algebra on arrays. The last two dimensions
 (typically) are considered matrices, and the leading dimensions are batch
-dimensions. An array of shape `(10, 5, 4)` might be thought of as ten 5 x 4
-matrices. NumPy linear algebra functions like `solve` and the `@` matmul
-operator will automatically operate on the last two dimensions of an array.
+dimensions. An array of shape `(10, 5, 4)` might be thought of as ten 5
+&times; 4 matrices. NumPy linear algebra functions like `solve` and the `@`
+matmul operator will automatically operate on the last two dimensions of an
+array.
 
 So, how does this relate to using a boolean array index to select only a
 subset of the array dimensions? Well, we might want to use a boolean index to
@@ -437,18 +439,22 @@ saturation of only those pixels:
    >>> hsv_image[high_sat_mask, 1] = np.clip(hsv_image[high_sat_mask, 1] + 0.3, 0, 1)
    >>> # Convert back to RGB
    >>> enhanced_color_image = color.hsv2rgb(hsv_image)
-   >>> imshow(enhanced_color_image, "Saturated Image")
+   >>> imshow(enhanced_color_image, "Saturated Image (Better)")
 
 ```
 
 Here, `hsv_image.shape` is `(512, 512, 3)`, so our mask `hsv_image[:, :, 1] >
-0.6` has shape `(512, 512)`, i.e., the shape of the first two dimensions. In
-other words, the mask has one value for each pixel, either `True` if the
-saturation is `> 0.6` or `False` if it isn't. To add 0.3 to only those pixels
-above the threshold, we mask the original array with `hsv_image[high_sat_mask,
-1]`. The `high_sat_mask` part of the index selects only those pixel values
-that have high saturation, and the `1` in the final dimension selects the
-saturation channel for those pixels.
+0.6`[^high_sat_mask-footnote] has shape `(512, 512)`, i.e., the shape of the
+first two dimensions. In other words, the mask has one value for each pixel,
+either `True` if the saturation is `> 0.6` or `False` if it isn't. To add
+`0.3` saturation to only those pixels above the threshold, we mask the
+original array with `hsv_image[high_sat_mask, 1]`. The `high_sat_mask` part of
+the index selects only those pixel values that have high saturation, and the
+`1` in the final dimension selects the saturation channel for those pixels.
+
+[^high_sat_mask-footnote]: We could have also written `(hsv_image > 0.6)[:, :,
+    1]`, although this would be less efficient because it would unnecessarily
+    compute `> 0.6` for the hue and value channels.
 
 (nonzero-equivalence)=
 ### `nonzero()` Equivalence
@@ -674,9 +680,9 @@ Or if it had no actual `0`s:[^0-d-mask-footnote]
 array([1, 1, 2])
 ```
 
-But even if `a` is a 0-D array, i.e., a single scalar value, we would expect
-this sort of thing to still work, since, as we said, `a[a == 0] = -1` should
-work for *any* array. And indeed, it does:
+But even if `a` is a 0-D array, i.e., a single scalar value, we would still
+expect this sort of thing to work, since, as we said, `a[a == 0] = -1`
+should work for *any* array. And indeed, it does:
 
 ```py
 >>> a = np.asarray(0)
@@ -714,7 +720,7 @@ array([], dtype=int64)
 ```
 
 In this case, `a[a == 0] = -1` would assign `-1` to all the values in `a[a
-== 0]`, which would be no values, so `a` would remain unchanged:
+== 0]`, i.e., no values, so `a` would remain unchanged:
 
 ```py
 >>> a[a == 0] = -1

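The boolean-array hunks above (the partial-dimension mask, its footnote, and the 0-D case) can be checked with a short sketch. It uses a small random array as a hypothetical stand-in for the guide's `hsv_image`, so the shape `(4, 5, 3)` and the seed are assumptions, not part of the docs.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the guide's HSV image: (rows, cols, channels)
hsv = rng.random((4, 5, 3))

# Mask over the first two dimensions only: one boolean per pixel
high_sat_mask = hsv[:, :, 1] > 0.6
print(high_sat_mask.shape)  # (4, 5)

# The footnote's alternative yields the same mask, but compares all
# three channels before discarding two of them
alt_mask = (hsv > 0.6)[:, :, 1]
print(np.array_equal(high_sat_mask, alt_mask))  # True

# Boost the saturation channel only where the mask is True
boosted = hsv.copy()
boosted[high_sat_mask, 1] = np.clip(boosted[high_sat_mask, 1] + 0.3, 0, 1)

# Pixels outside the mask, and the hue channel, are untouched
print(np.array_equal(boosted[~high_sat_mask], hsv[~high_sat_mask]))  # True
print(np.array_equal(boosted[..., 0], hsv[..., 0]))  # True

# The same assignment idiom also works on a 0-D array
a = np.asarray(0)
a[a == 0] = -1
print(a)  # -1
```

The mask depends only on positions, so the same `high_sat_mask` could be applied to any other array whose first two dimensions are `(4, 5)`.
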
diff --git a/docs/indexing-guide/multidimensional-indices/integer-arrays.md b/docs/indexing-guide/multidimensional-indices/integer-arrays.md
@@ -108,26 +108,12 @@ For example:
 ```
 
 In particular, even when the index array `idx` has more than one dimension, an
-integer array index still only selects elements from a single axis of `a`.
-
-```
->>> a = np.array([[100, 101, 102],
-...               [103, 104, 105]])
->>> idx = np.array([0, 0, 1])
->>> a[idx] # Index the first dimension
-array([[100, 101, 102],
-       [100, 101, 102],
-       [103, 104, 105]])
->>> a[:, idx] # Index the second dimension
-array([[100, 100, 101],
-       [103, 103, 104]])
-```
-
-It would appear that this limits the ability to arbitrarily shuffle elements
-of `a` using integer indexing. For instance, suppose we want to create the
-array `[105, 100]` from the above 2-D `a`. Based on the above examples, it
-might not seem possible. The elements `105` and `100` are not in the same row
-or column of `a`.
+integer array index still only selects elements from a single axis of `a`. It
+would appear that this limits the ability to arbitrarily shuffle elements of
+`a` using integer indexing. For instance, suppose we want to create the array
+`[105, 100]` from the above 2-D `a`. Based on the above examples, it might not
+seem possible, since the elements `105` and `100` are not in the same row or
+column of `a`.
 
 However, this is doable by providing multiple integer array
 indices:
@@ -136,11 +122,12 @@ indices:
 > **When multiple integer array indices are provided, the elements of each
 > index are selected correspondingly for that axis.**
 
-It's perhaps most illustrative to
-show this as an example. Given the above `a`, we can produce the array `[105,
-100]` using.
+It's perhaps most illustrative to show this as an example. Given the above
+`a`, we can produce the array `[105, 100]` using
 
 ```
+>>> a = np.array([[100, 101, 102],
+...               [103, 104, 105]])
 >>> idx = (np.array([1, 0]), np.array([2, 0]))
 >>> a[idx]
 array([105, 100])
@@ -415,9 +402,9 @@ array([105, 100])
 However, you might have noticed that this behavior is somewhat unusual
 compared to other index types. For all other index types we've discussed so
 far, such as [slices](../slices.md) and [integer indices](../integer-indices.md),
-each index applies "independently" along each dimension. For example, `x[0:3,
-0:2]` applies the slice `0:3` to the first dimension of `x` and `0:2` to the
-second dimension. The resulting array has `3*2 = 6` elements, because there
+each index applies "independently" along each dimension. For example, `x[0:2,
+0:3]` applies the slice `0:2` to the first dimension of `x` and `0:3` to the
+second dimension. The resulting array has `2*3 = 6` elements, because there
 are 2 subarrays selected from the first dimension with 3 elements each. But in
 the above example, `a[[1, 0], [2, 0]]` only has 2 elements, not 4. And
 something like `a[[1, 0], [2, 0, 1]]` is an error.
@@ -548,14 +535,15 @@ Conversely, a slice like `2:9` is equivalent to the outer index `[2, 3,
 
 [^slice-outer-index-footnote]: They aren't actually equivalent, because [a
     slice creates a view and an integer array index creates a
-    copy](views-vs-copies). If your index can be represented as a slice, it's
-    better to use an actual `slice`.
+    copy](views-vs-copies), not to mention the fact that slices
+    [clip](clipping) and integer arrays have bounds checks. If your index can
+    be represented as a slice, it's usually better to use an actual `slice`.
 
 ### Assigning to an Integer Array Index
 
 As with all index types discussed in this guide, an integer array index can be
 used on the left-hand side of an assignment. This is useful because it allows
-you to surgically inject new elements into your array.
+you to surgically inject new elements into existing positions in your array.
 
 ```py
 >>> a = np.array([100, 101, 102, 103]) # as above

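A minimal sketch of the integer-array behaviors touched by these hunks: paired selection with multiple index arrays, the shape-mismatch error, and the clipping versus bounds-checking difference from the revised footnote. The use of `np.ix_` to build an outer index is an illustration chosen here and is not taken from the diff.

```python
import numpy as np

a = np.array([[100, 101, 102],
              [103, 104, 105]])

# Multiple integer array indices are paired element by element:
# this selects a[1, 2] and a[0, 0], not a 2 x 2 block
paired = a[np.array([1, 0]), np.array([2, 0])]
print(paired)  # [105 100]

# Index arrays that cannot be broadcast together are rejected
try:
    a[[1, 0], [2, 0, 1]]
    mismatch_raises = False
except IndexError:
    mismatch_raises = True
print(mismatch_raises)  # True

# An outer index (every selected row against every selected column)
outer = a[np.ix_([1, 0], [2, 0])]
print(outer.tolist())  # [[105, 103], [102, 100]]

# Slices clip to the array bounds; integer arrays are bounds-checked
b = np.arange(5)
print(b[2:9])  # [2 3 4]
try:
    b[np.arange(2, 9)]
    out_of_bounds_raises = False
except IndexError:
    out_of_bounds_raises = True
print(out_of_bounds_raises)  # True
```
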
diff --git a/docs/indexing-guide/multidimensional-indices/newaxis.md b/docs/indexing-guide/multidimensional-indices/newaxis.md
@@ -102,23 +102,24 @@ array([[[0],
         [7]]])
 ```
 
+Let's look at each of these more closely:
 
-- `a[np.newaxis, 0, :2]`: the new axis is inserted before the first axis, but
+1. `a[np.newaxis, 0, :2]`: the new axis is inserted before the first axis, but
 the `0` and `:2` still index the original first and second axes. The resulting
 shape is `(1, 2, 4)`.
 
-- `a[0, np.newaxis, :2]`: the new axis is inserted after the first axis, but
+2. `a[0, np.newaxis, :2]`: the new axis is inserted after the first axis, but
 because the `0` removes this axis when it indexes it, the resulting shape is
 still `(1, 2, 4)` (and the resulting array is the same).
 
-- `a[0, :2, np.newaxis]`: the new axis is inserted after the second axis,
+3. `a[0, :2, np.newaxis]`: the new axis is inserted after the second axis,
 because the `newaxis` comes right after the `:2`, which indexes the second
 axis. The resulting shape is `(2, 1, 4)`. Remember that the `4` in the shape
 corresponds to the last axis, which isn't represented in the index at all.
 That's why in this example, the `4` still comes at the end of the resulting
 shape.
 
-- `a[0, :2, ..., np.newaxis]`: the `newaxis` is after an ellipsis, so the new
+4. `a[0, :2, ..., np.newaxis]`: the `newaxis` is after an ellipsis, so the new
 axis is inserted at the end of the shape. The resulting shape is `(2, 4, 1)`.
 
 In general, in a tuple index, the axis that each index selects corresponds to
@@ -155,6 +156,7 @@ In summary,
   non-`newaxis` indices in the tuple index are indexed as if the `newaxis`
   indices were not there.**
 
+(where-newaxis-is-used)=
 ## Where `newaxis` is Used
 
 What we haven't said yet is why you would want to do such a thing in the first

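The four shapes listed in the renumbered items can be confirmed directly. The sketch assumes `a` has shape `(3, 2, 4)`, which is consistent with the shapes quoted in the hunk but is not shown in the diff itself.

```python
import numpy as np

# Assumed shape; only the trailing (2, 4) part matters for the results below
a = np.arange(24).reshape((3, 2, 4))

shapes = {
    "a[np.newaxis, 0, :2]": a[np.newaxis, 0, :2].shape,
    "a[0, np.newaxis, :2]": a[0, np.newaxis, :2].shape,
    "a[0, :2, np.newaxis]": a[0, :2, np.newaxis].shape,
    "a[0, :2, ..., np.newaxis]": a[0, :2, ..., np.newaxis].shape,
}
for expr, shape in shapes.items():
    print(expr, shape)

# Items 1 and 2 produce the same array, as the text notes
same = np.array_equal(a[np.newaxis, 0, :2], a[0, np.newaxis, :2])
print(same)  # True
```
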
diff --git a/docs/indexing-guide/multidimensional-indices/tuples.md b/docs/indexing-guide/multidimensional-indices/tuples.md
@@ -199,9 +199,9 @@ every index type as a single element tuple index. An integer index `0` is
 `a[0:3,]`. This is a good way to think about indices because it will help you
 remember that non-tuple indices operate as if they were the first element of a
 single-element tuple index, namely, they operate on the first axis of the
-array. Remember, however, that this does not apply to Python built-in types;
-for example, `l[0,]` and `l[0:3,]` will both produce errors if `l` is a
-`list`, `tuple`, or `str`.
+array. Remember, however, that this does not apply to Python built-in types:
+`l[0,]` and `l[0:3,]` will both produce errors if `l` is a `list`, `tuple`, or
+`str`.
 
 Up to now, we looked at the tuple index `(1, 0, 2)`, which selected a single
 element. And we considered sub-tuples of this, `(1,)` and `(1, 0)`, which
@@ -355,7 +355,7 @@ argument to retain the dimension as a size-1 dimension instead.
     array.
 
 There are two final facts about tuple indices that should be noted before we
-move on to the other basic index types. First, as we noticed above,
+move on to the other basic index types. First, as we saw above,
 
 > **if a tuple index has more elements than there are dimensions in an array,
   it raises an `IndexError`.**

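The two claims in these tuple hunks (single-element tuple indices work on arrays but not on built-in sequences, and too many indices raise `IndexError`) can be sketched as follows. The array shape is an arbitrary choice for illustration.

```python
import numpy as np

a = np.arange(24).reshape((3, 2, 4))  # arbitrary 3-D example array

# On a NumPy array, a non-tuple index behaves like a one-element tuple index
int_same = np.array_equal(a[0], a[0,])
slice_same = np.array_equal(a[0:3], a[0:3,])
print(int_same, slice_same)  # True True

# Built-in sequences reject tuple indices
builtin_errors = []
for obj in ([1, 2, 3], (1, 2, 3), "abc"):
    try:
        obj[0,]
    except TypeError:
        builtin_errors.append(type(obj).__name__)
print(builtin_errors)  # ['list', 'tuple', 'str']

# More indices than dimensions raises IndexError
try:
    a[0, 0, 0, 0]
    too_many_raises = False
except IndexError:
    too_many_raises = True
print(too_many_raises)  # True
```
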
diff --git a/docs/indexing-guide/other-topics.md b/docs/indexing-guide/other-topics.md
@@ -109,6 +109,11 @@ It can be useful to think of broadcasting as repeating "stacks" of smaller
 arrays in this way. The size `1` dimension rule allows these "stacks" to be
 along any dimensions of the array, not just the last ones.
 
+When it comes to indexing, one of the most useful types of index for use with
+broadcasting is [newaxis](./multidimensional-indices/newaxis.md), which lets
+you easily insert size `1` dimensions into an array to make it broadcastable
+in a specific way. See [](where-newaxis-is-used).
+
 See the [NumPy
 documentation](https://numpy.org/doc/stable/user/basics.broadcasting.html) for
 more examples of broadcasting.

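As a small illustration of the paragraph added in this hunk, `newaxis` can turn a 1-D array into a column so that it broadcasts against another 1-D array. The arrays `x` and `y` are made up for the example.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20])

# x and y cannot be broadcast together as-is: shapes (3,) and (2,)
try:
    x * y
    plain_raises = False
except ValueError:
    plain_raises = True
print(plain_raises)  # True

# Inserting a size-1 dimension makes x a (3, 1) column, which
# broadcasts against y's shape (2,) to give a (3, 2) result
table = x[:, np.newaxis] * y
print(table.shape)     # (3, 2)
print(table.tolist())  # [[10, 20], [20, 40], [30, 60]]
```
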
diff --git a/docs/indexing-guide/slices.md b/docs/indexing-guide/slices.md
@@ -1299,7 +1299,7 @@ Something like the following would work:
 ```py
 >>> mid = len(a)//2
 >>> n = 4
->>> a[mid - n//2: mid + n//2]
+>>> a[mid - n//2:mid + n//2]
 ['b', 'c', 'd', 'e']
 ```
 
@@ -1311,7 +1311,7 @@ However, let's look at what happens when `n` is larger than the size of `a`:
 
 ```py
 >>> n = 8
->>> a[mid - n//2: mid + n//2]
+>>> a[mid - n//2:mid + n//2]
 ['g']
 ```
 
@@ -1346,10 +1346,10 @@ manually clip with `max(mid - n//2, 0)`:
 
 ```py
 >>> n = 4
->>> a[max(mid - n//2, 0): mid + n//2]
+>>> a[max(mid - n//2, 0):mid + n//2]
 ['b', 'c', 'd', 'e']
 >>> n = 8
->>> a[max(mid - n//2, 0): mid + n//2]
+>>> a[max(mid - n//2, 0):mid + n//2]
 ['a', 'b', 'c', 'd', 'e', 'f', 'g']
 ```
 
@@ -1457,7 +1457,7 @@ the *maximum* length of a slice. If the shape or length of the input is known,
 {meth}`len(ndindex.Slice(...).reduce(shape)) <ndindex.Slice.reduce>` will
 compute the true length of the slice. Of course, if you already have a list or
 a NumPy array, you can just slice it and check the shape. Slicing a NumPy
-array always produces a [view on the array](views-vs-copies), so it is a very
+array always produces a [view on the array](views-vs-copies), so it is an
 inexpensive operation. Slicing a `list` does make a copy, but it's a shallow
 copy so it isn't particularly expensive either.
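
The reformatted slice examples in this file can be reproduced with a short sketch. The list `a` is assumed to be the seven letters `'a'` through `'g'`, which matches the outputs shown in the hunks, and `centered_window` is a hypothetical helper name introduced only for this illustration.

```python
def centered_window(seq, n):
    """Return up to n items around the middle of seq, clipping the start at 0."""
    mid = len(seq) // 2
    start = max(mid - n // 2, 0)  # avoid a negative start wrapping around
    return seq[start:mid + n // 2]

# Assumed contents of `a`, inferred from the outputs in the diff
a = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
mid = len(a) // 2

# Without the manual clip, n = 8 gives a start of -1, which wraps around
n = 8
print(a[mid - n//2:mid + n//2])  # ['g']

# With the clip, both cases behave as intended
print(centered_window(a, 4))  # ['b', 'c', 'd', 'e']
print(centered_window(a, 8))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```

Only the start needs clipping here because a stop past the end of the sequence is already clipped by the slice itself.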