From 5785c72b960a979a5aad8e86411bf96749f9abe1 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Tue, 18 Jul 2023 15:50:24 -0400
Subject: [PATCH 01/29] Initial commit

---
 docs/src/man/basics.md | 1221 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 1183 insertions(+), 38 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 9ddede8cf..ad68bf691 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -1565,40 +1565,1187 @@ julia> german[Not(5), r"S"]
                 984 rows omitted
 ```
 
-## Basic Usage of Transformation Functions
-
-In DataFrames.jl we have five functions that we can be used to perform
-transformations of columns of a data frame:
-
-- `combine`: creates a new data frame populated with columns that are results of
-  transformation applied to the source data frame columns, potentially combining
-  its rows;
-- `select`: creates a new data frame that has the same number of rows as the
-  source data frame populated with columns that are results of transformations
-  applied to the source data frame columns;
-- `select!`: the same as `select` but updates the passed data frame in place;
-- `transform`: the same as `select` but keeps the columns that were already
-  present in the data frame (note though that these columns can be potentially
-  modified by the transformation passed to `transform`);
-- `transform!`: the same as `transform` but updates the passed data frame in
-  place.
-
-The fundamental ways to specify a transformation are:
-
-- `source_column => transformation => target_column_name`; In this scenario the
-  `source_column` is passed as an argument to `transformation` function and
-  stored in `target_column_name` column.
-- `source_column => transformation`; In this scenario we apply the
-  transformation function to `source_column` and the target column names is
-  automatically generated.
-- `source_column => target_column_name` renames the `source_column` to
-  `target_column_name`.
-- `source_column` just keep the source column as is in the result without any
-  transformation;
-
-These rules are typically called transformation mini-language.
-
-Let us move to the examples of application of these rules
+## Basic Usage of Manipulation Functions
+
+In DataFrames.jl there are seven functions that can be used
+to manipulate data frame columns:
+
+| Function     | Memory Usage                     | Column Retention                             | Row Retention                                     |
+| ------------ | -------------------------------- | -------------------------------------------- | ------------------------------------------------- |
+| `transform`  | Creates a new data frame.        | Retains both source and manipulated columns. | Retains same number of rows as source data frame. |
+| `transform!` | Modifies an existing data frame. | Retains both source and manipulated columns. | Retains same number of rows as source data frame. |
+| `select`     | Creates a new data frame.        | Retains only manipulated columns.            | Retains same number of rows as source data frame. |
+| `select!`    | Modifies an existing data frame. | Retains only manipulated columns.            | Retains same number of rows as source data frame. |
+| `subset`     | Creates a new data frame.        | Retains only source columns.                 | Number of rows is determined by the manipulation. |
+| `subset!`    | Modifies an existing data frame. | Retains only source columns.                 | Number of rows is determined by the manipulation. |
+| `combine`    | Creates a new data frame.        | Retains only manipulated columns.            | Number of rows is determined by the manipulation. |
+
+### Constructing Operation Pairs
+All of the functions above share the same syntax, which is commonly
+`manipulation_function(dataframe, operation)`.
+The `operation` argument is usually a `Pair` which defines the
+operation to be applied to the source `dataframe`,
+and it can take any of the following common forms:
+
+`source_column_selector`
+: selects source column(s) without manipulating or renaming them
+
+`source_column_selector => operation_function`
+: passes source column(s) as arguments to a function
+and automatically names the resulting column(s)
+
+`source_column_selector => operation_function => new_column_names`
+: passes source column(s) as arguments to a function
+and names the resulting column(s) `new_column_names`
+
+`source_column_selector => new_column_names`
+: renames a source column,
+or splits a column containing collection elements into multiple new columns
+
+!!! Note
+      The `source_column_selector`
+      and the `source_column_selector => new_column_names` operation forms
+      are not available for the `subset` and `subset!` manipulation functions.
+
+#### `source_column_selector`
+Inside an `operation`, `source_column_selector` is usually a column name
+or column index which identifies a data frame column.
+`source_column_selector` may be used as the entire `operation`
+with `select` or `select!` to isolate or reorder columns.
+
+```julia
+julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6], c = [7, 8, 9])
+3×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      7
+   2 │     2      5      8
+   3 │     3      6      9
+
+julia> select(df, :b)
+3×1 DataFrame
+ Row │ b
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     5
+   3 │     6
+
+julia> select(df, "b")
+3×1 DataFrame
+ Row │ b
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     5
+   3 │     6
+
+julia> select(df, 2)
+3×1 DataFrame
+ Row │ b
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     5
+   3 │     6
+```
+
+`source_column_selector` may also be a collection of columns such as a vector,
+a [regular expression](https://docs.julialang.org/en/v1/manual/strings/#Regular-Expressions),
+a `Not`, `Between`, `All`, or `Cols` expression,
+or a `:`.
+See the [Indexing](@ref) API for the full list of possible values with references.
+
+!!! Note
+      The Julia parser sometimes prevents `:` from being used by itself.
+      `ERROR: syntax: whitespace not allowed after ":" used for quoting`
+      means your `:` must be wrapped in either `(:)` or `Cols(:)`
+      to be properly interpreted.
+
+```julia
+julia> df = DataFrame(
+           id = [1, 2, 3],
+           first_name = ["José", "Emma", "Nathan"],
+           last_name = ["Garcia", "Marino", "Boyer"],
+           age = [61, 24, 33]
+       )
+3×4 DataFrame
+ Row │ id     first_name  last_name  age
+     │ Int64  String      String     Int64
+─────┼─────────────────────────────────────
+   1 │     1  José        Garcia        61
+   2 │     2  Emma        Marino        24
+   3 │     3  Nathan      Boyer         33
+
+julia> select(df, [:last_name, :first_name])
+3×2 DataFrame
+ Row │ last_name  first_name
+     │ String     String
+─────┼───────────────────────
+   1 │ Garcia     José
+   2 │ Marino     Emma
+   3 │ Boyer      Nathan
+
+julia> select(df, r"name")
+3×2 DataFrame
+ Row │ first_name  last_name
+     │ String      String
+─────┼───────────────────────
+   1 │ José        Garcia
+   2 │ Emma        Marino
+   3 │ Nathan      Boyer
+
+julia> select(df, Not(:id))
+3×3 DataFrame
+ Row │ first_name  last_name  age
+     │ String      String     Int64
+─────┼──────────────────────────────
+   1 │ José        Garcia        61
+   2 │ Emma        Marino        24
+   3 │ Nathan      Boyer         33
+
+julia> select(df, Between(2,4))
+3×3 DataFrame
+ Row │ first_name  last_name  age
+     │ String      String     Int64
+─────┼──────────────────────────────
+   1 │ José        Garcia        61
+   2 │ Emma        Marino        24
+   3 │ Nathan      Boyer         33
+```
+
+`AsTable(source_column_selector)` is a special `source_column_selector`
+that can be used to select multiple columns into a single `NamedTuple`.
+This is not useful on its own, so the function of this selector
+will be explained in the next section.
+
+
+#### `operation_function`
+Inside an `operation` pair, `operation_function` is a function
+which operates on data frame columns passed as vectors.
+When multiple columns are selected by `source_column_selector`,
+the `operation_function` will receive the columns as multiple positional arguments
+in the order they were selected, e.g. `f(column1, column2, column3)`.
+
+```julia
+julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 4])
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      4
+
+julia> combine(df, :a => sum)
+1×1 DataFrame
+ Row │ a_sum
+     │ Int64
+─────┼───────
+   1 │     6
+
+julia> transform(df, :b => maximum) # `transform` and `select` copy result to all rows
+3×3 DataFrame
+ Row │ a      b      b_maximum
+     │ Int64  Int64  Int64
+─────┼─────────────────────────
+   1 │     1      4          5
+   2 │     2      5          5
+   3 │     3      4          5
+
+julia> transform(df, [:b, :a] => -) # vector subtraction is okay
+3×3 DataFrame
+ Row │ a      b      b_a_-
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      3
+   2 │     2      5      3
+   3 │     3      4      1
+
+julia> transform(df, [:a, :b] => *) # vector multiplication is not defined
+ERROR: MethodError: no method matching *(::Vector{Int64}, ::Vector{Int64})
+```
+
+Don't worry! There is a quick fix for the previous error.
+If you want to apply a function to each element in a column
+instead of to the entire column vector,
+then you can wrap your element-wise function in `ByRow` like
+`ByRow(my_elementwise_function)`.
+This will apply `my_elementwise_function` to every element in the column
+and then collect the results back into a vector.
+
+```julia
+julia> transform(df, [:a, :b] => ByRow(*))
+3×3 DataFrame
+ Row │ a      b      a_b_*
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      4
+   2 │     2      5     10
+   3 │     3      4     12
+
+julia> transform(df, Cols(:) => ByRow(max))
+3×3 DataFrame
+ Row │ a      b      a_b_max
+     │ Int64  Int64  Int64
+─────┼───────────────────────
+   1 │     1      4        4
+   2 │     2      5        5
+   3 │     3      4        4
+
+julia> f(x) = x + 1
+f (generic function with 1 method)
+
+julia> transform(df, :a => ByRow(f))
+3×3 DataFrame
+ Row │ a      b      a_f
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      2
+   2 │     2      5      3
+   3 │     3      4      4
+```
+
+Alternatively, you may just want to define the function itself so it
+[broadcasts](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting)
+over vectors.
+
+```julia
+julia> g(x) = x .+ 1
+g (generic function with 1 method)
+
+julia> transform(df, :a => g)
+3×3 DataFrame
+ Row │ a      b      a_g
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      2
+   2 │     2      5      3
+   3 │     3      4      4
+```
+
+[Anonymous functions](https://docs.julialang.org/en/v1/manual/functions/#man-anonymous-functions)
+are a convenient way to define and use an `operation_function`
+all within the manipulation function call.
+
+```julia
+julia> select(df, :a => ByRow(x -> x + 1))
+3×1 DataFrame
+ Row │ a_function
+     │ Int64
+─────┼────────────
+   1 │          2
+   2 │          3
+   3 │          4
+
+julia> transform(df, [:a, :b] => ByRow((x, y) -> 2x + y))
+3×3 DataFrame
+ Row │ a      b      a_b_function
+     │ Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      4             6
+   2 │     2      5             9
+   3 │     3      4            10
+
+julia> subset(df, :b => ByRow(x -> x < 5))
+2×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     3      4
+
+julia> subset(df, :b => ByRow(<(5))) # shorter version of the previous
+2×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     3      4
+```
+
+!!! Note
+    Each `operation_function` within a `subset` or `subset!` function call
+    must return a Boolean vector.
+    `true` elements in the boolean vector will determine
+    which rows are retained in the resulting data frame.
+
+As demonstrated above, `DataFrame` columns are usually passed
+from `source_column_selector` to `operation_function` as one or more
+vector arguments.
+However, when `AsTable(source_column_selector)` is used,
+the selected columns are collected and passed as a single `NamedTuple`
+to `operation_function`.
+
+This is often useful when your `operation_function` is defined to operate
+on a single collection argument rather than on multiple positional arguments.
+The distinction is somewhat similar to the difference between the built-in
+`min` and `minimum` functions.
+`min` is defined to find the minimum value among multiple positional arguments,
+while `minimum` is defined to find the minimum value
+among the elements of a single collection argument.
+
+```julia
+julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 2:-1:1)
+2×4 DataFrame
+ Row │ a      b      c      d
+     │ Int64  Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      3      5      2
+   2 │     2      4      6      1
+
+julia> select(df, Cols(:) => ByRow(min)) # min works on multiple arguments
+2×1 DataFrame
+ Row │ a_b_etc_min
+     │ Int64
+─────┼─────────────
+   1 │           1
+   2 │           1
+
+julia> select(df, AsTable(:) => ByRow(minimum)) # minimum works on a collection
+2×1 DataFrame
+ Row │ a_b_etc_minimum
+     │ Int64
+─────┼─────────────────
+   1 │               1
+   2 │               1
+
+julia> select(df, [:a,:b] => ByRow(+)) # `+` works on multiple arguments
+2×1 DataFrame
+ Row │ a_b_+
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     6
+
+julia> select(df, AsTable([:a,:b]) => ByRow(sum)) # `sum` works on a collection
+2×1 DataFrame
+ Row │ a_b_sum
+     │ Int64
+─────┼─────────
+   1 │       4
+   2 │       6
+
+julia> using Statistics # contains the `mean` function
+
+julia> select(df, AsTable(Between(:b, :d)) => ByRow(mean))
+2×1 DataFrame
+ Row │ b_c_d_mean
+     │ Float64
+─────┼────────────
+   1 │    3.33333
+   2 │    3.66667
+```
+
+`AsTable` can also be used to pass columns to a function which operates
+on fields of a `NamedTuple`.
+
+```julia
+julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)
+2×4 DataFrame
+ Row │ a      b      c      d
+     │ Int64  Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      3      5      7
+   2 │     2      4      6      8
+
+julia> f(nt) = nt.a + nt.d
+f (generic function with 1 method)
+
+julia> transform(df, AsTable(:) => ByRow(f))
+2×5 DataFrame
+ Row │ a      b      c      d      a_b_etc_f
+     │ Int64  Int64  Int64  Int64  Int64
+─────┼───────────────────────────────────────
+   1 │     1      3      5      7          8
+   2 │     2      4      6      8         10
+```
+
+As demonstrated above,
+in the `source_column_selector => operation_function` operation pair form,
+the results of an operation will be placed into a new column with an
+automatically-generated name based on the operation;
+the new column name will be the `operation_function` name
+appended to the source column name(s) with an underscore.
+
+This automatic column naming behavior can be avoided in two ways.
+First, the operation result can be placed back into the original column
+with the original column name by setting the keyword argument `renamecols`
+to `false` (its default value is `true`).
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, :a => ByRow(x->x+10), renamecols=false) # add 10, keeping the column name `a`
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │    11      5
+   2 │    12      6
+   3 │    13      7
+   4 │    14      8
+```
+
+The second method to avoid the default manipulation column naming is to
+specify your own `new_column_names`.
+
+#### `new_column_names`
+
+`new_column_names` can be included at the end of an `operation` pair to specify
+the name of the new column(s).
+`new_column_names` may be a symbol or a string.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, Cols(:) => ByRow(+) => :c)
+4×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
+
+julia> transform(df, Cols(:) => ByRow(+) => "a+b")
+4×3 DataFrame
+ Row │ a      b      a+b
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
+
+julia> transform(df, :a => ByRow(x->x+10) => "a+10")
+4×3 DataFrame
+ Row │ a      b      a+10
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5     11
+   2 │     2      6     12
+   3 │     3      7     13
+   4 │     4      8     14
+```
+
+The `source_column_selector => new_column_names` operation form
+can be used to rename columns without an intermediate function.
+However, there are `rename` and `rename!` functions,
+which accept the same syntax,
+that tend to be more useful for this operation.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, :a => :α) # adds column α
+4×3 DataFrame
+ Row │ a      b      α
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      1
+   2 │     2      6      2
+   3 │     3      7      3
+   4 │     4      8      4
+
+julia> select(df, :a => :α) # retains only column α
+4×1 DataFrame
+ Row │ α
+     │ Int64
+─────┼───────
+   1 │     1
+   2 │     2
+   3 │     3
+   4 │     4
+
+julia> rename(df, :a => :α) # renames column a to α in a new data frame
+4×2 DataFrame
+ Row │ α      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+```
+
+Additionally, in the
+`source_column_selector => operation_function => new_column_names` operation form,
+`new_column_names` may be a renaming function which operates on a string
+to create the destination column names programmatically.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> add_prefix(s) = "new_" * s
+add_prefix (generic function with 1 method)
+
+julia> transform(df, :a => (x -> 10 .* x) => add_prefix) # with named renaming function
+4×3 DataFrame
+ Row │ a      b      new_a
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5     10
+   2 │     2      6     20
+   3 │     3      7     30
+   4 │     4      8     40
+
+julia> transform(df, :a => (x -> 10 .* x) => (s -> "new_" * s)) # with anonymous renaming function
+4×3 DataFrame
+ Row │ a      b      new_a
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5     10
+   2 │     2      6     20
+   3 │     3      7     30
+   4 │     4      8     40
+```
+
+Note that a renaming function will not work in the
+`source_column_selector => new_column_names` operation form
+because a function in the second element of the operation pair is assumed to take
+the `source_column_selector => operation_function` operation form.
+To work around this limitation, use the
+`source_column_selector => operation_function => new_column_names` operation form
+with `identity` as the `operation_function`.
+
+```julia
+julia> transform(df, :a => add_prefix)
+ERROR: MethodError: no method matching *(::String, ::Vector{Int64})
+
+julia> transform(df, :a => identity => add_prefix)
+4×3 DataFrame
+ Row │ a      b      new_a
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      1
+   2 │     2      6      2
+   3 │     3      7      3
+   4 │     4      8      4
+```
+
+!!! Note
+      Renaming functions are not currently supported within `Pair` arguments
+      to the `rename` and `rename!` functions.
+      However, renaming functions can be applied to an entire data frame
+      with the `rename(renaming_function, dataframe)` method.
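+
+As a minimal sketch of that whole-data-frame method (the `add_prefix` helper
+and the small data frame here are only illustrative), the renaming function
+receives each column name as a string:
+
+```julia
+using DataFrames
+
+df = DataFrame(a = 1:4, b = 5:8)
+
+# hypothetical helper: prepend "new_" to a column name
+add_prefix(s) = "new_" * s
+
+# apply the renaming function to every column name of `df`
+renamed = rename(add_prefix, df)
+
+names(renamed) # column names of the new data frame
+```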
+
+In the `source_column_selector => new_column_names` operation form,
+only a single source column may be selected per operation,
+so why is `new_column_names` plural?
+It is possible to split the data contained inside a single column
+into multiple new columns by supplying a vector of strings or symbols
+as `new_column_names`.
+
+```julia
+julia> df = DataFrame(data = [(1,2), (3,4)]) # vector of tuples
+2×1 DataFrame
+ Row │ data
+     │ Tuple…
+─────┼────────
+   1 │ (1, 2)
+   2 │ (3, 4)
+
+julia> transform(df, :data => [:first, :second]) # manual naming
+2×3 DataFrame
+ Row │ data    first  second
+     │ Tuple…  Int64  Int64
+─────┼───────────────────────
+   1 │ (1, 2)      1       2
+   2 │ (3, 4)      3       4
+```
+
+This kind of data splitting can even be done automatically with `AsTable`.
+
+```julia
+julia> transform(df, :data => AsTable) # default automatic naming with tuples
+2×3 DataFrame
+ Row │ data    x1     x2
+     │ Tuple…  Int64  Int64
+─────┼──────────────────────
+   1 │ (1, 2)      1      2
+   2 │ (3, 4)      3      4
+```
+
+If a data frame column contains `NamedTuple`s,
+then `AsTable` will preserve the field names.
+```julia
+julia> df = DataFrame(data = [(a=1,b=2), (a=3,b=4)]) # vector of named tuples
+2×1 DataFrame
+ Row │ data
+     │ NamedTup…
+─────┼────────────────
+   1 │ (a = 1, b = 2)
+   2 │ (a = 3, b = 4)
+
+julia> transform(df, :data => AsTable) # keeps names from named tuples
+2×3 DataFrame
+ Row │ data            a      b
+     │ NamedTup…       Int64  Int64
+─────┼──────────────────────────────
+   1 │ (a = 1, b = 2)      1      2
+   2 │ (a = 3, b = 4)      3      4
+```
+
+!!! Note
+      To pack multiple columns into a single column of `NamedTuple`s
+      (reverse of the above operation)
+      apply the `identity` function with `ByRow`, e.g.
+      `transform(df, AsTable([:a, :b]) => ByRow(identity) => :data)`.
+
+Renaming functions also work for multi-column transformations,
+but they must operate on a vector of strings.
+
+```julia
+julia> df = DataFrame(data = [(1,2), (3,4)])
+2×1 DataFrame
+ Row │ data
+     │ Tuple…
+─────┼────────
+   1 │ (1, 2)
+   2 │ (3, 4)
+
+julia> new_names(v) = ["primary ", "secondary "] .* v
+new_names (generic function with 1 method)
+
+julia> transform(df, :data => identity => new_names)
+2×3 DataFrame
+ Row │ data    primary data  secondary data
+     │ Tuple…  Int64         Int64
+─────┼──────────────────────────────────────
+   1 │ (1, 2)             1               2
+   2 │ (3, 4)             3               4
+```
+
+#### Multiple Operations per Manipulation
+All data frame manipulation functions can accept multiple `operation` pairs
+at once using any of the following methods:
+- `manipulation_function(dataframe, operation1, operation2)`   : multiple arguments
+- `manipulation_function(dataframe, [operation1, operation2])` : vector argument
+- `manipulation_function(dataframe, [operation1 operation2])`  : matrix argument
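+
+The three calling methods might look like the following sketch, which
+assumes a small illustrative data frame; the parentheses around each pair in
+the matrix literal are only there to make the element boundaries explicit.
+
+```julia
+using DataFrames
+
+df = DataFrame(a = 1:4, b = [50, 50, 60, 60])
+
+# multiple arguments
+r1 = combine(df, :a => maximum, :b => sum)
+
+# vector argument
+r2 = combine(df, [:a => maximum, :b => sum])
+
+# matrix argument (a 1×2 matrix of pairs)
+r3 = combine(df, [(:a => maximum) (:b => sum)])
+
+r1 == r2 == r3 # compare the three results
+```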
+
+Passing multiple operations is especially useful for the `select`, `select!`,
+and `combine` manipulation functions,
+since they only retain columns which are a result of the passed operations.
+
+```julia
+julia> df = DataFrame(a = 1:4, b = [50,50,60,60], c = ["hat","bat","cat","dog"])
+4×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  String
+─────┼──────────────────────
+   1 │     1     50  hat
+   2 │     2     50  bat
+   3 │     3     60  cat
+   4 │     4     60  dog
+
+julia> combine(df, :a => maximum, :b => sum, :c => join) # 3 combine operations
+1×3 DataFrame
+ Row │ a_maximum  b_sum  c_join
+     │ Int64      Int64  String
+─────┼────────────────────────────────
+   1 │         4    220  hatbatcatdog
+
+julia> select(df, :c, :b, :a) # re-order columns
+4×3 DataFrame
+ Row │ c       b      a
+     │ String  Int64  Int64
+─────┼──────────────────────
+   1 │ hat        50      1
+   2 │ bat        50      2
+   3 │ cat        60      3
+   4 │ dog        60      4
+
+julia> select(df, :b, :) # `:` here means all other columns
+4×3 DataFrame
+ Row │ b      a      c
+     │ Int64  Int64  String
+─────┼──────────────────────
+   1 │    50      1  hat
+   2 │    50      2  bat
+   3 │    60      3  cat
+   4 │    60      4  dog
+
+julia> select(
+           df,
+           :c => (x -> "a " .* x) => :one_c,
+           :a => (x -> 100x),
+           :b,
+           renamecols=false
+       ) # can mix operation forms
+4×3 DataFrame
+ Row │ one_c   a      b
+     │ String  Int64  Int64
+─────┼──────────────────────
+   1 │ a hat     100     50
+   2 │ a bat     200     50
+   3 │ a cat     300     60
+   4 │ a dog     400     60
+
+julia> select(
+           df,
+           :c => ByRow(reverse),
+           :c => ByRow(uppercase)
+       ) # multiple operations on same column
+4×2 DataFrame
+ Row │ c_reverse  c_uppercase
+     │ String     String
+─────┼────────────────────────
+   1 │ tah        HAT
+   2 │ tab        BAT
+   3 │ tac        CAT
+   4 │ god        DOG
+```
+
+In the last two examples,
+the manipulation function arguments were split across multiple lines.
+This is a good way to make manipulations with many operations more readable.
+
+Passing multiple operations to `subset` or `subset!` is an easy way to narrow in
+on a particular row of data.
+
+```julia
+julia> subset(
+           df,
+           :b => ByRow(==(60)),
+           :c => ByRow(contains("at"))
+       ) # rows with 60 and "at"
+1×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  String
+─────┼──────────────────────
+   1 │     3     60  cat
+```
+
+Note that all operations within a single manipulation must use the data
+as it existed before the function call;
+i.e., you cannot use newly created columns for subsequent operations
+within the same manipulation.
+
+```julia
+julia> transform(
+           df,
+           [:a, :b] => ByRow(+) => :d,
+           :d => (x -> x ./ 2),
+       ) # requires two separate transformations
+ERROR: ArgumentError: column name :d not found in the data frame; existing most similar names are: :a, :b and :c
+
+julia> new_df = transform(df, [:a, :b] => ByRow(+) => :d)
+4×4 DataFrame
+ Row │ a      b      c       d
+     │ Int64  Int64  String  Int64
+─────┼─────────────────────────────
+   1 │     1     50  hat        51
+   2 │     2     50  bat        52
+   3 │     3     60  cat        63
+   4 │     4     60  dog        64
+
+julia> transform!(new_df, :d => (x -> x ./ 2) => :d_2)
+4×5 DataFrame
+ Row │ a      b      c       d      d_2
+     │ Int64  Int64  String  Int64  Float64
+─────┼──────────────────────────────────────
+   1 │     1     50  hat        51     25.5
+   2 │     2     50  bat        52     26.0
+   3 │     3     60  cat        63     31.5
+   4 │     4     60  dog        64     32.0
+```
+
+
+#### Broadcasting Operation Pairs
+
+[Broadcasting](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting)
+pairs with `.=>` is often a convenient way to generate multiple
+similar `operation`s to be applied within a single manipulation.
+Broadcasting within the `Pair` of an `operation` is no different than
+broadcasting in base Julia.
+The broadcasting `.=>` will be expanded into a vector of pairs
+(`[operation1, operation2, ...]`),
+and this expansion will occur before the manipulation function is invoked.
+Then the manipulation function will use the
+`manipulation_function(dataframe, [operation1, operation2, ...])` method.
+This process will be explained in more detail below.
+
+To illustrate these concepts, let us first examine the `Type` of a basic `Pair`.
+In DataFrames.jl, a symbol, string, or integer
+may be used to select a single column.
+Some `Pair`s with these types are below.
+
+```julia
+julia> typeof(:x => :a)
+Pair{Symbol, Symbol}
+
+julia> typeof("x" => "a")
+Pair{String, String}
+
+julia> typeof(1 => "a")
+Pair{Int64, String}
+```
+
+Any of the `Pair`s above could be used to rename the first column
+of the data frame below to `a`.
+
+```julia
+julia> df = DataFrame(x = 1:3, y = 4:6)
+3×2 DataFrame
+ Row │ x      y
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+
+julia> select(df, :x => :a)
+3×1 DataFrame
+ Row │ a
+     │ Int64
+─────┼───────
+   1 │     1
+   2 │     2
+   3 │     3
+
+julia> select(df, 1 => "a")
+3×1 DataFrame
+ Row │ a
+     │ Int64
+─────┼───────
+   1 │     1
+   2 │     2
+   3 │     3
+```
+
+What should we do if we want to keep and rename both the `x` and `y` columns?
+One option is to supply a `Vector` of operation `Pair`s to `select`.
+`select` will process all of these operations in order.
+
+```julia
+julia> ["x" => "a", "y" => "b"]
+2-element Vector{Pair{String, String}}:
+ "x" => "a"
+ "y" => "b"
+
+julia> select(df, ["x" => "a", "y" => "b"])
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+```
+
+We can use broadcasting to simplify the syntax above.
+
+```julia
+julia> ["x", "y"] .=> ["a", "b"]
+2-element Vector{Pair{String, String}}:
+ "x" => "a"
+ "y" => "b"
+
+julia> select(df, ["x", "y"] .=> ["a", "b"])
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+```
+
+Notice that `select` sees the same `Vector{Pair{String, String}}` operation
+argument whether the individual pairs are written out explicitly or
+constructed with broadcasting.
+The broadcasting is applied before the call to `select`.
+
+```julia
+julia> ["x" => "a", "y" => "b"] == (["x", "y"] .=> ["a", "b"])
+true
+```
+
+!!! Note
+      These operation pairs (or vector of pairs) can be given variable names.
+      This is uncommon in practice but could be helpful for intermediate
+      inspection and testing.
+      ```julia
+      df = DataFrame(x = 1:3, y = 4:6)       # create data frame
+      operation = ["x", "y"] .=> ["a", "b"]  # save operation to variable
+      typeof(operation)                      # check type of operation
+      first(operation)                       # check first pair in operation
+      last(operation)                        # check last pair in operation
+      select(df, operation)                  # manipulate `df` with `operation`
+      ```
+
+If a function is used as part of an operation `Pair`,
+like in the `source_column_selector => operation_function => new_column_names` form,
+then the function is repeated in each pair of the resultant vector.
+This is an easy way to apply a function to multiple columns at the same time.
+
+```julia
+julia> f(x) = 2 * x
+f (generic function with 1 method)
+
+julia> ["x", "y"] .=> f .=> ["a", "b"]
+2-element Vector{Pair{String, Pair{typeof(f), String}}}:
+ "x" => (f => "a")
+ "y" => (f => "b")
+
+julia> select(df, ["x", "y"] .=> f .=> ["a", "b"])
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     2      8
+   2 │     4     10
+   3 │     6     12
+```
+
+A renaming function can be applied to multiple columns in the same way.
+It will also be repeated in each operation `Pair`.
+
+```julia
+julia> newname(s::String) = s * "_new"
+newname (generic function with 1 method)
+
+julia> ["x", "y"] .=> f .=> newname
+2-element Vector{Pair{String, Pair{typeof(f), typeof(newname)}}}:
+ "x" => (f => newname)
+ "y" => (f => newname)
+
+julia> select(df, ["x", "y"] .=> f .=> newname)
+3×2 DataFrame
+ Row │ x_new  y_new
+     │ Int64  Int64
+─────┼──────────────
+   1 │     2      8
+   2 │     4     10
+   3 │     6     12
+```
+
+You can see from the type output above
+that a three element pair does not actually exist.
+A `Pair` (as the name implies) can only contain two elements.
+Thus, `:x => :y => :z` becomes a nested `Pair`,
+where `:x` is the first element and points to the `Pair` `:y => :z`,
+which is the second element.
+
+```julia
+julia> p = :x => :y => :z
+:x => (:y => :z)
+
+julia> p[1]
+:x
+
+julia> p[2]
+:y => :z
+
+julia> p[2][1]
+:y
+
+julia> p[2][2]
+```
+
+In the previous examples, the source columns have been individually selected.
+When broadcasting multiple columns to the same function,
+often similarities in the column names or position can be exploited to avoid
+tedious selection.
+Consider a data frame with temperature data at three different locations
+taken over time.
+```julia
+julia> df = DataFrame(Time = 1:4,
+                      Temperature1 = [20, 23, 25, 28],
+                      Temperature2 = [33, 37, 41, 44],
+                      Temperature3 = [15, 10, 4, 0])
+4×4 DataFrame
+ Row │ Time   Temperature1  Temperature2  Temperature3
+     │ Int64  Int64         Int64         Int64
+─────┼─────────────────────────────────────────────────
+   1 │     1            20            33            15
+   2 │     2            23            37            10
+   3 │     3            25            41             4
+   4 │     4            28            44             0
+```
+
+To convert all of the temperature data in one transformation,
+we just need to define a conversion function and broadcast
+it to all of the "Temperature" columns.
+
+```julia
+julia> celsius_to_kelvin(x) = x + 273
+celsius_to_kelvin (generic function with 1 method)
+
+julia> transform(
+           df,
+           Cols(r"Temp") .=> ByRow(celsius_to_kelvin),
+           renamecols = false
+       )
+4×4 DataFrame
+ Row │ Time   Temperature1  Temperature2  Temperature3
+     │ Int64  Int64         Int64         Int64
+─────┼─────────────────────────────────────────────────
+   1 │     1           293           306           288
+   2 │     2           296           310           283
+   3 │     3           298           314           277
+   4 │     4           301           317           273
+```
+Or, simultaneously changing the column names:
+
+```julia
+julia> rename_function(s) = "Temperature $(last(s)) (°K)"
+rename_function (generic function with 1 method)
+
+julia> select(
+           df,
+           "Time",
+           Cols(r"Temp") .=> ByRow(celsius_to_kelvin) .=> rename_function
+       )
+4×4 DataFrame
+ Row │ Time   Temperature 1 (°K)  Temperature 2 (°K)  Temperature 3 (°K)
+     │ Int64  Int64               Int64               Int64
+─────┼───────────────────────────────────────────────────────────────────
+   1 │     1                 293                 306                 288
+   2 │     2                 296                 310                 283
+   3 │     3                 298                 314                 277
+   4 │     4                 301                 317                 273
+```
+
+!!! note "Notes"
+      * `Not("Time")` or `2:4` would have been equally good choices for `source_column_selector` in the above operations.
+      * Don't forget `ByRow` if your function is to be applied to elements rather than entire column vectors.
+        Without `ByRow`, the manipulations above would have thrown
+        `ERROR: MethodError: no method matching +(::Vector{Int64}, ::Int64)`.
+      * Regular expression (`r""`) and `:` `source_column_selectors`
+        must be wrapped in `Cols` to broadcast properly;
+        otherwise the broadcasting occurs before the expression is expanded into a vector of matches.
+
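+The last note above can be checked with a short sketch.
+It assumes the temperature `df` and `celsius_to_kelvin` defined earlier,
+and it uses `names(df, r"Temp")` to expand the regular expression by hand,
+which is only one way to illustrate what the `Cols` wrapper defers
+until the data frame is known.
+
+```julia
+using DataFrames
+
+df = DataFrame(Time = 1:4,
+               Temperature1 = [20, 23, 25, 28],
+               Temperature2 = [33, 37, 41, 44],
+               Temperature3 = [15, 10, 4, 0])
+celsius_to_kelvin(x) = x + 273
+
+# A bare regular expression is treated as a scalar by broadcasting,
+# so this is a single pair rather than one pair per matching column.
+single = r"Temp" .=> ByRow(celsius_to_kelvin)
+single isa Pair  # true
+
+# Expanding the matches first gives one pair per matching column ...
+explicit = names(df, r"Temp") .=> ByRow(celsius_to_kelvin)
+length(explicit)  # 3
+
+# ... which gives the same result as the `Cols` form used above.
+transform(df, explicit, renamecols = false) ==
+    transform(df, Cols(r"Temp") .=> ByRow(celsius_to_kelvin), renamecols = false)  # true
+```
+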
+You could also broadcast different columns to different functions
+by supplying a vector of functions.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> f1(x) = x .+ 1
+f1 (generic function with 1 method)
+
+julia> f2(x) = x ./ 10
+f2 (generic function with 1 method)
+
+julia> transform(df, [:a, :b] .=> [f1, f2])
+4×4 DataFrame
+ Row │ a      b      a_f1   b_f2
+     │ Int64  Int64  Int64  Float64
+─────┼──────────────────────────────
+   1 │     1      5      2      0.5
+   2 │     2      6      3      0.6
+   3 │     3      7      4      0.7
+   4 │     4      8      5      0.8
+```
+
+However, this form is not much more convenient than supplying
+multiple individual operations.
+
+```julia
+julia> transform(df, [:a => f1, :b => f2]) # same manipulation as previous
+4×4 DataFrame
+ Row │ a      b      a_f1   b_f2
+     │ Int64  Int64  Int64  Float64
+─────┼──────────────────────────────
+   1 │     1      5      2      0.5
+   2 │     2      6      3      0.6
+   3 │     3      7      4      0.7
+   4 │     4      8      5      0.8
+```
+
+Broadcasting is arguably more useful
+for applying every function to every selected column,
+which is done by changing the vector of functions to a 1-by-n matrix of functions.
+(Recall that a list, a vector, or a matrix of operation pairs are all valid
+for passing to the manipulation functions.)
+
+```julia
+julia> [:a, :b] .=> [f1 f2] # No comma `,` between f1 and f2
+2×2 Matrix{Pair{Symbol}}:
+ :a=>f1  :a=>f2
+ :b=>f1  :b=>f2
+
+julia> transform(df, [:a, :b] .=> [f1 f2]) # No comma `,` between f1 and f2
+4×6 DataFrame
+ Row │ a      b      a_f1   b_f1   a_f2     b_f2
+     │ Int64  Int64  Int64  Int64  Float64  Float64
+─────┼──────────────────────────────────────────────
+   1 │     1      5      2      6      0.1      0.5
+   2 │     2      6      3      7      0.2      0.6
+   3 │     3      7      4      8      0.3      0.7
+   4 │     4      8      5      9      0.4      0.8
+```
+
+In this way, every combination of selected columns and functions will be applied.
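+
+As a check on the ordering, the sketch below
+(assuming `df`, `f1`, and `f2` as defined above)
+compares the matrix form with the four operation pairs written out by hand.
+The matrix of pairs is traversed column by column,
+so both columns are paired with `f1` before either is paired with `f2`.
+
+```julia
+using DataFrames
+
+df = DataFrame(a=1:4, b=5:8)
+f1(x) = x .+ 1
+f2(x) = x ./ 10
+
+ops = [:a, :b] .=> [f1 f2]  # 2-element vector broadcast with a 1×2 matrix
+size(ops)                   # (2, 2)
+
+# The same pairs listed individually, in the column-major order of `ops`
+by_hand = transform(df, :a => f1, :b => f1, :a => f2, :b => f2)
+
+transform(df, ops) == by_hand  # true
+```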
+
+Pair broadcasting is a simple but powerful tool
+that can be used in any of the manipulation functions listed under
+[Basic Usage of Manipulation Functions](@ref).
+Experiment for yourself to discover other useful operations.
+
+#### Additional Resources
+The operation pair syntax is sometimes referred to as the DataFrames mini-language
+or domain-specific language (DSL).
+More details and examples of the operation mini-language can be found in
+[this blog post](https://bkamins.github.io/julialang/2020/12/24/minilanguage.html).
+
+For additional syntax niceties,
+many users find the [Chain.jl](https://github.com/jkrumbiegel/Chain.jl)
+and [DataFramesMeta.jl](https://github.com/JuliaData/DataFramesMeta.jl)
+packages useful
+to help simplify manipulations that may be tedious with operation pairs alone.
+
+For additional practice,
+an interactive tutorial is provided by the DataFrames package author
+[here](https://github.com/bkamins/Julia-DataFrames-Tutorial).
+
+#### More Manipulation Examples with the German Dataset
+
+Let us move to the examples of application of these rules using the German dataset.
 
 ```jldoctest dataframe
 julia> using Statistics
@@ -2162,7 +3309,5 @@ julia> select(german, :Age, :Job, [:Age, :Job] => (+) => :res)
 ```
 
 In the examples given in this introductory tutorial we did not cover all
-options of the transformation mini-language. More advanced examples, in particular
-showing how to pass or produce multiple columns using the `AsTable` operation
-(which you might have seen in some DataFrames.jl demos) are given in the later
-sections of the manual.
+options of the DataFrames.jl operation mini-language.
+More advanced examples are given in the later sections of the manual.

From f5cc15fec921aac9097cf9d2d5cfd904b648a071 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Wed, 19 Jul 2023 11:32:33 -0400
Subject: [PATCH 02/29] assumes requested method is added

---
 docs/src/man/basics.md | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index ad68bf691..f8fdada79 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -2159,10 +2159,29 @@ julia> transform(df, :a => identity => add_prefix)
 ```
 
 !!! Note
-      Renaming functions are not currently supported within `Pair` arguments
-      to the `rename` and `rename!` functions.
-      However, renaming functions can be applied to an entire data frame
-      with the `rename(renaming_function, dataframe)` method.
+      The `rename` and `rename!` functions are a simpler way
+      to apply a renaming function without an intermediate `operation_function`.
+      ```julia
+      julia> rename(df, :a => add_prefix) # rename some columns
+      4×2 DataFrame
+      Row │ new_a  b
+         │ Int64  Int64
+      ─────┼──────────────
+         1 │     1      5
+         2 │     2      6
+         3 │     3      7
+         4 │     4      8
+
+      julia> rename(add_prefix, df) # rename all columns
+      4×2 DataFrame
+      Row │ new_a  new_b
+         │ Int64  Int64
+      ─────┼──────────────
+         1 │     1      5
+         2 │     2      6
+         3 │     3      7
+         4 │     4      8
+      ```
 
 In the `source_column_selector => new_column_names` operation form,
 only a single source column may be selected per operation,

From 7c3db8a82a86eb98ecf6656a38783ab9bc5ab19a Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Wed, 19 Jul 2023 13:30:32 -0400
Subject: [PATCH 03/29] Typo: missing :z

---
 docs/src/man/basics.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index f8fdada79..9842033b4 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -2595,6 +2595,7 @@ julia> p[2][1]
 :y
 
 julia> p[2][2]
+:z
 ```
 
 In the previous examples, the source columns have been individually selected.

From 27d7e3220133b3230cd0bf3019efb3b3a1c04ae6 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Tue, 25 Jul 2023 12:10:53 -0400
Subject: [PATCH 04/29] added subset(df, source_column_selector)

---
 docs/src/man/basics.md | 56 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 50 insertions(+), 6 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 9842033b4..55981cb50 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -1601,15 +1601,12 @@ and names the resulting column(s) `new_column_names`
 `source_column_selector => new_column_names`
 : renames a source column,
 or splits a column containing collection elements into multiple new columns
-
-!!! Note
-      The `source_column_selector`
-      and the `source_column_selector => new_column_names` operation forms
-      are not available for the `subset` and `subset!` manipulation functions.
+(not available for `subset` or `subset!`)
 
 #### `source_column_selector`
 Inside an `operation`, `source_column_selector` is usually a column name
 or column index which identifies a data frame column.
+
 `source_column_selector` may be used as the entire `operation`
 with `select` or `select!` to isolate or reorder columns.
 
@@ -1651,7 +1648,33 @@ julia> select(df, 2)
    3 │     6
 ```
 
-`source_column_selector` may also be a collection of columns such as a vector,
+`source_column_selector` may also be used as the entire `operation`
+with `subset` or `subset!` if the source column contains `Bool` values.
+
+```julia
+julia> df = DataFrame(
+           name = ["Scott", "Jill", "Erica", "Jimmy"],
+           minor = [false, true, false, true],
+       )
+4×2 DataFrame
+ Row │ name    minor
+     │ String  Bool
+─────┼───────────────
+   1 │ Scott   false
+   2 │ Jill     true
+   3 │ Erica   false
+   4 │ Jimmy    true
+
+julia> subset(df, :minor)
+2×2 DataFrame
+ Row │ name    minor
+     │ String  Bool
+─────┼───────────────
+   1 │ Jill     true
+   2 │ Jimmy    true
+```
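+
+For comparison, here is a hedged sketch of two other ways to write the same row filter
+(assuming the `df` with the `minor` column from above):
+indexing rows with the Boolean column, and `filter` with `identity`.
+
+```julia
+using DataFrames
+
+df = DataFrame(
+    name = ["Scott", "Jill", "Erica", "Jimmy"],
+    minor = [false, true, false, true],
+)
+
+by_subset = subset(df, :minor)              # column selector as the whole operation
+by_index  = df[df.minor, :]                 # Boolean indexing of rows
+by_filter = filter(:minor => identity, df)  # row-wise predicate on one column
+
+by_subset == by_index == by_filter          # true
+```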
+
+`source_column_selector` may instead be a collection of columns such as a vector,
 a [regular expression](https://docs.julialang.org/en/v1/manual/strings/#Regular-Expressions),
 a `Not`, `Between`, `All`, or `Cols` expression,
 or a `:`.
@@ -1713,6 +1736,27 @@ julia> select(df, Between(2,4))
    1 │ José        Garcia        61
    2 │ Emma        Marino        24
    3 │ Nathan      Boyer         33
+
+julia> df2 = DataFrame(
+           name = ["Scott", "Jill", "Erica", "Jimmy"],
+           minor = [false, true, false, true],
+           male = [true, false, false, true],
+       )
+4×3 DataFrame
+ Row │ name    minor  male
+     │ String  Bool   Bool
+─────┼──────────────────────
+   1 │ Scott   false   true
+   2 │ Jill     true  false
+   3 │ Erica   false  false
+   4 │ Jimmy    true   true
+
+julia> subset(df2, [:minor, :male])
+1×3 DataFrame
+ Row │ name    minor  male
+     │ String  Bool   Bool
+─────┼─────────────────────
+   1 │ Jimmy    true  true
 ```
 
 `AsTable(source_column_selector)` is a special `source_column_selector`

From be5fa9e9a940228b19d61ee43abde17ef39c526a Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Tue, 25 Jul 2023 12:17:44 -0400
Subject: [PATCH 05/29] Added italics

---
 docs/src/man/basics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 55981cb50..e6d1e1bd5 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -1601,7 +1601,7 @@ and names the resulting column(s) `new_column_names`
 `source_column_selector => new_column_names`
 : renames a source column,
 or splits a column containing collection elements into multiple new columns
-(not available for `subset` or `subset!`)
+(*not available for `subset` or `subset!`*)
 
 #### `source_column_selector`
 Inside an `operation`, `source_column_selector` is usually a column name

From b1b3babd6188ec18e6b6a193641865ef237481bd Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Wed, 2 Aug 2023 14:23:01 -0400
Subject: [PATCH 06/29] Moved note to main text

---
 docs/src/man/basics.md | 49 +++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 24 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index e6d1e1bd5..518d72450 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -2202,30 +2202,31 @@ julia> transform(df, :a => identity => add_prefix)
    4 │     4      8      4
 ```
 
-!!! Note
-      The `rename` and `rename!` functions are a simpler way
-      to apply a renaming function without an intermediate `operation_function`.
-      ```julia
-      julia> rename(df, :a => add_prefix) # rename some columns
-      4×2 DataFrame
-      Row │ new_a  b
-         │ Int64  Int64
-      ─────┼──────────────
-         1 │     1      5
-         2 │     2      6
-         3 │     3      7
-         4 │     4      8
-
-      julia> rename(add_prefix, df) # rename all columns
-      4×2 DataFrame
-      Row │ new_a  new_b
-         │ Int64  Int64
-      ─────┼──────────────
-         1 │     1      5
-         2 │     2      6
-         3 │     3      7
-         4 │     4      8
-      ```
+In this case though,
+it is probably again more useful to use the `rename` or `rename!` function
+rather than one of the manipulation functions
+in order to rename in-place and avoid the intermediate `operation_function`.
+```julia
+julia> rename(df, :a => add_prefix) # rename some columns
+4×2 DataFrame
+Row │ new_a  b
+   │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> rename(add_prefix, df) # rename all columns
+4×2 DataFrame
+Row │ new_a  new_b
+   │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+```
 
 In the `source_column_selector => new_column_names` operation form,
 only a single source column may be selected per operation,

From da6607d27eae9b4a6ff38b896e55293ba1866465 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Wed, 2 Aug 2023 15:02:46 -0400
Subject: [PATCH 07/29] =?UTF-8?q?Added=20error=20example=20and=20removed?=
 =?UTF-8?q?=20=C2=B0=20symbol?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/src/man/basics.md | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 518d72450..08e3dc7a8 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -2641,6 +2641,9 @@ julia> p[2][1]
 
 julia> p[2][2]
 :z
+
+julia> p[3] # there is no index 3 for a pair
+ERROR: BoundsError: attempt to access Pair{Symbol, Pair{Symbol, Symbol}} at index [3]
 ```
 
 In the previous examples, the source columns have been individually selected.
@@ -2689,7 +2692,7 @@ julia> transform(
 Or, simultaneously changing the column names:
 
 ```julia
-julia> rename_function(s) = "Temperature $(last(s)) (°K)"
+julia> rename_function(s) = "Temperature $(last(s)) (K)"
 rename_function (generic function with 1 method)
 
 julia> select(
@@ -2698,13 +2701,13 @@ julia> select(
            Cols(r"Temp") .=> ByRow(celsius_to_kelvin) .=> rename_function
        )
 4×4 DataFrame
- Row │ Time   Temperature 1 (°K)  Temperature 2 (°K)  Temperature 3 (°K)
-     │ Int64  Int64               Int64               Int64
-─────┼───────────────────────────────────────────────────────────────────
-   1 │     1                 293                 306                 288
-   2 │     2                 296                 310                 283
-   3 │     3                 298                 314                 277
-   4 │     4                 301                 317                 273
+ Row │ Time   Temperature 1 (K)  Temperature 2 (K)  Temperature 3 (K)
+     │ Int64  Int64              Int64              Int64
+─────┼────────────────────────────────────────────────────────────────
+   1 │     1                293                306                288
+   2 │     2                296                310                283
+   3 │     3                298                314                277
+   4 │     4                301                317                273
 ```
 
 !!! note "Notes"

From 0c47d10c66d89b58d6b8d11420920c8259a2b23e Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Thu, 17 Aug 2023 10:10:38 -0400
Subject: [PATCH 08/29] Moved Additional Resources to the end and cleaned

---
 docs/src/man/basics.md | 42 +++++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 08e3dc7a8..bf5caf9c4 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -2795,22 +2795,6 @@ that can be used in any of the manipulation functions listed under
 [Basic Usage of Manipulation Functions](@ref).
 Experiment for yourself to discover other useful operations.
 
-#### Additional Resources
-The operation pair syntax is sometimes referred to as the DataFrames mini-language
-or domain-specific language (DSL).
-More details and examples of the operation mini-language can be found in
-[this blog post](https://bkamins.github.io/julialang/2020/12/24/minilanguage.html).
-
-For additional syntax niceties,
-many users find the [Chain.jl](https://github.com/jkrumbiegel/Chain.jl)
-and [DataFramesMeta.jl](https://github.com/JuliaData/DataFramesMeta.jl)
-packages useful
-to help simplify manipulations that may be tedious with operation pairs alone.
-
-For additional practice,
-an interactive tutorial is provided by the DataFrames package author
-[here](https://github.com/bkamins/Julia-DataFrames-Tutorial).
-
 #### More Manipulation Examples with the German Dataset
 
 Let us move to the examples of application of these rules using the German dataset.
@@ -3376,6 +3360,26 @@ julia> select(german, :Age, :Job, [:Age, :Job] => (+) => :res)
             985 rows omitted
 ```
 
-In the examples given in this introductory tutorial we did not cover all
-options of the DataFrames.jl operation mini-language.
-More advanced examples are given in the later sections of the manual.
+This concludes the introductory explanation of data frame manipulations.
+For more advanced examples,
+see later sections of the manual or the additional resources below.
+
+#### Additional Resources
+More details and examples of operation pair syntax can be found in
+[this blog post](https://bkamins.github.io/julialang/2020/12/24/minilanguage.html).
+(The official wording describing the syntax has changed since the blog post was written,
+but the examples are still illustrative.
+The operation pair syntax is sometimes referred to as the DataFrames.jl mini-language
+or Domain-Specific Language.)
+
+For additional practice,
+the DataFrames.jl package author provides an
+[interactive tutorial](https://github.com/bkamins/Julia-DataFrames-Tutorial)
+covering a variety of introductory topics.
+
+For additional syntax niceties,
+many users find the [Chain.jl](https://github.com/jkrumbiegel/Chain.jl)
+and [DataFramesMeta.jl](https://github.com/JuliaData/DataFramesMeta.jl)
+packages useful
+to help simplify manipulations that may be tedious with operation pairs alone.
\ No newline at end of file

From 6f5dfc5835daeaf261c611092f80abbdebe9dae0 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Thu, 17 Aug 2023 10:16:29 -0400
Subject: [PATCH 09/29] Capitalized Boolean

---
 docs/src/man/basics.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index bf5caf9c4..a8f2a9892 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -1911,8 +1911,8 @@ julia> subset(df, :b => ByRow(<(5))) # shorter version of the previous
 
 !!! Note
     `operation_functions` within `subset` or `subset!` function calls
-    must return a boolean vector.
-    `true` elements in the boolean vector will determine
+    must return a Boolean vector.
+    `true` elements in the Boolean vector will determine
     which rows are retained in the resulting data frame.
 
 As demonstrated above, `DataFrame` columns are usually passed

From 886d9980f6d592180a28e5032f982575c9a748b0 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Thu, 17 Aug 2023 10:28:25 -0400
Subject: [PATCH 10/29] Removed extra space character

---
 docs/src/man/basics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index a8f2a9892..f8f8d74fe 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -2595,7 +2595,7 @@ julia> select(df, ["x", "y"] .=> f .=> ["a", "b"])
    1 │     2      8
    2 │     4     10
    3 │     6     12
- ```
+```
 
 A renaming function can be applied to multiple columns in the same way.
 It will also be repeated in each operation `Pair`.

From cabd73fc7275b539d7f446bb93c7845d115a4253 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Mon, 18 Sep 2023 12:02:11 -0400
Subject: [PATCH 11/29] Change function broadcasting to avoid old language

---
 docs/src/man/basics.md | 42 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index f8f8d74fe..4626d2d4a 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -2573,21 +2573,47 @@ true
       select(df, operation)                  # manipulate `df` with `operation`
       ```
 
-If a function is used as part of a transformation `Pair`,
-like in the `source_column_selector => function => new_column_names` form,
-then the function is repeated in each pair of the resultant vector.
-This is an easy way to apply a function to multiple columns at the same time.
+In Julia, a scalar (non-collection) value broadcast with a vector
+is repeated in each element of the resulting vector.
+
+```julia
+julia> ["x", "y"] .=> :a    # :a is repeated
+2-element Vector{Pair{String, Symbol}}:
+ "x" => :a
+ "y" => :a
+
+julia> 1 .=> [:a, :b]       # 1 is repeated
+2-element Vector{Pair{Int64, Symbol}}:
+ 1 => :a
+ 1 => :b
+```
+
+We can use this fact to easily broadcast an `operation_function` to multiple columns.
 
 ```julia
 julia> f(x) = 2 * x
 f (generic function with 1 method)
 
-julia> ["x", "y"] .=> f .=> ["a", "b"]
+julia> ["x", "y"] .=> f  # f is repeated
+2-element Vector{Pair{String, typeof(f)}}:
+ "x" => f
+ "y" => f
+
+julia> select(df, ["x", "y"] .=> f)  # apply f with automatic column renaming
+3×2 DataFrame
+ Row │ x_f    y_f
+     │ Int64  Int64
+─────┼──────────────
+   1 │     2      8
+   2 │     4     10
+   3 │     6     12
+
+julia> ["x", "y"] .=> f .=> ["a", "b"]  # f is repeated
 2-element Vector{Pair{String, Pair{typeof(f), String}}}:
  "x" => (f => "a")
  "y" => (f => "b")
 
-julia> select(df, ["x", "y"] .=> f .=> ["a", "b"])
+julia> select(df, ["x", "y"] .=> f .=> ["a", "b"])  # apply f with manual column renaming
 3×2 DataFrame
  Row │ a      b
      │ Int64  Int64
@@ -2604,12 +2630,12 @@ It will also be repeated in each operation `Pair`.
 julia> newname(s::String) = s * "_new"
 newname (generic function with 1 method)
 
-julia> ["x", "y"] .=> f .=> newname
+julia> ["x", "y"] .=> f .=> newname  # both f and newname are repeated
 2-element Vector{Pair{String, Pair{typeof(f), typeof(newname)}}}:
  "x" => (f => newname)
  "y" => (f => newname)
 
-julia> select(df, ["x", "y"] .=> f .=> newname)
+julia> select(df, ["x", "y"] .=> f .=> newname)  # apply f then rename column with newname
 3×2 DataFrame
  Row │ x_new  y_new
      │ Int64  Int64

From 2e9d2aff807983152b1a77f735441ebe424e2bf8 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Mon, 18 Sep 2023 12:31:49 -0400
Subject: [PATCH 12/29] Made consistent with current proposal #3361

---
 docs/src/man/basics.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 4626d2d4a..12fea13fe 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -2207,7 +2207,7 @@ it is probably again more useful to use the `rename` or `rename!` function
 rather than one of the manipulation functions
 in order to rename in-place and avoid the intermediate `operation_function`.
 ```julia
-julia> rename(df, :a => add_prefix) # rename some columns
+julia> rename(df, :a => add_prefix) # rename one column
 4×2 DataFrame
 Row │ new_a  b
    │ Int64  Int64
@@ -2226,6 +2226,9 @@ Row │ new_a  new_b
    2 │     2      6
    3 │     3      7
    4 │     4      8
+
+# Broadcasting syntax can be used to rename only some columns.
+# See the Broadcasting Operation Pairs section below.
 ```
 
 In the `source_column_selector => new_column_names` operation form,

From b0777b180c02aa6e197d1a57eac8de2092745aa7 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Thu, 21 Sep 2023 11:12:53 -0400
Subject: [PATCH 13/29] =?UTF-8?q?Change=20=CE=B1=20to=20apple=20and=20make?=
 =?UTF-8?q?=20consistent=20with=20#3380?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/src/man/basics.md | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 12fea13fe..0f432a90b 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -2108,9 +2108,9 @@ julia> df = DataFrame(a=1:4, b=5:8)
    3 │     3      7
    4 │     4      8
 
-julia> transform(df, :a => :α) # adds column α
+julia> transform(df, :a => :apple) # adds column `apple`
 4×3 DataFrame
- Row │ a      b      α
+ Row │ a      b      apple
      │ Int64  Int64  Int64
 ─────┼─────────────────────
    1 │     1      5      1
@@ -2118,9 +2118,9 @@ julia> transform(df, :a => :α) # adds column α
    3 │     3      7      3
    4 │     4      8      4
 
-julia> select(df, :a => :α) # retains only column α
+julia> select(df, :a => :apple) # retains only column `apple`
 4×1 DataFrame
- Row │ α
+ Row │ apple
      │ Int64
 ─────┼───────
    1 │     1
@@ -2128,9 +2128,9 @@ julia> select(df, :a => :α) # retains only column α
    3 │     3
    4 │     4
 
-julia> rename(df, :a => :α) # renames column α in-place
+julia> rename(df, :a => :apple) # renames column `a` to `apple`, keeping its position
 4×2 DataFrame
- Row │ α      b
+ Row │ apple  b
      │ Int64  Int64
 ─────┼──────────────
    1 │     1      5
@@ -2207,28 +2207,25 @@ it is probably again more useful to use the `rename` or `rename!` function
 rather than one of the manipulation functions
 in order to rename in-place and avoid the intermediate `operation_function`.
 ```julia
-julia> rename(df, :a => add_prefix) # rename one column
+julia> rename(add_prefix, df)  # rename all columns with a function
 4×2 DataFrame
-Row │ new_a  b
-   │ Int64  Int64
+ Row │ new_a  new_b
+     │ Int64  Int64
 ─────┼──────────────
    1 │     1      5
    2 │     2      6
    3 │     3      7
    4 │     4      8
 
-julia> rename(add_prefix, df) # rename all columns
+julia> rename(add_prefix, df; cols=:a)  # rename some columns with a function
 4×2 DataFrame
-Row │ new_a  new_b
-   │ Int64  Int64
+ Row │ new_a  b
+     │ Int64  Int64
 ─────┼──────────────
    1 │     1      5
    2 │     2      6
    3 │     3      7
    4 │     4      8
-
-# Broadcasting syntax can be used to rename only some columns.
-# See the Broadcasting Operation Pairs section below.
 ```
 
 In the `source_column_selector => new_column_names` operation form,

From a111ef8a1e6ecd437832ef18e4afc5b022e7e997 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Thu, 28 Sep 2023 16:26:37 -0400
Subject: [PATCH 14/29] First round review corrections

---
 docs/src/man/basics.md | 117 ++++++++++++++++++++++++++++++-----------
 1 file changed, 87 insertions(+), 30 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 0f432a90b..485175ffa 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -1570,38 +1570,63 @@ julia> german[Not(5), r"S"]
 In DataFrames.jl there are seven functions that can be used
 to manipulate data frame columns:
 
-| Function     | Memory Usage                     | Column Retention                             | Row Retention                                     |
-| ------------ | -------------------------------- | -------------------------------------------- | ------------------------------------------------- |
-| `transform`  | Creates a new data frame.        | Retains both source and manipulated columns. | Retains same number of rows as source data frame. |
-| `transform!` | Modifies an existing data frame. | Retains both source and manipulated columns. | Retains same number of rows as source data frame. |
-| `select`     | Creates a new data frame.        | Retains only manipulated columns.            | Retains same number of rows as source data frame. |
-| `select!`    | Modifies an existing data frame. | Retains only manipulated columns.            | Retains same number of rows as source data frame. |
-| `subset`     | Creates a new data frame.        | Retains only source columns.                 | Number of rows is determined by the manipulation. |
-| `subset!`    | Modifies an existing data frame. | Retains only source columns.                 | Number of rows is determined by the manipulation. |
-| `combine`    | Creates a new data frame.        | Retains only manipulated columns.            | Number of rows is determined by the manipulation. |
+| Function     | Memory Usage                     | Column Retention                        | Row Retention                                       |
+| ------------ | -------------------------------- | --------------------------------------- | --------------------------------------------------- |
+| `transform`  | Creates a new data frame.        | Retains original and resultant columns. | Retains same number of rows as original data frame. |
+| `transform!` | Modifies an existing data frame. | Retains original and resultant columns. | Retains same number of rows as original data frame. |
+| `select`     | Creates a new data frame.        | Retains only resultant columns.         | Retains same number of rows as original data frame. |
+| `select!`    | Modifies an existing data frame. | Retains only resultant columns.         | Retains same number of rows as original data frame. |
+| `subset`     | Creates a new data frame.        | Retains original columns.               | Retains only rows where condition is true.          |
+| `subset!`    | Modifies an existing data frame. | Retains original columns.               | Retains only rows where condition is true.          |
+| `combine`    | Creates a new data frame.        | Retains only resultant columns.         | Retains only resultant rows.                        |
 
 ### Constructing Operation Pairs
+
 All of the functions above use the same syntax which is commonly
 `manipulation_function(dataframe, operation)`.
-The `operation` argument is a `Pair` which defines the
+The `operation` argument defines the
 operation to be applied to the source `dataframe`,
 and it can take any of the following common forms explained below:
 
 `source_column_selector`
 : selects source column(s) without manipulating or renaming them
 
+   Examples: `:a`, `[:a, :b]`, `All()`, `Not(:a)`
+
 `source_column_selector => operation_function`
 : passes source column(s) as arguments to a function
 and automatically names the resulting column(s)
 
+   Examples: `:a => sum`, `[:a, :b] => +`, `:a => ByRow(==(3))`
+
 `source_column_selector => operation_function => new_column_names`
 : passes source column(s) as arguments to a function
 and names the resulting column(s) `new_column_names`
 
+   Examples: `:a => sum => :sum_of_a`, `[:a, :b] => + => :a_plus_b`
+
+   (*Not available for `subset` or `subset!`*)
+
 `source_column_selector => new_column_names`
 : renames a source column,
 or splits a column containing collection elements into multiple new columns
-(*not available for `subset` or `subset!`*)
+
+   Examples: `:a => :new_a`, `:a_b => [:a, :b]`, `:nt => AsTable`
+
+   (*Not available for `subset` or `subset!`*)
+
+The `=>` operator constructs a
+[Pair](https://docs.julialang.org/en/v1/base/collections/#Core.Pair),
+which is a type to link one object to another.
+(Pairs are commonly used to create elements of a
+[Dictionary](https://docs.julialang.org/en/v1/base/collections/#Dictionaries).)
+In DataFrames.jl manipulation functions,
+`Pair` arguments are used to define column `operations` to be performed.
+The provided examples will be explained in more detail below.
+
+The manipulation functions also have methods for applying multiple operations.
+See the later sections [Multiple Operations per Manipulation](@ref)
+and [Broadcasting Operation Pairs](@ref) for more information.
 
 #### `source_column_selector`
 Inside an `operation`, `source_column_selector` is usually a column name
@@ -1682,9 +1707,9 @@ See the [Indexing](@ref) API for the full list of possible values with reference
 
 !!! Note
       The Julia parser sometimes prevents `:` from being used by itself.
-      `ERROR: syntax: whitespace not allowed after ":" used for quoting`
-      means your `:` must be wrapped in either `(:)` or `Cols(:)`
-      to be properly interpreted.
+      If you get
+      `ERROR: syntax: whitespace not allowed after ":" used for quoting`,
+      try using `All()`, `Cols(:)`, or `(:)` instead to select all columns.
 
 ```julia
 julia> df = DataFrame(
@@ -1759,17 +1784,11 @@ julia> subset(df2, [:minor, :male])
    1 │ Jimmy    true  true
 ```
 
-`AsTable(source_column_selector)` is a special `source_column_selector`
-that can be used to select multiple columns into a single `NamedTuple`.
-This is not useful on its own, so the function of this selector
-will be explained in the next section.
-
-
 #### `operation_function`
 Inside an `operation` pair, `operation_function` is a function
 which operates on data frame columns passed as vectors.
 When multiple columns are selected by `source_column_selector`,
-the `operation_function` will receive the columns as multiple positional arguments
+the `operation_function` will receive the columns as separate positional arguments
 in the order they were selected, e.g. `f(column1, column2, column3)`.
 
 ```julia
@@ -1789,7 +1808,7 @@ julia> combine(df, :a => sum)
 ─────┼───────
    1 │     6
 
-julia> transform(df, :b => maximum) # `transform` and `select` copy result to all rows
+julia> transform(df, :b => maximum) # `transform` and `select` copy scalar result to all rows
 3×3 DataFrame
  Row │ a      b      b_maximum
      │ Int64  Int64  Int64
@@ -1867,6 +1886,18 @@ julia> transform(df, :a => g)
    1 │     1      4      2
    2 │     2      5      3
    3 │     3      4      4
+
+julia> h(x, y) = 2x .+ y
+h (generic function with 1 method)
+
+julia> transform(df, [:a, :b] => h)
+3×3 DataFrame
+ Row │ a      b      a_b_h
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      6
+   2 │     2      5      9
+   3 │     3      4     10
 ```
 
 [Anonymous functions](https://docs.julialang.org/en/v1/manual/functions/#man-anonymous-functions)
@@ -1939,7 +1970,7 @@ julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 2:-1:1)
    1 │     1      3      5      2
    2 │     2      4      6      1
 
-julia> select(df, Cols(:) => ByRow(min)) # min works on multiple arguments
+julia> select(df, Cols(:) => ByRow(min)) # min operates on multiple arguments
 2×1 DataFrame
  Row │ a_b_etc_min
      │ Int64
@@ -1947,7 +1978,7 @@ julia> select(df, Cols(:) => ByRow(min)) # min works on multiple arguments
    1 │           1
    2 │           1
 
-julia> select(df, AsTable(:) => ByRow(minimum)) # minimum works on a collection
+julia> select(df, AsTable(:) => ByRow(minimum)) # minimum operates on a collection
 2×1 DataFrame
  Row │ a_b_etc_minimum
      │ Int64
@@ -1955,7 +1986,7 @@ julia> select(df, AsTable(:) => ByRow(minimum)) # minimum works on a collection
    1 │               1
    2 │               1
 
-julia> select(df, [:a,:b] => ByRow(+)) # `+` works on a multiple arguments
+julia> select(df, [:a,:b] => ByRow(+)) # `+` operates on multiple arguments
 2×1 DataFrame
  Row │ a_b_+
      │ Int64
@@ -1963,7 +1994,7 @@ julia> select(df, [:a,:b] => ByRow(+)) # `+` works on a multiple arguments
    1 │     4
    2 │     6
 
-julia> select(df, AsTable([:a,:b]) => ByRow(sum)) # `sum` works on a collection
+julia> select(df, AsTable([:a,:b]) => ByRow(sum)) # `sum` operates on a collection
 2×1 DataFrame
  Row │ a_b_sum
      │ Int64
@@ -1973,7 +2004,7 @@ julia> select(df, AsTable([:a,:b]) => ByRow(sum)) # `sum` works on a collection
 
 julia> using Statistics # contains the `mean` function
 
-julia> select(df, AsTable(Between(:b, :d)) => ByRow(mean))
+julia> select(df, AsTable(Between(:b, :d)) => ByRow(mean)) # `mean` operates on a collection
 2×1 DataFrame
  Row │ b_c_d_mean
      │ Float64
@@ -2047,7 +2078,7 @@ specify your own `new_column_names`.
 
 `new_column_names` can be included at the end of an `operation` pair to specify
 the name of the new column(s).
-`new_column_names` may be a symbol or a string.
+`new_column_names` may be a symbol, string, function, vector of symbols, vector of strings, or `AsTable`.
 
 ```julia
 julia> df = DataFrame(a=1:4, b=5:8)
@@ -2094,7 +2125,7 @@ julia> transform(df, :a => ByRow(x->x+10) => "a+10")
 The `source_column_selector => new_column_names` operation form
 can be used to rename columns without an intermediate function.
 However, there are `rename` and `rename!` functions,
-which accept the same syntax,
+which accept similar syntax,
 that tend to be more useful for this operation.
 
 ```julia
@@ -2179,7 +2210,33 @@ julia> transform(df, :a => (x -> 10 .* x) => (s -> "new_" * s)) # with anonymous
    4 │     4      8     40
 ```
 
-Note that a renaming function will not work in the
+!!! Note
+      It is a good idea to wrap anonymous functions in parentheses
+      to avoid the `=>` operator accidentally becoming part of the anonymous function.
+      The examples above do not work correctly without the parentheses!
+      ```julia
+      julia> transform(df, :a => x -> 10 .* x => add_prefix)  # Not what we wanted!
+      4×3 DataFrame
+       Row │ a      b      a_function
+           │ Int64  Int64  Pair…
+      ─────┼────────────────────────────────────────────
+         1 │     1      5  [10, 20, 30, 40]=>add_prefix
+         2 │     2      6  [10, 20, 30, 40]=>add_prefix
+         3 │     3      7  [10, 20, 30, 40]=>add_prefix
+         4 │     4      8  [10, 20, 30, 40]=>add_prefix
+
+      julia> transform(df, :a => x -> 10 .* x => s -> "new_" * s)  # Not what we wanted!
+      4×3 DataFrame
+       Row │ a      b      a_function
+           │ Int64  Int64  Pair…
+      ─────┼─────────────────────────────────────
+         1 │     1      5  [10, 20, 30, 40]=>#18
+         2 │     2      6  [10, 20, 30, 40]=>#18
+         3 │     3      7  [10, 20, 30, 40]=>#18
+         4 │     4      8  [10, 20, 30, 40]=>#18
+      ```
+
+A renaming function will not work in the
 `source_column_selector => new_column_names` operation form
 because a function in the second element of the operation pair is assumed to take
 the `source_column_selector => operation_function` operation form.
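
Reviewer note: the parenthesization pitfall described in the admonition added by the last hunk comes from how Julia parses `->` and `=>`, and can be reproduced without a data frame. This is a minimal sketch in plain Julia; the names `bad`, `good`, and `result` are hypothetical and chosen only for illustration.

```julia
add_prefix(s) = "new_" * s

# Without parentheses, everything to the right of `->` is the function body,
# so the second `=>` ends up inside the anonymous function.
bad = :a => x -> 10 .* x => add_prefix

# With parentheses, the anonymous function ends at `)` and the renaming
# function becomes the third part of the operation.
good = :a => (x -> 10 .* x) => add_prefix

# Calling the unparenthesized function returns a Pair, not a vector.
result = bad.second([1, 2])        # [10, 20] => add_prefix
```

Here `bad.second` is a function whose return value is a `Pair`, which matches the `Pair…` column type shown in the admonition output, while `good.second` is the intended `function => renaming_function` pair.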

From ce55607655f6136060abf21d1fbbbf4023e63c27 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Fri, 29 Sep 2023 11:07:18 -0400
Subject: [PATCH 15/29] Move to its own section

---
 docs/src/man/basics.md                 | 1361 +-----------------------
 docs/src/man/manipulation_functions.md | 1345 +++++++++++++++++++++++
 2 files changed, 1369 insertions(+), 1337 deletions(-)
 create mode 100644 docs/src/man/manipulation_functions.md

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 485175ffa..daadf7a00 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -1567,1320 +1567,28 @@ julia> german[Not(5), r"S"]
 
 ## Basic Usage of Manipulation Functions
 
-In DataFrames.jl there are seven functions that can be used
-to manipulate data frame columns:
-
-| Function     | Memory Usage                     | Column Retention                        | Row Retention                                       |
-| ------------ | -------------------------------- | --------------------------------------- | --------------------------------------------------- |
-| `transform`  | Creates a new data frame.        | Retains original and resultant columns. | Retains same number of rows as original data frame. |
-| `transform!` | Modifies an existing data frame. | Retains original and resultant columns. | Retains same number of rows as original data frame. |
-| `select`     | Creates a new data frame.        | Retains only resultant columns.         | Retains same number of rows as original data frame. |
-| `select!`    | Modifies an existing data frame. | Retains only resultant columns.         | Retains same number of rows as original data frame. |
-| `subset`     | Creates a new data frame.        | Retains original columns.               | Retains only rows where condition is true.          |
-| `subset!`    | Modifies an existing data frame. | Retains original columns.               | Retains only rows where condition is true.          |
-| `combine`    | Creates a new data frame.        | Retains only resultant columns.         | Retains only resultant rows.                        |
-
-### Constructing Operation Pairs
-
-All of the functions above use the same syntax which is commonly
-`manipulation_function(dataframe, operation)`.
-The `operation` argument defines the
-operation to be applied to the source `dataframe`,
-and it can take any of the following common forms explained below:
-
-`source_column_selector`
-: selects source column(s) without manipulating or renaming them
-
-   Examples: `:a`, `[:a, :b]`, `All()`, `Not(:a)`
-
-`source_column_selector => operation_function`
-: passes source column(s) as arguments to a function
-and automatically names the resulting column(s)
-
-   Examples: `:a => sum`, `[:a, :b] => +`, `:a => ByRow(==(3))`
-
-`source_column_selector => operation_function => new_column_names`
-: passes source column(s) as arguments to a function
-and names the resulting column(s) `new_column_names`
-
-   Examples: `:a => sum => :sum_of_a`, `[:a, :b] => + => :a_plus_b`
-
-   (*Not available for `subset` or `subset!`*)
-
-`source_column_selector => new_column_names`
-: renames a source column,
-or splits a column containing collection elements into multiple new columns
-
-   Examples: `:a => :new_a`, `:a_b => [:a, :b]`, `:nt => AsTable`
-
-   (*Not available for `subset` or `subset!`*)
-
-The `=>` operator constructs a
-[Pair](https://docs.julialang.org/en/v1/base/collections/#Core.Pair),
-which is a type to link one object to another.
-(Pairs are commonly used to create elements of a
-[Dictionary](https://docs.julialang.org/en/v1/base/collections/#Dictionaries).)
-In DataFrames.jl manipulation functions,
-`Pair` arguments are used to define column `operations` to be performed.
-The provided examples will be explained in more detail below.
-
-The manipulation functions also have methods for applying multiple operations.
-See the later sections [Multiple Operations per Manipulation](@ref)
-and [Broadcasting Operation Pairs](@ref) for more information.
-
-#### `source_column_selector`
-Inside an `operation`, `source_column_selector` is usually a column name
-or column index which identifies a data frame column.
-
-`source_column_selector` may be used as the entire `operation`
-with `select` or `select!` to isolate or reorder columns.
-
-```julia
-julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6], c = [7, 8, 9])
-3×3 DataFrame
- Row │ a      b      c
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      7
-   2 │     2      5      8
-   3 │     3      6      9
-
-julia> select(df, :b)
-3×1 DataFrame
- Row │ b
-     │ Int64
-─────┼───────
-   1 │     4
-   2 │     5
-   3 │     6
-
-julia> select(df, "b")
-3×1 DataFrame
- Row │ b
-     │ Int64
-─────┼───────
-   1 │     4
-   2 │     5
-   3 │     6
-
-julia> select(df, 2)
-3×1 DataFrame
- Row │ b
-     │ Int64
-─────┼───────
-   1 │     4
-   2 │     5
-   3 │     6
-```
-
-`source_column_selector` may also be used as the entire `operation`
-with `subset` or `subset!` if the source column contains `Bool` values.
-
-```julia
-julia> df = DataFrame(
-           name = ["Scott", "Jill", "Erica", "Jimmy"],
-           minor = [false, true, false, true],
-       )
-4×2 DataFrame
- Row │ name    minor
-     │ String  Bool
-─────┼───────────────
-   1 │ Scott   false
-   2 │ Jill     true
-   3 │ Erica   false
-   4 │ Jimmy    true
-
-julia> subset(df, :minor)
-2×2 DataFrame
- Row │ name    minor
-     │ String  Bool
-─────┼───────────────
-   1 │ Jill     true
-   2 │ Jimmy    true
-```
-
-`source_column_selector` may instead be a collection of columns such as a vector,
-a [regular expression](https://docs.julialang.org/en/v1/manual/strings/#Regular-Expressions),
-a `Not`, `Between`, `All`, or `Cols` expression,
-or a `:`.
-See the [Indexing](@ref) API for the full list of possible values with references.
-
-!!! Note
-      The Julia parser sometimes prevents `:` from being used by itself.
-      If you get
-      `ERROR: syntax: whitespace not allowed after ":" used for quoting`,
-      try using `All()`, `Cols(:)`, or `(:)` instead to select all columns.
-
-```julia
-julia> df = DataFrame(
-           id = [1, 2, 3],
-           first_name = ["José", "Emma", "Nathan"],
-           last_name = ["Garcia", "Marino", "Boyer"],
-           age = [61, 24, 33]
-       )
-3×4 DataFrame
- Row │ id     first_name  last_name  age
-     │ Int64  String      String     Int64
-─────┼─────────────────────────────────────
-   1 │     1  José        Garcia        61
-   2 │     2  Emma        Marino        24
-   3 │     3  Nathan      Boyer         33
-
-julia> select(df, [:last_name, :first_name])
-3×2 DataFrame
- Row │ last_name  first_name
-     │ String     String
-─────┼───────────────────────
-   1 │ Garcia     José
-   2 │ Marino     Emma
-   3 │ Boyer      Nathan
-
-julia> select(df, r"name")
-3×2 DataFrame
- Row │ first_name  last_name
-     │ String      String
-─────┼───────────────────────
-   1 │ José        Garcia
-   2 │ Emma        Marino
-   3 │ Nathan      Boyer
-
-julia> select(df, Not(:id))
-3×3 DataFrame
- Row │ first_name  last_name  age
-     │ String      String     Int64
-─────┼──────────────────────────────
-   1 │ José        Garcia        61
-   2 │ Emma        Marino        24
-   3 │ Nathan      Boyer         33
-
-julia> select(df, Between(2,4))
-3×3 DataFrame
- Row │ first_name  last_name  age
-     │ String      String     Int64
-─────┼──────────────────────────────
-   1 │ José        Garcia        61
-   2 │ Emma        Marino        24
-   3 │ Nathan      Boyer         33
-
-julia> df2 = DataFrame(
-           name = ["Scott", "Jill", "Erica", "Jimmy"],
-           minor = [false, true, false, true],
-           male = [true, false, false, true],
-       )
-4×3 DataFrame
- Row │ name    minor  male
-     │ String  Bool   Bool
-─────┼──────────────────────
-   1 │ Scott   false   true
-   2 │ Jill     true  false
-   3 │ Erica   false  false
-   4 │ Jimmy    true   true
-
-julia> subset(df2, [:minor, :male])
-1×3 DataFrame
- Row │ name    minor  male
-     │ String  Bool   Bool
-─────┼─────────────────────
-   1 │ Jimmy    true  true
-```
-
-#### `operation_function`
-Inside an `operation` pair, `operation_function` is a function
-which operates on data frame columns passed as vectors.
-When multiple columns are selected by `source_column_selector`,
-the `operation_function` will receive the columns as separate positional arguments
-in the order they were selected, e.g. `f(column1, column2, column3)`.
-
-```julia
-julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 4])
-3×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     2      5
-   3 │     3      4
-
-julia> combine(df, :a => sum)
-1×1 DataFrame
- Row │ a_sum
-     │ Int64
-─────┼───────
-   1 │     6
-
-julia> transform(df, :b => maximum) # `transform` and `select` copy scalar result to all rows
-3×3 DataFrame
- Row │ a      b      b_maximum
-     │ Int64  Int64  Int64
-─────┼─────────────────────────
-   1 │     1      4          5
-   2 │     2      5          5
-   3 │     3      4          5
-
-julia> transform(df, [:b, :a] => -) # vector subtraction is okay
-3×3 DataFrame
- Row │ a      b      b_a_-
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      3
-   2 │     2      5      3
-   3 │     3      4      1
-
-julia> transform(df, [:a, :b] => *) # vector multiplication is not defined
-ERROR: MethodError: no method matching *(::Vector{Int64}, ::Vector{Int64})
-```
-
-Don't worry! There is a quick fix for the previous error.
-If you want to apply a function to each element in a column
-instead of to the entire column vector,
-then you can wrap your element-wise function in `ByRow` like
-`ByRow(my_elementwise_function)`.
-This will apply `my_elementwise_function` to every element in the column
-and then collect the results back into a vector.
-
-```julia
-julia> transform(df, [:a, :b] => ByRow(*))
-3×3 DataFrame
- Row │ a      b      a_b_*
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      4
-   2 │     2      5     10
-   3 │     3      4     12
-
-julia> transform(df, Cols(:) => ByRow(max))
-3×3 DataFrame
- Row │ a      b      a_b_max
-     │ Int64  Int64  Int64
-─────┼───────────────────────
-   1 │     1      4        4
-   2 │     2      5        5
-   3 │     3      4        4
-
-julia> f(x) = x + 1
-f (generic function with 1 method)
-
-julia> transform(df, :a => ByRow(f))
-3×3 DataFrame
- Row │ a      b      a_f
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      2
-   2 │     2      5      3
-   3 │     3      4      4
-```
-
-Alternatively, you may just want to define the function itself so it
-[broadcasts](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting)
-over vectors.
-
-```julia
-julia> g(x) = x .+ 1
-g (generic function with 1 method)
-
-julia> transform(df, :a => g)
-3×3 DataFrame
- Row │ a      b      a_g
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      2
-   2 │     2      5      3
-   3 │     3      4      4
-
-julia> h(x, y) = 2x .+ y
-h (generic function with 1 method)
-
-julia> transform(df, [:a, :b] => h)
-3×3 DataFrame
- Row │ a      b      a_b_h
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      6
-   2 │     2      5      9
-   3 │     3      4     10
-```
-
-[Anonymous functions](https://docs.julialang.org/en/v1/manual/functions/#man-anonymous-functions)
-are a convenient way to define and use an `operation_function`
-all within the manipulation function call.
-
-```julia
-julia> select(df, :a => ByRow(x -> x + 1))
-3×1 DataFrame
- Row │ a_function
-     │ Int64
-─────┼────────────
-   1 │          2
-   2 │          3
-   3 │          4
-
-julia> transform(df, [:a, :b] => ByRow((x, y) -> 2x + y))
-3×3 DataFrame
- Row │ a      b      a_b_function
-     │ Int64  Int64  Int64
-─────┼────────────────────────────
-   1 │     1      4             6
-   2 │     2      5             9
-   3 │     3      4            10
-
-julia> subset(df, :b => ByRow(x -> x < 5))
-2×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     3      4
-
-julia> subset(df, :b => ByRow(<(5))) # shorter version of the previous
-2×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     3      4
-```
-
-!!! Note
-    `operation_functions` within `subset` or `subset!` function calls
-    must return a Boolean vector.
-    `true` elements in the Boolean vector will determine
-    which rows are retained in the resulting data frame.
-
-As demonstrated above, `DataFrame` columns are usually passed
-from `source_column_selector` to `operation_function` as one or more
-vector arguments.
-However, when `AsTable(source_column_selector)` is used,
-the selected columns are collected and passed as a single `NamedTuple`
-to `operation_function`.
-
-This is often useful when your `operation_function` is defined to operate
-on a single collection argument rather than on multiple positional arguments.
-The distinction is somewhat similar to the difference between the built-in
-`min` and `minimum` functions.
-`min` is defined to find the minimum value among multiple positional arguments,
-while `minimum` is defined to find the minimum value
-among the elements of a single collection argument.
-
-```julia
-julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 2:-1:1)
-2×4 DataFrame
- Row │ a      b      c      d
-     │ Int64  Int64  Int64  Int64
-─────┼────────────────────────────
-   1 │     1      3      5      2
-   2 │     2      4      6      1
-
-julia> select(df, Cols(:) => ByRow(min)) # min operates on multiple arguments
-2×1 DataFrame
- Row │ a_b_etc_min
-     │ Int64
-─────┼─────────────
-   1 │           1
-   2 │           1
-
-julia> select(df, AsTable(:) => ByRow(minimum)) # minimum operates on a collection
-2×1 DataFrame
- Row │ a_b_etc_minimum
-     │ Int64
-─────┼─────────────────
-   1 │               1
-   2 │               1
-
-julia> select(df, [:a,:b] => ByRow(+)) # `+` operates on multiple arguments
-2×1 DataFrame
- Row │ a_b_+
-     │ Int64
-─────┼───────
-   1 │     4
-   2 │     6
-
-julia> select(df, AsTable([:a,:b]) => ByRow(sum)) # `sum` operates on a collection
-2×1 DataFrame
- Row │ a_b_sum
-     │ Int64
-─────┼─────────
-   1 │       4
-   2 │       6
-
-julia> using Statistics # contains the `mean` function
-
-julia> select(df, AsTable(Between(:b, :d)) => ByRow(mean)) # `mean` operates on a collection
-2×1 DataFrame
- Row │ b_c_d_mean
-     │ Float64
-─────┼────────────
-   1 │    3.33333
-   2 │    3.66667
-```
-
-`AsTable` can also be used to pass columns to a function which operates
-on fields of a `NamedTuple`.
-
-```julia
-julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)
-2×4 DataFrame
- Row │ a      b      c      d
-     │ Int64  Int64  Int64  Int64
-─────┼────────────────────────────
-   1 │     1      3      5      7
-   2 │     2      4      6      8
-
-julia> f(nt) = nt.a + nt.d
-f (generic function with 1 method)
-
-julia> transform(df, AsTable(:) => ByRow(f))
-2×5 DataFrame
- Row │ a      b      c      d      a_b_etc_f
-     │ Int64  Int64  Int64  Int64  Int64
-─────┼───────────────────────────────────────
-   1 │     1      3      5      7          8
-   2 │     2      4      6      8         10
-```
-
-As demonstrated above,
-in the `source_column_selector => operation_function` operation pair form,
-the results of an operation will be placed into a new column with an
-automatically-generated name based on the operation;
-the new column name will be the `operation_function` name
-appended to the source column name(s) with an underscore.
-
-This automatic column naming behavior can be avoided in two ways.
-First, the operation result can be placed back into the original column
-with the original column name by switching the keyword argument `renamecols`
-from its default value (`true`) to `renamecols=false`.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> transform(df, :a => ByRow(x->x+10), renamecols=false) # add 10 in-place
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │    11      5
-   2 │    12      6
-   3 │    13      7
-   4 │    14      8
-```
-
-The second method to avoid the default manipulation column naming is to
-specify your own `new_column_names`.
-
-#### `new_column_names`
-
-`new_column_names` can be included at the end of an `operation` pair to specify
-the name of the new column(s).
-`new_column_names` may be a symbol, string, function, vector of symbols, vector of strings, or `AsTable`.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> transform(df, Cols(:) => ByRow(+) => :c)
-4×3 DataFrame
- Row │ a      b      c
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5      6
-   2 │     2      6      8
-   3 │     3      7     10
-   4 │     4      8     12
-
-julia> transform(df, Cols(:) => ByRow(+) => "a+b")
-4×3 DataFrame
- Row │ a      b      a+b
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5      6
-   2 │     2      6      8
-   3 │     3      7     10
-   4 │     4      8     12
-
-julia> transform(df, :a => ByRow(x->x+10) => "a+10")
-4×3 DataFrame
- Row │ a      b      a+10
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5     11
-   2 │     2      6     12
-   3 │     3      7     13
-   4 │     4      8     14
-```
-
-The `source_column_selector => new_column_names` operation form
-can be used to rename columns without an intermediate function.
-However, there are `rename` and `rename!` functions,
-which accept similar syntax,
-that tend to be more useful for this operation.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> transform(df, :a => :apple) # adds column `apple`
-4×3 DataFrame
- Row │ a      b      apple
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5      1
-   2 │     2      6      2
-   3 │     3      7      3
-   4 │     4      8      4
-
-julia> select(df, :a => :apple) # retains only column `apple`
-4×1 DataFrame
- Row │ apple
-     │ Int64
-─────┼───────
-   1 │     1
-   2 │     2
-   3 │     3
-   4 │     4
-
-julia> rename(df, :a => :apple) # renames column `a` to `apple` in-place
-4×2 DataFrame
- Row │ apple  b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-```
-
-Additionally, in the
-`source_column_selector => operation_function => new_column_names` operation form,
-`new_column_names` may be a renaming function which operates on a string
-to create the destination column names programmatically.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> add_prefix(s) = "new_" * s
-add_prefix (generic function with 1 method)
-
-julia> transform(df, :a => (x -> 10 .* x) => add_prefix) # with named renaming function
-4×3 DataFrame
- Row │ a      b      new_a
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5     10
-   2 │     2      6     20
-   3 │     3      7     30
-   4 │     4      8     40
-
-julia> transform(df, :a => (x -> 10 .* x) => (s -> "new_" * s)) # with anonymous renaming function
-4×3 DataFrame
- Row │ a      b      new_a
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5     10
-   2 │     2      6     20
-   3 │     3      7     30
-   4 │     4      8     40
-```
-
-!!! Note
-      It is a good idea to wrap anonymous functions in parentheses
-      to avoid the `=>` operator accidentally becoming part of the anonymous function.
-      The examples above do not work correctly without the parentheses!
-      ```julia
-      julia> transform(df, :a => x -> 10 .* x => add_prefix)  # Not what we wanted!
-      4×3 DataFrame
-       Row │ a      b      a_function
-           │ Int64  Int64  Pair…
-      ─────┼────────────────────────────────────────────
-         1 │     1      5  [10, 20, 30, 40]=>add_prefix
-         2 │     2      6  [10, 20, 30, 40]=>add_prefix
-         3 │     3      7  [10, 20, 30, 40]=>add_prefix
-         4 │     4      8  [10, 20, 30, 40]=>add_prefix
-
-      julia> transform(df, :a => x -> 10 .* x => s -> "new_" * s)  # Not what we wanted!
-      4×3 DataFrame
-       Row │ a      b      a_function
-           │ Int64  Int64  Pair…
-      ─────┼─────────────────────────────────────
-         1 │     1      5  [10, 20, 30, 40]=>#18
-         2 │     2      6  [10, 20, 30, 40]=>#18
-         3 │     3      7  [10, 20, 30, 40]=>#18
-         4 │     4      8  [10, 20, 30, 40]=>#18
-      ```
-
-A renaming function will not work in the
-`source_column_selector => new_column_names` operation form
-because a function in the second element of the operation pair is assumed to take
-the `source_column_selector => operation_function` operation form.
-To work around this limitation, use the
-`source_column_selector => operation_function => new_column_names` operation form
-with `identity` as the `operation_function`.
-
-```julia
-julia> transform(df, :a => add_prefix)
-ERROR: MethodError: no method matching *(::String, ::Vector{Int64})
-
-julia> transform(df, :a => identity => add_prefix)
-4×3 DataFrame
- Row │ a      b      new_a
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5      1
-   2 │     2      6      2
-   3 │     3      7      3
-   4 │     4      8      4
-```
-
-In this case though,
-it is probably again more useful to use the `rename` or `rename!` function
-rather than one of the manipulation functions
-in order to rename in-place and avoid the intermediate `operation_function`.
-```julia
-julia> rename(add_prefix, df)  # rename all columns with a function
-4×2 DataFrame
- Row │ new_a  new_b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> rename(add_prefix, df; cols=:a)  # rename some columns with a function
-4×2 DataFrame
- Row │ new_a  b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-```
-
-In the `source_column_selector => new_column_names` operation form,
-only a single source column may be selected per operation,
-so why is `new_column_names` plural?
-It is possible to split the data contained inside a single column
-into multiple new columns by supplying a vector of strings or symbols
-as `new_column_names`.
-
-```julia
-julia> df = DataFrame(data = [(1,2), (3,4)]) # vector of tuples
-2×1 DataFrame
- Row │ data
-     │ Tuple…
-─────┼────────
-   1 │ (1, 2)
-   2 │ (3, 4)
-
-julia> transform(df, :data => [:first, :second]) # manual naming
-2×3 DataFrame
- Row │ data    first  second
-     │ Tuple…  Int64  Int64
-─────┼───────────────────────
-   1 │ (1, 2)      1       2
-   2 │ (3, 4)      3       4
-```
-
-This kind of data splitting can even be done automatically with `AsTable`.
-
-```julia
-julia> transform(df, :data => AsTable) # default automatic naming with tuples
-2×3 DataFrame
- Row │ data    x1     x2
-     │ Tuple…  Int64  Int64
-─────┼──────────────────────
-   1 │ (1, 2)      1      2
-   2 │ (3, 4)      3      4
-```
-
-If a data frame column contains `NamedTuple`s,
-then `AsTable` will preserve the field names.
-```julia
-julia> df = DataFrame(data = [(a=1,b=2), (a=3,b=4)]) # vector of named tuples
-2×1 DataFrame
- Row │ data
-     │ NamedTup…
-─────┼────────────────
-   1 │ (a = 1, b = 2)
-   2 │ (a = 3, b = 4)
-
-julia> transform(df, :data => AsTable) # keeps names from named tuples
-2×3 DataFrame
- Row │ data            a      b
-     │ NamedTup…       Int64  Int64
-─────┼──────────────────────────────
-   1 │ (a = 1, b = 2)      1      2
-   2 │ (a = 3, b = 4)      3      4
-```
-
-!!! Note
-      To pack multiple columns into a single column of `NamedTuple`s
-      (reverse of the above operation)
-      apply the `identity` function `ByRow`, e.g.
-      `transform(df, AsTable([:a, :b]) => ByRow(identity) => :data)`.
-
-Renaming functions also work for multi-column transformations,
-but they must operate on a vector of strings.
-
-```julia
-julia> df = DataFrame(data = [(1,2), (3,4)])
-2×1 DataFrame
- Row │ data
-     │ Tuple…
-─────┼────────
-   1 │ (1, 2)
-   2 │ (3, 4)
-
-julia> new_names(v) = ["primary ", "secondary "] .* v
-new_names (generic function with 1 method)
-
-julia> transform(df, :data => identity => new_names)
-2×3 DataFrame
- Row │ data    primary data  secondary data
-     │ Tuple…  Int64         Int64
-─────┼──────────────────────────────────────
-   1 │ (1, 2)             1               2
-   2 │ (3, 4)             3               4
-```
-
-#### Multiple Operations per Manipulation
-All data frame manipulation functions can accept multiple `operation` pairs
-at once using any of the following methods:
-- `manipulation_function(dataframe, operation1, operation2)`   : multiple arguments
-- `manipulation_function(dataframe, [operation1, operation2])` : vector argument
-- `manipulation_function(dataframe, [operation1 operation2])`  : matrix argument
-
-Passing multiple operations is especially useful for the `select`, `select!`,
-and `combine` manipulation functions,
-since they only retain columns which are a result of the passed operations.
-
-```julia
-julia> df = DataFrame(a = 1:4, b = [50,50,60,60], c = ["hat","bat","cat","dog"])
-4×3 DataFrame
- Row │ a      b      c
-     │ Int64  Int64  String
-─────┼──────────────────────
-   1 │     1     50  hat
-   2 │     2     50  bat
-   3 │     3     60  cat
-   4 │     4     60  dog
-
-julia> combine(df, :a => maximum, :b => sum, :c => join) # 3 combine operations
-1×3 DataFrame
- Row │ a_maximum  b_sum  c_join
-     │ Int64      Int64  String
-─────┼────────────────────────────────
-   1 │         4    220  hatbatcatdog
-
-julia> select(df, :c, :b, :a) # re-order columns
-4×3 DataFrame
- Row │ c       b      a
-     │ String  Int64  Int64
-─────┼──────────────────────
-   1 │ hat        50      1
-   2 │ bat        50      2
-   3 │ cat        60      3
-   4 │ dog        60      4
-
-ulia> select(df, :b, :) # `:` here means all other columns
-4×3 DataFrame
- Row │ b      a      c
-     │ Int64  Int64  String
-─────┼──────────────────────
-   1 │    50      1  hat
-   2 │    50      2  bat
-   3 │    60      3  cat
-   4 │    60      4  dog
-
-julia> select(
-           df,
-           :c => (x -> "a " .* x) => :one_c,
-           :a => (x -> 100x),
-           :b,
-           renamecols=false
-       ) # can mix operation forms
-4×3 DataFrame
- Row │ one_c   a      b
-     │ String  Int64  Int64
-─────┼──────────────────────
-   1 │ a hat     100     50
-   2 │ a bat     200     50
-   3 │ a cat     300     60
-   4 │ a dog     400     60
-
-julia> select(
-           df,
-           :c => ByRow(reverse),
-           :c => ByRow(uppercase)
-       ) # multiple operations on same column
-4×2 DataFrame
- Row │ c_reverse  c_uppercase
-     │ String     String
-─────┼────────────────────────
-   1 │ tah        HAT
-   2 │ tab        BAT
-   3 │ tac        CAT
-   4 │ god        DOG
-```
-
-In the last two examples,
-the manipulation function arguments were split across multiple lines.
-This is a good way to make manipulations with many operations more readable.
-
-Passing multiple operations to `subset` or `subset!` is an easy way to narrow in
-on a particular row of data.
-
-```julia
-julia> subset(
-           df,
-           :b => ByRow(==(60)),
-           :c => ByRow(contains("at"))
-       ) # rows with 60 and "at"
-1×3 DataFrame
- Row │ a      b      c
-     │ Int64  Int64  String
-─────┼──────────────────────
-   1 │     3     60  cat
-```
-
-Note that all operations within a single manipulation must use the data
-as it existed before the function call
-i.e. you cannot use newly created columns for subsequent operations
-within the same manipulation.
-
-```julia
-julia> transform(
-           df,
-           [:a, :b] => ByRow(+) => :d,
-           :d => (x -> x ./ 2),
-       ) # requires two separate transformations
-ERROR: ArgumentError: column name :d not found in the data frame; existing most similar names are: :a, :b and :c
-
-julia> new_df = transform(df, [:a, :b] => ByRow(+) => :d)
-4×4 DataFrame
- Row │ a      b      c       d
-     │ Int64  Int64  String  Int64
-─────┼─────────────────────────────
-   1 │     1     50  hat        51
-   2 │     2     50  bat        52
-   3 │     3     60  cat        63
-   4 │     4     60  dog        64
-
-julia> transform!(new_df, :d => (x -> x ./ 2) => :d_2)
-4×5 DataFrame
- Row │ a      b      c       d      d_2
-     │ Int64  Int64  String  Int64  Float64
-─────┼──────────────────────────────────────
-   1 │     1     50  hat        51     25.5
-   2 │     2     50  bat        52     26.0
-   3 │     3     60  cat        63     31.5
-   4 │     4     60  dog        64     32.0
-```
-
-
-#### Broadcasting Operation Pairs
-
-[Broadcasting](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting)
-pairs with `.=>` is often a convenient way to generate multiple
-similar `operation`s to be applied within a single manipulation.
-Broadcasting within the `Pair` of an `operation` is no different than
-broadcasting in base Julia.
-The broadcasting `.=>` will be expanded into a vector of pairs
-(`[operation1, operation2, ...]`),
-and this expansion will occur before the manipulation function is invoked.
-Then the manipulation function will use the
-`manipulation_function(dataframe, [operation1, operation2, ...])` method.
-This process will be explained in more detail below.
-
-To illustrate these concepts, let us first examine the `Type` of a basic `Pair`.
-In DataFrames.jl, a symbol, string, or integer
-may be used to select a single column.
-Some `Pair`s with these types are below.
-
-```julia
-julia> typeof(:x => :a)
-Pair{Symbol, Symbol}
-
-julia> typeof("x" => "a")
-Pair{String, String}
-
-julia> typeof(1 => "a")
-Pair{Int64, String}
-```
-
-Any of the `Pair`s above could be used to rename the first column
-of the data frame below to `a`.
-
-```julia
-julia> df = DataFrame(x = 1:3, y = 4:6)
-3×2 DataFrame
- Row │ x      y
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     2      5
-   3 │     3      6
-
-julia> select(df, :x => :a)
-3×1 DataFrame
- Row │ a
-     │ Int64
-─────┼───────
-   1 │     1
-   2 │     2
-   3 │     3
-
-julia> select(df, 1 => "a")
-3×1 DataFrame
- Row │ a
-     │ Int64
-─────┼───────
-   1 │     1
-   2 │     2
-   3 │     3
-```
-
-What should we do if we want to keep and rename both the `x` and `y` column?
-One option is to supply a `Vector` of operation `Pair`s to `select`.
-`select` will process all of these operations in order.
-
-```julia
-julia> ["x" => "a", "y" => "b"]
-2-element Vector{Pair{String, String}}:
- "x" => "a"
- "y" => "b"
-
-julia> select(df, ["x" => "a", "y" => "b"])
-3×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     2      5
-   3 │     3      6
-```
-
-We can use broadcasting to simplify the syntax above.
-
-```julia
-julia> ["x", "y"] .=> ["a", "b"]
-2-element Vector{Pair{String, String}}:
- "x" => "a"
- "y" => "b"
-
-julia> select(df, ["x", "y"] .=> ["a", "b"])
-3×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     2      5
-   3 │     3      6
-```
-
-Notice that `select` sees the same `Vector{Pair{String, String}}` operation
-argument whether the individual pairs are written out explicitly or
-constructed with broadcasting.
-The broadcasting is applied before the call to `select`.
-
-```julia
-julia> ["x" => "a", "y" => "b"] == (["x", "y"] .=> ["a", "b"])
-true
-```
-
-!!! Note
-      These operation pairs (or vector of pairs) can be given variable names.
-      This is uncommon in practice but could be helpful for intermediate
-      inspection and testing.
-      ```julia
-      df = DataFrame(x = 1:3, y = 4:6)       # create data frame
-      operation = ["x", "y"] .=> ["a", "b"]  # save operation to variable
-      typeof(operation)                      # check type of operation
-      first(operation)                       # check first pair in operation
-      last(operation)                        # check last pair in operation
-      select(df, operation)                  # manipulate `df` with `operation`
-      ```
-
-In Julia,
-a non-vector broadcasted with a vector will be repeated in each resultant pair element.
-
-```julia
-julia> ["x", "y"] .=> :a    # :a is repeated
-2-element Vector{Pair{String, Symbol}}:
- "x" => :a
- "y" => :a
-
-julia> 1 .=> [:a, :b]       # 1 is repeated
-2-element Vector{Pair{Int64, Symbol}}:
- 1 => :a
- 1 => :b
-```
-
-We can use this fact to easily broadcast an `operation_function` to multiple columns.
-
-```julia
-julia> f(x) = 2 * x
-f (generic function with 1 method)
-
-julia> ["x", "y"] .=> f  # f is repeated
-2-element Vector{Pair{String, typeof(f)}}:
- "x" => f
- "y" => f
-
-julia> select(df, ["x", "y"] .=> f)  # apply f with automatic column renaming
-3×2 DataFrame
- Row │ x_f    y_f
-     │ Int64  Int64
-─────┼──────────────
-   1 │     2      8
-   2 │     4     10
-   3 │     6     12
-
-julia> ["x", "y"] .=> f .=> ["a", "b"]  # f is repeated
-2-element Vector{Pair{String, Pair{typeof(f), String}}}:
- "x" => (f => "a")
- "y" => (f => "b")
-
-julia> select(df, ["x", "y"] .=> f .=> ["a", "b"])  # apply f with manual column renaming
-3×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     2      8
-   2 │     4     10
-   3 │     6     12
-```
-
-A renaming function can be applied to multiple columns in the same way.
-It will also be repeated in each operation `Pair`.
-
-```julia
-julia> newname(s::String) = s * "_new"
-newname (generic function with 1 method)
-
-julia> ["x", "y"] .=> f .=> newname  # both f and newname are repeated
-2-element Vector{Pair{String, Pair{typeof(f), typeof(newname)}}}:
- "x" => (f => newname)
- "y" => (f => newname)
-
-julia> select(df, ["x", "y"] .=> f .=> newname)  # apply f then rename column with newname
-3×2 DataFrame
- Row │ x_new  y_new
-     │ Int64  Int64
-─────┼──────────────
-   1 │     2      8
-   2 │     4     10
-   3 │     6     12
-```
-
-You can see from the type output above
-that a three element pair does not actually exist.
-A `Pair` (as the name implies) can only contain two elements.
-Thus, `:x => :y => :z` becomes a nested `Pair`,
-where `:x` is the first element and points to the `Pair` `:y => :z`,
-which is the second element.
-
-```julia
-julia> p = :x => :y => :z
-:x => (:y => :z)
-
-julia> p[1]
-:x
-
-julia> p[2]
-:y => :z
-
-julia> p[2][1]
-:y
-
-julia> p[2][2]
-:z
-
-julia> p[3] # there is no index 3 for a pair
-ERROR: BoundsError: attempt to access Pair{Symbol, Pair{Symbol, Symbol}} at index [3]
-```
-
-In the previous examples, the source columns have been individually selected.
-When broadcasting multiple columns to the same function,
-often similarities in the column names or position can be exploited to avoid
-tedious selection.
-Consider a data frame with temperature data at three different locations
-taken over time.
-```julia
-julia> df = DataFrame(Time = 1:4,
-                      Temperature1 = [20, 23, 25, 28],
-                      Temperature2 = [33, 37, 41, 44],
-                      Temperature3 = [15, 10, 4, 0])
-4×4 DataFrame
- Row │ Time   Temperature1  Temperature2  Temperature3
-     │ Int64  Int64         Int64         Int64
-─────┼─────────────────────────────────────────────────
-   1 │     1            20            33            15
-   2 │     2            23            37            10
-   3 │     3            25            41             4
-   4 │     4            28            44             0
-```
-
-To convert all of the temperature data in one transformation,
-we just need to define a conversion function and broadcast
-it to all of the "Temperature" columns.
-
-```julia
-julia> celsius_to_kelvin(x) = x + 273
-celsius_to_kelvin (generic function with 1 method)
-
-julia> transform(
-           df,
-           Cols(r"Temp") .=> ByRow(celsius_to_kelvin),
-           renamecols = false
-       )
-4×4 DataFrame
- Row │ Time   Temperature1  Temperature2  Temperature3
-     │ Int64  Int64         Int64         Int64
-─────┼─────────────────────────────────────────────────
-   1 │     1           293           306           288
-   2 │     2           296           310           283
-   3 │     3           298           314           277
-   4 │     4           301           317           273
-```
-Or, simultaneously changing the column names:
-
-```julia
-julia> rename_function(s) = "Temperature $(last(s)) (K)"
-rename_function (generic function with 1 method)
-
-julia> select(
-           df,
-           "Time",
-           Cols(r"Temp") .=> ByRow(celsius_to_kelvin) .=> rename_function
-       )
-4×4 DataFrame
- Row │ Time   Temperature 1 (K)  Temperature 2 (K)  Temperature 3 (K)
-     │ Int64  Int64              Int64              Int64
-─────┼────────────────────────────────────────────────────────────────
-   1 │     1                293                306                288
-   2 │     2                296                310                283
-   3 │     3                298                314                277
-   4 │     4                301                317                273
-```
-
-!!! Note Notes
-      * `Not("Time")` or `2:4` would have been equally good choices for `source_column_selector` in the above operations.
-      * Don't forget `ByRow` if your function is to be applied to elements rather than entire column vectors.
-      Without `ByRow`, the manipulations above would have thrown
-      `ERROR: MethodError: no method matching +(::Vector{Int64}, ::Int64)`.
-      * Regular expression (`r""`) and `:` `source_column_selectors`
-      must be wrapped in `Cols` to be properly broadcasted
-      because otherwise the broadcasting occurs before the expression is expanded into a vector of matches.
-
-You could also broadcast different columns to different functions
-by supplying a vector of functions.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> f1(x) = x .+ 1
-f1 (generic function with 1 method)
-
-julia> f2(x) = x ./ 10
-f2 (generic function with 1 method)
-
-julia> transform(df, [:a, :b] .=> [f1, f2])
-4×4 DataFrame
- Row │ a      b      a_f1   b_f2
-     │ Int64  Int64  Int64  Float64
-─────┼──────────────────────────────
-   1 │     1      5      2      0.5
-   2 │     2      6      3      0.6
-   3 │     3      7      4      0.7
-   4 │     4      8      5      0.8
-```
-
-However, this form is not much more convenient than supplying
-multiple individual operations.
-
-```julia
-julia> transform(df, [:a => f1, :b => f2]) # same manipulation as previous
-4×4 DataFrame
- Row │ a      b      a_f1   b_f2
-     │ Int64  Int64  Int64  Float64
-─────┼──────────────────────────────
-   1 │     1      5      2      0.5
-   2 │     2      6      3      0.6
-   3 │     3      7      4      0.7
-   4 │     4      8      5      0.8
-```
-
-Perhaps more useful for broadcasting syntax
-is to apply multiple functions to multiple columns
-by changing the vector of functions to a 1-by-x matrix of functions.
-(Recall that a list, a vector, or a matrix of operation pairs are all valid
-for passing to the manipulation functions.)
-
-```julia
-julia> [:a, :b] .=> [f1 f2] # No comma `,` between f1 and f2
-2×2 Matrix{Pair{Symbol}}:
- :a=>f1  :a=>f2
- :b=>f1  :b=>f2
-
-julia> transform(df, [:a, :b] .=> [f1 f2]) # No comma `,` between f1 and f2
-4×6 DataFrame
- Row │ a      b      a_f1   b_f1   a_f2     b_f2
-     │ Int64  Int64  Int64  Int64  Float64  Float64
-─────┼──────────────────────────────────────────────
-   1 │     1      5      2      6      0.1      0.5
-   2 │     2      6      3      7      0.2      0.6
-   3 │     3      7      4      8      0.3      0.7
-   4 │     4      8      5      9      0.4      0.8
-```
-
-In this way, every combination of selected columns and functions will be applied.
-
-Pair broadcasting is a simple but powerful tool
-that can be used in any of the manipulation functions listed under
-[Basic Usage of Manipulation Functions](@ref).
-Experiment for yourself to discover other useful operations.
-
-#### More Manipulation Examples with the German Dataset
-
-Let us move to the examples of application of these rules using the German dataset.
+In DataFrames.jl there are seven functions
+which can be used to perform operations on data frame columns:
+
+- `combine`: creates a new data frame populated with columns that result from
+  operations applied to the source data frame columns, potentially combining
+  its rows;
+- `select`: creates a new data frame that has the same number of rows as the
+  source data frame populated with columns that result from operations
+  applied to the source data frame columns;
+- `select!`: the same as `select` but updates the passed data frame in place;
+- `transform`: the same as `select` but keeps the columns that were already
+  present in the data frame (note though that these columns can potentially be
+  modified by the operations passed to `transform`);
+- `transform!`: the same as `transform` but updates the passed data frame in
+  place;
+- `subset`: creates a new data frame populated with the same columns
+  as the source data frame, but with only the rows where the passed operations are true;
+- `subset!`: the same as `subset` but updates the passed data frame in place.
+These functions and their methods are explained in more detail in the section
+[Data Frame Manipulation Functions](@ref).
+In this section, we will move straight to examples using the German dataset.
 
 ```jldoctest dataframe
 julia> using Statistics
@@ -3443,26 +2151,5 @@ julia> select(german, :Age, :Job, [:Age, :Job] => (+) => :res)
             985 rows omitted
 ```
 
-This concludes the introductory explaination of data frame manipulations.
-For more advanced examples,
-see later sections of the manual or the additional resources below.
-
-#### Additional Resources
-More details and examples of operation pair syntax can be found in
-[this blog post](https://bkamins.github.io/julialang/2020/12/24/minilanguage.html).
-(The official wording describing the syntax has changed since the blog post was written,
-but the examples are still illustrative.
-The operation pair syntax is sometimes referred to as the DataFrames.jl mini-language
-or Domain-Specific Language.)
-
-For additional practice,
-an interactive tutorial is provided on a variety of introductory topics
-by the DataFrames.jl package author
-[here](https://github.com/bkamins/Julia-DataFrames-Tutorial).
-
-
-For additional syntax niceties,
-many users find the [Chain.jl](https://github.com/jkrumbiegel/Chain.jl)
-and [DataFramesMeta.jl](https://github.com/JuliaData/DataFramesMeta.jl)
-packages useful
-to help simplify manipulations that may be tedious with operation pairs alone.
\ No newline at end of file
+This concludes the introductory explanation of data frame manipulations.
+For more advanced examples, see later sections of the manual.
diff --git a/docs/src/man/manipulation_functions.md b/docs/src/man/manipulation_functions.md
new file mode 100644
index 000000000..da4fb1e63
--- /dev/null
+++ b/docs/src/man/manipulation_functions.md
@@ -0,0 +1,1345 @@
+# Data Frame Manipulation Functions
+
+The seven functions below can be used to manipulate data frames
+by applying operations to them.
+
+The functions without a `!` in their name
+will create a new data frame based on the source data frame,
+so you will probably want to assign the new data frame to a new variable,
+e.g. `new_df = transform(source_df, operation)`.
+The functions with a `!` at the end of their name
+will modify an existing data frame in-place,
+so there is typically no need to assign the result to a variable,
+e.g. `transform!(source_df, operation)` instead of
+`source_df = transform(source_df, operation)`.
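+
+As a minimal sketch of this difference
+(the data frame `source_df` and the column name `:a_doubled` are made up for illustration),
+the snippet below checks which data frame ends up holding the new column.
+
+```julia
+using DataFrames
+
+source_df = DataFrame(a = [1, 2, 3])   # hypothetical example data
+
+# Without `!`: a new data frame is returned and `source_df` is unchanged.
+new_df = transform(source_df, :a => ByRow(x -> 2x) => :a_doubled)
+names(source_df)   # ["a"]
+names(new_df)      # ["a", "a_doubled"]
+
+# With `!`: `source_df` itself gains the new column.
+transform!(source_df, :a => ByRow(x -> 2x) => :a_doubled)
+names(source_df)   # ["a", "a_doubled"]
+```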
+
+The number of columns and rows in the resultant data frame varies
+depending on the manipulation function employed.
+
+| Function     | Memory Usage                     | Column Retention                        | Row Retention                                       |
+| ------------ | -------------------------------- | --------------------------------------- | --------------------------------------------------- |
+| `transform`  | Creates a new data frame.        | Retains original and resultant columns. | Retains same number of rows as original data frame. |
+| `transform!` | Modifies an existing data frame. | Retains original and resultant columns. | Retains same number of rows as original data frame. |
+| `select`     | Creates a new data frame.        | Retains only resultant columns.         | Retains same number of rows as original data frame. |
+| `select!`    | Modifies an existing data frame. | Retains only resultant columns.         | Retains same number of rows as original data frame. |
+| `subset`     | Creates a new data frame.        | Retains original columns.               | Retains only rows where condition is true.          |
+| `subset!`    | Modifies an existing data frame. | Retains original columns.               | Retains only rows where condition is true.          |
+| `combine`    | Creates a new data frame.        | Retains only resultant columns.         | Retains only resultant rows.                        |
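+
+The table can be checked by comparing the `size` of each result.
+The following is only an illustrative sketch:
+the data frame `df` and the new column names `:c` and `:total` are assumptions chosen for the example.
+
+```julia
+using DataFrames
+
+df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])   # hypothetical example data
+
+size(transform(df, :a => ByRow(x -> 2x) => :c))   # (3, 3): original and new columns
+size(select(df, :a => ByRow(x -> 2x) => :c))      # (3, 1): only the new column
+size(subset(df, :a => ByRow(>(1))))               # (2, 2): all columns, matching rows only
+size(combine(df, :a => sum => :total))            # (1, 1): only the resulting row and column
+```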
+
+## Constructing Operations
+
+All of the functions above share the same syntax, most commonly
+`manipulation_function(dataframe, operation)`.
+The `operation` argument defines the
+operation to be applied to the source `dataframe`,
+and it can take any of the common forms listed below:
+
+`source_column_selector`
+: selects source column(s) without manipulating or renaming them
+
+   Examples: `:a`, `[:a, :b]`, `All()`, `Not(:a)`
+
+`source_column_selector => operation_function`
+: passes source column(s) as arguments to a function
+and automatically names the resulting column(s)
+
+   Examples: `:a => sum`, `[:a, :b] => +`, `:a => ByRow(==(3))`
+
+`source_column_selector => operation_function => new_column_names`
+: passes source column(s) as arguments to a function
+and names the resulting column(s) `new_column_names`
+
+   Examples: `:a => sum => :sum_of_a`, `[:a, :b] => + => :a_plus_b`
+
+   *(Not available for `subset` or `subset!`)*
+
+`source_column_selector => new_column_names`
+: renames a source column,
+or splits a column containing collection elements into multiple new columns
+
+   Examples: `:a => :new_a`, `:a_b => [:a, :b]`, `:nt => AsTable`
+
+   *(Not available for `subset` or `subset!`)*
+
+The `=>` operator constructs a
+[Pair](https://docs.julialang.org/en/v1/base/collections/#Core.Pair),
+which is a type that links one object to another.
+(Pairs are commonly used to create elements of a
+[Dictionary](https://docs.julialang.org/en/v1/base/collections/#Dictionaries).)
+In DataFrames.jl manipulation functions,
+`Pair` arguments define the column `operation`s to be performed.
+The provided examples will be explained in more detail below.
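+
+Because `=>` is an ordinary Julia operator,
+an operation can be constructed and inspected without any data frame.
+The short sketch below uses only Base Julia;
+the variable name `p` is arbitrary.
+
+```julia
+p = :a => sum => :sum_of_a    # parsed as :a => (sum => :sum_of_a)
+
+p isa Pair                    # true
+first(p)                      # :a
+last(p)                       # sum => :sum_of_a, which is itself a Pair
+first(last(p)) === sum        # true
+last(last(p))                 # :sum_of_a
+```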
+
+The manipulation functions also have methods for applying multiple operations.
+See the later sections [Multiple Operations per Manipulation](@ref)
+and [Broadcasting Operation Pairs](@ref) for more information.
+
+### `source_column_selector`
+Inside an `operation`, `source_column_selector` is usually a column name
+or column index which identifies a data frame column.
+
+`source_column_selector` may be used as the entire `operation`
+with `select` or `select!` to isolate or reorder columns.
+
+```julia
+julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6], c = [7, 8, 9])
+3×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      7
+   2 │     2      5      8
+   3 │     3      6      9
+
+julia> select(df, :b)
+3×1 DataFrame
+ Row │ b
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     5
+   3 │     6
+
+julia> select(df, "b")
+3×1 DataFrame
+ Row │ b
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     5
+   3 │     6
+
+julia> select(df, 2)
+3×1 DataFrame
+ Row │ b
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     5
+   3 │     6
+```
+
+`source_column_selector` may also be used as the entire `operation`
+with `subset` or `subset!` if the source column contains `Bool` values.
+
+```julia
+julia> df = DataFrame(
+           name = ["Scott", "Jill", "Erica", "Jimmy"],
+           minor = [false, true, false, true],
+       )
+4×2 DataFrame
+ Row │ name    minor
+     │ String  Bool
+─────┼───────────────
+   1 │ Scott   false
+   2 │ Jill     true
+   3 │ Erica   false
+   4 │ Jimmy    true
+
+julia> subset(df, :minor)
+2×2 DataFrame
+ Row │ name    minor
+     │ String  Bool
+─────┼───────────────
+   1 │ Jill     true
+   2 │ Jimmy    true
+```
+
+`source_column_selector` may instead be a collection of columns such as a vector,
+a [regular expression](https://docs.julialang.org/en/v1/manual/strings/#Regular-Expressions),
+a `Not`, `Between`, `All`, or `Cols` expression,
+or a `:`.
+See the [Indexing](@ref) API for the full list of possible values with references.
+
+!!! Note
+      The Julia parser sometimes prevents `:` from being used by itself.
+      If you get
+      `ERROR: syntax: whitespace not allowed after ":" used for quoting`,
+      try using `All()`, `Cols(:)`, or `(:)` instead to select all columns.
+
+```julia
+julia> df = DataFrame(
+           id = [1, 2, 3],
+           first_name = ["José", "Emma", "Nathan"],
+           last_name = ["Garcia", "Marino", "Boyer"],
+           age = [61, 24, 33]
+       )
+3×4 DataFrame
+ Row │ id     first_name  last_name  age
+     │ Int64  String      String     Int64
+─────┼─────────────────────────────────────
+   1 │     1  José        Garcia        61
+   2 │     2  Emma        Marino        24
+   3 │     3  Nathan      Boyer         33
+
+julia> select(df, [:last_name, :first_name])
+3×2 DataFrame
+ Row │ last_name  first_name
+     │ String     String
+─────┼───────────────────────
+   1 │ Garcia     José
+   2 │ Marino     Emma
+   3 │ Boyer      Nathan
+
+julia> select(df, r"name")
+3×2 DataFrame
+ Row │ first_name  last_name
+     │ String      String
+─────┼───────────────────────
+   1 │ José        Garcia
+   2 │ Emma        Marino
+   3 │ Nathan      Boyer
+
+julia> select(df, Not(:id))
+3×3 DataFrame
+ Row │ first_name  last_name  age
+     │ String      String     Int64
+─────┼──────────────────────────────
+   1 │ José        Garcia        61
+   2 │ Emma        Marino        24
+   3 │ Nathan      Boyer         33
+
+julia> select(df, Between(2,4))
+3×3 DataFrame
+ Row │ first_name  last_name  age
+     │ String      String     Int64
+─────┼──────────────────────────────
+   1 │ José        Garcia        61
+   2 │ Emma        Marino        24
+   3 │ Nathan      Boyer         33
+
+julia> df2 = DataFrame(
+           name = ["Scott", "Jill", "Erica", "Jimmy"],
+           minor = [false, true, false, true],
+           male = [true, false, false, true],
+       )
+4×3 DataFrame
+ Row │ name    minor  male
+     │ String  Bool   Bool
+─────┼──────────────────────
+   1 │ Scott   false   true
+   2 │ Jill     true  false
+   3 │ Erica   false  false
+   4 │ Jimmy    true   true
+
+julia> subset(df2, [:minor, :male])
+1×3 DataFrame
+ Row │ name    minor  male
+     │ String  Bool   Bool
+─────┼─────────────────────
+   1 │ Jimmy    true  true
+```
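+
+When it is unclear which columns a selector will match,
+one option is to pass the selector to `names` before using it in a manipulation.
+The sketch below rebuilds the data frame from the example above
+so that it can be run on its own.
+
+```julia
+using DataFrames
+
+df = DataFrame(
+    id = [1, 2, 3],
+    first_name = ["José", "Emma", "Nathan"],
+    last_name = ["Garcia", "Marino", "Boyer"],
+    age = [61, 24, 33]
+)
+
+names(df, r"name")        # ["first_name", "last_name"]
+names(df, Not(:id))       # ["first_name", "last_name", "age"]
+names(df, Between(2, 4))  # ["first_name", "last_name", "age"]
+names(df, Cols(:))        # ["id", "first_name", "last_name", "age"]
+```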
+
+### `operation_function`
+Inside an `operation` pair, `operation_function` is a function
+which operates on data frame columns passed as vectors.
+When multiple columns are selected by `source_column_selector`,
+the `operation_function` will receive the columns as separate positional arguments
+in the order they were selected, e.g. `f(column1, column2, column3)`.
+
+```julia
+julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 4])
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      4
+
+julia> combine(df, :a => sum)
+1×1 DataFrame
+ Row │ a_sum
+     │ Int64
+─────┼───────
+   1 │     6
+
+julia> transform(df, :b => maximum) # `transform` and `select` copy scalar result to all rows
+3×3 DataFrame
+ Row │ a      b      b_maximum
+     │ Int64  Int64  Int64
+─────┼─────────────────────────
+   1 │     1      4          5
+   2 │     2      5          5
+   3 │     3      4          5
+
+julia> transform(df, [:b, :a] => -) # vector subtraction is okay
+3×3 DataFrame
+ Row │ a      b      b_a_-
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      3
+   2 │     2      5      3
+   3 │     3      4      1
+
+julia> transform(df, [:a, :b] => *) # vector multiplication is not defined
+ERROR: MethodError: no method matching *(::Vector{Int64}, ::Vector{Int64})
+```
+
+Don't worry! There is a quick fix for the previous error.
+If you want to apply a function to each element in a column
+instead of to the entire column vector,
+then you can wrap your element-wise function in `ByRow` like
+`ByRow(my_elementwise_function)`.
+This will apply `my_elementwise_function` to every element in the column
+and then collect the results back into a vector.
+
+```julia
+julia> transform(df, [:a, :b] => ByRow(*))
+3×3 DataFrame
+ Row │ a      b      a_b_*
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      4
+   2 │     2      5     10
+   3 │     3      4     12
+
+julia> transform(df, Cols(:) => ByRow(max))
+3×3 DataFrame
+ Row │ a      b      a_b_max
+     │ Int64  Int64  Int64
+─────┼───────────────────────
+   1 │     1      4        4
+   2 │     2      5        5
+   3 │     3      4        4
+
+julia> f(x) = x + 1
+f (generic function with 1 method)
+
+julia> transform(df, :a => ByRow(f))
+3×3 DataFrame
+ Row │ a      b      a_f
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      2
+   2 │     2      5      3
+   3 │     3      4      4
+```
+
+Alternatively, you may just want to define the function itself so it
+[broadcasts](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting)
+over vectors.
+
+```julia
+julia> g(x) = x .+ 1
+g (generic function with 1 method)
+
+julia> transform(df, :a => g)
+3×3 DataFrame
+ Row │ a      b      a_g
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      2
+   2 │     2      5      3
+   3 │     3      4      4
+
+julia> h(x, y) = 2x .+ y
+h (generic function with 1 method)
+
+julia> transform(df, [:a, :b] => h)
+3×3 DataFrame
+ Row │ a      b      a_b_h
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      6
+   2 │     2      5      9
+   3 │     3      4     10
+```
+
+[Anonymous functions](https://docs.julialang.org/en/v1/manual/functions/#man-anonymous-functions)
+are a convenient way to define and use an `operation_function`
+all within the manipulation function call.
+
+```julia
+julia> select(df, :a => ByRow(x -> x + 1))
+3×1 DataFrame
+ Row │ a_function
+     │ Int64
+─────┼────────────
+   1 │          2
+   2 │          3
+   3 │          4
+
+julia> transform(df, [:a, :b] => ByRow((x, y) -> 2x + y))
+3×3 DataFrame
+ Row │ a      b      a_b_function
+     │ Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      4             6
+   2 │     2      5             9
+   3 │     3      4            10
+
+julia> subset(df, :b => ByRow(x -> x < 5))
+2×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     3      4
+
+julia> subset(df, :b => ByRow(<(5))) # shorter version of the previous
+2×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     3      4
+```
+
+!!! note
+    An `operation_function` within a `subset` or `subset!` function call
+    must return a Boolean vector.
+    The `true` elements of that vector determine
+    which rows are retained in the resulting data frame.
+
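+Because only the Boolean result matters, a broadcasting comparison can be used
+in place of `ByRow`. The following is a minimal sketch;
+`keep_small` is a hypothetical helper name chosen for illustration.
+
+```julia
+using DataFrames
+
+df = DataFrame(a = [1, 2, 3], b = [4, 5, 4])
+
+# Hypothetical helper: receives the whole column and returns one Bool per row
+keep_small(b) = b .< 5
+
+small = subset(df, :b => keep_small)  # keeps the rows where `b` is less than 5
+
+# A single Boolean per row can also be computed from several columns with `ByRow`
+both = subset(df, [:a, :b] => ByRow((a, b) -> a > 1 && b < 5))
+```
+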
+As demonstrated above, `DataFrame` columns are usually passed
+from `source_column_selector` to `operation_function` as one or more
+vector arguments.
+However, when `AsTable(source_column_selector)` is used,
+the selected columns are collected and passed as a single `NamedTuple`
+to `operation_function`.
+
+This is often useful when your `operation_function` is defined to operate
+on a single collection argument rather than on multiple positional arguments.
+The distinction is somewhat similar to the difference between the built-in
+`min` and `minimum` functions.
+`min` is defined to find the minimum value among multiple positional arguments,
+while `minimum` is defined to find the minimum value
+among the elements of a single collection argument.
+
+```julia
+julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 2:-1:1)
+2×4 DataFrame
+ Row │ a      b      c      d
+     │ Int64  Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      3      5      2
+   2 │     2      4      6      1
+
+julia> select(df, Cols(:) => ByRow(min)) # min operates on multiple arguments
+2×1 DataFrame
+ Row │ a_b_etc_min
+     │ Int64
+─────┼─────────────
+   1 │           1
+   2 │           1
+
+julia> select(df, AsTable(:) => ByRow(minimum)) # minimum operates on a collection
+2×1 DataFrame
+ Row │ a_b_etc_minimum
+     │ Int64
+─────┼─────────────────
+   1 │               1
+   2 │               1
+
+julia> select(df, [:a,:b] => ByRow(+)) # `+` operates on multiple arguments
+2×1 DataFrame
+ Row │ a_b_+
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     6
+
+julia> select(df, AsTable([:a,:b]) => ByRow(sum)) # `sum` operates on a collection
+2×1 DataFrame
+ Row │ a_b_sum
+     │ Int64
+─────┼─────────
+   1 │       4
+   2 │       6
+
+julia> using Statistics # contains the `mean` function
+
+julia> select(df, AsTable(Between(:b, :d)) => ByRow(mean)) # `mean` operates on a collection
+2×1 DataFrame
+ Row │ b_c_d_mean
+     │ Float64
+─────┼────────────
+   1 │    3.33333
+   2 │    3.66667
+```
+
+`AsTable` can also be used to pass columns to a function which operates
+on fields of a `NamedTuple`.
+
+```julia
+julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)
+2×4 DataFrame
+ Row │ a      b      c      d
+     │ Int64  Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      3      5      7
+   2 │     2      4      6      8
+
+julia> f(nt) = nt.a + nt.d
+f (generic function with 1 method)
+
+julia> transform(df, AsTable(:) => ByRow(f))
+2×5 DataFrame
+ Row │ a      b      c      d      a_b_etc_f
+     │ Int64  Int64  Int64  Int64  Int64
+─────┼───────────────────────────────────────
+   1 │     1      3      5      7          8
+   2 │     2      4      6      8         10
+```
+
+As demonstrated above,
+in the `source_column_selector => operation_function` operation pair form,
+the results of an operation will be placed into a new column with an
+automatically-generated name based on the operation;
+the new column name will be the `operation_function` name
+appended to the source column name(s) with an underscore.
+
+This automatic column naming behavior can be avoided in two ways.
+First, the operation result can be placed back into the original column
+with the original column name by switching the keyword argument `renamecols`
+from its default value (`true`) to `renamecols=false`.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, :a => ByRow(x->x+10), renamecols=false) # add 10 in-place
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │    11      5
+   2 │    12      6
+   3 │    13      7
+   4 │    14      8
+```
+
+The second method to avoid the default manipulation column naming is to
+specify your own `new_column_names`.
+
+### `new_column_names`
+
+`new_column_names` can be included at the end of an `operation` pair to specify
+the name of the new column(s).
+`new_column_names` may be a symbol, string, function, vector of symbols, vector of strings, or `AsTable`.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, Cols(:) => ByRow(+) => :c)
+4×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
+
+julia> transform(df, Cols(:) => ByRow(+) => "a+b")
+4×3 DataFrame
+ Row │ a      b      a+b
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
+
+julia> transform(df, :a => ByRow(x->x+10) => "a+10")
+4×3 DataFrame
+ Row │ a      b      a+10
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5     11
+   2 │     2      6     12
+   3 │     3      7     13
+   4 │     4      8     14
+```
+
+The `source_column_selector => new_column_names` operation form
+can be used to rename columns without an intermediate function.
+However, there are `rename` and `rename!` functions,
+which accept similar syntax,
+that tend to be more useful for this operation.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, :a => :apple) # adds column `apple`
+4×3 DataFrame
+ Row │ a      b      apple
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      1
+   2 │     2      6      2
+   3 │     3      7      3
+   4 │     4      8      4
+
+julia> select(df, :a => :apple) # retains only column `apple`
+4×1 DataFrame
+ Row │ apple
+     │ Int64
+─────┼───────
+   1 │     1
+   2 │     2
+   3 │     3
+   4 │     4
+
+julia> rename(df, :a => :apple) # renames column `a` to `apple`, keeping its position (returns a new data frame)
+4×2 DataFrame
+ Row │ apple  b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+```
+
+Additionally, in the
+`source_column_selector => operation_function => new_column_names` operation form,
+`new_column_names` may be a renaming function which operates on a string
+to create the destination column names programmatically.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> add_prefix(s) = "new_" * s
+add_prefix (generic function with 1 method)
+
+julia> transform(df, :a => (x -> 10 .* x) => add_prefix) # with named renaming function
+4×3 DataFrame
+ Row │ a      b      new_a
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5     10
+   2 │     2      6     20
+   3 │     3      7     30
+   4 │     4      8     40
+
+julia> transform(df, :a => (x -> 10 .* x) => (s -> "new_" * s)) # with anonymous renaming function
+4×3 DataFrame
+ Row │ a      b      new_a
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5     10
+   2 │     2      6     20
+   3 │     3      7     30
+   4 │     4      8     40
+```
+
+!!! note
+      It is a good idea to wrap anonymous functions in parentheses
+      to avoid the `=>` operator accidentally becoming part of the anonymous function.
+      The examples above do not work correctly without the parentheses!
+      ```julia
+      julia> transform(df, :a => x -> 10 .* x => add_prefix)  # Not what we wanted!
+      4×3 DataFrame
+       Row │ a      b      a_function
+           │ Int64  Int64  Pair…
+      ─────┼────────────────────────────────────────────
+         1 │     1      5  [10, 20, 30, 40]=>add_prefix
+         2 │     2      6  [10, 20, 30, 40]=>add_prefix
+         3 │     3      7  [10, 20, 30, 40]=>add_prefix
+         4 │     4      8  [10, 20, 30, 40]=>add_prefix
+
+      julia> transform(df, :a => x -> 10 .* x => s -> "new_" * s)  # Not what we wanted!
+      4×3 DataFrame
+       Row │ a      b      a_function
+           │ Int64  Int64  Pair…
+      ─────┼─────────────────────────────────────
+         1 │     1      5  [10, 20, 30, 40]=>#18
+         2 │     2      6  [10, 20, 30, 40]=>#18
+         3 │     3      7  [10, 20, 30, 40]=>#18
+         4 │     4      8  [10, 20, 30, 40]=>#18
+      ```
+
+A renaming function will not work in the
+`source_column_selector => new_column_names` operation form
+because a function in the second element of the operation pair is assumed to take
+the `source_column_selector => operation_function` operation form.
+To work around this limitation, use the
+`source_column_selector => operation_function => new_column_names` operation form
+with `identity` as the `operation_function`.
+
+```julia
+julia> transform(df, :a => add_prefix)
+ERROR: MethodError: no method matching *(::String, ::Vector{Int64})
+
+julia> transform(df, :a => identity => add_prefix)
+4×3 DataFrame
+ Row │ a      b      new_a
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      1
+   2 │     2      6      2
+   3 │     3      7      3
+   4 │     4      8      4
+```
+
+In this case though,
+it is probably again more useful to use the `rename` or `rename!` function
+rather than one of the manipulation functions,
+since renaming directly keeps each column in position and avoids the intermediate `operation_function`.
+```julia
+julia> rename(add_prefix, df)  # rename all columns with a function
+4×2 DataFrame
+ Row │ new_a  new_b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> rename(add_prefix, df; cols=:a)  # rename some columns with a function
+4×2 DataFrame
+ Row │ new_a  b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+```
+
+In the `source_column_selector => new_column_names` operation form,
+only a single source column may be selected per operation,
+so why is `new_column_names` plural?
+It is possible to split the data contained inside a single column
+into multiple new columns by supplying a vector of strings or symbols
+as `new_column_names`.
+
+```julia
+julia> df = DataFrame(data = [(1,2), (3,4)]) # vector of tuples
+2×1 DataFrame
+ Row │ data
+     │ Tuple…
+─────┼────────
+   1 │ (1, 2)
+   2 │ (3, 4)
+
+julia> transform(df, :data => [:first, :second]) # manual naming
+2×3 DataFrame
+ Row │ data    first  second
+     │ Tuple…  Int64  Int64
+─────┼───────────────────────
+   1 │ (1, 2)      1       2
+   2 │ (3, 4)      3       4
+```
+
+This kind of data splitting can even be done automatically with `AsTable`.
+
+```julia
+julia> transform(df, :data => AsTable) # default automatic naming with tuples
+2×3 DataFrame
+ Row │ data    x1     x2
+     │ Tuple…  Int64  Int64
+─────┼──────────────────────
+   1 │ (1, 2)      1      2
+   2 │ (3, 4)      3      4
+```
+
+If a data frame column contains `NamedTuple`s,
+then `AsTable` will preserve the field names.
+```julia
+julia> df = DataFrame(data = [(a=1,b=2), (a=3,b=4)]) # vector of named tuples
+2×1 DataFrame
+ Row │ data
+     │ NamedTup…
+─────┼────────────────
+   1 │ (a = 1, b = 2)
+   2 │ (a = 3, b = 4)
+
+julia> transform(df, :data => AsTable) # keeps names from named tuples
+2×3 DataFrame
+ Row │ data            a      b
+     │ NamedTup…       Int64  Int64
+─────┼──────────────────────────────
+   1 │ (a = 1, b = 2)      1      2
+   2 │ (a = 3, b = 4)      3      4
+```
+
+!!! note
+      To pack multiple columns into a single column of `NamedTuple`s
+      (the reverse of the above operation),
+      apply the `identity` function with `ByRow`, e.g.
+      `transform(df, AsTable([:a, :b]) => ByRow(identity) => :data)`.
+
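+The packing operation and its reverse can be sketched as a round trip.
+This is a minimal illustration, assuming a small example data frame
+with columns `a` and `b`; the variable names `packed` and `unpacked` are arbitrary.
+
+```julia
+using DataFrames
+
+df = DataFrame(a = 1:2, b = 3:4)
+
+# Each row of the selected columns is passed to `identity` as a `NamedTuple`,
+# so the new `data` column holds one `NamedTuple` per row
+packed = select(df, AsTable([:a, :b]) => ByRow(identity) => :data)
+
+# `AsTable` as the destination unpacks the `NamedTuple`s into columns again
+unpacked = select(packed, :data => AsTable)
+```
+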
+Renaming functions also work for multi-column transformations,
+but they must operate on a vector of strings.
+
+```julia
+julia> df = DataFrame(data = [(1,2), (3,4)])
+2×1 DataFrame
+ Row │ data
+     │ Tuple…
+─────┼────────
+   1 │ (1, 2)
+   2 │ (3, 4)
+
+julia> new_names(v) = ["primary ", "secondary "] .* v
+new_names (generic function with 1 method)
+
+julia> transform(df, :data => identity => new_names)
+2×3 DataFrame
+ Row │ data    primary data  secondary data
+     │ Tuple…  Int64         Int64
+─────┼──────────────────────────────────────
+   1 │ (1, 2)             1               2
+   2 │ (3, 4)             3               4
+```
+
+## Applying Multiple Operations per Manipulation
+All data frame manipulation functions can accept multiple `operation` pairs
+at once using any of the following methods:
+- `manipulation_function(dataframe, operation1, operation2)`   : multiple arguments
+- `manipulation_function(dataframe, [operation1, operation2])` : vector argument
+- `manipulation_function(dataframe, [operation1 operation2])`  : matrix argument
+
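+As a minimal sketch (the example data frame and the variable names are chosen
+here for illustration), the three call styles listed above should produce the same result:
+
+```julia
+using DataFrames
+
+df = DataFrame(a = 1:3, b = 4:6)
+
+r1 = combine(df, :a => sum, :b => sum)        # multiple arguments
+r2 = combine(df, [:a => sum, :b => sum])      # vector argument
+r3 = combine(df, [(:a => sum) (:b => sum)])   # matrix argument (1×2, no comma)
+
+all_equal = r1 == r2 == r3
+```
+
+The parentheses around each pair in the matrix form are optional;
+they are used here only to make the two matrix elements easy to see.
+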
+Passing multiple operations is especially useful for the `select`, `select!`,
+and `combine` manipulation functions,
+since they only retain columns which are a result of the passed operations.
+
+```julia
+julia> df = DataFrame(a = 1:4, b = [50,50,60,60], c = ["hat","bat","cat","dog"])
+4×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  String
+─────┼──────────────────────
+   1 │     1     50  hat
+   2 │     2     50  bat
+   3 │     3     60  cat
+   4 │     4     60  dog
+
+julia> combine(df, :a => maximum, :b => sum, :c => join) # 3 combine operations
+1×3 DataFrame
+ Row │ a_maximum  b_sum  c_join
+     │ Int64      Int64  String
+─────┼────────────────────────────────
+   1 │         4    220  hatbatcatdog
+
+julia> select(df, :c, :b, :a) # re-order columns
+4×3 DataFrame
+ Row │ c       b      a
+     │ String  Int64  Int64
+─────┼──────────────────────
+   1 │ hat        50      1
+   2 │ bat        50      2
+   3 │ cat        60      3
+   4 │ dog        60      4
+
+julia> select(df, :b, :) # `:` here means all other columns
+4×3 DataFrame
+ Row │ b      a      c
+     │ Int64  Int64  String
+─────┼──────────────────────
+   1 │    50      1  hat
+   2 │    50      2  bat
+   3 │    60      3  cat
+   4 │    60      4  dog
+
+julia> select(
+           df,
+           :c => (x -> "a " .* x) => :one_c,
+           :a => (x -> 100x),
+           :b,
+           renamecols=false
+       ) # can mix operation forms
+4×3 DataFrame
+ Row │ one_c   a      b
+     │ String  Int64  Int64
+─────┼──────────────────────
+   1 │ a hat     100     50
+   2 │ a bat     200     50
+   3 │ a cat     300     60
+   4 │ a dog     400     60
+
+julia> select(
+           df,
+           :c => ByRow(reverse),
+           :c => ByRow(uppercase)
+       ) # multiple operations on same column
+4×2 DataFrame
+ Row │ c_reverse  c_uppercase
+     │ String     String
+─────┼────────────────────────
+   1 │ tah        HAT
+   2 │ tab        BAT
+   3 │ tac        CAT
+   4 │ god        DOG
+```
+
+In the last two examples,
+the manipulation function arguments were split across multiple lines.
+This is a good way to make manipulations with many operations more readable.
+
+Passing multiple operations to `subset` or `subset!` is an easy way to narrow in
+on a particular row of data.
+
+```julia
+julia> subset(
+           df,
+           :b => ByRow(==(60)),
+           :c => ByRow(contains("at"))
+       ) # rows with 60 and "at"
+1×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  String
+─────┼──────────────────────
+   1 │     3     60  cat
+```
+
+Note that all operations within a single manipulation must use the data
+as it existed before the function call,
+i.e., you cannot use newly created columns for subsequent operations
+within the same manipulation.
+
+```julia
+julia> transform(
+           df,
+           [:a, :b] => ByRow(+) => :d,
+           :d => (x -> x ./ 2),
+       ) # requires two separate transformations
+ERROR: ArgumentError: column name :d not found in the data frame; existing most similar names are: :a, :b and :c
+
+julia> new_df = transform(df, [:a, :b] => ByRow(+) => :d)
+4×4 DataFrame
+ Row │ a      b      c       d
+     │ Int64  Int64  String  Int64
+─────┼─────────────────────────────
+   1 │     1     50  hat        51
+   2 │     2     50  bat        52
+   3 │     3     60  cat        63
+   4 │     4     60  dog        64
+
+julia> transform!(new_df, :d => (x -> x ./ 2) => :d_2)
+4×5 DataFrame
+ Row │ a      b      c       d      d_2
+     │ Int64  Int64  String  Int64  Float64
+─────┼──────────────────────────────────────
+   1 │     1     50  hat        51     25.5
+   2 │     2     50  bat        52     26.0
+   3 │     3     60  cat        63     31.5
+   4 │     4     60  dog        64     32.0
+```
+
+
+## Broadcasting Operation Pairs
+
+[Broadcasting](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting)
+pairs with `.=>` is often a convenient way to generate multiple
+similar `operation`s to be applied within a single manipulation.
+Broadcasting within the `Pair` of an `operation` is no different than
+broadcasting in base Julia.
+The broadcasting `.=>` will be expanded into a vector of pairs
+(`[operation1, operation2, ...]`),
+and this expansion will occur before the manipulation function is invoked.
+Then the manipulation function will use the
+`manipulation_function(dataframe, [operation1, operation2, ...])` method.
+This process will be explained in more detail below.
+
+To illustrate these concepts, let us first examine the `Type` of a basic `Pair`.
+In DataFrames.jl, a symbol, string, or integer
+may be used to select a single column.
+Some `Pair`s with these types are below.
+
+```julia
+julia> typeof(:x => :a)
+Pair{Symbol, Symbol}
+
+julia> typeof("x" => "a")
+Pair{String, String}
+
+julia> typeof(1 => "a")
+Pair{Int64, String}
+```
+
+Any of the `Pair`s above could be used to rename the first column
+of the data frame below to `a`.
+
+```julia
+julia> df = DataFrame(x = 1:3, y = 4:6)
+3×2 DataFrame
+ Row │ x      y
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+
+julia> select(df, :x => :a)
+3×1 DataFrame
+ Row │ a
+     │ Int64
+─────┼───────
+   1 │     1
+   2 │     2
+   3 │     3
+
+julia> select(df, 1 => "a")
+3×1 DataFrame
+ Row │ a
+     │ Int64
+─────┼───────
+   1 │     1
+   2 │     2
+   3 │     3
+```
+
+What should we do if we want to keep and rename both the `x` and `y` column?
+One option is to supply a `Vector` of operation `Pair`s to `select`.
+`select` will process all of these operations in order.
+
+```julia
+julia> ["x" => "a", "y" => "b"]
+2-element Vector{Pair{String, String}}:
+ "x" => "a"
+ "y" => "b"
+
+julia> select(df, ["x" => "a", "y" => "b"])
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+```
+
+We can use broadcasting to simplify the syntax above.
+
+```julia
+julia> ["x", "y"] .=> ["a", "b"]
+2-element Vector{Pair{String, String}}:
+ "x" => "a"
+ "y" => "b"
+
+julia> select(df, ["x", "y"] .=> ["a", "b"])
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+```
+
+Notice that `select` sees the same `Vector{Pair{String, String}}` operation
+argument whether the individual pairs are written out explicitly or
+constructed with broadcasting.
+The broadcasting is applied before the call to `select`.
+
+```julia
+julia> ["x" => "a", "y" => "b"] == (["x", "y"] .=> ["a", "b"])
+true
+```
+
+!!! note
+      These operation pairs (or vector of pairs) can be given variable names.
+      This is uncommon in practice but could be helpful for intermediate
+      inspection and testing.
+      ```julia
+      df = DataFrame(x = 1:3, y = 4:6)       # create data frame
+      operation = ["x", "y"] .=> ["a", "b"]  # save operation to variable
+      typeof(operation)                      # check type of operation
+      first(operation)                       # check first pair in operation
+      last(operation)                        # check last pair in operation
+      select(df, operation)                  # manipulate `df` with `operation`
+      ```
+
+In Julia,
+a scalar (non-vector) value broadcast against a vector is repeated in each resulting pair.
+
+```julia
+julia> ["x", "y"] .=> :a    # :a is repeated
+2-element Vector{Pair{String, Symbol}}:
+ "x" => :a
+ "y" => :a
+
+julia> 1 .=> [:a, :b]       # 1 is repeated
+2-element Vector{Pair{Int64, Symbol}}:
+ 1 => :a
+ 1 => :b
+```
+
+We can use this fact to easily broadcast an `operation_function` to multiple columns.
+
+```julia
+julia> f(x) = 2 * x
+f (generic function with 1 method)
+
+julia> ["x", "y"] .=> f  # f is repeated
+2-element Vector{Pair{String, typeof(f)}}:
+ "x" => f
+ "y" => f
+
+julia> select(df, ["x", "y"] .=> f)  # apply f with automatic column renaming
+3×2 DataFrame
+ Row │ x_f    y_f
+     │ Int64  Int64
+─────┼──────────────
+   1 │     2      8
+   2 │     4     10
+   3 │     6     12
+
+julia> ["x", "y"] .=> f .=> ["a", "b"]  # f is repeated
+2-element Vector{Pair{String, Pair{typeof(f), String}}}:
+ "x" => (f => "a")
+ "y" => (f => "b")
+
+julia> select(df, ["x", "y"] .=> f .=> ["a", "b"])  # apply f with manual column renaming
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     2      8
+   2 │     4     10
+   3 │     6     12
+```
+
+A renaming function can be applied to multiple columns in the same way.
+It will also be repeated in each operation `Pair`.
+
+```julia
+julia> newname(s::String) = s * "_new"
+newname (generic function with 1 method)
+
+julia> ["x", "y"] .=> f .=> newname  # both f and newname are repeated
+2-element Vector{Pair{String, Pair{typeof(f), typeof(newname)}}}:
+ "x" => (f => newname)
+ "y" => (f => newname)
+
+julia> select(df, ["x", "y"] .=> f .=> newname)  # apply f then rename column with newname
+3×2 DataFrame
+ Row │ x_new  y_new
+     │ Int64  Int64
+─────┼──────────────
+   1 │     2      8
+   2 │     4     10
+   3 │     6     12
+```
+
+You can see from the type output above
+that a three element pair does not actually exist.
+A `Pair` (as the name implies) can only contain two elements.
+Thus, `:x => :y => :z` becomes a nested `Pair`,
+where `:x` is the first element and points to the `Pair` `:y => :z`,
+which is the second element.
+
+```julia
+julia> p = :x => :y => :z
+:x => (:y => :z)
+
+julia> p[1]
+:x
+
+julia> p[2]
+:y => :z
+
+julia> p[2][1]
+:y
+
+julia> p[2][2]
+:z
+
+julia> p[3] # there is no index 3 for a pair
+ERROR: BoundsError: attempt to access Pair{Symbol, Pair{Symbol, Symbol}} at index [3]
+```
+
+In the previous examples, the source columns have been individually selected.
+When broadcasting multiple columns to the same function,
+often similarities in the column names or position can be exploited to avoid
+tedious selection.
+Consider a data frame with temperature data at three different locations
+taken over time.
+```julia
+julia> df = DataFrame(Time = 1:4,
+                      Temperature1 = [20, 23, 25, 28],
+                      Temperature2 = [33, 37, 41, 44],
+                      Temperature3 = [15, 10, 4, 0])
+4×4 DataFrame
+ Row │ Time   Temperature1  Temperature2  Temperature3
+     │ Int64  Int64         Int64         Int64
+─────┼─────────────────────────────────────────────────
+   1 │     1            20            33            15
+   2 │     2            23            37            10
+   3 │     3            25            41             4
+   4 │     4            28            44             0
+```
+
+To convert all of the temperature data in one transformation,
+we just need to define a conversion function and broadcast
+it to all of the "Temperature" columns.
+
+```julia
+julia> celsius_to_kelvin(x) = x + 273
+celsius_to_kelvin (generic function with 1 method)
+
+julia> transform(
+           df,
+           Cols(r"Temp") .=> ByRow(celsius_to_kelvin),
+           renamecols = false
+       )
+4×4 DataFrame
+ Row │ Time   Temperature1  Temperature2  Temperature3
+     │ Int64  Int64         Int64         Int64
+─────┼─────────────────────────────────────────────────
+   1 │     1           293           306           288
+   2 │     2           296           310           283
+   3 │     3           298           314           277
+   4 │     4           301           317           273
+```
+Or, simultaneously changing the column names:
+
+```julia
+julia> rename_function(s) = "Temperature $(last(s)) (K)"
+rename_function (generic function with 1 method)
+
+julia> select(
+           df,
+           "Time",
+           Cols(r"Temp") .=> ByRow(celsius_to_kelvin) .=> rename_function
+       )
+4×4 DataFrame
+ Row │ Time   Temperature 1 (K)  Temperature 2 (K)  Temperature 3 (K)
+     │ Int64  Int64              Int64              Int64
+─────┼────────────────────────────────────────────────────────────────
+   1 │     1                293                306                288
+   2 │     2                296                310                283
+   3 │     3                298                314                277
+   4 │     4                301                317                273
+```
+
+!!! note "Notes"
+      * `Not("Time")` or `2:4` would have been equally good choices for `source_column_selector` in the above operations.
+      * Don't forget `ByRow` if your function is to be applied to elements rather than entire column vectors.
+        Without `ByRow`, the manipulations above would have thrown
+        `ERROR: MethodError: no method matching +(::Vector{Int64}, ::Int64)`.
+      * Regular expression (`r""`) and `:` `source_column_selectors`
+        must be wrapped in `Cols` to be broadcast properly,
+        because otherwise the broadcasting occurs before the expression is expanded into a vector of matches.
+
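+Another way to make sure the selector is expanded before broadcasting
+is to resolve the column names explicitly with `names(df, selector)`,
+which returns a plain vector of strings.
+The sketch below assumes a reduced version of the temperature data;
+the variable names `temp_cols` and `ops` are arbitrary.
+
+```julia
+using DataFrames
+
+df = DataFrame(Time = 1:2, Temperature1 = [20, 23], Temperature2 = [33, 37])
+
+# Resolve the regular expression into concrete column names first
+temp_cols = names(df, r"Temp")
+
+# Broadcasting over a vector of strings yields one operation pair per column
+ops = temp_cols .=> ByRow(x -> x + 273)
+
+result = transform(df, ops, renamecols = false)
+```
+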
+You could also broadcast different columns to different functions
+by supplying a vector of functions.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> f1(x) = x .+ 1
+f1 (generic function with 1 method)
+
+julia> f2(x) = x ./ 10
+f2 (generic function with 1 method)
+
+julia> transform(df, [:a, :b] .=> [f1, f2])
+4×4 DataFrame
+ Row │ a      b      a_f1   b_f2
+     │ Int64  Int64  Int64  Float64
+─────┼──────────────────────────────
+   1 │     1      5      2      0.5
+   2 │     2      6      3      0.6
+   3 │     3      7      4      0.7
+   4 │     4      8      5      0.8
+```
+
+However, this form is not much more convenient than supplying
+multiple individual operations.
+
+```julia
+julia> transform(df, [:a => f1, :b => f2]) # same manipulation as previous
+4×4 DataFrame
+ Row │ a      b      a_f1   b_f2
+     │ Int64  Int64  Int64  Float64
+─────┼──────────────────────────────
+   1 │     1      5      2      0.5
+   2 │     2      6      3      0.6
+   3 │     3      7      4      0.7
+   4 │     4      8      5      0.8
+```
+
+Broadcasting becomes more useful when applying
+multiple functions to multiple columns,
+which is done by changing the vector of functions to a 1-by-n matrix of functions.
+(Recall that multiple arguments, a vector, or a matrix of operation pairs are all valid
+ways to pass operations to the manipulation functions.)
+
+```julia
+julia> [:a, :b] .=> [f1 f2] # No comma `,` between f1 and f2
+2×2 Matrix{Pair{Symbol}}:
+ :a=>f1  :a=>f2
+ :b=>f1  :b=>f2
+
+julia> transform(df, [:a, :b] .=> [f1 f2]) # No comma `,` between f1 and f2
+4×6 DataFrame
+ Row │ a      b      a_f1   b_f1   a_f2     b_f2
+     │ Int64  Int64  Int64  Int64  Float64  Float64
+─────┼──────────────────────────────────────────────
+   1 │     1      5      2      6      0.1      0.5
+   2 │     2      6      3      7      0.2      0.6
+   3 │     3      7      4      8      0.3      0.7
+   4 │     4      8      5      9      0.4      0.8
+```
+
+In this way, every combination of selected columns and functions will be applied.
+
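+The same pattern is handy for summary statistics with `combine`.
+The following is a minimal sketch; the variable name `stats` is arbitrary.
+
+```julia
+using DataFrames
+
+df = DataFrame(a = 1:4, b = 5:8)
+
+# 2×2 matrix of pairs: every selected column is paired with every function
+stats = combine(df, [:a, :b] .=> [minimum maximum])
+
+# Result columns follow the column-major order of the matrix
+stat_names = names(stats)
+```
+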
+Pair broadcasting is a simple but powerful tool
+that can be used in any of the manipulation functions listed under
+[Basic Usage of Manipulation Functions](@ref).
+Experiment for yourself to discover other useful operations.
+
+## Additional Resources
+More details and examples of operation pair syntax can be found in
+[this blog post](https://bkamins.github.io/julialang/2020/12/24/minilanguage.html).
+(The official wording describing the syntax has changed since the blog post was written,
+but the examples are still illustrative.
+The operation pair syntax is sometimes referred to as the DataFrames.jl mini-language
+or Domain-Specific Language.)
+
+For additional practice,
+an interactive tutorial is provided on a variety of introductory topics
+by the DataFrames.jl package author
+[here](https://github.com/bkamins/Julia-DataFrames-Tutorial).
+
+
+For additional syntax niceties,
+many users find the [Chain.jl](https://github.com/jkrumbiegel/Chain.jl)
+and [DataFramesMeta.jl](https://github.com/JuliaData/DataFramesMeta.jl)
+packages useful
+to help simplify manipulations that may be tedious with operation pairs alone.
\ No newline at end of file

From 46363d9a1a075160d404705e89c0c5a35c671bf4 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Fri, 29 Sep 2023 11:11:44 -0400
Subject: [PATCH 16/29] Add new file to make and index

---
 docs/make.jl      | 1 +
 docs/src/index.md | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/make.jl b/docs/make.jl
index fa64782da..c35d55b0b 100644
--- a/docs/make.jl
+++ b/docs/make.jl
@@ -26,6 +26,7 @@ makedocs(
             "Working with DataFrames" => "man/working_with_dataframes.md",
             "Importing and Exporting Data (I/O)" => "man/importing_and_exporting.md",
             "Joins" => "man/joins.md",
+            "Data Frame Manipulation Functions" => "man/manipulation_functions.md",
             "Split-apply-combine" => "man/split_apply_combine.md",
             "Reshaping" => "man/reshaping_and_pivoting.md",
             "Sorting" => "man/sorting.md",
diff --git a/docs/src/index.md b/docs/src/index.md
index 1d7511908..ea8697e9b 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -218,6 +218,7 @@ page](https://github.com/JuliaData/DataFrames.jl/releases).
 Pages = ["man/basics.md",
          "man/getting_started.md",
          "man/joins.md",
+         "man/manipulation_functions.md",
          "man/split_apply_combine.md",
          "man/reshaping_and_pivoting.md",
          "man/sorting.md",
@@ -277,7 +278,7 @@ missing please kindly report an issue
     during which it is deprecated. The situations where such a breaking change
     might be allowed are (still such breaking changes will be avoided if
     possible):
-    
+
     * the affected functionality was previously clearly identified in the
       documentation as being subject to changes (for example in DataFrames.jl 1.4
       release propagation rules of `:note`-style metadata are documented as such);

From cd4c539e08beca0ddfe07a3ce322c07eae628f3d Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Fri, 29 Sep 2023 11:25:37 -0400
Subject: [PATCH 17/29] Rewrite Basics.md conclusion

---
 docs/src/man/basics.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index daadf7a00..5980083d0 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -2151,5 +2151,9 @@ julia> select(german, :Age, :Job, [:Age, :Job] => (+) => :res)
             985 rows omitted
 ```
 
-This concludes the introductory explanation of data frame manipulations.
-For more advanced examples, see later sections of the manual.
+This concludes the introductory examples of data frame manipulations.
+See later sections of the manual,
+particularly [Data Frame Manipulation Functions](@ref),
+for additional explanations and functionality,
+including how to broadcast operation functions and operation pairs
+and how to pass or produce multiple columns using `AsTable`.

From d70af831ca01d77f63fbaec52141980de6c68874 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Mon, 2 Oct 2023 15:27:17 -0400
Subject: [PATCH 18/29] Review Edits Round 2

---
 docs/make.jl                           |   2 +-
 docs/src/index.md                      |   8 +-
 docs/src/man/basics.md                 |  13 +++-
 docs/src/man/manipulation_functions.md | 100 +++++++++++++++++++++++--
 4 files changed, 110 insertions(+), 13 deletions(-)

diff --git a/docs/make.jl b/docs/make.jl
index c35d55b0b..d854981e2 100644
--- a/docs/make.jl
+++ b/docs/make.jl
@@ -26,7 +26,6 @@ makedocs(
             "Working with DataFrames" => "man/working_with_dataframes.md",
             "Importing and Exporting Data (I/O)" => "man/importing_and_exporting.md",
             "Joins" => "man/joins.md",
-            "Data Frame Manipulation Functions" => "man/manipulation_functions.md",
             "Split-apply-combine" => "man/split_apply_combine.md",
             "Reshaping" => "man/reshaping_and_pivoting.md",
             "Sorting" => "man/sorting.md",
@@ -35,6 +34,7 @@ makedocs(
             "Data manipulation frameworks" => "man/querying_frameworks.md",
             "Comparison with Python/R/Stata" => "man/comparisons.md"
         ],
+        "A Gentle Introduction to Data Frame Manipulation Functions" => "man/manipulation_functions.md",
         "API" => Any[
             "Types" => "lib/types.md",
             "Functions" => "lib/functions.md",
diff --git a/docs/src/index.md b/docs/src/index.md
index ea8697e9b..78c9ecd92 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -218,7 +218,6 @@ page](https://github.com/JuliaData/DataFrames.jl/releases).
 Pages = ["man/basics.md",
          "man/getting_started.md",
          "man/joins.md",
-         "man/manipulation_functions.md",
          "man/split_apply_combine.md",
          "man/reshaping_and_pivoting.md",
          "man/sorting.md",
@@ -229,6 +228,13 @@ Pages = ["man/basics.md",
 Depth = 2
 ```
 
+## A Gentle Introduction to Data Frame Manipulation Functions
+
+```@contents
+Pages = ["man/manipulation_functions.md"]
+Depth = 1
+```
+
 ## API
 
 Only exported (i.e. available for use without `DataFrames.` qualifier after
diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 5980083d0..7f77d555b 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -1586,9 +1586,14 @@ which can be used to perform operations on data frame columns:
 as the source data frame, but with only the rows where the passed operations are true;
 - `subset!`: the same as `subset` but updates the passed data frame in place;
 
-These functions and their methods are explained in more detail in the section
-[Data Frame Manipulation Functions](@ref).
-In this section, we will move straight to examples using the German dataset.
+!!! Note Other Resources
+    * For formal, comprehensive explanations of all manipulation functions,
+    see the [Functions](@ref) API.
+
+    * For an informal, long-form tutorial on these functions,
+    see [A Gentle Introduction to Data Frame Manipulation Functions](@ref).
+
+Let us now move straight to examples using the German dataset.
 
 ```jldoctest dataframe
 julia> using Statistics
@@ -2153,7 +2158,7 @@ julia> select(german, :Age, :Job, [:Age, :Job] => (+) => :res)
 
 This concludes the introductory examples of data frame manipulations.
 See later sections of the manual,
-particularly [Data Frame Manipulation Functions](@ref),
+particularly [A Gentle Introduction to Data Frame Manipulation Functions](@ref),
 for additional explanations and functionality,
 including how to broadcast operation functions and operation pairs
 and how to pass or produce multiple columns using `AsTable`.
diff --git a/docs/src/man/manipulation_functions.md b/docs/src/man/manipulation_functions.md
index da4fb1e63..db62e7adb 100644
--- a/docs/src/man/manipulation_functions.md
+++ b/docs/src/man/manipulation_functions.md
@@ -1,7 +1,10 @@
-# Data Frame Manipulation Functions
+# A Gentle Introduction to Data Frame Manipulation Functions
 
 The seven functions below can be used to manipulate data frames
 by applying operations to them.
+This section of the documentation aims to methodically build understanding
+of these functions and their possible arguments
+by reinforcing foundational concepts and slowly increasing complexity.
 
 The functions without a `!` in their name
 will create a new data frame based on the source data frame,
@@ -68,11 +71,11 @@ which is a type to link one object to another.
 [Dictionary](https://docs.julialang.org/en/v1/base/collections/#Dictionaries).)
 In DataFrames.jl manipulation functions,
 `Pair` arguments are used to define column `operations` to be performed.
-The provided examples will be explained in more detail below.
+The examples shown above will be explained in more detail later.
 
-The manipulation functions also have methods for applying multiple operations.
+*The manipulation functions also have methods for applying multiple operations.
 See the later sections [Multiple Operations per Manipulation](@ref)
-and [Broadcasting Operation Pairs](@ref) for more information.
+and [Broadcasting Operation Pairs](@ref) for more information.*
 
 ### `source_column_selector`
 Inside an `operation`, `source_column_selector` is usually a column name
@@ -494,6 +497,8 @@ This automatic column naming behavior can be avoided in two ways.
 First, the operation result can be placed back into the original column
 with the original column name by switching the keyword argument `renamecols`
 from its default value (`true`) to `renamecols=false`.
+This option prevents the function name from being appended to the column name
+as it usually would be.
 
 ```julia
 julia> df = DataFrame(a=1:4, b=5:8)
@@ -616,9 +621,90 @@ julia> rename(df, :a => :apple) # renames column `a` to `apple` in-place
    4 │     4      8
 ```
 
-Additionally, in the
-`source_column_selector => operation_function => new_column_names` operation form,
-`new_column_names` may be a renaming function which operates on a string
+If `new_column_names` already exist in the source data frame,
+those columns will be replaced in the existing column location
+rather than being added to the end.
+This can be done by manually specifying an existing column name
+or by using the `renamecols=false` keyword argument.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, :b => (x -> x .+ 10))  # automatic new column and column name
+4×3 DataFrame
+ Row │ a      b      b_function
+     │ Int64  Int64  Int64
+─────┼──────────────────────────
+   1 │     1      5          15
+   2 │     2      6          16
+   3 │     3      7          17
+   4 │     4      8          18
+
+julia> transform(df, :b => (x -> x .+ 10), renamecols=false)  # transform column in-place
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1     15
+   2 │     2     16
+   3 │     3     17
+   4 │     4     18
+
+julia> transform(df, :b => (x -> x .+ 10) => :a)  # replace column :a
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │    15      5
+   2 │    16      6
+   3 │    17      7
+   4 │    18      8
+```
+
+Actually, `renamecols=false` just prevents the function name from being appended to the final column name such that the operation is *usually* returned to the same column.
+
+```julia
+julia> transform(df, [:a, :b] => +)  # new column name is all source columns and function name
+4×3 DataFrame
+ Row │ a      b      a_b_+
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
+
+julia> transform(df, [:a, :b] => +, renamecols=false)  # same as above but with no function name
+4×3 DataFrame
+ Row │ a      b      a_b
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
+
+julia> transform(df, [:a, :b] => (+) => :a)  # manually overwrite column :a (see Note below about parentheses)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     6      5
+   2 │     8      6
+   3 │    10      7
+   4 │    12      8
+```
+
+In the `source_column_selector => operation_function => new_column_names` operation form,
+`new_column_names` may also be a renaming function which operates on a string
 to create the destination column names programmatically.
 
 ```julia

From 6377441dec4ea67082fe259d10025a540ec8395b Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Mon, 2 Oct 2023 16:30:25 -0400
Subject: [PATCH 19/29] Fix reference?

---
 docs/src/man/manipulation_functions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/src/man/manipulation_functions.md b/docs/src/man/manipulation_functions.md
index db62e7adb..cabda4fd7 100644
--- a/docs/src/man/manipulation_functions.md
+++ b/docs/src/man/manipulation_functions.md
@@ -74,7 +74,7 @@ In DataFrames.jl manipulation functions,
 The examples shown above will be explained in more detail later.
 
 *The manipulation functions also have methods for applying multiple operations.
-See the later sections [Multiple Operations per Manipulation](@ref)
+See the later sections [Applying Multiple Operations per Manipulation](@ref)
 and [Broadcasting Operation Pairs](@ref) for more information.*
 
 ### `source_column_selector`

From 6e7ed849fdd90287e84cf5a38366b0a3469a16a0 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Mon, 2 Oct 2023 16:51:16 -0400
Subject: [PATCH 20/29] maybe fix documenter?

---
 docs/src/man/basics.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 7f77d555b..0e9874301 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -1589,7 +1589,6 @@ as the source data frame, but with only the rows where the passed operations are
 !!! Note Other Resources
     * For formal, comprehensive explanations of all manipulation functions,
     see the [Functions](@ref) API.
-
     * For an informal, long-form tutorial on these functions,
     see [A Gentle Introduction to Data Frame Manipulation Functions](@ref).
 

From d2d3de85b166a26b220ec8410e76a84d141ba4aa Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Thu, 5 Oct 2023 12:29:32 -0400
Subject: [PATCH 21/29] make h function require broadcasting

---
 docs/src/man/manipulation_functions.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/src/man/manipulation_functions.md b/docs/src/man/manipulation_functions.md
index cabda4fd7..72df94476 100644
--- a/docs/src/man/manipulation_functions.md
+++ b/docs/src/man/manipulation_functions.md
@@ -336,7 +336,7 @@ julia> transform(df, :a => g)
    2 │     2      5      3
    3 │     3      4      4
 
-julia> h(x, y) = 2x .+ y
+julia> h(x, y) = x .+ y .+ 1
 h (generic function with 1 method)
 
 julia> transform(df, [:a, :b] => h)
@@ -345,8 +345,8 @@ julia> transform(df, [:a, :b] => h)
      │ Int64  Int64  Int64
 ─────┼─────────────────────
    1 │     1      4      6
-   2 │     2      5      9
-   3 │     3      4     10
+   2 │     2      5      8
+   3 │     3      4      8
 ```
 
 [Anonymous functions](https://docs.julialang.org/en/v1/manual/functions/#man-anonymous-functions)

From 0bdfc44fa0dc4abb3ba05e0bea46507e0a017547 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Thu, 12 Oct 2023 16:46:10 -0400
Subject: [PATCH 22/29] Fix existing typos in basics.md

---
 docs/src/man/basics.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 0e9874301..4e8ba02f7 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -1075,7 +1075,7 @@ true
 
 If in indexing you select a subset of rows from a data frame the mutation is
 performed in place, i.e. writing to an existing vector.
-Below setting values of column `:Job` in rows `1:3` to values `[2, 4, 6]`:
+Below setting values of column `:Job` in rows `1:3` to values `[2, 3, 2]`:
 
 ```jldoctest dataframe
 julia> df1[1:3, :Job] = [2, 3, 2]
@@ -1181,7 +1181,7 @@ DataFrameRow
    2 │    98  male        2
 ```
 
-This operations updated the data stored in the `df1` data frame.
+These operations updated the data stored in the `df1` data frame.
 
 In a similar fashion views can be used to update data stored in their parent
 data frame. Here are some examples:

From 72d87d26d9391026baf8baaeb601fd857bf62fb9 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Fri, 13 Oct 2023 17:04:13 -0400
Subject: [PATCH 23/29] Move back to basics.md and add comparison

---
 docs/make.jl                           |    1 -
 docs/src/index.md                      |    7 -
 docs/src/man/basics.md                 | 2108 ++++++++++++++++++------
 docs/src/man/manipulation_functions.md | 1431 ----------------
 4 files changed, 1568 insertions(+), 1979 deletions(-)
 delete mode 100644 docs/src/man/manipulation_functions.md

diff --git a/docs/make.jl b/docs/make.jl
index d854981e2..fa64782da 100644
--- a/docs/make.jl
+++ b/docs/make.jl
@@ -34,7 +34,6 @@ makedocs(
             "Data manipulation frameworks" => "man/querying_frameworks.md",
             "Comparison with Python/R/Stata" => "man/comparisons.md"
         ],
-        "A Gentle Introduction to Data Frame Manipulation Functions" => "man/manipulation_functions.md",
         "API" => Any[
             "Types" => "lib/types.md",
             "Functions" => "lib/functions.md",
diff --git a/docs/src/index.md b/docs/src/index.md
index e259fd7f1..66ed6f3e5 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -229,13 +229,6 @@ Pages = ["man/basics.md",
 Depth = 2
 ```
 
-## A Gentle Introduction to Data Frame Manipulation Functions
-
-```@contents
-Pages = ["man/manipulation_functions.md"]
-Depth = 1
-```
-
 ## API
 
 Only exported (i.e. available for use without `DataFrames.` qualifier after
diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 4e8ba02f7..55937b849 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -1565,599 +1565,1627 @@ julia> german[Not(5), r"S"]
                 984 rows omitted
 ```
 
-## Basic Usage of Manipulation Functions
-
-In DataFrames.jl there are seven functions
-which can be used to perform operations on data frame columns:
-
-- `combine`: creates a new data frame populated with columns that result from
-  operations applied to the source data frame columns, potentially combining
-  its rows;
-- `select`: creates a new data frame that has the same number of rows as the
-  source data frame populated with columns that result from operations
-  applied to the source data frame columns;
-- `select!`: the same as `select` but updates the passed data frame in place;
-- `transform`: the same as `select` but keeps the columns that were already
-  present in the data frame (note though that these columns can be potentially
-  modified by the transformation passed to `transform`);
-- `transform!`: the same as `transform` but updates the passed data frame in
-  place.
-- `subset`: creates a new data frame populated with the same columns
-as the source data frame, but with only the rows where the passed operations are true;
-- `subset!`: the same as `subset` but updates the passed data frame in place;
-
-!!! Note Other Resources
-    * For formal, comprehensive explanations of all manipulation functions,
-    see the [Functions](@ref) API.
-    * For an informal, long-form tutorial on these functions,
-    see [A Gentle Introduction to Data Frame Manipulation Functions](@ref).
-
-Let us now move straight to examples using the German dataset.
+## Manipulation Functions
 
-```jldoctest dataframe
-julia> using Statistics
+The seven functions below can be used to manipulate data frames
+by applying operations to them.
+
+The functions without a `!` in their name
+will create a new data frame based on the source data frame,
+so you will probably want to assign the new data frame to a new variable,
+e.g. `new_df = transform(source_df, operation)`.
+The functions with a `!` at the end of their name
+will modify an existing data frame in-place,
+so there is typically no need to assign the result to a variable,
+e.g. `transform!(source_df, operation)` instead of
+`source_df = transform(source_df, operation)`.
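+
+As a small illustrative sketch of this difference
+(the data frame and the `double_a` column name below are invented for this example):
+
+```julia
+using DataFrames
+
+source_df = DataFrame(a = [1, 2, 3])
+
+# `transform` returns a new data frame and leaves `source_df` untouched
+new_df = transform(source_df, :a => (x -> 2 .* x) => :double_a)
+names(source_df)  # still only column "a"
+names(new_df)     # columns "a" and "double_a"
+
+# `transform!` adds the column to `source_df` itself
+transform!(source_df, :a => (x -> 2 .* x) => :double_a)
+names(source_df)  # now columns "a" and "double_a"
+```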
+
+The number of columns and rows in the resultant data frame varies
+depending on the manipulation function employed.
+
+| Function     | Memory Usage                     | Column Retention                        | Row Retention                                       |
+| ------------ | -------------------------------- | --------------------------------------- | --------------------------------------------------- |
+| `transform`  | Creates a new data frame.        | Retains original and resultant columns. | Retains same number of rows as original data frame. |
+| `transform!` | Modifies an existing data frame. | Retains original and resultant columns. | Retains same number of rows as original data frame. |
+| `select`     | Creates a new data frame.        | Retains only resultant columns.         | Retains same number of rows as original data frame. |
+| `select!`    | Modifies an existing data frame. | Retains only resultant columns.         | Retains same number of rows as original data frame. |
+| `subset`     | Creates a new data frame.        | Retains original columns.               | Retains only rows where condition is true.          |
+| `subset!`    | Modifies an existing data frame. | Retains original columns.               | Retains only rows where condition is true.          |
+| `combine`    | Creates a new data frame.        | Retains only resultant columns.         | Retains only resultant rows.                        |
+
+### Constructing Operations
+
+All of the functions above use the same syntax, which is commonly
+`manipulation_function(dataframe, operation)`.
+The `operation` argument defines the
+operation to be applied to the source `dataframe`,
+and it can take any of the common forms explained below:
+
+`source_column_selector`
+: selects source column(s) without manipulating or renaming them
+
+   Examples: `:a`, `[:a, :b]`, `All()`, `Not(:a)`
+
+`source_column_selector => operation_function`
+: passes source column(s) as arguments to a function
+and automatically names the resulting column(s)
+
+   Examples: `:a => sum`, `[:a, :b] => +`, `:a => ByRow(==(3))`
+
+`source_column_selector => operation_function => new_column_names`
+: passes source column(s) as arguments to a function
+and names the resulting column(s) `new_column_names`
+
+   Examples: `:a => sum => :sum_of_a`, `[:a, :b] => (+) => :a_plus_b`
+
+   *(Not available for `subset` or `subset!`)*
+
+`source_column_selector => new_column_names`
+: renames a source column,
+or splits a column containing collection elements into multiple new columns
+
+   Examples: `:a => :new_a`, `:a_b => [:a, :b]`, `:nt => AsTable`
+
+   *(Not available for `subset` or `subset!`)*
+
+The `=>` operator constructs a
+[Pair](https://docs.julialang.org/en/v1/base/collections/#Core.Pair),
+which is a type to link one object to another.
+(Pairs are commonly used to create elements of a
+[Dictionary](https://docs.julialang.org/en/v1/base/collections/#Dictionaries).)
+In DataFrames.jl manipulation functions,
+`Pair` arguments are used to define column `operations` to be performed.
+The examples shown above will be explained in more detail later.
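+
+As a rough sketch of how these pairs nest
+(the variable name `op` is only for illustration),
+`=>` is right-associative, so a three-part operation is a `Pair`
+whose second element is itself a `Pair`:
+
+```julia
+op = :a => sum => :sum_of_a   # same as :a => (sum => :sum_of_a)
+
+op isa Pair                   # true
+first(op)                     # :a
+last(op)                      # sum => :sum_of_a
+first(last(op))               # the `sum` function
+```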
+
+*The manipulation functions also have methods for applying multiple operations.
+See the later sections [Applying Multiple Operations per Manipulation](@ref)
+and [Broadcasting Operation Pairs](@ref) for more information.*
+
+#### `source_column_selector`
+Inside an `operation`, `source_column_selector` is usually a column name
+or column index which identifies a data frame column.
+
+`source_column_selector` may be used as the entire `operation`
+with `select` or `select!` to isolate or reorder columns.
+
+```julia
+julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6], c = [7, 8, 9])
+3×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      7
+   2 │     2      5      8
+   3 │     3      6      9
+
+julia> select(df, :b)
+3×1 DataFrame
+ Row │ b
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     5
+   3 │     6
+
+julia> select(df, "b")
+3×1 DataFrame
+ Row │ b
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     5
+   3 │     6
+
+julia> select(df, 2)
+3×1 DataFrame
+ Row │ b
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     5
+   3 │     6
+```
+
+`source_column_selector` may also be used as the entire `operation`
+with `subset` or `subset!` if the source column contains `Bool` values.
+
+```julia
+julia> df = DataFrame(
+           name = ["Scott", "Jill", "Erica", "Jimmy"],
+           minor = [false, true, false, true],
+       )
+4×2 DataFrame
+ Row │ name    minor
+     │ String  Bool
+─────┼───────────────
+   1 │ Scott   false
+   2 │ Jill     true
+   3 │ Erica   false
+   4 │ Jimmy    true
+
+julia> subset(df, :minor)
+2×2 DataFrame
+ Row │ name    minor
+     │ String  Bool
+─────┼───────────────
+   1 │ Jill     true
+   2 │ Jimmy    true
+```
+
+`source_column_selector` may instead be a collection of columns such as a vector,
+a [regular expression](https://docs.julialang.org/en/v1/manual/strings/#Regular-Expressions),
+a `Not`, `Between`, `All`, or `Cols` expression,
+or a `:`.
+See the [Indexing](@ref) API for the full list of possible values with references.
+
+!!! Note
+      The Julia parser sometimes prevents `:` from being used by itself.
+      If you get
+      `ERROR: syntax: whitespace not allowed after ":" used for quoting`,
+      try using `All()`, `Cols(:)`, or `(:)` instead to select all columns.
 
-julia> combine(german, :Age => mean => :mean_age)
+```julia
+julia> df = DataFrame(
+           id = [1, 2, 3],
+           first_name = ["José", "Emma", "Nathan"],
+           last_name = ["Garcia", "Marino", "Boyer"],
+           age = [61, 24, 33]
+       )
+3×4 DataFrame
+ Row │ id     first_name  last_name  age
+     │ Int64  String      String     Int64
+─────┼─────────────────────────────────────
+   1 │     1  José        Garcia        61
+   2 │     2  Emma        Marino        24
+   3 │     3  Nathan      Boyer         33
+
+julia> select(df, [:last_name, :first_name])
+3×2 DataFrame
+ Row │ last_name  first_name
+     │ String     String
+─────┼───────────────────────
+   1 │ Garcia     José
+   2 │ Marino     Emma
+   3 │ Boyer      Nathan
+
+julia> select(df, r"name")
+3×2 DataFrame
+ Row │ first_name  last_name
+     │ String      String
+─────┼───────────────────────
+   1 │ José        Garcia
+   2 │ Emma        Marino
+   3 │ Nathan      Boyer
+
+julia> select(df, Not(:id))
+3×3 DataFrame
+ Row │ first_name  last_name  age
+     │ String      String     Int64
+─────┼──────────────────────────────
+   1 │ José        Garcia        61
+   2 │ Emma        Marino        24
+   3 │ Nathan      Boyer         33
+
+julia> select(df, Between(2,4))
+3×3 DataFrame
+ Row │ first_name  last_name  age
+     │ String      String     Int64
+─────┼──────────────────────────────
+   1 │ José        Garcia        61
+   2 │ Emma        Marino        24
+   3 │ Nathan      Boyer         33
+
+julia> df2 = DataFrame(
+           name = ["Scott", "Jill", "Erica", "Jimmy"],
+           minor = [false, true, false, true],
+           male = [true, false, false, true],
+       )
+4×3 DataFrame
+ Row │ name    minor  male
+     │ String  Bool   Bool
+─────┼──────────────────────
+   1 │ Scott   false   true
+   2 │ Jill     true  false
+   3 │ Erica   false  false
+   4 │ Jimmy    true   true
+
+julia> subset(df2, [:minor, :male])
+1×3 DataFrame
+ Row │ name    minor  male
+     │ String  Bool   Bool
+─────┼─────────────────────
+   1 │ Jimmy    true  true
+```
+
+!!! Note
+      Using `Symbol` in `source_column_selector` will perform slightly faster than using `String`.
+      However, `String` is convenient when column names contain spaces.
+
+      All elements of `source_column_selector` must be the same type
+      (unless wrapped in `Cols`),
+      e.g. `subset(df2, [:minor, "male"])` will error
+      since `Symbol` and `String` are used simultaneously.
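+
+As a brief sketch of the `Cols` exception mentioned above,
+reusing `df2` from the previous example
+(this is just one way to mix selector types):
+
+```julia
+using DataFrames
+
+df2 = DataFrame(
+    name = ["Scott", "Jill", "Erica", "Jimmy"],
+    minor = [false, true, false, true],
+    male = [true, false, false, true],
+)
+
+# `Cols` accepts selectors of different types, here a `Symbol` and a `String`
+select(df2, Cols(:minor, "male"))
+```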
+
+#### `operation_function`
+Inside an `operation` pair, `operation_function` is a function
+which operates on data frame columns passed as vectors.
+When multiple columns are selected by `source_column_selector`,
+the `operation_function` will receive the columns as separate positional arguments
+in the order they were selected, e.g. `f(column1, column2, column3)`.
+
+```julia
+julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 4])
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      4
+
+julia> combine(df, :a => sum)
 1×1 DataFrame
- Row │ mean_age
+ Row │ a_sum
+     │ Int64
+─────┼───────
+   1 │     6
+
+julia> transform(df, :b => maximum) # `transform` and `select` copy scalar result to all rows
+3×3 DataFrame
+ Row │ a      b      b_maximum
+     │ Int64  Int64  Int64
+─────┼─────────────────────────
+   1 │     1      4          5
+   2 │     2      5          5
+   3 │     3      4          5
+
+julia> transform(df, [:b, :a] => -) # vector subtraction is okay
+3×3 DataFrame
+ Row │ a      b      b_a_-
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      3
+   2 │     2      5      3
+   3 │     3      4      1
+
+julia> transform(df, [:a, :b] => *) # vector multiplication is not defined
+ERROR: MethodError: no method matching *(::Vector{Int64}, ::Vector{Int64})
+```
+
+Don't worry! There is a quick fix for the previous error.
+If you want to apply a function to each element in a column
+instead of to the entire column vector,
+then you can wrap your element-wise function in `ByRow` like
+`ByRow(my_elementwise_function)`.
+This will apply `my_elementwise_function` to every element in the column
+and then collect the results back into a vector.
+
+```julia
+julia> transform(df, [:a, :b] => ByRow(*))
+3×3 DataFrame
+ Row │ a      b      a_b_*
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      4
+   2 │     2      5     10
+   3 │     3      4     12
+
+julia> transform(df, Cols(:) => ByRow(max))
+3×3 DataFrame
+ Row │ a      b      a_b_max
+     │ Int64  Int64  Int64
+─────┼───────────────────────
+   1 │     1      4        4
+   2 │     2      5        5
+   3 │     3      4        4
+
+julia> f(x) = x + 1
+f (generic function with 1 method)
+
+julia> transform(df, :a => ByRow(f))
+3×3 DataFrame
+ Row │ a      b      a_f
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      2
+   2 │     2      5      3
+   3 │     3      4      4
+```
+
+Alternatively, you may just want to define the function itself so it
+[broadcasts](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting)
+over vectors.
+
+```julia
+julia> g(x) = x .+ 1
+g (generic function with 1 method)
+
+julia> transform(df, :a => g)
+3×3 DataFrame
+ Row │ a      b      a_g
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      2
+   2 │     2      5      3
+   3 │     3      4      4
+
+julia> h(x, y) = x .+ y .+ 1
+h (generic function with 1 method)
+
+julia> transform(df, [:a, :b] => h)
+3×3 DataFrame
+ Row │ a      b      a_b_h
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      6
+   2 │     2      5      8
+   3 │     3      4      8
+```
+
+[Anonymous functions](https://docs.julialang.org/en/v1/manual/functions/#man-anonymous-functions)
+are a convenient way to define and use an `operation_function`
+all within the manipulation function call.
+
+```julia
+julia> select(df, :a => ByRow(x -> x + 1))
+3×1 DataFrame
+ Row │ a_function
+     │ Int64
+─────┼────────────
+   1 │          2
+   2 │          3
+   3 │          4
+
+julia> transform(df, [:a, :b] => ByRow((x, y) -> 2x + y))
+3×3 DataFrame
+ Row │ a      b      a_b_function
+     │ Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      4             6
+   2 │     2      5             9
+   3 │     3      4            10
+
+julia> subset(df, :b => ByRow(x -> x < 5))
+2×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     3      4
+
+julia> subset(df, :b => ByRow(<(5))) # shorter version of the previous
+2×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     3      4
+```
+
+!!! note
+    `operation_functions` within `subset` or `subset!` function calls
+    must return a Boolean vector.
+    `true` elements in the Boolean vector will determine
+    which rows are retained in the resulting data frame.
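+
+As a minimal sketch of this rule (assuming the same two-column data as above,
+rebuilt here under the hypothetical name `df_demo`),
+the Boolean vector can also come from a function that broadcasts over the
+whole column instead of a `ByRow` wrapper.
+The helper name `keep` is likewise only illustrative.
+
+```julia
+julia> using DataFrames
+
+julia> df_demo = DataFrame(a = 1:3, b = [4, 5, 4]);
+
+julia> keep = x -> x .< 5;
+
+julia> subset(df_demo, :b => keep) == subset(df_demo, :b => ByRow(<(5)))
+true
+
+julia> subset(df_demo, :b => keep).a # rows 1 and 3 are retained
+2-element Vector{Int64}:
+ 1
+ 3
+```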
+
+As demonstrated above, `DataFrame` columns are usually passed
+from `source_column_selector` to `operation_function` as one or more
+vector arguments.
+However, when `AsTable(source_column_selector)` is used,
+the selected columns are collected and passed as a single `NamedTuple`
+to `operation_function`.
+
+This is often useful when your `operation_function` is defined to operate
+on a single collection argument rather than on multiple positional arguments.
+The distinction is somewhat similar to the difference between the built-in
+`min` and `minimum` functions.
+`min` is defined to find the minimum value among multiple positional arguments,
+while `minimum` is defined to find the minimum value
+among the elements of a single collection argument.
+
+```julia
+julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 2:-1:1)
+2×4 DataFrame
+ Row │ a      b      c      d
+     │ Int64  Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      3      5      2
+   2 │     2      4      6      1
+
+julia> select(df, Cols(:) => ByRow(min)) # min operates on multiple arguments
+2×1 DataFrame
+ Row │ a_b_etc_min
+     │ Int64
+─────┼─────────────
+   1 │           1
+   2 │           1
+
+julia> select(df, AsTable(:) => ByRow(minimum)) # minimum operates on a collection
+2×1 DataFrame
+ Row │ a_b_etc_minimum
+     │ Int64
+─────┼─────────────────
+   1 │               1
+   2 │               1
+
+julia> select(df, [:a,:b] => ByRow(+)) # `+` operates on multiple arguments
+2×1 DataFrame
+ Row │ a_b_+
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     6
+
+julia> select(df, AsTable([:a,:b]) => ByRow(sum)) # `sum` operates on a collection
+2×1 DataFrame
+ Row │ a_b_sum
+     │ Int64
+─────┼─────────
+   1 │       4
+   2 │       6
+
+julia> using Statistics # contains the `mean` function
+
+julia> select(df, AsTable(Between(:b, :d)) => ByRow(mean)) # `mean` operates on a collection
+2×1 DataFrame
+ Row │ b_c_d_mean
      │ Float64
-─────┼──────────
-   1 │   35.546
+─────┼────────────
+   1 │    3.33333
+   2 │    3.66667
+```
 
-julia> select(german, :Age => mean => :mean_age)
-1000×1 DataFrame
-  Row │ mean_age
-      │ Float64
-──────┼──────────
-    1 │   35.546
-    2 │   35.546
-    3 │   35.546
-    4 │   35.546
-    5 │   35.546
-    6 │   35.546
-    7 │   35.546
-    8 │   35.546
-  ⋮   │    ⋮
-  994 │   35.546
-  995 │   35.546
-  996 │   35.546
-  997 │   35.546
-  998 │   35.546
-  999 │   35.546
- 1000 │   35.546
- 985 rows omitted
-```
-
-As you can see in both cases the `mean` function was applied to `:Age` column
-and the result was stored in the `:mean_age` column. The difference between
-the `combine` and `select` functions is that the `combine` aggregates data
-and produces as many rows as were returned by the transformation function.
-On the other hand the `select` function always keeps the number of rows in a
-data frame to be the same as in the source data frame. Therefore in this case
-the result of the `mean` function got broadcasted.
-
-As `combine` potentially allows any number of rows to be produced as a result
-of the transformation if we have a combination of transformations where some of
-them produce a vector, and other produce scalars then scalars get broadcasted
-exactly like in  `select`. Here is an example:
+`AsTable` can also be used to pass columns to a function which operates
+on fields of a `NamedTuple`.
 
-```jldoctest dataframe
-julia> combine(german, :Age => mean => :mean_age, :Housing => unique => :housing)
-3×2 DataFrame
- Row │ mean_age  housing
-     │ Float64   String7
-─────┼───────────────────
-   1 │   35.546  own
-   2 │   35.546  free
-   3 │   35.546  rent
+```julia
+julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)
+2×4 DataFrame
+ Row │ a      b      c      d
+     │ Int64  Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      3      5      7
+   2 │     2      4      6      8
+
+julia> f(nt) = nt.a + nt.d
+f (generic function with 1 method)
+
+julia> transform(df, AsTable(:) => ByRow(f))
+2×5 DataFrame
+ Row │ a      b      c      d      a_b_etc_f
+     │ Int64  Int64  Int64  Int64  Int64
+─────┼───────────────────────────────────────
+   1 │     1      3      5      7          8
+   2 │     2      4      6      8         10
 ```
 
-Note, however, that it is not allowed to return vectors of different lengths in
-different transformations:
+As demonstrated above,
+in the `source_column_selector => operation_function` operation pair form,
+the results of an operation will be placed into a new column with an
+automatically-generated name based on the operation;
+the new column name will be the `operation_function` name
+appended to the source column name(s) with an underscore.
 
-```jldoctest dataframe
-julia> combine(german, :Age, :Housing => unique => :Housing)
-ERROR: ArgumentError: New columns must have the same length as old columns
+This automatic column naming behavior can be avoided in two ways.
+First, the operation result can be placed back into the original column
+under the original column name by setting the keyword argument
+`renamecols=false` (the default is `true`).
+This option prevents the function name from being appended to the column name.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, :a => ByRow(x->x+10), renamecols=false) # add 10, keeping the column name `a`
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │    11      5
+   2 │    12      6
+   3 │    13      7
+   4 │    14      8
 ```
 
-Let us discuss some other examples using `select`. Often we want to apply some
-function not to the whole column of a data frame, but rather to its individual
-elements. Normally we can achieve this using broadcasting like this:
+The second method to avoid the default manipulation column naming is to
+specify your own `new_column_names`.
 
-```jldoctest dataframe
-julia> select(german, :Sex => (x -> uppercase.(x)) => :Sex)
-1000×1 DataFrame
-  Row │ Sex
-      │ String
-──────┼────────
-    1 │ MALE
-    2 │ FEMALE
-    3 │ MALE
-    4 │ MALE
-    5 │ MALE
-    6 │ MALE
-    7 │ MALE
-    8 │ MALE
-  ⋮   │   ⋮
-  994 │ MALE
-  995 │ MALE
-  996 │ FEMALE
-  997 │ MALE
-  998 │ MALE
-  999 │ MALE
- 1000 │ MALE
-985 rows omitted
+#### `new_column_names`
+
+`new_column_names` can be included at the end of an `operation` pair to specify
+the name of the new column(s).
+`new_column_names` may be a symbol, string, function, vector of symbols, vector of strings, or `AsTable`.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, Cols(:) => ByRow(+) => :c)
+4×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
+
+julia> transform(df, Cols(:) => ByRow(+) => "a+b")
+4×3 DataFrame
+ Row │ a      b      a+b
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
+
+julia> transform(df, :a => ByRow(x->x+10) => "a+10")
+4×3 DataFrame
+ Row │ a      b      a+10
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5     11
+   2 │     2      6     12
+   3 │     3      7     13
+   4 │     4      8     14
 ```
 
-This pattern is encountered very often in practice, therefore there is a `ByRow`
-convenience wrapper for a function that creates its broadcasted variant. In
-these examples `ByRow` is a special type used for selection operations to signal
-that the wrapped function should be applied to each element (row) of the
-selection. Here we are passing `ByRow` wrapper to target column name `:Sex`
-using `uppercase` function:
+The `source_column_selector => new_column_names` operation form
+can be used to rename columns without an intermediate function.
+However, there are `rename` and `rename!` functions,
+which accept similar syntax,
+that tend to be more useful for this operation.
 
-```jldoctest dataframe
-julia> select(german, :Sex => ByRow(uppercase) => :SEX)
-1000×1 DataFrame
-  Row │ SEX
-      │ String
-──────┼────────
-    1 │ MALE
-    2 │ FEMALE
-    3 │ MALE
-    4 │ MALE
-    5 │ MALE
-    6 │ MALE
-    7 │ MALE
-    8 │ MALE
-  ⋮   │   ⋮
-  994 │ MALE
-  995 │ MALE
-  996 │ FEMALE
-  997 │ MALE
-  998 │ MALE
-  999 │ MALE
- 1000 │ MALE
-985 rows omitted
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, :a => :apple) # adds column `apple`
+4×3 DataFrame
+ Row │ a      b      apple
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      1
+   2 │     2      6      2
+   3 │     3      7      3
+   4 │     4      8      4
+
+julia> select(df, :a => :apple) # retains only column `apple`
+4×1 DataFrame
+ Row │ apple
+     │ Int64
+─────┼───────
+   1 │     1
+   2 │     2
+   3 │     3
+   4 │     4
+
+julia> rename(df, :a => :apple) # renames column `a` to `apple` without moving it
+4×2 DataFrame
+ Row │ apple  b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
 ```
 
-In this case we transform our source column `:Age` using `ByRow` wrapper and
-automatically generate the target column name:
+If `new_column_names` already exist in the source data frame,
+those columns will be replaced in the existing column location
+rather than being added to the end.
+This can be done by manually specifying an existing column name
+or by using the `renamecols=false` keyword argument.
 
-```jldoctest dataframe
-julia> select(german, :Age, :Age => ByRow(sqrt))
-1000×2 DataFrame
-  Row │ Age    Age_sqrt
-      │ Int64  Float64
-──────┼─────────────────
-    1 │    67   8.18535
-    2 │    22   4.69042
-    3 │    49   7.0
-    4 │    45   6.7082
-    5 │    53   7.28011
-    6 │    35   5.91608
-    7 │    53   7.28011
-    8 │    35   5.91608
-  ⋮   │   ⋮       ⋮
-  994 │    30   5.47723
-  995 │    50   7.07107
-  996 │    31   5.56776
-  997 │    40   6.32456
-  998 │    38   6.16441
-  999 │    23   4.79583
- 1000 │    27   5.19615
-        985 rows omitted
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, :b => (x -> x .+ 10))  # automatic new column and column name
+4×3 DataFrame
+ Row │ a      b      b_function
+     │ Int64  Int64  Int64
+─────┼──────────────────────────
+   1 │     1      5          15
+   2 │     2      6          16
+   3 │     3      7          17
+   4 │     4      8          18
+
+julia> transform(df, :b => (x -> x .+ 10), renamecols=false)  # result replaces column :b
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1     15
+   2 │     2     16
+   3 │     3     17
+   4 │     4     18
+
+julia> transform(df, :b => (x -> x .+ 10) => :a)  # replace column :a
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │    15      5
+   2 │    16      6
+   3 │    17      7
+   4 │    18      8
 ```
 
-When we pass just a column (without the `=>` part) we can use any column selector
-that is allowed in indexing.
+More precisely, `renamecols=false` only prevents the function name
+from being appended to the new column name,
+so the result is *usually* (but not always) written back to the source column.
 
-Here we exclude the column `:Age` from the resulting data frame:
+```julia
+julia> transform(df, [:a, :b] => +)  # new column name is all source columns and function name
+4×3 DataFrame
+ Row │ a      b      a_b_+
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
+
+julia> transform(df, [:a, :b] => +, renamecols=false)  # same as above but with no function name
+4×3 DataFrame
+ Row │ a      b      a_b
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
 
-```jldoctest dataframe
-julia> select(german, Not(:Age))
-1000×9 DataFrame
-  Row │ id     Sex      Job    Housing  Saving accounts  Checking account  Cre ⋯
-      │ Int64  String7  Int64  String7  String15         String15          Int ⋯
-──────┼─────────────────────────────────────────────────────────────────────────
-    1 │     0  male         2  own      NA               little                ⋯
-    2 │     1  female       2  own      little           moderate
-    3 │     2  male         1  own      little           NA
-    4 │     3  male         2  free     little           little
-    5 │     4  male         2  free     little           little                ⋯
-    6 │     5  male         1  free     NA               NA
-    7 │     6  male         2  own      quite rich       NA
-    8 │     7  male         3  rent     little           moderate
-  ⋮   │   ⋮       ⋮       ⋮       ⋮            ⋮                ⋮              ⋱
-  994 │   993  male         3  own      little           little                ⋯
-  995 │   994  male         2  own      NA               NA
-  996 │   995  female       1  own      little           NA
-  997 │   996  male         3  own      little           little
-  998 │   997  male         2  own      little           NA                    ⋯
-  999 │   998  male         2  free     little           little
- 1000 │   999  male         2  own      moderate         moderate
-                                                  3 columns and 985 rows omitted
+julia> transform(df, [:a, :b] => (+) => :a)  # manually overwrite column :a (see Note below about parentheses)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     6      5
+   2 │     8      6
+   3 │    10      7
+   4 │    12      8
 ```
 
-In the next example we drop columns `"Age"`, `"Saving accounts"`,
-`"Checking account"`, `"Credit amount"`, and `"Purpose"`. Note that this time
-we use string column selectors because some of the column names have spaces
-in them:
+In the `source_column_selector => operation_function => new_column_names` operation form,
+`new_column_names` may also be a renaming function which operates on a string
+to create the destination column names programmatically.
 
-```jldoctest dataframe
-julia> select(german, Not(["Age", "Saving accounts", "Checking account",
-                           "Credit amount", "Purpose"]))
-1000×5 DataFrame
-  Row │ id     Sex      Job    Housing  Duration
-      │ Int64  String7  Int64  String7  Int64
-──────┼──────────────────────────────────────────
-    1 │     0  male         2  own             6
-    2 │     1  female       2  own            48
-    3 │     2  male         1  own            12
-    4 │     3  male         2  free           42
-    5 │     4  male         2  free           24
-    6 │     5  male         1  free           36
-    7 │     6  male         2  own            24
-    8 │     7  male         3  rent           36
-  ⋮   │   ⋮       ⋮       ⋮       ⋮        ⋮
-  994 │   993  male         3  own            36
-  995 │   994  male         2  own            12
-  996 │   995  female       1  own            12
-  997 │   996  male         3  own            30
-  998 │   997  male         2  own            12
-  999 │   998  male         2  free           45
- 1000 │   999  male         2  own            45
-                                 985 rows omitted
-
-```
-
-As another example let us present that the `r"S"` regular expression we used
-above also works with `select`:
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
 
-```jldoctest dataframe
-julia> select(german, r"S")
-1000×2 DataFrame
-  Row │ Sex      Saving accounts
-      │ String7  String15
-──────┼──────────────────────────
-    1 │ male     NA
-    2 │ female   little
-    3 │ male     little
-    4 │ male     little
-    5 │ male     little
-    6 │ male     NA
-    7 │ male     quite rich
-    8 │ male     little
-  ⋮   │    ⋮            ⋮
-  994 │ male     little
-  995 │ male     NA
-  996 │ female   little
-  997 │ male     little
-  998 │ male     little
-  999 │ male     little
- 1000 │ male     moderate
-                 985 rows omitted
-```
-
-The benefit of `select` or `combine` over indexing is that it is easier
-to get the union of several column selectors, e.g.:
+julia> add_prefix(s) = "new_" * s
+add_prefix (generic function with 1 method)
 
-```jldoctest dataframe
-julia> select(german, r"S", "Job", 1)
-1000×4 DataFrame
-  Row │ Sex      Saving accounts  Job    id
-      │ String7  String15         Int64  Int64
-──────┼────────────────────────────────────────
-    1 │ male     NA                   2      0
-    2 │ female   little               2      1
-    3 │ male     little               1      2
-    4 │ male     little               2      3
-    5 │ male     little               2      4
-    6 │ male     NA                   1      5
-    7 │ male     quite rich           2      6
-    8 │ male     little               3      7
-  ⋮   │    ⋮            ⋮           ⋮      ⋮
-  994 │ male     little               3    993
-  995 │ male     NA                   2    994
-  996 │ female   little               1    995
-  997 │ male     little               3    996
-  998 │ male     little               2    997
-  999 │ male     little               2    998
- 1000 │ male     moderate             2    999
-                               985 rows omitted
-```
-
-Taking advantage of this flexibility here is an idiomatic pattern to move some
-column to the front of a data frame:
+julia> transform(df, :a => (x -> 10 .* x) => add_prefix) # with named renaming function
+4×3 DataFrame
+ Row │ a      b      new_a
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5     10
+   2 │     2      6     20
+   3 │     3      7     30
+   4 │     4      8     40
+
+julia> transform(df, :a => (x -> 10 .* x) => (s -> "new_" * s)) # with anonymous renaming function
+4×3 DataFrame
+ Row │ a      b      new_a
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5     10
+   2 │     2      6     20
+   3 │     3      7     30
+   4 │     4      8     40
+```
+
+!!! note
+      It is a good idea to wrap anonymous functions in parentheses
+      to avoid the `=>` operator accidentally becoming part of the anonymous function.
+      The examples above do not work correctly without the parentheses!
+      ```julia
+      julia> transform(df, :a => x -> 10 .* x => add_prefix)  # Not what we wanted!
+      4×3 DataFrame
+       Row │ a      b      a_function
+           │ Int64  Int64  Pair…
+      ─────┼────────────────────────────────────────────
+         1 │     1      5  [10, 20, 30, 40]=>add_prefix
+         2 │     2      6  [10, 20, 30, 40]=>add_prefix
+         3 │     3      7  [10, 20, 30, 40]=>add_prefix
+         4 │     4      8  [10, 20, 30, 40]=>add_prefix
+
+      julia> transform(df, :a => x -> 10 .* x => s -> "new_" * s)  # Not what we wanted!
+      4×3 DataFrame
+       Row │ a      b      a_function
+           │ Int64  Int64  Pair…
+      ─────┼─────────────────────────────────────
+         1 │     1      5  [10, 20, 30, 40]=>#18
+         2 │     2      6  [10, 20, 30, 40]=>#18
+         3 │     3      7  [10, 20, 30, 40]=>#18
+         4 │     4      8  [10, 20, 30, 40]=>#18
+      ```
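+
+To see why the parentheses matter, the following sketch inspects the pairs
+directly, using only base Julia and no data frame.
+The names `good` and `bad` are hypothetical and exist only for illustration.
+
+```julia
+julia> good = :a => (x -> 10 .* x) => "new_a";
+
+julia> good.second isa Pair # the function and the new name stay separate
+true
+
+julia> bad = :a => x -> 10 .* x => "new_a";
+
+julia> bad.second isa Function # everything after `->` became the function body
+true
+
+julia> bad.second([1, 2]) isa Pair # so the function itself now returns a `Pair`
+true
+```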
+
+A renaming function will not work in the
+`source_column_selector => new_column_names` operation form
+because a function in the second element of the operation pair is assumed to take
+the `source_column_selector => operation_function` operation form.
+To work around this limitation, use the
+`source_column_selector => operation_function => new_column_names` operation form
+with `identity` as the `operation_function`.
 
-```jldoctest dataframe
-julia> select(german, "Sex", :)
-1000×10 DataFrame
-  Row │ Sex      id     Age    Job    Housing  Saving accounts  Checking accou ⋯
-      │ String7  Int64  Int64  Int64  String7  String15         String15       ⋯
-──────┼─────────────────────────────────────────────────────────────────────────
-    1 │ male         0     67      2  own      NA               little         ⋯
-    2 │ female       1     22      2  own      little           moderate
-    3 │ male         2     49      1  own      little           NA
-    4 │ male         3     45      2  free     little           little
-    5 │ male         4     53      2  free     little           little         ⋯
-    6 │ male         5     35      1  free     NA               NA
-    7 │ male         6     53      2  own      quite rich       NA
-    8 │ male         7     35      3  rent     little           moderate
-  ⋮   │    ⋮       ⋮      ⋮      ⋮       ⋮            ⋮                ⋮       ⋱
-  994 │ male       993     30      3  own      little           little         ⋯
-  995 │ male       994     50      2  own      NA               NA
-  996 │ female     995     31      1  own      little           NA
-  997 │ male       996     40      3  own      little           little
-  998 │ male       997     38      2  own      little           NA             ⋯
-  999 │ male       998     23      2  free     little           little
- 1000 │ male       999     27      2  own      moderate         moderate
-                                                  4 columns and 985 rows omitted
+```julia
+julia> transform(df, :a => add_prefix)
+ERROR: MethodError: no method matching *(::String, ::Vector{Int64})
+
+julia> transform(df, :a => identity => add_prefix)
+4×3 DataFrame
+ Row │ a      b      new_a
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      1
+   2 │     2      6      2
+   3 │     3      7      3
+   4 │     4      8      4
 ```
 
-Below, we are simply passing source column and target column name to rename them
-(without specifying the transformation part):
+In this case though,
+it is probably again more useful to use the `rename` or `rename!` function
+rather than one of the manipulation functions
+in order to rename the columns directly and avoid the intermediate `operation_function`.
+```julia
+julia> rename(add_prefix, df)  # rename all columns with a function
+4×2 DataFrame
+ Row │ new_a  new_b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> rename(add_prefix, df; cols=:a)  # rename some columns with a function
+4×2 DataFrame
+ Row │ new_a  b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+```
 
-```jldoctest dataframe
-julia> select(german, :Sex => :x1, :Age => :x2)
-1000×2 DataFrame
-  Row │ x1       x2
-      │ String7  Int64
-──────┼────────────────
-    1 │ male        67
-    2 │ female      22
-    3 │ male        49
-    4 │ male        45
-    5 │ male        53
-    6 │ male        35
-    7 │ male        53
-    8 │ male        35
-  ⋮   │    ⋮       ⋮
-  994 │ male        30
-  995 │ male        50
-  996 │ female      31
-  997 │ male        40
-  998 │ male        38
-  999 │ male        23
- 1000 │ male        27
-       985 rows omitted
+In the `source_column_selector => new_column_names` operation form,
+only a single source column may be selected per operation,
+so why is `new_column_names` plural?
+It is possible to split the data contained inside a single column
+into multiple new columns by supplying a vector of strings or symbols
+as `new_column_names`.
+
+```julia
+julia> df = DataFrame(data = [(1,2), (3,4)]) # vector of tuples
+2×1 DataFrame
+ Row │ data
+     │ Tuple…
+─────┼────────
+   1 │ (1, 2)
+   2 │ (3, 4)
+
+julia> transform(df, :data => [:first, :second]) # manual naming
+2×3 DataFrame
+ Row │ data    first  second
+     │ Tuple…  Int64  Int64
+─────┼───────────────────────
+   1 │ (1, 2)      1       2
+   2 │ (3, 4)      3       4
 ```
 
-It is important to note that `select` always returns a data frame, even if a
-single column selected as opposed to indexing syntax. Compare the following:
+This kind of data splitting can even be done automatically with `AsTable`.
 
-```jldoctest dataframe
-julia> select(german, :Age)
-1000×1 DataFrame
-  Row │ Age
-      │ Int64
-──────┼───────
-    1 │    67
-    2 │    22
-    3 │    49
-    4 │    45
-    5 │    53
-    6 │    35
-    7 │    53
-    8 │    35
-  ⋮   │   ⋮
-  994 │    30
-  995 │    50
-  996 │    31
-  997 │    40
-  998 │    38
-  999 │    23
- 1000 │    27
-985 rows omitted
+```julia
+julia> transform(df, :data => AsTable) # default automatic naming with tuples
+2×3 DataFrame
+ Row │ data    x1     x2
+     │ Tuple…  Int64  Int64
+─────┼──────────────────────
+   1 │ (1, 2)      1      2
+   2 │ (3, 4)      3      4
+```
 
-julia> german[:, :Age]
-1000-element Vector{Int64}:
- 67
- 22
- 49
- 45
- 53
- 35
- 53
- 35
- 61
- 28
-  ⋮
- 34
- 23
- 30
- 50
- 31
- 40
- 38
- 23
- 27
-```
-
-By default `select` copies columns of a passed source data frame. In order to
-avoid copying, pass the `copycols=false` keyword argument:
+If a data frame column contains `NamedTuple`s,
+then `AsTable` will preserve the field names.
+```julia
+julia> df = DataFrame(data = [(a=1,b=2), (a=3,b=4)]) # vector of named tuples
+2×1 DataFrame
+ Row │ data
+     │ NamedTup…
+─────┼────────────────
+   1 │ (a = 1, b = 2)
+   2 │ (a = 3, b = 4)
 
-```jldoctest dataframe
-julia> df = select(german, :Sex)
-1000×1 DataFrame
-  Row │ Sex
-      │ String7
-──────┼─────────
-    1 │ male
-    2 │ female
-    3 │ male
-    4 │ male
-    5 │ male
-    6 │ male
-    7 │ male
-    8 │ male
-  ⋮   │    ⋮
-  994 │ male
-  995 │ male
-  996 │ female
-  997 │ male
-  998 │ male
-  999 │ male
- 1000 │ male
-985 rows omitted
+julia> transform(df, :data => AsTable) # keeps names from named tuples
+2×3 DataFrame
+ Row │ data            a      b
+     │ NamedTup…       Int64  Int64
+─────┼──────────────────────────────
+   1 │ (a = 1, b = 2)      1      2
+   2 │ (a = 3, b = 4)      3      4
+```
 
-julia> df.Sex === german.Sex # copy
-false
+!!! note
+      To pack multiple columns into a single column of `NamedTuple`s
+      (the reverse of the above operation),
+      apply the `identity` function wrapped in `ByRow`, e.g.
+      `transform(df, AsTable([:a, :b]) => ByRow(identity) => :data)`.
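+
+A minimal sketch of this packing operation is shown below.
+The data frame `df_cols` and the result name `packed` are hypothetical
+and are not used elsewhere in this section.
+
+```julia
+julia> using DataFrames
+
+julia> df_cols = DataFrame(a = [1, 3], b = [2, 4]);
+
+julia> packed = select(df_cols, AsTable([:a, :b]) => ByRow(identity) => :data);
+
+julia> packed.data[1] # each row is now a `NamedTuple`
+(a = 1, b = 2)
+
+julia> select(packed, :data => AsTable) == df_cols # unpacking restores the columns
+true
+```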
 
-julia> df = select(german, :Sex, copycols=false)
-1000×1 DataFrame
-  Row │ Sex
-      │ String7
-──────┼─────────
-    1 │ male
-    2 │ female
-    3 │ male
-    4 │ male
-    5 │ male
-    6 │ male
-    7 │ male
-    8 │ male
-  ⋮   │    ⋮
-  994 │ male
-  995 │ male
-  996 │ female
-  997 │ male
-  998 │ male
-  999 │ male
- 1000 │ male
-985 rows omitted
+Renaming functions also work for multi-column transformations,
+but they must operate on a vector of strings.
+
+```julia
+julia> df = DataFrame(data = [(1,2), (3,4)])
+2×1 DataFrame
+ Row │ data
+     │ Tuple…
+─────┼────────
+   1 │ (1, 2)
+   2 │ (3, 4)
+
+julia> new_names(v) = ["primary ", "secondary "] .* v
+new_names (generic function with 1 method)
+
+julia> transform(df, :data => identity => new_names)
+2×3 DataFrame
+ Row │ data    primary data  secondary data
+     │ Tuple…  Int64         Int64
+─────┼──────────────────────────────────────
+   1 │ (1, 2)             1               2
+   2 │ (3, 4)             3               4
+```
+
+### Applying Multiple Operations per Manipulation
+All data frame manipulation functions can accept multiple `operation` pairs
+at once using any of the following methods:
+- `manipulation_function(dataframe, operation1, operation2)`   : multiple arguments
+- `manipulation_function(dataframe, [operation1, operation2])` : vector argument
+- `manipulation_function(dataframe, [operation1 operation2])`  : matrix argument
+
+Passing multiple operations is especially useful for the `select`, `select!`,
+and `combine` manipulation functions,
+since they only retain columns which are a result of the passed operations.
+
+```julia
+julia> df = DataFrame(a = 1:4, b = [50,50,60,60], c = ["hat","bat","cat","dog"])
+4×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  String
+─────┼──────────────────────
+   1 │     1     50  hat
+   2 │     2     50  bat
+   3 │     3     60  cat
+   4 │     4     60  dog
+
+julia> combine(df, :a => maximum, :b => sum, :c => join) # 3 combine operations
+1×3 DataFrame
+ Row │ a_maximum  b_sum  c_join
+     │ Int64      Int64  String
+─────┼────────────────────────────────
+   1 │         4    220  hatbatcatdog
+
+julia> select(df, :c, :b, :a) # re-order columns
+4×3 DataFrame
+ Row │ c       b      a
+     │ String  Int64  Int64
+─────┼──────────────────────
+   1 │ hat        50      1
+   2 │ bat        50      2
+   3 │ cat        60      3
+   4 │ dog        60      4
+
+julia> select(df, :b, :) # `:` here means all other columns
+4×3 DataFrame
+ Row │ b      a      c
+     │ Int64  Int64  String
+─────┼──────────────────────
+   1 │    50      1  hat
+   2 │    50      2  bat
+   3 │    60      3  cat
+   4 │    60      4  dog
+
+julia> select(
+           df,
+           :c => (x -> "a " .* x) => :one_c,
+           :a => (x -> 100x),
+           :b,
+           renamecols=false
+       ) # can mix operation forms
+4×3 DataFrame
+ Row │ one_c   a      b
+     │ String  Int64  Int64
+─────┼──────────────────────
+   1 │ a hat     100     50
+   2 │ a bat     200     50
+   3 │ a cat     300     60
+   4 │ a dog     400     60
+
+julia> select(
+           df,
+           :c => ByRow(reverse),
+           :c => ByRow(uppercase)
+       ) # multiple operations on same column
+4×2 DataFrame
+ Row │ c_reverse  c_uppercase
+     │ String     String
+─────┼────────────────────────
+   1 │ tah        HAT
+   2 │ tab        BAT
+   3 │ tac        CAT
+   4 │ god        DOG
+```
+
+In the last two examples,
+the manipulation function arguments were split across multiple lines.
+This is a good way to make manipulations with many operations more readable.
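+
+The vector form listed at the start of this section gives the same result
+as passing each `operation` as a separate argument.
+The sketch below repeats the `df` defined above so that it runs on its own,
+and collects the operations in a hypothetical variable `ops`.
+
+```julia
+julia> using DataFrames
+
+julia> df = DataFrame(a = 1:4, b = [50,50,60,60], c = ["hat","bat","cat","dog"]);
+
+julia> ops = [:a => maximum, :b => sum, :c => join];
+
+julia> combine(df, ops) == combine(df, :a => maximum, :b => sum, :c => join)
+true
+```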
+
+Passing multiple operations to `subset` or `subset!` is an easy way to narrow in
+on a particular row of data.
+
+```julia
+julia> subset(
+           df,
+           :b => ByRow(==(60)),
+           :c => ByRow(contains("at"))
+       ) # rows with 60 and "at"
+1×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  String
+─────┼──────────────────────
+   1 │     3     60  cat
+```
 
-julia> df.Sex === german.Sex # no-copy is performed
+Note that all operations within a single manipulation must use the data
+as it existed before the function call,
+i.e., you cannot use newly created columns for subsequent operations
+within the same manipulation.
+
+```julia
+julia> transform(
+           df,
+           [:a, :b] => ByRow(+) => :d,
+           :d => (x -> x ./ 2),
+       ) # requires two separate transformations
+ERROR: ArgumentError: column name :d not found in the data frame; existing most similar names are: :a, :b and :c
+
+julia> new_df = transform(df, [:a, :b] => ByRow(+) => :d)
+4×4 DataFrame
+ Row │ a      b      c       d
+     │ Int64  Int64  String  Int64
+─────┼─────────────────────────────
+   1 │     1     50  hat        51
+   2 │     2     50  bat        52
+   3 │     3     60  cat        63
+   4 │     4     60  dog        64
+
+julia> transform!(new_df, :d => (x -> x ./ 2) => :d_2)
+4×5 DataFrame
+ Row │ a      b      c       d      d_2
+     │ Int64  Int64  String  Int64  Float64
+─────┼──────────────────────────────────────
+   1 │     1     50  hat        51     25.5
+   2 │     2     50  bat        52     26.0
+   3 │     3     60  cat        63     31.5
+   4 │     4     60  dog        64     32.0
+```
+
+
+### Broadcasting Operation Pairs
+
+[Broadcasting](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting)
+pairs with `.=>` is often a convenient way to generate multiple
+similar `operation`s to be applied within a single manipulation.
+Broadcasting within the `Pair` of an `operation` is no different than
+broadcasting in base Julia.
+The broadcasting `.=>` will be expanded into a vector of pairs
+(`[operation1, operation2, ...]`),
+and this expansion will occur before the manipulation function is invoked.
+Then the manipulation function will use the
+`manipulation_function(dataframe, [operation1, operation2, ...])` method.
+This process will be explained in more detail below.
+
+To illustrate these concepts, let us first examine the `Type` of a basic `Pair`.
+In DataFrames.jl, a symbol, string, or integer
+may be used to select a single column.
+Some `Pair`s with these types are below.
+
+```julia
+julia> typeof(:x => :a)
+Pair{Symbol, Symbol}
+
+julia> typeof("x" => "a")
+Pair{String, String}
+
+julia> typeof(1 => "a")
+Pair{Int64, String}
+```
+
+Any of the `Pair`s above could be used to rename the first column
+of the data frame below to `a`.
+
+```julia
+julia> df = DataFrame(x = 1:3, y = 4:6)
+3×2 DataFrame
+ Row │ x      y
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+
+julia> select(df, :x => :a)
+3×1 DataFrame
+ Row │ a
+     │ Int64
+─────┼───────
+   1 │     1
+   2 │     2
+   3 │     3
+
+julia> select(df, 1 => "a")
+3×1 DataFrame
+ Row │ a
+     │ Int64
+─────┼───────
+   1 │     1
+   2 │     2
+   3 │     3
+```
+
+What should we do if we want to keep and rename both the `x` and `y` column?
+One option is to supply a `Vector` of operation `Pair`s to `select`.
+`select` will process all of these operations in order.
+
+```julia
+julia> ["x" => "a", "y" => "b"]
+2-element Vector{Pair{String, String}}:
+ "x" => "a"
+ "y" => "b"
+
+julia> select(df, ["x" => "a", "y" => "b"])
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+```
+
+We can use broadcasting to simplify the syntax above.
+
+```julia
+julia> ["x", "y"] .=> ["a", "b"]
+2-element Vector{Pair{String, String}}:
+ "x" => "a"
+ "y" => "b"
+
+julia> select(df, ["x", "y"] .=> ["a", "b"])
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+```
+
+Notice that `select` sees the same `Vector{Pair{String, String}}` operation
+argument whether the individual pairs are written out explicitly or
+constructed with broadcasting.
+The broadcasting is applied before the call to `select`.
+
+```julia
+julia> ["x" => "a", "y" => "b"] == (["x", "y"] .=> ["a", "b"])
 true
 ```
 
-To perform the selection operation in-place use `select!`:
+!!! note
+      These operation pairs (or vector of pairs) can be given variable names.
+      This is uncommon in practice but could be helpful for intermediate
+      inspection and testing.
+      ```julia
+      df = DataFrame(x = 1:3, y = 4:6)       # create data frame
+      operation = ["x", "y"] .=> ["a", "b"]  # save operation to variable
+      typeof(operation)                      # check type of operation
+      first(operation)                       # check first pair in operation
+      last(operation)                        # check last pair in operation
+      select(df, operation)                  # manipulate `df` with `operation`
+      ```
 
-```jldoctest dataframe
-julia> select!(german, Not(:Age));
+In Julia, a scalar (non-vector) value broadcast against a vector
+is repeated in each element of the resulting vector of pairs.
 
-julia> german
-1000×9 DataFrame
-  Row │ id     Sex      Job    Housing  Saving accounts  Checking account  Cre ⋯
-      │ Int64  String7  Int64  String7  String15         String15          Int ⋯
-──────┼─────────────────────────────────────────────────────────────────────────
-    1 │     0  male         2  own      NA               little                ⋯
-    2 │     1  female       2  own      little           moderate
-    3 │     2  male         1  own      little           NA
-    4 │     3  male         2  free     little           little
-    5 │     4  male         2  free     little           little                ⋯
-    6 │     5  male         1  free     NA               NA
-    7 │     6  male         2  own      quite rich       NA
-    8 │     7  male         3  rent     little           moderate
-  ⋮   │   ⋮       ⋮       ⋮       ⋮            ⋮                ⋮              ⋱
-  994 │   993  male         3  own      little           little                ⋯
-  995 │   994  male         2  own      NA               NA
-  996 │   995  female       1  own      little           NA
-  997 │   996  male         3  own      little           little
-  998 │   997  male         2  own      little           NA                    ⋯
-  999 │   998  male         2  free     little           little
- 1000 │   999  male         2  own      moderate         moderate
-                                                  3 columns and 985 rows omitted
+```julia
+julia> ["x", "y"] .=> :a    # :a is repeated
+2-element Vector{Pair{String, Symbol}}:
+ "x" => :a
+ "y" => :a
+
+julia> 1 .=> [:a, :b]       # 1 is repeated
+2-element Vector{Pair{Int64, Symbol}}:
+ 1 => :a
+ 1 => :b
 ```
 
-As you can see the `:Age` column was dropped from the `german` data frame.
+We can use this fact to easily broadcast an `operation_function` to multiple columns.
 
-The `transform` and `transform!` functions work identically to `select` and
-`select!` with the only difference that they retain all columns that are present
-in the source data frame. Here are some examples:
+```julia
+julia> f(x) = 2 * x
+f (generic function with 1 method)
 
-```jldoctest dataframe
-julia> german = copy(german_ref);
+julia> ["x", "y"] .=> f  # f is repeated
+2-element Vector{Pair{String, typeof(f)}}:
+ "x" => f
+ "y" => f
 
-julia> df = german_ref[1:8, 1:5]
-8×5 DataFrame
- Row │ id     Age    Sex      Job    Housing
-     │ Int64  Int64  String7  Int64  String7
-─────┼───────────────────────────────────────
-   1 │     0     67  male         2  own
-   2 │     1     22  female       2  own
-   3 │     2     49  male         1  own
-   4 │     3     45  male         2  free
-   5 │     4     53  male         2  free
-   6 │     5     35  male         1  free
-   7 │     6     53  male         2  own
-   8 │     7     35  male         3  rent
-
-julia> transform(df, :Age => maximum)
-8×6 DataFrame
- Row │ id     Age    Sex      Job    Housing  Age_maximum
-     │ Int64  Int64  String7  Int64  String7  Int64
+julia> select(df, ["x", "y"] .=> f)  # apply f with automatic column renaming
+3×2 DataFrame
+ Row │ x_f    y_f
+     │ Int64  Int64
+─────┼──────────────
+   1 │     2      8
+   2 │     4     10
+   3 │     6     12
+
+julia> ["x", "y"] .=> f .=> ["a", "b"]  # f is repeated
+2-element Vector{Pair{String, Pair{typeof(f), String}}}:
+ "x" => (f => "a")
+ "y" => (f => "b")
+
+julia> select(df, ["x", "y"] .=> f .=> ["a", "b"])  # apply f with manual column renaming
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     2      8
+   2 │     4     10
+   3 │     6     12
+```
+
+A renaming function can be applied to multiple columns in the same way.
+It will also be repeated in each operation `Pair`.
+
+```julia
+julia> newname(s::String) = s * "_new"
+newname (generic function with 1 method)
+
+julia> ["x", "y"] .=> f .=> newname  # both f and newname are repeated
+2-element Vector{Pair{String, Pair{typeof(f), typeof(newname)}}}:
+ "x" => (f => newname)
+ "y" => (f => newname)
+
+julia> select(df, ["x", "y"] .=> f .=> newname)  # apply f then rename column with newname
+3×2 DataFrame
+ Row │ x_new  y_new
+     │ Int64  Int64
+─────┼──────────────
+   1 │     2      8
+   2 │     4     10
+   3 │     6     12
+```
+
+You can see from the type output above
+that a three-element pair does not actually exist.
+A `Pair` (as the name implies) can only contain two elements.
+Thus, `:x => :y => :z` becomes a nested `Pair`,
+where `:x` is the first element and points to the `Pair` `:y => :z`,
+which is the second element.
+
+```julia
+julia> p = :x => :y => :z
+:x => (:y => :z)
+
+julia> p[1]
+:x
+
+julia> p[2]
+:y => :z
+
+julia> p[2][1]
+:y
+
+julia> p[2][2]
+:z
+
+julia> p[3] # there is no index 3 for a pair
+ERROR: BoundsError: attempt to access Pair{Symbol, Pair{Symbol, Symbol}} at index [3]
+```
+
+In the previous examples, the source columns have been individually selected.
+When broadcasting multiple columns to the same function,
+often similarities in the column names or position can be exploited to avoid
+tedious selection.
+Consider a data frame with temperature data at three different locations
+taken over time.
+
+```julia
+julia> df = DataFrame(Time = 1:4,
+                      Temperature1 = [20, 23, 25, 28],
+                      Temperature2 = [33, 37, 41, 44],
+                      Temperature3 = [15, 10, 4, 0])
+4×4 DataFrame
+ Row │ Time   Temperature1  Temperature2  Temperature3
+     │ Int64  Int64         Int64         Int64
+─────┼─────────────────────────────────────────────────
+   1 │     1            20            33            15
+   2 │     2            23            37            10
+   3 │     3            25            41             4
+   4 │     4            28            44             0
+```
+
+To convert all of the temperature data in one transformation,
+we just need to define a conversion function and broadcast
+it to all of the "Temperature" columns.
+
+```julia
+julia> celsius_to_kelvin(x) = x + 273
+celsius_to_kelvin (generic function with 1 method)
+
+julia> transform(
+           df,
+           Cols(r"Temp") .=> ByRow(celsius_to_kelvin),
+           renamecols = false
+       )
+4×4 DataFrame
+ Row │ Time   Temperature1  Temperature2  Temperature3
+     │ Int64  Int64         Int64         Int64
+─────┼─────────────────────────────────────────────────
+   1 │     1           293           306           288
+   2 │     2           296           310           283
+   3 │     3           298           314           277
+   4 │     4           301           317           273
+```
+
+Alternatively, we can rename the columns in the same manipulation:
+
+```julia
+julia> rename_function(s) = "Temperature $(last(s)) (K)"
+rename_function (generic function with 1 method)
+
+julia> select(
+           df,
+           "Time",
+           Cols(r"Temp") .=> ByRow(celsius_to_kelvin) .=> rename_function
+       )
+4×4 DataFrame
+ Row │ Time   Temperature 1 (K)  Temperature 2 (K)  Temperature 3 (K)
+     │ Int64  Int64              Int64              Int64
+─────┼────────────────────────────────────────────────────────────────
+   1 │     1                293                306                288
+   2 │     2                296                310                283
+   3 │     3                298                314                277
+   4 │     4                301                317                273
+```
+
+!!! note "Notes"
+      * `Not("Time")` or `2:4` would have been equally good choices for `source_column_selector` in the above operations.
+      * Don't forget `ByRow` if your function is to be applied to elements rather than entire column vectors.
+        Without `ByRow`, the manipulations above would have thrown
+        `ERROR: MethodError: no method matching +(::Vector{Int64}, ::Int64)`.
+      * Regular expression (`r""`) and `:` `source_column_selector`s
+        must be wrapped in `Cols` to be properly broadcast
+        because otherwise the broadcasting occurs before the expression is expanded into a vector of matches.
+
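+The notes above can be checked with a short sketch.
+It assumes the temperature data frame and `celsius_to_kelvin` function defined earlier;
+the variable names (`single_pair`, `by_names`, and so on) are only illustrative.
+Using `names(df, r"Temp")` is one way to expand the regular expression into
+a vector of column names before broadcasting, as an alternative to `Cols`.
+
+```julia
+using DataFrames
+
+df = DataFrame(Time = 1:4,
+               Temperature1 = [20, 23, 25, 28],
+               Temperature2 = [33, 37, 41, 44],
+               Temperature3 = [15, 10, 4, 0])
+
+celsius_to_kelvin(x) = x + 273
+
+# A bare regular expression behaves like a scalar in broadcasting,
+# so this is expected to build a single pair rather than one pair per column.
+single_pair = r"Temp" .=> ByRow(celsius_to_kelvin)
+
+# Expanding the matches first gives a vector of names to broadcast over.
+by_names = transform(df, names(df, r"Temp") .=> ByRow(celsius_to_kelvin), renamecols = false)
+
+# The selectors mentioned in the notes should give the same result.
+by_cols  = transform(df, Cols(r"Temp") .=> ByRow(celsius_to_kelvin), renamecols = false)
+by_not   = transform(df, Not("Time") .=> ByRow(celsius_to_kelvin), renamecols = false)
+by_range = transform(df, 2:4 .=> ByRow(celsius_to_kelvin), renamecols = false)
+
+by_names == by_cols == by_not == by_range
+```
+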
+You could also broadcast different columns to different functions
+by supplying a vector of functions.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> f1(x) = x .+ 1
+f1 (generic function with 1 method)
+
+julia> f2(x) = x ./ 10
+f2 (generic function with 1 method)
+
+julia> transform(df, [:a, :b] .=> [f1, f2])
+4×4 DataFrame
+ Row │ a      b      a_f1   b_f2
+     │ Int64  Int64  Int64  Float64
+─────┼──────────────────────────────
+   1 │     1      5      2      0.5
+   2 │     2      6      3      0.6
+   3 │     3      7      4      0.7
+   4 │     4      8      5      0.8
+```
+
+However, this form is not much more convenient than supplying
+multiple individual operations.
+
+```julia
+julia> transform(df, [:a => f1, :b => f2]) # same manipulation as previous
+4×4 DataFrame
+ Row │ a      b      a_f1   b_f2
+     │ Int64  Int64  Int64  Float64
+─────┼──────────────────────────────
+   1 │     1      5      2      0.5
+   2 │     2      6      3      0.6
+   3 │     3      7      4      0.7
+   4 │     4      8      5      0.8
+```
+
+Perhaps more useful for broadcasting syntax
+is to apply multiple functions to multiple columns
+by changing the vector of functions to a 1-by-n (single-row) matrix of functions.
+(Recall that a list, a vector, or a matrix of operation pairs are all valid
+for passing to the manipulation functions.)
+
+```julia
+julia> [:a, :b] .=> [f1 f2] # No comma `,` between f1 and f2
+2×2 Matrix{Pair{Symbol}}:
+ :a=>f1  :a=>f2
+ :b=>f1  :b=>f2
+
+julia> transform(df, [:a, :b] .=> [f1 f2]) # No comma `,` between f1 and f2
+4×6 DataFrame
+ Row │ a      b      a_f1   b_f1   a_f2     b_f2
+     │ Int64  Int64  Int64  Int64  Float64  Float64
+─────┼──────────────────────────────────────────────
+   1 │     1      5      2      6      0.1      0.5
+   2 │     2      6      3      7      0.2      0.6
+   3 │     3      7      4      8      0.3      0.7
+   4 │     4      8      5      9      0.4      0.8
+```
+
+In this way, every combination of selected columns and functions will be applied.
+
+Pair broadcasting is a simple but powerful tool
+that can be used in any of the manipulation functions listed under
+[Basic Usage of Manipulation Functions](@ref).
+Experiment for yourself to discover other useful operations.
+
+### Additional Resources
+More details and examples of operation pair syntax can be found in
+[this blog post](https://bkamins.github.io/julialang/2020/12/24/minilanguage.html).
+(The official wording describing the syntax has changed since the blog post was written,
+but the examples are still illustrative.
+The operation pair syntax is sometimes referred to as the DataFrames.jl mini-language
+or Domain-Specific Language.)
+
+For additional syntax niceties,
+many users find the [Chain.jl](https://github.com/jkrumbiegel/Chain.jl)
+and [DataFramesMeta.jl](https://github.com/JuliaData/DataFramesMeta.jl)
+packages useful
+to help simplify manipulations that may be tedious with operation pairs alone.
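+
+As a rough illustration, the two-step transformation from
+the earlier `:d` and `:d_2` example could be written with Chain.jl as sketched below.
+This assumes the Chain.jl package is installed;
+the data frame and the name `result` are made up for the example.
+`@chain` passes the result of each line as the first argument of the next call,
+so the second `transform` can use the column created by the first.
+
+```julia
+using DataFrames, Chain
+
+df = DataFrame(a = 1:4, b = [50, 50, 60, 60])
+
+result = @chain df begin
+    transform([:a, :b] => ByRow(+) => :d)    # first manipulation creates :d
+    transform(:d => (x -> x ./ 2) => :d_2)   # second manipulation can now use :d
+end
+```
+
+Because `transform` (without `!`) is used, `df` itself is left unchanged.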
+
+## Approach Comparison
+
+After that deep dive into [Manipulation Functions](@ref),
+it is a good idea to review the alternative approaches covered in
+[Getting and Setting Data in a Data Frame](@ref).
+Let us compare the two approaches with a few examples.
+
+### Convenience
+
+For simple operations,
+often getting/setting data with dot syntax
+is simpler than the equivalent data frame manipulation.
+Here we will add the two columns of our data frame together
+and place the result in a new third column.
+
+Setup:
+
+```julia
+julia> df = DataFrame(x = 1:3, y = 4:6)  # define data frame
+3×2 DataFrame
+ Row │ x      y
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+```
+
+Manipulation:
+
+```julia
+julia> transform!(df, [:x, :y] => (+) => :z)
+3×3 DataFrame
+ Row │ x      y      z
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      5
+   2 │     2      5      7
+   3 │     3      6      9
+```
+
+Dot Syntax:
+
+```julia
+julia> df.x  # dot syntax returns a vector
+3-element Vector{Int64}:
+ 1
+ 2
+ 3
+
+julia> df.z = df.x + df.y
+3-element Vector{Int64}:
+ 5
+ 7
+ 9
+
+julia> df  # see that the previous expression updated the data frame `df`
+3×3 DataFrame
+ Row │ x      y      z
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      5
+   2 │     2      5      7
+   3 │     3      6      9
+```
+
+Recall that the return type from a data frame manipulation function call is always a `DataFrame`.
+The return type of a data frame column accessed with dot syntax is a `Vector`.
+Thus the expression `df.x + df.y` gets the column data as vectors
+and returns the result of the vector addition.
+However, in that same line,
+we assigned the resultant `Vector` to a new column `z` in the data frame `df`.
+We could have instead assigned the resultant `Vector` to some other variable,
+and then `df` would not have been altered.
+The approach with dot syntax is very versatile
+since the data getting, mathematics, and data setting can be separate steps.
+
+```julia
+julia> df.x
+3-element Vector{Int64}:
+ 1
+ 2
+ 3
+
+julia> v = df.x + df.y
+3-element Vector{Int64}:
+ 5
+ 7
+ 9
+
+julia> df.z = v
+3-element Vector{Int64}:
+ 5
+ 7
+ 9
+```
+
+One downside to dot syntax is that the column name must be explicitly written in the code.
+Indexing syntax can perform a similar operation with dynamic column names.
+(Manipulation functions can also work with dynamic column names as will be shown in the next example.)
+
+```julia
+julia> df = DataFrame("My First Column" => 1:3, "My Second Column" => 4:6)  # define data frame
+3×2 DataFrame
+ Row │ My First Column  My Second Column
+     │ Int64            Int64
+─────┼───────────────────────────────────
+   1 │               1                 4
+   2 │               2                 5
+   3 │               3                 6
+
+julia> c1 = "My First Column"; c2 = "My Second Column"; c3 = "My Third Column";  # define column names
+
+# Imagine the above data was read from a file or entered by a user at runtime.
+
+julia> df.c1  # dot syntax expects an explicit column name and cannot be used
+ERROR: ArgumentError: column name :c1 not found in the data frame
+
+julia> df[:, c3] = df[:, c1] + df[:, c2]  # access columns with names stored in variables
+3-element Vector{Int64}:
+ 5
+ 7
+ 9
+
+julia> df  # see that the previous expression updated the data frame `df`
+3×3 DataFrame
+ Row │ My First Column  My Second Column  My Third Column
+     │ Int64            Int64             Int64
 ─────┼────────────────────────────────────────────────────
-   1 │     0     67  male         2  own               67
-   2 │     1     22  female       2  own               67
-   3 │     2     49  male         1  own               67
-   4 │     3     45  male         2  free              67
-   5 │     4     53  male         2  free              67
-   6 │     5     35  male         1  free              67
-   7 │     6     53  male         2  own               67
-   8 │     7     35  male         3  rent              67
+   1 │               1                 4                5
+   2 │               2                 5                7
+   3 │               3                 6                9
 ```
 
-In the example below we are swapping values stored in columns `:Sex` and `:Age`:
+One benefit of using manipulation functions is that
+the name of the data frame only needs to be written once.
 
-```jldoctest dataframe
-julia> transform(german, :Age => :Sex, :Sex => :Age)
-1000×10 DataFrame
-  Row │ id     Age      Sex    Job    Housing  Saving accounts  Checking accou ⋯
-      │ Int64  String7  Int64  Int64  String7  String15         String15       ⋯
-──────┼─────────────────────────────────────────────────────────────────────────
-    1 │     0  male        67      2  own      NA               little         ⋯
-    2 │     1  female      22      2  own      little           moderate
-    3 │     2  male        49      1  own      little           NA
-    4 │     3  male        45      2  free     little           little
-    5 │     4  male        53      2  free     little           little         ⋯
-    6 │     5  male        35      1  free     NA               NA
-    7 │     6  male        53      2  own      quite rich       NA
-    8 │     7  male        35      3  rent     little           moderate
-  ⋮   │   ⋮       ⋮       ⋮      ⋮       ⋮            ⋮                ⋮       ⋱
-  994 │   993  male        30      3  own      little           little         ⋯
-  995 │   994  male        50      2  own      NA               NA
-  996 │   995  female      31      1  own      little           NA
-  997 │   996  male        40      3  own      little           little
-  998 │   997  male        38      2  own      little           NA             ⋯
-  999 │   998  male        23      2  free     little           little
- 1000 │   999  male        27      2  own      moderate         moderate
-                                                  4 columns and 985 rows omitted
+Setup:
+
+```julia
+julia> my_very_long_data_frame_name = DataFrame(
+           "My First Column" => 1:3,
+           "My Second Column" => 4:6
+       )  # define data frame
+3×2 DataFrame
+ Row │ My First Column  My Second Column
+     │ Int64            Int64
+─────┼───────────────────────────────────
+   1 │               1                 4
+   2 │               2                 5
+   3 │               3                 6
+
+julia> c1 = "My First Column"; c2 = "My Second Column"; c3 = "My Third Column";  # define column names
 ```
 
-If we give more than one source column to a transformation they are passed as
-consecutive positional arguments. So for example the
-`[:Age, :Job] => (+) => :res` transformation below evaluates `+(df1.Age, df1.Job)`
-(which adds two columns) and stores the result in the `:res` column:
+Manipulation:
 
-```jldoctest dataframe
-julia> select(german, :Age, :Job, [:Age, :Job] => (+) => :res)
-1000×3 DataFrame
-  Row │ Age    Job    res
-      │ Int64  Int64  Int64
-──────┼─────────────────────
-    1 │    67      2     69
-    2 │    22      2     24
-    3 │    49      1     50
-    4 │    45      2     47
-    5 │    53      2     55
-    6 │    35      1     36
-    7 │    53      2     55
-    8 │    35      3     38
-  ⋮   │   ⋮      ⋮      ⋮
-  994 │    30      3     33
-  995 │    50      2     52
-  996 │    31      1     32
-  997 │    40      3     43
-  998 │    38      2     40
-  999 │    23      2     25
- 1000 │    27      2     29
-            985 rows omitted
-```
-
-This concludes the introductory examples of data frame manipulations.
-See later sections of the manual,
-particularly [A Gentle Introduction to Data Frame Manipulation Functions](@ref),
-for additional explanations and functionality,
-including how to broadcast operation functions and operation pairs
-and how to pass or produce multiple columns using `AsTable`.
+```julia
+julia> transform!(my_very_long_data_frame_name, [c1, c2] => (+) => c3)
+3×3 DataFrame
+ Row │ My First Column  My Second Column  My Third Column
+     │ Int64            Int64             Int64
+─────┼────────────────────────────────────────────────────
+   1 │               1                 4                5
+   2 │               2                 5                7
+   3 │               3                 6                9
+```
+
+Indexing:
+
+```julia
+julia> my_very_long_data_frame_name[:, c3] = my_very_long_data_frame_name[:, c1] + my_very_long_data_frame_name[:, c2]
+3-element Vector{Int64}:
+ 5
+ 7
+ 9
+
+julia> my_very_long_data_frame_name  # see that the previous expression updated the data frame
+3×3 DataFrame
+ Row │ My First Column  My Second Column  My Third Column
+     │ Int64            Int64             Int64
+─────┼────────────────────────────────────────────────────
+   1 │               1                 4                5
+   2 │               2                 5                7
+   3 │               3                 6                9
+```
+
+### Speed
+
+TODO: Compare speed, memory, and view options (@view, !, :, copycols=false).
+(May need someone else to write this part unless I do more studying.)
diff --git a/docs/src/man/manipulation_functions.md b/docs/src/man/manipulation_functions.md
deleted file mode 100644
index 72df94476..000000000
--- a/docs/src/man/manipulation_functions.md
+++ /dev/null
@@ -1,1431 +0,0 @@
-# A Gentle Introduction to Data Frame Manipulation Functions
-
-The seven functions below can be used to manipulate data frames
-by applying operations to them.
-This section of the documentation aims to methodically build understanding
-of these functions and their possible arguments
-by reinforcing foundational concepts and slowly increasing complexity.
-
-The functions without a `!` in their name
-will create a new data frame based on the source data frame,
-so you will probably want to store the new data frame to a new variable name,
-e.g. `new_df = transform(source_df, operation)`.
-The functions with a `!` at the end of their name
-will modify an existing data frame in-place,
-so there is typically no need to assign the result to a variable,
-e.g. `transform!(source_df, operation)` instead of
-`source_df = transform(source_df, operation)`.
-
-The number of columns and rows in the resultant data frame varies
-depending on the manipulation function employed.
-
-| Function     | Memory Usage                     | Column Retention                        | Row Retention                                       |
-| ------------ | -------------------------------- | --------------------------------------- | --------------------------------------------------- |
-| `transform`  | Creates a new data frame.        | Retains original and resultant columns. | Retains same number of rows as original data frame. |
-| `transform!` | Modifies an existing data frame. | Retains original and resultant columns. | Retains same number of rows as original data frame. |
-| `select`     | Creates a new data frame.        | Retains only resultant columns.         | Retains same number of rows as original data frame. |
-| `select!`    | Modifies an existing data frame. | Retains only resultant columns.         | Retains same number of rows as original data frame. |
-| `subset`     | Creates a new data frame.        | Retains original columns.               | Retains only rows where condition is true.          |
-| `subset!`    | Modifies an existing data frame. | Retains original columns.               | Retains only rows where condition is true.          |
-| `combine`    | Creates a new data frame.        | Retains only resultant columns.         | Retains only resultant rows.                        |
-
-## Constructing Operations
-
-All of the functions above use the same syntax which is commonly
-`manipulation_function(dataframe, operation)`.
-The `operation` argument defines the
-operation to be applied to the source `dataframe`,
-and it can take any of the following common forms explained below:
-
-`source_column_selector`
-: selects source column(s) without manipulating or renaming them
-
-   Examples: `:a`, `[:a, :b]`, `All()`, `Not(:a)`
-
-`source_column_selector => operation_function`
-: passes source column(s) as arguments to a function
-and automatically names the resulting column(s)
-
-   Examples: `:a => sum`, `[:a, :b] => +`, `:a => ByRow(==(3))`
-
-`source_column_selector => operation_function => new_column_names`
-: passes source column(s) as arguments to a function
-and names the resulting column(s) `new_column_names`
-
-   Examples: `:a => sum => :sum_of_a`, `[:a, :b] => + => :a_plus_b`
-
-   *(Not available for `subset` or `subset!`)*
-
-`source_column_selector => new_column_names`
-: renames a source column,
-or splits a column containing collection elements into multiple new columns
-
-   Examples: `:a => :new_a`, `:a_b => [:a, :b]`, `:nt => AsTable`
-
-   (*Not available for `subset` or `subset!`*)
-
-The `=>` operator constructs a
-[Pair](https://docs.julialang.org/en/v1/base/collections/#Core.Pair),
-which is a type to link one object to another.
-(Pairs are commonly used to create elements of a
-[Dictionary](https://docs.julialang.org/en/v1/base/collections/#Dictionaries).)
-In DataFrames.jl manipulation functions,
-`Pair` arguments are used to define column `operations` to be performed.
-The examples shown above will be explained in more detail later.
-
-*The manipulation functions also have methods for applying multiple operations.
-See the later sections [Applying Multiple Operations per Manipulation](@ref)
-and [Broadcasting Operation Pairs](@ref) for more information.*
-
-### `source_column_selector`
-Inside an `operation`, `source_column_selector` is usually a column name
-or column index which identifies a data frame column.
-
-`source_column_selector` may be used as the entire `operation`
-with `select` or `select!` to isolate or reorder columns.
-
-```julia
-julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6], c = [7, 8, 9])
-3×3 DataFrame
- Row │ a      b      c
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      7
-   2 │     2      5      8
-   3 │     3      6      9
-
-julia> select(df, :b)
-3×1 DataFrame
- Row │ b
-     │ Int64
-─────┼───────
-   1 │     4
-   2 │     5
-   3 │     6
-
-julia> select(df, "b")
-3×1 DataFrame
- Row │ b
-     │ Int64
-─────┼───────
-   1 │     4
-   2 │     5
-   3 │     6
-
-julia> select(df, 2)
-3×1 DataFrame
- Row │ b
-     │ Int64
-─────┼───────
-   1 │     4
-   2 │     5
-   3 │     6
-```
-
-`source_column_selector` may also be used as the entire `operation`
-with `subset` or `subset!` if the source column contains `Bool` values.
-
-```julia
-julia> df = DataFrame(
-           name = ["Scott", "Jill", "Erica", "Jimmy"],
-           minor = [false, true, false, true],
-       )
-4×2 DataFrame
- Row │ name    minor
-     │ String  Bool
-─────┼───────────────
-   1 │ Scott   false
-   2 │ Jill     true
-   3 │ Erica   false
-   4 │ Jimmy    true
-
-julia> subset(df, :minor)
-2×2 DataFrame
- Row │ name    minor
-     │ String  Bool
-─────┼───────────────
-   1 │ Jill     true
-   2 │ Jimmy    true
-```
-
-`source_column_selector` may instead be a collection of columns such as a vector,
-a [regular expression](https://docs.julialang.org/en/v1/manual/strings/#Regular-Expressions),
-a `Not`, `Between`, `All`, or `Cols` expression,
-or a `:`.
-See the [Indexing](@ref) API for the full list of possible values with references.
-
-!!! Note
-      The Julia parser sometimes prevents `:` from being used by itself.
-      If you get
-      `ERROR: syntax: whitespace not allowed after ":" used for quoting`,
-      try using `All()`, `Cols(:)`, or `(:)` instead to select all columns.
-
-```julia
-julia> df = DataFrame(
-           id = [1, 2, 3],
-           first_name = ["José", "Emma", "Nathan"],
-           last_name = ["Garcia", "Marino", "Boyer"],
-           age = [61, 24, 33]
-       )
-3×4 DataFrame
- Row │ id     first_name  last_name  age
-     │ Int64  String      String     Int64
-─────┼─────────────────────────────────────
-   1 │     1  José        Garcia        61
-   2 │     2  Emma        Marino        24
-   3 │     3  Nathan      Boyer         33
-
-julia> select(df, [:last_name, :first_name])
-3×2 DataFrame
- Row │ last_name  first_name
-     │ String     String
-─────┼───────────────────────
-   1 │ Garcia     José
-   2 │ Marino     Emma
-   3 │ Boyer      Nathan
-
-julia> select(df, r"name")
-3×2 DataFrame
- Row │ first_name  last_name
-     │ String      String
-─────┼───────────────────────
-   1 │ José        Garcia
-   2 │ Emma        Marino
-   3 │ Nathan      Boyer
-
-julia> select(df, Not(:id))
-3×3 DataFrame
- Row │ first_name  last_name  age
-     │ String      String     Int64
-─────┼──────────────────────────────
-   1 │ José        Garcia        61
-   2 │ Emma        Marino        24
-   3 │ Nathan      Boyer         33
-
-julia> select(df, Between(2,4))
-3×3 DataFrame
- Row │ first_name  last_name  age
-     │ String      String     Int64
-─────┼──────────────────────────────
-   1 │ José        Garcia        61
-   2 │ Emma        Marino        24
-   3 │ Nathan      Boyer         33
-
-julia> df2 = DataFrame(
-           name = ["Scott", "Jill", "Erica", "Jimmy"],
-           minor = [false, true, false, true],
-           male = [true, false, false, true],
-       )
-4×3 DataFrame
- Row │ name    minor  male
-     │ String  Bool   Bool
-─────┼──────────────────────
-   1 │ Scott   false   true
-   2 │ Jill     true  false
-   3 │ Erica   false  false
-   4 │ Jimmy    true   true
-
-julia> subset(df2, [:minor, :male])
-1×3 DataFrame
- Row │ name    minor  male
-     │ String  Bool   Bool
-─────┼─────────────────────
-   1 │ Jimmy    true  true
-```
-
-### `operation_function`
-Inside an `operation` pair, `operation_function` is a function
-which operates on data frame columns passed as vectors.
-When multiple columns are selected by `source_column_selector`,
-the `operation_function` will receive the columns as separate positional arguments
-in the order they were selected, e.g. `f(column1, column2, column3)`.
-
-```julia
-julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 4])
-3×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     2      5
-   3 │     3      4
-
-julia> combine(df, :a => sum)
-1×1 DataFrame
- Row │ a_sum
-     │ Int64
-─────┼───────
-   1 │     6
-
-julia> transform(df, :b => maximum) # `transform` and `select` copy scalar result to all rows
-3×3 DataFrame
- Row │ a      b      b_maximum
-     │ Int64  Int64  Int64
-─────┼─────────────────────────
-   1 │     1      4          5
-   2 │     2      5          5
-   3 │     3      4          5
-
-julia> transform(df, [:b, :a] => -) # vector subtraction is okay
-3×3 DataFrame
- Row │ a      b      b_a_-
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      3
-   2 │     2      5      3
-   3 │     3      4      1
-
-julia> transform(df, [:a, :b] => *) # vector multiplication is not defined
-ERROR: MethodError: no method matching *(::Vector{Int64}, ::Vector{Int64})
-```
-
-Don't worry! There is a quick fix for the previous error.
-If you want to apply a function to each element in a column
-instead of to the entire column vector,
-then you can wrap your element-wise function in `ByRow` like
-`ByRow(my_elementwise_function)`.
-This will apply `my_elementwise_function` to every element in the column
-and then collect the results back into a vector.
-
-```julia
-julia> transform(df, [:a, :b] => ByRow(*))
-3×3 DataFrame
- Row │ a      b      a_b_*
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      4
-   2 │     2      5     10
-   3 │     3      4     12
-
-julia> transform(df, Cols(:) => ByRow(max))
-3×3 DataFrame
- Row │ a      b      a_b_max
-     │ Int64  Int64  Int64
-─────┼───────────────────────
-   1 │     1      4        4
-   2 │     2      5        5
-   3 │     3      4        4
-
-julia> f(x) = x + 1
-f (generic function with 1 method)
-
-julia> transform(df, :a => ByRow(f))
-3×3 DataFrame
- Row │ a      b      a_f
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      2
-   2 │     2      5      3
-   3 │     3      4      4
-```
-
-Alternatively, you may just want to define the function itself so it
-[broadcasts](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting)
-over vectors.
-
-```julia
-julia> g(x) = x .+ 1
-g (generic function with 1 method)
-
-julia> transform(df, :a => g)
-3×3 DataFrame
- Row │ a      b      a_g
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      2
-   2 │     2      5      3
-   3 │     3      4      4
-
-julia> h(x, y) = x .+ y .+ 1
-h (generic function with 1 method)
-
-julia> transform(df, [:a, :b] => h)
-3×3 DataFrame
- Row │ a      b      a_b_h
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      6
-   2 │     2      5      8
-   3 │     3      4      8
-```
-
-[Anonymous functions](https://docs.julialang.org/en/v1/manual/functions/#man-anonymous-functions)
-are a convenient way to define and use an `operation_function`
-all within the manipulation function call.
-
-```julia
-julia> select(df, :a => ByRow(x -> x + 1))
-3×1 DataFrame
- Row │ a_function
-     │ Int64
-─────┼────────────
-   1 │          2
-   2 │          3
-   3 │          4
-
-julia> transform(df, [:a, :b] => ByRow((x, y) -> 2x + y))
-3×3 DataFrame
- Row │ a      b      a_b_function
-     │ Int64  Int64  Int64
-─────┼────────────────────────────
-   1 │     1      4             6
-   2 │     2      5             9
-   3 │     3      4            10
-
-julia> subset(df, :b => ByRow(x -> x < 5))
-2×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     3      4
-
-julia> subset(df, :b => ByRow(<(5))) # shorter version of the previous
-2×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     3      4
-```
-
-!!! Note
-    `operation_functions` within `subset` or `subset!` function calls
-    must return a Boolean vector.
-    `true` elements in the Boolean vector will determine
-    which rows are retained in the resulting data frame.
-
-As demonstrated above, `DataFrame` columns are usually passed
-from `source_column_selector` to `operation_function` as one or more
-vector arguments.
-However, when `AsTable(source_column_selector)` is used,
-the selected columns are collected and passed as a single `NamedTuple`
-to `operation_function`.
-
-This is often useful when your `operation_function` is defined to operate
-on a single collection argument rather than on multiple positional arguments.
-The distinction is somewhat similar to the difference between the built-in
-`min` and `minimum` functions.
-`min` is defined to find the minimum value among multiple positional arguments,
-while `minimum` is defined to find the minimum value
-among the elements of a single collection argument.
-
-```julia
-julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 2:-1:1)
-2×4 DataFrame
- Row │ a      b      c      d
-     │ Int64  Int64  Int64  Int64
-─────┼────────────────────────────
-   1 │     1      3      5      2
-   2 │     2      4      6      1
-
-julia> select(df, Cols(:) => ByRow(min)) # min operates on multiple arguments
-2×1 DataFrame
- Row │ a_b_etc_min
-     │ Int64
-─────┼─────────────
-   1 │           1
-   2 │           1
-
-julia> select(df, AsTable(:) => ByRow(minimum)) # minimum operates on a collection
-2×1 DataFrame
- Row │ a_b_etc_minimum
-     │ Int64
-─────┼─────────────────
-   1 │               1
-   2 │               1
-
-julia> select(df, [:a,:b] => ByRow(+)) # `+` operates on multiple arguments
-2×1 DataFrame
- Row │ a_b_+
-     │ Int64
-─────┼───────
-   1 │     4
-   2 │     6
-
-julia> select(df, AsTable([:a,:b]) => ByRow(sum)) # `sum` operates on a collection
-2×1 DataFrame
- Row │ a_b_sum
-     │ Int64
-─────┼─────────
-   1 │       4
-   2 │       6
-
-julia> using Statistics # contains the `mean` function
-
-julia> select(df, AsTable(Between(:b, :d)) => ByRow(mean)) # `mean` operates on a collection
-2×1 DataFrame
- Row │ b_c_d_mean
-     │ Float64
-─────┼────────────
-   1 │    3.33333
-   2 │    3.66667
-```
-
-`AsTable` can also be used to pass columns to a function which operates
-on fields of a `NamedTuple`.
-
-```julia
-julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)
-2×4 DataFrame
- Row │ a      b      c      d
-     │ Int64  Int64  Int64  Int64
-─────┼────────────────────────────
-   1 │     1      3      5      7
-   2 │     2      4      6      8
-
-julia> f(nt) = nt.a + nt.d
-f (generic function with 1 method)
-
-julia> transform(df, AsTable(:) => ByRow(f))
-2×5 DataFrame
- Row │ a      b      c      d      a_b_etc_f
-     │ Int64  Int64  Int64  Int64  Int64
-─────┼───────────────────────────────────────
-   1 │     1      3      5      7          8
-   2 │     2      4      6      8         10
-```
-
-As demonstrated above,
-in the `source_column_selector => operation_function` operation pair form,
-the results of an operation will be placed into a new column with an
-automatically-generated name based on the operation;
-the new column name will be the `operation_function` name
-appended to the source column name(s) with an underscore.
-
-This automatic column naming behavior can be avoided in two ways.
-First, the operation result can be placed back into the original column
-with the original column name by switching the keyword argument `renamecols`
-from its default value (`true`) to `renamecols=false`.
-This option prevents the function name from being appended to the column name
-as it usually would be.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> transform(df, :a => ByRow(x->x+10), renamecols=false) # add 10, keeping the column name `a`
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │    11      5
-   2 │    12      6
-   3 │    13      7
-   4 │    14      8
-```
-
-The second method to avoid the default manipulation column naming is to
-specify your own `new_column_names`.
-
-### `new_column_names`
-
-`new_column_names` can be included at the end of an `operation` pair to specify
-the name of the new column(s).
-`new_column_names` may be a symbol, string, function, vector of symbols, vector of strings, or `AsTable`.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> transform(df, Cols(:) => ByRow(+) => :c)
-4×3 DataFrame
- Row │ a      b      c
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5      6
-   2 │     2      6      8
-   3 │     3      7     10
-   4 │     4      8     12
-
-julia> transform(df, Cols(:) => ByRow(+) => "a+b")
-4×3 DataFrame
- Row │ a      b      a+b
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5      6
-   2 │     2      6      8
-   3 │     3      7     10
-   4 │     4      8     12
-
-julia> transform(df, :a => ByRow(x->x+10) => "a+10")
-4×3 DataFrame
- Row │ a      b      a+10
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5     11
-   2 │     2      6     12
-   3 │     3      7     13
-   4 │     4      8     14
-```
-
-The `source_column_selector => new_column_names` operation form
-can be used to rename columns without an intermediate function.
-However, there are `rename` and `rename!` functions,
-which accept similar syntax,
-that tend to be more useful for this operation.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> transform(df, :a => :apple) # adds column `apple`
-4×3 DataFrame
- Row │ a      b      apple
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5      1
-   2 │     2      6      2
-   3 │     3      7      3
-   4 │     4      8      4
-
-julia> select(df, :a => :apple) # retains only column `apple`
-4×1 DataFrame
- Row │ apple
-     │ Int64
-─────┼───────
-   1 │     1
-   2 │     2
-   3 │     3
-   4 │     4
-
-julia> rename(df, :a => :apple) # renames column `a` to `apple`, keeping its position
-4×2 DataFrame
- Row │ apple  b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-```
-
-If `new_column_names` already exist in the source data frame,
-those columns will be replaced in the existing column location
-rather than being added to the end.
-This can be done by manually specifying an existing column name
-or by using the `renamecols=false` keyword argument.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> transform(df, :b => (x -> x .+ 10))  # automatic new column and column name
-4×3 DataFrame
- Row │ a      b      b_function
-     │ Int64  Int64  Int64
-─────┼──────────────────────────
-   1 │     1      5          15
-   2 │     2      6          16
-   3 │     3      7          17
-   4 │     4      8          18
-
-julia> transform(df, :b => (x -> x .+ 10), renamecols=false)  # result replaces column :b
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1     15
-   2 │     2     16
-   3 │     3     17
-   4 │     4     18
-
-julia> transform(df, :b => (x -> x .+ 10) => :a)  # replace column :a
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │    15      5
-   2 │    16      6
-   3 │    17      7
-   4 │    18      8
-```
-
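-More precisely, `renamecols=false` only prevents the function name
-from being appended to the new column name,
-so the result is *usually* written back to the source column.
-When several source columns are selected,
-the new name is still built from all of the source column names,
-so a new column is created.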
-
-```julia
-julia> transform(df, [:a, :b] => +)  # new column name is all source columns and function name
-4×3 DataFrame
- Row │ a      b      a_b_+
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5      6
-   2 │     2      6      8
-   3 │     3      7     10
-   4 │     4      8     12
-
-julia> transform(df, [:a, :b] => +, renamecols=false)  # same as above but with no function name
-4×3 DataFrame
- Row │ a      b      a_b
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5      6
-   2 │     2      6      8
-   3 │     3      7     10
-   4 │     4      8     12
-
-julia> transform(df, [:a, :b] => (+) => :a)  # manually overwrite column :a (see Note below about parentheses)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     6      5
-   2 │     8      6
-   3 │    10      7
-   4 │    12      8
-```
-
-In the `source_column_selector => operation_function => new_column_names` operation form,
-`new_column_names` may also be a renaming function which operates on a string
-to create the destination column names programmatically.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> add_prefix(s) = "new_" * s
-add_prefix (generic function with 1 method)
-
-julia> transform(df, :a => (x -> 10 .* x) => add_prefix) # with named renaming function
-4×3 DataFrame
- Row │ a      b      new_a
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5     10
-   2 │     2      6     20
-   3 │     3      7     30
-   4 │     4      8     40
-
-julia> transform(df, :a => (x -> 10 .* x) => (s -> "new_" * s)) # with anonymous renaming function
-4×3 DataFrame
- Row │ a      b      new_a
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5     10
-   2 │     2      6     20
-   3 │     3      7     30
-   4 │     4      8     40
-```
-
-!!! Note
-      It is a good idea to wrap anonymous functions in parentheses
-      to avoid the `=>` operator accidentally becoming part of the anonymous function.
-      The examples above do not work correctly without the parentheses!
-      ```julia
-      julia> transform(df, :a => x -> 10 .* x => add_prefix)  # Not what we wanted!
-      4×3 DataFrame
-       Row │ a      b      a_function
-           │ Int64  Int64  Pair…
-      ─────┼────────────────────────────────────────────
-         1 │     1      5  [10, 20, 30, 40]=>add_prefix
-         2 │     2      6  [10, 20, 30, 40]=>add_prefix
-         3 │     3      7  [10, 20, 30, 40]=>add_prefix
-         4 │     4      8  [10, 20, 30, 40]=>add_prefix
-
-      julia> transform(df, :a => x -> 10 .* x => s -> "new_" * s)  # Not what we wanted!
-      4×3 DataFrame
-       Row │ a      b      a_function
-           │ Int64  Int64  Pair…
-      ─────┼─────────────────────────────────────
-         1 │     1      5  [10, 20, 30, 40]=>#18
-         2 │     2      6  [10, 20, 30, 40]=>#18
-         3 │     3      7  [10, 20, 30, 40]=>#18
-         4 │     4      8  [10, 20, 30, 40]=>#18
-      ```
-
-A renaming function will not work in the
-`source_column_selector => new_column_names` operation form
-because a function in the second element of the operation pair is assumed to take
-the `source_column_selector => operation_function` operation form.
-To work around this limitation, use the
-`source_column_selector => operation_function => new_column_names` operation form
-with `identity` as the `operation_function`.
-
-```julia
-julia> transform(df, :a => add_prefix)
-ERROR: MethodError: no method matching *(::String, ::Vector{Int64})
-
-julia> transform(df, :a => identity => add_prefix)
-4×3 DataFrame
- Row │ a      b      new_a
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5      1
-   2 │     2      6      2
-   3 │     3      7      3
-   4 │     4      8      4
-```
-
-In this case though,
-it is probably again more useful to use the `rename` or `rename!` function
-rather than one of the manipulation functions
-in order to rename in-place and avoid the intermediate `operation_function`.
-```julia
-julia> rename(add_prefix, df)  # rename all columns with a function
-4×2 DataFrame
- Row │ new_a  new_b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> rename(add_prefix, df; cols=:a)  # rename some columns with a function
-4×2 DataFrame
- Row │ new_a  b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-```
-
-In the `source_column_selector => new_column_names` operation form,
-only a single source column may be selected per operation,
-so why is `new_column_names` plural?
-It is possible to split the data contained inside a single column
-into multiple new columns by supplying a vector of strings or symbols
-as `new_column_names`.
-
-```julia
-julia> df = DataFrame(data = [(1,2), (3,4)]) # vector of tuples
-2×1 DataFrame
- Row │ data
-     │ Tuple…
-─────┼────────
-   1 │ (1, 2)
-   2 │ (3, 4)
-
-julia> transform(df, :data => [:first, :second]) # manual naming
-2×3 DataFrame
- Row │ data    first  second
-     │ Tuple…  Int64  Int64
-─────┼───────────────────────
-   1 │ (1, 2)      1       2
-   2 │ (3, 4)      3       4
-```
-
-This kind of data splitting can even be done automatically with `AsTable`.
-
-```julia
-julia> transform(df, :data => AsTable) # default automatic naming with tuples
-2×3 DataFrame
- Row │ data    x1     x2
-     │ Tuple…  Int64  Int64
-─────┼──────────────────────
-   1 │ (1, 2)      1      2
-   2 │ (3, 4)      3      4
-```
-
-If a data frame column contains `NamedTuple`s,
-then `AsTable` will preserve the field names.
-```julia
-julia> df = DataFrame(data = [(a=1,b=2), (a=3,b=4)]) # vector of named tuples
-2×1 DataFrame
- Row │ data
-     │ NamedTup…
-─────┼────────────────
-   1 │ (a = 1, b = 2)
-   2 │ (a = 3, b = 4)
-
-julia> transform(df, :data => AsTable) # keeps names from named tuples
-2×3 DataFrame
- Row │ data            a      b
-     │ NamedTup…       Int64  Int64
-─────┼──────────────────────────────
-   1 │ (a = 1, b = 2)      1      2
-   2 │ (a = 3, b = 4)      3      4
-```
-
-!!! Note
-      To pack multiple columns into a single column of `NamedTuple`s
-      (reverse of the above operation)
-      apply the `identity` function with `ByRow`, e.g.
-      `transform(df, AsTable([:a, :b]) => ByRow(identity) => :data)`.
-
-Renaming functions also work for multi-column transformations,
-but they must operate on a vector of strings.
-
-```julia
-julia> df = DataFrame(data = [(1,2), (3,4)])
-2×1 DataFrame
- Row │ data
-     │ Tuple…
-─────┼────────
-   1 │ (1, 2)
-   2 │ (3, 4)
-
-julia> new_names(v) = ["primary ", "secondary "] .* v
-new_names (generic function with 1 method)
-
-julia> transform(df, :data => identity => new_names)
-2×3 DataFrame
- Row │ data    primary data  secondary data
-     │ Tuple…  Int64         Int64
-─────┼──────────────────────────────────────
-   1 │ (1, 2)             1               2
-   2 │ (3, 4)             3               4
-```
-
-## Applying Multiple Operations per Manipulation
-All data frame manipulation functions can accept multiple `operation` pairs
-at once using any of the following methods:
-- `manipulation_function(dataframe, operation1, operation2)`   : multiple arguments
-- `manipulation_function(dataframe, [operation1, operation2])` : vector argument
-- `manipulation_function(dataframe, [operation1 operation2])`  : matrix argument
-
-Passing multiple operations is especially useful for the `select`, `select!`,
-and `combine` manipulation functions,
-since they only retain columns which are a result of the passed operations.
-
-```julia
-julia> df = DataFrame(a = 1:4, b = [50,50,60,60], c = ["hat","bat","cat","dog"])
-4×3 DataFrame
- Row │ a      b      c
-     │ Int64  Int64  String
-─────┼──────────────────────
-   1 │     1     50  hat
-   2 │     2     50  bat
-   3 │     3     60  cat
-   4 │     4     60  dog
-
-julia> combine(df, :a => maximum, :b => sum, :c => join) # 3 combine operations
-1×3 DataFrame
- Row │ a_maximum  b_sum  c_join
-     │ Int64      Int64  String
-─────┼────────────────────────────────
-   1 │         4    220  hatbatcatdog
-
-julia> select(df, :c, :b, :a) # re-order columns
-4×3 DataFrame
- Row │ c       b      a
-     │ String  Int64  Int64
-─────┼──────────────────────
-   1 │ hat        50      1
-   2 │ bat        50      2
-   3 │ cat        60      3
-   4 │ dog        60      4
-
-julia> select(df, :b, :) # `:` here means all other columns
-4×3 DataFrame
- Row │ b      a      c
-     │ Int64  Int64  String
-─────┼──────────────────────
-   1 │    50      1  hat
-   2 │    50      2  bat
-   3 │    60      3  cat
-   4 │    60      4  dog
-
-julia> select(
-           df,
-           :c => (x -> "a " .* x) => :one_c,
-           :a => (x -> 100x),
-           :b,
-           renamecols=false
-       ) # can mix operation forms
-4×3 DataFrame
- Row │ one_c   a      b
-     │ String  Int64  Int64
-─────┼──────────────────────
-   1 │ a hat     100     50
-   2 │ a bat     200     50
-   3 │ a cat     300     60
-   4 │ a dog     400     60
-
-julia> select(
-           df,
-           :c => ByRow(reverse),
-           :c => ByRow(uppercase)
-       ) # multiple operations on same column
-4×2 DataFrame
- Row │ c_reverse  c_uppercase
-     │ String     String
-─────┼────────────────────────
-   1 │ tah        HAT
-   2 │ tab        BAT
-   3 │ tac        CAT
-   4 │ god        DOG
-```
-
-In the last two examples,
-the manipulation function arguments were split across multiple lines.
-This is a good way to make manipulations with many operations more readable.
-
-Passing multiple operations to `subset` or `subset!` is an easy way to narrow in
-on a particular row of data.
-
-```julia
-julia> subset(
-           df,
-           :b => ByRow(==(60)),
-           :c => ByRow(contains("at"))
-       ) # rows with 60 and "at"
-1×3 DataFrame
- Row │ a      b      c
-     │ Int64  Int64  String
-─────┼──────────────────────
-   1 │     3     60  cat
-```
-
-Note that all operations within a single manipulation must use the data
-as it existed before the function call,
-i.e. you cannot use newly created columns for subsequent operations
-within the same manipulation.
-
-```julia
-julia> transform(
-           df,
-           [:a, :b] => ByRow(+) => :d,
-           :d => (x -> x ./ 2),
-       ) # requires two separate transformations
-ERROR: ArgumentError: column name :d not found in the data frame; existing most similar names are: :a, :b and :c
-
-julia> new_df = transform(df, [:a, :b] => ByRow(+) => :d)
-4×4 DataFrame
- Row │ a      b      c       d
-     │ Int64  Int64  String  Int64
-─────┼─────────────────────────────
-   1 │     1     50  hat        51
-   2 │     2     50  bat        52
-   3 │     3     60  cat        63
-   4 │     4     60  dog        64
-
-julia> transform!(new_df, :d => (x -> x ./ 2) => :d_2)
-4×5 DataFrame
- Row │ a      b      c       d      d_2
-     │ Int64  Int64  String  Int64  Float64
-─────┼──────────────────────────────────────
-   1 │     1     50  hat        51     25.5
-   2 │     2     50  bat        52     26.0
-   3 │     3     60  cat        63     31.5
-   4 │     4     60  dog        64     32.0
-```
-
-
-## Broadcasting Operation Pairs
-
-[Broadcasting](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting)
-pairs with `.=>` is often a convenient way to generate multiple
-similar `operation`s to be applied within a single manipulation.
-Broadcasting within the `Pair` of an `operation` is no different than
-broadcasting in base Julia.
-The broadcasting `.=>` will be expanded into a vector of pairs
-(`[operation1, operation2, ...]`),
-and this expansion will occur before the manipulation function is invoked.
-Then the manipulation function will use the
-`manipulation_function(dataframe, [operation1, operation2, ...])` method.
-This process will be explained in more detail below.
-
-To illustrate these concepts, let us first examine the `Type` of a basic `Pair`.
-In DataFrames.jl, a symbol, string, or integer
-may be used to select a single column.
-Some `Pair`s with these types are below.
-
-```julia
-julia> typeof(:x => :a)
-Pair{Symbol, Symbol}
-
-julia> typeof("x" => "a")
-Pair{String, String}
-
-julia> typeof(1 => "a")
-Pair{Int64, String}
-```
-
-Any of the `Pair`s above could be used to rename the first column
-of the data frame below to `a`.
-
-```julia
-julia> df = DataFrame(x = 1:3, y = 4:6)
-3×2 DataFrame
- Row │ x      y
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     2      5
-   3 │     3      6
-
-julia> select(df, :x => :a)
-3×1 DataFrame
- Row │ a
-     │ Int64
-─────┼───────
-   1 │     1
-   2 │     2
-   3 │     3
-
-julia> select(df, 1 => "a")
-3×1 DataFrame
- Row │ a
-     │ Int64
-─────┼───────
-   1 │     1
-   2 │     2
-   3 │     3
-```
-
-What should we do if we want to keep and rename both the `x` and `y` columns?
-One option is to supply a `Vector` of operation `Pair`s to `select`.
-`select` will process all of these operations in order.
-
-```julia
-julia> ["x" => "a", "y" => "b"]
-2-element Vector{Pair{String, String}}:
- "x" => "a"
- "y" => "b"
-
-julia> select(df, ["x" => "a", "y" => "b"])
-3×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     2      5
-   3 │     3      6
-```
-
-We can use broadcasting to simplify the syntax above.
-
-```julia
-julia> ["x", "y"] .=> ["a", "b"]
-2-element Vector{Pair{String, String}}:
- "x" => "a"
- "y" => "b"
-
-julia> select(df, ["x", "y"] .=> ["a", "b"])
-3×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     2      5
-   3 │     3      6
-```
-
-Notice that `select` sees the same `Vector{Pair{String, String}}` operation
-argument whether the individual pairs are written out explicitly or
-constructed with broadcasting.
-The broadcasting is applied before the call to `select`.
-
-```julia
-julia> ["x" => "a", "y" => "b"] == (["x", "y"] .=> ["a", "b"])
-true
-```
-
-!!! Note
-      These operation pairs (or vector of pairs) can be given variable names.
-      This is uncommon in practice but could be helpful for intermediate
-      inspection and testing.
-      ```julia
-      df = DataFrame(x = 1:3, y = 4:6)       # create data frame
-      operation = ["x", "y"] .=> ["a", "b"]  # save operation to variable
-      typeof(operation)                      # check type of operation
-      first(operation)                       # check first pair in operation
-      last(operation)                        # check last pair in operation
-      select(df, operation)                  # manipulate `df` with `operation`
-      ```
-
-In Julia,
-a non-vector value broadcast against a vector is repeated in each resulting pair.
-
-```julia
-julia> ["x", "y"] .=> :a    # :a is repeated
-2-element Vector{Pair{String, Symbol}}:
- "x" => :a
- "y" => :a
-
-julia> 1 .=> [:a, :b]       # 1 is repeated
-2-element Vector{Pair{Int64, Symbol}}:
- 1 => :a
- 1 => :b
-```
-
-We can use this fact to easily broadcast an `operation_function` to multiple columns.
-
-```julia
-julia> f(x) = 2 * x
-f (generic function with 1 method)
-
-julia> ["x", "y"] .=> f  # f is repeated
-2-element Vector{Pair{String, typeof(f)}}:
- "x" => f
- "y" => f
-
-julia> select(df, ["x", "y"] .=> f)  # apply f with automatic column renaming
-3×2 DataFrame
- Row │ x_f    y_f
-     │ Int64  Int64
-─────┼──────────────
-   1 │     2      8
-   2 │     4     10
-   3 │     6     12
-
-julia> ["x", "y"] .=> f .=> ["a", "b"]  # f is repeated
-2-element Vector{Pair{String, Pair{typeof(f), String}}}:
- "x" => (f => "a")
- "y" => (f => "b")
-
-julia> select(df, ["x", "y"] .=> f .=> ["a", "b"])  # apply f with manual column renaming
-3×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     2      8
-   2 │     4     10
-   3 │     6     12
-```
-
-A renaming function can be applied to multiple columns in the same way.
-It will also be repeated in each operation `Pair`.
-
-```julia
-julia> newname(s::String) = s * "_new"
-newname (generic function with 1 method)
-
-julia> ["x", "y"] .=> f .=> newname  # both f and newname are repeated
-2-element Vector{Pair{String, Pair{typeof(f), typeof(newname)}}}:
- "x" => (f => newname)
- "y" => (f => newname)
-
-julia> select(df, ["x", "y"] .=> f .=> newname)  # apply f then rename column with newname
-3×2 DataFrame
- Row │ x_new  y_new
-     │ Int64  Int64
-─────┼──────────────
-   1 │     2      8
-   2 │     4     10
-   3 │     6     12
-```
-
-You can see from the type output above
-that a three-element pair does not actually exist.
-A `Pair` (as the name implies) can only contain two elements.
-Thus, `:x => :y => :z` becomes a nested `Pair`,
-where `:x` is the first element and points to the `Pair` `:y => :z`,
-which is the second element.
-
-```julia
-julia> p = :x => :y => :z
-:x => (:y => :z)
-
-julia> p[1]
-:x
-
-julia> p[2]
-:y => :z
-
-julia> p[2][1]
-:y
-
-julia> p[2][2]
-:z
-
-julia> p[3] # there is no index 3 for a pair
-ERROR: BoundsError: attempt to access Pair{Symbol, Pair{Symbol, Symbol}} at index [3]
-```
-
-In the previous examples, the source columns have been individually selected.
-When broadcasting multiple columns to the same function,
-often similarities in the column names or position can be exploited to avoid
-tedious selection.
-Consider a data frame with temperature data at three different locations
-taken over time.
-```julia
-julia> df = DataFrame(Time = 1:4,
-                      Temperature1 = [20, 23, 25, 28],
-                      Temperature2 = [33, 37, 41, 44],
-                      Temperature3 = [15, 10, 4, 0])
-4×4 DataFrame
- Row │ Time   Temperature1  Temperature2  Temperature3
-     │ Int64  Int64         Int64         Int64
-─────┼─────────────────────────────────────────────────
-   1 │     1            20            33            15
-   2 │     2            23            37            10
-   3 │     3            25            41             4
-   4 │     4            28            44             0
-```
-
-To convert all of the temperature data in one transformation,
-we just need to define a conversion function and broadcast
-it to all of the "Temperature" columns.
-
-```julia
-julia> celsius_to_kelvin(x) = x + 273
-celsius_to_kelvin (generic function with 1 method)
-
-julia> transform(
-           df,
-           Cols(r"Temp") .=> ByRow(celsius_to_kelvin),
-           renamecols = false
-       )
-4×4 DataFrame
- Row │ Time   Temperature1  Temperature2  Temperature3
-     │ Int64  Int64         Int64         Int64
-─────┼─────────────────────────────────────────────────
-   1 │     1           293           306           288
-   2 │     2           296           310           283
-   3 │     3           298           314           277
-   4 │     4           301           317           273
-```
-Or, simultaneously changing the column names:
-
-```julia
-julia> rename_function(s) = "Temperature $(last(s)) (K)"
-rename_function (generic function with 1 method)
-
-julia> select(
-           df,
-           "Time",
-           Cols(r"Temp") .=> ByRow(celsius_to_kelvin) .=> rename_function
-       )
-4×4 DataFrame
- Row │ Time   Temperature 1 (K)  Temperature 2 (K)  Temperature 3 (K)
-     │ Int64  Int64              Int64              Int64
-─────┼────────────────────────────────────────────────────────────────
-   1 │     1                293                306                288
-   2 │     2                296                310                283
-   3 │     3                298                314                277
-   4 │     4                301                317                273
-```
-
-!!! Note Notes
-      * `Not("Time")` or `2:4` would have been equally good choices for `source_column_selector` in the above operations.
-      * Don't forget `ByRow` if your function is to be applied to elements rather than entire column vectors.
-      Without `ByRow`, the manipulations above would have thrown
-      `ERROR: MethodError: no method matching +(::Vector{Int64}, ::Int64)`.
-      * Regular expression (`r""`) and `:` `source_column_selectors`
-      must be wrapped in `Cols` to be properly broadcasted
-      because otherwise the broadcasting occurs before the expression is expanded into a vector of matches.
-
-You could also broadcast different columns to different functions
-by supplying a vector of functions.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> f1(x) = x .+ 1
-f1 (generic function with 1 method)
-
-julia> f2(x) = x ./ 10
-f2 (generic function with 1 method)
-
-julia> transform(df, [:a, :b] .=> [f1, f2])
-4×4 DataFrame
- Row │ a      b      a_f1   b_f2
-     │ Int64  Int64  Int64  Float64
-─────┼──────────────────────────────
-   1 │     1      5      2      0.5
-   2 │     2      6      3      0.6
-   3 │     3      7      4      0.7
-   4 │     4      8      5      0.8
-```
-
-However, this form is not much more convenient than supplying
-multiple individual operations.
-
-```julia
-julia> transform(df, [:a => f1, :b => f2]) # same manipulation as previous
-4×4 DataFrame
- Row │ a      b      a_f1   b_f2
-     │ Int64  Int64  Int64  Float64
-─────┼──────────────────────────────
-   1 │     1      5      2      0.5
-   2 │     2      6      3      0.6
-   3 │     3      7      4      0.7
-   4 │     4      8      5      0.8
-```
-
-Perhaps more useful for broadcasting syntax
-is to apply multiple functions to multiple columns
-by changing the vector of functions to a 1-by-x matrix of functions.
-(Recall that a list, a vector, or a matrix of operation pairs are all valid
-for passing to the manipulation functions.)
-
-```julia
-julia> [:a, :b] .=> [f1 f2] # No comma `,` between f1 and f2
-2×2 Matrix{Pair{Symbol}}:
- :a=>f1  :a=>f2
- :b=>f1  :b=>f2
-
-julia> transform(df, [:a, :b] .=> [f1 f2]) # No comma `,` between f1 and f2
-4×6 DataFrame
- Row │ a      b      a_f1   b_f1   a_f2     b_f2
-     │ Int64  Int64  Int64  Int64  Float64  Float64
-─────┼──────────────────────────────────────────────
-   1 │     1      5      2      6      0.1      0.5
-   2 │     2      6      3      7      0.2      0.6
-   3 │     3      7      4      8      0.3      0.7
-   4 │     4      8      5      9      0.4      0.8
-```
-
-In this way, every combination of selected columns and functions will be applied.
-
-Pair broadcasting is a simple but powerful tool
-that can be used in any of the manipulation functions listed under
-[Basic Usage of Manipulation Functions](@ref).
-Experiment for yourself to discover other useful operations.
-
-## Additional Resources
-More details and examples of operation pair syntax can be found in
-[this blog post](https://bkamins.github.io/julialang/2020/12/24/minilanguage.html).
-(The official wording describing the syntax has changed since the blog post was written,
-but the examples are still illustrative.
-The operation pair syntax is sometimes referred to as the DataFrames.jl mini-language
-or Domain-Specific Language.)
-
-For additional practice,
-an interactive tutorial is provided on a variety of introductory topics
-by the DataFrames.jl package author
-[here](https://github.com/bkamins/Julia-DataFrames-Tutorial).
-
-
-For additional syntax niceties,
-many users find the [Chain.jl](https://github.com/jkrumbiegel/Chain.jl)
-and [DataFramesMeta.jl](https://github.com/JuliaData/DataFramesMeta.jl)
-packages useful
-to help simplify manipulations that may be tedious with operation pairs alone.
\ No newline at end of file

From 043605a15647ba24b38afa161f76bb67d3fa5f45 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Fri, 13 Oct 2023 17:17:17 -0400
Subject: [PATCH 24/29] Add more comments

---
 docs/src/man/basics.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 55937b849..19d6e8492 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -3075,19 +3075,19 @@ The approach with dot syntax is very versatile
 since the data getting, mathematics, and data setting can be separate steps.
 
 ```julia
-julia> df.x
+julia> df.x  # dot syntax returns a vector
 3-element Vector{Int64}:
  1
  2
  3
 
-julia> v = df.x + df.y
+julia> v = df.x + df.y  # assign mathematical result to a vector `v`
 3-element Vector{Int64}:
  5
  7
  9
 
-julia> df.z = v
+julia> df.z = v  # place `v` into the data frame `df` with the column name `z`
 3-element Vector{Int64}:
  5
  7

From 26b503e3b9baf42b06d78a8e6a56c8fb17150630 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Fri, 13 Oct 2023 17:22:59 -0400
Subject: [PATCH 25/29] Add link to @with macro

---
 docs/src/man/basics.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 19d6e8492..5c4df8bca 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -3133,6 +3133,9 @@ julia> df  # see that the previous expression updated the data frame `df`
 
 One benefit of using manipulation functions is that
 the name of the data frame only needs to be written once.
+(The `@with` macro from the
+[DataFramesMeta](https://juliadata.github.io/DataFramesMeta.jl/stable/#@with) package
+can somewhat relieve this issue in the other approaches.)
 
 Setup:
 

From 679f65f4723c811bdb05942c4f1e26cd93f2d87f Mon Sep 17 00:00:00 2001
From: Nathan Boyer <65452054+nathanrboyer@users.noreply.github.com>
Date: Sat, 14 Oct 2023 21:42:47 -0400
Subject: [PATCH 26/29] Delete redundant expression

---
 docs/src/man/basics.md | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 5c4df8bca..cac082b31 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -3041,12 +3041,6 @@ julia> transform!(df, [:x, :y] => (+) => :z)
 Dot Syntax:
 
 ```julia
-julia> df.x  # dot syntax returns a vector
-3-element Vector{Int64}:
- 1
- 2
- 3
-
 julia> df.z = df.x + df.y
 3-element Vector{Int64}:
  5

From 79a11711b713ae777638c25f562b26679e9ef8bd Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Mon, 16 Oct 2023 16:20:38 -0400
Subject: [PATCH 27/29] Clean up new section and delete with reference

---
 docs/src/man/basics.md | 135 +++++++++++++++++++++++++++++++++--------
 1 file changed, 110 insertions(+), 25 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index cac082b31..6f2427c56 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -3002,9 +3002,7 @@ to help simplify manipulations that may be tedious with operation pairs alone.
 After that deep dive into [Manipulation Functions](@ref),
 it is a good idea to review the alternative approaches covered in
 [Getting and Setting Data in a Data Frame](@ref).
-Let us compare the two approaches with a few examples.
-
-### Convenience
+Let us compare the approaches with a few examples.
 
 For simple operations,
 often getting/setting data with dot syntax
@@ -3012,10 +3010,10 @@ is simpler than the equivalent data frame manipulation.
 Here we will add the two columns of our data frame together
 and place the result in a new third column.
 
-Setup:
+**Setup:**
 
 ```julia
-julia> df = DataFrame(x = 1:3, y = 4:6)  # define data frame
+julia> df = DataFrame(x = 1:3, y = 4:6)  # define a data frame
 3×2 DataFrame
  Row │ x      y
      │ Int64  Int64
@@ -3025,7 +3023,7 @@ julia> df = DataFrame(x = 1:3, y = 4:6)  # define data frame
    3 │     3      6
 ```
 
-Manipulation:
+**Manipulation:**
 
 ```julia
 julia> transform!(df, [:x, :y] => (+) => :z)
@@ -3038,7 +3036,7 @@ julia> transform!(df, [:x, :y] => (+) => :z)
    3 │     3      6      9
 ```
 
-Dot Syntax:
+**Dot Syntax:**
 
 ```julia
 julia> df.z = df.x + df.y
@@ -3088,12 +3086,19 @@ julia> df.z = v  # place `v` into the data frame `df` with the column name `z`
  9
 ```
 
-One downside to dot syntax is that the column name must be explicitly written in the code.
-Indexing syntax can perform a similar operation with dynamic column names.
-(Manipulation functions can also work with dynamic column names as will be shown in the next example.)
+However, one way in which dot syntax is less versatile
+is that the column name must be explicitly written in the code.
+Indexing syntax is a good alternative in these cases,
+and it is only slightly longer to write than dot syntax.
+Both indexing syntax and manipulation functions can operate on dynamic column names
+stored in variables.
+
+**Setup:**
+
+Imagine this setup data was read from a file and/or entered by a user at runtime.
 
 ```julia
-julia> df = DataFrame("My First Column" => 1:3, "My Second Column" => 4:6)  # define data frame
+julia> df = DataFrame("My First Column" => 1:3, "My Second Column" => 4:6)  # define a data frame
 3×2 DataFrame
  Row │ My First Column  My Second Column
      │ Int64            Int64
@@ -3103,12 +3108,18 @@ julia> df = DataFrame("My First Column" => 1:3, "My Second Column" => 4:6)  # de
    3 │               3                 6
 
 julia> c1 = "My First Column"; c2 = "My Second Column"; c3 = "My Third Column";  # define column names
+```
 
-# Imagine the above data was read from a file or entered by a user at runtime.
+**Dot Syntax:**
 
-julia> df.c1  # dot syntax expects an explicit column name and cannot be used
+```julia
+julia> df.c1  # dot syntax expects a literal column name, so it cannot access a column whose name is stored in a variable
 ERROR: ArgumentError: column name :c1 not found in the data frame
+```
 
+**Indexing:**
+
+```julia
 julia> df[:, c3] = df[:, c1] + df[:, c2]  # access columns with names stored in variables
 3-element Vector{Int64}:
  5
@@ -3125,19 +3136,30 @@ julia> df  # see that the previous expression updated the data frame `df`
    3 │               3                 6                9
 ```
 
-One benefit of using manipulation functions is that
-the name of the data frame only needs to be written once.
-(The `@with` macro from the
-[DataFramesMeta](https://juliadata.github.io/DataFramesMeta.jl/stable/#@with) package
-can somewhat relieve this issue in the other approaches.)
+**Manipulation:**
 
-Setup:
+```julia
+julia> transform!(df, [c1, c2] => (+) => c3)  # access columns with names stored in variables
+3×3 DataFrame
+ Row │ My First Column  My Second Column  My Third Column
+     │ Int64            Int64             Int64
+─────┼────────────────────────────────────────────────────
+   1 │               1                 4                5
+   2 │               2                 5                7
+   3 │               3                 6                9
+```
+
+Additionally, manipulation functions only require
+the name of the data frame to be written once.
+This can be helpful when dealing with long variable and column names.
+
+**Setup:**
 
 ```julia
 julia> my_very_long_data_frame_name = DataFrame(
            "My First Column" => 1:3,
            "My Second Column" => 4:6
-       )  # define data frame
+       )  # define a data frame
 3×2 DataFrame
  Row │ My First Column  My Second Column
      │ Int64            Int64
@@ -3149,7 +3171,7 @@ julia> my_very_long_data_frame_name = DataFrame(
 julia> c1 = "My First Column"; c2 = "My Second Column"; c3 = "My Third Column";  # define column names
 ```
 
-Manipulation:
+**Manipulation:**
 
 ```julia
 
@@ -3163,7 +3185,7 @@ julia> transform!(my_very_long_data_frame_name, [c1, c2] => (+) => c3)
    3 │               3                 6                9
 ```
 
-Indexing:
+**Indexing:**
 
 ```julia
 julia> my_very_long_data_frame_name[:, c3] = my_very_long_data_frame_name[:, c1] + my_very_long_data_frame_name[:, c2]
@@ -3182,7 +3204,70 @@ julia> df  # see that the previous expression updated the data frame `df`
    3 │               3                 6                9
 ```
 
-### Speed
+Another benefit of manipulation functions and indexing over dot syntax is that
+it is easier to operate on a subset of columns.
+
+**Setup:**
+
+```julia
+julia> df = DataFrame(x = 1:3, y = 4:6, z = 7:9)  # define a data frame
+3×3 DataFrame
+ Row │ x      y      z
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      7
+   2 │     2      5      8
+   3 │     3      6      9
+```
+
+**Dot Syntax:**
+
+```julia
+julia> df.Not(:x)  # will not work; requires a literal column name
+ERROR: ArgumentError: column name :Not not found in the data frame
+```
+
+**Indexing:**
+
+```julia
+julia> df[:, :y_z_max] = maximum.(eachrow(df[:, Not(:x)]))  # find the maximum of each row, excluding column `x`
+3-element Vector{Int64}:
+ 7
+ 8
+ 9
+
+julia> df  # see that the previous expression updated the data frame `df`
+3×4 DataFrame
+ Row │ x      y      z      y_z_max
+     │ Int64  Int64  Int64  Int64
+─────┼──────────────────────────────
+   1 │     1      4      7        7
+   2 │     2      5      8        8
+   3 │     3      6      9        9
+```
+
+**Manipulation:**
+
+```julia
+julia> transform!(df, Not(:x) => ByRow(max))  # same operation, starting again from the `df` defined in Setup
+3×4 DataFrame
+ Row │ x      y      z      y_z_max
+     │ Int64  Int64  Int64  Int64
+─────┼──────────────────────────────
+   1 │     1      4      7        7
+   2 │     2      5      8        8
+   3 │     3      6      9        9
+```
+
+Moreover, indexing can operate on a subset of columns *and* rows.
+
+**Indexing:**
+
+```julia
+julia> y_z_max_row3 = maximum(df[3, Not(:x)])  # find the maximum value in row 3, excluding column `x`
+9
+```
 
-TODO: Compare speed, memory, and view options (@view, !, :, copycols=false).
-(May need someone else to write this part unless I do more studying.)
+Hopefully this small comparison has illustrated some of the benefits and drawbacks
+of the various syntaxes available in DataFrames.jl.
+The best syntax to use depends on the situation.

From 82d935c39a15949c464aacdad98629df145ca697 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Thu, 30 May 2024 09:50:11 -0400
Subject: [PATCH 28/29] Fix admonitions

---
 docs/src/man/basics.md | 132 +++++++++++++++++++++--------------------
 1 file changed, 69 insertions(+), 63 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index d51038c2d..5449fb0e9 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -1752,11 +1752,12 @@ a `Not`, `Between`, `All`, or `Cols` expression,
 or a `:`.
 See the [Indexing](@ref) API for the full list of possible values with references.
 
-!!! Note
-      The Julia parser sometimes prevents `:` from being used by itself.
-      If you get
-      `ERROR: syntax: whitespace not allowed after ":" used for quoting`,
-      try using `All()`, `Cols(:)`, or `(:)` instead to select all columns.
+!!! note
+
+    The Julia parser sometimes prevents `:` from being used by itself.
+    If you get
+    `ERROR: syntax: whitespace not allowed after ":" used for quoting`,
+    try using `All()`, `Cols(:)`, or `(:)` instead to select all columns.
 
 ```julia
 julia> df = DataFrame(
@@ -1831,14 +1832,15 @@ julia> subset(df2, [:minor, :male])
    1 │ Jimmy    true  true
 ```
 
-!!! Note
-      Using `Symbol` in `source_column_selector` will perform slightly faster than using `String`.
-      However, `String` is convenient when column names contain spaces.
+!!! note
+
+    Using `Symbol` in `source_column_selector` will perform slightly faster than using `String`.
+    However, `String` is convenient when column names contain spaces.
 
-      All elements of `source_column_selector` must be the same type
-      (unless wrapped in `Cols`),
-      e.g. `subset(df2, [:minor, "male"])` will error
-      since `Symbol` and `String` are used simultaneously.)
+    All elements of `source_column_selector` must be the same type
+    (unless wrapped in `Cols`),
+    e.g. `subset(df2, [:minor, "male"])` will error
+    since `Symbol` and `String` are used simultaneously.
 
 #### `operation_function`
 Inside an `operation` pair, `operation_function` is a function
@@ -1996,7 +1998,8 @@ julia> subset(df, :b => ByRow(<(5))) # shorter version of the previous
    2 │     3      4
 ```
 
-!!! Note
+!!! note
+
     `operation_functions` within `subset` or `subset!` function calls
     must return a Boolean vector.
     `true` elements in the Boolean vector will determine
@@ -2349,31 +2352,31 @@ julia> transform(df, :a => (x -> 10 .* x) => (s -> "new_" * s)) # with anonymous
    4 │     4      8     40
 ```
 
-!!! Note
-      It is a good idea to wrap anonymous functions in parentheses
-      to avoid the `=>` operator accidently becoming part of the anonymous function.
-      The examples above do not work correctly without the parentheses!
-      ```julia
-      julia> transform(df, :a => x -> 10 .* x => add_prefix)  # Not what we wanted!
-      4×3 DataFrame
-       Row │ a      b      a_function
-           │ Int64  Int64  Pair…
-      ─────┼────────────────────────────────────────────
-         1 │     1      5  [10, 20, 30, 40]=>add_prefix
-         2 │     2      6  [10, 20, 30, 40]=>add_prefix
-         3 │     3      7  [10, 20, 30, 40]=>add_prefix
-         4 │     4      8  [10, 20, 30, 40]=>add_prefix
-
-      julia> transform(df, :a => x -> 10 .* x => s -> "new_" * s)  # Not what we wanted!
-      4×3 DataFrame
-       Row │ a      b      a_function
-           │ Int64  Int64  Pair…
-      ─────┼─────────────────────────────────────
-         1 │     1      5  [10, 20, 30, 40]=>#18
-         2 │     2      6  [10, 20, 30, 40]=>#18
-         3 │     3      7  [10, 20, 30, 40]=>#18
-         4 │     4      8  [10, 20, 30, 40]=>#18
-      ```
+!!! note
+
+    It is a good idea to wrap anonymous functions in parentheses
+    to avoid the `=>` operator accidentally becoming part of the anonymous function.
+    The examples above do not work correctly without the parentheses!
+    ```julia
+    julia> transform(df, :a => x -> 10 .* x => add_prefix)  # Not what we wanted!
+    4×3 DataFrame
+     Row │ a      b      a_function
+         │ Int64  Int64  Pair…
+    ─────┼────────────────────────────────────────────
+       1 │     1      5  [10, 20, 30, 40]=>add_prefix
+       2 │     2      6  [10, 20, 30, 40]=>add_prefix
+       3 │     3      7  [10, 20, 30, 40]=>add_prefix
+       4 │     4      8  [10, 20, 30, 40]=>add_prefix
+    julia> transform(df, :a => x -> 10 .* x => s -> "new_" * s)  # Not what we wanted!
+    4×3 DataFrame
+     Row │ a      b      a_function
+         │ Int64  Int64  Pair…
+    ─────┼─────────────────────────────────────
+       1 │     1      5  [10, 20, 30, 40]=>#18
+       2 │     2      6  [10, 20, 30, 40]=>#18
+       3 │     3      7  [10, 20, 30, 40]=>#18
+       4 │     4      8  [10, 20, 30, 40]=>#18
+    ```
 
 A renaming function will not work in the
 `source_column_selector => new_column_names` operation form
@@ -2481,11 +2484,12 @@ julia> transform(df, :data => AsTable) # keeps names from named tuples
    2 │ (a = 3, b = 4)      3      4
 ```
 
-!!! Note
-      To pack multiple columns into a single column of `NamedTuple`s
-      (reverse of the above operation)
-      apply the `identity` function `ByRow`, e.g.
-      `transform(df, AsTable([:a, :b]) => ByRow(identity) => :data)`.
+!!! note
+
+    To pack multiple columns into a single column of `NamedTuple`s
+    (reverse of the above operation)
+    apply the `identity` function wrapped in `ByRow`, e.g.
+    `transform(df, AsTable([:a, :b]) => ByRow(identity) => :data)`.
 
 Renaming functions also work for multi-column transformations,
 but they must operate on a vector of strings.
@@ -2756,18 +2760,19 @@ julia> ["x" => "a", "y" => "b"] == (["x", "y"] .=> ["a", "b"])
 true
 ```
 
-!!! Note
-      These operation pairs (or vector of pairs) can be given variable names.
-      This is uncommon in practice but could be helpful for intermediate
-      inspection and testing.
-      ```julia
-      df = DataFrame(x = 1:3, y = 4:6)       # create data frame
-      operation = ["x", "y"] .=> ["a", "b"]  # save operation to variable
-      typeof(operation)                      # check type of operation
-      first(operation)                       # check first pair in operation
-      last(operation)                        # check last pair in operation
-      select(df, operation)                  # manipulate `df` with `operation`
-      ```
+!!! note
+
+    These operation pairs (or vector of pairs) can be given variable names.
+    This is uncommon in practice but could be helpful for intermediate
+    inspection and testing.
+    ```julia
+    df = DataFrame(x = 1:3, y = 4:6)       # create data frame
+    operation = ["x", "y"] .=> ["a", "b"]  # save operation to variable
+    typeof(operation)                      # check type of operation
+    first(operation)                       # check first pair in operation
+    last(operation)                        # check last pair in operation
+    select(df, operation)                  # manipulate `df` with `operation`
+    ```
 
 In Julia,
 a non-vector broadcasted with a vector will be repeated in each resultant pair element.
@@ -2932,14 +2937,15 @@ julia> select(
    4 │     4                301                317                273
 ```
 
-!!! Note Notes
-      * `Not("Time")` or `2:4` would have been equally good choices for `source_column_selector` in the above operations.
-      * Don't forget `ByRow` if your function is to be applied to elements rather than entire column vectors.
-      Without `ByRow`, the manipulations above would have thrown
-      `ERROR: MethodError: no method matching +(::Vector{Int64}, ::Int64)`.
-      * Regular expression (`r""`) and `:` `source_column_selectors`
-      must be wrapped in `Cols` to be properly broadcasted
-      because otherwise the broadcasting occurs before the expression is expanded into a vector of matches.
+!!! note "Notes"
+
+    * `Not("Time")` or `2:4` would have been equally good choices for `source_column_selector` in the above operations.
+    * Don't forget `ByRow` if your function is to be applied to elements rather than entire column vectors.
+    Without `ByRow`, the manipulations above would have thrown
+    `ERROR: MethodError: no method matching +(::Vector{Int64}, ::Int64)`.
+    * Regular expression (`r""`) and `:` `source_column_selectors`
+    must be wrapped in `Cols` to be broadcast properly
+    because otherwise the broadcasting occurs before the expression is expanded into a vector of matches.
 
 You could also broadcast different columns to different functions
 by supplying a vector of functions.

From d9864bac42ffe6d50bce57271de3f507ce07306e Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Thu, 30 May 2024 12:03:34 -0400
Subject: [PATCH 29/29] Fix manipulation function reference

---
 docs/src/man/basics.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 5449fb0e9..242bab3f5 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -3020,7 +3020,7 @@ In this way, every combination of selected columns and functions will be applied
 
 Pair broadcasting is a simple but powerful tool
 that can be used in any of the manipulation functions listed under
-[Basic Usage of Manipulation Functions](@ref).
+[Manipulation Functions](@ref).
 Experiment for yourself to discover other useful operations.
 
 ### Additional Resources