diff --git a/docs/src/man/split_apply_combine.md b/docs/src/man/split_apply_combine.md
index 8b06ded..6e0f2d3 100644
--- a/docs/src/man/split_apply_combine.md
+++ b/docs/src/man/split_apply_combine.md
@@ -2,7 +2,7 @@
 
 Many data analysis tasks involve splitting a data set into groups, applying some functions to each of the groups and then combining the results. A standardized framework for handling this sort of computation is described in the paper, The Split-Apply-Combine Strategy for Data Analysis \<\>, written by Hadley Wickham.
 
-The DataTables package supports the Split-Apply-Combine strategy through the `by` function, which takes in three arguments: (1) a DataTable, (2) a column to split the DataTable on, and (3) a function or expression to apply to each subset of the DataTable.
+The DataTables package supports the Split-Apply-Combine strategy through the `by` function, which takes in three arguments: (1) a DataTable, (2) one or more columns to split the DataTable on, and (3) a function or expression to apply to each subset of the DataTable.
 
 We show several examples of the `by` function applied to the `iris` dataset below:
 
@@ -23,7 +23,7 @@ by(iris, :Species) do dt
 end
 ```
 
-A second approach to the Split-Apply-Combine strategy is implemented in the `aggregate` function, which also takes three arguments: (1) a DataTable, (2) a column (or columns) to split the DataTable on, and a (3) function (or several functions) that are used to compute a summary of each subset of the DataTable. Each function is applied to each column, that was not used to split the DataTable, creating new columns of the form `$name_$function` e.g. `SepalLength_mean`. Anonymous functions and expressions that do not have a name will be called `λ1`.
+A second approach to the Split-Apply-Combine strategy is implemented in the `aggregate` function, which also takes three arguments: (1) a DataTable, (2) one or more columns to split the DataTable on, and (3) one or more functions that are used to compute a summary of each subset of the DataTable. Each function is applied to each column that was not used to split the DataTable, creating new columns of the form `$name_$function`, e.g. `SepalLength_mean`. Anonymous functions and expressions that do not have a name will be called `λ1`.
 
 We show several examples of the `aggregate` function applied to the `iris` dataset below: