This repository has been archived by the owner on May 5, 2019. It is now read-only.

"One or more (columns|functions)" in S-A-C manual
ararslan committed Mar 6, 2017
1 parent 20c71d6 commit 73b5401
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions docs/src/man/split_apply_combine.md
@@ -2,7 +2,7 @@

Many data analysis tasks involve splitting a data set into groups, applying some functions to each of the groups and then combining the results. A standardized framework for handling this sort of computation is described in the paper, The Split-Apply-Combine Strategy for Data Analysis \<<http://www.jstatsoft.org/v40/i01>\>, written by Hadley Wickham.

-The DataTables package supports the Split-Apply-Combine strategy through the `by` function, which takes in three arguments: (1) a DataTable, (2) a column to split the DataTable on, and (3) a function or expression to apply to each subset of the DataTable.
+The DataTables package supports the Split-Apply-Combine strategy through the `by` function, which takes in three arguments: (1) a DataTable, (2) one or more columns to split the DataTable on, and (3) a function or expression to apply to each subset of the DataTable.
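
As a minimal sketch of splitting on more than one column, the call below groups a small made-up table by two columns and counts the rows in each group. The table `dt` and its column names `Group`, `Kind`, and `Value` are hypothetical, and the example assumes a DataTables version matching this manual is installed.

```julia
using DataTables

# Hypothetical toy table; the names are illustrative and not part of the iris dataset.
dt = DataTable(Group = ["a", "a", "b", "b"],
               Kind  = ["x", "y", "x", "x"],
               Value = [1.0, 2.0, 3.0, 4.0])

# Split on two columns by passing a vector of column names,
# then return a one-row DataTable holding the number of rows in each subset.
by(dt, [:Group, :Kind]) do sub
    DataTable(N = size(sub, 1))
end
```

Passing a single symbol such as `:Group` instead of the vector should split on that column alone.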

We show several examples of the `by` function applied to the `iris` dataset below:

@@ -23,7 +23,7 @@ by(iris, :Species) do dt
end
```

-A second approach to the Split-Apply-Combine strategy is implemented in the `aggregate` function, which also takes three arguments: (1) a DataTable, (2) a column (or columns) to split the DataTable on, and a (3) function (or several functions) that are used to compute a summary of each subset of the DataTable. Each function is applied to each column, that was not used to split the DataTable, creating new columns of the form `$name_$function` e.g. `SepalLength_mean`. Anonymous functions and expressions that do not have a name will be called `λ1`.
+A second approach to the Split-Apply-Combine strategy is implemented in the `aggregate` function, which also takes three arguments: (1) a DataTable, (2) one or more columns to split the DataTable on, and (3) one or more functions that are used to compute a summary of each subset of the DataTable. Each function is applied to each column, that was not used to split the DataTable, creating new columns of the form `$name_$function` e.g. `SepalLength_mean`. Anonymous functions and expressions that do not have a name will be called `λ1`.
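
The "one or more columns" and "one or more functions" wording can be illustrated with a small sketch. The table `dt` and its column names are hypothetical, the choice of `length` and `first` as summary functions is only for illustration, and the example assumes a DataTables version matching this manual is installed.

```julia
using DataTables

# Hypothetical toy table; the names are illustrative and not part of the iris dataset.
dt = DataTable(Group = ["a", "a", "b", "b"],
               Kind  = ["x", "y", "x", "x"],
               Value = [1.0, 2.0, 3.0, 4.0])

# Split on two columns and apply two functions to every remaining column.
# Only `Value` is left after grouping, so each function is applied to it once per group.
aggregate(dt, [:Group, :Kind], [length, first])
```

Following the `$name_$function` rule described above, the summary columns would be expected to carry names such as `Value_length` and `Value_first`.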

We show several examples of the `aggregate` function applied to the `iris` dataset below:
