Add transformation and renaming to select and select! #2080

Merged
merged 51 commits into from Mar 19, 2020
Changes from 37 commits
Commits
77f4623
add support for transforms in select and define transform and transform!
bkamins Jan 6, 2020
147a427
fix SubDataFrame select signature
bkamins Jan 6, 2020
11fd0a2
fix problem in autogeneration of column names
bkamins Jan 6, 2020
6fa4f84
add documentation of automatic generation of column names
bkamins Jan 7, 2020
d501fb4
improvements after code review
bkamins Jan 8, 2020
7053e5b
updates after a code review
bkamins Jan 9, 2020
ec834e2
correct variable name
bkamins Jan 10, 2020
6c76aca
minor fix
bkamins Jan 10, 2020
e59d129
fix select for SubDataFrame
bkamins Jan 10, 2020
dee8ac7
improved multiple column transformation
bkamins Jan 10, 2020
4a8a40b
improve select for SubDataFrame
bkamins Jan 10, 2020
f04a549
Apply suggestions from code review
bkamins Jan 10, 2020
bbc06f4
fixes after code review
bkamins Jan 10, 2020
498d9df
fixes from code review
bkamins Jan 12, 2020
fa5a1f1
disallow duplicates in single column selection
bkamins Jan 15, 2020
cd8f41b
fix select for SubDataFrame to avoid duplicate ColumnIndex selections
bkamins Jan 15, 2020
3c7149b
Apply suggestions from code review
bkamins Jan 15, 2020
807adfc
fixes after the code review
bkamins Jan 16, 2020
aa7746b
change default behavior to whole-column and add Row
bkamins Feb 1, 2020
7524706
fix typo
bkamins Feb 4, 2020
3d77f6b
add funname to Row
bkamins Feb 5, 2020
e560a14
merge normalize_selection methods
bkamins Feb 5, 2020
9caab2d
make ByRow a functor
bkamins Feb 11, 2020
db8f103
Update src/abstractdataframe/selection.jl
bkamins Feb 14, 2020
df6795a
disallow transformation of 0 columns
bkamins Feb 14, 2020
ba1feb9
disallow 0 columns only in ByRow
bkamins Feb 15, 2020
0c30db7
Merge branch 'master' into flexible_select
bkamins Feb 15, 2020
6d03a1c
sync with Tables 1.0
bkamins Feb 15, 2020
34aa4cd
fix documentation
bkamins Feb 15, 2020
a03afd7
fix missing parenthesis
bkamins Feb 16, 2020
d4fced0
fix method signature
bkamins Feb 17, 2020
c712088
export ByRow
bkamins Feb 17, 2020
9b5c027
auto-splat (no docs update)
bkamins Feb 22, 2020
8e73abc
fix @views
bkamins Feb 22, 2020
930875e
move to broadcasting in ByRow
bkamins Feb 26, 2020
ab4103a
Apply suggestions from code review
bkamins Feb 28, 2020
4289c48
update implementation
bkamins Feb 28, 2020
6341ccc
reorganize tests
bkamins Feb 28, 2020
09e632e
first round of tests
bkamins Feb 28, 2020
df59216
disallow AbstractDataFrame, NamedTuple, DataFrameRow, and AbstractMat…
bkamins Feb 29, 2020
d932b05
fix test
bkamins Feb 29, 2020
688b077
clean up transformation implementation
bkamins Mar 1, 2020
c34ee72
further sanitizing select rules and more code explanations
bkamins Mar 1, 2020
b818d57
fix comments
bkamins Mar 1, 2020
08d4043
tests of disallowed values
bkamins Mar 2, 2020
49dff0e
finalize tests
bkamins Mar 2, 2020
d685576
fix Julia 1.0 tests
bkamins Mar 2, 2020
35f8996
stop doing pessimistic copy when copycols=true
bkamins Mar 12, 2020
78b492d
Apply suggestions from code review
bkamins Mar 18, 2020
52e690d
fixes after code review
bkamins Mar 18, 2020
20642c5
improve docstring
bkamins Mar 19, 2020
4 changes: 4 additions & 0 deletions docs/src/lib/types.md
@@ -57,6 +57,9 @@ The `RepeatedVector` and `StackedVector` types are subtypes of `AbstractVector`
with the exception that they are read only. Note that they are not exported and should not be constructed directly,
but they are columns of a `DataFrame` returned by `stack` with `view=true`.

The `ByRow` type is a special type used for selection operations to signal that the wrapped function should be applied
to each element (row) of the selection.
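
As a rough illustration of the distinction described above, the sketch below contrasts a whole-column transformation with a `ByRow` one. It assumes a DataFrames.jl release that includes this PR; the data frame and the column names `running` and `x_squared` are made up for the example.

```julia
using DataFrames

# Hypothetical data, not taken from the PR.
df = DataFrame(x = [1, 2, 3])

# Without ByRow the function receives the whole column as one vector,
# which is what cumsum needs.
whole = select(df, :x => cumsum => :running)

# With ByRow the wrapped function is applied to each element of the column.
byrow = select(df, :x => ByRow(x -> x^2) => :x_squared)
```

`cumsum` only makes sense on the full column, while squaring is naturally an element-wise operation, so the two forms are not interchangeable here.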

## [The design of handling of columns of a `DataFrame`](@id man-columnhandling)

When a `DataFrame` is constructed columns are copied by default. You can disable
@@ -111,6 +114,7 @@ without caution because:

```@docs
AbstractDataFrame
ByRow
DataFrame
DataFrameRow
GroupedDataFrame
28 changes: 26 additions & 2 deletions docs/src/man/getting_started.md
@@ -522,11 +522,14 @@ julia> df[in.(df.A, Ref([1, 5, 601])), :]
│ 3 │ 601 │ 7 │ 301 │
```

Equivalently, the `in` function can be called with a single argument to create a function object that tests whether each value belongs to the subset (partial application of `in`): `df[in([1, 5, 601]).(df.A), :]`.
Equivalently, the `in` function can be called with a single argument to create
a function object that tests whether each value belongs to the subset
(partial application of `in`): `df[in([1, 5, 601]).(df.A), :]`.
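
The equivalence of the two forms can be checked in plain Julia without a data frame. This is only a sketch: the vector `column` stands in for `df.A`, and the variable names are chosen for the example.

```julia
allowed = [1, 5, 601]
column = [1, 2, 5, 600, 601]   # stand-in for df.A

# Two-argument form: Ref stops `allowed` from being broadcast element by element.
mask_ref = in.(column, Ref(allowed))

# One-argument form: in(allowed) returns a predicate that is then broadcast.
is_allowed = in(allowed)
mask_partial = is_allowed.(column)
```

Both masks select the same rows, so the choice between them is a matter of readability.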

#### Column selection using `select` and `select!`

You can also use the [`select`](@ref) and [`select!`](@ref) functions to select columns in a data frame.
You can also use the [`select`](@ref) and [`select!`](@ref) functions to select,
rename and transform columns in a data frame.

The `select` function creates a new data frame:
```jldoctest dataframe
@@ -550,6 +553,27 @@ julia> select(df, r"x") # select columns containing 'x' character
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 2 │

julia> select(df, :x1 => :a1, :x2 => :a2) # rename columns
1×2 DataFrame
│ Row │ a1 │ a2 │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 2 │

julia> select(df, :x1, :x2 => (x -> 2x) => :x2) # transform columns
1×2 DataFrame
│ Row │ x1 │ x2 │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 4 │

julia> select(df, :x1, :x2 => ByRow(UInt8) => :x2) # transform columns by row
1×2 DataFrame
│ Row │ x1 │ x2 │
│ │ Int64 │ UInt8 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 0x02 │
```
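
Renaming and transformation can also be combined in one call, and a transformation may take more than one source column. The following is a minimal sketch under the assumption that a vector of column names is passed to the function as separate positional arguments (the auto-splat behavior mentioned in the commit list); the data and the target name `total` are invented for illustration.

```julia
using DataFrames

# Hypothetical data, not taken from the documentation being changed.
df = DataFrame(x1 = [1, 2], x2 = [10, 20])

# Rename x1 to a1, and add x1 and x2 row by row into a new column `total`.
out = select(df, :x1 => :a1, [:x1, :x2] => ByRow(+) => :total)
```

Only the columns named on the right-hand sides appear in `out`, so the original `x1` and `x2` are not carried over.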

It is important to note that `select` always returns a data frame,
2 changes: 2 additions & 0 deletions src/DataFrames.jl
@@ -15,6 +15,7 @@ import DataAPI.All,
export AbstractDataFrame,
All,
Between,
ByRow,
DataFrame,
DataFrame!,
DataFrameRow,
@@ -80,6 +81,7 @@ include("dataframerow/utils.jl")

include("other/broadcasting.jl")

include("abstractdataframe/selection.jl")
include("abstractdataframe/iteration.jl")
include("abstractdataframe/join.jl")
include("abstractdataframe/reshape.jl")