Merge branch 'master' into nl/refgrouping
nalimilan committed Oct 11, 2020
2 parents 9d05965 + 4ec8009 commit f3ce3ed
Showing 21 changed files with 960 additions and 270 deletions.
26 changes: 17 additions & 9 deletions CONTRIBUTING.md
@@ -16,26 +16,34 @@ Thanks for taking the plunge!

## Contributing

* DataFrames.jl is a relatively complex package that also has many external dependencies.
Therefore, if you want to propose new functionality (which is encouraged), it is
strongly recommended to open an issue first and reach a decision on the final design.
A pull request then serves as an implementation of the agreed design.
* Feel free to open, or comment on, an issue and solicit feedback early on,
especially if you're unsure about aligning with design goals and direction,
or if relevant historical comments are ambiguous
or if relevant historical comments are ambiguous.
* Pair new functionality with tests, and bug fixes with tests that fail pre-fix.
Increasing test coverage as you go is always nice
Increasing test coverage as you go is always nice.
* Aim for atomic commits, if possible, e.g. `change 'foo' behavior like so` &
`'bar' handles such and such corner case`,
rather than `update 'foo' and 'bar'` & `fix typo` & `fix 'bar' better`
rather than `update 'foo' and 'bar'` & `fix typo` & `fix 'bar' better`.
* Pull requests are tested against release and development branches of Julia,
so using `Pkg.test("DataFrames")` as you develop can be helpful
so using `Pkg.test("DataFrames")` as you develop can be helpful.
* The style guidelines outlined below are not the personal style of most contributors,
but for consistency throughout the project, we've adopted them
* It is recommended to disable GitHub Actions on your fork; check Settings > Actions
but for consistency throughout the project, we've adopted them.
* It is recommended to disable GitHub Actions on your fork; check Settings > Actions.
* If a PR adds a new exported name then make sure to add a docstring for it and
add a reference to it in the documentation
* A PR with breaking changes should have `[BREAKING]` as a first part of its name
add a reference to it in the documentation.
* A PR with breaking changes should have `[BREAKING]` as a first part of its name.
* If a PR changes or adds functionality please update NEWS.md file accordingly as
a part of the PR (along with the link to the PR); please do not add entries
to NEWS.md for changes that are bug fixes or are not user visible, such as
adding tests, updating documentation or improving code layout
adding tests, updating documentation or improving code layout.
* If you make a PR please try to avoid pushing many small commits to GitHub in
a sequence as each such commit triggers a separate CI job, which takes over
an hour. This has a consequence of making other PRs in packages from the JuliaData
ecosystem wait for such CI jobs to finish as they share a common pool of CI resources.

## Style Guidelines

7 changes: 7 additions & 0 deletions NEWS.md
@@ -2,6 +2,10 @@

## Breaking changes

* the rules for transformations passed to `select`/`select!`, `transform`/`transform!`,
and `combine` have been made more flexible; in particular now it is allowed to
return multiple columns from a transformation function
[#2461](https://github.com/JuliaData/DataFrames.jl/pull/2461)
* CategoricalArrays.jl is no longer reexported: call `using CategoricalArrays`
to use it [#2404](https://github.com/JuliaData/DataFrames.jl/pull/2404).
In the same vein, the `categorical` and `categorical!` functions
@@ -32,6 +36,8 @@
choose the fast path only when it is safe; this resolves inconsistencies
with what the same functions not using fast path produce
([#2357](https://github.com/JuliaData/DataFrames.jl/pull/2357))
* `GroupKeys` now supports `in` for `GroupKey`, `Tuple`, `NamedTuple` and dictionaries
([#2392](https://github.com/JuliaData/DataFrames.jl/pull/2392))
* in `describe` the specification of custom aggregation is now `function => name`;
old `name => function` order is now deprecated
([#2401](https://github.com/JuliaData/DataFrames.jl/pull/2401))
@@ -67,6 +73,7 @@
* `filter`, `sort`, `dropmissing`, and `unique` now support a `view` keyword argument
which if set to `true` makes them return a `SubDataFrame` view into the passed
data frame.
* add `only` method for `AbstractDataFrame` ([#2449](https://github.com/JuliaData/DataFrames.jl/pull/2449))
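
The entries above on `GroupKeys` membership, the `view` keyword argument, and `only` can be illustrated with a minimal sketch. The data frame and its column names (`g`, `x`) are made up for illustration, and the snippet assumes a DataFrames.jl version that includes these changes.

```julia
using DataFrames

# hypothetical example data, not taken from the package
df = DataFrame(g = [1, 1, 2], x = [10, 20, 30])

# `view = true` asks `filter` for a SubDataFrame instead of a copy
sub = filter(:x => >(25), df, view = true)

# exactly one row matched, so `only` returns it as a DataFrameRow
row = only(sub)

# with any other number of rows `only` throws an ArgumentError
threw = try
    only(df)
    false
catch err
    err isa ArgumentError
end

# membership tests on the keys of a grouped data frame
ks = keys(groupby(df, :g))
found = (g = 2,) in ks   # NamedTuple key
absent = (3,) in ks      # Tuple key with no matching group
```

Because `sub` is a view, it shares its data with `df` rather than copying the selected rows.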

## Deprecated

2 changes: 1 addition & 1 deletion Project.toml
@@ -36,7 +36,7 @@ test = ["DataStructures", "DataValues", "Dates", "Logging", "Random", "Test"]
[compat]
julia = "1"
CategoricalArrays = "0.8.3"
Compat = "2.2, 3"
Compat = "3.17"
DataAPI = "1.2"
InvertedIndices = "1"
IteratorInterfaceExtensions = "0.1.1, 1"
2 changes: 1 addition & 1 deletion README.md
@@ -2,7 +2,7 @@ DataFrames.jl
=============

[![Coverage Status](https://coveralls.io/repos/JuliaData/DataFrames.jl/badge.svg?branch=master&service=github)](https://coveralls.io/github/JuliaData/DataFrames.jl?branch=master)
[![Travis Build Status](https://travis-ci.org/JuliaData/DataFrames.jl.svg?branch=master)](https://travis-ci.org/JuliaData/DataFrames.jl)
[![Travis Build Status](https://travis-ci.com/JuliaData/DataFrames.jl.svg?branch=master)](https://travis-ci.com/JuliaData/DataFrames.jl)

Tools for working with tabular data in Julia.

1 change: 1 addition & 0 deletions docs/src/lib/functions.md
@@ -99,6 +99,7 @@ filter
filter!
first
last
only
nonunique
unique
unique!
3 changes: 2 additions & 1 deletion docs/src/lib/types.md
@@ -55,7 +55,8 @@ The `ByRow` type is a special type used for selection operations to signal that
to each element (row) of the selection.

The `AsTable` type is a special type used for selection operations to signal that the columns selected by a wrapped
selector should be passed as a `NamedTuple` to the function.
selector should be passed as a `NamedTuple` to the function or to signal that it is requested
to expand the return value of a transformation into multiple columns.
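
A minimal sketch of the two roles of `AsTable` described above, using a made-up data frame with columns `a` and `b`. It assumes a DataFrames.jl version that includes the multi-column return rules introduced by this change.

```julia
using DataFrames

# hypothetical example data
df = DataFrame(a = [1, 2], b = [5, 6])

# AsTable as a source: with ByRow, each row reaches the function as a NamedTuple
total = select(df, AsTable([:a, :b]) => ByRow(r -> r.a + r.b) => :total)

# AsTable as a destination: the NamedTuple returned for each row
# is expanded into the columns `lo` and `hi`
stats = select(df,
               [:a, :b] => ByRow((a, b) -> (lo = min(a, b), hi = max(a, b))) => AsTable)
```

In the second call the output column names come from the fields of the returned `NamedTuple`, so no explicit names are passed.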

## [The design of handling of columns of a `DataFrame`](@id man-columnhandling)

28 changes: 20 additions & 8 deletions docs/src/man/comparisons.md
@@ -69,13 +69,13 @@ rows having the index value of `'c'`.
| Reduce multiple values | `df['z'].mean(skipna = False)` | `mean(df.z)` |
| | `df['z'].mean()` | `mean(skipmissing(df.z))` |
| | `df[['z']].agg(['mean'])` | `combine(df, :z => mean ∘ skipmissing)` |
| Add new columns | `df.assign(z1 = df['z'] + 1)` | `df.z1 = df.z .+ 1` |
| | | `insertcols!(df, :z1 => df.z .+ 1)` |
| | | `transform(df, :z => (v -> v .+ 1) => :z1)` |
| Add new columns | `df.assign(z1 = df['z'] + 1)` | `transform(df, :z => (v -> v .+ 1) => :z1)` |
| Rename columns | `df.rename(columns = {'x': 'x_new'})` | `rename(df, :x => :x_new)` |
| Pick & transform columns | `df.assign(x_mean = df['x'].mean())[['x_mean', 'y']]` | `select(df, :x => mean, :y)` |
| Sort rows | `df.sort_values(by = 'x')` | `sort(df, :x)` |
| | `df.sort_values(by = ['grp', 'x'], ascending = [True, False])` | `sort(df, [:grp, order(:x, rev = true)])` |
| Drop missing rows | `df.dropna()` | `dropmissing(df)` |
| Select unique rows | `df.drop_duplicates()` | `unique(df)` |

Note that pandas skips `NaN` values in its analytic functions by default. By contrast,
Julia functions do not skip `NaN`'s. If necessary, you can filter out
@@ -93,6 +93,21 @@ examples above do not synchronize the column names between pandas and DataFrames
(you can pass `renamecols=false` keyword argument to `select`, `transform` and
`combine` functions to retain old column names).

### Mutating operations

| Operation | pandas | DataFrames.jl |
| :----------------- | :---------------------------------------------------- | :------------------------------------------- |
| Add new columns | `df['z1'] = df['z'] + 1` | `df.z1 = df.z .+ 1` |
| | | `transform!(df, :z => (x -> x .+ 1) => :z1)` |
| | `df.insert(1, 'const', 10)` | `insertcols!(df, 2, :const => 10)` |
| Rename columns | `df.rename(columns = {'x': 'x_new'}, inplace = True)` | `rename!(df, :x => :x_new)` |
| Sort rows | `df.sort_values(by = 'x', inplace = True)` | `sort!(df, :x)` |
| Drop missing rows | `df.dropna(inplace = True)` | `dropmissing!(df)` |
| Select unique rows | `df.drop_duplicates(inplace = True)` | `unique!(df)` |

Generally speaking, DataFrames.jl follows the Julia convention of using `!` in the
function name to indicate mutation behavior.
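
As a rough illustration of the `!` convention, the sketch below contrasts `sort` with `sort!` and then chains a few of the mutating calls from the table. The data frame is hypothetical and chosen only so that each step has a visible effect.

```julia
using DataFrames

# hypothetical example data; `z` contains one missing value
df = DataFrame(x = [3, 1, 2], z = [1.0, missing, 3.0])

# the non-mutating form returns a new data frame and leaves `df` as it was
sorted = sort(df, :x)
unchanged = df.x == [3, 1, 2]

# the `!` forms modify `df` in place
sort!(df, :x)                     # rows are now ordered by x
dropmissing!(df)                  # drops the row where z is missing
insertcols!(df, 2, :const => 10)  # scalar is repeated for every row
rename!(df, :x => :x_new)
```

After these calls `df` has the columns `x_new`, `const`, and `z`, and two rows.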

### Grouping data and aggregation

DataFrames.jl provides a `groupby` function to apply operations
@@ -178,11 +193,8 @@ In DataFrames.jl, it just works normally with an array of join keys specified in
The following table compares the main functions of DataFrames.jl with the R package dplyr (version 1):

```R
df <- tibble(id = c('a','b','c','d','e','f'),
grp = c(1, 2, 1, 2, 1, 2),
x = c(6, 5, 4, 3, 2, 1),
y = c(4, 5, 6, 7, 8, 9),
z = c(3, 4, 5, 6, 7, 8))
df <- tibble(grp = rep(1:2, 3), x = 6:1, y = 4:9,
z = c(3:7, NA), id = letters[1:6])
```

| Operation | dplyr | DataFrames.jl |
19 changes: 16 additions & 3 deletions docs/src/man/getting_started.md
@@ -355,7 +355,12 @@ we can observe that:

#### Indexing syntax

Specific subsets of a data frame can be extracted using the indexing syntax, similar to matrices. The colon `:` indicates that all items (rows or columns depending on its position) should be retained:
Specific subsets of a data frame can be extracted using the indexing syntax,
similar to matrices. In the [Indexing](@ref) section of the manual you can find
all the details about the available options. Here we highlight the basic options.

The colon `:` indicates that all items (rows or columns
depending on its position) should be retained:

```jldoctest dataframe
julia> df[1:3, :]
@@ -481,7 +486,7 @@ julia> df[!, Not(:x1)]
│ 1 │ 2 │ 3 │
```

Finally, you can use `Not` and `All` selectors in more complex column selection scenarios.
Finally, you can use `Not`, `Between`, and `All` selectors in more complex column selection scenarios.
The following examples move all columns whose names match `r"x"` regular expression respectively to the front and to the end of a data frame:
```
julia> df = DataFrame(r=1, x1=2, x2=3, y=4)
@@ -571,7 +576,7 @@ a function object that tests whether each value belongs to the subset
- when `view` or `@view` is used (e.g. `@view df[1:3, :A]`).

More details on copies, views, and references can be found
[here.](https://juliadata.github.io/DataFrames.jl/stable/lib/indexing/#getindex-and-view-1)
in the [`getindex` and `view`](@ref) section.

#### Column selection using `select` and `select!`, `transform` and `transform!`

@@ -627,6 +632,14 @@ julia> select(df, :x2, :x2 => ByRow(sqrt)) # transform columns by row
├─────┼───────┼─────────┤
│ 1 │ 3 │ 1.73205 │
│ 2 │ 4 │ 2.0 │
julia> select(df, AsTable(:) => ByRow(extrema) => [:lo, :hi]) # return multiple columns
2×2 DataFrame
│ Row │ lo │ hi │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 5 │
│ 2 │ 2 │ 6 │
```
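
The multi-column example above can also be run as a standalone script. The data frame below is an assumption reconstructed from the printed output (the hunk does not show how `df` is defined), and the `transform` call is added only to show the same specification keeping the source columns.

```julia
using DataFrames

# assumed input, chosen to reproduce the `lo`/`hi` values shown above
df = DataFrame(x1 = [1, 2], x2 = [3, 4], y = [5, 6])

# each row is passed to `extrema` as a NamedTuple; the returned
# (minimum, maximum) tuple is split into the columns `lo` and `hi`
res = select(df, AsTable(:) => ByRow(extrema) => [:lo, :hi])

# `transform` applies the same specification but keeps the original columns
wide = transform(df, AsTable(:) => ByRow(extrema) => [:lo, :hi])
```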

It is important to note that `select` always returns a data frame,
7 changes: 7 additions & 0 deletions src/DataFrames.jl
@@ -80,6 +80,13 @@ if VERSION < v"1.2"
export hasproperty
end

if isdefined(Base, :only) # Introduced in 1.4.0
import Base.only
else
import Compat.only
export only
end

include("other/utils.jl")
include("other/index.jl")

10 changes: 10 additions & 0 deletions src/abstractdataframe/abstractdataframe.jl
@@ -434,6 +434,16 @@ end
##
##############################################################################

"""
only(df::AbstractDataFrame)
If `df` has a single row return it as a `DataFrameRow`; otherwise throw `ArgumentError`.
"""
function only(df::AbstractDataFrame)
nrow(df) != 1 && throw(ArgumentError("data frame must contain exactly 1 row"))
return df[1, :]
end

"""
first(df::AbstractDataFrame)
4 changes: 2 additions & 2 deletions src/abstractdataframe/join.jl
@@ -812,9 +812,9 @@ function rightjoin(df1::AbstractDataFrame, df2::AbstractDataFrame;
end

"""
outerjoin(df1, df2; on, kind = :inner, makeunique = false, indicator = nothing,
outerjoin(df1, df2; on, makeunique = false, indicator = nothing,
validate = (false, false), renamecols = identity => identity)
outerjoin(df1, df2, dfs...; on, kind = :inner, makeunique = false,
outerjoin(df1, df2, dfs...; on, makeunique = false,
validate = (false, false))
Perform an outer join of two or more data frame objects and return a `DataFrame`
