Merge branch 'master' into names_predicate
bkamins committed Nov 1, 2020
2 parents fc66601 + 540f901 commit 5a15791
Showing 50 changed files with 2,970 additions and 913 deletions.
26 changes: 17 additions & 9 deletions CONTRIBUTING.md
@@ -16,26 +16,34 @@ Thanks for taking the plunge!

## Contributing

* DataFrames.jl is a relatively complex package that also has many external dependencies.
Therefore, if you want to propose new functionality (which is encouraged), it is
strongly recommended to open an issue first and reach a decision on the final design.
A pull request then serves as an implementation of the agreed design.
* Feel free to open, or comment on, an issue and solicit feedback early on,
especially if you're unsure about aligning with design goals and direction,
or if relevant historical comments are ambiguous
or if relevant historical comments are ambiguous.
* Pair new functionality with tests, and bug fixes with tests that fail pre-fix.
Increasing test coverage as you go is always nice
Increasing test coverage as you go is always nice.
* Aim for atomic commits, if possible, e.g. `change 'foo' behavior like so` &
`'bar' handles such and such corner case`,
rather than `update 'foo' and 'bar'` & `fix typo` & `fix 'bar' better`
rather than `update 'foo' and 'bar'` & `fix typo` & `fix 'bar' better`.
* Pull requests are tested against release and development branches of Julia,
so using `Pkg.test("DataFrames")` as you develop can be helpful
so using `Pkg.test("DataFrames")` as you develop can be helpful.
* The style guidelines outlined below are not the personal style of most contributors,
but for consistency throughout the project, we've adopted them
* It is recommended to disable GitHub Actions on your fork; check Settings > Actions
but for consistency throughout the project, we've adopted them.
* It is recommended to disable GitHub Actions on your fork; check Settings > Actions.
* If a PR adds a new exported name then make sure to add a docstring for it and
add a reference to it in the documentation
* A PR with breaking changes should have `[BREAKING]` as a first part of its name
add a reference to it in the documentation.
* A PR with breaking changes should have `[BREAKING]` as a first part of its name.
* If a PR changes or adds functionality please update NEWS.md file accordingly as
a part of the PR (along with the link to the PR); please do not add entries
to NEWS.md for changes that are bug fixes or are not user visible, such as
adding tests, updating documentation or improving code layout
adding tests, updating documentation or improving code layout.
* If you make a PR, please try to avoid pushing many small commits to GitHub in
sequence, as each such commit triggers a separate CI job, which takes over
an hour. As a consequence, other PRs in packages from the JuliaData
ecosystem have to wait for such CI jobs to finish, as they share a common pool of CI resources.

## Style Guidelines

26 changes: 26 additions & 0 deletions NEWS.md
@@ -2,6 +2,10 @@

## Breaking changes

* the rules for transformations passed to `select`/`select!`, `transform`/`transform!`,
and `combine` have been made more flexible; in particular now it is allowed to
return multiple columns from a transformation function
[#2461](https://github.com/JuliaData/DataFrames.jl/pull/2461)
* CategoricalArrays.jl is no longer reexported: call `using CategoricalArrays`
to use it [#2404](https://github.com/JuliaData/DataFrames.jl/pull/2404).
In the same vein, the `categorical` and `categorical!` functions
@@ -32,6 +36,16 @@
choose the fast path only when it is safe; this resolves inconsistencies
with what the same functions not using fast path produce
([#2357](https://github.com/JuliaData/DataFrames.jl/pull/2357))
* joins now return `PooledVector` not `CategoricalVector` in indicator column
([#2505](https://github.com/JuliaData/DataFrames.jl/pull/2505))
* `GroupKeys` now supports `in` for `GroupKey`, `Tuple`, `NamedTuple` and dictionaries
([#2392](https://github.com/JuliaData/DataFrames.jl/pull/2392))
* in `describe` the specification of custom aggregation is now `function => name`;
old `name => function` order is now deprecated
([#2401](https://github.com/JuliaData/DataFrames.jl/pull/2401))
* `unstack` now produces row and column keys in the order of their first appearance
and has two new keyword arguments `allowmissing` and `allowduplicates`
([#2494](https://github.com/JuliaData/DataFrames.jl/pull/2494))
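
As a rough illustration of three of the entries above (the `function => name` order in `describe`, first-appearance ordering in `unstack`, and `in` on `GroupKeys`), here is a minimal sketch. The data frames `df` and `long` and their column names are invented for the example and are not taken from the package documentation.

```julia
using DataFrames

# Hypothetical data, made up for this example.
df = DataFrame(a = [1, 2, 3, 4])

# Custom aggregations are given as `function => name`.
stats = describe(df, :mean, (x -> maximum(x) - minimum(x)) => :range)

# Long-format data in which id 2 and key "b" appear first.
long = DataFrame(id = [2, 2, 1, 1],
                 key = ["b", "a", "b", "a"],
                 value = [1, 2, 3, 4])

# Row and column keys follow the order of first appearance.
wide = unstack(long, :id, :key, :value)

# Group keys can be tested for membership with a Tuple or a NamedTuple.
gdf = groupby(long, :id)
has_two = (2,) in keys(gdf)
has_three = (id = 3,) in keys(gdf)
```

With this data, `wide` should list `id` 2 before 1 and column `b` before `a`, because that is the order in which they first occur in `long`.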

## New functionalities

@@ -61,6 +75,14 @@
keyword argument that makes it possible to avoid adding transformation function name
as a suffix in automatically generated column names
([#2397](https://github.com/JuliaData/DataFrames.jl/pull/2397))
* `filter`, `sort`, `dropmissing`, and `unique` now support a `view` keyword argument
which, if set to `true`, makes them return a `SubDataFrame` view into the passed
data frame.
* add `only` method for `AbstractDataFrame` ([#2449](https://github.com/JuliaData/DataFrames.jl/pull/2449))
* passing empty sets of columns in `filter`/`filter!` and in `select`/`transform`/`combine`
with `ByRow` is now accepted ([#2476](https://github.com/JuliaData/DataFrames.jl/pull/2476))
* add `permutedims` method for `AbstractDataFrame` ([#2447](https://github.com/JuliaData/DataFrames.jl/pull/2447))
* add support for `Cols` from DataAPI.jl ([#2495](https://github.com/JuliaData/DataFrames.jl/pull/2495))
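
The following sketch shows how the `view` keyword argument, `only`, and `permutedims` from the list above might be used. The data frame `df` and its column names are assumptions made for illustration only.

```julia
using DataFrames

# Hypothetical data, made up for this example.
df = DataFrame(name = ["x", "y", "z"], a = [1, 2, 3], b = [4, 5, 6])

# `view = true` returns a SubDataFrame instead of copying the selected rows.
v = filter(:a => (x -> x > 1), df; view = true)

# `only` returns the single row of a one-row data frame and throws otherwise.
row = only(filter(:a => (x -> x == 2), df))

# `permutedims` turns the values of the `name` column into column names;
# the former column names `a` and `b` become the values of the first column.
p = permutedims(df, :name)
```

Because `v` is a view, it shares memory with `df`, so writing into `v` also changes `df`.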

## Deprecated

@@ -76,3 +98,7 @@
([#2315](https://github.com/JuliaData/DataFrames.jl/pull/2315))
* add rich display support for Markdown cell entries in HTML and LaTeX
([#2346](https://github.com/JuliaData/DataFrames.jl/pull/2346))
* limit the maximal display width the output can use in `text/plain` before
being truncated (in the `textwidth` sense, excluding `…`) to `32` per column
by default and fix a corner case when no columns are printed in situations when
they are too wide ([#2403](https://github.com/JuliaData/DataFrames.jl/pull/2403))
6 changes: 3 additions & 3 deletions Project.toml
@@ -35,9 +35,9 @@ test = ["DataStructures", "DataValues", "Dates", "Logging", "Random", "Test"]

[compat]
julia = "1"
CategoricalArrays = "0.8"
Compat = "2.2, 3"
DataAPI = "1.2"
CategoricalArrays = "0.8.3"
Compat = "3.17"
DataAPI = "1.3"
InvertedIndices = "1"
IteratorInterfaceExtensions = "0.1.1, 1"
Missings = "0.4.2"
2 changes: 1 addition & 1 deletion README.md
@@ -2,7 +2,7 @@ DataFrames.jl
=============

[![Coverage Status](https://coveralls.io/repos/JuliaData/DataFrames.jl/badge.svg?branch=master&service=github)](https://coveralls.io/github/JuliaData/DataFrames.jl?branch=master)
[![Travis Build Status](https://travis-ci.org/JuliaData/DataFrames.jl.svg?branch=master)](https://travis-ci.org/JuliaData/DataFrames.jl)
[![Travis Build Status](https://travis-ci.com/JuliaData/DataFrames.jl.svg?branch=master)](https://travis-ci.com/JuliaData/DataFrames.jl)

Tools for working with tabular data in Julia.

7 changes: 5 additions & 2 deletions docs/make.jl
@@ -14,7 +14,10 @@ makedocs(
doctest = false,
clean = false,
sitename = "DataFrames.jl",
format = Documenter.HTML(),
format = Documenter.HTML(
canonical = "https://juliadata.github.io/DataFrames.jl/stable/",
assets = ["assets/favicon.ico"]
),
pages = Any[
"Introduction" => "index.md",
"User Guide" => Any[
@@ -26,7 +29,7 @@ makedocs(
"Categorical Data" => "man/categorical.md",
"Missing Data" => "man/missing.md",
"Data manipulation frameworks" => "man/querying_frameworks.md",
"Comparison with Stata/R" => "man/comparisons.md"
"Comparison with Python/R/Stata" => "man/comparisons.md"
],
"API" => Any[
"Types" => "lib/types.md",
Binary file added docs/src/assets/favicon.ico
Binary file not shown.
15 changes: 11 additions & 4 deletions docs/src/index.md
@@ -19,8 +19,8 @@ especially for those coming to Julia from R or Python.

DataFrames.jl plays a central role in the Julia Data ecosystem, and has tight
integrations with a range of different libraries. DataFrames.jl isn't the only
tool for working with tabular data in Julia --- as noted below, there are some
other great libraries for certain use-cases --- but it provides great data
tool for working with tabular data in Julia -- as noted below, there are some
other great libraries for certain use-cases -- but it provides great data
wrangling functionality through a familiar interface.

## DataFrames.jl and the Julia Data Ecosystem
@@ -67,6 +67,13 @@ integrated they are with DataFrames.jl.
- [ScikitLearn.jl](https://cstjean.github.io/ScikitLearn.jl/stable/):
A Julia wrapper around the full Python scikit-learn machine learning library.
Not well integrated with DataFrames.jl, but can be combined using StatsModels.jl.
- [AutoMLPipeline](https://github.com/IBM/AutoMLPipeline.jl):
A package that makes it trivial to create complex ML
pipeline structures using simple expressions. It leverages
the built-in macro programming features of Julia to
symbolically process and manipulate pipeline expressions,
which makes it easy to discover optimal structures for
machine learning regression and classification.
- Deep learning:
[KNet.jl](https://denizyuret.github.io/Knet.jl/stable/tutorial/#Introduction-to-Knet-1)
and [Flux.jl](https://github.com/FluxML/Flux.jl).
@@ -107,8 +114,8 @@ integrated they are with DataFrames.jl.
CSVs (using [CSV.jl](https://github.com/JuliaData/CSV.jl)),
Stata, SPSS, and SAS files (using
[StatFiles.jl](https://github.com/queryverse/StatFiles.jl)),
and reading (though not writing) parquet files
(using [ParquetFiles.jl](https://github.com/queryverse/ParquetFiles.jl)).
and reading and writing parquet files
(using [Parquet.jl](https://github.com/JuliaIO/Parquet.jl)).

While not all of these libraries are tightly integrated with DataFrames.jl,
because `DataFrame`s are essentially collections of aligned Julia vectors, it
2 changes: 2 additions & 0 deletions docs/src/lib/functions.md
@@ -57,6 +57,7 @@ vcat
```@docs
stack
unstack
permutedims
```

## Sorting
@@ -99,6 +100,7 @@ filter
filter!
first
last
only
nonunique
unique
unique!
2 changes: 1 addition & 1 deletion docs/src/lib/indexing.md
@@ -26,7 +26,7 @@ The rules for a valid type of index into a column are the following:
* a vector of `Bool` that has to be a subtype of `AbstractVector{Bool}`;
* a regular expression, which gets expanded to a vector of matching column names;
* a `Not` expression (see [InvertedIndices.jl](https://github.com/mbauman/InvertedIndices.jl));
* an `All` or `Between` expression (see [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl));
* a `Cols`, `All` or `Between` expression (see [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl));
* a colon literal `:`.
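
To make the column selectors listed above more concrete, here is a small sketch that resolves each of them to column names with `names(df, cols)`. The data frame `df` and its column names are invented for the example.

```julia
using DataFrames

# Hypothetical one-row data frame; only the column names matter here.
df = DataFrame(a = [1], b = [2], c = [3], x1 = [4], x2 = [5])

by_regex   = names(df, r"^x")             # columns whose names match the regex
by_not     = names(df, Not(:a))           # every column except `a`
by_between = names(df, Between(:a, :c))   # columns from `a` through `c`
by_cols    = names(df, Cols(:c, r"^x"))   # union of selectors, in the given order
by_colon   = names(df, :)                 # all columns
```

The same selectors can be passed wherever a column index is accepted, for example `select(df, Cols(:c, r"^x"))`.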

The rules for a valid type of index into a row are the following:
3 changes: 2 additions & 1 deletion docs/src/lib/types.md
@@ -55,7 +55,8 @@ The `ByRow` type is a special type used for selection operations to signal that
to each element (row) of the selection.

The `AsTable` type is a special type used for selection operations to signal that the columns selected by a wrapped
selector should be passed as a `NamedTuple` to the function.
selector should be passed as a `NamedTuple` to the function, or to request that the return value
of a transformation be expanded into multiple columns.
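
The two roles of `AsTable` described above can be sketched as follows. The data frame `df`, its columns, and the output column names `total`, `double` and `square` are assumptions chosen for illustration.

```julia
using DataFrames

# Hypothetical data, made up for this example.
df = DataFrame(a = [1, 2], b = [3, 4])

# AsTable on the source side: the function receives the selected columns
# as a NamedTuple (one NamedTuple per row, because of ByRow).
summed = select(df, AsTable([:a, :b]) => ByRow(nt -> nt.a + nt.b) => :total)

# AsTable on the destination side: the NamedTuple returned for each row
# is expanded into separate columns named after its fields.
expanded = select(df, :a => ByRow(x -> (double = 2x, square = x^2)) => AsTable)
```

In the second call the output column names come from the fields of the returned `NamedTuple`, so no explicit target names are needed.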

## [The design of handling of columns of a `DataFrame`](@id man-columnhandling)

