Merge branch 'master' into names_predicate

JuliaData · Nov 1, 2020 · 5a15791
2 parents fc66601 + 540f901
commit 5a15791

Showing 50 changed files with 2,970 additions and 913 deletions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -16,26 +16,34 @@ Thanks for taking the plunge!
 
 ## Contributing
 
+* DataFrames.jl is a relatively complex package that also has many external dependencies.
+  Therefore, if you want to propose new functionality (which is encouraged), it is
+  strongly recommended to open an issue first and reach a decision on the final design.
+  A pull request then serves as an implementation of the agreed design.
 * Feel free to open, or comment on, an issue and solicit feedback early on,
   especially if you're unsure about aligning with design goals and direction,
-  or if relevant historical comments are ambiguous
+  or if relevant historical comments are ambiguous.
 * Pair new functionality with tests, and bug fixes with tests that fail pre-fix.
-  Increasing test coverage as you go is always nice
+  Increasing test coverage as you go is always nice.
 * Aim for atomic commits, if possible, e.g. `change 'foo' behavior like so` &
   `'bar' handles such and such corner case`,
-  rather than `update 'foo' and 'bar'` & `fix typo` & `fix 'bar' better`
+  rather than `update 'foo' and 'bar'` & `fix typo` & `fix 'bar' better`.
 * Pull requests are tested against release and development branches of Julia,
-  so using `Pkg.test("DataFrames")` as you develop can be helpful
+  so using `Pkg.test("DataFrames")` as you develop can be helpful.
 * The style guidelines outlined below are not the personal style of most contributors,
-  but for consistency throughout the project, we've adopted them
-* It is recommended to disable GitHub Actions on your fork; check Settings > Actions
+  but for consistency throughout the project, we've adopted them.
+* It is recommended to disable GitHub Actions on your fork; check Settings > Actions.
 * If a PR adds a new exported name then make sure to add a docstring for it and
-  add a reference to it in the documentation
-* A PR with breaking changes should have `[BREAKING]` as a first part of its name
+  add a reference to it in the documentation.
+* A PR with breaking changes should have `[BREAKING]` as a first part of its name.
 * If a PR changes or adds functionality please update NEWS.md file accordingly as
   a part of the PR (along with the link to the PR); please do not add entries
   to NEWS.md for changes that are bug fixes or are not user visible, such as
-  adding tests, updating documentation or improving code layout
+  adding tests, updating documentation or improving code layout.
+* If you make a PR please try to avoid pushing many small commits to GitHub in
+  a sequence, as each such commit triggers a separate CI job, which takes over
+  an hour. As a consequence, other PRs in packages from the JuliaData ecosystem
+  have to wait for such CI jobs to finish, as they share a common pool of CI resources.
 
 ## Style Guidelines
 

diff --git a/NEWS.md b/NEWS.md
@@ -2,6 +2,10 @@
 
 ## Breaking changes
 
+* the rules for transformations passed to `select`/`select!`, `transform`/`transform!`,
+  and `combine` have been made more flexible; in particular now it is allowed to
+  return multiple columns from a transformation function
+  [#2461](https://github.com/JuliaData/DataFrames.jl/pull/2461)
 * CategoricalArrays.jl is no longer reexported: call `using CategoricalArrays`
   to use it ([#2404](https://github.com/JuliaData/DataFrames.jl/pull/2404)).
   In the same vein, the `categorical` and `categorical!` functions
@@ -32,6 +36,16 @@
   choose the fast path only when it is safe; this resolves inconsistencies
   with what the same functions not using fast path produce
   ([#2357](https://github.com/JuliaData/DataFrames.jl/pull/2357))
+* joins now return `PooledVector` not `CategoricalVector` in indicator column
+  ([#2505](https://github.com/JuliaData/DataFrames.jl/pull/2505))
+* `GroupKeys` now supports `in` for `GroupKey`, `Tuple`, `NamedTuple` and dictionaries
+  ([#2392](https://github.com/JuliaData/DataFrames.jl/pull/2392))
+* in `describe` the specification of custom aggregation is now `function => name`;
+  old `name => function` order is now deprecated
+  ([#2401](https://github.com/JuliaData/DataFrames.jl/pull/2401))
+* `unstack` now produces row and column keys in the order of their first appearance
+  and has two new keyword arguments `allowmissing` and `allowduplicates`
+  ([#2494](https://github.com/JuliaData/DataFrames.jl/pull/2494))
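
The `GroupKeys` membership test and the new `describe` argument order listed above might look like this in practice. This is a minimal sketch, assuming a DataFrames.jl version that includes these changes; the data frame and the column name `total` are made up for illustration.

```julia
using DataFrames

# Hypothetical data, for illustration only
df = DataFrame(g = [1, 1, 2], x = [1.0, 2.0, 3.0])
gd = groupby(df, :g)

# `in` on the keys of a GroupedDataFrame accepts a Tuple or a NamedTuple
has_tuple = (1,) in keys(gd)
has_named = (g = 2,) in keys(gd)
has_absent = (3,) in keys(gd)   # no group has g == 3

# Custom aggregation in `describe` is written as `function => name`
stats = describe(df, :mean, sum => :total)
```

The deprecated spelling would have been `:total => sum`; only the order of the pair changes.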
 
 ## New functionalities
 
@@ -61,6 +75,14 @@
   keyword argument that makes it possible to avoid adding transformation function name
   as a suffix in automatically generated column names
   ([#2397](https://github.com/JuliaData/DataFrames.jl/pull/2397))
+* `filter`, `sort`, `dropmissing`, and `unique` now support a `view` keyword argument
+  which, if set to `true`, makes them return a `SubDataFrame` view into the passed
+  data frame.
+* add `only` method for `AbstractDataFrame` ([#2449](https://github.com/JuliaData/DataFrames.jl/pull/2449))
+* passing empty sets of columns in `filter`/`filter!` and in `select`/`transform`/`combine`
+  with `ByRow` is now accepted ([#2476](https://github.com/JuliaData/DataFrames.jl/pull/2476))
+* add `permutedims` method for `AbstractDataFrame` ([#2447](https://github.com/JuliaData/DataFrames.jl/pull/2447))
+* add support for `Cols` from DataAPI.jl ([#2495](https://github.com/JuliaData/DataFrames.jl/pull/2495))
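
Several of the additions above (the `view` keyword, `only`, `Cols`, and `permutedims`) can be sketched together as follows. The data frames and variable names are hypothetical, and the snippet assumes a DataFrames.jl version that includes these changes.

```julia
using DataFrames

# Hypothetical data, for illustration only
df = DataFrame(id = [1, 2, 3], score = [10, 20, 30])

# view=true returns a SubDataFrame that references `df` instead of copying rows
high = filter(:score => >(15), df, view=true)

# `only` returns the single row of a one-row data frame as a DataFrameRow
row = only(filter(:id => ==(2), df))

# `Cols` selects columns in the order given
reordered = select(df, Cols(:score, :id))

# `permutedims` turns the values of one column into the new column names
wide = DataFrame(key = ["a", "b"], x = [1, 2], y = [3, 4])
flipped = permutedims(wide, :key)
```

Because `high` is a view, it shares memory with `df`; copy it first if the parent must stay untouched by later in-place changes.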
 
 ## Deprecated
 
@@ -76,3 +98,7 @@
   ([#2315](https://github.com/JuliaData/DataFrames.jl/pull/2315))
 * add rich display support for Markdown cell entries in HTML and LaTeX
   ([#2346](https://github.com/JuliaData/DataFrames.jl/pull/2346))
+* limit the maximal display width the output can use in `text/plain` before
+  being truncated (in the `textwidth` sense, excluding `…`) to `32` per column
+  by default and fix a corner case when no columns are printed in situations when
+  they are too wide ([#2403](https://github.com/JuliaData/DataFrames.jl/pull/2403))
diff --git a/Project.toml b/Project.toml
@@ -35,9 +35,9 @@ test = ["DataStructures", "DataValues", "Dates", "Logging", "Random", "Test"]
 
 [compat]
 julia = "1"
-CategoricalArrays = "0.8"
-Compat = "2.2, 3"
-DataAPI = "1.2"
+CategoricalArrays = "0.8.3"
+Compat = "3.17"
+DataAPI = "1.3"
 InvertedIndices = "1"
 IteratorInterfaceExtensions = "0.1.1, 1"
 Missings = "0.4.2"

diff --git a/README.md b/README.md
@@ -2,7 +2,7 @@ DataFrames.jl
 =============
 
 [![Coverage Status](https://coveralls.io/repos/JuliaData/DataFrames.jl/badge.svg?branch=master&service=github)](https://coveralls.io/github/JuliaData/DataFrames.jl?branch=master)
-[![Travis Build Status](https://travis-ci.org/JuliaData/DataFrames.jl.svg?branch=master)](https://travis-ci.org/JuliaData/DataFrames.jl)
+[![Travis Build Status](https://travis-ci.com/JuliaData/DataFrames.jl.svg?branch=master)](https://travis-ci.com/JuliaData/DataFrames.jl)
 
 Tools for working with tabular data in Julia.
 

diff --git a/docs/make.jl b/docs/make.jl
@@ -14,7 +14,10 @@ makedocs(
     doctest = false,
     clean = false,
     sitename = "DataFrames.jl",
-    format = Documenter.HTML(),
+    format = Documenter.HTML(
+        canonical = "https://juliadata.github.io/DataFrames.jl/stable/",
+        assets = ["assets/favicon.ico"]
+    ),
     pages = Any[
         "Introduction" => "index.md",
         "User Guide" => Any[
@@ -26,7 +29,7 @@ makedocs(
             "Categorical Data" => "man/categorical.md",
             "Missing Data" => "man/missing.md",
             "Data manipulation frameworks" => "man/querying_frameworks.md",
-            "Comparison with Stata/R" => "man/comparisons.md"
+            "Comparison with Python/R/Stata" => "man/comparisons.md"
         ],
         "API" => Any[
             "Types" => "lib/types.md",

diff --git a/docs/src/assets/favicon.ico b/docs/src/assets/favicon.ico
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -19,8 +19,8 @@ especially for those  coming to Julia from R or Python.
 
 DataFrames.jl plays a central role in the Julia Data ecosystem, and has tight
 integrations with a range of different libraries. DataFrames.jl isn't the only
-tool for working with tabular data in Julia --- as noted below, there are some
-other great libraries for certain use-cases --- but it provides great data
+tool for working with tabular data in Julia -- as noted below, there are some
+other great libraries for certain use-cases -- but it provides great data
 wrangling functionality through a familiar interface.
 
 ## DataFrames.jl and the Julia Data Ecosystem
@@ -67,6 +67,13 @@ integrated they are with DataFrames.jl.
     - [ScikitLearn.jl](https://cstjean.github.io/ScikitLearn.jl/stable/):
       A Julia wrapper around the full Python scikit-learn machine learning library.
       Not well integrated with DataFrames.jl, but can be combined using StatsModels.jl.
+    - [AutoMLPipeline](https://github.com/IBM/AutoMLPipeline.jl):
+      A package that makes it trivial to create complex ML
+      pipeline structures using simple expressions. It leverages
+      the built-in macro programming features of Julia to
+      symbolically process and manipulate pipeline expressions,
+      and makes it easy to discover optimal structures for
+      machine learning regression and classification.
     - Deep learning:
       [KNet.jl](https://denizyuret.github.io/Knet.jl/stable/tutorial/#Introduction-to-Knet-1)
       and [Flux.jl](https://github.com/FluxML/Flux.jl).
@@ -107,8 +114,8 @@ integrated they are with DataFrames.jl.
       CSVs (using [CSV.jl](https://github.com/JuliaData/CSV.jl)),
       Stata, SPSS, and SAS files (using
       [StatFiles.jl](https://github.com/queryverse/StatFiles.jl)),
-      and reading (though not writing) parquet files
-      (using [ParquetFiles.jl](https://github.com/queryverse/ParquetFiles.jl)).
+      and reading and writing parquet files
+      (using [Parquet.jl](https://github.com/JuliaIO/Parquet.jl)).
 
 While not all of these libraries are tightly integrated with DataFrames.jl,
 because `DataFrame`s are essentially collections of aligned Julia vectors, so it

diff --git a/docs/src/lib/functions.md b/docs/src/lib/functions.md
@@ -57,6 +57,7 @@ vcat
 ```@docs
 stack
 unstack
+permutedims
 ```
 
 ## Sorting
@@ -99,6 +100,7 @@ filter
 filter!
 first
 last
+only
 nonunique
 unique
 unique!

diff --git a/docs/src/lib/indexing.md b/docs/src/lib/indexing.md
@@ -26,7 +26,7 @@ The rules for a valid type of index into a column are the following:
     * a vector of `Bool` that has to be a subtype of `AbstractVector{Bool}`;
     * a regular expression, which gets expanded to a vector of matching column names;
     * a `Not` expression (see [InvertedIndices.jl](https://github.com/mbauman/InvertedIndices.jl));
-    * an `All` or `Between` expression (see [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl));
+    * a `Cols`, `All` or `Between` expression (see [DataAPI.jl](https://github.com/JuliaData/DataAPI.jl));
     * a colon literal `:`.
 
 The rules for a valid type of index into a row are the following:

diff --git a/docs/src/lib/types.md b/docs/src/lib/types.md
@@ -55,7 +55,8 @@ The `ByRow` type is a special type used for selection operations to signal that
 to each element (row) of the selection.
 
 The `AsTable` type is a special type used for selection operations to signal that the columns selected by a wrapped
-selector should be passed as a `NamedTuple` to the function.
+selector should be passed as a `NamedTuple` to the function, or to signal that the
+return value of a transformation should be expanded into multiple columns.
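
The two roles of `AsTable` described above can be illustrated with a short sketch. The column names `total`, `double`, and `square` are hypothetical, and the snippet assumes a DataFrames.jl version in which transformations may return multiple columns.

```julia
using DataFrames

# Hypothetical data, for illustration only
df = DataFrame(x = [1, 2, 3], y = [10, 20, 30])

# AsTable as the source: each row reaches the function as a NamedTuple
totals = transform(df, AsTable([:x, :y]) => ByRow(r -> r.x + r.y) => :total)

# AsTable as the destination: the NamedTuple returned for each row
# is expanded into the columns `double` and `square`
expanded = transform(df, :x => ByRow(v -> (double = 2v, square = v^2)) => AsTable)
```

In the second call the output column names come from the fields of the returned `NamedTuple`, so no explicit target names are given.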
 
 ## [The design of handling of columns of a `DataFrame`](@id man-columnhandling)