Use doctests throughout documentation, and add missing functions to docs

JuliaData · Nov 14, 2017 · 202c42c
1 parent 8fd0851
commit 202c42c
Showing 19 changed files with 1,125 additions and 333 deletions.
diff --git a/docs/make.jl b/docs/make.jl
@@ -19,16 +19,11 @@ makedocs(
             "Reshaping" => "man/reshaping_and_pivoting.md",
             "Sorting" => "man/sorting.md",
             "Categorical Data" => "man/categorical.md",
-            "Querying frameworks" => "man/querying_frameworks.md",
+            "Querying frameworks" => "man/querying_frameworks.md"
         ],
         "API" => Any[
-            "Main types" => "lib/maintypes.md",
-            "Utilities" => "lib/utilities.md",
-            "Data manipulation" => "lib/manipulation.md",
-        ],
-        "About" => Any[
-            "Release Notes" => "NEWS.md",
-            "License" => "LICENSE.md",
+            "Types" => "lib/types.md",
+            "Functions" => "lib/functions.md"
         ]
     ]
 )

diff --git a/docs/src/LICENSE.md b/docs/src/LICENSE.md
diff --git a/docs/src/NEWS.md b/docs/src/NEWS.md
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -1,21 +1,40 @@
-# DataFrames Documentation Outline
+# DataFrames.jl
+
+Welcome to the DataFrames documentation! This resource aims to teach you everything you need
+to know to get up and running with tabular data manipulation using the DataFrames.jl package
+and the Julia language. If there is something you expect DataFrames to be capable of, but
+cannot figure out how to do, please reach out with questions in Domains/Data on
+[Discourse](https://discourse.julialang.org/new-topic?title=[DataFrames%20Question]:%20&body=%23%20Question:%0A%0A%23%20Dataset%20(if%20applicable):%0A%0A%23%20Minimal%20Working%20Example%20(if%20applicable):%0A&category=Domains/Data&tags=question).
+Please report bugs by
+[opening an issue](https://github.com/JuliaData/DataFrames.jl/issues/new). You can follow
+the [**source**]() links throughout the documentation to jump right to the
+source files on GitHub to make pull requests for improving the documentation and function
+capabilities. Please review
+[DataFrames contributing guidelines](https://github.com/JuliaData/DataFrames.jl/blob/master/CONTRIBUTING.md)
+before submitting your first PR! Information on specific versions can be found on the [Release page](https://github.com/JuliaData/DataFrames.jl/releases).
 
 ## Package Manual
 
 ```@contents
-Pages = ["man/getting_started.md", "man/joins.md", "man/split_apply_combine.md", "man/reshaping_and_pivoting.md", "man/sorting.md", "man/categorical.md", "man/querying_frameworks.md"]
+Pages = ["man/getting_started.md",
+         "man/joins.md",
+         "man/split_apply_combine.md",
+         "man/reshaping_and_pivoting.md",
+         "man/sorting.md",
+         "man/categorical.md",
+         "man/querying_frameworks.md"]
 Depth = 2
 ```
 
 ## API
 
 ```@contents
-Pages = ["lib/maintypes.md", "lib/manipulation.md", "lib/utilities.md"]
+Pages = ["lib/types.md", "lib/functions.md"]
 Depth = 2
 ```
 
-## Documentation Index
+## Index
 
 ```@index
-Pages = ["lib/maintypes.md", "lib/manipulation.md", "lib/utilities.md"]
+Pages = ["lib/types.md", "lib/functions.md"]
 ```
diff --git a/docs/src/lib/functions.md b/docs/src/lib/functions.md
@@ -0,0 +1,54 @@
+```@meta
+CurrentModule = DataFrames
+```
+
+# Functions
+
+```@index
+Pages = ["functions.md"]
+```
+
+## Grouping, Joining, and Split-Apply-Combine
+
+```@docs
+aggregate
+by
+colwise
+groupby
+join
+melt
+stack
+unstack
+stackdf
+meltdf
+```
+
+## Basics
+
+```@docs
+categorical!
+combine
+completecases
+deleterows!
+describe
+dropnull
+dropnull!
+eachcol
+eachrow
+eltypes
+head
+names
+names!
+nonunique
+nullable!
+order
+rename!
+rename
+show
+showcols
+size
+sort
+sort!
+tail
+unique!
+```
diff --git a/docs/src/lib/manipulation.md b/docs/src/lib/manipulation.md
diff --git a/docs/src/lib/maintypes.md b/docs/src/lib/types.md
@@ -3,14 +3,17 @@
 CurrentModule = DataFrames
 ```
 
-# Main Types
+# Types
 
 ```@index
-Pages = ["maintypes.md"]
+Pages = ["types.md"]
 ```
 
 ```@docs
 AbstractDataFrame
 DataFrame
+DataFrameRow
+GroupApplied
+GroupedDataFrame
 SubDataFrame
 ```
diff --git a/docs/src/lib/utilities.md b/docs/src/lib/utilities.md
diff --git a/docs/src/man/categorical.md b/docs/src/man/categorical.md
@@ -2,52 +2,151 @@
 
 Often, we have to deal with factors that take on a small number of levels:
 
-```julia
-v = ["Group A", "Group A", "Group A",
-     "Group B", "Group B", "Group B"]
+```jldoctest categorical
+julia> v = ["Group A", "Group A", "Group A", "Group B", "Group B", "Group B"]
+6-element Array{String,1}:
+ "Group A"
+ "Group A"
+ "Group A"
+ "Group B"
+ "Group B"
+ "Group B"
+
 ```
 
 The naive encoding used in an `Array` represents every entry of this vector as a full string. In contrast, we can represent the data more efficiently by replacing the strings with indices into a small pool of levels. This is what the `CategoricalArray` type does:
 
-```julia
-cv = CategoricalArray(["Group A", "Group A", "Group A",
-                       "Group B", "Group B", "Group B"])
+```jldoctest categorical
+julia> using CategoricalArrays
+
+julia> cv = CategoricalArray(v)
+6-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
+ "Group A"
+ "Group A"
+ "Group A"
+ "Group B"
+ "Group B"
+ "Group B"
+
 ```
 
 `CategoricalArrays` support missing values via the `Nulls` package.
 
-```julia
-using Nulls
-cv = CategoricalArray(["Group A", null, "Group A",
-                       "Group B", "Group B", null])
+```jldoctest categorical
+julia> using Nulls
+
+julia> cv = CategoricalArray(["Group A", null, "Group A",
+                              "Group B", "Group B", null])
+6-element CategoricalArrays.CategoricalArray{Union{Nulls.Null, String},1,UInt32}:
+ "Group A"
+ null
+ "Group A"
+ "Group B"
+ "Group B"
+ null
 ```
 
 In addition to representing repeated data efficiently, the `CategoricalArray` type allows us to determine efficiently the allowed levels of the variable at any time using the `levels` function (note that levels may or may not be actually used in the data):
 
-```julia
-levels(cv)
+```jldoctest categorical
+julia> levels(cv)
+2-element Array{String,1}:
+ "Group A"
+ "Group B"
+
 ```
 
 The `levels!` function also allows changing the order of appearance of the levels, which can be useful for display purposes or when working with ordered variables.
 
-By default, a `CategoricalArray` is able to represent 2<sup>32</sup>differents levels. You can use less memory by calling the `compact` function:
+```jldoctest categorical
+julia> levels!(cv, ["Group B", "Group A"]);
+
+julia> levels(cv)
+2-element Array{String,1}:
+ "Group B"
+ "Group A"
+
+julia> sort(cv)
+6-element CategoricalArrays.CategoricalArray{Union{Nulls.Null, String},1,UInt32}:
+ "Group B"
+ "Group B"
+ "Group A"
+ "Group A"
+ null
+ null
 
-```julia
-cv = compact(cv)
 ```
 
-Often, you will have factors encoded inside a DataFrame with `Array` columns instead of `CategoricalArray` columns. You can do conversion of a single column using the `categorical` function:
+By default, a `CategoricalArray` is able to represent 2<sup>32</sup> different levels. You can use less memory by calling the `compress` function:
+
+```jldoctest categorical
+julia> cv = compress(cv)
+6-element CategoricalArrays.CategoricalArray{Union{Nulls.Null, String},1,UInt8}:
+ "Group A"
+ null
+ "Group A"
+ "Group B"
+ "Group B"
+ null
 
-```julia
-cv = categorical(v)
 ```
 
-Or you can edit the columns of a `DataFrame` in-place using the `categorical!` function:
+Often, you will have factors encoded inside a DataFrame with `Array` columns instead of
+`CategoricalArray` columns. You can convert one or more columns of the DataFrame using the
+`categorical!` function, which modifies the input DataFrame in-place.
+
+```jldoctest categorical
+julia> using DataFrames
+
+julia> df = DataFrame(A = ["A", "B", "C", "D", "D", "A"],
+                      B = ["X", "X", "X", "Y", "Y", "Y"])
+6×2 DataFrames.DataFrame
+│ Row │ A │ B │
+├─────┼───┼───┤
+│ 1   │ A │ X │
+│ 2   │ B │ X │
+│ 3   │ C │ X │
+│ 4   │ D │ Y │
+│ 5   │ D │ Y │
+│ 6   │ A │ Y │
+
+julia> eltypes(df)
+2-element Array{Type,1}:
+ String
+ String
+
+julia> categorical!(df, :A) # change the column `:A` to be categorical
+6×2 DataFrames.DataFrame
+│ Row │ A │ B │
+├─────┼───┼───┤
+│ 1   │ A │ X │
+│ 2   │ B │ X │
+│ 3   │ C │ X │
+│ 4   │ D │ Y │
+│ 5   │ D │ Y │
+│ 6   │ A │ Y │
+
+julia> eltypes(df)
+2-element Array{Type,1}:
+ CategoricalArrays.CategoricalString{UInt32}
+ String
+
+julia> categorical!(df) # change all columns to be categorical
+6×2 DataFrames.DataFrame
+│ Row │ A │ B │
+├─────┼───┼───┤
+│ 1   │ A │ X │
+│ 2   │ B │ X │
+│ 3   │ C │ X │
+│ 4   │ D │ Y │
+│ 5   │ D │ Y │
+│ 6   │ A │ Y │
+
+julia> eltypes(df)
+2-element Array{Type,1}:
+ CategoricalArrays.CategoricalString{UInt32}
+ CategoricalArrays.CategoricalString{UInt32}
 
-```julia
-df = DataFrame(A = [1, 1, 1, 2, 2, 2],
-               B = ["X", "X", "X", "Y", "Y", "Y"])
-categorical!(df, [:A, :B])
 ```
 
 Using categorical arrays is important for working with the [GLM package](https://github.com/JuliaStats/GLM.jl). When fitting regression models, `CategoricalArray` columns in the input are translated into 0/1 indicator columns in the `ModelMatrix` with one column for each of the levels of the `CategoricalArray`. This allows one to analyze categorical data efficiently.
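
The 0/1 indicator encoding described in the closing paragraph can be illustrated with a minimal sketch in plain Julia. This is not how GLM builds its `ModelMatrix`; the helper name `indicator_columns` is hypothetical and is not part of DataFrames, CategoricalArrays, or GLM. Actual contrast coding in a model-fitting package may differ, for example by dropping a reference level.

```julia
# Hypothetical helper for illustration only; not an API of any package.
# Builds a matrix with one row per observation and one 0/1 column per level.
function indicator_columns(v::AbstractVector, lvls::AbstractVector)
    return [Int(x == l) for x in v, l in lvls]
end

v = ["Group A", "Group A", "Group A", "Group B", "Group B", "Group B"]
lvls = unique(v)                  # levels in order of first appearance
X = indicator_columns(v, lvls)

size(X)    # (6, 2): six observations, two levels
X[:, 1]    # indicator for "Group A": [1, 1, 1, 0, 0, 0]
X[:, 2]    # indicator for "Group B": [0, 0, 0, 1, 1, 1]
```

Each row contains exactly one `1`, marking the level that observation belongs to, which is why a small, known set of levels is needed before the encoding can be built.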