Initial implementation of documentation #52

Merged
merged 9 commits into from
Apr 27, 2022
4 changes: 4 additions & 0 deletions docs/Project.toml
@@ -1,5 +1,9 @@
[deps]
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
PairPlots = "43a3c2be-4208-490b-832a-a21dcd55d7da"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
TableTransforms = "0d432bfd-3ee1-4ac1-886a-39f05cc69a3e"

[compat]
36 changes: 22 additions & 14 deletions docs/make.jl
@@ -3,22 +3,30 @@ using Documenter

DocMeta.setdocmeta!(TableTransforms, :DocTestSetup, :(using TableTransforms); recursive=true)

# Workaround for GR warnings
ENV["GKSwstype"] = "100"

makedocs(;
modules=[TableTransforms],
authors="Júlio Hoffimann <julio.hoffimann@gmail.com> and contributors",
repo="https://github.com/JuliaML/TableTransforms.jl/blob/{commit}{path}#{line}",
sitename="TableTransforms.jl",
format=Documenter.HTML(;
prettyurls=get(ENV, "CI", "false") == "true",
canonical="https://JuliaML.github.io/TableTransforms.jl",
assets=String[],
),
pages=[
"Home" => "index.md",
],
modules=[TableTransforms],
authors="Júlio Hoffimann <julio.hoffimann@gmail.com> and contributors",
repo="https://github.com/JuliaML/TableTransforms.jl/blob/{commit}{path}#{line}",
sitename="TableTransforms.jl",
format=Documenter.HTML(;
prettyurls=get(ENV, "CI", "false") == "true",
canonical="https://JuliaML.github.io/TableTransforms.jl",
assets=String[]
),
pages=[
"Home" => "index.md",
"Transforms" => [
"transforms/builtin.md",
"transforms/external.md"
]
]
)

deploydocs(;
repo="github.com/JuliaML/TableTransforms.jl",
devbranch="main",
repo="github.com/JuliaML/TableTransforms.jl",
devbranch="master",
push_preview=true
)
167 changes: 162 additions & 5 deletions docs/src/index.md
@@ -2,13 +2,170 @@
CurrentModule = TableTransforms
```

# TableTransforms
# TableTransforms.jl

Documentation for [TableTransforms](https://github.com/JuliaML/TableTransforms.jl).
## Overview

```@index
This package provides transforms that are commonly used in statistics
and machine learning. It was developed to address specific needs in
feature engineering and works with general
[Tables.jl](https://github.com/JuliaData/Tables.jl) tables.

Past attempts to model transforms in Julia such as
[FeatureTransforms.jl](https://github.com/invenia/FeatureTransforms.jl)
served as inspiration for this package. We are happy to absorb any
missing transform, and contributions are very welcome.

## Features

- Transforms are **revertible**, meaning that one can apply a transform
and later undo it without manually keeping track of the constants
involved.

- Pipelines can be easily constructed with clean syntax
`(f1 → f2 → f3) ⊔ (f4 → f5)`, and they are automatically
revertible when the individual transforms are revertible.

- Branches of a pipeline and colwise transforms are run in parallel
using multiple threads with the awesome
[Transducers.jl](https://github.com/JuliaFolds/Transducers.jl)
framework.

- Pipelines can be reapplied to unseen "test" data using the same cache
(e.g. constants) fitted with "training" data. For example, a `ZScore`
relies on "fitting" `μ` and `σ` once at training time.

## Rationale

A common task in statistics and machine learning consists of transforming
the variables of a problem to achieve better convergence or to apply methods
that rely on multivariate Gaussian distributions. This process can be quite
tedious to implement by hand and very error-prone. We provide a consistent
and clean API to combine statistical transforms into pipelines.

*Although most transforms discussed here come from the statistical domain,
our long-term vision is more ambitious. We aim to provide a complete
user experience with fully-featured pipelines that include standardization
of column names, imputation of missing data, and more.*

## Installation

Get the latest stable release with Julia's package manager:

```julia
] add TableTransforms
```

```@autodocs
Modules = [TableTransforms]
## Usage

Below is a quick example with simple transforms:

```@example usage
using TableTransforms
using Plots, PairPlots
using Distributions
using Random; Random.seed!(2) # hide
gr(format=:png) # hide

# example table from PairPlots.jl
N = 100_000
a = [2randn(N÷2) .+ 6; randn(N÷2)]
b = [3randn(N÷2); 2randn(N÷2)]
c = randn(N)
d = c .+ 0.6randn(N)
table = (; a, b, c, d)

# corner plot of original table
table |> corner
```

```@example usage
# convert to PCA scores
table |> PCA() |> corner
```

```@example usage
# convert to any Distributions.jl distribution
table |> Quantile(Normal()) |> corner
```

Below is a more sophisticated example with a pipeline that has
two parallel branches. The tables produced by these two branches
are concatenated horizontally in the final table:

```@example usage
# create a transform pipeline
f1 = ZScore()
f2 = Scale()
f3 = Quantile()
f4 = Functional(cos)
f5 = Interquartile()
pipeline = (f1 → f2 → f3) ⊔ (f4 → f5)

# feed data into the pipeline
table |> pipeline |> corner
```

Each branch is a sequence of transforms constructed with the `→` (`\to<tab>`) operator.
The branches are placed in parallel with the `⊔` (`\sqcup<tab>`) operator.

```@docs
```

To revert a pipeline or a single transform, use the [`apply`](@ref) and [`revert`](@ref)
functions instead of the `|>` operator. The function [`isrevertible`](@ref) checks whether a transform is revertible.

```@docs
apply
revert
isrevertible
```

To exemplify the use of these functions, let's create a table:

```@example usage
a = [-1.0, 4.0, 1.6, 3.4]
b = [1.6, 3.4, -1.0, 4.0]
c = [3.4, 2.0, 3.6, -1.0]
table = (; a, b, c)
```

Now, let's choose a transform and check if it is revertible:

```@example usage
transform = Center()
isrevertible(transform)
```

We apply the transformation to the table and save the cache in a variable:

```@example usage
newtable, cache = apply(transform, table)
newtable
```

Using the cache, we can revert the transform:

```@example usage
original = revert(transform, newtable, cache)
```
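
The same round trip should also work for a whole pipeline, since pipelines are described above as revertible whenever their individual transforms are. The sketch below is only illustrative: the choice of `Center() → ZScore()` is arbitrary, and the final comparison assumes that the reverted table supports property access in the same way as the input `NamedTuple`.

```julia
using TableTransforms

# small table stored as a NamedTuple of column vectors
a = [-1.0, 4.0, 1.6, 3.4]
b = [1.6, 3.4, -1.0, 4.0]
c = [3.4, 2.0, 3.6, -1.0]
table = (; a, b, c)

# sequential pipeline built from two revertible transforms
# (this particular combination is chosen only for illustration)
pipeline = Center() → ZScore()

# ask the pipeline whether it can be reverted
isrevertible(pipeline)

# forward pass: keep the cache that is returned with the new table
newtable, cache = apply(pipeline, table)

# backward pass: recover the original columns using the cache
original = revert(pipeline, newtable, cache)

# compare column by column, allowing for floating-point round-off
# (assumes property access works on the reverted table)
all(isapprox(getproperty(original, n), getproperty(table, n)) for n in (:a, :b, :c))
```

Comparing with `isapprox` rather than `==` is deliberate: reverting involves floating-point arithmetic, so exact equality is not guaranteed.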

Finally, it is sometimes useful to [`reapply`](@ref) a transform that was
"fitted" with training data to unseen test data. In this case, the
cache from a previous [`apply`](@ref) call is used:

```@docs
reapply
```

Consider the following example:

```julia
# ZScore transform "fits" μ and σ using training data
newtable, cache = apply(ZScore(), traintable)

# we can reuse the same values of μ and σ with test data
newtable = reapply(ZScore(), testtable, cache)
```
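
A self-contained variant of the snippet above may help when trying this out. The tables `traintable` and `testtable` below are hypothetical stand-ins with made-up values, and the names `trainz`, `testz` and `testz_refit` are introduced only for this sketch.

```julia
using TableTransforms

# hypothetical training and test tables with made-up values
traintable = (x = [1.0, 2.0, 3.0, 4.0, 5.0], y = [10.0, 20.0, 30.0, 40.0, 50.0])
testtable  = (x = [3.0, 6.0], y = [30.0, 60.0])

# "fit" μ and σ on the training table and keep them in the cache
trainz, cache = apply(ZScore(), traintable)

# standardize the test table with the μ and σ stored in the cache
testz = reapply(ZScore(), testtable, cache)

# for contrast: calling apply on the test table fits new values of
# μ and σ from the test data itself, so the result generally differs
testz_refit, _ = apply(ZScore(), testtable)
```

The point of `reapply` in this sketch is that `testz` is expressed in the units learned from the training data, whereas `testz_refit` is standardized against the test data alone.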
135 changes: 135 additions & 0 deletions docs/src/transforms/builtin.md
@@ -0,0 +1,135 @@
# Built-in

Below is the list of transforms that are available in this package.

## Select

```@docs
Select
```

## Reject

```@docs
Reject
```

## Filter

```@docs
Filter
```

## DropMissing

```@docs
DropMissing
```

## Rename

```@docs
Rename
```

## Replace

```@docs
Replace
```

## Coalesce

```@docs
Coalesce
```

## Coerce

```@docs
Coerce
```

## Identity

```@docs
Identity
```

## Center

```@docs
Center
```

## Scale

```@docs
Scale
```

## MinMax

```@docs
MinMax
```

## Interquartile

```@docs
Interquartile
```

## ZScore

```@docs
ZScore
```

## Quantile

```@docs
Quantile
```

## Functional

```@docs
Functional
```

## EigenAnalysis

```@docs
EigenAnalysis
```

## PCA

```@docs
PCA
```

## DRS

```@docs
DRS
```

## SDS

```@docs
SDS
```

## Sequential

```@docs
Sequential
```

## Parallel

```@docs
Parallel
```
13 changes: 13 additions & 0 deletions docs/src/transforms/external.md
@@ -0,0 +1,13 @@
# External

Below is the list of transforms that are available in external packages.

## [CoDa.jl](https://github.com/JuliaEarth/CoDa.jl)

| Transform | Description |
|-----------|-------------|
| `Closure` | Compositional closure |
| `Remainder` | Compositional remainder |
| `ALR` | Additive log-ratio |
| `CLR` | Centered log-ratio |
| `ILR` | Isometric log-ratio |