Merge pull request #32 from IBM/documentation
Documentation
ppalmes committed May 14, 2019
2 parents e1be376 + 34c8b9a commit 77078ca
Showing 19 changed files with 5,906 additions and 4,991 deletions.
9,852 changes: 4,926 additions & 4,926 deletions data/testdata_output.csv

Large diffs are not rendered by default.

481 changes: 416 additions & 65 deletions docs/StatifierNotebook.jl.ipynb

Large diffs are not rendered by default.

35 changes: 35 additions & 0 deletions docs/make.jl
@@ -0,0 +1,35 @@
using Documenter, TSML

using TSML.DecisionTreeLearners

makedocs(modules = [TSML,DecisionTreeLearners],
         clean = false,
         sitename = "TSML Documentation",
         pages = Any[
             "HOME" => "index.md",
             "Tutorial" => Any[
                 "tutorial/aggregators.md",
                 "tutorial/pipeline.md",
                 "tutorial/statistics.md",
                 "tutorial/tsdetectors.md"
             ],
             "Manual" => Any[
                 "Date Processing" => "man/dateproc.md",
                 "Value Processing" => "man/valueproc.md",
                 "Aggregation" => "man/aggregation.md",
                 "Imputation" => "man/imputation.md",
                 "Monotonic Detection" => "man/monotonic.md",
                 "TS Classification" => "man/tsclassification.md",
                 "CLI Wrappers" => "man/cli.md"
             ],
             "Library" => Any[
                 "Decision Tree" => "lib/decisiontree.md"
                 #"Scikit Learners" => "lib/sklearn.md",
                 #"Caret Learners" => "lib/caretlearn.md"
             ]
         ],
         format = Documenter.HTML(
             prettyurls = get(ENV, "CI", nothing) == "true"
         )
)

24 changes: 24 additions & 0 deletions docs/make.jl.old
@@ -0,0 +1,24 @@
using Pkg
using Documenter

ENV["LOAD_SK_CARET"] = "true"

Pkg.activate("..")

using TSML

makedocs(
    # Submodules are referenced through TSML because only `using TSML` is in scope.
    modules = [TSML, TSML.DecisionTreeLearners, TSML.SKLearners],
    clean = false,
    sitename = "TSML.jl",
    pages = Any[
        "Home" => "index.md",
        "Library" => Any[
            "DecisionTree" => "lib/decisiontree.md",
            "SKLearners" => "lib/sklearn.md"
        ]
    ],
    format = Documenter.HTML(
        prettyurls = get(ENV, "CI", nothing) == "true"
    )
)
69 changes: 69 additions & 0 deletions docs/src/index.md
@@ -0,0 +1,69 @@
```@meta
Author = "Paulito P. Palmes"
```

# TSML (Time-Series Machine Learning)

TSML (Time Series Machine Learning) is a package
for time-series data processing, classification,
and prediction. It combines ML libraries from Python's
scikit-learn, R's caret, and Julia ML under a common API,
allowing seamless ensembling and integration of
heterogeneous ML libraries to build complex models
for robust time-series preprocessing and prediction/classification.

## Package Features

- TS aggregation based on time/date interval
- TS imputation based on Nearest Neighbors
- TS statistical metrics of data quality
- TS classification for automatic data discovery
- TS prediction with more than 100 libraries from caret, scikit-learn, and Julia
- TS date/val matrix conversion of 1-d TS using sliding windows for ML input
- Pipeline API allows high-level description of the processing workflow
- Easily extensible architecture using just two main interfaces: `fit!` and `transform!`
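
The sliding-window conversion listed above can be illustrated with a minimal sketch in plain Julia. The function name `slidingwindows` and its `width`/`stride` parameters are hypothetical and not TSML's API; the sketch only shows how a 1-d series of values becomes a matrix with one overlapping window per row, and it ignores the date column.

```julia
# Hypothetical helper (not TSML's API): turn a 1-d series into a matrix of
# overlapping windows, one row per window, suitable as ML input.
function slidingwindows(x::AbstractVector, width::Int, stride::Int=1)
    width >= 1 || throw(ArgumentError("width must be >= 1"))
    stride >= 1 || throw(ArgumentError("stride must be >= 1"))
    # Start index of every window that fits completely inside x.
    starts = 1:stride:(length(x) - width + 1)
    m = Matrix{eltype(x)}(undef, length(starts), width)
    for (row, s) in enumerate(starts)
        m[row, :] = x[s:s+width-1]
    end
    return m
end

# returns the 3×3 matrix [1 2 3; 2 3 4; 3 4 5]
slidingwindows([1, 2, 3, 4, 5], 3)
```

In a supervised setup, the last column of each row (or the value right after the window) would typically serve as the prediction target; that split is left out here.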


## Installation

TSML is in the official Julia package registry.
The latest release can be installed from the Julia
prompt using Julia's package manager:
```julia
julia> ]add TSML
```

or

```julia
julia> using Pkg
julia> pkg"add TSML"
```

or

```julia
julia> using Pkg
julia> Pkg.add("TSML")
```
Once TSML is installed, you can load the TSML package with:

```julia
julia> using TSML
```

or

```julia
julia> import TSML
```
Generally, you will need the various transformers and utilities in TSML for
time-series processing. To use them, TSML code typically declares the
following at the top of the application:

```julia
using TSML
using TSML.TSMLTransformers
using TSML.TSMLTypes
using TSML.Utils
```
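
The two-interface design (`fit!` and `transform!`) can be illustrated with a minimal, self-contained sketch that does not load TSML. The abstract type `AbstractTransformer`, the `ZScaler` transformer, and the `runpipeline!` helper are all hypothetical and are not TSML's types or API; they only show the pattern of a `fit!` that learns parameters and a `transform!` that applies them. Run it in a fresh session, since it defines its own `fit!` and `transform!`.

```julia
using Statistics

# Hypothetical stand-in for a transformer hierarchy (not TSML's API).
abstract type AbstractTransformer end

# Hypothetical transformer that standardizes a vector of values.
mutable struct ZScaler <: AbstractTransformer
    mu::Float64
    sigma::Float64
    ZScaler() = new(0.0, 1.0)
end

# fit!: learn parameters from the data and store them in the transformer.
function fit!(t::ZScaler, x::AbstractVector{<:Real})
    t.mu = mean(x)
    t.sigma = std(x)
    return t
end

# transform!: apply the learned parameters to the data.
transform!(t::ZScaler, x::AbstractVector{<:Real}) = (x .- t.mu) ./ t.sigma

# A pipeline is a sequence of transformers, each fitted and then applied in order.
function runpipeline!(steps::Vector{<:AbstractTransformer}, x)
    for s in steps
        fit!(s, x)
        x = transform!(s, x)
    end
    return x
end

z = runpipeline!([ZScaler()], [1.0, 2.0, 3.0, 4.0])
```

Adding a new processing step under this pattern only requires defining a new subtype with its own `fit!` and `transform!` methods; the pipeline loop itself does not change.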
14 changes: 14 additions & 0 deletions docs/src/lib/caretlearn.md1
@@ -0,0 +1,14 @@
```@meta
Author = "Paulito Palmes"
```

# [CaretLearners](@id lib_caretlearners)
Creates an API wrapper around caret libraries for use in pipeline workflows.

```@index
Modules = [CaretLearners]
```

```@autodocs
Modules = [CaretLearners]
```
14 changes: 14 additions & 0 deletions docs/src/lib/decisiontree.md
@@ -0,0 +1,14 @@
```@meta
Author = "Paulito Palmes"
```

# [DecisionTreeLearners](@id lib_decisiontree)
Creates an API wrapper around decision tree learners for use in pipeline workflows.

```@index
Modules = [DecisionTreeLearners]
```

```@autodocs
Modules = [DecisionTreeLearners]
```
14 changes: 14 additions & 0 deletions docs/src/lib/sklearn.md1
@@ -0,0 +1,14 @@
```@meta
Author = "Paulito Palmes"
```

# [SKLearners](@id lib_sklearners)
Creates an API wrapper around scikit-learn learners for use in pipeline workflows.

```@index
Modules = [SKLearners]
```

```@autodocs
Modules = [SKLearners]
```
5 changes: 5 additions & 0 deletions docs/src/man/aggregation.md
@@ -0,0 +1,5 @@
```@meta
Author = "Paulito P. Palmes"
```

# Aggregation
5 changes: 5 additions & 0 deletions docs/src/man/cli.md
@@ -0,0 +1,5 @@
```@meta
Author = "Paulito P. Palmes"
```

# CLI Wrappers
5 changes: 5 additions & 0 deletions docs/src/man/dateproc.md
@@ -0,0 +1,5 @@
```@meta
Author = "Paulito P. Palmes"
```

# Date Preprocessing
5 changes: 5 additions & 0 deletions docs/src/man/imputation.md
@@ -0,0 +1,5 @@
```@meta
Author = "Paulito P. Palmes"
```

# Imputation
5 changes: 5 additions & 0 deletions docs/src/man/monotonic.md
@@ -0,0 +1,5 @@
```@meta
Author = "Paulito P. Palmes"
```

# Monotonic Detection
5 changes: 5 additions & 0 deletions docs/src/man/tsclassification.md
@@ -0,0 +1,5 @@
```@meta
Author = "Paulito P. Palmes"
```

# TS Classification
5 changes: 5 additions & 0 deletions docs/src/man/valueproc.md
@@ -0,0 +1,5 @@
```@meta
Author = "Paulito P. Palmes"
```

# Value Preprocessing
96 changes: 96 additions & 0 deletions docs/src/tutorial/aggregators.md
@@ -0,0 +1,96 @@
```@meta
Author = "Paulito P. Palmes"
```

# Aggregators and Imputers

The package assumes a two-column input composed of dates and values.
The first step of the workflow aggregates values over a specified
date/time interval, which reduces the occurrence of missing values and noise.
The aggregated data is then left-joined to the complete sequence of dates
at that interval. Remaining missing values are replaced using k nearest
neighbors, where k is the symmetric distance from the location of the
missing value. This step can be repeated until no missing values remain.
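
The left-join step described above can be sketched in plain Julia using only the standard library. The helper `leftjoin_grid` is hypothetical and not part of TSML; it assumes the aggregated data is held in a `Dict` mapping each interval's timestamp to its value, and shows how every date in the complete sequence either picks up its aggregated value or becomes `missing`.

```julia
using Dates

# Hypothetical helper (not TSML's API): align aggregated (date => value) pairs
# to the complete grid of dates at a fixed interval, leaving gaps as `missing`.
function leftjoin_grid(agg::AbstractDict, start::DateTime, stop::DateTime,
                       interval::Period)
    grid = collect(start:interval:stop)
    # Every grid date is kept; dates absent from `agg` get `missing`.
    vals = Union{Missing,Float64}[get(agg, d, missing) for d in grid]
    return grid, vals
end

agg = Dict(DateTime(2014,1,1,0) => 0.4, DateTime(2014,1,1,2) => 0.7)
dates, vals = leftjoin_grid(agg, DateTime(2014,1,1,0), DateTime(2014,1,1,3), Hour(1))
# dates has 4 entries (00:00 to 03:00); vals is [0.4, missing, 0.7, missing]
```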

Let's create a Date/Value input with some missing values and apply TSML functions
to normalize and clean the data:

```@example 1
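using Random, Dates, DataFrames

# Build a 15-minute series from 2014-01-01 to 2016-01-01 and set
# 50,000 randomly chosen readings to missing.
function generateDataWithMissing()
    Random.seed!(123)
    gdate = DateTime(2014,1,1):Dates.Minute(15):DateTime(2016,1,1)
    gval = Array{Union{Missing,Float64}}(rand(length(gdate)))
    gmissing = 50000
    gndxmissing = Random.shuffle(1:length(gdate))[1:gmissing]
    df = DataFrame(Date=collect(gdate),Value=gval)
    # Column access via `df.Value` instead of the deprecated `df[:Value]`.
    df.Value[gndxmissing] .= missing
    return df
end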
```

Let's output the first 20 rows:

```@example 1
X = generateDataWithMissing()
first(X,20)
```

## DateValgator

You'll notice several blocks of missing values in the readings, which are taken every 15 minutes.
Let's aggregate the dataset by taking the hourly median using the `DateValgator` transformer.

```@example 1
using TSML
using TSML.TSMLTypes
using TSML.Utils
using TSML.TSMLTransformers
using TSML: DateValgator
dtvlgator = DateValgator(Dict(:dateinterval=>Dates.Hour(1)))
fit!(dtvlgator,X)
results = transform!(dtvlgator,X)
first(results,20)
```

Missing values are now reduced because of the aggregation applied by the
`DateValgator` transformer. TSML transformers support two main functions:
`fit!` and `transform!`. For `DateValgator`, `fit!` performs the initial setup of
the necessary parameters and validates the arguments, while `transform!` contains
the aggregation algorithm.
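
As a rough illustration of what an interval-median aggregation involves, here is a sketch using only the Julia standard library. `aggregate_median` is a hypothetical helper, not `DateValgator`'s actual implementation; it assumes timestamps are `DateTime` values and that each reading belongs to the interval obtained by flooring its timestamp to the interval.

```julia
using Dates, Statistics

# Hypothetical sketch (not TSML's implementation): group readings into interval
# buckets and take the median of the non-missing values in each bucket.
function aggregate_median(dates::AbstractVector{DateTime},
                          vals::AbstractVector, interval::Period)
    buckets = Dict{DateTime,Vector{Float64}}()
    for (d, v) in zip(dates, vals)
        ismissing(v) && continue              # skip missing readings
        key = floor(d, interval)              # e.g. 10:45 -> 10:00 for Hour(1)
        push!(get!(buckets, key, Float64[]), v)
    end
    # Sort the bucket keys so the output is in chronological order.
    ks = sort(collect(keys(buckets)))
    return ks, [median(buckets[k]) for k in ks]
end

ds = [DateTime(2014,1,1,0,0), DateTime(2014,1,1,0,15), DateTime(2014,1,1,0,30),
      DateTime(2014,1,1,1,0), DateTime(2014,1,1,1,15)]
vs = [1.0, missing, 3.0, missing, missing]
ks, meds = aggregate_median(ds, vs, Hour(1))
# ks is [DateTime(2014,1,1,0)] and meds is [2.0]; the 01:00 bucket has no readings
```

Intervals with no observed readings do not appear in the result; aligning the result to the complete date grid turns them into `missing`, which is what the nearest-neighbor step then fills.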

## DateValNNer

Let's perform further processing to replace the remaining missing values with their nearest neighbors.
We will use `DateValNNer`, a TSML transformer that can process the output of `DateValgator`.
`DateValNNer` can also process non-aggregated data: it first runs the same aggregation workflow
as `DateValgator` and then performs its imputation routine.

```@example 1
using TSML: DateValNNer
datevalnner = DateValNNer(Dict(:dateinterval=>Dates.Hour(1)))
fit!(datevalnner, X)
results = transform!(datevalnner,X)
first(results,20)
```

After running the `DateValNNer`, it's guaranteed that there will be no more
missing data.
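
The symmetric nearest-neighbor idea from the introduction can be sketched as follows. `nnimpute` is hypothetical and is one plausible reading of the description, not `DateValNNer`'s code: for each missing entry it widens a symmetric window `i-k:i+k` until the window contains at least one observed value, then uses the median of the observed values in that window (the choice of median is an assumption). It assumes floating-point values.

```julia
using Statistics

# Hypothetical sketch (not TSML's implementation): fill each missing entry with
# the median of the observed values in the smallest symmetric window around it.
# Assumes float-valued data, since the median of integers can be fractional.
function nnimpute(vals::AbstractVector; maxk::Int=length(vals))
    out = copy(vals)
    n = length(vals)
    for i in 1:n
        ismissing(vals[i]) || continue
        for k in 1:maxk
            # Neighbors are read from the original `vals`, not from `out`.
            found = collect(skipmissing(vals[max(1, i - k):min(n, i + k)]))
            if !isempty(found)
                out[i] = median(found)
                break
            end
        end
    end
    return out
end

nnimpute([1.0, missing, missing, 4.0, missing])  # returns [1.0, 1.0, 4.0, 4.0, 4.0]
```

Because the sketch reads neighbors from the original vector, imputed values do not feed into later imputations in the same pass; with the default `maxk`, an entry stays `missing` only if the whole series is missing.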

## DateValizer

Another imputer for replacing missing data is `DateValizer`. It computes the median
for each hour of the day (24 medians in total) and uses this hour => median mapping
to replace each missing value, with the hour of its timestamp as the key. Below is a sample
workflow that replaces missing data in X with the hourly medians.

```@example 1
using TSML: DateValizer
datevalizer = DateValizer(Dict(:dateinterval=>Dates.Hour(1)))
fit!(datevalizer, X)
results = transform!(datevalizer,X)
first(results,20)
```
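
The hour => median mapping described above can be sketched with the standard library alone. `hourly_median_map` and `fill_by_hour` are hypothetical helpers, not TSML's API, and the sketch ignores the `:dateinterval` argument shown above; it only shows the learn-then-fill idea, which maps naturally onto a `fit!` (learn the 24 medians) and a `transform!` (fill the gaps).

```julia
using Dates, Statistics

# Hypothetical sketch (not TSML's implementation): learn the median value for
# each hour of the day (0-23) from the observed data.
function hourly_median_map(dates::AbstractVector{DateTime}, vals::AbstractVector)
    byhour = Dict{Int,Vector{Float64}}()
    for (d, v) in zip(dates, vals)
        ismissing(v) || push!(get!(byhour, hour(d), Float64[]), v)
    end
    return Dict(h => median(xs) for (h, xs) in byhour)
end

# Fill each missing entry with the median for its timestamp's hour;
# entries whose hour never had data stay missing.
function fill_by_hour(dates::AbstractVector{DateTime}, vals::AbstractVector,
                      hmap::AbstractDict)
    return [ismissing(v) ? get(hmap, hour(d), missing) : v
            for (d, v) in zip(dates, vals)]
end

ds = [DateTime(2014,1,1,9), DateTime(2014,1,1,10), DateTime(2014,1,2,9),
      DateTime(2014,1,2,10), DateTime(2014,1,3,9)]
vs = [1.0, 5.0, 3.0, 7.0, missing]
hmap = hourly_median_map(ds, vs)     # 9 => 2.0, 10 => 6.0
filled = fill_by_hour(ds, vs, hmap)  # the 09:00 gap on Jan 3 becomes 2.0
```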

