Commit
Merge pull request #32 from IBM/documentation
Documentation
Showing 19 changed files with 5,906 additions and 4,991 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
using Documenter, TSML | ||
|
||
using TSML.DecisionTreeLearners | ||
|
||
makedocs(modules = [TSML,DecisionTreeLearners], | ||
clean = false, | ||
sitename = "TSML Documentation", | ||
pages = Any[ | ||
"HOME" => "index.md", | ||
"Tutorial" => Any[ | ||
"tutorial/aggregators.md", | ||
"tutorial/pipeline.md", | ||
"tutorial/statistics.md", | ||
"tutorial/tsdetectors.md" | ||
], | ||
"Manual" => Any[ | ||
"Date Processing" => "man/dateproc.md", | ||
"Value Processing" => "man/valueproc.md", | ||
"Aggregation" => "man/aggregation.md", | ||
"Imputation" => "man/imputation.md", | ||
"Monotonic Detection" => "man/monotonic.md", | ||
"TS Classification" => "man/tsclassification.md", | ||
"CLI Wrappers" => "man/cli.md" | ||
], | ||
"Library" => Any[ | ||
"Decision Tree" => "lib/decisiontree.md" | ||
#"Scikit Learners" => "lib/sklearn.md", | ||
#"Caret Learners" => "lib/caretlearn.md" | ||
] | ||
], | ||
format = Documenter.HTML( | ||
prettyurls = get(ENV, "CI", nothing) == "true" | ||
) | ||
) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
using Pkg | ||
using Documenter | ||
|
||
ENV["LOAD_SK_CARET"] = "true" | ||
|
||
Pkg.activate("..") | ||
|
||
using TSML | ||
|
||
makedocs( | ||
modules = [TSML,DecisionTreeLearners,SKLearners], | ||
clean = false, | ||
sitename="TSML.jl", | ||
pages = Any[ | ||
"Home" => "index.md", | ||
"Library" => Any[ | ||
"DecisionTree" =>"lib/decisiontree.md", | ||
"SKLearners" =>"lib/sklearn.md" | ||
] | ||
], | ||
format = Documenter.HTML( | ||
prettyurls = get(ENV, "CI", nothing) == "true" | ||
) | ||
) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
```@meta | ||
Author = "Paulito P. Palmes" | ||
``` | ||
|
||
# TSML (Time-Series Machine Learning) | ||
|
||
TSML (Time Series Machine Learning) is a package | ||
for time-series data processing, classification, | ||
and prediction. It combines ML libraries from Python's | ||
ScikitLearn, R's Caret, and Julia ML using a common API | ||
and allows seamless ensembling and integration of | ||
heterogeneous ML libraries to create complex models | ||
for robust time-series pre-processing and prediction/classification. | ||
|
||
## Package Features | ||
|
||
- TS aggregation based on time/date interval | ||
- TS imputation based on Nearest Neighbors | ||
- TS statistical metrics of data quality | ||
- TS classification for automatic data discovery | ||
- TS prediction with more than 100 libraries from caret, scikitlearn, and julia | ||
- TS date/val matrix conversion of 1-d TS using sliding windows for ML input | ||
- Pipeline API allows high-level description of the processing workflow | ||
- Easily extensible architecture by using just two main interfaces: fit and transform | ||
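The sliding-window conversion listed above can be illustrated with a minimal sketch in plain Julia. This is not TSML's implementation; the helper name `sliding_windows` and its `width` and `stride` parameters are assumptions made up for this illustration.

```julia
# Hypothetical helper (not a TSML function): turn a 1-d series into a matrix
# in which each row holds `width` consecutive values and successive rows
# start `stride` positions apart.
function sliding_windows(x::Vector{T}, width::Int, stride::Int = 1) where {T}
    starts = 1:stride:(length(x) - width + 1)
    m = Matrix{T}(undef, length(starts), width)
    for (row, s) in enumerate(starts)
        m[row, :] = x[s:(s + width - 1)]
    end
    return m
end

# Five readings with a window of three give three overlapping rows.
windows = sliding_windows([1.0, 2.0, 3.0, 4.0, 5.0], 3)
```

Each row can then serve as one input sample for a learner, for example with the last column treated as the value to predict.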
|
||
|
||
## Installation | ||
|
||
TSML is in the official Julia package registry. | ||
The latest release can be installed at the Julia | ||
prompt using Julia's package manager: | ||
```julia | ||
julia> ]add TSML | ||
``` | ||
|
||
or | ||
|
||
```julia | ||
julia> using Pkg | ||
julia> pkg"add TSML" | ||
``` | ||
|
||
or | ||
|
||
```julia | ||
julia> using Pkg | ||
julia> Pkg.add("TSML") | ||
``` | ||
Once TSML is installed, you can load the package with: | ||
|
||
```julia | ||
julia> using TSML | ||
``` | ||
|
||
or | ||
|
||
```julia | ||
julia> import TSML | ||
``` | ||
Generally, you will need the different transformers and utils in TSML for | ||
time-series processing. To use them, it is standard in TSML code to have the | ||
following declared at the topmost part of your application: | ||
|
||
```julia | ||
using TSML | ||
using TSML.TSMLTransformers | ||
using TSML.TSMLTypes | ||
using TSML.Utils | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
```@meta | ||
Author = "Paulito Palmes" | ||
``` | ||
|
||
# [CaretLearners](@id lib_caretlearners) | ||
Provides an API wrapper around Caret libraries for use in pipeline workflows. | ||
|
||
```@index | ||
Modules = [CaretLearners] | ||
``` | ||
|
||
```@autodocs | ||
Modules = [CaretLearners] | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
```@meta | ||
Author = "Paulito Palmes" | ||
``` | ||
|
||
# [DecisionTreeLearners](@id lib_decisiontree) | ||
Provides an API wrapper around DecisionTrees for use in pipeline workflows. | ||
|
||
```@index | ||
Modules = [DecisionTreeLearners] | ||
``` | ||
|
||
```@autodocs | ||
Modules = [DecisionTreeLearners] | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
```@meta | ||
Author = "Paulito Palmes" | ||
``` | ||
|
||
# [SKLearners](@id lib_sklearners) | ||
Provides an API wrapper around Scikit learners for use in pipeline workflows. | ||
|
||
```@index | ||
Modules = [SKLearners] | ||
``` | ||
|
||
```@autodocs | ||
Modules = [SKLearners] | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
```@meta | ||
Author = "Paulito P. Palmes" | ||
``` | ||
|
||
# Aggregation |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
```@meta | ||
Author = "Paulito P. Palmes" | ||
``` | ||
|
||
# CLI Wrappers |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
```@meta | ||
Author = "Paulito P. Palmes" | ||
``` | ||
|
||
# Preprocessing |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
```@meta | ||
Author = "Paulito P. Palmes" | ||
``` | ||
|
||
# Imputation |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
```@meta | ||
Author = "Paulito P. Palmes" | ||
``` | ||
|
||
# Monotonic Detection |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
```@meta | ||
Author = "Paulito P. Palmes" | ||
``` | ||
|
||
# TS Classification |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
```@meta | ||
Author = "Paulito P. Palmes" | ||
``` | ||
|
||
# Value Preprocessing |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
```@meta | ||
Author = "Paulito P. Palmes" | ||
``` | ||
|
||
# Aggregators and Imputers | ||
|
||
The package assumes a two-column input composed of Dates and Values. | ||
The first part of the workflow aggregates values over the specified | ||
date/time interval, which reduces the occurrence of missing values and noise. | ||
The aggregated data is then left-joined to the complete sequence of dates | ||
at the specified date/time interval. Remaining missing values are replaced | ||
using their k nearest neighbors, where k is the symmetric distance from the | ||
location of the missing value. This step can be repeated until there | ||
are no more missing values. | ||
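The three steps described above (aggregate, align to the complete date sequence, fill from neighbors) can be sketched in plain Julia using only the standard library. This is an illustration under stated assumptions, not TSML's code: the helpers `hourly_median`, `align_to_grid`, and `fill_nearest` are hypothetical, and taking the median of the neighbors within `k` positions on each side is an assumed reading of "k nearest neighbors".

```julia
using Dates, Statistics

# Hypothetical helper: group readings by the hour they fall in and keep the
# median of the non-missing values of each group.
function hourly_median(dates::Vector{DateTime}, vals::Vector{Union{Missing,Float64}})
    groups = Dict{DateTime,Vector{Float64}}()
    for (d, v) in zip(dates, vals)
        ismissing(v) && continue
        push!(get!(groups, floor(d, Hour(1)), Float64[]), v)
    end
    return Dict{DateTime,Float64}(k => median(v) for (k, v) in groups)
end

# Hypothetical helper: look up every hour of the complete sequence, so that
# hours without any reading show up as `missing` (the left join).
function align_to_grid(agg::Dict{DateTime,Float64}, grid)
    return Union{Missing,Float64}[get(agg, h, missing) for h in grid]
end

# Hypothetical helper: one pass that replaces each missing entry with the
# median of the non-missing values at most `k` positions away on either side.
# Entries with no such neighbor stay missing and need another pass or a larger k.
function fill_nearest(vals::Vector{Union{Missing,Float64}}, k::Int)
    out = copy(vals)
    for i in eachindex(vals)
        ismissing(vals[i]) || continue
        window = vals[max(firstindex(vals), i - k):min(lastindex(vals), i + k)]
        neighbors = collect(skipmissing(window))
        isempty(neighbors) || (out[i] = median(neighbors))
    end
    return out
end

# Four hours of 15-minute readings; the second hour has no data at all.
dates = collect(DateTime(2014, 1, 1, 0, 0):Minute(15):DateTime(2014, 1, 1, 3, 45))
vals = Union{Missing,Float64}[1.0, 2.0, 3.0, 4.0,
                              missing, missing, missing, missing,
                              5.0, missing, 7.0, missing,
                              8.0, 8.0, 8.0, 8.0]
grid = DateTime(2014, 1, 1, 0):Hour(1):DateTime(2014, 1, 1, 3)

aligned = align_to_grid(hourly_median(dates, vals), grid)  # second hour is still missing
filled = fill_nearest(aligned, 1)                          # second hour taken from its neighbors
```

Aggregation alone removes the gaps inside the third hour, while the fully empty second hour survives until the neighbor-based fill.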
|
||
Let us create a Date/Value input with some missing values and apply TSML functions | ||
to normalize/clean the data: | ||
|
||
```@example 1 | ||
using Random, Dates, DataFrames | ||
function generateDataWithMissing() | ||
Random.seed!(123) | ||
gdate = DateTime(2014,1,1):Dates.Minute(15):DateTime(2016,1,1) | ||
gval = Array{Union{Missing,Float64}}(rand(length(gdate))) | ||
gmissing = 50000 | ||
gndxmissing = Random.shuffle(1:length(gdate))[1:gmissing] | ||
df = DataFrame(Date=gdate,Value=gval) | ||
df.Value[gndxmissing] .= missing | ||
return df | ||
end | ||
``` | ||
|
||
Let's output the first 20 rows: | ||
|
||
```@example 1 | ||
X = generateDataWithMissing() | ||
first(X,20) | ||
``` | ||
## DateValgator | ||
You'll notice several blocks of missing values in readings taken every 15 minutes. | ||
Let's aggregate our dataset by taking the hourly median using the `DateValgator` transformer. | ||
|
||
```@example 1 | ||
using TSML | ||
using TSML.TSMLTypes | ||
using TSML.Utils | ||
using TSML.TSMLTransformers | ||
using TSML: DateValgator | ||
dtvlgator = DateValgator(Dict(:dateinterval=>Dates.Hour(1))) | ||
fit!(dtvlgator,X) | ||
results = transform!(dtvlgator,X) | ||
first(results,20) | ||
``` | ||
|
||
Aggregation with the `DateValgator` transformer reduces the number of | ||
missing values. TSML transformers support two main functions: | ||
`fit!` and `transform!`. For `DateValgator`, `fit!` sets up the necessary parameters | ||
and validates the arguments, while `transform!` contains the aggregation algorithm. | ||
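To make the `fit!`/`transform!` split concrete, here is a minimal sketch of a transformer that follows the same two-function pattern. The `MeanCenterer` type and its fields are hypothetical and not part of TSML; `fit!` and `transform!` are defined locally here, so run the sketch in a session that has not loaded TSML.

```julia
using Statistics

# Hypothetical transformer (not a TSML type): subtracts a mean learned in fit!.
mutable struct MeanCenterer
    args::Dict{Symbol,Any}          # user-supplied options
    model::Union{Nothing,Float64}   # state learned by fit!, `nothing` until then
end
MeanCenterer(args::Dict = Dict{Symbol,Any}()) = MeanCenterer(Dict{Symbol,Any}(args), nothing)

# fit! validates the input and stores whatever transform! will need later.
function fit!(t::MeanCenterer, x::Vector{Float64})
    isempty(x) && throw(ArgumentError("input must not be empty"))
    t.model = mean(x)
    return nothing
end

# transform! applies the stored state to the data.
function transform!(t::MeanCenterer, x::Vector{Float64})
    t.model === nothing && error("call fit! before transform!")
    return x .- t.model
end

centerer = MeanCenterer()
data = [1.0, 2.0, 3.0]
fit!(centerer, data)
centered = transform!(centerer, data)
```

Keeping setup in `fit!` and computation in `transform!` means any object that implements both can be dropped into the same workflow.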
|
||
## DateValNNer | ||
|
||
Let's perform further processing to replace the remaining missing values with their nearest neighbors. | ||
We will use `DateValNNer`, a TSML transformer, to process the output of `DateValgator`. | ||
`DateValNNer` can also process non-aggregated data by first running a workflow similar | ||
to that of `DateValgator` before performing its imputation routine. | ||
|
||
```@example 1 | ||
using TSML: DateValNNer | ||
datevalnner = DateValNNer(Dict(:dateinterval=>Dates.Hour(1))) | ||
fit!(datevalnner, X) | ||
results = transform!(datevalnner,X) | ||
first(results,20) | ||
``` | ||
|
||
After running the `DateValNNer`, it's guaranteed that there will be no more | ||
missing data. | ||
|
||
## DateValizer | ||
|
||
One more imputer to replace missing data is `DateValizer`. It computes the hourly | ||
median over 24 hours and uses the hour => median mapping | ||
to replace missing data, with the hour as the key. Below is a sample | ||
workflow to replace missing data in X with the hourly medians. | ||
|
||
```@example 1 | ||
using TSML: DateValizer | ||
datevalizer = DateValizer(Dict(:dateinterval=>Dates.Hour(1))) | ||
fit!(datevalizer, X) | ||
results = transform!(datevalizer,X) | ||
first(results,20) | ||
``` | ||
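The hour => median idea behind `DateValizer` can also be sketched without TSML. The following is an illustration only: the helpers `hour_medians` and `fill_by_hour` are hypothetical names, and the sketch assumes the mapping is keyed by hour of day (0 to 23).

```julia
using Dates, Statistics

# Hypothetical helper: map each hour of the day to the median of the
# non-missing values observed at that hour.
function hour_medians(dates, vals)
    buckets = Dict{Int,Vector{Float64}}()
    for (d, v) in zip(dates, vals)
        ismissing(v) || push!(get!(buckets, hour(d), Float64[]), v)
    end
    return Dict{Int,Float64}(h => median(v) for (h, v) in buckets)
end

# Hypothetical helper: replace each missing value with the median stored for
# its hour; the value stays missing if that hour was never observed.
function fill_by_hour(dates, vals, medians)
    return [ismissing(v) ? get(medians, hour(d), missing) : v for (d, v) in zip(dates, vals)]
end

# Three days of hourly readings whose value is hour + day of month.
dates = collect(DateTime(2014, 1, 1, 0):Hour(1):DateTime(2014, 1, 3, 23))
vals = Union{Missing,Float64}[hour(d) + day(d) for d in dates]
vals[6] = missing   # day 1, 05:00

medians = hour_medians(dates, vals)
filled = fill_by_hour(dates, vals, medians)   # 05:00 on day 1 takes the median of days 2 and 3
```

Unlike a neighbor-based fill, this approach uses values from the same hour on other days, so it can fill long consecutive gaps as long as each hour is observed somewhere in the data.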
|
||
|