Merge pull request #32 from IBM/documentation

Documentation
IBM · May 14, 2019 · 77078ca
2 parents e1be376 + 34c8b9a
commit 77078ca
Showing 19 changed files with 5,906 additions and 4,991 deletions.
diff --git a/data/testdata_output.csv b/data/testdata_output.csv
diff --git a/docs/StatifierNotebook.jl.ipynb b/docs/StatifierNotebook.jl.ipynb
diff --git a/docs/make.jl b/docs/make.jl
@@ -0,0 +1,35 @@
+using Documenter, TSML
+
+using TSML.DecisionTreeLearners
+
+makedocs(modules = [TSML,DecisionTreeLearners],
+	 clean = false,
+	 sitename = "TSML Documentation",
+	 pages = Any[
+	    "HOME" => "index.md",
+	    "Tutorial" => Any[
+		    "tutorial/aggregators.md",
+		    "tutorial/pipeline.md",
+		    "tutorial/statistics.md",
+		    "tutorial/tsdetectors.md"
+	    ],
+	    "Manual" => Any[
+		    "Date Processing" => "man/dateproc.md",
+		    "Value Processing" => "man/valueproc.md",
+		    "Aggregation" => "man/aggregation.md",
+		    "Imputation" => "man/imputation.md",
+		    "Monotonic Detection" => "man/monotonic.md",
+		    "TS Classification" => "man/tsclassification.md",
+		    "CLI Wrappers" => "man/cli.md"
+	    ],
+	    "Library" => Any[
+		"Decision Tree" => "lib/decisiontree.md"
+		#"Scikit Learners" => "lib/sklearn.md",
+		#"Caret Learners" => "lib/caretlearn.md"
+	    ]
+	 ],
+	 format = Documenter.HTML(
+	    prettyurls = get(ENV, "CI", nothing) == "true"
+	 )
+     )
+
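Review note (docs/make.jl): the `prettyurls` expression above evaluates to `true` only when the `CI` environment variable is set to the string `"true"`, so local builds get plain `page.html` links that open without a web server, while CI builds get directory-style URLs. The sketch below shows just that boolean logic in isolation. The helper name `use_pretty_urls` is hypothetical, and a `Dict` stands in for `ENV` so the snippet does not depend on the real environment.

```julia
# Hypothetical helper mirroring the expression passed to Documenter.HTML.
# `env` is any dictionary-like object; in make.jl it is the global ENV.
use_pretty_urls(env) = get(env, "CI", nothing) == "true"

on_ci = use_pretty_urls(Dict("CI" => "true"))     # true: CI sets the variable
local_build = use_pretty_urls(Dict{String,String}())  # false: variable is absent

println("CI build: ", on_ci, ", local build: ", local_build)
```

When the variable is missing, `get` returns `nothing`, and `nothing == "true"` is `false`, so no separate existence check is needed.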
diff --git a/docs/make.jl.old b/docs/make.jl.old
@@ -0,0 +1,24 @@
+using Pkg
+using Documenter
+
+ENV["LOAD_SK_CARET"] = "true"
+
+Pkg.activate("..")
+
+using TSML
+
+makedocs(
+	modules = [TSML,DecisionTreeLearners,SKLearners],
+	clean = false,
+    sitename="TSML.jl",
+	pages = Any[
+		"Home" => "index.md",
+		"Library" => Any[
+			"DecisionTree" =>"lib/decisiontree.md",
+			"SKLearners" =>"lib/sklearn.md"
+		]
+	],
+	format = Documenter.HTML(
+	  prettyurls = get(ENV, "CI", nothing) == "true"
+	)
+)
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -0,0 +1,69 @@
+```@meta
+Author = "Paulito P. Palmes"
+```
+
+# TSML (Time-Series Machine Learning)
+
+TSML (Time Series Machine Learning) is a package
+for time-series data processing, classification,
+and prediction. It combines ML libraries from Python's
+ScikitLearn, R's Caret, and Julia ML using a common API
+and allows seamless ensembling and integration of
+heterogeneous ML libraries to create complex models
+for robust time-series pre-processing and prediction/classification.
+
+## Package Features
+
+- TS aggregation based on time/date interval
+- TS imputation based on Nearest Neighbors
+- TS statistical metrics of data quality
+- TS classification for automatic data discovery
+- TS prediction with more than 100 libraries from Caret, ScikitLearn, and Julia
+- TS date/val matrix conversion of 1-d TS using sliding windows for ML input
+- Pipeline API allows high-level description of the processing workflow
+- Easily extensible architecture by using just two main interfaces: fit and transform
+
+
+## Installation
+
+TSML is in the official Julia package registry.
+The latest release can be installed at the Julia
+prompt using Julia's package manager:
+```julia
+julia> ]add TSML
+```
+
+or
+
+```julia
+julia> using Pkg
+julia> pkg"add TSML"
+```
+
+or
+
+```julia
+julia> using Pkg
+julia> Pkg.add("TSML")
+```
+Once TSML is installed, you can load it with:
+
+```julia
+julia> using TSML
+```
+
+or 
+
+```julia
+julia> import TSML
+```
+Most applications need the transformers and utilities in TSML for
+time-series processing. To use them, it is standard in TSML code to
+declare the following at the top of your application:
+
+```julia
+using TSML 
+using TSML.TSMLTransformers
+using TSML.TSMLTypes
+using TSML.Utils
+```
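Review note (docs/src/index.md): the feature list says the architecture is extensible through two interfaces, fit and transform. A minimal sketch of what such a transformer could look like is given below. Everything in it is hypothetical and self-contained: `SketchTransformer`, `MedianFiller`, and the `args`/`model` fields are illustrative stand-ins and are not TSML's actual types, and the `fit!`/`transform!` functions here are local definitions rather than TSML's own.

```julia
using Statistics

# Hypothetical stand-in for an abstract transformer type.
abstract type SketchTransformer end

# A toy transformer that fills missing values with a learned median.
mutable struct MedianFiller <: SketchTransformer
    args::Dict{Symbol,Any}    # user-supplied settings (unused in this toy)
    model::Dict{Symbol,Any}   # state learned by fit!
    MedianFiller(args = Dict{Symbol,Any}()) = new(args, Dict{Symbol,Any}())
end

# fit! learns the median of the non-missing training values.
function fit!(t::MedianFiller, values::AbstractVector)
    t.model[:median] = median(skipmissing(values))
    return t
end

# transform! replaces every missing entry with the learned median.
function transform!(t::MedianFiller, values::AbstractVector)
    return coalesce.(values, t.model[:median])
end

filler = MedianFiller()
fit!(filler, [1.0, missing, 3.0, 5.0])                 # learned median is 3.0
filled = transform!(filler, [missing, 2.0, missing])   # [3.0, 2.0, 3.0]
println(filled)
```

Splitting the work this way keeps learned state in the transformer, so the same fitted object can be applied to new data without refitting.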
diff --git a/docs/src/lib/caretlearn.md1 b/docs/src/lib/caretlearn.md1
@@ -0,0 +1,14 @@
+```@meta
+Author = "Paulito Palmes"
+```
+
+# [CaretLearners](@id lib_caretlearners)
+Creates an API wrapper for Caret Libs for pipeline workflow.
+
+```@index
+Modules = [CaretLearners]
+```
+
+```@autodocs
+Modules = [CaretLearners]
+```
diff --git a/docs/src/lib/decisiontree.md b/docs/src/lib/decisiontree.md
@@ -0,0 +1,14 @@
+```@meta
+Author = "Paulito Palmes"
+```
+
+# [DecisionTreeLearners](@id lib_decisiontree)
+Creates an API wrapper for DecisionTrees for pipeline workflow.
+
+```@index
+Modules = [DecisionTreeLearners]
+```
+
+```@autodocs
+Modules = [DecisionTreeLearners]
+```
diff --git a/docs/src/lib/sklearn.md1 b/docs/src/lib/sklearn.md1
@@ -0,0 +1,14 @@
+```@meta
+Author = "Paulito Palmes"
+```
+
+# [SKLearners](@id lib_sklearners)
+Creates an API wrapper for Scikit Learners for pipeline workflow.
+
+```@index
+Modules = [SKLearners]
+```
+
+```@autodocs
+Modules = [SKLearners]
+```
diff --git a/docs/src/man/aggregation.md b/docs/src/man/aggregation.md
@@ -0,0 +1,5 @@
+```@meta
+Author = "Paulito P. Palmes"
+```
+
+# Aggregation 
diff --git a/docs/src/man/cli.md b/docs/src/man/cli.md
@@ -0,0 +1,5 @@
+```@meta
+Author = "Paulito P. Palmes"
+```
+
+# CLI Wrappers
diff --git a/docs/src/man/dateproc.md b/docs/src/man/dateproc.md
@@ -0,0 +1,5 @@
+```@meta
+Author = "Paulito P. Palmes"
+```
+
+# Preprocessing
diff --git a/docs/src/man/imputation.md b/docs/src/man/imputation.md
@@ -0,0 +1,5 @@
+```@meta
+Author = "Paulito P. Palmes"
+```
+
+# Imputation
diff --git a/docs/src/man/monotonic.md b/docs/src/man/monotonic.md
@@ -0,0 +1,5 @@
+```@meta
+Author = "Paulito P. Palmes"
+```
+
+# Monotonic Detection
diff --git a/docs/src/man/tsclassification.md b/docs/src/man/tsclassification.md
@@ -0,0 +1,5 @@
+```@meta
+Author = "Paulito P. Palmes"
+```
+
+# TS Classification
diff --git a/docs/src/man/valueproc.md b/docs/src/man/valueproc.md
@@ -0,0 +1,5 @@
+```@meta
+Author = "Paulito P. Palmes"
+```
+
+# Value Preprocessing
diff --git a/docs/src/tutorial/aggregators.md b/docs/src/tutorial/aggregators.md
@@ -0,0 +1,96 @@
+```@meta
+Author = "Paulito P. Palmes"
+```
+
+# Aggregators and Imputers
+
+The package assumes a two-column input composed of Dates and Values.
+The first part of the workflow aggregates values based on the specified
+date/time interval, which reduces the occurrence of missing values and noise.
+The aggregated data is then left-joined to the complete sequence of dates
+at the specified date/time interval. Remaining missing values are replaced
+using their k nearest neighbors, where k is the symmetric distance around the
+location of the missing value. This step can be repeated until there
+are no more missing values.
+
+Let us create a Date/Value input with some missing values and apply TSML functions
+to normalize/clean the data:
+
+```@example 1
+using Random, Dates, DataFrames
+function generateDataWithMissing()
+   Random.seed!(123)
+   gdate = DateTime(2014,1,1):Dates.Minute(15):DateTime(2016,1,1)
+   gval = Array{Union{Missing,Float64}}(rand(length(gdate)))
+   gmissing = 50000
+   gndxmissing = Random.shuffle(1:length(gdate))[1:gmissing]
+   df = DataFrame(Date=gdate,Value=gval)
+   df.Value[gndxmissing] .= missing
+   return df
+end
+```
+
+Let's output the first 20 rows:
+
+```@example 1
+X = generateDataWithMissing()
+first(X,20)
+```
+## DateValgator
+You'll notice several blocks of missing values, with readings taken every 15 minutes.
+Let's aggregate our dataset by taking the hourly median using the `DateValgator` transformer.
+
+```@example 1
+using TSML
+using TSML.TSMLTypes
+using TSML.Utils
+using TSML.TSMLTransformers
+using TSML: DateValgator
+
+dtvlgator = DateValgator(Dict(:dateinterval=>Dates.Hour(1)))
+fit!(dtvlgator,X)
+results = transform!(dtvlgator,X)
+first(results,20)
+```
+
+Missing values are now reduced because of the aggregation applied by the
+`DateValgator` transformer. TSML transformers support two main functions:
+`fit!` and `transform!`. For `DateValgator`, `fit!` sets up the necessary parameters
+and validates the arguments, while `transform!` contains the aggregation algorithm.
+
+## DateValNNer
+
+Let's perform further processing to replace the remaining missing values with their nearest neighbors.
+We will use `DateValNNer`, a TSML transformer, to process the output of `DateValgator`.
+`DateValNNer` can also process non-aggregated data by first running a workflow similar
+to that of `DateValgator` before performing its imputation routine.
+
+```@example 1
+using TSML: DateValNNer
+
+datevalnner = DateValNNer(Dict(:dateinterval=>Dates.Hour(1)))
+fit!(datevalnner, X)
+results = transform!(datevalnner,X)
+first(results,20)
+```
+
+After running the `DateValNNer`, it's guaranteed that there will be no more
+missing data. 
+
+## DateValizer
+
+One more imputer to replace missing data is `DateValizer`. It computes the hourly
+median over 24 hours and uses the hour => median mapping
+to replace missing data, with the hour as the key. Below is a sample
+workflow to replace missing data in X with the hourly medians.
+
+```@example 1
+using TSML: DateValizer
+
+datevalizer = DateValizer(Dict(:dateinterval=>Dates.Hour(1)))
+fit!(datevalizer, X)
+results = transform!(datevalizer,X)
+first(results,20)
+```
+
+
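Review note (docs/src/tutorial/aggregators.md): the tutorial describes two ideas in prose, hourly-median aggregation and imputation through an hour => median mapping. The sketch below illustrates both using only the `Dates` and `Statistics` standard libraries. It is not TSML's implementation: the helper names (`median_or_missing`, `hourly_median`, `fill_by_hour_of_day`) are hypothetical, and bucketing with `floor(d, Hour(1))` is an assumption about how readings are assigned to an hour.

```julia
using Dates, Statistics

# Hypothetical helper: median of the non-missing values, or missing if none.
function median_or_missing(vals)
    present = collect(skipmissing(vals))
    return isempty(present) ? missing : median(present)
end

# Step 1: bucket each reading into its hour (assumed: floor to the hour)
# and take the median of each bucket.
function hourly_median(dates::Vector{DateTime}, vals::AbstractVector)
    buckets = Dict{DateTime,Vector{Union{Missing,Float64}}}()
    for (d, v) in zip(dates, vals)
        push!(get!(buckets, floor(d, Hour(1)), Union{Missing,Float64}[]), v)
    end
    hours = sort(collect(keys(buckets)))
    return hours, [median_or_missing(buckets[h]) for h in hours]
end

# Step 2: map each hour of day (0-23) to the median of its known values,
# then fill missing entries by looking up their hour of day.
function fill_by_hour_of_day(hours::Vector{DateTime}, vals::AbstractVector)
    known = Dict{Int,Vector{Float64}}()
    for (h, v) in zip(hours, vals)
        ismissing(v) || push!(get!(known, hour(h), Float64[]), v)
    end
    medians = Dict(k => median(v) for (k, v) in known)
    return [ismissing(v) ? get(medians, hour(h), missing) : v
            for (h, v) in zip(hours, vals)]
end

dates = [DateTime(2014,1,1,0,0), DateTime(2014,1,1,0,15), DateTime(2014,1,1,0,30),
         DateTime(2014,1,1,1,0), DateTime(2014,1,2,1,0), DateTime(2014,1,2,1,30)]
vals = [1.0, missing, 3.0, missing, 4.0, 6.0]

hours, agg = hourly_median(dates, vals)    # agg is [2.0, missing, 5.0]
filled = fill_by_hour_of_day(hours, agg)   # [2.0, 5.0, 5.0]
println(filled)
```

In this toy data the 01:00 bucket on January 1 has no readings, so aggregation alone leaves it missing; the hour-of-day mapping then fills it with 5.0, the median observed at 01:00 on January 2.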