From 49061123ceb6d74f9a6d208ea9cb7f84c6d5c7d9 Mon Sep 17 00:00:00 2001 From: Okon Samuel <39421418+OkonSamuel@users.noreply.github.com> Date: Sun, 19 May 2024 21:12:48 +0100 Subject: [PATCH] Add documentation (#5) * add draft for RFE model * rename package * Add FeatureSelectore and some tests * fix current tests * complete RFE model and add tests * Update model docstring * fix code, Update readme and add more tests * Apply suggestions from code review Co-authored-by: Anthony Blaom, PhD * rename n_features_to_select to n_features * update readme with * Apply suggestions from code review Co-authored-by: Anthony Blaom, PhD * set max column limit to 92 in readme * add Aqua.jl tests and refactor code * update ci * Apply suggestions from code review Co-authored-by: Anthony Blaom, PhD * fix bug, add support for serialization and add more tests * Update ci.yml * Update ci.yml * Update ci.yml * Update ci.yml * Update ci.yml * add documentation * Disable julia Nighly tests --------- Co-authored-by: Anthony Blaom, PhD --- .github/workflows/ci_nightly.yml | 50 --------- Project.toml | 2 +- README.md | 99 +---------------- docs/.gitignore | 2 + docs/Project.toml | 11 ++ docs/make.jl | 34 ++++++ docs/src/api.md | 9 ++ docs/src/index.md | 185 +++++++++++++++++++++++++++++++ src/models/rfe.jl | 1 - 9 files changed, 243 insertions(+), 150 deletions(-) delete mode 100644 .github/workflows/ci_nightly.yml create mode 100644 docs/.gitignore create mode 100644 docs/Project.toml create mode 100644 docs/make.jl create mode 100644 docs/src/api.md create mode 100644 docs/src/index.md diff --git a/.github/workflows/ci_nightly.yml b/.github/workflows/ci_nightly.yml deleted file mode 100644 index dbb29cd..0000000 --- a/.github/workflows/ci_nightly.yml +++ /dev/null @@ -1,50 +0,0 @@ -name: CI (Julia nightly) -on: - pull_request: - branches: - - master - - dev - push: - branches: - - master - - dev - tags: '*' -env: - TEST_MLJBASE: "true" -jobs: - test: - name: Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ matrix.arch }} - ${{ github.event_name }} - runs-on: ${{ matrix.os }} - strategy: - fail-fast: false - matrix: - version: - - 'nightly' - os: - - ubuntu-latest - arch: - - x64 - steps: - - uses: actions/checkout@v2 - - uses: julia-actions/setup-julia@v1 - with: - version: ${{ matrix.version }} - arch: ${{ matrix.arch }} - - uses: actions/cache@v1 - env: - cache-name: cache-artifacts - with: - path: ~/.julia/artifacts - key: ${{ runner.os }}-test-${{ env.cache-name }}-${{ hashFiles('**/Project.toml') }} - restore-keys: | - ${{ runner.os }}-test-${{ env.cache-name }}- - ${{ runner.os }}-test- - ${{ runner.os }}- - - uses: julia-actions/julia-buildpkg@v1 - - uses: julia-actions/julia-runtest@v1 - env: - JULIA_NUM_THREADS: 2 - - uses: julia-actions/julia-processcoverage@v1 - - uses: codecov/codecov-action@v1 - with: - file: lcov.info diff --git a/Project.toml b/Project.toml index 7473f24..5303a45 100644 --- a/Project.toml +++ b/Project.toml @@ -1,6 +1,6 @@ name = "FeatureSelection" uuid = "33837fe5-dbff-4c9e-8c2f-c5612fe2b8b6" -authors = ["Anthony D. Blaom "] +authors = ["Anthony D. Blaom ", "Samuel Okon importance pairs -``` -We can view the important features used by our model by inspecting the `fitted_params` -object. 
-```julia
-p = fitted_params(mach)
-p.features_left == [:x1, :x2, :x3, :x4, :x5]
-```
-We can also call the `predict` method on the fitted machine, to predict using a
-random forest regressor trained using only the important features, or call the `transform`
-method, to select just those features from some new table including all the original
-features. For more info, type `?RecursiveFeatureElimination` on a Julia REPL.
-
-Okay, let's say that we didn't know that our synthetic dataset depends on only five
-columns from our feature table. We could apply cross fold validation
-`StratifiedCV(nfolds=5)` with our recursive feature elimination model to select the
-optimal value of `n_features` for our model. In this case we will use a simple Grid
-search with root mean square as the measure.
-```julia
-rfe = RecursiveFeatureElimination(model = forest)
-tuning_rfe_model = TunedModel(
-    model = rfe,
-    measure = rms,
-    tuning = Grid(rng=rng),
-    resampling = StratifiedCV(nfolds = 5),
-    range = range(
-        rfe, :n_features, values = 1:10
-    )
-)
-self_tuning_rfe_mach = machine(tuning_rfe_model, X, y)
-fit!(self_tuning_rfe_mach)
-```
-As before we can inspect the important features by inspecting the object returned by
-`fitted_params` or `feature_importances` as shown below.
-```julia
-fitted_params(self_tuning_rfe_mach).best_fitted_params.features_left == [:x1, :x2, :x3, :x4, :x5]
-feature_importances(self_tuning_rfe_mach) # returns dict of feature => importance pairs
-```
-and call `predict` on the tuned model machine as shown below
-```julia
-Xnew = MLJ.table(rand(rng, 50, 10)) # create test data
-predict(self_tuning_rfe_mach, Xnew)
-```
-In this case, prediction is done using the best recursive feature elimination model gotten
-from the tuning process above.
-
-For resampling methods different from cross-validation, and for other
- `TunedModel` options, such as parallelization, see the
- [Tuning Models](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/) section of the MLJ manual.
-[MLJ Documentation](https://alan-turing-institute.github.io/MLJ.jl/dev/)
\ No newline at end of file
+Repository housing feature selection algorithms for use with the machine learning toolbox [MLJ](https://juliaai.github.io/MLJ.jl/dev/).
diff --git a/docs/.gitignore b/docs/.gitignore
new file mode 100644
index 0000000..264c13f
--- /dev/null
+++ b/docs/.gitignore
@@ -0,0 +1,2 @@
+Manifest.toml
+build/
\ No newline at end of file
diff --git a/docs/Project.toml b/docs/Project.toml
new file mode 100644
index 0000000..ca94df0
--- /dev/null
+++ b/docs/Project.toml
@@ -0,0 +1,11 @@
+[deps]
+Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
+MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
+FeatureSelection = "33837fe5-dbff-4c9e-8c2f-c5612fe2b8b6"
+StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
+
+[compat]
+Documenter = "^1.4"
+MLJ = "^0.20"
+StableRNGs = "^1.0"
+julia = "^1.0"
\ No newline at end of file
diff --git a/docs/make.jl b/docs/make.jl
new file mode 100644
index 0000000..e3aa86f
--- /dev/null
+++ b/docs/make.jl
@@ -0,0 +1,34 @@
+using Documenter, FeatureSelection
+
+makedocs(;
+    authors = """
+    Anthony D. Blaom ,
+    Sebastian Vollmer ,
+    Okon Samuel
+    """,
+    format = Documenter.HTML(;
+        prettyurls = get(ENV, "CI", "false") == "true",
+        edit_link = "dev"
+    ),
+    modules = [FeatureSelection],
+    pages = [
+        "Home" => "index.md",
+        "API" => "api.md"
+    ],
+    doctest = false, # don't run doctests here; they are run automatically in CI.
+    repo = Remotes.GitHub("JuliaAI", "FeatureSelection.jl"),
+    sitename = "FeatureSelection.jl",
+)
+
+# By default Documenter does not deploy docs for PRs.
+# This causes issues with how we're doing things and ends
+# up choking the deployment of the docs, so here we
+# force the environment to ignore this so that Documenter
+# does indeed deploy the docs:
+#ENV["GITHUB_EVENT_NAME"] = "pull_request"
+
+deploydocs(;
+    deploy_config = Documenter.GitHubActions(),
+    repo = "github.com/JuliaAI/FeatureSelection.jl.git",
+    push_preview = true
+)
\ No newline at end of file
diff --git a/docs/src/api.md b/docs/src/api.md
new file mode 100644
index 0000000..0321ede
--- /dev/null
+++ b/docs/src/api.md
@@ -0,0 +1,9 @@
+```@meta
+CurrentModule = FeatureSelection
+```
+# API
+## Models
+```@docs
+FeatureSelector
+RecursiveFeatureElimination
+```
\ No newline at end of file
diff --git a/docs/src/index.md b/docs/src/index.md
new file mode 100644
index 0000000..0221261
--- /dev/null
+++ b/docs/src/index.md
@@ -0,0 +1,185 @@
+# FeatureSelection
+
+FeatureSelection is a Julia package containing implementations of feature selection
+algorithms for use with the machine learning toolbox
+[MLJ](https://juliaai.github.io/MLJ.jl/dev/).
+
+## Installation
+In a running Julia session (version 1.6 or later), run
+```julia
+import Pkg;
+Pkg.add("FeatureSelection")
+```
+
+## Example Usage
+Let's build a supervised recursive feature eliminator with `RandomForestRegressor`
+from [DecisionTree.jl](https://github.com/JuliaAI/DecisionTree.jl) as our base model.
+But first we need a dataset to train on. We shall create a synthetic dataset, popularly
+known in the R community as the Friedman #1 dataset. Notice how the target vector for
+this dataset depends on only the first five columns of the feature table, so we expect
+recursive feature elimination to return those first five columns as the important
+features.
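+For reference, the target constructed below follows the noiseless form of the Friedman #1
+benchmark (stated here as an aside; the generating code below is the authoritative
+definition):
+```math
+y = 10 \sin(\pi x_1 x_2) + 20 (x_3 - 0.5)^2 + 10 x_4 + 5 x_5
+```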
+```@meta
+DocTestSetup = quote
+    using MLJ, FeatureSelection, StableRNGs
+    rng = StableRNG(10)
+    A = rand(rng, 50, 10)
+    X = MLJ.table(A) # features
+    y = @views(
+        10 .* sin.(
+            pi .* A[:, 1] .* A[:, 2]
+        ) .+ 20 .* (A[:, 3] .- 0.5).^2 .+ 10 .* A[:, 4] .+ 5 * A[:, 5]
+    ) # target
+    RandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree
+    forest = RandomForestRegressor(rng=rng)
+    rfe = RecursiveFeatureElimination(
+        model = forest, n_features=5, step=1
+    ) # see the docstring for a description of the defaults
+    mach = machine(rfe, X, y)
+    fit!(mach)
+
+    rfe = RecursiveFeatureElimination(model = forest)
+    tuning_rfe_model = TunedModel(
+        model = rfe,
+        measure = rms,
+        tuning = Grid(rng=rng),
+        resampling = StratifiedCV(nfolds = 5),
+        range = range(
+            rfe, :n_features, values = 1:10
+        )
+    )
+    self_tuning_rfe_mach = machine(tuning_rfe_model, X, y)
+    fit!(self_tuning_rfe_mach)
+end
+```
+```@example example1
+using MLJ, FeatureSelection, StableRNGs
+rng = StableRNG(10)
+A = rand(rng, 50, 10)
+X = MLJ.table(A) # features
+y = @views(
+    10 .* sin.(
+        pi .* A[:, 1] .* A[:, 2]
+    ) .+ 20 .* (A[:, 3] .- 0.5).^2 .+ 10 .* A[:, 4] .+ 5 * A[:, 5]
+) # target
+```
+Now that we have our data, we can create our recursive feature elimination model and
+train it on our dataset:
+```@example example1
+RandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree
+forest = RandomForestRegressor(rng=rng)
+rfe = RecursiveFeatureElimination(
+    model = forest, n_features=5, step=1
+) # see the docstring for a description of the defaults
+mach = machine(rfe, X, y)
+fit!(mach)
+```
+We can inspect the feature importances in two ways:
+```jldoctest
+julia> report(mach).ranking
+10-element Vector{Int64}:
+ 1
+ 1
+ 1
+ 1
+ 1
+ 2
+ 3
+ 4
+ 5
+ 6
+
+julia> feature_importances(mach)
+10-element Vector{Pair{Symbol, Int64}}:
+ :x1 => 6
+ :x2 => 5
+ :x3 => 4
+ :x4 => 3
+ :x5 => 2
+ :x6 => 1
+ :x7 => 1
+ :x8 => 1
+ :x9 => 1
+ :x10 => 1
+```
+Note that the two conventions run in opposite directions: a variable with a *lower* rank
+is more significant, while a variable with a *higher* feature importance is more
+important.
+
+We can view the important features used by our model by inspecting the `fitted_params`
+object.
+```jldoctest
+julia> p = fitted_params(mach)
+(features_left = [:x1, :x2, :x3, :x4, :x5],
+ model_fitresult = (forest = Ensemble of Decision Trees
+Trees: 100
+Avg Leaves: 25.26
+Avg Depth: 8.36,),)
+
+julia> p.features_left
+5-element Vector{Symbol}:
+ :x1
+ :x2
+ :x3
+ :x4
+ :x5
+```
+We can also call the `predict` method on the fitted machine, to predict using a random
+forest regressor trained using only the important features, or call the `transform`
+method, to select just those features from some new table including all the original
+features. A minimal sketch of both calls is given below. For more information, type
+`?RecursiveFeatureElimination` in the Julia REPL.
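+The sketch below is illustrative only and is not executed as part of this manual; it
+assumes the fitted machine `mach` and the generator `rng` defined above:
+```julia
+Xnew = MLJ.table(rand(rng, 5, 10)) # a new table with all ten original features
+predict(mach, Xnew)   # predictions from the forest trained on the five features left
+transform(mach, Xnew) # the same table, restricted to the selected features
+```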
+Okay, let's say that we didn't know that our synthetic dataset depends on only five
+columns of our feature table. We could apply cross-validation, here
+`StratifiedCV(nfolds=5)`, with our recursive feature elimination model to select the
+optimal value of `n_features`. In this case we will use a simple grid search with root
+mean squared error as the measure.
+```@example example1
+rfe = RecursiveFeatureElimination(model = forest)
+tuning_rfe_model = TunedModel(
+    model = rfe,
+    measure = rms,
+    tuning = Grid(rng=rng),
+    resampling = StratifiedCV(nfolds = 5),
+    range = range(
+        rfe, :n_features, values = 1:10
+    )
+)
+self_tuning_rfe_mach = machine(tuning_rfe_model, X, y)
+fit!(self_tuning_rfe_mach)
+```
+As before, we can inspect the important features via the object returned by
+`fitted_params`, or via `feature_importances`, as shown below.
+```jldoctest
+julia> fitted_params(self_tuning_rfe_mach).best_fitted_params.features_left
+5-element Vector{Symbol}:
+ :x1
+ :x2
+ :x3
+ :x4
+ :x5
+
+julia> feature_importances(self_tuning_rfe_mach)
+10-element Vector{Pair{Symbol, Int64}}:
+ :x1 => 6
+ :x2 => 5
+ :x3 => 4
+ :x4 => 3
+ :x5 => 2
+ :x6 => 1
+ :x7 => 1
+ :x8 => 1
+ :x9 => 1
+ :x10 => 1
+```
+We can also call `predict` on the tuned model machine, as shown below:
+```@example example1
+Xnew = MLJ.table(rand(rng, 50, 10)) # create test data
+predict(self_tuning_rfe_mach, Xnew)
+```
+In this case, prediction is done using the best recursive feature elimination model
+obtained from the tuning process above.
+
+For resampling methods other than cross-validation, and for other `TunedModel` options,
+such as parallelization, see the
+[Tuning Models](https://juliaai.github.io/MLJ.jl/dev/tuning_models/) section of the MLJ
+manual. See also the [MLJ documentation](https://juliaai.github.io/MLJ.jl/dev/).
+```@meta
+DocTestSetup = nothing
+```
\ No newline at end of file
diff --git a/src/models/rfe.jl b/src/models/rfe.jl
index 1b10260..db636b9 100644
--- a/src/models/rfe.jl
+++ b/src/models/rfe.jl
@@ -139,7 +139,6 @@ Xnew = MLJ.table(rand(rng, 50, 10));
 
 predict(mach, Xnew)
 ```
-
 """
 function RecursiveFeatureElimination(
     args...;