50 changes: 0 additions & 50 deletions .github/workflows/ci_nightly.yml

This file was deleted.

2 changes: 1 addition & 1 deletion Project.toml
@@ -1,6 +1,6 @@
name = "FeatureSelection"
uuid = "33837fe5-dbff-4c9e-8c2f-c5612fe2b8b6"
authors = ["Anthony D. Blaom <anthony.blaom@gmail.com>"]
authors = ["Anthony D. Blaom <anthony.blaom@gmail.com>", "Samuel Okon <okonsamuel50@gmail.com"]
version = "0.1.0"

[deps]
99 changes: 1 addition & 98 deletions README.md
@@ -4,101 +4,4 @@
| :------------ | :------- | :------------- |
| [![Build Status](https://github.com/JuliaAI/FeatureSelection.jl/workflows/CI/badge.svg)](https://github.com/JuliaAI/FeatureSelection.jl/actions) | [![Coverage](https://codecov.io/gh/JuliaAI/FeatureSelection.jl/branch/dev/graph/badge.svg)](https://codecov.io/github/JuliaAI/FeatureSelection.jl?branch=dev) | [![Code Style: Blue](https://img.shields.io/badge/code%20style-blue-4495d1.svg)](https://github.com/invenia/BlueStyle) |

Repository housing feature selection algorithms for use with the machine learning toolbox
[MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/).

The `FeatureSelector` model builds on contributions originally residing in [MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl/blob/v0.16.15/src/builtins/Transformers.jl#L189-L266).

# Installation
On a running instance of Julia, version 1.6 or later, run
```julia
import Pkg;
Pkg.add("FeatureSelection")
```

# Example Usage
Let's build a supervised recursive feature eliminator with `RandomForestRegressor`
from DecisionTree.jl as our base model.
But first we need a dataset to train on. We shall create a synthetic dataset, popularly
known in the R community as the Friedman #1 dataset. Notice that the target vector for
this dataset depends on only the first five columns of the feature table, so we expect
recursive feature elimination to return those first five columns as the important features.
```julia
using MLJ, FeatureSelection
using StableRNGs
rng = StableRNG(10)
A = rand(rng, 50, 10)
X = MLJ.table(A) # features
y = @views(
10 .* sin.(
pi .* A[:, 1] .* A[:, 2]
) .+ 20 .* (A[:, 3] .- 0.5).^ 2 .+ 10 .* A[:, 4] .+ 5 * A[:, 5]
) # target
```
Now that we have our data, we can create our recursive feature elimination model and
train it on our dataset:
```julia
RandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree
forest = RandomForestRegressor(rng=rng)
rfe = RecursiveFeatureElimination(
model = forest, n_features=5, step=1
) # see docstring for a description of the defaults
mach = machine(rfe, X, y)
fit!(mach)
```
We can inspect the feature importances in two ways:
```julia
# A variable with lower rank has more significance than a variable with higher rank;
# a variable with higher feature importance is more important than one with lower
# feature importance.
report(mach).ranking # returns [1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
feature_importances(mach) # returns dict of feature => importance pairs
```
We can view the important features used by our model by inspecting the `fitted_params`
object.
```julia
p = fitted_params(mach)
p.features_left == [:x1, :x2, :x3, :x4, :x5]
```
We can also call the `predict` method on the fitted machine to predict using a
random forest regressor trained on only the important features, or call the `transform`
method to select just those features from some new table that includes all the original
features. For more information, type `?RecursiveFeatureElimination` in the Julia REPL.
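For instance, assuming a fresh table with the same ten-column schema as `X` (a hypothetical illustration, not part of the original walkthrough; `Xfresh` is an invented name):
```julia
Xfresh = MLJ.table(rand(rng, 30, 10)) # hypothetical new data with the same schema as X
predict(mach, Xfresh)   # predictions from the forest trained on the selected features
transform(mach, Xfresh) # Xfresh restricted to just the selected features
```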

Okay, let's say that we didn't know that our synthetic dataset depends on only five
columns of the feature table. We could apply cross-validation, `CV(nfolds=5)`, with our
recursive feature elimination model to select the optimal value of `n_features` for our
model. (Stratified cross-validation is not applicable here, as the target is continuous.)
In this case we will use a simple grid search with root mean squared error as the measure.
```julia
rfe = RecursiveFeatureElimination(model = forest)
tuning_rfe_model = TunedModel(
model = rfe,
measure = rms,
tuning = Grid(rng=rng),
resampling = CV(nfolds = 5),
range = range(
rfe, :n_features, values = 1:10
)
)
self_tuning_rfe_mach = machine(tuning_rfe_model, X, y)
fit!(self_tuning_rfe_mach)
```
As before, we can inspect the important features via the object returned by
`fitted_params`, or via `feature_importances`, as shown below:
```julia
fitted_params(self_tuning_rfe_mach).best_fitted_params.features_left == [:x1, :x2, :x3, :x4, :x5]
feature_importances(self_tuning_rfe_mach) # returns dict of feature => importance pairs
```
and call `predict` on the tuned machine, as shown below:
```julia
Xnew = MLJ.table(rand(rng, 50, 10)) # create test data
predict(self_tuning_rfe_mach, Xnew)
```
In this case, prediction is done using the best recursive feature elimination model
obtained from the tuning process above.
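For example, the optimal value of `n_features` found by the search can be read off the tuned machine via the standard `TunedModel` accessor `fitted_params(...).best_model` (a minimal sketch of the inspection; the value shown is the outcome we expect for this dataset, not a guarantee):
```julia
best_rfe = fitted_params(self_tuning_rfe_mach).best_model
best_rfe.n_features # expected to be 5 for this synthetic dataset
```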

For resampling methods other than cross-validation, and for other `TunedModel`
options, such as parallelization, see the
[Tuning Models](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/) section of the MLJ manual.
[MLJ Documentation](https://alan-turing-institute.github.io/MLJ.jl/dev/)
Repository housing feature selection algorithms for use with the machine learning toolbox [MLJ](https://juliaai.github.io/MLJ.jl/dev/).
2 changes: 2 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,2 @@
Manifest.toml
build/
11 changes: 11 additions & 0 deletions docs/Project.toml
@@ -0,0 +1,11 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
FeatureSelection = "33837fe5-dbff-4c9e-8c2f-c5612fe2b8b6"
StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"

[compat]
Documenter = "^1.4"
MLJ = "^0.20"
StableRNGs = "^1.0"
julia = "^1.0"
34 changes: 34 additions & 0 deletions docs/make.jl
@@ -0,0 +1,34 @@
using Documenter, FeatureSelection

makedocs(;
authors = """
Anthony D. Blaom <anthony.blaom@gmail.com>,
Sebastian Vollmer <s.vollmer.4@warwick.ac.uk>,
Okon Samuel <okonsamuel50@gmail.com>
""",
format = Documenter.HTML(;
prettyurls = get(ENV, "CI", "false") == "true",
edit_link = "dev"
),
modules = [FeatureSelection],
pages=[
"Home" => "index.md",
"API" => "api.md"
],
doctest = false, # don't run doctests here, as they are automatically run separately in CI.
repo = Remotes.GitHub("JuliaAI", "FeatureSelection.jl"),
sitename = "FeatureSelection.jl",
)

# By default, Documenter does not deploy docs for PRs. This
# causes issues with how we're doing things and ends up
# choking the deployment of the docs, so here we force the
# environment to ignore this so that Documenter does indeed
# deploy the docs.
#ENV["GITHUB_EVENT_NAME"] = "pull_request"

deploydocs(;
deploy_config = Documenter.GitHubActions(),
repo="github.com/JuliaAI/FeatureSelection.jl.git",
push_preview=true
)
9 changes: 9 additions & 0 deletions docs/src/api.md
@@ -0,0 +1,9 @@
```@meta
CurrentModule = FeatureSelection
```
# API
## Models
```@docs
FeatureSelector
RecursiveFeatureElimination
```