Merged
26 commits
f42ddee: add draft for RFE model (OkonSamuel, Jan 16, 2024)
a24fea3: rename package (OkonSamuel, Jan 24, 2024)
f5b8311: Add FeatureSelectore and some tests (OkonSamuel, Jan 29, 2024)
f7fe0bc: fix current tests (OkonSamuel, Jan 29, 2024)
860f203: complete RFE model and add tests (OkonSamuel, Feb 5, 2024)
2168294: Update model docstring (OkonSamuel, Feb 7, 2024)
f4a3c9c: fix code, Update readme and add more tests (OkonSamuel, Feb 13, 2024)
9351dcd: Merge branch 'dev' into prelim (OkonSamuel, Feb 13, 2024)
d2e41a5: Apply suggestions from code review (OkonSamuel, Feb 15, 2024)
1d312c2: rename n_features_to_select to n_features (OkonSamuel, Feb 18, 2024)
c5161cf: update readme with (OkonSamuel, Feb 18, 2024)
4cc82c7: Apply suggestions from code review (OkonSamuel, Feb 28, 2024)
3e45aba: set max column limit to 92 in readme (OkonSamuel, Feb 28, 2024)
9c53b85: add Aqua.jl tests and refactor code (OkonSamuel, Feb 28, 2024)
81b90de: update ci (OkonSamuel, Feb 28, 2024)
eed3af1: Apply suggestions from code review (OkonSamuel, Mar 7, 2024)
2c79259: fix bug, add support for serialization and add more tests (OkonSamuel, Mar 22, 2024)
fb0ba2f: Update ci.yml (OkonSamuel, Mar 22, 2024)
fa59bd3: Merge branch 'dev' into prelim (OkonSamuel, Mar 22, 2024)
2fcbc4f: Update ci.yml (OkonSamuel, Mar 22, 2024)
a66ffae: Update ci.yml (OkonSamuel, Mar 22, 2024)
91a8f54: Update ci.yml (OkonSamuel, Mar 22, 2024)
1720f81: Update ci.yml (OkonSamuel, Mar 22, 2024)
d1aba83: add documentation (OkonSamuel, May 13, 2024)
d56a1b1: Merge branch 'dev' into prelim (OkonSamuel, May 13, 2024)
51c70c2: Disable julia Nighly tests (OkonSamuel, May 19, 2024)
50 changes: 0 additions & 50 deletions .github/workflows/ci_nightly.yml

This file was deleted.

2 changes: 1 addition & 1 deletion Project.toml
@@ -1,6 +1,6 @@
name = "FeatureSelection"
uuid = "33837fe5-dbff-4c9e-8c2f-c5612fe2b8b6"
authors = ["Anthony D. Blaom <anthony.blaom@gmail.com>"]
authors = ["Anthony D. Blaom <anthony.blaom@gmail.com>", "Samuel Okon <okonsamuel50@gmail.com>"]
version = "0.1.0"

[deps]
99 changes: 1 addition & 98 deletions README.md
@@ -4,101 +4,4 @@
| :------------ | :------- | :------------- |
| [![Build Status](https://github.com/JuliaAI/FeatureSelection.jl/workflows/CI/badge.svg)](https://github.com/JuliaAI/FeatureSelection.jl/actions) | [![Coverage](https://codecov.io/gh/JuliaAI/FeatureSelection.jl/branch/master/graph/badge.svg)](https://codecov.io/github/JuliaAI/FeatureSelection.jl?branch=dev) | [![Code Style: Blue](https://img.shields.io/badge/code%20style-blue-4495d1.svg)](https://github.com/invenia/BlueStyle) |

Repository housing feature selection algorithms for use with the machine learning toolbox
[MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/).

The `FeatureSelector` model builds on contributions originally residing in [MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl/blob/v0.16.15/src/builtins/Transformers.jl#L189-L266).

# Installation
In a Julia session (version 1.6 or later), run
```julia
import Pkg;
Pkg.add("FeatureSelection")
```

# Example Usage
Let's build a supervised recursive feature eliminator with `RandomForestRegressor`
from DecisionTree.jl as our base model.
But first we need a dataset to train on. We shall create a synthetic dataset, popularly
known in the R community as the Friedman #1 dataset. Notice how the target vector for this
dataset depends on only the first five columns of the feature table. So we expect that
recursive feature elimination will return those first five columns as the important features.
```julia
using MLJ, FeatureSelection
using StableRNGs
rng = StableRNG(10)
A = rand(rng, 50, 10)
X = MLJ.table(A) # features
y = @views(
10 .* sin.(
pi .* A[:, 1] .* A[:, 2]
) .+ 20 .* (A[:, 3] .- 0.5).^ 2 .+ 10 .* A[:, 4] .+ 5 * A[:, 5]
) # target
```
Now that we have our data, we can create our recursive feature elimination model and
train it on our dataset:
```julia
RandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree
forest = RandomForestRegressor(rng=rng)
rfe = RecursiveFeatureElimination(
model = forest, n_features=5, step=1
) # see docstring for description of defaults
mach = machine(rfe, X, y)
fit!(mach)
```
We can inspect the feature importances in two ways:
```julia
# A variable with a lower rank is more significant than one with a higher rank;
# a variable with a higher feature importance is more significant than one with a
# lower feature importance.
report(mach).ranking # returns [1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
feature_importances(mach) # returns dict of feature => importance pairs
```
We can view the important features used by our model by inspecting the `fitted_params`
object.
```julia
p = fitted_params(mach)
p.features_left == [:x1, :x2, :x3, :x4, :x5]
```
We can also call the `predict` method on the fitted machine, to predict using a
random forest regressor trained using only the important features, or call the `transform`
method, to select just those features from some new table including all the original
features. For more info, type `?RecursiveFeatureElimination` on a Julia REPL.
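As a minimal sketch of those two calls, continuing from the machine `mach` fitted above (and assuming MLJ's standard machine interface; here we simply reuse the training table `X` as the "new" table for illustration):

```julia
# `predict` uses the random forest trained on the selected features only:
yhat = predict(mach, X)

# `transform` selects just the important features from a table that
# contains all the original columns:
Xselected = transform(mach, X)
```

The transformed table `Xselected` should contain only the columns reported in `fitted_params(mach).features_left`.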

Okay, let's say that we didn't know that our synthetic dataset depends on only five
columns of the feature table. We could apply cross-validation
(`StratifiedCV(nfolds=5)`) with our recursive feature elimination model to select the
optimal value of `n_features` for our model. In this case we will use a simple grid
search with root mean squared error as the measure.
```julia
rfe = RecursiveFeatureElimination(model = forest)
tuning_rfe_model = TunedModel(
model = rfe,
measure = rms,
tuning = Grid(rng=rng),
resampling = StratifiedCV(nfolds = 5),
range = range(
rfe, :n_features, values = 1:10
)
)
self_tuning_rfe_mach = machine(tuning_rfe_model, X, y)
fit!(self_tuning_rfe_mach)
```
As before we can inspect the important features by inspecting the object returned by
`fitted_params` or `feature_importances` as shown below.
```julia
fitted_params(self_tuning_rfe_mach).best_fitted_params.features_left == [:x1, :x2, :x3, :x4, :x5]
feature_importances(self_tuning_rfe_mach) # returns dict of feature => importance pairs
```
and call `predict` on the tuned model machine as shown below
```julia
Xnew = MLJ.table(rand(rng, 50, 10)) # create test data
predict(self_tuning_rfe_mach, Xnew)
```
In this case, prediction is done using the best recursive feature elimination model
obtained from the tuning process above.
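If it helps, the winning value of `n_features` can also be read off the tuned machine. A hedged sketch, assuming MLJ's `TunedModel` exposes the best model via `fitted_params` (check `?TunedModel` in your MLJ version):

```julia
# Continuing from `self_tuning_rfe_mach` above:
best_rfe = fitted_params(self_tuning_rfe_mach).best_model

# The n_features value selected by the grid search:
best_rfe.n_features
```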

For resampling methods different from cross-validation, and for other
`TunedModel` options, such as parallelization, see the
[Tuning Models](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/) section of the MLJ manual.
[MLJ Documentation](https://alan-turing-institute.github.io/MLJ.jl/dev/)
Repository housing feature selection algorithms for use with the machine learning toolbox [MLJ](https://juliaai.github.io/MLJ.jl/dev/).
2 changes: 2 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,2 @@
Manifest.toml
build/
11 changes: 11 additions & 0 deletions docs/Project.toml
@@ -0,0 +1,11 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
FeatureSelection = "33837fe5-dbff-4c9e-8c2f-c5612fe2b8b6"
StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"

[compat]
Documenter = "^1.4"
MLJ = "^0.20"
StableRNGs = "^1.0"
julia = "^1.0"
34 changes: 34 additions & 0 deletions docs/make.jl
@@ -0,0 +1,34 @@
using Documenter, FeatureSelection

makedocs(;
authors = """
Anthony D. Blaom <anthony.blaom@gmail.com>,
Sebastian Vollmer <s.vollmer.4@warwick.ac.uk>,
Okon Samuel <okonsamuel50@gmail.com>
""",
format = Documenter.HTML(;
prettyurls= get(ENV, "CI", "false") == "true",
edit_link = "dev"
),
modules = [FeatureSelection],
pages=[
"Home" => "index.md",
"API" => "api.md"
],
    doctest = false, # don't run doctests here; they are run separately in CI.
repo = Remotes.GitHub("JuliaAI", "FeatureSelection.jl"),
sitename = "FeatureSelection.jl",
)

# By default Documenter does not deploy docs just for PR
# this causes issues with how we're doing things and ends
# up choking the deployment of the docs, so here we
# force the environment to ignore this so that Documenter
# does indeed deploy the docs
#ENV["GITHUB_EVENT_NAME"] = "pull_request"

deploydocs(;
deploy_config = Documenter.GitHubActions(),
repo="github.com/JuliaAI/FeatureSelection.jl.git",
push_preview=true
)
9 changes: 9 additions & 0 deletions docs/src/api.md
@@ -0,0 +1,9 @@
```@meta
CurrentModule = FeatureSelection
```
# API
# Models
```@docs
FeatureSelector
RecursiveFeatureElimination
```