generated from JuliaAI/MLJExampleInterface.jl
Prelim work #3
Merged
23 commits, all by OkonSamuel:

- f42ddee add draft for RFE model
- a24fea3 rename package
- f5b8311 Add FeatureSelectore and some tests
- f7fe0bc fix current tests
- 860f203 complete RFE model and add tests
- 2168294 Update model docstring
- f4a3c9c fix code, Update readme and add more tests
- 9351dcd Merge branch 'dev' into prelim
- d2e41a5 Apply suggestions from code review
- 1d312c2 rename n_features_to_select to n_features
- c5161cf update readme with
- 4cc82c7 Apply suggestions from code review
- 3e45aba set max column limit to 92 in readme
- 9c53b85 add Aqua.jl tests and refactor code
- 81b90de update ci
- eed3af1 Apply suggestions from code review
- 2c79259 fix bug, add support for serialization and add more tests
- fb0ba2f Update ci.yml
- fa59bd3 Merge branch 'dev' into prelim
- 2fcbc4f Update ci.yml
- a66ffae Update ci.yml
- 91a8f54 Update ci.yml
- 1720f81 Update ci.yml
README.md (diff hunk @@ -1,50 +1,104 @@):
# FeatureSelection.jl

This repository is a template for creating repositories that contain
glue code between (i) packages providing machine learning algorithms; and (ii)
the machine learning toolbox
[MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) - that is,
for so-called *interface-only packages*.

## When to use this template

This template is intended for use when a package providing a machine
learning model algorithm is not hosting the code that implements the
MLJ model API, and a separate package for this purpose is to be
created. This repo is itself a working implementation but should
be used in conjunction with the more detailed [model implementation
guidelines](https://alan-turing-institute.github.io/MLJ.jl/dev/adding_models_for_general_use/).

## How to use this template
1. Clone this repository or use it as a template if available from your organization.

2. Rename this repository, replacing the word "Example" with the name of the model-providing package.

   1. Develop the contents of src/MLJExampleInterface.jl appropriately.

   2. Rename src/MLJExampleInterface.jl appropriately.

   3. Remove Example from Project.toml and instead add the model-providing package.

3. **GENERATE A NEW UUID in Project.toml** and change the Project.toml
   name and author appropriately (see the snippet after this list for one way to generate a UUID).

   1. You may want to remove the Distributions test dependency if you don't need it.

4. Replace every instance of "Example" in this README.md with the name
   of the model-providing package and adjust the organization name in
   the link.

5. Remove everything in this README.md except what is below the line
   you are currently reading 😉.
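For the UUID step above, a fresh identifier can be generated from the Julia REPL using the UUIDs standard library, for example:

```julia
using UUIDs
string(uuid4())  # copy the printed value into the `uuid` field of Project.toml
```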
# MLJ.jl <--> Example.jl

Repository implementing the [MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/) model interface for models provided by
[Example.jl](https://github.com/JuliaLang/Example.jl).

| Linux | Coverage |
| :------------ | :------- |
| [](https://github.com/JuliaAI/MLJExampleInterface.jl/actions) | [](https://codecov.io/github/JuliaAI/MLJExampleInterface.jl?branch=master) |

| Linux | Coverage | Code Style |
| :------------ | :------- | :------------- |
| [](https://github.com/JuliaAI/FeatureSelection.jl/actions) | [](https://codecov.io/github/JuliaAI/FeatureSelection.jl?branch=dev) | [](https://github.com/invenia/BlueStyle) |

Repository housing feature selection algorithms for use with the machine learning toolbox
[MLJ](https://alan-turing-institute.github.io/MLJ.jl/dev/).

The `FeatureSelector` model builds on contributions originally residing at
[MLJModels.jl](https://github.com/JuliaAI/MLJModels.jl/blob/v0.16.15/src/builtins/Transformers.jl#L189-L266).
# Installation
On a running instance of Julia with at least version 1.6 run
```julia
import Pkg;
Pkg.add("FeatureSelection")
```

# Example Usage
Let's build a supervised recursive feature eliminator with `RandomForestRegressor`
from DecisionTree.jl as our base model.
But first we need a dataset to train on. We shall create a synthetic dataset, popularly
known in the R community as the Friedman #1 dataset. Notice how the target vector for this
dataset depends on only the first five columns of the feature table, so we expect our
recursive feature elimination to return those five columns as the important features.
```julia
using MLJ, FeatureSelection
using StableRNGs
rng = StableRNG(10)
A = rand(rng, 50, 10)
X = MLJ.table(A) # features
y = @views(
    10 .* sin.(
        pi .* A[:, 1] .* A[:, 2]
    ) .+ 20 .* (A[:, 3] .- 0.5).^ 2 .+ 10 .* A[:, 4] .+ 5 * A[:, 5]
) # target
```

(A review comment attached to the `using MLJ, FeatureSelection` line reads: "I am proposing that FeatureSelection be a dep of MLJ with all names re-exported. So you won't need", with the reply "Yeah".)
Now that we have our data we can create our recursive feature elimination model and
train it on our dataset:
```julia
RandomForestRegressor = @load RandomForestRegressor pkg=DecisionTree
forest = RandomForestRegressor(rng=rng)
rfe = RecursiveFeatureElimination(
    model = forest, n_features=5, step=1
) # see the docstring for a description of the defaults
mach = machine(rfe, X, y)
fit!(mach)
```
We can inspect the feature importances in two ways:
```julia
# A variable with lower rank has more significance than a variable with higher rank.
# A variable with higher feature importance is better than a variable with lower
# feature importance.
report(mach).ranking # returns [1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
feature_importances(mach) # returns a dict of feature => importance pairs
```
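To see the most important features first, the returned pairs can be sorted by importance. A small sketch (it only assumes the return value can be collected into feature => importance pairs):

```julia
imp = feature_importances(mach)
sort(collect(imp), by = last, rev = true)  # highest importance first
```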
We can view the important features used by our model by inspecting the `fitted_params`
object:
```julia
p = fitted_params(mach)
p.features_left == [:x1, :x2, :x3, :x4, :x5]
```
We can also call the `predict` method on the fitted machine, to predict using a
random forest regressor trained on only the important features, or call the `transform`
method, to select just those features from some new table that includes all the original
features, as sketched below. For more information, type `?RecursiveFeatureElimination` at the Julia REPL.
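A minimal sketch of both calls (the new table here is synthetic, purely for illustration):

```julia
Xfresh = MLJ.table(rand(rng, 5, 10))  # hypothetical new data with the same ten features
predict(mach, Xfresh)    # predictions from the forest refitted on the selected features
transform(mach, Xfresh)  # the same table restricted to the selected features
```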
Okay, let's say that we didn't know that our synthetic dataset depends on only five
columns of our feature table. We could apply cross-validation
`StratifiedCV(nfolds=5)` with our recursive feature elimination model to select the
optimal value of `n_features` for our model. In this case we will use a simple grid
search with root mean squared error as the measure.
```julia
rfe = RecursiveFeatureElimination(model = forest)
tuning_rfe_model = TunedModel(
    model = rfe,
    measure = rms,
    tuning = Grid(rng=rng),
    resampling = StratifiedCV(nfolds = 5),
    range = range(
        rfe, :n_features, values = 1:10
    )
)
self_tuning_rfe_mach = machine(tuning_rfe_model, X, y)
fit!(self_tuning_rfe_mach)
```
As before, we can inspect the important features via the object returned by
`fitted_params`, or via `feature_importances`, as shown below,
```julia
fitted_params(self_tuning_rfe_mach).best_fitted_params.features_left == [:x1, :x2, :x3, :x4, :x5]
feature_importances(self_tuning_rfe_mach) # returns a dict of feature => importance pairs
```
and call `predict` on the tuned model machine as shown below:
```julia
Xnew = MLJ.table(rand(rng, 50, 10)) # create test data
predict(self_tuning_rfe_mach, Xnew)
```
In this case, prediction is done using the best recursive feature elimination model obtained
from the tuning process above.
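To see which value of `n_features` the search settled on, the best model can be extracted from the tuned machine. A sketch, assuming MLJ's standard `TunedModel` accessor fields:

```julia
best_rfe = fitted_params(self_tuning_rfe_mach).best_model
best_rfe.n_features  # expected to be 5 for this synthetic dataset
```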
For resampling methods different from cross-validation, and for other
`TunedModel` options, such as parallelization, see the
[Tuning Models](https://alan-turing-institute.github.io/MLJ.jl/dev/tuning_models/) section of the MLJ manual.
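For example, multithreaded resampling can be requested when constructing the tuned model. A sketch (it assumes a Julia session started with more than one thread):

```julia
tuned = TunedModel(
    model = rfe,
    measure = rms,
    tuning = Grid(rng = rng),
    resampling = CV(nfolds = 5),
    range = range(rfe, :n_features, values = 1:10),
    acceleration = CPUThreads()  # parallelize resampling over threads
)
```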
[MLJ Documentation](https://alan-turing-institute.github.io/MLJ.jl/dev/)
src/FeatureSelection.jl (new file, diff hunk @@ -0,0 +1,27 @@):
module FeatureSelection

using MLJModelInterface, Tables, ScientificTypesBase

export FeatureSelector, RecursiveFeatureElimination

const MMI = MLJModelInterface

## Includes
include("models/featureselector.jl")
include("models/rfe.jl")

## Pkg Traits
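# Broadcasting `metadata_pkg` over the tuple below assigns the same package-level trait
# values to each of the three model types.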
MMI.metadata_pkg.(
    (
        DeterministicRecursiveFeatureElimination,
        ProbabilisticRecursiveFeatureElimination,
        FeatureSelector
    ),
    package_name = "FeatureSelection",
    package_uuid = "33837fe5-dbff-4c9e-8c2f-c5612fe2b8b6",
    package_url = "https://github.com/JuliaAI/FeatureSelection.jl",
    is_pure_julia = true,
    package_license = "MIT"
)

end # module
Review comment: maybe we can just dump all the documenter.jl stuff since we are not using it. In any case, I don't think we need all that complicated logic for this package.