Commit

Merge 739bcd6 into 6e43686
ablaom committed Jan 15, 2020
2 parents 6e43686 + 739bcd6 commit 8347643
Showing 15 changed files with 110 additions and 1,427 deletions.
8 changes: 3 additions & 5 deletions Project.toml
@@ -17,7 +17,6 @@ MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
MLJModels = "d491faf4-2d78-11e9-2867-c94bc002c0b7"
OrderedCollections = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
-PrettyTables = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d"
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
RecipesBase = "3cdcf5f2-1ef4-517c-9805-6587b60abb01"
@@ -31,13 +30,12 @@ CategoricalArrays = "^0.7"
ComputationalResources = "^0.3"
Distributions = "^0.21"
DocStringExtensions = "^0.8"
-MLJBase = "^0.9.1"
-MLJModels = "^0.6"
+MLJBase = "^0.10"
+MLJModels = "^0.7"
OrderedCollections = "^1.1"
-PrettyTables = "^0.6"
ProgressMeter = "^1.1"
RecipesBase = "^0.7"
-ScientificTypes = "^0.3.2"
+ScientificTypes = "^0.5.1"
StatsBase = "^0.32"
Tables = "^0.2"
julia = "1"
4 changes: 2 additions & 2 deletions README.md
@@ -100,7 +100,7 @@ The MLJ universe is made out of several repositories some of which can be used i
* (⟂) [MLJBase.jl](https://github.com/alan-turing-institute/MLJBase.jl) offers essential tools to load and interpret data, describe ML models and use metrics; it is the repository you should interface with if you wish to make your package accessible via MLJ,
* [MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl) offers tools to compose, tune and evaluate models,
* [MLJModels.jl](https://github.com/alan-turing-institute/MLJModels.jl) contains interfaces to a number of important model-providing packages such as [DecisionTree.jl](https://github.com/bensadeghi/DecisionTree.jl), [ScikitLearn.jl](https://github.com/cstjean/ScikitLearn.jl) or [XGBoost.jl](https://github.com/dmlc/XGBoost.jl), as well as a few built-in transformations (one hot encoding, standardisation, ...); it also hosts the *model registry* which keeps track of all models accessible via MLJ,
-* (⟂) [ScientificTypes.jl](https://github.com/alan-turing-institute/ScientificTypes.jl) a lightweight package to help specify the *interpretation* of data beyond how the data is currently encoded,
+* (⟂) [ScientificTypes.jl](https://github.com/alan-turing-institute/ScientificTypes.jl) a lightweight package to help MLJ articulate its conventions about how different types of data (`2.71`, `"male"`, `CategoricalArray{Int}`, etc.) should be *interpreted* by models (`Continuous`, `Textual`, `AbstractArray{Multiclass}`, etc.),
* (⟂) [MLJLinearModels.jl](https://github.com/alan-turing-institute/MLJLinearModels.jl) an experimental package for a wide range of penalised linear models such as Lasso, Elastic-Net, Robust regression, LAD regression, etc.
* [MLJFlux.jl](https://github.com/alan-turing-institute/MLJFlux.jl) an experimental package to use Flux within MLJ.

@@ -151,7 +151,7 @@ The table below indicates the models that are accessible at present along with a
| Package | Models | Maturity | Note
| ------- | ------ | -------- | ----
[Clustering.jl] | KMeans, KMedoids | high | †
-[DecisionTree.jl] | DecisionTreeClassifier, DecisionTreeRegressor | high | †
+[DecisionTree.jl] | DecisionTreeClassifier, DecisionTreeRegressor, AdaBoostStumpClassifier | high | †
[GLM.jl] | LinearRegressor, LinearBinaryClassifier, LinearCountRegressor | medium | †
[LIBSVM.jl] | LinearSVC, SVC, NuSVC, NuSVR, EpsilonSVR, OneClassSVM | high | also via ScikitLearn.jl
[MLJModels.jl] (builtins) | StaticTransformer, FeatureSelector, FillImputer, UnivariateStandardizer, Standardizer, UnivariateBoxCoxTransformer, OneHotEncoder, ConstantRegressor, ConstantClassifier | medium |
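
For orientation, here is a minimal sketch of querying the registry and loading one of the models listed above (function names as provided by MLJ/MLJModels at the time of this commit; assumes DecisionTree.jl is installed):

```julia
using MLJ

# Query the model registry hosted by MLJModels.jl:
models()                                            # metadata for every registered model
info("DecisionTreeClassifier", pkg="DecisionTree")  # traits of one model from the table

# Load the interface code and obtain a model instance:
tree = @load DecisionTreeClassifier pkg=DecisionTree
```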
9 changes: 5 additions & 4 deletions docs/src/acceleration_and_parallelism.md
@@ -25,11 +25,12 @@ computation, while `acceleration=CPUThreads()` will use Julia's PARTR
threading model to perform acceleration.

The default computational resource is `CPU1()`, which is simply serial
-processing via CPU. The default resource can be changed by setting
-`MLJ.DEFAULT_RESOURCE[]` to an instance of
-`ComputationalResources.AbstractResource`.
+processing via CPU. The default resource can be changed as in this
+example: `MLJ.default_resource(CPUProcesses())`. The argument must
+always have type `<:ComputationalResources.AbstractResource`. To
+inspect the current default, use `MLJ.default_resource()`.
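
For instance, a minimal sketch of switching the default and restoring it afterwards (assumes worker processes have already been added with `Distributed.addprocs` if distributed execution is actually wanted):

```julia
using MLJ
using ComputationalResources   # exports CPU1, CPUProcesses, CPUThreads

previous = MLJ.default_resource()      # inspect the current default, initially CPU1()
MLJ.default_resource(CPUProcesses())   # subsequent fits/evaluations use distributed processing
# ... accelerated work goes here ...
MLJ.default_resource(previous)         # restore the earlier default
```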

!!! note

-The `CPUThreads()` dispatch is only available when running a version of
+The `CPUThreads()` resource is only available when running a version of
Julia with `Threads.@spawn` available.
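
A quick way to check this from the REPL (a sketch; `Threads.@spawn` was introduced in Julia 1.3):

```julia
# `CPUThreads()` requires `Threads.@spawn`; check that it is available:
isdefined(Base.Threads, Symbol("@spawn"))   # true on Julia 1.3 and later
Base.Threads.nthreads()                     # number of threads Julia was started with
```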
66 changes: 65 additions & 1 deletion docs/src/adding_models_for_general_use.md
@@ -125,6 +125,30 @@ function RidgeRegressor(; lambda=0.0)
end
```

An alternative to declaring the model struct, `clean!` method and keyword
constructor explicitly is to use the `@mlj_model` macro, as in the following example:

```julia
@mlj_model mutable struct YourModel <: MLJBase.Deterministic
a::Float64 = 0.5::(_ > 0)
b::String = "svd"::(_ in ("svd","qr"))
end
```

This declaration specifies:

* A keyword constructor (here `YourModel(; a=..., b=...)`),
* Default values for the hyperparameters,
* Constraints on the hyperparameters, where `_` refers to the value
  passed.

For example, `a::Float64 = 0.5::(_ > 0)` indicates that
the field `a` is a `Float64`, takes `0.5` as default value, and
expects its value to be positive.
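
As a sketch of the resulting behaviour (the warning mechanics follow the `clean!` convention described above; exact messages may differ):

```julia
model = YourModel()                 # defaults applied: a = 0.5, b = "svd"
model = YourModel(a=1.5, b="qr")    # custom values satisfying both constraints
model = YourModel(a=-1.0)           # violates `_ > 0`: a warning is issued
                                    # and `a` is reset to its default, 0.5
```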

You cannot use the `@mlj_model` macro if your model struct has type
parameters.


### Supervised models

@@ -163,6 +187,12 @@ MLJBase.update(model::SomeSupervisedModel, verbosity, old_fitresult, old_cache,
MLJBase.fit(model, verbosity, X, y)
```

Optional, to specify default hyperparameter ranges (for use in tuning):

```julia
MLJBase.hyperparameter_ranges(T::Type) = Tuple(fill(nothing, length(fieldnames(T))))
```

Optional, if `SomeSupervisedModel <: Probabilistic`:

```julia
@@ -565,6 +595,40 @@ method | return type | declarable return values
`is_pure_julia` | `Bool` | `true` or `false` | `false`
`supports_weights` | `Bool` | `true` or `false` | `false`

**New.** A final trait you can optionally implement is the
`hyperparameter_ranges` trait. It declares default `ParamRange` objects
for one or more of your model's hyperparameters. This is for use (in
the future) by tuning algorithms (e.g., grid generation). It does not
represent the full space of *allowed values*; that information is
encoded in your `clean!` method (or `@mlj_model` call).

The value returned by `hyperparameter_ranges` must be a tuple of
`ParamRange` objects (query `?range` for details) whose length is the
number of hyperparameters (fields of your model). Note that varying a
hyperparameter over a specified range should not alter any type
parameters in your model struct (this never applies to numeric
ranges). If it doesn't make sense to provide a range for a parameter,
a `nothing` entry is allowed. The fallback returns a tuple of
`nothing`s.

For example, given a three-parameter model of the form

```julia
mutable struct MyModel{D} <: Deterministic
alpha::Float64
beta::Int
distribution::D
end
```
you might declare (order matters):

```julia
MLJBase.hyperparameter_ranges(::Type{<:MyModel}) =
(range(Float64, :alpha, lower=0, upper=1, scale=:log),
range(Int, :beta, lower=1, upper=Inf, origin=100, unit=50, scale=:log),
nothing)
```
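
With this declaration in place, `MLJBase.hyperparameter_ranges(MyModel)` returns the three-element tuple above, the `nothing` entry signalling that no default range is provided for `distribution`.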

Here is the complete list of trait function declarations for `DecisionTreeClassifier`
([source](https://github.com/alan-turing-institute/MLJModels.jl/blob/master/src/DecisionTree.jl)):

@@ -778,4 +842,4 @@ The MLJ model registry is located in the [MLJModels.jl repository](https://githu

1) Ensure your model conforms to the interface defined above
2) Raise an issue at https://github.com/alan-turing-institute/MLJModels.jl/issues and point out where the MLJ-interface implementation is, e.g. by providing a link to the code.
-3) The core developer will then review your implementation and work with you to add the model to the registry
+3) An administrator will then review your implementation and work with you to add the model to the registry
10 changes: 5 additions & 5 deletions docs/src/evaluating_model_performance.md
@@ -90,15 +90,15 @@ Or define their own re-usable `ResamplingStrategy` objects, - see


```@docs
-Holdout
+MLJBase.Holdout
```

```@docs
-CV
+MLJBase.CV
```

```@docs
-StratifiedCV
+MLJBase.StratifiedCV
```


@@ -157,9 +157,9 @@ end
### API

```@docs
-evaluate!
+MLJBase.evaluate!
```

```@docs
-evaluate
+MLJBase.evaluate
```
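
For orientation, a rough usage sketch (assumes DecisionTree.jl is installed; in this MLJ version `@load` returns a model instance):

```julia
using MLJ

X, y = @load_boston                                  # built-in regression dataset
tree = @load DecisionTreeRegressor pkg=DecisionTree
mach = machine(tree, X, y)
evaluate!(mach, resampling=CV(nfolds=6), measure=[rms, mae])
```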
3 changes: 2 additions & 1 deletion docs/src/tuning_models.md
@@ -127,7 +127,8 @@ dimension of the grid. See [Grid](@ref) below for details.
### API

```@docs
-MLJ.range
+MLJBase.range
+MLJBase.iterator
MLJ.Grid
MLJ.TunedModel
```
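
For orientation, a small sketch of building a range for a hypothetical integer hyperparameter `K` and materializing grid points from it:

```julia
using MLJ

r = range(Int, :K, lower=1, upper=100, scale=:log)  # a one-dimensional numeric range

iterator(r, 5)   # 5 log-spaced values from the range, e.g. [1, 3, 10, 32, 100]
```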
19 changes: 7 additions & 12 deletions src/MLJ.jl
@@ -6,10 +6,7 @@ module MLJ
export MLJ_VERSION

# utilities.jl:
-export @curve, @pcurve, pretty

-# resampling.jl:
-export Holdout, CV, StratifiedCV, evaluate!, Resampler
+export @curve, @pcurve

# parameters.jl:
export Params, iterator
@@ -51,7 +48,7 @@ export nrows, nfeatures, color_off, color_on,
DeterministicNetwork, ProbabilisticNetwork,
target_scitype, input_scitype, output_scitype,
predict, predict_mean, predict_median, predict_mode,
-transform, inverse_transform, se, evaluate, fitted_params, params,
+transform, inverse_transform, evaluate, fitted_params, params,
@constant, @more, HANDLE_GIVEN_ID, UnivariateFinite,
classes, table, report, rebind!,
partition, unpack,
@@ -61,9 +58,11 @@ export nrows, nfeatures, color_off, color_on,
Machine, NodalMachine, machine, AbstractNode,
source, node, fit!, freeze!, thaw!, Node, sources, origins,
machines, sources, anonymize!, @from_network, fitresults,
-@pipeline
+@pipeline,
+ResamplingStrategy, Holdout, CV,
+StratifiedCV, evaluate!, Resampler, iterator,
+default_resource, pretty

# re-export from MLJBase - relating to measures:
export measures,
orientation, reports_each_observation,
is_feature_dependent, aggregation,
@@ -122,7 +121,6 @@ import Distributions: pdf, mode
import Statistics, StatsBase, LinearAlgebra, Random
import Random: AbstractRNG, MersenneTwister
using ProgressMeter
-import PrettyTables
using ComputationalResources
using ComputationalResources: CPUProcesses
using DocStringExtensions: SIGNATURES, TYPEDEF
@@ -131,14 +129,13 @@ using RecipesBase
# to be extended:
import MLJBase: fit, update, clean!, fit!,
predict, predict_mean, predict_median, predict_mode,
-transform, inverse_transform, se, evaluate, fitted_params,
+transform, inverse_transform, evaluate, fitted_params,
show_as_constructed, ==, getindex, setindex!
import MLJModels: models


## CONSTANTS

-const DEFAULT_RESOURCE = Ref{AbstractResource}(CPU1())
const srcdir = dirname(@__FILE__)
const CategoricalElement = Union{CategoricalString,CategoricalValue}

@@ -156,8 +153,6 @@ const MLJ_VERSION = toml["version"]
## INCLUDE FILES

include("utilities.jl") # general purpose utilities
include("resampling.jl") # resampling strategies and model evaluation
include("parameters.jl") # hyperparameter ranges and grid generation
include("tuning.jl")
include("learning_curves.jl")
include("ensembles.jl") # homogeneous ensembles
8 changes: 4 additions & 4 deletions src/ensembles.jl
@@ -242,7 +242,7 @@ function DeterministicEnsembleModel(;atom=DeterministicConstantClassifier(),
bagging_fraction=0.8,
rng=Random.GLOBAL_RNG,
n::Int=100,
-acceleration=DEFAULT_RESOURCE[],
+acceleration=default_resource(),
out_of_bag_measure=[])

model = DeterministicEnsembleModel(atom, atomic_weights, bagging_fraction, rng,
@@ -299,7 +299,7 @@ function ProbabilisticEnsembleModel(;atom=ConstantProbabilisticClassifier(),
bagging_fraction=0.8,
rng=Random.GLOBAL_RNG,
n::Int=100,
-acceleration=DEFAULT_RESOURCE[],
+acceleration=default_resource(),
out_of_bag_measure=[])

model = ProbabilisticEnsembleModel(atom, atomic_weights, bagging_fraction, rng, n, acceleration, out_of_bag_measure)
@@ -319,7 +319,7 @@ end
bagging_fraction=0.8,
n=100,
rng=GLOBAL_RNG,
-acceleration=DEFAULT_RESOURCE[],
+acceleration=default_resource(),
out_of_bag_measure=[])
Create a model for training an ensemble of `n` learners, with optional
@@ -449,7 +449,7 @@ function fit(model::EitherEnsembleModel{Atom},

acceleration = model.acceleration
if acceleration isa CPUProcesses && nworkers() == 1
-acceleration = DEFAULT_RESOURCE[]
+acceleration = default_resource()
end

if model.out_of_bag_measure isa Vector
