Working with Tasks

Warning. The task API may be deprecated in the future.

In MLJ, a task is a synthesis of three elements: data, an interpretation of that data, and a learning objective. Once one has a task, one is ready to choose learning models.

Scientific types and the interpretation of data

Generally, the columns of a table, such as a DataFrame, represent real quantities. However, the nature of a quantity is not always clear from its representation. For example, we might count phone calls using the UInt32 type but also use UInt32 to represent a categorical feature, such as the species of conifers. MLJ mitigates such ambiguity with the use of scientific types. See Getting Started for details.

Explicitly specifying scientific types during the construction of an MLJ task is the user's opportunity to articulate how the supplied data should be interpreted.
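As a minimal sketch of the ambiguity described above, the snippet below inspects the default scientific type of some UInt32 data and then reinterprets one vector as categorical. It assumes a version of MLJ that re-exports `scitype` and `coerce` from ScientificTypes with the data-first signature `coerce(v, T)`; older releases may differ. The variable names and values are made up for illustration.

```julia
using MLJ

# Hypothetical data: both vectors share the machine type UInt32.
calls   = UInt32[3, 0, 7, 2]   # number of phone calls (a genuine count)
species = UInt32[1, 2, 1, 3]   # integer codes for conifer species

# Under the default convention, integer data is interpreted as a count,
# so both vectors initially receive the same scientific type.
scitype(calls)
scitype(species)

# Reinterpret the species codes as an unordered categorical feature.
species_cat = coerce(species, Multiclass)
scitype(species_cat)
```

The machine type is identical for both vectors, so only the explicit coercion tells MLJ that the second one is categorical.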

Learning objectives

In MLJ, specifying a learning objective means specifying: (i) whether learning is supervised or not; (ii) whether, in the supervised case, predictions are to be probabilistic or deterministic; and (iii) what part of the data is relevant and what role each part is to play.
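The three ingredients listed above can be pictured as a small record. The following is a plain-Julia sketch only: `LearningObjective` and `input_features` are hypothetical names invented for this illustration and are not part of MLJ, and the column list is just an example.

```julia
# Hypothetical illustration only: these names are not part of MLJ.
struct LearningObjective
    is_supervised::Bool              # (i) supervised or not
    is_probabilistic::Bool           # (ii) only meaningful when supervised
    target::Union{Symbol,Nothing}    # (iii) column to predict, if any
    ignore::Vector{Symbol}           # (iii) columns that play no role
end

# Columns left over as input features once the target and ignored
# columns have been set aside.
function input_features(obj::LearningObjective, columns::Vector{Symbol})
    excluded = obj.target === nothing ? obj.ignore : vcat(obj.ignore, obj.target)
    return filter(c -> !(c in excluded), columns)
end

# A supervised, probabilistic objective predicting :Exit and ignoring :Time.
objective = LearningObjective(true, true, :Exit, [:Time])
features = input_features(objective, [:Entry, :Exit, :Time, :Cens])
```

In an actual task, this information is supplied through keyword arguments such as `target`, `ignore` and `is_probabilistic`, rather than through a separate struct.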

Sample usage

import Base.eval # hack because documenter puts code in baremodules
using MLJ
MLJ.color_off()

Given some data,

using RDatasets
df = dataset("boot", "channing");
first(df, 4)

we can check MLJ's interpretation of that data:

schema(df)

The middle three fields of df refer to ages, in months; the last is a flag. We construct a task by wrapping the data in a learning objective and coercing the data into a form MLJ will correctly interpret:

task = supervised(data=df,
                  target=:Exit,
                  ignore=:Time,
                  is_probabilistic=true,
                  types=Dict(:Entry=>Continuous,
                             :Exit=>Continuous,
                             :Cens=>Multiclass))
schema(task.X)

Row selection and shuffling:

task[1:3].y

using Random
rng = MersenneTwister(1234)
shuffle!(rng, task) # rng is optional
task[1:3].y

Counting rows:

nrows(task)

Listing the models available to complete a task:

models(task)

Choosing a model and evaluating on the task:

using RDatasets
iris = dataset("datasets", "iris"); # a DataFrame
task = supervised(data=iris, target=:Species, is_probabilistic=true)
tree = @load DecisionTreeClassifier verbosity=1
mach = machine(tree, task)
evaluate!(mach, operation=predict_mode, 
          resampling=Holdout(), measure=misclassification_rate, verbosity=0)

API Reference

supervised

unsupervised

models

localmodels
