MLJScientificTypes.jl


This repository is now deprecated. The last supported release is MLJScientificTypes 0.4.8. ScientificTypes 2.0 and higher now serves the original purpose of MLJScientificTypes, implementing a scientific type convention called DefaultConvention (but previously known as the MLJ convention).

The scientific types themselves (on which all scientific type conventions are based) are now defined in ScientificTypesBase. Previously ScientificTypes (versions 1.1.1 and lower) defined the basic types and API.

Implementation of a convention for scientific types, as used in the MLJ universe.

Important note. While this document refers to the MLJ convention, this convention could (and, hopefully, will) be adopted in statistical/scientific software outside of the MLJ project. Of its dependencies, only the tiny package ScientificTypes.jl has any direct connection to MLJ.

This package makes a distinction between machine type and scientific type of a Julia object:

The machine type refers to the Julia type being used to represent the object (for instance, Float64).
The scientific type is one of the types defined in ScientificTypes.jl reflecting how the object should be interpreted (for instance, Continuous or Multiclass).
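The distinction can be seen directly at the REPL. The following is a minimal sketch, assuming the last supported release of MLJScientificTypes is installed; the variable names are illustrative only.

```julia
using MLJScientificTypes

x = 3.14
typeof(x)    # Float64: the machine type, i.e. how Julia stores the value
scitype(x)   # Continuous: how the value is interpreted under the convention

n = 42
typeof(n)    # Int64 on a 64-bit system
scitype(n)   # Count

# Arrays report the scientific type of their elements:
v = [1.0, 2.0, 3.0]
scitype(v)   # AbstractVector{Continuous}
```

Note that the same machine type can be given different scientific interpretations: an integer vector is interpreted as Count by default, but can be coerced to Multiclass or OrderedFactor when the integers are really category labels.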

Installation

using Pkg
Pkg.add("MLJScientificTypes")

Who is this repository for?

This repository has two kinds of users in mind:

users of software in the MLJ universe seeking a deeper understanding of the use of scientific types and associated tools; these users do not need to directly install this package but may find its documentation helpful
developers of statistical and scientific software who want to articulate their data type requirements in a generic, purpose-oriented way, and who are furthermore happy to adopt an existing convention about what data types should be used for what purpose (a convention that has been successfully adopted in an existing large scale Julia project)
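As a rough illustration of how a developer might articulate a data requirement in terms of scientific types rather than machine types, the sketch below checks that every column of a table is Continuous or Count. The function name check_input and the example data are hypothetical; the check relies on the Table type constructor from the scientific types API.

```julia
using MLJScientificTypes

# Hypothetical helper: accept any table whose columns are all
# Continuous or Count, whatever machine types are used to store them.
function check_input(X)
    scitype(X) <: Table(Continuous, Count) ||
        throw(ArgumentError("expected Continuous or Count columns, got $(scitype(X))"))
    return X
end

# A NamedTuple of vectors is a valid column table.
X = (height = [1.85, 1.67, 1.50], visits = [3, 0, 7])
check_input(X)              # passes and returns X

bad = (name = ["a", "b", "c"],)
# check_input(bad)          # would throw: the column has scitype Textual
```

The requirement is stated once, in terms of purpose, and applies equally to Float32 or Float64 columns, or to any table type supported by the convention.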

Developers interested in implementing a different convention will instead import ScientificTypes.jl, following the documentation there, possibly using this repo as a template.

What's provided here?

The module MLJScientificTypes defined in this repo re-exports the scientific types and associated methods defined in ScientificTypes.jl and provides:

a collection of ScientificTypes.scitype definitions that articulate the MLJ convention; importing the module automatically activates the convention
a coerce function, for changing machine types to reflect a specified scientific interpretation (scientific type)
an autotype function for "guessing" the intended scientific type of data
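The autotype and coerce functions are often used together: autotype suggests scientific types, and coerce applies them. The sketch below assumes DataFrames is installed and uses made-up column names; the suggestions returned depend on the heuristics autotype applies, so no output is shown.

```julia
using MLJScientificTypes, DataFrames

# Illustrative data only.
X = DataFrame(
    id    = [1, 2, 3, 4, 5, 6],
    group = ["a", "b", "a", "b", "a", "b"],
    x     = [0.1, 0.4, 0.3, 0.9, 0.7, 0.2],
)

# A dictionary mapping column names to suggested scientific types;
# with only_changes=true, columns whose scitype would not change are omitted.
suggestions = autotype(X; only_changes=true)

# Apply the suggestions and inspect the result.
Xc = coerce(X, suggestions)
schema(Xc)
```

The suggestions are only guesses and are worth reviewing before they are applied; individual columns can always be overridden with explicit pairs, as in coerce(X, :group => Multiclass).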

Very quick start

For more information and examples please refer to the manual.

using MLJScientificTypes, DataFrames
X = DataFrame(
    a = randn(5),
    b = [-2.0, 1.0, 2.0, missing, 3.0],
    c = [1, 2, 3, 4, 5],
    d = [0, 1, 0, 1, 0],
    e = ['M', 'F', missing, 'M', 'F'],
    )
sch = schema(X)

will print

_.table =
┌─────────┬─────────────────────────┬────────────────────────────┐
│ _.names │ _.types                 │ _.scitypes                 │
├─────────┼─────────────────────────┼────────────────────────────┤
│ a       │ Float64                 │ Continuous                 │
│ b       │ Union{Missing, Float64} │ Union{Missing, Continuous} │
│ c       │ Int64                   │ Count                      │
│ d       │ Int64                   │ Count                      │
│ e       │ Union{Missing, Char}    │ Union{Missing, Unknown}    │
└─────────┴─────────────────────────┴────────────────────────────┘
_.nrows = 5

Detail is obtained in the obvious way; for example:

julia> sch.names
(:a, :b, :c, :d, :e)

To specify instead that b should be regarded as Count, and that both d and e are Multiclass, we use the coerce function:

Xc = coerce(X, :b=>Count, :d=>Multiclass, :e=>Multiclass)
schema(Xc)

which prints

_.table =
┌─────────┬───────────────────────────────────────────────┬───────────────────────────────┐
│ _.names │ _.types                                       │ _.scitypes                    │
├─────────┼───────────────────────────────────────────────┼───────────────────────────────┤
│ a       │ Float64                                       │ Continuous                    │
│ b       │ Union{Missing, Int64}                         │ Union{Missing, Count}         │
│ c       │ Int64                                         │ Count                         │
│ d       │ CategoricalValue{Int64,UInt32}                │ Multiclass{2}                 │
│ e       │ Union{Missing, CategoricalValue{Char,UInt32}} │ Union{Missing, Multiclass{2}} │
└─────────┴───────────────────────────────────────────────┴───────────────────────────────┘
_.nrows = 5
