SupervisedLearning

Work in progress for a front-end supervised learning framework. Currently the focus is on creating a pure Julia package for SVMs in KSVM.jl

The goal of this library is manyfold:

Education: allow the user to play around with the models, solvers, etc. for educational purposes. Provide a good base for course exercises. For example visualizing the learning curve of neural networks using different optimization algorithms.
Research: Swap out parts of the machine learning pipeline with custom implementations without losing the ability to utilize the rest of the framework. For example to prototype new prediction models.
Application: Porcelain interface to apply machine learning to given datasets in a convenient way. There might be multiple high-level interface for different usergroups (e.g. one that mimics R's caret package)

Planned High-level API

The following code should already work

using SupervisedLearning
using RDatasets
using UnicodePlots

data = dataset("datasets", "mtcars")

# In this case the dataset will be in-memory and encoded to -1, 1
# There will also be support for datastreaming from HDF5
problemSet = dataSource(AM ~ DRat + WT + DRat&WT, data, SignedClassEncoding)

# Convenient to use with UnicodePlots
print(barplot(classDistribution(problemSet)...))

# Methods for splitting the abstract data sets
trainSet, testSet = splitTrainTest!(problemSet, p_train = .75)

# Specifies the model and modelspecific parameter
model = Classifier.LogisticRegression(l2_coef=0.1)

# Backend for neural networks will be Mocha.jl or OnlineAI.jl
# model = Classifier.FeedForwardNeuralNetwork([4,5,7],[ReLu,ReLu,ReLu])

# train! mutates the model state
#  * the do-block is the callback function which also allows for early stopping
#  * In the regression case Solver.GradientDescent() will result in using Regression.jl, 
#    otherwise (in most deterministic cases) Optim.jl
#  * There will also be stochastic gradient descent with minibatches
train!(model, trainSet, Solver.GradientDescent(), max_iter = 10000, break_every = 100) do
  # You can also use the callback to execute any code
  # For example to print informative messages
  println("Testset accuracy: ", accuracy(model, testSet))
  
  # You can easily store custom learning curves or other arbitrary values
  # They will be linked to the correct iteration automatically
  remember!(model, :testsetCost, cost(model, testSet))
end

# The loss of the training set is stored by default and can be accessed with trainingCurve
# x is a Vector{Int} of iterations with stepsize break_every,
# y is a Vector{Float64} where y[i] is the cost of the trainSet at x[i]
x, y = trainingCurve(model)
print(lineplot(x, y, title = "Learning curve for trainSet"))

# Customly stored curves can be accessed with "history"
# x is a Vector{Int} of iterations (exact values depend on when you called remember!),
# y is a Vector{Float64} where y[i] is the cost of the testSet at x[i]
x, y = history(model, :testsetCost)
print(lineplot(x, y, title = "Learning curve for testSet"))

ŷ = predict(model, testSet) # what the model says
t = groundtruth(testSet) # what it should be

Planned Mid-level API

This is just a rough draft and still object to change

using SupervisedLearning
using RDatasets

data = dataset("datasets", "mtcars")

# In this case the dataset will be in-memory.
# Specifying the encoding is not necessary.
# The model will select the encoding it needs automatically
# Trees for example don't need an encoding at all.
problemSet = dataSource(AM ~ DRat + WT, data)

# Methods for splitting the abstract data sets
trainSet, testSet = splitTrainTest!(problemSet, p_train = .75)

# Perform a gridsearch over an arbitrary modelspace
gsResult = gridsearch([.001, .01, .1], [.0001, .0003]) do lr, lambda

  # Perform cross validation to get a good estimate for the hyperparameter performance
  cvResult = crossvalidate(trainSet, k = 5) do trainFold, valFold

    # Specify the model and model-specific parameters
    model = Classifier.LogisticRegression(l2_coef = lambda)

    # Specify the solver and solver-specific parameters
    solver = Solver.NaiveGradientDescent(learning_rate = lr, normalize_gradient = false)

    # train! mutates the model state
    train!(model, trainFold, solver, max_iter = 1000)

    # make sure to return the trained model
    model
  end

  # You can return a model or a cvResult to gridsearch
  cvResult
end

# Plot the final accuracy of all trained models using UnicodePlots
print(barplot(accuracy(gsResult, testSet)...))

# Get the best model
bestModel = gsResult.bestModel
ŷ = predict(bestModel, testSet)

Name		Name	Last commit message	Last commit date
Latest commit History 58 Commits
misc		misc
src		src
test		test
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE.md		LICENSE.md
README.md		README.md
REQUIRE		REQUIRE

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SupervisedLearning

Planned High-level API

Planned Mid-level API

About

Releases

Packages

Languages

License

Evizero/SupervisedLearning.jl

Folders and files

Latest commit

History

Repository files navigation

SupervisedLearning

Planned High-level API

Planned Mid-level API

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages