MLLabelUtils is a utility library for transforming classification targets, which can come in a variety of forms (booleans, values in $\{-1,1\}$, continuous variables), into the specific label encoding a model expects. This matters in many classification problems: neural networks, for instance, expect one-hot vectors such as $[1,0,0]$ for three classes, while SVMs expect scalars in the set $\{-1,1\}$.
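To make the idea of "targets in different forms" concrete, here is a minimal plain-Julia sketch (no MLLabelUtils involved) that turns continuous scores and booleans into scalar labels. The cut-off of 0.5 and the variable names are assumptions chosen for illustration only.

```julia
# Hypothetical continuous model outputs, thresholded at an assumed cut-off of 0.5
scores = [0.1, 0.8, 0.5, 0.3]
zero_one = Int.(scores .>= 0.5)      # 0/1 labels

# Boolean targets mapped to the {-1, 1} convention used by SVMs
flags = [true, false, false, true]
margin = ifelse.(flags, 1, -1)

println(zero_one)   # [0, 1, 1, 0]
println(margin)     # [1, -1, -1, 1]
```

The library's job is to perform conversions like these systematically, without hand-written thresholds and mappings.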

Current state:

| Test | Result |
| ------------- |:-------------:|
| Package works | Yes |
| Deprecation warnings | No |
| Compatible with JuliaDB | Only if the targets are converted to an array |
| Contains documentation | Yes |
| Simplicity | Good |


```julia
using MLLabelUtils
include("load_titanic.jl");  # local helper script (not shown) that defines load()
```

```julia
train, train_targets, test, test_targets = load();
```

Note that `train_targets` is an `Int64` array of ones and zeros.
To let the library infer its encoding automatically, we can use `labelenc`:

```julia
# Automatically infer the most suitable label encoding
encoded_targets = labelenc(train_targets)
```

```
MLLabelUtils.LabelEnc.OneOfK{Int64,632}()
```

Note that for this step to work as intended, the array must be truly one-dimensional. Targets are often stored as one of the columns of the training dataset, and if they are kept as an $n \times 1$ matrix after being extracted, `labelenc` treats them as a matrix with one column rather than as a vector. As the output above shows, it then returns a `OneOfK` encoding; judging by the type parameter, the 632 rows are read as 632 classes.
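Whether a column comes out as a vector or as an $n \times 1$ matrix depends on how it is indexed. The sketch below uses a small made-up dataset (the values are hypothetical) to show that indexing with a range such as `3:3` keeps two dimensions, while indexing with an integer drops one.

```julia
# Toy dataset: 4 observations (rows), targets stored in the last column
data = [5.1 3.5 0;
        4.9 3.0 1;
        6.2 2.9 1;
        5.9 3.0 0]

col_as_matrix = data[:, 3:3]   # range index keeps 2 dimensions: 4×1 Matrix
col_as_vector = data[:, 3]     # integer index drops a dimension: Vector

println(size(col_as_matrix))   # (4, 1)
println(size(col_as_vector))   # (4,)
```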

To avoid this, flatten the targets into a vector:

```julia
println("Current type is: $(typeof(train_targets))")
train_targets = train_targets[:]   # flatten the n×1 matrix into a vector
println("The modified type is: $(typeof(train_targets))")
```

```
Current type is: Array{Int64,2}
The modified type is: Array{Int64,1}
```
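Both `train_targets[:]` and `vec(train_targets)` turn an $n \times 1$ matrix into a vector. A quick Base Julia check (with hypothetical data) shows the difference: `[:]` makes a copy, whereas `vec` returns a vector that shares memory with the original array.

```julia
targets = reshape([1, 0, 0, 1], 4, 1)   # 4×1 Matrix{Int64}, like a target column

copied = targets[:]     # new Vector: changes do not affect `targets`
shared = vec(targets)   # Vector sharing memory with `targets`

copied[1] = 99
println(targets[1])     # 1 (unchanged)

shared[1] = 42
println(targets[1])     # 42 (shared memory)
```

For large target arrays `vec` avoids the copy; `[:]` is the safer choice if the original matrix must stay untouched.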


With a proper vector, `labelenc` no longer mistakes the targets for a one-hot matrix:

```julia
encoded_targets = labelenc(train_targets);
```

To train an SVM, we would need margin-based targets in $\{-1,1\}$, which `convertlabel` can compute:

```julia
# Returns a vector of 1s and -1s
convertlabel(LabelEnc.MarginBased, train_targets);
```
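For binary 0/1 targets the conversion amounts to a simple affine map. Here is a plain-Julia sketch of it, assuming 1 is the positive class and 0 the negative one; this illustrates the arithmetic, not how MLLabelUtils implements it.

```julia
zero_one = [1, 0, 0, 1, 0]

margin = 2 .* zero_one .- 1      # 0 -> -1, 1 -> 1
back = (margin .+ 1) .÷ 2        # inverse map: -1 -> 0, 1 -> 1

println(margin)   # [1, -1, -1, 1, -1]
println(back)     # [1, 0, 0, 1, 0]
```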

We can also check whether the targets already follow a given encoding:

```julia
# Check whether the targets follow the given encoding
islabelenc(train_targets, LabelEnc.ZeroOne);      # true
islabelenc(train_targets, LabelEnc.MarginBased);  # false
```
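Conceptually, such a check asks whether every target belongs to the label set of the encoding. The helpers below are hypothetical and are not part of MLLabelUtils; they only sketch that idea for the two binary conventions used above.

```julia
# Hypothetical helpers: does every element belong to the binary label set?
is_zero_one(y) = all(x -> x in (0, 1), y)
is_margin_based(y) = all(x -> x in (-1, 1), y)

y = [0, 1, 1, 0]
println(is_zero_one(y))       # true
println(is_margin_based(y))   # false
```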

An encoding that is very often required is one-hot (one-of-K) encoding, which transforms scalar labels into vector labels, as neural networks expect. This can be done with `convertlabel` and `LabelEnc.OneOfK`:

```julia
# One-hot encode for a neural network with 3 classes, stored as Float32
convertlabel(LabelEnc.OneOfK(Float32,3), [-1,1,-1,1,0,1,-1])
```

```
3×7 Array{Float32,2}:
 1.0  0.0  1.0  0.0  0.0  0.0  1.0
 0.0  1.0  0.0  1.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0  1.0  0.0  0.0
```
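In the output above each column is one observation and each row one class. The rows are ordered -1, 1, 0, which suggests the classes follow their order of first appearance rather than sorted order. The plain-Julia sketch below reproduces the same matrix under that assumption; it is an illustration, not the library's implementation.

```julia
y = [-1, 1, -1, 1, 0, 1, -1]
classes = unique(y)                      # order of first appearance: [-1, 1, 0]

# One row per class, one column per observation
onehot = zeros(Float32, length(classes), length(y))
for (j, label) in enumerate(y)
    onehot[findfirst(==(label), classes), j] = 1
end

display(onehot)
```

If a fixed, sorted class order is needed instead, `classes = sort(unique(y))` would give rows ordered -1, 0, 1.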