# Julia Academy

## Data Science Course

# 6. Classical Classification Algorithms in Julia

**Huda Nassar**

**Source:** https://github.com/JuliaAcademy/DataScience/blob/main/06.%20Classification.ipynb

We will apply several classical classification algorithms, implemented in different Julia packages, to the classic iris dataset and compare their accuracy at the end.

In [1]:
using GLMNet
using RDatasets
using MLBase
using Plots
using DecisionTree
using Distances
using NearestNeighbors
using Random
using LinearAlgebra
using DataStructures
using LIBSVM


In [4]:
ENV["LINES"], ENV["COLUMNS"] = 15, 200;

We will use a simple function for computing the accuracy:

In [2]:
compute_accuracy(predicted, groundtruth) = sum(predicted .== groundtruth) / length(groundtruth)

compute_accuracy (generic function with 1 method)
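
As a quick sanity check, the function can be tried on small hand-made vectors. The values below are illustrative and do not come from the iris data:

```julia
# Same definition as above, repeated so the snippet runs on its own.
compute_accuracy(predicted, groundtruth) = sum(predicted .== groundtruth) / length(groundtruth)

predicted   = [1, 2, 3, 3]   # illustrative predictions
groundtruth = [1, 2, 2, 3]   # illustrative true labels

acc = compute_accuracy(predicted, groundtruth)   # 3 of 4 entries match, so 0.75
```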

In [5]:
iris = dataset("datasets", "iris")

| Row | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Cat… |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 8 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |


Our goal is to predict the species from the measured feature variables.

In [6]:
X = Matrix(iris[:, 1:4])  # feature matrix
irislabels = iris[:, 5];   # labels

In [7]:
unique(irislabels)

3-element Array{String,1}:
 "setosa"
 "versicolor"
 "virginica"

We need to encode the labels as integers. MLBase makes this easy:

In [9]:
irislabels_map = MLBase.labelmap(irislabels)
y = MLBase.labelencode(irislabels_map, irislabels) 

150-element Array{Int64,1}:
 1
 1
 1
 1
 1
 ⋮
 3
 3
 3
 3
 3
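
To illustrate what such an encoding does, here is a minimal sketch in plain Julia. It is not MLBase's implementation; the names `classes` and `label_to_id` are hypothetical, and the short label vector is made up for the example:

```julia
# Illustrative labels; the real notebook uses the 150 iris species labels.
labels = ["setosa", "setosa", "versicolor", "virginica", "versicolor"]

# Give each distinct label an integer ID in order of first appearance.
classes = unique(labels)
label_to_id = Dict(c => i for (i, c) in enumerate(classes))

encoded = [label_to_id[l] for l in labels]   # integer class IDs
decoded = classes[encoded]                   # maps the IDs back to strings
```

Keeping `classes` around makes the encoding reversible, which is handy when predictions need to be reported as species names again.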

#### Train-Test Split

In [14]:
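function get_class_training_indices(labels, train_ratio)
    uids = unique(labels)
    keepids = Int[]
    for label in uids
        label_indices = findall(labels .== label)
        # randsubseq keeps each index independently with probability train_ratio,
        # so the training share per class is only approximately train_ratio
        label_train_indices = randsubseq(label_indices, train_ratio)
        append!(keepids, label_train_indices)
    end
    return keepids
end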

get_class_training_indices (generic function with 1 method)

In [17]:
train_idx = get_class_training_indices(y, 0.7)
test_idx = setdiff(1:length(y), train_idx)

println("Number of training samples: ", length(train_idx))
println("Number of test samples: ", length(test_idx))

Number of training samples: 111
Number of test samples: 39
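
Because `randsubseq` samples each index independently, the split sizes vary from run to run (111 and 39 here rather than 105 and 45). If an exact per-class ratio is wanted, one possible variant is sketched below. The function name `get_exact_class_training_indices` and the `rng` keyword are assumptions made for this example, not part of the original notebook:

```julia
using Random

# Hypothetical variant: take exactly round(train_ratio * n) indices from each class.
function get_exact_class_training_indices(labels, train_ratio; rng = Random.default_rng())
    keepids = Int[]
    for label in unique(labels)
        label_indices = findall(labels .== label)
        n_train = round(Int, train_ratio * length(label_indices))
        append!(keepids, shuffle(rng, label_indices)[1:n_train])
    end
    return sort(keepids)
end

labels = repeat([1, 2, 3], inner = 50)   # three classes of 50 samples, like the encoded iris labels
train  = get_exact_class_training_indices(labels, 0.7; rng = MersenneTwister(42))
test   = setdiff(1:length(labels), train)
```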


Finally, we will need a function that assigns class labels based on the continuous outputs of the GLMNet models we'll use. Because these models regress on the integer class IDs, their outputs are real values near 1, 2, or 3, and we assign each output to the nearest class ID.

In [18]:
assign_class(predicted_output) = argmin(abs.(predicted_output .- [1,2,3]))

assign_class (generic function with 1 method)
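
A few hand-picked inputs (illustrative values, not model outputs) show how the function behaves:

```julia
# Same definition as above, repeated so the snippet runs on its own.
assign_class(predicted_output) = argmin(abs.(predicted_output .- [1, 2, 3]))

a = assign_class(0.95)   # distances to 1, 2, 3 are about 0.05, 1.05, 2.05, so class 1
b = assign_class(2.4)    # distances are about 1.4, 0.4, 0.6, so class 2
c = assign_class(3.7)    # outputs beyond the label range still map to the nearest class, 3
```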

## Method 1: The Lasso

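We will use regression with L1 regularization via GLMNet.jl, a Julia wrapper of Trevor Hastie et al.'s [glmnet](https://www.jstatsoft.org/article/view/v033i01) package. Note that the call below regresses on the integer class labels with a least-squares loss (as the output header confirms), rather than fitting a logistic model.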

glmnet fits the regression for a whole sequence of values of the regularization parameter $\lambda$. This sequence of fits is called a path.

> **Note:** glmnet also has a parameter $\alpha$ that controls the mix between ridge regression (L2 penalty, $\alpha = 0$) and the lasso (L1 penalty, $\alpha = 1$). The default is $\alpha = 1$, i.e. the lasso.
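
For reference, the least-squares objective described in the linked glmnet paper can be written as follows, with $N$ observations $(x_i, y_i)$, intercept $\beta_0$ and coefficient vector $\beta$:

$$
\min_{\beta_0,\,\beta}\ \frac{1}{2N}\sum_{i=1}^{N}\left(y_i-\beta_0-x_i^{\top}\beta\right)^2+\lambda\left[\frac{1-\alpha}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\right]
$$

Setting $\alpha = 1$ leaves only the L1 term (lasso), while $\alpha = 0$ leaves only the L2 term (ridge).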

In [20]:
path = glmnet(X[train_idx, :], y[train_idx])

Least Squares GLMNet Solution Path (70 solutions for 4 predictors in 1089 passes):
──────────────────────────────
      df   pct_dev           λ
──────────────────────────────
 [1]   0  0.0       0.789924
 [2]   2  0.156264  0.719749
 [3]   2  0.28783   0.655809
 [4]   2  0.397058  0.597549
 [5]   2  0.48774   0.544464
 [6]   2  0.563025  0.496095
 [7]   2  0.625526  0.452024
 [8]   2  0.677415  0.411867
 [9]   2  0.720493  0.375278
[10]   2  0.756256  0.341939
[11]   2  0.785946  0.311562
[12]   2  0.810595  0.283884
[13]   2  0.831059  0.258664
[14]   2  0.848047  0.235685
[15]   2  0.862151  0.214748
[16]   2  0.87386   0.19567
[17]   2  0.88358   0.178287
[18]   2  0.89165   0.162449
[19]   2  0.898345  0.148017
[20]   2  0.903906  0.134868
[21]   2  0.908523  0.122887
[22]   2  0.912357  0.11197
[23]   2  0.915539  0.102023
[24]   2  0.918178  0.0929592
[25]   2  0.920371  0.084701
[26]   2  0.922191  0.0771764
[27]   2  0.923703  0.0703202
[28]   2  0.924956  0.0640732
[29]   3  

How should we choose the value of $\lambda$? We can use cross-validation:

In [21]:
cv_folds = glmnetcv(X[train_idx, :], y[train_idx])

Least Squares GLMNet Cross Validation
70 models for 4 predictors in 10 folds
Best λ 0.002 (mean loss 0.047, std 0.008)
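
To make the idea concrete, here is a minimal sketch of k-fold cross-validation over a grid of λ values. It is not how `glmnetcv` is implemented: `ridge_fit` and `kfold_meanloss` are hypothetical helpers, the model is a closed-form ridge fit without an intercept, and the data are synthetic.

```julia
using LinearAlgebra, Random, Statistics

# Closed-form ridge fit (no intercept), standing in for the real model.
ridge_fit(X, y, λ) = (X' * X + λ * I) \ (X' * y)

# Mean squared error on held-out folds, averaged over folds, for each candidate λ.
function kfold_meanloss(X, y, lambdas; nfolds = 10, rng = MersenneTwister(0))
    n = size(X, 1)
    fold_of = shuffle(rng, [mod1(i, nfolds) for i in 1:n])   # random fold ID per sample
    losses = zeros(length(lambdas), nfolds)
    for k in 1:nfolds
        held_out = fold_of .== k
        kept = .!held_out
        for (j, λ) in enumerate(lambdas)
            β = ridge_fit(X[kept, :], y[kept], λ)
            losses[j, k] = mean(abs2, X[held_out, :] * β .- y[held_out])
        end
    end
    return vec(mean(losses, dims = 2))
end

# Synthetic regression data, purely illustrative.
rng = MersenneTwister(1)
X = randn(rng, 100, 4)
y = X * [1.0, -2.0, 0.0, 0.5] .+ 0.1 .* randn(rng, 100)

lambdas  = [100.0, 10.0, 1.0, 0.1, 0.01]
meanloss = kfold_meanloss(X, y, lambdas)
best_λ   = lambdas[argmin(meanloss)]   # λ with the lowest mean held-out loss
```

The selection rule in the last line mirrors the `argmin` over `meanloss` used in this notebook.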

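And now we can select the best $\lambda$, i.e. the one with the lowest mean cross-validation loss:

In [22]:
# cv_folds.meanloss is indexed like path.lambda (both have 70 entries here)
λ = path.lambda[argmin(cv_folds.meanloss)]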

0.0018677669431407396

Now we will refit the model on the training data using only this best $\lambda$:

In [26]:
best_lasso_path = glmnet(X[train_idx, :], y[train_idx], lambda=[λ])

Least Squares GLMNet Solution Path (1 solutions for 4 predictors in 65 passes):
─────────────────────────────
     df   pct_dev           λ
─────────────────────────────
[1]   3  0.936799  0.00186777
─────────────────────────────

#### Test Predictions

Now we can use this fitted model to get the test predictions for the lasso:

In [30]:
test_query = X[test_idx, :]

predictions_lasso = GLMNet.predict(best_lasso_path, test_query)

39×1 Array{Float64,2}:
 0.9513267362335774
 1.0223467910449973
 0.9663341636378728
 0.9134379730064015
 0.8979284103258488
 ⋮
 2.883368271870407
 2.6727877798970363
 2.754192419508327
 3.0104009540889516
 2.7049148009287656

In [31]:
predictions_lasso = assign_class.(predictions_lasso)

39×1 Array{Int64,2}:
 1
 1
 1
 1
 1
 ⋮
 3
 3
 3
 3
 3

In [32]:
compute_accuracy(predictions_lasso, y[test_idx])

0.9743589743589743

## Method 2: Ridge Regression

If we want ridge regression, we can set glmnet's $\alpha$ parameter to zero.

In [34]:
ridge_path = glmnet(X[train_idx, :], y[train_idx], alpha=0)
ridge_cv_folds = glmnetcv(X[train_idx, :], y[train_idx], alpha=0)
λ_ridge = ridge_path.lambda[argmin(ridge_cv_folds.meanloss)]

best_ridge_path = glmnet(X[train_idx, :], y[train_idx], alpha=0, lambda=[λ_ridge])

Least Squares GLMNet Solution Path (1 solutions for 4 predictors in 29 passes):
────────────────────────────
     df   pct_dev          λ
────────────────────────────
[1]   4  0.928539  0.0789924
────────────────────────────

In [35]:
predictions_ridge = GLMNet.predict(best_ridge_path, test_query)
predictions_ridge = assign_class.(predictions_ridge)
compute_accuracy(predictions_ridge, y[test_idx])

1.0

## Method 3: Elastic Net

Elastic net combines the ridge (L2) and lasso (L1) penalties, weighted by $\alpha$. We will use $\alpha = 0.5$.

In [36]:
elastic_path = glmnet(X[train_idx, :], y[train_idx], alpha=0.5)
elastic_cv_folds = glmnetcv(X[train_idx, :], y[train_idx], alpha=0.5)
λ_elastic = elastic_path.lambda[argmin(elastic_cv_folds.meanloss)]

best_elastic_path = glmnet(X[train_idx, :], y[train_idx], alpha=0.5, lambda=[λ_elastic])

Least Squares GLMNet Solution Path (1 solutions for 4 predictors in 70 passes):
────────────────────────────
     df  pct_dev           λ
────────────────────────────
[1]   3  0.93684  0.00177468
────────────────────────────

In [37]:
predictions_elastic = GLMNet.predict(best_elastic_path, test_query)
predictions_elastic = assign_class.(predictions_elastic)
compute_accuracy(predictions_elastic, y[test_idx])

0.9743589743589743

## Method 4: Decision Trees

We will use the `DecisionTree.jl` package to work with decision tree algorithms.

In [39]:
model_DT = DecisionTreeClassifier(max_depth=2)
DecisionTree.fit!(model_DT, X[train_idx, :], y[train_idx])

DecisionTreeClassifier
max_depth:                2
min_samples_leaf:         1
min_samples_split:        2
min_purity_increase:      0.0
pruning_purity_threshold: 1.0
n_subfeatures:            0
classes:                  [1, 2, 3]
root:                     Decision Tree
Leaves: 3
Depth:  2

Now we will query the tree on the test data:

In [40]:
predictions_DT = DecisionTree.predict(model_DT, test_query)
compute_accuracy(predictions_DT, y[test_idx])

0.9743589743589743
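
To give a feel for how a tree picks a split, the sketch below scores candidate thresholds on a single feature. It assumes Gini impurity as the split criterion, which may differ from what `DecisionTree.jl` uses internally; `gini` and `best_threshold` are hypothetical helpers and the data are toy values.

```julia
# Gini impurity of a label vector: 1 - Σ p_c^2 over the class proportions p_c.
function gini(labels)
    n = length(labels)
    n == 0 && return 0.0
    return 1.0 - sum((count(==(c), labels) / n)^2 for c in unique(labels))
end

# Try every midpoint between consecutive distinct feature values and return the
# threshold whose two children have the lowest size-weighted impurity.
function best_threshold(feature, labels)
    candidates = sort(unique(feature))
    best_t, best_score = NaN, Inf
    for (lo, hi) in zip(candidates[1:end-1], candidates[2:end])
        t = (lo + hi) / 2
        left, right = labels[feature .<= t], labels[feature .> t]
        score = (length(left) * gini(left) + length(right) * gini(right)) / length(labels)
        if score < best_score
            best_t, best_score = t, score
        end
    end
    return best_t, best_score
end

# Toy petal lengths (illustrative values) for two classes.
petal_length = [1.4, 1.3, 1.5, 4.7, 4.5, 5.1]
species      = [1, 1, 1, 2, 2, 2]
t, score = best_threshold(petal_length, species)   # the best split lies between 1.5 and 4.5
```

A full tree repeats this search over all features and then recurses into each child until a stopping rule such as `max_depth` is reached.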

## Method 5: Random Forest

Decision tree classifiers can be combined into a random forest, an ensemble of trees whose individual predictions are aggregated. Here we use 20 trees:

In [41]:
model_RF = RandomForestClassifier(n_trees=20)
DecisionTree.fit!(model_RF, X[train_idx, :], y[train_idx])

RandomForestClassifier
n_trees:             20
n_subfeatures:       -1
partial_sampling:    0.7
max_depth:           -1
min_samples_leaf:    1
min_samples_split:   2
min_purity_increase: 0.0
classes:             [1, 2, 3]
ensemble:            Ensemble of Decision Trees
Trees:      20
Avg Leaves: 6.8
Avg Depth:  4.9

In [42]:
predictions_RF = DecisionTree.predict(model_RF, test_query)
compute_accuracy(predictions_RF, y[test_idx])

0.9743589743589743

## Method 6: Nearest Neighbors

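Now we will use the k-nearest neighbours (k-NN) method for classifying. Despite the similar name, it is not k-means: k-means is an unsupervised clustering algorithm, whereas k-NN is a supervised method that labels a query point by a majority vote among its k closest training points.

We will start by building a k-d tree, a look-up structure for nearest-neighbour queries, from the training inputs only: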

In [43]:
kdtree = KDTree(X[train_idx, :]')  # NearestNeighbors expects one point per column, hence the transpose

KDTree{StaticArrays.SArray{Tuple{4},Float64,1,4},Euclidean,Float64}
  Number of points: 111
  Dimensions: 4
  Metric: Euclidean(0.0)
  Reordered: true

Now we will find the 5 nearest training samples for each test sample:

In [44]:
idxs, dists = knn(kdtree, test_query', 5, true);

And now we will use the above indices and distances for each test sample to make predictions:

In [69]:
KNN_classes = y[train_idx][hcat(idxs...)]   # 5 nearest neighbours x n_test_samples

5×39 Array{Int64,2}:
 1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  3  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
 1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  3  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3
 1  1  1  1  1  1  1  1  1  2  2  2  2  2  3  2  2  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  3  3  3  3  3
 1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  3  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  3  3  2  3  3  2
 1  1  1  1  1  1  1  1  1  2  2  2  3  2  3  2  2  2  2  2  2  2  2  2  3  3  3  3  3  3  3  3  3  3  3  2  3  3  3

In [70]:
possible_labels = map(i -> counter(KNN_classes[:, i]), 1:size(KNN_classes, 2))

39-element Array{Accumulator{Int64,Int64},1}:
 Accumulator(1 => 5)
 Accumulator(1 => 5)
 Accumulator(1 => 5)
 Accumulator(1 => 5)
 Accumulator(1 => 5)
 ⋮
 Accumulator(3 => 5)
 Accumulator(2 => 2, 3 => 3)
 Accumulator(3 => 5)
 Accumulator(3 => 5)
 Accumulator(2 => 1, 3 => 4)

In [71]:
predictions_KNN = map(i -> argmax(possible_labels[i]), 1:size(KNN_classes,2))
compute_accuracy(predictions_KNN, y[test_idx])

0.9743589743589743
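
The same procedure can be sketched without a k-d tree by computing all distances directly. This brute-force version is only meant to show the logic; `knn_predict` is a hypothetical helper and the data are a tiny made-up example with two well-separated clusters.

```julia
# Brute-force k-nearest-neighbour classifier. Rows of the matrices are samples.
function knn_predict(Xtrain, ytrain, Xtest, k)
    predictions = similar(ytrain, size(Xtest, 1))
    for i in 1:size(Xtest, 1)
        # Squared Euclidean distance from test sample i to every training sample.
        d2 = vec(sum(abs2, Xtrain .- Xtest[i:i, :], dims = 2))
        neighbours = ytrain[partialsortperm(d2, 1:k)]
        # Majority vote among the k neighbours; ties go to the smallest label.
        classes = sort(unique(neighbours))
        votes = [count(==(c), neighbours) for c in classes]
        predictions[i] = classes[argmax(votes)]
    end
    return predictions
end

# Illustrative data, not the iris measurements.
Xtrain = [0.0 0.0; 0.1 0.2; 0.2 0.1; 5.0 5.0; 5.1 4.9; 4.9 5.2]
ytrain = [1, 1, 1, 2, 2, 2]
Xtest  = [0.05 0.05; 5.05 5.05]

predictions = knn_predict(Xtrain, ytrain, Xtest, 3)
```

Brute force costs one distance computation per training sample for every query, which is the work a k-d tree is designed to reduce.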

## Method 7: Support Vector Machines

The last classical method we'll look at is Support Vector Machines. We will use the LIBSVM package:

In [67]:
model_svm = LIBSVM.svmtrain(X[train_idx, :]', y[train_idx])

LIBSVM.SVM{Int64}(SVC, LIBSVM.Kernel.RadialBasis, nothing, 4, 3, [1, 2, 3], Int32[1, 2, 3], Float64[], Int32[], LIBSVM.SupportVectors{Int64,Float64}(35, Int32[5, 15, 15], [1, 1, 1, 1, 1, 2, 2, 2, 2, 2  …  3, 3, 3, 3, 3, 3, 3, 3, 3, 3], [5.7 5.7 … 6.9 5.8; 4.4 3.8 … 3.1 2.7; 1.5 1.7 … 5.1 5.1; 0.4 0.3 … 2.3 1.9], Int32[11, 13, 18, 33, 36, 42, 43, 44, 46, 47  …  93, 96, 97, 99, 101, 103, 104, 107, 109, 110], LIBSVM.SVMNode[LIBSVM.SVMNode(1, 5.7), LIBSVM.SVMNode(1, 5.7), LIBSVM.SVMNode(1, 4.8), LIBSVM.SVMNode(1, 4.5), LIBSVM.SVMNode(1, 5.1), LIBSVM.SVMNode(1, 7.0), LIBSVM.SVMNode(1, 6.4), LIBSVM.SVMNode(1, 6.9), LIBSVM.SVMNode(1, 6.3), LIBSVM.SVMNode(1, 4.9)  …  LIBSVM.SVMNode(1, 6.3), LIBSVM.SVMNode(1, 6.2), LIBSVM.SVMNode(1, 6.1), LIBSVM.SVMNode(1, 7.2), LIBSVM.SVMNode(1, 7.9), LIBSVM.SVMNode(1, 6.3), LIBSVM.SVMNode(1, 6.1), LIBSVM.SVMNode(1, 6.0), LIBSVM.SVMNode(1, 6.9), LIBSVM.SVMNode(1, 5.8)]), 0.0, [0.4402103896003432 0.9389584892572648; 0.06574646141834749 0.0; … ; -0.1377420052095

In [68]:
predictions_SVM, decision_values = LIBSVM.svmpredict(model_svm, test_query')
compute_accuracy(predictions_SVM, y[test_idx])

0.9743589743589743
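
The model output above reports a `RadialBasis` kernel. As a minimal sketch of what such a kernel computes, the function below evaluates $k(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$. The name `rbf_kernel` is hypothetical, and the values of `γ` are assumptions for illustration, since the value used by the trained model is not shown in this notebook.

```julia
# Radial basis function kernel: k(x, z) = exp(-γ‖x - z‖²).
# γ is a free parameter here, chosen only for illustration.
rbf_kernel(x, z; γ = 1.0) = exp(-γ * sum(abs2, x .- z))

x = [5.1, 3.5, 1.4, 0.2]   # first iris sample from the table near the top
z = [4.9, 3.0, 1.4, 0.2]   # second iris sample

k_same = rbf_kernel(x, x)             # identical points give exactly 1.0
k_near = rbf_kernel(x, z; γ = 0.25)   # nearby points give a value just below 1
```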

## Putting Everything Together

Now we will compare the accuracy of all the methods above:

In [72]:
overall_accuracies = zeros(7)
methods = ["Lasso", "Ridge", "Elastic Net", "Decision Tree", "Random Forest", "KNN", "SVM"]

groundtruth = y[test_idx]

overall_accuracies[1] = compute_accuracy(predictions_lasso, groundtruth)
overall_accuracies[2] = compute_accuracy(predictions_ridge, groundtruth)
overall_accuracies[3] = compute_accuracy(predictions_elastic, groundtruth)
overall_accuracies[4] = compute_accuracy(predictions_DT, groundtruth)
overall_accuracies[5] = compute_accuracy(predictions_RF, groundtruth)
overall_accuracies[6] = compute_accuracy(predictions_KNN, groundtruth)
overall_accuracies[7] = compute_accuracy(predictions_SVM, groundtruth)

hcat(methods, overall_accuracies)

7×2 Array{Any,2}:
 "Lasso"          0.974359
 "Ridge"          1.0
 "Elastic Net"    0.974359
 "Decision Tree"  0.974359
 "Random Forest"  0.974359
 "KNN"            0.974359
 "SVM"            0.974359
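
Converting the accuracies back into counts makes the comparison more concrete. The sketch below copies the values from the table above and uses the test-set size of 39 reported earlier; the variable names are chosen for this example only.

```julia
# Accuracies copied from the table above, in the same order as `methods`.
accuracies = [0.974359, 1.0, 0.974359, 0.974359, 0.974359, 0.974359, 0.974359]
n_test = 39   # number of test samples reported after the train-test split

# Number of misclassified test samples implied by each accuracy.
n_errors = [round(Int, (1 - acc) * n_test) for acc in accuracies]
```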

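On this particular train-test split, ridge regression classifies all 39 test samples correctly, while each of the other methods reaches 0.974, which corresponds to a single misclassified sample. The methods therefore perform almost identically on this dataset, and a different random split could change the ranking.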