# AutoML

Testing out AutoML frameworks.

## Auto-Sklearn

### Installation

**<span style="color:red">Warning: auto-sklearn doesn't work on Windows</span>**

```bash
pip install auto-sklearn
```

### Benchmark against regular models 

In [2]:
from sklearn.datasets import load_iris

Loading the iris dataset

In [3]:
data, labels = load_iris(return_X_y=True)

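Splitting the data into training (80%) and test (20%) sets.
The split is positional, with no shuffling: `load_iris` returns the samples ordered by class, so the 30 test samples all belong to the last class.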

In [12]:
split_idx = int(data.shape[0] * 0.8)

train_x = data[:split_idx]
train_y = labels[:split_idx]
test_x = data[split_idx:]
test_y = labels[split_idx:]
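
A shuffled, stratified split is one way to make sure every class appears in both sets. The sketch below is only an illustration of that idea, not the split used for the results in this notebook. The `strat_*` variable names and `random_state=0` are assumptions chosen for the example.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data, labels = load_iris(return_X_y=True)

# Positional split, as in the cell above: count the classes that land in the test set
split_idx = int(data.shape[0] * 0.8)
positional_test_classes = Counter(labels[split_idx:].tolist())

# Shuffled split that preserves the class proportions in both sets
strat_train_x, strat_test_x, strat_train_y, strat_test_y = train_test_split(
    data, labels, test_size=0.2, stratify=labels, random_state=0
)
stratified_test_classes = Counter(strat_test_y.tolist())

print("positional test classes:", dict(positional_test_classes))
print("stratified test classes:", dict(stratified_test_classes))
```

With the positional split, the test set holds a single class. With the stratified split, each of the three classes contributes the same number of test samples.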

Set up three classifiers as the baseline models for the benchmark

In [10]:
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()
mlp = MLPClassifier(max_iter=2000)
svm = SVC()

Fitting the base models and timing the combined fit

In [11]:
import time

t0 = time.time()
knn.fit(train_x, train_y)
mlp.fit(train_x, train_y)
svm.fit(train_x, train_y)
t1 = time.time()
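
The cell above times all three fits together. To see how each model contributes, the fits could be timed one by one. This is a minimal sketch: `timed_fit` is a hypothetical helper that is not part of the notebook, and it fits on the full dataset only to stay self-contained.

```python
import time

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

data, labels = load_iris(return_X_y=True)


def timed_fit(model, x, y):
    """Fit `model` in place and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    model.fit(x, y)
    return time.perf_counter() - start


# Hypothetical selection of models; any estimator with a fit() method works
models = {"knn": KNeighborsClassifier(), "svm": SVC()}
fit_times = {name: timed_fit(model, data, labels) for name, model in models.items()}

for name, seconds in fit_times.items():
    print(f"{name}: {seconds * 1000:.1f} ms")
```

`time.perf_counter()` is used here instead of `time.time()` because it is a monotonic clock intended for measuring short durations.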

Checking the accuracy of the base models

In [25]:
from sklearn.metrics import accuracy_score

# Predictions on the test and training sets for each base model
knn_predict = knn.predict(test_x)
train_knn_predict = knn.predict(train_x)

svm_predict = svm.predict(test_x)
train_svm_predict = svm.predict(train_x)

mlp_predict = mlp.predict(test_x)
train_mlp_predict = mlp.predict(train_x)

# Accuracy on both splits, to compare how well each model fits and generalizes
knn_accuracy = accuracy_score(test_y, knn_predict)
train_knn_accuracy = accuracy_score(train_y, train_knn_predict)

svm_accuracy = accuracy_score(test_y, svm_predict)
train_svm_accuracy = accuracy_score(train_y, train_svm_predict)

mlp_accuracy = accuracy_score(test_y, mlp_predict)
train_mlp_accuracy = accuracy_score(train_y, train_mlp_predict)

print(f"Train accuracy:\n\tsvm {train_svm_accuracy:.4f},\n\tknn {train_knn_accuracy:.4f},\n\tmlp {train_mlp_accuracy:.4f}")
print(f"Test accuracy:\n\tsvm {svm_accuracy:.4f},\n\tknn {knn_accuracy:.4f},\n\tmlp {mlp_accuracy:.4f}")
print(f"time to fit: {int((t1-t0)*1000)} ms (all base models combined)")

Train accuracy:
	svm 0.9667,
	knn 0.9833,
	mlp 0.9833
Test accuracy:
	svm 0.7000,
	knn 0.8000,
	mlp 0.7667
time to fit: 311 ms (all base models combined)
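
A single train/test split gives a noisy estimate on a dataset this small. As a rough cross-check, the baselines could also be scored with stratified k-fold cross-validation. The sketch below assumes 5 folds and `random_state=0`, and leaves out the MLP to keep the run short. It is not the procedure that produced the numbers above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

data, labels = load_iris(return_X_y=True)

# Shuffled folds that keep the class proportions equal in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

cv_scores = {}
for name, model in {"svm": SVC(), "knn": KNeighborsClassifier()}.items():
    # One accuracy value per fold
    cv_scores[name] = cross_val_score(model, data, labels, cv=cv, scoring="accuracy")
    print(f"{name}: mean {cv_scores[name].mean():.4f}, std {cv_scores[name].std():.4f}")
```

The mean over the folds is less sensitive to which samples happen to land in the test set than a single accuracy score.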


Set up the AutoML classifier from auto-sklearn

In [3]:
from autosklearn.classification import AutoSklearnClassifier as ASC

# Limit the total search time to 300 seconds (passed as a constructor argument)
classifier = ASC(time_left_for_this_task=300)

t0 = time.time()
classifier.fit(train_x, train_y)
t1 = time.time()

autosk_predict = classifier.predict(test_x)
train_autosk_predict = classifier.predict(train_x)

autosk_accuracy = accuracy_score(test_y, autosk_predict)
train_autosk_accuracy = accuracy_score(train_y, train_autosk_predict)

print(f"train accuracy {train_autosk_accuracy:.4f}")
print(f"test accuracy {autosk_accuracy:.4f}")
print(f"time to fit: {int(t1-t0)} seconds")

train accuracy 0.9970
test accuracy 0.9890
time to fit: 300 seconds
