# AutoML
AutoML automates the search for a machine learning algorithm, together with a set of hyperparameters, that works well for your problem. In this tutorial, we show how to run AutoML with the auto-sklearn library on a simple classification problem and then use the best classifier found to make some predictions.

## Note:
You can also use the auto-sklearn library for regression problems.

# Import libraries

In [68]:
def install_packages():
  # Notebook shell escape: installs auto-sklearn into the current environment
  !pip install auto-sklearn


import numpy as np
import sklearn.datasets
import sklearn.metrics
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

try:
  import autosklearn.classification
except ImportError:
  # Install only when auto-sklearn is missing, then retry the import
  install_packages()
  import autosklearn.classification


# Create a classification problem
This is a simple classification problem. At this stage, you can define your own classification problem. The output of this stage should be your training and testing data.

In [69]:
X, y = make_classification(n_samples=150, n_classes=3,
                            n_features=5, n_informative=3, n_redundant=0,
                            random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y)
print('Training input data shape: ', x_train.shape)
print('Training output data shape: ', y_train.shape)
print('Testing input data shape: ', x_test.shape)
print('Testing output data shape: ', y_test.shape)

Training input data shape:  (112, 5)
Training output data shape:  (112,)
Testing input data shape:  (38, 5)
Testing output data shape:  (38,)
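
The 112/38 split comes from the default `test_size` of 0.25 in `train_test_split`. Here is a minimal sketch of that arithmetic in plain Python, assuming scikit-learn rounds the test count up and uses the remaining samples for training. `split_sizes` is a hypothetical helper written for this illustration, not part of scikit-learn.

```python
import math


def split_sizes(n_samples, test_fraction=0.25):
    """Return (n_train, n_test) for a fractional test size."""
    # Assumption: the test count is rounded up to the next integer
    n_test = math.ceil(test_fraction * n_samples)
    # Whatever is left over is used for training
    n_train = n_samples - n_test
    return n_train, n_test


print(split_sizes(150))  # (112, 38)
```

If you need the same split on every run, `train_test_split` also accepts a `random_state` argument, and `stratify=y` keeps the class proportions similar in both parts.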


# Define AutoML object with time limits
Choosing the best algorithm and the best set of hyperparameters is an optimisation problem. We may not reach the optimal algorithm and hyperparameters for our problem, but we will usually get a good enough solution. You therefore need to specify a time limit for the whole optimisation and a time limit for each individual hyperparameter-tuning run.
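
To make the idea of a time-limited search concrete, here is a minimal sketch in plain Python. It uses simple random search on a toy objective, which is a deliberate simplification: auto-sklearn uses a more sophisticated optimiser internally. The names `budgeted_random_search`, `objective` and `sample_config` are hypothetical and exist only for this illustration.

```python
import random
import time


def budgeted_random_search(objective, sample_config, time_budget_s,
                           clock=time.monotonic, seed=0):
    """Try random configurations until the time budget is used up.

    Returns the best configuration seen, its score, and the number of runs.
    """
    rng = random.Random(seed)
    deadline = clock() + time_budget_s
    best_config, best_score, n_runs = None, float("-inf"), 0
    while clock() < deadline:
        config = sample_config(rng)   # propose a candidate
        score = objective(config)     # evaluate it (higher is better)
        n_runs += 1
        if score > best_score:        # keep the best candidate so far
            best_config, best_score = config, score
    return best_config, best_score, n_runs


# Toy problem: the (unknown to the search) optimum is at x = 0.3
best_x, best_score, n_runs = budgeted_random_search(
    objective=lambda x: -(x - 0.3) ** 2,
    sample_config=lambda rng: rng.uniform(0.0, 1.0),
    time_budget_s=0.05,
)
print(f"best x = {best_x:.3f} after {n_runs} runs")
```

A larger budget allows more candidates to be tried, so the result tends to get closer to the optimum, but the search never guarantees that it has found it.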

If you have a regression problem, then you can use `autosklearn.regression.AutoSklearnRegressor`. [Here is an example](https://automl.github.io/auto-sklearn/master/examples/20_basic/example_regression.html#sphx-glr-examples-20-basic-example-regression-py).
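
For orientation, a regression run might look like the following sketch. It mirrors the classification steps of this tutorial with the same time limits; the synthetic dataset, the `r2_score` metric and the wrapper function `run_regression_demo` are choices made for this illustration, not taken from the linked example.

```python
def run_regression_demo(total_seconds=60, per_run_seconds=30):
    """Fit AutoSklearnRegressor on a toy problem and return the test R^2."""
    # Imports live inside the function so that this cell can be defined
    # even when auto-sklearn has not been installed yet.
    import autosklearn.regression
    from sklearn.datasets import make_regression
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    # Small synthetic regression problem (illustrative settings)
    X, y = make_regression(n_samples=150, n_features=5, n_informative=3,
                           noise=0.1, random_state=0)
    x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

    automl = autosklearn.regression.AutoSklearnRegressor(
        time_left_for_this_task=total_seconds,  # total budget, in seconds
        per_run_time_limit=per_run_seconds,     # limit per model fit, in seconds
    )
    automl.fit(x_train, y_train, dataset_name='sklearn_regression_dataset')
    return r2_score(y_test, automl.predict(x_test))
```

Calling `run_regression_demo()` runs the search for about a minute and returns the R² score on the held-out test data.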

In [70]:
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=60,  # total time budget, in seconds
    per_run_time_limit=30,       # time limit for a single model fit, in seconds
)

# Train the AutoML object on the training data

In [71]:
automl.fit(x_train, y_train, dataset_name='sklearn_classification_dataset')



AutoSklearnClassifier(per_run_time_limit=30, time_left_for_this_task=60)

# Check the classifiers that were selected

In [72]:
for model in automl.show_models().values():
  print(model['sklearn_classifier'])
  print()

ExtraTreesClassifier(criterion='entropy', max_features=15, min_samples_leaf=2,
                     min_samples_split=20, n_estimators=512, n_jobs=1,
                     random_state=1, warm_start=True)

HistGradientBoostingClassifier(early_stopping=False,
                               l2_regularization=4.821686883442146e-05,
                               learning_rate=0.10161621495242192, max_iter=512,
                               max_leaf_nodes=535, min_samples_leaf=10,
                               n_iter_no_change=0, random_state=1,
                               validation_fraction=None, warm_start=True)

HistGradientBoostingClassifier(early_stopping=False,
                               l2_regularization=1.0647401999412075e-10,
                               learning_rate=0.08291320147381159, max_iter=512,
                               max_leaf_nodes=39, n_iter_no_change=0,
                               random_state=1, validation_fraction=None,
                           

# Print the best classifier and show the validation accuracy
You can check the other classifiers by using `automl.show_models()`.

In [73]:
list(automl.show_models().values())[0]['sklearn_classifier']

ExtraTreesClassifier(criterion='entropy', max_features=15, min_samples_leaf=2,
                     min_samples_split=20, n_estimators=512, n_jobs=1,
                     random_state=1, warm_start=True)
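
Looking a model up by key is less fragile than relying on the position of an entry. The sketch below works on a hand-written dictionary shaped like the return value of `show_models()`. The values are made up for illustration, and the `rank` and `ensemble_weight` keys are assumptions about the library's output that do not appear elsewhere in this tutorial.

```python
# Hypothetical stand-in for automl.show_models(); the values are illustrative only
models = {
    7: {"model_id": 7, "rank": 2, "ensemble_weight": 0.25,
        "sklearn_classifier": "HistGradientBoostingClassifier(...)"},
    3: {"model_id": 3, "rank": 1, "ensemble_weight": 0.5,
        "sklearn_classifier": "ExtraTreesClassifier(...)"},
}

# Pick the entry with the lowest rank instead of trusting the dictionary order
best = min(models.values(), key=lambda m: m["rank"])
print(best["model_id"], best["sklearn_classifier"])
```

With the real object, replacing `models` by `automl.show_models()` should work the same way, provided the returned entries contain a `rank` key.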

In [74]:
print(automl.sprint_statistics())

auto-sklearn results:
  Dataset name: sklearn_classification_dataset
  Metric: accuracy
  Best validation score: 0.810811
  Number of target algorithm runs: 24
  Number of successful target algorithm runs: 23
  Number of crashed target algorithm runs: 0
  Number of target algorithms that exceeded the time limit: 1
  Number of target algorithms that exceeded the memory limit: 0
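
Because accuracy is a ratio of counts, the validation score can be traced back to possible numbers of correct predictions. Here is a small sketch, assuming the score equals `correct / n_validation` and that the validation set is no larger than the 112 training samples. `matching_fractions` is a hypothetical helper written for this illustration.

```python
def matching_fractions(score, max_n, digits=6):
    """List (correct, total) pairs whose ratio rounds to the given score."""
    target = round(score, digits)
    return [
        (correct, total)
        for total in range(1, max_n + 1)
        for correct in range(total + 1)
        if round(correct / total, digits) == target
    ]


print(matching_fractions(0.810811, max_n=112))
# [(30, 37), (60, 74), (90, 111)]
```

The smallest candidate, 30 correct out of 37, would be consistent with holding out about one third of the training data for validation. The statistics do not state the validation set size, so treat this as an inference rather than a reported fact.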



# Predict on the test set using the best classifier found

In [75]:
predictions = automl.predict(x_test)

accuracy = sklearn.metrics.accuracy_score(y_test, predictions)
print(f'Accuracy of the best classifier: {accuracy:.2f}')

Accuracy of the best classifier: 0.89
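
Note that `accuracy_score` returns a fraction between 0 and 1, not a percentage. The sketch below recomputes accuracy by hand on a few made-up labels and shows both ways of formatting it. The `accuracy` function and the label lists are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels: 7 of the 9 predictions are correct
y_true = [0, 1, 2, 2, 1, 0, 1, 2, 0]
y_pred = [0, 1, 2, 1, 1, 0, 1, 2, 2]

acc = accuracy(y_true, y_pred)
print(f"as a fraction:   {acc:.2f}")  # 0.78
print(f"as a percentage: {acc:.1%}")  # 77.8%
```

The `%` format specifier multiplies by 100 before printing, so it should only be applied to the raw fraction.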


# References
* https://automl.github.io/auto-sklearn/master/examples/index.html