# AutoML: Automated Machine Learning

Parts of the Machine Learning process can be automated. Several online services and several libraries do this. Here we will look at Microsoft's FLAML library:
- https://microsoft.github.io/FLAML/docs/Getting-Started/

In [1]:
!pip install "flaml[automl]"



As we saw in the previous unit, the Machine Learning process culminates in training several black boxes with different algorithms and tuning the parameters of those algorithms (their hyperparameters) so that the trained model performs as well as possible. Now we will see how to use the FLAML library to automate this process:
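Before using FLAML, it helps to see the manual loop it automates. The sketch below is illustrative only. It uses scikit-learn on synthetic data from `make_classification` (not the diabetes dataset), and the two algorithms, their hyperparameter values and the 3-fold cross-validation are arbitrary choices, not what FLAML does internally.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data (stand-in for a real dataset)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Candidate algorithms ("black boxes"), each with a few hyperparameter values
candidates = {
    "logistic_regression": [
        LogisticRegression(C=c, max_iter=1000) for c in (0.1, 1.0, 10.0)
    ],
    "random_forest": [
        RandomForestClassifier(n_estimators=n, random_state=0) for n in (20, 100)
    ],
}

best_name, best_model, best_score = None, None, -1.0
for name, models in candidates.items():
    for model in models:
        # Mean accuracy over 3 cross-validation folds
        score = cross_val_score(model, X, y, cv=3).mean()
        if score > best_score:
            best_name, best_model, best_score = name, model, score

# cross_val_score fits clones, so refit the winner on all the data
best_model.fit(X, y)
print(best_name, best_model, round(best_score, 3))
```

FLAML automates this kind of search, but instead of exhaustively trying every combination it works within a fixed `time_budget`.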

In [18]:
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

In [19]:
df_train = pd.read_csv('https://raw.githubusercontent.com/amiune/freecodingtour/main/cursos/espanol/datascience/data/diabetes/diabetes_train_procesado.csv')
df_test = pd.read_csv('https://raw.githubusercontent.com/amiune/freecodingtour/main/cursos/espanol/datascience/data/diabetes/diabetes_test_procesado.csv')

In [20]:
X_train = df_train.loc[:, df_train.columns != "diabetes"]
y_train = df_train.loc[:, "diabetes"]

X_train.head()

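|   | fuma | alcohol | gimnasia | edad | altura | peso | presion1 | presion2 | colesterol_alto | colesterol_bajo | colesterol_medio | glucosa_alta | glucosa_baja | glucosa_media | sexo_f | sexo_m |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0.11991 | 0.567395 | 0.12628 | -0.175302 | -0.086953 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | 0 | 0 | 1 | -0.618194 | 0.079476 | -0.63953 | -0.054742 | -0.076428 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 2 | 0 | 0 | 1 | 1.030711 | 0.689375 | -1.266102 | -0.054742 | -0.086953 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 3 | 0 | 0 | 0 | 0.353273 | 0.567395 | -0.848387 | -0.054742 | -0.086953 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 4 | 0 | 0 | 0 | -1.298867 | 0.201456 | -0.500292 | -0.054742 | -0.086953 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |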
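The `loc` selection above keeps every column except `diabetes` as features and `diabetes` as the target. An equivalent and commonly used idiom is `DataFrame.drop`. The sketch uses a tiny stand-in DataFrame: the `fuma` and `edad` values are copied from the first rows shown above, while the `diabetes` labels are hypothetical.

```python
import pandas as pd

# Tiny stand-in DataFrame; the "diabetes" labels are hypothetical
df = pd.DataFrame({
    "fuma": [0, 0, 0],
    "edad": [0.11991, -0.618194, 1.030711],
    "diabetes": [0, 1, 1],
})

# Same result as df.loc[:, df.columns != "diabetes"]
X = df.drop(columns="diabetes")
y = df["diabetes"]
print(X.columns.tolist())  # ['fuma', 'edad']
```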


In [21]:
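from flaml import AutoML

automl = AutoML()
# Try several learners and hyperparameters for at most 10 seconds
# (time_budget is in seconds) and keep the best model found
automl.fit(X_train, y_train, task="classification", time_budget=10)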

[flaml.automl.logger: 02-14 16:27:28] {1679} INFO - task = classification
[flaml.automl.logger: 02-14 16:27:28] {1690} INFO - Evaluation method: holdout
[flaml.automl.logger: 02-14 16:27:28] {1788} INFO - Minimizing error metric: 1-roc_auc
[flaml.automl.logger: 02-14 16:27:28] {1900} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'xgboost', 'extra_tree', 'xgb_limitdepth', 'lrl1']
[flaml.automl.logger: 02-14 16:27:28] {2218} INFO - iteration 0, current learner lgbm
[flaml.automl.logger: 02-14 16:27:28] {2344} INFO - Estimated sufficient time budget=858s. Estimated necessary time budget=20s.
[flaml.automl.logger: 02-14 16:27:28] {2391} INFO -  at 0.1s,	estimator lgbm's best error=0.0939,	best estimator lgbm's best error=0.0939
[flaml.automl.logger: 02-14 16:27:28] {2218} INFO - iteration 1, current learner lgbm
[flaml.automl.logger: 02-14 16:27:28] {2391} INFO -  at 0.2s,	estimator lgbm's best error=0.0939,	best estimator lgbm's best error=0.0939
[flaml.automl.logger: 02-14 16:
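The log reports that FLAML scores each candidate on a holdout set (`Evaluation method: holdout`) and minimizes `1-roc_auc`. The sketch below illustrates that idea with scikit-learn on synthetic data; the 10% holdout size and the random forest are assumptions for illustration, not FLAML's internal settings. Note also that the log estimates a necessary time budget of 20s, larger than the `time_budget=10` used here, so a larger budget might find a better model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (stand-in for X_train, y_train)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Holdout: reserve part of the training data to score each candidate
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X, y, test_size=0.1, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_fit, y_fit)

# ROC AUC uses the predicted probability of the positive class (column 1)
proba = model.predict_proba(X_hold)[:, 1]
error = 1 - roc_auc_score(y_hold, proba)
print(f"1 - roc_auc on holdout: {error:.4f}")
```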

In [22]:
dir(automl)

['__class__',
 '__del__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__version__',
 '__weakref__',
 '_active_estimators',
 '_auto_augment',
 '_best_estimator',
 '_best_iteration',
 '_check_feature_names',
 '_check_n_features',
 '_config_history',
 '_decide_eval_method',
 '_df',
 '_early_stop',
 '_eci',
 '_ensemble',
 '_estimator_index',
 '_estimator_type',
 '_feature_names_in_',
 '_force_cancel',
 '_fullsize_reached',
 '_get_param_names',
 '_get_tags',
 '_hpo_method',
 '_iter_per_learner',
 '_iter_per_learner_fullsize',
 '_label_transformer',
 '_learner_selector',
 '_log_trial',
 '_log_type',
 '_max_iter',
 '_max_iter_per_learner',
 '_mem_thres',
 '_metric_co
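`dir(automl)` lists every attribute, including many private ones starting with `_`. A small helper (hypothetical, not part of FLAML) keeps only the public names. The `Example` class stands in for a fitted `AutoML` object; in the notebook you would call `public_attributes(automl)` instead.

```python
def public_attributes(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class Example:
    """Hypothetical stand-in for a fitted AutoML object."""
    best_estimator = "lgbm"
    _private = 1

    def predict(self, X):
        return X


print(public_attributes(Example()))  # ['best_estimator', 'predict']
```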

In [23]:
automl.best_estimator

'lgbm'

In [24]:
automl.best_config

{'n_estimators': 183,
 'num_leaves': 17,
 'min_child_samples': 2,
 'learning_rate': 0.28491500484442356,
 'log_max_bin': 10,
 'colsample_bytree': 0.8144610352665659,
 'reg_alpha': 0.0009765625,
 'reg_lambda': 0.04528736269543623}
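`best_config` holds the hyperparameters of the best LightGBM model. Most keys are LightGBM parameter names, but `log_max_bin` is not. The helper below is a hypothetical sketch assuming that FLAML encodes LightGBM's `max_bin` as `max_bin = 2**log_max_bin - 1`; check the FLAML source for your version before relying on it.

```python
# Values copied from the automl.best_config output above
best_config = {
    "n_estimators": 183,
    "num_leaves": 17,
    "min_child_samples": 2,
    "learning_rate": 0.28491500484442356,
    "log_max_bin": 10,
    "colsample_bytree": 0.8144610352665659,
    "reg_alpha": 0.0009765625,
    "reg_lambda": 0.04528736269543623,
}


def to_lightgbm_params(config):
    """Hypothetical helper: replace log_max_bin with max_bin = 2**log_max_bin - 1."""
    params = dict(config)  # copy so the original config is not modified
    if "log_max_bin" in params:
        params["max_bin"] = (1 << params.pop("log_max_bin")) - 1
    return params


params = to_lightgbm_params(best_config)
print(params["max_bin"])  # 1023
```

With `lightgbm` installed, `params` could then be passed as `LGBMClassifier(**params)`. FLAML can also expose the fitted underlying model directly (recent versions document `automl.model.estimator`), which avoids this conversion.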

In [25]:
automl.best_loss

0.00021278220517995106
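Since the log states that FLAML minimizes `1-roc_auc`, `best_loss` can be read back as a ROC AUC on FLAML's internal holdout set. This is neither the test set nor accuracy, so it is not directly comparable to the accuracies computed below.

```python
best_loss = 0.00021278220517995106  # value of automl.best_loss above

# The error metric is 1 - roc_auc, so the holdout ROC AUC is:
holdout_roc_auc = 1 - best_loss
print(round(holdout_roc_auc, 6))  # 0.999787
```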

In [26]:
X_test = df_test.loc[:, df_test.columns != "diabetes"]
y_test = df_test.loc[:, "diabetes"]

In [27]:
from sklearn.metrics import accuracy_score

def calcular_accuracy_train_val(clf, X_train, y_train, X_val, y_val):
    # clf is already fitted (automl.fit was called above), so we only predict
    y_train_pred = clf.predict(X_train)
    print("Training accuracy:", accuracy_score(y_train, y_train_pred))
    y_val_pred = clf.predict(X_val)
    print("Validation accuracy:", accuracy_score(y_val, y_val_pred))
    return clf

In [28]:
clf = calcular_accuracy_train_val(automl, X_train, y_train, X_test, y_test)

Training accuracy: 0.99935
Validation accuracy: 0.9922
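FLAML selected the model by ROC AUC, while the function above reports accuracy (the fraction of correct predictions); here the test set is passed in the role of the validation set. The sketch below reports both metrics for an already-fitted binary classifier. It is demonstrated on a scikit-learn `LogisticRegression` trained on synthetic data; calling `evaluate(automl, X_test, y_test)` in the notebook assumes FLAML's `AutoML.predict_proba` returns one probability column per class, with the positive class in column 1.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def evaluate(clf, X, y):
    """Return (accuracy, ROC AUC) for an already-fitted binary classifier."""
    # Accuracy: fraction of predicted labels equal to the true labels
    acc = accuracy_score(y, clf.predict(X))
    # ROC AUC: needs the probability of the positive class (column 1)
    auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])
    return acc, auc


# Synthetic data and a simple model as stand-ins for the notebook's objects
X, y = make_classification(n_samples=600, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

results = {}
for split, (X_s, y_s) in {"train": (X_tr, y_tr), "test": (X_te, y_te)}.items():
    results[split] = evaluate(model, X_s, y_s)
    acc, auc = results[split]
    print(f"{split}: accuracy={acc:.4f}  roc_auc={auc:.4f}")
```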
