# Lazy Predict

Lazy Predict is a library that lets you specify and fit many models quickly with very few lines of code. Most of the models it uses are implemented in Scikit-Learn; the outputs below also include XGBoost and LightGBM models.

In this notebook we apply many different models to the same data set and compare the results. This gives a clear view of the wide variety of models available to the analyst when analysing data.

We consider both regression models and classification models.




# Install and import the library


In [4]:
# Install the Lazy Predict library

%pip install lazypredict

Collecting lazypredict
  Downloading lazypredict-0.2.12-py2.py3-none-any.whl (12 kB)
Installing collected packages: lazypredict
Successfully installed lazypredict-0.2.12


In [5]:
# Import the required libraries

from lazypredict.Supervised import LazyClassifier, LazyRegressor
from sklearn.model_selection import train_test_split
from sklearn import datasets

# Classification

We use a data set of patients to fit a model that can diagnose whether or not they have cancer from measurements of a set of variables.

We do not go into the meaning of the variables here, because the focus is on specifying, fitting and evaluating many models on the same data set.

In [None]:
# load the data from sklearn and prepare the training and test sets
datos = datasets.load_breast_cancer()
X, y = datos.data, datos.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=42)

# instantiate the classification models
clasifica = LazyClassifier(predictions=True)

# fit, evaluate and predict with many different models
modelos, predicciones = clasifica.fit(X_train, X_test, y_train, y_test)
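
Conceptually, fitting many models to the same data set amounts to a loop over estimators that share one train/test split. The block below is only a sketch of that idea using plain Scikit-Learn, not Lazy Predict's implementation; the three estimators, the `StandardScaler` step and the names `candidates` and `scores` are choices made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Same data set and split parameters as in the notebook
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical shortlist of models (an assumption for this sketch)
candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "GaussianNB": GaussianNB(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
}

# Fit every candidate on the same training set and score it on the same test set
scores = {}
for name, estimator in candidates.items():
    pipeline = make_pipeline(StandardScaler(), estimator)
    pipeline.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, pipeline.predict(X_test))

# Print the models from best to worst test accuracy
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2f}")
```

Keeping the split fixed is what makes the scores comparable across models.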

In [7]:
# show the score obtained by each model on several metrics
modelos

Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
BernoulliNB,0.98,0.98,0.98,0.98,0.04
PassiveAggressiveClassifier,0.98,0.98,0.98,0.98,0.02
SVC,0.98,0.98,0.98,0.98,0.04
Perceptron,0.97,0.97,0.97,0.97,0.02
AdaBoostClassifier,0.97,0.97,0.97,0.97,0.3
LogisticRegression,0.97,0.97,0.97,0.97,0.07
SGDClassifier,0.96,0.97,0.97,0.97,0.02
ExtraTreeClassifier,0.96,0.97,0.97,0.97,0.02
CalibratedClassifierCV,0.97,0.97,0.97,0.97,0.09
RandomForestClassifier,0.96,0.96,0.96,0.96,0.3
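
The four metric columns can be reproduced for a single model with `sklearn.metrics`. The sketch below does so for a logistic regression; it assumes that ROC AUC is computed from the predicted labels rather than from probabilities, and that F1 uses weighted averaging. Both are assumptions about how the table was produced. The first is at least consistent with the table, where ROC AUC equals Balanced Accuracy in every row, which is what hard binary labels always give.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    f1_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# One model only; the scaling step is an assumption of this sketch
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

metrics = {
    "Accuracy": accuracy_score(y_test, y_pred),
    "Balanced Accuracy": balanced_accuracy_score(y_test, y_pred),
    # Computed from hard labels (assumption), not from predicted probabilities
    "ROC AUC": roc_auc_score(y_test, y_pred),
    # Weighted averaging is an assumption
    "F1 Score": f1_score(y_test, y_pred, average="weighted"),
}

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```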


In [9]:
# show the predictions of each model
predicciones

,AdaBoostClassifier,BaggingClassifier,BernoulliNB,CalibratedClassifierCV,DecisionTreeClassifier,DummyClassifier,ExtraTreeClassifier,ExtraTreesClassifier,GaussianNB,KNeighborsClassifier,...,PassiveAggressiveClassifier,Perceptron,QuadraticDiscriminantAnalysis,RandomForestClassifier,RidgeClassifier,RidgeClassifierCV,SGDClassifier,SVC,XGBClassifier,LGBMClassifier
0,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
1,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
4,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
109,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
110,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
111,1,1,1,1,1,1,1,1,1,1,...,1,1,1,1,1,1,1,1,1,1
112,1,1,1,1,1,1,0,1,1,1,...,0,1,1,1,1,1,0,1,1,1
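
A table with one column of predictions per model makes it easy to see where the models agree and where they do not (row 112 above is one where they disagree). The sketch below uses a small made-up frame, `toy_predictions`, with hypothetical model names, not the real `predicciones` output, to show how unanimous rows and a majority vote could be obtained with pandas.

```python
import pandas as pd

# Toy data invented for this sketch: one column per (hypothetical) model
toy_predictions = pd.DataFrame(
    {
        "ModelA": [1, 0, 0, 1, 1],
        "ModelB": [1, 0, 1, 1, 0],
        "ModelC": [1, 0, 0, 1, 0],
    }
)

# True where every model predicts the same class for that row
unanimous = toy_predictions.nunique(axis=1) == 1

# Most frequent class per row; with three binary models there is never a tie
majority_vote = toy_predictions.mode(axis=1)[0].astype(int)

print(unanimous.tolist())      # [True, True, False, True, False]
print(majority_vote.tolist())  # [1, 0, 0, 1, 0]
```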


# Regression

We now apply many regression models to the same data set.

By "regression models" we mean models that predict a continuous variable. Linear regression is one of them, but there are many other kinds of regression model.
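
As a small illustration of that point, the sketch below fits a linear regression and a non-linear regressor (a random forest) to the same data and compares their test error. The choice of these two estimators, their parameters and the `rmse` dictionary are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Two regression models of different kinds: one linear, one tree-based
regressors = {
    "LinearRegression": LinearRegression(),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Both predict a continuous target; compare them by root mean squared error
rmse = {}
for name, reg in regressors.items():
    reg.fit(X_train, y_train)
    y_pred = reg.predict(X_test)
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, y_pred)))

for name, value in rmse.items():
    print(f"{name}: RMSE = {value:.2f}")
```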

In [10]:
# load the data from sklearn and prepare the training and test sets
from sklearn.datasets import load_diabetes
X, y = load_diabetes(return_X_y=True, as_frame=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=42)

# instantiate the regression models
reg = LazyRegressor(predictions=True)

# fit, evaluate and get predictions
modelos, prediciones = reg.fit(X_train, X_test, y_train, y_test)

100%|██████████| 42/42 [00:02<00:00, 15.22it/s]

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000068 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 594
[LightGBM] [Info] Number of data points in the train set: 353, number of used features: 10
[LightGBM] [Info] Start training from score 153.736544





In [11]:
modelos

Model,Adjusted R-Squared,R-Squared,RMSE,Time Taken
LarsCV,0.4,0.47,52.87,0.03
LassoCV,0.4,0.47,52.92,0.1
LassoLarsCV,0.4,0.47,52.92,0.02
OrthogonalMatchingPursuitCV,0.4,0.47,53.02,0.02
ExtraTreesRegressor,0.4,0.47,53.07,0.22
Lasso,0.4,0.47,53.15,0.02
LassoLars,0.4,0.47,53.15,0.01
PoissonRegressor,0.39,0.46,53.43,0.04
ElasticNetCV,0.39,0.46,53.44,0.08
BayesianRidge,0.39,0.46,53.59,0.02
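
The gap between the two R-squared columns follows from the usual adjusted R-squared formula, 1 - (1 - R²)(n - 1)/(n - p - 1). The sketch below assumes the adjustment is computed on the test set: the diabetes data has 442 rows and the log above reports 353 training points, which leaves n = 89 test rows, with p = 10 features. The helper `adjusted_r2` is written for this example.

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R-squared for a model with n_features predictors."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# 442 rows in total minus the 353 training rows reported in the log
n_test = 442 - 353

# R-Squared of 0.47 from the table, 10 features
print(round(adjusted_r2(0.47, n_test, 10), 2))  # 0.4
```

Under that assumption the result matches the 0.4 shown for the top rows, and an R-squared of 0.46 gives 0.39, as in the lower rows.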


In [12]:
prediciones

,AdaBoostRegressor,BaggingRegressor,BayesianRidge,DecisionTreeRegressor,DummyRegressor,ElasticNet,ElasticNetCV,ExtraTreeRegressor,ExtraTreesRegressor,GammaRegressor,...,RANSACRegressor,RandomForestRegressor,Ridge,RidgeCV,SGDRegressor,SVR,TransformedTargetRegressor,TweedieRegressor,XGBRegressor,LGBMRegressor
0,159.68,169.30,141.59,206.00,153.74,152.10,143.51,128.00,154.82,151.87,...,170.12,146.21,139.86,139.86,141.22,142.35,139.55,155.86,183.56,156.72
1,186.21,165.70,180.01,99.00,153.74,169.13,178.30,25.00,163.97,156.17,...,204.20,171.58,179.96,179.96,182.30,142.77,179.52,165.01,196.34,168.30
2,159.21,143.70,140.57,170.00,153.74,150.92,142.89,134.00,161.24,146.34,...,64.20,150.62,135.72,135.72,140.79,141.80,134.04,153.83,140.24,158.74
3,246.39,257.40,291.52,268.00,153.74,265.90,287.30,310.00,257.40,284.40,...,365.32,255.12,292.12,292.12,294.17,152.16,291.42,253.32,251.64,285.51
4,130.09,143.40,122.93,183.00,153.74,135.93,124.89,51.00,100.97,134.83,...,163.89,107.19,123.19,123.19,121.09,136.81,123.79,141.66,155.96,114.75
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
84,103.00,91.60,112.44,49.00,153.74,118.71,113.07,51.00,90.99,119.29,...,114.40,84.81,114.07,114.07,111.13,123.61,115.01,123.38,63.36,70.59
85,98.09,62.00,85.81,52.00,153.74,97.08,88.00,42.00,90.89,102.82,...,75.78,76.63,80.65,80.65,84.65,119.52,78.96,103.69,69.00,66.13
86,100.78,103.40,80.95,74.00,153.74,84.07,80.90,152.00,85.96,93.67,...,77.29,92.75,81.38,81.38,79.84,124.95,81.56,89.35,68.34,85.43
87,102.29,72.90,62.55,74.00,153.74,76.35,65.00,134.00,66.25,92.40,...,92.42,76.01,56.44,56.44,61.32,118.70,54.38,85.59,59.99,70.95
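
The `DummyRegressor` column is constant (153.74), which is what a baseline that always predicts the training mean would produce. A sketch of how such a baseline can serve as a reference for RMSE is shown below; the `rmse` helper and the choice of `Ridge` as the model to compare are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))


# Baseline: predicts the mean of y_train for every test row
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
# A real model to compare against the baseline
ridge = Ridge(alpha=1.0).fit(X_train, y_train)

baseline_rmse = rmse(y_test, baseline.predict(X_test))
ridge_rmse = rmse(y_test, ridge.predict(X_test))

print(f"Baseline RMSE: {baseline_rmse:.2f}")
print(f"Ridge RMSE:    {ridge_rmse:.2f}")
```

A model is only useful if its RMSE is clearly below that of the baseline.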


# A note on the data used

The data used in this notebook ships with the `sklearn` library.

It can be loaded in two forms:
* pandas DataFrame
* numpy ndarray


## Loading the data as a pandas DataFrame

In [13]:
# Load the classification data as a pandas DataFrame
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True, as_frame=True)


In [None]:
# Load the regression data as a pandas DataFrame
from sklearn.datasets import load_diabetes
X, y = load_diabetes(return_X_y=True, as_frame=True)

## Loading the data as a numpy ndarray

In [14]:
# Load the classification data as a numpy ndarray
from sklearn import datasets
datos = datasets.load_breast_cancer()
X, y = datos.data, datos.target


In [None]:
# Load the regression data as a numpy ndarray
datos = datasets.load_diabetes()
X, y = datos.data, datos.target

In [None]:
X

array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226,
         0.01990749, -0.01764613],
       [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
        -0.06833155, -0.09220405],
       [ 0.08529891,  0.05068012,  0.04445121, ..., -0.00259226,
         0.00286131, -0.02593034],
       ...,
       [ 0.04170844,  0.05068012, -0.01590626, ..., -0.01107952,
        -0.04688253,  0.01549073],
       [-0.04547248, -0.04464164,  0.03906215, ...,  0.02655962,
         0.04452873, -0.02593034],
       [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
        -0.00422151,  0.00306441]])
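
The two forms hold the same values, so one can be converted into the other. The sketch below wraps the ndarray in a DataFrame using the `feature_names` that come with the data set and then converts it back; the variable names `X_frame`, `y_series` and `X_back` are chosen for this example.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes

datos = load_diabetes()
X_array, y_array = datos.data, datos.target

# ndarray -> DataFrame, labelling the columns with the data set's feature names
X_frame = pd.DataFrame(X_array, columns=datos.feature_names)
y_series = pd.Series(y_array, name="target")

# DataFrame -> ndarray
X_back = X_frame.to_numpy()

print(type(X_array).__name__, X_array.shape)
print(type(X_frame).__name__, X_frame.shape)
print(np.array_equal(X_back, X_array))
```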