##**Second iteration of classical Machine Learning models**##

###Instructions:

The notebook can be run from top to bottom with the file train.csv, which is in the Drive folder named DATASET and can also be downloaded from the Kaggle competition page: https://www.kaggle.com/c/petfinder-adoption-prediction/data

##**Library installation and imports**##

In [None]:
#Install lazypredict
!pip install lazypredict

In [None]:
#Pin the pandas version
!pip install pandas==1.1.0

In [None]:
#Check the installed pandas version (pandas is imported here because the main import cell runs later)
import pandas as pd
pd.__version__

'1.1.0'

In [None]:
#Pin the folium version
!pip install folium==0.2.1

In [None]:
#Pin the imgaug version
!pip install imgaug==0.2.5

In [None]:
#Install scikit-learn version 0.24
!pip install scikit-learn==0.24
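
Since the cells above pin exact versions, it can help to confirm what is actually installed before importing anything. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, not part of the original notebook.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: installed version, or None if missing}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


# Distribution names as used in the pip commands above.
report = installed_versions(
    ["pandas", "scikit-learn", "lazypredict", "folium", "imgaug"]
)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

If a reported version differs from the pinned one, the runtime may still be holding the previously loaded package.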

In [None]:
#Import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
sns.set_theme(style="darkgrid")
import warnings
warnings.filterwarnings("ignore")
import IPython
import sys
import joblib
#Alias for packages that still import the old sklearn.externals.joblib path
sys.modules['sklearn.externals.joblib'] = joblib
from sklearn.neighbors import KNeighborsClassifier
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn import svm
from lazypredict.Supervised import LazyClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

##**Reading the files**##

In [None]:
#Mount Google Drive in Colab
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [None]:
#List the contents of the dataset folder in Drive
!ls '/content/gdrive/My Drive/MONOGRAFIA/DATASET'

 breed_labels.csv		 petfinder-adoption-prediction.zip
 BreedLabels.csv		 PetFinder-BreedLabels.csv
'Clasificación imagenes.ipynb'	 PetFinder-ColorLabels.csv
 color_labels.csv		 PetFinder-StateLabels.csv
 ColorLabels.csv		 state_labels.csv
'Copia de BreedLabels.csv'	 StateLabels.csv
'Copia de ColorLabels.csv'	 test
'Copia de state_labels.csv'	 test_sentiment
'Copia de StateLabels.csv'	 train
 fc9cf8b8d-1.jpg		 train_images
 ImagenesMuestra		 train_metadata
 Imagenes_Org			 train_sentiment


In [None]:
#Read the training data file
train = pd.read_csv('/content/gdrive/My Drive/MONOGRAFIA/DATASET/train/train.csv')
train.head(5)

Unnamed: 0,Type,Name,Age,Breed1,Breed2,Gender,Color1,Color2,Color3,MaturitySize,FurLength,Vaccinated,Dewormed,Sterilized,Health,Quantity,Fee,State,RescuerID,VideoAmt,Description,PetID,PhotoAmt,AdoptionSpeed
0,2,Nibble,3,299,0,1,1,7,0,1,1,2,2,2,1,1,100,41326,8480853f516546f6cf33aa88cd76c379,0,Nibble is a 3+ month old ball of cuteness. He ...,86e1089a3,1.0,2
1,2,No Name Yet,1,265,0,1,1,2,0,2,2,3,3,3,1,1,0,41401,3082c7125d8fb66f7dd4bff4192c8b14,0,I just found it alone yesterday near my apartm...,6296e909a,2.0,0
2,1,Brisco,1,307,0,1,2,7,0,2,2,1,1,2,1,1,0,41326,fa90fa5b1ee11c86938398b60abc32cb,0,Their pregnant mother was dumped by her irresp...,3422e4906,7.0,3
3,1,Miko,4,307,0,2,1,2,0,2,1,1,1,2,1,1,150,41401,9238e4f44c71a75282e62f7136c6b240,0,"Good guard dog, very alert, active, obedience ...",5842f1ff5,8.0,2
4,1,Hunter,1,307,0,1,1,0,0,2,1,2,2,2,1,1,0,41326,95481e953f8aed9ec3d16fc4509537e8,0,This handsome yet cute boy is up for adoption....,850a43f90,3.0,2


##**Data preprocessing**##

In [None]:
#One-hot encode the categorical variables
breed1_dummy = pd.get_dummies(train['Breed1'],prefix='Breed1')
breed2_dummy = pd.get_dummies(train['Breed2'],prefix='Breed2')
gender_dummy = pd.get_dummies(train['Gender'],prefix='Gender')
color1_dummy = pd.get_dummies(train['Color1'],prefix='Color1')
color2_dummy = pd.get_dummies(train['Color2'],prefix='Color2')
color3_dummy = pd.get_dummies(train['Color3'],prefix='Color3')
MaturitySize_dummy = pd.get_dummies(train['MaturitySize'],prefix='MaturitySize')
FurLength_dummy = pd.get_dummies(train['FurLength'],prefix='FurLength')
Vaccinated_dummy = pd.get_dummies(train['Vaccinated'],prefix='Vaccinated')
Dewormed_dummy = pd.get_dummies(train['Dewormed'],prefix='Dewormed')
Sterilized_dummy = pd.get_dummies(train['Sterilized'],prefix='Sterilized')
Health_dummy = pd.get_dummies(train['Health'],prefix='Health')
State_dummy = pd.get_dummies(train['State'],prefix='State')

In [None]:
#Append the new one-hot encoded columns to the data
train = pd.concat([train,breed1_dummy],axis=1) # axis=1: concatenate as columns
train = pd.concat([train,breed2_dummy],axis=1)
train = pd.concat([train,gender_dummy],axis=1)
train = pd.concat([train,color1_dummy],axis=1)
train = pd.concat([train,color2_dummy],axis=1)
train = pd.concat([train,color3_dummy],axis=1)
train = pd.concat([train,MaturitySize_dummy],axis=1)
train = pd.concat([train,FurLength_dummy],axis=1)
train = pd.concat([train,Sterilized_dummy],axis=1)
train = pd.concat([train,Vaccinated_dummy],axis=1)
train = pd.concat([train,Dewormed_dummy],axis=1)
train = pd.concat([train,Health_dummy],axis=1)
train = pd.concat([train,State_dummy],axis=1)
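
The encode-then-concatenate pattern in the two cells above can also be written as a single call. The sketch below runs on a small made-up frame (`toy` and its values are illustrative, not taken from train.csv). Unlike the cells above, `pd.get_dummies` with `columns=` removes the original categorical columns itself, so they would not need to be dropped afterwards.

```python
import pandas as pd

# Illustrative data only: same column names as the dataset, invented values.
toy = pd.DataFrame({
    "Age": [3, 1, 4],
    "Gender": [1, 2, 1],
    "Color1": [1, 2, 7],
})

categorical_cols = ["Gender", "Color1"]

# Each listed column is replaced by one indicator column per distinct value,
# named "<column>_<value>"; dtype=int gives 0/1 regardless of pandas version.
encoded = pd.get_dummies(toy, columns=categorical_cols, dtype=int)

print(encoded.columns.tolist())
```

For the notebook, `categorical_cols` would hold the thirteen encoded columns (`Breed1` through `State`).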

In [None]:
#Drop the columns not needed in X: the original categorical columns, identifiers, free text and the target
X = train.drop(['Breed1','Name','Breed2','Gender','Color1','Color2','Color3','MaturitySize','FurLength','Vaccinated','Dewormed','Sterilized','Health','State','RescuerID','Description','PetID', 'AdoptionSpeed'], axis=1)

In [None]:
#Display X
X

Unnamed: 0,Type,Age,Quantity,Fee,VideoAmt,PhotoAmt,Breed1_0,Breed1_1,Breed1_3,Breed1_5,Breed1_7,Breed1_10,Breed1_11,Breed1_15,Breed1_16,Breed1_17,Breed1_18,Breed1_19,Breed1_20,Breed1_21,Breed1_23,Breed1_24,Breed1_25,Breed1_26,Breed1_31,Breed1_32,Breed1_39,Breed1_42,Breed1_44,Breed1_49,Breed1_50,Breed1_56,Breed1_58,Breed1_60,Breed1_61,Breed1_64,Breed1_65,Breed1_69,Breed1_70,Breed1_71,...,Color2_7,Color3_0,Color3_3,Color3_4,Color3_5,Color3_6,Color3_7,MaturitySize_1,MaturitySize_2,MaturitySize_3,MaturitySize_4,FurLength_1,FurLength_2,FurLength_3,Sterilized_1,Sterilized_2,Sterilized_3,Vaccinated_1,Vaccinated_2,Vaccinated_3,Dewormed_1,Dewormed_2,Dewormed_3,Health_1,Health_2,Health_3,State_41324,State_41325,State_41326,State_41327,State_41330,State_41332,State_41335,State_41336,State_41342,State_41345,State_41361,State_41367,State_41401,State_41415
0,2,3,1,100,0,1.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,1,1,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
1,2,1,1,0,0,2.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
2,1,1,1,0,0,7.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,1,1,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,1,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
3,1,4,1,150,0,8.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
4,1,1,1,0,0,3.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14988,2,2,4,0,0,3.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
14989,2,60,2,0,0,3.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,1,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
14990,2,2,5,30,0,5.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,1,0,1,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
14991,2,9,1,0,0,3.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,1,1,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0


##**Classical Machine Learning models**##

In [None]:
#Define the target variable y
y = train[['AdoptionSpeed']]

In [None]:
#Split the data into training and test sets
#Train and compare models with LazyClassifier
X_train, X_test, y_train, y_test = train_test_split(X, y,test_size = 0.3,random_state=42)

clf = LazyClassifier(verbose=0,ignore_warnings=True, custom_metric=None)
models,predictions = clf.fit(X_train, X_test, y_train, y_test)
models

100%|██████████| 29/29 [10:05<00:00, 20.87s/it]


Model,Accuracy,Balanced Accuracy,ROC AUC,F1 Score,Time Taken
XGBClassifier,0.41,0.33,,0.39,61.91
LGBMClassifier,0.4,0.32,,0.38,3.67
RandomForestClassifier,0.38,0.31,,0.37,4.67
BaggingClassifier,0.37,0.31,,0.37,1.86
AdaBoostClassifier,0.39,0.31,,0.36,2.58
NearestCentroid,0.32,0.3,,0.32,0.28
ExtraTreesClassifier,0.36,0.3,,0.36,5.45
LinearDiscriminantAnalysis,0.36,0.29,,0.34,1.24
LogisticRegression,0.37,0.29,,0.35,2.71
BernoulliNB,0.35,0.29,,0.34,0.31
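
The table reports both Accuracy and Balanced Accuracy, and the second is consistently lower. Balanced accuracy, as defined in scikit-learn, is the mean of the per-class recall, so a model that does well on frequent classes and poorly on rare ones scores lower on it. The pure-Python sketch below illustrates the difference on invented labels (the `y_true` and `y_pred` values are illustrative, not notebook output).

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    support = Counter(y_true)  # samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Invented example: class 0 is rare and always misclassified.
y_true = [0, 2, 2, 2, 2, 4, 4, 4]
y_pred = [2, 2, 2, 2, 2, 4, 4, 2]

print(f"accuracy:          {accuracy(y_true, y_pred):.2f}")
print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.2f}")
```

Here six of eight predictions are correct, but the per-class recalls are 0, 1 and 2/3, so the balanced score falls well below the plain accuracy.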
