# Árboles de decisión - clasificación
Un árbol de decisión es un modelo de predicción utilizado en diversos ámbitos que van desde la inteligencia artificial hasta la economía.

Dado un conjunto de datos se fabrican diagramas de construcciones lógicas que sirven para representar y categorizar una serie de condiciones que ocurren de forma sucesiva para la resolución de un problema.

Aquí vemos un diagrama o un árbol de decisión de ejemplo donde en función del número de unidades solicitadas y dependiendo del envío sea a España o a Europa pues acaba aplicando un descuento u otro.

<img src="https://upload.wikimedia.org/wikipedia/commons/f/fb/Arbol_decision.jpg">

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
vinos=pd.read_csv('vino.csv')
vinos.head(5)

Unnamed: 0,Alcohol,Malic acid,Ash,Alcalinity of ash,Magnesium,Total phenols,Flavanoids,Nonflavanoid phenols,Proanthocyanins,Color intensity,Hue,OD280/OD315 of diluted wines,Proline,Wine Type
0,14.23,1.71,2.43,15.6,127.0,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065.0,One
1,13.2,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050.0,One
2,13.16,2.36,2.67,18.6,101.0,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185.0,One
3,14.37,1.95,2.5,16.8,113.0,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480.0,One
4,13.24,2.59,2.87,21.0,118.0,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735.0,One


In [6]:
vinos['Wine Type'].unique()

array(['One', 'Two', 'Three'], dtype=object)

In [7]:
vinos['Wine Type'].value_counts()

Wine Type
Two      71
One      59
Three    48
Name: count, dtype: int64

In [8]:
y=vinos['Wine Type'] #variable a predecir
X=vinos.drop('Wine Type',axis=1)

In [11]:
from sklearn.model_selection import train_test_split

In [12]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3)

In [14]:
from sklearn.tree import DecisionTreeClassifier

In [15]:
arbol=DecisionTreeClassifier()

In [16]:
arbol.fit(X_train,y_train)

In [17]:
predicciones=arbol.predict(X_test)

In [18]:
predicciones

array(['Three', 'Three', 'Two', 'Three', 'Two', 'Three', 'Two', 'One',
       'One', 'One', 'Two', 'One', 'Two', 'One', 'One', 'Two', 'Two',
       'One', 'One', 'Two', 'One', 'Two', 'Two', 'One', 'One', 'One',
       'Three', 'Two', 'Three', 'Two', 'Two', 'One', 'One', 'One', 'One',
       'Two', 'Two', 'Two', 'One', 'Two', 'One', 'One', 'Two', 'One',
       'Three', 'Three', 'One', 'Two', 'Two', 'Three', 'Three', 'Two',
       'One', 'One'], dtype=object)

In [19]:
y_test

176    Three
175    Three
127      Two
155    Three
63       Two
134    Three
60       Two
122      Two
20       One
10       One
121      Two
0        One
116      Two
39       One
79       Two
4        One
111      Two
124      Two
11       One
114      Two
58       One
105      Two
94       Two
44       One
53       One
8        One
151    Three
65       Two
167    Three
87       Two
71       Two
26       One
18       One
35       One
28       One
92       Two
118      Two
103      Two
1        One
140    Three
32       One
7        One
108      Two
129      Two
159    Three
153    Three
14       One
107      Two
99       Two
171    Three
137    Three
77       Two
27       One
46       One
Name: Wine Type, dtype: object

In [20]:
#evaluamos el modelo
from sklearn.metrics import classification_report,confusion_matrix

In [21]:
print(classification_report(y_test,predicciones))

              precision    recall  f1-score   support

         One       0.83      0.95      0.88        20
       Three       1.00      0.91      0.95        11
         Two       0.90      0.83      0.86        23

    accuracy                           0.89        54
   macro avg       0.91      0.90      0.90        54
weighted avg       0.90      0.89      0.89        54



In [22]:
print(confusion_matrix(y_test,predicciones))

[[19  0  1]
 [ 0 10  1]
 [ 4  0 19]]


# Random Forest - Bosqes aleatorios
Random forest es una combinación de árboles de decisión donde cada árbol selecciona una clase y luego se combinan las decisiones de cada árbol para seleccionar una clase final ganadora.

Es uno de los algoritmos de aprendizaje de clasificación con mayor precisión.

Funciona eficientemente en bases de datos grandes y puede manejar cientos de variables de entrada.

<img src="https://interactivechaos.com/sites/default/files/2023-01/random_forest.png" >

In [24]:
vinos2=pd.read_csv('vino.csv')
vinos2

Unnamed: 0,Alcohol,Malic acid,Ash,Alcalinity of ash,Magnesium,Total phenols,Flavanoids,Nonflavanoid phenols,Proanthocyanins,Color intensity,Hue,OD280/OD315 of diluted wines,Proline,Wine Type
0,14.23,1.71,2.43,15.6,127.0,2.80,3.06,0.28,2.29,5.64,1.04,3.92,1065.0,One
1,13.20,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.40,1050.0,One
2,13.16,2.36,2.67,18.6,101.0,2.80,3.24,0.30,2.81,5.68,1.03,3.17,1185.0,One
3,14.37,1.95,2.50,16.8,113.0,3.85,3.49,0.24,2.18,7.80,0.86,3.45,1480.0,One
4,13.24,2.59,2.87,21.0,118.0,2.80,2.69,0.39,1.82,4.32,1.04,2.93,735.0,One
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
173,13.71,5.65,2.45,20.5,95.0,1.68,0.61,0.52,1.06,7.70,0.64,1.74,740.0,Three
174,13.40,3.91,2.48,23.0,102.0,1.80,0.75,0.43,1.41,7.30,0.70,1.56,750.0,Three
175,13.27,4.28,2.26,20.0,120.0,1.59,0.69,0.43,1.35,10.20,0.59,1.56,835.0,Three
176,13.17,2.59,2.37,20.0,120.0,1.65,0.68,0.53,1.46,9.30,0.60,1.62,840.0,Three


In [26]:
y=vinos2['Wine Type'] #variable a predecir
X=vinos2.drop('Wine Type',axis=1)

In [27]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3)

In [28]:
from sklearn.ensemble import RandomForestClassifier

In [29]:
randomforest=RandomForestClassifier(n_estimators=80) #n_estimators es el número de árboles que queremos que use

In [30]:
randomforest.fit(X_train,y_train)

In [32]:
predicciones=randomforest.predict(X_test)

In [33]:
print(classification_report(y_test,predicciones))
print(confusion_matrix(y_test,predicciones))

              precision    recall  f1-score   support

         One       1.00      1.00      1.00        18
       Three       0.92      1.00      0.96        12
         Two       1.00      0.96      0.98        24

    accuracy                           0.98        54
   macro avg       0.97      0.99      0.98        54
weighted avg       0.98      0.98      0.98        54

[[18  0  0]
 [ 0 12  0]
 [ 0  1 23]]
