# Decision Tree

**Import the required libraries**

In [6]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.tree import export_graphviz
from sklearn import tree
from sklearn.datasets import load_iris



**Loading the iris dataset**

In [7]:
iris = load_iris()

**Checking the dataset description**

In [9]:

print(iris.DESCR)

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
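The *Class Correlation* column above appears to be the Pearson correlation between each feature and the class label encoded as 0, 1, 2. A minimal sketch, assuming that interpretation, recomputes the summary with NumPy and pandas. It uses the population standard deviation (`ddof=0`) and the copy of the data bundled with scikit-learn, so the last digit may not match the table exactly.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

stats = pd.DataFrame(
    {
        "Min": X.min(axis=0),
        "Max": X.max(axis=0),
        "Mean": X.mean(axis=0),
        "SD": X.std(axis=0),  # population standard deviation (ddof=0)
        # Assumed meaning: Pearson correlation between each feature and the class index (0, 1, 2)
        "Class Correlation": [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])],
    },
    index=iris.feature_names,
)
print(stats.round(4))
```

The correlation only makes sense here because the class indices happen to follow the order in which petal size grows; with a different label encoding the number would change.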

**Checking feature_names, target_names and data**

In [10]:
iris.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [11]:
iris.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [12]:
iris.data

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1],
       [5.4, 3.7, 1.5, 0.2],
       [4.8, 3.4, 1.6, 0.2],
       [4.8, 3. , 1.4, 0.1],
       [4.3, 3. , 1.1, 0.1],
       [5.8, 4. , 1.2, 0.2],
       [5.7, 4.4, 1.5, 0.4],
       [5.4, 3.9, 1.3, 0.4],
       [5.1, 3.5, 1.4, 0.3],
       [5.7, 3.8, 1.7, 0.3],
       [5.1, 3.8, 1.5, 0.3],
       [5.4, 3.4, 1.7, 0.2],
       [5.1, 3.7, 1.5, 0.4],
       [4.6, 3.6, 1. , 0.2],
       [5.1, 3.3, 1.7, 0.5],
       [4.8, 3.4, 1.9, 0.2],
       [5. , 3. , 1.6, 0.2],
       [5. , 3.4, 1.6, 0.4],
       [5.2, 3.5, 1.5, 0.2],
       [5.2, 3.4, 1.4, 0.2],
       [4.7, 3.2, 1.6, 0.2],
       [4.8, 3.1, 1.6, 0.2],
       [5.4, 3.4, 1.5, 0.4],
       [5.2, 4.1, 1.5, 0.1],
       [5.5, 4.2, 1.4, 0.2],
       ... (output truncated; the full array has 150 rows)

**Splitting the data into training and test sets**

In [13]:
from sklearn.model_selection import train_test_split

In [14]:
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3)

**Checking the shape of the data**

In [15]:
X_train.shape, X_test.shape

((105, 4), (45, 4))
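The split above passes no `random_state`, so every run draws a different split and the importances, predictions and report that follow will change between runs. A sketch of a reproducible, stratified split; the seed `42` is an arbitrary choice:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# random_state fixes the shuffle, so the split (and everything derived from it) is reproducible;
# stratify keeps the class proportions identical in the training and test subsets.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)
print(np.bincount(y_test))          # [15 15 15]
```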

**Instantiating the classifier**

In [16]:
clf = tree.DecisionTreeClassifier()

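#### Parameters of DecisionTreeClassifier
* criterion: the function that measures the quality of a split, e.g. "gini" (the default) or "entropy"
* splitter: the strategy used to choose the split at each node ("best" or "random")
* max_depth: the maximum depth of the tree
* min_samples_split: the minimum number of samples a node must contain to be considered for splitting (an int, or a float interpreted as a fraction of the training samples)
* min_samples_leaf: the minimum number of samples required in each leaf (an int, or a fraction as above)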
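A minimal sketch that sets these parameters explicitly. The values are arbitrary, chosen to show the int and float forms of `min_samples_split` and `min_samples_leaf`. Leaf sizes are read from the fitted `tree_` structure, where a child index of -1 marks a leaf.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf_demo = DecisionTreeClassifier(
    criterion="entropy",   # information gain instead of the default "gini"
    splitter="best",       # "random" would pick the best among randomly drawn thresholds
    max_depth=3,           # at most 3 levels of splits
    min_samples_split=10,  # int: a node needs at least 10 samples to be split
    min_samples_leaf=0.2,  # float: every leaf keeps at least ceil(0.2 * n_samples) samples
    random_state=0,
).fit(X, y)

# Leaves are the nodes without children (children_left == -1)
is_leaf = clf_demo.tree_.children_left == -1
leaf_sizes = clf_demo.tree_.n_node_samples[is_leaf]

print("depth:", clf_demo.get_depth(), "leaves:", clf_demo.get_n_leaves())
print("smallest leaf:", leaf_sizes.min())  # never below ceil(0.2 * 150) = 30
```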

**Training the model**

In [17]:
clf = clf.fit(X_train, y_train)
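With the default parameters the tree keeps splitting until its leaves are pure (or cannot be split further), so it usually fits the training set almost perfectly. A sketch comparing it with a depth-limited tree can help spot overfitting; the seeds and `max_depth=2` are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# No depth or size limits: grows until the leaves are pure
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Depth-limited tree for comparison
shallow_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)

for name, model in [("unrestricted", full_tree), ("max_depth=2", shallow_tree)]:
    print(
        f"{name:>12}: depth={model.get_depth()}, leaves={model.get_n_leaves()}, "
        f"train acc={model.score(X_train, y_train):.3f}, "
        f"test acc={model.score(X_test, y_test):.3f}"
    )
```

A large gap between training and test accuracy for the unrestricted tree is the usual sign that limits such as `max_depth` or `min_samples_leaf` are worth trying.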

**Checking which features matter most to the decision tree model**

In [19]:
clf.feature_importances_

array([0.01428571, 0.03809524, 0.43417367, 0.51344538])

In [21]:
for x,y in zip(iris.feature_names, clf.feature_importances_):
    print(f'{x}:{y}')

sepal length (cm):0.014285714285714285
sepal width (cm):0.03809523809523809
petal length (cm):0.4341736694677871
petal width (cm):0.5134453781512606
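These are impurity-based importances: every split credits the feature it uses with the weighted impurity decrease it produces, and the totals are normalized to sum to 1. A sketch that recomputes them from the fitted tree's `tree_` arrays and compares them with `feature_importances_`. It trains on the full dataset, so the numbers differ from the output above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
model = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

t = model.tree_
w = t.weighted_n_node_samples
importances = np.zeros(t.n_features)

for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:  # leaf: no split, so no contribution
        continue
    # Weighted impurity decrease produced by the split at this node
    decrease = (w[node] * t.impurity[node]
                - w[left] * t.impurity[left]
                - w[right] * t.impurity[right])
    importances[t.feature[node]] += decrease

importances /= importances.sum()  # normalize so the importances sum to 1

for name, manual, sk in zip(iris.feature_names, importances, model.feature_importances_):
    print(f"{name}: manual={manual:.4f} sklearn={sk:.4f}")
```

Because they are computed on the training data, these importances describe how the tree used each feature, not necessarily how useful the feature is on unseen data.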


**Running the decision tree on the test set**

In [37]:
resultado = clf.predict(X_test)
resultado

array([0, 2, 1, 0, 1, 0, 0, 0, 2, 0, 1, 2, 1, 1, 1, 0, 1, 0, 0, 1, 2, 2,
       1, 0, 2, 1, 1, 1, 2, 1, 0, 2, 0, 2, 0, 2, 2, 1, 1, 1, 2, 0, 2, 0,
       2])
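`resultado` holds one predicted class index per test sample, aligned with `y_test`, so accuracy is simply the fraction of positions where the two agree. A sketch with small hypothetical label arrays:

```python
import numpy as np
from sklearn import metrics

# Hypothetical true labels and predictions (5 of the 6 positions match)
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 1, 2, 1, 0])

manual_acc = np.mean(y_true == y_pred)  # fraction of matching positions
print(manual_acc, metrics.accuracy_score(y_true, y_pred))
```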

**Testing a new sample**

In [38]:
teste = np.array([[5.1, 3.5, 1.4, 0.1]])

In [39]:
clf.predict(teste)

array([0])
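`predict` returns class indices; indexing `iris.target_names` with them gives the species name. The input must be 2-D (one row per sample), so a single 1-D measurement has to be reshaped first. A sketch, assuming a tree fitted on the full dataset:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
model = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# One flower as a 1-D array, reshaped to shape (1, 4): one row, four features
single = np.array([5.1, 3.5, 1.4, 0.1]).reshape(1, -1)

pred = model.predict(single)
print(pred, iris.target_names[pred])  # class index and species name
```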

**Checking the class probabilities**

In [43]:
clf.predict_proba(teste).round(2)

array([[1., 0., 0.]])
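A decision tree's probability for a class is the fraction of training samples of that class in the leaf the sample falls into, which is why a fully grown tree with pure leaves tends to output only 0s and 1s, as above. `predict` then picks the class with the highest probability. A sketch checking both points with a depth-limited tree (`max_depth=2` is arbitrary, chosen so that some leaves are impure):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

proba = model.predict_proba(X)

# Recompute the probabilities as class fractions of the training samples in each leaf
leaves = model.apply(X)  # index of the leaf each sample lands in
manual = np.array([
    np.bincount(y[leaves == leaf], minlength=3) / np.sum(leaves == leaf)
    for leaf in leaves
])

# predict() should equal the class with the highest probability
argmax_pred = model.classes_[proba.argmax(axis=1)]

print(np.unique(proba.round(3), axis=0))  # the distinct probability rows (one per leaf)
```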

In [41]:
from sklearn import metrics
print(metrics.classification_report(y_test, resultado,target_names=iris.target_names))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       0.81      0.87      0.84        15
   virginica       0.86      0.80      0.83        15

    accuracy                           0.89        45
   macro avg       0.89      0.89      0.89        45
weighted avg       0.89      0.89      0.89        45
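Precision is the number of correct predictions for a class divided by everything predicted as that class; recall divides the same count by everything that truly belongs to the class; the f1-score is their harmonic mean. For example, versicolor's 0.81 precision and 0.87 recall above are consistent with 13 correct predictions out of 16 predicted versicolor and 15 true versicolor (13/16 ≈ 0.81, 13/15 ≈ 0.87). A sketch deriving these metrics from a confusion matrix, using small hypothetical labels:

```python
import numpy as np
from sklearn import metrics

# Hypothetical labels (0 = setosa, 1 = versicolor, 2 = virginica)
y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 2, 1, 2, 1, 2])

cm = metrics.confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class
print(cm)

tp = np.diag(cm)                   # correct predictions per class
precision = tp / cm.sum(axis=0)    # divided by all predictions of that class (column sums)
recall = tp / cm.sum(axis=1)       # divided by all true members of that class (row sums)
f1 = 2 * precision * recall / (precision + recall)
print(precision.round(2), recall.round(2), f1.round(2))
```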



**Library for interactive widgets**

In [44]:
!pip install ipywidgets



**Libraries for visualizing the decision tree**

In [45]:
!pip install pydot

Collecting pydot
  Downloading https://files.pythonhosted.org/packages/33/d1/b1479a770f66d962f545c2101630ce1d5592d90cb4f083d38862e93d16d2/pydot-1.4.1-py2.py3-none-any.whl
Installing collected packages: pydot
Successfully installed pydot-1.4.1


In [47]:
!pip install graphviz

Collecting graphviz
  Downloading https://files.pythonhosted.org/packages/f5/74/dbed754c0abd63768d3a7a7b472da35b08ac442cf87d73d5850a6f32391e/graphviz-0.13.2-py2.py3-none-any.whl
Installing collected packages: graphviz
Successfully installed graphviz-0.13.2


**Importing the installed libraries**

In [48]:
import pydot
import graphviz
from ipywidgets import interactive

**Visualizing the generated tree**

In [55]:
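dot_data = tree.export_graphviz(
        clf,                               # trained model
        out_file=None,                     # return the DOT source as a string instead of writing a file
        feature_names=iris.feature_names,  # feature names of the dataset
        class_names=iris.target_names,     # names of the target classes
        filled=True,                       # color each node by its majority class
        rounded=True,                      # draw node boxes with rounded corners
        proportion=True,                   # show 'value' and 'samples' as proportions/percentages instead of counts
        node_ids=True,                     # show the ID number of each node
        rotate=False,                      # draw the tree top-down (True draws it left to right)
        label='all',                       # show the informative labels (impurity, etc.) at every node
        special_characters=True            # keep special characters such as ≤ (False gives PostScript-safe output)
)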

graph = graphviz.Source(dot_data)
graph

ExecutableNotFound: failed to execute ['dot', '-Tsvg'], make sure the Graphviz executables are on your systems' PATH

<graphviz.files.Source at 0x2e617d591c8>
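The `ExecutableNotFound` error means the Python `graphviz` package is installed but the Graphviz system software, which provides the `dot` executable, is not on the `PATH`: `pip install graphviz` installs only the Python interface, so Graphviz itself has to be installed separately (for example through the operating system's package manager or conda). A sketch of a fallback, assuming a depth-limited tree for brevity: it checks for `dot` with `shutil.which` and, if it is missing, saves the DOT source and prints a text rendering with `sklearn.tree.export_text`. The file name `iris_tree.dot` is hypothetical.

```python
import shutil

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz, export_text

iris = load_iris()
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# export_graphviz only produces DOT source text; it does not need the Graphviz binaries
dot_source = export_graphviz(
    model,
    out_file=None,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True,
    rounded=True,
)

if shutil.which("dot") is None:
    # 'dot' is not on the PATH: keep the DOT file for later and show a plain-text tree instead
    with open("iris_tree.dot", "w") as fh:
        fh.write(dot_source)
    print(export_text(model, feature_names=iris.feature_names))
else:
    print("Graphviz found; graphviz.Source(dot_source) can render the tree.")
```

Once Graphviz is installed, the saved file can be rendered from a shell with `dot -Tpng iris_tree.dot -o iris_tree.png`.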

In [59]:
from ipywidgets import interactive
from IPython.display import SVG, display
from graphviz import Source

**Changing the parameters interactively**

In [58]:
# load dataset
data = load_iris()

# feature matrix
X = data.data

# target vector
y = data.target

# feature labels
features_label = data.feature_names

# class label
class_label = data.target_names


def plot_tree(crit, split, depth, min_split, min_leaf=0.2):
    estimator = tree.DecisionTreeClassifier(
        random_state=0,
        criterion=crit,
        splitter=split,
        max_depth=depth,
        min_samples_split=min_split,
        min_samples_leaf=min_leaf,
    )

    estimator.fit(X, y)  # train the model on the full dataset
    graph = Source(tree.export_graphviz(
        estimator,
        out_file=None,
        feature_names=features_label,
        class_names=class_label,
        filled=True,
    ))
    display(SVG(graph.pipe(format='svg')))  # render the tree as SVG
    return estimator

inter = interactive(
    plot_tree,                 # function that draws the tree
    crit=["gini", "entropy"],  # lists become dropdown menus
    split=["best", "random"],
    depth=[1, 2, 3, 4, 5],
    min_split=(0.1, 1),        # (min, max) tuples with floats become float sliders
    min_leaf=(0.1, 0.5),       # float values are fractions of the number of samples
)

display(inter)

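The widget tunes the parameters by eye and fits on the full dataset. A sketch of a more systematic alternative searches the same parameters with 5-fold cross-validation via `GridSearchCV`; the grid values are arbitrary, and `min_samples_split` is left at its default to keep the grid small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# The same knobs exposed by the widget, scored by cross-validated accuracy
param_grid = {
    "criterion": ["gini", "entropy"],
    "splitter": ["best", "random"],
    "max_depth": [1, 2, 3, 4, 5],
    "min_samples_leaf": [1, 0.1, 0.2],  # int = sample count, float = fraction of samples
}

search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(f"mean CV accuracy: {search.best_score_:.3f}")
```

Cross-validation scores each combination on held-out folds, so the selected parameters are less likely to reflect a lucky single split than values picked by looking at one tree.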