# Decision Tree

A decision tree is a data structure that can also be read as a set of rules. It classifies a sample by walking from the root down to a leaf node, and each leaf represents one of the classes in the dataset. The tree therefore has two kinds of node:

(a) Leaf node: corresponds to a class. 
(b) Non-leaf node: contains a test on one attribute of the data. Each of its branches corresponds to one of the possible values of that attribute.
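The two node kinds, and the reading of a tree as a set of rules, can be sketched in a few lines of plain Python. This is only an illustration, not the structure scikit-learn uses internally. It assumes binary threshold tests (`feature <= threshold`), and the names `Leaf`, `Node`, `to_rules` and `rule_tree`, the thresholds, and the class labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    label: str  # the class this leaf predicts


@dataclass
class Node:
    feature: str                # attribute tested at this node
    threshold: float            # the test is: feature <= threshold
    left: Union["Node", Leaf]   # branch followed when the test is true
    right: Union["Node", Leaf]  # branch followed when the test is false


def to_rules(node, conditions=()) -> List[str]:
    """Return one IF/THEN rule per root-to-leaf path."""
    if isinstance(node, Leaf):
        premise = " AND ".join(conditions) or "TRUE"
        return [f"IF {premise} THEN {node.label}"]
    test_true = f"{node.feature} <= {node.threshold}"
    test_false = f"{node.feature} > {node.threshold}"
    return (to_rules(node.left, conditions + (test_true,))
            + to_rules(node.right, conditions + (test_false,)))


# Hypothetical tree: thresholds and class names are made up for illustration.
rule_tree = Node("nbr", -0.03,
                 Node("ndwi", -0.1, Leaf("burned"), Leaf("not burned")),
                 Leaf("not burned"))

for rule in to_rules(rule_tree):
    print(rule)
```

Each printed rule is the conjunction of the tests along one path, so a tree with three leaves yields three rules.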

Classification starts at the root node. The node's test is applied to the sample to decide which branch to follow: the left subtree or the right subtree. The walk continues down the tree until a leaf node is reached.
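The walk described above might look like the following minimal sketch. It assumes a tree stored as nested dictionaries with a threshold test at each internal node; the function `classify`, the tree `toy_tree`, its thresholds, and the class names are hypothetical and do not come from the fitted model. The sample values are the `nbr` and `ndwi` entries of rows 0 and 4 of the DataFrame preview.

```python
def classify(node, sample):
    """Walk from the root to a leaf and return the leaf's class."""
    # A leaf is a plain string (the class); an internal node is a dict.
    while isinstance(node, dict):
        goes_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node


# Hypothetical tree: thresholds and class names are made up for illustration.
toy_tree = {
    "feature": "nbr", "threshold": -0.03,
    "left": {"feature": "ndwi", "threshold": -0.1,
             "left": "burned", "right": "not burned"},
    "right": "not burned",
}

print(classify(toy_tree, {"nbr": -0.039766, "ndwi": -0.146292}))
print(classify(toy_tree, {"nbr": 0.014029, "ndwi": -0.112087}))
```

The loop visits at most one node per level, so classifying a sample costs time proportional to the depth of the tree, not to its number of nodes.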

<b>Imports</b>

In [37]:
import os
import numpy as np
import geopandas as gpd
import pandas as pd
from sklearn import tree
from sklearn import model_selection
import graphviz 

<b>Load the DataFrame</b>

In [38]:
df = gpd.read_file("shapely_new/221_067_2017_new.shp")

<b>Preview the DataFrame</b>

In [39]:
df.head()

| | id | cod_sat | cena_id | nome_arq | orb_pto | area_ha | perim | n_arq_ant | ndvi | nbrl | ... | focos | data_atual | data_anter | mirb | nbr | nbr2 | bai | baim | ndwi | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 23946449 | 8 | LC82210672017125LGN00 | LC82210672017125LGN00.tar.gz | 221_067 | 12.968645 | 2639 | LC82210672017109LGN00.tar.bz | 0.211523 | -0.044128 | ... | 2 | 2017-05-05 | 2017-04-19 00:00:00 | -2.529071 | -0.039766 | 0.10715 | 33.485539 | 23.624772 | -0.146292 | POLYGON ((-46.9715795250224 -10.3363584942161,... |
| 1 | 23946450 | 8 | LC82210672017125LGN00 | LC82210672017125LGN00.tar.gz | 221_067 | 1.26332 | 600 | LC82210672017109LGN00.tar.bz | 0.042114 | -0.027508 | ... | 0 | 2017-05-05 | 2017-04-19 00:00:00 | -2.018163 | -0.031193 | 0.029106 | 267.023785 | 44.180948 | -0.060244 | POLYGON ((-46.358025045795 -10.3373571795547, ... |
| 2 | 23946451 | 8 | LC82210672017125LGN00 | LC82210672017125LGN00.tar.gz | 221_067 | 7.56893 | 2340 | LC82210672017109LGN00.tar.bz | 0.306289 | -0.003284 | ... | 1 | 2017-05-05 | 2017-04-19 00:00:00 | -2.318851 | -0.055296 | 0.096054 | 115.204774 | 96.589363 | -0.15055 | POLYGON ((-46.9860921817999 -10.3359982294198,... |
| 3 | 23946452 | 8 | LC82210672017125LGN00 | LC82210672017125LGN00.tar.gz | 221_067 | 1.804195 | 660 | LC82210672017109LGN00.tar.bz | 0.256839 | -0.017937 | ... | 0 | 2017-05-05 | 2017-04-19 00:00:00 | -3.017878 | -0.018472 | 0.178214 | 28.506415 | 15.967674 | -0.196041 | POLYGON ((-46.0128759745859 -10.3521908768478,... |
| 4 | 23946453 | 8 | LC82210672017125LGN00 | LC82210672017125LGN00.tar.gz | 221_067 | 2.886433 | 1140 | LC82210672017109LGN00.tar.bz | 0.24024 | 0.000871 | ... | 2 | 2017-05-05 | 2017-04-19 00:00:00 | -2.498149 | 0.014029 | 0.125918 | 53.876698 | 43.300922 | -0.112087 | POLYGON ((-46.3574643464459 -10.3343756813685,... |


<b>Define X and y</b>

In [40]:
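# Feature matrix, shape (n_samples, 9)
x = np.array(df[['nbrl', 'dif_ndvi', 'nbr', 'bai', 'ndwi', 'medianb2', 'medianb3', 'medianb4', 'medianb5']])
# Target as a 1-D array; df[['verifica']] would give shape (n_samples, 1) and make fit() emit a DataConversionWarning
y = np.array(df['verifica'])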

<b>Configure the tree</b>

In [41]:
# max_leaf_nodes=10 is the binding limit: a binary tree with 10 leaves is at most 9 levels deep
clf = tree.DecisionTreeClassifier(max_depth=10, max_leaf_nodes=10)
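Because no `criterion` is passed, `DecisionTreeClassifier` uses its default, Gini impurity, to score candidate splits. The sketch below shows that calculation in plain Python on made-up values. The helper names `gini` and `split_impurity` are hypothetical, and the code illustrates the measure only; it is not scikit-learn's implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Made-up feature values and labels, for illustration only.
values = [-0.20, -0.15, -0.05, 0.02, 0.10, 0.30]
labels = [1, 1, 1, 0, 0, 0]

print(gini(labels))                          # impurity before splitting
print(split_impurity(values, labels, 0.0))   # threshold that separates the classes
print(split_impurity(values, labels, -0.1))  # threshold that leaves one side mixed
```

A lower weighted impurity means a better split, so of the two thresholds tried here the one at `0.0` would be preferred.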

<b>Fit the model</b>

In [42]:
clf = clf.fit(x, y)

<b>Estimate the accuracy with cross-validation</b>

In [43]:
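# 10 contiguous folds; random_state is omitted because it only takes effect with shuffle=True
# (recent scikit-learn versions raise a ValueError if it is set without shuffling)
kfold = model_selection.KFold(n_splits=10)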
scoring = 'accuracy'
cv_results = model_selection.cross_val_score(clf, x, y, cv=kfold, scoring=scoring)
print(cv_results.mean(), cv_results.std())

0.7305041480536055 0.08904181719784843
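To make the two printed numbers easier to interpret, here is a dependency-free sketch of what unshuffled k-fold splitting does with the sample indices, and of how the per-fold scores are summarised. The function `kfold_indices` and the `fold_scores` values are hypothetical. The population standard deviation (`pstdev`) is used because that is what NumPy's `std()` computes by default.

```python
from statistics import mean, pstdev


def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for unshuffled k-fold splitting."""
    # The first n_samples % n_splits folds receive one extra sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# Each sample lands in exactly one test fold.
for train, test in kfold_indices(n_samples=10, n_splits=3):
    print(test)

# Hypothetical per-fold accuracies, only to show how the summary is computed.
fold_scores = [0.70, 0.80, 0.75]
print(mean(fold_scores), pstdev(fold_scores))
```

Because the folds are contiguous blocks of rows, any ordering in the input file (for example by location) carries over into the folds, which is one possible reason for a large spread between fold scores.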


<b>Export the tree graph</b>

In [44]:
# With out_file set, export_graphviz writes the file and returns None, so the result is not assigned
tree.export_graphviz(clf, out_file='tree.dot',
                     feature_names=['nbrl', 'dif_ndvi', 'nbr', 'bai', 'ndwi', 'medianb2', 'medianb3', 'medianb4', 'medianb5'],
                     filled=True, rounded=True,
                     special_characters=True)

In [45]:
os.system("dot -Tpng tree.dot -o tree.png")

0
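The `0` above is the exit status returned by `os.system`, which is easy to overlook when the command fails. One possible alternative, sketched here with the standard `subprocess` module, raises an exception if Graphviz is missing or exits with an error. The helper names `dot_command` and `render_png` are hypothetical.

```python
import shutil
import subprocess


def dot_command(dot_file, png_file):
    """Build the Graphviz command as an argument list (no shell involved)."""
    return ["dot", "-Tpng", dot_file, "-o", png_file]


def render_png(dot_file="tree.dot", png_file="tree.png"):
    """Render dot_file to png_file; raise if Graphviz is missing or fails."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' executable not found on PATH")
    # check=True turns a non-zero exit status into CalledProcessError.
    subprocess.run(dot_command(dot_file, png_file), check=True)
    return png_file
```

Calling `render_png()` after the export step would produce the same `tree.png` as the `os.system` call, assuming `tree.dot` is in the working directory.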

<b>Tree</b>
<img src="img/tree.png" alt="drawing" width="700"/>