#### **Decision Tree**

---
Estimates how likely an outcome is by splitting the data on its features, choosing each split by the information gain computed from entropy.

> Entropy measures the degree of uncertainty: the more entropy a dataset has, the more uncertain its classification is.

That is, the features with the highest information gain have the greatest influence on the result and are placed closest to the root of the decision tree.
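To make the entropy and gain calculation concrete, here is a minimal sketch in plain Python. It follows the textbook (ID3-style) definition with one branch per feature value; scikit-learn builds binary splits, so its internal computation differs in detail. The toy `risk`, `history`, and `debt` lists are hypothetical and are not taken from the datasets used below.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on a feature."""
    total = len(labels)
    # Group the labels by the feature value of each row.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


# Hypothetical toy data: one feature separates the classes, the other does not.
risk = ["high", "high", "low", "low"]
history = ["bad", "bad", "good", "good"]  # perfectly informative
debt = ["high", "low", "high", "low"]     # uninformative

gain_history = information_gain(history, risk)
gain_debt = information_gain(debt, risk)
print(gain_history, gain_debt)
```

In this sketch `history` removes all uncertainty about `risk` while `debt` removes none, so a tree built on gain would split on `history` first.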

In [1]:
# General imports
from sklearn.tree import DecisionTreeClassifier

##### Credit Risk Dataset

In [19]:
import pickle
# load the serialized data from the opened file
with open('../examples/credit_risk.pkl', 'rb') as f:
  # unpack the encoded features and the class labels
  x_credit_risk, y_credit_risk = pickle.load(f)

In [None]:
# show the encoded features used for training
x_credit_risk

In [None]:
# show the class labels used for training
y_credit_risk

In [None]:
# training algorithm
credit_risk_tree = DecisionTreeClassifier(criterion='entropy')
credit_risk_tree.fit(x_credit_risk, y_credit_risk)

In [None]:
# show the importance of each feature in the fitted tree
credit_risk_tree.feature_importances_

In [None]:
from sklearn import tree
# show the decision tree as text (print so the line breaks render)
print(tree.export_text(credit_risk_tree))

In [None]:
# examples to predict
## good history (0), high debt (0), no collateral (1), income > 35 (2) # expected: "baixo" (low)
## bad history (2), high debt (0), adequate collateral (0), income < 15 (0) # expected: "moderado" (moderate)

# predict with the trained tree
predict = credit_risk_tree.predict([[0,0,1,2], [2,0,0,0]])
# show the predictions
predict

##### Credit Data Dataset - 98.2%

In [21]:
import pickle
# read credit.pkl file
with open('../examples/credit.pkl', 'rb') as f:
  x_credit_trainning, y_credit_trainning, x_credit_test, y_credit_test = pickle.load(f)

In [None]:
# train the decision tree
credit_tree = DecisionTreeClassifier(criterion='entropy', random_state=0)
credit_tree.fit(x_credit_trainning, y_credit_trainning)

In [None]:
# predict the classes of the test set
predict = credit_tree.predict(x_credit_test)
# show the predictions
predict

In [None]:
from sklearn.metrics import accuracy_score, classification_report
# compute the accuracy of the predictions against the credit test labels
accuracy = accuracy_score(y_credit_test, predict)
# show the accuracy
accuracy

In [None]:
# show the classification report (print so the table renders with line breaks)
print(classification_report(y_credit_test, predict))

In [None]:
# show the importance of each feature in the fitted tree
credit_tree.feature_importances_

In [None]:
from sklearn import tree
# show the decision tree as text (print so the line breaks render)
print(tree.export_text(credit_tree))

##### Census Dataset - 81.0%

In [25]:
import pickle
# read census.pkl file
with open('../examples/census.pkl', 'rb') as f:
  x_census_trainning, y_census_trainning, x_census_test, y_census_test = pickle.load(f)

In [None]:
# train the decision tree
census_tree = DecisionTreeClassifier(criterion='entropy', random_state=0)
census_tree.fit(x_census_trainning, y_census_trainning)

In [None]:
# predict the classes of the test set
predict = census_tree.predict(x_census_test)
predict

In [None]:
from sklearn.metrics import accuracy_score, classification_report
# compute the accuracy of the predictions against the census test labels
accuracy = accuracy_score(y_census_test, predict)
accuracy

In [None]:
# show the classification report (print so the table renders with line breaks)
print(classification_report(y_census_test, predict))

In [None]:
# show the importance of each feature in the fitted tree
census_tree.feature_importances_

In [None]:
from sklearn import tree
# show the decision tree as text (print so the line breaks render)
print(tree.export_text(census_tree))

#### **Random Forest**

---

An ensemble learning method that combines many different decision trees to achieve better performance than a single tree.

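It averages the results of the individual trees to reach the final decision; for classification, the predicted class is the one with the highest mean probability across the trees.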
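The averaging step can be sketched in plain Python. This is an illustration of the idea, not scikit-learn's implementation: the per-tree probabilities in `tree_probas` are hypothetical values, and the helper names `average_probabilities` and `forest_predict` are made up for this example.

```python
def average_probabilities(tree_probas):
    """Average the per-class probabilities that several trees give for one sample."""
    n_trees = len(tree_probas)
    n_classes = len(tree_probas[0])
    return [sum(p[c] for p in tree_probas) / n_trees for c in range(n_classes)]


def forest_predict(tree_probas, classes):
    """Return the class with the highest mean probability across the trees."""
    mean = average_probabilities(tree_probas)
    best = max(range(len(classes)), key=lambda c: mean[c])
    return classes[best]


# Hypothetical outputs of three trees for one sample, over the classes [0, 1].
tree_probas = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
classes = [0, 1]

prediction = forest_predict(tree_probas, classes)
print(prediction)
```

Two of the three hypothetical trees favor class `1`, so the averaged probabilities favor it as well, even though one tree disagrees.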

In [31]:
# General imports
from sklearn.ensemble import RandomForestClassifier

##### Credit Data Dataset - 98.2%

In [29]:
import pickle
# read credit.pkl file
with open('../examples/credit.pkl', 'rb') as f:
  x_credit_trainning, y_credit_trainning, x_credit_test, y_credit_test = pickle.load(f)

In [None]:
# create and fit the random forest
random_forest_credit = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=0)
# more trees (n_estimators) usually give more stable predictions, with diminishing returns and longer training time
random_forest_credit.fit(x_credit_trainning, y_credit_trainning)

In [None]:
# predict the classes of the test set
predict = random_forest_credit.predict(x_credit_test)
predict

In [None]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# compute the accuracy of the predictions against the credit test labels
accuracy_score(y_credit_test, predict)

In [None]:
# rows are true classes, columns are predicted classes; the diagonal (same row and column index) counts the correctly classified records
confusion_matrix(y_credit_test, predict)
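How the diagonal of a confusion matrix relates to accuracy can be sketched without any library. The helper `confusion_counts` and the toy `y_true` / `y_pred` labels are hypothetical; the layout (rows for true classes, columns for predicted classes) follows the convention scikit-learn uses.

```python
def confusion_counts(y_true, y_pred, classes):
    """Build a confusion matrix: rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Hypothetical labels for six records.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]

matrix = confusion_counts(y_true, y_pred, classes=[0, 1])

# The diagonal holds the correctly classified records, so accuracy is
# the diagonal sum divided by the number of records.
correct = sum(matrix[i][i] for i in range(len(matrix)))
accuracy = correct / len(y_true)
print(matrix, accuracy)
```

Off-diagonal cells show which classes get confused with each other, which a single accuracy number hides.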

In [None]:
# show the classification report (print so the table renders with line breaks)
print(classification_report(y_credit_test, predict))

##### Census Dataset - 85.1%

In [50]:
import pickle
# read census.pkl file
with open('../examples/census.pkl', 'rb') as f:
  x_census_trainning, y_census_trainning, x_census_test, y_census_test = pickle.load(f)

In [None]:
# create and fit the random forest
random_forest_census = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=0)
# more trees (n_estimators) usually give more stable predictions, with diminishing returns and longer training time
random_forest_census.fit(x_census_trainning, y_census_trainning)

In [None]:
# predict the classes of the test set
predict = random_forest_census.predict(x_census_test)
predict

In [None]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# compute the accuracy of the predictions against the census test labels
accuracy = accuracy_score(y_census_test, predict)
accuracy

In [None]:
# rows are true classes, columns are predicted classes; the diagonal (same row and column index) counts the correctly classified records
confusion_matrix(y_census_test, predict)

In [None]:
# show the classification report (print so the table renders with line breaks)
print(classification_report(y_census_test, predict))