<font color="#CC3D3D"><p>
## Decision Tree Visualization

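For this exercise, install the required software (graphviz.exe) and packages (graphviz, dtreeviz) as follows.
1. Download graphviz-2.38.msi.
2. Run graphviz-2.38.msi to install Graphviz.
3. After installation, edit the Path environment variable as follows:
   - Right-click the Computer icon on the desktop and select Properties.
   - Select Advanced system settings, then click the Environment Variables button.
   - In the System variables section, find and select the Path variable, then click Edit.
   - Click New and, in the entry field that appears, enter exactly `C:\Program Files (x86)\Graphviz2.38\bin`, then click OK.
4. In Anaconda Prompt, run `pip install graphviz dtreeviz`.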
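
Once the steps above are done, a quick check can confirm that the Path change took effect. The sketch below is not part of the original setup. It assumes the Graphviz executable is named `dot` and looks it up using only the standard library; `find_dot` is a hypothetical helper name.

```python
import shutil
import subprocess


def find_dot():
    """Return the full path of the Graphviz `dot` executable, or None if it is not on PATH."""
    return shutil.which("dot")


dot_path = find_dot()
if dot_path is None:
    print("dot was not found on PATH; re-check step 3 and reopen Anaconda Prompt.")
else:
    # `dot -V` reports the installed version (Graphviz writes it to stderr).
    result = subprocess.run([dot_path, "-V"], capture_output=True, text=True)
    print((result.stderr or result.stdout).strip())
```

If the executable is still not found after editing Path, restarting Anaconda Prompt or the notebook server is usually needed, because a running process keeps the environment it started with.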

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
from matplotlib import font_manager, rc
import platform
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from dtreeviz.trees import *

#### Reading data

In [None]:
data = pd.read_csv('purchase_history.csv', encoding='cp949')
data

#### Feature engineering

In [None]:
data = data.fillna(0)
data = pd.get_dummies(data)

X = data.drop(['custid', 'gender'], axis=1)
y = data.gender

X
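
To make the two preprocessing calls above concrete, here is a minimal sketch on a made-up three-row frame. The column names `amount` and `channel` are hypothetical and are not taken from purchase_history.csv. Missing numeric values become 0, and each text column is expanded into one indicator column per category.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the shape of the transformation matters here.
toy = pd.DataFrame({
    "custid": [1, 2, 3],
    "amount": [100.0, np.nan, 250.0],
    "channel": ["online", "store", "online"],
})

encoded = pd.get_dummies(toy.fillna(0))
print(encoded.columns.tolist())
# ['custid', 'amount', 'channel_online', 'channel_store']
```

Numeric columns pass through unchanged, which is why `custid` still has to be dropped explicitly before training.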

#### Data splitting

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123)

#### Modeling

In [None]:
clf = DecisionTreeClassifier(max_depth=3, random_state=123)
clf.fit(X_train, y_train).score(X_test, y_test)

#### Evaluation

In [None]:
x = range(1,10)
y1 = [DecisionTreeClassifier(max_depth=i, random_state=123).fit(X_train, y_train).score(X_train, y_train) for i in x]
y2 = [DecisionTreeClassifier(max_depth=i, random_state=123).fit(X_train, y_train).score(X_test, y_test) for i in x]
plt.plot(x,y1,label='train')
plt.plot(x,y2,label='test')
plt.xlabel('depth of tree')
plt.ylabel('accuracy')
plt.legend()
plt.show()
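
The sweep above fits every tree twice, once for the train score and once for the test score. A minimal sketch of the same idea with a single fit per depth follows. The helper name `depth_sweep` and the synthetic data from `make_classification` are assumptions made so the example runs on its own; they stand in for the notebook's purchase data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def depth_sweep(X_train, X_test, y_train, y_test, depths):
    """Fit one tree per depth and return (train_scores, test_scores)."""
    train_scores, test_scores = [], []
    for depth in depths:
        model = DecisionTreeClassifier(max_depth=depth, random_state=123)
        model.fit(X_train, y_train)
        train_scores.append(model.score(X_train, y_train))
        test_scores.append(model.score(X_test, y_test))
    return train_scores, test_scores


# Synthetic stand-in data (an assumption, not the notebook's dataset).
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=123)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, random_state=123)

depths = range(1, 10)
train_scores, test_scores = depth_sweep(Xtr, Xte, ytr, yte, depths)

# Depth with the highest test accuracy (the first one wins on ties).
best_depth = max(zip(depths, test_scores), key=lambda pair: pair[1])[0]
print("depth with highest test accuracy:", best_depth)
```

The two returned lists can be passed to `plt.plot` exactly as `y1` and `y2` are in the cell above.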

#### Check feature importance

In [None]:
# Font settings so that Korean text renders correctly in charts
your_os = platform.system()
if your_os == 'Linux':
    rc('font', family='NanumGothic')
elif your_os == 'Windows':
    ttf = "c:/Windows/Fonts/malgun.ttf"
    font_name = font_manager.FontProperties(fname=ttf).get_name()
    rc('font', family=font_name)
elif your_os == 'Darwin':
    rc('font', family='AppleGothic')
rc('axes', unicode_minus=False)

In [None]:
plt.figure(figsize=(10,15))
sns.barplot(x=clf.feature_importances_, y=X_train.columns)  # label bars with the columns the model was trained on
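
Feature importances are returned in the same order as the training columns, so pairing them with names and sorting makes the chart easier to read. The sketch below shows one way to do that. It uses the iris dataset as a stand-in (an assumption) so that it runs without purchase_history.csv.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris_demo = load_iris()
model = DecisionTreeClassifier(max_depth=3, random_state=123)
model.fit(iris_demo.data, iris_demo.target)

# Pair each importance with its feature name, then sort from most to least important.
importances = pd.Series(model.feature_importances_, index=iris_demo.feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```

The sorted Series can be plotted with `sns.barplot(x=importances.values, y=importances.index)`.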

#### Visualize decision tree using [graphviz](http://scikit-learn.org/stable/modules/tree.html)

In [None]:
from sklearn.tree import export_graphviz
import graphviz

export_graphviz(clf, out_file="tree.dot",
                feature_names=X_train.columns,
                class_names=['female', 'male'],  # listed in ascending order of the class labels
                filled=True, rounded=True,
                special_characters=True)

with open("tree.dot", encoding='utf-8') as f:  # read as UTF-8 so non-ASCII (Korean) text is preserved
    dot_graph = f.read()

dot = graphviz.Source(dot_graph)
dot.format = 'png'
dot.render(filename='tree', cleanup=True)  # writes tree.png

dot
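
When the intermediate tree.dot file is not needed, `export_graphviz` can return the DOT source directly by passing `out_file=None`. The sketch below illustrates this on the iris dataset, which is used here as a stand-in (an assumption) so that the example is self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris_demo = load_iris()
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(iris_demo.data, iris_demo.target)

# With out_file=None the DOT description is returned as a string instead of written to disk.
dot_source = export_graphviz(model, out_file=None,
                             feature_names=iris_demo.feature_names,
                             class_names=list(iris_demo.target_names),
                             filled=True, rounded=True,
                             special_characters=True)
print(dot_source[:200])
```

The returned string can be handed to `graphviz.Source(dot_source)` in the same way as the text read from tree.dot.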

#### Visualize decision tree using [dtreeviz](https://github.com/parrt/dtreeviz)

In [None]:
import numpy as np  # used below to pick a random observation
from sklearn.datasets import load_iris

In [None]:
classifier = DecisionTreeClassifier(max_depth=2, random_state=0)
iris = load_iris()
classifier.fit(iris.data, iris.target)

In [None]:
viz = dtreeviz(classifier, 
               iris.data, 
               iris.target,
               target_name='variety',
               feature_names=iris.feature_names, 
               class_names=["setosa", "versicolor", "virginica"]  # need class_names for classifier
              )  
              
viz

In [None]:
# Decision tree without scatterplot or histograms for decision nodes

viz = dtreeviz(classifier,
               iris.data, 
               iris.target,
               target_name='variety',
               feature_names=iris.feature_names, 
               class_names=["setosa", "versicolor", "virginica"],
               fancy=False)  # fancy=False to remove histograms/scatterplots from decision nodes
              
viz

In [None]:
# Prediction path

X = iris.data[np.random.randint(0, len(iris.data)),:]  # random sample from training
viz = dtreeviz(classifier,
               iris.data, 
               iris.target,
               target_name='variety',
               feature_names=iris.feature_names, 
               class_names=["setosa", "versicolor", "virginica"],
               orientation ='LR',  # left-right orientation
               X=X)  # need to give single observation for prediction
              
viz
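
The path that dtreeviz highlights can also be inspected numerically. The sketch below uses scikit-learn's `decision_path` and `apply` methods rather than dtreeviz itself, so it is a complementary view and not the library's own mechanism. The fixed random seed is an assumption added for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris_demo = load_iris()
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(iris_demo.data, iris_demo.target)

# Pick one training observation; scikit-learn expects a 2-D array, hence the reshape.
rng = np.random.default_rng(0)
sample = iris_demo.data[rng.integers(0, len(iris_demo.data))].reshape(1, -1)

node_ids = model.decision_path(sample).indices  # ids of the nodes visited, root first
leaf_id = model.apply(sample)[0]                # id of the leaf the sample ends in
predicted = iris_demo.target_names[model.predict(sample)[0]]
print("visited nodes:", node_ids.tolist(), "-> predicted:", predicted)
```

The last entry of `node_ids` is the leaf that determines the prediction, matching the end of the highlighted path in the plot.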

<font color="#CC3D3D"><p>
## End