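## <b>■ Machine Learning Course with Pandas</b>
    1. Pandas DataFrame & Series
    2. Loading external data files into Python
    3. Basic use of the Pandas DataFrame (statistics, visualization)
    4. Exploring data (visualization: matplotlib, seaborn)
    5. Data preprocessing 1
    6. Data preprocessing 2
    7. Implementing machine learning in Python (NCS evaluation = Kaggle ranking)
        - kNN (breast cancer, Titanic)
        - naiveBayes (breast cancer, Titanic)
        - Decision Tree
        - Random Forest
        - Regression analysis (simple / multiple regression)
        - Logistic regression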

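### <b>■ Implementing a Decision Tree in Python</b>
    Decision Tree
    The tree is one of the structures most often used in computer algorithms.
    A decision tree uses this structure and places an attribute of the data being analyzed at each branching point (node).
    At each node, the attribute that best separates the target values is chosen, and new branches are created from the values that attribute takes.
    To choose the best attribute at a node, we measure how well the target values are separated when the data is split on that attribute.
    That measure is entropy.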
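
To make the entropy measure concrete, here is a minimal sketch that computes Shannon entropy and the entropy reduction (information gain) of a candidate split. The helper names `entropy` and `information_gain` and the toy labels are assumptions made for illustration; they are not part of this notebook or of scikit-learn.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, mask):
    """Entropy reduction from splitting labels into labels[mask] and labels[~mask]."""
    labels = np.asarray(labels)
    mask = np.asarray(mask, dtype=bool)
    n = len(labels)
    left, right = labels[mask], labels[~mask]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Hypothetical toy labels: 2 = benign, 4 = malignant
y_toy = np.array([2, 2, 2, 2, 4, 4, 4, 4])
perfect_split = np.array([True, True, True, True, False, False, False, False])
useless_split = np.array([True, False, True, False, True, False, True, False])

print(entropy(y_toy))                          # evenly mixed classes
print(information_gain(y_toy, perfect_split))  # split separates the classes completely
print(information_gain(y_toy, useless_split))  # split leaves both sides evenly mixed
```

A split that separates the classes completely removes all of the entropy, while a split that leaves both sides as mixed as before removes none. The tree prefers the attribute whose split gives the largest reduction.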
    
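#### <b>Example: a decision tree model that predicts malignant vs. benign tumors in the breast cancer data</b>
    % Machine learning data analysis workflow %
    1. Load the data
    2. Explore the data (data preprocessing)
    3. Normalize the data
    4. Split the data into training data and test data.
    5. Build a decision tree model with the training data.
    6. Check the model's performance with the test data.
    7. Improve the model's performance.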

In [2]:
# -*- coding: utf-8 -*-
### Load the basic libraries
import pandas as pd
import numpy as np
'''
[Step 1] Prepare the data / basic setup
'''
# Load the Breast Cancer dataset (source: UCI ML Repository)
uci_path = 'https://archive.ics.uci.edu/ml/machine-learning-databases/\
breast-cancer-wisconsin/breast-cancer-wisconsin.data'
df = pd.read_csv(uci_path, header=None)
# Assign column names
df.columns = ['id','clump','cell_size','cell_shape', 'adhesion','epithlial',
 'bare_nuclei','chromatin','normal_nucleoli', 'mitoses', 'class']
df

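|  | id | clump | cell_size | cell_shape | adhesion | epithlial | bare_nuclei | chromatin | normal_nucleoli | mitoses | class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000025 | 5 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1 | 2 |
| 1 | 1002945 | 5 | 4 | 4 | 5 | 7 | 10 | 3 | 2 | 1 | 2 |
| 2 | 1015425 | 3 | 1 | 1 | 1 | 2 | 2 | 3 | 1 | 1 | 2 |
| 3 | 1016277 | 6 | 8 | 8 | 1 | 3 | 4 | 3 | 7 | 1 | 2 |
| 4 | 1017023 | 4 | 1 | 1 | 3 | 2 | 1 | 3 | 1 | 1 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 694 | 776715 | 3 | 1 | 1 | 1 | 3 | 2 | 1 | 1 | 1 | 2 |
| 695 | 841769 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 |
| 696 | 888820 | 5 | 10 | 10 | 3 | 7 | 3 | 8 | 10 | 2 | 4 |
| 697 | 897471 | 4 | 8 | 6 | 4 | 3 | 4 | 10 | 6 | 1 | 4 |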


In [3]:
# IPython display setting - raise the limit on the number of columns shown
pd.set_option('display.max_columns', 15)
'''
[Step 2] Explore the data
'''
# Look at the data
print(df.head())
print('\n')
# Check the data types
print(df.info())
print('\n')

        id  clump  cell_size  cell_shape  adhesion  epithlial bare_nuclei  \
0  1000025      5          1           1         1          2           1   
1  1002945      5          4           4         5          7          10   
2  1015425      3          1           1         1          2           2   
3  1016277      6          8           8         1          3           4   
4  1017023      4          1           1         3          2           1   

   chromatin  normal_nucleoli  mitoses  class  
0          3                1        1      2  
1          3                2        1      2  
2          3                1        1      2  
3          3                7        1      2  
4          3                1        1      2  


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 699 entries, 0 to 698
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   id               699 non-null    int64 
 1   cl

    Only the bare_nuclei column has dtype object. This column must be converted to an integer (numeric) type.

In [4]:
# Check the summary statistics of the data
print(df.describe())
print('\n')

                 id       clump   cell_size  cell_shape    adhesion  \
count  6.990000e+02  699.000000  699.000000  699.000000  699.000000   
mean   1.071704e+06    4.417740    3.134478    3.207439    2.806867   
std    6.170957e+05    2.815741    3.051459    2.971913    2.855379   
min    6.163400e+04    1.000000    1.000000    1.000000    1.000000   
25%    8.706885e+05    2.000000    1.000000    1.000000    1.000000   
50%    1.171710e+06    4.000000    1.000000    1.000000    1.000000   
75%    1.238298e+06    6.000000    5.000000    5.000000    4.000000   
max    1.345435e+07   10.000000   10.000000   10.000000   10.000000   

        epithlial   chromatin  normal_nucleoli     mitoses       class  
count  699.000000  699.000000       699.000000  699.000000  699.000000  
mean     3.216023    3.437768         2.866953    1.589413    2.689557  
std      2.214300    2.438364         3.053634    1.715078    0.951273  
min      1.000000    1.000000         1.000000    1.000000    2.0000

In [5]:
# Change the dtype of the bare_nuclei column (string -> number)
print(df['bare_nuclei'].unique()) # check the unique values of the bare_nuclei column
print('\n')

['1' '10' '2' '4' '3' '9' '7' '?' '5' '8' '6']




In [8]:
df['bare_nuclei'] = df['bare_nuclei'].replace('?', np.nan) # replace '?' with np.nan
df.dropna(subset=['bare_nuclei'], axis=0, inplace=True) # drop the rows with missing data
df['bare_nuclei'] = df['bare_nuclei'].astype('int') # convert the strings to integers
print(df.info())
print('\n')

<class 'pandas.core.frame.DataFrame'>
Int64Index: 683 entries, 0 to 698
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   id               683 non-null    int64
 1   clump            683 non-null    int64
 2   cell_size        683 non-null    int64
 3   cell_shape       683 non-null    int64
 4   adhesion         683 non-null    int64
 5   epithlial        683 non-null    int64
 6   bare_nuclei      683 non-null    int32
 7   chromatin        683 non-null    int64
 8   normal_nucleoli  683 non-null    int64
 9   mitoses          683 non-null    int64
 10  class            683 non-null    int64
dtypes: int32(1), int64(10)
memory usage: 61.4 KB
None
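
The same cleanup can also be written with `pd.to_numeric`, which converts anything that is not a number to NaN without naming the placeholder explicitly. The sketch below runs on a small hypothetical column (`toy`) that imitates `bare_nuclei`; it is an alternative under that assumption, not the method used in the cell above.

```python
import pandas as pd

# Hypothetical toy column imitating bare_nuclei: digit strings plus '?' for missing values
toy = pd.DataFrame({'bare_nuclei': ['1', '10', '?', '4', '2']})

# errors='coerce' turns every value that cannot be parsed as a number into NaN
toy['bare_nuclei'] = pd.to_numeric(toy['bare_nuclei'], errors='coerce')
n_missing = int(toy['bare_nuclei'].isnull().sum())

# Drop the rows that became NaN, then cast the column to an integer dtype
toy = toy.dropna(subset=['bare_nuclei']).astype({'bare_nuclei': 'int64'})

print('rows with a non-numeric value:', n_missing)
print(toy['bare_nuclei'].tolist())
print(toy['bare_nuclei'].dtype)
```

Counting the NaN values before dropping them is a quick way to see how many rows the cleanup will remove.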




In [9]:
'''
[Step 3] Split the dataset - training (train data) / evaluation (test data)
'''
# The id column is not needed, so exclude it and select the columns needed for training.
# Select the attributes (variables)
X=df[['clump','cell_size','cell_shape', 'adhesion','epithlial',
 'bare_nuclei','chromatin','normal_nucleoli', 'mitoses']] # explanatory variables X
y=df['class'] # target variable y
# Standardize the explanatory variables
from sklearn import preprocessing
X = preprocessing.StandardScaler().fit(X).transform(X)
# Convert the NumPy array to a DataFrame to check the summary statistics.
pd.DataFrame(X).describe()

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| count | 683.0 | 683.0 | 683.0 | 683.0 | 683.0 | 683.0 | 683.0 | 683.0 | 683.0 |
| mean | 2.813757e-16 | -2.899909e-16 | -4.431139e-16 | 2.144047e-16 | -5.439768e-16 | -1.563415e-15 | -2.223697e-16 | -7.737425e-17 | 1.050404e-15 |
| std | 1.000733 | 1.000733 | 1.000733 | 1.000733 | 1.000733 | 1.000733 | 1.000733 | 1.000733 | 1.000733 |
| min | -1.221191 | -0.702212 | -0.7417736 | -0.6393655 | -1.005763 | -0.6988531 | -0.9988531 | -0.6129274 | -0.3483997 |
| 25% | -0.8664174 | -0.702212 | -0.7417736 | -0.6393655 | -0.5556085 | -0.6988531 | -0.5903401 | -0.6129274 | -0.3483997 |
| 50% | -0.1568693 | -0.702212 | -0.7417736 | -0.6393655 | -0.5556085 | -0.6988531 | -0.1818272 | -0.6129274 | -0.3483997 |
| 75% | 0.5526787 | 0.6037398 | 0.5976352 | 0.4086824 | 0.3447014 | 0.6743249 | 0.6351988 | 0.3705403 | -0.3483997 |
| max | 1.971775 | 2.23618 | 2.271896 | 2.504778 | 3.045631 | 1.772867 | 2.677764 | 2.337476 | 4.84969 |


In [10]:
# Split into train data and test data (7:3 ratio)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10)
print('train data shape: ', X_train.shape)
print('test data shape: ', X_test.shape)
print('\n')

train data shape:  (478, 9)
test data shape:  (205, 9)




In [11]:
'''
[Step 4] Decision Tree classification model - using sklearn
'''
# Import the Decision Tree classification model from the sklearn library
from sklearn import tree
# Create the model object (with criterion='entropy')
tree_model = tree.DecisionTreeClassifier(criterion='entropy', max_depth=5)

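    Explanation: criterion='entropy' means that entropy is used as the criterion for evaluating how well a split separates the classes.
     max_depth=5 sets the tree level to 5, so the branches are expanded at most 5 levels deep.
     The more levels the tree has, the more accurately it predicts the training data used to fit the model.
     However, the model can become over-optimized for the training data alone, so its ability to predict real, unseen data drops. (This is the overfitting problem.)
     Finding an appropriate max_depth is the data analyst's job in a machine learning analysis.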
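
The trade-off described above can be observed by fitting trees of increasing depth and comparing training accuracy with test accuracy. The following is a minimal sketch under one loud assumption: it uses scikit-learn's bundled `load_breast_cancer` dataset as a stand-in, which is a different breast cancer dataset from the UCI file loaded in this notebook. The names `X_demo`, `y_demo`, and `results` are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): scikit-learn's bundled breast cancer dataset,
# not the UCI file used elsewhere in this notebook.
X_demo, y_demo = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=10)

results = []
for depth in [1, 2, 3, 5, 10, None]:  # None lets the tree grow until every leaf is pure
    model = DecisionTreeClassifier(criterion='entropy', max_depth=depth, random_state=0)
    model.fit(X_tr, y_tr)
    # Record accuracy on the data the tree was fit on and on the held-out data
    results.append((depth, model.score(X_tr, y_tr), model.score(X_te, y_te)))

for depth, train_acc, test_acc in results:
    print(f'max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}')
```

Training accuracy climbs toward 1.0 as the depth limit is relaxed, while test accuracy need not follow it. A widening gap between the two columns is the usual sign of overfitting.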

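    Interview question:
        8. Explain the advantages and disadvantages of a decision tree whose depth becomes large.

         Answer: max_depth=5 sets the tree level to 5, so the branches are expanded at most 5 levels deep.
         The more levels the tree has, the more accurately it predicts the training data used to fit the model.
         However, the model can become over-optimized for the training data alone, so its ability to predict real, unseen data drops. (This is the overfitting problem.)
         Finding an appropriate max_depth is the data analyst's job in a machine learning analysis.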

In [12]:
# Fit the model with the train data
tree_model.fit(X_train, y_train)
# Predict (classify) y_hat with the test data
y_hat = tree_model.predict(X_test) # 2: benign, 4: malignant
print(y_hat[0:10])
print(y_test.values[0:10])
print('\n')

[4 4 4 4 4 4 2 2 4 4]
[4 4 4 4 4 4 2 2 4 4]




In [13]:
# Evaluate model performance - compute the Confusion Matrix
from sklearn import metrics 
tree_matrix = metrics.confusion_matrix(y_test, y_hat) 
print(tree_matrix)
print('\n')

[[127   4]
 [  2  72]]




In [14]:
# Evaluate model performance - compute the evaluation metrics
tree_report = metrics.classification_report(y_test, y_hat) 
print(tree_report)

              precision    recall  f1-score   support

           2       0.98      0.97      0.98       131
           4       0.95      0.97      0.96        74

    accuracy                           0.97       205
   macro avg       0.97      0.97      0.97       205
weighted avg       0.97      0.97      0.97       205
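
The confusion matrix and the report above can be connected by hand. In scikit-learn's `confusion_matrix`, rows are the true classes and columns are the predicted classes, in sorted label order (here 2, then 4). The sketch below hard-codes the matrix printed above and treats malignant (4) as the positive class, which is an assumption about which error matters most.

```python
import numpy as np

# Matrix copied from the output above: rows = true class, columns = predicted class,
# in label order [2 (benign), 4 (malignant)].
cm = np.array([[127, 4],
               [2, 72]])

# With malignant (4) as the positive class, ravel() yields TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)   # of the tumors predicted malignant, the share that are malignant
recall = tp / (tp + fn)      # of the truly malignant tumors, the share that were caught

print(f'TN={tn}, FP={fp}, FN={fn}, TP={tp}')
print(f'accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}')
# -> TN=127, FP=4, FN=2, TP=72
# -> accuracy=0.97, precision=0.95, recall=0.97
```

These values match the row for class 4 in the report. The 2 false negatives are malignant tumors that the model classified as benign.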



### ※ Problem 230. Use value_counts on class to check how many rows each class has.

In [15]:
df['class'].value_counts()

2    444
4    239
Name: class, dtype: int64

### ※ Problem 231. Find the max_depth that makes FN equal to 0, so that the decision tree model above could be used at Asan Medical Center.

In [16]:
'''
[Step 4] Decision Tree classification model - using sklearn
'''
# Import the Decision Tree classification model from the sklearn library
from sklearn import tree
# Create the model object (with criterion='entropy')
tree_model = tree.DecisionTreeClassifier(criterion='entropy', max_depth=4)
# Fit the model with the train data
tree_model.fit(X_train, y_train)
# Predict (classify) y_hat with the test data
y_hat = tree_model.predict(X_test) # 2: benign, 4: malignant
print(y_hat[0:10])
print(y_test.values[0:10])
print('\n')
# Evaluate model performance - compute the Confusion Matrix
from sklearn import metrics
tree_matrix = metrics.confusion_matrix(y_test, y_hat)
print(tree_matrix)
print('\n')
# Evaluate model performance - compute the evaluation metrics
tree_report = metrics.classification_report(y_test, y_hat)
print(tree_report)

[4 4 4 4 4 4 2 2 4 4]
[4 4 4 4 4 4 2 2 4 4]


[[126   5]
 [  0  74]]


              precision    recall  f1-score   support

           2       1.00      0.96      0.98       131
           4       0.94      1.00      0.97        74

    accuracy                           0.98       205
   macro avg       0.97      0.98      0.97       205
weighted avg       0.98      0.98      0.98       205
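
Instead of changing max_depth by hand, the search for a depth with FN = 0 can be written as a loop. The sketch below is one possible way to do it. The helper `false_negatives_by_depth` is hypothetical, and the data is scikit-learn's bundled `load_breast_cancer` dataset used as a stand-in (an assumption: in that dataset the malignant class is label 0, unlike label 4 in this notebook).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def false_negatives_by_depth(X_train, X_test, y_train, y_test, positive, depths):
    """Return {max_depth: FN}, where FN counts true `positive` rows predicted as another class."""
    labels = sorted(set(y_train))
    pos = labels.index(positive)
    out = {}
    for depth in depths:
        model = DecisionTreeClassifier(criterion='entropy', max_depth=depth, random_state=0)
        model.fit(X_train, y_train)
        cm = confusion_matrix(y_test, model.predict(X_test), labels=labels)
        # Row `pos` holds the truly positive rows; everything off the diagonal is a miss
        out[depth] = int(cm[pos].sum() - cm[pos, pos])
    return out

# Stand-in data (assumption): bundled dataset where 0 = malignant, 1 = benign
X_demo, y_demo = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=10)

fn_counts = false_negatives_by_depth(X_tr, X_te, y_tr, y_te, positive=0, depths=range(1, 11))
for depth, fn in fn_counts.items():
    print(f'max_depth={depth}: FN={fn}')
print('depths with FN == 0:', [d for d, fn in fn_counts.items() if fn == 0])
```

With the notebook's own split, the call would presumably pass `positive=4`. The list of depths with FN = 0 may be empty for a given split, in which case no depth in the range meets the requirement.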



### ※ Problem 232. Build a decision tree model with seaborn's Titanic data and check the accuracy on the test data.

In [17]:
import seaborn as sns
tt = sns.load_dataset('titanic')
pd.set_option('display.max_columns', 15)
'''
[Step 2] Explore the data
'''
tt.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dtype   
---  ------       --------------  -----   
 0   survived     891 non-null    int64   
 1   pclass       891 non-null    int64   
 2   sex          891 non-null    object  
 3   age          714 non-null    float64 
 4   sibsp        891 non-null    int64   
 5   parch        891 non-null    int64   
 6   fare         891 non-null    float64 
 7   embarked     889 non-null    object  
 8   class        891 non-null    category
 9   who          891 non-null    object  
 10  adult_male   891 non-null    bool    
 11  deck         203 non-null    category
 12  embark_town  889 non-null    object  
 13  alive        891 non-null    object  
 14  alone        891 non-null    bool    
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.6+ KB


In [18]:
tt.describe()

|  | survived | pclass | age | sibsp | parch | fare |
|---|---|---|---|---|---|---|
| count | 891.0 | 891.0 | 714.0 | 891.0 | 891.0 | 891.0 |
| mean | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
| std | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
| min | 0.0 | 1.0 | 0.42 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 2.0 | 20.125 | 0.0 | 0.0 | 7.9104 |
| 50% | 0.0 | 3.0 | 28.0 | 0.0 | 0.0 | 14.4542 |
| 75% | 1.0 | 3.0 | 38.0 | 1.0 | 0.0 | 31.0 |
| max | 1.0 | 3.0 | 80.0 | 8.0 | 6.0 | 512.3292 |


In [20]:
rdf = tt.drop(['deck', 'embark_town'], axis=1)
rdf = rdf.dropna(subset=['age'], how='any', axis=0)
ndf = rdf[['survived', 'pclass', 'sex', 'age','sibsp', 'parch', 'embarked']]
gender = pd.get_dummies(ndf['sex'])
ndf = pd.concat([ndf, gender], axis=1)
onehot_embarked = pd.get_dummies(ndf['embarked'], prefix='town')
ndf = pd.concat([ndf, onehot_embarked], axis=1)
ndf.drop(['sex', 'embarked'], axis=1, inplace=True)
'''
[Step 3] Split the dataset - training (train data) / evaluation (test data)
'''
X = ndf[['pclass', 'age', 'sibsp', 'parch', 'female', 'male', 'town_C', 'town_Q', 'town_S']]
y = ndf['survived']
# Standardize the explanatory variables
from sklearn import preprocessing
X = preprocessing.StandardScaler().fit(X).transform(X)
pd.DataFrame(X).describe()
# Split into train data and test data (7:3 ratio)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10)
print('train data shape: ', X_train.shape)
print('test data shape: ', X_test.shape)
print('\n')

train data shape:  (499, 9)
test data shape:  (215, 9)




In [21]:
'''
[Step 4] Decision Tree classification model - using sklearn
'''
# Import the Decision Tree classification model from the sklearn library
from sklearn import tree
# Create the model object (with criterion='entropy')
tree_model = tree.DecisionTreeClassifier(criterion='entropy', max_depth=5)
# Fit the model with the train data
tree_model.fit(X_train, y_train)
# Predict (classify) y_hat with the test data
y_hat = tree_model.predict(X_test) # 0: did not survive, 1: survived
print(y_hat[0:10])
print(y_test.values[0:10])
print('\n')
# Evaluate model performance - compute the Confusion Matrix
from sklearn import metrics
tree_matrix = metrics.confusion_matrix(y_test, y_hat)
print(tree_matrix)
print('\n')
# Evaluate model performance - compute the evaluation metrics
tree_report = metrics.classification_report(y_test, y_hat)
print(tree_report)

[0 0 1 0 0 1 1 0 0 0]
[0 0 1 0 0 1 1 1 0 0]


[[120   5]
 [ 35  55]]


              precision    recall  f1-score   support

           0       0.77      0.96      0.86       125
           1       0.92      0.61      0.73        90

    accuracy                           0.81       215
   macro avg       0.85      0.79      0.80       215
weighted avg       0.83      0.81      0.81       215



In [23]:
import seaborn as sns
tt = sns.load_dataset('titanic')
pd.set_option('display.max_columns', 15)
'''
[Step 2] Explore the data
'''
tt.info()

rdf = tt.drop(['deck', 'embark_town'], axis=1)

# Case where the missing age values are filled with the most frequent value
freq = rdf['age'].value_counts(dropna=True).idxmax()
rdf['age'] = rdf['age'].fillna(freq)

ndf = rdf[['survived', 'pclass', 'sex', 'age','sibsp', 'parch', 'embarked']]
gender = pd.get_dummies(ndf['sex'])

ndf = pd.concat([ndf, gender], axis=1)
onehot_embarked = pd.get_dummies(ndf['embarked'], prefix='town')

ndf = pd.concat([ndf, onehot_embarked], axis=1)
ndf.drop(['sex', 'embarked'], axis=1, inplace=True)

X = ndf[['pclass', 'age', 'sibsp', 'parch', 'female', 'male', 'town_C', 'town_Q', 'town_S']]
y = ndf['survived']

from sklearn import preprocessing
X = preprocessing.StandardScaler().fit(X).transform(X)
pd.DataFrame(X).describe()

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10) 

from sklearn import tree
tree_model = tree.DecisionTreeClassifier(criterion='entropy', max_depth=5)
tree_model.fit(X_train, y_train) 
y_hat = tree_model.predict(X_test) # 0: did not survive, 1: survived

from sklearn import metrics 
tree_matrix = metrics.confusion_matrix(y_test, y_hat) 
print(tree_matrix)
print('\n')

tree_report = metrics.classification_report(y_test, y_hat) 
print(tree_report)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dtype   
---  ------       --------------  -----   
 0   survived     891 non-null    int64   
 1   pclass       891 non-null    int64   
 2   sex          891 non-null    object  
 3   age          714 non-null    float64 
 4   sibsp        891 non-null    int64   
 5   parch        891 non-null    int64   
 6   fare         891 non-null    float64 
 7   embarked     889 non-null    object  
 8   class        891 non-null    category
 9   who          891 non-null    object  
 10  adult_male   891 non-null    bool    
 11  deck         203 non-null    category
 12  embark_town  889 non-null    object  
 13  alive        891 non-null    object  
 14  alone        891 non-null    bool    
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.6+ KB
[[152  22]
 [ 22  72]]


              precision    recall  f1-score   support


In [24]:
### Instructor's solution ###
from sklearn import metrics
import numpy as np
import pandas as pd
import seaborn as sns
df = sns.load_dataset('titanic')
pd.set_option('display.max_columns',15)

mask4 = (df.age<10) | (df.sex=='female') 
df['child_women']=mask4.astype(int)

rdf = df.drop(['deck','embark_town'], axis =1)

# how='any' drops a row if any value in the subset is missing
# (how='all' would drop it only if all of them are missing)
rdf = rdf.dropna( subset=['age'], how='any', axis=0)

most_freq = rdf['embarked'].value_counts().idxmax()
rdf['embarked'] = rdf['embarked'].fillna(most_freq)

ndf = rdf[['survived','pclass','sex','age','sibsp','parch','embarked','child_women']]

gender = pd.get_dummies(ndf['sex'])
ndf = pd.concat([ndf,gender], axis= 1)

onehot_embarked = pd.get_dummies(ndf['embarked'])
ndf = pd.concat([ndf,onehot_embarked],axis=1)

ndf.drop(['sex','embarked'], axis=1, inplace = True)

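# Independent variables (x) and dependent variable (y)
x = ndf[ ['pclass', 'age' ,'sibsp', 'parch' ,'female' ,'male', 'C' ,'Q' ,'S', 'child_women'] ]
y = ndf['survived'] # dependent variable

from sklearn import preprocessing
# Note: the split below uses the unscaled x, so the scaled X is never used.
# Decision tree splits do not depend on feature scaling.
X = preprocessing.StandardScaler().fit(x).transform(x)

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(x,y,test_size = 0.3,
 random_state = 33)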

from sklearn import tree
tree_model = tree.DecisionTreeClassifier(criterion='entropy', max_depth=5)
tree_model.fit( X_train, y_train )

y_hat = tree_model.predict( X_test )

from sklearn import metrics
tree_matrix = metrics.confusion_matrix( y_test, y_hat )
print( tree_matrix )
tn, fp, fn, tp = metrics.confusion_matrix( y_test, y_hat ).ravel()
f1_report = metrics.classification_report( y_test, y_hat )

print( f1_report )
#print(np.array([[tp,fp],[fn,tn]]))

from sklearn.metrics import accuracy_score
accuracy = accuracy_score( y_test, y_hat)
print(accuracy)

[[109  17]
 [ 22  67]]
              precision    recall  f1-score   support

           0       0.83      0.87      0.85       126
           1       0.80      0.75      0.77        89

    accuracy                           0.82       215
   macro avg       0.81      0.81      0.81       215
weighted avg       0.82      0.82      0.82       215

0.8186046511627907


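    Things to watch for when building a decision tree model from seaborn's Titanic data
        1. There must be no missing values; otherwise fitting the model can raise an error.
        2. There must be no categorical (nominal) data; everything must be converted to numeric.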
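
Both conditions can be checked and fixed before fitting. The sketch below works on a small hypothetical frame (`toy`) shaped like a few Titanic columns: it counts missing values, drops or fills them, and one-hot encodes the categorical columns with `pd.get_dummies`. The column choices and fill strategy are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows shaped like a few Titanic columns
toy = pd.DataFrame({
    'age': [22.0, np.nan, 35.0, 4.0],
    'sex': ['male', 'female', 'female', 'male'],
    'embarked': ['S', 'C', np.nan, 'S'],
})

# 1. Missing values: count them per column, then drop or fill
print(toy.isnull().sum())
toy = toy.dropna(subset=['age'])                       # drop rows with no age
most_freq = toy['embarked'].value_counts().idxmax()    # most frequent port
toy = toy.assign(embarked=toy['embarked'].fillna(most_freq))

# 2. Categorical data: one-hot encode so that every column is numeric
encoded = pd.get_dummies(toy, columns=['sex', 'embarked'], dtype=int)

print(encoded.columns.tolist())
print('missing values left:', int(encoded.isnull().sum().sum()))
print('all columns numeric:', all(pd.api.types.is_numeric_dtype(t) for t in encoded.dtypes))
```

Running the two final checks right before `fit` catches a leftover NaN or string column early, with a clearer message than the error raised inside the model.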

### ※ Problem 233. Use a for loop to find which max_depth gives the highest accuracy.

In [27]:
from sklearn import metrics
from sklearn import preprocessing
from sklearn import tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import numpy as np

# Step 1: load the data into a DataFrame
import pandas as pd
import seaborn as sns
df = sns.load_dataset('titanic')
pd.set_option('display.max_columns',15)

mask4 = (df.age<10) | (df.sex=='female')
df['child_women']=mask4.astype(int)

rdf = df.drop(['deck','embark_town'], axis =1)

# Replace every missing value in the age column with the most frequent age.
freq = rdf['age'].value_counts(dropna=True).idxmax()
rdf['age'] = rdf['age'].fillna(freq)

most_freq = rdf['embarked'].value_counts().idxmax()
rdf['embarked'] = rdf['embarked'].fillna(most_freq)

ndf = rdf[['survived','pclass','sex','age','sibsp','parch','embarked','child_women']]

gender = pd.get_dummies(ndf['sex'])
ndf = pd.concat([ndf,gender], axis= 1)
onehot_embarked = pd.get_dummies(ndf['embarked'])
ndf = pd.concat([ndf,onehot_embarked],axis=1)
ndf.drop(['sex','embarked'], axis=1, inplace = True)

x = ndf[ ['pclass', 'age' ,'sibsp', 'parch' ,'female' ,'male', 'C' ,'Q' ,'S', 'child_women'] ]
y = ndf['survived'] # dependent variable

# Note: the split below uses the unscaled x, so the scaled X is never used.
X = preprocessing.StandardScaler().fit(x).transform(x)

# Best accuracy seen so far (named best_accuracy to avoid shadowing the built-in max)
best_accuracy = 0

for i in range(1,50) :
    print('random_state:', i)
    X_train,X_test,y_train,y_test = train_test_split(x,y,test_size = 0.3, random_state = i)

    for k in range(1,50) :
        print('max_depth: ', k)
        # Create the model object (with criterion='entropy')
        tree_model = tree.DecisionTreeClassifier(criterion='entropy', max_depth=k)
        tree_model.fit( X_train, y_train )

        y_hat = tree_model.predict( X_test )

        tree_matrix = metrics.confusion_matrix( y_test, y_hat )
        print( tree_matrix )

        accuracy = accuracy_score( y_test, y_hat)
        print(accuracy, '\n')

        # Report each new best accuracy together with the settings that produced it
        if accuracy > best_accuracy :
            best_accuracy = accuracy
            print('random_state: ', i, 'max_depth: ', k, 'accuracy: ', accuracy)

random_state: 1
max_depth:  1
[[125  28]
 [ 39  76]]
0.75 

random_state:  1 max_depth:  1 accuracy:  0.75
max_depth:  2
[[150   3]
 [ 63  52]]
0.753731343283582 

random_state:  1 max_depth:  2 accuracy:  0.753731343283582
max_depth:  3
[[133  20]
 [ 41  74]]
0.7723880597014925 

random_state:  1 max_depth:  3 accuracy:  0.7723880597014925
max_depth:  4
[[132  21]
 [ 37  78]]
0.7835820895522388 

random_state:  1 max_depth:  4 accuracy:  0.7835820895522388
max_depth:  5
[[135  18]
 [ 41  74]]
0.7798507462686567 

max_depth:  6
[[135  18]
 [ 42  73]]
0.7761194029850746 

max_depth:  7
[[132  21]
 [ 42  73]]
0.7649253731343284 

max_depth:  8
[[133  20]
 [ 49  66]]
0.7425373134328358 

max_depth:  9
[[133  20]
 [ 50  65]]
0.7388059701492538 

max_depth:  10
[[130  23]
 [ 48  67]]
0.7350746268656716 

max_depth:  11
[[131  22]
 [ 47  68]]
0.7425373134328358 

max_depth:  12
[[128  25]
 [ 45  70]]
0.7388059701492538 

max_depth:  13
[[128  25]
 [ 45  70]]
0.7388059701492538 

max_depth:  

0.7910447761194029 

max_depth:  42
[[135  29]
 [ 30  74]]
0.7798507462686567 

max_depth:  43
[[136  28]
 [ 31  73]]
0.7798507462686567 

max_depth:  44
[[135  29]
 [ 29  75]]
0.7835820895522388 

max_depth:  45
[[134  30]
 [ 30  74]]
0.7761194029850746 

max_depth:  46
[[136  28]
 [ 31  73]]
0.7798507462686567 

max_depth:  47
[[136  28]
 [ 29  75]]
0.7873134328358209 

max_depth:  48
[[133  31]
 [ 30  74]]
0.7723880597014925 

max_depth:  49
[[135  29]
 [ 30  74]]
0.7798507462686567 

random_state: 4
max_depth:  1
[[153  25]
 [ 23  67]]
0.8208955223880597 

random_state:  4 max_depth:  1 accuracy:  0.8208955223880597
max_depth:  2
[[177   1]
 [ 42  48]]
0.8395522388059702 

random_state:  4 max_depth:  2 accuracy:  0.8395522388059702
max_depth:  3
[[155  23]
 [ 22  68]]
0.832089552238806 

max_depth:  4
[[164  14]
 [ 37  53]]
0.8097014925373134 

max_depth:  5
[[159  19]
 [ 28  62]]
0.8246268656716418 

max_depth:  6
[[160  18]
 [ 26  64]]
0.835820895522388 

max_depth:  7
[[160  18

[[153  21]
 [ 30  64]]
0.8097014925373134 

max_depth:  35
[[153  21]
 [ 30  64]]
0.8097014925373134 

max_depth:  36
[[154  20]
 [ 31  63]]
0.8097014925373134 

max_depth:  37
[[153  21]
 [ 30  64]]
0.8097014925373134 

max_depth:  38
[[154  20]
 [ 31  63]]
0.8097014925373134 

max_depth:  39
[[154  20]
 [ 30  64]]
0.8134328358208955 

max_depth:  40
[[153  21]
 [ 31  63]]
0.8059701492537313 

max_depth:  41
[[153  21]
 [ 30  64]]
0.8097014925373134 

max_depth:  42
[[154  20]
 [ 30  64]]
0.8134328358208955 

max_depth:  43
[[153  21]
 [ 31  63]]
0.8059701492537313 

max_depth:  44
[[153  21]
 [ 31  63]]
0.8059701492537313 

max_depth:  45
[[153  21]
 [ 30  64]]
0.8097014925373134 

max_depth:  46
[[153  21]
 [ 30  64]]
0.8097014925373134 

max_depth:  47
[[153  21]
 [ 30  64]]
0.8097014925373134 

max_depth:  48
[[154  20]
 [ 30  64]]
0.8134328358208955 

max_depth:  49
[[154  20]
 [ 31  63]]
0.8097014925373134 

random_state: 7
max_depth:  1
[[125  31]
 [ 38  74]]
0.7425373134328358

0.7761194029850746 

max_depth:  40
[[147  15]
 [ 42  64]]
0.7873134328358209 

max_depth:  41
[[146  16]
 [ 43  63]]
0.7798507462686567 

max_depth:  42
[[146  16]
 [ 42  64]]
0.7835820895522388 

max_depth:  43
[[147  15]
 [ 43  63]]
0.7835820895522388 

max_depth:  44
[[146  16]
 [ 44  62]]
0.7761194029850746 

max_depth:  45
[[146  16]
 [ 43  63]]
0.7798507462686567 

max_depth:  46
[[146  16]
 [ 44  62]]
0.7761194029850746 

max_depth:  47
[[146  16]
 [ 44  62]]
0.7761194029850746 

max_depth:  48
[[146  16]
 [ 43  63]]
0.7798507462686567 

max_depth:  49
[[146  16]
 [ 43  63]]
0.7798507462686567 

random_state: 10
max_depth:  1
[[148  26]
 [ 20  74]]
0.8283582089552238 

max_depth:  2
[[173   1]
 [ 40  54]]
0.8470149253731343 

max_depth:  3
[[152  22]
 [ 24  70]]
0.8283582089552238 

max_depth:  4
[[155  19]
 [ 24  70]]
0.8395522388059702 

max_depth:  5
[[151  23]
 [ 23  71]]
0.8283582089552238 

max_depth:  6
[[149  25]
 [ 22  72]]
0.8246268656716418 

max_depth:  7
[[149  25]

0.7947761194029851 

max_depth:  37
[[143  18]
 [ 38  69]]
0.7910447761194029 

max_depth:  38
[[143  18]
 [ 38  69]]
0.7910447761194029 

max_depth:  39
[[142  19]
 [ 38  69]]
0.7873134328358209 

max_depth:  40
[[142  19]
 [ 38  69]]
0.7873134328358209 

max_depth:  41
[[141  20]
 [ 38  69]]
0.7835820895522388 

max_depth:  42
[[144  17]
 [ 44  63]]
0.7723880597014925 

max_depth:  43
[[141  20]
 [ 43  64]]
0.7649253731343284 

max_depth:  44
[[143  18]
 [ 38  69]]
0.7910447761194029 

max_depth:  45
[[142  19]
 [ 38  69]]
0.7873134328358209 

max_depth:  46
[[143  18]
 [ 43  64]]
0.7723880597014925 

max_depth:  47
[[144  17]
 [ 38  69]]
0.7947761194029851 

max_depth:  48
[[144  17]
 [ 38  69]]
0.7947761194029851 

max_depth:  49
[[142  19]
 [ 38  69]]
0.7873134328358209 

random_state: 13
max_depth:  1
[[135  36]
 [ 23  74]]
0.7798507462686567 

max_depth:  2
[[135  36]
 [ 23  74]]
0.7798507462686567 

max_depth:  3
[[143  28]
 [ 26  71]]
0.7985074626865671 

max_depth:  4
[[157  

[[118  34]
 [ 25  91]]
0.7798507462686567 

max_depth:  46
[[119  33]
 [ 25  91]]
0.7835820895522388 

max_depth:  47
[[119  33]
 [ 25  91]]
0.7835820895522388 

max_depth:  48
[[118  34]
 [ 25  91]]
0.7798507462686567 

max_depth:  49
[[119  33]
 [ 25  91]]
0.7835820895522388 

random_state: 16
max_depth:  1
[[137  27]
 [ 30  74]]
0.7873134328358209 

max_depth:  2
[[163   1]
 [ 51  53]]
0.8059701492537313 

max_depth:  3
[[147  17]
 [ 31  73]]
0.8208955223880597 

max_depth:  4
[[148  16]
 [ 34  70]]
0.8134328358208955 

max_depth:  5
[[139  25]
 [ 29  75]]
0.7985074626865671 

max_depth:  6
[[143  21]
 [ 39  65]]
0.7761194029850746 

max_depth:  7
[[138  26]
 [ 42  62]]
0.746268656716418 

max_depth:  8
[[139  25]
 [ 39  65]]
0.7611940298507462 

max_depth:  9
[[131  33]
 [ 44  60]]
0.7126865671641791 

max_depth:  10
[[131  33]
 [ 43  61]]
0.7164179104477612 

max_depth:  11
[[131  33]
 [ 42  62]]
0.7201492537313433 

max_depth:  12
[[129  35]
 [ 41  63]]
0.7164179104477612 

max_d

 [ 33  72]]
0.7723880597014925 

max_depth:  42
[[136  27]
 [ 33  72]]
0.7761194029850746 

max_depth:  43
[[138  25]
 [ 33  72]]
0.7835820895522388 

max_depth:  44
[[138  25]
 [ 31  74]]
0.7910447761194029 

max_depth:  45
[[138  25]
 [ 32  73]]
0.7873134328358209 

max_depth:  46
[[139  24]
 [ 33  72]]
0.7873134328358209 

max_depth:  47
[[139  24]
 [ 33  72]]
0.7873134328358209 

max_depth:  48
[[135  28]
 [ 30  75]]
0.7835820895522388 

max_depth:  49
[[136  27]
 [ 31  74]]
0.7835820895522388 

random_state: 19
max_depth:  1
[[146  20]
 [ 29  73]]
0.8171641791044776 

max_depth:  2
[[165   1]
 [ 51  51]]
0.8059701492537313 

max_depth:  3
[[153  13]
 [ 31  71]]
0.835820895522388 

max_depth:  4
[[157   9]
 [ 32  70]]
0.8470149253731343 

max_depth:  5
[[156  10]
 [ 32  70]]
0.8432835820895522 

max_depth:  6
[[161   5]
 [ 42  60]]
0.8246268656716418 

max_depth:  7
[[159   7]
 [ 47  55]]
0.7985074626865671 

max_depth:  8
[[161   5]
 [ 45  57]]
0.8134328358208955 

max_depth:  9
[

[[129  35]
 [ 34  70]]
0.7425373134328358 

max_depth:  3
[[137  27]
 [ 34  70]]
0.7723880597014925 

max_depth:  4
[[140  24]
 [ 33  71]]
0.7873134328358209 

max_depth:  5
[[139  25]
 [ 34  70]]
0.7798507462686567 

max_depth:  6
[[136  28]
 [ 34  70]]
0.7686567164179104 

max_depth:  7
[[137  27]
 [ 35  69]]
0.7686567164179104 

max_depth:  8
[[131  33]
 [ 31  73]]
0.7611940298507462 

max_depth:  9
[[134  30]
 [ 33  71]]
0.7649253731343284 

max_depth:  10
[[130  34]
 [ 36  68]]
0.7388059701492538 

max_depth:  11
[[129  35]
 [ 34  70]]
0.7425373134328358 

max_depth:  12
[[132  32]
 [ 32  72]]
0.7611940298507462 

max_depth:  13
[[132  32]
 [ 34  70]]
0.753731343283582 

max_depth:  14
[[130  34]
 [ 34  70]]
0.746268656716418 

max_depth:  15
[[130  34]
 [ 33  71]]
0.75 

max_depth:  16
[[133  31]
 [ 35  69]]
0.753731343283582 

max_depth:  17
[[131  33]
 [ 34  70]]
0.75 

max_depth:  18
[[133  31]
 [ 35  69]]
0.753731343283582 

max_depth:  19
[[133  31]
 [ 35  69]]
0.75373134328

0.7686567164179104 

max_depth:  49
[[143  30]
 [ 31  64]]
0.7723880597014925 

random_state: 25
max_depth:  1
[[137  28]
 [ 31  72]]
0.7798507462686567 

max_depth:  2
[[137  28]
 [ 31  72]]
0.7798507462686567 

max_depth:  3
[[145  20]
 [ 32  71]]
0.8059701492537313 

max_depth:  4
[[145  20]
 [ 32  71]]
0.8059701492537313 

max_depth:  5
[[142  23]
 [ 28  75]]
0.8097014925373134 

max_depth:  6
[[144  21]
 [ 35  68]]
0.7910447761194029 

max_depth:  7
[[151  14]
 [ 41  62]]
0.7947761194029851 

max_depth:  8
[[141  24]
 [ 29  74]]
0.8022388059701493 

max_depth:  9
[[140  25]
 [ 33  70]]
0.7835820895522388 

max_depth:  10
[[136  29]
 [ 34  69]]
0.7649253731343284 

max_depth:  11
[[140  25]
 [ 34  69]]
0.7798507462686567 

max_depth:  12
[[140  25]
 [ 33  70]]
0.7835820895522388 

max_depth:  13
[[140  25]
 [ 32  71]]
0.7873134328358209 

max_depth:  14
[[139  26]
 [ 31  72]]
0.7873134328358209 

max_depth:  15
[[139  26]
 [ 30  73]]
0.7910447761194029 

max_depth:  16
[[139  26]
 

0.7910447761194029 

max_depth:  47
[[145  22]
 [ 37  64]]
0.7798507462686567 

max_depth:  48
[[144  23]
 [ 37  64]]
0.7761194029850746 

max_depth:  49
[[146  21]
 [ 36  65]]
0.7873134328358209 

random_state: 28
max_depth:  1
[[137  28]
 [ 30  73]]
0.7835820895522388 

max_depth:  2
[[163   2]
 [ 55  48]]
0.7873134328358209 

max_depth:  3
[[144  21]
 [ 33  70]]
0.7985074626865671 

max_depth:  4
[[147  18]
 [ 35  68]]
0.8022388059701493 

max_depth:  5
[[146  19]
 [ 34  69]]
0.8022388059701493 

max_depth:  6
[[153  12]
 [ 42  61]]
0.7985074626865671 

max_depth:  7
[[152  13]
 [ 41  62]]
0.7985074626865671 

max_depth:  8
[[149  16]
 [ 40  63]]
0.7910447761194029 

max_depth:  9
[[148  17]
 [ 42  61]]
0.7798507462686567 

max_depth:  10
[[145  20]
 [ 41  62]]
0.7723880597014925 

max_depth:  11
[[146  19]
 [ 41  62]]
0.7761194029850746 

max_depth:  12
[[139  26]
 [ 40  63]]
0.753731343283582 

max_depth:  13
[[142  23]
 [ 40  63]]
0.7649253731343284 

max_depth:  14
[[135  30]
 [

0.8208955223880597 

max_depth:  6
[[149  10]
 [ 41  68]]
0.8097014925373134 

max_depth:  7
[[144  15]
 [ 38  71]]
0.8022388059701493 

max_depth:  8
[[136  23]
 [ 32  77]]
0.7947761194029851 

max_depth:  9
[[139  20]
 [ 38  71]]
0.7835820895522388 

max_depth:  10
[[140  19]
 [ 44  65]]
0.7649253731343284 

max_depth:  11
[[139  20]
 [ 40  69]]
0.7761194029850746 

max_depth:  12
[[138  21]
 [ 39  70]]
0.7761194029850746 

max_depth:  13
[[138  21]
 [ 37  72]]
0.7835820895522388 

max_depth:  14
[[137  22]
 [ 37  72]]
0.7798507462686567 

max_depth:  15
[[138  21]
 [ 36  73]]
0.7873134328358209 

max_depth:  16
[[138  21]
 [ 37  72]]
0.7835820895522388 

max_depth:  17
[[137  22]
 [ 36  73]]
0.7835820895522388 

max_depth:  18
[[137  22]
 [ 38  71]]
0.7761194029850746 

max_depth:  19
[[137  22]
 [ 35  74]]
0.7873134328358209 

max_depth:  20
[[137  22]
 [ 37  72]]
0.7798507462686567 

max_depth:  21
[[137  22]
 [ 36  73]]
0.7835820895522388 

max_depth:  22
[[136  23]
 [ 37  72]]
0

0.8022388059701493 

max_depth:  13
[[150  18]
 [ 35  65]]
0.8022388059701493 

max_depth:  14
[[148  20]
 [ 30  70]]
0.8134328358208955 

max_depth:  15
[[149  19]
 [ 33  67]]
0.8059701492537313 

max_depth:  16
[[149  19]
 [ 30  70]]
0.8171641791044776 

max_depth:  17
[[148  20]
 [ 35  65]]
0.7947761194029851 

max_depth:  18
[[149  19]
 [ 33  67]]
0.8059701492537313 

max_depth:  19
[[149  19]
 [ 33  67]]
0.8059701492537313 

max_depth:  20
[[148  20]
 [ 32  68]]
0.8059701492537313 

max_depth:  21
[[148  20]
 [ 33  67]]
0.8022388059701493 

max_depth:  22
[[149  19]
 [ 34  66]]
0.8022388059701493 

max_depth:  23
[[148  20]
 [ 33  67]]
0.8022388059701493 

max_depth:  24
[[149  19]
 [ 33  67]]
0.8059701492537313 

max_depth:  25
[[149  19]
 [ 33  67]]
0.8059701492537313 

max_depth:  26
[[148  20]
 [ 32  68]]
0.8059701492537313 

max_depth:  27
[[149  19]
 [ 34  66]]
0.8022388059701493 

max_depth:  28
[[148  20]
 [ 32  68]]
0.8059701492537313 

max_depth:  29
[[149  19]
 [ 33  67

[[140  19]
 [ 32  77]]
0.8097014925373134 

max_depth:  14
[[144  15]
 [ 35  74]]
0.8134328358208955 

max_depth:  15
[[141  18]
 [ 32  77]]
0.8134328358208955 

max_depth:  16
[[139  20]
 [ 33  76]]
0.8022388059701493 

max_depth:  17
[[141  18]
 [ 33  76]]
0.8097014925373134 

max_depth:  18
[[141  18]
 [ 32  77]]
0.8134328358208955 

max_depth:  19
[[141  18]
 [ 33  76]]
0.8097014925373134 

max_depth:  20
[[140  19]
 [ 33  76]]
0.8059701492537313 

max_depth:  21
[[142  17]
 [ 33  76]]
0.8134328358208955 

max_depth:  22
[[141  18]
 [ 34  75]]
0.8059701492537313 

max_depth:  23
[[140  19]
 [ 33  76]]
0.8059701492537313 

max_depth:  24
[[140  19]
 [ 34  75]]
0.8022388059701493 

max_depth:  25
[[141  18]
 [ 33  76]]
0.8097014925373134 

max_depth:  26
[[142  17]
 [ 33  76]]
0.8134328358208955 

max_depth:  27
[[142  17]
 [ 34  75]]
0.8097014925373134 

max_depth:  28
[[141  18]
 [ 33  76]]
0.8097014925373134 

max_depth:  29
[[140  19]
 [ 34  75]]
0.8022388059701493 

max_depth:  

0.7649253731343284 

max_depth:  20
[[130  26]
 [ 39  73]]
0.7574626865671642 

max_depth:  21
[[134  22]
 [ 40  72]]
0.7686567164179104 

max_depth:  22
[[133  23]
 [ 41  71]]
0.7611940298507462 

max_depth:  23
[[134  22]
 [ 40  72]]
0.7686567164179104 

max_depth:  24
[[135  21]
 [ 39  73]]
0.7761194029850746 

max_depth:  25
[[134  22]
 [ 39  73]]
0.7723880597014925 

max_depth:  26
[[134  22]
 [ 40  72]]
0.7686567164179104 

max_depth:  27
[[135  21]
 [ 40  72]]
0.7723880597014925 

max_depth:  28
[[132  24]
 [ 39  73]]
0.7649253731343284 

max_depth:  29
[[134  22]
 [ 39  73]]
0.7723880597014925 

max_depth:  30
[[135  21]
 [ 40  72]]
0.7723880597014925 

max_depth:  31
[[133  23]
 [ 39  73]]
0.7686567164179104 

max_depth:  32
[[133  23]
 [ 40  72]]
0.7649253731343284 

max_depth:  33
[[135  21]
 [ 40  72]]
0.7723880597014925 

max_depth:  34
[[132  24]
 [ 39  73]]
0.7649253731343284 

max_depth:  35
[[133  23]
 [ 39  73]]
0.7686567164179104 

max_depth:  36
[[133  23]
 [ 41  71

0.7798507462686567 

max_depth:  18
[[144  20]
 [ 39  65]]
0.7798507462686567 

max_depth:  19
[[144  20]
 [ 39  65]]
0.7798507462686567 

max_depth:  20
[[144  20]
 [ 39  65]]
0.7798507462686567 

max_depth:  21
[[144  20]
 [ 38  66]]
0.7835820895522388 

max_depth:  22
[[144  20]
 [ 39  65]]
0.7798507462686567 

max_depth:  23
[[143  21]
 [ 39  65]]
0.7761194029850746 

max_depth:  24
[[144  20]
 [ 38  66]]
0.7835820895522388 

max_depth:  25
[[144  20]
 [ 38  66]]
0.7835820895522388 

max_depth:  26
[[143  21]
 [ 42  62]]
0.7649253731343284 

max_depth:  27
[[144  20]
 [ 38  66]]
0.7835820895522388 

max_depth:  28
[[144  20]
 [ 38  66]]
0.7835820895522388 

max_depth:  29
[[144  20]
 [ 39  65]]
0.7798507462686567 

max_depth:  30
[[143  21]
 [ 38  66]]
0.7798507462686567 

max_depth:  31
[[144  20]
 [ 39  65]]
0.7798507462686567 

max_depth:  32
[[143  21]
 [ 38  66]]
0.7798507462686567 

max_depth:  33
[[143  21]
 [ 38  66]]
0.7798507462686567 

max_depth:  34
[[143  21]
 [ 38  66

 [ 35  73]]
0.7686567164179104 

max_depth:  27
[[132  28]
 [ 36  72]]
0.7611940298507462 

max_depth:  28
[[134  26]
 [ 36  72]]
0.7686567164179104 

max_depth:  29
[[135  25]
 [ 36  72]]
0.7723880597014925 

max_depth:  30
[[134  26]
 [ 35  73]]
0.7723880597014925 

max_depth:  31
[[135  25]
 [ 36  72]]
0.7723880597014925 

max_depth:  32
[[134  26]
 [ 36  72]]
0.7686567164179104 

max_depth:  33
[[135  25]
 [ 36  72]]
0.7723880597014925 

max_depth:  34
[[133  27]
 [ 37  71]]
0.7611940298507462 

max_depth:  35
[[134  26]
 [ 35  73]]
0.7723880597014925 

max_depth:  36
[[134  26]
 [ 34  74]]
0.7761194029850746 

max_depth:  37
[[134  26]
 [ 37  71]]
0.7649253731343284 

max_depth:  38
[[134  26]
 [ 35  73]]
0.7723880597014925 

max_depth:  39
[[134  26]
 [ 35  73]]
0.7723880597014925 

max_depth:  40
[[135  25]
 [ 36  72]]
0.7723880597014925 

max_depth:  41
[[133  27]
 [ 35  73]]
0.7686567164179104 

max_depth:  42
[[134  26]
 [ 37  71]]
0.7649253731343284 

max_depth:  43
[[135  2

0.8395522388059702 

max_depth:  27
[[158  20]
 [ 22  68]]
0.8432835820895522 

max_depth:  28
[[158  20]
 [ 23  67]]
0.8395522388059702 

max_depth:  29
[[158  20]
 [ 23  67]]
0.8395522388059702 

max_depth:  30
[[158  20]
 [ 22  68]]
0.8432835820895522 

max_depth:  31
[[158  20]
 [ 24  66]]
0.835820895522388 

max_depth:  32
[[159  19]
 [ 22  68]]
0.8470149253731343 

max_depth:  33
[[159  19]
 [ 22  68]]
0.8470149253731343 

max_depth:  34
[[159  19]
 [ 24  66]]
0.8395522388059702 

max_depth:  35
[[159  19]
 [ 25  65]]
0.835820895522388 

max_depth:  36
[[158  20]
 [ 23  67]]
0.8395522388059702 

max_depth:  37
[[158  20]
 [ 24  66]]
0.835820895522388 

max_depth:  38
[[159  19]
 [ 23  67]]
0.8432835820895522 

max_depth:  39
[[158  20]
 [ 23  67]]
0.8395522388059702 

max_depth:  40
[[159  19]
 [ 22  68]]
0.8470149253731343 

max_depth:  41
[[159  19]
 [ 22  68]]
0.8470149253731343 

max_depth:  42
[[159  19]
 [ 24  66]]
0.8395522388059702 

max_depth:  43
[[159  19]
 [ 22  68]]
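
Each block in the output above pairs a confusion matrix with the accuracy computed from it. As a small check, assuming the usual scikit-learn layout (rows are actual classes, columns are predicted classes), the accuracy is the diagonal sum divided by the total count. The sketch below uses the matrix `[[146 21] [36 65]]` and its printed accuracy taken from the output above; the variable names are illustrative only.

```python
import numpy as np

# Confusion matrix copied from the printed output above
# (rows = actual class, columns = predicted class)
cm = np.array([[146, 21],
               [36, 65]])

correct = np.trace(cm)   # diagonal entries = correctly classified samples
total = cm.sum()         # all samples in the test set
accuracy = correct / total
print(accuracy)
```

The result agrees with the accuracy printed directly under that matrix (211 correct out of 268 test samples).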


### ※ Problem 234. (Lunchtime exercise) Using the for-loop script posted on the class cafe, find the random_state and max_depth that give the highest accuracy

In [28]:
from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn.metrics import accuracy_score

# x (features) and y (labels) are assumed to be defined in an earlier cell
params = []   # (random_state, max_depth) pairs that were tried
scores = []   # test accuracy for each pair, in the same order as params
for i in range(1, 50):
    # Re-split the data with a different seed on each outer iteration
    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=i)

    for k in range(1, 50):
        params.append((i, k))

        # Entropy-based decision tree limited to depth k
        tree_model = tree.DecisionTreeClassifier(criterion='entropy', max_depth=k)
        tree_model.fit(X_train, y_train)

        y_hat = tree_model.predict(X_test)
        scores.append(accuracy_score(y_test, y_hat))

# Position of the first combination that reaches the best accuracy
idx = scores.index(max(scores))
print('random_state: ', params[idx][0], 'max_depth: ', params[idx][1], 'accuracy:', scores[idx])

random_state:  33 max_depth:  5 accuracy: 0.8694029850746269
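
The same search can be wrapped in a small helper so that it is easy to rerun on other data. The sketch below is only an illustration under stated assumptions, not the course's script: `best_split_and_depth` is a hypothetical name, the data come from `make_classification` rather than the notebook's `x` and `y`, and fixing the tree's own `random_state=0` is an added assumption so that repeated runs give the same answer.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def best_split_and_depth(X, y, seeds, depths, test_size=0.3):
    """Return (best_accuracy, seed, depth) over all seed/depth pairs (hypothetical helper)."""
    best = (-1.0, None, None)
    for seed in seeds:
        # One train/test split per seed, reused for every depth
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        for depth in depths:
            model = DecisionTreeClassifier(
                criterion='entropy', max_depth=depth, random_state=0)
            model.fit(X_train, y_train)
            acc = accuracy_score(y_test, model.predict(X_test))
            if acc > best[0]:   # keep the first combination with the best score
                best = (acc, seed, depth)
    return best


# Synthetic stand-in data (assumption; replace with the notebook's x and y)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
acc, seed, depth = best_split_and_depth(X_demo, y_demo, range(1, 6), range(1, 6))
print('random_state: ', seed, 'max_depth: ', depth, 'accuracy:', acc)
```

Tracking only the running best avoids keeping two parallel lists. Note that a pair chosen by its test accuracy will tend to look better on that particular split than on new data.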
