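# Cross-Validation Iterators

ref. [Reference](https://davinci-ai.tistory.com/18)  
**Choose the iterator carefully, based on the shape and structure of the data set. The usual questions are whether the samples are independent and whether they are identically distributed.**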

1. **When the data are independent and identically distributed (i.i.d.)**  
    - KFold
    - RepeatedKFold
    - LeaveOneOut (LOO)
    - LeavePOut (LPO)
    - ShuffleSplit


2. **When the data are not identically distributed**  
    - StratifiedKFold
    - RepeatedStratifiedKFold
    - StratifiedShuffleSplit


3. **When the data are grouped**  
    - GroupKFold
    - LeaveOneGroupOut
    - LeavePGroupsOut
    - GroupShuffleSplit


4. **When the data are a time series**  
    - TimeSeriesSplit

##### import modules and data sets

In [1]:
import pandas as pd
import numpy as np

# load_boston was removed in scikit-learn 1.2, so only iris and digits are loaded here
from sklearn.datasets import load_iris, load_digits
iris   = load_iris()
digits = load_digits()

## When the data are independent and identically distributed

### KFold
---
**k-fold cross validation**
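- Split all the data into K folds. Over K iterations, the n-th fold serves as the test set in the n-th iteration (n = 1, 2, ..., K) and the remaining K-1 folds form the train set.

<img src="./_images/kfold.png" height="75%" width="75%">

- `shuffle=True`: shuffle the whole data set once before splitting.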

In [32]:
from sklearn.model_selection import KFold

# 5 folds, taken in order (no shuffling)
kf = KFold(n_splits=5, shuffle=False, random_state=None)

# enumerate(..., start=1) numbers the iterations from 1
for i, (train_idx, test_idx) in enumerate(kf.split(iris['data']), start=1):
    print('[{}-th iteration]\n- train set size : {}, test set size : {}'.format(i, len(train_idx), len(test_idx)))
    print('- Index of test set is \n{}\n'.format(sorted(test_idx)))

[1-th iteration]
train set size : 120, test set size : 30
Index of test set is 
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29]

[2-th iteration]
train set size : 120, test set size : 30
Index of test set is 
[30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
 54 55 56 57 58 59]

[3-th iteration]
train set size : 120, test set size : 30
Index of test set is 
[60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
 84 85 86 87 88 89]

[4-th iteration]
train set size : 120, test set size : 30
Index of test set is 
[ 90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107
 108 109 110 111 112 113 114 115 116 117 118 119]

[5-th iteration]
train set size : 120, test set size : 30
Index of test set is 
[120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137
 138 139 140 141 142 143 144 145 146 147 148 149]
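
As a small check of the K-fold description, the sketch below uses a toy array of 10 samples (an assumption for illustration, not the iris data) with `shuffle=True` and a fixed `random_state`. It verifies that the K test folds are disjoint and together cover every sample exactly once.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 10 samples, 2 features (hypothetical, for illustration only)
X_toy = np.arange(20).reshape(10, 2)

# shuffle=True shuffles the indices once before they are cut into folds;
# random_state makes that shuffle reproducible
kf_shuffled = KFold(n_splits=5, shuffle=True, random_state=0)

test_folds = [test_idx.tolist() for _, test_idx in kf_shuffled.split(X_toy)]
for n, fold in enumerate(test_folds, start=1):
    print('fold {} test indices: {}'.format(n, fold))

# Every sample index should appear in exactly one test fold
all_test_idx = sorted(i for fold in test_folds for i in fold)
print(all_test_idx == list(range(10)))
```

With shuffling on, the test indices are no longer contiguous blocks, but the folds still partition the data.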



### RepeatedKFold
---
**Repeated k-fold cross validation**

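- Runs **k-fold cross validation** **n** times (`n_repeats`).
- The data are shuffled before every repetition. `RepeatedKFold` has no `shuffle` parameter, so the shuffling cannot be turned off.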
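
To make the repetition concrete, here is a minimal sketch on 12 hypothetical samples (the toy array and the parameter values are assumptions for illustration). It checks that the total number of splits is `n_splits * n_repeats` and that, within each repetition, the test folds cover every sample exactly once.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

X_toy = np.arange(24).reshape(12, 2)  # 12 hypothetical samples

rkf_toy = RepeatedKFold(n_splits=4, n_repeats=3, random_state=42)

# Total number of train/test splits = n_splits * n_repeats
n_total = rkf_toy.get_n_splits(X_toy)
print(n_total)

# Splits are yielded repetition by repetition, so consecutive groups of
# 4 splits belong to the same repetition
splits = [test_idx.tolist() for _, test_idx in rkf_toy.split(X_toy)]
covered_per_repeat = [
    sorted(i for fold in splits[r * 4:(r + 1) * 4] for i in fold)
    for r in range(3)
]
print(all(c == list(range(12)) for c in covered_per_repeat))
```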

In [35]:
from sklearn.model_selection import RepeatedKFold

# 5-fold cross validation repeated twice: 10 splits in total
rkf = RepeatedKFold(n_splits=5, n_repeats=2, random_state=None)

for i, (train_idx, test_idx) in enumerate(rkf.split(iris['data']), start=1):
    print('[{}-th iteration]\n- train set size : {}, test set size : {}'.format(i, len(train_idx), len(test_idx)))
    print('- Index of test set is \n{}\n'.format(sorted(test_idx)))

[1-th iteration]
train set size : 120, test set size : 30
Index of test set is 
[  1   4   5   8  16  19  28  34  43  47  52  55  56  59  63  65  68  79
  86  88  92 103 108 110 113 127 132 133 136 141]

[2-th iteration]
train set size : 120, test set size : 30
Index of test set is 
[  0   2   9  14  29  31  33  36  42  45  51  58  67  69  70  76  82  85
  98  99 106 119 120 121 122 125 134 137 142 148]

[3-th iteration]
train set size : 120, test set size : 30
Index of test set is 
[  3  13  15  32  39  41  48  50  57  66  72  74  75  77 101 102 105 107
 115 118 123 124 129 131 135 138 143 144 145 149]

[4-th iteration]
train set size : 120, test set size : 30
Index of test set is 
[  6  10  11  23  24  26  27  30  37  40  44  46  53  62  64  71  87  89
  90  91  93  94  97 104 111 112 116 126 146 147]

[5-th iteration]
train set size : 120, test set size : 30
Index of test set is 
[  7  12  17  18  20  21  22  25  35  38  49  54  60  61  73  78  80  81
  83  84  95  96 100 109 114 11

### LeaveOneOut (LOO)
---
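One observation is held out from the whole data set: the remaining observations are used as the train set and the held-out observation is used for testing.

- Equivalently, **a single observation** is set as the test set in each iteration.
- When the data set is small, this avoids wasting data.
- The resulting estimates often have high variance.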
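
One way to see what LOO does is to compare it with K-fold. The sketch below, on 6 hypothetical samples (an assumption for illustration), checks that LOO produces one split per sample and the same test sets as an unshuffled `KFold` with `n_splits` equal to the number of samples.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

X_toy = np.arange(12).reshape(6, 2)  # 6 hypothetical samples

loo_toy = LeaveOneOut()

# One split per sample
n_loo = loo_toy.get_n_splits(X_toy)
print(n_loo)

# LOO test sets compared with KFold(n_splits = number of samples)
loo_tests = [test.tolist() for _, test in loo_toy.split(X_toy)]
kf_tests = [test.tolist() for _, test in KFold(n_splits=len(X_toy)).split(X_toy)]
print(loo_tests == kf_tests)
```

Because a model has to be fitted once per sample, the cost grows linearly with the size of the data set.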

In [37]:
from sklearn.model_selection import LeaveOneOut

# One split per observation: 150 splits for the iris data
loo = LeaveOneOut()

for i, (train_idx, test_idx) in enumerate(loo.split(iris['data']), start=1):
    print('[{}-th iteration]\n- train set size : {}, test set size : {}'.format(i, len(train_idx), len(test_idx)))
    print('- Index of test set is \n{}\n'.format(sorted(test_idx)))

[1-th iteration]
train set size : 149, test set size : 1
Index of test set is 
[0]

[2-th iteration]
train set size : 149, test set size : 1
Index of test set is 
[1]

[3-th iteration]
train set size : 149, test set size : 1
Index of test set is 
[2]

[4-th iteration]
train set size : 149, test set size : 1
Index of test set is 
[3]

[5-th iteration]
train set size : 149, test set size : 1
Index of test set is 
[4]

[6-th iteration]
train set size : 149, test set size : 1
Index of test set is 
[5]

[7-th iteration]
train set size : 149, test set size : 1
Index of test set is 
[6]

[8-th iteration]
train set size : 149, test set size : 1
Index of test set is 
[7]

[9-th iteration]
train set size : 149, test set size : 1
Index of test set is 
[8]

[10-th iteration]
train set size : 149, test set size : 1
Index of test set is 
[9]

[11-th iteration]
train set size : 149, test set size : 1
Index of test set is 
[10]

[12-th iteration]
train set size : 149, test set size : 1
Index of test s

### LeavePOut (LPO)
---
**LPO** = LOO + K-fold
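- Like LOO, it was designed to avoid wasting data when the data set is small.
- Hold out **p** observations from the whole data set (every possible combination of p observations is used once); the rest form the train set and the held-out observations form the test set.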
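
Because every combination of p observations is used, the number of splits is the binomial coefficient C(n, p). The sketch below checks this on 5 hypothetical samples (an assumption for illustration) and computes the count for 150 samples, the size of the iris data.

```python
from math import comb

import numpy as np
from sklearn.model_selection import LeavePOut

X_toy = np.arange(10).reshape(5, 2)  # 5 hypothetical samples

lpo_toy = LeavePOut(p=2)

# Number of splits reported by scikit-learn vs. the binomial coefficient
n_lpo = lpo_toy.get_n_splits(X_toy)
print(n_lpo, comb(5, 2))

# Each test set is one distinct pair of sample indices
test_pairs = [tuple(test.tolist()) for _, test in lpo_toy.split(X_toy)]
print(test_pairs)

# Number of splits for 150 samples and p=2
n_iris_splits = comb(150, 2)
print(n_iris_splits)
```

For 150 samples and p=2 this is already 11175 splits, so printing every iteration produces a very long output.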



In [52]:
from sklearn.model_selection import LeavePOut

# Hold out every possible pair of observations
lpo = LeavePOut(p=2)

for i, (train_idx, test_idx) in enumerate(lpo.split(iris['data']), start=1):
    print('[{}-th iteration]\n- train set size : {}, test set size : {}'.format(i, len(train_idx), len(test_idx)))
    print('- Index of test set is \n{}\n'.format(test_idx))

[1-th iteration]
- train set size : 148, test set size : 2
- Index of test set is 
[0 1]

[2-th iteration]
- train set size : 148, test set size : 2
- Index of test set is 
[0 2]

[3-th iteration]
- train set size : 148, test set size : 2
- Index of test set is 
[0 3]

[4-th iteration]
- train set size : 148, test set size : 2
- Index of test set is 
[0 4]

[5-th iteration]
- train set size : 148, test set size : 2
- Index of test set is 
[0 5]

[6-th iteration]
- train set size : 148, test set size : 2
- Index of test set is 
[0 6]

[7-th iteration]
- train set size : 148, test set size : 2
- Index of test set is 
[0 7]

[8-th iteration]
- train set size : 148, test set size : 2
- Index of test set is 
[0 8]

[9-th iteration]
- train set size : 148, test set size : 2
- Index of test set is 
[0 9]

[10-th iteration]
- train set size : 148, test set size : 2
- Index of test set is 
[ 0 10]

[11-th iteration]
- train set size : 148, test set size : 2
- Index of test set is 
[ 0 11]

...

### ShuffleSplit
---
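- Unlike the k-fold based methods, test sets can **overlap** across iterations (mutual exclusivity is not guaranteed).
- In each of the **n** iterations (`n_splits`), a **test_size** share of the observations is used as the test set.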
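
The overlap can be checked by counting how often each sample lands in a test set. This is a minimal sketch using the iris data with the same `n_splits` and `test_size` as the cell in this section; the `random_state` value is an arbitrary assumption.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import ShuffleSplit

X, _ = load_iris(return_X_y=True)  # 150 samples

ss_check = ShuffleSplit(n_splits=5, test_size=0.3, random_state=0)

test_counts = Counter()  # sample index -> number of test sets it appeared in
test_sizes = []
for train_idx, test_idx in ss_check.split(X):
    test_sizes.append(len(test_idx))
    test_counts.update(test_idx.tolist())

print(test_sizes)

# 5 splits x 45 test samples = 225 draws from only 150 samples,
# so at least one sample must appear in more than one test set
max_appearances = max(test_counts.values())
print(max_appearances)
```

Conversely, some samples may never be drawn into any test set, which cannot happen with K-fold.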

In [48]:
from sklearn.model_selection import ShuffleSplit 

# 5 independent random splits, each holding out 30% of the data
ss = ShuffleSplit(n_splits=5, test_size=0.3, random_state=9505)

for i, (train_idx, test_idx) in enumerate(ss.split(iris['data']), start=1):
    print('[{}-th iteration]\n- train set size : {}, test set size : {}'.format(i, len(train_idx), len(test_idx)))
    print('- Index of test set is \n{}\n'.format(sorted(test_idx)))

[1-th iteration]
train set size : 105, test set size : 45
Index of test set is 
[1, 3, 5, 7, 8, 10, 12, 15, 26, 31, 41, 42, 45, 47, 48, 56, 66, 72, 73, 74, 75, 79, 80, 87, 89, 92, 96, 102, 106, 108, 111, 121, 124, 125, 127, 135, 136, 137, 138, 140, 141, 143, 145, 146, 147]

[2-th iteration]
train set size : 105, test set size : 45
Index of test set is 
[2, 10, 11, 13, 14, 15, 18, 24, 26, 30, 35, 36, 40, 55, 64, 65, 66, 67, 70, 71, 79, 81, 82, 85, 89, 95, 96, 101, 102, 103, 104, 105, 107, 108, 111, 114, 115, 117, 119, 120, 121, 125, 141, 145, 148]

[3-th iteration]
train set size : 105, test set size : 45
Index of test set is 
[1, 6, 11, 13, 18, 22, 27, 29, 30, 31, 32, 38, 39, 44, 47, 48, 52, 54, 59, 66, 68, 73, 74, 76, 77, 78, 81, 86, 88, 95, 97, 100, 102, 105, 106, 107, 116, 121, 122, 123, 124, 128, 137, 144, 148]

[4-th iteration]
train set size : 105, test set size : 45
Index of test set is 
[0, 1, 7, 8, 9, 11, 13, 14, 15, 23, 25, 30, 35, 38, 39, 48, 49, 51, 59, 61, 64, 65, 66, 68, 

## When the data are not identically distributed

### StratifiedKFold
- In classification problems, the proportion of each label can matter a great deal for learning.
- Performs K-fold cross validation while **preserving the label proportions** in both the train set and the test set.
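
To illustrate why this matters, the sketch below compares the class counts inside the test folds of an unshuffled `KFold` and a `StratifiedKFold` on the iris data, whose 150 samples are sorted by class (50 per class). The choice of 5 splits is an assumption for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, StratifiedKFold

X, y = load_iris(return_X_y=True)  # 3 classes, 50 samples each, sorted by class

# Unshuffled KFold: the first test fold is samples 0..29, all from class 0
_, first_test = next(KFold(n_splits=5).split(X))
kfold_counts = np.bincount(y[first_test], minlength=3).tolist()
print(kfold_counts)

# StratifiedKFold needs the labels y so it can balance them across folds
skf_counts = [
    np.bincount(y[test_idx], minlength=3).tolist()
    for _, test_idx in StratifiedKFold(n_splits=5).split(X, y)
]
print(skf_counts)
```

In the plain K-fold case a model would be tested on a class distribution very different from the one it was trained on, while the stratified folds keep the 1:1:1 ratio.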

In [51]:
from sklearn.model_selection import StratifiedKFold

# 4 folds; the labels are passed to split() so class proportions are preserved
skf = StratifiedKFold(n_splits=4)

for i, (train_idx, test_idx) in enumerate(skf.split(iris['data'], iris['target']), start=1):
    print('[{}-th iteration]\n- train set size : {}, test set size : {}'.format(i, len(train_idx), len(test_idx)))
    print('- Index of test set is \n{}\n'.format(sorted(test_idx)))

[1-th iteration]
- train set size : 112, test set size : 38
- Index of test set is 
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112]

[2-th iteration]
- train set size : 112, test set size : 38
- Index of test set is 
[13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125]

[3-th iteration]
- train set size : 113, test set size : 37
- Index of test set is 
[26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137]

[4-th iteration]
- train set size : 113, test set size : 37
- Index of test set is 
[38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149]



### GroupKFold