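# Overview

- Bayesian optimization searches for good hyperparameter values by building a probabilistic estimate of the objective from the results observed so far (prior information) and using it to choose the next point to try
- It typically finds good hyperparameters in fewer evaluations than grid search or random search, because each new trial is informed by the earlier ones
- Procedure
  - Set the hyperparameter search ranges
  - Define a function that computes the evaluation metric (the objective function)
  - Create a BayesianOptimization object
  - Run the Bayesian optimization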

# Classification

## Baseline

In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X = cancer.data
y = cancer.target
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.25, random_state=0)

# Create the model
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()

# Train
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)
y_pred

# Evaluate
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

# Predict and evaluate in a single call
model.score(X_test, y_test)

0.972027972027972

## Optimization

In [None]:
!pip install bayesian-optimization

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting bayesian-optimization
  Downloading bayesian_optimization-1.4.2-py3-none-any.whl (17 kB)
Collecting colorama>=0.4.6
  Downloading colorama-0.4.6-py2.py3-none-any.whl (25 kB)
Installing collected packages: colorama, bayesian-optimization
Successfully installed bayesian-optimization-1.4.2 colorama-0.4.6


In [29]:
from bayes_opt import BayesianOptimization
from sklearn.metrics import accuracy_score

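param_bound = {
    # Each entry maps a hyperparameter to its (lower, upper) search range
    'n_estimators': (10, 1000),
    'min_samples_split': (2, 5),
    'min_samples_leaf': (2, 5),
    'max_depth': (2, 10),
    'max_leaf_nodes': (2, 100),
    # For comparison: grid search would take a list such as 'n_estimators': [1, 10, 100, 200, 300, 400] and try one of those 6 values; random search would sample randomly from 1 to 400
}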
# Objective function: its arguments are the hyperparameters being tuned
def my_RF_func(n_estimators, min_samples_split, min_samples_leaf, max_depth, max_leaf_nodes):
  # The optimizer passes every value as a float, so cast integer-valued hyperparameters with int()
  model = RandomForestClassifier(n_estimators=int(n_estimators),
                                 min_samples_split=int(min_samples_split),
                                 min_samples_leaf=int(min_samples_leaf),
                                 max_depth=int(max_depth),
                                 max_leaf_nodes=int(max_leaf_nodes))
  # Train
  model.fit(X_train, y_train)
  # Predict
  y_pred = model.predict(X_test)
  # Return test accuracy as the value to maximize (equivalent to model.score(X_test, y_test))
  return accuracy_score(y_test, y_pred)

optimizer = BayesianOptimization(f=my_RF_func, pbounds=param_bound, random_state=0)
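
The objective above scores each candidate on the test split, so the tuned accuracy is likely to be optimistic. One common alternative is to score with cross-validation on the training split only. The following is a minimal sketch under that assumption; `rf_cv_objective`, the 5-fold setting, and the fixed `random_state` are illustrative choices, not part of the original notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Same data and split as the baseline cell
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

def rf_cv_objective(n_estimators, min_samples_split, min_samples_leaf,
                    max_depth, max_leaf_nodes):
    """Mean 5-fold cross-validated accuracy on the training split only."""
    model = RandomForestClassifier(
        n_estimators=int(n_estimators),
        min_samples_split=int(min_samples_split),
        min_samples_leaf=int(min_samples_leaf),
        max_depth=int(max_depth),
        max_leaf_nodes=int(max_leaf_nodes),
        random_state=0,  # assumption: fixed seed so repeated calls agree
    )
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    return scores.mean()

# Single evaluation with float inputs, as the optimizer would supply them
example_score = rf_cv_objective(50.0, 2.0, 2.0, 5.0, 30.0)
print(example_score)
```

Passing `f=rf_cv_objective` to `BayesianOptimization` in place of `my_RF_func` would keep the test split untouched until a final evaluation, at the cost of fitting five models per trial.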

In [30]:
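# Run Bayesian optimization
'''
  # maximize() runs the optimization, searching for the maximum of the objective
  # init_points: number of initial random-search evaluations
  # n_iter: number of optimization steps after the initial points (more steps evaluate more points and generally improve the estimate)
  # acq: acquisition function; 'ei' selects Expected Improvement (EI)
  # xi: exploration strength (default 0.0)
'''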
optimizer.maximize(init_points=10, n_iter=100, acq='ei', xi=0.01)

|   iter    |  target   | max_depth | max_le... | min_sa... | min_sa... | n_esti... |
-------------------------------------------------------------------------------------


Passing acquisition function parameters or gaussian process parameters to maximize
is no longer supported, and will cause an error in future releases. Instead,
please use the "set_gp_params" method to set the gp params, and pass an instance
 of bayes_opt.util.UtilityFunction using the acquisition_function argument

  optimizer.maximize(init_points=10, n_iter=100, acq='ei', xi=0.01)


| 1         | 0.972     | 6.391     | 72.09     | 3.808     | 3.635     | 429.4     |
| 2         | 0.965     | 7.167     | 44.88     | 4.675     | 4.891     | 389.6     |
| 3         | 0.958     | 8.334     | 53.83     | 3.704     | 4.777     | 80.33     |
| 4         | 0.951     | 2.697     | 3.981     | 4.498     | 4.334     | 871.3     |
| 5         | 0.965     | 9.829     | 80.32     | 3.384     | 4.342     | 127.1     |
| 6         | 0.965     | 7.119     | 16.05     | 4.834     | 3.566     | 420.5     |
| 7         | 0.958     | 4.116     | 77.87     | 3.368     | 3.705     | 28.6      |
...

In [31]:
print(optimizer.max)
# 0.972027972027972 => 0.9790209790209791: an increase of 0.006993006993007089 (1 of the 143 test samples)

{'target': 0.9790209790209791, 'params': {'max_depth': 9.28567526560476, 'max_leaf_nodes': 71.68018998851385, 'min_samples_leaf': 2.155993766478279, 'min_samples_split': 2.149627657876271, 'n_estimators': 435.1435973684417}}
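
The best parameters come back as floats, while the objective cast them with `int()` before building the model. To reuse them, the same cast has to be applied. A small sketch, where `to_int_params` is a hypothetical helper and the values are copied from the output above:

```python
# Best parameters as reported by optimizer.max['params'] above
best_params = {
    "max_depth": 9.28567526560476,
    "max_leaf_nodes": 71.68018998851385,
    "min_samples_leaf": 2.155993766478279,
    "min_samples_split": 2.149627657876271,
    "n_estimators": 435.1435973684417,
}

def to_int_params(params):
    """Truncate each float to an int, mirroring the int() casts in the objective."""
    return {name: int(value) for name, value in params.items()}

rf_kwargs = to_int_params(best_params)
print(rf_kwargs)
# {'max_depth': 9, 'max_leaf_nodes': 71, 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 435}
```

These keyword arguments could then be passed as `RandomForestClassifier(**rf_kwargs)` to refit a final model. Note that `int()` truncates rather than rounds, so 71.68 becomes 71.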
