SVM 하이퍼 파라미터 최적화 - Random Search

In [1]:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
from scipy.stats import loguniform

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

from sklearn.metrics import accuracy_score

In [2]:
# 데이터 로드
digits = load_digits()
# print(digits)

# 독립 변수, 종속 변수 데이터 저장
x = digits.data
y = digits.target

# 데이터 분리
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.2, random_state=42)

print(len(x))
print(len(x_train))
print(len(x_test))

1797
1437
360


탐색할 하이퍼 파라미터 공간

In [3]:
params = {
    'C' : loguniform(1e-4,100),
    'kernel' : ['linear','poly','rbf','sigmoid'],
    'gamma' : ['scaler','auto'] + list(loguniform(1e-4,10).rvs(10)),
    'degree' : range(1,6),   # poly parameter 1-5까지 사용
    'coef0' : loguniform(1e-4,10).rvs(10)   # 1e-4 ~ 10 까지 logsacle로 탐색
}

In [4]:
svm_model = SVC()

random_search = RandomizedSearchCV(svm_model, params, n_iter=100, cv=5, verbose=2, n_jobs=-1)

random_search.fit(x_train, y_train)

# print(random_search.best_params_)

# 최적 모델 저장
best_model = random_search.best_estimator_

# 테스트 데이터에 대한 예측 수행
y_pred = best_model.predict(x_test)

# print('ACC >> ', accuracy_score(y_test,y_pred))

# 최적 하이퍼파라미터 ㅊㄹ력
print('Best HyperParameters : ', random_search.best_params_)

Fitting 5 folds for each of 100 candidates, totalling 500 fits
[CV] END C=0.018095391916084922, coef0=0.000487976306777579, degree=4, gamma=0.00014804616277148679, kernel=poly; total time=   0.2s
[CV] END C=0.009685369482256252, coef0=0.000487976306777579, degree=1, gamma=0.2058826944790441, kernel=rbf; total time=   0.3s
[CV] END C=0.009685369482256252, coef0=0.000487976306777579, degree=1, gamma=0.2058826944790441, kernel=rbf; total time=   0.3s
[CV] END C=0.009685369482256252, coef0=0.000487976306777579, degree=1, gamma=0.2058826944790441, kernel=rbf; total time=   0.3s
[CV] END C=0.009685369482256252, coef0=0.000487976306777579, degree=1, gamma=0.2058826944790441, kernel=rbf; total time=   0.3s
[CV] END C=0.009685369482256252, coef0=0.000487976306777579, degree=1, gamma=0.2058826944790441, kernel=rbf; total time=   0.3s
[CV] END C=21.493556189163492, coef0=0.008575511911176508, degree=5, gamma=scaler, kernel=rbf; total time=   0.0s
[CV] END C=21.493556189163492, coef0=0.00857551191

70 fits failed out of a total of 500.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
70 fits failed with the following error:
Traceback (most recent call last):
  File "/Users/park.s.w/anaconda3/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/Users/park.s.w/anaconda3/lib/python3.10/site-packages/sklearn/svm/_base.py", line 226, in fit
    raise ValueError(
ValueError: When 'gamma' is a string, it should be either 'scale' or 'auto'. Got ''scaler'' instead.

 0.970768   0.97355062 0.10716705 0.98886953 0.98468835 0.970768
 0.970768          nan        nan 0.10716705 0.98886953 0.97146245
 0.10716705 0.20457317 0.96730546        nan

Best HyperParameters :  {'C': 0.009685369482256252, 'coef0': 0.000487976306777579, 'degree': 1, 'gamma': 0.2058826944790441, 'kernel': 'rbf'}
