9.1 Using the function getTestData from Chapter 8, form a synthetic dataset of
10,000 observations with 10 features, where 5 are informative and 5 are noise.

(a) Use GridSearchCV on 10-fold CV to find the C, gamma optimal hyper-parameters on an SVC with RBF kernel, where param_grid={'C':[1E-2,1E-1,1,10,100],'gamma':[1E-2,1E-1,1,10,100]} and the scoring function is neg_log_loss.

In [32]:
import pandas as pd
import numpy as np
from sklearn.datasets import make_classification
from datetime import datetime
from utils import PurgedKFold

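def get_test_data(n_features=10, n_informative=5, n_samples=10000):
    # NOTE: make_classification defaults to n_redundant=2, so with shuffle=False columns 5 and 6
    # are linear combinations of the informative features; pass n_redundant=0 for 5 pure-noise columns.
    X, cont=make_classification(n_samples=n_samples, n_features=n_features, n_informative=n_informative, random_state=0, shuffle=False)
    # daily timestamps ending now, so the printed dates change from run to run
    t_index=pd.date_range(periods=n_samples, freq='D', end=datetime.today())
    X, cont=pd.DataFrame(X, index=t_index), pd.Series(cont, index=t_index).to_frame(name='bin')
    # shuffle=False keeps the informative features first: I_* informative, N_* the rest
    X.columns=[f'I_{i}' for i in range(n_informative)]+[f'N_{i}' for i in range(n_informative, n_features)]
    cont['w']=1/len(cont)  # equal sample weights
    cont['t1']=pd.Series(cont.index, index=cont.index)  # each label ends at its own timestamp
    return X, cont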

X, cont = get_test_data(n_features=10, n_informative=5, n_samples=10000)
print(X, cont)

                                 I_0       I_1       I_2       I_3       I_4  \
1997-12-27 19:28:26.526902  2.105359  2.861661  0.104159  0.686149  1.369429   
1997-12-28 19:28:26.526902 -0.330754  1.464379 -1.405119  0.396713 -1.722305   
1997-12-29 19:28:26.526902 -0.461334 -0.160432 -2.169501 -0.137535  0.398229   
1997-12-30 19:28:26.526902 -1.573667  3.110105  0.073939  1.232501  1.069429   
1997-12-31 19:28:26.526902  0.528677  1.538982 -1.603758  2.056413  0.777722   
...                              ...       ...       ...       ...       ...   
2025-05-09 19:28:26.526902  0.340564 -2.226446 -1.717539  1.920408 -2.453376   
2025-05-10 19:28:26.526902 -2.003425 -2.504737 -2.081414  1.236596  0.386712   
2025-05-11 19:28:26.526902 -3.191242 -0.151656 -0.376615 -0.944432 -0.663403   
2025-05-12 19:28:26.526902 -2.116680 -0.735869 -0.858766  0.371223 -1.769760   
2025-05-13 19:28:26.526902 -1.717428 -3.589082 -2.411008  2.244037  0.664538   

                                 N_5   

In [2]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid={
    'C':[1e-2, 1e-1, 1, 10, 100],
    'gamma':[1e-2, 1e-1, 1, 10, 100]
}
cv=PurgedKFold(n_splits=10, t1=cont['t1'], pct_embargo=0.01)
grid=GridSearchCV(estimator=SVC(probability=True), param_grid=param_grid, scoring='neg_log_loss', cv=cv, n_jobs=-1)
grid.fit(X=X, y=cont['bin'], sample_weight=cont['w'])
print(grid.best_params_, grid.best_score_)
print(grid.cv_results_)



{'C': 1, 'gamma': 0.01} -0.3727089525170734
{'mean_fit_time': array([ 9.17368157,  9.41541514,  9.1325393 ,  9.21862061,  7.26268544,
        9.63701828,  9.72935259,  9.43547769,  9.57638218,  7.24953258,
        9.69084883,  9.57894535,  9.2583168 ,  9.56063757,  7.26230938,
        9.51589711,  9.57695706,  9.52711604,  9.66317406,  8.35909042,
        6.50851345,  7.50539181, 10.16300895, 12.46835697,  9.11296785]), 'std_fit_time': array([0.42471518, 0.74248626, 1.18435859, 0.92663503, 0.36893453,
       0.42876045, 0.3846183 , 1.11289177, 0.38060618, 0.38968233,
       0.67779845, 0.67934416, 0.8809273 , 0.47216704, 0.43838611,
       0.83625234, 0.74688685, 1.1547238 , 0.39840316, 0.31269165,
       0.33987558, 0.65383477, 0.47855205, 1.01320179, 0.54353153]), 'mean_score_time': array([0.34169986, 0.34902806, 0.36442113, 0.34658749, 0.33390081,
       0.34793735, 0.35568216, 0.35592337, 0.35558095, 0.33323262,
       0.36152432, 0.35665302, 0.35717497, 0.37553806, 0.34589481,
   

(b) How many nodes are there in the grid?

In [3]:
print('node cnt:', len(grid.cv_results_['params']))

node cnt: 25


(c) How many fits did it take to find the optimal solution?

In [4]:
print('fit cnt:', len(grid.cv_results_['params']) * len([key for key in grid.cv_results_.keys() if key.startswith('split')]))

fit cnt: 250
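The two counts above can also be derived without running the search. The sketch below uses scikit-learn's ParameterGrid to enumerate the nodes and multiplies by the number of folds; the names n_nodes, n_splits and n_cv_fits are illustrative. It also notes that, with the default refit=True, GridSearchCV fits the best node once more on the full dataset.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'C': [1e-2, 1e-1, 1, 10, 100],
    'gamma': [1e-2, 1e-1, 1, 10, 100],
}

n_nodes = len(ParameterGrid(param_grid))  # 5 values of C x 5 values of gamma
n_splits = 10                             # folds used in the exercise
n_cv_fits = n_nodes * n_splits            # one fit per node per fold
n_total_fits = n_cv_fits + 1              # plus the final refit (refit=True is the default)

print('nodes:', n_nodes, 'cv fits:', n_cv_fits, 'including refit:', n_total_fits)
```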


(d) How long did it take to find this solution?

In [5]:
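import os
cpu_cnt=os.cpu_count()
# number of CV folds, inferred from the 'split<k>_test_score' keys
split_cnt=len([key for key in grid.cv_results_.keys() if key.startswith('split')])
# mean_fit_time and mean_score_time are per-fold averages, so scale them by the fold count
total_fit_time=[grid.cv_results_[time_key].sum()*split_cnt for time_key in grid.cv_results_.keys() if time_key.startswith('mean') and time_key.endswith('time')]
# rough wall-clock estimate (seconds): assumes the work was spread evenly over all cores (n_jobs=-1)
print('total fit time:', sum(total_fit_time)/cpu_cnt)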

total fit time: 236.21311922073363
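The number above is reconstructed from cv_results_ and assumes perfect parallelism. A more direct measurement is to time the fit call itself. The sketch below does that on a deliberately small, hypothetical dataset (X_small, y_small) and a reduced grid so that it runs quickly; it also reads refit_time_, the attribute in which scikit-learn stores the duration of the final refit.

```python
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Small stand-in dataset and grid (assumptions, chosen only to keep the run short)
X_small, y_small = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0)
search = GridSearchCV(
    SVC(probability=True),
    {'C': [1, 10], 'gamma': [0.01, 0.1]},
    scoring='neg_log_loss',
    cv=3,
)

start = time.perf_counter()
search.fit(X_small, y_small)
elapsed = time.perf_counter() - start  # wall-clock seconds for the whole search

# CPU-side estimate rebuilt from cv_results_, as in the cell above (single process here)
cv_time = (search.cv_results_['mean_fit_time']
           + search.cv_results_['mean_score_time']).sum() * search.n_splits_

print('wall clock:', elapsed)
print('sum of fold fit+score times:', cv_time)
print('final refit:', search.refit_time_)
```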


(e) How can you access the optimal result?

In [6]:
print(grid.best_index_, grid.best_estimator_)

10 SVC(C=1, gamma=0.01, probability=True)


(f) What is the CV score of the optimal parameter combination?

In [7]:
print(grid.cv_results_['mean_test_score'][grid.best_index_])

-0.3727089525170734


(g) How can you pass sample weights to the SVC?
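One way to answer this, sketched under the assumption that scikit-learn runs in its default mode (metadata routing disabled): SVC.fit accepts a sample_weight argument, and extra keyword arguments given to GridSearchCV.fit are sliced per fold and forwarded to the estimator's fit. When the SVC sits inside a Pipeline, the argument is prefixed with the step name. The data, weights and step names below are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_demo, y_demo = make_classification(
    n_samples=200, n_features=10, n_informative=5, random_state=0)
w_demo = np.linspace(0.5, 1.5, len(y_demo))  # hypothetical, non-uniform weights

# 1) Directly on the estimator
svc = SVC(kernel='rbf', C=1.0, gamma=0.01)
svc.fit(X_demo, y_demo, sample_weight=w_demo)

# 2) Through a search object: the keyword is forwarded to SVC.fit on each training fold
search = GridSearchCV(SVC(kernel='rbf'), {'C': [1, 10]}, cv=3)
search.fit(X_demo, y_demo, sample_weight=w_demo)

# 3) Inside a Pipeline: prefix with '<step name>__'
pipe = Pipeline([('scale', StandardScaler()), ('svc', SVC(kernel='rbf'))])
pipe_search = GridSearchCV(pipe, {'svc__C': [1, 10]}, cv=3)
pipe_search.fit(X_demo, y_demo, svc__sample_weight=w_demo)

print(search.best_params_, pipe_search.best_params_)
```

In this default mode the weights reach fit only; as far as I can tell, the CV scorer itself remains unweighted.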

9.2 Using the same dataset from exercise 1,

(a) Use RandomizedSearchCV on 10-fold CV to find the C, gamma optimal hyper-parameters on an SVC with RBF kernel, where param_distributions={'C':logUniform(a=1E-2,b=1E2),'gamma':logUniform(a=1E-2,b=1E2)}, n_iter=25 and neg_log_loss is the scoring function.

In [8]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import loguniform

param_distributions={
    'C':loguniform(1e-2, 100),
    'gamma':loguniform(1e-2, 100)
}
cv=PurgedKFold(n_splits=10, t1=cont['t1'], pct_embargo=0.01)
randomnized=RandomizedSearchCV(estimator=SVC(probability=True), param_distributions=param_distributions, scoring='neg_log_loss', cv=cv, n_jobs=-1, n_iter=25)
randomnized.fit(X=X, y=cont['bin'], sample_weight=cont['w'])
print(randomnized.best_params_, randomnized.best_score_)





{'C': np.float64(1.9940757492152932), 'gamma': np.float64(0.019454583161927167)} -0.3650842741167315


(b) How long did it take to find this solution?

In [9]:
cpu_cnt=os.cpu_count()
split_cnt=len([key for key in randomnized.cv_results_.keys() if key.startswith('split')])
total_fit_time=[randomnized.cv_results_[time_key].sum()*split_cnt for time_key in randomnized.cv_results_.keys() if time_key.startswith('mean') and time_key.endswith('time')]
print('total fit time:', sum(total_fit_time)/cpu_cnt)

total fit time: 247.4079880952835


(c) Is the optimal parameter combination similar to the one found in exercise 1?

In [10]:
print('best params', randomnized.best_params_)

best params {'C': np.float64(1.9940757492152932), 'gamma': np.float64(0.019454583161927167)}
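Because both hyper-parameters were searched on a log scale, one reasonable way to judge similarity is the distance in decades (log10 units) between the two optima. The sketch below hard-codes the values printed in this notebook; the variable names are illustrative.

```python
import numpy as np

# Optima reported above: grid search (exercise 1) and randomized search (exercise 2)
grid_best = {'C': 1.0, 'gamma': 0.01}
random_best = {'C': 1.9940757492152932, 'gamma': 0.019454583161927167}

# Distance in decades; adjacent grid nodes are exactly 1 decade apart
log10_gap = {k: float(np.log10(random_best[k] / grid_best[k])) for k in grid_best}
print(log10_gap)
```

Both gaps come out at roughly 0.3 decades, well under the one-decade spacing of the grid, so the randomized search landed in the same region as the grid optimum.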


(d) What is the CV score of the optimal parameter combination? 
How does it compare to the CV score from exercise 1?

In [11]:
print('best scores', randomnized.best_score_)

best scores -0.3650842741167315


9.3 From exercise 1,
(a) Compute the Sharpe ratio of the resulting in-sample forecasts, from point 1.a.

In [12]:
sharpe_ratios=[abs(grid.cv_results_['mean_test_score'][i])/grid.cv_results_['std_test_score'][i]  for i in range(len(grid.cv_results_['mean_test_score']))]
print('sharpe ratio(neg log loss):', max(sharpe_ratios), 'lowest:', min(sharpe_ratios))

sharpe ratio(neg log loss): 15.708096270562903 lowest: 0.9888403198543811
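The ratio above is built from the cross-validation scores (mean over standard deviation across folds). Another possible reading of the exercise is to treat every in-sample forecast as an equal-sized bet and compute the Sharpe ratio of the resulting outcomes. The sketch below assumes a bet returns +1 when the predicted label is correct and -1 otherwise; that payoff definition and the helper name sharpe_of_forecasts are assumptions, not something the notebook specifies.

```python
import numpy as np

def sharpe_of_forecasts(y_true, y_pred):
    """Non-annualized Sharpe ratio of equal-sized bets: +1 per hit, -1 per miss (assumed payoff)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    bet_returns = np.where(y_true == y_pred, 1.0, -1.0)
    std = bet_returns.std()
    if std == 0:  # every forecast had the same outcome, so the ratio is undefined
        return float('nan')
    return float(bet_returns.mean() / std)

# Tiny hand-made example: three hits and one miss
sr_demo = sharpe_of_forecasts([1, 1, 0, 0], [1, 1, 0, 1])
print(sr_demo)
```

With the notebook's objects this could be called as sharpe_of_forecasts(cont['bin'], grid.best_estimator_.predict(X)), since best_estimator_ is refit on the full sample.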


(b) Repeat point 1.a, this time with accuracy as the scoring function.
Compute the in-sample forecasts derived from the hyper-tuned parameters.

In [13]:
param_grid={
    'C':[1e-2, 1e-1, 1, 10, 100],
    'gamma':[1e-2, 1e-1, 1, 10, 100]
}
cv=PurgedKFold(n_splits=10, t1=cont['t1'], pct_embargo=0.01)
grid=GridSearchCV(estimator=SVC(probability=True), param_grid=param_grid, scoring='accuracy', cv=cv, n_jobs=-1)
grid.fit(X=X, y=cont['bin'], sample_weight=cont['w'])
print(grid.best_params_, grid.best_score_)
print(grid.cv_results_)



{'C': 100, 'gamma': 0.01} 0.7971
{'mean_fit_time': array([10.26034508,  9.54598763,  9.92795203,  9.74459569,  7.29867718,
        9.67603126,  9.86873162,  9.31503923,  9.48041127,  7.40314298,
        9.63850091, 10.08881316,  9.3715544 , 12.03973701,  8.93130822,
       10.11274588,  9.95934966,  9.70562444, 10.056634  ,  8.50154657,
        6.66883671,  7.66906261, 10.3028614 , 12.67197356,  9.30520582]), 'std_fit_time': array([0.56172927, 1.33446501, 0.6701515 , 0.43444584, 0.40580352,
       0.46958278, 1.47707838, 1.575544  , 1.67054981, 0.76871194,
       0.48040539, 0.45674308, 1.07295777, 1.46342118, 0.81816766,
       0.80576297, 0.4807356 , 0.49022664, 0.40382943, 0.2567342 ,
       0.35704953, 0.95738665, 1.08408997, 1.02960466, 0.82249923]), 'mean_score_time': array([0.36386883, 0.3692677 , 0.3724489 , 0.361078  , 0.34397075,
       0.34345453, 0.3817744 , 0.31825464, 0.36829109, 0.34839568,
       0.3291229 , 0.36590154, 0.35047443, 0.4815012 , 0.39112678,
       0.36163

(c) What scoring method leads to higher (in-sample) Sharpe ratio?

In [14]:
sharpe_ratios=[abs(grid.cv_results_['mean_test_score'][i])/grid.cv_results_['std_test_score'][i]  for i in range(len(grid.cv_results_['mean_test_score']))]
print('sharpe ratio(accuracy):', max(sharpe_ratios), 'lowest:', min(sharpe_ratios))

sharpe ratio(accuracy): 5.823764800584609 lowest: 0.5180504861625214


9.4 From exercise 2,

(a) Compute the Sharpe ratio of the resulting in-sample forecasts, from point
2.a.

In [15]:
sharpe_ratios=[abs(randomnized.cv_results_['mean_test_score'][i])/randomnized.cv_results_['std_test_score'][i]  for i in range(len(randomnized.cv_results_['mean_test_score']))]
print('sharpe ratio(neg log loss):', max(sharpe_ratios), 'lowest:', min(sharpe_ratios))

sharpe ratio(neg log loss): 15.874431804140167 lowest: 0.935179205978811


(b) Repeat point 2.a, this time with accuracy as the scoring function.
Compute the in-sample forecasts derived from the hyper-tuned parameters.

In [16]:
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV

param_distributions={
    'C':loguniform(1e-2, 100),
    'gamma':loguniform(1e-2, 100)
}
cv=PurgedKFold(n_splits=10, t1=cont['t1'], pct_embargo=0.01)
randomnized=RandomizedSearchCV(estimator=SVC(probability=True), param_distributions=param_distributions, scoring='accuracy', cv=cv, n_jobs=-1, n_iter=25)
randomnized.fit(X=X, y=cont['bin'], sample_weight=cont['w'])
print(randomnized.best_params_, randomnized.best_score_)



{'C': np.float64(30.06683104508431), 'gamma': np.float64(0.05827034355380007)} 0.767


(c) What scoring method leads to higher (in-sample) Sharpe ratio?

In [33]:
sharpe_ratios=[abs(randomnized.cv_results_['mean_test_score'][i])/randomnized.cv_results_['std_test_score'][i]  for i in range(len(randomnized.cv_results_['mean_test_score']))]
print('sharpe ratio(accuracy):', max(sharpe_ratios), 'lowest:', min(sharpe_ratios))

sharpe ratio(accuracy): 4.951149382485065 lowest: 0.5202088185650193


9.5 Read the definition of log loss, L[Y,P].

(a) Why is the scoring function neg_log_loss defined as the negative log loss, −L[Y,P]?
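1. So that higher is better: scikit-learn's search tools maximize the score, so a loss has to be negated before it can serve as one.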
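For reference, the log loss for N observations and K classes, with y_{n,k} the indicator that observation n belongs to class k and p_{n,k} the predicted probability of that event, can be written as:

```latex
L[Y,P] = -\log \mathrm{Prob}[Y \mid P]
       = -\frac{1}{N} \sum_{n=0}^{N-1} \sum_{k=0}^{K-1} y_{n,k} \, \log p_{n,k}
```

Since every p_{n,k} lies in (0, 1], each term y_{n,k} log p_{n,k} is at most zero, so L is non-negative and equals zero only for perfect forecasts. Its negative is therefore at most zero, and larger values mean better forecasts.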

(b) What would be the outcome of maximizing the log loss, rather than the negative log loss?
1. The search would fit so as not to match the labels, producing a model that performs worse than random guessing.
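A small numerical illustration of this point, using made-up labels and probabilities: log loss is lowest for well-calibrated, correct forecasts, sits at log(2) for a coin flip, and is highest for confidently wrong forecasts. A search that maximized log loss would therefore prefer the last kind of model.

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([1, 0, 1, 1, 0])
p_good = np.array([0.9, 0.2, 0.8, 0.7, 0.1])  # hypothetical P(y=1), mostly right
p_coin = np.full(len(y_true), 0.5)            # uninformative forecast
p_bad = 1.0 - p_good                          # same confidence, wrong direction

ll_good = log_loss(y_true, p_good)
ll_coin = log_loss(y_true, p_coin)
ll_bad = log_loss(y_true, p_bad)

# Same quantity computed by hand from the binary form of the definition
ll_good_manual = -np.mean(y_true * np.log(p_good) + (1 - y_true) * np.log(1 - p_good))

print('good:', ll_good, 'coin:', ll_coin, 'bad:', ll_bad)
print('neg_log_loss ranks them:', -ll_good, '>', -ll_coin, '>', -ll_bad)
```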

9.6 Consider an investment strategy that sizes its bets equally, regardless of the forecast's confidence. In this case, what is a more appropriate scoring function for hyper-parameter tuning, accuracy or cross-entropy loss?

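1. What matters is the hit ratio, that is, whether each forecast was right, rather than the magnitude of the predicted probability, so accuracy is more appropriate.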
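The argument can be checked on a toy case. The sketch below compares two hypothetical forecasters that predict the same labels with different confidence. With equal-sized bets (assumed payoff: +1 per hit, -1 per miss) their profit is identical and so is their accuracy, while log loss tells them apart based on confidence alone.

```python
import numpy as np
from sklearn.metrics import accuracy_score, log_loss

y_true = np.array([1, 0, 1, 1, 0, 1])
# Both forecasters make the same single mistake (4th observation)
p_confident = np.array([0.95, 0.05, 0.95, 0.05, 0.05, 0.95])
p_cautious = np.array([0.55, 0.45, 0.55, 0.45, 0.45, 0.55])

def equal_bet_pnl(y, p):
    """Profit of unit-sized bets on the predicted label: +1 per hit, -1 per miss."""
    pred = (p > 0.5).astype(int)
    return int(np.where(pred == y, 1, -1).sum())

results = {}
for name, p in (('confident', p_confident), ('cautious', p_cautious)):
    results[name] = {
        'accuracy': accuracy_score(y_true, (p > 0.5).astype(int)),
        'pnl': equal_bet_pnl(y_true, p),
        'log_loss': log_loss(y_true, p),
    }
    print(name, results[name])
```

Tuning on log loss would discriminate between these two models even though the strategy earns the same with either, which is why accuracy matches the strategy's objective more closely here.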