usage of identity in cross-validation #40

Closed
updiversity opened this issue Apr 22, 2015 · 5 comments

Comments

@updiversity

I am getting the following error when setting the aggregator option to opt.cross_validation.identity:

---------------------------------------------------------------------------
     33 # Define Parameter Tuning
---> 34 optimal_pars_clf_sgd, _, _ = opt.maximize(clf_sgd_cv, num_evals=n_hyperparams_evals, alpha=[0.001, .1], l1_ratio=[0., 1.])
     35 
     36 # Train model on the Inner Training Set with Tuned Hyperparameters

../local/lib/python2.7/site-packages/optunity/api.pyc in maximize(f, num_evals, solver_name, pmap, **kwargs)
    179     solver = make_solver(**suggestion)
    180     solution, details = optimize(solver, f, maximize=True, max_evals=num_evals,
--> 181                                  pmap=pmap)
    182     return solution, details, suggestion
    183 

../local/lib/python2.7/site-packages/optunity/api.pyc in optimize(solver, func, maximize, max_evals, pmap)
    243     time = timeit.default_timer()
    244     try:
--> 245         solution, report = solver.optimize(f, maximize, pmap=pmap)
    246     except fun.MaximumEvaluationsException:
    247         # early stopping because maximum number of evaluations is reached

../local/lib/python2.7/site-packages/optunity/solvers/ParticleSwarm.pyc in optimize(self, f, maximize, pmap)
    257             fitnesses = pmap(evaluate, list(map(self.particle2dict, pop)))
    258             for part, fitness in zip(pop, fitnesses):
--> 259                 part.fitness = fit*fitness
    260                 if not part.best or part.best_fitness < part.fitness:
    261                     part.best = part.position

TypeError: can't multiply sequence by non-int of type 'float'

Here is my code:

import optunity as opt
from optunity.metrics import _recall, contingency_table
from sklearn.linear_model import SGDClassifier
import numpy as np

n_in = 1
k_in = 2
n_hyperparams_evals = 10

clf_sgd = SGDClassifier(
            penalty="elasticnet",
            shuffle=True,
            n_iter=500,
            fit_intercept=True,
            learning_rate="optimal")

# Define Inner CV
cv_decorator = opt.cross_validated(x=X, y=Y.values, 
                                   num_folds=k_in, num_iter=n_in,
                                   strata=[Y[Y==1].index.values], 
                                   regenerate_folds=True,
                                   aggregator=opt.cross_validation.identity)

def obj_fun_clf_sgd(x_train, y_train, x_test, y_test, alpha, l1_ratio):
    model = clf_sgd.set_params(l1_ratio=l1_ratio, alpha=alpha).fit(x_train, y_train)
    y_pred = model.predict(x_test)
    score = _recall(contingency_table(y_test,y_pred))
    return score

clf_sgd_cv = cv_decorator(obj_fun_clf_sgd)

# Define Parameter Tuning
optimal_pars_clf_sgd, _, _ = opt.maximize(clf_sgd_cv, num_evals=n_hyperparams_evals, alpha=[0.001, .1], l1_ratio=[0., 1.])

# Train model on the Inner Training Set with Tuned Hyperparameters
optimal_model_clf_sgd = clf_sgd.set_params(**optimal_pars_clf_sgd).fit(X, Y.values)

The objective is to keep track of all the scores from the various folds. Is this a bug, or am I using the API incorrectly?

Thanks in advance

@claesenm
Owner

All solvers we use expect to get a single number for each evaluation of an objective function (which is constructed via cross-validation in your case). The identity aggregator constructs a list of all results, which causes this error.

The identity aggregator is mainly intended for other uses than this, for instance the outer cross-validation procedure of a nested cross-validation setup.

At this point I am not entirely sure what your intentions are: when you are optimizing hyperparameters, you really need to return a single number for every set of hyperparameters that is tried. Note that you already get a trace of all objective function evaluations (that is, the overall cross-validation score per evaluation) in the return values of maximize, minimize and optimize.
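For example, here is a minimal sketch of pulling that trace out of the return values, assuming your optunity version exposes details.call_log and the call_log2dataframe helper, and reusing clf_sgd_cv from your snippet with the default (mean) aggregator so every evaluation yields a single number:

```python
import optunity as opt

# every hyperparameter set that was tried is recorded together with its
# overall cross-validation score in details.call_log
optimal_pars, details, _ = opt.maximize(clf_sgd_cv, num_evals=10,
                                        alpha=[0.001, 0.1], l1_ratio=[0.0, 1.0])

trace = opt.call_log2dataframe(details.call_log)
print(trace.sort_values('value', ascending=False))
```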

If you want to retain scores for each fold, you will need to modify optunity internally. I can make a quick code example if you like.

@updiversity
Author

I'd like to keep track of the scores obtained during model selection in order to estimate the variance of the scores as a function of the hyper-parameter values.
I suppose this could be done by building an ad-hoc scorer and then using the "list_mean" aggregator.

However, even then I would only get num_evals data points, i.e. the number of iterations configured in the optimizer. I would still like to get at least the scores for each CV iteration.

So, yes, if it is not a problem for you, some lines of code would be welcome.

@claesenm
Owner

You can now freely return all fold-wise information during cross-validation. Solvers now have an extra layer between objective function evaluations and actual use of the return value: if your objective function returns an indexable, its first element is used in optimization.

You can see an example of how to use this feature in bin/examples/python/advanced_cv.py, which returns the full cross-validation results. If this is not what you were looking for, please do let me know so I can reopen this issue.
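For instance, a minimal sketch of that idea (mean_and_list is just a hypothetical aggregator name; X, Y and the objective function as in your snippet):

```python
import numpy as np
import optunity as opt

# hypothetical aggregator: the solver optimizes the first element (the mean),
# while the complete list of fold-wise scores is carried along
def mean_and_list(fold_scores):
    return (np.mean(fold_scores), fold_scores)

cv_decorator = opt.cross_validated(x=X, y=Y.values, num_folds=2,
                                   aggregator=mean_and_list)
clf_sgd_cv = cv_decorator(obj_fun_clf_sgd)
```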

PS: you will need to update optunity from git to get the new feature.

@updiversity
Author

Thanks for the update. It now even works with a user-provided aggregator. FYI, here is the one I have defined: it computes the mean, std, min and max for a list of lists of scores, and provides as its first element the score to be used by the optimizer.

```python
def score_aggregators(list_of_scores,
                      list_of_agg_fun=(np.mean, np.std, np.min, np.max)):
    """Aggregate fold-wise scores; the optimizer uses the first element."""
    try:
        # each fold returned several scores: transpose to group them by score type
        scores = list(zip(*list_of_scores))
        agg_scores = [[agg_func(s) for s in scores] for agg_func in list_of_agg_fun]
        # first element: mean of the first score, used by the optimizer;
        # second element: all aggregated statistics
        return (agg_scores[0][0], agg_scores)
    except TypeError:
        # each fold returned a single score
        return [agg_func(list_of_scores) for agg_func in list_of_agg_fun]
```
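
It plugs in exactly like the built-in aggregators (sketch, reusing the decorator setup from my first snippet):

```python
cv_decorator = opt.cross_validated(x=X, y=Y.values,
                                   num_folds=k_in, num_iter=n_in,
                                   regenerate_folds=True,
                                   aggregator=score_aggregators)
clf_sgd_cv = cv_decorator(obj_fun_clf_sgd)
```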

@claesenm
Owner

Glad to hear it is working! The current solution allows pretty much every use case I can think of, but feel free to let us know if there are still any remaining issues.
