No score variance over 50 iterations despite multiple parameters switched #23
Comments
Could you provide the entire script? How do I get "jc"? I would like to reproduce this bug. Could you also provide a random_state that shows the bug?
I already have the suspicion that the SimulatedAnnealingOptimizer "sticks" to the edge of the search space. This can happen for those kinds of local optimizers when n_iter is very small.
jc is a script of helper functions I have for taking a dictionary of social media posts (their authors, dates, and recipients), pulling out the post texts, cleaning them, tokenizing them, converting them into tf-idf, and generating labels for them. I also get the
Apologies, but I'm not sure what you mean by providing a random_state; this is admittedly my first rodeo :)
I just found the "error". You create the parameters in the search space, but you are not using any of them. The model does not change during the optimization run. Your objective function should look like this:

```python
def model(opt):
    clf_xgb = xgb.XGBClassifier(
        n_estimators=opt["n_estimators"],
        max_depth=opt["max_depth"],
        learning_rate=opt["learning_rate"],
        objective="binary:logistic",
        # eta=0.4,
        # max_depth=8,
        subsample=0.5,
        base_score=np.mean(y_labels),
        eval_metric="logloss",
        missing=None,
        use_label_encoder=False,
        seed=42,
    )
    scores = cross_val_score(
        clf_xgb, freq_df, y_labels, cv=5
    )  # default is 5, hyperactive example is 3
    return scores.mean()


# Configure the range of hyperparameters we want to test out
search_space = {
    "n_estimators": list(range(500, 5000, 100)),
    "max_depth": list(range(6, 12)),
    "learning_rate": [0.1, 0.3, 0.4, 0.5, 0.7],
}
```

It is funny how I missed this in my answers above. If you are convinced something is wrong, it is sometimes hard to see the obvious. I will close this issue now, but if you have further questions about this you can ask them here.
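The failure mode described above can be reproduced without xgboost or any real data. The following toy sketch (the function names and the scoring stand-in are invented for illustration, not part of the original code) shows why an objective that ignores `opt` returns the exact same score on every iteration, while one that forwards the sampled parameters does not:

```python
def toy_score(n_estimators, learning_rate):
    # Stand-in for cross_val_score: any deterministic function of the params.
    return 0.5 + 0.0001 * n_estimators - 0.1 * learning_rate

def broken_objective(opt):
    # Bug: `opt` is ignored, so every iteration evaluates the same model.
    return toy_score(n_estimators=500, learning_rate=0.3)

def fixed_objective(opt):
    # Fix: forward the sampled parameters into the model.
    return toy_score(opt["n_estimators"], opt["learning_rate"])

# Six distinct parameter combinations, as an optimizer might sample them.
samples = [{"n_estimators": n, "learning_rate": lr}
           for n in (500, 1000, 2000) for lr in (0.1, 0.3)]

broken = {broken_objective(s) for s in samples}
fixed = {fixed_objective(s) for s in samples}
print(len(broken))  # 1 -> constant score, as reported in the issue
print(len(fixed))   # 6 -> score varies with the parameters
```
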
Attempted tuning an xgboost binary classifier on tf-idf data, adjusting n_estimators, max_depth, and learning_rate, and there was zero variation in the score across all 50 iterations. When I manually tweak the parameters and run a single training instance, I do see score variations.
Note:
I have also tried this with the default optimizer for 20 iterations and with different ranges for the parameter tuning, and it gave me the same result: the score is always 0.6590446358653093.
SYSTEM DETAILS:
Amazon SageMaker
Hyperactive ver: 3.0.5.1
Python ver: 3.6.13
Here is my code: