
No score variance over 50 iterations despite multiple parameters switched #23

Closed
suciokhan opened this issue Mar 19, 2021 · 5 comments

@suciokhan

suciokhan commented Mar 19, 2021

I attempted to tune an XGBoost binary classifier on tf-idf data, adjusting n_estimators, max_depth, and learning_rate, but there was zero variation in the score across 50 iterations. When I manually tweak parameters and run a single training instance, the score does change. Note: I have also tried this with the default optimizer for 20 iterations and with different ranges for the parameter tuning, and it gave me the same result: the score is always 0.6590446358653093.

SYSTEM DETAILS:
Amazon SageMaker
Hyperactive ver: 3.0.5.1
Python ver: 3.6.13

Here is my code:

freq_df, y_labels = jc.prep_train_data('raw_data.pkl', remove_stopwords=False)

def model(opt):
    clf_xgb = xgb.XGBClassifier(objective='binary:logistic',
                            #eta=0.4,
                            #max_depth=8,
                            subsample=0.5,
                            base_score=np.mean(y_labels),
                            eval_metric = 'logloss',
                            missing=None,
                            use_label_encoder=False,
                            seed=42)
    
    scores = cross_val_score(clf_xgb, freq_df, y_labels, cv=5) # default is 5, hyperactive example is 3

    return scores.mean()

# Configure the range of hyperparameters we want to test out
search_space = {
    "n_estimators": list(range(500, 5000, 100)),
    "max_depth": list(range(6, 12)),
    "learning_rate": [0.1, 0.3, 0.4, 0.5, 0.7],
}

# Configure the optimizer
optimizer = SimulatedAnnealingOptimizer(
    epsilon=0.1,
    distribution="laplace",
    n_neighbours=4,
    rand_rest_p=0.1,
    p_accept=0.15,
    norm_factor="adaptive",
    annealing_rate=0.999,
    start_temp=0.8)

# Execute optimization
hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50, optimizer=optimizer)
hyper.run()

# Print-out the results and save them to a dataframe
results_filename = "xgboost_hyperactive_results.csv"

search_data = hyper.results(model)
search_data.to_csv(results_filename, index=0)
@suciokhan added the bug label on Mar 19, 2021
@SimonBlanke
Owner

SimonBlanke commented Mar 19, 2021

Could you provide the entire script? How do I get "jc"? I would like to reproduce this bug. Could you also provide a random_state that shows the bug?

@SimonBlanke
Owner

I already have the suspicion that the SimulatedAnnealingOptimizer "sticks" to the edge of the search space. This can happen for those kinds of local optimizers when n_iter is very small.
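
One way to check that suspicion is to look at the collected search data itself: Hyperactive records the sampled parameters for every iteration, so if the optimizer is stuck at an edge of the search space, the parameter columns barely change. A rough sketch, assuming the results DataFrame uses the search-space keys and a "score" column as in the script above:

# run this after hyper.run() from the script above
search_data = hyper.results(model)

# if the optimizer is stuck, these columns contain very few distinct values
print(search_data[["n_estimators", "max_depth", "learning_rate"]].nunique())
print(search_data["score"].describe())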

@SimonBlanke added the question label and removed the bug label on Mar 19, 2021
@SimonBlanke self-assigned this on Mar 19, 2021
@suciokhan
Author

suciokhan commented Mar 19, 2021

jc is a script of helper functions I have for taking a dictionary of social media posts, their authors, dates, and recipients, pulling out the post texts, cleaning them, tokenizing them, converting them into tf-idf, and generating labels for them. I also get the exact same score when I do not use SimulatedAnnealing and instead use the default random optimizer. freq_df is a dataframe of the tf-idf values for each token in the corpus, where each row is a separate document and the columns are the tokens.
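
For anyone trying to reproduce this without the jc helper, a rough stand-in that builds a comparable tf-idf dataframe from a public corpus could look like the following (the 20 newsgroups categories and max_features are arbitrary placeholders, not the original data; get_feature_names_out assumes a recent scikit-learn, older versions used get_feature_names):

import numpy as np
import pandas as pd
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer

# stand-in for jc.prep_train_data: two categories give binary labels
raw = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])

vectorizer = TfidfVectorizer(max_features=2000)
tfidf = vectorizer.fit_transform(raw.data)

# one row per document, one column per token, as described above
freq_df = pd.DataFrame(tfidf.toarray(), columns=vectorizer.get_feature_names_out())
y_labels = np.array(raw.target)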

@suciokhan
Author

Apologies, but I'm not sure what you mean by providing a random_state; this is admittedly my first rodeo :)
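
For reference, a random_state is just a fixed seed that makes the optimization run reproducible, so the same behaviour can be replayed exactly. Assuming add_search accepts a random_state keyword in this Hyperactive version (an assumption, not confirmed in this thread), it would be passed like this:

hyper = Hyperactive()
hyper.add_search(
    model,
    search_space,
    n_iter=50,
    optimizer=optimizer,
    random_state=42,  # any fixed integer; report this value together with the bug
)
hyper.run()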

@SimonBlanke
Owner

I just found the "error". You define the parameters in the search space, but you never use any of them in the objective function, so the model does not change during the optimization run.

Your objective function should look like this:

def model(opt):
    clf_xgb = xgb.XGBClassifier(
        n_estimators=opt["n_estimators"],
        max_depth=opt["max_depth"],
        learning_rate=opt["learning_rate"],
        objective="binary:logistic",
        # eta=0.4,
        # max_depth=8,
        subsample=0.5,
        base_score=np.mean(y_labels),
        eval_metric="logloss",
        missing=None,
        use_label_encoder=False,
        seed=42,
    )

    scores = cross_val_score(
        clf_xgb, freq_df, y_labels, cv=5
    )  # default is 5, hyperactive example is 3

    return scores.mean()

# Configure the range of hyperparameters we want to test out
search_space = {
    "n_estimators": list(range(500, 5000, 100)),
    "max_depth": list(range(6, 12)),
    "learning_rate": [0.1, 0.3, 0.4, 0.5, 0.7],
}
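
With the parameters actually passed into XGBClassifier, the scores in the search data should now vary between iterations. A quick way to confirm, assuming the results dataframe exposes a "score" column as used in your script above:

hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50, optimizer=optimizer)
hyper.run()

search_data = hyper.results(model)
print(search_data["score"].nunique())   # should be greater than 1 now
print(search_data["score"].describe())  # std should be non-zero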

It is funny how I missed this in my answers above. If you are convinced something is wrong, it is sometimes hard to see the obvious.

I will close this issue now, but if you have further questions about this you can ask them here.
