Bug: Cannot load training states from checkpoints to resume training #374

Closed
alarivarmann opened this issue Jul 31, 2020 · 8 comments
Labels: bug (Something isn't working), invalid (This doesn't seem right)

Comments

@alarivarmann

Describe the bug
When training AutoML on ResumablePipeline, logs are full of messages like these:
UserWarning: Cannot Load Step /models/resumable_pipeline/AutoML/ResumablePipeline (ResumablePipeline:ResumablePipeline) With Step Saver JoblibStepSaver.
saver.__class__.__name__))

To Reproduce

# Imports assumed for this snippet (Neuraxle ~0.5.x module paths).
# epochs, test_set_ratio, resumable_pipeline_folder, train and y_train
# are defined elsewhere in the full script.
import time

from sklearn.decomposition import FastICA, PCA
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error

from neuraxle.hyperparams.distributions import Choice, RandInt, Uniform
from neuraxle.hyperparams.space import HyperparameterSpace
from neuraxle.metaopt.auto_ml import AutoML, RandomSearchHyperparameterSelectionStrategy, ValidationSplitter
from neuraxle.metaopt.callbacks import MetricCallback, ScoringCallback
from neuraxle.pipeline import ResumablePipeline
from neuraxle.steps.sklearn import RidgeModelStacking
from neuraxle.union import AddFeatures

HP_real = HyperparameterSpace({
    "learning_rate": Uniform(1e-5, 1),
    "max_depth": RandInt(2, 4),
    "n_estimators": Choice([30, 60, 90, 100, 130])
})

pipeline_sk = ResumablePipeline([
    # A Pipeline is composed of multiple chained steps. Steps
    # can alter the data before passing it to the next steps.
    AddFeatures([
        PCA(n_components=2),
        FastICA(n_components=2),
    ]),
    RidgeModelStacking([
        RandomForestRegressor(),
        GradientBoostingRegressor(warm_start=False, min_samples_leaf=2, random_state=42)  # validation_fraction=0.2
    ])
], cache_folder=resumable_pipeline_folder).set_hyperparams_space(HP_real)

time_a = time.time()
auto_ml = AutoML(
    pipeline_sk,
    # AutoMLContainer = AutoMLContainer(main_scoring_metric_name="mse"),
    refit_trial=True,
    n_trials=int(epochs),
    cache_folder_when_no_handle=resumable_pipeline_folder,
    validation_splitter=ValidationSplitter(test_set_ratio),
    hyperparams_optimizer=RandomSearchHyperparameterSelectionStrategy(),
    scoring_callback=ScoringCallback(mean_squared_error, higher_score_is_better=False),
    callbacks=[
        MetricCallback('mse', metric_function=mean_squared_error, higher_score_is_better=False)
    ]
)
auto_ml = auto_ml.fit(train, y_train)  # with a custom label encoder, fit takes in the whole data at once

Expected behavior
I expect no warnings like these. The goal is to be able to warm-start training from old checkpoints.

@alexbrillant
Contributor

Hello @alarivarmann, you are right about the warning; I am fixing it right now. However, if you want to use checkpoints, you need to add one inside your pipeline. I am trying your code out. Thanks for creating this issue.

@alexbrillant
Contributor

I have fixed the warnings here: #378

@alexbrillant
Contributor

alexbrillant commented Aug 8, 2020

@alarivarmann you would need to add a checkpoint at the end of your pipeline for it to be truly resumable.
Check out the documentation that we have here for "checkpoints": https://www.neuraxle.org/stable/examples/caching/plot_auto_ml_checkpoint.html
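
For illustration, here is a minimal sketch of what is described above, reusing the imports and resumable_pipeline_folder from the snippet in the issue, and assuming the DefaultCheckpoint step from neuraxle.checkpoints as shown in the linked caching example:

from neuraxle.checkpoints import DefaultCheckpoint
from neuraxle.pipeline import ResumablePipeline

# Each DefaultCheckpoint persists the data flowing through the pipeline at
# that point, so a rerun can resume from the latest saved checkpoint instead
# of recomputing everything from scratch.
pipeline = ResumablePipeline([
    AddFeatures([
        PCA(n_components=2),
        FastICA(n_components=2),
    ]),
    DefaultCheckpoint(),  # checkpoint after feature engineering
    RidgeModelStacking([
        RandomForestRegressor(),
    ]),
    DefaultCheckpoint(),  # checkpoint at the end, as suggested above
], cache_folder=resumable_pipeline_folder)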

However, do you really need this? It seems like you are trying to run an AutoML loop, not to use any checkpoints. I think you should use Pipeline instead of ResumablePipeline.

You could use the Trainer alone to do this, without the AutoML loop. I think you simply want to save and load the pipeline you have trained. Am I right? If so, please take a look at the Step Saving documentation (and the sketch below): https://www.neuraxle.org/stable/step_saving_and_lifecycle.html
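
For readers who just want to save and load a trained pipeline without the AutoML loop, here is a rough sketch assuming the ExecutionContext full-dump flow described on the linked Step Saving page; exact signatures may differ between Neuraxle versions:

import numpy as np
from neuraxle.base import ExecutionContext, Identity
from neuraxle.pipeline import Pipeline

saving_folder = '/tmp/my_pipeline_cache'  # hypothetical cache folder

# Fit a (trivial) pipeline, then persist the whole trained pipeline at once
# as a "full dump", using the pipeline's step savers (e.g. JoblibStepSaver).
pipeline = Pipeline([Identity()])
pipeline = pipeline.fit(np.array([[1.0], [2.0]]), np.array([1.0, 2.0]))
pipeline.save(ExecutionContext(saving_folder), full_dump=True)

# Later, possibly in another process: reload the trained pipeline by name
# and keep using it.
loaded_pipeline = ExecutionContext(saving_folder).load('Pipeline')
outputs = loaded_pipeline.transform(np.array([[3.0]]))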

I don't think you are trying to search hyperparams here. We need to add more examples for this...

This feature is still experimental. It will be fully ready/functional after this PR: #377

@alarivarmann
Author

Hi Alex,

Thanks for your response.

This code snippet was just an example. The reason I wanted to try Neuraxle is to have an AutoML pipeline whose training can be resumed from checkpoints. Does Neuraxle currently support this feature? Thanks a lot!

@alexbrillant
Contributor

> This code snippet was just an example. The reason I wanted to try Neuraxle is to have an AutoML pipeline whose training can be resumed from checkpoints. Does Neuraxle currently support this feature?

Yes, there is an example here: https://www.neuraxle.org/stable/examples/caching/plot_auto_ml_checkpoint.html
The step saving checkpoints had problems, and they have been fixed in #377. I am waiting for @guillaume-chevalier to approve the pull request.

The next release will include this fix for the step saving checkpoints. Thanks for trying out Neuraxle :)

@alarivarmann
Author

OK, thanks!

When do you plan to implement Bayesian optimization/TPE similar to Optuna?

@guillaume-chevalier
Member

@alarivarmann Until Alexandre answers, I think you could dig into our unit tests for the AutoML hyperparameter selection algorithms. You should be able to see example usages of our TPE!

We just need to add some documentation; I think it was working.

@alexbrillant can you confirm that we have a functional and working TPE implementation? I think this should be documented very soon; this is an important point not to forget.

@guillaume-chevalier
Member

guillaume-chevalier commented Sep 29, 2020

@alarivarmann So here are the TPE tests and how you can use them:
https://github.com/Neuraxio/Neuraxle/blob/master/testing/metaopt/test_tpe.py

If you have difficulty reading those tests, you may want to understand how @pytest.mark.parametrize works, as well as joblib's delayed function (see the sketch below).
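
For readers unfamiliar with those two helpers, here is a minimal, self-contained illustration (not taken from the Neuraxle test suite):

import pytest
from joblib import Parallel, delayed

# @pytest.mark.parametrize runs the same test once per listed argument value.
@pytest.mark.parametrize("exponent", [2, 3])
def test_parallel_powers(exponent):
    # joblib's delayed() wraps a call so that Parallel can schedule it lazily,
    # here across 2 worker processes.
    results = Parallel(n_jobs=2)(delayed(pow)(i, exponent) for i in range(5))
    assert results == [i ** exponent for i in range(5)]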
