Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tabular: Fix crash when save path is absolute #3288

Merged
merged 3 commits into from
Jun 8, 2023

Conversation

Innixma
Copy link
Contributor

@Innixma Innixma commented Jun 7, 2023

Issue #, if available:

Description of changes:

  • If using an absolute path in predictor init, will crash upon load.

This PR fixes the obvious error so that the below script works, however the current fix is not a good idea long-term, as I have no idea if there will be edge-cases where it fails. I also don't know if it will work when using multiple disk drives in a single process.

The root of the issue is that StackerEnsembleModel does something it shouldn't: It has the absolute paths of both itself and its base models stored as variables. Therefore, in order to figure out where the base models are upon loading in a new location, it isn't sufficient to look at the old absolute paths, instead they need to be regenerated based on 1. the old absolute path, 2. the new path.

For example:

Model 1: `/abc/def/`
Model 2: `./foo/bar/`


# Model 1 where model 2 is a base model:
self.path_root = `/abc/def/`
self.base_model_paths_dict = {
    'model_2': './foo/bar/'
}

When loaded from a new location, this can cause issues. It worked in the past except for cross-os loading.

new_path = './new/location/'
new_model_2_path = ?????  # <------- This is the problem, hard to figure out the correct path

The solution is to either

  1. Delete StackerEnsembleModel and have trainer deal with this (Ideal, but significant work)
  2. Convert absolute paths to relative paths, or otherwise hack things to avoid the breaking situation (What I've done in this PR, but I don't like it. It probably fixes things, but hard to know for sure in all cases).
  3. Bandaid solution by try/except on the path conversion: If can't convert because absolute, assume same machine and continue (will crash if cross-OS and absolute path was used during fit).

Reproducible Example:

from autogluon.tabular import TabularPredictor, TabularDataset


if __name__ == '__main__':
    data_root = 'https://autogluon.s3.amazonaws.com/datasets/Inc/'
    train_data = TabularDataset(data_root + 'train.csv')
    train_data = train_data.sample(500)
    test_data = TabularDataset(data_root + 'test.csv')

    predictor = TabularPredictor(
        label='class',
        path='/tmp/tmpfe_o0p7g/',
    ).fit(
        train_data=train_data,
        hyperparameters={'DUMMY': {}},
        fit_weighted_ensemble=False,
        num_bag_folds=2,
        num_bag_sets=1,
    )
    predictor.persist_models('best')
    predictor.leaderboard(test_data)

Exception:

    assert not PathConverter._is_absolute(path), "It is ambiguous on how to convert an absolute path. Please provide a relative path instead"
AssertionError: It is ambiguous on how to convert an absolute path. Please provide a relative path instead

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Innixma Innixma added bug Something isn't working module: tabular priority: 0 Maximum priority labels Jun 7, 2023
@Innixma Innixma added this to the 0.8 Release milestone Jun 7, 2023
@Innixma Innixma requested a review from yinweisu June 7, 2023 03:31
@github-actions
Copy link

github-actions bot commented Jun 7, 2023

Job PR-3288-1fc42e4 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3288/1fc42e4/index.html

@github-actions
Copy link

github-actions bot commented Jun 8, 2023

Job PR-3288-bcd61c2 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3288/bcd61c2/index.html

@Innixma Innixma changed the title [WIP] Tabular: Fix crash when save path is absolute Tabular: Fix crash when save path is absolute Jun 8, 2023
Copy link
Collaborator

@yinweisu yinweisu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Also verified cross-os loading and cloud endpoint

@yinweisu yinweisu merged commit 03d4f7e into autogluon:master Jun 8, 2023
28 checks passed
@yinweisu yinweisu mentioned this pull request Jun 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working module: tabular priority: 0 Maximum priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants