Tabular: Add available disk logging + warning #3069

Merged (3 commits) on Mar 22, 2023

@Innixma Innixma commented Mar 21, 2023

Issue #, if available:

Description of changes:

  • Add available disk logging + warning to TabularPredictor.fit call.
  • This will highlight to the user when they have small amounts of disk space available.
  • If the logic errors out for whatever reason (for example, due to an unusual environment), it falls back to a warning message stating that we were unable to calculate the available disk space, asks the user to submit a GitHub issue, and continues.
  • Note that the current threshold is 10 GB. This is too high for small datasets and too low for large datasets, but properly calculating the expected disk space requirement is complex. 10 GB is a good initial value; however, the right value depends on how many models are being trained and which presets are being used. I've added a TODO to improve this estimate in the future.
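
For illustration only, the reporting behavior described in the list above can be sketched as a pure function. This is not the code merged in this PR: the function name `format_disk_report`, the `min_free_gb` parameter, and the strict `<` comparison against the threshold are assumptions; only the message wording and the division by 1e9 are taken from the examples and diff excerpts on this page.

```python
def format_disk_report(free_bytes: int, total_bytes: int, min_free_gb: float = 10.0) -> str:
    """Build a 'Disk Space Avail' log line, appending a warning when space is low.

    Hypothetical helper: the name, signature, and threshold comparison are assumptions.
    """
    free_gb = free_bytes / 1e9    # decimal gigabytes, as in the diff excerpt
    total_gb = total_bytes / 1e9
    # Guard against a zero-sized filesystem report to avoid ZeroDivisionError.
    proportion = free_bytes / total_bytes if total_bytes else 0.0
    line = f"Disk Space Avail:   {free_gb:.2f} GB / {total_gb:.2f} GB ({proportion * 100:.1f}%)"
    if free_gb < min_free_gb:
        line += (
            "\n\tWARNING: Available disk space is low and there is a risk that "
            "AutoGluon will run out of disk during fit, causing an exception."
            f"\n\tWe recommend a minimum available disk space of {min_free_gb:g} GB, "
            "and large datasets may require more."
        )
    return line
```

Keeping the formatting separate from the filesystem query makes the threshold logic easy to test without needing a nearly full disk.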

Example:

Low Available Disk:

Beginning AutoGluon training ...
AutoGluon will save models to "AutogluonModels/ag-20230321_214349/"
AutoGluon Version:  0.6.2b20230103
Python Version:     3.8.10
Operating System:   Darwin
Platform Machine:   x86_64
Platform Version:   Darwin Kernel Version 21.6.0: Mon Dec 19 20:44:01 PST 2022; root:xnu-8020.240.18~2/RELEASE_X86_64
Disk Space Avail:   1.50 GB / 499.96 GB (0.3%)
	WARNING: Available disk space is low and there is a risk that AutoGluon will run out of disk during fit, causing an exception. 
	We recommend a minimum available disk space of 10 GB, and large datasets may require more.
Train Data Rows:    39073
Train Data Columns: 14

Standard (sufficient disk space):

Beginning AutoGluon training ...
AutoGluon will save models to "AutogluonModels/ag-20230321_214455/"
AutoGluon Version:  0.6.2b20230103
Python Version:     3.8.10
Operating System:   Darwin
Platform Machine:   x86_64
Platform Version:   Darwin Kernel Version 21.6.0: Mon Dec 19 20:44:01 PST 2022; root:xnu-8020.240.18~2/RELEASE_X86_64
Disk Space Avail:   332.76 GB / 499.96 GB (66.6%)
Train Data Rows:    39073
Train Data Columns: 14

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Innixma Innixma added this to the 0.7.1 Release milestone Mar 21, 2023
@Innixma Innixma added the API & Doc (Improvements or additions to documentation) and module: tabular labels Mar 21, 2023
Comment on lines +72 to +74
disk_stats = ResourceManager.get_disk_usage(path=self.path)
disk_free_gb = disk_stats.free / 1e9
disk_total_gb = disk_stats.total / 1e9
Collaborator

I notice you created class methods for get_disk_usage_free, get_disk_usage_used, and get_disk_usage_total, but only used get_disk_usage and extracted the free and total values from its result. Should we consider deleting the get_disk_usage_{free,used,total} methods if we aren't going to use them? It's easy to extract the members individually, and doing so avoids multiple calls to shutil.disk_usage.

Contributor Author

Good point. I don't think they are necessary, so I've removed them.
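
As a rough sketch of the point raised in this thread: `shutil.disk_usage` returns a named tuple with `total`, `used`, and `free` fields (in bytes), so a single call is enough to read all three. The standalone `get_disk_usage` function below is a hypothetical stand-in for the PR's `ResourceManager.get_disk_usage`, whose actual implementation is not shown on this page.

```python
import shutil


def get_disk_usage(path: str = "."):
    """Return the (total, used, free) named tuple from one shutil.disk_usage call.

    Hypothetical stand-in for ResourceManager.get_disk_usage.
    """
    return shutil.disk_usage(path)


# One filesystem query, then plain attribute access for each value.
disk_stats = get_disk_usage(".")
disk_free_gb = disk_stats.free / 1e9
disk_total_gb = disk_stats.total / 1e9
disk_used_gb = disk_stats.used / 1e9
```

Reading all fields from one result also keeps the values mutually consistent, since separate calls could observe the disk at different moments.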

    logger.log(disk_verbosity,
               f'Disk Space Avail: {disk_free_gb:.2f} GB / {disk_total_gb:.2f} GB '
               f'({disk_proportion_avail * 100:.1f}%){disk_log_extra}')
except Exception as e:
Collaborator

It's best to catch as specific an exception as possible. That way we can still emit the following log statement when the issue is indeed related to the disk_usage call, but if another error is raised in this try block, we don't accidentally hide it from the user. If you know of a more specific exception, it would be good to use it here.

Contributor Author

It is unclear what exception it would raise: finding out would require me to simulate some unusual, unknown environment where this logic doesn't work correctly, and the shutil code does not make clear what exception would be raised in such situations. Because of this, it is safest to use a generic exception catch to avoid completely breaking AutoGluon in niche scenarios that I don't know about. I'm unsure whether these scenarios even exist, but it is very hard to rule out the possibility (for example: running in the browser, in Colab, in Kaggle, in Docker, on Windows, on specialized private hardware, in a fully in-memory disk environment, etc.).

Contributor Author

Added an inline comment in the code explaining this.
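
A minimal sketch of the trade-off discussed in this thread, assuming a helper named `log_disk_space` (hypothetical, not the PR's code): the broad `except Exception` keeps a failed disk check from aborting the caller, while the `try` block is kept as small as possible so it cannot mask unrelated errors. The `usage_fn` parameter is an assumption added here so that a failing environment can be simulated, and the warning text is illustrative rather than the wording used in AutoGluon.

```python
import logging
import shutil

logger = logging.getLogger("disk_check_sketch")  # hypothetical logger name


def log_disk_space(path: str = ".", usage_fn=shutil.disk_usage) -> bool:
    """Log available disk space; return False if the check itself failed."""
    try:
        # Keep the try block limited to the call that may fail in unusual environments.
        stats = usage_fn(path)
    except Exception as e:
        # Deliberately broad: it is unknown which exception an unusual environment
        # would raise, and a diagnostic check should never abort the caller.
        logger.warning(
            "Unable to calculate available disk space (%s: %s). Continuing anyway.",
            type(e).__name__,
            e,
        )
        return False
    logger.info(
        "Disk Space Avail:   %.2f GB / %.2f GB", stats.free / 1e9, stats.total / 1e9
    )
    return True
```

For example, passing a `usage_fn` that raises `RuntimeError` makes the function log a warning and return `False` instead of propagating the error.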

@github-actions

Job PR-3069-88242ab is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3069/88242ab/index.html

@github-actions

Job PR-3069-35fb649 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3069/35fb649/index.html

@Innixma Innixma merged commit 8c45480 into autogluon:master Mar 22, 2023
@Innixma Innixma modified the milestones: 0.7.1 Release, 0.8 Release May 16, 2023
Labels
API & Doc (Improvements or additions to documentation), module: tabular

3 participants