Speed up disk quota computation #4354

Merged · 18 commits merged into master from speed-up-disk-quota-computation on Feb 8, 2023
Conversation

@AndrewJGaut (Contributor) commented on Jan 17, 2023

Reasons for making this change

The current method of updating the disk quota is slow. Currently, a user's disk usage is recomputed from scratch using the following function:

def _get_disk_used(self, user_id):
    return (
        self.search_bundles(user_id, ['size=.sum', 'owner_id=' + user_id, 'data_hash=%'])[
            'result'
        ]
        or 0
    )

and then the user's disk usage is set to the returned value. However, such a search is quite slow. This PR speeds up the computation by instead incrementing the user's disk usage by the correct amount each time it changes.
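
For illustration, the incremental approach amounts to something like the following (a minimal sketch; increment_user_disk_used and the model-layer accessors are hypothetical names chosen for this example, not necessarily the ones used in the PR):

def increment_user_disk_used(model, user_id, delta):
    # Hypothetical sketch: instead of re-summing the sizes of all of a
    # user's bundles on every change, adjust the stored total by a delta.
    # `delta` is positive when a bundle is uploaded or grows, and negative
    # when a bundle is deleted.
    current = model.get_user_disk_used(user_id)  # assumed accessor
    model.set_user_disk_used(user_id, max(0, current + delta))  # assumed setter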

This PR also adds a bunch of disk quota tests.

Related issues

#4350 and #4351

@AndrewJGaut (Contributor, Author) commented:

TODO @AndrewJGaut: Add benchmarking

Commit message (truncated): "…h the data_size stuff. I'm not surprised that it's hard to predict for a running bundle, but I am surprised that something weird happens when uploading two things at once. The bundle size isn't the sum of the two archives. Very weird."

@AndrewJGaut reopened this on Jan 18, 2023
@@ -470,8 +472,6 @@ def upload_to_bundle_store(
params={'success': False, 'error_msg': f'Bypass server upload error. {err}',},
)
raise err
else:
Member commented:

why remove this?

Contributor (Author) commented:

You can't see it in this abridged view, but if you look at the full file, the code looks like:

try:
    ...
    self._client.update_bundle_state(bundle['id'], params={'success': True})
except Exception as err:
    ...
else:
    self._client.update_bundle_state(bundle['id'], params={'success': True})

What does this mean? In a try/except block, the else branch runs only when no exception is raised. In other words, when no error was thrown, the self._client.update_bundle_state(bundle['id'], params={'success': True}) call was triggered twice: once in the try body and once in the else branch. Before, this wasn't an issue (except for efficiency) since the update wasn't incremental (it just reset the whole value), but it became one here since the disk increment happened twice.
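
As a quick standalone illustration of that try/except/else behavior (this snippet is mine, not from the PR):

def demo():
    try:
        print("body")      # runs first; raises no exception
    except Exception:
        print("except")    # runs only if the try body raises
    else:
        print("else")      # runs only if the try body did NOT raise

demo()  # prints "body" then "else" -- so a call duplicated in both the
        # try body and the else branch executes twice on success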

)
_run_command([cl, 'rm', uuid])
check_equals(disk_used, _run_command([cl, 'uinfo', 'codalab', '-f', 'disk_used']))
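For context, this fragment checks that removing a bundle restores the user's disk_used to its previous value. A fuller sketch of that pattern might look like the following (hedged: _run_command and check_equals appear in the fragment above, while the upload step and the file name are assumptions about the surrounding test code):

disk_used = _run_command([cl, 'uinfo', 'codalab', '-f', 'disk_used'])
uuid = _run_command([cl, 'upload', some_test_file])  # some_test_file is hypothetical
_run_command([cl, 'rm', uuid])
check_equals(disk_used, _run_command([cl, 'uinfo', 'codalab', '-f', 'disk_used']))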

Member commented:

Do these test code paths end up calling cleanup_existing_contents?

Contributor (Author) commented:

Ah, shoot. No, I don't think so. Let me add one more test for that before I merge.

Contributor (Author) commented:

Thanks for catching that!

@AndrewJGaut force-pushed the speed-up-disk-quota-computation branch from cd195d3 to 47a8b47 on February 8, 2023 at 06:38
@AndrewJGaut merged commit e40009b into master on Feb 8, 2023
@AndrewJGaut deleted the speed-up-disk-quota-computation branch on February 8, 2023 at 08:37
@wwwjn mentioned this pull request on Feb 10, 2023