
Fix/4285 limit disk on parallel upload #4329

Merged
merged 35 commits into master from fix/4285-limit-disk-on-parallel-upload on Jan 29, 2023

Conversation

AndrewJGaut (Contributor)

Reasons for making this change

We want to make sure users can't bypass their disk quota limits by uploading files in parallel.

Related issues

Fixes #4285
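
(For context, a minimal sketch of the idea behind the fix. The real change reports disk usage through the CodaLab REST API as chunks are uploaded; the in-process lock and the names UserRecord and record_upload_bytes below are purely illustrative, not the PR's actual code.)

```python
# Illustrative only: the actual fix tracks usage server-side via the REST API.
import threading
from dataclasses import dataclass


@dataclass
class UserRecord:
    disk_used: int
    disk_quota: int


_disk_lock = threading.Lock()


def record_upload_bytes(user: UserRecord, num_bytes: int) -> None:
    """Atomically charge num_bytes against the user's disk quota.

    Checking and incrementing under one lock is what keeps two parallel
    uploads from each passing the quota check before either is counted.
    """
    with _disk_lock:
        if user.disk_used + num_bytes > user.disk_quota:
            raise Exception(
                'Upload aborted. User disk quota exceeded. '
                'To apply for more quota, please visit the following link: '
                'https://codalab-worksheets.readthedocs.io/en/latest/FAQ/'
            )
        user.disk_used += num_bytes
```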

AndrewJGaut and others added 22 commits November 24, 2022 22:22
…odalab/codalab-worksheets into fix/4285-limit-disk-on-parallel-upload
AndrewJGaut force-pushed the fix/4285-limit-disk-on-parallel-upload branch from da91bb6 to 06b0f91 on January 7, 2023 22:38
AndrewJGaut marked this pull request as ready for review on January 18, 2023 05:12
raise Exception(
'Upload aborted. User disk quota exceeded. '
'To apply for more quota, please visit the following link: '
'https://codalab-worksheets.readthedocs.io/en/latest/FAQ/'

Member

can we refactor this link out into a constant?
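
(One way the suggested refactor could look; the constant name QUOTA_FAQ_URL and the helper function below are hypothetical, not the PR's actual code.)

```python
# Hypothetical constant name and helper; the PR may have chosen differently.
QUOTA_FAQ_URL = 'https://codalab-worksheets.readthedocs.io/en/latest/FAQ/'


def _raise_disk_quota_exceeded() -> None:
    raise Exception(
        'Upload aborted. User disk quota exceeded. '
        'To apply for more quota, please visit the following link: '
        + QUOTA_FAQ_URL
    )
```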

codalab/lib/upload_manager.py (outdated, resolved)
@@ -114,7 +114,7 @@ def _make_request(
raise RestClientException('Invalid JSON: ' + response_data, False)

def _upload_with_chunked_encoding(
- self, method, url, query_params, fileobj, progress_callback=None
+ self, method, url, query_params, fileobj, pass_self=False, progress_callback=None

Member

What does pass_self mean? Can you document it here and in the other function where you add it?

Contributor Author

Done.
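
(A plausible shape of the documentation that was added; the wording below is a guess based on this thread, not the merged docstring. The signature is the one shown in the diff above.)

```python
def _upload_with_chunked_encoding(
    self, method, url, query_params, fileobj, pass_self=False, progress_callback=None
):
    """Upload the contents of fileobj to url using chunked transfer encoding.

    :param pass_self: bool. If True, hand this client instance to the
        underlying upload helper, presumably so the helper can report the
        user's disk usage back through the REST client as chunks are sent
        (a guess based on this PR's discussion).
    :param progress_callback: Function. Called periodically with upload progress.
    """
    ...
```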

@@ -32,6 +33,11 @@ def upload_with_chunked_encoding(
:param url: String. Location or sub url that indicating where the file object will be uploaded.
:param need_response: Bool. Whether need to wait for the response.
:param progress_callback: Function. Callback function indicating upload progress.
+ :param json_api_client: JsonApiClient. None when this function is run by the server.
+     Used to update disk usage from client.
+ :param check_disk: bool. True if the user's disk should be checked during the upload.

Member

This param doesn't exist anymore

Contributor Author

Done.

@@ -61,16 +67,33 @@ def upload_with_chunked_encoding(

Member

Is this the exact same fn as the one in upload_manager.py? If so, we might want to refactor/unify in a future PR.

Contributor Author

Essentially. There are a few minor differences. I agree that it should be refactored.

@@ -92,7 +115,10 @@ def upload_with_chunked_encoding(
logging.debug("Socket timeout, retrying url: %s", url)
conn.send(b'\0')
logging.debug("Finished reading the response, url: %s", url)
+ print("Finished reading the response, url: %s", url)

Member

remove these print statements

Contributor Author

Done.

USER_READ_ONLY_FIELDS. We make this special function (which only allows
positive disk increments so that users can't decrement their disk used) to ensure
that we can safely increment user disk used without introducing a
security flaw.

Member

Add documentation as to what the request format should look like (i.e., the disk_used_increment key)

Contributor Author

Done.
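
(For context, the request body the reviewer asks to document presumably carries a single increment field, along these lines. Only the disk_used_increment key comes from the comment above; the surrounding JSON:API wrapper structure is an assumption.)

```python
# Sketch of a request body for the disk-usage update endpoint.
# Only the disk_used_increment key is taken from the review comment;
# the wrapper structure is assumed, not confirmed.
example_request_body = {
    'data': {
        'type': 'users',
        'attributes': {
            # Additional bytes to charge to the user's disk usage.
            # The server rejects non-positive values, so a client can only
            # ever increase (never decrease) its recorded usage.
            'disk_used_increment': 4096,
        },
    }
}
```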

_run_command([cl, 'work', worksheet_uuid])
# expect to fail when we upload something more than 2k bytes
check_contains(
'Upload aborted. User disk quota exceeded. '

Member

Just check if it contains "User disk quota exceeded" -- this way we don't need to change this test if we slightly modify the error message in the future.

Contributor Author

Done.
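
(The relaxed assertion presumably ends up along these lines, extending the test snippet quoted above; the second argument to check_contains is hypothetical since the original snippet is truncated.)

```python
_run_command([cl, 'work', worksheet_uuid])
# Expect the upload to fail once it pushes the user past the 2k-byte quota.
check_contains(
    'User disk quota exceeded',  # match only the stable core of the message
    upload_output,               # hypothetical: output captured from the failed upload
)
```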

AndrewJGaut merged commit b856bb0 into master on Jan 29, 2023
AndrewJGaut deleted the fix/4285-limit-disk-on-parallel-upload branch on January 29, 2023 22:07
wwwjn mentioned this pull request on Feb 7, 2023
Linked issue closed by this PR: make sure that users don't exceed disk quota with parallel uploads (#4285)