-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
datasetRw: add dataset upload file capabilities to domino-python (multi-part, multi-thread) #185
datasetRw: add dataset upload file capabilities to domino-python (multi-part, multi-thread) #185
Conversation
c76f0fa
to
e48bef8
Compare
UPLOAD_READ_TIMEOUT_IN_SEC = 30 | ||
|
||
|
||
class UploadChunk: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems to be a good use case for a dataclass https://docs.python.org/3/library/dataclasses.html
domino/datasets.py
Outdated
url = self.routes.datasets_end_upload(self.dataset_id, self.upload_key, self.target_relative_path) | ||
return self.request_manager.get(url) | ||
|
||
def _create_chunk_queue(self) -> [UploadChunk]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you have to use list[UploadChunk]
for proper type hinting here
domino/domino.py
Outdated
) | ||
|
||
try: | ||
uploader.start_upload_session() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like you could easily make this a context manager https://www.pythonmorsels.com/creating-a-context-manager/ to avoid having the user of this class know which method to call first (since there are no arguments once created)
domino/datasets.py
Outdated
return self._create_chunk(self.local_path_file_or_directory) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python like early return, no need for else
here
domino/datasets.py
Outdated
# append chunk to queue | ||
nested_q.append(self._create_chunk(relative_path_to_file)) | ||
# flattening list of chunks | ||
return [chunk for chunks in nested_q for chunk in chunks] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WAT is happening here 😆 . Can you name things differently? is there a way to flatten this in the previous for loop?
domino/datasets.py
Outdated
if should_upload: | ||
upload_try = 1 | ||
uploaded = False | ||
while upload_try <= MAX_UPLOAD_ATTEMPTS and not uploaded: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
big fan of making good use a decorator for this kind of logic https://pypi.org/project/retry/. This allow clearer code by making the function do only what it is supposed to do
self.upload_key = upload_key | ||
|
||
|
||
class Uploader: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment about making this a context manager class
tests/test_datasets.py
Outdated
|
||
assert response.status_code == 200 | ||
assert "filecache" in response.json() | ||
q |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this q
supposed to be here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
haha no!! thanks :)
domino/datasets.py
Outdated
except Exception: | ||
if upload_try > MAX_UPLOAD_ATTEMPTS: | ||
raise Exception(f"Uploading chunk {chunk.chunk_number} of {chunk.total_chunks} " | ||
f"for {chunk.file_name} failed. Please try again") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Can add period after "Please try again"
def datasets_end_upload(self, dataset_id, upload_key, target_relative_path=None): | ||
url = self.host + f"/v4/datasetrw/datasets/{dataset_id}/snapshot/file/end/{upload_key}" | ||
if target_relative_path: | ||
url += f"?targetRelativePath={target_relative_path}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
domino/datasets.py
Outdated
self.relative_path = relative_path | ||
self.target_chunk_size = target_chunk_size | ||
self.total_chunks = total_chunks | ||
self.upload_key = upload_key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit: Might be clearer to match the order or the arguments of __init__
with the order the variables are assigned.
domino/datasets.py
Outdated
except Exception: | ||
if upload_try > MAX_UPLOAD_ATTEMPTS: | ||
raise Exception(f"Uploading chunk {chunk.chunk_number} of {chunk.total_chunks} " | ||
f"for {chunk.file_name} failed. Please try again") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Could add period at the end of message.
domino/datasets.py
Outdated
def __init__( | ||
self, | ||
absolute_path: str, | ||
chunk_number: int, | ||
dataset_id: str, | ||
file_name: str, | ||
file_size: int, | ||
identifier: str, | ||
relative_path: str, | ||
target_chunk_size: int, | ||
total_chunks: int, | ||
upload_key: str, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the __init__
in a dataclass (that was the goal of my comment)
raise RuntimeError(f"upload key for {self.dataset_id} not found. Session could not start.") | ||
return self | ||
|
||
def __exit__(self, exc_type, exc_val, exc_tb): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might want to check on the exc_type to cancel the upload session in the event an exception was raised? This way you handle the whole lifecycle of the upload here instead of leaving that to the user
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh i see!
domino/domino.py
Outdated
except ValueError: | ||
self.log.error(f"Erroneous input parameter. Please fix the parameters and try again") | ||
except Exception as e: | ||
self.log.error(f"Upload for dataset {dataset_id} and file {local_path_to_file_or_directory} failed, " | ||
f"canceling session. Please try again.") | ||
uploader.cancel_upload_session() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see previous comment, want to avoid that here
@pytest.mark.skipif( | ||
not domino_is_reachable(), reason="No access to a live Domino deployment" | ||
) | ||
def test_datasets_upload(default_domino_client): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some more test cases.
domino/datasets.py
Outdated
except Exception as e: | ||
self.log.error("Upload failed. See error for details.") | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to use try..catch
here since it gets captured in the exit
domino/datasets.py
Outdated
if not self.upload_key: | ||
raise RuntimeError(f"upload key for {self.dataset_id} not found. Could not end session.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might want to check the exception before this since an error can happen before the upload_key is set
domino/datasets.py
Outdated
# catching errors | ||
if not self.upload_key: | ||
raise RuntimeError(f"upload key for {self.dataset_id} not found. Could not end session.") | ||
if exc_type != None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if exc_type is not None
is the python way of checking null values
domino/datasets.py
Outdated
f"attempting to cancel session. Please try again.") | ||
self.log.error(f"Error type: {exc_val}. Error message: {exc_tb}.") | ||
if not isinstance(exc_type, ValueError): | ||
self._cancel_upload_session() # is it is a ValueError, canceling session would fail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happen if this raise an exception? afaik __exit__
must return True or False https://docs.python.org/3/library/contextlib.html#contextlib.ContextDecorator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it raises it! modifying below.
domino/datasets.py
Outdated
self.log.info("Upload session cancelled successfully.") | ||
except Exception as e: | ||
self.log.info("Cancelling upload session failed. See error for details.") | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably want to prevent the exception from being re-raise (see previous comment)
domino/domino.py
Outdated
target_chunk_size=target_chunk_size, | ||
target_relative_path=target_relative_path | ||
) as path: | ||
if path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in what scenario is path not evaluating to true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah previously I was logging “Upload complete” here and was trying to find a way around the possibility of the ending failing - this is a remnant of that. good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving but couple of comments that we might want to iterate on a future PR.
domino/datasets.py
Outdated
self.upload_key = self.request_manager.post(start_upload_url, json=start_upload_body).json() | ||
if not self.upload_key: | ||
raise RuntimeError(f"upload key for {self.dataset_id} not found. Session could not start.") | ||
return self._upload() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure whether you should let the user (user of the context manager) initiate the upload or not, seems more correct to let them do it in case they want to do some logging before and after.
except Exception as e: | ||
self.log.error("Ending snapshot upload failed. See error for details. Attempting to cancel " | ||
"upload session.") | ||
self._cancel_upload_session() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see previous comment, the exit method should return true/false
Big hype, this seems like it closes #171! Thank you! |
Link to JIRA
DOM-52840
What issue does this pull request solve?
Needed a new endpoint to handle dataset upload
What is the solution?
Add endpoint as well as full client implementation + tests for single file, multi-part, multi-thread uploads
Testing
Briefly describe how the change was tested. The purpose of this section is that a reviewer can identify a test gap, if any.
e.g. "I ran an upgrade from 4.2 to 4.6".
Pull Request Reminders
References (optional)