-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chunked uploader #350
Chunked uploader #350
Conversation
Hi @carycheng, thanks for the pull request. Before we can merge it, we need you to sign our Contributor License Agreement. You can do so electronically here: http://opensource.box.com/cla Once you have signed, just add a comment to this pull request saying, "CLA signed". Thanks! |
Pull Request Test Coverage Report for Build 1265
💛 - Coveralls |
boxsdk/object/file.py
Outdated
Create a new chunked upload session for uploading a new version of the file. | ||
|
||
:param file_size: | ||
The size of the file that will be uploaded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth noting that this is in bytes
boxsdk/object/file.py
Outdated
:type file_size: | ||
`int` | ||
:param file_name: | ||
The name of the file version that will be uploaded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't totally correct; this parameter renames the entire file on upload.
boxsdk/object/folder.py
Outdated
response = self._session.post( | ||
url, | ||
data=json.dumps(body_params), | ||
).json() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit hard to read spread across multiple lines; does it fit all on one line instead?
boxsdk/object/upload_session.py
Outdated
data=content_stream | ||
).json() | ||
|
||
def commit(self, parts, content_sha1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also allow passing arbitrary file attributes and an ETag; see https://developer.box.com/v2.0/reference/#commit-upload
Also, parts
can be optional — if the user doesn't pass it in you can just call the API to get the list of parts the server has and use those (if you just want to trust the server).
boxsdk/object/upload_session.py
Outdated
|
||
range_end = min(offset + self.part_size - 1, total_size - 1) # pylint:disable=no-member | ||
|
||
return self._session.put( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be good to return just the actual part dict
here, to make it easier for developers to form the correct array for commit()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure that the return value of this method is just the part record, and not the weird wrapper around it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be a nice method on a ChunkedUploadPart
object.
parts = mock_box_session.get.return_value.json.return_value = { | ||
'entries': [ | ||
{ | ||
'part': { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the API reference, this isn't what the response looks like. Please align with the actual response format and ensure that this still works correctly
docs/file.md
Outdated
part_bytes = BytesIO(b'abcdefgh') | ||
upload_session = client.upload_session('11493C07ED3EABB6E59874D3A1EF3581') | ||
chunk = part_bytes.read(upload_session.part_size) | ||
offset = 32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth showing that the offset is a multiple of the part size in this example
docs/file.md
Outdated
|
||
### Manual Process | ||
|
||
For more complicated upload scenarios, such as those being coordinated across |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove most of this verbiage, since the automatic uploader will not be available right away. Instead, it might be nice to take the manual test code you used and put an end-to-end example here
part = test_upload_session.upload_part(chunk, offset, total_size) | ||
|
||
mock_box_session.put.assert_called_once_with(expected_url, data=chunk, headers=expected_headers) | ||
assert part['part']['sha1'] == expected_sha1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return value of this method should be the part object itself; this should ideally assert that:
assert part['sha1'] == expected_sha1
Also, we should assert that this is a dict
boxsdk/object/file.py
Outdated
} | ||
if file_name is not None: | ||
body_params['file_name'] = file_name | ||
url = self.get_url('{0}'.format('upload_sessions')).replace(API.BASE_API_URL, API.UPLOAD_URL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
url = self.get_url('{0}'.format('upload_sessions')).replace(API.BASE_API_URL, API.UPLOAD_URL) | |
url = self.get_url('{0}'.format('upload_sessions')).replace( | |
self._session.api_config.BASE_API_URL, | |
self._session.api_config.UPLOAD_URL, | |
) |
docs/file.md
Outdated
To create an upload session for uploading a large version, use `file.create_upload_session(file_size, file_name=None)` | ||
|
||
```python | ||
file_size = 197520 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using a realistic size, since our API won't let you upload such a small file in chunks.
docs/file.md
Outdated
|
||
```python | ||
from io import BytesIO | ||
offset = 32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example isn't likely to help a developer. The numbers don't match what the API will allow, and the code to read chunk_size
bytes from a stream is unlikely to work for a real chunk size.
Co-Authored-By: carycheng <ccheng@box.com>
Co-Authored-By: carycheng <ccheng@box.com>
Co-Authored-By: carycheng <ccheng@box.com>
boxsdk/object/file.py
Outdated
} | ||
if file_name is not None: | ||
body_params['file_name'] = file_name | ||
url = self.get_url('upload_sessions').replace(API.BASE_API_URL, API.UPLOAD_URL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use the local config here, rather than the global one, to ensure that a user's configured URLs get used.
boxsdk/object/upload_session.py
Outdated
:type fields: | ||
`Iterable` of `unicode` | ||
:returns: | ||
Returns a `list` of parts uploaded so far. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This returns a BoxObjectCollection
instead of a list
now
boxsdk/object/upload_session.py
Outdated
:param part_content_sha1: | ||
SHA-1 hash of the part's content. If not specified, this will be calculated. | ||
:type part_content_sha1: | ||
`unicode` or None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the right type? Below you encode the hash in a way that I think requires bytes
boxsdk/object/upload_session.py
Outdated
else: | ||
for part in self.get_parts(): | ||
parts_list.append(part) | ||
body['parts'] = parts_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be simplified to a list comprehension:
body['parts'] = [part for part in self.get_parts()]
Co-Authored-By: carycheng <ccheng@box.com>
No description provided.