Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chunked uploader #350

Merged
merged 67 commits into from
Oct 29, 2018
Merged

Chunked uploader #350

merged 67 commits into from
Oct 29, 2018

Conversation

carycheng
Copy link

No description provided.

@boxcla
Copy link

boxcla commented Oct 9, 2018

Hi @carycheng, thanks for the pull request. Before we can merge it, we need you to sign our Contributor License Agreement. You can do so electronically here: http://opensource.box.com/cla

Once you have signed, just add a comment to this pull request saying, "CLA signed". Thanks!

@carycheng carycheng changed the base branch from 1.6 to master October 9, 2018 23:47
@carycheng carycheng closed this Oct 11, 2018
@carycheng carycheng reopened this Oct 11, 2018
@coveralls
Copy link

coveralls commented Oct 11, 2018

Pull Request Test Coverage Report for Build 1265

  • 65 of 65 (100.0%) changed or added relevant lines in 7 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.08%) to 96.964%

Totals Coverage Status
Change from base Build 1260: 0.08%
Covered Lines: 2395
Relevant Lines: 2470

💛 - Coveralls

Create a new chunked upload session for uploading a new version of the file.

:param file_size:
The size of the file that will be uploaded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth noting that this is in bytes

:type file_size:
`int`
:param file_name:
The name of the file version that will be uploaded.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't totally correct; this parameter renames the entire file on upload.

response = self._session.post(
url,
data=json.dumps(body_params),
).json()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit hard to read spread across multiple lines; does it fit all on one line instead?

data=content_stream
).json()

def commit(self, parts, content_sha1):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also allow passing arbitrary file attributes and an ETag; see https://developer.box.com/v2.0/reference/#commit-upload

Also, parts can be optional — if the user doesn't pass it in you can just call the API to get the list of parts the server has and use those (if you just want to trust the server).


range_end = min(offset + self.part_size - 1, total_size - 1) # pylint:disable=no-member

return self._session.put(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to return just the actual part dict here, to make it easier for developers to form the correct array for commit()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure that the return value of this method is just the part record, and not the weird wrapper around it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a nice method on a ChunkedUploadPart object.

parts = mock_box_session.get.return_value.json.return_value = {
'entries': [
{
'part': {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the API reference, this isn't what the response looks like. Please align with the actual response format and ensure that this still works correctly

boxsdk/object/file.py Outdated Show resolved Hide resolved
boxsdk/object/file.py Outdated Show resolved Hide resolved
boxsdk/object/file.py Outdated Show resolved Hide resolved
boxsdk/object/folder.py Outdated Show resolved Hide resolved
boxsdk/object/folder.py Outdated Show resolved Hide resolved
docs/file.md Outdated
part_bytes = BytesIO(b'abcdefgh')
upload_session = client.upload_session('11493C07ED3EABB6E59874D3A1EF3581')
chunk = part_bytes.read(upload_session.part_size)
offset = 32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth showing that the offset is a multiple of the part size in this example

docs/file.md Outdated

### Manual Process

For more complicated upload scenarios, such as those being coordinated across
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove most of this verbiage, since the automatic uploader will not be available right away. Instead, it might be nice to take the manual test code you used and put an end-to-end example here

test/unit/object/test_file.py Show resolved Hide resolved
test/unit/object/test_upload_session.py Outdated Show resolved Hide resolved
part = test_upload_session.upload_part(chunk, offset, total_size)

mock_box_session.put.assert_called_once_with(expected_url, data=chunk, headers=expected_headers)
assert part['part']['sha1'] == expected_sha1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value of this method should be the part object itself; this should ideally assert that:

assert part['sha1'] == expected_sha1

Also, we should assert that this is a dict

}
if file_name is not None:
body_params['file_name'] = file_name
url = self.get_url('{0}'.format('upload_sessions')).replace(API.BASE_API_URL, API.UPLOAD_URL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
url = self.get_url('{0}'.format('upload_sessions')).replace(API.BASE_API_URL, API.UPLOAD_URL)
url = self.get_url('{0}'.format('upload_sessions')).replace(
self._session.api_config.BASE_API_URL,
self._session.api_config.UPLOAD_URL,
)

boxsdk/object/folder.py Outdated Show resolved Hide resolved
boxsdk/object/upload_session.py Show resolved Hide resolved
boxsdk/object/upload_session.py Outdated Show resolved Hide resolved
boxsdk/object/upload_session.py Outdated Show resolved Hide resolved
docs/file.md Outdated
To create an upload session for uploading a large version, use `file.create_upload_session(file_size, file_name=None)`

```python
file_size = 197520
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using a realistic size, since our API won't let you upload such a small file in chunks.

docs/file.md Outdated

```python
from io import BytesIO
offset = 32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example isn't likely to help a developer. The numbers don't match what the API will allow, and the code to read chunk_size bytes from a stream is unlikely to work for a real chunk size.

docs/file.md Show resolved Hide resolved
boxsdk/object/upload_session.py Outdated Show resolved Hide resolved
docs/file.md Show resolved Hide resolved
Matt Willer and others added 3 commits October 19, 2018 10:55
Co-Authored-By: carycheng <ccheng@box.com>
Co-Authored-By: carycheng <ccheng@box.com>
Co-Authored-By: carycheng <ccheng@box.com>
}
if file_name is not None:
body_params['file_name'] = file_name
url = self.get_url('upload_sessions').replace(API.BASE_API_URL, API.UPLOAD_URL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use the local config here, rather than the global one, to ensure that a user's configured URLs get used.

See https://box-python-sdk.readthedocs.io/en/latest/boxsdk.session.html#boxsdk.session.session.Session.api_config

boxsdk/object/folder.py Outdated Show resolved Hide resolved
:type fields:
`Iterable` of `unicode`
:returns:
Returns a `list` of parts uploaded so far.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns a BoxObjectCollection instead of a list now

:param part_content_sha1:
SHA-1 hash of the part's content. If not specified, this will be calculated.
:type part_content_sha1:
`unicode` or None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right type? Below you encode the hash in a way that I think requires bytes

else:
for part in self.get_parts():
parts_list.append(part)
body['parts'] = parts_list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be simplified to a list comprehension:

body['parts'] = [part for part in self.get_parts()]

boxsdk/object/upload_session.py Show resolved Hide resolved
@carycheng carycheng merged commit edbdb08 into master Oct 29, 2018
@carycheng carycheng deleted the chunked_uploader branch October 29, 2018 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants