Fix buffer size when binary data is returned in multipart response #145

bartonip · 2021-02-22T06:46:01Z

When binary data is returned in a response, len is not sufficient to determine the required buffer size. getsizeof has been used instead to provide a more accurate buffer value.

CLAassistant · 2021-02-22T06:46:06Z

All committers have signed the CLA.

phanak-sap · 2021-02-22T07:36:06Z

VERSION

@@ -1 +1 @@
-1.7.0
+1.7.1


Please do not change the version inside the PR. While in this case it will probably go out just after merge, such practice would create problems for us very easily (imagine multiple PRs). It is better to leave the version management up to maintainers.

Also, what would actually save us time is an update of CHANGELOG.md :)

phanak-sap · 2021-02-22T07:39:05Z

pyodata/v2/service.py

            dict(response.getheaders()),
            response.status,
-            response.read(len(data))  # the len here will give a 'big enough' value to read the whole content
+            response.read(sys.getsizeof(data))  # the getsizeof here will actually give a 'big enough' value to read the whole content


Since this is a fix for a bug, could you please capture it with a test?

Please add a new test to test_service_v2.py, ideally one that reproduces the bug before your fix and passes after it. The bug concerns large binary files, but you don't need to commit such a file (please don't, the git repo will stay smaller :) ); just generate it inside the test.
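A rough skeleton of what such a test could look like, assuming an in-memory `io.BytesIO` stream as a stand-in for the HTTP response. The names `read_all` and `test_generated_binary_payload_round_trips` are hypothetical and not part of pyodata. It only shows how to generate the payload inside the test; it does not by itself reproduce the reported bug.

```python
import io
import os
import sys


def read_all(raw, size):
    """Read up to `size` bytes from an in-memory stream standing in for the response."""
    return io.BytesIO(raw).read(size)


def test_generated_binary_payload_round_trips():
    # Generate the binary content on the fly instead of committing a big file.
    payload = os.urandom(256 * 1024)

    # Both candidate buffer sizes are at least as large as the payload,
    # so the whole content comes back in each case.
    assert read_all(payload, len(payload)) == payload
    assert read_all(payload, sys.getsizeof(payload)) == payload
```

A real regression test would route the generated payload through the code path in pyodata/v2/service.py instead of `read_all`.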

phanak-sap · 2021-02-22T07:55:04Z

Apart from the line comment, your change fails the build because of a linter failure.

The message relevant for you is:
************* Module pyodata.v2.service
pyodata/v2/service.py:143: [C0301(line-too-long), ] Line too long (134/120)

the getsizeof will give 'big enough' value to read the whole content

Sadly, a new version of pylint arrived together with your change, so the other two failures are not related to it; they are new rules applied to old code.

I have fixed this in PR #146, so the upgrade of linters will now happen deterministically and with its own build per PR. Please pull from master into your branch, so the build will pass and we can merge the PR. You can run the linter locally with pip install -r dev-requirements.txt and pylint --rcfile=.pylintrc --output-format=parseable --reports=no pyodata

phanak-sap

Thank you for the fix. Please refer to the comments for what needs to change before this can be merged.

Also, the CLA needs to be signed; see the corresponding check on the PR. I have updated CONTRIBUTING.md to make this requirement clearer, so you can refer to it for any future PR.

inxonic · 2022-01-17T21:08:19Z

Hi @bartonip, I'd like to answer #issuecomment-1011475876 here, because it's related to this issue.

The issue with len vs getsizeof is really that the HTTP response itself at the lowest level is a byte stream that gets translated into a string by the requests library.

I agree, that's probably a bug that will truncate the response body, and I think it will happen when there are many multibyte characters in the payload. The len is taken including the header, so the bug takes effect only if the extra bytes of the multibyte characters exceed the size of the header.
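The condition described above can be sketched as follows. This assumes the response text is re-encoded as UTF-8 and parsed with the standard library's `http.client.HTTPResponse` through a fake socket; `FakeSocket`, `HEADER` and `read_body` are illustrative names for this sketch, not a copy of pyodata's code.

```python
import io
from http.client import HTTPResponse


class FakeSocket:
    """Minimal socket stand-in: HTTPResponse only calls makefile()."""

    def __init__(self, raw):
        self._file = io.BytesIO(raw)

    def makefile(self, *args, **kwargs):
        return self._file


# Assumed header for the sketch; no Content-Length, so read() is limited
# only by the size argument.
HEADER = "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n"


def read_body(body_text):
    """Parse header + body, then read the body using len() of the whole string."""
    data = HEADER + body_text
    response = HTTPResponse(FakeSocket(data.encode("utf-8")))
    response.begin()  # consumes the status line and the headers
    return response.read(len(data))


ascii_body = "a" * 100
multibyte_body = "\u00e4" * 100  # 100 characters, 200 bytes in UTF-8

# ASCII: one byte per character, the header length is pure slack.
print(read_body(ascii_body) == ascii_body.encode("utf-8"))  # True

# Multibyte: the 100 extra bytes exceed len(HEADER), so the read is cut short.
print(len(read_body(multibyte_body)) < len(multibyte_body.encode("utf-8")))  # True
```

With fewer multibyte characters than header characters, the same call would still return the full body, which may explain why the problem is hard to hit in practice.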

My understanding of what happens is that in the case of say, image data that is sent as pure binary (not as a base64 string represented as binary) when you run len on that resultant byte string the count will be too small due to bytes like \x00 not actually counting as a character in a string.

However, if you receive arbitrary binary data, you will have "surprises" anyway. As you've noted before, the data is decoded as UTF-8 (actually I think this happens in pyodata, not in the requests library). Not every arbitrary series of bytes is valid UTF-8 (e.g. 0xff is not), and you will see conversion errors.
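A small illustration of the decoding point, using only the built-in `bytes.decode`; the helper name `is_valid_utf8` is made up for this sketch.

```python
def is_valid_utf8(raw):
    """Return True if `raw` decodes as UTF-8, False if decoding fails."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("\u00e4".encode("utf-8")))  # True: a proper two-byte sequence
print(is_valid_utf8(b"\x00"))                   # True: NUL is a valid code point
print(is_valid_utf8(b"\xff"))                   # False: 0xff never occurs in UTF-8
```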

The OData spec knows different primitive types and defines a specific encoding for each of them. There is no type that permits the server to send arbitrary binary data (except when addressing the $value property, and I don't think pyodata supports that).

As the affected lines of code are called through a few indirections, it's really guesswork to say how you could run into this bug. So if you did, could you boil this down to a reproducer or even a test case?

bartonip · 2022-01-17T21:56:00Z

Yep, the situation where I ran into pure binary data was receiving it through the $value property.

The encoding of the string becomes irrelevant when using getsizeof because, unless I am mistaken, it measures the size of the region of memory the variable is stored in. So even though there are invalid bytes for UTF-8 like 0xff, the byte is still factored into the size because it is in that region of memory. This is the best recollection I can give you from when I was last using the pyodata package almost a year ago so I may be wrong.
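For reference, a short sketch of what the two functions report. In CPython, `sys.getsizeof` returns the memory footprint of the object including its header, so it is larger than the content length rather than equal to it; the exact numbers are implementation details and are not shown here. The helper name `describe` is made up for this sketch.

```python
import sys


def describe(value):
    """Return (len, getsizeof) for a bytes or str object."""
    return len(value), sys.getsizeof(value)


nul_bytes = b"\x00" * 10
text = "\u00e4" * 10

print(describe(b""))        # len is 0, getsizeof is the bare object overhead
print(describe(nul_bytes))  # len is 10: NUL bytes are counted like any other byte
print(describe(text))       # len counts characters, not encoded bytes
print(len(text.encode("utf-8")))  # 20
```

So `getsizeof` works as an upper bound for a read size only because of the per-object overhead, not because it measures the encoded length.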

I will work on a test case this weekend, although no promises as I am not familiar with this codebase anymore.

phanak-sap · 2023-07-14T07:05:31Z

Hi @bartonip I understand that you do not want to work on this PR anymore.

But since you ran into this bug, and it still seems valid to me, could you please provide, if not a test that reproduces the problem, at least a file with a response that causes the problem with response.read(len(data))?

I was not able to reproduce your problem, and it seems @inxonic was not able to either.

phanak-sap reviewed Feb 22, 2021

phanak-sap requested changes Feb 22, 2021

phanak-sap marked this pull request as draft July 21, 2021 17:33

phanak-sap mentioned this pull request Jul 21, 2021

Support OData Edm.Stream #164

Open

phanak-sap added the missing tests label Jul 21, 2021

phanak-sap self-assigned this Jul 21, 2021

phanak-sap mentioned this pull request Jan 11, 2022

Fix Edm.Binary literal representation #186

Merged

bartonip closed this Jul 14, 2023

bartonip force-pushed the master branch 2 times, most recently from bdc4f15 to 1710d96 on July 14, 2023 04:55

4 participants
