Skip to content

Conversation

@bartonip
Copy link
Contributor

When binary data is returned in a response, len is not sufficient to determine the required buffer size. getsizeof has been used instead to provide a more accurate buffer value.

@CLAassistant
Copy link

CLAassistant commented Feb 22, 2021

CLA assistant check
All committers have signed the CLA.

VERSION Outdated
@@ -1 +1 @@
1.7.0
1.7.1
Copy link
Contributor

@phanak-sap phanak-sap Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not change the version inside the PR. While in this case it will probably go out just after merge, such practice would create problems for us very easily (imagine multiple PRs). It is better to leave the version management up to maintainers.

Also, what actually would save us time, would be the update of CHANGELOG.md :)

dict(response.getheaders()),
response.status,
response.read(len(data)) # the len here will give a 'big enough' value to read the whole content
response.read(sys.getsizeof(data)) # the getsizeof here will actually give a 'big enough' value to read the whole content
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a fix for a bug, could you please capture it with a test?

Pls add new test to test_service_v2.py, ideally one that would reproduce the bug before your fix and pass after your fix. This is relevant to big binary file, but you don't need to commit such big file (pls don't, git repo will be smaller :) ), just generate it inside the test.

@phanak-sap
Copy link
Contributor

Apart from the line comment, your change fails builds because of linter failure.

Relevant for you is the:
************* Module pyodata.v2.service
pyodata/v2/service.py:143: [C0301(line-too-long), ] Line too long (134/120)

the getsizeof will give 'big enough' value to read the whole content

Sadly, together with your change came new version of pylint, so other two failures are not relevant, but new rules applied to old code.

I have fixed this in PR #146, so now the upgrade of linters will happen deterministically and with its own build-per-PR. Pls pull from master into your branch , so the build will pass and we can merge the PR. You can run the linter locally with pip install -r dev-requirements.txt and pylint --rcfile=.pylintrc --output-format=parseable --reports=no pyodata

Copy link
Contributor

@phanak-sap phanak-sap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the fix. Please refer to the comments what needs to be changed so this can be merged.

Also the CLA needs to be signed, check on the PR the respective check. I have updated the CONTRIBUTING.md so its necessity is more clear, you can now refer to it for any PR.

@phanak-sap phanak-sap marked this pull request as draft July 21, 2021 17:33
@phanak-sap phanak-sap self-assigned this Jul 21, 2021
@inxonic
Copy link
Contributor

inxonic commented Jan 17, 2022

Hi @bartonip, I'd like to anwer #issuecomment-1011475876 here, because it's related to this issue.

The issue with len vs getsizeof is really that the HTTP response itself at the lowest level is a byte stream that gets translated
into a string by the requests library.

I agree, that's probably a bug, that will truncate the response body and I think this will happen, when there are many multibyte characters in the payload. The len is taken including the header, so the bug will be effective, if the extra bytes of the multibyte characters are more than the size of the header.

My understanding of what happens is that in the case of say, image data that is sent as pure binary (not as a base64 string represented as binary) when you run len on that resultant byte string the count will be too small due to bytes like \x00 not actually counting as a character in a string.

However, if you receive arbitrary binary data, you will have "surprises" anyway. Like you've noted before, the data is decoded as UTF-8 (actually I think this happens in pyodata, not in the requests library). Not every arbitrary series of bytes is valid UTF-8 (e.g. 0xffis not) and you will see conversion errors.

The OData spec knows different primitive types and defines a specific encoding for each of them. There is no type, that permits the server to send arbitrary binary data (except when adressing the $value property and I don't think, pyodata supports that).

As the affected lines of code are called through a few indirections, it's really guesswork to say, how you could run into this bug. So if you did, could you boil this down to a reproducer or even a testcase?

@bartonip
Copy link
Contributor Author

bartonip commented Jan 17, 2022

Yep the situation where I've run into receiving pure binary data is receiving it through the $value property.

The encoding of the string becomes irrelevant when using getsizeof because, unless I am mistaken, it measures the size of the region of memory the variable is stored in. So even though there are invalid bytes for UTF-8 like 0xff, the byte is still factored into the size because it is in that region of memory. This is the best recollection I can give you from when I was last using the pyodata package almost a year ago so I may be wrong.

I will work on a test case this weekend, although no promises as I am not familiar with this codebase anymore.

@bartonip bartonip closed this Jul 14, 2023
@bartonip bartonip force-pushed the master branch 2 times, most recently from bdc4f15 to 1710d96 Compare July 14, 2023 04:55
@phanak-sap
Copy link
Contributor

Hi @bartonip I understand that you do not want to work on this PR anymore.

But since you came into contact with this bug - and seems still valid for me - could you please provide if not test that reproduces the problem, at least attach a file with response that creates problem with the response.read(len(data)).

I was not able to reproduce your problem and seems @inxonic was not able as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants