-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Track the amount of read data per row #2368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to use TypeSizes instead here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, I don't think so, that would require also refactoring that in bytesRead updates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is more clear to use the size number 1/2/4/8 ,which make the code more readable as the code below just add these numbers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not following. Why would you need to refactor anything? You guys think 1 is more readable than TypeSizes.BOOL_SIZE and on top of that you're hardcoding a value? It's like magic numbers where we have a proper abstraction ready unless I am missing sthg.
checkCanRead(TypeSizes.BOOL_SIZE) reads good in my eyes 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your point of view. From the perspective of code readability, TypeSizes.BOOL_SIZE is definitely much better than hard code 1.
What I said above means that if the existing code is already written as 1/4/8, our newly added code also fills in 1/4/8, then it just echoes back and forth, so the readability is not bad.
Of course, if Jacek is willing to change the existing numbers to TypeSizes, I definitely support your point of view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(1)checkCanRead(TypeSizes.BOOL_SIZE); byte b = source.readByte(); bytesRead += TypeSizes.BOOL_SIZE; return b;
(2) checkCanRead(1); byte b = source.readByte(); bytesRead += 1; return b;
(3)checkCanRead(TypeSizes.BOOL_SIZE); byte b = source.readByte(); bytesRead += 1; return b;
I think the code itself should explain its behavior.
just from my point of view, this order is (1) > (2) > (3)
and it seems Jacek has already resolved this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing new line?
|
We need CI runs and will you port this to 4.0 and 4.1 also? |
|
@bereng thanks for super quick response. I'll provide patches for older branches but probably it will be next week |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will limit just be zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just test some times,it seems limit can not be 0 ~~~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we always encode at least a flag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a test for some corner case is needed such as what I said when limit is 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this
|
@bereng is it ok now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this
|
Approach looks good to me. CI and PRs for the rest of the versions are missing iiuc. |
|
@bereng there was no point in running CI before code review, same for other PRs |
|
Ha you're brave. I always run CI before... |
If an sstable is corrupted in a nasty way, we may read invalid cell sizes and try to read much more data for a row than we should. In rare scenarios this can lead even to OOMs. This simple fix adds tracking and limiting the amount of data that is read per row. Row has its size stored in preamble which can be used as a limit. If the deserialization code tries to read more than that, it will simply fail with EOF which will prevent more serious problems later. Patch by Jacek Lewandowski; reviewed by Berenguer Blasi and Maxwell Guo for CASSANDRA-18513
7400fc5 to
02b188a
Compare
|
^That is j11 only, j8 seems to not have been triggered? |
df3eb40 to
54e39a9
Compare
If an sstable is corrupted in a nasty way, we may read invalid cell sizes and try to read much more data for a row than we should. In rare scenarios this can lead even to OOMs.
This simple fix adds tracking and limiting the amount of data that is read per row. Row has its size stored in preamble which can be used as a limit. If the deserialization code tries to read more than that, it will simply fail with EOF which will prevent more serious problems later.
Thanks for sending a pull request! Here are some tips if you're new here:
Commit messages should follow the following format:
The Cassandra Jira