Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add back RAM accounting and rate limiting #15712

Merged
merged 3 commits into from Mar 19, 2024

Conversation

romseygeek
Copy link
Contributor

Column sketches now keep track of how much memory is being used, and how
fast data is being added to the sketches. Experiments show that a column sketch
with three internal sketches uses between 2 and 4% of the memory of a fully
realised java array of samples, so we adjust the amount of memory tracked
downwards by a factor of 32.

@romseygeek romseygeek self-assigned this Mar 18, 2024
Comment on lines 285 to 287
T value = (T) expression.value();
ramAccounting.addBytes(dataType.valueBytes(value));
statsBuilder.add((T) expression.value());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
T value = (T) expression.value();
ramAccounting.addBytes(dataType.valueBytes(value));
statsBuilder.add((T) expression.value());
T value = (T) expression.value();
ramAccounting.addBytes(dataType.valueBytes(value));
statsBuilder.add(value);

Comment on lines 285 to 286
T value = (T) expression.value();
ramAccounting.addBytes(dataType.valueBytes(value));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you double check that the cast is safe without going through dataType.sanitizeValue? E.g. if the source lookup interpretes a LONG as INTEGER due to the json parser behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean replace the cast with T value = dataType.sanitizeValue(expression.value());?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be the safe option but I'm not sure if it is required. (A case where you use a LONG type with a small integer like 4 should tell you)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha.

I think this should in fact always work because practically speaking the expression and the sketch will always be parametrized with the same type, but because of the way things are constructed we lose the type information of the expression somewhere along the way. For now I'll add in the sanitizeValue call. We may be able to remove it later.

Copy link
Member

@mfussenegger mfussenegger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise lgtm

Copy link
Contributor

@matriv matriv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left 2 comments though.

}
accounting.addBytes(991); // still cached

Exception e = expectThrows(RuntimeException.class, () -> accounting.addBytes(1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use assertThatThrownBy

@Override
public void addBytes(long bytes) {
total += bytes;
if (total > 32) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment that 32 == SHIFTED_BLOCK_SIZE

@romseygeek romseygeek merged commit 420fb8f into feature/analyze-datasketches Mar 19, 2024
8 of 9 checks passed
@romseygeek romseygeek deleted the aw/analyze/ram-accounting branch March 19, 2024 10:21
romseygeek added a commit that referenced this pull request Mar 19, 2024
Column sketches now keep track of how much memory is being used, and how
fast data is being added to the sketches. Experiments show that a column sketch
with three internal sketches uses between 2 and 4% of the memory of a fully
realised java array of samples, so we adjust the amount of memory tracked
downwards by a factor of 32.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants