Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IGNITE-17855 Implement inline size calculation for B+tree #1179

Merged
merged 14 commits into from
Oct 11, 2022

Conversation

tkalkirill
Copy link
Contributor

case STRING: {
int length = ((VarlenNativeType) nativeType).length();

return length == Integer.MAX_VALUE ? UNDEFINED_VARLEN_INLINE_SIZE : length * 2;
Copy link
Member

@AMashenkov AMashenkov Oct 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return length == Integer.MAX_VALUE ? UNDEFINED_VARLEN_INLINE_SIZE : length * 2;
return length == Integer.MAX_VALUE ? UNDEFINED_VARLEN_INLINE_SIZE : length;

Do you expect most of users will use 2 bytes chars?
Why should an index over columns of ASCII chars be twice large by default?
And what to do with 4-byte chars?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I proceed from the fact that in Java characters is char (2 bytes).
We can assume, as in Java, that in most cases the characters will be ascii (1 byte).
We're just doing an approximate inline count, I don't know how often we'll see 4 byte characters.

Later we can add an encoding (utf-8, 16, 32) to the configuration or ddl.

wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, also I'd limit default value for var-length types (e.g. with 64-256 bytes).

Usually, the upper limit for var-length columns is usually calculated with some reserve.
So, the actual values size is less the limit (often noticeably).
Also, large inline-size means less data will fit to the index page and may have negative impact on performance.

In case of relatively large values with similar prefixes user always can increase inline-size manually,
but imho, in most of cases, this limit will reduce index size with no performance penalty.

/**
* Helper class for index inlining.
*/
public class InlineUtils {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most methods in this class a package-private, while this class is public.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class will be used in VolatilePageMemoryTableStorage and PersistentPageMemoryTableStorage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess they're package-private for tests, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

/**
* Helper class for index inlining.
*/
public class InlineUtils {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess they're package-private for tests, right?

assertEquals(9, inlineSize(nativeType = NativeTypes.datetime()));
nativeTypeSpecs.remove(nativeType.spec());

assertEquals(12, inlineSize(nativeType = NativeTypes.timestamp()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12 bytes per timestamp? Jesus! Why is it so big? Just asking

Copy link
Contributor Author

@tkalkirill tkalkirill Oct 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's seconds (long) + nanos (int)

void testBinaryTupleInlineSize() {
IndexDescriptor indexDescriptor = testIndexDescriptor();

assertEquals(BinaryTupleCommon.HEADER_SIZE, binaryTupleInlineSize(indexDescriptor));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is funny, you know. I'd expect an exception, who would want to make a tuple with 0 columns.
On the other hand, can this be useful somewhere? Just asking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know yet, I can add an assertion

Copy link
Contributor

@ibessonov ibessonov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have my approval, let's wait for other guys now

@tkalkirill tkalkirill merged commit 215ea8c into apache:main Oct 11, 2022
@tkalkirill tkalkirill deleted the ignite-17855 branch October 11, 2022 13:57
lowka pushed a commit to gridgain/apache-ignite-3 that referenced this pull request Mar 18, 2023
lowka pushed a commit to gridgain/apache-ignite-3 that referenced this pull request Apr 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants