ORC-642: update PatchedBase doc with patch ceiling in spec #1868

Jefffrey · 2024-03-31T08:30:50Z

What changes were proposed in this pull request?

Update the PatchedBase specification doc to document the behaviour of padding the patch gap width + patch width (PGW + PW) to the nearest fixed bit width.

Why are the changes needed?

To ensure the spec accurately reflects the implementation.

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

No

Jefffrey · 2024-03-31T08:31:30Z

site/specification/ORCv1.md

+(PGW + PW)    | ceil(PGW + PW)
+:------------ | :-------------
+1 <= x <= 24  | x
+25            | 26
+26            | 26
+27            | 28
+28            | 28
+29            | 30
+30            | 30
+31            | 32
+32            | 32
+33 <= x <= 40 | 40
+41 <= x <= 48 | 48
+49 <= x <= 56 | 56
+57 <= x <= 64 | 64
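As an illustration only, the rows of this table can be collapsed into a few range rules. The function name `closestFixedBits` is hypothetical (it borrows the name suggested later in review), and the sketch assumes the input lies in 1..64, the range the table covers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper encoding the (PGW + PW) padding table from the diff.
// Precondition (assumed): 1 <= x <= 64, the range the table covers.
uint32_t closestFixedBits(uint32_t x) {
  assert(x >= 1 && x <= 64);
  if (x <= 24) return x;             // row "1 <= x <= 24": unchanged
  if (x <= 32) return x + (x & 1u);  // rows 25..32: round odd widths up to even
  return ((x + 7u) / 8u) * 8u;       // rows 33..64: round up to a multiple of 8
}
```

Read this way, the table says: widths up to 24 are kept, widths from 25 to 32 round up to an even width, and widths from 33 to 64 round up to a whole number of bytes.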


From

orc/c++/src/RLEV2Util.cc

Lines 29 to 33 in 9b79de9

// Map bit length i to closest fixed bit width that can contain i bits.
const uint8_t ClosestFixedBitsMap[65] = {
    1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
    22, 23, 24, 26, 26, 28, 28, 30, 30, 32, 32, 40, 40, 40, 40, 40, 40, 40, 40, 48, 48, 48,
    48, 48, 48, 48, 48, 56, 56, 56, 56, 56, 56, 56, 56, 64, 64, 64, 64, 64, 64, 64, 64};

Would it be better to follow the Java code and add 0 as well?

orc/java/core/src/java/org/apache/orc/impl/SerializationUtils.java

Lines 362 to 391 in 513922a

public int getClosestFixedBits(int n) {
  if (n == 0) {
    return 1;
  }
  if (n <= 24) {
    return n;
  }
  if (n <= 26) {
    return 26;
  }
  if (n <= 28) {
    return 28;
  }
  if (n <= 30) {
    return 30;
  }
  if (n <= 32) {
    return 32;
  }
  if (n <= 40) {
    return 40;
  }
  if (n <= 48) {
    return 48;
  }
  if (n <= 56) {
    return 56;
  }
  return 64;
}

I don't think it makes sense to include 0: the input is PGW + PW, and the spec states that PGW is 1 to 8 bits and PW is 1 to 64 bits, so the sum can never be 0 (or 1, for that matter, I guess 🤔).
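Both points in this exchange can be checked mechanically. The sketch below copies the quoted C++ `ClosestFixedBitsMap`, ports the quoted Java branches to C++ under the illustrative name `javaStyleClosestFixedBits`, compares them on every index 0..64, and then enumerates the PGW and PW ranges stated in the spec (1 to 8 and 1 to 64 bits) to find the smallest possible PGW + PW. The helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstdint>

// Values copied from the ClosestFixedBitsMap quoted from RLEV2Util.cc.
const uint8_t ClosestFixedBitsMap[65] = {
    1,  1,  2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
    22, 23, 24, 26, 26, 28, 28, 30, 30, 32, 32, 40, 40, 40, 40, 40, 40, 40, 40, 48, 48, 48,
    48, 48, 48, 48, 48, 56, 56, 56, 56, 56, 56, 56, 56, 64, 64, 64, 64, 64, 64, 64, 64};

// Illustrative C++ port of the quoted Java getClosestFixedBits branches.
int javaStyleClosestFixedBits(int n) {
  if (n == 0) return 1;
  if (n <= 24) return n;
  if (n <= 26) return 26;
  if (n <= 28) return 28;
  if (n <= 30) return 30;
  if (n <= 32) return 32;
  if (n <= 40) return 40;
  if (n <= 48) return 48;
  if (n <= 56) return 56;
  return 64;
}

// True if the Java-style branches and the C++ map agree on every index the
// map defines (0..64), including the n == 0 entry.
bool implementationsAgree() {
  for (int n = 0; n <= 64; ++n) {
    if (javaStyleClosestFixedBits(n) != ClosestFixedBitsMap[n]) return false;
  }
  return true;
}

// Smallest possible PGW + PW under the ranges stated in the spec:
// PGW in [1, 8] bits and PW in [1, 64] bits.
int minPatchEntryWidth() {
  int smallest = INT_MAX;
  for (int pgw = 1; pgw <= 8; ++pgw) {
    for (int pw = 1; pw <= 64; ++pw) {
      smallest = std::min(smallest, pgw + pw);
    }
  }
  return smallest;
}
```

Both versions return 1 for an input of 0, and `minPatchEntryWidth()` returns 2, which is consistent with leaving 0 (and 1) out of the spec table.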

Jefffrey · 2024-03-31T08:31:47Z

site/specification/ORCv1.md

+  64. (PGW + PW) is padded to the nearest fixed bit size according to the
+  below table before being encoded in the patch list.
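A minimal sketch of what this padding means for one patch-list entry. It assumes the layout used by the ORC readers and writers in which the gap occupies the high PGW bits and the patch value the low PW bits; `packPatchEntry`, `PatchEntry`, and `closestFixedBits` are hypothetical names. Padding leaves the entry's value unchanged and only widens the field it is bit-packed into.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper with the same padding rules as the spec table (1..64).
uint32_t closestFixedBits(uint32_t x) {
  assert(x >= 1 && x <= 64);
  if (x <= 24) return x;
  if (x <= 32) return x + (x & 1u);
  return ((x + 7u) / 8u) * 8u;
}

// One patch-list entry before bit-packing.
struct PatchEntry {
  uint64_t value;      // gap in the high PGW bits, patch in the low PW bits
  uint32_t fieldBits;  // padded width the entry is packed into
};

// Hypothetical encoder-side helper. Assumed layout: (gap << pw) | patch.
PatchEntry packPatchEntry(uint64_t gap, uint64_t patch, uint32_t pgw,
                          uint32_t pw) {
  assert(pgw >= 1 && pw >= 1 && pgw + pw <= 64);  // keeps the shift defined
  assert((gap >> pgw) == 0);   // gap fits in PGW bits
  assert((patch >> pw) == 0);  // patch fits in PW bits
  return PatchEntry{(gap << pw) | patch, closestFixedBits(pgw + pw)};
}
```

For example, with PGW = 3 and PW = 22 an entry needs 25 bits, so it is packed into a 26-bit field whose top bit is always zero.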


According to

orc/c++/src/RleDecoderV2.cc

Line 332 in 9b79de9

uint32_t cfb = getClosestFixedBits(patchBitSize + pgw);

Could you please rename ceil to closestFixedBits or cfb? ceil does not seem to have its usual meaning here.

Sure thing, done
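On the read side, the decoder line quoted in this thread (`uint32_t cfb = getClosestFixedBits(patchBitSize + pgw);`) uses the padded width to know how many bits to read per entry. A minimal sketch of the next step follows, assuming the same gap-high/patch-low layout; `splitPatchEntry` and `GapAndPatch` are hypothetical names. Only the unpadded patch width is needed to split an entry, because the padding bits above PGW + PW are zero.

```cpp
#include <cassert>
#include <cstdint>

// Gap and patch value recovered from one patch-list entry.
struct GapAndPatch {
  uint64_t gap;
  uint64_t patch;
};

// Hypothetical decoder-side helper. `entry` is one value already read using
// the padded width cfb. Assumes 1 <= patchBitSize <= 63.
GapAndPatch splitPatchEntry(uint64_t entry, uint32_t patchBitSize) {
  assert(patchBitSize >= 1 && patchBitSize <= 63);
  const uint64_t patchMask = (uint64_t{1} << patchBitSize) - 1;
  // High bits hold the gap, low patchBitSize bits hold the patch value.
  return GapAndPatch{entry >> patchBitSize, entry & patchMask};
}
```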

wgtmac

Thanks for the fix! Could you please add them to ORCv2.md as well?

wgtmac · 2024-04-02T04:45:41Z

site/specification/ORCv1.md

Could you please rename ceil to closestFixedBits or cfb? ceil does not seem to have its usual meaning here.

wgtmac · 2024-04-02T04:48:08Z

site/specification/ORCv1.md

Would it be better to follow the Java code and add 0 as well?

wgtmac

LGTM +1

wgtmac · 2024-04-04T15:52:02Z

cc @dongjoon-hyun @deshanxiao

dongjoon-hyun

+1, LGTM.

deshanxiao · 2024-04-10T10:27:11Z

Thanks @Jefffrey @wgtmac @dongjoon-hyun. Merged to main.

ORC-642: update PatchedBase doc with patch ceiling in spec

64b8cbf

github-actions bot added the DOCS label Mar 31, 2024

Jefffrey commented Mar 31, 2024

wgtmac requested changes Apr 2, 2024

Fix wording and update v2 spec

c353ff0

wgtmac approved these changes Apr 4, 2024

deshanxiao approved these changes Apr 8, 2024

dongjoon-hyun approved these changes Apr 9, 2024

dongjoon-hyun added this to the 2.1.0 milestone Apr 9, 2024

deshanxiao closed this in 8387981 Apr 10, 2024

Jefffrey deleted the ORC-642 branch April 10, 2024 10:43

4 participants