Int128 - new datatype #220

AlexPeshkoff · 2019-08-08T14:11:28Z

Using a 128-bit integer is more efficient than decimal float for representing high-precision numerics. The test used a (not too long) table with four Numeric(18,4) fields. The performance of the statement "select avg(a/b + c/d) from tbl" was measured by running it repeatedly in a stored procedure. The run times were as follows:

Int64 (native datatype) - 5.500 sec
Int128 (emulated datatype) - 6.470 sec
Decfloat (emulated datatype) - 14.890 sec

Because Int128 shows no serious slowdown, it can be used as the result type of multiplication and division of 64-bit integers, preserving high precision without the decfloat-specific performance degradation.
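To illustrate why a 128-bit intermediate helps, here is a minimal sketch of widening a product of two scaled 64-bit values. It assumes the GCC/Clang `__int128` extension rather than Firebird's own Int128 class, and the helper names `mul_scaled` and `fits_int64` are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: compiler-provided 128-bit integer (GCC/Clang extension).
// This is NOT the Int128 class added by the PR.
using int128 = __int128;

// Multiply two scaled 64-bit values (e.g. NUMERIC(18,4) stored as
// value * 10^4) by widening both operands to 128 bits first, so the
// product cannot overflow.
inline int128 mul_scaled(std::int64_t a, std::int64_t b)
{
    return static_cast<int128>(a) * static_cast<int128>(b);
}

// True if a 128-bit value can be narrowed back to int64 without loss.
inline bool fits_int64(int128 v)
{
    return v >= INT64_MIN && v <= INT64_MAX;
}
```

The product of two 18-digit values needs up to 36 digits, which no longer fits into 64 bits but does fit into 128, so the intermediate result keeps all its digits.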

mrotteveel

I looked it over, and left some comments. I have skipped/glossed over the more complex logic, so please don't consider this a full review. I suggest asking for a review from someone with more familiarity with C++ and the Firebird internals.

Overall, I think my main concerns are:

* Compatibility with previous Firebird 4 versions: It doesn't look like it will be possible to restore backups from previous versions and get correct values, same with opening existing databases populated with previous Firebird 4 versions.
* Some code seems to suggest that an INT128 datatype (bigger brother of BIGINT) exists in SQL, but no such datatype is defined: I suggest adding INT128 (subtype 0, scale 0), for consistency with the other integer datatypes.
* Reusing type codes of dec_fixed for int128 could introduce compatibility problems with clients written to use dec_fixed.
* I get the feeling that in some parts a search/replace was done for dtype_dec_fixed to dtype_int128 without checking the logic (or at least, I have seen some parts - and left comments there - that seem to suggest that an int128 is handled through Decimal128 logic that shouldn't apply to int128 values as far as I understand it).

mrotteveel · 2019-08-10T07:17:24Z

doc/sql.extensions/README.data_types

+	binary representation), CHAR/CHARACTER (use ASCII string), DOUBLE PRECISION (use
+	8-byte FP representation - same as used for DOUBLE PRECISION fields) or BIGINT
+	with possible comma-separated SCALE clause (i.e. 'BIGINT, 3'). Various bindings
+	are useful if one plans to use DECFLOAT values with some old client not supporting


Reference to DECFLOAT doesn't belong here

Mark, I don't understand why you need backward compatibility with previous 4.0 versions. Those versions were never released; they could not be used in production. Those who used them have only themselves to blame. It was stated everywhere that the implementation in Alpha and Beta is not final.

I assume this was a reply to my general review comment and not to this specific review comment? Next time, please use the normal reply (quote reply) on the relevant comment; that is less confusing.

Breaking changes after a beta release should be explicitly documented somewhere. I don't expect full compatibility, but I do think this should either produce errors (which I suspect won't happen because the type code was reused, so you probably just get wrong values), or some form of fixing things through a backup and restore should be available. This would probably be appreciated by people who are actively testing things.

mrotteveel · 2019-08-10T07:23:18Z

src/common/DecFloat.cpp

@@ -518,11 +521,21 @@ Decimal128 Decimal128::set(SLONG value, DecimalStatus decSt, int scale)
 	return *this;
 }

-Decimal128 Decimal128::set(DecimalFixed value, DecimalStatus decSt, int scale)
+Decimal128 Decimal128::set(Int128 value, DecimalStatus decSt, int scale)


I'm not sure I understand the logic in this method correctly, but does this take into account the wider precision of Int128 compared to a Decimal128?

mrotteveel · 2019-08-10T07:29:19Z

src/common/cvt.cpp

@@ -2974,8 +3154,15 @@ DecimalFixed CVT_get_dec_fixed(const dsc* desc, SSHORT scale, DecimalStatus decS
 *
 **************************************/
 	VaryStr<1024> buffer;			// represents unreasonably long decfloat literal in ASCII
-	DecimalFixed dfix;
+	Int128 dfix;


The name dfix seems to refer to the old type name DecimalFixed.

mrotteveel · 2019-08-10T07:43:44Z

src/dsql/StmtNodes.cpp

@@ -8364,7 +8364,10 @@ void SetDecFloatBindNode::execute(thread_db* tdbb, dsql_req* /*request*/, jrd_tr
 {
 	SET_TDBB(tdbb);
 	Attachment* const attachment = tdbb->getAttachment();
-	attachment->att_dec_binding = bind;
+	if (bindInt128)


Does this belong in a method called SetDecFloatBindNode? I'd sooner suggest giving it its own method, or otherwise it needs to be renamed.

mrotteveel · 2019-08-10T07:47:21Z

src/dsql/dsql.h

-			case dtype_dec_fixed:
-				precision = 34;
+			case dtype_int128:
+				precision = 37;


128-bit numbers support a precision of 38 digits (some 39-digit values fit, but not the full 39-digit range).
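The 38-digit figure can be checked with a little arithmetic: a signed integer with b value bits can hold every decimal number of up to floor(b * log10(2)) digits. A minimal sketch follows; the helper name `safe_decimal_digits` is made up for illustration and is not part of Firebird.

```cpp
#include <cassert>
#include <cmath>

// Number of decimal digits d such that every d-digit value fits into a
// signed two's-complement integer of the given total width in bits.
// Hypothetical helper, for illustration only.
inline int safe_decimal_digits(int width_bits)
{
    const int value_bits = width_bits - 1;   // one bit is used for the sign
    return static_cast<int>(std::floor(value_bits * std::log10(2.0)));
}
```

For a width of 128 this gives 38, and for 64 it gives 18, which matches the existing NUMERIC(18) limit for 64-bit storage. Since 2^127 is roughly 1.7 * 10^38, only part of the 39-digit range is representable.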

mrotteveel · 2019-08-10T08:33:47Z

src/include/firebird/impl/blr.h

@@ -69,7 +69,7 @@
 #define blr_bool			(unsigned char)23
 #define blr_dec64			(unsigned char)24
 #define blr_dec128			(unsigned char)25
-#define blr_dec_fixed		(unsigned char)26
+#define blr_int128			(unsigned char)26


Will reusing the same type code introduce compatibility problems with older Firebird 4 versions?

mrotteveel · 2019-08-10T08:34:03Z

src/include/firebird/impl/dsc_pub.h

@@ -63,7 +63,7 @@
 #define dtype_boolean	21
 #define dtype_dec64		22
 #define dtype_dec128	23
-#define dtype_dec_fixed	24
+#define dtype_int128	24


Will reusing the same type code introduce compatibility problems with older Firebird 4 versions?

mrotteveel · 2019-08-10T08:37:08Z

src/burp/canonical.cpp

@@ -170,7 +170,7 @@ ULONG CAN_encode_decode(burp_rel* relation, lstring* buffer, UCHAR* data, bool d
 			break;

 		case dtype_dec128:
-		case dtype_dec_fixed:
+		case dtype_int128:
 			if (!xdr_dec128(xdrs, (Firebird::Decimal128*) p))


Is it correct to handle dtype_int128 with the same path as Decimal128? Shouldn't it use xdr_int128 instead?

mrotteveel · 2019-08-10T08:44:21Z

src/isql/isql.epp

-	case SQL_DEC_FIXED:
-		return "DECIMAL FIXED";
+	case SQL_INT128:
+		return "INT128";


Given INT128 doesn't exist as a type, shouldn't this return DECIMAL (unless INT128 is introduced as its own type)?

mrotteveel · 2019-08-10T08:46:10Z

src/jrd/ExtEngineManager.cpp

@@ -1570,7 +1570,7 @@ void ExtEngineManager::makeTrigger(thread_db* tdbb, CompilerScratch* csb, Jrd::T
 			if (field)
 			{
 				dsc d(relFormat->fmt_desc[i]);
-				if (d.dsc_dtype == dtype_dec_fixed)
+				if (d.dsc_dtype == dtype_int128)
 					d.dsc_dtype = dtype_dec128;


Is this the right handling for dtype_int128?

AlexPeshkoff · 2019-09-11T16:33:03Z

Overall, I think my main concerns are:

* Compatibility with previous Firebird 4 versions: It doesn't look like it will be possible to restore backups from previous versions and get correct values, same with opening existing databases populated with previous Firebird 4 versions.

And here comes one interesting question - maybe we should increase the MINOR ODS code (set it to 1) and also set ODS_RELEASED to ODS_13_1? That will effectively prevent use of old databases with the new engine.

* Some code seems to suggest that an `INT128` datatype (bigger brother of `BIGINT`) exists in SQL, but no such datatype is defined: I suggest adding `INT128` (subtype 0, scale 0), for consistency with the other integer datatypes.

Will do.

* Reusing type codes of dec_fixed for int128 could introduce compatibility problems with clients written to use dec_fixed

Changing the code is not trivial - but if you think that's a real problem, it can be done.

* I get the feeling that in some parts a search/replace was done for dtype_dec_fixed to dtype_int128 without checking the logic (or at least, I have seen some parts - and left comments there - that seem to suggest that an int128 is handled through Decimal128 logic that shouldn't apply to int128 values as far as I understand it).

Started to fix it.

dyemanov · 2019-09-11T16:45:01Z

Alex et al,

And here comes one interesting question - maybe we should increase the MINOR ODS code (set it to 1) and also set ODS_RELEASED to ODS_13_1? That will effectively prevent use of old databases with the new engine.

This is surely possible, but it will not prevent problems with existing backups.

Some code seems to suggest that an INT128 datatype (bigger brother of BIGINT) exists in SQL, but no such datatype is defined: I suggest adding INT128 (subtype 0, scale 0), for consistency with the other integer datatypes.

Will do.

Is it really necessary? The standard types don't define the underlying size exactly, only relatively to each other. Given SMALLINT, INT and BIGINT, I don't see INT128 as a consistent addition (unless we also add INT16, INT32 and INT64).

mrotteveel · 2019-09-12T06:29:39Z

@dyemanov

Some code seems to suggest that an INT128 datatype (bigger brother of BIGINT) exists in SQL, but no such datatype is defined: I suggest adding INT128 (subtype 0, scale 0), for consistency with the other integer datatypes.

Will do.

Is it really necessary? The standard types don't define the underlying size exactly, only relatively to each other. Given SMALLINT, INT and BIGINT, I don't see INT128 as a consistent addition (unless we also add INT16, INT32 and INT64).

I don't really agree with that. I think using INT128 is a logical name compared to something like HUGEINT (which would be somewhat consistent with the TINYINT, SMALLINT, INT, BIGINT series used by the standard). The fact we add a non-standard type doesn't mean we need to retroactively add aliases consistent with that non-standard type to the standard types.

But otherwise, we just shouldn't do this, but then I do suggest that things need to be modified a bit. In its current state it suggests that an INT128 type does exist, with things like SET INT128 BIND, isc_dpb_int128_bind, and returning INT128 in isql.epp.

mrotteveel · 2019-09-12T06:35:36Z

@AlexPeshkoff

Overall, I think my main concerns are:

Compatibility with previous Firebird 4 versions: It doesn't look like it will be possible to restore backups from previous versions and get correct values, same with opening existing databases populated with previous Firebird 4 versions.

And here comes one interesting question - maybe we should increase the MINOR ODS code (set it to 1) and also set ODS_RELEASED to ODS_13_1? That will effectively prevent use of old databases with the new engine.

That solves the problem of reusing databases, but doesn't solve the problem of backing up a database for a Firebird 4 with dec_fixed numerics and restoring it under a Firebird 4 with int128 numerics. We could just accept the problem as is, but in that case the release notes need to be extremely clear that those values will be corrupt (if the typecodes are reused), or that those backups cannot be restored (if the typecodes are changed).

[..]

Reusing type codes of dec_fixed for int128 could introduce compatibility problems with clients written to use dec_fixed

Changing the code is not trivial - but if you think that's a real problem, it can be done.

Shouldn't that be a matter of assigning different values to the various constants (SQL_INT128 and blr_int128)?

AlexPeshkoff · 2019-09-12T11:12:10Z

On 11.09.2019 19:45, Dmitry Yemanov wrote:

Alex et al,

And here comes one interesting question - maybe we should increase the MINOR ODS code (set it to 1) and also set ODS_RELEASED to ODS_13_1? That will effectively prevent use of old databases with the new engine.

This is surely possible, but it will not prevent problems with existing backups.

To prevent problems with existing backups we need to increase backup version number and process high precision numerics from it as decfloat (convert them to 128-bit integers). That's definitely doable, but is it really necessary?

* Some code seems to suggest that an `INT128` datatype (bigger brother of `BIGINT`) exists in SQL, but no such datatype is defined: I suggest adding `INT128` (subtype 0, scale 0), for consistency with the other integer datatypes.

Will do.

Is it really necessary? The standard types don't define the underlying size exactly, only relatively to each other. Given SMALLINT, INT and BIGINT, I don't see INT128 as a consistent addition (unless we also add INT16, INT32 and INT64).

OK, let's not hurry with this.

asfernandes · 2019-09-12T11:14:37Z

I don't think any effort or technical debt should be inserted to handle non-released databases/backups.

AlexPeshkoff · 2019-09-12T13:21:33Z

On 12.09.2019 9:35, Mark Rotteveel wrote:

* Reusing type codes of dec_fixed for int128 could introduce compatibility problems with clients written to use dec_fixed

Changing the code is not trivial - but if you think that's a real problem, it can be done.

Shouldn't that be a matter of assigning different values to the various constants (SQL_INT128 and blr_int128)?

SQL_INT128 already has another value. With the BLR code things are a bit more complicated - it's used as an index into a number of internal arrays, i.e. changing it means a lot of manual changes in the code. Therefore I've left blr_int128 equal to the old blr_dec_fixed.
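The coupling described here can be pictured with a small sketch. The four type codes are taken from the blr.h hunk quoted in the review; the table `kStorageBytes` and the function `storage_bytes` are hypothetical and do not correspond to any real Firebird array.

```cpp
#include <cassert>
#include <cstddef>

// Type codes as they appear in the blr.h hunk quoted in the review.
constexpr unsigned char blr_bool   = 23;
constexpr unsigned char blr_dec64  = 24;
constexpr unsigned char blr_dec128 = 25;
constexpr unsigned char blr_int128 = 26;   // formerly blr_dec_fixed

// Hypothetical positional table: entry i describes code (blr_bool + i).
// It only illustrates how a table can depend on the numeric code values.
constexpr std::size_t kStorageBytes[] = {
    1,    // blr_bool
    8,    // blr_dec64
    16,   // blr_dec128
    16,   // blr_int128
};

// Look up the storage size by using the type code as an array index.
inline std::size_t storage_bytes(unsigned char code)
{
    assert(code >= blr_bool && code <= blr_int128);
    return kStorageBytes[code - blr_bool];
}
```

If blr_int128 were given a new value, every table laid out like this would have to be extended or reordered by hand, which is the kind of manual change the comment refers to.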

dyemanov · 2019-09-12T13:24:44Z

I don't think any effort or technical debt should be inserted to handle non-released databases/backups.

So far I second this.

AlexPeshkoff · 2019-09-12T14:42:50Z

On 12.09.2019 16:24, Dmitry Yemanov wrote:

I don't think any effort or technical debt should be inserted to handle non-released databases/backups.

So far I second this.

+1

PizzaProgram · 2019-09-13T10:18:26Z

... (unless we also add INT16, INT32 and INT64).

Please allow me to share some thoughts from "End User" view:

Why would it be so bad to add ALL at once? (Instead of adding ONLY INT128 now...)
Why are you forced to name it INT128? Why not something similar, that won't interfere with existing names, like:

 _INT8 _INT16 _INT32 _INT64 _INT128 _INT256 _INT384 _INT512 _INT1024
//   or ... if it's possible, it would harmonize even better with RDB$ ... naming:
$INT8 $INT16 $INT32 $INT64 $INT128 $INT256 $INT384 $INT512 $INT1024

- Why not create aliases for those too?
(TINYINT = _INT8 ; HUGEINT = _INT128 ; GIANTINT, EXTREMINT = reserved)

My whole life long, karma taught me again and again:
Instead of "half solutions" to one problem it's always better to put the "whole thing right" at the same time. ;-)

mrotteveel · 2019-09-13T10:54:24Z

@PizzaProgram

Why would it be so bad to add ALL at once? (Instead of adding ONLY INT128 now...)

What benefit does adding additional aliases for the other types give us? We have standard types defined in the ISO SQL standard, and we have our own non-standard datatypes.

In my opinion, introducing non-standard aliases for standard types just because we - hypothetically at the moment - introduce an INT128 type seems busy work just for the sake of internal consistency. It will increase the amount of documentation, it will increase the amount of keywords (with potential for conflicts with current domains or other object names, etc), and using them will only lead to Firebird-specific SQL that is not portable to other databases. In addition, it might cause confusion.

What is the benefit of introducing this choice in datatype names?

And I have to admit, the same can be said about introducing INT128, as the same effect can be achieved with NUMERIC(38) or DECIMAL(38) (or NUMERIC(38,0)...). So my argument above can also be taken as an argument against my earlier proposal to introduce INT128.

Why are you forced to name it INT128? Why not something similar, that won't interfere with existing names, like:

Nothing forces us to use INT128, but given the implementation it is a logical choice, as it is self-documenting.

 _INT8 _INT16 _INT32 _INT64 _INT128 _INT256 _INT384 _INT512 _INT1024
//   or ... if it's possible, it would harmonize even better with RDB$ ... naming:
$INT8 $INT16 $INT32 $INT64 $INT128 $INT256 $INT384 $INT512 $INT1024

Why would we choose that naming over INT128? What benefit does prefixing with _ or $ give?

- Why not create aliases for those too?
(TINYINT = _INT8 ; HUGEINT = _INT128 ; GIANTINT, EXTREMINT = reserved)

My whole life long, karma taught me again and again:
Instead of "half solutions" to one problem it's always better to put the "whole thing right" at the same time. ;-)

Why do you think adding those aliases for the standard types is the 'right solution'?

PizzaProgram · 2019-09-13T12:52:10Z

@mrotteveel

What benefit does adding additional aliases for the other types give us?

For "You", FB coders: Nothing, just extra work. :-(
For "Us", "End users": to make life easier and save 1000 people 5 minutes. (If there are "ready to use", predefined domains, both $INT128 and HUGEINT ... and also $UINT2 ... $UINT128 ... we can use.)
So we do not have to search, check, define, test, etc...

We have standard types defined in the ISO SQL standard, and we have our own non-standard datatypes.

Exactly! So why not implement both fully? (Well I guess I know why: extra work for you guys...)
But instead of "half piece of this" + "half of that" >> it helps a lot to learn FB for newbies.
For example: I like very much you have kept GEN_ID() for us, "old guys", but implemented an SQL standard SEQUENCE too. Nice job!

In my opinion, introducing non-standard aliases for standard types just because we - hypothetically at the moment - introduce an INT128 type seems busy work just for the sake of internal consistency. It will increase the amount of documentation, it will increase the amount of keywords (with potential for conflicts with current domains or other object names, etc), and using them will only lead to Firebird-specific SQL that is not portable to other databases. In addition, it might cause confusion.

I didn't know it would be soooo much extra work to create ONE of those, Copy>Paste and change 2>4>8>>...
Sorry to hear that!

What is the benefit of introducing this choice in datatype names?

IMHO, if you do only 1 new >> not too much. But if you introduce a FULL spectrum of "easy names", it will help a lot for all the programmers using FB.

Nothing forces us to use INT128, but given the implementation it is a logical choice, as it is self-documenting.
Why would we choose that naming over INT128? What benefit does prefixing with _ or $ give?

I agree. Only there is a tiny problem with those databases migrating from prev. version:

What if there's a name conflict with a self-defined domain?

Why do you think adding those aliases for the standard types is the 'right solution'?

I'm not 100% sure those names are perfect, just expressed my opinion. If you know better names for $int256, $int512, I'm glad to hear some ;-)

...

Compatibility with previous Firebird 4 versions: It doesn't look like it will be possible to restore backups from previous versions and get correct values, same with opening existing databases populated with previous Firebird 4 versions.

+1 I agree with ODS version change.

PS: Thanks for reading my suggestions. I won't post any more to this thread. From now on, it's up to You guys, which path you choose.

mrotteveel · 2019-09-13T14:50:04Z

@PizzaProgram

What benefit does adding additional aliases for the other types give us?

For "You", FB coders: Nothing, just extra work. :-(

To be clear, I'm not a core Firebird developer. I was challenging your idea to find out what your underlying arguments for such a feature are.

AlexPeshkoff · 2019-09-13T16:50:24Z

Thanks to Mark - I've fixed most of the issues he noticed regarding the use/naming of int128 as dec float.

Only one thing remains (especially if we are not going to add a separate SQL type for 128-bit integers) - what to call the appropriate macro in Message.h (remember we have FB_SMALLINT, FB_INTEGER, FB_BIGINT)?

asfernandes · 2019-09-13T16:57:10Z

Only one thing remains (especially if we are not going to add a separate SQL type for 128-bit integers)

I have lost everything here. We are not going to add an SQL type for 128-bit integers? Isn't the discussion about INT128 and a possible alias for it?

mrotteveel · 2019-09-13T17:13:41Z

@asfernandes

Only one thing remains (especially if we are not going to add a separate SQL type for 128-bit integers)

I have lost everything here. We are not going to add an SQL type for 128-bit integers? Isn't the discussion about INT128 and a possible alias for it?

No, this PR is about swapping the underlying type of NUMERIC/DECIMAL with a precision of 19 or greater from a Decimal128 to an Int128 (and increasing the maximum precision from 34 to 38). I muddied the waters by suggesting to also introduce INT128 as a type in SQL.
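The effect of the change on NUMERIC/DECIMAL storage can be summarized in a short sketch. The 19 to 38 range backed by a 128-bit integer is what this PR describes; the lower thresholds are the usual dialect 3 NUMERIC rules and are an assumption here, and the function name `numeric_storage_bits` is hypothetical.

```cpp
#include <cassert>

// Width in bits of the integer that backs NUMERIC(precision, scale).
// Assumption: thresholds 4 / 9 / 18 follow the usual dialect 3 NUMERIC
// rules; the 19..38 -> 128-bit mapping is the one described in this PR.
// Returns 0 for a precision outside the supported range.
inline int numeric_storage_bits(int precision)
{
    if (precision < 1)   return 0;
    if (precision <= 4)  return 16;    // SMALLINT
    if (precision <= 9)  return 32;    // INTEGER
    if (precision <= 18) return 64;    // BIGINT
    if (precision <= 38) return 128;   // Int128 (previously Decimal128, max 34)
    return 0;
}
```

Under these assumptions, a column such as NUMERIC(20,4) would be backed by a scaled 128-bit integer, while NUMERIC(18,4) keeps its 64-bit storage.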

AlexPeshkoff · 2019-09-13T17:17:35Z

On 13.09.2019 19:57, Adriano dos Santos Fernandes wrote:

Only one thing remains (especially if we are not going to add a separate SQL type for 128-bit integers)

I have lost everything here. We are not going to add an SQL type for 128-bit integers?

As it was suggested:

DY: Is it really necessary? The standard types don't define the underlying size exactly, only relatively to each other. Given SMALLINT, INT and BIGINT, I don't see INT128 as a consistent addition (unless we also add INT16, INT32 and INT64).

and agreed:

AP: OK, let's not hurry with this.

it appears the answer is 'no'.

Isn't the discussion about INT128 and a possible alias for it?

The primary goal of this PR is to increase the precision of intermediate calculations with numerics. An explicit new type is not required for this. Note that BIGINT was added later than its use in NUMERIC(18).

AlexPeshkoff and others added 11 commits July 29, 2019 19:50

Int128 support - work in progress

54a0111

Work in progress

10d4f3c

Int128 datatype appears to be mostly OK except sort & index

f9e5e7c

Fixed divide scaling, added sorting & network (xdr) support

0f5f284

Binding control, aggregate nodes, cleanup and documentation

3b71f10

Fixed VS2017 AppVeyor build

0cccd90

Next attempt to fix vs2017 build

6346ce6

Next attempt to fix vs2017 build

7346d38

Next attempt to fix vs2017 build

7cc5c40

Update MSVC build.

360cfe2

Set VS architecture correctly

a182bec

AlexPeshkoff requested a review from mrotteveel August 9, 2019 15:16

AlexPeshkoff self-assigned this Aug 9, 2019

mrotteveel reviewed Aug 10, 2019

Fixed a number of issues noticed by Mark

21bbecc

Merged changes from master

09d7bbf

AlexPeshkoff merged commit 861d536 into master Sep 16, 2019

AlexPeshkoff deleted the int128 branch September 17, 2019 09:04
