Skip to content

Int128 - new datatype #220

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Sep 16, 2019
Merged

Int128 - new datatype #220

merged 13 commits into from
Sep 16, 2019

Conversation

AlexPeshkoff
Copy link
Member

Using 128 bit integer is more efficient for representing numerics with high precision compared with decimal float. As a base for the test was used a table (not too long) with 4 Numeric(18,4) fields. Performance of statement "select avg(a/b + c/d) from tbl" was measured by multiply repeating it in stored procedure. Runtimes were as following:

  1. Int64 (native datatype) - 5.500 sec
  2. Int128 (emulated datatype) - 6.470 sec
  3. Decfloat (emulated datatype) - 14.890 sec

Missing serious delays with Int128 makes it possible to use it as a result of multiply & divide for 64-bit integers to preserve high precision without decfloat-specific performance degradation.

@AlexPeshkoff AlexPeshkoff requested a review from mrotteveel August 9, 2019 15:16
@AlexPeshkoff AlexPeshkoff self-assigned this Aug 9, 2019
Copy link
Member

@mrotteveel mrotteveel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked it over, and left some comments. I have skipped/glossed over the more complex logic, so please don't consider this a full review. I suggest asking for a review for someone with more familiarity with C++ and the Firebird internals.

Overall, I think my main concerns are:

  • Compatibility with previous Firebird 4 versions: It doesn't look like it will be possible to restore backups from previous versions and get correct values, same with opening existing databases populated with previous Firebird 4 versions.
  • Some code seems to suggest that an INT128 datatype (bigger brother of BIGINT) exists in SQL, but no such datatype is defined: I suggest adding INT128 (subtype 0, scale 0), for consistency with the other integer datatypes.
  • Reusing type codes of dec_fixed for int128 could introduce compatibility problems with clients written to use dec_fixed
  • I get the feeling that in some parts a search/replace was done for dtype_dec_fixed to dtype_int128 without checking the logic (or at least, I have seen some parts - and left comments there - that seem to suggest that an int128 is handled through Decimal128 logic that shouldn't apply to int128 values as far as I understand it).

binary representation), CHAR/CHARACTER (use ASCII string), DOUBLE PRECISION (use
8-byte FP representation - same as used for DOUBLE PRECISION fields) or BIGINT
with possible comma-separated SCALE clause (i.e. 'BIGINT, 3'). Various bindings
are useful if one plans to use DECFLOAT values with some old client not supporting
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference to DECFLOAT doesn't belong here

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mark, I don’t understand why you need backward compatibility with previous versions 4.0. These versions were never released; they could not be used in industrial operation. Those who used are to blame for themselves. Everywhere it was written that the implementation in Alpha and Beta is not final.

Copy link
Member

@mrotteveel mrotteveel Aug 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this was a reply to my general review comment and not on this specific review comment? Next time, please use the normal reply (quote reply) on the relevant comment, that is less confusing.

Breaking changes after a beta release should be explicitly documented somewhere. I don't expect full compatibility, but I do think this should either produce errors (which I suspect won't happen because the type code was reused, so you probably just get wrong values), or some form of fixing things through a backup and restore should be available. This would probably be appreciated by people who are actively testing things.

@@ -518,11 +521,21 @@ Decimal128 Decimal128::set(SLONG value, DecimalStatus decSt, int scale)
return *this;
}

Decimal128 Decimal128::set(DecimalFixed value, DecimalStatus decSt, int scale)
Decimal128 Decimal128::set(Int128 value, DecimalStatus decSt, int scale)
Copy link
Member

@mrotteveel mrotteveel Aug 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the logic in this method correctly, but does this take into account the wider precision of Int128 compared to a Decimal128?

@@ -2974,8 +3154,15 @@ DecimalFixed CVT_get_dec_fixed(const dsc* desc, SSHORT scale, DecimalStatus decS
*
**************************************/
VaryStr<1024> buffer; // represents unreasonably long decfloat literal in ASCII
DecimalFixed dfix;
Int128 dfix;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name dfix seems to refer to old type name DecimalFixed

@@ -8364,7 +8364,10 @@ void SetDecFloatBindNode::execute(thread_db* tdbb, dsql_req* /*request*/, jrd_tr
{
SET_TDBB(tdbb);
Attachment* const attachment = tdbb->getAttachment();
attachment->att_dec_binding = bind;
if (bindInt128)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this belong in a method called SetDecFloatBindNode? I'd sooner suggest giving it its own method, or otherwise it needs to be renamed.

src/dsql/dsql.h Outdated
case dtype_dec_fixed:
precision = 34;
case dtype_int128:
precision = 37;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

128bit numbers support a precision of 38 digits (max 39 digits, but those aren't the full range)

@@ -69,7 +69,7 @@
#define blr_bool (unsigned char)23
#define blr_dec64 (unsigned char)24
#define blr_dec128 (unsigned char)25
#define blr_dec_fixed (unsigned char)26
#define blr_int128 (unsigned char)26
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will reusing the same type code introduce compatibility problems with older Firebird 4 versions?

@@ -63,7 +63,7 @@
#define dtype_boolean 21
#define dtype_dec64 22
#define dtype_dec128 23
#define dtype_dec_fixed 24
#define dtype_int128 24
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will reusing the same type code introduce compatibility problems with older Firebird 4 versions?

@@ -170,7 +170,7 @@ ULONG CAN_encode_decode(burp_rel* relation, lstring* buffer, UCHAR* data, bool d
break;

case dtype_dec128:
case dtype_dec_fixed:
case dtype_int128:
if (!xdr_dec128(xdrs, (Firebird::Decimal128*) p))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct to handle dtype_int128 with the same path as Decimal128? Shouldn't it use xdr_int128 instead?

case SQL_DEC_FIXED:
return "DECIMAL FIXED";
case SQL_INT128:
return "INT128";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given INT128 doesn't exist as a type, shouldn't this return DECIMAL (unless INT128 is introduced as its own type)?

@@ -1570,7 +1570,7 @@ void ExtEngineManager::makeTrigger(thread_db* tdbb, CompilerScratch* csb, Jrd::T
if (field)
{
dsc d(relFormat->fmt_desc[i]);
if (d.dsc_dtype == dtype_dec_fixed)
if (d.dsc_dtype == dtype_int128)
d.dsc_dtype = dtype_dec128;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right handling for dtype_int128?

@AlexPeshkoff
Copy link
Member Author

Overall, I think my main concerns are:

* Compatibility with previous Firebird 4 versions: It doesn't look like it will be possible to restore backups from previous versions and get correct values, same with opening existing databases populated with previous Firebird 4 versions.

And here comes one interesting question - may be we should increase MINOR ODS code (ste it to 1) and also set ODS_RELEASED to ODS_13_1? That will efficiently prevent use of old databases with new engine.

* Some code seems to suggest that an `INT128` datatype (bigger brother of `BIGINT`) exists in SQL, but no such datatype is defined: I suggest adding `INT128` (subtype 0, scale 0), for consistency with the other integer datatypes.

Will do.

* Reusing type codes of dec_fixed for int128 could introduce compatibility problems with clients written to use dec_fixed

Changing code is not trivial - but if you think that's real problem can be done.

* I get the feeling that in some parts a search/replace was done for dtype_dec_fixed to dtype_int128 without checking the logic (or at least, I have seen some parts - and left comments there - that seem to suggest that an int128 is handled through Decimal128 logic that shouldn't apply to int128 values as far as I understand it).

Started to fix it.

@dyemanov
Copy link
Member

Alex et al,

And here comes one interesting question - may be we should increase MINOR ODS code (ste it to 1) and also set ODS_RELEASED to ODS_13_1? That will efficiently prevent use of old databases with new engine.

This is surely possible, but it will not prevent problems with existing backups.

  • Some code seems to suggest that an INT128 datatype (bigger brother of BIGINT) exists in SQL, but no such datatype is defined: I suggest adding INT128 (subtype 0, scale 0), for consistency with the other integer datatypes.

Will do.

Is it really necessary? The standard types don't define the underlying size exactly, only relatively to each other. Given SMALLINT, INT and BIGINT, I don't see INT128 as a consistent addition (unless we also add INT16, INT32 and INT64).

@mrotteveel
Copy link
Member

mrotteveel commented Sep 12, 2019

@dyemanov

  • Some code seems to suggest that an INT128 datatype (bigger brother of BIGINT) exists in SQL, but no such datatype is defined: I suggest adding INT128 (subtype 0, scale 0), for consistency with the other integer datatypes.

Will do.

Is it really necessary? The standard types don't define the underlying size exactly, only relatively to each other. Given SMALLINT, INT and BIGINT, I don't see INT128 as a consistent addition (unless we also add INT16, INT32 and INT64).

I don't really agree with that. I think using INT128 is a logical name compared to something like HUGEINT (which would be somewhat consistent with the TINYINT, SMALLINT, INT, BIGINT series used by the standard). The fact we add a non-standard type doesn't mean we need to retroactively add aliases consistent with that non-standard type to the standard types.

But otherwise, we just shouldn't do this, but then I do suggest that things need to be modified a bit. In its current state it suggests than an INT128 type does exist, with things like SET INT128 BIND, isc_dpb_int128_bind, and returning INT128 in isql.epp

@mrotteveel
Copy link
Member

mrotteveel commented Sep 12, 2019

@AlexPeshkoff

Overall, I think my main concerns are:

  • Compatibility with previous Firebird 4 versions: It doesn't look like it will be possible to restore backups from previous versions and get correct values, same with opening existing databases populated with previous Firebird 4 versions.

And here comes one interesting question - may be we should increase MINOR ODS code (ste it to 1) and also set ODS_RELEASED to ODS_13_1? That will efficiently prevent use of old databases with new engine.

That solves the problem of reusing databases, but doesn't solve the problem of backing up a database for a Firebird 4 with dec_fixed numerics and restoring it under a Firebird 4 with int128 numerics. We could just accept the problem as is, but in that case the release notes need to be extremely clear that those values will be corrupt (if the typecodes are reused), or that those backups cannot be restored (if the typecodes are changed).

[..]

  • Reusing type codes of dec_fixed for int128 could introduce compatibility problems with clients written to use dec_fixed

Changing code is not trivial - but if you think that's real problem can be done.

Shouldn't that be a matter of assigning different values to the various constants (SQL_INT128 and blr_int128)?

@AlexPeshkoff
Copy link
Member Author

AlexPeshkoff commented Sep 12, 2019 via email

@asfernandes
Copy link
Member

I don't think any effort or technical debt should be inserted to handle non-released databases/backups.

@AlexPeshkoff
Copy link
Member Author

AlexPeshkoff commented Sep 12, 2019 via email

@dyemanov
Copy link
Member

I don't think any effort or technical debt should be inserted to handle non-released databases/backups.

So far I second this.

@AlexPeshkoff
Copy link
Member Author

AlexPeshkoff commented Sep 12, 2019 via email

@PizzaProgram
Copy link

... (unless we also add INT16, INT32 and INT64).

Please allow me to share some thoughts from "End User" view:

  • Why would be so bad to add ALL at once? (Instead of adding ONLY INT128 now...)
  • Why are you forced to name it INT128? Why not something similar, that won't interfere with existing names, like:
 _INT8 _INT16 _INT32 _INT64 _INT128 _INT256 _INT384 _INT512 _INT1024
//   or ... if it's possible, it would harmonize even better with RDB$ ... naming:
$INT8 $INT16 $INT32 $INT64 $INT128 $INT256 $INT384 $INT512 $INT1024  

-Why not creating aliases for those too ?
(TINYINT = _INT8 ; HUGEINT = _INT128 ; GIANTINT, EXTREMINT = reserved)

My whole life long, karma taught me again and again:
Instead of "half solutions" to one problem it's always better to put the "whole thing right" at the same time. ;-)

@mrotteveel
Copy link
Member

mrotteveel commented Sep 13, 2019

@PizzaProgram

  • Why would be so bad to add ALL at once? (Instead of adding ONLY INT128 now...)

What benefit does adding additional aliases for the other types give us? We have standard types defined in the ISO SQL standard, and we have our own non-standard datatypes.

In my opinion, introducing non-standard aliases for standard types just because we - hypothetically at the moment - introduce an INT128 type seems busy work just for the sake of internal consistency. It will increase the amount of documentation, it will increase the amount of keywords (with potential for conflicts with current domains or other object names, etc), and using them will only lead to Firebird-specific SQL that is not portable to other databases. In addition, it might cause confusion.

What is the benefit of introducing this choice in datatype names?

And I have to admit, the same can be said about introducing INT128, as the same effect can be achieved with NUMERIC(38) or DECIMAL(38) (or NUMERIC(38,0)...). So my argument above can also be taken as an argument against my earlier proposal to introduce INT128.

  • Why are you forced to name it INT128? Why not something similar, that won't interfere with existing names, like:

Nothing forces us to use INT128, but given the implementation it is a logical choice, as it is self-documenting.

 _INT8 _INT16 _INT32 _INT64 _INT128 _INT256 _INT384 _INT512 _INT1024
//   or ... if it's possible, it would harmonize even better with RDB$ ... naming:
$INT8 $INT16 $INT32 $INT64 $INT128 $INT256 $INT384 $INT512 $INT1024  

Why would choose that naming over INT128, what benefit does prefixing with _ or $ give?

-Why not creating aliases for those too ?
(TINYINT = _INT8 ; HUGEINT = _INT128 ; GIANTINT, EXTREMINT = reserved)

My whole life long, karma taught me again and again:
Instead of "half solutions" to one problem it's always better to put the "whole thing right" at the same time. ;-)

Why do you think adding those aliases for the standard types is the 'right solution'?

@PizzaProgram
Copy link

PizzaProgram commented Sep 13, 2019

@mrotteveel

What benefit does adding additional aliases for the other types give us?

For "You", FB coders: Nothing, just extra work. :-(
For "Us", "End users": to make life easier and save 1000 people 5 minutes. (If there are "ready to use", predefined domains, both $INT128 and HUGEINT ... and also $UINT2 ... $UINT128 ... we can use.)
So we do not have to search, check, define, test, etc...

We have standard types defined in the ISO SQL standard, and we have our own non-standard datatypes.

Exactly! So why not implement both fully? (Well I guess I know why: extra work for you guys...)
But instead of "half piece of this" + "half of that" >> it helps a lot to learn FB for newbies.
For example: I like very much you have kept GEN_ID() for us, "old guys", but implemented an SQL standard SEQUENCE too. Nice job!

In my opinion, introducing non-standard aliases for standard types just because we - hypothetically at the moment - introduce an INT128 type seems busy work just for the sake of internal consistency. It will increase the amount of documentation, it will increase the amount of keywords (with potential for conflicts with current domains or other object names, etc), and using them will only lead to Firebird-specific SQL that is not portable to other databases. In addition, it might cause confusion.

I didn't know it would be soooo much extra work to create ONE of those, Copy>Paste and change 2>4>8>>...
Sorry to hear that!

What is the benefit of introducing this choice in datatype names?
IMHO, if you do only 1 new >> not too much. But if you introduce a FULL spectrum of "easy names", it will help a lot for all the programmers using FB.

Nothing forces us to use INT128, but given the implementation it is a logical choice, as it is self-documenting.
Why would choose that naming over INT128, what benefit does prefixing with _ or $ give?

I agree. Only there is a tiny problem with those databases migrating from prev. version:

  • What if there's a name conflict with a self-defined domain?

Why do you think adding those aliases for the standard types is the 'right solution'?

I'm not 100% sure those names are perfect, just expressed my opinion. If you know better names for $int256, $int512, I'm glad to hear some ;-)

...

  • Compatibility with previous Firebird 4 versions: It doesn't look like it will be possible to restore backups from previous versions and get correct values, same with opening existing databases populated with previous Firebird 4 versions.

+1 I agree with ODS version change.

PS: Thanks for reading my suggestions. I won't post any more to this thread. From now on, it's up to You guys, which path you choose.

@mrotteveel
Copy link
Member

@PizzaProgram

What benefit does adding additional aliases for the other types give us?

For "You", FB coders: Nothing, just extra work. :-(

To be clear, I'm not a core Firebird developer. I was challenging your idea to find out what your underlying arguments for such a feature are.

@AlexPeshkoff
Copy link
Member Author

Thanks to Mark - I've fixed most of issues noticed by him regarding use/naming of int128 as dec float.

Only one thing remains (specially if we are not gong to add separate SQL type for 128-bit integers) - how to call appropriate macro in Message.h (remember we have FB_SMALLINT, FB_INTEGER, FB_BIGINT) ?

@asfernandes
Copy link
Member

Only one thing remains (specially if we are not gong to add separate SQL type for 128-bit integers)

I have lost everything here. We are not going to add SQL type for 128-bit integers? Isn't the discusion about INT128 and a possible alias for it?

@mrotteveel
Copy link
Member

@asfernandes

Only one thing remains (specially if we are not gong to add separate SQL type for 128-bit integers)

I have lost everything here. We are not going to add SQL type for 128-bit integers? Isn't the discusion about INT128 and a possible alias for it?

No, this PR is about swapping the underlying type of NUMERIC/DECIMAL with a precision of 19 or greater from a Decimal128 to an Int128 (and increasing the maximum precision from 34 to 38). I muddied the waters by suggesting to also introduce INT128 as a type in SQL.

@AlexPeshkoff
Copy link
Member Author

AlexPeshkoff commented Sep 13, 2019 via email

@AlexPeshkoff AlexPeshkoff merged commit 861d536 into master Sep 16, 2019
@AlexPeshkoff AlexPeshkoff deleted the int128 branch September 17, 2019 09:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants