Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String to HUGEINT cast bug #5328

Closed
2 tasks done
taniabogatsch opened this issue Nov 14, 2022 · 9 comments · Fixed by #9581
Closed
2 tasks done

String to HUGEINT cast bug #5328

taniabogatsch opened this issue Nov 14, 2022 · 9 comments · Fixed by #9581

Comments

@taniabogatsch
Copy link
Contributor

What happens?

Casting from string to hugeint behaves incorrect.

To Reproduce

D  select '1.8259857912588366e+37'::hugeint;
┌───────────────────────────────────────────┐
│ CAST('1.8259857912588366e+37' AS HUGEINT) │
│                  int128                   │
├───────────────────────────────────────────┤
│    20000000000000000000000000000000000000 │
└───────────────────────────────────────────┘

But this works.

D select 1.8259857912588366e+37::hugeint;
┌─────────────────────────────────────────┐
│ CAST(1.8259857912588366e+37 AS HUGEINT) │
│                 int128                  │
├─────────────────────────────────────────┤
│  18259857912588365870837119913054699520 │
└─────────────────────────────────────────┘

OS:

iOS

DuckDB Version:

master

DuckDB Client:

CLI

Full Name:

Tania Bogatsch

Affiliation:

DuckDB Labs

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
@l1t1

This comment was marked as abuse.

@Tishj
Copy link
Contributor

Tishj commented Nov 14, 2022

Seems to be caused on the first run of HugeIntegerCastOperation::HandleExponent (cast_operators.cpp:1518)
where before Flush() is called the data is:

(lldb) p result
(duckdb::HugeIntCastData) $30 = {
  hugeint = (lower = 1, upper = 0)
  intermediate = 1
  digits = '\0'
  decimal = true
}

and after:

(lldb) p result
(duckdb::HugeIntCastData) $31 = {
  hugeint = (lower = 2, upper = 0)
  intermediate = 0
  digits = '\0'
  decimal = true
}

Note: this is not me assigning myself, just got curious and thought I'd share some seemingly useful findings ;)

@papparapa
Copy link
Contributor

papparapa commented Nov 15, 2022

I tested it with INT and got similar result.

duckdb> select '1.8259857912588366e4'::int;
┌─────────────────────────────────────────┐
│ CAST('1.8259857912588366e4' AS INTEGER) │
╞═════════════════════════════════════════╡
│                                   20000 │
└─────────────────────────────────────────┘

So I think the cause is that HugeIntegerCastOperation::HandleDecimal and IntegerCastOperation::HandleDecimal called in IntegerCastLoop round the decimal even when there is an exponent.

while (pos < len) {
if (!StringUtil::CharacterIsDigit(buf[pos])) {
break;
}
if (!OP::template HandleDecimal<T, NEGATIVE, ALLOW_EXPONENT>(result, buf[pos] - '0')) {
return false;
}
pos++;
}

IntegerCastLoop calls generic HandleDecimal here.

If no one has started on it, I will try fixing it later.

@Tishj
Copy link
Contributor

Tishj commented Nov 16, 2022

Hmm that sounds likely, DECIMAL had a similar issue, maybe have a look at how it's handled there - or if you can come up with a nicer solution, that's even better :)

@papparapa
Copy link
Contributor

papparapa commented Nov 16, 2022

I looked at DecimalCastOperation, and it seems to handle the fractional part better than IntegerCastOperation and HugeIntegerCastOperation.
DecimalCastOperation stores the integer and decimal part in one value first.

template <class T, bool NEGATIVE>
static bool HandleDigit(T &state, uint8_t digit) {
if (state.result == 0 && digit == 0) {
// leading zero's don't count towards the digit count
return true;
}
if (state.digit_count == state.width - state.scale) {
// width of decimal type is exceeded!
return false;
}
state.digit_count++;
if (NEGATIVE) {
if (state.result < (NumericLimits<typename T::type_t>::Minimum() / 10)) {
return false;
}
state.result = state.result * 10 - digit;
} else {
if (state.result > (NumericLimits<typename T::type_t>::Maximum() / 10)) {
return false;
}
state.result = state.result * 10 + digit;
}
return true;
}

template <class T, bool NEGATIVE, bool ALLOW_EXPONENT>
static bool HandleDecimal(T &state, uint8_t digit) {
if (!ALLOW_EXPONENT && state.decimal_count == state.scale) {
// we exceeded the amount of supported decimals
// however, we don't throw an error here
// we just truncate the decimal
return true;
}
//! If we expect an exponent, we need to preserve the decimals
//! But we don't want to overflow, so we prevent overflowing the result with this check
if (state.digit_count + state.decimal_count >= DecimalWidth<decltype(state.result)>::max) {
return true;
}
state.decimal_count++;
if (NEGATIVE) {
state.result = state.result * 10 - digit;
} else {
state.result = state.result * 10 + digit;
}
return true;
}

DecimalCastOperation::HandleDigit stores the integer part in state.result, and DecimalCastOperation::HandleDecimal stores the fractional part in state.result too.

Later it adjusts scale and decides whether to round up or down if needed.

Maybe this logic can be reused in IntegerCastOperation and HugeIntegerCastOperation.

@l1t1

This comment was marked as abuse.

@l1t1

This comment was marked as abuse.

@github-actions
Copy link

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@github-actions
Copy link

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants