String to HUGEINT cast bug #5328

taniabogatsch · 2022-11-14T08:51:57Z

What happens?

Casting from string to hugeint behaves incorrectly.

To Reproduce

D  select '1.8259857912588366e+37'::hugeint;
┌───────────────────────────────────────────┐
│ CAST('1.8259857912588366e+37' AS HUGEINT) │
│                  int128                   │
├───────────────────────────────────────────┤
│    20000000000000000000000000000000000000 │
└───────────────────────────────────────────┘

But this works.

D select 1.8259857912588366e+37::hugeint;
┌─────────────────────────────────────────┐
│ CAST(1.8259857912588366e+37 AS HUGEINT) │
│                 int128                  │
├─────────────────────────────────────────┤
│  18259857912588365870837119913054699520 │
└─────────────────────────────────────────┘

OS:

iOS

DuckDB Version:

master

DuckDB Client:

CLI

Full Name:

Tania Bogatsch

Affiliation:

DuckDB Labs

Have you tried this on the latest `master` branch?

I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

I agree

Tishj · 2022-11-14T09:25:16Z

Seems to be caused on the first run of HugeIntegerCastOperation::HandleExponent (cast_operators.cpp:1518)
where before Flush() is called the data is:

(lldb) p result
(duckdb::HugeIntCastData) $30 = {
  hugeint = (lower = 1, upper = 0)
  intermediate = 1
  digits = '\0'
  decimal = true
}

and after:

(lldb) p result
(duckdb::HugeIntCastData) $31 = {
  hugeint = (lower = 2, upper = 0)
  intermediate = 0
  digits = '\0'
  decimal = true
}

Note: this is not me assigning myself, just got curious and thought I'd share some seemingly useful findings ;)

papparapa · 2022-11-15T15:47:47Z

I tested it with INT and got a similar result.

duckdb> select '1.8259857912588366e4'::int;
┌─────────────────────────────────────────┐
│ CAST('1.8259857912588366e4' AS INTEGER) │
╞═════════════════════════════════════════╡
│                                   20000 │
└─────────────────────────────────────────┘

So I think the cause is that HugeIntegerCastOperation::HandleDecimal and IntegerCastOperation::HandleDecimal called in IntegerCastLoop round the decimal even when there is an exponent.

duckdb/src/common/operator/cast_operators.cpp

Lines 884 to 892 in 0f04611

    while (pos < len) {
        if (!StringUtil::CharacterIsDigit(buf[pos])) {
            break;
        }
        if (!OP::template HandleDecimal<T, NEGATIVE, ALLOW_EXPONENT>(result, buf[pos] - '0')) {
            return false;
        }
        pos++;
    }

IntegerCastLoop calls generic HandleDecimal here.
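
To make the suspected cause concrete, here is a minimal standalone sketch. It is not the DuckDB code: `RoundThenScale` and its parameters are hypothetical names for illustration, and it assumes a non-negative mantissa and exponent. It rounds the mantissa as soon as the fractional digits are seen and only then applies the exponent, which is the order of operations the comment above suspects.

```cpp
#include <cstdint>
#include <string>

// Hypothetical illustration, not the DuckDB implementation.
// Assumes a non-negative mantissa and a non-negative exponent.
// int_digits: digits before the '.', frac_digits: digits after it.
int64_t RoundThenScale(const std::string &int_digits, const std::string &frac_digits, int exponent) {
	int64_t value = 0;
	for (char c : int_digits) {
		value = value * 10 + (c - '0');
	}
	// The fractional part is collapsed into a rounding decision right away,
	// before the exponent has been read.
	if (!frac_digits.empty() && frac_digits[0] >= '5') {
		value += 1;
	}
	// Applying the exponent afterwards scales the already rounded value.
	for (int i = 0; i < exponent; i++) {
		value *= 10;
	}
	return value;
}
```

With the mantissa split as `"1"` and `"8259857912588366"` and an exponent of 4, this sketch returns 20000, the same value as the INT reproduction above.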

If no one has started on it, I will try fixing it later.

Tishj · 2022-11-16T08:53:48Z

Hmm that sounds likely, DECIMAL had a similar issue, maybe have a look at how it's handled there - or if you can come up with a nicer solution, that's even better :)

papparapa · 2022-11-16T11:30:05Z

I looked at DecimalCastOperation, and it seems to handle the fractional part better than IntegerCastOperation and HugeIntegerCastOperation.
DecimalCastOperation stores the integer and decimal part in one value first.

duckdb/src/common/operator/cast_operators.cpp

Lines 1594 to 1617 in 0f04611

    template <class T, bool NEGATIVE>
    static bool HandleDigit(T &state, uint8_t digit) {
        if (state.result == 0 && digit == 0) {
            // leading zero's don't count towards the digit count
            return true;
        }
        if (state.digit_count == state.width - state.scale) {
            // width of decimal type is exceeded!
            return false;
        }
        state.digit_count++;
        if (NEGATIVE) {
            if (state.result < (NumericLimits<typename T::type_t>::Minimum() / 10)) {
                return false;
            }
            state.result = state.result * 10 - digit;
        } else {
            if (state.result > (NumericLimits<typename T::type_t>::Maximum() / 10)) {
                return false;
            }
            state.result = state.result * 10 + digit;
        }
        return true;
    }

duckdb/src/common/operator/cast_operators.cpp

Lines 1677 to 1697 in 0f04611

    template <class T, bool NEGATIVE, bool ALLOW_EXPONENT>
    static bool HandleDecimal(T &state, uint8_t digit) {
        if (!ALLOW_EXPONENT && state.decimal_count == state.scale) {
            // we exceeded the amount of supported decimals
            // however, we don't throw an error here
            // we just truncate the decimal
            return true;
        }
        //! If we expect an exponent, we need to preserve the decimals
        //! But we don't want to overflow, so we prevent overflowing the result with this check
        if (state.digit_count + state.decimal_count >= DecimalWidth<decltype(state.result)>::max) {
            return true;
        }
        state.decimal_count++;
        if (NEGATIVE) {
            state.result = state.result * 10 - digit;
        } else {
            state.result = state.result * 10 + digit;
        }
        return true;
    }

DecimalCastOperation::HandleDigit stores the integer part in state.result, and DecimalCastOperation::HandleDecimal stores the fractional part in state.result too.

Later it adjusts scale and decides whether to round up or down if needed.

Maybe this logic can be reused in IntegerCastOperation and HugeIntegerCastOperation.
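
A rough sketch of how that idea could carry over to an integer target is below. It is only an illustration under stated assumptions, not a proposed patch: `SketchCastState` and `SketchStringToInteger` are hypothetical names, the input is assumed non-negative, the target is fixed to `int64_t`, and mantissas that do not fit are rejected instead of truncated as in the quoted `HandleDecimal`. Rounding is half-up on the first dropped digit, which is also an assumption.

```cpp
#include <cstdint>
#include <string>

// Hypothetical state, loosely modeled on the quoted DecimalCastOperation.
struct SketchCastState {
	int64_t result = 0;    // integer and fractional digits accumulated together
	int digit_count = 0;   // total digits consumed
	int decimal_count = 0; // digits that came after the '.'
};

// Casts a non-negative "<digits>[.<digits>][e[+-]<digits>]" string to int64_t.
// Returns false on malformed input or when the value does not fit.
bool SketchStringToInteger(const std::string &input, int64_t &out) {
	SketchCastState state;
	size_t pos = 0;
	bool seen_point = false;
	for (; pos < input.size(); pos++) {
		char c = input[pos];
		if (c == '.' && !seen_point) {
			seen_point = true;
			continue;
		}
		if (c < '0' || c > '9') {
			break;
		}
		if (state.result > (INT64_MAX - 9) / 10) {
			return false; // the sketch rejects mantissas it cannot hold exactly
		}
		state.result = state.result * 10 + (c - '0');
		state.digit_count++;
		if (seen_point) {
			state.decimal_count++;
		}
	}
	if (state.digit_count == 0) {
		return false;
	}
	int exponent = 0;
	if (pos < input.size() && (input[pos] == 'e' || input[pos] == 'E')) {
		pos++;
		bool negative_exponent = false;
		if (pos < input.size() && (input[pos] == '+' || input[pos] == '-')) {
			negative_exponent = input[pos] == '-';
			pos++;
		}
		if (pos == input.size()) {
			return false; // "1e" has no exponent digits
		}
		for (; pos < input.size(); pos++) {
			if (input[pos] < '0' || input[pos] > '9' || exponent > 1000) {
				return false;
			}
			exponent = exponent * 10 + (input[pos] - '0');
		}
		if (negative_exponent) {
			exponent = -exponent;
		}
	}
	if (pos != input.size()) {
		return false; // trailing characters that are not part of the number
	}
	// Only now, with the exponent known, is the scale adjusted.
	int shift = exponent - state.decimal_count;
	int64_t value = state.result;
	int last_dropped = 0;
	for (; shift < 0; shift++) {
		last_dropped = static_cast<int>(value % 10); // ends as the first dropped digit
		value /= 10;
	}
	for (; shift > 0; shift--) {
		if (value > INT64_MAX / 10) {
			return false;
		}
		value *= 10;
	}
	if (last_dropped >= 5) {
		value++; // round half up
	}
	out = value;
	return true;
}
```

Under these assumptions, `'1.8259857912588366e4'` comes out as 18260 instead of 20000, because the rounding decision is made after the exponent has shifted the decimal point.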

github-actions · 2023-07-29T00:30:12Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions · 2023-10-31T00:30:51Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

taniabogatsch added the bug label Nov 14, 2022

papparapa mentioned this issue Nov 16, 2022

Fix string with exponent to integer cast #5381

Closed

github-actions bot added the stale label Jul 29, 2023

taniabogatsch added reproduced and removed bug stale labels Aug 1, 2023

github-actions bot added the stale label Oct 31, 2023

taniabogatsch removed the stale label Oct 31, 2023

nickgerrets mentioned this issue Nov 6, 2023

Fix: string to integer cast #9581

Merged

nickgerrets closed this as completed Jan 10, 2024
