Skip to content

Conversation

@maxburke
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

We need to be able to base64 encode fixedsizebinary types.

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added the functions Changes to functions implementation label Nov 26, 2025
#[cfg(test)]
mod tests {
#[test]
fn test_encode_fsb() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be added as an SLT test instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? SLT tests are quite unapproachable for those who don't work in datafusion regularly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it is very necessary, because it reveals that the signature of the function doesn't actually accept FixedSizeBinary arrays, something calling the function directly in code like this wouldn't make obvious.

While SLTs can be harder to write, we prefer it because they are less verbose, and they can test functions with a more stable interface (SQL) rather than their internal APIs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something you would be able to help out with?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at the README and see some other tests for reference:

# Arrays tests
query T
SELECT encode(bin_field, 'hex') FROM test ORDER BY num;
----
616263
7177657177
NULL
8f50d3f60eae370ddbf85c86219c55108a350165

arrow_cast(column1, 'FixedSizeBinary(3)') as "column1",
arrow_cast(column2, 'FixedSizeBinary(3)') as "column2"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I don't have the bandwidth to dig into SLT tests at this time. I have provided this MR, it contains tests, and in the past this has been sufficient for Datafusion. You may take it from here as you wish.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell this PR doesn't actually fix anything; have you tested this according to your use case for the issue you raised? (And I don't mean the unit tests you introduced here because those are internal tests)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell this PR doesn't actually fix anything

How so? Before, calling encode on fixedsizebinary types would have returned an unsupported data type error.

We've been using in production since April when we first made the change: urbanlogiq@9dfd561

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some possible tests:

# test for FixedSizeBinary support for encode
statement ok
CREATE TABLE test_fsb AS 
SELECT arrow_cast(X'0123456789ABCDEF', 'FixedSizeBinary(8)') as fsb_col;

query TT
SELECT 
  encode(fsb_col, 'base64') AS fsb_base64,
  encode(fsb_col, 'hex') AS fsb_hex
FROM test_fsb;
----
ASNFZ4mrze8 0123456789abcdef

# Test with NULL
query T
SELECT encode(arrow_cast(NULL, 'FixedSizeBinary(8)'), 'base64');
----
NULL

LargeUtf8 => LargeUtf8,
Utf8View => Utf8,
Binary => Utf8,
LargeBinary => LargeUtf8,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FixedSizeBinary should be added here, no ?

Suggested change
LargeBinary => LargeUtf8,
FixedSizeBinary(_) => Utf8,

}
DataType::LargeUtf8 => Ok(vec![DataType::LargeUtf8, DataType::Utf8]),
DataType::Binary => Ok(vec![DataType::Binary, DataType::Utf8]),
DataType::LargeBinary => Ok(vec![DataType::LargeBinary, DataType::Utf8]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FixedSizeBinary should be added here too

Suggested change
DataType::LargeBinary => Ok(vec![DataType::LargeBinary, DataType::Utf8]),
DataType::FixedSizeBinary(size) => Ok(vec\![DataType::FixedSizeBinary(*size), DataType::Utf8]),

#[cfg(test)]
mod tests {
#[test]
fn test_encode_fsb() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some possible tests:

# test for FixedSizeBinary support for encode
statement ok
CREATE TABLE test_fsb AS 
SELECT arrow_cast(X'0123456789ABCDEF', 'FixedSizeBinary(8)') as fsb_col;

query TT
SELECT 
  encode(fsb_col, 'base64') AS fsb_base64,
  encode(fsb_col, 'hex') AS fsb_hex
FROM test_fsb;
----
ASNFZ4mrze8 0123456789abcdef

# Test with NULL
query T
SELECT encode(arrow_cast(NULL, 'FixedSizeBinary(8)'), 'base64');
----
NULL

encode_process(&value, Encoding::Base64).unwrap()
else {
panic!("unexpected value");
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To cover the missing usage of return_type and coerce_types you need something like:

let encode_func = EncodeFunc::new();
    let args = vec![
        ColumnarValue::Array(Arc::new(array)),
        ColumnarValue::Scalar(ScalarValue::Utf8(Some("base64".to_string()))),
    ];
    
    // This will test the full path including type checking
    let result = encode_func.invoke_with_args(datafusion_expr::ScalarFunctionArgs {
        args,
        number_rows: 1,
        return_type: &DataType::Utf8,
    }).unwrap();
    
    let ColumnarValue::Array(result) = result else {
        panic!("unexpected value");
    };
    ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FixedSizeBinary should be an allowable input to base64 encoding

3 participants