
IPC writer should truncate string array with all empty string #2312

Closed
JasonLi-cn opened this issue Aug 4, 2022 · 0 comments · Fixed by #2314
Labels
arrow (Changes to the arrow crate), bug

Comments

@JasonLi-cn
Contributor

Describe the bug

StringArray and similar arrays have two buffers: value offsets and values. If the values buffer is empty but the offsets buffer is not, every element is an empty string. In this case, if we take let slice = array.slice(0, n); and send slice through the IPC writer, the buffers are not truncated, because buffer_need_truncate() returns false.
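The following self-contained sketch makes the failure mode concrete without using arrow-rs. Utf8Model and needs_truncate are hypothetical stand-ins. needs_truncate assumes the check reads "truncate if the slice starts past 0 or the buffer holds more bytes than the slice needs". That shape is an assumption, not arrow-rs's actual code. Applied to the values buffer alone, the check reports that no truncation is needed. The same check on the offsets buffer reports that it is.

```rust
/// Hypothetical, simplified model of a Utf8 array (not arrow-rs types):
/// `offsets` has one more entry than the unsliced array has elements, and
/// element i is values[offsets[i]..offsets[i + 1]].
struct Utf8Model {
    offsets: Vec<i32>,
    values: Vec<u8>,
    offset: usize, // slice start, in elements
    len: usize,    // number of elements visible through the slice
}

impl Utf8Model {
    fn from_strs(strs: &[&str]) -> Self {
        let mut offsets = vec![0i32];
        let mut values = Vec::new();
        for s in strs {
            values.extend_from_slice(s.as_bytes());
            offsets.push(values.len() as i32);
        }
        Utf8Model { offsets, values, offset: 0, len: strs.len() }
    }

    /// Models a zero-copy slice: same buffers, only offset/len change.
    fn slice(&self, offset: usize, len: usize) -> Self {
        Utf8Model {
            offsets: self.offsets.clone(),
            values: self.values.clone(),
            offset: self.offset + offset,
            len,
        }
    }

    /// Number of bytes in `values` that the slice actually references.
    fn referenced_value_bytes(&self) -> usize {
        (self.offsets[self.offset + self.len] - self.offsets[self.offset]) as usize
    }
}

/// Hypothetical check (assumed shape): truncate when the slice does not
/// start at 0, or when the buffer is longer than the slice needs.
fn needs_truncate(array_offset: usize, buffer_len: usize, min_len: usize) -> bool {
    array_offset != 0 || min_len < buffer_len
}

fn main() {
    let full = Utf8Model::from_strs(&["", "", "", "", ""]);
    let slice = full.slice(0, 1);

    // Values buffer: 0 bytes present, 0 bytes needed, so the check says "no".
    let values_check =
        needs_truncate(slice.offset, slice.values.len(), slice.referenced_value_bytes());

    // Offsets buffer: 6 i32s (24 bytes) present, but the slice needs only 2 (8 bytes).
    let i32_size = std::mem::size_of::<i32>();
    let offsets_check = needs_truncate(
        slice.offset,
        slice.offsets.len() * i32_size,
        (slice.len + 1) * i32_size,
    );

    println!("values check: {values_check}, offsets check: {offsets_check}");
}
```

Running it should print values check: false, offsets check: true. This suggests that a writer deciding from the values buffer alone will skip truncating the offsets of an all-empty-string slice.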

To Reproduce

    #[test]
    fn truncate_ipc_string_array_with_all_empty_string() {
        fn create_batch() -> RecordBatch {
            let schema = Schema::new(vec![
                Field::new("a", DataType::Utf8, true),
            ]);
            let a = StringArray::from(vec![Some(""), Some(""), Some(""), Some(""), Some("")]);
            RecordBatch::try_new(Arc::new(schema), vec![Arc::new(a)])
                .unwrap()
        }

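        let record_batch = create_batch();
        // Keep only the first row; the slice still shares the original buffers.
        let record_batch_slice = record_batch.slice(0, 1);
        // serialize/deserialize are assumed to be the IPC stream round-trip helpers in the writer's test module.
        let deserialized_batch = deserialize(serialize(&record_batch_slice));

        // actual: the slice serializes to as many bytes as the full batch (nothing is truncated)
        assert_eq!(serialize(&record_batch).len(), serialize(&record_batch_slice).len());

        // expected
        // assert!(serialize(&record_batch).len() > serialize(&record_batch_slice).len());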

        assert_eq!(record_batch_slice, deserialized_batch);
    }

Expected behavior

Serializing the one-row slice should produce fewer bytes than serializing the full batch:

        // expected
        assert!(serialize(&record_batch).len() > serialize(&record_batch_slice).len());
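The following sketch shows why the slice should come out smaller, assuming buffers are padded to multiples of 8 bytes as in the Arrow IPC format. truncated_offsets, pad_to_8, and padded_bytes are hypothetical helpers, not arrow-rs APIs. They keep only the len + 1 offsets a slice needs, rebase those offsets to start at 0, and compute the padded size of the offsets buffer. Only that buffer is counted here. Message metadata and the validity buffer are ignored.

```rust
/// Hypothetical helper: copy only the `len + 1` offsets a slice needs and
/// rebase them so the first offset is 0.
fn truncated_offsets(offsets: &[i32], offset: usize, len: usize) -> Vec<i32> {
    let window = &offsets[offset..=offset + len];
    let base = window[0];
    window.iter().map(|o| o - base).collect()
}

/// Round a byte length up to the next multiple of 8 (assumed IPC padding).
fn pad_to_8(len: usize) -> usize {
    (len + 7) / 8 * 8
}

/// Padded size in bytes of an i32 offsets buffer.
fn padded_bytes(offsets: &[i32]) -> usize {
    pad_to_8(offsets.len() * std::mem::size_of::<i32>())
}

fn main() {
    // Five empty strings: six offsets, all zero, and an empty values buffer.
    let offsets = vec![0i32; 6];
    let sliced = truncated_offsets(&offsets, 0, 1);

    let full_bytes = padded_bytes(&offsets); // 6 * 4 = 24 bytes
    let sliced_bytes = padded_bytes(&sliced); // 2 * 4 = 8 bytes
    println!("full offsets: {full_bytes} bytes, truncated slice offsets: {sliced_bytes} bytes");
    assert!(full_bytes > sliced_bytes);
}
```

Rebasing matters when the slice starts past 0. For offsets [0, 1, 3, 6], a slice with offset 1 and length 2 becomes [0, 2, 5]. For the all-empty case in this issue, rebasing is a no-op, and the savings come purely from dropping the unused offsets.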

Additional context

@JasonLi-cn added the bug label Aug 4, 2022
@viirya changed the title from "IPC truncation failure" to "IPC writer should truncate string array with all empty string" Aug 4, 2022
@alamb added the arrow label (Changes to the arrow crate) Aug 5, 2022