Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Feb 11, 2021

As pointed out by @tustvold on https://github.com/influxdata/influxdb_iox/pull/783#discussion_r574414243, it would seem the whole point of NullArray::new_with_type, is to cheaply construct entirely null columns, with a smaller memory footprint.

Currently trying to print them out causes a panic:

    #[test]
    fn test_pretty_format_null() -> Result<()> {
        // define a schema.
        let schema = Arc::new(Schema::new(vec![
            Field::new("a", DataType::Utf8, true),
            Field::new("b", DataType::Int32, true),
        ]));

        let num_rows = 4;

        // define data (null)
        let batch = RecordBatch::try_new(
            schema,
            vec![
                Arc::new(NullArray::new_with_type(num_rows, DataType::Utf8)),
                Arc::new(NullArray::new_with_type(num_rows, DataType::Int32)),
            ],
        )?;

        let table = pretty_format_batches(&[batch])?;
}

Panics:

failures:

---- util::pretty::tests::test_pretty_format_null stdout ----
thread 'util::pretty::tests::test_pretty_format_null' panicked at 'called `Option::unwrap()` on a `None` value', arrow/src/util/display.rs:201:27

Note

_Update: the issue with NullArray claiming to be types such as Int32 has been fixed by @nevi-me in #9469 _

@github-actions
Copy link

@nevi-me
Copy link
Contributor

nevi-me commented Feb 11, 2021

heads-up @alamb, the implementation of NullArray::new_with_type is not sound, because it's not compliant with the spec.
A NullArray must always have DataType::Null.

I was working on a solution for this last night, but didn't finish it; I'll wrap it up tonight. This mainly fixes #9331. [EDIT: I've opened https://github.com//pull/9469].

This PR will still be necessary, as we'll need to be able to print NullArray correctly (but haven't looked at the code)

@paddyhoran
Copy link
Contributor

Sorry, I didn't scroll and see the comment from @nevi-me. It only affects your test but I guess we should resolve these items first.

@alamb
Copy link
Contributor Author

alamb commented Feb 11, 2021

Thanks @paddyhoran -- I think the logic in this particular PR is sound (with the possibly exception that NullArray::new_with_type is going away) as it just checks for nulls and adds a test :) I am going over to lend a hand with #9469 and hopefully we'll have something good

@alamb alamb marked this pull request as draft February 11, 2021 19:36
@alamb
Copy link
Contributor Author

alamb commented Feb 11, 2021

Switching back to draft until #9469 gets merged

@alamb alamb force-pushed the alamb/support-null-printing branch from 29b95cf to 12e73b3 Compare February 13, 2021 21:33
@alamb
Copy link
Contributor Author

alamb commented Feb 13, 2021

Rebased to pick up #9469

@alamb alamb marked this pull request as ready for review February 13, 2021 21:33
Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Thanks @alamb !

I agree with @nevi-me : Array::data_type(&a) == DataType::Null if and only if a.as_ref().downcast_ref() is infalible.

@alamb , the general solution here is to have a new entry in match data_type for DataType::Null. In general, every missing branch on these matches is either an unimplemented feature or unreachable. :)

@alamb
Copy link
Contributor Author

alamb commented Feb 14, 2021

I agree with @nevi-me : Array::data_type(&a) == DataType::Null if and only if a.as_ref().downcast_ref() is infalible.

Good call @jorgecarleitao -- I didn't realize that the real bug was NullArray created by NullArray::new_with_type claiming to be DataType::Int32, etc. Since this has been fixed by @nevi-me in #9469 I am feeling much better and have updated this PR's description to remove the "Note" section

@alamb alamb closed this in f55a3f3 Feb 14, 2021
sgnkc pushed a commit to sgnkc/arrow that referenced this pull request Feb 17, 2021
As pointed out by @tustvold  on https://github.com/influxdata/influxdb_iox/pull/783#discussion_r574414243, it would seem the whole point of `NullArray::new_with_type`, is to cheaply construct entirely null columns, with a smaller memory footprint.

Currently trying to print them out causes a panic:

```
    #[test]
    fn test_pretty_format_null() -> Result<()> {
        // define a schema.
        let schema = Arc::new(Schema::new(vec![
            Field::new("a", DataType::Utf8, true),
            Field::new("b", DataType::Int32, true),
        ]));

        let num_rows = 4;

        // define data (null)
        let batch = RecordBatch::try_new(
            schema,
            vec![
                Arc::new(NullArray::new_with_type(num_rows, DataType::Utf8)),
                Arc::new(NullArray::new_with_type(num_rows, DataType::Int32)),
            ],
        )?;

        let table = pretty_format_batches(&[batch])?;
}

```

Panics:

```
failures:

---- util::pretty::tests::test_pretty_format_null stdout ----
thread 'util::pretty::tests::test_pretty_format_null' panicked at 'called `Option::unwrap()` on a `None` value', arrow/src/util/display.rs:201:27
```

# Note

_Update: the issue with  `NullArray` claiming to be types such as  `Int32` has been fixed by @nevi-me  in apache#9469 _

Closes apache#9468 from alamb/alamb/support-null-printing

Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants