-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Followups on column reader changes #17293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Followups on column reader changes #17293
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #17293 +/- ##
=========================================
Coverage 63.22% 63.23%
- Complexity 1474 1475 +1
=========================================
Files 3147 3148 +1
Lines 187575 187586 +11
Branches 28712 28710 -2
=========================================
+ Hits 118601 118618 +17
Misses 59771 59771
+ Partials 9203 9197 -6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
The test case failures look unrelated. And are passing locally. So should be intermittent failures |
9aman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor queries. They are mostly questions and not as such suggestions for improvement.
| * Get the value at the given document ID as a Java Object. | ||
| * Can be used for both single-value and multi-value columns. | ||
| * This should be used if | ||
| * 1. The data type is not known at compile time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compile time ??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea. updated the comment to clarify more
| Assert.assertTrue(reader.hasNext()); | ||
| Assert.assertFalse(reader.isNextNull()); | ||
| Assert.assertTrue(reader.isNextNull()); | ||
| Assert.assertEquals(reader.nextInt(), expectedValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change kind of makes these tests depict the wrong access behaviour
nextInt should only be called when isNextNull is false.
We can still call hasNext and next as it does not guarantee non-null returns.
Confused me a bit while reading.
Is guess the entire objective to return true in isNextNull is to ensure that the reader does not go ahead with some default value reads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern here is that hadNext() and next() access pattern for a long default column will give 0 while isNextNull() access pattern would expect the user to skip the value and handle nulls.
Or do we return null object as the default value ?
_defaultValue = defaultNullValue;
even for primitive types ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nextInt should only be called when isNextNull is false.
Yes in https://github.com/apache/pinot/pull/17304/files#diff-b2f354bd2a7666251988f742a8e710c754933547eaa67fbff28992ed18997705 I am making the change to throw an exception for nextInt() in DefaultValueColumnReader
I am removing this change of returning true from this PR as its handled better in that PR
| @Override | ||
| public ColumnReader getColumnReader(String columnName) | ||
| throws IOException { | ||
| public ColumnReader getColumnReader(String columnName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we moving away from throwing an exception. Is the logic similar to that of next() where the caller is expected to return null.
Is so, are there benefits of the same. ?
In case we are sticking to this, can we just update the interface to say that a null will be returned in case the ColumnReader is not present.
/**
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use case for the method is to get a column reader from a input file.
The client needs to know if the input file doesn't contain the column. If we throw an IOException in such a case, we can't know if there is an exception in creating the reader or if the column doesn't exist.
Thus returning null can help the client differentiate that.
Currently PinotSegmentColumnReaderFactory creates a DefaultValueColumnReader if it doesn't exist. Ideally it shouldn't do that. The client should handle case where the input doesn't have the column (in some cases it might make sense to fail or not read that column etc)
I am not making the changes to remove DefaultValueColumnReader from PinotSegmentColumnReaderFactory because it might break something if some external code depends on that.
However other factories that we will create will return null if a column doesn't exist. So made the change here too to return null. This shouldn't break anything major if some external code is using this factory
| * </ul> | ||
| * <p> | ||
| */ | ||
| public interface ColumnarDataSource extends Closeable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noob question
@krishan1390
Why can't we directly use: ColumnReaderFactory
How and where does this additional level of abstraction help ?
I see an already written columnar reader that is being used during segment reload that directly relies on the factory ?
What value add do we expect from this interface in future ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the java doc in the class and createColumnReaderFactory() help in answering this ?
ColumnarDataSource is similar to RecordReaderFileConfig that it holds required attributes to initialise the ColumnReaderFactory / ColumnReader
For the same input attributes (file name, etc), we want to create independent instances of ColumnReaderFactory / ColumnReader
9aman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Uh oh!
There was an error while loading. Please reload this page.