New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRILL-7723 - Add Excel Metadata as Implicit Fields #2069
Conversation
@cgivre, will these "implicit" columns values be populated (but not returned) every time the file is read even they aren't specified in the query? |
@vvysotskyi |
@vvysotskyi |
@cgivre, I'll take a look this weekend. |
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
Outdated
Show resolved
Hide resolved
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
Outdated
Show resolved
Hide resolved
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
Outdated
Show resolved
Hide resolved
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
Outdated
Show resolved
Hide resolved
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
Show resolved
Hide resolved
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
Outdated
Show resolved
Hide resolved
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
Outdated
Show resolved
Hide resolved
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
Outdated
Show resolved
Hide resolved
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
Outdated
Show resolved
Hide resolved
contrib/format-excel/src/test/java/org/apache/drill/exec/store/excel/TestExcelFormat.java
Outdated
Show resolved
Hide resolved
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
Outdated
Show resolved
Hide resolved
private static final String MISSING_FIELD_NAME_HEADER = "field_"; | ||
|
||
private static final int ROW_CACHE_SIZE = 100; | ||
private enum IMPLICIT_STRING_COLUMNS { | ||
_category, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use naming according to JCC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vvysotskyi
Are you referring to the field names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please name elements of enum in the upper case. And do not start them with an underscore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And please rename enums with correct casing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do. Thanks for the review :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vvysotskyi
To clarify, the fields here are metadata fields and I wanted to make sure that they do not conflict with fields that might actually be in the users' data. It is a fairly standard practice to begin metadata fields with an _
. I've seen this done in many other platforms.
In any event, how would you like me to proceed with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding a field to enum wouldn't make things too complex, using enums in such way is a common practice. Regarding naming fields in other projects, the specific project has its own code-style rules.
If you have a strong objection to this, please merge the PR as it is, I've just expressed my opinion about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vvysotskyi
Ok.. I see what you're getting at. I'll refactor the enum
with a value. I misunderstood what you were asking. In any event, do you have any objection to the actual column name starting with an underscore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is ok to have such a column name, my objection was only related to the enum element names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vvysotskyi
Enum refactored and commits squashed. Are we good to go?
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
Outdated
Show resolved
Hide resolved
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
Outdated
Show resolved
Hide resolved
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
Outdated
Show resolved
Hide resolved
contrib/format-excel/src/main/java/org/apache/drill/exec/store/excel/ExcelBatchReader.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making changes, LGTM, +1
DRILL-7723: Add Excel Metadata as Implicit Fields
Description
Excel files contain a significant number of metadata fields in them. This PR adds the ability to access these fields via implicit columns. The columns are:
These fields are not projected in star queries so there is no effect on existing queries.
Documentation
Updated the
README.md
file with the implicit field information.Testing
Added a unit test for the metadata fields.