Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-7723 - Add Excel Metadata as Implicit Fields #2069

Merged
merged 1 commit into from Jun 14, 2020

Conversation

cgivre
Copy link
Contributor

@cgivre cgivre commented Apr 30, 2020

DRILL-7723: Add Excel Metadata as Implicit Fields

Description

Excel files contain a significant number of metadata fields in them. This PR adds the ability to access these fields via implicit columns. The columns are:

_category
_content_status
_content_type;
_creator
_description
_identifier
_keywords
_last_modified_by_user
_revision
_subject
_title
_created
_last_printed
_modified

These fields are not projected in star queries so there is no effect on existing queries.

Documentation

Updated the README.md file with the implicit field information.

Testing

Added a unit test for the metadata fields.

@cgivre cgivre added the enhancement PRs that add a new functionality to Drill label Apr 30, 2020
@cgivre cgivre self-assigned this Apr 30, 2020
@cgivre cgivre requested a review from vvysotskyi May 11, 2020 19:30
@vvysotskyi
Copy link
Member

@cgivre, will these "implicit" columns values be populated (but not returned) every time the file is read even they aren't specified in the query?

@cgivre
Copy link
Contributor Author

cgivre commented May 17, 2020

@vvysotskyi
That is correct. This follows the model for EVF readers where data is written to the writers and the EVF framework determines which columns to actually project.

@cgivre
Copy link
Contributor Author

cgivre commented May 22, 2020

@vvysotskyi
Any comments on this or is it good to go?

@vvysotskyi
Copy link
Member

@cgivre, I'll take a look this weekend.

private static final String MISSING_FIELD_NAME_HEADER = "field_";

private static final int ROW_CACHE_SIZE = 100;
private enum IMPLICIT_STRING_COLUMNS {
_category,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use naming according to JCC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vvysotskyi
Are you referring to the field names?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please name elements of enum in the upper case. And do not start them with an underscore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And please rename enums with correct casing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. Thanks for the review :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vvysotskyi
To clarify, the fields here are metadata fields and I wanted to make sure that they do not conflict with fields that might actually be in the users' data. It is a fairly standard practice to begin metadata fields with an _. I've seen this done in many other platforms.

In any event, how would you like me to proceed with this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a field to enum wouldn't make things too complex, using enums in such way is a common practice. Regarding naming fields in other projects, the specific project has its own code-style rules.

If you have a strong objection to this, please merge the PR as it is, I've just expressed my opinion about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vvysotskyi
Ok.. I see what you're getting at. I'll refactor the enum with a value. I misunderstood what you were asking. In any event, do you have any objection to the actual column name starting with an underscore?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is ok to have such a column name, my objection was only related to the enum element names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vvysotskyi
Enum refactored and commits squashed. Are we good to go?

Copy link
Member

@vvysotskyi vvysotskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making changes, LGTM, +1

@cgivre cgivre merged commit 2564908 into apache:master Jun 14, 2020
@cgivre cgivre deleted the excel_metadata branch June 14, 2020 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement PRs that add a new functionality to Drill
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants