Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-5743: Handling column family and column scan for hbase #975

Closed
wants to merge 1 commit into from

Conversation

@prasadns14
Copy link
Contributor

prasadns14 commented Oct 5, 2017

This PR handles the scenario where the projected column list contains both a column family and a column within the same family.

@paul-rogers please review

Copy link
Contributor

paul-rogers left a comment

Overall, looks very good. A few minor questions and suggestions.

@@ -97,6 +97,7 @@ public HBaseRecordReader(Connection connection, HBaseSubScan.HBaseSubScanSpec su
@Override
protected Collection<SchemaPath> transformColumns(Collection<SchemaPath> columns) {
Set<SchemaPath> transformed = Sets.newLinkedHashSet();
Set<String> completeFamilies = Sets.newHashSet();

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Oct 6, 2017

Contributor

Do we need to worry about case sensitivity here?

This comment has been minimized.

Copy link
@prasadns14

prasadns14 Oct 7, 2017

Author Contributor

I observed that the planner takes care of it. It returns a single column family if there are more than one column family with same name but different case.
I still made the change to make it case insensitive.

} else {
hbaseScan.addFamily(family);
completeFamilies.add(new String(family));
}

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Oct 6, 2017

Contributor

This code would greatly benefit from a comment to explain what's happening. Would suggest a Javadoc comment for the function explaining the transform rules.

hbaseScan.addFamily(family);
if (!completeFamilies.contains(new String(family))) {
if (child != null && child.isNamed()) {
byte[] qualifier = child.getNameSegment().getPath().getBytes();

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Oct 6, 2017

Contributor

This assumes UTF-8 encoding for the name. Can we be sure that HBase always uses UTF-8 for its encoding? Or, does HBase only support ASCII names so that we need only the ASCII subset of UTF-8? What happens if the user puts a non-ASCII character into the name in this case?

table.mutate(p);

p = new Put("a2".getBytes());
p.addColumn("f".getBytes(), "c1".getBytes(), "11".getBytes());

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Oct 6, 2017

Contributor

Here, we are deciding to encode names as UTF-8. Is this a standard? Or, is it our own convention? Could we have used some other encoding? If we do, how do we tell the code above what encoding we chose?

This comment has been minimized.

Copy link
@prasadns14

prasadns14 Oct 7, 2017

Author Contributor

We currently assume encoding to be UTF-8. Support for different encoding can be addressed through DRILL-5825.

@prasadns14 prasadns14 force-pushed the prasadns14:DRILL-5743 branch from 0469197 to f665485 Oct 7, 2017
@prasadns14 prasadns14 force-pushed the prasadns14:DRILL-5743 branch from f665485 to 07db39f Oct 7, 2017
@prasadns14

This comment has been minimized.

Copy link
Contributor Author

prasadns14 commented Oct 7, 2017

@paul-rogers please review

Copy link
Contributor

paul-rogers left a comment

Thanks for the PR!
LGTM
+1

@asfgit asfgit closed this in e90d96a Oct 16, 2017
ilooner pushed a commit to ilooner/drill that referenced this pull request Jan 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.