DRILL-8358: Storage plugin for querying other Apache Drill clusters #2709

Merged: 3 commits into apache:master, Dec 12, 2022

vvysotskyi
Member

Description

Uses the native client to query other Drill clusters. Added logic to apply various pushdowns where possible.
Fixed the addition of an extra project in the case of star columns.
Fixed the handling of columns with an empty name for the Excel format.

Documentation

See README.md

Testing

Tested manually; added unit tests.

@vvysotskyi self-assigned this on Nov 25, 2022
@vvysotskyi added the enhancement (PRs that add a new functionality to Drill) and new-storage (New Storage Plugin) labels on Nov 25, 2022
super("Query timed out in "+ timeoutValueInSeconds + " seconds");
}
}
package org.apache.drill.exec.store.drill.plugin;
Contributor

So Git decided that this was the renaming of a file 😏

try {
String urlSuffix = connection.substring(CONNECTION_STRING_PREFIX.length());
Properties props = ConnectStringParser.parse(urlSuffix, properties);
props.putAll(credentialsProvider.getUserCredentials(userName));
Contributor

@jnturton commented on Dec 1, 2022

The getUserCredentials(String username) method is meant to fetch per-query-user credentials for plugins that are in user translation auth mode, while the nullary getUserCredentials() method is meant for shared credentials. Only the plain and Vault providers currently support per-user credentials. You can see some logic for deciding which to call (via UsernamePasswordCredentials objects) in JdbcStorageConfig on line 142.

Those APIs wound up being a little ugly :/
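
A minimal sketch of the distinction described in this comment, in case it helps readers of the thread. Everything in it is a hypothetical stand-in: the `AuthMode` enum, the `CredentialsSource` interface, and the `resolve` helper are not Drill's real classes, and only the two `getUserCredentials` overload names are taken from the comment. It simply shows the per-user overload being chosen when the plugin is in user translation mode and the nullary one otherwise.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Drill's real classes.
class CredentialSelectionSketch {

  /** Assumed auth modes: one shared account vs. per-query-user translation. */
  enum AuthMode { SHARED_USER, USER_TRANSLATION }

  /** Assumed provider shape exposing the two overloads named in the review comment. */
  interface CredentialsSource {
    Map<String, String> getUserCredentials();                // shared credentials
    Map<String, String> getUserCredentials(String username); // per-query-user credentials
  }

  /** Picks the overload that matches the plugin's auth mode. */
  static Map<String, String> resolve(CredentialsSource source, AuthMode mode, String queryUser) {
    if (mode == AuthMode.USER_TRANSLATION) {
      return source.getUserCredentials(queryUser);
    }
    return source.getUserCredentials();
  }

  /** Toy provider used only to exercise resolve(); the values are made up. */
  static CredentialsSource demoSource() {
    return new CredentialsSource() {
      @Override
      public Map<String, String> getUserCredentials() {
        return Collections.singletonMap("username", "shared_svc");
      }

      @Override
      public Map<String, String> getUserCredentials(String username) {
        return Collections.singletonMap("username", username + "_remote");
      }
    };
  }

  public static void main(String[] args) {
    CredentialsSource source = demoSource();
    System.out.println(resolve(source, AuthMode.SHARED_USER, "alice"));
    System.out.println(resolve(source, AuthMode.USER_TRANSLATION, "alice"));
  }
}
```

Calling `getUserCredentials(userName)` unconditionally, as in the snippet under review, would skip this branch, which appears to be the concern raised here.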

Member Author

Thanks, fixed.

Contributor

@cgivre left a comment

Thanks @vvysotskyi for this. I had a usage question as well.

Let's say that I have 2 drills, drill1 and drill2. Let's say that drill2 is connected to a file system called dfs2 and I want to query that from drill1. What would the query look like?
Would it be something like:

SELECT *
FROM drill2.dfs.ws.`file`

@JsonCreator
public DrillSubScan(
@JsonProperty("userName") String userName,
@JsonProperty("mongoPluginConfig") StoragePluginConfig pluginConfig,
Contributor

Is this supposed to be mongoPluginConfig?

Member Author

No, it isn't, thanks, fixed it.

@vvysotskyi
Member Author

@cgivre, yes, you can create a plugin in drill1 named drill2 and then query, from drill1, every plugin that drill2 has configured. So if drill2 has a file system plugin called dfs2, the query on drill1 would be the following:

SELECT *
FROM drill2.dfs2.ws.`file`
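
The naming rule in this reply can be sketched as a small helper: the first path segment is the name given to the Drill plugin on the local cluster, and the rest is the path as the remote cluster would see it. The class and method names below are hypothetical and only assemble the query text; they do not call any Drill API. Only the table name is quoted, mirroring the query above, and rejecting names that contain a backtick is a simplification made for this sketch.

```java
// Hypothetical helper for illustration; it only builds SQL text.
class RemoteDrillQuerySketch {

  /** Wraps an identifier in backticks; names containing a backtick are rejected to keep the sketch simple. */
  static String quote(String identifier) {
    if (identifier.contains("`")) {
      throw new IllegalArgumentException("backticks are not handled in this sketch: " + identifier);
    }
    return "`" + identifier + "`";
  }

  /**
   * Builds a SELECT * over a table on the remote cluster.
   *
   * @param localPluginName  name of the Drill storage plugin as configured on the local cluster
   * @param remotePluginName name of the storage plugin configured on the remote cluster
   * @param workspace        workspace within the remote plugin
   * @param table            table or file name on the remote side
   */
  static String selectAll(String localPluginName, String remotePluginName,
                          String workspace, String table) {
    return "SELECT * FROM " + localPluginName + "." + remotePluginName + "."
        + workspace + "." + quote(table);
  }

  public static void main(String[] args) {
    // Prints: SELECT * FROM drill2.dfs2.ws.`file`
    System.out.println(selectAll("drill2", "dfs2", "ws", "file"));
  }
}
```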

Contributor

@jnturton left a comment

Thanks for making changes.

.recordCount();

assertEquals(50L, recordCount);
}
Contributor

Can we have a test of a schema path that descends through directories in a filesystem plugin on the remote Drill cluster? E.g.

select * from drill.`dfs.tmp`.`/path/to/foo.parquet`

Member Author

Yes, I'll add a unit test for it in a future pull request.

@cgivre merged commit 314105c into apache:master on Dec 12, 2022
3 participants