Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-6847: Add Query Metadata to RESTful Interface #1539

Closed
wants to merge 1 commit into from

Conversation

Projects
None yet
4 participants
@cgivre
Copy link
Contributor

cgivre commented Nov 13, 2018

The Drill RESTful interface does not return the structure of the query results. This makes integrating Drill with other BI tools difficult because they do not know what kind of data to expect.
This PR adds a new section to the results called Metadata which contains a list of the minor types of all the columns returned.

The query below will now return the following in the RESTful interface:

SELECT CAST( employee_id AS INT) AS employee_id,
full_name,
first_name, 
last_name, 
CAST( position_id AS BIGINT) AS position_id, 
position_title 
FROM cp.`employee.json` LIMIT 2
{
  "queryId": "2414bf3f-b4f4-d4df-825f-73dfb3a56681",
  "columns": [
    "employee_id",
    "full_name",
    "first_name",
    "last_name",
    "position_id",
    "position_title"
  ],
  "metadata": [
    "INT",
    "VARCHAR",
    "VARCHAR",
    "VARCHAR",
    "BIGINT",
    "VARCHAR"
  ],
  "rows": [
    {
      "full_name": "Sheri Nowmer",
      "employee_id": "1",
      "last_name": "Nowmer",
      "position_title": "President",
      "first_name": "Sheri",
      "position_id": "1"
    },
    {
      "full_name": "Derrick Whelply",
      "employee_id": "2",
      "last_name": "Whelply",
      "position_title": "VP Country Manager",
      "first_name": "Derrick",
      "position_id": "2"
    }
  ]
}
@vdiravka
Copy link
Member

vdiravka left a comment

Looks good to me. Please rename the commit message to:

DRILL-6847: Add Query Metadata to RESTful Interface

@cgivre cgivre changed the title DRILL-6847 Add Query Metadata to RESTful Interface DRILL-6847: Add Query Metadata to RESTful Interface Nov 14, 2018


MaterializedField col = loader.getSchema().getColumn(i);
columns.add(col.getName());
metadata.add(col.getType().getMinorType().name());

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Nov 14, 2018

Member

Some types can have precision and scale, is this information needed?

This comment has been minimized.

Copy link
@cgivre

cgivre Nov 14, 2018

Author Contributor

Hi @arina-ielchiieva ,
The use case I had in mind was integrating Drill with SQLPad and Apache Superset. In these instances basically, the UI needed to know if a field was numeric, temporal of any sort, or text so that it could render visualizations properly.

I'm sure there are other use cases out there, but I know that for me at least, this was a major blocker in getting Drill to work with the various BI tools. The JDBC interface provided this information, but the RESTful interface did not, so I had to resort to hackery.

So to answer your question, it might be useful for other use cases to provide precision and scale, but for the one I had in mind, that would not be helpful.

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Nov 14, 2018

Member

I see but though it is not your use case, I think we should consider shipping not only String type but as well information about precision and scale for those who might need it.

This comment has been minimized.

Copy link
@cgivre

cgivre Nov 14, 2018

Author Contributor

How would you recommend designing that? I was trying to keep this PR relatively simple, and backwards compatible, but one option might be to make the metadata a little duplicative so something like:

"metadata": [{
   "name": "price",
   "type": "FLOAT4"
   "precision":
   "scale"
   },{
     "name": "customer",
     "type": "VARCHAR"
  ...
]

Do you know off hand where the precision/scale or any other attributes of the columns can be accessed?

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Nov 14, 2018

Member
  1. Duplicating column name does not make sense.
  2. You may not output precision and scale is they are absent, depends in which Object you plan to deserialize this information.
  3. Look at major type, for example.
@kkhatua

This comment has been minimized.

Copy link
Contributor

kkhatua commented Nov 14, 2018

@cgivre I didn't build and try this out, but I'm curious on how we manage for select * from queries. Especially with schema changes... like fields being added or dropped between rows.
Also, I agree with @arina-ielchiieva 's point of not repeating the field names (unless schema change requires it). But I'm not sure how badly we break backward compatibility (perhaps carry a version in the REST response?).

@cgivre

This comment has been minimized.

Copy link
Contributor Author

cgivre commented Nov 15, 2018

HI @kkhatua
A few things. First re: breaking backward compatibility. Since the RESTful interface returns JSON KV pairs, and since all this is doing is introducing an additional key (metadata) which contains the column types, I don't think it will break anything since it isn't doing anything to the existing keys. No real way to know unless a lot of people complain, but the tools which I use that use the RESTful interface are not affected.

Secondly, regarding star queries, I did try a few on my local machine and you get a dump of all the column types. IE if your data contains 4 columns and you do a SELECT * you get:

  "metadata": [
    "INT",
    "VARCHAR",
    "VARCHAR",
    "VARCHAR"
  ],

Since this code gets called when results are returned, I don't think schema inconsistencies matter. IE if the query fails due to schema inconsistencies, you won't get metadata, and if it succeeds you get the results.

@arina-ielchiieva Would you like to see additional arrays with precision and scale with null values if it is not needed. IE

"scale": [
4,
null,
null,
null
]
@arina-ielchiieva

This comment has been minimized.

Copy link
Member

arina-ielchiieva commented Nov 15, 2018

@cgivre you mean inside of metadata?

  "metadata": [
    type: [
       "DECIMAL"
     ],
    precision: [
       5
    ],
   scale: [
     2
   ]
 ]
@cgivre

This comment has been minimized.

Copy link
Contributor Author

cgivre commented Nov 15, 2018

@arina-ielchiieva
I like that. If the column type doesn't have precision or scale, would it look like this?

"metadata": [
    type: [
       "DECIMAL",
       "VARCHAR"
     ],
    precision: [
       5,
       null
    ],
   scale: [
     2,
     null
   ]
 ]
@arina-ielchiieva

This comment has been minimized.

Copy link
Member

arina-ielchiieva commented Nov 15, 2018

Looks ok, try to implement and we'll see.

@cgivre

This comment has been minimized.

Copy link
Contributor Author

cgivre commented Nov 17, 2018

@arina-ielchiieva
I added the precision and scale.
The query:

SELECT CAST( employee_id AS DECIMAL) AS employee_id,
full_name,
first_name, 
last_name, 
CAST( position_id AS BIGINT) AS position_id, position_title 
FROM cp.`employee.json` LIMIT 2

Will result in the following metadata in the RESTful interface:

{
  "queryId": "240f59ad-9846-bee1-0668-575c00add18a",
  "columns": [
    "employee_id",
    "full_name",
    "first_name",
    "last_name",
    "position_id",
    "position_title"
  ],
  "metadata": {
    "precision": [
      38,
      null,
      null,
      null,
      null,
      null
    ],
    "scale": [
      0,
      null,
      null,
      null,
      null,
      null
    ],
    "type": [
      "VARDECIMAL",
      "VARCHAR",
      "VARCHAR",
      "VARCHAR",
      "BIGINT",
      "VARCHAR"
    ]
  },
  "rows": [
    {
      "full_name": "Sheri Nowmer",
      "employee_id": "1",
      "last_name": "Nowmer",
      "position_title": "President",
      "first_name": "Sheri",
      "position_id": "1"
    },
    {
      "full_name": "Derrick Whelply",
      "employee_id": "2",
      "last_name": "Whelply",
      "position_title": "VP Country Manager",
      "first_name": "Derrick",
      "position_id": "2"
    }
  ]
}
@arina-ielchiieva

This comment has been minimized.

Copy link
Member

arina-ielchiieva commented Nov 19, 2018

@cgivre GitHub diff show lots of changes which clearly unrelated to your changes. Please revert all unrelated formatting changes that might have caused it since it's hard to review what has been done.

@cgivre

This comment has been minimized.

Copy link
Contributor Author

cgivre commented Nov 19, 2018

Hi @arina-ielchiieva
Can you just hide the non-whitespace changes in GitHub for the review? I think when I ran the code-formatter in IntelliJ it reformatted those files.

If not, I'll revert and resubmit.

@arina-ielchiieva

This comment has been minimized.

Copy link
Member

arina-ielchiieva commented Nov 19, 2018

@cgivre submitting code reformatting without significant reason is not encouraged. Please use Drill dev guidelines to ensure you use the formatting accepted in Drill project. The same with star imports, this also can be configured in Intellij.

@cgivre cgivre force-pushed the cgivre:rest-metadata branch from f864d31 to 5ade409 Nov 20, 2018

@@ -126,14 +128,23 @@ private float getHeapUsage() {
public static class QueryResult {
private final String queryId;
public final Collection<String> columns;
public final HashMap<String, ArrayList> metadata;

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Nov 20, 2018

Member

Use more generic collections: public final Map<String, List> metadata;.
Plus List of what type? Maybe List<String>?

Please update here and in the code below.

this.rows = rows;
}

public QueryResult(QueryId queryId, WebUserConnection webUserConnection, List<Map<String, String>> rows) {

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Nov 20, 2018

Member

Why we have two constructors now?

This comment has been minimized.

Copy link
@cgivre

cgivre Nov 20, 2018

Author Contributor

I left the original constructor because I wasn't sure if it was used elsewhere and I didn't want to break anything. The new one passes the webUserConnection directly to the QueryResult and enables you to access all the newly added properties.

I can remove the old one if you'd prefer.

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Nov 20, 2018

Member

Please make sure it's not and remove if needed.

This comment has been minimized.

Copy link
@cgivre

cgivre Nov 20, 2018

Author Contributor

Fixed

@@ -106,7 +116,25 @@ public void sendData(RpcOutcomeListener<Ack> listener, QueryWritableBatch result
// TODO: Clean: DRILL-2933: That load(...) no longer throws
// SchemaChangeException, so check/clean catch clause below.
for (int i = 0; i < loader.getSchema().getFieldCount(); ++i) {
columns.add(loader.getSchema().getColumn(i).getName());
MaterializedField col = loader.getSchema().getColumn(i);

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Nov 20, 2018

Member

What about varchar? This type has length...

This comment has been minimized.

Copy link
@cgivre

cgivre Nov 20, 2018

Author Contributor

This change was done for code cleanup. Since I was referring to the column several times, it was easier to create a col variable.


metadata.get("type").add(col.getType().getMinorType().name());

//minorType.add(col.getType().getMinorType().name());

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Nov 20, 2018

Member

Please clean up comments.

@cgivre

This comment has been minimized.

Copy link
Contributor Author

cgivre commented Nov 20, 2018

Hi @arina-ielchiieva
I like this approach much better. That way you don't have multiple parallel sparse lists. I'll work this up.

@cgivre cgivre force-pushed the cgivre:rest-metadata branch from c26e778 to 38bd277 Nov 20, 2018

@cgivre

This comment has been minimized.

Copy link
Contributor Author

cgivre commented Nov 20, 2018

@arina-ielchiieva
Ok, metadata is refomatted.

The query:

SELECT CAST( employee_id AS INT) AS employee_id, 
CAST( employee_id AS DECIMAL(8,4)) AS employee_id1, 
CAST(full_name AS VARCHAR(10)) AS full_name,
first_name, 
last_name, 
CAST( position_id AS BIGINT) AS position_id, 
position_title 
FROM cp.`employee.json` 
LIMIT 2

Results in the following metadata:

{
  "queryId": "240bc0a2-cb41-320d-17cf-072c912c83a8",
  "columns": [
    "employee_id",
    "employee_id1",
    "full_name",
    "first_name",
    "last_name",
    "position_id",
    "position_title"
  ],
  "rows": [
    {
      "full_name": "Sheri Nowm",
      "employee_id1": "1.0000",
      "employee_id": "1",
      "last_name": "Nowmer",
      "position_title": "President",
      "first_name": "Sheri",
      "position_id": "1"
    },
    {
      "full_name": "Derrick Wh",
      "employee_id1": "2.0000",
      "employee_id": "2",
      "last_name": "Whelply",
      "position_title": "VP Country Manager",
      "first_name": "Derrick",
      "position_id": "2"
    }
  ],
  "metadata": [
    "INT",
    "VARDECIMAL(8, 4)",
    "VARCHAR(10)",
    "VARCHAR",
    "VARCHAR",
    "BIGINT",
    "VARCHAR"
  ]
}
dataType.append(col.getType().getPrecision());

if (col.getType().hasScale()) {
dataType.append((", "));

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Nov 20, 2018

Member

Please remove extra parenthesis and squash the commits. Also make sure commit message is in proper style.

@cgivre cgivre force-pushed the cgivre:rest-metadata branch 3 times, most recently from 5dff773 to 06e6249 Nov 20, 2018

@@ -106,7 +110,30 @@ public void sendData(RpcOutcomeListener<Ack> listener, QueryWritableBatch result
// TODO: Clean: DRILL-2933: That load(...) no longer throws
// SchemaChangeException, so check/clean catch clause below.
for (int i = 0; i < loader.getSchema().getFieldCount(); ++i) {
columns.add(loader.getSchema().getColumn(i).getName());
//DRILL-6847: This section adds query metadata to the REST results

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Nov 20, 2018

Member

@cgivre sorry for being picky but it seems indent in for loop and if clauses does not qualify to the accepted one in Drill project.

This comment has been minimized.

Copy link
@cgivre

cgivre Nov 20, 2018

Author Contributor

No worries... Fixed.

@cgivre cgivre force-pushed the cgivre:rest-metadata branch from 6ad8134 to 73c32ea Nov 20, 2018

@@ -115,7 +115,7 @@ public QueryResult run(final WorkManager workManager, final WebUserConnection we
}

// Return the QueryResult.
return new QueryResult(queryId, webUserConnection.columns, webUserConnection.results);
return new QueryResult(queryId, webUserConnection, webUserConnection.results);

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Nov 21, 2018

Member

Please revert back initial indent.

This comment has been minimized.

Copy link
@cgivre

cgivre Nov 21, 2018

Author Contributor

Fixed

}

dataType.append(")");
} else if( col.getType().hasWidth()) {

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Nov 21, 2018

Member

else if( col.getType().hasWidth()) -> else if (col.getType().hasWidth())

This comment has been minimized.

Copy link
@cgivre

cgivre Nov 21, 2018

Author Contributor

Fixed. Sorry about all the formatting errors. Normally, I run the formatter before submitting but that screwed up the rest of the file's formatting.

@cgivre cgivre force-pushed the cgivre:rest-metadata branch from 5a4e90c to 87a0f1d Nov 21, 2018

@arina-ielchiieva

This comment has been minimized.

Copy link
Member

arina-ielchiieva commented Nov 21, 2018

+1

@asfgit asfgit closed this in 6a990c7 Nov 26, 2018

mattpollack added a commit to mattpollack/drill that referenced this pull request Feb 25, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.