This pull request introduces 8 alerts when merging c958a98 into ebfe1c0 (view on LGTM.com).
- Catalog object model for tables, columns
- Druid metadata DB storage (as an extension)
- REST API to update the catalog (as an extension)
- Integration tests
- Model only: no planner integration yet
Force-pushed from c958a98 to bd7c8be.
This pull request introduces 6 alerts when merging bd7c8be into 7fa53ff (view on LGTM.com).
Build is clean except for one unrelated flaky failure in a Kafka IT. Will rerun later as I'm sure someone will have comments that require code changes and a new build.
rohangarg
left a comment
Thanks a lot for the design issue and changes @paul-rogers! 👍
I've taken a first pass at the core changes; posting the current review as a checkpoint while I review the remaining part.
Thanks for the in-code explanations and patience as well!
processing/src/main/java/org/apache/druid/segment/column/RowSignature.java
server/src/main/java/org/apache/druid/catalog/model/CatalogUtils.java
server/src/main/java/org/apache/druid/catalog/model/CatalogUtils.java
server/src/main/java/org/apache/druid/catalog/model/CatalogUtils.java
server/src/main/java/org/apache/druid/catalog/model/table/ExternalSpec.java
server/src/main/java/org/apache/druid/catalog/model/Parameterized.java
server/src/main/java/org/apache/druid/catalog/model/facade/ColumnFacade.java
server/src/main/java/org/apache/druid/catalog/model/facade/InputTableFacade.java
server/src/main/java/org/apache/druid/catalog/model/table/MeasureTypes.java
- Addressed direct review comments
- Removed rollup support per offline discussion
paul-rogers
left a comment
@rohangarg, thanks for your review. Addressed your comments. Also revised the model to remove the "rollup" concept, leaving just the one "datasource" table type.
server/src/main/java/org/apache/druid/catalog/model/table/MeasureTypes.java
server/src/main/java/org/apache/druid/catalog/model/facade/InputTableFacade.java
server/src/main/java/org/apache/druid/catalog/model/facade/ColumnFacade.java
server/src/main/java/org/apache/druid/catalog/model/Parameterized.java
server/src/main/java/org/apache/druid/catalog/model/Parameterized.java
server/src/main/java/org/apache/druid/catalog/model/CatalogUtils.java
core/src/main/java/org/apache/druid/data/input/InputFormat.java
core/src/main/java/org/apache/druid/data/input/InputSource.java
core/src/main/java/org/apache/druid/java/util/common/jackson/JacksonUtils.java
FrankChen021
left a comment
The change is huge and I just took a short time to look through and left some minor suggestions.
...-core/druid-catalog/src/main/java/org/apache/druid/catalog/http/CatalogListenerResource.java
extensions-core/druid-catalog/src/main/java/org/apache/druid/catalog/storage/HideColumns.java
...core/druid-catalog/src/main/java/org/apache/druid/catalog/storage/sql/SQLCatalogManager.java
...core/druid-catalog/src/main/java/org/apache/druid/catalog/storage/sql/SQLCatalogManager.java
...core/druid-catalog/src/main/java/org/apache/druid/catalog/storage/sql/SQLCatalogManager.java
...core/druid-catalog/src/main/java/org/apache/druid/catalog/storage/sql/SQLCatalogManager.java
...core/druid-catalog/src/main/java/org/apache/druid/catalog/storage/sql/SQLCatalogManager.java
 * If SQL based, return the SQL version of the metastore
 * connector. Throws an exception if not SQL-based.
 */
SQLMetadataConnector sqlConnector();
About this interface and the isSql method above: I don't think we need such a specialized method at this interface level. A more appropriate way is to change it to `MetadataStorageConnector connector();` and remove the isSql method.
The SQLCatalogManager can then check whether the returned connector is an instanceof SQLMetadataConnector or not.
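A minimal sketch of the suggested change, using simplified stand-ins rather than Druid's actual classes (the interfaces below are stubs for illustration only): the storage layer exposes one generic connector() accessor, and the SQL-specific manager downcasts after an instanceof check.

```java
// Stand-ins for Druid's interfaces; NOT the real classes.
interface MetadataStorageConnector
{
}

interface SQLMetadataConnector extends MetadataStorageConnector
{
  String dbi(); // stand-in for getDBI()
}

interface CatalogStorage
{
  // Replaces isSql() + sqlConnector(): callers get the generic connector.
  MetadataStorageConnector connector();
}

class SQLCatalogManager
{
  private final SQLMetadataConnector connector;

  SQLCatalogManager(CatalogStorage storage)
  {
    MetadataStorageConnector connector = storage.connector();
    if (!(connector instanceof SQLMetadataConnector)) {
      // Fail fast: this manager only knows how to work with SQL stores.
      throw new IllegalStateException("Catalog requires a SQL metadata store");
    }
    this.connector = (SQLMetadataConnector) connector;
  }

  SQLMetadataConnector sqlConnector()
  {
    return connector;
  }
}
```

This keeps the generic interface free of SQL-specific methods while confining the downcast to the one place that genuinely requires a SQL store.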
You are probably right. This is a work in progress: I have not yet set up a way to test with one of the non-SQL storage extensions. The DB integration is rather a hack: extensions can't really depend on one another, so the extension-DB code that creates tables (etc.) can't know about the catalog (yet), which is why that code lives here. Since it lives here, we have to know that we are, in fact, dealing with a SQL DB, so the code can fail fast if the storage is something else (this extension doesn't know how to create the table-like structure that a non-SQL DB would need).
Druid is in dire need of a more general DB storage API (as well as a mechanism for DB schema evolution.)
In the same vein, I think the getDBI method on the SQLMetadataConnector interface feels a bit like a leak. But given the lack of choices, I'm OK with the current implementation in general.
A small styling suggestion: use MetadataStorage instead of Metastore to adhere to the current naming convention.
A more general issue is how we should handle metadata storage for extensions. The catalog is an extension for now. Once it is merged into the core Druid, the DB access stuff can be distributed across the SQL extensions and other extensions.
For an extension, however, the storage has to live in the extension itself. It is hard to see how, say, the catalog extension would work with, say, a Mongo DB storage extension: either that extension has to know about the catalog extension, or vice versa.
Renamed class and combined the interface and implementation since it turned out there was only one implementation.
paul-rogers
left a comment
Thank you @rohangarg and @FrankChen021 for your reviews. I've either updated the code or provided a response to each of your comments.
core/src/main/java/org/apache/druid/data/input/InputFormat.java
core/src/main/java/org/apache/druid/java/util/common/jackson/JacksonUtils.java
...core/druid-catalog/src/main/java/org/apache/druid/catalog/storage/sql/SQLCatalogManager.java
...core/druid-catalog/src/main/java/org/apache/druid/catalog/storage/sql/SQLCatalogManager.java
register(new SchemaDefnImpl(
    TableId.LOOKUP_SCHEMA,
    ResourceType.CONFIG,
    null // TODO
It means that this early draft does not yet include lookup tables and so there are no valid table types for the lookup schema. Ideally, that will change as the project progresses so that lookup tables can have entries in the catalog.
register(new SchemaDefnImpl(
    TableId.VIEW_SCHEMA,
    ResourceType.VIEW,
    null // TODO
Druid currently offers multiple view systems: one via an extension, another two available from Imply. There may be more. Eventually, it will make sense to define views in the catalog, just as is done in most DB systems. The TODO here is a reminder that that work is pending once we get the basics working.
public class ClusterKeySpec
{
  private final String expr;
  private final boolean desc;
My understanding is that clustering is sorting and sorting is either ascending or descending. (The Druid implementation seems to always use nulls-first sorting.) See SortColumn in MSQ. In fact, maybe this should just reuse SortColumn, now that I know that class exists...
Can you give me a hint as to what other options we might offer in the future? That way, we can figure out how to encode them.
I believe @clintropolis is working on some cool new indexing ideas. If that results in more than one index choice per column (i.e. front-encoding or not), then we can add properties to columns for per-column choices, or to the table for cross-column choices. (The reason that clustering is a table property is that it includes more than one column.)
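To illustrate the "clustering is sorting" point, here is a sketch (not Druid's actual implementation; the toComparator method is invented for illustration) of how a ClusterKeySpec's expr/desc pair could map to an ascending or descending sort that always places nulls first:

```java
import java.util.Comparator;

// Illustrative sketch only: a cluster key is just a sort key, so
// expr/desc maps to an ascending or descending comparator, with nulls
// first in both directions (matching the behavior described for MSQ).
class ClusterKeySpec
{
  private final String expr;   // column name or expression to sort on
  private final boolean desc;  // true for descending order

  ClusterKeySpec(String expr, boolean desc)
  {
    this.expr = expr;
    this.desc = desc;
  }

  Comparator<String> toComparator()
  {
    Comparator<String> base = desc ? Comparator.reverseOrder() : Comparator.naturalOrder();
    // The Druid implementation appears to always sort nulls first,
    // regardless of direction.
    return Comparator.nullsFirst(base);
  }
}
```

If more per-column choices arrive later (e.g. index type), they would become additional properties rather than changes to this two-field sort spec.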
...-core/druid-catalog/src/main/java/org/apache/druid/catalog/http/CatalogListenerResource.java
...-core/druid-catalog/src/main/java/org/apache/druid/catalog/http/CatalogListenerResource.java
}
TableSpec spec = tableSpec.spec();
if (CatalogUpdateNotifier.TOMBSTONE_TABLE_TYPE.equals(spec.type())) {
  listener.deleted(tableSpec.id());
Why not have 2 different endpoints; one for update, one for delete?
Or three: one for add, another for update, and another for delete. Then, replicate that for other catalog objects: connections, etc. Pretty soon one has a big API to maintain. Since the goal here is to simply shift delta messages, we use the "tombstone encoding" to merge the delete and the add/update cases.
The code right now is specific to tables because that's all this version of the catalog includes. Still, due to the rolling-upgrade forward/backward compatibility issue, perhaps this needs to change. Perhaps another approach is to wrap the table in an event message:
{
"objectType": "table",
"action": "delete",
"table": {
"schema": "druid",
"name": "doomed"
}
}

A receiver would ignore object types it doesn't understand (it would mean the coordinator is newer than the broker). I'll play with that a bit to see if that will work.
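A hedged sketch of how a receiver might process such an envelope; CatalogEvent and CatalogEventReceiver below are hypothetical names for illustration, not classes from this PR:

```java
// Hypothetical event envelope and receiver: the receiver ignores object
// types and actions it does not understand, which gives forward
// compatibility during rolling upgrades when the coordinator is newer
// than the broker.
class CatalogEvent
{
  final String objectType;
  final String action;
  final String schema;
  final String name;

  CatalogEvent(String objectType, String action, String schema, String name)
  {
    this.objectType = objectType;
    this.action = action;
    this.schema = schema;
    this.name = name;
  }
}

class CatalogEventReceiver
{
  // Returns true if the event was handled, false if it was ignored.
  boolean handle(CatalogEvent event)
  {
    if (!"table".equals(event.objectType)) {
      return false; // unknown object type: ignore it
    }
    if ("delete".equals(event.action)) {
      // Remove schema.name from the local catalog cache here.
      return true;
    }
    if ("add".equals(event.action) || "update".equals(event.action)) {
      // Upsert schema.name into the local catalog cache here.
      return true;
    }
    return false; // unknown action: also ignore
  }
}
```

The envelope keeps the wire API to a single endpoint while still letting new object types and actions be added without breaking older receivers.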
...-core/druid-catalog/src/main/java/org/apache/druid/catalog/http/CatalogListenerResource.java
extensions-core/druid-catalog/src/main/java/org/apache/druid/catalog/http/CatalogResource.java
  }
}

private Response authorizeTable(TableId tableId, TableSpec spec, final HttpServletRequest req)
Why not use resourceFilter?
A resource filter is handy when the resource pattern is known and multiple endpoints follow the same pattern. At present, the endpoints here follow different patterns. This code is more like the Broker (have to root around in the SQL to find the resources) than, say, tasks, where it is obvious that the task ID is to be authorized, and there are multiple task endpoints.
The filter is a crude tool. Another form of reuse is the subroutine, which is the technique used here.
That said, if the redesign of the REST API results in multiple messages, then we'll consider moving the auth code to a filter if that makes the code simpler.
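The subroutine style of reuse described above might look like this sketch (hypothetical names; not the actual CatalogResource code): endpoints with different resource patterns each compute their own resource, then delegate to one shared authorization helper.

```java
// Sketch only: two endpoints share one authorization subroutine instead
// of a common resource filter.
class TableId
{
  final String schema;
  final String name;

  TableId(String schema, String name)
  {
    this.schema = schema;
    this.name = name;
  }
}

class CatalogResourceSketch
{
  // Shared helper: stands in for authorizeTable(tableId, spec, req).
  private boolean authorizeTable(TableId id, String user)
  {
    // Real code would consult Druid's authorization machinery; here we
    // simply deny a placeholder "anonymous" user.
    return !"anonymous".equals(user);
  }

  int updateTable(TableId id, String user)
  {
    return authorizeTable(id, user) ? 200 : 403;
  }

  int readTable(TableId id, String user)
  {
    return authorizeTable(id, user) ? 200 : 403;
  }
}
```

A resource filter would force all endpoints into one resource pattern; the helper lets each endpoint root around for its own resources first, as the Broker does.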
server/src/main/java/org/apache/druid/catalog/model/TableDefnRegistry.java
server/src/main/java/org/apache/druid/catalog/model/TableMetadata.java
server/src/main/java/org/apache/druid/catalog/model/TableSpec.java
This pull request introduces 1 alert when merging 5de07ce into c875f4b (view on LGTM.com).
server/src/main/java/org/apache/druid/catalog/model/table/AbstractDatasourceDefn.java
server/src/main/java/org/apache/druid/catalog/model/table/AbstractDatasourceDefn.java
server/src/main/java/org/apache/druid/catalog/model/table/ClusterKeySpec.java
server/src/main/java/org/apache/druid/catalog/model/table/ExternalTableDefn.java
server/src/main/java/org/apache/druid/catalog/model/table/ExternalTableDefn.java
server/src/main/java/org/apache/druid/catalog/model/table/ExternalTableDefn.java
server/src/main/java/org/apache/druid/catalog/model/table/HttpTableDefn.java
server/src/main/java/org/apache/druid/catalog/model/table/HttpTableDefn.java
server/src/main/java/org/apache/druid/catalog/model/table/InputFormats.java
server/src/main/java/org/apache/druid/catalog/model/table/LocalTableDefn.java
server/src/main/java/org/apache/druid/catalog/model/table/HttpTableDefn.java
server/src/main/java/org/apache/druid/catalog/model/table/TableBuilder.java
server/src/main/java/org/apache/druid/metadata/storage/derby/DerbyMetadataStorage.java
...ns-core/druid-catalog/src/main/java/org/apache/druid/catalog/sync/CachedMetadataCatalog.java
zachjsh
left a comment
Thanks for making all the changes! Once we get the tests fixed, I think this is good to go.
The Druid catalog provides a collection of metadata "hints" about tables (datasources, input sources, views, etc.) within Druid. This PR provides the foundation: the DB and REST layer, but not yet the integration with the Calcite SQL layer. This is a much-refined version of the earlier catalog PR.
The DB layer extends what is done for other Druid metadata tables. The semantic ("business logic") layer provides the usual CRUD operations on tables. The entire design is pretty standard and follows Druid patterns. The key difference is the rather extreme lengths taken by the implementation to ensure each bit is easily testable without mocks. That means many interfaces which can be implemented in multiple ways.
Parts available in this PR include:
The catalog mechanism is split into two parts.
- A `druid-catalog` extension which stores data in the Druid metadata database.

This split exists for two reasons:

- Users of the `druid-catalog` extension now acknowledge that they are using it only for testing, not production, and at their own risk.

Functionality not in this PR, but which will appear in the next one, includes:

- SQL integration for ingestion (`INSERT` and `REPLACE`).
- SQL integration for `SELECT` queries.

This is a great opportunity for reviewers to provide guidance on the basic catalog mechanism before we start building SQL integration on top. Please see the Druid catalog issue for additional details about the goals and design of the catalog.
Update 10-17
After much debate, we decided to drop the rollup feature from the catalog. It turns out that "rollup" is a feature used for many purposes and more thought is needed to work out how users might want to use the rollup features. Compared to the Catalog proposal, the initial feature won't help with MSQ ingestion of rollup tables: users are empowered to specify on each ingestion how they want those segments to be rolled up.
A new commit pulls out the portions of the code that supported the proposed, but now dropped, `rollup` datasource type.

This PR has: