NIFI-12382: Add DatabaseSchemaRegistry service#8042
NIFI-12382: Add DatabaseSchemaRegistry service#8042mattyb149 wants to merge 4 commits intoapache:mainfrom
Conversation
exceptionfactory
left a comment
There was a problem hiding this comment.
Thanks for proposing this new feature @mattyb149.
The concept of a database-backed Schema Registry makes sense, but this initial implementation appears to be very specific in terms of table definition, and does not include any documentation on those expectations. With that background, it seems like this could make supporting such a component potentially difficult with concerns around the potential need for schema migration.
For these reasons, it seems like a generalized implementation needs to more configuration options and documentation. Even then, some table definition decisions raise other questions, like IS_NULLABLE as a string versus a tiny integer or something similar. Perhaps it would be helpful to address some of these design considerations before going further.
|
I'll add more documentation around the use case and what this service does, but I didn't see any use for a DB-backed registry that just stores schema text (such as a String column containing an Avro schema) since we have other implementations that basically do that. I can rename the service DatabaseTableSchemaRegistry if that helps make it clearer what this service is for, but the idea is to generate the schema from metadata rather than store and retrieve schemas. This can be used by things like ValidateRecord in order to ensure that records will successfully go into a database with a downstream PutDatabaseRecord for example. |
8ab5815 to
334aa8f
Compare
|
Thanks for the reply @mattyb149. Renaming the class could be helpful. Another potential concern is using the schema name as the table name. Aside from schema naming conventions potentially conflicting with database table naming requirements, this would also require a table for every schema, which seems like it could be difficult to maintain. Although this could make sense in some specific environments, it seems like this might not be a good fit for inclusion as a component for general usage. |
|
I'm jumping in the conversation here but I don't think this is a very specific use case. The idea is that, in many cases, data would be sent into a database by NiFi and some users may not want to deal with schemas in NiFi or introduce an external Schema Registry. The idea here is to be able to retrieve the schema of the destination table in the NiFi flow so that it can be leveraged instead of the Infer Schema option which comes with the limitations and risks that we know. |
|
Thanks for the additional input @pvillard31, that is helpful, and makes more sense now. I realize that I was not evaluating all the implementation details. Now I see that the implementation uses the Database Metadata results, and that is what defines the information. Sorry for missing that detail earlier @mattyb149. With that in mind, I can now see how this makes more sense as a way to use database table metadata for the schema definition. Perhaps including With that background, are there any concerns about JDBC driver support across vendors? I would expect that more popular database vendors should work with this approach, but some may not, so that may be worth highlighting under additional details. |
exceptionfactory
left a comment
There was a problem hiding this comment.
Thanks for adding the capability description @mattyb149, that definitely helps clarify how things work now. I noted a couple more detail recommendations.
...dles/nifi-standard-services/nifi-dbcp-service-bundle/nifi-db-schema-registry-service/pom.xml
Outdated
Show resolved
Hide resolved
...dles/nifi-standard-services/nifi-dbcp-service-bundle/nifi-db-schema-registry-service/pom.xml
Outdated
Show resolved
Hide resolved
...registry-service/src/main/java/org/apache/nifi/db/schemaregistry/DatabaseSchemaRegistry.java
Outdated
Show resolved
Hide resolved
|
|
||
| // If no columns are found, check that the table exists | ||
| if (recordFields.isEmpty()) { | ||
| try (final ResultSet tblrs = dmd.getTables(dbCatalogName, dbSchemaName, tableName, null)) { |
There was a problem hiding this comment.
Recommend adjusting the variable name for readability:
| try (final ResultSet tblrs = dmd.getTables(dbCatalogName, dbSchemaName, tableName, null)) { | |
| try (final ResultSet tablesResultSet = dmd.getTables(dbCatalogName, dbSchemaName, tableName, null)) { |
...registry-service/src/main/java/org/apache/nifi/db/schemaregistry/DatabaseSchemaRegistry.java
Outdated
Show resolved
Hide resolved
334aa8f to
49bc2e7
Compare
exceptionfactory
left a comment
There was a problem hiding this comment.
Thanks for making the code adjustments @mattyb149. The latest version works well in runtime testing with MySQL.
I noted a couple minor convention recommendations. The only concern remaining is the location of the class itself. Although it is convenient to have it with the Connection Pools, it seems out of place. As mentioned in the detail comments, moving it to a separate NAR seems like a better long-term solution, with minimal impact on the size. Thoughts?
| } | ||
|
|
||
| private void checkTableExists(final DatabaseMetaData databaseMetaData, final String tableName) throws SchemaNotFoundException, SQLException { | ||
| try (final ResultSet tblrs = databaseMetaData.getTables(dbCatalogName, dbSchemaName, tableName, null)) { |
There was a problem hiding this comment.
Recommend renaming this variable for readability.
| try (final ResultSet tblrs = databaseMetaData.getTables(dbCatalogName, dbSchemaName, tableName, null)) { | |
| try (final ResultSet tables = databaseMetaData.getTables(dbCatalogName, dbSchemaName, tableName, null)) { |
| <modules> | ||
| <module>nifi-dbcp-service</module> | ||
| <module>nifi-hikari-dbcp-service</module> | ||
| <module>nifi-db-schema-registry-service</module> |
There was a problem hiding this comment.
Although this uses Database Connection Pools, it does not seem like it fits in this bundle and NAR. As it relies only on standard service interfaces, it should be a small addition as a separate NAR. What do you think?
| } | ||
|
|
||
| @AfterAll | ||
| public static void shutdownDatabase() throws Exception { |
There was a problem hiding this comment.
The Exception is not thrown and can be removed:
| public static void shutdownDatabase() throws Exception { | |
| public static void shutdownDatabase() { |
| assertThrows(SchemaNotFoundException.class, () -> dbSchemaRegistry.retrieveSchema(schemaIdentifier)); | ||
| } | ||
|
|
||
| private static class DBCPServiceSimpleImpl extends AbstractControllerService implements DBCPService { |
There was a problem hiding this comment.
Recommend avoiding the use of Impl on class names, even for testing.
| private static class DBCPServiceSimpleImpl extends AbstractControllerService implements DBCPService { | |
| private static class SimpleDBCPService extends AbstractControllerService implements DBCPService { |
| @Override | ||
| public Connection getConnection() throws ProcessException { | ||
| try { | ||
| Class.forName("org.apache.derby.jdbc.EmbeddedDriver"); |
There was a problem hiding this comment.
Recommend declaring this class name once and reusing throughout the test class.
49bc2e7 to
fea9178
Compare
exceptionfactory
left a comment
There was a problem hiding this comment.
Thanks for restructuring the bundle @mattyb149, the latest version looks good! +1 merging
- Added nifi-db-schema-registry-service-nar This closes #8042 Signed-off-by: David Handermann <exceptionfactory@apache.org> (cherry picked from commit 99c843f)
Summary
NIFI-12382 This PR adds a DatabaseSchemaRegistry controller service which gets the metadata for a specified database table and creates a RecordSchema from it.
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000NIFI-00000Pull Request Formatting
mainbranchVerification
Please indicate the verification steps performed prior to pull request creation.
Build
mvn clean install -P contrib-checkLicensing
LICENSEandNOTICEfilesDocumentation