
NIFI-12828 Add mapping for BIT SQL Type in DataTypeUtils #8445

Closed
wants to merge 2 commits into from


ravinarayansingh
Contributor

Summary

NIFI-12828

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

@ravinarayansingh ravinarayansingh changed the title NIFI-12828 Add mapping for BIT SQL Type NIFI-12828 Add mapping for BIT SQL Type in DataTypeUtils Feb 22, 2024
@Lehel44
Contributor

Lehel44 commented Feb 27, 2024

@ravinarayansingh Thanks for making this change. However, I'd like to raise a concern about its scope, as it only addresses part of the issue. For historical reasons, the PostgreSQL JDBC driver reports BOOLEAN columns as java.sql.Types.BIT. If we map java.sql.Types.BIT to RecordFieldType.BOOLEAN in DataTypeUtils::getDataTypeFromSQLTypeValue, a column declared as BIT(n) with n > 1 will also get a Boolean schema type, even though its values cannot be represented as a boolean. The JDBC driver does not convert such values to booleans, so the schema type would be invalid for them. @mattyb149, do you have any thoughts on this?

@Lehel44
Contributor

Lehel44 commented Feb 27, 2024

PostgreSQL does not recommend storing boolean values in BIT(1) columns because of the overhead this introduces, so we should not special-case BIT(1) as Boolean and map everything else to Integer either. I think all BIT(n) and BIT VARYING(n) columns can simply be mapped to Integer. The consequence is that when the JDBC driver reports a BOOLEAN column as BIT, we cannot tell whether the column was originally BIT or BOOLEAN, so both have to be mapped to Integer.
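A minimal sketch of the ambiguity described above. It assumes, as this thread reports for PostgreSQL, that a bool column and a bit(n) column both arrive with the code java.sql.Types.BIT (-7). `TypeCodeOnlyMapping` and `FieldType` are hypothetical stand-ins (not NiFi classes) for a mapping that, like `getDataTypeFromSQLTypeValue(int)`, sees only the type code.

```java
import java.sql.Types;

// Hypothetical sketch: FieldType stands in for NiFi's RecordFieldType.
public class TypeCodeOnlyMapping {

    enum FieldType { BOOLEAN, INT }

    // Like getDataTypeFromSQLTypeValue(int), this sees only the java.sql.Types code.
    static FieldType mapByTypeCode(int sqlType) {
        switch (sqlType) {
            case Types.BOOLEAN:
                return FieldType.BOOLEAN;
            case Types.BIT:
                // bool, bit(1) and bit(n > 1) columns all arrive here, so the
                // mapping must choose a type that can hold every one of them
                return FieldType.INT;
            default:
                throw new IllegalArgumentException("not covered by this sketch: " + sqlType);
        }
    }

    public static void main(String[] args) {
        // Assumption taken from this thread: both columns are reported as Types.BIT
        int reportedForBoolColumn = Types.BIT;
        int reportedForBit3Column = Types.BIT;
        // The two inputs are identical, so the outputs must be identical too
        System.out.println(mapByTypeCode(reportedForBoolColumn));
        System.out.println(mapByTypeCode(reportedForBit3Column));
    }
}
```

Because the function receives only an int, no implementation of it can return BOOLEAN for the bool column and INT for the bit(3) column; telling them apart needs extra metadata such as the column's type name.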

@@ -1918,6 +1918,7 @@ public static DataType getDataTypeFromSQLTypeValue(final int sqlType) {
         case Types.BIGINT:
             return RecordFieldType.BIGINT.getDataType();
         case Types.BOOLEAN:
+        case Types.BIT:
Contributor


The mapping for the BIT datatype should be changed to Integer. The BIT datatype covers not only BIT(1) but also BIT(n) with n > 1, and the SQL type code alone cannot distinguish between them. Integer is therefore a more suitable and consistent mapping.
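To illustrate why Integer can hold a BIT(n) value while Boolean cannot, here is a small sketch that assumes the value is available as a string of 0/1 characters (an assumption; this thread does not say how the driver returns BIT(n) values). `BitStringToInt` and its methods are hypothetical names.

```java
// Hypothetical helper, not part of NiFi: reads a BIT(n) value given as a bit string like "101".
public class BitStringToInt {

    // Only a single bit has a natural true/false reading.
    static boolean isBooleanCompatible(String bits) {
        return bits.length() == 1;
    }

    // Any bit string of 1 to 31 characters fits in a non-negative Java int.
    static int toInt(String bits) {
        if (!bits.matches("[01]{1,31}")) {
            throw new IllegalArgumentException("expected 1 to 31 characters of 0/1: " + bits);
        }
        return Integer.parseInt(bits, 2);
    }

    public static void main(String[] args) {
        for (String bits : new String[] {"1", "10", "101"}) {
            System.out.println(bits + " -> int " + toInt(bits)
                    + ", boolean-compatible: " + isBooleanCompatible(bits));
        }
    }
}
```

Values longer than 31 bits would not fit in an int, so this only illustrates the short BIT(n) case.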

Contributor


Agreed

Contributor Author


Hi @Lehel44 and @mattyb149,

Have a look at the sample table definition below (PostgreSQL 14):

CREATE TABLE test_table (
    id int4 NOT NULL,
    active bool NULL,
    col_bit bit(1) NULL,
    col_bit1 bit(2) NULL,
    col_varbit varbit(2) NULL,
    CONSTRAINT temp_market_cap_pkey PRIMARY KEY (id)
);

We could use something like the following:

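// DataTypeUtils.java: keep BOOLEAN as Boolean, map BIT to Integer
case Types.BOOLEAN:
    return RecordFieldType.BOOLEAN.getDataType();
case Types.BIT:
    return RecordFieldType.INT.getDataType();

// When reading column metadata (columnResultSet holds rows such as those from DatabaseMetaData.getColumns()):
// the PostgreSQL driver reports "bool" columns as Types.BIT, so restore Types.BOOLEAN from TYPE_NAME.
final String typeName = columnResultSet.getString("TYPE_NAME");
final int dataType;
if ("bool".equalsIgnoreCase(typeName)) { // null-safe comparison
    dataType = Types.BOOLEAN; // same value as the literal 16
} else {
    dataType = columnResultSet.getInt("DATA_TYPE");
}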

The result would then be:

colName     postgresType  javaSqlType  javaSqlTypeName  finalType
id          int           4            int4             INT
active      bool          -7           bool             BOOLEAN
col_bit     bit(1)        -7           bit              INT
col_bit1    bit(2)        -7           bit              INT
col_varbit  varbit(2)     1111         varbit           STRING

Let me know your thoughts.
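The proposal above can be sketched as a self-contained two-step resolution: first override the reported code with Types.BOOLEAN when TYPE_NAME is bool, then map the code. `PostgresTypeResolution` and `FieldType` are hypothetical stand-ins for the NiFi classes, and the STRING fallback for other codes is an assumption chosen to match the varbit row of the table.

```java
import java.sql.Types;

// Hypothetical sketch: FieldType stands in for RecordFieldType.
public class PostgresTypeResolution {

    enum FieldType { BOOLEAN, INT, STRING }

    // Step 1: the PostgreSQL driver reports bool as Types.BIT, so restore BOOLEAN from TYPE_NAME
    static int effectiveSqlType(String typeName, int reportedDataType) {
        return "bool".equalsIgnoreCase(typeName) ? Types.BOOLEAN : reportedDataType;
    }

    // Step 2: map the type code; BIT now only reaches here for real bit(n) columns
    static FieldType mapSqlType(int sqlType) {
        switch (sqlType) {
            case Types.INTEGER:
            case Types.BIT:
                return FieldType.INT;
            case Types.BOOLEAN:
                return FieldType.BOOLEAN;
            default:
                // assumption: other codes (e.g. Types.OTHER = 1111 for varbit) fall back to STRING
                return FieldType.STRING;
        }
    }

    static FieldType resolve(String typeName, int reportedDataType) {
        return mapSqlType(effectiveSqlType(typeName, reportedDataType));
    }

    public static void main(String[] args) {
        String[] typeNames = {"int4", "bool", "bit", "varbit"};
        int[] reported = {Types.INTEGER, Types.BIT, Types.BIT, Types.OTHER};
        for (int i = 0; i < typeNames.length; i++) {
            System.out.println(typeNames[i] + " -> " + resolve(typeNames[i], reported[i]));
        }
    }
}
```

Running `main` prints `int4 -> INT`, `bool -> BOOLEAN`, `bit -> INT` and `varbit -> STRING`, matching the finalType column above. Writing `"bool".equalsIgnoreCase(typeName)` rather than `typeName.equalsIgnoreCase("bool")` also avoids a NullPointerException if TYPE_NAME is null.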

Contributor


@ravinarayansingh I tried it and it looks good to me!

@mattyb149
Contributor

+1 LGTM, merging to support/nifi-1.x and main. In the future, please include the Jira case in the actual commit message(s); I will add them on merge. Thanks for the fix!

@mattyb149 mattyb149 closed this in 8346bd7 Feb 29, 2024
3 participants