[SPARK-28152][SQL][2.4] Mapped ShortType to SMALLINT and FloatType to REAL for MsSqlServerDialect #25248

Closed
wants to merge 1 commit into from

Commits on Jul 24, 2019

  1. [SPARK-28152][SQL] Mapped ShortType to SMALLINT and FloatType to REAL for MsSqlServerDialect
    
    ## What changes were proposed in this pull request?
    This PR corrects the type mappings in `MsSqlServerDialect`: `ShortType` is now mapped to `SMALLINT` and `FloatType` to `REAL`, following the [JDBC mapping](https://docs.microsoft.com/en-us/sql/connect/jdbc/using-basic-data-types?view=sql-server-2017).
    
    `ShortType` and `FloatType` are not mapped to the right JDBC types when using the JDBC connector. As a result, tables and Spark DataFrames are created with unintended types. The issue was observed when validating against SQL Server.
    
    Refer to the [JDBC mapping](https://docs.microsoft.com/en-us/sql/connect/jdbc/using-basic-data-types?view=sql-server-2017) for guidance on mappings between SQL Server, JDBC, and Java. Note that the Java `Short` type should be mapped to JDBC `SMALLINT` and the Java `Float` type should be mapped to JDBC `REAL`.
    
    Some examples of issues caused by the wrong mappings:
        - Writing a DataFrame with a `ShortType` column produces a SQL table whose column type is INTEGER instead of SMALLINT, so the table is larger than expected.
        - Reading produces a DataFrame column of type INTEGER instead of `ShortType`.
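To put a rough number on "larger than expected": SQL Server stores SMALLINT in 2 bytes and INT in 4 bytes. The sketch below estimates the extra raw column storage only. It ignores row overhead, indexes, and compression, and the helper name `extra_bytes` is hypothetical.

```python
import struct

# Fixed-width sizes of 16-bit and 32-bit signed integers,
# matching SQL Server's SMALLINT (2 bytes) and INT (4 bytes).
SMALLINT_BYTES = struct.calcsize("<h")
INTEGER_BYTES = struct.calcsize("<i")


def extra_bytes(rows, short_columns=1):
    """Estimate extra raw storage when short columns are written as INT.

    Hypothetical helper: counts only the fixed-width column data.
    """
    return rows * short_columns * (INTEGER_BYTES - SMALLINT_BYTES)


if __name__ == "__main__":
    # One ShortType column across one million rows.
    print(extra_bytes(1_000_000))
```

Under these assumptions, each `ShortType` column written as INT costs two extra bytes per row.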
    
    - `ShortType` has a problem in both the write and read paths.
    - `FloatType` only has an issue in the read path. In the write path, the Spark data type `FloatType` is correctly mapped to the JDBC equivalent data type `REAL`. In the read path, however, when JDBC data types are converted to Catalyst data types (`getCatalystType`), `REAL` is incorrectly mapped to `DoubleType` rather than `FloatType`.
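The write and read behavior described above can be modeled as two lookup tables. This is a plain-Python illustration, not Spark's Scala dialect code: the table names and `round_trip` are hypothetical, the "before" entries are taken from this description, and reading INTEGER back as `IntegerType` is an assumption. The numeric codes are the constants from `java.sql.Types`.

```python
# JDBC type codes as defined in java.sql.Types.
INTEGER, SMALLINT, REAL = 4, 5, 7

# Behavior before the fix, as described in this PR.
# Write path: Catalyst type name -> (SQL column type, JDBC type code).
BEFORE_WRITE = {
    "ShortType": ("INTEGER", INTEGER),
    "FloatType": ("REAL", REAL),
}
# Read path: JDBC type code -> Catalyst type name.
# Assumption: an INTEGER column is read back as IntegerType.
BEFORE_READ = {
    INTEGER: "IntegerType",
    SMALLINT: "IntegerType",
    REAL: "DoubleType",
}

# Behavior after the fix: only the entries named in this PR change.
AFTER_WRITE = {**BEFORE_WRITE, "ShortType": ("SMALLINT", SMALLINT)}
AFTER_READ = {**BEFORE_READ, SMALLINT: "ShortType", REAL: "FloatType"}


def round_trip(catalyst_type, write_map, read_map):
    """Write a column with write_map, then read it back with read_map."""
    _, jdbc_code = write_map[catalyst_type]
    return read_map[jdbc_code]


if __name__ == "__main__":
    for name in ("ShortType", "FloatType"):
        before = round_trip(name, BEFORE_WRITE, BEFORE_READ)
        after = round_trip(name, AFTER_WRITE, AFTER_READ)
        print(name, "before:", before, "after:", after)
```

In this model a type survives a write followed by a read only when both tables agree, which is why `ShortType` needs changes on both paths while `FloatType` needs one only on the read path.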
    
    Refer to apache#28151, which contained this fix as one part of a larger PR. Following the discussion on apache#28151, it was decided to file separate PRs for each of the fixes.
    
    ## How was this patch tested?
    Unit tests were added to JDBCSuite.scala and pass.
    The integration test in MsSqlServerDialect.scala was updated and passes.
    End-to-end testing was done against SQL Server.
    
    Closes apache#25146 from shivsood/float_short_type_fix.
    
    Authored-by: shivsood <shivsood@microsoft.com>
    Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
    shivsood committed Jul 24, 2019
    a3020b2