Inconsistent column names for different databases (ie. H2 & PostgreSQL) #1658

simboel · 2022-12-27T21:15:12Z

When defining a column that requires quotation the user will run into problems when changing databases (i.e. PostgreSQL for production and H2 for local tests).

Context:

Using H2 for local tests and PostgreSQL for production
Flyway as db migration tool (both used for local tests as well as in production)
There are columns that will be quoted (ie. identical to keywords)

Problem

When running the application, the quoted column names will be different depending on the target database. In H2 the column name will be changed to quoted UPPERCASE and when using PostgreSQL the column name will be changed to lowercase.

This means that the Flyway migration script will create a database that can only be read by H2 or PostgreSQL, but never both (when there is at least one column requiring quotation).

My current workaround is to iterate over all tables and columns and make sure, that no column will be quoted by Exposed. If it does, there will be an error and I manually change the column names.

Example

import org.jetbrains.exposed.dao.id.IntIdTable
import org.jetbrains.exposed.exceptions.ExposedSQLException
import org.jetbrains.exposed.sql.*
import org.jetbrains.exposed.sql.transactions.transaction
import org.springframework.boot.context.event.ApplicationReadyEvent
import org.springframework.context.event.EventListener
import org.springframework.stereotype.Component

object StarWarsFilms : IntIdTable() {
    val name = varchar("name", 50) // name is a keyword in SQL and must be quoted to be used as a column
}

@Component
class BigEntityTest() {
    @Suppress("unused")
    @EventListener(ApplicationReadyEvent::class)
    fun test() {
        transaction {
            addLogger(StdOutSqlLogger)
            // now just select all (only to get the sql SELECT statement in the logs)
            try { StarWarsFilms.selectAll().forEach {  } } catch (ex: ExposedSQLException) { }

        }
    }
}

Result on H2

SELECT STARWARSFILMS.ID, STARWARSFILMS."NAME" FROM STARWARSFILMS

Result on PostgreSQL

SELECT starwarsfilms.id, starwarsfilms."name" FROM starwarsfilms

Result

When using a flyway migration script, we will define the column name to id and name. This will result in SQL exception when using H2 as there the column "NAME" will be used instead.
This problem will arise with all columns that match any of the roughly 500 keywords.

Expected result

Exposed should never change column names from the explicit value used in the table definition (at least when quotation is required). So the SQL should look like this:

SELECT STARWARSFILMS.id, STARWARSFILMS."name" FROM STARWARSFILMS

Note that id and name are both identical to the definition in the Table.

If consistency between column names and generated sql is not undesired, Exposed should make sure, that the name will be identical to the column definition when quoting is required (ie. identical to a keyword).

The text was updated successfully, but these errors were encountered:

dzikoysk · 2023-01-15T20:18:48Z

I'm facing several similar issues trying to connect Exposed DSL against existing database scheme, unfortunately current implementation is so inconsistent in this area that I guess it won't work. I wouldn't even mind that much to quote it on my own, but:

Postgres column names are case sensitive (uppercase / lowercase) #683

simboel · 2023-01-15T20:55:27Z

@dzikoysk I don't think inconsistent is the right term.

It will try to use UPPERCASE or lowercase only depending on the used database. This shouldn't make any difference as MYOLUMN and mycolumn are the same for a database implementing the SQL standard.

When you have upper and lowercase mixed in your column name (like myColumn) it will quote it automatically for you so you will have "myColumn". Makes sense as a column name like that must be quoted for SQL databases.

But, when you have a column name identical to a keyword (like user), it will be changed to UPPERCASE or lowercase depending on database AND it will be quoted. And this is the problem described in this ticket.

So I see two possible solutions I could think of:

Drop the feature to automatically change the case of column names. So mycolumn should never be converted to MYCOLUMN.
Do not change the column name case when quotation is required.

I would side with solution 1 but this would result in a higher impact on existing applications and might not be desired. Solution 2 may be a good trade-off.

simboel · 2023-01-16T17:14:39Z

I've further looked into this and I think that this is indeed deeply linked to #683

@Tapac Do you have any opinion about the way Exposed should behave?
You can see two possible solution drafts above. As I've said, I would drop the IdentifierManagerApi.inProperCase(identity: String) and always use the identity as-is or quote if necessary. But I'm unsure about the reasoning behind this method.

I'm eager to work on a Pull Request for this problem if you want. But as we're talking about breaking changes I want to be extra careful.

So even if you decide for any of the solutions from above (or something else), I would like to think a bit about how to incorporate the breaking changes. I have a few ideas:

Just break applications and change the behavior.
Just break applications but provide an optional configuration (somewhere) that can be used to recover the previous behavior.
Do not break the application and provide a configuration option on the Table object instance as proposed in Postgres column names are case sensitive (uppercase / lowercase) #683: open class Table(name: String = "", useExplicitName: Boolean = false)
Do not break the application and provide an optional configuration (somewhere) that can be used to enable the new behavior.

I'd advise against option 1 (obviously not very elegant or user friendly) and option 3 (specify this on all tables manually is cumbersome and error-prone).
Depending on the current stability of Exposed I'd go with option 2 or 4, favoring option 4. With option 4 we could provide an easy "fix" for this problem so users just need to enable the configuration option. And we could also notify users that this configuration option will switch to true by default in n versions down the line. Of course, this is only required if we want to change the default behavior (which I think we should).

dzikoysk · 2023-01-16T17:55:06Z

For sure it cannot be solved by any kind of breaking change, so I'd personally go for opt-in option (4) to enforce explicit names for each identifier used in Exposed for tables/coulmns/indices names, even sth like IdentifierManagerApi.enforceExplicitNames = true at this point.

AlexeySoshin · 2023-02-02T10:24:42Z

@simboel
The potential change should be pretty small:

Exposed/exposed-core/src/main/kotlin/org/jetbrains/exposed/sql/statements/api/IdentifierManagerApi.kt

Lines 65 to 74 in 990bff3

    
           when { 
        
               alreadyQuoted && supportsMixedQuotedIdentifiers -> identity 
        
               alreadyQuoted && isUpperCaseQuotedIdentifiers -> identity.uppercase() 
        
               alreadyQuoted && isLowerCaseQuotedIdentifiers -> identity.lowercase() 
        
               supportsMixedIdentifiers -> identity 
        
               oracleVersion != OracleVersion.NonOracle -> identity.uppercase() 
        
               isUpperCaseIdentifiers -> identity.uppercase() 
        
               isLowerCaseIdentifiers -> identity.lowercase() 
        
               else -> identity 
        
           }

There's a keywords map available, so you could say that if the identified is in keywords, you return identity

I would say, let's give it a try and see where it breaks. The test suite should be quite comprehensive.

simboel · 2023-02-02T20:55:51Z

@AlexeySoshin I think it'll probably break every single Postgres or H2 application (possibly more that I just haven't checked yet) that is relying on the current implementation as we're facing quotation. I think we might not be able to skip backward compatibility.

bog-walk · 2023-08-22T15:16:15Z

Hi @simboel and thanks for all your thoughts on this issue.

Regarding why inProperCase() is used:

I can't testify to the exact thoughts of the original maintainer, but one reasoning is that Exposed is tightly reliant on JDBC metadata for a fair amount of under-the-hood processes. Schema validation for example, via createMissingTablesAndColumns() or statementsRequiredToActualizeScheme(), retrieves this metadata for comparison checks and requires that the identifiers cached by inProperCase() match the identifiers as they are internally stored and returned by the database.

So that we can work towards the best solution for our users, I'd appreciate any insight you have regarding some questions I've detailed below:

Essentially, I'm trying to understand the end-goal of solutions 3/4 (i.e. providing some sort of flag to enable explicit use of the user-defined name).

Let's imagine in this scenario that the issue with keywords doesn't exist. i.e. The current behavior of Exposed, regardless of database, is that it correctly identifies a keyword, leaves the identifier case unchanged, and appropriately quotes it. If that were the current situation, what would be the benefit of introducing a flag on top of that?

So the user can avoid escape-quoting every single identifier in order to ensure that desired case is retained?
- If so, would the expectation be that it is a global flag, performing a blanket change that affects all created schema and tables?
- What if the user then wants to exclude a single table from the flag?
- Should opt-in/opt-out be configurable on a table-by-table basis?
So that table objects can be seamlessly used across multiple databases?
- From my understanding, this should only be an issue for quoted identifiers. In your example, the id column doesn't throw an exception even though it has a different case in H2 from the case in the table definition, correct?
- As I'm seeing in issue 683 , MySQL might be a special case, as it will not recognize quoted identifiers unless they are wrapped in backticks.

So if the situation with quoted keywords is fixed, would there still be a need for a flag that I'm missing? If so, any example would be very much appreciated.

bog-walk · 2023-08-25T12:28:35Z

Hi @simboel . Please note that this issue will be closed by the PR over the next few days.

In the interest of continuing the discussion regarding a new feature flag that enables user-defined case sensitivity, please address any thoughts/design wishes in the related issue mentioned above (#683). Or in the dedicated YouTrack issue.

Replying here might mean that I miss your response and I'd rather avoid that if possible.
Thanks in advance!

simboel changed the title ~~Inconsistent column names for different databases (H2 & PostgreSQL)~~ Inconsistent column names for different databases (ie. H2 & PostgreSQL) Jan 1, 2023

dzikoysk mentioned this issue Mar 25, 2023

Exposed insert does not return generated id due to lack of autoIncrement() on column type dzikoysk/sqiffy#13

Closed

bog-walk self-assigned this Jul 20, 2023

bog-walk mentioned this issue Aug 24, 2023

Postgres column names are case sensitive (uppercase / lowercase) #683

Open

bog-walk linked a pull request Aug 24, 2023 that will close this issue

fix!: EXPOSED-150 Auto-quoted column names change case across databases #1841

Merged

bog-walk mentioned this issue Sep 15, 2023

fix!: EXPOSED-150 Auto-quoted column names change case across databases #1841

Merged

bog-walk closed this as completed in #1841 Sep 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent column names for different databases (ie. H2 & PostgreSQL) #1658

Inconsistent column names for different databases (ie. H2 & PostgreSQL) #1658

simboel commented Dec 27, 2022 •

edited

Loading

dzikoysk commented Jan 15, 2023

simboel commented Jan 15, 2023 •

edited

Loading

simboel commented Jan 16, 2023

dzikoysk commented Jan 16, 2023

AlexeySoshin commented Feb 2, 2023 •

edited

Loading

simboel commented Feb 2, 2023

bog-walk commented Aug 22, 2023

bog-walk commented Aug 25, 2023

Inconsistent column names for different databases (ie. H2 & PostgreSQL) #1658

Inconsistent column names for different databases (ie. H2 & PostgreSQL) #1658

Comments

simboel commented Dec 27, 2022 • edited Loading

Problem

Example

Result on H2

Result on PostgreSQL

Result

Expected result

dzikoysk commented Jan 15, 2023

simboel commented Jan 15, 2023 • edited Loading

simboel commented Jan 16, 2023

dzikoysk commented Jan 16, 2023

AlexeySoshin commented Feb 2, 2023 • edited Loading

simboel commented Feb 2, 2023

bog-walk commented Aug 22, 2023

bog-walk commented Aug 25, 2023

simboel commented Dec 27, 2022 •

edited

Loading

simboel commented Jan 15, 2023 •

edited

Loading

AlexeySoshin commented Feb 2, 2023 •

edited

Loading