Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent column names for different databases (ie. H2 & PostgreSQL) #1658

Closed
simboel opened this issue Dec 27, 2022 · 8 comments · Fixed by #1841
Closed

Inconsistent column names for different databases (ie. H2 & PostgreSQL) #1658

simboel opened this issue Dec 27, 2022 · 8 comments · Fixed by #1841
Assignees

Comments

@simboel
Copy link
Contributor

simboel commented Dec 27, 2022

When defining a column that requires quotation the user will run into problems when changing databases (i.e. PostgreSQL for production and H2 for local tests).

Context:

  • Using H2 for local tests and PostgreSQL for production
  • Flyway as db migration tool (both used for local tests as well as in production)
  • There are columns that will be quoted (ie. identical to keywords)

Problem

When running the application, the quoted column names will be different depending on the target database. In H2 the column name will be changed to quoted UPPERCASE and when using PostgreSQL the column name will be changed to lowercase.

This means that the Flyway migration script will create a database that can only be read by H2 or PostgreSQL, but never both (when there is at least one column requiring quotation).

My current workaround is to iterate over all tables and columns and make sure, that no column will be quoted by Exposed. If it does, there will be an error and I manually change the column names.

Example

import org.jetbrains.exposed.dao.id.IntIdTable
import org.jetbrains.exposed.exceptions.ExposedSQLException
import org.jetbrains.exposed.sql.*
import org.jetbrains.exposed.sql.transactions.transaction
import org.springframework.boot.context.event.ApplicationReadyEvent
import org.springframework.context.event.EventListener
import org.springframework.stereotype.Component

object StarWarsFilms : IntIdTable() {
    val name = varchar("name", 50) // name is a keyword in SQL and must be quoted to be used as a column
}

@Component
class BigEntityTest() {
    @Suppress("unused")
    @EventListener(ApplicationReadyEvent::class)
    fun test() {
        transaction {
            addLogger(StdOutSqlLogger)
            // now just select all (only to get the sql SELECT statement in the logs)
            try { StarWarsFilms.selectAll().forEach {  } } catch (ex: ExposedSQLException) { }

        }
    }
}

Result on H2

SELECT STARWARSFILMS.ID, STARWARSFILMS."NAME" FROM STARWARSFILMS

Result on PostgreSQL

SELECT starwarsfilms.id, starwarsfilms."name" FROM starwarsfilms

Result

When using a flyway migration script, we will define the column name to id and name. This will result in SQL exception when using H2 as there the column "NAME" will be used instead.
This problem will arise with all columns that match any of the roughly 500 keywords.

Expected result

Exposed should never change column names from the explicit value used in the table definition (at least when quotation is required). So the SQL should look like this:

SELECT STARWARSFILMS.id, STARWARSFILMS."name" FROM STARWARSFILMS

Note that id and name are both identical to the definition in the Table.

If consistency between column names and generated sql is not undesired, Exposed should make sure, that the name will be identical to the column definition when quoting is required (ie. identical to a keyword).

@simboel simboel changed the title Inconsistent column names for different databases (H2 & PostgreSQL) Inconsistent column names for different databases (ie. H2 & PostgreSQL) Jan 1, 2023
@dzikoysk
Copy link
Contributor

I'm facing several similar issues trying to connect Exposed DSL against existing database scheme, unfortunately current implementation is so inconsistent in this area that I guess it won't work. I wouldn't even mind that much to quote it on my own, but:

@simboel
Copy link
Contributor Author

simboel commented Jan 15, 2023

@dzikoysk I don't think inconsistent is the right term.

It will try to use UPPERCASE or lowercase only depending on the used database. This shouldn't make any difference as MYOLUMN and mycolumn are the same for a database implementing the SQL standard.

When you have upper and lowercase mixed in your column name (like myColumn) it will quote it automatically for you so you will have "myColumn". Makes sense as a column name like that must be quoted for SQL databases.

But, when you have a column name identical to a keyword (like user), it will be changed to UPPERCASE or lowercase depending on database AND it will be quoted. And this is the problem described in this ticket.

So I see two possible solutions I could think of:

  1. Drop the feature to automatically change the case of column names. So mycolumn should never be converted to MYCOLUMN.
  2. Do not change the column name case when quotation is required.

I would side with solution 1 but this would result in a higher impact on existing applications and might not be desired. Solution 2 may be a good trade-off.

@simboel
Copy link
Contributor Author

simboel commented Jan 16, 2023

I've further looked into this and I think that this is indeed deeply linked to #683

@Tapac Do you have any opinion about the way Exposed should behave?
You can see two possible solution drafts above. As I've said, I would drop the IdentifierManagerApi.inProperCase(identity: String) and always use the identity as-is or quote if necessary. But I'm unsure about the reasoning behind this method.

I'm eager to work on a Pull Request for this problem if you want. But as we're talking about breaking changes I want to be extra careful.

So even if you decide for any of the solutions from above (or something else), I would like to think a bit about how to incorporate the breaking changes. I have a few ideas:

  1. Just break applications and change the behavior.
  2. Just break applications but provide an optional configuration (somewhere) that can be used to recover the previous behavior.
  3. Do not break the application and provide a configuration option on the Table object instance as proposed in Postgres column names are case sensitive (uppercase / lowercase) #683: open class Table(name: String = "", useExplicitName: Boolean = false)
  4. Do not break the application and provide an optional configuration (somewhere) that can be used to enable the new behavior.

I'd advise against option 1 (obviously not very elegant or user friendly) and option 3 (specify this on all tables manually is cumbersome and error-prone).
Depending on the current stability of Exposed I'd go with option 2 or 4, favoring option 4. With option 4 we could provide an easy "fix" for this problem so users just need to enable the configuration option. And we could also notify users that this configuration option will switch to true by default in n versions down the line. Of course, this is only required if we want to change the default behavior (which I think we should).

@dzikoysk
Copy link
Contributor

For sure it cannot be solved by any kind of breaking change, so I'd personally go for opt-in option (4) to enforce explicit names for each identifier used in Exposed for tables/coulmns/indices names, even sth like IdentifierManagerApi.enforceExplicitNames = true at this point.

@AlexeySoshin
Copy link
Contributor

AlexeySoshin commented Feb 2, 2023

@simboel
The potential change should be pretty small:

when {
alreadyQuoted && supportsMixedQuotedIdentifiers -> identity
alreadyQuoted && isUpperCaseQuotedIdentifiers -> identity.uppercase()
alreadyQuoted && isLowerCaseQuotedIdentifiers -> identity.lowercase()
supportsMixedIdentifiers -> identity
oracleVersion != OracleVersion.NonOracle -> identity.uppercase()
isUpperCaseIdentifiers -> identity.uppercase()
isLowerCaseIdentifiers -> identity.lowercase()
else -> identity
}

There's a keywords map available, so you could say that if the identified is in keywords, you return identity

I would say, let's give it a try and see where it breaks. The test suite should be quite comprehensive.

@simboel
Copy link
Contributor Author

simboel commented Feb 2, 2023

@AlexeySoshin I think it'll probably break every single Postgres or H2 application (possibly more that I just haven't checked yet) that is relying on the current implementation as we're facing quotation. I think we might not be able to skip backward compatibility.

@bog-walk
Copy link
Member

Hi @simboel and thanks for all your thoughts on this issue.

Regarding why inProperCase() is used:

  • I can't testify to the exact thoughts of the original maintainer, but one reasoning is that Exposed is tightly reliant on JDBC metadata for a fair amount of under-the-hood processes. Schema validation for example, via createMissingTablesAndColumns() or statementsRequiredToActualizeScheme(), retrieves this metadata for comparison checks and requires that the identifiers cached by inProperCase() match the identifiers as they are internally stored and returned by the database.

So that we can work towards the best solution for our users, I'd appreciate any insight you have regarding some questions I've detailed below:

Essentially, I'm trying to understand the end-goal of solutions 3/4 (i.e. providing some sort of flag to enable explicit use of the user-defined name).

Let's imagine in this scenario that the issue with keywords doesn't exist. i.e. The current behavior of Exposed, regardless of database, is that it correctly identifies a keyword, leaves the identifier case unchanged, and appropriately quotes it. If that were the current situation, what would be the benefit of introducing a flag on top of that?

  1. So the user can avoid escape-quoting every single identifier in order to ensure that desired case is retained?
    • If so, would the expectation be that it is a global flag, performing a blanket change that affects all created schema and tables?
    • What if the user then wants to exclude a single table from the flag?
    • Should opt-in/opt-out be configurable on a table-by-table basis?
  2. So that table objects can be seamlessly used across multiple databases?
    • From my understanding, this should only be an issue for quoted identifiers. In your example, the id column doesn't throw an exception even though it has a different case in H2 from the case in the table definition, correct?
    • As I'm seeing in issue 683 , MySQL might be a special case, as it will not recognize quoted identifiers unless they are wrapped in backticks.

So if the situation with quoted keywords is fixed, would there still be a need for a flag that I'm missing? If so, any example would be very much appreciated.

@bog-walk
Copy link
Member

Hi @simboel . Please note that this issue will be closed by the PR over the next few days.

In the interest of continuing the discussion regarding a new feature flag that enables user-defined case sensitivity, please address any thoughts/design wishes in the related issue mentioned above (#683). Or in the dedicated YouTrack issue.

Replying here might mean that I miss your response and I'd rather avoid that if possible.
Thanks in advance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants