
Commit 2c1bc55

Handle literal backtick in column name by doubling per SQL grammar
Databricks supports literal backticks in column names via the BACKQUOTED_IDENTIFIER lexer rule ``(``|~`)*`` — a backtick inside a quoted identifier is escaped by doubling it. Empirically verified via raw SQL: `CREATE TABLE ... (`a``b` STRING)` and `INSERT INTO ... VALUES (:`a``b`)` with dict key `a`b` (single backtick) work end-to-end. The Databricks docs give the example `DESCRIBE SELECT 5 AS `a``b``.

The dialect had two bugs in this path:

1. DatabricksIdentifierPreparer inherited SQLAlchemy's default escape_quote of `"` (double quote), not the backtick. DDL for a column named `a`b` therefore rendered as ``a`b`` (3 backticks, unparseable) instead of ``a``b`` (4 backticks), and CREATE TABLE failed with PARSE_SYNTAX_ERROR. Pre-existing bug — it never worked on main.

2. The bind template `:`%(name)s`` used straight string substitution with no escape awareness. For a name containing a literal backtick, the rendered marker had an unescaped inner backtick and the parser failed to match.

Fix:

- The preparer passes escape_quote="`" explicitly so SQLAlchemy doubles backticks inside quoted identifiers; SQLAlchemy infers escape_to_quote="``" from escape_quote.

- The statement compiler overrides bindparam_string to double backticks in the name before the template wraps it. escaped_from is NOT set, so the params dict key stays the single-backtick original — the server collapses the doubled form back to a single backtick when it parses the marker name. The override is skipped for post_compile and pre-escaped paths, matching super's contract.

Adds a unit test and extends the empirical audit with a round-trip case (CREATE -> INSERT -> SELECT WHERE -> DROP) on a column named `a`b`. The audit now passes 41/41 end-to-end; unit tests: 258.

Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
1 parent 9414d89 commit 2c1bc55
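The escaping rule the commit message describes can be sketched minimally. This is an illustrative stand-in, not the dialect's actual code: a literal backtick inside the name is doubled, then the whole name is wrapped in backticks, per the BACKQUOTED_IDENTIFIER rule.

```python
# Illustrative sketch of backquoted-identifier escaping (not the
# dialect's real code): double any literal backtick, then wrap the
# whole name in backticks.
def quote_backtick_identifier(name: str) -> str:
    return "`" + name.replace("`", "``") + "`"

print(quote_backtick_identifier("a`b"))  # -> `a``b`
print(quote_backtick_identifier("col"))  # -> `col`
```

Note how `a`b` (one inner backtick) becomes `a``b` (doubled inner backtick), which is exactly the 4-backtick form the parser accepts.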

2 files changed: 44 additions & 1 deletion

src/databricks/sqlalchemy/_ddl.py (23 additions & 1 deletion)
@@ -11,7 +11,12 @@ class DatabricksIdentifierPreparer(compiler.IdentifierPreparer):
     legal_characters = re.compile(r"^[A-Z0-9_]+$", re.I)
 
     def __init__(self, dialect):
-        super().__init__(dialect, initial_quote="`")
+        # ``escape_quote`` must match ``initial_quote`` so a literal
+        # backtick inside a quoted identifier is doubled (``a``b`` —
+        # per the ``BACKQUOTED_IDENTIFIER`` lexer rule in Spark SQL).
+        # The default from SQLAlchemy is ``"`` which would escape the
+        # wrong character, producing invalid DDL like ``a`b``.
+        super().__init__(dialect, initial_quote="`", escape_quote="`")
 
 
 class DatabricksDDLCompiler(compiler.DDLCompiler):
@@ -127,6 +132,23 @@ class DatabricksStatementCompiler(compiler.SQLCompiler):
         lambda self: self._BIND_TEMPLATE, lambda self, _: None
     )
 
+    def bindparam_string(self, name, **kw):
+        # The template ``:`%(name)s``` assumes ``name`` is safe inside
+        # backticks — any literal backtick must be doubled per the
+        # ``BACKQUOTED_IDENTIFIER`` lexer rule. The doubling affects only
+        # the rendered SQL; the params dict key sent to the driver stays
+        # the single-backtick original (the server collapses `` -> `
+        # when it parses the marker name), so we must not set
+        # ``escaped_from`` — leaving ``escaped_bind_names`` empty keeps
+        # the key translation in ``construct_params`` a no-op.
+        if (
+            "`" in name
+            and not kw.get("escaped_from")
+            and not kw.get("post_compile", False)
+        ):
+            name = name.replace("`", "``")
+        return super().bindparam_string(name, **kw)
+
     def limit_clause(self, select, **kw):
         """Identical to the default implementation of SQLCompiler.limit_clause except it writes LIMIT ALL instead of LIMIT -1,
         since Databricks SQL doesn't support the latter.

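The bind-marker half of the fix can be sketched in isolation. BIND_TEMPLATE and render_bind_marker below are hypothetical stand-ins for the compiler internals, assuming the template form `:`%(name)s`` shown in the diff; the key point is that doubling touches only the rendered SQL text, never the params dict key.

```python
# Hedged sketch of the bind-marker path (illustrative names, not the
# real SQLAlchemy API): the name is doubled only in the rendered SQL;
# the execute-time params dict keeps the original single-backtick key.
BIND_TEMPLATE = ":`%(name)s`"

def render_bind_marker(name: str) -> str:
    rendered = name.replace("`", "``")  # affects the SQL text only
    return BIND_TEMPLATE % {"name": rendered}

marker = render_bind_marker("a`b")
print(marker)  # -> :`a``b`

# The driver still receives the original key; the server collapses
# the doubled form when it parses the marker name.
params = {"a`b": "x"}
```

This is why the override must not set escaped_from: setting it would make construct_params translate the key to the doubled form, and the driver would then send a key the server's un-doubled marker name no longer matches.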
tests/test_local/test_ddl.py (21 additions & 0 deletions)
@@ -211,6 +211,27 @@ def test_leading_digit_column_is_backticked(self):
         compiled = self._compile_insert(table, {"1col": "x"})
         assert ":`1col`" in str(compiled)
 
+    def test_literal_backtick_in_column_name_is_doubled(self):
+        """A literal backtick inside a column name must be doubled in the
+        rendered SQL (both the DDL column identifier and the bind
+        marker), per the Spark SQL ``BACKQUOTED_IDENTIFIER`` lexer rule.
+        The params dict key stays the single-backtick original — the
+        server un-doubles when it parses the marker name.
+        """
+        from sqlalchemy.schema import CreateTable
+
+        metadata = MetaData()
+        table = Table("t", metadata, Column("a`b", String()))
+
+        create_sql = str(CreateTable(table).compile(bind=self.engine))
+        assert "`a``b`" in create_sql  # DDL identifier doubled
+
+        compiled = self._compile_insert(table, {"a`b": "x"})
+        assert ":`a``b`" in str(compiled)  # bind marker doubled
+        params = compiled.construct_params()
+        assert params["a`b"] == "x"  # dict key stays single-backtick
+        assert "a``b" not in params
+
     def test_many_special_characters_in_column_names(self):
         """Column names containing characters that Delta allows (hyphens,
         slashes, question marks, hash, plus, star, at, dollar, amp, pipe,
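The round-trip case the commit message says was added to the empirical audit can be laid out as data. The SQL strings below follow the forms quoted in the commit message (table name `t` is an illustrative choice); each statement carries the doubled-backtick form in its SQL text while the parameter key stays single-backtick.

```python
# Sketch of the audited round-trip (CREATE -> INSERT -> SELECT WHERE
# -> DROP) on a column literally named a`b. Table name is illustrative.
roundtrip = [
    ("CREATE TABLE t (`a``b` STRING)", None),
    ("INSERT INTO t VALUES (:`a``b`)", {"a`b": "x"}),
    ("SELECT * FROM t WHERE `a``b` = :`a``b`", {"a`b": "x"}),
    ("DROP TABLE t", None),
]

def undouble(marker_name: str) -> str:
    # What the server effectively does with a parsed marker name:
    # collapse the doubled backtick back to a single one.
    return marker_name.replace("``", "`")

print(undouble("a``b"))  # -> a`b
```

Every parameterized statement above keys its dict with `a`b` (single backtick), matching what the server recovers after un-doubling the marker name.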
