
Commit 2c1bc55

Handle literal backtick in column name by doubling per SQL grammar
Databricks supports literal backticks in column names via the BACKQUOTED_IDENTIFIER lexer rule ``(``|~`)*`` — a backtick inside a quoted identifier is escaped by doubling it. Empirically verified via raw SQL: `CREATE TABLE ... (`a``b` STRING)` and `INSERT INTO ... VALUES (:`a``b`)` with dict key `a`b` (single backtick) work end-to-end. The Databricks docs give the example `DESCRIBE SELECT 5 AS `a``b``.

The dialect had two bugs in this path:

1. DatabricksIdentifierPreparer inherited SQLAlchemy's default escape_quote of `"` (double quote), not the backtick. DDL for a column named `a`b` therefore rendered as ``a`b`` (3 backticks, unparseable) instead of ``a``b`` (4 backticks), and CREATE TABLE failed with PARSE_SYNTAX_ERROR. Pre-existing bug — it never worked on main.

2. The bind template `:`%(name)s`` used straight string substitution with no escape awareness. For a name containing a literal backtick, the rendered marker had an unescaped inner backtick and the parser failed to match.

Fix:

- The preparer passes escape_quote="`" explicitly so SQLAlchemy doubles backticks inside quoted identifiers; SQLAlchemy infers escape_to_quote="``" from escape_quote.

- The statement compiler overrides bindparam_string to double backticks in the name before the template wraps it. escaped_from is NOT set, so the params dict key stays the single-backtick original — the server collapses the doubled form back to a single backtick when it parses the marker name. The override is skipped for post_compile and pre-escaped paths, matching super's contract.

Adds a unit test and extends the empirical audit with a round-trip case (CREATE -> INSERT -> SELECT WHERE -> DROP) on a column named `a`b`. The audit now passes 41/41 end-to-end; unit tests: 258.

Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
1 parent 9414d89 commit 2c1bc55
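The escaping rule the commit message describes can be sketched minimally. This is an illustrative stand-in, not the dialect's actual code: a literal backtick inside the name is doubled, then the whole name is wrapped in backticks, per the BACKQUOTED_IDENTIFIER rule.

```python
# Illustrative sketch of backquoted-identifier escaping (not the
# dialect's real code): double any literal backtick, then wrap the
# whole name in backticks.
def quote_backtick_identifier(name: str) -> str:
    return "`" + name.replace("`", "``") + "`"

print(quote_backtick_identifier("a`b"))  # -> `a``b`
print(quote_backtick_identifier("col"))  # -> `col`
```

Note how `a`b` (one inner backtick) becomes `a``b` (doubled inner backtick), which is exactly the 4-backtick form the parser accepts.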

2 files changed: 44 additions & 1 deletion

src/databricks/sqlalchemy/_ddl.py (23 additions & 1 deletion)
@@ -11,7 +11,12 @@ class DatabricksIdentifierPreparer(compiler.IdentifierPreparer):
     legal_characters = re.compile(r"^[A-Z0-9_]+$", re.I)
 
     def __init__(self, dialect):
-        super().__init__(dialect, initial_quote="`")
+        # ``escape_quote`` must match ``initial_quote`` so a literal
+        # backtick inside a quoted identifier is doubled (``a``b`` —
+        # per the ``BACKQUOTED_IDENTIFIER`` lexer rule in Spark SQL).
+        # The default from SQLAlchemy is ``"`` which would escape the
+        # wrong character, producing invalid DDL like ``a`b``.
+        super().__init__(dialect, initial_quote="`", escape_quote="`")
 
 
 class DatabricksDDLCompiler(compiler.DDLCompiler):
@@ -127,6 +132,23 @@ class DatabricksStatementCompiler(compiler.SQLCompiler):
         lambda self: self._BIND_TEMPLATE, lambda self, _: None
     )
 
+    def bindparam_string(self, name, **kw):
+        # The template ``:`%(name)s``` assumes ``name`` is safe inside
+        # backticks — any literal backtick must be doubled per the
+        # ``BACKQUOTED_IDENTIFIER`` lexer rule. The doubling affects only
+        # the rendered SQL; the params dict key sent to the driver stays
+        # the single-backtick original (the server collapses `` -> `
+        # when it parses the marker name), so we must not set
+        # ``escaped_from`` — leaving ``escaped_bind_names`` empty keeps
+        # the key translation in ``construct_params`` a no-op.
+        if (
+            "`" in name
+            and not kw.get("escaped_from")
+            and not kw.get("post_compile", False)
+        ):
+            name = name.replace("`", "``")
+        return super().bindparam_string(name, **kw)
+
     def limit_clause(self, select, **kw):
         """Identical to the default implementation of SQLCompiler.limit_clause except it writes LIMIT ALL instead of LIMIT -1,
         since Databricks SQL doesn't support the latter.

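The bind-marker half of the fix can be sketched in isolation. BIND_TEMPLATE and render_bind_marker below are hypothetical stand-ins for the compiler internals, assuming the template form `:`%(name)s`` shown in the diff; the key point is that doubling touches only the rendered SQL text, never the params dict key.

```python
# Hedged sketch of the bind-marker path (illustrative names, not the
# real SQLAlchemy API): the name is doubled only in the rendered SQL;
# the execute-time params dict keeps the original single-backtick key.
BIND_TEMPLATE = ":`%(name)s`"

def render_bind_marker(name: str) -> str:
    rendered = name.replace("`", "``")  # affects the SQL text only
    return BIND_TEMPLATE % {"name": rendered}

marker = render_bind_marker("a`b")
print(marker)  # -> :`a``b`

# The driver still receives the original key; the server collapses
# the doubled form when it parses the marker name.
params = {"a`b": "x"}
```

This is why the override must not set escaped_from: setting it would make construct_params translate the key to the doubled form, and the driver would then send a key the server's un-doubled marker name no longer matches.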
tests/test_local/test_ddl.py (21 additions & 0 deletions)
@@ -211,6 +211,27 @@ def test_leading_digit_column_is_backticked(self):
         compiled = self._compile_insert(table, {"1col": "x"})
         assert ":`1col`" in str(compiled)
 
+    def test_literal_backtick_in_column_name_is_doubled(self):
+        """A literal backtick inside a column name must be doubled in the
+        rendered SQL (both the DDL column identifier and the bind
+        marker), per the Spark SQL ``BACKQUOTED_IDENTIFIER`` lexer rule.
+        The params dict key stays the single-backtick original — the
+        server un-doubles when it parses the marker name.
+        """
+        from sqlalchemy.schema import CreateTable
+
+        metadata = MetaData()
+        table = Table("t", metadata, Column("a`b", String()))
+
+        create_sql = str(CreateTable(table).compile(bind=self.engine))
+        assert "`a``b`" in create_sql  # DDL identifier doubled
+
+        compiled = self._compile_insert(table, {"a`b": "x"})
+        assert ":`a``b`" in str(compiled)  # bind marker doubled
+        params = compiled.construct_params()
+        assert params["a`b"] == "x"  # dict key stays single-backtick
+        assert "a``b" not in params
+
     def test_many_special_characters_in_column_names(self):
         """Column names containing characters that Delta allows (hyphens,
         slashes, question marks, hash, plus, star, at, dollar, amp, pipe,
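The round-trip case the commit message says was added to the empirical audit can be laid out as data. The SQL strings below follow the forms quoted in the commit message (table name `t` is an illustrative choice); each statement carries the doubled-backtick form in its SQL text while the parameter key stays single-backtick.

```python
# Sketch of the audited round-trip (CREATE -> INSERT -> SELECT WHERE
# -> DROP) on a column literally named a`b. Table name is illustrative.
roundtrip = [
    ("CREATE TABLE t (`a``b` STRING)", None),
    ("INSERT INTO t VALUES (:`a``b`)", {"a`b": "x"}),
    ("SELECT * FROM t WHERE `a``b` = :`a``b`", {"a`b": "x"}),
    ("DROP TABLE t", None),
]

def undouble(marker_name: str) -> str:
    # What the server effectively does with a parsed marker name:
    # collapse the doubled backtick back to a single one.
    return marker_name.replace("``", "`")

print(undouble("a``b"))  # -> a`b
```

Every parameterized statement above keys its dict with `a`b` (single backtick), matching what the server recovers after un-doubling the marker name.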
