feat: improve codebase #3
Merged
sqlx 0.8's PgTypeInfo::with_name does not accept schema-qualified names like "agent.canal_type_enum"; emitting them causes runtime decode errors. Always emit the unqualified type name and rely on the connection's search_path to resolve non-public schemas.
35-task TDD plan covering 78 findings across security, codegen/typemap, error handling, tests/CI, and SQL ↔ Rust conformity.
Introduces codegen::identifiers with quote_ident, quote_qualified, and is_safe_ident helpers. Foundation for preventing SQL injection in generated CRUD code by quoting table/column/schema names per dialect (backticks for MySQL, double quotes for Postgres/SQLite).
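A minimal sketch of per-dialect quoting, assuming the standard SQL convention of escaping an embedded quote character by doubling it. The `Dialect` enum is a hypothetical stand-in for the crate's own dialect type, and these signatures are illustrative rather than the actual `codegen::identifiers` API.

```rust
/// Hypothetical dialect selector (the real crate has its own database-kind type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Dialect {
    Postgres,
    Sqlite,
    Mysql,
}

/// Always wrap `name` in the dialect's identifier quotes. Any embedded quote
/// character is doubled so it cannot terminate the quoted identifier early.
pub fn quote_ident(name: &str, dialect: Dialect) -> String {
    let q = match dialect {
        Dialect::Mysql => '`',
        Dialect::Postgres | Dialect::Sqlite => '"',
    };
    let escaped = name.replace(q, &format!("{q}{q}"));
    format!("{q}{escaped}{q}")
}

/// Quote schema and table separately, so the dot stays a real separator.
pub fn quote_qualified(schema: &str, table: &str, dialect: Dialect) -> String {
    format!("{}.{}", quote_ident(schema, dialect), quote_ident(table, dialect))
}
```

With this shape, a column named `select` becomes `"select"` on Postgres/SQLite and `` `select` `` on MySQL, and a name containing a quote cannot break out of the string.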
Every table, schema, and column name interpolated into generated SQL strings now goes through identifiers::quote_ident / quote_qualified. Prevents SQL injection when DB metadata contains quote characters or reserved words, and produces correct SQL for identifiers that would otherwise be ambiguous (e.g. columns named "select"). Updated 13 existing tests whose substring assertions encoded the prior unquoted output. Fixed the junction_entity test fixture to split schema/table properly instead of relying on a dotted table_name.
Each Rust type passed via --type-overrides is now parsed by
syn::parse_str::<syn::Type> before being injected into generated code.
Rejects empty keys/values, missing '=', and strings that aren't a
single valid Rust type. Closes the code-injection path where
"--type-overrides jsonb=Vec<u8>; fn pwned() {}" would have been
emitted verbatim into the output.
Connection failures previously bubbled up the raw sqlx::Error which can include the full database URL (user:password@host) in its Display implementation. Wrap the pool.connect() error in a new Error::Connection variant that carries a redacted URL, and add a redact_url helper that replaces the password with "****".
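A std-only sketch of the redaction step. The actual redact_url may parse the URL differently; this version only assumes the usual `scheme://user:password@host/...` layout with any reserved characters in the password percent-encoded, as URL syntax requires.

```rust
/// Replace the password in `scheme://user:password@host/...` with "****".
/// URLs without a password (or without "://") are returned unchanged.
pub fn redact_url(url: &str) -> String {
    let Some(scheme_end) = url.find("://") else {
        return url.to_string();
    };
    let authority_start = scheme_end + 3;
    // The authority (userinfo@host:port) ends at the first '/', '?' or '#'.
    let authority_end = url[authority_start..]
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .map_or(url.len(), |i| authority_start + i);
    let authority = &url[authority_start..authority_end];
    // Userinfo is everything before the last '@' inside the authority.
    let Some(at) = authority.rfind('@') else {
        return url.to_string();
    };
    // Only redact when a ':' separates the user from a password.
    let Some(colon) = authority[..at].find(':') else {
        return url.to_string();
    };
    let prefix = &url[..authority_start + colon]; // "scheme://user"
    let suffix = &url[authority_start + at..]; // "@host:port/path..."
    format!("{prefix}:****{suffix}")
}
```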
parse_and_format / parse_and_format_with_tab_spaces previously called std::process::exit(1) when prettyplease failed to parse the generated TokenStream. That kills the user's build with no recovery path and no useful diagnostic if sqlx-gen is ever used as a library. Now both helpers return Result<String, error::Error>; format_tokens, format_tokens_with_imports, and codegen::generate propagate via ?. The error message includes the failing token stream and a request to file an issue, since this only fires on internal codegen bugs. Test helpers across struct_gen, enum_gen, composite_gen, domain_gen, crud_gen, codegen::mod, and e2e_sqlite were updated to .unwrap() the Result — they assert on the happy path and want a clear panic if it breaks.
Removed 7× .expect() on MySQL information_schema Vec<u8> → String conversions, replaced with utf8_field helper that returns Error::Config on invalid UTF-8. Removed 5× .last_mut().unwrap() panic risks across postgres.rs and mysql.rs (tables, views, enums, composites). Each now returns Error::Config with an "internal sqlx-gen bug" message that points the user at filing an issue rather than crashing the build.
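The two replacement patterns can be sketched as follows. `Error` here models only a `Config` variant, and both helper signatures (including the hypothetical `last_or_bug`) are assumptions for illustration, not the crate's exact code.

```rust
/// Minimal stand-in for the crate's error enum (only `Config` is modelled).
#[derive(Debug, PartialEq)]
pub enum Error {
    Config(String),
}

/// Decode a `Vec<u8>` value read from MySQL information_schema, naming the
/// offending field instead of panicking the way `.expect()` did.
pub fn utf8_field(bytes: Vec<u8>, field: &str) -> Result<String, Error> {
    String::from_utf8(bytes).map_err(|e| {
        Error::Config(format!("information_schema field `{field}` is not valid UTF-8: {e}"))
    })
}

/// Hypothetical replacement for `.last_mut().unwrap()`: an empty list means the
/// introspection loop is out of sync, which is a bug in sqlx-gen itself.
pub fn last_or_bug<'a, T>(items: &'a mut [T], what: &str) -> Result<&'a mut T, Error> {
    items.last_mut().ok_or_else(|| {
        Error::Config(format!(
            "internal sqlx-gen bug: no current {what} to attach to; please file an issue"
        ))
    })
}
```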
write_atomic streams into a sibling NamedTempFile then renames into place, so a Ctrl-C or disk-full error never leaves a half-written .rs file that would break the user's next build. validate_safe_filename rejects path separators, "..", absolute paths, empty names, and non-.rs extensions before any write happens. Defends against the rare case where introspected table names flow into the output filename and could otherwise escape output_dir.
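A sketch of the filename checks listed above, using only std::path. It returns a plain `String` error rather than the crate's error type, and it leaves out write_atomic, which relies on the tempfile crate's NamedTempFile.

```rust
use std::path::{Component, Path};

/// Accept only a bare `*.rs` file name that cannot escape the output directory.
pub fn validate_safe_filename(name: &str) -> Result<(), String> {
    if name.is_empty() {
        return Err("output file name is empty".to_string());
    }
    // Reject both separators on every OS, so "a/b.rs" and "a\\b.rs" both fail.
    if name.contains('/') || name.contains('\\') {
        return Err(format!("{name:?} contains a path separator"));
    }
    // Require exactly one Normal component: rules out ".", "..", and on
    // Windows also drive prefixes such as "C:".
    let path = Path::new(name);
    let mut parts = path.components();
    if !matches!((parts.next(), parts.next()), (Some(Component::Normal(_)), None)) {
        return Err(format!("{name:?} is not a plain file name"));
    }
    if path.extension().and_then(|ext| ext.to_str()) != Some("rs") {
        return Err(format!("{name:?} must have a .rs extension"));
    }
    Ok(())
}
```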
Runs on every push to main and on every PR. Three primary jobs:
- test: cargo test --all (unit + sqlite-based integration)
- fmt: rustfmt --check
- clippy: -D warnings on all targets

Two optional jobs spin up Postgres 16 and MySQL 8.0 services and run the e2e_postgres / e2e_mysql test files (added in upcoming commits). These jobs are marked continue-on-error until the e2e suites exist.
…e decimal
- MySQL `bit(1)` → bool (idiomatic boolean column); `bit(N>1)` stays Vec<u8>
- MySQL `boolean`/`bool` aliases → bool (previously fell through to String)
- Postgres `interval` → PgInterval with the corresponding import (previously hit the String fallback)
- SQLite `NUMERIC`/`DECIMAL` → Decimal instead of f64, matching the precision-safe behaviour already shipped for Postgres and MySQL
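A sketch of the MySQL `bit`/`bool` rules above as a standalone function. The name `map_mysql_bit_or_bool` is hypothetical; it returns Rust type names as strings, ignores every other column type, and relies on MySQL treating a bare `BIT` as `BIT(1)`.

```rust
/// Map MySQL bit/boolean column types to a Rust type name; None for anything else.
pub fn map_mysql_bit_or_bool(column_type: &str) -> Option<&'static str> {
    let ty = column_type.trim().to_ascii_lowercase();
    match ty.as_str() {
        // BOOL / BOOLEAN aliases are booleans.
        "bool" | "boolean" => return Some("bool"),
        // MySQL treats a bare BIT as BIT(1).
        "bit" => return Some("bool"),
        _ => {}
    }
    // "bit(N)": one bit is a boolean, wider fields stay raw bytes.
    let width: u32 = ty.strip_prefix("bit(")?.strip_suffix(')')?.trim().parse().ok()?;
    Some(if width == 1 { "bool" } else { "Vec<u8>" })
}
```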
Before this commit, a Postgres column of type my_enum[] was mapped to
Vec<MyEnum> but the generated MyEnum had no PgHasArrayType impl. At
runtime sqlx then bailed with "unsupported type _my_enum of column #N"
because it could not resolve the array's element type info.
Now both enum_gen and composite_gen emit an `impl PgHasArrayType` whose
array_type_info() returns PgTypeInfo::with_name("_<name>"), which matches
how Postgres names array types. Gated on DatabaseKind::Postgres so MySQL
and SQLite output is unchanged.
Two SQL enum values like 'foo bar' and 'foo_bar' both collapse to the Rust identifier FooBar via to_upper_camel_case, which previously generated code that would not compile. check_variant_collisions runs during codegen::generate and returns a clear Error::Config pointing at the conflicting variants and the Rust identifier they share.
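A sketch of the collision check. `upper_camel` is a simplified stand-in for heck's `to_upper_camel_case` (it splits only on non-alphanumeric characters, not on case boundaries), and the error is a plain `String` instead of `Error::Config`.

```rust
use std::collections::HashMap;

/// Simplified UpperCamelCase: split on non-alphanumerics, capitalise each word.
fn upper_camel(s: &str) -> String {
    s.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| {
            let mut chars = w.chars();
            let first = chars.next().map(|c| c.to_ascii_uppercase());
            first.into_iter().chain(chars.map(|c| c.to_ascii_lowercase())).collect::<String>()
        })
        .collect()
}

/// Fail if two SQL enum values map to the same Rust variant identifier.
pub fn check_variant_collisions(enum_name: &str, variants: &[&str]) -> Result<(), String> {
    let mut seen: HashMap<String, &str> = HashMap::new();
    for &sql in variants {
        let ident = upper_camel(sql);
        if let Some(prev) = seen.insert(ident.clone(), sql) {
            return Err(format!(
                "enum {enum_name}: SQL values {prev:?} and {sql:?} both map to Rust variant `{ident}`"
            ));
        }
    }
    Ok(())
}
```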
Columns named "user-id", "created at", "123foo", etc. previously produced Rust code that wouldn't compile, because format_ident! cannot encode dashes, spaces, or leading digits. sanitize_rust_ident:
- replaces every non-alphanumeric (and non-_) character with '_'
- prefixes a '_' if the result starts with a digit
- falls back to "_field" on an empty string

The original DB column name is preserved via the existing #[sqlx(rename = "<original>")] rewrite, so reads and writes still hit the right column.
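The three rules above can be sketched as a single function. This version restricts "alphanumeric" to ASCII, which is an assumption of the sketch; the real helper may accept a wider character set.

```rust
/// Turn an arbitrary column name into something format_ident! can accept.
pub fn sanitize_rust_ident(name: &str) -> String {
    // Replace every character that is not ASCII alphanumeric or '_' with '_'.
    let mut out: String = name
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect();
    if out.is_empty() {
        return "_field".to_string();
    }
    // Identifiers may not start with a digit.
    if out.starts_with(|c: char| c.is_ascii_digit()) {
        out.insert(0, '_');
    }
    out
}
```

So "user-id" becomes the field `user_id`, which keeps `#[sqlx(rename = "user-id")]` so queries still target the original column.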
Tables in "public" (Postgres), "main" (SQLite), or "dbo" no longer get their schema rendered into every generated SELECT/INSERT/UPDATE/DELETE. The qualified form is still used for non-default schemas, where it is required for unambiguous resolution.
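A sketch of the schema-omission rule. The hypothetical `table_ref` checks all three default schema names regardless of dialect and skips identifier quoting, both simplifications relative to the description above.

```rust
/// Schemas a connection resolves by default, so qualifying them is noise.
const DEFAULT_SCHEMAS: [&str; 3] = ["public", "main", "dbo"];

/// Render a table reference, qualifying it only for non-default schemas.
pub fn table_ref(schema: Option<&str>, table: &str) -> String {
    match schema {
        Some(s) if !DEFAULT_SCHEMAS.contains(&s) => format!("{s}.{table}"),
        _ => table.to_string(),
    }
}
```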
The audit flagged inline ENUMs as potentially broken, but the existing per-variant #[sqlx(rename)], emitted whenever the PascalCase Rust identifier differs from the SQL value, is exactly what sqlx::Type expects for text encoding on MySQL/SQLite. These tests pin that behaviour for both lowercase and case-sensitive variants so a future refactor can't silently regress it.
`--domain-style alias` (default) keeps the existing `pub type Email = String;`
behaviour. `--domain-style newtype` instead emits
```rust
#[derive(..., sqlx::Type)]
#[sqlx(transparent)]
pub struct Email(pub String);
```
so the user can attach validation, traits, or accessors to the
domain. Both styles share the same doc-comment and codegen plumbing
via the new DomainStyle enum and generate_with_domain_style entry
point. CLI defaults preserve current behaviour exactly.
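A sketch of how one enum can drive both output shapes. The real codegen builds token streams, whereas this hypothetical `render_domain` returns strings; the `Debug, Clone` derives stand in for the derive list that is elided ("...") above.

```rust
/// Which Rust shape a SQL domain is emitted as.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum DomainStyle {
    /// `pub type Email = String;` (the CLI default)
    #[default]
    Alias,
    /// `#[sqlx(transparent)] pub struct Email(pub String);`
    Newtype,
}

/// Render a domain over `inner` in the requested style.
pub fn render_domain(name: &str, inner: &str, style: DomainStyle) -> String {
    match style {
        DomainStyle::Alias => format!("pub type {name} = {inner};"),
        DomainStyle::Newtype => format!(
            "#[derive(Debug, Clone, sqlx::Type)]\n#[sqlx(transparent)]\npub struct {name}(pub {inner});"
        ),
    }
}
```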
SQLite has no native enum type, so users encode them with
```sql
TEXT CHECK (status IN ('active', 'inactive'))
```
extract_check_enums parses sqlite_master.sql for each table, looks for
that pattern column-by-column, and synthesises an EnumInfo plus
rewrites the column's udt_name to <table>_<col>_enum. From there the
existing enum/typemap pipeline takes over and emits a real Rust enum
that round-trips via per-variant #[sqlx(rename)].
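A simplified, std-only sketch of the per-column pattern match. It is not the actual extract_check_enums: it looks for `<column> IN ( ... )` case-insensitively, handles SQL's doubled-quote escape, and assumes no `)` appears inside the quoted values.

```rust
/// Return the values of `CHECK (<column> IN ('a', 'b', ...))` in a CREATE TABLE
/// statement, or None if the column has no such constraint.
pub fn extract_check_values(create_sql: &str, column: &str) -> Option<Vec<String>> {
    // ASCII lowercasing keeps byte offsets identical, so positions found in
    // `lower` can be used to slice `create_sql`.
    let lower = create_sql.to_ascii_lowercase();
    let col = column.to_ascii_lowercase();
    for (pos, _) in lower.match_indices(col.as_str()) {
        // Skip matches inside a longer identifier (e.g. "substatus").
        let prev = lower[..pos].chars().next_back();
        if prev.map_or(false, |c| c.is_ascii_alphanumeric() || c == '_') {
            continue;
        }
        // Expect `<ws> in <ws> (` right after the column name.
        let Some(rest) = lower[pos + col.len()..].trim_start().strip_prefix("in") else {
            continue;
        };
        let rest = rest.trim_start();
        if !rest.starts_with('(') {
            continue;
        }
        let open = create_sql.len() - rest.len();
        let close = open + create_sql[open..].find(')')?;
        return parse_quoted_list(&create_sql[open + 1..close]);
    }
    None
}

/// Parse `'a', 'b', 'it''s'` into ["a", "b", "it's"].
fn parse_quoted_list(inner: &str) -> Option<Vec<String>> {
    let mut values = Vec::new();
    let mut chars = inner.chars().peekable();
    loop {
        // Skip whitespace and commas between literals.
        while chars.peek().map_or(false, |c| c.is_whitespace() || *c == ',') {
            chars.next();
        }
        match chars.next() {
            None => break,
            Some('\'') => {
                let mut value = String::new();
                loop {
                    let c = chars.next()?; // unterminated literal -> None
                    if c != '\'' {
                        value.push(c);
                    } else if chars.peek() == Some(&'\'') {
                        chars.next(); // '' is an escaped single quote
                        value.push('\'');
                    } else {
                        break;
                    }
                }
                values.push(value);
            }
            // Anything other than a string literal: not an enum-style CHECK.
            Some(_) => return None,
        }
    }
    if values.is_empty() { None } else { Some(values) }
}
```

The resulting values would then feed a synthesised enum named after the `<table>_<col>_enum` convention described above.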
contextualize_sqlx_error inspects the SQLSTATE on a sqlx::Error and re-raises it as:
- 42501 / 28000 → PermissionDenied, with a hint about the DB user's privileges on information_schema / pg_catalog / sqlite_master
- 42P01 / 3F000 / 42S02 → SchemaNotFound, with a hint about --schemas

Other sqlx::Error values still fall through to the existing Error::Database variant, so the public API and behaviour are unchanged for unrelated failures.
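The SQLSTATE dispatch reduces to a small match. In sqlx the code would come from `err.as_database_error().and_then(|db| db.code())`; this sketch takes the code as a plain `&str`, and `Classified` is a hypothetical stand-in for the crate's error variants.

```rust
/// Hypothetical classification result; `Other` falls through to Error::Database.
#[derive(Debug, PartialEq, Eq)]
pub enum Classified {
    PermissionDenied,
    SchemaNotFound,
    Other,
}

/// Map a SQLSTATE code to the friendlier error category.
pub fn classify_sqlstate(code: &str) -> Classified {
    match code {
        // insufficient_privilege / invalid_authorization_specification
        "42501" | "28000" => Classified::PermissionDenied,
        // undefined_table / invalid_schema_name (Postgres), table not found (MySQL)
        "42P01" | "3F000" | "42S02" => Classified::SchemaNotFound,
        _ => Classified::Other,
    }
}
```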
LAST_INSERT_ID() only returns a meaningful value when the table has a single AUTO_INCREMENT primary key. For composite PKs, the generated insert now:
- includes every PK column in InsertParams so the user can supply them
- runs the INSERT with the bound values
- SELECTs the freshly inserted row by binding the same PK values

build_insert_method_parsed and build_insert_many_transactionally_method both branch on pk_fields.len(); single-PK MySQL flows continue to use LAST_INSERT_ID exactly as before. Postgres / SQLite are unaffected because their RETURNING * already handled this case.
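A sketch of the two statements a composite-PK MySQL insert needs. The hypothetical `composite_pk_insert_sql` returns SQL strings only; `SELECT *` and the unquoted names are simplifications, and the caller is expected to bind the PK values a second time, in PK order, for the SELECT.

```rust
/// Build (INSERT, SELECT-by-PK) statements for a table with a composite key.
pub fn composite_pk_insert_sql(table: &str, columns: &[&str], pk: &[&str]) -> (String, String) {
    // One `?` placeholder per inserted column (PK columns included).
    let placeholders = vec!["?"; columns.len()].join(", ");
    let insert = format!("INSERT INTO {table} ({}) VALUES ({placeholders})", columns.join(", "));
    // Re-read the row by every PK column, since LAST_INSERT_ID() can't identify it.
    let filter = pk.iter().map(|c| format!("{c} = ?")).collect::<Vec<_>>().join(" AND ");
    let select = format!("SELECT * FROM {table} WHERE {filter}");
    (insert, select)
}
```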
compile_check.rs validates that codegen output is loadable in two modes:
1. Fast path (always on): each GeneratedFile is parsed with syn::parse_file. Catches malformed attributes, unclosed braces, invalid identifiers, and anything else that breaks at the AST level. Runs across Postgres, MySQL, SQLite, and the newtype-domain variant.
2. Deep path (gated on SQLX_GEN_COMPILE_CHECK=1): scaffolds a temporary downstream crate, drops the generated code into src/lib.rs, and runs `cargo check` with the full sqlx dependency tree. This is the only check that confirms the emitted derives and #[sqlx(...)] attributes are actually accepted by sqlx itself.
Postgres' information_schema.columns reports the schema in which a column's user-defined type lives (e.g. "auth" for an auth.role enum column, "pg_catalog" for builtins). Capture it on every column so the typemap and codegen layers can disambiguate two schemas declaring a type with the same name.
- Adds udt_schema: Option<String> to ColumnInfo
- Postgres fetch_tables / fetch_views select COALESCE(udt_schema, '') and unpack to None when empty
- MySQL, SQLite, and synthetic test fixtures keep it None
- ColumnInfo derives Default so future test code can use struct update syntax
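A sketch of the empty-string unpacking and of struct update syntax on a `Default`-deriving ColumnInfo. This `ColumnInfo` is a trimmed, hypothetical version with only three fields; the real struct has more.

```rust
/// Trimmed-down stand-in for the crate's ColumnInfo.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ColumnInfo {
    pub name: String,
    pub udt_name: String,
    pub udt_schema: Option<String>,
}

/// The query selects COALESCE(udt_schema, ''), so '' means "no schema reported".
pub fn unpack_udt_schema(raw: String) -> Option<String> {
    if raw.is_empty() { None } else { Some(raw) }
}
```

In a test fixture, only the relevant fields need to be spelled out: `ColumnInfo { udt_name: "role".into(), udt_schema: Some("auth".into()), ..Default::default() }`.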
When the same SQL name (e.g. "role") exists in two non-default
schemas, sqlx-gen now prefixes the Rust identifier with the schema's
PascalCase form: auth.role → AuthRole, billing.role → BillingRole.
The bare PascalCase ("Role") is reserved for unique names and for the
default schema even when a collision exists.
Plumbing:
- codegen::rust_type_name_for + type_name_has_cross_schema_collision
as the single source of truth, callable from typemap and from each
*_gen module.
- typemap::postgres exposes map_type_qualified that takes the column's
udt_schema (added in the previous commit) so cross-schema duplicate
lookups land on the right (schema, name) pair.
- enum_gen::generate_enum_with_schema wraps the legacy entry point
and propagates the SchemaInfo so the emitted Rust enum carries the
prefixed name. composite_gen and domain_gen call rust_type_name_for
directly since they already receive SchemaInfo.
- codegen::generate now calls generate_enum_with_schema.
The SQL #[sqlx(type_name = "...")] attribute is still emitted in its
unqualified form because sqlx 0.8 doesn't accept "schema.type"; users
remain responsible for setting search_path on the connection.
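A sketch of the naming rule. The real `rust_type_name_for` and `type_name_has_cross_schema_collision` take SchemaInfo; here the introspected types are a hypothetical slice of `(schema, sql_name)` pairs, and `pascal` is a simplified PascalCase helper.

```rust
use std::collections::HashSet;

/// Simplified PascalCase: split on non-alphanumerics, capitalise each word.
fn pascal(s: &str) -> String {
    s.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| {
            let mut chars = w.chars();
            let first = chars.next().map(|c| c.to_ascii_uppercase());
            first.into_iter().chain(chars.map(|c| c.to_ascii_lowercase())).collect::<String>()
        })
        .collect()
}

/// True when `name` is declared in more than one schema.
pub fn type_name_has_cross_schema_collision(name: &str, all_types: &[(&str, &str)]) -> bool {
    let schemas: HashSet<&str> = all_types
        .iter()
        .filter(|(_, n)| *n == name)
        .map(|(schema, _)| *schema)
        .collect();
    schemas.len() > 1
}

/// Bare PascalCase for unique names and for the default schema; otherwise
/// prefix with the schema's PascalCase form (auth.role -> AuthRole).
pub fn rust_type_name_for(schema: &str, name: &str, all_types: &[(&str, &str)], default_schema: &str) -> String {
    let base = pascal(name);
    if schema == default_schema || !type_name_has_cross_schema_collision(name, all_types) {
        base
    } else {
        format!("{}{base}", pascal(schema))
    }
}
```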
When an enum or composite lives in a schema other than public, sqlx 0.8 cannot resolve its unqualified type_name unless the connection's search_path includes that schema. To make this discoverable:
- Emit a /// doc-comment on every non-default-schema enum and composite spelling out the requirement with a copy-paste-ready SET search_path snippet
- Add codegen::required_pg_search_path(&schema_info), which returns the sorted, deduplicated list of schemas needed
- Make the CLI log the exact SET search_path line after introspection when the result references any non-default schemas
- Document the whole flow (after_connect hook + collision prefixing) in a new "PostgreSQL — multi-schema setup" section in README.md
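A sketch of the sorted, deduplicated schema list and the statement the CLI could print. It takes plain schema names instead of `&schema_info`, and appending `public` to the suggested search_path is an assumption of this sketch so that default-schema objects keep resolving.

```rust
use std::collections::BTreeSet;

/// Non-default schemas the connection's search_path must include, sorted and deduplicated.
pub fn required_pg_search_path<'a>(type_schemas: impl IntoIterator<Item = &'a str>) -> Vec<String> {
    type_schemas
        .into_iter()
        .filter(|s| *s != "public")
        .collect::<BTreeSet<_>>() // sorts and deduplicates
        .into_iter()
        .map(str::to_string)
        .collect()
}

/// The line to log; None when only the default schema is involved.
pub fn search_path_statement(schemas: &[String]) -> Option<String> {
    if schemas.is_empty() {
        return None;
    }
    Some(format!("SET search_path TO {}, public;", schemas.join(", ")))
}
```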
…ype) #[derive(sqlx::Type)] combined with #[sqlx(type_name = "x")] already auto-generates `impl PgHasArrayType` pointing at `_x` in sqlx 0.8+. The manual impl added by Task 27 collided with the derive output, producing E0119 "conflicting implementations" in any downstream crate that consumed the generated types. Remove the manual block from enum_gen and composite_gen, replace the "must emit" tests with "must NOT emit" regressions across all three dialects, and rely on the sqlx derive for array support.
Every column, table, and schema reference was previously emitted with
unconditional dialect quotes. For lowercase ASCII names that aren't
reserved words this produced noisy SQL ("agent"."agent__connector",
"connector_id" = $1) without any added safety.
quote_ident now defers to is_safe_unquoted: an identifier is emitted
bare when it starts with a lowercase letter or underscore, contains
only ASCII lowercase / digits / underscores, and is not in a curated
~100-word SQL reserved list (sorted, binary-searched).
quote_ident_always remains for sites that genuinely need to force the
quotes. quote_qualified composes per-part.
This means agent.agent__connector instead of "agent"."agent__connector"
on the user's reported schema, while user-supplied DB names that
collide with SELECT / order / user etc. still get quoted defensively.
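A sketch of the quote-only-when-needed rule. `RESERVED` is a tiny sorted excerpt standing in for the curated ~100-word list, and passing the quote character directly is a simplification of the per-dialect API.

```rust
/// Small sorted excerpt of SQL reserved words (sorted so binary_search works).
const RESERVED: &[&str] = &[
    "all", "and", "as", "by", "from", "group", "in", "limit", "not", "or", "order",
    "select", "table", "user", "where",
];

/// Lowercase ASCII identifier that starts with a letter or '_' and is not reserved.
pub fn is_safe_unquoted(name: &str) -> bool {
    let mut chars = name.chars();
    let starts_ok = matches!(chars.next(), Some(c) if c.is_ascii_lowercase() || c == '_');
    starts_ok
        && chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
        && RESERVED.binary_search(&name).is_err()
}

/// Force quotes, doubling any embedded quote character.
pub fn quote_ident_always(name: &str, quote: char) -> String {
    let escaped = name.replace(quote, &format!("{quote}{quote}"));
    format!("{quote}{escaped}{quote}")
}

/// Emit the name bare when safe, quoted otherwise.
pub fn quote_ident(name: &str, quote: char) -> String {
    if is_safe_unquoted(name) {
        name.to_string()
    } else {
        quote_ident_always(name, quote)
    }
}

/// Quote each part independently.
pub fn quote_qualified(schema: &str, table: &str, quote: char) -> String {
    format!("{}.{}", quote_ident(schema, quote), quote_ident(table, quote))
}
```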
The two crates used to declare their versions independently (0.5.5 in both, but with sqlx-gen-macros pinned at 0.5.4 inside sqlx-gen). A single version bump had to be made in three places before they matched again, and the cross-dependency made silent drift easy to ship.
- Root Cargo.toml grows [workspace.package] with version, edition, rust-version, license, repository, keywords, categories.
- Root Cargo.toml grows [workspace.dependencies] declaring every dependency once, including the internal sqlx-gen-macros (now always = the workspace version) and every external crate.
- Each member crate inherits with `*.workspace = true`. Per-crate Cargo.toml shrinks to per-crate fields only (name, description, features, bin).
- .gitignore now excludes /docs/superpowers/ so locally generated audit/plan files stay out of the repo.
No description provided.