fix(db): add CHECK constraint introspection for all database dialects#40
fix(db): add CHECK constraint introspection for all database dialects#40
Conversation
The database introspectors were missing CHECK constraint queries in their get_constraints() implementations, causing CHECK constraints to be silently lost during strata export. Added CHECK constraint support for: - PostgreSQL: queries pg_constraint with contype='c' and uses pg_get_constraintdef() to extract the expression - MySQL: queries information_schema.check_constraints (MySQL 8.0.16+) with filtering for auto-generated NOT NULL checks - SQLite: parses CHECK constraints from CREATE TABLE DDL in sqlite_master Fixes #24 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This pull request adds CHECK constraint introspection for PostgreSQL, MySQL, and SQLite databases to fix issue #24 where CHECK constraints were silently dropped during strata export. The implementation queries database-specific system catalogs and parses CREATE TABLE statements to extract CHECK constraints and their associated columns.
Changes:
- Added CHECK constraint introspection queries for all three database dialects (PostgreSQL, MySQL, SQLite)
- Implemented helper functions to extract column names from CHECK expressions using dialect-specific parsing strategies
- Added 17 unit tests covering SQLite DDL parsing, MySQL backtick extraction, and SQLite column identification
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // 大文字小文字を無視して CHECK キーワードを検索 | ||
| // テーブルレベルの CHECK 制約: CHECK (expression) の形式 | ||
| let re = regex::Regex::new(r"(?i)\bCHECK\s*\(").unwrap(); | ||
|
|
||
| for m in re.find_iter(create_sql) { | ||
| let start = m.end(); // '(' の直後 | ||
| // 対応する閉じ括弧を見つける(ネスト対応) | ||
| let mut depth = 1; | ||
| let mut end = start; | ||
| for (i, ch) in create_sql[start..].char_indices() { | ||
| match ch { | ||
| '(' => depth += 1, | ||
| ')' => { | ||
| depth -= 1; | ||
| if depth == 0 { | ||
| end = start + i; | ||
| break; | ||
| } | ||
| } | ||
| _ => {} | ||
| } | ||
| } | ||
|
|
||
| if depth == 0 { | ||
| let expression = create_sql[start..end].trim().to_string(); | ||
|
|
||
| // 式からカラム名を推定(識別子として使われている単語を抽出) | ||
| let columns = extract_columns_from_sqlite_check(&expression); | ||
|
|
||
| results.push(RawConstraintInfo::Check { | ||
| columns, | ||
| expression, | ||
| }); | ||
| } |
There was a problem hiding this comment.
The regex pattern could match CHECK keywords that appear in column names or string literals, not just actual CHECK constraints. For example:
- Column name:
CREATE TABLE t (check_date TEXT, CHECK (check_date IS NOT NULL))- would match both occurrences - String literal:
CREATE TABLE t (val TEXT, CHECK (val != 'CHECK'))- could match the string literal
While the parenthesis-matching logic might mitigate some cases, this could still lead to false positives. Consider adding logic to:
- Skip CHECK keywords that appear within quoted strings
- Verify the CHECK keyword is at the appropriate syntactic position (after a comma or opening parenthesis at the table level)
| // 大文字小文字を無視して CHECK キーワードを検索 | |
| // テーブルレベルの CHECK 制約: CHECK (expression) の形式 | |
| let re = regex::Regex::new(r"(?i)\bCHECK\s*\(").unwrap(); | |
| for m in re.find_iter(create_sql) { | |
| let start = m.end(); // '(' の直後 | |
| // 対応する閉じ括弧を見つける(ネスト対応) | |
| let mut depth = 1; | |
| let mut end = start; | |
| for (i, ch) in create_sql[start..].char_indices() { | |
| match ch { | |
| '(' => depth += 1, | |
| ')' => { | |
| depth -= 1; | |
| if depth == 0 { | |
| end = start + i; | |
| break; | |
| } | |
| } | |
| _ => {} | |
| } | |
| } | |
| if depth == 0 { | |
| let expression = create_sql[start..end].trim().to_string(); | |
| // 式からカラム名を推定(識別子として使われている単語を抽出) | |
| let columns = extract_columns_from_sqlite_check(&expression); | |
| results.push(RawConstraintInfo::Check { | |
| columns, | |
| expression, | |
| }); | |
| } | |
| // 文字ごとのインデックスと文字を保持して、安全にスキャンできるようにする | |
| let chars: Vec<(usize, char)> = create_sql.char_indices().collect(); | |
| let len = chars.len(); | |
| // クォート内かどうかをトラッキング(シングルクォートは文字列、ダブルクォートは識別子) | |
| let mut in_single_quote = false; | |
| let mut in_double_quote = false; | |
| let mut i = 0; | |
| while i < len { | |
| let (byte_idx, ch) = chars[i]; | |
| // クォートの開始・終了処理(SQLite の '' / "" エスケープに対応) | |
| if ch == '\'' && !in_double_quote { | |
| if in_single_quote { | |
| // 連続する '' はエスケープなので、クォートは継続 | |
| if let Some((_, next_ch)) = chars.get(i + 1) { | |
| if *next_ch == '\'' { | |
| i += 2; | |
| continue; | |
| } | |
| } | |
| in_single_quote = false; | |
| } else { | |
| in_single_quote = true; | |
| } | |
| i += 1; | |
| continue; | |
| } else if ch == '"' && !in_single_quote { | |
| if in_double_quote { | |
| if let Some((_, next_ch)) = chars.get(i + 1) { | |
| if *next_ch == '"' { | |
| i += 2; | |
| continue; | |
| } | |
| } | |
| in_double_quote = false; | |
| } else { | |
| in_double_quote = true; | |
| } | |
| i += 1; | |
| continue; | |
| } | |
| // クォート内の文字は無視 | |
| if in_single_quote || in_double_quote { | |
| i += 1; | |
| continue; | |
| } | |
| // CHECK キーワードを大文字小文字を無視して検出 | |
| if (ch == 'c' || ch == 'C') && i + 4 < len { | |
| let is_check = matches!( | |
| ( | |
| chars[i].1.to_ascii_uppercase(), | |
| chars[i + 1].1.to_ascii_uppercase(), | |
| chars[i + 2].1.to_ascii_uppercase(), | |
| chars[i + 3].1.to_ascii_uppercase(), | |
| chars[i + 4].1.to_ascii_uppercase() | |
| ), | |
| ('C', 'H', 'E', 'C', 'K') | |
| ); | |
| if is_check { | |
| // 単語境界を確認(前後が識別子文字でないこと) | |
| let prev_is_ident = if i == 0 { | |
| false | |
| } else { | |
| let (_, prev) = chars[i - 1]; | |
| prev == '_' || prev.is_ascii_alphanumeric() | |
| }; | |
| let next_is_ident = if i + 5 >= len { | |
| false | |
| } else { | |
| let (_, next_ch) = chars[i + 5]; | |
| next_ch == '_' || next_ch.is_ascii_alphanumeric() | |
| }; | |
| if !prev_is_ident && !next_is_ident { | |
| // 直前の非空白文字が ',' または '(' であることを確認(テーブルレベル制約の位置) | |
| let mut j = i; | |
| while j > 0 { | |
| j -= 1; | |
| let (_, pj) = chars[j]; | |
| if !pj.is_whitespace() { | |
| if pj != ',' && pj != '(' { | |
| // テーブルレベル制約ではない可能性が高いのでスキップ | |
| break; | |
| } | |
| // ここまで来たらテーブルレベル CHECK とみなす | |
| let mut k = i + 5; | |
| // CHECK の後の空白をスキップ | |
| while k < len && chars[k].1.is_whitespace() { | |
| k += 1; | |
| } | |
| if k >= len || chars[k].1 != '(' { | |
| // CHECK の直後に '(' がない場合は無視 | |
| break; | |
| } | |
| // 式部分の開始位置('(' の直後のバイトインデックス) | |
| let open_paren_byte = chars[k].0; | |
| let expr_start = open_paren_byte | |
| + chars[k].1.len_utf8(); | |
| // 対応する閉じ括弧を見つける(ネストおよびクォートに対応) | |
| let mut depth = 1_i32; | |
| let mut expr_end = None; | |
| let mut m = k + 1; | |
| let mut in_sq = false; | |
| let mut in_dq = false; | |
| while m < len { | |
| let (m_byte, mch) = chars[m]; | |
| // CHECK 式内でのクォート処理 | |
| if mch == '\'' && !in_dq { | |
| if in_sq { | |
| if let Some((_, next_ch)) = chars.get(m + 1) { | |
| if *next_ch == '\'' { | |
| m += 2; | |
| continue; | |
| } | |
| } | |
| in_sq = false; | |
| } else { | |
| in_sq = true; | |
| } | |
| m += 1; | |
| continue; | |
| } else if mch == '"' && !in_sq { | |
| if in_dq { | |
| if let Some((_, next_ch)) = chars.get(m + 1) { | |
| if *next_ch == '"' { | |
| m += 2; | |
| continue; | |
| } | |
| } | |
| in_dq = false; | |
| } else { | |
| in_dq = true; | |
| } | |
| m += 1; | |
| continue; | |
| } | |
| if in_sq || in_dq { | |
| m += 1; | |
| continue; | |
| } | |
| match mch { | |
| '(' => depth += 1, | |
| ')' => { | |
| depth -= 1; | |
| if depth == 0 { | |
| expr_end = Some(m_byte); | |
| break; | |
| } | |
| } | |
| _ => {} | |
| } | |
| m += 1; | |
| } | |
| if let Some(expr_end_byte) = expr_end { | |
| if expr_end_byte >= expr_start { | |
| let expression = create_sql[expr_start..expr_end_byte] | |
| .trim() | |
| .to_string(); | |
| // 式からカラム名を推定(識別子として使われている単語を抽出) | |
| let columns = | |
| extract_columns_from_sqlite_check(&expression); | |
| results.push(RawConstraintInfo::Check { | |
| columns, | |
| expression, | |
| }); | |
| } | |
| } | |
| break; | |
| } | |
| } | |
| } | |
| // この CHECK の検査が終わったので、次の位置へ進む | |
| i += 5; | |
| continue; | |
| } | |
| } | |
| i += 1; |
| let re = regex::Regex::new(r"\b([a-zA-Z_][a-zA-Z0-9_]*)\b").unwrap(); | ||
| let mut columns = Vec::new(); | ||
|
|
||
| for cap in re.captures_iter(expression) { | ||
| let word = &cap[1]; | ||
| let upper = word.to_uppercase(); | ||
| if !keywords.contains(&upper.as_str()) && !columns.contains(&word.to_string()) { | ||
| columns.push(word.to_string()); | ||
| } | ||
| } |
There was a problem hiding this comment.
The column extraction logic doesn't account for SQL keywords used in different contexts. For example, in a CHECK expression like CASE WHEN val > 0 THEN 1 ELSE 0 END = 1, the keywords "CASE", "WHEN", "THEN", "ELSE", and "END" are properly filtered, but if a column is actually named something like "status_case" or "end_date", only part of the name would be extracted if it doesn't match the regex boundaries correctly.
More critically, the regex \b([a-zA-Z_][a-zA-Z0-9_]*)\b will match identifiers inside string literals. For example, in CHECK (status IN ('pending', 'active')), it would extract "pending" and "active" as column names.
Consider implementing proper string literal detection to skip identifiers within quotes.
| if trimmed.ends_with("is not null") | ||
| || trimmed.ends_with("IS NOT NULL") | ||
| || constraint_name.ends_with("_chk_1") | ||
| && (trimmed.contains("in (") || trimmed.contains("IN (")) |
There was a problem hiding this comment.
The boolean logic in this condition is incorrect due to missing parentheses. The current expression evaluates as:
(A || B || C) && D
where:
- A =
trimmed.ends_with("is not null") - B =
trimmed.ends_with("IS NOT NULL") - C =
constraint_name.ends_with("_chk_1") - D =
(trimmed.contains("in (") || trimmed.contains("IN ("))
This means that NOT NULL checks (A or B) will only be filtered if condition D is also true, which is incorrect. The intended logic should filter ENUM checks (C && D) separately from NOT NULL checks (A || B).
The condition should be rewritten as:
(trimmed.ends_with("is not null") || trimmed.ends_with("IS NOT NULL")) || (constraint_name.ends_with("_chk_1") && (trimmed.contains("in (") || trimmed.contains("IN (")))
| if trimmed.ends_with("is not null") | |
| || trimmed.ends_with("IS NOT NULL") | |
| || constraint_name.ends_with("_chk_1") | |
| && (trimmed.contains("in (") || trimmed.contains("IN (")) | |
| if (trimmed.ends_with("is not null") || trimmed.ends_with("IS NOT NULL")) | |
| || (constraint_name.ends_with("_chk_1") | |
| && (trimmed.contains("in (") || trimmed.contains("IN ("))) |
| for c in chars.by_ref() { | ||
| if c == '`' { | ||
| break; |
There was a problem hiding this comment.
The backtick parsing logic doesn't handle escaped backticks in MySQL identifiers. In MySQL, backticks can be escaped using double backticks (e.g., `mycolumn `` represents the identifier "mycolumn"). The current implementation would incorrectly split this into two separate identifiers.
While escaped backticks in column names are rare, the parser should handle them correctly for robustness. Consider checking if the next character after a closing backtick is another backtick, and if so, include it as part of the identifier and continue parsing.
| for c in chars.by_ref() { | |
| if c == '`' { | |
| break; | |
| // Parse a backtick-quoted identifier, handling escaped backticks (``) | |
| while let Some(c) = chars.next() { | |
| if c == '`' { | |
| // If the next character is also a backtick, treat as an escaped backtick | |
| if let Some('`') = chars.peek() { | |
| // Consume the second backtick and append a single backtick to the name | |
| chars.next(); | |
| name.push('`'); | |
| continue; | |
| } else { | |
| // Single backtick terminates the identifier | |
| break; | |
| } |
|
|
||
| // 大文字小文字を無視して CHECK キーワードを検索 | ||
| // テーブルレベルの CHECK 制約: CHECK (expression) の形式 | ||
| let re = regex::Regex::new(r"(?i)\bCHECK\s*\(").unwrap(); |
There was a problem hiding this comment.
The regular expressions are compiled on every function call, which is inefficient. For parse_sqlite_check_constraints, the regex is compiled once per table introspection, and for extract_columns_from_sqlite_check, it's compiled once per CHECK constraint.
Consider using lazy_static or once_cell to compile these regexes once at program startup. For example:
use once_cell::sync::Lazy;
static CHECK_REGEX: Lazy<Regex> = Lazy::new(|| Regex::new(r"(?i)\bCHECK\s*\(").unwrap());This would improve performance, especially when introspecting multiple tables with CHECK constraints.
| "AND", | ||
| "OR", | ||
| "NOT", | ||
| "IN", | ||
| "IS", | ||
| "NULL", | ||
| "LIKE", | ||
| "BETWEEN", | ||
| "EXISTS", | ||
| "TRUE", | ||
| "FALSE", |
There was a problem hiding this comment.
The keyword list for filtering SQL keywords is incomplete. Common SQL keywords that could appear in CHECK expressions are missing, which means they could be incorrectly identified as column names. Missing keywords include:
- Comparison and logical: CASE, WHEN, THEN, ELSE, END
- Data types: INTEGER, REAL, TEXT, BLOB, NUMERIC, DATE, TIME
- Functions: LENGTH, LOWER, UPPER, SUBSTR, ABS, ROUND, etc.
- Other: AS, CAST, COLLATE, GLOB, MATCH, REGEXP
Consider adding these common keywords to the filter list to avoid false positives when extracting column names. Alternatively, consider whether this heuristic approach is robust enough, or if a more sophisticated parser would be more appropriate.
| "AND", | |
| "OR", | |
| "NOT", | |
| "IN", | |
| "IS", | |
| "NULL", | |
| "LIKE", | |
| "BETWEEN", | |
| "EXISTS", | |
| "TRUE", | |
| "FALSE", | |
| // 論理演算子・比較・制御構文 | |
| "AND", | |
| "OR", | |
| "NOT", | |
| "IN", | |
| "IS", | |
| "LIKE", | |
| "BETWEEN", | |
| "EXISTS", | |
| "CASE", | |
| "WHEN", | |
| "THEN", | |
| "ELSE", | |
| "END", | |
| // リテラル / 真偽値 | |
| "NULL", | |
| "TRUE", | |
| "FALSE", | |
| // データ型 | |
| "INTEGER", | |
| "REAL", | |
| "TEXT", | |
| "BLOB", | |
| "NUMERIC", | |
| "DATE", | |
| "TIME", | |
| // 関数 | |
| "LENGTH", | |
| "LOWER", | |
| "UPPER", | |
| "SUBSTR", | |
| "ABS", | |
| "ROUND", | |
| "COALESCE", | |
| "IFNULL", | |
| "NULLIF", | |
| "TRIM", | |
| "LTRIM", | |
| "RTRIM", | |
| "MIN", | |
| "MAX", | |
| "AVG", | |
| "COUNT", | |
| "SUM", | |
| "RANDOM", | |
| "CHAR", | |
| "HEX", | |
| // その他のキーワード | |
| "AS", | |
| "CAST", | |
| "COLLATE", | |
| "GLOB", | |
| "MATCH", | |
| "REGEXP", |
| // CHECK制約 (MySQL 8.0.16+) | ||
| // information_schema.check_constraints と table_constraints を結合して取得 | ||
| let check_sql = r#" | ||
| SELECT | ||
| cc.constraint_name, | ||
| cc.check_clause | ||
| FROM information_schema.check_constraints cc | ||
| JOIN information_schema.table_constraints tc | ||
| ON cc.constraint_name = tc.constraint_name | ||
| AND cc.constraint_schema = tc.constraint_schema | ||
| WHERE tc.table_name = ? AND tc.table_schema = DATABASE() | ||
| AND tc.constraint_type = 'CHECK' | ||
| ORDER BY cc.constraint_name | ||
| "#; | ||
|
|
||
| let check_rows = sqlx::query(check_sql) | ||
| .bind(table_name) | ||
| .fetch_all(pool) | ||
| .await?; | ||
|
|
||
| for row in &check_rows { | ||
| let constraint_name = mysql_get_string(row, 0); | ||
| let check_clause = mysql_get_string(row, 1); | ||
|
|
||
| // MySQL auto-generates NOT NULL-like constraints with names ending in "_chk_N" | ||
| // for ENUM columns. Filter out constraints that are just IS NOT NULL checks. | ||
| let trimmed = check_clause.trim(); | ||
| if trimmed.ends_with("is not null") | ||
| || trimmed.ends_with("IS NOT NULL") | ||
| || constraint_name.ends_with("_chk_1") | ||
| && (trimmed.contains("in (") || trimmed.contains("IN (")) | ||
| { | ||
| continue; | ||
| } | ||
|
|
||
| // Try to extract column names from the expression | ||
| // For simple single-column checks like "(balance >= 0)", extract column name | ||
| let columns = extract_columns_from_check_expression(&check_clause, table_name); | ||
|
|
||
| constraints.push(RawConstraintInfo::Check { | ||
| columns, | ||
| expression: check_clause, | ||
| }); | ||
| } | ||
|
|
There was a problem hiding this comment.
The test coverage for the MySQL CHECK constraint extraction is insufficient. There are no tests verifying the filtering logic for auto-generated constraints (NOT NULL checks and ENUM validation). Specifically:
- No test for filtering NOT NULL checks (lines 909-910)
- No test for filtering ENUM validation checks (lines 911-912)
- No test for ensuring user-defined CHECK constraints are NOT filtered
These tests are critical since the filtering logic has a bug (see related comment on lines 909-913). Adding tests would help catch the current boolean logic error and prevent regressions.
| // CHECK制約 | ||
| // pg_constraintからCHECK制約を取得(contype = 'c') | ||
| // string_agg でカラム名をカンマ区切りで返す(Any ドライバは配列非対応) | ||
| let check_sql = r#" | ||
| SELECT | ||
| con.conname::text, | ||
| pg_get_constraintdef(con.oid)::text AS check_expression, | ||
| string_agg(a.attname::text, ',' ORDER BY u.ord) AS columns | ||
| FROM pg_constraint con | ||
| JOIN pg_class c ON c.oid = con.conrelid | ||
| JOIN pg_namespace n ON n.oid = c.relnamespace | ||
| CROSS JOIN LATERAL unnest(con.conkey) WITH ORDINALITY AS u(attnum, ord) | ||
| JOIN pg_attribute a ON a.attrelid = con.conrelid AND a.attnum = u.attnum | ||
| WHERE con.contype = 'c' | ||
| AND c.relname = $1 | ||
| AND n.nspname = 'public' | ||
| GROUP BY con.conname, con.oid | ||
| ORDER BY con.conname | ||
| "#; | ||
|
|
||
| let check_rows = sqlx::query(check_sql) | ||
| .bind(table_name) | ||
| .fetch_all(pool) | ||
| .await?; | ||
|
|
||
| for row in check_rows { | ||
| let _constraint_name: String = row.get(0); | ||
| let raw_expression: String = row.get(1); | ||
| let columns_str: String = row.get(2); | ||
| let columns: Vec<String> = columns_str | ||
| .split(',') | ||
| .map(|s| s.trim().to_string()) | ||
| .collect(); | ||
|
|
||
| // pg_get_constraintdef returns "CHECK ((expression))" | ||
| // Strip the outer "CHECK (" prefix and ")" suffix to get the actual expression | ||
| let expression = raw_expression | ||
| .strip_prefix("CHECK (") | ||
| .and_then(|s| s.strip_suffix(')')) | ||
| .unwrap_or(&raw_expression) | ||
| .to_string(); | ||
|
|
||
| // PostgreSQL wraps expressions in extra parentheses, strip those too | ||
| let expression = expression | ||
| .strip_prefix('(') | ||
| .and_then(|s| s.strip_suffix(')')) | ||
| .unwrap_or(&expression) | ||
| .to_string(); | ||
|
|
||
| constraints.push(RawConstraintInfo::Check { | ||
| columns, | ||
| expression, | ||
| }); | ||
| } |
There was a problem hiding this comment.
The test coverage for PostgreSQL CHECK constraint expression parsing is insufficient. There are no tests verifying the double parenthesis stripping logic (lines 539-544). Specifically:
- No test for expressions with nested parentheses like "CHECK ((val >= 0) AND (val <= 100))"
- No test for expressions that should keep their parentheses like "CHECK ((a OR b) AND (c OR d))"
- No test for edge cases where the stripping logic could corrupt the expression
Adding these tests would help catch the current bug where the second strip could corrupt valid expressions (see related comment on lines 539-544).
| /// MySQL の CHECK 式からカラム名を推定する | ||
| /// | ||
| /// MySQL の check_clause にはバッククォートで囲まれたカラム名が含まれる。 | ||
| /// 例: "(`balance` >= 0)" -> ["balance"] |
There was a problem hiding this comment.
The _table_name parameter is prefixed with an underscore indicating it's intentionally unused, but the function signature doesn't explain why this parameter exists if it's not needed. This creates unnecessary coupling and could confuse future maintainers.
Consider either:
- Removing the parameter if it's genuinely not needed
- Documenting in the function comment why the parameter exists but isn't currently used (e.g., reserved for future enhancement)
- Using it if there's a valid use case (e.g., for table-qualified column name detection)
| /// 例: "(`balance` >= 0)" -> ["balance"] | |
| /// 例: "(`balance` >= 0)" -> ["balance"] | |
| /// `_table_name` は将来、テーブル修飾付きカラム名(例: `users`.`name`)の検出や | |
| /// 複数テーブル対応などの拡張で利用するために予約されている引数です。 | |
| /// 現在の MySQL 実装では未使用ですが、他方言の実装とのシグネチャ整合性のために保持しています。 |
| "CONSTRAINT", | ||
| ]; | ||
|
|
||
| let re = regex::Regex::new(r"\b([a-zA-Z_][a-zA-Z0-9_]*)\b").unwrap(); |
There was a problem hiding this comment.
The regular expression is compiled on every function call. Since extract_columns_from_sqlite_check is called once per CHECK constraint, this compilation overhead occurs repeatedly during schema introspection.
Consider using lazy_static or once_cell to compile the regex once at program startup:
use once_cell::sync::Lazy;
static IDENTIFIER_REGEX: Lazy<Regex> = Lazy::new(|| Regex::new(r"\b([a-zA-Z_][a-zA-Z0-9_]*)\b").unwrap());This would improve performance when introspecting tables with multiple CHECK constraints.
- Fix PostgreSQL expression parsing: use matching-paren check instead of blind double-strip to avoid corrupting expressions like "(val >= 0) AND (val <= 100)" - Fix SQLite CHECK parser: rewrite with character-by-character scanning to correctly handle CHECK keywords inside string literals and column names like "check_date" - Fix SQLite column extraction: strip string literals before parsing to avoid extracting words like 'pending' as column names; expand keyword list with CASE/WHEN/THEN/ELSE/END, data types, and common functions - Fix MySQL filter condition: use explicit boolean variables for clarity instead of relying on operator precedence - Fix MySQL backtick parsing: handle escaped backticks (``) in column identifiers - Remove unused _table_name parameter from extract_columns_from_check_expression() - Cache IDENTIFIER_REGEX with std::sync::LazyLock for performance - Add 12 new tests covering edge cases: string literal handling, paren stripping, escaped backticks, keyword filtering Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📝 WalkthroughWalkthroughAdds CHECK constraint introspection across PostgreSQL, MySQL, and SQLite by parsing constraint definitions and associating extracted expressions and referenced columns with table metadata; introduces parsing/normalization helpers and comprehensive unit tests to validate extraction logic and edge cases. Changes
Sequence Diagram(s)mermaid Client->>Introspector: request schema introspection Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let check_sql = r#" | ||
| SELECT | ||
| con.conname::text, | ||
| pg_get_constraintdef(con.oid)::text AS check_expression, | ||
| string_agg(a.attname::text, ',' ORDER BY u.ord) AS columns | ||
| FROM pg_constraint con | ||
| JOIN pg_class c ON c.oid = con.conrelid | ||
| JOIN pg_namespace n ON n.oid = c.relnamespace | ||
| CROSS JOIN LATERAL unnest(con.conkey) WITH ORDINALITY AS u(attnum, ord) | ||
| JOIN pg_attribute a ON a.attrelid = con.conrelid AND a.attnum = u.attnum | ||
| WHERE con.contype = 'c' | ||
| AND c.relname = $1 | ||
| AND n.nspname = 'public' | ||
| GROUP BY con.conname, con.oid | ||
| ORDER BY con.conname | ||
| "#; |
There was a problem hiding this comment.
The PostgreSQL CHECK constraint query may fail for table-level CHECK constraints that don't reference specific columns. The query joins to pg_attribute via con.conkey, but conkey is NULL for table-level CHECK constraints (those not tied to specific columns). This would cause the query to return no rows for such constraints, silently dropping them from the export.
Consider using a LEFT JOIN and handling the NULL case:
- Change the CROSS JOIN LATERAL to LEFT JOIN LATERAL
- Handle NULL columns_str to return an empty array
- Or filter to only return column-level constraints if table-level constraints are not supported
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@src/db/src/adapters/database_introspector.rs`:
- Around line 1419-1493: The heuristic in extract_columns_from_sqlite_check is
excluding real column names like "date" and "time" because they’re in the
keyword list; change the approach to cross-reference extracted identifiers
against the actual table columns (from PRAGMA table_info) instead of relying
solely on the keywords array: update extract_columns_from_sqlite_check (or
create a wrapper) to accept the table name or a Vec<String> of real column
names, call PRAGMA table_info to get the column list, run IDENTIFIER_REGEX as
now, then filter the extracted words to only those present in the PRAGMA-derived
column list (case-insensitive match) and return that filtered Vec<String>; this
preserves existing parsing but avoids dropping valid columns like date/time.
- Around line 1153-1167: The code builds create_sql_query with string
interpolation using quoted_table which is unsafe/inconsistent; replace that with
a parameterized query (e.g. "SELECT sql FROM sqlite_master WHERE type = 'table'
AND name = ?") and use
sqlx::query(...).bind(table_name_variable).fetch_all(pool).await to bind the
table name instead of formatting it into the SQL; update the code around
create_sql_query / create_rows and ensure you pass the original unquoted table
identifier (or strip surrounding quotes from quoted_table) so
parse_sqlite_check_constraints(&sql) behavior is unchanged.
- Around line 918-928: The enum/NOT NULL filtering is too brittle: change the
logic around check_clause/trimmed to perform case-insensitive checks and broaden
the ENUM constraint match; compute a lowercase trimmed string (e.g., let lower =
check_clause.trim().to_lowercase()) and test lower.ends_with("is not null") for
NOT NULL, and detect ENUM auto-generated constraints by matching constraint_name
for the pattern "_chk_" followed by one or more digits (not just "_chk_1") and
that lower.contains("in ("); update the conditions that set is_not_null_check
and is_enum_validation (referencing check_clause, trimmed/lower,
is_not_null_check, is_enum_validation, and constraint_name) so all _chk_N
variants and any casing of "IS NOT NULL" are filtered.
- Around line 515-562: The CHECK-query currently INNER JOINs pg_attribute via
unnest(con.conkey) so constraints with empty conkey (expression-only CHECKs) are
omitted; change the SQL in check_sql to use LEFT JOIN LATERAL unnest(con.conkey)
WITH ORDINALITY AS u(attnum, ord) and LEFT JOIN pg_attribute a ON a.attrelid =
con.conrelid AND a.attnum = u.attnum, then COALESCE the aggregated columns
(string_agg) to an empty string or NULL so that columns_str can be interpreted
as no columns; update the parsing in the loop around check_rows (where
columns_str is read and split into Vec<String>) to handle NULL/empty by
producing an empty Vec (e.g., treat empty string or NULL as Vec::new()) so
RawConstraintInfo::Check is emitted for expression-only checks.
🧹 Nitpick comments (1)
src/db/src/adapters/database_introspector.rs (1)
1211-1242:strip_outer_parensdoes not account for parentheses inside string literals.If a PostgreSQL CHECK expression contains string literals with unbalanced parentheses (e.g.,
(status != 'foo)')), the depth tracking would break. In practice this is unlikely sincepg_get_constraintdef()produces well-formed expressions, but it's worth noting as a known limitation.
- PostgreSQL: use LEFT JOIN LATERAL instead of CROSS JOIN for expression-only CHECK constraints where conkey is empty - PostgreSQL: add COALESCE to handle NULL from string_agg when no columns are referenced - MySQL: broaden _chk_1 filter to _chk_N pattern for all auto-generated ENUM validation constraints - MySQL: use case-insensitive matching via to_lowercase() - SQLite: replace string interpolation with parameterized bind query for sqlite_master lookup - Remove data type keywords (INTEGER, REAL, TEXT, etc.) from SQLite column extraction filter to avoid false negatives for columns named after types Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/db/src/adapters/database_introspector.rs`:
- Around line 944-950: The MySQL check_clause is left with outer parentheses;
update the block that builds the RawConstraintInfo::Check so the expression is
normalized by calling strip_outer_parens on check_clause before storing it:
compute a normalized_expression = strip_outer_parens(&check_clause) (or
equivalent) and pass that as the expression in RawConstraintInfo::Check while
still using extract_columns_from_check_expression(&check_clause) to get columns;
adjust references around extract_columns_from_check_expression, check_clause,
and RawConstraintInfo::Check accordingly.
🧹 Nitpick comments (2)
src/db/src/adapters/database_introspector.rs (2)
1167-1181: Consider usingfetch_optionalfor consistency with line 1021.Lines 1020-1024 already query
sqlite_masterfor the same table usingfetch_optional. Herefetch_all+.first()works but is slightly inconsistent.Suggested change
- let create_rows = sqlx::query(create_sql_query) + let create_row = sqlx::query(create_sql_query) .bind(table_name) - .fetch_all(pool) + .fetch_optional(pool) .await?; - if let Some(row) = create_rows.first() { + if let Some(row) = create_row { let create_sql: Option<String> = row.get(0);
1846-2045: Good test coverage for the new helper functions.Tests cover the key scenarios including edge cases like escaped quotes, nested parentheses, keyword filtering, and word boundary detection. Consider adding a test for column-level CHECK constraints (e.g.,
CREATE TABLE t (val INTEGER CHECK (val > 0), name TEXT)) to verify the parser handles both inline and table-level CHECKs.
| // バッククォートで囲まれたカラム名を抽出 | ||
| let columns = extract_columns_from_check_expression(&check_clause); | ||
|
|
||
| constraints.push(RawConstraintInfo::Check { | ||
| columns, | ||
| expression: check_clause, | ||
| }); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Step 1: Examine PostgreSQL check constraint handling (around line 561)
echo "=== PostgreSQL CHECK handling (lines 550-570) ==="
sed -n '550,570p' src/db/src/adapters/database_introspector.rs
# Step 2: Examine SQLite check constraint handling (around line 1404)
echo -e "\n=== SQLite CHECK handling (lines 1390-1410) ==="
sed -n '1390,1410p' src/db/src/adapters/database_introspector.rs
# Step 3: Find strip_outer_parens function
echo -e "\n=== Searching for strip_outer_parens function ==="
rg -n "fn strip_outer_parens|strip_outer_parens" src/db/src/adapters/database_introspector.rs | head -20
# Step 4: Look at the full MySQL check constraint handling
echo -e "\n=== MySQL CHECK handling (lines 920-960) ==="
sed -n '920,960p' src/db/src/adapters/database_introspector.rsRepository: Lazialize/strata
Length of output: 4074
🏁 Script executed:
#!/bin/bash
# Step 5: Search for downstream check expression comparisons or normalization
echo "=== Searching for check expression usage patterns ==="
rg -n "check_expression\|Check\s*\{" src/db/src/adapters/ -A 2 | head -40
echo -e "\n=== Searching for RawConstraintInfo::Check usage ==="
rg -n "RawConstraintInfo::Check" src/db/src/ -B 2 -A 2 | head -50Repository: Lazialize/strata
Length of output: 3938
🏁 Script executed:
#!/bin/bash
# Step 6: Check if there's any normalization applied to check expressions during export or comparison
echo "=== Searching for check expression processing in constraint handling ==="
rg -n "expression" src/db/src/adapters/database_introspector.rs | grep -i check
echo -e "\n=== Check how constraints are used in diffs/comparisons ==="
fd -t f "\.rs$" src/db/src/table_diff/ -exec grep -l "check\|Check" {} \;Repository: Lazialize/strata
Length of output: 1783
🏁 Script executed:
#!/bin/bash
# Step 1: Look at strip_outer_parens implementation
echo "=== strip_outer_parens function (lines 1229-1250) ==="
sed -n '1229,1250p' src/db/src/adapters/database_introspector.rs
# Step 2: Check for MySQL-specific test cases that show check_clause format
echo -e "\n=== MySQL check constraint tests ==="
rg -n "test.*mysql.*check|mysql_get_checks" src/db/src/adapters/database_introspector.rs -A 10 | head -80
# Step 3: Look at the actual test assertion for MySQL
echo -e "\n=== MySQL CHECK test assertions ==="
rg -n "1850|1851|1852|1853|1854|1855" src/db/src/adapters/database_introspector.rs -B 5 -A 10Repository: Lazialize/strata
Length of output: 2243
🏁 Script executed:
#!/bin/bash
# Step 4: Check line 1877 and surrounding test to see what format SQLite actually returns
echo "=== SQLite CHECK test (around line 1874-1878) ==="
sed -n '1870,1880p' src/db/src/adapters/database_introspector.rs
# Step 5: Search for any documentation or comments about MySQL information_schema.check_constraints
echo -e "\n=== Comments about MySQL check_clause format ==="
rg -n "information_schema.*check|check_clause" src/db/src/adapters/database_introspector.rs | head -30Repository: Lazialize/strata
Length of output: 1144
🏁 Script executed:
#!/bin/bash
# Step 6: Look for how constraint expressions are actually used/compared downstream
echo "=== Constraint expression usage in schema conversion ==="
sed -n '80,95p' src/db/src/services/schema_conversion/converters.rs
echo -e "\n=== Check conversion and usage ==="
rg -n "RawConstraintInfo::Check.*=>" src/db/src/services/schema_conversion/ -A 5 | head -30Repository: Lazialize/strata
Length of output: 596
🏁 Script executed:
#!/bin/bash
# Look for any MySQL integration tests or fixtures that show actual check_clause data
echo "=== Searching for MySQL test fixtures or mock data ==="
find src -name "*.rs" -type f | xargs grep -l "information_schema.*check_constraints\|check_clause.*balance\|check_clause.*>=" | head -10
# Look for any comments in the MySQL section about format
echo -e "\n=== MySQL section comments (lines 895-920) ==="
sed -n '895,920p' src/db/src/adapters/database_introspector.rs
# Check if there are any test helper functions that mock MySQL data
echo -e "\n=== Searching for test helpers with MySQL check data ==="
rg -n "fn.*mock.*mysql|fn.*test.*mysql.*check" src/db/src/adapters/database_introspector.rs -A 15 | head -50Repository: Lazialize/strata
Length of output: 2827
🏁 Script executed:
#!/bin/bash
# Search the entire codebase for any reference to MySQL check_clause format or parentheses
echo "=== Searching for any MySQL documentation or format comments ==="
rg -n "MySQL.*check|check_clause.*MySQL|information_schema.*check" src/ -C 2 | head -60
# Check if constraint comparison logic exists that might be affected
echo -e "\n=== Searching for constraint comparison/equality logic ==="
rg -n "impl.*Eq\|impl.*PartialEq|==.*Check\|check.*==" src/db/src/services/ -B 2 -A 5 | head -40Repository: Lazialize/strata
Length of output: 1479
🏁 Script executed:
#!/bin/bash
# Get the full comment and function for extract_columns_from_check_expression
echo "=== extract_columns_from_check_expression function and comments (lines 110-130) ==="
sed -n '110,130p' src/db/src/adapters/database_introspector.rs
# Also check if there's any MySQL-specific normalization elsewhere
echo -e "\n=== Searching for any downstream normalization of check expressions ==="
rg -n "expression.*trim\|expression.*strip\|expression.*to_string" src/db/src/services/schema_conversion/ -B 2 -A 2Repository: Lazialize/strata
Length of output: 1001
🏁 Script executed:
#!/bin/bash
# Check the test case more carefully - line 1876-1878 shows SQLite stores inner parens
echo "=== SQLite vs PostgreSQL format comparison from tests ==="
echo "SQLite test (line 1876-1878):"
sed -n '1876,1878p' src/db/src/adapters/database_introspector.rs
echo ""
echo "PostgreSQL test expectations:"
rg -n "test.*postgres.*check|test_parse_postgres_check" src/db/src/adapters/database_introspector.rs -A 10 | head -25Repository: Lazialize/strata
Length of output: 324
🏁 Script executed:
#!/bin/bash
# Find PostgreSQL check constraint test
echo "=== Searching for PostgreSQL check constraint test ==="
rg -n "test.*postgres.*check\|fn test.*check" src/db/src/adapters/database_introspector.rs -B 2 -A 15 | grep -A 15 "postgres\|pg_get"
# Look at the example in the MySQL comment more carefully
echo -e "\n=== MySQL check_clause comment example (lines 110-115) ==="
sed -n '110,115p' src/db/src/adapters/database_introspector.rs
# Search for any actual test data that shows what MySQL returns
echo -e "\n=== Searching for test cases with CHECK expressions ==="
rg -n "CHECK.*\|check.*expression.*=" src/db/src/adapters/database_introspector.rs | head -20Repository: Lazialize/strata
Length of output: 464
🏁 Script executed:
#!/bin/bash
# Let's see if there's real MySQL test data anywhere in the tests
echo "=== MySQL database introspection tests ==="
rg -n "test.*mysql.*constraint\|fn test.*mysql" src/db/src/adapters/database_introspector.rs -B 2 -A 10 | head -60Repository: Lazialize/strata
Length of output: 102
🏁 Script executed:
#!/bin/bash
# The comment at line 114 shows the format - let's verify the exact example
echo "=== Full context of MySQL check_clause format documentation ==="
sed -n '110,115p' src/db/src/adapters/database_introspector.rs
# Let's trace through what strip_outer_parens does with this format
echo -e "\n=== strip_outer_parens logic for MySQL format ==="
echo "Example: (\`balance\` >= 0)"
echo "- Starts with '(' and ends with ')': YES"
echo "- Would strip to: \`balance\` >= 0"
echo ""
# Verify the inconsistency: check what each dialect actually stores
echo "=== Expression formats stored across dialects ==="
echo "PostgreSQL (from test/code):"
echo " Raw: CHECK ((expression))"
echo " After strip_prefix + strip_outer_parens: expression"
echo ""
echo "SQLite (from test at 1876-1878):"
echo " SQL: CHECK ((val >= 0) AND (val <= 100))"
echo " Stored: (val >= 0) AND (val <= 100)"
echo ""
echo "MySQL (from comment example at 114):"
echo " check_clause: (\`balance\` >= 0)"
echo " Stored: (\`balance\` >= 0) [OUTER PARENS RETAINED]"Repository: Lazialize/strata
Length of output: 907
MySQL CHECK expressions retain outer parentheses, unlike PostgreSQL and SQLite.
MySQL's information_schema.check_constraints.check_clause includes surrounding parentheses (documented format at line 114: "(balance >= 0)"), while PostgreSQL (line 561) and SQLite (line 1405) normalize by stripping outer parentheses. This creates inconsistent expression formats across dialects that could affect downstream constraint comparison or export.
Apply strip_outer_parens to normalize MySQL expressions:
Proposed fix
// バッククォートで囲まれたカラム名を抽出
let columns = extract_columns_from_check_expression(&check_clause);
+ let expression = strip_outer_parens(&check_clause);
constraints.push(RawConstraintInfo::Check {
columns,
- expression: check_clause,
+ expression,
});🤖 Prompt for AI Agents
In `@src/db/src/adapters/database_introspector.rs` around lines 944 - 950, The
MySQL check_clause is left with outer parentheses; update the block that builds
the RawConstraintInfo::Check so the expression is normalized by calling
strip_outer_parens on check_clause before storing it: compute a
normalized_expression = strip_outer_parens(&check_clause) (or equivalent) and
pass that as the expression in RawConstraintInfo::Check while still using
extract_columns_from_check_expression(&check_clause) to get columns; adjust
references around extract_columns_from_check_expression, check_clause, and
RawConstraintInfo::Check accordingly.
Merge origin/main into fix/24-check-constraint-export, keeping both CHECK constraint introspection code and MySQL dialect-specific type helper functions (parse_mysql_set_values, is_mysql_unsigned) with their respective tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // バッククォートで囲まれたカラム名を抽出 | ||
| let columns = extract_columns_from_check_expression(&check_clause); | ||
|
|
||
| constraints.push(RawConstraintInfo::Check { | ||
| columns, | ||
| expression: check_clause, | ||
| }); |
There was a problem hiding this comment.
MySQLの check_clause は多くの場合 (... ) の形で返るため、そのまま expression に入れると他方言(Postgres/SQLite)と表現が揃わず、schema checksumや差分判定で不要な差分が出やすくなります。strip_outer_parens 等で外側1ペアだけ剥がして正規化してから保存することを検討してください(フィルタ条件も同じ正規化後の文字列で評価すると一貫します)。
| let create_rows = sqlx::query(create_sql_query) | ||
| .bind(table_name) | ||
| .fetch_all(pool) | ||
| .await?; | ||
|
|
||
| if let Some(row) = create_rows.first() { |
There was a problem hiding this comment.
SQLiteのCREATE TABLE SQL取得が fetch_all() + first() になっていて、1件しか返らないクエリに対して不要な割り当てが発生します。fetch_optional() を使うと意図が明確になり、少しだけ効率も良くなります。
| let create_rows = sqlx::query(create_sql_query) | |
| .bind(table_name) | |
| .fetch_all(pool) | |
| .await?; | |
| if let Some(row) = create_rows.first() { | |
| let create_row = sqlx::query(create_sql_query) | |
| .bind(table_name) | |
| .fetch_optional(pool) | |
| .await?; | |
| if let Some(row) = create_row { |
| /// テーブルレベルのCHECK制約を抽出する。 | ||
| /// 文字列リテラル('...')およびダブルクォート識別子("...")内の CHECK は無視する。 | ||
| /// 例: `CREATE TABLE t (id INTEGER, balance REAL, CHECK (balance >= 0))` |
There was a problem hiding this comment.
この関数のコメントは「テーブルレベルのCHECK制約を抽出」とありますが、実装はCREATE TABLE文中の CHECK ( を広く拾うため、カラム定義内のCHECKも抽出対象になります。実装に合わせてコメントを「テーブル/カラム両方のCHECKを抽出」などに更新した方が誤解が減ります。
| /// テーブルレベルのCHECK制約を抽出する。 | |
| /// 文字列リテラル('...')およびダブルクォート識別子("...")内の CHECK は無視する。 | |
| /// 例: `CREATE TABLE t (id INTEGER, balance REAL, CHECK (balance >= 0))` | |
| /// CREATE TABLE 文中の CHECK 制約(テーブルレベルおよびカラム定義内の両方)を抽出する。 | |
| /// 文字列リテラル('...')およびダブルクォート識別子("...")内の CHECK は無視する。 | |
| /// 例(テーブルレベル): `CREATE TABLE t (id INTEGER, balance REAL, CHECK (balance >= 0))` | |
| /// 例(カラムレベル) : `CREATE TABLE t (id INTEGER CHECK (id > 0), balance REAL)` |
| /// 文字列リテラル('...')内の単語は無視し、 | ||
| /// SQLキーワード・関数名・データ型を除外して識別子を抽出する。 | ||
| fn extract_columns_from_sqlite_check(expression: &str) -> Vec<String> { | ||
| // 文字列リテラルを除去してからパース | ||
| let stripped = strip_string_literals(expression); | ||
|
|
There was a problem hiding this comment.
extract_columns_from_sqlite_check() は「関数名・データ型を除外」とコメントされていますが、現在の keywords には主要なSQLite型名(INTEGER/TEXT/REAL/NUMERIC 等)が含まれていないため、CAST(x AS INTEGER) のような一般的な式で型名が“カラム”として誤検出されます。可能なら PRAGMA table_info で取得した実カラム名集合に対してだけフィルタする(または型名をキーワードに追加する)形にすると、exportの columns が汚染されにくくなります。
| let expression = raw_expression | ||
| .strip_prefix("CHECK (") | ||
| .and_then(|s| s.strip_suffix(')')) | ||
| .unwrap_or(&raw_expression) |
There was a problem hiding this comment.
PostgreSQLのCHECK制約定義は CHECK (...) NOT VALID / NO INHERIT のように末尾に追加トークンが付くことがあり、その場合 strip_suffix(')') が失敗して CHECK ( 付きの文字列がそのまま expression に入ってしまいます。CHECK ( の直後の括弧ペアをバランス取りで抽出してから strip_outer_parens する形にすると、末尾オプションがあっても安定して式だけを取り出せます(併せてユニットテスト追加推奨)。
| // 1. NOT NULL チェック(ENUM カラムに自動付与される) | ||
| // 2. ENUM バリデーション(_chk_N の名前で IN (...) を含む) | ||
| let lower = check_clause.trim().to_lowercase(); | ||
| let is_not_null_check = lower.ends_with("is not null"); |
There was a problem hiding this comment.
MySQLの自動生成NOT NULLチェックは (�col� is not null) のように全体が括弧で包まれていることが多く、ends_with("is not null") だとフィルタリングできません。外側の括弧と末尾の空白を剥がした上で判定するか、contains(" is not null") + 厳密なトークン境界チェック等に変更すると意図通り除外できます。
| let is_not_null_check = lower.ends_with("is not null"); | |
| // MySQL の自動生成 NOT NULL チェックは全体が括弧で囲まれていることがあるため、 | |
| // 外側の括弧と末尾の空白を取り除いた上で判定する。 | |
| let mut normalized = lower.trim(); | |
| while normalized.starts_with('(') && normalized.ends_with(')') { | |
| normalized = normalized[1..normalized.len() - 1].trim(); | |
| } | |
| let is_not_null_check = normalized.ends_with(" is not null"); |
- MySQL: apply strip_outer_parens to normalize CHECK expressions, matching PostgreSQL and SQLite behavior for cross-dialect consistency - MySQL: unwrap outer parentheses before NOT NULL / ENUM filter checks, handling (`col` is not null) format from information_schema - PostgreSQL: replace strip_prefix/strip_suffix with bracket-balanced extract_pg_check_expression() to handle "CHECK (...) NOT VALID" and "CHECK (...) NO INHERIT" suffixes - SQLite: use fetch_optional() instead of fetch_all() + first() for sqlite_master query (single row expected) - SQLite: update parse_sqlite_check_constraints doc comment to reflect both table-level and column-level CHECK constraint extraction - Add 5 tests for extract_pg_check_expression (simple, NOT VALID, NO INHERIT, complex nested, no prefix) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // MySQL の自動生成制約をフィルタリング: | ||
| // 1. NOT NULL チェック(ENUM カラムに自動付与される) | ||
| // 2. ENUM バリデーション(_chk_N の名前で IN (...) を含む) | ||
| // NOT NULL チェックは `(`col` is not null)` のように括弧で囲まれることがあるため、 | ||
| // 外側の括弧を剥がした上で判定する | ||
| let lower = check_clause.trim().to_lowercase(); | ||
| let mut normalized = lower.as_str(); | ||
| while normalized.starts_with('(') && normalized.ends_with(')') { | ||
| let inner = &normalized[1..normalized.len() - 1]; | ||
| // strip_outer_parens と同様に、先頭の ( と末尾の ) が対応しているか確認 | ||
| let mut depth = 0i32; | ||
| let mut matched = true; | ||
| for (i, ch) in inner.char_indices() { | ||
| match ch { | ||
| '(' => depth += 1, | ||
| ')' => { | ||
| depth -= 1; | ||
| if depth < 0 && i < inner.len() - 1 { | ||
| matched = false; | ||
| break; | ||
| } | ||
| } | ||
| _ => {} | ||
| } | ||
| } | ||
| if matched && depth == 0 { | ||
| normalized = inner.trim(); | ||
| } else { | ||
| break; | ||
| } | ||
| } | ||
| let is_not_null_check = normalized.ends_with("is not null"); | ||
| let is_enum_validation = { | ||
| // MySQL は _chk_1, _chk_2, ... の名前で ENUM バリデーションを自動生成する | ||
| let has_chk_suffix = constraint_name | ||
| .rfind("_chk_") | ||
| .map(|pos| { | ||
| constraint_name[pos + 5..] | ||
| .chars() | ||
| .all(|c| c.is_ascii_digit()) | ||
| }) | ||
| .unwrap_or(false); | ||
| has_chk_suffix && (normalized.contains("in (") || normalized.contains("in(")) | ||
| }; | ||
| if is_not_null_check || is_enum_validation { | ||
| continue; | ||
| } |
There was a problem hiding this comment.
MySQLの自動生成NOT NULLチェック判定が normalized.ends_with("is not null") だけだと過剰にマッチします。ユーザー定義のCHECK(例: (a > 0) AND (b IS NOT NULL) や末尾が IS NOT NULL で終わる式)まで除外してしまい、今回修正した「CHECKがexportで落ちる」問題を再発させる可能性があります。NOT NULL除外は制約名パターン等(自動生成に限定できる条件)で絞るか、式全体が単純な <col> IS NOT NULL の形のときだけ除外するようにしてください。
| /// SQLite CHECK式からカラム名を推定する | ||
| /// | ||
| /// 文字列リテラル('...')内の単語は無視し、 | ||
| /// SQLキーワード・関数名・データ型を除外して識別子を抽出する。 | ||
| fn extract_columns_from_sqlite_check(expression: &str) -> Vec<String> { | ||
| // 文字列リテラルを除去してからパース | ||
| let stripped = strip_string_literals(expression); | ||
|
|
||
| let keywords = [ | ||
| // 論理演算子・比較・制御構文 | ||
| "AND", | ||
| "OR", | ||
| "NOT", | ||
| "IN", | ||
| "IS", | ||
| "LIKE", | ||
| "BETWEEN", | ||
| "EXISTS", | ||
| "CASE", | ||
| "WHEN", | ||
| "THEN", | ||
| "ELSE", | ||
| "END", | ||
| // リテラル・真偽値 | ||
| "NULL", | ||
| "TRUE", | ||
| "FALSE", | ||
| // 関数 | ||
| "LENGTH", | ||
| "LOWER", | ||
| "UPPER", | ||
| "SUBSTR", | ||
| "ABS", | ||
| "ROUND", | ||
| "COALESCE", | ||
| "IFNULL", | ||
| "NULLIF", | ||
| "TRIM", | ||
| "LTRIM", | ||
| "RTRIM", | ||
| "MIN", | ||
| "MAX", | ||
| "AVG", | ||
| "COUNT", | ||
| "SUM", | ||
| "RANDOM", | ||
| "CHAR", | ||
| "HEX", | ||
| // その他 | ||
| "AS", | ||
| "CAST", | ||
| "COLLATE", | ||
| "GLOB", | ||
| "MATCH", | ||
| "REGEXP", | ||
| "CHECK", | ||
| "CONSTRAINT", | ||
| ]; | ||
|
|
||
| let mut columns = Vec::new(); | ||
|
|
||
| for cap in IDENTIFIER_REGEX.captures_iter(&stripped) { | ||
| let word = &cap[1]; | ||
| let upper = word.to_uppercase(); | ||
| if !keywords.contains(&upper.as_str()) && !columns.contains(&word.to_string()) { | ||
| columns.push(word.to_string()); | ||
| } | ||
| } |
There was a problem hiding this comment.
extract_columns_from_sqlite_check() のコメントでは「SQLキーワード・関数名・データ型を除外」とありますが、keywords に SQLite の代表的な型名(INTEGER/TEXT/REAL/BLOB/NUMERIC など)が含まれていないため、CAST(x AS INTEGER) のような式だと INTEGER をカラム名として誤抽出する可能性があります。型名も除外リストに含める、または CAST(... AS <type>) パターンでは <type> を除外するなどして、columns 推定が誤らないようにしてください。
- MySQL: restrict NOT NULL filter to simple single-column expressions only (e.g. `col IS NOT NULL`), preventing compound expressions like `(a > 0) AND (b IS NOT NULL)` from being incorrectly filtered out - SQLite: detect CAST(... AS <type>) pattern and exclude the type name from column extraction, avoiding false positives like "INTEGER" being treated as a column name without adding data types to keyword list - Add 9 tests: CAST type exclusion, date column preservation, and 7 MySQL filter tests covering NOT NULL (simple, no-parens, compound), ENUM validation (_chk_2, _chk_3), and user-defined constraint preservation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let check_sql = r#" | ||
| SELECT | ||
| con.conname::text, | ||
| pg_get_constraintdef(con.oid)::text AS check_expression, | ||
| COALESCE(string_agg(a.attname::text, ',' ORDER BY u.ord), '') AS columns | ||
| FROM pg_constraint con | ||
| JOIN pg_class c ON c.oid = con.conrelid | ||
| JOIN pg_namespace n ON n.oid = c.relnamespace | ||
| LEFT JOIN LATERAL unnest(con.conkey) WITH ORDINALITY AS u(attnum, ord) ON true | ||
| LEFT JOIN pg_attribute a ON a.attrelid = con.conrelid AND a.attnum = u.attnum | ||
| WHERE con.contype = 'c' | ||
| AND c.relname = $1 | ||
| AND n.nspname = 'public' | ||
| GROUP BY con.conname, con.oid | ||
| ORDER BY con.conname | ||
| "#; | ||
|
|
||
| let check_rows = sqlx::query(check_sql) | ||
| .bind(table_name) | ||
| .fetch_all(pool) | ||
| .await?; | ||
|
|
||
| for row in check_rows { | ||
| let _constraint_name: String = row.get(0); | ||
| let raw_expression: String = row.get(1); | ||
| let columns_str: String = row.get(2); | ||
| let columns: Vec<String> = if columns_str.is_empty() { | ||
| Vec::new() | ||
| } else { | ||
| columns_str | ||
| .split(',') | ||
| .map(|s| s.trim().to_string()) | ||
| .collect() | ||
| }; |
There was a problem hiding this comment.
PostgreSQL の CHECK 制約取得で columns を string_agg(..., ',') してから split(',') していますが、PostgreSQL の識別子はカンマを含められるため、カラム名に ',' が入ると誤分割します。配列デコード回避が目的なら、区切り文字を出現しにくい文字(例: ASCII Unit Separator)にするか、json_agg で JSON 配列として返して serde_json でデコードする方式にすると安全です。
| // 単純な `<col> is not null` パターンのみをフィルタ(複合式は除外しない) | ||
| // 例: "`col` is not null" → フィルタ, "`a` > 0 and `b` is not null" → 保持 | ||
| let is_not_null_check = { | ||
| let trimmed_norm = normalized.trim(); | ||
| trimmed_norm.ends_with("is not null") | ||
| && !trimmed_norm.contains(" and ") | ||
| && !trimmed_norm.contains(" or ") | ||
| }; | ||
| let is_enum_validation = { | ||
| // MySQL は _chk_1, _chk_2, ... の名前で ENUM バリデーションを自動生成する | ||
| let has_chk_suffix = constraint_name | ||
| .rfind("_chk_") | ||
| .map(|pos| { | ||
| constraint_name[pos + 5..] | ||
| .chars() | ||
| .all(|c| c.is_ascii_digit()) | ||
| }) | ||
| .unwrap_or(false); | ||
| has_chk_suffix && (normalized.contains("in (") || normalized.contains("in(")) | ||
| }; | ||
| if is_not_null_check || is_enum_validation { | ||
| continue; | ||
| } |
There was a problem hiding this comment.
MySQL の自動生成 CHECK フィルタで is_not_null_check は constraint_name に関係なく ... IS NOT NULL だけで除外しています。ユーザーが意図的に CHECK (col IS NOT NULL) を作っている場合も export から消えてしまうので、フィルタ対象を MySQL の自動生成っぽい名前(例: _chk_\d+)に限定する等、条件をもう少し絞った方が安全です。
| fn extract_columns_from_sqlite_check(expression: &str) -> Vec<String> { | ||
| // 文字列リテラルを除去してからパース | ||
| let stripped = strip_string_literals(expression); | ||
|
|
||
| let keywords = [ | ||
| // 論理演算子・比較・制御構文 | ||
| "AND", | ||
| "OR", | ||
| "NOT", | ||
| "IN", | ||
| "IS", | ||
| "LIKE", | ||
| "BETWEEN", | ||
| "EXISTS", | ||
| "CASE", | ||
| "WHEN", | ||
| "THEN", | ||
| "ELSE", | ||
| "END", | ||
| // リテラル・真偽値 | ||
| "NULL", | ||
| "TRUE", | ||
| "FALSE", | ||
| // 関数 | ||
| "LENGTH", | ||
| "LOWER", | ||
| "UPPER", | ||
| "SUBSTR", | ||
| "ABS", | ||
| "ROUND", | ||
| "COALESCE", | ||
| "IFNULL", | ||
| "NULLIF", | ||
| "TRIM", | ||
| "LTRIM", | ||
| "RTRIM", | ||
| "MIN", | ||
| "MAX", | ||
| "AVG", | ||
| "COUNT", | ||
| "SUM", | ||
| "RANDOM", | ||
| "CHAR", | ||
| "HEX", | ||
| // その他 | ||
| "AS", | ||
| "CAST", | ||
| "COLLATE", | ||
| "GLOB", | ||
| "MATCH", | ||
| "REGEXP", | ||
| "CHECK", | ||
| "CONSTRAINT", | ||
| ]; | ||
|
|
||
| // CAST(... AS <type>) パターンで <type> の位置を収集 | ||
| // 例: "CAST(x AS INTEGER)" → "INTEGER" のバイト開始位置を記録し、カラム名から除外する | ||
| // これによりデータ型名をキーワードリストに含めずとも、CAST 式内の型名を安全に除外できる | ||
| let upper_stripped = stripped.to_uppercase(); | ||
| let mut cast_type_positions = std::collections::HashSet::new(); | ||
| for m in upper_stripped.match_indices(" AS ") { | ||
| let after_as = m.0 + m.1.len(); | ||
| // AS の直後の空白をスキップ | ||
| let type_start = upper_stripped[after_as..] | ||
| .find(|c: char| !c.is_whitespace()) | ||
| .map(|p| after_as + p) | ||
| .unwrap_or(after_as); | ||
| cast_type_positions.insert(type_start); | ||
| } | ||
|
|
||
| let mut columns = Vec::new(); | ||
|
|
||
| for cap in IDENTIFIER_REGEX.captures_iter(&stripped) { | ||
| let word = &cap[1]; | ||
| let upper = word.to_uppercase(); | ||
| let start = cap.get(1).unwrap().start(); | ||
| if keywords.contains(&upper.as_str()) | ||
| || columns.contains(&word.to_string()) | ||
| || cast_type_positions.contains(&start) | ||
| { | ||
| continue; | ||
| } | ||
| columns.push(word.to_string()); | ||
| } | ||
|
|
||
| columns |
There was a problem hiding this comment.
SQLite の extract_columns_from_sqlite_check() は正規表現で識別子っぽい単語を拾ってキーワード配列で除外していますが、関数名(例: STRFTIME/JSON_VALID/INSTR 等)がキーワードに無い場合に「カラム」と誤認して columns に入ります。その結果、export した YAML が validate_constraint_references で「存在しないカラム参照」として弾かれる可能性があります。テーブルの実カラム名一覧を引数で渡して、その集合に含まれるものだけを columns に採用する方式にすると誤検出を避けられます。
| for cap in IDENTIFIER_REGEX.captures_iter(&stripped) { | ||
| let word = &cap[1]; | ||
| let upper = word.to_uppercase(); | ||
| let start = cap.get(1).unwrap().start(); | ||
| if keywords.contains(&upper.as_str()) | ||
| || columns.contains(&word.to_string()) | ||
| || cast_type_positions.contains(&start) | ||
| { | ||
| continue; | ||
| } | ||
| columns.push(word.to_string()); | ||
| } |
There was a problem hiding this comment.
extract_columns_from_sqlite_check() の重複排除が columns.contains(&word.to_string()) になっており、ループ毎に String を確保します。columns.iter().any(|c| c == word) にするか、HashSet で重複排除して最後に Vec に戻すと無駄な割り当てを避けられます。
- MySQL NOT NULL filter now requires _chk_N naming pattern to avoid filtering user-defined CHECK (col IS NOT NULL) constraints - PostgreSQL string_agg delimiter changed from comma to Unit Separator (U+001F) to safely handle column names containing commas - SQLite column extraction now uses PRAGMA table_info column names instead of keyword heuristic, eliminating false positives from function names and data type keywords - Fixed unnecessary String allocation in SQLite column deduplication Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| static IDENTIFIER_REGEX: LazyLock<Regex> = | ||
| LazyLock::new(|| Regex::new(r"\b([a-zA-Z_][a-zA-Z0-9_]*)\b").unwrap()); |
There was a problem hiding this comment.
IDENTIFIER_REGEX が ASCII のみを対象にしているため、SQLite の CHECK 式中に UTF-8 識別子(例: "名前")が含まれる場合に extract_columns_from_sqlite_check() でカラム名を抽出できず、columns が空になり得ます。SQLite/CLI 側では UTF-8 識別子の利用を想定しているようなので、Unicode 対応の識別子パターン(例: XID_Start/XID_Continue 等)にするか、table_columns を元に Unicode 対応の探索ロジックに置き換えることを検討してください。
| static IDENTIFIER_REGEX: LazyLock<Regex> = | |
| LazyLock::new(|| Regex::new(r"\b([a-zA-Z_][a-zA-Z0-9_]*)\b").unwrap()); | |
| /// Unicode 識別子 (XID_Start/XID_Continue) とアンダースコアを許可する。 | |
| static IDENTIFIER_REGEX: LazyLock<Regex> = | |
| LazyLock::new(|| Regex::new(r"\b([\p{XID_Start}_][\p{XID_Continue}_]*)\b").unwrap()); |
| /// MySQL の NOT NULL / ENUM フィルタロジックを再現するヘルパー | ||
| fn should_filter_mysql_check(constraint_name: &str, check_clause: &str) -> bool { | ||
| let lower = check_clause.trim().to_lowercase(); | ||
| let mut normalized = lower.as_str(); | ||
| loop { | ||
| if normalized.starts_with('(') && normalized.ends_with(')') { | ||
| let inner = &normalized[1..normalized.len() - 1]; | ||
| let mut depth = 0i32; | ||
| let mut matched = true; | ||
| for (i, ch) in inner.char_indices() { | ||
| match ch { | ||
| '(' => depth += 1, | ||
| ')' => { | ||
| depth -= 1; | ||
| if depth < 0 && i < inner.len() - 1 { | ||
| matched = false; | ||
| break; | ||
| } | ||
| } | ||
| _ => {} | ||
| } | ||
| } | ||
| if matched && depth == 0 { | ||
| normalized = inner.trim(); | ||
| } else { | ||
| break; | ||
| } | ||
| } else { | ||
| break; | ||
| } | ||
| } | ||
| let has_chk_suffix = constraint_name | ||
| .rfind("_chk_") | ||
| .map(|pos| { | ||
| constraint_name[pos + 5..] | ||
| .chars() | ||
| .all(|c| c.is_ascii_digit()) | ||
| }) | ||
| .unwrap_or(false); | ||
| let is_not_null_check = { | ||
| let trimmed_norm = normalized.trim(); | ||
| has_chk_suffix | ||
| && trimmed_norm.ends_with("is not null") | ||
| && !trimmed_norm.contains(" and ") | ||
| && !trimmed_norm.contains(" or ") | ||
| }; | ||
| let is_enum_validation = | ||
| has_chk_suffix && (normalized.contains("in (") || normalized.contains("in(")); | ||
| is_not_null_check || is_enum_validation | ||
| } |
There was a problem hiding this comment.
テスト内の should_filter_mysql_check() が本体のフィルタロジックを“再実装”しているため、将来本体側だけ変更された場合にテストが実装差分を検知できず、逆にテストだけが古くなって誤検知/見逃しの原因になります。フィルタ判定を非テストのヘルパー関数として切り出して get_constraints() とテストの両方から呼ぶ形にすると、テストが実装を直接カバーできます。
Change IDENTIFIER_REGEX from ASCII-only pattern to Unicode XID_Start / XID_Continue to correctly extract non-ASCII column names (e.g. Japanese) from SQLite CHECK constraint expressions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // ダブルクォート識別子をスキップ("" エスケープ対応) | ||
| if ch == '"' { | ||
| i += 1; | ||
| while i < len { | ||
| if chars[i].1 == '"' { | ||
| i += 1; | ||
| if i < len && chars[i].1 == '"' { | ||
| i += 1; | ||
| continue; | ||
| } | ||
| break; | ||
| } | ||
| i += 1; | ||
| } | ||
| continue; | ||
| } |
There was a problem hiding this comment.
parse_sqlite_check_constraints() skips single-quoted strings and double-quoted identifiers, but SQLite also allows identifier quoting with backticks (`...`) and square brackets ([...]). If a user created a table/column named CHECK using backticks/brackets, this scanner will mis-detect it as a CHECK constraint keyword and may parse the surrounding (...) as a constraint. Consider extending the skip logic to handle backtick- and bracket-quoted identifiers (including escaped ```` and ]]), and add a unit test for a backtick-quoted `CHECK` identifier to prevent regressions.
| while let Some(ch) = chars.next() { | ||
| if ch == '\'' { | ||
| // 文字列リテラルをスキップ | ||
| loop { | ||
| match chars.next() { | ||
| Some('\'') => { | ||
| // '' エスケープのチェック | ||
| if chars.peek() == Some(&'\'') { | ||
| chars.next(); | ||
| continue; | ||
| } | ||
| break; | ||
| } | ||
| None => break, | ||
| _ => {} | ||
| } | ||
| } | ||
| } else { | ||
| result.push(ch); | ||
| } | ||
| } |
There was a problem hiding this comment.
strip_string_literals() removes string literals without leaving any separator. For valid SQL like col = 'x'AND b > 0 (or col='x'ANDb>0), this can concatenate tokens (ANDb) and cause extract_columns_from_sqlite_check() to miss column identifiers, which then impacts generated CHECK constraint columns (and therefore deterministic constraint naming). Consider replacing removed literals with a single whitespace (or another delimiter) when the surrounding characters are identifier characters, to preserve token boundaries.
Summary
strata exportwas silently dropping CHECK constraints because all three database introspectors (PostgreSQL, MySQL, SQLite) were missing CHECK constraint queries in theirget_constraints()implementations. This PR adds CHECK constraint introspection for all dialects.Changes per dialect:
pg_constraintwithcontype = 'c', usespg_get_constraintdef()for the expression andstring_agg()for column names (since theAnydriver doesn't support array decoding)information_schema.check_constraintsjoined withtable_constraints(MySQL 8.0.16+), with filtering to exclude auto-generated NOT NULL and ENUM validation checksCREATE TABLEDDL stored insqlite_master, with nested parenthesis handlingAlso adds helper functions:
extract_columns_from_check_expression()- Extracts column names from MySQL backtick-quoted CHECK expressionsparse_sqlite_check_constraints()- Parses CHECK constraints from SQLite CREATE TABLE DDLextract_columns_from_sqlite_check()- Extracts column names from SQLite CHECK expressions by filtering SQL keywordsRelated Issue
Fixes #24
Type of Change
Test Plan
Added Tests (17 new tests):
SQLite CHECK constraint parsing:
test_parse_sqlite_check_simple- Basic single CHECK constrainttest_parse_sqlite_check_multiple- Multiple CHECK constraints in one tabletest_parse_sqlite_check_nested_parens- Nested parentheses in expressionstest_parse_sqlite_check_case_insensitive- Case-insensitive CHECK keywordtest_parse_sqlite_check_no_checks- Table without CHECK constraintsMySQL column extraction:
test_extract_columns_from_mysql_check_single- Single backtick-quoted columntest_extract_columns_from_mysql_check_multiple- Multiple columns in expressiontest_extract_columns_from_mysql_check_no_backticks- No backticks returns emptySQLite column extraction:
test_extract_columns_from_sqlite_check_simple- Single columntest_extract_columns_from_sqlite_check_with_and- Deduplication with ANDtest_extract_columns_from_sqlite_check_multiple_columns- Multiple columnsChecklist
cargo fmtandcargo clippycargo test)Summary by CodeRabbit
New Features
Tests