New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql,sql-parser: parse CREATE SOURCES #6299
Conversation
0ad202e
to
3d8c5c0
Compare
Gah, sorry to throw a wrench in things here, but can we workshop the syntax a bit? Repeating the connection string and publication name will be a chore, and also won't give us the I believe desirable property that the connection string and publication name are the same across all the sources? I was envisioning something less general purpose but more UX-focused, like:
If there are other folks (yourself included!) who have strong opinions on the subject, then probably a design doc is warranted! |
3d8c5c0
to
87c26f0
Compare
5c92203
to
4cbdae5
Compare
#[derive(Debug, Clone, PartialEq, Eq, Hash)] | ||
pub struct PgTable<T: AstInfo> { | ||
/// The name of the table to sync | ||
pub name: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these should be changed to UnresolvedObjectName
. We are parsing the column defs like a normal create table statement, I think we should parse the table name similarly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just taking a look now, that makes sense to me! Fixing. Thanks @mjibson !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might have some difficulties with
materialize/src/sql/src/postgres_util.rs
Line 40 in c325921
.query( |
format!("SELECT {}::regclass::oid", table)
(using params here didn't work in cockroach, but it might work in postgres. But the blah::regclass::oid
thing is the magic that will get you the OID of the table you want).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hit the same problem when I tried to implement your suggestion in the other PR. Do you know if you can use parameters with this type of query? If not I'd personally stick with the original query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@petrosagg would you want to switch back to avoid potential SQL injection? or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we parse the name as a valid ObjectName then it should be ok, but if we accept it as a string literal I think we should use the long form query
4cbdae5
to
2ab6123
Compare
src/sql/src/pure.rs
Outdated
table: &mut PgTable<Raw>, | ||
) -> Result<(), anyhow::Error> { | ||
let fetched_columns = | ||
postgres_util::fetch_columns(conn, namespace, &table.name.to_ast_string()).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't correct anymore. table.name
could be something like db.schema.table
which will be converted to a single string which will break the query in fetch_columns
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! Sorry, was pushing too quickly to take advantage of CI. I think I added a sufficient coverage for this in the new pg-cdc.td
test cases.
Uh I'm rethinking my suggestion about UnresolvedObjectName because it's more difficult than I realized. I'd do it in a followup PR. |
Edited: I think the changes I just pushed correctly swap over to |
aa35f6a
to
d45031e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the slow review! Took me a while to spin up. Left you comments within!
src/sql-parser/src/ast/defs/ddl.rs
Outdated
table: Option<PgTable<T>>, | ||
/// The tables to sync, if multiple tables | ||
/// Will be set for `CREATE SOURCES`, but not `CREATE SOURCE` | ||
tables: Option<Vec<PgTable<T>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like something that wants to be an enum?
enum PostgresConnectorTableSpec<T> {
Table(PgTable<T>),
Tables(Vec<PgTable<T>>),
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, thinking about it more, despite the duplication I think it will be safer if you create an entirely new fork of the AST for the multiple sources case.
enum Connector<T> {
// ...
Postgres {
conn: String,
publication: String,
table: String,
columns: Vec<ColumnDef<T>>,
}
}
enum MultiConnector {
Postgres {
conn: String,
publication: String,
tables: Vec<PgTable<T>>,
}
}
pub struct PgTable<T: AstInfo> {
pub name: UnresolvedObjectName,
pub alias: Option<T::ObjectName>,
pub columns: Vec<ColumnDef<T>>,
}
That way in the planner you won't have to be switching on table
vs tables
. It also makes me a little nervous when valid ASTs don't serialize to valid SQL; e.g., with the current design, you can construct an AST that renders to "CREATE SOURCE ... TABLES (...)" which is not actually valid SQL.
src/sql/src/pure.rs
Outdated
} | ||
if let Some(tables) = tables { | ||
for table in tables.iter_mut() { | ||
purify_postgres_connector(conn, table).await? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this case will ever get hit because this is within a CreateSource
branch, and this can only happen with CREATE SOURCES
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This ended up being part of the reason I proposed creating two forks of the AST.)
src/sql/src/normalize.rs
Outdated
if let Some(tables) = tables { | ||
for table in tables { | ||
visit_table_columns(&mut normalizer, table) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This is part of the reason I'm a bit sketched out by the AST in its current form; you'll never actually hit the tables
case here!)
src/sql/src/postgres_util.rs
Outdated
.strip_prefix("\"") | ||
.unwrap_or_else(|| &table) | ||
.strip_suffix("\"") | ||
.unwrap_or_else(|| &table) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, eek, this is going to mishandle table names that have single quotes in them, and possible identifiers with other sorts of special characters. I think the original query was definitely better here—sorry!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think I see what's going on here. The idea with using unresolved names is that you'd no longer need the NAMESPACE
option in CREATE SOURCE(S)
. Instead the namespace should come from the part before the dot; so
CREATE SOURCE blah FROM POSTGRES HOST 'foo:5432' TABLES (
public.one AS renamed
public.two AS renamed2
weird_schema.three
)
So then here you'd require that table
be an object name of two components, and pass the first component into the original query as the namespace and the second component into the original query as the table name.
85d2d30
to
3b7f96e
Compare
Parses statements like: CREATE SOURCES FROM POSTGRES HOST '...' PUBLICATION '...' NAMESPACE '...' TABLES (...) where each of the upstream sources shares a HOST, PUBLICATION, and NAMESPACE, but has its own table name, alias, and optional column definitions. Leaves planning and sequencing unimplemented.
3b7f96e
to
dd3c3ae
Compare
I've made the suggested parsing/purifying/AST changes, but I punted on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM!
Parses statements like:
where each of the upstream sources shares a
HOST
,PUBLICATION
, andNAMESPACE
, but has its own table name, alias, and optional columndefinitions. Leaves planning and sequencing unimplemented.
This change is