Better prepare_db to use latest preql features #24
I think the next step will be to merge the separate bigquery file into prepare_db.
sirupsen left a comment
Did you get this working against all of them?
How can we make it easy for others (inside the Datafold org) to do the same?
func create_indices(tbl) {
    tbl.add_index("id", true)
    tbl.add_index("timestamp")
    tbl.add_index(["id", "timestamp"])
Ideally, these would be created as part of the table definition instead of being added afterward. It's nicer to define them together.
I generally agree, but I'm not sure what the best syntax for it would be, especially for (id, timestamp).
Also, would that mean indexes are copied when you copy a table?
This just seemed like the simplest, most robust solution.
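For comparison, here is a rough sketch of what "indexes defined together with the table" could look like, written in Python rather than preql because the preql syntax is exactly what's undecided. `TableDef`, `Index`, and `to_sql()` are hypothetical names, not preql or data-diff APIs, and it assumes the `true` in `add_index("id", true)` means "unique".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Index:
    columns: Tuple[str, ...]
    unique: bool = False


@dataclass
class TableDef:
    """Hypothetical table definition that carries its own indexes."""
    name: str
    columns: Dict[str, str]  # column name -> SQL type
    indexes: List[Index] = field(default_factory=list)

    def to_sql(self) -> List[str]:
        # Emit CREATE TABLE followed by one CREATE INDEX per declared index.
        cols = ", ".join(f"{c} {t}" for c, t in self.columns.items())
        stmts = [f"CREATE TABLE {self.name} ({cols})"]
        for idx in self.indexes:
            kind = "UNIQUE INDEX" if idx.unique else "INDEX"
            idx_name = f"idx_{self.name}_{'_'.join(idx.columns)}"
            stmts.append(
                f"CREATE {kind} {idx_name} ON {self.name} ({', '.join(idx.columns)})"
            )
        return stmts


rating = TableDef(
    "rating",
    {"id": "INT", "timestamp": "TIMESTAMP"},
    indexes=[
        Index(("id",), unique=True),
        Index(("timestamp",)),
        Index(("id", "timestamp")),  # the composite index is just a multi-column tuple
    ],
)
for stmt in rating.to_sql():
    print(stmt)
```

In this sketch, anything that copies the definition also carries the indexes. Whether copying a table in the database should copy its indexes is a separate question that the sketch doesn't answer.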
drop_table("rating_update001p")
drop_table("rating_update1p")
drop_table("rating_del1p")
drop_table("rating_update50p")
Should .drop on the table definition be a preql construct?
There is already remove_table_if_exists(), which does the same thing, but it doesn't print the SQL.
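If the only gap is echoing the SQL, a single helper with an opt-in echo flag could cover both cases. A minimal Python sketch, assuming a DB-API style connection (sqlite3 here only for demonstration) and a database that supports `DROP TABLE IF EXISTS`; this `drop_table` is a hypothetical helper, not the preql builtin.

```python
import sqlite3


def drop_table(conn, name: str, echo: bool = True) -> str:
    """Drop `name` if it exists, printing the SQL first when echo is set.

    `name` is interpolated directly, so it must be a trusted identifier.
    """
    sql = f"DROP TABLE IF EXISTS {name}"
    if echo:
        print(sql)
    conn.execute(sql)
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rating_del1p (id INTEGER)")
for tbl in ["rating_update001p", "rating_update1p", "rating_del1p", "rating_update50p"]:
    drop_table(conn, tbl)  # missing tables are skipped silently by IF EXISTS
```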
run_sql("RM @~/ratings.csv.gz")
run_sql("PUT file://dev/ratings.csv @~")
if (db_type == "snowflake" or db_type == "redshift") {
What could we do to remove the cloud databases as special cases?
I can do something like db.is_cloud(), but that would only solve this particular `if`.
I don't see how we can get rid of db-specific code.
What we can do is have a separate module for each db that implements import_sample_csv(), and have the main module call it. That way, at least the db-specific code will be kept separate.
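The per-db module idea could look roughly like the following Python sketch: a registry maps `db_type` to an importer, and the main module only calls `import_sample_csv()`. The registry, the `importer` decorator, and the `rating` table name are hypothetical. The Snowflake statements are the ones from the snippet above (its later load step is omitted). The DuckDB one uses DuckDB's `read_csv_auto`. Redshift, which shares the original `if` branch, is left out because its steps aren't shown here.

```python
from typing import Callable, Dict, List

RunSql = Callable[[str], None]
_IMPORTERS: Dict[str, Callable[[RunSql], None]] = {}


def importer(*db_types: str):
    """Register the decorated function as the CSV importer for each db_type."""
    def register(fn):
        for db_type in db_types:
            _IMPORTERS[db_type] = fn
        return fn
    return register


@importer("snowflake")
def _import_snowflake(run_sql: RunSql) -> None:
    # Stage the file in the user stage, as in the snippet under review.
    run_sql("RM @~/ratings.csv.gz")
    run_sql("PUT file://dev/ratings.csv @~")
    # ...the step that loads the staged file into a table is omitted.


@importer("duckdb")
def _import_duckdb(run_sql: RunSql) -> None:
    # DuckDB reads the local CSV directly, so no staging is needed.
    run_sql("CREATE TABLE rating AS SELECT * FROM read_csv_auto('dev/ratings.csv')")


def import_sample_csv(db_type: str, run_sql: RunSql) -> None:
    """Main-module entry point: dispatch to the db-specific importer."""
    try:
        fn = _IMPORTERS[db_type]
    except KeyError:
        raise NotImplementedError(f"no CSV importer for {db_type!r}") from None
    fn(run_sql)


executed: List[str] = []
import_sample_csv("snowflake", executed.append)  # record SQL instead of running it
print(executed)
```

In a real layout each importer would live in its own module and register itself on import. They share one block here only to keep the sketch self-contained.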
Not sure I understand the question. You just need to set up the databases and run it.
…hema Support 3-part paths in DuckDB
A step in the right direction for better integration with preql