Skip to content
This repository was archived by the owner on May 17, 2024. It is now read-only.

Conversation

@erezsh
Copy link
Contributor

@erezsh erezsh commented May 13, 2022

A step in the right direction for better integration with preql

@erezsh erezsh requested review from cfernhout and sirupsen and removed request for sirupsen May 13, 2022 10:53
@erezsh
Copy link
Contributor Author

erezsh commented May 13, 2022

I think next step will be to merge the separate bigquery file into prepare_db

Copy link
Contributor

@sirupsen sirupsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you get this against all of them?

How can we make it easy for others (inside the Datafold org), to do the same?

func create_indices(tbl) {
tbl.add_index("id", true)
tbl.add_index("timestamp")
tbl.add_index(["id", "timestamp"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally these would be created as part of the table definition instead of built afterward. It's nicer if that's defined together

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I agree, but I'm not sure what's the best way to do it, syntax-wise.

Especially on (id, timestamp).

Also, does it mean that indexes are copied when you copy a table?

This just seemed like the simplest, most robust solution.

drop_table("rating_update001p")
drop_table("rating_update1p")
drop_table("rating_del1p")
drop_table("rating_update50p")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should .drop on the table definition be a preql construct?

Copy link
Contributor Author

@erezsh erezsh May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already remove_table_if_exists(), which does the same thing. But it doesn't print out the SQL.


run_sql("RM @~/ratings.csv.gz")
run_sql("PUT file://dev/ratings.csv @~")
if (db_type == "snowflake" or db_type == "redshift") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What could we do to remove the Cloud databases as special-cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do something like db.is_cloud(), but that would only solve this specific if.

I don't see how we can get rid of db-specific code.

What we can do is have an separate module for each db, implementing import_sample_csv(), and have the main module call it. So at least the db-specific stuff will be separated.

@erezsh
Copy link
Contributor Author

erezsh commented May 16, 2022

Did you get this against all of them?

How can we make it easy for others (inside the Datafold org), to do the same?

Not sure I understand the question.. You just need to set up the databases and run it.

@erezsh erezsh merged commit 199067c into master May 17, 2022
nolar pushed a commit that referenced this pull request Apr 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants