Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Import/Export] Exported databases can now be safely moved #5965

Merged
merged 7 commits into from
Jan 27, 2023

Conversation

Tishj
Copy link
Contributor

@Tishj Tishj commented Jan 23, 2023

This PR fixes #5868

Previously we would EXPORT and bake the absolute path of the table data into the query e.g path/to/exported/db/tbl.csv
When we only really need the tbl.csv part, because the path of the db folder is already provided on IMPORT.

Now we prefix the path with a placeholder on EXPORT, and replace it with the db folder path on IMPORT.

@Tishj
Copy link
Contributor Author

Tishj commented Jan 23, 2023

Actually, do we even need the placeholder?

@Mytherin
Copy link
Collaborator

Mytherin commented Jan 23, 2023

Thanks for the PR!

While the placeholder approach is relatively straightforward I do see some potential issues with this:

  • It won't work correctly with older versions of DuckDB - as they won't know how to handle a placeholder, and won't emit a placeholder
  • It won't work when loading into other systems, e.g. Postgres
  • Global text replacement is a bit hacky, since the IMPORT file can contain arbitrary user-defined input (e.g. the actual file name)

Perhaps we can use the Parser for this - for example we can run the parser on the load.sql file. It should only contain COPY statements - and we can directly extract and modify the path for each of the CopyStatements within the file. We could then implement CopyStatement::ToString (similar to what we have for SelectStatement) to stringify it again. That should work while leaving the export code untouched.

@Tishj
Copy link
Contributor Author

Tishj commented Jan 23, 2023

You mean from new version -> old version
Is that supported in general?

Because for old version -> new version this should still work
it doesn't assert that the file is prefixed with the placeholder, only when it does it replaces it

The other points are fair though
I like the approach you suggested, I felt the string replacements were kinda dirty/error-prone to begin with

@Mytherin
Copy link
Collaborator

You mean from new version -> old version
Is that supported in general?

It used to work, so I would prefer not to break it unnecessarily

@Tishj
Copy link
Contributor Author

Tishj commented Jan 25, 2023

I've made some changes to the statement verifier to include the copy statement

Though I went a little overboard and implemented Equals for every Statement
I think I'll make a separate PR out of it, and also add Copy + Equals verification for all statements

Found and fixed a few bugs in this ToString that I need to move over to this PR, but other than that it should be done

}

string CopyStatement::ToString() const {
string result;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have verification for ToString round-trips for INSERT/DELETE/SELECT statements in src/main/client_context.cpp:670 (in PendingStatementOrPreparedStatementInternal). Perhaps add the COPY statement there as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I completely missed that, my mind instantly went to StatementVerifier
so I made changes there to support other types of statements than Select, but that required skipping a lot of the functionality of the StatementVerifier, because we can't execute them - because of side-effects.

So this will definitely be better
Also a good place to add the Equals check I think

@Mytherin Mytherin merged commit ef706e0 into duckdb:master Jan 27, 2023
@Mytherin
Copy link
Collaborator

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Absolute paths in EXPORT DATABASE load.sql
2 participants