importccl: Allow wildcards in cloud storage URLs #40522
Labels
A-disaster-recovery
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Comments
rolandcrosby added the C-enhancement and A-disaster-recovery labels on Sep 5, 2019
g3orgia pushed a commit to g3orgia/cockroach that referenced this issue on Sep 16, 2019:

This adds support for using wildcard characters to specify the list of files to import from. Using cloud provider SDKs, we list files that could potentially match a pattern in the URI, then match files using specs as detailed [here](https://golang.org/pkg/path/filepath/#Match). This listing capability was added to the `ExportStorage` interface. Within `import_stmt`, we use the newly added interface to expand the listed files. Note that we will import the expanded list of files, but the job description will show the originally listed files (with wildcards).

Fixes: cockroachdb#40522

Release note (enterprise change): Within an import statement, users will be able to specify CSV filenames using wildcard characters.

Release justification: This will not be released in 19.2
g3orgia pushed a commit to g3orgia/cockroach that referenced this issue on Sep 18, 2019:

This adds support for using wildcard characters to specify the list of files to import from. Using cloud provider SDKs, we list files that could potentially match a pattern in the URI, then match files using specs as detailed [here](https://golang.org/pkg/path/filepath/#Match). This listing capability was added to the `ExportStorage` interface. Within `import_stmt`, we use the newly added interface to expand the listed files. Note that we will import the expanded list of files, but the job description will show the originally listed files (with wildcards). This also adds an option that turns this feature off: `WITH disable_glob_matching`.

Fixes: cockroachdb#40522

Release note (enterprise change): Within an import statement, users will be able to specify CSV filenames using wildcard characters. This behavior can be disabled with the following option: `WITH disable_glob_matching`.

Release justification: This will not be released in 19.2
g3orgia pushed a commit to g3orgia/cockroach that referenced this issue on Oct 7, 2019, with the same commit message as the Sep 18, 2019 commit.
craig bot pushed a commit that referenced this issue on Oct 9, 2019:

40714: importccl: Allow wildcards in import URIs r=g3orgia a=g3orgia

This adds support for using wildcard characters to specify the list of files to import from. Using cloud provider SDKs, we list files that could potentially match a pattern in the URI, then match files using specs as detailed [here](https://golang.org/pkg/path/filepath/#Match). This listing capability was added to the `ExportStorage` interface. Within `import_stmt`, we use the newly added interface to expand the listed files. Note that we will import the expanded list of files but the job description would show the originally listed files (with wildcards).

Fixes: #40522

Release note: None

Co-authored-by: Georgia Hong <georgiah@cockroachlabs.com>
Historically, we have required all files being `IMPORT`ed into CockroachDB to be listed as separate URLs. This is workable but extremely clunky in practice, especially since cloud storage credentials need to be listed in every URL.

For cloud storage URLs (s3, gs, azure), we can use the providers' SDKs to get a list of files under a given prefix. We could use this to let a user specify file patterns to match instead of explicitly specifying every file.

We can start by adding support for the `*` wildcard character, and should aim to support the following types of match:

- `s3://bucket-name/path/to/data/*`
- `s3://bucket-name/files/*.csv`
- `s3://bucket-name/files/data*`
- `s3://bucket-name/files/data*.csv`

These should only look at files directly under the specified path, and should not descend into additional directories recursively. (That said, if it's possible to cheaply add that functionality, we could consider adding a `**` wildcard as a separate feature in the future.)