importccl: Allow wildcards in cloud storage URLs #40522

Closed
rolandcrosby opened this issue Sep 5, 2019 · 0 comments · Fixed by #40714
Labels
A-disaster-recovery C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)


@rolandcrosby

Historically, we have required all files being IMPORTed into CockroachDB to be listed as separate URLs. This is workable but extremely clunky in practice, especially since cloud storage credentials need to be listed in every URL.

For cloud storage URLs (s3, gs, azure), we can use the providers' SDKs to get a list of files under a given prefix. We could use this to let a user specify file patterns to match instead of explicitly specifying every file.

We can start by adding support for the * wildcard character, and should aim to support the following types of match:

  • All files in a given directory: s3://bucket-name/path/to/data/*
  • All files in a given directory that end with a given string: s3://bucket-name/files/*.csv
  • All files in a given directory that start with a given string: s3://bucket-name/files/data*
  • All files in a given directory that start and end with a given string: s3://bucket-name/files/data*.csv
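The four match types above line up with Go's `path.Match` pattern syntax (the commits below cite `filepath.Match`, which uses the same syntax; `path.Match` always treats `/` as the separator, which suits object keys). A minimal sketch, assuming the scheme and bucket have already been stripped so that only the object key is matched; `matchesPattern` and the sample keys are hypothetical:

```go
package main

import (
	"fmt"
	"path"
)

// matchesPattern reports whether an object key (the path part of a cloud
// storage URL, e.g. "files/data1.csv") matches a wildcard pattern. It uses
// path.Match, whose '*' matches any run of characters except '/'.
func matchesPattern(pattern, key string) bool {
	ok, err := path.Match(pattern, key)
	// path.Match returns path.ErrBadPattern only for malformed patterns,
	// such as an unclosed '['; this sketch treats those as non-matches.
	return err == nil && ok
}

func main() {
	keys := []string{
		"files/data1.csv",
		"files/data2.csv",
		"files/data1.txt",
		"files/readme.csv",
		"files/sub/data3.csv", // nested one level deeper
	}
	patterns := []string{
		"files/*",         // all files directly under files/
		"files/*.csv",     // ... that end with .csv
		"files/data*",     // ... that start with data
		"files/data*.csv", // ... that start with data and end with .csv
	}
	for _, p := range patterns {
		var matched []string
		for _, k := range keys {
			if matchesPattern(p, k) {
				matched = append(matched, k)
			}
		}
		fmt.Println(p, "->", matched)
	}
}
```

With these sample keys, `files/*` matches the four keys directly under `files/` but not `files/sub/data3.csv`, because `*` never matches `/`.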

These should only look at files directly under the specified path, and should not descend into additional directories recursively. (That said, if it's possible to cheaply add that functionality, we could consider adding a ** wildcard as a separate feature in the future).

rolandcrosby added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-disaster-recovery labels Sep 5, 2019
g3orgia pushed a commit to g3orgia/cockroach that referenced this issue Sep 16, 2019
This adds support for using wildcard characters
to specify the list of files to import from. Using
cloud provider SDKs, we list files that could potentially
match a pattern in the URI, then match files using specs
as detailed [here](https://golang.org/pkg/path/filepath/#Match).
This listing capability was added to the `ExportStorage` interface.

Within `import_stmt`, we use the newly added interface
to expand the listed files. Note that we will import
the expanded list of files but the job description would
show the originally listed files (with wildcards).

Fixes: cockroachdb#40522

Release note (enterprise change):
Within an import statement, users will be able to specify
CSV filenames using wildcard characters.

Release justification: This will not be released in 19.2
g3orgia pushed a commit to g3orgia/cockroach that referenced this issue Sep 18, 2019
This adds support for using wildcard characters
to specify the list of files to import from. Using
cloud provider SDKs, we list files that could potentially
match a pattern in the URI, then match files using specs
as detailed [here](https://golang.org/pkg/path/filepath/#Match).
This listing capability was added to the `ExportStorage` interface.

Within `import_stmt`, we use the newly added interface
to expand the listed files. Note that we will import
the expanded list of files but the job description would
show the originally listed files (with wildcards).

This also adds an option flag that can specify turning off
this feature `WITH disable_glob_matching`.

Fixes: cockroachdb#40522

Release note (enterprise change):
Within an import statement, users will be able to specify
CSV filenames using wildcard characters. This behavior can
be disabled with the following option: `WITH disable_glob_matching`.

Release justification: This will not be released in 19.2
g3orgia pushed a commit to g3orgia/cockroach that referenced this issue Oct 7, 2019
craig bot pushed a commit that referenced this issue Oct 9, 2019
40714: importccl: Allow wildcards in import URIs r=g3orgia a=g3orgia

This adds support for using wildcard characters to specify the list of files to import from. Using cloud provider SDKs, we list files that could potentially match a pattern in the URI, then match files using specs as detailed [here](https://golang.org/pkg/path/filepath/#Match). This listing capability was added to the `ExportStorage` interface.

Within `import_stmt`, we use the newly added interface to expand the listed files. Note that we will import the expanded list of files but the job description would show the originally listed files (with wildcards).

Fixes: #40522

Release note: None

Co-authored-by: Georgia Hong <georgiah@cockroachlabs.com>
craig bot closed this as completed in 2c7315b Oct 9, 2019