Add configurations to allow tempdir and Redshift cluster to be in different AWS regions #87
Come to think of it, there are cases where we want to support cross-region transfers. Therefore, we might choose to split this into two separate issues: giving a more informative warning message and giving instructions on how to configure the cross-region UNLOAD command. As far as I know, there's not an easy way to determine the cluster's region over JDBC, so I don't know that we'd be able to automatically figure out the correct UNLOAD command: http://stackoverflow.com/q/32545040/590203
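(As an aside: since JDBC doesn't expose the cluster's region, it could in principle be looked up out-of-band with the AWS SDK. A minimal sketch, assuming the AWS SDK for Java 1.11+ is on the classpath and the cluster identifier is known — nothing here is part of spark-redshift itself:)

```scala
import com.amazonaws.services.redshift.AmazonRedshiftClientBuilder
import com.amazonaws.services.redshift.model.DescribeClustersRequest

object ClusterRegionLookup {
  // Looks up the cluster's availability zone via the Redshift API and
  // strips the trailing AZ letter, e.g. "us-east-1c" -> "us-east-1".
  // Caveats: requires redshift:DescribeClusters permission, and the API
  // call is itself region-scoped, so the caller may have to try several
  // regions -- which is exactly why this can't be done over JDBC alone.
  def regionOf(clusterId: String): String = {
    val client = AmazonRedshiftClientBuilder.defaultClient()
    val request = new DescribeClustersRequest().withClusterIdentifier(clusterId)
    val az = client.describeClusters(request).getClusters.get(0).getAvailabilityZone
    az.dropRight(1)
  }
}
```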
Having …
That sounds reasonable to me; I was considering doing something similar. I think that …
I'm a bit overloaded with other work at the moment and this task isn't part of our current sprint, so this is up for grabs if anyone wants to work on it. I do have time to review / revise small patches to …
Now that #35 has been merged, this can be worked around using the new …
Users: please comment on this thread to vote on this issue if it's important to you. I'd like to implement this but am holding off until I hear about more demand for this feature, since I have limited time to devote to …
While the COPY command supports a REGION option, it appears that the UNLOAD command doesn't (see http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html). I'm glad to add this functionality, but it appears it would only be one-way (write-only).
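(For concreteness, a rough sketch of the asymmetry: COPY accepts a REGION option while UNLOAD does not. The connection details, table, and bucket below are placeholders, not anything from this issue:)

```scala
import java.sql.DriverManager

object CrossRegionCopySketch {
  def main(args: Array[String]): Unit = {
    // Placeholder cluster in us-east-1; staging bucket in us-west-2.
    val conn = DriverManager.getConnection(
      "jdbc:redshift://examplecluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev",
      "masteruser", "password")
    try {
      // COPY takes a REGION option, so cross-region loads into the
      // cluster (i.e. spark-redshift writes) can work...
      conn.createStatement().execute(
        """COPY my_table
          |FROM 's3://my-west-bucket/tmp/part-'
          |CREDENTIALS 'aws_access_key_id=<id>;aws_secret_access_key=<key>'
          |REGION 'us-west-2'""".stripMargin)
      // ...but UNLOAD has no REGION option, hence the write-only limitation.
    } finally conn.close()
  }
}
```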
If you want this functionality yourself, I would be happy to accept a patch for it; the one-way limitation is fine.
I was just finishing the unit test for this when I realized that there's already a trivial workaround with the existing …
+1 We have a use case where we pull data from Redshift for several different clients and would prefer to use only a single S3 bucket instead of having an S3 bucket in every region. This will be very helpful.
@karanveerm, note the limitation described in #87 (comment): cross-region support would be one-way (write-only), since UNLOAD lacks COPY's REGION option.
Given this limitation, I don't think we'll be able to support your use case of using a single bucket to pull data from several clients. However, I believe that you could use a single bucket as the staging area for writes by using the … option (see the sketch below).
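(A minimal sketch of that write-side workaround, assuming the option elided above is spark-redshift's extracopyoptions parameter, which appends extra text to the generated COPY statement; the URL, table, and bucket are placeholders:)

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical cross-region write: the cluster lives in one region while
// the tempdir staging bucket is in us-west-2. The extracopyoptions value
// is passed through verbatim to the COPY statement that the library issues.
def writeCrossRegion(df: DataFrame): Unit = {
  df.write
    .format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://examplecluster:5439/dev?user=u&password=p")
    .option("dbtable", "my_table")
    .option("tempdir", "s3n://my-west-bucket/tmp/")
    .option("extracopyoptions", "REGION 'us-west-2'")
    .mode(SaveMode.ErrorIfExists)
    .save()
}
```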
By default, S3 <-> Redshift copies will not work if the S3 bucket and Redshift cluster are in different AWS regions. If you try to use a bucket in a different region, you get a confusing error message; see https://forums.databricks.com/questions/1963/why-spark-redshift-can-not-write-s3-bucket.html for one example.
Note that it is technically possible to use a bucket in a different region if you pass an extra `region` parameter to the COPY command; see https://sqlhaven.wordpress.com/2014/09/07/common-errors-of-redshift-copy-command-and-how-to-solve-them-part-1/ for one example of this.