dbConnect and noctua::athena() only pulls in default profile #90
Comments
Hi @mraess, this behaviour is correct. What I think you want to do is change the default profile:

library(DBI)
library(noctua)

# changing profile name but keeping the same schema
con1 <- dbConnect(athena())
con2 <- dbConnect(athena(),
                  profile_name = "demo_profile")

# changing schema/database
con3 <- dbConnect(athena(),
                  schema_name = "temp")
con4 <- dbConnect(athena(),
                  profile_name = "demo_profile",
                  schema_name = "temp")
dbGetInfo(con1)
# $profile_name
# [1] "default"
#
# $s3_staging
# [1] "s3://mybucket/"
#
# $dbms.name
# [1] "default"
#
# $work_group
# [1] "primary"
#
# $poll_interval
# NULL
#
# $encryption_option
# NULL
#
# $kms_key
# NULL
#
# $expiration
# NULL
#
# $region_name
# [1] "eu-west-1"
#
# $paws
# [1] "0.1.7"
#
# $noctua
# [1] "1.6.0.9000"
dbGetInfo(con2)
# $profile_name
# [1] "demo_profile"
#
# $s3_staging
# [1] "s3://mybucket/"
#
# $dbms.name
# [1] "default"
#
# $work_group
# [1] "primary"
#
# $poll_interval
# NULL
#
# $encryption_option
# NULL
#
# $kms_key
# NULL
#
# $expiration
# NULL
#
# $region_name
# [1] "eu-west-1"
#
# $paws
# [1] "0.1.7"
#
# $noctua
# [1] "1.6.0.9000"
dbGetInfo(con3)
# $profile_name
# [1] "default"
#
# $s3_staging
# [1] "s3://mybucket/"
#
# $dbms.name
# [1] "temp"
#
# $work_group
# [1] "primary"
#
# $poll_interval
# NULL
#
# $encryption_option
# NULL
#
# $kms_key
# NULL
#
# $expiration
# NULL
#
# $region_name
# [1] "eu-west-1"
#
# $paws
# [1] "0.1.7"
#
# $noctua
# [1] "1.6.0.9000"
dbGetInfo(con4)
# $profile_name
# [1] "demo_profile"
#
# $s3_staging
# [1] "s3://mybucket/"
#
# $dbms.name
# [1] "temp"
#
# $work_group
# [1] "primary"
#
# $poll_interval
# NULL
#
# $encryption_option
# NULL
#
# $kms_key
# NULL
#
# $expiration
# NULL
#
# $region_name
# [1] "eu-west-1"
#
# $paws
# [1] "0.1.7"
#
# $noctua
# [1] "1.6.0.9000"

I hope this makes sense :)
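For reference, switching profile_name like this assumes a ~/.aws/credentials file with multiple named profiles. A hypothetical layout (the profile names match the example above; the keys are placeholders, not real credentials) might look like:

```ini
; ~/.aws/credentials - one section per named profile
[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

[demo_profile]
aws_access_key_id = AKIAYYYYYYYYYYYYYYYY
aws_secret_access_key = yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
```

With this file in place, dbConnect(athena(), profile_name = "demo_profile") should resolve the second set of keys.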
When you make a connection and you have multiple schemas in your Athena instance, you can still query them, for example:

library(DBI)
library(noctua)
# changing profile name but same schema
con1 <- dbConnect(athena())
dbGetQuery(con1, "select * from temp.iris_parquet")
# Info: (Data scanned: 666 Bytes)
# petal_length petal_width time_stamp
# 1: 1.4 0.2 20200421
# 2: 1.4 0.2 20200421
# 3: 1.3 0.2 20200421
# 4: 1.5 0.2 20200421
# 5: 1.4 0.2 20200421
# ---
# 146: 5.2 2.3 20200421
# 147: 5.0 1.9 20200421
# 148: 5.2 2.0 20200421
# 149: 5.4 2.3 20200421
# 150:          5.1         2.3   20200421

The benefit of setting the … is that dplyr can reference tables without the schema prefix:

library(dplyr)
con2 <- dbConnect(athena(),
schema_name = "temp")
tbl(con2, "iris_parquet")
# Info: (Data scanned: 666 Bytes)
# Source: table<iris_parquet> [?? x 3]
# Database: Athena 0.1.7 [default@eu-west-1/temp]
# petal_length petal_width time_stamp
# <dbl> <dbl> <chr>
# 1 1.4 0.2 20200421
# 2 1.4 0.2 20200421
# 3 1.3 0.2 20200421
# 4 1.5 0.2 20200421
# 5 1.4 0.2 20200421
# 6 1.7 0.4 20200421
# 7 1.4 0.3 20200421
# 8 1.5 0.2 20200421
# 9 1.4 0.2 20200421
# 10 1.5 0.1 20200421
# … with more rows

However, this can still be achieved from the default connection with dbplyr::in_schema:

tbl(con1, dbplyr::in_schema("temp", "iris_parquet"))
# Info: (Data scanned: 666 Bytes)
# Source: table<temp.iris_parquet> [?? x 3]
# Database: Athena 0.1.7 [default@eu-west-1/default]
# petal_length petal_width time_stamp
# <dbl> <dbl> <chr>
# 1 1.4 0.2 20200421
# 2 1.4 0.2 20200421
# 3 1.3 0.2 20200421
# 4 1.5 0.2 20200421
# 5 1.4 0.2 20200421
# 6 1.7 0.4 20200421
# 7 1.4 0.3 20200421
# 8 1.5 0.2 20200421
# 9 1.4 0.2 20200421
# 10 1.5 0.1 20200421
# … with more rows

I hope this has helped with your initial query. I will leave this open just in case I misunderstood your initial question.
Hey, thanks for the quick answer - I'll certainly give this a shot. One question though: wouldn't the different s3 staging directories determine which database it connects to? And how does it know what the standard database is, which I'm connecting to right now? I guess what I'm asking is: if I have all my tables in AWS Athena, but some are under …

Let me know if this makes sense :)
By default … I am not sure how to get the …
Haha, sounds good - I just might have to hard-code the …
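If hard-coding does become necessary, noctua's dbConnect() accepts credentials directly as parameters. A minimal sketch - the bucket name and key values below are placeholders, not values from this thread:

```r
library(DBI)
library(noctua)

# Pass credentials explicitly instead of relying on a named profile in
# ~/.aws/credentials. Key/secret/bucket values here are placeholders.
con <- dbConnect(athena(),
                 aws_access_key_id     = "AKIAXXXXXXXXXXXXXXXX",
                 aws_secret_access_key = "xxxxxxxxxxxxxxxxxxxx",
                 s3_staging_dir        = "s3://mybucket/",
                 region_name           = "us-east-1")
```

Hard-coding keys in scripts is generally discouraged; profiles or environment variables are safer, but this can be useful for isolating whether profile resolution is the problem.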
It should be able to query across schemas, so for example:

library(DBI)
library(noctua)
con <- dbConnect(athena(),
schema_name = "default")
# querying default schema
dbGetQuery(con, "select * from iris")
# Info: (Data scanned: 860 Bytes)
# sepal_length sepal_width petal_length petal_width species
# 1: 5.1 3.5 1.4 0.2 setosa
# 2: 4.9 3.0 1.4 0.2 setosa
# 3: 4.7 3.2 1.3 0.2 setosa
# 4: 4.6 3.1 1.5 0.2 setosa
# 5: 5.0 3.6 1.4 0.2 setosa
# ---
# 146: 6.7 3.0 5.2 2.3 virginica
# 147: 6.3 2.5 5.0 1.9 virginica
# 148: 6.5 3.0 5.2 2.0 virginica
# 149: 6.2 3.4 5.4 2.3 virginica
# 150: 5.9 3.0 5.1 1.8 virginica
# querying temp schema
dbGetQuery(con, "select * from temp.iris_parquet")
# Info: (Data scanned: 666 Bytes)
# petal_length petal_width time_stamp
# 1: 1.4 0.2 20200421
# 2: 1.4 0.2 20200421
# 3: 1.3 0.2 20200421
# 4: 1.5 0.2 20200421
# 5: 1.4 0.2 20200421
# ---
# 146: 5.2 2.3 20200421
# 147: 5.0 1.9 20200421
# 148: 5.2 2.0 20200421
# 149: 5.4 2.3 20200421
# 150: 5.1 1.8 20200421
Thanks for the quick replies!! I'll check this out for sure. Thanks also for keeping the issue open for now!
Hey, so I implemented your suggestions - unfortunately to no avail. I have been able to narrow the problem down some more though. A little more context: both … When I replace the key/secret for …

con_2 <- dbConnect(noctua::athena(),
                   profile_name = "other-profile-that-is-not-default",
                   s3_staging_dir = "s3://aws-athena-query-results-for-other-profile-us-east-1/",
                   region = "us-east-1")

pointing to a non-default profile. This leads me to believe that, for some reason, only the …

Another thing I tried was to put SAML credentials into the … In the …

This however results in an error:

Error: UnrecognizedClientException (HTTP 400). The security token included in the request is invalid.

Anyways, these might be two different problems altogether, but it seems like the function is only picking up the … Again, any insights and help are greatly appreciated! I'm really hoping to figure this out :)
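For context on the SAML attempt above: temporary STS credentials written to ~/.aws/credentials normally carry a third field, aws_session_token, alongside the key and secret, and temporary access keys start with "ASIA" rather than "AKIA". A hypothetical profile entry (all values are placeholders):

```ini
; ~/.aws/credentials - temporary STS/SAML session entry (placeholder values)
[saml-profile]
aws_access_key_id = ASIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
aws_session_token = FwoGZXIvYXdzE...
```

An "invalid security token" error like the one above typically means the session token was not passed through to the service call, or the temporary session has expired.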
Will have a little look tomorrow :)

Awesome, thanks!!
I am just a little confused with what is going on, sorry. Do you mind running through it with me again so I have it straight in my head :) Please correct me if I have misunderstood.

Your initial comment was able to make the connection to both AWS accounts but not to the right databases, as you were able to get connection objects from both … With your initial …:

con_2 <- dbConnect(noctua::athena(),
                   profile_name = "other-profile-that-is-not-default",
                   schema_name = "prod-a",
                   s3_staging_dir = "s3://aws-athena-query-results-for-other-profile-us-east-1/")
Hey, thanks for following up. I'm happy to provide some more details. Yes, I tried the code at the bottom but that did not work. For context, assume that … However, I tried to narrow the use case down even more to make potential debugging easier :) I wrote a short test script, which I then ran from the terminal and which prints out a list of tables under the connection. This allowed me to narrow down the problems more:

library(noctua)
library(DBI)

con_3 <- dbConnect(noctua::athena(),
                   profile_name = "other-profile-that-is-not-default",
                   s3_staging_dir = "s3://aws-athena-query-results-for-other-profile-us-east-1/",
                   region = "us-east-1")

dbListTables(con_3)

Observations: …

This seems to indicate that 1) the … In conclusion, hopefully, this test proved that 1) the … Please let me know if this makes sense. I'm happy to provide further insights/clarification. Thanks again for your awesome responsiveness here!!! Happy Friday!
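A complementary way to check which identity each profile actually resolves to, independently of noctua, is to ask STS directly through the paws SDK. A minimal sketch, assuming the profile names used in this thread exist in the local ~/.aws/credentials:

```r
library(paws)

# Build one STS client per named profile and compare the caller identities.
# If both calls report the same Account/Arn, the non-default profile is
# not being resolved from ~/.aws/credentials.
sts_default <- sts(config = list(credentials = list(profile = "default")))
sts_other   <- sts(config = list(credentials = list(
  profile = "other-profile-that-is-not-default"))))

sts_default$get_caller_identity()
sts_other$get_caller_identity()
```

This isolates whether the problem is in credential resolution (paws) or in how noctua passes the profile through, which is the distinction the thread converges on.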
Ah ok cool, just had a little look, and the profile not being passed correctly looks like it was a known issue with the …

From a quick test this fixes a part of your issue, but I don't know if it will fix the second part. For the STS, what code are you using? I might need to add more explicit support for …
For the STS use-case, we are using a SAML provider in conjunction with sts assume-role-with-saml in order to establish the temporary STS session. STS generates temporary AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SECURITY_TOKEN values that can be stored in ~/.aws/credentials and then used with standard SDK tools to call AWS services. Once the session is established, it works like any other STS session, so there is no more SAML involved at that point. This is similar to what you would get if you used normal keys and called …
@pmarkert Thanks for the extra information. I will need to create a branch that gives more explicit support for SAML connection methods, similar to assume_role(). I will need to do a little extra reading so I know how to set up my own AWS account with a SAML connection method so I can add some unit tests around this :)
@DyfanJones thanks a lot!!
@mraess I will create an initial pull request - is it possible for you to test it? I currently don't know how to set up my AWS account for SAML connections. This would be a massive help :)
Actually, it looks like this second issue is also related to that underlying driver library. paws-r/paws#280 describes exactly the problem. From your code, you are already passing the aws_session_token through any time the access_key is referenced, which would be all you need to do to ensure the proper support, so I don't know that you need to set up any SAML or role integrations.
@DyfanJones yes, I'll get on that later today or early next week and let you know how it goes :)

I am getting the package ready for the next release, which comes with retry functionality, …

@DyfanJones Sounds great!

I was able to test with the updated paws library and it looks like it is working to me - both parts, reading from a named profile and picking up the STS session. Thanks!

Thanks @pmarkert for testing. I will close this ticket. If this issue crops up again, please re-open or open up another ticket :)
Issue Description
noctua::athena() does connect to a database, BUT only to the database with the default profile in the aws .config/.credentials files.

When I run this … or … it does connect to the correct AWS database under the default profile in the aws .config/.credentials file, and I'm able to run queries against it.

Output from ```dbGetInfo(con)```:
> dbGetInfo(con)

$profile_name
[1] "default"

$s3_staging
[1] "s3://aws-athena-query-results-for-default-profile-us-east-1"
$dbms.name
[1] "default"
$work_group
[1] "primary"
$poll_interval
NULL
$encryption_option
NULL
$kms_key
NULL
$expiration
NULL
$region_name
[1] "us-east-1"
$paws
[1] "0.1.7"
$noctua
[1] "1.6.0"
However, when I run this …

Output from ```dbGetInfo(con_2)```:
$profile_name
[1] "other-profile-that-is-not-default"
$s3_staging
[1] "s3://aws-athena-query-results-for-other-profile-us-east-1"
$dbms.name
[1] "default"
$work_group
[1] "primary"
$poll_interval
NULL
$encryption_option
NULL
$kms_key
NULL
$expiration
NULL
$region_name
[1] "us-east-1"
$paws
[1] "0.1.7"
$noctua
[1] "1.6.0"
then it still connects, but again to the default database. How can I successfully connect to a non-default profile? It looks like dbGetInfo(con_2) has the correct information, but for whatever reason it does not connect to the correct database and I see the exact same tables as for con.

This is what the .config file looks like … and this is what the .credentials file looks like …

Any help would be greatly appreciated, especially since I have no idea how to set up the .config/.credentials file differently.
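The reporter's actual .config/.credentials files are not shown in the thread. For illustration only, a hypothetical multi-profile setup (keys and regions are placeholders; note that ~/.aws/config prefixes non-default section names with "profile ", while ~/.aws/credentials does not):

```ini
; ~/.aws/config
[default]
region = us-east-1

[profile other-profile-that-is-not-default]
region = us-east-1

; ~/.aws/credentials
[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

[other-profile-that-is-not-default]
aws_access_key_id = AKIAYYYYYYYYYYYYYYYY
aws_secret_access_key = yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
```

With files shaped like this, profile_name = "other-profile-that-is-not-default" should select the second credential set once the driver-side profile bug discussed above is fixed.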