Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird query behavior: Successful query with HTTP 403 AccessDenied error #96

Closed
mraess opened this issue Jun 12, 2020 · 7 comments
Closed

Weird query behavior: Successful query with HTTP 403 AccessDenied error #96

mraess opened this issue Jun 12, 2020 · 7 comments
Assignees
Labels

Comments

@mraess
Copy link

@mraess mraess commented Jun 12, 2020

Issue Description

I don't know if you remember, but not to long ago we discovered an issue in paws.common together and in an issue that had to do with pulling in profiles into the profile parameter into dbConnect(). Everything works fine even with AWS_PROFILE as env variables. I've been working with the updated packages and a new error prevents me from running a script on the server.

When I run a query, the data is pulled and written into an r object (see test_results below). However, I also get this error message:

Info: (Data scanned: 625.16 GB)
additional arguments ignored in warning()
Warning: AccessDenied (HTTP 403). Access Denied

Which is weird since the data are actually pulled and the connection to the database is successfully established. Any idea why this message is popping up?

As always, any insights would be great!

Reproducible Example

Example:

library(RJDBC)
library(glue)
library(tidyverse)
library(magrittr)

library(rlang)
library(noctua)
library(DBI)
library(paws.common)
library(paws)

con_3 <- dbConnect(noctua::athena(),
                 profile_name = "profile",
                 s3_staging_dir = "s3://aws-athena-query-results-XXXXXXXXXX-us-east-1/", 
                 region = "us-east-1")

test_query <- glue_sql("select event_date, count(x) as x_count from logs.x group by 1 limit 10;", .con = con_3)

# make sure all results are returned from athena

setMethod("dbGetQuery", signature(conn="JDBCConnection", statement="character"),  def=function(conn, statement, ...) {
r <- dbSendQuery(con, test_query)
on.exit(.jcall(r@stat, "V", "close"))
if (conn@jc %instanceof% "com.amazonaws.athena.jdbc.AthenaConnection") fetch(r, -1, 999) # Athena can only pull 999 rows at a time
else fetch(r, -1)
})

test_results <- dbGetQuery(conn = con_3, statement = test_query)
Session Info
> devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.3 (2019-03-11)
 os       macOS  10.15.3              
 system   x86_64, darwin15.6.0        
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/New_York            
 date     2020-06-12Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package        * version  date       lib source        
 assertthat       0.2.1    2019-03-21 [2] CRAN (R 3.5.2)
 backports        1.1.7    2020-05-13 [1] CRAN (R 3.5.3)
 bit              1.1-15.1 2020-01-14 [2] CRAN (R 3.5.2)
 bit64            0.9-7    2017-05-08 [2] CRAN (R 3.5.0)
 blob             1.2.1    2020-01-20 [2] CRAN (R 3.5.2)
 broom            0.5.6    2020-04-20 [1] CRAN (R 3.5.3)
 callr            3.4.3    2020-03-28 [1] CRAN (R 3.5.3)
 cellranger       1.1.0    2016-07-27 [2] CRAN (R 3.5.0)
 cli              2.0.2    2020-02-28 [1] CRAN (R 3.5.2)
 colorspace       1.4-1    2019-03-18 [1] CRAN (R 3.5.2)
 crayon           1.3.4    2017-09-16 [2] CRAN (R 3.5.0)
 curl             4.3      2019-12-02 [1] CRAN (R 3.5.2)
 data.table       1.12.8   2019-12-09 [1] CRAN (R 3.5.2)
 DBI            * 1.1.0    2019-12-15 [2] CRAN (R 3.5.2)
 dbplyr           1.4.4    2020-05-27 [1] CRAN (R 3.5.3)
 desc             1.2.0    2018-05-01 [2] CRAN (R 3.5.0)
 devtools         2.3.0    2020-04-10 [1] CRAN (R 3.5.3)
 digest           0.6.25   2020-02-23 [1] CRAN (R 3.5.2)
 dplyr          * 0.8.5    2020-03-07 [1] CRAN (R 3.5.2)
 ellipsis         0.3.1    2020-05-15 [1] CRAN (R 3.5.3)
 fansi            0.4.1    2020-01-08 [2] CRAN (R 3.5.2)
 forcats        * 0.5.0    2020-03-01 [1] CRAN (R 3.5.2)
 fs               1.4.1    2020-04-04 [1] CRAN (R 3.5.3)
 generics         0.0.2    2018-11-29 [1] CRAN (R 3.5.0)
 ggplot2        * 3.3.0    2020-03-05 [1] CRAN (R 3.5.2)
 glue           * 1.4.1    2020-05-13 [1] CRAN (R 3.5.3)
 gt             * 0.2.1    2020-05-23 [1] CRAN (R 3.5.3)
 gtable           0.3.0    2019-03-25 [2] CRAN (R 3.5.3)
 haven            2.3.0    2020-05-24 [1] CRAN (R 3.5.3)
 hms              0.5.3    2020-01-08 [1] CRAN (R 3.5.2)
 httr             1.4.1    2019-08-05 [1] CRAN (R 3.5.2)
 ini              0.3.1    2018-05-20 [2] CRAN (R 3.5.0)
 jsonlite         1.6.1    2020-02-02 [1] CRAN (R 3.5.2)
 knitr            1.28     2020-02-06 [1] CRAN (R 3.5.2)
 lattice          0.20-38  2018-11-04 [2] CRAN (R 3.5.3)
 lifecycle        0.2.0    2020-03-06 [1] CRAN (R 3.5.2)
 lubridate        1.7.8    2020-04-06 [1] CRAN (R 3.5.3)
 magrittr       * 1.5      2014-11-22 [2] CRAN (R 3.5.0)
 memoise          1.1.0    2017-04-21 [2] CRAN (R 3.5.0)
 modelr           0.1.8    2020-05-19 [1] CRAN (R 3.5.3)
 munsell          0.5.0    2018-06-12 [2] CRAN (R 3.5.0)
 nlme             3.1-143  2019-12-10 [2] CRAN (R 3.5.2)
 noctua         * 1.7.0    2020-05-14 [1] CRAN (R 3.5.3)
 packrat          0.5.0    2018-11-14 [2] CRAN (R 3.5.0)
 paws           * 0.1.8    2020-04-28 [1] CRAN (R 3.5.3)
 paws.analytics   0.1.8    2020-04-26 [1] CRAN (R 3.5.3)
 paws.common    * 0.3.1    2020-04-26 [1] CRAN (R 3.5.3)
 paws.storage     0.1.8    2020-04-27 [1] CRAN (R 3.5.3)
 pillar           1.4.3    2019-12-20 [1] CRAN (R 3.5.2)
 pkgbuild         1.0.6    2019-10-09 [2] CRAN (R 3.5.2)
 pkgconfig        2.0.3    2019-09-22 [2] CRAN (R 3.5.2)
 pkgload          1.0.2    2018-10-29 [2] CRAN (R 3.5.0)
 prettyunits      1.1.1    2020-01-24 [2] CRAN (R 3.5.2)
 processx         3.4.2    2020-02-09 [1] CRAN (R 3.5.2)
 ps               1.3.2    2020-02-13 [1] CRAN (R 3.5.2)
 purrr          * 0.3.4    2020-04-17 [1] CRAN (R 3.5.3)
 R6               2.4.1    2019-11-12 [1] CRAN (R 3.5.2)
 Rcpp             1.0.4.6  2020-04-09 [1] CRAN (R 3.5.3)
 readr          * 1.3.1    2018-12-21 [1] CRAN (R 3.5.0)
 readxl           1.3.1    2019-03-13 [1] CRAN (R 3.5.2)
 remotes          2.1.1    2020-02-15 [1] CRAN (R 3.5.2)
 reprex           0.3.0    2019-05-16 [2] CRAN (R 3.5.2)
 rJava          * 0.9-11   2019-03-29 [2] CRAN (R 3.5.2)
 RJDBC          * 0.2-7.1  2018-04-16 [2] CRAN (R 3.5.0)
 rlang          * 0.4.6    2020-05-02 [1] CRAN (R 3.5.3)
 RPostgreSQL      0.6-2    2017-06-24 [2] CRAN (R 3.5.0)
 rprojroot        1.3-2    2018-01-03 [2] CRAN (R 3.5.0)
 rstudioapi       0.11     2020-02-07 [1] CRAN (R 3.5.2)
 rvest            0.3.5    2019-11-08 [1] CRAN (R 3.5.2)
 scales           1.1.0    2019-11-18 [2] CRAN (R 3.5.2)
 sessioninfo      1.1.1    2018-11-05 [2] CRAN (R 3.5.0)
 stringi          1.4.6    2020-02-17 [1] CRAN (R 3.5.2)
 stringr        * 1.4.0    2019-02-10 [1] CRAN (R 3.5.2)
 testthat         2.3.2    2020-03-02 [1] CRAN (R 3.5.2)
 tibble         * 3.0.1    2020-04-20 [1] CRAN (R 3.5.3)
 tidyr          * 1.0.2    2020-01-24 [1] CRAN (R 3.5.2)
 tidyselect       1.0.0    2020-01-27 [2] CRAN (R 3.5.3)
 tidyverse      * 1.3.0    2019-11-21 [1] CRAN (R 3.5.2)
 usethis          1.6.1    2020-04-29 [2] CRAN (R 3.5.3)
 vctrs            0.3.0    2020-05-11 [1] CRAN (R 3.5.3)
 withr            2.2.0    2020-04-20 [1] CRAN (R 3.5.3)
 xfun             0.12     2020-01-13 [1] CRAN (R 3.5.2)
 xml2             1.3.2    2020-04-23 [1] CRAN (R 3.5.3)
@DyfanJones
Copy link
Owner

@DyfanJones DyfanJones commented Jun 12, 2020

Hi @mraess yes I know what is happening :) It is nothing to be worries about.

noctua tries to keep the s3_staging_dir "tidy". The way it does this, is it tries to delete the AWS Athena Query output. However if the user doesn't have permission to do this, it returns the error as a warning so that it doesn't prevent the user using AWS Athena.

If you don't like the warning message all the time. Try query caching :D

This won't "tidy" the s3_staging_dir. Plus you have the extra benefit of repeat querying, which is quicker and cheaper as it doesn't need to call AWS Athena again for the same query :D https://dyfanjones.github.io/noctua/articles/aws_athena_query_caching.html

library(DBI)
library(noctua)

con = dbConnect(athena())

noctua_options(cache_size = 10)

dbGetQuery(con, "select * from iris")

Sorry for any confusion it has caused. Side note noctua now supports returning query in chunks (just released I need to update the documentations :S )

library(DBI)
library(noctua)

con = dbConnect(athena())

noctua_options(cache_size = 10)

res <- dbExecuteQuery(con, "select * from iris")

athena_iris <- dbFetch(res, 3000)

Just a quick heads up, the chunk method isn't the fastest if you want to get the entire data frame. I recommend the standard dbGetQuery as that will pull the data straight from AWS S3 :D

@DyfanJones
Copy link
Owner

@DyfanJones DyfanJones commented Jun 12, 2020

Please let me know if this resolves you issue.

P.s. thanks of such detailed ticket :D it was a great help understanding what was going on :D

@DyfanJones
Copy link
Owner

@DyfanJones DyfanJones commented Jun 12, 2020

Just thinking out loud, this behaviour should really be documented

@DyfanJones DyfanJones self-assigned this Jun 12, 2020
@mraess
Copy link
Author

@mraess mraess commented Jun 15, 2020

Hey @DyfanJones , Thanks for the swift response as always! I'll definitely give this a shot today to see if that resolves the issue and then I'll follow up with you.

@mraess
Copy link
Author

@mraess mraess commented Jun 15, 2020

@DyfanJones that did the trick! It's working now locally and on the server. We can close this out. Thanks for putting it into the documentation!

@DyfanJones
Copy link
Owner

@DyfanJones DyfanJones commented Jun 15, 2020

@mraess That is perfect. I will leave this open for the time being so I can get some time to add the documentation

DyfanJones added a commit that referenced this issue Jun 16, 2020
DyfanJones added a commit that referenced this issue Jun 16, 2020
dbClearResult behaviour documentation #96
@DyfanJones
Copy link
Owner

@DyfanJones DyfanJones commented Jun 16, 2020

closing ticket as PR #97 has been merged

@DyfanJones DyfanJones closed this Jun 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.