Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Almost random: InvalidRequestException (HTTP 400). Idempotent parameters do not match / Could not find results #104

Closed
magnussv opened this issue Sep 21, 2020 · 11 comments
Assignees
Labels
bug Something isn't working

Comments

@magnussv
Copy link

Hello Dyfan,
Lately I had problems with RAthena so I turned to noctua instead and I noticed some peculiar, almost random, issues.

Issue Description

When running the code below then I usually get Error: InvalidRequestException (HTTP 400). Idempotent parameters do not match or sometimes Error: InvalidRequestException (HTTP 400). Could not find results HOWEVER about 1 of 5 times I retrieve the desired result. It almost seems random. Issue occurs in both R Console and in Rstudio. As a side note: If I for example use library(aws.s3); bucketlist() then I always retrieves the desired result, so I know that I'm assuming the role correctly.

Reproducible Example

Example:

library(DBI)
library(noctua)

noctua::assume_role(
role_arn = "arn:aws:iam::666666666:role/ARole",
duration_seconds = 3600L,
region_name = "eu-west-1",
set_env = TRUE)

con <- dbConnect(noctua::athena(), s3_staging_dir = "s3://aws-athena-query-results-eu-west-1-328396111111")

dbGetQuery(con, "SELECT row_names, mpg, cyl, disp FROM adhoc.mtcars limit 10")

Screenshot:

Screenshot 2020-09-21 at 20 34 33

Traceback:

dbGetQuery(con, "SELECT row_names, mpg, cyl, disp FROM adhoc.mtcars limit 10")
Error: InvalidRequestException (HTTP 400). Idempotent parameters do not match

> traceback()
8: stop(resp)
7: retry_api_call(response <- conn@ptr$Athena$start_query_execution(QueryString = statement, 
      QueryExecutionContext = list(Database = conn@info$dbms.name), 
      ResultConfiguration = ResultConfiguration(conn), WorkGroup = conn@info$work_group))
6: AthenaResult(conn = conn, statement = statement, s3_staging_dir = s3_staging_dir)
5: dbSendQuery(conn, statement = statement)
4: dbSendQuery(conn, statement = statement)
3: .local(conn, statement, ...)
2: dbGetQuery(con, "SELECT * FROM adhoc.mtcars limit 10")
1: dbGetQuery(con, "SELECT * FROM adhoc.mtcars limit 10")


dbGetQuery(con, "SELECT row_names, mpg, cyl, disp FROM adhoc.mtcars limit 10")
Error: InvalidRequestException (HTTP 400). Could not find results

> traceback()
7: stop(resp)
6: retry_api_call(result_class <- res@connection@ptr$Athena$get_query_results(QueryExecutionId = res@info$QueryExecutionId, 
      MaxResults = as.integer(1))$ResultSet$ResultSetMetadata$ColumnInfo)
5: dbFetch(res = rs, n = -1, ...)
4: dbFetch(res = rs, n = -1, ...)
3: .local(conn, statement, ...)
2: dbGetQuery(con, "SELECT * FROM adhoc.mtcars limit 10")
1: dbGetQuery(con, "SELECT * FROM adhoc.mtcars limit 10")

               
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] noctua_1.8.0 DBI_1.0.0   

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2                   paws.analytics_0.1.9         digest_0.6.20                zeallot_0.1.0                crayon_1.3.4                
 [6] paws.common_0.3.3            R6_2.4.0                     jsonlite_1.6                 backports_1.1.4              httr_1.4.0                  
[11] pillar_1.4.2                 rlang_0.4.0                  curl_4.0                     data.table_1.13.0            ini_0.3.1                   
[16] paws_0.1.9                   xml2_1.2.1                   vctrs_0.2.0                  paws.security.identity_0.1.9 readr_1.3.1                 
[21] paws.storage_0.1.9           hms_0.5.0                    compiler_3.6.1               pkgconfig_2.0.2              tibble_2.1.3      
@DyfanJones
Copy link
Owner

Hi @magnussv, really sorry to here that you are having issues with RAthena and noctua . This is the first time i have come across this type of error. But from a quick google it looks similar to this: ankane/blazer#234 . I will have a look around to see if there is simple solution :)

@magnussv
Copy link
Author

Thanks @DyfanJones ! I found ankane/blazer#234 when I googled as well, so I tried to run many (new/different) queries but with the same type of result on all of them. I didn't fully understand that issue but the title said "does not allow you to run same query multiple times". :-)

@DyfanJones
Copy link
Owner

Hi @magnussv, i just want to check some details with you. Does mtcars exist in 2 schema's? For example default.mtcars and adhoc.mtcars?

@DyfanJones
Copy link
Owner

Plus can you let me know how you connect to athena? I am wondering if there is a problem with the defualt schema. I.e

You connect to the default, but you are querying the adhoc schema for example.

con <- dbConnect(noctua:: athena())`

dbGetQuery(con, "select * from adhoc.mtcars")

@DyfanJones
Copy link
Owner

DyfanJones commented Sep 21, 2020

This looks like a possible solution: https://paws-r.github.io/docs/athena/start_query_execution.html.

The parameter ClientRequestToken :

A unique case-sensitive string used to ensure the request to create the query is idempotent (executes only once). If another StartQueryExecution request is received, the same response is returned and another query is not created. If a parameter has changed, for example, the QueryString, an error is returned.This token is listed as not required because AWS SDKs (for example the AWS SDK for Java) auto-generate the token for users. If you are not using the AWS SDK or the AWS CLI, you must provide this token or the action will fail.

Could be the issue. I will need to read some boto3, documentation in how to handle this(any suggestions welcome).

In the meantime can you try caching: https://dyfanjones.github.io/noctua/articles/aws_athena_query_caching.html and let me know if this persists.

(Sorry for the mountain of questions please take your time to answer them :) )

@DyfanJones DyfanJones added the bug Something isn't working label Sep 22, 2020
@DyfanJones DyfanJones self-assigned this Sep 22, 2020
@DyfanJones
Copy link
Owner

From investigating further it looks like there are 2 possible issues going on here.

  • Error: InvalidRequestException (HTTP 400). Idempotent parameters do not match: seems to be an issue with start_query_execution function. The authors of paws have done a great job in integrating R with AWS and usually the parameter ClientRequestToken doesn't need to be provided however for what ever reason it is throwing this error. I believe using a uuid string for the ClientRequestToken should help to fix this.
  • Error: InvalidRequestException (HTTP 400). Could not find results: is caused when Athena is looking for the result of a repeat query. If that query has previously been removed from s3 then Athena throws an error. This can be replicated with forcing the ClientRequestToken as digest::digest(paste(statement, conn@info$dbms.name, sep="/"), algo = "md5") (replicating Athena Adapter - client_request_token hash does not allow you to run same query multiple times ankane/blazer#234 solution). By using digest::digest(paste(statement, conn@info$dbms.name, sep="/"), algo = "md5") the QueryExecutionId is always the same for repeat queries. As noctua tries tidy up after it's self (when caching is turned off) the data of past queries is deleted. I believe using uuid string for ClientRequestToken should resolve this as well.

@DyfanJones
Copy link
Owner

Hi @magnussv can please try out the development branch "start_query_execution"

remotes::install_github("dyfanjones/noctua", ref = "start_query_execution")

Please let me know if the error continues to persist.

@magnussv
Copy link
Author

@DyfanJones thanks for your quick response. You find answers to your initial questions below.

Hi @magnussv, i just want to check some details with you. Does mtcars exist in 2 schema's? For example default.mtcars and adhoc.mtcars?

mtcars exist only in adhoc. The default is empty. It seems like it doesn't matter which table I'm querying, I get the same issues independent of the table (even with tables with unique names across all databases).

Plus can you let me know how you connect to athena? I am wondering if there is a problem with the defualt schema. I.e

You connect to the default, but you are querying the adhoc schema for example.

con <- dbConnect(noctua:: athena())`

dbGetQuery(con, "select * from adhoc.mtcars")

I'm using credentials from a local ~/.aws/credentials with [default] profile; I have tried with others as well. I have also tried to connect to a specific schema as well, with the same result.

In the meantime can you try caching: https://dyfanjones.github.io/noctua/articles/aws_athena_query_caching.html and let me know if this persists.

I have tried with noctua_options(cache_size = 0), noctua_options(cache_size = 10), noctua_options(cache_size = 99) (note: = 100 gives an error), noctua_options() and noctua_options(clear_cache = T) but the problem persists.

Hi @magnussv can please try out the development branch "start_query_execution"

remotes::install_github("dyfanjones/noctua", ref = "start_query_execution")

Please let me know if the error continues to persist.

I will try it and come back to you!

@magnussv
Copy link
Author

Hi @magnussv can please try out the development branch "start_query_execution"

remotes::install_github("dyfanjones/noctua", ref = "start_query_execution")

Please let me know if the error continues to persist.

@DyfanJones I can confirm that this resolved the issues! I ran around 100 queries and all returned the expected result. Great! Thanks for the instant correction. :-) 👍

@DyfanJones
Copy link
Owner

@magnussv oh brilliant. I didn't know what to do next if that failed 😆 I will merge the branch after I have tweaked the caching option to allow 100 distinct queries (thanks for point that out to me :D )

DyfanJones added a commit that referenced this issue Sep 22, 2020
bug fix: pass uuid string to `ClientRequestToken` #104
@DyfanJones
Copy link
Owner

noctua_1.8.1 has been submitted to the cran.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants