Timeouts when running with paws.database 0.1.10 #132

davidski · 2021-01-24T19:13:52Z

Issue Description

Under noctua 1.10.0, upgrading to paws.database 0.1.10 seems to cause curl timeouts when running queries through the dplyr interface.

(Semi-)Reproducible Example

Generating a clean reprex is tricky, but I have a local query managed under renv that reliably replicates the problem. Here is a redacted query (hitting an Apache log store in parquet format) that demonstrates the problem:

Error under paws.database 0.1.10

> con <- dbConnect(noctua::athena(),
                  profile_name = "REDACTED",
                  region = "us-east-2",
                  s3_staging_dir = 's3://REDACTED',
                  work_group = "REDACTED")
> query <- str_glue("
   SELECT date_parse(timestamp, '%d/%b/%Y:%H:%i:%s +0000') AS timestamp,
          verb, request, response, CAST(bytes as integer) AS bytes, referrer, agent
   FROM REDACTED")
> dat <- tbl(con, sql(query)) %>% collect()
Info: (Data scanned: 14.09 MB)
Error in curl::curl_fetch_memory(url, handle = handle): Timeout was reached: [REDACTED.s3.us-east-2.amazonaws.com] Operation timed out after 10000 milliseconds with 119006796 out of 316782524 bytes received
Request failed. Retrying in 0.7 seconds...
Error in curl::curl_fetch_memory(url, handle = handle): Timeout was reached: [REDACTED.s3.us-east-2.amazonaws.com] Operation timed out after 10000 milliseconds with 112822860 out of 316782524 bytes received
Request failed. Retrying in 2.2 seconds...

If left to run, the query goes through exponential back-off and eventually fails.
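The byte counts in the timeout message already hint at the cause: a hard 10-second limit on the whole S3 download. A quick back-of-the-envelope check in R, using only the figures from the first error line above and assuming the transfer rate stays roughly constant:

```r
# Figures taken from the first timeout message in the log above
timeout_s      <- 10000 / 1000   # curl gave up after 10000 ms
bytes_received <- 119006796      # bytes downloaded before the timeout
bytes_total    <- 316782524      # size of the Athena result file on S3

# Observed throughput, assuming a roughly constant transfer rate
rate_bytes_per_s <- bytes_received / timeout_s

# Time the full download would need at that rate
needed_s <- bytes_total / rate_bytes_per_s

sprintf("%.1f MB/s; full download needs about %.1f s", rate_bytes_per_s / 1e6, needed_s)
# "11.9 MB/s; full download needs about 26.6 s"
```

If the rate holds, each attempt needs well over 10 seconds, so every retry would hit the same ceiling, which fits the back-off loop that eventually fails.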

Running the same query under paws.database 0.1.9 works without issue. noctua 1.9.1 also hits this problem, so this seems to be something in the interface with paws (or maybe even a problem with paws itself).

Really appreciate the package. If there's a better way to help debug this, please let me know!


DyfanJones · 2021-01-24T19:19:12Z

Hi @davidski thanks for letting me know. Does this error happen with the DBI interface? Or is it just the dplyr interface?

I will have a little look at paws.database to see what the change could be and how to fix it :)

davidski · 2021-01-24T21:15:09Z

Thanks for the quick response! Yes, this problem occurs when making a DBI-style query as well. ☹️

DyfanJones · 2021-01-25T09:29:28Z

@davidski how long did this query run before it timed out?

DyfanJones · 2021-01-25T11:08:14Z

noctua uses the AWS Athena, S3 and Glue services. These correspond to paws.analytics (Athena, Glue) and paws.storage (S3). To find the most likely culprit of this error, I will break down the dbGetQuery call.

dbGetQuery breakdown:

dbGetQuery calls the following noctua methods:

dbSendQuery
dbStatistics
dbFetch
dbClearResult

dbSendQuery calls:

paws.analytics::athena -> start_query_execution. This is to start an AWS Athena query.

dbStatistics calls:

paws.analytics::athena -> get_query_execution. Get query statistics (such as data scanned) from Athena.

dbFetch calls:

paws.analytics::athena -> get_query_execution. Get the Athena execution status.
paws.analytics::athena -> get_query_results. Get the column classes of the Athena result so they can be passed to the file parser.
paws.storage::s3 -> get_object. Download the Athena result file from S3.

dbClearResult calls:

paws.analytics::athena -> get_query_execution. Get the Athena execution status.
paws.storage::s3 -> delete_object. Remove the Athena result file from S3. Note: only called when the cache size is 0.
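One way to confirm which of these steps fails is to run them one at a time instead of through dbGetQuery. The sketch below uses a hypothetical helper, run_stages (not part of noctua or DBI), that times each step and reports the first one that errors; the commented usage assumes the con and query objects from the original report.

```r
# Hypothetical helper (not part of noctua or DBI): run zero-argument stage
# functions in order, record each stage's elapsed time, and stop at the
# first stage that throws an error.
run_stages <- function(stages) {
  timings <- numeric(0)
  for (name in names(stages)) {
    start <- proc.time()[["elapsed"]]
    result <- tryCatch({
      stages[[name]]()
      NULL
    }, error = function(e) e)
    timings[name] <- proc.time()[["elapsed"]] - start
    if (inherits(result, "error")) {
      return(list(failed = name, message = conditionMessage(result), timings = timings))
    }
  }
  list(failed = NA_character_, message = NA_character_, timings = timings)
}

# Usage against Athena, assuming `con` and `query` from the report above:
# res <- NULL
# out <- run_stages(list(
#   dbSendQuery   = function() res <<- DBI::dbSendQuery(con, as.character(query)),
#   dbStatistics  = function() noctua::dbStatistics(res),
#   dbFetch       = function() DBI::dbFetch(res),
#   dbClearResult = function() DBI::dbClearResult(res)
# ))
# out$failed   # name of the first stage that errored, or NA
# out$timings  # seconds spent in each stage that ran
# If a stage fails, call DBI::dbClearResult(res) afterwards to tidy up.
```

Because paws retries failed requests with exponential back-off, the failing stage may run for a while before its error surfaces.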

DyfanJones · 2021-01-25T11:15:49Z

As the query statistics have been returned (Info: (Data scanned: 14.09 MB)), the most likely culprit is dbFetch. I believe paws.storage::s3 -> get_object is causing this issue, possibly because of a change in paws.common. I will update issue paws-r/paws#371 accordingly.
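To test the get_object theory without noctua in the loop, one could download the Athena result file with paws directly. A minimal sketch, assuming the result's S3 URI is known (for example from the OutputLocation field that Athena's get_query_execution reports); parse_s3_uri is a hypothetical helper, not part of paws, and the URI in the usage comments is a placeholder.

```r
# Hypothetical helper: split "s3://bucket/key/path" into bucket and key
parse_s3_uri <- function(uri) {
  stopifnot(is.character(uri), length(uri) == 1, startsWith(uri, "s3://"))
  path  <- sub("^s3://", "", uri)
  slash <- as.integer(regexpr("/", path, fixed = TRUE))
  if (slash == -1) {
    return(list(bucket = path, key = ""))
  }
  list(
    bucket = substr(path, 1, slash - 1),
    key    = substr(path, slash + 1, nchar(path))
  )
}

# Usage against AWS (needs credentials, so left commented out).
# The URI below is a placeholder, not the real result location:
# loc <- parse_s3_uri("s3://your-bucket/your-prefix/query-id.csv")
# s3  <- paws.storage::s3()
# obj <- s3$get_object(Bucket = loc$bucket, Key = loc$key)
# length(obj$Body)  # raw vector holding the downloaded bytes
```

If the same curl timeout appears here, the problem sits below noctua, in the paws SDK stack.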

davidkretch · 2021-01-25T16:20:04Z

Thanks for diagnosing the bug: we mistakenly introduced a default timeout in the last release of paws.common. Sorry about that. The latest version (0.3.8), which removes the timeout, is now on CRAN.
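For readers hitting the same symptom, a quick way to check whether the installed paws.common is at least 0.3.8, the CRAN release described above as removing the default timeout. has_timeout_fix is a hypothetical helper name.

```r
# Hypothetical helper: TRUE when a paws.common version is at or above 0.3.8,
# the release reported to remove the default request timeout
has_timeout_fix <- function(version) {
  as.package_version(version) >= "0.3.8"
}

# Check the locally installed copy (errors if paws.common is not installed):
# has_timeout_fix(utils::packageVersion("paws.common"))
# If FALSE, update from CRAN:
# install.packages("paws.common")
```

Under renv, remember to snapshot the updated version so the project lockfile picks it up.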

DyfanJones · 2021-01-25T16:21:21Z

@davidkretch thanks again. I will close this ticket.

DyfanJones added the bug and sdk issue labels on Jan 24, 2021.

DyfanJones mentioned this issue on Jan 25, 2021 in paws-r/paws#371, "paws.storage::get_object curl timingout" (closed).

DyfanJones closed this as completed Jan 25, 2021
