Verbose query statistics #67

Closed
ozhyrenkov opened this issue Jan 17, 2020 · 6 comments
Labels: enhancement (New feature or request)

Comments

@ozhyrenkov

Hello!
Thanks to your awesome package, I managed to set up a dozen automated scripts that use Athena quite intensively. So, thank you so much!

This is a feature request and/or an idea for future work rather than an issue:
adding a show_statistics extra parameter to dbGetQuery(), or an additional function dbGetQueryStatistics(), that uses the QueryExecution['Statistics'] part of the response from boto3's get_query_execution function.

Why might it be helpful? Athena is currently priced at $5 per 1 TB of scanned data, and Athena itself is designed to query huge amounts of data. Per-query statistics could help manage operating costs in complex environments.
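
To illustrate the cost angle, here is a minimal sketch of turning a scanned-bytes figure into an estimated charge. The helper name estimate_query_cost and the sample numbers are hypothetical, and treating 1 TB as 1e12 bytes is an assumption. The stats list only stands in for the Statistics element of a get_query_execution response, where DataScannedInBytes is the relevant field. It ignores any rounding or per-query minimum that the actual billing may apply.

```r
# Hypothetical helper: estimate the cost of a query from the bytes it scanned.
# price_per_tb = 5 is the figure quoted above; bytes_per_tb = 1e12 is an
# assumption (decimal terabyte) and can be overridden.
estimate_query_cost <- function(bytes_scanned, price_per_tb = 5, bytes_per_tb = 1e12) {
  stopifnot(is.numeric(bytes_scanned), all(bytes_scanned >= 0))
  bytes_scanned / bytes_per_tb * price_per_tb
}

# Stand-in for the Statistics part of a get_query_execution response
# (made-up values, for illustration only).
stats <- list(DataScannedInBytes = 250e9, EngineExecutionTimeInMillis = 4200)

# Estimated cost in USD for this single query.
estimate_query_cost(stats$DataScannedInBytes)
```

Because the function is vectorised, passing a vector of scanned-bytes values and wrapping the call in sum() would give a rough total for a batch of scheduled queries.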

@DyfanJones
Owner

I am really glad you are enjoying the package. I am happy to add these new features to it.

@DyfanJones DyfanJones added the enhancement New feature or request label Jan 17, 2020
@DyfanJones
Owner

Currently dbGetQuery already reports one of the statistics from boto3's get_query_execution: the amount of data Athena scanned. This should be fairly easy to implement.

@DyfanJones DyfanJones self-assigned this Jan 20, 2020
@DyfanJones
Owner

DyfanJones commented Jan 20, 2020

Created an initial implementation that returns statistics from queries to AWS Athena (https://github.com/DyfanJones/RAthena/tree/query-stats). The new function dbStatistics and the new statistics parameter in dbGetQuery address this in the following way:

library(DBI)
library(RAthena)

con = dbConnect(athena())

# method 1:
res = dbSendQuery(con, "show databases")
dbStatistics(res)
dbClearResult(res)

# method 2:
dbGetQuery(con, "show databases", statistics = TRUE)

I will need to check the unit tests to see if this has any implications. I also need to double-check the dplyr integration.

DyfanJones pushed a commit that referenced this issue Jan 20, 2020
A wrapper to return AWS Athena Statistics #67
@DyfanJones
Owner

PR #68 passed all current unit tests.

@ozhyrenkov
Author

Thank you so much, it's working perfectly. I will share our use cases with you in a while :)

@DyfanJones
Owner

These features will be pushed to CRAN shortly.

@DyfanJones DyfanJones mentioned this issue Jan 31, 2020
25 tasks
DyfanJones pushed a commit that referenced this issue Jan 31, 2020