Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sqllab]More granular sqllab logging #5736

Merged
merged 4 commits into from Sep 11, 2018

Conversation

timifasubaa
Copy link
Contributor

@timifasubaa timifasubaa commented Aug 26, 2018

This PR improves how sqllab logs its performance by logging at a more granular level primarily for asynchronous queries.

It logs the
(for asynchroous queries) time spent pending
time spent in the database
time spent writing to results_backend
time spent time spent reading from results_backend

This PR depends on #5844

@mistercrunch @john-bodley



def upgrade():
op.alter_column('query', 'start_running_time', nullable=False, new_column_name='time_at_db')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This table tends to have hanging locks and is hard to alter. I've had issue every time I've had to alter this model where it just completely lock up everything. We'd all need to schedule downtime to allow for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use some sort of tracing instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mistercrunch Can you flesh this out a little more or provide an example? I'll be happy to investigate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean just using statsd.timing(DURATION)

@timifasubaa timifasubaa force-pushed the more_granular_sqllab_logging branch 4 times, most recently from a7fc4c0 to b389888 Compare September 10, 2018 09:10
@timifasubaa timifasubaa force-pushed the more_granular_sqllab_logging branch 2 times, most recently from f7e5a4f to 8bf7603 Compare September 10, 2018 17:49
@timifasubaa timifasubaa changed the title [WIP][sqllab]More granular sqllab logging [sqllab]More granular sqllab logging Sep 10, 2018
@timifasubaa timifasubaa force-pushed the more_granular_sqllab_logging branch 2 times, most recently from f4b25d2 to 0e2af38 Compare September 10, 2018 19:14
@codecov-io
Copy link

codecov-io commented Sep 10, 2018

Codecov Report

Merging #5736 into master will decrease coverage by <.01%.
The diff coverage is 37.5%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5736      +/-   ##
==========================================
- Coverage   63.75%   63.74%   -0.01%     
==========================================
  Files         374      374              
  Lines       23320    23330      +10     
  Branches     2608     2608              
==========================================
+ Hits        14867    14872       +5     
- Misses       8440     8445       +5     
  Partials       13       13
Impacted Files Coverage Δ
superset/views/core.py 74.22% <0%> (-0.12%) ⬇️
superset/sql_lab.py 71.42% <50%> (-0.36%) ⬇️
superset/assets/src/SqlLab/actions.js 70.66% <0%> (-0.59%) ⬇️
superset/viz.py 81.44% <0%> (+0.01%) ⬆️
superset/stats_logger.py 64.44% <0%> (+2.22%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 73db918...e479606. Read the comment docs.

@timifasubaa timifasubaa force-pushed the more_granular_sqllab_logging branch 3 times, most recently from 60c7fbb to eeeca47 Compare September 11, 2018 01:29
Copy link
Member

@john-bodley john-bodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the statsD key it seems like they should potentially by more descriptive, i.e., start with the sqllab prefix so rather than time_pending it could be something like sqllab.query.time_pending. You can look at Datadog for examples of how other applications define their keys.

db_engine_spec.execute(cursor, query.executed_sql, async_=True)
logging.info('Handling cursor')
db_engine_spec.handle_cursor(cursor, query, session)
logging.info('Fetching data: {}'.format(query.to_dict()))
data = db_engine_spec.fetch_data(cursor, query.limit)
stats_logger.timing('time_in_database', utils.now_as_float() - query_start_time)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's a better (more descriptive) name than time_in_database? May time_executing_query?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DOne

@timifasubaa
Copy link
Contributor Author

That makes sense. I updated the names.

@timifasubaa timifasubaa merged commit 9a4bba4 into apache:master Sep 11, 2018
timifasubaa added a commit to airbnb/superset-fork that referenced this pull request Sep 13, 2018
* quote hive column names (apache#5368)

* create db migration

* use stats_logger timing

* trigger build

(cherry picked from commit 9a4bba4)
betodealmeida pushed a commit to lyft/incubator-superset that referenced this pull request Oct 12, 2018
* quote hive column names (apache#5368)

* create db migration

* use stats_logger timing

* trigger build
wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
* quote hive column names (apache#5368)

* create db migration

* use stats_logger timing

* trigger build
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.28.0 labels Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.28.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants