[AIRFLOW-7008] Add perf kit with common used decorators/contexts #7650
Fantastic @mik-laj ! I love it. I think there should be some short document describing how to use those. But I am ok with merging it as it is and adding a document later.
Some static checks need to be fixed as well :)
I want to add documentation in this PR, but for now I wanted to take the first step. This PR is still WIP.
With the SQLAlchemy bit here, should we remove or update the previous query trace scripts?
The previous scripts track SQL queries across many processes and for all code. This script lets you trace only part of the code and doesn't work properly with many processes. We also cannot share the code, because perf_kit is a separate package so that it can monitor the Airflow initialization process.
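To illustrate what "trace only part of the code" means, here is a minimal sketch that uses only the standard library. It is an analogue, not the perf_kit implementation: it swaps SQLAlchemy events for `sqlite3`'s `set_trace_callback`, and the `trace_queries` signature, the `dag` table, and the `on_statement` helper are assumptions made for illustration. Like the script discussed above, it observes a single connection in a single process.

```python
import contextlib
import sqlite3


@contextlib.contextmanager
def trace_queries(connection, display_sql=True):
    """Record SQL statements executed on `connection` inside the block.

    Hypothetical stdlib analogue; not the perf_kit implementation.
    """
    statements = []

    def on_statement(sql):
        statements.append(sql)
        if display_sql:
            print(f"SQL: {sql}")

    connection.set_trace_callback(on_statement)
    try:
        yield statements
    finally:
        # Stop tracing so code outside the block is not observed.
        connection.set_trace_callback(None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT)")  # before the block: not traced

with trace_queries(conn, display_sql=False) as traced:
    conn.execute("SELECT COUNT(*) FROM dag").fetchall()
    conn.execute("SELECT dag_id FROM dag").fetchall()

conn.execute("SELECT 1").fetchall()  # after the block: not traced
print(f"Statements seen inside the block: {len(traced)}")
```

Only the statements issued inside the `with` block reach the callback, which is the property that distinguishes this approach from tracing every query in every process.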
Codecov Report
@@ Coverage Diff @@
## master #7650 +/- ##
==========================================
- Coverage 88.45% 87.68% -0.77%
==========================================
Files 937 940 +3
Lines 45234 45355 +121
==========================================
- Hits 40011 39770 -241
- Misses 5223 5585 +362
I added documentation. Does someone want to look at these changes? That would be helpful.
Should we add some document to
@zhongjiajie I am afraid that this file will be too long if we add more information. In my opinion, the CONTRIBUTOR Guide should contain the most important information for beginners. Scheduler optimization is not a key task for these people. I would like to see a guidebook that describes more advanced information, modeled on the contributor's guide, but advanced people are already quite familiar with the structure of the project. WDYT?
Makes sense, but I still think we should add docs somewhere else, not only in the Python docstring.
BTW, I found out that we have some scheduler docs only in Confluence: https://cwiki.apache.org/confluence/display/AIRFLOW/Scheduler+Basics. Should we add this page to the Airflow documentation too?
I will add a short note in the file TESTING.rst.
I saw this article, but I'm afraid we should check and update it first. I started to write internal materials for my company. I hope that some excerpts can be published on the company blog sometime.
Looking forward to your public blog. BTW, code and docs LGTM, except for the TESTING.rst note you mentioned above.
# Example 2:
REPEAT_COUNT = 5

@timing(REPEAT_COUNT)
@timing(REPEAT_COUNT)
Timing is here twice - don't think you want this one?
Yes. In this case, the average for all runs will be calculated.
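A rough sketch of why stacking two timers makes sense: the outer one spans all repetitions and divides by the repeat count, while the inner one times each run. The `timing` and `repeat` classes and functions below are hypothetical stand-ins written for this illustration, not the code from this PR, and `observed` is a placeholder for the function under test.

```python
import contextlib
import functools
import time


class timing(contextlib.ContextDecorator):
    """Hypothetical stand-in: measures wall time of a call or block.

    The elapsed time is divided by repeat_count, so with
    repeat_count > 1 the reported value is the average per run.
    """

    def __init__(self, repeat_count=1):
        self.repeat_count = repeat_count
        self.result = None  # average seconds per run, set on exit

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc_info):
        elapsed = time.perf_counter() - self._start
        self.result = elapsed / self.repeat_count
        print(f"Average over {self.repeat_count} run(s): {self.result * 1000:.3f} ms")
        return False  # never swallow exceptions


def repeat(repeat_count=5):
    """Hypothetical stand-in: calls the wrapped function repeat_count times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last = None
            for _ in range(repeat_count):
                last = func(*args, **kwargs)
            return last
        return wrapper
    return decorator


REPEAT_COUNT = 5
calls = []


@timing(REPEAT_COUNT)  # outer: total time / REPEAT_COUNT, i.e. the average
@repeat(REPEAT_COUNT)  # runs the function REPEAT_COUNT times
@timing()              # inner: time of each individual run
def observed():
    calls.append(1)
    time.sleep(0.01)


observed()
```

With this stacking, the inner timer prints once per run and the outer timer prints a single average at the end.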
scripts/perf/perf_kit/__init__.py
Outdated
Personally, I also have a separate file - a notebook in which I save various test cases and run them
like a classic Python program.
Let's use the third person here, or something like "Having a separate file to save various test cases can be helpful".
Co-Authored-By: Kaxil Naik <kaxilnaik@gmail.com>
scripts/perf/perf_kit/sqlalchemy.py
Outdated
@contextlib.contextmanager
def trace_queries(display_time=True, display_trace=True, display_sql=False, displaay_parameters=True):
Suggested change:
- def trace_queries(display_time=True, display_trace=True, display_sql=False, displaay_parameters=True):
+ def trace_queries(display_time=True, display_trace=True, display_sql=False, display_parameters=True):
scripts/perf/perf_kit/sqlalchemy.py
Outdated
:param display_time: If True, displays the query execution time.
:param display_trace: If True, displays the simplified (one-line) stack trace
:param display_sql: If True, displays the SQL statements
:param displaay_parameters: If True, display SQL statement parameters
Suggested change:
- :param displaay_parameters: If True, display SQL statement parameters
+ :param display_parameters: If True, display SQL statement parameters
Looks good, just a few minor doc-related changes suggested.
Co-Authored-By: Kaxil Naik <kaxilnaik@gmail.com>
Running py-spy inside of a docker container will also usually bring up a permissions denied error
even when running as root.

This error is caused by docker restricting the process_vm_readv system call we are using. This can be
Is it possible to surface the solution as part of the error message shown to the user if we run into the permission error?
It will be difficult because this error happens in the subprocess. It would be necessary to check whether this process started correctly, which will affect the result. When we run the process, we immediately need to start executing the observed code. Otherwise, we will have junk data in the diagram.
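One possible compromise, sketched here under stated assumptions, is to skip any startup check and inspect the subprocess exit status only after the observed code has finished, so the measurement itself is not delayed. This is not what the PR does. The `profiled` helper, the `PERMISSION_HINT` text, and the stand-in command are hypothetical; a real profiler such as py-spy would also need to be stopped explicitly, which this sketch does not cover.

```python
import contextlib
import subprocess
import sys

# Hypothetical hint text; the actual remedy is whatever the profiler's docs describe.
PERMISSION_HINT = "Profiler exited with an error; check the container's permissions."


@contextlib.contextmanager
def profiled(command):
    """Start `command` as a subprocess, run the block, then report failures.

    The exit status is inspected only after the block, so no startup
    check delays the observed code.
    """
    outcome = {"returncode": None, "hint": None}
    process = subprocess.Popen(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE
    )
    try:
        yield outcome  # the observed code starts immediately
    finally:
        _, stderr = process.communicate(timeout=30)
        outcome["returncode"] = process.returncode
        if process.returncode != 0:
            outcome["hint"] = PERMISSION_HINT
            print(stderr.decode(errors="replace").strip(), file=sys.stderr)
            print(PERMISSION_HINT, file=sys.stderr)


# Stand-in for a profiler that fails at startup (assumption: it exits non-zero).
failing_profiler = [sys.executable, "-c", "import sys; sys.exit('Permission denied')"]

with profiled(failing_profiler) as outcome:
    total = sum(range(1000))  # the observed code
```

The trade-off is that the user learns about the failure only after the run, but the collected data is not polluted by the check.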
Thanks to @evgenyshulman from Databand for the support!
Issue link: AIRFLOW-7008
Make sure to mark the boxes below before creating PR: [x]
[AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
* For document-only changes, the commit message can start with [AIRFLOW-XXXX].
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.