Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-7008] Add perf kit with common used decorators/contexts #7650

Merged
merged 8 commits into from
May 4, 2020

Conversation

mik-laj
Copy link
Member

@mik-laj mik-laj commented Mar 7, 2020

Thanks for support to @evgenyshulman from Databand!


Issue link: AIRFLOW-7008

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

* For document-only changes commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic @mik-laj ! I love it. I think there should be some short document describing how to use those. But I am ok with merging it as it is and adding a document later.

@potiuk
Copy link
Member

potiuk commented Mar 8, 2020

Some static checks need to be fixed as well :)

@mik-laj mik-laj requested a review from potiuk March 8, 2020 03:38
@mik-laj
Copy link
Member Author

mik-laj commented Mar 8, 2020

I want to add documentation in this PR, but for now I wanted to take the first step. This PR is still WIP.

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the sql alchemy bit here should we remove/update the previous query trace scripts?

@mik-laj
Copy link
Member Author

mik-laj commented Mar 8, 2020

Previous scripts track SQL queries for many processes and for all code. This script allows you to trace only part of the code and doesn't work properly with many processes. We also cannot share the code, because perf_kit is a separate package to be able to monitor the Airflow initialization process.

@codecov-io
Copy link

codecov-io commented Mar 9, 2020

Codecov Report

Merging #7650 into master will decrease coverage by 0.76%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7650      +/-   ##
==========================================
- Coverage   88.45%   87.68%   -0.77%     
==========================================
  Files         937      940       +3     
  Lines       45234    45355     +121     
==========================================
- Hits        40011    39770     -241     
- Misses       5223     5585     +362     
Impacted Files Coverage Δ
...flow/providers/apache/cassandra/hooks/cassandra.py 21.25% <0.00%> (-72.50%) ⬇️
...w/providers/apache/hive/operators/mysql_to_hive.py 35.84% <0.00%> (-64.16%) ⬇️
airflow/kubernetes/volume_mount.py 44.44% <0.00%> (-55.56%) ⬇️
airflow/providers/postgres/operators/postgres.py 47.82% <0.00%> (-52.18%) ⬇️
airflow/providers/redis/operators/redis_publish.py 50.00% <0.00%> (-50.00%) ⬇️
airflow/kubernetes/volume.py 52.94% <0.00%> (-47.06%) ⬇️
airflow/providers/mongo/sensors/mongo.py 53.33% <0.00%> (-46.67%) ⬇️
airflow/kubernetes/pod_launcher.py 47.18% <0.00%> (-45.08%) ⬇️
airflow/providers/mysql/operators/mysql.py 55.00% <0.00%> (-45.00%) ⬇️
airflow/providers/redis/sensors/redis_key.py 61.53% <0.00%> (-38.47%) ⬇️
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 45c8983...82205bc. Read the comment docs.

@mik-laj mik-laj changed the title [AIRFLOW-7008][WIP] Add perf kit with common used decorators/contexts [AIRFLOW-7008] Add perf kit with common used decorators/contexts Apr 14, 2020
@mik-laj
Copy link
Member Author

mik-laj commented Apr 14, 2020

I added documentation. Does someone want to look at these changes? That would be helpful.
CC: @zhongjiajie

@zhongjiajie
Copy link
Member

Should we add some document to CONTRIBUTING.rst too?

@mik-laj
Copy link
Member Author

mik-laj commented Apr 15, 2020

@zhongjiajie I am afraid that this file will be too long if we add any information. In my opinion, the CONTRIBUTOR Guide should contain the most important information for beginners. Scheduler optimization is not a key task for these people. I would like to see a guidebook that describes more advanced information on the model of a contributor's guide, but advanced people are already quite familiar with the structure of the project. WDYT?

@zhongjiajie
Copy link
Member

In my opinion, the CONTRIBUTOR Guide should contain the most important information for beginners.

Make sence, but I still think we should add doc somewhere else, not only in python docsting

@zhongjiajie
Copy link
Member

BTW, I find out we have some scheduler doc only in confluence https://cwiki.apache.org/confluence/display/AIRFLOW/Scheduler+Basics, should we add this page to airflow document too

@mik-laj
Copy link
Member Author

mik-laj commented Apr 15, 2020

I will add a short note in the file TESTING.rst.

@mik-laj
Copy link
Member Author

mik-laj commented Apr 15, 2020

I saw this article, but I'm afraid we should check and update it first. I started to write internal materials for my company. I started to write internal materials for my company. I hope that some excerpts can be published on the company blog sometime.

@zhongjiajie
Copy link
Member

zhongjiajie commented Apr 15, 2020

Looking forward your public blog, BTW, code&doc LGTM, except TESTING.rst what you mention above

# Example 2:
REPEAT_COUNT = 5

@timing(REPEAT_COUNT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@timing(REPEAT_COUNT)

Timing is here twice - don't think you want this one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. In this case, the average for all runs will be calculated.

Comment on lines 117 to 118
Personally, I also have a separate file - a notebook in which I save various test cases and run them
like a classic Python program.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the third person here or something like "Having a separate file to save various tests cases can be helpful"

Co-Authored-By: Kaxil Naik <kaxilnaik@gmail.com>


@contextlib.contextmanager
def trace_queries(display_time=True, display_trace=True, display_sql=False, displaay_parameters=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def trace_queries(display_time=True, display_trace=True, display_sql=False, displaay_parameters=True):
def trace_queries(display_time=True, display_trace=True, display_sql=False, display_parameters=True):

:param display_time: If True, displays the query execution time.
:param display_trace: If True, displays the simplified (one-line) stack trace
:param display_sql: If True, displays the SQL statements
:param displaay_parameters: If True, display SQL statement parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:param displaay_parameters: If True, display SQL statement parameters
:param display_parameters: If True, display SQL statement parameters

Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just few minor doc related changes suggested

mik-laj and others added 2 commits April 22, 2020 05:06
Running py-spy inside of a docker container will also usually bring up a permissions denied error
even when running as root.

This error is caused by docker restricting the process_vm_readv system call we are using. This can be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to surface the solution as part of the error message to user in case we run into the permission error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be difficult because this error happens in the subprocess. It would be necessary to check whether this process started correctly, which will affect the result. When we run the process, we immediately need to start executing the observed code. Otherwise, we will have junk data in the diagram.

@mik-laj mik-laj merged commit aec768b into apache:master May 4, 2020
@mik-laj mik-laj deleted the AIRFLOW-7008 branch May 4, 2020 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants