Skip to content

Conversation

@FilipAndersson245
Copy link

@FilipAndersson245 FilipAndersson245 commented Apr 12, 2025

Summary

This PR adds a script to generate a pgo optimized binary of ruff, see #7055 for more detail.
I initially only implemented the script on x86_64-unknown-linux-gnu so people can try it out, if we think this is more interesting it should probably be added to the release pipeline. from what I can see you get a general 3-10% improvement of runtime speed then running the pgo-binaries, as well as a somewhat smaller binary size.

The script fetches a large popular python codebases and uses that as training data (see pgo_profile.json),

this is an example of the performance for a cold run on pytorch's source, where the pgo is 12% faster.

✦ ❯ hyperfine "./target/x86_64-unknown-linux-gnu/release/ruff check -e -n  clones/pytorch/" "./target/release/ruff check -e -n  clones/pytorch/" --warmup 100 -r 1000
Benchmark 1: ./target/x86_64-unknown-linux-gnu/release/ruff check -e -n  clones/pytorch/
  Time (mean ± σ):     169.4 ms ±   6.4 ms    [User: 2942.4 ms, System: 244.4 ms]
  Range (min … max):   151.0 ms … 212.7 ms    1000 runs

Benchmark 2: ./target/release/ruff check -e -n  clones/pytorch/
  Time (mean ± σ):     190.5 ms ±   7.5 ms    [User: 3641.0 ms, System: 239.7 ms]
  Range (min … max):   168.1 ms … 235.4 ms    1000 runs

Summary
  ./target/x86_64-unknown-linux-gnu/release/ruff check -e -n  clones/pytorch/ ran
    1.12 ± 0.06 times faster than ./target/release/ruff check -e -n  clones/pytorch/

Test Plan

I tested this by running
ensure you have cargo-pgo installed and configured.
run python ./scripts/pgo.py

@FilipAndersson245
Copy link
Author

@MichaReiser could you checkout if this is a valid start for pgo?

@dhruvmanila dhruvmanila added the performance Potential performance improvement label Apr 21, 2025
@dhruvmanila dhruvmanila requested a review from MichaReiser April 21, 2025 16:03
@github-actions
Copy link
Contributor

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+142 -1834 violations, +0 -0 fixes in 3 projects; 52 projects unchanged)

apache/airflow (+138 -1832 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview --select ALL

- airflow-core/src/airflow/api/common/mark_tasks.py:106:10: AIR311 `airflow.models.dag.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
- airflow-core/src/airflow/api/common/mark_tasks.py:142:22: AIR311 `airflow.models.dag.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
- airflow-core/src/airflow/api/common/mark_tasks.py:195:10: AIR311 `airflow.models.dag.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
- airflow-core/src/airflow/api/common/mark_tasks.py:239:10: AIR311 `airflow.models.dag.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
- airflow-core/src/airflow/api/common/mark_tasks.py:325:10: AIR311 `airflow.models.dag.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
- airflow-core/src/airflow/api/common/mark_tasks.py:356:10: AIR311 `airflow.models.dag.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
- airflow-core/src/airflow/api/common/mark_tasks.py:87:32: AIR311 `airflow.models.dag.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
- airflow-core/src/airflow/api_fastapi/core_api/datamodels/dag_run.py:108:37: AIR311 `airflow.models.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
- airflow-core/src/airflow/api_fastapi/core_api/routes/public/assets.py:320:10: AIR311 `airflow.models.dag.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
- airflow-core/src/airflow/api_fastapi/core_api/routes/public/dag_run.py:159:10: AIR311 `airflow.models.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
- airflow-core/src/airflow/api_fastapi/core_api/routes/public/dag_run.py:263:10: AIR311 `airflow.models.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
- airflow-core/src/airflow/api_fastapi/core_api/routes/public/dag_run.py:337:14: AIR311 `airflow.models.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
- airflow-core/src/airflow/api_fastapi/core_api/routes/public/dag_run.py:401:14: AIR311 `airflow.models.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
- airflow-core/src/airflow/api_fastapi/core_api/routes/public/dag_versions.py:109:14: AIR311 `airflow.models.dag.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
- airflow-core/src/airflow/api_fastapi/core_api/routes/public/dags.py:170:10: AIR311 `airflow.models.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
- airflow-core/src/airflow/api_fastapi/core_api/routes/public/dags.py:197:10: AIR311 `airflow.models.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
- airflow-core/src/airflow/api_fastapi/core_api/routes/public/extra_links.py:58:10: AIR311 `airflow.models.DAG` is removed in Airflow 3.0; It still works in Airflow 3.0 but is expected to be removed in a future version.
... 1812 additional changes omitted for rule AIR311
+ providers/amazon/tests/system/amazon/aws/example_appflow.py:102:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
+ providers/amazon/tests/system/amazon/aws/example_appflow_run.py:182:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
+ providers/amazon/tests/system/amazon/aws/example_athena.py:159:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
+ providers/amazon/tests/system/amazon/aws/example_azure_blob_to_s3.py:65:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
+ providers/amazon/tests/system/amazon/aws/example_batch.py:258:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
+ providers/amazon/tests/system/amazon/aws/example_bedrock.py:108:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
+ providers/amazon/tests/system/amazon/aws/example_bedrock.py:137:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
+ providers/amazon/tests/system/amazon/aws/example_bedrock.py:197:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
+ providers/amazon/tests/system/amazon/aws/example_bedrock_batch_inference.py:167:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
+ providers/amazon/tests/system/amazon/aws/example_bedrock_retrieve_and_generate.py:111:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
+ providers/amazon/tests/system/amazon/aws/example_bedrock_retrieve_and_generate.py:112:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
+ providers/amazon/tests/system/amazon/aws/example_bedrock_retrieve_and_generate.py:567:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
+ providers/amazon/tests/system/amazon/aws/example_cloudformation.py:101:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
+ providers/amazon/tests/system/amazon/aws/example_comprehend.py:118:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
+ providers/amazon/tests/system/amazon/aws/example_comprehend.py:74:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
+ providers/amazon/tests/system/amazon/aws/example_comprehend_document_classifier.py:109:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
+ providers/amazon/tests/system/amazon/aws/example_comprehend_document_classifier.py:198:5: AIR301 `airflow.models.baseoperator.chain` is removed in Airflow 3.0
... 118 additional changes omitted for rule AIR301
- providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py:141:17: PERF403 Use `dict.update` instead of a for-loop
+ providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py:141:17: PERF403 Use a dictionary comprehension instead of a for-loop
- providers/exasol/src/airflow/providers/exasol/hooks/exasol.py:70:17: PERF403 Use `dict.update` instead of a for-loop
+ providers/exasol/src/airflow/providers/exasol/hooks/exasol.py:70:17: PERF403 Use a dictionary comprehension instead of a for-loop
- providers/mysql/src/airflow/providers/mysql/hooks/mysql.py:191:17: PERF403 Use `dict.update` instead of a for-loop
+ providers/mysql/src/airflow/providers/mysql/hooks/mysql.py:191:17: PERF403 Use a dictionary comprehension instead of a for-loop
- providers/postgres/src/airflow/providers/postgres/hooks/postgres.py:171:17: PERF403 Use `dict.update` instead of a for-loop
+ providers/postgres/src/airflow/providers/postgres/hooks/postgres.py:171:17: PERF403 Use a dictionary comprehension instead of a for-loop
... 1928 additional changes omitted for project

apache/superset (+2 -2 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview --select ALL

- superset/migrations/versions/2021-04-12_12-38_fc3a3a8ff221_migrate_filter_sets_to_new_format.py:122:21: PERF403 Use `dict.update` instead of a for-loop
+ superset/migrations/versions/2021-04-12_12-38_fc3a3a8ff221_migrate_filter_sets_to_new_format.py:122:21: PERF403 Use a dictionary comprehension instead of a for-loop
- superset/migrations/versions/2022-04-01_14-38_a9422eeaae74_new_dataset_models_take_2.py:795:21: PERF403 Use `dict.update` instead of a for-loop
+ superset/migrations/versions/2022-04-01_14-38_a9422eeaae74_new_dataset_models_take_2.py:795:21: PERF403 Use a dictionary comprehension instead of a for-loop

pandas-dev/pandas (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --no-fix --output-format concise --preview

+ pandas/tests/arrays/sparse/test_accessor.py:247:30: RUF043 Pattern passed to `match=` contains metacharacters but is neither escaped nor raw
+ pandas/tests/arrays/sparse/test_accessor.py:96:50: RUF043 Pattern passed to `match=` contains metacharacters but is neither escaped nor raw

Changes by rule (4 rules affected)

code total + violation - violation + fix - fix
AIR311 1828 0 1828 0 0
AIR301 134 134 0 0 0
PERF403 12 6 6 0 0
RUF043 2 2 0 0 0

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@FilipAndersson245
Copy link
Author

FilipAndersson245 commented Apr 25, 2025

Anything more that should be fixed?

Was thinking if this works out good we could test adding it to the release pipeline.

@MichaReiser
Copy link
Member

Sorry, I haven't had time to look at this yet (I was out all last week and we're getting close to an important release. That's why my time is a bit limited at the moment). But this PR is still on my todo list

@FilipAndersson245
Copy link
Author

FilipAndersson245 commented Apr 25, 2025

No problem, just didn't want it to be forgotten :)

@FilipAndersson245
Copy link
Author

Any update on when this may be reviewed?

@MichaReiser
Copy link
Member

Sorry, I'm very busy at the moment. It's on my list but I don't know when I'll get to it

1 similar comment
@MichaReiser
Copy link
Member

Sorry, I'm very busy at the moment. It's on my list but I don't know when I'll get to it

@FilipAndersson245
Copy link
Author

Are you still interested or should we close this?

@MeGaGiGaGon
Copy link
Contributor

I saw ntBre say in another issue he’s off on PTO for a week, so you might not get a response till then

@ntBre
Copy link
Contributor

ntBre commented Jul 3, 2025

I think we're definitely still interested, just need to find enough time to review it and think about how we'd want to incorporate it into our release process, as Micha mentioned on the issue.

Thank you for your work on this!

@FilipAndersson245
Copy link
Author

This PR is now more then 5 month old so I guess it only adds noise at this point.

@MichaReiser
Copy link
Member

I still find it a valuable source to understand how to add a PGO script.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Potential performance improvement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants