⚡️ Speed up method Compare._get_column_comparison by 21%
#26
📄 21% (0.21x) speedup for Compare._get_column_comparison in datacompy/core.py
⏱️ Runtime: 933 microseconds → 772 microseconds (best of 211 runs)
📝 Explanation and details
The optimized code achieves a 21% speedup (933 → 772 microseconds) by replacing three separate list comprehensions with a single loop that performs all calculations in one pass.
What was optimized:
- Three separate list comprehensions over self.column_stats were replaced with a single for loop that calculates all three metrics simultaneously by accumulating counters

Key performance improvements:
- col["unequal_cnt"] access happens once per column instead of up to three times

Why this optimization works:
- sum() and len() function calls are eliminated

Test case performance patterns:
This optimization is particularly valuable for dataframe comparison workflows where column statistics are computed frequently: the work stays linear in the number of columns, but it is done in one pass instead of three and without allocating intermediate lists.
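
The single-pass idea described above can be sketched as follows. This is a minimal illustration, not the actual datacompy code: the function names, the result keys, and the shape of `column_stats` (a list of dicts with an `"unequal_cnt"` entry) are assumptions made for the example.

```python
from typing import Any, Dict, List


def column_comparison_three_pass(column_stats: List[Dict[str, Any]]) -> Dict[str, int]:
    """Hypothetical 'before' version: three separate passes over column_stats."""
    return {
        "unequal_columns": len([c for c in column_stats if c["unequal_cnt"] > 0]),
        "equal_columns": len([c for c in column_stats if c["unequal_cnt"] == 0]),
        "unequal_values": sum([c["unequal_cnt"] for c in column_stats]),
    }


def column_comparison_single_pass(column_stats: List[Dict[str, Any]]) -> Dict[str, int]:
    """Hypothetical 'after' version: one loop that accumulates all three counters."""
    unequal_columns = 0
    equal_columns = 0
    unequal_values = 0
    for col in column_stats:
        cnt = col["unequal_cnt"]  # looked up once per column
        unequal_values += cnt
        if cnt > 0:
            unequal_columns += 1
        elif cnt == 0:
            equal_columns += 1
    return {
        "unequal_columns": unequal_columns,
        "equal_columns": equal_columns,
        "unequal_values": unequal_values,
    }


# Example input (made up for illustration).
stats = [{"unequal_cnt": 0}, {"unequal_cnt": 3}, {"unequal_cnt": 0}, {"unequal_cnt": 5}]
result = column_comparison_single_pass(stats)
```

Both functions should return the same dictionary for the same input; the single-pass version simply avoids building temporary lists and repeating the dictionary lookup.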
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
⏪ Replay Tests and Runtime
test_pytest_teststest_sparktest_helper_py_teststest_fuguetest_fugue_polars_py_teststest_fuguetest_fugue_p__replay_test_0.py::test_datacompy_core_Compare__get_column_comparison
To edit these changes, git checkout codeflash/optimize-Compare._get_column_comparison-mi5sulml and push.