
⚡️ Speed up function _calc_sturges_bin_width_from_profile by 17% #52

Open
codeflash-ai[bot] wants to merge 1 commit into main from codeflash/optimize-_calc_sturges_bin_width_from_profile-mgd9crvy

@codeflash-ai bot commented Oct 5, 2025

📄 17% (1.17x) speedup for _calc_sturges_bin_width_from_profile in dataprofiler/profilers/histogram_utils.py

⏱️ Runtime: 230 microseconds → 196 microseconds (best of 110 runs)

📝 Explanation and details

The optimization achieves a 17% speedup through four micro-optimizations that reduce function-call overhead and redundant attribute and dictionary lookups:

Key Optimizations:

  1. Eliminated redundant property lookups: In _get_maximum_from_profile and _get_minimum_from_profile, the original code accessed profile.max (or profile.min) twice: once for the None check and once to return the value. The optimized version reads it once into a local variable val and reuses it.

  2. Reduced dictionary access overhead: The optimized code stores profile._stored_histogram["histogram"] in a local variable histogram, avoiding repeated nested dictionary lookups when accessing both bin_edges and bin_counts.

  3. Replaced NumPy function call with native operator: The most significant improvement comes from replacing np.subtract(maximum, minimum) with direct subtraction maximum - minimum in _ptp(). For scalar inputs this skips the NumPy ufunc dispatch overhead; under the line profiler, that line's time dropped by about 95% (from 214μs to 10μs). Line-profiler figures include instrumentation overhead, so they are not directly comparable to the end-to-end runtimes above.

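  4. Optimized sum operation: Added a fast path in _get_dataset_size_from_profile that calls NumPy's .sum() method when bin_counts is a NumPy array. For arrays this is much faster than Python's built-in sum(), which iterates element by element in the interpreter. The generated regression tests pass bin_counts as Python lists, so they do not exercise this fast path.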

Performance Impact by Test Case:
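The optimizations show consistent improvements across all test scenarios except the error path:

  • Basic cases: 12-23% faster (most tests show 15-22% improvement)
  • Edge cases: 10-27% faster (the largest gain, 27%, was the zero-range case with a single element); the missing-histogram error path was essentially unchanged (0.8% slower)
  • Large scale cases: 12-18% faster (these tests use larger values such as match_count=1000, not larger inputs)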

The measured gains come from lower fixed per-call overhead, so they matter most when the function is called frequently. In these tests, the cases that fall back to histogram data (12-18%) improved by about as much as the others.
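The per-call overhead behind changes 3 and 4 can be checked with a quick timeit micro-benchmark such as the one below. It is an illustrative sketch, not the benchmark Codeflash ran; absolute timings depend on the machine, Python, and NumPy version, so no particular numbers are claimed here.

```python
import timeit

import numpy as np

maximum, minimum = 10, 0
n_calls = 100_000

# (3) Scalar range: NumPy ufunc call vs. the native operator.
t_ufunc = timeit.timeit(lambda: np.subtract(maximum, minimum), number=n_calls)
t_native = timeit.timeit(lambda: maximum - minimum, number=n_calls)

# (4) Summing a NumPy array: built-in sum() vs. ndarray.sum().
counts = np.arange(1000)
n_sums = 1_000
t_builtin_sum = timeit.timeit(lambda: sum(counts), number=n_sums)
t_array_sum = timeit.timeit(lambda: counts.sum(), number=n_sums)

print(f"np.subtract: {t_ufunc / n_calls * 1e9:.0f} ns/call")
print(f"a - b:       {t_native / n_calls * 1e9:.0f} ns/call")
print(f"sum(array):  {t_builtin_sum / n_sums * 1e6:.1f} us/call")
print(f"array.sum(): {t_array_sum / n_sums * 1e6:.1f} us/call")

# Same value either way, but np.subtract returns a NumPy scalar, not a Python int.
print(type(np.subtract(maximum, minimum)), type(maximum - minimum))
```

One detail worth checking in review: with Python int inputs, np.subtract returns a NumPy integer scalar while `-` returns a plain int. If the range is then divided by a NumPy float such as np.log2(dataset_size) + 1.0, the final result is a NumPy float either way.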

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 12 Passed |
| 🌀 Generated Regression Tests | 22 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| profilers/test_histogram_utils.py::TestHistogramUtils.test_calc_sturges_bin_width_from_profile | 15.6μs | 13.5μs | 15.3%✅ |
🌀 Generated Regression Tests and Runtime
# imports
import math

import pytest  # used for our unit tests

# function to test
from dataprofiler.profilers.histogram_utils import \
    _calc_sturges_bin_width_from_profile

# --- Unit Tests ---

# Helper class for mocking profile objects
class MockProfile:
    def __init__(self, min_val=None, max_val=None, match_count=None, bin_edges=None, bin_counts=None):
        self.min = min_val
        self.max = max_val
        self.match_count = match_count
        self._stored_histogram = {
            "histogram": {
                "bin_edges": bin_edges if bin_edges is not None else [],
                "bin_counts": bin_counts if bin_counts is not None else []
            }
        }

# ---------- BASIC TEST CASES ----------

def test_basic_profile_with_match_count():
    # Basic: min/max present, match_count present
    profile = MockProfile(min_val=0, max_val=10, match_count=100)
    # Expected: (10 - 0) / (log2(100) + 1)
    expected = (10 - 0) / (math.log2(100) + 1)
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 16.4μs -> 13.8μs (18.7% faster)


def test_basic_profile_with_partial_fallback():
    # Basic: min present, max missing, fallback to histogram for max
    bin_edges = [1, 3, 5, 7]
    bin_counts = [5, 10, 15]
    profile = MockProfile(min_val=1, max_val=None, match_count=30, bin_edges=bin_edges, bin_counts=bin_counts)
    # min=1, max=7, dataset_size=30
    expected = (7 - 1) / (math.log2(30) + 1)
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 16.3μs -> 14.0μs (16.4% faster)

def test_basic_profile_with_partial_fallback_min():
    # Basic: max present, min missing, fallback to histogram for min
    bin_edges = [10, 20, 30, 40]
    bin_counts = [1, 2, 3]
    profile = MockProfile(min_val=None, max_val=40, match_count=6, bin_edges=bin_edges, bin_counts=bin_counts)
    # min=10, max=40, dataset_size=6
    expected = (40 - 10) / (math.log2(6) + 1)
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 8.65μs -> 7.70μs (12.4% faster)

# ---------- EDGE TEST CASES ----------

def test_edge_profile_zero_range():
    # Edge: min == max, so range is zero
    profile = MockProfile(min_val=5, max_val=5, match_count=10)
    expected = 0.0
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 7.72μs -> 6.53μs (18.3% faster)

def test_edge_profile_dataset_size_one():
    # Edge: dataset_size == 1 (should not divide by zero)
    profile = MockProfile(min_val=0, max_val=10, match_count=1)
    expected = (10 - 0) / (math.log2(1) + 1)  # log2(1) == 0
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 7.41μs -> 6.17μs (20.1% faster)

def test_edge_profile_dataset_size_two():
    # Edge: dataset_size == 2
    profile = MockProfile(min_val=0, max_val=10, match_count=2)
    expected = (10 - 0) / (math.log2(2) + 1)  # log2(2) == 1
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 7.06μs -> 5.89μs (19.9% faster)

def test_edge_profile_negative_values():
    # Edge: negative min and max
    profile = MockProfile(min_val=-10, max_val=-2, match_count=8)
    expected = (-2 - -10) / (math.log2(8) + 1)  # log2(8) == 3
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 7.18μs -> 5.86μs (22.6% faster)

def test_edge_profile_float_values():
    # Edge: min/max are floats
    profile = MockProfile(min_val=1.5, max_val=3.7, match_count=16)
    expected = (3.7 - 1.5) / (math.log2(16) + 1)  # log2(16) == 4
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 7.22μs -> 5.84μs (23.5% faster)



def test_edge_profile_min_greater_than_max():
    # Edge: min > max (should handle negative range)
    profile = MockProfile(min_val=10, max_val=5, match_count=100)
    expected = (5 - 10) / (math.log2(100) + 1)
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 16.3μs -> 13.8μs (17.9% faster)

# ---------- LARGE SCALE TEST CASES ----------

def test_large_scale_profile():
    # Large scale: 1000 elements, min/max present
    profile = MockProfile(min_val=0, max_val=999, match_count=1000)
    expected = (999 - 0) / (math.log2(1000) + 1)
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 8.62μs -> 7.63μs (13.0% faster)


def test_large_scale_partial_fallback():
    # Large scale: min present, max fallback to histogram
    bin_edges = [0, 500, 1000]
    bin_counts = [500, 500]
    profile = MockProfile(min_val=0, max_val=None, match_count=1000, bin_edges=bin_edges, bin_counts=bin_counts)
    expected = (1000 - 0) / (math.log2(1000) + 1)
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 16.4μs -> 13.9μs (17.6% faster)

def test_large_scale_partial_fallback_min():
    # Large scale: max present, min fallback to histogram
    bin_edges = [0, 500, 1000]
    bin_counts = [500, 500]
    profile = MockProfile(min_val=None, max_val=1000, match_count=1000, bin_edges=bin_edges, bin_counts=bin_counts)
    expected = (1000 - 0) / (math.log2(1000) + 1)
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 8.56μs -> 7.67μs (11.6% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
# imports
import math

import pytest  # used for our unit tests

# function to test
from dataprofiler.profilers.histogram_utils import \
    _calc_sturges_bin_width_from_profile


# Helper class to simulate a NumericStatsMixin-like profile object
class MockProfile:
    def __init__(self, min_val=None, max_val=None, match_count=None, stored_histogram=None):
        self.min = min_val
        self.max = max_val
        self.match_count = match_count
        self._stored_histogram = stored_histogram

# --------------------------
# Basic Test Cases
# --------------------------

def test_basic_case_with_min_max_and_match_count():
    # Basic scenario: min, max, and match_count are set
    profile = MockProfile(min_val=0, max_val=10, match_count=100)
    expected = (10 - 0) / (math.log2(100) + 1.0)
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 7.58μs -> 6.19μs (22.5% faster)

def test_basic_case_with_float_values():
    # Basic scenario: float values for min, max
    profile = MockProfile(min_val=2.5, max_val=7.5, match_count=32)
    expected = (7.5 - 2.5) / (math.log2(32) + 1.0)
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 7.06μs -> 5.93μs (19.0% faster)


def test_edge_case_min_equals_max():
    # Edge: min == max => ptp == 0
    profile = MockProfile(min_val=5, max_val=5, match_count=10)
    expected = 0.0
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 16.3μs -> 13.9μs (16.6% faster)

def test_edge_case_dataset_size_one():
    # Edge: dataset size == 1
    profile = MockProfile(min_val=2, max_val=10, match_count=1)
    expected = (10 - 2) / (math.log2(1) + 1.0)  # log2(1) == 0
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 8.65μs -> 7.47μs (15.8% faster)

def test_edge_case_dataset_size_two():
    # Edge: dataset size == 2
    profile = MockProfile(min_val=2, max_val=10, match_count=2)
    expected = (10 - 2) / (math.log2(2) + 1.0)  # log2(2) == 1
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 7.53μs -> 6.38μs (18.0% faster)

def test_edge_case_negative_values():
    # Edge: negative min/max values
    profile = MockProfile(min_val=-10, max_val=-2, match_count=16)
    expected = (-2 - (-10)) / (math.log2(16) + 1.0)
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 7.35μs -> 6.00μs (22.4% faster)

def test_edge_case_min_greater_than_max():
    # Edge: min > max (should return negative bin width)
    profile = MockProfile(min_val=10, max_val=2, match_count=8)
    expected = (2 - 10) / (math.log2(8) + 1.0)
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 6.87μs -> 6.22μs (10.4% faster)

def test_edge_case_zero_range_and_size():
    # Edge: min == max == 0, dataset_size == 1
    profile = MockProfile(min_val=0, max_val=0, match_count=1)
    expected = 0.0
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 7.30μs -> 5.75μs (27.0% faster)


def test_large_scale_case_1000_elements():
    # Large scale: dataset of size 1000, min/max set
    profile = MockProfile(min_val=0, max_val=1000, match_count=1000)
    expected = (1000 - 0) / (math.log2(1000) + 1.0)
    codeflash_output = _calc_sturges_bin_width_from_profile(profile); result = codeflash_output # 16.3μs -> 13.8μs (18.0% faster)




def test_missing_stored_histogram_raises():
    # If min/max are None and stored_histogram is missing, should raise TypeError
    profile = MockProfile(min_val=None, max_val=None, match_count=None, stored_histogram=None)
    with pytest.raises(TypeError):
        # This will fail because profile._stored_histogram is None and indexing it will raise TypeError
        _calc_sturges_bin_width_from_profile(profile) # 2.08μs -> 2.09μs (0.764% slower)
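The comment in the first test file notes that codeflash_output is used to check that the original and optimized code return the same output. A self-contained version of that check, limited to the changed range computation, might look like the sketch below. The two functions are hypothetical stand-ins for the before and after code (np.subtract versus native subtraction), not the dataprofiler implementation. The inputs are (min, max, dataset size) triples taken from the generated tests.

```python
import math

import numpy as np


def sturges_width_original(minimum, maximum, dataset_size):
    # Hypothetical "before" version: NumPy ufunc for the range.
    return np.subtract(maximum, minimum) / (np.log2(dataset_size) + 1.0)


def sturges_width_optimized(minimum, maximum, dataset_size):
    # Hypothetical "after" version: native subtraction for the range.
    return (maximum - minimum) / (np.log2(dataset_size) + 1.0)


# (min, max, dataset_size) triples taken from the generated tests above.
CASES = [
    (0, 10, 100),
    (1, 7, 30),
    (10, 40, 6),
    (5, 5, 10),
    (0, 10, 1),
    (-10, -2, 8),
    (1.5, 3.7, 16),
    (10, 5, 100),
    (0, 999, 1000),
]


def outputs_match(cases):
    """Return True if both versions agree with each other and with the formula."""
    for minimum, maximum, size in cases:
        before = sturges_width_original(minimum, maximum, size)
        after = sturges_width_optimized(minimum, maximum, size)
        expected = (maximum - minimum) / (math.log2(size) + 1)
        if not (math.isclose(before, after) and math.isclose(after, expected)):
            return False
    return True


print(outputs_match(CASES))  # prints True
```

math.isclose is used instead of == when comparing against the pure-Python formula, because np.log2 and math.log2 are not guaranteed to round identically for non-powers of two.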

To edit these changes, run `git checkout codeflash/optimize-_calc_sturges_bin_width_from_profile-mgd9crvy` and push.

@codeflash-ai bot requested a review from mashraf-222 October 5, 2025 05:25
@codeflash-ai bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) October 5, 2025