@codeflash-ai codeflash-ai bot commented Oct 11, 2025

📄 19% (0.19x) speedup for UploadStats._skipped_summary in google/cloud/aiplatform/tensorboard/upload_tracker.py

⏱️ Runtime : 103 microseconds → 86.2 microseconds (best of 514 runs)
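
The headline percentage is consistent with the two runtimes if speedup is taken as `original / optimized - 1`. That formula is an assumption inferred from the numbers in this report, not a documented Codeflash definition. A minimal sketch of the arithmetic:

```python
def speedup_percent(original, optimized):
    """Relative speedup as a percentage: (original / optimized - 1) * 100."""
    return (original / optimized - 1) * 100


# Overall runtime from the report, in microseconds.
overall = speedup_percent(103, 86.2)

# First generated regression test, in nanoseconds (482ns -> 368ns).
first_test = speedup_percent(482, 368)

print(f"overall: {overall:.1f}%, first test: {first_test:.1f}%")
```

Under this assumption the overall figure lands between 19% and 20%, and the first test matches its per-line annotation of 31.0%.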

📝 Explanation and details

The optimized code achieves a 19% speedup through several targeted optimizations:

1. Constant Pre-computation in readable_bytes_string():

  • Moved 2**10 and 2**20 calculations to module-level constants _KB and _MB
  • Eliminates redundant power calculations on every function call
  • While line profiler shows slightly higher per-hit times due to constant lookups, the overall function benefits from reduced computation

2. Single time.time() Call in __init__():

  • Cached time.time() result in a variable to avoid potential duplicate calls
  • Minor optimization that reduces system call overhead during object initialization
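
The idea can be sketched as follows. The class and attribute names here (`UploadStatsSketch`, `_start_time`, `_last_update_time`) are hypothetical placeholders, since the real `__init__()` is not shown in this PR description.

```python
import time


class UploadStatsSketch:
    """Hypothetical stand-in for UploadStats; attribute names are assumptions."""

    def __init__(self):
        now = time.time()  # read the clock once
        # Both timestamps share the same reading instead of calling
        # time.time() separately for each attribute.
        self._start_time = now
        self._last_update_time = now


stats = UploadStatsSketch()
print(stats._start_time == stats._last_update_time)
```

Besides saving a call, sharing one reading keeps the two timestamps exactly equal at construction.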

3. Optimized String Building in _skipped_summary():

  • Replaced list-append-and-join pattern with direct conditional string formatting
  • For the common case of 0-2 items, direct string concatenation is more efficient than building a list and joining
  • Added local variable caching for self._num_tensors_skipped and self._num_blobs_skipped to reduce attribute lookups
  • Added explicit empty string return to handle the no-skipped-items case efficiently
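
The two string-building shapes can be compared side by side in a small sketch. `SkippedStats`, `summary_join`, and `summary_direct` are hypothetical names, and the format strings are modeled on the expected values in the generated tests rather than taken from `upload_tracker.py`.

```python
def readable_bytes_string(num_bytes):
    """Hypothetical helper: format a byte count as B, kB, or MB."""
    if num_bytes >= 2**20:
        return "%.1f MB" % (num_bytes / 2**20)
    if num_bytes >= 2**10:
        return "%.1f kB" % (num_bytes / 2**10)
    return "%d B" % num_bytes


class SkippedStats:
    """Hypothetical stand-in for the skipped-data counters on UploadStats."""

    def __init__(self, tensors=0, tensor_bytes=0, blobs=0, blob_bytes=0):
        self._num_tensors_skipped = tensors
        self._tensor_bytes_skipped = tensor_bytes
        self._num_blobs_skipped = blobs
        self._blob_bytes_skipped = blob_bytes

    def summary_join(self):
        """List-append-and-join pattern (the 'before' shape)."""
        pieces = []
        if self._num_tensors_skipped:
            pieces.append("%s tensors (%s)" % (
                self._num_tensors_skipped,
                readable_bytes_string(self._tensor_bytes_skipped)))
        if self._num_blobs_skipped:
            pieces.append("%s binary objects (%s)" % (
                self._num_blobs_skipped,
                readable_bytes_string(self._blob_bytes_skipped)))
        return ", ".join(pieces)

    def summary_direct(self):
        """Direct conditional formatting (the 'after' shape)."""
        # Cache attributes in locals to avoid repeated lookups.
        num_tensors = self._num_tensors_skipped
        num_blobs = self._num_blobs_skipped
        if not num_tensors and not num_blobs:
            return ""  # explicit early return for the empty case
        if num_tensors and num_blobs:
            return "%s tensors (%s), %s binary objects (%s)" % (
                num_tensors,
                readable_bytes_string(self._tensor_bytes_skipped),
                num_blobs,
                readable_bytes_string(self._blob_bytes_skipped))
        if num_tensors:
            return "%s tensors (%s)" % (
                num_tensors,
                readable_bytes_string(self._tensor_bytes_skipped))
        return "%s binary objects (%s)" % (
            num_blobs, readable_bytes_string(self._blob_bytes_skipped))


stats = SkippedStats(tensors=1, tensor_bytes=1048576, blobs=4, blob_bytes=4096)
print(stats.summary_direct())
```

The direct version trades generality for speed: it only works because the summary has at most two pieces, and it must keep tensors ahead of blobs by hand.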

Performance Impact by Test Case:
The optimizations show consistent improvements across all test scenarios:

  • Empty cases (no skipped items): 22-48% faster due to direct empty string return
  • Single item cases: 5-25% faster from avoiding list operations
  • Both tensors and blobs: 15-30% faster from direct string formatting instead of list building
  • Large scale cases: 14-24% faster, showing the optimizations scale well

The string building optimization is particularly effective because most real-world usage involves 0-2 items in the summary, making the direct conditional approach faster than the general-purpose list-and-join method.
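
To check a comparison like this locally, a `timeit` micro-benchmark is one option. The sketch below uses simplified stand-in functions (`summary_join` and `summary_direct` are hypothetical, without byte formatting) and reports the best of several rounds, mirroring the "best of N runs" convention in this report. Absolute numbers will differ by machine and Python version.

```python
import timeit


def summary_join(num_tensors, num_blobs):
    """General-purpose list-and-join version."""
    pieces = []
    if num_tensors:
        pieces.append("%d tensors" % num_tensors)
    if num_blobs:
        pieces.append("%d binary objects" % num_blobs)
    return ", ".join(pieces)


def summary_direct(num_tensors, num_blobs):
    """Direct conditional version for the 0-2 item case."""
    if num_tensors and num_blobs:
        return "%d tensors, %d binary objects" % (num_tensors, num_blobs)
    if num_tensors:
        return "%d tensors" % num_tensors
    if num_blobs:
        return "%d binary objects" % num_blobs
    return ""


def best_time(func, *args, repeat=5, number=20_000):
    """Return the best per-call time in seconds over `repeat` rounds."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number


if __name__ == "__main__":
    for args in [(0, 0), (3, 0), (3, 2)]:
        old = best_time(summary_join, *args)
        new = best_time(summary_direct, *args)
        print(f"{args}: {old * 1e9:.0f}ns -> {new * 1e9:.0f}ns")
```

Taking the minimum rather than the mean reduces the influence of scheduling noise on sub-microsecond measurements.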

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 91 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
import time

# imports
import pytest  # used for our unit tests
from google.cloud.aiplatform.tensorboard.upload_tracker import UploadStats

# unit tests

# Basic Test Cases

def test_skipped_summary_none_skipped():
    """Test when nothing is skipped, should return empty string."""
    stats = UploadStats()
    codeflash_output = stats._skipped_summary() # 482ns -> 368ns (31.0% faster)

def test_skipped_summary_only_tensors_skipped_bytes_zero():
    """Test when only tensors are skipped and bytes is zero."""
    stats = UploadStats()
    stats._num_tensors_skipped = 3
    stats._tensor_bytes_skipped = 0
    codeflash_output = stats._skipped_summary() # 1.54μs -> 1.38μs (11.8% faster)

def test_skipped_summary_only_blobs_skipped_bytes_zero():
    """Test when only blobs are skipped and bytes is zero."""
    stats = UploadStats()
    stats._num_blobs_skipped = 2
    stats._blob_bytes_skipped = 0
    codeflash_output = stats._skipped_summary() # 1.33μs -> 1.22μs (9.18% faster)

def test_skipped_summary_tensors_and_blobs_skipped_bytes_zero():
    """Test when both tensors and blobs are skipped and bytes is zero."""
    stats = UploadStats()
    stats._num_tensors_skipped = 1
    stats._tensor_bytes_skipped = 0
    stats._num_blobs_skipped = 1
    stats._blob_bytes_skipped = 0
    codeflash_output = stats._skipped_summary(); result = codeflash_output # 2.01μs -> 1.58μs (27.6% faster)

def test_skipped_summary_tensors_bytes_kb():
    """Test tensors skipped with bytes in kilobytes."""
    stats = UploadStats()
    stats._num_tensors_skipped = 4
    stats._tensor_bytes_skipped = 1536  # 1.5 KB
    codeflash_output = stats._skipped_summary() # 2.52μs -> 2.21μs (14.3% faster)

def test_skipped_summary_blobs_bytes_mb():
    """Test blobs skipped with bytes in megabytes."""
    stats = UploadStats()
    stats._num_blobs_skipped = 2
    stats._blob_bytes_skipped = 3 * 1024 * 1024  # 3 MB
    codeflash_output = stats._skipped_summary() # 2.22μs -> 1.88μs (18.4% faster)

def test_skipped_summary_both_types_bytes_mixed_units():
    """Test both tensors and blobs skipped with different units."""
    stats = UploadStats()
    stats._num_tensors_skipped = 5
    stats._tensor_bytes_skipped = 2048  # 2 KB
    stats._num_blobs_skipped = 1
    stats._blob_bytes_skipped = 1048576  # 1 MB
    codeflash_output = stats._skipped_summary(); result = codeflash_output # 3.04μs -> 2.63μs (15.8% faster)

# Edge Test Cases

def test_skipped_summary_tensors_bytes_just_below_kb():
    """Test tensor bytes just below 1 KB (should show as B)."""
    stats = UploadStats()
    stats._num_tensors_skipped = 1
    stats._tensor_bytes_skipped = 1023
    codeflash_output = stats._skipped_summary() # 1.45μs -> 1.25μs (16.1% faster)

def test_skipped_summary_tensors_bytes_exactly_kb():
    """Test tensor bytes exactly 1 KB (should show as kB)."""
    stats = UploadStats()
    stats._num_tensors_skipped = 1
    stats._tensor_bytes_skipped = 1024
    codeflash_output = stats._skipped_summary() # 2.19μs -> 1.77μs (23.6% faster)

def test_skipped_summary_blobs_bytes_just_below_mb():
    """Test blob bytes just below 1 MB (should show as kB)."""
    stats = UploadStats()
    stats._num_blobs_skipped = 1
    stats._blob_bytes_skipped = 1048575  # 1024*1024 - 1
    # 1048575 / 1024 = 1023.999 kB, which rounds to 1024.0 kB at one decimal place
    codeflash_output = stats._skipped_summary() # 2.30μs -> 1.92μs (19.5% faster)

def test_skipped_summary_blobs_bytes_exactly_mb():
    """Test blob bytes exactly 1 MB (should show as MB)."""
    stats = UploadStats()
    stats._num_blobs_skipped = 1
    stats._blob_bytes_skipped = 1048576
    codeflash_output = stats._skipped_summary() # 2.10μs -> 1.71μs (22.7% faster)

def test_skipped_summary_large_numbers():
    """Test large numbers of skipped tensors and blobs."""
    stats = UploadStats()
    stats._num_tensors_skipped = 999
    stats._tensor_bytes_skipped = 999999
    stats._num_blobs_skipped = 888
    stats._blob_bytes_skipped = 888888
    codeflash_output = stats._skipped_summary(); result = codeflash_output # 3.12μs -> 2.52μs (23.7% faster)

def test_skipped_summary_zero_skipped_with_nonzero_bytes():
    """Test when skipped count is zero but bytes is nonzero, should not report."""
    stats = UploadStats()
    stats._num_tensors_skipped = 0
    stats._tensor_bytes_skipped = 12345
    stats._num_blobs_skipped = 0
    stats._blob_bytes_skipped = 67890
    codeflash_output = stats._skipped_summary() # 484ns -> 346ns (39.9% faster)

def test_skipped_summary_negative_skipped_counts():
    """Test negative skipped counts; they are truthy, so they are still reported."""
    stats = UploadStats()
    stats._num_tensors_skipped = -1
    stats._tensor_bytes_skipped = 1024
    stats._num_blobs_skipped = -5
    stats._blob_bytes_skipped = 2048
    # Negative counts are not valid, but function only checks for truthiness
    # In Python, -1 is truthy, so will report
    codeflash_output = stats._skipped_summary() # 3.25μs -> 2.69μs (20.7% faster)

def test_skipped_summary_negative_bytes():
    """Test negative bytes, should still format but with negative value."""
    stats = UploadStats()
    stats._num_tensors_skipped = 2
    stats._tensor_bytes_skipped = -512
    stats._num_blobs_skipped = 0
    stats._blob_bytes_skipped = 0
    codeflash_output = stats._skipped_summary() # 1.47μs -> 1.32μs (11.9% faster)

def test_skipped_summary_float_bytes():
    """Test float bytes, should format correctly."""
    stats = UploadStats()
    stats._num_tensors_skipped = 1
    stats._tensor_bytes_skipped = 1536.5
    codeflash_output = stats._skipped_summary() # 2.35μs -> 2.06μs (14.2% faster)

def test_skipped_summary_float_skipped_counts():
    """Test float skipped counts, should format as float."""
    stats = UploadStats()
    stats._num_tensors_skipped = 1.5
    stats._tensor_bytes_skipped = 1024
    codeflash_output = stats._skipped_summary() # 2.44μs -> 2.12μs (15.3% faster)

def test_skipped_summary_ordering():
    """Test that tensors are always reported before blobs."""
    stats = UploadStats()
    stats._num_tensors_skipped = 1
    stats._tensor_bytes_skipped = 1024
    stats._num_blobs_skipped = 1
    stats._blob_bytes_skipped = 1024
    codeflash_output = stats._skipped_summary(); result = codeflash_output # 3.03μs -> 2.44μs (24.0% faster)

# Large Scale Test Cases

def test_skipped_summary_large_scale_max_elements():
    """Test with maximum allowed elements for large scale."""
    stats = UploadStats()
    stats._num_tensors_skipped = 1000
    stats._tensor_bytes_skipped = 1000 * 1024  # 1000 KB
    stats._num_blobs_skipped = 1000
    stats._blob_bytes_skipped = 1000 * 1024  # 1000 KB
    codeflash_output = stats._skipped_summary(); result = codeflash_output # 3.03μs -> 2.57μs (17.8% faster)

def test_skipped_summary_large_scale_mb_boundary():
    """Test with skipped bytes just crossing into MB for tensors and blobs."""
    stats = UploadStats()
    stats._num_tensors_skipped = 500
    stats._tensor_bytes_skipped = 2 * 1024 * 1024  # 2 MB
    stats._num_blobs_skipped = 500
    stats._blob_bytes_skipped = 3 * 1024 * 1024  # 3 MB
    codeflash_output = stats._skipped_summary(); result = codeflash_output # 2.80μs -> 2.34μs (19.8% faster)

def test_skipped_summary_large_scale_mixed():
    """Test with many tensors and few blobs, and vice versa."""
    stats = UploadStats()
    stats._num_tensors_skipped = 999
    stats._tensor_bytes_skipped = 999 * 1024
    stats._num_blobs_skipped = 1
    stats._blob_bytes_skipped = 1 * 1024
    codeflash_output = stats._skipped_summary(); result = codeflash_output # 3.00μs -> 2.42μs (23.8% faster)

def test_skipped_summary_large_scale_zero_bytes():
    """Test large skipped counts but zero bytes."""
    stats = UploadStats()
    stats._num_tensors_skipped = 1000
    stats._tensor_bytes_skipped = 0
    stats._num_blobs_skipped = 1000
    stats._blob_bytes_skipped = 0
    codeflash_output = stats._skipped_summary(); result = codeflash_output # 2.07μs -> 1.71μs (20.8% faster)

def test_skipped_summary_large_scale_float_bytes():
    """Test large scale with float bytes."""
    stats = UploadStats()
    stats._num_tensors_skipped = 1000
    stats._tensor_bytes_skipped = 1000 * 1024.5  # 1024500.0 bytes
    stats._num_blobs_skipped = 1000
    stats._blob_bytes_skipped = 1000 * 1024.5
    codeflash_output = stats._skipped_summary(); result = codeflash_output # 3.02μs -> 2.56μs (18.4% faster)

def test_skipped_summary_large_scale_edge_bytes():
    """Test with skipped bytes at maximum allowed size (just under 100MB)."""
    stats = UploadStats()
    stats._num_tensors_skipped = 1
    stats._tensor_bytes_skipped = 100 * 1024 * 1024 - 1  # 104857599 bytes
    stats._num_blobs_skipped = 1
    stats._blob_bytes_skipped = 100 * 1024 * 1024 - 1
    # 104857599 / 1048576 = 99.999999 MB, formatted as 100.0 MB
    codeflash_output = stats._skipped_summary(); result = codeflash_output # 3.01μs -> 2.43μs (24.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import time

# imports
import pytest  # used for our unit tests
from google.cloud.aiplatform.tensorboard.upload_tracker import UploadStats

# unit tests

# ------------- Basic Test Cases -------------

def test_skipped_summary_none_skipped():
    # No tensors or blobs skipped
    stats = UploadStats()
    codeflash_output = stats._skipped_summary() # 515ns -> 422ns (22.0% faster)

def test_skipped_summary_only_tensors():
    # Only tensors skipped
    stats = UploadStats()
    stats._num_tensors_skipped = 3
    stats._tensor_bytes_skipped = 500
    expected = "3 tensors (500 B)"
    codeflash_output = stats._skipped_summary() # 1.63μs -> 1.55μs (4.82% faster)

def test_skipped_summary_only_blobs():
    # Only blobs skipped
    stats = UploadStats()
    stats._num_blobs_skipped = 2
    stats._blob_bytes_skipped = 2048
    expected = "2 binary objects (2.0 kB)"
    codeflash_output = stats._skipped_summary() # 2.55μs -> 2.23μs (14.7% faster)

def test_skipped_summary_both_types():
    # Both tensors and blobs skipped
    stats = UploadStats()
    stats._num_tensors_skipped = 1
    stats._tensor_bytes_skipped = 1048576  # 1MB
    stats._num_blobs_skipped = 4
    stats._blob_bytes_skipped = 4096
    expected = "1 tensors (1.0 MB), 4 binary objects (4.0 kB)"
    codeflash_output = stats._skipped_summary() # 3.19μs -> 2.69μs (18.4% faster)

def test_skipped_summary_zero_bytes():
    # Skipped counts but zero bytes
    stats = UploadStats()
    stats._num_tensors_skipped = 2
    stats._tensor_bytes_skipped = 0
    stats._num_blobs_skipped = 1
    stats._blob_bytes_skipped = 0
    expected = "2 tensors (0 B), 1 binary objects (0 B)"
    codeflash_output = stats._skipped_summary() # 2.07μs -> 1.62μs (27.8% faster)

# ------------- Edge Test Cases -------------

def test_skipped_summary_large_bytes():
    # Large byte counts for readable formatting
    stats = UploadStats()
    stats._num_tensors_skipped = 1
    stats._tensor_bytes_skipped = 2**21  # 2MB
    stats._num_blobs_skipped = 1
    stats._blob_bytes_skipped = 2**11    # 2kB
    expected = "1 tensors (2.0 MB), 1 binary objects (2.0 kB)"
    codeflash_output = stats._skipped_summary() # 3.04μs -> 2.48μs (22.7% faster)

def test_skipped_summary_kb_boundary():
    # Exactly 1024 bytes should be formatted as kB
    stats = UploadStats()
    stats._num_tensors_skipped = 1
    stats._tensor_bytes_skipped = 1024
    expected = "1 tensors (1.0 kB)"
    codeflash_output = stats._skipped_summary() # 2.00μs -> 1.78μs (12.6% faster)

def test_skipped_summary_mb_boundary():
    # Exactly 1048576 bytes should be formatted as MB
    stats = UploadStats()
    stats._num_tensors_skipped = 1
    stats._tensor_bytes_skipped = 1048576
    expected = "1 tensors (1.0 MB)"
    codeflash_output = stats._skipped_summary() # 2.00μs -> 1.60μs (24.9% faster)

def test_skipped_summary_pluralization():
    # The summary does not singularize: a count of 1 still reads "1 tensors"
    stats = UploadStats()
    stats._num_tensors_skipped = 1
    stats._tensor_bytes_skipped = 100
    stats._num_blobs_skipped = 2
    stats._blob_bytes_skipped = 200
    expected = "1 tensors (100 B), 2 binary objects (200 B)"
    codeflash_output = stats._skipped_summary() # 2.07μs -> 1.70μs (21.9% faster)

def test_skipped_summary_order():
    # Tensors should always come before blobs
    stats = UploadStats()
    stats._num_blobs_skipped = 5
    stats._blob_bytes_skipped = 500
    stats._num_tensors_skipped = 3
    stats._tensor_bytes_skipped = 300
    expected = "3 tensors (300 B), 5 binary objects (500 B)"
    codeflash_output = stats._skipped_summary() # 1.95μs -> 1.49μs (30.5% faster)

def test_skipped_summary_negative_values():
    # Negative values should be handled gracefully (not expected, but test for robustness)
    stats = UploadStats()
    stats._num_tensors_skipped = -1
    stats._tensor_bytes_skipped = -500
    stats._num_blobs_skipped = -2
    stats._blob_bytes_skipped = -1024
    # Negative counts are still included: the function only checks truthiness, and negative numbers are truthy
    expected = "-1 tensors (-500 B), -2 binary objects (-1.0 kB)"
    codeflash_output = stats._skipped_summary() # 2.00μs -> 1.64μs (21.5% faster)

def test_skipped_summary_float_bytes():
    # Bytes as float, should be formatted correctly
    stats = UploadStats()
    stats._num_tensors_skipped = 1
    stats._tensor_bytes_skipped = 1536.5  # 1.5 kB + 0.5 B
    expected = "1 tensors (1.5 kB)"
    codeflash_output = stats._skipped_summary() # 2.33μs -> 2.19μs (6.29% faster)

def test_skipped_summary_float_count():
    # Skipped count as float, should be formatted correctly
    stats = UploadStats()
    stats._num_tensors_skipped = 1.5
    stats._tensor_bytes_skipped = 100
    expected = "1.5 tensors (100 B)"
    codeflash_output = stats._skipped_summary() # 1.70μs -> 1.50μs (12.8% faster)

# ------------- Large Scale Test Cases -------------

def test_skipped_summary_large_scale_tensors():
    # Many tensors skipped, large bytes
    stats = UploadStats()
    stats._num_tensors_skipped = 999
    stats._tensor_bytes_skipped = 999 * 1048576  # 999MB
    expected = "999 tensors (999.0 MB)"
    codeflash_output = stats._skipped_summary() # 2.49μs -> 2.10μs (18.4% faster)

def test_skipped_summary_large_scale_blobs():
    # Many blobs skipped, large bytes
    stats = UploadStats()
    stats._num_blobs_skipped = 1000
    stats._blob_bytes_skipped = 1000 * 1024  # 1000kB
    expected = "1000 binary objects (1000.0 kB)"
    codeflash_output = stats._skipped_summary() # 2.36μs -> 2.06μs (14.2% faster)

def test_skipped_summary_large_scale_both():
    # Both tensors and blobs, large scale
    stats = UploadStats()
    stats._num_tensors_skipped = 500
    stats._tensor_bytes_skipped = 500 * 2048  # 1,024,000 bytes = 1000.0 kB
    stats._num_blobs_skipped = 750
    stats._blob_bytes_skipped = 750 * 4096   # 3,072,000 bytes = ~2.9 MB
    expected = "500 tensors (1000.0 kB), 750 binary objects (2.9 MB)"
    codeflash_output = stats._skipped_summary() # 3.16μs -> 2.70μs (17.4% faster)

def test_skipped_summary_maximum_bytes():
    # Maximum allowed bytes (under 100MB)
    stats = UploadStats()
    stats._num_tensors_skipped = 1
    stats._tensor_bytes_skipped = 99 * 2**20  # 99MB
    stats._num_blobs_skipped = 1
    stats._blob_bytes_skipped = 99 * 2**20
    expected = "1 tensors (99.0 MB), 1 binary objects (99.0 MB)"
    codeflash_output = stats._skipped_summary() # 2.71μs -> 2.36μs (14.6% faster)

def test_skipped_summary_empty_string_no_skipped_large():
    # Large scale, but no skipped
    stats = UploadStats()
    stats._num_tensors = 1000
    stats._num_blobs = 1000
    # No skipped
    codeflash_output = stats._skipped_summary() # 495ns -> 335ns (47.8% faster)

# ------------- Determinism Test -------------

def test_skipped_summary_deterministic():
    # Ensure repeated calls give same result
    stats = UploadStats()
    stats._num_tensors_skipped = 5
    stats._tensor_bytes_skipped = 5000
    stats._num_blobs_skipped = 2
    stats._blob_bytes_skipped = 2048
    codeflash_output = stats._skipped_summary(); result1 = codeflash_output # 3.26μs -> 2.69μs (20.9% faster)
    codeflash_output = stats._skipped_summary(); result2 = codeflash_output # 1.35μs -> 1.14μs (18.1% faster)

# ------------- Readability Test -------------

def test_skipped_summary_readability():
    # Output should be readable and formatted
    stats = UploadStats()
    stats._num_tensors_skipped = 2
    stats._tensor_bytes_skipped = 12345
    stats._num_blobs_skipped = 3
    stats._blob_bytes_skipped = 67890
    codeflash_output = stats._skipped_summary(); result = codeflash_output # 2.76μs -> 2.30μs (19.8% faster)

# ------------- Mutation Testing Guard -------------

def test_skipped_summary_mutation_guard():
    # Changing the order of tensor/blob should fail this test
    stats = UploadStats()
    stats._num_tensors_skipped = 2
    stats._tensor_bytes_skipped = 2048
    stats._num_blobs_skipped = 1
    stats._blob_bytes_skipped = 1024
    expected = "2 tensors (2.0 kB), 1 binary objects (1.0 kB)"
    codeflash_output = stats._skipped_summary() # 2.71μs -> 2.21μs (22.2% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-UploadStats._skipped_summary-mglr1o22`, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 11, 2025 04:02
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 11, 2025