@codeflash-ai codeflash-ai bot commented Oct 11, 2025

📄 9% (0.09x) speedup for UploadStats.summarize in google/cloud/aiplatform/tensorboard/upload_tracker.py

⏱️ Runtime : 169 microseconds → 155 microseconds (best of 514 runs)

📝 Explanation and details

The optimized code achieves a 9% speedup through several key optimizations:

**1. Precomputed Constants in `readable_bytes_string`**
The original code repeatedly computed `2**10`, `2**20`, and `float()` conversions on every call. The optimized version precomputes these as module-level constants (`_KB = 1024`, `_MB = 1024 * 1024`, `_KB_f`, `_MB_f`), eliminating expensive power operations and float conversions. This optimization is particularly effective since `readable_bytes_string` is called frequently during summarization.
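A minimal sketch of the constant-precomputation idea, consistent with the byte-formatting boundaries exercised by the tests below (the helper name and thresholds follow the description; this is not the exact library source):

```python
# Module-level constants: computed once at import time instead of per call.
_KB = 1024
_MB = 1024 * 1024
_KB_f = float(_KB)
_MB_f = float(_MB)


def readable_bytes_string(num_bytes: int) -> str:
    """Format a byte count as B / kB / MB, comparing against precomputed ints
    and dividing by precomputed floats (no 2**10, 2**20, or float() per call)."""
    if num_bytes >= _MB:
        return "%.1f MB" % (num_bytes / _MB_f)
    elif num_bytes >= _KB:
        return "%.1f kB" % (num_bytes / _KB_f)
    else:
        return "%d B" % num_bytes
```

The boundary behavior matches the regression tests: 1023 bytes stays in the `B` branch, 1024 becomes `1.0 kB`, and `2**20` becomes `1.0 MB`.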

**2. Reduced Function Call Overhead in `summarize`**
The original code used a conditional expression `self._skipped_summary() if self._skipped_any() else None` that always called `_skipped_any()` and potentially `_skipped_summary()`. The optimized version inlines the skipped check (`if self._num_tensors_skipped or self._num_blobs_skipped`) and only calls `_skipped_summary()` when needed, eliminating one function call per summarize operation.

**3. Improved Data Structure Construction**
Instead of creating an empty list and repeatedly calling `append()`, the optimized version constructs the `string_pieces` list directly using a list literal with conditional expressions. This reduces list resize operations and method call overhead.
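A sketch of the two construction styles (the counts and wording are illustrative placeholders, not the exact summary strings):

```python
num_scalars, num_tensors, num_blobs = 5, 0, 3

# Original style: empty list plus repeated .append() calls.
pieces = []
pieces.append("%d scalars" % num_scalars)
if num_tensors:
    pieces.append("%d tensors" % num_tensors)
if num_blobs:
    pieces.append("%d binary objects" % num_blobs)

# Optimized style: build the list in one expression, dropping empty entries,
# so the list is sized once and no bound-method lookups are needed.
pieces_fast = [p for p in (
    "%d scalars" % num_scalars,
    "%d tensors" % num_tensors if num_tensors else None,
    "%d binary objects" % num_blobs if num_blobs else None,
) if p is not None]

assert pieces == pieces_fast
```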

**4. Variable Access Optimization**
The optimized code uses a local variable `b = bytes` in `readable_bytes_string` and moves all computation calculations to the top of `summarize()`, improving CPU cache locality and reducing repeated attribute access.
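The local-binding pattern behind this point can be shown in isolation: in CPython, reading a local variable is a cheaper bytecode than a repeated attribute lookup. The class and function names below are illustrative.

```python
class C:
    def __init__(self):
        self.x = 7


def repeated_attr(obj, n=1000):
    total = 0
    for _ in range(n):
        total += obj.x  # attribute lookup on every iteration
    return total


def local_bound(obj, n=1000):
    x = obj.x  # bind once to a local, then read the local in the loop
    total = 0
    for _ in range(n):
        total += x
    return total
```

Both functions compute the same result; the second trades one up-front attribute read for fast local loads inside the loop, which is what hoisting the computations to the top of `summarize()` achieves.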

These optimizations show consistent improvements across all test cases, with the most significant gains (15-25%) in simple cases with fewer bytes formatting operations, and smaller but meaningful gains (2-12%) in complex cases with multiple data types and skipped items. The optimizations are particularly effective for high-frequency logging scenarios where summarize() is called repeatedly.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 92 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import time

# imports
import pytest  # used for our unit tests
from google.cloud.aiplatform.tensorboard.upload_tracker import UploadStats

# unit tests

# ---------------------
# Basic Test Cases
# ---------------------

def test_summarize_empty_stats():
    # Test summarize on a fresh UploadStats object
    stats = UploadStats()
    summary, skipped = stats.summarize() # 2.11μs -> 1.73μs (21.7% faster)

def test_summarize_only_scalars():
    # Test with only scalars uploaded
    stats = UploadStats()
    stats._num_scalars = 5
    summary, skipped = stats.summarize() # 1.97μs -> 1.64μs (20.1% faster)

def test_summarize_only_tensors():
    # Test with only tensors uploaded
    stats = UploadStats()
    stats._num_tensors = 3
    stats._tensor_bytes = 500
    summary, skipped = stats.summarize() # 2.78μs -> 2.65μs (5.03% faster)

def test_summarize_only_blobs():
    # Test with only binary objects uploaded
    stats = UploadStats()
    stats._num_blobs = 2
    stats._blob_bytes = 2048
    summary, skipped = stats.summarize() # 4.06μs -> 3.43μs (18.5% faster)

def test_summarize_all_types():
    # Test with scalars, tensors, and binary objects uploaded
    stats = UploadStats()
    stats._num_scalars = 1
    stats._num_tensors = 2
    stats._tensor_bytes = 1024
    stats._num_blobs = 1
    stats._blob_bytes = 1048576
    summary, skipped = stats.summarize() # 4.16μs -> 3.78μs (9.92% faster)

def test_summarize_with_skipped_tensor():
    # Test with one tensor skipped
    stats = UploadStats()
    stats._num_scalars = 2
    stats._num_tensors = 4
    stats._tensor_bytes = 4096
    stats._num_tensors_skipped = 1
    stats._tensor_bytes_skipped = 1024
    summary, skipped = stats.summarize() # 4.59μs -> 4.28μs (7.32% faster)

def test_summarize_with_skipped_blob():
    # Test with one binary object skipped
    stats = UploadStats()
    stats._num_blobs = 3
    stats._blob_bytes = 3072
    stats._num_blobs_skipped = 2
    stats._blob_bytes_skipped = 2048
    summary, skipped = stats.summarize() # 4.42μs -> 4.11μs (7.67% faster)

def test_summarize_with_skipped_tensor_and_blob():
    # Test with both tensors and binary objects skipped
    stats = UploadStats()
    stats._num_tensors = 10
    stats._tensor_bytes = 10240
    stats._num_tensors_skipped = 3
    stats._tensor_bytes_skipped = 3072
    stats._num_blobs = 5
    stats._blob_bytes = 5120
    stats._num_blobs_skipped = 2
    stats._blob_bytes_skipped = 2048
    summary, skipped = stats.summarize() # 5.13μs -> 4.84μs (5.95% faster)

# ---------------------
# Edge Test Cases
# ---------------------

def test_summarize_zero_bytes():
    # Test with zero bytes for tensors and blobs
    stats = UploadStats()
    stats._num_tensors = 2
    stats._tensor_bytes = 0
    stats._num_blobs = 2
    stats._blob_bytes = 0
    summary, skipped = stats.summarize() # 2.91μs -> 2.60μs (11.7% faster)

def test_summarize_skipped_all_tensors():
    # All tensors are skipped
    stats = UploadStats()
    stats._num_tensors = 5
    stats._tensor_bytes = 5120
    stats._num_tensors_skipped = 5
    stats._tensor_bytes_skipped = 5120
    summary, skipped = stats.summarize() # 3.72μs -> 3.48μs (6.93% faster)

def test_summarize_skipped_all_blobs():
    # All binary objects are skipped
    stats = UploadStats()
    stats._num_blobs = 4
    stats._blob_bytes = 4096
    stats._num_blobs_skipped = 4
    stats._blob_bytes_skipped = 4096
    summary, skipped = stats.summarize() # 3.57μs -> 3.27μs (9.28% faster)

def test_summarize_skipped_bytes_greater_than_uploaded():
    # Skipped bytes exceed uploaded bytes (should not happen, but test behavior)
    stats = UploadStats()
    stats._num_tensors = 5
    stats._tensor_bytes = 2048
    stats._num_tensors_skipped = 2
    stats._tensor_bytes_skipped = 4096  # skipped bytes > total bytes
    summary, skipped = stats.summarize() # 4.20μs -> 3.97μs (5.74% faster)

def test_summarize_negative_bytes():
    # Negative bytes (should not happen, but test behavior)
    stats = UploadStats()
    stats._num_tensors = 2
    stats._tensor_bytes = -1024
    stats._num_tensors_skipped = 1
    stats._tensor_bytes_skipped = -512
    summary, skipped = stats.summarize() # 3.49μs -> 3.20μs (9.00% faster)

def test_summarize_large_bytes_units():
    # Test readable_bytes_string for MB and kB boundaries
    stats = UploadStats()
    stats._num_tensors = 1
    stats._tensor_bytes = 2**20  # 1 MB
    stats._num_blobs = 1
    stats._blob_bytes = 2**10  # 1 kB
    summary, skipped = stats.summarize() # 3.92μs -> 3.69μs (6.18% faster)

def test_summarize_skipped_none_if_none_skipped():
    # Skipped summary should be None if nothing is skipped
    stats = UploadStats()
    stats._num_tensors = 2
    stats._tensor_bytes = 2048
    stats._num_blobs = 2
    stats._blob_bytes = 2048
    summary, skipped = stats.summarize() # 3.59μs -> 3.23μs (11.4% faster)

def test_summarize_skipped_summary_format():
    # Skipped summary should format both tensors and blobs
    stats = UploadStats()
    stats._num_tensors_skipped = 2
    stats._tensor_bytes_skipped = 2048
    stats._num_blobs_skipped = 1
    stats._blob_bytes_skipped = 1024
    summary, skipped = stats.summarize() # 5.17μs -> 4.85μs (6.52% faster)

def test_summarize_no_scalars_no_tensors_no_blobs():
    # Test with no data at all
    stats = UploadStats()
    summary, skipped = stats.summarize() # 1.84μs -> 1.52μs (20.5% faster)

# ---------------------
# Large Scale Test Cases
# ---------------------

def test_summarize_large_tensor_count_and_bytes():
    # Test with large number of tensors and large bytes
    stats = UploadStats()
    stats._num_scalars = 100
    stats._num_tensors = 999
    stats._tensor_bytes = 999 * 2048  # 2MB approx
    stats._num_blobs = 500
    stats._blob_bytes = 500 * 1024  # 500kB
    summary, skipped = stats.summarize() # 4.35μs -> 4.06μs (7.04% faster)

def test_summarize_large_skipped_counts():
    # Large skipped counts
    stats = UploadStats()
    stats._num_tensors = 1000
    stats._tensor_bytes = 1000 * 1024
    stats._num_tensors_skipped = 500
    stats._tensor_bytes_skipped = 500 * 1024
    stats._num_blobs = 1000
    stats._blob_bytes = 1000 * 2048
    stats._num_blobs_skipped = 999
    stats._blob_bytes_skipped = 999 * 2048
    summary, skipped = stats.summarize() # 5.38μs -> 5.28μs (1.82% faster)

def test_summarize_large_bytes_boundary():
    # Test with bytes just below and above MB boundary
    stats = UploadStats()
    stats._num_tensors = 1
    stats._tensor_bytes = 2**20 - 1  # Just below 1MB
    stats._num_blobs = 1
    stats._blob_bytes = 2**20 + 1  # Just above 1MB
    summary, skipped = stats.summarize() # 3.87μs -> 3.49μs (10.9% faster)

def test_summarize_large_skipped_bytes():
    # Large skipped bytes for tensors and blobs
    stats = UploadStats()
    stats._num_tensors = 100
    stats._tensor_bytes = 100 * 1024
    stats._num_tensors_skipped = 99
    stats._tensor_bytes_skipped = 99 * 1024
    stats._num_blobs = 100
    stats._blob_bytes = 100 * 2048
    stats._num_blobs_skipped = 99
    stats._blob_bytes_skipped = 99 * 2048
    summary, skipped = stats.summarize() # 5.34μs -> 5.04μs (5.77% faster)

def test_summarize_maximum_scalars():
    # Test with maximum allowed scalars (under 1000)
    stats = UploadStats()
    stats._num_scalars = 999
    summary, skipped = stats.summarize() # 1.84μs -> 1.64μs (12.4% faster)

def test_summarize_maximum_tensors_and_blobs():
    # Test with maximum allowed tensors and blobs
    stats = UploadStats()
    stats._num_tensors = 999
    stats._tensor_bytes = 999 * 1024
    stats._num_blobs = 999
    stats._blob_bytes = 999 * 2048
    summary, skipped = stats.summarize() # 4.12μs -> 3.79μs (8.71% faster)

def test_summarize_large_skipped_and_uploaded():
    # Large uploaded and skipped counts
    stats = UploadStats()
    stats._num_scalars = 500
    stats._num_tensors = 999
    stats._tensor_bytes = 999 * 1024
    stats._num_tensors_skipped = 499
    stats._tensor_bytes_skipped = 499 * 1024
    stats._num_blobs = 999
    stats._blob_bytes = 999 * 2048
    stats._num_blobs_skipped = 499
    stats._blob_bytes_skipped = 499 * 2048
    summary, skipped = stats.summarize() # 5.41μs -> 4.97μs (9.00% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import time

# imports
import pytest  # used for our unit tests
from google.cloud.aiplatform.tensorboard.upload_tracker import UploadStats

# unit tests

# --- Basic Test Cases ---

def test_summarize_all_zeros():
    """Basic: All counters are zero, should report zero for everything and None for skipped."""
    stats = UploadStats()
    summary, skipped = stats.summarize() # 1.85μs -> 1.60μs (15.9% faster)

def test_summarize_only_scalars():
    """Basic: Only scalars are present, tensors and blobs are zero."""
    stats = UploadStats()
    stats._num_scalars = 5
    summary, skipped = stats.summarize() # 1.82μs -> 1.60μs (13.1% faster)

def test_summarize_tensors_and_scalars():
    """Basic: Scalars and tensors present, no skipped, no blobs."""
    stats = UploadStats()
    stats._num_scalars = 3
    stats._num_tensors = 2
    stats._tensor_bytes = 2048  # 2 kB
    summary, skipped = stats.summarize() # 3.55μs -> 3.18μs (11.7% faster)

def test_summarize_blobs_and_scalars():
    """Basic: Scalars and blobs present, no tensors."""
    stats = UploadStats()
    stats._num_scalars = 1
    stats._num_blobs = 2
    stats._blob_bytes = 3000  # ~2.9 kB
    summary, skipped = stats.summarize() # 3.44μs -> 3.04μs (13.3% faster)

def test_summarize_all_types():
    """Basic: Scalars, tensors, and blobs present."""
    stats = UploadStats()
    stats._num_scalars = 7
    stats._num_tensors = 3
    stats._tensor_bytes = 1000000  # ~976.6 kB
    stats._num_blobs = 4
    stats._blob_bytes = 4000000  # ~3.8 MB
    summary, skipped = stats.summarize() # 4.08μs -> 3.83μs (6.61% faster)

# --- Edge Test Cases ---

def test_summarize_some_tensors_skipped():
    """Edge: Some tensors are skipped, should show correct uploaded and skipped counts/bytes."""
    stats = UploadStats()
    stats._num_scalars = 2
    stats._num_tensors = 5
    stats._num_tensors_skipped = 2
    stats._tensor_bytes = 5000
    stats._tensor_bytes_skipped = 2000
    summary, skipped = stats.summarize() # 4.54μs -> 4.22μs (7.56% faster)

def test_summarize_some_blobs_skipped():
    """Edge: Some blobs are skipped, should show correct uploaded and skipped counts/bytes."""
    stats = UploadStats()
    stats._num_blobs = 10
    stats._num_blobs_skipped = 7
    stats._blob_bytes = 10000
    stats._blob_bytes_skipped = 7000
    summary, skipped = stats.summarize() # 4.40μs -> 3.99μs (10.3% faster)

def test_summarize_tensors_and_blobs_skipped():
    """Edge: Both tensors and blobs are skipped, should show both in skipped string."""
    stats = UploadStats()
    stats._num_tensors = 8
    stats._num_tensors_skipped = 3
    stats._tensor_bytes = 8000
    stats._tensor_bytes_skipped = 3000
    stats._num_blobs = 6
    stats._num_blobs_skipped = 2
    stats._blob_bytes = 6000
    stats._blob_bytes_skipped = 2000
    summary, skipped = stats.summarize() # 5.16μs -> 4.86μs (6.15% faster)

def test_summarize_all_tensors_skipped():
    """Edge: All tensors are skipped, so uploaded tensors count is zero."""
    stats = UploadStats()
    stats._num_tensors = 4
    stats._num_tensors_skipped = 4
    stats._tensor_bytes = 4096
    stats._tensor_bytes_skipped = 4096
    summary, skipped = stats.summarize() # 3.54μs -> 3.28μs (8.11% faster)

def test_summarize_all_blobs_skipped():
    """Edge: All blobs are skipped, so uploaded blobs count is zero."""
    stats = UploadStats()
    stats._num_blobs = 5
    stats._num_blobs_skipped = 5
    stats._blob_bytes = 5000
    stats._blob_bytes_skipped = 5000
    summary, skipped = stats.summarize() # 3.55μs -> 3.20μs (10.9% faster)

def test_summarize_bytes_boundaries():
    """Edge: Test readable_bytes_string boundaries (B, kB, MB)."""
    stats = UploadStats()
    stats._num_tensors = 3
    stats._tensor_bytes = 1023  # Should be '1023 B'
    summary, _ = stats.summarize() # 2.66μs -> 2.35μs (13.2% faster)

    stats._tensor_bytes = 1024  # Should be '1.0 kB'
    summary, _ = stats.summarize() # 2.06μs -> 1.99μs (3.21% faster)

    stats._tensor_bytes = 2**20  # Should be '1.0 MB'
    summary, _ = stats.summarize() # 1.23μs -> 986ns (24.4% faster)

def test_summarize_negative_bytes():
    """Edge: Negative bytes (should still format, but negative values are possible from incorrect usage)."""
    stats = UploadStats()
    stats._num_tensors = 2
    stats._tensor_bytes = -100
    summary, _ = stats.summarize() # 2.52μs -> 2.30μs (9.51% faster)

def test_summarize_negative_skipped_bytes():
    """Edge: Negative skipped bytes (should still format, but negative values are possible from incorrect usage)."""
    stats = UploadStats()
    stats._num_tensors = 2
    stats._num_tensors_skipped = 1
    stats._tensor_bytes = 200
    stats._tensor_bytes_skipped = 300  # Skipped > total, uploaded bytes negative
    summary, skipped = stats.summarize() # 3.50μs -> 3.17μs (10.6% faster)

def test_summarize_plugin_names_ignored():
    """Edge: _plugin_names is not used in summarize, changing it should not affect output."""
    stats = UploadStats()
    stats._plugin_names.add("foo_plugin")
    summary, skipped = stats.summarize() # 1.77μs -> 1.50μs (17.8% faster)

# --- Large Scale Test Cases ---

def test_summarize_large_scalars():
    """Large: Many scalars, no tensors or blobs."""
    stats = UploadStats()
    stats._num_scalars = 999
    summary, skipped = stats.summarize() # 1.80μs -> 1.54μs (17.1% faster)

def test_summarize_large_tensors():
    """Large: Many tensors, large total bytes."""
    stats = UploadStats()
    stats._num_tensors = 1000
    stats._tensor_bytes = 1000 * 1024  # 1,024,000 bytes = 1000 kB = ~1000.0 kB
    summary, skipped = stats.summarize() # 3.83μs -> 3.47μs (10.5% faster)

def test_summarize_large_blobs():
    """Large: Many blobs, large total bytes."""
    stats = UploadStats()
    stats._num_blobs = 1000
    stats._blob_bytes = 1000 * 2048  # 2,048,000 bytes = ~2.0 MB
    summary, skipped = stats.summarize() # 3.64μs -> 3.13μs (16.3% faster)

def test_summarize_large_skipped():
    """Large: Large numbers of skipped tensors and blobs."""
    stats = UploadStats()
    stats._num_tensors = 1000
    stats._num_tensors_skipped = 500
    stats._tensor_bytes = 1000 * 1024
    stats._tensor_bytes_skipped = 500 * 1024
    stats._num_blobs = 800
    stats._num_blobs_skipped = 400
    stats._blob_bytes = 800 * 2048
    stats._blob_bytes_skipped = 400 * 2048
    summary, skipped = stats.summarize() # 5.60μs -> 5.20μs (7.74% faster)

def test_summarize_large_all_skipped():
    """Large: All large tensors and blobs are skipped."""
    stats = UploadStats()
    stats._num_tensors = 1000
    stats._num_tensors_skipped = 1000
    stats._tensor_bytes = 1000 * 1024
    stats._tensor_bytes_skipped = 1000 * 1024
    stats._num_blobs = 1000
    stats._num_blobs_skipped = 1000
    stats._blob_bytes = 1000 * 2048
    stats._blob_bytes_skipped = 1000 * 2048
    summary, skipped = stats.summarize() # 4.55μs -> 4.24μs (7.22% faster)

def test_summarize_performance():
    """Large: Performance test for summarize with large numbers, should be quick."""
    stats = UploadStats()
    stats._num_scalars = 1000
    stats._num_tensors = 1000
    stats._tensor_bytes = 1000 * 1024
    stats._num_blobs = 1000
    stats._blob_bytes = 1000 * 2048
    # Time the summarize call
    start = time.time()
    summary, skipped = stats.summarize() # 3.69μs -> 3.60μs (2.28% faster)
    end = time.time()

def test_summarize_large_skipped_bytes_boundary():
    """Large: Skipped bytes at MB boundary."""
    stats = UploadStats()
    stats._num_tensors = 100
    stats._num_tensors_skipped = 50
    stats._tensor_bytes = 100 * 2**20  # 100 MB
    stats._tensor_bytes_skipped = 50 * 2**20  # 50 MB
    summary, skipped = stats.summarize() # 4.32μs -> 3.86μs (11.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-UploadStats.summarize-mglqouos` and push.

Codeflash

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 11, 2025 03:52
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 11, 2025