@codeflash-ai codeflash-ai bot commented Oct 20, 2025

📄 1,160% (11.60x) speedup for get_google_colab_secret in src/together/utils/api_helpers.py

⏱️ Runtime : 10.6 milliseconds → 844 microseconds (best of 80 runs)
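
The percentages in this report, including the per-test comments further down, appear to follow (original − optimized) / optimized. The helper below is a hypothetical sketch for checking the figures by hand; the name `speedup_percent` is an assumption and is not part of Codeflash.

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Relative speedup in percent; both runtimes must use the same unit."""
    if optimized <= 0:
        raise ValueError("optimized runtime must be positive")
    return (original - optimized) / optimized * 100.0


# Headline figures from this report, both converted to microseconds.
headline = speedup_percent(10_600.0, 844.0)
print(f"{headline:.0f}%")  # prints "1156%", which the report rounds to 1,160%
```

A negative result corresponds to the "slower" annotations on some tests, where the optimized runtime is larger than the original.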

📝 Explanation and details

The key optimization is in the log_info function, which achieves a 3,680x speedup by eliminating expensive operations when logging isn't needed.

Primary optimization:

  • Cached console log level: The original code called _console_log_level() on every log_info invocation, which checks environment variables and object attributes. The optimized version caches this result at module load time as _CONSOLE_LOG_LEVEL.
  • Conditional message construction: The original code always built the log message via logfmt(dict(...)) and only then checked whether it should be logged. The optimized version reverses the order: it constructs the expensive message only if logging will actually occur.

Performance impact from line profiler:

  • Original log_info: 33.6ms total (98.9% spent in logger.info() calls)
  • Optimized log_info: 9.1μs total (100% spent on the lightweight conditional check)

Test case performance:
The optimization is most effective for scenarios with frequent logging calls where the console log level is typically not "debug" or "info":

  • Cases with missing secrets: 14,071-31,894% faster (these trigger multiple log_info calls)
  • Cases with access errors: 15,880% faster
  • Basic non-Colab cases: 177-235% faster

The optimization works because it skips the expensive logfmt() string formatting and the logger.info() call whenever the message would not be printed to the console and the logger is None or disabled.

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 1034 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 1 Passed |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
import sys

# imports
import pytest
from together.utils.api_helpers import get_google_colab_secret

# ---- Unit Tests ----

# Helper classes to simulate google.colab.userdata and exceptions
class DummyUserdata:
    class NotebookAccessError(Exception):
        pass
    class SecretNotFoundError(Exception):
        pass

    def __init__(self, secrets=None, raise_access_error=False, raise_notfound_error=False):
        self.secrets = secrets or {}
        self.raise_access_error = raise_access_error
        self.raise_notfound_error = raise_notfound_error

    def get(self, key):
        if self.raise_access_error:
            raise DummyUserdata.NotebookAccessError()
        if self.raise_notfound_error or key not in self.secrets:
            raise DummyUserdata.SecretNotFoundError()
        return self.secrets[key]

# ---- Basic Test Cases ----

def test_not_in_colab_returns_none():
    # Not running in colab, should always return None
    codeflash_output = get_google_colab_secret() # 480μs -> 2.69μs (17735% faster)
#------------------------------------------------
import sys

# imports
import pytest
from together.utils.api_helpers import get_google_colab_secret

# --- Unit Tests ---

# Helper class to simulate google.colab.userdata behavior
class DummyUserdata:
    class NotebookAccessError(Exception):
        pass

    class SecretNotFoundError(Exception):
        pass

    def __init__(self, secrets=None, raise_access=False, raise_notfound=False):
        self.secrets = secrets or {}
        self.raise_access = raise_access
        self.raise_notfound = raise_notfound

    def get(self, secret_name):
        if self.raise_access:
            raise DummyUserdata.NotebookAccessError()
        if self.raise_notfound or secret_name not in self.secrets:
            raise DummyUserdata.SecretNotFoundError()
        return self.secrets[secret_name]

# --- Basic Test Cases ---

def test_not_in_colab_returns_none():
    # sys.modules does NOT contain "google.colab"
    if "google.colab" in sys.modules:
        del sys.modules["google.colab"]
    codeflash_output = get_google_colab_secret(); result = codeflash_output # 690ns -> 560ns (23.2% faster)


def test_secret_found_with_custom_name():
    # Simulate Colab with a custom secret name
    dummy_userdata = DummyUserdata(secrets={"MY_SECRET": "custom_value"})
    sys.modules["google.colab"] = type("colab", (), {"userdata": dummy_userdata})
    codeflash_output = get_google_colab_secret(secret_name="MY_SECRET"); result = codeflash_output # 2.64μs -> 2.75μs (4.03% slower)

# --- Edge Test Cases ---

def test_secret_not_found_returns_none():
    # Simulate Colab, secret not present
    dummy_userdata = DummyUserdata(secrets={"OTHER_KEY": "value"}, raise_notfound=True)
    sys.modules["google.colab"] = type("colab", (), {"userdata": dummy_userdata})
    codeflash_output = get_google_colab_secret(); result = codeflash_output # 517μs -> 3.65μs (14071% faster)

def test_notebook_access_error_returns_none():
    # Simulate Colab, but notebook access is disabled
    dummy_userdata = DummyUserdata(secrets={"TOGETHER_API_KEY": "abc123"}, raise_access=True)
    sys.modules["google.colab"] = type("colab", (), {"userdata": dummy_userdata})
    codeflash_output = get_google_colab_secret(); result = codeflash_output # 495μs -> 3.10μs (15880% faster)

def test_secret_is_not_string_returns_none():
    # Simulate Colab, secret value is not a string (e.g., integer)
    dummy_userdata = DummyUserdata(secrets={"TOGETHER_API_KEY": 123456})
    sys.modules["google.colab"] = type("colab", (), {"userdata": dummy_userdata})
    codeflash_output = get_google_colab_secret(); result = codeflash_output # 2.86μs -> 2.27μs (26.1% faster)

def test_secret_is_bytes_returns_none():
    # Simulate Colab, secret value is bytes (should NOT be accepted)
    dummy_userdata = DummyUserdata(secrets={"TOGETHER_API_KEY": b"abc123"})
    sys.modules["google.colab"] = type("colab", (), {"userdata": dummy_userdata})
    codeflash_output = get_google_colab_secret(); result = codeflash_output # 2.12μs -> 2.14μs (0.794% slower)

def test_secret_is_none_returns_none():
    # Simulate Colab, secret value is None
    dummy_userdata = DummyUserdata(secrets={"TOGETHER_API_KEY": None})
    sys.modules["google.colab"] = type("colab", (), {"userdata": dummy_userdata})
    codeflash_output = get_google_colab_secret(); result = codeflash_output # 2.22μs -> 2.08μs (6.98% faster)

def test_secret_is_empty_string_returns_empty_string():
    # Simulate Colab, secret value is empty string
    dummy_userdata = DummyUserdata(secrets={"TOGETHER_API_KEY": ""})
    sys.modules["google.colab"] = type("colab", (), {"userdata": dummy_userdata})
    codeflash_output = get_google_colab_secret(); result = codeflash_output # 2.25μs -> 2.12μs (6.52% faster)

def test_secret_is_whitespace_string_returns_whitespace():
    # Simulate Colab, secret value is whitespace string
    dummy_userdata = DummyUserdata(secrets={"TOGETHER_API_KEY": "   "})
    sys.modules["google.colab"] = type("colab", (), {"userdata": dummy_userdata})
    codeflash_output = get_google_colab_secret(); result = codeflash_output # 2.22μs -> 2.12μs (4.72% faster)

def test_secret_name_with_special_characters():
    # Simulate Colab, secret name with special characters
    dummy_userdata = DummyUserdata(secrets={"TOGETHER@API#KEY!": "special"})
    sys.modules["google.colab"] = type("colab", (), {"userdata": dummy_userdata})
    codeflash_output = get_google_colab_secret(secret_name="TOGETHER@API#KEY!"); result = codeflash_output # 2.53μs -> 2.49μs (1.65% faster)

# --- Large Scale Test Cases ---

def test_many_secrets_only_returns_requested():
    # Simulate Colab with many secrets, only requested one should be returned
    secrets = {f"KEY_{i}": f"value_{i}" for i in range(1000)}
    secrets["TOGETHER_API_KEY"] = "target_secret"
    dummy_userdata = DummyUserdata(secrets=secrets)
    sys.modules["google.colab"] = type("colab", (), {"userdata": dummy_userdata})
    codeflash_output = get_google_colab_secret(); result = codeflash_output # 2.42μs -> 2.47μs (1.78% slower)


def test_many_calls_with_different_keys():
    # Simulate Colab with many secrets, test scalability with multiple calls
    secrets = {f"KEY_{i}": f"value_{i}" for i in range(1000)}
    dummy_userdata = DummyUserdata(secrets=secrets)
    sys.modules["google.colab"] = type("colab", (), {"userdata": dummy_userdata})
    for i in range(1000):
        codeflash_output = get_google_colab_secret(secret_name=f"KEY_{i}"); result = codeflash_output # 777μs -> 780μs (0.478% slower)

def test_many_calls_with_missing_keys():
    # Simulate Colab, many calls for missing keys
    secrets = {f"KEY_{i}": f"value_{i}" for i in range(1000)}
    dummy_userdata = DummyUserdata(secrets=secrets)
    sys.modules["google.colab"] = type("colab", (), {"userdata": dummy_userdata})
    for i in range(1000, 1020):
        codeflash_output = get_google_colab_secret(secret_name=f"KEY_{i}"); result = codeflash_output # 7.80ms -> 24.4μs (31894% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from together.utils.api_helpers import get_google_colab_secret

def test_get_google_colab_secret():
    get_google_colab_secret(secret_name='')
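
The generated tests above simulate Colab by assigning a stand-in object to sys.modules["google.colab"] and never restore the previous state, so one test can affect the next. A minimal sketch of an isolated variant, using the standard library's `unittest.mock.patch.dict`, is shown below. `FakeUserdata` and `read_secret` are hypothetical stand-ins written for this example; `read_secret` only approximates the behavior the tests imply and is not the library function.

```python
import sys
from types import SimpleNamespace
from unittest.mock import patch


class FakeUserdata:
    """Hypothetical stand-in for google.colab.userdata."""

    class SecretNotFoundError(Exception):
        pass

    def __init__(self, secrets):
        self.secrets = dict(secrets)

    def get(self, name):
        if name not in self.secrets:
            raise FakeUserdata.SecretNotFoundError(name)
        return self.secrets[name]


def read_secret(name="TOGETHER_API_KEY"):
    """Simplified stand-in for the function under test (assumed behavior)."""
    colab = sys.modules.get("google.colab")
    if colab is None:
        return None
    try:
        value = colab.userdata.get(name)
    except FakeUserdata.SecretNotFoundError:
        return None
    # Only string secrets are accepted; anything else is treated as missing.
    return value if isinstance(value, str) else None


def test_secret_found_without_leaking_state():
    had_colab = "google.colab" in sys.modules
    fake_colab = SimpleNamespace(
        userdata=FakeUserdata({"TOGETHER_API_KEY": "abc123"})
    )
    # patch.dict restores sys.modules on exit, so later tests see the
    # same module table they would have seen without this test.
    with patch.dict(sys.modules, {"google.colab": fake_colab}):
        assert read_secret() == "abc123"
    assert ("google.colab" in sys.modules) == had_colab
```

With pytest, the built-in `monkeypatch` fixture's `setitem` method offers the same restore-on-teardown behavior.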
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_atws5rsq/tmp_w3sbogg/test_concolic_coverage.py::test_get_google_colab_secret | 535μs | 5.06μs | 10476%✅ |

To edit these changes, run git checkout codeflash/optimize-get_google_colab_secret-mgzrxdxw and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 20, 2025 23:36
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 20, 2025