
fix: 5 critical bugs — TEE randomization, HTTP 402 hint, int dtype, inference None crash, gas estimation#201

Open
amathxbt wants to merge 4 commits into OpenGradient:main from amathxbt:fix/critical-5-bugs-tee-402-dtype-inference-gas

Conversation

@amathxbt
Contributor

Summary

This PR fixes 5 critical bugs identified from open issues and code audit, spanning TEE selection, LLM error handling, type conversions, inference safety, and gas estimation.


Bug 1 — TEE always picks the same node (closes #200)

File: src/opengradient/client/tee_registry.py

Root cause: get_llm_tee() always returned tees[0], routing 100% of traffic to a single TEE with zero load distribution or failover.

Fix: Replace tees[0] with random.choice(tees) so each LLM() construction independently selects from the full pool of active, registry-verified TEEs.

```python
# Before
return tees[0]

# After
selected = random.choice(tees)
logger.debug("Selected TEE %s from %d active LLM proxy TEE(s)", selected.tee_id, len(tees))
return selected
```
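The selection logic can be sketched end to end; TEEInfo below is a hypothetical stand-in for whatever record type tee_registry.py actually stores:

```python
# Sketch of pool-wide random TEE selection (TEEInfo is a hypothetical stand-in
# for the registry's actual record type).
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class TEEInfo:
    tee_id: str

def get_llm_tee(tees):
    if not tees:
        raise RuntimeError("No active LLM proxy TEEs in the registry.")
    # Uniform choice over the active pool instead of always tees[0].
    return random.choice(tees)

pool = [TEEInfo(f"tee-{i}") for i in range(4)]
# 200 independent selections hit every TEE with overwhelming probability.
chosen = {get_llm_tee(pool).tee_id for _ in range(200)}
```

With a uniform draw per LLM() construction, retries and re-initializations naturally spread across the healthy pool without any coordination between clients.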

Bug 2 — HTTP 402 swallowed as cryptic RuntimeError (closes #188)

File: src/opengradient/client/llm.py

Root cause: All HTTP errors (including 402 Payment Required) were caught by except Exception and re-raised as a generic RuntimeError("TEE LLM chat failed: ..."). Users had no idea they needed to call ensure_opg_approval() first.

Fix: Add import httpx and intercept httpx.HTTPStatusError before the generic handler. When status_code == 402, raise a RuntimeError with an explicit, actionable message pointing to ensure_opg_approval(). Applied to _chat_request(), completion(), and _parse_sse_response() (streaming path).

```python
# New constant
_402_HINT = (
    "Payment required (HTTP 402): your wallet may have insufficient OPG token allowance. "
    "Call llm.ensure_opg_approval(opg_amount=<amount>) to approve Permit2 spending "
    "before making requests. Minimum amount is 0.05 OPG."
)

# In _chat_request / completion / _parse_sse_response:
except httpx.HTTPStatusError as e:
    if e.response.status_code == 402:
        raise RuntimeError(_402_HINT) from e
    raise RuntimeError(f"TEE LLM chat failed: {e}") from e
```
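The 402 branch can be exercised without a live TEE or even httpx installed; FakeHTTPStatusError and wrap_tee_error below are illustrative stand-ins that mimic the shape of httpx.HTTPStatusError (a `.response` with `.status_code`) and the handler described above:

```python
# Offline sketch of the 402 mapping; FakeHTTPStatusError and wrap_tee_error
# are stand-ins, not the SDK's actual classes.
from types import SimpleNamespace

_402_HINT = (
    "Payment required (HTTP 402): call llm.ensure_opg_approval(opg_amount=...) "
    "before making requests."
)

class FakeHTTPStatusError(Exception):
    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.response = SimpleNamespace(status_code=status_code)

def wrap_tee_error(exc: FakeHTTPStatusError) -> None:
    # 402 gets the actionable hint; everything else surfaces as before.
    if exc.response.status_code == 402:
        raise RuntimeError(_402_HINT) from exc
    raise RuntimeError(f"TEE LLM chat failed: {exc}") from exc

try:
    wrap_tee_error(FakeHTTPStatusError(402))
except RuntimeError as e:
    msg_402 = str(e)

try:
    wrap_tee_error(FakeHTTPStatusError(500))
except RuntimeError as e:
    msg_500 = str(e)
```

Checking the status code before the generic handler is what preserves the old behavior for every non-402 error while making the payment case self-explanatory.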

Bug 3 — Fixed-point int returns np.float32 instead of int (partially closes #103)

File: src/opengradient/client/_conversions.py

Root cause: convert_to_float32(value, decimals) always returned np.float32, even when decimals == 0 (i.e., the original value was an integer). Users expecting integer outputs received np.float32 and had to cast manually.

Fix: Add convert_fixed_point_to_python(value, decimals) that returns int when decimals == 0 and np.float32 otherwise. np.array() then infers the correct dtype (int64 vs float32) automatically. The old convert_to_float32 is kept as a deprecated alias for backward compatibility.

```python
def convert_fixed_point_to_python(value: int, decimals: int) -> Union[int, np.float32]:
    if decimals == 0:
        return int(value)      # ← integer tensor stays integer
    return np.float32(Decimal(value) / (10 ** Decimal(decimals)))
```
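A self-contained sketch (assuming numpy is installed) shows the decimals == 0 fast path and the dtype np.array() infers from the converted elements; the function body mirrors the snippet above:

```python
# Sketch of the decimals == 0 fast path and np.array() dtype inference.
from decimal import Decimal
from typing import Union

import numpy as np

def convert_fixed_point_to_python(value: int, decimals: int) -> Union[int, np.float32]:
    if decimals == 0:
        return int(value)  # integer tensor stays integer
    return np.float32(Decimal(value) / (10 ** Decimal(decimals)))

# decimals == 0 -> Python ints -> integer ndarray
ints = np.array([convert_fixed_point_to_python(v, 0) for v in (1, 2, 3)])

# decimals == 2 -> np.float32 scalars -> float32 ndarray
floats = np.array([convert_fixed_point_to_python(v, 2) for v in (150, 275)])
```

Because the element type drives np.array()'s dtype choice, no call site needs to know the decimals value to get the right array dtype.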

Bug 4 — infer() crashes with AttributeError when inference node returns None

File: src/opengradient/client/alpha.py

Root cause: _get_inference_result_from_node() returns None when the node has no result yet. This None was passed directly to convert_to_model_output(), which calls event_data.get("output", {}), crashing with AttributeError: 'NoneType' object has no attribute 'get'. Additionally, if ModelInferenceEvent logs were empty, parsed_logs[0] caused an IndexError.

Fix:

```python
# Guard 1: empty precompile logs
if not precompile_logs:
    raise RuntimeError("ModelInferenceEvent not found in transaction logs.")

# Guard 2: None from inference node
if inference_result is None:
    raise RuntimeError(
        f"Inference node returned no result for inference ID {inference_id!r}. "
        "The result may not be available yet — retry after a short delay."
    )
```
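Since the new error message tells callers to retry after a delay, a minimal caller-side retry loop might look like the following; wait_for_result and fetch_result are hypothetical names, not SDK API:

```python
# Hypothetical caller-side retry loop for the "no result yet" case.
import time

def wait_for_result(fetch_result, attempts: int = 5, delay: float = 0.01):
    for _ in range(attempts):
        result = fetch_result()
        if result is not None:
            return result
        time.sleep(delay)  # result may simply not be available yet
    raise RuntimeError("Inference node returned no result; retries exhausted.")

# Simulated node that answers on the third poll.
responses = iter([None, None, {"output": [1, 2, 3]}])
result = wait_for_result(lambda: next(responses))
```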

Bug 5 — run_workflow() uses hardcoded 30M gas

File: src/opengradient/client/alpha.py

Root cause: run_workflow() hardcoded gas=30000000, unlike infer() which correctly uses estimate_gas(). This is both wasteful (users overpay) and fragile on networks with lower block gas limits.

Fix: Call estimate_gas() first and multiply by 3 for headroom. Fall back to 30M only if estimation itself fails.

```python
try:
    estimated_gas = run_function.estimate_gas({"from": self._wallet_account.address})
    gas_limit = int(estimated_gas * 3)
except Exception:
    gas_limit = 30000000  # fallback
```

Also replaces the hardcoded timeout=60 in new_workflow() with the INFERENCE_TX_TIMEOUT constant.
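The estimate-then-fallback pattern can be sketched in isolation by injecting the estimator as a callable, so no web3 connection is needed; pick_gas_limit is a hypothetical helper, not SDK API:

```python
# Isolated sketch of estimate-then-fallback gas sizing.
FALLBACK_GAS = 30_000_000

def pick_gas_limit(estimate_gas_fn) -> int:
    try:
        return int(estimate_gas_fn() * 3)  # 3x headroom on a good estimate
    except Exception:
        return FALLBACK_GAS  # estimation failed: keep the old 30M behavior

good = pick_gas_limit(lambda: 1_200_000)

def failing_estimate():
    raise ValueError("execution reverted")

bad = pick_gas_limit(failing_estimate)
```

The 3x multiplier trades a modest over-reservation for resilience to estimation drift, while the fallback preserves the previous behavior only when estimation itself is unavailable.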


Files Changed

| File | Bugs Fixed |
| --- | --- |
| src/opengradient/client/tee_registry.py | Bug 1 (TEE randomization) |
| src/opengradient/client/llm.py | Bug 2 (HTTP 402 hint) |
| src/opengradient/client/_conversions.py | Bug 3 (int dtype) |
| src/opengradient/client/alpha.py | Bugs 4 & 5 (None guard + gas) |

Related Issues

Closes #200
Closes #188
Partially closes #103

Previously get_llm_tee() always returned tees[0], the first TEE in the
registry list. This caused all clients to hit the same TEE, providing no
load distribution and no resilience when that TEE starts failing.

Fix: use random.choice(tees) so each LLM() construction independently
picks from all currently active TEEs. Successive retries or re-initializations
will therefore naturally spread across the healthy pool.

Closes OpenGradient#200
Previously any HTTP error from the TEE (including 402 Payment Required)
was silently wrapped into a generic RuntimeError("TEE LLM chat failed: ...").
This caused the confusing traceback seen in issue OpenGradient#188 — the real cause
(insufficient OPG allowance) was buried inside the exception message.

Fix:
- Add `import httpx` and intercept httpx.HTTPStatusError before the
  generic `except Exception` handler in both _chat_request() and
  completion().
- When status == 402, raise a RuntimeError with a clear, actionable hint
  telling the user to call llm.ensure_opg_approval(opg_amount=<amount>).
- Also handle 402 in _parse_sse_response() for the streaming path.
- All other HTTP errors continue to surface as before.

Closes OpenGradient#188
Previously convert_to_float32() always returned np.float32 regardless of
the decimals field, forcing callers to manually cast integer results
(issue OpenGradient#103 — add proper type conversions from Solidity contract to Python).

Fix:
- Add convert_fixed_point_to_python(value, decimals) that returns int when
  decimals == 0 and np.float32 otherwise.  np.array() automatically picks
  the correct dtype (int64 vs float32) based on the element types.
- Update both convert_to_model_output() and convert_array_to_model_output()
  to call the new function.
- Keep convert_to_float32() as a deprecated backward-compatible alias so
  any external code that imports it directly continues to work.

Partially closes OpenGradient#103
… in run_workflow

Bug 4 — None crash in infer():
  _get_inference_result_from_node() can legitimately return None when the
  inference node has no result yet.  Previously this None was passed
  directly to convert_to_model_output(), which calls event_data.get(),
  causing an AttributeError: 'NoneType' object has no attribute 'get'.

  Fix: after calling _get_inference_result_from_node(), check for None
  and raise a clear RuntimeError with a human-readable message and the
  inference_id so callers know what to retry.

  Also guard the precompile log fetch: if parsed_logs is empty, raise
  RuntimeError instead of crashing with an IndexError on parsed_logs[0].

Bug 5 — hardcoded 30M gas in run_workflow():
  run_workflow() built the transaction with gas=30000000 (30 million)
  unconditionally.  This is wasteful (users overpay for gas) and can
  fail on networks with a lower block gas limit.

  Fix: call estimate_gas() first (consistent with infer()), then multiply
  by 3 for headroom.  Fall back to 30M only if estimation itself fails.

Also fixes new_workflow() deploy to use INFERENCE_TX_TIMEOUT constant
instead of the previous hardcoded 60 seconds.
@amathxbt
Contributor Author

Hey @adambalogh 👋 — this PR addresses 5 critical bugs found during a code audit, 3 of which directly close or partially close open issues you and the team have filed:

| # | Bug | Issue |
| --- | --- | --- |
| 1 | get_llm_tee() always returned tees[0] — no randomization or load distribution | Closes #200 |
| 2 | HTTP 402 from TEE was swallowed as a cryptic RuntimeError with no guidance | Closes #188 |
| 3 | Fixed-point values with decimals=0 returned np.float32 instead of int | Partially closes #103 |
| 4 | infer() crashed with AttributeError when the inference node returned None | Code audit |
| 5 | run_workflow() hardcoded gas=30000000 instead of using estimate_gas() | Code audit |

All changes are backward-compatible. Happy to adjust anything before review! 🙏

@adambalogh
Collaborator

Hey @amathxbt thanks for your contributions, I added it to my list to review, will get back to you tomorrow!
