@chughtapan
Owner

Implements a new intermediate API prediction mode that uses oracle data to identify required services, then exposes all APIs from those services.

Changes:

  • Add app_oracle mode: Uses ground truth to identify apps (e.g., spotify, venmo), then loads all APIs from those apps. System apps (supervisor) only include ground truth APIs.
  • Refactor: Split appworld_helpers.py into api_predictor.py (API prediction) and prompts.py (prompt management) for better separation of concerns
  • Fix: Remove 20-API limit for "all" mode (now returns all 473 APIs)
  • Fix: Eliminate duplicate Task loading in predict_apis()
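The app_oracle selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in api_predictor.py: the function name `select_app_oracle_apis`, the `"app.api"` string format, the `catalog` dictionary, and the toy API names are all assumptions made for the example.

```python
from collections import defaultdict

# Assumption: only "supervisor" is treated as a system app, as named in the PR.
SYSTEM_APPS = frozenset({"supervisor"})


def select_app_oracle_apis(ground_truth, catalog, system_apps=SYSTEM_APPS):
    """Return sorted "app.api" names under an app_oracle-style policy.

    ground_truth: iterable of "app.api" strings (oracle APIs for one task).
    catalog: dict mapping app name -> iterable of API names in that app.
    """
    # Group the oracle APIs by the app they belong to.
    oracle_by_app = defaultdict(set)
    for qualified in ground_truth:
        app, api = qualified.split(".", 1)
        oracle_by_app[app].add(api)

    selected = set()
    for app, oracle_apis in oracle_by_app.items():
        if app in system_apps:
            # System apps expose only the ground-truth APIs.
            apis = oracle_apis
        else:
            # Other required apps expose their full API surface.
            apis = set(catalog.get(app, ())) | oracle_apis
        selected.update(f"{app}.{api}" for api in apis)
    return sorted(selected)


# Toy catalog with hypothetical API names, for illustration only.
catalog = {
    "supervisor": ["show_profile", "show_account_passwords", "complete_task"],
    "spotify": ["login", "search_songs", "play_music"],
    "venmo": ["login", "send_payment"],
}
ground_truth = ["supervisor.show_account_passwords", "spotify.login"]
apis = select_app_oracle_apis(ground_truth, catalog)
```

In this toy run, venmo contributes nothing because no ground-truth API belongs to it, spotify contributes all three of its APIs, and supervisor contributes only the one oracle API.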

API count comparison for a typical task:

  • ground_truth: 6 APIs (exact oracle)
  • app_oracle: 95 APIs (3 supervisor + 92 spotify)
  • all: 473 APIs (no limit)

Usage:
  pytest tests/benchmarks/appworld/test_appworld.py --api-mode app_oracle \
    --dataset train --limit 5 --model gpt-4o

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
@chughtapan chughtapan requested a review from Copilot October 23, 2025 20:29
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements an intermediate API prediction mode called app_oracle that uses ground truth data to identify required services, then exposes all APIs from those services. This provides a middle ground between exact oracle APIs (ground_truth) and all available APIs (all).

Key changes:

  • Added app_oracle mode that returns ~50-100 APIs by identifying required apps from ground truth, then loading all APIs from those apps
  • Refactored code by splitting appworld_helpers.py into separate api_predictor.py and prompts.py modules
  • Removed the 20-API limit for "all" mode to return all 473 available APIs

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/benchmarks/appworld/test_appworld.py | Updated imports to use new api_predictor and prompts modules |
| tests/benchmarks/appworld/prompts.py | Removed API prediction logic, keeping only prompt management functions |
| tests/benchmarks/appworld/conftest.py | Added app_oracle to API mode choices and updated documentation |
| tests/benchmarks/appworld/api_predictor.py | New module implementing all API prediction modes including the new app_oracle mode |
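
For readers unfamiliar with how a flag such as `--api-mode` gets its allowed values, the conftest.py change might look something like the sketch below. It uses pytest's standard `pytest_addoption` hook and `parser.addoption`; the `API_MODES` tuple, the default value, and the help text are assumptions, and the real file may define additional modes not named in this PR.

```python
# Assumption: only the three modes named in this PR are listed here.
API_MODES = ("ground_truth", "app_oracle", "all")


def pytest_addoption(parser):
    """Register --api-mode so pytest rejects values outside API_MODES."""
    parser.addoption(
        "--api-mode",
        action="store",
        default="ground_truth",  # assumed default, not confirmed by the PR
        choices=API_MODES,
        help="Which APIs to expose: ground_truth (exact oracle), "
        "app_oracle (all APIs from oracle-identified apps), or all.",
    )
```

With `choices` set, an invocation such as `--api-mode app_oracle` is accepted, while a misspelled mode fails at argument parsing rather than deep inside a test.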

@chughtapan chughtapan merged commit 533f386 into main Oct 23, 2025
3 checks passed