fix: cast port to int #6643

Merged: 1 commit, Jan 12, 2024
Conversation

@orarbel (Contributor) commented on Dec 31, 2023

Background

When executing `port = os.getenv("PORT", 8000)`, the port is returned as a string whenever it is read from a `.env` file.

This causes an error: `TypeError: 'str' object cannot be interpreted as an integer`

Changes 🏗️

The port is now cast to `int` before being passed to Uvicorn.

PR Quality Scorecard ✨

  • Have you used the PR description template?   +2 pts
  • Is your pull request atomic, focusing on a single change?   +5 pts
  • Have you linked the GitHub issue(s) that this PR addresses?   +5 pts
  • Have you documented your changes clearly and comprehensively?   +5 pts
  • Have you changed or added a feature?   -4 pts
    • Have you added/updated corresponding documentation?   +4 pts
    • Have you added/updated corresponding integration tests?   +5 pts
  • Have you changed the behavior of AutoGPT?   -5 pts
    • Have you also run agbenchmark to verify that these changes do not regress performance?   +10 pts

netlify bot commented Dec 31, 2023

Deploy Preview for auto-gpt-docs ready!

🔨 Latest commit: 99835b0
🔍 Latest deploy log: https://app.netlify.com/sites/auto-gpt-docs/deploys/659163e32732390008d255b7
😎 Deploy Preview: https://deploy-preview-6643--auto-gpt-docs.netlify.app

@Swiftyos (Contributor) left a comment:

LGTM thank you

@Pwuts merged commit 48a2186 into Significant-Gravitas:master on Jan 12, 2024
6 checks passed
MaHDiaLaGaB added a commit to germatech/GxAgents that referenced this pull request Jan 28, 2024
* [Documentation Update] Updating Using and Creating Abilities to use Action Annotations (Significant-Gravitas#6653)

Changing ability documentation

* AGBenchmark codebase clean-up (Significant-Gravitas#6650)

* refactor(benchmark): Deduplicate configuration loading logic

   - Move the configuration loading logic to a separate `load_agbenchmark_config` function in `agbenchmark/config.py` module.
   - Replace the duplicate loading logic in `conftest.py`, `generate_test.py`, `ReportManager.py`, `reports.py`, and `__main__.py` with calls to `load_agbenchmark_config` function.

* fix(benchmark): Fix type errors, linting errors, and clean up CLI validation in __main__.py

   - Fixed type errors and linting errors in `__main__.py`
   - Improved the readability of CLI argument validation by introducing a separate function for it

* refactor(benchmark): Lint and typefix app.py

   - Rearranged and cleaned up import statements
   - Fixed type errors caused by improper use of `psutil` objects
   - Simplified a number of `os.path` usages by converting to `pathlib`
   - Use `Task` and `TaskRequestBody` classes from `agent_protocol_client` instead of `.schema`

* refactor(benchmark): Replace `.agent_protocol_client` with `agent-protocol-client`, clean up schema.py

   - Remove `agbenchmark.agent_protocol_client` (an offline copy of `agent-protocol-client`).
      - Add `agent-protocol-client` as a dependency and change imports to `agent_protocol_client`.
   - Fix type annotation on `agent_api_interface.py::upload_artifacts` (`ApiClient` -> `AgentApi`).
   - Remove all unused types from schema.py (= most of them).

* refactor(benchmark): Use pathlib in agent_interface.py and agent_api_interface.py

* refactor(benchmark): Improve typing, response validation, and readability in app.py

   - Simplified response generation by leveraging type checking and conversion by FastAPI.
   - Introduced use of `HTTPException` for error responses.
   - Improved naming, formatting, and typing in `app.py::create_evaluation`.
   - Updated the docstring on `app.py::create_agent_task`.
   - Fixed return type annotations of `create_single_test` and `create_challenge` in generate_test.py.
   - Added default values to optional attributes on models in report_types_v2.py.
   - Removed unused imports in `generate_test.py`

* refactor(benchmark): Clean up logging and print statements

   - Introduced use of the `logging` library for unified logging and better readability.
   - Converted most print statements to use `logger.debug`, `logger.warning`, and `logger.error`.
   - Improved descriptiveness of log statements.
   - Removed unnecessary print statements.
   - Added log statements to unspecific and non-verbose `except` blocks.
   - Added `--debug` flag, which sets the log level to `DEBUG` and enables a more comprehensive log format.
   - Added `.utils.logging` module with `configure_logging` function to easily configure the logging library.
   - Converted raw escape sequences in `.utils.challenge` to use `colorama`.
   - Renamed `generate_test.py::generate_tests` to `load_challenges`.

* refactor(benchmark): Remove unused server.py and agent_interface.py::run_agent

   - Remove unused server.py file
   - Remove unused run_agent function from agent_interface.py

* refactor(benchmark): Clean up conftest.py

   - Fix and add type annotations
   - Rewrite docstrings
   - Disable or remove unused code
   - Fix definition of arguments and their types in `pytest_addoption`

* refactor(benchmark): Clean up generate_test.py file

   - Refactored the `create_single_test` function for clarity and readability
      - Removed unused variables
      - Made creation of `Challenge` subclasses more straightforward
      - Made bare `except` more specific
   - Renamed `Challenge.setup_challenge` method to `run_challenge`
   - Updated type hints and annotations
   - Made minor code/readability improvements in `load_challenges`
   - Added a helper function `_add_challenge_to_module` for attaching a Challenge class to the current module

* fix(benchmark): Fix and add type annotations in execute_sub_process.py

* refactor(benchmark): Simplify const determination in agent_interface.py

   - Simplify the logic that determines the value of `HELICONE_GRAPHQL_LOGS`

* fix(benchmark): Register category markers to prevent warnings

   - Use the `pytest_configure` hook to register the known challenge categories as markers. Otherwise, Pytest will raise "unknown marker" warnings at runtime.

* refactor(benchmark/challenges): Fix indentation in 4_revenue_retrieval_2/data.json

* refactor(benchmark): Update agent_api_interface.py

   - Add type annotations to `copy_agent_artifacts_into_temp_folder` function
   - Add note about broken endpoint in the `agent_protocol_client` library
   - Remove unused variable in `run_api_agent` function
   - Improve readability and resolve linting error

* feat(benchmark): Improve and centralize pathfinding

   - Search path hierarchy for applicable `agbenchmark_config`, rather than assuming it's in the current folder.
   - Create `agbenchmark.utils.path_manager` with `AGBenchmarkPathManager` and export a `PATH_MANAGER` constant.
   - Replace path constants defined in __main__.py with usages of `PATH_MANAGER`.

* feat(benchmark/cli): Clean up and improve CLI

   - Updated commands, options, and their descriptions to be more intuitive and consistent
   - Moved slow imports into the entrypoints that use them to speed up application startup
   - Fixed type hints to match output types of Click options
   - Hid deprecated `agbenchmark start` command
   - Refactored code to improve readability and maintainability
   - Moved main entrypoint into `run` subcommand
   - Fixed `version` and `serve` subcommands
   - Added `click-default-group` package to allow using `run` implicitly (for backwards compatibility)
   - Renamed `--no_dep` to `--no-dep` for consistency
   - Fixed string formatting issues in log statements

* refactor(benchmark/config): Move AgentBenchmarkConfig and related functions to config.py

   - Move the `AgentBenchmarkConfig` class from `utils/data_types.py` to `config.py`.
   - Extract the `calculate_info_test_path` function from `utils/data_types.py` and move it to `config.py` as a private helper function `_calculate_info_test_path`.
   - Move `load_agent_benchmark_config()` to `AgentBenchmarkConfig.load()`.
   - Changed simple getter methods on `AgentBenchmarkConfig` to calculated properties.
   - Update all code references according to the changes mentioned above.

* refactor(benchmark): Fix ReportManager init parameter types and use pathlib

   - Fix the type annotation of the `benchmark_start_time` parameter in `ReportManager.__init__`, which was mistyped as `str` instead of `datetime`.
   - Change the type of the `filename` parameter in the `ReportManager.__init__` method from `str` to `Path`.
   - Rename `self.filename` to `self.report_file` in `ReportManager`.
   - Change the way the report file is created, opened and saved to use the `Path` object.

* refactor(benchmark): Improve typing surrounding ChallengeData and clean up its implementation

   - Use `ChallengeData` objects instead of untyped `dict` in app.py, generate_test.py, reports.py.
   - Remove unnecessary methods `serialize`, `get_data`, `get_json_from_path`, `deserialize` from `ChallengeData` class.
   - Remove unused methods `challenge_from_datum` and `challenge_from_test_data` from `ChallengeData` class.
   - Update function signatures and annotations of `create_challenge` and `generate_single_test` functions in generate_test.py.
   - Add types to function signatures of `generate_single_call_report` and `finalize_reports` in reports.py.
   - Remove unnecessary `challenge_data` parameter (in generate_test.py) and fixture (in conftest.py).

* refactor(benchmark): Clean up generate_test.py, conftest.py and __main__.py

   - Cleaned up generate_test.py and conftest.py
      - Consolidated challenge creation logic in the `Challenge` class itself, most notably the new `Challenge.from_challenge_spec` method.
      - Moved challenge selection logic from generate_test.py to the `pytest_collection_modifyitems` hook in conftest.py.
   - Converted methods in the `Challenge` class to class methods where appropriate.
   - Improved argument handling in the `run_benchmark` function in `__main__.py`.

* refactor(benchmark/config): Merge AGBenchmarkPathManager into AgentBenchmarkConfig and reduce fragmented/global state

   - Merge the functionality of `AGBenchmarkPathManager` into `AgentBenchmarkConfig` to consolidate the configuration management.
   - Remove the `.path_manager` module containing `AGBenchmarkPathManager`.
   - Pass the `AgentBenchmarkConfig` and its attributes through function arguments to reduce global state and improve code clarity.

* feat(benchmark/serve): Configurable port for `serve` subcommand

   - Added `--port` option to `serve` subcommand to allow for specifying the port to run the API on.
   - If no `--port` option is provided, the port will default to the value specified in the `PORT` environment variable, or 8080 if not set.

* feat(benchmark/cli): Add `config` subcommand

   - Added a new subcommand `config` to the AGBenchmark CLI, to display information about the present AGBenchmark config.

* fix(benchmark): Gracefully handle incompatible challenge spec files in app.py

   - Added a check to skip deprecated challenges
   - Added logging to allow debugging of the loading process
   - Added handling of validation errors when parsing challenge spec files
   - Added missing `spec_file` attribute to `ChallengeData`

* refactor(benchmark): Move `run_benchmark` entrypoint to main.py, use it in `/reports` endpoint

   - Move `run_benchmark` and `validate_args` from __main__.py to main.py
   - Replace agbenchmark subprocess in `app.py:run_single_test` with `run_benchmark`
   - Move `get_unique_categories` from __main__.py to challenges/__init__.py
   - Move `OPTIONAL_CATEGORIES` from __main__.py to challenge.py
   - Reduce operations on updates.json (including `initialize_updates_file`) outside of API

* refactor(benchmark): Remove unused `/updates` endpoint and all related code

   - Remove `updates_json_file` attribute from `AgentBenchmarkConfig`
   - Remove `get_updates` and `_initialize_updates_file` in app.py
   - Remove `append_updates_file` and `create_update_json` functions in agent_api_interface.py
   - Remove call to `append_updates_file` in challenge.py

* refactor(benchmark/config): Clean up and update docstrings on `AgentBenchmarkConfig`

   - Add and update docstrings
   - Change base class from `BaseModel` to `BaseSettings`, allow extras for backwards compatibility
   - Make naming of path attributes on `AgentBenchmarkConfig` more consistent
   - Remove unused `agent_home_directory` attribute
   - Remove unused `workspace` attribute

* fix(benchmark): Restore mechanism to select (optional) categories in agent benchmark config

* fix(benchmark): Update agent-protocol-client to v1.1.0

   - Fixes issue with fetching task artifact listings

* fix(forge): cast port to int (Significant-Gravitas#6643)

When executing `port = os.getenv("PORT", 8000)` if the port is being fetched from a `.env` file it is fetched as a string.

This caused an error: `TypeError: 'str' object cannot be interpreted as an integer`

* feat(agent/server): Make port configurable, add documentation for Agent Protocol DB and port config (Significant-Gravitas#6569)

* docs: Add documentation on how to use Agent Protocol in the template

- Added documentation on how to use Agent Protocol and its settings in the `.env.template` file.
- An explanation is provided for the `AP_SERVER_PORT` and `AP_SERVER_DB_URL` settings.
- This change aims to improve the understanding and usage of Agent Protocol in the project.

* docs: Update usage.md with information about configuring the API port

- Update the documentation for the `serve` mode in `usage.md`
- Add information about configuring the port for the API server using the `AP_SERVER_PORT` environment variable.

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>

* fix(agent/workspace): Fix GCS workspace binary file upload

* refactor(benchmark): Disable Helicone integrations

We want to upgrade the OpenAI library, but `helicone` does not support `openai@^1.0.0`, so we're disabling the Helicone integration for now.

* chore(benchmark): Upgrade OpenAI client lib from v0 to v1

* chore(forge): Upgrade OpenAI client lib and LiteLLM from v0 to v1

* Update `openai` dependency from `^0.27.8` to `^1.7.2`
* Update `litellm` dependency from `^0.1.821` to `^1.17.9`
* Migrate llm.py from OpenAI module-level client to client instance
* Update return types in llm.py for new OpenAI and LiteLLM versions
   * Also remove `Exception` as a return type, because exceptions are raised, not returned
   * Update tutorials/003_crafting_agent_logic.md accordingly

Note: this changes the output types of the functions in `forge.llm`: `chat_completion_request`, `create_embedding_request`, `transcribe_audio`

* lint(forge): `black .` and `isort .`

* refactor(agent/openai): Upgrade OpenAI library to v1

- Update `openai` dependency from ^v0.27.10 to ^v1.7.2
- Update poetry.lock
- Update code for changed endpoints and new output types of OpenAI library
- Replace uses of `AssistantChatMessageDict` by `AssistantChatMessage`
   - Update `PromptStrategy`, `BaseAgent`, and all of their subclasses accordingly
- Update `OpenAIProvider`, `OpenAICredentials`, azure.yaml.template, .env.template and test_config.py to work with new separate `AzureOpenAI` client
- Remove `_OpenAIRetryHandler` and implement retry mechanism with `tenacity`
- Rewrite pytest fixture `cached_openai_client` (renamed from `patched_api_requestor`) for OpenAI v1 library

* refactor(benchmark): Interface & type consolidation, and arch change, to allow adding challenge providers

Squashed commit of the following:

commit 7d6476d
Author: Reinier van der Leer <pwuts@agpt.co>
Date:   Tue Jan 9 18:10:45 2024 +0100

    refactor(benchmark/challenge): Set up structure to support more challenge providers

    - Move `Challenge`, `ChallengeData`, `load_challenges` to `challenges/builtin.py` and rename to `BuiltinChallenge`, `BuiltinChallengeSpec`, `load_builtin_challenges`
    - Create `BaseChallenge` to serve as interface and base class for different challenge implementations
    - Create `ChallengeInfo` model to serve as universal challenge info object
    - Create `get_challenge_from_source_uri` function in `challenges/__init__.py`
    - Replace `ChallengeData` by `ChallengeInfo` everywhere except in `BuiltinChallenge`
    - Add strong typing to `task_informations` store in app.py
    - Use `call.duration` in `finalize_test_report` and remove `timer` fixture
    - Update docstring on `challenges/__init__.py:get_unique_categories`
    - Add docstring to `generate_test.py`

commit 5df2aa7
Author: Reinier van der Leer <pwuts@agpt.co>
Date:   Tue Jan 9 16:58:01 2024 +0100

    refactor(benchmark): Refactor & rename functions in agent_interface.py and agent_api_interface.py

    - `copy_artifacts_into_temp_folder` -> `copy_challenge_artifacts_into_workspace`
    - `copy_agent_artifacts_into_folder` -> `download_agent_artifacts_into_folder`
    - Reorder parameters of `run_api_agent`, `copy_challenge_artifacts_into_workspace`; use `Path` instead of `str`

commit 6a256fe
Author: Reinier van der Leer <pwuts@agpt.co>
Date:   Tue Jan 9 16:02:25 2024 +0100

    refactor(benchmark): Refactor & typefix report generation and handling logic

    - Rename functions in reports.py and ReportManager.py to better reflect what they do
       - `get_previous_test_results` -> `get_and_update_success_history`
       - `generate_single_call_report` -> `initialize_test_report`
       - `finalize_reports` -> `finalize_test_report`
       - `ReportManager.end_info_report` -> `SessionReportManager.finalize_session_report`
    - Modify `pytest_runtest_makereport` hook in conftest.py to finalize the report immediately after the challenge finishes running instead of after teardown
       - Move result processing logic from `initialize_test_report` to `finalize_test_report` in reports.py
    - Use `Test` and `Report` types from report_types.py where possible instead of untyped dicts: reports.py, utils.py, ReportManager.py
    - Differentiate `ReportManager` into `SessionReportManager`, `RegressionTestsTracker`, `SuccessRateTracker`
    - Move filtering of optional challenge categories from challenge.py (`Challenge.skip_optional_categories`) to conftest.py (`pytest_collection_modifyitems`)
    - Remove unused `scores` fixture in conftest.py

commit 370d6db
Author: Reinier van der Leer <pwuts@agpt.co>
Date:   Tue Jan 9 15:16:43 2024 +0100

    refactor(benchmark): Simplify models in report_types.py

    - Removed ForbidOptionalMeta and BaseModelBenchmark classes.
    - Changed model attributes to optional: `Metrics.difficulty`, `Metrics.success`, `Metrics.success_percentage`, `Metrics.run_time`, and `Test.reached_cutoff`.
    - Added validator to `Metrics` model to require `success` and `run_time` fields if `attempted=True`.
    - Added default values to all optional model fields.
    - Removed duplicate imports.
    - Added condition in process_report.py to prevent null lookups if `metrics.difficulty` is not set.

* feat(forge/db): Add `AgentDB.update_artifact` method

* fix(agent): Handle artifact modification properly

- When an Artifact's file is modified by the agent, set its `agent_created` attribute to `True` instead of registering a new Artifact
- Update the `autogpt-forge` dependency to the newest version, in which `AgentDB.update_artifact` has been implemented

* fix(agent): Fix "ChatModelResponse not subscriptable" errors in `summarize_text` and `QueryLanguageModel` ability

- `summarize_text` and `QueryLanguageModel.__call__` still tried to access `response["content"]`, which isn't possible since upgrading to the OpenAI v1 client library.

* fix(agent): Fix `extract_dict_from_response` flakiness

- The `extract_dict_from_response` function, which is supposed to reliably extract a JSON object from an LLM's response, was biased towards objects defined on a single line, causing issues.

* fix(benchmark): Fix challenge input artifact upload

* feat(agent/llm): Add cost tracking and logging to `AgentProtocolServer`

* fix(agent/serve): Fix task cost tracking persistence in `AgentProtocolServer`

- Pydantic shallow-copies models when they are passed into a parent model, meaning they can't be updated through the original reference. This commit adds a fix for the resulting cost persistence issue.

* feat(agent/llm/openai): Include compatibility tool call extraction in LLM response parse-fix loop

* fix(benchmark/report): Fix and clean up logic in `update_challenges_already_beaten`

- `update_challenges_already_beaten` incorrectly marked a challenge as beaten if it was present in the report file but set to `false`

* feat(benchmark): JungleGym WebArena (Significant-Gravitas#6691)

* feat(benchmark): Add JungleGym WebArena challenges
   - Add `WebArenaChallenge`, `WebArenaChallengeSpec`, and other logic to make these challenges work
   - Add WebArena challenges to Pytest collection endpoint generate_test.py

* feat(benchmark/webarena): Add hand-picked selection of WebArena challenges

* feat(benchmark): Add `-N`, `--attempts` option for multiple attempts per challenge

LLMs are probabilistic systems, so reproducibility of completions is not guaranteed. It only makes sense to account for this by running challenges multiple times to obtain a success ratio rather than a boolean success/failure result.

Changes:
- Add `-N`, `--attempts` option to CLI and `attempts_per_challenge` parameter to `main.py:run_benchmark`.
- Add dynamic `i_attempt` fixture through `pytest_generate_tests` hook in conftest.py to achieve multiple runs per challenge.
- Modify `pytest_runtest_makereport` hook in conftest.py to handle multiple reporting calls per challenge.
- Refactor report_types.py, reports.py, process_report.py to allow multiple results per challenge.
   - Calculate `success_percentage` from results of the current run, rather than all known results ever.
   - Add docstrings to a number of models in report_types.py.
   - Allow `None` as a success value, e.g. for runs that did not render any results before being cut off.
- Make SingletonReportManager thread-safe.

---------

Co-authored-by: Himanshu Mittal <himsmittal@gmail.com>
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
Co-authored-by: Or Arbel <orarbel@gmail.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
ph-ausseil added a commit to ph-ausseil/afaas that referenced this pull request Feb 2, 2024
* [Documentation Update] Updating Using and Creating Abilities to use Action Annotations (Significant-Gravitas#6653)

Changing ability documentation

* AGBenchmark codebase clean-up (Significant-Gravitas#6650)

* refactor(benchmark): Deduplicate configuration loading logic

   - Move the configuration loading logic to a separate `load_agbenchmark_config` function in `agbenchmark/config.py` module.
   - Replace the duplicate loading logic in `conftest.py`, `generate_test.py`, `ReportManager.py`, `reports.py`, and `__main__.py` with calls to `load_agbenchmark_config` function.

* fix(benchmark): Fix type errors, linting errors, and clean up CLI validation in __main__.py

   - Fixed type errors and linting errors in `__main__.py`
   - Improved the readability of CLI argument validation by introducing a separate function for it

* refactor(benchmark): Lint and typefix app.py

   - Rearranged and cleaned up import statements
   - Fixed type errors caused by improper use of `psutil` objects
   - Simplified a number of `os.path` usages by converting to `pathlib`
   - Use `Task` and `TaskRequestBody` classes from `agent_protocol_client` instead of `.schema`

* refactor(benchmark): Replace `.agent_protocol_client` with `agent-protocol-client`, clean up schema.py

   - Remove `agbenchmark.agent_protocol_client` (an offline copy of `agent-protocol-client`).
      - Add `agent-protocol-client` as a dependency and change imports to `agent_protocol_client`.
   - Fix type annotation on `agent_api_interface.py::upload_artifacts` (`ApiClient` -> `AgentApi`).
   - Remove all unused types from schema.py (= most of them).

* refactor(benchmark): Use pathlib in agent_interface.py and agent_api_interface.py

* refactor(benchmark): Improve typing, response validation, and readability in app.py

   - Simplified response generation by leveraging type checking and conversion by FastAPI.
   - Introduced use of `HTTPException` for error responses.
   - Improved naming, formatting, and typing in `app.py::create_evaluation`.
   - Updated the docstring on `app.py::create_agent_task`.
   - Fixed return type annotations of `create_single_test` and `create_challenge` in generate_test.py.
   - Added default values to optional attributes on models in report_types_v2.py.
   - Removed unused imports in `generate_test.py`

* refactor(benchmark): Clean up logging and print statements

   - Introduced use of the `logging` library for unified logging and better readability.
   - Converted most print statements to use `logger.debug`, `logger.warning`, and `logger.error`.
   - Improved descriptiveness of log statements.
   - Removed unnecessary print statements.
   - Added log statements to unspecific and non-verbose `except` blocks.
   - Added `--debug` flag, which sets the log level to `DEBUG` and enables a more comprehensive log format.
   - Added `.utils.logging` module with `configure_logging` function to easily configure the logging library.
   - Converted raw escape sequences in `.utils.challenge` to use `colorama`.
   - Renamed `generate_test.py::generate_tests` to `load_challenges`.

* refactor(benchmark): Remove unused server.py and agent_interface.py::run_agent

   - Remove unused server.py file
   - Remove unused run_agent function from agent_interface.py

* refactor(benchmark): Clean up conftest.py

   - Fix and add type annotations
   - Rewrite docstrings
   - Disable or remove unused code
   - Fix definition of arguments and their types in `pytest_addoption`

* refactor(benchmark): Clean up generate_test.py file

   - Refactored the `create_single_test` function for clarity and readability
      - Removed unused variables
      - Made creation of `Challenge` subclasses more straightforward
      - Made bare `except` more specific
   - Renamed `Challenge.setup_challenge` method to `run_challenge`
   - Updated type hints and annotations
   - Made minor code/readability improvements in `load_challenges`
   - Added a helper function `_add_challenge_to_module` for attaching a Challenge class to the current module

* fix(benchmark): Fix and add type annotations in execute_sub_process.py

* refactor(benchmark): Simplify const determination in agent_interface.py

   - Simplify the logic that determines the value of `HELICONE_GRAPHQL_LOGS`

* fix(benchmark): Register category markers to prevent warnings

   - Use the `pytest_configure` hook to register the known challenge categories as markers. Otherwise, Pytest will raise "unknown marker" warnings at runtime.

* refactor(benchmark/challenges): Fix indentation in 4_revenue_retrieval_2/data.json

* refactor(benchmark): Update agent_api_interface.py

   - Add type annotations to `copy_agent_artifacts_into_temp_folder` function
   - Add note about broken endpoint in the `agent_protocol_client` library
   - Remove unused variable in `run_api_agent` function
   - Improve readability and resolve linting error

* feat(benchmark): Improve and centralize pathfinding

   - Search path hierarchy for applicable `agbenchmark_config`, rather than assuming it's in the current folder.
   - Create `agbenchmark.utils.path_manager` with `AGBenchmarkPathManager` and export a `PATH_MANAGER` constant.
   - Replace path constants defined in __main__.py with usages of `PATH_MANAGER`.

* feat(benchmark/cli): Clean up and improve CLI

   - Updated commands, options, and their descriptions to be more intuitive and consistent
   - Moved slow imports into the entrypoints that use them to speed up application startup
   - Fixed type hints to match output types of Click options
   - Hid deprecated `agbenchmark start` command
   - Refactored code to improve readability and maintainability
   - Moved main entrypoint into `run` subcommand
   - Fixed `version` and `serve` subcommands
   - Added `click-default-group` package to allow using `run` implicitly (for backwards compatibility)
   - Renamed `--no_dep` to `--no-dep` for consistency
   - Fixed string formatting issues in log statements

* refactor(benchmark/config): Move AgentBenchmarkConfig and related functions to config.py

   - Move the `AgentBenchmarkConfig` class from `utils/data_types.py` to `config.py`.
   - Extract the `calculate_info_test_path` function from `utils/data_types.py` and move it to `config.py` as a private helper function `_calculate_info_test_path`.
   - Move `load_agent_benchmark_config()` to `AgentBenchmarkConfig.load()`.
   - Changed simple getter methods on `AgentBenchmarkConfig` to calculated properties.
   - Update all code references according to the changes mentioned above.

* refactor(benchmark): Fix ReportManager init parameter types and use pathlib

   - Fix the type annotation of the `benchmark_start_time` parameter in `ReportManager.__init__`, which was mistyped as `str` instead of `datetime`.
   - Change the type of the `filename` parameter in the `ReportManager.__init__` method from `str` to `Path`.
   - Rename `self.filename` to `self.report_file` in `ReportManager`.
   - Change the way the report file is created, opened and saved to use the `Path` object.

* refactor(benchmark): Improve typing surrounding ChallengeData and clean up its implementation

   - Use `ChallengeData` objects instead of untyped `dict` in app.py, generate_test.py, reports.py.
   - Remove unnecessary methods `serialize`, `get_data`, `get_json_from_path`, `deserialize` from `ChallengeData` class.
   - Remove unused methods `challenge_from_datum` and `challenge_from_test_data` from `ChallengeData` class.
   - Update function signatures and annotations of `create_challenge` and `generate_single_test` functions in generate_test.py.
   - Add types to function signatures of `generate_single_call_report` and `finalize_reports` in reports.py.
   - Remove unnecessary `challenge_data` parameter (in generate_test.py) and fixture (in conftest.py).

* refactor(benchmark): Clean up generate_test.py, conftest.py and __main__.py

   - Cleaned up generate_test.py and conftest.py
      - Consolidated challenge creation logic in the `Challenge` class itself, most notably the new `Challenge.from_challenge_spec` method.
      - Moved challenge selection logic from generate_test.py to the `pytest_collection_modifyitems` hook in conftest.py.
   - Converted methods in the `Challenge` class to class methods where appropriate.
   - Improved argument handling in the `run_benchmark` function in `__main__.py`.

* refactor(benchmark/config): Merge AGBenchmarkPathManager into AgentBenchmarkConfig and reduce fragmented/global state

   - Merge the functionality of `AGBenchmarkPathManager` into `AgentBenchmarkConfig` to consolidate the configuration management.
   - Remove the `.path_manager` module containing `AGBenchmarkPathManager`.
   - Pass the `AgentBenchmarkConfig` and its attributes through function arguments to reduce global state and improve code clarity.

* feat(benchmark/serve): Configurable port for `serve` subcommand

   - Added `--port` option to `serve` subcommand to allow for specifying the port to run the API on.
   - If no `--port` option is provided, the port will default to the value specified in the `PORT` environment variable, or 8080 if not set.

* feat(benchmark/cli): Add `config` subcommand

   - Added a new subcommand `config` to the AGBenchmark CLI, to display information about the present AGBenchmark config.

* fix(benchmark): Gracefully handle incompatible challenge spec files in app.py

   - Added a check to skip deprecated challenges
   - Added logging to allow debugging of the loading process
   - Added handling of validation errors when parsing challenge spec files
   - Added missing `spec_file` attribute to `ChallengeData`

* refactor(benchmark): Move `run_benchmark` entrypoint to main.py, use it in `/reports` endpoint

   - Move `run_benchmark` and `validate_args` from __main__.py to main.py
   - Replace agbenchmark subprocess in `app.py:run_single_test` with `run_benchmark`
   - Move `get_unique_categories` from __main__.py to challenges/__init__.py
   - Move `OPTIONAL_CATEGORIES` from __main__.py to challenge.py
   - Reduce operations on updates.json (including `initialize_updates_file`) outside of API

* refactor(benchmark): Remove unused `/updates` endpoint and all related code

   - Remove `updates_json_file` attribute from `AgentBenchmarkConfig`
   - Remove `get_updates` and `_initialize_updates_file` in app.py
   - Remove `append_updates_file` and `create_update_json` functions in agent_api_interface.py
   - Remove call to `append_updates_file` in challenge.py

* refactor(benchmark/config): Clean up and update docstrings on `AgentBenchmarkConfig`

   - Add and update docstrings
   - Change base class from `BaseModel` to `BaseSettings`, allow extras for backwards compatibility
   - Make naming of path attributes on `AgentBenchmarkConfig` more consistent
   - Remove unused `agent_home_directory` attribute
   - Remove unused `workspace` attribute

* fix(benchmark): Restore mechanism to select (optional) categories in agent benchmark config

* fix(benchmark): Update agent-protocol-client to v1.1.0

   - Fixes issue with fetching task artifact listings

* fix(forge): cast port to int (Significant-Gravitas#6643)

When executing `port = os.getenv("PORT", 8000)` if the port is being fetched from a `.env` file it is fetched as a string.

This caused an error: `TypeError: 'str' object cannot be interpreted as an integer`

* feat(agent/server): Make port configurable, add documentation for Agent Protocol DB and port config (Significant-Gravitas#6569)

* docs: Add documentation on how to use Agent Protocol in the template

- Added documentation on how to use Agent Protocol and its settings in the `.env.template` file.
- An explanation is provided for the `AP_SERVER_PORT` and `AP_SERVER_DB_URL` settings.
- This change aims to improve the understanding and usage of Agent Protocol in the project.

* docs: Update usage.md with information about configuring the API port

- Update the documentation for the `serve` mode in `usage.md`
- Add information about configuring the port for the API server using the `AP_SERVER_PORT` environment variable.

---------

Co-authored-by: Reinier van der Leer <pwuts@agpt.co>

* fix(agent/workspace): Fix GCS workspace binary file upload

* refactor(benchmark): Disable Helicone integrations

We want to upgrade the OpenAI library, but `helicone` does not support `openai@^1.0.0`, so we're disabling the Helicone integration for now.

* chore(benchmark): Upgrade OpenAI client lib from v0 to v1

* chore(forge): Upgrade OpenAI client lib and LiteLLM from v0 to v1

* Update `openai` dependency from `^0.27.8` to `^1.7.2`
* Update `litellm` dependency from `^0.1.821` to `^1.17.9`
* Migrate llm.py from OpenAI module-level client to client instance
* Update return types in llm.py for new OpenAI and LiteLLM versions
   * Also remove `Exception` as a return type, because exceptions are raised, not returned
   * Update tutorials/003_crafting_agent_logic.md accordingly

Note: this changes the output types of the functions in `forge.llm`: `chat_completion_request`, `create_embedding_request`, `transcribe_audio`

* lint(forge): `black .` and `isort .`

* refactor(agent/openai): Upgrade OpenAI library to v1

- Update `openai` dependency from ^v0.27.10 to ^v1.7.2
- Update poetry.lock
- Update code for changed endpoints and new output types of OpenAI library
- Replace uses of `AssistantChatMessageDict` by `AssistantChatMessage`
   - Update `PromptStrategy`, `BaseAgent`, and all of their subclasses accordingly
- Update `OpenAIProvider`, `OpenAICredentials`, azure.yaml.template, .env.template and test_config.py to work with new separate `AzureOpenAI` client
- Remove `_OpenAIRetryHandler` and implement retry mechanism with `tenacity`
- Rewrite pytest fixture `cached_openai_client` (renamed from `patched_api_requestor`) for OpenAI v1 library

* refactor(benchmark): Interface & type consolidation, and arch change, to allow adding challenge providers

Squashed commit of the following:

commit 7d6476d
Author: Reinier van der Leer <pwuts@agpt.co>
Date:   Tue Jan 9 18:10:45 2024 +0100

    refactor(benchmark/challenge): Set up structure to support more challenge providers

    - Move `Challenge`, `ChallengeData`, `load_challenges` to `challenges/builtin.py` and rename to `BuiltinChallenge`, `BuiltinChallengeSpec`, `load_builtin_challenges`
    - Create `BaseChallenge` to serve as interface and base class for different challenge implementations
    - Create `ChallengeInfo` model to serve as universal challenge info object
    - Create `get_challenge_from_source_uri` function in `challenges/__init__.py`
    - Replace `ChallengeData` by `ChallengeInfo` everywhere except in `BuiltinChallenge`
    - Add strong typing to `task_informations` store in app.py
    - Use `call.duration` in `finalize_test_report` and remove `timer` fixture
    - Update docstring on `challenges/__init__.py:get_unique_categories`
    - Add docstring to `generate_test.py`

commit 5df2aa7
Author: Reinier van der Leer <pwuts@agpt.co>
Date:   Tue Jan 9 16:58:01 2024 +0100

    refactor(benchmark): Refactor & rename functions in agent_interface.py and agent_api_interface.py

    - `copy_artifacts_into_temp_folder` -> `copy_challenge_artifacts_into_workspace`
    - `copy_agent_artifacts_into_folder` -> `download_agent_artifacts_into_folder`
    - Reorder parameters of `run_api_agent`, `copy_challenge_artifacts_into_workspace`; use `Path` instead of `str`

commit 6a256fe
Author: Reinier van der Leer <pwuts@agpt.co>
Date:   Tue Jan 9 16:02:25 2024 +0100

    refactor(benchmark): Refactor & typefix report generation and handling logic

    - Rename functions in reports.py and ReportManager.py to better reflect what they do
       - `get_previous_test_results` -> `get_and_update_success_history`
       - `generate_single_call_report` -> `initialize_test_report`
       - `finalize_reports` -> `finalize_test_report`
       - `ReportManager.end_info_report` -> `SessionReportManager.finalize_session_report`
    - Modify `pytest_runtest_makereport` hook in conftest.py to finalize the report immediately after the challenge finishes running instead of after teardown
       - Move result processing logic from `initialize_test_report` to `finalize_test_report` in reports.py
    - Use `Test` and `Report` types from report_types.py where possible instead of untyped dicts: reports.py, utils.py, ReportManager.py
    - Differentiate `ReportManager` into `SessionReportManager`, `RegressionTestsTracker`, `SuccessRateTracker`
    - Move filtering of optional challenge categories from challenge.py (`Challenge.skip_optional_categories`) to conftest.py (`pytest_collection_modifyitems`)
    - Remove unused `scores` fixture in conftest.py

commit 370d6db
Author: Reinier van der Leer <pwuts@agpt.co>
Date:   Tue Jan 9 15:16:43 2024 +0100

    refactor(benchmark): Simplify models in report_types.py

    - Removed ForbidOptionalMeta and BaseModelBenchmark classes.
    - Changed model attributes to optional: `Metrics.difficulty`, `Metrics.success`, `Metrics.success_percentage`, `Metrics.run_time`, and `Test.reached_cutoff`.
    - Added validator to `Metrics` model to require `success` and `run_time` fields if `attempted=True`.
    - Added default values to all optional model fields.
    - Removed duplicate imports.
    - Added condition in process_report.py to prevent null lookups if `metrics.difficulty` is not set.

* feat(forge/db): Add `AgentDB.update_artifact` method

* fix(agent): Handle artifact modification properly

- When an Artifact's file is modified by the agent, set its `agent_created` attribute to `True` instead of registering a new Artifact
- Update the `autogpt-forge` dependency to the newest version, in which `AgentDB.update_artifact` has been implemented

* fix(agent): Fix "ChatModelResponse not subscriptable" errors in `summarize_text` and `QueryLanguageModel` ability

- `summarize_text` and `QueryLanguageModel.__call__` still tried to access `response["content"]`, which isn't possible since upgrading to the OpenAI v1 client library.

* fix(agent): Fix `extract_dict_from_response` flakiness

- The `extract_dict_from_response` function, which is supposed to reliably extract a JSON object from an LLM's response, was biased towards objects defined on a single line, causing issues.

* fix(benchmark): Fix challenge input artifact upload

* feat(agent/llm): Add cost tracking and logging to `AgentProtocolServer`

* fix(agent/serve): Fix task cost tracking persistence in `AgentProtocolServer`

- Pydantic shallow-copies models when they are passed into a parent model, meaning they can't be updated through the original reference. This commit adds a fix for the resulting cost persistence issue.

* feat(agent/llm/openai): Include compatibility tool call extraction in LLM response parse-fix loop

* fix(benchmark/report): Fix and clean up logic in `update_challenges_already_beaten`

- `update_challenges_already_beaten` incorrectly marked a challenge as beaten if it was present in the report file but set to `false`

* feat(benchmark): JungleGym WebArena (Significant-Gravitas#6691)

* feat(benchmark): Add JungleGym WebArena challenges
   - Add `WebArenaChallenge`, `WebArenaChallengeSpec`, and other logic to make these challenges work
   - Add WebArena challenges to Pytest collection endpoint generate_test.py

* feat(benchmark/webarena): Add hand-picked selection of WebArena challenges

* feat(benchmark): Add `-N`, `--attempts` option for multiple attempts per challenge

LLMs are probabilistic systems, so reproducibility of completions is not guaranteed. It only makes sense to account for this by running challenges multiple times to obtain a success ratio rather than a boolean success/failure result.

Changes:
- Add `-N`, `--attempts` option to CLI and `attempts_per_challenge` parameter to `main.py:run_benchmark`.
- Add dynamic `i_attempt` fixture through `pytest_generate_tests` hook in conftest.py to achieve multiple runs per challenge.
- Modify `pytest_runtest_makereport` hook in conftest.py to handle multiple reporting calls per challenge.
- Refactor report_types.py, reports.py, process_report.py to allow multiple results per challenge.
   - Calculate `success_percentage` from results of the current run, rather than all known results ever.
   - Add docstrings to a number of models in report_types.py.
   - Allow `None` as a success value, e.g. for runs that did not render any results before being cut off.
- Make SingletonReportManager thread-safe.

* INIT

* INIT

* INIT

* Remove unused code + Start test suite

* INIT

* Update wrapper.py

* INIT

* INIT

* Allow support of multiple VectorStore

* Add similarity threshold

* Update test_task.py

* Update task.py

* feat(agent/llm): Add support for `gpt-4-0125-preview`

* Add `gpt-4-0125-preview` model to OpenAI model list
* Add `gpt-4-turbo-preview` alias to OpenAI model list

* Update task.py

* Update test_task.py

* fix(agent/json_utils): Make `extract_dict_from_response` more robust

* Accommodate for both ```json and ```JSON blocks in responses

* Saving progress (tests ok but no dependency injection)

* INIT

* Set VectorStoreWrapper as an Abstract Class

* Fix tests & Make a separate adapter for Chroma

* fix(agent/prompting): Fix representation of (optional) command parameters in prompt

* fix(agent/json_utils): Decode as JSON rather than Python objects

* Replace `ast.literal_eval` with `json.loads` in `extract_dict_from_response`

This fixes a bug where boolean values could not be decoded because of their required capitalization in Python.

* feat(agent/web): Improve `read_webpage` information extraction abilities

* Implement `extract_information` function in `autogpt.processing.text` module. This function extracts pieces of information from a body of text based on a list of topics of interest.
* Add `topics_of_interest` and `get_raw_content` parameters to `read_webpage` command
   * Limit maximum content length if `get_raw_content=true` is specified

* feat(agent): Add history compression to increase longevity and efficiency

* Compress steps in the prompt to reduce token usage, and to increase longevity when using models with limited context windows
* Move multiple copies of step formatting code to `Episode.format` method
* Add `EpisodicActionHistory.handle_compression` method to handle compression of new steps

* fix(forge): Fix "no module named 'forge.sdk.abilities'" (Significant-Gravitas#6571)

Fixes Significant-Gravitas#6537

* INIT

* Prompt changes to avoid fast_track prevalence.

* Fix one more test

* Remove unused code

* update execute code

* Update conftest.py

---------

Co-authored-by: Himanshu Mittal <himsmittal@gmail.com>
Co-authored-by: Reinier van der Leer <pwuts@agpt.co>
Co-authored-by: Or Arbel <orarbel@gmail.com>
Co-authored-by: Nicholas Tindle <nicholas.tindle@agpt.co>
Co-authored-by: Fernando Navarro Páez <kfern@users.noreply.github.com>