Establish an effective integration test suite with mocked model calls.
Each apollo service should have a suite of Service tests which run against it. A Service test is basically an integration test, but it mocks out any external service calls (which is a bit weird for an integration test!)
I am open to better names (mock integration tests?)
These are the principles of Service Tests:
- Tests always call the top main() function of each service: these are high-level service API tests
- They are called through Python directly
- All LLM calls are mocked, so tests are free to run
- Tests assert on the resulting data structure, internal routing path, or values passed to the LLM call. They do not assert on the resulting content.
- The idea is to test the logic and flow of information within apollo, not to test the behaviour of the actual models
- Tests would be expected to run on every push to an open PR on GitHub
Implementation notes:
The Anthropic client allows its HTTP client to be fully configured: https://platform.claude.com/docs/en/api/sdks/python#configuring-the-http-client
We can use this as our mock layer, allowing us to simulate any LLM calls.
Tests are set up so that when they call the service entry point, they include a second argument called options. This object is not exposed to the HTTP service, only to direct Python calls. It accepts configuration which can be used in testing.
For example, the job chat signature would become:
def main(data_dict: dict, options) -> dict:
Options might include a key called anthropicHttpClient, which will be passed to the Anthropic client instance created by our AnthropicClient class. This simulates any HTTP calls.
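One way the plumbing might look, as a rough sketch. The `AnthropicClient` below is a stand-in written for this example, not our real class. Its constructor arguments, the `api_key` field on `data_dict` and the returned dict are all assumptions.

```python
from typing import Any, Optional


class AnthropicClient:
    """Stand-in for our wrapper class; the real constructor may differ."""

    def __init__(self, api_key: str, http_client: Optional[Any] = None):
        self.api_key = api_key
        # The real class would forward this to Anthropic(http_client=...).
        self.http_client = http_client


def main(data_dict: dict, options: Optional[dict] = None) -> dict:
    # The HTTP service never passes options, so default to an empty dict.
    options = options or {}
    client = AnthropicClient(
        api_key=data_dict.get("api_key", ""),
        http_client=options.get("anthropicHttpClient"),
    )
    # Placeholder result; a real service would go on to call the model.
    return {"mocked": client.http_client is not None}
```

Making options optional keeps the HTTP-facing call path unchanged: only tests calling main() directly ever supply it.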
Test code would then set up a mock HTTP client which returns fixed values. It should also allow the tests to interrogate the request so that we can check certain values. For example, we might check that logs were appended to the prompt, or that the API key was included in the request headers.
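As a sketch of the kind of helper that could support those checks: the function below flattens the text of a captured Messages API request body so a test can search it. The helper names are made up for this example. It assumes the standard request shape, where `system` and each message's `content` are either a string or a list of blocks.

```python
import json


def _text_of(content) -> str:
    """Return the plain text of a string or a list of content blocks."""
    if isinstance(content, str):
        return content
    return "\n".join(
        block.get("text", "") for block in content if block.get("type") == "text"
    )


def prompt_text(request_body: bytes) -> str:
    """Concatenate the system prompt and all message text from a request body."""
    payload = json.loads(request_body)
    parts = []
    if payload.get("system"):
        parts.append(_text_of(payload["system"]))
    parts.extend(_text_of(m["content"]) for m in payload.get("messages", []))
    return "\n".join(parts)
```

A test could then assert that an expected log line appears in `prompt_text(request.content)`, and that `request.headers["x-api-key"]` holds the key it passed in (`x-api-key` is the header the Anthropic API uses for authentication).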
The options object can take any option or value which aids in testing these functions generally. For example, an option might include a toolCalls list, where any tool calls get pushed into the list like breadcrumbs.
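The breadcrumb idea might be sketched like this. `record_tool_call` and the shape of each entry are hypothetical; only the toolCalls key comes from the proposal above.

```python
from typing import Any, Optional


def record_tool_call(options: Optional[dict], name: str, arguments: dict[str, Any]) -> None:
    """Append a breadcrumb to options["toolCalls"] if the caller supplied a list."""
    sink = (options or {}).get("toolCalls")
    if sink is not None:
        sink.append({"name": name, "input": arguments})


# In a test: pass an empty list in, run the service, then inspect it.
tool_calls: list = []
record_tool_call({"toolCalls": tool_calls}, "search_docs", {"query": "cron"})
record_tool_call(None, "search_docs", {"query": "ignored"})  # no-op without options
```

Because the list is only written to when present, production calls that omit options pay no cost and need no special casing.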