Description
Similarly to what was done for benchmarks in #3, we need to define the protocol for contributing model tests.
- Test Case Handling: Define how the system manages test cases, accommodating two distinct types (see the manifest sketch after this list):
  - A. Locally provided tests: Model owners can embed their model tests inside the Nexus Package structure.
  - B. External tests: Model owners have tests defined in their own library (or a third-party one) and want to re-use them.
- Test Script Requirements: Define the standard for the script that runs a test.
  - Interface: What is the standard interface the test script must expose? Do we mandate a framework (e.g., pytest), or simply require that the script return 0 for success and a non-zero value for failure? (See the script sketch after this list.)
  - Responsibilities: The script is responsible for sourcing any required data, loading the model, and performing all the relevant tests.
  - Ownership: This script will be provided by the model contributor.
- Testing models that support serving with vLLM:
  - The test must verify the model works as expected with vLLM.
  - Handle vLLM as an optional dependency, i.e., should vLLM and non-vLLM tests be separated so that we can run them conditionally? (See the vLLM sketch after this list.)
- Model/Algorithm Contributor Responsibilities: Articulate what contributors must provide when they add a model.
- Dataset Sourcing & Hosting: Specify requirements for datasets, noting they may be managed internally or be part of an external framework.
- Execution Environment: Outline infrastructure requirements, including a mechanism to handle dependencies for external frameworks (e.g., via containerization; see the container sketch after this list).
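
For the two test-case types, one possibility is to let the package manifest declare either an embedded test script or an external entry point. A minimal sketch follows; the `TestSpec` name and all field names are assumptions for illustration, not an agreed Nexus format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TestSpec:
    """Hypothetical test declaration inside a Nexus Package manifest."""
    # A. Locally provided: a test script shipped inside the package.
    local_script: Optional[str] = None          # e.g. "tests/run_tests.py"
    # B. External: an installable library plus the entry point to invoke.
    external_package: Optional[str] = None      # e.g. "my-model-lib[test]"
    external_entry_point: Optional[str] = None  # e.g. "my_model_lib.tests:main"

    def is_local(self) -> bool:
        return self.local_script is not None
```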
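
If the interface question is answered with the plain exit-code contract, a contributor-provided script stays framework-agnostic: the runner only observes the exit code, while the contributor is free to delegate to pytest internally. A minimal sketch, assuming pytest is installed and the package ships a `tests/` directory:

```python
#!/usr/bin/env python3
"""Contributor-provided test script: exit code 0 means success, anything else failure."""
import sys

import pytest

if __name__ == "__main__":
    # Delegate to pytest, but keep the exit code as the only interface the
    # runner depends on; pytest.main returns 0 only when all tests pass.
    sys.exit(pytest.main(["tests/", "-q"]))
```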
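
For the vLLM question, pytest already supports treating it as an optional dependency: a module-level `importorskip` makes vLLM tests no-ops where vLLM is absent, and a custom marker lets the runner select them explicitly. A sketch, with the marker name and model name as assumptions:

```python
import pytest

# Skip this whole module when vLLM is not installed, so non-vLLM tests
# still run in environments without the optional dependency.
vllm = pytest.importorskip("vllm")


@pytest.mark.vllm  # hypothetical marker; would need registering in pytest config
def test_model_serves_with_vllm():
    # Model name is a placeholder for the contributed model.
    llm = vllm.LLM(model="facebook/opt-125m")
    outputs = llm.generate(["Hello, world"])
    assert len(outputs) == 1 and outputs[0].outputs[0].text
```

With the marker registered, the runner can split the suites with `pytest -m vllm` and `pytest -m "not vllm"` to run them conditionally.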
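
For the execution environment, one containerization mechanism is to run each package's test script inside an image that pins the external framework's dependencies. The image name, mount point, and script path below are assumptions:

```python
import subprocess

def run_tests_in_container(image: str, package_dir: str) -> int:
    """Run a package's test script inside Docker so external-framework
    dependencies never leak into the host; returns the script's exit code,
    preserving the 0-on-success contract."""
    result = subprocess.run([
        "docker", "run", "--rm",
        "-v", f"{package_dir}:/workspace",  # mount the Nexus Package
        "-w", "/workspace",
        image,                              # e.g. an image pinning vLLM + CUDA
        "python", "tests/run_tests.py",     # hypothetical script path
    ])
    return result.returncode

# Example: run_tests_in_container("nexus-test-runner:latest", "/path/to/package")
```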