Description
Similarly to what was done for benchmarks in #3, we need to define the protocol for contributing model tests.
- Test Case Handling: Define how the system manages test cases, accommodating two distinct types (see the manifest sketch after this list):
  - A. Locally provided tests: Model owners can embed their model tests inside the Nexus Package structure.
  - B. External tests: Model owners have tests defined in their own library (or a third-party one) and want to re-use them.
- Test Script Requirements: Define the standard for the script that runs a test.
  - Interface: What is the standard interface the test script must expose? Do we mandate a framework (e.g., pytest), or simply require that the script return 0 for success and a non-zero value for failure? (See the script sketch after this list.)
  - Responsibilities: The script is responsible for sourcing any required data, loading the model, and performing all the relevant tests.
  - Ownership: This script will be provided by the model contributor.
- Testing models that support serving with vLLM:
  - The test must verify the model works as expected with vLLM.
  - Handle vLLM as an optional dependency, i.e., should vLLM and non-vLLM tests be separated so that we can run them conditionally? (See the vLLM sketch after this list.)
- Model/Algorithm Contributor Responsibilities: Articulate what contributors must provide when they add a model.
- Dataset Sourcing & Hosting: Specify requirements for datasets, noting they may be managed internally or be part of an external framework.
- Execution Environment: Outline infrastructure requirements, including a mechanism to handle dependencies for external frameworks (e.g., via containerization; see the container sketch after this list).
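
For the two test-case types, one possibility is to let the package manifest declare either an embedded test script or an external entry point. A minimal sketch follows; the `TestSpec` name and all field names are assumptions for illustration, not an agreed Nexus format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TestSpec:
    """Hypothetical test declaration inside a Nexus Package manifest."""
    # A. Locally provided: a test script shipped inside the package.
    local_script: Optional[str] = None          # e.g. "tests/run_tests.py"
    # B. External: an installable library plus the entry point to invoke.
    external_package: Optional[str] = None      # e.g. "my-model-lib[test]"
    external_entry_point: Optional[str] = None  # e.g. "my_model_lib.tests:main"

    def is_local(self) -> bool:
        return self.local_script is not None
```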
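
If the interface question is answered with the plain exit-code contract, a contributor-provided script stays framework-agnostic: the runner only observes the exit code, while the contributor is free to delegate to pytest internally. A minimal sketch, assuming pytest is installed and the package ships a `tests/` directory:

```python
#!/usr/bin/env python3
"""Contributor-provided test script: exit code 0 means success, anything else failure."""
import sys

import pytest

if __name__ == "__main__":
    # Delegate to pytest, but keep the exit code as the only interface the
    # runner depends on; pytest.main returns 0 only when all tests pass.
    sys.exit(pytest.main(["tests/", "-q"]))
```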
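
For the vLLM question, pytest already supports treating it as an optional dependency: a module-level `importorskip` makes vLLM tests no-ops where vLLM is absent, and a custom marker lets the runner select them explicitly. A sketch, with the marker name and model name as assumptions:

```python
import pytest

# Skip this whole module when vLLM is not installed, so non-vLLM tests
# still run in environments without the optional dependency.
vllm = pytest.importorskip("vllm")


@pytest.mark.vllm  # hypothetical marker; would need registering in pytest config
def test_model_serves_with_vllm():
    # Model name is a placeholder for the contributed model.
    llm = vllm.LLM(model="facebook/opt-125m")
    outputs = llm.generate(["Hello, world"])
    assert len(outputs) == 1 and outputs[0].outputs[0].text
```

With the marker registered, the runner can split the suites with `pytest -m vllm` and `pytest -m "not vllm"` to run them conditionally.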
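
For the execution environment, one containerization mechanism is to run each package's test script inside an image that pins the external framework's dependencies. The image name, mount point, and script path below are assumptions:

```python
import subprocess

def run_tests_in_container(image: str, package_dir: str) -> int:
    """Run a package's test script inside Docker so external-framework
    dependencies never leak into the host; returns the script's exit code,
    preserving the 0-on-success contract."""
    result = subprocess.run([
        "docker", "run", "--rm",
        "-v", f"{package_dir}:/workspace",  # mount the Nexus Package
        "-w", "/workspace",
        image,                              # e.g. an image pinning vLLM + CUDA
        "python", "tests/run_tests.py",     # hypothetical script path
    ])
    return result.returncode

# Example: run_tests_in_container("nexus-test-runner:latest", "/path/to/package")
```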