
Support RunId or Experimentation API in a cross-process manner #433

@skylarbpayne

Description


When setting up evals for an existing system, the reality is that the "AI pipeline" is often not "pure": it depends on many external resources along the way. This makes it hard to simply extract the pipeline and run it on its own as part of an experiment.

As such, it is common to have an external tool or script that triggers an endpoint to start the AI pipeline. Generally, this is the same endpoint that real users would trigger in the product. The script can then be pointed at a local development environment, or even production, to generate logs.
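
For concreteness, here is a minimal sketch of such a driver script. The `/chat` endpoint path and the `X-Run-Id` header are assumptions for illustration only, not part of any existing API:

```python
# Hypothetical driver script: replays a set of eval inputs against the same
# endpoint that real users hit, tagging every request with a run identifier
# so the server-side logs can be grouped and compared later.
import uuid

import requests

RUN_ID = str(uuid.uuid4())  # one id per invocation of this script
ENDPOINT = "http://localhost:8000/chat"  # local dev, staging, or production

eval_inputs = [
    {"question": "What is the refund policy?"},
    {"question": "How do I reset my password?"},
]

for payload in eval_inputs:
    resp = requests.post(
        ENDPOINT,
        json=payload,
        headers={"X-Run-Id": RUN_ID},  # cross-process run correlation
        timeout=30,
    )
    resp.raise_for_status()
    print(RUN_ID, payload, resp.json())
```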

However, the same set of inputs may be run against the same versioned function multiple times, and it would be helpful to be able to annotate and compare these runs separately (analogous to an A/A test).
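
On the receiving side, the run id only needs to ride along into whatever gets logged. A sketch under the same assumptions (`my_versioned_pipeline` and `log_trace` are placeholders for the existing pipeline and tracing hooks, not existing functions):

```python
# Hypothetical FastAPI handler: reads the run id propagated by the driver
# script and attaches it to the logged trace, so multiple runs over the same
# inputs and the same versioned function can be compared (A/A style).
from fastapi import FastAPI, Header, Request

app = FastAPI()


def my_versioned_pipeline(question: str) -> str:
    # Stand-in for the existing "impure" pipeline that touches external resources.
    return f"answer to: {question}"


def log_trace(**fields) -> None:
    # Stand-in for whatever tracing/logging the real system uses.
    print(fields)


@app.post("/chat")
async def chat(request: Request, x_run_id: str | None = Header(default=None)):
    payload = await request.json()
    answer = my_versioned_pipeline(payload["question"])
    # The only new requirement: the run id is recorded alongside the trace.
    log_trace(run_id=x_run_id, inputs=payload, output=answer)
    return {"answer": answer, "run_id": x_run_id}
```

With that in place, comparing two runs over the same inputs is just a matter of grouping the logged traces by run id.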


Labels: enhancement (New feature or request)
