
🗺️ Evaluation Experiment Tracking #2220

Closed
mikeldking opened this issue Feb 6, 2024 · 1 comment
Labels: c/evals, documentation, roadmap

Comments

@mikeldking (Contributor)

As a user I would like to:

Ship LLM apps with confidence
Golden datasets for QA, experimentation, and fine-tuning
Evaluate changes to LLMs, prompts, and retrieval
Track experiment runs during development and production
Production data provides the critical feedback loop, keeping benchmarks up to date.
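The workflow outlined above (run evaluators over a golden dataset, record per-example scores, compare runs) could be sketched roughly as follows. This is a minimal, hypothetical illustration; the names (`ExperimentRun`, `golden_dataset`, `exact_match`, `fake_llm`) are invented for this sketch and are not Phoenix APIs:

```python
# Hypothetical sketch of experiment-run tracking over a golden dataset.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ExperimentRun:
    """One evaluation run (e.g. a prompt or model variant) and its scores."""
    name: str
    scores: list = field(default_factory=list)
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def record(self, example_id: str, score: float) -> None:
        self.scores.append((example_id, score))

    def mean_score(self) -> float:
        return sum(s for _, s in self.scores) / len(self.scores)


# Golden dataset: (input, expected output) pairs used as the benchmark.
golden_dataset = [("2+2", "4"), ("capital of France", "Paris")]


def exact_match(predicted: str, expected: str) -> float:
    """A simple evaluator: 1.0 on an exact string match, else 0.0."""
    return 1.0 if predicted.strip() == expected.strip() else 0.0


# Stand-in for an LLM call so the sketch stays self-contained.
fake_llm = {"2+2": "4", "capital of France": "Lyon"}

run = ExperimentRun(name="prompt-v2")
for i, (prompt, expected) in enumerate(golden_dataset):
    run.record(str(i), exact_match(fake_llm[prompt], expected))

print(run.mean_score())  # 0.5 — one of two examples matched
```

Comparing `mean_score()` across runs is the core of the feature request; production traffic would feed new examples back into `golden_dataset` to keep the benchmark current.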

@dosubot added the c/evals, c/metrics, documentation, and enhancement labels — Feb 6, 2024
@Arize-ai deleted a comment from @dosubot — Feb 7, 2024
@mikeldking removed the enhancement label — May 13, 2024
@mikeldking (Contributor, Author)

merging with #2017

Project status: Done