Is your feature request related to a problem?
Partners currently run separate AI assessment flows by hand, which duplicates effort and leaves no standardized way to compare evaluations on the same dataset.
Describe the solution you'd like
Build a shared Kaapi assessment pipeline that includes:
- Dataset APIs
- Multimodal processing
- Multiple config-based batch runs
- Run status tracking
- Retries
- Cron polling
- SSE updates
- Result exports
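To make the scope concrete, here is a rough sketch of how a partner might drive such a pipeline end to end. Everything in it is illustrative: the base URL, the endpoint paths (`/datasets`, `/runs`, `/runs/{id}/events`, `/runs/{id}/export`), and all field names are placeholder assumptions, not an existing Kaapi API.

```python
import requests

# Hypothetical base URL; the real Kaapi API surface is yet to be designed.
BASE = "https://kaapi.example.org/api"

# 1. Dataset API: upload a shared dataset once so every partner
#    evaluates against the same rows.
dataset = requests.post(
    f"{BASE}/datasets",
    json={
        "name": "inquilab-eval-v1",
        "items": [
            {"input": "sample question", "expected": "reference answer"},
        ],
    },
).json()

# 2. Config-based batch runs: launch one run per config over the same
#    dataset so prompt/model variants can be compared side by side.
configs = [
    {"model": "gpt-4o", "prompt_version": "v1"},
    {"model": "gpt-4o", "prompt_version": "v2"},
]
runs = [
    requests.post(
        f"{BASE}/runs",
        json={
            "dataset_id": dataset["id"],
            "config": cfg,
            "max_retries": 3,  # per-item retry budget handled server-side
        },
    ).json()
    for cfg in configs
]

# 3. Run status tracking over SSE: the server-side cron poller checks
#    batch progress and pushes status events, so clients just listen.
with requests.get(f"{BASE}/runs/{runs[0]['id']}/events", stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            print(line)  # e.g. data: {"status": "running", "completed": 42}
            break  # stop after the first event in this sketch

# 4. Result export: pull finished results as CSV for offline comparison.
csv_bytes = requests.get(
    f"{BASE}/runs/{runs[0]['id']}/export", params={"format": "csv"}
).content
```

One design note behind this sketch: SSE keeps long-running batch status on a single push channel, so each partner's client does not need its own polling loop; polling against upstream model providers stays server-side behind the cron job.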
Why is this enhancement needed?
This enhancement will reduce duplicated effort across partners, make evaluations repeatable, and improve scalability for high-volume programs such as Inquilab.
Original issue
Describe the current behavior
Partners run similar AI assessment flows separately, with manual prompt testing and no shared way to compare evaluations and configs on the same dataset.
Describe the enhancement you'd like
Build a shared Kaapi assessment pipeline with dataset APIs, multimodal processing, multiple config-based batch runs, run status tracking, retries, cron polling, SSE updates, and result exports.
Why is this enhancement needed?
Reduces duplicate partner effort, makes evaluations repeatable, and scales better for high-volume programs (like Inquilab).