Evaluation: Investigate score drop for Antara #674

**Is your feature request related to a problem?**
We are migrating Antara from a single unified assistant (X) to three cadre-specific assistants (Y1, Y2, Y3), as the latter setup has been yielding better results.

The prompts differ across these assistants, but they initially shared the same knowledge base (i.e., the same Vector Store ID was used for all of them).

For this new setup, we ran the first round of evaluations on the older Kaapi Konsole. The results are available [here](https://docs.google.com/spreadsheets/d/1bHYA6RfyofUYEC2GYKCzyo-rP-Id4BJAd5rcpmjItD8/edit?gid=242914726#gid=242914726).

We then updated the prompts and decided to add further guidelines to the testing assistants.
All versions of the prompts are available [here](https://drive.google.com/drive/folders/1xzMtymMOF-4aVflHlCzjYVnpEFEK1ybJ?usp=drive_link).

Because the testing assistants and the live production assistant shared the same Vector Store ID, we created a copy of the Vector Store and attached it to the testing assistants from the backend (outside the Glific UI). This ensures that any additional files used for testing are not added to the assistant currently live in production. More details about this process are available [here](https://discord.com/channels/717975833226248303/1475699369759080468).

After adding the additional files to the testing assistants' knowledge base, we are seeing a considerable dip in the evaluation scores. This is unexpected, since all the added documents are new guidelines.

**Describe the solution you'd like**
- Investigate the cause of the dip in evaluation scores.
- Compare the performance metrics before and after adding the new files.
- Consider rollback options or adjustments to the prompts if necessary.
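
As a starting point for the before/after comparison, here is a minimal sketch in Python. It assumes each evaluation run can be exported as a mapping from question ID to a numeric score; the function name `compare_runs`, the question IDs, and the scores below are hypothetical placeholders, not the actual export format of the Kaapi Konsole or the linked spreadsheet.

```python
from statistics import mean


def compare_runs(before, after, top_n=5):
    """Compare two evaluation runs, each a dict of question ID -> score.

    Only questions present in both runs are compared, so the two means
    are computed over the same set of questions.
    """
    shared = sorted(set(before) & set(after))
    if not shared:
        raise ValueError("the two runs share no question IDs")

    # Negative delta means the score dropped after the new files were added.
    deltas = {qid: after[qid] - before[qid] for qid in shared}
    regressions = sorted(
        ((qid, delta) for qid, delta in deltas.items() if delta < 0),
        key=lambda item: item[1],  # most negative first
    )
    return {
        "n_shared": len(shared),
        "mean_before": mean(before[qid] for qid in shared),
        "mean_after": mean(after[qid] for qid in shared),
        "mean_delta": mean(deltas.values()),
        "worst_regressions": regressions[:top_n],
    }


if __name__ == "__main__":
    # Invented scores, for illustration only.
    run_before = {"q1": 5, "q2": 4, "q3": 3}
    run_after = {"q1": 2, "q2": 4, "q3": 2, "q4": 5}
    print(compare_runs(run_before, run_after))
```

Listing the questions with the largest drops may help show whether the dip is spread across all questions or concentrated in topics covered by the newly added guideline files.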
