Evaluation: Investigate score drop for Antara #674

@tanuprasad530

Description

Is your feature request related to a problem?
We are currently in the process of migrating from a single unified assistant (X) to three cadre-specific assistants (Y1, Y2, Y3) for Antara, as the latter setup has been yielding better results.

Please note that while the prompts for each of these assistants are different, they initially shared the same knowledge base (i.e., the same Vector Store ID was used across all assistants).

For this new setup, we ran the first round of evaluations on the older Kaapi Konsole. The results can be accessed here: Click here

Following this, we updated the prompts and decided to add further guideline documents to the testing assistants.
All the versions of the prompts can be accessed here: Click here

Since the same Vector Store ID was shared between the testing assistants and the live assistant in production, we created a copy of the Vector Store and attached it to the testing assistants from the backend (outside the Glific UI). This ensured that any additional files used for testing would not be added to the assistant currently live in production. More details about this process can be accessed here: Click here

After adding the additional files to the testing assistants' knowledge base (KB), we are observing a considerable dip in the evaluation scores. This is unexpected, since every document added is a new guideline and none of the existing files were changed.

Describe the solution you'd like

  • Investigate the cause of the dip in evaluation scores.
  • Compare the performance metrics before and after adding the new files.
  • Consider rollback options or adjustments to the prompts if necessary.
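
A minimal sketch of the before/after comparison, assuming the per-question scores from both evaluation runs can be exported as dictionaries keyed by question ID. The function name `compare_runs`, the 0 to 1 score scale, and the sample values are all hypothetical and are not taken from Kaapi or from the actual results.

```python
from statistics import mean


def compare_runs(before, after, top_n=3):
    """Compare two evaluation runs given as {question_id: score} dicts.

    Only questions present in both runs are compared, so the averages
    are not skewed by questions added or dropped between runs.
    """
    shared = sorted(set(before) & set(after))
    if not shared:
        raise ValueError("the two runs share no question IDs")

    # Negative delta means the score dropped after the new files were added.
    deltas = {qid: after[qid] - before[qid] for qid in shared}
    worst = sorted(deltas.items(), key=lambda item: item[1])[:top_n]

    return {
        "mean_before": mean(before[q] for q in shared),
        "mean_after": mean(after[q] for q in shared),
        "mean_delta": mean(deltas.values()),
        # Question IDs with the largest drops, worst first.
        "regressed": [qid for qid, delta in worst if delta < 0],
    }


# Made-up scores for illustration only.
before = {"q1": 0.9, "q2": 0.8, "q3": 0.7}
after = {"q1": 0.5, "q2": 0.8, "q3": 0.6}
report = compare_runs(before, after)
print(report["regressed"])
```

Listing the most regressed questions first makes it easier to check whether the drop is concentrated in topics covered by the newly added guideline files or spread across the whole test set.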
