Is your feature request related to a problem?
Educate Girls's existing config, prompt, and vector store setup currently yields a low evaluation score.
Describe the solution you'd like
- Audit the current setup and baseline evaluation score.
- Identify the factors hurting performance, such as the prompt, the model choice, the files used to build the vector store, or poor golden question-answer pairs.
- Iterate on improvements based on findings from similar NGOs like CBC and Antara, focusing on prompt and configuration changes.
Additional Context
In parallel with the Gemini-based eval runner work, we should also test Gemini as the underlying model and switch if it shows a meaningful improvement. OpenAI should still be used for the cosine similarity calculation and the LLM-as-judge score, so that the scoring framework stays constant across both runs.
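To make the "constant scoring framework" idea concrete, here is a minimal sketch of how the baseline and a Gemini run could be compared on cosine similarity alone. It assumes the answer and golden-answer embeddings have already been produced by the same OpenAI embedding model for both runs. The function names, the toy vectors, and the `min_gain` threshold are all hypothetical; this issue does not define what counts as a "meaningful improvement".

```python
import math
from statistics import mean


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)


def mean_similarity(answer_vecs, golden_vecs):
    """Average similarity of each generated answer to its golden answer."""
    return mean(cosine_similarity(a, g) for a, g in zip(answer_vecs, golden_vecs))


def should_switch(baseline_score, candidate_score, min_gain=0.05):
    """Hypothetical decision rule: switch only if the gain reaches min_gain.

    min_gain is an assumed placeholder, not a value agreed in this issue.
    """
    return candidate_score - baseline_score >= min_gain


# Toy vectors standing in for real embeddings (assumption, not real data).
golden = [[1.0, 0.0], [0.0, 1.0]]
baseline_answers = [[1.0, 1.0], [1.0, 1.0]]
candidate_answers = [[1.0, 0.0], [0.0, 1.0]]

baseline = mean_similarity(baseline_answers, golden)
candidate = mean_similarity(candidate_answers, golden)
print(should_switch(baseline, candidate))
```

Because both runs pass through the same `cosine_similarity` and the same embedding source, any difference in the mean score can be attributed to the underlying model rather than to the scoring setup. The LLM-as-judge score would need an analogous comparison with the judge model held fixed.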