Gemini Evaluations

demo recordings :

  • Screencast.2023-12-22.21_50_25.mp4
  • 1.mov
  • 2.mov
  • 3.mov

problem statement : when building multi-agent environments, fixed and default prompts are typically used. However, Gemini does not support system prompts, and the "voice" of its responses differs from OpenAI's.

  • What is the optimal configuration of text and multimodal input prompts to build autonomous AI agents and accomplish complex data-processing tasks using Gemini?
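
To make the mismatch concrete, here is a minimal sketch (assuming the `openai` v1 and `google-generativeai` Python SDKs; the instruction and task strings are placeholders): OpenAI accepts a dedicated system role, while Gemini's `gemini-pro` endpoint takes a single content string, so any "system" instruction has to be folded into the user prompt.

```python
# Sketch of the system-prompt mismatch; assumes the `openai` (v1) and
# `google-generativeai` SDKs with API keys available in the environment.
import os

import google.generativeai as genai
from openai import OpenAI

SYSTEM_INSTRUCTION = "You are a concise data-processing agent."
USER_TASK = "Summarize the attached sales figures by region."

# OpenAI: the instruction lives in a dedicated `system` message.
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
openai_reply = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": SYSTEM_INSTRUCTION},
        {"role": "user", "content": USER_TASK},
    ],
)

# Gemini: there is no system role, so the instruction is prepended to the
# user content, which is exactly what changes the "voice" of the reply.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
gemini = genai.GenerativeModel("gemini-pro")
gemini_reply = gemini.generate_content(f"{SYSTEM_INSTRUCTION}\n\n{USER_TASK}")

print(openai_reply.choices[0].message.content)
print(gemini_reply.text)
```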

solution : we are working with TruLens to evaluate the default prompts and optimize their performance.
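
As a rough illustration of this setup (a sketch under assumptions, not our exact notebooks), a text-to-text Gemini call can be wrapped with TruLens' `TruBasicApp` recorder and scored with an LLM-graded relevance feedback function. The `app_id`, prompt, and import paths below follow the 2023 `trulens_eval` quickstart and are illustrative.

```python
# Sketch: recording and scoring a single Gemini prompt with trulens_eval.
# Import paths follow the 2023 quickstart and may differ in newer releases;
# the app_id and prompt are placeholders.
import google.generativeai as genai
from trulens_eval import Feedback, Tru, TruBasicApp
from trulens_eval.feedback.provider.openai import OpenAI as OpenAIProvider

genai.configure(api_key="YOUR_GOOGLE_API_KEY")  # placeholder
gemini = genai.GenerativeModel("gemini-pro")

def gemini_completion(prompt: str) -> str:
    """Plain text-in / text-out function that TruBasicApp can wrap."""
    return gemini.generate_content(prompt).text

# Feedback function: LLM-graded relevance between the input and the output.
provider = OpenAIProvider()
f_relevance = Feedback(provider.relevance).on_input_output()

tru = Tru()
recorder = TruBasicApp(
    gemini_completion,
    app_id="gemini_default_prompt_baseline",  # one app_id per prompt variant
    feedbacks=[f_relevance],
)

# Every call made inside the context manager is recorded and scored.
with recorder as recording:
    recorder.app("Summarize the key risks in this quarterly report: ...")

print(tru.get_leaderboard(app_ids=["gemini_default_prompt_baseline"]))
```

Running the same wrapper with a distinct `app_id` per prompt variant lets the leaderboard compare default and proposed prompts side by side.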

methodology : to arrive at our optimal solution:

  • we built "simple applications" using TruLens wrappers in Jupyter notebooks (as sketched above).
  • we evaluated prompts and prompt combinations for Gemini and OpenAI across our agent libraries: AutoGen, TaskWeaver, and Semantic Kernel.
  • we evaluated RAG retrieval performance across several embedding models (see the sketch after this list).
  • we evaluated multimodal performance.
  • we analyzed the results for the best performance and quality, then applied the findings to our enterprise application, DataTonic.
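
The RAG comparison referenced in the list reduces to embedding the same corpus with each candidate model and checking whether the expected passage ranks first for a query. Below is a minimal sketch, assuming the `sentence-transformers` package; the corpus, query, and model names are illustrative placeholders.

```python
# Sketch: comparing retrieval quality across embedding models on a toy corpus.
import numpy as np
from sentence_transformers import SentenceTransformer

CORPUS = [
    "Quarterly revenue grew 12% driven by the APAC region.",
    "The data pipeline ingests CSV exports every night at 02:00.",
    "Employee onboarding requires a signed security policy.",
]
QUERY = "How often does the pipeline load new CSV files?"
EXPECTED_INDEX = 1  # the passage a good retriever should rank first

def top_hit(model_name: str) -> int:
    """Embed the corpus and query, return the index of the best match."""
    model = SentenceTransformer(model_name)
    doc_vecs = model.encode(CORPUS, normalize_embeddings=True)
    query_vec = model.encode(QUERY, normalize_embeddings=True)
    scores = np.dot(doc_vecs, query_vec)  # cosine similarity (vectors are normalized)
    return int(np.argmax(scores))

for name in ["all-MiniLM-L6-v2", "BAAI/bge-small-en-v1.5"]:
    hit = top_hit(name)
    print(f"{name}: top passage {hit} ({'hit' if hit == EXPECTED_INDEX else 'miss'})")
```

Adding further model names to the loop extends the comparison without changing the scoring logic.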

default prompts : these are our baselines:

proposed prompts : these are our unique prompts:

evaluation results :

analysis : based on our results, we ranked the prompts and prompt combinations and optimized their usage and applicability to our use case.