See the course module 4 content here
In this module, we learn how to evaluate and monitor our LLM and RAG system.
In the evaluation part, we assess the quality of our entire RAG system before it goes live.
In the monitoring part, we collect, store and visualize metrics to assess the answer quality of a deployed LLM. We also collect chat history and user feedback.
Links:
- notebook
- results-llama3_8b_8192.csv (answers from llama3-8b-8192)
- results-gemma_7b_it.csv (answers from gemma-7b-it)
Content
- A->Q->A' cosine similarity (see the sketch after this list)
- Evaluating llama3-8b-8192
- Evaluating gemma-7b-it
- Evaluating gemma2-9b-it
- LLM as a judge (see the sketch below, after the links)
- A->Q->A' evaluation
- Q->A evaluation
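
For the A->Q->A' metric, we embed the original answer A and the LLM-generated answer A' and compute the cosine similarity between the two vectors. Below is a minimal sketch, assuming a sentence-transformers embedding model and a results CSV with `answer_orig` and `answer_llm` columns; the model name and column names are illustrative, not necessarily the ones used in the notebook.

```python
# Minimal sketch of the A->Q->A' cosine similarity metric.
# Assumes a results CSV with "answer_orig" (A) and "answer_llm" (A') columns;
# the model name and column names are illustrative.
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("multi-qa-mpnet-base-dot-v1")

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

df = pd.read_csv("results-llama3_8b_8192.csv")

similarities = []
for _, row in df.iterrows():
    v_orig = model.encode(row["answer_orig"])  # embedding of the original answer A
    v_llm = model.encode(row["answer_llm"])    # embedding of the generated answer A'
    similarities.append(cosine(v_orig, v_llm))

df["cosine"] = similarities
print(df["cosine"].describe())  # distribution summary for the evaluated model
```

Comparing the resulting distributions (mean, quartiles) across models is what lets us rank llama3-8b-8192, gemma-7b-it, and gemma2-9b-it against each other.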
Links:
- notebook
- evaluations-aqa.csv (A->Q->A' evaluation results)
- evaluations-qa.csv (Q->A evaluation results)
You can see the prompts and the output from Claude here.
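
In the LLM-as-a-judge setup, we send the question and the generated answer (and, in the A->Q->A' variant, the original answer as well) to a model and ask for a relevance verdict. Below is a minimal sketch of the Q->A variant, assuming the OpenAI client; the prompt wording and the model name are illustrative, not the exact prompt linked above.

```python
# Illustrative LLM-as-a-judge sketch for the Q->A setup: the judge sees only
# the question and the generated answer and classifies relevance.
# The prompt wording and model name are assumptions, not the course's exact prompt.
import json
from openai import OpenAI

client = OpenAI()

judge_prompt_template = """
You are an expert evaluator for a RAG system.
Classify the generated answer as "RELEVANT", "PARTLY_RELEVANT", or "NON_RELEVANT"
with respect to the question.

Question: {question}
Generated answer: {answer_llm}

Respond in JSON: {{"Relevance": "...", "Explanation": "..."}}
""".strip()

def judge(question: str, answer_llm: str) -> dict:
    prompt = judge_prompt_template.format(question=question, answer_llm=answer_llm)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    # In practice you may need to strip markdown fences before parsing.
    return json.loads(response.choices[0].message.content)
```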
Content
- Adding +1 and -1 buttons (see the sketch after this list)
- Setting up a postgres database
- Putting everything in docker compose
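
Here is a minimal sketch of the feedback buttons, assuming a Streamlit front end and psycopg2 for Postgres access; the `feedback` table schema, session key, and environment variables are illustrative assumptions.

```python
# Sketch: +1/-1 feedback buttons in a Streamlit app, writing to Postgres.
# The table name, columns, and connection settings are illustrative assumptions.
import os
import psycopg2
import streamlit as st

def save_feedback(conversation_id: str, feedback: int) -> None:
    # feedback is +1 (thumbs up) or -1 (thumbs down)
    conn = psycopg2.connect(
        host=os.getenv("POSTGRES_HOST", "localhost"),
        database="course_assistant",
        user=os.getenv("POSTGRES_USER"),
        password=os.getenv("POSTGRES_PASSWORD"),
    )
    with conn:  # commits the transaction on success
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO feedback (conversation_id, feedback, timestamp) "
                "VALUES (%s, %s, NOW())",
                (conversation_id, feedback),
            )
    conn.close()

col1, col2 = st.columns(2)
with col1:
    if st.button("+1"):
        save_feedback(st.session_state.get("conversation_id", "unknown"), 1)
with col2:
    if st.button("-1"):
        save_feedback(st.session_state.get("conversation_id", "unknown"), -1)
```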
To inspect the stored data from the command line, install and run pgcli:

```bash
pip install pgcli
pgcli -h localhost -U your_username -d course_assistant -W
```
Links:
- adding vector search (see the sketch after this list)
- adding OpenAI
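
The vector search step replaces keyword search with embedding-based retrieval. Below is a minimal sketch, assuming an Elasticsearch index with a `dense_vector` field and the same sentence-transformers model as above; the index name and field name are illustrative.

```python
# Sketch of kNN vector search in Elasticsearch 8.x. The index name and the
# dense_vector field name are illustrative assumptions.
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es_client = Elasticsearch("http://localhost:9200")
model = SentenceTransformer("multi-qa-mpnet-base-dot-v1")

def vector_search(query: str, k: int = 5) -> list[dict]:
    v = model.encode(query)
    response = es_client.search(
        index="course-questions",
        knn={
            "field": "question_text_vector",  # dense_vector field in the index
            "query_vector": v.tolist(),
            "k": k,
            "num_candidates": 10_000,
        },
    )
    return [hit["_source"] for hit in response["hits"]["hits"]]
```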
Content
- Setting up Grafana
- Tokens and costs (see the sketch after this list)
- QA relevance
- User feedback
- Other metrics
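
For the tokens-and-costs panel, each request's token counts can be read from the API response and converted to a dollar cost before being stored alongside the answer. A minimal sketch, assuming the OpenAI client; the prices are placeholders, so check the current pricing for your model.

```python
# Sketch: extracting token usage from an OpenAI chat completion and computing
# an approximate cost. Prices per 1M tokens are placeholders, not current rates.
from openai import OpenAI

client = OpenAI()

PROMPT_PRICE_PER_1M = 0.15      # placeholder, USD per 1M input tokens
COMPLETION_PRICE_PER_1M = 0.60  # placeholder, USD per 1M output tokens

def answer_with_stats(prompt: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    usage = response.usage
    cost = (
        usage.prompt_tokens * PROMPT_PRICE_PER_1M
        + usage.completion_tokens * COMPLETION_PRICE_PER_1M
    ) / 1_000_000
    return {
        "answer": response.choices[0].message.content,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
        "cost": cost,  # stored with the answer so Grafana can plot it
    }
```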
Links:
- Grafana variables
- Exporting and importing dashboards
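
Dashboards can also be exported and re-imported programmatically through the Grafana HTTP API. Below is a minimal export sketch, assuming a service-account token; the URL, dashboard UID, and token are placeholders.

```python
# Sketch: exporting a Grafana dashboard as JSON via the HTTP API.
# GRAFANA_URL, DASHBOARD_UID, and the token are placeholders.
import json
import requests

GRAFANA_URL = "http://localhost:3000"
DASHBOARD_UID = "your-dashboard-uid"
HEADERS = {"Authorization": "Bearer YOUR_SERVICE_ACCOUNT_TOKEN"}

resp = requests.get(
    f"{GRAFANA_URL}/api/dashboards/uid/{DASHBOARD_UID}",
    headers=HEADERS,
    timeout=10,
)
resp.raise_for_status()

# The API returns {"dashboard": {...}, "meta": {...}}; keep the dashboard JSON.
with open("dashboard.json", "w") as f:
    json.dump(resp.json()["dashboard"], f, indent=2)
```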