# Module 4: Evaluation and Monitoring

See the course module 4 content here.

In this module, we learn how to evaluate and monitor our LLM and RAG system.

In the evaluation part, we assess the quality of our entire RAG system before it goes live.

In the monitoring part, we collect, store and visualize metrics to assess the answer quality of a deployed LLM. We also collect chat history and user feedback.

## Offline vs Online (RAG) evaluation

## Generating data for offline RAG evaluation

Links:
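
The approach in this part is to build a ground-truth dataset by prompting an LLM to generate questions for each FAQ record; the RAG system is then evaluated on how well it handles those questions. A minimal sketch, assuming the `openai` package with `OPENAI_API_KEY` set in the environment; the `documents` list, field names, and prompt wording are illustrative, not the course's exact code:

```python
# Sketch: generate synthetic questions for offline RAG evaluation.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical sample; in the course the records come from the FAQ dataset.
documents = [
    {"id": "doc1", "text": "Yes, you can still join the course after it has started."},
]

prompt_template = """Based on this FAQ record, generate 5 questions a student
might ask that the record answers. Reply with a JSON object like
{{"questions": ["...", "..."]}}.

Record: {text}"""

def generate_questions(doc):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt_template.format(text=doc["text"])}],
        response_format={"type": "json_object"},  # ask for parseable JSON
    )
    return json.loads(response.choices[0].message.content)["questions"]

ground_truth = [
    {"document_id": doc["id"], "question": q}
    for doc in documents
    for q in generate_questions(doc)
]
```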

## Offline RAG evaluation: cosine similarity

Content

- A->Q->A' cosine similarity (see the sketch after this list)
- Evaluating llama3-8b-8192
- Evaluating gemma-7b-it
- Evaluating gemma2-9b-it
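
In the A->Q->A' flow we start from an original answer A, generate a question Q from it, let the RAG pipeline answer Q to produce A', and measure how close A' is to A via the cosine similarity of their embeddings. A minimal sketch, assuming `sentence-transformers` and `numpy` are installed; the model name and example strings are illustrative:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("multi-qa-mpnet-base-dot-v1")

def cosine(u, v):
    # Cosine similarity: dot product of the L2-normalized vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

answer_orig = "Yes, you can still join the course after the start date."    # A
answer_llm = "Late enrollment is allowed, so you can join after it starts."  # A'

v_orig = model.encode(answer_orig)
v_llm = model.encode(answer_llm)
print(cosine(v_orig, v_llm))  # closer to 1.0 means A' preserves the meaning of A
```

Averaged over the whole ground-truth dataset, this score gives one number per model, which is how models like llama3-8b-8192 and gemma2-9b-it can be compared.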

## Offline RAG evaluation: LLM as a judge

- LLM as a judge (see the sketch after this list)
- A->Q->A' evaluation
- Q->A evaluation
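
Instead of a similarity score, an LLM can grade each answer directly. A minimal sketch of the judge call, assuming the `openai` package; the prompt wording and the RELEVANT/PARTLY_RELEVANT/NON_RELEVANT labels follow the general idea of the module but are not the exact course prompt:

```python
import json
from openai import OpenAI

client = OpenAI()

judge_prompt = """You are an expert evaluator for a RAG system.
Classify how relevant the generated answer is to the question
as one of: "RELEVANT", "PARTLY_RELEVANT", "NON_RELEVANT".

Question: {question}
Generated answer: {answer_llm}

Reply with a JSON object: {{"Relevance": "...", "Explanation": "..."}}"""

def judge(question, answer_llm):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": judge_prompt.format(
            question=question, answer_llm=answer_llm)}],
        response_format={"type": "json_object"},  # ask for parseable JSON
    )
    return json.loads(response.choices[0].message.content)
```

The A->Q->A' variant additionally shows the judge the original answer for comparison; the Q->A variant grades the generated answer against the question alone, as sketched above.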

Links:

## Capturing user feedback

You can see the prompts and the output from Claude here.

Content

- Adding +1 and -1 buttons (sketched below, after the pgcli commands)
- Setting up a Postgres database
- Putting everything in Docker Compose

```bash
pip install pgcli
pgcli -h localhost -U your_username -d course_assistant -W
```
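
The buttons write each vote into the Postgres database so it can be aggregated later. A minimal sketch, assuming Streamlit and `psycopg2` are installed and a `feedback(conversation_id, feedback)` table exists; connection details and names are illustrative:

```python
import psycopg2
import streamlit as st

def save_feedback(conversation_id, feedback):
    # Store one vote: +1 for positive, -1 for negative.
    conn = psycopg2.connect(
        host="localhost",
        dbname="course_assistant",
        user="your_username",
        password="your_password",
    )
    with conn:  # commits the transaction on success
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO feedback (conversation_id, feedback) VALUES (%s, %s)",
                (conversation_id, feedback),
            )
    conn.close()

conversation_id = st.session_state.get("conversation_id", "demo")
col1, col2 = st.columns(2)
if col1.button("+1"):
    save_feedback(conversation_id, 1)
if col2.button("-1"):
    save_feedback(conversation_id, -1)
```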

Links:

## Capturing user feedback: part 2

- Adding vector search (see the sketch after this list)
- Adding OpenAI
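
With vector search, the user's question is embedded and matched against pre-computed document embeddings before the retrieved context is sent to OpenAI. A minimal sketch of the retrieval step, assuming an Elasticsearch 8.x instance with a `dense_vector` field; index and field names are illustrative:

```python
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
model = SentenceTransformer("multi-qa-mpnet-base-dot-v1")

def vector_search(query, k=5):
    v = model.encode(query).tolist()
    response = es.search(
        index="course-questions",
        knn={
            "field": "text_vector",  # dense_vector field holding doc embeddings
            "query_vector": v,
            "k": k,
            "num_candidates": 100,
        },
    )
    return [hit["_source"] for hit in response["hits"]["hits"]]
```

The returned documents are then formatted into the prompt for the OpenAI call.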

Links:

## Monitoring the system

- Setting up Grafana
- Tokens and costs (see the sketch after this list)
- QA relevance
- User feedback
- Other metrics
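
Token counts drive the cost metric: count the prompt and completion tokens, then multiply by the per-token price. A minimal sketch using `tiktoken`; the prices below are placeholders, not current OpenAI pricing:

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

def openai_cost(prompt, completion,
                price_in_per_1k=0.0005, price_out_per_1k=0.0015):
    # Prices are illustrative placeholders; check the OpenAI pricing page.
    tokens_in = len(encoding.encode(prompt))
    tokens_out = len(encoding.encode(completion))
    cost = tokens_in / 1000 * price_in_per_1k + tokens_out / 1000 * price_out_per_1k
    return tokens_in, tokens_out, cost
```

These numbers, together with relevance and user feedback, are stored in Postgres and visualized in Grafana.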

Links:

## Extra Grafana video

- Grafana variables
- Exporting and importing dashboards

Links:

## OpenAI API