 1. Your approach to the problem, data, and architecture
 2. Given more time, how would you improve the system?
 3. In production, how would you evaluate the system?

## Efficient Knowledge Retrieval with RAG Chatbots
By Douglas Patton
 ### Agenda
- #### Problem
- #### Data
- #### Solutions
- #### Evaluation
- #### Extensions

### Problem Statement

#### Requirements
 - build a RAG chatbot
 - that answers customer queries
 - using existing html files

#### Additional Desirable Features
  - allow conversation
  - show reference information
  - avoid confabulations/hallucinations
  - handle irrelevant or unanswerable questions

### Data
#### Reference documents: collection of ~1200 dense `.html` files
#### 874 pages remaining after cleaning and filtering 

#### `HtmlCleaner` creates dictionary of clean text for each page
 1. Inscriptis `get_text()` converts each page to text, rendering page menus as markdown-style lists and using whitespace to preserve content structure
 1. Beautiful Soup used to extract page title
 1. Filters applied:
    - `HtmlPatterns` to remove text associated with page menus, copyright, selected page titles (e.g., "Terms of Use"), etc.
    - minimum length in characters: `min_chars_ref_text=10`
    - minimum English-language share: `min_engl_share_ref_text=90`
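
The two threshold filters above can be sketched as follows. This is a minimal illustration, not the project's `HtmlCleaner`: the helper names `english_share` and `keep_page` are hypothetical, and the English check is a crude stand-in (the percentage of alphabetic characters that are ASCII letters), since the source does not say how the share is computed.

```python
def english_share(text: str) -> float:
    """Percent of alphabetic characters that are ASCII letters.

    ASSUMPTION: a crude stand-in for a real language detector.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    ascii_letters = sum(1 for ch in letters if ch.isascii())
    return 100.0 * ascii_letters / len(letters)


def keep_page(
    text: str,
    min_chars_ref_text: int = 10,
    min_engl_share_ref_text: float = 90,
) -> bool:
    """Apply the minimum-length filter, then the English-share filter."""
    text = text.strip()
    if len(text) < min_chars_ref_text:
        return False
    return english_share(text) >= min_engl_share_ref_text


# Made-up pages for illustration only.
pages = {
    "a.html": "AutoCAD 2024 introduces Smart Blocks.",
    "b.html": "Hi",  # too short
    "c.html": "Maya 2024 のリリースノートはこちらをご覧ください",  # mostly non-English
}
kept = {name: text for name, text in pages.items() if keep_page(text)}
```

Only `a.html` passes both filters here; a production version would likely swap `english_share` for a proper language-identification library.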

|tokens per page|
|-|
|![tokens per page](images/tokens_per_page.png)|


|pages per title|tokens per title|
|-|-|
|![pages per title](images/pages_per_title.png)|![tokens per title](images/tokens_per_title.png)|

#### `TextEmbedder` converts clean text into a list of langchain documents
 1. create documents from clean text
    - split text if longer than `max_tokens_ref_text=4000`
 1. create embeddings and add to FAISS vector store
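
The splitting step can be illustrated with a small sketch. It assumes whitespace-delimited words as a stand-in for real tokenizer tokens, uses hypothetical helpers `split_text` and `make_documents`, and represents documents as plain dicts that mirror the `page_content` and `metadata` fields of a langchain `Document`. The project's `TextEmbedder` may count tokens and choose split points differently, and the embedding and FAISS step is omitted.

```python
def split_text(text: str, max_tokens_ref_text: int = 4000) -> list[str]:
    """Split text into chunks of at most max_tokens_ref_text tokens.

    ASSUMPTION: tokens are approximated by whitespace-separated words.
    """
    tokens = text.split()
    return [
        " ".join(tokens[i:i + max_tokens_ref_text])
        for i in range(0, len(tokens), max_tokens_ref_text)
    ]


def make_documents(
    clean_pages: dict[str, str], max_tokens_ref_text: int = 4000
) -> list[dict]:
    """Turn {source: clean text} into a flat list of document dicts."""
    docs = []
    for source, text in clean_pages.items():
        for part, chunk in enumerate(split_text(text, max_tokens_ref_text)):
            docs.append(
                {
                    "page_content": chunk,
                    # Keep the source so answers can show reference information.
                    "metadata": {"source": source, "part": part},
                }
            )
    return docs


docs = make_documents(
    {"long.html": "word " * 9000, "short.html": "just a few words"}
)
```

With these inputs the long page becomes three chunks (4000, 4000, and 1000 words) and the short page stays whole, so `docs` holds four entries.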

### Solutions

#### Four variants from `chatbots.py`
 1. simple `RagChatbot` using langchain *defaults*
 1. more flexible `RagChatbotMultiRetrieverCombiner` uses *custom retriever*, `MultiRetrieverCombiner`
 1. `RagChatbotMultiQA` *queries documents individually* and merges the answers
 1. `RagChatSyntheticQ` retrieves *similar synthetic queries* and merges the answers
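
The query-each-document-then-merge pattern of the third variant can be sketched as below. This is an illustration under stated assumptions rather than the code in `chatbots.py`: `answer_per_document`, `merge_answers`, the prompt wording, and the `ask_llm` callable are all hypothetical, and a toy function stands in for the LLM so the example runs offline.

```python
from typing import Callable

NO_ANSWER = "NO ANSWER"  # hypothetical sentinel for "document does not help"


def answer_per_document(
    question: str, docs: list[str], ask_llm: Callable[[str], str]
) -> list[str]:
    """Ask the LLM about each retrieved document separately."""
    answers = []
    for doc in docs:
        prompt = (
            "Answer the question using only this document. "
            f"Reply {NO_ANSWER} if the document does not contain the answer."
            f"\n\nDocument:\n{doc}\n\nQuestion: {question}"
        )
        answer = ask_llm(prompt).strip()
        if answer != NO_ANSWER:
            answers.append(answer)
    return answers


def merge_answers(
    question: str, answers: list[str], ask_llm: Callable[[str], str]
) -> str:
    """Merge the per-document answers into one reply."""
    if not answers:
        return "I could not find an answer in the reference documents."
    if len(answers) == 1:
        return answers[0]
    joined = "\n".join(f"- {a}" for a in answers)
    prompt = (
        "Combine these partial answers into one answer."
        f"\n\nQuestion: {question}\n\nPartial answers:\n{joined}"
    )
    return ask_llm(prompt).strip()


def toy_llm(prompt: str) -> str:
    """Offline stand-in for a real LLM call (illustration only)."""
    if prompt.startswith("Combine"):
        parts = [ln[2:] for ln in prompt.splitlines() if ln.startswith("- ")]
        return " ".join(parts)
    document = prompt.split("Document:\n", 1)[1].split("\n\nQuestion:", 1)[0]
    return document if "Maya" in document else NO_ANSWER


# Made-up documents for illustration only.
docs = [
    "Maya 2024.2 is the latest release.",
    "AutoCAD 2024 introduces Smart Blocks.",
    "Maya 2024.2 adds a redesigned Time Slider.",
]
question = "What's the latest release for Maya?"
final = merge_answers(
    question, answer_per_document(question, docs, toy_llm), toy_llm
)
```

Querying documents one at a time costs one LLM call per document plus one for the merge, which is the trade-off for letting each document be read without competing context.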

#### Relative performance examples
 Query: What's the latest release for Maya?
 
 Responses:
 - `RagChatbot`: The latest release for Maya is **not explicitly mentioned** in the provided documents.
 - `RagChatbotMultiRetrieverCombiner`: The latest release for Maya is **not specified** in the provided document.
 - `RagChatMultiQA`: The latest release for Maya is **Maya 2024.2**, which includes various featured enhancements such as Time Slider Redesign, New Graph Editor Sculpting Tools, and Mac M1 Architecture Support.
 - `RagChatSyntheticQ`: The latest release for **Maya is 2024.2**, which includes various enhancements such as Time Slider Redesign, New Graph Editor Sculpting Tools, and Mac M1 Architecture Support.

#### Relative performance examples
 Query: How does Smart Blocks work in AutoCAD 2024?

 Responses:
 - `RagChatbot`: Smart Blocks in AutoCAD 2024 **automatically place blocks** based on previous placements or quickly find and replace existing blocks.
 - `RagChatbotMultiRetrieverCombiner`: Smart Blocks in AutoCAD 2024 **automatically place blocks** based on previous placements or quickly find and replace existing blocks.
 - `RagChatMultiQA`: Smart Blocks in AutoCAD 2024 work by allowing users to **automate tasks** such as comparing drawings, replacing blocks, counting objects, creating schedules, and more.
 - `RagChatSyntheticQ`: Smart Blocks in AutoCAD 2024 work by allowing users to create and customize reusable content, such as blocks, to streamline their workflows and **automate common tasks**. This feature helps to improve productivity and efficiency in drafting and design processes.

### Evaluation

 - Labeling system for evaluating performance
    - Explicit: elicit feedback from chatbot users
    - Implicit: 
      - ad-hoc & business outcomes
        - short-term:
          - software usage metrics
          - follow-up questions highly similar to the initial question
        - long-term:
          - conversions and churn
          - UX surveys
      - predict labels and probabilities with a model trained on the explicit labels
      - AI as judge
        - augment the judge prompt with reference docs
        - use more advanced LLMs as judges
        - few-shot prompting with high-quality examples

 - Applications for labels
   - A/B testing
   - refining chat responses
     - identifying and fixing errors
   - relate labels to similarity measures from vector store queries
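
The AI-as-judge idea, augmented with reference documents and few-shot examples, could be wired up roughly as follows. The prompt wording, the 1 to 5 scale, and the names `build_judge_prompt` and `parse_score` are assumptions made for illustration; the call to the judging LLM itself is left out.

```python
import re
from typing import Optional, Sequence


def build_judge_prompt(
    question: str,
    answer: str,
    reference_docs: Sequence[str],
    few_shot: Sequence[tuple[str, str, int]] = (),
) -> str:
    """Assemble a grading prompt from references and graded examples."""
    lines = [
        "You are grading a chatbot answer against reference documents.",
        "Rate how well the answer is supported by the references,",
        "from 1 (unsupported) to 5 (fully supported).",
        "Reply in the form 'Score: <number>'.",
        "",
    ]
    # Few-shot examples: (question, answer, score) triples graded by hand.
    for ex_question, ex_answer, ex_score in few_shot:
        lines += [
            f"Example question: {ex_question}",
            f"Example answer: {ex_answer}",
            f"Score: {ex_score}",
            "",
        ]
    lines.append("Reference documents:")
    lines += [f"[{i}] {doc}" for i, doc in enumerate(reference_docs, start=1)]
    lines += ["", f"Question: {question}", f"Answer: {answer}"]
    return "\n".join(lines)


def parse_score(judge_reply: str) -> Optional[int]:
    """Extract the 1-5 score from the judge's reply, or None if missing."""
    match = re.search(r"Score:\s*([1-5])\b", judge_reply)
    return int(match.group(1)) if match else None


prompt = build_judge_prompt(
    question="How does Smart Blocks work in AutoCAD 2024?",
    answer="Smart Blocks automatically place blocks.",
    reference_docs=["Made-up reference text about Smart Blocks."],
    few_shot=[("Example Q", "Example A", 5)],
)
```

The parsed scores can then serve as implicit labels alongside explicit user feedback, for example when comparing chatbot variants in an A/B test.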

### Extensions

 - improve retention of HTML structure
 - enhance source attribution
   - ask LLM to include % attribution along with answer
   - use separate LLM for attribution
 - create question and answer database for caching answers
   - track performance to predict answer quality
   - use for smarter retrieval
   - identify bugs
 - memory class for dynamic chat
 - enhance retrieval
   - leverage retrieval similarity scores
   - compress long docs, clusters of similar docs
   - rank retrieved docs based on the question/context
 - add logic for distinguishing between subproducts, e.g., LT and Full
   - include high-quality few-shot examples in the prompt when relevant
   - add instructions for the distinction to the prompt
 - intercept problematic answers
 - tune chatbot for the user and/or the query
   - length, tone, level of technical detail
 - optimize the performance-to-cost ratio for the user and/or query
   - skip the LLM and just retrieve existing question-answer pairs
   - use cheaper LLM depending on the user or query
   - compress and store reference docs for fewer LLM input tokens
   - flexible prompts asking for short or detailed answers to control LLM output tokens
 - create agents that use chain-of-thought (CoT) reasoning for more sophisticated queries
 - create a token budget class for flexibly managing the construction of complex, multi-stage queries
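
The question-and-answer cache, together with the option of skipping the LLM when a close match already exists, can be sketched as below. Everything here is hypothetical: the `QACache` class, the 0.9 threshold, and the use of `difflib.SequenceMatcher` string similarity as a stand-in for embedding similarity from a vector store.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional


class QACache:
    """Store question/answer pairs and reuse answers for near-duplicates."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # ASSUMPTION: would need tuning on real data
        self.pairs: list[tuple[str, str]] = []

    @staticmethod
    def _normalize(question: str) -> str:
        # Lowercase and collapse whitespace so trivial variants match.
        return " ".join(question.lower().split())

    def lookup(self, question: str) -> Optional[str]:
        """Return the best cached answer if it clears the threshold."""
        query = self._normalize(question)
        best_score, best_answer = 0.0, None
        for cached_question, cached_answer in self.pairs:
            score = SequenceMatcher(None, query, cached_question).ratio()
            if score > best_score:
                best_score, best_answer = score, cached_answer
        return best_answer if best_score >= self.threshold else None

    def add(self, question: str, answer: str) -> None:
        self.pairs.append((self._normalize(question), answer))


def answer(
    question: str, cache: QACache, ask_chatbot: Callable[[str], str]
) -> str:
    """Serve from the cache when possible, otherwise call the chatbot."""
    cached = cache.lookup(question)
    if cached is not None:
        return cached
    fresh = ask_chatbot(question)
    cache.add(question, fresh)
    return fresh


calls: list[str] = []


def fake_chatbot(question: str) -> str:
    """Stand-in for the RAG chatbot; records each call it receives."""
    calls.append(question)
    return f"answer #{len(calls)}"


cache = QACache()
first = answer("How does Smart Blocks work in AutoCAD 2024?", cache, fake_chatbot)
second = answer("how does smart blocks work in  AutoCAD 2024?", cache, fake_chatbot)
third = answer("What's the latest release for Maya?", cache, fake_chatbot)
```

The second question differs only in case and spacing, so it is served from the cache and `fake_chatbot` is called just twice in total. Logging which questions hit the cache would also supply data for tracking answer quality over time.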