<p style = 'font-size:20px;font-family:Arial'><b>Introduction:</b></p>

<p style = 'font-size:16px;font-family:Arial'>In the Chat with documentation system using Generative AI demo, the combination of <b>RAG, LangChain, and LLMs</b> allows users to ask queries in layman's terms, retrieve relevant information from the vector store, and generate accurate and concise answers based on the retrieved data. This integration of retrieval-based and generative approaches provides a powerful tool for extracting knowledge from structured sources and delivering user-friendly responses.</p>

<p style = 'font-size:16px;font-family:Arial'>In this demo we build a chatbot using Panel (for the chat UI), LangChain (a powerful library for working with LLMs such as GPT-3.5, GPT-4, and BLOOM), and JumpStart in ClearScape notebooks. The result is a system where users can ask business questions in natural English and receive answers with data drawn from the relevant databases.</p>

<p style = 'font-size:16px;font-family:Arial'>The following diagram illustrates the architecture.</p>

<center><img src="images/header_chat_td.png" alt="architecture" /></center>


<br>
<p style = 'font-size:16px;font-family:Arial'>Before going any further, let's get a better understanding of RAG, LangChain, and LLMs.</p>

<ol style = 'font-size:16px;font-family:Arial'><b><li> Retrieval-Augmented Generation (RAG):</li></b></ol>
<p style = 'font-size:16px;font-family:Arial'> &emsp;  &emsp;RAG is a framework that combines the strengths of retrieval-based and generative approaches in question-answering systems. It uses both a retrieval model and a generative model to produce high-quality answers to user queries. The retrieval model is responsible for retrieving relevant information from a knowledge source, such as a database or a set of documents. The generative model then takes the retrieved information as input and generates concise and accurate answers in natural language.</p>


<p style = 'font-size:16px;font-family:Arial'>A typical RAG (Retrieval-Augmented Generation) application has two main components:</p>

<p style = 'font-size:16px;font-family:Arial'><b>Indexing:</b> a pipeline for ingesting data from a source and indexing it. This usually happens offline. The indexing process involves several steps, including loading the data, splitting it into smaller chunks, and storing and indexing the splits. This is often done using a VectorStore and Embeddings model.</p>
    
<p style = 'font-size:16px;font-family:Arial'><b>Retrieval and generation:</b> the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model. The retrieval process involves searching the index for the most relevant data based on the user query, and then passing that data to the model for generation.</p>

<p style = 'font-size:16px;font-family:Arial'>The most common full sequence from raw data to answer looks like:</p>
<p style = 'font-size:16px;font-family:Arial'><b>Indexing</b></p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li><b>Load:</b> First we need to load our data. We'll use <code>PyPDFLoader</code> for this.</li>
    <li><b>Split:</b> Text splitters break large Documents into smaller chunks. This is useful both for indexing data and for passing it in to a model, since large chunks are harder to search over and won't fit in a model's finite context window. Here, our PDF document will be split into pages.</li>
    <li><b>Store:</b> We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a VectorStore and Embeddings model.</li>
    </ul>

<p style = 'font-size:16px;font-family:Arial'>The following diagram illustrates the architecture of load, split and store.</p>

<center><img src="images/rag_load_store.png" alt="rag indexing architecture" /></center>
<center>image source: <a href="https://python.langchain.com/docs/use_cases/question_answering/">langchain.com</a></center>
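
<p style = 'font-size:16px;font-family:Arial'>The load, split, and store steps above can be sketched in plain Python. This is only an illustrative toy, not the demo's implementation: it assumes the document text is already extracted with form-feed characters between pages (instead of using <code>PyPDFLoader</code>), it uses normalised word counts as a stand-in for a real Embeddings model, and it uses a Python list as a stand-in for the vector store. The names <code>load_pages</code>, <code>embed</code>, and <code>build_index</code> are hypothetical.</p>

```python
import math
import re
from collections import Counter


def load_pages(raw_text):
    """Load + split: one chunk per page, assuming form-feed page separators."""
    return [page.strip() for page in raw_text.split("\f") if page.strip()]


def embed(text):
    """Toy embedding: L2-normalised word counts (NOT a real embedding model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def build_index(pages):
    """Store: keep each chunk next to its vector so it can be searched later."""
    return [
        {"page": number, "text": text, "embedding": embed(text)}
        for number, text in enumerate(pages, start=1)
    ]


# Made-up three-page document, used only for illustration.
raw_document = (
    "Vantage stores tables and supports SQL analytics.\f"
    "Embeddings map text to vectors for similarity search.\f"
    "Panel builds chat user interfaces in notebooks."
)
index = build_index(load_pages(raw_document))
print(len(index), "chunks indexed")
```

<p style = 'font-size:16px;font-family:Arial'>Keeping the chunk text alongside its vector matters because the retrieval step needs to hand the original text, not the vector, to the LLM.</p>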

<p style = 'font-size:16px;font-family:Arial'><b>Retrieval and generation</b></p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li><b>Retrieval:</b> At run time, the user enters a query. We first generate an embedding for it, then pass that embedding to the Vantage database function <b>TD_VectorDistance</b> to retrieve similar documents as context. This context is then fed into the LLM.</li>
    <li><b>Generation:</b> Finally, the model generates an answer based on the retrieved data. The answer is then presented to the user.</li>
    </ul>
    
<p style = 'font-size:16px;font-family:Arial'>The following diagram illustrates the architecture of retrieval and generation.</p>
<center><img src="images/rag_retrieval_generation_td.png" alt="retrieval generation architecture"/></center>
<center>image source: <a href="https://python.langchain.com/docs/use_cases/question_answering/">langchain.com</a></center>
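
<p style = 'font-size:16px;font-family:Arial'>The retrieval and generation steps above can be sketched as follows. This is a minimal sketch under stated assumptions: a plain-Python cosine distance stands in for the in-database <b>TD_VectorDistance</b> call, the toy <code>embed</code> function stands in for a real embedding model, and the sketch stops at building the prompt rather than calling an LLM. All function names here are hypothetical.</p>

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: L2-normalised word counts (NOT a real embedding model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine_distance(a, b):
    """1 - cosine similarity for two unit-length sparse vectors."""
    return 1.0 - sum(value * b.get(word, 0.0) for word, value in a.items())


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks closest to the query (smallest distance first)."""
    query_vector = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine_distance(query_vector, embed(c)))
    return ranked[:top_k]


def build_prompt(query, context_chunks):
    """Assemble the text that would be sent to the LLM for generation."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


# Made-up chunks, used only for illustration.
chunks = [
    "Vantage stores tables and supports SQL analytics.",
    "Embeddings map text to vectors for similarity search.",
    "Panel builds chat user interfaces in notebooks.",
]
question = "How do embeddings support similarity search?"
context = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, context)
print(prompt)
```

<p style = 'font-size:16px;font-family:Arial'>Restricting the prompt to the retrieved context is what grounds the generated answer in the indexed documents rather than in the model's general training data.</p>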

<ol style = 'font-size:16px;font-family:Arial' start="2"><b><li> LangChain:</li></b></ol>
<p style = 'font-size:16px;font-family:Arial'> &emsp;  &emsp; LangChain is a framework that facilitates the integration and chaining of large language models with other tools and sources to build more sophisticated AI applications. LangChain does not serve its own LLMs; instead, it provides a standard way of communicating with a variety of LLMs, including those from OpenAI and HuggingFace. LangChain accelerates the development of AI applications with building blocks. We learn to leverage the following building blocks in this notebook:</p>
 
<ol style = 'font-size:16px;font-family:Arial'>
    <li> <b> LLMs</b> – LangChain's <code>llm</code> class is designed to provide a standard interface for all the LLMs it supports.   </li>
    <li> <b> PromptTemplate</b> – LangChain’s <code>PromptTemplate</code> class provides predefined structures for generating prompts for LLMs. Templates can be reused across different LLMs.</li>
    <li> <b> Chains</b> – When we build complex AI applications, we may need to combine multiple calls to LLMs and to other components. LangChain’s <code>chain</code> class allows us to link calls to LLMs and components. The most common type of chaining in any LLM application is combining a prompt template with an LLM and optionally an output parser. </li>
</ol>
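
<p style = 'font-size:16px;font-family:Arial'>To make the chaining idea concrete, here is a concept sketch in plain Python. It deliberately does not use the LangChain API: <code>string.Template</code> stands in for a prompt template, <code>fake_llm</code> is a hypothetical stand-in for a model call, and <code>make_chain</code> is a hypothetical helper that links prompt, model, and output parser in the order described above.</p>

```python
from string import Template

# Reusable prompt structure with named placeholders.
prompt_template = Template(
    "Answer using only this context:\n$context\n\nQuestion: $question"
)


def fake_llm(prompt):
    """Hypothetical stand-in for an LLM call; echoes the last prompt line."""
    return "  ANSWER -> " + prompt.splitlines()[-1] + "  "


def strip_parser(text):
    """Output parser: trim surrounding whitespace from the raw model output."""
    return text.strip()


def make_chain(template, llm, parser):
    """Link prompt template -> LLM -> output parser into one callable."""
    def run(**variables):
        prompt = template.substitute(**variables)
        return parser(llm(prompt))
    return run


chain = make_chain(prompt_template, fake_llm, strip_parser)
result = chain(context="Panel builds chat UIs.", question="What does Panel do?")
print(result)
```

<p style = 'font-size:16px;font-family:Arial'>Because each stage is just a callable, any one of them (the template, the model, or the parser) can be swapped without touching the others, which is the main benefit of composing applications this way.</p>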

<ol style = 'font-size:16px;font-family:Arial' start="3"><b><li> LLM Models (Large Language Models):</li></b></ol>

<p style = 'font-size:16px;font-family:Arial'> &emsp;  &emsp; Large language models (LLMs) are large-scale language models trained on vast amounts of text data.
Models such as GPT-3 (Generative Pre-trained Transformer 3), GPT-3.5, GPT-4, HuggingFace BLOOM, LLaMA, and Google's FLAN-T5 are capable of generating human-like text responses. LLMs are pre-trained on diverse sources of text data, enabling them to learn patterns, grammar, and context from a wide range of topics. They can be fine-tuned for specific tasks, such as question-answering, natural language understanding, and text generation.
LLMs have achieved impressive results in various natural language processing tasks and are widely used in AI applications.</p>