In [21]:
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

## Phase 1: Indexing

### Step 1: Load the data

In [37]:
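import os
from dotenv import load_dotenv, find_dotenv

# Read KEY=VALUE pairs from ./RAG/.env into the process environment.
_ = load_dotenv(dotenv_path='./RAG/.env')
# os.environ[...] raises KeyError if a key is missing from both the .env file and the shell.
openai_api_key = os.environ["OPENAI_API_KEY"]
groq_api_key = os.environ['GROQ_API_KEY']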

In [39]:
os.getenv('LANGCHAIN_PROJECT')

'llm bootcamp'

In [23]:
# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()
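The `SoupStrainer` above tells the HTML parser to keep only elements whose class is `post-title`, `post-header`, or `post-content`. To make that filtering idea concrete, here is a minimal standard-library sketch using `html.parser`. It assumes well-formed HTML, it is not how bs4 or `WebBaseLoader` work internally, and `ClassFilterParser` and `extract_text` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassFilterParser(HTMLParser):
    """Collect text only from inside elements that carry one of the wanted classes."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.depth = 0   # > 0 while we are inside a matching element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        # Enter a kept region, or go one level deeper inside one.
        if self.depth > 0 or classes & self.wanted:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that appears inside a kept region.
        if self.depth > 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html, wanted_classes):
    parser = ClassFilterParser(wanted_classes)
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page = ('<nav>Menu</nav><h1 class="post-title">LLM Agents</h1>'
        '<div class="post-content"><p>Agents use <b>tools</b></p><br></div>'
        '<footer>Footer</footer>')
print(extract_text(page, ("post-title", "post-header", "post-content")))
```

Navigation and footer text are dropped because they sit outside every element with a wanted class.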


In [24]:
print(docs[0].page_content[:500])



      LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In


#### Additional info on Loading Data
* [Documentation on Document Loaders](https://python.langchain.com/docs/modules/data_connection/document_loaders/). Interesting points:
    * PDF. Check the Multimodal LLM Apps section.
    * CSV.
    * JSON.
    * HTML.
* [Integrations with 3rd Party Document Loaders](https://python.langchain.com/docs/integrations/document_loaders/). Interesting points:
    * Email.
    * GitHub.
    * Google Drive.
    * Hugging Face dataset.
    * Images.
    * Jupyter Notebook.
    * MS Excel.
    * MS Powerpoint.
    * MS Word.
    * MongoDB.
    * Pandas DataFrame.
    * Twitter.
    * URL.
    * Unstructured file.
    * WebBaseLoader.
    * YouTube audio.
    * YouTube transcription.

### Step 2: Split the data into small chunks

In [25]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)
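`RecursiveCharacterTextSplitter` tries a list of separators in order (by default paragraph breaks, then newlines, then spaces, then single characters) so that each chunk stays under `chunk_size`, and neighbouring chunks share up to `chunk_overlap` characters. The sketch below shows only the size, overlap, and start-index bookkeeping using plain fixed-width windows. It deliberately skips the separator-aware recursion, and `split_with_overlap` is a hypothetical helper, not a LangChain function.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Return (start_index, chunk) pairs of fixed-width, overlapping windows.

    Simplified: cuts at exact character offsets instead of at separators.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap   # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):   # this window already reaches the end
            break
    return chunks


for start, chunk in split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1):
    print(start, chunk)
```

The `start` value plays the same role as the `start_index` metadata that `add_start_index=True` attaches to each split.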

### Step 3: Transform the chunks into embeddings and store them in a vector store

In [26]:
vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())
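`OpenAIEmbeddings` sends each chunk to an OpenAI embedding model and gets back a dense vector, which Chroma stores alongside the chunk text. To show the text-to-vector idea without an API key, here is a toy bag-of-words embedding in plain Python. It is a simplifying assumption: real embedding models learn vectors that capture meaning, whereas this one only counts shared words, and `build_vocab` and `embed` are hypothetical names.

```python
import math


def build_vocab(texts):
    """Assign each distinct lowercase word an index (its vector dimension)."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def embed(text, vocab):
    """Count known words, then L2-normalize so a dot product equals cosine similarity."""
    vec = [0.0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


chunks = ["task decomposition breaks tasks into steps", "banana bread recipe"]
vocab = build_vocab(chunks)
# A tiny in-memory "vector store": (vector, original text) pairs.
store = [(embed(c, vocab), c) for c in chunks]

query = embed("what is task decomposition", vocab)
for vec, text in store:
    print(round(sum(q * v for q, v in zip(query, vec)), 3), text)
```

The chunk that shares words with the query gets a positive score, while the unrelated chunk scores zero.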

#### Additional info on Embeddings
* [Documentation on Embeddings](https://python.langchain.com/docs/modules/data_connection/text_embedding).
* [Integration with 3rd Party Embedding Models](https://python.langchain.com/docs/integrations/text_embedding/). Interesting points:
    * OpenAI.
    * Cohere.
    * Fireworks.
    * Google Vertex AI.
    * Hugging Face.
    * Llama.cpp.
    * MistralAI.
    * Ollama.

#### Additional info on Vector Databases (also called Vector Stores)
* [Documentation on Vector Databases](https://python.langchain.com/docs/modules/data_connection/vectorstores/).
* [Integration with 3rd Party Vector Databases](https://python.langchain.com/docs/integrations/vectorstores/). Interesting points:
    * Chroma.
    * Faiss.
    * Postgres.
    * PGVector.
    * Databricks.
    * Pinecone.
    * Redis.
    * Supabase.
    * LanceDB.
    * MongoDB.
    * Neo4j.
    * Activeloop DeepLake. 

## Phase 2: Retrieval

* We are going to create a RAG app that
    * takes a user question,
    * searches for documents relevant to that question,
    * passes the retrieved documents and initial question to a model,
    * and returns an answer.
* First, we need to define the logic for searching over documents. LangChain defines a `Retriever` interface that wraps an index and returns relevant `Document`s for a string query.
* The most common type of retriever is the `VectorStoreRetriever`, which uses the similarity search capabilities of a vector store to facilitate retrieval. Any `VectorStore` can be turned into a retriever with `VectorStore.as_retriever()`.

#### Set the retriever
A "retriever" refers to a component or object that is used to retrieve (or search for) specific information from a database or a collection of data. The retriever is designed to work with a vector database that stores data in the form of embeddings.

Embeddings are numerical representations of data (in this case, text data split into smaller chunks) that capture the semantic meaning of the original content. These embeddings are created using models like the ones provided by OpenAI (for example, through `OpenAIEmbeddings`), which convert textual content into high-dimensional vectors that numerically represent the semantic content of the text.

The vector database, in this context represented by `Chroma`, is a specialized database that stores these embeddings. The primary function of this database is to enable efficient similarity searches. That is, given a query in the form of an embedding, the database can quickly find and return the most similar embeddings (and by extension, the corresponding chunks of text or documents) stored within it.

When the code calls `vectorstore.as_retriever()`, it creates a retriever object that can perform similarity-based searches within the `Chroma` vector database, so it can find the most relevant pieces of information (text chunks, in this scenario) for a query. In the cell below, `search_type="similarity"` selects plain similarity search and `search_kwargs={"k": 6}` asks for the six most similar chunks per query. This is particularly useful in applications that need to retrieve information based on semantic similarity rather than exact keyword matches.
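Under the hood, a similarity retriever embeds the query, scores every stored vector against it, and returns the best `k` chunks. The plain-Python sketch below uses cosine similarity as the score and a list of `(vector, text)` pairs as the store. Both are assumptions for illustration (the distance function Chroma actually uses depends on how its collection is configured), and `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a, b):
    """Cosine similarity: 1.0 for the same direction, 0.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, store, k=6):
    """Return the texts of the k stored (vector, text) pairs most similar to query_vec."""
    ranked = sorted(store, key=lambda row: cosine(query_vec, row[0]), reverse=True)
    return [text for _, text in ranked[:k]]


store = [
    ([1.0, 0.0], "Agents plan by decomposing tasks."),
    ([0.0, 1.0], "Bake the bread for 40 minutes."),
    ([0.9, 0.1], "Chain of Thought prompts step-by-step reasoning."),
]
print(top_k([1.0, 0.0], store, k=2))
```

The default `k=6` mirrors the `search_kwargs` used in the next cell.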

In [27]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})

#### Additional information on Retrievers.
* [Documentation on Retrievers](https://python.langchain.com/docs/modules/data_connection/retrievers/). Interesting points:
    * Vectorstore.
    * Time-weighted vector store: combines semantic similarity with a recency bonus, so recently accessed documents rank higher.
    * Advanced Retrievers. 
* [Integrations with 3rd Party Retrieval Services](https://python.langchain.com/docs/integrations/retrievers/).
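The time-weighted retriever listed above ranks documents by semantic similarity plus a recency bonus. LangChain's documentation describes the score as `semantic_similarity + (1.0 - decay_rate) ** hours_passed`, where `hours_passed` counts hours since the document was last accessed. The function below is a standalone sketch of just that formula; the name `time_weighted_score` and the `decay_rate=0.01` value are our own choices, and this is not the retriever itself.

```python
def time_weighted_score(similarity, hours_passed, decay_rate=0.01):
    """Semantic similarity plus a recency bonus that decays toward 0 over time."""
    return similarity + (1.0 - decay_rate) ** hours_passed


# A less similar but recently used chunk can outrank an older, more similar one.
fresh = time_weighted_score(similarity=0.6, hours_passed=1)
stale = time_weighted_score(similarity=0.9, hours_passed=1000)
print(round(fresh, 3), round(stale, 3))
```

A larger `decay_rate` makes old documents lose their bonus faster.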

## Phase 3: Generation

In [28]:
llm = ChatOpenAI(model='gpt-3.5-turbo-0125', temperature=0)

#### Set the prompt

In [29]:
prompt = hub.pull('rlm/rag-prompt')
prompt

ChatPromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], input_types={}, partial_variables={}, template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"), additional_kwargs={})])
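Conceptually, filling this prompt is string substitution: the `{context}` and `{question}` placeholders are replaced with the formatted chunks and the user's question. The sketch below copies the template text shown in the output above and fills it with Python's `str.format`. It does not use LangChain, and `render_prompt` is a hypothetical helper.

```python
# Template text copied from the rlm/rag-prompt output above.
RAG_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \nContext: {context} \nAnswer:"
)


def render_prompt(question, chunks):
    """Join chunks the way format_docs does, then substitute both placeholders."""
    context = "\n\n".join(chunks)
    return RAG_TEMPLATE.format(question=question, context=context)


print(render_prompt("What is Task Decomposition?",
                    ["Task decomposition splits a task into steps.",
                     "Chain of Thought asks the model to think step by step."]))
```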

#### Define the RAG chain

In [30]:
# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [31]:
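# The dict is coerced into a RunnableParallel: "context" runs the retriever and
# formats the chunks, while "question" passes the user input through unchanged.
rag_chain = (
    {'context': retriever | format_docs, 'question': RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)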

* Using the LCEL (LangChain Expression Language) Runnable protocol to define the chain allows us to:
    * pipe together components and functions in a transparent way.
    * automatically trace our chain in LangSmith.
    * get streaming, async, and batched calling out of the box.

### Runnables
A "Runnable" is like a tool or a small machine on a factory line that has one specific job. You can think of it as a mini-program that knows how to do one particular task, such as reading data, transforming it, or making decisions based on it.

In LangChain, a Runnable is a component that can be executed, or "run". Every Runnable exposes the same standard methods, such as `invoke`, `stream`, and `batch`, so it acts as an interchangeable building block that performs a specific operation or processes data in a particular way.

The chaining or composition of Runnables allows you to build a sequence where data flows from one Runnable to another, each performing its function, ultimately leading to a final result.

In essence, a Runnable makes it easy to define and reuse modular operations that can be combined in various ways to build complex data processing and interaction pipelines efficiently.
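To see how composition with `|` works, here is a toy class that mimics only two pieces of the Runnable idea: an `invoke` method and the pipe operator, which feeds one step's output into the next. `Step` is a hypothetical name; LangChain's real `Runnable` interface is much richer (streaming, batching, async, tracing).

```python
class Step:
    """Toy stand-in for a Runnable: wraps a function and supports `|` chaining."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # Plain functions are wrapped automatically, much as `retriever | format_docs`
        # turns the plain function format_docs into a pipeline step.
        nxt = other if isinstance(other, Step) else Step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))


pipeline = Step(str.strip) | str.lower | (lambda s: s.replace(" ", "-"))
print(pipeline.invoke("  Task Decomposition  "))
```

Each `|` returns a new `Step`, so a long chain is itself just one more step that can be composed further.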

#### RunnableParallel
RunnableParallel is like a manager that handles several tasks at once. It runs each of its branches on the same input and returns a dictionary that maps each branch name to that branch's output. Because the branches are independent, LangChain can run them concurrently, which saves time when they wait on I/O such as API calls.

RunnableParallel is used to coordinate components that retrieve information, pass questions through, or format data, and then collects all of their outputs in one place. In the `rag_chain` above, the dictionary `{'context': ..., 'question': ...}` is automatically converted into a RunnableParallel when it is piped into `prompt`.
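A minimal sketch of the RunnableParallel idea in plain Python: run several functions on the same input concurrently and collect the results in a dict keyed by branch name. The thread pool and the name `run_parallel` are assumptions made for this illustration, not LangChain internals.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(branches, value):
    """Run every branch function on the same input and return {name: output}."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, value) for name, fn in branches.items()}
        return {name: future.result() for name, future in futures.items()}


result = run_parallel({"upper": str.upper, "length": len, "words": str.split},
                      "what is task decomposition")
print(result)
```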

#### RunnablePassthrough
RunnablePassthrough is a simple tool that lets data pass through it without making any changes. You can think of it like a clear tube: whatever you send into one end comes out the other end unchanged.

When working with data transformations and pipelines, RunnablePassthrough is often used alongside other tools that might modify the data. For example, when multiple operations are run in parallel, you might want to let some data flow through unaltered while other data gets modified or processed differently.

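RunnablePassthrough is often used in combination with RunnableParallel. In `rag_chain`, for instance, the `question` branch is a `RunnablePassthrough()`, so the user's question reaches the prompt unchanged while the `context` branch turns that same question into the retrieved, formatted chunks.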
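The same pattern in plain Python: one branch transforms the input (standing in for `retriever | format_docs`), while the other returns it untouched (standing in for `RunnablePassthrough()`). `fake_retrieve`, `passthrough`, and `build_prompt_inputs` are hypothetical helpers, and the retriever returns canned text instead of querying a vector store.

```python
def passthrough(value):
    """Like RunnablePassthrough: hand the input on unchanged."""
    return value


def fake_retrieve(question):
    # Hypothetical retriever: canned chunks instead of a vector-store lookup.
    return ["Task decomposition splits a big task into smaller steps.",
            "Chain of Thought prompts the model to reason step by step."]


def build_prompt_inputs(question):
    """Mimic {'context': retriever | format_docs, 'question': RunnablePassthrough()}."""
    branches = {
        "context": lambda q: "\n\n".join(fake_retrieve(q)),
        "question": passthrough,
    }
    return {name: fn(question) for name, fn in branches.items()}


inputs = build_prompt_inputs("What is Task Decomposition?")
print(inputs["question"])
print(inputs["context"])
```

The resulting dict has exactly the two keys the prompt template expects.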

#### Now we can start asking our RAG application questions

* We will use streaming, so the answer appears piece by piece, as it does in ChatGPT, instead of being printed all at once.

In [32]:
for chunk in rag_chain.stream("What is Task Decomposition?"):
    print(chunk, end="", flush=True)

Task Decomposition is a technique that breaks down complex tasks into smaller and simpler steps. It helps agents manage and interpret the thinking process by transforming big tasks into manageable ones. This process can be done through prompting techniques like Chain of Thought or Tree of Thoughts.
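With `StrOutputParser` at the end, `rag_chain.stream(...)` yields the answer as a sequence of string chunks, and `print(..., end="", flush=True)` writes each one as soon as it arrives. The generator below imitates that consumer loop without calling a model. `fake_stream` is hypothetical, and it splits on words, whereas a real model streams whatever token-sized pieces it produces.

```python
import time


def fake_stream(answer, delay=0.0):
    """Yield the answer word by word, the way a streaming chain yields chunks."""
    for i, word in enumerate(answer.split(" ")):
        if delay:
            time.sleep(delay)   # simulate waiting for the next chunk
        yield word if i == 0 else " " + word


for chunk in fake_stream("Task Decomposition breaks big tasks into smaller steps.", delay=0.05):
    print(chunk, end="", flush=True)
print()
```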

#### Alternative approach with a customized prompt and without streaming.

In [33]:
from langchain_core.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}

Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Task Decomposition?")

'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps, making them more manageable for agents or models to handle. It can be achieved through methods like Chain of Thought or Tree of Thoughts, which explore multiple reasoning possibilities at each step. Thanks for asking!'

#### Additional Information on Chat Models and LLM Models
* [Documentation on Chat Models](https://python.langchain.com/docs/modules/model_io/chat/). Interesting points:
    * How-To Guides.
    * Function calling.
    * Streaming. 
* [Integration with 3rd Party Chat Models](https://python.langchain.com/docs/integrations/chat/). Interesting points:
    * OpenAI.
    * Anthropic.
    * Azure OpenAI.
    * Cohere.
    * Fireworks.
    * Google AI.
    * Hugging Face.
    * Llama 2 Chat.
    * MistralAI.
    * Ollama.
* [Documentation on LLM Models](https://python.langchain.com/docs/modules/model_io/llms). Interesting points:
    * How-To Guides.
    * Streaming.
    * Tracking token usage. 
* [Integrations with 3rd Party LLM Models](https://python.langchain.com/docs/integrations/llms). Interesting points:
    * OpenAI.
    * Anthropic.
    * Azure OpenAI.
    * Cohere.
    * Fireworks.
    * Google AI.
    * GPT4All.
    * Hugging Face.
    * Llama.cpp.
    * Ollama.
    * Replicate.