<td>
   <a target="_blank" href="https://www.clarifai.com/" ><img src="https://upload.wikimedia.org/wikipedia/commons/b/bc/Clarifai_Logo_FC_Web.png" width=256/></a>
</td>

<td>
<a href="https://colab.research.google.com/github/Clarifai/examples/blob/main/Integrations/Langchain/Agents/Doc-retrieve_using_Langchain-ReAct_Agent.ipynb" target="_blank"><img
src="https://colab.research.google.com/assets/colab-badge.svg" alt="Colab"></a>
</td>

# Clarifai Doc Retrieval using the ReAct Docstore Agent

This notebook walks through building a documentation Q&A system with the Clarifai vector store and LangChain's ReAct Docstore agent, using web-scraped pages from the Clarifai docs. The resulting agent lets users retrieve information from the Clarifai documentation.


The steps are as follows:

- Web scraping the Clarifai docs website.
- Processing the docs and storing them in the Clarifai vector store.
- Building a ReAct agent that searches the Clarifai vector store.
- Using the agent to answer user queries about Clarifai.

## Agents

The core idea of agents is to use a language model to choose a sequence of actions to take. In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.
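To make this concrete, here is a minimal, self-contained sketch of the ReAct-style loop used later in this notebook: the model emits a Thought and an Action, the matching tool runs, and its Observation is appended to the transcript before the model is called again. `scripted_llm` and `fake_docs` are hypothetical stand-ins for a real LLM and document store, and the `Action: Name[argument]` format follows the ReAct prompt used in this notebook. LangChain's agent executor adds stop sequences, output parsers and error handling on top of this basic idea.

```python
import re
from typing import Callable, Dict

# Matches lines such as "Action: Search[query]" or "Action: Finish[answer]".
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]", re.DOTALL)


def run_agent(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Minimal ReAct-style loop: the model picks an action, we run it, and
    the observation is appended to the transcript for the next step."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)            # model output: a Thought and an Action
        transcript += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            raise ValueError(f"Could not parse an action from: {step!r}")
        action, arg = match.group(1), match.group(2)
        if action == "Finish":
            return arg                    # the model decided it has the answer
        observation = tools[action](arg)  # run the chosen tool
        transcript += f"Observation: {observation}\n"
    raise RuntimeError("Agent did not finish within max_steps")


# Hypothetical scripted "LLM": it searches once, then finishes.
def scripted_llm(transcript: str) -> str:
    if "Observation:" not in transcript:
        return "Thought: I should search the docs.\nAction: Search[install module]"
    return "Thought: The observation answers it.\nAction: Finish[Install it from the Modules page.]"


fake_docs = {"install module": "Modules are installed from the app's Modules page."}
answer = run_agent(
    scripted_llm,
    {"Search": lambda q: fake_docs.get(q, "No results")},
    "How do I install a module?",
)
print(answer)  # prints: Install it from the Modules page.
```

The real agent differs mainly in that the LLM, not a script, decides which action to take next based on the accumulated observations.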

To learn more about agents: https://python.langchain.com/docs/modules/agents/


### Setup

In [None]:
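!pip install -U langchain
!pip install clarifai
# BeautifulSoup is used for scraping; lxml is required by HTMLHeaderTextSplitter
!pip install beautifulsoup4 lxml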

Set your Clarifai Personal Access Token (PAT) as an environment variable.

In [None]:
# You can also pass the PAT directly as an argument when creating the Clarifai classes, instead of setting it as an environment variable.
import os
os.environ["CLARIFAI_PAT"]="YOUR_CLARIFAI_PAT"

*Note: Guide to get your [PAT](https://docs.clarifai.com/clarifai-basics/authentication/personal-access-tokens)*
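Since the PAT can come either from an argument or from the environment, a small helper can make the precedence explicit and fail early if the placeholder above was never replaced. `resolve_pat` is a hypothetical helper written for this sketch; it is not part of the Clarifai SDK or LangChain.

```python
import os
from typing import Optional


def resolve_pat(pat: Optional[str] = None) -> str:
    """Prefer an explicitly passed PAT; otherwise read CLARIFAI_PAT from the environment."""
    token = pat or os.environ.get("CLARIFAI_PAT")
    # Fail early if the token is missing or is still the notebook placeholder
    if not token or token == "YOUR_CLARIFAI_PAT":
        raise RuntimeError("Set CLARIFAI_PAT or pass a PAT explicitly.")
    return token
```

Assuming your LangChain version exposes a `pat` argument on the Clarifai classes, the result could be passed as, for example, `Clarifaillm(model_url=MODEL_URL, pat=resolve_pat())`.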

### Web Scraping

Extracting docs from https://docs.clarifai.com/ using BeautifulSoup.

**Note: For demo purposes, only some pages of the website (the Portal Guide) are stored.**

In [None]:
# Collect the URLs of the Portal Guide pages on https://docs.clarifai.com
import requests
from bs4 import BeautifulSoup
import re

base_url = 'https://docs.clarifai.com'
reqs = requests.get(base_url + '/')
soup = BeautifulSoup(reqs.text, 'html.parser')

urls = []
# Top-level links into the Portal Guide (hrefs starting with "/portal")
for link in soup.find_all('a', attrs={'href': re.compile("^/portal")}):
    portal_url = base_url + link.get('href')
    sub_reqs = requests.get(portal_url)
    soup_1 = BeautifulSoup(sub_reqs.text, 'html.parser')
    # Second-to-last path segment: the section name when the URL ends with '/'
    re_match = portal_url.split('/')[-2]
    for sublink in soup_1.find_all('a', attrs={'href': re.compile("^/portal-guide/" + re.escape(re_match))}):
        full_url = base_url + sublink.get('href')
        # Skip duplicates: the same page is often linked several times
        if full_url not in urls:
            urls.append(full_url)
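Because the cell above needs network access, here is an offline sketch of the same link-filtering idea using only the standard library's `html.parser`: keep `<a>` tags whose `href` starts with `/portal-guide/<section>`, turn them into absolute URLs, and drop duplicates. `HrefCollector`, `section_links` and the sample HTML are hypothetical, made up for illustration; the real Clarifai pages may be structured differently.

```python
import re
from html.parser import HTMLParser
from typing import List

BASE = "https://docs.clarifai.com"


class HrefCollector(HTMLParser):
    """Collects href values of <a> tags whose href matches a regex."""

    def __init__(self, pattern: str) -> None:
        super().__init__()
        self.pattern = re.compile(pattern)
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.pattern.match(href):
            self.hrefs.append(href)


def section_links(page_url: str, html: str) -> List[str]:
    """Absolute URLs of the sub-pages of the section that page_url points to."""
    section = page_url.rstrip("/").split("/")[-1]  # ".../portal-guide/data/" -> "data"
    collector = HrefCollector(r"^/portal-guide/" + re.escape(section))
    collector.feed(html)
    # dict.fromkeys keeps first-seen order while dropping duplicates
    return list(dict.fromkeys(BASE + h for h in collector.hrefs))


sample_html = """
<a href="/portal-guide/data/">Data</a>
<a href="/portal-guide/data/upload">Upload</a>
<a href="/portal-guide/data/upload">Upload (again)</a>
<a href="/portal-guide/models/">Models</a>
<a href="https://example.com/portal-guide/data/x">External</a>
"""
links = section_links("https://docs.clarifai.com/portal-guide/data/", sample_html)
print(links)
```

Using `rstrip("/")` before splitting makes the section lookup work whether or not the URL has a trailing slash.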

### Using LangChain's HTMLHeaderTextSplitter to split the docs by header

In [None]:
# Split the docs on their headers: each resulting doc holds the content under one header
from langchain.document_loaders import AsyncHtmlLoader
from langchain.text_splitter import HTMLHeaderTextSplitter


headers_to_split_on = [
    ("h1", "Header 1"),
    ("h2", "Header 2"),
    ("h3", "Header 3"),
]

def parse_website(urls, min_chars=300):
    final_docs = []
    # Fetch all pages asynchronously
    loader = AsyncHtmlLoader(urls)
    docs = loader.load()
    html_splitter = HTMLHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
    # Loop over the fetched pages
    for doc in docs:
        try:
            html_header_splits = html_splitter.split_text(doc.page_content)
        except Exception as e:
            # Skip pages the splitter cannot parse instead of failing the whole run
            print(f"Skipping {doc.metadata.get('source')}: {e}")
            continue
        for header_doc in html_header_splits:
            # Keep only chunks that sit under a header and have enough text to be useful
            if header_doc.metadata and len(header_doc.page_content) > min_chars:
                # Add the page metadata (e.g. the source URL) to the chunk's header metadata
                header_doc.metadata.update(doc.metadata)
                final_docs.append(header_doc)
    return final_docs
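`HTMLHeaderTextSplitter` does the header-based splitting for us, but the underlying idea can be sketched with the standard library: walk the HTML, track the currently active `h1`/`h2`/`h3` headers, and start a new chunk whenever a header appears, tagging each chunk with those headers as metadata. This is a simplified illustration, not LangChain's implementation; `HeaderSplitter`, `split_by_headers` and the sample HTML are hypothetical. The final filter mirrors the one in `parse_website` above.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple

# Same mapping as headers_to_split_on above
HEADERS = {"h1": "Header 1", "h2": "Header 2", "h3": "Header 3"}
LEVELS = ["h1", "h2", "h3"]


class HeaderSplitter(HTMLParser):
    """Groups body text into chunks tagged with the h1/h2/h3 headers above it."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}     # currently active headers
        self.chunks: List[Tuple[Dict[str, str], str]] = []
        self.header_tag = None             # header tag being read, if any
        self.header_text: List[str] = []
        self.body: List[str] = []          # text of the open chunk

    def _flush(self) -> None:
        text = " ".join(self.body).strip()
        if text:
            self.chunks.append((dict(self.meta), text))
        self.body = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADERS:
            self._flush()                  # a new header closes the previous chunk
            self.header_tag, self.header_text = tag, []

    def handle_endtag(self, tag):
        if tag == self.header_tag:
            # A header replaces its own level and clears every deeper level
            for deeper in LEVELS[LEVELS.index(tag):]:
                self.meta.pop(HEADERS[deeper], None)
            self.meta[HEADERS[tag]] = " ".join(self.header_text).strip()
            self.header_tag = None

    def handle_data(self, data):
        if data.strip():
            target = self.header_text if self.header_tag else self.body
            target.append(data.strip())

    def close(self):
        super().close()
        self._flush()                      # emit the last open chunk


def split_by_headers(html: str, min_chars: int = 0) -> List[Tuple[Dict[str, str], str]]:
    parser = HeaderSplitter()
    parser.feed(html)
    parser.close()
    # Keep chunks that have header metadata and more than min_chars characters
    return [(m, t) for m, t in parser.chunks if m and len(t) > min_chars]


sample = ("<p>intro</p><h1>Models</h1><p>All about models.</p>"
          "<h2>Predict</h2><p>Call predict.</p><h2>Train</h2><p>Call train.</p>")
chunks = split_by_headers(sample)
for meta, text in chunks:
    print(meta, "->", text)
```

Text that appears before any header (`intro` above) has empty metadata and is dropped, which matches the `header_doc.metadata` check in `parse_website`.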

In [None]:
#parsing the docs
parsed_docs = parse_website(urls)

## Uploading the Docs to Clarifai VectorStore

- Every Clarifai app has a built-in vector store.
- When inputs are uploaded to the app, Clarifai converts them into embeddings and stores them in the vector store.

To learn more about the Clarifai vector store: https://python.langchain.com/docs/integrations/vectorstores/clarifai
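Clarifai computes the embeddings with a learned model on its side. The sketch below uses a toy bag-of-words vector instead, purely to illustrate the two steps involved: embedding each input once at upload time, then ranking stored inputs by cosine similarity to the query and returning the top `k`. `ToyVectorStore` and `embed` are hypothetical; `k` plays roughly the role of `number_of_docs` in the `from_documents` call below.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-count vector (stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self) -> None:
        self.items: List[Tuple[str, Counter]] = []

    def add_texts(self, texts: List[str]) -> None:
        # Like uploading inputs to an app: each text is embedded once, at ingest time
        self.items.extend((t, embed(t)) for t in texts)

    def similarity_search(self, query: str, k: int = 1) -> List[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add_texts([
    "Upload inputs to your app",
    "Train a custom model",
    "Create a workflow from model nodes",
])
print(store.similarity_search("how to train a model", k=1))  # ['Train a custom model']
```

A learned embedding model captures meaning rather than exact word overlap, so real results can match paraphrases that this toy version would miss.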

In [None]:
#importing Clarifai Vectorstore from langchain
from langchain.vectorstores import Clarifai as Clarifaivectorstore

In [None]:
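# Replace the placeholders with your own Clarifai user ID and app ID
clarifai_vector_db = Clarifaivectorstore.from_documents(
    user_id="user_id",
    app_id="app_id",
    documents=parsed_docs,
    number_of_docs=1,  # number of documents returned per similarity search
)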

## Retriever Function (Custom Search Function for the Docstore)

Refer: https://python.langchain.com/docs/modules/agents/agent_types/react_docstore

In [None]:
from langchain.llms import Clarifai as Clarifaillm
from langchain.retrievers.multi_query import MultiQueryRetriever

You can choose from several language models on the [Clarifai](https://clarifai.com/explore/models?filterData=%5B%7B%22field%22%3A%22use_cases%22%2C%22value%22%3A%5B%22llm%22%5D%7D%5D&page=1&perPage=24) platform.

Use a Clarifai-hosted LLM for the retriever:

In [None]:
#Model URL from Clarifai Community
MODEL_URL = "https://clarifai.com/openai/chat-completion/models/GPT-4"

llm = Clarifaillm(model_url=MODEL_URL)

## MultiQueryRetriever

- The MultiQueryRetriever automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query.
- By generating multiple perspectives on the same question, the MultiQueryRetriever might be able to overcome some of the limitations of the distance-based retrieval and get a richer set of results.

To learn more about the MultiQueryRetriever: https://python.langchain.com/docs/modules/data_connection/retrievers/MultiQueryRetriever
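The core of the multi-query approach is simple once the query variants exist: retrieve for every variant and merge the results, keeping each document only once. In LangChain the variants come from an LLM prompt; in this sketch `fake_generate` and `fake_index` are hypothetical stubs, so the merging logic can be seen in isolation.

```python
from typing import Callable, Dict, List


def multi_query_retrieve(
    question: str,
    generate_queries: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    """Retrieve for every query variant and return the de-duplicated union."""
    results: List[str] = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in results:  # keep only the first occurrence
                results.append(doc)
    return results


# Hypothetical stubs: an "LLM" that rephrases the question, and a tiny index.
def fake_generate(question: str) -> List[str]:
    return ["model prediction", "predict with a model"]


fake_index: Dict[str, List[str]] = {
    "model prediction": ["doc A", "doc B"],
    "predict with a model": ["doc B", "doc C"],
}

docs = multi_query_retrieve("How do I predict?", fake_generate, lambda q: fake_index.get(q, []))
print(docs)  # ['doc A', 'doc B', 'doc C']
```

`doc B` is found by both variants but appears once; `doc C` is only reachable through the second phrasing, which is the kind of extra recall the multi-query approach aims for.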

In [None]:
retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=clarifai_vector_db.as_retriever(), llm=llm
)

### Custom Lookup Function for the ReAct Agent

- This function retrieves the documents most relevant to the given query, using the MultiQueryRetriever defined above, and returns the text of the first one.

In [None]:
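def doc_lookup(search_query):
    # Retrieve documents for all LLM-generated variants of the query
    unique_docs = retriever_from_llm.get_relevant_documents(query=search_query)
    # Guard against an empty result instead of raising IndexError
    if not unique_docs:
        return "No results found."
    # Return the text of the first retrieved document
    return unique_docs[0].page_content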

In [None]:
lookup_function = doc_lookup

## ReAct Docstore Agent

This agent uses the ReAct framework to interact with a docstore. It follows the original [ReAct paper](https://arxiv.org/pdf/2210.03629.pdf), specifically its Wikipedia example.

https://python.langchain.com/docs/modules/agents/agent_types/react_docstore

In [None]:
# Import the necessary libraries
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.agents.react.base import DocstoreExplorer
from langchain.docstore import DocstoreFn
from langchain.llms import Clarifai as Clarifaillm

### Initializing the tools for the agent

To learn more about tools: https://python.langchain.com/docs/modules/agents/tools/

In [None]:
docstore = DocstoreExplorer(DocstoreFn(lookup_fn=lookup_function))
tools = [
    Tool(
        name="Search",
        func=docstore.search,
        description="useful for when you need to ask with search",
    ),
    Tool(
        name="Lookup",
        func=docstore.lookup,
        description="useful for when you need to ask with lookup",
    ),
]
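To make the two tools concrete, here is a simplified, hypothetical re-implementation of the behaviour we understand `DocstoreExplorer` to have: **Search** passes the term to the lookup function, keeps the returned document, and shows the agent only its first paragraph (paragraphs split on blank lines); **Lookup** steps through the paragraphs of that document that mention a keyword, advancing one result each time the same keyword is repeated. `MiniDocstoreExplorer` is an illustration, not LangChain's code; check the LangChain source for the exact behaviour in your version.

```python
from typing import Callable, List, Optional


class MiniDocstoreExplorer:
    """Hypothetical stand-in for the Search/Lookup tools of a ReAct docstore explorer."""

    def __init__(self, lookup_fn: Callable[[str], str]) -> None:
        self.lookup_fn = lookup_fn
        self.paragraphs: Optional[List[str]] = None
        self.lookup_term = ""
        self.lookup_index = 0

    def search(self, term: str) -> str:
        # Fetch a document and show only its first paragraph
        self.paragraphs = self.lookup_fn(term).split("\n\n")
        return self.paragraphs[0]

    def lookup(self, term: str) -> str:
        if self.paragraphs is None:
            raise ValueError("Cannot lookup without a successful search first")
        if term.lower() != self.lookup_term:
            # A new term restarts the cursor; repeating the same term advances it
            self.lookup_term, self.lookup_index = term.lower(), 0
        else:
            self.lookup_index += 1
        hits = [p for p in self.paragraphs if self.lookup_term in p.lower()]
        if not hits:
            return "No Results"
        if self.lookup_index >= len(hits):
            return "No More Results"
        return f"(Result {self.lookup_index + 1}/{len(hits)}) {hits[self.lookup_index]}"


doc = ("Modules extend the portal.\n\n"
       "Create a module from the Modules page.\n\n"
       "Install a module to add it to the sidebar.")
explorer = MiniDocstoreExplorer(lambda q: doc)
print(explorer.search("modules"))  # Modules extend the portal.
print(explorer.lookup("module"))   # (Result 1/3) Modules extend the portal.
print(explorer.lookup("module"))   # (Result 2/3) Create a module from the Modules page.
```

Under this reading, a long retrieved passage reaches the agent through Search only as its first paragraph, and Lookup is the way to reach the rest.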

### Initializing the agent with Clarifai LLM

#### GPT-4
GPT-4 is OpenAI's state-of-the-art large language model (LLM). It generates contextually relevant, detailed responses to prompts.

You can swap in any of the other language models available on the [Clarifai](https://clarifai.com/explore/models?filterData=%5B%7B%22field%22%3A%22use_cases%22%2C%22value%22%3A%5B%22llm%22%5D%7D%5D&page=1&perPage=24) platform.

In [None]:
#Model URL from Clarifai Community
MODEL_URL = "https://clarifai.com/openai/chat-completion/models/GPT-4"

llm=Clarifaillm(model_url=MODEL_URL)

In [None]:
# Initialize the agent
react = initialize_agent(tools, llm, agent=AgentType.REACT_DOCSTORE, verbose=True, handle_parsing_errors=True)

## Custom Prompt Template

- Here we replace the agent's default prompt template with one tailored to our use case.
- The template below supplies our own few-shot examples, based on the Clarifai docs, to help improve answer accuracy.

In [None]:
#Defining the Template
CUSTOM_PROMPT_TEMPLATE = """Question: How to make prediction with a model in Clarifai?
Thought: I need to search how to make prediction with a model in clarifai docs.
Action: Search[How to make prediction with a model]
Observation: These are the six key steps in the process for building new models:  \nGathering data—Involves collecting data either from Clarifai’s data reserves or collecting your own unique datasets.Building a visual dictionary—Involves defining the process and the success criteria.Preparing or ‘cleaning’ the data—This step prepares the data for use in training the model. You can recognize important variables or visual features, and check for data imbalances that could impact the predictions of your model.Training the model—Involves using a process called training to “teach” the model what it will eventually predict on, based on the data you prepared. This happens over many interactions, improving accuracy.
Thought: It explains about the process of building new models. I need to search how to make prediction with a model in clarifai docs.
Action: Search[Model Prediction]
Observation: You can use the Clarifai portal to analyze your inputs and understand what's inside of them. The portal will return a list of concepts with corresponding probabilities of how likely it is that these concepts are contained within the provided inputs.
Predictions are ready the moment you upload an input. You can make predictions using custom or public models and workflows. You can search for your preferred model on the Community platform and use it to make predictions.
Thought: It explains how to make a prediction with a model. So this is the answer.
Action: Finish[To make a prediction with a model, use the Clarifai portal to analyze your inputs; you can use custom or public models to make predictions.]

Question: How to create workflow in Clarifai?
Thought: I need to search how to create a workflow in clarifai docs.
Action: Search[How to create workflow?]
Observation: To build a workflow, just grab a model from the left-hand sidebar and drag it onto your workspace. This model will automatically be configured as a node in your workflow. Once a node is added to your workflow, you can configure the model parameters in the right-hand sidebar.  \nThe models in your workflow will automatically connect when they are placed near each other. You can also grab the node connectors on each model and configure your workflow nodes manually.
Thought: It explains about the process of creating a workflow. So this is the answer.
Action: Finish[Grab a model from the left-hand sidebar and drag it into your workspace. Once a node is added, you can configure the parameters in the right-hand sidebar. You can also grab the node connectors and configure your nodes manually.]

Question: How to create modules in Clarifai?
Thought: I need to search how to create a module in clarifai docs.
Action: Search[How to create a module?]
Observation: To create a new module, go to the individual page of your application. Then, select the Modules option on the collapsible left sidebar. You'll be redirected to the Modules manager page, where you can create new modules and view already created ones. Click the Create Module button at the upper-right corner of the page.
Thought: It explains the process of creating a module. So this is the answer.
Action: Finish[1) Go to the individual page of your application. 2) Select the Modules option on the collapsible left sidebar. 3) Click the Create Module button at the upper-right corner of the page.]

Question: {input}
{agent_scratchpad}"""
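It helps to see how the two placeholders get filled. `{input}` receives the user's question, and `{agent_scratchpad}` receives the agent's previous steps. The sketch below mirrors, as we understand it, how LangChain builds the scratchpad for this agent: each step's model output, then `Observation: ...`, then a `Thought:` cue for the next step. `TEMPLATE` is a shortened, hypothetical stand-in for `CUSTOM_PROMPT_TEMPLATE`, and `build_scratchpad` is not a LangChain function.

```python
from typing import List, Tuple

# Shortened, hypothetical template with the same two placeholders
TEMPLATE = """Question: How to create workflow in Clarifai?
Thought: I need to search how to create a workflow in clarifai docs.
Action: Search[How to create workflow?]
Observation: Grab a model from the left-hand sidebar and drag it onto your workspace.
Thought: It explains the process of creating a workflow. So this is the answer.
Action: Finish[Drag a model from the left-hand sidebar onto your workspace.]

Question: {input}
{agent_scratchpad}"""


def build_scratchpad(steps: List[Tuple[str, str]]) -> str:
    """Concatenate (model output, observation) pairs and cue the next Thought."""
    pad = ""
    for llm_output, observation in steps:
        pad += f"{llm_output}\nObservation: {observation}\nThought:"
    return pad


question = "How to install a module in Clarifai?"
# First call: no steps yet, so the scratchpad is empty
first_prompt = TEMPLATE.format(input=question, agent_scratchpad="")
# Second call: the previous Search and its observation are replayed
steps = [("Thought: I need to search the docs.\nAction: Search[install module]",
          "Install modules from the Modules page.")]
second_prompt = TEMPLATE.format(input=question, agent_scratchpad=build_scratchpad(steps))
print(second_prompt)
```

If the prompt uses the default f-string format, any literal `{` or `}` in your own few-shot examples would need to be doubled (`{{`, `}}`).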

### Assigning Our Custom Prompt Template to the Agent

In [59]:
react.agent.llm_chain.prompt.template = CUSTOM_PROMPT_TEMPLATE
print(react.agent.llm_chain.prompt.template) 


## Interacting with the Docstore

In [60]:
question = "How to install a module in Clarifai?"
react.run(question)



> Entering new AgentExecutor chain...
Thought: I need to search how to install a module in clarifai docs.
Action: Search[How to install module in Clarifai]

Observation: There are two steps for creating modules:
1. Creating a module and its versions—This is the process of authoring a new module and registering it on our AI lake. At the end of this process, it won’t be interactable with our UI just yet; the next process of installing the module version into the sidebar handles that. Creating a module requires familiarity with GitHub and Python development. You’ll start by coding your module, which is a great experience even locally, and then creating the module in the Clarifai platform, where we will fully host it for you. Each time you update the code of your module, you can simply create a new module version to capture that change.  
2. Installing a module—Once a module is created (or you find an already created one in our Community platform, or from your team members), you can install the module. This process will register the module in your Clarifai app, and it will appear on the portal’s collapsible left sidebar so that

'To install a module, you need to register it on the Clarifai platform. Once a module is created, you can install the module. This process will register the module in your Clarifai app, and it will appear on the portal’s collapsible left sidebar so that you can interact with it.'

## Clarifai Resources

**Website**: [https://www.clarifai.com](https://www.clarifai.com/)

**Demo**: [https://clarifai.com/demo](https://clarifai.com/demo)

**Sign up for a free Account**: [https://clarifai.com/signup](https://clarifai.com/signup)

**Developer Guide**: [https://docs.clarifai.com](https://docs.clarifai.com/)

**Clarifai Community**: [https://clarifai.com/explore](https://clarifai.com/explore)

**Python SDK Docs**: [https://docs.clarifai.com/python-sdk/api-reference](https://docs.clarifai.com/python-sdk/api-reference)

---