# RAG using Chroma and Llama 2 with the Ollama API

In [4]:
from pathlib import Path
import sys
import warnings

path = Path.cwd().parent

sys.path.append(str(path))

from chromadb_cli.chromadb_cli import load_n_split_doc, load_doc_to_db, get_db, perform_search


warnings.filterwarnings("ignore")


### Load Web Page and split into chunks

In [5]:
http_path = 'https://lilianweng.github.io/posts/2023-06-23-agent/'
docs = load_n_split_doc(http_path, 1500, 100)
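
The implementation of `load_n_split_doc` is not shown here, so the following is only a minimal sketch of what the two numeric arguments most likely control, assuming `1500` is the chunk size in characters and `100` is the overlap between consecutive chunks. The function name `split_with_overlap` is hypothetical and is not part of `chromadb_cli`.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character chunks that share an overlap.

    Hypothetical helper for illustration only; the real splitter may also
    respect paragraph or sentence boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Small example: each chunk repeats the last character of the previous one.
example_chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(example_chunks)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.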

### Embed and store

In [6]:
load_doc_to_db(docs, collection_name='ollama_exp')

<langchain.vectorstores.chroma.Chroma at 0x286a0ea70>

### Search

In [7]:
perform_search(query='How can Task Decomposition be done?', collection_name='ollama_exp')

[Document(page_content='(2) Model selection: LLM distributes the tasks to expert models, where the request is framed as a multiple-choice question. LLM is presented with a list of models to choose from. Due to the limited context length, task type based filtration is needed.\nInstruction:\n\nGiven the user request and the call command, the AI assistant helps the user to select a suitable model from a list of models to process the user request. The AI assistant merely outputs the model id of the most appropriate model. The output must be in a strict JSON format: "id": "id", "reason": "your detail reason for the choice". We have a list of models for you to choose from {{ Candidate Models }}. Please select one model from the list.\n\n(3) Task execution: Expert models execute on the specific tasks and log results.\nInstruction:\n\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User 
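
Conceptually, a vector search like the one above embeds the query and returns the stored chunks whose embeddings are closest to it. The sketch below illustrates that ranking step with cosine similarity on toy vectors. It is an assumption-laden illustration: the metric Chroma actually uses depends on how the collection is configured, `perform_search` is the author's own helper whose internals are not shown, and `cosine_similarity` and `top_k` are hypothetical names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed_chunks: list[tuple[str, list[float]]], k: int = 4):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return scored[:k]


# Toy two-dimensional "embeddings"; real embeddings have hundreds of dimensions.
toy_index = [("chunk a", [1.0, 0.0]), ("chunk b", [0.0, 1.0]), ("chunk c", [1.0, 1.0])]
best = top_k([1.0, 0.0], toy_index, k=2)
print([text for _, text in best])
```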

### RAG Prompt

In [14]:
from langchain import hub
QA_CHAIN_PROMPT = hub.pull("rlm/rag-prompt-llama")

### LLM
Make sure the llama2 model has been downloaded with Ollama first:
```
ollama pull llama2
```

In [23]:
from langchain.llms import Ollama
from langchain.callbacks.manager import CallbackManager 
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = Ollama(
    model="llama2",
    verbose=True,
    callbacks=CallbackManager([StreamingStdOutCallbackHandler()])
)

### QA Chain

In [17]:
from langchain.chains import RetrievalQA 

vectorstore = get_db(collection_name='ollama_exp')

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},

)
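
`RetrievalQA.from_chain_type` defaults to the "stuff" chain type, which places all retrieved chunks into a single prompt together with the question. The sketch below shows that idea in plain Python. The template text and the function name `build_stuff_prompt` are hypothetical; the actual wording comes from the prompt pulled from the hub, which is not reproduced here.

```python
def build_stuff_prompt(template: str, question: str, chunks: list[str]) -> str:
    """Join the retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(chunks)  # chunks separated by a blank line
    return template.format(context=context, question=question)


# Hypothetical template with the two placeholders a RAG prompt needs.
TEMPLATE = (
    "Use the following context to answer the question.\n"
    "Context: {context}\n"
    "Question: {question}\n"
    "Answer:"
)

prompt = build_stuff_prompt(
    TEMPLATE,
    question="How can Task Decomposition be done?",
    chunks=["First retrieved chunk.", "Second retrieved chunk."],
)
print(prompt)
```

Because every retrieved chunk lands in one prompt, the chunk size and the number of retrieved documents together have to fit within the model's context window.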

In [18]:
question = "What are the various approaches to Task Decomposition for AI Agents?"
result = qa_chain({"query": question})

 There are several approaches to task decomposition for AI agents, including:

1. Chain of thought (CoT): This involves instructing the model to "think step by step" to decompose a complex task into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and provides insights into the model's thinking process.
2. Tree of thoughts (ToT): This extends CoT by exploring multiple reasoning possibilities at each step, generating multiple thoughts per step, and creating a tree structure. ToT can be used to solve complex problems by searching through the tree structure.
3. Using task-specific instructions: This involves providing specific instructions for completing a task, such as "Write a story outline." for writing a novel.
4. Human inputs: This involves using human inputs to guide the decomposition of tasks, such as providing detailed instructions or API call context.

The limited context length can make it challenging for LLMs to adjust plans when faced with une

### You can also log token statistics

In [28]:
from langchain.schema import LLMResult 
from langchain.callbacks.base import BaseCallbackHandler 


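class GenerationStatisticsCallback(BaseCallbackHandler):
    """Print the generation statistics once the LLM has finished."""

    def on_llm_end(self, result: LLMResult, **kwargs) -> None:
        # generation_info holds the metadata returned alongside the generated text
        print(result.generations[0][0].generation_info)


callback_manager = CallbackManager(
    [StreamingStdOutCallbackHandler(), GenerationStatisticsCallback()]
)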

llm = Ollama(
    base_url="http://localhost:11434",
    model="llama2",
    verbose=True,
    callbacks=callback_manager
)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)

# question = "What are the various approaches to Task Decomposition for AI Agents?"
# result = qa_chain({"query": question})
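
The printed `generation_info` can be turned into a throughput figure. The sketch below assumes the dictionary carries the `eval_count` (generated tokens) and `eval_duration` (nanoseconds) fields that the Ollama API reports in its final response; whether the installed LangChain version passes these through unchanged is an assumption, and `tokens_per_second` is a hypothetical helper name.

```python
from typing import Optional


def tokens_per_second(generation_info: Optional[dict]) -> Optional[float]:
    """Compute generation throughput, or None if the fields are missing.

    Assumes Ollama-style fields: eval_count (tokens) and
    eval_duration (nanoseconds).
    """
    if not generation_info:
        return None
    count = generation_info.get("eval_count")
    duration_ns = generation_info.get("eval_duration")
    if not count or not duration_ns:
        return None
    return count / (duration_ns / 1e9)  # convert nanoseconds to seconds


# Made-up numbers purely to show the arithmetic: 100 tokens in 2 seconds.
sample_info = {"eval_count": 100, "eval_duration": 2_000_000_000}
print(tokens_per_second(sample_info))
```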

### Use Mistral 
Download the Mistral model with Ollama first:
```
ollama pull mistral:7b-instruct
```

In [33]:
QA_CHAIN_PROMPT = hub.pull("rlm/rag-prompt-mistral")

llm = Ollama(
    model="mistral:7b-instruct",
    verbose=True,
    callbacks=CallbackManager([StreamingStdOutCallbackHandler()]),
)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=vectorstore.as_retriever(),
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)

In [34]:
question = "What are the various approaches to Task Decomposition for AI Agents?"
result = qa_chain({"query": question})

The various approaches to Task Decomposition for AI Agents are Chain of Thought (CoT), Tree of Thoughts, and simple prompting or human inputs. CoT involves breaking down a big task into smaller and simpler steps and exploring multiple reasoning possibilities at each step with BFS or DFS search process and evaluation by a classifier or majority vote. Tree of Thoughts extends CoT by generating multiple thoughts per step to create a tree structure, and the search process can also be BFS or DFS. Simple prompting involves using LLM with simple prompts like "Steps for XYZ" or human inputs for task-specific instructions like "Write a story outline."

Task Decomposition can be done (1) by LLM with simple prompting, (2) by using task-specific instructions, and (3) with human inputs.