We've already seen how useful it can be to "chain" LLM operations together. In the Hinglish chat example, we chained response generation with machine translation, both performed by LLMs.

**As you solve problems with LLMs, do NOT always think about your task as a single prompt.** Decompose your problem into multiple steps. Just as ordinary programs are built from multiple functions, classes, etc., an LLM integration is a new kind of reasoning engine that you can "program" with multiple steps, conditionals, and control flow.
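
As a minimal sketch of this idea, the snippet below chains two steps with ordinary Python control flow. The `fake_llm` function is a hypothetical stand-in for a real model call (it returns canned text and is not part of any library), and the prompts and the fallback string are illustrative assumptions.

```python
from typing import Callable

# Assumed fallback message; any sentinel string would work the same way.
FALLBACK = "Sorry I had trouble answering this question."


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns canned text."""
    if prompt.startswith("Translate"):
        # Pretend to translate everything after the instruction line.
        return "[translated] " + prompt.split("\n", 1)[1]
    if "gift card" in prompt.lower():
        return "They are delivered via US Mail."
    return FALLBACK


def respond_then_translate(question: str, llm: Callable[[str], str]) -> str:
    # Step 1: generate a response to the question.
    answer = llm(f"Answer the question.\n{question}")
    # Conditional step: skip translation if step 1 could not answer.
    if answer == FALLBACK:
        return answer
    # Step 2: feed the output of step 1 into a second prompt.
    return llm(f"Translate to Hinglish:\n{answer}")


result = respond_then_translate("How are gift cards delivered?", fake_llm)
print(result)
```

Swapping `fake_llm` for a function that calls a real model would leave the control flow unchanged, which is the point of treating each prompt as one step in a larger program.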

Further, enterprise LLM applications need reliability, trust, and consistency. **Because LLMs only predict probable text, they have no understanding of or connection to reality.** This produces **hallucinations**: text that can sit inside a coherent block but is factually (or otherwise) wrong. To deal with this, we need to **ground** our LLM operations with external data.

# Dependencies and imports

In [None]:
! pip install langchain predictionguard llama-index unstructured chromadb pdf2image pytesseract html2text

In [None]:
! apt-get install -y poppler-utils tesseract-ocr libtesseract-dev

In [None]:
from langchain.document_loaders import PyPDFLoader
from langchain.vectorstores import Chroma
from langchain.document_loaders import UnstructuredPDFLoader
from langchain.llms import PredictionGuard
from langchain.chains.question_answering import load_qa_chain
from llama_index import (
    LLMPredictor,
    ServiceContext,
    GPTListIndex, 
    GPTVectorStoreIndex,
    SimpleWebPageReader,
    StorageContext
)
import chromadb
from llama_index.embeddings.langchain import LangchainEmbedding
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.vector_stores import ChromaVectorStore
import predictionguard as pg
from langchain import PromptTemplate, FewShotPromptTemplate, LLMChain
import numpy as np

In [None]:
client = pg.Client(token="<your access token>")

# External knowledge in prompts

We've actually already seen external knowledge within our prompts. In the question and answer example, the `context` that we pasted in was copied from the Domino's website.

In [None]:
template = """Read the context below and answer the question. If the question cannot be answered based on the context alone or the context does not explicitly say the answer to the question, write "Sorry I had trouble answering this question, based on the information I found."

Context: {context}

Question: {question}

Answer: """
 
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)

In [None]:
context = "Domino's gift cards are great for any person and any occasion. There are a number of different options to choose from. Each comes with a personalized card carrier and is delivered via US Mail."

question = "How are gift cards delivered?"

myprompt = prompt.format(context=context, question=question)
print(myprompt)

In [None]:
client.predict(name="default-text-gen", data={
    "prompt": myprompt,
    "temperature": 0.1
})

# Chaining

In order to make the insertion of such external knowledge (and the sequencing of LLM operations) easier, we are going to use a package called [LangChain](https://python.langchain.com/en/latest/index.html). LangChain allows us to create chains of operations like chaining a prompt template and an LLM prediction together. There are also pre-configured chains that add a bunch of convenience to our workflows!

In [None]:
# Pass your own access token here; never hard-code a real token in a notebook.
llm_chain = LLMChain(prompt=prompt, 
                     llm=PredictionGuard(token="<your access token>"), 
                     verbose=True)
 
question = "How are gift cards delivered?"
llm_chain.predict(question=question, context=context)

# Chaining with augmentation from documents

You might have seen one example of augmentation/retrieval from external data with the popular [ChatPDF](https://www.chatpdf.com/). With LangChain chains and our LLM, this type of "answer questions out of your document" workflow can be implemented quite quickly.

To do this, we will:

1. Load in a PDF
2. Load the pages of the PDF into a vector database (Chroma)
3. Use a QA chain from LangChain to execute retrieval-based question answering over the document.
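
The retrieval part of these steps can be sketched without any external libraries. This is a toy illustration under stated assumptions: the `embed` function uses plain word counts rather than a real embedding model, the `pages` strings are made-up placeholder text (not quotes from the PDF), and a Python list stands in for the vector database.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (not a real embedding model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[str], k: int = 1) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Made-up placeholder "pages" standing in for the split PDF.
pages = [
    "Launch and iterate: ship a simple model first, then improve it.",
    "Keep the first model simple and get the infrastructure right.",
    "Watch for silent failures in your data pipeline.",
]

top = retrieve("What does it mean to launch and iterate?", pages, k=1)
print(top[0])
```

A real vector database does the same ranking with learned embeddings and an approximate nearest-neighbor index, so it can match on meaning rather than exact word overlap.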



In [None]:
! wget https://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

In [None]:
# Use the convenience of LangChain to load the PDF into pages and create
# a vector database from the document.
loader = UnstructuredPDFLoader("rules_of_ml.pdf")
pages = loader.load_and_split()
docsearch = Chroma.from_documents(pages).as_retriever()

In [None]:
# Ask a question of the document.
query = "What does it mean to launch and iterate?"
docs = docsearch.get_relevant_documents(query)
chain = load_qa_chain(PredictionGuard(token="<your access token>",
                                      name="default-text-gen"), chain_type="stuff")
output = chain.run(input_documents=docs, question=query)
print(output)
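
The `chain_type="stuff"` option places all of the retrieved documents into a single prompt. A rough sketch of that idea in plain Python follows. The `build_stuff_prompt` helper and its template wording are hypothetical illustrations, not LangChain's actual internal prompt, and the two chunk strings are placeholders.

```python
from typing import List


def build_stuff_prompt(docs: List[str], question: str) -> str:
    """Concatenate every retrieved chunk into one context block."""
    context = "\n\n".join(docs)
    return (
        "Use the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer: "
    )


prompt_text = build_stuff_prompt(
    ["First retrieved chunk.", "Second retrieved chunk."],
    "What does it mean to launch and iterate?",
)
print(prompt_text)
```

Because everything goes into one prompt, this approach only works while the retrieved chunks fit inside the model's context window.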

# Augmentation with information on the Internet

This time we will use some slightly different tools, including [LlamaIndex](https://gpt-index.readthedocs.io/en/latest/). LlamaIndex is a project that provides a central interface to connect your LLMs with external data. There are all sorts of really powerful indices, data stores, and query structures to retrieve and generate output from websites, documents, databases, Slack, etc.

We will use a website connector to parse a website, convert it to text, and then query information out of the website. 
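
To make the "convert it to text" step concrete, here is a crude sketch that uses only Python's standard library rather than the reader used in this notebook. The `TextExtractor` class, the `html_to_text` function, and the sample HTML are hypothetical illustrations; a real connector handles far more cases (links, lists, encodings).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "\n".join(parser.parts)


sample_html = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Title</h1><p>Hello world.</p></body></html>"
)
text = html_to_text(sample_html)
print(text)
```

Once a page has been reduced to plain text like this, it can be split, embedded, and queried in the same way as the PDF pages earlier.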

In [None]:
# Define an embedding model (which will be used with our vector database)
model_name = "sentence-transformers/all-mpnet-base-v2"
emb = LangchainEmbedding(HuggingFaceEmbeddings(model_name=model_name))  

In [None]:
# Define the LLM for LlamaIndex
llm_predictor = LLMPredictor(llm=PredictionGuard(token="<your access token>",
                                      name="default-text-gen"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor,
                                               embed_model=emb)

In [None]:
# Read a page from Paul Graham's website.
documents = SimpleWebPageReader(html_to_text=True).load_data(["http://paulgraham.com/worked.html"])

In [None]:
# Set up our vector database
chroma_client = chromadb.Client()
chroma_collection = chroma_client.create_collection("paul_graham")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

In [None]:
# Create the "Index" that will be used as the backbone of our queries. This is
# composed of "nodes" that can be queried using various techniques (vector
# based search, LLM summary search, etc.)
index = GPTVectorStoreIndex.from_documents(
    documents, 
    storage_context=storage_context,
    service_context=service_context)

In [None]:
# Query the website!
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response.response)