We've already seen how useful it can be to "chain" various LLM operations together. In the Hinglish chat example, we chained a response generation step with a machine translation step, both using LLMs.

**As you solve problems with LLMs, do NOT always think about your task as a single prompt.** Decompose your problem into multiple steps. Just as a program is built from multiple functions, classes, etc., an LLM integration is a new kind of reasoning engine that you can "program" in a multi-step, conditional, control-flow sort of fashion.
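
As a minimal sketch of this idea, the snippet below wires two steps together with ordinary Python control flow. The `fake_llm` function is a hypothetical stand-in for a real model call (it is not part of any library), so the example runs without network access; in practice each step would call an actual LLM.

```python
from typing import Callable


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns canned text."""
    if prompt.startswith("Translate"):
        return "[translated] " + prompt.split("Text: ", 1)[1]
    return "Sure, I can help with that."


def respond(message: str, target_language: str, llm: Callable[[str], str]) -> str:
    # Step 1: generate a response to the user's message.
    reply = llm(f"Respond to the following message.\n\nMessage: {message}")
    # Step 2 (conditional): only translate when the target is not English.
    if target_language.lower() != "english":
        reply = llm(f"Translate into {target_language}.\n\nText: {reply}")
    return reply


english_reply = respond("Can you help me?", "English", fake_llm)
hinglish_reply = respond("Can you help me?", "Hinglish", fake_llm)
print(english_reply)
print(hinglish_reply)
```

The point is only the shape: each step is a function call, and plain `if` statements decide which steps run.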

Further, enterprise LLM applications need reliability, trust, and consistency. **Because LLMs only predict probable text, they have no understanding of or connection to reality.** This produces **hallucinations**: text that can be part of a coherent block but is factually (or otherwise) wrong. To deal with this, we need to **ground** our LLM operations with external data.

# Dependencies and imports

In [21]:
! pip install langchain predictionguard llama-index unstructured chromadb pdf2image pytesseract html2text sentence_transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sentence_transformers
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Collecting transformers<5.0.0,>=4.6.0 (from sentence_transformers)
  Downloading transformers-4.30.2-py3-none-any.whl (7.2 MB)
Collecting sentencepiece (from sentence_transformers)
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

In [2]:
! apt-get install -y poppler-utils tesseract-ocr libtesseract-dev

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libarchive-dev libleptonica-dev tesseract-ocr-eng tesseract-ocr-osd
The following NEW packages will be installed:
  libarchive-dev libleptonica-dev libtesseract-dev poppler-utils tesseract-ocr
  tesseract-ocr-eng tesseract-ocr-osd
0 upgraded, 7 newly installed, 0 to remove and 46 not upgraded.
Need to get 8,367 kB of archives.
After this operation, 32.7 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 libarchive-dev amd64 3.4.0-2ubuntu1.2 [491 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal/universe amd64 libleptonica-dev amd64 1.79.0-1 [1,389 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal/universe amd64 libtesseract-dev amd64 4.1.1-2build2 [1,463 kB]
Get:4 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 poppler-utils amd64 0.86.1-0ubuntu1.1 [174 kB]
Get:5 http://archi

In [36]:
import os
from getpass import getpass

import chromadb
import numpy as np
import predictionguard as pg

# LangChain: prompts, chains, document loaders, text splitting, embeddings, vector stores
from langchain import PromptTemplate, FewShotPromptTemplate, LLMChain
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import PyPDFLoader, UnstructuredPDFLoader
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.llms import PredictionGuard
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

# LlamaIndex: indexes, readers, and the Chroma vector store integration
from llama_index import (
    LLMPredictor,
    ServiceContext,
    GPTListIndex,
    GPTVectorStoreIndex,
    SimpleWebPageReader,
    StorageContext
)
from llama_index.embeddings.langchain import LangchainEmbedding
from llama_index.vector_stores import ChromaVectorStore

In [9]:
pg_access_token = getpass('Enter your Prediction Guard access token: ')
os.environ['PREDICTIONGUARD_TOKEN'] = pg_access_token

Enter your Prediction Guard access token: ··········


# External knowledge in prompts

We've already seen external knowledge used within our prompts. In the question-and-answer example, the `context` that we pasted in was copied from the Domino's website.

In [10]:
template = """Read the context below and answer the question. If the question cannot be answered based on the context alone or the context does not explicitly say the answer to the question, write "Sorry I had trouble answering this question, based on the information I found."

Context: {context}

Question: {question}

Answer: """

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
)
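
Because the template tells the model exactly what to write when the context is insufficient, downstream code can branch on that sentence. The helper below is a hypothetical sketch: the name `is_fallback`, the substring check, and the two sample answers are assumptions for illustration, not part of Prediction Guard or LangChain.

```python
# Leading part of the fallback sentence requested in the template.
FALLBACK = "Sorry I had trouble answering this question"


def is_fallback(answer: str) -> bool:
    """Return True when the answer contains the template's fallback sentence."""
    return FALLBACK.lower() in answer.strip().lower()


# Illustrative sample answers (not real model output).
grounded = "\nGift cards are delivered via US Mail."
refused = "Sorry I had trouble answering this question, based on the information I found."

for answer in (grounded, refused):
    if is_fallback(answer):
        print("No grounded answer; escalate or retrieve more context.")
    else:
        print("Answer:", answer.strip())
```

A check like this is one simple way to turn the model's "I don't know" into an explicit branch in a multi-step workflow.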

In [11]:
context = "Domino's gift cards are great for any person and any occasion. There are a number of different options to choose from. Each comes with a personalized card carrier and is delivered via US Mail."

question = "How are gift cards delivered?"

myprompt = prompt.format(context=context, question=question)
print(myprompt)

Read the context below and answer the question. If the question cannot be answered based on the context alone or the context does not explicitly say the answer to the question, write "Sorry I had trouble answering this question, based on the information I found."

Context: Domino's gift cards are great for any person and any occasion. There are a number of different options to choose from. Each comes with a personalized card carrier and is delivered via US Mail.

Question: How are gift cards delivered?

Answer: 


In [16]:
result = pg.Completion.create(
    model="Camel-5B",
    prompt=myprompt
)
result['choices'][0]['text']

"\nDomino's gift cards are delivered via US Mail."

# Chaining

To make the insertion of such external knowledge (and the sequencing of LLM operations) easier, we are going to use a package called [LangChain](https://python.langchain.com/en/latest/index.html). LangChain allows us to create chains of operations, such as a prompt template followed by an LLM prediction. There are also pre-configured chains that add a lot of convenience to our workflows!

In [17]:
llm_chain = LLMChain(prompt=prompt,
                     llm=PredictionGuard(model="Camel-5B"),
                     verbose=True)

question = "How are gift cards delivered?"
llm_chain.predict(question=question, context=context)

> Entering new  chain...
Prompt after formatting:
Read the context below and answer the question. If the question cannot be answered based on the context alone or the context does not explicitly say the answer to the question, write "Sorry I had trouble answering this question, based on the information I found."

Context: Domino's gift cards are great for any person and any occasion. There are a number of different options to choose from. Each comes with a personalized card carrier and is delivered via US Mail.

Question: How are gift cards delivered?

Answer: 

> Finished chain.


'\nGift cards are delivered via US Mail.'
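
Conceptually, a chain like the one above fills in a template and hands the finished prompt to a model. The toy class below sketches that composition in plain Python. `SimpleChain` and `echo_llm` are hypothetical names made up for this illustration; this is not LangChain's implementation.

```python
from typing import Callable


class SimpleChain:
    """Toy prompt-template-plus-LLM chain (illustration only)."""

    def __init__(self, template: str, llm: Callable[[str], str]) -> None:
        self.template = template
        self.llm = llm

    def predict(self, **variables: str) -> str:
        # Fill the template, then pass the finished prompt to the model.
        prompt = self.template.format(**variables)
        return self.llm(prompt)


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in model that just echoes the prompt it received."""
    return "PROMPT WAS: " + prompt


chain = SimpleChain(
    "Context: {context}\n\nQuestion: {question}\n\nAnswer: ",
    echo_llm,
)
result = chain.predict(
    context="Delivered via US Mail.",
    question="How are gift cards delivered?",
)
print(result)
```

Swapping `echo_llm` for a function that calls a real model would leave the rest of the code unchanged, which is the convenience a chain abstraction provides.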

# Chaining with augmentation from documents

You might have seen one example of augmentation/retrieval from external data with the popular [ChatPDF](https://www.chatpdf.com/). With LangChain chains and our LLM, this type of "answer questions out of your document" functionality can be implemented quite quickly.

To do this, we will:

1. Load in a PDF
2. Load the pages of the PDF into a vector database (Chroma)
3. Use a QA chain from LangChain to execute retrieval-based question answering over the document.
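
The retrieval part of these steps can be sketched without any libraries. The example below is a toy under stated assumptions: `embed` is a bag-of-words counter standing in for a neural embedding model, the list of chunks stands in for a vector database, and the sample chunk texts are invented for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real system would use a neural model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


# Invented sample chunks standing in for pages of a document.
chunks = [
    "gift cards are delivered via US Mail",
    "pizza toppings include pepperoni and mushrooms",
    "stores open at eleven in the morning",
]
top = retrieve("how are gift cards delivered", chunks, k=1)
print(top)
```

The retrieved chunks then become the `context` of a question-answering prompt, which is what grounds the model's answer in the document.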



In [18]:
! wget https://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

--2023-06-14 11:32:51--  https://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf
Resolving martin.zinkevich.org (martin.zinkevich.org)... 173.236.154.195
Connecting to martin.zinkevich.org (martin.zinkevich.org)|173.236.154.195|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 460245 (449K) [application/pdf]
Saving to: ‘rules_of_ml.pdf’


2023-06-14 11:32:51 (10.5 MB/s) - ‘rules_of_ml.pdf’ saved [460245/460245]



In [56]:
# Define an embedding model (which will be used with our vector database)
model_name = "sentence-transformers/all-mpnet-base-v2"
emb = HuggingFaceEmbeddings(model_name=model_name)

In [57]:
# Use the convenience of LangChain to load the PDF into pages and create
# a vector database from the document.
loader = UnstructuredPDFLoader("rules_of_ml.pdf")
pages = loader.load_and_split()
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
docs = text_splitter.split_documents(pages)
docsearch = Chroma.from_documents(docs, emb).as_retriever()
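
To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified fixed-width chunker. It is a sketch only: the `chunk_text` helper is hypothetical and is not how `CharacterTextSplitter` is implemented (LangChain's splitter works from a separator and merges pieces, so its chunk boundaries will differ).

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new window starts (chunk_size - chunk_overlap) characters later.
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


no_overlap = chunk_text("abcdefghij", chunk_size=4)
with_overlap = chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(no_overlap)
print(with_overlap)
```

Smaller chunks make retrieval more precise but give the model less surrounding text; overlap reduces the chance that a relevant sentence is cut in half at a boundary.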



In [63]:
# Ask a question of the document.
query = "What does it mean to launch and iterate?"
docs = docsearch.get_relevant_documents(query)
chain = load_qa_chain(PredictionGuard(
    model="Dolly-3B",
    stop=["Question:"],
    max_tokens=100), chain_type="stuff")
output = chain.run(input_documents=docs, question=query)
print(output.split('.')[0])

A machine learning model that you develop and test doesn’t become “finished” until you have deployed it to a production environment
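
The `chain_type="stuff"` option is generally described as placing all retrieved documents into a single prompt. A rough sketch of that idea follows; the `stuff_prompt` and `first_sentence` helpers and the sample passages are hypothetical, and the actual prompt wording LangChain uses will differ.

```python
def stuff_prompt(documents: list[str], question: str) -> str:
    """Concatenate every retrieved passage into one context block."""
    context = "\n\n".join(doc.strip() for doc in documents)
    return (
        "Read the context below and answer the question.\n\n"
        f"Context: {context}\n\n"
        f"Question: {question}\n\n"
        "Answer: "
    )


def first_sentence(text: str) -> str:
    """Keep the text before the first period (like output.split('.')[0], plus trimming)."""
    return text.split(".")[0].strip()


# Invented sample passages standing in for retrieved chunks.
passages = ["Launch a simple model first. ", " Then iterate on features."]
prompt_text = stuff_prompt(passages, "What does it mean to launch and iterate?")
print(prompt_text)
print(first_sentence("Ship early. Then improve."))
```

Stuffing is simple and keeps everything in one model call, but the combined passages must fit within the model's context window, which is one reason the document is split into small chunks first.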
