# GPT3.5 Instruct & FLARE

FLARE is a simple way to improve document Q&A. However, it depends on log probabilities (logprobs) being available for each generated token.

Many foundation models do not return logprobs, so FLARE often relies on older generative models.

With the recent launch of `gpt-3.5-turbo-instruct` from OpenAI, a newer model that does return logprobs, we can improve FLARE results.

In this notebook, we'll look at:

- Using a FLARE chain [from langchain](https://python.langchain.com/docs/use_cases/question_answering/how_to/flare) with OpenAI's `gpt-3.5-turbo-instruct` model
- Embeddings generated using OpenAI `text-embedding-ada-002`
- Astra DB (built on Apache Cassandra) as our vector store

Let's take a look.

## What is FLARE?

Forward-Looking Active REtrieval augmented generation ([FLARE](https://arxiv.org/abs/2305.06983)) is an approach for improving the text an LLM generates.

It works by starting to answer a question. Whenever the LLM produces tokens it is uncertain about (tokens with a low probability), FLARE generates new search queries against the vector database and retrieves additional documents to use when regenerating that part of the text.

Below is an example from the original paper:

![FLARE example](https://drive.google.com/uc?export=view&id=1pyes2ZoV97zHkyB85oSlYDD56X6Gbvoy)
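
The "uncertain tokens" step can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's or langchain's implementation: the function name `low_confidence_spans`, the threshold of 0.3, and the sample tokens and probabilities are all hypothetical values chosen for the example.

```python
def low_confidence_spans(tokens, probs, min_prob=0.3):
    """Group consecutive tokens whose probability falls below min_prob."""
    spans, current = [], []
    for token, prob in zip(tokens, probs):
        if prob < min_prob:
            # Uncertain token: extend the current low-confidence span.
            current.append(token)
        elif current:
            # Confident token closes the span that was being built.
            spans.append("".join(current).strip())
            current = []
    if current:
        spans.append("".join(current).strip())
    return spans

# Hypothetical draft sentence with made-up per-token probabilities.
tokens = ["The", " treaty", " was", " signed", " in", " 17", "83"]
probs = [0.99, 0.95, 0.98, 0.90, 0.97, 0.10, 0.15]

uncertain = low_confidence_spans(tokens, probs)
needs_retrieval = bool(uncertain)
print(uncertain, needs_retrieval)
```

In a FLARE-style loop, the uncertain spans would be turned into search queries, the retrieved documents added to the prompt, and the sentence generated again.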

## Confirm that GPT3.5 Instruct returns logprobs

In [2]:
!pip install langchain openai


In [3]:
# this code ensures that long text generations wrap for readability
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

**If you do not have an OpenAI API Key, get one at https://platform.openai.com.**

In [5]:
import getpass
import os

# set your OpenAI API key
OPENAI_API_KEY = getpass.getpass("Your OpenAI API key (sk-...)")

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

Your OpenAI API key (sk-...) ········


Here we see that the new model does return logprobs.

In [6]:
import openai

response = openai.Completion.create(
  model="gpt-3.5-turbo-instruct",
  prompt="who was george washington?",
  max_tokens=24,
  logprobs=0
)

print(response)



{
  "id": "cmpl-82hPMBao1h4HShRrhwY07jFmNb9nV",
  "object": "text_completion",
  "created": 1695654208,
  "model": "gpt-3.5-turbo-instruct",
  "choices": [
    {
      "text": "\nGeorge Washington (1732-1799) was the first president of the United States of America, serving two terms",
      "index": 0,
      "logprobs": {
        "tokens": [
          "\n",
          "George",
          " Washington",
          " (",
          "173",
          "2",
          "-",
          "179",
          "9",
          ")",
          " was",
          " the",
          " first",
          " president",
          " of",
          " the",
          " United",
          " States",
          " of",
          " America",
          ",",
          " serving",
          " two",
          " terms"
        ],
        "token_logprobs": [
          -1.7701586,
          -0.0016792956,
          -0.00040832703,
          -0.4290861,
          -0.04477831,
          -0.0004197157,
          -0.0057247938,
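
The `token_logprobs` values are natural-log probabilities, so `math.exp` turns them back into probabilities that can be compared with a threshold. The sketch below uses the first seven tokens and values copied from the response above. The threshold of 0.3 is the same value passed as `min_prob` to the FLARE chains later in this notebook; treating it as a simple per-token cutoff is an assumption made for illustration.

```python
import math

# First seven tokens and log probabilities copied from the response above.
tokens = ["\n", "George", " Washington", " (", "173", "2", "-"]
token_logprobs = [
    -1.7701586,
    -0.0016792956,
    -0.00040832703,
    -0.4290861,
    -0.04477831,
    -0.0004197157,
    -0.0057247938,
]

MIN_PROB = 0.3  # assumed cutoff, matching min_prob used later in the notebook

# exp(logprob) recovers the probability the model assigned to each token.
probs = [math.exp(lp) for lp in token_logprobs]

for token, prob in zip(tokens, probs):
    flag = "LOW" if prob < MIN_PROB else "ok"
    print(f"{token!r:>14}  p={prob:.4f}  {flag}")

low = [t for t, p in zip(tokens, probs) if p < MIN_PROB]
```

Only the leading newline falls below the cutoff here; the model is confident about the rest of this fragment.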
        

## Connect GPT3.5 Instruct to FLARE

In [7]:
!pip install langchain cassio pypdf tiktoken


### Let's connect to Astra DB

Sign up for a free account at https://astra.datastax.com.

In this section you'll need:

- A token JSON file with the connection credentials for your Astra database
- A secure connect bundle for your Astra database

You can generate both of these in the Astra UI on the "Connect" tab for your database.

In [10]:
# replace these with the paths to your own token file and secure connect bundle
ASTRA_TOKEN_PATH = "/Users/alejandro.cantarero/code/datastax/tokens/astra/alejandro.cantarero@datastax.com-token.json"
ASTRA_DB_SECURE_BUNDLE_PATH = "/Users/alejandro.cantarero/code/datastax/tokens/astra/secure-connect-astra-demo-db.zip"

In [15]:
# Connect to Astra
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import json

with open(ASTRA_TOKEN_PATH, "r") as f:
    creds = json.load(f)
    ASTRA_DB_APPLICATION_TOKEN = creds["token"]

cluster = Cluster(
    cloud={
        # name needs to match file you uploaded
        "secure_connect_bundle": ASTRA_DB_SECURE_BUNDLE_PATH,
    },
    auth_provider=PlainTextAuthProvider(
        "token",
        ASTRA_DB_APPLICATION_TOKEN,
    ),
)
astraSession = cluster.connect()

### Fetch data and create embeddings

In [17]:
# set parameters for data embedding
keyspace = "pdf_demo" # keyspace you made in Astra
src_dir = "data/prompt_papers"

In [16]:
# optionally drop the table to regenerate the embeddings
astraSession.execute(f"DROP TABLE IF EXISTS {keyspace}.flare_pdf_demo;")

<cassandra.cluster.ResultSet at 0x13eac4730>

In [19]:
# set up the vector store
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Cassandra

embeddings = OpenAIEmbeddings()

vectorstore = Cassandra(
    embedding=embeddings,
    session=astraSession,
    keyspace=keyspace,
    table_name="flare_pdf_demo",
)

In [20]:
# run the embedding process

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

from pprint import pprint

FILE_SUFFIX = ".pdf"

list_of_pdfs = []
# generate the list of PDF files
for f in os.listdir(src_dir):
  filename = os.path.join(src_dir, f)
  if os.path.isfile(filename) and f.endswith(FILE_SUFFIX):
    list_of_pdfs.append(filename)

# tell us what files are being processed
print("Files found:")
pprint(list_of_pdfs)

pdf_loaders = [
    PyPDFLoader(pdf_name)
    for pdf_name in list_of_pdfs
]

# load the PDFs and split them into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=80,
)
documents = [
    doc
    for loader in pdf_loaders
    for doc in loader.load_and_split(text_splitter=text_splitter)
]
# store the chunk texts and their metadata in the vector store
texts, metadatas = zip(*((doc.page_content, doc.metadata) for doc in documents))
vectorstore.add_texts(texts=texts, metadatas=metadatas)

Files found:
['data/prompt_papers/2109.01652.pdf',
 'data/prompt_papers/2306.00739.pdf',
 'data/prompt_papers/2304.05128.pdf',
 'data/prompt_papers/2203.11171.pdf',
 'data/prompt_papers/2210.08726.pdf',
 'data/prompt_papers/2201.11903.pdf',
 'data/prompt_papers/2302.04761.pdf',
 'data/prompt_papers/2205.10625.pdf',
 'data/prompt_papers/2305.17126.pdf',
 'data/prompt_papers/2210.11416.pdf']


['ca42e241c9cf4da59629ef28dc1975cc',
 '913c55db82c340a588ca618e6f32a858',
 '5fd98de4cb094894bf862feb8f81ef03',
 '9a71345adcbd4fdf8659ccee851ff58b',
 'c8c01460649b4388afd3adc611d8c9a7',
 '5d4b4c53e9c44078ab8a1520a925ecce',
 'ece61943b2f64d58ad78bfd27f212b81',
 'b0aef03cc6aa46b2817792ff66a1396f',
 '3a1751ce68e84d39a75640a11be50cae',
 '88efebda5db248d3af5579278811cf74',
 '139caf9ad1b94cfdb472e1d5284a82cf',
 '1bec3523a05149afaec393575fab3cd0',
 '8c7d6952f84d4b1797c89379ed7046d9',
 'fce24d66c2004923b890ea50db976bd1',
 'e868b3bcd6d14955b85e5056b761f4ee',
 'c4f7d592248d4aaab76c28be083e7644',
 '1bf6503b1dbd4c72a1f9df0eeb4b88f5',
 'b840ff9110a84924914ab3c2a053a68c',
 '39480bbfb7a14d1788c351187e0f9f4a',
 '9937321a939c43bb9a596569e94fadb0',
 '189467e039f5498985a09bbf4f0efe86',
 'd921af10b23b42098c543b748f12759b',
 '3616e86ca3414d5a961678ab4032879b',
 '40c4b9ba69014fd58902e5f1d1077b51',
 '9f3db8c6817545fab0a4821292fba19f',
 '1d7b20233b1548fba20dbc8abc1aef65',
 '316eb4013b8d4b78965a1b460680a633',
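
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch of overlapping chunks. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs, lines, and words). The helper `sliding_chunks` is a hypothetical fixed-window version, used only to show how consecutive chunks share text.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each new window starts chunk_size - chunk_overlap characters later.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

The last three characters of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still retrievable from at least one chunk.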
 

In [21]:
# check how much data we loaded
# NOTE: count() is not a good way to do this and will time out for large tables
#       but is an easy approach for a small number of documents
row = astraSession.execute(f"select count(*) from {keyspace}.flare_pdf_demo;")
print(row.one())

Row(count=2919)


In [22]:
# define a function to extract FLARE generated questions from the verbose langchain logs
import re

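def get_generated_questions(logs):
  """Return the first list of FLARE-generated questions found in the logs."""
  pattern = r"Generated Questions:\s*\[(.*?)\]"
  matches = re.findall(pattern, logs)

  # return an empty list (rather than None) when nothing was logged
  if not matches:
    return []
  return matches[0].split(", ")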
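
Splitting on `", "` breaks any question that itself contains a comma, and it leaves the surrounding quote characters in place. A possible alternative, sketched below, parses the bracketed text as a Python list literal. It assumes the log line is the printed form of a Python list of strings, which matches the output shown later in this notebook. `parse_generated_questions` and the sample log line are hypothetical.

```python
import ast
import re

PATTERN = r"Generated Questions:\s*\[(.*?)\]"

def parse_generated_questions(logs):
    """Return the first logged question list as clean Python strings."""
    match = re.search(PATTERN, logs, flags=re.DOTALL)
    if match is None:
        return []
    try:
        # The log line looks like a printed Python list, so parse it as a literal.
        items = ast.literal_eval(f"[{match.group(1)}]")
        return [str(q).strip() for q in items]
    except (ValueError, SyntaxError):
        # Fall back to the simple comma split if the text is not a valid literal.
        return match.group(1).split(", ")

# Hypothetical log line; the second question contains a comma.
sample_logs = "Generated Questions: ['What is FLARE?', 'When does it retrieve, and why?']\n"
print(parse_generated_questions(sample_logs))
print(parse_generated_questions("no questions here"))
```

Both versions still stop at the first `]`, so a question containing a closing bracket would be cut short.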

Note that langchain does not provide a standard interface for swapping the generative model used in the `FlareChain`. As a result, creating a `FlareChain` with the `gpt-3.5-turbo-instruct` model means assembling the chain from its components rather than calling `FlareChain.from_llm`.

Langchain will likely update this in the future so that all models share a common interface.

In [23]:
import langchain
from langchain.chains import FlareChain
from langchain.chains.flare.base import QuestionGeneratorChain, _OpenAIResponseChain
from langchain.llms import OpenAI
from io import StringIO
import sys

# alternative queries to try:
# query = "how should I write a prompt to solve a math problem?"
# query = "my prompt produced SQL code that is incorrect. how can I fix it?"
query = "my chatbot is giving incorrect instructions on how to perform tasks in the UI. What prompting techniques can I use to fix this?"

# verbose logging must be on: the generated questions are parsed from the logs
langchain.verbose = True

# retrieve data from the vector store created above
retriever = vectorstore.as_retriever()

# set up the default OpenAI completion model for the baseline FLARE chain
llm = OpenAI(temperature=0)

original_stdout = sys.stdout
stdout_str = StringIO()
sys.stdout = stdout_str

# create the FLARE process
flare = FlareChain.from_llm(
    llm,
    retriever=retriever,
    max_generation_len=512,
    min_prob=0.3,
)

# set up the new instruct model; logprobs are requested explicitly via model_kwargs
llm35 = OpenAI(
    model="gpt-3.5-turbo-instruct",
    temperature=0,
    max_tokens=512,
    model_kwargs={"logprobs": 1}
)

flare35 = FlareChain(
    question_generator_chain=QuestionGeneratorChain(llm=llm35),
    response_chain=_OpenAIResponseChain(llm=llm35),
    retriever=retriever,
    min_prob=0.3,
)

flare_result = flare.run(query)
questions = get_generated_questions(stdout_str.getvalue())

stdout_str.truncate(0)
stdout_str.seek(0)

flare35_result = flare35.run(query)
questions35 = get_generated_questions(stdout_str.getvalue())

sys.stdout = original_stdout

# call the model directly, without retrieval, as a baseline (inspect llm_response to compare)
llm_response = openai.Completion.create(
  model="gpt-3.5-turbo-instruct",
  prompt=query,
  max_tokens=512,
)

print(f"GENERATED FLARE QUESTIONS:\n{questions}\n\n")
print(f"FLARE RESULT:\n {flare_result}\n\n")

print("-------------------\n")
print(f"GENERATED FLARE QUESTIONS (with GPT3.5):\n{questions35}\n\n")
print(f"FLARE RESULT 3.5:\n {flare35_result}\n\n")

GENERATED FLARE QUESTIONS:
['" How can you fix your chatbot\'s incorrect instructions on how to perform tasks in the UI?"', "' What are some ways to adjust the prompting techniques used by your chatbot?'"]


FLARE RESULT:
  You can use a variety of prompting techniques to help your chatbot give more accurate instructions. For example, you can use multiple-rounds of few-shot prompting (denoted as “composite prompting”) to combine different formats of prompts for the same task. You can also use prompt tuning, which involves fine-tuning the prompts to better match the task. Finally, you can use in-domain examples taken from the GSM8K train set to engineer the prompts. 




-------------------

FLARE RESULT 3.5:
 You can try using decomposition prompts or least-to-most prompting to help your chatbot better understand the task and provide accurate instructions.  Additionally, you can also use errorless learning techniques to prevent your chatbot from giving incorrect instructions in the fut
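
The FLARE cell above swaps `sys.stdout` by hand to capture the verbose logs. If a chain raises an exception, `sys.stdout` stays redirected and later cells print nothing. One possible alternative is `contextlib.redirect_stdout` from the standard library, which restores the original stream automatically. In the sketch below, `capture_stdout` and `noisy_add` are hypothetical helpers; `noisy_add` stands in for a call such as `flare.run(query)`.

```python
import contextlib
import io

def capture_stdout(func, *args, **kwargs):
    """Run func and return (result, everything it printed)."""
    buffer = io.StringIO()
    # stdout is restored when the with-block exits, even on an exception.
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()

def noisy_add(a, b):
    # Hypothetical stand-in for a chain that logs while it runs.
    print(f"adding {a} and {b}")
    return a + b

result, logs = capture_stdout(noisy_add, 2, 3)
print(result, repr(logs))
```

Using a fresh buffer per call also removes the need to truncate and rewind a shared `StringIO` between runs.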