<a href="https://colab.research.google.com/github/datastax/ragstack-ai/blob/main/examples/notebooks/FLARE.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Forward-Looking Augmented REtrieval (FLARE)

FLARE is an advanced retrieval technique that combines retrieval and generation in LLMs. Rather than retrieving once before generating, it iteratively predicts the upcoming sentence. When that prediction contains a token the model is uncertain about, FLARE uses it to retrieve relevant documents and then regenerates the sentence, which improves the accuracy of responses. See the [original repository](https://github.com/jzbjyb/FLARE/tree/main) for more detail.

The basic workflow is: 
1. Send a query
2. The model generates the response sentence by sentence, first tentatively predicting the upcoming sentence
3. If the predicted sentence contains a low-confidence token (one whose probability falls below a threshold), the predicted sentence is used as a query to retrieve new, relevant documents
4. The upcoming sentence is regenerated using the retrieved documents
5. Repeat steps 2-4 until the response is complete
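The loop above can be sketched in plain Python. This is a minimal illustration, not LangChain's implementation: `generate` and `retrieve` are hypothetical stand-ins for the LLM and the vector-store retriever, and `generate` is assumed to return each token of the next sentence together with its probability.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: each generated token paired with its probability.
Token = Tuple[str, float]
GenerateFn = Callable[[str, List[str], str], List[Token]]
RetrieveFn = Callable[[str], List[str]]


def flare_answer(
    query: str,
    generate: GenerateFn,
    retrieve: RetrieveFn,
    min_prob: float = 0.2,
    max_sentences: int = 10,
) -> str:
    """Sketch of the FLARE loop: predict a sentence, retrieve if unsure, regenerate."""
    response = ""
    docs: List[str] = []
    for _ in range(max_sentences):
        # Step 2: tentatively predict the upcoming sentence.
        tokens = generate(query, docs, response)
        if not tokens:
            break  # Step 5: stop once the response is complete.
        # Step 3: any low-confidence token triggers retrieval with the predicted sentence.
        if any(prob < min_prob for _, prob in tokens):
            predicted = "".join(tok for tok, _ in tokens)
            docs = retrieve(predicted)
            # Step 4: regenerate the sentence grounded in the retrieved documents.
            tokens = generate(query, docs, response)
        response += "".join(tok for tok, _ in tokens)
    return response


# Toy stand-ins for the LLM and the retriever (hypothetical, for illustration only).
def toy_generate(query: str, docs: List[str], partial: str) -> List[Token]:
    if partial:
        return []  # one sentence is enough for this toy example
    if docs:
        return [("Luchesi is ", 0.9), ("a wine connoisseur.", 0.8)]
    return [("Luchesi is ", 0.9), ("a merchant.", 0.1)]


def toy_retrieve(sentence: str) -> List[str]:
    return ["Luchesi can tell Amontillado from Sherry."]


print(flare_answer("Who is Luchesi?", toy_generate, toy_retrieve))
```

Because retrieval only happens when a token falls below `min_prob`, sentences the model is already confident about are generated without a search.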

In this tutorial, you will use an Astra DB vector store, an OpenAI embedding model, an OpenAI LLM, and LangChain to orchestrate FLARE in a RAG pipeline. 

## Prerequisites

You will need a vector-enabled Astra database and an OpenAI account.

* Create an [Astra vector database](https://docs.datastax.com/en/astra-serverless/docs/getting-started/create-db-choices.html).
* Create an [OpenAI account](https://openai.com/).
* Within your database, create an [Astra DB Access Token](https://docs.datastax.com/en/astra-serverless/docs/manage/org/manage-tokens.html) with Database Administrator permissions.
* Get your Astra DB Endpoint: 
  * `https://<ASTRA_DB_ID>-<ASTRA_DB_REGION>.apps.astra.datastax.com`
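If you have the database ID and region rather than the full endpoint, you can assemble the URL from the pattern above. This is a small sketch; `astra_db_endpoint` is a hypothetical helper, and the ID and region below are placeholders, not real values.

```python
def astra_db_endpoint(db_id: str, region: str) -> str:
    """Build an Astra DB API endpoint following the URL pattern listed above."""
    return f"https://{db_id.strip()}-{region.strip()}.apps.astra.datastax.com"


# Placeholder values for illustration only; substitute your own database ID and region.
print(astra_db_endpoint("01234567-89ab-cdef-0123-456789abcdef", "us-east1"))
```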

See the [Prerequisites](https://docs.datastax.com/en/ragstack/docs/prerequisites.html) page for more details.

## Setup
`ragstack-ai` includes all the packages you need to build a RAG pipeline. 

In [None]:
! pip install ragstack-ai

In [3]:
import os
from getpass import getpass

# Enter your settings for Astra DB and OpenAI:
os.environ["ASTRA_DB_ENDPOINT"] = input("Enter your Astra DB API Endpoint: ")
os.environ["ASTRA_DB_TOKEN"] = getpass("Enter your Astra DB Token: ")
# LangChain's OpenAI integrations read the key from OPENAI_API_KEY
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API Key: ")
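Before building the pipeline, you may want to fail fast if a setting is missing. This is a minimal sketch: `missing_settings` is a hypothetical helper, and it checks the two Astra DB variables set above plus `OPENAI_API_KEY`, which is the variable LangChain's OpenAI integrations read.

```python
import os
from typing import Iterable, List, Mapping

REQUIRED_SETTINGS = ("ASTRA_DB_ENDPOINT", "ASTRA_DB_TOKEN", "OPENAI_API_KEY")


def missing_settings(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or blank in `env`."""
    return [name for name in names if not env.get(name, "").strip()]


missing = missing_settings(REQUIRED_SETTINGS)
print("Missing settings:", ", ".join(missing) if missing else "none")
```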

In [4]:
# Name of the Astra DB collection where documents are stored (for example, "test")
collection = "flare"

## Create RAG Pipeline

### Embedding Model and Vector Store

In [5]:
from langchain.vectorstores.astradb import AstraDB
from langchain.embeddings import OpenAIEmbeddings
import os

# Configure your embedding model and vector store
embedding = OpenAIEmbeddings()
vstore = AstraDB(
    collection_name=collection,
    embedding=embedding,
    token=os.environ["ASTRA_DB_TOKEN"],
    api_endpoint=os.environ["ASTRA_DB_ENDPOINT"],
)
print("Astra vector store configured")

Astra vector store configured
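Conceptually, the vector store keeps one embedding per document and answers a query by returning the documents whose embeddings are most similar to the query's embedding. The toy sketch below uses hand-written 3-dimensional vectors in place of OpenAI embeddings and cosine similarity as the metric; it is an illustration under those assumptions, not how Astra DB indexes or searches vectors.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], store: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Return the k document texts whose vectors are most similar to query_vec."""
    ranked = sorted(store, key=lambda text: cosine(query_vec, store[text]), reverse=True)
    return ranked[:k]


# Hand-written "embeddings" for illustration only.
store = {
    "Luchesi can tell Amontillado from Sherry.": [0.9, 0.1, 0.0],
    "Montresor leads Fortunato into the catacombs.": [0.1, 0.9, 0.2],
    "The carnival season was in full swing.": [0.0, 0.2, 0.9],
}
print(top_k([0.8, 0.2, 0.1], store, k=1))
```

In the notebook, the retriever created later with `vstore.as_retriever()` performs this kind of lookup against Astra DB, using the real embedding of the query text.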


In [6]:
# Retrieve the text of a short story that will be indexed in the vector store
! curl https://raw.githubusercontent.com/CassioML/cassio-website/main/docs/frameworks/langchain/texts/amontillado.txt --output amontillado.txt
input_file = "amontillado.txt"  # avoid shadowing Python's built-in input()

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13022  100 13022    0     0  54469      0 --:--:-- --:--:-- --:--:-- 63213


In [7]:
from langchain.document_loaders import TextLoader

# Load your input and split it into documents
loader = TextLoader(input_file)
documents = loader.load_and_split()

In [8]:
# Create embeddings by inserting your documents into the vector store.
inserted_ids = vstore.add_documents(documents)
print(f"\nInserted {len(inserted_ids)} documents.")


Inserted 4 documents.
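The roughly 13 KB story was stored as 4 documents because `load_and_split()` splits the loaded text into chunks. By default it uses LangChain's `RecursiveCharacterTextSplitter`, whose defaults are 4,000-character chunks with 200 characters of overlap. The sketch below is a simpler fixed-size sliding window, not the recursive splitter (which prefers to break on paragraph and sentence boundaries), but it shows why a text of this length comes out at about 4 chunks. `sliding_window_chunks` is a hypothetical helper.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 4000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size chunks where consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start : start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


# A stand-in string with the same length as the downloaded file (13,022 characters).
print(len(sliding_window_chunks("x" * 13022)))
```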


In [None]:
# Check your collection to verify that the documents were stored.
print(vstore.astra_db.collection(collection).find())
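The raw `find()` response can be long. The following is a hedged helper sketch that prints a short preview of each stored document. It assumes the response has the shape `{"data": {"documents": [...]}}` and that LangChain stored each chunk's text under a `content` field, so adjust the keys if your library versions differ. `preview_documents` and the sample response are hypothetical.

```python
from typing import Any, Dict, List


def preview_documents(response: Dict[str, Any], width: int = 60) -> List[str]:
    """Return one '<_id>: <start of content>' line per document in a find() response."""
    docs = response.get("data", {}).get("documents", [])
    return [f"{doc.get('_id')}: {str(doc.get('content', ''))[:width]}" for doc in docs]


# In the notebook you would pass the real response, for example:
# for line in preview_documents(vstore.astra_db.collection(collection).find()):
#     print(line)

# Made-up sample response for illustration.
sample = {"data": {"documents": [{"_id": "abc", "content": "The thousand injuries of Fortunato..."}]}}
for line in preview_documents(sample, width=20):
    print(line)
```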

### FLARE Chain

With verbose mode on, LangChain's FLARE chain prints each intermediate prompt and result, so you can see exactly what is happening under the hood.

In [9]:
from langchain.globals import set_verbose

set_verbose(True)

In [10]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import FlareChain

retriever = vstore.as_retriever()

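flare = FlareChain.from_llm(
    # `llm` is used to generate follow-up questions. The response itself is generated by an
    # internal OpenAI completion model that cannot currently be changed, so token limits are 4097.
    # https://github.com/langchain-ai/langchain/issues/10493
    llm=ChatOpenAI(temperature=0),
    retriever=retriever,
    max_generation_len=128,  # maximum number of tokens the response model generates per step
    min_prob=0.2,  # tokens with a probability below this threshold are treated as uncertain
)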
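The `min_prob=0.2` setting is the confidence threshold from step 3: the chain's response model reports log-probabilities for its tokens, and tokens whose probability falls below `min_prob` mark spans the model is unsure about. The sketch below shows that idea with the standard library only. It is a simplified illustration, not LangChain's code; the helper name, the `min_token_gap` and `num_pad_tokens` grouping parameters, and the example tokens and log-probabilities are all made up.

```python
import math
import re
from typing import List, Sequence


def low_confidence_spans(
    tokens: Sequence[str],
    log_probs: Sequence[float],
    min_prob: float = 0.2,
    min_token_gap: int = 5,
    num_pad_tokens: int = 2,
) -> List[str]:
    """Group low-probability word tokens into short text spans (simplified sketch)."""
    # Indices of tokens that contain a word character and whose probability is below min_prob.
    low = [
        i
        for i, log_prob in enumerate(log_probs)
        if math.exp(log_prob) < min_prob and re.search(r"\w", tokens[i])
    ]
    if not low:
        return []
    # Each span starts at a low-confidence token and extends num_pad_tokens past it.
    spans = [[low[0], low[0] + num_pad_tokens + 1]]
    for prev, idx in zip(low, low[1:]):
        end = idx + num_pad_tokens + 1
        if idx - prev < min_token_gap:
            spans[-1][1] = end  # close to the previous one: extend the current span
        else:
            spans.append([idx, end])
    return ["".join(tokens[start:end]) for start, end in spans]


# Made-up tokens and log-probabilities: only " jealous" falls below min_prob.
tokens = ["Luchesi", " is", " a", " rival", ".", " He", " is", " jealous", " of", " Antonio", "'s", " success", "."]
log_probs = [math.log(0.9)] * len(tokens)
log_probs[7] = math.log(0.05)
print(low_confidence_spans(tokens, log_probs))  # [' jealous of Antonio']
```

A span like this is what the question generator receives; compare the term/entity/phrase that appears in the verbose log of the next cell.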

In [11]:
# Note that FLARE uses a larger prompt, so more complex questions may cause the token limit to be exceeded.
query = "Who is Luchesi in relation to Antonio?"
flare.run(query)



> Entering new FlareChain chain...
Current Response: 
Prompt after formatting:
Respond to the user message using any relevant context. If context is provided, you should ground your answer in that context. Once you're done responding return FINISHED.

>>> CONTEXT: 
>>> USER INPUT: Who is Luchesi in relation to Antonio?
>>> RESPONSE: 


> Entering new QuestionGeneratorChain chain...
Prompt after formatting:
Given a user input and an existing partial response as context, ask a question to which the answer is the given term/entity/phrase:

>>> USER INPUT: Who is Luchesi in relation to Antonio?
>>> EXISTING PARTIAL RESPONSE:  
Luchesi is a rival of Antonio's. He is a wealthy merchant who is jealous of Antonio's success. FINISHED

The question to which the answer is the term/entity/phrase " jealous of Antonio" is:

> Finished chain.
Generated Questions: ['Why is Luchesi considered a rival of Antonio?']

' Luchesi is a rival connoisseur of wine to Fortunato. He is mentioned by Montresor as someone who could distinguish Amontillado from Sherry, but not as well as Fortunato.\n\n'
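The `QuestionGeneratorChain` prompt in the log is an ordinary template filled with three values: the user input, the partial response, and one uncertain span. The sketch below rebuilds it with `str.format`. The template text is copied from the log, while the placeholder names and the `question_prompts` helper are assumptions for illustration.

```python
from typing import List

QUESTION_TEMPLATE = (
    "Given a user input and an existing partial response as context, "
    "ask a question to which the answer is the given term/entity/phrase:\n\n"
    ">>> USER INPUT: {user_input}\n"
    ">>> EXISTING PARTIAL RESPONSE: {partial_response}\n\n"
    'The question to which the answer is the term/entity/phrase "{span}" is:'
)


def question_prompts(user_input: str, partial_response: str, spans: List[str]) -> List[str]:
    """Build one question-generation prompt per uncertain span."""
    return [
        QUESTION_TEMPLATE.format(user_input=user_input, partial_response=partial_response, span=span)
        for span in spans
    ]


prompts = question_prompts(
    "Who is Luchesi in relation to Antonio?",
    "Luchesi is a rival of Antonio's. He is a wealthy merchant who is jealous of Antonio's success.",
    [" jealous of Antonio"],
)
print(prompts[0])
```

In FlareChain, each generated question (here, "Why is Luchesi considered a rival of Antonio?") is then sent to the retriever, and the retrieved documents become the context for regenerating the response.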

### Cleanup

In [2]:
# WARNING: This will delete the collection and all documents in the collection
# vstore.delete_collection()

## What's Next

You now have a fully functioning RAG pipeline using the FLARE technique! FLARE is one of many ways to improve RAG. See our other [examples](https://docs.datastax.com/en/ragstack/docs/examples/index.html) for advanced RAG techniques, as well as evaluation examples that compare results using multiple RAG techniques. 