# LangChain Multi-Query for RAG

## Introduction
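LangChain supports multi-query retrieval for retrieval-augmented generation (RAG): instead of searching with a single query, the retriever searches with several LLM-generated variations of it, which can surface relevant documents that a single phrasing would miss.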

## What is Multi-Query for RAG?

Multi-Query for RAG is a technique used to enhance the retrieval process in retrieval-augmented generation systems. It involves generating multiple queries from a single input and using these queries to retrieve a more diverse and relevant set of documents from the database. This approach improves the quality of the retrieved information, leading to better and more accurate generated responses.

## Key Components

1. **Query Generation**: Instead of generating a single query, the system creates multiple queries based on different aspects or perspectives of the input. These queries can be variations or expansions of the original input.

2. **Document Retrieval**: Each generated query is used to search the indexed documents. The results from all queries are combined to form a more comprehensive set of retrieved documents.

3. **Response Generation**: The retrieved documents are then used to generate the final response. By incorporating information from multiple queries, the generated response is more likely to be accurate and relevant.

## Benefits of Multi-Query for RAG

- **Improved Diversity**: Multiple queries increase the chances of retrieving diverse pieces of information, covering different aspects of the input topic.
- **Enhanced Relevance**: Combining results from multiple queries can lead to a more relevant and contextually appropriate set of documents.
- **Robustness**: Multi-query retrieval is more robust to variations and ambiguities in the input, as different queries can capture different interpretations.

## Example Workflow

1. **Input Processing**: The user provides an input query.
2. **Query Generation**: The system generates multiple queries from the input.
3. **Document Retrieval**: Each query is used to search the document index, and the results are combined.
4. **Response Generation**: The retrieved documents are used by the language model to generate the final response.

<div>
<img src="https://education-team-2020.s3.eu-west-1.amazonaws.com/ai-eng/images-langchain-4/muti.avif" alt='auto' width="1000"/>
</div>
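
The four workflow steps can be sketched in plain Python. This is only an illustration of the data flow, not LangChain's implementation: `generate_queries` is a hypothetical stand-in for the LLM call, `search` is a toy keyword matcher standing in for a vector index, and the corpus is made up.

```python
from typing import Dict, List

# Toy corpus standing in for an indexed document store (illustrative data only)
CORPUS: Dict[str, str] = {
    "d1": "Old San Juan has colonial forts and cobblestone streets.",
    "d2": "El Yunque is a tropical rainforest in Puerto Rico.",
    "d3": "Mofongo is a popular dish made from fried plantains.",
}


def generate_queries(question: str) -> List[str]:
    """Hypothetical stand-in for the LLM step that rewrites the question."""
    return [question, f"{question} forts", f"{question} rainforest"]


def search(query: str, k: int = 2) -> List[str]:
    """Toy retriever: rank documents by the number of words shared with the query."""
    words = set(query.lower().split())
    scored = []
    for doc_id, text in CORPUS.items():
        overlap = len(words & set(text.lower().rstrip(".").split()))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken by document id for determinism
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def multi_query_retrieve(question: str) -> List[str]:
    """Run every generated query and combine the results without duplicates."""
    hits: List[str] = []
    for query in generate_queries(question):
        hits.extend(search(query))
    # dict.fromkeys keeps the first occurrence of each ID, preserving order
    return list(dict.fromkeys(hits))


doc_ids = multi_query_retrieve("puerto rico")
# The combined context is what would be handed to the language model
context = "\n---\n".join(CORPUS[d] for d in doc_ids)
print(doc_ids)
```

Here the original question alone only matches one document, while the variations pull in a second one, which is the effect multi-query retrieval is after.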

## Example Code

Here is a simplified sketch of multi-query retrieval using LangChain's `MultiQueryRetriever`. It assumes `vectorstore` is an already-populated LangChain vector store, such as the one built later in this walkthrough, and that `OPENAI_API_KEY` is set in the environment:

```python
from langchain.chat_models import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever

# Initialize the language model that generates the query variations
llm = ChatOpenAI(temperature=0)

# Wrap the vector store's retriever so that each input is expanded
# into several queries before searching
# (assumes `vectorstore` is already populated with documents)
retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(), llm=llm
)

# Input text
input_text = "Tell me about the history of artificial intelligence."

# Run every generated query and collect the unique retrieved documents
docs = retriever.get_relevant_documents(query=input_text)
print(len(docs))
```


## Getting Data

We will download an existing dataset from Hugging Face Datasets.

In [8]:
from datasets import load_dataset

data = load_dataset("data", split="train")
data

Dataset({
    features: ['text'],
    num_rows: 1
})

In [9]:
import os
import json
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the GPT-2 model and tokenizer
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

# Add a padding token
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
model.resize_token_embeddings(len(tokenizer))

def read_files_in_directory(directory):
    data = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith('.txt'):
                file_path = os.path.join(root, file)
                with open(file_path, 'r', encoding='utf-8') as f:
                    content = f.read()
                    data.append({'file_name': file, 'content': content})
    return data

def summarize_text(text):
    # GPT-2 is a decoder-only model, so the "summary" is a continuation of the prompt
    input_text = "summarize: " + text
    # Truncate long inputs; a single sequence needs no padding
    inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)

    # Get input_ids and attention_mask
    input_ids = inputs["input_ids"]
    attention_mask = inputs["attention_mask"]

    # Generate the summary. max_new_tokens counts only generated tokens,
    # so it stays valid even when the prompt is already 512 tokens long
    summary_ids = model.generate(
        input_ids,
        attention_mask=attention_mask,
        max_new_tokens=150,
        min_new_tokens=40,
        length_penalty=2.0,
        num_beams=4,
        early_stopping=True,
        pad_token_id=tokenizer.pad_token_id
    )
    # Decode only the newly generated tokens, not the echoed prompt
    summary = tokenizer.decode(summary_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

    return summary

def extract_data(data):
    summaries = []
    for item in data:
        summary = summarize_text(item['content'])
        summaries.append({'file_name': item['file_name'], 'summary': summary})
    return summaries

base_directory = 'data'
directories = ['landmarks', 'municipalities', 'news/elmundo_chunked_en_page1_15years', 'news/elmundo_chunked_es_page1_15years', 'news/elmundo_chunked_es_page1_40years']

data_repository = {}

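# Process each directory
for dir_name in directories:
    full_path = os.path.join(base_directory, dir_name)
    dir_data = read_files_in_directory(full_path)
    # Summarize each file and store the results under the directory name
    data_repository[dir_name] = extract_data(dir_data)
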

# Save the data repository to a JSON file
with open('puerto_rico_data_repository.json', 'w', encoding='utf-8') as f:
    json.dump(data_repository, f, ensure_ascii=False, indent=4)

print('Data repository created successfully.')


The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`


Data repository created successfully.


In [8]:
from langchain.docstore.document import Document

docs = []

for row in data:
    doc = Document(
        page_content=row["text"],
        metadata={
            "title": row["title"],
            "source": row["source"],
            "id": row["id"],
            "chunk-id": row["chunk-id"],
            "text": row["chunk"]
        }
    )
    docs.append(doc)

KeyError: 'title'

In [10]:
from langchain.docstore.document import Document

docs = []

for row in data:
    # Fall back to placeholder values when a field is missing from the dataset
    title = row.get("title", "No Title")
    source = row.get("source", "No Source")
    doc_id = row.get("id", "No ID")
    chunk_id = row.get("chunk-id", "No Chunk ID")
    text = row.get("text", "No Text")

    doc = Document(
        page_content=text,
        metadata={
            "title": title,
            "source": source,
            "id": doc_id,
            "chunk-id": chunk_id,
            "text": text
        }
    )
    docs.append(doc)

In [None]:
from dotenv import load_dotenv, find_dotenv
import os
_ = load_dotenv(find_dotenv())

OPENAI_API_KEY  = os.getenv('OPENAI_API_KEY')
HUGGINGFACEHUB_API_TOKEN = os.getenv('HUGGINGFACEHUB_API_TOKEN')

# initialize connection to pinecone (get API key at app.pinecone.io)
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY") or "YOUR_API_KEY"

## Embedding and Vector DB Setup

Initialize our embedding model:

In [None]:
import os
from getpass import getpass
from langchain.embeddings.openai import OpenAIEmbeddings

model_name = "text-embedding-ada-002"

# Get the OpenAI API key
OPENAI_API_KEY = getpass("Enter your OpenAI API key: ")
# Initialize the OpenAI embeddings

embed = OpenAIEmbeddings(
    model=model_name, openai_api_key=OPENAI_API_KEY, disallowed_special=()
)

Now we create our vector DB to store our vectors. For this we need a [free Pinecone API key](https://app.pinecone.io), which can be found under "API Keys" in the left navbar of the Pinecone dashboard.

In [2]:
!pip install pinecone


In [3]:
from pinecone import Pinecone
# configure client with the key loaded from the environment (never hardcode API keys)
pc = Pinecone(api_key=PINECONE_API_KEY)


DeprecatedPluginError: The `pinecone-plugin-inference` package has been deprecated. The features from that plugin have been incorporated into the main `pinecone` package with no need for additional plugins. Please remove the `pinecone-plugin-inference` package from your dependencies to ensure you have the most up-to-date version of these features.

Next we set up our index specification, which defines the cloud provider and region where we want to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects).

In [None]:
from pinecone import ServerlessSpec

spec = ServerlessSpec(
    cloud="aws", region="us-east-1"
)



When creating the index, we set `dimension` equal to the dimensionality of Ada-002 (`1536`) and use a `metric` that is also compatible with Ada-002 (either `cosine` or `dotproduct`). We also pass our `spec` to the index initialization.

In [None]:
import time

index_name = "langchain-multi-query-demo"
existing_indexes = [
    index_info["name"] for index_info in pc.list_indexes()
]

# check if index already exists (it shouldn't if this is first time)
if index_name not in existing_indexes:
    # if does not exist, create index
    pc.create_index(
        index_name,
        dimension=1536,  # dimensionality of ada 002
        metric='dotproduct',
        spec=spec
    )
    # wait for index to be initialized
    while not pc.describe_index(index_name).status['ready']:
        time.sleep(1)

# connect to index
index = pc.Index(index_name)
time.sleep(1)
# view index stats
index.describe_index_stats()

NameError: name 'pc' is not defined
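
On the choice of `metric`: OpenAI documents its embeddings as normalized to length 1, and for unit-length vectors cosine similarity and dot product give the same score, which is why either metric is suitable here. A small sketch with made-up three-dimensional vectors (not real embeddings) shows the equivalence:

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Illustrative vectors, normalized the way the embeddings are assumed to be
query = normalize([3.0, 4.0, 0.0])
doc = normalize([1.0, 2.0, 2.0])

print(cosine(query, doc), dot(query, doc))
```

For vectors that are not normalized the two metrics can rank results differently, so the metric should match how the embedding model is meant to be used.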

Populate our index:

In [None]:
len(docs)

NameError: name 'docs' is not defined

In [8]:
# if you want to speed things up to follow along
#docs = docs[:5000]

In [43]:
pip install pinecone --upgrade

Note: you may need to restart the kernel to use updated packages.


In [3]:
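import os
import getpass
from pinecone import Pinecone, ServerlessSpec

# Prompt for the Pinecone API key securely
api_key = getpass.getpass("Enter your Pinecone API key: ")

# Use the Pinecone API key
pc = Pinecone(
    api_key=api_key
)

# Create the index if it does not exist yet.
# `dimension` and `metric` are arguments of create_index; the spec only
# describes where the index is deployed.
if 'my-index' not in pc.list_indexes().names():
    pc.create_index(
        name='my-index',  # index names use lowercase letters, digits, and hyphens
        dimension=128,
        metric='cosine',  # Change to your desired metric
        spec=ServerlessSpec(cloud='aws', region='us-east-1')
    )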

# Perform other operations



In [30]:
import os
import json
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from pinecone import Index, init

# Initialize Pinecone
init(api_key="your_pinecone_api_key")
index = Index("your_index_name")

# Load the GPT-2 model and tokenizer
model_name = "gpt2"
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)

def read_files_in_directory(directory):
    data = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith('.txt'):
                file_path = os.path.join(root, file)
                with open(file_path, 'r', encoding='utf-8') as f:
                    content = f.read()
                    data.append({'file_name': file, 'content': content})
    return data

def summarize_text(text):
    # Preprocess the text for GPT-2
    input_text = "summarize: " + text
    inputs = tokenizer.encode_plus(input_text, return_tensors="pt", max_length=512, truncation=True, padding='max_length')
    
    # Get input_ids and attention_mask
    input_ids = inputs["input_ids"]
    attention_mask = inputs["attention_mask"]
    
    # Generate the summary
    summary_ids = model.generate(
        input_ids, 
        attention_mask=attention_mask, 
        max_length=150, 
        min_length=40, 
        length_penalty=2.0, 
        num_beams=4, 
        early_stopping=True, 
        pad_token_id=tokenizer.eos_token_id
    )
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    
    return summary

def extract_data(data):
    summaries = []
    for item in data:
        summary = summarize_text(item['content'])
        summaries.append({'file_name': item['file_name'], 'summary': summary})
    return summaries

def trim_metadata(metadata, max_size=40960):
    """Trim metadata to ensure it fits within the specified max size."""
    metadata_str = json.dumps(metadata)
    if len(metadata_str) <= max_size:
        return metadata
    # Trim the metadata if it exceeds the max size
    trimmed_metadata = {}
    for key, value in metadata.items():
        if len(json.dumps(trimmed_metadata)) + len(json.dumps({key: value})) > max_size:
            break
        trimmed_metadata[key] = value
    return trimmed_metadata

base_directory = 'data'
directories = ['landmarks', 'municipalities', 'news/elmundo_chunked_en_page1_15years', 'news/elmundo_chunked_es_page1_15years', 'news/elmundo_chunked_es_page1_40years']

data_repository = {}

# Process each directory
for dir_name in directories:
    full_path = os.path.join(base_directory, dir_name)
    dir_data = read_files_in_directory(full_path)
    summaries = extract_data(dir_data)
    data_repository[dir_name] = summaries

# Save the data repository to a JSON file
with open('puerto_rico_data_repository.json', 'w', encoding='utf-8') as f:
    json.dump(data_repository, f, ensure_ascii=False, indent=4)

# Upsert data to Pinecone
for dir_name, summaries in data_repository.items():
    ids = [summary['file_name'] for summary in summaries]
    embeds = [embed(summary['summary']) for summary in summaries]  # Replace with your embedding function
    metadata = [trim_metadata(summary) for summary in summaries]
    to_upsert = zip(ids, embeds, metadata)
    index.upsert(vectors=to_upsert)

print('Data repository created and upserted to Pinecone successfully.')


AttributeError: init is no longer a top-level attribute of the pinecone package.

Please create an instance of the Pinecone class instead.

Example:

    import os
    from pinecone import Pinecone, ServerlessSpec

    pc = Pinecone(
        api_key=os.environ.get("PINECONE_API_KEY")
    )

    # Now do stuff
    if 'my_index' not in pc.list_indexes().names():
        pc.create_index(
            name='my_index', 
            dimension=1536, 
            metric='euclidean',
            spec=ServerlessSpec(
                cloud='aws',
                region='us-west-2'
            )
        )



In [29]:
from tqdm.auto import tqdm

batch_size = 100

for i in tqdm(range(0, len(docs), batch_size)):
    i_end = min(len(docs), i+batch_size)
    docs_batch = docs[i:i_end]
    # get IDs
    ids = [f"{doc.metadata['id']}-{doc.metadata['chunk-id']}" for doc in docs_batch]
    # get text and embed
    texts = [d.page_content for d in docs_batch]
    embeds = embed.embed_documents(texts=texts)
    # get metadata
    metadata = [d.metadata for d in docs_batch]
    to_upsert = zip(ids, embeds, metadata)
    index.upsert(vectors=to_upsert)

  0%|          | 0/1 [00:00<?, ?it/s]

PineconeApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Date': 'Tue, 11 Feb 2025 18:29:01 GMT', 'Content-Type': 'application/json', 'Content-Length': '116', 'Connection': 'keep-alive', 'x-pinecone-request-latency-ms': '328', 'x-pinecone-request-id': '8679767041873157683', 'x-envoy-upstream-service-time': '40', 'server': 'envoy'})
HTTP response body: {"code":3,"message":"Metadata size is 195140 bytes, which exceeds the limit of 40960 bytes per vector","details":[]}
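
The 400 response above reports 195,140 bytes of metadata for one vector, well over the 40,960-byte limit. Because the full document text is stored in the `text` metadata field, one way around this is to split long texts into smaller chunks before building the `Document` objects. The following is a rough sketch of that idea; `split_by_bytes` and the 8,000-byte budget are illustrative assumptions, not part of the original notebook.

```python
from typing import List


def split_by_bytes(text: str, max_bytes: int = 8000) -> List[str]:
    """Split text on whitespace into chunks whose UTF-8 size stays within max_bytes.

    A single word longer than max_bytes would still produce an oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_size = 0
    for word in text.split():
        word_size = len(word.encode("utf-8")) + 1  # +1 for the joining space
        if current and current_size + word_size > max_bytes:
            chunks.append(" ".join(current))
            current, current_size = [], 0
        current.append(word)
        current_size += word_size
    if current:
        chunks.append(" ".join(current))
    return chunks


long_text = "San Juan " * 30000  # 270,000 bytes, far above the metadata limit
chunks = split_by_bytes(long_text)
print(len(chunks), max(len(c.encode("utf-8")) for c in chunks))
```

Each chunk would then become its own `Document` with a distinct `chunk-id`, so that the ID built in the upsert loop stays unique per vector.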


## Multi-Query with LangChain

Now we switch to using our populated index as a vectorstore in LangChain.

In [18]:
from langchain.vectorstores import Pinecone

text_field = "text"

vectorstore = Pinecone(index, embed.embed_query, text_field)



In [19]:
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)



We initialize the `MultiQueryRetriever`:

In [20]:
from langchain.retrievers.multi_query import MultiQueryRetriever

retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(), llm=llm
)

We set logging so that we can see the queries as they're generated by our LLM.

In [21]:
# Set logging for the queries
import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

To query with our multi-query retriever we call the `get_relevant_documents` method.

In [23]:
question = "tell me about Puerto Rico?"

docs = retriever.get_relevant_documents(query=question)
len(docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What information can you provide about Puerto Rico?', 'What are some details about Puerto Rico that you can share?', 'Can you give me an overview of Puerto Rico?']


0

From a populated index, this returns a variety of docs retrieved by each of our queries independently. The expected behaviour is `3` docs for each query, totalling `9` documents, with overlap between queries reducing that to `6` unique docs. In the run shown here the count is `0`, most likely because the upsert above failed and the index is still empty.

In [25]:
docs

[]
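
The count described above (three queries, three docs each, six unique) comes from taking the union of the per-query results. The sketch below reproduces that arithmetic with made-up document IDs; it illustrates the idea and is not LangChain's internal code.

```python
from typing import List

# Made-up results for three generated queries, three document IDs each
results_per_query: List[List[str]] = [
    ["doc-1", "doc-2", "doc-3"],
    ["doc-2", "doc-4", "doc-5"],
    ["doc-1", "doc-5", "doc-6"],
]


def unique_union(result_lists: List[List[str]]) -> List[str]:
    """Merge result lists, keeping the first occurrence of each ID in order."""
    merged: List[str] = []
    seen = set()
    for results in result_lists:
        for doc_id in results:
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


total = sum(len(r) for r in results_per_query)
unique_docs = unique_union(results_per_query)
print(total, len(unique_docs))
```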

## Adding the Generation in RAG

So far we've built a multi-query powered **R**etrieval **A**ugmentation chain. Now, we need to add **G**eneration.

In [16]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

QA_PROMPT = PromptTemplate(
    input_variables=["query", "contexts"],
    template="""You are a helpful assistant who answers user queries using the
    contexts provided. If the question cannot be answered using the information
    provided say "I don't know".

    Contexts:
    {contexts}

    Question: {query}""",
)

# Chain
qa_chain = LLMChain(llm=llm, prompt=QA_PROMPT)

In [17]:
out = qa_chain(
    inputs={
        "query": question,
        "contexts": "\n---\n".join([d.page_content for d in docs])
    }
)
out["text"]

'Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. The fine-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc, are optimized for dialogue use cases. These models outperform open-source chat models on most benchmarks and are considered a suitable substitute for closed-source models based on humane evaluations for helpfulness and safety. The approach to fine-tuning and safety is described in detail.'

## Chaining Everything with a SequentialChain

We can pull the logic above together into a function or set of methods, whichever is preferred. However, if we'd like to use LangChain's approach to this we must "chain" together multiple chains. The first retrieval component is (1) not a chain per se, and (2) requires processing of its output. To handle both, and to fit with LangChain's "chaining chains" approach, we set up the _retrieval_ component within a `TransformChain`:

In [18]:
from langchain.chains import TransformChain

def retrieval_transform(inputs: dict) -> dict:
    docs = retriever.get_relevant_documents(query=inputs["question"])
    docs = [d.page_content for d in docs]
    docs_dict = {
        "query": inputs["question"],
        "contexts": "\n---\n".join(docs)
    }
    return docs_dict

retrieval_chain = TransformChain(
    input_variables=["question"],
    output_variables=["query", "contexts"],
    transform=retrieval_transform
)

Now we chain this with our generation step using the `SequentialChain`:

In [19]:
from langchain.chains import SequentialChain

rag_chain = SequentialChain(
    chains=[retrieval_chain, qa_chain],
    input_variables=["question"],  # we need to name differently to output "query"
    output_variables=["query", "contexts", "text"]
)

Then we perform the full RAG pipeline:

In [20]:
out = rag_chain({"question": question})
out["text"]

INFO:langchain.retrievers.multi_query:Generated queries: ['1. What information can you provide about llama 2?', '2. Could you give me some details about llama 2?', '3. I would like to learn more about llama 2. Can you help me with that?']


'Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. These LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc, are optimized for dialogue use cases. They have been shown to outperform open-source chat models on most benchmarks and are considered a suitable substitute for closed-source models based on humane evaluations for helpfulness and safety. The approach to fine-tuning and safety is described in detail in the work.'
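
Conceptually, a sequential chain threads one dictionary through each step, with every step reading the keys it needs and adding its outputs. The plain-Python sketch below mimics that data flow under stated assumptions: `fake_retrieval` and `fake_qa` are hypothetical stand-ins for the retrieval transform and the LLM call, and `run_sequential` is not LangChain's implementation.

```python
from typing import Callable, Dict, List

Step = Callable[[Dict[str, str]], Dict[str, str]]


def fake_retrieval(inputs: Dict[str, str]) -> Dict[str, str]:
    # Stand-in for retrieval_transform: renames "question" to "query"
    # and joins the retrieved passages into a single "contexts" string
    passages = ["passage one", "passage two"]
    return {"query": inputs["question"], "contexts": "\n---\n".join(passages)}


def fake_qa(inputs: Dict[str, str]) -> Dict[str, str]:
    # Stand-in for the LLM call: reports what it was given
    n = len(inputs["contexts"].split("\n---\n"))
    return {"text": f"Answered '{inputs['query']}' using {n} contexts"}


def run_sequential(steps: List[Step], inputs: Dict[str, str]) -> Dict[str, str]:
    """Pass a growing dictionary through each step in order."""
    state = dict(inputs)
    for step in steps:
        state.update(step(state))
    return state


out = run_sequential([fake_retrieval, fake_qa], {"question": "tell me about Puerto Rico?"})
print(out["text"])
```

This also shows why the input is named `question` rather than `query`: the first step produces `query` as an output, so the two keys have to differ.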

---

## Custom Multiquery

We'll try this with two prompts, both of which encourage more variety in the search queries.

**Prompt A**
```
Your task is to generate 3 different search queries that aim to
answer the user question from multiple perspectives.
Each query MUST tackle the question from a different viewpoint,
we want to get a variety of RELEVANT search results.
Provide these alternative questions separated by newlines.
Original question: {question}
```


**Prompt B**
```
Your task is to generate 3 different search queries that aim to
answer the user question from multiple perspectives. The user questions
are focused on Large Language Models, Machine Learning, and related
disciplines.
Each query MUST tackle the question from a different viewpoint, we
want to get a variety of RELEVANT search results.
Provide these alternative questions separated by newlines.
Original question: {question}
```
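
Whichever prompt is used, the LLM's reply has to be split back into individual queries. A minimal sketch of that step follows, assuming the model answers with one query per line and sometimes adds list numbering (as in the numbered queries logged earlier). `parse_queries` is an illustrative helper, not a LangChain function, and the reply text is made up.

```python
import re
from typing import List


def parse_queries(llm_output: str) -> List[str]:
    """Split newline-separated output into queries, dropping blanks and numbering."""
    queries: List[str] = []
    for line in llm_output.strip().split("\n"):
        # Remove leading list markers such as "1. ", "2) " or "- "
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries


# Made-up model reply for illustration
reply = """1. What are the must-see landmarks in Puerto Rico?

2. Which Puerto Rico beaches are best for a weekend trip?
3. What local food should visitors try in Puerto Rico?"""

queries = parse_queries(reply)
print(queries)
```

Stripping the numbering keeps tokens like `1.` out of the search queries, where they add nothing to the retrieval.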

In [1]:
pip install openai==0.28

Note: you may need to restart the kernel to use updated packages.


In [12]:
from typing import List
from langchain.chains import LLMChain
from pydantic import BaseModel, Field
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser


# Output parser will split the LLM result into a list of queries
class LineList(BaseModel):
    # "lines" is the key (attribute name) of the parsed output
    lines: List[str] = Field(description="Lines of text")


class LineListOutputParser(PydanticOutputParser):
    def __init__(self) -> None:
        super().__init__(pydantic_object=LineList)

    def parse(self, text: str) -> LineList:
        lines = text.strip().split("\n")
        return LineList(lines=lines)


output_parser = LineListOutputParser()

template = """
Your task is to generate 3 different search queries that aim to
answer the user question from multiple perspectives. The user questions
are focused on traveling to Puerto Rico and visiting the most iconic places in the island. You must make sure to provide a variety of search queries that cover different
options regarding the user itinerary.
Each query MUST tackle the question from a different viewpoint, we
want to get a variety of RELEVANT search results.
Provide these alternative questions separated by newlines.
Original question: {question}
"""

QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template=template,
)
from abc import ABC, abstractmethod

class Runnable(ABC):
    @abstractmethod
    def run(self, input_data):
        pass

import openai

class ChatOpenAI(Runnable):
    def __init__(self, temperature, openai_api_key):
        self.temperature = temperature
        self.openai_api_key = openai_api_key

    def run(self, input_data):
        openai.api_key = self.openai_api_key
        response = openai.Completion.create(
            engine="davinci",
            prompt=input_data,
            max_tokens=100
        )
        return response.choices[0].text.strip()

import os

user_input = "Tell me about Puerto Rico?"
# Read the API key from the environment rather than hardcoding it in the notebook
openai.api_key = os.getenv("OPENAI_API_KEY")
# Generate a response with the chat completions API (openai==0.28 interface)
response = openai.ChatCompletion.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Welcome to the Travel Agent Bot. I can help you plan your trip to Puerto Rico."},
        {"role": "user", "content": user_input}
    ]
)

# Extract the generated response from the API call
generated_response = response['choices'][0]['message']['content']
print(generated_response)


# Build the query-generation chain with the LangChain `llm` initialized earlier.
# The custom ChatOpenAI class defined above is not a LangChain model, so it
# cannot be passed to LLMChain.
llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)

Puerto Rico is a vibrant Caribbean island full of rich culture, stunning landscapes, and a blend of historical and modern attractions. Here are some key points that might interest you:

1. **Geography and Climate**: Puerto Rico is located in the northeastern Caribbean Sea. It boasts beautiful beaches, lush rainforests, and mountainous regions. The climate is tropical, with warm temperatures year-round, making it a perfect destination for sun-seekers and outdoor enthusiasts.

2. **Culture and History**: The island has a rich cultural heritage influenced by its Taíno, African, and Spanish roots. You'll find evidence of this diverse history in its music, dance, festivals, and cuisine. Historical sites such as Old San Juan, with its colonial architecture and forts like Castillo San Felipe del Morro, are must-see attractions.

3. **Language**: Spanish and English are the official languages, with Spanish being the most widely spoken. English is commonly spoken in tourist areas.

4. **Cuisine

In [1]:
# Run
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.chains import LLMChain
from langchain.chains import TransformChain
from langchain.chains import SequentialChain

# `vectorstore` is the LangChain Pinecone vector store created earlier,
# and `llm_chain` is the query-generation chain built from QUERY_PROMPT
retriever = MultiQueryRetriever(
    retriever=vectorstore.as_retriever(), llm_chain=llm_chain, parser_key="lines"
)  # "lines" is the key (attribute name) of the parsed output

# Results
docs = retriever.get_relevant_documents(
    query=question
)
len(docs)

In [23]:
docs

[Document(page_content='Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\nSergey Edunov Thomas Scialom\x03\nGenAI, Meta\nAbstract\nIn this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\nmodels outperform open-source chat models on most benchmarks we tested, and based on\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety', metadata={'chunk-id': '1', 'id': '2307.09288', 'source': 'http://arxiv.org/pdf/2307.09288', 'title': '

Putting this together in another `SequentialChain`:

In [2]:
retrieval_chain = TransformChain(
    input_variables=["question"],
    output_variables=["query", "contexts"],
    transform=retrieval_transform
)

rag_chain = SequentialChain(
    chains=[retrieval_chain, qa_chain],
    input_variables=["question"],  # we need to name differently to output "query"
    output_variables=["query", "contexts", "text"]
)

NameError: name 'retrieval_transform' is not defined

And asking again:

In [3]:
out = rag_chain({"question": question})
out["text"]

NameError: name 'rag_chain' is not defined

After finishing, delete your Pinecone index to save resources:

In [26]:
pc.delete_index(index_name)

---