In [1]:
from langchain.document_loaders import TextLoader

# text to write to a local file
# taken from https://www.theverge.com/2023/3/14/23639313/google-ai-language-model-palm-api-challenge-openai
text = """Google opens up its AI language model PaLM to challenge OpenAI and GPT-3
Google is offering developers access to one of its most advanced AI language models: PaLM.
The search giant is launching an API for PaLM alongside a number of AI enterprise tools
it says will help businesses “generate text, images, code, videos, audio, and more from
simple natural language prompts.”

PaLM is a large language model, or LLM, similar to the GPT series created by OpenAI or
Meta’s LLaMA family of models. Google first announced PaLM in April 2022. Like other LLMs,
PaLM is a flexible system that can potentially carry out all sorts of text generation and
editing tasks. You could train PaLM to be a conversational chatbot like ChatGPT, for
example, or you could use it for tasks like summarizing text or even writing code.
(It’s similar to features Google also announced today for its Workspace apps like Google
Docs and Gmail.)
"""

# write text to local file
with open("my_file.txt", "w") as file:
    file.write(text)

# use TextLoader to load text from local file
loader = TextLoader("my_file.txt")
docs_from_file = loader.load()

print(len(docs_from_file))

1
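The `1` above is the number of documents returned: a text loader wraps the whole file in a single document. The following plain-Python sketch reproduces that shape without LangChain. `SimpleDocument` and `load_text_file` are hypothetical names, and the `"source"` metadata key is an assumption modeled on what LangChain's `TextLoader` records.

```python
# Hypothetical sketch of what a text loader returns; SimpleDocument and
# load_text_file are illustrative names, not LangChain APIs.
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Text plus a metadata dict, mirroring the shape of a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path: str, encoding: str = "utf-8") -> list[SimpleDocument]:
    """Read the whole file and wrap it in a single document."""
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": path})]


# usage: write a small file to a temporary directory and load it back
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = str(Path(tmp_dir) / "my_file.txt")
    Path(file_path).write_text("PaLM is a large language model.\n", encoding="utf-8")
    loaded_docs = load_text_file(file_path)

print(len(loaded_docs))  # one file -> one document
print(loaded_docs[0].metadata["source"].endswith("my_file.txt"))
```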


Next, we use CharacterTextSplitter to split the loaded documents into smaller chunks.

In [2]:
from langchain.text_splitter import CharacterTextSplitter

# create a text splitter (chunk_size and chunk_overlap are measured in characters)
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=20)

# split documents into chunks
docs = text_splitter.split_documents(docs_from_file)

print(len(docs))

Created a chunk of size 373, which is longer than the specified 200


2
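The log line above appears because CharacterTextSplitter first splits the text on a separator (by default `"\n\n"`, i.e. paragraph breaks) and then merges the pieces back into chunks of at most `chunk_size` characters. A paragraph that is longer than `chunk_size` on its own cannot be split further, so it becomes an oversized chunk. Our article has two paragraphs, both longer than 200 characters, which is why we get two chunks. The sketch below is a simplified, hypothetical re-implementation of that split-and-merge idea (`split_and_merge` is our name, and LangChain's actual code differs in its details), so treat it as an illustration rather than a reference.

```python
# Hypothetical, simplified split-and-merge splitter; not LangChain's implementation.
def split_and_merge(text: str, separator: str = "\n\n",
                    chunk_size: int = 200, chunk_overlap: int = 20) -> list[str]:
    """Split on `separator`, then greedily merge pieces into chunks of at most
    `chunk_size` characters, carrying trailing pieces (up to `chunk_overlap`
    characters) over into the next chunk."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    sep_len = len(separator)
    chunks: list[str] = []
    current: list[str] = []
    total = 0  # length of separator.join(current)

    def emit() -> None:
        if total > chunk_size:
            # a single piece longer than chunk_size cannot be split any further
            print(f"Created a chunk of size {total}, "
                  f"which is longer than the specified {chunk_size}")
        chunks.append(separator.join(current))

    for piece in pieces:
        extra = len(piece) + (sep_len if current else 0)
        if current and total + extra > chunk_size:
            emit()
            # drop pieces from the front until the rest fits the overlap budget
            while current and (total > chunk_overlap
                               or total + sep_len + len(piece) > chunk_size):
                total -= len(current[0]) + (sep_len if len(current) > 1 else 0)
                current.pop(0)
            extra = len(piece) + (sep_len if current else 0)
        current.append(piece)
        total += extra
    if current:
        emit()
    return chunks


# a 30-character paragraph cannot fit into a 10-character chunk
chunks = split_and_merge("x" * 30 + "\n\n" + "y" * 5, chunk_size=10, chunk_overlap=2)
print([len(c) for c in chunks])
```

With small pieces, the overlap shows up as a piece repeated at the start of the next chunk: `split_and_merge("aaaa\n\nbbbb\n\ncc", chunk_size=10, chunk_overlap=4)` yields `["aaaa\n\nbbbb", "bbbb\n\ncc"]`.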


In [3]:
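from langchain_community.embeddings import OllamaEmbeddings

# embed text with the llama2 model served by a local Ollama instance
embeddings = OllamaEmbeddings(model="llama2")

In [None]:
# sanity check: embed a short string and get back its vector (a list of floats)
embeddings.embed_query("example")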

In [6]:
import os

from langchain_community.vectorstores import DeepLake

# Before executing the following code, save your Activeloop API token in the
# ACTIVELOOP_TOKEN environment variable, or pass it directly via the `token` argument.

# create Deep Lake dataset
# TODO: use your organization id here. (by default, org id is your username)
my_activeloop_org_id = "georgeasro"
my_activeloop_dataset_name = "langchain_course_indexes"
dataset_path = f"hub://{my_activeloop_org_id}/{my_activeloop_dataset_name}"

api_token = os.environ.get("ACTIVELOOP_TOKEN")
# `embedding` replaces the deprecated `embedding_function` parameter
db = DeepLake(dataset_path=dataset_path, embedding=embeddings, token=api_token)

# add documents to our Deep Lake dataset
db.add_documents(docs)


Your Deep Lake dataset has been successfully created!


Creating 2 embeddings in 1 batches of size 2:: 100%|██████████| 1/1 [00:30<00:00, 30.44s/it]

Dataset(path='hub://georgeasro/langchain_course_indexes', tensors=['text', 'metadata', 'embedding', 'id'])

  tensor      htype      shape     dtype  compression
  -------    -------    -------   -------  ------- 
   text       text      (2, 1)      str     None   
 metadata     json      (2, 1)      str     None   
 embedding  embedding  (2, 4096)  float32   None   
    id        text      (2, 1)      str     None   





['bb470be6-ce6f-11ee-a8ec-00155d99c3d9',
 'bb470c5e-ce6f-11ee-a8ec-00155d99c3d9']
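Deep Lake stores one row per chunk with the `text`, `metadata`, `embedding` and `id` tensors shown in the summary, and `add_documents` returns the new ids (which look like time-based, version-1 UUIDs). The class below is a hypothetical, dependency-free miniature of that idea: `ToyVectorStore`, `toy_embed` and `cosine` are our names, and the bag-of-words vectors only stand in for the 4096-dimensional llama2 embeddings used above.

```python
# Hypothetical in-memory stand-in for a vector store such as Deep Lake.
# toy_embed is a bag-of-words "embedding", NOT a real embedding model.
import math
import re
import uuid
from collections import Counter
from typing import Optional


def toy_embed(text: str) -> dict[str, float]:
    """Sparse, L2-normalised word-count vector stored as {word: weight}."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # both vectors are unit length, so the dot product is the cosine similarity
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


class ToyVectorStore:
    """Keeps the same four fields as the Deep Lake tensors: text, metadata, embedding, id."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.rows: list[dict] = []

    def add_texts(self, texts: list[str], metadatas: Optional[list[dict]] = None) -> list[str]:
        metadatas = metadatas or [{} for _ in texts]
        ids = []
        for text, meta in zip(texts, metadatas):
            row_id = str(uuid.uuid1())  # time-based (version-1) UUID
            self.rows.append({"text": text, "metadata": meta,
                              "embedding": self.embed(text), "id": row_id})
            ids.append(row_id)
        return ids

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        """Return the texts of the k rows most similar to the query."""
        q = self.embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row["embedding"]), reverse=True)
        return [row["text"] for row in ranked[:k]]


store = ToyVectorStore()
ids = store.add_texts(["Google launches an API for PaLM.",
                       "Meta released the LLaMA family of models."])
print(len(ids))  # 2
print(store.similarity_search("PaLM API from Google", k=1))
# ['Google launches an API for PaLM.']
```

Real vector stores typically replace the linear scan in `similarity_search` with an index, but the contract is the same: embed the query, rank the stored rows by similarity, return the top `k`.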

Next, we create a retriever from the Deep Lake database and use it for question answering.

In [7]:
# create retriever from db
retriever = db.as_retriever()

In [8]:
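from langchain.chains import RetrievalQA
from langchain_community.llms import Ollama

# create a retrieval QA chain: the retriever fetches relevant chunks and the
# "stuff" chain type inserts all of them into a single prompt for the LLM
qa_chain = RetrievalQA.from_chain_type(
    llm=Ollama(model="llama2"),
    chain_type="stuff",
    retriever=retriever,
)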

In [9]:
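query = "How does Google plan to challenge OpenAI?"
# `run` is deprecated: pass the input under the "query" key and read the answer from "result"
response = qa_chain.invoke({"query": query})
print(response["result"])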


Based on the context provided, it appears that Google plans to challenge OpenAI by offering developers access to one of its most advanced AI language models, PaLM, through an API. This allows businesses to generate text, images, code, videos, audio, and more from simple natural language prompts. While PaLM is similar to the GPT series created by OpenAI and Meta's LLaMA family of models, it is a flexible system that can potentially carry out all sorts of text generation and editing tasks, such as conversational chatbot tasks or summarizing text. This challenge to OpenAI's dominance in the AI language model space could lead to increased competition and innovation in the field.


### What occurred behind the scenes?
First, we used a so-called "stuff" chain (see LangChain's CombineDocuments chains). Stuffing is the simplest way to supply information to the LLM: all the retrieved documents are inserted ("stuffed") into a single prompt. However, this only works when the documents are short enough, because every LLM has a limited context length.
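In code, stuffing amounts to string concatenation plus a size check. The sketch below is hypothetical: `stuff_prompt`, its template and the character-based limit are our assumptions (LangChain's stuff chain uses its own prompt, and real context limits are measured in tokens, not characters).

```python
# Hypothetical sketch of "stuffing": every retrieved chunk goes into ONE prompt.
# The template wording and the character limit are assumptions for illustration.
PROMPT_TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def stuff_prompt(chunks: list[str], question: str, max_chars: int = 4000) -> str:
    """Join all chunks into a single prompt; fail if it exceeds the budget."""
    context = "\n\n".join(chunks)
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    if len(prompt) > max_chars:
        raise ValueError(
            f"prompt is {len(prompt)} characters, limit is {max_chars}: "
            "stuffing only works when all documents fit in the context"
        )
    return prompt


prompt = stuff_prompt(
    ["PaLM is a large language model.", "Google is launching an API for PaLM."],
    "How does Google plan to challenge OpenAI?",
)
print(prompt)
```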

In addition, before the prompt is built, a similarity search over the embeddings identifies the documents that best match the query, and only those are used as context for the LLM. This might not seem useful when we loaded just one document, but because we split ("chunked") the text, we are effectively working with multiple documents. Preselecting the chunks that are most semantically similar to the query lets us give the model relevant knowledge through the prompt while staying within its context size.
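Selecting context under a size limit can be sketched as greedy packing. This is a hypothetical illustration, not what `db.as_retriever()` does: the LangChain retriever returns a fixed number (`k`) of the most similar chunks, whereas `select_context` (our name) keeps adding the best-scoring chunks until a character budget is used up. The scores are given directly here; in practice they would come from the similarity search.

```python
# Hypothetical sketch: choose context chunks by similarity score under a size budget.
def select_context(scored_chunks: list[tuple[float, str]], budget_chars: int) -> list[str]:
    """Greedily keep the highest-scoring chunks whose total length fits budget_chars."""
    selected: list[str] = []
    used = 0
    for _score, chunk in sorted(scored_chunks, key=lambda sc: sc[0], reverse=True):
        if used + len(chunk) <= budget_chars:
            selected.append(chunk)
            used += len(chunk)
        # a chunk that does not fit is skipped; smaller, lower-ranked ones may still fit
    return selected


scored = [(0.91, "A" * 50), (0.20, "B" * 10), (0.75, "C" * 60)]
print([c[0] for c in select_context(scored, budget_chars=100)])  # ['A', 'B']
```

Skipping a chunk that does not fit, rather than stopping at the first one, lets a short but less similar chunk use the remaining space; whether that is desirable is a design choice.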

## A Potential Problem
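This method has a downside: when we store the data, we don't know which questions will be asked later, so we can't be sure the documents are split in a way that lets us retrieve exactly the right content. In the Q&A example, we split the text mechanically by size and separators, without regard to meaning, so a retrieved chunk can contain both useful and useless text for a given question.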

Including unrelated information in the LLM prompt is detrimental because:

- It can divert the LLM's focus from pertinent details.
- It occupies valuable space that could be utilized for more relevant information.

## Possible Solution
LangChain addresses this issue with a **DocumentCompressor** abstraction: its compress_documents method takes the retrieved documents together with the query and returns a compressed version of them.
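The contract is small: a compressor receives the retrieved documents together with the query and returns a (usually shorter) list. The sketch below expresses that contract as a `typing.Protocol` and implements it with a toy keyword filter instead of an LLM. All names here are hypothetical, and the documents are plain strings rather than LangChain `Document` objects.

```python
# Hypothetical sketch of the compressor idea in plain Python.
# DocumentCompressorLike and KeywordSentenceFilter are illustrative names; only the
# method name compress_documents(documents, query) follows the text above.
import re
from typing import Protocol, Sequence

STOPWORDS = frozenset({"a", "an", "the", "is", "of", "to", "how", "does"})


class DocumentCompressorLike(Protocol):
    def compress_documents(self, documents: Sequence[str], query: str) -> list[str]:
        ...


def content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


class KeywordSentenceFilter:
    """Toy compressor: keeps sentences that share a content word with the query
    and drops documents in which no sentence survives."""

    def compress_documents(self, documents: Sequence[str], query: str) -> list[str]:
        query_words = content_words(query)
        compressed = []
        for doc in documents:
            sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
            kept = [s for s in sentences if content_words(s) & query_words]
            if kept:
                compressed.append(" ".join(kept))
        return compressed


toy_compressor: DocumentCompressorLike = KeywordSentenceFilter()
sample_docs = [
    "Google is launching an API for PaLM. The weather was sunny.",
    "Meta released LLaMA.",
]
print(toy_compressor.compress_documents(sample_docs, "How does Google plan to challenge OpenAI?"))
# ['Google is launching an API for PaLM.']
```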

The **ContextualCompressionRetriever** is a wrapper around another retriever in LangChain. It takes a base retriever and a DocumentCompressor and automatically compresses the retrieved documents from the base retriever. This means that only the most relevant parts of the retrieved documents are returned, given a specific query.
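The wrapper pattern itself fits in a few lines: retrieve with the base retriever, then compress the results with respect to the same query. The sketch below is hypothetical (`CompressingRetriever`, `fixed_retriever` and `keep_matching_sentences` are our names); its attribute names only mirror the `base_retriever` and `base_compressor` arguments passed to ContextualCompressionRetriever in the next cell, and plain functions stand in for a real retriever and compressor.

```python
# Hypothetical sketch of the retrieve-then-compress wrapper; not LangChain's class.
from typing import Callable, Sequence

Retriever = Callable[[str], list[str]]                   # query -> documents
Compressor = Callable[[Sequence[str], str], list[str]]   # (documents, query) -> documents


class CompressingRetriever:
    """Delegates retrieval to a base retriever, then compresses the results
    with respect to the same query."""

    def __init__(self, base_retriever: Retriever, base_compressor: Compressor):
        self.base_retriever = base_retriever
        self.base_compressor = base_compressor

    def invoke(self, query: str) -> list[str]:
        docs = self.base_retriever(query)
        return self.base_compressor(docs, query) if docs else []


def fixed_retriever(query: str) -> list[str]:
    # stand-in base retriever that ignores the query and returns two chunks
    return ["PaLM is offered through an API. Unrelated text.", "Nothing relevant here."]


def keep_matching_sentences(docs: Sequence[str], query: str) -> list[str]:
    # stand-in compressor: keep sentences containing any query term
    terms = [t.lower() for t in query.split()]
    out = []
    for doc in docs:
        hits = [s.strip() for s in doc.split(".") if any(t in s.lower() for t in terms)]
        if hits:
            out.append(". ".join(hits) + ".")
    return out


toy_retriever = CompressingRetriever(fixed_retriever, keep_matching_sentences)
print(toy_retriever.invoke("PaLM API"))  # ['PaLM is offered through an API.']
```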

A popular compressor choice is the **LLMChainExtractor**, which uses an LLMChain to extract from each document only the statements relevant to the query. Wrapping the base retriever in a ContextualCompressionRetriever with an LLMChainExtractor therefore improves retrieval in two steps: the base retriever returns candidate documents, and the LLMChainExtractor then iterates over them and keeps only the content relevant to the query.
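LLM-based extraction can be sketched as one model call per retrieved document, with a sentinel answer meaning "nothing relevant here". The sketch below is hypothetical: the prompt wording, the `NO_OUTPUT` sentinel handling, `extract_relevant` and `fake_llm` are our assumptions modeled loosely on the idea behind LLMChainExtractor, and `fake_llm` is a deterministic stand-in so the code runs without a model. Note that this approach costs one extra LLM call per retrieved document.

```python
# Hypothetical sketch of LLM-based extraction; not LLMChainExtractor's actual prompt or code.
from typing import Callable

NO_OUTPUT = "NO_OUTPUT"
EXTRACT_PROMPT = (
    "Given the question and context below, copy out any part of the context "
    "that is relevant to the question, verbatim. If nothing is relevant, "
    "reply NO_OUTPUT.\n\nQuestion: {question}\n\nContext:\n{context}\n\nRelevant parts:"
)


def extract_relevant(documents: list[str], query: str, llm: Callable[[str], str]) -> list[str]:
    """Call the LLM once per document; drop documents for which it finds nothing."""
    extracted = []
    for doc in documents:
        answer = llm(EXTRACT_PROMPT.format(question=query, context=doc)).strip()
        if answer and answer != NO_OUTPUT:
            extracted.append(answer)
    return extracted


def fake_llm(prompt: str) -> str:
    # stand-in for a real model: "extracts" the first context line mentioning PaLM
    context = prompt.split("Context:\n", 1)[1].split("\n\nRelevant parts:", 1)[0]
    for line in context.splitlines():
        if "PaLM" in line:
            return line
    return NO_OUTPUT


sample_chunks = ["Google is launching an API for PaLM.\nThe office has a new cafe.",
                 "Meta released LLaMA."]
print(extract_relevant(sample_chunks, "How does Google plan to challenge OpenAI?", fake_llm))
# ['Google is launching an API for PaLM.']
```

A real model will not always follow such an instruction exactly: in the run below, llama2 returns a short preamble and a bulleted paraphrase rather than verbatim excerpts.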

Here's an example of how to use ContextualCompressionRetriever with LLMChainExtractor:

In [10]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# create the LLM wrapper (llama2 served by Ollama); temperature=0 makes extraction more deterministic
llm = Ollama(model="llama2", temperature=0)

# create compressor for the retriever
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
	base_compressor=compressor,
	base_retriever=retriever
)

In [11]:
# retrieving compressed documents
retrieved_docs = compression_retriever.invoke(
    "How does Google plan to challenge OpenAI?"
)
print(retrieved_docs[0].page_content)



The following parts of the context are relevant to the question:

* Google's AI language model PaLM being made available to challenge OpenAI and GPT-3.
* Google launching an API for PaLM alongside AI enterprise tools.
* The ability of PaLM to generate text, images, code, videos, audio, and more from simple natural language prompts.
