## Key concepts
<ol>
    <li>An index is a data structure that organizes and stores documents so that they can be searched efficiently, while a retriever uses the index to find and return the documents relevant to a user query. In LangChain, most indexes are built on vector databases, and embedding-based indexes are the most common.</li>
    <li>Retrievers fetch relevant documents so that they can be combined with the prompt sent to a language model. A retriever exposes a get_relevant_documents method, which accepts a query string as input and returns a list of related documents.</li>
</ol>
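
To make the retriever interface concrete, here is a minimal sketch in plain Python. It is not LangChain's implementation: `ToyDocument` and `KeywordRetriever` are hypothetical names, and word overlap stands in for the embedding-based search used later in this notebook. Only the method name `get_relevant_documents` mirrors the real interface.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    # Hypothetical stand-in for a LangChain Document.
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Toy retriever: ranks documents by how many query words they contain."""

    def __init__(self, documents, k=2):
        self.documents = documents
        self.k = k

    def get_relevant_documents(self, query):
        query_words = set(query.lower().split())
        scored = []
        for doc in self.documents:
            overlap = len(query_words & set(doc.page_content.lower().split()))
            if overlap:
                scored.append((overlap, doc))
        # Highest overlap first; keep the top k.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[: self.k]]


toy_docs = [
    ToyDocument("Oppenheimer directed the Los Alamos Laboratory"),
    ToyDocument("The Trinity test took place in July 1945"),
    ToyDocument("Harvard is a university in Massachusetts"),
]
toy_retriever = KeywordRetriever(toy_docs)
toy_hits = toy_retriever.get_relevant_documents("When was the Trinity test")
print([doc.page_content for doc in toy_hits])
```

The query string goes in and a ranked list of documents comes out; the rest of this notebook swaps the word-overlap scoring for a vector store.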

In [1]:
from langchain.llms import OpenAI



The sample text used below is condensed from the <a href="https://en.wikipedia.org/wiki/J._Robert_Oppenheimer">Wikipedia article on J. Robert Oppenheimer</a>.

# Text Splitting

#### To process large datasets through indexes and retrievers, we first have to split the text into manageable pieces. An extremely large text cannot be tokenized and fed to the model in one shot.

<b>We split the text into smaller pieces because every LLM has a maximum number of 'tokens' (similar, but not exactly equal, to the number of words) that it can process in one call. If more tokens are provided than the model can process, the API returns an error that looks like this:</b><br>
`error: {
  message: "This model's maximum context length is 4097 tokens, however you requested 5360 tokens (1360 in your prompt; 4000 for the completion). Please reduce your prompt; or completion length.",`<br>
 `type: 'invalid_request_error',`<br>
  `param: null,`<br>
  `code: null`<br>
`}`
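
The arithmetic behind this error can be checked with a few lines of Python. This is only a sketch using the numbers quoted in the message above; `fits_context` is a hypothetical helper, not part of any library.

```python
def fits_context(prompt_tokens, completion_tokens, context_limit):
    """Return True if the prompt plus the requested completion fit in the context window."""
    return prompt_tokens + completion_tokens <= context_limit


# Numbers taken from the error message: 1360 prompt tokens, 4000 completion tokens, limit 4097.
requested = 1360 + 4000
print(requested)
print(fits_context(1360, 4000, 4097))

# Largest completion that would still fit alongside this prompt.
max_completion = 4097 - 1360
print(max_completion)
```

Shrinking either the prompt (by sending smaller chunks) or the requested completion length brings the total back under the limit.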

## Create split text from plain text

In [2]:
text='''Julius Robert Oppenheimer (April 22, 1904 – February 18, 1967) was an American theoretical physicist and director of the Manhattan Project's Los Alamos Laboratory during World War II. He is often called the "father of the atomic bomb".

Born in New York City, Oppenheimer earned a bachelor of arts degree in chemistry from Harvard University in 1925 and a doctorate in physics from the University of Göttingen in Germany in 1927, where he studied under Max Born. After research at other institutions, he joined the physics department at the University of California, Berkeley, where he became a full professor in 1936. He made significant contributions to theoretical physics, including achievements in quantum mechanics and nuclear physics such as the Born–Oppenheimer approximation for molecular wave functions, work on the theory of electrons and positrons, the Oppenheimer–Phillips process in nuclear fusion, and the first prediction of quantum tunneling. With his students, he also made contributions to the theory of neutron stars and black holes, quantum field theory, and the interactions of cosmic rays.

In 1942, Oppenheimer was recruited to work on the Manhattan Project, and in 1943 he was appointed director of the project's Los Alamos Laboratory in New Mexico, tasked with developing the first nuclear weapons. His leadership and scientific expertise were instrumental in the project's success. On July 16, 1945, he was present at the first test of the atomic bomb, Trinity. In August 1945, the weapons were used against Japan in the bombings of Hiroshima and Nagasaki, the only use to date of nuclear weapons in an armed conflict.

In 1947, Oppenheimer became the director of the Institute for Advanced Study in Princeton, New Jersey, and chaired the influential General Advisory Committee of the newly created U.S. Atomic Energy Commission. He lobbied for international control of nuclear power to avert nuclear proliferation and a nuclear arms race with the Soviet Union. He opposed the development of the hydrogen bomb during a 1949–1950 governmental debate on the question and subsequently took positions on defense-related issues that provoked the ire of some U.S. government and military factions. During the Second Red Scare, Oppenheimer's stances, together with his past associations with the Communist Party USA, led to the revocation of his security clearance following a 1954 security hearing. This effectively ended his access to the government's atomic secrets and thus his career as a nuclear physicist. Stripped also of his direct political influence, Oppenheimer continued to lecture, write, and work in physics. In 1963, he was awarded the Enrico Fermi Award as a gesture of political rehabilitation. He died four years later of throat cancer. In 2022, the federal government vacated the 1954 revocation of Oppenheimer's security clearance.


Manhattan Project
Los Alamos
On October 9, 1941, two months before the United States entered World War II, President Franklin D. Roosevelt approved a crash program to develop an atomic bomb. Lawrence brought Oppenheimer into the project on October 21. On May 18, 1942, [105] National Defense Research Committee Chairman James B. Conant, who had been one of Oppenheimer's lecturers at Harvard, asked Oppenheimer to take over work on fast neutron calculations, a task Oppenheimer threw himself into with full vigor. He was given the title "Coordinator of Rapid Rupture", which specifically referred to the propagation of a fast neutron chain reaction in an atomic bomb. One of his first acts was to host a summer school for bomb theory in Berkeley. The mix of European physicists and his own students—a group including Robert Serber, Emil Konopinski, Felix Bloch, Hans Bethe and Edward Teller—kept themselves busy by calculating what needed to be done, and in what order, to make the bomb.

In June 1942, the U.S. Army established the Manhattan Engineer District to handle its part in the atom bomb project, beginning the process of transferring responsibility from the Office of Scientific Research and Development to the military.In September, Brigadier General Leslie R. Groves, Jr., was appointed director of what became known as the Manhattan Project. By October 12, 1942, Groves and Oppenheimer had decided that for security and cohesion, they needed to establish a centralized, secret research laboratory in a remote location.

Groves selected Oppenheimer to head the project's secret weapons laboratory, although it is not known precisely when. This decision surprised many, because Oppenheimer had left-wing political views and no record as a leader of large projects. Groves worried that because Oppenheimer did not have a Nobel Prize, he might not have had the prestige to direct fellow scientists, but Groves was impressed by Oppenheimer's singular grasp of the practical aspects of the project and by the breadth of his knowledge. As a military engineer, Groves knew that this would be vital in an interdisciplinary project that would involve not just physics but also chemistry, metallurgy, ordnance, and engineering. Groves also detected in Oppenheimer something that many others did not, an "overweening ambition", which Groves reckoned would supply the drive necessary to push the project to a successful conclusion. Oppenheimer's past associations were not overlooked, but on July 20, 1943, Groves directed that he receive a security clearance "without delay irrespective of the information which you have concerning Mr Oppenheimer. He is absolutely essential to the project."Rabi considered Oppenheimer's appointment "a real stroke of genius on the part of General Groves, who was not generally considered to be a genius".

Oppenheimer favored a location for the laboratory in New Mexico, not far from his ranch. On November 16, 1942, he, Groves and others toured a prospective site. Oppenheimer feared that the high cliffs surrounding it would feel claustrophobic, and there was concern about possible flooding. He then suggested a site he knew well: a flat mesa near Santa Fe, New Mexico, which was the site of a private boys' school, the Los Alamos Ranch School. The engineers were concerned about the poor access road and the water supply but otherwise felt that it was ideal. The Los Alamos Laboratory was built on the site of the school, taking over some of its buildings, while many new buildings were erected in great haste. At the laboratory, Oppenheimer assembled a group of the top physicists of the time, whom he called the "luminaries".

Los Alamos was initially supposed to be a military laboratory, and Oppenheimer and other researchers were to be commissioned into the Army. He went so far as to order himself a lieutenant colonel's uniform and take the Army physical test, which he failed. Army doctors considered him underweight at 128 pounds (58 kg), diagnosed his chronic cough as tuberculosis, and were concerned about his chronic lumbosacral joint pain. The plan to commission scientists fell through when Rabi and Robert Bacher balked at the idea. Conant, Groves, and Oppenheimer devised a compromise whereby the University of California operated the laboratory under contract to the War Department. It soon turned out that Oppenheimer had hugely underestimated the magnitude of the project: Los Alamos grew from a few hundred people in 1943 to over 6,000 in 1945.

Oppenheimer at first had difficulty with the organizational division of large groups but rapidly learned the art of large-scale administration after he took up permanent residence at Los Alamos. He was noted for his mastery of all scientific aspects of the project and for his efforts to control the inevitable cultural conflicts between scientists and the military. Victor Weisskopf wrote:

Oppenheimer directed these studies, theoretical and experimental, in the real sense of the words. Here his uncanny speed in grasping the main points of any subject was a decisive factor; he could acquaint himself with the essential details of every part of the work.

He did not direct from the head office. He was intellectually and physically present at each decisive step. He was present in the laboratory or in the seminar rooms, when a new effect was measured, when a new idea was conceived. It was not that he contributed so many ideas or suggestions; he did so sometimes, but his main influence came from something else. It was his continuous and intense presence, which produced a sense of direct participation in all of us; it created that unique atmosphere of enthusiasm and challenge that pervaded the place throughout its time.


Trinity
The Trinity test was the first detonation of a nuclear device.
In the early morning hours of July 16, 1945, near Alamogordo, New Mexico, the work at Los Alamos culminated in the test of the world's first nuclear weapon. Oppenheimer had code-named the site "Trinity" in mid-1944, saying later that the name came from John Donne's Holy Sonnets; he had been introduced to Donne's work in the 1930s by Jean Tatlock, who killed herself in January 1944.

Brigadier General Thomas Farrell, who was present in the control bunker with Oppenheimer, recalled:

Dr. Oppenheimer, on whom had rested a very heavy burden, grew tenser as the last seconds ticked off. He scarcely breathed. He held on to a post to steady himself. For the last few seconds, he stared directly ahead and then when the announcer shouted "Now!" and there came this tremendous burst of light followed shortly thereafter by the deep growling roar of the explosion, his face relaxed into an expression of tremendous relief.

Oppenheimer's brother Frank recalled Oppenheimer's first words as, "I guess it worked".

External video
video icon Oppenheimer recalling his thoughts after witnessing the Trinity test
Two men, one in a suit and hat and the other in military uniform, stand in front of twisted metal whilst wearing white overshoes
Oppenheimer and Groves at the remains of the Trinity test tower. Oppenheimer is wearing his trademark pork pie hat; white overshoes protect against fallout.
According to a 1949 magazine profile, while witnessing the explosion Oppenheimer thought of verses from the Bhagavad Gita: "If the radiance of a thousand suns were to burst at once into the sky, that would be like the splendor of the mighty one ... Now I am become Death, the shatterer of worlds." In 1965 he recalled the moment this way:

We knew the world would not be the same. A few people laughed, a few people cried. Most people were silent. I remembered the line from the Hindu scripture, the Bhagavad Gita; Vishnu is trying to persuade the Prince that he should do his duty and, to impress him, takes on his multi-armed form and says, "Now I am become Death, the destroyer of worlds." I suppose we all thought that, one way or another.

Rabi described seeing Oppenheimer somewhat later: "I'll never forget his walk ... like High Noon ... this kind of strut. He had done it". At an assembly at Los Alamos on August 6, the evening of the atomic bombing of Hiroshima, Oppenheimer took to the stage and clasped his hands together "like a prize-winning boxer" while the crowd cheered. He expressed regret that the weapon was ready too late for use against Nazi Germany.

On August 17, however, Oppenheimer traveled to Washington to hand-deliver a letter to Secretary of War Henry L. Stimson expressing his revulsion and his wish to see nuclear weapons banned. In October he met with President Harry S. Truman, who dismissed Oppenheimer's concern about an arms race with the Soviet Union and belief that atomic energy should be under international control. Truman became infuriated when Oppenheimer said, "Mr. President, I feel I have blood on my hands", responding that he (Truman) bore sole responsibility for the decision to use atomic weapons against Japan, and later said, "I don't want to see that son of a bitch in this office ever again".

For his services as director of Los Alamos, Oppenheimer was awarded the Medal for Merit by Truman in 1946.
'''

### Create the CharacterTextSplitter

In [3]:
from langchain.text_splitter import CharacterTextSplitter

In [4]:
splitter=CharacterTextSplitter(chunk_size=1000,chunk_overlap=50)

`chunk_size` is the maximum number of characters a single chunk should contain, while `chunk_overlap` is the number of characters shared between two adjacent chunks. A higher overlap creates more chunks and increases processing time, but carries more context from one chunk into the next.
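
CharacterTextSplitter splits on a separator (a blank line, `"\n\n"`, by default) and then merges the pieces back together up to `chunk_size`. The sketch below is a simplified re-implementation of that idea in plain Python, not LangChain's actual code: it ignores `chunk_overlap`, and `split_on_separator` is a hypothetical name. It illustrates why a single paragraph longer than `chunk_size` ends up as an oversized chunk.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size characters.

    Simplified: no overlap handling, and a piece is never cut in the middle,
    so a single piece longer than chunk_size becomes an oversized chunk.
    """
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # may itself exceed chunk_size
    if current:
        chunks.append(current)
    return chunks


# Two short paragraphs and one long paragraph.
toy_text = "\n\n".join(["a" * 30, "b" * 30, "c" * 120])
toy_chunks = split_on_separator(toy_text, chunk_size=100)
print([len(chunk) for chunk in toy_chunks])
```

The two short paragraphs are merged into one chunk, while the 120-character paragraph stays whole even though it exceeds the limit of 100.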

### Split the text

In [5]:
from langchain.document_loaders import TextLoader

In [6]:
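# Note: TextLoader expects a file path, not raw text. Nothing is read until .load() is called,
# and this loader is not needed for split_text below.
text_loader=TextLoader(text)
print(text_loader)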

<langchain.document_loaders.text.TextLoader object at 0x00000250D8F4D850>


In [7]:
texts=splitter.split_text(text)

Created a chunk of size 1225, which is longer than the specified 1000
Created a chunk of size 1306, which is longer than the specified 1000


In [8]:
texts[0]

'Julius Robert Oppenheimer (April 22, 1904 – February 18, 1967) was an American theoretical physicist and director of the Manhattan Project\'s Los Alamos Laboratory during World War II. He is often called the "father of the atomic bomb".'

## Split text from a file rather than from copying / pasting the text in Jupyter

Instead of pasting long texts into a Jupyter cell, we can also load the text from a file and split it.

The CharacterTextSplitter cannot read a file directly; it needs a <b>document_loader</b> object (such as TextLoader) to load the file first.<br>
The loader returns Document objects, each with a <b>page_content</b> attribute that the CharacterTextSplitter reads when creating the split text.

### Load the file

In [9]:
from langchain.document_loaders import TextLoader

In [10]:
loader=TextLoader('oppenheimer.txt')

In [11]:
docs=loader.load()

In [12]:
docs

[Document(page_content='Julius Robert Oppenheimer (April 22, 1904 â€“ February 18, 1967) was an American theoretical physicist and director of the Manhattan Project\'s Los Alamos Laboratory during World War II. He is often called the "father of the atomic bomb".\n\nBorn in New York City, Oppenheimer earned a bachelor of arts degree in chemistry from Harvard University in 1925 and a doctorate in physics from the University of GÃ¶ttingen in Germany in 1927, where he studied under Max Born. After research at other institutions, he joined the physics department at the University of California, Berkeley, where he became a full professor in 1936. He made significant contributions to theoretical physics, including achievements in quantum mechanics and nuclear physics such as the Bornâ€“Oppenheimer approximation for molecular wave functions, work on the theory of electrons and positrons, the Oppenheimerâ€“Phillips process in nuclear fusion, and the first prediction of quantum tunneling. With h
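
The `â€“` and `Ã¶` sequences in this output look like an encoding mismatch: the file is most likely UTF-8 but was read with the platform's default encoding. The sketch below reproduces the effect in plain Python, assuming the default was Windows-1252 (`cp1252`); that codec is a guess, since the notebook does not say which one was used. `TextLoader` accepts an `encoding` argument, so `TextLoader('oppenheimer.txt', encoding='utf-8')` should avoid the problem.

```python
# A short sample containing an en dash and an o-umlaut, as in the source text.
original = "1904 \u2013 1967, G\u00f6ttingen"

# Encode as UTF-8, then decode with the wrong codec to reproduce the garbling.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)
print(len(original), len(garbled))

# Reversing the two steps recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

Each garbled character is longer than the original one, which may also be why the chunk-size warnings for the file differ slightly from those for the pasted string.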

### Create the CharacterTextSplitter

In [13]:
from langchain.text_splitter import CharacterTextSplitter

In [14]:
splitter=CharacterTextSplitter(chunk_size=1000,chunk_overlap=50)

In [15]:
texts_from_file=splitter.split_documents(docs)

Created a chunk of size 1227, which is longer than the specified 1000
Created a chunk of size 1306, which is longer than the specified 1000


## Analysing the difference in outputs between using a file as an input and using a text string as an input

In [16]:
print(type(texts_from_file))
print(texts_from_file[0])
print(type(texts_from_file[0]))

<class 'list'>
page_content='Julius Robert Oppenheimer (April 22, 1904 â€“ February 18, 1967) was an American theoretical physicist and director of the Manhattan Project\'s Los Alamos Laboratory during World War II. He is often called the "father of the atomic bomb".' metadata={'source': 'oppenheimer.txt'}
<class 'langchain.schema.Document'>


In [17]:
print(type(texts))
print(texts[0])
print(type(texts[0]))

<class 'list'>
Julius Robert Oppenheimer (April 22, 1904 – February 18, 1967) was an American theoretical physicist and director of the Manhattan Project's Los Alamos Laboratory during World War II. He is often called the "father of the atomic bomb".
<class 'str'>


Comparing the two modes (texts, obtained by splitting the string variable 'text' directly, and texts_from_file, obtained by splitting the text loaded from a file), we notice the following:
    <ul>
    <li>Both return a list of items.</li>
    <li>Each item in texts_from_file is a Document object (the same type as the items in the original <b>'docs'</b> list), while each item in texts is simply a string.</li>
    <li>The Document object has attributes such as page_content and metadata. Metadata holds data about the file and the content rather than the content itself. In this case, the metadata contains the source key, whose value is the file name.</li>
    </ul>
### Updating the metadata of a document

We can update the metadata of any chunk with any key-value pair.

#### Updating metadata for a single chunk

In [18]:
texts_from_file[0].metadata['Creator']='SK'

In [19]:
print(texts_from_file[0])

page_content='Julius Robert Oppenheimer (April 22, 1904 â€“ February 18, 1967) was an American theoretical physicist and director of the Manhattan Project\'s Los Alamos Laboratory during World War II. He is often called the "father of the atomic bomb".' metadata={'source': 'oppenheimer.txt', 'Creator': 'SK'}


#### Updating metadata for the entire document

To update the metadata for every chunk, we can update the metadata of the original Document in <b>docs</b> and split again; each chunk inherits the metadata of the document it came from.

In [20]:
docs[0].metadata['Creator']='Swaminathan Kannan'

In [21]:
texts_from_file=splitter.split_documents(docs)

Created a chunk of size 1227, which is longer than the specified 1000
Created a chunk of size 1306, which is longer than the specified 1000


In [22]:
for text_from_file in texts_from_file:
    print(text_from_file.metadata['Creator'])

Swaminathan Kannan
Swaminathan Kannan
Swaminathan Kannan
Swaminathan Kannan
Swaminathan Kannan
Swaminathan Kannan
Swaminathan Kannan
Swaminathan Kannan
Swaminathan Kannan
Swaminathan Kannan
Swaminathan Kannan
Swaminathan Kannan
Swaminathan Kannan
Swaminathan Kannan
Swaminathan Kannan
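
The propagation seen above can be sketched in plain Python. This is an illustration rather than LangChain's code: `SketchDocument` and `split_with_metadata` are hypothetical names, and the sketch assumes each chunk receives its own copy of the parent's metadata.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SketchDocument:
    # Hypothetical stand-in for a LangChain Document.
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_with_metadata(doc, separator="\n\n"):
    """Split a document on the separator; every chunk gets a copy of the parent's metadata."""
    return [
        SketchDocument(piece, copy.deepcopy(doc.metadata))
        for piece in doc.page_content.split(separator)
        if piece
    ]


parent = SketchDocument("first paragraph\n\nsecond paragraph", {"source": "oppenheimer.txt"})
parent.metadata["Creator"] = "SK"          # set once on the parent document
sketch_chunks = split_with_metadata(parent)
sketch_chunks[0].metadata["note"] = "only on chunk 0"  # set on a single chunk

for chunk in sketch_chunks:
    print(chunk.metadata)
```

Metadata set on the parent before splitting reaches every chunk, while metadata set on one chunk afterwards stays on that chunk.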


## Create the embeddings for the text

In [27]:
from langchain.embeddings import OpenAIEmbeddings

In [28]:
embeddings=OpenAIEmbeddings(model='text-embedding-ada-002')

# Creating the index and store

In [23]:
from langchain.vectorstores import DeepLake

<b>Before executing the following code, make sure your<br>
Activeloop key is saved in the "ACTIVELOOP_TOKEN" environment variable.<br><br>

The next cells create the Deep Lake dataset.<br>
Use your own organization id in the dataset path (by default, the organization id is your username).</b>
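
One way to fail early with a clear message is to check the environment variable before connecting. This is a small sketch; `require_env` is a hypothetical helper, and `DEMO_TOKEN` is a placeholder variable set here only so the example runs without a real key.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise a clear error if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Placeholder variable for demonstration only; in the notebook you would call
# require_env("ACTIVELOOP_TOKEN") instead.
os.environ["DEMO_TOKEN"] = "abc"
print(require_env("DEMO_TOKEN"))
```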

## Setting up the vector store on DeepLake

In [24]:
user_name='swamikannan'
project_name='langchain_course_indexers_retrievers'
link=f'hub://{user_name}/{project_name}'

In [30]:
db=DeepLake(dataset_path=link, embedding_function=embeddings)

Deep Lake Dataset in hub://swamikannan/langchain_course_indexers_retrievers already exists, loading from the storage


We also need to pass an embedding function. When we upload data to the db object, the text is stored as a numerical representation (an embedding). There are many such representations, each tied to the model that produces it, so the embedding function tells the db object which representation to use.

<img src="deeplake/empty.PNG">

The dataset is empty at this point, since we have not yet loaded our <b>texts_from_file</b> object. We do that now.

In [31]:
db.add_documents(texts_from_file)


Dataset(path='hub://swamikannan/langchain_course_indexers_retrievers', tensors=['embedding', 'id', 'metadata', 'text'])

  tensor      htype      shape      dtype  compression
  -------    -------    -------    -------  ------- 
 embedding  embedding  (15, 1536)  float32   None   
    id        text      (15, 1)      str     None   
 metadata     json      (15, 1)      str     None   
   text       text      (15, 1)      str     None   


 

['ab4ffdcd-40be-11ee-b60b-503eaa4cf025',
 'ab4ffdce-40be-11ee-98a7-503eaa4cf025',
 'ab4ffdcf-40be-11ee-b082-503eaa4cf025',
 'ab4ffdd0-40be-11ee-b8f9-503eaa4cf025',
 'ab4ffdd1-40be-11ee-9aea-503eaa4cf025',
 'ab4ffdd2-40be-11ee-8297-503eaa4cf025',
 'ab4ffdd3-40be-11ee-9092-503eaa4cf025',
 'ab4ffdd4-40be-11ee-bdca-503eaa4cf025',
 'ab4ffdd5-40be-11ee-a0b8-503eaa4cf025',
 'ab4ffdd6-40be-11ee-9f19-503eaa4cf025',
 'ab4ffdd7-40be-11ee-99ea-503eaa4cf025',
 'ab4ffdd8-40be-11ee-bac4-503eaa4cf025',
 'ab4ffdd9-40be-11ee-b229-503eaa4cf025',
 'ab4ffdda-40be-11ee-b51c-503eaa4cf025',
 'ab4ffddb-40be-11ee-9369-503eaa4cf025']

If we now look at the db object again, it looks as follows:
    <img src="deeplake/updated.PNG">

# Creating a retriever

## Create a retriever object

In [35]:
retriever=db.as_retriever()

## Create the LLM object

In [37]:
from langchain.llms import OpenAI

In [38]:
llm=OpenAI(model='text-davinci-003')

## Create a retrievalQA chain

Like an LLMChain, there are dedicated <b>retrieval chains</b> that are used to retrieve information from the index store.

In [34]:
from langchain.chains import RetrievalQA

In [43]:
retrievalQA=RetrievalQA.from_chain_type(llm=llm,retriever=retriever,verbose=True, return_source_documents=False)

In [44]:
q1='Who approved the establishment of the Manhattan Project?'
a1=retrievalQA.run(q1)
print(a1)



> Entering new chain...

> Finished chain.
 President Franklin D. Roosevelt


If we want the source documents (with their metadata) and the original query as part of the response, we can set `return_source_documents=True`. However, `run()` only works for chains with a single output, so it cannot return both the actual response and the source_documents; we call the chain with a dictionary instead.

In [45]:
retrievalQA2=RetrievalQA.from_chain_type(llm=llm,retriever=retriever,verbose=True, return_source_documents=True)

In [47]:
q2='Explain the Manhattan project in less than 100 words'
a2=retrievalQA2({'query':q2})
print(a2)



> Entering new chain...

> Finished chain.
{'query': 'Explain the Manhattan project in less than 100 words', 'result': ' The Manhattan Project was a research and development program undertaken by the US in 1942 to develop the first nuclear weapons. It was led by General Leslie Groves, with J. Robert Oppenheimer in charge of the Los Alamos Laboratory in New Mexico where the weapons were developed. On July 16, 1945, the first atomic bomb was successfully tested, and two bombs were dropped on Japan in August 1945.', 'source_documents': [Document(page_content='Manhattan Project\nLos Alamos\nOn October 9, 1941, two months before the United States entered World War II, President Franklin D. Roosevelt approved a crash program to develop an atomic bomb. Lawrence brought Oppenheimer into the project on October 21. On May 18, 1942, [105] National Defense Research Committee Chairman James B. Conant, who had been one of Oppenheimer\'s lecturers at Harvard, asked Oppenheimer to t

In [65]:
a2.keys()

dict_keys(['query', 'result', 'source_documents'])

Hence, it now returns three outputs instead of just the result:
    <ol>
    <li> The original query </li>
    <li> The actual result </li>
    <li> The source documents, each with its metadata </li>
    </ol>

In [56]:
result2=a2['result']

In [64]:
print(f'Number of characters:\t{len(result2)}')
print(f"Number of words:\t{len(result2.split(' '))}")

Number of characters:	396
Number of words:	69
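
One caveat on the word count: the result string starts with a space (visible in the printed dictionary above), and `split(' ')` counts the empty string before that space as a token, so 69 may be one higher than the actual number of words. A small sketch with a made-up sample string shows the difference between `split(' ')` and `split()`.

```python
# Made-up sample with a leading space, like the LLM result above.
sample = " The Manhattan Project was a research program"

tokens_single_space = sample.split(' ')   # keeps an empty string for the leading space
tokens_whitespace = sample.split()        # drops leading/trailing whitespace

print(len(tokens_single_space), tokens_single_space[0] == '')
print(len(tokens_whitespace))
```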


In conclusion:
<ul>
    <li>A similarity search is conducted using the embeddings to identify matching documents from the db to be used as context for the LLM.</li>
    <li>Although it might not seem particularly useful with just one document, we are effectively working with multiple documents since we "chunked" our text.</li>
    <li>Preselecting the most suitable documents based on semantic similarity enables us to provide the model with meaningful knowledge through the prompt while remaining within the allowed context size.</li>
    </ul>
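
The similarity search described above can be sketched with toy vectors. This is an illustration only: the three-dimensional vectors and chunk names are made up (the real embeddings in this notebook have 1536 dimensions), and cosine similarity is used as one common choice; the metric Deep Lake actually applies may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the names of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in chunk_vecs.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Made-up embeddings for three chunks and one query.
toy_vectors = {
    "chunk_trinity": [0.9, 0.1, 0.0],
    "chunk_harvard": [0.0, 0.2, 0.9],
    "chunk_los_alamos": [0.7, 0.6, 0.1],
}
toy_query = [1.0, 0.2, 0.0]

ranking = top_k(toy_query, toy_vectors)
print(ranking)
```

Only the top-ranked chunks are placed in the prompt, which is how the chain stays within the context size.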

# Contextual Compression Retrievers

The text splitter cuts the text body into pieces of a fixed length (1000 characters with a 50-character overlap), regardless of the different topics within it. If we could use a separate tool to first condense the retrieved chunks before producing the final output, we could get better quality, since each condensed chunk would contain only a small set of closely related sentences.

First, we need a <b>document compressor such as the LLMChainExtractor</b>. Compressors aim to make it easy to pass <b>only the relevant information to the LLM</b>.<br> This also lets you pass more information along to the LLM overall: in the initial retrieval step you can focus on <b>recall</b> (e.g., by increasing the number of documents returned) and let the compressor handle <b>precision</b>.

## Creating the Compressor object

In [70]:
from langchain.retrievers.document_compressors import LLMChainExtractor

In [72]:
compressor=LLMChainExtractor.from_llm(llm=llm)

## Creating the Retriever

In [73]:
retriever = db.as_retriever()

We need a <b>retriever</b> that combines the compressor we have created with the original retriever object created from the db.<br> The <b>ContextualCompressionRetriever</b> does this by wrapping the base retriever with the LLMChainExtractor, which iterates over the initially returned documents and extracts only the content relevant to the query.
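
The retrieve-then-compress flow can be sketched without any LLM. Everything here is a stand-in: `toy_base_retriever`, `toy_extractor` and `compressed_retrieve` are hypothetical names, and simple keyword matching replaces the LLM call that LLMChainExtractor makes.

```python
def toy_base_retriever(query, chunks):
    """Stand-in for the vector-store retriever: returns every chunk (high recall)."""
    return list(chunks)


def toy_extractor(query, chunk):
    """Stand-in for the compressor: keep only sentences that share a keyword with the query."""
    keywords = {word.strip("?.,").lower() for word in query.split() if len(word) > 4}
    kept = []
    for sentence in chunk.split("."):
        sentence = sentence.strip()
        words = {word.strip(",").lower() for word in sentence.split()}
        if sentence and keywords & words:
            kept.append(sentence)
    return ". ".join(kept)


def compressed_retrieve(query, chunks):
    """Retrieve broadly, compress each chunk, and drop chunks with nothing relevant left."""
    compressed = (toy_extractor(query, chunk) for chunk in toy_base_retriever(query, chunks))
    return [text for text in compressed if text]


toy_chunks_ccr = [
    "The Trinity test happened in 1945. Oppenheimer wore a pork pie hat.",
    "Harvard awarded him a degree in chemistry.",
]
compressed_docs = compressed_retrieve("What happened at the Trinity test?", toy_chunks_ccr)
print(compressed_docs)
```

The base step favours recall by returning everything, and the extraction step restores precision by trimming each chunk and discarding the ones that end up empty.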

## Creating the ContextualCompressionRetriever

In [67]:
from langchain.retrievers import ContextualCompressionRetriever

In [117]:
ccr=ContextualCompressionRetriever(base_retriever=retriever, base_compressor=compressor)

## Running our query

In [84]:
response_ccr=ccr.get_relevant_documents("What was Oppenheimer's reaction to the Trinity test?")
print(response_ccr)



[Document(page_content='"I guess it worked" and "Now I am become Death, the shatterer of worlds."', metadata={'source': 'oppenheimer.txt', 'Creator': 'Swaminathan Kannan'}), Document(page_content='"In the early morning hours of July 16, 1945, near Alamogordo, New Mexico, the work at Los Alamos culminated in the test of the world\'s first nuclear weapon." \n"Dr. Oppenheimer, on whom had rested a very heavy burden, grew tenser as the last seconds ticked off. He scarcely breathed. He held on to a post to steady himself. For the last few seconds, he stared directly ahead and then when the announcer shouted "Now!" and there came this tremendous burst of light followed shortly thereafter by the deep growling roar of the explosion, his face relaxed into an expression of tremendous relief."', metadata={'source': 'oppenheimer.txt', 'Creator': 'Swaminathan Kannan'}), Document(page_content='"In 1942, Oppenheimer was recruited to work on the Manhattan Project, ... tasked with developing the first 

In [85]:
for rccr in response_ccr:
    print(rccr.page_content+'\n')

"I guess it worked" and "Now I am become Death, the shatterer of worlds."

"In the early morning hours of July 16, 1945, near Alamogordo, New Mexico, the work at Los Alamos culminated in the test of the world's first nuclear weapon." 
"Dr. Oppenheimer, on whom had rested a very heavy burden, grew tenser as the last seconds ticked off. He scarcely breathed. He held on to a post to steady himself. For the last few seconds, he stared directly ahead and then when the announcer shouted "Now!" and there came this tremendous burst of light followed shortly thereafter by the deep growling roar of the explosion, his face relaxed into an expression of tremendous relief."

"In 1942, Oppenheimer was recruited to work on the Manhattan Project, ... tasked with developing the first nuclear weapons. ... On July 16, 1945, he was present at the first test of the atomic bomb, Trinity."

"In October he met with President Harry S. Truman, who dismissed Oppenheimer's concern about an arms race with the Sovi

## Comparing it to our original RetrievalQA chain

In [86]:
a3=retrievalQA2({'query':"What was Oppenheimer's reaction to the Trinity test?"})
print(a3)



> Entering new chain...

> Finished chain.
{'query': "What was Oppenheimer's reaction to the Trinity test?", 'result': ' Oppenheimer\'s first words were, "I guess it worked", and he reportedly had a relieved expression on his face. He also thought of verses from the Bhagavad Gita, "If the radiance of a thousand suns were to burst at once into the sky, that would be like the splendor of the mighty one ... Now I am become Death, the shatterer of worlds."', 'source_documents': [Document(page_content='Oppenheimer\'s brother Frank recalled Oppenheimer\'s first words as, "I guess it worked".\n\nExternal video\nvideo icon Oppenheimer recalling his thoughts after witnessing the Trinity test\nTwo men, one in a suit and hat and the other in military uniform, stand in front of twisted metal whilst wearing white overshoes\nOppenheimer and Groves at the remains of the Trinity test tower. Oppenheimer is wearing his trademark pork pie hat; white overshoes protect against fallout.\n

In [92]:
print(a3['result'])

 Oppenheimer's first words were, "I guess it worked", and he reportedly had a relieved expression on his face. He also thought of verses from the Bhagavad Gita, "If the radiance of a thousand suns were to burst at once into the sky, that would be like the splendor of the mighty one ... Now I am become Death, the shatterer of worlds."


In [100]:
for reference in a3['source_documents']:
    # Print the chunk text, then the metadata attached to it
    print(reference.page_content)
    print(reference.metadata)

Oppenheimer's brother Frank recalled Oppenheimer's first words as, "I guess it worked".

External video
video icon Oppenheimer recalling his thoughts after witnessing the Trinity test
Two men, one in a suit and hat and the other in military uniform, stand in front of twisted metal whilst wearing white overshoes
Oppenheimer and Groves at the remains of the Trinity test tower. Oppenheimer is wearing his trademark pork pie hat; white overshoes protect against fallout.
According to a 1949 magazine profile, while witnessing the explosion Oppenheimer thought of verses from the Bhagavad Gita: "If the radiance of a thousand suns were to burst at once into the sky, that would be like the splendor of the mighty one ... Now I am become Death, the shatterer of worlds." In 1965 he recalled the moment this way:
{'source': 'oppenheimer.txt', 'Creator': 'Swaminathan Kannan'}
Trinity
The Trinity test was the first detonation of a nuclear device.
In the early morning hours of July 16, 1945, near Alamogo

### Analysis of the comparison

If we compare the two, we notice the following:
    <ol>
    <li> The reference material (the <b>source documents</b>) for the ContextualCompressionRetriever is a compressed version of the original text, while the standard RetrievalQA chain used the entire chunks as reference. </li>
    <li>The ContextualCompressionRetriever returns far more material. This includes Oppenheimer's exchange with Truman and his immediate reaction to the explosion, which the original RetrievalQA chain leaves out.<br> If we look at the relevant chunks in the output of the cell below, we see that one chunk covers several events: the hand delivery of the letter to the Secretary of War, the subsequent meeting with President Truman at the end of the chunk, and Truman's reaction. This could explain why the RetrievalQA chain omitted this reaction.</li>
    <li>However, the fourth document in the ContextualCompressionRetriever output contains reactions that are not relevant to the Trinity test. This could be a result of how the extraction was done.</li>
    </ol>

In [127]:
for c, chunk in enumerate(texts_from_file):
    if 'Truman' in chunk.page_content or 'on whom had rested a very heavy burden' in chunk.page_content:
        print(f'Chunk:{c},Content:{chunk.page_content}\n')

Chunk:11,Content:Trinity
The Trinity test was the first detonation of a nuclear device.
In the early morning hours of July 16, 1945, near Alamogordo, New Mexico, the work at Los Alamos culminated in the test of the world's first nuclear weapon. Oppenheimer had code-named the site "Trinity" in mid-1944, saying later that the name came from John Donne's Holy Sonnets; he had been introduced to Donne's work in the 1930s by Jean Tatlock, who killed herself in January 1944.

Brigadier General Thomas Farrell, who was present in the control bunker with Oppenheimer, recalled:

Dr. Oppenheimer, on whom had rested a very heavy burden, grew tenser as the last seconds ticked off. He scarcely breathed. He held on to a post to steady himself. For the last few seconds, he stared directly ahead and then when the announcer shouted "Now!" and there came this tremendous burst of light followed shortly thereafter by the deep growling roar of the explosion, his face relaxed into an expression of tremendous 