## MultiQuery Retrieval

In [40]:
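# Community loaders live in langchain_community; the langchain.document_loaders path is deprecated.
from langchain_community.document_loaders import WikipediaLoader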
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from dotenv import load_dotenv

In [18]:
load_dotenv()

True

In [5]:
# Search Wikipedia for the query and load the matching pages as Document objects.
loader = WikipediaLoader(query='MKUltra')
documents = loader.load()

In [6]:
len(documents)

24

In [7]:
# Text splitters are packaged separately as langchain_text_splitters.
from langchain_text_splitters import CharacterTextSplitter

In [8]:
# Split on the default separator (a blank line), measuring chunk length in tiktoken tokens.
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=500)
docs = text_splitter.split_documents(documents)

Created a chunk of size 525, which is longer than the specified 500
Created a chunk of size 542, which is longer than the specified 500
Created a chunk of size 694, which is longer than the specified 500
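
The warnings above appear because the splitter only cuts at its separator, so a single paragraph longer than the limit is kept whole. The following is a minimal sketch of that merge-on-separator behaviour, not LangChain's implementation: `split_on_separator` and `count_words` are hypothetical names, and length is counted in whitespace-separated words rather than tiktoken tokens to keep the example dependency-free.

```python
def count_words(text):
    # Stand-in length function (assumption): the notebook counts tiktoken tokens instead.
    return len(text.split())


def split_on_separator(text, chunk_size, separator="\n\n", length=count_words):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size.

    A single piece that is already longer than chunk_size cannot be cut,
    so it is emitted as an oversized chunk.
    """
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current = [], []
    for piece in pieces:
        candidate = separator.join(current + [piece])
        if current and length(candidate) > chunk_size:
            # Adding this piece would overflow: close the current chunk first.
            chunks.append(separator.join(current))
            current = [piece]
        else:
            current.append(piece)
    if current:
        chunks.append(separator.join(current))
    oversized = [c for c in chunks if length(c) > chunk_size]
    return chunks, oversized


long_paragraph = " ".join(["w"] * 12)
sample = "a b c\n\n" + long_paragraph + "\n\nd e"
chunks, oversized = split_on_separator(sample, chunk_size=5)
for chunk in oversized:
    print(f"Created a chunk of size {count_words(chunk)}, which is longer than the specified 5")
```

If hard limits matter, `RecursiveCharacterTextSplitter`, which falls back to finer separators, is one way to keep chunks within the limit.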


In [14]:
print(docs[0].page_content)

Project MKUltra was a human experimentation program designed and undertaken by the U.S. Central Intelligence Agency (CIA) to develop procedures and identify drugs that could be used during interrogations to weaken individuals and force confessions through brainwashing and psychological torture. The term MKUltra is a CIA cryptonym: "MK" is an arbitrary prefix standing for the Office of Technical Service and "Ultra" is an arbitrary word out of a dictionary used to name this project. The program has been widely condemned as a violation of individual rights and an example of the CIA's abuse of power, with critics highlighting its disregard for consent and its corrosive impact on democratic principles.
Project MKUltra began in 1953 and was halted in 1973. MKUltra used numerous methods to manipulate its subjects' mental states and brain functions, such as the covert administration of high doses of psychoactive drugs (especially LSD) and other chemicals without the subjects' consent. Addition

In [19]:
embedding_function = OpenAIEmbeddings()

In [21]:
from langchain_chroma import Chroma

In [22]:
# Embed every chunk and store the vectors in a Chroma collection persisted to ./mkultra.
db = Chroma.from_documents(documents=docs,
                           embedding=embedding_function,
                           persist_directory="./mkultra")

In [26]:
from langchain.retrievers.multi_query import MultiQueryRetriever

In [27]:
question = "When was this declassified?"

In [28]:
# LLM that rephrases the question into several alternative search queries.
model = ChatOpenAI(model="gpt-4o-mini-2024-07-18",
                   max_tokens=300,
                   temperature=0.7)

In [30]:
# Wrap the vector-store retriever so the LLM first rewrites the question into several queries.
retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=db.as_retriever(),
    llm=model,
)

In [32]:
# Log the queries that the retriever generates for each question.
import logging
logging.basicConfig()
logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)

In [35]:
# Run every generated query against the vector store and collect the de-duplicated results.
unique_docs = retriever_from_llm.invoke(input=question)

INFO:langchain.retrievers.multi_query:Generated queries: ['When was this document made public after declassification?  ', 'What is the date on which this information was officially released?  ', 'Can you tell me the timeline for when this was declassified?']
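
The log line above shows the three rephrasings the LLM produced. Conceptually, the retriever then runs each query against the vector store and returns the union of the results with duplicates removed. Below is a toy sketch of that merge step, assuming a keyword-overlap stand-in for the vector search; `toy_retrieve`, `unique_union`, and the three-sentence corpus are illustrative and not part of LangChain.

```python
def toy_retrieve(query, corpus, k=1):
    # Stand-in for a vector search (assumption): rank documents by shared words.
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def unique_union(result_lists):
    # Merge per-query results, keeping the first occurrence of each document.
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


corpus = [
    "church committee report published 1976",
    "mkultra began 1953 halted 1973",
    "documents declassified released 2001",
]
# Hypothetical rephrasings, playing the role of the LLM-generated queries.
queries = [
    "when was this declassified",
    "when were the documents released",
    "what year was the report published",
]

per_query = [toy_retrieve(q, corpus) for q in queries]
merged = unique_union(per_query)
print(merged)
```

Two of the three queries hit the same document, so the merged list is shorter than the number of queries. Different phrasings can also surface documents that the original wording alone would have missed.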


In [39]:
print(unique_docs[0].page_content)

== Background ==
By the early years of the 1970s, a series of troubling reports had appeared in the press concerning U.S. intelligence activities. First came revelations by Army intelligence officer Christopher Pyle in January 1970 regarding the US Army's spying on the civilian population in the United States. Senator Sam Ervin's investigations of military surveillance produced further revelations. 
Then on December 22, 1974, The New York Times published a lengthy front-page article by investigative journalist Seymour Hersh detailing covert activities engaged in by the Central Intelligence Agency under Operation CHAOS to collect information on the political activities of American citizens.
The resulting uproar led to the creation of the Church Committee, which was approved by the Senate on January 27, 1975, on a vote of 82 to 4.


== Overview ==
The Church Committee's final report was published in April 1976 in six books. Also published were seven volumes of Church Committee hearings i