# Multi-Query Retriever

In [2]:
import os
import warnings
from dotenv import load_dotenv
from typing import Dict, List
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import WikipediaLoader
from langchain_community.document_loaders import TextLoader
from langchain.chat_models import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain.vectorstores import Milvus

warnings.filterwarnings("ignore")

In [3]:
# Load settings from .env; the "key" variable is expected to hold the OpenAI API key
load_dotenv(".env")
api_key = os.environ.get("key")

In [4]:
loader = WikipediaLoader(query='MKUltra')
documents = loader.load()

In [5]:
len(documents)

24

In [6]:
# chunk_size is measured in tiktoken tokens here, not in characters
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=500)
docs = text_splitter.split_documents(documents)

Created a chunk of size 526, which is longer than the specified 500
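
The warning above appears because `CharacterTextSplitter` splits only on a single separator (a blank line by default) and then merges neighbouring pieces up to `chunk_size`; a piece that is already too long is kept whole. The toy splitter below is a minimal sketch of that behaviour, not the library's implementation: `split_and_merge` is a hypothetical helper, and length is counted in words rather than tiktoken tokens to keep it self-contained.

```python
def split_and_merge(text, chunk_size, separator="\n\n",
                    length=lambda s: len(s.split())):
    """Toy splitter: split on one separator, then greedily merge pieces."""
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks, current = [], []
    for piece in pieces:
        # A single piece is never split further, so it can exceed chunk_size.
        if length(piece) > chunk_size:
            print(f"Created a chunk of size {length(piece)}, "
                  f"which is longer than the specified {chunk_size}")
        candidate = separator.join(current + [piece])
        if current and length(candidate) > chunk_size:
            # Adding this piece would overflow: close the current chunk.
            chunks.append(separator.join(current))
            current = [piece]
        else:
            current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks


text = "one two three\n\nfour five\n\nsix seven eight nine ten eleven\n\ntwelve"
chunks = split_and_merge(text, chunk_size=5)
for chunk in chunks:
    print(len(chunk.split()), repr(chunk))
```

With a limit of 5 words, the first two paragraphs merge into one chunk, while the six-word paragraph is emitted as an oversized chunk of its own. If strict size limits matter, a splitter that falls back to finer separators is the usual alternative.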


In [7]:
len(docs)

53

In [8]:
embeddingModel = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-V2")

In [9]:
embeddingModel

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Normalize()
), model_name='all-MiniLM-L6-V2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False)
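
The repr above shows a three-stage pipeline: a transformer that produces one vector per token, mean pooling (`pooling_mode_mean_tokens: True`), and a final `Normalize()` step. The sketch below illustrates the last two stages on tiny hand-made vectors. It assumes 2-dimensional token vectors instead of the model's 384 dimensions, and `mean_pool` and `l2_normalize` are hypothetical helpers, not sentence-transformers functions.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors, ignoring padded positions (mask == 0)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vec):
    """Scale the vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Two real tokens and one padding token (illustrative values only).
token_vectors = [[1.0, 2.0], [3.0, 2.0], [9.0, 9.0]]
attention_mask = [1, 1, 0]

sentence_vector = l2_normalize(mean_pool(token_vectors, attention_mask))
print(sentence_vector)
```

Because the output has unit length, the dot product of two such sentence vectors equals their cosine similarity.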

In [10]:
db = Milvus.from_documents(
    docs,
    embeddingModel,
    connection_args={
        "host": "192.168.43.163",
        "port": "19530",
    },
    collection_name='wikipedia',
)

In [11]:
db.similarity_search("What is decalssified", k = 1)[0].page_content

'"Truth serum" is a colloquial name for any of a range of psychoactive drugs used in an effort to obtain information from subjects who are unable or unwilling to provide it otherwise. These include ethanol, scopolamine, 3-quinuclidinyl benzilate, midazolam, flunitrazepam, sodium thiopental, and amobarbital, among others.\nAlthough a variety of such substances have been tested, serious issues have been raised about their use scientifically, ethically and legally. There is currently no drug proven to cause consistent or predictable enhancement of truth-telling. Subjects questioned under the influence of such substances have been found to be suggestible and their memories subject to reconstruction and fabrication. When such drugs have been used in the course of investigating civil and criminal cases, they have not been accepted by Western legal systems and legal experts as genuine investigative tools. In the United States, it has been suggested that their use is a potential violation of t

In [16]:
llm = ChatOpenAI(api_key=api_key)

In [17]:
retriver_from_llm = MultiQueryRetriever.from_llm(
    retriever=db.as_retriever(),
    llm=llm,
)

In [18]:
import logging
logging.basicConfig()
logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)

In [19]:
retriver_from_llm.get_relevant_documents('What is decalssified')

INFO:langchain.retrievers.multi_query:Generated queries: ['1. Can you provide information on the process of declassification?', '2. What does the term "declassified" mean and how does it relate to classified information?', '3. Could you explain the significance of declassifying information and its impact on transparency and historical understanding?']


[Document(page_content='Unethical human experimentation is human experimentation that violates the principles of medical ethics. Such practices have included denying patients the right to informed consent, using pseudoscientific frameworks such as race science, and torturing people under the guise of research. Around World War II, Imperial Japan and Nazi Germany carried out brutal experiments on prisoners and civilians through groups like Unit 731 or individuals like Josef Mengele; the Nuremberg Code was developed after the war in response to the Nazi experiments. Countries have carried out brutal experiments on marginalized populations. Examples include American abuses during Project MKUltra and the Tuskegee syphilis experiments, and the mistreatment of indigenous populations in Canada and Australia. The Declaration of Helsinki, developed by the World Medical Association (WMA), is widely regarded as the cornerstone document on human research ethics.\n\n\n== Nazi Germany ==', metadata=
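
The log line shows the LLM rewriting the single question into three variants. `MultiQueryRetriever` then runs the base retriever once per variant and returns the unique union of the results. The sketch below shows only that merge step under simplifying assumptions: `multi_query_retrieve`, `toy_index`, and `toy_search` are hypothetical names, documents are plain strings, and no LLM or vector store is involved.

```python
def multi_query_retrieve(queries, search):
    """Run `search` for every query and return the unique union, in order."""
    seen, merged = set(), []
    for query in queries:
        for doc in search(query):
            if doc not in seen:  # drop documents already returned earlier
                seen.add(doc)
                merged.append(doc)
    return merged


# Toy stand-in for a vector store: query -> ranked document ids.
toy_index = {
    "process of declassification": ["doc_a", "doc_b"],
    "meaning of declassified": ["doc_b", "doc_c"],
    "impact of declassifying": ["doc_a", "doc_d"],
}


def toy_search(query):
    return toy_index.get(query, [])


merged = multi_query_retrieve(list(toy_index), toy_search)
print(merged)
```

The union can surface documents that a single phrasing would miss, at the cost of one LLM call plus one retrieval per generated query.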

In [20]:
retriver_from_llm.get_relevant_documents('Can you provide information on the process of declassification?')[0].page_content

INFO:langchain.retrievers.multi_query:Generated queries: ['1. What are the steps involved in the declassification process?', '2. Could you explain how the declassification process works?', '3. Can you outline the procedures followed during the declassification process?']


"Project MKUltra was an illegal human experimentation program designed and undertaken by the U.S. Central Intelligence Agency (CIA) and intended to develop procedures and identify drugs that could be used during interrogations to weaken people and force confessions through brainwashing and psychological torture. It began in 1953 and was halted in 1973. MKUltra used numerous methods to manipulate its subjects' mental states and brain functions, such as the covert administration of high doses of psychoactive drugs (especially LSD) and other chemicals without the subjects' consent, electroshocks, hypnosis, sensory deprivation, isolation, verbal and sexual abuse, and other forms of torture.MKUltra was preceded by Project Artichoke. It was organized through the CIA's Office of Scientific Intelligence and coordinated with the United States Army Biological Warfare Laboratories. The program engaged in illegal activities, including the use of U.S. and Canadian citizens as unwitting test subject

# Context Compression

In [21]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

In [23]:
compressor = LLMChainExtractor.from_llm(llm)
compressor_retriver = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=db.as_retriever(),
)

In [28]:
print(compressor_retriver.base_compressor.llm_chain.prompt)

input_variables=['context', 'question'] output_parser=NoOutputParser() template='Given the following question and context, extract any part of the context *AS IS* that is relevant to answer the question. If none of the context is relevant return NO_OUTPUT. \n\nRemember, *DO NOT* edit the extracted parts of the context.\n\n> Question: {question}\n> Context:\n>>>\n{context}\n>>>\nExtracted relevant parts:'
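
The printed prompt explains how the extractor compresses: each retrieved document is placed in `{context}`, the user query in `{question}`, and the LLM either returns the relevant passages verbatim or the sentinel `NO_OUTPUT`. The sketch below mimics that loop with the template text copied from the output above. `compress` and `fake_llm` are hypothetical stand-ins (the fake model just checks for a keyword), so this illustrates the control flow rather than the library's code.

```python
TEMPLATE = (
    "Given the following question and context, extract any part of the "
    "context *AS IS* that is relevant to answer the question. If none of "
    "the context is relevant return NO_OUTPUT. \n\n"
    "Remember, *DO NOT* edit the extracted parts of the context.\n\n"
    "> Question: {question}\n> Context:\n>>>\n{context}\n>>>\n"
    "Extracted relevant parts:"
)


def compress(question, contexts, call_llm):
    """Keep only the extracted text of contexts the model deems relevant."""
    kept = []
    for context in contexts:
        prompt = TEMPLATE.format(question=question, context=context)
        reply = call_llm(prompt).strip()
        if reply and reply != "NO_OUTPUT":  # sentinel means "drop this doc"
            kept.append(reply)
    return kept


def fake_llm(prompt):
    """Stand-in for a chat model: echo the context if it mentions the topic."""
    context = prompt.split(">>>\n")[1].strip()
    return context if "declassified" in context.lower() else "NO_OUTPUT"


contexts = ["Report A was declassified.", "Report B covers chemistry."]
compressed = compress("What was declassified?", contexts, fake_llm)
print(compressed)
```

Only the first toy context survives; the second is dropped because the stand-in model answers `NO_OUTPUT`.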


In [29]:
docs = db.similarity_search("What is decalssified", k = 1)

In [30]:
docs[0]

Document(page_content='"Truth serum" is a colloquial name for any of a range of psychoactive drugs used in an effort to obtain information from subjects who are unable or unwilling to provide it otherwise. These include ethanol, scopolamine, 3-quinuclidinyl benzilate, midazolam, flunitrazepam, sodium thiopental, and amobarbital, among others.\nAlthough a variety of such substances have been tested, serious issues have been raised about their use scientifically, ethically and legally. There is currently no drug proven to cause consistent or predictable enhancement of truth-telling. Subjects questioned under the influence of such substances have been found to be suggestible and their memories subject to reconstruction and fabrication. When such drugs have been used in the course of investigating civil and criminal cases, they have not been accepted by Western legal systems and legal experts as genuine investigative tools. In the United States, it has been suggested that their use is a po

In [35]:
compressed_docs = compressor_retriver.get_relevant_documents('What was this decalssified')

In [36]:
compressed_docs[0].page_content

'Operation Midnight Climax was established in order to study the effects of LSD on non-consenting individuals. Prostitutes on the CIA payroll were instructed to lure clients back to the safehouses, where they were surreptitiously plied with a wide range of substances, including LSD, and monitored behind one-way glass. The prostitutes were instructed in the use of post-coital questioning to investigate whether the victims could be convinced to involuntarily reveal secrets. The victims were sometimes fed subliminal messages in attempts to induce them to involuntary actions, including criminal activity such as robbery, assault, and assassination. Many of the CIA operatives involved in the experiments voluntarily indulged in the drugs and prostitutes for recreational purposes. Additionally, information from Wilmington News Journal on October 15, 1978, reports from a FOIA request that, "the spy agency purchased two pounds of Yohimbine hydrochloride... by Dr. Robert V. Lashbrook, the chief a

In [38]:
compressed_docs[0].metadata['summary']

'Operation Midnight Climax was an operation carried out by the CIA as a sub-project of Project MKUltra, the mind-control research program that began in the 1950s. It was initially established in 1954 by Sidney Gottlieb and placed under the direction of the Federal Bureau of Narcotics in Boston, Massachusetts with the "Federal Narcotics Agent and CIA consultant" George Hunter White under the pseudonym of Morgan Hall. Dr. Sidney Gottlieb was a chemist who was chief of the Chemical Division of the Office of Technical Service of the CIA. Gottlieb based his plan for Project MKUltra and Operation Midnight Climax off of interrogation method research under Project Artichoke. Unlike Project Artichoke, Operation Midnight Climax gave Gottlieb permission to test drugs on unknowing citizens, which made way for the legacy of this operation. Hundreds of federal agents, field operatives, and scientists worked on these programs before they were shut down in the 1960s.'