# Optimizing queries about cryptocurrency through similarity search using LangChain

***In this project, I load a local PDF as the source document, index it with FAISS, and use LangChain to run similarity searches and contextual compression, so that I can ask free-form questions about its contents.***

Source link: https://documents1.worldbank.org/curated/en/293821525702130886/pdf/Cryptocurrencies-and-blockchain.pdf

In [131]:
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import PyPDFLoader
from langchain.docstore import InMemoryDocstore
from langchain.vectorstores import FAISS
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
import numpy as np
import faiss

In [141]:
# Pipeline overview:
#   load document -> split into chunks
#   embed chunks  -> vectors
#   store vectors -> FAISS index
#   query         -> similarity search against the FAISS index
with open("openai api.txt") as f:
    api_key = f.read().strip()  # drop stray whitespace/newlines around the key
loader = PyPDFLoader("Cryptocurrencies-and-blockchain.pdf")
documents = loader.load()
# Chunk length is measured in tiktoken tokens
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=500)
docs = text_splitter.split_documents(documents)
embedding_function = OpenAIEmbeddings(api_key=api_key)

In [143]:
len(docs)

107

In [165]:
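# Reload the PDF and re-split it, this time with larger chunks (750 tokens instead of 500)
with open("openai api.txt") as f:
    api_key = f.read().strip()  # drop stray whitespace/newlines around the key
loader = PyPDFLoader("Cryptocurrencies-and-blockchain.pdf")
documents = loader.load()
# Split documents into chunks
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=750)
docs = text_splitter.split_documents(documents)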
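
To make the splitting step concrete, here is a minimal sketch of the greedy merge idea behind a separator-based splitter: split the text on a separator, then pack the pieces into chunks until a size budget is reached. It is an illustration only, not LangChain's implementation. `split_into_chunks` is a hypothetical helper, it ignores chunk overlap, and it counts whitespace-separated words as a stand-in for the tiktoken token counts that `from_tiktoken_encoder` uses.

```python
def split_into_chunks(text, chunk_size, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most
    chunk_size words (a piece longer than chunk_size becomes its own chunk)."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks, current, current_len = [], [], 0
    for piece in pieces:
        piece_len = len(piece.split())
        # Start a new chunk if adding this piece would exceed the budget
        if current and current_len + piece_len > chunk_size:
            chunks.append(separator.join(current))
            current, current_len = [], 0
        current.append(piece)
        current_len += piece_len
    if current:
        chunks.append(separator.join(current))
    return chunks


# Made-up sample text with four paragraphs of 3, 2, 4 and 1 words
sample = "one two three\n\nfour five\n\nsix seven eight nine\n\nten"
small = split_into_chunks(sample, chunk_size=5)
large = split_into_chunks(sample, chunk_size=10)
print(len(small), len(large))
```

A larger budget yields fewer, longer chunks, which is the expected effect of moving from `chunk_size=500` to `chunk_size=750` in the cell above.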

In [189]:
# Create embedding function
embedding_function = OpenAIEmbeddings(api_key=api_key)

# Extract text content from docs
texts = [doc.page_content for doc in docs]

# Embed chunks into vectors
embeddings = np.array(embedding_function.embed_documents(texts)).astype('float32')
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(embeddings)

# Create a document store keyed by string IDs
docstore = InMemoryDocstore({str(i): docs[i] for i in range(len(docs))})

# Map each row position in the FAISS index to its docstore ID
index_to_docstore_id = {i: str(i) for i in range(len(docs))}

# Wrap the raw index in a LangChain FAISS vector store
db_connection = FAISS(
    index=index,
    docstore=docstore,
    index_to_docstore_id=index_to_docstore_id,
    embedding_function=embedding_function,
)
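
`faiss.IndexFlatL2` performs an exact, brute-force search: it compares the query vector against every stored vector and ranks them by squared Euclidean distance. The sketch below reproduces that idea in plain Python on made-up 3-dimensional vectors (real embeddings have far more dimensions). `l2_search` is a hypothetical helper written for this illustration, not part of FAISS.

```python
def l2_search(vectors, query, k):
    """Return the k nearest vectors as (squared_distance, row_index) pairs."""
    scored = []
    for row, vec in enumerate(vectors):
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, row))
    scored.sort()  # smallest distance first
    return scored[:k]


# Toy "embeddings": one row per chunk (values are invented)
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
]
toy_query = [1.0, 0.2, 0.0]

nearest = l2_search(toy_vectors, toy_query, k=2)
rows = [row for _, row in nearest]
print(rows)
```

In the cell above, the row positions returned by the index are what `index_to_docstore_id` translates into docstore keys, which is how a nearest vector leads back to its chunk of text.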

In [195]:
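# Initialize the LLM
llm = ChatOpenAI(api_key=api_key)

# Multi-query retriever: the LLM rewrites the question into several variants
# (built here for reference; it is not used further in this cell)
retriever_from_llm = MultiQueryRetriever.from_llm(retriever=db_connection.as_retriever(), llm=llm)

question = 'Give me a summary of Turkey in the year of 2018'

# Plain similarity search: the 2 chunks nearest to the question
# (kept for comparison; it is not used further in this cell)
retrieved_documents = db_connection.similarity_search(question, k=2)

# Contextual compression: fetch chunks with the base retriever, then let the
# LLM extract only the passages that are relevant to the question
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=db_connection.as_retriever(),
)

compressed_docs = compression_retriever.get_relevant_documents(question)

print(compressed_docs[0].page_content if compressed_docs else "No compressed documents found.")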

For 2018, economic growth is projected at 
4.7 percent, gradually converging to a 
potential rate of around 4.5 -5 percent. 
Recent surveys point to a moderation in 
consumer demand, weighed down by 
rising costs and declining real wages. Rap-
id credit expansion has increased credit 
risk and raised lending rates, pointing to a 
slowdown in credit growth in 2018.
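
The last cell chains two steps: a base retriever fetches candidate chunks, and a compressor then trims each chunk down to the parts that bear on the question. The sketch below mimics that control flow without any LLM or API call. Everything in it is a stand-in with made-up text: `keyword_retriever` ranks by shared words instead of embeddings, and `sentence_extractor` keeps sentences that share a word with the question, whereas `LLMChainExtractor` asks the model to pick the relevant passages.

```python
import re


def words(text):
    """Lowercased set of alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_retriever(chunks, question, k=2):
    """Stand-in base retriever: rank chunks by words shared with the question."""
    q = words(question)
    ranked = sorted(chunks, key=lambda c: len(words(c) & q), reverse=True)
    return ranked[:k]


def sentence_extractor(chunk, question):
    """Stand-in compressor: keep only sentences sharing a word with the question."""
    q = words(question)
    sentences = [s.strip() for s in chunk.split(".") if s.strip()]
    kept = [s for s in sentences if words(s) & q]
    return ". ".join(kept) + "." if kept else ""


def compressed_retrieve(chunks, question, k=2):
    """Retrieve k chunks, compress each one, and drop chunks left empty."""
    candidates = keyword_retriever(chunks, question, k)
    compressed = [sentence_extractor(c, question) for c in candidates]
    return [c for c in compressed if c]


# Invented toy chunks, not taken from the PDF
toy_chunks = [
    "Growth is projected to slow. The weather was mild.",
    "Blockchain is a ledger. Mining uses energy.",
    "Bananas are yellow.",
]
result = compressed_retrieve(toy_chunks, "projected growth", k=2)
print(result)
```

Two chunks are retrieved, but only one survives compression, and only its relevant sentence is kept. This is why the notebook prints a short extract instead of a full chunk.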
