# Indexing the Data

In [1]:
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

file_path = 'data/Segment-anything-meta-paper.pdf'
loader = PyPDFLoader(file_path=file_path)
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=0,
)
data = loader.load_and_split(text_splitter=splitter)
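
To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal sketch of fixed-width character chunking. It is a deliberate simplification: `RecursiveCharacterTextSplitter` also tries to break on separators such as paragraphs and newlines, which this sketch ignores. The helper `chunk_text` is hypothetical and not part of LangChain.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into fixed-width character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


sample = "abcdefghij"
no_overlap = chunk_text(sample, chunk_size=4, chunk_overlap=0)
with_overlap = chunk_text(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)    # ['abcd', 'efgh', 'ij']
print(with_overlap)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With `chunk_overlap=0`, as in the cell above, a sentence that straddles a boundary is cut in two. A small overlap repeats some text in neighbouring chunks, which may help retrieval at the cost of a larger index.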

In [5]:
data[1]

Document(page_content='(c) Data: data engine (top) & dataset (bottom)•1+ billion masks•11 million images •privacy respecting•licensed imagesannotatetraindatamodelSegment Anything 1B (SA-1B):\nFigure 1: We aim to build a foundation model for segmentation by introducing three interconnected components: a prompt-\nable segmentation task, a segmentation model (SAM) that powers data annotation and enables zero-shot transfer to a range', metadata={'source': 'data/Segment-anything-meta-paper.pdf', 'page': 0})

In [None]:
# To index our PDF, we use Faiss, an open-source library
# for efficient similarity search and clustering of dense vectors.
%pip install faiss-cpu

In [3]:
from langchain_openai import OpenAIEmbeddings

# An example of how to embed a query using the OpenAI embeddings
embeddings = OpenAIEmbeddings()
vector1 = embeddings.embed_query('How are you?')
len(vector1)

1536
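
The length 1536 is the dimension of the embedding vector. Two texts are compared by comparing their vectors, and cosine similarity is one common choice. The sketch below uses toy 3-dimensional vectors in place of real embeddings, and `cosine_similarity` is a hypothetical helper written only for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for 1536-dimensional embeddings.
v_query = [1.0, 0.0, 0.0]
v_same = [2.0, 0.0, 0.0]        # same direction, different length
v_orthogonal = [0.0, 3.0, 0.0]  # unrelated direction

print(cosine_similarity(v_query, v_same))
print(cosine_similarity(v_query, v_orthogonal))
```

Vectors pointing in the same direction score close to 1 regardless of length, and unrelated directions score close to 0.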

In [6]:
from langchain.vectorstores import FAISS
index = FAISS.from_documents(data, embeddings)

question = "Explain the main idea and results of the paper."
index.similarity_search_with_relevance_scores(question)

[(Document(page_content='Results. We show qualitative results in Fig. 12. SAM\ncan segment objects based on simple text prompts like “a\nwheel” as well as phrases like “beaver tooth grille”. When\nSAM fails to pick the right object from a text prompt only,\nan additional point often fixes the prediction, similar to [31].\n7.6. Ablations\nWe perform several ablations on our 23 dataset suite with\nthe single center point prompt protocol. Recall that a sin-\ngle point may be ambiguous and that ambiguity may not', metadata={'source': 'data/Segment-anything-meta-paper.pdf', 'page': 10}),
  0.6708498665022007),
 (Document(page_content='that annotators improved annotation quality and quantity over time.\n3.How did you choose the specific wording of your task instructions? What\nsteps, if any, were taken to verify the clarity of task instructions and wording\nfor annotators? As our task was annotating images, the annotation guide-\nlines included visual examples. Our research team completed 30
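
The result above is a list of `(document, score)` pairs with the best match first. Conceptually, the search ranks every stored chunk against the query vector and keeps the top few. Below is a brute-force sketch of that idea under stated assumptions: the 2-dimensional vectors and chunk labels are made up, the score is a plain dot product of unit-normalised vectors, and neither Faiss's optimised index structures nor LangChain's exact score formula are reproduced.

```python
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query_vec, corpus, k=2):
    """Return the k (text, score) pairs with the highest score, best first.

    corpus is a list of (text, vector) pairs (hypothetical toy data).
    """
    q = normalize(query_vec)
    scored = [
        (text, sum(a * b for a, b in zip(q, normalize(vec))))
        for text, vec in corpus
    ]
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


corpus = [
    ("chunk about results", [0.9, 0.1]),
    ("chunk about annotation", [0.1, 0.9]),
    ("chunk about ablations", [0.7, 0.3]),
]
hits = top_k([1.0, 0.0], corpus, k=2)
for text, score in hits:
    print(f"{score:.3f}  {text}")
```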

In [8]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.callbacks import StdOutCallbackHandler


# Use maximal marginal relevance (MMR): fetch the 20 nearest chunks,
# then keep the 10 that best balance relevance and diversity.
retriever = index.as_retriever(
    search_type='mmr',
    search_kwargs={'k': 10, 'fetch_k': 20},
)


llm = ChatOpenAI()

chain = RetrievalQA.from_chain_type(
    llm=llm, 
    retriever=retriever,
    verbose=True
)
handler = StdOutCallbackHandler()

chain.run(
    query=question,
    callbacks=[handler]
)



> Entering new RetrievalQA chain...


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
System: Use the following pieces of context to answer the user's question. 
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
Results. We show qualitative results in Fig. 12. SAM
can segment objects based on simple text prompts like “a
wheel” as well as phrases like “beaver tooth grille”. When
SAM fails to pick the right object from a text prompt only,
an additional point often fixes the prediction, similar to [31].
7.6. Ablations
We perform several ablations on our 23 dataset suite with
the single center point prompt protocol. Recall that a sin-
gle point may be ambiguous and that ambiguity may not

that annotators improved annotation quality and quantity over time.
3.How did you choose the specific wording of your task instructions? What
steps, if 

'The main idea of the paper is the development and evaluation of SAM, a model for segmenting objects in images based on text prompts. SAM can accurately segment objects based on simple text prompts like "a wheel" or more complex phrases like "beaver tooth grille." The paper discusses the success of SAM in accurately segmenting objects and how additional points can often fix any incorrect predictions. Additionally, the paper discusses ablations performed on a dataset suite and the importance of task, model, and data components in image segmentation.\n\nIn terms of results, the paper shows qualitative results in Figure 12, demonstrating SAM\'s capabilities in segmenting objects based on text prompts. The paper also mentions that the representations from SAM may be useful for various purposes, such as data labeling, understanding dataset contents, or as features for downstream tasks. Furthermore, the paper discusses a human study experimental design used to evaluate mask quality, as well 
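
The retriever above sets `fetch_k` to 20 and `k` to 10 and asks for maximal marginal relevance (MMR). The idea is to fetch a larger pool of candidates first, then pick chunks one by one, rewarding similarity to the query and penalising similarity to chunks already picked. The sketch below is a toy version with made-up 2-dimensional vectors; the function `mmr` and the weight `lambda_mult=0.5` are illustrative assumptions, not LangChain's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, candidates, k, lambda_mult=0.5):
    """Return indices of up to k candidates chosen greedily by MMR."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query_vec, candidates[i])
            # Similarity to the closest already-selected candidate (0 if none yet).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query = [1.0, 0.0]
candidates = [
    [0.95, 0.31],  # most relevant
    [0.94, 0.34],  # nearly a duplicate of the first
    [0.80, -0.60], # less relevant, but different
]
picked = mmr(query, candidates, k=2)
plain_top2 = sorted(
    range(len(candidates)),
    key=lambda i: cosine(query, candidates[i]),
    reverse=True,
)[:2]
print(picked)      # [0, 2]
print(plain_top2)  # [0, 1]
```

Plain similarity ranking returns the two near-duplicates, while MMR swaps the second one for a less similar chunk, which can give the LLM a broader view of the paper.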

# Loading Data into a Vector DB

In [None]:
%pip install pinecone-client

In [16]:
import os

import pinecone
from dotenv import load_dotenv
from langchain.vectorstores import Pinecone

# Load PINECONE_API_KEY and PINECONE_ENV from a local .env file.
load_dotenv()

pc_api_key = os.getenv("PINECONE_API_KEY")
pc_env = os.getenv("PINECONE_ENV")

# Create a client handle for the Pinecone API.
pc = pinecone.Pinecone(api_key=pc_api_key)

In [17]:
pc_env

'default'
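
`os.getenv` returns `None` when a variable is missing, and that often only surfaces later as a confusing authentication error. A small guard can fail early instead. The helper `require_env` and the variable name `DEMO_API_KEY` are hypothetical, chosen so the sketch runs without real credentials.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; add it to your .env file."
        )
    return value


# Demo with a placeholder variable so the snippet runs anywhere.
os.environ["DEMO_API_KEY"] = "not-a-real-key"
demo_key = require_env("DEMO_API_KEY")
print(demo_key)
```

In this notebook the same guard could wrap the lookup of `PINECONE_API_KEY`.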

In [19]:
proj_name = 'demo'
db = Pinecone.from_documents(
    data, 
    embeddings, 
    index_name=proj_name
)
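
The cell above appears to assume that an index named `demo` already exists in the Pinecone project and that its dimension matches the 1536-dimensional embeddings. If that is not the case, one option is to create the index first. The sketch below wraps that check in a hypothetical helper, `ensure_index`, and exercises it against a stand-in client so it runs without a Pinecone account. The cloud and region in the commented usage are assumptions, not values from this notebook.

```python
def ensure_index(client, name, dimension, spec, metric="cosine"):
    """Create the index only if it is missing. Returns True if it was created."""
    if name in client.list_indexes().names():
        return False
    client.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
    return True


# Stand-in objects (hypothetical) that mimic the two client calls used above.
class FakeIndexList:
    def __init__(self, names):
        self._names = list(names)

    def names(self):
        return list(self._names)


class FakeClient:
    def __init__(self):
        self.created = []

    def list_indexes(self):
        return FakeIndexList(self.created)

    def create_index(self, name, dimension, metric, spec):
        self.created.append(name)


fake = FakeClient()
first = ensure_index(fake, "demo", dimension=1536, spec=None)
second = ensure_index(fake, "demo", dimension=1536, spec=None)
print(first, second)  # True False

# With the real client `pc`, the call might look roughly like this
# (cloud and region are assumptions):
# from pinecone import ServerlessSpec
# ensure_index(pc, "demo", 1536, ServerlessSpec(cloud="aws", region="us-east-1"))
```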


In [21]:

chain = RetrievalQA.from_chain_type(
    llm=llm, 
    retriever=db.as_retriever(),
    verbose=True
)

chain.invoke(
    input=question,
    callbacks=[handler],
)



> Entering new RetrievalQA chain...

> Finished chain.


{'query': 'Explain the main idea and results of the paper.',
 'result': 'The main idea of the paper is the development of a system called SAM, which can segment objects in images based on text prompts. The system shows promising results in segmenting objects based on simple text prompts like "a wheel" or more complex phrases like "beaver tooth grille." Additionally, when SAM fails to pick the right object from a text prompt initially, an additional point often helps correct the prediction.\n\nThe results of the paper demonstrate the effectiveness of SAM in object segmentation based on text prompts, showcasing its ability to understand and segment objects accurately in images. The system\'s performance is highlighted through qualitative results shown in Figure 12, indicating its capability to interpret and act on textual instructions for image segmentation tasks.'}