# Using the Pinecone Retrieval App

In this walkthrough we will see how to use the retrieval API with a Pinecone datastore for *semantic search / question-answering*.

Before running this notebook you should have already initialized the retrieval API and have it running locally or elsewhere. The full instructions for doing this are found in the [project README]().

We will summarize the instructions (specific to the Pinecone datastore) before moving on to the walkthrough.

## Preparing Data

In this example, we will use the **S**tanford **Qu**estion **A**nswering **D**ataset (SQuAD), which we download from Hugging Face Datasets.

In [1]:
from datasets import load_dataset

data = load_dataset("squad", split="train")
data = data.to_pandas()
len(data)

87599

The dataset contains many duplicate `context` paragraphs because each `context` can have many relevant questions. We don't want these duplicates, so we remove them and, to keep the example small, keep only the first 100 rows:

In [2]:
data = data.drop_duplicates(subset=["context"])
data = data[:100]
len(data)

100
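
To make the effect of `drop_duplicates(subset=["context"])` concrete, here is a minimal pure-Python sketch of the same idea: keep the first row seen for each distinct `context`. The toy `rows` and the helper name `dedupe_by_context` are made up for illustration and are not part of the notebook.

```python
def dedupe_by_context(rows):
    """Keep the first row seen for each distinct 'context', preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row["context"] not in seen:
            seen.add(row["context"])
            unique.append(row)
    return unique


# Toy rows (made up): two questions share the same context paragraph.
rows = [
    {"question": "q1", "context": "paragraph A"},
    {"question": "q2", "context": "paragraph A"},
    {"question": "q3", "context": "paragraph B"},
]

unique_rows = dedupe_by_context(rows)
print(len(unique_rows))  # 2
print([r["question"] for r in unique_rows])  # ['q1', 'q3']
```

One consequence is that each remaining row keeps only the first question recorded for its context.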

In [11]:
import os
from openai import OpenAI

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")

client = OpenAI(api_key=OPENAI_API_KEY)

# Embed each context paragraph and build one record per SQuAD row
documents = [
    {
        'id': r['id'],
        'values': client.embeddings.create(
            input=r['context'], model="text-embedding-3-large"
        ).data[0].embedding,
        'metadata': {
            'title': r['title'],
            'content': r['context']
        }
    } for r in data.to_dict(orient='records')
]
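
The cell above makes one embedding request per record. The embeddings endpoint also accepts a list of strings as `input`, so the same records could be built with fewer requests. The sketch below shows that batching pattern under stated assumptions: `build_documents` and `fake_embed_batch` are hypothetical names, and the fake embedder stands in for a real call so the example runs without an API key.

```python
def build_documents(records, embed_batch, batch_size=32):
    """Build {'id', 'values', 'metadata'} records, embedding contexts in batches.

    `embed_batch` is any callable that maps a list of strings to a list of
    vectors in the same order (an assumption of this sketch).
    """
    documents = []
    for start in range(0, len(records), batch_size):
        chunk = records[start:start + batch_size]
        vectors = embed_batch([r["context"] for r in chunk])
        for record, vector in zip(chunk, vectors):
            documents.append({
                "id": record["id"],
                "values": vector,
                "metadata": {"title": record["title"], "content": record["context"]},
            })
    return documents


def fake_embed_batch(texts):
    # Stand-in embedder (not a real model): a 2-d vector based on text length.
    return [[float(len(t)), 0.0] for t in texts]


# With the OpenAI client from the notebook, the callable could instead be:
# lambda texts: [d.embedding for d in client.embeddings.create(
#     input=texts, model="text-embedding-3-large").data]

toy_records = [
    {"id": "a", "title": "T1", "context": "first context"},
    {"id": "b", "title": "T1", "context": "second context"},
    {"id": "c", "title": "T2", "context": "third"},
]
toy_documents = build_documents(toy_records, fake_embed_batch, batch_size=2)
print([d["id"] for d in toy_documents])
```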

In [5]:
from pinecone import Pinecone

doc_ids = []
vectors = []

PINECONE_API_KEY = os.environ.get("PINECONE_API_KEY")
PINECONE_ENVIRONMENT = os.environ.get("PINECONE_ENVIRONMENT")
PINECONE_INDEX = os.environ.get("PINECONE_INDEX")

pc = Pinecone(api_key=PINECONE_API_KEY)
index = pc.Index(PINECONE_INDEX)

for doc in documents:
    # Keep the ids so the upserted vectors can be referenced later
    doc_ids.append(doc['id'])
    # Store the passage text and document id as Pinecone metadata
    pinecone_metadata = {
        "text": doc['metadata']['content'],
        "document_id": doc['id'],
    }
    # Pinecone accepts (id, values, metadata) tuples
    vectors.append((doc['id'], doc['values'], pinecone_metadata))

UPSERT_BATCH_SIZE = 100

batches = [
	vectors[i : i + UPSERT_BATCH_SIZE]
	for i in range(0, len(vectors), UPSERT_BATCH_SIZE)
]
# Upsert each batch to Pinecone
for batch in batches:
    index.upsert(vectors=batch)

We're now ready to begin indexing (or *upserting*) our `documents` through the retrieval app API. To make these requests, we need to provide authorization; here that is the `PINECONE_API_KEY` environment variable, which we load and send in the request headers. We do this below:

In [13]:
import os
from dotenv import load_dotenv

_ = load_dotenv()


PINECONE_API_KEY = os.environ.get("PINECONE_API_KEY")
assert PINECONE_API_KEY is not None

headers = {
    "Api_key": PINECONE_API_KEY,
    "Content-Type": "application/json"
}
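
As an alternative to the bare `assert`, the header construction can be wrapped in a small helper that fails with a readable message when the key is missing. This is only a sketch: `build_headers` is a hypothetical helper, and the header names simply mirror the cell above.

```python
import os


def build_headers(env=None):
    """Return request headers, raising a clear error if the API key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("PINECONE_API_KEY")
    if not api_key:
        raise RuntimeError("PINECONE_API_KEY is not set in the environment")
    return {"Api_key": api_key, "Content-Type": "application/json"}


# Example with an explicit mapping, so it runs without touching os.environ.
example_headers = build_headers({"PINECONE_API_KEY": "dummy-key"})
print(example_headers["Content-Type"])
```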

We'll perform the upsert in batches of `batch_size`. Make sure that the `endpoint_url` variable is set to the correct location for your running *retrieval-app* API.

In [14]:
from tqdm.auto import tqdm
import requests
from requests.adapters import HTTPAdapter, Retry

batch_size = 100
endpoint_url = "http://0.0.0.0:8000"
s = requests.Session()

# set up a retry strategy to retry on 5xx errors
retries = Retry(
    total=5,  # number of retries before raising error
    backoff_factor=0.1,
    status_forcelist=[500, 502, 503, 504]
)
s.mount('http://', HTTPAdapter(max_retries=retries))

# Upsert vectors in batches
for i in tqdm(range(0, len(documents), batch_size)):
    i_end = min(len(documents), i + batch_size)
    batch = documents[i:i_end]
    # Make POST request with retries
    res = s.post(
        f"{endpoint_url}/vectors/upsert",
        headers=headers,
        json={
            "vectors": batch
        }
    )
    if res.status_code != 200:
        print(f"Failed to upsert batch {i} to {i_end}: {res.text}")

100%|██████████| 1/1 [00:02<00:00,  2.42s/it]
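
The loop above only prints a message when a batch fails. If failed batches should be re-sent later, one option is to record their index ranges. The sketch below assumes a hypothetical `post` callable that takes the JSON payload and returns an HTTP status code; `upsert_in_batches` and `fake_post` are illustrative names, and the fake lets the example run without a server.

```python
def upsert_in_batches(documents, post, batch_size=100):
    """Send documents in batches; return (start, end) ranges that did not get a 200."""
    failed = []
    for start in range(0, len(documents), batch_size):
        end = min(len(documents), start + batch_size)
        status = post({"vectors": documents[start:end]})
        if status != 200:
            failed.append((start, end))
    return failed


def fake_post(payload):
    # Stand-in for the HTTP call: fail any batch that contains the id "bad".
    ids = [v["id"] for v in payload["vectors"]]
    return 500 if "bad" in ids else 200


# With the session from the notebook, `post` could instead be:
# lambda payload: s.post(f"{endpoint_url}/vectors/upsert",
#                        headers=headers, json=payload).status_code

toy_docs = [{"id": "a"}, {"id": "b"}, {"id": "bad"}, {"id": "d"}, {"id": "e"}]
failed_ranges = upsert_in_batches(toy_docs, fake_post, batch_size=2)
print(failed_ranges)  # [(2, 4)]
```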


With that, our SQuAD records have all been indexed and we can move on to querying.

### Making Queries

To query the datastore all we need to do is pass one or more queries to the `/query` endpoint. We can take a few questions from SQuAD:

In [15]:
queries = data['question'].tolist()
# format into the structure needed by the /query endpoint
queries = [{'query': q} for q in queries]
len(queries)

100

In [26]:
query_payload = []

for query in queries:
    # each entry is a {'query': ...} dict, so embed the question text itself
    query_payload.append({
        "vector": client.embeddings.create(
            input=query['query'], model="text-embedding-3-large"
        ).data[0].embedding
    })

In [27]:
query_payload[1]

{'vector': [0.013733557425439358,
  -0.00022022299526724964,
  -0.015625109896063805,
  0.002531602280214429,
  -0.01315289456397295,
  0.01785978302359581,
  -0.00045226819929666817,
  0.044904597103595734,
  -0.03267548605799675,
  0.05032411590218544,
  0.01782459206879139,
  -0.016469711437821388,
  -0.025038888677954674,
  -0.011446096934378147,
  -0.011278936639428139,
  0.031144646927714348,
  -0.03395998105406761,
  0.00559108005836606,
  -0.008824316784739494,
  -0.014314220286905766,
  0.007627798710018396,
  0.01886274665594101,
  0.01979532651603222,
  0.015361173078417778,
  0.011534076184034348,
  0.014437391422688961,
  0.007495830301195383,
  0.017578249797225,
  -0.020463967695832253,
  -0.023719199001789093,
  0.005067603662610054,
  0.02025281824171543,
  0.01997128501534462,
  0.0025997860357165337,
  -0.04184291884303093,
  -0.015519535169005394,
  0.006035374943166971,
  -0.048599723726511,
  -0.015537131577730179,
  0.023208919912576675,
  -0.0037963036447763443,

We will query with just one of these vectors, the second question (`query_payload[1]`):

In [33]:
res = s.post(
    f"{endpoint_url}/query",
    headers=headers,
    json=query_payload[1]
)

if res.status_code == 200:
    results = res.json()
    print("Query results:", results)
else:
    print(f"Failed to query vectors: {res.status_code} {res.text}")

Query results: {'results': [], 'matches': [{'id': '5733b5344776f419006610dd', 'score': 0.109763727, 'values': [], 'metadata': {'content': 'As of 2012[update] research continued in many fields. The university president, John Jenkins, described his hope that Notre Dame would become "one of the pre–eminent research institutions in the world" in his inaugural address. The university has many multi-disciplinary institutes devoted to research in varying fields, including the Medieval Institute, the Kellogg Institute for International Studies, the Kroc Institute for International Peace studies, and the Center for Social Concerns. Recent research includes work on family conflict and child development, genome mapping, the increasing trade deficit of the United States with China, studies in fluid mechanics, computational science and engineering, and marketing trends on the Internet. As of 2013, the university is home to the Notre Dame Global Adaptation Index which ranks countries annually based 

Now we can loop through the matches in the response and print the score and content of each:

In [29]:
for match in res.json()['matches']:
    score = round(match['score'], 2)
    content = match['metadata']['content']
    print("-"*70 + "\n" + f"{score}: {content}" + "\n" + "-"*70 + "\n")
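
Because each match carries its full `content` paragraph, the printed output can get long. A compact summary is sometimes easier to scan. The sketch below assumes the response shape shown in the query output above (a `matches` list whose items have `id`, `score`, and `metadata['content']`); `summarize_matches` and the sample response are hypothetical.

```python
def summarize_matches(response, max_chars=60):
    """Return (id, rounded score, content snippet) tuples, highest score first."""
    matches = sorted(
        response.get("matches", []),
        key=lambda m: m["score"],
        reverse=True,
    )
    return [
        (m["id"], round(m["score"], 2), m["metadata"]["content"][:max_chars])
        for m in matches
    ]


# Made-up response with the same keys as the query output above.
sample_response = {
    "results": [],
    "matches": [
        {"id": "x1", "score": 0.1098, "values": [], "metadata": {"content": "A" * 200}},
        {"id": "x2", "score": 0.4512, "values": [], "metadata": {"content": "short text"}},
    ],
}

for match_id, score, snippet in summarize_matches(sample_response):
    print(f"{score}  {match_id}  {snippet}")
```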

The top results are all relevant as we would have hoped. With that we've finished. The retrieval app API can be shut down, and to save resources the Pinecone index can be deleted within the [Pinecone console](https://app.pinecone.io/).