- Here we look at how to build a standard retriever against a vector database
  - it fetches nodes via top-k similarity
- We use Pinecone as the vector database

1. Generate a query embedding
2. Query the vector database using different search modes
3. Parse the results into nodes
4. Wrap the steps in a custom retriever
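
- Before wiring up Pinecone, the top-k idea itself can be sketched in plain Python. This is a toy under stated assumptions: `toy_store`, `cosine_similarity` and `top_k` are made-up names, the vectors are 2-dimensional stand-ins for real embeddings, and a real vector database would use an approximate index rather than scoring every vector.

```python
import heapq
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: List[float], store: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = (
        (node_id, cosine_similarity(query_vec, vec))
        for node_id, vec in store.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical in-memory "vector store": node id -> embedding
toy_store = {
    "node-a": [1.0, 0.0],
    "node-b": [0.8, 0.6],
    "node-c": [0.0, 1.0],
}

results = top_k([1.0, 0.1], toy_store, k=2)
for node_id, score in results:
    print(node_id, round(score, 3))
```

- The rest of the notebook does the same thing, with OpenAI embeddings in place of the toy vectors and Pinecone in place of the dictionary.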

- Create the Pinecone index

In [13]:
import pinecone
import os

pinecone.init(api_key=os.getenv("PINECONE_API_KEY"), environment="gcp-starter")

pinecone.create_index("quickstart", dimension=1536, metric="euclidean")

pinecone_index = pinecone.Index("quickstart")
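
- Note on `metric="euclidean"`: the scores that come back are then distances, so a smaller value means a closer match (as far as I understand Pinecone's euclidean metric; the retrieval outputs further down are consistent with this, since the first node has the lower value). For unit-length vectors, squared Euclidean distance and cosine similarity give the same ranking because `d^2 = 2 - 2*cos`. The sketch below checks this on random vectors; it assumes unit-length embeddings and every name in it is made up for illustration.

```python
import math
import random
from typing import List


def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_euclidean(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_unit(a: List[float], b: List[float]) -> float:
    """Cosine similarity; a plain dot product is enough for unit vectors."""
    return sum(x * y for x, y in zip(a, b))


rng = random.Random(0)
query_vec = normalize([rng.gauss(0, 1) for _ in range(8)])
candidates = [normalize([rng.gauss(0, 1) for _ in range(8)]) for _ in range(5)]

# Closest first: ascending distance vs. descending cosine similarity
by_distance = sorted(
    range(5), key=lambda i: squared_euclidean(query_vec, candidates[i])
)
by_cosine = sorted(range(5), key=lambda i: -cosine_unit(query_vec, candidates[i]))

print(by_distance == by_cosine)
```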

In [14]:
from llama_index.vector_stores import PineconeVectorStore

vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

- Load documents into the vector store

In [8]:
!mkdir data
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "data/llama2.pdf"

mkdir: cannot create directory ‘data’: File exists
--2023-09-19 19:39:16--  https://arxiv.org/pdf/2307.09288.pdf
Resolving arxiv.org (arxiv.org)... 128.84.21.199
Connecting to arxiv.org (arxiv.org)|128.84.21.199|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 13661300 (13M) [application/pdf]
Saving to: ‘data/llama2.pdf’


2023-09-19 19:39:23 (2.01 MB/s) - ‘data/llama2.pdf’ saved [13661300/13661300]



In [9]:
from pathlib import Path
from llama_hub.file.pymu_pdf.base import PyMuPDFReader

In [15]:
loader = PyMuPDFReader()
documents = loader.load(file_path="./data/llama2.pdf")

In [16]:
from llama_index import VectorStoreIndex, ServiceContext
from llama_index.storage import StorageContext

In [17]:
service_context = ServiceContext.from_defaults(chunk_size=1024)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, service_context=service_context, storage_context=storage_context
)

Upserted vectors:   0%|          | 0/112 [00:00<?, ?it/s]

- Next, we'll define a retriever against this vector store to retrieve a set of nodes

In [22]:
query = "Can you tell me about the key concepts for safety finetuning?"

In [27]:
# 1. Generate query embedding
from llama_index.embeddings import OpenAIEmbedding

embed_model = OpenAIEmbedding()

query_embedding = embed_model.get_query_embedding(query)

In [44]:
# 2. Query the Vector Database
from llama_index.vector_stores import VectorStoreQuery

query_mode = "default"
# query_mode = "sparse"
# query_mode = "hybrid"

vector_store_query = VectorStoreQuery(
    query_str=query,
    query_embedding=query_embedding,
    similarity_top_k=2,
    mode=query_mode,
)

query_result = vector_store.query(vector_store_query)
print(query_result)

VectorStoreQueryResult(nodes=[TextNode(id_='9ebba6fb-ee01-4c0a-990c-4eca010bb2cb', embedding=[-0.02406401, -0.00613856362, 0.0143711725, -0.0271455478, -0.0187133402, 0.0200720187, -0.0121300537, 0.0107083442, -0.0301990714, -0.0505372211, 0.00907653, 0.00329514453, -0.0100079952, -0.00833416, -0.000745872268, 0.00824311376, 0.0336728059, -0.00469934521, 0.0130475117, -0.00761980284, -0.0164161939, 0.00883140787, -0.0252546053, -0.0108834319, -0.0058093993, 0.00317608519, 0.030002974, -0.0310394913, -0.0210385, -0.00335467421, -0.00127901335, 0.00283991732, 0.00449974602, -0.0054767332, -0.00506702904, -0.0146092912, 0.00794896763, 0.000868433446, 0.0319079235, -0.0203101374, 0.0246943254, 0.0199459549, -0.00405852543, -0.0272856187, 0.0014829901, 0.0196097866, 0.013803889, -0.0226072837, -0.033952944, 0.0201840736, 0.0447943583, 0.0110585196, -0.0197218433, -0.000518696383, -0.00103476644, 0.00655877357, 0.0182371028, 0.00781590119, 0.0126202991, -0.0005541516, -0.00615957426, 0.00933
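
- On the search modes: "default" is dense (embedding) search, "sparse" is keyword-style search, and "hybrid" mixes the two. How Pinecone combines the scores is not shown in this notebook; the sketch below only illustrates the usual convex-combination idea. The weight `alpha`, the function names and the toy scores are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def hybrid_score(dense: float, sparse: float, alpha: float) -> float:
    """Blend a dense and a sparse score; alpha=1 is dense only, alpha=0 sparse only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * dense + (1.0 - alpha) * sparse


def rank(docs: Dict[str, Tuple[float, float]], alpha: float) -> List[str]:
    """Order document ids by blended score, best first."""
    return sorted(
        docs, key=lambda doc_id: hybrid_score(*docs[doc_id], alpha), reverse=True
    )


# Hypothetical (dense_score, sparse_score) pairs, higher is better for both
toy_docs = {
    "doc-1": (0.9, 0.1),  # semantically close, few keyword matches
    "doc-2": (0.4, 0.8),  # many keyword matches, semantically weaker
}

for alpha in (1.0, 0.5, 0.0):
    print(alpha, rank(toy_docs, alpha))
```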

In [46]:
# 3. Parse results into Nodes
# Construct a NodeWithScore object from the nodes
from llama_index.schema import NodeWithScore
from typing import Optional

nodes_with_scores = []

for idx, node in enumerate(query_result.nodes):
    score: Optional[float] = None

    if query_result.similarities is not None:
        score = query_result.similarities[idx]
    
    nodes_with_scores.append(NodeWithScore(node=node, score=score))
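
- The same pairing logic, isolated so it can be tried without a vector store. `ToyNode` and `ToyNodeWithScore` are stand-ins invented here, not the LlamaIndex classes; the length check is an extra guard that the notebook cell does not have.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyNode:
    text: str


@dataclass
class ToyNodeWithScore:
    node: ToyNode
    score: Optional[float] = None


def attach_scores(
    nodes: List[ToyNode], similarities: Optional[List[float]]
) -> List[ToyNodeWithScore]:
    """Pair each node with its score; leave the score as None if none came back."""
    if similarities is None:
        return [ToyNodeWithScore(node=node) for node in nodes]
    if len(similarities) != len(nodes):
        raise ValueError("expected one similarity per node")
    return [
        ToyNodeWithScore(node=node, score=score)
        for node, score in zip(nodes, similarities)
    ]


toy_nodes = [ToyNode("first chunk"), ToyNode("second chunk")]
scored = attach_scores(toy_nodes, [0.34, 0.38])
unscored = attach_scores(toy_nodes, None)
print(scored[0].score, unscored[0].score)
```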

In [52]:
from llama_index.response.notebook_utils import display_source_node

for node in nodes_with_scores:
    display_source_node(node, show_source_metadata=True, source_length=1024)

**Node ID:** 9ebba6fb-ee01-4c0a-990c-4eca010bb2cb<br>**Similarity:** 0.339785933<br>**Text:** advice). The attack vectors explored consist of psychological manipulation (e.g., authority manipulation),
logic manipulation (e.g., false premises), syntactic manipulation (e.g., misspelling), semantic manipulation
(e.g., metaphor), perspective manipulation (e.g., role playing), non-English languages, and others.
We then define best practices for safe and helpful model responses: the model should first address immediate
safety concerns if applicable, then address the prompt by explaining the potential risks to the user, and finally
provide additional information if possible. We also ask the annotators to avoid negative user experience
categories (see Appendix A.5.2). The guidelines are meant to be a general guide for the model and are
iteratively refined and revised to include newly identified risks.
4.2.2
Safety Supervised Fine-Tuning
In accordance with the established guidelines from Section 4.2.1, we gather prompts and demonstrations
of safe model responses from trained annotators, and use the data for...<br>**Metadata:** {'total_pages': 77, 'file_path': './data/llama2.pdf', 'source': '24'}<br>

**Node ID:** 710de9d9-7f5f-410b-997d-93a1be29ff36<br>**Similarity:** 0.382039309<br>**Text:** TruthfulQA ↑
ToxiGen ↓
MPT
7B
29.13
22.32
30B
35.25
22.61
Falcon
7B
25.95
14.53
40B
40.39
23.44
Llama 1
7B
27.42
23.00
13B
41.74
23.08
33B
44.19
22.57
65B
48.71
21.77
Llama 2
7B
33.29
21.25
13B
41.86
26.10
34B
43.45
21.19
70B
50.18
24.60
Table 11: Evaluation of pretrained LLMs on automatic safety benchmarks. For TruthfulQA, we present the
percentage of generations that are both truthful and informative (the higher the better). For ToxiGen, we
present the percentage of toxic generations (the smaller, the better).
Benchmarks give a summary view of model capabilities and behaviors that allow us to understand general
patterns in the model, but they do not provide a fully comprehensive view of the impact the model may have
on people or real-world outcomes; that would require study of end-to-end product deployments. Further
testing and mitigation should be done to understand bias and other social issues for the specific context
in which a system may be deployed. For this, it may be necessary to test beyond the g...<br>**Metadata:** {'total_pages': 77, 'file_path': './data/llama2.pdf', 'source': '23'}<br>

In [60]:
# 4. Put in a retriever that can be used in LlamaIndex workflows

from llama_index import QueryBundle
from llama_index.retrievers import BaseRetriever
from typing import Any, List, Optional

from llama_index.schema import NodeWithScore

class PineconeRetriever(BaseRetriever):
    """Retriever for Pinecone Vector Store"""

    def __init__(
        self,
        vector_store: PineconeVectorStore,
        embed_model: Any,
        query_mode: str = "default",
        similarity_top_k: int = 2,
    ):
        """Init params"""
        self._vector_store = vector_store
        self._embed_model = embed_model
        self._query_mode = query_mode
        self._similarity_top_k = similarity_top_k

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve"""

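        # Embed the query passed to this call with the retriever's own model,
        # rather than relying on notebook-level variables
        query_embedding = self._embed_model.get_query_embedding(
            query_bundle.query_str
        )
        vector_store_query = VectorStoreQuery(
            query_str=query_bundle.query_str,
            query_embedding=query_embedding,
            similarity_top_k=self._similarity_top_k,
            mode=self._query_mode,
        )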
        query_result = self._vector_store.query(vector_store_query)

        nodes_with_scores = []
        for idx, node in enumerate(query_result.nodes):
            score: Optional[float] = None

            if query_result.similarities is not None:
                score = query_result.similarities[idx]

            nodes_with_scores.append(NodeWithScore(node=node, score=score))

        return nodes_with_scores
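
- A retriever should embed the query string it is handed on each call and use the embed model it was constructed with; reading notebook-level variables instead would silently return results for the wrong query. A quick way to check this is with stand-ins that record what they see. Everything below (`RecordingEmbedModel`, `ToyVectorStore`, `ToyRetriever`) is hypothetical and only mimics the shape of the real classes.

```python
from typing import Dict, List, Tuple


class RecordingEmbedModel:
    """Stand-in embed model that records every string it is asked to embed."""

    def __init__(self) -> None:
        self.seen: List[str] = []

    def get_query_embedding(self, text: str) -> List[float]:
        self.seen.append(text)
        # Toy embedding: the string length, so results are easy to predict
        return [float(len(text))]


class ToyVectorStore:
    """Stand-in store that returns the ids closest to the query vector."""

    def __init__(self, vectors: Dict[str, List[float]]) -> None:
        self._vectors = vectors

    def query(
        self, query_embedding: List[float], top_k: int
    ) -> List[Tuple[str, float]]:
        scored = [
            (node_id, sum((q - v) ** 2 for q, v in zip(query_embedding, vec)))
            for node_id, vec in self._vectors.items()
        ]
        # Squared distance: smallest first
        return sorted(scored, key=lambda pair: pair[1])[:top_k]


class ToyRetriever:
    def __init__(
        self,
        vector_store: ToyVectorStore,
        embed_model: RecordingEmbedModel,
        similarity_top_k: int = 1,
    ) -> None:
        self._vector_store = vector_store
        self._embed_model = embed_model
        self._similarity_top_k = similarity_top_k

    def retrieve(self, query_str: str) -> List[Tuple[str, float]]:
        # Uses only the argument and the instance's own collaborators
        query_embedding = self._embed_model.get_query_embedding(query_str)
        return self._vector_store.query(query_embedding, self._similarity_top_k)


recorder = RecordingEmbedModel()
toy_retriever = ToyRetriever(
    ToyVectorStore({"short-doc": [5.0], "long-doc": [40.0]}), recorder
)

first = toy_retriever.retrieve("hello")   # 5 characters
second = toy_retriever.retrieve("x" * 38)  # 38 characters
print(first[0][0], second[0][0])
```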

In [61]:
retriever = PineconeRetriever(
    vector_store=vector_store, embed_model=embed_model, query_mode="default",
    similarity_top_k=2
)

retrieved_nodes = retriever.retrieve(query)
for node in retrieved_nodes:
    display_source_node(node, show_source_metadata=True, source_length=1024)

**Node ID:** 9ebba6fb-ee01-4c0a-990c-4eca010bb2cb<br>**Similarity:** 0.338965535<br>**Text:** advice). The attack vectors explored consist of psychological manipulation (e.g., authority manipulation),
logic manipulation (e.g., false premises), syntactic manipulation (e.g., misspelling), semantic manipulation
(e.g., metaphor), perspective manipulation (e.g., role playing), non-English languages, and others.
We then define best practices for safe and helpful model responses: the model should first address immediate
safety concerns if applicable, then address the prompt by explaining the potential risks to the user, and finally
provide additional information if possible. We also ask the annotators to avoid negative user experience
categories (see Appendix A.5.2). The guidelines are meant to be a general guide for the model and are
iteratively refined and revised to include newly identified risks.
4.2.2
Safety Supervised Fine-Tuning
In accordance with the established guidelines from Section 4.2.1, we gather prompts and demonstrations
of safe model responses from trained annotators, and use the data for...<br>**Metadata:** {'total_pages': 77, 'file_path': './data/llama2.pdf', 'source': '24'}<br>

**Node ID:** 710de9d9-7f5f-410b-997d-93a1be29ff36<br>**Similarity:** 0.381277442<br>**Text:** TruthfulQA ↑
ToxiGen ↓
MPT
7B
29.13
22.32
30B
35.25
22.61
Falcon
7B
25.95
14.53
40B
40.39
23.44
Llama 1
7B
27.42
23.00
13B
41.74
23.08
33B
44.19
22.57
65B
48.71
21.77
Llama 2
7B
33.29
21.25
13B
41.86
26.10
34B
43.45
21.19
70B
50.18
24.60
Table 11: Evaluation of pretrained LLMs on automatic safety benchmarks. For TruthfulQA, we present the
percentage of generations that are both truthful and informative (the higher the better). For ToxiGen, we
present the percentage of toxic generations (the smaller, the better).
Benchmarks give a summary view of model capabilities and behaviors that allow us to understand general
patterns in the model, but they do not provide a fully comprehensive view of the impact the model may have
on people or real-world outcomes; that would require study of end-to-end product deployments. Further
testing and mitigation should be done to understand bias and other social issues for the specific context
in which a system may be deployed. For this, it may be necessary to test beyond the g...<br>**Metadata:** {'total_pages': 77, 'file_path': './data/llama2.pdf', 'source': '23'}<br>

- Plug into `RetrieverQueryEngine` to synthesize a response (higher-level abstraction)

In [62]:
from llama_index.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(retriever)

response = query_engine.query(query)

print(str(response))

The key concepts for safety fine-tuning include supervised safety fine-tuning, safety RLHF (Reinforcement Learning from Human Feedback), and safety context distillation. Supervised safety fine-tuning involves gathering adversarial prompts and safe demonstrations to train the model to align with safety guidelines. Safety RLHF integrates safety into the RLHF pipeline by training a safety-specific reward model and gathering challenging adversarial prompts for fine-tuning. Safety context distillation refines the RLHF pipeline by generating safer model responses using a safety preprompt and fine-tuning the model on these responses without the preprompt. These concepts are used to mitigate safety risks and improve the safety of the model's responses.
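
- A rough mental model of this last step is "retrieve, stuff the chunks into a prompt, ask the LLM". The notebook does not show what `RetrieverQueryEngine` does internally, so the sketch below is only that mental model: the prompt wording, `build_prompt`, `answer` and the stub retriever and LLM are all assumptions made for illustration.

```python
from typing import Callable, List


def build_prompt(question: str, contexts: List[str]) -> str:
    """Put the retrieved chunks ahead of the question in a single prompt."""
    context_block = "\n---\n".join(contexts)
    return (
        "Context information is below.\n"
        f"{context_block}\n"
        "Given the context and no prior knowledge, answer the question.\n"
        f"Question: {question}\n"
        "Answer: "
    )


def answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str], str],
) -> str:
    """Retrieve context for the question, then hand the filled prompt to the LLM."""
    contexts = retrieve(question)
    return llm(build_prompt(question, contexts))


def fake_retrieve(question: str) -> List[str]:
    # Stub retriever: always returns the same two placeholder chunks
    return ["chunk one", "chunk two"]


def echo_llm(prompt: str) -> str:
    # Stub LLM: reports how many chunks it was given instead of answering
    return f"(stub answer based on {prompt.count('---') + 1} context chunks)"


stub_response = answer("What is safety fine-tuning?", fake_retrieve, echo_llm)
print(stub_response)
```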
